Don’t call it AI: Turn words into numbers with quantitative ethnography
Summary
Quantitative ethnography is the niche subfield you’ve never heard of, but it’s one you’ve been increasingly pressured to practice over the past couple of years. It’s the math underlying generative AI that turns words into numbers, and LLMs have been getting in between you and a radically new approach to working with verbatims, transcripts, and other texts.
Business stakeholders are always pushing for greater efficiency and faster turnarounds. Qualitative researchers are always looking for more contact with users, and greater engagement with findings and reporting. Quantitative ethnography (and epistemic network analysis) offers a compromise: by trading structure and semantics for human sensemaking in the analysis part of research, perhaps both groups can get what they want.
I’ve had the opportunity to conduct quantitative ethnographic analyses in enterprise studies involving dozens of products and impacting hundreds of thousands of end-users. Stakeholders were willing to accept a different kind of analysis, and to engage more deeply with the process, in exchange for quicker answers.
In this talk, I’ll share how quantitative ethnography differs from qualitative ethnography, the tradeoffs you’ll have to make, and the kinds of results you can expect. This isn’t a tools talk, and you won’t need to do any math, either. I’ll close with a look into the near future: one where you can talk with as many users as will take your call with effectively zero additional analysis work; where you can have the analysis running live during your session, and have the user participate in the sensemaking process on-the-fly; and, the dream of every product manager, one where stakeholders can have dashboards of evidence updated live as users talk.
Key Insights
• Quantitative ethnography unifies qualitative ethnographic methods with quantitative statistical validation, avoiding typical mixed-methods back-and-forth.
• Formalizing coding rules in a detailed code book is essential to scale qualitative insights and enable automation.
• Defining mechanistic signifiers, such as keywords or phrase rules, is necessary to automate qualitative coding effectively.
• Intra-sample statistical analysis uses each coded line as a data point rather than each respondent, enabling meaningful stats from small sample sizes.
• Partnering with data scientists is critical because quantitative ethnography requires specialized, adjusted statistical methods that differ from conventional ones.
• Researchers must regularly validate coding accuracy and statistical assumptions over time, a process called closing the interpretive loop.
• Quantitative ethnography can scale from a handful of interviews to thousands of verbatim responses, maintaining rigor at all scales.
• Epistemic network analysis helps identify and quantify relationships between qualitative codes within the text data.
• Large language models can automate parts of quantitative ethnography but require sacrificing some control over code definitions and initial synthesis.
• Quantitative ethnography opens the possibility for near-real-time insights by automating coding and saturation metrics during ongoing data collection.
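To make the ideas of a code book, mechanistic signifiers, and line-level data points more concrete, here is a minimal Python sketch. It is not the speaker's tooling: the code names, keyword rules, and sample lines are all hypothetical, and the pair counting is only a simplified stand-in for the co-occurrence step that epistemic network analysis builds on, not a full ENA implementation.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical code book: each code maps to keyword rules (illustrative only).
CODEBOOK = {
    "SPEED": [r"\bslow\b", r"\bfast\b", r"\bwait(ing)?\b"],
    "TRUST": [r"\btrust\b", r"\breliable\b"],
    "COST": [r"\bprice\b", r"\bexpensive\b", r"\bcost\b"],
}

def code_line(line):
    """Return the set of codes whose keyword rules match this line."""
    text = line.lower()
    return {
        code
        for code, patterns in CODEBOOK.items()
        if any(re.search(p, text) for p in patterns)
    }

def cooccurrence(lines):
    """Count how often each pair of codes is applied to the same line."""
    counts = Counter()
    for line in lines:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(code_line(line)), 2):
            counts[pair] += 1
    return counts

# Made-up verbatims; each line is one data point.
lines = [
    "The app is slow and I don't trust the results.",
    "It is too expensive for how slow it is.",
    "Support was reliable.",
]
coded = [code_line(l) for l in lines]
pairs = cooccurrence(lines)
```

Because every rule is explicit, a researcher can audit why a line received a code, and the per-line code sets and pair counts are the kind of numeric output that a data scientist could then test statistically.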
Notable Quotes
"Business stakeholders push researchers for faster turnarounds and numbers, often favoring surveys over deep interviews."
"Quantitative ethnography isn’t mixed methods; it’s a unified method using both qualitative theory and quantitative validation."
"If you can’t come up with a rule for something, you can’t code it."
"Each coded line is a data point, which enables statistical power even with small numbers of respondents."
"Partner with data scientists to pick and adjust statistical tests because quantitative ethnography requires new assumptions."
"Closing the interpretive loop means regularly checking that your coding and stats hold up as new data arrives."
"Epistemic network analysis reveals meaningful connections between codes, suggesting but not proving why ideas cluster."
"Large language models cluster text using semantic relationships rather than shared vocabulary like traditional QDA."
"Using generative AI math lets you skip stats, but you lose control over what codes start your synthesis."
"If rules and stats update in real time, you could know when saturation is reached as data streams in."