Preparation Podcast Research

Core Prep: Exploring Synthetic Data's Role in XM

After identifying just how crucial synthetic data is becoming for research teams across the world, I wanted deeper insights from veteran Qualtrics employees, and found in-depth conversations with leaders Ali Henriques and Isabelle Zdatny.

These episodes solidified my decision to focus my technical project on attempting to model synthetic data generation and exploring the insights it provided.

🌮 I'm serious - I actually deliver. 🌮

You can get tacos delivered to the Qualtrics Provo office.

On me. Freshly picked from Taco Bell.

But not just tacos. TACOS tacos.

The kind whose main ingredients are Transparency, being All-in, Customer-Obsession, being One-Team, and Scrappiness.

Hopefully you can tell I'm all in.

And I'll continue to deliver, day after day.

Just with less cheese and hot sauce, and more 11/10 effort across every project.


#616 "Exploring synthetic personas in market research with Ali Henriques"

After listening to these two episodes and reviewing my notes from the 2025 trend reports, it's very clear that even as we see incredible benefits from synthetic data, it would be a mistake to say that it's still in its infancy. Demand is high, use cases abound, and we can look forward to continued algorithmic progress in AI that will only increase synthetic data's ability to quickly and accurately capture core insights. Good, fast, and cheap are more or less practical realities thanks to this data.

It's obvious that it provides incredible value (in the right use cases), and I have no doubt that cutting-edge research teams and forward-thinking companies will increasingly view investments in this area as absolutely vital to their performance. I am fascinated by this wild ability to close gaps in research.

Big picture: synthetic data is taking human-based data and combining it with publicly available data to fill crucial gaps in research efforts

Current Edge team is split between a legacy market research agency and new approaches with synthetic data

It's basically like a "digital twin"

You're taking human-based data (any info available on-hand or on the internet) and using that to model a response to a question
It's a combination of record-level and real-level data
Ex: it can be extremely granular depending on data availability. If I have a demographic of 18-24 year olds, what will they think of this new salad concept?

Rise of synthetic data usage

Balance of good, fast, and cheap - get more output from your input
Some audiences are hard to reach or get feedback from (need embedded, in-field team to get survey responses from people)
Can reduce risk when testing new concepts (private, contained environment)
Very helpful in heavily-regulated areas where privacy is a concern

Overall accuracy: ~80%!

LLMs are more neutral, but are effective when comparing mean scores and basic validity measures (what a survey is designed to measure)
They'll cycle between 5-6 measures to determine if a model is good enough to use
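
To make the mean-score comparison concrete, here is a minimal sketch of how one such check might look. Everything in it is an assumption for illustration: the question names, the 1-5 responses, the `0.5` tolerance, and the helper functions are all made up, and it is not the validation method described in the episode.

```python
from statistics import mean


def mean_score_gaps(human, synthetic):
    """Return the absolute gap between human and synthetic mean scores per question."""
    gaps = {}
    for question in human:
        gaps[question] = abs(mean(human[question]) - mean(synthetic[question]))
    return gaps


def share_within_tolerance(gaps, tolerance=0.5):
    """Fraction of questions whose mean gap is at or below the tolerance."""
    close = [q for q, gap in gaps.items() if gap <= tolerance]
    return len(close) / len(gaps)


# Hypothetical 1-5 scale responses, invented for this example only.
human = {
    "purchase_intent": [5, 4, 2, 5, 1],
    "uniqueness": [4, 4, 3, 5, 4],
}
# The synthetic answers cluster near the middle, mirroring the
# "more neutral" tendency noted above.
synthetic = {
    "purchase_intent": [3, 4, 3, 4, 3],
    "uniqueness": [3, 3, 3, 3, 3],
}

gaps = mean_score_gaps(human, synthetic)
agreement = share_within_tolerance(gaps, tolerance=0.5)
print(gaps)
print(f"Questions within tolerance: {agreement:.0%}")
```

In this toy data the purchase-intent means match even though the human answers are far more spread out, which is one reason a single mean-score check would be paired with several other measures.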

Addressing research teams' hesitancy around synthetic data usage

Publish docs on model validation; lead with empathy
Brands are building their own in-house models

Integration of AI into research workflows: highest impact areas

Great for stat-testing, filtering through open-ended answers
Lots of new tools, but key is to "blend and balance," and enhance human-generated data
"Qualitative bundling" - pairing human-generated interview responses with synthetic responses

Consumers stand to benefit too

Increasingly personalized surveys (huge - decent YC companies are taking approaches like this)

#601 "Improving the customer experience in a feedback recession with Isabelle Zdatny"

Core drivers of feedback recession

Consumers are staying silent about experiences (whether good or bad)
They also have higher expectations for brands
BUT - only 1/3 send feedback to companies (8% YoY drop)

Perfect storm: lack of feedback, poorly designed surveys

Companies still use surveys, but their surveys are still incomplete
- Insufficient data, limited scope, data lags

Potential causes of the feedback recession

General "why bother" feeling: companies don't acknowledge (or don't meaningfully acknowledge) when consumers give them information
Super easy and relatively inexpensive to switch to competitors
People are just tired of surveys (guilty as charged). Genuine fatigue

Identifying feedback mechanisms beyond just surveys

Collecting and organizing transactional data points
Anything from "unstructured data": text (reviews, social media, videos, audio)
80-90% of CX data comes in these formats
- Can expand customer listening portfolios and ID/collect more data beyond that
  - Behavioral, operational, unsolicited, etc
- Need to filter for the right data, ensure software is synced

Combining the right data sources to nail core insights

Look at adjacent datasets and find other departments who are willing to partner + share data
Grow naturally: build a business case and expand into other teams and data sets
Map out your org: determine which types of data you need from which depts
Do NOT blast surveys 24/7 - only send surveys during those "moments that matter most"

#653 "Accelerating speed to insights using synthetic feedback with Ali Henriques"

Different types of synthetic data

Can also be described as "AI-modeled" responses
Wrapper models - good, but only referencing publicly available info
RAG method - take 200 human responses and make it 400
Custom "foundational model" (the good stuff)
- Need access to a robust data source; needs to be combined with research data
- Receives daily training based on real-time data to stay fresh/relevant

Instant Insights

Industry-specific marketing intelligence platform based
Combines survey research ("syndicated research")
Ex: Restaurants
- Can ask it questions like "Where did you dine last?" or "Where else would you consider?"
- Enhanced w/ 5-6+ data sources (search trends), promos, things influencing behavior
- Pulling in live data + transaction data + behavioral data (in-person location tracing), web+digital
- Much better than just foot traffic/transaction data

Interesting booking.com example

Study asked in January: "What did you do in December?" (assuming visiting family and friends)
AI came back and pointed out that most people probably went to the beach (beach trips = most popular type of travel)
Just really interesting to see our human-based nuances (recency bias!) when writing and answering questions and how AI views them

The role of the researcher

Still important! Even our unique way of asking questions and our way of presenting those questions reveals important insights
Synthetic responses are great for attitudinal, psychographic-related work
It's very important for researchers to understand the nature of the questions they're asking
Researchers need to help guide companies to use the right blend of synthetic/human data and make stakeholders feel comfortable

Future of synthetic data

Ideally access to rapid testing based on advanced data is widespread across an org, and not just siloed within the role of the researcher

Core Takeaway

At this point, my decision to base my project around synthetic data was fairly solidified, but hearing granular, detailed insights from boots-on-the-ground researchers like Ali and Isabelle only helped to finalize my decision.

I think it's fair to say that the current feedback recession and rise of survey fatigue will most likely continue. But knowing that we'll have access to a growing body of unstructured data helps soften that blow, especially since the models used to process and simulate that data will only get smarter over time.

This understanding directly informed the creation of my simulation.

Take a deep dive into my attempt to generate my own "synthetic data" in a quest to answer some burning questions about one of my favorite shoe brands (right under "Observations & Hypotheses").

