
Core Prep: Exploring Synthetic Data's Role in XM


After identifying just how crucial synthetic data is becoming for research teams across the world, I wanted deeper insights from veteran Qualtrics employees, and I found them in in-depth conversations with leaders Ali Henriques and Isabelle Zdatny.

​

These episodes solidified my decision to focus my technical project on attempting to model synthetic data generation and exploring the insights it provided.


🌮  I'm serious - I actually deliver.  🌮

​

​

You can get tacos delivered to the Qualtrics Provo office.

​

On me. Freshly picked from Taco Bell.

​

But not just tacos. TACOS tacos.

​

The kind whose main ingredients are Transparency, being All-in, Customer-Obsession, being One-Team, and Scrappiness.

​

Hopefully you can tell I'm all in.

​

And I'll continue to deliver, day after day.

​

Just with less cheese and hot sauce, and more 11/10 effort across every project.


#616
"Exploring synthetic personas in market research with Ali Henriques"

After listening to these two episodes and reviewing my notes from the 2025 trend reports, it's very clear that even as we see incredible benefits from synthetic data, it would not be a mistake to say that it's still in its infancy. Demand is high, use cases abound, and we can look forward to continued algorithmic progress in AI that will only increase synthetic data's ability to quickly and accurately capture core insights. Research that is good, fast, and cheap all at once is more or less a practical reality thanks to this data.

 

It's obvious that it provides incredible value (in the right use cases), and I have no doubt that cutting-edge research teams and forward-thinking companies will increasingly view investments in this area as absolutely vital to their performance. I am fascinated by this wild ability to close gaps in research.

Big picture: synthetic data takes human-based data and combines it with publicly available data to fill crucial gaps in research efforts

  • Current Edge team is split between a legacy market research agency and new approaches with synthetic data

​

It's basically like a "digital twin"

  • You're taking human-based data (any info available on hand or on the internet) and using that to model a response to a question

  • Is a combination of record-level and real-level data 

  • Ex: can be extremely granular depending on the availability of data: if I have a demographic of 18-24 year olds, what will they think of this new salad concept?

​

Rise of synthetic data usage

  • Balance of good, fast, and cheap - get more output from your input

  • Some audiences are hard to reach or get feedback from (need embedded, in-field team to get survey responses from people)

  • Can reduce risk when testing new concepts (private, contained environment)

  • Very helpful in heavily-regulated areas where privacy is a concern

​

Overall accuracy: ~80%!

  • LLMs are more neutral, but are effective when comparing mean scores/basic validity measures (i.e., whether a survey captures what it's designed to measure)

  • They'll cycle between 5-6 measures to determine if a model is good enough to use
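
To make the "more neutral, but good on mean scores" point concrete for myself, here is a minimal sketch of one such check. Everything in it is my own assumption for illustration: the function name `compare_mean_scores`, the `tolerance` threshold, and the made-up 1-5 ratings. It is not the validation method described in the episode, just one simple way a mean-score comparison could look.

```python
from statistics import mean, stdev


def compare_mean_scores(human, synthetic, tolerance=0.5):
    """Compare two samples of ratings collected on the same scale.

    `tolerance` is a hypothetical cutoff for how far apart the two
    means may be before the synthetic sample is flagged.
    """
    human_mean = mean(human)
    synthetic_mean = mean(synthetic)
    gap = abs(human_mean - synthetic_mean)
    return {
        "human_mean": human_mean,
        "synthetic_mean": synthetic_mean,
        "gap": gap,
        # Spread shows how "neutral" (clustered near the middle) a sample is
        "human_spread": stdev(human),
        "synthetic_spread": stdev(synthetic),
        "within_tolerance": gap <= tolerance,
    }


# Made-up 1-5 ratings, for illustration only (not real survey data)
human_scores = [1, 2, 4, 5, 5, 3, 1, 5]
synthetic_scores = [3, 3, 4, 3, 4, 3, 3, 3]

result = compare_mean_scores(human_scores, synthetic_scores)
print(result)
```

In this toy example the two means match, while the synthetic ratings cluster much closer to the middle of the scale, which is the pattern the "more neutral" note describes.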

​

Addressing research teams' hesitancy around synthetic data usage

  • Publish docs on model validation; lead with empathy

  • Brands are building their own in-house models

​

Integration of AI into research workflows: highest impact areas

  • Great for stat-testing, filtering through open-ended answers

  • Lots of new tools, but key is to "blend and balance," and enhance human-generated data

  • "Qualitative bundling" - pairing human-generated interview responses with synthetic responses

​

Consumers stand to benefit too

  • Increasingly personalized surveys (huge - decent YC companies doing approaches like this)

#601
"Improving the customer experience in a feedback recession with Isabelle Zdatny"

Core drivers of feedback recession

  • Consumers are staying silent about experiences (whether good or bad)

  • They also have higher expectations for brands

  • BUT - only 1/3 send feedback to companies (8% YoY drop)​

​

Perfect storm: lack of feedback, poorly designed surveys

  • Companies still use surveys, but their surveys are still incomplete

    • Insufficient data, limited scope, data lags​

​

Potential causes of the feedback recession

  • General "why bother" feeling: companies don't acknowledge (or don't meaningfully acknowledge) the information consumers give them

  • Super easy and relatively inexpensive to switch to competitors

  • People are just tired of surveys (guilty as charged). Genuine fatigue​

​

Identifying feedback mechanisms beyond just surveys

  • Collecting and organizing transactional data points

  • Anything from "unstructured data": text (reviews, social media, videos, audio)

  • 80-90% of CX data comes in these formats

    • Can expand customer listening portfolios and ID/collect more data beyond that​​

      • Behavioral, operational, unsolicited, etc​

    • Need to filter for the right data, ensure software is synced

​

Combining the right data sources to nail core insights

  • Look at adjacent datasets and find other departments who are willing to partner + share data

  • Grow naturally: build a business case and expand into other teams and data sets

  • Map out your org: determine which types of data you need from which depts

  • Do NOT blast surveys 24/7 - only send surveys during those "moments that matter most"

​

#653
"Accelerating speed to insights using synthetic feedback with Ali Henriques"

Different types of synthetic data

  • Can also be described as "AI-modeled" responses

  • Wrapper models - good, but only referencing publicly available info

  • RAG method - take 200 human responses and make it 400

  • Custom "foundational model" (the good stuff)

    • Need access to a robust data source; needs to be combined with research data

    • Receives daily training based on real-time data to stay fresh/relevant

​

Instant Insights

  • Industry-specific marketing intelligence platform based

  • Combines survey research ("syndicated research")​
  • Ex: Restaurants

    • Can ask it questions like "Where did you dine last?" "Where else would you consider?"

    • Enhanced w/ 5-6+ data sources (search trends), promos, things influencing behavior

    • Pulling in live data + transaction data + behavioral data (in-person location tracing), web+digital

    • Much better than just foot traffic/transaction data​​​

​

Interesting booking.com example

  • Study asked in January: "What did you do in December?" (assuming the answer would be visiting family and friends)

  • AI came back and pointed out that most people probably went to the beach (beach trips = most popular type of travel)

  • Just really interesting to see our human-based nuances (recency bias!) when writing and answering questions and how AI views them

​

The role of the researcher

  • Still important! Even our unique way of asking questions and our way of presenting those questions reveals important insights

  • Synthetic responses are great for attitudinal, psychographic-related work

  • It's very important for researchers to understand the nature of the questions they're asking

  • Researchers need to help guide companies to use the right blend of synthetic/human data and make stakeholders feel comfortable

​

Future of synthetic data

  • Ideally access to rapid testing based on advanced data is widespread across an org, and not just siloed within the role of the researcher

Core Takeaway

At this point, my decision to base my project around synthetic data was fairly solidified, but hearing granular, detailed insights from boots-on-the-ground researchers like Ali and Isabelle only helped to finalize my decision.

​

I think it's fair to say that the current feedback recession and rise of survey fatigue will most likely continue - but knowing that we'll have access to a growing body of unstructured data helps soften that blow, especially given that the models used to process and simulate that data will only get smarter over time.

​

This understanding directly informed the creation of my simulation.

​

Take a deep dive into my attempt to generate my own "synthetic data" in a quest to answer some burning questions about one of my favorite shoe brands (right under "Observations & Hypotheses").

​
A complete objection-handling document

used to rapidly resolve client concerns.
