Maier et al. 2025 — SSR for Purchase Intent

Citation

Maier, B.F., Aslak, U., Fiaschi, L., et al. (2025). LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings. arXiv:2510.08338v3.

Core Contribution

Introduces Semantic Similarity Rating (SSR): a method that elicits textual responses from LLMs, then maps them to Likert distributions using embedding similarity to reference anchor statements. Achieves 90% of human test-retest reliability while maintaining realistic response distributions.

Key Framing

The problem isn’t whether LLMs can simulate human survey responses; it’s how we ask them. Direct Likert elicitation produces narrow, regression-to-mean distributions. Textual elicitation + embedding-based mapping produces human-like variance.

This reframes apparent LLM limitations as elicitation method artifacts.

Method Summary

Prompt LLM with demographic persona + product concept
Ask purchase intent question, elicit free-text response
Embed response using text-embedding model
Compare to five reference anchor statements (one per Likert point)
Generate probability distribution over Likert scale based on cosine similarity
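
The last three steps can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy 3-d vectors stand in for real embeddings, the function names are hypothetical, and the normalization (subtract the lowest similarity, then rescale to sum to 1) is one simple choice that may differ from the paper's exact formula.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_to_pmf(response_vec, anchor_vecs, eps=1e-9):
    """Map similarities to the anchors onto a distribution over Likert points.

    Assumption: shift by the minimum similarity so the least similar anchor
    gets (almost) zero mass, then normalize. eps avoids division by zero
    when all similarities are equal.
    """
    sims = [cosine(response_vec, a) for a in anchor_vecs]
    floor = min(sims)
    weights = [s - floor + eps for s in sims]
    total = sum(weights)
    return [w / total for w in weights]

# Toy vectors standing in for embeddings of the five anchor statements
# (Likert 1..5) and of one free-text response. Values are made up.
anchors = [
    [1.0, 0.0, 0.0],
    [0.8, 0.2, 0.0],
    [0.5, 0.5, 0.0],
    [0.2, 0.8, 0.0],
    [0.0, 1.0, 0.0],
]
response = [0.1, 0.9, 0.0]

pmf = similarity_to_pmf(response, anchors)
expected_rating = sum((i + 1) * p for i, p in enumerate(pmf))
```

Keeping the full distribution rather than a single argmax rating is what lets per-respondent outputs be aggregated into a survey-level response distribution.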

Key Findings

Direct Likert Rating (DLR): High correlation attainment (~80%) but poor distributional fit (KS similarity ~0.26-0.39). Models cluster around “3”.
Follow-up Likert Rating (FLR): Better but still narrow distributions
SSR: 90% correlation attainment + KS similarity >0.85
Demographic conditioning essential: Without personas, correlation attainment drops to ~50%
Synthetic consumers show less positivity bias than humans, providing wider discriminative range
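
To make the KS similarity figures concrete, here is a small sketch. It assumes KS similarity is defined as 1 minus the Kolmogorov-Smirnov distance (the largest gap between the two cumulative distributions); both example distributions are made up for illustration and are not data from the paper.

```python
from itertools import accumulate

def ks_similarity(pmf_a, pmf_b):
    """1 minus the max absolute difference between the two CDFs.

    Assumed definition; returns 1.0 for identical distributions.
    """
    cdf_a = list(accumulate(pmf_a))
    cdf_b = list(accumulate(pmf_b))
    return 1.0 - max(abs(a - b) for a, b in zip(cdf_a, cdf_b))

# Hypothetical distributions over Likert points 1..5.
human = [0.05, 0.10, 0.25, 0.35, 0.25]   # spread out, skewed positive
narrow = [0.00, 0.05, 0.85, 0.10, 0.00]  # clustered around "3"

score = ks_similarity(human, narrow)
```

A distribution that piles most of its mass on the midpoint scores poorly against a spread-out human distribution even if its mean is close, which is why the notes track this metric alongside correlation.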

Data

57 consumer surveys on personal care products (Colgate-Palmolive), 9,300 human respondents, 150-400 participants per survey.

Limitations Acknowledged

Reference statement sets require manual design
Some demographic patterns (gender, region, ethnicity) not consistently replicated
Bounded by LLM training data coverage of domain

Transferable Insights

Elicitation design shapes output quality: applies to any structured output task
Two-stage elicitation (generate text → map to structure) preserves richness while enabling quantification
Embedding space as semantic bridge between natural language and structured formats
Correlation attainment as validation metric for synthetic data quality
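
As a rough sketch of the correlation attainment idea: correlate synthetic and human mean purchase intent across surveys, then divide by the human test-retest correlation, which acts as a ceiling. The function names and all numbers below are hypothetical, and the paper's exact estimation procedure for the ceiling may differ.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_attainment(synthetic_means, human_means, retest_ceiling):
    """Fraction of the achievable (test-retest) correlation that is reached."""
    return pearson(synthetic_means, human_means) / retest_ceiling

# Made-up per-survey mean purchase intent for four product concepts.
human_means = [3.9, 4.1, 3.6, 4.3]
synthetic_means = [3.2, 3.6, 3.0, 3.7]
retest_ceiling = 0.9  # hypothetical human test-retest correlation

attainment = correlation_attainment(synthetic_means, human_means, retest_ceiling)
```

Normalizing by the ceiling matters because two human samples do not correlate perfectly with each other either, so a raw correlation understates how close the synthetic panel is to the best achievable agreement.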

Extracted Content

Atoms:

Molecules:
