By chance I read two articles around the same time that join up nicely.

First, an academic paper: The potential existential threat of large language models to online survey research, by Sean Westwood.1

Westwood demonstrates that autonomous AI agents can complete surveys with a 99.8% pass rate on attention checks, logic puzzles, and other standard bot detection methods. His bot maintains a consistent persona, remembers its prior answers, and calibrates its vocabulary and spelling errors to match its assigned education level. It strategically refuses superhuman tasks to avoid revealing itself.2 And it takes only a small number of synthetic respondents to flip a poll’s predicted winner.
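
That last point is easy to see with some toy numbers (mine, not Westwood’s): in a close race, a couple of dozen synthetic respondents all answering the same way is enough to change the headline result.

```python
# Toy numbers, not Westwood's: how many bots does it take to flip
# the predicted winner of a close poll?

def bots_needed_to_flip(leader_votes: int, trailer_votes: int) -> int:
    """Synthetic respondents (all answering for the trailing candidate)
    needed to put that candidate ahead."""
    return max(0, leader_votes - trailer_votes + 1)

# A 1,000-person poll showing a 51/49 race.
leader, trailer = 510, 490
bots = bots_needed_to_flip(leader, trailer)

print(bots)                           # 21 bots flip the result
print(f"{bots / (1000 + bots):.1%}")  # ~2.1% of the final sample
```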

There are two possible motives for using a tool like this in the wild:

  1. Financial. Survey platforms like Prolific, Mechanical Turk and others pay human respondents to complete surveys, typically $1-2 for a 10-minute survey. Westwood’s synthetic respondent can complete the same survey for about $0.05 in API fees, a 96.8% profit margin for anyone willing to run bots at scale (a quick sketch of the arithmetic follows this list).

  2. Political. We live in an adversarial information environment. State actors have both the motive and the scale to deliberately corrupt these systems; we already know, for example, that Russia engages in information warfare.
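
The margin arithmetic, using only the payout range and API cost quoted above (the 96.8% figure itself is Westwood’s):

```python
# Margin arithmetic using the figures quoted above: $1-2 payout for a
# 10-minute survey vs roughly $0.05 in API fees per bot-completed survey.
API_COST = 0.05

for payout in (1.00, 1.50, 2.00):
    margin = (payout - API_COST) / payout
    print(f"${payout:.2f} payout -> {margin:.1%} margin")

# $1.00 payout -> 95.0% margin
# $1.50 payout -> 96.7% margin
# $2.00 payout -> 97.5% margin
```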

Westwood’s recommended mitigations include going back to face-to-face interviews and using longitudinal panels that build trust over time. The problem is that these mitigations add significant cost and reduce speed.

Then I read an article from The Barnes Center for Social Change at Northwestern University in the States on how governments and companies are deploying “digital twins”3: AI models that simulate public opinion for policy decisions (as liked by the Tony Blair Institute, which is influential with the current UK Labour government).

But without reliable base data, the synthetic panels are going to lead us to bad conclusions.

So of course we won’t use them without mitigations, right? lol

The sociologist Diane Vaughan developed the idea of the normalisation of deviance: the gradual acceptance of risky practices because nothing bad has happened yet. She describes the process by which a clearly unsafe practice comes to be seen as normal if it does not immediately cause a catastrophe: “a long incubation period before a final disaster with early warning signs that were either misinterpreted, ignored or missed completely”.

This brings us back to those digital twins. If governments and companies are using AI to simulate public opinion—and they will—then those simulations are only as good as the data they’re trained on. Which may come from surveys that can’t distinguish humans from bots.

Vaughan’s process is exactly what you’d expect to play out here. Each accommodation will seem fine at the time: faster panels, cheaper samples, looser validation, and data that still looks reasonable. In an environment of falling budgets and a push for more speed, who is going to say no?

The risk is creating a feedback loop wherein synthetic respondents corrupt survey data, which trains synthetic panels, which inform policy decisions about real people. And bad data pushes out good because it’s easier. The Barnes Center piece calls this “drift”: AI systems become default decision-makers not because they’re legitimate, but because they’re convenient.

As that piece puts it: “Synthetic publics will not fail loudly. They will fail confidently and persuasively”.


  1. I came across this via Tom Stafford’s substack post. Tom is an actual academic in this area, so you’ll want to read his post. ↩︎

  2. E.g. when asked to solve calculus problems or write FORTRAN code, it declines 88-100% of the time, correctly inferring that attempting such tasks would blow its cover. ↩︎

  3. Via Rachel Coldicutt on Bluesky. If you find this interesting, you should follow her. ↩︎