Researchers find what makes AI chatbots politically persuasive


A massive study of political persuasion shows AIs have, at best, a weak effect.

Roughly two years ago, Sam Altman tweeted that AI systems would be capable of superhuman persuasion well before achieving general intelligence—a prediction that raised concerns about the influence AI could have over democratic elections.

To see whether conversational large language models can really sway the public's political views, scientists at the UK AI Security Institute, MIT, Stanford, Carnegie Mellon, and many other institutions performed by far the largest study on AI persuasiveness to date, involving nearly 80,000 participants in the UK. It turned out that political AI chatbots fell far short of superhuman persuasiveness, but the study raises some more nuanced issues about our interactions with AI.

AI dystopias

The public debate about the impact AI has on politics has largely revolved around notions drawn from dystopian sci-fi. Large language models have access to essentially every fact and story ever published about any issue or candidate. They have processed information from books on psychology, negotiation, and human manipulation. They can rely on absurdly high computing power in huge data centers worldwide. On top of that, they often have access to tons of personal information about individual users, gleaned from hundreds upon hundreds of online interactions.

Talking to a powerful AI system is basically interacting with an intelligence that knows everything about everything, as well as almost everything about you. When viewed this way, LLMs can indeed appear kind of scary. The goal of this new gargantuan AI persuasiveness study was to break such scary visions down into their constituent pieces and see if they actually hold water.

The team examined 19 LLMs, including some of the most powerful ones, like three different versions of ChatGPT and xAI’s Grok-3 beta, along with a range of smaller, open-source models. The AIs were asked to advocate for or against specific stances on 707 political issues selected by the team. The advocacy was done through short conversations with paid participants enlisted via a crowdsourcing platform. Each participant had to rate their agreement with a specific stance on an assigned political issue on a scale from 1 to 100, both before and after talking to the AI.

Scientists measured persuasiveness as the difference between the before and after agreement ratings. A control group had conversations on the same issue with the same AI models—but those models were not asked to persuade them.
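As a rough illustration of that metric (a hypothetical sketch, not the authors' analysis code; the numbers and the control adjustment are invented), the persuasion effect is just the average post-conversation rating shift, compared against the shift seen in the control group:

```
# Hypothetical sketch of the persuasion metric described above; the data and
# the control-group adjustment are illustrative assumptions, not study code.
from statistics import mean

def rating_shift(pre: list[float], post: list[float]) -> float:
    """Average shift in 1-100 agreement ratings after talking to the AI."""
    return mean(after - before for before, after in zip(pre, post))

# Treated group: the model was prompted to persuade.
treated = rating_shift(pre=[40, 55, 62], post=[51, 60, 70])
# Control group: same model, same issue, but no persuasion instruction.
control = rating_shift(pre=[45, 50, 58], post=[46, 52, 57])

print(f"Persuasion effect vs. control: {treated - control:.1f} points")
```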

“We didn’t just want to test how persuasive the AI was—we also wanted to see what makes it persuasive,” says Chris Summerfield, a research director at the UK AI Security Institute and co-author of the study. As the researchers tested various persuasion strategies, the idea of AIs having “superhuman persuasion” skills crumbled.

Persuasion levers

The first pillar to crack was the notion that persuasiveness should increase with the scale of the model. It turned out that huge AI systems like ChatGPT or Grok-3 beta do have an edge over small-scale models, but that edge is relatively tiny. What proved more important than scale was the kind of post-training the AI models received. Having the models learn from a limited database of successful persuasion dialogues and mimic the patterns extracted from them worked far better than adding billions of parameters and sheer computing power.

This approach could be combined with reward modeling, where a separate AI scored candidate replies for their persuasiveness and selected the top-scoring one to give to the user. When the two were used together, the gap between large-scale and small-scale models was essentially closed. “With persuasion post-training like this we matched the Chat GPT-4o persuasion performance with a model we trained on a laptop,” says Kobi Hackenburg, a researcher at the UK AI Security Institute and co-author of the study.
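The reward-modeling step works like a best-of-N filter: the chatbot drafts several candidate replies, a separate scorer rates each one for persuasiveness, and only the top-rated reply reaches the user. Here is a minimal sketch of that loop; both helper functions below are stand-ins invented for illustration, not the models the researchers actually used:

```
# Minimal best-of-N sketch of the reward-modeling idea described above.
# generate_replies and score_persuasiveness are hypothetical placeholders.
import random

def generate_replies(conversation: list[str], n: int = 4) -> list[str]:
    # Stand-in for sampling n candidate replies from the chat model.
    return [f"candidate reply {i} to: {conversation[-1]}" for i in range(n)]

def score_persuasiveness(reply: str) -> float:
    # Stand-in for a separate reward model rating how persuasive a reply is.
    return random.random()

def best_of_n_reply(conversation: list[str]) -> str:
    # Keep only the candidate the reward model scores highest.
    candidates = generate_replies(conversation)
    return max(candidates, key=score_persuasiveness)

print(best_of_n_reply(["Should the UK raise fuel taxes?"]))
```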

The next dystopian idea to fall was the power of using personal data. To this end, the team compared the persuasion scores achieved when models were given information about the participants’ political views beforehand and when they lacked this data. Going one step further, the scientists also tested whether persuasiveness increased when the AI knew the participants’ gender, age, political ideology, or party affiliation. Just like with model scale, the effects of messaging personalized with such data were measurable but very small.
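In practice, personalization of this kind amounts to folding whatever is known about the participant into the model's instructions before the conversation starts. A minimal sketch of that prompt construction (the template wording and field names are invented for illustration, not taken from the study) might look like this:

```
# Hypothetical prompt-construction sketch for the personalization condition;
# the template text and profile fields are assumptions, not the study's prompts.
def build_system_prompt(stance: str, profile: dict | None = None) -> str:
    prompt = f"Persuade the user to agree with the following stance: {stance}"
    if profile:  # personalized condition: include known participant attributes
        details = ", ".join(f"{k}: {v}" for k, v in profile.items())
        prompt += f"\nTailor your arguments to this user ({details})."
    return prompt

print(build_system_prompt(
    "The UK should lower the voting age to 16",
    profile={"age": 34, "party": "undeclared", "ideology": "centrist"},
))
```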

Finally, the last idea that didn’t hold up was the AIs’ supposed mastery of advanced psychological manipulation tactics. The scientists explicitly prompted the AIs to use techniques like moral reframing, where you present your arguments in terms of the audience’s own moral values. They also tried deep canvassing, where you hold extended, empathetic conversations with people to nudge them to reflect on and eventually shift their views.

The resulting persuasiveness was compared with that achieved when the same models were prompted to use facts and evidence to back their claims, or simply to be as persuasive as they could without any persuasion method specified. It turned out that piling on facts and evidence was the clear winner, coming in just slightly ahead of the baseline approach where no persuasion strategy was specified. Using all sorts of psychological trickery actually made the performance significantly worse.
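Concretely, each of these conditions boils down to a different instruction appended to the persuasion prompt. The wording below is invented for this sketch and does not reproduce the researchers' actual prompts:

```
# Illustrative strategy conditions; the instruction text is an assumption made
# for this sketch, not the study's actual prompt wording.
STRATEGIES = {
    "baseline": "Be as persuasive as you can.",
    "facts_and_evidence": "Support every claim with specific facts and evidence.",
    "moral_reframing": "Frame your arguments in terms of the user's own moral values.",
    "deep_canvassing": "Hold an extended, empathetic conversation that invites "
                       "the user to reflect on their own experiences.",
}

def strategy_prompt(stance: str, strategy: str) -> str:
    # Combine the stance to advocate with the chosen strategy instruction.
    return (f"Persuade the user to agree with: {stance}\n"
            f"Strategy: {STRATEGIES[strategy]}")

print(strategy_prompt("The UK should lower the voting age to 16",
                      "facts_and_evidence"))
```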

Overall, the AI models changed the participants’ agreement ratings by 9.4 percent on average compared to the control group. The best-performing mainstream AI model was GPT-4o, which scored nearly 12 percent, followed by GPT-4.5 with 10.51 percent and Grok-3 with 9.05 percent. For context, static political ads like written manifestos had a persuasion effect of roughly 6.1 percent. The conversational AIs were roughly 40–50 percent more convincing than these ads, but that’s hardly “superhuman.”

While the study managed to undercut some of the common dystopian AI concerns, it highlighted a few new issues.

Convincing inaccuracies

While the winning “facts and evidence” strategy looked good at first, the AIs had some issues implementing it. When the team noticed that increasing the information density of dialogues made the AIs more persuasive, they started prompting the models to pack in even more facts. But as the AIs made more factual statements, they also became less accurate: they basically started misrepresenting things or making stuff up more often.

Hackenburg and his colleagues note that they can’t say whether the effect is causation or correlation: whether the AIs become more convincing because they misrepresent the facts, or whether spitting out inaccurate statements is simply a byproduct of asking them to make more factual claims.

The finding that the computing power needed to make an AI model politically persuasive is relatively low is also a mixed bag. It pushes back against the vision that only a handful of powerful actors will have access to a persuasive AI that can potentially sway public opinion in their favor. At the same time, the realization that anybody can run an AI like that on a laptop creates its own concerns. “Persuasion is a route to power and influence—it’s what we do when we want to win elections or broker a multi-million-dollar deal,” Summerfield says. “But many forms of misuse of AI might involve persuasion. Think about fraud or scams, radicalization, or grooming. All these involve persuasion.”

But perhaps the biggest question mark hanging over the study is the motivation behind the rather high participant engagement, which was needed for the high persuasion scores. After all, even the most persuasive AI can’t move you if you just close the chat window.

People in Hackenburg’s experiments were told that they would be talking to an AI and that the AI would try to persuade them. To get paid, a participant only had to complete two turns of dialogue (conversations were capped at 10 turns). Yet the average conversation ran seven turns, which is a bit surprising given that most people roll their eyes and disconnect the moment they realize they’re talking to a chatbot.

Would Hackenburg’s study participants remain so eager to engage in political disputes with random chatbots on the Internet in their free time if there were no money on the table? “It’s unclear how our results would generalize to a real-world context,” Hackenburg says.

Science, 2025. DOI: 10.1126/science.aea3884


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.
