large language models


AI chatbots might be better at swaying conspiracy theorists than humans

Out of the rabbit hole —

Co-author Gordon Pennycook: “The work overturns a lot of how we thought about conspiracies.”

A woman wearing a sweatshirt for the QAnon conspiracy theory on October 11, 2020 in Ronkonkoma, New York.


Stephanie Keith | Getty Images

Belief in conspiracy theories is rampant, particularly in the US, where some estimates suggest as much as 50 percent of the population believes in at least one outlandish claim. And those beliefs are notoriously difficult to debunk. Challenge a committed conspiracy theorist with facts and evidence, and they’ll usually just double down—a phenomenon psychologists typically attribute to motivated reasoning, i.e., a biased way of processing information.

A new paper published in the journal Science is challenging that conventional wisdom, however. Experiments in which an AI chatbot engaged in conversations with people who believed at least one conspiracy theory showed that the interaction significantly reduced the strength of those beliefs, even two months later. The secret to its success: the chatbot, with its access to vast amounts of information across an enormous range of topics, could precisely tailor its counterarguments to each individual.

“These are some of the most fascinating results I’ve ever seen,” co-author Gordon Pennycook, a psychologist at Cornell University, said during a media briefing. “The work overturns a lot of how we thought about conspiracies, that they’re the result of various psychological motives and needs. [Participants] were remarkably responsive to evidence. There’s been a lot of ink spilled about being in a post-truth world. It’s really validating to know that evidence does matter. We can act in a more adaptive way using this new technology to get good evidence in front of people that is specifically relevant to what they think, so it’s a much more powerful approach.”

When confronted with facts that challenge a deeply entrenched belief, people will often seek to preserve it rather than update their priors (in Bayesian-speak) in light of the new evidence. So there has been a good deal of pessimism lately about ever reaching those who have plunged deep down the rabbit hole of conspiracy theories, which are notoriously persistent and “pose a serious threat to democratic societies,” per the authors. Pennycook and his fellow co-authors devised an alternative explanation for that stubborn persistence of belief.

Bespoke counter-arguments

The issue is that “conspiracy theories just vary a lot from person to person,” said co-author Thomas Costello, a psychologist at American University who is also affiliated with MIT. “They’re quite heterogeneous. People believe a wide range of them and the specific evidence that people use to support even a single conspiracy may differ from one person to another. So debunking attempts where you try to argue broadly against a conspiracy theory are not going to be effective because people have different versions of that conspiracy in their heads.”

By contrast, an AI chatbot would be able to tailor debunking efforts to those different versions of a conspiracy. So in theory a chatbot might prove more effective in swaying someone from their pet conspiracy theory.

To test their hypothesis, the team conducted a series of experiments with 2,190 participants who believed in one or more conspiracy theories. The participants engaged in several personal “conversations” with a large language model (GPT-4 Turbo) in which they shared their pet conspiracy theory and the evidence they felt supported that belief. The LLM would respond by offering factual and evidence-based counter-arguments tailored to the individual participant. GPT-4 Turbo’s responses were professionally fact-checked, which showed that 99.2 percent of the claims it made were true, with just 0.8 percent being labeled misleading, and zero as false. (You can try your hand at interacting with the debunking chatbot here.)

Screenshot of the chatbot opening page asking questions to prepare for a conversation


Thomas H. Costello

Participants first answered a series of open-ended questions about the conspiracy theories they strongly believed and the evidence they relied upon to support those beliefs. The AI then produced a single-sentence summary of each belief, for example, “9/11 was an inside job because X, Y, and Z.” Participants rated the accuracy of that statement in terms of their own beliefs and then filled out a questionnaire about other conspiracies, their attitudes toward trusted experts, AI, other people in society, and so forth.

Then it was time for the one-on-one dialogues with the chatbot, which the team programmed to be as persuasive as possible. The chatbot had also been fed the participants’ open-ended responses, which made it better able to tailor its counter-arguments to each individual. For example, if someone thought 9/11 was an inside job and cited as evidence the fact that jet fuel doesn’t burn hot enough to melt steel, the chatbot might counter with, say, the NIST report showing that steel loses its strength at much lower temperatures, enough to weaken the towers’ structures so that they collapsed. Someone who thought 9/11 was an inside job and cited demolitions as evidence would get a different response tailored to that evidence.
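The article doesn’t reproduce the study’s actual prompts, but the basic tailoring step is easy to picture: the participant’s own belief summary and supporting evidence are folded into the instructions the model sees before the conversation begins. Here is a minimal sketch using the OpenAI Python SDK; the system message, example claim, and settings are illustrative assumptions, not the researchers’ exact setup.

```python
# Illustrative sketch only: the study's real prompts and parameters aren't
# published in this article, so the system message below is an assumption.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# These two strings stand in for the participant's open-ended responses.
participant_summary = "9/11 was an inside job because jet fuel can't melt steel beams."
participant_evidence = (
    "Jet fuel burns far below the roughly 2,750 degrees F needed to melt "
    "structural steel, so the fires alone couldn't have brought the towers down."
)

# Fold the participant's own claim and evidence into the model's instructions,
# so the counter-arguments address that specific version of the conspiracy.
system_prompt = (
    "You are having a respectful, evidence-based conversation with someone who "
    f"believes the following conspiracy theory: {participant_summary}\n"
    f"Their stated supporting evidence is: {participant_evidence}\n"
    "Address their specific evidence directly with accurate, verifiable facts, "
    "and be as persuasive as possible without being dismissive."
)

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Why do you think I'm wrong about this?"},
    ],
)
print(response.choices[0].message.content)
```

The point of the design is that the model is asked to rebut the participant’s own evidence rather than recite a generic debunking script, which is what distinguishes this approach from broad, one-size-fits-all corrections.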

Participants then answered the same set of questions after their dialogues with the chatbot, which lasted about eight minutes on average. Costello et al. found that these targeted dialogues resulted in a 20 percent decrease in the participants’ misinformed beliefs—a reduction that persisted even two months later when participants were evaluated again.

As Bence Bago (Tilburg University) and Jean-Francois Bonnefon (CNRS, Toulouse, France) noted in an accompanying perspective, this is a substantial effect compared to the 1 to 6 percent drop in beliefs achieved by other interventions. They also deemed the persistence of the effect noteworthy, while cautioning that two months is “insufficient to completely eliminate misinformed conspiracy beliefs.”



Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism

As the word turns —

Will LLMs keep improving if we throw more compute at them? OpenAI dealmaker thinks so.

Kevin Scott, CTO and EVP of AI at Microsoft speaks onstage during Vox Media's 2023 Code Conference at The Ritz-Carlton, Laguna Niguel on September 27, 2023 in Dana Point, California.


During an interview with Sequoia Capital’s Training Data podcast published last Tuesday, Microsoft CTO Kevin Scott doubled down on his belief that so-called large language model (LLM) “scaling laws” will continue to drive AI progress, despite some skepticism in the field that progress has leveled out. Scott played a key role in forging a $13 billion technology-sharing deal between Microsoft and OpenAI.

“Despite what other people think, we’re not at diminishing marginal returns on scale-up,” Scott said. “And I try to help people understand there is an exponential here, and the unfortunate thing is you only get to sample it every couple of years because it just takes a while to build supercomputers and then train models on top of them.”

LLM scaling laws refer to patterns explored by OpenAI researchers in 2020 showing that the performance of language models tends to improve predictably as the models get larger (more parameters), are trained on more data, and have access to more computational power (compute). The laws suggest that simply scaling up model size and training data can lead to significant improvements in AI capabilities without necessarily requiring fundamental algorithmic breakthroughs.
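For reference, the 2020 paper (Kaplan et al., “Scaling Laws for Neural Language Models”) expressed these patterns as power laws: when the other factors aren’t the bottleneck, test loss falls smoothly as parameter count N, training tokens D, or compute-optimal budget C_min grow. Roughly, with the exponents being that paper’s reported fits rather than universal constants:

```latex
% Power-law form of the 2020 scaling laws (Kaplan et al.), where
% N = non-embedding parameters, D = training tokens,
% C_min = compute spent along the compute-efficient frontier.
% N_c, D_c, C_c are fitted constants; exponents are the paper's reported fits.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C^{\min}},
\quad \text{with } \alpha_N \approx 0.076,\ \alpha_D \approx 0.095,\ \alpha_C^{\min} \approx 0.050.
```

Small exponents mean each constant-factor improvement in loss requires a multiplicative increase in scale, which is why Scott frames progress as arriving in infrequent, supercomputer-sized jumps.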

Since then, other researchers have challenged the idea that these scaling laws will keep holding over time, but the concept remains a cornerstone of OpenAI’s AI development philosophy.

You can see Scott’s comments in the video below beginning around 46:05:

Microsoft CTO Kevin Scott on how far scaling laws will extend

Scott’s optimism contrasts with a narrative among some critics in the AI community that progress in LLMs has plateaued around GPT-4 class models. The perception has been fueled by largely informal observations—and some benchmark results—about recent models like Google’s Gemini 1.5 Pro, Anthropic’s Claude Opus, and even OpenAI’s GPT-4o, which some argue haven’t shown the dramatic leaps in capability seen in earlier generations, suggesting that LLM development may be approaching diminishing returns.

“We all know that GPT-3 was vastly better than GPT-2. And we all know that GPT-4 (released thirteen months ago) was vastly better than GPT-3,” wrote AI critic Gary Marcus in April. “But what has happened since?”

The perception of plateau

Scott’s stance suggests that tech giants like Microsoft still feel justified in investing heavily in larger AI models, betting on continued breakthroughs rather than hitting a capability plateau. Given Microsoft’s investment in OpenAI and strong marketing of its own Microsoft Copilot AI features, the company has a strong interest in maintaining the perception of continued progress, even if the tech stalls.

Frequent AI critic Ed Zitron recently wrote in a post on his blog that one defense of continued investment into generative AI is that “OpenAI has something we don’t know about. A big, sexy, secret technology that will eternally break the bones of every hater,” he wrote. “Yet, I have a counterpoint: no it doesn’t.”

Some perceptions of slowing progress in LLM capabilities and benchmarking may stem from how recently LLMs entered the public eye when, in fact, they had been in development for years. OpenAI continued to develop LLMs during a roughly three-year gap between the release of GPT-3 in 2020 and GPT-4 in 2023. Many people likely perceived a rapid jump in capability with GPT-4’s launch in 2023 because they had only recently become aware of GPT-3-class models through the launch of ChatGPT in late November 2022, which used GPT-3.5.

In the podcast interview, the Microsoft CTO pushed back against the idea that AI progress has stalled, but he acknowledged the challenge of infrequent data points in this field, as new models often take years to develop. Despite this, Scott expressed confidence that future iterations will show improvements, particularly in areas where current models struggle.

“The next sample is coming, and I can’t tell you when, and I can’t predict exactly how good it’s going to be, but it will almost certainly be better at the things that are brittle right now, where you’re like, oh my god, this is a little too expensive, or a little too fragile, for me to use,” Scott said in the interview. “All of that gets better. It’ll get cheaper, and things will become less fragile. And then more complicated things will become possible. That is the story of each generation of these models as we’ve scaled up.”
