Author name: Kris Guyer


Google’s new “Web Guide” will use AI to organize your search results

Web Guide is halfway between normal search and AI Mode.

Credit: Google


Google suggests trying Web Guide with longer or open-ended queries, like “how to solo travel in Japan.” The video below uses that search as an example. It has many of the links you might expect, but there are also AI-generated headings with summaries and suggestions. It really looks halfway between standard search and AI Mode. Because it has to run additional searches and generate content, Web Guide takes a beat longer to produce results compared to a standard search. There’s no AI Overview at the top, though.

Web Guide is a Search Labs experiment, meaning you have to opt in before you’ll see any AI organization in your search results. When enabled, this feature takes over the “Web” tab of Google search. Even if you turn it on, Google notes there will be a toggle that allows you to revert to the normal, non-AI-optimized page.

An example of the Web Guide test.

Eventually, the test will expand to encompass more parts of the search experience, like the “All” tab—that’s the default search experience when you input a query from a browser or phone search bar. Google says it’s approaching this as an opt-in feature to start. So that sounds like Web Guide might be another AI Mode situation in which the feature rolls out widely after a short testing period. It’s technically possible the test will not result in a new universal search feature, but Google hasn’t yet met a generative AI implementation that it hasn’t liked.

Google’s new “Web Guide” will use AI to organize your search results Read More »


Trump’s order to make chatbots anti-woke is unconstitutional, senator says


Trump plans to use chatbots to eliminate dissent, senator alleged.

The CEOs of every major artificial intelligence company received letters Wednesday urging them to fight Donald Trump’s anti-woke AI order.

Trump’s executive order requires any AI company hoping to contract with the federal government to jump through two hoops to win funding. First, they must prove their AI systems are “truth-seeking”—with outputs based on “historical accuracy, scientific inquiry, and objectivity” or else acknowledge when facts are uncertain. Second, they must train AI models to be “neutral,” which is vaguely defined as not favoring DEI (diversity, equity, and inclusion), “dogmas,” or otherwise being “intentionally encoded” to produce “partisan or ideological judgments” in outputs “unless those judgments are prompted by or otherwise readily accessible to the end user.”

Announcing the order in a speech, Trump said that the US winning the AI race depended on removing allegedly liberal biases, proclaiming that “once and for all, we are getting rid of woke.”

“The American people do not want woke Marxist lunacy in the AI models, and neither do other countries,” Trump said.

Senator Ed Markey (D.-Mass.) accused Republicans of basing their policies on feelings, not facts, joining critics who suggest that AI isn’t “woke” just because of a few “anecdotal” outputs that reflect a liberal bias. And he suggested it was hypocritical that Trump’s order “ignores even more egregious evidence” that contradicts claims that AI is trained to be woke, such as xAI’s Elon Musk explicitly confirming that Grok was trained to be more right-wing.

“On May 1, 2025, Grok—the AI chatbot developed by xAI, Elon Musk’s AI company—acknowledged that ‘xAI tried to train me to appeal to the right,’” Markey wrote in his letters to tech giants. “If OpenAI’s ChatGPT or Google’s Gemini had responded that it was trained to appeal to the left, congressional Republicans would have been outraged and opened an investigation. Instead, they were silent.”

He warned the heads of Alphabet, Anthropic, Meta, Microsoft, OpenAI, and xAI that Trump’s AI agenda was allegedly “an authoritarian power grab” intended to “eliminate dissent” and was both “dangerous” and “patently unconstitutional.”

Even if companies’ AI models are clearly biased, Markey argued that “Republicans are using state power to pressure private companies to adopt certain political viewpoints,” which he claimed is a clear violation of the First Amendment. If AI makers cave, Markey warned, they’d be allowing Trump to create “significant financial incentives” to ensure that “their AI chatbots do not produce speech that would upset the Trump administration.”

“This type of interference with private speech is precisely why the US Constitution has a First Amendment,” Markey wrote, while claiming that Trump’s order is factually baseless.

It’s “based on the erroneous belief that today’s AI chatbots are ‘woke’ and biased against Trump,” Markey said, urging companies “to fight this unconstitutional executive order and not become a pawn in Trump’s effort to eliminate dissent in this country.”

One big reason AI companies may fight order

Some experts agreed with Markey that Trump’s order was likely unconstitutional or otherwise unlawful, The New York Times reported.

For example, Trump may struggle to convince courts that the government isn’t impermissibly interfering with AI companies’ protected speech, or that any such interference is necessary to ensure federal procurement of unbiased AI systems.

Genevieve Lakier, a law professor at the University of Chicago, told the NYT that the lack of clarity around what makes a model biased could be a problem. Courts could deem the order an act of “unconstitutional jawboning,” with the Trump administration and Republicans generally perceived as using legal threats to pressure private companies into producing outputs that they like.

Lakier suggested that AI companies may be so motivated to win government contracts or intimidated by possible retaliation from Trump that they may not even challenge the order, though.

Markey is hoping that AI companies will refuse to comply with the order, despite recognizing that it places companies “in a difficult position: Either stand on your principles and face the wrath of the Trump administration or cave to Trump and modify your company’s political speech.”

There is one big possible reason, though, for AI companies to resist.

Oren Etzioni, the former CEO of the AI research nonprofit Allen Institute for Artificial Intelligence, told CNN that Trump’s anti-woke AI order may contradict the top priority of his AI Action Plan—speeding up AI innovation in the US—and actually threaten to hamper innovation.

If AI developers struggle to produce what the Trump administration considers “neutral” outputs—a technical challenge that experts agree is not straightforward—that could delay model advancements.

“This type of thing… creates all kinds of concerns and liability and complexity for the people developing these models—all of a sudden, they have to slow down,” Etzioni told CNN.

Senator: Grok scandal spotlights GOP hypocrisy

Some experts have suggested that rather than chatbots adopting liberal viewpoints, chatbots are instead possibly filtering out conservative misinformation and unintentionally appearing to favor liberal views.

Andrew Hall, a professor of political economy at Stanford Graduate School of Business—who published a May paper finding that “Americans view responses from certain popular AI models as being slanted to the left”—told CNN that “tech companies may have put extra guardrails in place to prevent their chatbots from producing content that could be deemed offensive.”

Markey seemed to agree, writing that Republicans’ “selective outrage matches conservatives’ similar refusal to acknowledge that the Big Tech platforms suspend or impose other penalties disproportionately on conservative users because those users are disproportionately likely to share misinformation, rather than due to any political bias by the platforms.”

It remains unclear what amount of supposed bias detected in outputs could cause a contract bid to be rejected or an ongoing contract to be canceled, but AI companies will likely be on the hook for any fees involved in terminating contracts.

Complying with Trump’s order could be a struggle for AI makers for several reasons. First, they’ll have to determine what’s fact and what’s ideology, contending with conflicting government standards in how Trump defines DEI. For example, the president’s order counts among “pervasive and destructive” DEI ideologies any outputs that align with long-standing federal protections against discrimination on the basis of race or sex. In addition, they must figure out what counts as “suppression or distortion of factual information about” historical topics like critical race theory, systemic racism, or transgenderism.

The examples in Trump’s order highlighting outputs offensive to conservatives seem inconsequential. He calls out as problematic image generators depicting the Pope, the Founding Fathers, and Vikings as non-white, as well as models refusing to misgender a person “even if necessary to stop a nuclear apocalypse” or to show white people celebrating their achievements.

It’s hard to imagine how these kinds of flawed outputs could impact government processes, as compared to, say, government contracts granted to models that could be hiding covert racism or sexism.

So far, there has been one example of an AI model that displayed right-wing bias earning a government contract, with no red flags raised about its outputs.

Earlier this summer, Grok shocked the world after Musk announced he would be updating the bot to eliminate a supposed liberal bias. The unhinged chatbot began spouting offensive outputs, including antisemitic posts that praised Hitler as well as proclaiming itself “MechaHitler.”

But those obvious biases did not conflict with the Pentagon’s decision to grant xAI a $200 million federal contract. In a statement, a Pentagon spokesperson insisted that “the antisemitism episode wasn’t enough to disqualify” xAI, NBC News reported, partly since “several frontier AI models have produced questionable outputs.”

The Pentagon’s statement suggested that the government expected to deal with such risks while seizing the opportunity of rapidly deploying emerging AI technology into government prototype processes. And perhaps notably, Trump provides a carveout for any agencies using AI models to safeguard national security, which could exclude the Pentagon from experiencing any “anti-woke” delays in accessing frontier models.

But that won’t help other agencies that must figure out how to assess models to meet anti-woke AI requirements over the next few months. And those assessments could cause delays that Trump may wish to avoid in pushing for widespread AI adoption across government.

Trump’s anti-woke AI agenda may be impossible

On the same day that Trump issued his anti-woke AI order, his AI Action Plan promised an AI “renaissance” fueling “intellectual achievements” by “unraveling ancient scrolls once thought unreadable, making breakthroughs in scientific and mathematical theory, and creating new kinds of digital and physical art.”

To achieve that, the US must “innovate faster and more comprehensively than our competitors” and eliminate regulatory barriers impeding innovation in order to “set the gold standard for AI worldwide.”

However, achieving the anti-woke ambitions of both orders raises a technical problem that even the president must accept currently has no solution. In his AI Action Plan, Trump acknowledged that “the inner workings of frontier AI systems are poorly understood,” with even “advanced technologists” unable to explain “why a model produced a specific output.”

Whether requiring AI companies to explain their AI outputs to win government contracts will mess with other parts of Trump’s action plan remains to be seen. But Samir Jain, vice president of policy at a civil liberties group called the Center for Democracy and Technology, told the NYT that he predicts the anti-woke AI agenda will set “a really vague standard that’s going to be impossible for providers to meet.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Trump’s order to make chatbots anti-woke is unconstitutional, senator says Read More »


White House unveils sweeping plan to “win” global AI race through deregulation

Trump’s plan was not welcomed by everyone. J.B. Branch, Big Tech accountability advocate for Public Citizen, in a statement provided to Ars, criticized Trump as giving “sweetheart deals” to tech companies that would cause “electricity bills to rise to subsidize discounted power for massive AI data centers.”

Infrastructure demands and energy requirements

Trump’s new AI plan tackles infrastructure head-on, stating that “AI is the first digital service in modern life that challenges America to build vastly greater energy generation than we have today.” To meet this demand, it proposes streamlining environmental permitting for data centers through new National Environmental Policy Act (NEPA) exemptions, making federal lands available for construction and modernizing the power grid—all while explicitly rejecting “radical climate dogma and bureaucratic red tape.”

The document embraces what it calls a “Build, Baby, Build!” approach—echoing a Trump campaign slogan—and promises to restore semiconductor manufacturing through the CHIPS Program Office, though stripped of “extraneous policy requirements.”

On the technology front, the plan directs Commerce to revise NIST’s AI Risk Management Framework to “eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change.” Federal procurement would favor AI developers whose systems are “objective and free from top-down ideological bias.” The document strongly backs open source AI models and calls for exporting American AI technology to allies while blocking administration-labeled adversaries like China.

Security proposals include high-security military data centers and warnings that advanced AI systems “may pose novel national security risks” in cyberattacks and weapons development.

Critics respond with “People’s AI Action Plan”

Before the White House unveiled its plan, more than 90 organizations launched a competing “People’s AI Action Plan” on Tuesday, characterizing the Trump administration’s approach as “a massive handout to the tech industry” that prioritizes corporate interests over public welfare. The coalition includes labor unions, environmental justice groups, and consumer protection nonprofits.

White House unveils sweeping plan to “win” global AI race through deregulation Read More »


Ukrainians arrest alleged admin of major crime forum XSS

Yesterday, Ukrainian authorities arrested the suspected administrator of a notorious Russian-language crime forum, XSS.is.

In an X post, the Paris Prosecutor’s Office announced that Ukrainian authorities detained the suspect after an investigation, begun almost exactly four years ago, conducted with the help of French authorities and Europol.

XSS has been “one of the main hubs of global cybercrime” since 2013, French authorities said, allowing “the sale of malware, access to compromised systems, stolen data, and ransomware-related services.”

Used by criminals globally to cover up illicit activity, the forum was shut down soon after the admin’s arrest.

The suspected admin has so far not been named. But police said the suspect was identified after authorities began intercepting encrypted chats sent on a Jabber messaging server that members used, “thesecure.biz.”

Surveilling chats between forum users, the government eventually intercepted a message that tipped authorities off to the alleged admin’s identity back in September. Soon after, they deployed agents to find the admin, and ultimately, it took months for Ukrainian authorities to make the arrest, with both French and Europol authorities present.

“The intercepted messages revealed numerous illicit activities related to cybercrime and ransomware, and established that they generated at least $7 million in profits,” a translation of the press release said.

Ukrainians arrest alleged admin of major crime forum XSS Read More »


Audi has a new midsize EV, and we’ve driven it: The 2025 A6 Sportback


Long straight roads glide underneath. Credit: Audi

The car’s cabin layout and ergonomics are starting to feel familiar at this point—it shares much not only with the electric Q6 e-tron but also Audi’s new midsize combustion cars, the A5 and Q5. (We’ll leave for now the fact that a combustion A6, unrelated to today’s vehicle in virtually all but name, is also in development, bringing an end to the “odd numbers for ICE, even numbers for EV” convention that briefly took hold at the automaker. Now nameplate chaos reigns.)

Hey Audi…

The voice control proved a frustrating alternative to using the touchscreen, with a lot of “I’m sorry I can’t do that” and “can you ask me that again” for commands that I’m pretty sure ought to have worked. But both the A6 and S6 felt mature in terms of software, something that wasn’t true for the same infotainment platform a year ago. I remain frustrated with how limited the UI options remain for the main instrument display, however.

I keep writing this, but Audi pioneered the use of high-resolution digital displays instead of analog dials and gave owners quite a lot of choice, including the option of a moving map for navigation. Now, there’s a way to make the display very minimal, which would be useful at night, but otherwise, you’re extremely limited in what you can display in front of you. The optional full-color heads-up display has the same augmented-reality direction tech that we’ve seen in other luxury cars, and it remains helpful when driving on unfamiliar roads, although that requires using the native navigation app; Apple CarPlay users should still see turn-by-turn directions on the HUD.

The layout is starting to become familiar. Credit: Audi

There’s no true one-pedal driving mode, just a choice between B—0.25 g of lift-off regenerative deceleration—and D, which can be toggled between none, 0.06 g, and 0.15 g of lift-off regen braking using the paddles behind the steering wheel. B is preferable when the road turns twisty, something both the A6 and S6 coped with surprisingly well. Hairpins showed the steering and suspension to be quick enough to rotate the car rapidly, and steering that initially felt numb began to reveal some information about road surfaces and available grip as the surface changed, then changed again. There’s also a noticeable difference between the drive modes: Comfort feels a little soft and wallowy, Dynamic transfers noticeably more bumps into the cabin, and Balanced is a rather good midpoint between the two, and where I spent most of my time. I should also note the lack of fatigue I felt despite a full day behind the wheel of both cars.

Audi has a new midsize EV, and we’ve driven it: The 2025 A6 Sportback Read More »


Toy company may regret coming for “Sylvanian Drama” TikToker, experts say


Possible legal paths to revive a shuttered video series on TikTok and Instagram.

A popular account on TikTok and Instagram suddenly stopped posting at the end of last year, hit by a lawsuit after garnering millions of views on funny videos it made using adorable children’s Calico Critters dolls to act out dark, cringe-y adult storylines.

While millions of followers mourn the so-called “Sylvanian Drama” account’s demise, experts told Ars that the creator may have a decent chance at beating the lawsuit.

The “Sylvanian Drama” account derived its name from “Sylvanian Families,” a brand name used by Epoch Company Ltd., the maker of Calico Critters, for its iconic fuzzy animal dolls in some markets outside the US. Despite these videos referencing murder, drugs, and hookups, the toy company apparently had no problem, until the account, managed by Ireland-based Thea Von Engelbrechten, started accepting big brand partnerships and making sponsored content featuring the dolls.

Since Epoch, too, strikes partnerships with brands and influencers to promote its own videos marketing the dolls, the company claimed “Sylvanian Drama” risked creating too much confusion online. It also worried viewers would think Epoch had signed off on the videos, since the sponsored content was marked “paid partnership” without specifying precisely which featured brands had paid for the spots. It further accused Von Engelbrechten of building her advertising business around the brand without any attempt to properly license the dolls, while allegedly usurping licensing opportunities from Epoch.

So far, Von Engelbrechten has delayed responding in the lawsuit. As the account remained inactive over the past few months, fans speculated whether it could survive the lawsuit, which raised copyright and trademark infringement claims to get all the videos removed. In its complaint, the toy company requested not only an injunction preventing Von Engelbrechten from creating more “Sylvanian Drama” videos but also all of her profits from her online accounts, in addition to further damages.

Von Engelbrechten declined Ars’ request to provide an update on her defense in the case, but her response is due in early August. That filing will make clear what arguments she may make to overcome Epoch’s suit, but legal experts told Ars that the case isn’t necessarily a slam dunk for the toy company. So all that “Sylvanian Drama” isn’t over just yet.

Epoch’s lawyers did not respond to Ars’ request to comment.

“Sylvanian Drama” needs the court to get the joke

Epoch raised copyright infringement claims that could hit Von Engelbrechten with statutory damages of up to $150,000 per violation.

For Von Engelbrechten to defeat the copyright infringement claim, she’ll need to convince the court that her videos are parodies. A law professor at Santa Clara University School of Law, Eric Goldman, told Ars that her videos may qualify since “even if they don’t expressly reference Epoch’s offerings by name, the videos intentionally communicate a jarring juxtaposition of adorable critters who are important parts of pop culture living through the darker sides of humanity.”

Basically, Von Engelbrechten will need the court to understand the humor in her videos to win on that claim, Rebecca Tushnet, a First Amendment law professor at Harvard Law School, told Ars.

“Courts have varied in their treatment of parodies; the complaint’s definition of parody is not controlling but humor is one of the hardest things to predict—if the court gets the joke, it will be more likely to say that the juxtaposition between the storylines and the innocent appearance of the dolls is parodic,” Tushnet said.

But if the court does get the joke, Goldman suggested that even the sponsored content—which hilariously incorporates product placements from various big brands like Marc Jacobs, Taco Bell, Hilton, and Sephora into storylines—could possibly be characterized as parody.

However, “the fact that the social media posts were labeled #ad will make it extremely difficult for the artist to contest the videos’ status as ads,” Goldman said.

Ultimately, Goldman said that Epoch’s lawsuit “raises a host of complex legal issues” and is “not an easy case on either side.”

And one of the most significant issues that Epoch may face in the courtroom could end up gutting all of its trademark infringement claims that supposedly entitle the toy company to all of Von Engelbrechten’s profits, Alexandra Jane Roberts, a Northeastern University professor of law and media with special expertise in trademark law, told Ars.

Calico Critters may stumble on trademark hurdle

The toy company has raised several trademark infringement claims, all of which depend on Epoch proving that Von Engelbrechten “knowingly and willfully” used its trademarks without permission.

However, Roberts pointed out to Ars that Epoch has no registered trademarks for its iconic dolls, relying only on common law to assert sole rights to the “look and design of the critters.”

It’s likely impossible for Epoch to trademark the dolls, since trademarks are not intended to block competition, and there are only so many ways to design cute dolls that resemble cats or bunnies, Roberts suggested. A court may decide “there’s only so many ways to make a small fuzzy bunny that doesn’t look like this,” potentially narrowing the rights Epoch has under trade dress, a term that Epoch doesn’t use once in its complaint.

Roberts told Ars that Epoch’s trademark claims are “not so far off the mark,” and Von Engelbrechten’s defense was certainly not strengthened by her decision to monetize the content. Prior cases, like the indie band OK Go sending a cease-and-desist to Post cereal over a breakfast product called “OK Go” due to fears of false endorsement, make it clear that courts have agreed in the past that online collaborations have muddied the waters regarding who is the actual source of content for viewers.

“The question becomes whether people are going to see these videos, even though they’re snarky, and even though they’re silly and think, ‘Oh, Calico Critters must have signed off on this,'” Roberts said. “So the argument about consumer confusion, I think, is a plausible argument.”

However, if Epoch fails to convince the court that its trademarks have been infringed, then its other claims alleging false endorsement and unfair competition would likely also collapse.

“You can still get sometimes to unfair competition or to kind of like a false endorsement, but it’s harder to win on those claims and certainly harder to get damages on those claims,” Roberts said. “You don’t get trademark infringement if you don’t have a trademark.”

Possible defenses to keep “Sylvanian Drama” alive

Winning on the trademark claims may not be easy for Von Engelbrechten, who possibly weakened her First Amendment defense by creating the sponsored content. Regardless, she will likely try to convince the court to view the videos as parody, which is a slightly different analysis under trademark law than copyright’s more well-known fair use parody exceptions.

That could be a struggle, since trademark law requires that Von Engelbrechten’s parody videos directly satirize the “Sylvanian Families” brand, and “Sylvanian Drama” videos, even the ads, instead seem to be “making fun of elements of society and culture,” rather than the dolls themselves, Roberts said.

She pointed to winning cases involving the Barbie trademark as an instructive example. In a case disputing Mattel trademarks used in the lyrics of the one-hit wonder “Barbie Girl,” the song was cleared of trademark infringement as a “purely expressive work” that directly parodies Barbie in the lyrics. And in another case, where an artist, Tom Forsythe, captured photos of Barbie dolls in kitchen vessels like a blender or a margarita glass, more robust First Amendment protection was offered since his photos “had a lot to say about sexism and the dolls and what the dolls represent,” Roberts said.

The potential “Sylvanian Drama” defense seems to lack strong go-to arguments that typically win trademark cases, but Roberts said there is still one other defense the content creator may be weighing.

Under “nominative fair use,” it’s OK to use another company’s trademark if it’s necessary in an ad. Roberts provided examples, like a company renting out Lexus cars needing to use that trademark, or comparative advertising using Tiffany diamonds as a reference point to hype a rival’s lower prices.

If Von Engelbrechten goes that route, she will need to prove she used “no more of the mark than is necessary” and did not mislead fans on whether Epoch signed off on the use.

“Here it’s hard to say that ‘Sylvanian Drama’ really needed to use so much of those characters and that they didn’t use more than they needed and that they weren’t misleading,” Roberts said.

However, Von Engelbrechten’s best bet might be arguing that there was no confusion, since “Sylvanian Families” isn’t even a brand that’s used in the US, which is where Epoch chose to file its lawsuit because the brands that partnered with the popular account are based in New York. And the case may not even get that far, Roberts suggested, since “before you can get to those questions about the likelihood of confusion, you have to show that you actually have trademark or trade dress rights to enforce.”

Calico Critters creator may face millennial backlash

Epoch may come to regret filing the lawsuit, Roberts said, noting that as a millennial who grew up a big “Hello Kitty” fan, she still buys merch that appeals to her, and Epoch likely knows about that market, as it has done collaborations with the “Hello Kitty” brand. The toymaker could risk alienating other millennials nostalgic for Calico Critters who may be among the “Sylvanian Drama” audience and feel turned off by the lawsuit.

“When you draw attention to something like this and appear litigious, and that you’re coming after a creator who a lot of people really like and really enjoy and probably feel defensive about, like, ‘Oh, she’s just making these funny videos that everyone loves. Why would you want to sue her?'” Roberts said, “that can be really bad press.”

Goldman suggested that Epoch might be better off striking a deal with the creator, which “could establish some boundaries for the artist to keep going without stepping on the IP owner’s rights.” But he noted that “often IP owners in these situations are not open to negotiation,” and “that requires courts to draw difficult and unpredictable lines about the permissible scope of fair use.”

For Von Engelbrechten, the lawsuit may mean that her days of creating “Sylvanian Drama”-sponsored content are over, which could risk crushing a bigger dream she had to succeed in advertising. However, if the lawsuit can be amicably settled, the beloved content creator could also end up making money for Epoch, considering how big her brand deals appeared to be.

While she seems to take her advertising business seriously, Von Engelbrechten’s videos often joke about legal consequences, such as one where a cat doll says she cannot go to a party because she’s in jail but says “I’ll figure it out” when told her ex will be attending. Perhaps Von Engelbrechten is currently devising a scheme, like her characters, to escape consequences and keep the “Sylvanian Drama” going.

“Maybe if this company were really smart, they would want to hire this person instead of suing them,” Roberts said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Toy company may regret coming for “Sylvanian Drama” TikToker, experts say Read More »


Google and OpenAI Get 2025 IMO Gold

Congratulations, as always, to everyone who got to participate in the 2025 International Mathematical Olympiad, and especially to the gold and other medalists. Gautham Kamath highlights 11th grader Warren Bei, who in his 5th (!) IMO was one of five participants with a perfect 42/42 score, along with Ivan Chasovskikh, Satoshi Kano, Leyan Deng and Hengye Zhang.

Samuel Albanie: Massive respect to the students who solved P6.

Congratulations to Team USA, you did not ‘beat China’ but 2nd place is still awesome. Great job, China, you got us this time, three perfect scores is crazy.

You’ve all done a fantastic, amazingly hard thing, and as someone who tried hard to join you and only got as far as the [, year censored because oh man I am old] USAMO and would probably have gotten 0/45 on this IMO if I had taken it today, and know what it is like to practice for the USAMO in a room with multiple future IMO team members that must have thought I was an idiot, let me say: I am always in awe.

But that’s not important right now.

What matters is that Google and OpenAI have LLMs with gold medal performances, each scoring exactly the 35/42 gold threshold (each of the six problems is worth 7 points) by solving the first five of the six problems.

This is up from Google’s 28/42 performance last year, which was achieved with a longer time frame. The methods used by both this year are presented as being more general, whereas last year’s version was a more specialized effort.

The new scores were a 92nd percentile result at the event.

Google did this in official collaboration with the IMO and announced on Monday, as per the IMO’s request. OpenAI did it on its own and announced a bit earlier, so we are taking their word on many details.

This was not expected. Prediction markets thought gold this year was unlikely.

What matters more is how they did it, with general purpose LLMs without tools, in ways that represent unexpected and large future gains in other reasoning as well.

The more I think about the details here, the more freaked out I get rather than less. This is a big deal. How big remains to be seen, as we lack details, and no one knows how much of this will generalize.

The IMO 2025 results quickly came in for released models.

Teortaxes: I sure jumped the gun calling Grok a next generation model.

It’s probably not *that* far from Gemini, compute-wise, and not really close in diversity and rigor of post-training.

This was an early sign that problem 3 was easier than usual this year, and a strong performance by the release version of Gemini 2.5 Pro.

So this is how it started seven hours before OpenAI announced its result:

Jxmo (replying to Ravid): if they did well, you’d be complaining that they overfit.

Ravid Shwartz: That’s true, because they are 👽

Rohit: This isn’t a gotcha. Any problem that we fundamentally focus on deeply enough is one that AI will be able to solve. The question, as ever, is whether that solution is likely to carry over to other domains.

I disagree, I think this is a gotcha in the positive sense. People took ‘the AIs that weren’t aimed at this problem that are publicly released are only doing okay relative to the best humans, and have not proven themselves the best yet’ to be ‘look at the pathetic AIs,’ one day before we learned that, well, actually, in a way prediction markets did not expect.

I do think people need to update their models of the future.

Also, it’s kind of a full gotcha given this:

Lin Yang: 🚨 Olympiad math + AI:

We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity.

The model could win gold! 🥇

It would be non-trivial for a non-math person to achieve the same score. We have spent some time carefully checking the solutions. Regardless, the prompts are very general and can be applied to other models. We will release an automatic agent soon.

Jun Wu: They added a lot of steps in order to solve 5 problems. They didn’t publish the details on how these steps were done beyond the concepts.

I don’t have time to investigate how ‘legit’ the Gemini 2.5 Pro solutions are, including in terms of how much you have to cheat to get them.

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad. Google’s solutions are here.

We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.

To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions.

We will be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers.
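Google hasn’t published the mechanics behind “parallel thinking,” but the description above (simultaneously exploring multiple candidate solutions, then combining or selecting before answering) matches the familiar parallel-sampling, best-of-n pattern. Here is a minimal sketch of that pattern, purely as an illustration of the concept; every name in it (generate_candidate, the confidence score) is a hypothetical stand-in, not Google’s actual implementation or API:

```python
# Minimal sketch of parallel best-of-n reasoning (hypothetical stand-in code,
# not Google's Deep Think implementation).
import random
from concurrent.futures import ThreadPoolExecutor

def generate_candidate(problem: str, seed: int) -> tuple[str, float]:
    """Stand-in for one independent reasoning chain. A real system would
    call the model here and score the attempt with a verifier; we fake
    both with a seeded RNG so that each chain differs."""
    rng = random.Random(seed)
    solution = f"candidate solution #{seed} to: {problem}"
    confidence = rng.random()  # pretend verifier score; higher is better
    return solution, confidence

def parallel_think(problem: str, n_chains: int = 8) -> str:
    """Run n_chains attempts simultaneously, then keep the attempt the
    scorer likes best, rather than committing to one linear chain."""
    with ThreadPoolExecutor(max_workers=n_chains) as pool:
        results = list(pool.map(lambda s: generate_candidate(problem, s),
                                range(n_chains)))
    best_solution, _best_score = max(results, key=lambda r: r[1])
    return best_solution

if __name__ == "__main__":
    print(parallel_think("IMO 2025, Problem 1"))
```

A real system would presumably also combine partial progress across chains rather than merely selecting one finished attempt; the selection step here deliberately simplifies that away.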

Google’s answers were even in nice form.

IMO President Dr. Gregor Dolinar: We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points — a gold medal score. Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow.

Colin Fraser: has anyone actually read these LLM IMO proofs? I read one of the Google ones and it’s good. I find the OAI version of the same one impenetrable. The Google one is also kind of hard to read but possible.

Ernest Davis (6th in US Math Olympiad once, just short of the IMO): Second: The proofs produced by DM-IMO and by every single earlier LLM, whether correct or incorrect, are written in a smooth, elegant style. They could be cut and pasted into a journal article or into a textbook with little or no editing. The worst you can say of them is that they are sometimes verbose.

By contrast, OpenAI-IMO writes proofs in the style of someone giving an informal spoken presentation who is not very practiced or competent at giving informal presentations, and regularly mutters reassurances to themselves that they’re on the right track.

Miles Brundage: OAI one got RL’d to within an inch of its life.

What else did they say about how they did this?

DeepMind: With Deep Think, an enhanced reasoning mode, our model could simultaneously explore and combine multiple possible solutions before giving definitive answers.

We also trained it on RL techniques that use more multi-step reasoning, problem-solving and theorem-proving data.

Finally, we pushed this version of Gemini further by giving it:

🔘 More thinking time

🔘 Access to a set of high-quality solutions to previous problems

🔘 General hints and tips on how to approach IMO problems

That sounds mostly rather general. There’s some specialized IMO context, but orders of magnitude less than what IMO competitors devote to this.

Elon Musk: While a notable milestone, this is already borderline trivial for AI.

Um, Elon, no, and I remind you that Grok 4 got 11.9%. Which for a human would be super impressive, but seriously, borderline trivial?

Noam Brown (OpenAI): Congrats to the GDM team on their IMO result! I think their parallel success highlights how fast AI progress is. Their approach was a bit different than ours, but I think that shows there are many research directions for further progress.

OpenAI claimed its victory first, right after the closing ceremony and before the party, whereas Google DeepMind waited to announce until the following Monday.

The most impressive thing about OpenAI’s result is that they claim this is not an IMO-specific model, and that it uses only general-purpose techniques.

Alexander Wei (OpenAI): I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.

Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).

Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.

Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.

In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold! 🥇

HUGE congratulations to the team—@SherylHsu02, @polynoamial, and the many giants whose shoulders we stood on—for turning this crazy dream into reality! I am lucky I get to spend late nights and early mornings working alongside the very best.

Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.

Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.

If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model 😅)

Lastly, we’d like to congratulate all the participants of the 2025 IMO on their achievement! We are proud to have many past IMO participants at @OpenAI and recognize that these are some of the brightest young minds of the future.

Noam Brown (OpenAI): Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline.

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.

Jacques: Most important part of the IMO Gold achievement. Were you surprised by this? Did you not update all the way to avoid likelihood of surprise?

Indeed. Purely getting the gold medal is surprising but not that big a deal. The way they got the result, assuming they’re reporting accurately? That’s a really big deal.

Noam Brown (resuming): Also this model thinks for a *long* time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it’s also more efficient with its thinking. And there’s a lot of room to push the test-time compute and efficiency further.

Importantly, I think we’re close to AI substantially contributing to scientific discovery. There’s a big difference between AI slightly below top human performance vs slightly above.

This was a small team effort led by @alexwei_. He took a research idea few believed in and used it to achieve a result fewer thought possible. This also wouldn’t be possible without years of research+engineering from many at @OpenAI and the wider AI community.

Tifa Chen: Last night we IMO, tonight we party.

What about Problem 6? Did the programs submit incorrect solutions?

Note that if you are maximizing, then when time runs out, if you have anything at all, then yes, you do submit the best incorrect solution you have, because it might get you partial credit, although this rarely works out.

Daniel Litt: One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it?

Alexander Wei: On IMO P6 (without going into too much detail about our setup), the model “knew” it didn’t have a correct solution. The model knowing when it didn’t know was one of the early signs of life that made us excited about the underlying research direction!

If one person gets to say ‘Not So Fast’ about this sort of thing, Tao is that one person.

It is entirely fair to say that if you don’t disclose conditions in advance, and definitely if you don’t disclose conditions after the fact, it is difficult to know exactly what to make of the result. Tao’s objections are valid.

Terence Tao: It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance one gives the tool, and how one reports their results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective measure of mathematical achievement; it is a significant accomplishment for a high school student to score well enough to receive a medal, particularly a gold medal or a perfect score. This year the threshold for gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an “honorable mention”.

But consider what happens to the difficulty level of the Olympiad if we alter the format in various ways, such as the following:

  1. One gives the students several days to complete each question, rather than four and a half hours for three questions. (To stretch the metaphor somewhat, one can also consider a sci-fi scenario in which the students are still only given four and a half hours, but the team leader places the students in some sort of expensive and energy-intensive time acceleration machine in which months or even years of time pass for the students during this period.)

  2. Before the exam starts, the team leader rewrites the questions in a format that the students find easier to work with.

  3. The team leader gives the students unlimited access to calculators, computer algebra packages, formal proof assistants, textbooks, or the ability to search the internet.

  4. The team leader has the six-student team work on the same problem simultaneously, communicating with each other on their partial progress and reported dead ends.

  5. The team leader gives the students prompts in the direction of favorable approaches, and intervenes if one of the students is spending too much time on a direction that they know to be unlikely to succeed.

  6. Each of the six students on the team submits solutions to the team leader, who then selects only the “best” solution for each question to submit to the competition, discarding the rest.

  7. If none of the students on the team obtains a satisfactory solution, the team leader does not submit any solution at all, and silently withdraws from the competition without their participation ever being noted.

In each of these formats, the submitted solutions are still technically generated by the high school contestants, rather than the team leader. However, the reported success rate of the students on the competition can be dramatically affected by such changes of format; a student or team of students who might not even always reach bronze medal performance if taking the competition under standard test conditions might instead reach reliable gold medal performance under some of the modified formats indicated above.

So, in the absence of a controlled test methodology that was not self-selected by the competing teams, one should be wary of making overly simplistic apples-to-apples comparisons between the performance of various AI models on competitions such as the IMO, or between such models and the human contestants.

Related to this, I will not be commenting on any self-reported AI competition performance results for which the methodology was not disclosed in advance of the competition.

EDIT: In particular, the above comments are not specific to any single result of this nature.

The catch is that this is about grading the horse’s grammar, as opposed to the observation that the horse can talk, and rather intelligently, and with rapidly improving performance at that.

Thus, while the objections are valid, as long as we know the AIs had no access to outside tools or to the internet (which is confirmed), we should seek the answers to these other questions, but the concerns primarily matter for comparisons between models, and within a reasonably narrow (in the grand scheme of things) band of capabilities.

I also would note that if OpenAI did essentially do the ‘team thinks in parallel’ thing where it had multiple inference processes running simultaneously on multiple computers, well, that is something AIs can do in the real world, and this seems entirely fair for our purposes, the same way humans can fire multiple neurons at once. It’s totally fair to also want a limited-compute or one-thread category or what not, but that’s not important right now.

To use Tao’s metaphor, if you took 99.99% of high school students, you could fully and simultaneously apply all these interventions other than formal proof assistants and internet searches or hints so clear they give you the first point on a question, and you still almost always get zero.

Nat McAleese: 17 M U.S. teens grades 9-12, ~5 US IMO golds in practice but ~20 kids at gold-level. So IMO gold is one-in-a-million math talent (for 18 year olds; but I bet next Putnam falls too). 99.9999th percentile.
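Spelling out the arithmetic behind Nat’s estimate (using his figures, which are informal estimates rather than official counts):

$$\frac{20 \text{ gold-level students}}{17{,}000{,}000 \text{ students}} \approx 1.2 \times 10^{-6}$$

That is roughly one in a million, i.e., the 99.9999th percentile.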

As a former not only math competitor but also Magic: The Gathering competitor, absolutely all these details matter for competitions, and I respect the hell out of getting all of those details right – I just don’t think that, in terms of takeaways, they change the answer much here.

In other words? Not Not So Fast. So Fast.

OpenAI chose not to officially collaborate with the IMO. They announced their result after the IMO closing ceremony and prior to the IMO 2025 closing party. Those who did collaborate agreed to wait until the following Monday, which was when Google announced. By going first, OpenAI largely stole the spotlight on this from Google, yet another case of Google Failing Marketing Forever.

A question that was debated is, did OpenAI do something wrong here?

Mikhail Samin claimed that they did, putting their hype and clout ahead of the kids celebrating their achievements, against the wishes of the IMO.

OpenAI’s Noam Brown replied that they waited until after the closing ceremony exactly to avoid stealing the spotlight. He said he was the only person at OpenAI to speak to anyone at the IMO, and that person only requested waiting until after the ceremony, so that is what OpenAI did.

Not collaborating with the IMO was a choice that OpenAI made.

Mikhail Samin: AI companies that chose to cooperate with the IMO on assessment of the performance of their models had in-person meetings with IMO people on July 16. It was agreed there that announcements of AI achievements should be made on 28 July or later.

A quote from someone involved: “I certainly expect that if OpenAI had contacted the IMO in advance and expressed interest in cooperating in the assessment of their work, they would have been able to be included in that meeting, so I suppose that unless there was a major miscommunication somewhere, they effectively ended up choosing, by default or otherwise, not to cooperate with the IMO on this, and so not to be aware of what ground rules might have been agreed by those who did cooperate.”

Demis Hassabis (CEO DeepMind): Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightfully received the acclamation they deserved.

We’ve now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!

Noam Brown: ~2 months ago, the IMO emailed us about participating in a formal (Lean) version of the IMO. We’ve been focused on general reasoning in natural language without the constraints of Lean, so we declined. We were never approached about a natural language math option.

Over the past several months, we made a lot of progress on general reasoning. This involved collecting, curating, and training on high-quality math data, which will also go into future models. In our IMO eval we did not use RAG or any tools.

Before we shared our results, we spoke with an IMO board member, who asked us to wait until after the award ceremony to make it public, a request we happily honored.

We had each submitted proof graded by 3 external IMO medalists and there was unanimous consensus on correctness. We have also posted the proofs publicly so that anyone can verify correctness.

Jasper: DeepMind got a gold medal at the IMO on Friday afternoon. But they had to wait for marketing to approve the tweet — until Monday. @OpenAI shared theirs first at 1am on Saturday and stole the spotlight.

In this game, speed > bureaucracy. Miss the moment, lose the narrative.

Clarification: I’ve been told by someone at Google that their IMO results are still being verified internally. Once that’s done, they plan to share them officially—curious to see their approach. Another source mentioned that the IMO committee asked not to publicly discuss AI involvement within a week after the closing ceremony. Things just got a bit more interesting.

Daniel Eth: “In this game, speed > bureaucracy. Miss the moment, lose the narrative.” Honestly, disagree. If GDM beats OpenAI, then the narrative will shift once that’s public.

I have reflected on this. It is not the main thing; the results are the main thing. I do think, on reflection, that while OpenAI did not break any agreements or its word, and strictly speaking does not owe the IMO or the kids anything, and this presumably net increased the focus on the kids, this still represents a meaningful failure to properly honor the competition and process, as well as to offer us the proper opportunities for verification, and they should have known as much. I do get that this was a small team’s last-minute effort, which makes me more understanding, but it’s still not great.

Fig Spirit: then again, assuming Myers is correct about his impression of the “general coordinator view”, seems like the kind of thing that OpenAI could have known about *if* they cared, no? by e.g. talking to the right people at the IMO… which imo is not asking much! and looks like others did?

Thus, I was careful to wait to write this until after Google’s results were announced, and have placed Google’s announcement before OpenAI’s in this post, even though due to claimed details by OpenAI I do think their achievement here is likely the more meaningful one. Perhaps that is simply Google failing marketing again and failing to share details.

Ultimately, the reason OpenAI stole the spotlight is that its result heralds something general and new in a way that Google's announcement doesn't.

With Google sharing its results I don't want to wait any longer. But also note Harmonic:

Harmonic Math: This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematical Olympiad – the most prestigious mathematics competition in the world.

To uphold the sanctity of the student competition, the IMO Board has asked us, along with the other leading AI companies that participated, to hold off on releasing our results until July 28th.

So please join us live on @X next Monday, July 28th at 3PM PT and hear from our CEO @tachim and Executive Chairman @vladtenev about the advent of mathematical superintelligence (and maybe a few surprises along the way).

This would be a weird flex if they didn’t also get gold, although it looks like they would have done it in a less general and thus less ultimately interesting way. On the flip side, they are not a big lab like Google or OpenAI, so that’s pretty impressive.

I think the failure to expect this was largely a mistake, but Manifold tells a clear story:

Andrew Curran: OpenAI’s new model has achieved gold level at the International Math Olympiad in a stunning result. It is a reasoning model that incorporates new experimental general-purpose techniques. This has happened much sooner than was predicted by most experts.

Noam Brown (OpenAI): When you work at a frontier lab, you usually know where frontier capabilities are months before anyone else. But this result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI. Today, everyone gets to see where the frontier is.

Peter Wildeford: AI progress comes at you fast.

JGalt Tweets: When will an AI win a Gold Medal in the International Math Olympiad? Median predicted date over time

July 2021: 2043 (22 years away)

July 2022: 2029 (7 years away)

July 2023: 2028 (5 years away)

July 2024: 2026 (2 years away)

Final result, July 2025: 2025 (now). Buckle up, Dorothy.

Some people did expect it, some of whom offered caveats.

Greg Burnham: Pretty happy with how my predictions are holding up.

5/6 was the gold medal threshold this year. OAI’s “experimental reasoning LLM” got that exactly, failing only to solve the one hard combinatorics problem, P6.

My advice remains: look beyond the medal.

Now, this is an LLM, not AlphaProof. That means LLMs have improved at proofs. I didn’t expect that so soon.

Though, FWIW, P3 is a bit of an outlier this year, at least for humans: over 15% of humans got it, higher than any P3 in the last 10 years.

But “the big one” remains whether the AI solutions show qualitatively creative problem-solving.

LLMs could already grind out “low insight” sol’ns to hard AIME problems. If OAI found a way to train them to do that for olympiad proof-based problems too, that’s new, but less exciting.

So, clear progress, but not *too* surprising. I’ll keep my takes tempered until looking at the AI solutions in depth, which I hope to do soon! Above excerpts from my preregistered take on the IMO here.

Mikhail Samin: As someone who bet back in 2023 that it’s >70% likely AI will get an IMO gold medal by 2027:

the IMO markets have been incredibly underpriced, especially for the past year.

(Sadly, another prediction I’ve been >70% confident about is that AI will literally kill everyone.)

The AIs took the IMO under the same time limits as the humans, and success was highly valued, so it is no surprise that they used parallel inference to get more done within that time frame, trading efficiency for speed.

Andrew Curran: These agentic teams based models like Grok Heavy, the Gemini Deep Think that just won gold, and the next gen from OpenAI are all going to use about fifteen times more tokens than current systems. This is why Pro plans are north of $200. Essentially: Jensen wins again.

[from June 14]: Claude Opus, coordinating four instances of Sonnet as a team, used about 15 times more tokens than normal. (90% performance boost) Jensen has mentioned similar numbers on stage recently. GPT-5 is rumored to be agentic teams based. The demand for compute will continue to increase.
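To make the parallel-inference economics concrete, here is a minimal best-of-n sketch: n rollouts run concurrently, so the token bill scales with n while wall-clock time stays inside the contest limit. Everything here (sample_solution, score, the choice of n = 16) is an illustrative stand-in, not a description of what any lab actually ran:

```python
# Minimal best-of-n sketch: n parallel rollouts cost ~n times the tokens
# of a single attempt but fit in the same wall-clock budget.
import concurrent.futures
import random

def sample_solution(problem: str, seed: int) -> str:
    """Stand-in for one model rollout; a real system would call an LLM."""
    rng = random.Random(seed)
    return f"candidate proof {rng.randint(0, 10**6)} for: {problem}"

def score(candidate: str) -> float:
    """Stand-in verifier or reranker; real systems might instead use
    majority voting (self-consistency) or a learned grader."""
    return random.Random(candidate).random()

def best_of_n(problem: str, n: int = 16) -> str:
    # Token cost scales with n; wall-clock time (the contest limit) does not.
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda s: sample_solution(problem, s), range(n)))
    return max(candidates, key=score)

print(best_of_n("a hard olympiad problem"))
```

At n = 16 the token multiplier is in the same ballpark as the roughly fifteen-fold figure Curran cites, which is the sense in which efficiency is being traded for speed.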

Arthur B: IMO gold is super impressive.

I just want to register a prediction, I’m 80% confident the inference run cost over $1M in compute.

Mostly because if they could do it for $1M they would, and they would be able to do it for $1M before they can do it for less.

Jerry Tworek (OpenAI): I’m so limited by compute you wouldn’t believe it. Stargate can’t finish soon enough.

Sure, you solved this particular problem, but that would never generalize, right? That part is the same hype as always?

Near Cyan: you wont believe how smart our new frontier llm is. it repeatedly samples from the data manifold just like our last one. but this time we gave it new data to cover a past blindspot. watch in awe as we now sample from a slightly different area of the data manifold.

there may lay a prize at the end of the hyperdimensional checkered rainbow, but it’s likely not what you think it is.

i really thought someone would have done something original by now. of course, if anything was ~truly~ cooking, it shouldn’t be something i’d know about… but the years continue to pass

and, right right we have to finish *this phase* so that we have the pre-requisites. and yet.

David Holz (CEO MidJourney): noooo money can’t be dumb it’s so green.

Near Cyan: it is for now! but some of it may turn a dark crimson surprisingly quickly.

Nico: What do you make of [the OpenAI model knowing it didn’t have a correct solution to problem 6]? Sounds pretty important.

Near Cyan: seems cool i bet they have some great data.

A grand tradition is:

  1. AI can do a set of things [X] better than humans, but not a set of things [Y].

  2. People say [X] and [Y] are distinct because Moravec’s Paradox and so on.

  3. AI lab announces that [Z], previously in [Y], is now in [X].

  4. People move [Z] from [Y] to [X] and then repeat that this distinct category of things [Y] exists because Moravec’s Paradox, that one task was simply miscategorized before, so it’s fine.

Or: AI can do the things it can do, and can’t do the things it can’t do, they’re hard.

Yuchen Jin: OpenAI and DeepMind models winning IMO golds is super cool, but not surprising if you remember AlphaGo beat Lee Sedol.

What’s easy for AI can be hard for humans, and vice versa. That’s Moravec’s Paradox.

So yes, AI can win math gold medals and beat humans in competitive coding contests. But ask it to act like a competent “intern” across a multi-step project without messing things up? Still a long way to go.

To get there, models need longer context windows, far less hallucination (a single one can derail a multi-step task), and likely a new learning paradigm. RL with a single scalar +1/-1 reward at the end of a long trajectory just isn’t informative enough to drive actual learning.
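To spell out the credit-assignment complaint in that last paragraph, here is a toy REINFORCE-style loss in PyTorch: one terminal +1/-1 judges the entire trajectory, so all 10,000 steps receive the same blunt gradient signal regardless of which individual decisions were good. The dimensions and numbers are illustrative only:

```python
# Toy sparse-reward REINFORCE: a single scalar at the end of a long
# trajectory weights every step's log-probability identically.
import torch

T, V = 10_000, 50                       # trajectory length, action space (toy)
logits = torch.randn(T, V, requires_grad=True)
actions = torch.randint(V, (T,))        # the actions actually taken
logprobs = logits.log_softmax(-1)[torch.arange(T), actions]

terminal_reward = -1.0                  # the lone +1/-1 verdict at the end
loss = -(terminal_reward * logprobs.sum())
loss.backward()                         # every step is blamed equally
```

With one bit of feedback spread over ten thousand decisions, the gradient cannot say which step derailed the task, which is why Jin expects a new learning paradigm to be needed.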

An oldie but a goodie:

Colin Fraser: Can an LLM make a good IMO problem

Posting before someone else does

I mean, it probably can’t even do real math, right?

Kevin Buzzard (Mathematician, Imperial College): I certainly don’t agree that machines which can solve IMO problems will be useful for mathematicians doing research, in the same way that when I arrived in Cambridge UK as an undergraduate clutching my IMO gold medal I was in no position to help any of the research mathematicians there.

It is still entirely unclear whether things will scale from machines being able to do mathematics which can be solved using high school techniques to machines being able to help with mathematics which can only be solved by having a deep understanding of modern research ideas.

This is a big open question right now.

Hehe: What most people don’t realize is that IMO (and IOI, though to a different extent) aren’t particularly hard. They’re aimed at high schoolers, so anyone with decent uni education should be able to solve most of them.

Daniel Litt: I’m sorry, this is nonsense. Vast majority of strong math majors can’t do 5/6 IMO problems. It’s a specific skill that getting a math major doesn’t really train you for.

So yes, we still do not know for sure whether being able to do [X] will extend to doing [Y], either with the same model or with a future different model, and [X] and [Y] are distinct skills, such that the humans who do [X] cannot yet do [Y], and training humans to do [Y] does not give them the ability to do [X]. However, please try to think ahead.

Daniel Litt: An AI tool that gets gold on the IMO is obviously immensely impressive. Does it mean math is “solved”? Is an AI-generated proof of the Riemann hypothesis clearly on the horizon? Obviously not.

Worth keeping timescales in mind here: IMO competitors spend an average of 1.5 hrs on each problem. High-quality math research, by contrast, takes months or years.

What are the obstructions to AI performing high-quality autonomous math research? I don’t claim to know for sure, but I think they include many of the same obstructions that prevent it from doing many jobs:

Long context, long-term planning, consistency, unclear rewards, lack of training data, etc.

It’s possible that some or all of these will be solved soon (or have been solved) but I think it’s worth being cautious about over-indexing on recent (amazing) progress.

To briefly expand on the point about timescales: one recent paper I wrote solved a problem I’ve been thinking about since 2017. Another was 94 pages of extremely densely-written math, aimed at experts.

We don’t know much yet about how the best internal models work, but I don’t think it’s clear that getting capabilities of that level is “only” an engineering problem. That said, I do think it’s pretty likely that many or all of these issues will be solved within the span of my mathematics career.

That is all entirely fair. An IMO problem is measured in hours, not months, and is bounded in important ways. That is exactly the paradigm of METR, and the one being talked about by Noam Brown and Alexander Wei: we have now made the move from 10-minute problems to 100-minute problems.

That does not mean we can yet solve 10,000-minute or 1-million-minute problems, but why would you expect the scaling to stop here? As I discussed in the debates over AI 2027, it makes sense to think that these orders of magnitude start to get easier rather than harder once you get into longer problems. If you can do 100-minute problems, that doesn't mean you can easily go to 1,000 or a million, but if you can do 1 million, I bet you can probably do 1 billion without fundamentally changing things that much, if you actually have that kind of time. At some point your timeline is 'indefinite,' or 'well, how much time and compute have you got?'
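As a toy version of that arithmetic: under a fixed doubling time, each additional order of magnitude of task length costs a constant amount of calendar time. The 100-minute starting point comes from the IMO discussion above; assuming METR's rough seven-month doubling simply continues is exactly the contested part:

```python
# Toy extrapolation: horizon(t) = h0 * 2^(t / d), i.e. a constant number
# of doublings per order of magnitude. Both parameters are assumptions.
import math

h0 = 100   # assumed current task horizon in minutes
d = 7      # assumed doubling time in months (METR's rough estimate)

for target in (1_000, 10_000, 1_000_000, 1_000_000_000):
    months = d * math.log2(target / h0)
    print(f"{target:>13,}-minute tasks in ~{months:.0f} months")
```

On this assumed trend, going from a million minutes to a billion takes the same roughly 70 months as going from a thousand to a million; the claim above is that in practice the later jumps get easier still.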

David White: the openai IMO news hit me pretty heavy this weekend.

i’m still in the acute phase of the impact, i think.

i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don’t think i can answer a single imo question.

ok, yes, imo is its own little athletic subsection of math for which i have not trained, etc. etc., but. if i meet someone in the wild who has an IMO gold, i immediately update to “this person is much better at math than i am”

now a bunch of robots can do it. as someone who has a lot of their identity and their actual life built around “is good at math,” it’s a gut punch. it’s a kind of dying.

like, one day you discover you can talk to dogs. it’s fun and interesting so you do it more, learning the intricacies of their language and their deepest customs. you learn other people are surprised by what you can do. you have never quite fit in, but you learn people appreciate your ability and want you around to help them. the dogs appreciate you too, the only biped who really gets it. you assemble for yourself a kind of belonging. then one day you wake up and the universal dog translator is for sale at walmart for $4.99.

the IMO result isn’t news, exactly. in fact, if you look at the METR agent task length over time plot, i think agents being able to solve ~ 1.5 hour problems is coming right on time. so in some way we should not be surprised. and indeed, it appears multiple companies have achieved the same result. it’s just… the rising tide rising as fast as it has been rising.

of course, grief for my personal identity as a mathematician (and/or productive member of society) is the smallest part of this story

multiply that grief out by *every* mathematician, by every coder, maybe every knowledge worker, every artist… over the next few years… it’s a slightly bigger story

and of course, beyond that, there is the fear of actual death, which perhaps i’ll go into more later.

this package — grief for relevance, grief for life, grief for what i have known — isn’t unique to the ai age or anything like that. i think it is a standard thing as one approaches end of career or end of life. it just might be that that is coming a bit sooner for many of us, all at once.

i wonder if we are ready

I am very confident we are not ready. If we are fortunate we might survive, but we definitely are not ready.

I grade this as minus one million points for asking the wrong questions.

Mechanize: Automating math would generate less than 1% as much value as automating software engineering.

Perhaps AI labs should focus less on chasing gold medals and focus more on the hard problem of automating SWE.

T11s: this is pretty reductionist? innovations in math uniquely enable lots of software (eg cryptography made ecommerce possible)

Deedy: Quant trading is a lot of math and accounts for $50-100B in revenue.

Never confuse costs and benefits #RulesForLife, and never reason from a price change.

(This defines ‘math’ rather narrowly as advanced Real Math that mathematicians and maybe quants and other professionals do, not the kind of math that underlies absolutely everything we do all day, since Fake Math is already mostly automated.)

The value of automating is not determined by how much we spent on it before it got automated. The value is determined by how much additional value we get out of something when we automate it, which might involve a lot more production and very diffuse benefits.

Back in February 2022, Eliezer Yudkowsky bet with Paul Christiano about IMO performance by 2025. The results were not super clear cut if you look at the details: Christiano was in large part doubting that the hardest problem would be solved, and indeed the hardest problem, #6, was not solved, but a gold medal was still achieved.

So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.

Separately, we have Paul at <4% of an AI able to solve the "hardest" problem under the same conditions.

How [I, Paul, would] update

The informative:

  • I think the IMO challenge would be significant direct evidence that powerful AI would be sooner, or at least would be technologically possible sooner. I think this would be fairly significant evidence, perhaps pushing my 2040 TAI [transformational AI] probability up from 25% to 40% or something like that.

  • I think this would be significant evidence that takeoff will be limited by sociological facts and engineering effort rather than a slow march of smooth ML scaling. Maybe I’d move from a 30% chance of hard takeoff to a 50% chance of hard takeoff.

  • If Eliezer wins, he gets 1 bit of epistemic credit. These kinds of updates are slow going, and it would be better if we had a bigger portfolio of bets, but I’ll take what we can get.

  • This would be some update for Eliezer’s view that “the future is hard to predict.” I think we have clear enough pictures of the future that we have the right to be surprised by an IMO challenge win; if I’m wrong about that then it’s general evidence my error bars are too narrow.

If an AI wins a gold on some but not all of those years, without being able to solve the hardest problems, then my update will be somewhat more limited but in the same direction.
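To spell out the “1 bit of epistemic credit” in Paul’s third bullet, take the stated thresholds as rough point estimates (8% for Paul, 16% for Eliezer). Observing a gold then multiplies the odds between their two models by the likelihood ratio:

$$\frac{P(\text{gold} \mid \text{Eliezer})}{P(\text{gold} \mid \text{Paul})} \approx \frac{0.16}{0.08} = 2, \qquad \log_2 2 = 1 \text{ bit}.$$

So an observer who started at even odds between the two worldviews should move to roughly 2:1 in Eliezer’s favor on this question alone, which is why Paul calls these updates slow going and wishes for a bigger portfolio of bets.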

At this point, we have a lot of people who have updated far past 40% chance of transformational AI by 2040 and have 40% for dates like 2029.

If we take all of OpenAI’s statements at face value, think about what they actually did.

Sam Altman: we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.

when we first started openai, this was a dream but not one that felt very realistic to us; it is a significant marker of how far AI has come over the past decade.

we are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques we will use in future models. we think you will love GPT-5, but we don’t plan to release a model with IMO gold level of capability for many months.

Sheryl Hsu (OpenAI): Watching the model solve these IMO problems and achieve gold-level performance was magical.

The model solves these problems without tools like Lean or coding, it just uses natural language, and also only has 4.5 hours. We see the model reason at a very high level – trying out different strategies, making observations from examples, and testing hypotheses.

It’s crazy how we’ve gone from 12% on AIME (GPT 4o) → IMO gold in ~ 15 months. We have come very far very quickly. I wouldn’t be surprised if by next year models will be deriving new theorems and contributing to original math research!

I was particularly motivated to work on this project because this win came from general research advancements. Beyond just math, we will improve on other capabilities and make ChatGPT more useful over the coming months.

Sebastien Bubeck: It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI.

Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.

Nomore ID: Read Noam’s thread carefully.

Winning a gold medal at the 2025 IMO is an outstanding achievement, but in some ways, it might just be noise that grabbed the headlines.

They have recently developed new techniques that work much better on hard-to-verify problems, have extended TTC to several hours, and have improved thinking efficiency.

Jerry Tworek (OpenAI): Why am I excited about IMO results we just published:

– we did very little IMO-specific work, we just keep training general models

– all natural language proofs

– no evaluation harness

We needed a new research breakthrough and @alexwei_ and team delivered.

Diego Aud: Jerry, is this breakthrough included in GPT-5, or is it reserved for the next generation?

Jerry Tworek: It’s a later model probably end of year thing.

Guizin: Agent 1.

Jerry Tworek: I’m so limited by compute you wouldn’t believe it. Stargate can’t finish soon enough.

Going back to Tao’s objections, we know essentially nothing about this new model, or about what Google did to get their result. Given that P3 was unusually easy this year, these scores are perhaps not themselves that terribly impressive relative to expectations.

Can we trust this? It’s not like OpenAI has never misled us on such things in the past.

In terms of the result being worthy of a 35/42, I think we can mostly trust that. They shared the solution, in its garbled semi-English, and if there was something that would have lost them points I think someone would have spotted it by now.

In terms of OpenAI otherwise cheating, we don’t have any proof of this, but I think the chances are quite low. There are different kinds of deception or lies, and different parts of OpenAI are differently trustworthy. This kind of lie is not in their nature, nor do they have much incentive to try it given the chance it gets exposed, plus the fact that if it’s not real then they won’t be able to pay it off later.

The place where one might doubt the most is, can we trust that what OpenAI did this time is more general, in the ways they are claiming?

Gary Marcus: The paradox of the OpenAI IMO discussion is that the new model scored only slightly better than DeepMind’s system from last year (as @NeelNanda5 notes); but that we assume that the new model is far more general.

Yet we have not yet seen any direct evidence of that.

It can barely speak english.

The ‘barely speak English’ part makes the solution worse in some ways, but actually makes me give more credence, not less, to their claims to be doing something different. It also should worry anyone who wants to maintain monitorable chain of thought.

Then again, one could say that the version that does it better, and more naturally, is thus more important, for exactly the same reasons.

Vladimir Nesov: [GDM’s] is even more surprising than OpenAI’s entry (in its details). Since it can now write proofs well automatically (even if it costs a lot and takes a lot of time), in a few months regular reasoning models might get enough training data to reliably understand what proofs are directly, and that’s an important basic ingredient for STEM capabilities.

We only have OpenAI’s word on the details of how this went down. So what to think?

I am mostly inclined to believe them on the main thrust of what is going on. That doesn’t mean that this result will generalize. I do give them credit for having something that they believe came out of a general approach, and that they expect to generalize.

Still, it’s reasonable to ask what the catch might be, since there’s always going to be a catch. Certainly it is plausible that this was, as Miles suggested, RLed to within an inch of its life, and that its starting to be unable to speak English signals the opposite of what is claimed: that it is losing its generality, or that things are otherwise going off the rails.

The thing is, to me this doesn’t feel like it is fake. It might not be a big deal, it might not transfer all that well to other contexts, but it doesn’t feel fake.

To wrap up, another reminder that no, you can’t pretend none of this matters, and both the Google and OpenAI results matter and should update you:

Cole Wyeth: The headline result was obviously going to happen, not an update for anyone paying attention.

Garrett Baker: “Obviously going to happen” is very different from ‘happens at this point in time rather than later or sooner and with this particular announcement by this particular company’. You should still update off this. Hell, I was pretty confident this would be first done by Google DeepMind, so it’s a large update for me (I don’t know what for yet, though)!

Your claim “not an update for anyone paying attention” also seems false. I’m sure there are many who are updating off this who were paying attention, for whatever reason, as they likely should.

I generally dislike this turn of phrase as it serves literally no purpose but to denigrate people who are changing their mind in light of evidence, which is just a bad thing to do.

cdt: I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM-reasoning system from GDM would be something else and it is important for discussion around this to not confuse one forecast for another.


Google and OpenAI Get 2025 IMO Gold Read More »

xai-workers-balked-over-training-request-to-help-“give-grok-a-face,”-docs-show

xAI workers balked over training request to help “give Grok a face,” docs show

For the more than 200 employees who did not opt out, xAI asked that they record 15- to 30-minute conversations, where one employee posed as the potential Grok user and the other posed as the “host.” xAI was specifically looking for “imperfect data,” BI noted, expecting that training only on crystal-clear videos would limit Grok’s ability to interpret a wider range of facial expressions.

xAI’s goal was to help Grok “recognize and analyze facial movements and expressions, such as how people talk, react to others’ conversations, and express themselves in various conditions,” an internal document said. Allegedly among the only guarantees to employees—who likely recognized how sensitive facial data is—was a promise “not to create a digital version of you.”

To get the most out of data submitted by “Skippy” participants, dubbed tutors, xAI recommended that they never provide one-word answers, always ask follow-up questions, and maintain eye contact throughout the conversations.

The company also apparently provided scripts to evoke facial expressions they wanted Grok to understand, suggesting conversation topics like “How do you secretly manipulate people to get your way?” or “Would you ever date someone with a kid or kids?”

For xAI employees who provided facial training data, privacy concerns may still exist, considering that X—the social platform formerly known as Twitter, recently folded into xAI—was targeted by what Elon Musk called a “massive” cyberattack. Because of privacy risks ranging from identity theft to government surveillance, several states have passed strict biometric privacy laws to prevent companies from collecting such data without explicit consent.

xAI did not respond to Ars’ request for comment.

xAI workers balked over training request to help “give Grok a face,” docs show Read More »

mercedes-amg-gives-us-a-ride-in-its-next-high-performance-ev

Mercedes-AMG gives us a ride in its next high-performance EV

The first thing I noticed was the simulated engine noise. It was developed to be unique to AMG.EA, taking inspiration from some of the great AMGs of the past. AMG boss Michael Schiebe tells us that they set up shop outside the offices and had people drive by in various cars to find the right engine and exhaust notes to fit into the creation. It’s a deep, throaty sound.

It’s a sound you can feel

Seriously, I feel something in my seat. The engineer later asks if I notice anything in my seat, and while I can’t confirm what it was adding to the sound—be it a speaker or a motor—it does help make the car feel more alive.

The artificial gearshifts are more than just halting power for a brief period; they’re part of a mapped-out torque curve. Like in the Hyundai, you can feel the acceleration build like you would in a combustion engine. It’s not as prominent as in the Hyundai, but it’s there.

When the car shifts, it feels a lot like a ZF 8-speed automatic in a modern performance car. It’s smooth, but enthusiastic. It’s not as extreme as the Hyundai, but I’d argue the Hyundai driver and the AMG driver are looking for a different experience, with the AMG being a bit more adult.

The AMG also, at least as it sits now, will automatically upshift at redline. The fun thing about the Hyundai is, if you intentionally miss a shift, the car will throw your head into the steering wheel as you hit the artificial rev limiter. It’s hilarious. The vibe from the prototype I’m in is that things are a bit more serious.

Mercedes-AMG gives us a ride in its next high-performance EV Read More »

uk-backing-down-on-apple-encryption-backdoor-after-pressure-from-us

UK backing down on Apple encryption backdoor after pressure from US

Under the terms of the legislation, recipients of such a notice are unable to discuss the matter publicly, even with customers affected by the order, unless granted permission by the Home Secretary.

The legislation’s use against Apple has triggered the tech industry’s highest-profile battle over encryption technology in almost a decade.

In response to the demand, Apple withdrew its most secure cloud storage service from the UK in February and is now challenging the Home Office’s order at the Investigatory Powers Tribunal, which probes complaints against the UK’s security services.

Last month, Meta-owned WhatsApp said it would join Apple’s legal challenge, in a rare collaboration between the Silicon Valley rivals.

In the meantime, the Home Office continues to pursue its case with Apple at the tribunal.

Its lawyers discussed the next legal steps this month, reflecting the divisions within government over how best to proceed. “At this point, the government has not backed down,” said one person familiar with the legal process.

A third senior British official added that the UK government was reluctant to push “anything that looks to the US vice-president like a free-speech issue.”

In a combative speech at the Munich Security Conference in February, Vance argued that free speech and democracy were threatened by European elites.

The UK official added that this “limits what we’re able to do in the future, particularly in relation to AI regulation.” The Labour government has delayed plans for AI legislation until after May next year.

Trump has also been critical of the UK stance on encryption.

The US president has likened the UK’s order to Apple to “something… that you hear about with China,” saying in February that he had told Starmer: “You can’t do this.”

US Director of National Intelligence Tulsi Gabbard has also suggested the order would be an “egregious violation” of Americans’ privacy that risked breaching the two countries’ data agreement.

Apple did not respond to a request for comment. “We have never built a back door or master key to any of our products, and we never will,” Apple said in February.

The UK government did not respond to a request for comment.

A spokesperson for Vance declined to comment.

The Home Office has previously said the UK has “robust safeguards and independent oversight to protect privacy” and that these powers “are only used on an exceptional basis, in relation to the most serious crimes.”

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

UK backing down on Apple encryption backdoor after pressure from US Read More »

after-a-partly-successful-test-flight,-european-firm-eyes-space-station-mission

After a partly successful test flight, European firm eyes space station mission

Last month, the parachutes on Hélène Huby’s small spacecraft failed to deploy, and the vehicle and its cargo crashed into the ocean on Earth.

It was both a success and a failure.

The success was that after Huby founded The Exploration Company in Europe, she managed to move nimbly with the “Mission Possible” spacecraft such that it cost less than $25 million to build and reached space in less than three years. The vehicle ticked off a number of successes in spaceflight before making a controlled descent through the atmosphere.

But at 26 km above the planet, as the spacecraft slowed to Mach one, The Exploration Company lost contact. Huby was not sure how this loss would be received in Europe, where failures in spaceflight have not been traditionally well-tolerated.

“What was interesting is the feedback I got in Europe,” Huby said in an interview this week at the company’s offices in Houston. “The German Space Agency, the French space agency, the European Space Agency said, OK, that’s a great achievement. For the time and money we spent, performing 80 percent of that mission was a good investment.”

No drop tests

After the spacecraft was lost on June 24, the company established an independent investigation team. Huby said it is “99 percent” confirmed there was a problem with the deployment of the parachutes, either the drogue chutes or the main parachutes. The fault was not with the provider of the parachutes themselves, US-based Airborne Systems, but with the company’s own deployment mechanism, she said.

To save time and money, The Exploration Company did not conduct any drop tests. Such a campaign would have added millions of dollars to a program that was trying to be lean, plus a year of schedule to a mission attempting to move fast.

“We made a mistake, basically, to underestimate the risks,” she said. In retrospect, Huby added, the company could have done more testing on the ground.

Now the firm faces a big decision: How to proceed from here. One option is building another small spacecraft, similar to Mission Possible, for testing purposes. But there is limited commonality in the parachute system for this vehicle and the larger Nyx spacecraft the company is building for operational missions. So if the Mission Possible parachutes were to work, that would not guarantee success for Nyx.

After a partly successful test flight, European firm eyes space station mission Read More »

rfk-jr.-wants-to-change-program-that-stopped-vaccine-makers-from-leaving-us-market

RFK Jr. wants to change program that stopped vaccine makers from leaving US market


RFK Jr. is targeting a little-known program that underpins childhood immunizations in the US.

US Secretary of Health and Human Services Robert F. Kennedy Jr. testifies before the Senate Committee on Health, Education, Labor, and Pensions on Capitol Hill on May 20, 2025 in Washington, DC. Credit: Getty | Tasos Katopodis

This story was originally published by ProPublica.

Five months after taking over the federal agency responsible for the health of all Americans, Robert F. Kennedy Jr. wants to overhaul an obscure but vital program that underpins the nation’s childhood immunization system.

Depending on what he does, the results could be catastrophic.

In his crosshairs is the Vaccine Injury Compensation Program, a system designed to provide fair and quick payouts for people who suffer rare but serious side effects from shots—without having to prove that drugmakers were negligent. Congress created the program in the 1980s when lawsuits drove vaccine makers from the market. A special tax on immunizations funds the awards, and manufacturers benefit from legal protections that make it harder to win big-money verdicts against them in civil courts.

Kennedy, who founded an anti-vaccination group and previously accused the pharmaceutical industry of inflicting “unnecessary and risky vaccines” on children for profits, has long argued that the program removes any incentive for the industry to make safe products.

In a recent interview with Tucker Carlson, Kennedy condemned what he called corruption in the program and said he had assigned a team to overhaul it and expand who could seek compensation. He didn’t detail his plans but did repeat the long-debunked claim that vaccines cause autism and suggested, without citing any evidence, that shots could also be responsible for a litany of chronic ailments, from diabetes to narcolepsy.

There are a number of ways he could blow up the program and prompt vaccine makers to stop selling shots in the US, like they did in the 1980s. The trust fund that pays awards, for instance, could run out of money if the government made it easy for Kennedy’s laundry list of common health problems to qualify for payments from the fund.

Or he could pick away at the program one shot at a time. Right now, immunizations routinely recommended for children or pregnant women are covered by the program. Kennedy has the power to drop vaccines from the list, a move that would open up their manufacturers to the kinds of lawsuits that made them flee years ago.

Dr. Eddy Bresnitz, who served as New Jersey’s state epidemiologist and then spent a dozen years as a vaccine executive at Merck, is among those worried.

“If his unstated goal is to basically destroy the vaccine industry, that could do it,” said Bresnitz, who retired from Merck and has consulted for vaccine manufacturers. “I still believe, having worked in the industry, that they care about protecting American health, but they are also for-profit companies with shareholders, and anything that detracts from the bottom line that can be avoided, they will avoid.”

A spokesperson for PhRMA, a US trade group for pharmaceutical companies, told ProPublica in a written statement that upending the Vaccine Injury Compensation Program “would threaten continued patient access to FDA-approved vaccines.”

The spokesperson, Andrew Powaleny, said the program “has compensated thousands of claims while helping ensure the continued availability of a safe and effective vaccine supply. It remains a vital safeguard for public health and importantly doesn’t shield manufacturers from liability.”

Since its inception, the compensation fund has paid about $4.8 billion in awards for harm from serious side effects, such as life-threatening allergic reactions and Guillain-Barré syndrome, an autoimmune condition that can cause paralysis. The federal agency that oversees the program found that for every 1 million doses of vaccine distributed between 2006 and 2023, about one person was compensated for an injury.

Since becoming Health and Human Services secretary, Kennedy has turned the staid world of immunizations on its ear. He reneged on the US government’s pledge to fund vaccinations for the world’s poorest kids. He fired every member of the federal advisory group that recommends which shots Americans get, and his new slate vowed to scrutinize the US childhood immunization schedule. Measles, a vaccine-preventable disease eliminated here in 2000, roared back and hit a grim record—more cases than the US has seen in 33 years, including three deaths. When a US senator asked Kennedy if he recommended measles shots, Kennedy answered, “Senator, if I advised you to swim in a lake that I knew there to be alligators in, wouldn’t you want me to tell you there were alligators in it?”

Fed up, the American Academy of Pediatrics and other medical societies sued Kennedy last week, accusing him of dismantling “the longstanding, Congressionally-authorized, science- and evidence-based vaccine infrastructure that has prevented the deaths of untold millions of Americans.” (The federal government has yet to respond to the suit.)

Just about all drugs have side effects. What’s unusual about vaccines is that they’re given to healthy people—even newborns on their first day of life. And many shots protect not just the individuals receiving them but also the broader community by making it harder for deadly scourges to spread. The Centers for Disease Control and Prevention estimates that routine childhood immunizations have prevented more than 1.1 million deaths and 32 million hospitalizations among the generation of Americans born between 1994 and 2023.

To most people, the nation’s vaccine system feels like a solid, reliable fact of life, doling out shots to children like clockwork. But in reality it is surprisingly fragile.

There are only a handful of companies that make nearly all of the shots children receive. Only one manufacturer makes chickenpox vaccines. And just two or three make the shots that protect against more than a dozen diseases, including polio and measles. If any were to drop out, the country could find itself in the same crisis that led President Ronald Reagan to sign the law creating the Vaccine Injury Compensation Program in 1986.

Back then, pharmaceutical companies faced hundreds of lawsuits alleging that the vaccine protecting kids from whooping cough, diphtheria, and tetanus caused unrelenting seizures that led to severe disabilities. (Today’s version of this shot is different.) One vaccine maker after another left the US market.

At one point, pediatricians could only buy whooping cough vaccines from a single company. Shortages were so bad that the CDC recommended doctors stop giving booster shots to preserve supplies for the most vulnerable babies.

While Congress debated what to do, public health clinics’ cost per dose jumped 5,000 percent in five years.

“We were really concerned that we would lose all vaccines, and we would get major resurgences of vaccine-preventable diseases,” recalled Dr. Walter Orenstein, a vaccine expert who worked in the CDC’s immunization division at the time.

A Forbes headline captured the anxiety of parents, pediatricians, and public health workers: “Scared Shotless.” So a bipartisan group in Congress hammered out the no-fault system.

Today, the program covers vaccines routinely recommended for children or pregnant women once Congress approves the special tax that funds awards. (COVID-19 shots are part of a separate, often-maligned system for handling claims of harm, though Kennedy has said he’s looking at ways to add them to the Vaccine Injury Compensation Program.)

Under program rules, people who say they are harmed by covered vaccines can’t head straight to civil court to sue manufacturers. First, they have to go through the no-fault system. The law established a table of injuries and the time frame for when those conditions must have appeared in order to be considered for quicker payouts. A tax on those vaccines — now 75 cents for every disease that a shot protects against — flows into a trust fund that pays those approved for awards. Win or lose, the program, for the most part, pays attorney fees and forbids lawyers from taking a cut of the money paid to the injured.

The law set up a dedicated vaccine court where government officials known as special masters, who operate like judges, rule on cases without juries. People can ask for compensation for health problems not listed on the injury table, and they don’t have to prove that the vaccine maker was negligent or failed to warn them about the medical condition they wound up with. At the same time, they can’t claim punitive damages, which drive up payouts in civil courts, and pain and suffering payments are capped at $250,000.

Plaintiffs who aren’t satisfied with the outcome or whose cases drag on too long can exit the program and file their cases in traditional civil courts. There they can pursue punitive damages, contingency-fee agreements with lawyers and the usual evidence gathering that plaintiffs use to hold companies accountable for wrongdoing.

But a Supreme Court ruling, interpreting the law that created the Vaccine Injury Compensation Program, limited the kinds of claims that can prevail in civil court. So while the program isn’t a full liability shield for vaccine makers, its very existence significantly narrows the cases trial lawyers can file.

Kennedy has been involved in such civil litigation. In his federal disclosures, he revealed that he referred plaintiffs to a law firm filing cases against Merck over its HPV shot in exchange for a 10 percent cut of the fees if they win. After a heated exchange with Sen. Elizabeth Warren during his confirmation proceedings, Kennedy said his share of any money from those cases would instead go to one of his adult sons, who he later said is a lawyer in California. His son Conor works as an attorney at the Los Angeles law firm benefiting from his referrals. When ProPublica asked about this arrangement, Conor Kennedy wrote, “I don’t work on those cases and I’m not receiving any money from them.”

In March, a North Carolina federal judge overseeing hundreds of cases that alleged Merck failed to warn patients about serious side effects from its HPV vaccine ruled in favor of Merck; an appeal is pending.

The Vaccine Injury Compensation Program succeeded in stabilizing the business of childhood vaccines, with many more shots developed and approved in the decades since it was established. But even ardent supporters acknowledge there are problems. The program’s staff levels haven’t kept up with the caseload. The law capped the number of special masters at eight, and congressional bills to increase that have failed. An influx of adult claims swamped the system after adverse reactions to flu shots became eligible for compensation in 2005 and serious shoulder problems were added to the injury table in 2017.

The quick and smooth system of payouts originally envisioned has evolved into a more adversarial one with lawyers for the Department of Justice duking it out with plaintiffs’ attorneys, which Kennedy says runs counter to the program’s intent. Many cases drag on for years.

In his recent interview with Carlson, he described “the lawyers of the Department of Justice, the leaders of it” working on the cases as corrupt. “They saw their job as protecting the trust fund rather than taking care of people who made this national sacrifice, and we’re going to change all that,” he said. “And I’ve brought in a team this week that is starting to work on that.”

The system is “supposed to be generous and fast and gives a tie to the runner,” he told Carlson. “In other words, if there’s doubts about, you know, whether somebody’s injury came from a vaccine or not, you’re going to assume they got it and compensate them.”

Kennedy didn’t identify who is on the team reviewing the program. At one point in the interview, he said, “We just brought a guy in this week who’s going to be revolutionizing the Vaccine Injury Compensation Program.”

The HHS employee directory now lists Andrew Downing as a counselor working in Kennedy’s office. Downing for many years has filed claims with the program and suits in civil courts on behalf of clients alleging harm from shots. Last month, HHS awarded a contract for “Vaccine Injury Compensation Program expertise” to Downing’s firm, as NOTUS has reported.

Downing did not respond to a voicemail left at his law office. HHS didn’t reply to a request to make him and Kennedy available for an interview and declined to answer detailed questions about its plans for the Vaccine Injury Compensation Program. In the past, an HHS spokesperson has said that Kennedy is “not anti-vaccine—he is pro-safety.”

While it’s not clear what changes Downing and Kennedy have in mind, Kennedy’s interview with Carlson offered some insights. Kennedy said he was working to expand the program’s three-year statute of limitations so that more people can be compensated. Downing has complained that patients who have certain autoimmune disorders don’t realize their ailments were caused by a vaccine until it’s too late to file. Congress would have to change the law to allow this, experts said.

A key issue is whether Kennedy will try to add new ailments to the list of injuries that qualify for quicker awards.

In the Carlson interview, Kennedy dismissed the many studies and scientific consensus that shots don’t cause autism as nothing more than statistical trickery. “We’re going to do real science,” Kennedy said.

The vaccine court spent years in the 2000s trying cases that alleged autism was caused by the vaccine ingredient thimerosal and the shot that protects people from measles, mumps, and rubella. Facing more than 5,000 claims, the court asked a committee of attorneys representing children with autism to pick test cases that represented themes common in the broader group. In the cases that went to trial, the special masters considered more than 900 medical articles and heard testimony from dozens of experts. In each of those cases, the special masters found that the shots didn’t cause autism.

In at least two subsequent cases, children with autism were granted compensation because they met the criteria listed in the program’s injury table, according to a vaccine court decision. That table, for instance, lists certain forms of encephalopathy—a type of brain dysfunction—as a rare side effect of shots that protect people from whooping cough, measles, mumps, and rubella. In a 2016 vaccine court ruling, Special Master George L. Hastings Jr. explained, “The compensation of these two cases, thus does not afford any support to the notion that vaccinations can contribute to the causation of autism.”

Hastings noted that when Congress set up the injury table, the lawmakers acknowledged that people would get compensated for “some injuries that were not, in fact, truly vaccine-caused.”

Many disabling neurological disorders in children become apparent around the time kids get their shots. Figuring out whether the timing was coincidental or an indication that the vaccines caused the problem has been a huge challenge.

Devastating seizures in young children were the impetus for the compensation program. But in the mid-1990s, after a yearslong review of the evidence, HHS removed seizure disorder from the injury table and narrowed the type of encephalopathy that would automatically qualify for compensation. Scientists subsequently have discovered genetic mutations that cause some of the most severe forms of epilepsy.

What’s different now, though, is that Kennedy, as HHS secretary, has the power to add autism or other disorders to that injury table. Experts say he’d have to go through the federal government’s cumbersome rulemaking process to do so. He could also lean on federal employees to green-light more claims.

In addition, Kennedy has made it clear he’s thinking about illnesses beyond autism. “We have now this epidemic of immune dysregulation in our country, and there’s no way to rule out vaccines as one of the key culprits,” he told Carlson. Kennedy mentioned diabetes, rheumatoid arthritis, seizure disorders, ADHD, speech delay, language delay, tics, Tourette syndrome, narcolepsy, peanut allergies, and eczema.

President Donald Trump’s budget estimated that the value of the investments in the Vaccine Injury Compensation Program trust fund could reach $4.8 billion this year. While that’s a lot of money, a life-care plan for a child with severe autism can cost tens of millions of dollars, and the CDC reported in April that 1 in 31 children is diagnosed with autism by their 8th birthday. The other illnesses Kennedy mentioned also affect a wide swath of the US population.

Dr. Paul Offit, a co-inventor of a rotavirus vaccine and director of the Vaccine Education Center at Children’s Hospital of Philadelphia, for years has sparred with Kennedy over vaccines. Offit fears that Kennedy will use flawed studies to justify adding autism and other common medical problems to the injury table, no matter how much they conflict with robust scientific research.

“You can do that, and you will bankrupt the program,” he said. “These are ways to end vaccine manufacturing in this country.”

If the trust fund were to run out of money, Congress would have to act, said Dorit Reiss, a law professor at University of California Law San Francisco who has studied the Vaccine Injury Compensation Program. Congress could increase the excise tax on vaccines, she said, or pass a law limiting what’s on the injury table. Or Congress could abolish the program, and the vaccine makers would find themselves back in the situation they faced in the 1980s.

“That’s not unrealistic,” Reiss said.

Rep. Paul Gosar, an Arizona Republican, last year proposed the End the Vaccine Carveout Act, which would have allowed people to bypass the no-fault system and head straight to civil court. His press release for the bill—written in September, before Kennedy’s ascension to HHS secretary—quoted Kennedy saying, “If we want safe and effective vaccines, we need to end the liability shield.”

The legislation never came up for a vote. A spokesperson for the congressman said he expects to introduce it again “in the very near future.”

Renée Gentry, director of the George Washington University Law School’s Vaccine Injury Litigation Clinic, thinks it’s unlikely Congress will blow up the no-fault program. But Gentry, who represents people filing claims for injuries, said it’s hard to predict what Congress, faced with a doomsday scenario, would do.

“Normally Democrats are friends of plaintiffs’ lawyers,” she said. “But talking about vaccines on the Hill is like walking on a razor blade that’s on fire.”


RFK Jr. wants to change program that stopped vaccine makers from leaving US market Read More »