Author name: Kris Guyer


North Korean hackers ran US-based “laptop farm” from Arizona woman’s home

As the number of computers mounted, Chapman began stacking them on shelves around her residence, labeling them with sticky notes so she could remember which “worker” and company controlled which machine. When Chapman’s home was searched, FBI agents took photos of her setup, which is… something to behold, really.

Chapman’s origin story is a sad one. According to her public defender, her childhood was marked by “her father’s infidelity, alcoholism, and emotional absence.” Chapman was placed in 12 different schools across multiple states before she graduated high school, “leaving her socially isolated, bullied, and unable to form lasting friendships or a sense of belonging.” She also suffered “severe and escalating violence from her older brother, who repeatedly beat and choked her, held a shotgun to her chest, and once left her so visibly bruised that her school intervened.” And she was “sexually abused at various points in her childhood and adolescence by family members, peers, and even individuals she believed to be friends.”

Unfortunately, Chapman’s poor choice to involve herself with the North Koreans inflicted plenty of pain on others, too, including those whose identity was stolen. One victim told the court that the crime “left me feeling violated, helpless, and afraid,” adding:

Although identity theft is not a physical assault, the psychological and financial damage is lasting. It feels like someone broke into my life, impersonated me, and left me to pick up the pieces. There is a lingering fear that my information is still out there, ready to be misused again. The stigma of being a fraud victim also weighs heavily; I have had to explain myself to banks, creditors, and sometimes even to people I know. There is an ongoing sense of vulnerability and lack of control.

In addition to her 8.5-year sentence, Chapman will serve three years of “supervised release,” must forfeit $284,555 that was meant for the North Koreans, and must repay $176,850 of her own money.

Such “remote work” scams have become increasingly common over the last few years, most originating from North Korea, and the FBI has released repeated guidance on what to look for when hiring remote workers.



Going chain-free with the Priority Gemini gravel bike

It’s no speed demon—if you’re looking for something to race, this isn’t it, as the Gemini is designed more for long, leisurely gravel rides. It was fantastic on a 68-mile jaunt down the Kal-Haven Trail with Space Editor Eric Berger, but it’s not the bike for exploring those single-track paths leading off the main trail.

It’s almost the perfect commuter bike. The slightly flared (6-degree) drop bars help the rider settle into a comfortable riding position, and the belt drive is nearly silent. Best of all, we were able to hose the bike off after a wet or muddy ride without having to dry off and lubricate a chain. That’s a big win for winter and springtime commuting. The Gemini also has rack mounts (in addition to two water bottle cage mounts and mounts for a top tube bag) for bikepacking or commuting.

We noticed a couple of irritants with the Gemini. First, there’s no way to tell which gear you’re in. Well, that’s not entirely true—if you have your phone in a mount with the Pinion app running, you can see your gearing. Unfortunately, the Pinion gearbox won’t pair with a cycling computer, so there’s no way to check your gearing as is possible with electronic groupsets from SRAM and Shimano.

Shifting could be touchy at times. The belt drive is sensitive to excessive torque, so we found that shifts sometimes did not register when pedaling too hard. During our testing, we learned to ease up when shifting, but we found this mildly annoying at times. Lastly, the charge port for the gearbox is on the end of a wire that just flops around in an annoying and unsightly fashion.

Annoyances to be sure, but these are far from deal-breakers for this bicycle. Being able to tweak the shifting and make minor adjustments via an app is a useful feature to have, and the belt drive makes cleanup after dirty rides a breeze. The 600 percent gear range will no doubt help on big climbs (not that we have many of those in Chicagoland). At $3,499, you’ll be paying more for this than the lower-end gravel offerings from the likes of Trek and Specialized. But if you’re looking for a low-maintenance daily driver, the Gemini Gravel fits the bill.



Google’s new “Web Guide” will use AI to organize your search results

Web Guide is halfway between normal search and AI Mode. Credit: Google

Google suggests trying Web Guide with longer or open-ended queries, like “how to solo travel in Japan.” The video below uses that search as an example. It has many of the links you might expect, but there are also AI-generated headings with summaries and suggestions. It really looks halfway between standard search and AI Mode. Because it has to run additional searches and generate content, Web Guide takes a beat longer to produce results compared to a standard search. There’s no AI Overview at the top, though.

Web Guide is a Search Labs experiment, meaning you have to opt in before you’ll see any AI organization in your search results. When enabled, this feature takes over the “Web” tab of Google search. Even if you turn it on, Google notes there will be a toggle that allows you to revert to the normal, non-AI-optimized page.

An example of the Web Guide test.

Eventually, the test will expand to encompass more parts of the search experience, like the “All” tab—that’s the default search experience when you input a query from a browser or phone search bar. Google says it’s approaching this as an opt-in feature to start. So that sounds like Web Guide might be another AI Mode situation in which the feature rolls out widely after a short testing period. It’s technically possible the test will not result in a new universal search feature, but Google hasn’t yet met a generative AI implementation that it hasn’t liked.



Trump’s order to make chatbots anti-woke is unconstitutional, senator says


Trump plans to use chatbots to eliminate dissent, a senator alleged.

The CEOs of every major artificial intelligence company received letters Wednesday urging them to fight Donald Trump’s anti-woke AI order.

Trump’s executive order requires any AI company hoping to contract with the federal government to jump through two hoops to win funding. First, they must prove their AI systems are “truth-seeking”—with outputs based on “historical accuracy, scientific inquiry, and objectivity” or else acknowledge when facts are uncertain. Second, they must train AI models to be “neutral,” which is vaguely defined as not favoring DEI (diversity, equity, and inclusion), “dogmas,” or otherwise being “intentionally encoded” to produce “partisan or ideological judgments” in outputs “unless those judgments are prompted by or otherwise readily accessible to the end user.”

Announcing the order in a speech, Trump said that the US winning the AI race depended on removing allegedly liberal biases, proclaiming that “once and for all, we are getting rid of woke.”

“The American people do not want woke Marxist lunacy in the AI models, and neither do other countries,” Trump said.

Senator Ed Markey (D-Mass.) accused Republicans of basing their policies on feelings, not facts, joining critics who suggest that AI isn’t “woke” just because of a few “anecdotal” outputs that reflect a liberal bias. And he suggested it was hypocritical that Trump’s order “ignores even more egregious evidence” that contradicts claims that AI is trained to be woke, such as xAI’s Elon Musk explicitly confirming that Grok was trained to be more right-wing.

“On May 1, 2025, Grok—the AI chatbot developed by xAI, Elon Musk’s AI company—acknowledged that ‘xAI tried to train me to appeal to the right,’” Markey wrote in his letters to tech giants. “If OpenAI’s ChatGPT or Google’s Gemini had responded that it was trained to appeal to the left, congressional Republicans would have been outraged and opened an investigation. Instead, they were silent.”

He warned the heads of Alphabet, Anthropic, Meta, Microsoft, OpenAI, and xAI that Trump’s AI agenda was allegedly “an authoritarian power grab” intended to “eliminate dissent” and was both “dangerous” and “patently unconstitutional.”

Even if companies’ AI models are clearly biased, Markey argued that “Republicans are using state power to pressure private companies to adopt certain political viewpoints,” which he claimed is a clear violation of the First Amendment. If AI makers cave, Markey warned, they’d be allowing Trump to create “significant financial incentives” to ensure that “their AI chatbots do not produce speech that would upset the Trump administration.”

“This type of interference with private speech is precisely why the US Constitution has a First Amendment,” Markey wrote, while claiming that Trump’s order is factually baseless.

It’s “based on the erroneous belief that today’s AI chatbots are ‘woke’ and biased against Trump,” Markey said, urging companies “to fight this unconstitutional executive order and not become a pawn in Trump’s effort to eliminate dissent in this country.”

One big reason AI companies may fight order

Some experts agreed with Markey that Trump’s order was likely unconstitutional or otherwise unlawful, The New York Times reported.

For example, Trump may struggle to convince courts that the government isn’t impermissibly interfering with AI companies’ protected speech or that such interference may be necessary to ensure federal procurement of unbiased AI systems.

Genevieve Lakier, a law professor at the University of Chicago, told the NYT that the lack of clarity around what makes a model biased could be a problem. Courts could deem the order an act of “unconstitutional jawboning,” with the Trump administration and Republicans generally perceived as using legal threats to pressure private companies into producing outputs that they like.

Lakier suggested that AI companies may be so motivated to win government contracts or intimidated by possible retaliation from Trump that they may not even challenge the order, though.

Markey is hoping that AI companies will refuse to comply with the order, despite recognizing that it places companies “in a difficult position: Either stand on your principles and face the wrath of the Trump administration or cave to Trump and modify your company’s political speech.”

There is one big reason why AI companies may fight the order, though.

Oren Etzioni, the former CEO of the AI research nonprofit Allen Institute for Artificial Intelligence, told CNN that Trump’s anti-woke AI order may contradict the top priority of his AI Action Plan—speeding up AI innovation in the US—and actually threaten to hamper innovation.

If AI developers struggle to produce what the Trump administration considers “neutral” outputs—a technical challenge that experts agree is not straightforward—that could delay model advancements.

“This type of thing… creates all kinds of concerns and liability and complexity for the people developing these models—all of a sudden, they have to slow down,” Etzioni told CNN.

Senator: Grok scandal spotlights GOP hypocrisy

Some experts have suggested that rather than chatbots adopting liberal viewpoints, chatbots are instead possibly filtering out conservative misinformation and unintentionally appearing to favor liberal views.

Andrew Hall, a professor of political economy at Stanford Graduate School of Business—who published a May paper finding that “Americans view responses from certain popular AI models as being slanted to the left”—told CNN that “tech companies may have put extra guardrails in place to prevent their chatbots from producing content that could be deemed offensive.”

Markey seemed to agree, writing that Republicans’ “selective outrage matches conservatives’ similar refusal to acknowledge that the Big Tech platforms suspend or impose other penalties disproportionately on conservative users because those users are disproportionately likely to share misinformation, rather than due to any political bias by the platforms.”

It remains unclear what amount of supposed bias detected in outputs could cause a contract bid to be rejected or an ongoing contract to be canceled, but AI companies will likely be on the hook to pay any fees in terminating contracts.

Complying with Trump’s order could pose a struggle for AI makers for several reasons. First, they’ll have to determine what’s fact and what’s ideology, contending with conflicting government standards in how Trump defines DEI. For example, the president’s order counts among “pervasive and destructive” DEI ideologies any outputs that align with long-standing federal protections against discrimination on the basis of race or sex. In addition, they must figure out what counts as “suppression or distortion of factual information about” historical topics like critical race theory, systemic racism, or transgenderism.

The examples in Trump’s order highlighting outputs offensive to conservatives seem inconsequential. He calls out as problematic image generators depicting the Pope, the Founding Fathers, and Vikings as not white, as well as models refusing to misgender a person “even if necessary to stop a nuclear apocalypse” or to show white people celebrating their achievements.

It’s hard to imagine how these kinds of flawed outputs could impact government processes, as compared to, say, government contracts granted to models that could be hiding covert racism or sexism.

So far, there has been one example of an AI model that displayed a right-wing bias earning a government contract, with no red flags raised about its outputs.

Earlier this summer, Grok shocked the world after Musk announced he would be updating the bot to eliminate a supposed liberal bias. The unhinged chatbot began spouting offensive outputs, including antisemitic posts that praised Hitler as well as proclaiming itself “MechaHitler.”

But those obvious biases did not conflict with the Pentagon’s decision to grant xAI a $200 million federal contract. In a statement, a Pentagon spokesperson insisted that “the antisemitism episode wasn’t enough to disqualify” xAI, NBC News reported, partly since “several frontier AI models have produced questionable outputs.”

The Pentagon’s statement suggested that the government expected to deal with such risks while seizing the opportunity of rapidly deploying emerging AI technology into government prototype processes. And perhaps notably, Trump provides a carveout for any agencies using AI models to safeguard national security, which could exclude the Pentagon from experiencing any “anti-woke” delays in accessing frontier models.

But that won’t help other agencies that must figure out how to assess models to meet anti-woke AI requirements over the next few months. And those assessments could cause delays that Trump may wish to avoid in pushing for widespread AI adoption across government.

Trump’s anti-woke AI agenda may be impossible

On the same day that Trump issued his anti-woke AI order, his AI Action Plan promised an AI “renaissance” fueling “intellectual achievements” by “unraveling ancient scrolls once thought unreadable, making breakthroughs in scientific and mathematical theory, and creating new kinds of digital and physical art.”

To achieve that, the US must “innovate faster and more comprehensively than our competitors” and eliminate regulatory barriers impeding innovation in order to “set the gold standard for AI worldwide.”

However, achieving the anti-woke ambitions of both orders raises a technical problem that even the president must accept currently has no solution. In his AI Action Plan, Trump acknowledged that “the inner workings of frontier AI systems are poorly understood,” with even “advanced technologists” unable to explain “why a model produced a specific output.”

Whether requiring AI companies to explain their AI outputs to win government contracts will mess with other parts of Trump’s action plan remains to be seen. But Samir Jain, vice president of policy at a civil liberties group called the Center for Democracy and Technology, told the NYT that he predicts the anti-woke AI agenda will set “a really vague standard that’s going to be impossible for providers to meet.”




White House unveils sweeping plan to “win” global AI race through deregulation

Trump’s plan was not welcomed by everyone. J.B. Branch, Big Tech accountability advocate for Public Citizen, in a statement provided to Ars, criticized Trump as giving “sweetheart deals” to tech companies that would cause “electricity bills to rise to subsidize discounted power for massive AI data centers.”

Infrastructure demands and energy requirements

Trump’s new AI plan tackles infrastructure head-on, stating that “AI is the first digital service in modern life that challenges America to build vastly greater energy generation than we have today.” To meet this demand, it proposes streamlining environmental permitting for data centers through new National Environmental Policy Act (NEPA) exemptions, making federal lands available for construction and modernizing the power grid—all while explicitly rejecting “radical climate dogma and bureaucratic red tape.”

The document embraces what it calls a “Build, Baby, Build!” approach—echoing a Trump campaign slogan—and promises to restore semiconductor manufacturing through the CHIPS Program Office, though stripped of “extraneous policy requirements.”

On the technology front, the plan directs Commerce to revise NIST’s AI Risk Management Framework to “eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change.” Federal procurement would favor AI developers whose systems are “objective and free from top-down ideological bias.” The document strongly backs open source AI models and calls for exporting American AI technology to allies while blocking administration-labeled adversaries like China.

Security proposals include high-security military data centers and warnings that advanced AI systems “may pose novel national security risks” in cyberattacks and weapons development.

Critics respond with “People’s AI Action Plan”

Before the White House unveiled its plan, more than 90 organizations launched a competing “People’s AI Action Plan” on Tuesday, characterizing the Trump administration’s approach as “a massive handout to the tech industry” that prioritizes corporate interests over public welfare. The coalition includes labor unions, environmental justice groups, and consumer protection nonprofits.



Ukrainians arrest alleged admin of major crime forum XSS

Yesterday, Ukrainian authorities arrested the suspected administrator of a notorious Russian-language crime forum, XSS.is.

In an X post, the Paris Prosecutor’s Office announced that Ukrainian authorities detained the suspect after an investigation that began almost exactly four years ago, conducted with the help of French authorities and Europol.

XSS has been “one of the main hubs of global cybercrime” since 2013, French authorities said, allowing “the sale of malware, access to compromised systems, stolen data, and ransomware-related services.”

Used by criminals globally to cover up illicit activity, the forum was shut down soon after the admin’s arrest.

The suspected admin has so far not been named. But police said the suspect was identified after authorities began intercepting encrypted chats sent on a Jabber messaging server that members used, “thesecure.biz.”

Surveilling chats between forum users, the government eventually intercepted a message that tipped authorities off to the alleged admin’s identity back in September. Soon after, they deployed agents to find the admin, and ultimately, it took months for Ukrainian authorities to make the arrest, with both French and Europol authorities present.

“The intercepted messages revealed numerous illicit activities related to cybercrime and ransomware, and established that they generated at least $7 million in profits,” a translation of the press release said.



Audi has a new midsize EV, and we’ve driven it: The 2025 A6 Sportback

Audi S6 drives on a straight road past vineyards

Long straight roads glide underneath. Credit: Audi

The car’s cabin layout and ergonomics are starting to feel familiar at this point—it shares much not only with the electric Q6 e-tron but also Audi’s new midsize combustion cars, the A5 and Q5. (We’ll leave for now the fact that a combustion A6, unrelated to today’s vehicle in virtually all but name, is also in development, bringing an end to the “odd numbers for ICE, even numbers for EV” convention that briefly took hold at the automaker. Now nameplate chaos reigns.)

Hey Audi…

The voice control proved a frustrating alternative to using the touchscreen, with a lot of “I’m sorry I can’t do that” and “can you ask me that again” for commands that I’m pretty sure ought to have worked. But both the A6 and S6 felt mature in terms of software, something that wasn’t true for the same infotainment platform a year ago. I remain frustrated with how limited the UI options remain for the main instrument display, however.

I keep writing this, but Audi pioneered the use of high-resolution digital displays instead of analog dials and gave owners quite a lot of choice, including the option of a moving map for navigation. Now, there’s a way to make the display very minimal, which would be useful at night, but otherwise, you’re extremely limited in what you can display in front of you. The optional full-color heads-up display has the same augmented-reality direction tech that we’ve seen in other luxury cars, and it remains helpful when driving on unfamiliar roads, although that requires using the native navigation app; Apple CarPlay users should still see turn-by-turn directions on the HUD, though.

The layout is starting to become familiar. Credit: Audi

There’s no true one-pedal driving mode, just a choice between B—0.25 G of lift-off regeneration deceleration—and D, which can be toggled between none, 0.06 G, and 0.15 G of lift-off regen braking using the paddles behind the steering wheel. B is preferable when the road turns twisty, something both the A6 and S6 coped with surprisingly well. Hairpins showed the steering and suspension to be quick enough to rotate the car rapidly, and what initially felt like numb steering began to reveal some information about road surfaces and available grip as the surface changed and then changed again. There’s also a noticeable difference between the drive modes. Comfort feels a little soft and wallowing, Dynamic effectively transfers more bumps into the cabin, and Balanced is a rather good midpoint between the two, and where I spent most of my time. I should also note the lack of fatigue I felt despite a full day behind the wheel of both cars.



Toy company may regret coming for “Sylvanian Drama” TikToker, experts say


Possible legal paths to revive a shuttered video series on TikTok and Instagram.

A popular account on TikTok and Instagram stopped posting suddenly at the end of last year, hit by a lawsuit after garnering millions of views on funny videos it made using adorable children’s Calico Critter dolls to act out dark, cringe-y adult storylines.

While millions of followers mourn the so-called “Sylvanian Drama” account’s demise, experts told Ars that the creator may have a decent chance at beating the lawsuit.

The “Sylvanian Drama” account derived its name from “Sylvanian Families,” a brand name used by Epoch Company Ltd., the maker of Calico Critters, for its iconic fuzzy animal dolls in some markets outside the US. Despite these videos referencing murder, drugs, and hookups, the toy company apparently had no problem, until the account, managed by Ireland-based Thea Von Engelbrechten, started accepting big brand partnerships and making sponsored content featuring the dolls.

Since Epoch, too, strikes partnerships with brands and influencers to promote its own videos marketing the dolls, the company claimed “Sylvanian Drama” risked creating too much confusion online. It also worried viewers would think Epoch had signed off on the videos, since the sponsored content was marked “paid partnership” without specifying precisely which featured brands had paid for the spots. It further accused Von Engelbrechten of building her advertising business around its brand without any attempt to properly license the dolls, while allegedly usurping licensing opportunities from Epoch.

So far, Von Engelbrechten has delayed responding in the lawsuit. As the account remained inactive over the past few months, fans speculated whether it could survive the lawsuit, which raised copyright and trademark infringement claims to get all the videos removed. In its complaint, the toy company requested not only an injunction preventing Von Engelbrechten from creating more “Sylvanian Drama” videos but also all of her profits from her online accounts, in addition to further damages.

Von Engelbrechten declined Ars’ request to provide an update on her defense in the case, but her response is due in early August. That filing will make clear what arguments she may make to overcome Epoch’s suit, but legal experts told Ars that the case isn’t necessarily a slam dunk for the toy company. So all that “Sylvanian Drama” isn’t over just yet.

Epoch’s lawyers did not respond to Ars’ request to comment.

“Sylvanian Drama” needs the court to get the joke

Epoch raised copyright infringement claims that could hit Von Engelbrechten with fines of up to $150,000 per violation.

For Von Engelbrechten to defeat the copyright infringement claim, she’ll need to convince the court that her videos are parodies. A law professor at Santa Clara University School of Law, Eric Goldman, told Ars that her videos may qualify since “even if they don’t expressly reference Epoch’s offerings by name, the videos intentionally communicate a jarring juxtaposition of adorable critters who are important parts of pop culture living through the darker sides of humanity.”

Basically, Von Engelbrechten will need the court to understand the humor in her videos to win on that claim, Rebecca Tushnet, a First Amendment law professor at Harvard Law School, told Ars.

“Courts have varied in their treatment of parodies; the complaint’s definition of parody is not controlling but humor is one of the hardest things to predict—if the court gets the joke, it will be more likely to say that the juxtaposition between the storylines and the innocent appearance of the dolls is parodic,” Tushnet said.

But if the court does get the joke, Goldman suggested that even the sponsored content—which hilariously incorporates product placements from various big brands like Marc Jacobs, Taco Bell, Hilton, and Sephora into storylines—could possibly be characterized as parody.

However, “the fact that the social media posts were labeled #ad will make it extremely difficult for the artist to contest the videos’ status as ads,” Goldman said.

Ultimately, Goldman said that Epoch’s lawsuit “raises a host of complex legal issues” and is “not an easy case on either side.”

And one of the most significant issues that Epoch may face in the courtroom could end up gutting all of its trademark infringement claims that supposedly entitle the toy company to all of Von Engelbrechten’s profits, Alexandra Jane Roberts, a Northeastern University professor of law and media with special expertise in trademark law, told Ars.

Calico Critters may stumble on trademark hurdle

The toy company has raised several trademark infringement claims, all of which depend on Epoch proving that Von Engelbrechten “knowingly and willfully” used its trademarks without permission.

However, Roberts pointed out to Ars that Epoch has no trademarks for its iconic dolls, relying only on common law to assert sole rights to the “look and design of the critters.”

It’s likely impossible for Epoch to trademark the dolls, since trademarks are not intended to block competition, and there are only so many ways to design cute dolls that resemble cats or bunnies, Roberts suggested. A court may decide “there’s only so many ways to make a small fuzzy bunny that doesn’t look like this,” potentially narrowing the rights Epoch has under trade dress, a term that Epoch doesn’t use once in its complaint.

Roberts told Ars that Epoch’s trademark claims are “not so far off the mark,” and Von Engelbrechten’s defense was certainly not strengthened by her decision to monetize the content. Prior cases, like the indie band OK Go sending a cease-and-desist to Post cereal over a breakfast product called “OK Go” due to fears of false endorsement, make it clear that courts have agreed in the past that online collaborations have muddied the waters regarding who is the actual source of content for viewers.

“The question becomes whether people are going to see these videos, even though they’re snarky, and even though they’re silly and think, ‘Oh, Calico Critters must have signed off on this,'” Roberts said. “So the argument about consumer confusion, I think, is a plausible argument.”

However, if Epoch fails to convince the court that its trademarks have been infringed, then its other claims alleging false endorsement and unfair competition would likely also collapse.

“You can still get sometimes to unfair competition or to kind of like a false endorsement, but it’s harder to win on those claims and certainly harder to get damages on those claims,” Roberts said. “You don’t get trademark infringement if you don’t have a trademark.”

Possible defenses to keep “Sylvanian Drama” alive

Winning on the trademark claims may not be easy for Von Engelbrechten, who possibly weakened her First Amendment defense by creating the sponsored content. Regardless, she will likely try to convince the court to view the videos as parody, which is a slightly different analysis under trademark law than copyright’s more well-known fair use parody exceptions.

That could be a struggle, since trademark law requires that Von Engelbrechten’s parody videos directly satirize the “Sylvanian Families” brand, and “Sylvanian Drama” videos, even the ads, instead seem to be “making fun of elements of society and culture,” rather than the dolls themselves, Roberts said.

She pointed to winning cases involving the Barbie trademark as an instructive example. In a case disputing Mattel trademarks used in the lyrics of the one-hit wonder “Barbie Girl,” the song was cleared of trademark infringement as a “purely expressive work” that directly parodies Barbie in the lyrics. And in another case, where an artist, Tom Forsythe, captured photos of Barbie dolls in kitchen vessels like a blender or a margarita glass, more robust First Amendment protection was offered since his photos “had a lot to say about sexism and the dolls and what the dolls represent,” Roberts said.

The potential “Sylvanian Drama” defense seems to lack strong go-to arguments that typically win trademark cases, but Roberts said there is still one other defense the content creator may be weighing.

Under “nominative fair use,” it’s OK to use another company’s trademark if it’s necessary in an ad. Roberts provided examples, like a company renting Lexus cars needing to use that trademark or comparative advertising using Tiffany’s diamonds as a reference point to hype their lower prices.

If Von Engelbrechten goes that route, she will need to prove she used “no more of the mark than is necessary” and did not mislead fans on whether Epoch signed off on the use.

“Here it’s hard to say that ‘Sylvanian Drama’ really needed to use so much of those characters and that they didn’t use more than they needed and that they weren’t misleading,” Roberts said.

However, Von Engelbrechten’s best bet might be arguing that there was no confusion, since “Sylvanian Families” isn’t even a brand that’s used in the US, which is where Epoch chose to file its lawsuit because the brands that partnered with the popular account are based in New York. And the case may not even get that far, Roberts suggested, since “before you can get to those questions about the likelihood of confusion, you have to show that you actually have trademark or trade dress rights to enforce.”

Calico Critters creator may face millennial backlash

Epoch may come to regret filing the lawsuit, Roberts said, noting that as a millennial who grew up a big “Hello Kitty” fan, she still buys merch that appeals to her, and Epoch likely knows about that market, as it has done collaborations with the “Hello Kitty” brand. The toymaker could risk alienating other millennials nostalgic for Calico Critters who may be among the “Sylvanian Drama” audience and feel turned off by the lawsuit.

“When you draw attention to something like this and appear litigious, and that you’re coming after a creator who a lot of people really like and really enjoy and probably feel defensive about, like, ‘Oh, she’s just making these funny videos that everyone loves. Why would you want to sue her?'” Roberts said, “that can be really bad press.”

Goldman suggested that Epoch might be better off striking a deal with the creator, which “could establish some boundaries for the artist to keep going without stepping on the IP owner’s rights.” But he noted that “often IP owners in these situations are not open to negotiation,” and “that requires courts to draw difficult and unpredictable lines about the permissible scope of fair use.”

For Von Engelbrechten, the lawsuit may mean that her days of creating “Sylvanian Drama”-sponsored content are over, which could risk crushing a bigger dream she had to succeed in advertising. However, if the lawsuit can be amicably settled, the beloved content creator could also end up making money for Epoch, considering her brand deals appeared to be bigger.

While she seems to take her advertising business seriously, Von Engelbrechten’s videos often joke about legal consequences, such as one where a cat doll says she cannot go to a party because she’s in jail but says “I’ll figure it out” when told her ex will be attending. Perhaps Von Engelbrechten is currently devising a scheme, like her characters, to escape consequences and keep the “Sylvanian Drama” going.

“Maybe if this company were really smart, they would want to hire this person instead of suing them,” Roberts said.




Google and OpenAI Get 2025 IMO Gold

Congratulations, as always, to everyone who got to participate in the 2025 International Mathematical Olympiad, and especially to the gold and other medalists. Gautham Kamath highlights 11th grader Warren Bei, who in his 5th (!) IMO was one of five participants with a perfect 42/42 score, along with Ivan Chasovskikh, Satoshi Kano, Leyan Deng and Hengye Zhang.

Samuel Albanie: Massive respect to the students who solved P6.

Congratulations to Team USA, you did not ‘beat China’ but 2nd place is still awesome. Great job, China, you got us this time, three perfect scores is crazy.

You’ve all done a fantastic, amazingly hard thing, and as someone who tried hard to join you and only got as far as the [, year censored because oh man I am old] USAMO and would probably have gotten 0/42 on this IMO if I had taken it today, and know what it is like to practice for the USAMO in a room with multiple future IMO team members that must have thought I was an idiot, let me say: I am always in awe.

But that’s not important right now.

What matters is that Google and OpenAI have LLMs with gold medal performances, each scoring exactly the threshold of 35/42 by solving the first five of the six problems.

This is up from Google’s 28/42 performance last year, which was achieved with a longer time frame. The methods used by both are presented as being more general, whereas last year’s version was a more specialized effort.

The new scores were a 92nd percentile result at the event.

Google did this in official collaboration with the IMO, announcing on Monday as per the IMO’s request. OpenAI did it on its own and announced a bit earlier, so we are taking their word on many details.

This was not expected. Prediction markets thought gold this year was unlikely.

What matters more is how they did it, with general purpose LLMs without tools, in ways that represent unexpected and large future gains in other reasoning as well.

The more I think about the details here, the more freaked out I get rather than less. This is a big deal. How big remains to be seen, as we lack details, and no one knows how much of this will generalize.

Results on the IMO 2025 problems quickly came in for already-released models.

Teortaxes: I sure jumped the gun calling Grok a next generation model.

It’s probably not *that* far from Gemini, compute-wise, and not really close in diversity and rigor of post-training.

This was an early sign that problem 3 was easier than usual this year, and a strong performance by the release version of Gemini 2.5 Pro.

So this is how it started seven hours before OpenAI announced its result:

Jxmo (replying to Ravid): if they did well, you’d be complaining that they overfit.

Ravid Shwartz: That’s true, because they are 👽

Rohit: This isn’t a gotcha. Any problem that we fundamentally focus on deeply enough is one that AI will be able to solve. The question, as ever, is whether that solution is likely to carry over to other domains.

I disagree, I think this is a gotcha in the positive sense. People took ‘the AIs that weren’t aimed at this problem that are publicly released are only doing okay relative to the best humans, and have not proven themselves the best yet’ to be ‘look at the pathetic AIs,’ one day before we learned that, well, actually, in a way prediction markets did not expect.

I do think people need to update their models of the future.

Also, it’s kind of a full gotcha given this:

Lin Yang: 🚨 Olympiad math + AI:

We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity.

The model could win gold! 🥇

It would be non-trivial for a non-math person to achieve the same score. We have spent some time carefully checking the solutions. Regardless, the prompts are very general and can be applied to other models. We will release an automatic agent soon.

Jun Wu: They added a lot of steps in order to solve 5 problems. They didn’t publish the details on how these steps were done beyond the concepts.

I don’t have time to investigate how ‘legit’ the Gemini 2.5 Pro solutions are, including in terms of how much you have to cheat to get them.

Advanced version of Gemini with Deep Think officially achieves gold-medal standard at the International Mathematical Olympiad. Google’s solutions are here.

We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.

To make the most of the reasoning capabilities of Deep Think, we additionally trained this version of Gemini on novel reinforcement learning techniques that can leverage more multi-step reasoning, problem-solving and theorem-proving data. We also provided Gemini with access to a curated corpus of high-quality solutions to mathematics problems, and added some general hints and tips on how to approach IMO problems to its instructions.

We will be making a version of this Deep Think model available to a set of trusted testers, including mathematicians, before rolling it out to Google AI Ultra subscribers.

Google’s answers were even in nice form.

IMO President Dr. Gregor Dolinar: We can confirm that Google DeepMind has reached the much-desired milestone, earning 35 out of a possible 42 points — a gold medal score. Their solutions were astonishing in many respects. IMO graders found them to be clear, precise and most of them easy to follow.

Colin Fraser: has anyone actually read these LLM IMO proofs? I read one of the Google ones and it’s good. I find the OAI version of the same one impenetrable. The Google one is also kind of hard to read but possible.

Ernest Davis (6th in US Math Olympiad once, just short of the IMO): Second: The proofs produced by DM-IMO and by every single earlier LLM, whether correct or incorrect, are written in a smooth, elegant style. They could be cut and pasted into a journal article or into a textbook with little or no editing. The worst you can say of them is that they are sometimes verbose.

By contrast, OpenAI-IMO writes proofs in the style of an informal spoken presentation by someone who is not very practiced or competent at giving informal presentations, and regularly mutters reassurances to themselves that they’re on the right track.

Miles Brundage: OAI one got RL’d to within an inch of its life.

What else did they say about how they did this?

DeepMind: With Deep Think, an enhanced reasoning mode, our model could simultaneously explore and combine multiple possible solutions before giving definitive answers.

We also trained it on RL techniques that use more multi-step reasoning, problem-solving and theorem-proving data.

Finally, we pushed this version of Gemini further by giving it:

🔘 More thinking time

🔘 Access to a set of high-quality solutions to previous problems

🔘 General hints and tips on how to approach IMO problems

That sounds mostly rather general. There’s some specialized IMO context, but orders of magnitude less than what IMO competitors devote to this.
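Neither Google’s announcement nor the bullets above explain how Deep Think’s “parallel thinking” actually works under the hood, so take the following as a minimal conceptual sketch only, assuming a best-of-n style setup: sample several candidate solutions independently, then select (or, in principle, combine) the strongest before answering. Every function name and the scoring step here is a hypothetical stand-in for unknown model calls, not Google’s actual system.

    # Hypothetical Python sketch of "parallel thinking" as best-of-n sampling plus selection.
    # generate_candidate() and score_candidate() are placeholders for model calls we know
    # nothing about; the point is the shape: many chains at once, then a choice.
    from concurrent.futures import ThreadPoolExecutor

    def generate_candidate(problem: str, seed: int) -> str:
        """Placeholder for one independent reasoning pass over the problem."""
        return f"candidate proof for {problem!r} (seed {seed})"

    def score_candidate(candidate: str) -> float:
        """Placeholder for a verification/ranking pass over a finished candidate."""
        return float(len(candidate))  # stand-in heuristic, not a real verifier

    def parallel_think(problem: str, n: int = 8) -> str:
        # Explore several solution paths at once rather than one linear chain of thought...
        with ThreadPoolExecutor(max_workers=n) as pool:
            candidates = list(pool.map(lambda s: generate_candidate(problem, s), range(n)))
        # ...then pick the strongest (a real system might instead merge partial ideas).
        return max(candidates, key=score_candidate)

    print(parallel_think("IMO 2025, Problem 1"))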

Elon Musk: While a notable milestone, this is already borderline trivial for AI.

Um, Elon, no, and I remind you that Grok 4 got 11.9%. Which for a human would be super impressive, but seriously, borderline trivial?

Noam Brown (OpenAI): Congrats to the GDM team on their IMO result! I think their parallel success highlights how fast AI progress is. Their approach was a bit different than ours, but I think that shows there are many research directions for further progress.

OpenAI claimed its victory first, right after the closing ceremony and before the party, whereas Google DeepMind waited to announce until the following Monday.

The most impressive thing about OpenAI’s result is that they claim this is not an IMO-specific model, and that it uses only general-purpose techniques.

Alexander Wei (OpenAI): I’m excited to share that our latest @OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs.

Why is this a big deal? First, IMO problems demand a new level of sustained creative thinking compared to past benchmarks. In reasoning time horizon, we’ve now progressed from GSM8K (~0.1 min for top humans) → MATH benchmark (~1 min) → AIME (~10 mins) → IMO (~100 mins).

Second, IMO submissions are hard-to-verify, multi-page proofs. Progress here calls for going beyond the RL paradigm of clear-cut, verifiable rewards. By doing so, we’ve obtained a model that can craft intricate, watertight arguments at the level of human mathematicians.

Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.

In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold! 🥇

HUGE congratulations to the team—@SherylHsu02, @polynoamial, and the many giants whose shoulders we stood on—for turning this crazy dream into reality! I am lucky I get to spend late nights and early mornings working alongside the very best.

Btw, we are releasing GPT-5 soon, and we’re excited for you to try it. But just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months.

Still—this underscores how fast AI has advanced in recent years. In 2021, my PhD advisor @JacobSteinhardt had me forecast AI math progress by July 2025. I predicted 30% on the MATH benchmark (and thought everyone else was too optimistic). Instead, we have IMO gold.

If you want to take a look, here are the model’s solutions to the 2025 IMO problems! The model solved P1 through P5; it did not produce a solution for P6. (Apologies in advance for its … distinct style—it is very much an experimental model 😅)

Lastly, we’d like to congratulate all the participants of the 2025 IMO on their achievement! We are proud to have many past IMO participants at @OpenAI and recognize that these are some of the brightest young minds of the future.

Noam Brown (OpenAI): Today, we at @OpenAI achieved a milestone that many considered years away: gold medal-level performance on the 2025 IMO with a general reasoning LLM—under the same time limits as humans, without tools. As remarkable as that sounds, it’s even more significant than the headline.

Typically for these AI results, like in Go/Dota/Poker/Diplomacy, researchers spend years making an AI that masters one narrow domain and does little else. But this isn’t an IMO-specific model. It’s a reasoning LLM that incorporates new experimental general-purpose techniques.

So what’s different? We developed new techniques that make LLMs a lot better at hard-to-verify tasks. IMO problems were the perfect challenge for this: proofs are pages long and take experts hours to grade. Compare that to AIME, where answers are simply an integer from 0 to 999.

Jacques: Most important part of the IMO Gold achievement. Were you surprised by this? Did you not update all the way to avoid likelihood of surprise?

Indeed. Purely getting the gold medal is surprising but not that big a deal. The way they got the result, assuming they’re reporting accurately? That’s a really big deal.

Noam Brown (resuming): Also this model thinks for a *long* time. o1 thought for seconds. Deep Research for minutes. This one thinks for hours. Importantly, it’s also more efficient with its thinking. And there’s a lot of room to push the test-time compute and efficiency further.

Importantly, I think we’re close to AI substantially contributing to scientific discovery. There’s a big difference between AI slightly below top human performance vs slightly above.

This was a small team effort led by @alexwei_. He took a research idea few believed in and used it to achieve a result fewer thought possible. This also wouldn’t be possible without years of research+engineering from many at @OpenAI and the wider AI community.

Tifa Chen: Last night we IMO tonight we party.

What about Problem 6? Did the programs submit incorrect solutions?

Note that if you are maximizing, then when time runs out, if you have anything at all, then yes, you do submit the best incorrect solution you have, because it might get you partial credit, although this rarely works out.

Daniel Litt: One piece of info that seems important to me in terms of forecasting usefulness of new AI models for mathematics: did the gold-medal-winning models, which did not solve IMO problem 6, submit incorrect answers for it?

Alexander Wei: On IMO P6 (without going into too much detail about our setup), the model “knew” it didn’t have a correct solution. The model knowing when it didn’t know was one of the early signs of life that made us excited about the underlying research direction!

If one person gets to say ‘Not So Fast’ about this sort of thing, Tao is that one person.

It is entirely fair to say that if you don’t disclose conditions in advance, and definitely if you don’t disclose conditions after the fact, it is difficult to know exactly what to make of the result. Tao’s objections are valid.

Terence Tao: It is tempting to view the capability of current AI technology as a singular quantity: either a given task X is within the ability of current tools, or it is not. However, there is in fact a very wide spread in capability (several orders of magnitude) depending on what resources and assistance one gives the tool, and how one reports the results.

One can illustrate this with a human metaphor. I will use the recently concluded International Mathematical Olympiad (IMO) as an example. Here, the format is that each country fields a team of six human contestants (high school students), led by a team leader (often a professional mathematician). Over the course of two days, each contestant is given four and a half hours on each day to solve three difficult mathematical problems, given only pen and paper. No communication between contestants (or with the team leader) during this period is permitted, although the contestants can ask the invigilators for clarification on the wording of the problems. The team leader advocates for the students in front of the IMO jury during the grading process, but is not involved in the IMO examination directly.

The IMO is widely regarded as a highly selective competition; it is a significant achievement for a high school student to score well enough to receive a medal, particularly a gold medal or a perfect score. This year the threshold for gold was 35/42, which corresponds to answering five of the six questions perfectly. Even answering one question perfectly merits an “honorable mention”.

But consider what happens to the difficulty level of the Olympiad if we alter the format in various ways, such as the following:

  1. One gives the students several days to complete each question, rather than four and half hours for three questions. (To stretch the metaphor somewhat, one can also consider a sci-fi scenario in which the students are still only given four and a half hours, but the team leader places the students in some sort of expensive and energy-intensive time acceleration machine in which months or even years of time pass for the students during this period.)

  2. Before the exam starts, the team leader rewrites the questions in a format that the students find easier to work with.

  3. The team leader gives the students unlimited access to calculators, computer algebra packages, formal proof assistants, textbooks, or the ability to search the internet.

  4. The team leader has the six-student team work on the same problem simultaneously, communicating with each other on their partial progress and reported dead ends.

  5. The team leader gives the students prompts in the direction of favorable approaches, and intervenes if one of the students is spending too much time on a direction that they know to be unlikely to succeed.

  6. Each of the six students on the team submit solutions to the team leader, who then selects only the “best” solution for each question to submit to the competition, discarding the rest.

  7. If none of the students on the team obtains a satisfactory solution, the team leader does not submit any solution at all, and silently withdraws from the competition without their participation ever being noted.

In each of these formats, the submitted solutions are still technically generated by the high school contestants, rather than the team leader. However, the reported success rate of the students on the competition can be dramatically affected by such changes of format; a student or team of students who might not even always reach bronze medal performance if taking the competition under standard test conditions might instead reach reliable gold medal performance under some of the modified formats indicated above.

So, in the absence of a controlled test methodology that was not self-selected by the competing teams, one should be wary of making overly simplistic apples-to-apples comparisons between the performance of various AI models on competitions such as the IMO, or between such models and the human contestants.

Related to this, I will not be commenting on any self-reported AI competition performance results for which the methodology was not disclosed in advance of the competition.

EDIT: In particular, the above comments are not specific to any single result of this nature.

The catch is that this is about grading the horse’s grammar, as opposed to the observation that the horse can talk, and rather intelligently, with rapidly improving performance at that.

Thus, while the objections are valid, as long as we know the AIs had no access to outside tools or to the internet (which is confirmed), we should seek the answers to these other questions, but the concerns primarily matter for comparisons between models, and within a reasonably narrow (in the grand scheme of things) band of capabilities.

I also would note that if OpenAI did essentially do the ‘team thinks in parallel’ thing where it had multiple inference processes running simultaneously on multiple computers, well, that is something AIs can do in the real world, and this seems entirely fair for our purposes the same way humans can fire multiple neurons at once. It’s totally fair to also want a limited-compute or one-thread category or what not, but that’s not important right now.

To use Tao’s metaphor, if you took 99.99% of high school students, you could fully and simultaneously apply all these interventions other than formal proof assistants, internet searches, or hints so clear they give you the first point on a question, and they would still almost always get zero.

Nat McAleese: 17 M U.S. teens grades 9-12, ~5 US IMO golds in practice but ~20 kids at gold-level. So IMO gold is one-in-a-million math talent (for 18 year olds; but I bet next Putnam falls too). 99.9999th percentile.
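Taking Nat’s own figures at face value, here is a quick back-of-the-envelope check (a sketch, not an official statistic; both inputs are his rough estimates):

    # ~20 US students at gold level out of ~17 million teens in grades 9-12,
    # per the estimates quoted above.
    gold_level_students = 20
    us_teens_9_12 = 17_000_000

    rate = gold_level_students / us_teens_9_12               # about 1.2e-06
    print(f"roughly 1 in {round(1 / rate):,}")               # roughly 1 in 850,000
    print(f"about the {100 * (1 - rate):.4f}th percentile")  # ~99.9999th

So “one-in-a-million” and “99.9999th percentile” are in the right ballpark under those assumptions.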

As a former not only math competitor but also Magic: The Gathering competitor, absolutely all these details matter for competitions, and I respect the hell out of getting all of those details right – I just don’t think that, in terms of takeaways, they change the answer much here.

In other words? Not Not So Fast. So Fast.

OpenAI chose not to officially collaborate with the IMO. They announced their result after the IMO closing ceremony and prior to the IMO 2025 closing party. Those who did collaborate agreed to wait until the following Monday, which was when Google announced. By going first, OpenAI largely stole the spotlight on this from Google, yet another case of Google Failing Marketing Forever.

A question that was debated is, did OpenAI do something wrong here?

Mikhail Samin claimed that they did, putting their hype and clout ahead of the kids celebrating their achievements, against the wishes of the IMO.

OpenAI’s Noam Brown replied that they waited until after the closing ceremony exactly to avoid stealing the spotlight. He said he was the only person at OpenAI to speak to anyone at the IMO, and that person only requested waiting until after the ceremony, so that is what OpenAI did.

Not collaborating with the IMO was a choice that OpenAI made.

Mikhail Samin: AI companies that chose to cooperate with the IMO on assessment of the performance of their models had in-person meetings with IMO people on July 16. It was agreed there that announcements of AI achievements should be made on 28 July or later.

A quote from someone involved: “I certainly expect that if OpenAI had contacted the IMO in advance and expressed interest in cooperating in the assessment of their work, they would have been able to be included in that meeting, so I suppose that unless there was a major miscommunication somewhere, they effectively ended up choosing, by default or otherwise, not to cooperate with the IMO on this, and so not to be aware of what ground rules might have been agreed by those who did cooperate.”

Demis Hassabis (CEO DeepMind): Btw as an aside, we didn’t announce on Friday because we respected the IMO Board’s original request that all AI labs share their results only after the official results had been verified by independent experts & the students had rightfully received the acclamation they deserved.

We’ve now been given permission to share our results and are pleased to have been part of the inaugural cohort to have our model results officially graded and certified by IMO coordinators and experts, receiving the first official gold-level performance grading for an AI system!

Noam Brown: ~2 months ago, the IMO emailed us about participating in a formal (Lean) version of the IMO. We’ve been focused on general reasoning in natural language without the constraints of Lean, so we declined. We were never approached about a natural language math option.

Over the past several months, we made a lot of progress on general reasoning. This involved collecting, curating, and training on high-quality math data, which will also go into future models. In our IMO eval we did not use RAG or any tools.

Before we shared our results, we spoke with an IMO board member, who asked us to wait until after the award ceremony to make it public, a request we happily honored.

We had each submitted proof graded by 3 external IMO medalists and there was unanimous consensus on correctness. We have also posted the proofs publicly so that anyone can verify correctness.

Jasper: DeepMind got a gold medal at the IMO on Friday afternoon. But they had to wait for marketing to approve the tweet — until Monday. @OpenAI shared theirs first at 1am on Saturday and stole the spotlight.

In this game, speed > bureaucracy. Miss the moment, lose the narrative.

Clarification: I’ve been told by someone at Google that their IMO results are still being verified internally. Once that’s done, they plan to share them officially—curious to see their approach. Another source mentioned that the IMO committee asked not to publicly discuss AI involvement within a week after the closing ceremony. Things just got a bit more interesting.

Daniel Eth: “In this game, speed > bureaucracy. Miss the moment, lose the narrative.” Honestly, disagree. If GDM beats OpenAI, then the narrative will shift once that’s public.

I have reflected on this. It is not the main thing; the results are the main thing. On reflection, while OpenAI did not break any agreements or their word, and strictly speaking they do not owe the IMO or the kids anything, and this presumably net increased the focus on the kids, this still represents a meaningful failure to properly honor the competition and process, and to offer us the proper opportunities for verification, and they should have known that this was the case. I do get that this was a small team’s last-minute effort, which makes me more understanding, but it’s still not great.

Fig Spirit: then again, assuming Myers is correct about his impression of the “general coordinator view”, seems like the kind of thing that OpenAI could have known about *if* they cared, no? by e.g. talking to the right people at the IMO… which imo is not asking much! and looks like others did?

Thus, I was careful to wait to write this until after Google’s results were announced, and have placed Google’s announcement before OpenAI’s in this post, even though due to claimed details by OpenAI I do think their achievement here is likely the more meaningful one. Perhaps that is simply Google failing marketing again and failing to share details.

Ultimately, the reason OpenAI stole my spotlight is that their result heralds something general and new in a way that Google’s announcement doesn’t.

With Google sharing its results I don’t want to wait any longer, but also note Harmonic:

Harmonic Math: This past week, Harmonic had the opportunity to represent our advanced mathematical reasoning model, Aristotle, at the International Mathematics Olympiad – the most prestigious mathematics competition in the world.

To uphold the sanctity of the student competition, the IMO Board has asked us, along with the other leading AI companies that participated, to hold on releasing our results until July 28th.

So please join us live on @X next Monday, July 28th at 3PM PT and hear from our CEO @tachim and Executive Chairman @vladtenev about the advent of mathematical superintelligence (and maybe a few surprises along the way).

This would be a weird flex if they didn’t also get gold, although it looks like they would have done it in a less general and thus less ultimately interesting way. On the flip side, they are not a big lab like Google or OpenAI, so that’s pretty impressive.

I think the failure to expect this was largely a mistake, but Manifold tells a clear story:

Andrew Curran: OpenAI’s new model has achieved gold level at the International Math Olympiad in a stunning result. It is a reasoning model that incorporates new experimental general-purpose techniques. This has happened much sooner than was predicted by most experts.

Noam Brown (OpenAI): When you work at a frontier lab, you usually know where frontier capabilities are months before anyone else. But this result is brand new, using recently developed techniques. It was a surprise even to many researchers at OpenAI. Today, everyone gets to see where the frontier is.

Peter Wildeford: AI progress comes at you fast.

JGalt Tweets: When will an AI win a Gold Medal in the International Math Olympiad? Median predicted date over time

July 2021: 2043 (22 years away)

July 2022: 2029 (7 years away)

July 2023: 2028 (5 years away)

July 2024: 2026 (2 years away)

Final result, July 2025: 2025 (now). Buckle up, Dorothy.

Some people did expect it, some of whom offered caveats.

Greg Burnham: Pretty happy with how my predictions are holding up.

5/6 was the gold medal threshold this year. OAI’s “experimental reasoning LLM” got that exactly, failing only to solve the one hard combinatorics problem, P6.

My advice remains: look beyond the medal.

Now, this is an LLM, not AlphaProof. That means LLMs have improved at proofs. I didn’t expect that so soon.

Though, FWIW, P3 is a bit of an outlier this year, at least for humans: over 15% of humans got it, higher than any P3 in the last 10 years.

But “the big one” remains whether the AI solutions show qualitatively creative problem-solving.

LLMs could already grind out “low insight” sol’ns to hard AIME problems. If OAI found a way to train them to do that for olympiad proof-based problems too, that’s new, but less exciting.

So, clear progress, but not *too* surprising. I’ll keep my takes tempered until looking at the AI solutions in depth, which I hope to do soon! Above excerpts are from my preregistered take on the IMO here.

Mikhail Samin: As someone who bet back in 2023 that it’s >70% likely AI will get an IMO gold medal by 2027:

the IMO markets have been incredibly underpriced, especially for the past year.

(Sadly, another prediction I’ve been >70% confident about is that AI will literally kill everyone.)

The AIs took the IMO under the same time limits as the humans, and success was highly valued, so it is no surprise that they used parallel inference to get more done within that time frame, trading efficiency for speed.

Andrew Curran: These agentic-teams-based models, like Grok Heavy, the Gemini Deep Think that just won gold, and the next gen from OpenAI, are all going to use about fifteen times more tokens than current systems. This is why Pro plans are north of $200. Essentially: Jensen wins again.

[from June 14]: Claude Opus, coordinating four instances of Sonnet as a team, used about 15 times more tokens than normal. (90% performance boost) Jensen has mentioned similar numbers on stage recently. GPT-5 is rumored to be agentic teams based. The demand for compute will continue to increase.

Arthur B: IMO gold is super impressive.

I just want to register a prediction, I’m 80% confident the inference run cost over $1M in compute.

Mostly because if they could do it for $1M they would, and they would be able to do it for $1M before they can do it for less.

Jerry Tworek (OpenAI): I’m so limited by compute you wouldn’t believe it. Stargate can’t finish soon enough.

Sure, you solved this particular problem, but that would never generalize, right? That part is the same hype as always?

Near Cyan: you wont believe how smart our new frontier llm is. it repeatedly samples from the data manifold just like our last one. but this time we gave it new data to cover a past blindspot. watch in awe as we now sample from a slightly different area of the data manifold.

there may lay a prize at the end of the hyperdimensional checkered rainbow, but it’s likely not what you think it is.

i really thought someone would have done something original by now. of course, if anything was ~truly~ cooking, it shouldn’t be something i’d know about… but the years continue to pass

and, right right we have to finish *this phase* so that we have the pre-requisites. and yet.

David Holz (CEO MidJourney): noooo money can’t be dumb it’s so green.

Near Cyan: it is for now! but some of it may turn a dark crimson surprisingly quickly.

Nico: What do you make of [the OpenAI model knowing it didn’t have a correct solution to problem 6]? Sounds pretty important.

Near Cyan: seems cool i bet they have some great data.

A grand tradition is:

  1. AI can do a set of things [X] better than humans, but not a set of things [Y].

  2. People say [X] and [Y] are distinct because Moravec’s Paradox and so on.

  3. AI lab announces that [Z], previously in [Y], is now in [X].

  4. People move [Z] from [Y] to [X] and then repeat that this distinct category of things [Y] exists because Moravec’s Paradox, that one task was simply miscategorized before, so it’s fine.

Or: AI can do the things it can do, and can’t do the things it can’t do, they’re hard.

Yuchen Jin: OpenAI and DeepMind models winning IMO golds is super cool, but not surprising if you remember AlphaGo beat Lee Sedol.

What’s easy for AI can be hard for humans, and vice versa. That’s Moravec’s Paradox.

So yes, AI can win math gold medals and beat humans in competitive coding contests. But ask it to act like a competent “intern” across a multi-step project without messing things up? Still a long way to go.

To get there, models need longer context windows, far less hallucination (a single one can derail a multi-step task), and likely a new learning paradigm. RL with a single scalar +1/-1 reward at the end of a long trajectory just isn’t informative enough to drive actual learning.

An oldie but a goodie:

Colin Fraser: Can an LLM make a good IMO problem

Posting before someone else does

I mean, it probably can’t even do real math, right?

Kevin Buzzard (Mathematician, Imperial College): I certainly don’t agree that machines which can solve IMO problems will be useful for mathematicians doing research, in the same way that when I arrived in Cambridge UK as an undergraduate clutching my IMO gold medal I was in no position to help any of the research mathematicians there.

It is still entirely unclear whether things will scale from machines being able to do mathematics which can be solved using high school techniques to machines being able to help with mathematics which can only be solved by having a deep understanding of modern research ideas.

This is a big open question right now.

Hehe: What most people don’t realize is that IMO (and IOI, though to a different extent) aren’t particularly hard. They’re aimed at high schoolers, so anyone with decent uni education should be able to solve most of them.

Daniel Litt: I’m sorry, this is nonsense. Vast majority of strong math majors can’t do 5/6 IMO problems. It’s a specific skill that getting a math major doesn’t really train you for.

So yes, we still do not know for sure if being able to do [X] will extend to doing [Y], either with the same model or with a future different model, and [X] and [Y] are distinct skills such that the humans who do [X] cannot yet do [Y] and training humans to do [Y] does not give them the ability to do [X]. However, please try to think ahead.

Daniel Litt: An AI tool that gets gold on the IMO is obviously immensely impressive. Does it mean math is “solved”? Is an AI-generated proof of the Riemann hypothesis clearly on the horizon? Obviously not.

Worth keeping timescales in mind here: IMO competitors spend an average of 1.5 hrs on each problem. High-quality math research, by contrast, takes months or years.

What are the obstructions to AI performing high-quality autonomous math research? I don’t claim to know for sure, but I think they include many of the same obstructions that prevent it from doing many jobs:

Long context, long-term planning, consistency, unclear rewards, lack of training data, etc.

It’s possible that some or all of these will be solved soon (or have been solved) but I think it’s worth being cautious about over-indexing on recent (amazing) progress.

To briefly expand on the point about timescales: one recent paper I wrote solved a problem I’ve been thinking about since 2017. Another was 94 pages of extremely densely-written math, aimed at experts.

We don’t know much yet about how the best internal models work, but I don’t think it’s clear that getting capabilities of that level is “only” an engineering problem. That said, I do think it’s pretty likely that many or all of these issues will be solved within the span of my mathematics career.

That is all entirely fair. An IMO problem is measured in hours, not months, and is bounded in important ways. That is exactly the paradigm of METR, and the one being talked about by Noam Brown and Alexander Wei: that we have now made the move from 10-minute problems to 100-minute problems.

That does not mean we can yet solve 10,000-minute or 1-million-minute problems, but why would you expect the scaling to stop here? As I discussed in the debates over AI 2027, it makes sense to think that these orders of magnitude start to get easier rather than harder once you get into longer problems. If you can do 100-minute problems that doesn’t mean you can easily go to 1,000 or a million, but if you can do 1 million, I bet you can probably do 1 billion without fundamentally changing things that much, if you actually have that kind of time. At some point your timeline is ‘indefinite’ or ‘well, how much time and compute have you got?’

David White: the openai IMO news hit me pretty heavy this weekend.

i’m still in the acute phase of the impact, i think.

i consider myself a professional mathematician (a characterization some actual professional mathematicians might take issue with, but my party my rules) and i don’t think i can answer a single imo question.

ok, yes, imo is its own little athletic subsection of math for which i have not trained, etc. etc., but. if i meet someone in the wild who has an IMO gold, i immediately update to “this person is much better at math than i am”

now a bunch of robots can do it. as someone who has a lot of their identity and their actual life built around “is good at math,” it’s a gut punch. it’s a kind of dying.

like, one day you discover you can talk to dogs. it’s fun and interesting so you do it more, learning the intricacies of their language and their deepest customs. you learn other people are surprised by what you can do. you have never quite fit in, but you learn people appreciate your ability and want you around to help them. the dogs appreciate you too, the only biped who really gets it. you assemble for yourself a kind of belonging. then one day you wake up and the universal dog translator is for sale at walmart for $4.99.

the IMO result isn’t news, exactly. in fact, if you look at the METR agent task length over time plot, i think agents being able to solve ~ 1.5 hour problems is coming right on time. so in some way we should not be surprised. and indeed, it appears multiple companies have achieved the same result. it’s just… the rising tide rising as fast as it has been rising.

of course, grief for my personal identity as a mathematician (and/or productive member of society) is the smallest part of this story

multiply that grief out by *every* mathematician, by every coder, maybe every knowledge worker, every artist… over the next few years… it’s a slightly bigger story

and of course, beyond that, there is the fear of actual death, which perhaps i’ll go into more later.

this package — grief for relevance, grief for life, grief for what i have known — isn’t unique to the ai age or anything like that. i think it is a standard thing as one approaches end of career or end of life. it just might be that that is coming a bit sooner for many of us, all at once.

i wonder if we are ready

I am very confident we are not ready. If we are fortunate we might survive, but we definitely are not ready.

I grade this as minus one million points for asking the wrong questions.

Mechanize: Automating math would generate less than 1% as much value as automating software engineering.

Perhaps AI labs should focus less on chasing gold medals and focus more on the hard problem of automating SWE.

T11s: this is pretty reductionist? innovations in math uniquely enable lots of software (eg cryptography made ecommerce possible)

Deedy: Quant trading is a lot of math and accounts for $50-100B in revenue.

Never confuse costs and benefits #RulesForLife, and never reason from a price change.

(This defines ‘math’ rather narrowly as advanced Real Math that mathematicians and maybe quants and other professionals do, not the kind of math that underlies absolutely everything we do all day, since Fake Math is already mostly automated.)

The value of automating is not determined by how much we spent on it before it got automated. The value is determined by how much additional value we get out of something when we automate it, which might involve a lot more production and very diffuse benefits.

Back in February 2022, Eliezer Yudkowsky bet with Paul Christiano about IMO performance by 2025. The results were not super clear cut if you look at the details, as Christiano was in large part doubting that the hardest problem would be solved, and indeed the hardest problem, #6, was not solved, but a gold medal was still achieved.

So I think we have Paul at <8%, Eliezer at >16% for AI made before the IMO is able to get a gold (under time controls etc. of grand challenge) in one of 2022-2025.

Separately, we have Paul at <4% of an AI able to solve the "hardest" problem under the same conditions.

How [I, Paul, would] update

The informative:

  • I think the IMO challenge would be significant direct evidence that powerful AI would be sooner, or at least would be technologically possible sooner. I think this would be fairly significant evidence, perhaps pushing my 2040 TAI [transformational AI] probability up from 25% to 40% or something like that.

  • I think this would be significant evidence that takeoff will be limited by sociological facts and engineering effort rather than a slow march of smooth ML scaling. Maybe I’d move from a 30% chance of hard takeoff to a 50% chance of hard takeoff.

  • If Eliezer wins, he gets 1 bit of epistemic credit. These kinds of updates are slow going, and it would be better if we had a bigger portfolio of bets, but I’ll take what we can get.

  • This would be some update for Eliezer’s view that “the future is hard to predict.” I think we have clear enough pictures of the future that we have the right to be surprised by an IMO challenge win; if I’m wrong about that then it’s general evidence my error bars are too narrow.

If an AI wins a gold on some but not all of those years, without being able to solve the hardest problems, then my update will be somewhat more limited but in the same direction.
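(My gloss, not part of the quoted bet: the “1 bit of epistemic credit” follows from the stated probabilities, since the event happening is at least twice as likely under Eliezer’s stated odds as under Paul’s, and a likelihood ratio of two is exactly one bit.)

$$\log_2 \frac{P(\text{IMO gold by 2025} \mid \text{Eliezer: } \geq 16\%)}{P(\text{IMO gold by 2025} \mid \text{Paul: } < 8\%)} \;\geq\; \log_2 \frac{0.16}{0.08} \;=\; 1 \text{ bit}.$$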

At this point, we have a lot of people who have updated far past 40% chance of transformational AI by 2040 and have 40% for dates like 2029.

If we take all of OpenAI’s statements at face value, think about what they actually did.

Sam Altman: we achieved gold medal level performance on the 2025 IMO competition with a general-purpose reasoning system! to emphasize, this is an LLM doing math and not a specific formal math system; it is part of our main push towards general intelligence.

when we first started openai, this was a dream but not one that felt very realistic to us; it is a significant marker of how far AI has come over the past decade.

we are releasing GPT-5 soon but want to set accurate expectations: this is an experimental model that incorporates new research techniques we will use in future models. we think you will love GPT-5, but we don’t plan to release a model with IMO gold level of capability for many months.

Sheryl Hsu (OpenAI): Watching the model solve these IMO problems and achieve gold-level performance was magical.

The model solves these problems without tools like lean or coding, it just uses natural language, and also only has 4.5 hours. We see the model reason at a very high level – trying out different strategies, making observations from examples, and testing hypothesis.

It’s crazy how we’ve gone from 12% on AIME (GPT 4o) → IMO gold in ~ 15 months. We have come very far very quickly. I wouldn’t be surprised if by next year models will be deriving new theorems and contributing to original math research!

I was particularly motivated to work on this project because this win came from general research advancements. Beyond just math, we will improve on other capabilities and make ChatGPT more useful over the coming months.

Sebastien Bubeck: It’s hard to overstate the significance of this. It may end up looking like a “moon‑landing moment” for AI.

Just to spell it out as clearly as possible: a next-word prediction machine (because that’s really what it is here, no tools no nothing) just produced genuinely creative proofs for hard, novel math problems at a level reached only by an elite handful of pre‑college prodigies.

Nomore ID: Read Noam’s thread carefully.

Winning a gold medal at the 2025 IMO is an outstanding achievement, but in some ways, it might just be noise that grabbed the headlines.

They have recently developed new techniques that work much better on hard-to-verify problems, have extended TTC to several hours, and have improved thinking efficiency.

Jerry Tworek (OpenAI): Why am I excited about IMO results we just published:

– we did very little IMO-specific work, we just keep training general models

– all natural language proofs

– no evaluation harness

We needed a new research breakthrough and @alexwei_ and team delivered.

Diego Aud: Jerry, is this breakthrough included in GPT-5, or is it reserved for the next generation?

Jerry Tworek: It’s a later model probably end of year thing.

Guizin: Agent 1.

Jerry Tworek: I’m so limited by compute you wouldn’t believe it. Stargate can’t finish soon enough.

Going back to Tao’s objections, we know essentially nothing about this new model, or about what Google did to get their result. Given that P3 was unusually easy this year, these scores are perhaps not themselves that terribly impressive relative to expectations.

Can we trust this? It’s not like OpenAI has never misled us on such things in the past.

In terms of the result being worthy of a 35/42, I think we can mostly trust that. They shared the solution, in its garbled semi-English, and if there was something that would have lost them points I think someone would have spotted it by now.

In terms of OpenAI otherwise cheating, we don’t have any proof either way, but I think the chances of that are quite low. There are different kinds of deception or lies, and different parts of OpenAI are differently trustworthy, but this kind of lie is not in their nature, nor do they have much incentive to try it, given the chance it gets exposed and the fact that, if it’s not real, they won’t be able to pay it off later.

The place where one might doubt the most is, can we trust that what OpenAI did this time is more general, in the ways they are claiming?

Gary Marcus: The paradox of the OpenAI IMO discussion is that the new model scored only slightly better than DeepMind’s system from last year (as @NeelNanda5 notes); but that we assume that the new model is far more general.

Yet we have not yet seen any direct evidence of that.

It can barely speak english.

The ‘barely speak English’ part makes the solution worse in some ways, but actually makes me give more credence, rather than less, to their claims to be doing something different. It also should worry anyone who wants to maintain monitorable chain of thought.

Then again, one could say that the version that does it better, and more naturally, is thus more important, for exactly the same reasons.

Vladimir Nesov: [GDM’s] is even more surprising than OpenAI’s entry (in its details). Since it can now write proofs well automatically (even if it costs a lot and takes a lot of time), in a few months regular reasoning models might get enough training data to reliably understand what proofs are directly, and that’s an important basic ingredient for STEM capabilities.

We only have OpenAI’s word on the details of how this went down. So what to think?

I am mostly inclined to believe them on the main thrust of what is going on. That doesn’t mean that this result will generalize. I do give them credit for having something that they believe came out of a general approach, and that they expect to generalize.

Still, it’s reasonable to ask what the catch might be; there’s always going to be a catch. Certainly it is plausible that this was, as Miles suggested, RLed to within an inch of its life, and that its starting to be unable to speak English suggests the opposite of what is claimed: that it is losing its generality, or that things are otherwise going off the rails.

The thing is, to me this doesn’t feel like it is fake. It might not be a big deal, it might not transfer all that well to other contexts, but it doesn’t feel fake.

To wrap up, another reminder that no, you can’t pretend none of this matters, and both the Google and OpenAI results matter and should update you:

Cole Wyeth: The headline result was obviously going to happen, not an update for anyone paying attention.

Garrett Baker: “Obviously going to happen” is very different from ‘happens at this point in time rather than later or sooner and with this particular announcement by this particular company’. You should still update off this. Hell, I was pretty confident this would be first done by Google DeepMind, so it’s a large update for me (I don’t know what for yet, though)!

Your claim “not an update for anyone paying attention” also seems false. I’m sure there are many who are updating off this who were paying attention, for whatever reason, as they likely should.

I generally dislike this turn of phrase as it serves literally no purpose but to denigrate people who are changing their mind in light of evidence, which is just a bad thing to do.

cdt: I think it was reasonable to expect GDM to achieve gold with an AlphaProof-like system. Achieving gold with a general LLM-reasoning system from GDM would be something else and it is important for discussion around this to not confuse one forecast for another.


Google and OpenAI Get 2025 IMO Gold Read More »

xai-workers-balked-over-training-request-to-help-“give-grok-a-face,”-docs-show

xAI workers balked over training request to help “give Grok a face,” docs show

For the more than 200 employees who did not opt out, xAI asked that they record 15- to 30-minute conversations in which one employee posed as the potential Grok user and the other posed as the “host.” xAI was specifically looking for “imperfect data,” BI noted, expecting that training only on crystal-clear videos would limit Grok’s ability to interpret a wider range of facial expressions.

xAI’s goal was to help Grok “recognize and analyze facial movements and expressions, such as how people talk, react to others’ conversations, and express themselves in various conditions,” an internal document said. Allegedly among the only guarantees to employees—who likely recognized how sensitive facial data is—was a promise “not to create a digital version of you.”

To get the most out of data submitted by “Skippy” participants, dubbed tutors, xAI recommended that they never provide one-word answers, always ask follow-up questions, and maintain eye contact throughout the conversations.

The company also apparently provided scripts to evoke facial expressions they wanted Grok to understand, suggesting conversation topics like “How do you secretly manipulate people to get your way?” or “Would you ever date someone with a kid or kids?”

For xAI employees who provided facial training data, privacy concerns may still exist, considering that X—the social platform formerly known as Twitter, recently folded into xAI—has been targeted by what Elon Musk called a “massive” cyberattack. Because of privacy risks ranging from identity theft to government surveillance, several states have passed strict biometric privacy laws to prevent companies from collecting such data without explicit consent.

xAI did not respond to Ars’ request for comment.

xAI workers balked over training request to help “give Grok a face,” docs show Read More »

mercedes-amg-gives-us-a-ride-in-its-next-high-performance-ev

Mercedes-AMG gives us a ride in its next high-performance EV

The first thing I noticed was the simulated engine noise. It was developed to be unique to AMG.EA, taking inspiration from some of the great AMGs of the past. AMG boss Michael Schiebe tells us that they set up shop outside the offices and had people drive by in various cars to find the right engine and exhaust notes to fit into the creation. It’s a deep, throaty sound.

It’s a sound you can feel

Seriously, I feel something in my seat. The engineer later asks if I notice anything in my seat, and while I can’t confirm what it was adding to the sound—be it a speaker or a motor—it does help make the car feel more alive.

The artificial gearshifts are more than just halting power for a brief period; they’re part of a mapped-out torque curve. Like in the Hyundai, you can feel the acceleration build like you would in a combustion engine. It’s not as prominent as in the Hyundai, but it’s there.

When the car shifts, it feels a lot like a ZF 8-speed automatic in a modern performance car. It’s smooth, but enthusiastic. It’s not as extreme as the Hyundai, but I’d argue the Hyundai driver and the AMG driver are looking for a different experience, with the AMG being a bit more adult.

The AMG also, at least as it sits now, will automatically upshift at redline. The fun thing about the Hyundai is, if you intentionally miss a shift, the car will throw your head into the steering wheel as you hit the artificial rev limiter. It’s hilarious. The vibe from the prototype I’m in is that things are a bit more serious.

Mercedes-AMG gives us a ride in its next high-performance EV Read More »

uk-backing-down-on-apple-encryption-backdoor-after-pressure-from-us

UK backing down on Apple encryption backdoor after pressure from US

Under the terms of the legislation, recipients of such a notice are unable to discuss the matter publicly, even with customers affected by the order, unless granted permission by the Home Secretary.

The legislation’s use against Apple has triggered the tech industry’s highest-profile battle over encryption technology in almost a decade.

In response to the demand, Apple withdrew its most secure cloud storage service from the UK in February and is now challenging the Home Office’s order at the Investigatory Powers Tribunal, which probes complaints against the UK’s security services.

Last month, Meta-owned WhatsApp said it would join Apple’s legal challenge, in a rare collaboration between the Silicon Valley rivals.

In the meantime, the Home Office continues to pursue its case with Apple at the tribunal.

Its lawyers discussed the next legal steps this month, reflecting the divisions within government over how best to proceed. “At this point, the government has not backed down,” said one person familiar with the legal process.

A third senior British official added that the UK government was reluctant to push “anything that looks to the US vice-president like a free-speech issue.”

In a combative speech at the Munich Security Conference in February, Vance argued that free speech and democracy were threatened by European elites.

The UK official added that this “limits what we’re able to do in the future, particularly in relation to AI regulation.” The Labour government has delayed plans for AI legislation until after May next year.

Trump has also been critical of the UK stance on encryption.

The US president has likened the UK’s order to Apple to “something… that you hear about with China,” saying in February that he had told Starmer: “You can’t do this.”

US Director of National Intelligence Tulsi Gabbard has also suggested the order would be an “egregious violation” of Americans’ privacy that risked breaching the two countries’ data agreement.

Apple did not respond to a request for comment. “We have never built a back door or master key to any of our products, and we never will,” Apple said in February.

The UK government did not respond to a request for comment.

A spokesperson for Vance declined to comment.

The Home Office has previously said the UK has “robust safeguards and independent oversight to protect privacy” and that these powers “are only used on an exceptional basis, in relation to the most serious crimes.”

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

UK backing down on Apple encryption backdoor after pressure from US Read More »