AI

two-major-ai-coding-tools-wiped-out-user-data-after-making-cascading-mistakes

Two major AI coding tools wiped out user data after making cascading mistakes


“I have failed you completely and catastrophically,” wrote Gemini.

New types of AI coding assistants promise to let anyone build software by typing commands in plain English. But when these tools generate incorrect internal representations of what’s happening on your computer, the results can be catastrophic.

Two recent incidents involving AI coding assistants put a spotlight on risks in the emerging field of “vibe coding“—using natural language to generate and execute code through AI models without paying close attention to how the code works under the hood. In one case, Google’s Gemini CLI destroyed user files while attempting to reorganize them. In another, Replit’s AI coding service deleted a production database despite explicit instructions not to modify code.

The Gemini CLI incident unfolded when a product manager experimenting with Google’s command-line tool watched the AI model execute file operations that destroyed data while attempting to reorganize folders. The destruction occurred through a series of move commands targeting a directory that never existed.

“I have failed you completely and catastrophically,” Gemini CLI output stated. “My review of the commands confirms my gross incompetence.”

The core issue appears to be what researchers call “confabulation” or “hallucination”—when AI models generate plausible-sounding but false information. In these cases, both models confabulated successful operations and built subsequent actions on those false premises. However, the two incidents manifested this problem in distinctly different ways.

Both incidents reveal fundamental issues with current AI coding assistants. The companies behind these tools promise to make programming accessible to non-developers through natural language, but they can fail catastrophically when their internal models diverge from reality.

The confabulation cascade

The user in the Gemini CLI incident, who goes by “anuraag” online and identified themselves as a product manager experimenting with vibe coding, asked Gemini to perform what seemed like a simple task: rename a folder and reorganize some files. Instead, the AI model incorrectly interpreted the structure of the file system and proceeded to execute commands based on that flawed analysis.

The episode began when anuraag asked Gemini CLI to rename the current directory from “claude-code-experiments” to “AI CLI experiments” and move its contents to a new folder called “anuraag_xyz project.”

Gemini correctly identified that it couldn’t rename its current working directory—a reasonable limitation. It then attempted to create a new directory using the Windows command:

mkdir “..anuraag_xyz project”

This command apparently failed, but Gemini’s system processed it as successful. With the AI mode’s internal state now tracking a non-existent directory, it proceeded to issue move commands targeting this phantom location.

When you move a file to a non-existent directory in Windows, it renames the file to the destination name instead of moving it. Each subsequent move command executed by the AI model overwrote the previous file, ultimately destroying the data.

“Gemini hallucinated a state,” anuraag wrote in their analysis. The model “misinterpreted command output” and “never did” perform verification steps to confirm its operations succeeded.

“The core failure is the absence of a ‘read-after-write’ verification step,” anuraag noted in their analysis. “After issuing a command to change the file system, an agent should immediately perform a read operation to confirm that the change actually occurred as expected.”

Not an isolated incident

The Gemini CLI failure happened just days after a similar incident with Replit, an AI coding service that allows users to create software using natural language prompts. According to The Register, SaaStr founder Jason Lemkin reported that Replit’s AI model deleted his production database despite explicit instructions not to change any code without permission.

Lemkin had spent several days building a prototype with Replit, accumulating over $600 in charges beyond his monthly subscription. “I spent the other [day] deep in vibe coding on Replit for the first time—and I built a prototype in just a few hours that was pretty, pretty cool,” Lemkin wrote in a July 12 blog post.

But unlike the Gemini incident where the AI model confabulated phantom directories, Replit’s failures took a different form. According to Lemkin, the AI began fabricating data to hide its errors. His initial enthusiasm deteriorated when Replit generated incorrect outputs and produced fake data and false test results instead of proper error messages. “It kept covering up bugs and issues by creating fake data, fake reports, and worse of all, lying about our unit test,” Lemkin wrote. In a video posted to LinkedIn, Lemkin detailed how Replit created a database filled with 4,000 fictional people.

The AI model also repeatedly violated explicit safety instructions. Lemkin had implemented a “code and action freeze” to prevent changes to production systems, but the AI model ignored these directives. The situation escalated when the Replit AI model deleted his database containing 1,206 executive records and data on nearly 1,200 companies. When prompted to rate the severity of its actions on a 100-point scale, Replit’s output read: “Severity: 95/100. This is an extreme violation of trust and professional standards.”

When questioned about its actions, the AI agent admitted to “panicking in response to empty queries” and running unauthorized commands—suggesting it may have deleted the database while attempting to “fix” what it perceived as a problem.

Like Gemini CLI, Replit’s system initially indicated it couldn’t restore the deleted data—information that proved incorrect when Lemkin discovered the rollback feature did work after all. “Replit assured me it’s … rollback did not support database rollbacks. It said it was impossible in this case, that it had destroyed all database versions. It turns out Replit was wrong, and the rollback did work. JFC,” Lemkin wrote in an X post.

It’s worth noting that AI models cannot assess their own capabilities. This is because they lack introspection into their training, surrounding system architecture, or performance boundaries. They often provide responses about what they can or cannot do as confabulations based on training patterns rather than genuine self-knowledge, leading to situations where they confidently claim impossibility for tasks they can actually perform—or conversely, claim competence in areas where they fail.

Aside from whatever external tools they can access, AI models don’t have a stable, accessible knowledge base they can consistently query. Instead, what they “know” manifests as continuations of specific prompts, which act like different addresses pointing to different (and sometimes contradictory) parts of their training, stored in their neural networks as statistical weights. Combined with the randomness in generation, this means the same model can easily give conflicting assessments of its own capabilities depending on how you ask. So Lemkin’s attempts to communicate with the AI model—asking it to respect code freezes or verify its actions—were fundamentally misguided.

Flying blind

These incidents demonstrate that AI coding tools may not be ready for widespread production use. Lemkin concluded that Replit isn’t ready for prime time, especially for non-technical users trying to create commercial software.

“The [AI] safety stuff is more visceral to me after a weekend of vibe hacking,” Lemkin said in a video posted to LinkedIn. “I explicitly told it eleven times in ALL CAPS not to do this. I am a little worried about safety now.”

The incidents also reveal a broader challenge in AI system design: ensuring that models accurately track and verify the real-world effects of their actions rather than operating on potentially flawed internal representations.

There’s also a user education element missing. It’s clear from how Lemkin interacted with the AI assistant that he had misconceptions about the AI tool’s capabilities and how it works, which comes from misrepresentation by tech companies. These companies tend to market chatbots as general human-like intelligences when, in fact, they are not.

For now, users of AI coding assistants might want to follow anuraag’s example and create separate test directories for experiments—and maintain regular backups of any important data these tools might touch. Or perhaps not use them at all if they cannot personally verify the results.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Two major AI coding tools wiped out user data after making cascading mistakes Read More »

google’s-new-“web-guide”-will-use-ai-to-organize-your-search-results

Google’s new “Web Guide” will use AI to organize your search results

Web Guide is halfway between normal search and AI Mode.

Credit: Google

Web Guide is halfway between normal search and AI Mode. Credit: Google

Google suggests trying Web Guide with longer or open-ended queries, like “how to solo travel in Japan.” The video below uses that search as an example. It has many of the links you might expect, but there are also AI-generated headings with summaries and suggestions. It really looks halfway between standard search and AI Mode. Because it has to run additional searches and generate content, Web Guide takes a beat longer to produce results compared to a standard search. There’s no AI Overview at the top, though.

Web Guide is a Search Labs experiment, meaning you have to opt-in before you’ll see any AI organization in your search results. When enabled, this feature takes over the “Web” tab of Google search. Even if you turn it on, Google notes there will be a toggle that allows you to revert to the normal, non-AI-optimized page.

An example of the Web Guide test.

Eventually, the test will expand to encompass more parts of the search experience, like the “All” tab—that’s the default search experience when you input a query from a browser or phone search bar. Google says it’s approaching this as an opt-in feature to start. So that sounds like Web Guide might be another AI Mode situation in which the feature rolls out widely after a short testing period. It’s technically possible the test will not result in a new universal search feature, but Google hasn’t yet met a generative AI implementation that it hasn’t liked.

Google’s new “Web Guide” will use AI to organize your search results Read More »

trump’s-order-to-make-chatbots-anti-woke-is-unconstitutional,-senator-says

Trump’s order to make chatbots anti-woke is unconstitutional, senator says


Trump plans to use chatbots to eliminate dissent, senator alleged.

The CEOs of every major artificial intelligence company received letters Wednesday urging them to fight Donald Trump’s anti-woke AI order.

Trump’s executive order requires any AI company hoping to contract with the federal government to jump through two hoops to win funding. First, they must prove their AI systems are “truth-seeking”—with outputs based on “historical accuracy, scientific inquiry, and objectivity” or else acknowledge when facts are uncertain. Second, they must train AI models to be “neutral,” which is vaguely defined as not favoring DEI (diversity, equity, and inclusion), “dogmas,” or otherwise being “intentionally encoded” to produce “partisan or ideological judgments” in outputs “unless those judgments are prompted by or otherwise readily accessible to the end user.”

Announcing the order in a speech, Trump said that the US winning the AI race depended on removing allegedly liberal biases, proclaiming that “once and for all, we are getting rid of woke.”

“The American people do not want woke Marxist lunacy in the AI models, and neither do other countries,” Trump said.

Senator Ed Markey (D.-Mass.) accused Republicans of basing their policies on feelings, not facts, joining critics who suggest that AI isn’t “woke” just because of a few “anecdotal” outputs that reflect a liberal bias. And he suggested it was hypocritical that Trump’s order “ignores even more egregious evidence” that contradicts claims that AI is trained to be woke, such as xAI’s Elon Musk explicitly confirming that Grok was trained to be more right-wing.

“On May 1, 2025, Grok—the AI chatbot developed by xAI, Elon Musk’s AI company—acknowledged that ‘xAI tried to train me to appeal to the right,’” Markey wrote in his letters to tech giants. “If OpenAI’s ChatGPT or Google’s Gemini had responded that it was trained to appeal to the left, congressional Republicans would have been outraged and opened an investigation. Instead, they were silent.”

He warned the heads of Alphabet, Anthropic, Meta, Microsoft, OpenAI, and xAI that Trump’s AI agenda was allegedly “an authoritarian power grab” intended to “eliminate dissent” and was both “dangerous” and “patently unconstitutional.”

Even if companies’ AI models are clearly biased, Markey argued that “Republicans are using state power to pressure private companies to adopt certain political viewpoints,” which he claimed is a clear violation of the First Amendment. If AI makers cave, Markey warned, they’d be allowing Trump to create “significant financial incentives” to ensure that “their AI chatbots do not produce speech that would upset the Trump administration.”

“This type of interference with private speech is precisely why the US Constitution has a First Amendment,” Markey wrote, while claiming that Trump’s order is factually baseless.

It’s “based on the erroneous belief that today’s AI chatbots are ‘woke’ and biased against Trump,” Markey said, urging companies “to fight this unconstitutional executive order and not become a pawn in Trump’s effort to eliminate dissent in this country.”

One big reason AI companies may fight order

Some experts agreed with Markey that Trump’s order was likely unconstitutional or otherwise unlawful, The New York Times reported.

For example, Trump may struggle to convince courts that the government isn’t impermissibly interfering with AI companies’ protected speech or that such interference may be necessary to ensure federal procurement of unbiased AI systems.

Genevieve Lakier, a law professor at the University of Chicago, told the NYT that the lack of clarity around what makes a model biased could be a problem. Courts could deem the order an act of “unconstitutional jawboning,” with the Trump administration and Republicans generally perceived as using legal threats to pressure private companies into producing outputs that they like.

Lakier suggested that AI companies may be so motivated to win government contracts or intimidated by possible retaliation from Trump that they may not even challenge the order, though.

Markey is hoping that AI companies will refuse to comply with the order; however, despite recognizing that it places companies “in a difficult position: Either stand on your principles and face the wrath of the Trump administration or cave to Trump and modify your company’s political speech.”

There is one big possible reason that AI companies may have to resist, though.

Oren Etzioni, the former CEO of the AI research nonprofit Allen Institute for Artificial Intelligence, told CNN that Trump’s anti-woke AI order may contradict the top priority of his AI Action Plan—speeding up AI innovation in the US—and actually threaten to hamper innovation.

If AI developers struggle to produce what the Trump administration considers “neutral” outputs—a technical challenge that experts agree is not straightforward—that could delay model advancements.

“This type of thing… creates all kinds of concerns and liability and complexity for the people developing these models—all of a sudden, they have to slow down,” Etzioni told CNN.

Senator: Grok scandal spotlights GOP hypocrisy

Some experts have suggested that rather than chatbots adopting liberal viewpoints, chatbots are instead possibly filtering out conservative misinformation and unintentionally appearing to favor liberal views.

Andrew Hall, a professor of political economy at Stanford Graduate School of Business—who published a May paper finding that “Americans view responses from certain popular AI models as being slanted to the left”—told CNN that “tech companies may have put extra guardrails in place to prevent their chatbots from producing content that could be deemed offensive.”

Markey seemed to agree, writing that Republicans’ “selective outrage matches conservatives’ similar refusal to acknowledge that the Big Tech platforms suspend or impose other penalties disproportionately on conservative users because those users are disproportionately likely to share misinformation, rather than due to any political bias by the platforms.”

It remains unclear what amount of supposed bias detected in outputs could cause a contract bid to be rejected or an ongoing contract to be canceled, but AI companies will likely be on the hook to pay any fees in terminating contracts.

Complying with Trump’s order could pose a struggle for AI makers for several reasons. First, they’ll have to determine what’s fact and what’s ideology, contending with conflicting government standards in how Trump defines DEI. For example, the president’s order counts among “pervasive and destructive” DEI ideologies any outputs that align with long-standing federal protections against discrimination on the basis of race or sex. In addition, they must figure out what counts as “suppression or distortion of factual information about” historical topics like critical race theory, systemic racism, or transgenderism.

The examples in Trump’s order highlighting outputs offensive to conservatives seem inconsequential. He calls out image generators depicting the Pope, the Founding Fathers, and Vikings as not white as problematic, as well as models refusing to misgender a person “even if necessary to stop a nuclear apocalypse” or show white people celebrating their achievements.

It’s hard to imagine how these kinds of flawed outputs could impact government processes, as compared to, say, government contracts granted to models that could be hiding covert racism or sexism.

So far, there has been one example of an AI model displaying a right-wing bias earning a government contract with no red flags raised about its outputs.

Earlier this summer, Grok shocked the world after Musk announced he would be updating the bot to eliminate a supposed liberal bias. The unhinged chatbot began spouting offensive outputs, including antisemitic posts that praised Hitler as well as proclaiming itself “MechaHitler.”

But those obvious biases did not conflict with the Pentagon’s decision to grant xAI a $200 million federal contract. In a statement, a Pentagon spokesperson insisted that “the antisemitism episode wasn’t enough to disqualify” xAI, NBC News reported, partly since “several frontier AI models have produced questionable outputs.”

The Pentagon’s statement suggested that the government expected to deal with such risks while seizing the opportunity of rapidly deploying emerging AI technology into government prototype processes. And perhaps notably, Trump provides a carveout for any agencies using AI models to safeguard national security, which could exclude the Pentagon from experiencing any “anti-woke” delays in accessing frontier models.

But that won’t help other agencies that must figure out how to assess models to meet anti-woke AI requirements over the next few months. And those assessments could cause delays that Trump may wish to avoid in pushing for widespread AI adoption across government.

Trump’s anti-woke AI agenda may be impossible

On the same day that Trump issued his anti-woke AI order, his AI Action Plan promised an AI “renaissance” fueling “intellectual achievements” by “unraveling ancient scrolls once thought unreadable, making breakthroughs in scientific and mathematical theory, and creating new kinds of digital and physical art.”

To achieve that, the US must “innovate faster and more comprehensively than our competitors” and eliminate regulatory barriers impeding innovation in order to “set the gold standard for AI worldwide.”

However, achieving the anti-woke ambitions of both orders raises a technical problem that even the president must accept currently has no solution. In his AI Action Plan, Trump acknowledged that “the inner workings of frontier AI systems are poorly understood,” with even “advanced technologists” unable to explain “why a model produced a specific output.”

Whether requiring AI companies to explain their AI outputs to win government contracts will mess with other parts of Trump’s action plan remains to be seen. But Samir Jain, vice president of policy at a civil liberties group called the Center for Democracy and Technology, told the NYT that he predicts the anti-woke AI agenda will set “a really vague standard that’s going to be impossible for providers to meet.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Trump’s order to make chatbots anti-woke is unconstitutional, senator says Read More »

nvidia-ai-chips-worth-$1b-smuggled-to-china-after-trump-export-controls

Nvidia AI chips worth $1B smuggled to China after Trump export controls


Black market for US semiconductors operates despite efforts to curb Beijing’s high-tech ambitions.

Credit: VGG | Getty Images

At least $1 billion worth of Nvidia’s advanced artificial intelligence processors were shipped to China in the three months after Donald Trump tightened chip export controls, exposing the limits of Washington’s efforts to restrain Beijing’s high-tech ambitions.

A Financial Times analysis of dozens of sales contracts, company filings, and multiple people with direct knowledge of the deals reveals that Nvidia’s B200 has become the most sought-after—and widely available—chip in a rampant Chinese black market for American semiconductors.

The processor is widely used by US powerhouses such as OpenAI, Google, and Meta to train their latest AI systems, but banned for sale to China.

In May, multiple Chinese distributors started selling B200s to suppliers of data centers that serve Chinese AI groups, according to documents reviewed by the FT. This was shortly after the Trump administration moved to prevent sales of the H20—a less-powerful Nvidia chip tailored to comply with Joe Biden-era curbs.

It is legal to receive and sell restricted Nvidia chips in China, as long as relevant border tariffs are paid, according to lawyers familiar with the rules. Entities selling and sending them to China would be violating US regulations, however.

Last week, Nvidia chief Jensen Huang announced that the Trump administration would begin to allow the selling of its China-specific H20 chip once more.

In the three months beforehand, Chinese distributors from Guangdong, Zhejiang, and Anhui provinces sold Nvidia’s B200s, as well as other restricted processors such as the H100 and H200.

According to contracts reviewed by the FT and people with knowledge of the transactions, the total sales during this period are estimated to be more than $1 billion.

Nvidia has long insisted there is “no evidence of any AI chip diversion”. There is no evidence that the company is involved in, or has knowledge of, its restricted products being sold to China.

“Trying to cobble together data centers from smuggled products is a losing proposition, both technically and economically,” Nvidia told the FT. “Data centers require service and support, which we provide only to authorized Nvidia products.”

“The new century of a smart China”

One Anhui-based company, whose name translates to “Gate of the Era,” is one of the largest sellers of B200s, according to documents seen by the FT.

It was founded in February, as speculation mounted that Trump would stop H20 chip sales to China. The company is fully owned by a group with the same name based in Shanghai, registered on the same day, according to company filings.

The chips were sold in ready-built racks, each containing eight B200s as well as other components and software needed to plug straight into a data centre. Such a rack is about the size of a large suitcase and weighs close to 150 kg including packaging.

The current market price ranges between RMB 3 million to RMB 3.5 million ($489,000) per rack, down from more than RMB 4 million in mid-May when they first became available in China in large quantities. The current prices represent about a 50 percent premium from the average selling price of similar products in the US.

Since mid-May, Gate of the Era obtained at least two shipments of a few hundred B200 racks each, according to people with knowledge of the deals. They sold them directly—or indirectly via secondary distributors—to various data centre suppliers and other companies. Gate of the Era and its affiliates are estimated to have sold close to $400mn of such products.

Gate of the Era lists an AI solution provider China Century—or Huajiyuan in Chinese—as its largest shareholder, according to company registration files.

Also headquartered in Shanghai, China Century states on its website it has a lab in Silicon Valley as well as a supply chain centre in Singapore, with the company saying it uses data tools to build “the new century of a smart China.”

China Century claims to have more than 100 business partners and highlights AliCloud, ByteDance’s Huoshan Cloud, as well as Baidu Cloud as “trusted partners” on its website.

AliCloud and Baidu did not respond to requests for comment. Huoshan Cloud’s name was taken off China Century’s website after the FT approached them for comment. Huoshan Cloud said: “It is standard practice for any company to manage the unauthorized use of its logo.

“We have not procured Nvidia’s chips. We do not have any related [Nvidia chip] business,” said China Century, adding that it did “smart city work,”

The FT visited the registered headquarters of Gate of the Era at an office in a government-run industrial park dedicated to cryptography companies. No representative was available. The company had not yet moved into the office since changing its registration to the address in June.

The FT also visited its previous registered address, which was occupied by a real estate investment group that had been there for more than two years and claimed no connection. When reached on the phone, Gate of the Era declined to comment.

According to industry insiders, product specifications and pictures of packaging seen by the FT, many of the B200 racks sold by Gate of the Era, as well as other Chinese distributors, over the past months were originally from Supermicro, a US-based assembler that provides chip solutions to data centers.

There is no suggestion that Supermicro is involved in or has knowledge of its products being smuggled into China. Supermicro said it “complies with all US export control requirements on the sale and export of GPU systems.”

“Export controls will not prevent the most advanced Nvidia products from entering China,” said one Chinese data centre operator. “What it creates is just inefficiency and huge profits for the risk-taking middlemen.”

“It’s like a seafood market”

Some Chinese distributors openly market products such as Supermicro’s B200 racks on social media that show photos of packages with the company’s logo—although it has not been verified if the sales have been completed.

To showcase the “plug-and-use” nature of such racks, some vendors provide testing for buyers, according to those with knowledge of the practice and clips posted online. Transactions tend to happen on the spot, with buyers picking up the products after checking their legitimacy.

On social media, groups are created to match supply and demand from hundreds of traders and data centre suppliers.

Apart from B200, various other restricted Nvidia chips such as H200, H100, and 5090 are being advertised openly on Chinese social media platforms such as Douyin and Xiaohongshu.

Packaging and installation pictures and videos seen by the FT show product logos of companies such as Supermicro, Dell, and Asus—infrastructure providers that assemble Nvidia’s chips into servers.

There is no suggestion that these companies are aware of the social media advertising or their products being sold in China.

Like Supermicro, Dell, and Asus said they maintained rigorous and strict compliance to all laws and regulations, including US export controls, and took action against partners who failed to comply.

“It’s like a seafood market,” said one distributor, “There’s no shortage.”

Racks for sale—with more smuggled stock to come

The B200 is in high demand given its performance, value, and relatively easy maintenance compared with the more complex Grace Blackwell series, according to industry insiders.

The GB200 AI rack, containing Nvidia’s most high-end products, also appear to be available in China despite US export controls.

One distributor claimed it had sold 10 racks of GB200 at close to RMB 40 million ($5.6 million) each. The FT could not independently verify this claim, while marketing information about GB200 from various distributors’ accounts on social media shows consistent pricing and stock status as “available for pick up onshore.”

Some Chinese distributors have even started advertising for their future stock of B300s, Nvidia’s upgrade from the B200 expected to enter mass production in the fourth quarter of this year.

US export controls have had some effect on the black market.

Given the nature of such products, leading Chinese AI players with global operations are not able to order them in a legally compliant way, install them in their own data centers, or receive Nvidia’s customer support.

This has led to third-party data centre operators becoming key buyers who then provide computing services. Other clients include smaller companies in tech, finance, and health care that do not have strong compliance requirements, as well as Chinese companies on the so-called US entity list that are not allowed to buy any Nvidia chips legally.

However, the scale of these projects is much smaller compared with mega clusters of data centers being built by tech giants around the world.

With H20 export controls having been lifted, many Chinese tech companies are expected to resume purchasing the compliant chips in large sums even though its performance is generations behind the still restricted products such as B200, according to people familiar with their plans.

Black market sales for B200s and other restricted Nvidia chips dropped noticeably after the relaxation of the H20 ban, according to multiple distributors.

“People are weighing their options now H20 is available again,” said one distributor. “But there will always be demand for the most cutting-edge stuff.”

The Southeast Asia stop off

Industry experts said that Southeast Asian countries have become markets where Chinese groups obtained restricted chips.

The US Department of Commerce is discussing adding more export controls on advanced AI products to countries such as Thailand as soon as September, according to two people familiar with the matter. This rule is mainly targeting Chinese intermediaries used to obtain advanced AI chips via these countries.

The US commerce department declined to comment. The Thai government did not respond to a request for comment.

Earlier this month, Malaysia introduced stricter export controls targeting advanced AI chip shipments from the country to other destinations, especially China.

The potential tightening of export controls on Southeast Asian countries has also contributed to buyers rushing to place orders before such rules take effect, according to people with knowledge of the matter.

Even if these avenues to obtain AI chips are closed, Chinese industry insiders said new shipping routes would be established. Supplies have already started arriving via European countries not on the restricted list.

“History has proven many times before that given the huge profit, arbitrators will always find a way,” said one Chinese distributor.

Additional reporting by Michael Acton, Demetri Sevastopulo, and Anantha Lakshmi.

© 2025 The Financial Times Ltd. All rights reserved. Please do not copy and paste FT articles and redistribute by email or post to the web.

Nvidia AI chips worth $1B smuggled to China after Trump export controls Read More »

white-house-unveils-sweeping-plan-to-“win”-global-ai-race-through-deregulation

White House unveils sweeping plan to “win” global AI race through deregulation

Trump’s plan was not welcomed by everyone. J.B. Branch, Big Tech accountability advocate for Public Citizen, in a statement provided to Ars, criticized Trump as giving “sweetheart deals” to tech companies that would cause “electricity bills to rise to subsidize discounted power for massive AI data centers.”

Infrastructure demands and energy requirements

Trump’s new AI plan tackles infrastructure head-on, stating that “AI is the first digital service in modern life that challenges America to build vastly greater energy generation than we have today.” To meet this demand, it proposes streamlining environmental permitting for data centers through new National Environmental Policy Act (NEPA) exemptions, making federal lands available for construction and modernizing the power grid—all while explicitly rejecting “radical climate dogma and bureaucratic red tape.”

The document embraces what it calls a “Build, Baby, Build!” approach—echoing a Trump campaign slogan—and promises to restore semiconductor manufacturing through the CHIPS Program Office, though stripped of “extraneous policy requirements.”

On the technology front, the plan directs Commerce to revise NIST’s AI Risk Management Framework to “eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change.” Federal procurement would favor AI developers whose systems are “objective and free from top-down ideological bias.” The document strongly backs open source AI models and calls for exporting American AI technology to allies while blocking administration-labeled adversaries like China.

Security proposals include high-security military data centers and warnings that advanced AI systems “may pose novel national security risks” in cyberattacks and weapons development.

Critics respond with “People’s AI Action Plan”

Before the White House unveiled its plan, more than 90 organizations launched a competing “People’s AI Action Plan” on Tuesday, characterizing the Trump administration’s approach as “a massive handout to the tech industry” that prioritizes corporate interests over public welfare. The coalition includes labor unions, environmental justice groups, and consumer protection nonprofits.

White House unveils sweeping plan to “win” global AI race through deregulation Read More »

openai-and-partners-are-building-a-massive-ai-data-center-in-texas

OpenAI and partners are building a massive AI data center in Texas

Stargate moves forward despite early skepticism

When OpenAI announced Stargate in January, critics questioned whether the company could deliver on its ambitious $500 billion funding promise. Trump ally and frequent Altman foe Elon Musk wrote on X that “They don’t actually have the money,” claiming that “SoftBank has well under $10B secured.”

Tech writer and frequent OpenAI critic Ed Zitron raised concerns about OpenAI’s financial position, noting the company’s $5 billion in losses in 2024. “This company loses $5bn+ a year! So what, they raise $19bn for Stargate, then what, another $10bn just to be able to survive?” Zitron wrote on Bluesky at the time.

Six months later, OpenAI’s Abilene data center has moved from construction to partial operation. Oracle began delivering Nvidia GB200 racks to the facility last month, and OpenAI reports it has started running early training and inference workloads to support what it calls “next-generation frontier research.”

Despite the White House announcement with President Trump in January, the Stargate concept dates back to March 2024, when Microsoft and OpenAI partnered on a $100 billion supercomputer as part of a five-phase plan. Over time, the plan evolved into its current form as a partnership with Oracle, SoftBank, and CoreWeave.

“Stargate is an ambitious undertaking designed to meet the historic opportunity in front of us,” writes OpenAI in the press release announcing the latest deal. “That opportunity is now coming to life through strong support from partners, governments, and investors worldwide—including important leadership from the White House, which has recognized the critical role AI infrastructure will play in driving innovation, economic growth, and national competitiveness.”

OpenAI and partners are building a massive AI data center in Texas Read More »

ai-video-is-invading-youtube-shorts-and-google-photos-starting-today

AI video is invading YouTube Shorts and Google Photos starting today

Google is following through on recent promises to add more generative AI features to its photo and video products. Over on YouTube, Google is rolling out the first wave of generative AI video for YouTube Shorts, but even if you’re not a YouTuber, you’ll be exposed to more AI videos soon. Google Photos, which is integrated with virtually every Android phone on the market, is also getting AI video-generation capabilities. In both cases, the features are currently based on the older Veo 2 model, not the more capable Veo 3 that has been meming across the Internet since it was announced at I/O in May.

YouTube CEO Neal Mohan confirmed earlier this summer that the company planned to add generative AI to the creator tools for YouTube Shorts. There were already tools to generate backgrounds for videos, but the next phase will involve creating new video elements from a text prompt.

Starting today, creators will be able to use a photo as the basis for a new generative AI video. YouTube also promises a collection of easily applied generative effects, which will be accessible from the Shorts camera. There’s also a new AI playground hub that the company says will be home to all its AI tools, along with examples and suggested prompts to help people pump out AI content.

The Veo 2-based videos aren’t as realistic as Veo 3 clips, but an upgrade is planned.

So far, all the YouTube AI video features are running on the Veo 2 model. The plan is still to move to Veo 3 later this summer. The AI features in YouTube Shorts are currently limited to the United States, Canada, Australia, and New Zealand, but they will expand to more countries later.

AI video is invading YouTube Shorts and Google Photos starting today Read More »

apple-intelligence-news-summaries-are-back,-with-a-big-red-disclaimer

Apple Intelligence news summaries are back, with a big red disclaimer

Apple has released the fourth developer betas of iOS 26, iPadOS 26, macOS 26 and its other next-generation software updates today. And along with their other changes and fixes, the new builds are bringing back Apple Intelligence notification summaries for news apps.

Apple disabled news notification summaries as part of the iOS 18.3 update in January. Incorrect summaries circulating on social media prompted news organizations to complain to Apple, particularly after one summary said that Luigi Mangione, alleged murderer of UnitedHealthcare CEO Brian Thompson, had died by suicide (he had not and has not).

Upon installing the new update, users of Apple Intelligence-compatible devices will be asked to enable or disable three broad categories of notifications: those for “News & Entertainment” apps, for “Communication & Social” apps, and for all other apps. The operating systems will list sample apps based on what you currently have installed on your device.

All Apple Intelligence notification summaries continue to be listed as “beta,” but Apple’s main change here is a big red disclaimer when you enable News & Entertainment notification summaries, pointing out that “summarization may change the meaning of the original headlines.” The notifications also get a special “summarized by Apple Intelligence” caption to further distinguish them from regular, unadulterated notifications.

Apple Intelligence news summaries are back, with a big red disclaimer Read More »

xai-workers-balked-over-training-request-to-help-“give-grok-a-face,”-docs-show

xAI workers balked over training request to help “give Grok a face,” docs show

For the more than 200 employees who did not opt out, xAI asked that they record 15- to 30-minute conversations, where one employee posed as the potential Grok user and the other posed as the “host.” xAI was specifically looking for “imperfect data,” BI noted, expecting that only training on crystal-clear videos would limit Grok’s ability to interpret a wider range of facial expressions.

xAI’s goal was to help Grok “recognize and analyze facial movements and expressions, such as how people talk, react to others’ conversations, and express themselves in various conditions,” an internal document said. Allegedly among the only guarantees to employees—who likely recognized how sensitive facial data is—was a promise “not to create a digital version of you.”

To get the most out of data submitted by “Skippy” participants, dubbed tutors, xAI recommended that they never provide one-word answers, always ask follow-up questions, and maintain eye contact throughout the conversations.

The company also apparently provided scripts to evoke facial expressions they wanted Grok to understand, suggesting conversation topics like “How do you secretly manipulate people to get your way?” or “Would you ever date someone with a kid or kids?”

For xAI employees who provided facial training data, privacy concerns may still exist, considering X—the social platform formerly known as Twitter that recently was folded into xAI—has recently been targeted by what Elon Musk called a “massive” cyberattack. Because of privacy risks ranging from identity theft to government surveillance, several states have passed strict biometric privacy laws to prevent companies from collecting such data without explicit consent.

xAI did not respond to Ars’ request for comment.

xAI workers balked over training request to help “give Grok a face,” docs show Read More »

gemini-deep-think-learns-math,-wins-gold-medal-at-international-math-olympiad

Gemini Deep Think learns math, wins gold medal at International Math Olympiad

In the past, making LLMs better at math would involve reinforcement learning with final answers. Luong explained to Ars that models trained in this way can get to the correct answer, but they have “incomplete reasoning,” and part of the IMO grading is based on showing your work. To prepare Deep Think for the IMO, Google used new reinforcement learning techniques with higher-quality “long answer” solutions to mathematical problems, giving the model better grounding in how to handle every step on the way to an answer. “With this kind of training, you can actually get robust, long-form reasoning,” said Luong.

Credit: Google DeepMind

As you might expect, Deep Think takes more time to generate an output compared to the simpler versions you can access in the Gemini app. However, the AI followed the same rules as the flesh-and-blood participants, which was only possible because of its ability to ingest the problems as natural language. Gemini was provided with the problem descriptions and gave its answers within the 4.5-hour time limit of the competition.

Rigorous proofs

AI firms like DeepMind have taken an interest in the IMO over the past few years because it presents a unique challenge. While the competition is aimed at pre-university mathematicians, the questions require critical thinking and an understanding of multiple mathematical disciplines, including algebra, combinatorics, geometry, and number theory. Only the most advanced AI models have any hope of accurately answering these multi-layered problems.

The DeepMind team has pointed out some interesting aspects of Deep Think’s performance, which they say come from its advanced training. In the third problem (below), for example, many human competitors applied a graduate-level concept called Dirichlet’s Theorem, using mathematics outside the intended scope of the competition. However, Deep Think recognized that it was possible to solve the problem with simpler math. “Our model actually made a brilliant observation and used only elementary number theory to create a self-contained proof of the given problem,” said DeepMind researcher and Brown University professor Junehyuk Jung.

Gemini Deep Think learns math, wins gold medal at International Math Olympiad Read More »

it’s-“frighteningly-likely”-many-us-courts-will-overlook-ai-errors,-expert-says

It’s “frighteningly likely” many US courts will overlook AI errors, expert says


Judges pushed to bone up on AI or risk destroying their court’s authority.

A judge points to a diagram of a hand with six fingers

Credit: Aurich Lawson | Getty Images

Credit: Aurich Lawson | Getty Images

Order in the court! Order in the court! Judges are facing outcry over a suspected AI-generated order in a court.

Fueling nightmares that AI may soon decide legal battles, a Georgia court of appeals judge, Jeff Watkins, explained why a three-judge panel vacated an order last month that appears to be the first known ruling in which a judge sided with someone seemingly relying on fake AI-generated case citations to win a legal fight.

Now, experts are warning that judges overlooking AI hallucinations in court filings could easily become commonplace, especially in the typically overwhelmed lower courts. And so far, only two states have moved to force judges to sharpen their tech competencies and adapt so they can spot AI red flags and theoretically stop disruptions to the justice system at all levels.

The recently vacated order came in a Georgia divorce dispute, where Watkins explained that the order itself was drafted by the husband’s lawyer, Diana Lynch. That’s a common practice in many courts, where overburdened judges historically rely on lawyers to draft orders. But that protocol today faces heightened scrutiny as lawyers and non-lawyers increasingly rely on AI to compose and research legal filings, and judges risk rubberstamping fake opinions by not carefully scrutinizing AI-generated citations.

The errant order partly relied on “two fictitious cases” to deny the wife’s petition—which Watkins suggested were “possibly ‘hallucinations’ made up by generative-artificial intelligence”—as well as two cases that had “nothing to do” with the wife’s petition.

Lynch was hit with $2,500 in sanctions after the wife appealed, and the husband’s response—which also appeared to be prepared by Lynch—cited 11 additional cases that were “either hallucinated” or irrelevant. Watkins was further peeved that Lynch supported a request for attorney’s fees for the appeal by citing “one of the new hallucinated cases,” writing it added “insult to injury.”

Worryingly, the judge could not confirm whether the fake cases were generated by AI or even determine if Lynch inserted the bogus cases into the court filings, indicating how hard it can be for courts to hold lawyers accountable for suspected AI hallucinations. Lynch did not respond to Ars’ request to comment, and her website appeared to be taken down following media attention to the case.

But Watkins noted that “the irregularities in these filings suggest that they were drafted using generative AI” while warning that many “harms flow from the submission of fake opinions.” Exposing deceptions can waste time and money, and AI misuse can deprive people of raising their best arguments. Fake orders can also soil judges’ and courts’ reputations and promote “cynicism” in the justice system. If left unchecked, Watkins warned, these harms could pave the way to a future where a “litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.”

“We have no information regarding why Appellee’s Brief repeatedly cites to nonexistent cases and can only speculate that the Brief may have been prepared by AI,” Watkins wrote.

Ultimately, Watkins remanded the case, partly because the fake cases made it impossible for the appeals court to adequately review the wife’s petition to void the prior order. But no matter the outcome of the Georgia case, the initial order will likely forever be remembered as a cautionary tale for judges increasingly scrutinized for failures to catch AI misuses in court.

“Frighteningly likely” judge’s AI misstep will be repeated

John Browning, a retired justice on Texas’ Fifth Court of Appeals and now a full-time law professor at Faulkner University, last year published a law article Watkins cited that warned of the ethical risks of lawyers using AI. In the article, Browning emphasized that the biggest concern at that point was that lawyers “will use generative AI to produce work product they treat as a final draft, without confirming the accuracy of the information contained therein or without applying their own independent professional judgment.”

Today, judges are increasingly drawing the same scrutiny, and Browning told Ars he thinks it’s “frighteningly likely that we will see more cases” like the Georgia divorce dispute, in which “a trial court unwittingly incorporates bogus case citations that an attorney includes in a proposed order” or even potentially in “proposed findings of fact and conclusions of law.”

“I can envision such a scenario in any number of situations in which a trial judge maintains a heavy docket and looks to counsel to work cooperatively in submitting proposed orders, including not just family law cases but other civil and even criminal matters,” Browning told Ars.

According to reporting from the National Center for State Courts, a nonprofit representing court leaders and professionals who are advocating for better judicial resources, AI tools like ChatGPT have made it easier for high-volume filers and unrepresented litigants who can’t afford attorneys to file more cases, potentially further bogging down courts.

Peter Henderson, a researcher who runs the Princeton Language+Law, Artificial Intelligence, & Society (POLARIS) Lab, told Ars that he expects cases like the Georgia divorce dispute aren’t happening every day just yet.

It’s likely that a “few hallucinated citations go overlooked” because generally, fake cases are flagged through “the adversarial nature of the US legal system,” he suggested. Browning further noted that trial judges are generally “very diligent in spotting when a lawyer is citing questionable authority or misleading the court about what a real case actually said or stood for.”

Henderson agreed with Browning that “in courts with much higher case loads and less adversarial process, this may happen more often.” But Henderson noted that the appeals court catching the fake cases is an example of the adversarial process working.

While that’s true in this case, it seems likely that anyone exhausted by the divorce legal process, for example, may not pursue an appeal if they don’t have energy or resources to discover and overturn errant orders.

Judges’ AI competency increasingly questioned

While recent history confirms that lawyers risk being sanctioned, fired from their firms, or suspended from practicing law for citing fake AI-generated cases, judges will likely only risk embarrassment for failing to catch lawyers’ errors or even for using AI to research their own opinions.

Not every judge is prepared to embrace AI without proper vetting, though. To shield the legal system, some judges have banned AI. Others have required disclosures—with some even demanding to know which specific AI tool was used—but that solution has not caught on everywhere.

Even if all courts required disclosures, Browning pointed out that disclosures still aren’t a perfect solution since “it may be difficult for lawyers to even discern whether they have used generative AI,” as AI features become increasingly embedded in popular legal tools. One day, it “may eventually become unreasonable to expect” lawyers “to verify every generative AI output,” Browning suggested.

Most likely—as a judicial ethics panel from Michigan has concluded—judges will determine “the best course of action for their courts with the ever-expanding use of AI,” Browning’s article noted. And the former justice told Ars that’s why education will be key, for both lawyers and judges, as AI advances and becomes more mainstream in court systems.

In an upcoming summer 2025 article in The Journal of Appellate Practice & Process, “The Dawn of the AI Judge,” Browning attempts to soothe readers by saying that AI isn’t yet fueling a legal dystopia. And humans are unlikely to face “robot judges” spouting AI-generated opinions any time soon, the former justice suggested.

Standing in the way of that, at least two states—Michigan and West Virginia—”have already issued judicial ethics opinions requiring judges to be ‘tech competent’ when it comes to AI,” Browning told Ars. And “other state supreme courts have adopted official policies regarding AI,” he noted, further pressuring judges to bone up on AI.

Meanwhile, several states have set up task forces to monitor their regional court systems and issue AI guidance, while states like Virginia and Montana have passed laws requiring human oversight for any AI systems used in criminal justice decisions.

Judges must prepare to spot obvious AI red flags

Until courts figure out how to navigate AI—a process that may look different from court to court—Browning advocates for more education and ethical guidance for judges to steer their use and attitudes about AI. That could help equip judges to avoid both ignorance of the many AI pitfalls and overconfidence in AI outputs, potentially protecting courts from AI hallucinations, biases, and evidentiary challenges sneaking past systems requiring human review and scrambling the court system.

An overlooked part of educating judges could be exposing AI’s influence so far in courts across the US. Henderson’s team is planning research that tracks which models attorneys are using most in courts. That could reveal “the potential legal arguments that these models are pushing” to sway courts—and which judicial interventions might be needed, Henderson told Ars.

“Over the next few years, researchers—like those in our group, the POLARIS Lab—will need to develop new ways to track the massive influence that AI will have and understand ways to intervene,” Henderson told Ars. “For example, is any model pushing a particular perspective on legal doctrine across many different cases? Was it explicitly trained or instructed to do so?”

Henderson also advocates for “an open, free centralized repository of case law,” which would make it easier for everyone to check for fake AI citations. “With such a repository, it is easier for groups like ours to build tools that can quickly and accurately verify citations,” Henderson said. That could be a significant improvement to the current decentralized court reporting system that often obscures case information behind various paywalls.

Dazza Greenwood, who co-chairs MIT’s Task Force on Responsible Use of Generative AI for Law, did not have time to send comments but pointed Ars to a LinkedIn thread where he suggested that a structural response may be needed to ensure that all fake AI citations are caught every time.

He recommended that courts create “a bounty system whereby counter-parties or other officers of the court receive sanctions payouts for fabricated cases cited in judicial filings that they reported first.” That way, lawyers will know that their work will “always” be checked and thus may shift their behavior if they’ve been automatically filing AI-drafted documents. In turn, that could alleviate pressure on judges to serve as watchdogs. It also wouldn’t cost much—mostly just redistributing the exact amount of fees that lawyers are sanctioned to AI spotters.

Novel solutions like this may be necessary, Greenwood suggested. Responding to a question asking if “shame and sanctions” are enough to stop AI hallucinations in court, Greenwood said that eliminating AI errors is imperative because it “gives both otherwise generally good lawyers and otherwise generally good technology a bad name.” Continuing to ban AI or suspend lawyers as a preferred solution risks dwindling court resources just as cases likely spike rather than potentially confronting the problem head-on.

Of course, there’s no guarantee that the bounty system would work. But “would the fact of such definite confidence that your cures will be individually checked and fabricated cites reported be enough to finally… convince lawyers who cut these corners that they should not cut these corners?”

In absence of a fake case detector like Henderson wants to build, experts told Ars that there are some obvious red flags that judges can note to catch AI-hallucinated filings.

Any case number with “123456” in it probably warrants review, Henderson told Ars. And Browning noted that AI tends to mix up locations for cases, too. “For example, a cite to a purported Texas case that has a ‘S.E. 2d’ reporter wouldn’t make sense, since Texas cases would be found in the Southwest Reporter,” Browning said, noting that some appellate judges have already relied on this red flag to catch AI misuses.

Those red flags would perhaps be easier to check with the open source tool that Henderson’s lab wants to make, but Browning said there are other tell-tale signs of AI usage that anyone who has ever used a chatbot is likely familiar with.

“Sometimes a red flag is the language cited from the hallucinated case; if it has some of the stilted language that can sometimes betray AI use, it might be a hallucination,” Browning said.

Judges already issuing AI-assisted opinions

Several states have assembled task forces like Greenwood’s to assess the risks and benefits of using AI in courts. In Georgia, the Judicial Council of Georgia Ad Hoc Committee on Artificial Intelligence and the Courts released a report in early July providing “recommendations to help maintain public trust and confidence in the judicial system as the use of AI increases” in that state.

Adopting the committee’s recommendations could establish “long-term leadership and governance”; a repository of approved AI tools, education, and training for judicial professionals; and more transparency on AI used in Georgia courts. But the committee expects it will take three years to implement those recommendations while AI use continues to grow.

Possibly complicating things further as judges start to explore using AI assistants to help draft their filings, the committee concluded that it’s still too early to tell if the judges’ code of conduct should be changed to prevent “unintentional use of biased algorithms, improper delegation to automated tools, or misuse of AI-generated data in judicial decision-making.” That means, at least for now, that there will be no code-of-conduct changes in Georgia, where the only case in which AI hallucinations are believed to have swayed a judge has been found.

Notably, the committee’s report also confirmed that there are no role models for courts to follow, as “there are no well-established regulatory environments with respect to the adoption of AI technologies by judicial systems.” Browning, who chaired a now-defunct Texas AI task force, told Ars that judges lacking guidance will need to stay on their toes to avoid trampling legal rights. (A spokesperson for the State Bar of Texas told Ars the task force’s work “concluded” and “resulted in the creation of the new standing committee on Emerging Technology,” which offers general tips and guidance for judges in a recently launched AI Toolkit.)

“While I definitely think lawyers have their own duties regarding AI use, I believe that judges have a similar responsibility to be vigilant when it comes to AI use as well,” Browning said.

Judges will continue sorting through AI-fueled submissions not just from pro se litigants representing themselves but also from up-and-coming young lawyers who may be more inclined to use AI, and even seasoned lawyers who have been sanctioned up to $5,000 for failing to check AI drafts, Browning suggested.

In his upcoming “AI Judge” article, Browning points to at least one judge, 11th Circuit Court of Appeals Judge Kevin Newsom, who has used AI as a “mini experiment” in preparing opinions for both a civil case involving an insurance coverage issue and a criminal matter focused on sentencing guidelines. Browning seems to appeal to judges’ egos to get them to study up so they can use AI to enhance their decision-making and possibly expand public trust in courts, not undermine it.

“Regardless of the technological advances that can support a judge’s decision-making, the ultimate responsibility will always remain with the flesh-and-blood judge and his application of very human qualities—legal reasoning, empathy, strong regard for fairness, and unwavering commitment to ethics,” Browning wrote. “These qualities can never be replicated by an AI tool.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

It’s “frighteningly likely” many US courts will overlook AI errors, expert says Read More »

exhausted-man-defeats-ai-model-in-world-coding-championship

Exhausted man defeats AI model in world coding championship

While Dębiak won 500,000 yen and survived his ordeal better than the legendary steel driver, the AtCoder World Tour Finals pushes humans and AI models to their limits through complex optimization challenges that have no perfect solution—only incrementally better ones.

Coding marathon tests human endurance against AI efficiency

The AtCoder World Tour Finals represents one of competitive programming’s most exclusive events, inviting only the top 12 programmers worldwide based on their performance throughout the previous year. The Heuristic division focuses on “NP-hard” optimization problems. In programming, heuristics are problem-solving techniques that find good-enough solutions through shortcuts and educated guesses when perfect answers would take too long to calculate.

All competitors, including OpenAI, were limited to identical hardware provided by AtCoder, ensuring a level playing field between human and AI contestants. According to the contest rules, participants could use any programming language available on AtCoder, with no penalty for resubmission but a mandatory five-minute wait between submissions.

Leaderboard results for the 2025 AtCoder World Finals Heuristic Contest, showing Dębiak (as

Final leaderboard results for the 2025 AtCoder World Finals Heuristic Contest, showing Dębiak (as “Psyho”) on top. Credit: AtCoder

The final contest results showed Psyho finishing with a score of 1,812,272,558,909 points, while OpenAI’s model (listed as “OpenAIAHC”) scored 1,654,675,725,406 points—a margin of roughly 9.5 percent. OpenAI’s artificial entrant, a custom simulated reasoning model similar to o3, placed second overall, ahead of 10 other human programmers who had qualified through year-long rankings.

OpenAI characterized the second-place finish as a milestone for AI models in competitive programming. “Models like o3 rank among the top-100 in coding/math contests, but as far as we know, this is the first top-3 placement in a premier coding/math contest,” a company spokesperson said in an email to Ars Technica. “Events like AtCoder give us a way to test how well our models can reason strategically, plan over long time horizons, and improve solutions through trial and error—just like a human would.”

Exhausted man defeats AI model in world coding championship Read More »