
OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among software developers, their adoption has begun to touch every aspect of the software development process, including the improvement of the AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.
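
To make the workflow concrete, here is a minimal sketch of driving the CLI from a script. It assumes (beyond anything in the article) that the `codex` binary is installed and that its non-interactive `exec` subcommand accepts a plain-text prompt; the repository path and test file in the example are hypothetical.

```python
# A minimal sketch of scripting the open-source Codex CLI from Python.
# Assumptions (not from the article): the `codex` binary is on PATH, and its
# non-interactive `exec` subcommand accepts a plain-text prompt.
import subprocess

def run_codex_task(prompt: str, repo_dir: str) -> str:
    """Hand one task to the agent and return whatever it prints."""
    result = subprocess.run(
        ["codex", "exec", prompt],  # verify the subcommand with `codex --help`
        cwd=repo_dir,               # the agent operates inside this repository
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(run_codex_task("Fix the failing test in tests/test_parser.py", "."))
```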

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab-completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was shaped in part by researchers who have since left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”


The interface for OpenAI’s Codex in ChatGPT. Credit: OpenAI

It’s no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently love the command-line version, though. Embiricos said Codex usage among external developers jumped twentyfold after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5-Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I really love this about our team,” Embiricos said. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

But describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to avoid anthropomorphizing AI models as much as possible while still describing what these systems do using analogies that make sense to general readers. People can talk to Codex as they would to a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and merely simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “teammate” and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI teammates”

According to our interviews, the most dramatic example of Codex’s internal impact came from OpenAI’s development of the Sora Android app, which Embiricos said the tool allowed the company to create in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total,” he said. The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of success with Codex in house, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is literally a teammate in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.


The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end for frontier benchmark progress, as AI companies pivot to simulated reasoning models and agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Meanwhile, startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”

But will tools like Codex threaten software developer jobs? Bayes acknowledged the concern but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Neither man projects a future where Codex runs by itself without some form of human oversight. They see the tool as an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”

This article was updated on December 12, 2025 at 6:50 pm to mention the METR study.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


Google Translate expands live translation to all earbuds on Android


Translate can now use Gemini to interpret the meaning of a phrase rather than simply translating each word. Credit: Google

Regardless of whether you’re using live translate or just checking a single phrase, Google claims the Gemini-powered upgrade will serve you well. Google Translate is now apparently better at understanding the nuance of languages, with an awareness of idioms and local slang. Google uses the example of “stealing my thunder,” which wouldn’t make a lick of sense when translated literally into other languages. The new translation model, which is also available in the search-based translation interface, supports over 70 languages.

Google also debuted language-learning features earlier this year, borrowing a page from educational apps like Duolingo. You can tell the app your skill level with a language, as well as whether you need help with travel-oriented conversations or more everyday interactions. The app uses this to create tailored listening and speaking exercises.


The Translate app’s learning tools are getting better. Credit: Google

With this big update, Translate will be more of a stickler about your pronunciation. Google promises more feedback and tips based on your spoken replies in the learning modules. The app will also now keep track of how often you complete language practice, showing your daily streak in the app.

If “number go up” will help you learn more, then this update is for you. Practice mode is also launching in almost 20 new countries, including Germany, India, Sweden, and Taiwan.


Scientists built an AI co-pilot for prosthetic bionic hands

To test their AI-powered hand, the team asked intact and amputee participants to manipulate fragile objects: pick up a paper cup and drink from it, or take an egg from a plate and put it down somewhere else. Without the AI, they could succeed roughly one or two times in 10 attempts. With the AI assistant turned on, their success rate jumped to 80 or 90 percent. The AI also decreased the participants’ cognitive burden, meaning they had to focus less on making the hand work.

But we’re still a long way away from seamlessly integrating machines with the human body.

Into the wild

“The next step is to really take this system into the real world and have someone use it in their home setting,” Trout says. So far, the performance of the AI bionic hand has been assessed only under controlled laboratory conditions, using settings and objects the team specifically chose or designed.

“I want to make a caveat here that this hand is not as dexterous or easy to control as a natural, intact limb,” George cautions. He thinks every incremental improvement in prosthetics allows amputees to do more tasks in their daily lives. Still, to get to the Star Wars or Cyberpunk level of technology, where bionic prostheses are just as good as or better than natural limbs, we’re going to need more than incremental changes.

Trout says we’re almost there as far as robotics go. “These prostheses are really dexterous, with high degrees of freedom,” Trout says, “but there’s no good way to control them.” This comes down in part to the challenge of getting information into and out of the users themselves. “Skin surface electromyography is very noisy, so improving this interface with things like internal electromyography or using neural implants can really improve the algorithms we already have,” Trout argued. This is why the team is currently working on neural interface technologies and looking for industry partners.

“The goal is to combine all these approaches in one device,” George says. “We want to build an AI-powered robotic hand with a neural interface working with a company that would take it to the market in larger clinical trials.”

Nature Communications, 2025. DOI: 10.1038/s41467-025-65965-9


Trump tries to block state AI laws himself after Congress decided not to


Trump claims state laws force AI makers to embed “ideological bias” in models.

President Donald Trump talks to journalists after signing executive orders in the Oval Office at the White House on August 25, 2025 in Washington, DC. Credit: Getty Images | Chip Somodevilla

President Trump issued an executive order yesterday attempting to thwart state AI laws, saying that federal agencies must fight state laws because Congress hasn’t yet implemented a national AI standard. Trump’s executive order tells the Justice Department, Commerce Department, Federal Communications Commission, Federal Trade Commission, and other federal agencies to take a variety of actions.

“My Administration must act with the Congress to ensure that there is a minimally burdensome national standard—not 50 discordant State ones. The resulting framework must forbid State laws that conflict with the policy set forth in this order… Until such a national standard exists, however, it is imperative that my Administration takes action to check the most onerous and excessive laws emerging from the States that threaten to stymie innovation,” Trump’s order said. The order claims that state laws, such as one passed in Colorado, “are increasingly responsible for requiring entities to embed ideological bias within models.”

Congressional Republicans recently decided not to include a Trump-backed plan to block state AI laws in the National Defense Authorization Act (NDAA), although it could be included in other legislation. Sen. Ted Cruz (R-Texas) has also failed to get congressional backing for legislation that would punish states with AI laws.

“After months of failed lobbying and two defeats in Congress, Big Tech has finally received the return on its ample investment in Donald Trump,” US Sen. Ed Markey (D-Mass.) said yesterday. “With this executive order, Trump is delivering exactly what his billionaire benefactors demanded—all at the expense of our kids, our communities, our workers, and our planet.”

Markey said that “a broad, bipartisan coalition in Congress has rejected the AI moratorium again and again.” Sen. Maria Cantwell (D-Wash.) said the “executive order’s overly broad preemption threatens states with lawsuits and funding cuts for protecting their residents from AI-powered frauds, scams, and deepfakes.”

Trump orders Bondi to sue states

Sen. Brian Schatz (D-Hawaii) said that “preventing states from enacting common-sense regulation that protects people from the very real harms of AI is absurd and dangerous. Congress has a responsibility to get this technology right—and quickly—but states must be allowed to act in the public interest in the meantime. I’ll be working with my colleagues to introduce a full repeal of this order in the coming days.”

The Trump order includes a variation on Cruz’s proposal to prevent states with AI laws from accessing broadband grant funds. The executive order also includes a plan that Trump recently floated to have the federal government file lawsuits against states with AI laws.

Within 30 days of yesterday’s order, US Attorney General Pam Bondi is required to create an AI Litigation Task Force “whose sole responsibility shall be to challenge State AI laws inconsistent with the policy set forth in section 2 of this order, including on grounds that such laws unconstitutionally regulate interstate commerce, are preempted by existing Federal regulations, or are otherwise unlawful in the Attorney General’s judgment.”

Americans for Responsible Innovation, a group that lobbies for regulation of AI, said the Trump order “relies on a flimsy and overly broad interpretation of the Constitution’s Interstate Commerce Clause cooked up by venture capitalists over the last six months.”

Section 2 of Trump’s order is written vaguely to give the administration leeway to challenge many types of AI laws. “It is the policy of the United States to sustain and enhance the United States’ global AI dominance through a minimally burdensome national policy framework for AI,” the section says.

Colorado law irks Trump

The executive order specifically names a Colorado law that requires AI developers to protect consumers against “algorithmic discrimination.” It defines this type of discrimination as “any condition in which the use of an artificial intelligence system results in an unlawful differential treatment or impact that disfavors an individual or group of individuals on the basis” of age, race, sex, and other protected characteristics.

The Colorado law compels developers of “high-risk systems” to make various disclosures, implement a risk management policy and program, give consumers the right to “correct any incorrect personal data that a high-risk system processed in making a consequential decision,” and let consumers appeal any “adverse consequential decision concerning the consumer arising from the deployment of a high-risk system.”

Trump’s order alleges that the Colorado law “may even force AI models to produce false results in order to avoid a ‘differential treatment or impact’ on protected groups.” Trump’s order also says that “state laws sometimes impermissibly regulate beyond State borders, impinging on interstate commerce.”

Trump ordered the Commerce Department to evaluate existing state AI laws and identify “onerous” ones that conflict with the policy. “That evaluation of State AI laws shall, at a minimum, identify laws that require AI models to alter their truthful outputs, or that may compel AI developers or deployers to disclose or report information in a manner that would violate the First Amendment or any other provision of the Constitution,” the order said.

States would be declared ineligible for broadband funds

Under the order, states with AI laws that get flagged by the Trump administration will be deemed ineligible for “non-deployment funds” from the US government’s $42 billion Broadband Equity, Access, and Deployment (BEAD) program. The amount of non-deployment funds will be sizable because it appears that only about half of the $42 billion allocated by Congress will be used by the Trump administration to help states subsidize broadband deployment.

States with AI laws would not be blocked from receiving the deployment subsidies, but would be ineligible for the non-deployment funds that could be used for other broadband-related purposes. Beyond broadband, Trump’s order tells other federal agencies to “assess their discretionary grant programs” and consider withholding funds from states with AI laws.

Other agencies are being ordered to use whatever authority they have to preempt state laws. The order requires Federal Communications Commission Chairman Brendan Carr to “initiate a proceeding to determine whether to adopt a Federal reporting and disclosure standard for AI models that preempts conflicting State laws.” It also requires FTC Chairman Andrew Ferguson to issue a policy statement detailing “circumstances under which State laws that require alterations to the truthful outputs of AI models are preempted by the Federal Trade Commission Act’s prohibition on engaging in deceptive acts or practices affecting commerce.”

Finally, Trump’s order requires administration officials to “prepare a legislative recommendation establishing a uniform Federal policy framework for AI that preempts State AI laws that conflict with the policy set forth in this order.” The proposed ban would apply to most types of state AI laws, with exceptions for rules relating to “child safety protections; AI compute and data center infrastructure, other than generally applicable permitting reforms; [and] state government procurement and use of AI.”

It would be up to Congress to decide whether to pass the proposed legislation. But the various other components of the executive order could dissuade states from implementing AI laws even if Congress takes no action.


Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.


Runway claims its GWM-1 “world models” can stay coherent for minutes at a time

Even using the word “general” has an air of aspiration to it. You would expect a general world model to be, well, one model—but in this case, we’re looking at three distinct, post-trained models. That caveats the general-ness a bit, but Runway says that it’s “working toward unifying many different domains and action spaces under a single base world model.”

A competitive field

And that brings us to another important consideration: With GWM-1, Runway is entering a competitive gold-rush space where its differentiators and competitive advantages are less clear than they were for video. With video, Runway has been able to make major inroads in film/television, advertising, and other industries because its founders are perceived as being more rooted in those creative industries than most competitors, and they’ve designed tools with those industries in mind.

There are indeed hypothetical applications of world models in film, television, advertising, and game development—but it was apparent from Runway’s livestream that the company is also looking at applications in robotics as well as physics and life sciences research, where competitors are already well-established and where we’ve seen increasing investment in recent months.

Many of those competitors are big tech companies with massive resource advantages over Runway. Runway was one of the first to market with a sellable product, and its aggressive efforts to court industry professionals directly have so far allowed it to overcome those advantages in video generation, but it remains to be seen how things will play out with world models, where it doesn’t enjoy either advantage any more than the other entrants do.

Regardless, the GWM-1 advancements are impressive—especially if Runway’s claims about consistency and coherence over longer stretches of time are true.

Runway also used its livestream to announce new Gen 4.5 video generation capabilities, including native audio, audio editing, and multi-shot video editing. Further, it announced a deal with CoreWeave, a cloud computing company with an AI focus. The deal will see Runway utilizing Nvidia’s GB300 NVL72 racks on CoreWeave’s cloud infrastructure for future training and inference.


OpenAI releases GPT-5.2 after “code red” Google threat alert

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking spits out simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro spits out even more simulated reasoning text with the goal of delivering the highest-accuracy performance for difficult problems.


A chart of GPT-5.2 Thinking benchmark results comparing it to its predecessor, taken from OpenAI’s website. Credit: OpenAI

GPT-5.2 features a 400,000-token context window, allowing it to process hundreds of documents at once, and a knowledge cutoff date of August 31, 2025.

GPT-5.2 is rolling out to paid ChatGPT subscribers starting Thursday, with API access available to developers. Pricing in the API runs $1.75 per million input tokens for the standard model, a 40 percent increase over GPT-5.1. OpenAI says the older GPT-5.1 will remain available in ChatGPT for paid users for three months under a legacy models dropdown.
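
For scale, here is a quick back-of-the-envelope calculation from those two figures (and nothing beyond them): the 40 percent increase implies GPT-5.1’s input price was about $1.25 per million tokens, and a single prompt that fills the entire 400,000-token window costs about 70 cents at the new rate.

```python
# Back-of-the-envelope math using only the pricing figures cited above.
gpt52_input = 1.75                    # USD per million input tokens (standard)
gpt51_input = gpt52_input / 1.40      # a 40% increase implies GPT-5.1 was ~$1.25
context_tokens = 400_000              # GPT-5.2's full context window

full_prompt_cost = gpt52_input * context_tokens / 1_000_000
print(f"Implied GPT-5.1 input price: ${gpt51_input:.2f} per million tokens")
print(f"One maximally long prompt:   ${full_prompt_cost:.2f}")  # about $0.70
```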

Playing catch-up with Google

The release follows a tricky month for OpenAI. In early December, Altman issued an internal “code red” directive after Google’s Gemini 3 model topped multiple AI benchmarks and gained market share. The memo called for delaying other initiatives, including advertising plans for ChatGPT, to focus on improving the chatbot’s core experience.

The stakes for OpenAI are substantial. The company has made commitments totaling $1.4 trillion for AI infrastructure buildouts over the next several years, bets it made when it had a more obvious technology lead among AI companies. Google’s Gemini app now has more than 650 million monthly active users, while OpenAI reports 800 million weekly active users for ChatGPT.


Disney says Google AI infringes copyright “on a massive scale”

While Disney wants its characters out of Google AI generally, the letter specifically cited the AI tools in YouTube. Google has started adding its Veo AI video model to YouTube, allowing creators to more easily create and publish videos. That seems to be a greater concern for Disney than image models like Nano Banana.

Google has said little about Disney’s warning—a warning Google must have known was coming. A Google spokesperson has issued the following brief statement on the matter.

“We have a longstanding and mutually beneficial relationship with Disney, and will continue to engage with them,” Google says. “More generally, we use public data from the open web to build our AI and have built additional innovative copyright controls like Google-extended and Content ID for YouTube, which give sites and copyright holders control over their content.”

Perhaps this is previewing Google’s argument in a theoretical lawsuit. That copyrighted Disney content was all over the open internet, so is it really Google’s fault it ended up baked into the AI?

Content silos for AI

The generative AI boom has treated copyright as a mere suggestion as companies race to gobble up training data and remix it as “new” content. A cavalcade of companies, including The New York Times and Getty Images, have sued over how their material has been used and replicated by AI. Disney itself threatened a lawsuit against Character.AI earlier this year, leading to the removal of Disney content from the service.

Google isn’t Character.AI, though. It’s probably no coincidence that Disney is challenging Google at the same time it is entering into a content deal with OpenAI. Disney has invested $1 billion in the AI firm and agreed to a three-year licensing deal that officially brings Disney characters to OpenAI’s Sora video app. The specifics of that arrangement are still subject to negotiations.


Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora


An AI-generated version of OpenAI CEO Sam Altman seen in a still capture from a video generated by Sora 2. Credit: OpenAI

Under the new agreement with Disney, Sora users will be able to generate short videos using characters such as Mickey Mouse, Darth Vader, Iron Man, Simba, and characters from franchises including Frozen, Inside Out, Toy Story, and The Mandalorian, along with costumes, props, vehicles, and environments.

The ChatGPT image generator will also gain official access to the same intellectual property, although that information was trained into these AI models long ago. What’s changing is that OpenAI will allow Disney-related content generated by its AI models to officially pass through its content moderation filters and reach the user, sanctioned by Disney.

On Disney’s end of the deal, the company plans to deploy ChatGPT for its employees and use OpenAI’s technology to build new features for Disney+. A curated selection of fan-made Sora videos will stream on the Disney+ platform starting in early 2026.

The agreement does not include any talent likenesses or voices. Disney and OpenAI said they have committed to “maintaining robust controls to prevent the generation of illegal or harmful content” and to “respect the rights of individuals to appropriately control the use of their voice and likeness.”

OpenAI CEO Sam Altman called the deal a model for collaboration between AI companies and studios. “This agreement shows how AI companies and creative leaders can work together responsibly to promote innovation that benefits society, respect the importance of creativity, and help works reach vast new audiences,” Altman said.

From adversary to partner

Money opens all kinds of doors, and the new partnership represents a dramatic reversal in Disney’s approach to OpenAI from just a few months ago. At that time, Disney and other major studios refused to participate in Sora 2 following its launch on September 30.


A new open-weights AI coding model is closing in on proprietary options

On Tuesday, French AI startup Mistral AI released Devstral 2, a 123 billion parameter open-weights coding model designed to work as part of an autonomous software engineering agent. The model achieves a 72.2 percent score on SWE-bench Verified, a benchmark that attempts to test whether AI systems can solve real GitHub issues, putting it among the top-performing open-weights models.

Perhaps more notably, Mistral didn’t just release an AI model; it also released a new development app called Mistral Vibe. It’s a command-line interface (CLI) similar to Claude Code, OpenAI Codex, and Gemini CLI that lets developers interact with the Devstral models directly in their terminal. The tool can scan file structures and Git status to maintain context across an entire project, make changes across multiple files, and execute shell commands autonomously. Mistral released the CLI under the Apache 2.0 license.

It’s always wise to take AI benchmarks with a large grain of salt, but we’ve heard from employees of the big AI companies that they pay very close attention to how well models do on SWE-bench Verified, which presents AI models with 500 real software engineering problems pulled from GitHub issues in popular Python repositories. The AI must read the issue description, navigate the codebase, and generate a working patch that passes unit tests. While some AI researchers have noted that around 90 percent of the tasks in the benchmark test relatively simple bug fixes that experienced engineers could complete in under an hour, it’s one of the few standardized ways to compare coding models.
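
To make the benchmark’s mechanics concrete, here is a simplified sketch of what a SWE-bench-style harness does per task; the dictionary fields, helper signature, and pytest invocation are assumptions for illustration, not the benchmark’s actual code.

```python
# A simplified sketch of a SWE-bench-style evaluation loop. The instance
# field names and test invocation are illustrative, not the real harness.
import subprocess
import tempfile

def evaluate(instance: dict, generate_patch) -> bool:
    """Returns True if the model's patch makes the repo's own tests pass."""
    repo_dir = tempfile.mkdtemp()
    subprocess.run(["git", "clone", instance["repo_url"], repo_dir], check=True)
    subprocess.run(["git", "checkout", instance["base_commit"]],
                   cwd=repo_dir, check=True)

    # The model reads the GitHub issue text and returns a unified diff.
    patch = generate_patch(instance["problem_statement"], repo_dir)
    subprocess.run(["git", "apply", "-"], input=patch.encode(),
                   cwd=repo_dir, check=True)

    # Scoring: the task counts as solved only if the unit tests now pass.
    result = subprocess.run(["python", "-m", "pytest", *instance["test_paths"]],
                            cwd=repo_dir)
    return result.returncode == 0
```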

At the same time as the larger AI coding model, Mistral also released Devstral Small 2, a 24 billion parameter version that scores 68 percent on the same benchmark and can run locally on consumer hardware like a laptop, with no Internet connection required. Both models support a 256,000-token context window, allowing them to process moderately large codebases (although what counts as large is relative to overall project complexity). The company released Devstral 2 under a modified MIT license and Devstral Small 2 under the more permissive Apache 2.0 license.


US taking 25% cut of Nvidia chip sales “makes no sense,” experts say


Trump’s odd Nvidia reversal may open the door for China to demand Blackwell access.

Donald Trump’s decision to allow Nvidia to export an advanced artificial intelligence chip, the H200, to China may give China exactly what it needs to win the AI race, experts and lawmakers have warned.

The H200 is about one-tenth as powerful as Nvidia’s Blackwell chip, currently the tech giant’s most advanced chip, which cannot be exported to China. But the H200 is six times more powerful than the H20, the most advanced chip available in China today. Meanwhile, China’s leading AI chip maker, Huawei, is estimated to be about two years behind Nvidia’s technology. By approving the sales, Trump may unwittingly be helping Chinese chip makers “catch up” to Nvidia, Jake Sullivan told The New York Times.

Sullivan, a former Biden-era national security advisor who helped design AI chip export curbs on China, told the NYT that Trump’s move was “nuts” because “China’s main problem” in the AI race “is they don’t have enough advanced computing capability.”

“It makes no sense that President Trump is solving their problem for them by selling them powerful American chips,” Sullivan said. “We are literally handing away our advantage. China’s leaders can’t believe their luck.”

Trump apparently was persuaded by Nvidia CEO Jensen Huang and his “AI czar,” David Sacks, to reverse course on H200 export curbs. They convinced Trump that restricting sales would ensure that only Chinese chip makers would get a piece of China’s market, shoring up revenue flows that dominant firms like Huawei could pour into R&D.

By instead allowing Nvidia sales, China’s industry would remain hooked on US chips, the thinking goes. And Nvidia could use those funds—perhaps $10–15 billion annually, Bloomberg Intelligence has estimated—to further its own R&D efforts. That cash influx, theoretically, would allow Nvidia to maintain the US advantage.

Along the way, the US would receive a 25 percent cut of sales, which lawmakers from both sides of the aisle warned may not be legal and which suggested to foreign rivals that US national security was “now up for sale,” the NYT reported. The president has claimed that conditions on the sales will safeguard national security but, frustrating critics, has provided no details.

Experts slam Nvidia plan as “flawed”

Trump’s plan is “flawed,” The Economist reported.

For years, the US has established tech dominance by keeping advanced technology away from China. Trump risks rocking that boat by “tearing up America’s export-control policy,” particularly if China’s chip industry simply buys up the H200s as a short-term tactic to learn from the technology and beef up its domestic production of advanced chips, The Economist reported.

In a sign that’s exactly what many expect could happen, investors in China were apparently so excited by Trump’s announcement that they immediately poured money into Moore Threads, expected to be China’s best answer to Nvidia, the South China Morning Post reported.

Several experts at the nonpartisan think tank the Council on Foreign Relations also criticized the policy change, cautioning that the reversal of course threatened to undermine US competition with China.

Suggesting that Trump was “effectively undoing” export curbs sought during his first term, Zongyuan Zoe Liu warned that China “buys today to learn today, with the intention to build tomorrow.”

And perhaps more concerning, she suggested, is that Trump’s policy signals weakness. Rather than forcing Chinese dependence on US tech, reversing course showed China that the US will “back down” under pressure, she warned. And they’re getting that message at a time when “Chinese leaders have a lot of reasons to believe they are not only winning the trade war but also making progress towards a higher degree of strategic autonomy.”

In a post on X, Rush Doshi—a CFR expert who previously advised Biden on national security issues related to China—suggested that the policy change was “possibly decisive in the AI race.”

“Compute is our main advantage—China has more power, engineers, and the entire edge layer—so by giving this up, we increase the odds the world runs on Chinese AI,” Doshi wrote.

Experts fear Trump may not understand the full impact of his decision. In the short term, Michael C. Horowitz wrote for CFR, “it is indisputable” that allowing H200 exports benefits China’s frontier AI and efforts to scale data centers. And Doshi pointed out that Trump’s shift may trigger more advanced technology flowing into China, as US allies that restricted sales of the machines used to build AI chips may soon follow his lead and lift their curbs. As China learns to be self-reliant from any influx of advanced tech, Sullivan warned that China’s leaders “intend to get off of American semiconductors as soon as they can.”

“So, the argument that we can keep them ‘addicted’ holds no water,” Sullivan said. “They want American chips right now for one simple reason: They are behind in the AI race, and this will help them catch up while they build their own chip capabilities.”

China may reject H200, demand Blackwell access

It remains unclear if China will approve H200 sales, but some of the country’s biggest firms, including ByteDance, Tencent, and Alibaba, are interested, anonymous insider sources told Reuters.

In the past, China has instructed companies to avoid Nvidia, warning of possible backdoors giving Nvidia a kill switch to remotely shut down chips. Such backdoors could potentially destabilize Chinese firms’ operations and R&D. Nvidia has denied that such backdoors exist, but Chinese firms have reportedly sought reassurances from Nvidia in the aftermath of Trump’s policy change. Likely just as unpopular with Chinese firms and the government, Nvidia confirmed recently that it has built location-verification tech that could help the US detect when restricted chips are leaked into China. Should the US ever renew export curbs on H200 chips, widespread adoption of the chips could cause chaos for Chinese firms down the road.

Without giving China its sought-after reassurances, Nvidia may not end up benefiting as much as it hoped from its mission to reclaim lost revenue from the Chinese market. Today, Chinese firms control about 60 percent of China’s AI chip market, where only a few years ago American firms—led by Nvidia—controlled 80 percent, The Economist reported.

But for China, the temptation to buy up Nvidia chips may be too great to pass up. Another CFR expert, Chris McGuire, estimated that Nvidia could suddenly start exporting as many as 3 million H200s into China next year. “This would at least triple the amount of aggregate AI computing power China could add domestically” in 2026, McGuire wrote, and possibly trigger disastrous outcomes for the US.

“This could cause DeepSeek and other Chinese AI developers to close the gap with leading US AI labs and enable China to develop an ‘AI Belt and Road’ initiative—a complement to its vast global infrastructure investment network already in place—that competes with US cloud providers around the world,” McGuire forecasted.

As China mulls the benefits and risks, the government called an emergency meeting to discuss local firms’ potential concerns about buying the chips, according to The Information. Beijing reportedly ended that meeting with a promise to issue a decision soon.

Horowitz suggested that a primary reason that China may reject the H200s could be to squeeze even bigger concessions out of Trump, whose administration recently has been working to maintain a tenuous truce with China.

“China could come back demanding the Blackwell or something else,” Horowitz suggested.

In a statement, Nvidia—which plans to release a chip called the Rubin to surpass the Blackwell soon—praised Trump’s policy as striking “a thoughtful balance that is great for America.”

China will rip off Nvidia’s chips, Republican warns

Both Democratic and Republican lawmakers in Congress criticized Trump’s plan, including senators behind a bipartisan push to limit AI chip sales to China.

Some have questioned how much thought was put into the policy, as the US confusingly continues restricting less advanced AI chips (like the A100 and H100) while green-lighting H200 sales. Trump’s Justice Department also seems to be struggling to keep up. The NYT noted that just “hours before” Trump announced the policy change, the DOJ announced “it had detained two people for selling those chips to the country.”

The chair of the Select Committee on Competition with China, Rep. John Moolenaar (R-Mich.), warned on X that the news wouldn’t be good for the US or Nvidia. First, the Chinese Communist Party “will use these highly advanced chips to strengthen its military capabilities and totalitarian surveillance,” he suggested. And second, “Nvidia should be under no illusions—China will rip off its technology, mass produce it themselves, and seek to end Nvidia as a competitor.”

“That is China’s playbook and it is using it in every critical industry,” Moolenaar said.

House Democrats on committees dealing with foreign affairs and competition with China echoed those concerns, The Hill reported, warning that “under this administration, our national security is for sale.”

Nvidia’s Huang seems pleased with the outcome, which follows months of the CEO pressuring the administration to lift export curbs limiting Nvidia’s growth in Chinese markets, the NYT reported. Last week, Trump heaped praise on Huang after one meeting, calling Huang a “smart man” and suggesting the Nvidia chief has “done an amazing job” helping Trump understand the stakes.

At an October news conference ahead of the deal’s official approval, Huang suggested that government lawyers were researching ways to get around a US law that prohibits charging companies fees for export licenses. Eventually, Trump is expected to release a policy that outlines how the US will collect those fees without conflicting with that law.

Senate Democrats appear unlikely to embrace such a policy, issuing a joint statement condemning the H200 sales as dooming the US in the AI race and threatening national security.

“Access to these chips would give China’s military transformational technology to make its weapons more lethal, carry out more effective cyberattacks against American businesses and critical infrastructure and strengthen their economic and manufacturing sector,” Senators wrote.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


Big Tech joins forces with Linux Foundation to standardize AI agents

Big Tech has spent the past year telling us we’re living in the era of AI agents, but most of what we’ve been promised is still theoretical. As companies race to turn fantasy into reality, they’ve developed a collection of tools to guide the development of generative AI. A cadre of major players in the AI race, including Anthropic, Block, and OpenAI, has come together to promote interoperability with the newly formed Agentic AI Foundation (AAIF). This move elevates a handful of popular technologies and could make them a de facto standard for AI development going forward.

The development path for agentic AI models is cloudy to say the least, but companies have invested so heavily in creating these systems that some tools have percolated to the surface. The AAIF, which is part of the nonprofit Linux Foundation, has been launched to govern the development of three key AI technologies: Model Context Protocol (MCP), goose, and AGENTS.md.

MCP is probably the most well-known of the trio, having been open-sourced by Anthropic a year ago. The goal of MCP is to link AI agents to data sources in a standardized way—Anthropic (and now the AAIF) is fond of calling MCP a “USB-C port for AI.” Rather than creating custom integrations for every different database or cloud storage platform, MCP allows developers to quickly and easily connect to any MCP-compliant server.
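
To make the “USB-C port” analogy concrete, here is a minimal sketch of the server side of that handshake, based on the quickstart pattern in Anthropic’s Python SDK; the server name and tool are invented, and the exact FastMCP API should be verified against the SDK’s current documentation.

```python
# A minimal MCP server exposing one tool, following the quickstart pattern
# of Anthropic's Python SDK (the "mcp" package). The server name and tool
# are hypothetical; verify the FastMCP API against the SDK's current docs.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-notes")

@mcp.tool()
def search_notes(query: str) -> str:
    """Stand-in for a real data source an agent might need to reach."""
    notes = {"standup": "Ship the CLI fix by Friday."}
    return notes.get(query, "No matching note found.")

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio, so any compliant client can connect
```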

Since its release, MCP has been widely used across the AI industry. Google announced at I/O 2025 that it was adding support for MCP in its dev tools, and many of its products have since added MCP servers to make data more accessible to agents. OpenAI also adopted MCP just a few months after it was released.

A simple diagram of the Model Context Protocol. Credit: Anthropic

Expanding use of MCP might help users customize their AI experience. For instance, the new Pebble Index 01 ring uses a local LLM that can act on your voice notes, and it supports MCP for user customization.

Local AI models have to make some sacrifices compared to bigger cloud-based models, but MCP can fill in the functionality gaps. “A lot of tasks on productivity and content are fully doable on the edge,” Qualcomm’s head of AI products, Vinesh Sukumar, tells Ars. “With MCP, you have a handshake with multiple cloud service providers for any kind of complex task to be completed.”


In comedy of errors, men accused of wiping gov databases turned to an AI tool

Two sibling contractors convicted a decade ago of hacking into US State Department systems have once again been charged, this time for a comically ham-fisted attempt to steal and destroy government records just minutes after being fired from their contractor jobs.

The Department of Justice on Thursday said that Muneeb Akhter and Sohaib Akhter, both 34, of Alexandria, Virginia, deleted databases and documents maintained on behalf of and belonging to three government agencies. The brothers were federal contractors working for an undisclosed company in Washington, DC, that provides software and services to 45 US agencies. Prosecutors said the men coordinated the crimes and began carrying them out just minutes after being fired.

Using AI to cover up an alleged crime—what could go wrong?

On February 18 at roughly 4:55 pm, the men were fired from the company, according to an indictment unsealed on Thursday. Five minutes later, they allegedly began trying to access their employer’s system and federal government databases. By then, access to one of the brothers’ accounts had already been terminated. The other brother, however, allegedly accessed a government agency’s database stored on the employer’s server and issued commands to prevent other users from connecting or making changes to the database. Then, prosecutors said, he issued a command to delete 96 databases, many of which contained sensitive investigative files and records related to Freedom of Information Act matters.

Despite their brazen attempt to steal and destroy information from multiple government agencies, the men lacked knowledge of the database commands needed to cover up their alleged crimes. So they allegedly did what many amateurs do: turned to an AI chat tool.

One minute after deleting Department of Homeland Security information, Muneeb Akhter allegedly asked an AI tool “how do i clear system logs from SQL servers after deleting databases.” Shortly afterward, he queried the tool “how do you clear all event and application logs from Microsoft windows server 2012,” prosecutors said.

The indictment provides enough details of the databases wiped and information stolen to indicate that the brothers’ attempts to cover their tracks failed. It’s unclear whether the apparent failure was due to the AI tool providing inadequate instructions or the men failing to follow them correctly. Prosecutors say they also obtained records of discussions between the men in the hours or days following, in which they discussed removing incriminating evidence from their homes. Three days later, the men allegedly wiped their employer-issued laptops by reinstalling the operating system.
