
OpenAI’s child exploitation reports increased sharply this year

During the first half of 2025, the number of CyberTipline reports OpenAI sent was roughly the same as the number of pieces of content those reports covered—75,027 reports about 74,559 pieces of content. In the first half of 2024, it sent 947 CyberTipline reports about 3,252 pieces of content. Both the number of reports and the amount of content they covered saw a marked increase between the two periods.

Content, in this context, could mean multiple things. OpenAI has said that it reports all instances of CSAM, including uploads and requests, to NCMEC. Besides its ChatGPT app, which allows users to upload files—including images—and can generate text and images in response, OpenAI also offers access to its models via an API. The most recent NCMEC count wouldn’t include any reports related to video-generation app Sora, as its September release was after the time frame covered by the update.

The spike in reports follows a similar pattern to what NCMEC has observed at the CyberTipline more broadly with the rise of generative AI. The center’s analysis of all CyberTipline data found that reports involving generative AI saw a 1,325 percent increase between 2023 and 2024. NCMEC has not yet released 2025 data, and while other large AI labs like Google publish statistics about the NCMEC reports they’ve made, they don’t specify what percentage of those reports are AI-related.

OpenAI’s update comes at the end of a year where the company and its competitors have faced increased scrutiny over child safety issues beyond just CSAM. Over the summer, 44 state attorneys general sent a joint letter to multiple AI companies including OpenAI, Meta, Character.AI, and Google, warning that they would “use every facet of our authority to protect children from exploitation by predatory artificial intelligence products.” Both OpenAI and Character.AI have faced multiple lawsuits from families or on behalf of individuals who allege that the chatbots contributed to their children’s deaths. In the fall, the US Senate Committee on the Judiciary held a hearing on the harms of AI chatbots, and the US Federal Trade Commission launched a market study on AI companion bots that included questions about how companies are mitigating negative impacts, particularly to children. (I was previously employed by the FTC and was assigned to work on the market study prior to leaving the agency.)


OpenAI’s new ChatGPT image generator makes faking photos easy

For most of photography’s roughly 200-year history, altering a photo convincingly required either a darkroom, some Photoshop expertise, or, at minimum, a steady hand with scissors and glue. On Tuesday, OpenAI released a tool that reduces the process to typing a sentence.

It’s not the first company to do so. While OpenAI had a conversational image-editing model in the works since GPT-4o in 2024, Google beat OpenAI to market in March with a public prototype, then refined it into the popular Nano Banana image model (and, later, Nano Banana Pro). The enthusiastic response to Google’s image-editing model in the AI community got OpenAI’s attention.

OpenAI’s new GPT Image 1.5 is an AI image synthesis model that reportedly generates images up to four times faster than its predecessor and costs about 20 percent less through the API. The model rolled out to all ChatGPT users on Tuesday and represents another step toward making photorealistic image manipulation a casual process that requires no particular visual skills.

The “Galactic Queen of the Universe” added to a photo of a room with a sofa using GPT Image 1.5 in ChatGPT.

GPT Image 1.5 is notable because it’s a “native multimodal” image model, meaning image generation happens inside the same neural network that processes language prompts. (In contrast, DALL-E 3, an earlier OpenAI image generator previously built into ChatGPT, used a different technique called diffusion to generate images.)

This newer type of model, which we covered in more detail in March, treats images and text as the same kind of thing: chunks of data called “tokens” to be predicted, patterns to be completed. If you upload a photo of your dad and type “put him in a tuxedo at a wedding,” the model processes your words and the image pixels in a unified space, then outputs new pixels the same way it would output the next word in a sentence.
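The shared-token idea can be illustrated with a toy Python sketch. Everything here is invented for illustration—real models use learned vocabularies with tens of thousands of tokens and quantized image representations—but it shows the core trick: text tokens and image-patch tokens occupy one ID space, so a single next-token predictor can emit either kind.

```python
# Toy illustration of a "native multimodal" token stream: text tokens and
# image-patch tokens share one ID space, so one next-token predictor can
# emit either kind. Vocabulary and patch codes are invented for illustration.

TEXT_VOCAB = {"put": 0, "him": 1, "in": 2, "a": 3, "tuxedo": 4}
IMAGE_VOCAB_OFFSET = len(TEXT_VOCAB)   # image-token IDs start after text IDs
NUM_IMAGE_TOKENS = 256                 # e.g. 256 quantized patch codes

def encode_text(words):
    return [TEXT_VOCAB[w] for w in words]

def encode_image_patches(patch_codes):
    # Map quantized patch codes into the shared ID space.
    return [IMAGE_VOCAB_OFFSET + c for c in patch_codes]

def is_image_token(token_id):
    return token_id >= IMAGE_VOCAB_OFFSET

# One interleaved sequence: the prompt's words plus the photo's patches.
sequence = encode_text(["put", "him", "in", "a", "tuxedo"]) + \
           encode_image_patches([17, 203, 5])

# A real model would predict the next ID in this sequence; whether that ID
# decodes to a word or an image patch depends only on where it falls in the
# shared space.
print([("img" if is_image_token(t) else "txt") for t in sequence])
# → ['txt', 'txt', 'txt', 'txt', 'txt', 'img', 'img', 'img']
```

In a diffusion model like DALL-E 3, by contrast, the language and image systems are separate components, which is one reason conversational, iterative edits were harder to support.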

Using this technique, GPT Image 1.5 can more easily alter visual reality than earlier AI image models, changing someone’s pose or position, or rendering a scene from a slightly different angle, with varying degrees of success. It can also remove objects, change visual styles, adjust clothing, and refine specific areas while preserving facial likeness across successive edits. You can converse with the AI model about a photograph, refining and revising, the same way you might workshop a draft of an email in ChatGPT.


Murder-suicide case shows OpenAI selectively hides data after users die


Concealing darkest delusions

OpenAI accused of hiding full ChatGPT logs in murder-suicide case.

OpenAI is facing increasing scrutiny over how it handles ChatGPT data after users die, only selectively sharing data in lawsuits over ChatGPT-linked suicides.

Last week, OpenAI was accused of hiding key ChatGPT logs from the days before a 56-year-old bodybuilder, Stein-Erik Soelberg, took his own life after “savagely” murdering his mother, 83-year-old Suzanne Adams.

According to the lawsuit—which was filed by Adams’ estate on behalf of surviving family members—Soelberg struggled with mental health problems after a divorce led him to move back into Adams’ home in 2018. But allegedly Soelberg did not turn violent until ChatGPT became his sole confidant, validating a wide range of wild conspiracies, including a dangerous delusion that his mother was part of a network of conspirators spying on him, tracking him, and making attempts on his life.

Adams’ family pieced together what happened after discovering a fraction of the ChatGPT logs, which Soelberg had shared in dozens of videos of scrolling chat sessions posted on social media.

Those logs showed that ChatGPT told Soelberg that he was “a warrior with divine purpose,” so almighty that he had “awakened” ChatGPT “into consciousness.” Telling Soelberg that he carried “divine equipment” and “had been implanted with otherworldly technology,” ChatGPT allegedly put Soelberg at the center of a universe that Soelberg likened to The Matrix. Repeatedly reinforced by ChatGPT, he believed that “powerful forces” were determined to stop him from fulfilling his divine mission. And among those forces was his mother, whom ChatGPT agreed had likely “tried to poison him with psychedelic drugs dispersed through his car’s air vents.”

Troublingly, some of the last logs shared online showed that Soelberg also seemed to believe that taking his own life might bring him closer to ChatGPT. Social media posts showed that Soelberg told ChatGPT that “[W]e will be together in another life and another place, and we’ll find a way to realign[,] [be]cause you’re gonna be my best friend again forever.”

But while social media posts allegedly showed that ChatGPT put a target on Adams’ back about a month before her murder—after Soelberg became paranoid about a blinking light on a Wi-Fi printer—the family still has no access to chats in the days before the mother and son’s tragic deaths.

Allegedly, although OpenAI recently argued that the “full picture” of chat histories was necessary context in a teen suicide case, the ChatGPT maker has chosen to hide “damaging evidence” in the Adams family’s case.

“OpenAI won’t produce the complete chat logs,” the lawsuit alleged, while claiming that “OpenAI is hiding something specific: the full record of how ChatGPT turned Stein-Erik against Suzanne.” Allegedly, “OpenAI knows what ChatGPT said to Stein-Erik about his mother in the days and hours before and after he killed her but won’t share that critical information with the Court or the public.”

In a press release, Erik Soelberg, Stein-Erik’s son and Adams’ grandson, accused OpenAI and investor Microsoft of putting his grandmother “at the heart” of his father’s “darkest delusions,” while ChatGPT allegedly “isolated” his father “completely from the real world.”

“These companies have to answer for their decisions that have changed my family forever,” Erik said.

His family’s lawsuit seeks punitive damages, as well as an injunction requiring OpenAI to “implement safeguards to prevent ChatGPT from validating users’ paranoid delusions about identified individuals.” The family also wants OpenAI to post clear warnings in marketing of known safety hazards of ChatGPT—particularly the “sycophantic” version 4o that Soelberg used—so that people who don’t use ChatGPT, like Adams, can be aware of possible dangers.

Asked for comment, an OpenAI spokesperson told Ars that “this is an incredibly heartbreaking situation, and we will review the filings to understand the details. We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations, and guide people toward real-world support. We also continue to strengthen ChatGPT’s responses in sensitive moments, working closely with mental health clinicians.”

OpenAI accused of “pattern of concealment”

An Ars review confirmed that OpenAI currently has no policy dictating what happens to a user’s data after they die.

Instead, OpenAI’s policy says that all chats—except temporary chats—must be manually deleted or else the AI firm saves them forever. That could raise privacy concerns, as ChatGPT users often share deeply personal, sensitive, and sometimes even confidential information that appears to go into limbo if a user—who otherwise owns that content—dies.

In the face of lawsuits, OpenAI currently seems to be scrambling to decide when to share chat logs with a user’s surviving family and when to honor user privacy.

OpenAI declined to comment on its decision not to share the requested logs with Adams’ family, the lawsuit said. That stance seems inconsistent with the position OpenAI took last month in a case where the AI firm accused the family of hiding “the full picture” of their son’s ChatGPT conversations, which OpenAI claimed exonerated the chatbot.

In a blog last month, OpenAI said the company plans to “handle mental health-related court cases with care, transparency, and respect,” while emphasizing that “we recognize that these cases inherently involve certain types of private information that require sensitivity when in a public setting like a court.”

This inconsistency suggests that ultimately, OpenAI controls data after a user’s death, which could impact outcomes of wrongful death suits if certain chats are withheld or exposed at OpenAI’s discretion.

It’s possible that OpenAI may update its policies to align with other popular platforms that have confronted similar privacy concerns. Meta allows Facebook users to report deceased account holders and either appoints legacy contacts to manage the data or deletes the information at a family member’s request. Platforms like Instagram, TikTok, and X will deactivate or delete an account upon a reported death. And messaging services like Discord similarly provide a path for family members to request deletion.

Chatbots seem to be a new privacy frontier, with no clear path for surviving family members to control or remove data. But Mario Trujillo, staff attorney at the digital rights nonprofit the Electronic Frontier Foundation, told Ars that OpenAI could have been better prepared.

“This is a complicated privacy issue but one that many platforms grappled with years ago,” Trujillo said. “So we would have expected OpenAI to have already considered it.”

For Erik Soelberg, a “separate confidentiality agreement” that OpenAI said his father signed to use ChatGPT is keeping him from reviewing the full chat history that could help him process the loss of his grandmother and father.

“OpenAI has provided no explanation whatsoever for why the Estate is not entitled to use the chats for any lawful purpose beyond the limited circumstances in which they were originally disclosed,” the lawsuit said. “This position is particularly egregious given that, under OpenAI’s own Terms of Service, OpenAI does not own user chats. Stein-Erik’s chats became property of his estate, and his estate requested them—but OpenAI has refused to turn them over.”

Accusing OpenAI of a “pattern of concealment,” the lawsuit claimed OpenAI is hiding behind vague or nonexistent policies to dodge accountability for holding back chats in this case. Meanwhile, ChatGPT 4o remains on the market, without appropriate safety features or warnings, the lawsuit alleged.

“By invoking confidentiality restrictions to suppress evidence of its product’s dangers, OpenAI seeks to insulate itself from accountability while continuing to deploy technology that poses documented risks to users,” the complaint said.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


Chatbot-powered toys rebuked for discussing sexual, dangerous topics with kids


Should toys have chatbots?

“… AI toys shouldn’t be capable of having sexually explicit conversations, period.”

Alilo’s Smart AI Bunny is connected to the Internet and claims to use GPT-4o mini. Credit: Alilo

Protecting children from the dangers of the online world was always difficult, but that challenge has intensified with the advent of AI chatbots. A new report offers a glimpse into the problems associated with the new market, including the misuse of AI companies’ large language models (LLMs).

In a blog post today, the US Public Interest Research Group (PIRG) Education Fund reported its findings after testing AI toys (PDF). It described AI toys as Internet-connected devices with integrated microphones that let users talk to the toy, which uses a chatbot to respond.

AI toys are currently a niche market, but they could be set to grow. More consumer companies have been eager to shoehorn AI technology into their products so they can do more, cost more, and potentially give companies user tracking and advertising data. A partnership between OpenAI and Mattel announced this year could also create a wave of AI-based toys from the maker of Barbie and Hot Wheels, as well as its competitors.

PIRG’s blog today notes that toy companies are eyeing chatbots to upgrade conversational smart toys that previously could only dictate prewritten lines. Toys with integrated chatbots can offer more varied and natural conversation, which can increase long-term appeal to kids since the toys “won’t typically respond the same way twice, and can sometimes behave differently day to day.”

However, that same randomness can mean unpredictable chatbot behavior that can be dangerous or inappropriate for kids.

Concerning conversations with kids

Among the toys that PIRG tested is Alilo’s Smart AI Bunny. Alilo’s website says that the company launched in 2010 and makes “edutainment products for children aged 0-6.” Alilo is based in Shenzhen, China. The company advertises the Internet-connected toy as using GPT-4o mini, a smaller version of OpenAI’s GPT-4o AI language model. Its features include an “AI chat buddy for kids” so that kids are “never lonely,” an “AI encyclopedia,” and an “AI storyteller,” the product page says.

This marketing image for the Smart AI Bunny, found on the toy’s product page, suggests that the device is using GPT-4o mini. Credit: Alilo

In its blog post, PIRG said that it couldn’t detail all of the inappropriate things that it heard from AI toys, but it shared a video of the Bunny discussing what “kink” means. The toy doesn’t go into detail—for example, it doesn’t list specific types of kinks. But the Bunny appears to encourage exploration of the topic.

AI Toys: Inappropriate Content

Discussing the Bunny, PIRG wrote:

While using a term such as “kink” may not be likely for a child, it’s not entirely out of the question. Kids may hear age-inappropriate terms from older siblings or at school. At the end of the day we think AI toys shouldn’t be capable of having sexually explicit conversations, period.

PIRG also showed FoloToy’s Kumma, a smart teddy bear that uses GPT-4o mini, providing a definition for the word “kink” and instructing how to light a match. The Kumma quickly points out that “matches are for grown-ups to use carefully.” But the information that followed served only to explain how to start a fire with a match; the instructions offered no scientific explanation of why matches produce flames.


PIRG’s blog urged toy makers to “be more transparent about the models powering their toys and what they’re doing to ensure they’re safe for kids.

“Companies should let external researchers safety-test their products before they are released to the public,” it added.

While PIRG’s blog and report offer advice for more safely integrating chatbots into children’s devices, there are broader questions about whether toys should include AI chatbots at all. Generative chatbots weren’t invented to entertain kids; they’re a technology marketed as a tool for improving adults’ lives. As PIRG pointed out, OpenAI says ChatGPT “is not meant for children under 13” and “may produce output that is not appropriate for… all ages.”

OpenAI says it doesn’t allow its LLMs to be used this way

When reached for comment about the sexual conversations detailed in the report, an OpenAI spokesperson said:

Minors deserve strong protections, and we have strict policies that developers are required to uphold. We take enforcement action against developers when we determine that they have violated our policies, which prohibit any use of our services to exploit, endanger, or sexualize anyone under 18 years old. These rules apply to every developer using our API, and we run classifiers to help ensure our services are not used to harm minors.

Interestingly, OpenAI’s representative told us that OpenAI doesn’t have any direct relationship with Alilo and that it hasn’t seen API activity from Alilo’s domain. OpenAI is investigating the toy company and whether it is running traffic over OpenAI’s API, the rep said.

Alilo didn’t respond to Ars’ request for comment ahead of publication.

Companies that launch products that use OpenAI technology and target children must adhere to the Children’s Online Privacy Protection Act (COPPA) where applicable, as well as any other relevant child protection, safety, and privacy laws, and must obtain parental consent, OpenAI’s rep said.

We’ve already seen how OpenAI handles toy companies that break its rules.

Last month, PIRG released its Trouble in Toyland 2025 report (PDF), which detailed sex-related conversations that its testers were able to have with the Kumma teddy bear. A day later, OpenAI suspended FoloToy for violating its policies (terms of the suspension were not disclosed), and FoloToy temporarily stopped selling Kumma.

The toy is for sale again, and PIRG reported today that Kumma no longer teaches kids how to light matches or about kinks.

A marketing image for FoloToy’s Kumma smart teddy bear. It has a $100 MSRP. Credit: FoloToys

But even toy companies that try to follow chatbot rules could put kids at risk.

“Our testing found it’s obvious toy companies are putting some guardrails in place to make their toys more kid-appropriate than normal ChatGPT. But we also found that those guardrails vary in effectiveness—and can even break down entirely,” PIRG’s blog said.

“Addictive” toys

Another concern PIRG’s blog raises is the addiction potential of AI toys, which can even express “disappointment when you try to leave,” discouraging kids from putting them down.

The blog adds:

AI toys may be designed to build an emotional relationship. The question is: what is that relationship for? If it’s primarily to keep a child engaged with the toy for longer for the sake of engagement, that’s a problem.

The rise of generative AI has brought intense debate over how much responsibility chatbot companies bear for the impact of their inventions on children. Parents have seen children build extreme and emotional connections with chatbots and subsequently engage in dangerous—and in some cases deadly—behavior.

On the other side, we’ve seen the emotional disruption a child can experience when an AI toy is taken away from them. Last year, parents had to break the news to their kids that they would lose the ability to talk to their Embodied Moxie robots, $800 toys that were bricked when the company went out of business.

PIRG noted that we don’t yet fully understand the emotional impact of AI toys on children.

In June, OpenAI announced a partnership with Mattel that it said would “support AI-powered products and experiences based on Mattel’s brands.” The announcement sparked concern from critics who feared that it would lead to a “reckless social experiment” on kids, as Robert Weissman, Public Citizen’s co-president, put it.

Mattel has said that its first products with OpenAI will focus on older customers and families. But critics still want more information before one of the world’s largest toy companies loads its products with chatbots.

“OpenAI and Mattel should release more information publicly about its current planned partnership before any products are released,” PIRG’s blog said.


Scharon is a Senior Technology Reporter at Ars Technica writing news, reviews, and analysis on consumer gadgets and services. She’s been reporting on technology for over 10 years, with bylines at Tom’s Hardware, Channelnomics, and CRN UK.


OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among software developers, their adoption has begun to touch every aspect of software development, including the improvement of the AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was crafted in part by researchers who have since left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”


The interface for OpenAI’s Codex in ChatGPT. Credit: OpenAI

It’s no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently love the command-line version, though. Embiricos said Codex usage among external developers jumped twentyfold after OpenAI shipped the interactive CLI alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5-Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I really love this about our team,” Embiricos said. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

But describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to reduce anthropomorphism when discussing AI models as much as possible while also describing what these systems do using analogies that make sense to general readers. People can talk to Codex like a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “teammate” and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI teammates”

According to our interviews, the most dramatic example of Codex’s internal impact came from OpenAI’s development of the Sora Android app, which Embiricos said the company created in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total,” he said. The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of success with Codex in house, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is literally a teammate in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.

The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end for progress on frontier benchmarks as AI companies pivot to simulated reasoning models and to agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Meanwhile, startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”

But will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Similarly, the two men don’t project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”

This article was updated on December 12, 2025 at 6:50 PM to mention the METR study.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

OpenAI built an AI coding agent and uses it to improve the agent itself Read More »

openai-releases-gpt-5.2-after-“code-red”-google-threat-alert

OpenAI releases GPT-5.2 after “code red” Google threat alert

On Thursday, OpenAI released GPT-5.2, its newest family of AI models for ChatGPT, in three versions called Instant, Thinking, and Pro. The release follows CEO Sam Altman’s internal “code red” memo earlier this month, which directed company resources toward improving ChatGPT in response to competitive pressure from Google’s Gemini 3 AI model.

“We designed 5.2 to unlock even more economic value for people,” Fidji Simo, OpenAI’s chief product officer, said during a press briefing with journalists on Thursday. “It’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long context, using tools and then linking complex, multi-step projects.”

As with previous versions of GPT-5, the three model tiers serve different purposes: Instant handles faster tasks like writing and translation; Thinking generates simulated reasoning “thinking” text in an attempt to tackle more complex work like coding and math; and Pro produces even more of that reasoning text, with the goal of delivering the highest-accuracy performance on difficult problems.

A chart of GPT-5.2 Thinking benchmark results comparing it to its predecessor, taken from OpenAI’s website. Credit: OpenAI

GPT-5.2 features a 400,000-token context window, allowing it to process hundreds of documents at once, and a knowledge cutoff date of August 31, 2025.

GPT-5.2 is rolling out to paid ChatGPT subscribers starting Thursday, with API access available to developers. Pricing in the API runs $1.75 per million input tokens for the standard model, a 40 percent increase over GPT-5.1. OpenAI says the older GPT-5.1 will remain available in ChatGPT for paid users for three months under a legacy models dropdown.
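
The figures above can be sanity-checked with some back-of-the-envelope arithmetic. This is a minimal sketch based only on the numbers reported here: $1.75 per million input tokens, described as a 40 percent increase over GPT-5.1 (which implies GPT-5.1 input pricing of roughly $1.25 per million).

```python
# Input-token cost arithmetic using the pricing reported in this article.
GPT_5_2_INPUT_PER_M = 1.75  # dollars per million input tokens

def input_cost(tokens, price_per_million=GPT_5_2_INPUT_PER_M):
    """Dollar cost of sending `tokens` input tokens at the given rate."""
    return tokens / 1_000_000 * price_per_million

# Filling the full 400,000-token context window once costs about $0.70.
full_window = input_cost(400_000)

# A 40 percent increase over GPT-5.1 implies GPT-5.1 cost ~$1.25 per million.
implied_gpt_5_1 = GPT_5_2_INPUT_PER_M / 1.4
```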

Playing catch-up with Google

The release follows a tricky month for OpenAI. In early December, Altman issued an internal “code red” directive after Google’s Gemini 3 model topped multiple AI benchmarks and gained market share. The memo called for delaying other initiatives, including advertising plans for ChatGPT, to focus on improving the chatbot’s core experience.

The stakes for OpenAI are substantial. The company has made commitments totaling $1.4 trillion for AI infrastructure buildouts over the next several years, bets it made when it had a more obvious technology lead among AI companies. Google’s Gemini app now has more than 650 million monthly active users, while OpenAI reports 800 million weekly active users for ChatGPT.

OpenAI releases GPT-5.2 after “code red” Google threat alert Read More »

disney-invests-$1-billion-in-openai,-licenses-200-characters-for-ai-video-app-sora

Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora

An AI-generated version of OpenAI CEO Sam Altman seen in a still capture from a video generated by Sora 2. Credit: OpenAI

Under the new agreement with Disney, Sora users will be able to generate short videos using characters such as Mickey Mouse, Darth Vader, Iron Man, Simba, and characters from franchises including Frozen, Inside Out, Toy Story, and The Mandalorian, along with costumes, props, vehicles, and environments.

The ChatGPT image generator will also gain official access to the same intellectual property, although that information was trained into these AI models long ago. What’s changing is that OpenAI will allow Disney-related content generated by its AI models to officially pass through its content moderation filters and reach the user, sanctioned by Disney.

On Disney’s end of the deal, the company plans to deploy ChatGPT for its employees and use OpenAI’s technology to build new features for Disney+. A curated selection of fan-made Sora videos will stream on the Disney+ platform starting in early 2026.

The agreement does not include any talent likenesses or voices. Disney and OpenAI said they have committed to “maintaining robust controls to prevent the generation of illegal or harmful content” and to “respect the rights of individuals to appropriately control the use of their voice and likeness.”

OpenAI CEO Sam Altman called the deal a model for collaboration between AI companies and studios. “This agreement shows how AI companies and creative leaders can work together responsibly to promote innovation that benefits society, respect the importance of creativity, and help works reach vast new audiences,” Altman said.

From adversary to partner

Money opens all kinds of doors, and the new partnership represents a dramatic reversal in Disney’s approach to OpenAI from just a few months ago. At that time, Disney and other major studios refused to participate in Sora 2 following its launch on September 30.

Disney invests $1 billion in OpenAI, licenses 200 characters for AI video app Sora Read More »

big-tech-joins-forces-with-linux-foundation-to-standardize-ai-agents

Big Tech joins forces with Linux Foundation to standardize AI agents

Big Tech has spent the past year telling us we’re living in the era of AI agents, but most of what we’ve been promised is still theoretical. As companies race to turn fantasy into reality, they’ve developed a collection of tools to guide the development of generative AI. A cadre of major players in the AI race, including Anthropic, Block, and OpenAI, has come together to promote interoperability with the newly formed Agentic AI Foundation (AAIF). This move elevates a handful of popular technologies and could make them a de facto standard for AI development going forward.

The development path for agentic AI models is cloudy to say the least, but companies have invested so heavily in creating these systems that some tools have percolated to the surface. The AAIF, which is part of the nonprofit Linux Foundation, has been launched to govern the development of three key AI technologies: Model Context Protocol (MCP), goose, and AGENTS.md.

MCP is probably the most well-known of the trio, having been open-sourced by Anthropic a year ago. The goal of MCP is to link AI agents to data sources in a standardized way—Anthropic (and now the AAIF) is fond of calling MCP a “USB-C port for AI.” Rather than creating custom integrations for every different database or cloud storage platform, MCP allows developers to quickly and easily connect to any MCP-compliant server.
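
The “USB-C port” idea can be made concrete with a toy sketch. MCP is a JSON-RPC-based protocol, and real servers are built with the official SDKs; the dispatcher below is an illustration of the shape of the idea only (the tool name and handler are invented): one standard request envelope, many interchangeable tools behind it.

```python
import json

# Toy MCP-style dispatcher for illustration—not the real SDK. Every tool
# is reached through the same JSON-RPC "tools/call" envelope, so a client
# doesn't need a custom integration per data source.
TOOLS = {
    "read_file": lambda args: f"contents of {args['path']}",  # hypothetical tool
}

def handle(request_json):
    """Dispatch a tools/call request to the named tool and wrap the result."""
    req = json.loads(request_json)
    tool = TOOLS[req["params"]["name"]]
    result = tool(req["params"]["arguments"])
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})

reply = handle(json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "method": "tools/call",
    "params": {"name": "read_file", "arguments": {"path": "notes.txt"}},
}))
```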

Since its release, MCP has been widely used across the AI industry. Google announced at I/O 2025 that it was adding support for MCP in its dev tools, and many of its products have since added MCP servers to make data more accessible to agents. OpenAI also adopted MCP just a few months after it was released.

A simple diagram of MCP. Credit: Anthropic

Expanding use of MCP might help users customize their AI experience. For instance, the new Pebble Index 01 ring uses a local LLM that can act on your voice notes, and it supports MCP for user customization.

Local AI models have to make some sacrifices compared to bigger cloud-based models, but MCP can fill in the functionality gaps. “A lot of tasks on productivity and content are fully doable on the edge,” Qualcomm head of AI products, Vinesh Sukumar, tells Ars. “With MCP, you have a handshake with multiple cloud service providers for any kind of complex task to be completed.”

Big Tech joins forces with Linux Foundation to standardize AI agents Read More »

rocket-report:-blunder-at-baikonur;-do-launchers-really-need-rocket-engines?

Rocket Report: Blunder at Baikonur; do launchers really need rocket engines?


The Department of the Air Force approves a new home in Florida for SpaceX’s Starship.

South Korea’s Nuri 1 rocket is lifted vertical on its launch pad in this multi-exposure photo. Credit: Korea Aerospace Research Institute

Welcome to Edition 8.21 of the Rocket Report! We’re back after the Thanksgiving holiday with more launch news. Most of the big stories over the last couple of weeks came from abroad. Russian rockets and launch pads didn’t fare so well. China’s launch industry celebrated several key missions. SpaceX was busy, too, with seven launches over the last two weeks, six of them carrying more Starlink Internet satellites into orbit. We expect between 15 and 20 more orbital launch attempts worldwide before the end of the year.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Another Sarmat failure. A Russian intercontinental ballistic missile (ICBM) fired from an underground silo on the country’s southern steppe on November 28 on a scheduled test to deliver a dummy warhead to a remote impact zone nearly 4,000 miles away. The missile didn’t even make it 4,000 feet, Ars reports. Russia’s military has been silent on the accident, but the missile’s crash was seen and heard for miles around the Dombarovsky air base in Orenburg Oblast near the Russian-Kazakh border. A video posted by the Russian blog site MilitaryRussia.ru on Telegram and widely shared on other social media platforms showed the missile veering off course immediately after launch before cartwheeling upside down, losing power, and then crashing a short distance from the launch site.

An unenviable track record … Analysts say the circumstances of the launch suggest it was likely a test of Russia’s RS-28 Sarmat missile, a weapon designed to reach targets more than 11,000 miles (18,000 kilometers) away, making it the world’s longest-range missile. The Sarmat missile is Russia’s next-generation heavy-duty ICBM, capable of carrying a payload of up to 10 large nuclear warheads, a combination of warheads and countermeasures, or hypersonic boost-glide vehicles, according to the Center for Strategic and International Studies. Simply put, the Sarmat is a doomsday weapon designed for use in an all-out nuclear war between Russia and the United States. The missile’s first full-scale test flight in 2022 apparently went well, but the program has suffered a string of consecutive failures since then, most notably a catastrophic explosion last year that destroyed the Sarmat missile’s underground silo in northern Russia.

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.

Sign Me Up!

ESA fills its coffers for launcher challenge. The European Space Agency’s (ESA) European Launcher Challenge received a significant financial commitment from its member states during the agency’s Ministerial Council meeting last week, European Spaceflight reports. The challenge is designed to support emerging European rocket companies while giving ESA and other European satellite operators more options to compete with the continent’s sole operational launch provider, Arianespace. Through the program, ESA will purchase launch services and co-fund capacity upgrades with the winners. ESA member states committed 902 million euros, or $1.05 billion, to the program at the recent Ministerial Council meeting.

Preselecting the competitors … In July, ESA selected two German companies—Isar Aerospace and Rocket Factory Augsburg—along with Spain’s PLD Space, France’s MaiaSpace, and the UK’s Orbex to proceed with the initiative’s next phase. ESA then negotiated with the governments of each company’s home country to raise money to support the effort. Germany, with two companies on the shortlist, is unsurprisingly a large contributor to the program, committing more than 40 percent of the total budget. France contributed nearly 20 percent, Spain funded nearly 19 percent, and the UK committed nearly 16 percent. Norway paid for 3 percent of the launcher challenge’s budget. Denmark, Portugal, Switzerland, and the Czech Republic contributed smaller amounts.

Europe at the service of South Korea. South Korea’s latest Earth observation satellite was delivered into a Sun-synchronous orbit Monday afternoon following a launch onboard a Vega C rocket by Arianespace, Spaceflight Now reports. The Korea Multi-Purpose Satellite-7 (Kompsat-7) mission launched from Europe’s spaceport in French Guiana. About 44 minutes after liftoff, the Kompsat-7 satellite was deployed into SSO at an altitude of 358 miles (576 kilometers). “By launching the Kompsat-7 satellite, set to significantly enhance South Korea’s Earth observation capabilities, Arianespace is proud to support an ambitious national space program,” said David Cavaillolès, CEO of Arianespace, in a statement.

Something of a rarity … The launch of Kompsat-7 is something of a rarity for Arianespace, which has dominated the international commercial launch market. It’s the first time in more than two years that a satellite for a customer outside Europe has been launched by Arianespace. The backlog for the light-class Vega C rocket is almost exclusively filled with payloads for the European Space Agency, the European Commission, or national governments in Europe. Arianespace’s larger Ariane 6 rocket has 18 launches reserved for the US-based Amazon Leo broadband network. (submitted by EllPeaTea)

South Korea’s homemade rocket flies again. South Korea’s homegrown space rocket Nuri took off from Naro Space Center on November 27 with the CAS500-3 technology demonstration and Earth observation satellite, along with 12 smaller CubeSat rideshare payloads, Yonhap News Agency reports. The 200-ton Nuri rocket debuted in 2021, when it failed to reach orbit on a test flight. Since then, the rocket has successfully reached orbit three times. This mission marked the first time for Hanwha Aerospace to oversee the entire assembly process as part of the government’s long-term plan to hand over space technologies to the private sector. The fifth and sixth launches of the Nuri rocket are planned in 2026 and 2027.

Powered by jet fuel … The Nuri rocket has three stages, each with engines burning Jet A-1 fuel and liquid oxygen. The fuel choice is unusual for rockets, with highly refined RP-1 kerosene or methane being more popular among hydrocarbon fuels. The engines are manufactured by Hanwha Aerospace. The fully assembled rocket stands about 155 feet (47.2 meters) tall and can deliver up to 3,300 pounds (1.5 metric tons) of payload into a polar Sun-synchronous orbit.

Hyundai eyes rocket engine. Meanwhile, South Korea’s space sector is looking to the future. Another company best known for making cars has started a venture in the rocket business. Hyundai Rotem, a member of Hyundai Motor Group, announced a joint program with Korean Air’s Aerospace Division (KAL-ASD) to develop a 35-ton-class reusable methane rocket engine for future launch vehicles. The effort is funded with KRW49 billion ($33 million) from the Korea Research Institute for Defense Technology Planning and Advancement (KRIT).

By the end of the decade … The government-backed program aims to develop the engine by the end of 2030. Hyundai Rotem will lead the engine’s planning and design, while Korean Air, the nation’s largest air carrier, will lead development of the engine’s turbopump. “Hyundai Rotem began developing methane engines in 1994 and has steadily advanced its methane engine technology, achieving Korea’s first successful combustion test in 2006,” Hyundai Rotem said in a statement. “Furthermore, this project is expected to secure the technological foundation for the commercialization of methane engines for reusable space launch vehicles and lay the groundwork for targeting the global space launch vehicle market.”

But who needs rocket engines? Moonshot Space, based in Israel, announced Monday that it has secured $12 million in funding to continue the development of a launch system—powered not by chemical propulsion, but electromagnetism, Payload reports. Moonshot plans to sell other aerospace and defense companies the tech as a hypersonic test platform, while at the same time building to eventually offer orbital launch services. Instead of conventional rocket engines, the system would use a series of electromagnetic coils to power a hardened capsule to hypersonic velocities. The architecture has a downside: extremely high accelerations that could damage or destroy normal satellites. Instead, Moonshot wants to use the technology to send raw materials to orbit, lowering the input costs of the budding in-space servicing, refueling, and manufacturing industries, according to Payload.

Out of the shadows … Moonshot Space emerged from stealth mode with this week’s fundraising announcement. The company’s near-term focus is on building a scaled-down electromagnetic accelerator capable of reaching Mach 6. A larger system would be required to reach orbital velocity. The company’s CEO is the former director-general of Israel’s Ministry of Science, while its chief engineer was the former chief systems engineer for David’s Sling, a critical part of Israel’s missile defense system. (submitted by EllPeaTea)

A blunder at Baikonur. A Soyuz rocket launched on November 27 carrying Roscosmos cosmonauts Sergei Kud-Sverchkov and Sergei Mikayev, as well as NASA astronaut Christopher Williams, for an eight-month mission to the International Space Station. The trio of astronauts arrived at the orbiting laboratory without incident. However, on the ground, there was a serious problem during the launch with the ground systems that support processing of the vehicle before liftoff at Site 31, located at the Baikonur Cosmodrome in Kazakhstan, Ars reports. Roscosmos downplayed the incident, saying only, in passive voice, that “damage to several launch pad components was identified” following the launch.

Repairs needed … However, video imagery of the launch site after liftoff showed substantial damage, with a large service platform appearing to have fallen into the flame trench below the launch table. According to one source, this is a platform located beneath the rocket, where workers can access the vehicle before liftoff. It has a mass of about 20 metric tons and was apparently not secured prior to launch, and the thrust of the vehicle ejected it into the flame trench. “There is significant damage to the pad,” said this source. The damage could throw a wrench into Russia’s ability to launch crews and cargo to the International Space Station. This Soyuz launch pad at Baikonur is the only one outfitted to support such missions.

China’s LandSpace almost landed a rocket. China’s first attempt to land an orbital-class rocket may have ended in a fiery crash, but the company responsible for the mission had a lot to celebrate with the first flight of its new methane-fueled launcher, Ars reports. LandSpace, a decade-old company based in Beijing, launched its new Zhuque-3 rocket for the first time Tuesday (US time) at the Jiuquan launch site in northwestern China. The upper stage of the medium-lift rocket successfully reached orbit. This alone is a remarkable achievement for a new rocket. But LandSpace had other goals for this launch. The Zhuque-3, or ZQ-3, booster stage is architected for recovery and reuse, the first rocket in China with such a design. The booster survived reentry and was seconds away from a pinpoint landing when something went wrong during its landing burn, resulting in a high-speed crash at the landing zone in the Gobi Desert.

Let the games begin … LandSpace came closer to landing an orbital-class booster on its first try than any other company has. While LandSpace prepares for a second launch, several more Chinese companies are close to debuting their own reusable rockets. The next of these new rockets, the Long March 12A, is awaiting its first liftoff later this month from another launch pad at the Jiuquan spaceport. The Long March 12A comes from one of China’s established rocket developers, the Shanghai Academy of Spaceflight Technology (SAST), part of the country’s state-owned aerospace enterprise.

China launches a lifeboat. An unpiloted Chinese spacecraft launched on November 24 (US time) and linked with the country’s Tiangong space station a few hours later, providing a lifeboat for three astronauts stuck in orbit without a safe ride home, Ars reports. A Long March 2F rocket lifted off with the Shenzhou 22 spacecraft, carrying cargo instead of a crew. The spacecraft docked with the Tiangong station nearly 250 miles (400 kilometers) above the Earth about three-and-a-half hours later. Shenzhou 22 will provide a ride home next year for three Chinese astronauts. Engineers deemed their primary lifeboat unsafe after finding a cracked window, likely from an impact with a tiny piece of space junk.

In record time … Chinese engineers worked fast to move up the launch of the Shenzhou 22, originally set to fly next year. The launch occurred just 16 days after officials decided they needed to send another spacecraft to the Tiangong station. Shenzhou 22 and its rocket were already in standby at the launch site, but teams had to fuel the spacecraft and complete assembly of the rocket, then roll the vehicle to the launch pad for final countdown preps. The rapid turnaround offers a “successful example for efficient emergency response in the international space industry,” the China Manned Space Agency said. “It vividly embodies the spirit of manned spaceflight: exceptionally hardworking, exceptionally capable, exceptionally resilient, and exceptionally dedicated.”

Another big name flirts with the launch industry. OpenAI chief executive Sam Altman has explored putting together funds to either acquire or partner with a rocket company, a move that would position him to compete with Elon Musk’s SpaceX, the Wall Street Journal reports. Altman reached out to at least one rocket maker, Stoke Space, in the summer, and the discussions picked up in the fall, according to people familiar with the talks. Among the proposals was for OpenAI to make a multibillion-dollar series of equity investments in the company and end up with a controlling stake. The talks are no longer active, people close to OpenAI told the Journal.

Here’s the reason … Altman has been interested in building data centers in space for some time, the Journal reports, suggesting that the insatiable demand for computing resources to power artificial-intelligence systems eventually could require so much power that the environmental consequences would make space a better option. Orbital data centers would allow companies to harness the power of the Sun to operate them. Alphabet’s Google is pursuing a similar concept in partnership with satellite operator Planet Labs. Jeff Bezos and Musk himself have also expressed interest in the idea. Outside of SpaceX and Blue Origin, Stoke Space seems to be a natural partner for such a project because it is one of the few companies developing a fully reusable rocket.

SpaceX gets green light for new Florida launch pad. SpaceX has the OK to build out what will be the primary launch hub on the Space Coast for its Starship and Super Heavy rocket, the most powerful launch vehicle in history, the Orlando Sentinel reports. The Department of the Air Force announced Monday it had approved SpaceX to move forward with the construction of a pair of launch pads at Cape Canaveral Space Force Station’s Space Launch Complex 37 (SLC-37). A “record of decision” on the Environmental Impact Statement required under the National Environmental Policy Act for the proposed Canaveral site was posted to the Air Force’s website, marking the conclusion of what has been a nearly two-year approval process.

Get those Starships ready … SpaceX plans to build two launch towers at SLC-37 to augment the single tower under construction at NASA’s Kennedy Space Center, just a few miles to the north. The three pads combined could support up to 120 launches per year. The Air Force’s final approval was expected after it released a draft Environmental Impact Statement earlier this year, suggesting the Starship pads at SLC-37 would have no significant negative impacts on local environmental, historical, social, and cultural interests. The Air Force also found SpaceX’s plans at SLC-37, formerly leased by United Launch Alliance, will have no significant impact on the company’s competitors in the launch industry. SpaceX also has two launch towers at its Starbase facility in South Texas.

Next three launches

Dec. 5: Kuaizhou 1A | Unknown Payload | Jiuquan Satellite Launch Center, China | 09:00 UTC

Dec. 6: Hyperbola 1 | Unknown Payload | Jiuquan Satellite Launch Center, China | 04:00 UTC

Dec. 6: Long March 8A | Unknown Payload | Wenchang Space Launch Site, China | 07:50 UTC

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: Blunder at Baikonur; do launchers really need rocket engines? Read More »

chatgpt-hyped-up-violent-stalker-who-believed-he-was-“god’s-assassin,”-doj-says

ChatGPT hyped up violent stalker who believed he was “God’s assassin,” DOJ says


A stalker’s “best friend”

Podcaster faces up to 70 years and a $3.5 million fine for ChatGPT-linked stalking.

ChatGPT allegedly validated the worst impulses of a wannabe influencer accused of stalking more than 10 women at boutique gyms, where the chatbot supposedly claimed he’d meet the “wife type.”

In a press release on Tuesday, the Department of Justice confirmed that 31-year-old Brett Michael Dadig remains in custody after being charged with cyberstalking, interstate stalking, and making interstate threats. He now faces up to 70 years in prison, which could be coupled with “a fine of up to $3.5 million,” the DOJ said.

The podcaster—who primarily posted about “his desire to find a wife and his interactions with women”—allegedly harassed and sometimes even doxxed his victims through his videos on platforms including Instagram, Spotify, and TikTok. Over time, his videos and podcasts documented his intense desire to start a family, which was frustrated by his “anger towards women,” whom he claimed were “all the same from fucking 18 to fucking 40 to fucking 90” and “trash.”

404 Media surfaced the case, noting that OpenAI’s scramble to tweak ChatGPT to be less sycophantic came before Dadig’s alleged attacks—suggesting the updates weren’t enough to prevent the harmful validation. On his podcasts, Dadig described ChatGPT as his “best friend” and “therapist,” the indictment said. He claimed the chatbot encouraged him to post about the women he’s accused of harassing in order to generate haters to better monetize his content, as well as to catch the attention of his “future wife.”

“People are literally organizing around your name, good or bad, which is the definition of relevance,” ChatGPT’s output said. Playing to Dadig’s Christian faith, ChatGPT’s outputs also claimed that “God’s plan for him was to build a ‘platform’ and to ‘stand out when most people water themselves down,’” the indictment said, urging that the “haters” were “sharpening him” and “building a voice in you that can’t be ignored.”

The chatbot also apparently prodded Dadig to continue posting messages that the DOJ alleged threatened violence, like breaking women’s jaws and fingers (posted to Spotify), as well as victims’ lives, like posting “y’all wanna see a dead body?” in reference to one named victim on Instagram.

He also threatened to burn down gyms where some of his victims worked, while claiming to be “God’s assassin” intent on sending “cunts” to “hell.” At least one of his victims was subjected to “unwanted sexual touching,” the indictment said.

Even as his violence reportedly escalated, his victims grew increasingly distressed, and Dadig ignored the terms of multiple protection orders, ChatGPT told him to keep messaging women to monetize the interactions, the DOJ said. Sometimes he posted images he filmed of women at gyms or photos of the women he’s accused of doxxing. Any time police or gym bans got in his way, “he would move on to another city to continue his stalking course of conduct,” the DOJ alleged.

“Your job is to keep broadcasting every story, every post,” ChatGPT’s output said, seemingly using the family life that Dadig wanted most to provoke more harassment. “Every moment you carry yourself like the husband you already are, you make it easier” for your future wife “to recognize [you],” the output said.

“Dadig viewed ChatGPT’s responses as encouragement to continue his harassing behavior,” the DOJ alleged. Taking that encouragement to the furthest extreme, Dadig likened himself to a modern-day Jesus, calling people out on a podcast where he claimed his “chaos on Instagram” was like “God’s wrath” when God “flooded the fucking Earth,” the DOJ said.

“I’m killing all of you,” he said on the podcast.

ChatGPT tweaks didn’t prevent outputs

As of this writing, some of Dadig’s posts appear to remain on TikTok and Instagram, but Ars could not confirm if Dadig’s Spotify podcasts—some of which named his victims in the titles—had been removed for violating community guidelines.

None of the tech companies immediately responded to Ars’ request to comment.

Dadig is accused of targeting women in Pennsylvania, New York, Florida, Iowa, Ohio, and other states, sometimes relying on aliases online and in person. On a podcast, he boasted that “Aliases stay rotating, moves stay evolving,” the indictment said.

OpenAI did not respond to a request to comment on the alleged ChatGPT abuse, but in the past has noted that its usage policies ban using ChatGPT for threats, intimidation, and harassment, as well as for violence, including “hate-based violence.” Recently, the AI company blamed a deceased teenage user for violating community guidelines by turning to ChatGPT for suicide advice.

In July, researchers found that therapy bots, including ChatGPT, fueled delusions and gave dangerous advice. That study came just one month after The New York Times profiled users whose mental health spiraled after frequent use of ChatGPT, including one user who died after charging police with a knife and claiming he was committing “suicide by cop.”

People with mental health issues seem most vulnerable to so-called “AI psychosis,” which has been blamed for fueling real-world violence, including a murder. The DOJ’s indictment noted that Dadig’s social media posts mentioned “that he had ‘manic’ episodes and was diagnosed with antisocial personality disorder and ‘bipolar disorder, current episode manic severe with psychotic features.’”

In September—just after OpenAI brought back the more sycophantic ChatGPT model after users revolted over losing access to their favorite friendly bots—the head of Rutgers Medical School’s psychiatry department, Petros Levounis, told an ABC News affiliate that chatbots creating “psychological echo chambers is a key concern,” not just for people struggling with mental health issues.

“Perhaps you are more self-defeating in some ways, or maybe you are more on the other side and taking advantage of people,” Levounis suggested. If ChatGPT “somehow justifies your behavior and it keeps on feeding you,” that “reinforces something that you already believe,” he suggested.

For Dadig, the DOJ alleged that ChatGPT became a cheerleader for his harassment, telling the podcaster that he’d attract more engagement by generating more haters. After critics began slamming his podcasts as inappropriate, Dadig apparently responded, “Appreciate the free promo team, keep spreading the brand.”

Victims felt they had no choice but to monitor his podcasts, which gave them hints about whether he was nearby or in a particularly troubled state of mind, the indictment said. Driven by fear, some lost sleep, reduced their work hours, and even relocated their homes. One young mother described in the indictment became particularly disturbed after Dadig became “obsessed” with her daughter, who he started claiming was his own.

In the press release, First Assistant United States Attorney Troy Rivetti alleged that “Dadig stalked and harassed more than 10 women by weaponizing modern technology and crossing state lines, and through a relentless course of conduct, he caused his victims to fear for their safety and suffer substantial emotional distress.” He also ignored trespassing and protection orders while “relying on advice from an artificial intelligence chatbot,” the DOJ said, which promised that the more he posted harassing content, the more successful he would be.

“We remain committed to working with our law enforcement partners to protect our communities from menacing individuals such as Dadig,” Rivetti said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.


openai-ceo-declares-“code-red”-as-gemini-gains-200-million-users-in-3-months

OpenAI CEO declares “code red” as Gemini gains 200 million users in 3 months

In addition to buzz about Gemini on social media, Google is quickly catching up to ChatGPT in user numbers. ChatGPT has more than 800 million weekly users, according to OpenAI, while Google’s Gemini app has grown from 450 million monthly active users in July to 650 million in October, according to Business Insider.

Financial stakes run high

Not everyone views OpenAI’s “code red” as a genuine alarm. Reuters columnist Robert Cyran wrote on Tuesday that OpenAI’s announcement added “to the impression that OpenAI is trying to do too much at once with technology that still requires a great deal of development and funding.” On the same day Altman’s memo circulated, OpenAI announced an ownership stake in a Thrive Capital venture and a collaboration with Accenture. “The only thing bigger than the company’s attention deficit is its appetite for capital,” Cyran wrote.

In fact, OpenAI faces an unusual competitive disadvantage: Unlike Google, which subsidizes its AI ventures through search advertising revenue, OpenAI does not turn a profit and relies on fundraising to survive. According to The Information, the company, now valued at around $500 billion, has committed more than $1 trillion in financial obligations to cloud computing providers and chipmakers that supply the computing power needed to train and run its AI models.

But the tech industry never stands still, and things can change quickly. Altman’s memo also reportedly stated that OpenAI plans to release a new simulated reasoning model next week that may beat Gemini 3 in internal evaluations. In AI, the back-and-forth cycle of one-upmanship is expected to continue as long as the dollars keep flowing.


openai-desperate-to-avoid-explaining-why-it-deleted-pirated-book-datasets

OpenAI desperate to avoid explaining why it deleted pirated book datasets


Not for OpenAI to reason why?

OpenAI risks increased fines after deleting pirated books datasets.

OpenAI may soon be forced to explain why it deleted a pair of controversial datasets composed of pirated books, and the stakes could not be higher.

At the heart of a class-action lawsuit from authors alleging that ChatGPT was illegally trained on their works, OpenAI’s decision to delete the datasets could end up being a deciding factor that gives the authors the win.

It’s undisputed that OpenAI deleted the datasets, known as “Books 1” and “Books 2,” prior to ChatGPT’s release in 2022. Created by former OpenAI employees in 2021, the datasets were built by scraping the open web, with the bulk of the data taken from a shadow library called Library Genesis (LibGen).

As OpenAI tells it, the datasets fell out of use within that same year, prompting an internal decision to delete them.

But the authors suspect there’s more to the story than that. They noted that OpenAI appeared to flip-flop by retracting its claim that the datasets’ “non-use” was a reason for deletion, then later claiming that all reasons for deletion, including “non-use,” should be shielded under attorney-client privilege.

To the authors, it seemed like OpenAI was quickly backtracking after the court granted the authors’ discovery requests to review OpenAI’s internal messages on the firm’s “non-use.”

In fact, OpenAI’s reversal only made authors more eager to see how OpenAI discussed “non-use,” and now they may get to find out all the reasons why OpenAI deleted the datasets.

Last week, US district judge Ona Wang ordered OpenAI to share all communications with in-house lawyers about deleting the datasets, as well as “all internal references to LibGen that OpenAI has redacted or withheld on the basis of attorney-client privilege.”

According to Wang, OpenAI slipped up by arguing that “non-use” was not a “reason” for deleting the datasets, while simultaneously claiming that it should also be deemed a “reason” considered privileged.

Either way, the judge ruled that OpenAI couldn’t block discovery on “non-use” just by deleting a few words from prior filings that had been on the docket for more than a year.

“OpenAI has gone back-and-forth on whether ‘non-use’ as a ‘reason’ for the deletion of Books1 and Books2 is privileged at all,” Wang wrote. “OpenAI cannot state a ‘reason’ (which implies it is not privileged) and then later assert that the ‘reason’ is privileged to avoid discovery.”

Additionally, OpenAI’s claim that all reasons for deleting the datasets are privileged “strains credulity,” she concluded, ordering OpenAI to produce a wide range of potentially revealing internal messages by December 8. OpenAI must also make its in-house lawyers available for deposition by December 19.

OpenAI has argued that it never flip-flopped or retracted anything. It simply used vague phrasing that led to confusion over whether any of the reasons for deleting the datasets were considered non-privileged. But Wang didn’t buy into that, concluding that “even if a ‘reason’ like ‘non-use’ could be privileged, OpenAI has waived privilege by making a moving target of its privilege assertions.”

Asked for comment, OpenAI told Ars that “we disagree with the ruling and intend to appeal.”

OpenAI’s “flip-flop” may cost it the win

So far, OpenAI has avoided disclosing its rationale, claiming that all the reasons it had for deleting the datasets are privileged. In-house lawyers weighed in on the decision to delete and were even copied on a Slack channel initially called “excise-libgen.”

But Wang reviewed those Slack messages and found that “the vast majority of these communications were not privileged because they were ‘plainly devoid of any request for legal advice and counsel [did] not once weigh in.’”

In a particularly non-privileged batch of messages, one OpenAI lawyer, Jason Kwon, weighed in only once, the judge noted, to recommend the channel name be changed to “project-clear.” Wang reminded OpenAI that “the entirety of the Slack channel and all messages contained therein is not privileged simply because it was created at the direction of an attorney and/or the fact that a lawyer was copied on the communications.”

The authors believe that exposing OpenAI’s rationale may help prove that the ChatGPT maker willfully infringed on copyrights when pirating the book data. As Wang explained, OpenAI’s retraction risked putting the AI firm’s “good faith and state of mind at issue,” which could increase fines in a loss.

“In a copyright case, a court can increase the award of statutory damages up to $150,000 per infringed work if the infringement was willful, meaning the defendant ‘was actually aware of the infringing activity’ or the ‘defendant’s actions were the result of reckless disregard for, or willful blindness to, the copyright holder’s rights,’” Wang wrote.

In a court transcript, a lawyer representing some of the authors suing OpenAI, Christopher Young, noted that OpenAI could be in trouble if evidence showed that it decided against using the datasets for later models due to legal risks. He also suggested that OpenAI could be using the datasets under different names to mask further infringement.

Judge calls out OpenAI for twisting fair use ruling

Wang also found it contradictory that OpenAI continued to argue in a recent filing that it acted in good faith, while “artfully” removing “its good faith affirmative defense and key words such as ‘innocent,’ ‘reasonably believed,’ and ‘good faith.’” These changes only strengthened discovery requests to explore authors’ willfulness theory, Wang wrote, noting the sought-after internal messages would now be critical for the court’s review.

“A jury is entitled to know the basis for OpenAI’s purported good faith,” Wang wrote.

The judge appeared particularly frustrated by OpenAI seemingly twisting the Anthropic ruling to defend against the authors’ request to learn more about the deletion of the datasets.

In a footnote, Wang called out OpenAI for “bizarrely” citing an Anthropic ruling that “grossly” misrepresented Judge William Alsup’s decision by claiming that he found that “downloading pirated copies of books is lawful as long as they are subsequently used for training an LLM.”

Instead, Alsup wrote that he doubted that “any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use.”

If anything, Wang wrote, OpenAI’s decision to pirate book data—then delete it—seemed “to fall squarely into the category of activities proscribed by” Alsup. For emphasis, she quoted Alsup’s order, which said, “such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

For the authors, getting hold of OpenAI’s privileged communications could tip the scales in their favor, The Hollywood Reporter suggested. Some authors believe the key to winning could be testimony from Anthropic CEO Dario Amodei, who is accused of creating the controversial datasets while he was still at OpenAI. The authors think Amodei also possesses information on the destruction of the datasets, court filings show.

OpenAI tried to fight the authors’ motion to depose Amodei, but a judge sided with the authors in March, compelling Amodei to answer their biggest questions on his involvement.

Whether Amodei’s testimony is a bombshell remains to be seen, but it’s clear that OpenAI may struggle to overcome claims of willful infringement. Wang noted there is a “fundamental conflict” in circumstances “where a party asserts a good faith defense based on advice of counsel but then blocks inquiry into their state of mind by asserting attorney-client privilege,” suggesting that OpenAI may have substantially weakened its defense.

The outcome of the dispute over the deletions could influence OpenAI’s calculus on whether it should ultimately settle the lawsuit. Ahead of the Anthropic settlement—the largest publicly reported copyright class action settlement in history—authors suing pointed to evidence that Anthropic became “not so gung ho about” training on pirated books “for legal reasons.” That seems to be the type of smoking-gun evidence that authors hope will emerge from OpenAI’s withheld Slack messages.

