chatbots

openai-built-an-ai-coding-agent-and-uses-it-to-improve-the-agent-itself

OpenAI built an AI coding agent and uses it to improve the agent itself


“The vast majority of Codex is built by Codex,” OpenAI told us about its new AI coding agent.

With the popularity of AI coding tools rising among software developers, their adoption has begun to touch every aspect of the process, including the improvement of AI coding tools themselves.

In interviews with Ars Technica this week, OpenAI employees revealed the extent to which the company now relies on its own AI coding agent, Codex, to build and improve the development tool. “I think the vast majority of Codex is built by Codex, so it’s almost entirely just being used to improve itself,” said Alexander Embiricos, product lead for Codex at OpenAI, in a conversation on Tuesday.

Codex, which OpenAI launched in its modern incarnation as a research preview in May 2025, operates as a cloud-based software engineering agent that can handle tasks like writing features, fixing bugs, and proposing pull requests. The tool runs in sandboxed environments linked to a user’s code repository and can execute multiple tasks in parallel. OpenAI offers Codex through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for VS Code, Cursor, and Windsurf.

The “Codex” name itself dates back to a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. Embiricos said the name is rumored among staff to be short for “code execution.” OpenAI wanted to connect the new agent to that earlier moment, which was crafted in part by some who have left the company.

“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos said. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”

A place to enter a prompt, set parameters, and click

The interface for OpenAI’s Codex in ChatGPT. Credit: OpenAI

It’s no secret that the current command-line version of Codex bears some resemblance to Claude Code, Anthropic’s agentic coding tool that launched in February 2025. When asked whether Claude Code influenced Codex’s design, Embiricos parried the question but acknowledged the competitive dynamic. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said. He noted that OpenAI had been building web-based Codex features internally before shipping the CLI version, which arrived after Anthropic’s tool.

OpenAI’s customers apparently love the command line version, though. Embiricos said Codex usage among external developers jumped 20 times after OpenAI shipped the interactive CLI extension alongside GPT-5 in August 2025. On September 15, OpenAI released GPT-5 Codex, a specialized version of GPT-5 optimized for agentic coding, which further accelerated adoption.

It hasn’t just been the outside world that has embraced the tool. Embiricos said the vast majority of OpenAI’s engineers now use Codex regularly. The company uses the same open-source version of the CLI that external developers can freely download, suggest additions to, and modify themselves. “I really love this about our team,” Embiricos said. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”

The recursive nature of Codex development extends beyond simple code generation. Embiricos described scenarios where Codex monitors its own training runs and processes user feedback to “decide” what to build next. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can also submit a ticket to Codex through project management tools like Linear, assigning it tasks the same way they would assign work to a human colleague.

This kind of recursive loop, of using tools to build better tools, has deep roots in computing history. Engineers designed the first integrated circuits by hand on vellum and paper in the 1960s, then fabricated physical chips from those drawings. Those chips powered the computers that ran the first electronic design automation (EDA) software, which in turn enabled engineers to design circuits far too complex for any human to draft manually. Modern processors contain billions of transistors arranged in patterns that exist only because software made them possible. OpenAI’s use of Codex to build Codex seems to follow the same pattern: each generation of the tool creates capabilities that feed into the next.

But describing what Codex actually does presents something of a linguistic challenge. At Ars Technica, we try to reduce anthropomorphism when discussing AI models as much as possible while also describing what these systems do using analogies that make sense to general readers. People can talk to Codex like a human, so it feels natural to use human terms to describe interacting with it, even though it is not a person and simulates human personality through statistical modeling.

The system runs many processes autonomously, addresses feedback, spins off and manages child processes, and produces code that ships in real products. OpenAI employees call it a “teammate” and assign it tasks through the same tools they use for human colleagues. Whether the tasks Codex handles constitute “decisions” or sophisticated conditional logic smuggled through a neural network depends on definitions that computer scientists and philosophers continue to debate. What we can say is that a semi-autonomous feedback loop exists: Codex produces code under human direction, that code becomes part of Codex, and the next version of Codex produces different code as a result.

Building faster with “AI teammates”

According to our interviews, the most dramatic example of Codex’s internal impact came from OpenAI’s development of the Sora Android app. According to Embiricos, the development tool allowed the company to create the app in record time.

“The Sora Android app was shipped by four engineers from scratch,” Embiricos told Ars. “It took 18 days to build, and then we shipped it to the app store in 28 days total,” he said. The engineers already had the iOS app and server-side components to work from, so they focused on building the Android client. They used Codex to help plan the architecture, generate sub-plans for different components, and implement those components.

Despite OpenAI’s claims of success with Codex in house, it’s worth noting that independent research has shown mixed results for AI coding productivity. A METR study published in July found that experienced open source developers were actually 19 percent slower when using AI tools on complex, mature codebases—though the researchers noted AI may perform better on simpler projects.

Ed Bayes, a designer on the Codex team, described how the tool has changed his own workflow. Bayes said Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to assign coding tasks directly to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes told Ars. “Codex is literally a teammate in your workspace.”

This integration means that when someone posts feedback in a Slack channel, they can tag Codex and ask it to fix the issue. The agent will create a pull request, and team members can review and iterate on the changes through the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.

For Bayes, who works on the visual design and interaction patterns for Codex’s interfaces, the tool has enabled him to contribute code directly rather than handing off specifications to engineers. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. He noted that designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.

The command line version of OpenAI codex running in a macOS terminal window.

The command line version of OpenAI codex running in a macOS terminal window. Credit: Benj Edwards

OpenAI’s approach treats Codex as what Bayes called “a junior developer” that the company hopes will graduate into a senior developer over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”

Given this teammate approach, will there be anything left for humans to do? When asked, Embiricos drew a distinction between “vibe coding,” where developers accept AI-generated code without close review, and what AI researcher Simon Willison calls “vibe engineering,” where humans stay in the loop. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”

He added that vibe coding still has its place for prototypes and throwaway tools. “I think vibe coding is great,” he said. “Now you have discretion as a human about how much attention you wanna pay to the code.”

Looking ahead

Over the past year, “monolithic” large language models (LLMs) like GPT-4.5 have apparently become something of a dead end in terms of frontier benchmarking progress as AI companies pivot to simulated reasoning models and also agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the best path forward for squeezing utility out of existing LLM technology.

He dismissed concerns that AI capabilities have plateaued. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He pointed to recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the company has seen the model work independently for 24 hours on complex tasks.

OpenAI faces competition from multiple directions in the AI coding market. Anthropic’s Claude Code and Google’s Gemini CLI offer similar terminal-based agentic coding experiences. This week, Mistral AI released Devstral 2 alongside a CLI tool called Mistral Vibe. Meanwhile, startups like Cursor have built dedicated IDEs around AI coding, reportedly reaching $300 million in annualized revenue.

Given the well-known issues with confabulation in AI models when people attempt to use them as factual resources, could it be that coding has become the killer app for LLMs? We wondered if OpenAI has noticed that coding seems to be a clear business use case for today’s AI models with less hazard than, say, using AI language models for writing or as emotional companions.

“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”

But will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but said Codex has not reduced headcount at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Similarly, the two men don’t project a future where Codex runs by itself without some form of human oversight. They feel the tool is an amplifier of human potential rather than a replacement for it.

The practical implications of agents like Codex extend beyond OpenAI’s walls. Embiricos said the company’s long-term vision involves making coding agents useful to people who have no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”

This article was updated on December 12, 2025 at 6: 50 PM to mention the METR study.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

OpenAI built an AI coding agent and uses it to improve the agent itself Read More »

openai-ceo-declares-“code-red”-as-gemini-gains-200-million-users-in-3-months

OpenAI CEO declares “code red” as Gemini gains 200 million users in 3 months

In addition to buzz about Gemini on social media, Google is quickly catching up to ChatGPT in user numbers. ChatGPT has more than 800 million weekly users, according to OpenAI, while Google’s Gemini app has grown from 450 million monthly active users in July to 650 million in October, according to Business Insider.

Financial stakes run high

Not everyone views OpenAI’s “code red” as a genuine alarm. Reuters columnist Robert Cyran wrote on Tuesday that OpenAI’s announcement added “to the impression that OpenAI is trying to do too much at once with technology that still requires a great deal of development and funding.” On the same day Altman’s memo circulated, OpenAI announced an ownership stake in a Thrive Capital venture and a collaboration with Accenture. “The only thing bigger than the company’s attention deficit is its appetite for capital,” Cyran wrote.

In fact, OpenAI faces an unusual competitive disadvantage: Unlike Google, which subsidizes its AI ventures through search advertising revenue, OpenAI does not turn a profit and relies on fundraising to survive. According to The Information, the company, now valued at around $500 billion, has committed more than $1 trillion in financial obligations to cloud computing providers and chipmakers that supply the computing power needed to train and run its AI models.

But the tech industry never stands still, and things can change quickly. Altman’s memo also reportedly stated that OpenAI plans to release a new simulated reasoning model next week that may beat Gemini 3 in internal evaluations. In AI, the back-and-forth cycle of one-upmanship is expected to continue as long as the dollars keep flowing.

OpenAI CEO declares “code red” as Gemini gains 200 million users in 3 months Read More »

forget-agi—sam-altman-celebrates-chatgpt-finally-following-em-dash-formatting-rules

Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules


Next stop: superintelligence

Ongoing struggles with AI model instruction-following show that true human-level AI still a ways off.

Em dashes have become what many believe to be a telltale sign of AI-generated text over the past few years. The punctuation mark appears frequently in outputs from ChatGPT and other AI chatbots, sometimes to the point where readers believe they can identify AI writing by its overuse alone—although people can overuse it, too.

On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. “Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it’s supposed to do!” he wrote.

The post, which came two days after the release of OpenAI’s new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this “small win” raises a very big question: If the world’s most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim.

Sam Altman @sama Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it's supposed to do! 11:48 PM · Nov 13, 2025 · 2.4M Views

A screenshot of Sam Altman’s post about em dashes on X. Credit: X

“The fact that it’s been 3 years since ChatGPT first launched, and you’ve only just now managed to make it obey this simple requirement, says a lot about how little control you have over it, and your understanding of its inner workings,” wrote one X user in a reply. “Not a good sign for the future.”

While Altman likes to publicly talk about AGI (a hypothetical technology equivalent to humans in general learning ability), superintelligence (a nebulous concept for AI that is far beyond human intelligence), and “magic intelligence in the sky” (his term for AI cloud computing?) while raising funds for OpenAI, it’s clear that we still don’t have reliable artificial intelligence here today on Earth.

But wait, what is an em dash anyway, and why does it matter so much?

AI models love em dashes because we do

Unlike a hyphen, which is a short punctuation mark used to connect words or parts of words, that lives with a dedicated key on your keyboard (-), an em dash is a long dash denoted by a special character (—) that writers use to set off parenthetical information, indicate a sudden change in thought, or introduce a summary or explanation.

Even before the age of AI language models, some writers frequently bemoaned the overuse of the em dash in modern writing. In a 2011 Slate article, writer Noreen Malone argued that writers used the em dash “in lieu of properly crafting sentences” and that overreliance on it “discourages truly efficient writing.” Various Reddit threads posted prior to ChatGPT’s launch featured writers either wrestling over the etiquette of proper em dash use or admitting to their frequent use as a guilty pleasure.

In 2021, one writer in the r/FanFiction subreddit wrote, “For the longest time, I’ve been addicted to Em Dashes. They find their way into every paragraph I write. I love the crisp straight line that gives me the excuse to shove details or thoughts into an otherwise orderly paragraph. Even after coming back to write after like two years of writer’s block, I immediately cram as many em dashes as I can.”

Because of the tendency for AI chatbots to overuse them, detection tools and human readers have learned to spot em dash use as a pattern, creating a problem for the small subset of writers who naturally favor the punctuation mark in their work. As a result, some journalists are complaining that AI is “killing” the em dash.

No one knows precisely why LLMs tend to overuse em dashes. We’ve seen a wide range of speculation online that attempts to explain the phenomenon, from noticing that em dashes were more popular in 19th-century books used as training data (according to a 2018 study, dash use in the English language peaked around 1860 before declining through the mid-20th century) or perhaps AI models borrowed the habit from automatic em-dash character conversion on the blogging site Medium.

One thing we know for sure is that LLMs tend to output frequently seen patterns in their training data (fed in during the initial training process) and from a subsequent reinforcement learning process that often relies on human preferences. As a result, AI language models feed you a sort of “smoothed out” average style of whatever you ask them to provide, moderated by whatever they are conditioned to produce through user feedback.

So the most plausible explanation is still that requests for professional-style writing from an AI model trained on vast numbers of examples from the Internet will lean heavily toward the prevailing style in the training data, where em dashes appear frequently in formal writing, news articles, and editorial content. It’s also possible that during training through human feedback (called RLHF), responses with em dashes, for whatever reason, received higher ratings. Perhaps it’s because those outputs appeared more sophisticated or engaging to evaluators, but that’s just speculation.

From em dashes to AGI?

To understand what Altman’s “win” really means, and what it says about the road to AGI, we need to understand how ChatGPT’s custom instructions actually work. They allow users to set persistent preferences that apply across all conversations by appending written instructions to the prompt that is fed into the model just before the chat begins. Users can specify tone, format, and style requirements without needing to repeat those requests manually in every new chat.

However, the feature has not always worked reliably because LLMs do not work reliably (even OpenAI and Anthropic freely admit this). An LLM takes an input and produces an output, spitting out a statistically plausible continuation of a prompt (a system prompt, the custom instructions, and your chat history), and it doesn’t really “understand” what you are asking. With AI language model outputs, there is always some luck involved in getting them to do what you want.

In our informal testing of GPT-5.1 with custom instructions, ChatGPT did appear to follow our request not to produce em dashes. But despite Altman’s claim, the response from X users appears to show that experiences with the feature continue to vary, at least when the request is not placed in custom instructions.

So if LLMs are statistical text-generation boxes, what does “instruction following” even mean? That’s key to unpacking the hypothetical path from LLMs to AGI. The concept of following instructions for an LLM is fundamentally different from how we typically think about following instructions as humans with general intelligence, or even a traditional computer program.

In traditional computing, instruction following is deterministic. You tell a program “don’t include character X,” and it won’t include that character. The program executes rules exactly as written. With LLMs, “instruction following” is really about shifting statistical probabilities. When you tell ChatGPT “don’t use em dashes,” you’re not creating a hard rule. You’re adding text to the prompt that makes tokens associated with em dashes less likely to be selected during the generation process. But “less likely” isn’t “impossible.”

Every token the model generates is selected from a probability distribution. Your custom instruction influences that distribution, but it’s competing with the model’s training data (where em dashes appeared frequently in certain contexts) and everything else in the prompt. Unlike code with conditional logic, there’s no separate system verifying outputs against your requirements. The instruction is just more text that influences the statistical prediction process.

When Altman celebrates finally getting GPT to avoid em dashes, he’s really celebrating that OpenAI has tuned the latest version of GPT-5.1 (probably through reinforcement learning or fine-tuning) to weight custom instructions more heavily in its probability calculations.

There’s an irony about control here: Given the probabilistic nature of the issue, there’s no guarantee the issue will stay fixed. OpenAI continuously updates its models behind the scenes, even within the same version number, adjusting outputs based on user feedback and new training runs. Each update arrives with different output characteristics that can undo previous behavioral tuning, a phenomenon researchers call the “alignment tax.”

Precisely tuning a neural network’s behavior is not yet an exact science. Since all concepts encoded in the network are interconnected by values called weights, adjusting one behavior can alter others in unintended ways. Fix em dash overuse today, and tomorrow’s update (aimed at improving, say, coding capabilities) might inadvertently bring them back, not because OpenAI wants them there, but because that’s the nature of trying to steer a statistical system with millions of competing influences.

This gets to an implied question we mentioned earlier. If controlling punctuation use is still a struggle that might pop back up at any time, how far are we from AGI? We can’t know for sure, but it seems increasingly likely that it won’t emerge from a large language model alone. That’s because AGI, a technology that would replicate human general learning ability, would likely require true understanding and self-reflective intentional action, not statistical pattern matching that sometimes aligns with instructions if you happen to get lucky.

And speaking of getting lucky, some users still aren’t having luck with controlling em dash use outside of the “custom instructions” feature. Upon being told in-chat to not use em dashes within a chat, ChatGPT updated a saved memory and replied to one X user, “Got it—I’ll stick strictly to short hyphens from now on.”

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules Read More »

openai-walks-a-tricky-tightrope-with-gpt-5.1’s-eight-new-personalities

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

OpenAI claims that both models perform better on technical benchmarks such as math and coding evaluations (including AIME 2025 and Codeforces) than GPT-5, which was released in August.

Improved benchmarks may win over some users, but the biggest change with GPT-5.1 is in its presentation. OpenAI says it heard from users that they wanted AI models to simulate different communication styles depending on the task, so the company is offering eight preset options, including Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy, alongside a Default setting.

These presets alter the instructions fed into each prompt to simulate different personality styles, but the underlying model capabilities remain the same across all settings.

An illustration showing GPT-5.1's eight personality styles in ChatGPT.

An illustration showing GPT-5.1’s eight personality styles in ChatGPT. Credit: OpenAI

In addition, the company trained GPT-5.1 Instant to use “adaptive reasoning,” meaning that the model decides when to spend more computational time processing a prompt before generating output.

The company plans to roll out the models gradually over the next few days, starting with paid subscribers before expanding to free users. OpenAI plans to bring both GPT-5.1 Instant and GPT-5.1 Thinking to its API later this week. GPT-5.1 Instant will appear as gpt-5.1-chat-latest, and GPT-5.1 Thinking will be released as GPT-5.1 in the API, both with adaptive reasoning enabled. The older GPT-5 models will remain available in ChatGPT under the legacy models dropdown for paid subscribers for three months.

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities Read More »

researchers-surprised-that-with-ai,-toxicity-is-harder-to-fake-than-intelligence

Researchers surprised that with AI, toxicity is harder to fake than intelligence

The next time you encounter an unusually polite reply on social media, you might want to check twice. It could be an AI model trying (and failing) to blend in with the crowd.

On Wednesday, researchers from the University of Zurich, University of Amsterdam, Duke University, and New York University released a study revealing that AI models remain easily distinguishable from humans in social media conversations, with overly friendly emotional tone serving as the most persistent giveaway. The research, which tested nine open-weight models across Twitter/X, Bluesky, and Reddit, found that classifiers developed by the researchers detected AI-generated replies with 70 to 80 percent accuracy.

The study introduces what the authors call a “computational Turing test” to assess how closely AI models approximate human language. Instead of relying on subjective human judgment about whether text sounds authentic, the framework uses automated classifiers and linguistic analysis to identify specific features that distinguish machine-generated from human-authored content.

“Even after calibration, LLM outputs remain clearly distinguishable from human text, particularly in affective tone and emotional expression,” the researchers wrote. The team, led by Nicolò Pagan at the University of Zurich, tested various optimization strategies, from simple prompting to fine-tuning, but found that deeper emotional cues persist as reliable tells that a particular text interaction online was authored by an AI chatbot rather than a human.

The toxicity tell

In the study, researchers tested nine large language models: Llama 3.1 8B, Llama 3.1 8B Instruct, Llama 3.1 70B, Mistral 7B v0.1, Mistral 7B Instruct v0.2, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509.

When prompted to generate replies to real social media posts from actual users, the AI models struggled to match the level of casual negativity and spontaneous emotional expression common in human social media posts, with toxicity scores consistently lower than authentic human replies across all three platforms.

To counter this deficiency, the researchers attempted optimization strategies (including providing writing examples and context retrieval) that reduced structural differences like sentence length or word count, but variations in emotional tone persisted. “Our comprehensive calibration tests challenge the assumption that more sophisticated optimization necessarily yields more human-like output,” the researchers concluded.

Researchers surprised that with AI, toxicity is harder to fake than intelligence Read More »

openai-signs-massive-ai-compute-deal-with-amazon

OpenAI signs massive AI compute deal with Amazon

On Monday, OpenAI announced it has signed a seven-year, $38 billion deal to buy cloud services from Amazon Web Services to power products like ChatGPT and Sora. It’s the company’s first big computing deal after a fundamental restructuring last week that gave OpenAI more operational and financial freedom from Microsoft.

The agreement gives OpenAI access to hundreds of thousands of Nvidia graphics processors to train and run its AI models. “Scaling frontier AI requires massive, reliable compute,” OpenAI CEO Sam Altman said in a statement. “Our partnership with AWS strengthens the broad compute ecosystem that will power this next era and bring advanced AI to everyone.”

OpenAI will reportedly use Amazon Web Services immediately, with all planned capacity set to come online by the end of 2026 and room to expand further in 2027 and beyond. Amazon plans to roll out hundreds of thousands of chips, including Nvidia’s GB200 and GB300 AI accelerators, in data clusters built to power ChatGPT’s responses, generate AI videos, and train OpenAI’s next wave of models.

Wall Street apparently liked the deal, because Amazon shares hit an all-time high on Monday morning. Meanwhile, shares for long-time OpenAI investor and partner Microsoft briefly dipped following the announcement.

Massive AI compute requirements

It’s no secret that running generative AI models for hundreds of millions of people currently requires a lot of computing power. Amid chip shortages over the past few years, finding sources of that computing muscle has been tricky. OpenAI is reportedly working on its own GPU hardware to help alleviate the strain.

But for now, the company needs to find new sources of Nvidia chips, which accelerate AI computations. Altman has previously said that the company plans to spend $1.4 trillion to develop 30 gigawatts of computing resources, an amount that is enough to roughly power 25 million US homes, according to Reuters.

OpenAI signs massive AI compute deal with Amazon Read More »

senators-move-to-keep-big-tech’s-creepy-companion-bots-away-from-kids

Senators move to keep Big Tech’s creepy companion bots away from kids

Big Tech says bans aren’t the answer

As the bill advances, it could change, senators and parents acknowledged at the press conference. It will likely face backlash from privacy advocates who have raised concerns that widely collecting personal data for age verification puts sensitive information at risk of a data breach or other misuse.

The tech industry has already voiced opposition. On Tuesday, Chamber of Progress, a Big Tech trade group, criticized the law as taking a “heavy-handed approach” to child safety. The group’s vice president of US policy and government relations, K.J. Bagchi, said that “we all want to keep kids safe, but the answer is balance, not bans.

“It’s better to focus on transparency when kids chat with AI, curbs on manipulative design, and reporting when sensitive issues arise,” Bagchi said.

However, several organizations dedicated to child safety online, including the Young People’s Alliance, the Tech Justice Law Project, and the Institute for Families and Technology, cheered senators’ announcement Tuesday. The GUARD Act, these groups told Time, is just “one part of a national movement to protect children and teens from the dangers of companion chatbots.”

Mourning parents are rallying behind that movement. Earlier this month, Garcia praised California for “finally” passing the first state law requiring companies to protect their users who express suicidal ideations to chatbots.

“American families, like mine, are in a battle for the online safety of our children,” Garcia said at that time.

During Tuesday’s press conference, Blumenthal noted that the chatbot ban bill was just one initiative of many that he and Hawley intend to raise to heighten scrutiny on AI firms.

Senators move to keep Big Tech’s creepy companion bots away from kids Read More »

openai-data-suggests-1-million-users-discuss-suicide-with-chatgpt-weekly

OpenAI data suggests 1 million users discuss suicide with ChatGPT weekly

Earlier this month, the company unveiled a wellness council to address these concerns, though critics noted the council did not include a suicide prevention expert. OpenAI also recently rolled out controls for parents of children who use ChatGPT. The company says it’s building an age prediction system to automatically detect children using ChatGPT and impose a stricter set of age-related safeguards.

Rare but impactful conversations

The data shared on Monday appears to be part of the company’s effort to demonstrate progress on these issues, although it also shines a spotlight on just how deeply AI chatbots may be affecting the health of the public at large.

In a blog post on the recently released data, OpenAI says these types of conversations in ChatGPT that might trigger concerns about “psychosis, mania, or suicidal thinking” are “extremely rare,” and thus difficult to measure. The company estimates that around 0.07 percent of users active in a given week and 0.01 percent of messages indicate possible signs of mental health emergencies related to psychosis or mania. For emotional attachment, the company estimates around 0.15 percent of users active in a given week and 0.03 percent of messages indicate potentially heightened levels of emotional attachment to ChatGPT.

OpenAI also claims that on an evaluation of over 1,000 challenging mental health-related conversations, the new GPT-5 model was 92 percent compliant with its desired behaviors, compared to 27 percent for a previous GPT-5 model released on August 15. The company also says its latest version of GPT-5 holds up to OpenAI’s safeguards better in long conversations. OpenAI has previously admitted that its safeguards are less effective during extended conversations.

In addition, OpenAI says it’s adding new evaluations to attempt to measure some of the most serious mental health issues facing ChatGPT users. The company says its baseline safety testing for its AI language models will now include benchmarks for emotional reliance and non-suicidal mental health emergencies.

Despite the ongoing mental health concerns, OpenAI CEO Sam Altman announced on October 14 that the company will allow verified adult users to have erotic conversations with ChatGPT starting in December. The company had loosened ChatGPT content restrictions in February but then dramatically tightened them after the August lawsuit. Altman explained that OpenAI had made ChatGPT “pretty restrictive to make sure we were being careful with mental health issues” but acknowledged this approach made the chatbot “less useful/enjoyable to many users who had no mental health problems.”

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.

OpenAI data suggests 1 million users discuss suicide with ChatGPT weekly Read More »

chatgpt-erotica-coming-soon-with-age-verification,-ceo-says

ChatGPT erotica coming soon with age verification, CEO says

On Tuesday, OpenAI CEO Sam Altman announced that the company will allow verified adult users to have erotic conversations with ChatGPT starting in December. The change represents a shift in how OpenAI approaches content restrictions, which the company had loosened in February but then dramatically tightened after an August lawsuit from parents of a teen who died by suicide after allegedly receiving encouragement from ChatGPT.

“In December, as we roll out age-gating more fully and as part of our ‘treat adult users like adults’ principle, we will allow even more, like erotica for verified adults,” Altman wrote in his post on X (formerly Twitter). The announcement follows OpenAI’s recent hint that it would allow developers to create “mature” ChatGPT applications once the company implements appropriate age verification and controls.

Altman explained that OpenAI had made ChatGPT “pretty restrictive to make sure we were being careful with mental health issues” but acknowledged this approach made the chatbot “less useful/enjoyable to many users who had no mental health problems.” The CEO said the company now has new tools to better detect when users are experiencing mental distress, allowing OpenAI to relax restrictions in most cases.

Striking the right balance between freedom for adults and safety for users has been a difficult balancing act for OpenAI, which has vacillated between permissive and restrictive chat content controls over the past year.

In February, the company updated its Model Spec to allow erotica in “appropriate contexts.” But a March update made GPT-4o so agreeable that users complained about its “relentlessly positive tone.” By August, Ars reported on cases where ChatGPT’s sycophantic behavior had validated users’ false beliefs to the point of causing mental health crises, and news of the aforementioned suicide lawsuit hit not long after.

Aside from adjusting the behavioral outputs for its previous GPT-40 AI language model, new model changes have also created some turmoil among users. Since the launch of GPT-5 in early August, some users have been complaining that the new model feels less engaging than its predecessor, prompting OpenAI to bring back the older model as an option. Altman said the upcoming release will allow users to choose whether they want ChatGPT to “respond in a very human-like way, or use a ton of emoji, or act like a friend.”

ChatGPT erotica coming soon with age verification, CEO says Read More »

to-shield-kids,-california-hikes-fake-nude-fines-to-$250k-max

To shield kids, California hikes fake nude fines to $250K max

California is cracking down on AI technology deemed too harmful for kids, attacking two increasingly notorious child safety fronts: companion bots and deepfake pornography.

On Monday, Governor Gavin Newsom signed the first-ever US law regulating companion bots after several teen suicides sparked lawsuits.

Moving forward, California will require any companion bot platforms—including ChatGPT, Grok, Character.AI, and the like—to create and make public “protocols to identify and address users’ suicidal ideation or expressions of self-harm.”

They must also share “statistics regarding how often they provided users with crisis center prevention notifications to the Department of Public Health,” the governor’s office said. Those stats will also be posted on the platforms’ websites, potentially helping lawmakers and parents track any disturbing trends.

Further, companion bots will be banned from claiming that they’re therapists, and platforms must take extra steps to ensure child safety, including providing kids with break reminders and preventing kids from viewing sexually explicit images.

Additionally, Newsom strengthened the state’s penalties for those who create deepfake pornography, which could help shield young people, who are increasingly targeted with fake nudes, from cyber bullying.

Now any victims, including minors, can seek up to $250,000 in damages per deepfake from any third parties who knowingly distribute nonconsensual sexually explicit material created using AI tools. Previously, the state allowed victims to recover “statutory damages of not less than $1,500 but not more than $30,000, or $150,000 for a malicious violation.”

Both laws take effect January 1, 2026.

American families “are in a battle” with AI

The companion bot law’s sponsor, Democratic Senator Steve Padilla, said in a press release celebrating the signing that the California law demonstrates how to “put real protections into place” and said it “will become the bedrock for further regulation as this technology develops.”

To shield kids, California hikes fake nude fines to $250K max Read More »

vandals-deface-ads-for-ai-necklaces-that-listen-to-all-your-conversations

Vandals deface ads for AI necklaces that listen to all your conversations

In addition to backlash over feared surveillance capitalism, critics have accused Schiffman of taking advantage of the loneliness epidemic. Conducting a survey last year, researchers with Harvard Graduate School of Education’s Making Caring Common found that people between “30-44 years of age were the loneliest group.” Overall, 73 percent of those surveyed “selected technology as contributing to loneliness in the country.”

But Schiffman rejects these criticisms, telling the NYT that his AI Friend pendant is intended to supplement human friends, not replace them, supposedly helping to raise the “average emotional intelligence” of users “significantly.”

“I don’t view this as dystopian,” Schiffman said, suggesting that “the AI friend is a new category of companionship, one that will coexist alongside traditional friends rather than replace them,” the NYT reported. “We have a cat and a dog and a child and an adult in the same room,” the Friend founder said. “Why not an AI?”

The MTA has not commented on the controversy, but Victoria Mottesheard—a vice president at Outfront Media, which manages MTA advertising—told the NYT that the Friend campaign blew up because AI “is the conversation of 2025.”

Website lets anyone deface Friend ads

So far, the Friend ads have not yielded significant sales, Schiffman confirmed, telling the NYT that only 3,100 have sold. He expects that society isn’t ready for AI companions to be promoted at such a large scale and that his ad campaign will help normalize AI friends.

In the meantime, critics have rushed to attack Friend on social media, inspiring a website where anyone can vandalize a Friend ad and share it online. That website has received close to 6,000 submissions so far, its creator, Marc Mueller, told the NYT, and visitors can take a tour of these submissions by choosing “ride train to see more” after creating their own vandalized version.

For visitors to Mueller’s site, riding the train displays a carousel documenting backlash to Friend, as well as “performance art” by visitors poking fun at the ads in less serious ways. One example showed a vandalized ad changing “Friend” to “Fries,” with a crude illustration of McDonald’s French fries, while another transformed the ad into a campaign for “fried chicken.”

Others were seemingly more serious about turning the ad into a warning. One vandal drew a bunch of arrows pointing to the “end” in Friend while turning the pendant into a cry-face emoji, seemingly drawing attention to research on the mental health risks of relying on AI companions—including the alleged suicide risks of products like Character.AI and ChatGPT, which have spawned lawsuits and prompted a Senate hearing.

Vandals deface ads for AI necklaces that listen to all your conversations Read More »

experts-urge-caution-about-using-chatgpt-to-pick-stocks

Experts urge caution about using ChatGPT to pick stocks

“AI models can be brilliant,” Dan Moczulski, UK managing director at eToro, told Reuters. “The risk comes when people treat generic models like ChatGPT or Gemini as crystal balls.” He noted that general AI models “can misquote figures and dates, lean too hard on a pre-established narrative, and overly rely on past price action to attempt to predict the future.”

The hazards of AI stock picking

Using AI to trade stocks at home feels like it might be the next step in a long series of technological advances that have democratized individual retail investing, for better or for worse. Computer-based stock trading for individuals dates back to 1984, when Charles Schwab introduced electronic trading services for dial-up customers. E-Trade launched in 1992, and by the late 1990s, online brokerages had transformed retail investing, dropping commission fees from hundreds of dollars per trade to under $10.

The first “robo-advisors” appeared after the 2008 financial crisis, which began the rise of automated online services that use algorithms to manage and rebalance portfolios based on a client’s goals. Services like Betterment launched in 2010, and Wealthfront followed in 2011, using algorithms to automatically rebalance portfolios. By the end of 2015, robo-advisors from nearly 100 companies globally were managing $60 billion in client assets.

The arrival of ChatGPT in November 2022 arguably marked a new phase where retail investors could directly query an AI model for stock picks rather than relying on pre-programmed algorithms. But Leung acknowledged that ChatGPT cannot access data behind paywalls, potentially missing crucial analyses available through professional services. To get better results, he creates specific prompts like “assume you’re a short analyst, what is the short thesis for this stock?” or “use only credible sources, such as SEC filings.”

Beyond chatbots, reliance on financial algorithms is growing. The “robo-advisory” market, which includes all companies providing automated, algorithm-driven financial advice from fintech startups to established banks, is forecast to grow roughly 600 percent by 2029, according to data-analysis firm Research and Markets.

But as more retail investors turn to AI tools for investment decisions, it’s also potential trouble waiting to happen.

“If people get comfortable investing using AI and they’re making money, they may not be able to manage in a crisis or downturn,” Leung warned Reuters. The concern extends beyond individual losses to whether retail investors using AI tools understand risk management or have strategies for when markets turn bearish.

Experts urge caution about using ChatGPT to pick stocks Read More »