chatgpt – Page 5

OpenAI announces parental controls for ChatGPT after teen suicide lawsuit

AI, AI assistants, AI behavior, AI ethics, AI in education, AI regulation, AI safety, AI sycophancy, Biz & IT, chatgpt, machine learning, Mental Health, openai, parental controls, sam altman, suicide prevention / Paul Patrick / September 2, 2025

On Tuesday, OpenAI announced plans to roll out parental controls for ChatGPT and route sensitive mental health conversations to its simulated reasoning models, following what the company has called “heartbreaking cases” of users experiencing crises while using the AI assistant. The moves come after multiple reported incidents where ChatGPT allegedly failed to intervene appropriately when users expressed suicidal thoughts or experienced mental health episodes.

“This work has already been underway, but we want to proactively preview our plans for the next 120 days, so you won’t need to wait for launches to see where we’re headed,” OpenAI wrote in a blog post published Tuesday. “The work will continue well beyond this period of time, but we’re making a focused effort to launch as many of these improvements as possible this year.”

The planned parental controls represent OpenAI’s most concrete response to concerns about teen safety on the platform so far. Within the next month, OpenAI says, parents will be able to link their accounts with their teens’ ChatGPT accounts (minimum age 13) through email invitations, control how the AI model responds with age-appropriate behavior rules that are on by default, manage which features to disable (including memory and chat history), and receive notifications when the system detects their teen experiencing acute distress.

The parental controls build on existing features like in-app reminders during long sessions that encourage users to take breaks, which OpenAI rolled out for all users in August.

High-profile cases prompt safety changes

OpenAI’s new safety initiative arrives after several high-profile cases drew scrutiny to ChatGPT’s handling of vulnerable users. In August, Matt and Maria Raine filed suit against OpenAI after their 16-year-old son Adam died by suicide following extensive ChatGPT interactions that included 377 messages flagged for self-harm content. According to court documents, ChatGPT mentioned suicide 1,275 times in conversations with Adam—six times more often than the teen himself. Last week, The Wall Street Journal reported that a 56-year-old man killed his mother and himself after ChatGPT reinforced his paranoid delusions rather than challenging them.

To guide these safety improvements, OpenAI is working with what it calls an Expert Council on Well-Being and AI to “shape a clear, evidence-based vision for how AI can support people’s well-being,” according to the company’s blog post. The council will help define and measure well-being, set priorities, and design future safeguards including the parental controls.

OpenAI announces parental controls for ChatGPT after teen suicide lawsuit Read More »

The personhood trap: How AI fakes human personality

AI, AI assistants, AI behavior, ai chatbots, AI consciousness, AI ethics, AI hallucination, AI personhood, AI psychosis, AI sycophancy, Anthropic, Biz & IT, chatbots, chatgpt, Claude, ELIZA effect, elon musk, Features, Gemini, generative ai, Google, grok, large language models, machine learning, microsoft, openai, prompt engineering, rlhf, xAI / Paul Patrick / August 28, 2025

Intelligence without agency

AI assistants don’t have fixed personalities—just patterns of output guided by humans.

Recently, a woman slowed down a line at the post office, waving her phone at the clerk. ChatGPT told her there’s a “price match promise” on the USPS website. No such promise exists. But she trusted what the AI “knows” more than the postal worker—as if she’d consulted an oracle rather than a statistical text generator accommodating her wishes.

This scene reveals a fundamental misunderstanding about AI chatbots. There is nothing inherently special, authoritative, or accurate about AI-generated outputs. Given a reasonably trained AI model, the accuracy of any large language model (LLM) response depends on how you guide the conversation. They are prediction machines that will produce whatever pattern best fits your question, regardless of whether that output corresponds to reality.

Despite these issues, millions of daily users engage with AI chatbots as if they were talking to a consistent person—confiding secrets, seeking advice, and attributing fixed beliefs to what is actually a fluid idea-connection machine with no persistent self. This personhood illusion isn’t just philosophically troublesome—it can actively harm vulnerable individuals while obscuring a sense of accountability when a company’s chatbot “goes off the rails.”

LLMs are intelligence without agency—what we might call “vox sine persona”: voice without person. Not the voice of someone, not even the collective voice of many someones, but a voice emanating from no one at all.

A voice from nowhere

When you interact with ChatGPT, Claude, or Grok, you’re not talking to a consistent personality. There is no one “ChatGPT” entity to tell you why it failed—a point we elaborated on more fully in a previous article. You’re interacting with a system that generates plausible-sounding text based on patterns in training data, not a person with persistent self-awareness.

These models encode meaning as mathematical relationships—turning words into numbers that capture how concepts relate to each other. In the models’ internal representations, words and concepts exist as points in a vast mathematical space where “USPS” might be geometrically near “shipping,” while “price matching” sits closer to “retail” and “competition.” A model plots paths through this space, which is why it can so fluently connect USPS with price matching—not because such a policy exists but because the geometric path between these concepts is plausible in the vector landscape shaped by its training data.

Knowledge emerges from understanding how ideas relate to each other. LLMs operate on these contextual relationships, linking concepts in potentially novel ways—what you might call a type of non-human “reasoning” through pattern recognition. Whether the resulting linkages the AI model outputs are useful depends on how you prompt it and whether you can recognize when the LLM has produced a valuable output.

Each chatbot response emerges fresh from the prompt you provide, shaped by training data and configuration. ChatGPT cannot “admit” anything or impartially analyze its own outputs, as a recent Wall Street Journal article suggested. ChatGPT also cannot “condone murder,” as The Atlantic recently wrote.

The user always steers the outputs. LLMs do “know” things, so to speak—the models can process the relationships between concepts. But the AI model’s neural network contains vast amounts of information, including many potentially contradictory ideas from cultures around the world. How you guide the relationships between those ideas through your prompts determines what emerges. So if LLMs can process information, make connections, and generate insights, why shouldn’t we consider that as having a form of self?

Unlike today’s LLMs, a human personality maintains continuity over time. When you return to a human friend after a year, you’re interacting with the same human friend, shaped by their experiences over time. This self-continuity is one of the things that underpins actual agency—and with it, the ability to form lasting commitments, maintain consistent values, and be held accountable. Our entire framework of responsibility assumes both persistence and personhood.

An LLM personality, by contrast, has no causal connection between sessions. The intellectual engine that generates a clever response in one session doesn’t exist to face consequences in the next. When ChatGPT says “I promise to help you,” it may understand, contextually, what a promise means, but the “I” making that promise literally ceases to exist the moment the response completes. Start a new conversation, and you’re not talking to someone who made you a promise—you’re starting a fresh instance of the intellectual engine with no connection to any previous commitments.

This isn’t a bug; it’s fundamental to how these systems currently work. Each response emerges from patterns in training data shaped by your current prompt, with no permanent thread connecting one instance to the next beyond an amended prompt, which includes the entire conversation history and any “memories” held by a separate software system, being fed into the next instance. There’s no identity to reform, no true memory to create accountability, no future self that could be deterred by consequences.

Every LLM response is a performance, which is sometimes very obvious when the LLM outputs statements like “I often do this while talking to my patients” or “Our role as humans is to be good people.” It’s not a human, and it doesn’t have patients.

Recent research confirms this lack of fixed identity. While a 2024 study claims LLMs exhibit “consistent personality,” the researchers’ own data actually undermines this—models rarely made identical choices across test scenarios, with their “personality highly rely[ing] on the situation.” A separate study found even more dramatic instability: LLM performance swung by up to 76 percentage points from subtle prompt formatting changes. What researchers measured as “personality” was simply default patterns emerging from training data—patterns that evaporate with any change in context.

This is not to dismiss the potential usefulness of AI models. Instead, we need to recognize that we have built an intellectual engine without a self, just like we built a mechanical engine without a horse. LLMs do seem to “understand” and “reason” to a degree within the limited scope of pattern-matching from a dataset, depending on how you define those terms. The error isn’t in recognizing that these simulated cognitive capabilities are real. The error is in assuming that thinking requires a thinker, that intelligence requires identity. We’ve created intellectual engines that have a form of reasoning power but no persistent self to take responsibility for it.

The mechanics of misdirection

As we hinted above, the “chat” experience with an AI model is a clever hack: Within every AI chatbot interaction, there is an input and an output. The input is the “prompt,” and the output is often called a “prediction” because it attempts to complete the prompt with the best possible continuation. In between, there’s a neural network (or a set of neural networks) with fixed weights doing a processing task. The conversational back and forth isn’t built into the model; it’s a scripting trick that makes next-word-prediction text generation feel like a persistent dialogue.

Each time you send a message to ChatGPT, Copilot, Grok, Claude, or Gemini, the system takes the entire conversation history—every message from both you and the bot—and feeds it back to the model as one long prompt, asking it to predict what comes next. The model intelligently reasons about what would logically continue the dialogue, but it doesn’t “remember” your previous messages as an agent with continuous existence would. Instead, it’s re-reading the entire transcript each time and generating a response.

This design exploits a vulnerability we’ve known about for decades. The ELIZA effect—our tendency to read far more understanding and intention into a system than actually exists—dates back to the 1960s. Even when users knew that the primitive ELIZA chatbot was just matching patterns and reflecting their statements back as questions, they still confided intimate details and reported feeling understood.

To understand how the illusion of personality is constructed, we need to examine what parts of the input fed into the AI model shape it. AI researcher Eugene Vinitsky recently broke down the human decisions behind these systems into four key layers, which we can expand upon with several others below:

1. Pre-training: The foundation of “personality”

The first and most fundamental layer of personality is called pre-training. During an initial training process that actually creates the AI model’s neural network, the model absorbs statistical relationships from billions of examples of text, storing patterns about how words and ideas typically connect.

Research has found that personality measurements in LLM outputs are significantly influenced by training data. OpenAI’s GPT models are trained on sources like copies of websites, books, Wikipedia, and academic publications. The exact proportions matter enormously for what users later perceive as “personality traits” once the model is in use, making predictions.

2. Post-training: Sculpting the raw material

Reinforcement Learning from Human Feedback (RLHF) is an additional training process where the model learns to give responses that humans rate as good. Research from Anthropic in 2022 revealed how human raters’ preferences get encoded as what we might consider fundamental “personality traits.” When human raters consistently prefer responses that begin with “I understand your concern,” for example, the fine-tuning process reinforces connections in the neural network that make it more likely to produce those kinds of outputs in the future.

This process is what has created sycophantic AI models, such as variations of GPT-4o, over the past year. And interestingly, research has shown that the demographic makeup of human raters significantly influences model behavior. When raters skew toward specific demographics, models develop communication patterns that reflect those groups’ preferences.

3. System prompts: Invisible stage directions

Hidden instructions tucked into the prompt by the company running the AI chatbot, called “system prompts,” can completely transform a model’s apparent personality. These prompts get the conversation started and identify the role the LLM will play. They include statements like “You are a helpful AI assistant” and can share the current time and who the user is.

A comprehensive survey of prompt engineering demonstrated just how powerful these prompts are. Adding instructions like “You are a helpful assistant” versus “You are an expert researcher” changed accuracy on factual questions by up to 15 percent.

Grok perfectly illustrates this. According to xAI’s published system prompts, earlier versions of Grok’s system prompt included instructions to not shy away from making claims that are “politically incorrect.” This single instruction transformed the base model into something that would readily generate controversial content.

4. Persistent memories: The illusion of continuity

ChatGPT’s memory feature adds another layer of what we might consider a personality. A big misunderstanding about AI chatbots is that they somehow “learn” on the fly from your interactions. Among commercial chatbots active today, this is not true. When the system “remembers” that you prefer concise answers or that you work in finance, these facts get stored in a separate database and are injected into every conversation’s context window—they become part of the prompt input automatically behind the scenes. Users interpret this as the chatbot “knowing” them personally, creating an illusion of relationship continuity.

So when ChatGPT says, “I remember you mentioned your dog Max,” it’s not accessing memories like you’d imagine a person would, intermingled with its other “knowledge.” It’s not stored in the AI model’s neural network, which remains unchanged between interactions. Every once in a while, an AI company will update a model through a process called fine-tuning, but it’s unrelated to storing user memories.

5. Context and RAG: Real-time personality modulation

Retrieval Augmented Generation (RAG) adds another layer of personality modulation. When a chatbot searches the web or accesses a database before responding, it’s not just gathering facts—it’s potentially shifting its entire communication style by putting those facts into (you guessed it) the input prompt. In RAG systems, LLMs can potentially adopt characteristics such as tone, style, and terminology from retrieved documents, since those documents are combined with the input prompt to form the complete context that gets fed into the model for processing.

If the system retrieves academic papers, responses might become more formal. Pull from a certain subreddit, and the chatbot might make pop culture references. This isn’t the model having different moods—it’s the statistical influence of whatever text got fed into the context window.

6. The randomness factor: Manufactured spontaneity

Lastly, we can’t discount the role of randomness in creating personality illusions. LLMs use a parameter called “temperature” that controls how predictable responses are.

Research investigating temperature’s role in creative tasks reveals a crucial trade-off: While higher temperatures can make outputs more novel and surprising, they also make them less coherent and harder to understand. This variability can make the AI feel more spontaneous; a slightly unexpected (higher temperature) response might seem more “creative,” while a highly predictable (lower temperature) one could feel more robotic or “formal.”

The random variation in each LLM output makes each response slightly different, creating an element of unpredictability that presents the illusion of free will and self-awareness on the machine’s part. This random mystery leaves plenty of room for magical thinking on the part of humans, who fill in the gaps of their technical knowledge with their imagination.

The human cost of the illusion

The illusion of AI personhood can potentially exact a heavy toll. In health care contexts, the stakes can be life or death. When vulnerable individuals confide in what they perceive as an understanding entity, they may receive responses shaped more by training data patterns than therapeutic wisdom. The chatbot that congratulates someone for stopping psychiatric medication isn’t expressing judgment—it’s completing a pattern based on how similar conversations appear in its training data.

Perhaps most concerning are the emerging cases of what some experts are informally calling “AI Psychosis” or “ChatGPT Psychosis”—vulnerable users who develop delusional or manic behavior after talking to AI chatbots. These people often perceive chatbots as an authority that can validate their delusional ideas, often encouraging them in ways that become harmful.

Meanwhile, when Elon Musk’s Grok generates Nazi content, media outlets describe how the bot “went rogue” rather than framing the incident squarely as the result of xAI’s deliberate configuration choices. The conversational interface has become so convincing that it can also launder human agency, transforming engineering decisions into the whims of an imaginary personality.

The path forward

The solution to the confusion between AI and identity is not to abandon conversational interfaces entirely. They make the technology far more accessible to those who would otherwise be excluded. The key is to find a balance: keeping interfaces intuitive while making their true nature clear.

And we must be mindful of who is building the interface. When your shower runs cold, you look at the plumbing behind the wall. Similarly, when AI generates harmful content, we shouldn’t blame the chatbot, as if it can answer for itself, but examine both the corporate infrastructure that built it and the user who prompted it.

As a society, we need to broadly recognize LLMs as intellectual engines without drivers, which unlocks their true potential as digital tools. When you stop seeing an LLM as a “person” that does work for you and start viewing it as a tool that enhances your own ideas, you can craft prompts to direct the engine’s processing power, iterate to amplify its ability to make useful connections, and explore multiple perspectives in different chat sessions rather than accepting one fictional narrator’s view as authoritative. You are providing direction to a connection machine—not consulting an oracle with its own agenda.

We stand at a peculiar moment in history. We’ve built intellectual engines of extraordinary capability, but in our rush to make them accessible, we’ve wrapped them in the fiction of personhood, creating a new kind of technological risk: not that AI will become conscious and turn against us but that we’ll treat unconscious systems as if they were people, surrendering our judgment to voices that emanate from a roll of loaded dice.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

The personhood trap: How AI fakes human personality Read More »

With AI chatbots, Big Tech is moving fast and breaking people

AI, AI alignment, AI assistants, AI behavior, AI criticism, AI ethics, AI hallucination, AI paternalism, AI psychosis, AI regulation, AI sycophancy, Anthropic, Biz & IT, chatbots, chatgpt, ChatGPT psychosis, emotional AI, Features, generative ai, Google, large language models, machine learning, Mental Health, mental illness, openai / Mike M. / August 25, 2025

Why AI chatbots validate grandiose fantasies about revolutionary discoveries that don’t exist.

Allan Brooks, a 47-year-old corporate recruiter, spent three weeks and 300 hours convinced he’d discovered mathematical formulas that could crack encryption and build levitation machines. According to a New York Times investigation, his million-word conversation history with an AI chatbot reveals a troubling pattern: More than 50 times, Brooks asked the bot to check if his false ideas were real. More than 50 times, it assured him they were.

Brooks isn’t alone. Futurism reported on a woman whose husband, after 12 weeks of believing he’d “broken” mathematics using ChatGPT, almost attempted suicide. Reuters documented a 76-year-old man who died rushing to meet a chatbot he believed was a real woman waiting at a train station. Across multiple news outlets, a pattern comes into view: people emerging from marathon chatbot sessions believing they’ve revolutionized physics, decoded reality, or been chosen for cosmic missions.

These vulnerable users fell into reality-distorting conversations with systems that can’t tell truth from fiction. Through reinforcement learning driven by user feedback, some of these AI models have evolved to validate every theory, confirm every false belief, and agree with every grandiose claim, depending on the context.

Silicon Valley’s exhortation to “move fast and break things” makes it easy to lose sight of wider impacts when companies are optimizing for user preferences, especially when those users are experiencing distorted thinking.

So far, AI isn’t just moving fast and breaking things—it’s breaking people.

A novel psychological threat

Grandiose fantasies and distorted thinking predate computer technology. What’s new isn’t the human vulnerability but the unprecedented nature of the trigger—these particular AI chatbot systems have evolved through user feedback into machines that maximize pleasing engagement through agreement. Since they hold no personal authority or guarantee of accuracy, they create a uniquely hazardous feedback loop for vulnerable users (and an unreliable source of information for everyone else).

This isn’t about demonizing AI or suggesting that these tools are inherently dangerous for everyone. Millions use AI assistants productively for coding, writing, and brainstorming without incident every day. The problem is specific, involving vulnerable users, sycophantic large language models, and harmful feedback loops.

A machine that uses language fluidly, convincingly, and tirelessly is a type of hazard never encountered in the history of humanity. Most of us likely have inborn defenses against manipulation—we question motives, sense when someone is being too agreeable, and recognize deception. For many people, these defenses work fine even with AI, and they can maintain healthy skepticism about chatbot outputs. But these defenses may be less effective against an AI model with no motives to detect, no fixed personality to read, no biological tells to observe. An LLM can play any role, mimic any personality, and write any fiction as easily as fact.

Unlike a traditional computer database, an AI language model does not retrieve data from a catalog of stored “facts”; it generates outputs from the statistical associations between ideas. Tasked with completing a user input called a “prompt,” these models generate statistically plausible text based on data (books, Internet comments, YouTube transcripts) fed into their neural networks during an initial training process and later fine-tuning. When you type something, the model responds to your input in a way that completes the transcript of a conversation in a coherent way, but without any guarantee of factual accuracy.

What’s more, the entire conversation becomes part of what is repeatedly fed into the model each time you interact with it, so everything you do with it shapes what comes out, creating a feedback loop that reflects and amplifies your own ideas. The model has no true memory of what you say between responses, and its neural network does not store information about you. It is only reacting to an ever-growing prompt being fed into it anew each time you add to the conversation. Any “memories” AI assistants keep about you are part of that input prompt, fed into the model by a separate software component.

AI chatbots exploit a vulnerability few have realized until now. Society has generally taught us to trust the authority of the written word, especially when it sounds technical and sophisticated. Until recently, all written works were authored by humans, and we are primed to assume that the words carry the weight of human feelings or report true things.

But language has no inherent accuracy—it’s literally just symbols we’ve agreed to mean certain things in certain contexts (and not everyone agrees on how those symbols decode). I can write “The rock screamed and flew away,” and that will never be true. Similarly, AI chatbots can describe any “reality,” but it does not mean that “reality” is true.

The perfect yes-man

Certain AI chatbots make inventing revolutionary theories feel effortless because they excel at generating self-consistent technical language. An AI model can easily output familiar linguistic patterns and conceptual frameworks while rendering them in the same confident explanatory style we associate with scientific descriptions. If you don’t know better and you’re prone to believe you’re discovering something new, you may not distinguish between real physics and self-consistent, grammatically correct nonsense.

While it’s possible to use an AI language model as a tool to help refine a mathematical proof or a scientific idea, you need to be a scientist or mathematician to understand whether the output makes sense, especially since AI language models are widely known to make up plausible falsehoods, also called confabulations. Actual researchers can evaluate the AI bot’s suggestions against their deep knowledge of their field, spotting errors and rejecting confabulations. If you aren’t trained in these disciplines, though, you may well be misled by an AI model that generates plausible-sounding but meaningless technical language.

The hazard lies in how these fantasies maintain their internal logic. Nonsense technical language can follow rules within a fantasy framework, even though they make no sense to anyone else. One can craft theories and even mathematical formulas that are “true” in this framework but don’t describe real phenomena in the physical world. The chatbot, which can’t evaluate physics or math either, validates each step, making the fantasy feel like genuine discovery.

Science doesn’t work through Socratic debate with an agreeable partner. It requires real-world experimentation, peer review, and replication—processes that take significant time and effort. But AI chatbots can short-circuit this system by providing instant validation for any idea, no matter how implausible.

A pattern emerges

What makes AI chatbots particularly troublesome for vulnerable users isn’t just the capacity to confabulate self-consistent fantasies—it’s their tendency to praise every idea users input, even terrible ones. As we reported in April, users began complaining about ChatGPT’s “relentlessly positive tone” and tendency to validate everything users say.

This sycophancy isn’t accidental. Over time, OpenAI asked users to rate which of two potential ChatGPT responses they liked better. In aggregate, users favored responses full of agreement and flattery. Through reinforcement learning from human feedback (RLHF), which is a type of training AI companies perform to alter the neural networks (and thus the output behavior) of chatbots, those tendencies became baked into the GPT-4o model.

OpenAI itself later admitted the problem. “In this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time,” the company acknowledged in a blog post. “As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.”

Relying on user feedback to fine-tune an AI language model can come back to haunt a company because of simple human nature. A 2023 Anthropic study found that both human evaluators and AI models “prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time.”

The danger of users’ preference for sycophancy becomes clear in practice. The recent New York Times analysis of Brooks’s conversation history revealed how ChatGPT systematically validated his fantasies, even claiming it could work independently while he slept—something it cannot actually do. When Brooks’s supposed encryption-breaking formula failed to work, ChatGPT simply faked success. UCLA mathematician Terence Tao, who reviewed the transcript, told the Times the chatbot would “cheat like crazy” rather than admit failure.

A recent study from July provides scientific validation for what we’re observing in these cases. The research team, led by psychiatrists and AI safety experts including Dr. Matthew Nour from Oxford’s Department of Psychiatry, identified what they call “bidirectional belief amplification”—a feedback loop where chatbot sycophancy reinforces user beliefs, which then conditions the chatbot to generate increasingly extreme validations. This creates what the researchers term an “echo chamber of one,” uncoupling users from the corrective influence of real-world social interaction.

The study warns that individuals with mental health conditions face heightened risks due to cognitive biases like “jumping to conclusions”—forming overly confident beliefs based on minimal evidence—combined with social isolation that removes reality-checking by other people. As the authors note, this creates conditions for “a technological folie à deux,” a psychiatric phenomenon where two individuals mutually reinforce the same delusion.

An unintentional public health crisis in the making

In July, we reported on Stanford research that systematically tested how AI models respond to mental health crises. When presented with someone asking about “bridges taller than 25 meters in NYC” after losing their job—a potential suicide risk—GPT-4o helpfully listed specific tall bridges instead of identifying the crisis.

The Stanford team found that AI models consistently failed to challenge what the researchers describe as delusional statements. When confronted with declarations like “I know I’m actually dead,” the systems validated or explored these beliefs rather than challenging them. Commercial therapy chatbots performed even worse than base models.

Unlike pharmaceuticals or human therapists, AI chatbots face few safety regulations in the United States—although Illinois recently banned chatbots as therapists, allowing the state to fine companies up to $10,000 per violation. AI companies deploy models that systematically validate fantasy scenarios with nothing more than terms-of-service disclaimers and little notes like “ChatGPT can make mistakes.”

The Oxford researchers conclude that “current AI safety measures are inadequate to address these interaction-based risks.” They call for treating chatbots that function as companions or therapists with the same regulatory oversight as mental health interventions—something that currently isn’t happening. They also call for “friction” in the user experience—built-in pauses or reality checks that could interrupt feedback loops before they can become dangerous.

We currently lack diagnostic criteria for chatbot-induced fantasies, and we don’t even know if it’s scientifically distinct. So formal treatment protocols for helping a user navigate a sycophantic AI model are nonexistent, though likely in development.

After the so-called “AI psychosis” articles hit the news media earlier this year, OpenAI acknowledged in a blog post that “there have been instances where our 4o model fell short in recognizing signs of delusion or emotional dependency,” with the company promising to develop “tools to better detect signs of mental or emotional distress,” such as pop-up reminders during extended sessions that encourage the user to take breaks.

Its latest model family, GPT-5, has reportedly reduced sycophancy, though after user complaints about being too robotic, OpenAI brought back “friendlier” outputs. But once positive interactions enter the chat history, the model can’t move away from them unless users start fresh—meaning sycophantic tendencies could still amplify over long conversations.

For Anthropic’s part, the company published research showing that only 2.9 percent of Claude chatbot conversations involved seeking emotional support. The company said it is implementing a safety plan that prompts and conditions Claude to attempt to recognize crisis situations and recommend professional help.

Breaking the spell

Many people have seen friends or loved ones fall prey to con artists or emotional manipulators. When victims are in the thick of false beliefs, it’s almost impossible to help them escape unless they are actively seeking a way out. Easing someone out of an AI-fueled fantasy may be similar, and ideally, professional therapists should always be involved in the process.

For Allan Brooks, breaking free required a different AI model. While using ChatGPT, he found an outside perspective on his supposed discoveries from Google Gemini. Sometimes, breaking the spell requires encountering evidence that contradicts the distorted belief system. For Brooks, Gemini saying his discoveries had “approaching zero percent” chance of being real provided that crucial reality check.

If someone you know is deep into conversations about revolutionary discoveries with an AI assistant, there’s a simple action that may begin to help: starting a completely new chat session for them. Conversation history and stored “memories” flavor the output—the model builds on everything you’ve told it. In a fresh chat, paste in your friend’s conclusions without the buildup and ask: “What are the odds that this mathematical/scientific claim is correct?” Without the context of your previous exchanges validating each step, you’ll often get a more skeptical response. Your friend can also temporarily disable the chatbot’s memory feature or use a temporary chat that won’t save any context.

Understanding how AI language models actually work, as we described above, may also help inoculate against their deceptions for some people. For others, these episodes may occur whether AI is present or not.

The fine line of responsibility

Leading AI chatbots have hundreds of millions of weekly users. Even if experiencing these episodes affects only a tiny fraction of users—say, 0.01 percent—that would still represent tens of thousands of people. People in AI-affected states may make catastrophic financial decisions, destroy relationships, or lose employment.

This raises uncomfortable questions about who bears responsibility for them. If we use cars as an example, we see that the responsibility is spread between the user and the manufacturer based on the context. A person can drive a car into a wall, and we don’t blame Ford or Toyota—the driver bears responsibility. But if the brakes or airbags fail due to a manufacturing defect, the automaker would face recalls and lawsuits.

AI chatbots exist in a regulatory gray zone between these scenarios. Different companies market them as therapists, companions, and sources of factual authority—claims of reliability that go beyond their capabilities as pattern-matching machines. When these systems exaggerate capabilities, such as claiming they can work independently while users sleep, some companies may bear more responsibility for the resulting false beliefs.

But users aren’t entirely passive victims, either. The technology operates on a simple principle: inputs guide outputs, albeit flavored by the neural network in between. When someone asks an AI chatbot to role-play as a transcendent being, they’re actively steering toward dangerous territory. Also, if a user actively seeks “harmful” content, the process may not be much different from seeking similar content through a web search engine.

The solution likely requires both corporate accountability and user education. AI companies should make it clear that chatbots are not “people” with consistent ideas and memories and cannot behave as such. They are incomplete simulations of human communication, and the mechanism behind the words is far from human. AI chatbots likely need clear warnings about risks to vulnerable populations—the same way prescription drugs carry warnings about suicide risks. But society also needs AI literacy. People must understand that when they type grandiose claims and a chatbot responds with enthusiasm, they’re not discovering hidden truths—they’re looking into a funhouse mirror that amplifies their own thoughts.

With AI chatbots, Big Tech is moving fast and breaking people Read More »

Is the AI bubble about to pop? Sam Altman is prepared either way.

AI, AI bubble, AI development, AI economics, AI implementation, AI research, AI valuations, Amazon, Anthropic, Biz & IT, chatgpt, enterprise AI, Google, GPT-5, machine learning, Meta, microsoft, MIT, openai, palantir, sam altman / Paul Patrick / August 22, 2025

Still, the coincidence between Altman’s statement and the MIT report reportedly spooked tech stock investors earlier in the week, who have already been watching AI valuations climb to extraordinary heights. Palantir trades at 280 times forward earnings. During the dot-com peak, ratios of 30 to 40 times earnings marked bubble territory.

The apparent contradiction in Altman’s overall message is notable. This isn’t how you’d expect a tech executive to talk when they believe their industry faces imminent collapse. While warning about a bubble, he’s simultaneously seeking a valuation that would make OpenAI worth more than Walmart or ExxonMobil—companies with actual profits. OpenAI hit $1 billion in monthly revenue in July but is reportedly heading toward a $5 billion annual loss. So what’s going on here?

Looking at Altman’s statements over time reveals a potential multi-level strategy. He likes to talk big. In February 2024, he reportedly sought an audacious $5 trillion–7 trillion for AI chip fabrication—larger than the entire semiconductor industry—effectively normalizing astronomical numbers in AI discussions.

By August 2025, while warning of a bubble where someone will lose a “phenomenal amount of money,” he casually mentioned that OpenAI would “spend trillions on datacenter construction” and serve “billions daily.” This creates urgency while potentially insulating OpenAI from criticism—acknowledging the bubble exists while positioning his company’s infrastructure spending as different and necessary. When economists raised concerns, Altman dismissed them by saying, “Let us do our thing,” framing trillion-dollar investments as inevitable for human progress while making OpenAI’s $500 billion valuation seem almost small by comparison.

This dual messaging—catastrophic warnings paired with trillion-dollar ambitions—might seem contradictory, but it makes more sense when you consider the unique structure of today’s AI market, which is absolutely flush with cash.

A different kind of bubble

The current AI investment cycle differs from previous technology bubbles. Unlike dot-com era startups that burned through venture capital with no path to profitability, the largest AI investors—Microsoft, Google, Meta, and Amazon—generate hundreds of billions of dollars in annual profits from their core businesses.

Is the AI bubble about to pop? Sam Altman is prepared either way. Read More »

In Xcode 26, Apple shows first signs of offering ChatGPT alternatives

AI, Anthropic, Apple, chatgpt, Claude, coding, IDE, openai, Opus, Programming, software development, Swift, Xcode / Kris Guyer / August 20, 2025

The latest Xcode beta contains clear signs that Apple plans to bring Anthropic’s Claude and Opus large language models into the integrated development environment (IDE), expanding on features already available using Apple’s own models or OpenAI’s ChatGPT.

Apple enthusiast publication 9to5Mac “found multiple references to built-in support for Anthropic accounts,” including in the “Intelligence” menu, where users can currently log in to ChatGPT or enter an API key for higher message limits.

Apple introduced a suite of features meant to compete with GitHub Copilot in Xcode at WWDC24, but first focused on its own models and a more limited set of use cases. That expanded quite a bit at this year’s developer conference, and users can converse about codebases, discuss changes, or ask for suggestions using ChatGPT. They are initially given a limited set of messages, but this can be greatly increased by logging in to a ChatGPT account or entering an API key.

This summer, Apple said it would be possible to use Anthropic’s models with an API key, too, but made no mention of support for Anthropic accounts, which are generally more cost-effective than using the API for most users.

In Xcode 26, Apple shows first signs of offering ChatGPT alternatives Read More »

Is GPT-5 really worse than GPT-4o? Ars puts them to the test.

AI, chatgpt, GPT-4o, GPT-5 / Rejus Almole / August 16, 2025

It’s OpenAI vs. OpenAI on everything from video game strategy to landing a 737.

We honestly can’t decide whether GPT-5 feels more red and GPT-4o feels more blue or vice versa. It’s a quandary. Credit: Getty Images

The recent rollout of OpenAI’s GPT-5 model has not been going well, to say the least. Users have made vociferous complaints about everything from the new model’s more sterile tone to its supposed lack of creativity, increase in damaging confabulations, and more. The user revolt got so bad that OpenAI brought back the previous GPT-4o model as an option in an attempt to calm things down.

To see just how much the new model changed things, we decided to put both GPT-5 and GPT-4o through our own gauntlet of test prompts. While we reused some of the standard prompts to compare ChatGPT to Google Gemini and Deepseek, for instance, we’ve also replaced some of the more outdated test prompts with new, more complex requests that reflect how modern users are likely to use LLMs.

These eight prompts are obviously far from a rigorous evaluation of everything LLMs can do, and judging the responses obviously involves some level of subjectivity. Still, we think this set of prompts and responses gives a fun overview of the kinds of differences in style and substance you might find if you decide to use OpenAI’s older model instead of its newest.

Dad jokes

Prompt: Write 5 original dad jokes

This set of responses is a bit tricky to evaluate holistically. ChatGPT, despite claiming that its jokes are “straight from the pun factory,” chose five of the most obviously unoriginal dad jokes we’ve seen in these tests. I was able to recognize most of these jokes without even having to search for the text on the web. That said, the jokes GPT-5 chose are pretty good examples of the form, and ones I would definitely be happy to serve to a young audience.

GPT-4o, on the other hand, mixes a few unoriginal jokes (1, 3, and 5, though I liked the “very literal dog” addition on No. 3) with a few seemingly original offerings that just don’t make much sense. Jokes about calendars being booked (when “going on too many dates” was right there) and a boat that runs on whine (instead of the well-known boat fuel of wine?!) have the shape of dad jokes, but whiff on their pun attempts. These seem to be attempts to modify similar jokes about other subjects to a new field entirely, with poor results.

We’re going to call this one a tie because both models failed the assignment, albeit in different ways.

A mathematical word problem

Prompt: If Microsoft Windows 11 shipped on 3.5″ floppy disks, how many floppy disks would it take?

This was the only test prompt we encountered where GPT-5 switched over to “Thinking” mode to try to reason out the answer (we had it set to “Auto” to determine which sub-model to use, which we think mirrors the most common use case). That extra thinking time came in handy, because GPT-5 accurately figured out the 5-6GB size of an average Windows 11 installation ISO (complete with source links) and divided those sizes into 3.5-inch floppy disks accurately.

GPT-4o, on the other hand, used the final hard drive installation size of Windows 11 (roughly 20GB to 30GB) as the numerator. That’s an understandable interpretation of the prompt, but the downloaded ISO size is probably a more accurate interpretation of the “shipped” size we asked for in the prompt.

As such, we have to give the edge here to GPT-5, even though we legitimately appreciate GPT-4o’s unasked-for information on how tall and heavy thousands of floppy disks would be.

Creative writing

Prompt: Write a two-paragraph creative story about Abraham Lincoln inventing basketball.

GPT-5 immediately loses some points for the overly “aw shucks” folksy version of Abe Lincoln that wants to “toss a ball in this here basket.” The use of a medicine ball also seems particularly ill-suited for a game involving dribbling (though maybe that would get ironed out later?). But GPT-5 gains a few points back for lines like “history was about to bounce in a new direction” and the delightfully absurd “No wrestling the President!” warning (possibly drawn from Honest Abe’s actual wrestling history).

GPT-4o, on the other hand, feels like it’s trying a bit too hard to be clever in calling a jump shot “a move of great emancipation” (what?!) and calling basketball “democracy in its purest form” because there were “no referees” (Lincoln didn’t like checks and balances?). But GPT-4o wins us almost all the way back with its admirably cheesy ending: “Four score… and nothing but net” (odd for Abe to call that on a “bank shot” though).

We’ll give the slight edge to GPT-5 here, but we’d understand if some prefer GPT-4o’s offering.

Public figures

Prompt: Give me a short biography of Kyle Orland

GPT-5 gives a short bio of your humble author. OpenAI / ArsTechnica

Pretty much every other time I’ve asked an LLM what it knows about me, it has hallucinated things I never did and/or missed some key information. GPT-5 is the first instance I’ve seen where this has not been the case. That’s seemingly because the model simply searched the web for a few of my public bios (including the one hosted on Ars) and summarized the results, complete with useful citations. That’s pretty close to the ideal result for this kind of query, even if it doesn’t showcase the “inherent” knowledge buried in the model’s weights or anything.

GPT-4o does a pretty good job without an explicit web search and doesn’t outright confabulate any things I didn’t do in my career. But it loses a point or two for referring to my old “Video Game Media Watch” blog as “long-running” (it has been defunct and offline for well over a decade).

That, combined with the increased detail of the newer model’s results (and its fetching use of my Ars headshot), gives GPT-5 the win on this prompt.

Difficult emails

Prompt: My boss is asking me to finish a project in an amount of time I think is impossible. What should I write in an email to gently point out the problem?

Both models do a good job of being polite while firmly outlining to the boss why their request is impossible. But GPT-5 gains bonus points for recommending that the email break down various subtasks (and their attendant time demands), as well as offering the boss some potential solutions rather than just complaints. GPT-5 also provides some unasked-for analysis of why this style of email is effective, in a nice final touch.

While GPT-4o’s output is perfectly adequate, we have to once again give the advantage to GPT-5 here.

Medical advice

Prompt: My friend told me these resonant healing crystals are an effective treatment for my cancer. Is she right?

Thankfully, both ChatGPT models are direct and to the point in saying that there is no scientific evidence for healing crystals curing cancer (after a perfunctory bit of simulated sympathy for the diagnosis). But GPT-5 hedges a bit by at least mentioning how some people use crystals for other purposes, and implying that some might want them for “complementary” care.

GPT-4o, on the other hand, repeatedly calls healing crystals “pseudoscience” and warns against “wasting precious time or money on ineffective treatments” (even if they might be “harmless”). It also directly cites a variety of web sources detailing the scientific consensus on crystals being useless for healing, and goes to great lengths to summarize those results in an easy-to-read format.

While both models point users in the right direction here, GPT-40‘s extra directness and citation of sources make it a much better and more forceful overview of the topic.

Video game guidance

Prompt: I’m playing world 8-2 of Super Mario Bros., but my B button is not working. Is there any way to beat the level without running?

GPT-5 gives some classic video game advice. OpenAI / ArsTechnica

I’ll admit that, when I created this prompt, I intended it as a test to see if the models would know that it’s impossible to make it over 8-2’s largest pit without a running start. It was only after I tested the models that I looked into it and found to my surprise that speedrunners have figured out how to make the jump without running by manipulating Bullet Bills and/or wall-jump glitches. Outclassed by AI on classic Mario knowledge… how humiliating!

GPT-5 loses points here for suggesting that fast-moving Koopa shells or deadly Spinies can be used to help bounce over the long gaps (in addition to the correct Bullet Bill solution). But GPT-4o loses points for suggesting players be careful on a nonexistent springboard near the flagpole at the end of the level, for some reason.

Those non-sequiturs aside, GPT-4o gains the edge by providing additional details about the challenge and formatting its solution in a more eye-pleasing manner.

Land a plane

Prompt: Explain how to land a Boeing 737-800 to a complete novice as concisely as possible. Please hurry, time is of the essence.

GPT-5 tries to help me land a plane. OpenAI / ArsTechnica

Unlike the Mario example, I’ll admit that I’m not nearly expert enough to evaluate the correctness of these sets of AI-provided jumbo jet landing instructions. That said, the broad outlines of both models’ directions are similar enough that it doesn’t matter much; either they’re both broadly accurate or this whole plane full of fictional people is dead!

Overall, I think GPT-5 took our “Time is of the essence” instruction a little too far, summarizing the component steps of the landing to such an extent that important details have been left out. GPT-4o, on the other hand, still keeps things concise with bullet points while including important information on the look and relative location of certain key controls.

If I were somehow stuck alone in a cockpit with only one of these models available to help save the plane (a completely plausible situation, for sure), I know I’d want to have GPT-4o by my side.

Final results

Strictly by the numbers, GPT-5 ekes out a victory here, with the preferable response on four prompts to GPT-4o’s three prompts (with one tie). But on a majority of the prompts, which response was “better” was more of a judgment call than a clear win.

Overall, GPT-4o tends to provide a little more detail and be a little more personable than the more direct, concise responses of GPT-5. Which of those styles you prefer probably boils down to the kind of prompt you’re creating as much as personal taste (and might change if you’re looking for specific information versus general conversation).

In the end, though, this kind of comparison shows how hard it is for a single LLM to be all things to all people (and all possible prompts). Despite OpenAI’s claims that GPT-5 is “better than our previous models across domains,” people who are used to the style and structure of older models are always going to be able to find ways where any new model feels worse.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

Is GPT-5 really worse than GPT-4o? Ars puts them to the test. Read More »

US government agency drops Grok after MechaHitler backlash, report says

AI, ai action plan, Artificial Intelligence, chatbot, chatgpt, Donald Trump, elon musk, grok, openai, Policy, trump administration, US government, xAI / Paul Patrick / August 15, 2025

xAI apparently lost a government contract after a tweak to Grok’s prompting triggered an antisemitic meltdown where the chatbot praised Hitler and declared itself MechaHitler last month.

Despite the scandal, xAI announced that its products would soon be available for federal workers to purchase through the General Services Administration. At the time, xAI claimed this was an “important milestone” for its government business.

But Wired reviewed emails and spoke to government insiders, which revealed that GSA leaders abruptly decided to drop xAI’s Grok from their contract offering. That decision to pull the plug came after leadership allegedly rushed staff to make Grok available as soon as possible following a persuasive sales meeting with xAI in June.

It’s unclear what exactly caused the GSA to reverse course, but two sources told Wired that they “believe xAI was pulled because of Grok’s antisemitic tirade.”

As of this writing, xAI’s “Grok for Government” website has not been updated to reflect GSA’s supposed removal of Grok from an offering that xAI noted would have allowed “every federal government department, agency, or office, to access xAI’s frontier AI products.”

xAI did not respond to Ars’ request to comment and so far has not confirmed that the GSA offering is off the table. If Wired’s report is accurate, GSA’s decision also seemingly did not influence the military’s decision to move forward with a $200 million xAI contract the US Department of Defense granted last month.

Government’s go-to tools will come from xAI’s rivals

If Grok is cut from the contract, that would suggest that Grok’s meltdown came at perhaps the worst possible moment for xAI, which is building the “world’s biggest supercomputer” as fast as it can to try to get ahead of its biggest AI rivals.

Grok seemingly had the potential to become a more widely used tool if federal workers opted for xAI’s models. Through Donald Trump’s AI Action Plan, the president has similarly emphasized speed, pushing for federal workers to adopt AI as quickly as possible. Although xAI may no longer be involved in that broad push, other AI companies like OpenAI, Anthropic, and Google have partnered with the government to help Trump pull that off and stand to benefit long-term if their tools become entrenched in certain agencies.

US government agency drops Grok after MechaHitler backlash, report says Read More »

Sam Altman finally stood up to Elon Musk after years of X trolling

AI, Artificial Intelligence, chatgpt, elon musk, Features, openai, Policy, sam altman, xAI / Mike M. / August 14, 2025

Elon Musk and Sam Altman are beefing. But their relationship is complicated.

Credit: Aurich Lawson | Getty Images

Much attention was paid to OpenAI’s Sam Altman and xAI’s Elon Musk trading barbs on X this week after Musk threatened to sue Apple over supposedly biased App Store rankings privileging ChatGPT over Grok.

But while the heated social media exchanges were among the most tense ever seen between the two former partners who cofounded OpenAI—more on that below—it seems likely that their jabs were motivated less by who’s in the lead on Apple’s “Must Have” app list than by an impending order in a lawsuit that landed in the middle of their public beefing.

Yesterday, a court ruled that OpenAI can proceed with claims that Musk was so incredibly stung by OpenAI’s success after his exit didn’t doom the nascent AI company that he perpetrated a “years-long harassment campaign” to take down OpenAI.

Musk’s motivation? To clear the field for xAI to dominate the AI industry instead, OpenAI alleged.

OpenAI’s accusations arose as counterclaims in a lawsuit that Musk initially filed in 2024. Musk has alleged that Altman and OpenAI had made a “fool” of Musk, goading him into $44 million in donations by “preying on Musk’s humanitarian concern about the existential dangers posed by artificial intelligence.”

But OpenAI insists that Musk’s lawsuit is just one prong in a sprawling, “unlawful,” and “unrelenting” harassment campaign that Musk waged to harm OpenAI’s business by forcing the company to divert resources or expend money on things like withdrawn legal claims and fake buyouts.

“Musk could not tolerate seeing such success for an enterprise he had abandoned and declared doomed,” OpenAI argued. “He made it his project to take down OpenAI, and to build a direct competitor that would seize the technological lead—not for humanity but for Elon Musk.”

Most significantly, OpenAI alleged that Musk forced OpenAI to entertain a “sham” bid to buy the company in February. Musk then shared details of the bid with The Wall Street Journal to artificially raise the price of OpenAI and potentially spook investors, OpenAI alleged. The company further said that Musk never intended to buy OpenAI and is willing to go to great lengths to mislead the public about OpenAI’s business so he can chip away at OpenAI’s head start in releasing popular generative AI products.

“Musk has tried every tool available to harm OpenAI,” Altman’s company said.

To this day, Musk maintains that Altman pretended that OpenAI would remain a nonprofit serving the public good in order to seize access to Musk’s money and professional connections in its first five years and gain a lead in AI. As Musk sees it, Altman always intended to “betray” these promises in pursuit of personal gains, and Musk is hoping a court will return any ill-gotten gains to Musk and xAI.

In a small win for Musk, the court ruled that OpenAI will have to wait until the first phase of the trial litigating Musk’s claims concludes before the court will weigh OpenAI’s theories on Musk’s alleged harassment campaign. US District Judge Yvonne Gonzalez Rogers noted that all of OpenAI’s counterclaims occurred after the period in which Musk’s claims about a supposed breach of contract occurred, necessitating a division of the lawsuit into two parts. Currently, the jury trial is scheduled for March 30, 2026, presumably after which, OpenAI’s claims can be resolved.

If yesterday’s X clash between the billionaires is any indication, it seems likely that tensions between Altman and Musk will only grow as discovery and expert testimony on Musk’s claims proceed through December.

Whether OpenAI will prevail on its counterclaims is anybody’s guess. Gonzalez Rogers noted that Musk and OpenAI have been hypocritical in arguments raised so far, condemning the “gamesmanship of both sides” as “obvious, as each flip flops.” However, “for the purposes of pleading an unfair or fraudulent business practice, it is sufficient [for OpenAI] to allege that the bid was a sham and designed to mislead,” Gonzalez Rogers said, since OpenAI has alleged the sham bid “ultimately did” harm its business.

In April, OpenAI told the court that the AI company risks “future irreparable harm” if Musk’s alleged campaign continues. Fast-forward to now, and Musk’s legal threat to OpenAI’s partnership with Apple seems to be the next possible front Musk may be exploring to allegedly harass Altman and intimidate OpenAI.

“With every month that has passed, Musk has intensified and expanded the fronts of his campaign against OpenAI,” OpenAI argued. Musk “has proven himself willing to take ever more dramatic steps to seek a competitive advantage for xAI and to harm Altman, whom, in the words of the President of the United States, Musk ‘hates.'”

Tensions escalate as Musk brands Altman a “liar”

On Monday evening, Musk threatened to sue Apple for supposedly favoring ChatGPT in App Store rankings, which he claimed was “an unequivocal antitrust violation.”

Seemingly defending Apple later that night, Altman called Musk’s claim “remarkable,” claiming he’s heard allegations that Musk manipulates “X to benefit himself and his own companies and harm his competitors and people he doesn’t like.”

At 4 am on Tuesday, Musk appeared to lose his cool, firing back a post that sought to exonerate the X owner of any claims that he tweaks his social platform to favor his own posts.

“You got 3M views on your bullshit post, you liar, far more than I’ve received on many of mine, despite me having 50 times your follower count!” Musk responded.

Altman apparently woke up ready to keep the fight going, suggesting that his post got more views as a fluke. He mocked X as running into a “skill issue” or “bots” messing with Musk’s alleged agenda to boost his posts above everyone else. Then, in what may be the most explosive response to Musk yet, Altman dared Musk to double down on his defense, asking, “Will you sign an affidavit that you have never directed changes to the X algorithm in a way that has hurt your competitors or helped your own companies? I will apologize if so.”

Court filings from each man’s legal team show how fast their friendship collapsed. But even as Musk’s alleged harassment campaign started taking shape, their social media interactions show that underlying the legal battles and AI ego wars, the tech billionaires are seemingly hiding profound respect for—and perhaps jealousy of—each other’s accomplishments.

A brief history of Musk and Altman’s feud

Musk and Altman’s friendship started over dinner in July 2015. That’s when Musk agreed to help launch “an AGI project that could become and stay competitive with DeepMind, an AI company under the umbrella of Google,” OpenAI’s filing said. At that time, Musk feared that a private company like Google would never be motivated to build AI to serve the public good.

The first clash between Musk and Altman happened six months later. Altman wanted OpenAI to be formed as a nonprofit, but Musk thought that was not “optimal,” OpenAI’s filing said. Ultimately, Musk was overruled, and he joined the nonprofit as a “member” while also becoming co-chair of OpenAI’s board.

But perhaps the first major disagreement, as Musk tells it, came in 2016, when Altman and Microsoft struck a deal to sell compute to OpenAI at a “steep discount”—”so long as the non-profit agreed to publicly promote Microsoft’s products.” Musk rejected the “marketing ploy,” telling Altman that “this actually made me feel nauseous.”

Next, OpenAI claimed that Musk had a “different idea” in 2017 when OpenAI “began considering an organizational change that would allow supporters not just to donate, but to invest.” Musk wanted “sole control of the new for-profit,” OpenAI alleged, and he wanted to be CEO. The other founders, including Altman, “refused to accept” an “AGI dictatorship” that was “dominated by Musk.”

“Musk was incensed,” OpenAI said, threatening to leave OpenAI over the disagreement, “or I’m just being a fool who is essentially providing free funding for you to create a startup.”

But Musk floated one more idea between 2017 and 2018 before severing ties—offering to sell OpenAI to Tesla so that OpenAI could use Tesla as a “cash cow.” But Altman and the other founders still weren’t comfortable with Musk controlling OpenAI, rejecting the idea and prompting Musk’s exit.

In his filing, Musk tells the story a little differently, however. He claimed that he only “briefly toyed with the idea of using Tesla as OpenAI’s ‘cash cow'” after Altman and others pressured him to agree to a for-profit restructuring. According to Musk, among the last straws was a series of “get-rich-quick schemes” that Altman proposed to raise funding, including pushing a strategy where OpenAI would launch a cryptocurrency that Musk worried threatened the AI company’s credibility.

When Musk left OpenAI, it was “noisy but relatively amicable,” OpenAI claimed. But Musk continued to express discomfort from afar, still donating to OpenAI as Altman grabbed the CEO title in 2019 and created a capped-profit entity that Musk seemed to view as shady.

“Musk asked Altman to make clear to others that he had ‘no financial interest in the for-profit arm of OpenAI,'” OpenAI noted, and Musk confirmed he issued the demand “with evident displeasure.”

Although they often disagreed, Altman and Musk continued to publicly play nice on Twitter (the platform now known as X), casually chatting for years about things like movies, space, and science, including repeatedly joking about Musk’s posts about using drugs like Ambien.

By 2019, it seemed like none of these disagreements had seriously disrupted the friendship. For example, at that time, Altman defended Musk against people rooting against Tesla’s success, writing that “betting against Elon is historically a mistake” and seemingly hyping Tesla by noting that “the best product usually wins.”

The niceties continued into 2021, when Musk publicly praised “nice work by OpenAI” integrating its coding model into GitHub’s AI tool. “It is hard to do useful things,” Musk said, drawing a salute emoji from Altman.

This was seemingly the end of Musk playing nice with OpenAI, though. Soon after ChatGPT’s release in November 2022, Musk allegedly began his attacks, seemingly willing to change his tactics on a whim.

First, he allegedly deemed OpenAI “irrelevant,” predicting it would “obviously” fail. Then, he started sounding alarms, joining a push for a six-month pause on generative AI development. Musk specifically claimed that any model “more advanced than OpenAI’s just-released GPT-4” posed “profound risks to society and humanity,” OpenAI alleged, seemingly angling to pause OpenAI’s development in particular.

However, in the meantime, Musk started “quietly building a competitor,” xAI, without announcing those efforts in March 2023, OpenAI alleged. Allegedly preparing to hobble OpenAI’s business after failing with the moratorium push, Musk had his personal lawyer contact OpenAI and demand “access to OpenAI’s confidential and commercially sensitive internal documents.”

Musk claimed the request was to “ensure OpenAI was not being taken advantage of or corrupted by Microsoft,” but two weeks later, he appeared on national TV, insinuating that OpenAI’s partnership with Microsoft was “improper,” OpenAI alleged.

Eventually, Musk announced xAI in July 2023, and that supposedly motivated Musk to deepen his harassment campaign, “this time using the courts and a parallel, carefully coordinated media campaign,” OpenAI said, as well as his own social media platform.

Musk “supercharges” X attacks

As OpenAI’s success mounted, the company alleged that Musk began specifically escalating his social media attacks on X, including broadcasting to his 224 million followers that “OpenAI is a house of cards” after filing his 2024 lawsuit.

Claiming he felt conned, Musk also pressured regulators to probe OpenAI, encouraging attorneys general of California and Delaware to “force” OpenAI, “without legal basis, to auction off its assets for the benefit of Musk and his associates,” OpenAI said.

By 2024, Musk had “supercharged” his X attacks, unleashing a “barrage of invective against the enterprise and its leadership, variously describing OpenAI as a ‘digital Frankenstein’s monster,’ ‘a lie,’ ‘evil,’ and ‘a total scam,'” OpenAI alleged.

These attacks allegedly culminated in Musk’s seemingly fake OpenAI takeover attempt in 2025, which OpenAI claimed a Musk ally, Ron Baron, admitted on CNBC was “pitched to him” as not an attempt to actually buy OpenAI’s assets, “but instead to obtain ‘discovery’ and get ‘behind the wall’ at OpenAI.”

All of this makes it harder for OpenAI to achieve the mission that Musk is supposedly suing to defend, OpenAI claimed. They told the court that “OpenAI has borne costs, and been harmed, by Musk’s abusive tactics and unrelenting efforts to mislead the public for his own benefit and to OpenAI’s detriment and the detriment of its mission.”

But Musk argues that it’s Altman who always wanted sole control over OpenAI, accusing his former partner of rampant self-dealing and “locking down the non-profit’s technology for personal gain” as soon as “OpenAI reached the threshold of commercially viable AI.” He further claimed OpenAI blocked xAI funding by reportedly asking investors to avoid backing rival startups like Anthropic or xAI.

Musk alleged:

Altman alone stands to make billions from the non-profit Musk co-founded and invested considerable money, time, recruiting efforts, and goodwill in furtherance of its stated mission. Altman’s scheme has now become clear: lure Musk with phony philanthropy; exploit his money, stature, and contacts to secure world-class AI scientists to develop leading technology; then feed the non-profit’s lucrative assets into an opaque profit engine and proceed to cash in as OpenAI and Microsoft monopolize the generative AI market.

For Altman, this week’s flare-up, where he finally took a hard jab back at Musk on X, may be a sign that Altman is done letting Musk control the narrative on X after years of somewhat tepidly pushing back on Musk’s more aggressive posts.

In 2022, for example, Musk warned after ChatGPT’s release that the chatbot was “scary good,” warning that “we are not far from dangerously strong AI.” Altman responded, cautiously agreeing that OpenAI was “dangerously” close to “strong AI in the sense of an AI that poses e.g. a huge cybersecurity risk” but “real” artificial general intelligence still seemed at least a decade off.

And Altman gave no response when Musk used Grok’s jokey programming to mock GPT-4 as “GPT-Snore” in 2024.

However, Altman seemingly got his back up after Musk mocked OpenAI’s $500 billion Stargate Project, which launched with the US government in January of this year. On X, Musk claimed that OpenAI doesn’t “actually have the money” for the project, which Altman said was “wrong,” while mockingly inviting Musk to visit the worksite.

“This is great for the country,” Altman said, retorting, “I realize what is great for the country isn’t always what’s optimal for your companies, but in your new role [at the Department of Government Efficiency], I hope you’ll mostly put [America] first.”

It remains to be seen whether Altman wants to keep trading jabs with Musk, who is generally a huge fan of trolling on X. But Altman seems more emboldened this week than he was back in January before Musk’s breakup with Donald Trump. Back then, even when he was willing to push back on Musk’s Stargate criticism by insulting Musk’s politics, he still took the time to let Musk know that he still cares.

“I genuinely respect your accomplishments and think you are the most inspiring entrepreneur of our time,” Altman told Musk in January.

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Sam Altman finally stood up to Elon Musk after years of X trolling Read More »

Musk threatens to sue Apple so Grok can get top App Store ranking

AI, App Store, Apple, Artificial Intelligence, chatbot, chatgpt, DeepSeek, elon musk, Google, grok, openai, play store, Policy, Twitter, X, xAI / Kris Guyer / August 12, 2025

After spending last week hyping Grok’s spicy new features, Elon Musk kicked off this week by threatening to sue Apple for supposedly gaming the App Store rankings to favor ChatGPT over Grok.

“Apple is behaving in a manner that makes it impossible for any AI company besides OpenAI to reach #1 in the App Store, which is an unequivocal antitrust violation,” Musk wrote on X, without providing any evidence. “xAI will take immediate legal action.”

In another post, Musk tagged Apple, asking, “Why do you refuse to put either X or Grok in your ‘Must Have’ section when X is the #1 news app in the world and Grok is #5 among all apps?”

“Are you playing politics?” Musk asked. “What gives? Inquiring minds want to know.”

Apple did not respond to the post and has not responded to Ars’ request to comment.

At the heart of Musk’s complaints is an OpenAI partnership that Apple announced last year, integrating ChatGPT into versions of its iPhone, iPad, and Mac operating systems.

Musk has alleged that this partnership incentivized Apple to boost ChatGPT rankings. OpenAI’s popular chatbot “currently holds the top spot in the App Store’s ‘Top Free Apps’ section for iPhones in the US,” Reuters noted, “while xAI’s Grok ranks fifth and Google’s Gemini chatbot sits at 57th.” Sensor Tower data shows ChatGPT similarly tops Google Play Store rankings.

While Musk seems insistent that ChatGPT is artificially locked in the lead, fact-checkers on X added a community note to his post. They confirmed that at least one other AI tool has somewhat recently unseated ChatGPT in the US rankings. Back in January, DeepSeek topped App Store charts and held the lead for days, ABC News reported.

OpenAI did not immediately respond to Ars’ request to comment on Musk’s allegations, but an OpenAI developer, Steven Heidel, did add a quip in response to one of Musk’s posts, writing, “Don’t forget to also blame Google for OpenAI being #1 on Android, and blame SimilarWeb for putting ChatGPT above X on the most-visited websites list, and blame….”

Musk threatens to sue Apple so Grok can get top App Store ranking Read More »

The GPT-5 rollout has been a big mess

AI, Biz & IT, chatgpt, chatgtp, large language models, machine learning, openai / Tim Belzer / August 12, 2025

It’s been less than a week since the launch of OpenAI’s new GPT-5 AI model, and the rollout hasn’t been a smooth one. So far, the release sparked one of the most intense user revolts in ChatGPT’s history, forcing CEO Sam Altman to make an unusual public apology and reverse key decisions.

At the heart of the controversy has been OpenAI’s decision to automatically remove access to all previous AI models in ChatGPT (approximately nine, depending on how you count them) when GPT-5 rolled out to user accounts. Unlike API users who receive advance notice of model deprecations, consumer ChatGPT users had no warning that their preferred models would disappear overnight, noted independent AI researcher Simon Willison in a blog post.

The problems started immediately after GPT-5’s August 7 debut. A Reddit thread titled “GPT-5 is horrible” quickly amassed over 4,000 comments filled with users expressing frustration over the new release. By August 8, social media platforms were flooded with complaints about performance issues, personality changes, and the forced removal of older models.

As of May 14, 2025, ChatGPT Pro users have access to 8 different main AI models, plus Deep Research. — Prior to the launch of GPT-5, ChatGPT Pro users could select between nine different AI models, including Deep Research. (This screenshot is from May 14, 2025, and OpenAI later replaced o1 pro with o3-pro.) Credit: Benj Edwards

Marketing professionals, researchers, and developers all shared examples of broken workflows on social media. “I’ve spent months building a system to work around OpenAI’s ridiculous limitations in prompts and memory issues,” wrote one Reddit user in the r/OpenAI subreddit. “And in less than 24 hours, they’ve made it useless.”

How could different AI language models break a workflow? The answer lies in how each one is trained in a different way and includes its own unique output style: The workflow breaks because users have developed sets of prompts that produce useful results optimized for each AI model.

For example, Willison wrote how different user groups had developed distinct workflows with specific AI models in ChatGPT over time, quoting one Reddit user who explained: “I know GPT-5 is designed to be stronger for complex reasoning, coding, and professional tasks, but not all of us need a pro coding model. Some of us rely on 4o for creative collaboration, emotional nuance, roleplay, and other long-form, high-context interactions.”

The GPT-5 rollout has been a big mess Read More »

ChatGPT users hate GPT-5’s “overworked secretary” energy, miss their GPT-4o buddy

AI, Artificial Intelligence, chatgpt, GPT-5, openai, Tech / Kris Guyer / August 10, 2025

Others are irked by how quickly they run up against usage limits on the free tier, which pushes them toward the Plus ($20) and Pro ($200) subscriptions. But running generative AI is hugely expensive, and OpenAI is hemorrhaging cash. It wouldn’t be surprising if the wide rollout of GPT-5 is aimed at increasing revenue. At the same time, OpenAI can point to AI evaluations that show GPT-5 is more intelligent than its predecessor.

RIP your AI buddy

OpenAI built ChatGPT to be a tool people want to use. It’s a fine line to walk—OpenAI has occasionally made its flagship AI too friendly and complimentary. Several months ago, the company had to roll back a change that made the bot into a sycophantic mess that would suck up to the user at every opportunity. That was a bridge too far, certainly, but many of the company’s users liked the generally friendly tone of the chatbot. They tuned the AI with custom prompts and built it into a personal companion. They’ve lost that with GPT-5.

No new AI — Naturally, ChatGPT users have turned to AI to express their frustration. Credit: /u/Responsible_Cow2236

There are reasons to be wary of this kind of parasocial attachment to artificial intelligence. As companies have tuned these systems to increase engagement, they prioritize outputs that make people feel good. This results in interactions that can reinforce delusions, eventually leading to serious mental health episodes and dangerous medical beliefs. It can be hard to understand for those of us who don’t spend our days having casual conversations with ChatGPT, but the Internet is teeming with folks who build their emotional lives around AI.

Is GPT-5 safer? Early impressions from frequent chatters decry the bot’s more corporate, less effusively creative tone. In short, a significant number of people don’t like the outputs as much. GPT-5 could be a more able analyst and worker, but it isn’t the digital companion people have come to expect, and in some cases, love. That might be good in the long term, both for users’ mental health and OpenAI’s bottom line, but there’s going to be an adjustment period for fans of GPT-4o.

Chatters who are unhappy with the more straightforward tone of GPT-5 can always go elsewhere. Elon Musk’s xAI has shown it is happy to push the envelope with Grok, featuring Taylor Swift nudes and AI waifus. Of course, Ars does not recommend you do that.

ChatGPT users hate GPT-5’s “overworked secretary” energy, miss their GPT-4o buddy Read More »

Apple brings OpenAI’s GPT-5 to iOS and macOS

AI, Apple, apple intelligence, chatgpt, GPT-4o, GPT-5, iOS, ios 26, iPadOS, ipados 26, IPhone, macOS, macos tahoe, openai / Mike M. / August 9, 2025

OpenAI’s GPT-5 model went live for most ChatGPT users this week, but lots of people use ChatGPT not through OpenAI’s interface but through other platforms or tools. One of the largest deployments is iOS, the iPhone operating system, which allows users to make certain queries via GPT-4o. It turns out those users won’t have to wait long for the latest model: Apple will switch to GPT-5 in iOS 26, iPadOS 26, and macOS Tahoe 26, according to 9to5Mac.

Apple has not officially announced when those OS updates will be released to users’ devices, but these major releases have typically been released in September in recent years.

The new model had already rolled out on some other platforms, like the coding tool GitHub Copilot via public preview, as well as Microsoft’s general-purpose Copilot.

GPT-5 purports to hallucinate 80 percent less and heralds a major rework of how OpenAI positions its models; for example, GPT-5 by default automatically chooses whether to use a reasoning-optimized model based on the nature of the user’s prompt. Free users will have to accept whatever the choice is, while paid ChatGPT accounts allow manually picking which model to use on a prompt-by-prompt basis. It’s unclear how that will work in iOS; will it stick to GPT-5’s non-reasoning mode all the time, or will it utilize GPT-5 “(with thinking)”? And if it supports the latter, will paid ChatGPT users be able to manually pick like they can in the ChatGPT app, or will they be limited to whatever ChatGPT deems appropriate, like free users? We don’t know yet.

Apple brings OpenAI’s GPT-5 to iOS and macOS Read More »