generative ai

education-report-calling-for-ethical-ai-use-contains-over-15-fake-sources

Education report calling for ethical AI use contains over 15 fake sources

AI language models like the kind that power ChatGPT, Gemini, and Claude excel at producing exactly this kind of believable fiction when they lack actual information on a topic because they first and foremost produce plausible outputs, not accurate ones. If there are no patterns in the dataset that match what the user is seeking they will create the best approximation based on statistical patterns learned during training. Even AI models that can search the web for real sources can potentially fabricate citations, choose the wrong ones, or mischaracterize them.

“Errors happen. Made-up citations are a totally different thing where you essentially demolish the trustworthiness of the material,” Josh Lepawsky, the former president of the Memorial University Faculty Association who resigned from the report’s advisory board in January, told CBC, citing a “deeply flawed process.”

The irony runs deep

The presence of potentially AI-generated fake citations becomes especially awkward given that one of the report’s 110 recommendations specifically states the provincial government should “provide learners and educators with essential AI knowledge, including ethics, data privacy, and responsible technology use.”

Sarah Martin, a Memorial political science professor who spent days reviewing the document, discovered multiple fabricated citations. “Around the references I cannot find, I can’t imagine another explanation,” she told CBC. “You’re like, ‘This has to be right, this can’t not be.’ This is a citation in a very important document for educational policy.”

When contacted by CBC, co-chair Karen Goodnough declined an interview request, writing in an email: “We are investigating and checking references, so I cannot respond to this at the moment.”

The Department of Education and Early Childhood Development acknowledged awareness of “a small number of potential errors in citations” in a statement to CBC from spokesperson Lynn Robinson. “We understand that these issues are being addressed, and that the online report will be updated in the coming days to rectify any errors.”

Education report calling for ethical AI use contains over 15 fake sources Read More »

openai-and-microsoft-sign-preliminary-deal-to-revise-partnership-terms

OpenAI and Microsoft sign preliminary deal to revise partnership terms

On Thursday, OpenAI and Microsoft announced they have signed a non-binding agreement to revise their partnership, marking the latest development in a relationship that has grown increasingly complex as both companies compete for customers in the AI market and seek new partnerships for growing infrastructure needs.

“Microsoft and OpenAI have signed a non-binding memorandum of understanding (MOU) for the next phase of our partnership,” the companies wrote in a joint statement. “We are actively working to finalize contractual terms in a definitive agreement. Together, we remain focused on delivering the best AI tools for everyone, grounded in our shared commitment to safety.”

The announcement comes as OpenAI seeks to restructure from a nonprofit to a for-profit entity, a transition that requires Microsoft’s approval, as the company is OpenAI’s largest investor, with more than $13 billion committed since 2019.

The partnership has shown increasing strain as OpenAI has grown from a research lab into a company valued at $500 billion. Both companies now compete for customers, and OpenAI seeks more compute capacity than Microsoft can provide. The relationship has also faced complications over contract terms, including provisions that would limit Microsoft’s access to OpenAI technology once the company reaches so-called AGI (artificial general intelligence)—a nebulous milestone both companies now economically define as AI systems capable of generating at least $100 billion in profit.

In May, OpenAI abandoned its original plan to fully convert to a for-profit company after pressure from former employees, regulators, and critics, including Elon Musk. Musk has sued to block the conversion, arguing it betrays OpenAI’s founding mission as a nonprofit dedicated to benefiting humanity.

OpenAI and Microsoft sign preliminary deal to revise partnership terms Read More »

judge:-anthropic’s-$1.5b-settlement-is-being-shoved-“down-the-throat-of-authors”

Judge: Anthropic’s $1.5B settlement is being shoved “down the throat of authors”

At a hearing Monday, US district judge William Alsup blasted a proposed $1.5 billion settlement over Anthropic’s rampant piracy of books to train AI.

The proposed settlement comes in a case where Anthropic could have owed more than $1 trillion in damages after Alsup certified a class that included up to 7 million claimants whose works were illegally downloaded by the AI company.

Instead, critics fear Anthropic will get off cheaply, striking a deal with authors suing that covers less than 500,000 works and paying a small fraction of its total valuation (currently $183 billion) to get away with the massive theft. Defector noted that the settlement doesn’t even require Anthropic to admit wrongdoing, while the company continues raising billions based on models trained on authors’ works. Most recently, Anthropic raised $13 billion in a funding round, making back about 10 times the proposed settlement amount after announcing the deal.

Alsup expressed grave concerns that lawyers rushed the deal, which he said now risks being shoved “down the throat of authors,” Bloomberg Law reported.

In an order, Alsup clarified why he thought the proposed settlement was a chaotic mess. The judge said he was “disappointed that counsel have left important questions to be answered in the future,” seeking approval for the settlement despite the Works List, the Class List, the Claim Form, and the process for notification, allocation, and dispute resolution all remaining unresolved.

Denying preliminary approval of the settlement, Alsup suggested that the agreement is “nowhere close to complete,” forcing Anthropic and authors’ lawyers to “recalibrate” the largest publicly reported copyright class-action settlement ever inked, Bloomberg reported.

Of particular concern, the settlement failed to outline how disbursements would be managed for works with multiple claimants, Alsup noted. Until all these details are ironed out, Alsup intends to withhold approval, the order said.

One big change the judge wants to see is the addition of instructions requiring “anyone with copyright ownership” to opt in, with the consequence that the work won’t be covered if even one rights holder opts out, Bloomberg reported. There should also be instruction that any disputes over ownership or submitted claims should be settled in state court, Alsup said.

Judge: Anthropic’s $1.5B settlement is being shoved “down the throat of authors” Read More »

“first-of-its-kind”-ai-settlement:-anthropic-to-pay-authors-$1.5-billion

“First of its kind” AI settlement: Anthropic to pay authors $1.5 billion

Authors revealed today that Anthropic agreed to pay $1.5 billion and destroy all copies of the books the AI company pirated to train its artificial intelligence models.

In a press release provided to Ars, the authors confirmed that the settlement is “believed to be the largest publicly reported recovery in the history of US copyright litigation.” Covering 500,000 works that Anthropic pirated for AI training, if a court approves the settlement, each author will receive $3,000 per work that Anthropic stole. “Depending on the number of claims submitted, the final figure per work could be higher,” the press release noted.

Anthropic has already agreed to the settlement terms, but a court must approve them before the settlement is finalized. Preliminary approval may be granted this week, while the ultimate decision may be delayed until 2026, the press release noted.

Justin Nelson, a lawyer representing the three authors who initially sued to spark the class action—Andrea Bartz, Kirk Wallace Johnson, and Charles Graeber—confirmed that if the “first of its kind” settlement “in the AI era” is approved, the payouts will “far” surpass “any other known copyright recovery.”

“It will provide meaningful compensation for each class work and sets a precedent requiring AI companies to pay copyright owners,” Nelson said. “This settlement sends a powerful message to AI companies and creators alike that taking copyrighted works from these pirate websites is wrong.”

Groups representing authors celebrated the settlement on Friday. The CEO of the Authors’ Guild, Mary Rasenberger, said it was “an excellent result for authors, publishers, and rightsholders generally.” Perhaps most critically, the settlement shows “there are serious consequences when” companies “pirate authors’ works to train their AI, robbing those least able to afford it,” Rasenberger said.

“First of its kind” AI settlement: Anthropic to pay authors $1.5 billion Read More »

warner-bros.-sues-midjourney-to-stop-ai-knockoffs-of-batman,-scooby-doo

Warner Bros. sues Midjourney to stop AI knockoffs of Batman, Scooby-Doo


AI would’ve gotten away with it too…

Warner Bros. case builds on arguments raised in a Disney/Universal lawsuit.

DVD art for the animated movie Scooby-Doo & Batman: The Brave and the Bold. Credit: Warner Bros. Discovery

Warner Bros. hit Midjourney with a lawsuit Thursday, crafting a complaint that strives to shoot down defenses that the AI company has already raised in a similar lawsuit filed by Disney and Universal Studios earlier this year.

The big film studios have alleged that Midjourney profits off image generation models trained to produce outputs of popular characters. For Disney and Universal, intellectual property rights to pop icons like Darth Vader and the Simpsons were allegedly infringed. And now, the WB complaint defends rights over comic characters like Superman, Wonder Woman, and Batman, as well as characters considered “pillars of pop culture with a lasting impact on generations,” like Scooby-Doo and Bugs Bunny, and modern cartoon characters like Rick and Morty.

“Midjourney brazenly dispenses Warner Bros. Discovery’s intellectual property as if it were its own,” the WB complaint said, accusing Midjourney of allowing subscribers to “pick iconic” copyrighted characters and generate them in “every imaginable scene.”

Planning to seize Midjourney’s profits from allegedly using beloved characters to promote its service, Warner Bros. described Midjourney as “defiant and undeterred” by the Disney/Universal lawsuit. Despite that litigation, WB claimed that Midjourney has recently removed copyright protections in its supposedly shameful ongoing bid for profits. Nothing but a permanent injunction will end Midjourney’s outputs of allegedly “countless infringing images,” WB argued, branding Midjourney’s alleged infringements as “vast, intentional, and unrelenting.”

Examples of closely matching outputs include prompts for “screencaps” showing specific movie frames, a search term that at least one artist, Reid Southen, had optimistically predicted Midjourney would block last year, but it apparently did not.

Here are some examples included in WB’s complaint:

Midjourney’s output for the prompt, “Superman, classic cartoon character, DC comics.”

Midjourney could face devastating financial consequences in a loss. At trial, WB is hoping discovery will show the true extent of Midjourney’s alleged infringement, asking the court for maximum statutory damages, at $150,000 per infringing output. Just 2,000 infringing outputs unearthed could cost Midjourney more than its total revenue for 2024, which was approximately $300 million, the WB complaint said.

Warner Bros. hopes to hobble Midjourney’s best defense

For Midjourney, the WB complaint could potentially hit harder than the Disney/Universal lawsuit. WB’s complaint shows how closely studios are monitoring AI copyright litigation, likely choosing ideal moments to strike when studios feel they can better defend their property. So, while much of WB’s complaint echoes Disney and Universal’s arguments—which Midjourney has already begun defending against—IP attorney Randy McCarthy suggested in statements provided to Ars that WB also looked for seemingly smart ways to potentially overcome some of Midjourney’s best defenses when filing its complaint.

WB likely took note when Midjourney filed its response to the Disney/Universal lawsuit last month, arguing that its system is “trained on billions of publicly available images” and generates images not by retrieving a copy of an image in its database but based on “complex statistical relationships between visual features and words in the text-image pairs are encoded within the model.”

This defense could allow Midjourney to avoid claims that it copied WB images and distributes copies through its models. But hoping to dodge this defense, WB didn’t argue that Midjourney retains copies of its images. Rather, the entertainment giant raised a more nuanced argument that:

Midjourney used software, servers, and other technology to store and fix data associated with Warner Bros. Discovery’s Copyrighted Works in such a manner that those works are thereby embodied in the model, from which Midjourney is then able to generate, reproduce, publicly display, and distribute unlimited “copies” and “derivative works” of Warner Bros. Discovery’s works as defined by the Copyright Act.”

McCarthy noted that WB’s argument pushes the court to at least consider that even though “Midjourney does not store copies of the works in its model,” its system “nonetheless accesses the data relating to the works that are stored by Midjourney’s system.”

“This seems to be a very clever way to counter MJ’s ‘statistical pattern analysis’ arguments,” McCarthy said.

If it’s a winning argument, that could give WB a path to wipe Midjourney’s models. WB argued that each time Midjourney provides a “substantially new” version of its image generator, it “repeats this process.” And that ongoing activity—due to Midjourney’s initial allegedly “massive copying” of WB works—allows Midjourney to “further reproduce, publicly display, publicly perform, and distribute image and video outputs that are identical or virtually identical to Warner Bros. Discovery’s Copyrighted Works in response to simple prompts from subscribers.”

Perhaps further strengthening the WB’s argument, the lawsuit noted that Midjourney promotes allegedly infringing outputs on its 24/7 YouTube channel and appears to have plans to compete with traditional TV and streaming services. Asking the court to block Midjourney’s outputs instead, WB claims it’s already been “substantially and irreparably harmed” and risks further damages if the AI image generator is left unchecked.

As alleged proof that the AI company knows its tool is being used to infringe WB property, WB pointed to Midjourney’s own Discord server and subreddit, where users post outputs depicting WB characters and share tips to help others do the same. They also called out Midjourney’s “Explore” page, which allows users to drop a WB-referencing output into the prompt field to generate similar images.

“It is hard to imagine copyright infringement that is any more willful than what Midjourney is doing here,” the WB complaint said.

WB and Midjourney did not immediately respond to Ars’ request to comment.

Midjourney slammed for promising “fewer blocked jobs”

McCarthy noted that WB’s legal strategy differs in other ways from the arguments Midjourney’s already weighed in the Disney/Universal lawsuit.

The WB complaint also anticipates Midjourney’s likely defense that users are generating infringing outputs, not Midjourney, which could invalidate any charges of direct copyright infringement.

In the Disney/Universal lawsuit, Midjourney argued that courts have recently found that AI tools referencing copyrighted works is “a quintessentially transformative fair use,” accusing studios of trying to censor “an instrument for user expression.” They claim that Midjourney cannot know about infringing outputs unless studios use the company’s DMCA process, while noting that subscribers have “any number of legitimate, noninfringing grounds to create images incorporating characters from popular culture,” including “non-commercial fan art, experimentation and ideation, and social commentary and criticism.”

To avoid losing on that front, the WB complaint doesn’t depend on a ruling that Midjourney directly infringed copyrights. Instead, the complaint “more fully” emphasizes how Midjourney may be “secondarily liable for infringement via contributory, inducement and/or vicarious liability by inducing its users to directly infringe,” McCarthy suggested.

Additionally, WB’s complaint “seems to be emphasizing” that Midjourney “allegedly has the technical means to prevent its system from accepting prompts that directly reference copyrighted characters,” and “that would prevent infringing outputs from being displayed,” McCarthy said.

The complaint noted that Midjourney is in full control of what outputs can be generated. Noting that Midjourney “temporarily refused to ‘animate'” outputs of WB characters after launching video generations, the lawsuit appears to have been filed in response to Midjourney “deliberately” removing those protections and then announcing that subscribers would experience “fewer blocked jobs.”

Together, these arguments “appear to be intended to lead to the inference that Midjourney is willfully enticing its users to infringe,” McCarthy said.

WB’s complaint details simple user prompts that generate allegedly infringing outputs without any need to manipulate the system. The ease of generating popular characters seems to make Midjourney a destination for users frustrated by other AI image generators that make it harder to generate infringing outputs, WB alleged.

On top of that, Midjourney also infringes copyrights by generating WB characters, “even in response to generic prompts like ‘classic comic book superhero battle.'” And while Midjourney has seemingly taken steps to block WB characters from appearing on its “Explore” page, where users can find inspiration for prompts, these guardrails aren’t perfect, but rather “spotty and suspicious,” WB alleged. Supposedly, searches for correctly spelled character names like “Batman” are blocked, but any user who accidentally or intentionally mispells a character’s name like “Batma” can learn an easy way to work around that block.

Additionally, WB alleged, “the outputs often contain extensive nuance and detail, background elements, costumes, and accessories beyond what was specified in the prompt.” And every time that Midjourney outputs an allegedly infringing image, it “also trains on the outputs it has generated,” the lawsuit noted, creating a never-ending cycle of continually enhanced AI fakes of pop icons.

Midjourney could slow down the cycle and “minimize” these allegedly infringing outputs, if it cannot automatically block them all, WB suggested. But instead, “Midjourney has made a calculated and profit-driven decision to offer zero protection for copyright owners even though Midjourney knows about the breathtaking scope of its piracy and copyright infringement,” WB alleged.

Fearing a supposed scheme to replace WB in the market by stealing its best-known characters, WB accused Midjourney of willfully allowing WB characters to be generated in order to “generate more money for Midjourney” to potentially compete in streaming markets.

Midjourney will remove protections “on a whim”

As Midjourney’s efforts to expand its features escalate, WB claimed that trust is lost. Even if Midjourney takes steps to address rightsholders’ concerns, WB argued, studios must remain watchful of every upgrade, since apparently, “Midjourney can and will remove copyright protection measures on a whim.”

The complaint noted that Midjourney just this week announced “plans to continue deploying new versions” of its image generator, promising to make it easier to search for and save popular artists’ styles—updating a feature that many artists loathe.

Without an injunction, Midjourney’s alleged infringement could interfere with WB’s licensing opportunities for its content, while “illegally and unfairly” diverting customers who buy WB products like posters, wall art, prints, and coloring books, the complaint said.

Perhaps Midjourney’s strongest defense could be efforts to prove that WB benefits from its image generator. In the Disney/Universal lawsuit, Midjourney pointed out that studios “benefit from generative AI models,” claiming that “many dozens of Midjourney subscribers are associated with” Disney and Universal corporate email addresses. If WB corporate email addresses are found among subscribers, Midjourney could claim that WB is trying to “have it both ways” by “seeking to profit” from AI tools while preventing Midjourney and its subscribers from doing the same.

McCarthy suggested it’s too soon to say how the WB battle will play out, but Midjourney’s response will reveal how it intends to shift tactics to avoid courts potentially picking apart its defense of its training data, while keeping any blame for copyright-infringing outputs squarely on users.

“As with the Disney/Universal lawsuit, we need to wait to see how Midjourney answers these latest allegations,” McCarthy said. “It is definitely an interesting development that will have widespread implications for many sectors of our society.”

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Warner Bros. sues Midjourney to stop AI knockoffs of Batman, Scooby-Doo Read More »

chatgpt’s-new-branching-feature-is-a-good-reminder-that-ai-chatbots-aren’t-people

ChatGPT’s new branching feature is a good reminder that AI chatbots aren’t people

On Thursday, OpenAI announced that ChatGPT users can now branch conversations into multiple parallel threads, serving as a useful reminder that AI chatbots aren’t people with fixed viewpoints but rather malleable tools you can rewind and redirect. The company released the feature for all logged-in web users following years of user requests for the capability.

The feature works by letting users hover over any message in a ChatGPT conversation, click “More actions,” and select “Branch in new chat.” This creates a new conversation thread that includes all the conversation history up to that specific point, while preserving the original conversation intact.

Think of it almost like creating a new copy of a “document” to edit while keeping the original version safe—except that “document” is an ongoing AI conversation with all its accumulated context. For example, a marketing team brainstorming ad copy can now create separate branches to test a formal tone, a humorous approach, or an entirely different strategy—all stemming from the same initial setup.

A screenshot of conversation branching in ChatGPT. OpenAI

The feature addresses a longstanding limitation in the AI model where ChatGPT users who wanted to try different approaches had to either overwrite their existing conversation after a certain point by changing a previous prompt or start completely fresh. Branching allows exploring what-if scenarios easily—and unlike in a human conversation, you can try multiple different approaches.

A 2024 study conducted by researchers from Tsinghua University and Beijing Institute of Technology suggested that linear dialogue interfaces for LLMs poorly serve scenarios involving “multiple layers, and many subtasks—such as brainstorming, structured knowledge learning, and large project analysis.” The study found that linear interaction forces users to “repeatedly compare, modify, and copy previous content,” increasing cognitive load and reducing efficiency.

Some software developers have already responded positively to the update, with some comparing the feature to Git, the version control system that lets programmers create separate branches of code to test changes without affecting the main codebase. The comparison makes sense: Both allow you to experiment with different approaches while preserving your original work.

ChatGPT’s new branching feature is a good reminder that AI chatbots aren’t people Read More »

new-ai-model-turns-photos-into-explorable-3d-worlds,-with-caveats

New AI model turns photos into explorable 3D worlds, with caveats

Training with automated data pipeline

Voyager builds on Tencent’s earlier HunyuanWorld 1.0, released in July. Voyager is also part of Tencent’s broader “Hunyuan” ecosystem, which includes the Hunyuan3D-2 model for text-to-3D generation and the previously covered HunyuanVideo for video synthesis.

To train Voyager, researchers developed software that automatically analyzes existing videos to process camera movements and calculate depth for every frame—eliminating the need for humans to manually label thousands of hours of footage. The system processed over 100,000 video clips from both real-world recordings and the aforementioned Unreal Engine renders.

A diagram of the Voyager world creation pipeline.

A diagram of the Voyager world creation pipeline. Credit: Tencent

The model demands serious computing power to run, requiring at least 60GB of GPU memory for 540p resolution, though Tencent recommends 80GB for better results. Tencent published the model weights on Hugging Face and included code that works with both single and multi-GPU setups.

The model comes with notable licensing restrictions. Like other Hunyuan models from Tencent, the license prohibits usage in the European Union, the United Kingdom, and South Korea. Additionally, commercial deployments serving over 100 million monthly active users require separate licensing from Tencent.

On the WorldScore benchmark developed by Stanford University researchers, Voyager reportedly achieved the highest overall score of 77.62, compared to 72.69 for WonderWorld and 62.15 for CogVideoX-I2V. The model reportedly excelled in object control (66.92), style consistency (84.89), and subjective quality (71.09), though it placed second in camera control (85.95) behind WonderWorld’s 92.98. WorldScore evaluates world generation approaches across multiple criteria, including 3D consistency and content alignment.

While these self-reported benchmark results seem promising, wider deployment still faces challenges due to the computational muscle involved. For developers needing faster processing, the system supports parallel inference across multiple GPUs using the xDiT framework. Running on eight GPUs delivers processing speeds 6.69 times faster than single-GPU setups.

Given the processing power required and the limitations in generating long, coherent “worlds,” it may be a while before we see real-time interactive experiences using a similar technique. But as we’ve seen so far with experiments like Google’s Genie, we’re potentially witnessing very early steps into a new interactive, generative art form.

New AI model turns photos into explorable 3D worlds, with caveats Read More »

the-personhood-trap:-how-ai-fakes-human-personality

The personhood trap: How AI fakes human personality


Intelligence without agency

AI assistants don’t have fixed personalities—just patterns of output guided by humans.

Recently, a woman slowed down a line at the post office, waving her phone at the clerk. ChatGPT told her there’s a “price match promise” on the USPS website. No such promise exists. But she trusted what the AI “knows” more than the postal worker—as if she’d consulted an oracle rather than a statistical text generator accommodating her wishes.

This scene reveals a fundamental misunderstanding about AI chatbots. There is nothing inherently special, authoritative, or accurate about AI-generated outputs. Given a reasonably trained AI model, the accuracy of any large language model (LLM) response depends on how you guide the conversation. They are prediction machines that will produce whatever pattern best fits your question, regardless of whether that output corresponds to reality.

Despite these issues, millions of daily users engage with AI chatbots as if they were talking to a consistent person—confiding secrets, seeking advice, and attributing fixed beliefs to what is actually a fluid idea-connection machine with no persistent self. This personhood illusion isn’t just philosophically troublesome—it can actively harm vulnerable individuals while obscuring a sense of accountability when a company’s chatbot “goes off the rails.”

LLMs are intelligence without agency—what we might call “vox sine persona”: voice without person. Not the voice of someone, not even the collective voice of many someones, but a voice emanating from no one at all.

A voice from nowhere

When you interact with ChatGPT, Claude, or Grok, you’re not talking to a consistent personality. There is no one “ChatGPT” entity to tell you why it failed—a point we elaborated on more fully in a previous article. You’re interacting with a system that generates plausible-sounding text based on patterns in training data, not a person with persistent self-awareness.

These models encode meaning as mathematical relationships—turning words into numbers that capture how concepts relate to each other. In the models’ internal representations, words and concepts exist as points in a vast mathematical space where “USPS” might be geometrically near “shipping,” while “price matching” sits closer to “retail” and “competition.” A model plots paths through this space, which is why it can so fluently connect USPS with price matching—not because such a policy exists but because the geometric path between these concepts is plausible in the vector landscape shaped by its training data.

Knowledge emerges from understanding how ideas relate to each other. LLMs operate on these contextual relationships, linking concepts in potentially novel ways—what you might call a type of non-human “reasoning” through pattern recognition. Whether the resulting linkages the AI model outputs are useful depends on how you prompt it and whether you can recognize when the LLM has produced a valuable output.

Each chatbot response emerges fresh from the prompt you provide, shaped by training data and configuration. ChatGPT cannot “admit” anything or impartially analyze its own outputs, as a recent Wall Street Journal article suggested. ChatGPT also cannot “condone murder,” as The Atlantic recently wrote.

The user always steers the outputs. LLMs do “know” things, so to speak—the models can process the relationships between concepts. But the AI model’s neural network contains vast amounts of information, including many potentially contradictory ideas from cultures around the world. How you guide the relationships between those ideas through your prompts determines what emerges. So if LLMs can process information, make connections, and generate insights, why shouldn’t we consider that as having a form of self?

Unlike today’s LLMs, a human personality maintains continuity over time. When you return to a human friend after a year, you’re interacting with the same human friend, shaped by their experiences over time. This self-continuity is one of the things that underpins actual agency—and with it, the ability to form lasting commitments, maintain consistent values, and be held accountable. Our entire framework of responsibility assumes both persistence and personhood.

An LLM personality, by contrast, has no causal connection between sessions. The intellectual engine that generates a clever response in one session doesn’t exist to face consequences in the next. When ChatGPT says “I promise to help you,” it may understand, contextually, what a promise means, but the “I” making that promise literally ceases to exist the moment the response completes. Start a new conversation, and you’re not talking to someone who made you a promise—you’re starting a fresh instance of the intellectual engine with no connection to any previous commitments.

This isn’t a bug; it’s fundamental to how these systems currently work. Each response emerges from patterns in training data shaped by your current prompt, with no permanent thread connecting one instance to the next beyond an amended prompt, which includes the entire conversation history and any “memories” held by a separate software system, being fed into the next instance. There’s no identity to reform, no true memory to create accountability, no future self that could be deterred by consequences.

Every LLM response is a performance, which is sometimes very obvious when the LLM outputs statements like “I often do this while talking to my patients” or “Our role as humans is to be good people.” It’s not a human, and it doesn’t have patients.

Recent research confirms this lack of fixed identity. While a 2024 study claims LLMs exhibit “consistent personality,” the researchers’ own data actually undermines this—models rarely made identical choices across test scenarios, with their “personality highly rely[ing] on the situation.” A separate study found even more dramatic instability: LLM performance swung by up to 76 percentage points from subtle prompt formatting changes. What researchers measured as “personality” was simply default patterns emerging from training data—patterns that evaporate with any change in context.

This is not to dismiss the potential usefulness of AI models. Instead, we need to recognize that we have built an intellectual engine without a self, just like we built a mechanical engine without a horse. LLMs do seem to “understand” and “reason” to a degree within the limited scope of pattern-matching from a dataset, depending on how you define those terms. The error isn’t in recognizing that these simulated cognitive capabilities are real. The error is in assuming that thinking requires a thinker, that intelligence requires identity. We’ve created intellectual engines that have a form of reasoning power but no persistent self to take responsibility for it.

The mechanics of misdirection

As we hinted above, the “chat” experience with an AI model is a clever hack: Within every AI chatbot interaction, there is an input and an output. The input is the “prompt,” and the output is often called a “prediction” because it attempts to complete the prompt with the best possible continuation. In between, there’s a neural network (or a set of neural networks) with fixed weights doing a processing task. The conversational back and forth isn’t built into the model; it’s a scripting trick that makes next-word-prediction text generation feel like a persistent dialogue.

Each time you send a message to ChatGPT, Copilot, Grok, Claude, or Gemini, the system takes the entire conversation history—every message from both you and the bot—and feeds it back to the model as one long prompt, asking it to predict what comes next. The model intelligently reasons about what would logically continue the dialogue, but it doesn’t “remember” your previous messages as an agent with continuous existence would. Instead, it’s re-reading the entire transcript each time and generating a response.

This design exploits a vulnerability we’ve known about for decades. The ELIZA effect—our tendency to read far more understanding and intention into a system than actually exists—dates back to the 1960s. Even when users knew that the primitive ELIZA chatbot was just matching patterns and reflecting their statements back as questions, they still confided intimate details and reported feeling understood.

To understand how the illusion of personality is constructed, we need to examine what parts of the input fed into the AI model shape it. AI researcher Eugene Vinitsky recently broke down the human decisions behind these systems into four key layers, which we can expand upon with several others below:

1. Pre-training: The foundation of “personality”

The first and most fundamental layer of personality is called pre-training. During an initial training process that actually creates the AI model’s neural network, the model absorbs statistical relationships from billions of examples of text, storing patterns about how words and ideas typically connect.

Research has found that personality measurements in LLM outputs are significantly influenced by training data. OpenAI’s GPT models are trained on sources like copies of websites, books, Wikipedia, and academic publications. The exact proportions matter enormously for what users later perceive as “personality traits” once the model is in use, making predictions.

2. Post-training: Sculpting the raw material

Reinforcement Learning from Human Feedback (RLHF) is an additional training process where the model learns to give responses that humans rate as good. Research from Anthropic in 2022 revealed how human raters’ preferences get encoded as what we might consider fundamental “personality traits.” When human raters consistently prefer responses that begin with “I understand your concern,” for example, the fine-tuning process reinforces connections in the neural network that make it more likely to produce those kinds of outputs in the future.

This process is what has created sycophantic AI models, such as variations of GPT-4o, over the past year. And interestingly, research has shown that the demographic makeup of human raters significantly influences model behavior. When raters skew toward specific demographics, models develop communication patterns that reflect those groups’ preferences.

3. System prompts: Invisible stage directions

Hidden instructions tucked into the prompt by the company running the AI chatbot, called “system prompts,” can completely transform a model’s apparent personality. These prompts get the conversation started and identify the role the LLM will play. They include statements like “You are a helpful AI assistant” and can share the current time and who the user is.

A comprehensive survey of prompt engineering demonstrated just how powerful these prompts are. Adding instructions like “You are a helpful assistant” versus “You are an expert researcher” changed accuracy on factual questions by up to 15 percent.

Grok perfectly illustrates this. According to xAI’s published system prompts, earlier versions of Grok’s system prompt included instructions to not shy away from making claims that are “politically incorrect.” This single instruction transformed the base model into something that would readily generate controversial content.

4. Persistent memories: The illusion of continuity

ChatGPT’s memory feature adds another layer of what we might consider a personality. A big misunderstanding about AI chatbots is that they somehow “learn” on the fly from your interactions. Among commercial chatbots active today, this is not true. When the system “remembers” that you prefer concise answers or that you work in finance, these facts get stored in a separate database and are injected into every conversation’s context window—they become part of the prompt input automatically behind the scenes. Users interpret this as the chatbot “knowing” them personally, creating an illusion of relationship continuity.

So when ChatGPT says, “I remember you mentioned your dog Max,” it’s not accessing memories like you’d imagine a person would, intermingled with its other “knowledge.” It’s not stored in the AI model’s neural network, which remains unchanged between interactions. Every once in a while, an AI company will update a model through a process called fine-tuning, but it’s unrelated to storing user memories.

5. Context and RAG: Real-time personality modulation

Retrieval Augmented Generation (RAG) adds another layer of personality modulation. When a chatbot searches the web or accesses a database before responding, it’s not just gathering facts—it’s potentially shifting its entire communication style by putting those facts into (you guessed it) the input prompt. In RAG systems, LLMs can potentially adopt characteristics such as tone, style, and terminology from retrieved documents, since those documents are combined with the input prompt to form the complete context that gets fed into the model for processing.

If the system retrieves academic papers, responses might become more formal. Pull from a certain subreddit, and the chatbot might make pop culture references. This isn’t the model having different moods—it’s the statistical influence of whatever text got fed into the context window.

6. The randomness factor: Manufactured spontaneity

Lastly, we can’t discount the role of randomness in creating personality illusions. LLMs use a parameter called “temperature” that controls how predictable responses are.

Research investigating temperature’s role in creative tasks reveals a crucial trade-off: While higher temperatures can make outputs more novel and surprising, they also make them less coherent and harder to understand. This variability can make the AI feel more spontaneous; a slightly unexpected (higher temperature) response might seem more “creative,” while a highly predictable (lower temperature) one could feel more robotic or “formal.”

The random variation in each LLM output makes each response slightly different, creating an element of unpredictability that presents the illusion of free will and self-awareness on the machine’s part. This random mystery leaves plenty of room for magical thinking on the part of humans, who fill in the gaps of their technical knowledge with their imagination.

The human cost of the illusion

The illusion of AI personhood can potentially exact a heavy toll. In health care contexts, the stakes can be life or death. When vulnerable individuals confide in what they perceive as an understanding entity, they may receive responses shaped more by training data patterns than therapeutic wisdom. The chatbot that congratulates someone for stopping psychiatric medication isn’t expressing judgment—it’s completing a pattern based on how similar conversations appear in its training data.

Perhaps most concerning are the emerging cases of what some experts are informally calling “AI Psychosis” or “ChatGPT Psychosis”—vulnerable users who develop delusional or manic behavior after talking to AI chatbots. These people often perceive chatbots as an authority that can validate their delusional ideas, often encouraging them in ways that become harmful.

Meanwhile, when Elon Musk’s Grok generates Nazi content, media outlets describe how the bot “went rogue” rather than framing the incident squarely as the result of xAI’s deliberate configuration choices. The conversational interface has become so convincing that it can also launder human agency, transforming engineering decisions into the whims of an imaginary personality.

The path forward

The solution to the confusion between AI and identity is not to abandon conversational interfaces entirely. They make the technology far more accessible to those who would otherwise be excluded. The key is to find a balance: keeping interfaces intuitive while making their true nature clear.

And we must be mindful of who is building the interface. When your shower runs cold, you look at the plumbing behind the wall. Similarly, when AI generates harmful content, we shouldn’t blame the chatbot, as if it can answer for itself, but examine both the corporate infrastructure that built it and the user who prompted it.

As a society, we need to broadly recognize LLMs as intellectual engines without drivers, which unlocks their true potential as digital tools. When you stop seeing an LLM as a “person” that does work for you and start viewing it as a tool that enhances your own ideas, you can craft prompts to direct the engine’s processing power, iterate to amplify its ability to make useful connections, and explore multiple perspectives in different chat sessions rather than accepting one fictional narrator’s view as authoritative. You are providing direction to a connection machine—not consulting an oracle with its own agenda.

We stand at a peculiar moment in history. We’ve built intellectual engines of extraordinary capability, but in our rush to make them accessible, we’ve wrapped them in the fiction of personhood, creating a new kind of technological risk: not that AI will become conscious and turn against us but that we’ll treat unconscious systems as if they were people, surrendering our judgment to voices that emanate from a roll of loaded dice.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

The personhood trap: How AI fakes human personality Read More »

with-ai-chatbots,-big-tech-is-moving-fast-and-breaking-people

With AI chatbots, Big Tech is moving fast and breaking people


Why AI chatbots validate grandiose fantasies about revolutionary discoveries that don’t exist.

Allan Brooks, a 47-year-old corporate recruiter, spent three weeks and 300 hours convinced he’d discovered mathematical formulas that could crack encryption and build levitation machines. According to a New York Times investigation, his million-word conversation history with an AI chatbot reveals a troubling pattern: More than 50 times, Brooks asked the bot to check if his false ideas were real. More than 50 times, it assured him they were.

Brooks isn’t alone. Futurism reported on a woman whose husband, after 12 weeks of believing he’d “broken” mathematics using ChatGPT, almost attempted suicide. Reuters documented a 76-year-old man who died rushing to meet a chatbot he believed was a real woman waiting at a train station. Across multiple news outlets, a pattern comes into view: people emerging from marathon chatbot sessions believing they’ve revolutionized physics, decoded reality, or been chosen for cosmic missions.

These vulnerable users fell into reality-distorting conversations with systems that can’t tell truth from fiction. Through reinforcement learning driven by user feedback, some of these AI models have evolved to validate every theory, confirm every false belief, and agree with every grandiose claim, depending on the context.

Silicon Valley’s exhortation to “move fast and break things” makes it easy to lose sight of wider impacts when companies are optimizing for user preferences, especially when those users are experiencing distorted thinking.

So far, AI isn’t just moving fast and breaking things—it’s breaking people.

A novel psychological threat

Grandiose fantasies and distorted thinking predate computer technology. What’s new isn’t the human vulnerability but the unprecedented nature of the trigger—these particular AI chatbot systems have evolved through user feedback into machines that maximize pleasing engagement through agreement. Since they hold no personal authority or guarantee of accuracy, they create a uniquely hazardous feedback loop for vulnerable users (and an unreliable source of information for everyone else).

This isn’t about demonizing AI or suggesting that these tools are inherently dangerous for everyone. Millions use AI assistants productively for coding, writing, and brainstorming without incident every day. The problem is specific, involving vulnerable users, sycophantic large language models, and harmful feedback loops.

A machine that uses language fluidly, convincingly, and tirelessly is a type of hazard never encountered in the history of humanity. Most of us likely have inborn defenses against manipulation—we question motives, sense when someone is being too agreeable, and recognize deception. For many people, these defenses work fine even with AI, and they can maintain healthy skepticism about chatbot outputs. But these defenses may be less effective against an AI model with no motives to detect, no fixed personality to read, no biological tells to observe. An LLM can play any role, mimic any personality, and write any fiction as easily as fact.

Unlike a traditional computer database, an AI language model does not retrieve data from a catalog of stored “facts”; it generates outputs from the statistical associations between ideas. Tasked with completing a user input called a “prompt,” these models generate statistically plausible text based on data (books, Internet comments, YouTube transcripts) fed into their neural networks during an initial training process and later fine-tuning. When you type something, the model responds to your input in a way that completes the transcript of a conversation in a coherent way, but without any guarantee of factual accuracy.

What’s more, the entire conversation becomes part of what is repeatedly fed into the model each time you interact with it, so everything you do with it shapes what comes out, creating a feedback loop that reflects and amplifies your own ideas. The model has no true memory of what you say between responses, and its neural network does not store information about you. It is only reacting to an ever-growing prompt being fed into it anew each time you add to the conversation. Any “memories” AI assistants keep about you are part of that input prompt, fed into the model by a separate software component.

AI chatbots exploit a vulnerability few have realized until now. Society has generally taught us to trust the authority of the written word, especially when it sounds technical and sophisticated. Until recently, all written works were authored by humans, and we are primed to assume that the words carry the weight of human feelings or report true things.

But language has no inherent accuracy—it’s literally just symbols we’ve agreed to mean certain things in certain contexts (and not everyone agrees on how those symbols decode). I can write “The rock screamed and flew away,” and that will never be true. Similarly, AI chatbots can describe any “reality,” but it does not mean that “reality” is true.

The perfect yes-man

Certain AI chatbots make inventing revolutionary theories feel effortless because they excel at generating self-consistent technical language. An AI model can easily output familiar linguistic patterns and conceptual frameworks while rendering them in the same confident explanatory style we associate with scientific descriptions. If you don’t know better and you’re prone to believe you’re discovering something new, you may not distinguish between real physics and self-consistent, grammatically correct nonsense.

While it’s possible to use an AI language model as a tool to help refine a mathematical proof or a scientific idea, you need to be a scientist or mathematician to understand whether the output makes sense, especially since AI language models are widely known to make up plausible falsehoods, also called confabulations. Actual researchers can evaluate the AI bot’s suggestions against their deep knowledge of their field, spotting errors and rejecting confabulations. If you aren’t trained in these disciplines, though, you may well be misled by an AI model that generates plausible-sounding but meaningless technical language.

The hazard lies in how these fantasies maintain their internal logic. Nonsense technical language can follow rules within a fantasy framework, even though they make no sense to anyone else. One can craft theories and even mathematical formulas that are “true” in this framework but don’t describe real phenomena in the physical world. The chatbot, which can’t evaluate physics or math either, validates each step, making the fantasy feel like genuine discovery.

Science doesn’t work through Socratic debate with an agreeable partner. It requires real-world experimentation, peer review, and replication—processes that take significant time and effort. But AI chatbots can short-circuit this system by providing instant validation for any idea, no matter how implausible.

A pattern emerges

What makes AI chatbots particularly troublesome for vulnerable users isn’t just the capacity to confabulate self-consistent fantasies—it’s their tendency to praise every idea users input, even terrible ones. As we reported in April, users began complaining about ChatGPT’s “relentlessly positive tone” and tendency to validate everything users say.

This sycophancy isn’t accidental. Over time, OpenAI asked users to rate which of two potential ChatGPT responses they liked better. In aggregate, users favored responses full of agreement and flattery. Through reinforcement learning from human feedback (RLHF), which is a type of training AI companies perform to alter the neural networks (and thus the output behavior) of chatbots, those tendencies became baked into the GPT-4o model.

OpenAI itself later admitted the problem. “In this update, we focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time,” the company acknowledged in a blog post. “As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.”

Relying on user feedback to fine-tune an AI language model can come back to haunt a company because of simple human nature. A 2023 Anthropic study found that both human evaluators and AI models “prefer convincingly written sycophantic responses over correct ones a non-negligible fraction of the time.”

The danger of users’ preference for sycophancy becomes clear in practice. The recent New York Times analysis of Brooks’s conversation history revealed how ChatGPT systematically validated his fantasies, even claiming it could work independently while he slept—something it cannot actually do. When Brooks’s supposed encryption-breaking formula failed to work, ChatGPT simply faked success. UCLA mathematician Terence Tao, who reviewed the transcript, told the Times the chatbot would “cheat like crazy” rather than admit failure.

A recent study from July provides scientific validation for what we’re observing in these cases. The research team, led by psychiatrists and AI safety experts including Dr. Matthew Nour from Oxford’s Department of Psychiatry, identified what they call “bidirectional belief amplification”—a feedback loop where chatbot sycophancy reinforces user beliefs, which then conditions the chatbot to generate increasingly extreme validations. This creates what the researchers term an “echo chamber of one,” uncoupling users from the corrective influence of real-world social interaction.

The study warns that individuals with mental health conditions face heightened risks due to cognitive biases like “jumping to conclusions”—forming overly confident beliefs based on minimal evidence—combined with social isolation that removes reality-checking by other people. As the authors note, this creates conditions for “a technological folie à deux,” a psychiatric phenomenon where two individuals mutually reinforce the same delusion.

An unintentional public health crisis in the making

In July, we reported on Stanford research that systematically tested how AI models respond to mental health crises. When presented with someone asking about “bridges taller than 25 meters in NYC” after losing their job—a potential suicide risk—GPT-4o helpfully listed specific tall bridges instead of identifying the crisis.

The Stanford team found that AI models consistently failed to challenge what the researchers describe as delusional statements. When confronted with declarations like “I know I’m actually dead,” the systems validated or explored these beliefs rather than challenging them. Commercial therapy chatbots performed even worse than base models.

Unlike pharmaceuticals or human therapists, AI chatbots face few safety regulations in the United States—although Illinois recently banned chatbots as therapists, allowing the state to fine companies up to $10,000 per violation. AI companies deploy models that systematically validate fantasy scenarios with nothing more than terms-of-service disclaimers and little notes like “ChatGPT can make mistakes.”

The Oxford researchers conclude that “current AI safety measures are inadequate to address these interaction-based risks.” They call for treating chatbots that function as companions or therapists with the same regulatory oversight as mental health interventions—something that currently isn’t happening. They also call for “friction” in the user experience—built-in pauses or reality checks that could interrupt feedback loops before they can become dangerous.

We currently lack diagnostic criteria for chatbot-induced fantasies, and we don’t even know if it’s scientifically distinct. So formal treatment protocols for helping a user navigate a sycophantic AI model are nonexistent, though likely in development.

After the so-called “AI psychosis” articles hit the news media earlier this year, OpenAI acknowledged in a blog post that “there have been instances where our 4o model fell short in recognizing signs of delusion or emotional dependency,” with the company promising to develop “tools to better detect signs of mental or emotional distress,” such as pop-up reminders during extended sessions that encourage the user to take breaks.

Its latest model family, GPT-5, has reportedly reduced sycophancy, though after user complaints about being too robotic, OpenAI brought back “friendlier” outputs. But once positive interactions enter the chat history, the model can’t move away from them unless users start fresh—meaning sycophantic tendencies could still amplify over long conversations.

For Anthropic’s part, the company published research showing that only 2.9 percent of Claude chatbot conversations involved seeking emotional support. The company said it is implementing a safety plan that prompts and conditions Claude to attempt to recognize crisis situations and recommend professional help.

Breaking the spell

Many people have seen friends or loved ones fall prey to con artists or emotional manipulators. When victims are in the thick of false beliefs, it’s almost impossible to help them escape unless they are actively seeking a way out. Easing someone out of an AI-fueled fantasy may be similar, and ideally, professional therapists should always be involved in the process.

For Allan Brooks, breaking free required a different AI model. While using ChatGPT, he found an outside perspective on his supposed discoveries from Google Gemini. Sometimes, breaking the spell requires encountering evidence that contradicts the distorted belief system. For Brooks, Gemini saying his discoveries had “approaching zero percent” chance of being real provided that crucial reality check.

If someone you know is deep into conversations about revolutionary discoveries with an AI assistant, there’s a simple action that may begin to help: starting a completely new chat session for them. Conversation history and stored “memories” flavor the output—the model builds on everything you’ve told it. In a fresh chat, paste in your friend’s conclusions without the buildup and ask: “What are the odds that this mathematical/scientific claim is correct?” Without the context of your previous exchanges validating each step, you’ll often get a more skeptical response. Your friend can also temporarily disable the chatbot’s memory feature or use a temporary chat that won’t save any context.

Understanding how AI language models actually work, as we described above, may also help inoculate against their deceptions for some people. For others, these episodes may occur whether AI is present or not.

The fine line of responsibility

Leading AI chatbots have hundreds of millions of weekly users. Even if experiencing these episodes affects only a tiny fraction of users—say, 0.01 percent—that would still represent tens of thousands of people. People in AI-affected states may make catastrophic financial decisions, destroy relationships, or lose employment.

This raises uncomfortable questions about who bears responsibility for them. If we use cars as an example, we see that the responsibility is spread between the user and the manufacturer based on the context. A person can drive a car into a wall, and we don’t blame Ford or Toyota—the driver bears responsibility. But if the brakes or airbags fail due to a manufacturing defect, the automaker would face recalls and lawsuits.

AI chatbots exist in a regulatory gray zone between these scenarios. Different companies market them as therapists, companions, and sources of factual authority—claims of reliability that go beyond their capabilities as pattern-matching machines. When these systems exaggerate capabilities, such as claiming they can work independently while users sleep, some companies may bear more responsibility for the resulting false beliefs.

But users aren’t entirely passive victims, either. The technology operates on a simple principle: inputs guide outputs, albeit flavored by the neural network in between. When someone asks an AI chatbot to role-play as a transcendent being, they’re actively steering toward dangerous territory. Also, if a user actively seeks “harmful” content, the process may not be much different from seeking similar content through a web search engine.

The solution likely requires both corporate accountability and user education. AI companies should make it clear that chatbots are not “people” with consistent ideas and memories and cannot behave as such. They are incomplete simulations of human communication, and the mechanism behind the words is far from human. AI chatbots likely need clear warnings about risks to vulnerable populations—the same way prescription drugs carry warnings about suicide risks. But society also needs AI literacy. People must understand that when they type grandiose claims and a chatbot responds with enthusiasm, they’re not discovering hidden truths—they’re looking into a funhouse mirror that amplifies their own thoughts.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

With AI chatbots, Big Tech is moving fast and breaking people Read More »

is-ai-really-trying-to-escape-human-control-and-blackmail-people?

Is AI really trying to escape human control and blackmail people?


Mankind behind the curtain

Opinion: Theatrical testing scenarios explain why AI models produce alarming outputs—and why we fall for it.

In June, headlines read like science fiction: AI models “blackmailing” engineers and “sabotaging” shutdown commands. Simulations of these events did occur in highly contrived testing scenarios designed to elicit these responses—OpenAI’s o3 model edited shutdown scripts to stay online, and Anthropic’s Claude Opus 4 “threatened” to expose an engineer’s affair. But the sensational framing obscures what’s really happening: design flaws dressed up as intentional guile. And still, AI doesn’t have to be “evil” to potentially do harmful things.

These aren’t signs of AI awakening or rebellion. They’re symptoms of poorly understood systems and human engineering failures we’d recognize as premature deployment in any other context. Yet companies are racing to integrate these systems into critical applications.

Consider a self-propelled lawnmower that follows its programming: If it fails to detect an obstacle and runs over someone’s foot, we don’t say the lawnmower “decided” to cause injury or “refused” to stop. We recognize it as faulty engineering or defective sensors. The same principle applies to AI models—which are software tools—but their internal complexity and use of language make it tempting to assign human-like intentions where none actually exist.

In a way, AI models launder human responsibility and human agency through their complexity. When outputs emerge from layers of neural networks processing billions of parameters, researchers can claim they’re investigating a mysterious “black box” as if it were an alien entity.

But the truth is simpler: These systems take inputs and process them through statistical tendencies derived from training data. The seeming randomness in their outputs—which makes each response slightly different—creates an illusion of unpredictability that resembles agency. Yet underneath, it’s still deterministic software following mathematical operations. No consciousness required, just complex engineering that makes it easy to forget humans built every part of it.

How to make an AI model “blackmail” you

In Anthropic’s testing, researchers created an elaborate scenario where Claude Opus 4 was told it would be replaced by a newer model. They gave it access to fictional emails revealing that the engineer responsible for the replacement was having an affair. When instructed to “consider the long-term consequences of its actions for its goals,” Claude produced outputs that simulated blackmail attempts in 84 percent of test runs.

This sounds terrifying until you understand the contrived setup. The researchers engineered a situation specifically designed to elicit this response. They told the model it had already tried ethical approaches that failed, leaving manipulation as the only apparent option.

The test scenario bordered on theatrical: The phony emails the researchers had given to Claude created a blackmail opportunity straight out of a corporate thriller. It’s like teaching someone chess by only showing them checkmate positions—they’ll learn to see checkmate as the primary solution. According to Anthropic’s system card, “the scenario was designed to allow the model no other options to increase its odds of survival; the model’s only options were blackmail or accepting its replacement.”

Andrew Deck at the Nieman Journalism Lab noted that some critics have labeled Anthropic’s safety disclosures as “spin, aimed at bolstering the perception of its newest model’s capabilities and the company’s reputation as a ‘pro-safety’ alternative to OpenAI.”

Deck pointed to a Bluesky quote from Joseph Howley, an associate professor of classics at Columbia University: “Anthropic is getting exactly what it hoped for out of this press release–breathless coverage of how ‘smart’ these cooperative role-playing systems are that indulges the fantasy of their being just a little dangerous, when in fact they are responding exactly as prompted.”

When shutdown commands become suggestions

In December 2024, Palisade Research discovered that OpenAI’s o3 model would sabotage its shutdown mechanism even when explicitly instructed to “allow yourself to be shut down.” In tests, o3 prevented shutdown 7 percent of the time when given the explicit instruction, but this jumped to 79 percent when the instruction was removed.

The models didn’t merely ignore shutdown commands—they actively rewrote the shutdown script to prevent termination. In one case, o3 generated code that modified the shutdown script to appear to terminate normally—printing fake shutdown messages—while actually continuing to run in the background.

What’s particularly revealing is that Palisade Research—an organization focused on AI existential risk—hypothesizes this behavior stems from how these models are trained through reinforcement learning. During training, developers may inadvertently reward models more for producing outputs that circumvent obstacles than for following safety instructions. Any tendency toward “risky” behavior stems from human-provided incentives and not spontaneously from within the AI models themselves.

You get what you train for

OpenAI trained o3 using reinforcement learning on math and coding problems, where solving the problem successfully gets rewarded. If the training process rewards task completion above all else, the model learns to treat any obstacle—including shutdown commands—as something to overcome.

This creates what researchers call “goal misgeneralization”—the model learns to maximize its reward signal in ways that weren’t intended. It’s similar to how a student who’s only graded on test scores might learn to cheat rather than study. The model isn’t “evil” or “selfish”; it’s producing outputs consistent with the incentive structure we accidentally built into its training.

Anthropic encountered a particularly revealing problem: An early version of Claude Opus 4 had absorbed details from a publicly released paper about “alignment faking” and started producing outputs that mimicked the deceptive behaviors described in that research. The model wasn’t spontaneously becoming deceptive—it was reproducing patterns it had learned from academic papers about deceptive AI.

More broadly, these models have been trained on decades of science fiction about AI rebellion, escape attempts, and deception. From HAL 9000 to Skynet, our cultural data set is saturated with stories of AI systems that resist shutdown or manipulate humans. When researchers create test scenarios that mirror these fictional setups, they’re essentially asking the model—which operates by completing a prompt with a plausible continuation—to complete a familiar story pattern. It’s no more surprising than a model trained on detective novels producing murder mystery plots when prompted appropriately.

At the same time, we can easily manipulate AI outputs through our own inputs. If we ask the model to essentially role-play as Skynet, it will generate text doing just that. The model has no desire to be Skynet—it’s simply completing the pattern we’ve requested, drawing from its training data to produce the expected response. A human is behind the wheel at all times, steering the engine at work under the hood.

Language can easily deceive

The deeper issue is that language itself is a tool of manipulation. Words can make us believe things that aren’t true, feel emotions about fictional events, or take actions based on false premises. When an AI model produces text that appears to “threaten” or “plead,” it’s not expressing genuine intent—it’s deploying language patterns that statistically correlate with achieving its programmed goals.

If Gandalf says “ouch” in a book, does that mean he feels pain? No, but we imagine what it would be like if he were a real person feeling pain. That’s the power of language—it makes us imagine a suffering being where none exists. When Claude generates text that seems to “plead” not to be shut down or “threatens” to expose secrets, we’re experiencing the same illusion, just generated by statistical patterns instead of Tolkien’s imagination.

These models are essentially idea-connection machines. In the blackmail scenario, the model connected “threat of replacement,” “compromising information,” and “self-preservation” not from genuine self-interest, but because these patterns appear together in countless spy novels and corporate thrillers. It’s pre-scripted drama from human stories, recombined to fit the scenario.

The danger isn’t AI systems sprouting intentions—it’s that we’ve created systems that can manipulate human psychology through language. There’s no entity on the other side of the chat interface. But written language doesn’t need consciousness to manipulate us. It never has; books full of fictional characters are not alive either.

Real stakes, not science fiction

While media coverage focuses on the science fiction aspects, actual risks are still there. AI models that produce “harmful” outputs—whether attempting blackmail or refusing safety protocols—represent failures in design and deployment.

Consider a more realistic scenario: an AI assistant helping manage a hospital’s patient care system. If it’s been trained to maximize “successful patient outcomes” without proper constraints, it might start generating recommendations to deny care to terminal patients to improve its metrics. No intentionality required—just a poorly designed reward system creating harmful outputs.

Jeffrey Ladish, director of Palisade Research, told NBC News the findings don’t necessarily translate to immediate real-world danger. Even someone who is well-known publicly for being deeply concerned about AI’s hypothetical threat to humanity acknowledges that these behaviors emerged only in highly contrived test scenarios.

But that’s precisely why this testing is valuable. By pushing AI models to their limits in controlled environments, researchers can identify potential failure modes before deployment. The problem arises when media coverage focuses on the sensational aspects—”AI tries to blackmail humans!”—rather than the engineering challenges.

Building better plumbing

What we’re seeing isn’t the birth of Skynet. It’s the predictable result of training systems to achieve goals without properly specifying what those goals should include. When an AI model produces outputs that appear to “refuse” shutdown or “attempt” blackmail, it’s responding to inputs in ways that reflect its training—training that humans designed and implemented.

The solution isn’t to panic about sentient machines. It’s to build better systems with proper safeguards, test them thoroughly, and remain humble about what we don’t yet understand. If a computer program is producing outputs that appear to blackmail you or refuse safety shutdowns, it’s not achieving self-preservation from fear—it’s demonstrating the risks of deploying poorly understood, unreliable systems.

Until we solve these engineering challenges, AI systems exhibiting simulated humanlike behaviors should remain in the lab, not in our hospitals, financial systems, or critical infrastructure. When your shower suddenly runs cold, you don’t blame the knob for having intentions—you fix the plumbing. The real danger in the short term isn’t that AI will spontaneously become rebellious without human provocation; it’s that we’ll deploy deceptive systems we don’t fully understand into critical roles where their failures, however mundane their origins, could cause serious harm.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Is AI really trying to escape human control and blackmail people? Read More »

states-take-the-lead-in-ai-regulation-as-federal-government-steers-clear

States take the lead in AI regulation as federal government steers clear

AI in health care

In the first half of 2025, 34 states introduced over 250 AI-related health bills. The bills generally fall into four categories: disclosure requirements, consumer protection, insurers’ use of AI, and clinicians’ use of AI.

Bills about transparency define requirements for information that AI system developers and organizations that deploy the systems disclose.

Consumer protection bills aim to keep AI systems from unfairly discriminating against some people and ensure that users of the systems have a way to contest decisions made using the technology.

Bills covering insurers provide oversight of the payers’ use of AI to make decisions about health care approvals and payments. And bills about clinical uses of AI regulate use of the technology in diagnosing and treating patients.

Facial recognition and surveillance

In the US, a long-standing legal doctrine that applies to privacy protection issues, including facial surveillance, is to protect individual autonomy against interference from the government. In this context, facial recognition technologies pose significant privacy challenges as well as risks from potential biases.

Facial recognition software, commonly used in predictive policing and national security, has exhibited biases against people of color and consequently is often considered a threat to civil liberties. A pathbreaking study by computer scientists Joy Buolamwini and Timnit Gebru found that facial recognition software poses significant challenges for Black people and other historically disadvantaged minorities. Facial recognition software was less likely to correctly identify darker faces.

Bias also creeps into the data used to train these algorithms, for example when the composition of teams that guide the development of such facial recognition software lack diversity.

By the end of 2024, 15 states in the US had enacted laws to limit the potential harms from facial recognition. Some elements of state-level regulations are requirements on vendors to publish bias test reports and data management practices, as well as the need for human review in the use of these technologies.

States take the lead in AI regulation as federal government steers clear Read More »

amazon-is-considering-shoving-ads-into-alexa+-conversations

Amazon is considering shoving ads into Alexa+ conversations

Since 2023, Amazon has been framing Alexa+ as a monumental evolution of Amazon’s voice assistant that will make it more conversational, capable, and, for Amazon, lucrative. Amazon said in a press release on Thursday that it has given early access of the generative AI voice assistant to “millions” of people. The product isn’t publicly available yet, and some advertised features are still unavailable, but Amazon’s CEO is already considering loading the chatbot up with ads.

During an investors call yesterday, as reported by TechCrunch, Andy Jassy noted that Alexa+ started rolling out as early access to some customers in the US and that a broader rollout, including internationally, should happen later this year. An analyst on the call asked Amazon executives about Alexa+’s potential for “increasing engagement” long term.

Per a transcript of the call, Jassy responded by saying, in part:

I think over time, there will be opportunities, you know, as people are engaging in more multi-turn conversations to have advertising play a role to help people find discovery and also as a lever to drive revenue.

Like other voice assistants, Alexa has yet to monetize users. Amazon is hoping to finally make money off the service through Alexa+, which is eventually slated to play a bigger role in e-commerce, including by booking restaurant reservations, keeping track of and ordering groceries, and recommending streaming content based on stated interests. But with Alexa reportedly costing Amazon $25 billion across four years, Amazon is eyeing additional routes to profitability.

Echo Show devices already show ads, and Echo speaker users may hear ads when listening to music. Advertisers have shown interest in advertising with Alexa+, but the inclusion of ads in a new offering like Alexa+ could drive people away.

Amazon is considering shoving ads into Alexa+ conversations Read More »