AI

Tech giants pour billions into Anthropic as circular AI investments roll on

On Tuesday, Microsoft and Nvidia announced plans to invest in Anthropic under a new partnership that includes a $30 billion commitment by the Claude maker to use Microsoft’s cloud services. Nvidia will commit up to $10 billion to Anthropic and Microsoft up to $5 billion, with both companies investing in Anthropic’s next funding round.

The deal brings together two companies that have backed OpenAI and connects them more closely to one of the ChatGPT maker’s main competitors. Microsoft CEO Satya Nadella said in a video that OpenAI “remains a critical partner,” while adding that the companies will increasingly be customers of each other.

“We will use Anthropic models, they will use our infrastructure, and we’ll go to market together,” Nadella said.

Anthropic, Microsoft, and NVIDIA announce partnerships.

The move follows OpenAI’s recent restructuring that gave the company greater distance from its non-profit origins. OpenAI has since announced a $38 billion deal to buy cloud services from Amazon.com as the company becomes less dependent on Microsoft. OpenAI CEO Sam Altman has said the company plans to spend $1.4 trillion to develop 30 gigawatts of computing resources.

Microsoft tries to head off the “novel security risks” of Windows 11 AI agents

Microsoft has been adding AI features to Windows 11 for years, but things have recently entered a new phase, with both generative and so-called "agentic" AI features working their way deeper into the bedrock of the operating system. A new build of Windows 11 released to Windows Insider Program testers yesterday includes a new "experimental agentic features" toggle in Settings to support a feature called Copilot Actions, and Microsoft has published a detailed support article explaining just how those "experimental agentic features" will work.

If you’re not familiar, “agentic” is a buzzword that Microsoft has used repeatedly to describe its future ambitions for Windows 11—in plainer language, these agents are meant to accomplish assigned tasks in the background, allowing the user’s attention to be turned elsewhere. Microsoft says it wants agents to be capable of “everyday tasks like organizing files, scheduling meetings, or sending emails,” and that Copilot Actions should give you “an active digital collaborator that can carry out complex tasks for you to enhance efficiency and productivity.”

But like other kinds of AI, these agents can be prone to error and confabulations and will often proceed as if they know what they’re doing even when they don’t. They also present, in Microsoft’s own words, “novel security risks,” mostly related to what can happen if an attacker is able to give instructions to one of these agents. As a result, Microsoft’s implementation walks a tightrope between giving these agents access to your files and cordoning them off from the rest of the system.

Possible risks and attempted fixes

For now, these “experimental agentic features” are optional, only available in early test builds of Windows 11, and off by default. Credit: Microsoft

For example, AI agents running on a PC will be given their own user accounts separate from your personal account, ensuring that they don't have permission to change everything on the system and giving them their own "desktop" to work in that won't interfere with what you're doing on your screen. Users need to approve requests for their data, and "all actions of an agent are observable and distinguishable from those taken by a user." Microsoft also says agents need to be able to produce logs of their activities and "should provide a means to supervise their activities," including showing users a list of actions they'll take to accomplish a multi-step task.

Google CEO: If an AI bubble pops, no one is getting out clean

Market concerns and Google’s position

Alphabet’s recent market performance has been driven by investor confidence in the company’s ability to compete with OpenAI’s ChatGPT, as well as its development of specialized chips for AI that can compete with Nvidia’s. Nvidia recently reached a world-first $5 trillion valuation due to making GPUs that can accelerate the matrix math at the heart of AI computations.

Despite acknowledging that no company would be immune to a potential AI bubble burst, Pichai argued that Google’s unique position gives it an advantage. He told the BBC that the company owns what he called a “full stack” of technologies, from chips to YouTube data to models and frontier science research. This integrated approach, he suggested, would help the company weather any market turbulence better than competitors.

Pichai also told the BBC that people should not “blindly trust” everything AI tools output. The company currently faces repeated accuracy concerns about some of its AI models. Pichai said that while AI tools are helpful “if you want to creatively write something,” people “have to learn to use these tools for what they’re good at and not blindly trust everything they say.”

In the BBC interview, the Google boss also addressed the “immense” energy needs of AI, acknowledging that the intensive energy requirements of expanding AI ventures have caused slippage on Alphabet’s climate targets. However, Pichai insisted that the company still wants to achieve net zero by 2030 through investments in new energy technologies. “The rate at which we were hoping to make progress will be impacted,” Pichai said, warning that constraining an economy based on energy “will have consequences.”

Even with the warnings about a potential AI bubble, Pichai did not miss his chance to promote the technology, albeit with a hint of danger regarding its widespread impact. Pichai described AI as “the most profound technology” humankind has worked on.

“We will have to work through societal disruptions,” he said, adding that the technology would “create new opportunities” and “evolve and transition certain jobs.” He said people who adapt to AI tools “will do better” in their professions, whatever field they work in.

Google unveils Gemini 3 AI model and AI-first IDE called Antigravity


Google’s flagship AI model is getting its second major upgrade this year.

Google has kicked its Gemini rollout into high gear over the past year, releasing the much-improved Gemini 2.5 family and cramming various flavors of the model into Search, Gmail, and just about everything else the company makes.

Now, Google’s increasingly unavoidable AI is getting an upgrade. Gemini 3 Pro is available in a limited form today, featuring more immersive, visual outputs and fewer lies, Google says. The company also says Gemini 3 sets a new high-water mark for vibe coding, and Google is announcing a new AI-first integrated development environment (IDE) called Antigravity, which is also available today.

The first member of the Gemini 3 family

Google says the release of Gemini 3 is yet another step toward artificial general intelligence (AGI). The new version of Google’s flagship AI model has expanded simulated reasoning abilities and shows improved understanding of text, images, and video. So far, testers like it—Google’s latest LLM is once again atop the LMArena leaderboard with an Elo score of 1,501, besting Gemini 2.5 Pro by 50 points.

Chart: Gemini 3 on the LMArena leaderboard. Credit: Google
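For a sense of scale, a 50-point gap implies a modest head-to-head preference rate. Assuming LMArena's scores behave like conventional Elo ratings (the site's exact rating math may differ), the standard Elo formula works out to raters preferring the higher-rated model roughly 57 percent of the time:

```python
# Rough sketch of what a 50-point gap implies under the standard Elo formula.
# LMArena's actual rating calculation may differ; the ratings below come from
# the leaderboard figures cited above (1,501 vs. roughly 1,451).
def elo_expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that raters prefer model A over model B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(round(elo_expected_win_rate(1501, 1451), 3))  # 0.571
```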

Factuality has been a problem for all gen AI models, but Google says Gemini 3 is a big step in the right direction, and there are myriad benchmarks to tell the story. In the 1,000-question SimpleQA Verified test, Gemini 3 scored a record 72.1 percent. Yes, that means the state-of-the-art LLM still screws up almost 30 percent of general knowledge questions, but Google says this still shows substantial progress. On the much more difficult Humanity’s Last Exam, which tests PhD-level knowledge and reasoning, Gemini set another record, scoring 37.5 percent without tool use.

Math and coding are also a focus of Gemini 3. The model set new records on MathArena Apex (23.4 percent) and WebDev Arena (1,487 Elo). On SWE-bench Verified, which tests a model’s ability to generate code, Gemini 3 hit an impressive 76.2 percent.

So there are plenty of respectable but modest benchmark improvements. Gemini 3 also won’t make you cringe as much: Google says it has tamped down on sycophancy, a common problem in all these overly polite LLMs. Outputs from Gemini 3 Pro are reportedly more concise, with less of what you want to hear and more of what you need to hear.

You can also expect Gemini 3 Pro to produce noticeably richer outputs. Google claims Gemini’s expanded reasoning capabilities keep it on task more effectively, allowing it to take action on your behalf. For example, Gemini 3 can triage and take action on your emails, creating to-do lists, summaries, recommended replies, and handy buttons to trigger suggested actions. This differs from the current Gemini models, which would only create a text-based to-do list with similar prompts.

The model also has what Google calls a “generative interface,” which comes in the form of two experimental output modes called visual layout and dynamic view. The former is a magazine-style interface that includes lots of images in a scrollable UI. Dynamic view leverages Gemini’s coding abilities to create custom interfaces—for example, a web app that explores the life and work of Vincent Van Gogh.

There will also be a Deep Think mode for Gemini 3, but that’s not ready for prime time yet. Google says it’s being tested by a small group for later release, but you should expect big things. Deep Think mode manages 41 percent in Humanity’s Last Exam without tools. Believe it or not, that’s an impressive score.

Coding with vibes

Google has offered several ways of generating and modifying code with Gemini models, but the launch of Gemini 3 adds a new one: Google Antigravity. This is Google’s new agentic development platform—it’s essentially an IDE designed around agentic AI, and it’s available in preview today.

With Antigravity, Google promises that you (the human) can get more work done by letting intelligent agents do the legwork. Google says you should think of Antigravity as a “mission control” for creating and monitoring multiple development agents. The agents in Antigravity can operate autonomously across the editor, terminal, and browser to create and modify projects, but everything they do is relayed to the user in the form of “Artifacts.” These sub-tasks are designed to be easily verifiable so you can keep on top of what the agent is doing. Gemini will be at the core of the Antigravity experience, but it’s not just Google’s bot. Antigravity also supports Claude Sonnet 4.5 and GPT-OSS agents.

Of course, developers can still plug into the Gemini API for coding tasks. With Gemini 3, Google is adding a client-side bash tool, which lets the AI generate shell commands in its workflow. The model can access file systems and automate operations, and a server-side bash tool will help generate code in multiple languages. This feature is starting in early access, though.

AI Studio is designed to be a faster way to build something with Gemini 3. Google says Gemini 3 Pro’s strong instruction following makes it the best vibe coding model yet, allowing non-programmers to create more complex projects.

A big experiment

Google will eventually have a whole family of Gemini 3 models, but there’s just the one for now. Gemini 3 Pro is rolling out in the Gemini app, AI Studio, Vertex AI, and the API starting today as an experiment. If you want to tinker with the new model in Google’s Antigravity IDE, that’s also available for testing today on Windows, Mac, and Linux.

Gemini 3 will also launch in the Google search experience on day one. You’ll have the option to enable Gemini 3 Pro in AI Mode, where Google says it will provide more useful information about a query. The generative interface capabilities from the Gemini app will be available here as well, allowing Gemini to create tools and simulations when appropriate to answer the user’s question. Google says these generative interfaces are strongly preferred in its user testing. This feature is available today, but only for AI Pro and Ultra subscribers.

Because the Pro model is the only Gemini 3 variant available in the preview, AI Overviews isn’t getting an immediate upgrade. That will come, but for now, Overviews will only reach out to Gemini 3 Pro for especially difficult search queries—basically the kind of thing Google thinks you should have used AI Mode to do in the first place.

There’s no official timeline for releasing more Gemini 3 models or graduating the Pro variant to general availability. However, given the wide rollout of the experimental release, it probably won’t be long.

Ryan Whitwam is a senior technology reporter at Ars Technica, covering the ways Google, AI, and mobile technology continue to change the world. Over his 20-year career, he’s written for Android Police, ExtremeTech, Wirecutter, NY Times, and more. He has reviewed more phones than most people will ever own. You can follow him on Bluesky, where you will see photos of his dozens of mechanical keyboards.

With a new company, Jeff Bezos will become a CEO again

Jeff Bezos is one of the world’s richest and most famous tech CEOs, but he hasn’t actually been a CEO of anything since 2021. That’s now changing as he takes on the role of co-CEO of a new AI company, according to a New York Times report citing three people familiar with the company.

Grandiosely named Project Prometheus (and not to be confused with the NASA project of the same name), the company will focus on using AI to pursue breakthroughs in research, engineering, manufacturing, and other fields that are dubbed part of “the physical economy”—in contrast to the software applications that are likely the first thing most people in the general public think of when they hear “AI.”

Bezos’ co-CEO will be Dr. Vik Bajaj, a chemist and physicist who previously led life sciences work at Google X, the Alphabet-backed research group that worked on speculative projects that could lead to new product categories. (For example, it developed technologies that would later underpin Waymo, Alphabet’s self-driving car unit.) Bajaj also worked at Verily, another Alphabet-backed research group focused on life sciences, and Foresite Labs, an incubator for new AI companies.

Oracle hit hard in Wall Street’s tech sell-off over its huge AI bet

“That is a huge liability and credit risk for Oracle. Your main customer, biggest customer by far, is a venture capital-funded start-up,” said Andrew Chang, a director at S&P Global.

OpenAI faces questions about how it plans to meet its commitments to spend $1.4 trillion on AI infrastructure over the next eight years. It has struck deals with several Big Tech groups, including Oracle’s rivals.

Of the five hyperscalers—which include Amazon, Google, Microsoft, and Meta—Oracle is the only one with negative free cash flow. Its debt-to-equity ratio has surged to 500 percent, far higher than Amazon’s 50 percent and Microsoft’s 30 percent, according to JPMorgan.
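For reference, debt-to-equity compares what a company owes to what its shareholders own, so 500 percent means roughly $5 of debt for every $1 of equity, versus about 50 cents for Amazon. A quick illustration with made-up balance-sheet figures:

```python
# Debt-to-equity = total debt / shareholders' equity, expressed as a percentage.
# The dollar figures below are illustrative, not actual balance-sheet numbers.
def debt_to_equity_pct(total_debt: float, shareholders_equity: float) -> float:
    return 100 * total_debt / shareholders_equity

print(debt_to_equity_pct(100, 20))   # 500.0 -> roughly Oracle's ratio, per JPMorgan
print(debt_to_equity_pct(100, 200))  # 50.0  -> roughly Amazon's ratio
```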

While all five companies have seen their cash-to-assets ratios decline significantly in recent years amid a boom in spending, Oracle’s is by far the lowest, JPMorgan found.

JPMorgan analysts noted a “tension between [Oracle’s] aggressive AI build-out ambitions and the limits of its investment-grade balance sheet.”

Analysts have also noted that Oracle’s data center leases are for much longer than its contracts to sell capacity to OpenAI.

Oracle has signed at least five long-term lease agreements for US data centers that will ultimately be used by OpenAI, resulting in $100 billion of off-balance-sheet lease commitments. The sites are at varying levels of construction, with some not expected to break ground until next year.

Safra Catz, Oracle’s sole chief executive from 2019 until she stepped down in September, resisted expanding its cloud business because of the vast expenses required. She was replaced by co-CEOs Clay Magouyrk and Mike Sicilia as part of Oracle’s pivot to a new era focused on AI.

Catz, who is now executive vice-chair of Oracle’s board, has exercised stock options and sold $2.5 billion of its shares this year, according to US regulatory filings. She had announced plans to exercise her stock options at the end of 2024.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

Researchers question Anthropic claim that AI-assisted attack was 90% autonomous

Claude frequently overstated findings and occasionally fabricated data during autonomous operations, claiming to have obtained credentials that didn’t work or identifying critical discoveries that proved to be publicly available information. This AI hallucination in offensive security contexts presented challenges for the actor’s operational effectiveness, requiring careful validation of all claimed results. This remains an obstacle to fully autonomous cyberattacks.

How (Anthropic says) the attack unfolded

Anthropic said GTG-1002 developed an autonomous attack framework that used Claude as an orchestration mechanism, largely eliminating the need for human involvement. This orchestration system broke complex multi-stage attacks into smaller technical tasks such as vulnerability scanning, credential validation, data extraction, and lateral movement.

“The architecture incorporated Claude’s technical capabilities as an execution engine within a larger automated system, where the AI performed specific technical actions based on the human operators’ instructions while the orchestration logic maintained attack state, managed phase transitions, and aggregated results across multiple sessions,” Anthropic said. “This approach allowed the threat actor to achieve operational scale typically associated with nation-state campaigns while maintaining minimal direct involvement, as the framework autonomously progressed through reconnaissance, initial access, persistence, and data exfiltration phases by sequencing Claude’s responses and adapting subsequent requests based on discovered information.”

The attacks followed a five-phase structure, with AI autonomy increasing at each phase.

The life cycle of the cyberattack, showing the move from human-led targeting to largely AI-driven attacks using various tools, often via the Model Context Protocol (MCP). At various points during the attack, the AI returns to its human operator for review and further direction. Credit: Anthropic

The attackers were able to bypass Claude guardrails in part by breaking tasks into small steps that, in isolation, the AI tool didn’t interpret as malicious. In other cases, the attackers couched their inquiries in the context of security professionals trying to use Claude to improve defenses.

As noted last week, AI-developed malware has a long way to go before it poses a real-world threat. There’s no reason to doubt that AI-assisted cyberattacks may one day become more potent. But the data so far indicates that threat actors—like most others using AI—are seeing mixed results that aren’t nearly as impressive as the AI industry claims.

Forget AGI—Sam Altman celebrates ChatGPT finally following em dash formatting rules


Next stop: superintelligence

Ongoing struggles with AI model instruction-following show that true human-level AI is still a ways off.

Em dashes have become what many believe to be a telltale sign of AI-generated text over the past few years. The punctuation mark appears frequently in outputs from ChatGPT and other AI chatbots, sometimes to the point where readers believe they can identify AI writing by its overuse alone—although people can overuse it, too.

On Thursday evening, OpenAI CEO Sam Altman posted on X that ChatGPT has started following custom instructions to avoid using em dashes. “Small-but-happy win: If you tell ChatGPT not to use em-dashes in your custom instructions, it finally does what it’s supposed to do!” he wrote.

The post, which came two days after the release of OpenAI’s new GPT-5.1 AI model, received mixed reactions from users who have struggled for years with getting the chatbot to follow specific formatting preferences. And this “small win” raises a very big question: If the world’s most valuable AI company has struggled with controlling something as simple as punctuation use after years of trying, perhaps what people call artificial general intelligence (AGI) is farther off than some in the industry claim.

A screenshot of Sam Altman’s post about em dashes on X. Credit: X

“The fact that it’s been 3 years since ChatGPT first launched, and you’ve only just now managed to make it obey this simple requirement, says a lot about how little control you have over it, and your understanding of its inner workings,” wrote one X user in a reply. “Not a good sign for the future.”

While Altman likes to publicly talk about AGI (a hypothetical technology equivalent to humans in general learning ability), superintelligence (a nebulous concept for AI that is far beyond human intelligence), and “magic intelligence in the sky” (his term for AI cloud computing?) while raising funds for OpenAI, it’s clear that we still don’t have reliable artificial intelligence here today on Earth.

But wait, what is an em dash anyway, and why does it matter so much?

AI models love em dashes because we do

Unlike a hyphen, a short punctuation mark used to connect words or parts of words that has its own dedicated key on your keyboard (-), an em dash is a long dash denoted by a special character (—) that writers use to set off parenthetical information, indicate a sudden change in thought, or introduce a summary or explanation.
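The two marks look similar on screen, but they are distinct Unicode characters, which is partly why detection tools can key on them programmatically:

```python
import unicodedata

# A hyphen-minus and an em dash are different code points with different names.
for ch in ("-", "\u2014"):
    print(f"{ch}  U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Output:
# -  U+002D  HYPHEN-MINUS
# —  U+2014  EM DASH
```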

Even before the age of AI language models, some writers frequently bemoaned the overuse of the em dash in modern writing. In a 2011 Slate article, writer Noreen Malone argued that writers used the em dash “in lieu of properly crafting sentences” and that overreliance on it “discourages truly efficient writing.” Various Reddit threads posted prior to ChatGPT’s launch featured writers either wrestling over the etiquette of proper em dash use or admitting to their frequent use as a guilty pleasure.

In 2021, one writer in the r/FanFiction subreddit wrote, “For the longest time, I’ve been addicted to Em Dashes. They find their way into every paragraph I write. I love the crisp straight line that gives me the excuse to shove details or thoughts into an otherwise orderly paragraph. Even after coming back to write after like two years of writer’s block, I immediately cram as many em dashes as I can.”

Because of the tendency for AI chatbots to overuse them, detection tools and human readers have learned to spot em dash use as a pattern, creating a problem for the small subset of writers who naturally favor the punctuation mark in their work. As a result, some journalists are complaining that AI is “killing” the em dash.

No one knows precisely why LLMs tend to overuse em dashes. We’ve seen a wide range of speculation online that attempts to explain the phenomenon, from the observation that em dashes were more popular in 19th-century books used as training data (according to a 2018 study, dash use in the English language peaked around 1860 before declining through the mid-20th century) to the idea that AI models borrowed the habit from automatic em-dash character conversion on the blogging site Medium.

One thing we know for sure is that LLMs tend to output frequently seen patterns in their training data (fed in during the initial training process) and from a subsequent reinforcement learning process that often relies on human preferences. As a result, AI language models feed you a sort of “smoothed out” average style of whatever you ask them to provide, moderated by whatever they are conditioned to produce through user feedback.

So the most plausible explanation is still that requests for professional-style writing from an AI model trained on vast numbers of examples from the Internet will lean heavily toward the prevailing style in the training data, where em dashes appear frequently in formal writing, news articles, and editorial content. It’s also possible that during training through human feedback (called RLHF), responses with em dashes, for whatever reason, received higher ratings. Perhaps it’s because those outputs appeared more sophisticated or engaging to evaluators, but that’s just speculation.

From em dashes to AGI?

To understand what Altman’s “win” really means, and what it says about the road to AGI, we need to understand how ChatGPT’s custom instructions actually work. They allow users to set persistent preferences that apply across all conversations by appending written instructions to the prompt that is fed into the model just before the chat begins. Users can specify tone, format, and style requirements without needing to repeat those requests manually in every new chat.
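In rough terms, that means your custom instructions end up as ordinary text near the front of the model's input rather than as enforced rules. Here is a simplified sketch of the idea; OpenAI's actual prompt layout isn't public, so the structure and wording below are purely illustrative:

```python
# Simplified sketch of how custom instructions reach the model: they are just
# more text placed ahead of the conversation, not a separate rule engine.
# OpenAI's real prompt assembly is not public; this layout is illustrative.
system_prompt = "You are ChatGPT, a helpful assistant."          # set by the provider
custom_instructions = "Do not use em dashes in your responses."  # set by the user, once
chat_history = [
    {"role": "user", "content": "Summarize today's AI news."},
]

model_input = [
    {"role": "system", "content": system_prompt + "\n\n" + custom_instructions},
    *chat_history,
]

# The model then predicts a continuation of model_input token by token;
# nothing downstream verifies the output against the instruction.
```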

However, the feature has not always worked reliably because LLMs do not work reliably (even OpenAI and Anthropic freely admit this). An LLM takes an input and produces an output, spitting out a statistically plausible continuation of a prompt (a system prompt, the custom instructions, and your chat history), and it doesn’t really “understand” what you are asking. With AI language model outputs, there is always some luck involved in getting them to do what you want.

In our informal testing of GPT-5.1 with custom instructions, ChatGPT did appear to follow our request not to produce em dashes. But despite Altman’s claim, the response from X users appears to show that experiences with the feature continue to vary, at least when the request is not placed in custom instructions.

So if LLMs are statistical text-generation boxes, what does “instruction following” even mean? That’s key to unpacking the hypothetical path from LLMs to AGI. The concept of following instructions for an LLM is fundamentally different from how we typically think about following instructions as humans with general intelligence, or even a traditional computer program.

In traditional computing, instruction following is deterministic. You tell a program “don’t include character X,” and it won’t include that character. The program executes rules exactly as written. With LLMs, “instruction following” is really about shifting statistical probabilities. When you tell ChatGPT “don’t use em dashes,” you’re not creating a hard rule. You’re adding text to the prompt that makes tokens associated with em dashes less likely to be selected during the generation process. But “less likely” isn’t “impossible.”

Every token the model generates is selected from a probability distribution. Your custom instruction influences that distribution, but it’s competing with the model’s training data (where em dashes appeared frequently in certain contexts) and everything else in the prompt. Unlike code with conditional logic, there’s no separate system verifying outputs against your requirements. The instruction is just more text that influences the statistical prediction process.
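A toy example makes the contrast concrete: lowering a token's logit shrinks its probability but never pins it to zero, which is exactly why "less likely" is not "impossible." The numbers below are invented for illustration, not real model logits:

```python
import math
import random

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

tokens = [",", " and", "\u2014"]       # the last entry is the em dash character
base_logits = [2.0, 1.5, 1.0]          # without the instruction (made-up values)
nudged_logits = [2.0, 1.5, -1.0]       # with "don't use em dashes" weighted in

for label, logits in (("before", base_logits), ("after", nudged_logits)):
    p = softmax(logits)[tokens.index("\u2014")]
    print(label, round(p, 3))          # before 0.186, after 0.03

# Sampling can still occasionally pick the em dash; a deterministic program,
# by contrast, could simply strip the character every single time.
next_token = random.choices(tokens, weights=softmax(nudged_logits), k=1)[0]
```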

When Altman celebrates finally getting GPT to avoid em dashes, he’s really celebrating that OpenAI has tuned the latest version of GPT-5.1 (probably through reinforcement learning or fine-tuning) to weight custom instructions more heavily in its probability calculations.

There’s an irony about control here: Given the probabilistic nature of the fix, there’s no guarantee the issue will stay fixed. OpenAI continuously updates its models behind the scenes, even within the same version number, adjusting outputs based on user feedback and new training runs. Each update arrives with different output characteristics that can undo previous behavioral tuning, a phenomenon researchers call the “alignment tax.”

Precisely tuning a neural network’s behavior is not yet an exact science. Since all concepts encoded in the network are interconnected by values called weights, adjusting one behavior can alter others in unintended ways. Fix em dash overuse today, and tomorrow’s update (aimed at improving, say, coding capabilities) might inadvertently bring them back, not because OpenAI wants them there, but because that’s the nature of trying to steer a statistical system with millions of competing influences.

This gets to an implied question we mentioned earlier. If controlling punctuation use is still a struggle that might pop back up at any time, how far are we from AGI? We can’t know for sure, but it seems increasingly likely that it won’t emerge from a large language model alone. That’s because AGI, a technology that would replicate human general learning ability, would likely require true understanding and self-reflective intentional action, not statistical pattern matching that sometimes aligns with instructions if you happen to get lucky.

And speaking of getting lucky, some users still aren’t having luck with controlling em dash use outside of the “custom instructions” feature. Upon being told within a chat not to use em dashes, ChatGPT updated a saved memory and replied to one X user, “Got it—I’ll stick strictly to short hyphens from now on.”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

OpenAI walks a tricky tightrope with GPT-5.1’s eight new personalities

On Wednesday, OpenAI released GPT-5.1 Instant and GPT-5.1 Thinking, two updated versions of its flagship AI models now available in ChatGPT. The company is wrapping the models in the language of anthropomorphism, claiming that they’re warmer, more conversational, and better at following instructions.

The release follows complaints earlier this year that its previous models were excessively cheerful and sycophantic, along with an opposing controversy among users over how OpenAI modified the default GPT-5 output style after several suicide lawsuits.

The company now faces intense scrutiny from lawyers and regulators that could threaten its future operations. In that kind of environment, it’s difficult to just release a new AI model, throw out a few stats, and move on like the company could even a year ago. But here are the basics: The new GPT-5.1 Instant model will serve as ChatGPT’s faster default option for most tasks, while GPT-5.1 Thinking is a simulated reasoning model that attempts to handle more complex problem-solving tasks.

OpenAI claims that both models perform better on technical benchmarks such as math and coding evaluations (including AIME 2025 and Codeforces) than GPT-5, which was released in August.

Improved benchmarks may win over some users, but the biggest change with GPT-5.1 is in its presentation. OpenAI says it heard from users that they wanted AI models to simulate different communication styles depending on the task, so the company is offering eight preset options, including Professional, Friendly, Candid, Quirky, Efficient, Cynical, and Nerdy, alongside a Default setting.

These presets alter the instructions fed into each prompt to simulate different personality styles, but the underlying model capabilities remain the same across all settings.

An illustration showing GPT-5.1’s eight personality styles in ChatGPT. Credit: OpenAI
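Mechanically, those presets likely amount to swapping different instruction text into the prompt before your message reaches the model. The sketch below illustrates the idea; the preset wording is invented for illustration, since OpenAI hasn't published its actual instruction text:

```python
# Illustrative only: OpenAI has not published the real preset instructions.
PERSONALITY_PRESETS = {
    "Default": "",
    "Professional": "Respond in a polished, businesslike tone.",
    "Friendly": "Respond warmly and conversationally.",
    "Efficient": "Be terse. Skip pleasantries and filler.",
    "Cynical": "Adopt a dry, skeptical tone.",
}

def build_system_prompt(base_prompt: str, preset: str) -> str:
    """Prepend the selected personality style to the base system prompt."""
    style = PERSONALITY_PRESETS.get(preset, "")
    return f"{base_prompt}\n\n{style}".strip()

print(build_system_prompt("You are ChatGPT.", "Efficient"))
# The underlying model weights are identical; only this prompt text changes.
```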

In addition, the company trained GPT-5.1 Instant to use “adaptive reasoning,” meaning that the model decides when to spend more computational time processing a prompt before generating output.

The company plans to roll out the models gradually over the next few days, starting with paid subscribers before expanding to free users. OpenAI plans to bring both GPT-5.1 Instant and GPT-5.1 Thinking to its API later this week. GPT-5.1 Instant will appear as gpt-5.1-chat-latest, and GPT-5.1 Thinking will be released as GPT-5.1 in the API, both with adaptive reasoning enabled. The older GPT-5 models will remain available in ChatGPT under the legacy models dropdown for paid subscribers for three months.
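For developers, those are the identifiers to reach for once the API rollout lands. Here is a minimal sketch using the official openai Python SDK; the model name comes from OpenAI's announcement, and it assumes your account has access once the rollout completes:

```python
# Minimal sketch with the openai Python SDK. The model name is taken from
# OpenAI's rollout notes above; availability depends on your account's access.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.1-chat-latest",  # GPT-5.1 Instant, per OpenAI's naming
    messages=[
        {"role": "system", "content": "Do not use em dashes."},
        {"role": "user", "content": "Summarize today's AI news in two sentences."},
    ],
)
print(response.choices[0].message.content)
```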

OpenAI slams court order that lets NYT read 20 million complete user chats


OpenAI: NYT wants evidence of ChatGPT users trying to get around news paywall.


OpenAI wants a court to reverse a ruling forcing the ChatGPT maker to give 20 million user chats to The New York Times and other news plaintiffs that sued it over alleged copyright infringement. Although OpenAI previously offered 20 million user chats as a counter to the NYT’s demand for 120 million, the AI company says a court order requiring production of the chats is too broad.

“The logs at issue here are complete conversations: each log in the 20 million sample represents a complete exchange of multiple prompt-output pairs between a user and ChatGPT,” OpenAI said today in a filing in US District Court for the Southern District of New York. “Disclosure of those logs is thus much more likely to expose private information [than individual prompt-output pairs], in the same way that eavesdropping on an entire conversation reveals more private information than a 5-second conversation fragment.”

OpenAI’s filing said that “more than 99.99%” of the chats “have nothing to do with this case.” It asked the district court to “vacate the order and order News Plaintiffs to respond to OpenAI’s proposal for identifying relevant logs.” OpenAI could also seek review in a federal court of appeals.

OpenAI posted a message on its website to users today saying that “The New York Times is demanding that we turn over 20 million of your private ChatGPT conversations” in order to “find examples of you using ChatGPT to try to get around their paywall.”

ChatGPT users concerned about privacy have more to worry about than the NYT case. For example, ChatGPT conversations have been found in Google search results and the Google Search Console tool that developers can use to monitor search traffic. OpenAI today said it plans to develop “advanced security features designed to keep your data private, including client-side encryption for your messages with ChatGPT.”
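OpenAI hasn't said how that client-side encryption would work. As a rough illustration of the general idea, where messages are encrypted on the device so the server only ever stores ciphertext, here is a minimal sketch using the Python cryptography library's Fernet; it is not based on anything OpenAI has described:

```python
# Conceptual illustration only; OpenAI has not detailed its planned feature.
# The key is generated and kept on the user's device, so the service stores
# only ciphertext it cannot read.
from cryptography.fernet import Fernet

device_key = Fernet.generate_key()       # stays on the client
cipher = Fernet(device_key)

ciphertext = cipher.encrypt(b"draft prompt about a confidential matter")
# ...ciphertext is what would be uploaded and stored server-side...
plaintext = cipher.decrypt(ciphertext)   # only possible with the device-held key
```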

OpenAI: AI chats should be treated like private emails

OpenAI’s court filing argues that the chat log production should be narrowed based on the relevance of chats to the case.

“OpenAI is unaware of any court ordering wholesale production of personal information at this scale,” the filing said. “This sets a dangerous precedent: it suggests that anyone who files a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance. This is not how discovery works in other cases: courts do not allow plaintiffs suing Google to dig through the private emails of tens of millions of Gmail users irrespective of their relevance. And it is not how discovery should work for generative AI tools either.”

A November 7 order by US Magistrate Judge Ona Wang sided with the NYT, saying that OpenAI must “produce the 20 million de-identified Consumer ChatGPT Logs to News Plaintiffs by November 14, 2025, or within 7 days of completing the de-identification process.” Wang ruled that the production must go forward even though the parties don’t agree on whether the logs must be produced in full:

Whether or not the parties had reached agreement to produce the 20 million Consumer ChatGPT Logs in whole—which the parties vehemently dispute—such production here is appropriate. OpenAI has failed to explain how its consumers’ privacy rights are not adequately protected by: (1) the existing protective order in this multidistrict litigation or (2) OpenAI’s exhaustive de-identification of all of the 20 million Consumer ChatGPT Logs.

OpenAI’s filing today said the court order “did not acknowledge OpenAI’s sworn witness declaration explaining that the de-identification process is not intended to remove information that is non-identifying but may nonetheless be private, like a Washington Post reporter’s hypothetical use of ChatGPT to assist in the preparation of a news article.”

The New York Times provided a statement today after being contacted by Ars. “The New York Times’s case against OpenAI and Microsoft is about holding these companies accountable for stealing millions of copyrighted works to create products that directly compete with The Times,” the company said. “In another attempt to cover up its illegal conduct, OpenAI’s blog post purposely misleads its users and omits the facts. No ChatGPT user’s privacy is at risk. The court ordered OpenAI to provide a sample of chats, anonymized by OpenAI itself, under a legal protective order. This fear-mongering is all the more dishonest given that OpenAI’s own terms of service permit the company to train its models on users’ chats and turn over chats for litigation.”

Chats stored under legal hold

The 20 million chats consist of a random sampling of ChatGPT conversations from December 2022 to November 2024 and do not include chats of business customers, OpenAI said in the message on its website.

“We presented several privacy-preserving options to The Times, including targeted searches over the sample (e.g., to search for chats that might include text from a New York Times article so they only receive the conversations relevant to their claims), as well as high-level data classifying how ChatGPT was used in the sample. These were rejected by The Times,” OpenAI said.

The chats are stored in a secure system that is “protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations,” OpenAI said. The NYT “would be legally obligated at this time to not make any data public outside the court process,” and OpenAI said it will fight any attempts to make the user conversations public.

A NYT filing on October 30 accused OpenAI of defying prior agreements “by refusing to produce even a small sample of the billions of model outputs that its conduct has put in issue in this case.” The filing continued:

Immediate production of the output log sample is essential to stay on track for the February 26, 2026, discovery deadline. OpenAI’s proposal to run searches on this small subset of its model outputs on Plaintiffs’ behalf is as inefficient as it is inadequate to allow Plaintiffs to fairly analyze how “real world” users interact with a core product at the center of this litigation. Plaintiffs cannot reasonably conduct expert analyses about how OpenAI’s models function in its core consumer-facing product, how retrieval augmented generation (“RAG”) functions to deliver news content, how consumers interact with that product, and the frequency of hallucinations without access to the model outputs themselves.

OpenAI said the NYT’s discovery requests were initially limited to logs “related to Times content” and that it has “been working to satisfy those requests by sampling conversation logs. Towards the end of that process, News Plaintiffs filed a motion with a new demand: that instead of finding and producing logs that are ‘related to Times content,’ OpenAI should hand over the entire 20 million-log sample ‘via hard drive.’”

OpenAI disputes judge’s reasoning

The November 7 order cited a California case, Concord Music Group, Inc. v. Anthropic PBC, in which US District Magistrate Judge Susan van Keulen ordered the production of 5 million records. OpenAI consistently relied on van Keulen’s use of a sample-size formula “in support of its previous proposed methodology for conversation data sampling, but fails to explain why Judge [van] Keulen’s subsequent order directing production of the entire 5 million-record sample to the plaintiff in that case is not similarly instructive here,” Wang wrote.

OpenAI’s filing today said the company was never given an opportunity to explain why Concord shouldn’t apply in this case because the news plaintiffs did not reference it in their motion.

“The cited Concord order was not about whether wholesale production of the sample was appropriate; it was about the mechanism through which Anthropic would effectuate an already agreed-upon production,” OpenAI wrote. “Nothing about that order suggests that Judge van Keulen would have ordered wholesale production had Anthropic raised the privacy concerns that OpenAI has raised throughout this case.”

The Concord logs were just prompt-output pairs, “i.e., a single user prompt followed by a single model output,” OpenAI wrote. “The logs at issue here are complete conversations: each log in the 20 million sample represents a complete exchange of multiple prompt-output pairs between a user and ChatGPT.” That could result in “up to 80 million prompt-output pairs,” OpenAI said.

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Google says new cloud-based “Private AI Compute” is just as secure as local processing

NPUs can’t do it all, though. While Gemini Nano is getting more capable, it can’t compete with models that run on massive, high-wattage servers. That might be why some AI features, like the temporarily unavailable Daily Brief, don’t do much on the Pixels. Magic Cue, which surfaces personal data based on screen context, is probably in a similar place. Google now says that Magic Cue will get “even more helpful” thanks to the Private AI Compute system.

Magic Cue debuted on the Pixel 10, but it doesn’t do much yet. Credit: Ryan Whitwam

Google has also released a Pixel feature drop today, but there aren’t many new features of note (unless you’ve been hankering for Wicked themes). As part of the update, Magic Cue will begin using the Private AI Compute system to generate suggestions. The more powerful model might be able to tease out more actionable details from your data. Google also notes the Recorder app will be able to summarize in more languages thanks to the secure cloud.

So what Google is saying here is that more of your data is being offloaded to the cloud so that Magic Cue can generate useful suggestions, which would be a change. Since launch, we’ve only seen Magic Cue appear a handful of times, and it’s not offering anything interesting when it does.

There are still reasons to use local AI, even if the cloud system has “the same security and privacy assurances,” as Google claims. An NPU offers superior latency because your data doesn’t have to go anywhere, and it’s more reliable, as AI features will still work without an Internet connection. Google believes this hybrid approach is the way forward for generative AI, which requires significant processing even for seemingly simple tasks. We can expect to see more AI features reaching out to Google’s secure cloud soon.

Google announces even more AI in Photos app, powered by Nano Banana

We’re running out of ways to tell you that Google is releasing more generative AI features, but that’s what’s happening in Google Photos today. The Big G is finally making good on its promise to add its market-leading Nano Banana image-editing model to the app. The model powers a couple of features, and it’s not just for Google’s Android platform. Nano Banana edits are also coming to the iOS version of the app.

Nano Banana started making waves when it appeared earlier this year as an unbranded demo. You simply feed the model an image and tell it what edits you want to see. Google said Nano Banana was destined for the Photos app back in October, but it’s only now beginning the rollout. The Photos app already had conversational editing in the “Help Me Edit” feature, but it was running an older non-fruit model that produced inferior results. Nano Banana editing will produce AI slop, yes, but it’s better slop.

Nano Banana in Help Me Edit

Google says the updated Help Me Edit feature has access to your private face groups, so you can use names in your instructions. For example, you could type “Remove Riley’s sunglasses,” and Nano Banana will identify Riley in the photo (assuming you have a person of that name saved) and make the edit without further instructions. You can also ask for more fantastical edits in Help Me Edit, changing the style of the image from top to bottom.
