Artificial Intelligence

ChatGPT users hate GPT-5’s “overworked secretary” energy, miss their GPT-4o buddy

Others are irked by how quickly they run up against usage limits on the free tier, which pushes them toward the Plus ($20 per month) and Pro ($200 per month) subscriptions. But running generative AI is hugely expensive, and OpenAI is hemorrhaging cash. It wouldn’t be surprising if the wide rollout of GPT-5 is aimed at increasing revenue. At the same time, OpenAI can point to AI evaluations that show GPT-5 is more intelligent than its predecessor.

RIP your AI buddy

OpenAI built ChatGPT to be a tool people want to use. It’s a fine line to walk—OpenAI has occasionally made its flagship AI too friendly and complimentary. Several months ago, the company had to roll back a change that made the bot into a sycophantic mess that would suck up to the user at every opportunity. That was a bridge too far, certainly, but many of the company’s users liked the generally friendly tone of the chatbot. They tuned the AI with custom prompts and built it into a personal companion. They’ve lost that with GPT-5.

No new AI

Naturally, ChatGPT users have turned to AI to express their frustration. Credit: /u/Responsible_Cow2236

There are reasons to be wary of this kind of parasocial attachment to artificial intelligence. As companies have tuned these systems to increase engagement, they prioritize outputs that make people feel good. This results in interactions that can reinforce delusions, eventually leading to serious mental health episodes and dangerous medical beliefs. It can be hard to understand for those of us who don’t spend our days having casual conversations with ChatGPT, but the Internet is teeming with folks who build their emotional lives around AI.

Is GPT-5 safer? Early impressions from frequent chatters decry the bot’s more corporate, less effusively creative tone. In short, a significant number of people don’t like the outputs as much. GPT-5 could be a more able analyst and worker, but it isn’t the digital companion people have come to expect, and in some cases, love. That might be good in the long term, both for users’ mental health and OpenAI’s bottom line, but there’s going to be an adjustment period for fans of GPT-4o.

Chatters who are unhappy with the more straightforward tone of GPT-5 can always go elsewhere. Elon Musk’s xAI has shown it is happy to push the envelope with Grok, featuring Taylor Swift nudes and AI waifus. Of course, Ars does not recommend you do that.

OpenAI releases its first open source models since 2019

OpenAI is releasing new generative AI models today, and no, GPT-5 is not one of them. Depending on how you feel about generative AI, these new models may be even more interesting, though. The company is rolling out gpt-oss-120b and gpt-oss-20b, its first open weight models since the release of GPT-2 in 2019. You can download and run these models on your own hardware, with support for simulated reasoning, tool use, and deep customization.

When you access the company’s proprietary models in the cloud, they’re running on powerful server infrastructure that cannot be replicated easily, even in enterprise settings. The new OpenAI models come in two variants (120b and 20b) designed to run on less powerful hardware configurations. Both are transformers with configurable chain of thought (CoT), supporting low, medium, and high settings. The lower settings are faster and use fewer compute resources, but the outputs are better at the highest setting. You can set the CoT level with a single line in the system prompt.
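For readers who want to try that locally, here is a minimal sketch of selecting the reasoning level. It assumes an OpenAI-compatible local server (such as one started with Ollama or vLLM) is already serving a gpt-oss model; the endpoint URL and model tag below are placeholders to adjust for your own setup.

```python
# Minimal sketch: selecting gpt-oss's chain-of-thought level via the system prompt.
# Assumes an OpenAI-compatible local server (e.g., Ollama or vLLM) is already
# serving a gpt-oss model; the base_url and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

response = client.chat.completions.create(
    model="gpt-oss:20b",  # adjust to whatever tag your server exposes
    messages=[
        # A single line in the system prompt sets low, medium, or high effort.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "Walk me through solving 3x + 7 = 22."},
    ],
)
print(response.choices[0].message.content)
```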

The smaller gpt-oss-20b has a total of 21 billion parameters, utilizing mixture-of-experts (MoE) to reduce that to 3.6 billion parameters per token. As for gpt-oss-120b, its 117 billion parameters come down to 5.1 billion per token with MoE. The company says the smaller model can run on a consumer-level machine with 16GB or more of memory. To run gpt-oss-120b, you need 80GB of memory, which is more than you’re likely to find in the average consumer machine. It should fit on a single AI accelerator GPU like the Nvidia H100, though. Both models have a context window of 128,000 tokens.
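A rough back-of-the-envelope check makes those memory figures plausible. The sketch below assumes the roughly 4-bit (MXFP4) weight format the models ship in, or about half a byte per parameter, and ignores activation and KV-cache overhead; it is an estimate, not OpenAI's official sizing.

```python
# Rough memory estimate for the gpt-oss weights (illustrative only).
# Assumes ~4-bit (MXFP4) weights, i.e., about 0.5 bytes per parameter,
# and leaves headroom for activations and the KV cache.
BYTES_PER_PARAM = 0.5

models = {"gpt-oss-20b": 21e9, "gpt-oss-120b": 117e9}
for name, total_params in models.items():
    weights_gb = total_params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{weights_gb:.1f} GB of weights")

# gpt-oss-20b:  ~10.5 GB -> plausible on a 16 GB consumer machine
# gpt-oss-120b: ~58.5 GB -> fits on a single 80 GB accelerator like an H100
```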

Credit: OpenAI

The team says users of gpt-oss can expect robust performance similar to its leading cloud-based models. The larger one benchmarks between the o3 and o4-mini proprietary models in most tests, with the smaller version running just a little behind. It gets closest in math and coding tasks. In the knowledge-based Humanity’s Last Exam, o3 is far out in front with 24.9 percent (with tools), while gpt-oss-120b only manages 19 percent. For comparison, Google’s leading Gemini Deep Think hits 34.8 percent in that test.

DeepMind reveals Genie 3 “world model” that creates real-time interactive simulations

While no one has figured out how to make money from generative artificial intelligence, that hasn’t stopped Google DeepMind from pushing the boundaries of what’s possible with a big pile of inference. The capabilities (and costs) of these models have been on an impressive upward trajectory, a trend exemplified by the reveal of Genie 3. A mere seven months after showing off the Genie 2 “foundational world model,” which was itself a significant improvement over its predecessor, Google now has Genie 3.

With Genie 3, all it takes is a prompt or image to create an interactive world. Since the environment is continuously generated, it can be changed on the fly. You can add or change objects, alter weather conditions, or insert new characters—DeepMind calls these “promptable events.” The ability to create alterable 3D environments could make games more dynamic for players and offer developers new ways to prove out concepts and level designs. However, many in the gaming industry have expressed doubt that such tools would help.

Genie 3: building better worlds.

It’s tempting to think of Genie 3 simply as a way to create games, but DeepMind sees this as a research tool, too. Games play a significant role in the development of artificial intelligence because they provide challenging, interactive environments with measurable progress. That’s why DeepMind previously turned to games like Go and StarCraft to expand the bounds of AI.

World models take that to the next level, generating an interactive world frame by frame. This provides an opportunity to refine how AI models—including so-called “embodied agents”—behave when they encounter real-world situations. One of the primary limitations as companies work toward the goal of artificial general intelligence (AGI) is the scarcity of reliable training data. After piping basically every webpage and video on the planet into AI models, researchers are turning toward synthetic data for many applications. DeepMind believes world models could be a key part of this effort, as they can be used to train AI agents with essentially unlimited interactive worlds.

DeepMind says Genie 3 is an important advancement because it offers much higher visual fidelity than Genie 2, and it’s truly real-time. Using keyboard input, it’s possible to navigate the simulated world in 720p resolution at 24 frames per second. Perhaps even more importantly, Genie 3 can remember the world it creates.
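To make the idea of a frame-by-frame interactive world more concrete, here is a purely conceptual sketch of the loop such a system is described as running; nothing here is DeepMind's code, and the class and methods are hypothetical stand-ins.

```python
# Conceptual sketch of a world-model loop (not DeepMind's implementation).
# `WorldModel` and its methods are hypothetical stand-ins for illustration.
class WorldModel:
    def __init__(self, prompt):
        self.prompt = prompt
        self.history = []  # generated frames; the model "remembers" its world

    def next_frame(self, action, events):
        # In a real system this would be one model forward pass conditioned on
        # the prompt, the player's input, any promptable events, and past frames.
        frame = (f"frame {len(self.history)}: action={action!r}, "
                 f"events={events!r}, prior_frames={len(self.history)}")
        self.history.append(frame)
        return frame

world = WorldModel("a rainy medieval village at dusk")
for step in range(24 * 2):  # ~2 seconds of simulated play at 24 fps
    action = "move_forward"  # would come from keyboard input in real time
    events = ["add a merchant cart"] if step == 24 else []  # a promptable event
    frame = world.next_frame(action, events)
```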

Delta denies using AI to come up with inflated, personalized prices

Delta scandal highlights value of transparency

According to Delta, the company has “zero tolerance for discriminatory or predatory pricing” and only feeds its AI system aggregated data “to enhance our existing fare pricing processes.”

Rather than basing fare prices on customers’ personal information, Carter clarified that “all customers have access to the same fares and offers based on objective criteria provided by the customer such as origin and destination, advance purchase, length of stay, refundability, and travel experience selected.”

The AI use can result in higher or lower prices, but not personalized fares for different customers, Carter said. Instead, Delta plans to use AI pricing to “enhance market competitiveness and drive sales, benefiting both our customers and our business.”

Factors weighed by the AI system, Carter explained, include “customer demand for seats and purchasing data at an aggregated level, competitive offers and schedules, route performance, and cost of providing the service inclusive of jet fuel.” That could mean a rival’s promotion or schedule change triggers the AI system to lower prices to stay competitive, or rising fuel costs push prices up to help increase revenue or meet business goals.
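As a purely illustrative sketch (not Delta's system), the snippet below shows what pricing from aggregate, market-level inputs like those Carter lists might look like; every weight and number is invented, and the point is simply that nothing customer-specific enters the calculation.

```python
# Purely illustrative: dynamic pricing from aggregate, market-level inputs only.
# This is not Delta's model; the weights and caps are invented for the example.
def route_fare(base_fare, demand_index, competitor_fare, fuel_cost_per_seat):
    price = base_fare * (1 + 0.4 * (demand_index - 1.0))  # aggregated route demand
    price = min(price, competitor_fare * 1.05)            # track competitive offers
    price += fuel_cost_per_seat                           # cost of providing service
    return round(price, 2)

# Every shopper searching this route sees the same output; no personal data is used.
print(route_fare(base_fare=250, demand_index=1.2, competitor_fare=310,
                 fuel_cost_per_seat=42))
```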

“Given the tens of millions of fares and hundreds of thousands of routes for sale at any given time, the use of new technology like AI promises to streamline the process by which we analyze existing data and the speed and scale at which we can respond to changing market dynamics,” Carter wrote.

He explained the AI system helps Delta aggregate purchasing data for specific routes and flights, adapt to new market conditions, and factor in “thousands of variables simultaneously.” AI could also eventually be used to assist with crew scheduling, improve flight availability, or help reservation specialists answer complex questions or resolve disputes.

But “to reiterate, prices are not targeted to individual consumers,” Carter emphasized.

Delta further pointed out that the company does not require customers to log in to search for tickets, which means customers can search for flights without sharing any personal information.

For AI companies paying attention to the backlash, Delta’s scandal may hold a lesson about the value of transparency. Critics noted that Delta was among the first to admit it was using AI to influence pricing, but the vague explanation on the earnings call stoked confusion over how the technology was being used, and Delta seemed to drag its feet amid calls from groups like Consumer Watchdog for more transparency.

Google releases Gemini 2.5 Deep Think for AI Ultra subscribers

Google is unleashing its most powerful Gemini model today, but you probably won’t be able to try it. After revealing Gemini 2.5 Deep Think at the I/O conference back in May, Google is making this AI available in the Gemini app. Deep Think is designed for the most complex queries, which means it uses more compute resources than other models. So it should come as no surprise that only those subscribing to Google’s $250 AI Ultra plan will be able to access it.

Deep Think is based on the same foundation as Gemini 2.5 Pro, but it increases the “thinking time” with greater parallel analysis. According to Google, Deep Think explores multiple approaches to a problem, even revisiting and remixing the various hypotheses it generates. This process helps it create a higher-quality output.
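Conceptually, that process resembles sampling several candidate solutions in parallel, revising them in light of one another, and keeping the strongest. The sketch below is a toy illustration of that pattern, not Google's implementation, and all of the functions are hypothetical stand-ins.

```python
import random

# Toy illustration of parallel "thinking" with revision and selection.
# This is not Google's implementation; generate/critique are stand-ins for
# calls to a real model and a real scoring step.
def deep_think(problem, generate, critique, n_paths=4, rounds=2):
    hypotheses = [generate(problem, context=[]) for _ in range(n_paths)]
    for _ in range(rounds):
        # Revisit and remix: each revision can see the other hypotheses.
        hypotheses = [generate(problem, context=hypotheses) for _ in range(n_paths)]
    return max(hypotheses, key=lambda h: critique(problem, h))

toy_generate = lambda problem, context: f"draft answer #{random.randint(1, 99)}"
toy_critique = lambda problem, answer: random.random()  # pretend quality score
print(deep_think("prove the lemma", toy_generate, toy_critique))
```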

Deep Think benchmarks. Credit: Google

Like some other heavyweight Gemini tools, Deep Think takes several minutes to come up with an answer, and that added thinking time apparently makes the AI more adept at design aesthetics, scientific reasoning, and coding. Google has exposed Deep Think to the usual battery of benchmarks, showing that it surpasses the standard Gemini 2.5 Pro and competing models like OpenAI o3 and Grok 4. Deep Think shows a particularly large gain in Humanity’s Last Exam, a collection of 2,500 complex, multi-modal questions that cover more than 100 subjects. Other models top out at 20 or 25 percent, but Gemini 2.5 Deep Think managed a score of 34.8 percent.

Google confirms it will sign the EU AI Code of Practice

The regulation of AI systems could be the next hurdle as Big Tech aims to deploy technologies framed as transformative and vital to the future. Google products like search and Android have been in the sights of EU regulators for years, so getting in on the ground floor with the AI code would help it navigate what will surely be a tumultuous legal environment.

A comprehensive AI framework

The US has shied away from AI regulation, and the current administration is actively working to remove what few limits are in place. The White House even attempted to ban all state-level AI regulation for a period of ten years in the recent tax bill. Europe, meanwhile, is taking the possible negative impacts of AI tools seriously with a rapidly evolving regulatory framework.

The AI Code of Practice aims to provide AI firms with a bit more certainty in the face of a shifting landscape. It was developed with the input of more than 1,000 citizen groups, academics, and industry experts. The EU Commission says companies that adopt the voluntary code will enjoy a lower bureaucratic burden, easing compliance with the bloc’s AI Act, which came into force last year.

Under the terms of the code, Google will have to publish summaries of its model training data and disclose additional model features to regulators. The code also includes guidance on how firms should manage safety and security in compliance with the AI Act. Likewise, it includes paths to align a company’s model development with EU copyright law as it pertains to AI, a sore spot for Google and others.

Companies like Meta that don’t sign the code will not escape regulation. All AI companies operating in Europe will have to abide by the AI Act, which includes the most detailed regulatory framework for generative AI systems in the world. The law bans high-risk uses of AI like intentional deception or manipulation of users, social scoring systems, and real-time biometric scanning in public spaces. Companies that violate the rules in the AI Act could be hit with fines as high as 35 million euros ($40.1 million) or up to 7 percent of the offender’s global revenue.

Delta’s AI spying to “jack up” prices must be banned, lawmakers say

“There is no fare product Delta has ever used, is testing or plans to use that targets customers with individualized offers based on personal information or otherwise,” Delta said. “A variety of market forces drive the dynamic pricing model that’s been used in the global industry for decades, with new tech simply streamlining this process. Delta always complies with regulations around pricing and disclosures.”

Other companies “engaging in surveillance-based price setting” include giants like Amazon and Kroger, as well as a ride-sharing app that has been “charging a customer more when their phone battery is low.”

Public Citizen, a progressive consumer rights group that endorsed the bill, condemned the practice in the press release, urging Congress to pass the law and draw “a clear line in the sand: companies can offer discounts and fair wages—but not by spying on people.”

“Surveillance-based price gouging and wage setting are exploitative practices that deepen inequality and strip consumers and workers of dignity,” Public Citizen said.

AI pricing will cause “full-blown crisis”

In January, the Federal Trade Commission requested information from eight companies—MasterCard, Revionics, Bloomreach, JPMorgan Chase, Task Software, PROS, Accenture, and McKinsey & Co—that have joined a “shadowy market” of AI pricing services. Those companies confirmed they’ve provided services to at least 250 companies “that sell goods or services ranging from grocery stores to apparel retailers,” lawmakers noted.

That inquiry led the FTC to conclude that “widespread adoption of this practice may fundamentally upend how consumers buy products and how companies compete.”

In the press release, the anti-monopoly watchdog the American Economic Liberties Project was counted among the advocacy groups endorsing the Democrats’ bill. Its senior legal counsel, Lee Hepner, pointed out that “grocery prices have risen 26 percent since the pandemic-era explosion of online shopping,” and that’s “dovetailing with new technology designed to squeeze every last penny from consumers.”

Google’s new “Web Guide” will use AI to organize your search results

Web Guide is halfway between normal search and AI Mode. Credit: Google

Google suggests trying Web Guide with longer or open-ended queries, like “how to solo travel in Japan.” The video below uses that search as an example. It has many of the links you might expect, but there are also AI-generated headings with summaries and suggestions. It really looks halfway between standard search and AI Mode. Because it has to run additional searches and generate content, Web Guide takes a beat longer to produce results compared to a standard search. There’s no AI Overview at the top, though.

Web Guide is a Search Labs experiment, meaning you have to opt in before you’ll see any AI organization in your search results. When enabled, this feature takes over the “Web” tab of Google search. Even if you turn it on, Google notes there will be a toggle that allows you to revert to the normal, non-AI-optimized page.

An example of the Web Guide test.

Eventually, the test will expand to encompass more parts of the search experience, like the “All” tab—that’s the default search experience when you input a query from a browser or phone search bar. Google says it’s approaching this as an opt-in feature to start. So that sounds like Web Guide might be another AI Mode situation in which the feature rolls out widely after a short testing period. It’s technically possible the test will not result in a new universal search feature, but Google hasn’t yet met a generative AI implementation that it hasn’t liked.

Trump’s order to make chatbots anti-woke is unconstitutional, senator says


Trump plans to use chatbots to eliminate dissent, senator alleged.

The CEOs of every major artificial intelligence company received letters Wednesday urging them to fight Donald Trump’s anti-woke AI order.

Trump’s executive order requires any AI company hoping to contract with the federal government to jump through two hoops to win funding. First, they must prove their AI systems are “truth-seeking”—with outputs based on “historical accuracy, scientific inquiry, and objectivity” or else acknowledge when facts are uncertain. Second, they must train AI models to be “neutral,” which is vaguely defined as not favoring DEI (diversity, equity, and inclusion), “dogmas,” or otherwise being “intentionally encoded” to produce “partisan or ideological judgments” in outputs “unless those judgments are prompted by or otherwise readily accessible to the end user.”

Announcing the order in a speech, Trump said that the US winning the AI race depended on removing allegedly liberal biases, proclaiming that “once and for all, we are getting rid of woke.”

“The American people do not want woke Marxist lunacy in the AI models, and neither do other countries,” Trump said.

Senator Ed Markey (D-Mass.) accused Republicans of basing their policies on feelings, not facts, joining critics who suggest that AI isn’t “woke” just because of a few “anecdotal” outputs that reflect a liberal bias. And he suggested it was hypocritical that Trump’s order “ignores even more egregious evidence” contradicting claims that AI is trained to be woke, such as xAI’s Elon Musk explicitly confirming that Grok was trained to be more right-wing.

“On May 1, 2025, Grok—the AI chatbot developed by xAI, Elon Musk’s AI company—acknowledged that ‘xAI tried to train me to appeal to the right,’” Markey wrote in his letters to tech giants. “If OpenAI’s ChatGPT or Google’s Gemini had responded that it was trained to appeal to the left, congressional Republicans would have been outraged and opened an investigation. Instead, they were silent.”

He warned the heads of Alphabet, Anthropic, Meta, Microsoft, OpenAI, and xAI that Trump’s AI agenda was allegedly “an authoritarian power grab” intended to “eliminate dissent” and was both “dangerous” and “patently unconstitutional.”

Even if companies’ AI models are clearly biased, Markey argued that “Republicans are using state power to pressure private companies to adopt certain political viewpoints,” which he claimed is a clear violation of the First Amendment. If AI makers cave, Markey warned, they’d be allowing Trump to create “significant financial incentives” to ensure that “their AI chatbots do not produce speech that would upset the Trump administration.”

“This type of interference with private speech is precisely why the US Constitution has a First Amendment,” Markey wrote, while claiming that Trump’s order is factually baseless.

It’s “based on the erroneous belief that today’s AI chatbots are ‘woke’ and biased against Trump,” Markey said, urging companies “to fight this unconstitutional executive order and not become a pawn in Trump’s effort to eliminate dissent in this country.”

One big reason AI companies may fight order

Some experts agreed with Markey that Trump’s order was likely unconstitutional or otherwise unlawful, The New York Times reported.

For example, Trump may struggle to convince courts that the government isn’t impermissibly interfering with AI companies’ protected speech, or that any such interference is necessary to ensure federal procurement of unbiased AI systems.

Genevieve Lakier, a law professor at the University of Chicago, told the NYT that the lack of clarity around what makes a model biased could be a problem. Courts could deem the order an act of “unconstitutional jawboning,” with the Trump administration and Republicans generally perceived as using legal threats to pressure private companies into producing outputs that they like.

Lakier suggested that AI companies may be so motivated to win government contracts or intimidated by possible retaliation from Trump that they may not even challenge the order, though.

Markey is hoping that AI companies will refuse to comply with the order, even while recognizing that it places companies “in a difficult position: Either stand on your principles and face the wrath of the Trump administration or cave to Trump and modify your company’s political speech.”

There is one big reason, though, why AI companies may resist.

Oren Etzioni, the former CEO of the AI research nonprofit Allen Institute for Artificial Intelligence, told CNN that Trump’s anti-woke AI order may contradict the top priority of his AI Action Plan—speeding up AI innovation in the US—and actually threaten to hamper innovation.

If AI developers struggle to produce what the Trump administration considers “neutral” outputs—a technical challenge that experts agree is not straightforward—that could delay model advancements.

“This type of thing… creates all kinds of concerns and liability and complexity for the people developing these models—all of a sudden, they have to slow down,” Etzioni told CNN.

Senator: Grok scandal spotlights GOP hypocrisy

Some experts have suggested that rather than chatbots adopting liberal viewpoints, chatbots are instead possibly filtering out conservative misinformation and unintentionally appearing to favor liberal views.

Andrew Hall, a professor of political economy at Stanford Graduate School of Business—who published a May paper finding that “Americans view responses from certain popular AI models as being slanted to the left”—told CNN that “tech companies may have put extra guardrails in place to prevent their chatbots from producing content that could be deemed offensive.”

Markey seemed to agree, writing that Republicans’ “selective outrage matches conservatives’ similar refusal to acknowledge that the Big Tech platforms suspend or impose other penalties disproportionately on conservative users because those users are disproportionately likely to share misinformation, rather than due to any political bias by the platforms.”

It remains unclear what amount of supposed bias detected in outputs could cause a contract bid to be rejected or an ongoing contract to be canceled, but AI companies will likely be on the hook to pay any fees in terminating contracts.

Complying with Trump’s order could pose a struggle for AI makers for several reasons. First, they’ll have to determine what’s fact and what’s ideology, contending with conflicting government standards in how Trump defines DEI. For example, the president’s order counts among “pervasive and destructive” DEI ideologies any outputs that align with long-standing federal protections against discrimination on the basis of race or sex. In addition, they must figure out what counts as “suppression or distortion of factual information about” historical topics like critical race theory, systemic racism, or transgenderism.

The examples in Trump’s order highlighting outputs offensive to conservatives seem inconsequential. He calls out as problematic image generators that depicted the Pope, the Founding Fathers, and Vikings as not white, as well as models that refuse to misgender a person “even if necessary to stop a nuclear apocalypse” or to show white people celebrating their achievements.

It’s hard to imagine how these kinds of flawed outputs could impact government processes, as compared to, say, government contracts granted to models that could be hiding covert racism or sexism.

So far, there has been one example of an AI model that displayed a right-wing bias earning a government contract, with no red flags raised about its outputs.

Earlier this summer, Grok shocked the world after Musk announced he would be updating the bot to eliminate a supposed liberal bias. The unhinged chatbot began spouting offensive outputs, including antisemitic posts that praised Hitler as well as proclaiming itself “MechaHitler.”

But those obvious biases did not conflict with the Pentagon’s decision to grant xAI a $200 million federal contract. In a statement, a Pentagon spokesperson insisted that “the antisemitism episode wasn’t enough to disqualify” xAI, NBC News reported, partly since “several frontier AI models have produced questionable outputs.”

The Pentagon’s statement suggested that the government expected to deal with such risks while seizing the opportunity of rapidly deploying emerging AI technology into government prototype processes. And perhaps notably, Trump provides a carveout for any agencies using AI models to safeguard national security, which could exclude the Pentagon from experiencing any “anti-woke” delays in accessing frontier models.

But that won’t help other agencies that must figure out how to assess models to meet anti-woke AI requirements over the next few months. And those assessments could cause delays that Trump may wish to avoid in pushing for widespread AI adoption across government.

Trump’s anti-woke AI agenda may be impossible

On the same day that Trump issued his anti-woke AI order, his AI Action Plan promised an AI “renaissance” fueling “intellectual achievements” by “unraveling ancient scrolls once thought unreadable, making breakthroughs in scientific and mathematical theory, and creating new kinds of digital and physical art.”

To achieve that, the US must “innovate faster and more comprehensively than our competitors” and eliminate regulatory barriers impeding innovation in order to “set the gold standard for AI worldwide.”

However, achieving the anti-woke ambitions of both orders raises a technical problem that even the president must accept currently has no solution. In his AI Action Plan, Trump acknowledged that “the inner workings of frontier AI systems are poorly understood,” with even “advanced technologists” unable to explain “why a model produced a specific output.”

Whether requiring AI companies to explain their AI outputs to win government contracts will mess with other parts of Trump’s action plan remains to be seen. But Samir Jain, vice president of policy at a civil liberties group called the Center for Democracy and Technology, told the NYT that he predicts the anti-woke AI agenda will set “a really vague standard that’s going to be impossible for providers to meet.”

AI video is invading YouTube Shorts and Google Photos starting today

Google is following through on recent promises to add more generative AI features to its photo and video products. Over on YouTube, Google is rolling out the first wave of generative AI video for YouTube Shorts, but even if you’re not a YouTuber, you’ll be exposed to more AI videos soon. Google Photos, which is integrated with virtually every Android phone on the market, is also getting AI video-generation capabilities. In both cases, the features are currently based on the older Veo 2 model, not the more capable Veo 3 that has been meming across the Internet since it was announced at I/O in May.

YouTube CEO Neal Mohan confirmed earlier this summer that the company planned to add generative AI to the creator tools for YouTube Shorts. There were already tools to generate backgrounds for videos, but the next phase will involve creating new video elements from a text prompt.

Starting today, creators will be able to use a photo as the basis for a new generative AI video. YouTube also promises a collection of easily applied generative effects, which will be accessible from the Shorts camera. There’s also a new AI playground hub that the company says will be home to all its AI tools, along with examples and suggested prompts to help people pump out AI content.

The Veo 2-based videos aren’t as realistic as Veo 3 clips, but an upgrade is planned.

So far, all the YouTube AI video features are running on the Veo 2 model. The plan is still to move to Veo 3 later this summer. The AI features in YouTube Shorts are currently limited to the United States, Canada, Australia, and New Zealand, but they will expand to more countries later.

It’s “frighteningly likely” many US courts will overlook AI errors, expert says


Judges pushed to bone up on AI or risk destroying their court’s authority.

A judge points to a diagram of a hand with six fingers. Credit: Aurich Lawson | Getty Images

Order in the court! Order in the court! Judges are facing outcry over a suspected AI-generated order in a court.

Fueling nightmares that AI may soon decide legal battles, a Georgia court of appeals judge, Jeff Watkins, explained why a three-judge panel vacated an order last month that appears to be the first known ruling in which a judge sided with someone seemingly relying on fake AI-generated case citations to win a legal fight.

Now, experts are warning that judges overlooking AI hallucinations in court filings could easily become commonplace, especially in the typically overwhelmed lower courts. And so far, only two states have moved to force judges to sharpen their tech competencies and adapt so they can spot AI red flags and theoretically stop disruptions to the justice system at all levels.

The recently vacated order came in a Georgia divorce dispute, where Watkins explained that the order itself was drafted by the husband’s lawyer, Diana Lynch. That’s a common practice in many courts, where overburdened judges historically rely on lawyers to draft orders. But that protocol today faces heightened scrutiny as lawyers and non-lawyers increasingly rely on AI to compose and research legal filings, and judges risk rubberstamping fake opinions by not carefully scrutinizing AI-generated citations.

The errant order partly relied on “two fictitious cases” to deny the wife’s petition—which Watkins suggested were “possibly ‘hallucinations’ made up by generative-artificial intelligence”—as well as two cases that had “nothing to do” with the wife’s petition.

Lynch was hit with $2,500 in sanctions after the wife appealed, and the husband’s response—which also appeared to be prepared by Lynch—cited 11 additional cases that were “either hallucinated” or irrelevant. Watkins was further peeved that Lynch supported a request for attorney’s fees for the appeal by citing “one of the new hallucinated cases,” writing it added “insult to injury.”

Worryingly, the judge could not confirm whether the fake cases were generated by AI or even determine if Lynch inserted the bogus cases into the court filings, indicating how hard it can be for courts to hold lawyers accountable for suspected AI hallucinations. Lynch did not respond to Ars’ request to comment, and her website appeared to be taken down following media attention to the case.

But Watkins noted that “the irregularities in these filings suggest that they were drafted using generative AI” while warning that many “harms flow from the submission of fake opinions.” Exposing deceptions can waste time and money, and AI misuse can deprive people of raising their best arguments. Fake orders can also soil judges’ and courts’ reputations and promote “cynicism” in the justice system. If left unchecked, Watkins warned, these harms could pave the way to a future where a “litigant may be tempted to defy a judicial ruling by disingenuously claiming doubt about its authenticity.”

“We have no information regarding why Appellee’s Brief repeatedly cites to nonexistent cases and can only speculate that the Brief may have been prepared by AI,” Watkins wrote.

Ultimately, Watkins remanded the case, partly because the fake cases made it impossible for the appeals court to adequately review the wife’s petition to void the prior order. But no matter the outcome of the Georgia case, the initial order will likely forever be remembered as a cautionary tale for judges increasingly scrutinized for failures to catch AI misuses in court.

“Frighteningly likely” judge’s AI misstep will be repeated

John Browning, a retired justice on Texas’ Fifth Court of Appeals and now a full-time law professor at Faulkner University, last year published a law article Watkins cited that warned of the ethical risks of lawyers using AI. In the article, Browning emphasized that the biggest concern at that point was that lawyers “will use generative AI to produce work product they treat as a final draft, without confirming the accuracy of the information contained therein or without applying their own independent professional judgment.”

Today, judges are increasingly drawing the same scrutiny, and Browning told Ars he thinks it’s “frighteningly likely that we will see more cases” like the Georgia divorce dispute, in which “a trial court unwittingly incorporates bogus case citations that an attorney includes in a proposed order” or even potentially in “proposed findings of fact and conclusions of law.”

“I can envision such a scenario in any number of situations in which a trial judge maintains a heavy docket and looks to counsel to work cooperatively in submitting proposed orders, including not just family law cases but other civil and even criminal matters,” Browning told Ars.

According to reporting from the National Center for State Courts, a nonprofit representing court leaders and professionals who are advocating for better judicial resources, AI tools like ChatGPT have made it easier for high-volume filers and unrepresented litigants who can’t afford attorneys to file more cases, potentially further bogging down courts.

Peter Henderson, a researcher who runs the Princeton Language+Law, Artificial Intelligence, & Society (POLARIS) Lab, told Ars that he expects cases like the Georgia divorce dispute aren’t happening every day just yet.

It’s likely that a “few hallucinated citations go overlooked” because generally, fake cases are flagged through “the adversarial nature of the US legal system,” he suggested. Browning further noted that trial judges are generally “very diligent in spotting when a lawyer is citing questionable authority or misleading the court about what a real case actually said or stood for.”

Henderson agreed with Browning that “in courts with much higher case loads and less adversarial process, this may happen more often.” But Henderson noted that the appeals court catching the fake cases is an example of the adversarial process working.

While that’s true in this case, anyone exhausted by the legal process, such as in a drawn-out divorce, may not pursue an appeal if they lack the energy or resources to discover and overturn errant orders.

Judges’ AI competency increasingly questioned

While recent history confirms that lawyers risk being sanctioned, fired from their firms, or suspended from practicing law for citing fake AI-generated cases, judges will likely only risk embarrassment for failing to catch lawyers’ errors or even for using AI to research their own opinions.

Not every judge is prepared to embrace AI without proper vetting, though. To shield the legal system, some judges have banned AI. Others have required disclosures—with some even demanding to know which specific AI tool was used—but that solution has not caught on everywhere.

Even if all courts required disclosures, Browning pointed out that disclosures still aren’t a perfect solution since “it may be difficult for lawyers to even discern whether they have used generative AI,” as AI features become increasingly embedded in popular legal tools. One day, it “may eventually become unreasonable to expect” lawyers “to verify every generative AI output,” Browning suggested.

Most likely—as a judicial ethics panel from Michigan has concluded—judges will determine “the best course of action for their courts with the ever-expanding use of AI,” Browning’s article noted. And the former justice told Ars that’s why education will be key, for both lawyers and judges, as AI advances and becomes more mainstream in court systems.

In an upcoming summer 2025 article in The Journal of Appellate Practice & Process, “The Dawn of the AI Judge,” Browning attempts to soothe readers by saying that AI isn’t yet fueling a legal dystopia. And humans are unlikely to face “robot judges” spouting AI-generated opinions any time soon, the former justice suggested.

Standing in the way of that, at least two states—Michigan and West Virginia—”have already issued judicial ethics opinions requiring judges to be ‘tech competent’ when it comes to AI,” Browning told Ars. And “other state supreme courts have adopted official policies regarding AI,” he noted, further pressuring judges to bone up on AI.

Meanwhile, several states have set up task forces to monitor their regional court systems and issue AI guidance, while states like Virginia and Montana have passed laws requiring human oversight for any AI systems used in criminal justice decisions.

Judges must prepare to spot obvious AI red flags

Until courts figure out how to navigate AI—a process that may look different from court to court—Browning advocates for more education and ethical guidance for judges to steer their use of and attitudes about AI. That could help equip judges to avoid both ignorance of the many AI pitfalls and overconfidence in AI outputs, potentially protecting courts from hallucinations, biases, and evidentiary problems that sneak past human review and scramble the court system.

An overlooked part of educating judges could be exposing AI’s influence so far in courts across the US. Henderson’s team is planning research that tracks which models attorneys are using most in courts. That could reveal “the potential legal arguments that these models are pushing” to sway courts—and which judicial interventions might be needed, Henderson told Ars.

“Over the next few years, researchers—like those in our group, the POLARIS Lab—will need to develop new ways to track the massive influence that AI will have and understand ways to intervene,” Henderson told Ars. “For example, is any model pushing a particular perspective on legal doctrine across many different cases? Was it explicitly trained or instructed to do so?”

Henderson also advocates for “an open, free centralized repository of case law,” which would make it easier for everyone to check for fake AI citations. “With such a repository, it is easier for groups like ours to build tools that can quickly and accurately verify citations,” Henderson said. That could be a significant improvement to the current decentralized court reporting system that often obscures case information behind various paywalls.

Dazza Greenwood, who co-chairs MIT’s Task Force on Responsible Use of Generative AI for Law, did not have time to send comments but pointed Ars to a LinkedIn thread where he suggested that a structural response may be needed to ensure that all fake AI citations are caught every time.

He recommended that courts create “a bounty system whereby counter-parties or other officers of the court receive sanctions payouts for fabricated cases cited in judicial filings that they reported first.” That way, lawyers will know that their work will “always” be checked and thus may shift their behavior if they’ve been automatically filing AI-drafted documents. In turn, that could alleviate pressure on judges to serve as watchdogs. It also wouldn’t cost much, since it would mostly redistribute to the AI spotters the sanctions that offending lawyers already pay.

Novel solutions like this may be necessary, Greenwood suggested. Responding to a question asking if “shame and sanctions” are enough to stop AI hallucinations in court, Greenwood said that eliminating AI errors is imperative because the problem “gives both otherwise generally good lawyers and otherwise generally good technology a bad name.” Continuing to ban AI or suspend lawyers as the preferred solution risks draining court resources just as caseloads spike, rather than confronting the problem head-on.

Of course, there’s no guarantee that the bounty system would work. But, Greenwood asked, “would the fact of such definite confidence that your cures will be individually checked and fabricated cites reported be enough to finally… convince lawyers who cut these corners that they should not cut these corners?”

In absence of a fake case detector like Henderson wants to build, experts told Ars that there are some obvious red flags that judges can note to catch AI-hallucinated filings.

Any case number with “123456” in it probably warrants review, Henderson told Ars. And Browning noted that AI tends to mix up locations for cases, too. “For example, a cite to a purported Texas case that has a ‘S.E. 2d’ reporter wouldn’t make sense, since Texas cases would be found in the Southwest Reporter,” Browning said, noting that some appellate judges have already relied on this red flag to catch AI misuses.
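A heuristic like that is simple enough to script. The sketch below is only an illustration of the red-flag checks described above (the suspect number pattern and a jurisdiction-to-reporter sanity check), not a real citation verifier, and the reporter table is deliberately minimal.

```python
import re

# Illustrative red-flag screen for citations, based on the heuristics above.
# Not a real verifier; the patterns and reporter table are simplified examples.
SUSPECT_NUMBER = re.compile(r"123456")
EXPECTED_REPORTERS = {
    "Texas": {"S.W.", "S.W.2d", "S.W.3d"},  # Texas cases appear in the Southwest Reporter
}

def red_flags(citation, jurisdiction):
    flags = []
    if SUSPECT_NUMBER.search(citation):
        flags.append("placeholder-looking number ('123456')")
    expected = EXPECTED_REPORTERS.get(jurisdiction, set())
    if expected and not any(rep in citation for rep in expected):
        flags.append(f"reporter does not match {jurisdiction} case law")
    return flags

# A purported Texas case cited to "S.E.2d" should stand out immediately.
print(red_flags("Smith v. Jones, 123456 S.E.2d 789 (Tex. 2021)", "Texas"))
```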

Those red flags would perhaps be easier to check with the open source tool that Henderson’s lab wants to make, but Browning said there are other tell-tale signs of AI usage that anyone who has ever used a chatbot is likely familiar with.

“Sometimes a red flag is the language cited from the hallucinated case; if it has some of the stilted language that can sometimes betray AI use, it might be a hallucination,” Browning said.

Judges already issuing AI-assisted opinions

Several states have assembled task forces like Greenwood’s to assess the risks and benefits of using AI in courts. In Georgia, the Judicial Council of Georgia Ad Hoc Committee on Artificial Intelligence and the Courts released a report in early July providing “recommendations to help maintain public trust and confidence in the judicial system as the use of AI increases” in that state.

Adopting the committee’s recommendations could establish “long-term leadership and governance”; a repository of approved AI tools, education, and training for judicial professionals; and more transparency on AI used in Georgia courts. But the committee expects it will take three years to implement those recommendations while AI use continues to grow.

Possibly complicating things further as judges start to explore using AI assistants to help draft their filings, the committee concluded that it’s still too early to tell if the judges’ code of conduct should be changed to prevent “unintentional use of biased algorithms, improper delegation to automated tools, or misuse of AI-generated data in judicial decision-making.” That means, at least for now, that there will be no code-of-conduct changes in Georgia, where the only case in which AI hallucinations are believed to have swayed a judge has been found.

Notably, the committee’s report also confirmed that there are no role models for courts to follow, as “there are no well-established regulatory environments with respect to the adoption of AI technologies by judicial systems.” Browning, who chaired a now-defunct Texas AI task force, told Ars that judges lacking guidance will need to stay on their toes to avoid trampling legal rights. (A spokesperson for the State Bar of Texas told Ars the task force’s work “concluded” and “resulted in the creation of the new standing committee on Emerging Technology,” which offers general tips and guidance for judges in a recently launched AI Toolkit.)

“While I definitely think lawyers have their own duties regarding AI use, I believe that judges have a similar responsibility to be vigilant when it comes to AI use as well,” Browning said.

Judges will continue sorting through AI-fueled submissions not just from pro se litigants representing themselves but also from up-and-coming young lawyers who may be more inclined to use AI, and even seasoned lawyers who have been sanctioned up to $5,000 for failing to check AI drafts, Browning suggested.

In his upcoming “AI Judge” article, Browning points to at least one judge, 11th Circuit Court of Appeals Judge Kevin Newsom, who has used AI as a “mini experiment” in preparing opinions for both a civil case involving an insurance coverage issue and a criminal matter focused on sentencing guidelines. Browning seems to appeal to judges’ egos to get them to study up so they can use AI to enhance their decision-making and possibly expand public trust in courts, not undermine it.

“Regardless of the technological advances that can support a judge’s decision-making, the ultimate responsibility will always remain with the flesh-and-blood judge and his application of very human qualities—legal reasoning, empathy, strong regard for fairness, and unwavering commitment to ethics,” Browning wrote. “These qualities can never be replicated by an AI tool.”

Cops’ favorite AI tool automatically deletes evidence of when AI was used


AI police tool is designed to avoid accountability, watchdog says.

On Thursday, a digital rights group, the Electronic Frontier Foundation, published an expansive investigation into AI-generated police reports that the group alleged are, by design, nearly impossible to audit and could make it easier for cops to lie under oath.

Axon’s Draft One debuted last summer at a police department in Colorado, instantly raising questions about the feared negative impacts of AI-written police reports on the criminal justice system. The tool relies on a ChatGPT variant to generate police reports based on body camera audio, which cops are then supposed to edit to correct any mistakes, assess the AI outputs for biases, or add key context.

But the EFF found that the tech “seems designed to stymie any attempts at auditing, transparency, and accountability.” Not every department requires cops to disclose when AI is used, and Draft One does not save drafts or retain a record showing which parts of reports are AI-generated. Departments also don’t retain different versions of drafts, making it difficult to assess how one version of an AI report might compare to another to help the public determine if the technology is “junk,” the EFF said. That raises the question, the EFF suggested: “Why wouldn’t an agency want to maintain a record that can establish the technology’s accuracy?”

It’s currently hard to know if cops are editing the reports or “reflexively rubber-stamping the drafts to move on as quickly as possible,” the EFF said. That’s particularly troubling, the EFF noted, since Axon disclosed to at least one police department that “there has already been an occasion when engineers discovered a bug that allowed officers on at least three occasions to circumvent the ‘guardrails’ that supposedly deter officers from submitting AI-generated reports without reading them first.”

The AI tool could also be “overstepping in its interpretation of the audio,” perhaps misinterpreting slang or adding context that never happened.

A “major concern,” the EFF said, is that the AI reports can give cops a “smokescreen,” perhaps even allowing them to dodge consequences for lying on the stand by blaming the AI tool for any “biased language, inaccuracies, misinterpretations, or lies” in their reports.

“There’s no record showing whether the culprit was the officer or the AI,” the EFF said. “This makes it extremely difficult if not impossible to assess how the system affects justice outcomes over time.”

According to the EFF, Draft One “seems deliberately designed to avoid audits that could provide any accountability to the public.” In one video from a roundtable discussion the EFF reviewed, an Axon senior principal product manager for generative AI touted Draft One’s disappearing drafts as a feature, explaining, “we don’t store the original draft and that’s by design and that’s really because the last thing we want to do is create more disclosure headaches for our customers and our attorney’s offices.”

The EFF interpreted this to mean that “the last thing” that Axon wants “is for cops to have to provide that data to anyone (say, a judge, defense attorney or civil liberties non-profit).”

“To serve and protect the public interest, the AI output must be continually and aggressively evaluated whenever and wherever it’s used,” the EFF said. “But Axon has intentionally made this difficult.”

The EFF is calling for a nationwide effort to monitor AI-generated police reports, which are expected to be increasingly deployed in many cities over the next few years, and published a guide to help journalists and others submit records requests to monitor police use in their area. But “unfortunately, obtaining these records isn’t easy,” the EFF’s investigation confirmed. “In many cases, it’s straight-up impossible.”

An Axon spokesperson provided a statement to Ars:

Draft One helps officers draft an initial report narrative strictly from the audio transcript of the body-worn camera recording and includes a range of safeguards, including mandatory human decision-making at crucial points and transparency about its use. Just as with narrative reports not generated by Draft One, officers remain fully responsible for the content. Every report must be edited, reviewed, and approved by a human officer, ensuring both accuracy and accountability. Draft One was designed to mirror the existing police narrative process—where, as has long been standard, only the final, approved report is saved and discoverable, not the interim edits, additions, or deletions made during officer or supervisor review.

Since day one, whenever Draft One is used to generate an initial narrative, its use is stored in Axon Evidence’s unalterable digital audit trail, which can be retrieved by agencies on any report. By default, each Draft One report also includes a customizable disclaimer, which can appear at the beginning or end of the report in accordance with agency policy. We recently added the ability for agencies to export Draft One usage reports—showing how many drafts have been generated and submitted per user—and to run reports on which specific evidence items were used with Draft One, further supporting transparency and oversight. Axon is committed to continuous collaboration with police agencies, prosecutors, defense attorneys, community advocates, and other stakeholders to gather input and guide the responsible evolution of Draft One and AI technologies in the justice system, including changes as laws evolve.

“Police should not be using AI”

Expecting that Axon’s tool would likely spread fast—marketed as a time-saving add-on service to police departments that already rely on Axon for tasers and body cameras—EFF senior policy analyst Matthew Guariglia told Ars that the EFF quickly formed a plan to track adoption of the new technology.

Over the spring, the EFF sent public records requests to dozens of police departments believed to be using Draft One. To craft the requests, they also reviewed Axon user manuals and other materials.

In a press release, the EFF confirmed that the investigation “found the product offers meager oversight features,” including a practically useless “audit log” function that seems contradictory to police norms surrounding data retention.

Perhaps most glaringly, Axon’s tool doesn’t allow departments to “export a list of all police officers who have used Draft One,” the EFF noted, or even “export a list of all reports created by Draft One, unless the department has customized its process.” Instead, Axon only allows exports of basic logs showing actions taken on a particular report or an individual user’s basic activity in the system, like logins and uploads. That makes it “near impossible to do even the most basic statistical analysis: how many officers are using the technology and how often,” the EFF said.

Any effort to crunch the numbers would be time-intensive, the EFF found. In some departments, it’s possible to look up individual cops’ records to determine when they used Draft One, but that “could mean combing through dozens, hundreds, or in some cases, thousands of individual user logs.” And it would take a similarly “massive amount of time” to sort through reports one by one, considering “the sheer number of reports generated” by any given agency, the EFF noted.

In some jurisdictions, cops are required to disclose when AI is used to generate reports, and some departments require it as a matter of policy, the EFF found, which made the documents more easily searchable and, in turn, made some police departments more likely to respond to public records requests without charging excessive fees or requiring substantial delays. But at least one department in Indiana told the EFF, “We do not have the ability to create a list of reports created through Draft One. They are not searchable.”

While not every cop can search their Draft One reports, Axon can, the EFF reported, suggesting that the company can track how much police use the tool better than police themselves can.

The EFF hopes its reporting will curtail the growing reliance on shady AI-generated police reports, which Guariglia told Ars risk becoming even more common in US policing without intervention.

In California, where some cops have long been using Draft One, a bill has been introduced that would require disclosures clarifying which parts of police reports are AI-generated. That law, if passed, would also “require the first draft created to be retained for as long as the final report is retained,” which Guariglia told Ars would make Draft One automatically unlawful as currently designed. Utah is weighing a similar but less robust initiative, the EFF noted.

Guariglia told Ars that the EFF has talked to public defenders who worry how the proliferation of AI-generated police reports is “going to affect cross-examination” by potentially giving cops an easy scapegoat when accused of lying on the stand.

To avoid the issue entirely, at least one district attorney’s office in King County, Washington, has banned AI police reports, citing “legitimate concerns about some of the products on the market now.” Guariglia told Ars that one of the district attorney’s top concerns was that using the AI tool could “jeopardize cases.” The EFF is now urging “other prosecutors to follow suit and demand that police in their jurisdiction not unleash this new, unaccountable, and intentionally opaque AI product.”

“Police should not be using AI to write police reports,” Guariglia said. “There are just too many questions left unanswered about how AI would translate the audio of situations, whether police will actually edit those drafts, and whether the public will ever be able to tell what was written by a person and what was written by a computer. This is before we even get to the question of how these reports might lead to problems in an already unfair and untransparent criminal justice system.”

This story was updated to include a statement from Axon. 
