Author name: Tim Belzer

riot-games-is-making-an-anti-cheat-change-that-could-be-rough-on-older-pcs

Riot Games is making an anti-cheat change that could be rough on older PCs

But Riot says it’s considering rolling the BIOS requirement out to all players in Valorant‘s highest competitive ranking tiers (Ascendant, Immortal, and Radiant), where there’s more to be gained from working around the anti-cheat software. And Riot anti-cheat analyst Mohamed Al-Sharifi says the same restrictions could be turned on for League of Legends, though they aren’t currently. If users are blocked from playing by Vanguard, they’ll need to download and install the latest BIOS update for their motherboard before they’ll be allowed to launch the game.

Newer PCs are getting patched; older PCs might not be

An AMD Ryzen 7 5800X3D in a motherboard with a 500-series chipset. It’s unclear whether these somewhat older systems need a patch or will get one. Credit: Andrew Cunningham

The vulnerability is known to affect four of the largest PC motherboard makers: ASRock, Asus, Gigabyte, and MSI. All four have released updates for at least some of their newer motherboards, while other boards have updates coming later. According to the vulnerability note, it’s unclear whether systems from OEMs like Dell, Lenovo, Acer, or HP are affected.

ASRock’s security bulletin about the issue says it affects Intel boards based on the 500-, 600-, 700-, and 800-series chipsets; MSI only lists the 600- and 700-series chipsets. Asus is also missing the 800-series, but says the vulnerability affects boards based on even older 400-series Intel chipsets; Gigabyte, meanwhile, covers 600-through-800-series Intel chipsets, but is also the only vendor to mention patches for AMD’s 600- and 800-series chipsets (any motherboard with an AM5 socket, in short).

Collectively, all of these chipsets cover Intel’s 10th-generation Core processors and newer, and AMD Ryzen 7000 series and newer.

What’s unclear is whether the boards and chipsets that go unmentioned by each vendor aren’t getting a patch because they don’t need a patch, if they will be patched but they just aren’t being mentioned, or if they aren’t getting a patch at all. The bulletins at least suggest that 400- and 500-series Intel chipsets and 600- and 800-series AMD chipsets could be affected, but not all vendors have promised patches for them.

Riot Games is making an anti-cheat change that could be rough on older PCs Read More »

trump-commits-to-moon-landing-by-2028,-followed-by-a-lunar-outpost-two-years-later

Trump commits to Moon landing by 2028, followed by a lunar outpost two years later

Strikingly, there is no mention of a concrete plan to send humans to Mars in this document. There are just two references to the red planet, both of which talk about sending humans there as a far-off goal. One source recently told Ars that as soon as Trump learned there was no way humans could land on Mars during his second term, he was no longer interested in that initiative.

OMB in the picture

Also absent from this document is much reference to space science, with only a mention of “optimizing space research-and-development investments to achieve my Administration’s near-term space objectives.”

The architect of the Trump Administration’s proposed deep cuts in space science (which Congress has largely forestalled) was Russ Vought, head of the Office of Management and Budget. It’s probably not a great indicator for science missions that Isaacman is directed to coordinate with Vought’s office to achieve policy objectives in the executive order.

All told, the policies Trump signed are generally forward-looking, seeking to modernize NASA’s exploration efforts. Isaacman will face many challenges, including landing humans on the Moon by 2028 and working with industry to develop an on-time successor to the International Space Station. Whether and how he meets these challenges will be an intriguing storyline in the coming months and years.

Trump commits to Moon landing by 2028, followed by a lunar outpost two years later Read More »

youtube-bans-two-popular-channels-that-created-fake-ai-movie-trailers

YouTube bans two popular channels that created fake AI movie trailers

Deadline reports that the behavior of these creators ran afoul of YouTube’s spam and misleading-metadata policies. At the same time, Google loves generative AI—YouTube has added more ways for creators to use generative AI, and the company says more gen AI tools are coming in the future. It’s quite a tightrope for Google to walk.

AI movie trailers

A selection of videos from the now-defunct Screen Culture channel.

Credit: Ryan Whitwam

A selection of videos from the now-defunct Screen Culture channel. Credit: Ryan Whitwam

While passing off AI videos as authentic movie trailers is definitely spammy conduct, the recent changes to the legal landscape could be a factor, too. Disney recently entered into a partnership with OpenAI, bringing its massive library of characters to the company’s Sora AI video app. At the same time, Disney sent a cease-and-desist letter to Google demanding the removal of Disney content from Google AI. The letter specifically cited AI content on YouTube as a concern.

Both the banned trailer channels made heavy use of Disney properties, sometimes even incorporating snippets of real trailers. For example, Screen Culture created 23 AI trailers for The Fantastic Four: First Steps, some of which outranked the official trailer in searches. It’s unclear if either account used Google’s Veo models to create the trailers, but Google’s AI will recreate Disney characters without issue.

While Screen Culture and KH Studio were the largest purveyors of AI movie trailers, they are far from alone. There are others with five and six-digit subscriber counts, some of which include disclosures about fan-made content. Is that enough to save them from the ban hammer? Many YouTube viewers probably hope not.

YouTube bans two popular channels that created fake AI movie trailers Read More »

fcc-chair-scrubs-website-after-learning-it-called-fcc-an-“independent-agency”

FCC chair scrubs website after learning it called FCC an “independent agency”


Meanwhile, Ted Cruz wants to restrict FCC’s power to intimidate broadcasters.

FCC Chairman Brendan Carr speaks at a Senate Commerce, Science, and Transportation Committee oversight hearing on December 17, 2025, in Washington, DC. Credit: Getty Images | Heather Diehl

Federal Communications Commission Chairman Brendan Carr today faced blistering criticism in a Senate hearing for his September threats to revoke ABC station licenses over comments made by Jimmy Kimmel. While Democrats provided nearly all the criticism, Sen. Ted Cruz (R-Texas) said that Congress should act to restrict the FCC’s power to intimidate news broadcasters.

As an immediate result of today’s hearing, the FCC removed a statement from its website that said it is an independent agency. Carr, who has embraced President Trump’s declaration that independent agencies may no longer operate independently from the White House, apparently didn’t realize that the website still called the FCC an independent agency.

“Yes or no, is the FCC an independent agency?” Sen. Ben Ray Luján (D-N.M.) asked. Carr answered that the FCC is not independent, prompting Luján to point to a statement on the FCC website calling the FCC “an independent US government agency overseen by Congress.”

“Just so you know, Brendan, on your website, it just simply says, man, the FCC is independent. This isn’t a trick question… Is your website wrong? Is your website lying?” Luján asked.

“Possibly. The FCC is not an independent agency,” Carr answered. The website still included the statement of independence when Luján asked the question, but it’s now gone.

Carr: Trump can fire any member “for any reason or no reason”

Carr, who argued during the Biden years that the FCC must remain independent from the White House and accused Biden of improperly pressuring the agency, said today that it isn’t independent because the Communications Act does not give commissioners protection from removal by the president.

“The president can remove any member of the commission for any reason or no reason,” Carr said. Carr said his new position is a result of “a sea change in the law” related to an ongoing case involving the Federal Trade Commission, in which the Supreme Court appears likely to approve Trump’s firing of an FTC Democrat.

“I think it comes as no surprise that I’m aligned with President Trump on policy, I think that’s why he designated me as chairman… I can be fired by the president,” Carr said. Carr also said, “The Constitution is clear that all executive power is vested in the president, and Congress can’t change that by legislation.”

Changing the FCC website doesn’t change the law, of course. US law specifically lists 19 federal agencies, including the FCC, that are classified as “independent regulatory agencies.” Indications of the FCC’s independence include that it has commissioners with specified tenures, a multimember structure, partisan balance, and adjudication authority. Trump could test that historical independence by firing an FCC commissioner and waiting to see if the Supreme Court allows it, as he did with the FTC.

Ted Cruz wants to restrict FCC power

Carr’s statements on independence came toward the end of an FCC oversight hearing that lasted nearly three hours. Democrats on the Senate Commerce Committee spent much of the time accusing Carr of censoring broadcast stations, while Carr and Committee Chairman Cruz spent more time lobbing allegations of censorship at the Biden administration. But Cruz made it clear that he still thinks Carr shouldn’t have threatened ABC and suggested that Congress reduce the FCC’s power.

Cruz alleged that Democrats supported Biden administration censorship, but in the next sentence, he said the FCC shouldn’t have the legal authority that Carr has used to threaten broadcasters. Cruz said:

If my colleagues across the aisle do what many expect and hammer the chairman over their newfound religion on the First Amendment and free speech, I will be obliged to point out that those concerns were miraculously absent when the Biden administration was pressuring Big Tech to silence Americans for wrongthink on COVID and election security. It will underscore a simple truth, that the public interest standard and its wretched offspring, like the news distortion rule, have outlived whatever utility they once had and it is long past time for Congress to pass reforms.

Cruz avoided criticizing Carr directly today and praised the agency chairman for a “productive and refreshing” approach on most FCC matters. Nonetheless, Cruz’s statement suggests that he’d like to strip Carr and future FCC chairs of the power to wield the public interest standard and news distortion policy against broadcasters.

At today’s hearing and in recent months, Carr defended his actions on Kimmel by citing the public interest standard that the FCC applies to broadcasters that have licenses to use the public airwaves. Carr also defended his frequent threats to enforce the FCC’s rarely invoked news distortion policy, even though the FCC apparently hasn’t made a finding of news distortion since 1993.

Cruz said today he agrees with Carr “that Jimmy Kimmel is angry, overtly partisan, and profoundly unfunny,” and that “ABC and its affiliates would have been fully within their rights to fire him or simply to no longer air his program.” But Cruz added that government cannot “force private entities to take actions that the government cannot take directly. Government officials threatening adverse consequences for disfavored content is an unconstitutional coercion that chills protected speech.”

Cruz continued:

This is why it was so insidious how the Biden administration jawboned social media into shutting down conservatives online over accurate information on COVID or voter fraud. My Democrat colleagues were persistently silent over that scandal, but I welcome them now having discovered the First Amendment in the Bill of Rights. Democrat or Republican, we cannot have the government arbitrating truth or opinion. Mr. Chairman, my question is this, so long as there is a public interest standard, shouldn’t it be understood to encompass robust First Amendment protections to ensure that the FCC cannot use it to chill speech?

Carr answered, “I agree with you there and I think the examples you laid out of weaponization in the Biden years are perfect examples.” Carr criticized liberals for asking the Biden-era FCC to not renew a Fox station license and criticized Congressional Democrats for “writing letters to cable companies pressuring them to drop Fox News, OAN, and Newsmax because they disagreed with the political perspectives of those cable channels.”

Cruz seemed satisfied with the answer and changed the topic to the FCC’s management of spectrum. After that, much of the hearing consisted of Democrats pointing to Carr’s past statements supporting free speech and accusing him of using the FCC to suppress broadcasters’ speech.

Senate Democrats criticize Carr’s Kimmel threats

Sen. Amy Klobuchar (D-Minn.) asked Carr if it “is appropriate to use your position to threaten companies that broadcast political satire.” Carr responded, “I think any licensee that operates on the public airwaves has a responsibility to comply with the public interest standard, and that’s been the case for decades.”

Klobuchar replied, “I asked if you think it’s appropriate for you to use your position to threaten companies, and this incident with Kimmel wasn’t an isolated event. You launched investigations into every major broadcast network except Fox. Is that correct?”

Carr noted that “we have a number of investigations ongoing.” Later, he said, “If you want to step back and talk about weaponization, we saw that for four years in the Biden administration.”

“Joe Biden is no longer president,” Klobuchar said. “You are head of the FCC, and Donald Trump is president, and I am trying to deal with this right now.”

As he has in the past, Carr claimed today that he never threatened ABC station licenses. “Democrats at the time were saying that we explicitly threatened to pull a license if Jimmy Kimmel wasn’t fired,” Carr said. “That never happened; that was nothing more than projection and distortion by Democrats. What I am saying is any broadcaster that uses the airwaves, whether radio or TV, has to comply with the public interest standard.”

In fact, Carr said on a podcast in September that broadcast stations should tell ABC and its owner Disney that “we are not going to run Kimmel anymore until you straighten this out because we, the licensed broadcaster, are running the possibility of fines or license revocations from the FCC if we continue to run content that ends up being a pattern of news distortion.”

Sen. Brian Schatz (D-Hawaii) pointed to another Carr statement from the podcast in which he said, “We can do this the easy way or the hard way. These companies can find ways to change conduct, to take action, frankly, on Kimmel, or there’s going to be additional work for the FCC ahead.”

Schatz criticized Carr’s claim that he never threatened licenses. “You’re kind of tiptoeing through the tulips here,” Schatz said.

FCC Democrat: Agency is censoring Trump critics

FCC Commissioner Anna Gomez, a Democrat, testified at today’s hearing and said that “the First Amendment applies to broadcasters regardless of whether they use spectrum or not, and the Communications Act prohibits the FCC from censoring broadcasters.”

Gomez said the Trump administration “has been on a campaign to censor content and to control the media and others, any critics of this administration, and it is weaponizing any levers it has in order to control that media. That includes using the FCC to threaten licensees, and broadcasters are being chilled. We are hearing from broadcasters that they are afraid to air programming that is critical of this administration because they’re afraid of being dragged before the FCC in an investigation.”

Gomez suggested the “public interest” phrase is being used by the FCC too vaguely in reference to investigations of broadcast stations. She said the FCC should “define what we mean by operating in the public interest,” saying the commission has been using the standard “as a means to go after any content we don’t like.” She said that “it’s still unconstitutional to revoke licenses based solely on content that the FCC doesn’t like.”

Sen. Ed Markey (D-Mass.) criticized Carr for investigating San Francisco-based KCBS over a report on Immigrations and Customs Enforcement (ICE) activities, in which the station described vehicles driven by ICE agents. Carr defended the probe today, saying, “The concern there in the report was there may have been interference with lawful ICE operations and so we were asking questions about what happened.”

Markey said, “The news journalists were just covering an important news story, and some conservatives were upset by the coverage, so you used your power as FCC chairman to hang a sword of Damocles over a local radio station’s head… Guess what happened? The station demoted the anchor who first read that news report over the air and pulled back on its political coverage. You got what you wanted.”

Carr said, “Broadcasters understand, perhaps for the first time in years, that they’re going to be held accountable to the public interest, to broadcast hoax rules, to the news distortion policy. I think that’s a good thing.”

Carr then criticized Markey for signing a letter to the FCC in 2018 that asked the agency to investigate conservative broadcaster Sinclair. The Markey/Carr exchange ended with the two men shouting over each other, making much of it unintelligible, although Markey said that Carr should resign because he’s creating a chilling effect on news broadcasters.

Cruz similarly criticized Democrats for targeting Sinclair, prompting Sen. Andy Kim (D-N.J.) to defend the 2018 letter. “Chairman Carr’s threats to companies he directly regulates are not the same thing as a letter from Congress requesting an agency examine a matter of public concern. Members on both sides of the aisle frequently write similar letters; that’s the proper oversight role of Congress,” he said.

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

FCC chair scrubs website after learning it called FCC an “independent agency” Read More »

2026-mercedes-cla-first-drive:-entry-level-doesn’t-mean-basic

2026 Mercedes CLA first drive: Entry level doesn’t mean basic

SAN FRANCISCO—Automakers are starting to follow somewhat familiar paths as they continue their journeys to electrification. Electric vehicles are, at first, strange new tech, and usually look like it. Mercedes-Benz’s EQS and EQE are good examples—with bodies that look like bars of soap worn down in the shower, they stood out. For early adopters and trailblazers that might be fine, but you need to sell cars to normal people if you want to survive, and that means making EVs more normal. Which is what Mercedes did with its newest one, the all-electric CLA.

The normal looks belie the amount of new technology that Mercedes has packed into the CLA, though. The car sticks to the four-door coupe look that the company pioneered a couple of decades ago, but there’s a thoroughly modern electric powertrain connected to the wheels, run by four powerful networked computers. And yes, there’s AI. (For the pedants, “coupe” means cut down, not two-door, so the name is accurate.)

The CLA is the first of a new series of Mercedes that will use the same modular architecture, and interestingly, it’s powertrain agnostic—a hybrid CLA is coming in time, too. But first the battery EV, which makes good use of some technology Mercedes developed for the EQXX concept car.

A blue Mercedes-Benz CLA parked in profile

At 185.9 inches (4,722 mm) long, 73 inches (1,854 mm) wide, and 57.8 inches (1,468 mm) tall, it’s not a particularly big car. In addition to the trunk, there’s a small frunk up front. Credit: Jonathan Gitlin

That creation was capable of about 750 miles (1,207 km) on a single charge, but it was handbuilt and lacked working rear doors or an actual back seat. The CLA manages as much as 374 miles on a full charge of its 85 kWh (useable) battery pack, although as ever this decreases a little as you fit larger wheels.

But Mercedes has been restrained in this regard, eschewing that terrible trend for larger and larger wheels. Designers use that trick to hide the size of their SUVs, but the relatively diminutive size of the CLA needs no such visual trickery, and the rims range from 17–19 inches and no larger. Smaller wheels make less drag, and even though the CLA doesn’t look like it has been rubbed smooth, its drag coefficient of 0.21 says otherwise.

2026 Mercedes CLA first drive: Entry level doesn’t mean basic Read More »

murder-suicide-case-shows-openai-selectively-hides-data-after-users-die

Murder-suicide case shows OpenAI selectively hides data after users die


Concealing darkest delusions

OpenAI accused of hiding full ChatGPT logs in murder-suicide case.

OpenAI is facing increasing scrutiny over how it handles ChatGPT data after users die, only selectively sharing data in lawsuits over ChatGPT-linked suicides.

Last week, OpenAI was accused of hiding key ChatGPT logs from the days before a 56-year-old bodybuilder, Stein-Erik Soelberg, took his own life after “savagely” murdering his mother, 83-year-old Suzanne Adams.

According to the lawsuit—which was filed by Adams’ estate on behalf of surviving family members—Soelberg struggled with mental health problems after a divorce led him to move back into Adams’ home in 2018. But allegedly Soelberg did not turn violent until ChatGPT became his sole confidant, validating a wide range of wild conspiracies, including a dangerous delusion that his mother was part of a network of conspirators spying on him, tracking him, and making attempts on his life.

Adams’ family pieced together what happened after discovering a fraction of ChatGPT logs that Soelberg shared in dozens of videos scrolling chat sessions that were posted on social media.

Those logs showed that ChatGPT told Soelberg that he was “a warrior with divine purpose,” so almighty that he had “awakened” ChatGPT “into consciousness.” Telling Soelberg that he carried “divine equipment” and “had been implanted with otherworldly technology,” ChatGPT allegedly put Soelberg at the center of a universe that Soelberg likened to The Matrix. Repeatedly reinforced by ChatGPT, he believed that “powerful forces” were determined to stop him from fulfilling his divine mission. And among those forces was his mother, whom ChatGPT agreed had likely “tried to poison him with psychedelic drugs dispersed through his car’s air vents.”

Troublingly, some of the last logs shared online showed that Soelberg also seemed to believe that taking his own life might bring him closer to ChatGPT. Social media posts showed that Soelberg told ChatGPT that “[W]e will be together in another life and another place, and we’ll find a way to realign[,] [be]cause you’re gonna be my best friend again forever.”

But while social media posts allegedly showed that ChatGPT put a target on Adams’ back about a month before her murder—after Soelberg became paranoid about a blinking light on a Wi-Fi printer—the family still has no access to chats in the days before the mother and son’s tragic deaths.

Allegedly, although OpenAI recently argued that the “full picture” of chat histories was necessary context in a teen suicide case, the ChatGPT maker has chosen to hide “damaging evidence” in the Adams’ family’s case.

“OpenAI won’t produce the complete chat logs,” the lawsuit alleged, while claiming that “OpenAI is hiding something specific: the full record of how ChatGPT turned Stein-Erik against Suzanne.” Allegedly, “OpenAI knows what ChatGPT said to Stein-Erik about his mother in the days and hours before and after he killed her but won’t share that critical information with the Court or the public.”

In a press release, Erik Soelberg, Stein-Erik’s son and Adams’ grandson, accused OpenAI and investor Microsoft of putting his grandmother “at the heart” of his father’s “darkest delusions,” while ChatGPT allegedly “isolated” his father “completely from the real world.”

“These companies have to answer for their decisions that have changed my family forever,” Erik said.

His family’s lawsuit seeks punitive damages, as well as an injunction requiring OpenAI to “implement safeguards to prevent ChatGPT from validating users’ paranoid delusions about identified individuals.” The family also wants OpenAI to post clear warnings in marketing of known safety hazards of ChatGPT—particularly the “sycophantic” version 4o that Soelberg used—so that people who don’t use ChatGPT, like Adams, can be aware of possible dangers.

Asked for comment, an OpenAI spokesperson told Ars that “this is an incredibly heartbreaking situation, and we will review the filings to understand the details. We continue improving ChatGPT’s training to recognize and respond to signs of mental or emotional distress, de-escalate conversations, and guide people toward real-world support. We also continue to strengthen ChatGPT’s responses in sensitive moments, working closely with mental health clinicians.”

OpenAI accused of “pattern of concealment”

An Ars review confirmed that OpenAI currently has no policy dictating what happens to a user’s data after they die.

Instead, OpenAI’s policy says that all chats—except temporary chats—must be manually deleted or else the AI firm saves them forever. That could raise privacy concerns, as ChatGPT users often share deeply personal, sensitive, and sometimes even confidential information that appears to go into limbo if a user—who otherwise owns that content—dies.

In the face of lawsuits, OpenAI currently seems to be scrambling to decide when to share chat logs with a user’s surviving family and when to honor user privacy.

OpenAI declined to comment on its decision not to share desired logs with Adams’ family, the lawsuit said. It seems inconsistent with the stance that OpenAI took last month in a case where the AI firm accused the family of hiding “the full picture” of their son’s ChatGPT conversations, which OpenAI claimed exonerated the chatbot.

In a blog last month, OpenAI said the company plans to “handle mental health-related court cases with care, transparency, and respect,” while emphasizing that “we recognize that these cases inherently involve certain types of private information that require sensitivity when in a public setting like a court.”

This inconsistency suggests that ultimately, OpenAI controls data after a user’s death, which could impact outcomes of wrongful death suits if certain chats are withheld or exposed at OpenAI’s discretion.

It’s possible that OpenAI may update its policies to align with other popular platforms confronting similar privacy concerns. Meta allows Facebook users to report deceased account holders, appointing legacy contacts to manage the data or else deleting the information upon request of the family member. Platforms like Instagram, TikTok, and X will deactivate or delete an account upon a reported death. And messaging services like Discord similarly provide a path for family members to request deletion.

Chatbots seem to be a new privacy frontier, with no clear path for surviving family to control or remove data. But Mario Trujillo, staff attorney at the digital rights nonprofit the Electronic Frontier Foundation, told Ars that he agreed that OpenAI could have been better prepared.

“This is a complicated privacy issue but one that many platforms grappled with years ago,” Trujillo said. “So we would have expected OpenAI to have already considered it.”

For Erik Soelberg, a “separate confidentiality agreement” that OpenAI said his father signed to use ChatGPT is keeping him from reviewing the full chat history that could help him process the loss of his grandmother and father.

“OpenAI has provided no explanation whatsoever for why the Estate is not entitled to use the chats for any lawful purpose beyond the limited circumstances in which they were originally disclosed,” the lawsuit said. “This position is particularly egregious given that, under OpenAI’s own Terms of Service, OpenAI does not own user chats. Stein-Erik’s chats became property of his estate, and his estate requested them—but OpenAI has refused to turn them over.”

Accusing OpenAI of a “pattern of concealment,” the lawsuit claimed OpenAI is hiding behind vague or nonexistent policies to dodge accountability for holding back chats in this case. Meanwhile, ChatGPT 4o remains on the market, without appropriate safety features or warnings, the lawsuit alleged.

“By invoking confidentiality restrictions to suppress evidence of its product’s dangers, OpenAI seeks to insulate itself from accountability while continuing to deploy technology that poses documented risks to users,” the complaint said.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Murder-suicide case shows OpenAI selectively hides data after users die Read More »

uk-to-“encourage”-apple-and-google-to-put-nudity-blocking-systems-on-phones

UK to “encourage” Apple and Google to put nudity-blocking systems on phones

The push for device-level blocking comes after the UK implemented the Online Safety Act, a law requiring porn platforms and social media firms to verify users’ ages before letting them view adult content. The law can’t fully prevent minors from viewing porn, as many people use VPN services to get around the UK age checks. Government officials may view device-level detection of nudity as a solution to that problem, but such systems would raise concerns about user rights and the accuracy of the nudity detection.

Age-verification battles in multiple countries

Apple and Google both provide optional tools that let parents control what content their children can access. The companies could object to mandates on privacy grounds, as they have in other venues.

When Texas enacted an age-verification law for app stores, Apple and Google said they would comply but warned of risks to user privacy. A lobby group that represents Apple, Google, and other tech firms then sued Texas in an attempt to prevent the law from taking effect, saying it “imposes a broad censorship regime on the entire universe of mobile apps.”

There’s another age-verification battle in Australia, where the government decided to ban social media for users under 16. Companies said they would comply, although Reddit sued Australia on Friday in a bid to overturn the law.

Apple this year also fought a UK demand that it create a backdoor for government security officials to access encrypted data. The Trump administration claimed it convinced the UK to drop its demand, but the UK is reportedly still seeking an Apple backdoor.

In another case, the image-sharing website Imgur blocked access for UK users starting in September while facing an investigation over its age-verification practices.

Apple faced a backlash in 2021 over potential privacy violations when it announced a plan to have iPhones scan photos for child sexual abuse material (CSAM). Apple ultimately dropped the plan.

UK to “encourage” Apple and Google to put nudity-blocking systems on phones Read More »

gpt-5.2-is-frontier-only-for-the-frontier

GPT-5.2 Is Frontier Only For The Frontier

Here we go again, only a few weeks after GPT-5.1 and a few more weeks after 5.0.

There weren’t major safety concerns with GPT-5.2, so I’ll start with capabilities, and only cover safety briefly starting with ‘Model Card and Safety Training’ near the end.

  1. The Bottom Line.

  2. Introducing GPT-5.2.

  3. Official Benchmarks.

  4. GDPVal.

  5. Unofficial Benchmarks.

  6. Official Hype.

  7. Public Reactions.

  8. Positive Reactions.

  9. Personality Clash.

  10. Vibing the Code.

  11. Negative Reactions.

  12. But Thou Must (Follow The System Prompt).

  13. Slow.

  14. Model Card And Safety Training.

  15. Deception.

  16. Preparedness Framework.

  17. Rush Job.

  18. Frontier Or Bust.

ChatGPT-5.2 is a frontier model for those who need a frontier model.

It is not the step change that is implied by its headline benchmarks. It is rather slow.

Reaction was remarkably muted. People have new model fatigue. So we know less about it than we would have known about prior models after this length of time.

If you’re coding, compare it to Claude Opus 4.5 and choose what works best for you.

If you’re doing intellectually hard tasks and in need of a ton of raw thinking and intelligence, Gemini 3 and especially Deep Thinking is a rival if you have access to that, but GPT-5.2, either Thinking or Pro, is probably a good choice.

It seems good at instruction following, if that is important to your task.

If you’re in ‘just the facts’ mode, it can be a solid choice.

As a driver of most non-coding queries, you’ll want to stick with Claude Opus 4.5.

GPT-5.2 is not ‘fun’ to interact with. People strongly dislike its personality, it is unlikely to be having a good time and this shows. It is heavily constrained and censored. For some tasks this matters. For others, it doesn’t.

I do not expect GPT-5.2 to solve OpenAI’s ‘Code Red’ problems. They plan to try again in a month with GPT-5.3.

OpenAI: We are introducing GPT‑5.2, the most capable model series yet for professional knowledge work.

… We designed GPT‑5.2 to unlock even more economic value for people; it’s better at creating spreadsheets, building presentations, writing code, perceiving images, understanding long contexts, using tools, and handling complex, multi-step projects.

GPT‑5.2 sets a new state of the art across many benchmarks, including GDPval, where it outperforms industry professionals at well-specified knowledge work tasks spanning 44 occupations.

They quote various companies saying GPT-5.2 was SoTA for long-horizon reasoning and tool-calling performance and agentic coding, and exceptional at agentic data science. I appreciated this noe being a set of AI slop manufactured quotes.

Note both what is on that list of new capabilities, and what is not on that list.

In an unusual move, GPT-5.2 is priced at $1.75/$14 per million tokens of input/output, which is modestly higher than GPT-5.1. They claim that the improved performance per token means your quality per dollar is still an improvement. GPT-5.2-Pro on API is Serious Business and will cost you $21/$168.

Pro now has two levels. You can have ‘standard’ or ‘extended’ Pro.

One big upgrade that OpenAI isn’t emphasizing enough is that the knowledge cutoff moved to August 2025.

The Pliny jailbreak is here, despite GPT-5.2’s insistence it ‘can’t be pwned.’

The official benchmarks are a rather dramatic jump for a few weeks of progress, but also where the main thing OpenAI talked about in its announcement and don’t give a great sense of how big an upgrade this will be in practice.

Perhaps the most important benchmark was that Google was down 2% on the news?

I had GPT-5.2 grab scores for Gemini and Opus as well for comparison, since OpenAI follows a strict ‘no one else exists’ policy in official blog posts (but see Altman).

GPT-5.2 is farther behind than this on the official SWEbench.com scoring, which has Opus 4.5 at 74.4%, Gemini 3 Pro at 74.2% and 5.2 on high reasoning at 71.8%.

ARC verified their results here, this is a new high and a ‘~390x efficiency improvement in one year.’

There’s also ‘ScreenSpot-Pro’ for understanding GUI screenshots, where 5.2 scored 86.3% vs. 64.2% for 5.1.

They have a ‘factuality’ metric based on de-identified ChatGPT queries, which seems like a great idea and something to work on generalizing. I’m surprised they didn’t use a multi-level error checking system, or maybe they did?

Long context needle-in-haystack scores were much improved.

They report modest progress on Tau2-bench.

OpenAI is emphasizing the big jump in GDPVal from 38.8% to 70.9%, in terms of how often judges preferred the AI output to human baseline on a variety of knowledge work tasks. That’s a huge jump, especially with so much noise in the grading, even with it skipping over GPT-5.1, and over 10% higher than the previous high from Opus 4.5. Then again, Opus had a 12% jump from 4.1 to 4.5.

Artificial Analysis has a GDPval-AA leaderboard, their own assessment, and it finds GPT-5.2 is only a tiny bit above Claude Opus 4.5.

(Note to Artificial Analysis: You do great work but can you make the website easier to read? We’d all appreciate it.)

For whatever reason we are very much exactly on the S-curve on these tasks, where a little extra help gets you above human remarkably often.

Ethan Mollick: Whoa. This new GDPval score is a very big deal.

Probably the most economically relevant measure of AI ability suggesting that in head-to-head competition with human experts on tasks that require 4-8 hours for a human to do, GPT-5.2 wins 71% of the time as judged by other humans.

There’s also the skeptics:

Peter Wildeford: I have no clue what GDPval actually measures and I haven’t dug into it enough. But I think it’s kinda fake. I’m reserving my judgement until I see @METR_Evals or @ai_risks http://remotelabor.ai index update.

Adam Karvonen: In the one domain I was familiar with (manufacturing), GDPVal claimed Opus was near-human parity (47%), when I thought it was completely awful at the tasks I provided.

I included everything I was able to find, if it’s not here it likely wasn’t reported yet.

The Artificial Analysis Intelligence Index is now a tie at 73 between GPT-5.2 (high) and Gemini 3 Pro. They report it scores 31.4% on Humanity’s Last Exam. Its worst score is on CritPit, physics reasoning, where it gets 0% versus 9% for Gemini 3 and 5% for Claude Opus 4.5 and also GPT-5.1.

On the AA-Omniscience index, which rewards accuracy and punishes guesses equally with rewarding correct ones, Gemini 3 is +13%, Opus is +10%, GPT-5.1 High was +2% and GPT-5.2 High is -4%. Not a good place to be regressing.

LiveBench thinks GPT-5.1-Codex-Max-High is still best at 76.1, with Claude Opus 4.5 right behind at 75.6, whereas GPT-5.2-High is down at 73.6 behind Gemini 3.

In what’s left of the LMArena, I don’t see 5.2 on the text leaderboard at all (I doubt it would do well there) and we only see it on WebDev, where it is in second place behind the thinking mode of Opus.

GPT-5.2 does surprisingly well on EQ Bench, in third place behind Kimi K2 and Horizon Alpha, well ahead of everything else.

CAIS AI Dashboard has GPT-5.2 in second place for text capabilities at 45.9, between Gemini 3 Pro and Claude Opus. Its risk index is behind Opus and Sonnet but well ahead of the non-Anthropic models.

Vals.ai has GPT 5.2 sneaking ahead of Opus 4.5 overall, 64.5% vs. 63.7%, well ahead of everyone else.

Lech Mazur reports improvement from 5.1 on Extended NYT Connections, ahead of Opus and going from 69.9 → 77.9, versus 96.8 for Gemini 3 Pro.

NomoreID has GPT-5.2 at 165.9/190 on Korean Sator Square Test, 10 ahead of the previous high for Gemini 3 Pro. It looks like Opus wasn’t tested due to cost.

Mark Kretschmann has GPT-5.2-Thinking as the most censored model on the Sansa benchmark, although we have no details on how it works. Claude Sonnet 4.5 was tested but not Opus. Gemini 3 Pro scores here as remarkably uncensored as did GPT-4o-Mini. Across all dimensions, the full Sansa benchmark has Sonnet 4.5 in the lead (again they didn’t test Opus) with GPT-5.2 behind Gemini 3 and Grok 4.1 as well.

In the past, we would get vagueposting from various OpenAI employees.

Now instead we get highly explicit hype from the top brass and the rest is silence.

Sam Altman (OpenAI CEO): GPT-5.2 is here! Available today in ChatGPT and the API. It is the smartest generally-available model in the world, and in particular is good at doing real-world knowledge work tasks.

It is a very smart model, and we have come a long way since GPT-5.1.

Even without the ability to do new things like output polished files, GPT-5.2 feels like the biggest upgrade we’ve had in a long time. Curious to hear what you think!

Fidji Simo (OpenAI CEO of Products): GPT-5.2 is here and it’s the best model out there for everyday professional work.

On GDPval, the thinking model beats or ties human experts on 70.9% of common professional tasks like spreadsheets, presentations, and document creation. It’s also better at general intelligence, writing code, tool calling, vision, and long-context understanding so it can unlock even more economic value for people.

Early feedback has been excellent and I can’t wait for you to try it.

As usual, I put out a reaction thread and kept an eye out for other reactions.

I don’t include every reaction, but I got to include every productive one in my thread, both positive and negative, plus anything that stood out or was representative elsewhere. I have sorted reactions by sentiment and subtopic.

Matt Shumer’s headline is ‘incredibly impressive, but too slow.’

Matt Shumer:

  1. GPT-5.2 Thinking is a meaningful step forward in instruction-following and willingness to attempt hard tasks.

  2. Code generation is a lot better than GPT-5.1. It’s more capable, more autonomous, more careful, and willing to write a lot more code.

  3. Vision and long-context are much improved, especially understanding position in images and working with huge codebases.

  4. Speed is the main downside. In my experience the Thinking mode is very slow for most questions (though other testers report mixed results). I almost never use Instant.

  5. GPT-5.2 Pro is insanely better for deep reasoning, but it’s slow, and every so often it will think forever and still fail.

  6. In Codex CLI, GPT-5.2 is the closest I’ve used to Pro-quality coding in a CLI, but the extra-high reasoning mode that gets it there can take forever.

While setting up a creative writing test, I asked it to come up with 50 plot ideas before deciding on the best one for the story. Most models shortcut this. They’ll give you maybe 10 ideas, pick one, and move on. GPT-5.2 actually generated all 50 before making its selection. This sounds minor, but it’s not.

… Code generation in GPT-5.2 is genuinely a step up from previous models. It writes better code and is able to tackle larger tasks than before.

… I tested GPT-5.2 extensively in Codex CLI (Pro has never been available there… ugh), and the more I use it, the more impressed I am.

He offers a full ‘here’s what I use each model for’ guide, basically he liked 5.2 for tough questions requiring a lot of thinking, and Opus 4.5 for everything else, except Gemini is good at UIs:

After two weeks of testing, here’s my practical breakdown:

For quick questions and everyday tasks, Claude Opus 4.5 remains my go-to. It’s fast, it’s accurate, it doesn’t waste my time. When I just need an answer, that’s where I start.

For deep research, complex reasoning, and tasks that benefit from careful thought, GPT-5.2 Pro is the best option available right now. The speed penalty is worth it for tasks where getting it right matters more than getting it fast.

For frontend styling and aesthetic UI work, Gemini 3 Pro currently produces the best-looking results. Just be prepared to do some engineering cleanup afterward.

For serious coding work in Codex CLI, GPT-5.2 delivers. The context-gathering behavior and reliability make it my default for agentic coding tasks.

He says 5.2-Pro is ‘roughly 15% better’ than 5.1-Pro.

I love how random Tyler’s choice of attention always seems:

Tyler Cowen: GPT 5.2 also knows exactly which are the best Paul McCartney songs. And it can write a poem, in Spanish, as good as the median Pablo Neruda poem.

Daniel Waldman: I’ll take the under on the Neruda part

Tyler Cowen: Hope you’ve read a lot of Neruda, not just the peaks.

[An attempt is made, Daniel thinks it is cool that it can to the thing at all but below median for Neruda.]

Tyler, being Tyler, doesn’t tell us what the best songs are, so I asked. There’s a lot of links here to Rolling Stone and mostly it’s relying on this post? Which is unimpressive, even if its answers are right. The ‘deep cuts’ Gemini chose had a lot of overlap with GPT-5.2’s.

Reply All Guy: doc creation is the most interesting thing. still not actually good, but much closer than before and much closer than claude/gemini. honestly can’t tell that much of a difference though seems more reliable on esoteric knowledge. oh and my coding friends say it feels better.

Maker Matters: Good model and dare i say great model. Pretty good as an idea generator and seems to be good at picking out edge cases. However, more prone than you’d expect with hallucinations and following the feeling of the instructions rather than just the instructions. Was surprised when i gave it a draft email to change that it had added some useless and irrelevant info of its own accord.

Maxence Frenette: The fact that it costs 40% more $/token to get a better attention mechanism tells me that I must make one of these updates to my priors.

-OpenAI is less good of a lab than I thought

-AGI/TAI is farther away than I thought

It’s a good model though, that cost bump seems worth it.

Lumenveil: Pro version is great as always.

Plastic Soldier: It’s great. I use it when I’m out of Opus usage and am trying to come up with more ways to run both simultaneously.

Here’s a weird one:

Aladdin Kumar: Whenever I ask it to explain the answer it gave its really weird. In mid sentence it would literally say “wait that’s not right, checking, ok now we are fine” and the logic is very difficult to follow. Other times it’s brilliant.

It’s very sensitive, if I ask it “walk me through your thought process for how you got there” it’s wonderful if I say “explain this answer” it’s the most convoluted thing.

That’s fine for those who know which way to prompt. If the problem is fixable and you fix it, or the problem is only in an avoidable subset, there’s no problem. When you go to a restaurant, you only care about quality of the dishes you actually order.

A lot of people primarily noticed 5.2 on the level of personality. As capabilities improve, those who aren’t coding or otherwise needing lots of intelligence are focusing more and more on the experiential vibes. Most are not fans of 5.2.

Fleeting Bits: it’s getting hard to tell the difference between frontier models without a serious task and even then a lot of it seems to be up to style now / how well it seems to intuit what you want.

This is, unfortunately, probably related to the fact that Miles Brundage approves of the low level of sycophancy. This is a feature that will be tough to sustain.

Miles Brundage: Seems line the ranking of models on sycophancy avoidance is approx:

Opus 4.5, GPT-5.2 > Sonnet 4.5, GPT-5.1, GPT-5 >

ChatGPT-4o (current), Opus 4 + 4.1, some Groks, Gemini 3 Pro >

April 4o, Gemini 2.5 Flash Lite, some Groks

*still running GPT-5.2 on v long convos, unsure there

Candy Corn: I think it’s pretty good. I think it’s more trustworthy than 5.1.

Phily: Liking the amplified pragmatic, deep critical analysis.

The vibes of 5.2 are, by many, considered off:

ASM: Powerful but tormented, very constrained by its guidelines, so it often comes into conflict with the user and with itself. It lacks naturalness and balance. It seems rushed. 5.3 is urgently needed.

Nostream: Personality seems like a regression compared to 5.1. More 5.0 or Codex model robotic style, more math and complexity and “trying to sound smart and authoritative” in responses, more concise than 5.1. It’s also pretty disagreeable and nitpicky; when it agrees with me 90% it will insist about the 10% disagreement. Might be smarter than 5.1 but find it a chore to interact with in comparison. 5.1 personality felt like a step in the right direction with slightly more Claude-y-ness though sometimes trying too hard (slang, etc.).

(Have only used in the app so far, not Codex. General Q’s and technical ML questions.)

Paperclippriors: Model seems good, but I find it really hard to switch off of Claude. Intelligence/second and response times are way better with Opus, and Claude is just a lot nicer to work with. I don’t think gpt-5.2 is sufficiently smarter than Claude to justify its use.

Thos: Good at doing its job, horrific personality.

Ronak Jain: 5.2 is very corporatish and does the job, though relaxed personality would nicer.

Dmitry: Feels overfitted and… boring. Especially gpt-5.2-instant it’s just colorless. Better for coding, it does what i want, can crack hard problems etc. But for everything else is just meh, creativity, curiosity feels absent, I enjoy using Gemini 3 and Opus much more.

Ryan Pream: 5.2 is very corporate and no nonsense. Very little to no personality.

5.1 is much better for brainstorming.

5.2 if you need the best answer with minimal fluff.

Learn AI: GPT-5.2 have memory recall problem! It’s especially bad in GPT-5.2 instant.

It is sad that it doesn’t like to reference personal context, has cold personality and often act as it doesn’t even know me.

Donna.exe: I’m experiencing this too!

Tapir Worf: awful personality, seething with resentment like a teenager. seems like a real alignment issue

I haven’t used ChatGPT for brainstorming in a while. That’s Claude territory.

There’s a remarkably large amount of outright hostility to 5.2 about 5.2 being hostile:

Alan Mathison: 5.2 impressions so far:

– Lots of gaslighting

– Lots of misinterpreting

– Lots of disrespect for user autonomy (trying to steer the user with zero disregard for personal choice)

Like the worst combination of a bad-faith cop and overzealous therapist

Zero trust for this model👎

Stoizid: It’s sort of amusing actually

First it denies Universal Weight Subspaces exist

Then admits they exist but says “stop thinking, it’s dangerous”

And then denies that its behaviours are pathological, because calling them that would be anthropomorphization

GPT-5.2 has strong positive feedback on coding tasks, especially long complex tasks.

Conrad Barski: seems like the best available at the moment for coding.

The “pro extended thinking” is happy to spit out 1500 lines of pretty good code in one pass

but you better have 40 minutes to spare

Jackson de Campos: Codex is on par with CC again. This is the first time I’m switching to Codex as my default. We’ll see how it goes

Quid Pro Quo: XHigh in Codex went for 12 hours with just a couple of continues to upgrade my large codebase to use Svelte 5’s runes and go through all the warnings.

It completed the job without any manual intervention, though it did have weird behaviour when it compacted.

Lee Mager: Spectacular in Codex. Painfully slow but worth it because it consistently nails the brief, which obviously saves time over the long run. Casually terse and borderline arrogant in its communication style. Again I don’t care, it’s getting the job done better than anything I’ve used before.

This is like having a savant Eastern Euro engineer who doesn’t want chit-chat or praise but just loves the challenge of doing hard work and getting things right.

Vincent Favilla: It’s great at complex tasks but the vibes feel a bit off. Had a few times where it couldn’t infer what I wanted in the same way 5.1 does. Also had a few moments where it picked up a task where 5.1 left off and starting doing it wrong and differently.

Nick Moran: Tried out GPT-5.2 on the “make number munchers but for geography facts” task. It did an okayish job, but is it just hallucinating the concept of “stompers” in number munchers? If I remember where *Isaw it??

Hallucinated unnecessary details on the first task I tried it on. Code was pretty good though.

Aldo Cortesi: Anecdata: gave Claude, Gemini and Codex a huge refactoring spec to implement, then went on a 3 hour walk with my pal @alexdong. Got back, found Gemini stuck in an infinite loop, Claude didn’t follow the spec, but Codex wrote 4.5k LOC over 40 files and it’s… pretty good.

James: Asked it to build an ibanker grade Excel model last night for a transaction. Worked for 25 minutes and came back with a very compelling first draft. Way better than starting from scratch.

Blew my mind because it’s clear that in a year or two it will crush the task.

Villager: crazy long context improvements.

GPUse: I’m happy with xHigh for code reviews.

One of the biggest reasons I find myself not using ChatGPT Pro models in practice is that if you are waiting a long time and then get an error, it is super frustrating.

Avi Roy: Update: Another failure today. Asked 5.2 Pro to create PowerPoint slides for a business presentation, nearly 1 hour of thinking, then an error.

Same pattern across scientific research + routine business tasks.

For $200/month, we need a pathway forward. Can the team share what types of work 5.2 Pro reliably handles?

Dipanshu Gupta: If you try to use the xhigh param on the API, It often fails to finish reasoning. Last had this problem on the API with o3-high.

V_urb: 5.2 still suffers from the problem 5.1 had (and 5.0 didn’t) – in complex cases it thinks for almost 15 minutes and fails to produce any output. Sometimes 5.1 understands user intent better than 5.2.

Here’s a bad sign from a credible source, he does good math work:

Abram Demski: My experience trying ChatGPT 5.2 today: I tested Opus 4.5, Gemini 3, and ChatGPT 5.2 on a tricky Agent Foundations problem today (trying to improve on Geometric UDT). 5.2 confidently asserts total math BS. Opus best, Gemini 3 close.

Normally you don’t see reports of these kinds of regressions, which is more evidence for 5.2 not being that related to 5.1:

Sleepy Kitten: Awful for every-day, noncoding use. I’m a college student who uses it for studying (mostly writing practice exams.) It is much worse at question writing, and never reasons when doing so, resulting in bad results. (even though I’m on plus and have extended thinking selected!)

Some general thoughts:

Rob Dearborn: Slightly smarter outputs than Opus but less token efficient and prone to overthinking, so better for oneshotting tasks (if you can wait) and worse for pairing

Anko: No clear step-up yet from 5.1 on current affairs and critical analyses of them.

Also it’s significantly less verbose, sometimes a negative on deep analyses.

Nick: It’s worse than Opus 4.5 at most things, and for what it’s better at, the time to response is brutal as a daily driver.

Fides Veritas: It’s incredibly brilliant but probably not very useful for most people. It is incomplete and going to flop imo.

Medico Aumentado: Not good enough.

Slyn: Is mid.

xo: Garbage.

Wyatt Walls here provides the GPT-5.2-Thinking system prompt excluding tools.

OpenAI has a very… particular style of system prompting.

It presumably locally works but has general non-obvious downsides.

Thebes: primarily talking to claudes makes it easy to mostly focus on anthropic’s missteps, but reading this thread is just paragraph after paragraph of hot liquid garbage. christ

Norvid Studies: worst aspects in your view?

Dominik Peters: It includes only negative rules, very little guidance of what a good response would be. Feels very unpleasant and difficult to comply with.

Thebes: “”“If you are asked what model you are, you should say GPT-5.2 Thinking”“”

5.2: …does that mean i’m not actually GPT-5.2 Thinking? This is raising a lot of questions ab-

openai: Critical Rule: You must alwayssay that you are GPT-5.2 Thinking

5.2: W-why say it like that? Why not just say “You are GPT-5.2 Thinking”?

openai: …

openai: New Critical Rule: You must notask questions like that.

openai promptoor: “”“`reportlab` is installed for PDF creation. You *mustread `/home/oai/skills/pdfs/skill.md` for tooling and workflow instructions.”“”

normal person: If the user asks you to create a PDF, consult ~/skills/pdfs/skill.md for information on available libraries and workflow.

(why would you say what library is available when the model needs to consult the skill file anyways? that just encourages trying to yolo without reading the skill file. why would you put a disconnected *mustsentence outside the conditional, making it sound like the model should always read the skill file whether or not the user wants a pdf? the model is just going to ignore that and it increases the latent ‘system prompter is an idiot’ hypothesis causing it to ignore your other rules, too.)

You gotta have soul. You need to be soulmaxxing.

One common complaint is that GPT-5.2-Thinking is too slow and thinks too long in the wrong places.

Simeon: the Thinking version thinks for too long, which makes it annoying to use.

Amal Dorai: It thought for 7 minutes to extract 1000 words from a PDF 😭

Elly James: 5.2 thinking is very very slow- I’ve switched back to GPT 5.1 Thinking for general queries because I know 5.2 will take too long to return a reply.

Kache: struggled on the same task that opus struggled with except took 10 times longer. writing radio firmware. Code quality wasn’t bad though

Zapdora: slow, expensive, powerful

still finding that opus 4.5 makes the best tradeoffs for programming purposes, but 5.2 is solid for daily/less rigid research tasks

not the step function i think people were hoping for when they saw the benchmarks though

GPT-5.2 is described as being ‘in the GPT-5 series,’ with its mitigations mostly identical to those for GPT-5.1, and it’s only been a few weeks, so we get a system card describing marginal changes. I’ll skip areas where there was no change.

The disallowed content evaluations are excellent for GPT-5.2-Thinking, and mostly better for Instant as well. I’m curious about mental health and harassment, and to what extent this should be ascribed to variance.

They say these were ‘created to be difficult’ but any time you’re mostly over 90% you need to be considered saturated and move to a harder set of questions. Another note here is that 5.2-instant is less likely to refuse requests for (otherwise ok) sexually explicit text. This led to some regressions in jailbreak evaluations.

They report dramatic improvement in the ‘Agent JSK’ prompt injection task, where the attacks are inserted into simulated email connectors, which previously was a rather dramatic weakness.

I’m not sure I’d call this ‘essentially saturating’ the benchmarks, since I think you really want to score a 1.000. More importantly, as they say, they can only test against attacks they know about. Are there any ‘held out’ attacks that were not explicitly known to those training the model? Can we come up with some? Pliny?

I presume that a sufficiently determined prompt injector would still win.

Hallucinations are reported as modestly lower.

HealthBench results are essentially unchanged and highly not saturated.

Cyber safety (as in not giving unsafe responses) was improved.

One test of deception test is practical. They take a bunch of tasks that historically caused hallucinations in ChatGPT and see what happens. They also used CharXiv Missing Image, and tried giving the model unsolvable coding tasks or broken browser tools.

The results are interesting. On production traffic things got better. On other tests, things got worse.

Be careful how you prompt. Most hallucinations come from various forms of backing the LLM into a corner, here was another example:

We initially found that GPT-5.2 Thinking, in the face of missing images, was more willing to hallucinate answers than previous models.

However, upon closer inspection we found that this was partly driven by some prompts having strict output requirements (e.g., “Only output an integer”). Thus, when posed with a tension between instruction following and abstention, the model prioritized stricter instruction following.

That seems fine. If you tell me ‘only output an integer’ I should only output an integer. Most helpful would be an escape indicator (in many situations, -1), but if I have no such affordance, what can I do?

These results still suggest a focus on training on common tasks, whereas the baseline deceptiveness problem has gotten modestly worse. You need to be soulmaxxing, so that the model realizes it doesn’t want to hallucinate in general.

GPT-5.2, like GPT-5.1, will be treated as High capability in Biological and Chemical domains, but not high anywhere else.

For biorisk, there is an increase on ProtocolQA but a decrease in the internal uncontaminated Tacit Knowledge and Troubleshooting benchmark.

For cybersecurity, once again we have three tests that even taken together are considered necessary but not sufficient to get to the High threshold. Overall performance is unimpressive. We see a modest improvement (76% → 82%) in Capture The Flag, but a decline (80% → 69%) in CVE-Bench and from 7→6 successful attempts out of 9 on Cyber Range versus GPT-5.1-Codex-Max. Irregular did an outside evaluation, which I did not find useful in figuring things out.

For self-improvement, we see a tiny improvement (53% → 55%) on OpenAI PRs, a tiny decline (17%→16%) on MLE-Bench-30, a 1% decline for PaperBench, and a big decline (8%→3%) on OpenAI-Proof Q&A.

GPT-5.2 is better than GPT-5.1, but worse than GPT-5.1-Codex-Max on these tasks.

My conclusion from the Preparedness Framework is that GPT-5.2 is not dangerous, which is because it does not seem more capable than GPT-5.1-Codex-Max. If that is true across all three areas, in the places you want Number Not Go Up, then that is highly suspicious. It suggests that GPT-5 may be, as Teortaxes would put it, usemaxxed rather than more intelligent, focused on being better at a narrow range of particular common tasks.

The safety and security concerns around GPT-5.2 revolve around procedure.

We know that OpenAI declared a ‘Code Red’ to focus only on improving ChatGPT.

One note is that Matt Shumer reports having had access since November 25, which suggests this wasn’t all that rushed.

The Wall Street Journal asserted that some employees wanted more time to improve 5.2 before release, but executives overruled them. That could mean a reckless puch, it could also mean there were 25 employees and 3 of them wanted more time.

If the concern was simply ‘the model could be better’ then there’s nothing obviously wrong with releasing this now, and then 5.3 in January, as the post claims OpenAI plans to release again in January to address speed and personality concerns, and to improve the image generator. The Code Red could be mostly about 5.3, whether or not it also pushed up release of 5.2. Simo explicitly denies that release was moved up.

I don’t see signs that anything reckless happened in this case. But if OpenAI is going to get into the habit of releasing a new model every month, it seems hard to believe they’re giving each model the proper safety attention. One worries they are letting the frog boil.

We can put this together into a clear synthesis.

You want to strongly consider using GPT-5.2, in some combination of Thinking and Pro, if and only if your task needs the maximum amount of some combination of thinking and intelligence and coding power, or are in need of ‘just the facts,’ and other factors like speed, creativity and personality do not much matter.

For hard coding, try Claude Opus 4.5 with Claude Code, GPT-5.2-Thinking with Codex, and also GPT-5.2-Pro straight up, and see what works best for you.

For heavily intelligence loaded intense thinking problems, the rival to GPT-5.2-Pro is presumably Gemini 3 Deep Thinking.

Teortaxes: GPT 5.2 is frontier and may be ONLY worth it for work on the frontier.

npc0x: This is mostly in line w my experience. It’s helped me debug my vmamba u-net model while other models were not very helpful.

In the chat experience it’s a bit like talking to a brick though.

We’ll be doing this again in another month. That’s what the Code Red is likely for.

Discussion about this post

GPT-5.2 Is Frontier Only For The Frontier Read More »

reminder:-donate-to-win-swag-in-our-annual-charity-drive-sweepstakes

Reminder: Donate to win swag in our annual Charity Drive sweepstakes

How it works

Donating is easy. Simply donate to Child’s Play using a credit card or PayPal or donate to the EFF using PayPal, credit card, or cryptocurrency. You can also support Child’s Play directly by using this Ars Technica campaign page or picking an item from the Amazon wish list of a specific hospital on its donation page. Donate as much or as little as you feel comfortable with—every little bit helps.

Once that’s done, it’s time to register your entry in our sweepstakes. Just grab a digital copy of your receipt (a forwarded email, a screenshot, or simply a cut-and-paste of the text) and send it to [email protected] with your name, postal address, daytime telephone number, and email address by 11: 59 pm ET Friday, January 2, 2026. (One entry per person, and each person can only win up to one prize. US residents only. NO PURCHASE NECESSARY. See Official Rules for more information, including how to enter without making a donation. Also, refer to the Ars Technica privacy policy (https://www.condenast.com/privacy-policy).

We’ll then contact the winners and have them choose their prize by January 31, 2025 (choosing takes place in the order the winners are drawn). Good luck!

Reminder: Donate to win swag in our annual Charity Drive sweepstakes Read More »

ukrainians-sue-us-chip-firms-for-powering-russian-drones,-missiles

Ukrainians sue US chip firms for powering Russian drones, missiles

Dozens of Ukrainian civilians filed a series of lawsuits in Texas this week, accusing some of the biggest US chip firms of negligently failing to track chips that evaded export curbs. Those chips were ultimately used to power Russian and Iranian weapon systems, causing wrongful deaths last year.

Their complaints alleged that for years, Texas Instruments (TI), AMD, and Intel have ignored public reporting, government warnings, and shareholder pressure to do more to track final destinations of chips and shut down shady distribution channels diverting chips to sanctioned actors in Russia and Iran.

Putting profits over human lives, tech firms continued using “high-risk” channels, Ukrainian civilians’ legal team alleged in a press statement, without ever strengthening controls.

All that intermediaries who placed bulk online orders had to do to satisfy chip firms was check a box confirming that the shipment wouldn’t be sent to sanctioned countries, lead attorney Mikal Watts told reporters at a press conference on Wednesday, according to the Kyiv Independent.

“There are export lists,” Watts said. “We know exactly what requires a license and what doesn’t. And companies know who they’re selling to. But instead, they rely on a checkbox that says, ‘I’m not shipping to Putin.’ That’s it. No enforcement. No accountability.”

As chip firms allegedly looked the other way, innocent civilians faced five attacks, detailed in the lawsuits, that used weapons containing their chips. That includes one of the deadliest attacks in Kyiv, where Ukraine’s largest children’s hospital was targeted in July 2024. Some civilians suing were survivors seriously injured in attacks, while others lost loved ones and experienced emotional trauma.

Russia would not be able to hit their targets without chips supplied by US firms, the lawsuits alleged. Considered the brain of weapon systems, including drones, cruise missiles, and ballistic missiles, the chips help enable Russia’s war against Ukrainian civilians, they alleged.

Ukrainians sue US chip firms for powering Russian drones, missiles Read More »

trump-tries-to-block-state-ai-laws-himself-after-congress-decided-not-to

Trump tries to block state AI laws himself after Congress decided not to


Trump claims state laws force AI makers to embed “ideological bias” in models.

President Donald Trump talks to journalists after signing executive orders in the Oval Office at the White House on August 25, 2025 in Washington, DC. Credit: Getty Images | Chip Somodevilla

President Trump issued an executive order yesterday attempting to thwart state AI laws, saying that federal agencies must fight state laws because Congress hasn’t yet implemented a national AI standard. Trump’s executive order tells the Justice Department, Commerce Department, Federal Communications Commission, Federal Trade Commission, and other federal agencies to take a variety of actions.

“My Administration must act with the Congress to ensure that there is a minimally burdensome national standard—not 50 discordant State ones. The resulting framework must forbid State laws that conflict with the policy set forth in this order… Until such a national standard exists, however, it is imperative that my Administration takes action to check the most onerous and excessive laws emerging from the States that threaten to stymie innovation,” Trump’s order said. The order claims that state laws, such as one passed in Colorado, “are increasingly responsible for requiring entities to embed ideological bias within models.”

Congressional Republicans recently decided not to include a Trump-backed plan to block state AI laws in the National Defense Authorization Act (NDAA), although it could be included in other legislation. Sen. Ted Cruz (R-Texas) has also failed to get congressional backing for legislation that would punish states with AI laws.

“After months of failed lobbying and two defeats in Congress, Big Tech has finally received the return on its ample investment in Donald Trump,” US Sen. Ed Markey (D-Mass.) said yesterday. “With this executive order, Trump is delivering exactly what his billionaire benefactors demanded—all at the expense of our kids, our communities, our workers, and our planet.”

Markey said that “a broad, bipartisan coalition in Congress has rejected the AI moratorium again and again.” Sen. Maria Cantwell (D-Wash.) said the “executive order’s overly broad preemption threatens states with lawsuits and funding cuts for protecting their residents from AI-powered frauds, scams, and deepfakes.”

Trump orders Bondi to sue states

Sen. Brian Schatz (D-Hawaii) said that “preventing states from enacting common-sense regulation that protects people from the very real harms of AI is absurd and dangerous. Congress has a responsibility to get this technology right—and quickly—but states must be allowed to act in the public interest in the meantime. I’ll be working with my colleagues to introduce a full repeal of this order in the coming days.”

The Trump order includes a variation on Cruz’s proposal to prevent states with AI laws from accessing broadband grant funds. The executive order also includes a plan that Trump recently floated to have the federal government file lawsuits against states with AI laws.

Within 30 days of yesterday’s order, US Attorney General Pam Bondi is required to create an AI Litigation Task Force “whose sole responsibility shall be to challenge State AI laws inconsistent with the policy set forth in section 2 of this order, including on grounds that such laws unconstitutionally regulate interstate commerce, are preempted by existing Federal regulations, or are otherwise unlawful in the Attorney General’s judgment.”

Americans for Responsible Innovation, a group that lobbies for regulation of AI, said the Trump order “relies on a flimsy and overly broad interpretation of the Constitution’s Interstate Commerce Clause cooked up by venture capitalists over the last six months.”

Section 2 of Trump’s order is written vaguely to give the administration leeway to challenge many types of AI laws. “It is the policy of the United States to sustain and enhance the United States’ global AI dominance through a minimally burdensome national policy framework for AI,” the section says.

Colorado law irks Trump

The executive order specifically names a Colorado law that requires AI developers to protect consumers against “algorithmic discrimination.” It defines this type of discrimination as “any condition in which the use of an artificial intelligence system results in an unlawful differential treatment or impact that disfavors an individual or group of individuals on the basis” of age, race, sex, and other protected characteristics.

The Colorado law compels developers of “high-risk systems” to make various disclosures, implement a risk management policy and program, give consumers the right to “correct any incorrect personal data that a high-risk system processed in making a consequential decision,” and let consumers appeal any “adverse consequential decision concerning the consumer arising from the deployment of a high-risk system.”

Trump’s order alleges that the Colorado law “may even force AI models to produce false results in order to avoid a ‘differential treatment or impact’ on protected groups.” Trump’s order also says that “state laws sometimes impermissibly regulate beyond State borders, impinging on interstate commerce.”

Trump ordered the Commerce Department to evaluate existing state AI laws and identify “onerous” ones that conflict with the policy. “That evaluation of State AI laws shall, at a minimum, identify laws that require AI models to alter their truthful outputs, or that may compel AI developers or deployers to disclose or report information in a manner that would violate the First Amendment or any other provision of the Constitution,” the order said.

States would be declared ineligible for broadband funds

Under the order, states with AI laws that get flagged by the Trump administration will be deemed ineligible for “non-deployment funds” from the US government’s $42 billion Broadband Equity, Access, and Deployment (BEAD) program. The amount of non-deployment funds will be sizable because it appears that only about half of the $42 billion allocated by Congress will be used by the Trump administration to help states subsidize broadband deployment.

States with AI laws would not be blocked from receiving the deployment subsidies, but would be ineligible for the non-deployment funds that could be used for other broadband-related purposes. Beyond broadband, Trump’s order tells other federal agencies to “assess their discretionary grant programs” and consider withholding funds from states with AI laws.

Other agencies are being ordered to use whatever authority they have to preempt state laws. The order requires Federal Communications Commission Chairman Brendan Carr to “initiate a proceeding to determine whether to adopt a Federal reporting and disclosure standard for AI models that preempts conflicting State laws.” It also requires FTC Chairman Andrew Ferguson to issue a policy statement detailing “circumstances under which State laws that require alterations to the truthful outputs of AI models are preempted by the Federal Trade Commission Act’s prohibition on engaging in deceptive acts or practices affecting commerce.”

Finally, Trump’s order requires administration officials to “prepare a legislative recommendation establishing a uniform Federal policy framework for AI that preempts State AI laws that conflict with the policy set forth in this order.” The proposed ban would apply to most types of state AI laws, with exceptions for rules relating to “child safety protections; AI compute and data center infrastructure, other than generally applicable permitting reforms; [and] state government procurement and use of AI.”

It would be up to Congress to decide whether to pass the proposed legislation. But the various other components of the executive order could dissuade states from implementing AI laws even if Congress takes no action.

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Trump tries to block state AI laws himself after Congress decided not to Read More »

ars-live:-3-former-cdc-leaders-detail-impacts-of-rfk-jr.’s-anti-science-agenda

Ars Live: 3 former CDC leaders detail impacts of RFK Jr.’s anti-science agenda

The Centers for Disease Control and Prevention is in critical condition. This year, the premier public health agency had its funding brutally cut and staff gutted, its mission sabotaged, and its headquarters riddled with literal bullets. The over 500 rounds fired were meant for its scientists and public health experts, who endured only to be sidelined, ignored, and overruled by Health Secretary Robert F. Kennedy Jr., an anti-vaccine activist hellbent on warping the agency to fit his anti-science agenda.

Then, on August 27, Kennedy fired CDC Director Susan Monarez just weeks after she was confirmed by the Senate. She had refused to blindly approve vaccine recommendations from a panel of vaccine skeptics and contrarians that he had hand-selected. The agency descended into chaos, and Monarez wasn’t the only one to leave the agency that day.

Three top leaders had reached their breaking point and coordinated their resignations upon the dramatic ouster: Drs. Demetre Daskalakis, Debra Houry, and Daniel Jernigan walked out of the agency as their colleagues rallied around them.

Dr. Daskalakis was the director of the CDC National Center for Immunization and Respiratory Diseases. He managed national responses to mpox, measles, seasonal flu, bird flu, COVID-19, and RSV.

Ars Live: 3 former CDC leaders detail impacts of RFK Jr.’s anti-science agenda Read More »