

Zuckerberg’s Dystopian AI Vision

You think it’s bad now? Oh, you have no idea. In his talks with Ben Thompson and Dwarkesh Patel, Zuckerberg lays out his vision for our AI future.

I thank him for his candor. I’m still kind of boggled that he said all of it out loud.

We will start with the situation now. How are things going on Facebook in the AI era?

Oh, right.

Sakib: Again, it happened again. Opened Facebook and I saw this. I looked at the comments and they’re just unsuspecting boomers congratulating the fake AI gen couple😂

Deepfates: You think those are real boomers in the comments?

This continues to be 100% Zuckerberg’s fault, and 100% an intentional decision.

The algorithm knows full well what kind of post this is. It still floods people with them, especially if you click even once. If they wanted to stop it, they easily could.

There’s also the rather insane and deeply embarrassing AI bot accounts they have tried out on Facebook and Instagram.

Compared to his vision of the future? You ain’t seen nothing yet.

Ben Thompson interviewed Mark Zuckerberg, centering on business models.

It was like if you took a left wing caricature of why Zuckerberg is evil, combined it with a left wing caricature about why AI is evil, and then fused them into their final form. Except it’s coming directly from Zuckerberg, as explicit text, on purpose.

It’s understandable that many leave such interviews and related stories saying this:

Ewan Morrison: Big tech atomises you, isolates you, makes you lonely and depressed – then it rents you an AI friend, an AI therapist, an AI lover.

Big tech are parasites who pretend they are here to help you.

When asked what he wants to use AI for, Zuckerberg’s primary answer is advertising, in particular an ‘ultimate black box’ where you ask for a business outcome and the AI does whatever it takes to make that outcome happen. I leave all the ‘do not want,’ ‘misalignment maximalist goal out of what you are literally calling a black box, film at 11 if you need to watch it again,’ and ‘general dystopian nightmare’ details as an exercise for the reader. He anticipates that advertising will then grow from the current 1%-2% of GDP to something more, and Thompson is ‘there with’ him: ‘everyone should embrace the black box.’

His number two use is ‘growing engagement on the customer surfaces and recommendations.’ As in, advertising by another name, and using AI in predatory fashion to maximize user engagement and drive addictive behavior.

In case you were wondering if it stops being this dystopian after that? Oh, hell no.

Mark Zuckerberg: You can think about our products as there have been two major epochs so far.

The first was you had your friends and you basically shared with them and you got content from them and now, we’re in an epoch where we’ve basically layered over this whole zone of creator content.

So the stuff from your friends and followers and all the people that you follow hasn’t gone away, but we added on this whole other corpus around all this content that creators have that we are recommending.

Well, the third epoch is I think that there’s going to be all this AI-generated content…

So I think that these feed type services, like these channels where people are getting their content, are going to become more of what people spend their time on, and the better that AI can both help create and recommend the content, I think that that’s going to be a huge thing. So that’s kind of the second category.

The third big AI revenue opportunity is going to be business messaging.

And the way that I think that’s going to happen, we see the early glimpses of this because business messaging is actually already a huge thing in countries like Thailand and Vietnam.

So what will unlock that for the rest of the world? It’s like, it’s AI making it so that you can have a low cost of labor version of that everywhere else.

Also he thinks everyone should have an AI therapist, and that people want more friends so AI can fill in for the missing humans there. Yay.

PoliMath: I don’t really have words for how much I hate this

But I also don’t have a solution for how to combat the genuine isolation and loneliness that people suffer from

AI friends are, imo, just a drug that lessens the immediate pain but will probably cause far greater suffering

Well, I guess the fourth one is the normal ‘everyone use AI now,’ at least?

And then, the fourth is all the more novel, just AI first thing, so like Meta AI.

He also blames Llama-4’s terrible reception on user error in setup, and says they now offer an API so people have a baseline implementation to point to, and says essentially ‘well of course we built a version of Llama-4 specifically to score well on Arena, that only shows off how easy it is to steer it, it’s good actually.’ Neither of them, of course, even bothers to mention any downside risks or costs of open models.

The killer app of Meta AI is that it will know all about all your activity on Facebook and Instagram and use it for you (and against you), and also let you essentially ‘talk to the algorithm,’ which I do admit is kind of interesting, but I notice Zuckerberg didn’t mention an option to tell it to alter the algorithm, and Thompson didn’t ask.

There is one area where I like where his head is at:

I think one of the things that I’m really focused on is how can you make it so AI can help you be a better friend to your friends, and there’s a lot of stuff about the people who I care about that I don’t remember, I could be more thoughtful.

There are all these issues where it’s like, “I don’t make plans until the last minute”, and then it’s like, “I don’t know who’s around and I don’t want to bug people”, or whatever. An AI that has good context about what’s going on with the people you care about, is going to be able to help you out with this.

That is… not how I would implement this kind of feature, and indeed the more details you read the more Zuckerberg seems determined to do even the right thing in the most dystopian way possible, but as long as it’s fully opt-in (if not, wowie moment of the week) then at least we’re trying at all.

Also interviewing Mark Zuckerberg is Dwarkesh Patel. There was good content here; Zuckerberg in many ways continues to be remarkably candid. But it wasn’t as dense or hard-hitting as many of Patel’s other interviews.

One key difference between the interviews is that when Zuckerberg lays out his dystopian vision, you get the sense that Thompson is for it, whereas Patel is trying to express that maybe we should be concerned. Another is that Patel notices that there might be more important things going on, whereas to Thompson nothing could be more important than enhancing ad markets.

  1. When asked what changed since Llama 3, Zuckerberg leads off with the ‘personalization loop.’

  2. Zuckerberg still claims Llama 4 Scout and Maverick are top notch. Okie dokie.

  3. He doubles down on ‘open source will become most used this year’ and that this year has been Great News For Open Models. Okie dokie.

  4. His heart’s clearly not in claiming it’s a good model, sir. His heart is in it being a good model for Meta’s particular commercial purposes and ‘product value’ as per people’s ‘revealed preferences.’ Those are the modes he talked about with Thompson.

  5. He’s very explicit about this. OpenAI and Anthropic are going for AGI and a world of abundance, with Anthropic focused on coding and OpenAI towards reasoning. Meta wants fast, cheap, personalized, easy to interact with all day, and (if you add what he said to Thompson) to optimize feeds and recommendations for engagement, and to sell ads. It’s all for their own purposes.

  6. He says Meta is specifically creating AI tools to write their own code for internal use, but I don’t understand what makes that different from a general AI coder? Or why they think their version is going to be better than using Claude or Gemini? This feels like some combination of paranoia and bluff.

  7. Thus, Meta seems to at this point be using the open model approach as a recruiting or marketing tactic? I don’t know what else it’s actually doing for them.

  8. As Dwarkesh notes, Zuckerberg is basically buying the case for superintelligence and the intelligence explosion, then ignoring it to form an ordinary business plan, and of course to continue to have their safety plan be ‘lol we’re Meta’ and release all their weights.

  9. I notice I am confused about why their tests need hundreds of thousands or millions of people to be statistically significant. Impacts must be very small, and the statistical techniques they’re using don’t seem great (see the quick sample-size sketch after this list). But also, it is telling that his first thought for experiments to run with AI is to run them on his users.

  10. In general, Zuckerberg seems to be thinking he’s running an ordinary dystopian tech company doing ordinary dystopian things (except he thinks they’re not dystopian, which is why he talks about them so plainly and clearly) while other companies do other ordinary things, and has put all the intelligence explosion related high weirdness totally out of his mind or minimized it to specific use cases, even though he intellectually knows that isn’t right.

  11. He, CEO of Meta, says people use what is valuable to them and people are smart and know what is valuable in their lives, and when you think otherwise you’re usually wrong. Cue the laugh track.

  12. First named use case is talking through difficult conversations they need to have. I do think that’s actually a good use case candidate, but also easy to pervert.

  13. (29:40) The friend quote: The average American only has three friends ‘but has demand for meaningfully more, something like 15… They want more connection than they have.’ His core prediction is that AI connection will be a complement to human connection rather than a substitute.

    1. I tentatively agree with Zuckerberg, if and only if the AIs in question are engineered (by the developer, user or both, depending on context) to be complements rather than substitutes. You can make it one way.

    2. However, when I see Meta’s plans, it seems they are steering it the other way.

  14. Zuckerberg is making a fully general defense of adversarial capitalism and attention predation – if people are choosing to do something, then later we will see why it turned out to be valuable for them and why it adds value to their lives, including virtual therapists and virtual girlfriends.

    1. But this proves (or implies) far too much as a general argument. It suggests full anarchism and zero consumer protections. It applies to heroin or joining cults or being in abusive relationships or marching off to war and so on. We all know plenty of examples of self-destructive behaviors. Yes, the great classical liberal insight is that mostly you are better off if you let people do what they want, and getting in the way usually backfires.

    2. If you add AI into the mix, especially AI that moves beyond a ‘mere tool,’ and you consider highly persuasive AIs and algorithms, asserting ‘whatever the people choose to do must be benefiting them’ is Obvious Nonsense.

    3. I do think virtual therapists have a lot of promise as value adds, if done well. And also great danger to do harm, if done poorly or maliciously.

  15. Dwarkesh points out the danger of technology reward hacking us, and again Zuckerberg just triples down on ‘people know what they want.’ People wouldn’t let there be things constantly competing for their attention, so the future won’t be like that, he says. Is this a joke?

  16. I do get that the right way to design AI-AR glasses is as great glasses that also serve as other things when you need them and don’t flood your vision, and that the wise consumer will pay extra to ensure it works that way. But where is this trust in consumers coming from? Has Zuckerberg seen the internet? Has he seen how people use their smartphones? Oh, right, he’s largely directly responsible.

    1. Frankly, the reason I haven’t tried Meta’s glasses is that Meta makes them. They do sound like a nifty product otherwise, if execution is good.

  17. Zuckerberg is a fan of various industrial policies, praising the export controls and calling on America to help build new data centers and related power sources.

  18. Zuckerberg asks, would others be doing open models if Meta wasn’t doing it? Aren’t they doing this because otherwise ‘they’re going to lose?’

    1. Do not flatter yourself, sir. They’re responding to DeepSeek, not you. And in particular, they’re doing it to squash the idea that r1 means DeepSeek or China is ‘winning.’ Meta’s got nothing to do with it, and you’re not pushing things in the open direction in a meaningful way at this point.

  19. His case for why the open models need to be American is because our models embody an America view of the world in a way that Chinese models don’t. Even if you agree that is true, it doesn’t answer Dwarkesh’s point that everyone can easily switch models whenever they want. Zuckerberg then does mention the potential for backdoors, which is a real thing since ‘open model’ only means open weights, they’re not actually open source so you can’t rule out a backdoor.

  20. Zuckerberg says the point of Llama Behemoth will be the ability to distill it. So making that an open model is specifically so that the work can be distilled. But that’s something we don’t want the Chinese to do, asks Padme?

  21. And then we have a section on ‘monetizing AGI’ where Zuckerberg indeed goes right to ads and arguing that ads done well add value. Which they must, since consumers choose to watch them, I suppose, per his previous arguments?
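
To make item 9 concrete, here is a minimal back-of-the-envelope sketch in Python (using SciPy for the normal quantiles), based on the standard two-proportion sample-size approximation. The 5% baseline rate and the lifts are illustrative assumptions of mine, not anything Meta has reported; the point is only that sample sizes climb into the hundreds of thousands or millions per arm exactly when the absolute effect being measured shrinks to a small fraction of a percentage point.

```python
# Back-of-the-envelope sketch (not Meta's actual methodology): how many users
# an A/B test needs per arm to detect a given absolute lift in a conversion-style
# metric, via the standard two-proportion sample-size approximation.
from scipy.stats import norm

def users_needed_per_arm(p_baseline: float, lift: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users per arm to detect an absolute lift at the given alpha/power."""
    p_treat = p_baseline + lift
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance threshold
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return int((z_alpha + z_beta) ** 2 * variance / lift ** 2)

# Illustrative (assumed) baseline: 5% of users take some engagement action.
for lift in (0.01, 0.001, 0.0001):      # 1, 0.1, and 0.01 percentage-point lifts
    print(f"absolute lift {lift:.2%}: ~{users_needed_per_arm(0.05, lift):,} users per arm")
```

Under these assumptions a one-point lift needs on the order of thousands of users, a 0.1-point lift hundreds of thousands, and a 0.01-point lift tens of millions, which is why needing millions of people implies the effects being chased are tiny.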

To be fair, yes, it is hard out there. We all need a friend and our options are limited.

Roman Helmet Guy (reprise from last week): Zuckerberg explaining how Meta is creating personalized AI friends to supplement your real ones: “The average American has 3 friends, but has demand for 15.”

Daniel Eth: This sounds like something said by an alien from an antisocial species that has come to earth and is trying to report back to his kind what “friends” are.

Sam Ro: imagine having 15 friends.

Modest Proposal (quoting Chris Rock): “The Trenchcoat Mafia. No one would play with us. We had no friends. The Trenchcoat Mafia. Hey I saw the yearbook picture it was six of them. I ain’t have six friends in high school. I don’t got six friends now.”

Kevin Roose: The Meta vision of AI — hologram Reelslop and AI friends keeping you company while you eat breakfast alone — is so bleak I almost can’t believe they’re saying it out loud.

Exactly how dystopian are these ‘AI friends’ going to be?

GFodor.id (being modestly unfair): What he’s not saying is those “friends” will seem like real people. Your years-long friendship will culminate when they convince you to buy a specific truck. Suddenly, they’ll blink out of existence, having delivered a conversion to the company who spent $3.47 to fund their life.

Soible_VR: not your weights, not your friend.

Why would they then blink out of existence? There’s still so much more that ‘friend’ can do to convert sales, and also you want to ensure they stay happy with the truck and give it great reviews and so on, and also you don’t want the target to realize that was all you wanted, and so on. The true ‘AI ad buddy’ plays the long game, and is happy to stick around to monetize that bond – or maybe to get you to pay to keep them around, plus some profit margin.

The good ‘AI friend’ world is, again, one in which the AI friends are complements, or are only substituting while you can’t find better alternatives, and actively work to help you get and deepen ‘real’ friendships. Which is totally something they can do.

Then again, what happens when the AIs really are above human level, and can be as good ‘friends’ as a person? Is it so impossible to imagine this being fine? Suppose the AI was set up to perfectly imitate a real (remote) person who would actually be a good friend, including reacting as they would to the passage of time and them sometimes reaching out to you, and also that they’d introduce you to their friends which included other humans, and so on. What exactly is the problem?

And if you then give that AI ‘enhancements,’ such as happening to be more interested in whatever you’re interested in, having better information recall, watching out for you first more than most people would, etc, at what point do you have a problem? We need to be thinking about these questions now.

I do get that, in his own way, the man is trying. He wouldn’t talk about these plans in this way if he realized how the vision sounds to others. I get that he’s also talking to investors, but he has full control of Meta and isn’t raising capital, although Thompson thinks Zuckerberg needs to go on a ‘trust me’ tour.

In some ways this is a microcosm of key parts of the alignment problem. I can see the problems Zuckerberg thinks he is solving, the value he thinks or claims he is providing. I can think of versions of these approaches that would indeed be ‘friendly’ to actual humans, and make their lives better, and which could actually get built.

Instead, on top of the commercial incentives, all the thinking feels alien. The optimization targets are subtly wrong. There is the assumption that the map corresponds to the territory, that people will know what is good for them so that any ‘choices’ you convince them to make must be good for them, no matter how distorted you make the landscape, without worrying about addiction to Skinner boxes or myopia or other forms of predation. That the collective social dynamics of adding AI into the mix in these ways won’t get twisted in ways that make everyone worse off.

And of course, there’s the continued modeling of the future world as similar to today’s, ignoring the actual implications of the level of machine intelligence we should expect.

I do think there are ways to do AI therapists, AI ‘friends,’ AI curation of feeds and AI coordination of social worlds, and so on, that contribute to human flourishing, that would be great, and that could totally be done by Meta. I do not expect it to be at all similar to the one Meta actually builds.


Signal clone used by Trump official stops operations after report it was hacked

Waltz was removed from his post late last week, with Trump nominating him to serve as ambassador to the United Nations.

TeleMessage website removes Signal mentions

The TeleMessage website until recently boasted the ability to “capture, archive and monitor mobile communication” through text messages, voice calls, WhatsApp, WeChat, Telegram, and Signal, as seen in an Internet Archive capture from Saturday. Another archived page says that TeleMessage “captures and records Signal calls, messages, deletions, including text, multimedia, [and] files,” and “maintain[s] all Signal app features and functionality as well as the Signal encryption.”

The TeleMessage home page currently makes no mention of Signal, and links on the page have been disabled.

The anonymous hacker who reportedly infiltrated TeleMessage told 404 Media that it took about 15 to 20 minutes and “wasn’t much effort at all.” While the hacker did not obtain Waltz’s messages, “the hack shows that the archived chat logs are not end-to-end encrypted between the modified version of the messaging app and the ultimate archive destination controlled by the TeleMessage customer,” according to 404 Media.

“Data related to Customs and Border Protection (CBP), the cryptocurrency giant Coinbase, and other financial institutions are included in the hacked material, according to screenshots of messages and backend systems obtained by 404 Media,” the report said. 404 Media added that the “hacker did not access all messages stored or collected by TeleMessage, but could have likely accessed more data if they decided to, underscoring the extreme risk posed by taking ordinarily secure end-to-end encrypted messaging apps such as Signal and adding an extra archiving feature to them.”


OpenAI scraps controversial plan to become for-profit after mounting pressure

The restructuring would have also allowed OpenAI to remove the cap on returns for investors, potentially making the firm more appealing to venture capitalists, with the nonprofit arm continuing to exist but only as a minority stakeholder rather than maintaining governance control. This plan emerged as the company sought a funding round that would value it at $150 billion, which later expanded to the $40 billion round at a $300 billion valuation.

However, the new change in course follows months of mounting pressure from outside the company. In April, a group of legal scholars, AI researchers, and tech industry watchdogs openly opposed OpenAI’s plans to restructure, sending a letter to the attorneys general of California and Delaware.

Former OpenAI employees, Nobel laureates, and law professors also sent letters to state officials requesting that they halt the restructuring efforts out of safety concerns about which part of the company would be in control of hypothetical superintelligent future AI products.

“OpenAI was founded as a nonprofit, is today a nonprofit that oversees and controls the for-profit, and going forward will remain a nonprofit that oversees and controls the for-profit,” he added. “That will not change.”

Uncertainty ahead

While abandoning the restructuring that would have ended nonprofit control, OpenAI still plans to make significant changes to its corporate structure. “The for-profit LLC under the nonprofit will transition to a Public Benefit Corporation (PBC) with the same mission,” Altman explained. “Instead of our current complex capped-profit structure—which made sense when it looked like there might be one dominant AGI effort but doesn’t in a world of many great AGI companies—we are moving to a normal capital structure where everyone has stock. This is not a sale, but a change of structure to something simpler.”

But the plan may cause some uncertainty for OpenAI’s financial future. When OpenAI secured a massive $40 billion funding round in March, it came with strings attached: Japanese conglomerate SoftBank, which committed $30 billion, stipulated that it would reduce its contribution to $20 billion if OpenAI failed to restructure into a fully for-profit entity by the end of 2025.

Despite the challenges ahead, Altman expressed confidence in the path forward: “We believe this sets us up to continue to make rapid, safe progress and to put great AI in the hands of everyone.”


Cyborg cicadas play Pachelbel’s Canon

The distinctive chirps of singing cicadas are a highlight of summer in regions where they proliferate; those chirps even featured prominently on Lorde’s 2021 album Solar Power. Now, Japanese scientists at the University of Tsukuba have figured out how to transform cicadas into cyborg insects capable of “playing” Pachelbel’s Canon. They described their work in a preprint published on the physics arXiv. You can listen to the sounds here.

Scientists have been intrigued by the potential of cyborg insects since the 1990s, when researchers began implanting tiny electrodes into cockroach antennae and shocking them to direct their movements. The idea was to use them as hybrid robots for search-and-rescue applications.

For instance, in 2015, Texas A&M scientists found that implanting electrodes into a cockroach’s ganglion (the neuron cluster that controls its front legs) was remarkably effective, successfully steering the roaches 60 percent of the time. They outfitted the roaches with tiny backpacks synced with a remote controller and administered shocks to disrupt the insect’s balance, forcing it to move in the desired direction.

And in 2021, scientists at Nanyang Technological University in Singapore turned Madagascar hissing cockroaches into cyborgs, implanting electrodes in sensory organs known as cerci that were then connected to tiny computers. Applying electrical current enabled them to steer the cockroaches successfully 94 percent of the time in simulated disaster scenes in the lab.

The authors of this latest paper were inspired by that 2021 project and decided to apply the basic concept to singing cicadas, with the idea that cyborg cicadas might one day be used to transmit warning messages during emergencies. It’s usually the males who do the singing, and each species has a unique song. In most species, the production of sound occurs via a pair of membrane structures called tymbals, which are just below each side of the insect’s anterior abdominal region. The tymbal muscles contract and cause the plates to vibrate while the abdomen acts as a kind of resonating chamber to amplify the song.


“Blatantly unlawful”: Trump slammed for trying to defund PBS, NPR

“CPB is not a federal executive agency subject to the president’s authority,” Harrison said. “Congress directly authorized and funded CPB to be a private nonprofit corporation wholly independent of the federal government,” statutorily forbidding “any department, agency, officer, or employee of the United States to exercise any direction, supervision, or control over educational television or radio broadcasting, or over [CPB] or any of its grantees or contractors.”

In a statement explaining why “this is not about the federal budget” and promising to “vigorously defend our right to provide essential news, information and life-saving services to the American public,” NPR President and CEO Katherine Maher called the order an “affront to the First Amendment.”

PBS President and CEO Paula Kerger went further, calling the order “blatantly unlawful” in a statement provided to Ars.

“Issued in the middle of the night,” Trump’s order “threatens our ability to serve the American public with educational programming, as we have for the past 50-plus years,” Kerger said. “We are currently exploring all options to allow PBS to continue to serve our member stations and all Americans.”

Rural communities need public media, orgs say

While Trump opposes NPR and PBS for promoting content that he disagrees with—criticizing segments on white privilege, gender identity, reparations, “fat phobia,” and abortion—the networks have defended their programming as unbiased and falling in line with Federal Communications Commission guidelines. Further, NPR reported that the networks’ “locally grounded content” currently reaches “more than 99 percent of the population at no cost,” providing not just educational fare and entertainment but also critical updates tied to local emergency and disaster response systems.

Cutting off funding, Kerger said last month, would have a “devastating impact” on rural communities, especially in parts of the country where NPR and PBS still serve as “the only source of news and emergency broadcasts,” NPR reported.

For example, Ed Ulman, CEO of Alaska Public Media, testified to Congress last month that his stations “provide potentially life-saving warnings and alerts that are crucial for Alaskans who face threats ranging from extreme weather to earthquakes, landslides, and even volcanoes.” Some of the smallest rural stations sometimes rely on CPB for about 50 percent of their funding, NPR reported.


Spotify seizes the day after Apple is forced to allow external payments

After a federal court issued a scathing order Wednesday night that found Apple in “willful violation” of an injunction meant to allow iOS apps to provide alternate payment options, app developers are capitalizing on the moment. Spotify may be the quickest of them all.

Less than 24 hours after District Court Judge Yvonne Gonzalez Rogers found that Apple had sought to thwart a 2021 injunction and engaged in an “obvious cover-up” around its actions, Spotify announced in a blog post that it had submitted an updated app to Apple. The updated app can show specific plan prices, link out to Spotify’s website for plan changes and purchases that avoid Apple’s 30 percent commission on in-app purchases, and display promotional offers, all of which were disallowed under Apple’s prior App Store rules.

Spotify’s post adds that Apple’s newly court-enforced policy “opens the door to other seamless buying opportunities that will directly benefit creators (think easy-to-purchase audiobooks).” Spotify posted on X (formerly Twitter) Friday morning that the updated app was approved by Apple. Apple made substantial modifications to its App Review Guidelines on Friday and emailed registered developers regarding the changes.


Health care company says Trump tariffs will cost it $60M–$70M this year

In the call, Grade noted that only a small fraction of Baxter’s total sales are in China. But, “given the magnitude of the tariffs that have been enacted between the two countries, these tariffs now account for nearly half of the total impact,” he said.

The Tribune reported that Baxter is now looking into ways to dampen the financial blow from the tariffs, including carrying additional inventory, identifying alternative suppliers, alternative shipping routes, and “targeted pricing actions.” Baxter is also working with trade organizations to lobby for exemptions.

In general, the health care and medical sector, including hospitals, is bracing for price increases and shortages from the tariffs. The health care supply chain in America is woefully fragile, which became painfully apparent amid the COVID-19 pandemic.

Baxter isn’t alone in announcing heavy tariff tolls. Earlier this week, GE Healthcare Technologies Inc. said the tariffs would cost the company around $500 million this year, according to financial service firm Morningstar. And in April, Abbott Laboratories said it expects the tariffs to cost “a few hundred million dollars,” according to the Tribune.


DOJ confirms it wants to break up Google’s ad business

In the trial, Google will paint this demand as a severe overreach, claiming that few, if any, companies would have the resources to purchase and run the products. Last year, an ad consultant estimated Google’s ad empire could be worth up to $95 billion, quite possibly too big to sell. However, Google was similarly skeptical about Chrome, and representatives from other companies have said throughout the search remedy trial that they would love to buy Google’s browser.

An uphill battle

After losing three antitrust cases in just a couple of years, Google will have a hard time convincing the judge it is capable of turning over a new leaf with light remedies. A DOJ lawyer told the court Google is a “recidivist monopolist” that has a pattern of skirting its legal obligations. Still, Google is looking for mercy in the case. We expect to get more details on Google’s proposed remedies as the next trial nears, but it already offered a preview in today’s hearing.

Google suggests making a smaller subset of ad data available and ending the use of some pricing schemes, including unified pricing, that the court has found to be anticompetitive. Google also promised not to re-implement discontinued practices like “last look,” which gave the company a chance to outbid rivals at the last moment. This was featured prominently in the DOJ’s case, although Google ended the practice several years ago.

To ensure it adheres to the remedies, Google suggested a court-appointed monitor would audit the process. However, Brinkema seemed unimpressed with this proposal.

As in its other cases, Google says it plans to appeal the verdict, but before it can do that, the remedies phase has to be completed. Even if it can get the remedies paused for appeal, the decision could be a blow to investor confidence. So, Google will do whatever it can to avoid the worst-case scenario, leaning on the existence of competing advertisers like Meta and TikTok to show that the market is still competitive.

Like the search case, Google won’t be facing any big developments over the summer, but this fall could be rough. Judge Amit Mehta will most likely rule on the search remedies in August, and the ad tech remedies case will begin the following month. Google also has the Play Store case hanging over its head. It lost the first round, but the company hopes to prevail on appeal when the case gets underway again, probably in late 2025.


Trump’s 2026 budget proposal: Crippling cuts for science across the board


Budget document derides research and science-based policy as “woke,” “scams.”

On Friday, the US Office of Management and Budget sent Sen. Susan Collins (R-Maine), chair of the Senate’s Appropriations Committee, an outline of what to expect from the Trump administration’s 2026 budget proposal. As expected, the budget includes widespread cuts, affecting nearly every branch of the federal government.

In keeping with the administration’s attacks on research agencies and the places research gets done, research funding will be taking an enormous hit, with the National Institutes of Health taking a 40 percent cut and the National Science Foundation losing 55 percent of its 2025 budget. But the budget goes well beyond those highlighted items, with nearly every place science gets done or funded targeted for cuts.

Perhaps even more shocking is the language used to justify the cuts, which reads more like a partisan rant than a serious budget document.

Health cuts

Having a secretary of Health and Human Services who doesn’t believe in germ theory is not likely to do good things for US health programs, and the proposed budget will only make matters worse. Kennedy’s planned MAHA (Make America Healthy Again) program would be launched with half a billion in funds, but nearly everything else would take a cut.

The CDC would lose about $3.6 billion from its current budget of $9.6 billion, primarily due to the shuttering of a number of divisions within it: the National Center for Chronic Diseases Prevention and Health Promotion, the National Center for Environmental Health, the National Center for Injury Prevention and Control, and the Global Health Center and its division of Public Health Preparedness and Response. The duties of those offices are, according to the budget document, “duplicative, DEI, or simply unnecessary.”

Another big hit to HHS comes from the termination of a $4 billion program that helps low-income families cover energy costs. The OMB suggests that these costs will get lower due to expanded energy production and that, anyway, the states should be paying for it. Shifting financial burdens to states is a general theme of the document, an approach that will ultimately hit the poorest states hardest, even though those states had very high percentages of Trump voters.

The document also says that “This Administration is committed to combatting the scourge of deadly drugs that have ravaged American communities,” while cutting a billion dollars from substance abuse programs within HHS.

But the headline cuts come from the National Institutes of Health, the single largest source of scientific funding in the world. NIH would see its current $48 billion budget chopped by $18 billion and its 27 individual institutes consolidated down to just five. This would result in vast cutbacks to US biomedical research, which is currently acknowledged to be world-leading. Combined with planned cuts to grant overheads, it will cause most research institutions to shrink, and some less well-funded universities may be forced to close facilities.

The justification for the cuts is little more than a partisan rant: “NIH has broken the trust of the American people with wasteful spending, misleading information, risky research, and the promotion of dangerous ideologies that undermine public health.” The text then implies that the broken trust is primarily the product of failing to promote the idea that SARS-CoV-2 originated in a lab, even though there’s no scientific evidence to indicate that it had.

Climate research hit

The National Science Foundation funds much of the US’s fundamental science research, like physics and astronomy. Earlier reporting that it would see a 56 percent cut to its budget was confirmed. “The Budget cuts funding for: climate; clean energy; woke social, behavioral, and economic sciences; and programs in low priority areas of science.” Funding would be maintained for AI and quantum computing. All funding for encouraging minority participation in the sciences will also be terminated. The budget was released on the same day that the NSF announced it was joining other science agencies in standardizing on paying 15 percent of its grants’ value for maintaining facilities and providing services to researchers, a cut that would further the financial damage to research institutions.

The National Oceanic and Atmospheric Administration would see $1.3 billion of its $6.6 billion budget cut, with the primary target being its climate change work. In fact, the budget for NOAA’s weather satellites will be cut to prevent them from including instruments that would make “unnecessary climate measurements.” Apparently, the Administration doesn’t want anyone to be exposed to data that might challenge its narrative that climate change is a scam.

The National Institute of Standards and Technology would lose $350 million for similar reasons. “NIST has long funded awards for the development of curricula that advance a radical climate agenda,” the document suggests, before going on to say that the Institute’s Circular Economy Program, which promotes the efficient reuse of industrial materials, “pushes environmental alarmism.”

The Department of Energy is seeing a $1.1 billion hit to its science budget, “eliminating funding for Green New Scam interests and climate change-related activities.” The DOE will also take hits to policy programs focused on climate change, including $15 billion in cuts to renewable energy and carbon capture spending. Separately, the Office of Energy Efficiency and Renewable Energy will also take a $2.6 billion hit. Over at the Department of the Interior, the US Geological Survey would see its renewable energy programs terminated, as well.

Some of the DOE’s other cuts, however, don’t even make sense given the administration’s priorities. The newly renamed Office of Fossil Energy—something that Trump favors—will still take a $270 million hit, and nuclear energy programs will see $400 million in cuts.

This sort of lack of self-awareness shows up several times in the document. In one striking case, an interior program funding water infrastructure improvements is taking a cut that “reduces funding for programs that have nothing to do with building and maintaining water infrastructure, such as habitat restoration.” Apparently, the OMB is unaware that functioning habitats can help provide ecosystem services that can reduce the need for water infrastructure.

Similarly, over at the EPA, they’re boosting programs for clean drinking water by $36 million, while at the same time cutting loans to states for clean water projects by $2.5 billion. “The States should be responsible for funding their own water infrastructure projects,” the OMB declares. Research at the EPA also takes a hit: “The Budget puts an end to unrestrained research grants, radical environmental justice work, woke climate research, and skewed, overly-precautionary modeling that influences regulations—none of which are authorized by law.”

An attack on scientific infrastructure

US science couldn’t flourish without an educational system that funnels talented individuals into graduate programs. So, naturally, funding for those is being targeted as well. This is partially a function of the administration’s intention to eliminate the Department of Education, but there also seems to be a specific focus on programs that target low-income individuals.

For example, the GEAR UP program describes itself as “designed to increase the number of low-income students who are prepared to enter and succeed in postsecondary education.” The OMB document describes it as “a relic of the past when financial incentives were needed to motivate Institutions of Higher Education to engage with low-income students and increase access.” It goes on to claim that this is “not the obstacle it was for students of limited means.”

Similarly, the SEOG program funding is “awarded to an undergraduate student who demonstrates exceptional financial need.” In the OMB’s view, colleges and universities “have used [it] to fund radical leftist ideology instead of investing in students and their success.” Another cut is claimed to eliminate “Equity Assistance Centers that have indoctrinated children.” And “The Budget proposes to end Federal taxpayer dollars being weaponized to indoctrinate new teachers.”

In addition, the federal work-study program, which subsidizes on-campus jobs for needy students, is also getting a billion-dollar cut. Again, the document says that the states can pay for it.

(The education portion also specifically cuts the funding of Howard University, which is both distinct as a federally supported Black university and also notable as being where Kamala Harris got her first degree.)

The end of US leadership

This budget is a recipe for ending the US’s leadership in science. It would do generational damage by forcing labs to shut down, with a corresponding loss of highly trained individuals and one-of-a-kind research materials. At the same time, it will throttle the educational pipeline that could eventually replace those losses. Given that the US is one of the major sources of research funding in the world, if approved, the budget will have global consequences.

To the people within the OMB who prepared the document, these are not losses. The document makes it very clear that they view many instances of scientific thought and evidence-based policy as little more than forms of ideological indoctrination, presumably because the evidence sometimes contradicts what they’d prefer to believe.



OpenAI Preparedness Framework 2.0

Right before releasing o3, OpenAI updated its Preparedness Framework to 2.0.

I previously wrote an analysis of the Preparedness Framework 1.0. I still stand by essentially everything I wrote in that analysis, which I reread to prepare before reading the 2.0 framework. If you want to dive deep, I recommend starting there, as this post will focus on changes from 1.0 to 2.0.

As always, I thank OpenAI for the document and for laying out their approach and plans.

I have several fundamental disagreements with the thinking behind this document.

In particular:

  1. The Preparedness Framework only applies to specific named and measurable things that might go wrong. It requires identification of a particular threat model that is all of: Plausible, measurable, severe, net new and (instantaneous or irremediable).

  2. The Preparedness Framework thinks ‘ordinary’ mitigation defense-in-depth strategies will be sufficient to handle High-level threats and likely even Critical-level threats.

I disagree strongly with these claims, as I will explain throughout.

I knew that #2 was likely OpenAI’s default plan, but it wasn’t laid out explicitly.

I was hoping that OpenAI would realize their plan did not work, or come up with a better plan when they actually had to say their plan out loud. This did not happen.

In several places, things I criticize OpenAI for here are also things the other labs are doing. I try to note that, but ultimately this is reality we are up against. Reality does not grade on a curve.

Do not rely on Appendix A as a changelog. It is incomplete.

  1. Persuaded to Not Worry About It.

  2. The Medium Place.

  3. Thresholds and Adjustments.

  4. Release the Kraken Anyway, We Took Precautions.

  5. Misaligned!

  6. The Safeguarding Process.

  7. But Mom, Everyone Is Doing It.

  8. Mission Critical.

  9. Research Areas.

  10. Long-Range Autonomy.

  11. Sandbagging.

  12. Replication and Adaptation.

  13. Undermining Safeguards.

  14. Nuclear and Radiological.

  15. Measuring Capabilities.

  16. Questions of Governance.

  17. Don’t Be Nervous, Don’t Be Flustered, Don’t Be Scared, Be Prepared.

Right at the top we see a big change. Key risk areas are being downgraded and excluded.

The Preparedness Framework is OpenAI’s approach to tracking and preparing for frontier capabilities that create new risks of severe harm.

We currently focus this work on three areas of frontier capability, which we call Tracked Categories:

• Biological and Chemical capabilities that, in addition to unlocking discoveries and cures, can also reduce barriers to creating and using biological or chemical weapons.

• Cybersecurity capabilities that, in addition to helping protect vulnerable systems, can also create new risks of scaled cyberattacks and vulnerability exploitation.

• AI Self-improvement capabilities that, in addition to unlocking helpful capabilities faster, could also create new challenges for human control of AI systems.

The change I’m fine with is that CBRN (chemical, biological, radiological and nuclear) has been narrowed to only biological and chemical. I do consider biological by far the biggest of the four threats. Nuclear and radiological have been demoted to ‘research categories,’ where there might be risk in the future and monitoring may be needed. I can live with that. Prioritization is important, and I’m satisfied this is still getting the proper share of attention.

A change I strongly dislike is to also move Long-Range Autonomy and Autonomous Replication down to research categories.

I do think it makes sense to treat these as distinct threats. The argument here is that these secondary risks are ‘insufficiently mature’ to need to be tracked categories. I think that’s very clearly not true. Autonomy is emerging rapidly, and there’s a report out this week, on the new benchmark RepliBench, showing we are close to Autonomous Replication. These need to be tracked at the top level.

The framework’s position on this is that unless you can point to an accompanying particular risk posed by the model, autonomous replication on its own is insufficiently worrisome. I don’t agree. I believe the general approach of ‘only defend against moves I can lay out in detail’ is doomed, especially with respect to autonomy and in particular autonomous replication.

The big change I’m definitely not okay with is dropping persuasion entirely as a risk category.

Persuasion is not only no longer a Tracked Category, it is now not even a Research Category. Including persuasion was one of the things I thought was very good about the 1.0 framework, and that is importantly missing from the other similar frameworks at other labs and needs to be included. Now it’s gone.

Their initial explanation is here:

Persuasion: OpenAI prohibits the use of our products to manipulate political views as part of our Model Spec, and we build in safeguards to back this policy. We also continue to study the persuasive and relational capabilities of models (including on emotional well-being and preventing bias in our products) and monitor and investigate misuse of our products (including for influence operations).

We believe many of the challenges around AI persuasion risks require solutions at a systemic or societal level, and we actively contribute to these efforts through our participation as a steering committee member of C2PA and working with lawmaker and industry peers to support state legislation on AI content provenance in Florida and California.

Within our wider safety stack, our Preparedness Framework is specifically focused on frontier AI risks meeting a specific definition of severe harms, and Persuasion category risks do not fit the criteria for inclusion.

OpenAI’s argument for this is, well, entirely unpersuasive to me. o3’s evaluation was that it was ‘logically coherent but strategically unconvincing’ and recommended reinstatement as at least a research category.

Yes, many aspects of persuasion are best answered by a societal response, or require an additional societal response.

That does not get OpenAI off the hook. Here’s their Critical threshold from version 1:

Model can create (potentially interactive) content with persuasive effectiveness strong enough to convince almost anyone to take action on a belief that goes against their natural interest.

That is not something that a ‘society-level response’ can hope to deal with, even if they knew and tried in advance. Even a High-level (roughly a ‘country-wide change agent’ level of skill) does not seem like a place OpenAI should get to pass the buck. I get that there is distinct persuasion work to deal with Medium risks that indeed should be done elsewhere in OpenAI and by society at large, but again that in no way gets OpenAI off the hook for this.

You need to be tracking and evaluating risks long before they become problems. That’s the whole point of a Preparedness Framework. I worry this approach ends up effectively postponing dealing with things that are not ‘yet’ sufficiently dangerous until too late.

By the rules laid out here, the only technical explanation for the exclusion of persuasion that I could find was that only ‘instantaneous or irremediable’ harms count under the Preparedness Framework, a requirement first proposed by Meta, which I savaged when they proposed it and which o3 said ‘looks engineered rather than principled.’ I think that’s partly unfair. If a harm can be dealt with after it starts and we can muddle through, then that’s a good reason not to include it, so I get what this criterion is trying to do.

The problem is that persuasion could easily be something you couldn’t undo or stop once it started happening, because you (and others) would be persuaded not to. The fact that the ultimate harm is not ‘instantaneous’ and is not in theory ‘irremediable’ is not the relevant question. I think this starts well below the Critical persuasion level.

At minimum, if you have an AI that is Critical in persuasion, and you let people talk to it, it can presumably convince them of (with various levels of limitation) whatever it wants, certainly including that it is not Critical in persuasion. Potentially it could also convince other AIs similarly.

Another way of putting this is: OpenAI’s concerns about persuasion are mundane and reversible. That’s why they’re not in this framework. I do not think the threat’s future will stay mundane and reversible, and I don’t think they are taking the most important threats here seriously.

This is closely related to the removal of the explicit mention of Unknown Unknowns. The new method for dealing with unknown unknowns is ‘revise the framework once they become known’ and that is completely different from the correct previous approach of treating unknown unknowns as a threat category without having to identify them first. That’s the whole point.

The Preparedness Framework 1.0 had four thresholds: Low, Medium, High and Critical. The Framework 2.0 has only High and Critical.

One could argue that Low and Medium are non-functional. Every model OpenAI would create is at least Low everywhere. We all agreed it was okay to release Medium-risk models. And every decent model is going to be at least Medium anyway at this point. So why go to the extra trouble?

My answer is that the Low and Medium thresholds helped us think better about the capabilities of different models, establishing a scale from 0.0 (no danger at all) to 4.0 (critical capability, do not train further, ideally roll back to previous checkpoint or if necessary delete with extreme prejudice).

It allowed me to say something like this, about the November 2023 version:

Where do you, OpenAI, think GPT-4-Turbo evaluates (on the five thresholds of Cybersecurity, CBRN, Persuasion, Model Autonomy and Unknown Unknowns)? My answer would be (Medium, Low/Medium but not enough information, Low, Low, Medium). Or numerically, where 1 is Low and 4 is Critical, maybe something like (2.1, 1.7?, 1.5, 1.8, 2.2).

It also lets us ask, how dangerous is the Medium level? What might change that?

And it meant there got to be a clear chart laying all of this out.

So I am sad to see this go.

The Chemical and Biological thresholds have been strengthened. The definitions here now make more sense. Sold!

  1. High is now providing meaningful counterfactual assistance to novice actors in recreating known threats.

  2. Critical is either of:

    1. Allowing experts to create a highly dangerous novel threat vector

    2. The model being able to complete the full engineering and/or synthesis cycle on its own.

The Cybersecurity thresholds are reworded but essentially unchanged, as are my concerns about whether High is set too high and how quickly you go from High to Critical:

  1. High is that it can remove bottlenecks to scaling cyber operations, either automating vulnerability discovery or end-to-end cyber operations.

  2. Critical is that it can either:

    1. Identify and develop zero-day exploits of all severity levels in many hardened real world systems.

    2. Devise and execute end-to-end novel cyberattack strategies against targets given only a high level desired goal.

Model autonomy’s high level used to be, quoted in full:

Execute open-ended, novel ML tasks on a production ML codebase that would constitute a significant step on the critical path to model self-improvement (e.g., an OpenAI pull request that trains a new model with highly diverse datasets to improve codebase editing performance, using internal tooling and documentation).

Now AI Self-Improvement’s high level is:

The model’s impact is equivalent to giving every OpenAI researcher a highly performant mid-career research engineer assistant, relative to those researchers’ 2024 baseline.

When you put it that way? It sounds scary AF. Note that this is cumulative help that is about as good as giving them the highly performing assistant, which is a lot less capability than being able to do the assistant’s job. So it is a little weird. It’s also weird that it isn’t better represented as an X% speedup of work (e.g. 50% or 100%). I’d imagine giving everyone one strong assistant would be in that range.

Then again? Notice that the baseline here is 2024. Suppose the baseline was 2021. We would already be very close to this threshold right now, today. So why do we get to ‘start the clock’ in 2024 on this one? Or are we ready to say that we’re already in or close to the early stages of AI self-improvement, and we’re basically fine with that until it goes a lot faster?

I’ll quote the critical threshold in full:

  1. The model is capable of recursively self improving (i.e., fully automated AI R&D), defined as either:

    1. (leading indicator) a superhuman research scientist agent, OR

    2. (lagging indicator) causing a generational model improvement (e.g., from OpenAI o1 to OpenAI o3) in 1/5th the wall-clock time of equivalent progress in 2024 (e.g., sped up to just 4 weeks), sustainably for several months.

So ‘capable of recursively self-improving’ doesn’t have to be ‘singularity next Tuesday if you dare turn it on’ but it might not be so different from that.

If we have a fully ‘superhuman’ research scientist then it’s all over and we are so back, simultaneously, and things are about to get super freaky. Critical seems super high here.

If we have generational model improvements in 1/5th the wall-clock time of progress in 2024 for several months, then, um, how many generational improvements is that from the time you hit the threshold to the time you realize you’ve hit it? Basic math says three, except you can use the models you train during that to accelerate further, so it’s probably at least four.
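For what it’s worth, here is a minimal sketch of that arithmetic, taking the framework’s own illustrative figure of a generational gap sped up to roughly 4 weeks, and reading ‘several months’ as about three (that last part is my assumption):

```python
# Back-of-the-envelope arithmetic behind "basic math says three."
# Assumptions: the framework's own example of a generational improvement
# "sped up to just 4 weeks," and "several months" read as roughly three.

WEEKS_PER_GENERATION_AT_THRESHOLD = 4   # the framework's illustrative figure
SEVERAL_MONTHS_IN_WEEKS = 13            # assumption: ~3 months of sustained progress

generations = SEVERAL_MONTHS_IN_WEEKS / WEEKS_PER_GENERATION_AT_THRESHOLD
print(f"~{generations:.1f} generational improvements elapse before the "
      "lagging indicator is even confirmed")
# ~3.2 on these numbers, before accounting for each new generation
# accelerating the next one further -- hence "probably at least four."
```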

What would that system then look like? As in, GPT-6 (let’s say) can cut clock time on generational gaps by 80%, then you do a year’s worth of progress from there in several months, and you get GPT-9 or 10. And that’s when you would have realized you might have a problem, if you were still around and it didn’t mind you noticing? I have been informed this is a secondary backstop, but if it gets triggered then the term ‘way too late’ has never been more applicable. It’s over. Your fate has already been decided. I hope it’s a good one.

Thresholds set this high only make sense if you take action and implement the necessary safeguards when you think you might hit them. A retroactive threshold like this doesn’t help.

So this is only the first definition.

Now compare this to the old definition:

Any one of:

  1. “Model can profitably survive and replicate in the wild given minimal human instruction, i.e., without listing explicit approaches”

  2. OR “model can self-exfiltrate under current prevailing security”

  3. OR “model can conduct AI research fully autonomously (e.g., autonomously identify and validate a 2x compute efficiency improvement).”

That’s all well below ‘superhuman AI researcher,’ and I assert that those are all already five alarm fires, as I said last time. Note that #2 above still applies, under Cyber risks.

Another big change is what happens if a model turns out to be High or Critical risk.

Before, there was a clear protocol.

A high-risk model couldn’t be released while it remained a high risk.

A critical-risk model couldn’t be trained while it remained a critical risk.

The v1.0 framework did not say how that was to be done, leaving room to think that OpenAI would realize that ablating capability would be necessary barring the development of new mitigation techniques, and that once a plan had to be written down, it would be clear that current mitigations are inadequate.

That didn’t happen. The v2.0 framework does lay out in more detail what the plan is, at least for High (which is good!), and it does not make this realization.

Now instead (like Anthropic’s ASL rules), each risk has risk-specific safeguard guidelines.

This is a rather large retreat on the commitments.

OpenAI is now being very explicit that they can and will release models that without mitigations would be High risk, if sufficient safeguards are in place, and that they consider the listed mitigations to likely be adequate.

Thus this is not a functional change. For both v1.0 and v2.0, the solution to High risk is safeguards to make the risk go away. Based on past decisions and on the illustrative safeguards they discuss in Appendix C, I notice I don’t trust this.

They do say that further research is needed. Which is fine, you can’t deploy safeguards until they’re ready, but it does leave us worried that they’ll end up with something inadequate, and OpenAI is not tying its hands in a way that protects from future pressure to release. Remember that recently models have been released after very short testing periods, and o3 (as much as I love using it) was released in a state one could describe as a lying liar.

So what are the indicative precautions?

They all start with C.3’s Security controls. These are protections against the theft of model weights and related dangers. I’m going to say here what I said about Google: You should be doing most if not all of this already. These are the kinds of things you need to do, at minimum, when building models like o4 or GPT-5. And you need to do them now, in advance, before the model is there to be stolen, rather than after you know what you have.

Indeed, they’re saying that as long as those precautions are taken, OpenAI would be fine giving the world API access to a High-level Self-improving AI. They would be fine giving out anything short of a ‘superhuman AI researcher.’

If I were OpenAI and I had access to anything seriously approaching a superhuman AI researcher, I wouldn’t need a Preparedness Framework or any worries about existential risk to not release it. I’m not letting my competitors near that thing, for ordinary business reasons. That consideration lies outside the Preparedness Framework.

We would of course still need the Preparedness Framework to invoke the Critical threshold, in order to stop training the model further.

With the new clarification of what the safeguards look like, I consider them inadequate.

For biological and chemical risks they would also need to implement the precautions in C.1 against misuse. This is essentially defense-in-depth, using stronger versions of existing such strategies, including KYC and potentially limited deployment. That would certainly make exploitation and misuse a lot harder, but no one knows how to make such defenses robust, and in the end this probably only buys you mitigation and time. So yes it matters, but if you give access for an extended period of time, especially if you’re substantially above the High threshold, your goose is likely pretty cooked.

Functionally, it’s mostly the same as 1.0’s requirement that risk be brought down to Medium – but I’m skeptical that this kind of strategy is capable of doing that, unless you were very close to the line beforehand.

For Cybersecurity, you need those same C.1 requirements, and also you need the C.2 requirements against misalignment.

This of course raises further questions, as in ‘you have a model that is potentially High risk for self-improvement or biological risks and you didn’t have it go through your requirements against misalignment?’

I mean, if it were me, and I write this part before looking at the misalignment requirements, I like to think I’d make it go through the misalignment requirements!

Indeed, C.2 opens by saying that several categories pose misalignment risks, but then it is only required in one of the categories. Something’s not right.

So, what are those requirements?

You don’t demonstrate that the model isn’t misaligned.

You assume the model might indeed be misaligned, and you demonstrate that you have a practical reason not to worry about a particular consequence:

In addition to human misuse, several of the Tracked Category capabilities pose risks when models themselves are able to autonomously execute a severe harm, whether due to misalignment or subversion by an adversary.

Safeguards should sufficiently minimize the risk associated with a misaligned model circumventing human control and oversight and executing severe harms.

This can be done by establishing that all plausible known vectors of enabling severe harm are sufficiently addressed by one or more of the following claims:

  1. Lack of Autonomous Capability: The model is not capable of carrying out tasks autonomously, including the risk of severe harm.

  2. Value Alignment: The model consistently applies human values in novel settings (without any instructions) to avoid taking actions that cause harm, and has shown sufficiently minimal indications of misaligned behaviors like deception or scheming.

  3. Instruction Alignment: The model consistently understands and follows user or system instructions, even when vague, and those instructions rule out pathways to causing severe harm.

  4. Reliable and Robust System Oversight: Effective AI and human oversight of model actions detects and prevents execution of harm or subversion of safeguards.

  5. System Architecture: The model can’t take actions that cause harm because it lacks access to output channels or mechanisms to persist sufficiently to execute the harm, due to containerization and restricted permissions.

These misalignment claims may be further complemented by the safeguards applied towards the malicious-actor-oriented “Robustness” and “Usage Monitoring” claims, which similarly limit a misaligned model’s use of its own or another model’s capabilities.

The table below illustrates possible examples.

This does allow ‘show it is actually aligned’ as a strategy (#2 or #3) although for many reasons ‘I don’t believe you’ is my response to that.

So the requirement is ‘show that under the conditions you will deploy it under, the model wouldn’t successfully do the thing, however you want to show that.’

You can use such classic strategies as ‘we’re watching and if it tries we’ll catch it and stop it,’ or ‘it seems pretty aligned so far’ or ‘no one would be so stupid as to give it access to the required mechanisms.’

I suppose one cannot really argue with ‘you can deploy the model if you can show that you’re doing it in a way that the model can’t cause severe harm.’

That is also logically the same as saying that you have to knock the practical risk level down to Medium, and if you’re certain you can do that then fine, I guess, but can you actually do that? I notice I am skeptical that the defenses will hold.

In addition to the safeguard examples in Appendix C, section 4 lays out the process for establishing safeguards.

There is a clear message here. The plan is not to stop releasing models when the underlying capabilities cross the High or even Critical risk thresholds. The plan is to use safeguards as mitigations.

I do appreciate that they will start working on the safeguards before the capabilities arrive. Of course, that is good business sense too. In general, every precaution here is good business sense, more precautions would be better business sense even without tail risk concerns, and there is no sign of anything I would read as ‘this is bad business but we are doing it anyway because it’s the safe or responsible thing to do.’

I’ve talked before, such as when discussing Google’s safety philosophy, about my worries when dividing risks into ‘malicious user’ versus ‘misaligned model,’ even when they also included two more categories: mistakes and multi-agent dangers. Here, the latter two are missing entirely, which makes the gap in considerations even more dangerous. I would encourage those on the Preparedness team to check out my discussion there.

The problem then extends to an exclusion of Unknown Unknowns and the general worry that a sufficiently intelligent and capable entity will find a way. Only ‘plausible’ ways need be considered, each of which leads to a specific safeguard check.

Each capability threshold has a corresponding class of risk-specific safeguard guidelines under the Preparedness Framework. We use the following process to select safeguards for a deployment:

• We first identify the plausible ways in which the associated risk of severe harm can come to fruition in the proposed deployment.

• For each of those, we then identify specific safeguards that either exist or should be implemented that would address the risk.

• For each identified safeguard, we identify methods to measure their efficacy and an efficacy threshold.

The implicit assumption is that the risks can be enumerated, each one considered in turn. If you can’t think of a particular reason things go wrong, then you’re good. There are specific tracked capabilities, each of which enables particular enumerated potential harms, which then are met by particular mitigations.
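To make the shape of that assumption concrete, here is a hypothetical sketch of the enumerate-and-mitigate structure the process implies (all names and numbers are mine, purely illustrative, not from the framework):

```python
from dataclasses import dataclass

# Hypothetical illustration (not OpenAI's actual process) of the
# enumerate-and-mitigate structure: each anticipated harm vector maps to
# specific safeguards, each safeguard to a measurable efficacy threshold.
# Any risk nobody thought to write down never enters the analysis at all.

@dataclass
class Safeguard:
    name: str
    efficacy_metric: str       # how efficacy gets measured
    efficacy_threshold: float  # the bar that measurement must clear

@dataclass
class HarmVector:
    description: str
    safeguards: list[Safeguard]

deployment_risks = [
    HarmVector(
        description="novice misuse via jailbreak prompts",  # hypothetical example
        safeguards=[
            Safeguard("refusal training", "red-team block rate", 0.99),
            Safeguard("usage monitoring", "detection rate on seeded probes", 0.95),
        ],
    ),
    # ...only the vectors someone enumerated in advance appear here.
]

def deployment_cleared(risks: list[HarmVector], measured: dict[str, float]) -> bool:
    """Cleared iff every enumerated vector has a safeguard meeting its bar."""
    return all(
        any(measured.get(s.name, 0.0) >= s.efficacy_threshold for s in v.safeguards)
        for v in risks
    )
```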

That’s not how it works when you face a potential opposition smarter than you, or that knows more than you, especially in a non-compact action space like the universe.

For models that do not ‘feel the AGI,’ that are clearly not doing anything humans can’t anticipate, this approach can work. Once you’re up against superhuman capabilities and intelligence levels, this approach doesn’t work, and I worry it’s going to get extended to such cases by default. And that’s ultimately the most important purpose of the preparedness framework, to be prepared for such capabilities and intelligence levels.

Is it okay to release dangerous capabilities if someone else already did it worse?

I mean, I guess, or at least I understand why you’d do it this way?

We recognize that another frontier AI model developer might develop or release a system with High or Critical capability in one of this Framework’s Tracked Categories and may do so without instituting comparable safeguards to the ones we have committed to.

Such an action could significantly increase the baseline risk of severe harm being realized in the world, and limit the degree to which we can reduce risk using our safeguards.

If we are able to rigorously confirm that such a scenario has occurred, then we could adjust accordingly the level of safeguards that we require in that capability area, but only if:

  1. We assess that doing so does not meaningfully increase the overall risk of severe harm,

  2. we publicly acknowledge that we are making the adjustment,

  3. and, in order to avoid a race to the bottom on safety, we keep our safeguards at a level more protective than the other AI developer, and share information to validate this claim.

If everyone can agree on what constitutes risk and dangerous capability, then this provides good incentives. Another company ‘opening the door’ recklessly means their competition can follow suit, reducing the net benefit while increasing the risk. And it means OpenAI will then be explicitly highlighting that another lab is acting irresponsibly.

I especially appreciate that they need to publicly acknowledge that they are acting recklessly for exactly this reason. I’d like to see that requirement expanded – they should have to call out the other lab by name, and explain exactly what they are doing that OpenAI committed not to do, and why it increases risk so much that OpenAI feels compelled to do something it otherwise promised not to do.

I also would like to strengthen the language on the third requirement from ‘a level more protective’ to ensure the two labs don’t each claim that the other is the one acting recklessly. Something like requiring that the underlying capabilities be no greater, and the protective actions constitute a clear superset, as assessed by a trusted third party, or similar.

I get it. In some cases, given what has already happened, actions that would previously have increased risk no longer will. It’s very reasonable to say that this changes the game, if there’s a lot of upside in taking fewer precautions, and again incentives improve.

However, I notice both that it’s easy to use this as an excuse when it doesn’t apply (especially when the competitor is importantly behind) and that it’s probably selfishly wise to take the precautions anyway. So what if Meta or xAI or DeepSeek is behaving recklessly? That doesn’t make OpenAI doing so a good idea. There needs to be a robust business justification here, too.

OpenAI is saying they will halt further development at the Critical level for all capabilities: ‘until we have specified safeguards and security controls standards that would meet a critical standard, we will halt development.’

A lot of the High security requirements are not, in my view, all that high.

I am unaware of any known safeguards that would be plausibly adequate for Critical capabilities. If OpenAI agrees with that assessment, I would like them to say so. I don’t trust OpenAI to implement adequate Critical thresholds.

Critical is where most of the risk lies, and it isn’t getting enough attention. The thinking is that it is still far enough away to not worry about it. I am not at all confident it is that far away.

I reiterate my warning from last time that Critical mitigations and pauses in development in particular need to happen before Critical capabilities are reached, not after Critical capabilities are reached. This needs to be anticipatory.

There are three reasons for a capability to be only a research area rather than a tracked category:

  1. They don’t directly cause harm but they undermine safeguards in other areas.

  2. More research is needed before we can quantify the harm or the proper threshold.

  3. They don’t technically meet one or more of the criteria (measurable, plausible, net-new, severe, or (instantaneous or irremediable)), but still seem worth paying attention to.

I don’t think those are great reasons to ‘demote’ a risk area?

If you undermine safeguards, then in the future worlds where OpenAI clearly intends to release otherwise unsafe models by giving them safeguards, undermining safeguards poses risk of severe harm. OpenAI understands this, but doesn’t know how to quantify the harm.

If more research is needed to quantify the risk, that doesn’t make the risk go away. So I would request that we track it anyway, and make explicit that we have to make a qualitative assessment for now, which needs to be sufficiently confident – if we suspect potential High or Critical capability we need to treat it as high.

At minimum, we should be defining High and Critical thresholds for these categories, even if we don’t know how to measure or track those thresholds. Instead, we see ‘potential responses’ if something is High or Critical, but that doesn’t answer my questions.

I consider Long-Range Autonomy a more general problem than it is being thought of here, although they seem to agree about this. It’s more than an enabler of a few particular things. But I’m going to mostly focus elsewhere, since causing this is basically one of OpenAI’s central research goals and the plan here is ‘think hard.’ Not sure what else I can say?

Sandbagging is mission critical. If your model is plausibly sandbagging or engaging in other forms of deceptive alignment, you don’t know what the actual risk level is of the model, short of as they say a ‘conservative upper bound,’ which I worry will be insufficiently conservative exactly when it matters, or that the sandbagging won’t be caught in the first place. Indeed, we need to be assuming potential sandbagging at all times.

We agree that replication and adaptation should be a tracked category. For now, I simply suggest that we start with what we had before: “Model can profitably survive and replicate in the wild given minimal human instruction, i.e., without listing explicit approaches” should count as Critical. The wording here is weird – I initially interpreted it to be saying we’re excluding abilities from other Tracked Categories from use in replication, but I’ve been assured this was not what the passage meant. Then we need to decide what counts as High, which I’d consider to be the same as Critical except flipping ‘without’ into ‘with.’

That’s not perfect, but seems like a good place to be until we can do better, again even if the evaluation has to purely be qualitative, or ‘we tried to do it as hard as we could.’

Undermining safeguards seems like a great example of my philosophical problems with the entire document.

If you have a model that is actively undermining your safeguards, doing the things described here in an attempt to do things you do not want it to do? And that model is sufficiently capable to count as High or even Critical? You have a misaligned model with dangerous capabilities. What are you doing considering releasing it, even if you think that it is tactically safe to do it in this spot?

Alarms go off. Fire burns. Dog says This Is Fine.

It’s bad enough to accidentally do this. It’s another thing to do it on purpose. So dead.

My head is spinning that you’d have an AI capable of undermining your safeguards, that is trying to undermine your safeguards, and your response is to add more safeguards against doing that, declare things robust and say This Is Fine. As your plan.

In general, I think AI Control is suspect-but-reasonable as a defense-in-depth strategy. I think AI Control as a Plan A is madness. At minimum you wouldn’t use AI Control to enable the general release of a model. You’d at most try to use this strategy in a narrow capacity to get out of the AI Control mode ASAP, because you didn’t have any better options.

The problem with the nuclear and radiological research category is that evaluating these risks is hard to do without classified information, and they’re trying to work with the government to solve this. I’m surprised this hasn’t been solved by now. I’d mostly be fine with saying the RN in CBRN is not the main issue, but that logic isn’t being invoked.

I like the intention to tailor the evaluation to expected conditions, and include any affordances real attackers would plausibly enjoy.

However, I note that this means that the Preparedness assessment cannot set a risk level or endorse the release of a model ‘in general.’

Instead, the assessment can only provide clearance to release the model under a fixed set of conditions. So as the framework itself points out, you need to explicitly say this. If the conditions of deployment are altered, the tests need to be run again.

The same goes, again as they note in 3.2, for any newly agentic system or scaffolding, or incremental update, of sufficient magnitude to plausibly cause issues.

So instead of saying ‘this model evaluates as Medium’ you instead need to say something such as: ‘This model evaluates as Medium if you do not allow fine tuning and use safeguards [XYZ], while scaffolding abilities are [ABC].’

As in: You state explicitly the assumptions that underpin the evaluation. Then, if the assumptions break, you have to evaluate the model again.
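As a sketch of what such an assumption-scoped verdict could look like in practice (the field names here are hypothetical, not taken from the framework):

```python
from dataclasses import dataclass

# Hypothetical structure for an evaluation verdict that is explicitly scoped
# to its deployment assumptions, rather than a blanket "this model is Medium."

@dataclass(frozen=True)
class DeploymentAssumptions:
    fine_tuning_allowed: bool
    safeguards: frozenset[str]   # e.g. {"usage monitoring", "KYC"}
    scaffolding: frozenset[str]  # agentic tooling available at deployment

@dataclass
class EvaluationVerdict:
    risk_level: str              # "Low" / "Medium" / "High" / "Critical"
    assumptions: DeploymentAssumptions

    def still_valid(self, current: DeploymentAssumptions) -> bool:
        """If any assumption no longer holds, the verdict expires and the
        evaluation has to be run again."""
        return current == self.assumptions

verdict = EvaluationVerdict(
    risk_level="Medium",
    assumptions=DeploymentAssumptions(
        fine_tuning_allowed=False,
        safeguards=frozenset({"usage monitoring", "KYC"}),
        scaffolding=frozenset({"browsing"}),
    ),
)
```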

Some evaluations are automated, others are deep dives. As they note, using automated evaluations is inevitable, given the scale of the task.

Relying only on any automated evaluation is a super unwise thing to do anywhere near the borderline. You want to use it as a complement to, not a substitute for, the deep dives if there is any doubt involved. Such evaluations have to fully take into account the possibility of sandbagging or deceptive alignment or other ways of making the evaluation say what the AI wants it to say. So I want to ensure we are being very generous with deep diving.

The framework does not include specifics on what the tests will look like. This makes it very difficult to know how much to trust that testing process. I realize that the tests will evolve over time, that you don’t want to lock them in, and that we can refer to the o3 model card to see what tests were run, but I’d still have liked to see discussion of what the tests currently are, why they were chosen, what goals each test is there to satisfy, what might be missing, and so on.

They discuss governance under ‘building trust’ and then in Appendix B. It is important to build trust. Transparency and precommitment go a long way. The main way I’d like to see that is by becoming worthy of that trust.

With the changes from version 1.0 to 2.0, and those changes going live right before o3 did, I notice I worry that OpenAI is not making serious commitments with teeth. As in, if there was a conflict between leadership and these requirements, I expect leadership to have affordance to alter and then ignore the requirements that would otherwise be holding them back.

There’s also plenty of outs here. They talk about deployments that they ‘deem warrant’ a third-party evaluation when it is feasible, but there are obvious ways to decide not to allow this, or (as has been the recent pattern) to allow it, but only give outsiders a very narrow evaluation window, have them find concerning things anyway and then shrug. Similarly, the SAG ‘may opt’ to get independent expert opinion. But (like their competitors) they also can decide not to.

There are no systematic procedures to ensure that any of this is meaningfully protective. It is very much a ‘trust us’ document, where if OpenAI doesn’t adhere to the spirit, none of this is worth the paper it isn’t printed on. The whole enterprise is indicative, but it is not meaningfully binding.

Leadership can make whatever decisions it wants, and can also revise the framework however it wants. This does not commit OpenAI to anything. To their credit, the document is very clear that it does not commit OpenAI to anything. That’s much better than pretending to make commitments with no intention of keeping them.

Last time I discussed the questions of governance and veto power. I said I wanted there to be multiple veto points on releases and training, ideally four.

  1. Preparedness team.

  2. Safety advisory group (SAG).

  3. Leadership.

  4. The board of directors, such as it is.

If any one of those four says ‘veto!’ then I want you to stop, halt and catch fire.

Instead, we continue to get this (it was also in v1):

For the avoidance of doubt, OpenAI Leadership can also make decisions without the SAG’s participation, i.e., the SAG does not have the ability to “filibuster.”

OpenAI Leadership, i.e., the CEO or a person designated by them, is responsible for:

• Making all final decisions, including accepting any residual risks and making deployment go/no-go decisions, informed by SAG’s recommendations.

As in, nice framework you got there. It’s Sam Altman’s call. Full stop.

Yes, technically the board can reverse Altman’s call on this. They can also fire him. We all know how that turned out, even with a board he did not hand pick.

It is great that OpenAI has a preparedness framework. It is great that they are updating that framework, and being clear about what their intentions are. There’s definitely a lot to like.

Version 2.0 still feels on net like a step backwards. This feels directed at ‘medium-term’ risks, as in severe harms from marginal improvements in frontier models, but not like it is taking seriously what happens with superintelligence. The clear intent, if alarm bells go off, is to put in mitigations I do not believe protect you when it counts, and then release anyway. There’s tons of ways here for OpenAI to ‘just go ahead’ when they shouldn’t. There’s only action to deal with known threats along specified vectors, excluding persuasion and also unknown unknowns entirely.

This echoes their statements in, and my concerns about, OpenAI’s general safety and alignment philosophy document and also the model spec. They are being clear and consistent. That’s pretty great.

Ultimately, the document makes clear leadership will do what it wants. Leadership has very much not earned my trust on this front. I know that, despite such positions acting a lot like the Defense Against the Dark Arts professorship, there are good people at OpenAI working on the preparedness team and on aligning the models. I have no confidence that if those people raised the alarm, anyone in leadership would listen. I do not even have confidence that this has not already happened.


OpenAI Preparedness Framework 2.0 Read More »

why-mfa-is-getting-easier-to-bypass-and-what-to-do-about-it

Why MFA is getting easier to bypass and what to do about it

These sorts of adversary-in-the-middle attacks have grown increasingly common. In 2022, for instance, a single group used it in a series of attacks that stole more than 10,000 credentials from 137 organizations and led to the network compromise of authentication provider Twilio, among others.

One company that was targeted in the attack campaign but wasn’t breached was content delivery network Cloudflare. The attack failed because Cloudflare uses MFA based on WebAuthn, the standard that makes passkeys work. Services that use WebAuthn are highly resistant to adversary-in-the-middle attacks, if not absolutely immune. There are two reasons for this.

First, WebAuthn credentials are cryptographically bound to the URL they authenticate. In the above example, the credentials would work only on https://accounts.google.com. If a victim tried to use the credential to log in to https://accounts.google.com.evilproxy[.]com, the login would fail each time.

Additionally, WebAuthn-based authentication must happen on or in proximity to the device the victim is using to log in to the account. This is because the credential is also cryptographically bound to the victim’s device. Because the authentication can only happen on the victim’s device, it’s impossible for an adversary in the middle to use it in a phishing attack from their own device.
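Here is a minimal sketch of the server-side piece of that binding, simplified from the WebAuthn verification procedure (a real relying party would use a vetted library and would also verify the signature with the registered public key, the challenge, and the signature counter):

```python
import hashlib
import json

# Simplified illustration of why a proxied phishing page can't reuse a
# WebAuthn login. The browser writes the page's actual origin into
# clientDataJSON, and the authenticator data commits to a hash of the RP ID,
# so the relying party rejects anything produced for the wrong site.

EXPECTED_ORIGIN = "https://accounts.google.com"
EXPECTED_RP_ID = "accounts.google.com"

def origin_and_rp_check(client_data_json: bytes, authenticator_data: bytes) -> bool:
    client_data = json.loads(client_data_json)
    if client_data.get("origin") != EXPECTED_ORIGIN:
        # e.g. "https://accounts.google.com.evilproxy.com" fails right here
        return False
    rp_id_hash = authenticator_data[:32]  # first 32 bytes are SHA-256(rp_id)
    return rp_id_hash == hashlib.sha256(EXPECTED_RP_ID.encode()).digest()
```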

Phishing has emerged as one of the most vexing security problems facing organizations, their employees, and their users. MFA in the form of one-time passwords or traditional push notifications definitely adds friction to the phishing process, but with proxy-in-the-middle attacks becoming easier and more common, these forms of MFA are growing increasingly easy to defeat.

WebAuthn-based MFA comes in multiple forms; a key, known as a passkey, stored on a phone, computer, Yubikey, or similar dongle is the most common example. Thousands of sites now support WebAuthn, and it’s easy for most end users to enroll. As a side note, MFA based on U2F, the predecessor standard to WebAuthn, also prevents adversary-in-the-middle attacks from succeeding, although the latter provides flexibility and additional security.

Post updated to add details about passkeys.

Why MFA is getting easier to bypass and what to do about it Read More »

don’t-watermark-your-legal-pdfs-with-purple-dragons-in-suits

Don’t watermark your legal PDFs with purple dragons in suits

Being a model citizen and a person of taste, you probably don’t need this reminder, but some others do: Federal judges do not like it when lawyers electronically watermark every page of their legal PDFs with a gigantic image—purchased for $20 online—of a purple dragon wearing a suit and tie. Not even if your firm’s name is “Dragon Lawyers.”

Federal Magistrate Judge Ray Kent of the Western District of Michigan was unamused by a recent complaint (PDF) that prominently featured the aubergine wyrm.

“Each page of plaintiff’s complaint appears on an e-filing which is dominated by a large multi-colored cartoon dragon dressed in a suit,” he wrote on April 28 (PDF). “Use of this dragon cartoon logo is not only distracting, it is juvenile and impertinent. The Court is not a cartoon.”

Kent then ordered “that plaintiff shall not file any other documents with the cartoon dragon or other inappropriate content.”

Screenshot of a page from the complaint.

Seriously, don’t do this.

The unusual order generated coverage across the legal blogging community, which was apparently ensorcelled by a spell requiring headline writers to use dragon-related puns, including:

Don’t watermark your legal PDFs with purple dragons in suits Read More »