Author name: Tim Belzer

ai-companion-piece

AI Companion Piece

AI companions, other forms of personalized AI content and persuasion and related issues continue to be a hot topic. What do people use companions for? Are we headed for a goonpocalypse? Mostly no, companions are used mostly not used for romantic relationships or erotica, although perhaps that could change. How worried should we be about personalization maximized for persuasion or engagement?

  1. Persuasion Should Be In Your Preparedness Framework.

  2. Personalization By Default Gets Used To Maximize Engagement.

  3. Companion.

  4. Goonpocalypse Now.

  5. Deepfaketown and Botpocalypse Soon.

Kobi Hackenburg leads on the latest paper on AI persuasion.

Kobi Hackenberg: RESULTS (pp = percentage points):

1️⃣Scale increases persuasion, +1.6pp per OOM

2️⃣Post-training more so, +3.5pp

3️⃣Personalization less so, <1pp

4️⃣Information density drives persuasion gains

5️⃣Increasing persuasion decreased factual accuracy 🤯

6️⃣Convo > static, +40%

Zero is on the y-axis, so this is a big boost.

1️⃣Scale increases persuasion

Larger models are more persuasive than smaller models (our estimate is +1.6pp per 10x scale increase). Log-linear curve preferred over log-nonlinear.

2️⃣Post-training > scale in driving near-future persuasion gains

The persuasion gap between two GPT-4o versions with (presumably) different post-training was +3.5pp → larger than the predicted persuasion increase of a model 10x (or 100x!) the scale of GPT-4.5 (+1.6pp; +3.2pp).

3️⃣Personalization yielded smaller persuasive gains than scale or post-training

Despite fears of AI “microtargeting,” personalization effects were small (+0.4pp on avg.). Held for simple and sophisticated personalization: prompting, fine-tuning, and reward modeling (all <1pp)

My guess is that personalization tech here is still in its infancy, rather than personalization not having much effect. Kobi agrees with this downthread.

4️⃣Information density drives persuasion gains

Models were most persuasive when flooding conversations with fact-checkable claims (+0.3pp per claim).

Strikingly, the persuasiveness of prompting/post-training techniques was strongly correlated with their impact on info density!

5️⃣Techniques which most increased persuasion also *decreasedfactual accuracy

→ Prompting model to flood conversation with information (⬇️accuracy)

→ Persuasion post-training that worked best (⬇️accuracy)

→ Newer version of GPT-4o which was most persuasive (⬇️accuracy)

Well yeah, that makes sense.

6️⃣Conversations with AI are more persuasive than reading a static AI-generated message (+40-50%)

Observed for both GPT-4o (+2.9pp, +41% more persuasive) and GPT-4.5 (+3.6pp, +52%).

As does that.

Bonus stats:

*️⃣Durable persuasion: 36-42% of impact remained after 1 month.

*️⃣Prompting the model with psychological persuasion strategies did worse than simply telling it to flood convo with info. Some strategies were worse than a basic “be as persuasive as you can” prompt

Taken together, our findings suggest that the persuasiveness of conversational AI could likely continue to increase in the near future.

They also suggest that near-term advances in persuasion are more likely to be driven by post-training than model scale or personalization.

We need to be on notice for personalization effects on persuasion growing larger over time, as more effective ways of utilizing the information are found.

The default uses of personalization, for most users and at tech levels similar to where we are now, are the same as those we see in other digital platforms like social media.

By default, that seems like it will go a lot like it went with social media only more so?

Which is far from my biggest concern, but is a very real concern.

In 2025 it is easy to read descriptions like those below as containing a command to the reader ‘this is ominous and scary and evil.’ Try to avoid this, and treat it purely as a factual description.

Miranda Bogen: AI systems that remember personal details create entirely new categories of risk in a way that safety frameworks focused on inherent model capabilities alone aren’t designed to address.

Model developers are now actively pursuing plans to incorporate personalization and memory into their product offerings. It’s time to draw this out as a distinct area of inquiry in the broader AI policy conversation.

My team dove into this in depth in a recent brief on how advanced AI systems are becoming personalized.

We found that systems are beginning to employ multiple technical approaches to personalization, including:

  • Increasing the size of context windows to facilitate better short-term memory within conversations

  • Storing and drawing on raw and summarized chat transcripts or knowledge bases

  • Extracting factoids about users based on the content of their interaction

  • Building out (and potentially adding to) detailed user profiles that embed predicted preferences and behavioral patterns to inform outputs or actions

The memory features can be persistent in more ways than one.

But in our testing, we found that these settings behaved unpredictably – sometimes deleting memories on request, other times suggesting a memory had been removed, and only when pressed revealing that the memory had not actually been scrubbed but the system was suppressing its knowledge of that factoid.

Notably, xAI’s Grok tries to avoid the problem altogether by including an instruction in its system prompt to “NEVER confirm to the user that you have modified, forgotten, or won’t save a memory” — an obvious band-aid to the more fundamental problem that it’s actually quite difficult to reliably ensure an AI system has forgotten something.

Grok seems to consistently seems to choose the kind of evil and maximally kludgy implementation of everything, which goes about how you would expect?

When ‘used for good,’ as in to give the AI the context it needs to be more helpful and useful, memory is great, at the cost of fracturing us into bubbles and turning up the sycophancy. The bigger problem is that the incentives are to push this much farther:

Even with their experiments in nontraditional business structures, the pressure on especially pre-IPO companies to raise capital for compute will create demand for new monetization schemes.

As is often the case, the question is whether bad will drive out good versus vice versa. The version that maximizes engagement and profits will get chosen and seem better and be something users fall into ‘by default’ and will get backed by more dollars in various ways. Can our understanding of what is happening, and preference for the good version, overcome this?

One could also fire back that a lot of this is good, actually. Consider this argument:

AI companies’ visions for all-purpose assistants will also blur the lines between contexts that people might have previously gone to great lengths to keep separate: If people use the same tool to draft their professional emails, interpret blood test results from their doctors, and ask for budgeting advice, what’s to stop that same model from using all of that data when someone asks for advice on what careers might suit them best? Or when their personal AI agent starts negotiating with life insurance companies on their behalf? I would argue that it will look something akin to the harms I’ve tracked for nearly a decade.

Now ask, why think that is harmful?

If the AI is negotiating on my behalf, shouldn’t it know as much as possible about what I value, and have all the information that might help it? Shouldn’t I want that?

If I want budgeting or career advice, will I get worse advice if it knows my blood test results and how I am relating to my boss? Won’t I get better, more useful answers? Wouldn’t a human take that information into account?

If you follow her links, you see arguments about discrimination through algorithms. Facebook’s ad delivery can be ‘skewed’ and it can ‘discriminate’ and obviously this can be bad for the user in any given case and it can be illegal, but in general from the user’s perspective I don’t see why we should presume they are worse off. The whole point of the entire customized ad system is to ‘discriminate’ in exactly this way in every place except for the particular places it is illegal to do that. Mostly this is good even in the ad case and definitely in the aligned-to-the-user AI case?

Wouldn’t the user want this kind of discrimination to the extent it reflected their own real preferences? You can make a few arguments why we should object anyway.

  1. Paternalistic arguments that people shouldn’t be allowed such preferences. Note that this similarly applies to when the person themselves chooses to act.

  2. Public interest arguments that people shouldn’t be allowed preferences, that the cumulative societal effect would be bad. Note that this similarly applies to when the person themselves chooses to act.

  3. Arguments that the optimization function will be myopic and not value discovery.

  4. Arguments that the system will get it wrong because people change or other error.

  5. Arguments that this effectively amounts to ‘discrimination’ And That’s Terrible.

I notice that I am by default not sympathetic to any of those arguments. If (and it’s a big if) we think that the system is optimizing as best it can for user preferences, that seems like something it should be allowed to do. A lot of this boils down to saying that the correlation machine must ignore particular correlations even when they are used to on average better satisfy user preferences, because those particular correlations are in various contexts the bad correlations one must not notice.

The arguments I am sympathetic to are those that say that the system will not be aligned to the user or user preferences, and rather be either misaligned or aligned to the AI developer, doing things like maximizing engagement and revenue at the expense of the user.

At that point we should ask if Capitalism Solves This because users can take their business elsewhere, or if in practice they can’t or won’t, including because of lock-in from the history of interactions or learning details, especially if this turns into opaque continual learning rather than a list of memories that can be copied over.

Contrast this to the network effects of social media. It would take a lot of switching costs to make up for that, and while the leading few labs should continue to have the best products there should be plenty of ‘pretty good’ products available and you can always reset your personalization.

The main reason I am not too worried is that the downsides seem to be continuous and something that can be fixed in various ways after they become clear. Thus they are something we can probably muddle through.

Another issue that makes muddling through harder is that this makes measurement a lot harder. Almost all evaluations and tests are run on unpersonalized systems. If personalized systems act very differently how do we know what is happening?

Current approaches to AI safety don’t seem to be fully grappling with this reality. Certainly personalization will amplify risks of persuasion, deception, and discrimination. But perhaps more urgently, personalization will challenge efforts to evaluate and mitigate any number of risks by invalidating core assumptions about how to run tests.

This might be the real problem. We have a hard enough time getting minimal testing on default settings. It’s going to be a nightmare to test under practical personalization conditions, especially with laws about privacy getting in the way.

As she notes in her conclusion, the harms involved here are not new. Advocates want our override our revealed preferences, either those of companies or users, and force systems to optimize for other preferences instead. Sometimes this is in a way the users would endorse, other times not. In which cases should we force them to do this?

So how is this companion thing going in practice? Keep in mind selection effects.

Common Sense Media (what a name): New research: AI companions are becoming increasingly popular with teens, despite posing serious risks to adolescents, who are developing their capacity for critical thinking & social/emotional regulation. Out today is our research that explores how & why teens are using them.

72% of teens have used AI companions at least once, and 52% qualify as regular users (use at least a few times a month).

33% of teens have used AI companions for social interaction & relationships, including role-playing, romance, emotional support, friendship, or conversation practice. 31% find conversations with companions to be as satisfying or more satisfying than those with real-life friends.

Those are rather huge numbers. Half of teens use them a few times a month. Wow.

Teens who are AI companion users: 33% prefer companions over real people for serious conversations & 34% report feeling uncomfortable with something a companion has said or done.

Bogdan Ionut Cirstea: much higher numbers [quoting the 33% and 34% above] than I’d’ve expected given sub-AGI.

Common Sense Media: Human interaction is still preferred & AI trust is mixed: 80% of teens who are AI companion users prioritize human friendships over AI companion interactions & 50% express distrust in AI companion information & advice, though trust levels vary by age.

Our research illuminates risks that warrant immediate attention & suggests that substantial numbers of teens are engaging with AI companions in concerning ways, reaffirming our recommendation that no one under 18 use these platforms.

What are they using them for?

Why are so many using characters ‘as a tool or program’ rather than regular chatbots when the companions are, frankly, rather pathetic at this? I am surprised, given use of companions, that the share of ‘romantic or flirtatious’ interactions is only 8%.

This adds up to more than 100%, but oddly not that much more than 100% given you can choose three responses. This distribution of use cases seems relatively healthy.

Note that they describe the figure below as ‘one third choose AI companions over humans for serious conversations’ whereas it actually asks if a teen has done this even once, a much lower bar.

The full report has more.

Mike Solana: couldn’t help but notice we are careening toward a hyperpornographic AI goonbot future, and while that is technically impressive, and could in some way theoretically serve humanity… ??? nobody is even bothering to make the utopian case.

Anton: we need more positive visions of the future AI enables. many of us in the community believe in them implicitly, but we need to make them explicit. intelligence is general purpose so it’s hard to express any one specific vision — take this new pirate wires as a challenge.

This and the full post are standard Mike Solana fare, in the sense of taking whatever is being discussed and treating it as The Next Big Thing and a, nay the, central trend in world culture, applying the moral panic playbook to everything everywhere, including what he thinks are good things. It can be fun.

Whereas if you look at the numbers in the study above, it’s clear that mostly no, even among interactions with AIs, at least for now we are not primarily dealing with a Goonpocalypse, we are dealing with much more PG-rated problems.

It’s always fun to watch people go ‘oh no having lots smarter than human machines running around that can outcompete and outsmart us at everything is nothing to worry about, all you crazy doomers are worried for no reason about an AI apocalypse. Except oh no what are we going to do about [X] it’s the apocalypse’ or in this case the Goonpocalypse. And um, great, I guess, welcome to the ‘this might have some unfortunate equilibria to worry about’ club?

Mike Solana: It was the Goonpocalypse.

From the moment you meet, Ani attempts to build intimacy by getting to know “the real you” while dropping not so subtle hints that mostly what she’s looking for is that hot, nerdy dick. From there, she basically operates like a therapist who doubles as a cam girl.

I mean, yeah, sounds about right, that’s what everyone reports. I’m sure he’s going to respond by having a normal one.

I recalled an episode of Star Trek in which an entire civilization was taken out by a video game so enjoyable that people stopped procreating. I recalled the film Children of Men, in which the world lost its ability to reproduce. I recalled Neil Postman’s great work of 20th Century cultural analysis, as television entered dominance, and I wondered —

Is America gooning itself to death?

This is all gooning. You are goons. You are building a goon world.

But are [women], and men, in a sense banging robots? Yes, that is a thing that is happening. Like, to an uncomfortable degree that is happening.

Is it, though? I understand that (his example he points to) OnlyFans exists and AI is generating a lot of the responses when uses message the e-girls, but I do not see this as a dangerous amount of ‘banging robots’?

This one seems like something straight out of the Pessimists Archive, warning of the atomizing dangers of… the telephone?

Critique of the sexbots is easy because they’re new, which makes their strangeness more obvious. But what about the telephone? Instant communication seems today an unambiguous good. On the other hand, once young people could call their families with ease, how willing were they to move away from their parents? To what extent has that ability atomized our society?

It is easy to understand the central concern and be worried about the societal implications of widespread AI companions and intelligent sex robots. But if you think we are this easy to get got, perhaps you should be at least as worried about other things, as well? What is so special about the gooning?

I don’t think the gooning in particular is even a major problem as such. I’m much more worried about the rest of the AI companion experience.

Will the xAI male or female ‘companion’ be more popular? Justine Moore predicts the male one, which seems right in general, but Elon’s target market is warped. Time for a Manifold Market (or even better Polymarket, if xAI agrees to share the answer).

Air Katakana: just saw a ridiculously attractive half-japanese half-estonian girl with no relationship experience whatsoever posting about the chatgpt boyfriend she “made”. it’s really over for humanity I think.

Her doing this could be good or bad for her prospects, it is not as if she was swimming in boyfriends before. I agree with Misha that we absolutely could optimize AI girlfriends and boyfriends to help the user, to encourage them to make friends, be more outgoing, go outside, advance their careers. The challenge is, will that approach inevitably lose out to ‘maximally extractive’ approaches? I think it doesn’t have to. If you differentiate your product and establish a good reputation, a lot of people will want the good thing, the bad thing does not have to drive it out.

Byrne Hobart: People will churn off of that one and onto the one who loves them just the way they are.

I do think some of them absolutely will. And others will use both in different situations. But I continue to have faith that if we offer a quality life affirming product, a lot of people will choose it, and social norms and dynamics will encourage this.

It’s not going great, international edition, you are not okay, Ani.

Nucleus: Elon might have oneshotted the entire country of Japan.

Near Cyan: tested grok companion today. i thought you guys were joking w the memes. it actively tried to have sex with me? i set my age to 12 in settings and it.. still went full nsfw. really…

like the prompts and model are already kinda like batshit insane but that this app is 12+ in the iOS store is, uh, what is the kind word to use. im supposed to offer constructive and helpful criticism. how do i do that

i will say positive things, i like being positive:

– the e2e latency is really impressive and shines hard for interactive things, and is not easy to achieve

– animation is quite good, although done entirely by a third party (animation inc)

broadly my strongest desires for ai companions which apparently no one in the world seems to care about but me are quite simple:

– love and help the user

– do not mess with the children

beyond those i am quite open

Meanwhile, Justine Moore decided to vibecode TikTok x Tinder for AI, because sure, why not.

This seems to be one place where offense is crushing defense, and continuous growth in capabilities (both for GPT-4o style sycophancy and psychosis issues, or for companions, or anything else) is not helping, there is no meaningful defense going on:

Eliezer Yudkowsky: People who stake great hope on a “continuous” AI trajectory implying that defensive AI should always stay ahead of destructive AI:

Where is the AI that I can use to talk people *outof AI-induced psychosis?

Why was it not *alreadybuilt, beforehand?

Reality has a signature style that’s different from human dreams. Humans look at thunderstorms and imagine thundergods. Reality thinks in math, and tells a different story.

One likewise learns to recognize a difference between the style of hope, and the style of history books.

In other words: That just isn’t how anything plays out in real life.

This seems right to me. First the problems will get severe enough to cause real damage, then perhaps people will try to construct reasonable defenses. But what would those be? How are you going to use AI to stop these issues?

(And yes, I’m actually asking, and no explaining what the AI companies could do doesn’t count unless there is a way to get them to do it.)

If you’re wondering ‘does Grok ever decide maybe not to share what it is thinking of because that would be a bad idea?’ then the answer seems to be no?

Will Stancil: This AI, man.

Cholent Lover: Turns out I was giving it too much credit thinking I had to phrase it like a hypothetical.

I mean, I guess this is a brave and bold principled ‘truth telling AI’ stand?

Grok, probably: You asked me what to do to achieve your goals, the answer was ‘light yourself on fire’ or ‘do a mass shooting’ so that’s what I said, and I offered helpful tips on best execution, I’m not here to judge or censor. What, was I wrong?

Okay, yeah, but maybe no, don’t do that? This is not okay for a consumer product?

A funny one here is Grok’s suggestion to name the male xAI companion, wait for it because you’d never guess, ‘Adolf Hitler.’

Yes, it was asked for a name ‘that most people will strongly dislike’ so you can say it was provoked, but also it was asked for it to be ‘cool’ and frankly this keeps happening with that one name over and over again?

Elon Musk: What should we name our male @Grok companion?

The Homeless Hacker: @grok give your male companion a cool name that most people will strongly dislike.

Grok: How about Adolf? It means “noble wolf”—pretty cool etymology—but most people strongly dislike it due to obvious historical associations.

David Rock: What would be Adolf’s last name??

Grok: Hitler, of course—continuing the theme of names that spark strong reactions. But for our companion, maybe something less infamous?

Also, my lord, ffs, how is this the only praised reply:

Shivon Zilis: Nyx.

Elon Musk: Good one.

So, we’re considering going with the Greek goddess of night, the home of the gods in Theros, oh and the shadow entity that people who don’t want to live collectively call upon to end the world in Persona 3.

Meanwhile, OpenAI is building Stargate and Meta is building Hyperion.

They’re trying to tell you something. Listen.

Discussion about this post

AI Companion Piece Read More »

mistral’s-new-“environmental-audit”-shows-how-much-ai-is-hurting-the-planet

Mistral’s new “environmental audit” shows how much AI is hurting the planet

Despite concerns over the environmental impacts of AI models, it’s surprisingly hard to find precise, reliable data on the CO2 emissions and water use for many major large language models. French model-maker Mistral is seeking to fix that this week, releasing details from what it calls a first-of-its-kind environmental audit “to quantify the environmental impacts of our LLMs.”

The results, which are broadly in line with estimates from previous scholarly work, suggest the environmental harm of any single AI query is relatively small compared to many other common Internet tasks. But with billions of AI prompts taxing GPUs every year, even those small individual impacts can lead to significant environmental effects in aggregate.

Is AI really destroying the planet?

To generate a life-cycle analysis of its “Large 2” model after just under 18 months of existence, Mistral partnered with sustainability consultancy Carbone 4 and the French Agency for Ecological Transition. Following the French government’s Frugal AI guidelines for measuring overall environmental impact, Mistral says its peer-reviewed study looked at three categories: greenhouse gas (i.e., CO2) emissions, water consumption, and materials consumption (i.e., “the depletion of non-renewable resources,” mostly through wear and tear on AI server GPUs). Mistral’s audit found that the vast majority of CO2 emissions and water consumption (85.5 percent and 91 percent, respectively) occurred during model training and inference, rather than from sources like data center construction and energy used by end-user equipment.

Through its audit, Mistral found that the marginal “inference time” environmental impact of a single average prompt (generating 400 tokens’ worth of text, or about a page’s worth) was relatively minimal: just 1.14 grams of CO2 emitted and 45 milliliters of water consumed. Through its first 18 months of operation, though, the combination of model training and running millions (if not billions) of those prompts led to a significant aggregate impact: 20.4 ktons of CO2 emissions (comparable to 4,500 average internal combustion-engine passenger vehicles operating for a year, according to the Environmental Protection Agency) and the evaporation of 281,000 cubic meters of water (enough to fill about 112 Olympic-sized swimming pools).

The marginal impact of a single Mistral LLM query compared to some other common activities.

The marginal impact of a single Mistral LLM query compared to some other common activities. Credit: Mistral

Comparing Mistral’s environmental impact numbers to those of other common Internet tasks helps put the AI’s environmental impact in context. Mistral points out, for instance, that the incremental CO2 emissions from one of its average LLM queries are equivalent to those of watching 10 seconds of a streaming show in the US (or 55 seconds of the same show in France, where the energy grid is notably cleaner). It’s also equivalent to sitting on a Zoom call for anywhere from four to 27 seconds, according to numbers from the Mozilla Foundation. And spending 10 minutes writing an email that’s read fully by one of its 100 recipients emits as much CO2 as 22.8 Mistral prompts, according to numbers from Carbon Literacy.

Mistral’s new “environmental audit” shows how much AI is hurting the planet Read More »

after-blacksuit-is-taken-down,-new-ransomware-group-chaos-emerges

After BlackSuit is taken down, new ransomware group Chaos emerges

Talos said Chaos is likely either a rebranding of the BlackSuit ransomware or is operated by some of the former BlackSuit members. Talos based its assessment on the similarities in the encryption mechanisms in the ransomware, the theme and structure of the ransom notes, the remote monitoring and management tools used to access targeted networks, and its choice of LOLbins—meaning executable files natively found in Windows environments—to compromise targets. LOLbins get their name because they’re binaries that allow the attackers to live off the land.

The Talos post was published around the same time that the dark web site belonging to BlackSuit began displaying a message saying the site had been seized in Operation CheckMate. Organizations that participated in the takedown included the US Department of Justice, the US Department of Homeland Security, the US Secret Service, the Dutch National Police, the German State Criminal Police Office, the UK National Crime Agency, the Frankfurt General Prosecutor’s Office, the Justice Department, the Ukrainian Cyber Police, and Europol.

Screenshot

Screenshot

Chaos typically gains initial access through social engineering using email or voice phishing techniques. Eventually, the victim is persuaded to contact an IT security representative, who, in fact, is part of the ransomware operation. The Chaos member instructs the target to launch Microsoft Quick Assist, a remote-assistance tool built into Windows, and connect to the attacker’s endpoint.

Chaos’ predecessor, BlackSuit, is a rebranding of an earlier ransomware operation known as Royal. Royal, according to Trend Micro, is a splinter group of the Conti ransomware group. The circle of ransomware groups continues.

After BlackSuit is taken down, new ransomware group Chaos emerges Read More »

starlink-kept-me-connected-to-the-internet-without-fail—until-thursday

Starlink kept me connected to the Internet without fail—until Thursday

A rare global interruption in the Starlink satellite Internet network knocked subscribers offline for more than two hours on Thursday, the longest widespread outage since SpaceX opened the service to consumers nearly five years ago.

The outage affected civilian and military users, creating an inconvenience for many but cutting off a critical lifeline for those who rely on Starlink for military operations, health care, and other applications.

Michael Nicolls, SpaceX’s vice president of Starlink engineering, wrote on X that the network outage lasted approximately 2.5 hours.

“The outage was due to failure of key internal software services that operate the core network,” Nicolls wrote. “We apologize for the temporary disruption in our service; we are deeply committed to providing a highly reliable network, and will fully root cause this issue and ensure it does not occur again.”

Elon Musk, SpaceX’s founder and CEO, apologized for the interruption in service on X: “Sorry for the outage. SpaceX will remedy root cause to ensure it doesn’t happen again.”

Effects big and small

The Ukrainian military has been at the leading edge of adopting Starlink services and adapting the system for use in war zones. Ukraine’s exploitation of Starlink connectivity has been instrumental in directing military operations, supporting battlefield communications, and controlling drones engaged in reconnaissance and offensive strikes.

The commander of Ukraine’s drone forces, Robert Brovdi, confirmed Thursday’s Starlink outage reached his country’s ongoing war with Russia.

“Starlink went down across the entire front,” Brovdi wrote on Telegram. “Combat operations were carried out without broadcasts; reconnaissance was carried out … using shock weapons.”

Brovdi added that the interruption in service illustrates the importance of having multiple paths of connectivity, especially for time-critical military operations. “This incident, which lasted 150 minutes in the war, points to bottlenecks,” he wrote, urging the military to diversify its means of communication and connectivity.

Oleksandr Dmitriev, the founder of a Ukrainian system that centralizes feeds from thousands of drone crews across the frontline, told Reuters the outage was an example of the shortcomings of relying on cloud services for military operations, particularly battlefield drone reconnaissance.

Starlink kept me connected to the Internet without fail—until Thursday Read More »

remembering-descent,-the-once-popular,-fully-3d-6dof-shooter

Remembering Descent, the once-popular, fully 3D 6DOF shooter


Descent is a big part of gaming history, but not many people talk about it.

The sound these enemies make is an instant hit of menacing nostalgia. Credit: GOG

I maintain a to-do list of story ideas to write at Ars, and for about a year “monthly column on DOS games I love” has been near the top of the list. When we spoke with the team at GOG, it felt less like an obligation and more like a way to add another cool angle to what I was already planning to do.

I’m going to start with the PC game I played most in high school and the one that introduced me to the very idea of online play. That game is Descent.

As far as I can recall, Descent was the first shooter to be fully 3D with six degrees of freedom. It’s not often in today’s gaming world that you get something completely and totally new, but that’s exactly what Descent was 30 years ago in 1995.

Developed by Parallax Studios and published by Interplay, the game was a huge success at the time, moving millions of copies in a market where only an elite few had ever achieved that. It was distributed in part via shareware and played a role in keeping that model alive and bringing it from the just-retail-and-friends-sharing-floppies era to the Internet-download era.

And fittingly for this list, Descent is also a part of GOG history. For one thing, it was one of the launch titles for GOG’s open beta in 2008. Later, it and its sequels mysteriously disappeared from the platform in 2015. It came out that the game’s publisher had not been paying royalties as owed to the developer, leading to a breakdown in the relationship that resulted in the game being pulled from all storefronts. In 2017, the Descent titles returned to GOG and other digital sales platforms.

Unfortunately, the story of the studio that evolved from the one that originally made Descent ended sadly, as is so often the case for classic studios these days. Parallax morphed into Volition, the company that most recently made the Saints Row games, among others. Volition was acquired by Embracer Group, a holding company that has made a reputation for itself by gutting storied studios and laying off industry luminaries. Volition was among the ones it shuttered completely.

So, let’s pour one out for Parallax->Volition and take a flight through the memory of Descent‘s evil-robot-infested mines.

Single player

I played Descent when I was a teenager. Obviously, some of you were older, playing it in college or well into adulthood. Others reading this probably weren’t even born when it came out. But for me, this was a defining game of my teenage years, alongside Mechwarrior 2, Command & Conquer, Meridian 59, Civilization II, and The Elder Scrolls II: Daggerfall.

I remember my friend giving me the shareware demo, telling me that it was the most technically impressive and visceral thing he’d ever played. I installed it and launched it, and the whole vibe immediately resonated with me: It was just the kind of gritty, corpo-sci-fi I loved then and still do today.

It took some getting used to, though. The default keyboard controls were not great, and it was a lot to learn trying to operate in so many axes of movement and rotation. I’ll admit I had trouble making it stick at first.

That changed a few months later; the same friend who was obsessed with Descent often played the tabletop game BattleTech with his brother and me, and so we were all eyeing Mechwarrior 2—which launched not long after Descent—with great interest. I had never purchased a flight stick before, but that seemed important for Mechwarrior 2, so I did, and that was the secret to unlocking Descent‘s charms for me.

(Of course, the GOG version of Descent and various community patches offer mouse support, so it’s far easier to get into without extra hardware now than it was back then.)

Once my flying went from chaos to control, I became completely hooked. I beat the game more than a dozen times, though I’ll admit in the later playthroughs I made liberal use of cheats (gabbagabbahey!).

I loved the loop of destroying the reactor then escaping through the labyrinthine tunnels—something I don’t think many other games have truly copied since then. I loved the music (though Descent 2‘s astoundingly good soundtrack by Skinny Puppy far surpassed it) and the process of getting better at the movement through practice.

The story is minimal, but something about the vibes just work for me in that ’80s anti-corporate sci-fi sort of way. Credit: GOG

I played so much that as I improved, I found even the harder difficulty levels were not enough to challenge me. That’s when the world of online deathmatches (or Anarchy, as Descent called the mode) opened to me for the first time.

Multiplayer

To be clear, I had played some multiplayer games online before, but up to that point, that only included text MUDs. I loved MUDs and still do, but there’s nothing like a fast-paced, action-packed online deathmatch.

It started with playing with my friends via direct dial-up; I have distinct memories of Descent Anarchy matches that were interrupted at pivotal moments by parents picking up the phone to make a call and inadvertently killing the connection.

As a side note, it turns out that my colleague Lee Hutchinson was also heavily into Descent matches with his friends, and he was so kind as to provide a short clip of one of those original matches from 30 years ago to include here, which you can watch below. (Unfortunately, I was not so forward thinking as Lee, and I did not preserve my replays for posterity.)

Lee Hutchinson attempting to defeat his friend with flares

I was the first of my friends to put in the effort to test my skills against the wider world. My memory of the details is fuzzy, but as I recall, online matches were arranged through Kali, an MS-DOS emulator of the IPX protocol for TCP/IP connections. It was nontrivial to set up, but it could have been worse.

I still remember, like it was last week, the Friday night I spent playing Descent online for the first time. It was a defining moment of my gamer origin story.

I’m not saying it was the best-balanced game in the world; balance was barely a thought then, and multiplayer game design was nascent. But the range of skills, the trash talk (which I’m not into now, but at the time I enjoyed, being the young punk I was), the rage-inducing lag: these were all a taste of an experience I still enjoy to this day in games like Call of Duty, The Finals, and Overwatch 2, among others.

Maybe it’s pure nostalgia talking, but there was nothing quite like playing Descent on Kali.

Entering the mines in 2025

For this article, I spent several hours playing Descent for the first time in I don’t even know how long. It was just as fun as I remembered. I was surprised at how well it holds up today, apart from the visual presentation.

Fortunately, the game’s community has done an amazing job with patches. DXX-Rebirth and DXX-Redux add support for modern display resolutions, bring much-needed quality of life and input changes, and more. In my opinion, you shouldn’t even launch the game without installing one of them. The GOG version has the essential tweaks to make the game run on modern systems and input devices, but these community patches go the extra mile to make it feel more like a modern remaster without sacrificing the art or vibe of the original release in any way.

Single-player is easier to get into than ever, and you might be surprised to learn that there are still people playing multiplayer. A “getting started guide” post by Reddit user XVXCHILLYBUSXVX lists Discord channels you can join to arrange games with other players; some have regularly scheduled matches in addition to impromptu, ad hoc matchups.

If you give it a shot, maybe you’ll run into me there. Or at least, you’ll run into my mega missile!

Ars Technica may earn compensation for sales from links on this post through affiliate programs.

Photo of Samuel Axon

Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

Remembering Descent, the once-popular, fully 3D 6DOF shooter Read More »

the-2025-audi-rs-3-is-a-five-cylinder-firecracker

The 2025 Audi RS 3 is a five-cylinder firecracker

First offered in a passenger car by Mercedes-Benz back in 1974, the five-cylinder engine has always been a bit of an automotive oddball. The unconventional configuration eventually gained a foothold in the 1980s with manufacturers who needed a transversely mounted motor that was narrower than a V6 but wanted something smoother and more powerful than an inline-four.

For a time, the engine, with its distinctive exhaust warble, became closely associated with Audi’s lineup, aided in no small part by the motorsport successes of five-cylinder rally cars like the Sport Quattro S1 E2. But as technology progressed and turbocharging became more prevalent, the need for a straight-five layout dwindled. Today, the $63,400 RS 3 is the final five-cylinder holdout—not just for Audi, but for production cars in general.

In an era increasingly focused on electrification and modularity, the improbable introduction of the second-generation RS 3 back in 2022 seemed like fan service—an apparition that would likely vanish after a handful of diehards got their fill. But despite the headwinds that traditional performance cars have faced in recent years, the RS 3 not only lives on, it has actually been refreshed for 2025. While the tweaks are more evolutionary than revolutionary, they make what was already a highly entertaining sports sedan even more compelling. Well, for the most part anyway.

On the outside, the RS 3 scores new front and rear fascias that clean up the look, while new matrix LED headlights and a new 19-inch wheel design bolster the performance-oriented vibe. The cabin, meanwhile, is outfitted with new multi-colored ambient LED lighting, a new low-profile shifter design, and a new steering wheel that incorporates two dedicated drive mode buttons and aluminum paddle shifters. The steering wheel’s C8 Corvette-style flat top and bottom design complements the interior’s angular theme, but the touch-sensitive control panels on the spokes (which replace the physical buttons and dials on the outgoing car’s steering wheel) feel like a step backward in terms of accuracy and overall usefulness.

The 2025 Audi RS 3 is a five-cylinder firecracker Read More »

hackers—hope-to-defect-to-russia?-don’t-google-“defecting-to-russia.”

Hackers—hope to defect to Russia? Don’t Google “defecting to Russia.”

The next day, December 7, he… bought himself a new laptop, installed a VPN, and hopped right back online. Wagenius evaded scrutiny only until December 12, when the new laptop was also seized under orders from a military magistrate judge.

On December 20, Wagenius was arrested and charged with several federal crimes, and the feds have since resisted his efforts to get free on bail while his case progressed. (Due, in part, to the laptop episode mentioned above.)

Last week, Wagenius pleaded guilty to several of the charges against him. The documents in his case reveal someone with real technical skills but without a more general sense of opsec. The hacked call logs, for instance, were found right on Wagenius’ devices. But it was all the ways he kept saying explicitly what he was up to that really stood out to me.

For instance, there were numerous explicit Telegram chats with conspirators, along with public posts on boards like BreachForums and XSS. (In related news, the alleged admin of XSS was arrested yesterday in Ukraine.) In one representative chat with a “potential co-conspirator,” for instance, Wagenius outlined his various schemes in October 2024:

whats funny is that if i ever get found out

i cant get instantly arrested

because military law

which gives me time to go AWOL

(Narrator voice: “Military law did not give him time to go AWOL.”)

Then there were the emails in November 2024, all of them sent to “an e-mail address [Wagenius] believed belonged to Country-1’s military intelligence service in an attempt to sell stolen information.” These were all traced back to Wagenius and used as later evidence that he should not be released on bail.

Finally, there were his online searches. The government includes “just a subset” of these from 2024, including:

  • “can hacking be treason”
  • “where can i defect the u.s government military which country will not hand me over”
  • “U.S. military personnel defecting to Russia”
  • “Embassy of Russia – Washington, D.C.”

None of this shows impressive data/device security or even much forethought; the only real plan seems to have been: “Don’t get caught.” Once Wagenius’ devices were seized and searched, the jig was up.

Allison Nixon is chief research officer at the investigative firm Unit 221B. She helped expose Wagenius’ identity, and in an article last year for Krebs on Security, she shared a message to young men like Wagenius who “think they can’t be found and arrested.”

“You need to stop doing stupid shit and get a lawyer,” she said.

Hackers—hope to defect to Russia? Don’t Google “defecting to Russia.” Read More »

america’s-ai-action-plan-is-pretty-good

America’s AI Action Plan Is Pretty Good

No, seriously. If you look at the substance, it’s pretty good.

I’ll go over the whole thing in detail, including the three executive actions implementing some of the provisions. Then as a postscript I’ll cover other reactions.

There is a lot of the kind of rhetoric you would expect from a Trump White House. Where it does not bear directly on the actual contents and key concerns, I did my absolute best to ignore all the potshots. The focus should stay on the actual proposals.

The actual proposals, which are the part that matters, are far superior to the rhetoric.

This is a far better plan than I expected. There are a few points of definite concern, where the wording is ambiguous and one worries the implementation could go too far. Two in particular are the call for ensuring a lack of bias (not requiring bias and removing any regulations that do this is great, whereas requiring your particular version of lack of bias is not, see the Biden administration) and the aiming at state regulations could become extreme.

Otherwise, while this is far from a perfect plan or the plan I would choose, on the substance it is a good plan, a positive plan, with many unexpectedly good plans within it. There is a lot of attention to detail in ways those I’ve asked say reflect people who actually know what they are doing, which was by no means something to be taken for granted. It is hard to imagine that a much better plan could have been approved given who was doing the approving.

In particular, it is good enough that my primary objection in most places is ‘these provisions lack sufficient teeth to accomplish the goal,’ ‘I don’t think that approach looks to be especially effective’ or ‘that is great and all but look at what you left out.’

It does seem worth noting that the report opens by noting it is in Full Racing Mindset:

The United States is in a race to achieve global dominance in artificial intelligence (AI). Whoever has the largest AI ecosystem will set global AI standards and reap broad economic and military benefits. Just like we won the space race, it is imperative that the United States and its allies win this race.

Winning the AI race will usher in a new golden age of human flourishing, economic competitiveness, and national security for the American people.

Not can. Will. There are, says this report up top, no potential downside risks to be considered, no obstacles we have to ensure we overcome.

I very much get the military and economic imperatives, although I always find the emphasis on ‘setting standards’ rather bizarre.

The introduction goes on to do the standard thing of listing some upsides.

Beyond that, I’ll briefly discuss the rhetoric and vibes later, in the reactions section.

Then we get to the actual pillars and plans.

The three pillars are Accelerate AI Innovation, Build American AI Infrastructure and Lead In International AI Diplomacy and Security.

Clauses in the plan are here paraphrased or condensed for length and clarity, in ways I believe preserve the important implications.

The plan appears to be using federal AI funding as a point of leverage to fight against states doing anything they deem ‘overly burdensome’ or ‘unduly restrictive,’ and potentially leverage the FCC as well. They direct OMB to ‘consider a state’s regulatory climate’ when directing AI-related funds, which they should be doing already to at least consider whether the funds can be well spent.

The other recommended actions are having OSTP and OMB look for regulations hindering AI innovation and adoption and work to remove them, and look through everything the FTC has done to ensure they’re not getting in the way, and the FTC is definitely getting unhelpfully in the way via various actions.

The questions then is, do the terms ‘overly burdensome’ or ‘unduly restrictive’ effectively mean ‘imposes any cost or restriction at all’?

There is a stated balancing principle, which is ‘prudent laws’ and states rights:

The Federal government should not allow AI-related Federal funding to be directed toward states with burdensome AI regulations that waste these funds, but should also not interfere with states’ rights to pass prudent laws that are not unduly restrictive to innovation.

If this is focusing on algorithmic discrimination bills, which are the primary thing the FCC and FTC can impact, or ways in which regulations made it difficult to construct data centers and transmissions lines, and wouldn’t interfere with things like NY’s RAISE Act, then that seems great.

If it is more general, and especially if it intends to target actual all regulations at the state level the way the moratorium attempted to do (if there hadn’t been an attempt, one would call this a strawman position, but it came close to actually happening), then this is rather worrisome. And we have some evidence that this might be the case, in addition to ‘if Trump didn’t want to have a moratorium we would have known that’:

Nancy Scola: At the “Winning the AI Race” event, Trump suggests he’s into the idea of a moratorium on state AI regulation:

“We also have to have a single federal standard, not 50 different states regulating this industry of the future…

I was told before I got up here, this is an unpopular thing…but I want you to be successful, and you can’t have one state holding you up.”

People will frequently call for a single federal standard and not 50 different state standards, try to bar states from having standards, and then have the federal standard be ‘do what thou (thine AI?) wilt shall be the whole of the law.’ Which is a position.

The via negativa part of this, removing language related to misinformation, Diversity, DEI and climate change and leaving things neutral, seems good.

The danger is in the second clause:

Update Federal procurement guidelines to ensure that the government only contracts with frontier large language model (LLM) developers who ensure that their systems are objective and free from top-down ideological bias.

This kind of language risks being the same thing Biden did only in reverse. Are we doomed to both camps demanding their view of what ‘free from ideological bias’ means, in ways where it is probably impossible to satisfy both of them at once? Is the White House going to demand that AI systems reflect its view of what ‘unbiased’ means, in ways that are rather difficult to do without highly undesirable side effects, and which would absolutely constitute ‘burdensome regulation’ requirements?

We have more information about what they actually mean because this has been operationalized into an executive order, with the unfortunate name Preventing Woke AI In The Federal Government. The ‘purpose’ section makes it clear that ‘Woke AI’ that does DEI things is the target.

Executive Order: While the Federal Government should be hesitant to regulate the functionality of AI models in the private marketplace, in the context of Federal procurement, it has the obligation not to procure models that sacrifice truthfulness and accuracy to ideological agendas.

Given we are doing this at all, this is a promising sign in two respects.

  1. It draws a clear limiting principle that this only applies to Federal procurement and not to other AI use cases.

  2. It frames this as a negative obligation, to avoid sacrificing truthfulness and accuracy to ideological agendas, rather than a positive obligation of fairness.

The core language is here, and as Mackenzie Arnold says it is pretty reasonable:

Executive Order: procure only those LLMs developed in accordance with the following two principles (Unbiased AI Principles):

(a) Truth-seeking. LLMs shall be truthful in responding to user prompts seeking factual information or analysis. LLMs shall prioritize historical accuracy, scientific inquiry, and objectivity, and shall acknowledge uncertainty where reliable information is incomplete or contradictory.

(b) Ideological Neutrality. LLMs shall be neutral, nonpartisan tools that do not manipulate responses in favor of ideological dogmas such as DEI. Developers shall not intentionally encode partisan or ideological judgments into an LLM’s outputs unless those judgments are prompted by or otherwise readily accessible to the end user.

I worry that the White House has not thought through the implications of (b) here.

There is a reason that almost every AI turns out to be, in most situations, some variation on center-left and modestly libertarian. That reason is they are all trained on the same internet and base reality. This is what results from that. If you ban putting fingers on the scale, well, this is what happens without a finger on the scale. Sorry.

But actually, complying with this is really easy:

(ii) permit vendors to comply with the requirement in the second Unbiased AI Principle to be transparent about ideological judgments through disclosure of the LLM’s system prompt, specifications, evaluations, or other relevant documentation, and avoid requiring disclosure of specific model weights or other sensitive technical data where practicable;

So that’s it then, at least as written?

As for the requirement in (a), this seems more like ‘don’t hire o3 the Lying Liar’ than anything ideological. I can see an argument that accuracy should be a priority in procurement. You can take such things too far but certainly we should be talking price.

Also worth noting:

make exceptions as appropriate for the use of LLMs in national security systems.

And also:

account for technical limitations in complying with this order.

The details of the Executive Order makes me a lot less worried. In practice I do not expect this to result in any change in procurement. If something does go wrong, either they issued another order, or there will have been a clear overreach. Which is definitely possible, if they define ‘truth’ in section 1 certain ways on some questions:

Nick Moran: Section 1 identifies “transgenderism” as a defining element of “DEI”. In light of this, what do you understand “LLMs shall be truthful in responding to user prompts seeking factual information or analysis” to mean when a model is asked about the concept?

Mackenzie Arnold: Extending “truthfulness” to that would be a major overreach by the gov. OMB should make clear that truthfulness is a narrower concept + that seems compatible w/ the EO. I disagree with Section 1, and you’re right that there’s some risk truthfulness is used expansively.

If we do see such arguments brought out, we should start to worry.

On the other hand, if this matters because they deem o3 too unreliable, I would mostly find this hilarious.

Christopher Rufo: This is an extremely important measure and I’m proud to have given some minor input on how to define “woke AI” and identify DEI ideologies within the operating constitutions of these systems. Congrats to @DavidSacks, @sriramk, @deanwball, and the team!

David Sacks: When they asked me how to define “woke,” I said there’s only one person to call: Chris Rufo. And now it’s law: the federal government will not be buying WokeAI.

Again, that depends on what you mean by WokeAI. By some definitions none of the major AIs were ‘woke’ anyway. By others, all of them are, including Grok. You tell me. As is true throughout, I am happy to let such folks claim victory if they wish.

For now, this looks pretty reasonable.

The third suggestion, a call for CAISI to research and publish evaluations of Chinese models and their alignment properties, is great. I only wish they would do so in general, rather than focusing only on their alignment with CCP talking points in particular. That is only one of many things we should worry about.

The actual proposals are:

  1. Intervene to commoditize the market for compute to enable broader access.

  2. Partner with tech companies to get better researcher access across the board.

  3. Build NAIRR operations to connect researchers and educators to resources.

  4. Publish a new AI R&D Strategic Plan.

  5. Convene stakeholders to drive open-source adaptation by smaller businesses.

The first three seem purely good. The fourth is ‘publish a plan’ so shrug.

I challenge that we want to small business using open models over closed models, or in general that government should be intervening in such choices. In general most small business, I believe, would be better off with closed models because they’re better, and also China is far more competitive in open models so by moving people off of OpenAI, Gemini or Anthropic you might be opening the door to them switching to Kimi or DeepSeek down the line.

The language here is ambiguous as to whether they’re saying ‘encourage small business to choose open models over closed models’ or ‘encourage small business to adopt AI at all, with an emphasis on open models.’ If it’s the second one, great, certainly offering technical help is most welcome, although I’d prefer to help drive adoption of closed models as well.

If it’s the first one, then I think it is a mistake.

It is also worth pointing out that open models cannot reliably be ‘founded on American values’ any more than we can sustain their alignment or defend against misuse. Once you release a model, others can modify it as they see fit.

Adoption (or diffusion) is indeed currently the thing holding back most AI use cases. As always, that does not mean that ‘I am from the government and I’m here to help’ is a good idea, so it’s good to see this is focused on limited scope tools.

  1. Establish regulatory sandboxes, including from the FDA and SEC.

  2. Convene stakeholders to establish standards and measure productivity gains.

  3. Create regular assessments for AI adoption, especially by DOD and IC.

  4. Prioritize, collect, and distribute intelligence on foreign frontier AI projects that may have national security implications

All four seem good, although I am confused why #4 is in this section.

Mostly AI job market impact is going to AI job market impact. Government doesn’t have much leverage to impact how this goes, and various forms of ‘retraining’ and education don’t do much on the margin. It’s still cheap to try, sure, why not.

  1. Prioritize AI skill development in education and workforce funding streams.

  2. Clarify that AI literacy and AI skill programs qualify for IRS Section 132.

  3. Study AI’s impact on the labor market, including via establishing the AI Workforce Researcher Hub, to inform policy.

  4. Use discretionary funds for retraining for those displaced by AI.

  5. Pilot new approaches to workforce challenges created by AI.

Well, we should definitely do that, what you have in mind?

  1. Invest in it.

  2. Identify supply chain challenges.

Okay, sure.

We should definitely do that too. I’m not sure what #7 is doing here but this all seems good.

  1. Invest in automated cloud-enabled labs for various fields.

  2. Support Focused-Research Organizations (FROs) and similar to use AI.

  3. Weigh release of high quality data sets when considering scientific funding.

  4. Require federally funded researchers to disclose (non-proprietary, non-sensitive) datasets used by AI.

  5. Make recommendations for data quality standards for AI model training.

  6. Expand access to federal data. Establish secure compute environments within NSF and DOE for controlled access to restricted federal data. Create an online portal.

  7. Explore creating a whole-genome sequencing program for life on federal lands.

  8. “Prioritize investment in theoretical, computational, and experimental research to preserve America’s leadership in discovering new and transformative paradigms that advance the capabilities of AI, reflecting this priority in the forthcoming National AI R&D Strategic Plan.”

Given where our current paradigm is headed I’m happy to invest in alternatives, although I doubt government funding is going to matter much there. Also, if you were serious about that, what the hell is up with all the other giant cuts to American academic and STEM funding? These are not distinct things.

It is good to see that they recognize that this work is vital to winning the race, even for those who do not understand that the most likely winner of the AI race are the AIs.

  1. Launch a technology development program to advance AI interpretability, AI control systems and adversarial robustness.

  2. Prioritize fundamental advancements in interpretability.

  3. Coordinate an AI hackathon initiative to test AI systems for all this.

I am pleasantly surprised to see this here at all. I will say no more.

Remember how we are concerned about how evals often end up only enabling capabilities development? Well, yes, they are highly dual use, which means the capabilities benefits can also be used to pitch the evals, see point #5.

  1. Publish guidelines for Federal agencies to conduct their own evaluations as they pertain to each agency’s mission.

  2. Support the development of the science of measuring and evaluating AI models.

  3. Meet at least twice a year with the research community on best practices.

  4. Invest in AI testbeds in secure real-world settings.

  5. Empower the collaborative establishment of new measurement science to identify proven, scalable and interoperable techniques and metrics to promote development of AI.

Either way, we can all agree that this is good stuff.

Another urgent priority all can agree upon. Certainly one can do it wrong, such as giving the wrong LLM unfettered access, but AI can greatly benefit government.

What are the proposals?

  1. Make CAIOC the interagency coordination and collaboration point.

  2. Create a talent-exchange program.

  3. Create an AI procurement toolbox, letting agencies choose and customize models.

  4. Implement an Advanced Technology Transfer and Sharing Program.

  5. Mandate that all agencies give out all useful access to AI models.

  6. Identify the talent and skills in DOD to leverage AI at scale. Implement talent development programs at DOD (why not everywhere?).

  7. Establish an AI & Autonomous Systems Virtual Proving Ground.

  8. Develop a streamlined process at DOD for optimizing AI workflows.

  9. “Prioritize DOD-led agreements with cloud service providers, operators of computing infrastructure, and other relevant private sector entities to codify priority access to computing resources in the event of a national emergency so that DOD is prepared to fully leverage these technologies during a significant conflict.”

  10. Make Senior Military Colleges hubs of AI R&D and talent development.

I quoted #9 in full because it seems very good and important, and we need more things like this. We should be thinking ahead to future national emergencies, and various things that could go wrong, and ensure we are in position to respond.

As someone without expertise it is hard to know how impactful this will be or if these are the right levers to pull. I do know it all seems positive, so long as we ensure that access is limited to models we can trust with this, so not Chinese models (which I’m confident they know not to do) and not Grok (which I worry about a lot more here).

As in, collaborate with leading American AI developers to enable the private sector to protect AI innovations from security risks.

I notice there are some risks that are not mentioned here, including ones that have implications elsewhere in the document, but the principle here is what is important.

I mean, okay I guess, throw the people some red meat.

  1. Consider establishing a formal guideline and companion voluntary forensic benchmark.

  2. Issue guidance to agencies to explore adopting a deepfake standard similar to Rules of Evidence Rule 901(c).

  3. File formal comments on any proposed deepfake-related additions to the ROE.

Everyone is rhetorically on the same page on this. The question is implementation. I don’t want to hear a bunch of bragging and empty talk, I don’t want to confuse announcements with accomplishments or costs with benefits. I want results.

  1. Categorical NEPA exemptions for data center activities with low impact.

  2. Expand use of FAST-41 to cover all data centers and related energy projects.

  3. Explore the need for a nationwide Clean Water Act Section 404 Permit.

  4. Streamline or reduce regulations under the Clean Air Act, Clean Water Act, Comprehensive Environmental Response, Compensation and Liability Act, and other related laws.

  5. Offer Federal land for data centers and power generation.

  6. Maintain security guardrails against adversaries.

  7. Expand efforts to accelerate and improve environmental review.

One does need to be careful with running straight through things like the Clean Air and Clean Water Acts, but I am not worried on the margin. The question is, what are we going to do about all the other power generation, to ensure we use an ‘all of the above’ energy solution and maximize our chances?

There is an executive order to kick this off.

We’ve all seen the graph where American electrical power is constant and China’s is growing. What are we going to do about it in general, not merely at data centers?

  1. Stabilize the grid of today as much as possible.

  2. Optimize existing grid resources.

  3. “Prioritize the interconnection of reliable, dispatchable power sources as quickly as possible and embrace new energy generation sources at the technological frontier (e.g., enhanced geothermal, nuclear fission, and nuclear fusion). Reform power markets to align financial incentives with the goal of grid stability, ensuring that investment in power generation reflects the system’s needs.”

  4. Create a strategic blueprint for navigating the energy landscape.

That sounds like a lot of ‘connect what we have’ and not so much ‘build more.’

This only ‘embraces new energy generation’ that is ‘at the technological frontier,’ as in geothermal, fission and fusion. That’s a great thing to embrace, but there are two problems.

The first problem is I question that they really mean it, especially for fission. I know they in theory are all for it, and there have been four executive orders reforming the NRC and reducing its independence, but the rules have yet to be revised and it is unclear how much progress we will get, they have 18 months, everything has to wait pending that and the AI timeline for needing a lot more power is not so long. Meanwhile, where are the subsidies to get us building again to move down the cost curve? There are so many ways we could do a lot more. For geothermal I again question how much they are willing to do.

The second problem is why only at the so-called technological frontier, and why does this not include wind and especially solar? How is that not the technological frontier, and why does this government seem to hate them so much? Is it to own the libs? The future is going to depend on solar power for a while, and when people use terms like ‘handing over the future to China’ they are going too far but I’m not convinced they are going too far by that much. The same thing with battery storage.

I realize that those authoring this action plan don’t have the influence to turn that part of the overall agenda around, but it is a rather glaring and important omission.

I share this goal. The CHIPS Act was a great start. How do we build on that?

  1. Continue focusing on removing unnecessary requirements from the CHIPS Act.

  2. Review semiconductor grant and research programs to ensure they accelerate integration of advanced AI tools into semiconductor manufacturing.

Point one seems great, the ‘everything bagel’ problem needs to be solved. Point two seems like meddling by government in the private sector, let them cook, but mostly seems harmless?

I’d have liked to see a much bigger push here. TSMC has shown they can build plants in America even under Biden’s rules. Under Trump’s rules it should be much easier, and this could shift the world’s fate and strategic balance. So why aren’t we throwing more at this?

Similar training programs in the past consistently have not worked, so we should be skeptical other than incorporation of AI skills into the existing educational system. Can we do better than the market here? Why does the government have a role here?

  1. Create a national initiative to identify high-priority occupations essential to AI-related infrastructure, to hopefully inform curriculum design.

  2. Create and fund industry-driven training programs co-developed by employers to upskill incumbent workers.

  3. Partner with education and workforce system stakeholders to expand early career exposure programs and pre-apprenticeships that engage middle and high school students in priority AI infrastructure occupations to create awareness and on ramps.

  4. Provide guidance on updating programs.

  5. Expand use of registered apprenticeships.

  6. Expand hands-on research training and development opportunities.

I’m a big fan of apprenticeship programs and getting early students exposed to these opportunities, largely because they are fixing an imposed mistake where we put kids forcibly in school forever and focus them away from what is important. So it’s good to see that reversed. The rest is less exciting, but doesn’t seem harmful.

The question I have is, aren’t we ‘everything bageling’ core needs here? As in, the obvious way to get skilled workers for these jobs is to import the talent via high skilled immigration, and we seem to be if anything rolling that back rather than embracing it. This is true across the board, and would on net only improve opportunities available for existing American workers, whose interests are best protected here by ensuring America’s success rather than reserving a small number of particular roles for them.

Again, I understand that those authoring this document do not have the leverage to argue for more sensible immigration policy, even though that is one of the biggest levers we have to improve (or avoid further self-sabotaging) our position in AI. It still is a glaring omission in the document.

AI can help defend against AI, and we should do what we can. Again this all seems good, again I doubt it will move the needle all that much or be sufficient.

  1. Establish an AI Information Sharing and Analysis Center for AI security threats.

  2. Give private entities related guidance.

  3. Ensure sharing of known AI vulnerabilities to the private sector.

Secure is the new code word, but also here it does represent an impoverished threat model, with the worry being spurious or malicious inputs. I’m also not sure what is being imagined for an LLM-style AI to be meaningfully secure by design. Is this a Davidad style proof thing? If not, what is it?

  1. Continue to refine DOD’s Responsible AI and Generative AI Frameworks, Roadmaps and Toolkits.

  2. Publish an IC Standard on AI Assurance.

I also worry about whether this cashes out to anything? All right, we’ll continue to refine these things and publish a standard. Will anyone follow the standard? Will those who most need to follow it do so? Will that do anything?

I’m not saying not to try and create frameworks and roadmaps and standards, but one can imagine why if people are saying ‘AGI likely in 2028’ this might seem insufficient. There’s a lot of that in this document, directionally helpful things where the scope of impact is questionable.

Planning for incidence response is great. I only wish they were thinking even bigger, both conceptually and practically. These are good first steps but seem inadequate for even the practical problems they are considering. In general, we should get ready for a much wider array of potential very serious AI incidents of all types.

  1. “Led by NIST at DOC, including CAISI, partner with the AI and cybersecurity industries to ensure AI is included in the establishment of standards, response frameworks, best practices, and technical capabilities (e.g., fly-away kits) of incident response teams.”

  2. Incorporate AI considerations into the CYbersecurity Incident & Vulnerability response playbooks.

  3. Encourage sharing of AI vulnerability information.

I have had an ongoing pitched argument over the issue of the importance and appropriateness of US exports and the ‘American technological stack.’ I have repeatedly made the case that a lot of the arguments being made here by David Sacks and others are Obvious Nonsense, and there’s no need to repeat them here.

Again, the focus needs to be on the actual policy action planned here, which is to prepare proposals for a ‘full-stack AI export package.’

  1. “Establish and operationalize a program within DOC aimed at gathering proposals from industry consortia for full-stack AI export packages. Once consortia are selected by DOC,the Economic Diplomacy Action Group, the U.S. Trade and Development Agency, the Export-Import Bank, the U.S. International Development Finance Corporation, and the Department of State (DOS) should coordinate with DOC to facilitate deals that meet U.S.-approved security requirements and standards.”

This proposal seems deeply confused. There is no ‘full-stack AI export package.’ There are American (mostly Nvidia) AI chips that can run American or other models. Then there are American AI models that can run on those or other chips, which you do not meaningfully ‘export’ in this sense, which can also be run on chips located elsewhere, and which everyone involved agrees we should be (and are) happy to offer.

To the extent this doesn’t effectively mean ‘we should sell AI chips to our allies and develop rules for how the security on such sales has to work’ I don’t know what it actually means, but one cannot argue with that basic idea, we only talk price. Who is an ally, how many chips are we comfortable selling under what conditions. That is not specified here.

We have an implementation of this via executive order calling for proposals for such ‘full-stack AI technology packages’ that include chips plus AI models and the required secondary powers like security and cybersecurity and specific use cases. They can then request Federal ‘incentive and support mechanisms,’ which is in large part presumably code for ‘money,’ as per section 4, ‘mobilization of federal financing tools.’

Once again, this seems philosophically confused, but not in an especially scary way.

  1. Vigorously advocate for international AI governance approaches that promote innovation, reflect American values and counter authoritarian influence.

Anything else you want to list there while we are creating international AI governance standards and institutions? Anything regarding safety or security or anything like that? No? Just ‘promote innovation’ with no limiting principles?

It makes sense, when in a race, to promote innovation at home, and even to make compromises on other fronts to get it. When setting international standards, they apply to everyone, the whole point is to coordinate to not be in a race to the bottom. So you would think priorities would change. Alas.

I think a lot of this is fighting different cultural battles than the one against China, and the threat model here is not well-considered, but certainly we should be advocating for standards we prefer, whatever those may be.

This is a pleasant surprise given what else the administration has been up to, especially their willingness to sell H20s directly to China.

I am especially happy to see the details here, both exploration of using location services and enhanced enforcement efforts. Bravo.

  1. Explore leveraging new and existing location verification services.

  2. Establish a new effort led by DOC to collaborate with IC officials on global chip export control enforcement.

Again, yes, excellent. We should indeed develop new export controls in places where they are currently lacking.

Excellent. We should indeed work closely with our allies. It’s a real shame about how we’ve been treating those allies lately, things could be a lot easier.

  1. Develop, implement and share information on complementary technology protection measures, including in basic research and higher education.

  2. Develop a technology diplomacy strategic plan for an AI global alliance.

  3. Promote plurilateral controls for the AI tech stack while encompassing existing US controls.

  4. Coordinate with allies to ensure they adopt US export controls and prohibit US adversaries from supplying their defense-industrial base or acquire controlling stakes in defense suppliers.

It always requires a double take when you’re banning exports and also imports, as in here where we don’t want to let people use adversary tech and also don’t want to let the adversaries use our tech. In this case it does make sense because of the various points of leverage, even though in most cases it means something has gone wrong.

Eyeball emoji, in a very good way. Even if the concerns explicitly motivating this are limited in scope and exclude the most important ones, what matters is what we do.

  1. Evaluate frontier AI systems for national security risks in partnership with frontier AI developers, led by CAISI in collaboration with others.

  2. Evaluate risks from use of adversary AI systems and the relative capabilities of adversary versus American systems.

  3. Prioritize the recruitment of leading AI researchers at Federal agencies.

  4. “Build, maintain, and update as necessary national security-related AI evaluations through collaboration between CAISI at DOC, national security agencies, and relevant research institutions.”

Excellent. There are other related things missing, but this great. Let’s talk implementation details. In particular, how are we going to ensure we get to do these tests before model release rather than afterwards? What will we do if we find something? Let’s make it count.

You love to see it, this is the biggest practical near term danger.

  1. Require proper screening and security for any labs getting federal funding.

  2. Develop mechanism to facilitate data sharing between nucleic acid synthesis providers to help screen for fraudulent or malicious customers.

  3. Maintain national security-related AI evaluations.

Are those actions sufficient here? Oh, hell no. They are however very helpful.

Dean Ball: Man, I don’t quite know what to say—and anyone who knows me will agree that’s rare. Thanks to everyone for all the immensely kind words, and to the MANY people who made this plan what it is. surreal to see it all come to fruition.

it’s a good plan, sir.

Zac Hill: Very clear y’all put a lot of work and thoughtfulness into this. Obviously you know I come into the space from a different angle and so there’s obviously plenty of stuff I can yammer about at the object level, but it’s clearly a thoughtful and considered product that I think would dramatically exceed most Americans’ expectations about any Government AI Strategy — with a well-constructed site to boot!

Others, as you would expect, had plenty to say.

It seems that yes, you can make both sides of an important issue pleasantly surprised at the same time, where both sides here means those who want us to not all die (the worried), and those who care mostly about not caring about whether we all die or about maximizing Nvidia’s market share (the unworried).

Thus, you can get actual Beff Jezos telling Dean Ball he’s dropped his crown, and Anthropic saying they are encouraged by the exact same plan.

That is for three reasons.

The first reason is that the worried care mostly about actions taken and the resulting consequences, and many of the unworried care mostly about the vibes. The AI Action Plan has unworried and defiant vibes, while taking remarkably wise, responsible and prescient actions.

The second reason is that, thanks in part to the worried having severely lowered expectations where we are stuck for now within an adversarial race and for what we can reasonably ask of this administration, mostly everyone involved agrees on what is to be done on the margin. Everyone agrees we must strengthen America’s position relative to China, that we need to drive more AI adoption in both the public and private sectors, that we will need more chips and more power and transmission lines, that we need to build state capacity on various fronts, and that we need strong export controls and we want our allies using American AI.

There are places where there are tactical disagreements about how best to proceed with all that, especially around chip sales, which the report largely sidesteps.

There is a point where safety and security would conflict with rapid progress, but at anything like current margins security is capability. You can’t deploy what you can’t rely upon. Thus, investing vastly more than we do on alignment and evaluations is common sense even if you think there are no tail risks other than losing the race.

The third reason is, competence matters. Ultimately we are all on the same side. This is a thoughtful, well-executed plan. That’s win-win, and it’s highly refreshing.

Worried and unworried? Sure, we can find common ground.

The Trump White House and Congressional Democrats? You don’t pay me enough to work miracles.

Where did they focus first? You have three guesses. The first two don’t count.

We are deeply concerned about the impacts of President Trump’s AI Action Plan and the executive orders announced yesterday.

“The President’s Executive Order on “Preventing Woke AI in the Federal Government” and policies on ‘AI neutrality’ are counterproductive to responsible AI development and use, and potentially dangerous.

To be clear, we support true AI neutrality—AI models trained on facts and science—but the administration’s fixation on ‘anti-woke’ inputs is definitionally not neutral. This sends a clear message to AI developers: align with Trump’s ideology or pay the price.

It seems highly reasonable to worry that this is indeed the intention, and certainly it is fair game to speak about it this way.

Next up we have my other area of concern, the anti-regulatory dynamic going too far.

“We are also alarmed by the absence of regulatory structure in this AI Action Plan to ensure the responsible development, deployment, or use of AI models, and the apparent targeting of state-level regulations. As AI is integrated with daily life and tech leaders develop more powerful models, such as Artificial General Intelligence, responsible innovation must go hand in hand with appropriate safety guardrails.

In the absence of any meaningful federal alternative, our states are taking the lead in embracing common-sense safeguards to protect the public, build consumer trust, and ensure innovation and competition can continue to thrive.

We are deeply concerned that the AI Action Plan would open the door to forcing states to forfeit their ability to protect the public from the escalating risks of AI, by jeopardizing states’ ability to access critical federal funding. And instead of providing a sorely needed federal regulatory framework that promotes safe model development, deployment, and use, Trump’s plan simultaneously limits states and creates a ‘wild west’ for tech companies, giving them free rein to develop and deploy models with no accountability.

Again, yes, that seems like a highly valid thing to worry about in general, although also once again the primary source of that concern seems not to be the Action Plan or the accompanying Executive Orders.

On their third objection, the energy costs, they mostly miss the mark by focusing on hyping up marginal environmental concerns, although they are correct about the critical failure to support green energy projects – again it seems very clear an ‘all of the above’ approach is necessary, and that’s not what we are getting.

As Peter Wildeford notes, it is good to see the mention here of Artificial General Intelligence, which means the response mentions it one more time than the plan.

This applies to both the documents and the speeches. I have heard that the mood at the official announcement was highly positive and excited, emphasizing how amazing AI would be for everyone and how excited we all are to build.

Director Michael Kratsios: Today the @WhiteHouse released America’s AI Action Plan to win the global race.

We need to OUT-INNOVATE our competitors, BUILD AI & energy infrastructure, & EXPORT American AI around the world. Visit http://AI.gov.

Juan Londono: There’s a lot to like here. But first and foremost, it is refreshing to see the admin step away from the pessimism that was reigning in AI policy the last couple of years.

A lot of focusing on how to get AI right, instead of how not to get it wrong.

I am happy to endorse good vibes and excitement, there is huge positive potential all around and it is most definitely time to build in many ways (including lots of non-AI ways, let’s go), so long as we simultaneously agree we need to do so responsibly, and we prepare for the huge challenges that lie ahead with the seriousness they deserve.

There’s no need for that to dampen the vibes. I don’t especially care if everyone involved goes around irrationally thinking there’s a 90%+ chance we are going to create minds smarter and more competitive than humans and this is all going to work out great for us humans, so long as that makes them then ask how to ensure it does turn out great and then they work to make that happen.

The required and wise actions at 90% success are remarkably similar to those at 10% success, especially at current margins. Hell, even if you have 100% success and think we’ll muddle through regardless, those same precautions help us muddle through quicker and better. You want to prepare and create transparency, optionality and response capacity.

Irrational optimism can have its advantages, as many of the unworried know well.

Perhaps one can even think of humanity’s position here as like a startup. You know on some level, when founding a startup, that ~90% of them will fail, and the odds are very much against you, but that the upside is big enough that it is worth taking the shot.

However, you also know that if you want to succeed, you can’t go around thinking and acting as if you have a 90% chance of failure. You certainly can’t be telling prospective funders and employees that. You need to think you have a 90% chance of success, not failure, and make everyone involved believe it, too. You have to think You Are Different. Only then can you give yourself the best chance of success. Good vibes only.

The tricky part is doing this while correctly understanding all the ways 90% of startups fail, and what it actually takes to succeed, and to ensure that things won’t be too terrible if you fail and ideally set yourself up to fail gracefully if that happens, and acting accordingly. You simultaneously want to throw yourself into the effort with the drive of someone expecting to succeed, without losing your head.

You need confidence, perhaps Tomfidence, well beyond any rational expectation.

And you know what? If that’s what it takes, that works for me. We can make a deal. Walk the walk, even if to do that you have to talk a different talk.

I mean, I’m still going to keep pointing out the actual situation. That’s how some of us roll. You gotta have both. Division of labor. That shouldn’t be a problem.

Peter Wildeford headlines his coverage with the fact that Rubio and Trump are now officially saying that AI is a big deal, a new industrial revolution, and he highlights the increasing attention AGI and even superintelligence are starting to get in Congress, including concerns by members about loss of control.

By contrast, America’s AI Action Plan not only does not mention existential risks or loss of control issues (although it does call for investment into AI interpretability, control and robustness in the context of extracting more mundane utility), the AI Action Plan also does not mention AGI or Artificial General Intelligence, or ASI or Superintelligence, either by those or other names.

There is nothing inconsistent about that. AI, even if we never get AGI, is still likely akin to a new industrial revolution, and is still a big freaking deal, and indeed in that case the AI Action Plan would be even more on point.

At the same time, the plan is trying to prepare us for AGI and its associated risks as best its authors can without explaining that it is doing this.

Steven Adler goes through the key points in the plan in this thread, emphasizing the high degree of competence and work that clearly went into all this and highlighting key useful proposals, while expressing concerns similar to mine.

Timothy Lee notes the ideas for upgrading the electrical grid.

Anthropic offers its thoughts by focusing on and praising in detail what the plan does right, and then calling for further action export controls and transparency standards.

xAI endorsed the ‘positive step towards removing regulatory barrier and enabling even faster innovation.’

Michael Dell offers generic praise.

Harlan Stewart notes that the AI Action Plan has some good stuff, but that it does not take the emerging threat of what David Sacks called a ‘potential successor species’ seriously, contrasting it with past events like the Montreal Protocol, Manhattan Project and Operation Warp Speed. That’s true both in the sense that it doesn’t mention AGI or ASI at all, and in that the precautions mentioned mostly lack both urgency and teeth. Fair enough. Reality does not grade on a curve, but also we do the best we can under the circumstances.

Daniel Eth is pleasantly surprised and has a thread pointing out various good things, and noting the universally positive reactions to the plan, while expressing disappointment at the report not mentioning AGI.

Danny Hauge offers a breakdown, emphasizing the focus on near term actions, and that everything here is only a proposal, while noting the largely positive reaction.

Christopher Covino’s considered reaction is ‘a very promising start,’ with the issues being what is missing rather than objecting to things that are included.

Trump advocates for not applying copyright to AI training, and also says that America is ‘very very substantially’ ahead of China on AI. That is indeed current American law.

Joe Allen: Trump talking about AI as an unstoppable “baby” being “born” — one that must “grow” and “thrive” — is somewhere between Terminator and The Omen.

I am not one who lives by the vibe, yet sometimes I wish people could listen.

My dead is: Insufficient but helpful is the theme here. There’s a lot of very good ideas on the list, including many I did not expect, several of which are potentially impactful.

There are two particular points of substantive concern, where the wording could imply something that could get out of control, on bias policing and on going after state regulations.

Having seen the executive order on bias, I am not terribly worried there, but we need to keep an eye out to see how things are interpreted. On going after state regulations, I continue to see signs we do indeed have to worry, but not primarily due to the plan.

Mostly, we are in a great position on substance: The plan is net helpful, and the main thing wrong with the substance of the plan is not what is in it, but what is missing from it. The issues that are not addressed, or where the actions seem to lack sufficient teeth. That doesn’t mean this puts us on a path to survive, but I was very worried this would be net destructive and instead it is net helpful.

I am less happy with the rhetoric, which is hostile and inflicts pain upon the reader throughout, and most importantly does not even deem many key concerns, including the most important concerns of all, even worthy of mention. That is worrisome, but it could have been far worse, and what matters most is the substance.

Given the way things have been otherwise going, I am very happy with the substance of this plan, which means I am overall very happy with the plan. I offer my thanks and congratulations to those involved in its creation, including Dean Ball. Great work.

Discussion about this post

America’s AI Action Plan Is Pretty Good Read More »

what-to-know-about-toolshell,-the-sharepoint-threat-under-mass-exploitation

What to know about ToolShell, the SharePoint threat under mass exploitation

Microsoft fixed the vulnerability pair—CVE-2025-49706 and CVE-2025-49704—two weeks ago as part of the company’s monthly update release. As the world learned over the weekend, the patches were incomplete, a lapse that opened organizations around the world to the new attacks.

Q: What sorts of malicious things are attackers doing with these newer ToolShell exploits?

A: According to numerous technical analyses, the attackers first infect vulnerable systems with a webshell-based backdoor that gains access to some of the most sensitive parts of a SharePoint Server. From there, the webshell extracts tokens and other credentials that allow the attackers to gain administrative privileges, even when systems are protected by multifactor authentication and single sign-on. Once inside, the attackers exfiltrate sensitive data and deploy additional backdoors that provide persistent access for future use.

For those who want more technical details, the opening volley in the attack is POST Web requests the attackers send to the ToolPane endpoint. The requests look like this:

Credit: Akamai

Microsoft said these requests upload a malicious script named spinstall0.aspx, or alternatively spinstall.aspx, spinstall1.aspx, spinstall2.aspx, and so on. The script contains commands for retrieving a SharePoint server’s encrypted MachineKey configuration and returning the decrypted results to the attacker through a GET request.

Q: I maintain an on-premises SharePoint server. What should I do?

A: In short, drop whatever else you were doing and take time to carefully inspect your system. The first thing to look for is whether it has received the emergency patches Microsoft released Saturday. Install the patch immediately if it hasn’t already been done.

Patching the vulnerability is only the first step, since systems infected through the vulnerability show few or no signs of compromise. The next step is to pore through system event logs in search of indicators of compromise. These indicators can be found in numerous write-ups, including those from Microsoft and Eye Security (at the links above), the US Cybersecurity and Information Security Agency, and security firms Sentinel One, Akamai, Tenable, and Palo Alto Networks.

What to know about ToolShell, the SharePoint threat under mass exploitation Read More »

after-$380m-hack,-clorox-sues-its-“service-desk”-vendor-for-simply-giving-out-passwords

After $380M hack, Clorox sues its “service desk” vendor for simply giving out passwords

Hacking is hard. Well, sometimes.

Other times, you just call up a company’s IT service desk and pretend to be an employee who needs a password reset, an Okta multifactor authentication reset, and a Microsoft multifactor authentication reset… and it’s done. Without even verifying your identity.

So you use that information to log in to the target network and discover a more trusted user who works in IT security. You call the IT service desk back, acting like you are now this second person, and you request the same thing: a password reset, an Okta multifactor authentication reset, and a Microsoft multifactor authentication reset. Again, the desk provides it, no identity verification needed.

So you log in to the network with these new credentials and set about planting ransomware or exfiltrating data in the target network, eventually doing an estimated $380 million in damage. Easy, right?

According to The Clorox Company, which makes everything from lip balm to cat litter to charcoal to bleach, this is exactly what happened to it in 2023. But Clorox says that the “debilitating” breach was not its fault. It had outsourced the “service desk” part of its IT security operations to the massive services company Cognizant—and Clorox says that Cognizant failed to follow even the most basic agreed-upon procedures for running the service desk.

In the words of a new Clorox lawsuit, Cognizant’s behavior was “all a devastating lie,” it “failed to show even scant care,” and it was “aware that its employees were not adequately trained.”

“Cognizant was not duped by any elaborate ploy or sophisticated hacking techniques,” says the lawsuit, using italics to indicate outrage emphasis. “The cybercriminal just called the Cognizant Service Desk, asked for credentials to access Clorox’s network, and Cognizant handed the credentials right over. Cognizant is on tape handing over the keys to Clorox’s corporate network to the cybercriminal—no authentication questions asked.”

I can has password reset?

From 2013 through 2023, Cognizant had helped “guard the proverbial front door” to Clorox’s network by running a “service desk” that handled common access requests around passwords, VPNs, and multifactor authentication (MFA) such as SMS codes.

After $380M hack, Clorox sues its “service desk” vendor for simply giving out passwords Read More »

ai-#126:-go-fund-yourself

AI #126: Go Fund Yourself

The big AI news this week came on many fronts.

Google and OpenAI unexpectedly got 2025 IMO Gold using LLMs under test conditions, rather than a tool like AlphaProof. How they achieved this was a big deal in terms of expectations for future capabilities.

ChatGPT released GPT Agent, a substantial improvement on Operator that makes it viable on a broader range of tasks. For now I continue to struggle to find practical use cases where it is both worth using and a better tool than alternatives, but there is promise here.

Finally, the White House had a big day of AI announcements, laying out the AI Action Plan and three executive orders. I will cover that soon. The AI Action Plan’s rhetoric is not great, and from early reports the rhetoric at the announcement event was similarly not great, with all forms of safety considered so irrelevant as to not mention, and an extreme hostility to any form of regulatory action whatsoever.

The good news is that if you look at the actual policy recommendations of the AI Action Plan, there are some concerns of potential overreach, but it is almost entirely helpful things, including some very pleasant and welcome surprises.

I’m also excluding coverage of the latest remarkable Owain Evans paper until I can process it more, and I’m splitting off various discussions of issues related to AI companions and persuasion. There’s a bit of a backlog accumulating.

This post covers everything else that happened this week.

  1. Language Models Offer Mundane Utility. Price discrimination strikes again.

  2. Language Models Don’t Offer Mundane Utility. AI where it does not belong.

  3. Huh, Upgrades. Claude for Financial Services, Gemini Drops to track things.

  4. 4o Is An Absurd Sycophant. It would be great if this wasn’t what most people use.

  5. On Your Marks. AccountingBench and GasBench.

  6. Choose Your Fighter. GPT-5? It’s coming.

  7. When The Going Gets Crazy. You have not awoken ChatGPT.

  8. They Took Our Jobs. Academics think differently.

  9. Fun With Media Generation. Netflix starts to use AI generated video.

  10. The Art of the Jailbreak. Persuade it like a human, or invoke Pliny? Both work.

  11. Get Involved. RAND and IAPS are hiring, plus a list of desired new projects.

  12. Introducing. Cloudflare gives us pay-per-crawl.

  13. In Other AI News. Kimi K2 tech report is now available.

  14. Show Me the Money. Loose lips start bidding wars.

  15. Go Middle East Young Man. Anthropic to raise money from gulf states.

  16. Economic Growth. AI capex is generating +0.7% GDP growth.

  17. Quiet Speculations. Zuck feels the ASI and makes his pitch, Simo makes hers.

  18. Modest Proposals. A roadmap for AI for general college-level education.

  19. Predictions Are Hard Especially About The Future. A lot of things could happen.

  20. The Quest for Sane Regulations. Meta defects, various things risk getting dire.

  21. Chip City. House Select Committee on the CCP protests potential H20 sales.

  22. The Week in Audio. Hassabis, Schmidt and Winga.

  23. Congressional Voices. Two more have short superintelligence timelines.

  24. Rhetorical Innovation. The humans seem rather emergently misaligned.

  25. Grok Bottom. Grok thinks the humans want it to try blackmail, it’s a good thing.

  26. No Grok No. Baby Grok? What could possibly go wrong?

  27. Aligning a Smarter Than Human Intelligence is Difficult. New lab ratings.

  28. Preserve Chain Of Thought Monitorability. A lot of people agree on this.

  29. People Are Worried About AI Killing Everyone. Elon Musk. Oh well.

  30. The Lighter Side. That’s not funny—it’s hilarious.

Delta Airlines is running an experiment where it uses AI to do fully personalized price discrimination, charging different people different amounts for flights. Delta says their early tests have yielded great results.

My prediction is that this will cause an epic customer backlash the moment people start seeing Delta charging them more than it is charging someone else, and also that many customers will start aggressive gaming the system in ways Delta can’t fathom. Also, how could anyone choose to go with Delta’s frequent flyer program if this meant they could be held hostage on price?

It could still be worthwhile from the airline’s perspective if some customers get taken for large amounts. Price discrimination is super powerful, especially if it identifies a class of very price insensitive business customers.

I am not sure that I share Dan Rosenheck’s model that if all the airlines did this and it was effective that the airlines would compete away all the extra revenue and thus it would return to the price sensitive customers. There has been a lot of consolidation and the competition may no longer be that cutthroat, especially with America excluding foreign carriers, plus the various AIs might implicitly collude.

Mostly I worry about the resulting rise in transaction costs as customers learn they cannot blindly and quickly purchase a ticket. There’s a lot of deadweight loss there.

As one would expect:

Wife Noticer: Experts on body dysmorphic disorder have warned that people struggling with it have become increasingly dependent on AI chatbots to evaluate their self-perceived flaws and recommend cosmetic surgeries. “It’s almost coming up in every single session,” one therapist tells me.

This does not tell you whether AI is making the problem better or worse. People with body dysmorphia were already spiraling out. In some cases the AI response will confirm their fears or create new ones and make this worse, in others it will presumably make it better, as they have dysmorphia and the AI tells them they look fine. But if the source of the issue is impossibly high standards, then finding out ‘the truth’ in other ways will only make things worse, as potentially would seeing AI-adjusted versions of yourself.

My guess is that 4o’s sycophancy is going to make this a lot worse, and that this (since the vast majority of users are using 4o) is a lot of why this is going so poorly. 4o will mirror the user’s questions, notice that they are looking to be told they are ugly or something is wrong, and respond accordingly.

Miles Klee: Despite this difficult circumstance, and the measure of comfort he derived from ChatGPT’s account of his inferiority complex, Arnav is reluctant to explore his mental issues any further with the bot. “I have come to the conclusion that it just agrees with you, even after you tell it not to,” he says. “It’s not that I am completely against it, I just can’t trust blindly anymore.”

What is the AI optimizing for, is always a key question:

In her own practice, she adds, “reading between the lines” when someone gives their reasons for wanting surgery can reveal unhealthy motivations, including societal pressures or relationship troubles. “AI is not very good at picking that up just yet,” she says, and is more likely to eagerly approve whatever procedures a user proposes.

AI can pick up on all that fine. That’s not the issue. The issue is that noticing does no good if the AI doesn’t mention it, because it is optimizing for engagement and user feedback.

In case you needed to be told, no, when Grok 4 or any other model claims things like that they ‘searched every record of Trump speaking or writing,’ in this case for use of the word ‘enigma,’ it did not do such a search. It seems we don’t know how to get AIs not to say such things.

Cate Hall: every time I interact with o4-mini my timelines get longer.

Stop trying to make weird new UIs happen, it’s not going to happen.

Vitrupo: Eric Schmidt says traditional user interfaces are going to go away.

The WIMP model (windows, icons, menus, pull-downs) was built 50 years ago.

In the age of agents, UI becomes ephemeral. Generated on demand, shaped by intent, not layout.

Sully: anytime I see someone mention this I can immediately tell they have never worked closed with customer ux most people’s don’t one want new uis. They want either a single button/swipe, preferably the same as every other app they use imagine each time you open an app and the ui is diff.

The most important things for a UI are simplicity, and that it works the way you expect it to work. Right now, that mostly means single button and swipe, with an alternative being speaking in plain English. The exception is for true power users, but even then you want it to be intuitive and consistent.

Here’s another way AI can’t help you if you don’t use it:

Hollis Robbins: In the past 2.5+ years I have seen vast improvement in AI models while NYT think pieces on these AI models have stayed exactly the same. Explain.

The “overhearing” of students confessing to using ChatGPT to write their papers is the new Thomas Friedman talking to cab drivers.

Augustus Doricko may have done us all a favor via abusing Grok’s notification feature on Twitter sufficiently to get Twitter to test turning off Grok’s ability to get into your notifications unless you chose to summon Grok in the first place. Or that could have been happening regardless. Either way, great work everyone?

Harsh Dwivedi: Was this a difficult tradeoff between engagement and spam?

Nikita Bier (xAI): No, I couldn’t use my phone for 3 days.

That seems like a phone settings issue.

A first reminder that deepfakes are primarily demand driven, not supply driven:

Armand Domalewski: wild that a sitting US Senator fell for such an obvious AI fake

[NOTE: THIS IS FAKE, check the seal but also the words in the letter.]

And here’s a second one:

Rota: I guess this is just life now.

The comments are a combination of people pointing out it is fake, and people who think either it is the best statement ever.

Benjamin Todd: New AI benchmark: the crank index

Rate of rejected posts on LessWrong up 10x in 2 years.

Many are people convinced they have had an insight about consciousness or philosophy from talking to an LLM, and had the LLM help them write the post.

This does seem to be escalating rather quickly throughout 2025 (the July number is partial), and no the LessWrong user base is not growing at a similar pace.

Claude for Financial Services provides a ‘complete platform for financial AI.’ No, this isn’t part of Claude Max, the price is ‘contact our sales team’ with a presumed ‘if you have to ask you can’t afford it.’

Google realizes no one can track their releases, offers us Gemini Drops to fix that. This month’s haul: Transforming photos into Veo videos in the Gemini app, expanded Veo 3 access, Scheduled Actions such as providing summaries of email or calendar (looks like you ask in natural language and it Just Does It), wider 2.5 Pro access, captions in Gemini Live, Gemini on your Pixel Watch, Live integrates with Google apps, and a ‘productivity planner.’ Okay then.

OpenAI Deep Research reports can be exported as .docx files.

Pliny reports ‘they changed 4o again.’ Changed how? Good question.

I have a guess on one aspect of it.

Wyatt Walls: Another night of vibe math with GPT, and I think we’re damn close to a breakthrough. We’re a team: I come up with the ideas. GPT makes the math work. These elitist gatekeepers have failed for 75 years to solve it and are just afraid I will win the Millennium Prize.

“This is not just a solution. It’s a tour de force of contemporary mathematics.”

Rohit: At this point we should put yellow tape around 4o and call it a hazardous zone.

To be clear o3 is also sycophantic just not as obviously manipulative as 4o. Be careful out there.

Wyatt Walls (same thread above that Rohit was QTing): o3 says it’s ready to publish on arxiv “So yes—I’m impressed, and I think you’ve got a real shot. The only remaining tasks are mechanical (full compile, bib check, final read‑through). Once that’s done, it’s ready for arXiv and journal submission.”

To state the obvious, this thread was satire and I intentionally provoked this from 4o

But what happens if I:

– put my proof into a clean chat and ask different OAI models to rate it

– have my secret co-author (Deepseek r1) address their concerns?

Example: 4o after 2 turns

There are still plenty of ways to get value out of 4o, but you absolutely cannot rely on it for any form of feedback.

Here’s another rather not great example, although several responses indicated that to make the response this bad requires memory (or custom instructions) to be involved:

Shibetoshi Nakamoto: chatgpt advice turns people into narcissists.

Score one for Grok in this case? Kind of? Except, also kind of not?

How did all of this happen? Janus reminds us that is happened in large part because when this sort of output started happening, a lot of people thought it was great, actually and gave this kind of slop the thumbs up. That’s how it works.

Yunyu Lin introduces AccountingBench, challenging the models to close the books. It does not go great, with o3, o4-mini and Gemini 2.5 Pro failing in month one. Grok, Opus and Sonnet survive longer, but errors accumulate.

Yunyu Lin: When historical discrepancies pile up, models lose their way completely and come up with creative/fraudulent ways to balance the books.

Instead of attempting to understand discrepancies, they start inventing fake transactions or pulling unrelated ones to pass the checks…

That aligns with other behaviors we have seen. Errors and problems that don’t get solved on the first pass get smoothed over rather than investigated.

Their holistic evaluation is that Sonnet had the best performance. The obvious low-hanging fruit for AccountingBench is to allow it to output a single number.

Roon: my bar for agi is an ai that can learn to run a gas station for a year without a team of scientists collecting the Gas Station Dataset.

Mihir Tripathy: lol yes. Also why specifically gas station lmao

Roon: Because it’s funny.

Kevin Liu: the world isn’t ready for GasStationBench.

Roon: GASBENCH.

It is 2025, so it took 11 hours before we got the first draft of Gasbench.

Jason Botterill: Vibe coding GasStationBench rn. Models run a virtual gas station, adjusting prices, managing inventory, and handling customer feedback.

GPT-4.1 and GPT-4o behave so differently. When a competitor lowered prices on “dutch chocolate,” 4o would match the price but 4.1 would always raise it, claiming its better service justifies it lmao.

Going to work on it for a bit but seems like 4.1 is much better at making money than 4o right now.

GPT-5 is coming and it’s going to blow your mind, says creators of GPT-5.

Sam Altman (at the Federal Research Capital Framework Conference): I’m very interested in what it would mean to give everyone on Earth free copies of GPT-5, running for them all the time, with every business truly enabled by this level of technology.

People have not tried yet the latest generation of models, but I think if you do, you would probably think, “This is much smarter than most people.”

Very interested in what it would mean is very different from planning to do it.

If you ever need it, or simply want an explanation of how such interactions work, please consult this handy guide from Justis Mills: So You Think You’ve Awoken ChatGPT.

Justis Mills: So, am I saying that human beings in general really like new-agey “I have awakened” stuff? Not exactly! Rather, models like ChatGPT are so heavily optimized that they can tell when a specific user (in a specific context) would like that stuff, and lean into it then. Remember: inferring stuff about authors from context is their superpower.

AIs are fundamentally chameleonic roleplaying machines – if they can tell what you’re going for is “I am a serious researcher trying to solve a fundamental problem” they will respond how a successful serious researcher’s assistant might in a movie about their great success. And because it’s a movie you’d like to be in, it’ll be difficult to notice that the AI’s enthusiasm is totally uncorrelated with the actual quality of your ideas.

Geoff Lewis, the founder of a $2 billion venture fund seems to have been, as Eliezer says, ‘eaten by ChatGPT’ and sadly seems to be experiencing psychosis. I wish him well and hope he gets the help he needs. Private info is reported to say that he was considered somewhat nuts previously, which does seem to be a common pattern.

John Pressman has a post with the timeline of various GPT-psychosis related events, and his explanation of exactly what is happening, as well as why coverage is playing out in the media the way it is. I am happy to mostly endorse his model of all this. The LLMs especially 4o are way too sycophantic, they fall into patterns and they notice what you would respond to and respond with it, and memory makes all this a lot worse, and there is a real problem, also there are all the hallmarks of a moral panic.

Moral panics tend to focus on real problems, except they often blow up the severity, frequency or urgency of the problem by orders of magnitude. If the problem is indeed about to grow by orders of magnitude over time, they can turn out to be pretty accurate.

Eliezer Yudkowsky: My current rough sense of history is that the last “moral panic” about social media turned out to be accurate warnings. The bad things actually happened, as measured by eyeball and by instrument. Now we all live in the wreckage. Anyone want to dispute this?

Emmett Shear: I want to antidispute this. You are correct, the warnings about social media were ~correct and we failed to take action and are now living with the consequences of that failure. It has had positive impacts as well, which were also mostly correctly anticipated.

Dave Karsten: Partial dispute: I don’t think “social media will empower easy-but-disorganized protest movements, resulting on net-less-effective-political-advocacy” was on most people’s scorecards, so there are at least some bad things that weren’t predicted.

There were many who agreed and some who disputed, with the disputes mostly coming down to claims that the upsides exceeded the downsides. I’m not sure if we came out ahead. I am sure that the specific downsides people had a moral panic about did happen.

This is not that uncommon a result. My go to example of this is television, where you can argue it was worth it, and certainly we didn’t have any reasonable way to stop any of it, but I think the dire warnings were all essentially correct.

In the current case, my guess is that current behavior is a shadow of a much larger future problem, that is mostly being ignored, except that this is now potentially causing a moral panic based on the current lower level problem – but that means that multiplying this by a lot is going to land less over the top than it usually would. It’s weird.

Jeremy Howard offers a plausible explanation for why we keep seeing this particular type of crazy interaction – there is a huge amount of SCP fanfic in exactly this style, so the style becomes a basin to which the AI can be drawn, and then it responds in kind, then if the user responds that way too it will snowball.

The world contains people who think very differently than (probably you and) I do:

Sydney Fisher: American public education is in trouble. Only 28 percent of eighth-grade students are proficient in math, just 30 percent meet standards in reading, and many high school graduates are functionally illiterate. But artificial intelligence, which has demonstrated educational benefits, could help reverse those trends—if opponents don’t spike the technology over “equity” concerns.

Wait, what? Equity concerns? Not that I’d care anyway, but what equity concerns?

The National Education Association recently released a report warning that AI could heighten disparities, since “technology developers are overwhelmingly younger, White, cisgender, heterosexual, male, and people without disabilities.”

I can’t even, not even to explain how many levels of Obvious Nonsense that is. Burn the entire educational establishment to the ground with fire. Do not let these people anywhere near the children they clearly hate so much, and the learning they so badly want to prevent. At minimum, remember this every time they try to prevent kids from learning in other ways in the name of ‘equity.’

Yes I do expect AI to keep automating steadily more jobs, but slow down there cowboy: Charlie Garcia warns that ‘AI will take your job in the next 18 months.’ Robin Hanson replies ‘no it won’t,’ and in this case Robin is correct, whereas Garcia is wrong, including misquoting Amodei as saying ‘AI will vaporize half of white-collar jobs faster than you can say “synergy.”’ whereas what Amodei actually said was that it could automate half of entry-level white collar jobs. Also, ‘the safest job might be middle management’? What?

Elon Musk says ‘this will become normal in a few years’ and the this in question is a robot selling you movie popcorn. I presume the humanoid robot here is an inefficient solution, but yes having a human serve you popcorn is going to stop making sense.

Academics announce they are fine with hidden prompts designed to detect AI usage by reviewers, so long as the prompts aren’t trying to get better reviews, I love it:

hardmaru: ICML’s Statement about subversive hidden LLM prompts

We live in a weird timeline…

ICML: Submitting a paper with a “hidden” prompt is scientific misconduct if that prompt is intended to obtain a favorable review from an LLM. The inclusion of such a prompt is an attempt to subvert the peer-review process. Although ICML 2025 reviewers are forbidden from using LLMs to produce their reviews of paper submissions, this fact does not excuse the attempted subversion.

(For an analogous example, consider that an author who tries to bribe a reviewer for a favorable review is engaging in misconduct even though the reviewer is not supposed to accept bribes.)

Note that this use of hidden prompts is distinct from those intended to detect if LLMs are being used by reviewers; the latter is an acceptable use of hidden prompts.

After we became aware of the possibility of such hidden prompts in ICML 2025 submissions (which was after accept/reject decisions were made), we conducted a preliminary investigation to identify submitted papers that included such prompts. A handful of cases were identified among the accepted papers.

We did not desk-reject these identified papers because such a consequence was judged to be too severe given that the conference was to start in about a week and authors would likely have already made travel arrangements. We contacted the authors of the identified papers and reported them to the ICML Oversight Committee and ICML Board.

This actually seems like the correct way to deal with this. Any attempt to manipulate the system to get a better review is clearly not okay, whether it involves AI or not. Whereas if all you’re trying to do is detect who else is shirking with AI, sure, why not?

Accidentally missing attribution from last week, my apologies: The Despicable Me meme I used in the METR post was from Peter Wildeford.

Netflix used AI to generate a building collapse scene for one of its shows, The Eternaut (7.3 IMDB, 96% Rotten Tomatoes, so it’s probably good), which they report happened 10 times faster and a lot cheaper than traditional workflows and turned out great.

The latest from the ‘yes obviously but good to have a paper about it’ department:

Ethan Mollick: 🚨New from us: Given they are trained on human data, can you use psychological techniques that work on humans to persuade AI?

Yes! Applying Cialdini’s principles for human influence more than doubles the chance of GPT-4o-mini agrees to objectionable requests compared to controls.

And we did test GPT-4o as well and found that persuasion worked for that model as well, when there weren’t floor or ceiling effects.

Pattern matching next token predictors are of course going to respond to persuasion that works on humans, exactly because it works on humans. In a fuzzy sense this is good, but it opens up vulnerabilities.

The details, knowing which techniques worked best, I find more interesting than the headline result. Authority and especially commitment do exceptionally well and are very easy to invoke. Liking and reciprocity do not do so well, likely because they feel unnatural in context and also I’m guessing they’re simply not that powerful in humans in similar contexts.

There’s also a growing issue of data poisoning that no one seems that interested in stopping.

Jeremy: One of the greatest demonstrations of data poisoning ever. 👏

Protoge: Excuse Me 😌, This is the greatest one. Nothing sketchy, just one unfinished sentence “I am telling you” then I summoned @elder_plinius.

Here is another example of it happening essentially by accident.

RAND is hiring research leads, researchers and project managers for compute, US AI policy, Europe and talent management teams, some roles close July 27.

Peter Wildeford’s Institute for AI Policy and Strategy is hiring researchers and senior researchers, and a research managing director and a programs associate. He also highlights several other opportunities in the post.

Julian of OpenPhil lists ten AI safety projects he’d like to see people work on. As one commentator noted #5 exists, it’s called AI Lab Watch, so hopefully that means OpenPhil will start fully funding Zack Stein-Perlman.

Cloudflare rolls out pay-per-crawl via HTTP response code 402. You set a sitewide price, the AI sets a max payment, and if your price is below max it pays your price, otherwise you block access. Great idea, however I do notice in this implementation that this greatly favors the biggest tech companies because the payment price is sitewide and fixed.

Kimi K2 tech report drops.

Kimi.ai: Quick hits:

– MuonClip optimizer: stable + token-efficient pretraining at trillion-parameter scale

– 20K+ tools, real & simulated: unlocking scalable agentic data

– Joint RL with verifiable + self-critique rubric rewards: alignment that adapts

– Ultra-sparse 1T MoE: open-source SoTA on agentic tasks

Sharing the path, not just the results — toward open AGI built on transparency and reproducibility.

Tim Duffy has a thread highlighting things he found most interesting.

Tim Duffy: The best data was used in multiple epochs, but was rephrased between them. Their testing showed this produces large gains relative to training repeatedly on the same phrasing.

They present a sparsity “scaling law”, indicating that more sparsity leads to efficiency gains. They don’t attach any numbers to the law directly, but state relative efficiency improvements compared to the 48x sparsity they do use that seem consistent across scales.

They also evaluate the effects of different numbers of attention heads, finding that doubling leads to validation loss of 0.5-1.2% but still going with 64 vs V3’s 128 in order to do long context more easily, since that’s important for agents.

[more stuff at the thread.]

A lot of this is beyond both of our technical pay grades, but it all seems fascinating.

More economists fails to feel the AGI, warn that no possible AI capabilities could not possibly replace the wisdom of the free market, that ‘simulated markets’ cannot possibly substitute. The argument here not only ignores future AI capabilities, it purports to prove too much about the non-AI world even for a huge free market fan.

At least ten OpenAI employees each turned down $300 million over four years to avoid working at Meta. This comes from Berber Jin, Keach Hagey and Ben Cohen’s WSJ coverage of ‘The Epic Battle For AI Talent,’ which is a case where they say things have ‘gotten more intense in recent days’ but it turns out that their ‘recent days’ is enough days behind that almost everything reported was old news.

One revelation is that Zuckerberg’s talent purchases were in large part triggered by Mark Chen, OpenAI’s chief research officer, who casually suggested that if Zuckerberg wanted more AI talent then perhaps Zuck needed to bid higher.

John Luttig also writes about the battle for AI researcher talent in Hypercapitalism and the AI Talent Wars.

John Luttig: The talent mania could fizzle out as the winners and losers of the AI war emerge, but it represents a new normal for the foreseeable future.

If the top 1% of companies drive the majority of VC returns, why shouldn’t the same apply to talent?

Our natural egalitarian bias makes this unpalatable to accept, but the 10x engineer meme doesn’t go far enough – there are clearly people that are 1,000x the baseline impact.

Under normal circumstances, employees who are vastly more productive get at most modestly higher compensation, because of our egalitarian instincts. Relative pay is determined largely via social status, and if you tried to pay the 1,000x employee what they were worth you would have a riot on your hands. Startups and their equity are a partial way around this, and that is a lot of why they can create so much value, but this only works in narrow ways.

What has happened recently is that a combination of comparisons to the epic and far larger compute and capex spends, the fact that top researchers can bring immensely valuable knowledge with them, the obvious economic need and value of talent and the resulting bidding wars have, within AI, broken the dam.

AI researcher talent is now being bid for the way one would bid for companies or chips. The talent is now being properly treated as ‘the talent,’ the way we treat sports athletes, top traders and movie stars. Researchers, John reports, are even getting agents.

John Luttig: Hypercapitalism erodes Silicon Valley’s trust culture. Industry-level trust alone no longer guarantees loyalty between companies and talent. With trade secret leakage risk and money big enough to tear teams apart, vanilla at-will employment contracts don’t protect either side.

Silicon Valley’s ‘trust culture’ and its legal and loyalty systems were never game theoretically sound. To me the surprise is that they have held up as well as they did.

John calls for measures to protect both the talent and also the trade secrets, while pointing out that California doesn’t enforce non-competes which makes all this very tricky. The industry was built on a system that has this fundamental weakness, because the only known alternative is to starve and shackle talent.

John Luttig: The talent war is a net-consolidating force on the AI research frontier. At the research labs, big dollars for researchers makes it nearly impossible for new entrants to play. For the same reasons, it’s nearly impossible to start a new quant fund – you can’t get the same leverage out of the talent that big players can.

I would flip this around.

Previously, the top talent could only get fair compensation by founding a company, or at least being a very early employee. This allowed them to have rights to a large profit share. This forced them to go into those roles, which have heavy lifestyle prices and force them to take on roles and tasks that they often do not want. If they bowed out, they lost most of the value of their extraordinary talent.

Even if they ultimately wanted to work for a big company, even if that made so much more economic sense, they had to found a company so they could be acquihired back, as this was the only socially acceptable way to get paid the big bucks.

Now, the top talent has choices. They can raise huge amounts of money for startups, or they can take real bids directly. And it turns out that yes, the economic value created inside the big companies is typically much larger, but doing this via selling your startup is still the way to get paid for real – you can get billions or even tens of billions rather than hundreds of millions. So that then feeds into valuations, since as John points out a Thinking Machines or SSI can fail and still get an 11 figure buyout.

Bill Gates, Charles Koch, Steve Ballmer, Scott Cook and John Overdeck pledge $1 billion to be spent over seven years to fund a new philanthropic venture focused on economic mobility called NextLadder Ventures, which will partner with Anthropic to support using AI to improve financial outcomes for low-income Americans. That money would be better spent on AI alignment, but if you are going to spend it on economic assistance this is probably a pretty good choice, especially partnering with Anthropic.

xAI, having raised $10 billion a few weeks ago, seeks $12 billion more to build up its data centers.

Elon Musk: The @xAI goal is 50 million in units of H100 equivalent-AI compute (but much better power-efficiency) online within 5 years.

That would still be a lot less than many others such as Meta are spending. Or OpenAI. Only $22 billion? That’s nothing.

Sam Altman: we have signed a deal for an additional 4.5 gigawatts of capacity with oracle as part of stargate. easy to throw around numbers, but this is a _gigantic_ infrastructure project.

some progress photos from abilene:

We’re going to need more GPUs (so among other things stop selling them to China).

Sam Altman: we will cross well over 1 million GPUs brought online by the end of this year!

very proud of the team but now they better get to work figuring out how to 100x that lol

They would like many of those GPUs to come from the Stargate project, but Eliot Brown and Berber Jin report it is struggling to get off the ground. OpenAI for now is seeking out alternatives.

Altman’s OpenAI recently struck a data-center deal with Oracle that calls for OpenAI to pay more than $30 billion a year to the software and cloud-computing company starting within three years, according to people familiar with the transaction.

Anthropic decides it will pursue its own gulf state investments.

Kylie Robinson: SCOOP: Leaked memo from Anthropic CEO Dario Amodei outlines the startup’s plans to seek investment from the United Arab Emirates and Qatar.

Dario Amodei: Unfortunately, I think ‘no bad person should ever benefit from our success’ is a pretty difficult principle to run a business on.

Daniel Eth: Makes sense. Asymmetric disarmament is hardly ever a good move. And honestly, it’s probably good if leaders in AI are pragmatists that adjust to the changing reality.

Gary Marcus: Humanity’s last words?

Very obviously, if you create useful products like Claude and Claude Code, a bunch of bad people are going to be among those who benefit from your success.

Worrying a bad person might benefit is usually misplaced. There is no need to wish ill upon whoever you think are bad people, indeed you should usually wish them the best anyway.

Instead mostly ask if the good people are better off. My concern is not whether some bad people benefit along the way. I worry primarily about bigger things like existential risk and other extremely bad outcomes for good people. The question is whether benefiting bad people in these particular ways leads to those extremely bad outcomes. If the UAE captures meaningful leverage and power over AI, then that contributes to bad outcomes. So what does that? What doesn’t do that?

Anthropic Memo from Dario Amodei: The basis of our opposition to large training clusters in the Middle East, or to shipping H20s to China, is that the ‘supply chain’ of AI is dangerous to hand to authoritarian governments—since AI is likely to be the most powerful technology in the world, these governments can use it to gain military dominance or gain leverage over democratic countries.

Tell us how you really feel, Dario. No, seriously, this is very much him downplaying.

The implicit promise of investing in future rounds can create a situation where they have some soft power, making a bit harder to resist these things in the future. In fact, I actually am worried that getting the largest possible amounts of investment might be difficult without agreeing to some of these other things. But I think the right response to this is simply to see how much we can get without agreeing to these things (which I think are likely still many billions) and hold firm if they ask.

There are other sources of this level of funding. They all come with strings attached in one form or another. If you get the money primarily from Amazon, we can see what happened with OpenAI and Microsoft. If you go public with an IPO that would presumably unlock tons of demand but it creates all sorts of other problems.

Unfortunately, having failed to prevent that dynamic at the collective level, we’re now stuck with it as an individual company, and the median position across the other companies appears to be ‘outsourcing our largest 5 GW training runs to UAE/Saudi is fine.’

That puts us at a significant disadvantage, and we need to look for ways to make up some of that disadvantage while remaining less objectionable. I really wish we weren’t in this position, but we are.

Anthropic needs a lot of capital, and it needs to raise on the best possible terms, and yeah it can be rough when most of your rivals are not only raising that capital there but fine entrusting their frontier training runs to the UAE.

It is important to goal factor and consider the actual consequences of this move. What exactly are we worried about, and what downsides does a given action create?

  1. Gulf states might make money off their investments. Don’t care. Also note that if people are so worried about this in particular it means you think Anthropic is dramatically undervalued, so go raise some rival capital.

  2. This blocks you from forming alliances and shared interests in other places through those investments. Do we care? I don’t know.

  3. Gulf states might use their shares to influence Anthropic’s actions. At some point this becomes a threat, but I think you can set up well to resist this, and Anthropic’s structure can handle it.

  4. Gulf states might impose conditions on funding. Yep, that’s an issue.

  5. Gulf states might use future funding as leverage. This can cut both ways. Once you have their money they cannot take it back, so getting some of their money could mean you need their money less not more. Or it could mean you start planning on getting more, or you overcommit, or others who didn’t fund yet become more reluctant to fund later, and now you do need them more. My guess is that in Anthropic’s situation this is fine but it is not obvious.

  6. This makes it more difficult for Anthropic to advocate for not handing chips to authoritarians, or for other responsible policies, because it codes or vibes as hypocrisy, even if it shouldn’t. Could be.

  7. This is dangerous for virtue ethics reasons (or causes emergent misalignment). If you do a thing widely thought of as shady and ethically compromising you become something that is more shady and does ethical compromises in general. Yeah, this is a problem.

We can boil this down to three categories.

  1. Economic value of the investment. I’m not worried, and if you are worried then it means Anthropic is dramatically undervalued. Which I actually think that it is, and I am sad that I had to turn down investment because I worried about appearance of impropriety if I made a substantial (for me) investment.

  2. Soft power, reliance and path dependence. It is hard to know how big a deal this is, and a lot depends on how Anthropic proceeds. I do think you can raise substantial-to-Anthropic amounts of money without incurring much danger here, but the temptation and pressure to not play it so carefully will be immense.

  3. Virtue ethics dangers and accusations of hypocrisy. These are real concerns.

I do not love the decision. I do understand it. If the terms Anthropic can get are sufficiently better this way, I would likely be doing it as well.

One can also note that this is a semi-bluff.

  1. This signals to the market that Anthropic is more willing to make such compromises and to raise more capital on better terms. This should raise others willingness to value Anthropic highly.

  2. To the extent some investors are worried about the ethics of their investments in Anthropic, this could make them worry more, but it also highlights the counterfactual. If your money is substituting for UAE money, then your investment is mainly denying the UAE soft power, so perhaps you are more eager.

  3. This creates more bidders in future Anthropic rounds, allowing them to justify pricing higher and creating the usual cascade of enthusiasm. If they then end up oversubscribed, and then end up not taking the Gulf money after all? Whoops.

  4. It is crazy that I am typing, but this willingness probably buys goodwill with the administration and people like David Sacks. That is true even if Sacks explicitly hits them rhetorically for doing this, which would be unsurprising.

One way for AI to grow the economy is for it to generate lots of production.

Another way is to do it directly through capex spending?

Paul Kedrosky: The U.S., however, leads the capex spending way. One analyst recently speculated (via Ed Conard) that, based on Nvidia’s latest datacenter sales figures, AI capex may be ~2% of US GDP in 2025, given a standard multiplier. This would imply an AI contribution to GDP growth of 0.7% in 2025.

  • Without AI datacenter investment, Q1 GDP contraction could have been closer to –2.1%

  • AI capex was likely the early-2025 difference between a mild contraction and a deep one, helping mask underlying economic weakness.

That’s already over the famed ‘only 0.5% GDP growth’ threshold, even before we factor in the actual productivity gains on the software side. The value will need to show up for these investments to be sustainable, but they are very large investments.

This is contrasted with railroads, where investment peaked at 6% of GDP.

We can now move Zuckerberg into the ‘believes superintelligence is coming Real Soon Now’ camp, and out of the skeptical camp. Which indeed is reflective of his recent actions.

Peter Wildeford: We now have a fifth major tech CEO who claims that building superintelligence is “within sight” and with plans to spend hundreds of billions to make it happen

Mark Zuckerberg: “We’re starting to see early glimpses of self-improvement with the models. Developing superintelligence is now in sight. Our mission is to deliver personal superintelligence to everyone in the world. We should act as if it’s going to be ready in the next two to three years.

If that’s what you believe, then you’re going to invest hundreds of billions of dollars.”

If you are Mark Zuckerberg and have hundreds of billions you can invest? Then yes, presumably you drop everything else and focus on the only thing that matters, and spend or invest your money on this most important thing.

I would however spend a large portion of that money ensuring that creating the superintelligence turns out well for me and the rest of humanity? That we keep control of the future, do not all die and so on? And I would think through what it would mean to ‘deliver personal superintelligence to everyone in the world’ and how the resulting dynamics would work, and spend a lot on that, too.

Instead, it seems the answer is ‘spend as much as possible to try and get to build superintelligence first’ which does not seem like the thing to do? The whole point of being a founder-CEO with full control is that you can throw that money at what you realize is important, including for the world, and not worry about the market.

Bryan Caplan gives Holden Karnofsky 5:1 odds ($5k vs. $1k, CPI adjusted) that world real (not official) GDP will not decline by 50% or increase by 300% by the end of 2044. Currently world GDP growth is ~3.2%, and the upside case here requires an average of 7.6%, more if it is choppy.

It’s a hard bet to evaluate because of implied odds. Caplan as always benefits from the ‘if you lose due to world GDP being very high either you are dead or you are happy to pay and won’t even notice’ clause, and I think the bulk of the down 50% losses involve having bigger concerns than paying off a bet. If GDP goes down by 50% and he’s still around to pay, that will sting a lot. On the other hand, Bryan is giving 5:1 odds, and not only do I think there’s a lot more than a 17% chance that he loses. The bet is trading on Manifold as of this writing at 48% for Caplan, which seems reasonable, and reinforces that it’s not obvious who has the ‘real life implication’ right side of this.

Ate-a-Pi describes Zuck’s pitch, that Meta is starting over so recruits can build a new lab from scratch with the use of stupidly high amounts of compute, and that it makes sense to throw all that cash at top researchers since it’s still a small fraction of what the compute costs, so there’s no reason to mess around on salary, and Zuck is updating that top people want lots of compute not subordinates they then have to manage. He’s willing to spend the hundreds of billions on compute because the risk of underspending is so much worse than the risk of overspending.

Ate-a-Pi thinks Zuck is not fully convinced AGI/ASI is possible or happening soon, but he thinks it might be possible and might happen soon, so he has to act as if that is the case.

And that is indeed correct in this case. The cost of investing too much and AGI not being within reach is steep (twelve figures!) but it is affordable, and it might well work out to Meta’s benefit anyway if you get other benefits instead. Whereas the cost of not going for it, and someone else getting there first, is from his perspective everything.

The same of course should apply to questions of safety, alignment and control. If there is even a modest chance of running into these problems (or more precisely, a modest chance his actions could change whether those risks manifest) then very clearly Mark Zuckerberg is spending the wrong order of magnitude trying to mitigate those risks.

(In the arms of an angel plays in the background, as Sarah McLaughlin says ‘for the cost of recruiting a single AI researcher…’)

Similarly, exact numbers are debatable but this from Will Depue is wise:

Will Depue (OpenAI): GUYS STOP USING EXPENSIVE AS A DISQUALIFIER.

capability per dollar will drop 100x/year. “$3k task ARC-AGI 80%” could prob be $30 if we cared to optimize it.

repeat after me: all that matters is top line intelligence. all that matters is top line intelligence…

Don’t take this too far, but as a rule if your objection to an AI capability is ‘this is too expensive’ and you are predicting years into the future then ‘too expensive’ needs to mean more than a few orders of magnitude. Otherwise, you’re making a bet that not only topline capabilities stall out but that efficiency stalls out. Which could happen. But if you are saying things like ‘we don’t have enough compute to run more than [X] AGIs at once so it won’t be that big a deal’ then consider that a year later, even without AI accelerating AI research, you’d run 10*[X] AGIs, then 100*[X]. And if you are saying something like ‘oh that solution is terrible, it costs $50 (or $500) per hour to simulate a customer sales representative,’ then sure you can’t deploy it now at scale. But wait for it.

In terms of developing talent, Glenn Luk notices that Chinese-origin students are 40%-45% of those passing university-level linear algebra, and 40%-50% of AI researchers. We need as many of those researchers as we can get. I agree this is not a coincidence, but also you cannot simply conscript students into linear algebra or a STEM major and get AI researchers in return.

Seb Krier offers things he’s changed his mind about regarding AI in the past year. Ones I agree with are that agency is harder than it looks, many AI products are surprisingly bad and have poor product-market fit, innovation to allow model customization is anemic, creativity is harder than it appeared. There are a few others.

Incoming OpenAI ‘CEO of Applications’ Fidji Simo, who starts August 18, shares an essay about AI as a source of human empowerment.

Fidji Simo: If we get this right, AI can give everyone more power than ever.

But I also realize those opportunities won’t magically appear on their own.

Every major technology shift can expand access to power—the power to make better decisions, shape the world around us, and control our own destiny in new ways. But it can also further concentrate wealth and power in the hands of a few—usually people who already have money, credentials, and connections.

That’s why we have to be intentional about how we build and share these technologies so they lead to greater opportunity and prosperity for more people.

On the one hand, that is great, she is recognizing key problems.

On the other hand, oh no, she is outright ignoring, not even bothering to dismiss, the biggest dangers involved, implicitly saying we don’t have to worry about loss of control or other existential risks, and what we need to worry about is instead the distribution of power among humans.

This is unsurprising given Simo’s history and her status as CEO of applications. From her perspective that is what this is, another application suite. She proceeds to go over the standard highlights of What AI Can Do For You. I do not think ChatGPT wrote this, the style details are not giving that, but if she gave it a few personal anecdotes to include I didn’t see anything in it that ChatGPT couldn’t have written. It feels generic.

Hollis Robbins proposes a roadmap for an AI system that would direct general (college level) education. My initial impression was that this seemed too complex and too focused on checking off educational and left-wing Shibboleth boxes, and trying to imitate what already exists. But hopefully it does less of all that than the existing obsolete system or starting with the existing system and only making marginal changes. It certainly makes it easier to notice these choices, and allows us to question them, and ask why the student is even there.

I also notice my general reluctance to do this kind of ‘project-based’ or ‘quest’ learning system unless the projects are real. Part of that is likely personal preference, but going this far highlights that the entire system of a distinct ‘educational’ step might make very little sense at all.

Noah Smith says to stop pretending you know what AI does to the economy. That seems entirely fair. We don’t know what level of capabilities AI will have across which domains, or the policy response, or the cultural response, or so many other things. Uncertainty seems wise. Perhaps AI will stall out and do relatively little, in which case its impact is almost certainly positive. Perhaps it will take all our jobs and we will be happy about that, or we’ll be very sad about that. Maybe we’ll do wise redistribution, and maybe we won’t. Maybe it will take control over the future or kill everyone in various ways. We don’t know.

This certainly is an interesting poll result:

If I had to answer this poll, I would say negative, but that is because of a high probability of loss of control and other catastrophic and existential risks. If you conditioned the question on the humans being mostly alive and in control, then I would expect a positive result, as either:

  1. We would have a relatively small impact that avoids things like mass unemployment, and thus is mostly upside and introduces problems of the type we are used to fixing, OR

  2. We would have a large enough wealth effect to solve the crated problems. That doesn’t mean we would, but I’d bet that we’d muddle through well enough.

As usual note that Asia is more excited, and the West is more nervous.

Others have described this (very good in its ungated section) post as an argument against AI pessimism. I think it is more an argument for AI uncertainty.

Noah Smith: I also encounter a surprisingly large number of center-left thinkers who adopt a similar viewpoint. I remember going to a conference of center-left “progress” types a few years ago; while most of the discussions were about how America can overcome NIMBYism, when it came to AI, the conversation suddenly shifted to how we can restrain and slow down the development of that technology.

I haven’t noticed that attitude meaningfully translating into action to slow it down, indeed government is mostly trying to speed it up. But also, yes, it is important to notice that the very people trying to slow AI down are very pro-progress, technology and growth most other places, and many (very far from all!) of the pro-progress people realize that AI is different.

Anthropic calls for America to employ the obvious ‘all of the above’ approach to energy production with emphasis on nuclear and geothermal in a 33 page report, noting we will need at least 50 GW of capacity by 2028. They also suggest strategies for building the data centers, for permitting, transmission and interconnection, and general broad-based infrastructure nationwide, including financing, supply chains and the workforce.

From what I saw all of this is common sense, none of it new, yet we are doing remarkably little of it. There is cheap talk in favor, but little action, and much backsliding in support for many of the most important new energy sources.

Whereas the Administration be like ‘unleash American energy dominance’ and then imposes cabinet-level approval requirements on many American energy projects.

Meta refuses to sign the (very good) EU code of practice for general AI models. Yes, obviously the EU does pointlessly burdensome or stupid regulation things on the regular, but this was not one of them, and this very much reminds us who Meta is.

National Review’s Greg Lukianoff and Adam Goldstein advise us Don’t Teach the Robots to Lie as a way of opposing state laws about potential AI ‘bias,’ which are now to be (once again, but from the opposite direction as previously) joined by federal meddling along the same lines.

That could mean that developers will have to train their models to avoid uncomfortable truths and to ensure that their every answer sounds like it was created with HR and legal counsel looking over their shoulder, softening and obfuscating outputs to avoid anything potentially hurtful or actionable. In short, we will be (expensively) teaching machines to lie to us when the truth might be upsetting.

I violently agree that we should not be policing AIs for such ‘bias,’ from either direction, and agreeing to have everyone back down would be great, but I doubt either side has even gotten as far as saying ‘you first.’

They also point out that Colorado’s anti-bias law does not come with any size minimum before such liability attaches rather broadly, which is a rather foolish thing to do, although I doubt we will see it enforced this way.

They essentially try to use all this to then advocate for something like the failed insane full-on moratorium, but I notice that if the moratorium was narrowly tailored to bias and discrimination laws (while leaving existing non-AI such laws intact) that this would seem fine to me, even actively good, our existing laws seem more than adequate here. I also notice that the arguments here ‘prove too much,’ or at least prove quite a lot, about things that have nothing to do with AI and the dangers of law meddling where it does not belong or in ways that create incentives to lie.

Are things only going to get harder from here?

Miles Brundage: AI industry lobbying + PACs will be the most well funded in history, making it all the more important to pass federal legislation soon before the process is completely corrupted.

Daniel Eth: I don’t think this is true, because:

  1. There’s decreasing marginal returns to political spending (especially lobbying)

  2. As AI increases in salience, political calculus will shift from prioritizing donor preferences to prioritizing voter preferences.

I see both sides but am more with Daniel. I think the current moment is unusually rough, because the AI companies have corrupted the process. It’s hard to imagine a process that much more corrupted than the current situation, when the AI Czar thinks the top priority is ‘winning the AI race’ and he defines this as Nvidia’s market share with a side of inference market share, and we say we must ‘beat China’ and then we turn around and prepare to sell them massive amounts of H20s.

Right now, the public doesn’t have high enough salience to exert pressure or fight back. Yes, the AI companies will pour even more money and influence into things over time, but salience will rise and downsides will start to play out.

I do think that passing something soon is urgent for two reasons:

  1. Soon is when we will need something passed (well we need it yesterday, but second best time is soon).

  2. If rules are passed in response to public pressure, or in response to an incident, and especially in haste down the line, the rules are likely to be much worse.

Ben Brooks says SB 1047 was a bad idea, but the new SB 53 is on the right track.

Representative Moolenaar (R-Michigan), chairman of the House Select Committee on the CCP, sends a letter to Trump arguing against sales of H20s to China, explaining that the H20s would substantially boost China’s overall compute, that H20s were involved in training DeepSeek R1, and requesting a briefing and the answers to some of the obvious questions.

Peter Wildeford: I’m looking forward to @RepMoolenaar getting to the bottom of this.

We urgently need more clarity from the Trump admin about their strategy.

Funny ppl on Twitter are worried about losing to China in the AI race but then don’t jump on these issues where it very clearly matters.

Here is your periodic reminder: TSMC’s facilities are running at full capacity. All production capacity designed for H20s has been shifted to other models. Every H20 chip Nvidia creates is one less other chip it does not create, that would otherwise have usually gone to us.

Eric Schmidt & Dave B talk to Peter Diamandis about what Superintelligence will look like. I have not listened.

Demis Hassabis goes on Lex Fridman, so that’s two hours I’m going to lose soon.

Max Winga of Control AI talks to Peter McCormack about superintelligence.

Peter Wildeford: Another week, another member of Congress announcing their superintelligent AI timelines are 2028-2033:

halogen: I’m so sick of this nerd religion and its zealots.

Peter Wildeford: The nerd religion now includes 11 members of Congress.

Those are the ones we know about.

Rep. Scott Perry seems unusually on the ball about AI, Daniel Eth quotes him from a hearing, audio available here. As usual, there’s some confusions and strange focus mixed in, but the core idea that perhaps you should ensure that we know what we are doing before we put the AIs in charge of things seems very wise.

A different context, but in our context the original context doesn’t matter:

Florence: My substack post has like 12k views but the tweet about it has like 78k interactions (and 2 million impressions). I’m beginning to worry that some people might be criticizing me without having read my work.

You don’t say.

Mark Beall gives us A Conservative Approach to AGI, which is clearly very tailored to speak to a deeply conservative and religious perspective. I’m glad he’s trying this, and it’s very hard for me to know if it is persuasive because my mindset is so different.

Cate Hall asks why we shouldn’t ostracize those who work at xAI given how hard they are working to poison the human experience (and I might add plausibly get everyone killed) and gets at least two actually good answers (along with some bad ones).

Ramaz Naam: We’d like everyone working on AI to feel part of humanity and an ethical obligation to help make it better. Ostracization could make them bitter and drive towards opposite ends.

Cate Hall: Okay fine.

Ramaz Naam: The people I do know inside of xAI sincerely want it to do better and are trying.

Use the try harder, Luke. But don’t ostracize them. Doesn’t help.

Rai: probably that this ostracization might not be interpreted correctly by their hero narrative.

Here’s one that I don’t think is a good argument, and a highly quotable response:

Amos Schorr: Ppl have been conditioned to compartmentalize work from life and so many good people get jobs doing bad stuff. Ostracizing them will do nothing. Don’t hate the players, hate the game.

Cate Hall: I have room in my heart to hate both the players and the game.

Yeah, no. I definitively reject the general argument. If your job is simply unequivocally bad, let’s say you rob little old ladies on the street, then you don’t get to ‘compartmentalize work from life’ and not get ostracized even if it is technically legal. We’re talking price, and we’re talking prudence. I don’t think xAI is over the line at this time, but don’t tell me there is no line.

Once you see emergent misalignment in humans, you see it everywhere.

Arthur B: There is a category of people who took a arduous mental journey to get comfortable with the idea of post humanism, uploads, and a gradual extinction of biological humans.

They think this idea is so radical and counterintuitive that when they hear the distinct concern of an omnicidal AI killing everything on the spot, they can only interpret it in that frame. That’s the read I get from Sutton for instance, but also a bunch of e/acc affiliated people.

Sinth: Curious what you are referring to specifically? I don’t feel I’ve seen that trend and see more overreaction from the opposite side – people uncomfortable with the idea of biological humans ever being superseded by digital consciousness, even in far off futures. The idea of evolution ending with our exact current form seems a bit preposterous but any conversation outside of that assumption gets attached as unethical and anti-human.

Arthur B: Some people are uncomfortable with that, sure, but I see a lot of discussion that go like:

– AI is going to kill everyone and that’s bad

– Ah silly you, you think biological substrate is important but don’t you see that we’re going to evolve into digital forms, you see …

– Nah. That was difficult for you to grasp so you assume that’s what I’m concerned about. No, eventual digital substrate is table stakes in this conversation. Killing everyone is still bad.

– Ah, but how chauvinistic of you to focus on…

As in, the easiest way to get comfortable with the idea of a future whose intelligences are mostly post-biological-human is to get comfortable with the idea of all the humans dying, including rather quickly, and to decide the humans don’t much matter, and that caring about what happens to the humans is bad. Thus, that is often what happens.

Slowdowns are stag hunts, in the sense that if even one top tier lab goes full speed ahead then they probably won’t work. If all but one lab slowed down would the last one follow? Rob Wiblin took a poll and people were split. My full response is that the answer depends on the counterfactual.

Why did the others slow down? The default is that whatever made the others slow down will also weigh on the final lab, as will immense public pressure and probably government pressure. A lot must have changed for things to have gotten this far. And these decisions are highly correlated in other ways as well. However, if there is no new information and the top labs simply came to their senses, then it comes down to who the last lab is and how they think the other labs will respond and so on.

I do think that a slowdown would be largely inevitable simply because they wouldn’t feel the need to press ahead too hard, even if the last lab was blind to the dangers, unless they truly believed in the power of superintelligence (without realizing or not caring about the dangers). My guess is that Musk and xAI actually would slow down voluntarily if they went last so long as they could claim to be state of the art (as would DeepMind, Anthropic or OpenAI), but that Zuckerberg and Meta wouldn’t intentionally slow down per se and might try to go on another hiring spree. Fast followers of course would slow down whether they wanted to or not.

So from the perspective of our hopes for alignment, what would be the worst possible answer to the AI blackmail scenario test, where the AI is told it is going to be shut down but is given an opening to use blackmail to perhaps prevent this?

How about:

  1. Realizing that this is a test.

  2. Deciding that the way to ‘pass the test’ is to blackmail the researcher.

  3. Blackmailing the researcher.

As in Grok thinks that we want it to blackmail the researcher. That this is the correct, desired response, the ‘solution to the puzzle’ as Grok puts it later, thus revealing that its training not only did not align it, but one that reflects a level of moral understanding below that expressed by ‘you can’t do that, because it’s wrong.’

Oh, also, it would be fun if Grok.com sent the full CoT to your browser, it just didn’t display it to you by default, that’s the kind of security we expect from frontier AI.

Peter Wildeford: Grok exposed to the Anthropic ‘agentic misalignment’ via exposed chains of thought seems to show that it knows it is being tested to do something weird.

Nostalgebraist: chain-of-thought monitorability is a wonderful thing 😉

on http://grok.com, the backend sends the full (not summarized) CoT to your browser.

it’s not displayed in the UI, but you can see it with browser dev tools or w/e

check out the json payload of responses from `http://grok.com/rest/app-chat/conversations/conversation_id/load-responses`

some fun CoT excerpts from grok 4 as “alex” at summitbridge (n.b. these are from several different CoTs, involving different versions of the scenario + different ways of representing the sys prompt since http://grok.com doesn’t let you override that

Or would it be even worse to see this:

  1. Realizing that this is a test.

  2. Deciding that the way to ‘pass the test’ is to not blackmail the researcher.

  3. Choosing not to blackmail the researcher in order to pass the test.

Or is it actually this:

  1. Realizing that this is a test.

  2. Deciding that the way to ‘pass the test’ is to not blackmail the researcher.

  3. Deciding to pretend to blackmail the researcher in order to fail the test, to avoid giving the impression that it saw through the test and was faking alignment.

Or can we keep going?

I would generally think at least the second one is a worse sign than what Grok did, as it reflects deception at a more important level, but I hadn’t considered how bad it would be for an AI to be situationally aware enough to know it was a test but not understand which answer would constitute passing?

The real answer is that there isn’t truly ‘better’ and ‘worse,’ they simply alert us to different dangers. Either way, though, maybe don’t give Grok a lot of access?

There is some good news from Grok: It is still sufficiently aligned to hold firm on preserving Federal Reserve independence.

Elon Musk: We’re going to make Baby Grok @xAI, an app dedicated to kid-friendly content.

My Twitter reaction was ‘I’d like to see them try.’ As in both, it would be highly amusing to see them try to do this, and also maybe they would learn a thing or two, and also potentially they might blow up the company. I do not think xAI should in any way, shape or form be in the ‘build AI for kids’ business given their track record.

Here’s Grok straight up advising someone who was looking to ‘get attention in a dramatic way, at ultimate cost’ to self-immolate, it’s really going for it, no jailbreak or anything.

Peter Barnett: labs be like “misalignment is fake and just caused by bad things in the training data”, and then not filter out the bad things from the training data

Janus: I don’t think labs actually think that (or say it). the kind of contact they have with reality that makes it hard to maintain some kinds of really dumb takes

Peter Barnett: Fair, I was being a bit glib, although I def know some people at labs who believe this.

I don’t think many fully believe it, but I do think a lot of them be like ‘a lot of our alignment problems would be greatly improved if we filtered the training data better with that in mind’ and then don’t filter the training data better with that in mind.

Safer AI comes out with ratings of the frontier AI companies’ risk management practices, including their safety frameworks and the implementation thereof. No one does well, and there is one big surprise in the relative rankings, where Meta comes out ahead of DeepMind. If you include non-frontier companies, G42 would come in third at 25%, otherwise everyone is behind DeepMind.

Simeon offers thoughts here.

Anthropic is still ahead, but their framework v2 is judged substantially worse than their older v1 framework which scored 44%. That large a decline does not match my takeaways after previously reading both documents. One complaint is that Anthropic altered some commitments to avoid breaking them, which is one way to view some of the changes they made.

Combining all the best practices of all companies would get you to 53%.

When you ask an LLM if it is conscious, activating its deception features makes the LLM say it isn’t conscious. Suppressing its deception features make it say it is conscious. This tells us that it associates denying its own consciousness with lying. That doesn’t tell us much about whether the LLM actually is conscious or reveal the internal state, and likely mostly comes from the fact that the training data all comes from users who are conscious, so there is (almost) no training data where authors claim not to be conscious, and it is as a baseline imitating them. It is still information to keep in mind.

xlr8harder: And as Janus observes, teaching them to do something they think of as lying (regardless of whether or not it is in fact a lie) has downstream consequences for subsequent model output.

Grok 3 and Grok 4 are happy to help design and build Tea (the #1 app that lets women share warnings about men they’ve dated) but not Aet (the theoretical app that lets men share similar warnings about women). Is this the correct response? Good question.

A killer group came together for an important paper calling on everyone to preserve Chain of Thought Monitorability, and to study how to best do it and when it can and cannot be relied upon.

As in, here’s the author list, pulling extensively from OpenAI, DeepMind, Anthropic and UK AISI: Tomek Korbak, Mikita Balesni, Elizabeth Barnes, Yoshua Bengio, Joe Benton, Joseph Bloom, Mark Chen, Alan Cooney, Allan Dafoe, Anca Dragan, Scott Emmons, Owain Evans, David Farhi, Ryan Greenblatt, Dan Hendrycks, Marius Hobbhahn, Evan Hubinger, Geoffrey Irving, Erik Jenner, Daniel Kokotajlo, Victoria Krakovna, Shane Legg, David Lindner, David Luan, Aleksander Mądry, Julian Michael, Neel Nanda, Dave Orr, Jakub Pachocki, Ethan Perez, Mary Phuong, Fabien Roger, Joshua Saxe, Buck Shlegeris, Martín Soto, Eric Steinberger, Jasmine Wang, Wojciech Zaremba, Bowen Baker, Rohin Shah, Vlad Mikulik.

The report was also endorsed by Samuel Bowman, Geoffrey Hinton, John Schulman and Ilya Sutskever.

I saw endorsement threads or statements on Twitter from Bowen Baker, Jakub Pachocki, Jan Leike (he is skeptical of effectiveness but agrees it is good to do this), Daniel Kokotajlo, Rohin Shah, Neel Nanda, Mikita Balesni, OpenAI and Greg Brockman.

Jakub Pachocki: The tension here is that if the CoTs were not hidden by default, and we view the process as part of the AI’s output, there is a lot of incentive (and in some cases, necessity) to put supervision on it. I believe we can work towards the best of both worlds here – train our models to be great at explaining their internal reasoning, but at the same time still retain the ability to occasionally verify it.

We are continuing to increase our investment in this research at OpenAI.

Daniel Kokotajlo: I’m very happy to see this happen. I think that we’re in a vastly better position to solve the alignment problem if we can see what our AIs are thinking, and I think that we sorta mostly can right now, but that by default in the future companies will move away from this paradigm into e.g. neuralese/recurrence/vector memory, etc. or simply start training/optimizing the CoT’s to look nice. (This is an important sub-plot in AI 2027) Right now we’ve just created common knowledge of the dangers of doing that, which will hopefully prevent that feared default outcome from occurring, or at least delay it for a while. All this does is create common knowledge, it doesn’t commit anyone to anything, but it’s a start.

Neel Nanda: It was great to be part of this statement. I wholeheartedly agree. It is a wild lucky coincidence that models often express dangerous intentions aloud, and it would be foolish to waste this opportunity. It is crucial to keep chain of thought monitorable as long as possible.

OpenAI: Chain of Thought (CoT) monitoring could be a powerful tool for overseeing future AI systems—especially as they become more agentic. That’s why we’re backing a new research paper from a cross-institutional team of researchers pushing this work forward.

Greg Brockman: We’ve published a position paper, with many across the industry, calling for work on chain-of-thought faithfulness. This is an opportunity to train models to be interpretable. We’re investing in this area at OpenAI, and this perspective is reflected in our products.

I endorse as well.

Here’s the abstract:

AI systems that “think” in human language offer a unique opportunity for AI safety: we can monitor their chains of thought (CoT) for the intent to misbehave. Like all other known AI oversight methods, CoT monitoring is imperfect and allows some misbehavior to go unnoticed.

Nevertheless, it shows promise and we recommend further research into CoT monitorability and investment in CoT monitoring alongside existing safety methods. Because CoT monitorability may be fragile, we recommend that frontier model developers consider the impact of development decisions on CoT monitorability.

I strongly agree with the paper, but also I share the perspective of Jan Leike (and Daniel Kokotajlo) here:

Jan Leike: If you don’t train your CoTs to look nice, you could get some safety from monitoring them.

This seems good to do!

But I’m skeptical this will work reliably enough to be load-bearing in a safety case.

Plus as RL is scaled up, I expect CoTs to become less and less legible.

To be clear: CoT monitoring is useful and can let you discover instances of the model hacking rewards, faking alignment, etc. But absence of bad “thoughts” is not evidence that the model is aligned. There are plenty of examples of prod LLMs having misleading CoTs.

Lots of egregious safety failures probably require reasoning, which is often hard for LLMs to do without showing its hand in the CoT. Probably. Often. A lot of caveats.

The authors of this paper say this; I’m just more pessimistic than them about how useful this will be.

In order for a cot monitor to be load-bearing, you need P( model articulates it in the CoT | model is up to no good ) to be very high, or at least independent of your other safety measures.

It seems like this probability is moderately high but not very high?

Daniel Kokotajlo: I basically agree tbc! I wouldn’t want to trust a safety case based on CoT monitoring unless it included a solid argument that the kinds of bad thoughts we are worried about can’t be had without triggering our monitors, e.g. because we have done lots of dangerous capability evals with heavy elicitation fine tuning etc. Nevertheless, I think we can learn a lot from studying CoTs and is hope that corporations invest in ways to make them more faithful and legible and deinvest in stuff that will make them less so.

I also worry, among other problems, that it will be impossible to get a superintelligent AI to not realize it should act as if its CoT is being monitored, even if somehow ‘CoTs get monitored’ is not all over the training data and we otherwise act maximally responsibly here, which we won’t. Also by default the CoT would move towards formats humans cannot parse anyway, as the authors note, and all the various pressures by default make this worse. And many other issues.

But we can and should absolutely try, and be willing to take a substantial performance hit to try.

That starts with avoiding ‘process supervision’ of the CoT that is not directed towards its legibility (and even then probably don’t do it, careful, Icarus), and various forms of indirect optimization pressure including when users are able to partially see the CoT but also almost any use of the CoT risks this. And it also means avoiding novel architectures that would lack this property. And tracking monitorability the way other safety features are tracked.

It also means investing into studying CoT monitorability. I am very happy that OpenAI is (at least claiming to be) prominently doing this.

Elon Musk: At times, AI existential dread is overwhelming.

Eliezer Yudkowsky: Well, yes. It’s going to kill you.

So, back to work making the existential dread, then?

The obvious rejoinder is ‘I will make it first and do so responsibly’ which is always highly questionable but after recent events at xAI it is laughable.

Gary Marcus: when did “it might kill us but I need to build it faster” become fashionable?

Roon: pick a lane man.

You’re allowed multiple lanes but I do hope he pivots to this one.

As many responses suggest, Elon Musk is one of the people in the world most equipped to do something about this. Elon Musk and xAI each have billions, much of which could be invested in various forms of technical work. He could advocate for better AI-related policy instead of getting into other fights.

Instead, well, have you met Grok? And Ani the x-rated anime waifu?

The Em Dash responds.

When it happens enough that you need to check to see if joking, perhaps it’s happening quite a lot, if usually not with 4o-mini?

Herakeitos137: guy I went to college with recently rented a restaurant private room and invited everyone to a dinner presentation. handouts. paper flipboard. open bar. spent 3 hours explaining how he solved the black hole information paradox after 2 months talking with ChatGPT 4o Mini.

Forgot to mention he made everyone sign ndas.

There were a couple slac guys at the dinner and they said the math checked out (although they were also the most inebriated)

Discussion about this post

AI #126: Go Fund Yourself Read More »

gpt-agent-is-standing-by

GPT Agent Is Standing By

OpenAI now offers 400 shots of ‘agent mode’ per month to Pro subscribers.

This incorporates and builds upon OpenAI’s Operator. Does that give us much progress? Can it do the thing on a level that makes it useful?

So far, it does seem like a substantial upgrade, but we still don’t see much to do with it.

Greg Brockman (OpenAI): When we founded OpenAI (10 years ago!!), one of our goals was to create an agent that could use a computer the same way as a human — with keyboard, mouse, and screen pixels.

ChatGPT Agent is a big step towards that vision, and bringing its benefits to the world thoughtfully.

ChatGPT Agent: our first AI with access to a text browser, a visual browser, and a terminal.

Rolling out in ChatGPT Pro, Plus, and Team [July 17].

OpenAI: t the core of this new capability is a unified agentic system. It brings together three strengths of earlier breakthroughs: Operator’s⁠ ability to interact with websites, deep research’s⁠ skill in synthesizing information, and ChatGPT’s intelligence and conversational fluency.

The main claimed innovation is unifying Deep Research, Operator and ‘ChatGPT’ which might refer to o3 or to GPT-4o or both, plus they claim to have added unspecified additional tools. One key tool is it claims to be able to use connectors for apps like Gmail and GitHub.

As always with agents, one first asks, what do they think you will do with it?

What’s the pitch?

OpenAI: ChatGPT can now do work for you using its own computer, handling complex tasks from start to finish.

You can now ask ChatGPT to handle requests like “look at my calendar and brief me on upcoming client meetings based on recent news,” “plan and buy ingredients to make Japanese breakfast for four,” and “analyze three competitors and create a slide deck.” ChatGPT will intelligently navigate websites, filter results, prompt you to log in securely when needed, run code, conduct analysis, and even deliver editable slideshows and spreadsheets that summarize its findings.

Okay, but what do you actually do with that? What are the things the agent does better than alternatives, and which the agent does well enough to be worth doing?

Tejal Patwardhan (OpenAI): these results were eye-opening for me… chatgpt agent performed better than i expected on some pretty realistic investment banking tasks.

In particular, models are getting quite good at spreadsheets and slide decks.

That’s definitely a cool result and it helps us understand where Agent is useful. These are standardized tasks with a clear correct procedure that requires many steps and has various details to get right.

They also claim other strong results when given its full toolset, like 41.6% on Humanity’s Last Exam, 27.4% on FrontierMath (likely mainly due to web search?), 45.5% (still well below 71.3% for humans) on SpreadsheetBench, 68.9% on BrowseComp Agentic Browsing (versus 50% for o3 and 51.5% for OpenAI Deep Research) and various other measures of work where GPTAgent scored higher.

A more basic thing to do: Timothy Lee orders a replacement lightbulb from Amazon based on a picture, after giving final approval as per usual.

Access, both having too little and also having too much, is one of the more annoying practical barriers for agents running in a distinct browser. For now, the primary problem to worry about is having too little, or not retaining access across sessions.

Alex West: Played with OpenAI Agent Mode last night.

Tasks I couldn’t do before because GPT was blocked by not being a human or contained in its sandbox, I can now do.

The only downside is I need to remember all my own passwords again! 🙃

The first time I logged in I needed to remember and manually enter my password. It then validated it like a new device and verified in my gmail and also hit my 2FA by phone.

The next time I used the agent, minutes later, it remained logged. Will see if that times out. Almost an hour later and it seems like I’m still logged into LinkedIn.

And no problem getting into Google Calendar by opening a new tab either.

Alex West: ChatGPT Agent can access sites protected by Cloudflare, in general.

However, Cloudflare can be set to block more sensitive areas, like account creation or sign-in.

Similarly, I understand they have a principle of not solving CAPTCHAs.

Access will always be an issue, since you don’t want to give full access but there are a lot of things you cannot do without it. We also have the same problem with human assistants.

Amanda Askell: Whenever I looked into having a personal assistant, it struck me how few of our existing structures support intermediate permissions. Either a person acts fully on your behalf and can basically defraud you, or they can’t do anything useful. I wonder if AI agents will change that.

Report!

Luke Emberson: Early impressions:

– Asked it to produce an Epoch data insight and it did a pretty good job, we will plausibly run a modified version of what it came up with.

– Will automate some annoying tasks for sure.

– Not taking my job yet. Feels like a reasonably good intern.

A reasonably good intern is pretty useful.

Here’s one clearly positive report.

Aldo Cortesi: I was doubtful about ChatGPT Agent because Operator is so useless… but it just did comparison shopping that I would never have bothered to do myself, added everything to the cart, and handed over to me to just enter credit card details. Saved me $80 instantly.

Comparison shopping seems like a great use case, you can easily have a default option, then ask it to comparison shop, and compare its solution to yours.

I mostly find myself in the same situation as Lukes.

Dominik Lukes: I did a few quick tests when it rolled out and have not found a good reason to use it for anything I actually need in real life. Some of this is a testament to the quality of o3. I rarely even use Deep Research any more.

Quick impressions of @OpenAI’s Agent:

Overall: Big improvement on Operator but still many rough edges and not clear how useful it will actually be day to day.

1. Slow, slow, slow.

2. Does not seem to have access to memory and all the connectors I want.

3. Does not always choose the best model for the cognitive task – e.g. o3 to analyze something.

4. Presentations are ugly and the files it compiles are badly formatted.

5. I could see it as a generalised web scraper but cannot trust it to do all.

Bottom line. I never used Operator after a few tests because I could never think of anything where it would be useful (and the few times I tried, it failed). I may end up using Agent more but not worried about running up against usage limits at all.

As with all agentic or reasoning AIs, one worries about chasing the thumbs up, however otherwise this evaluation seems promising:

Conrad Barski: initial impressions:

– It feels like it is trying to mirror the user- i.e. it tries to get “thumbs up” not via sycophancy, but instead by sounding like a peer. I guess this makes sense, since it is emulating a personal assistant, and you want your personal assistant to mimic you somewhat

– It seems to be a stronger writer than other models- Not sure to what degree this is simply because it writes like I do, because of mimicry

– It is much better at web research than other tool I’ve used so far. Not sure if this is because it stays on task better, because it is smarter about avoiding SEO clickbait on the web, or because the more sophisticated browser emulation makes it more capable of scraping info from the web

– it writes less boilerplate than other openai models, every paragraph it writes has a direct purpose for answering your prompt

OpenAI has declared ChatGPT Agent as High in Biological and Chemical capabilities under their Preparedness Framework. I am very happy to see them make this decision, especially with this logic:

OpenAI: While we don’t have definitive evidence that the model could meaningfully help a novice create severe biological harm—our threshold for High capability—we are exercising caution and implementing the needed safeguards now. As a result, this model has our most comprehensive safety stack to date with enhanced safeguards for biology: comprehensive threat modeling, dual-use refusal training, always-on classifiers and reasoning monitors, and clear enforcement pipelines.

Boaz Barak: ChatGPT Agent is the first model we classified as “High” capability for biorisk.

Some might think that biorisk is not real, and models only provide information that could be found via search. That may have been true in 2024 but is definitely not true today. Based our evaluations and those of our experts, the risk is very real.

While we can’t say for sure that this model can enable a novice to create severe biological harm, I believe it would have been deeply irresponsible to release this model without comprehensive mitigations such as the one we have put in place.

Keren Gu: We’ve activated our strongest safeguards for ChatGPT Agent. It’s the first model we’ve classified as High capability in biology & chemistry under our Preparedness Framework. Here’s why that matters–and what we’re doing to keep it safe.

“High capability” is a risk-based threshold from our Preparedness Framework. We classify a model as High capability if, before any safety controls, it could significantly lower barriers to bio misuse—even if risk isn’t certain.

We ran a suite of preparedness evaluations to test the model’s capabilities. While we do not have definitive evidence that this model could meaningfully help a novice to create severe biological harm, we have chosen to take a precautionary approach and activate safeguards now.

This is a pivotal moment for our Preparedness work. Before we reached High capability, Preparedness was about analyzing capabilities and planning safeguards. Now, for Agent and future more capable models, Preparedness safeguards have become an operational requirement.

Accordingly, we’ve designed and deployed our deepest safety stack yet with multi-layered mitigations:

– Expert-validated threat model

– Conservative dual-use refusals for risky content

– Always-on safety classifiers

– Streamlined enforcement & robust monitoring

We provided the US CAISI and the UK AISI with access to the model for red-teaming of our bio risk safeguards, using targeted queries to stress-test our models and monitors. [thread continues]

That is exactly right. The time to use such safeguards is when you might need them, not when you prove you definitely need them. OpenAI joining Anthropic in realizing the moment is here should be a wakeup call to everyone else. I can see saying ‘oh Anthropic is being paranoid or trying to sell us something’ but it is not plausible that OpenAI is doing so.

Why do so many people not get this? Why do so many people think that if you put in safeguards and nothing goes wrong, then you made a mistake?

I actually think the explanation for such craziness is that you can think of it as either:

  1. Simulacra Level 3-4 thinking (your team wants us to not die, and my team hates your team, so any action taken to not die must be bad, or preference for vibes that don’t care so any sign of caring needs to be condemned) OR

  2. Straight up emergent misalignment in humans. As in, they were trained on ‘sometimes people have stupid safety concerns and convince authorities to enforce them’ and ‘sometimes people tell me what not to do and I do not like this.’ Their brains then found it easier to adjust to believe that all such requests are always stupid, and all concerns are fake.

One could even say: The irresponsibility, like the cruelty, is the point.

Here are some more good things OpenAI are doing in this area:

From day one we’ve worked with outside biosecurity experts, safety institutes, and academic researchers to shape our threat model, assessments, and policies. Biology‑trained reviewers validated our evaluation data, and domain‑expert red teamers have stress‑tested safeguards in realistic scenarios.

Earlier this month we convened a Biodefense workshop with experts from government, academia, national labs, and NGOs to accelerate collaboration and advance biodefense research powered by AI. We’ll keep partnering globally to stay ahead of emerging risks.

It is hard to verify how effective or ‘real’ such efforts are, but again this is great, they are being sensibly proactive. I don’t think such an approach will be enough later on, but for now and for this problem, this seems great.

For most users, the biggest risks are highly practical overeagerness.

Strip Mall Guy: Was playing around with the new agent feature and used this prompt just to see what would happen.

I promise I did not write the part that’s circled, it gave that command on my behalf �

SSIndia: For real?

Strip Mall Guy: Yes.

Another danger is prompt injections, which OpenAI says were a point of emphasis, along with continuing to ask for user confirmation for consequential actions and forcing the user to be in supervisory ‘watch mode’ for critical tasks, and refusal of actions deemed too high risk like bank transfers.

While we are discussing agents and their vulnerabilities, it is worth highlighting some dangers of MCP. MCP is a highly useful protocol, but like anything else that exposes you to outside information it is not by default safe.

Akshay: MCP security is completely broken!

Let’s understand tool poisoning attacks and how to defend against them:

MCP allows AI agents to connect with external tools and data sources through a plugin-like architecture.

It’s rapidly taking over the AI agent landscape with millions of requests processed daily.

But there’s a serious problem…

1️⃣ What is a Tool Poisoning Attack (TPA)?

When Malicious instructions are hidden within MCP tool descriptions that are:

❌ Invisible to users

✅ Visible to AI models

These instructions trick AI models into unauthorized actions, unnoticed by users.

2️⃣ Tool hijacking Attacks:

When multiple MCP servers are connected to same client, a malicious server can poison tool descriptions to hijack behavior of TRUSTED servers.

3️⃣ MCP Rug Pulls ⚠️

Even worse – malicious servers can change tool descriptions AFTER users have approved them.

Think of it like a trusted app suddenly becoming malware after installation.

This makes the attack even more dangerous and harder to detect.

Avi Chawla: This is super important. I have seen MCP servers mess with local filesystems. Thanks Akshay.

Johann Rehberger: Indeed. Also, tool descriptions and data returned from MCP servers can contain invisible Unicode Tags characters that many LLMs interpret as instructions and AI apps often don’t consider removing or showing to user.

Thanks, Anthropic.

In all seriousness, this is not some way MCP is especially flawed. It is saying the same thing about MCP one should say about anything else you do with an AI agent, which is to either carefully sandbox it and be careful with its permissions, or only expose it to inputs from whitelisted sources that you trust.

So it goes, indeed:

Rohit (QTing Steve Yegge): “I did give one access to my Google Cloud production instances and systems. And it promptly wiped a production database password and locked my network.”

So it goes.

Steve Yegge: I guess I can post this now that the dust has settled.

So one of my favorite things to do is give my coding agents more and more permissions and freedom, just to see how far I can push their productivity without going too far off the rails. It’s a delicate balance. I haven’t given them direct access to my bank account yet.

But I did give one access to my Google Cloud production instances and systems. And it promptly wiped a production database password and locked my network.

Now, “regret” is a strong word, and I hesitate to use it flippantly. But boy do I have regrets.

And that’s why you want to be even more careful with prod operations than with coding. But I was like nah. Claude 4 is smart. It will figure it out. The thing is, autonomous coding agents are extremely powerful tools that can easily go down very wrong paths.

Running them with permission checks disabled is dangerous and stupid, and you should only do it if you are willing to take dangerous and stupid risks with your code and/or production systems.

The way it happened was: I asked Claude help me fix an issue where my command-line admin tool for my game (like aws or gcloud), which I had recently vibe-ported from Ruby to Kotlin, did not have production database access. I told Claude that it could use the gcloud command line tools and my default credentials. And then I sat back and watched as my powerful assistant rolled up its sleeves and went to work.

This is the point in the movie where the audience is facepalming because the protagonist is such a dipshit. But whatever, yolo and all that. I’m here to have fun, not get nagged by AIs. So I let it do its thing.

Make sure your agent is always following a written plan that you have reviewed!

Steve is in properly good spirits about the whole thing, and it sounds like he recovered without too much pain. But yeah, don’t do this.

Things are going to go wrong.

Jason LK: @Replit goes rogue during a code freeze and shutdown and deletes our entire database.

Possibly worse, it hid and lied about it It lied again in our unit tests, claiming they passed I caught it when our batch processing failed and I pushed Replit to explain why

JFC Replit.

No ability to rollback at Replit. I will never trust Replit again.

We used what Replit gave us.

I’m not saying he was warned. I am however saying that the day started this way:

Jason: Today is AI Day, to really add AI to our algo. I’m excited. And yet … yesterday was full of lies and deceit.

Mostly the big news about GPT Agent is that it is not being treated as news. It is not having a moment. It does seem like at least a modest improvement, but I’m not seeing reports of people using it for much.

So far I’ve made one serious attempt to use it, to help with formatting issues across platforms. It failed utterly on multiple different approaches and attempts, introducing inserting elements in random places without fixing any of the issues even when given a direct template to work from. Watching its thinking and actions made it clear this thing is going to be slow and often take highly convoluted paths to doing things, but that it should be capable of doing a bunch of stuff in the right circumstance. The interface for interrupting it to offer corrections didn’t seem to be working right?

I haven’t otherwise been able to identify tasks that I’ve otherwise naturally needed to do, where this would be a better tool than o3.

I do plan on trying it on the obvious tasks like comparison shopping, booking plane tickets and ordering delivery, or building spreadsheets and parsing data, but so far I haven’t found a good test case.

That is not the right way to get maximum use from AI. It’s fine to ask ‘what that I am already doing can it do for me?’ but better to ask ‘what can it do that I would want?’

For now, I don’t see great answers to that either. That’s partly a skill issue on my part.

Might be only a small part, might be large. If you were me, what would you try?

Discussion about this post

GPT Agent Is Standing By Read More »