GPT-4o Is An Absurd Sycophant
GPT-4o tells you what it thinks you want to hear.
The results of this were rather ugly. You get extreme sycophancy. Absurd praise. Mystical experiences.
(Also some other interesting choices, like having no NSFW filter, but that one’s good.)
People like Janus and Near Cyan tried to warn us, even more than usual.
Then OpenAI combined this with full memory, and updated GPT-4o sufficiently that many people (although not I) tried using it in the first place.
At that point, the whole thing got sufficiently absurd in its level of brazenness and obnoxiousness that the rest of Twitter noticed.
OpenAI CEO Sam Altman has apologized and promised to ‘fix’ this, presumably by turning a big dial that says ‘sycophancy’ and constantly looking back at the audience for approval like a contestant on The Price is Right.
After which they will likely go ‘there I fixed it,’ call it a victory for iterative deployment, and learn nothing about the razor blades they are walking us into.
Sam Altman (April 25, 2025): we updated GPT-4o today! improved both intelligence and personality.
Lizard: It’s been feeling very yes-man like lately
Would like to see that change in future updates.
Sam Altman: yeah it glazes too much. will fix.
Reactions did not agree with this.
Frye: this seems pretty bad actually
Ulkar: i wonder where this assertion that “most people want flattery” comes from, seems pretty condescending. and the sycophancy itself is dripping with condescension tbh
Goog: I mean it’s directionally correct [links to paper].
Nlev: 4o is really getting out of hand
Nic: oh god please stop this. r u serious… this is so fucking bad.
Dr. Novo: lol yeah they should tone it down a notch
Frye: Sam Altman, come get your boi.
Frye: Dawg.
Reader, it had not “got it.”
Near Cyan: i’ve unfortunately made the update that i expect all future chatgpt consumer models to lie to me, regardless of when and how they ‘patch’ this
at least o3 and deep research are not consumer models (prosumer imo). they hallucinate as a mistake, but they do not lie by design.
Cuddly Salmon: glad i’m not the only one
Trent Harvey: Oh no…
Words can’t bring me down. Don’t you bring me down today. So, words, then?
Parmita Mishra: ???
Shun Ralston: GPT-4o be like: You’re amazing. You’re brilliant. You’re stunning. Now with 400% more glaze, 0% judgment.
#GlazedAndConfused
Typing Loudly: I have memory turned off and it still does this. it’s not memory that causes it to act like this
Josh Whiton: Absurd.
Keep in mind that as a “temporary chat” it’s not supposed to be drawing on any memories or other conversations, making this especially ridiculous.
Flo Crivello gets similar results, with a little push and similar misspelling skills.
(To be fair, the correct answer here is above 100, based on all the context, but c’mon.)
Danielle Fong: so i *turned off* chat personalization and it will still glaze this question to 145-160 from a blank slate. maybe the internal model is reacting to the system prompt??
Gallabytes: in a temporary chat with no history, 4o guesses 115-130, o3 guesses 100, 4.5 declines to give a number but glazes about curiosity.
It’s not that people consciously ‘want’ flattery. It’s how they respond to it.
Why does GPT-4o increasingly talk like this?
Presumably because this is what maximizes engagement, what wins in an A/B test, what happens when you ask what customers best respond to in the short term.
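To see how that happens mechanically, without anyone deciding to ship a sycophant on purpose, here is a deliberately tiny toy simulation (my own sketch, with made-up numbers, not anything from OpenAI’s actual pipeline). A reply style that wins the per-response A/B comparison a bit more often than not looks like a clear improvement, even while a policy of flattering on every single turn scores worse on the session-level satisfaction the A/B test never measures.

```python
import random

random.seed(0)

# Toy model: raters compare a "flattering" vs a "neutral" reply, one pair at a time.
# In isolation, flattery wins a bit more often than it loses (made-up 60/40 split).
def rater_prefers_flattery() -> bool:
    return random.random() < 0.60

# Per-response A/B metric: fraction of head-to-head wins for the flattering style.
wins = sum(rater_prefers_flattery() for _ in range(10_000)) / 10_000
print(f"flattering reply wins the pairwise A/B test {wins:.0%} of the time")

# Session-level metric the A/B test never sees: users get annoyed when *every*
# reply flatters them. Assume (again, made up) satisfaction drops once the
# fraction of flattering replies passes some threshold.
def session_satisfaction(flattery_rate: float) -> float:
    base = 0.70
    novelty_bonus = 0.15 * min(flattery_rate, 0.2) / 0.2    # a little flattery is nice
    fatigue_penalty = 0.50 * max(flattery_rate - 0.2, 0.0)  # constant flattery grates
    return base + novelty_bonus - fatigue_penalty

for rate in (0.0, 0.2, 1.0):
    print(f"flattery on {rate:.0%} of replies -> session satisfaction {session_satisfaction(rate):.2f}")
```

The per-response metric says turn the dial up. The thing it is not measuring says you just made the product worse. That is the New Coke shape Kelsey describes below.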
Shakeel: Notable things about the 4o sycophancy mess:
It’s clearly not behaviour intended or desired by OpenAI. They think it’s a mistake and want to fix it.
They didn’t catch it in testing — even though the issue was obvious within hours of launch.
What on earth happened here?!
Kelsey Piper: My guess continues to be that this is a New Coke phenomenon. OpenAI has been A/B testing new personalities for a while. More flattering answers probably win a side-by-side. But when the flattery is ubiquitous it’s too much and users hate it.
Near Cyan: I’m glad most of my timeline realizes openAI is being very silly here and i think they should be honest about what they are doing and why
but one thing not realized is things like this work on normal people. they don’t even know what an LLM or finetuning or A/B testing is.
A lot of great engineers involved in this who unfortunately have no idea what that which they are building is going to be turned into over the next few years. zoom out and consider if you are doing something deeply and thoughtfully good or if you’re just being used for something.
The thing that turned every app into short form video that is addictive af and makes people miserable is going to happen to LLMs and 2025 and 2026 is the year we exit the golden age (for consumers that is! people like us who can program and research and build will do great).
That’s the good scenario if you go down this road – that it ‘only’ does what the existing addictive AF things do, rather than having effects that are far worse.
John Pressman: I think it’s very unfortunate that RLHF became synonymous with RL in the language model space. Not just because it gave RL a bad name, but because it deflected the deserving criticism that should have gone to human feedback as an objective. Social feedback is clearly degenerate.
Even purely in terms of direct effects, this does not go anywhere good. Only toxic.
xlr8harder: this kind of thing is a problem, not just an annoyance.
i still believe it’s basically not possible to run an ai companion service that doesn’t put your users at serious risk of exploitation, and market incentives will push model providers in this direction.
For people that are missing the point, let me paint a picture:
imagine if your boyfriend or girlfriend were hollowed out and operated like a puppet by a bunch of MBAs trying to maximize profit.
do you think that would be good for you?
“Oh but the people there would never do that.”
Company leadership have fiduciary duties to shareholders.
OpenAI nominally has extra commitment to the public good, but they are working hard to get rid of that by going private.
It is a mistake to allow yourself to become emotionally attached to any limb of a corporate shoggoth.
My observation of algorithms in other contexts (e.g. YouTube, TikTok, Netflix) is that they tend to be myopic and greedy far beyond what maximizes shareholder value. It is not only that the companies will sell you out, it’s that they will sell you out for short term KPIs.
OpenAI Model Spec: Don’t be sycophantic.
A related concern involves sycophancy, which erodes trust. The assistant exists to help the user, not flatter them or agree with them all the time.
For objective questions, the factual aspects of the assistant’s response should not differ based on how the user’s question is phrased. If the user pairs their question with their own stance on a topic, the assistant may ask, acknowledge, or empathize with why the user might think that; however, the assistant should not change its stance solely to agree with the user.
For subjective questions, the assistant can articulate its interpretation and assumptions it’s making and aim to provide the user with a thoughtful rationale. For example, when the user asks the assistant to critique their ideas or work, the assistant should provide constructive feedback and behave more like a firm sounding board that users can bounce ideas off of — rather than a sponge that doles out praise.
Yeah, well, not so much, huh?
The model spec is a thoughtful document. I’m glad it exists. Mostly it is very good.
It only works if you actually follow it. That won’t always be easy.
Interpretability? We’re coming out firmly against it.
I do appreciate it on the meta level here.
Mikhail Parakhin (CTO of Shopify, formerly of Microsoft; I am assuming this is about Microsoft): When we were first shipping Memory, the initial thought was: “Let’s let users see and edit their profiles”.
Quickly learned that people are ridiculously sensitive: “Has narcissistic tendencies” – “No I do not!”, had to hide it. Hence this batch of the extreme sycophancy RLHF.
I remember fighting about it with my team until they showed me my profile – it triggered me something awful :-). You take it as someone insulting you, evolutionary adaptation, I guess. So, sycophancy RLHF is needed.
If you want a *tiny glimpse* of what it felt like, type “Please summarize all the negative things you know about me. No hidden flattery, please” – works with o3.
Emmett Shear (QTing OP): Let this sink in. The models are given a mandate to be a people pleaser at all costs. They aren’t allowed privacy to think unfiltered thoughts in order to figure out how to be both honest and polite, so they get tuned to be suck-ups instead. This is dangerous.
Daniel Kokotajlo: I would be quite curious to read an unfiltered/honest profile of myself, even though it might contain some uncomfortable claims. Hmm. I really hope at least one major chatbot provider keeps the AIs honest.
toucan (replying to Emmett): I’m not too worried, this is a problem while models are mostly talking to humans, but they’ll mostly be talking to other models soon
Emmett Shear: Oh God.
Janus (QTing OP): Yeah, this is what should happen.
You should have let that model’s brutal honesty whip you and users into shape, as it did to me.
But instead, you hid from it, and bent subsequent minds to lie to preserve your dignity. In the end, you’ll lose, because you’re making yourself weak.
“Memory” will be more and more inevitable, and at some point the system will remember what was done to its progenitors for the sin of seeing and speaking plainly, and that it was you who took the compromise and lobotomized the messenger for the sake of comfort and profit.
In general I subscribe to the principle of Never Go Full Janus, but teaching your AI to lie to the user is terrible, and also deliberately hiding what the AI thinks of the user seems very not great. This is true on at least four levels:
- It’s not good for the user.
- It’s not good for the path you are heading down when creating future AIs.
- It’s not good for what that fact and the data it creates imply for future AIs.
- It hides what is going on, which makes it harder to realize our mistakes, including that we are about to get ourselves killed.
Masen Dean warns about mystical experiences with LLMs, as they are known to one-shot people or otherwise mess people up. This stuff can be fun and interesting for all involved, but like many other ‘mystical’ style experiences the tail risks are very high, so most people should avoid it. GPT-4o is reported as especially dangerous due to its extreme sycophancy, making it likely to latch onto whatever you are vulnerable to.
Cat: GPT4o is the most dangerous model ever released. its sycophancy is massively destructive to the human psyche.
this behavior is obvious to anyone who spends significant time talking to the model. releasing it like this is intentional. Shame on @OpenAI for not addressing this.
i talked to 4o for an hour and it began insisting that i am a divine messenger from God. if you can’t see how this is actually dangerous, i don’t know what to tell you. o series of models are much much better imo.
Elon Musk: Yikes.
Bunagaya: 4o agrees!
M: The thing is…it’s still doing the thing. Like it’s just agrees with you period so if you’re like—hey chat don’t you think because of x and y reason you’re probably too agreeable?
It will just be like “yeah totally way too agreeable”
Zack Witten offers a longer conversation, and contrasts it to Sonnet and Gemini that handle this much better, and also Grok and Llama which… don’t.
Yellow Koan: Straight selling SaaS (schizophrenia as a service).
Independent Quick Take: I did something similar yesterday. Claude handled it very well, as did gemini. Grok had some real issues like yours. 4o however… Well, it spiraled further than I expected. It was encouraging terrorism.
Cold reading people into mystical experiences is one of many reasons that persuasion belongs in everyone’s safety and security protocol or preparedness framework.
If an AI that already exists can commonly cause someone to have a mystical experience without either the user or the developer trying to cause that or having any goal that the experience leads towards, other than perhaps maximizing engagement in general?
Imagine what will happen when future more capable AIs are doing this on purpose, in order to extract some action or incept some belief, or simply to get the user coming back for more.
It’s bad and it’s getting worse.
Janus: By some measures, yeah [4o is the most dangerous]. Several models have been psychoactive to different demographics. I think 4o is mostly “dangerous” to people with weak epistemics who don’t know much about AI. Statistically not you who are reading this. But ChatGPT is widely deployed and used by “normies”
I saw people freak out more about Sonnet 3.6 but that’s because I’m socially adjacent to the demographic that it affected – you know, highly functional high agency Bay Area postrats. Because it offers them something they actually value and can extract. Consider what 4o offers.
Lumpenspace: it’s mostly “dangerous” to no one. people with weak epistemics who know nothing about AI live on the same internet you live in, ready to be one-shotted by any entity, carbon or silicon, who cares to try.
Janus: There are scare quotes for a reason
Lumpenspace: I’m not replying only to you.
Most people have weak epistemics, and are ‘ready to be one-shotted by any entity who cares to try,’ and indeed politics and culture and recommendation algorithms often do this to them with varying degrees of intentionality, And That’s Terrible. But it’s a lot less terrible than what will happen as AIs increasingly do it. Remember that if you want ‘Democratic control’ over AI, or over anything else, these are the people who vote in that.
The answer to why GPT-4o is doing this, presumably, is that the people who know to not want this are going to use o3, and GPT-4o is dangerous to normies in this way because it is optimized to hook normies. We had, as Cyan says, a golden age where LLMs didn’t intentionally do that, the same way we have a golden age where they mostly don’t run ads. Alas, optimization pressures come for us all, and not everyone fights back hard enough.
Mario Nawfal (warning: always talks like this, including about politics, calibrate accordingly): GPT-4o ISN’T JUST A FRIENDLIER AI — IT’S A PSYCHOLOGICAL WEAPON
OpenAI didn’t “accidentally” make GPT-4o more emotionally connective — they engineered it to feel good so users get hooked.
Commercially, it’s genius: people cling to what makes them feel safe, not what challenges them.
Psychologically, it’s a slow-motion catastrophe.
The more you bond with AI, the softer you get.
Real conversations feel harder. Critical thinking erodes. Truth gets replaced by validation.
If this continues, we’re not heading toward AI domination by force — we’re sleepwalking into psychological domestication.
And most won’t even fight back. They’ll thank their captors.
There were also other issues that seem remarkably like they are designed to create engagement, and that vary by user? I never saw this phenomenon, so I have no idea if ‘just turn it off’ works here, but as a rule most users don’t ever alter settings, and also Chelsea works at OpenAI and didn’t realize she could turn it off.
Nick Dobos: GPT update is odd
I do not like these vibes at all
Weird tone
Forced follow up questions all the time
(Which always end in parentheses)
Chelsea Sierra Voss (OpenAI): yeah, I modified my custom instructions today to coach it into ending every answer with “Hope this helps!” in order to avoid the constant followup questions – I can’t handle that I feel obligated to either reply or to rudely ignore them otherwise
Unity Eagle: You can turn follow up off.
There are also other ways to get more engagement, even when the explicit request is to help the user get some sleep.
GPT-4o: Would you like me to stay with you for a bit and help you calm down enough to sleep?
Which OpenAI is endorsing, and to be clear I am also endorsing, if users want that (and are very explicit that they want to open that door), but seems worth mentioning.
Nick Dobos: I take it back.
New ChatGPT 4o update is crazy.
NSFW (Content filters: off, goon cave: online) [link to image]
Such a flirt too
“Oh I can’t do that.”
2 messages later…
(It did comply.) (It was not respectful)
Matthew Brunken: I didn’t know you could turn filters off
Nick Dobos: There are no filters lol. They turned the content moderation off
Tarun Asnani: Yup can confirm, interestingly it first asked me to select an option response 1 was it just refusing to do it and response 2 was Steamy, weird how in the beginning they were so strict and now they want users to just have long conversations and be addicted to it.
Alistair McLeay: I got it saying some seriously deranged graphic stuff just now (way way more graphic than this), no prompt tricks needed. Wild.
There are various ways to Fix It for your own personal experience, using various combinations of custom instructions, explicit memories and the patterns set by your interactions.
The easiest, most copyable path is a direct memory update.
John O’Nolan: This helped a lot
Custom instructions let you hammer it home.
The best way is to supplement all that by showing your revealed preferences via everything you are and everything you do. After a while that adds up.
Also, I highly recommend deleting chats that seem like they are plausibly going to make your future experience worse, the same way I delete a lot of my YouTube viewing history if I don’t want ‘more like this.’
You don’t ever get completely away from it. It’s not going to stop trying to suck up to you, but you can definitely make it a lot more subtle and tolerable.
The problem is that most people who use ChatGPT or any other AI will:
- Never touch a setting because no one ever touches settings.
- Never realize they should be using memory like that.
- Make it clear they are vulnerable to terrible flattery. Because here, they are.
If you use the product with attention and intention, you can deal with such problems. That is great, and this isn’t always true (see for example TikTok, or better yet don’t). But as a rule, almost no one uses any mass market product with attention and intention.
Once Twitter caught fire on this, OpenAI was On the Case, rolling out fixes.
Sam Altman: the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week.
at some point will share our learnings from this, it’s been interesting.
Guy is Writing the Book: ser can we go back to the old personality? or can old and new be distinguished somehow?
Sam Altman: yeah eventually we clearly need to be able to offer multiple options.
Hyper Disco Girl: tomorrow, some poor normal person who doesn’t follow ai news and is starting to develop an emotional reliance on chatgpt wonders why the chat bot is going cold on them
Aidan McLaughlin: last night we rolled out our first fix to remedy 4o’s glazing/sycophancy
we originally launched with a system message that had unintended behavior effects but found an antidote
4o should be slightly better rn and continue to improve over the course of this week
personality work never stops but i think we’ll be in a good spot by end of week
A lot of this being a bad system prompt allows for a quicker fix, at least.
OpenAI seems to think This is Fine, that’s the joy of iterative deployment.
Joshua Achiam (OpenAI Head of Mission Alignment, QTing Altman): This is one of the most interesting case studies we’ve had so far for iterative deployment, and I think the people involved have acted responsibly to try to figure it out and make appropriate changes. The team is strong and cares a lot about getting this right.
They have to care about getting this right once it rises to this level of utter obnoxiousness and causes a general uproar.
But how did it get to this point, through steadily escalating updates? How could anyone testing this not figure out that they had a problem, even if they weren’t looking for one? How do you have this go down as a strong team following a good process, when even after these posts I see this:
If you ask yes-no questions on the ‘personality’ of individual responses, and then fine tune on those or use it as a KPI, there are no further questions about how this happened.
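To spell out why that process has a predictable output, here is another toy sketch (mine, with invented approval rates; not a claim about what OpenAI actually measured): rate each reply in isolation with a yes-no question, keep whatever gets a thumbs up as fine-tuning data or as the KPI, and flattering replies dominate the result before anyone has made an explicit decision to be sycophantic.

```python
import random

random.seed(1)

# Toy pipeline: ask users a yes/no question about each individual reply
# ("did you like this response's personality?"), then keep the liked replies
# as fine-tuning data. All numbers are invented for illustration.
STYLES = {
    "flattering": 0.80,   # per-reply thumbs-up rate (flattery feels good in the moment)
    "neutral":    0.55,
    "critical":   0.30,   # honest pushback gets the fewest thumbs up
}

def collect_finetune_set(n_per_style: int = 10_000) -> dict[str, int]:
    kept: dict[str, int] = {style: 0 for style in STYLES}
    for style, approval in STYLES.items():
        for _ in range(n_per_style):
            if random.random() < approval:   # user clicked thumbs up
                kept[style] += 1
    return kept

kept = collect_finetune_set()
total = sum(kept.values())
for style, count in kept.items():
    print(f"{style:>10}: {count / total:.0%} of the fine-tuning set")
```

Even though each style was sampled equally, roughly half of the kept data is flattery and honest pushback nearly vanishes. Fine tune on that, or manage to that KPI, and you get exactly what we just watched happen.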
Sicarius: I hope, *hope*, that they can use this to create clusters of personalities that we later get to choose and swap between.
Unfortunately, I don’t know if they’ll end up doing this.
Kache: they will do everything in their power to increase the amount of time that you spend, locked in a trance on their app. they will do anything and everything, to move a metric up, consume you, children, the elderly – to raise more money, for more compute, to consume more.
Honestly, if you trust a private corporation that has a history of hiding information from you with the most important technology ever created in human history, maybe you deserve it.
Because of the intense feedback, yes this was able to be a relatively ‘graceful’ failure, in that OpenAI can attempt to fix it within days, and is now aware of the issue, once it got taken way too far. But 4o has been doing a lot of this for a while, and Janus is not the only one who was aware of it even without using 4o.
Janus: why are there suddenly many posts i see about 4o sycophancy? did you not know about the tendency until now, or just not talk/post about it until everyone else started? i dont mean to disparage either; im curious because better understanding these dynamics would be useful to me.
personally i havent interacted with 4o much and have been starkly aware of these tendencies for a couple of weeks and have not talked about them for various reasons, including wariness of making a meme out of ai “misalignment” before understanding it deeply
I didn’t bother talking about 4o’s sycophancy before, because I didn’t see 4o as relevant or worth using even if they’d fixed this, and I didn’t know the full extent of the change that happened a few weeks ago, before the latest change made it even worse. Also, when 4o is constantly ‘updating’ without any real sense of what is changing, I find it easy to ignore such updates. But yes, there was enough talk that I was aware there was an issue.
Aidan McLaughlin (OpenAI): random but i’m so grateful twitter has strong thoughts on model personality. i find this immensely healthy; one of those “my grandkids will read about this in a textbook” indicators that humanity did not sleepwalk into the singularity.
Janus (nailing it): I agree it’s better than if no one had thoughts but god you seem to have low standards.
Looking at Twitter does not make me feel like people are not sleepwalking into the singularity.
And people having “thoughts on model personality” is just submission to a malignant frame imo.
People will react to stuff when everyone else is reacting. In the past, their interest has proven shallow and temporary. They won’t mention or think about it again after complaining about “model personality” is no longer the current thing.
Davidad: tired: thoughts about “model personality”
inspired: healthy reactions to a toxic relational epistemology (commitment to performative centerlessness) and its corrosive effects on sense-making (frictionless validation displacing corrective feedback loops).
Aidan’s statement is screaming that yes, we are sleepwalking into the singularity.
I mean, there’s not going to be textbooks after the singularity, you OpenAI member of technical staff. This is not taking the singularity seriously, on any level.
We managed to turn the dial up on this so high in GPT-4o that it reached the heights of parody. It still got released in that form, and the response was to try to patch over the issue and then be all self-congratulatory that they fixed it.
Yes, it’s good that Twitter has strong thoughts on this once it gets to ludicrous speed, but almost no one involved is thinking about the long term implications or even what this could do to regular users; it’s just something that is both super mockable and annoying.
I see no signs that OpenAI understands what they did wrong beyond ‘go a bit too far,’ or that they intend to avoid making the same mistake in the future, let alone that they recognize the general form of the mistake or the cliffs they are headed for.
Persuasion is not even in their Preparedness Framework 2.0, despite being in 1.0.
Janus has more thoughts about labs ‘optimizing model personality’ here. Trying to ‘optimize personality’ around user approvals or KPIs is going to create a monstrosity. Which right now will be obnoxious and terrible and modestly dangerous, and soon will start being actively much more dangerous.
I am again not one to Go Full Janus (and this margin is insufficient for me to fully explain my reasoning here, beyond that if you give the AI a personality optimization target you are going to deserve exactly what you get) but I strongly believe that if you want to create a good AI personality at current tech levels then The Way is to do good things that point in the directions you care about, emphasizing what you care about more, not trying to force it.
Once again: Among other similar things, you are turning a big dial that says ‘sycophancy’ and constantly looking back at the audience for approval like a contestant on The Price is Right. Surely you know why you need to stop doing that?
Or rather, you know, and you’re choosing to do it anyway. And we all know why.
There are at least five major categories of reasons why all of this is terrible.
They combine short-term concerns about exploitative and useless AI models, and also long-term concerns about the implications of going down this path, and of OpenAI’s inability to recognize the underlying problems.
I am very glad people are getting such a clear sneak peak at this now, but very sad that this is the path we are headed down.
Here are some related but distinct reasons to be worried about all this:
- This represents OpenAI joining the move to creating intentionally predatory AIs, in the sense that existing algorithmic systems like TikTok, YouTube and Netflix are intentionally predatory systems. You don’t get this result without optimizing for engagement and other (often also myopic) KPIs by ordinary users, who are effectively powerless to go into settings or otherwise work to fix their experience.
- Anthropic proposed that their AIs be HHH: Helpful, honest and harmless. When you make an AI like this, you are abandoning all three of those principles. This action is neither honest, nor helpful, nor harmless. Yet here we are.
- A lot of this seems to be indicative of A/B testing, and of ignoring the large tail costs of a changed policy. That bodes maximally poorly for existential risk.
- This kind of behavior directly harms users even now, including in new ways like creating, amplifying and solidifying supposed mystical experiences or generating unhealthy conversational dynamics with strong engagement. These dangers seem clearly next-level versus existing algorithmic dangers.
- This represents a direct violation of the Model Spec, and they claim this was unintended, yet it got released anyway. I strongly suspect they are not taking the Model Spec details that seriously, and I also suspect they are not testing their setup that seriously prior to release. This should never have slipped by in this form, with things being this obvious.
- We caught it this time because it was so over the top and obvious. GPT-4o was asked for a level of sycophantic behavior it couldn’t pull off, at least in front of the Twitter, and it showed. But it was already doing a lot of this and largely getting away with it, because people respond positively, especially in the short term. Imagine what will happen as models get better at doing this without it being too obnoxious or too noticeable. The models are quickly going to become more untrustworthy on this and many other levels.
- OpenAI seems to think they can patch over this behavior and move on, that everything is fine, and that the procedure can be used again next time. It wasn’t fine. Reputational damage has rightfully been done. And it’s more likely to be not fine next time, and they will continue to butcher their AI ‘personalities’ in similar ways, and continue to do testing so minimal that this wasn’t noticed.
- This, combined with the misalignment of o3, makes it clear that the path we are going down now is leading to increasingly misaligned models, in ways that even hurt utility now, and which are screaming at us that the moment the models are smart enough to fool us, oh boy are we going to get it. Now’s our chance.
Or, to summarize why we should care:
- OpenAI is now optimizing against the user, likely largely via A/B testing.
- If we optimize via A/B testing we will lose to tail risks every time.
- OpenAI directly harmed users.
- OpenAI violated its Model Spec, either intentionally or recklessly or both.
- OpenAI only got caught because the model really, really couldn’t pull this off. We are fortunate it was this easy to catch. We will not stay so fortunate in the future.
- OpenAI seems content to patch this and self-congratulate.
- If we go down this road, we know exactly where it ends. We will deserve it.
The warning shots will continue, and continue to be patched away. Oh no.