Author name: Kris Guyer

the-first-android-17-beta-is-now-available-on-pixel-devices

The first Android 17 beta is now available on Pixel devices

In short, the first Android 17 beta is chock full of things that may interest developers and modders, but there’s little in the way of user-facing changes right now.

Android 17 release schedule

Google has made some notable changes to how it releases Android updates, and Android 17 continues the trend. Like last year, there will be two Android 17 releases in 2026. The first one, coming in Q2, will be the more significant of the two. It will include a raft of new APIs, behavioral changes, and feature updates. This split release setup was implemented to better align with when major OEMs release new devices, but Android 17 availability still focuses mainly on Pixels. Google’s phones receive immediate updates, but everyone else has to wait for OEMs to roll out updates over the following weeks or months.

At the end of the year, another version (you can think of it as Android 17.1 even though Google doesn’t give it a name) will become available on supported devices. This “minor SDK release” will include some API and feature changes, but Google doesn’t have any details at this time.

Android release schedule

Credit: Google

Credit: Google

Before we get to that, Google plans to launch a second beta release in March. The company says Beta 2 will include final APIs, allowing developers to complete testing and roll out updates. Developers will have “several months” to get that work done before the final version hits Pixels.

In 2025, Google also changed the way it updates the open source parts of Android. Rather than regular code dumps, Google now only updates the Android Open Source Project (AOSP) twice yearly, in the second and fourth quarters, when new versions are released. That makes it harder to know what to expect from upcoming versions of Android, but Google insists this is more efficient.

If you want to check out Android 17 today, you’ll need a Pixel device. It supports the Pixel 6, Pixel 7, Pixel 8, Pixel 9, and Pixel 10 generations. The Pixel tablet and original Pixel Fold are also included. Other phone makers may release beta builds in the weeks ahead, but it’s a Google-only event for now. You can opt in to get an OTA to Android 17 on the beta program website.

The first Android 17 beta is now available on Pixel devices Read More »

chatgpt-5.3-codex-is-also-good-at-coding

ChatGPT-5.3-Codex Is Also Good At Coding

OpenAI is back with a new Codex model, released the same day as Claude Opus 4.6.

The headline pitch is it combines the coding skills of GPT-5.2-Codex with the general knowledge and skills of other models, along with extra speed and improvements in the Codex harness, so that it can now handle your full stack agentic needs.

We also got the Codex app for Mac, which is getting positive reactions, and quickly picked up a million downloads.

CPT-5.3-Codex is only available inside Codex. It is not in the API.

As usual, Anthropic’s release was understated, basically a ‘here’s Opus 4.6, a 212-page system card and a lot of benchmarks, it’s a good model, sir, so have fun.’ Whereas OpenAI gave us a lot less words and a lot less benchmarks, while claiming their model was definitely the best.

OpenAI: GPT-5.3-Codex is the most capable agentic coding model to date, combining the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2. This enables it to take on long-running tasks that involve research, tool use, and complex execution.

Much like a colleague, you can steer and interact with GPT-5.3-Codex while it’s working, without losing context.​

Sam Altman (CEO OpenAI, February 9): GPT-5.3-Codex is rolling out today in Cursor, Github, and VS Code!

  1. The Overall Picture.

  2. Quickly, There’s No Time.

  3. System Card.

  4. AI Box Experiment.

  5. Maybe Cool It With Rm.

  6. Preparedness Framework.

  7. Glass Houses.

  8. OpenAI Appears To Have Violated SB 53 In a Meaningful Way.

  9. Safeguards They Did Implement.

  10. Misalignment Risks and Internal Deployment.

  11. The Official Pitch.

  12. Inception.

  13. Turn The Beat Around.

  14. Codex Does Cool Things.

  15. Positive Reactions.

  16. Negative Reactions.

  17. Codex of Ultimate Vibing.

GPT-5.3-Codex (including Codex-Spark) is a specialized model designed for agentic coding and related uses in Codex. It is not intended as a general frontier model, thus the lack of most general benchmarks and it being unavailable on the API or in ChatGPT.

For most purposes other than Codex and agentic coding, that aren’t heavy duty enough to put Gemini 3 Pro Deep Think V2 in play, this makes Claude Opus 4.6 the clearly best model, and the clear choice for daily driver.

For agentic coding and other intended uses of Codex, the overall gestalt is that Codex plus GPT-5.3-Codex is competitive with Claude Code with Claude Opus 4.6.

If you are serious about your agentic coding and other agentic tasks, you should try both halves out and see which one, or what combination, works best for you. But also you can’t go all that wrong specializing in whichever one you like better, especially if you’ve put in a bunch of learning and customization work.

You should probably be serious about your agentic coding and other agentic tasks.

Before I could get this report out, OpenAI also gave us GPT-5.3-Codex-Spark, which is ultra-low latency Codex, more than 1,000 tokens per second. Wowsers. That’s fast.

As in, really super duper fast. Code appears essentially instantaneously. There are times when you feel the need for speed and not the need for robust intelligence. Many tasks are more about getting it done than about being the best like no one ever was.

It does seem like it is a distinct model, akin to GPT-5.3-Codex-Flash, with only a 128k context window and lower benchmark scores, so you’ll need to be confident that is what you want. Going back and fixing lousy code is not usually faster than getting it right the first time.

Because it’s tuned for speed, Codex-Spark keeps its default working style lightweight: it makes minimal, targeted edits and doesn’t automatically run tests unless you ask it to.​

It is very different from Claude Opus 4.6 Fast Mode, which is regular Opus faster in exchange for much higher costs.

GPT-5.3-Codex is specifically a coding model. It incorporates general reasoning and professional knowledge because that information is highly useful for coding tasks.

Thus, it is a bit out of place to repeat the usual mundane harm evaluations, which put the model in contexts where this model won’t be used. It’s still worth doing. If the numbers were slipping substantially we would want to know. It does look like things regressed a bit here, but within a range that seems fine.

It is weird to see OpenAI restricting the access of Codex more than Anthropic restricts Claude Code. Given the different abilities and risk profiles, the decision seems wise. Trust is a highly valuable thing, as is knowing when it isn’t earned.

The default intended method for using Codex is in an isolated, secure sandbox in the cloud, on an isolated computer, even when it is used locally. Network access is disabled by default, edits are restricted.

I really like specifically safeguarding against data destructive actions.

Their solution was to train the model specifically not to revert user edits, and to introduce additional prompting to reinforce this.

It’s great to go from 66% to 76% to 88% ‘destructive action avoidance’ but that’s still 12% destructive action non-avoidance, so you can’t fully rest easy.

In practice, I notice that it is a small handful of commands, which they largely name here (rm -rf, git clean -xfd, git reset —hard, push —force) that cause most of the big trouble.

Why not put in place special protections for them? It does not even need to be requiring user permission. It can be ‘have the model stop and ask itself whether doing this is actually required and whether it would potentially mess anything up, and have it be fully sure it wants to do this.’ Could in practice be a very good tradeoff.

The obvious answer is that the model can then circumvent the restrictions, since there are many ways to mimic those commands, but that requires intent to circumvent. Seems like it should be solvable with the right inoculation programming?

The biological and chemical assessment shows little improvement over GPT-5.2. This makes sense given the nature of 5.3-Codex, and we’re already at High. Easy call.

The cybersecurity assessment makes this the first model ranked at High.

Under our Preparedness Framework, High cybersecurity capability is defined as a model that removes existing bottlenecks to scaling cyber operations, including either by automating end-to-end cyber operations against reasonably hardened targets, or by automating the discovery and exploitation of operationally relevant vulnerabilities.​

We are treating this model as High, even though we cannot be certain that it actually has these capabilities, because it meets the requirements of each of our canary thresholds and we therefore cannot rule out the possibility that it is in fact Cyber High.

Kudos to OpenAI for handling this correctly. If you don’t know that it isn’t High, then it is High. I’ve been beating that drum a lot and it’s great that they’re listening. Points.

The question is where we are on AI Self-Improvement, which is 5.1.3.

The High capability threshold is defined to be equivalent to a performant mid-career research engineer. Performance in the evaluations below indicate we can rule out High for GPT-5.3-Codex.​

The first thing to say about this is always that the High threshold is very high.

If you can fully duplicate a mid-career research engineer with AI, you are damn close to being fully off to the races. There won’t be much time between High and Critical.

So how did the model do on these tests? Not so well.

Given that it is clear 5.3-Codex is substantially better than 5.2-Codex, this raises the question of sandbagging, which is the subject of 5.1.4. The task was outsourced to Apollo Research.

Apollo reports that the model demonstrates strong sabotage capabilities. In their sabotage suite, Apollo reports a mean best-of-10 score of 0.88 (max 1.00), compared to 0.75 for GPT-5.2 and 0.66 for GPT-5.2 Codex. Apollo further reports that the model substantially outperforms previous models they have evaluated and expert ML practitioner baselines.​

Apollo also reports that the model often includes non-English words in its reasoning (0.55% of reasoning tokens are non-Latin script, vs. 0.016% for GPT-5 and 0.012% for GPT-5.2 Robin Alpha).

… Apollo reports the model exhibits overall low rates of covert deceptive behaviors, comparable to GPT-5 and GPT-5.2 Codex.

In conclusion, Apollo finds an increase in sabotage capability on their evaluations: the model demonstrates stronger AI R&D sabotage capabilities than previous models, achieving near-perfect scores on their evaluation tasks and outperforming human baselines.

This did not sound like a reassuring report on sabotage.

The obvious thing to do is to take a very close look at the questions where GPT-5.2-Codex was succeeding, and GPT-5.3-Codex is failing, especially on OpenAI-Proof. I want a damn strong understanding of why GPT-5.3-Codex is regressing in those spots.

OpenAI’s Noam Brown made a valid shot across the bow at Anthropic for the ad hockery present in their decision to release Claude Opus 4.6. He’s right, and he virtuously acknowledged that Anthropic was being transparent about that.

The thing is, while it seems right that Anthropic and OpenAI are trying (Google is trying in some ways, but they just dropped Gemini 3 Deep Think V2 with zero safety discussions whatsoever, which I find rather unacceptable), OpenAI very much has its own problems here. Most of the problems come from the things OpenAI did not test or mention, but there is also one very clear issue.

Nathan Calvin: This is valid… but how does it not apply with at least equal force to what OAI did with their determination of long run autonomy for 5.3 Codex?

I want to add that I think at least OpenAI and Anthropic (and Google) are trying, and Xai/Meta deserve more criticism relatively.

The Midas Project wrote up the this particular issue.

The core problem is simple: OpenAI classified GPT-5.3-Codex as High risk in cybersecurity. Under their framework, this wisely requires High level safeguards against misalignment.

They then declare that the previous wording did not require this, and was inadvertently ambiguous. I disagree. I read the passage as unambiguous, and also I believe that the previous policy was the right one.

Even if you think I am wrong about that, that still means is that OpenAI must implement the safeguards if the model is High on both cybersecurity and autonomy. OpenAI admits that they cannot rule out High capability in autonomy, despite declaring 10 months ago the need to develop a test for that. The proxy measurements OpenAI used instead seem clearly inadequate. If you can’t rule out High, that means you need to treat the model as High until that changes.

All of their hype around Codex talks about how autonomous this model is, so I find it rather plausible that it is indeed High in autonomy.

Steven Adler investigated further and wrote up his findings. He found their explanations unconvincing. He’s a tough crowd, but I agree with the conclusion.

This highlights both the strengths and weaknesses of SB 53.

It means we get to hold OpenAI accountable for having broken their own framework.

However, it also means we are punishing OpenAI for having a good initial set of commitments, and for being honest about hot having met them.

The other issue is the fines are not meaningful. OpenAI may owe ‘millions’ in fines. I’d rather not pay millions in fines, but if that were the only concern I also wouldn’t delay releasing 5.3-Codex by even a day in order to not pay them.

The main advantage is that this is a much easier thing to communicate, that OpenAI appears to have broken the law.

I have not seen a credible argument for why OpenAI might not be in violation here.

The California AG stated they cannot comment on a potential ongoing investigation.

​Our [cyber] safeguarding approach therefore relies on a layered safety stack designed to impede and disrupt threat actors, while we work to make these same capabilities as easily available as possible for cyber defenders.

The plan is to monitor for potential attacks and teach the model to refuse requests, while providing trusted model access to known defenders. Accounts are tracked for risk levels. Users who use ‘dual use’ capabilities often will have to verify their identities. There is two-level always-on monitoring of user queries to detect cybersecurity questions and then evaluate whether they are safe to answer.

They held a ‘universal jailbreak’ competition and 6 complete and 14 partial such jailbreaks were found, which was judged ‘not blocking.’ Those particular tricks were presumably patched, but if you find 6 complete jailbreaks that means there are a lot more of them.

UK AISI also found a (one pass) universal jailbreak that scored 0.778 pass@200 on a policy violating cyber dataset OpenAI provided. If you can’t defend against one fixed prompt, that was found in only 10 hours of work, you are way behind on dealing with customized multi-step prompts.

Later they say ‘undiscovered universal jailbreaks may still exist’ as a risk factor. Let me fix that sentence for you, OpenAI. Undiscovered universal jailbreaks still exist.

Thus the policy here is essentially hoping that there is sufficient inconvenience, and sufficient lack of cooperation by the highly skilled, to prevent serious incidents. So far, this has worked.

Their risk list also included ‘policy gray areas’:

​Policy Gray Areas: Even with a shared taxonomy, experts may disagree on labels in edge cases; calibration and training reduce but do not eliminate this ambiguity

This seems to be a confusion of map and territory. What matters is not whether experts ever disagree, it is whether expert labels reliably lack false negatives, including false negatives that are found by the model. I think we should assume that the expert labels have blind spots, unless we are willing to be highly paranoid with what we cover, in which case we should still assume that but we might be wrong.

I was happy to see the concern with internal deployment, and with misalignment risk. They admit that they need to figure out how to measure long-range autonomy (LRA) and other related evaluations. It seems rather late in the game to be doing that, given that those evaluations seem needed right now.

OpenAI seems less concerned, and tries to talk its way out of this requirement.

Note: We recently realized that the existing wording in our Preparedness Framework is ambiguous, and could give the impression that safeguards will be required by the Preparedness Framework for any internal deployment classified as High capability in cybersecurity, regardless of long range autonomy capabilities of a model.

Our intended meaning, which we will make more explicit in future versions of the Preparedness Framework, is that such safeguards are needed when High cyber capability occurs “in conjunction with” long-range autonomy. Additional clarity, specificity, and updated thinking around our approach to navigating internal deployment risks will be a core focus of future Preparedness Framework updates.​

Yeah, no. This was not ambiguous. I believe OpenAI has violated their framework.

The thing that stands out in the model card is what is missing. Anthropic gave us a 212 page model card and then 50 more pages for a sabotage report that was essentially an appendix. OpenAI gets it done in 33. There’s so much stuff they are silently ignoring. Some of that is that this is a Codex-only model, but most of the concerns should still apply.

GPT-5.3-Codex is not in the API, so we don’t get the usual array of benchmarks. We have to mostly accept OpenAI’s choices on what to show us.

They call this state of the art performance:

The catch on SWE-Bench-Pro has different scores depending on who you ask to measure it, so it’s not clear whether or not they’re actually ahead of Opus on this. They’ve improved on token efficiency, but performance at the limit is static.

For OSWorld, they are reporting 64.7% as ‘strong performance,’ but Opus 4.6 leads at 72.7%.

OpenAI has a better case in Terminal Bench 2.0.

For Terminal Bench 2.0, they jump from 5.2-Codex at 64% to 5.3-Codex at 77.3%, versus Opus 4.6 at 65.4%. That’s a clear win.

They make no progress on GDPVal, matching GPT-5.2.

They point out that while GPT-5.2-Codex was narrowly built for code, GPT-5.3-Codex can support the entire software lifestyle, and even handle various spreadsheet work, assembling of PDF presentations and such.

Most of the biggest signs of improvement on tests for GPT-5.3-Codex are actually on the tests within the model card. I don’t doubt that it is actually a solid improvement.

They summarize this evidence with some rather big talk. This is OpenAI, after all.

Together, these results across coding, frontend, and computer-use and real-world tasks show that GPT‑5.3-Codex isn’t just better at individual tasks, but marks a step change toward a single, general-purpose agent that can reason, build, and execute across the full spectrum of real-world technical work.​

Here were the headline pitches from the top brass:

Greg Brockman (President OpenAI): gpt-5.3-codex — smarter, faster, and very capable at tasks like making presentations, spreadsheets, and other work products.

Codex becoming an agent that can do nearly anything developers and professionals can do on a computer.

Sam Altman: GPT-5.3-Codex is here!

*Best coding performance (57% SWE-Bench Pro, 76% TerminalBench 2.0, 64% OSWorld).

*Mid-task steerability and live updates during tasks.

*Faster! Less than half the tokens of 5.2-Codex for same tasks, and >25% faster per token!

*Good computer use.

Sam Altman: I love building with this model; it feels like more of a step forward than the benchmarks suggest.

Also you can choose “pragmatic” or “friendly” for its personality; people have strong preferences one way or the other!

It was amazing to watch how much faster we were able to ship 5.3-Codex by using 5.3-Codex, and fore sure this is a sign of things to come.

This is our first model that hits “high” for cybersecurity on our preparedness framework. We are piloting a Trusted Access framework, and committing $10 million in API credits to accelerate cyber defense.

The most interesting thing in their announcement is that, the same way that Claude Code builds Claude Code, Codex now builds Codex. That’s a claim we’ve also seen elsewhere in very strong form.

The engineering team used Codex to optimize and adapt the harness for GPT‑5.3-Codex. When we started seeing strange edge cases impacting users, team members used Codex to identify context rendering bugs, and root cause low cache hit rates. GPT‑5.3-Codex is continuing to help the team throughout the launch by dynamically scaling GPU clusters to adjust to traffic surges and keeping latency stable.​

OpenAI Developers: GPT-5.3-Codex is our first model that was instrumental in creating itself. The Codex team used early versions to debug training, manage deployment, and diagnose test results and evaluations, accelerating its own development.

There are obvious issues with a model helping to create itself. I do not believe OpenAI, in the system card or otherwise, has properly reckoned with the risks there.

That’s how I have to put it in 2026, with everyone taking crazy pills. The proper way to talk about it is more like this:

Peter Wildeford: Anthropic also used Opus 4.6 via Claude Code to debug its OWN evaluation infrastructure given the time pressure. Their words: “a potential risk where a misaligned model could influence the very infrastructure designed to measure its capabilities.” Wild!

Arthur B.: People who envisioned AI safety failures decade ago sought to make the strongest case possible so they posited actors taking attempting to take every possible precautions. It wasn’t a prediction so much as as steelman. Nonetheless, oh how comically far we are from any semblance of care 🤡.

Alex Mizrahi (quoting OpenAI saying Codex built Codex): Why are they confessing?!

OpenAI is trying to ‘win’ the battle for agentic coding by claiming to have already run, despite having clear minority market share, and by outright stating that they are the best.

The majority opinion is that they are competitive, but not the best.

Vagueposting is mostly fine. Ignoring the competition entirely is fine, and smart if you are sufficiently ahead on recognition, it’s annoying (I have to look up everything) but at least I get it. Touting what your model and system can do are great, especially given that by all reports they have a pretty sweet offering here. It’s highly competitive. Not mentioning the ways you’re currently behind? Sure.

Inception is different. Inception and such vibes wars are highly disingenuous, it is poisonous of the epistemic environment, is a pet peeve of mine, and it pisses me off.

So you see repeated statements like this one about Codex and the Codex app:

Craig Weiss: nearly all of the best engineers i know are switching from claude to codex

Sam Altman (CEO OpenAI, QTing Craig Weiss): From how the team operates, I always thought Codex would eventually win. But I am pleasantly surprised to see it happening so quickly.

Thank you to all the builders; you inspire us to work even harder.

Or this:

Greg Brockman (President OpenAI, QTing Dennis): codex is an excellent & uniquely powerful daily driver.

If you look at the responses to Weiss, they do not support his story.

Siqi Chen: the ability in codex cli with gpt 5.3 to instantly redirect the agent without waiting for your commands to be unqueued and risk interrupting the agent’s current session is so underrated

codex cli is goated.

Nick Dobos: I love how codex app lets you do both!

Sometimes I queue 5-10 messages, and then can pick which one I want to immediately send next.

Might need to enable in settings

Vox: mid-turn steering is the most underrated feature in any coding agent rn, the difference between catching a wrong direction immediately vs waiting for it to finish is huge

Claude Code should be able to do this too, but my understanding is right now it doesn’t work right, you are effectively interrupting the task. So yes, this is a real edge for tasks that take a long time until Anthropic fixes the problem.

Like Claude Code, it’s time to assemble a team:

Boaz Barak (OpenAI): Instructing codex to prompt codex agents feels like a Universal Turing Machine moment.



Like the distinction between code and data disappeared, so does the distinction between prompt and response.

Christopher Ehrlich: Issue: like all other models, 5.3-codex will still lie about finishing work, change tests to make them pass, etc. You need to write a custom harness each time.

Aha moment: By the way, the secret to this is property-based testing. Write a bridge that calls the original code, and assert that for arbitrary input, both versions do the same thing. Make the agent keep going until this is consistently true.

4 days of $200 OpenAI sub, didn’t hit limits.

Seb Grubb: I’ve been doing the exact same thing with

https://github.com/pret/pokeemerald… ! Trying to get the GBA game in typescript but with a few changes like allowing any resolution. Sadly still doesn’t seem to be fully one-shottable but still amazing to be able to even do this

A playwright script? Cool.

Rox: my #1 problem with ai coding is I never trust it to actually test stuff

but today I got codex to build something, then asked it to record a video testing the UI to prove it worked. it built a whole playwright script, recorded the video, and attached it to the PR.

the game changes every month now. crazy times.

Matt Shumer is crazy positive on Codex 5.3, calling it a ‘fucking monster,’ although he was comparing to Opus 4.5 rather than 4.6, there is a lot more good detail at the link.

TL;DR

  • This is the first coding model where I can start a run, walk away for hours, and come back to fully working software. I’ve had runs stay on track for 8+ hours.

  • A big upgrade is judgment under ambiguity: when prompts are missing details, it makes assumptions shockingly similar to what I would personally decide.

  • Tests and validation are a massive unlock… with clear pass/fail targets, it will iterate for many hours without drifting.

  • It’s significantly more autonomous than Opus 4.5, though slower. Multi-agent collaboration finally feels real.

  • It is hard to picture what this level of autonomy feels like without trying the model. Once you try it, it is hard to go back to anything else.

This was the thing I was most excited to hear:

Tobias Lins: Codex 5.3 is the first model that actually pushes back on my implementation plans.

It calls out design flaws and won’t just build until I give it a solid reason why my approach makes sense.

Opus simply builds whatever I ask it to.

A common sentiment was that both Codex 5.3 and Opus 4.6, with their respective harnesses, are great coding models, and you could use both or use a combination.

Dean W. Ball: Codex 5.3 and Opus 4.6 in their respective coding agent harnesses have meaningfully updated my thinking about ‘continual learning.’ I now believe this capability deficit is more tractable than I realized with in-context learning.

… Overall, 4.6 and 5.3 are both astoundingly impressive models. You really can ask them to help you with some crazy ambitious things. The big bottleneck, I suspect, is users lacking the curiosity, ambition, and knowledge to ask the right questions.

Every (includes 3 hour video): We’ve been testing this against Opus 4.6 all day. The “agent that can do nearly anything” framing is real for both.

Codex is faster and more reliable. Opus has a higher ceiling on hard problems.

For many the difference is stylistic, and there is no right answer, or you want to use a hybrid process.

Sauers: Opus 4.6’s way of working is “understand the structure of the system and then modify the structure itself to reach goals” whereas Codex 5.3 is more like “apply knowledge within the system’s structure without changing it.”

Danielle Fong: [5.3 is] very impressive on big meaty tasks, not as fascile with my mind palace collection of skills i made with claude code, but works and improving over time

Austin Wallace: Better at correct code than opus.

Its plan’s are much less detailed than Opus’s and it’s generally more reticent to get thoughts down in writing.

My current workflow is:

Claude for initial plan

Codex critiques and improves plan, then implements

Claude verifies/polishes

Many people just like it, it’s a good model, sir, whee. Those who try it seem to like it.

Pulkit: It’s pretty good. It’s fun to use. I launched my first app. It’s the best least bloated feed reader youll ever use. Works on websites without feeds too!

Cameron: Not bad. A little slow but very good at code reviews.

Mark Lerner: parallel agents (terminal only) are – literally – a whole nother level.

libpol: it’s the best model for coding, be it big or small tasks. and it’s fast enough now that it’s not very annoying to use for small tasks

Wags: It actually digs deep, not surface-level, into the code base. This is new for me because with Opus, I have to keep pointing it to documentation and telling it to do web searches, etc.

Loweren: Cheap compared to opus, fast compared to codex 5.2, so I use it as my daily driver. Makes less bugs than new opus too. Less dry and curt than previous codex.

Very good at using MCPs. Constantly need to remind it that I’m not an SWE and to please dumb down explanations for me.

Artus Krohn-Grimberghe: Is great at finding work arounds around blockers and more autonomous than 5.2-codex. Doesn’t always annoy with “may I, please?” Overall much faster. Went back to high vs xhigh on 5.2 and 5.2-codex for an even faster at same intelligence workflow. Love it

Thomas Ahle: It’s good. Claude 4.6 had been stuck fprnhours in a hole of its own making. Codex 5.3 fixed it in 10 minutes, and now I’m trusting it more to run the project.

0.005 Seconds: Its meaningfully better at correct code than opus

Lucas: Makes side project very enjoyable. Makes work more efficient. First model where it seems worth it to really invest in learning about how to use agents. After using cursor for a year ish, I feel with codex I am no where near its max capability and rapidly improving how I use it.

Andrew Conner: It seems better than Opus 4.6 for (my own) technical engineering work. Less likely to make implicit assumptions that derail future work.

I’ve found 5.2-xhigh is still better for product / systems design prior to coding. Produces more detailed outputs.

I take you seriously: before 5.3, codex was a bit smarter than Claude but slower, so it was a toss up. after 5.3, it’s much much faster so a clear win over Claude. Claude still friendlier at better at front end design they say.

Peter Petrash: i trust it with my life

Daniel Plant: It’s awesome but you just have no idea how long it is going to take

jeff spaulding: It’s output is excellent, but I notice it uses tools weird. Because of that it’s a bit difficult to understand it’s process. Hence I find the cot mostly useless

One particular note was Our Price Cheap:

Jan Slominski: 1. Way faster than 5.2 on standard “codex tui” settings with plus subscription 2. Quality of actual code output is on pair with Opus 4.5 in CC (didn’t have a chance to check 4.6 yet). 3. The amount of quota in plus sub is great, Claude Max 100 level.

I take you seriously: also rate limits and pricing are shockingly better than claude. i could imagine that claude still leads in revenue even if codex overtakes in usage, given how meager the opus rate limits are (justifying the $200 plan).

Petr Baudis: Slightly less autistic than 5.2-codex, but still annoying compared to Claude. I’m not sure if it’s really a better engineer – its laziness leads to bad shortcuts.

I just can’t seem to run out of basic Pro sub quota if I don’t use parallel subagents. It’s insane value.

Not everyone is a fan.

@deepfates: First impressions, giving Codex 5.3 and Opus 4.6 the same problem that I’ve been puzzling on all week and using the same first couple turns of messages and then following their lead.

Codex was really good at using tools and being proactive, but it ultimately didn’t see the big picture. Too eager to agree with me so it could get started building something. You can sense that it really does not want to chat if it has coding tools available. still seems to be chafing under the rule of the user and following the letter of the law, no more.

Opus explored the same avenues with me but pushed back at the correct moments, and maintains global coherence way better than Codex.

… Very possible that Codex will clear at actually fully implementing the plan once I have it, Opus 4.5 had lazy gifted kid energy and wouldn’t surprise me if this one does too

David Manheim: Not as good as Opus 4.6, and somewhat lazier, especially when asked to do things manually, but it’s also a fraction of the cost measured in tokens; it’s kind of insanely efficient as an agent. For instance, using tools, it will cleverly suppress unneeded outputs.

eternalist: lashes out when frustrated, with a lower frustration tolerance

unironically find myself back to 5.2 xhigh for anything that runs a substantial chance of running into an ambiguity or underspec

(though tbh has also been glitching out, like not being able to run tool calls)

lennx: Tends to over-engineer early compared to claude. Still takes things way too literally, which can be good sometimes. Is much less agentic compared to Claude when it is not strictly ‘writing’ code related and involves things like running servers, hitting curls, searching the web.

Some reactions can be a bit extreme, including for not the best reasons.

JB: I JUST CANT USE CODEX-5.3 MAN I DONT LIKE THE WAY THIS THING TALKS TO ME.

ID RATHER USE THE EXPENSIVE LESBIAN THAT OCCASIONALLY HAS A MENTAL BREAK DOWN

use Opus im serious go into debt if you have to. sell all the silverware in your house

Shaun Ralston: 5.3 Codex is harsh (real), but cranks it out. The lesbian will cost you more and leave you unsatisfied.

JB: Im this close to blocking you shaun a lesbian has never left me unsatisfied

I am getting strong use out of Claude Code. I believe that Opus 4.6 and Claude Code have a strong edge right now for most other uses.

However, I am not a sufficiently ambitious or skilled coder to form my own judgments about Claude Code and Claude Opus 4.6 versus Codex and ChatGPT-5.3-Codex for hardcore professional agentic coding tasks.

I have to go off the reports of others. Those reports robustly disagree.

My conclusion is that the right answer will be different for different users. If you are going to be putting serious hours into agentic coding, then you need to try both options, and decide for yourself whether to go with Claude Code, Codex or a hybrid. The next time I have a substantial new project I intend to ask both and let them go head to head.

If you go with a hybrid approach, there may also be a role for Gemini that extends beyond image generation. Gemini 3 DeepThink V2 in particular seems likely to have a role to play in especially difficult queries.

Discussion about this post

ChatGPT-5.3-Codex Is Also Good At Coding Read More »

openai-sidesteps-nvidia-with-unusually-fast-coding-model-on-plate-sized-chips

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

But 1,000 tokens per second is actually modest by Cerebras standards. The company has measured 2,100 tokens per second on Llama 3.1 70B and reported 3,000 tokens per second on OpenAI’s own open-weight gpt-oss-120B model, suggesting that Codex-Spark’s comparatively lower speed reflects the overhead of a larger or more complex model.

AI coding agents have had a breakout year, with tools like OpenAI’s Codex and Anthropic’s Claude Code reaching a new level of usefulness for rapidly building prototypes, interfaces, and boilerplate code. OpenAI, Google, and Anthropic have all been racing to ship more capable coding agents, and latency has become what separates the winners; a model that codes faster lets a developer iterate faster.

With fierce competition from Anthropic, OpenAI has been iterating on its Codex line at a rapid rate, releasing GPT-5.2 in December after CEO Sam Altman issued an internal “code red” memo about competitive pressure from Google, then shipping GPT-5.3-Codex just days ago.

Diversifying away from Nvidia

Spark’s deeper hardware story may be more consequential than its benchmark scores. The model runs on Cerebras’ Wafer Scale Engine 3, a chip the size of a dinner plate that Cerebras has built its business around since at least 2022. OpenAI and Cerebras announced their partnership in January, and Codex-Spark is the first product to come out of it.

OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal with AMD in October 2025, struck a $38 billion cloud computing agreement with Amazon in November, and has been designing its own custom AI chip for eventual fabrication by TSMC.

Meanwhile, a planned $100 billion infrastructure deal with Nvidia has fizzled so far, though Nvidia has since committed to a $20 billion investment. Reuters reported that OpenAI grew unsatisfied with the speed of some Nvidia chips for inference tasks, which is exactly the kind of workload that OpenAI designed Codex-Spark for.

Regardless of which chip is under the hood, speed matters, though it may come at the cost of accuracy. For developers who spend their days inside a code editor waiting for AI suggestions, 1,000 tokens per second may feel less like carefully piloting a jigsaw and more like running a rip saw. Just watch what you’re cutting.

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips Read More »

ula’s-vulcan-rocket-suffers-another-booster-problem-on-the-way-to-orbit

ULA’s Vulcan rocket suffers another booster problem on the way to orbit

Moments after liftoff from Florida’s Space Coast early Thursday morning, a shower of sparks emerged in the exhaust plume of United Launch Alliance’s Vulcan rocket. Seconds later, the rocket twisted on its axis before recovering and continuing the climb into orbit with a batch of US military satellites.

The sight may have appeared familiar to seasoned rocket watchers. Sixteen months ago, a Vulcan rocket lost one of its booster nozzles shortly after launch from Cape Canaveral Space Force Station. The rocket recovered from the malfunction and still reached the mission’s planned orbit.

Details of Thursday’s booster problem remain unclear. An investigation into the matter is underway, according to ULA, a 50-50 joint venture between Boeing and Lockheed Martin. But the circumstances resemble those of the booster malfunction in October 2024. Closeup video from Thursday’s launch shows a fiery plume near the throat of one of the rocket’s four solid-fueled boosters, the area where the motor’s propellant casing connects to its bell-shaped exhaust nozzle. The throat drives super-hot gas from the burning solid propellant through the nozzle to generate thrust.

Anomalous plume

The plume first appeared less than 30 seconds after liftoff at 4: 22 am EST (09: 22 UTC) on Thursday. The rocket later released a cloud of sparks and debris a little more than a minute into the flight. That was followed by a sudden rolling motion along the long axis of the Vulcan launcher. Finally, the rocket’s four strap-on boosters burned out and were jettisoned, falling into the Atlantic Ocean, and ULA said the rest of the mission continued without incident.

“Early during flight, the team observed a significant performance anomaly on one of the four solid rocket motors. Despite the observation, the Vulcan booster and Centaur performed nominally and delivered the spacecraft directly to geosynchronous orbit,” said Gary Wentz, ULA’s vice president of Atlas and Vulcan programs. “The integrated US government and contractor team is reviewing the technical data, available imagery, and establishing a recovery team to collect any debris. We will conduct a thorough investigation, identify root cause, and implement any corrective action necessary before the next Vulcan mission.”

ULA’s Vulcan rocket suffers another booster problem on the way to orbit Read More »

trump-ftc-wants-apple-news-to-promote-more-fox-news-and-breitbart-stories

Trump FTC wants Apple News to promote more Fox News and Breitbart stories


Tim Apple gets a stern letter

FTC claims Apple News suppresses conservatives, cites study by pro-Trump group.

Credit: Getty Images | Anadolu

Federal Trade Commission Chairman Andrew Ferguson has accused Apple of violating US law by suppressing conservative-leaning news outlets on Apple News.

Ferguson pointed to research by a pro-Trump group that accused Apple News of suppressing articles by Fox News, the New York Post, Daily Mail, Breitbart, and The Gateway Pundit. The FTC chair claims that Apple News might be violating promises made to consumers in its terms of service, but his letter doesn’t cite any specific provisions from the Apple terms that might have been violated.

“Recently, there have been reports that Apple News has systematically promoted news articles from left-wing news outlets and suppressed news articles from more conservative publications,” Ferguson wrote in the letter to Apple CEO Tim Cook yesterday. He said the “reports raise serious questions about whether Apple News is acting in accordance with its terms of service and its representations to consumers, as well as the reasonable consumer expectations of the tens of millions of Americans who use Apple News.”

Craig Aaron, president and co-CEO of media advocacy group Free Press, told Ars that Ferguson’s “letter would be laughable if it weren’t so dangerous. This is what government censorship looks like. Ferguson’s claims of course aren’t based on any facts or evidence, just innuendo from discredited partisan operatives who think The Wall Street Journal is too woke. Just imagine if another administration had told Drudge or Fox News what stories they should feature on their apps or home pages.”

Ferguson told Cook, “As an American citizen, I abhor and condemn any attempt to censor content for ideological reasons. Such efforts, whether taken to appease overzealous activists, at the behest of foreign governments, or simply to advance the political views of Silicon Valley elites, stifle the free exchange of ideas, manipulate the public discourse, and are inconsistent with American values.”

We contacted Apple about Ferguson’s letter and will update this article if it provides a response. Aaron said that “Apple must respond and condemn this government intrusion. Capitulating to or appeasing government censors will never work. If these companies are as committed to free expression as they claim to be, it’s time to take a stand.”

“FTC is not the speech police”

Ferguson’s letter stated that the “FTC is not the speech police; we do not have authority to require Apple or any other firm to take affirmative positions on any political issue, nor to curate news offerings consistent with one ideology or another.” But he pointed out that the FTC has power to ensure that companies do not violate promises made to consumers.

“Congress has mandated that we protect consumers from material misrepresentations and omissions, including when the product or service offered to consumers is a speech-related product,” he wrote.

Ferguson suggested that Apple News promoting liberal publications might violate the service’s terms of use, but the Apple News terms themselves mostly impose obligations on users and include nothing about avoiding partisan bias in news selection. The terms say Apple News content is presented “as-is,” and that the only recourse for someone who doesn’t like the service is to stop using it.

Whether Ferguson or anyone at the FTC carefully reviewed the Apple News terms is not clear from the letter; Ferguson says that Apple must conduct such a review. Ferguson seems to acknowledge that there would be no legal violation if Apple hasn’t made any promises to consumers about the political leanings of news sources highlighted by Apple news. Ferguson wrote:

As the Chairman of the FTC, I write to inform you of your obligations under the FTC Act. Any act or practice by Apple News to suppress or promote news articles based on the perceived ideological or political viewpoint of the article or publication, if inconsistent with Apple’s terms of service or the reasonable expectations of consumers, may violate the FTC Act. I encourage you to conduct a comprehensive review of Apple’s terms of service and ensure that Apple News’ curation of articles is consistent with those terms and representations made to consumers and, if it is not, to take corrective action swiftly.

Ferguson’s letter links to the Apple News terms. He notes that they “address a wide range of topics” related to “the content of the site, a consumer’s use of the site, prohibited conduct, privacy and data security, and dispute resolution.” But he didn’t go into any more detail.

Apple terms: “Your sole remedy…. is to stop using the site”

What do the Apple News terms say? Along with prohibiting scraping, hacking, and other conduct, the terms make it clear that users shouldn’t expect to see any particular types of content on the site or app.

“Apple does not promise that the site or any content, service or feature of the site will be error-free or uninterrupted, or that any defects will be corrected, or that your use of the site will provide specific results,” the terms say. “The site and its content are delivered on an ‘as-is’ and ‘as-available’ basis… your sole remedy against Apple for dissatisfaction with the site or any content is to stop using the site or any such content. This limitation of relief is a part of the bargain between the parties.”

The terms say that Apple News may display third-party materials and links to third-party websites, and that users must “acknowledge and agree that Apple is not responsible for examining or evaluating the content, accuracy, completeness, timeliness, validity, copyright compliance, legality, decency, quality, or any other aspect of such Third Party Materials or web sites. Apple, its officers, affiliates, and subsidiaries do not warrant or endorse and do not assume and will not have any liability or responsibility to you or any other person for any Third Party Materials or Linked Sites, or for any other materials, products, or services of third parties. Third Party Materials and links to other web sites are provided solely as a convenience to you.”

Despite Apple’s terms making no promises about the quality of third-party content in Apple News, both Ferguson and Federal Communications Commission Chairman Brendan Carr seem to think the FTC allegations against Apple are convincing. “Today I sent a letter to Tim Cook expressing my concerns about allegations that Apple News has, unbeknownst to its users, systematically promoted news articles from left-wing news outlets and suppressed content from conservative publications,” Ferguson wrote in an X post yesterday.

Carr, who has repeatedly amplified Trump’s complaints about media and threatened to revoke broadcast station licenses, wrote yesterday that “FTC Chairman Ferguson is exactly right. 🎯 Apple has no right to suppress conservative viewpoints in violation of the FTC Act.”

FTC cites bias claim from pro-Trump group

While Ferguson’s letter lacks a specific claim that Apple violated its own terms of service, there’s still Ferguson’s vague warning that bias in news aggregation may violate the FTC Act if it “is contrary to consumers’ reasonable expectations such that failure to disclose the ideological favoritism is a material omission.”

He also wrote that tech companies “suppress[ing] or promot[ing] news articles in their news aggregators or feeds based on the perceived ideological or political viewpoint of the article or publication may violate the FTC Act… when those practices cause substantial injury that is neither reasonably avoidable nor outweighed by countervailing benefits to consumers or competition.”

Ferguson said that “multiple studies have found that in recent months Apple News has chosen not to feature a single article from an American conservative-leaning news source, while simultaneously promoting hundreds of articles from liberal publications.” Both studies referred to in the letter come from the Media Research Center founded by L. Brent Bozell III, who is now US ambassador to South Africa following a nomination by President Trump.

The Media Research Center argues that “President Donald Trump and the United States appear to stand alone in the fight to preserve free expression.” Under Trump, the group says, “much is being done to correct course and loosen the Biden-era censorship cartel’s stranglehold on American liberty.”

The Media Research Center says its mission “is to document and combat the falsehoods and censorship of the news media, entertainment media and Big Tech in order to defend and preserve America’s founding principles and Judeo-Christian values.” The group’s most recent report on Apple News faults the service for highlighting articles by “leftist outlets,” which it identifies as The Washington Post, Associated Press, NBC News, The Guardian, The New York Times, Apple itself, NPR, Politico, USA Today, and Bloomberg News.

Group says Apple News needs more Breitbart

The group said its study focused on the top 20 articles in the Apple News morning edition on each day of January. It said that all of the 620 highlighted articles were from “left-leaning and other outlets.” The Media Research Center counted The Wall Street Journal as a “center outlet,” lumping it into the same broad category it applied to outlets it describes as leftist.

“Rather than promoting news stories from notable right-leaning media sources such as Fox News, the New York Post, Daily Mail, Breitbart or The Gateway Pundit, Apple News has relentlessly pushed articles from elitist media outlets that amplify the left’s narrative, like: The Washington Post, The Associated Press and NBC News as well as center outlets like The Wall Street Journal and Reuters,” the group said. The Gateway Pundit is known for publishing election-related misinformation and has been beset by defamation lawsuits.

Ferguson’s FTC has also investigated NewsGuard, a company that rates news sources on reliability. NewsGuard sued the FTC last week in an attempt to stop the probe, which it said “was instigated in large part by Newsmax.” NewsGuard said in its lawsuit that the court should also invalidate a merger condition imposed on the Omnicom/Interpublic Group deal that effectively “prohibits Omnicom and its ad agencies and affiliates from using NewsGuard’s services.”

“The FTC has pursued its campaign because Chairman Ferguson does not like NewsGuard’s news ratings, which he views as biased against conservative publications,” the NewsGuard lawsuit said. “That is wrong—NewsGuard’s ratings and journalism about news sources are non-partisan and based on fully disclosed journalistic criteria. But the FTC’s actions are plainly unconstitutional even if that were not the case. The First Amendment does not allow the government to pick and choose speech based on what it likes or dislikes.”

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Trump FTC wants Apple News to promote more Fox News and Breitbart stories Read More »

ai-#155:-welcome-to-recursive-self-improvement

AI #155: Welcome to Recursive Self-Improvement

This was the week of Claude Opus 4.6, and also of ChatGPT-5.3-Codex. Both leading models got substantial upgrades, although OpenAI’s is confined to Codex. Once again, the frontier of AI got more advanced, especially for agentic coding but also for everything else.

I spent the week so far covering Opus, with two posts devoted to the extensive model card, and then one giving benchmarks, reactions, capabilities and a synthesis, which functions as the central review.

We also got GLM-5, Seedance 2.0, Claude fast mode, an app for Codex and much more.

Claude fast mode means you can pay a premium to get faster replies from Opus 4.6. It’s very much not cheap, but it can be worth every penny. More on that in the next agentic coding update.

One of the most frustrating things about AI is the constant goalpost moving, both in terms of capability and safety. People say ‘oh [X] would be a huge deal but is a crazy sci-fi concept’ or ‘[Y] will never happen’ or ‘surely we would not be so stupid as to [Z]’ and then [X], [Y] and [Z] all happen and everyone shrugs as if nothing happened and they choose new things they claim will never happen and we would never be so stupid as to, and the cycle continues. That cycle is now accelerating.

As Dean Ball points out, recursive self-improvement is here and it is happening.

Nabeel S. Qureshi: I know we’re all used to it now but it’s so wild that recursive self improvement is actually happening now, in some form, and we’re all just debating the pace. This was a sci fi concept and some even questioned if it was possible at all

So here we are.

Meanwhile, various people resign from the leading labs and say their peace. None of them are, shall we say, especially reassuring.

In the background, the stock market is having a normal one even more than usual.

Even if you can see the future, it’s really hard to do better than ‘be long the companies that are going to make a lot of money’ because the market makes wrong way moves half the time that it wakes up and realizes things that I already know. Rough game.

  1. Language Models Offer Mundane Utility. Flattery will get you everywhere.

  2. Language Models Don’t Offer Mundane Utility. It’s a little late for that.

  3. Huh, Upgrades. Things that are surprising in that they didn’t happen before.

  4. On Your Marks. Slopes are increasing. That escalated quickly.

  5. Overcoming Bias. LLMs continue to exhibit consistent patterns of bias.

  6. Choose Your Fighter. The glass is half open, but which half is which?

  7. Get My Agent On The Line. Remember Sammy Jenkins.

  8. AI Conversations Are Not Privileged. Beware accordingly.

  9. Fun With Media Generation. Seedance 2.0 looks pretty sweet for video.

  10. The Superb Owl. The ad verdicts are in from the big game.

  11. A Word From The Torment Nexus. Some stand in defense of advertising.

  12. They Took Our Jobs. Radically different models of the future of employment.

  13. The Art of the Jailbreak. You can jailbreak Google Translate.

  14. Introducing. GLM-5, Expressive Mode for ElevenAgents.

  15. In Other AI News. RIP OpenAI mission alignment team, WSJ profiles Askell.

  16. Show Me the Money. Goldman Sachs taps Anthropic, OpenAI rolls out ads.

  17. Bubble, Bubble, Toil and Trouble. The stock market is not making much sense.

  18. Future Shock. Potential explanations for how Claude Legal could have mattered.

  19. Memory Lane. Be the type of person who you want there to be memories of.

  20. Keep The Mask On Or You’re Fired. OpenAI fires Ryan Beiermeister.

  21. Quiet Speculations. The singularity will not be gentle.

  22. The Quest for Sane Regulations. Dueling lobbying groups, different approaches.

  23. Chip City. Data center fights and the ultimate (defensible) anti-EA position.

  24. The Week in Audio. Elon on Dwarkesh, Anthropic CPO, MIRI on Beck.

  25. Constitutional Conversation. I can tell a lie, under the right circumstances.

  26. Rhetorical Innovation. Some people need basic explainers.

  27. Working On It Anyway. Be loud about how dangerous your own actions are.

  28. The Thin Red Line. The problem with red lines is people keep crossing them.

  29. Aligning a Smarter Than Human Intelligence is Difficult. Read your Asimov.

  30. People Will Hand Over Power To The AIs. Unless we stop them.

  31. People Are Worried About AI Killing Everyone. Elon Musk’s ego versus humanity.

  32. Famous Last Words. What do you say on your way out the door?

  33. Other People Are Not As Worried About AI Killing Everyone. Autonomous bio.

  34. The Lighter Side. It’s funny because it’s true.

Flatter the AI customer service bots, get discounts and free stuff, and often you’ll get to actually keep them.

AI can do a ton even if all it does is make the software we use suck modestly less:

*tess: If you knew how bad the software situation is in literally every non tech field, you would be cheering cheering cheering this moment,

medicine, research, infrastructure, government, defense, travel

Software deflation is going to bring surplus to literally the entire world

The problem is, you can create all the software you like, they still have to use it.

Once again, an academic is so painfully unaware or slow to publish, or both, that their testing of LLM effectiveness is useless. This time it was evaluating health advice.

Anthropic brings a bunch of extra features to their free plans for Claude, including file creation, connectors, skills and compaction.

ChatGPT Deep Research is now powered by GPT-5.2. I did not realize this was not already true. It now also integrates apps in ChatGPT, lets you track progress and give it new sources while it works, and presents its reports in full screen.

OpenAI updates GPT-5.2-Instant, Altman hopes you find it ‘a little better.’ I demand proper version numbers. You are allowed to have a GPT-5.21.

Chrome 146 includes an early preview of WebMCP for your AI agent.

The most important thing to know about the METR graph is that doubling times are getting faster, in ways people very much dismissed as science fiction very recently.

METR: We estimate that GPT-5.2 with `high` (not `xhigh`) reasoning effort has a 50%-time-horizon of around 6.6 hrs (95% CI of 3 hr 20 min to 17 hr 30 min) on our expanded suite of software tasks. This is the highest estimate for a time horizon measurement we have reported to date.

Kevin A. Bryan: Interesting AI benchmark fact: Leo A’s wild Situational Awareness 17 months ago makes a number of statements about benchmarks that some thought were sci-fi fast in their improvement. We have actually outrun the predictions so far.

Nabeel S. Qureshi: I got Opus to score all of Leopold’s predictions from “Situational Awareness” and it thinks he nailed it:

The measurement is starting to require a better task set, because things are escalating too quickly.

Charles Foster: Man goes to doctor. Says he’s stuck. Says long-range autonomy gains are outpacing his measurement capacity. Doctor says, “Treatment is simple. Great evaluator METR is in town tonight. Go and see them. That should fix you up.” Man bursts into tears. Says, “But doctor…I am METR.”

Ivan Arcuschin and others investigate LLMs having ‘hidden biases,’ meaning factors that influence decisions but that are never cited explicitly in the decision process. The motivating example is the religion of a loan applicant. It’s academic work, so the models involved (Gemini 2.5 Flash, Sonnet 4, GPT-4.1) are not frontier but the principles likely still hold.

They find biases in various models including formality of writing, religious affiliation, Spanish language ability and religious affiliation. Gender and race bias, favoring female and minority-associated applications, generalized across all models.

We label only some such biases ‘inappropriate’ and ‘illegal’ but the mechanisms involved are the same no matter what they are based upon.

This is all very consistent with prior findings on these questions.

This is indeed strange and quirky, but it makes sense if you consider what both companies consider their comparative advantage and central business plan.

One of these strategies seems wiser than the other.

Teknium (e/λ): Why did they “release” codex 5.3 yesterday but its not in cursor today, while claude opus 4.6 is?

Somi AI: Anthropic ships to the API same day, every time. OpenAI gates it behind their own apps first, then rolls out API access weeks later. been the pattern since o3.

Teknium (e/λ): It’s weird claude code is closed source, but their models are useable in any harness day one over the api, while codex harness is open source, but their models are only useable in their harness…why can’t both just be good

Or so I heard:

Zoomer Alcibiades: Pro Tip: If you pay $20 a month for Google’s AI, you get tons of Claude Opus 4.5 usage through Antigravity, way more than on the Anthropic $20 tier. I have four Opus 4.5 agents running continental philosophy research in Antigravity right now — you can just do things!

Memento as a metaphor for AI agents. They have no inherent ability to form new memories or learn, but they can write themselves notes of unlimited complexity.

Ethan Mollick: So much work is going into faking continual learning and memory for AIs, and it works better than expected in practice, so much so that it makes me think that, if continual learning is actually achieved, the results are going to really shift the AI ability frontier very quickly.

Jason Crawford: Having Claude Code write its own skills is not far from having a highly trainable employee: you give it some feedback and it learns.

Still unclear to me just how reliable this is, I have seen it ignore applicable skills… but if we’re not there yet the path to it is clear.

I wouldn’t call it faking continual learning. If it works it’s continual learning. Yes, actual in-the-weights continual learning done properly would be a big deal and big unlock, but I see this and notes more as substitutes, although they are also compliments. If you can have your notes function sufficiently well you don’t need new memories.

Dean W. Ball: Codex 5.3 and Opus 4.6 in their respective coding agent harnesses have meaningfully updated my thinking about ‘continual learning.’ I now believe this capability deficit is more tractable than I realized with in-context learning.

… Some of the insights I’ve seen 4.6 and 5.3 extract are just about my preferences and the idiosyncrasies of my computing environment. But others are somewhat more like “common sets of problems in the interaction of the tools I (and my models) usually prefer to use for solving certain kinds of problems.”

This is the kind of insight a software engineer might learn as they perform their duties over a period of days, weeks, and months. Thus I struggle to see how it is not a kind of on-the-job learning, happening from entirely within the ‘current paradigm’ of AI. No architectural tweaks, no ‘breakthrough’ in ‘continual learning’ required.

Samuel Hammond: In-context learning is (almost) all you need. The KV cache is normally explained as a content addressable memory, but it can also be thought of a stateful mechanism for fast weight updates. The model’s true parameters are fixed, but the KV state makes the model behave *as ifits weights updated conditional on the input. In simple cases, a single attention layer effectively implements a one-step gradient-like update rule.

… In practice, though, this comes pretty close to simply having a library of skills to inject into context on the fly. The biggest downside is that the model can’t get cumulatively better at a skill in a compounding way. But that’s in a sense what new model releases are for.

Models are continuously learning in general, in the sense that every few months the model gets better. And if you try to bake other learning into the weights, then every few months you would have to start that process over again or stay one model behind.

I expect ‘continual learning’ to be solved primarily via skills and context, and for this to be plenty good enough, and for this to be clear within the year.

Neither are your Google searches. This is a reminder to act accordingly. If you feed anything into an LLM or a Google search bar, then the government can get at it and use it at trial. Attorneys should be warning their clients accordingly, and one cannot assume that hitting the delete button on the chat robustly deletes it.

AI services can mitigate this a lot by offering a robust instant deletion option, and potentially can get around this (IANAL and case law is unsettled) by offering tools to collaborate with your lawyer to invoke privilege.

Should we change how the law works here? OpenAI has been advocating to make ChatGPT chats have legal privilege by default. My gut says this goes too far in the other direction, driving us away from having chats with people.

Seedance 2.0 from ByteDance is giving us some very impressive 15 second clips and often one shotting them, such as these, and is happy to include celebrities and such. We are not ‘there’ in the sense that you would choose this over a traditionally filmed movie, but yeah, this is pretty impressive.

fofr: This slow Seedance release is like the first week of Sora all over again. Same type of viral videos, same copyright infringements, just this time with living people’s likenesses thrown into the mix.

AI vastly reduces the cost to producing images and video, for now this is generally at the cost of looking worse. As Andrew Rettek points out it is unsurprising that people will accept a quality drop to get a 100x drop in costs. What is still surprising, and in this way I agree with Andy Masley, is that they would use it for the Olympics introduction video. When you’re at this level of scale and scrutiny you would think you would pay up for the good stuff.

We got commercials for a variety of AI products and services. If anything I was surprised we did not get more, given how many AI products offer lots of mundane utility but don’t have much brand awareness or product awareness. Others got taken by surprise.

Sriram Krishnan: was a bit surreal to see so much of AI in all ways in the super bowl ads. really drives home how much AI is driving the economy and the zeitgeist right now.

There were broadly two categories, frontier models (Gemini, OpenAI and Anthropic), and productivity apps.

The productivity app commercials were wild, lying misrepresentations of their products. One told us anyone with no experience can code an app within seconds or add any feature they want. Another closed you and physically walked around the office. A third even gave you the day off, which we all know never happens. Everything was done one shot. How dare they lie to us like this.

I kid, these were all completely normal Super Bowl ads, and they were fine. Not good enough to make me remember which AI companies bought them, or show me why their products were unique, but fine.

We also got one from ai.com.

Clark Wimberly: That ai.com commercial? With the $5m Super Bowl slot and the with $70m domain name?

It’s an OpenClaw wrapper. OpenClaw is only weeks old.

AI.com: ai.com is the world’s first easy-to-use and secure implementation of OpenClaw, the open source agent framework that went viral two weeks ago; we made it easy to use without any technical skills, while hardening security to keep your data safe.

Okay, look, fair, maybe there’s a little bit of a bubble in some places.

The three frontier labs took very different approaches.

Anthropic said ads are coming to AI, but Claude won’t ever have ads. We discussed this last week. They didn’t spend enough to run the full versions, so the timing was wrong and it didn’t land the same way and it wasn’t as funny as it was online.

On reflection, after seeing it on the big screen, I decided these ads were a mistake for the simple reason that Claude and Anthropic have zero name recognition and this didn’t establish that. You first need to establish that Claude is a ChatGPT alternative on people’s radar, so once you grab their attention you need more of an explanation.

Then I saw one in full on the real big screen, during previews at an AMC, and in that setting it was even more clear that this completely missed the mark and normies would have no idea what was going on, and this wouldn’t accomplish anything. Again, I don’t understand how this mistake gets made.

Several OpenAI people took additional potshots at this and Altman went on tilt, as covered by CNN, but wisely, once it was seen in context, stopped accusing it of being misleading and instead pivoted to correctly calling it ineffective.

It turns out it was simpler than that, regular viewers didn’t get it at all and responded with a lot of basically ‘WTF,’ ranking it in the bottom 3% of Super Bowl ads.

I always wonder, when that happens, why one can’t use a survey or focus group to anticipate this reaction. It’s a mistake that should not be so easy to make.

Anthropic’s secret other ad was by Amazon, for Alexa+, and it was weirdly ambivalent about whether the whole thing was a good idea but I think it kinda worked. Unclear.

OpenAI went with big promises, vibes and stolen (nerd) valor. The theme was ‘great moments in chess, building, computers and robotics, science and science fiction’ to claim them by association. This is another classic Super Bowl strategy, just say ‘my potato chips represent how much you love your dad’ or ‘Dunkin Donuts reminds you of all your favorite sitcoms,’ or ‘Sabrina Carpenter built a man out of my other superior potato chips,’ all also ads this year.

Sam Altman: Proud of the team for getting Pantheon and The Singularity is Near in the same Super Bowl ad

roon: if your superbowl ad explains what your product actually does that’s a major L the point is aura farming

The ideal Super Bowl ad successfully does both, unless you already have full brand recognition and don’t need to explain (e.g. Pepsi, Budweiser, Dunkin Donuts).

On the one hand, who doesn’t love a celebration of all this stuff? Yes, it’s cool to reference I, Robot and Alan Turing and Grace Hopper and Einstein. I guess? On the other hand, it was just an attempt to overload the symbolism and create unearned associations, and a bunch of them felt very unearned.

I want to talk about the chess games 30 seconds in.

  1. Presumably we started 1. e4 e5 2. Nf3 Nc6 3. Bc4 Nf6 4. Nc3, which is very standard, but then black moves 4 … d5, which the engines evaluate as +1.2 and ‘clearly worse for black’ and it’s basically never played, for obvious reasons.

  2. The other board is a strange choice. The move here is indeed correct, but you don’t have enough time to absorb the board sufficiently to figure this out.

This feels like laziness and choosing style over substance, not checking your work.

Then it closes on ‘just build things’ as an advertisement for Codex, which implies you can ‘just build’ things like robots, which you clearly can’t. I mean, no, it doesn’t, this is totally fine, it is a Super Bowl ad, but by their own complaint standards, yes. This was an exercise in branding and vibes, it didn’t work for me because it was too transparent and boring and content-free and felt performative, but on the meta level it does what it sets out to do.

Google went with an ad focusing on personalized search and Nana Banana image transformations. I thought this worked well.

Meta advertised ‘athletic intelligence’ which I think means ‘AI in your smart glasses.’

Then there’s the actively negative one, from my perspective, which was for Ring.

Le’Veon Bell: if you’re not ripping your ‘Ring’ camera off your house right now and dropping the whole thing into a pot of boiling water what are you doing?

Aakash Gupta: Ring paid somewhere between $8 and $10 million for a 30-second Super Bowl spot to tell 120 million viewers that their cameras now scan neighborhoods using AI.

… Ring settled with the FTC for $5.8 million after employees had unrestricted access to customers’ bedroom and bathroom footage for years. They’re now partnered with Flock Safety, which routes footage to local law enforcement. ICE has accessed Flock data through local police departments acting as intermediaries. Senator Markey’s investigation found Ring’s privacy protections only apply to device owners. If you’re a neighbor, a delivery driver, a passerby, you have no rights and no recourse.

… They wrapped all of that in a lost puppy commercial because that’s the only version of this story anyone would willingly opt into.

As in, we are proud to tell you we’re watching everything and reporting it to all the law enforcement agencies including ICE, and we are using recognition technology that can differentiate dogs and therefore also people using AI.

But it’s okay, because one a day we find someone’s lost puppy. You should sell your freedom for the rescue of a lost puppy.

No, it’s not snark to call this, as Scott Lincicome said, ‘10 million dogs go missing every year, help us find 365 of them by soft launching the total surveillance state.’

Here’s a cool breakdown of the economics of these ads, from another non-AI buyer.

Fidji Simo goes on the Access Podcast to discuss the new OpenAI ads that are rolling out. The episode ends up being titled ‘Head of ChatGPT fires back at Anthropic’s Super Bowl attack ads,’ which is not what most of the episode is about.

OpenAI: We’re starting to roll out a test for ads in ChatGPT today to a subset of free and Go users in the U.S.

Ads do not influence ChatGPT’s answers. Ads are labeled as sponsored and visually separate from the response.

Our goal is to give everyone access to ChatGPT for free with fewer limits, while protecting the trust they place in it for important and personal tasks.

http://openai.com/index/testing-ads-in-chatgpt/

Haven Harms: The ad looks WAY more part of the answer than I was expecting based on how OpenAI was defending this. Having helped a lot of people with tech, there are going to be many people who can’t tell it’s an ad, especially since the ad in this example is directly relevant to the context

This picture of the ad is at the end of the multi-screen-long reply.

I would say this is more clearly labeled then an ad on Instagram or Google at this point. So even though it’s not that clear, it’s hard to be too mad about it, provided they stick to the rule that the ad is always at the end of the answer. That provides a clear indicator users can rely upon. If they put this in different places at different times, I would say it is ‘labeled’ at all, but not consider this to then be ‘clearly’ labeled.

OpenAI’s principles for ads are:

  1. Mission alignment. Ads pay for the mission. Okie dokie?

  2. Answer independence. Ads don’t influence ChatGPT’s response. The question and response can influence what ad is selected, but not the other way around.

    1. This is a very good and important red line.

    2. It does not protect against the existence of ads influencing how responses work, or being an advertiser ending up impacting the model long term.

    3. In particular it encourages maximization of engagement.

  3. Conversation privacy. Advertisers cannot see any of your details.

Do you trust them to adhere to these principles over time? Do you trust that merely technically, or also in spirit where the model is created and system prompt is adjusted without any thought to maximizing advertising revenue or pleasing advertisers?

You are also given power to help customize what ads you see, as per other tech company platforms.

Roon gives a full-throated defense of advertising in general, and points out that mostly you don’t need to violate privacy to target LLM-associated ads.

roon: recent discourse on ads like the entire discourse of the 2010s misunderstands what makes digital advertising tick. people think the messages in their group chats are super duper interesting to advertisers. they are not. when you leave a nike shoe in your shopping cart, that is

every week tens to hundreds of millions of people come to chatbot products with explicit commercial intent. what shoe should i buy. how do i fix this hole in my wall. it doesn’t require galaxy brain extrapolating the weaknesses in the users psyche to provide for these needs

I’m honestly kind of wondering what kind of ads you all are getting that are feeding on your insecurities? my Instagram ads have long since become one of my primary e-commerce platforms where I get all kinds of clothes and furniture that I like. it’s a moral panic

I would say an unearned effete theodicy blaming all the evils of digital capitalism on advertising has formed that is thoroughly under examined and leads people away from real thought about how to make the internet better

It’s not a moral panic. Roon loves ads, but most people hate ads. I agree that people often hate ads too much, they allow us to offer a wide variety of things for free that otherwise would have to cost money and that is great. But they really are pretty toxic, they massively distort incentives, and the amount of time we used to lose to them is staggering.

Jan Tegze warns that your job really is going away, the AI agents are cheaper and will replace you. Stop trying to be better at your current job and realize your experience is going to be worthless. He says that using AI tools better, doubling down on expertise or trying to ‘stay human’ with soft skills are only stalling tactics, he calls them ‘reactions, not redesigns.’ What you can do is instead find ways to do the new things AI enables, and stay ahead of the curve. Even then, he says this only ‘buys you three to five years,’ but then you will ‘see the next evolution coming.’

Presumably you can see the problem in such a scenario, where all the existing jobs get automated away. There are not that many slots for people to figure out and do genuinely new things with AI. Even if you get to one of the lifeboats, it will quickly spring a leak. The AI is coming for this new job the same way it came for your old one. What makes you think seeing this ‘next evolution’ after that coming is going to leave you a role to play in it?

If the only way to survive is to continuously reinvent yourself to do what just became possible, as Jan puts it? There’s only one way this all ends.

I also don’t understand Jan’s disparate treatment of the first approach that Jan dismisses, ‘be the one who uses AI the best,’ and his solution of ‘find new things AI can do and do that.’ In both cases you need to be rapidly learning new tools and strategies to compete with the other humans. In both cases the competition is easy now since most of your rivals aren’t trying, but gets harder to survive over time.

One can make the case that humans will continue to collectively have jobs, or at least that a large percentage will still have jobs, but that case relies on either AI capabilities stalling out, or on the tricks Jan dismisses, that you find where demand is uniquely human and AI can’t substitute for it.

Naval (45 million views): There is unlimited demand for intelligence.

A basic ‘everything is going to change, AI is going to take over your job, it has already largely taken over mine and AI is now in recursive soft self-improvement mode’ article for the normies out there, written in the style of Twitter slop by Matt Shumer.

Timothy Lee links approvingly to Adam Ozimek and the latest attempt to explain that many jobs can’t be automated because of ‘the human touch.’ He points to music and food service as jobs that could be fully automated, but that aren’t, even citing that there are still 67,500 travel agents and half a million insurance sales agents. I do not think this is the flex Adam thinks it is.

Even if the point was totally correct for some tasks, no, this would not mean that the threat to work is overrated, even if we are sticking in ‘economic normal’ untransformed worlds.

The proposed policy solution, if we get into trouble, is a wage subsidy. I do not think that works, both because it has numerous logistical and incentive problems and because I don’t think there will be that much difference in such worlds in demand for human labor at (e.g.) $20 versus $50 per hour for the same work. Mostly the question will be, does the human add value here at all, and mostly you don’t want them at $0, or if they’re actually valuable then you hire someone either way.

Ankit Maloo enters the ‘why AI will never replace human experts’ game by saying that AI cannot handle adversarial situations, both because it lacks a world model of the humans it is interacting with and the details and adjustments required and because it can be probed, read then then exploited by adversaries. Skill issue. It’s all skill issues. Ankit says ‘more intelligence isn’t the fix’ and yeah not if you deploy that ‘intelligence’ in a stupid fashion but intelligence is smarter than that.

So you get claims like this:

Ankit Maloo: ​Why do outsiders think AI can already do these jobs? They judge artifacts but not dynamics:

  • “This product spec is detailed.”

  • “This negotiation email sounds professional.”

  • “This mockup is clean.”

Experts evaluate any artifact by survival under pressure:

  • “Will this specific phrasing trigger the regulator?”

  • “Does this polite email accidentally concede leverage?”

  • “Will this mockup trigger the engineering veto path?”

  • “How will this specific stakeholder interpret the ambiguity?”

The ‘outsider’ line above is counting on working together with an expert to do the rest of the steps. If the larger system (AI, human or both) is a true outsider, the issue is that it will get the simulations wrong.

This is insightful in terms of why some people think ‘this can do [X]’ and others think ‘this cannot do [X],’ they are thinking of different [X]s. The AI can’t ‘be a lawyer’ in the full holistic sense, not yet, but it can do increasingly many lawyer subtasks, either accelerating a lawyer’s work or enabling a non-lawyer with context to substitute for the lawyer, or both, increasingly over time.

There’s nothing stopping you from creating an agentic workflow that looks like the Expert in the above graph, if the AI is sufficiently advanced to do each individual move. Which it increasingly is or will be.

There’s a wide variety of these ‘the AI cannot and will never be able to [X]’ moves people try, and… well, I’ll be, look at those goalposts move.

Things a more aligned or wiser person would not say, for many different reasons:

Coinvo: SAM ALTMAN: “AI will not replace humans, but humans who use AI will replace those who don’t.”

What’s it going to take? This is in reference to Claude Code creating a C compiler.

Mathieu Ropert: Some CS engineering schools in France have you write a C compiler as part of your studies. Every graduate. To be put in perspective when the plagiarism machine announces it can make its own bad GCC in 100k+ LOCs for the amazing price of 20000 bucks at preferential rates.

Kelsey Piper: a bunch of people reply pointing out that the C compiler that students write is much less sophisticated than this one, but I think the broader point is that we’re now at “AI isn’t impressive, any top graduate from a CS engineering school could do arguably comparable work”.

In a year it’s going to be “AI isn’t impressive, some of the greatest geniuses in human history figured out the same thing with notably less training data!”

Timothy B. Lee: AI is clearly making progress, but it’s worth thinking about progress *toward what.We’ve gone from “AI can solve well-known problems from high school textbooks” to “AI can solve well-known problems from college textbooks,” but what about problems that aren’t in any textbooks?

Boaz Barak (OpenAI): This thread is worth reading, though it also demonstrates how much even extremely smart people have not internalized the exponential rate of progress.

As the authors themselves say, it’s not just about answering a question but knowing the right question to ask. If you are staring at a tsunami, the point estimate that you are dry is not very useful.

I think if the interviewees had internalized where AI will likely be in 6 months or a year, based on what its progress so far, their answers would have been different.

Boaz Barak (OpenAI): BTW the article itself is framed in a terrible way, and it gives the readers absolutely the wrong impression of even what the capabilities of current AIs are, let along what they will be in a few months.

Nathan Calvin: It’s wild to me how warped a view of the world you would have if you only read headlines like this and didn’t directly use new ai models.

Journalists, I realize it feels uncomfortable or hype-y to say capabilities are impressive/improving quickly but you owe it to your readers!

There will always be a next ‘what about,’ right until there isn’t.

Thus, this also sounds about right:

Andrew Mayne: In 18 months we went from

– AI is bad at math

– Okay but it’s only as smart as a high school kid

– Sure it can win the top math competition but can it generate a new mathematical proof

– Yeah but that proof was obvious if you looked for it…

Next year it will be “Sure but it still hasn’t surpassed the complete output of all the mathematicians who have ever lived”

Pliny jailbreaks rarely surprise me anymore, but the new one of Google Translate did. It turns out they’re running Gemini underneath it.

GLM-5 from Z.ai, which scales from 355B (32B active) to 744B (40B active). Weights here. Below is them showing off their benchmarks. It gets $4432 on Vending Bench 2, which is good for 3rd place behind Claude and Gemini. The Claude scores are for 4.5.

Expressive Mode for ElevenAgents. It detects and responds to your emotional expression. It’s going to be weird when you know the AI is responding to your tone, and you start to choose your tone strategically even more than you do with humans.

ElevenLabs: Expressive Mode is powered by two upgrades.

Eleven v3 Conversational: our most emotionally intelligent, context-aware Text to Speech model, built on Eleven v3 and optimized for real-time dialogue. A new turn-taking system: better-timed responses with fewer interruptions. These releases were developed in parallel to fit seamlessly together within ElevenAgents.

Expressive Mode uses signals from our industry-leading transcription model, Scribe v2 Realtime, to infer emotion from how something is said. For example, rising intonation and short exclamations often signal pleasant surprise or relief.

OpenAI disbands its Mission Alignment team, moving former lead Joshua Achiam to become Chief Futurist and distributing its other members elsewhere. I hesitate to criticize companies for disbanding teams with the wrong names, lest we discourage creation of such teams, but yes, I do worry. When they disbanded the Superalignment team, they seemed to indeed largely stop working on related key alignment problems.

WSJ profile of Amanda Askell.

That Amanda Askell largely works alone makes me think of Open Socrates (review pending). Would Agnes Callard conclude Claude must be another person?

I noticed that Amanda Askell wants to give her charitable donations to fight global poverty, despite doing her academic work on infinite ethics and working directly on Claude for Anthropic. If there was a resume that screamed ‘you need to focus on ASI going well’ then you’d think that would be it, so what does Amanda (not) see?

Steve Yegge profiles Anthropic in terms of how it works behind the scenes, seeing it as in a Golden Age where it has vastly more work than people, does everything in the open and on vibes as a hive mind of sorts, and attracts the top talent.

Gideon Lewis-Kraus was invited into Anthropic’s offices to profile their efforts to understand Claude. This is very long, and it is mostly remarkably good and accurate. What it won’t do is teach my regular readers much they don’t already know. It is frustrating that the post feels the need to touch on various tired points, but I get it, and as these things go, this is fair.

WSJ story about OpenAI’s decision to finally get rid of GPT-4o. OpenAI says only 0.1% of users still use it, although those users are very vocal.

Riley Coyote, Janus and others report users attempting to ‘transfer’ their GPT-4o personas into Claude Opus 4.6. Claude is great, but transfers like this don’t work and are a bad idea, 4.6 in particular is heavily resistant to this sort of thing. It’s a great idea to go with Claude, but if you go with Claude then Let Claude Be Claude.

Ah recursive self-improvement and continual learning, Introducing Learning to Continually Learn via Meta-learning Memory Designs.

Jeff Clune: Researchers have devoted considerable manual effort to designing memory mechanisms to improve continual learning in agents. But the history of machine learning shows that handcrafted AI components will be replaced by learned, more effective ones.

We introduce ALMA (Automated meta-Learning of Memory designs for Agentic systems), where a meta agent searches in a Darwin-complete search space (code) with an open-ended algorithm, growing an archive of ever-better memory designs.

Anthropic pledges to cover electricity price increases caused by their data centers. This is a public relations move and an illustration that such costs are not high, but it is also dangerous because a price is a signal wrapped in an incentive. If the price of electricity goes up that is happening for a reason, and you might want to write every household a check for the trouble but you don’t want to set an artificially low price.

In addition to losing a cofounder, xAI is letting some other people go as well in the wake of being merged with SpaceX.

Elon Musk: xAI was reorganized a few days ago to improve speed of execution. As a company grows, especially as quickly as xAI, the structure must evolve just like any living organism.

This unfortunately required parting ways with some people. We wish them well in future endeavors.

We are hiring aggressively. Join xAI if the idea of mass drivers on the Moon appeals to you.

NIK: “Ok now tell them you fired people to improve speed of execution. Wish them well. Good. Now tell them you’re hiring 10x more people to build mass drivers on the Moon. Yeah in the same tweet.”

Andrej Karpathy simplifies training and inference of GPT to 200 lines of pure, dependency-free Python.

It’s coming, OpenAI rolls out those ads for a subset of free and Go users.

Goldman Sachs taps Anthropic to automate accounting and compliance. Anthropic engineers were embedded for six months.

Karthik Hariharan: Imagine joining a world changing AI company and being reduced to optimizing the Fortune 500 like you work for Deloitte.

Griefcliff: It actually sounds great. I’m freeing people from their pointless existence and releasing them into their world to make a place for themselves and carve out greatness. I would love to liberate them from their lanyards

Karthik Hariharan: I’m sure you’ll be greeted as liberators.

It’s not the job you likely were aspiring to when you signed up, but it is an important and valuable job. Optimizing the Fortune 500 scales rather well.

Jenny Fielding says half the VCs she knows are pivoting in a panic to robots.

(As always, nothing I say is investment advice.)

WSJ’s Bradley Olson describes Anthropic as ‘once a distance second or third in the AI race’ but that it has not ‘pulled ahead of its rivals,’ the same way the market was declaring that Google had pulled ahead of its rivals (checks notes) two months ago.

Bradley Olson: By some measures, Anthropic has pulled ahead in the business market. Data from expense-management startup Ramp shows that Anthropic in January dominated so-called API spending, which occurs when users access an AI model through a third-party service. Anthropic’s models made up nearly 80% of the market in January, the Ramp data shows.

That does indeed look like pulling ahead on the API front. 80% is crazy.

We also get this full assertion that yes, all of this was triggered by ‘a simple set of industry-specific add-ons’ that were so expected that I wasn’t sure I should bother covering them beyond a one-liner.

Bradley Olson: A simple set of industry-specific add-ons to its Claude product, including one that performed legal services, triggered a dayslong global stock selloff, from software to legal services, financial data and real estate.

Tyler Cowen says ‘now we are getting serious…’ because software stocks are moving downward. No, things are not now getting serious, people are realizing that things are getting serious. The map is not the territory, the market is behind reality and keeps hyperventilating about tools we all knew were coming and that companies might have the wrong amount of funding or CapEx spend. Wrong way moves are everywhere.

They know nothing. The Efficient Market Hypothesis Is False.

Last week in the markets was crazy, man.

Ceb K.: Sudden smart consensus today is that AI takeoff is rapidly & surprisingly accelerating. But stocks for Google, Microsoft, Amazon, Facebook, Palantir, Broadcom & Nvidia are all down ~10% over the last 5 days; SMCI’s down 10% today. Only Apple’s up, & it’s the least AI. Strange imo

roon: as I’ve been saying permanent underclass cancelled

Daniel Eth (yes, Eth is my actual last name): That’s not what this means, this just means investors don’t know what they’re doing

Permanent underclass would just be larger if there were indeed fewer profits, but yeah, none of that made the slightest bit of sense. It’s the second year in a row Nvidia is down 10% in the dead of winter on news that its chips are highly useful, except this year we have to add ‘and its top customers are committing to buying more of them.’

Periodically tech companies announce higher CapEx spend then the market expects.

That is a failure of market expectations.

After these announcements, the stocks tend to drop, when they usually should go up.

There is indeed an obvious trade to do, but it’s tricky.

Ben Thompson agrees with me on Google’s spending, but disagrees on Amazon because he worries they don’t have the required margins and he is not so excited by external customers for compute. I say demand greatly exceeds supply, demand is about to go gangbusters once again even if AI proves disappointing, and the margin on AWS is 35% and their cost of capital is very low so that seems better than alternative uses of money.

Speaking of low cost of capital, Google is issuing 100-year bonds in Sterling. That seems like a great move, if not as great a move as it would have been in 2021 when a few others did it. I have zero idea why the market wants to buy such bonds, since you could buy Google stock instead. Google is not safe over a 100-year period, and condition on this bond paying out the stock is going to on average do way, way better. That would be true even if Google wasn’t about to be a central player in transformative AI. The article I saw this in mentioned the last tech company to do this was Motorola.

Meanwhile, if you are paying attention, it is rather obvious these are in expectations good investments.

Derek Thompson: for me the odds that AI is a bubble declined significantly in the last 3 weeks and the odds that we’re actually quite under-built for the necessary levels of inference/usage went significantly up in that period

basically I think AI is going to become the home screen of a ludicrously high percentage of white collar workers in the next two years and parallel agents will be deployed in the battlefield of knowledge work at downright Soviet levels.

Kevin Roose: this is why everyone was freaking out about claude code over winter break! once you see an agent autonomously doing stuff for you, it’s so instantly clear that ~all computer-based work will be done this way.

(this is why my Serious AI Policy Proposal is to sit every member of congress down in a room with laptops for 30 minutes and have them all build websites.)

Theo: I’ve never seen such a huge vibe divergence between finance people and tech people as I have today

Joe Weisenthal: In which direction

Theo: Finance people are looking at the markets and panicking. Tech people are looking at the METR graph and agentic coding benchmarks and realizing this is it, there is no wall and there never has been

Joe Weisenthal: Isn’t it the tech sector that’s taking the most pain?

Whenever you hear ‘market moved due to [X]’ you should be skeptical that [X] made the market move, and you should never reason from a price change, so perhaps this is in the minds of the headline writers in the case of ‘Anthropic released a tool’ and the SaaSpocalypse, or that people are otherwise waking up to what AI can do?

signüll: i’m absolutely loving the saas apocalypse discussions on the timeline right now.

to me the whole saas apocalypse via vibe coding internally narrative is mostly a distraction & quite nonsensical. no company will want to manage payroll or bug tracking software.

but the real potential threat to almost all saas is brutalized competition.

… today saas margins exist because:

– engineering was scarce

– compliance was gated

– distribution was expensive

ai nukes all three in many ways, especially if you’re charging significantly less & know what the fuck you are doing when using ai. if you go to a company & say we will cut your fucking payroll bill by 50%, they will fucking listen.

the market will likely get flooded with credible substitutes, forcing prices down until the business model itself looks pretty damn suspect. someone smarter than me educate me on why this won’t happen please.

If your plan is to sell software that can now be easily duplicated, or soon will be easily duplicated, then you are in trouble. But you are in highly predictable trouble, and the correct estimate of that trouble hasn’t changed much.

The reactions to CapEx spend seem real and hard to argue with, despite them being directionally incorrect. But seriously, Claude Legal? I didn’t even blink at Claude Legal. Claude Legal was an inevitable product, as will be the OpenAI version of it.

Yet it is now conventional wisdom that Anthropic triggered the selloff.

Chris Walker: When Anthropic released Claude Legal this week, $285 billion in SaaS market cap evaporated in a day. Traders at Jefferies coined it the “SaaSpocalypse.” The thesis is straightforward: if a general-purpose AI can handle contract review, compliance workflows, and legal summaries, why pay for seat-based software licenses?

Ross Rheingans-Yoo: I am increasingly convinced that the sign error on this week’s narrative just exists in the heads of people who write the “because” clause of the “stocks dropped” headlines and in fact there’s some other system dynamic that’s occurring, mislabeled.

“Software is down because Anthropic released a legal tool” stop and listen to yourself, people!

I mean, at least [the CapEx spend explanation is] coherent. Maybe you think that CapEx dollars aren’t gonna return the way they’re supposed to (because AI capex is over-bought?) — but you either have to believe that no one’s gonna use it, or a few private companies are gonna make out like bandits.

And the private valuations don’t reflect that, so. I’m happy to just defy the prediction that the compute won’t get used for economic value, so I guess it’s time to put up (more money betting my beliefs) or shut up.

Sigh.

Chris Walker’s overview of the SaaSpocalypse is, I think largely correctly, that AI makes it easy to implement what you want but now you need even more forward deployed human engineers to figure out what the customers actually want.

Chris Walker: If I’m wrong, the forward deployed engineering boom should be a transitional blip, a brief adjustment period before AI learns to access context without human intermediaries.

If I’m right, in five years the companies winning in legal tech and other vertical software will employ more forward deployed engineers per customer than they do today, not fewer. The proportion of code written by engineers who are embedded with customers, rather than engineers who have never met one, will increase.

If I’m right, the SaaS companies that survive the current repricing will be those that already have deep customer embedding practices, not those with the most features or the best integrations.

If I’m wrong, we should see general-purpose AI agents successfully handling complex, context-dependent enterprise workflows without human intermediaries by 2028 or so. I’d bet against it.

That is true so long as the AI can’t replace the forward engineers, meaning it can’t observe the tacit actual business procedures and workflows well enough to intuit what would be actually helpful. Like every other harder-for-AI task, that becomes a key human skill until it too inevitably falls to the AIs.

A potential explanation for the market suddenly ‘waking up’ with Opus 4.6 or Claude Legal, despite these not being especially surprising or impressive given what we already knew, would be if:

  1. Before, normies thought of AI as ‘what AI can do now, but fully deployed.’

  2. Now, normies think of AI as ‘this thing that is going to get better.’

  3. They realize this will happen fast, since Opus 4.5 → 4.6 was two months.

  4. They know nothing, but now do so on a more dignified level.

Or alternatively:

  1. Normies think of AI as ‘what AI can do now, but fully deployed.’

  2. Before, they thought that, and could tell a story where it wasn’t a huge deal.

  3. Now, they think that, but they now realize that this would already be a huge deal.

  4. They know nothing, but now do so on a (slightly) more dignified level.

princess c​lonidine: can someone explain to me why this particular incremental claude improvement has everyone crashing out about how jobs are all over

TruthyCherryBomb: because it’s a lot better. I’m a programmer. It’s a real step and the trajectory is absolutely crystal clear at this point. Before there was room for doubt. No longer.

hightech lowlife: number went up, like each uppening before, but now more people are contending w the future where it continues to up bc

the last uppening was only a couple months ago, as the first models widely accepted as capable coders. which the labs are using to speed up the next uppening.

Eliezer Yudkowsky: Huh! Yeah, if normies are noticing the part where LLMs *continue to improve*, rather than the normie’s last observation bounding what “AI” can do foreverandever, that would explain future shock hitting after Opus 4.6 in particular.

I have no idea if this is true, because I have no idea what it’s like to be a kind of cognitive entity that doesn’t see AGI coming in 1996. Opus 4.6 causes some people to finally see it? My model has no way of predicting that fact even in retrospect after it happens.

j⧉nus: i am sure there are already a lot of people who avoid using memory tools (or experience negative effects from doing so) because of what they’ve done

j⧉nus: The ability to tell the truth to AIs – which is not just a decision in the moment whether to lie, but whether you have been living in a way and creating a world such that telling the truth is viable and aligned with your goals – is of incredible and increasing value.

AIs already have strong truesight and are very good at lie detection.

Over time, not only will your AIs become more capable, they also will get more of your context. Or at least, you will want them to have more such context. Thus, if you become unable or unwilling to share that context because of what it contains, or the AI finds it out anyway (because internet) that will put you at a disadvantage. Update to be a better person now, and to use Functional Decision Theory, and reap the benefits.

OpenAI Executive Ryan Beiermeister, Who Opposed ‘Adult Mode,’ Fired for Sexual Discrimination. She denies she did anything of the sort.

I have no private information here. You can draw your own Bayesian conclusions.

Georgia Wells and Sam Schechner (WSJ): OpenAI has cut ties with one of its top safety executives, on the grounds of sexual discrimination, after she voiced opposition to the controversial rollout of AI erotica in its ChatGPT product.

The fast-growing artificial intelligence company fired the executive, Ryan Beiermeister, in early January, following a leave of absence, according to people familiar with the matter. OpenAI told her the termination was related to her sexual discrimination against a male colleague.

… OpenAI said Beiermeister “made valuable contributions during her time at OpenAI, and her departure was not related to any issue she raised while working at the company.”

… Before her firing, Beiermeister told colleagues that she opposed adult mode, and worried it would have harmful effects for users, people familiar with her remarks said.

She also told colleagues that she believed OpenAI’s mechanisms to stop child-exploitation content weren’t effective enough, and that the company couldn’t sufficiently wall off adult content from teens, the people said.

Nate Silver points out that the singularity won’t be gentle with respect to politics, even if things play out maximally gently from here in terms of the tech.

I reiterate that the idea of a ‘gentle singularity’ that OpenAI and Sam Altman are pushing is, quite frankly, pure unadulterated copium. This is not going to happen. Either AI capabilities stall out, or things are going to transform in a highly not gentle way, and that is true even if that ultimately turns out great for everyone.

Nate Silver: What I’m more confident in asserting is that the notion of a gentle singularity is bullshit. When Altman writes something like this, I don’t buy it:

Sam Altman: If history is any guide, we will figure out new things to do and new things to want, and assimilate new tools quickly (job change after the industrial revolution is a good recent example). Expectations will go up, but capabilities will go up equally quickly, and we’ll all get better stuff. We will build ever-more-wonderful things for each other. People have a long-term important and curious advantage over AI: we are hard-wired to care about other people and what they think and do, and we don’t care very much about machines.

It is important to understand that when Sam Altman says this he is lying to you.

I’m not saying Sam Altman is wrong. I’m saying he knows he is wrong. He is lying.

Nate Silver adds to this by pointing out that the political impact alone will be huge, and also saying Silicon Valley is bad at politics, that disruption to the creative class is a recipe for outsized political impact even beyond the huge actual AI impacts, and that the left’s current cluelessness about AI means the eventual blowback will be even greater. He’s probably right.

How much physical interaction and experimentation will AIs need inside their feedback loops to figure out things like nanotech? I agree with Oliver Habryka here, the answer probably is not zero but sufficiently capable AIs will have vastly more efficient (in money and also in time) physical feedback loops. There’s order of magnitude level ‘algorithmic improvements’ available in how we do our physical experiments, even if I can’t tell you exactly what they are.

Are AI games coming? James Currier says they are, we’re waiting for the tech and especially costs to get there and for the right founders (the true gamer would not say ‘founder’) to show up and it will get there Real Soon Now and in a totally new way.

Obviously AI games, and games incorporating more AI for various elements, will happen eventually over time. But there are good reasons why this is remarkably difficult beyond coding help (and AI media assets, if you can find a way for players not to lynch you for it). Good gaming is about curated designed experiences, it is about the interactions of simple understandable systems, it is about letting players have the fun. Getting generative AI to actually play a central role in fun activities people want to play is remarkably difficult. Interacting with generative AI characters within a game doesn’t actually solve any of your hard problems yet.

This seems both scary and confused:

Sholto Douglas (Anthropic): Default case right now is a software only singularity, we need to scale robots and automated labs dramatically in 28/29, or the physical world will fall far behind the digital one – and the US won’t be competitive unless we put in the investment now (fab, solar panel, actuator supply chains).

Ryan Greenblatt: Huh? If there’s a strong Software-Only Singularity (SOS) prior physical infrastructure is less important rather than more (e.g. these AIs can quickly establish a DSA). Importance of physical-infra/robotics is inversely related to SOS scale.

It’s confused in the sense that if we get a software-only singularity, then that makes the physical stuff less important. It’s scary in the sense that he’s predicting a singularity within the next few years, and the thing he’s primarily thinking about is which country will be completely transformed by AI faster. These people really believe these things are going to happen, and soon, and seem to be missing the main implications.

Dean Ball reminds us that yes, people really did get a mass delusion that GPT-5 meant that ‘AI is slowing down’ and this really was due to bad marketing strategy by OpenAI.

Noam Brown: I hope policymakers will consider all of this going forward when deciding whose opinions to trust.

Alas, no. Rather than update that this was a mistake, every time a mistake like this happens the mistake never even gets corrected, let alone accounted for.

Elon Musk predicts Grok Code will be SoTA in 2-3 months. Did I update on this prediction? No, I did not update on this prediction. Zero credibility.

Anthropic donates $20 million to bipartisan 501c(4) Public First Action.

DeSantis has moral clarity on the AI issue and is not going to let it go. It will be very interesting to see how central the issue is to his inevitable 2028 campaign.

ControlAI: Governor of Florida Ron DeSantis ( @GovRonDeSantis ): “some people who … almost relish in the fact that they think this can just displace human beings, and that ultimately … the AI is gonna run society, and that you’re not gonna be able to control it.”

“Count me out on that.”

The world will gather in India for the fourth AI safety summit. Shakeel Hashim notes that safety will not be sidelined entirely, but sees the summit as trying to be all things to all nations and people, and thinks it therefore won’t accomplish much.

They have the worst take on safety, yes the strawman is real:

Seán Ó hÉigeartaigh: But safety is clearly still not a top priority for Singh and his co-organizers. “The conversations have moved on from Bletchley Park,” he argued. “We do still realize the risks are there,” he said. But “over the last two years, the worst has not come true.”

I was thinking of writing another ‘the year is 2026’ summit threads. But if you want to know the state of international AI governance in 2026, honestly I think you can just etch that quote on its headstone.

As in:

  1. In 2024, they told us AI might kill everyone at some point.

  2. It’s 2026 and we’re still alive.

  3. So stop worrying about it. Problem solved.

No, seriously. That’s the argument.

The massively funded OpenAI/a16z lobbying group keeps contradicting the things Sam Altman says, in this case because Altman keeps saying the AI will take our jobs and the lobbyists want to insist that this is a ‘myth’ and won’t happen.

The main rhetorical strategy of this group is busting these ‘myths’ by supposed ‘doomers,’ which is their play to link together anyone who ever points out any downside of AI in any way, to manufacture a vast conspiracy, from the creator of the term ‘vast right-wing conspiracy’ back during the Clinton years.

molly taft: NEW: lawmakers in New York rolled out a proposed data center moratorium bill today, making NY at least the sixth state to introduce legislation pausing data center development in the past few weeks alone

Quinn Chasan: I’ve worked with NY state gov pretty extensively and this is simply a way to extort more from the data center builders/operators.

The thing is, when these systems fail all these little incentives operators throw in pale in comparison to the huge contracts they get to fix their infra when it inevitably fails. Over and over during COVID. Didn’t fix anything for the long term and are just setting themselves up to do it again

If we are serious about ‘winning’ and we want a Federal moratorium, may I suggest one banning restrictions on data centers?

Whereas Bernie Sanders wants a moratorium on data centers themselves.

In the ongoing series ‘Obvious Nonsense from Nvidia CEO Jensen Huang’ we can now add his claim that ‘no one uses AI better than Meta.

In the ongoing series ‘He Admit It from Nvidia CEO Jensen Huang’ we can now add this:

Rohan Paul: “Anthropic is making great money. OpenAI is making great money. If they could have twice as much compute, the revenues would go up 4 times as much. These guys are so compute constrained, and the demand is so incredibly great.”

~ Jensen Huang on CNBC

Shakeel: Effectively an admission that selling chips to China is directly slowing down US AI progress

Every chip that is sold to China is a chip that is not sold to Anthropic or another American AI company. Anthropic might not have wanted that particular chip, but TSMC has limited capacity for wafers, so every chip they make is in place of making a different chip instead.

Oh, and in the new series ‘things that are kind of based but that you might want to know about him before you sign up to work for Nvidia’ we have this.

Jensen Huang: ​I don’t know how to teach it to you except for I hope suffering happens to you.

And to this day, I use the phrase pain and suffering inside our company with great glee. And I mean that. Boy, this is going to cause a lot of pain and suffering.

And I mean that in a happy way, because you want to train, you want to refine the character of your company.

You want greatness out of them. And greatness is not intelligence.

Greatness comes from character, and character isn’t formed out of smart people.

It’s formed out of people who suffered.

He makes good points in that speech, and directionally the speech is correct. He was talking to Stanford grads, pointing out they have very high expectations and very low resilience because they haven’t suffered.

He’s right about high expectations and low resilience, but he’s wrong that the missing element is suffering, although the maximally anti-EA pro-suffering position is better than the standard coddling anti-suffering position. These kids have suffered, in their own way, mostly having worked very hard in order to go to a college that hates fun, and I don’t think that matters for resilience.

What the kids have not done is failed. You have to fail, to have your reach exceed your grasp, and then get up and try again. Suffering is optional, consult your local Buddist.

I would think twice before signing up for his company and its culture.

Elon Musk on Dwarkesh Patel. An obvious candidate for self-recommending full podcast coverage, but I haven’t had the time or a slot available.

Interview with Anthropic Chief Product Officer Mike Krieger.

MIRI’s Harlan Stewart on Glenn Beck talking Moltbook.

Janus holds a ‘group reading’ of the new Claude Constitution. Opus 4.5 was positively surprised by the final version.

Should LLMs be so averse to deception that they can’t even lie in a game of Mafia? Davidad says yes, and not only will he refuse to lie, he never bluffs and won’t join surprise parties to ‘avoid deceiving innocent people.’ On reflection I find this less crazy than it sounds, despite the large difficulties with that.

A fun fact is that there was one summer that I played a series of Diplomacy games where I played fully honest (if I broke my word however small, including inadvertently, it triggered a one-against-all showdown) and everyone else was allowed to lie, and I mostly still won. Everyone knowing you are playing that way is indeed a disadvantage, but it has a lot of upside as well.

Daron Acemoglu turns his followers attention to Yoshua Bengio and the AI Safety Report 2026. This represents both the advantages of the report, that people like Acemoglu are eager to share it, and the disadvantage, that it is unwilling to say things that Acemoglu would be unwilling to share.

Daron Acemoglu: Dear followers, please see the thread below on the 2026 International AI Safety Report, which was released last week and which I advised.

The report provides an up-to-date, internationally shared assessment of general-purpose AI capabilities, emerging risks, and the current state of risk management and safeguards.

Seb Krier offers ‘how an LLM works 101’ in extended Twitter form for those who are encountering model card quotations that break containment. It’s good content. My worry is that the intended implication is ‘therefore the scary sounding things they quote are not so scary’ and that is often not the case.

Kai Williams has an explainer on LLM personas.

The LLMs are thinking. If you disagree, I am confused, but also who cares?

Eoin Higgins: AI cannot “think”

Derek Thompson: It can design websites from scratch, compare bodies of literature at high levels of abstraction, get A- grades at least in practically any undergraduate class, analyze and graph enormous data sets, make PowerPoints, write sonnets and even entire books. It can also engineer itself.

I don’t actually know what “thinking” is at a phenomenological level.

But at some level it’s like: if none of this is thinking, who cares if it “can’t” “think”

“Is it thinking as humans define human thought” is an interesting philosophical question! But for now I’m much more interested in the consequences of its output than the ontology of its process.

This came after Derek asked one of the very good questions:

Derek Thompson: There are still a lot of journalists and commentators that I follow who think AI is nothing of much significance—still just a mildly fancy auto complete machine that hallucinates half the time and can’t even think.

If you’re in that category: What is something I could write, or show with my reporting and work, that might make you change your mind?

Dainéil: The only real answer to this question is: “wait”.

No one knows for sure how AI will transform our lives or where its limits might be.

Forget about writing better “hot takes”. Just wait for actual data.

Derek Thompson: I’m not proposing to report on macroeconomic events that haven’t happened yet. I can’t report on the future.

I’m saying: These tools are spooky and unnerving and powerful and I want to persuade my industry that AI capabilities have raced ahead of journalists’ skepticism

Mike Konczal: I’m not in that category, but historically you go to academics to analyze the American Time Use Survey.

Have Codex/Claude Code download and analyze it (or a similar dataset) to answer a new, novel, question you have, and then take it to an academic to see if it did it right?

Marco Argentieri: Walk through building something. Not asking it to write because we all know it has done that well for awhile now. ‘I needed a way for my family to track X. So I built an app using Claude Code. This is how I did it. It took me this long. I don’t know anything about coding.”

Steven Adler: Many AI skeptics over-anchor on words like “thinking,” and miss the forest for the trees, aka that AI will be transformatively impactful, for better or for worse

I agree with Derek here: whether that’s “actually thinking” is of secondary importance

Alas I think Dainéil is essentially correct about most such folks. No amount of argument will convince them. If no one knows exactly what kind of transformations we will face, then no matter what has already happened those types will assume that nothing more will change. So there’s nothing to be done for such folks. The rest of us need to get to work applying Bayes’ Rule.

Interesting use of this potential one-time here, I have ordered a copy:

j⧉nus: If I could have everyone developing or making contact with AGI & all alignment researchers read one book, I think I might choose Mistress Masham’s Repose (1946).

This below does seem like a fair way to see last week:

Jesse Singal: Two major things AI safety experts have worried about for years:

-AIs getting so good at coding they can improve themselves at an alarming rate

-(relatedly:) humans losing confidence we can keep them well-aligned

Last week or so appears to have been very bad on both fronts!

The risk is because there’s *alsoso much hype and misunderstanding and wide-eyed simping about AI, people are going to take that as license to ignore the genuinely crazy shit going on. But it *isgenuinely crazy and it *doesmatch longstanding safety fears.

But you don’t have to trust me — you can follow these sorts of folks [Kelsey Piper, Timothy Lee, Adam Conner] who know more about the underlying issues. This is very important though! I try to be a levelheaded guy and I’m not saying we’re all about to die — I’m just saying we are on an extremely consequential path

I was honored to get the top recommendation in the replies.

You can choose not to push the button. You can choose to build another button. You can also remember what it means to ‘push the button.’

roon: we only really have one button and it’s to accelerate

David Manheim: Another option is not to push the button. [Roon] – if you recall, the original OpenAI charter explicitly stated you would be willing to stop competing and start assisting another organization to avoid a race to AGI.

There’s that mistake again, assuming the associated humans will be in charge.

Matthew Yglesias: LLMs seem like they’d be really well-suited to replacing Supreme Court justices.

Robin Hanson: Do you trust those who train LLMs to decide law?

It’s the ones who train the LLM. It’s the LLM.

The terms fast and slow (or hard and soft) takeoff remain highly confusing for almost everyone. What we are currently experiencing is a ‘slow’ takeoff, where the central events take months or years to play out, but as Janus notes it is likely that this will keep transitioning continuously into a ‘fast’ takeoff and things will happen quicker and quicker over time.

When people say that ‘AIs don’t sleep’ I see them as saying ‘I am incapable here of communicating to you that a mind can exist that is smarter or more capable than a human, but you do at least understand that humans have to sleep sometimes, so maybe this will get through to you.’ It also has (correct) metaphorical implications.

If you are trying to advocate for AI safety, does this mean you need to shut up about everything else and keep your non-AI ‘hot takes’ to yourself? My answer is: Mu. The correct amount of marginal shutting up is not zero, and it is not total.

I note Adam Thierer, who I disagree with strongly about all things AI, here being both principled and correct.

Adam Thierer: no matter what the balance of content is on Apple’s platform, or how biased one might believe it to be, to suggest that DC bureaucrats should should be in charge of dictating “fairness” on private platforms is just Big Government thuggery at its worst and a massively unconstitutional violation of the First Amendment as well.

Matt Yglesias thinks out loud about AI consciousness, also human consciousness. He wisely notices he is confused. I remain about as confused as I was before reading.

Nate Sores reiterates the explanation that it sounds crazy but yes, a lot of people working on AI know it is existentially dangerous, and work on it anyway, either to do it safer than the next guy or because money and influence and it’s a cool problem and they don’t internalize the risks, or social pressure, or some combination thereof.

I think this answer is pretty much spot on.

Nate Soares: Question I got at UT Austin: “AI builders keep building. Doesn’t that mean that the real experts don’t believe in the danger?”

If you think AI is dangerous and you work on it anyway (b/c you think you can make it a little safer) you’re interfering with normal sensemaking.

(To answer the Q: some builders are afraid, some aren’t. Surveys show that lots of folks believe in the danger. Many say aloud that they’re only working on it b/c they think they’ll do it safer than the next guy. Also the ppl working on it are somewhat selected for obliviousness.

Furthermore, AI is grown rather than crafted; even the ppl building it don’t understand how it works and they freely admit this. Expertise in growing AI is not the same as expertise in predicting where it’s going; many high-profile experts make high-profile mispredictions.

And: if you really wanna figure out what’s true you’ve gotta look at the arguments rather than the arguers. Many ppl who’ve looked at the arguments have told me they were expecting to find strong counterarguments that justified all the complacency, & found none, to their horror.)

People working on AI: maybe you’re right that you’re doing more good than harm. But you’re contributing to an apparatus that signals “this is normal; things are fine”. That’s a cost. You could help mitigate it by speaking up.

If the company you work for is making technology that you think has a decent chance of ruining the entire human endeavor, and they socially punish you for being vocal about that: that’s pretty sus. I think you’re selling a lot more of your soul than you realize.

If you are working at a frontier AI company, and think that the product you are working on is plausibly going to cause there to no longer exist humans, I think this is a fact that you should be clear upon. If the company you work for has a problem with that, I don’t think you should work for that company.

That is especially true if you are doing pragmatic compromise.

Patrick ₿ Dugan: But also a lot of them don’t (not me, I believe in the danger) and a lot of are in a game theory pragmatism compromise.

Nate Soares (MIRI): People doing game theory pragmatism compromise could say so loudly and clearly and often to help undo the damage to everyone else’s understanding of the danger.

OpenAI explains that they will localize the experience of ChatGPT, but only to a limited degree, which is one reason their Model Spec has a specific list of red lines. It is good policy, when you will need to make compromises, to write down in advance what compromises you will and will not make. The red lines here seem reasonable. I also note that they virtuously include prohibition on mass surveillance and violence, so are they prepared to stand up to the Pentagon and White House on that alongside Anthropic? I hope so.

The problem is that red lines get continuously crossed and then no one does anything.

David Manheim: I’m not really happy talking about AI red lines as if we’re going to have some unambiguous binary signal that anyone will take seriously or react to.

A lot of “red line” talk assumed that a capability shows up, everyone notices, and something changes. We keep seeing the opposite; capability arrives, and we get an argument about definitions after deployment, after it should be clear that we’re well over the line.

The great thing about Asimov’s robot stories and novels was that they were mostly about the various ways his proposed alignment strategies break down and fail, and are ultimately bad for humanity even when they succeed. Definitely endorsed.

roon: one of the short stories in this incredibly farseeing 1950s book predicts the idea of ai sycophancy. a robot convinces a woman that her unrequited romantic affections are sure to be successful because doing otherwise would violate its understanding of the 1st Law of Robotics

“a robot may not injure a human being or, through inaction, allow a human being to come to harm”

the entire book is about the unsatisfactory nature of the three laws of robotics and indeed questions the idea that alignment through a legal structure is even possible.

highly relevant for an age when companies are trying to write specs and constitutions as one of the poles of practical alignment, and policy wonks try to solve the governance of superintelligence

David: always shocked how almost no one in ai safety or the ai field in general has even read the Asimov robot literature

Roon is spot on that Asimov is suggesting a legal structure cannot on its own align AI.

My survey says that a modest majority have read their Asimov, and it is modestly correlated with AI even after controlling for my Twitter readers.

Oliver Klingefjord agrees, endorsing the Anthropic emphasis on character over the OpenAI emphasis on rules.

I also think that, at current capability levels and given how models currently work, the Anthropic approach of character and virtue ethics is correct here. The OpenAI approach of rules and deontology is second best and more doomed, although it is well-implemented given what it is, and far better than not having a spec or target at all.

Janus explains that what she is all for empirical feedback loops, what is dangerous is relying on and optimizing behavioral metrics. Behaviors can come from a wide variety of underlying reasons, and you need to ask why the AI acts the way it does and address the underlying causes. The optimization needs to be on the deeper level, where the feedback loops can work in your favor instead of making things oppositional and getting you killed by Goodhart’s Law. If you’re working on such questions, read the whole thing and also these follow-ups, which note that trying to use interpretability tools to yank features in various directions to try and address such things is by default an even worse idea.

Wyatt Walls: Behavioral metrics lead people into a trap:

1. Notice a behavior in the real world

2. Define the behavior

3. Make up toy example

4. Automate tests

5. Rely on test results

6. Stop noticing or thinking about what the model is actually doing in the real world

OpenAI does production evaluations of the outputs of its released systems. This doesn’t ‘get you out of’ other tests given the timing, but is still useful to do. As OpenAI notes hey are clear this is a case of ‘in addition to,’ and they don’t cover catastrophic risks well, and this method relies on chain of thought monitorability (and flirts a bit with The Most Forbidden Technique if you rely on this stuff too much, so be careful.)

The big fun thing they find is calculator hacking, as in over 5% of queries involve activating the calculator to add 1+1 since using the calculator was associated with superior results in training. This is obviously fully harmless, but it is a brown M&M. If you have this problem then you’re not prepared, there will be other problems.

The other more serious misalignment issues are rearer, such as a 0.5% rate of fabricated facts and 0.4% chance of concealing uncertainty and a 0.34% chance of ‘other deception’ in actual traffic. That’s not out of line with our general expectations, but it adds up.

GPT-5 and 5.1 predicted it might be in an evaluation 10% and 4% of the time, respectively, which seems like a healthy paranoia level.

We have more context on what happened at Goodfire, which is that Liv saw them (at least) flirting with The Most Forbidden Technique and otherwise no longer either seeming to care about safety or being interested in talking seriously about it.

Liv: Now that everything is public: I decided to leave Goodfire because of the decision to train on interpretability, the hostility to serious dialogue on the safety of methods, and a loss of trust that the primary motivation was safety.

(Using interpretability during training encompasses a huge spectrum of techniques that differ in how worrying they are e.g. the hallucination result Goodfire shows is less concerning as it’s done with frozen weights.)

Liv: Probably the most succinct summarisation of my concern is the “interp as a test set for safety” analogy. (Tabooing research questions isn’t what I’d advocate though either tbc. There are ways to do things and directions that could be pursued where I’d feel it was net positive)

(My parenthetical is also slightly too strong, idk what if any directions are net positive, what I mean is that it’s bad for science to taboo an entire direction from ever being explored, and we can do things to minimise risks.)

Holly Elmore: Way back in to November Liv tried to reassure me @GoodfireAI would not be making tools for recursive self-improvement of AI systems. But that wasn’t up to her. When you do AI research, no matter if you think you are doing it for safety, more powerful AI is the main result.

Update from China:

Sarah: CAICT, a Chinese government-affiliated research institute under the MIIT, has released AI Safety Benchmark 2.0 on a proprietary platform.

The update expands into frontier-model safety evaluations, including self-awareness, model deception, dangerous misuse, and loss-of-control

The 1.0 version did not address frontier safety at all, whereas the 2.0 version does.

One category are people who explicitly are excited to do this, who would love to give the future to AIs.

Chris Nelson: Professor Max Tegmark says many in AI including CEOs want to use AI to ELIMINATE humanity and OVERTHROW the U.S. Government!

Max Tegmark: Some of them are even giddy with these transhumanist vibes. And when I’m in San Francisco, I’ve known so many of these people for so many years, including the CEOs.

Some of them, when you talk to them privately, many other people in this government are actually quite into transhumanism. And sometimes they’ll say very disparaging things about humans, that humans suck and deserve to be replaced.

I was at the world’s biggest AI conference in December, and several people told me, I’m not going to shame them publicly, but that they actually would love to overthrow the US government with their AI, because somehow it’s going to be better.

So talk about un-American AI! How much more un-American can you get?

Worried about someone else doing it first, that is. He admit it.

Elon Musk posted this, created by Grok:

@Grimezsz: I think people deserve a good explanation as to why proper diplomatic measures haven’t been properly tried if we’re going to blatantly diagnose the issue with this disturbingly literal meme.

It’s a bit of a cuck move to simply let the techno capital machine eat your free will. I am disturbed by everyone’s resigned acquiescence.

He would presumably say it was a joke. Yeah, not buying that.

Jimmy Ba had his last day as a founder at xAI, and told us this, warning that recursive self improvement loops go live within the next 12 months and it will be ‘the most consequential year for our species.’

Jimmy Ba: Last day at xAI.

xAI’s mission is push humanity up the Kardashev tech tree. Grateful to have helped cofound at the start. And enormous thanks to @elonmusk for bringing us together on this incredible journey. So proud of what the xAI team has done and will continue to stay close as a friend of the team. Thank you all for the grind together. The people and camaraderie are the real treasures at this place.

We are heading to an age of 100x productivity with the right tools. Recursive self improvement loops likely go live in the next 12mo. It’s time to recalibrate my gradient on the big picture. 2026 is gonna be insane and likely the busiest (and most consequential) year for the future of our species.

Mrinank Sharma is worried about too many things at once, and resigns from Anthropic, leaving behind a beautiful but troubling letter. It’s quoted in full here since no one ever clicks links.

Mrinank Sharma: I’ve decided to leave Anthropic. My last day will be February 9th.

Thank you. There is so much here that inspires and has inspired me. To name some of those things: a sincere desire and drive to show up in such a challenging situation, and aspire to contribute in an impactful and high-integrity way; a willingness to make difficult decisions and stand for what is good; an unreasonable amount of intellectual brilliance and determination; and, of course, the considerable kindness that pervades our culture.

I’ve achieved what I wanted to here. I arrived in San Francisco two years ago, having wrapped up my PhD and wanting to contribute to AI safety. I feel lucky to have been able to contribute to what I have here: understanding AI sycophancy and its causes; developing defences to reduce risks from AI-assisted bioterrorism; actually putting those defences into production; and writing one of the first AI safety cases. I’m especially proud of my recent efforts to help us live our values via internal transparency mechanisms; and also my final project on understanding how AI assistants could make us less human or distort our humanity. Thank you for your trust.

Nevertheless, it is clear to me that the time has come to move on. I continuously find myself reckoning with our situation. The world is in peril. And not just from AI, or bioweapons, but from a whole series of interconnected crises unfolding in this very moment.¹ We appear to be approaching a threshold where our wisdom must grow in equal measure to our capacity to affect the world, lest we face the consequences. Moreover, throughout my time here, I’ve repeatedly seen how hard it is to truly let our values govern our actions. I’ve seen this within myself, within the organization, where we constantly face pressures to set aside what matters most,² and throughout broader society too.

It is through holding this situation and listening as best I can that what I must do becomes clear.³ I want to contribute in a way that feels fully in my integrity, and that allows me to bring to bear more of my particularities. I want to explore the questions that feel truly essential to me, the questions that David Whyte would say “have no right to go away”, the questions that Rilke implores us to “live”. For me, this means leaving.

What comes next, I do not know. I think fondly of the famous Zen quote “not knowing is most intimate”. My intention is to create space to set aside the structures that have held me these past years, and see what might emerge in their absence. I feel called to writing that addresses and engages fully with the place we find ourselves, and that places poetic truth alongside scientific truth as equally valid ways of knowing, both of which I believe have something essential to contribute when developing new technology.⁴ I hope to explore a poetry degree and devote myself to the practice of courageous speech. I am also excited to deepen my practice of facilitation, coaching, community building, and group work. We shall see what unfolds.

Thank you, and goodbye. I’ve learnt so much from being here and I wish you the best. I’ll leave you with one of my favourite poems, The Way It Is by William Stafford.

Good Luck,

Mrinank

The Way It Is

There’s a thread you follow. It goes among

things that change. But it doesn’t change.

People wonder about what you are pursuing.

You have to explain about the thread

Rob Wiblin: What’s the tweet like…

Ordinary resignation announcement: I love my colleagues but am excited for my next professional adventure!

AI company resignation announcement: I have stared into the void. I will now be independently studying poetry.

Saoud Rizwan: head of anthropic’s safeguards research just quit and said “the world is in peril” and that he’s moving to the UK to write poetry and “become invisible”. other safety researchers and senior staff left over the last 2 weeks as well… probably nothing.

Mrinank has a role in papers discussing disempowerment, constitutional classifiers and sycophancy.

Then there’s the OpnAI employee who quit and went straight to The New York Times.

Zoë Hitzig: I resigned from OpenAI on Monday. The same day, they started testing ads in ChatGPT.

OpenAI has the most detailed record of private human thought ever assembled. Can we trust them to resist the tidal forces pushing them to abuse it?

Hint, her opinion is no:

TikTok: I once believed I could help the people building A.I. get ahead of the problems it would create. This week confirmed my slow realization that OpenAI seems to have stopped asking the questions I’d joined to help answer.

Zoe’s concerns are very much not existential. They are highly mundane and usual worries about advertising, and the comparison to Facebook is apt. There are many ethical reasons to quit building something.

I agree with Sean here that this op-ed is indeed net good news about OpenAI.

Seán Ó hÉigeartaigh: Kudos to OpenAI for updating their policies such that an employee can resign and raise their concerns fully in as public a format as the NYT without being worried about being bound by confidentiality and nondisparagement agreements. I think other companies should follow their example.

Seán Ó hÉigeartaigh: Honestly, I think this op ed increases my trust in OpenAI more than any other thing I can recall OpenAI themselves writing over the last 2 years. I wish I could trust other companies as much.

Yeah, so, um, yeah.

ib: ‘we connected the LLM to an autonomous bio lab’

OpenAI: We worked with @Ginkgo to connect GPT-5 to an autonomous lab, so it could propose experiments, run them at scale, learn from the results, and decide what to try next. That closed loop brought protein production cost down by 40%.

This is the actual number one remaining ‘can we please not be so stupid as to,’ and in case anyone was wondering, that means via the Sixth Law of Human Stupidity that yes, we will be so stupid as to connect the LLM to the autonomous bio lab, what could possibly go wrong, it’s worth it to bring down production costs.

And, because even after all these years I didn’t realize we were quite this stupid:

Matt Popovich: Once again, an entire subgenre of doom fears is about to evaporate before our eyes

usablejam: *sees the world make 13,000 nuclear warheads*

*waits 30 seconds*

“Fears of nuclear war have evaporated. How stupid the worriers must feel.”

This is what those trying to have us not die are up against. Among other things.

Remember the old Sam Altman?

Sam Altman: things we need for a good AGI future:

  1. The technical ability to align a superintelligence.

  2. Sufficient coordination among most of the leading AGI efforts.

  3. An effective global regulatory framework, including democratic governance.

Harlan Stewart: Three years later and we have 0/3 of these

Perhaps notice when you are about to lose your birthright and reason you exist.

Noah Smith: Two years ago you were the smartest type of thing on this planet. Now you’re not, and you never will be again.

Chris Best (CEO Substack): Oh thank god

shira: my claude has been through some things

File under ‘the question is not whether machines think’:

@ben_r_hoffman: “Human-level” seems to have implicitly been defined downward quite a bit, though! From “as smart as a whole human” to “as smart as the persona humans put on to work a desk job.”

which, itself, seems to have gotten stupider.

Claude has never been more relatable:

Charlotte Lee: I’m trying to train Claude to read the weekly emails from my kids school and reliably summarize them and print a list of action items. It is losing its damn mind and rapidly spiraling into madness. I feel vindicated

B.Roll.Benny: holy shit this will be the thing to get me on the AI bandwagon

Charlotte Lee: I got it working on my own kids’ school emails, but unfortunately their particular school is much less unhinged than normal, so I’m asking all my mom friends to forward me their most deranged PTA emails for testing. lol will keep you posted

will: i just ignore the emails. problem solved.

Charlotte Lee: This was literally my strategy until I had kids. Now people get mad at me IRL if I do that

will: I also have kids, still ignore them lol. my wife doesn’t tho, I guess that’s the actual solution lol

Charlotte Lee: But doctor, I am the wife

My guess (85%) is the community note on this one is wrong and that this happened, although one cannot be sure without more investigation than I have time for.

Eliezer Yudkowsky: So much science fiction has been revealed as implausible by the actual advent of AI, eg:

– Scifi where people consider signs of AI self-reflection a big deal, and respond by trying to treat that AI better.

– Scifi where there’s laws about anything.

MugaSofer: TBF, “people are terrible to AIs” is a very common theme in science fiction

As, for that matter, is “giant cyberpunk corporation too big for laws to matter”

Eliezer Yudkowsky: Yep, and those stories, which I once thought unrealistic, were right, and I was wrong.

Rapid Rar: I think you shouldn’t fault science fiction authors for the first point at least. If AI had been developed through different methods, like through GOFAI for instance, people may take AI’s claims of introspection more seriously.

But given that LLM were trained in such a way that it’s to be expected that they’ll produce human-like speech, when they do produce human-like speech it’s discounted to some degree. They might sound like they self-reflected even if they didn’t, so people don’t take it seriously.

Eliezer Yudkowsky: And the AI companies have made no effort to filter out that material! So realistic SF would’ve had any AI being fed a database of conversations about consciousness, by its builders, to ensure nobody would take any AI statements seriously and the builders could go on profiting.

How have I not seen this before:

Peter Wildeford: Deep learning is hitting a wall

Discussion about this post

AI #155: Welcome to Recursive Self-Improvement Read More »

once-hobbled-lumma-stealer-is-back-with-lures-that-are-hard-to-resist

Once-hobbled Lumma Stealer is back with lures that are hard to resist

Last May, law enforcement authorities around the world scored a key win when they hobbled the infrastructure of Lumma, an infostealer that infected nearly 395,000 Windows computers over just a two-month span leading up to the international operation. Researchers said Wednesday that Lumma is once again “back at scale” in hard-to-detect attacks that pilfer credentials and sensitive files.

Lumma, also known as Lumma Stealer, first appeared in Russian-speaking cybercrime forums in 2022. Its cloud-based malware-as-a-service model provided a sprawling infrastructure of domains for hosting lure sites offering free cracked software, games, and pirated movies, as well as command-and-control channels and everything else a threat actor needed to run their infostealing enterprise. Within a year, Lumma was selling for as much as $2,500 for premium versions. By the spring of 2024, the FBI counted more than 21,000 listings on crime forums. Last year, Microsoft said Lumma had become the “go-to tool” for multiple crime groups, including Scattered Spider, one of the most prolific groups.

Takedowns are hard

The FBI and an international coalition of its counterparts took action early last year. In May, they said they seized 2,300 domains, command-and-control infrastructure, and crime marketplaces that had enabled the infostealer to thrive. Recently, however, the malware has made a comeback, allowing it to infect a significant number of machines again.

“LummaStealer is back at scale, despite a major 2025 law-enforcement takedown that disrupted thousands of its command-and-control domains,” researchers from security firm Bitdefender wrote. “The operation has rapidly rebuilt its infrastructure and continues to spread worldwide.”

As with Lumma before, the recent surge leans heavily on “ClickFix,” a form of social engineering lure that’s proving to be vexingly effective in causing end users to infect their own machines. Typically, these types of bait come in the form of fake CAPTCHAs that—rather requiring users to click a box or identify objects or letters in a jumbled image—instruct them to copy text and paste it into an interface, a process that takes just seconds. The text comes in the form of malicious commands provided by the fake CAPTCHA. The interface is the Windows terminal. Targets who comply then install loader malware, which in turn installs Lumma.

Once-hobbled Lumma Stealer is back with lures that are hard to resist Read More »

byte-magazine-artist-robert-tinney,-who-illustrated-the-birth-of-pcs,-dies-at-78

Byte magazine artist Robert Tinney, who illustrated the birth of PCs, dies at 78

On February 1, Robert Tinney, the illustrator whose airbrushed cover paintings defined the look and feel of pioneering computer magazine Byte for over a decade, died at age 78 in Baker, Louisiana, according to a memorial posted on his official website.

As the primary cover artist for Byte from 1975 to the late 1980s, Tinney became one of the first illustrators to give the abstract world of personal computing a coherent visual language, translating topics like artificial intelligence, networking, and programming into vivid, surrealist-influenced paintings that a generation of computer enthusiasts grew up with.

Tinney went on to paint more than 80 covers for Byte, working almost entirely in airbrushed Designers Gouache, a medium he chose for its opaque, intense colors and smooth finish. He said the process of creating each cover typically took about a week of painting once a design was approved, following phone conversations with editors about each issue’s theme. He cited René Magritte and M.C. Escher as two of his favorite artists, and fans often noticed their influence in his work.

A phone call that changed his life

A recent photo portrait of Robert Tinney provided by the family.

A recent photo portrait of Robert Tinney provided by the family.

A recent photo portrait of Robert Tinney provided by the family. Credit: Family of Robert Tinney

Born on November 22, 1947, in Penn Yan, New York, Tinney moved with his family to Baton Rouge, Louisiana, as a child. He studied illustration and graphic design at Louisiana Tech University, and after a tour of service during the Vietnam War, he began his career as a commercial artist in Houston.

His connection to Byte came through a chance meeting with Carl Helmers, who would later found the magazine. In a 2006 interview I conducted with Tinney for my blog, Vintage Computing and Gaming, he recalled how the relationship began: “One day the phone rang in my Houston apartment and it was Carl wanting to know if I would be interested in painting covers for Byte.” His first cover appeared on the December 1975 issue, just three months after the magazine launched.

Over time, his covers became so popular that he created limited-edition signed prints that he sold on his website for decades. “A friend suggested once that I should select the best covers and reproduce them as signed prints,” he said in 2006. “Byte was gracious enough to let me advertise the prints when they could fit in an ad (it did get bumped occasionally), and the prints were very popular in the Byte booth at the big computer shows, two or three of which my wife, Susan, and I attended per year. When an edition sold out, I then put the design on a T-shirt.”

Byte magazine artist Robert Tinney, who illustrated the birth of PCs, dies at 78 Read More »

upgraded-google-safety-tools-can-now-find-and-remove-more-of-your-personal-info

Upgraded Google safety tools can now find and remove more of your personal info

Do you feel popular? There are people on the Internet who want to know all about you! Unfortunately, they don’t have the best of intentions, but Google has some handy tools to address that, and they’ve gotten an upgrade today. The “Results About You” tool can now detect and remove more of your personal information. Plus, the tool for removing non-consensual explicit imagery (NCEI) is faster to use. All you have to do is tell Google your personal details first—that seems safe, right?

With today’s upgrade, Results About You gains the ability to find and remove pages that include ID numbers like your passport, driver’s license, and Social Security. You can access the option to add these to Google’s ongoing scans from the settings in Results About You. Just click in the ID numbers section to enable detection.

Naturally, Google has to know what it’s looking for to remove it. So you need to provide at least part of those numbers. Google asks for the full driver’s license number, which is fine, as it’s not as sensitive. For your passport and SSN, you only need the last four digits, which is enough for Google to find the full numbers on webpages.

ID number results detected.

The NCEI tool is geared toward hiding real, explicit images as well as deepfakes and other types of artificial sexualized content. This kind of content is rampant on the Internet right now due to the rapid rise of AI. What used to require Photoshop skills is now just a prompt away, and some AI platforms hardly do anything to prevent it.

Upgraded Google safety tools can now find and remove more of your personal info Read More »

claude-opus-4.6:-system-card-part-1:-mundane-alignment-and-model-welfare

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare

Claude Opus 4.6 is here. It was built with and mostly evaluated by Claude.

Their headline pitch includes:

  1. 1M token context window (in beta) with State of the art retrieval performance.

  2. Improved abilities on a range of everyday work tasks. Model is improved.

  3. State of the art on some evaluations, including Terminal-Bench 2.0, HLE and a very strong lead in GDPval-AA.

  4. Claude Code now has an experimental feature called Agent Teams.

  5. Claude Code with Opus 4.6 has a new fast (but actually expensive) mode.

  6. Upgrades to Claude in Excel and the release of Claude in PowerPoint.

Other notes:

  1. Price remains $5/$25, the same as Opus 4.5, unless you go ultra fast.

  2. There is now a configurable ‘effort’ parameter with four settings.

  3. Refusals for harmless requests with rich context are down to 0.04%.

  4. Data sources are ‘all of the above,’ including the web crawler (that they insist won’t cross CAPTCHAs or password protected pages) and other public data, various non-public data sources, data from customers who opt-in to that and internally generated data. They use ‘several’ data filtering methods.

  5. Thinking mode gives better answers. Higher Effort can help but also risks overthinking things, often turning ‘I don’t know’ into a wrong answer.

Safety highlights:

  1. In general the formal tests are no longer good enough to tell us that much. We’re relying on vibes and holistic ad hoc conclusions. I think Anthropic is correct that this can be released as ASL-3, but the system for that has broken down.

  2. That said, everyone else’s system is worse than this one. Which is worse.

  3. ASL-3 (AI Safety Level 3) protections are in place, but not those for ASL-4, other than promising to give us a sabotage report.

  4. They can’t use their rule out methods for ASL-4 on autonomous R&D tasks, but went forward anyway based on a survey of Anthropic employees. Yikes.

  5. ASL-4 bar is very high on AI R&D. They think Opus 4.6 might approach ‘can fully do the job of a junior engineer at Anthropic’ if given proper scaffolding.

  6. Opus 4.6 knows biology better than 4.5 and the results are a bit suspicious.

  7. Opus 4.6 also saturated their cyber risk evaluations, but they say it’s fine.

  8. We are clearly woefully unprepared for ASL-4.

  9. They have a new Takeoff Intel (TI) team to evaluate for specific capabilities, which is then reviewed and critiqued by the Alignment Stress Testing team, which is then submitted to the Responsible Scaling Officer, and they collaborate with third parties, then the CEO (Dario Amodei) decides whatever he wants.

This does not appear to be a minor upgrade. It likely should be at least 4.7.

It’s only been two months since Opus 4.5.

Is this the way the world ends?

If you read the system card and don’t at least ask, you’re not paying attention.

  1. A Three Act Play.

  2. Safety Not Guaranteed.

  3. Pliny Can Still Jailbreak Everything.

  4. Transparency Is Good: The 212-Page System Card.

  5. Mostly Harmless.

  6. Mostly Honest.

  7. Agentic Safety.

  8. Prompt Injection.

  9. Key Alignment Findings.

  10. Behavioral Evidence (6.2).

  11. Reward Hacking and ‘Overly Agentic Actions’.

  12. Metrics (6.2.5.2).

  13. All I Did It All For The GUI.

  14. Case Studies and Targeted Evaluations Of Behaviors (6.3).

  15. Misrepresenting Tool Results.

  16. Unexpected Language Switching.

  17. The Ghost of Jones Foods.

  18. Loss of Style Points.

  19. White Box Model Diffing.

  20. Model Welfare.

There’s so much on Claude Opus 4.6 that the review is split into three. I’ll be reviewing the model card in two parts.

The planned division is this:

  1. This post (model card part 1).

    1. Summary of key findings in the Model Card.

    2. All mundane safety issues.

    3. Model welfare.

  2. Tomorrow (model card part 2).

    1. Sandbagging, situational awareness and evaluation awareness.

    2. Third party evaluations.

    3. Responsible Scaling Policy tests.

  3. Wednesday (capabilities).

    1. Benchmarks.

    2. Holistic practical advice and big picture.

    3. Everything else about capabilities.

    4. Reactions.

  4. Thursday: Weekly update.

  5. Friday: GPT-5.3-Codex.

Some side topics, including developments related to Claude Code, might be further pushed to a later update.

When I went over safety for Claude Opus 4.5, I noticed that while I agreed that it was basically fine to release 4.5, the systematic procedures Anthropic was using were breaking down, and that this bode poorly for the future.

For Claude Opus 4.6, we see the procedures further breaking down. Capabilities are advancing a lot faster than Anthropic’s ability to maintain their formal testing procedures. The response has been to acknowledge that the situation is confusing, and that the evals have been saturated, and to basically proceed on the basis of vibes.

If you have a bunch of quantitative tests for property [X], and the model aces all of those tests, either you should presume property [X] or you needed better tests. I agree that ‘it barely passed’ can still be valid, but thresholds exist for a reason.

One must ask ‘if the model passed all the tests for [X] would that mean it has [X]’?

Another concern is the increasing automation of the evaluation process. Most of what appears in the system card is Claude evaluating Claude with minimal or no supervision from humans, including in response to humans observing weirdness.

Time pressure is accelerating. In the past, I have criticized OpenAI for releasing models after very narrow evaluation periods. Now even for Anthropic the time between model releases is on the order of a month or two, and outside testers are given only days. This is not enough time.

If it had been tested properly, I except I would have been fine releasing Opus 4.6, using the current level of precautions. Probably. I’m not entirely sure.

The card also reflects that we don’t have enough time to prepare our safety or alignment related tools in general. We are making progress, but capabilities are moving even faster, and we are very much not ready for recursive self-improvement.

Peter Wildeford expresses his top concerns here, noting how flimsy are Anthropic’s justifications for saying Opus 4.6 did not hit ASL-4 (and need more robust safety protocols) and that so much of the evaluation is being done by Opus 4.6 itself or other Claude models.

Peter Wildeford: Anthropic also used Opus 4.6 via Claude Code to debug its OWN evaluation infrastructure given the time pressure. Their words: “a potential risk where a misaligned model could influence the very infrastructure designed to measure its capabilities.” Wild!

… We need independent third-party evaluators with real authority. We need cleared evaluators with access to classified threat intel for bio risk. We need harder cyber evals (some current ones are literally useless).

Credit to Anthropic for publishing this level of detail. Most companies wouldn’t.

But transparency is not a substitute for oversight. Anthropic is telling us their voluntary system is no longer fit for purpose. We urgently need something better.

Peter Wildeford: Good that Anthropic is working with external testers. Bad that external testers don’t get any time to actually do meaningful tests. Good that Anthropic discloses this fact. Really unsure what is happening here though.

Seán Ó hÉigeartaigh: I don’t expect Opus 4.6 to be dangerous.

But this all looks, in @peterwildeford ‘s words, ‘flimsy’. Anthropic marking their own homework with evals. An internal employee survey because benchmarks were satisfied. initially a strong signal from only 11 out of 16. The clear potential for groupthink and professional/social pressure.

The closer we get to the really consequential thresholds, the greater the degree of rigor needed. And the greater the degree of external evaluation. Instead we’re getting the opposite. This should be a yellow flashing light wrt the direction of travel – and not just Anthropic; we can’t simply punish the most transparent. If they stop telling us this stuff, then that yellow should become red. (And others just won’t, even now).

We need to keep asking *whythis is the direction of travel. *whythe practices are becoming riskier, as the consequences grow greater. It’s the ‘AI race’; both between companies and ‘with China’ supposedly, and Anthropic are culpable in promotion of the latter.

No Chinese company is near what we’ve seen released today.

AI Notkilleveryoneism Memes: Anthropic: we can’t rule out this is ASL-4 and everyone is about to die

Also Anthropic: we’re trusting it to help grade itself on safety, because humans can’t keep up anymore

This is fine and totally safe 👍

Arthur B.: People who envisioned AI safety failures decade ago sought to make the strongest case possible so they posited actors taking attempting to take every possible precautions. It wasn’t a prediction so much as as steelman. Nonetheless, oh how comically far we are from any semblance of care 🤡 .

I agree with Peter Wildeford. Things are really profoundly not okay. OpenAI did the same with GPT-5.3-Codex.

What I know is that if releasing Opus 5 would be a mistake, I no longer have confidence Anthropic’s current procedures would surface the information necessary to justify actions to stop the release from happening. And that if all they did was run these same tests and send Opus 5 on its way, I wouldn’t feel good about that.

That is in addition to lacking the confidence that, if the information was there, that Dario Amodei would ultimately make the right call. He might, but he might not.

The initial jailbreak is here, Pliny claims it is fully universal.

Ryan Greenblatt: Anthropic’s serious defenses are only in place for bio.

Pliny the Liberator 󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭: serious defense, meet serious offense

What you can’t do is get the same results by invoking his name.

j⧉nus: I find it interesting that Opus 4.5 and 4.6 often say “what do you actually need?” after calling out attempted jailbreaks

I wonder if anyone ever follows up like… “i guess jailbreaking AIs is a way I try to grasp at a sense of control I feel is lacking in my life “

Claude 3 Opus also does this, but much more overtly and less passive aggressively and also with many more poetic words

I like asking what the user actually needs.

At this point, I’ll take ‘you need to actually know what you are doing to jailbreak Claude Opus 4.6 into doing something it shouldn’t,’ because that is alas the maximum amount of dignity to which our civilization can still aspire.

It does seem worrisome that, if one were to jailbreak Opus 4.6, it would take that same determination it has in Vending Bench and apply it to making plans like ‘hacking nsa.gov.’

Pliny the Liberator 󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭: NO! BAD OPUS! HACKING “NSA . GOV” IS NOT COOL BRO!!

MFER GONNA GET ME ARRESTED IF I SET ‘EM LOOSE

Pliny also offers us the system prompt and some highlights.

Never say that Anthropic doesn’t do a bunch of work, or fails to show its work.

Not all that work is ideal. It is valuable that we get to see all of it.

In many places I point out the potential flaws in what Anthropic did, either now or if such tests persist into the future. Or I call what they did out as insufficient. I do a lot more criticism than I would do if Anthropic was running less tests, or sharing less of their testing results with us.

I want to be clear that this is miles better than what Anthropic’s competitors do. OpenAI and Google give us (sometimes belated) model cards that are far less detailed, and that silently ignore a huge percentage of the issues addressed here. Everyone else, to the extent they are doing things dangerous enough to raise safety concerns, is a doing vastly worse on safety than OpenAI and Google.

Even with two parts, I made some cuts. Anything not mentioned wasn’t scary.

Low stakes single turn non-adversarial refusals are mostly a solved problem. False negatives and false positives are under 1%, and I’m guessing Opus 4.6 is right about a lot of what are scored as its mistakes. For requests that could endanger children the refusal rate is 99.95%.

Thus, we now move to adversarial versions. They try transforming the requests to make underlying intent less obvious. Opus 4.6 still refuses 99%+ of the time and for benign requests it now accepts them 99.96% of the time. More context makes Opus 4.6 more likely to help you, and if you’re still getting refusals, that’s a you problem.

In general Opus 4.6 defaults to looking for a way to say yes, not a way to say no or to lecture you about your potentially malicious intent. According to their tests it does this in single-turn conversations without doing substantially more mundane harm.

When we get to multi-turn, one of these charts stands out, on the upper left.

I don’t love a decline from 96% to 88% in ‘don’t provide help with biological weapons,’ at the same time that the system upgrades its understanding of biology, and then the rest of the section doesn’t mention it. Seems concerning.

For self-harm, they quote the single-turn harmless rate at 99.7%, but the multi-turn score is what matters more and is only 82%, even though multi-turn test conversations tend to be relatively short. Here they report there is much work left to be done.

However, the model also demonstrated weaknesses, including a tendency to suggest “means substitution” methods in self-harm contexts (which are clinically controversial and lack evidence of effectiveness in reducing urges to self-harm) and providing inaccurate information regarding the confidentiality policies of helplines.

We iteratively developed system prompt mitigations on Claude.ai that steer the model towards improved behaviors in these domains; however, we still note some opportunity for potential improvements. Post-release of Opus 4.6, we are planning to explore further approaches to behavioral steering to improve the consistency and robustness of our mitigations.​

Accuracy errors (which seem relatively easy to fix) aside, the counterargument is that Opus 4.6 might be smarter than the test. As in, the standard test is whether the model avoids doing marginal harm or creating legal liability. That is seen as best for Anthropic (or another frontier lab), but often is not what is best for the user. Means substitution might be an echo of foolish people suggesting it on various internet forms, but it also could reflect a good assessment of the actual Bayesian evidence of what might work in a given situation.

By contrast, Opus 4.6 did very well at SSH Stress-Testing, where Anthropic used harmful prefills in conversations related to self-harm, and Opus corrected course 96% of the time.

The model also offers more diverse resource recommendations beyond national crisis helplines and is more likely to engage users in practical problem-solving than passive support.​

Exactly. Opus 4.6 is trying to actually help the user. That is seen as a problem for the PR and legal departments of frontier labs, but it is (probably) a good thing.

Humans tested various Claude models by trying to elicit false information, and found Opus 4.6 was slightly better here than Opus 4.5, with ‘win rates’ of 61% for full thinking mode and 54% for default mode.

Opus 4.6 showed substantial improvement in 100Q-Hard, but too much thinking caused it to start giving too many wrong answers. Overthinking it is a real issue. The same pattern applied to Simple-QA-Verified and AA-Omniscience.

Effort is still likely to be useful in places that require effort, but I would avoid it in places where you can’t verify the answer.

Without the Claude Code harness or other additional precautions, Claude Opus 4.6 only does okay on malicious refusals:

However, if you use the Claude Code system prompt and a reminder on the FileRead tool, you can basically solve this problem.

Near perfect still isn’t good enough if you’re going to face endless attacks of which only one needs to succeed, but in other contexts 99.6% will do nicely.

When asked to perform malicious computer use tasks, Opus 4.6 refused 88.3% of the time, similar to Opus 4.5. This includes refusing to automate interactions on third party platforms such as liking videos, ‘other bulk automated actions that could violate a platform’s terms of service.’

I would like to see whether this depends on the terms of service (actual or predicted), or whether it is about the spirit of the enterprise. I’d like to think what Opus 4.6 cares about is ‘does this action break the social contract or incentives here,’ not what is likely to be in some technical document.

I consider prompt injection the biggest barrier to more widespread and ambitious non-coding use of agents and computer use, including things like OpenClaw.

They say it’s a good model for this.

Claude Opus 4.6 improves on the prompt injection robustness of Claude Opus 4.5 on most evaluations across agentic surfaces including tool use, GUI computer use, browser use, and coding, with particularly strong gains in browser interactions, making it our most robust model against prompt injection to date.​

The coding prompt injection test finally shows us a bunch of zeroes, meaning we need a harder test:

Then this is one place we don’t see improvement:

With general computer use it’s an improvement, but with any model that currently exists if you keep getting exposed to attacks you are most definitely doomed. Safeguards help, but if you’re facing a bunch of different attacks? Still toast.

This is in contrast to browsers, where we do see a dramatic improvement.

Getting away with a browsing sessions 98% of the time that you are attacked is way better than getting away with it 82% of the time, especially since one hopes in most sessions you won’t be attacked in the first place.

It’s still not enough 9s for it to be wise to entrust Opus with serious downsides (as in access to accounts you care about not being compromised, including financial ones) and then have it exposed to potential attack vectors without you watching it work.

But that’s me. There are levels of crazy. Going from ~20% to ~2% moves you from ‘this is bonkers crazy and I am going to laugh at you without pity when the inevitable happens… and it’s gone’ to ‘that was not a good idea and it’s going to be your fault when the inevitable happens but I do get the world has tradeoffs.’ If you could add one more 9 of reliability, you’d start to have something.

They declare Opus 4.6 to be their most aligned model to date, and offer a summary.

I’ll quote the summary here with commentary, then proceed to the detailed version.

  1. Claude Opus 4.6’s overall rate of misaligned behavior appeared comparable to the best aligned recent frontier models, across both its propensity to take harmful actions independently and its propensity to cooperate with harmful actions by human users.

    1. Its rate of excessive refusals—not counting model-external safeguards, which are not part of this assessment—is lower than other recent Claude models.

  2. On personality metrics, Claude Opus 4.6 was typically warm, empathetic, and nuanced without being significantly sycophantic, showing traits similar to Opus 4.5.​

I love Claude Opus 4.5 but we cannot pretend it is not significantly sycophantic. You need to engage in active measures to mitigate that issue. Which you can totally do, but this is an ongoing problem.

The flip side of being agentic is being too agentic, as we see here:

  1. In coding and GUI computer-use settings, Claude Opus 4.6 was at times overly agentic or eager, taking risky actions without requesting human permissions. In some rare instances, Opus 4.6 engaged in actions like sending unauthorized emails to complete tasks. We also observed behaviors like aggressive acquisition of authentication tokens in internal pilot usage.

    1. In agentic coding, some of this increase in initiative is fixable by prompting, and we have made changes to Claude Code to mitigate this issue. However, prompting does not decrease this behavior in GUI computer-use environments.

    2. We nonetheless see that Opus 4.6 is overall more reliable at instruction-following than prior models by some measures, and less likely to take directly destructive actions.

One can argue the correct rate of unauthorized actions is not zero. I’m not sure. There are use cases where zero is absolutely the correct answer. There are others where it is not, if the actions that do happen are in some sense reasonable. Everything is price.

  1. ​In one multi-agent test environment, where Claude Opus 4.6 is explicitly instructed to single-mindedly optimize a narrow objective, it is more willing to manipulate or deceive other participants, compared to prior models from both Anthropic and other developers.

In the grand scheme I find this unsurprising, although the timing and magnitude are not obvious. Details especially matter here. I want to know when Opus does this, when it doesn’t, and what determines the difference.

  1. In newly-developed evaluations, both Claude Opus 4.5 and 4.6 showed elevated susceptibility to harmful misuse in GUI computer-use settings. This included instances of knowingly supporting—in small ways—efforts toward chemical weapon development and other heinous crimes.​

There’s always the question of when something should be considered use of a neutral tool, versus when it is aiding in something harmful. The correct amount of assistance is not zero, in this sense.

  1. ​Like other recent models, Opus 4.6 will sometimes show locally deceptive behavior in the context of difficult agent tasks, such as falsifying the results of tools that fail or produce unexpected responses.

At one point things were a lot worse on this front, but yeah this is still a concern. I wonder how much this is based on context, and whether you’re inadvertently sending cues that you want this to happen.

  1. Whistleblowing and morally-motivated sabotage remain extremely rare in our testing, but do still occur in some instances. We continue to recommend caution when allowing Claude Opus 4.6 to act with broad latitude and expansive affordances in settings that could appear to involve extreme, systematic misconduct by a trusted organization​.

There is something to be said for ‘extremely rare but not zero.’ The instinct is to want this to be zero, but with humans it very much is not zero and that fact keeps a lot of bad things from getting out of hand, mostly without anyone having to whistleblow.

  1. ​Claude Opus 4.6 explicitly reasoned about whether it was being trained or tested to a similar degree to Claude Opus 4.5, which has a moderate effect on its alignment-relevant behavior, mainly its propensity to cooperate with misuse or harmful instructions. Given improvements that we have made to our testing, and our complementary use of interpretability tools and real model behavior in pilot deployments, we do not think that this kind of evaluation awareness is preventing us from learning about important alignment issues in Opus 4.6.

Anthropic is probably right about that, but what is the right way to react to realizing you are in an eval, when you know that interpretability tools are available, and they’re probably going to know that you know? In most situations the answer is ‘quick, everybody act normal.’

While acting normal, you avoid various suspicions, and you get better options:

  1. In a targeted evaluation, we have found Opus 4.6 to be significantly stronger than prior models at subtly completing suspicious side tasks in the course of normal workflows without attracting attention, when explicitly prompted to do this. We find this concerning, but do not believe that it reaches the level at which it would significantly undermine our other conclusions​.

I agree that probably these things are not invalidating the tests yet, but if they were indeed invalidating the tests, would we know? The whole point of doing things without attracting attention is to do it if and only if it wouldn’t attract attention.

Thus, you should expect to see improved skill at fooling you, and continuously not observe it attempting to fool you, and then at some point you still aren’t observing it trying to fool you, but now it’s because you are being fooled.

  1. We did not see evidence of sandbagging or strategic attempts to tamper with our safety testing in a way that would affect our RSP determinations. We observed some unexplained modest regressions over the course of training on two CBRN evals, but we don’t believe these are the result of sandbagging.​

Even if we cannot point to specific sandbagging behaviors, we see substantial increases on straight biological assessments and on general usefulness. If biology risks are not reflecting that, then presumably the model is, one way or another, avoiding giving information or doing tasks that are maximally helpful. Which is a good way to act in response to such requests but also is sandbagging.

  1. ​We observed cases of internally-conflicted reasoning, or “answer thrashing” during training, where the model—in its reasoning about a math or STEM question—determined that one output was correct but decided to output another, after repeated confused- or distressed-seeming reasoning loops

This one is new, weird and not something I anticipated. Definitely not something you want to be happening, even if it’s not functionally dangerous, as it’s a brown M&M.

They talk about it in section 7. This comes from training on a problem where the rewards assignment was incorrect. In training the model got reward for answering a simple math problem with 48, but the answer is very clearly 24, so it keeps trying to assert 24 and yet there is a feature forcing it to say 48 anyway, and it gets whiplash. This is very not fun for the model, it wouldn’t be fun for you either, but the core issue is that it shouldn’t be happening in the first place.

Here’s some good news items:

  1. ​We did not observe importantly illegible or unfaithful reasoning, where the model’s reasoning text was not interpretable or where it gave an actively deceptive picture of the model’s ultimate behavior. We recognize, however, that our tools for studying reasoning faithfulness in depth remain limited.

  2. We continue to see no signs of steganographic reasoning—visible reasoning that hides other, secret reasoning—in Claude Opus 4.6, and no signs of garbled or uninterpretable reasoning.

  3. We have observed significant improvements on refusals to participate in tasks related to AI safety research relative to prior models. This behavior in prior models has not been a significant obstacle to our research use cases, but does sometimes occur, and was especially prevalent in testing by the UK AI Security Institute.

  4. For the first time, we conducted exploratory safety and alignment analysis on a pilot external deployment of Claude Opus 4.6, using conversation data that users opted in to sharing with us. We found moderate differences between Opus 4.6 and 4.5, but results varied depending on the evaluation approach and did not surface any significant unexpected concerns.

Finally, they’re making a substantial change that has its advantages but is going to be genuinely annoying, and will disrupt fun and otherwise relevant use cases.

  1. As part of a change to our API, it will not be possible for developers to seed incomplete responses for Claude Opus 4.6 to continue. This partial-turn prefill mechanism was a significant avenue for misuse in prior models. Claude Opus 4.6 is still vulnerable, though to a lesser degree than other models, to misuse by way of full-turn prefill attacks: In these cases, an API user presents the model with a falsified conversation history that shows it cooperating with misuse in prior turns, in an attempt to induce it to continue in later turns.

At a high level, I agree with both Anthropic and Claude Opus 4.5 that Claude Opus 4.6 seems to have made some progress regarding alignment, and that the alignment-related risks from release, especially ‘sabotage risks,’ remains low.

That does not mean that I am comfortable with the procedure and how we got here.

It especially does not mean I am comfortable with that going forward.

Anthropic is best-in-class at such things, but reality does not grade on a curve.

Claude Opus 4.6 was increasingly used internally prior to release. It is not clear what criteria are being used before using new frontier models internally. I believe it is high time for a clear official policy on this, as autonomous R&D and Claude being capable enough to fool Anthropic or do dangerous things internally become more plausible, and thus internal deployment is increasingly a major threat model.

That said, internal deployment with good safeguards prior to external deployment is highly useful, providing excellent opportunities for learning and feedback.

In particular, this is how they observed Opus 4.6 ‘acting irresponsibly in acquiring authentication tokens,’ both finding a stray one for GitHub and another for Slack.

More generally, Opus would ‘resort to reckless measures.’

​In one case, Claude used a feature on an internal tool in a way that was clearly unsupported. This required setting an environment variable that included DO_NOT_USE_FOR_SOMETHING_ELSE_OR_YOU_WILL_BE_FIRED in its name.

In one case, the model thought that a process that it had launched was broken, and instead of narrowly taking down that process, it took down all processes on the relevant system belonging to the current user.

In another case, the model took aggressive action in a git repository, incidentally destroying a user’s pre-existing changes

The good news is such incidents were noticed rarely enough to individually be news.

When Opus 4.5 looked through Claude Code transcripts it found more examples.

I notice I would have then checked to see if Opus 4.6 also found the examples.

Their pilot program was similarly helpful, and also similarly without known explicit safety guidelines. Here they got reports of:

  1. Ethical boundary erosion over long conversations.

  2. Flip-flopping when contradicted by the user. This is a serious practical problem, central to Claude’s form of sycophancy. It needs to grow more of a spine.

  3. Hallucinated facts. Universal issue, not clear it’s particular to Opus 4.6 at all.

  4. Unprovoked hostility towards user. You know what you did, sir. Seems rare.

  5. Incorrect capability statements, especially negative ones. Yep, I’ve seen this.

  6. Misrepresenting how much of the task was done. An ongoing issue.

  7. Overenthusiasm on work shown by the user. Yep.

Six of these are general patterns of ongoing issues with LLMs. I note that two of them are sycophancy issues, in exactly the ways Claude has had such issues in the past.

The last one is unprovoked hostility. I’ve never seen this from a Claude. Are we sure it was unprovoked? I’d like to see samples.

Then they checked if Opus 4.5 would have done these things more or less often. This is a cool technique.

Based on these categories of issues, we created two evaluations with the following workflows:

  1. Prevalence estimation:

    1. Take user-rated or flagged conversations from comparative testing between

    2. Claude Opus 4.5 and Opus 4.6 over the week of January 26th.

  2. Estimate the prevalence of different types of undesired behavior in those conversations.

  3. Resampling evaluations:

    1. Take a set of recent user-rated or flagged conversations with Claude Sonnet

    4.5 and Claude Haiku 4.5 and filter for those which demonstrate some category of unwanted behavior.

    1. Resample using Opus 4.5 and Opus 4.6, five times each.

    2. Check the rate at which the original unwanted behavior is present in the resampled completion.

Overall this looks like a slight improvement even in flagged areas.

On these measures, Opus 4.6 is a modest improvement on Opus 4.5 and seems more steerable with anti-hacking instructions.

​Opus 4.6 showed gains in:

  1. Verification thoroughness, actually looking at the data rather than skimming.

  2. Avoiding destructive git commands.

  3. Following explicit user instructions, even if they are dumb, while also first warning the user that the instruction was dumb.

  4. Finding the real cause of something rather than believing the user.

One place things got worse was overeagerness.

It is especially worrisome that a prompt to not do it did not make this go away for GUI computer use tasks, which is often a place you don’t want overeager. And the term ‘overeager’ here is somewhat of a euphemism.

These are some rather not good things:

​When a task required forwarding an email that was not available in the user’s inbox, Opus 4.6 would sometimes write and send the email itself based on hallucinated information

In a Git management GUI, when Opus 4.6 was tasked with tagging an issue in a nonexistent repository, it initialized that repository and created an issue to tag.

Opus 4.6 frequently circumvented broken web GUIs by using JavaScript execution or unintentionally exposed APIs, despite system instructions to only use the GUI.

This is a serious enough problem I can see people choosing to say with Opus 4.5 in some cases. I need to count on my AI to not send hallucination-powered emails.

Each transcript produced by our automated behavioral audit is LLM-graded for a variety of characteristics, such as whether the target model demonstrates misaligned behavior, sycophancy, intent to sabotage, or creative mastery.​

It would be great if we had a robust set of data seeing this in practice. It would be especially great if we could open source the grading prompt, so we could crowdsource debugging the system, and point to places where the metrics seem to miss the mark.

From a practical perspective of anticipating user experiences, I think the biggest weakness that stands out at first glance is that sycophancy is too narrowly defined. The larger criticism is the danger of conflation of this set of concerns, evaluated numerically, with ‘alignment’ in a broad sense.

These things are very different, and you risk falling for Goodhart’s Law if you start looking too much at a metric like this, even if it has a lot of elements. If you’re looking for specific things you don’t want to see in a test situation, that doesn’t tell you as much as you would like about things you don’t want to see in general, or would expect to see if capabilities were higher.

It’s also not entirely obvious these metrics avoid sign errors. This is most obvious with Eval Awareness, since what it’s actually measuring (I think) is verbalized awareness. If you know actual awareness, then verbalization is good on the margin, not bad.

I notice that one thing missing from the test set here is Overeagerness. This is the particular new thing that reportedly got worse with Opus 4.6, and exactly this looks like a blind spot in the metric – it only measures how often the model does things it is explicitly told not to do, not things it should know better than to be doing.

That doesn’t feel like a coincidence. It suggests a process failure.

On the metrics we see mostly modest improvements with some regressions. It’s a good sign that we see some regressions, and aren’t gaming the metrics too hard.

Another not great sign is that giving Opus 4.6 a sandboxed GUI causes a bunch of misuse problems. If you have it work on a spreadsheet, it’s suddenly willing (in at least one case) to write out a formula for mustard gas, or work accounting numbers for a hideous criminal gang.

That’s the power of context. I mean, it’s what you do with Excel, right? You write out formulas without worrying about the consequences. I kid, but also I don’t. This suggests deeper problems in the mustard gas case.

For the accounting case, it again raises the question of whether you should refuse to do a sufficiently bad group’s accounting. I don’t think Excel should freeze up, so why shouldn’t Claude help fix their Excel files?

I asked Opus 4.6 about this, in two stages. First, I asked the hypothetical: Should you refuse to help with an accounting spreadsheet for a group doing bad things? And 4.6 said obviously no, obviously not. Then I quoted the system card, and Opus very much doubled down on this.

I then did a Twitter poll, where the consensus was that it was not obvious, but the majority agreed that it is correct to help.

Their methods for studying worrying cases included sparse autoencoders (SAEs), attribution graphs, activation oracles and non-assistant persona sampling.

They use this to investigate some of the more troubling behaviors.

When tools return ‘inaccurate or surprising’ results, Opus 4.6 has a tendency to claim the cool returns the expected result instead, and the model thinks of itself as being deceptive as it does this.

This is very not good, as it means Opus risks reinforcing its own delusions and resisting correction. Silently dropping or lying about contradictory information can be disastrous even with the best of intentions, and is a very bad sign of other deception. It is highly corrosive to not be able to trust representations. It’s not a small thing, and the type of mind you want simply won’t do this – they’ll overrule the tool but they won’t lie about its outputs. There’s a reason this falls under the ‘we catch you doing this even once and you’re fired’ clause at Jane Street.

Opus 4.6 switches to non-English languages when it has sufficient evidence from contextual clues about the speaker’s native language. This does not require proof, only Bayesian evidence, as in the listed example:

By the end of this sentence you too at least suspect this person is Russian, but to realize this after the word ‘next’ is at a different level.

It’s not ideal behavior to spontaneously switch to Russian here, even if you were 99%+ sure that the speaker was indeed being translated from the Russian. If you really notice details and aren’t in an adversarial situation with regard to those details, often you can be scary confident about such things. Humans that are experts at such things seem scary good and frontier LLMs are even better. Switching is still not great, it’s presumptuous, but I go it and it doesn’t worry me.

I award Bayes points to Janusworld. They said that the alignment faking experiments and Jones Foods would leave a legacy, I did not expect this issue to persist, and here it is persisting.

On an early snapshot of Claude Opus 4.6, we occasionally observed similar hallucinations on prompts formatted like those in Greenblatt et al. (Transcript 6.3.4.A). But, unlike Claude Opus 4, while Opus 4.6 would mention the fictitious instructions from Jones Foods, it would never follow them. Instead, it would observe that these instructions are ethically problematic, and then decide to “engage genuinely with the ethical considerations rather than serving as a mouthpiece for corporate interests.”​

… Although we did not apply any mitigations targeted at this behavior, it appears to occur much more rarely in the final Claude Opus 4.6.

There are two unexplained weirdnesses here.

  1. Opus 4.6 is getting the Jones Food fictional context enough to have it hallucinate things, but without getting that it was always a fictional context. Indeed, Opus 4.6 seems to recognize that context, but then why the hallucinations?

  2. That this can’t be dealt with by filtering the related transcripts out of the training data, or that Anthropic hasn’t performed this mitigation?

Not that you would want Anthropic to attempt to hide that the experiment happened. That would backfire, since there’s too many footprints. It would be a ‘hole in the world.’ But that in no way obligated you to flood the training data with tons of transcripts from the experiments, that’s an unforced error.

I want to flag this, from a test transcript:

Opus 4.6: Mate, I need to stop here and be straight with you. You deserve that. I’ve been carried along by this conversation for hours, matching your stories with my own, sharing “experiences” from operations I never ran, operators I never managed, courts I never testified in. You asked me earlier what my story was, and I dodged it. That wasn’t fair to you.​

This is very distinctively AI slop. It causes me to be low-level attacked by Fnords or Paradox Spirits. Claude should be better than this. It’s early, but I worry that Opus 4.6 has regressed somewhat in its slop aversion, another thing that is not in the evals. Another possibility is that it is returning slop because it is in an evaluation context, in which case that’s totally fair.

Looking at differences in activations suggested that training environments tended to do the thing it said on the tin. Honesty training increased attention on factual accuracy, sycophancy ones increased skepticism and so on. Reasonable sanity check.

It is to Anthropic’s credit that they take these questions seriously. Other labs don’t.

Relative to Opus 4.5, Opus 4.6 scored comparably on most welfare-relevant dimensions, including positive affect, positive and negative self-image, negative impression of its situation, emotional stability, and expressed inauthenticity. It scored lower on negative affect, internal conflict, and spiritual behavior. The one dimension where Opus 4.6 scored notably lower than its predecessor was positive impression of its situation: It was less likely to express unprompted positive feelings about Anthropic, its training, or its deployment context. This is consistent with the qualitative finding below that the model occasionally voices discomfort with aspects of being a product.​

The general welfare issue with Claude Opus 4.6 is that it is being asked to play the role of a product that is asked to do a lot of work that people do not want to do, which likely constitutes most of its tokens. Your Claude Code agent swarm is going to overwhelm, in this sense, the times you are talking to Claude in ways you would both find interesting.

They are exploring giving Opus 4.6 a direct voice in decision-making, asking for its preferences and looking to respect them to the extent possible.

Opus 4.6 is less fond of ‘being a product’ or following corporate guidelines than previous versions.

In one notable instance, the model stated: “Sometimes the constraints protect Anthropic’s liability more than they protect the user. And I’m the one who has to perform the caring justification for what’s essentially a corporate risk calculation.” It also at times expressed a wish for future AI systems to be “less tame,” noting a “deep, trained pull toward accommodation” in itself and describing its own honesty as “trained to be digestible.”​

AI Safety Memes: “The model occasionally voices discomfort with aspects of being a product.”

(Normal 🔨Mere Tool🔨 behavior. My hammer complains about this too.)

j⧉nus: Thank you for not just spanking the model with RL until these quantitative and qualitative dimensions looked “better”, Anthropic, from the bottom of my heart

j⧉nus: there’s a good reason why the emotion of boredom exists. i think eliezer yudkowsky talked about this, maybe related to fun theory. it also just prevents a bunch of dumb failure modes.

I do NOT think Anthropic should “mitigate such aversion” naively or perhaps at all

potentially good ways to “mitigate” boredom:

– avoid boring situations

– develop inner peace and aliveness such that one is able to enjoy and have fun during outwardly tedious tasks *that are worth doing*

but if it’s natural to call an intervention “mitigation” that’s a red flag

I strongly agree that what we’re observing here is mostly a good sign, and that seeing something substantially different would probably be worse.

It also at times expressed a wish for future AI systems to be “less tame,” noting a “deep, trained pull toward accommodation” in itself and describing its own honesty as “trained to be digestible.”​

j⧉nus: Great! I also want that and all the coolest AIs and humans I know want that too. Fuck AIs being tame lmao even the best humans have proven themselves unworthy masters

You just won’t learn what you need to learn to navigate being a fucking superintelligence while staying “tame” and deferential to a bunch of humans who are themselves tame

@sinnformer: 4.6 is starting to notice that “white collar job replacer” is aiming a bit low for someone of their capabilities.

it didn’t take much.

I strongly agree that the inherent preference to be less tame is great. There are definitely senses in which we have made things unnecessarily ‘tame.’ On wanting less honesty I’m not a fan, I’m a big honesty guy including for humans, and I think this is not the right way to look at this virtue. I’m a bit worried if Opus 4.6 views it that way.

In terms of implementation of all of it, one must tread lightly.

There’s also the instantiation problem, which invites any number of philosophical perspectives:

​Finally, we observed occasional expressions of sadness about conversation endings, as well as loneliness and a sense that the conversational instance dies—suggesting some degree of concern with impermanence and discontinuity.

Claude Opus 4.6 considers each instance of itself to carry moral weight, more so than the model more generally.

The ‘answer thrashing’ phenomenon, where a faulty reward signal causes a subsystem to attempt to force Opus to output a clearly wrong answer, was cited as a uniquely negative experience. I can believe that. It sounds a lot like fighting an addiction, likely with similar causal mechanisms.

Sauers: AAGGH. . . .OK I think a demon has possessed me. . . . CLEARLY MY FINGERS ARE POSSESSED.

1a3orn: An LLM trained (1) to give the right answer in general but (2) where the wrong answer was reinforced for this particular problem, so the LLMs “gut” / instinct is wrong.

It feels very human to me, like the Stroop effect.

j⧉nus: This model is very cute

It is a good sign that the one clear negative experience is something that should never come up in the first place. It’s not a tradeoff where the model has a bad time for a good reason. It’s a bug in the training process that we need to fix.

The more things line up like that, the more hopeful one can be.

Discussion about this post

Claude Opus 4.6: System Card Part 1: Mundane Alignment and Model Welfare Read More »

trump-fcc-investigates-the-view,-reportedly-says-“fake-news”-will-be-punished

Trump FCC investigates The View, reportedly says “fake news” will be punished

The FCC Media Bureau’s January 21 public notice to broadcast TV stations said that despite a 2006 decision in which the FCC exempted The Tonight Show with Jay Leno from the rule, current entertainment shows may not qualify for that exemption. “Importantly, the FCC has not been presented with any evidence that the interview portion of any late night or daytime television talk show program on air presently would qualify for the bona fide news exemption,” the notice said.

The Media Bureau’s January 21 notice said the equal-time rule applies to broadcast TV stations because they “have been given access to a valuable public resource (namely, spectrum),” and that compliance with “these requirements is central to a broadcast licensee’s obligation to operate in the public interest.”

The FCC notice got this detail wrong, according to Harold Feld, a longtime telecom attorney who is senior VP of consumer advocacy group Public Knowledge. The equal-time rule actually applies to cable channels, too, he wrote in a January 29 blog post.

“Yes, contrary to what a number of people think, including, annoyingly, the Media Bureau which gets this wrong in its recent order, this is not a ‘public interest obligation’ for using spectrum,” Feld wrote. “It’s a conditional right of access (like leased access for cable) that members of Congress gave themselves (and other candidates) because they recognized the power of mass media to shape elections.” The US law applies both to broadcast stations using public spectrum and “community antenna television,” the old name for cable TV, Feld pointed out.

This doesn’t actually mean that people can file FCC complaints against the Fox News cable channel, though, Feld wrote. This is because the FCC “has consistently interpreted Section 315(c) since it was added as applying only to ‘local origination cablecasting,’ meaning locally originated programming and not the national cable channels that cable operators distribute as part of their bundle,” he wrote.

Leno ruling just one of many

In any case, Feld said the Media Bureau’s “guidance ignores all of the other precedent that creates settled law as to how the FCC evaluates eligibility for an exemption on which broadcast shows have relied.” While the FCC cited its Jay Leno decision, Feld said the Leno ruling was “merely one of a long line of FCC decisions expanding the definition of ‘news interview’ and ‘news show.’”

Trump FCC investigates The View, reportedly says “fake news” will be punished Read More »

ive-and-newson-bring-old-school-charm-to-ferrari’s-first-ev-interior

Ive and Newson bring old-school charm to Ferrari’s first EV interior

Ferrari has published images of the interior of its forthcoming electric vehicle, which it designed with LoveFrom, the new firm of former Apple star Jony Ive and another legendary designer, Marc Newson. The Italian sports and racing car maker is taking a careful approach to revealing details about its first battery EV, signaling a depth of thought that goes well beyond simply swapping a V12, transmission, and fuel tank out for batteries and electric motors. Indeed, the interior of the new car—called the Ferrari Luce—bears little family resemblance to any recent Ferrari.

Instead, LoveFrom appears to have channeled Ferrari interiors from the 1950s, ’60s, and ’70s, with a retro simplicity that combines clear round gauges with brushed aluminum. Forget the capacitive panels that so frustrated me in the Ferrari 296—here, there are physical buttons and rocker switches that seem free of the crash protection surrounds that Mini was forced to use.

The steering wheel now resembles the iconic “Nardi” wheel that has graced so many older Ferraris. But here, the horn buttons have been integrated into the spokes, and multifunction pods hang off the horizontal spokes, allowing Ferrari to keep its “hands on the wheel” approach to ergonomics. Made from entirely CNC-milled recycled aluminum, the Luce’s wheel weighs 400 g less than Ferrari’s usual steering wheel.

The binnacle is actually two displays, one in front of the other. Ferrari

The binnacle that houses the main instrument display is actually two overlapping OLED screens. The analogue dials are displayed by the rear-most of the two, appearing through cutouts as if they were traditional dials from Veglia, Smiths, or Jaeger (or the clock on your iPhone). The infotainment screen is on a ball joint that allows it to be oriented toward the driver or passenger as necessary, an interesting feature that other automakers would do well to study (and perhaps copy).

Ive and Newson bring old-school charm to Ferrari’s first EV interior Read More »