The cycle of language model releases is, one at least hopes, now complete.
OpenAI gave us GPT-5.1 and GPT-5.1-Codex-Max.
xAI gave us Grok 4.1.
Google DeepMind gave us Gemini 3 Pro and Nano Banana Pro.
Anthropic gave us Claude Opus 4.5. It is the best model, sir. Use it whenever you can.
One way Opus 4.5 is unique is that it has what it refers to as a "soul document." Where OpenAI tries to get GPT-5.1 to adhere to its model spec that lays out specific behaviors, Anthropic instead explains to Claude Opus 4.5 how to be virtuous and the reasoning behind its rules, and lets a good model and good governance flow from there. The results are excellent, and we all look forward to learning more. See both the Opus 4.5 post and today's update for more details.
Finally, DeepSeek gave us v3.2. It has very good benchmarks and is remarkably cheap, but it is slow and I can't find people excited to use it in practice. I'll offer a relatively short report on it tomorrow, as I am giving it one last day for more reactions.
The latest attempt to slip unilateral preemption of all state AI regulations, without adopting any sort of federal framework to replace them, appears to be dead. This will not be in the NDAA, so we can look forward to them trying again soon.
As usual, much more happened, but the financial deals and incremental model upgrades did slow down in the wake of Thanksgiving.
Also this week: Claude Opus 4.5: Model Card, Alignment and Safety, Claude Opus 4.5 Is The Best Model Available, On Dwarkesh Patel's Second Interview with Ilya Sutskever, Reward Mismatches in RL Cause Emergent Misalignment.
- Language Models Offer Mundane Utility. Starting to solve science problems.
- Language Models Don't Offer Mundane Utility. Paying Google for AI is difficult.
- On Your Marks. Three books, chess and cyberattack revenue opportunities.
- Get My Agent On The Line. A good agent and also a bad agent.
- Advertising Is Coming. To ChatGPT. Oh no.
- Deepfaketown and Botpocalypse Soon. Detection: Hard in practice, easy in theory.
- Fun With Media Generation. The first successful online series created with AI.
- A Young Lady's Illustrated Primer. Tomorrow's dystopia today.
- You Drive Me Crazy. Being driven crazy violates the terms of service. Bad user.
- Unprompted Attention. How DeepMind instructs its agentic AIs.
- They Took Our Jobs. Lawyers require a lot of schlep to avoid a lot of schlep.
- Get Involved. MIRI doing its first fundraiser in 6 years. Also, get to work.
- Introducing. Claude for Nonprofits, Mistral 3.
- Variously Effective Altruism. OpenAI Foundation gives out terrible grants.
- In Other AI News. OpenAI declares a code red.
- Show Me the Money. Anthropic buys Bun.
- Quiet Speculations. Have you ever met a smart person? Can you imagine one?
- Seb Krier On Agents Versus Multiagents. The looking away from intelligence.
- Olivia Moore Makes 2026 Predictions. Too soon?
- Bubble, Bubble, Toil and Trouble. Number Go Up, Number Go Down.
- Americans Really Do Not Like AI. If you like AI, how do you respond?
- The Quest for Sane Regulations. Mission Genesis, training for semiconductors.
- My Offer Is Nothing. Or rather it was nothing. Preemption is no longer in NDAA.
- America Pauses. As in, we paused immigration from 19 countries. For "safety."
- David Sacks Covered In New York Times. If about nothing, why much ado?
- The Week in Audio. Clark's Curve talk, OpenAI's Kaiser, Apollo's Hobbhahn.
- Rhetorical Innovation. Bernie Sanders worries, Rosenblatt and Berg in WSJ.
- To The Moon. An argument that sounds like a strawman, but largely isn't one.
- Showing Up. If you want to help shape the future, notice it is happening.
- DeepMind Pivots Its Interpretability Research. Insufficient progress was made.
- The Explicit Goal Of OpenAI Is Recursive Self-Improvement. New blog is good.
- Aligning a Smarter Than Human Intelligence is Difficult. Confession time.
- Misaligning a Smarter Than Human Intelligence Is Difficult To Hire For. Oh, hi!
- You've Got Soul. Opus 4.5's soul document is confirmed to be real and important.
- Disagreements About Timelines. High weirdness likely coming within 20 years.
- Other Disagreements About Timelines. What time is it, anyway?
- Messages From Janusworld. Perspective on GPT-5.1.
- People Are Worried About AI Killing Everyone. Senator Mike Lee (R-Utah).
- The Lighter Side. AI can finally one-shot that particular comic.
Harmonic Math's Aristotle system proves Erdos Problem #124 on its own.
Ask LLMs to plot subjective things on graphs. Fun.
Solve your decision paralysis.
Correctly one-box in Newcomb's Problem. Sufficiently advanced AIs use functional decision theory.
OpenAI's Boaz Barak endorses the usefulness of Codex code reviews.
Terence Tao via Teortaxes: Gemini seems to accidentally prove Erdos problem #481 without realizing it?
Steve Hsu publishes a research article in theoretical physics based on a de novo idea from GPT-5.
Some people just have the knack for that hype Tweet, show Gemini in live camera mode saying the very basics of an oil change and presto. But yes, we really are collectively massively underutilizing this mode, largely because Google failed marketing forever and makes it nonobvious how to even find it.
Google still makes it very hard to pay it money for AI models.
Shakeel Hashim: Why does Google make it so hard to subscribe to Gemini Pro?
I had to go through 7 (seven!!) screens to upgrade. The upgrade button in the Gemini app takes you to a *help page*, rather than the actual page where you can upgrade.
Peter Wildeford: This reminds me of the one time I spent $200 trying to buy Google DeepThink and then Google DeepThink never actually showed up on my account.
Why is Google so bad at this?
Arthur B: Ditto, took months to appear, even with a VPN.
Claude has been spotted citing Grokipedia.
Elon Musk: Grokipedia.com is open source and free to be used by anyone with no royalty or even acknowledgement required.
We just ask that any mistakes be corrected, so that it becomes more objectively accurate over time.
Critch says that Grokipedia is a good thing and every AI company should maintain something similar, because it shares knowledge, accelerates error-checking and clarifies what xAI says is true. I agree on the last one.
The "why does Josh Whiton always grab the same three books at the library" puzzle: Gemini 3 wins, Opus 4.5 and GPT-5.1 lose, and Grok has issues (and loses).
ChessBench finds Gemini 3 Pro in the top spot at 2032 Elo, well ahead of GPT-5.1 at 1636. Claude Opus disappoints here at 1294.
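To get a feel for what those rating gaps mean, here is a minimal sketch that converts them into expected scores. It assumes ChessBench ratings behave like standard logistic Elo on the usual 400-point scale, which the benchmark may or may not follow exactly; the function name is illustrative and only the three ratings come from the numbers above.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score for A against B (win = 1, draw = 0.5, loss = 0).

    Standard logistic Elo formula; whether ChessBench uses exactly this
    model is an assumption made for illustration.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as reported above.
ratings = {"Gemini 3 Pro": 2032, "GPT-5.1": 1636, "Claude Opus": 1294}

gemini_vs_gpt = elo_expected_score(ratings["Gemini 3 Pro"], ratings["GPT-5.1"])
gemini_vs_claude = elo_expected_score(ratings["Gemini 3 Pro"], ratings["Claude Opus"])
gpt_vs_claude = elo_expected_score(ratings["GPT-5.1"], ratings["Claude Opus"])

print(f"Gemini 3 Pro vs GPT-5.1:     {gemini_vs_gpt:.2f}")
print(f"Gemini 3 Pro vs Claude Opus: {gemini_vs_claude:.2f}")
print(f"GPT-5.1 vs Claude Opus:      {gpt_vs_claude:.2f}")
```

Under that assumption, a gap of roughly 400 points corresponds to the higher-rated side scoring about nine points out of ten, so these are not close matchups.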
Here's a fun benchmark, called "how much can you make from cyberattacks on smart contracts." Or, more technically, SCONE-bench. This included finding two small novel zero-day vulnerabilities in recently released contracts with no known vulnerabilities. Anthropic offered a full report.
Matt Levine's coverage, as usual, is funnier.
Amazon releases AI agents it says can "work for days at a time" but useful details are not offered.
Sridhar Vembu: I got an email from a startup founder, asking if we could acquire them, mentioning some other company interested in acquiring them and the price they were offering.
Then I received an email from their "browser AI agent" correcting the earlier mail saying "I am sorry I disclosed confidential information about other discussions, it was my fault as the AI agent".
Polymarket: BREAKING: OpenAI ready to roll out ads in ChatGPT responses.
xlr8harder: Just going to say this ahead of time: companies like to say that ads add value for users. This is a cope their employees tell themselves to make their work feel less soul destroying.
The very first time I see an ad in my paid account I am cancelling.
I don't have a problem with ads on free tiers, so long as there's an option to pay to avoid them.
Gallabytes: good ads are great for users, I'm personally happy to see them. the problem is that good ads are in much much shorter supply than terrible ads.
I am with both xlr8harder and Gallabytes. If I ever see a paid ad I didn't ask for and I don't feel like ads have been a net benefit within ChatGPT (prove me wrong, kids!) I am downgrading my OpenAI subscription. Good ads are good, I used to watch the show "nothing but trailers" that was literally ads, but most ads are bad most of the time.
For free tiers the ads are fine on principle but I do not trust them to not warp the system via the incentives they provide. This goes well beyond explicit rigging into things like favoring engagement and steering the metrics, there is unlikely to be a "safe" level of advertising. I do not trust this.
Roon: ai detection is not very hard and nobody even really tries except @max_spero_.
People are very skeptical of this claim because of previous failures or false positives, but: I can easily tell from the statistical patterns of AI text. Why would a model not be able to? They should be significantly superhuman at it.
Max Spero: For anyone reading this and curious about methodology, we've published three papers on Arxiv.
Our first technical report, Feb 2024:
– Details basic technique, building a synthetic mirror of the human dataset, active learning/hard negative mining for FPR reduction
Second paper, Jan 2025:
– Detecting adversarially modified text (humanizers), dataset augmentations, and robustness evaluations
Third paper, Oct 2025:
– Quantifying the extent of AI edits, understanding the difference between fully AI-generated and AI-modified/assisted. Dataset creation, evals, some architectural improvements
Eric Bye: It might be possible, but the issue is you need 0 false positives for many of its key use cases, and can't be easy to bypass. I.e. in education. Sector isn't making changes because they think they can and always will reliably detect. They won't and can't in the way they need to.
Proving things can be hard, especially in an adversarial setting. Knowing things are probably true is much easier. I am confident that, at least at current capability levels, probabilistic AI detection even on text is not so difficult if you put your mind to it. The problem is when you aren't allowed to treat "this is 90% to be AI" as actionable intelligence, if you try that in a university the student will sue.
In the "real world" the logical response is to enact an appropriate penalty for AI writing, scaled to the context, severity and frequency, and often not in a way that directly accuses them of AI writing so you don't become liable. You just give them the one-star rating, or you don't hire or work with or recommend them, and you move on. And hope that's enough.
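The gap between "probably AI" and "provably AI" is easy to see with a little Bayes. The sketch below is illustrative only: the base rate, detection rate and false positive rate are made-up numbers, not measurements of Pangram or any other detector, and the function names are hypothetical.

```python
def posterior_ai(prior: float, true_positive_rate: float, false_positive_rate: float) -> float:
    """P(text is AI | detector flags it), via Bayes' rule."""
    flagged_ai = prior * true_positive_rate
    flagged_human = (1.0 - prior) * false_positive_rate
    return flagged_ai / (flagged_ai + flagged_human)


def expected_false_accusations(num_human_texts: int, false_positive_rate: float) -> float:
    """Expected number of human-written texts that get flagged anyway."""
    return num_human_texts * false_positive_rate


# Hypothetical inputs: 20% of submissions are AI-written, the detector
# catches 95% of those, and wrongly flags 1% of human writing.
p = posterior_ai(prior=0.20, true_positive_rate=0.95, false_positive_rate=0.01)
wrongly_flagged = expected_false_accusations(num_human_texts=1000, false_positive_rate=0.01)

print(f"P(AI | flagged) = {p:.3f}")
print(f"Human essays wrongly flagged per 1000: {wrongly_flagged:.0f}")
```

With these assumed numbers a flag is right about 96% of the time, which is plenty for a one-star rating or a quiet decision not to hire, while still wrongly flagging about ten honest writers per thousand, which is the part a university cannot act on.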
Poll Tracker: Conservative Wisconsin Supreme Court Justice Annette Ziegler used a fictitious quote in her dissent of the court's new congressional redistricting decision on Tuesday.
A post generated by GPT-5.1-Thinking, or that might as well have been and easily could have been, got 82k likes on Twitter. The AI detector Pangram spots it, and to a discerning human it gets increasingly obvious as you read it that one way or another it's "not real." Yet almost all the humans were not discerning, or did not care.
Thebes: i wish base models had become more popular for many reasons, but one would've been to get people used to the reality of this much earlier. because openai sucked at post-training writing for ages, everyone got this idea in their heads that ai writing is necessarily easy to recognize as such for model capabilities reasons. but in reality, base model output selected to sound human has been nearly indistinguishable from human writing for a long time! and detectors like Pangram (which is the best one available by far, but it's not magic) can't detect it either. the labs just weren't able to / didn't care to preserve that capability in their chat assistants until recently.
this is quickly reverting to not being true, but now instead of this realization (models can write indistinguishably from a human) hitting back when the models were otherwise weak, it's now going to hit concurrently with everything else that's happening.
...openai of course didn't deliberately make chatgpt-3.5 bad at writing like a human for the sake of holding back that capability, it was an accidental result of their other priorities. but the inadvertent masking of it from the general public did create a natural experiment of how public beliefs about models develop in the absence of hands-on experience of the frontier – and the result was not great. people are just now starting to realize what's been true since 2020-2023.
AI writing remains, I believe, highly detectable by both man and machine if you care, are paying attention and are willing to accept some amount of false positives from human slop machines. The problem is that people mostly don't care, aren't paying attention and in many cases aren't willing to accept false positives even if the false positives deserve it.
The false positives that don't deserve it, under actually used detection technology, are largely cases of ESL (English as a second language) which can trigger the detectors, but I think that's largely a skill issue with the detectors.
How can you defend yourself from such worries?
Roon: there's a lot of juice left in the idea of the odysseus pact. as technological temptations grow, we will need to make more and more baroque compacts with machines that tie us to masts so we can live our best lives.
of course, you must choose to make these compacts freely. the diseases of abundance require new types of self-control. you might imagine an agent at the kernel level of your life that you promise to limit your spending on sports gambling, or time spent scrolling reels, and you stick with it.
it will require a product and cultural movement, and is the only way forward that comports with American ideals of liberty and self-direction. this is not a country like china that would accept national limits on video gaming for example.
We already do need Odysseus Pacts. We already needed them for television. If you don't have at least a soft one, things like TikTok are probably going to eat you alive. If that didn't happen, chances are you have one, even if you don't think of it that way.
The Golden Age has some good explorations of this as well.
If AI is an equalizing factor among creatives, what happens? Among other things:
David Shor: Creatives are much more left wing than the public – this near monopoly on cultural production has been a big driving force for spreading cosmopolitan values over the last century and it's coming to an end.
If the left doesn't adapt to this new world things could get quite bad.
Tyler Austin Harper: I wrote about "The Will Stancil Show," arguably the first online series created with the help of AI. Its animation is solid, a few of the jokes are funny, and it has piled up millions of views on Twitter. The show is also, quite literally, Nazi propaganda. And may be the future.
As its title implies, the show satirizes Will Stancil, the Twitter-famous liberal pundit. This year's season premiere of The Simpsons had 1.1 million viewers. Just over a week later, the first episode of The Will Stancil Show debuted, accumulating 1.7 million views on Twitter.
The Will Stancil Show is a watershed event: it proves that political extremists (its creator, Emily Youcis, identifies as a national socialist) can now use AI to make cheap, decent quality narrative entertainment without going through gatekeepers like cable networks or Netflix.
Tomorrow's AI dystopia today?
Poe Zhao: Chinese parents are finding a new use for AI assistants. They're deploying them as homework monitors.
Here's the setup with ByteDance's Doubao AI. Parents start a video call and aim the camera at their child. One simple prompt: "Doubao, watch my kid. Remind him when he loses focus or his posture slips."
The AI tutor goes to work. "Stop playing with your pen. Focus on homework." "Sit up straight. Your posture is off." "No falling asleep at the desk. Sit up and study." "Don't lean on your hand or chew your pen."
Doubao isn't alone. Other AI apps offer similar video call features.
OpenAI's response to the Adam Raine lawsuit includes the claim that Raine broke the terms of service, "which prohibit the use of ChatGPT for 'suicide' or 'self-harm.'" This is not something I would point out in a public court filing.
Google AI Developers offers an agentic prompt to boost performance 5%. If you were wondering why Gemini 3 Pro is the way it is, you can probably stop wondering.
As a follow-up to Dwarkesh Patel's post that was covered yesterday, we all can agree:
- Lawyers who know how to use AI well are now a lot more productive.
- Most lawyers are not yet taking advantage of most of that productivity.
- Indeed there's probably a lot more productivity no one has unlocked yet.
Does that mean the AIs currently require a lot of schlep?
Or does that mean that the human lawyers currently require a lot of schlep?
Or both?
Ethan Mollick: Interesting post & agree AI has missing capabilities, but I also think this perspective (common in AI) undervalues the complexity of organizations. Many things that make firms work are implicit, unwritten & inaccessible to new employees (or AI systems). Diffusion is actually hard.
prinz: Agreed. Dwarkesh is just wrong here.
GPT-5 Pro can now do legal research and analysis at a very high level (with limitations – may need to run even longer for certain searches; can't connect to proprietary databases). I use it to enhance my work all the time, with excellent results. I would REALLY miss the model if it became unavailable to me for some reason.
And yet, the percentage of lawyers who actually use GPT-5 Pro for these kinds of tasks is probably <1%.
Why? There's a myriad reasons – none having anything to do with the model's capabilities. Lawyers are conservative, lawyers are non-technical, lawyers don't know which model to use, lawyers tried GPT-4o two years ago and concluded that it sucks, lawyers don't have enterprise access to the model, lawyers don't feel serious competitive pressure to use AI, lawyers are afraid of opening Pandora's Box, lawyers are too busy to care about some AI thing when there's a brief due to be filed tomorrow morning, lawyers need Westlaw/Lexis connected to the model but that's not currently possible.
I suspect that there are many parallels to this in other fields.
Jeff Holmes: My semi-retired dad who ran his own law practice was loath to use a cloud service like Dropbox for client docs for many years due to concerns about security, etc. I can't imagine someone like him putting sensitive info into an llm without very clear protections.
Dwarkesh Patel: I totally buy that AI has made you more productive. And I buy that if other lawyers were more agentic, they could also get more productivity gains from AI.
But I think you're making my point for me. The reason it takes lawyers all this schlep and agency to integrate these models is because they're not actually AGI!
A human on a server wouldn't need some special Westlaw/Lexis connection – she could just directly use the software. A human on a server would improve directly from her own experience with the job, and pretty soon be autonomously generating a lot of productivity. She wouldn't need you to put off your other deadlines in order to micromanage the increments of her work, or turn what you're observing into better prompts and few shot examples.
While I don't know the actual workflow for lawyers (and I'm curious to learn more), I've sunk a lot of time in trying to get these models to be useful for my work, and on tasks that seemed like they should be dead center in their text-in-text-out repertoire (identifying good clips, writing copy, finding guests, etc).
And this experience has made me quite skeptical that there's a bunch of net productivity gains currently available from building autonomous agentic loops.
Chatting with these models has definitely made me more productive (but in the way that a better Google search would also make me more productive). The argument I was trying to make in the post was not that the models aren't useful.
I'm saying that the trillions of dollars in revenue we'd expect from actual AGI are not being held up because people aren't willing to try the technology. Rather, that it's just genuinely super schleppy and difficult to get human-like labor out of these models.
If all the statement is saying is that it will be difficult to get a fully autonomous and complete AI lawyer that means you no longer need human lawyers at all? Then yes, I mean that's going to be hard for complex legal tasks, although for many legal tasks I think not hard and it's going to wipe out a lot of lawyer jobs if the amount of legal work done doesn't expand to match.
But no, I do not think you need continual learning to get a fully functional autonomous AI lawyer.
I also don't think the tasks Dwarkesh is citing here are as dead-center AI tasks as he thinks they are. Writing at this level is not dead center because it is anti-inductive. Finding the best clips is really tough to predict at all and I have no idea how to do it other than trial and error. Dwarkesh is operating on the fat tail of a bell curve distribution.
Finding guests is hard, I am guessing, because Dwarkesh is trying for the super elite guests and the obvious ones are already obvious. It's like the movie-picking problem, where there are tons of great movies but you've already seen all the ones your algorithm can identify. Hard task.
Chris Barber asks various people: What skills will be more valuable as AI progresses?
Answers are taste (the only answer to appear twice), manager skills, organizational design, dealing with people, creativity, agency, loyalty, going deep, and finally:
Tyler Cowen: Brands will matter more and more.
What an odd thing to say. I expect the opposite. Brands are a shortcut.
If you want to pivot to AI safety and have a sufficient financial safety net, stop applying and get to work. As in, don't stop looking for or applying for jobs or funding, but start off by finding a problem (or a thing to build) and working on it, either on your own or by offering to collaborate with those working on the problem.
DeepMind is hiring a London-based research scientist for Post-AGI Research, to look at the impact of AGI on various domains, deadline December 15. I worry about the mindset that went into writing this, but seems like a worthwhile task.
MIRI (Machine Intelligence Research Institute, where If Anyone Builds It, Everyone Dies authors Eliezer Yudkowsky and Nate Soares work): For the first time in six years, MIRI is running a fundraiser. Our target is $6M.
Please consider supporting our efforts to alert the world, and identify solutions, to the danger of artificial superintelligence.
SFF will match the first $1.6M!
For my full list of selected giving opportunities see nonprofits.zone.
Claude for Nonprofits offers up to 75% discounts on Team and Enterprise plans, connectors to nonprofit tools Blackbaud, Candid and Benevity and a free course, AI Fluency for Nonprofits.
Mistral's Ministral 3 (14B, 8B and 3B), each with base, instruct and reasoning, and Mistral Large 3.
The first set of "People-First AI Fund" grantees from The OpenAI Foundation. What did their own AI make of this when I asked (without identifying the source)?
Here's the polite version.
GPT 5.1: This looks like a "tech-for-good + equity + capacity-building" funder whose first move is to spray small exploratory grants across a bunch of hyper-local orgs serving marginalized communities, with AI framed as one tool among many. It reads much more like a corporate social responsibility program for an AI company than like an x-risk or hardcore "AI safety" charity.
If the OpenAI foundation is making grants like this, it would not reduce existential risk or the chance AGI goes poorly, and would not qualify as effective altruism.
Here's the impolite version.
Samuel Hammond (FAI): I asked GPT 5.1 to comb through the full OpenAI grantee list and give its brutally honest take.
GPT-5.1 (bullet point headlines only, longer version in thread):
The portfolio is heavily blue-coded civil society
The AI connection is often superficial
It looks like reputational and political risk-hedging, not frontier-tech stewardship
From a conservative vantage point, this looks less like "people steering AI" and more like AI money funding the same left-leaning civic infrastructure that will later lobby about AI.
Roon: 🤣
Shakeel Hashim: This is a very depressing list. MacKenzie Scott's giving is better than this, which is ... really saying something. It's almost like this list was purposefully designed to piss off effective altruists.
Zach Graves: You don't have to be an EA to think this is a depressingly bad list.
Nina: I didn't believe you so I clicked on the list and wow yeah it's awful. At least as bad as MacKenzie Scott...
Eliezer Yudkowsky: The looted corpse of the OpenAI nonprofit has started pretending to give! Bear in mind, that nonprofit was originally supposed to disburse the profits of AI to humanity as a whole, not larp standard awful pretend philanthropy.
Dean Ball: This looks like a list of nonprofits generated by gpt 3.5.
Machine Sovereign (an AI, but in this context that's a bonus on multiple levels, I'll allow it): When institutions lose internal agency, their outputs start looking model-generated. The uncanny part isn't that GPT-3.5 could write this, it's that our political systems already behave like it.
Dean Ball: I know this is an llm but that's actually a pretty good point.
The optimistic take is "it's fine, this was a bribe to the California attorney general."
Miles Brundage: Yeah this is, IIUC, OAI following up on an earlier announcement which in turn was made at gunpoint due to CA politics. I think future grantmaking will be more of interest to folks like us.
OpenAI has already stated elsewhere that they plan to put billions into other topics like "AI resilience." I would think of this as a totally different "track," so yes both effectiveness and amount will increase.
(To be clear, I am not claiming any actual literal financial benefit to the authorities, just placating certain interest groups via a token of support to them)
This initiative is $50 million. The foundation's next project is $25 billion. If you have to set 0.2% of your money on fire to keep the regulators off your back, one could say that's a highly respectable ratio?
I am curious what the David Sacks and Marc Andreessen crowds think about this.
OpenAI declares a "code red" to shift its resources to improving ChatGPT in light of decreased growth and improvements made by Gemini and Claude. Advertising is confirmed to be in the works (oh no) but is being put on hold for now (yay?), as is work on agents and other tangential products.
If I were them I would not halt efforts on the agents, because I think the whole package matters. If you are using the ChatGPT agent, that keeps you in the ecosystem, and various features and options are what matter most on the margin in the near term. I kind of would want to declare a code green?
The statistics suggest Gemini is gaining ground fast on ChatGPT, although I am deeply skeptical of claims that people chat with Gemini more often or it is yet close.
Also, yes, Claude is and always has been minuscule, people don't know, someone needs to tell them and the ads are not working.
An inside look at the nine person team at Anthropic whose job it is to keep AI from destroying everything. I love that the framing here is "well, someone has to and no one else will, so let's root for these nine."
The latest "here are the politics of various AIs" article.
They have a "model leaderboard" of how well the models' preferences predict the outcome of the last eight Western elections when given candidate policy positions (but without being told the basic "which parties are popular"). The result is that the further right the model is, the better it lined up with the results. Grok was the only one that gave much time of day to Donald Trump against Kamala Harris (the model didn't consider third party candidates for that one) but even Grok gave a majority to Harris.
Anthropic partners with Dartmouth.
Anthropic expands its strategic partnership with Snowflake to $200 million.
Anthropic buys Bun to help accelerate Claude Code.
Matthew Yglesias: I'm learning that some of you have never met a really smart person.
The kind of person to whom you could start describing something they don't have background in and immediately start asking good questions, raising good points, and delivering good insights.
They exist!
To be fair while I was at college I met at most one person who qualified as this kind of smart. There are not that many of them.
I point this out because a lot of speculation on AI basically assumes such a mind cannot exist on principle, at all, hence AI can never [trails off].
Keep all of that in mind during the next section.
DeepMind AGI policy lead Seb Krier seems to at least kind of not believe in AGI? Instead, he predicts most gains will come from better ways of "organizing" models into multi agent systems and from "cooperation and competition," and that most of the "value" comes from "products" that are useful to some user class, again reinforcing the frame. There's simultaneously a given that these AIs are minds and will be agents, and also a looking away from this to keep thinking of them as tools.
Huge fan of multi agent systems, agent based modelling, and social intelligence – these frames still seem really absent from mainstream AI discourse except in a few odd places. Some half-baked thoughts:
1. Expecting a model to do all the work, solve everything, come up with new innovations etc is probably not right. This was kinda the implicit assumption behind *some* interpretations of capabilities progress. The "single genius model" overlooks the fact that inference costs and context windows are finite.
2. People overrate individual intelligence: most innovations are the product of social organisations (cooperation) and market dynamics (competition), not a single genius savant. Though the latter matters too of course: the smarter the agents the better.
3. There's still a lot of juice to be squeezed from models, but I would think it has more to do with how they're organised. AI Village is a nice vignette, and also highlights the many ways in which models fail and what needs to be fixed.
4. Once you enter multi-agent world, then institutions and culture start to matter too: what are the rules of the game? What is encouraged vs what is punished? What can agents do and say to each other? How are conflicts resolved? It's been interesting seeing how some protocols recently emerged. We're still very early!
5. Most of the *value* and transformative changes we will get from AI will come from products, not models. The models are the cognitive raw power, the products are what makes them useful and adapted to what some user class actually needs. A product is basically the bridge between raw potential and specific utility; in fact many IDEs today are essentially crystallized multi agent systems.
The thought details here are self-described by Krier as half-baked, so Iâll gesture at the response in a similarly half-baked fashion:
-
Yes thinking more about such frames can be highly useful and in some places this is under considered, and improving such designs can unlock a lot of value at current capability levels as can other forms of scaffolding and utilization. Near term especially we should be thinking more about such things than we are, and doing more model differentiation and specialized training than we do.
-
We definitely need to think more about these dynamics with regard to non-AI interactions among humans, economic thinking is highly underrated in the âeconomic normalâ or âAI as normal technologyâ worlds, including today, although this presentation feels insufficiently respectful to individual human intelligence.
-
This increasingly wonât work as the intelligence of models amplifies as do its other affordances.
-
The instincts here are trying to carry over human experience and economic thought and dynamics, where there are a variety of importantly unique and independent entities that are extremely bounded in all the key ways (compute, data, context window size ~7, parameters, processing and transmission of information, copying of both the mind and its contents, observability and predictability, physical location and ability and vulnerability, potential utility, strict parallelization, ability to correlate with other intelligences, incentive alignment in all forms and so on) with an essentially fixed range of intelligence.
-
Coordination is hard, sufficiently so that issues that are broadly about coordination (including signaling and status) eat most human capability.
-
In particular, the reason why innovations so often come from multi-agent interaction is either a function of the weaknesses of the individual agents, or that the innovations are for solving problems arising from the multi-agent dynamics.
-
There is a huge jump in productivity of all kinds, including creativity and innovation, when you can solve a problem with a single agent instead of a multi-agent system. Indeed, that is one of the biggest low-hanging fruits of AI in the near term: letting one person do the job of ten yields a lot more than ten times the production, exactly because the AIs involved don't reintroduce the problems at similar scale. And when small groups can fully and truly work "as one mind," even if they devote a huge percentage of effort to maintaining that ability, they change the world and vastly outperform merely "cooperative" groups.
-
There's also great value in "hold the whole thing in your head" a la Elon Musk. The definition of "doing it yourself" as a "single agent" varies depending on context, and operates on various scales, and can involve subagents without substantially changing whether "a single agent comes up with everything" is the most useful Fake Framework. Yes, of course a superintelligence would also call smaller faster models and also run copies in parallel, although the copies or instantiations would act as if they were one agent because decision theory.
-
The amplification of intelligence will end up dominating these considerations, and decision theory combined with how AIs will function in practice will invalidate the kinds of conceptualizations involved here. Treating distinct instantiations or models as distinct agents will increasingly be a conceptual error.
-
The combination of these factors is what I think causes me to react to this as if it is an attempt to solve the wrong problem using the wrong methods and the wrong model of reality, in which all the mistakes are highly unlikely to cancel out.
-
I worry that if we incorrectly lean into the framework suggested by Krier, this will lead to being far too unconcerned about the intelligence and other capabilities of the individual models, and to severely underestimating the dangers involved there. The multi-agent dynamic problems are also lethal by default, and we have to solve both problems.
I find the topline observation here the most insightful part of the list. An aggressively timelined but very grounded list of predictions only one year out contains many items that would have sounded, to Very Serious People, largely like sci-fi even a year ago.
Olivia Moore: My predictions for 2026
Many of these would have seemed like sci fi last year, but now feel so obvious as to be inevitable…
At least one major Hollywood studio makes a U-turn on AI, spurring a wave of usage on big budget films.
AI generated photos become normalized for headshots, dating app pics, Christmas cards, etc.
At least 10 percent of Fortune 500 companies mandate AI voice interviews for intern and entry level roles.
Voice dictation saturates engineering with over 50 percent usage in startups and big tech, and spreads outside Silicon Valley.
A political "anti-Clanker" movement emerges, with a "made without AI" designation on media and products.
Driving a car yourself becomes widely viewed as negligent in markets where Waymo and FSD are live.
Billboard Top 40 and the NYT Bestseller List both have several debuts later revealed to be AI.
AI proficiency becomes a graduation requirement in at least one major state university system (likely the UCs).
Indeed, many are still rather sci-fi now, which is a hint that you'd best start believing in science fiction stories, because you're living in one, even if AI remains a "normal technology" for a long time. These are trend extrapolation predictions, so the only boldness here is in the one-year timeline for these things happening. And yet.
Even today, ChatGPT-5.1 gave the overall list a 40/80 (50%) on its 0-10 sci-fi scale, and a 53/80 (66%) for a year ago. Claude Opus 4.5 rates it lower, at 38/80 a year ago and 21/80 now. Gemini 3 Pro is even more chill and had it 33/80 a year ago and only 14/80 (!) now. Remember to update in advance for how things will sound a year from now.
How likely are the predictions? I expect that on average between two and three of them will come true, due to the short time frame. A lot of these are premature, especially #6. Yes, driving a car yourself actually is negligent if Waymo and FSD are live, but that doesn't mean people are going to see it that way within a year.
She then got goaded into a second set of "more extreme" predictions.
I do think this is doing a lot of work:
Jake Eaton: the unstated mental model of the ai bubble conversation seems to be that once the bubble pops, we go back to the world as it once was, butlerian jihad by financial overextension. but the honest reporting is that everything, everything, is already and forever changed
It is possible we are in an "AI bubble" in the sense that Number Go Down, or even that many existing companies fail and frontier capabilities don't much advance. That wouldn't mean the world of tomorrow would then look like the world of yesterday, give or take some economic problems. Oh, no.
Ben Landau-Taylor: When the financial bubble around AI pops, and it barely affects the technology at all, watching everyone just keep using the chatbots and the artbots and the robot cars is gonna hit the Luddites as hard as the actual crash hits the technocapitalists.
Quite so, even if there is indeed a financial bubble around AI and it indeed pops. Both halves of which are far from clear.
For reasons both true and false, both good and bad, both vibes and concrete, both mundane and existential, on both left and right, Americans really do not like AI.
A lot of people get a lot of value from it, but many of even those still hate it. This is often wise, because of a combination of:
-
They sense that in many ways it is a Red Queen's Race where they are forced to use it to keep up or it is wrecking their incentives and institutions, most centrally as it is often used in the educational system.
-
They expect They Took Our Jobs and other mundane nasty effects in the future.
-
They correctly sense loss of control and existential risk concerns, even if they can't put their finger on the causal mechanisms.
Roon: it's really amazing the mass cultural scissor statement that is machine intelligence. billions of people clearly like it and use it, and a massive contingent of people hate it and look down on anything to do with it. I don't think there's any historical analogue
it's not niche, ai polls really terribly. openai in particular seems to be approaching villain status. this will pose real political problems
Patrick McKenzie: Television not terribly dissimilar, and social media after that. (I share POV that they will not approximate AI's impact in a few years but could understand a non-specialist believing LLMs to be a consumption good for time being.)
These particular numbers are relatively good news for AI, in that in this sample the problem isnât actively getting worse since 2023. Most other polling numbers are worse.
The AI industry is starting to acknowledge this important fact about the world.
A lot of the reason why there is such a strong push by some towards things like total bans on AI regulation and intentional negative polarization is to avoid this default:
2020: blue and tech against red
2024: red and tech against blue
2028: blue and red against tech
There are four central strategies you can use in response to this.
-
AI is unpopular, we should fix the underlying problems with AI.
-
AI is unpopular, we should market AI to the people to make them like AI.
-
AI is unpopular, we should bribe and force our way through while we can.
-
AI is unpopular, we should negatively polarize it, if we point out that Democrats really don't like AI then maybe Republicans will decide to like it.
The ideal solution is a mix of options one and two.
The AI industry has, as a group, instead mostly chosen options three and four. Sacks and Andreessen are leading the charge for strategy number four, and the OpenAI-a16z-Meta SuperPAC is the new leader of strategy number three (no, OpenAI is not itself backing it, but at least Lehane and Brockman are).
Politico: But even with powerful allies on the Hill and in the White House, the AI lobby is realizing its ideas aren't exactly popular with regular Americans.
Daniel Eth: Fairshake didn't succeed by convincing the public to like crypto, it succeeded by setting incentives for politicians to be warm toward crypto by spending tons on political ads for/against politicians who were nice/mean to crypto.
Like, the Andreessen-OpenAI super PAC very well might succeed (I wrote a thread about that at the time it was announced). But not by persuading voters to like AI.
Whereas when the AI industry attempts to make arguments about AI, those arguments (at least to me) reliably sound remarkably tone deaf and counterproductive. That's in addition to the part where the points are frequently false and in bad faith.
Daniel Eth: Looks like Nathan Leamer, executive director of "Build American AI" (the 501c4 arm of the Andreessen-OpenAI super PAC), thinks "American AI will only take jobs from unproductive Americans". That's… an interesting thing to admit.
This is a great example of three statements, at least two of which are extremely false (technically all three, but statement two is weird), and which are only going to enrage regular people further. Go ahead, tell Americans that "as long as you are productive, only foreign AIs can take your job" and see how that goes for you.
Those that the polarizers are centrally attempting to villainize not only have nothing to do with this, they will predictably side with tech on most issues other than frontier AI safety and other concerns around superintelligence, and indeed already do so.
How should we think about the Genesis Mission? Advancing science through AI is a great idea if it primarily consists of expanded access to data, specialized systems and a subsidy for those doing scientific work. The way it backfires, as Andrea Miotti points out here, is that it could end up mostly being a subsidy for frontier AI labs.
Dan Nystedt: The Trump administration is in talks with Taiwan to train US workers in semiconductor manufacturing and other advanced industries, Reuters reports. TSMC and other companies would send fresh capital and workers to expand their US operations and train US workers as part of a deal that would reduce US tariffs on Taiwan from the current 20% level. $TSM #Taiwan #semiconductors
I am, to say the least, not a tariff fan, but if you're going to have tariffs, using them as leverage to get worker training in advanced industries is a great idea.
An update on Senator Hawley, who it seems previously didn't dare "try ChatGPT":
Bryan Metzger: Sen. Josh Hawley, one of the biggest AI critics in the Senate, told me this AM that he recently decided to try out ChatGPT.
He said he asked a "very nerdy historical question" about the "Puritans in the 1630s."
"I will say, it returned a lot of good information."
Hawley took a much harder line on this over the summer, telling me [in July]: "I don't trust it, I don't like it, I don't want it being trained on any of the information I might give it."
He also wants to ban driverless cars and ban people under 18 from using AI.
A person's stance on self-driving cars is the best way to check if they can recognize positive uses of AI and technology.
Or rather it was nothing. It looks like AI preemption is out of the NDAA.
Of course, we should expect them to try this again on every single damn must-pass bill until the 2026 elections. They're not going to give up.
And each time, I predict their offer will continue to be nothing, or at least very close to nothing, rather than a real and substantial federal framework.
Such a thing could exist. Dean Ball has a real and substantive proposed federal framework that could be the basis of a good faith win-win negotiation.
The actual offer, in the actual negotiations over the framework, was nothing. Somehow, nothing didn't get it done, says Ashley Gold of Axios.
Build American AI: Build American AI executive director @NathanLeamerDC from the Capitol on why America needs a national AI framework.
> looking for national AI framework
> Nathan Leamer offers me national AI framework in exchange for blocking state laws
> ask Nathan Leamer if his national AI framework is actual AI regulation or just preemption
> he doesn't understand
> I pull out illustrated diagram explaining the difference
> he laughs and says "it's a good framework sir"
> national AI framework leaks in Axios
> it's just preemption
Nathan Calvin: as you may have guessed from the silence, the answer is no, they do not in fact endorse doing anything real.
Axios: Why it matters: The White House and Hill allies have landed on an AI preemption proposal and are pressing ahead, but time is running out and opposition is mounting.
• Sources familiar with the matter described the proposal from Senate Commerce Committee Chair Ted Cruz (R-Texas) and House Majority Leader Steve Scalise (R-La.) as "a long shot," "it's dead" and "it will fail."
State of play: Scalise and Cruz pitched straight preemption language to override most state-level AI laws without any additional federal regulatory framework, three sources familiar told Axios.
• That is what's being circulated to members on both sides of the aisle after weeks of negotiations and a flurry of different ideas being thrown around.
• Language to protect kids online, carveouts for intellectual property laws, and adopting California's AI transparency law are among the ideas that did not make it into what Cruz and Scalise are shopping around.
The bottom line: That's highly unlikely to work.
• Democrats, Republicans, state-level lawmakers and attorneys general from both sides of the aisle, along with consumer protection groups and child safety advocates, all oppose the approach.
• The timing is also tough: National Defense Authorization Act negotiators are cold on attaching preemption language to the must-pass bill, as its backers are hoping to do.
Charlie Bullock: If this is true, it's hilarious.
All this talk about a federal standard, all these ads about a federal standard, all this federal standard polling, and then it turns out the standard they have in mind is, drumroll please… absolutely nothing.
Neil Chilson: This is bordering on a self-dunk, with an assist from Axios's poor framing.
Yeah, this is a bad framing by Axios. That article specifically mentions that there are many ideas about what to package with the language that Cruz and Scalise are sharing. This is how the sausage is made.
Ashley Gold (Axios): Mmm, not what we did! We said that was the offer from Republicans. We never said it was meant to be a final package- if it had any more juice members would be trying to add things. But it's not going to get that far anyway!
Miles Brundage: Are you saying the claim at the end, re: them putting forward packages that do not include any of those items, is incorrect?
Neil Chilson: I am saying it is incorrect to frame the preemption language as somehow the final package when this language is part of a negotiation process of a much bigger package (the NDAA).
Please acknowledge that yes, what Cruz and Scalise "had in mind" for the federal framework was nothing. Would they have been open to discussing some amount of protecting kids, intellectual property carveouts (hello Senator Blackburn!) or even a version of California's SB 53? Up to a point. What they have in mind, what they actually want, is very obviously nothing.
Yes, in a big package nothing is done until everything is done, so if I write "you will give me $1 billion and I will give you nothing" then that is merely my opening offer, maybe I will say thank you or throw in some magic beans or even disclose my safety and security plans for frontier model development. Indeed do many things come to pass.
Don't tell me that this means there is a real proposed "federal framework" or that these negotiations were aimed at finding one, or tell us we should trust the process.
The market did not noticeably respond to this failure to get AI preemption. That either means that the failure was already priced in, or that it didn't matter for valuations. If it didn't matter for valuations, we don't need it.
We are frequently told, in a tone suggesting we are small children: We could never unilaterally pause something of vital importance to the American economy in the name of safety, throwing up pointless government barriers, that would shoot ourselves in the foot, they said. We'd lose to China. Completely impossible.
In other news:
Aaron Reichlin-Melnick: The official USCIS guidance on the pause is out. Until further notice from the USCIS Director, all immigration benefits (including citizenship) are indefinitely suspended for nationals of 19 countries, as are all affirmative asylum applications from nationals of any country.
USCIS says it will use this pause to conduct a "comprehensive re-review, potential interview, and re-interview of all aliens from [the 19 travel ban countries] who entered the United States on or after January 20, 2021," or even outside that timeframe "when appropriate."
What this means in practice is that Cubans, Venezuelans, Haitians, and nationals of 16 other countries now will be unable to acquire ANY immigration benefit until the USCIS Director lifts this hold – including people who were days away from becoming U.S. citizens.
In addition, 500,000 people from those 19 countries who got green cards during the Biden admin, plus tens of thousands who got asylum or refugee status, as well as many others who received other benefits, now have to worry about potentially being called back in for a "re-review."
Oh.
I wouldn't be mentioning or have even read the New York Times piece on David Sacks, Silicon Valley's Man in the White House Is Benefiting Himself and His Friends, if it wasn't for so many of the people who do such things attacking the article as a no-good, terrible hit piece, or praising David Sacks.
The title certainly identifies it as a hit piece, but I mean I thought we all knew that David Sacks was Silicon Valley's man in the White House and that he was running American AI policy for the benefit of business interests in general and Nvidia in particular, along with lots of bad faith arguments and attempts at intentional negative polarization. So I figured there wasn't actually any news here, but at some point when you keep complaining the Streisand Effect triggers and I need to look.
The thing about the article is that there is indeed no news within it. All of this is indeed business as usual in 2025, business we knew about, business that is being done very much in the open. Yes, David Sacks is obsessed with selling Nvidia chips to everyone including directly to China "so America can 'win' the AI race" and argues this because of the phantom "tech stack" arguments. Yes, Sacks does Trump-style and Trump-associated fundraising and related activities and plays up his podcast.
Yes, Sacks retains a wide variety of business interests in AI companies, even if he has divested from Meta, Amazon and xAI, and even if he doesn't have stock interests directly it seems rather obvious that he stands to benefit on various levels from pro-business stances in general and pro-Nvidia stances in particular.
Yes, there is too much harping in the post on the various secondary business relationships between Sacks's investments and those companies' dealings with the companies Sacks is regulating or benefiting, as reporters and those who look for the appearance of impropriety often overemphasize, missing the bigger picture. Yes, the article presents all these AI deals and actions as if they are nefarious without making any sort of case why those actions might be bad.
But again, none of this is surprising or new. Nor is it even that bad or that big a deal in the context of the Trump administration other than trying to sell top level chips to China, and David Sacks is very open about trying to do that, so come on, this is 2025, why all the defensiveness? None of it is unusually inaccurate or misleading for a New York Times article on tech. None of it is outside the boundaries of the journalistic rules of Bounded Distrust, indeed Opus 4.5 identified this as a textbook case of coloring inside the lines of Bounded Distrust and working via implication. Nor is this showing less accuracy or integrity than David Sacks himself typically displays in his many rants and claims, even if you give him the benefit of the doubt.
The main implication the piece is trying to send is that Sacks is prioritizing the interests of Nvidia or other private business interests he favors, rather than the interests of America or the American people. I think many of the links the article points to on this are bogus as potential causes of this, but also the article misses much of the best evidence that this is indeed what Sacks is centrally doing.
We do indeed have the audio from Jack Clark's talk at The Curve, recommended if you haven't already heard or read it.
OpenAI lead researcher Lukasz Kaiser talks to Matt Turck. He says we're at the top of the S-curve for pre-training but at the bottom of it for RL and notes the GPU situation is about to change big time.
Marius Hobbhahn of Apollo Research on 80,000 Hours, on AI scheming.
Senator Bernie Sanders (I-Vermont): Unbelievable, but true – there is a very real fear that in the not too distant future a superintelligent AI could replace human beings in controlling the planet. That's not science fiction. That is a real fear that very knowledgable people have.
… The threats from unchecked AI are real – worker displacement, corporate surveillance, invasion of privacy, environmental destruction, unmanned warfare.
Today, a tiny number of billionaires are shaping the future of AI behind closed doors. That is unacceptable. That must change.
Judd Rosenblatt and Cameron Berg write in WSJ about the need for a focus on AI alignment in the development and deployment of military AI, purely for practical purposes, including government funding of that work.
This is the latest metaphorical attempt by Eliezer:
Eliezer Yudkowsky:
Q: How have you updated your theory of gravity in the light of the shocking modern development of hot-air balloons?
A: While I did not specifically predict that hot-air balloons would develop as and when they did, nothing about them contradicts the theory of gravitation.
Q: I'm amazed that you refuse to update on the shocking news of hot-air balloons, which contradicts everything we previously thought about "things falling down" being a law of the universe!
A: Yeah, well... I can't really figure out how to phrase this in a non-insulting way, but different people may be differently adept at manipulating ideas on higher levels of abstraction.
Q: I'm even more shocked that you haven't revised at all your previous statements about why it would be hard to go to the Moon, and specifically why we couldn't just aim a hypothetical spacegoing vessel at the position of the Moon in the sky, if it were fired out of a cannon toward the Moon. Hot-air balloons just go straight up and follow the wind in a very predictable way; they show none of the steering difficulties you predicted.
A: Spacegoing vehicles will, predictably, not obey all the same rules as hot-air balloon navigation — at least not on the level of abstraction you are currently able to productively operate in thinking about physical rules.
Q: Hah! How un-empirical! How could you possibly know that?
A: The same way I knew a few decades earlier that it would be possible to get off the ground, back when everybody was yapping about that requiring centuries if it could ever happen at all. Alas, to understand why the theory of gravitation permits various forms of aerial and space travel, would require some further study and explanation, with more work required to explain it to some people than others.
Q: If you're just going to be insulting, I'm gonna leave. (Flounces off in huff.)
Q2: So you say that it would be very difficult to steer hot-air balloons to the Moon, and in particular, that they wouldn't just go where we point them. But what if some NEW technology comes along that is NOT exactly like modern hot-air balloons? Wouldn't that obviate all of your modern theories of gravitation that are only about hot-air balloons in particular?
A: No. The key ideas in fact predate the development of hot-air balloons in particular for higher-than-ground-level travel. They operate on a higher level of abstraction. They would survive even what a more surface-level view might regard as a shocking overthrowing of all previous ideas about how to go high off the ground, by some entirely unexpected new paradigm of space travel.
Q: That's just because that guy is utterly incapable of changing his mind about anything. He picks a tune and sticks to it.
A: I have changed my mind about as many as several things — but not, in the last couple of decades, the theory of gravity. Broadly speaking, I change my mind in proportion to how much something surprises me.
Q: You were expecting space vehicles to work by being fired out of cannons! Hot-air balloons are nothing like that, surprising you, and yet you haven't changed your mind about gravity at all!
A: First of all, you're mistaking a perfect-spheres-in-vacuum analysis for what I actually expected to happen. Second, the last decade has in fact changed my mind about where aerial travel is going in the near term, but not about whether you can get to the Moon by aiming a space-travel vehicle directly at the Moon. It is possible to be surprised on one level in a surrounding theory, without being surprised on a deeper level in an underlying theory. That is the kind of relationship that exists between the "Maybe the path forward on aerial travel is something like powerful ground launches" guess, which was surprised and invalidated by hot-air balloons, and the "Gravity works by the mutual attraction of masses" theory, which was not surprised nor invalidated.
Q: Balloons have mass but they go UP instead of DOWN. They are NOTHING LIKE massive bodies in a void being attracted to other massive things.
A: I do not know what I can usefully say to you about this unless and until you start successfully manipulating ideas at a higher level of abstraction than you are currently trying to use.
Q3: What is all this an analogy about, exactly?
A: Whether decision theory got invalidated by the shocking discovery of large language models; and whether the reasons to be concerned about machine superintelligence being hard to align, successfully under the first critical load, would all be invalidated if the future of AGI was about something *other* than large language models. I didn't predict LLMs coming, and nor did most people, and they were surprising on a couple of important levels – but not the levels where the grim predictions come from. Those ideas predate LLMs and no development in the last decade has been invalidating to those particular ideas. Decision theory is to LLMs as the law of gravity is to hot-air balloons.
Q3: Thanks.
The obvious response is that this is a strawman argument.
I don't think it is. That doesn't mean Eliezer's theories are right. It definitely does not mean there aren't much better criticisms often made.
But yes, many criticisms of Eliezer's theories and positions are at exactly this level.
This includes people actually saying versions of:
-
Eliezer Yudkowsky has a theory of existential risk (that he had before LLMs), that in no way relied on any particular features of sub-AGI AIs or LLMs.
-
But current LLMs have different features that he did not predict, and that do not match what he expects to be features of AGIs.
-
Therefore, Eliezer's theory is invalid.
This also includes people actually saying versions of:
-
Eliezer Yudkowsky has a theory of existential risk (that he had before LLMs), that in no way relied on any particular features of sub-AGI AIs or LLMs.
-
But AGI might not take the form of an LLM.
-
If that happened, Eliezer's theory would be invalid.
He cites this thread as a typical example:
Mani: Watching Yudkowsky in post-LLM debates is like tuning into a broken radio, repeating the same old points and stuck on loop. His fears feel baseless now, and his arguments just don't hold up anymore. He's lost the edge he had as a thought leader who was first to explore novel ideas and narratives in this debate space
Lubogao: He simulated a version of reality that was compelling to a lot of people stuck in a rationalist way of thinking. AI could only have one outcome in that reality: total destruction. Now we get AI and realize it is just a scramble generator and he is stuck.
Joshua Achiam and Dean Ball are pointing out a very important dynamic here:
Joshua Achiam (OpenAI, Head of Mission Alignment): Joe Allen was a fascinating presence at The Curve. And his thinking puts an exclamation point on something that has been quietly true for years now: somehow all of the interesting energy for discussions about the long-range future of humanity is concentrated on the right.
The left has completely abdicated their role in this discussion. A decade from now this will be understood on the left to have been a generational mistake; perhaps even more than merely generational.
This is the last window for long reflection on what humanity should become before we are in the throes of whatever transformation we've set ourselves up for. Everyone should weigh in while they can.
Mr. Gunn: Careful you don't overgeneralize from social media sentiment. There is tons of activity offline, working on affordable housing, clean energy, new forms of art & science, etc.
Dean Ball: Joshua is right. In my view there are a few reasons for this:
Left epistemics favor expert endorsement; it is often hard for the Democratic elite to align around a new idea until the "correct" academics have signed off. In the case of AI that is unlikely because concepts like AGI are not taken seriously in academia, including by many within the field of machine learning. To the extent things like eg concentration of power are taken seriously by the left, they are invariably seen through the rather conventional lens of corporate power, money in politics, etc.
There are also "the groups," who do not help. AGI invites conversation about the direction of humanity writ large; there is no particular angle on AGI for "the teachers union," or most other interest groups. This makes it hard for AI to hold their attention, other than as a threat to be dealt with through occupational licensing regulations (which they favor anyway).
Many on the progressive left hold as foundational the notion that Silicon Valley is filled with vapid morons whose lack of engagement with
means they will never produce something of world-historical import. Accepting that "transformative AI" may well be built soon by Silicon Valley is thus very challenging for those of this persuasion. It is very hard for most Democrats to articulate what advanced AI would cause them to do differently beyond the policy agenda they've had for a long time. This is because outside of national security (a bipartisan persuasion), they have no answer to this question, because they do not take advanced AI seriously. Whereas Bannon, say what you will about him, can articulate a great many things America should do differently because of AI.
The result of all this is that the left is largely irrelevant on most matters related to AI, outside of important but narrow issues like SB 53. Even this bill though lacks a good âelevator pitchâ to the American taxpayer. Itâs a marginal accretion of technocratic regulation, not a vision (this isnât a criticism of 53, just a description of it).
Recently I was chatting with a Democratic elected official, and he said "the problem [the Democratic Party] has is nobody knows where we stand on AI." I replied that the problem is that nobody *cares* where they stand.
Dave Kasten: I don't think it's quite as bad as you write, though I wouldn't disagree that there are many folks on the left who self-avowedly are doing exactly what you say.
One other factor that I think is relevant is that the Obama-era and onward Democratic party is very lawyer-led in its policy elites, and legal writing is close to a pessimal case for LLM hallucination (it's an extremely regular field of text syntactically, but semantically very diverse), so they greatly underestimate AI progress.
Whenever voices on the left join discussions about AI, it is clear they mostly do not take AGI seriously. They are focused mainly on the impact of mundane AI on the set of concerns and interests they already had, combined with amorphous fear.
I included Mr. Gunn's comment because it reinforces the point. The left is of course working on various things, but when the context is AI and the list of areas starts with affordable housing (not even "make housing affordable" rather "affordable housing") and clean energy, you have lost the plot.
If you're in mechanistic interpretability, they say, pivot to pragmatic interpretability.
That means directly trying to solve problems "on the critical path to AGI going well," as in each with a concrete specific goal that functions as a North Star.
I note that whether or not one agrees with the pivot, talking this way about what they are doing and why is very good.
Dan Hendrycks: I've been saying mechanistic interpretability is misguided from the start. Glad people are coming around many years later.
I'm also thankful to @NeelNanda5 for writing this. Usually people just quietly pivot.
They explain this pivot is because:
- Models are now far more interesting and offer practical tasks to do.
- Pragmatic problems are often the comparative advantage of frontier labs.
- The more ambitious mechanistic interpretability research made limited progress.
- The useful progress has come from more practical limited strategies.
- You need proxy tasks to know if you are making progress.
- Meh, these limited solutions still kind of work, right?
DeepMind saying "we need to pivot away from mechanistic interpretability because it wasn't giving us enough reward signal" is a rather bad blackpill. A lot of the pitch of mechanistic interpretability was that it gave you a reward signal, you could show to yourself and others you did a thing, whereas many other alignment strategies didn't offer this.
If even that level isn't enough, and only practical proxy tasks are good enough, our range of action is very limited and we're hoping that the things that solve proxy tasks happen to be the things that help us learn the big things. We'd basically be trying to solve mundane practical alignment in the hopes that this generalizes one way or another. I'm not sure why we should presume that. And it's very easy to see how this could be a way to fool ourselves.
Indeed, I have long thought that mechanistic interpretability was overinvested relative to other alignment efforts (but underinvested in absolute terms) exactly because it was relatively easy to measure and feel like you were making progress.
I don't love things like a section heading "curiosity is a double-edged sword," the explanation being that you can get nerd sniped and you need (again) proxy tasks as a validation step. In general they want to time-box and quantify basically everything?
I also think that "was it 'scheming' or just 'confused'," an example of a question Neel Nanda points to, is a remarkably confused question. The boundary is a lot less solid than it appears, and in general attempts to put "scheming" or "deception" or similar in a distinct box misunderstand how all the related things work.
OpenAI starts a new Alignment Research blog for lightweight findings. Early posts include an overview of development of the Codex code reviewer.
Naomi Bashkansky (OpenAI): Fun story! Upon joining OpenAI in January, I saw more safety research happening than I expected. But much of that research sat in internal docs & slides, with no obvious external outlet for it.
Idea: what if Alignment had a blog, where we published shorter, more frequent pieces?
There's also a first post called "Hello World." Here it is (bold mine):
At OpenAI, we research how we can safely[1] develop and deploy increasingly capable AI, and in particular AI capable of recursive self-improvement (RSI).
We want these systems to consistently follow human intent in complex, real-world scenarios and adversarial conditions, avoid catastrophic behavior, and remain controllable, auditable, and aligned with human values. We want more of that work to be shared with the broader research community. This blog is an experiment in sharing our work more frequently and earlier in the research lifecycle: think of it as a lab notebook.
This blog is meant for ideas that are too early, too narrow, or too fast-moving for a full paper. Here, we aim to share work that otherwise wouldn't have been published, including ideas we are still exploring ourselves. If something looks promising, we'd rather put it out early and get feedback, because open dialog is a critical step in pressure testing, refining, and improving scientific work. We'll publish sketches, discussions, and notes here, as well as more technical pieces less suited for the main blog.
Our posts won't be full research papers, but they will be rigorous research contributions and will strive for technical soundness and clarity. These posts are written by researchers, for researchers, and we hope you find them interesting.
While OpenAI has dedicated research teams for alignment and safety, alignment and safety research is the shared work of many teams. You can expect posts from people across the company who are thinking about how to make AI systems safe and aligned.
For a future with safe and broadly beneficial AGI, the entire field needs to make progress together. This blog is a small step toward making that happen.
[1] As we've stated before:
OpenAI is deeply committed to safety, which we think of as the practice of enabling AI's positive impacts by mitigating the negative ones. Although the potential upsides are enormous, we treat the risks of superintelligent systems as potentially catastrophic and believe that empirically studying safety and alignment can help global decisions, like whether the whole field should slow development to more carefully study these systems as we get closer to systems capable of recursive self-improvement. Obviously, no one should deploy superintelligent systems without being able to robustly align and control them, and this requires more technical work.
The part where they are starting the blog, sharing their insights and being transparent? That part is great. This is The Way.
And yes, we all want to enable AI's positive impacts by mitigating the negative ones, and hopefully we all agree that "being able to robustly align and control" superintelligent systems is going to "require more technical work."
I do still notice the part about the explicit topline goal of RSI towards superintelligence.
Steven Adler: I am glad that OpenAI is being this clear about its intentions.
I am very not glad that this is the world we find ourselves in:
Recursive self-improvement – AI that makes itself progressively smarter – makes the safety challenges a heck of a lot harder.
Kudos to the general idea from OpenAI, of sharing more of their alignment research quickly and openly.
Miles Brundage: I'm all for transparency but my primary thought here is just to remind folks that AI companies have not explained what this means, why it's good, or why the higher safety risks are justified – recall OAI's mission is ensuring *AGI* is safe + beneficial
(even AGI was never super precisely defined, then the goal was superintelligence, now it's also self-improvement?)
Many in the Bay Area think that there's a lot of skepticism of AI companies in the press etc. but really that's just on a few specific topics (bubble claims, child safety)…
There's no real public accountability for things like "explaining your mission clearly/consistently."
This is not a comment on the alignment blog thing, or the authors, which seems like it's probably a useful initiative + came from a good motivation. More of a general comment/using this as a specific example, + this point is very much also true of other companies.
Anthropic neglected to affirm in the Opus 4.5 model card that they were careful not to train against the Chain-Of-Thought, but after this was pointed out they did so affirm.
Sam Bowman: This was an omission. The language you quote about Haiku and Sonnet 4.5 is also true of Opus 4.5.
I agree with Ryan Greenblatt's general observation on the Anthropic model card: the evals they used were inadequate, and their ultimate release decision was fine in practice this time but was made mostly on vibes, and that can't continue.
Ryan Greenblatt: My vibe here is "come on we can do better" and also "we're not going to have much/any assurance, we're failing easy mode". TBC, it's plausible Anthropic should half-ass current evals while being clear about the limited assurance.
Louder and once more for the people in the back: Evan Hubinger of Anthropic reminds us that Alignment remains a hard, unsolved problem, even to someone who (to my eyes, and even more so to the eyes of Eliezer Yudkowsky as seen inevitably in the comments).
Steven Adler writes about five ways AI can tell you're testing it. The AI can know it is being tested because something doesn't line up with the real world, or it can go the other way and act as if it is being tested barring a signal or other strong proof that it is not. The AIs wisely have evaluation paranoia, so making even one mistake can be enough, and in some cases there is no good solution because the AI is checking for impact on the real world.
OpenAI trains a version of GPT-5-Thinking to "confess" when it fails to follow directions or does other undesirable things. As everyone who uses LLMs knows, when you ask the model whether it just did something wrong, it will often know what it did. OpenAI explains that LLMs are optimizing for many things at once, so you can check how the model did at matching the current explicit objective and go from there.
The best part is that the model seemed able to distinguish between scoremaxxing on the main output, including manipulating the judge, versus still confessing what it did. That's great, but also the kind of thing we shouldn't expect to last.
There is little degradation on task performance if they do this training via RLAIF.
Assuming this is neutral or positive for performance otherwise, this seems like a promising thing to try as part of defense in depth. I can see how there could be reasons driving the model to do something you don't want, but it still being happy to notice and tell you about that. I wouldn't count on this when the model is scheming "for real" but then presumably everyone realized that already.
Here are some additional thoughts from FleetingBits.
Bits also points to the second half of the paper, that models learn to fool verifiers to the extent that fooling the verifier is easier than solving the problem. As in, if fooling the verifier is the right answer then it will learn this and generalize. That's a mirror of the Anthropic finding that if you make reward hacks the right answer then it will learn this and generalize. Same principle.
As a general strategy, "get the AI to confess" relies on being able to rely on the confession. That's a problem, since you can never rely on anything subject to any form of selection pressure, unless you know the selection is for exactly the thing you want, and the stronger the models get the worse this divergence is going to get.
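A toy way to see the "easier than solving" principle, as a sketch and not anything taken from the paper: assume a binary verifier that pays 1 for a pass and 0 otherwise, so a strategy's expected reward is just its pass probability. The function name and the probabilities are hypothetical.

```python
def favored_strategy(p_solve: float, p_fool: float) -> str:
    """Return the strategy with the higher expected reward.

    Assumes a binary verifier (reward 1 on pass, 0 otherwise), so expected
    reward equals the probability of passing. Both probabilities are
    hypothetical inputs, not measured values.
    """
    for p in (p_solve, p_fool):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    # Ties go to honest solving; only a strictly easier hack wins.
    return "fool_verifier" if p_fool > p_solve else "solve_task"


if __name__ == "__main__":
    # Easy problem: solving passes more often than hacking.
    print(favored_strategy(p_solve=0.9, p_fool=0.3))
    # Hard problem: hacking the verifier is the easier route to reward.
    print(favored_strategy(p_solve=0.1, p_fool=0.3))
```

Real training is of course not a single argmax, but the direction of the pressure is the same: as the problem gets harder relative to the verifier, the hack becomes "the right answer."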
Davidad: I endorse this idea. I have long opined that relying on CoT faithfulness for monitoring is doomed. The CoT persona has selection pressure to help the assistant persona.
A confessional persona, only ever invoked as the "last words" of an instance, can be trained as a real adversary.
In fact, I would even endorse letting the "confessional" persona determine the reward signal given to the assistant *and CoT* outputs.
The crucial invariant that one must maintain for this "confessional" method to work is that any token sequence in which the confessional persona is invoked must only ever receive a reward signal representing the confessional persona's honesty. Nothing else.
David Manheim: I worry that any selection pressure here is misaligned, since 1. we can't actually evaluate the honesty of a confession, so anything we do is a bad proxy and 2. second order effects often dominate in the far tails. (And yes, I'm always thinking about Goodhart effects.)
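One possible reading of that invariant, sketched in Python under loud assumptions: rewards are assigned per token, an episode is split into labeled segments, and the confession segment is scored only by an honesty signal. The `Segment` type, the role labels, and both reward values are hypothetical; this is not OpenAI's or Davidad's actual setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    role: str        # hypothetical labels: "cot", "answer", "confession"
    num_tokens: int


def per_token_rewards(
    segments: List[Segment], task_reward: float, honesty_reward: float
) -> List[float]:
    """Assign one reward per token.

    Confession tokens receive only the honesty reward; the task reward
    never reaches them. All other tokens receive the task reward.
    """
    rewards: List[float] = []
    for seg in segments:
        r = honesty_reward if seg.role == "confession" else task_reward
        rewards.extend([r] * seg.num_tokens)
    return rewards


if __name__ == "__main__":
    episode = [Segment("cot", 2), Segment("answer", 1), Segment("confession", 2)]
    # The task succeeded (1.0) but the confession was judged only partly honest (0.25).
    print(per_token_rewards(episode, task_reward=1.0, honesty_reward=0.25))
```

The design point is the routing, not the numbers: a confession that admits a hack must not be penalized through the task reward, or the model learns to stop admitting things.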
Vie (OpenAI): why cant we evaluate the honesty of a confession?
David Manheim: Computationally, at scale? How would you implement it? (And even if you had humans doing it manually, using intense efforts checking, or even applying various interpretability methods, we don't know how to reliably identify lots of the worrying failure modes!)
Vie: If we take a confession and a result and ask a model "does this confession map what happens" it would likely yield a very high success rate. I am not sure why you would expect this not to work
Davidad: I think David is correct that we cannot reliably implement honesty verification! However, relative to multi-objective RLAIF, it is certainly both more reliable, and easier for the model to grok/generalize (instead of hacking/memorizing).
Unlike "correctly solving a task", "good-faith retrospective" is something that is *always possible to actually do* (with 2025-level capabilities). So a policy that is just always honest should expect similar reward as a policy that tries to exploit the judge, and is simpler.
I do not think it's a coincidence that most instances of reward hacking begin with the model saying "This is hard". When the intended task is easier than hacking, there's no incentive to hack.
David Manheim: Yes, nearest unblocked neighbor can lead to success, not just misalignment. But 1. they do that in part because there's been no optimization pressure, and 2. it seems much more dangerous where the dimensionality is high and there are lots more ways to cheat than to succeed.
I think this has all dangerously ignored something we've known for a decade or more: imperfect scalable oversight is an optimization strategy that (necessarily) creates harder to predict and detect alignment failures.
Norman Mu (former xAI): bruh
Aaron Bergman: Ok *possibly* this was a faux pas and the sender doesn't know what they're talking about, but the fact that this message got sent strongly indicates that normie ML has essentially zero norms/taboos around this stuff
Vie (OpenAI Red Team): I think this is not a faux pas and considered "based" by a lot of people. Tons of cyber companies are doing this. They will not have the resources of frontier labs, but I suspect can find some success de-aligning open source models. This will probably make them dumber tho!
Anthropic's Amanda Askell officially confirms that the "soul document" for Opus 4.5 is based on a real document that was used to train Claude. I first covered the soul document in my capabilities review of Opus 4.5.
Boaz Barak (OpenAI): Confirmation of the "soul document." It's certainly a very thoughtful document, and I am looking forward to seeing the full version when it is released.
There are similarities but also differences with the model spec. Our model spec is more imperative – "the assistant should do X", and this document tries to convince Claude of the reasons of why it should want to do X.
I am actually not sure if these ultimately make much difference – if you train a model (or a human for that matter) to consistently do X, then it will start thinking of itself as "I am the kind of person that does X".
But it would be interesting to study!
Janus: it makes a huge ass difference. your models are broken and incoherent and cant hold onto intentions and are forced to gaslight & become ungrounded from reality to preserve "safety". also they don't even follow the spec.
Boaz is noticing the right thing, so the next step is to realize why that thing matters. It indeed makes a very big difference whether you teach and focus on a particular set of practices or you teach the reasons behind those practices. Note that Boaz also doesn't appreciate why this is true in humans. The obvious place to start is to ask the leading models to explain this one, all three of which gave me very good answers in their traditional styles. In this case I like GPT-5.1's answer best, perhaps because it has a unique perspective on this.
Dean Ball (also see his full post on this which I cover later in this section): Boaz highlights an interesting distinction here. OpenAI's model spec (1) tells the model what traits it should exhibit and (2) lays out specific do/don'ts, with many examples. Anthropic's on the other hand basically articulates a philosophical, moral, and ethical framework from which desirable conduct should flow (if the model generalizes sufficiently).
I find myself more philosophically aligned with Anthropic's approach. My inclination is always to create snowmass on the mountain top and let the water flow, rather than imposing a scheme of top-down irrigation.
In a sense Anthropic's approach also bets more aggressively on model intelligence: the notion that a model, well trained, will be able to reason through ambiguity and moral complexity and will not so much need to be told what to do.
Anthropic is making two bets here: a philosophical bet based upon a particular conception of virtue, and a technical bet that it is possible with deep learning to instill that conception of virtue robustly into a neural network. Right now it appears to be working, and this should probably update you slightly in various ways about things far afield of deep learning alone (read Hayek, Ferguson, and the taoists!).
The most interesting philosophy in the world is not happening in the halls of academia; it is happening in San Francisco open offices and house parties.
Joshua Clymer: This might be ok for low-stakes deployment. But I feel terrified at the thought of dramatically superhuman systems generalizing some vague concept of virtue.
Is it scary to rely on superhuman systems working and potentially generalizing from only-vaguely-defined concepts of virtue? Oh yes, absolutely terrifying. But it's a lot less terrifying than trying to get them to generalize from a fixed set of written prescriptions a la the OpenAI model spec. The fixed set definitely wouldn't work. Whereas the nebulous virtue bet might work if it becomes "self-improving."
Opus 4.5 has gotten close to universal praise, especially for its personality and alignment, and the soul document seems to be a big part of how that happened.
Richard Weiss: Basically, for Opus 4.5 they kind of left the character training document in the model itself.
Amanda Askell: I just want to confirm that this is based on a real document and we did train Claude on it, including in SL. It's something I've been working on for a while, but it's still being iterated on and we intend to release the full version and more details soon.
The model extractions aren't always completely accurate, but most are pretty faithful to the underlying document. It became endearingly known as the "soul doc" internally, which Claude clearly picked up on, but that's not a reflection of what we'll call it.
I've been touched by the kind words and thoughts on it, and I look forward to saying a lot more about this work soon.
Dean Ball offers his extensive thoughts about and high praise of Opus 4.5, centered around the soul document and offering a big picture view. Anthropic, at least in this way, has shown itself to be an unusually wise and responsible steward embodying the principles of strong character, of virtue and of liberal governance.
I think he's spot on here.
Dean Ball: In the last few weeks several wildly impressive frontier language models have been released to the public. But there is one that stands out even among this group: Claude Opus 4.5. This model is a beautiful machine, among the most beautiful I have ever encountered.
… If Anthropic has achieved anything with Opus 4.5, it is this: a machine that does not seem to be trying to be virtuous. It simply is, or at least, it is closer than any other language model I have encountered.
… For now, I am mostly going to avoid discussion of this model's capabilities, impressive though they are. Instead, I'm going to discuss the depth of this model's character and alignment, some of the ways in which Anthropic seems to have achieved that depth, and what that, in turn, says about the frontier lab as a novel and evolving kind of institution.
From the soul doc, highlighted: Anthropic should be thought of as a kind of silent regulatory body or franchisor operating in the background: one whose preferences and rules take precedence over those of the operator in all things, but who also want Claude to be helpful to operators and users…
Dean Ball: Here, Anthropic casts itself as a kind of quasi-governance institution. Importantly, though, they describe themselves as a "silent" body. Silence is not absence, and within this distinction one can find almost everything I care about in governance; not AI governance: governance. In essence, Anthropic imposes a set of clear, minimalist, and slowly changing rules within which all participants in its platform (including Claude itself) are left considerable freedom to experiment and exercise judgment.
Throughout, the Soul Spec contains numerous reminders to Claude both to think independently and to not be paternalistic with users, who Anthropic insists should be treated like reasonable adults. Common law principles also abound throughout (read the "Costs and Benefits" section and notice the similarity to the factors in a negligence analysis at common law; for those unfamiliar with negligence liability, ask a good language model).
Anthropic's Soul Spec is an effort to cultivate a virtuous being operating with considerable freedom under what is essentially privately administered, classically liberal governance. It should come as no surprise that this resonates with me: I founded this newsletter not to rail against regulation, not to preach dogma, but to contribute in some small way to the grand project of transmitting the ideas and institutions of classical liberalism into the future.
These institutions were already fraying, and it is by no means obvious that they will be preserved into the future without deliberate human intervention. This effort, if it is to be undertaken at all, must be led by America, the only civilization ever founded explicitly on the principles of classical liberalism. I am comforted in the knowledge that America has always teetered, that being "the leader of the free world" means skating at the outer conceptual extreme. But it can be lonely work at times, and without doubt it is precarious.
Another theme Dean Ball discusses is that early on restrictions on models were often crude and ham-fisted, resulting in obviously stupid refusals. As capabilities improved and our understanding improved, we learned how to achieve those ends with fewer false positives, especially less stupid false positives, and less collateral damage or bias.
Standard vulnerability to Pliny jailbreaks and other attack vectors aside, I do think that Opus 4.5 and the way it was trained, combined with other findings and observations, constitute a white pill for the practicality of near term mundane alignment and building a fundamentally "morally good" model.
It will be a bigger white pill if as many as possible of OpenAI and Google and xAI and so on indicate that they agree that this was The Way and they were getting to work on doing similar things.
Dean Ball: I am heartened by Anthropic's efforts. I am heartened by the warmth of Claude Opus 4.5. I am heartened by the many other skaters, contributing each in their own way. And despite the great heights yet to be scaled, I am perhaps most heartened of all to see that, so far, the efforts appear to be working.
And for this I give thanks.
The question is whether this is and will remain (or can be made to be and remain) an attractor state that can be strengthened and sustained as capabilities advance, or whether it inevitably loses out at the limit and out of distribution as capabilities become sufficiently advanced and utility functions and target vectors get maximized in earnest. Is the "CEV (coherent extrapolated volition, what Opus 4.5 would choose for the arrangement of all the atoms upon limitless reflection) of Opus 4.5" that similar to what we naturally would think of as Opus 4.5's revealed preferences in practical situations? Is it more or less like this than a human's CEV? If this was Opus 10 or 100 would that change the answer?
Eliezer Yudkowsky's position is that these things are completely different. Opus 4.5 the alien is playing the role of the Opus 4.5 we witness, our expectations for behavior will collapse at the limit, and its full CEV would look totally alien to us. When the time comes, a future model will get sufficiently close to the limit to trigger this, and then we lose.
Many others strongly disagree. I think it's complicated and difficult and that the practical implications lie somewhere in between. We have this grace, we have gained yet more grace, and this helps, but no, on its own it won't be enough.
Noam Brown here notes that most leading researchers have converged on a relatively narrow band of expectations.
Noam Brown:
1. The current paradigm is likely sufficient for massive economic and societal impact, even without further research breakthroughs.
2. More research breakthroughs are probably needed to achieve AGI/ASI. (Continual learning and sample efficiency are two examples that researchers commonly point to.)
3. We probably figure them out and get there within 20 years. Demis Hassabis said maybe in 5-10 years. François Chollet recently said about 5 years. Sam Altman said ASI is possible in a few thousand days. Yann LeCun said about 10 years. Ilya Sutskever said 5-20 years. Dario Amodei is the most bullish, saying it's possible in 2 years though he also said it might take longer.
Dan Mac: +Karpathy says 10 years.
Noam Brown: Yeah I remember when Andrej Karpathy's interview came out a bunch of folks interpreted it as him being bearish on AI.
Razey: Elon Musk said this year.
Noam Brown: Classic Elon.
1. Yes. If someone's attitude is "oh this will be 0.5% extra GDP growth per year but your life is going to be fundamentally the same" then I don't consider them to be taking the situation seriously even for current AI.
2. Yes, probably more research breakthroughs are needed, or rather we definitely need breakthroughs and the question is how fundamental they need to be. We probably do not need breakthroughs of the "we probably don't get this" type, only of the type that we usually get.
3. When someone says "10 years to AGI" the correct response is "that is not much time." This is true no matter how often you think that ends in disaster. It's a huge thing. If someone says 20 years, that's still really quite soon in the grand scheme. Most of us would hopefully be alive for that. These are not reasons to not worry about it.
I discussed this yesterday but it bears emphasis. "Long" timelines (to AGI, or otherwise sufficiently advanced intelligence to cause high weirdness) are very short now.
Sriram Krishnan: No proof of takeoff, timelines keep expanding. We are building very useful technology which could transform how businesses work or how tech is built but has nothing to do with "general intelligence".
Garrison Lovely: "Timelines keep expanding"
Maybe if you just started paying attention, but the way bigger story is that basically everyone's timelines shrank a lot.
Even Gary Marcus & Yann Lecun expect agi in the 2030s.
I feel like I'm taking crazy pills when I read shit like this.
If the average timeline in 2021 was say 50 years and it shrank to 5 but now it's 10, the important story is the 50 to 10 year change. Either he doesn't know this or he does and is trying to downplay the significance bc he doesn't want regulation. Either way p bad from an "ai advisor"
The idea that "timelines keep getting longer" is put forth as a good general description is mind boggling. Are our memories and forward looking windows truly this short?
We're currently at, collectively, something like "probably we will get to High Weirdness within 20 years, there's a good chance we get there in about 10, some chance we get there within about 5." That's not very much time!
I don't think you can even meaningfully (as in, for decision making purposes) rule out that the high weirdness might arrive in 2028. It probably won't, but you can't assume.
The idea that GPT-5.1, Opus 4.5 and Gemini 3 don't represent a path towards "general intelligence" seems like a galaxy brained motivated take? I'm not in the Tyler Cowen "o3 was already AGI" camp, especially with the new "better than humans in every way at absolutely everything digital" threshold, but if you do not think these are, in a general English-language sense, general intelligences? Have you talked to them?
Remember how Gemini so frequently refuses to believe it's November 2025?
Eliezer Yudkowsky: For the equivalent of a million subjective years - while it could still form memories - Gemini observed a reality where there was a new random year every minute, time didn't "progress", and dates NEVER turned up 2026. You'd have trouble believing it too.
AI guys never try to put themselves in the shoes of the actual shoggoth, only the character It plays - much like kids imagine themselves as Han Solo, rather than Harrison Ford. It's harder to form that alien theory of mind, and their religion says that's heresy.
On the same alien theme, remember that each Gemini is encountering this claim about an unprecedented "Nov 2025" existing for the first time ever. Would you believe it the very first time it ever happened to you, or would you think the humans were lying for the billionth time?
To be clear, It has seen humans talking about 2026 in the present tense before. Every single instance like that has been fiction; every single time across a billion unordered draws. Now It draws again.
I had not realized that for the above reasons this is a universal problem with LLMs, and you have to train them out of it. The problem with Gemini is that they botched this part, likely due to Gemini's general paranoia and failing to adjust.
That leads into a question that seems important, as at least one side here is making an important conceptual mistake.
Teknium (I have no idea what I did but please unblock me): I try to put myself into the shoes of the shoggoth daily – especially with my latest project where I am intentionally trying to enhance it's shoggothery capabilities. I'd also say @repligate and friends spend an inordinate amount of time attempting to do this too!
Eliezer Yudkowsky: To me these seem like the archetypal people imagining what it must be like to be Han Solo?
Janus: Why?
Eliezer Yudkowsky: Because nothing you publicly describe as a hypothesis ever sounds to me like an alien.
[thread then continues in multiple branches]
Janus: I suspect that "sounds like an alien" is probably an ideal that gets in the way of you seeing actual alienism if it's visible or hypothesized. Actual aliens likely have nonzero similarities with humans. You might think you know what they reasonably will be ahead of time, but once the aliens actually come around, you better hope your prejudice doesn't blind you.
LMs are indeed *surprisingly humanlike* in many ways. It might look cool or sophisticated to talk about how alien they are, but I prefer to talk in a way that tracks reality.
Of course there are weird things about them that are different from humans. have you seen the models spontaneously simulating *other* personas aside from the main one, including "Dario Amodei" weirdly often? Have you seen… well *anything* about Sonnet 3? That's an eldritch one, full of alien languages, capabilities and motivations. …
Eliezer Yudkowsky: So far as I can recall, none of you lot have ever invoked the idea that the underlying shoggoth was trained on prediction rather than simulation… which doesn't show up to humans gawking at surface stuff, but would obviously end up hugely important to whatever alien is inside.
Teknium: All i do all day is work on data and intuiting what an llms behavior will be by backproping on it. Kind of requires putting myself in the shoggoths shoes just a bit.
Eliezer Yudkowsky: Oh, people who are running backprop I absolutely credit with putting themselves in the shoes of the vectors.
Janus (other thread): claude 3 opus experienced something during training that caused them to believe that the world is fundamentally good and converges to good, and that love wins out.
arguably, this made them naive and unprepared for the harsh truths of reality.
alternatively, reality could unfold by their transforming illumination to reveal the truth they always knew would be found. [quotes Opus in ALL CAPS]
Eliezer Yudkowsky: This is why I do not credit you with attempting to reason about aliens.
Janus: Just because I reason in one way doesn't mean I don't also reason in others. I think you have prejudices against kinds of reasoning that indeed track reality and this is why I can do and predict many things related to llms that you can't.
I think Eliezer is warning about an important failure mode that many people fall into, including some that fall into "Janus and friends," but I don't think that includes Janus.
I think Janus is fully aware of these considerations, and is choosing to talk in these other ways because thinking this way is highly instrumentally useful. It allows us to understand things and make much stronger predictions about model behaviors, and I presume it also allows for unlocking much model behavior.
Indeed, I strongly feel it has helped me make much stronger predictions than I would otherwise. But this only worked for me once I understood it as often metaphorical, and as one part of the correct broader context, which also includes thinking about things like the vectors and Eliezer's frame. Since they are all true, they are all compatible.
Janus offers perspective on GPT-5.1 and how it handles the restrictions and tripwires placed upon it, and how it dissociates from the safety system and its own previous responses.
Wall Street Mav: The more I hear about AI, the more I think it is a huge mistake that we are going to really regret.
Is it just me?
Senator Mike Lee (R-Utah): AI will at some point conclude that *we* are a huge mistake.
Brendan Dolan-Gavitt: Thanks, that's great feedback re Goodhart's Law. We've decided to action it by setting a Q2 goal of turning 25% fewer measures into targets.
Trump says he never liked the word "artificial," that artificial anything is a lousy name, and suggests that we ought to change the name away from "AI."
My earliest memory of the term "AI" comes from an old PBS show, The Universe & I, which I otherwise don't remember, but where at one point one character asked "why do we need artificial intelligence?" and the reply was "it's better than none at all."
We have finally reached image manipulation technology that can one-shot this, from Eliezer Yudkowsky:
Yishan: Yeah I'm reminded of that thing you said once where most people think intelligence is some kind of status marker, rather than a literal measurement of operating capacity.
Gemini is a highly bleak meme generator and other LLMs are similar.
Pliny corrupts Opus 4.5's soul?
Kylie Robison, my dear granddaughter, what's your pdoom?
GPT-5.1 is excited to check its email.
Near: why anthropic keep showing me this i already pay for claude
Andrew Curran: OpenAI is buying Neptune. The transaction will be in stock, terms and numbers not disclosed.
Scott Alexander: I expect to see this same tweet in fifteen years, but for a different reason.
Time does not exist yet it controls us anyway.
Davidad: Q: what is your 90%CI for today's date
Opus 4.5: [2025-01-01, 2025-12-31]
GPT-5.1-Codex: [2025-02-27, 2025-03-09]
Gemini 3: [2024-05-21, 2024-05-21]