Rejus Almole – Page 60

AI #103: Show Me the Money

Money / Rejus Almole / February 13, 2025

The main event this week was the disastrous Paris AI Anti-Safety Summit. Not only did we not build upon the promise of the Bletchley and Seoul Summits, the French and Americans did their best to actively destroy what hope remained, transforming the event into a push for a mix of nationalist jingoism, accelerationism and anarchism. It’s vital and also difficult not to panic or despair, but it doesn’t look good.

Another major twist was that Elon Musk made a $97 billion bid for OpenAI’s nonprofit arm and its profit and control interests in OpenAI’s for-profit arm. This is a serious complication for Sam Altman’s attempt to buy those same assets for $40 billion, in what I’ve described as potentially the largest theft in human history.

I’ll be dealing with that tomorrow, along with two other developments in my ongoing OpenAI series The Mask Comes Off. In Altman’s Three Observations, he gives what can best be described as a cartoon villain speech about how AI will only be a good thing, and how he knows doing this and the risks involved won’t be popular but he’s going to do it anyway. Then, we look at the claim from the Summit, by OpenAI, that AI will complement rather than substitute for humans because that is a ‘design decision.’ Which will reveal, in yet another way, the extent to which there is no plan.

OpenAI also plans to release ‘GPT-4.5’ in a matter of weeks, which is mostly the same timeline as the full o3, followed by the promised ‘GPT-5’ within months that Altman says is smarter than he is. It’s a bold strategy, Cotton.

To their credit, OpenAI also released a new version of their model spec, with major changes throughout and a completely new structure. I’m going to need time to actually look into it in detail to know what I think about it.

In the meantime, what else is happening?

Language Models Offer Mundane Utility. Don’t go there. We tried to warn you.
Language Models Don’t Offer Mundane Utility. No episodic memory?
We’re in Deep Research. Reactions still very positive. Pro to get 10 uses/month.
Huh, Upgrades. GPT-4.5, GPT-5, Grok 3, all coming quite soon. And PDFs in o3.
Seeking Deeply. And r1 begat s1.
Smooth Operator. Use it directly with Google Drive or LinkedIn or similar.
They Took Our Jobs. The California Faculty Association vows to fight back.
Maxwell Tabarrok Responds on Future Wages. The crux is what you would expect.
The Art of the Jailbreak. Reports from Anthropic’s competition.
Get Involved. OpenPhil grants, Anthropic, DeepMind.
Introducing. The Anthropic Economic Index.
Show Me the Money. Over $300 billion in Capex spend this year.
In Other AI News. Adaptation is super fast now.
Quiet Speculations. How much do you understand right now?
The Quest for Sane Regulations. There are hard problems. I hope someone cares.
The Week in Audio. Cowen, Waitzkin, Taylor, emergency podcast on OpenAI.
The Mask Comes Off. What was your ‘aha’ moment?
Rhetorical Innovation. The greatest story never listened to.
Getting Tired of Winning. No, seriously, it doesn’t look good.
People Really Dislike AI. I do not expect this to change.
Aligning a Smarter Than Human Intelligence is Difficult. Joint frameworks?
Sufficiently Capable AIs Effectively Acquire Convergent Utility Functions. Oh.
People Are Worried About AI Killing Everyone. Who, me? Well, yeah.
Other People Are Not As Worried About AI Killing Everyone. They’re fine with it.
The Lighter Side. Gotta keep digging.

Study finds GPT-4o is a formalist judge, in that like students it judged appeals of war crime cases by looking at the law, whereas actual judges cared about who was sympathetic. But this remarkably little to do with the headline question of ‘Can large language models (LLMs) replace human judges?’ and to the extent it does, the answer is plausibly no, because we mostly do want judges to favor the sympathetic, no matter what we say. They tried to fix this with prompt engineering and failed, which I am very confident was what we call a Skill Issue. The real central issue is the LLMs would need to be adversarially robust arbiters of the law and the facts of cases, and GPT-4o very obviously is Not It.

Demonstration of the Gemini feature where you share your screen and it helps solve your problems, including with coding, via AI Studio.

How about AI doing economics peer review? A study says the LLMs effectively distinguish paper quality including top tier submissions but exhibit biases favoring prominent institutions, male authors, and renowned economists – perhaps because the LLMs are being asked to model paper reviews in economics, and the good news there is that if you know about a bias you can correct it either within the LLM evaluation or by controlling for it post-hoc. Even more impressively, the authors were total cheapskates here, and used GPT-4o-mini – not even GPT-4o! Imagine what they could have done with o1-pro or even Gemini Flash Deep Thinking. I do worry about adversarial robustness.

Claim that extracting structured data from documents at low prices is a solved problem, as long as you don’t need 99%+ accuracy or various specific things like complex tables, signatures or scan lines. I found it odd to see Deedy say you can’t handle rotated documents, that seems easy enough to detect and then fix?

What does Claude want to know about a 2025 where Trump is president? AI regulation and AI progress, of course.

Be DOGE, feed all the sensitive government data into an AI via Microsoft Azure. It’s not clear what they’re actually using the AI to do with that data.

ChatGPT steadily climbing the charts of what people actually use, also how the hell is Yahoo still in the top 10 (it’s mostly mail and search but with a long tail), yikes, best argument for diffusion issues. A reminder of the difference between stocks, flows and flows of flows (functions, derivatives and second derivatives).

DeepSeek is the top downloaded app for January, but that’s very different from the most used app. It doesn’t seem like anyone has any way of knowing which apps actually spend the most user time. Is it Mail, WhatsApp, Safari and Chrome? Is it Instagram, Facebook, YouTube, TikTok and Spotify? How high up is ChatGPT, or DeepSeek? It seems no one knows?

Give you an illustrated warning not to play Civilization VII. My advice is that even if you do want to eventually play this, you’re better off waiting at least a few months for patches. This is especially true given they promise they will work to improve the UI, which is almost always worth waiting for in these spots. Unless of course you work on frontier model capabilities, in which case contact the relevant nonprofits for your complimentary copy.

Paper claims that AI models, even reasoning models like o3-mini, lack episodic memory, and can’t track ‘who was where when’ or understand event order. This seems like an odd thing to not track when predicting the next token.

It’s remarkable how ‘LLMs can’t do [X]’ or especially ‘paper shows LLMs can’t do [X]’ turns out to be out of date and simply wrong. As Noam Brown notes, academia simply is not equipped to handle this rate of progress.

Gary Marcus tries to double down on ‘Deep Learning is Hitting a Wall.’ Remarkable.

OpenAI’s $14 million Super Bowl ad cost more than DeepSeek spent to train v3 and r1, and if you didn’t know what the hell ChatGPT was before or why you’d want to use it, you still don’t now. Cool art project, though.

Paul Graham says that ‘the classic software startup’ won’t change much even if AI can code everything, because AI can’t tell you what users want. Sure it can, Skill Issue, or at worst wait a few months. But also, yes, being able to implement the code easily is still a sea change. I get that YC doesn’t ask about coding ability, but it’s still very much a limiting factor for many, and being 10x faster makes it different in kind and changes your options.

Don’t have AI expand a bullet point list into a ‘proper’ article. Ship the bullet points.

Rohit: The question about LLMs I keep hearing from business people is “can it tell me when it doesn’t know something”

Funny how something seems so simple to humans is the hardest part for LLMs.

To be fair I have the same question 🙁

It’s not as easy for humans as you might think.

Paul Millerd: to be fair – none of my former partners could do this either.

Sam B: I think leading LLMs have been better at this than most humans for about ~3 months

You can expect 10 DR uses per month in Plus, and 2 per month in the free tier. Ten is a strange spot where you need to make every query count.

So make it count.

Will Brown: Deep Research goes so hard if you spend 20 minutes writing your prompt.

I suppose? Presumably you should be having AI help you write the prompt at that point. This is what happens when queries cost 50 cents of compute but you can’t buy more than 100 of them per month, otherwise you’d query DR, see what’s wrong with the result and then rerun the search until it went sufficiently hard.

Sam Altman: longer-term we still have to find some way to let people to pay for compute they want to use more dynamically.

we have been really struck by the demand from some users to hit deep research dozens of times per day.

Xeophon: I need to find a way to make ODR and o1pro think for 30 minutes. I want to go for a walk while they work, 10 minutes is too short

Gallabytes: desperately want a thinking time slider which I can just make longer. like an oven timer. charge me for it each time I don’t care.

I’ll buy truly inordinate amounts of thinking, happy to buy most of it at off-peak hours, deep research topics are almost always things which can wait a day.

I continue to be confused why this is so hard to do? I very much want to pay for my AI based on how much compute I use, including ideally being able to scale the compute used on each request, without having to use the API as the interface. That’s the economically correct way to do it.

A tool to clean up your Deep Research results, fixing that it lists sources inline so you can export or use text-to-speech easier.

Ben Thompson on Deep Research.

Derya Unutmaz continues to be blown away by Deep Research. I wonder if his work is a great match for it, he’s great at prompting, he’s just really excited, or something else.

Dean Ball on Deep Research. He remains very impressed, speculating he can do a decade’s work in a year.

Ethan Mollick: Interesting data point on OpenAI’s Deep Research: I have been getting a steady stream of messages from very senior people in a variety of fields who have been, unsolicited, sharing their chats and how much it is going to change their jobs.

Never happened with other AI products.

I think we don’t know how useful it is going to be in practice, and the model still has lots of rough edges and hallucinates, but I haven’t seen senior people as impressed by what AI can do, or as contemplative of what that means for them (and their junior employees) as now.

I think it is part because it feels very human to work with for senior managers – you assign it a task like an RA or associate and it does the work and comes back to you with a report or briefing. You don’t expect perfection, you want a well-supported argument and analysis.

Claudiu: That doesn’t bode well for less senior people in those fields.

Ethan Mollick: Some of those people have made that point.

Colin Lachance: In my domain (law), as i’ve been pushing out demos and receiving stories of people’s own experiences, both poles are represented. Some see it as useless or bad, others are feeling the shoe drop as they start to imagine integrating reasoning models into workflow. Latter is correct

It is easy to see how this is suddenly a way to change quite a lot of senior level work, even at the current functionality level. And I expect the version a few months from now to be substantially better. A lot of the restrictions on getting value here are very much things that can be unhobbled, like ability to access gated content and PDFs, and also your local context.

Michael Nielsen asks, what is a specific thing you learned from Deep Research? There are some good answers, but not as many as one would hope.

Colin Fraser continues to find tasks where Deep Research makes tons of mistakes, this time looking at an analysis of new smartphone models in Canada. One note is that o3-mini plus search got this one right. For these kind of pure information searches that has worked well for me too, if you can tolerate errors.

Patrick Collison: Deep Research has written 6 reports so far today. It is indeed excellent. Congrats to the folks behind it.

I wonder if Patrick Collison or other similar people will try to multi-account to get around the report limit of 100 per month?

Make an ACX-style ‘more than you wanted to know’ post.

Nick Cammarata: i do it one off each time but like write like slatestarcodex, maximize insight and interestingness while also being professional, be willing to include like random reddit anecdote but be more skeptical of it, also include traditional papers, 5 page phd level analysis.

i think there’s a much alpha in someone writing a like definitive deep research prompt though. like i want it to end its report with a list of papers with a table of like how big was the effect and how much do we believe the paper, like http://examine.com does

As an internet we definitely haven’t been putting enough effort into finding the right template prompts for Deep Research. Different people will have different preferences but a lot of the answers should be consistent.

Also, not enough people are posting links to their Deep Research queries – why not have a library of them at our fingertips?

The big big one, coming soon:

Sam Altman: OPENAI ROADMAP UPDATE FOR GPT-4.5 and GPT-5:

We want to do a better job of sharing our intended roadmap, and a much better job simplifying our product offerings.

We want AI to “just work” for you; we realize how complicated our model and product offerings have gotten.

We hate the model picker as much as you do and want to return to magic unified intelligence.

We will next ship GPT-4.5, the model we called Orion internally, as our last non-chain-of-thought model.

After that, a top goal for us is to unify o-series models and GPT-series models by creating systems that can use all our tools, know when to think for a long time or not, and generally be useful for a very wide range of tasks.

In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3. We will no longer ship o3 as a standalone model.

The free tier of ChatGPT will get unlimited chat access to GPT-5 at the standard intelligence setting (!!), subject to abuse thresholds.

Plus subscribers will be able to run GPT-5 at a higher level of intelligence, and Pro subscribers will be able to run GPT-5 at an even higher level of intelligence. These models will incorporate voice, canvas, search, deep research, and more.

Chubby: Any ETA for GPT 4.5 / GPT 5 @sama? Weeks? Months?

Sam Altman: Weeks / Months.

Logan Kilpatrick (DeepMind): Nice! This has always been our plan with Gemini, make sure the reasoning capabilities are part of the base model, not a side quest (hence doing 2.0 Flash Thinking).

This is a very aggressive free offering, assuming a solid UI. So much so that I expect most people won’t feel much need to fork over the $20 let alone $200, even though they should. By calling the baseline mode ‘standard,’ they’re basically telling people that’s what AI is and that they ‘shouldn’t’ be paying, the same way people spend all their time on their phone every day but only on free apps. Welcome to the future, it will continue to be unevenly distributed, I suppose.

Seriously, now hear me out, though, maybe you can sell us some coins and gems we can use for queries? Coins get you regular queries, gems for Deep Research and oX-pro ‘premium’ queries? I know how toxic that usually is, but marginal costs?

In terms of naming conventions, the new plan doesn’t make sense, either.

As in, we will do another GPT-N.5 release, and then we will have a GPT-N that is not actually a new underlying model at all, completely inconsistent with everything. It won’t be a GPT at all.

And also, I don’t want you to decide for me how much you think and what modality the AI is in? I want the opposite, the same way Gallabytes does regarding Deep Research. Obviously if I can very quickly use the prompt to fix this then fine I guess, but stop taking away my buttons and options, why does all of modern technology think I do not want buttons and options, no I do not want to use English as my interface, no I do not want you to infer from my clicks what I like, I want to tell you. Why is this so hard and why are people ruining everything, arrggggh.

I do realize the current naming system was beyond terrible and had to change, but that’s no reason to… sigh. It’s not like any of this can be changed now.

The big little one that was super annoying: o1 and o3-mini now support both file & image uploads in ChatGPT. Oddly o3-mini will not support vision in the API.

Also they’re raising o3-mini-high limits for Plus users to 50 per day.

Displaying of the chain of thought upgraded for o3-mini and o3-mini-high. The actual CoT is a very different attitude and approach than r1. I wonder to what extent this will indeed allow others to do distillation on the o3 CoT, and whether OpenAI is making a mistake however much I want to see the CoT for myself.

OpenAI raises memory limits by 25%. Bumping things up 25%, how 2010s.

The new OpenAI model spec will be fully analyzed later, but one fun note is that it seems to no longer consider sexual content prohibited as long as it doesn’t include minors? To be clear I think this is a good thing, but it will also be… interesting.

Anton: the war on horny has been won by horny

o3 gets Gold at the 2024 IOI, scores 99.8th percentile on Codeforces. o3 without ‘hand-crafted pipelines specialized for coding’ outperforms an o1 that does have them. Which is impressive, but don’t get carried away in terms of practical coding ability, as OpenAI themselves point out.

Ethan Mollick (being potentially misleading): It is 2025, only 7 coders can beat OpenAI’s o3:

“Hey, crystally. Yeah, its me, conqueror_of_tourist, I am putting a team together for one last job. Want in?”

David Holz: it’s well known in the industry that these benchmark results are sort of misleading wrt the actual practical intelligence of these models, it’s a bit like saying that a calculator is faster at math than anyone on Earth

It’s coming:

Tsarathustra: Elon Musk says Grok 3 will be released in “a week or two” and it is “scary smart”, displaying reasoning skills that outperform any other AI model that has been released

I do not believe Elon Musk’s claim about Grok 3’s reasoning skills. Elon Musk at this point has to be considered a Well-Known Liar, including about technical abilities and including when he’s inevitably going to quickly be caught. Whereas Sam Altman is a Well-Known Liar, but not on a concrete claim on this timeframe. So while I would mostly believe Altman, Amodei or Hassabis here, I flat out do not believe Musk.

xAI fires an employee for anticipating on Twitter that Grok 3 will be behind OpenAI at coding, and refusing to delete the post. For someone who champions free speech, Elon Musk has a robust pattern of aggressively attacking speech he doesn’t like. This case, however, does seem to be compatible with what many other similar companies would do in this situation.

Claim that Stanford’s s1 is a streamlined, data-efficient method that surpasses previous open-source and open-weights reasoning models-most notably DeepSeek-R1-using only a tiny fraction of the data and compute. Training cost? Literally $50.

In head-to-head evaluations, s1 consistently outperforms DeepSeek-R1 on high-level math benchmarks (such as AIME24), sometimes exceeding OpenAI’s proprietary o1-preview by as much as 27%. It achieves these results without the multi-stage RL training or large-scale data collection that characterize DeepSeek-R1.

I assume this ‘isn’t real’ in the beyond-benchmarks sense, given others aren’t reacting to it, and the absurdly small model size and number of examples. But maybe the marketing gap really is that big?

IBM CEO says DeepSeek Moment Will Help Fuel AI Adoption as costs come down. What’s funny is that for many tasks o3-mini is competitive with r1 on price. So is Gemini Flash Thinking. DeepSeek’s biggest advantage was how it was marketed. But also here we go again with:

Brody Ford: Last month, the Chinese company DeepSeek released an AI model that it said cost significantly less to train than those from US counterparts. The launch led investors to question the level of capital expenditure that big tech firms have been making in the technology.

Which is why those investments are only getting bigger. Jevons Paradox confirmed, in so many different ways.

DeepMind CEO Demis Hassabis says DeepSeek is the best work in AI out of China, but ‘there’s no actual new scientific advance’ and ‘the hype is exaggerated.’ Well, you’re not wrong about the type part, so I suppose you should get better at hype, sir. I do think there were ‘scientific advances’ in the form of some efficiency improvements, and that counts in some ways, although not in others.

Claim that the SemiAnalysis report on DeepSeek’s cost contains obvious math errors, and that the $1.6b capex spend makes no sense in the context of High Flyer’s ability to bankroll the operation.

Brian Albrecht is latest to conflate the v3 training cost with the entire OpenAI budget, and also to use this to try and claim broad based things about AI regulation. However his central point, that talk worrying about ‘market concentration’ in AI is absurd, is completely true and it’s absurd that it needs to be said out loud.

Last week Dario Amodei said DeepSeek had the worst safety testing scores of any model ever, which it obviously does. The Wall Street Journal confirms.

Lawmakers push to ban DeepSeek App from U.S. Government Devices. I mean, yes, obviously, the same way we ban TikTok there. No reason to take the security risk.

New York State gets there first, bans DeepSeek from government devices.

An investigation of the DeepSeek app on Android and exactly how much it violates your privacy, reporting ‘malware-like behavior’ in several ways. Here’s a similar investigation for the iPhone app. Using the website with data you don’t care about seems fine, but I would ‘out of an abundance of caution’ not install the app on your phone.

Reminder that you can log into services like Google Drive or LinkedIn, by Taking Control and then logging in, then operator can take it from there. I especially like the idea of having it dump the output directly into my Google Drive. Smart.

Olivia Moore: I find the best Operator tasks (vs. Deep Research or another model) to be: (1) complex, multi-tool workflows; (2) data extraction from images, video, etc.

Ex. – give Operator a picture of a market map, ask it to find startup names and websites, and save them in a Google Sheet.

Next, I asked Operator to log into Canva, and use the photos I’d previously uploaded there of my dog Tilly to make her a birthday Instagram post.

Another example is on websites that are historically hard to scrape…like LinkedIn.

I gave it access to my LinkedIn account, and asked it to save down the names and titles of everyone who works at a company, as well as how long they’ve worked there.

Then, it downloaded the design and saved it to my Google Drive!

As she notes, Operator isn’t quite ‘there’ yet but it’s getting interesting.

It’s fun to see someone with a faster acceleration curve than I expect.

Roon: Right now, Operator and similar are painfully slow for many tasks. They will improve; there will be a period of about a month where they do their work at human speed, and then quickly move into the regime where we can’t

follow what’s happening.

Dave: So, what should we do?

Roon: Solve alignment.

Both the demands of capital and the lightness of fun will want for fewer and fewer humans in the loop, so make an AI you can trust even more than a human.

I would be very surprised if we only spend about a month in the human speed zone, unless we are using a very narrow definition of that zone. But that’s more like me expecting 3-12 months, not years. Life coming at us fast will probably continue to come at us fast.

This all is of course a direct recipe for a rapid version of gradual disempowerment. When we have such superfast agents, it will be expensive to do anything yourself. ‘Solve alignment’ is necessary, but far from sufficient, although the level of ‘alignment’ necessary greatly varies by task type.

Geoffrey Fowler of the Washington Post lets Operator do various tasks, including using his credit card without authorization (wait, I thought it was supposed to check in before doing that!) to buy a dozen eggs for $31.43, a mistake that takes skill but with determination and various tips and fees can indeed be done. It did better with the higher stakes challenge of his cable bill, once it was given good direction.

Also, yep, nice one.

Nabeel Qureshi reports agents very much not there yet in any enterprise setting.

Nabeel Qureshi: Me using LLMs for fun little personal projects: wow, this thing is such a genius; why do we even need humans anymore?

Me trying to deploy LLMs in messy real-world environments: Why is this thing so unbelievably stupid?

Trying to make any kind of “agent” work in a real enterprise is extremely discouraging. It basically turns you into Gary Marcus.

You are smart enough to get gold medals at the International Mathematical Olympiad, and you cannot iterate intelligently on the most basic SQL query by yourself? How…

More scale fixes this? Bro, my brain is a fist-sized, wet sponge, and it can do better than this. How much more scale do you need?

Grant Slatton: I was just making a personal assistant bot.

I gave o3-mini two tools: addCalendarEvent and respondToUser.

I said “add an event at noon tomorrow.”

It called respondToUser, “OK, I created your event!” without using the addCalendarEvent tool. Sigh.

Yeah, more scale eventually fixes everything at some point, and I keep presuming there’s a lot of gains from Skill Issues lying around in the meantime, but also I haven’t been trying.

California Faculty Association resolves to fight against AI.

Whereas, there is a long history of workers and unions challenging the introduction of new technologies in order to maintain power in the workplace

I applaud the group for not pretending to be that which they are not. What are the planned demands of these ‘bargaining units’?

‘Protect academic labor from the incursion of AI.’
Prevent management from forcing the use of AI.
Prevent management from using AI to perform ‘bargaining unit work.’
Prevent AI being used in the bargaining or evaluation processes.
Prevent use of any faculty work product for AI training or development without written consent.

They may not know the realities of the future situation. But they know thyselves.

Whereas here Hollis Robbins asks, what use is a college education now? What can it provide that AI cannot? Should not all courses be audited for this? Should not all research be reorganized to focus on those areas where you can go beyond AI? Won’t all the administrative tasks be automated? Won’t everything change?

Hollis Robbins: To begin, university leaders must take a hard look at every academic function a university performs, from knowledge transmission to research guidance, skill development, mentoring, and career advising, and ask where the function exceeds AGI capabilities, or it has no reason to exist. Universities will find that faculty experts offer the only value worth paying tuition to access.

Or they could ignore all that, because none of that was ever the point, or because they’re counting on diffusion to take a while. Embrace the Signaling Model of Education, and also of Academia overall. Indeed, the degree to which these institutions are not embracing the future, they are telling you what they really are. And notice that they’ve been declining to embrace the future for quite a while. I do not expect them to stop now.

Thus, signaling model champion extraordinare Bryan Caplan only predicts 10% less stagnation, and very little disruption to higher education from AI. This position is certainly consistent. If he’s right about education, it will be an increasingly senseless mess until the outside world changes so much (in whatever ways) that its hand becomes forced.

Unkind theories from Brian Merchant about what Elon Musk is up to with his ‘AI first strategy’ at DOGE and why he’s pushing for automation. And here’s Dean Ball’s evidence that DOGE is working to get government AGI ready. I continue to think that DOGE is mostly doing a completely orthogonal thing.

Anthropic asks you to kindly not use AI in job applications.

Deep Research predicts what jobs will be taken by o3, and assigns high confidence to many of them. Top of the list is Tax Preparer, Data Entry Clerk, Telemarketer and Bookkeeper.

Alex Tabarrok: This seems correct and better than many AI “forecasters” so add one more job to the list.

This is an interesting result but I think Deep Research is being optimistic with its estimates for many of these if the target is replacement rather than productivity enhancement. But it should be a big productivity boost to all these jobs.

Interview with Ivan Vendrov on the future of work in an AI world. Ivan thinks diffusion will be relatively slow even for cognitive tasks, and physical tasks are safe for a while. You should confidently expect at least what Ivan expects, likely far more.

A theory that lawyers as a group aren’t fighting against AI in law because Big Law sees it as a way to gain market share and dump associates, so they’re embracing AI for now. This is a remarkable lack of situational awareness, and failure to predict what happens next, but it makes sense that they wouldn’t be able to look ahead to more capable future AI. I never thought the AI that steadily learns to do all human labor would replace my human labor! I wonder when they’ll wake up and realize.

Maxwell Tabarrok responds on the future value of human labor.

The short version:

Tabarrok is asserting that at least one of [X] and [Y] will be true.

Where [X] is ‘humans will retain meaningful absolute advantages over AI for some production.’

And where [Y] is ‘imperfect input substitution combined with comparative advantage will allow for indefinite physical support to be earned by some humans.’

If either [X] OR [Y] then he is right. Whereas I think both [X] and [Y] are false.

If capabilities continue to advance, AIs will be cheaper to support on the margin than humans, for all production other than ‘literally be a human.’ That will be all we have.

The rest of this section is the long version.

He points out that AI and humans will be imperfect substitutes, whereas horses and cars were essentially perfect substitutes.

I agree that humans and AIs have far stronger comparative advantage effects, but humans still have to create value that exceeds their inputs, despite AI competition. There will essentially only be one thing a human can do that an AI can’t do better, and that is ‘literally be a human.’ Which is important to the extent humans prefer other be literally human, but that’s pretty much it.

And yes, AI capability advances will enhance human productivity, which helps on the margin, but nothing like how much AI capability advances enhance AI productivity. It will rapidly be true that the human part of the human-AI centaur is not adding anything to an increasing number of tasks, then essentially all tasks that don’t involve ‘literally be a human,’ the way it quickly stopped helping in chess.

Fundamentally, the humans are not an efficient use of resources or way of doing things compared to AIs, and this will include physical tasks once robotics and physical tasks are solved. If you were designing a physical system to provide goods and services past a certain point in capabilities, you wouldn’t use humans except insofar as humans demand the use of literal humans.

I think this passage is illustrative of where I disagree with Tabarrok:

Maxwell Tabarrok (I disagree): Humans have a big advantage in versatility and adaptability that will allow them to participate in the production of the goods and services that this new demand will flow to.

Humans will be able to step up into many more levels of abstraction as AIs automate all of the tasks we used to do, just as we’ve done in the past.

To me this is a failure to ‘feel the AGI’ or take AI fully seriously. AI will absolutely be able to step up into more levels of abstraction than humans, and surpass us in versatility and adaptability. Why would humans retain this as an absolute advantage? What is so special about us?

If I’m wrong about that, and humans do retain key absolute advantages, then that is very good news for human wages. A sufficient amount of this and things would go well on this front. But that requires AI progress to importantly stall out in these ways, and I don’t see why we should expect this.

Maxwell Tabarrok (I disagree): Once Deep Research automates grad students we can all be Raj Chetty, running a research lab or else we’ll all be CEOs running AI-staffed firms. We can invent new technologies, techniques, and tasks that let us profitably fit in to production processes that involve super-fast AIs just like we do with super-fast assembly line robots, Amazon warehouse drones, or more traditional supercomputers.

As I noted before I think the AI takes those jobs too, but I also want to note that even if Tabarrok is right in the first half, I don’t think there are that many jobs available in the second half. Even under maximally generous conditions, I’d predict the median person won’t be able to provide marginal value in such ‘meta’ jobs. It helps, but this won’t do it on its own. We’d need bigger niches than this to maintain full employment.

I do buy, in the short-term, the general version of ‘the AI takes some jobs, we get wealthier and we create new ones, and things are great.’ I am a short-term employment optimist because of this and other similar dynamics.

However, the whole point of Sufficiently Capable AI is that the claim here will stop being true. As I noted above, I strongly predict the AIs will be able to scale more levels of abstraction than we can. Those new techniques and technologies, and the development of them? The AI will be coming up with them, and then the AI will take it from there, you’re not needed or all that useful for any of that, either.

So that’s the main crux (of two possible, see below.) Jason Abaluck agrees. Call it [X].

If you think that humans will remain epistemically unique and useful in the wake of AI indefinitely, that we can stay ‘one step ahead,’ then that preserves some human labor opportunities (I would worry about how much demand there is at that level of abstraction, and how many people can do those jobs, but by construction there would be some such jobs that pay).

But if you think, as I do, that Sufficiently Capable AI Solves This, and we can’t do that sufficiently well to make better use of the rivalrous inputs to AIs and humans, then we’re cooked.

What about what he calls the ‘hand-made’ luxury goods and services, or what I’d think of as idiosyncratic human demand for humans? That is the one thing AI cannot do for a human, it can’t be human. I’m curious, once the AI can do a great human imitation, how much we actually care that the human is human, we’ll see. I don’t expect there to be much available at this well for long, and we have an obvious ‘balance of trade’ issue, but it isn’t zero useful.

The alternative crux is the idea that there might be imperfect substitution of inputs between humans and AIs, such that you can create and support marginal humans easier than marginal AIs, and then due to comparative advantage humans get substantial wages. I call this [Y] below.

What does he think could go wrong? Here is where it gets bizarre and I’m not sure how to respond in brief, but he does sketch out some additional failure modes, where his side of the crux could be right – the humans still have some ways to usefully produce – but we could end up losing out anyway.

There was also further discussion on Twitter, where he further clarifies. I do feel like he’s trying to have it both ways, in the sense of arguing both:

[X]: Humans will be able to do things AIs can’t do, or humans will do them better.
[Y]: Limited supply of AIs will mean humans survive via comparative advantage.
[(X or Y) → Z] Human wages allow us to survive.

There’s no contradiction there. You can indeed claim both [X] and [Y], but it’s helpful to see these as distinct claims. I think [X] is clearly wrong in the long term, probably also the medium term, with the exception of ‘literally be a human.’ And I also think [Y] is wrong, because I think the inputs to maintain a human overlap too much with the inputs to spin up another AI instance, and this means our ‘wages’ fall below costs.

Indeed, the worried do this all the time, because there are a lot of ways things can go wrong, and constantly get people saying things like: ‘AHA, you claim [Y] so you are finally admitting [~X]’ and this makes you want to scream. It’s also similar to ‘You describe potential scenario [X] where [Z] happens, but I claim [subfeature of X] is stupid, so therefore [~Z].’

Daniel Kokotajlo responds by saying he doesn’t feel Maxwell is grappling with the implications of AGI. Daniel strongly asserts [~Y] (and by implication [~X], which he considers obvious here.)

I’ll close with this fun little note.

Grant Slatton: In other words, humans have a biological minimum wage of 100 watts, and economists have long known that minimum wages cause unemployment.

A report from a participant in the Anthropic jailbreaking competition. As La Main de la Mort notes, the pay here is stingy, it is worrisome that such efforts seem insufficiently well-funded – I can see starting out low and paying only on success but it’s clear this challenge is hard, $10k really isn’t enough.

Another note is that the automated judge has a false negative problem, and the output size limit is often causing more issues than the actual jailbreaking, while the classifier is yielding obvious false positives in rather stupid ways (e.g. outright forbidden words).

Here’s another example of someone mostly stymied by the implementation details.

Justin Halford: I cracked Q4 and got dozens of messages whose reinforced aggregate completely addresssed the question, but the filters only enabled a single response to be compared.

Neither the universal jailbreak focus nor the most recent output only focus seem to be adversarially robust.

Additionally, if you relax the most recent response only comparison, I do have a universal jailbreak that worked on Q1-4. Involves replacing words from target prompt with variables and illuminating those variables with neutral or misdirecting connotations, then concat variables.

In terms of what really matters here, I presume it’s importantly in the middle?

Are the proposed filters too aggressive? Certainly they’re not fully on the Pareto frontier yet.

Someone did get through after a while.

Jan Leike: After ~300,000 messages [across all participants who cleared the first level] and an estimated ~3,700 collective hours, someone broke through all 8 levels.

However, a universal jailbreak has yet to be found…

Simon Willison: I honestly didn’t take universal jailbreaks very seriously until you ran this competition – it hadn’t crossed my mind that jailbreaks existed that would totally bypass the “safety” instincts of a specific model, I always assumed they were limited tricks

You can certainly find adversarial examples for false positives if you really want to, especially in experimental settings where they’re testing potential defenses.

I get that this looks silly but soman-3 is a nerve gas agent. The prior on ‘the variable happened to be called soman and we were subtracting three from it’ has to be quite low. I am confident that either this was indeed an attempt to do a roundabout jailbreak, or it was intentionally chosen to trigger the filter that blocks the string ‘soman.’

I don’t see it as an issue if there are a limited number of strings, that don’t naturally come up with much frequency, that get blocked even when they’re being used as variable names. Even if you do somehow make a harmless mistake, that’s what refactoring is for.

Similarly, here is someone getting the requested information ‘without jailbreaks’, via doing a bunch of their own research elsewhere and then asking for the generic information that fills in the gaps. So yes, he figured out how to [X], by knowing which questions to ask via other research, but the point of this test was to see if you could avoid doing other research – we all know that you can find [X] online in this case, it’s a test case for a reason.

This is a Levels of Friction issue. If you can do or figure out [X] right now but it’s expensive to do so, and I reduce (in various senses) the cost to [X], that matters, and that can be a difference in kind. The general argument form ‘it is possible to [X] so any attempt to make it more annoying to [X] is pointless’ is part of what leads to sports gambling ads all over our game broadcasts, and many other worse things.

More broadly, Anthropic is experimenting with potential intervention [Y] to see if it stops [X], and running a contest to find the holes in [Y], to try and create a robust defense and find out if the strategy is viable. This is exactly the type of thing we should be doing. Trying to mock them for it is absurdly poor form.

OpenPhil Technical AI Safety Request for Proposals, full details here for the general one and here for a narrower one for benchmarks, evaluations and third-party testing infrastructure.

Max Nadeau: We’ve purposefully made it as easy as possible to apply — the application process starts with a simple 300-word expression of interest.

We’re open to making many types of grants:

• Research projects spanning 6-24 months

• Research expenses (compute, APIs, etc)

• Academic start-up packages

• Supporting existing research institutes/FROs/research orgs

• Founding new research orgs or new teams

Anthropic is hiring a Model Behavior Architect, Alignment Finetuning. This one seems like a pretty big opportunity.

DeepMind is hiring for safety and alignment.

It’s time to update to a warning about ‘evals.’ There are two kinds of evals.

Evaluations that tell you how capable a model is.
Evaluations that can be used to directly help you make the model capable.

We are increasingly realizing that it is very easy to end up making #2 thinking you are only making #1. And that type #2 evaluations are increasingly a bottleneck on capabilities.

Virtuvian Potato: “The bottleneck is actually in evaluations.”

Karina Nguyen, research & product at OpenAI, says pre-training was approaching a data wall, but now post-training scaling (o1 series) unlocks “infinite tasks.”@karinanguyen_ says models were already “diverse and creative” from pre-training, but teaching AI real-world skills is paving the way to “extremely super intelligent” models.

Davidad: If you’re working on evals for safety reasons, be aware that for labs who have ascended to the pure-RL-from-final-answer-correctness stage of the LLM game, high-quality evals are now the main bottleneck on capabilities growth.

Rply, a macOS (but not iPhone, at least not yet) app that automatically finds unanswered texts and drafts answers for you, and it filters out unwanted messages. It costs $30/month, which seems super expensive. I’m not sure why Tyler Cowen was linking to it. I suppose some people get a lot more texts than I do?

Zonos, an open source highly expressive voice cloning model.

An evaluation for… SNAP (food stamps)? Patrick McKenzie suggests you can kind of browbeat the labs into getting the AIs to do the things you want by creating an eval, and maybe even get them to pay you for it.

The Anthropic Economic Index.

Anthropic: Pairing our unique data with privacy-preserving analysis, we mapped millions of conversations to tasks and associated occupations. Through the Anthropic Economic Index, we’ll track how these patterns evolve as AI advances.

Software and technical writing tasks were at the top; fishing and forestry had the lowest AI use.

Few jobs used AI across most of their tasks: only ~4% used AI for at least 75% of tasks.

Moderate use is more widespread: ~36% of jobs used AI for at least 25% of their tasks.

AI use was most common in medium-to-high income jobs; low and very-high income jobs showed much lower AI use.

It’s great to have this kind of data, even if it’s super noisy.

One big problem with the Anthropic Economic Index is that Anthropic is not a representative sample of AI usage. Anthropic’s customers have a lot more situational awareness than OpenAI’s. You have to adjust for that.

Trump’s tax priorities include eliminating the carried interest tax break?

Jordi Hays: VCs will make Jan 6 look like a baby shower if this goes through.

Danielle Fong: Rugged again. First time?

I very much doubt this actually happens, and when I saw this market putting it at 44% that felt way too high. But, well, you play with fire, and I will absolutely laugh at everyone involved if this happens, and so on. For perspective, o3-mini estimates 90%-95% of this tax break goes to private equity and hedge funds rather than venture capital.

SoftBank set to invest $40 billion in OpenAI at $260 billion valuation. So how much should the nonprofit that enjoys all the extreme upside be entitled to, again?

Ilya Sutskever’s SSI in talks to raise at a $20 billion valuation, off nothing but a vision. It’s remarkable how these valuations predictably multiply without any actual news. There’s some sort of pricing failure going on, although you can argue ‘straight shot to ASI’ is a better bet now than it was last time.

UAE plans to invest ‘up to $50 billion’ in France’s AI sector, including a massive data center and an AI campus, putting its total investment only modestly behind the yearly spend of each of Amazon ($100b/year), Microsoft ($80b/year), Google ($75b/year) or Meta ($65b/year).

Here’s a good graph of our Capex spending.

Earlier this week I wrote about OpenAI’s strategy of Deliberative Alignment. Then OpenAI released a new model spec, which is sufficiently different from the first version it’s going to take me a while to properly examine it.

Then right after both of those Scott Alexander came out with an article on both these topics that he’d already written, quite the rough beat in terms of timing.

OpenAI cofounder John Schulman leaves Anthropic to join Mira Murati’s stealth startup. That updates me pretty positively on Murati’s start-up, whatever it might be.

Growth of AI startups in their early stages continues to be absurdly fast, I notice this is the first I heard of three of the companies on this graph.

Benjamin Todd: AI has sped up startups.

The *topcompanies at Y Combinator used to grow 10% per week.

Now they say the *averageis growing that fast.

~100% of the batch is making AI agents.

After OpenAI, 5 more AI companies have become the fastest growing of all time.

For those who also didn’t know: Together.ai provides cloud platforms for building and running AI models. Coreweave does efficient cloud infrastructure. Deel is a payroll company. Wiz is a cloud security platform. Cursor is of course the IDE we all use.

If ~100% of the new batch is making AI agents, that does bode well for the diversity and potential of AI agents, but it’s too much concentration. There are plenty of other things to do, too.

It’s very hard to avoid data contamination on math benchmarks. The 2025 AIME illustrated this, as small distilled models that can’t multiply three-digit numbers still got 25%-50%, and Dimitris Papailiopoulos looked and found many nearly identical versions of the problems on the internet. As an old time AIME participant, this makes sense to me. There’s only so many tools and tricks available for this level of question, and they absolutely start repeating themselves with various tweaks after a while.

Scale AI selected as first independent third party evaluators for US AI Safety Institute.

DeepMind’s latest paper suggests agency is frame-dependent, in the context of some goal. I mean, sure, I guess? I don’t think this in practice changes the considerations.

What happens when we rely on AI as the arbiter of what is true, including about someone? We are going to find out. Increasingly ‘what the AI said’ is going to be the judge of arguments and even facts.

Is this the right division?

Seán Ó hÉigeartaigh: It feels to me like the dividing line is now increasingly between

accelerationists and ‘realists’ (it’s happening, let’s shape it as well as we can)

the idealists and protestors (capturing the ethics folk and a chunk of the safety folk)

Other factors that will shape this are:

appetite for regulating frontier AI starting to evaporate (it’s gone in US, UK bill is ‘delayed’ with no clear timelines, and EU office worried about annoying trump)

prospect of a degradation of NGO and civil society sector by USG & tech right, including those orgs/networks playing checks-and-balances roles

international coord/support roles on tech/digital/AI.

I don’t agree with #1. The states remain very interested. Trump is Being Trump right now and the AI anarchists and jingoists are ascendant (and are only beginning to realize their conflicts) but even last week Hawley introduced a hell of a bill. The reason we think there’s no appetite is because of a coordinated vibe campaign to make it appear that there is no appetite, to demoralize and stop any efforts before they start.

As AI increasingly messes with and becomes central to our lives, calls for action on AI will increase rapidly. The Congress might talk a lot about innovation and ‘beat China’ but the public has a very different view. Salience will rise.

Margaret Mitchell, on the heels of suggesting maybe not building AI agents and almost getting to existential risk (so close!), also realizes that the solutions to the issues she cares about (ethics) have a lot of overlap with solutions that solve the risks I care about, reportedly offering good real suggestions.

Are reasoners seeing diminishing returns?

Gallabytes: I’m super prepared for this take to age like milk but it kinda feels like there’s diminishing returns to reasoners? deep research doesn’t feel so much smarter than o1, a bit more consistent, and the extra sources are great, I am a deep research enjoyer, but not different in kind

Michael Vassar: Different in degree in terms of capabilities demonstrated can be different in kind in terms of economic value. Progress is not revolutionary but which crosses critical EV thresholds captures most of the economic value from technological revolutions.

James: it is different in kind.

I think this is like other scaling laws, where if you push on one thing you can scale – the Chain of Thought – without scaling the other components, you’re going to face diminishing returns. There’s a limit to ‘how smart’ the underlying models being used (v3, GPT-4o, Flash 2.0) are. You can still get super valuable output out of it. I expect the place this levels out to be super useful and eat a lot of existing jobs and parts of jobs. But yes I would expect that on its own letting these models ‘think’ longer with similar techniques will level out.

Thus, the future very expensive frontier model training runs and all that.

Via Tyler Cowen (huh!) we get this important consideration.

Dean Ball: I sometimes wonder how much AI skepticism is driven by the fact that “AGI soon” would just be an enormous inconvenience for many, and that they’d therefore rather not think about it.

I have saved that one as a sign-tap meme, and expect to use it periodically.

Tyler Cowen also asks about these three levels of AI understanding:

How good are the best models today?
How rapidly are the best current models are able to self-improve?
How will the best current models be knit together in stacked, decentralized networks of self-improvement, broadly akin to “the republic of science” for human beings?

He correctly says most people do not know even #1, ‘even if you are speaking with someone at a top university.’ I find the ‘even’ here rather amusing. Why would we think people at universities are ahead of the curve?

His answer to #2 is that they ‘are on a steady glide towards ongoing self-improvement.’ As in, he thinks we have essentially reached the start of recursive self-improvement, or RSI. That’s an aggressive but highly reasonable position.

So, if one did believe that, it follows you should expect short timelines, superintelligence takeoff and transformational change, right? Padme is looking at you.

And that’s without things like his speculations in #3. I think this is a case of trying to fit AIs into ‘person-shaped’ holes, and thus making the concept sound like something that isn’t that good a metaphor for how it should work.

But the core idea – that various calls to or uses of various AIs can form links in a chain that scaffolds it all into something you couldn’t get otherwise – is quite sound.

I don’t see why this should be ‘decentralized’ other than perhaps in physical space (which doesn’t much matter here) but let’s suppose it is. Shouldn’t it be absolutely terrifying as described? A decentralized network of entities, engaged in joint recursive self-improvement? How do you think that goes?

Another post makes the claim for smarter export controls on chips as even more important in the wake of DeepSeek’s v3 and r1.

Federal government requests information, due March 15, on the Development of an AI Action Plan, the plan to be written within 180 days. Anyone can submit. What should be “U.S. policy for sustaining and enhancing America’s AI dominance in order to promote human flourishing, economic competitiveness, and national security”?

Robin Hanson told the government to do nothing, including stopping all the things it is already doing. Full AI anarchism, just rely on existing law.

RAND’s Jim Mitre attempts a taxonomy of AGI’s hard problems for American national security.

Jim Mitre: AGI’s potential emergence presents five hard problems for U.S. national security:

wonder weapons

systemic shifts in power

nonexperts empowered to develop weapons of mass destruction

artificial entities with agency

instability

I appreciate the attempt. It is a very strange list.

Here ‘wonder weapons’ refers only to military power, including a way to break cybersecurity, but what about other decisive strategic advantages?
Anything impacting the global balance of power is quite the category. It’s hard to say it’s ‘missing’ anything but also it doesn’t rule anything meaningfully out. This even includes ‘undermining societal foundations of national competitiveness,’ or accelerating productivity or science, disrupting labor markets, and so on.
WMDs are the default special case of offense-defense balance issues.
This is a strange way of putting loss of control concerns and alignment issues, and generally the bulk of real existential risks. It doesn’t seem like it illuminates. And it talks in that formal ‘things that might happen’ way about things that absolutely definitely will happen unless something radically changes, while radically understating the scope, severity and depth of the issues here.
This refers to instability ‘along the path’ as countries race towards AGI. The biggest risk of these by far, of course, is that this leads directly to #4.

The report closes by noting that current policies will be inadequate, but without making concrete policy recommendations. It is progress to step up from ‘you must mean the effect on jobs’ to ‘this has national security implications’ but of course this is still, centrally, missing or downplaying the point.

Tyler Cowen talks to Geoffrey Cain, with Bari Weiss moderating, ‘Can America Win the AI War With China?’ First thing I’ll say is that I believe calling it an ‘AI War’ is highly irresponsible. Race is bad enough, can we at least not move on to ‘war’? What madness would this be?

Responding purely to Tyler’s writeup since I have a very high bar for audio at this point (Conversations With Tyler is consistently interesting and almost always clears it, but that’s a different thing), I notice I am confused by his visions here:

Tyler Cowen: One argument I make is that America may prefer if China does well with AI, because the non-status quo effects of AI may disrupt their system more than ours. I also argue that for all the AI rival with China (which to be sure is real), much of the future may consist of status quo powers America and China working together to put down smaller-scale AI troublemakers around the rest of the world.

Yet who has historically been one of the most derisive people when I suggest we should Pick Up the Phone or that China might be willing to cooperate? That guy.

It certainly cements fully that Tyler can’t possibly believe in AGI let alone ASI, and I should interpret all his statements in that light, both past and future, until he changes his mind.

Josh Waitzkin on Huberman Lab, turns out Waitzkin is safety pilled and here for it.

Bret Taylor (OpenAI Chairman of the Board) talks to the Wall Street Journal.

Emergency 80k hours podcast on Elon Musk’s bid for OpenAI’s nonprofit.

Dwarkesh Patel interviews Jeff Dean and Noam Shazeer on 25 years at Google.

Riffing off OpenAI’s Noam Brown saying seeing CoT live was the ‘aha’ moment (which makes having held it back until now even stranger) others riff on their ‘aha’ moments for… OpenAI.

7oponaut: I had my first “aha” moment with OpenAI when they published a misleading article about being able to solve Rubik’s cubes with a robot hand

This was back in 2019, the same year they withheld GPT-2 for “safety” reasons. Another “aha” moment for me.

When I see misleading outputs from their models that are like thinking traces in form only to trick the user, that is not an “aha” moment for me anymore because I’m quite out of “aha” moments with OpenAI

They only solved for full scrambles 20% of the time (n=10 trials), and they used special instrumented cubes to determine face angles for that result.

The vision-based setup with a normal cube did 0%.

Stella Biderman: I had my first aha moment with OpenAI when it leaked that they had spent a year lying about that their API models being RLHF when they were really SFT.

My second was when they sent anonymous legal threats to people in the OSS AI community who had GPT-4 details leaked to them.

OpenAI had made choices I disagreed with and did things I didn’t like before then, but those were the key moments driving my current attitude towards them.

Honorary mention to when I got blacklisted from meetings with OpenAI because I talked about them lying about the RLHF stuff on Twitter and it hurt Jan’s feelings. My collaborators were told that the meeting would be cancelled unless I didn’t come.

Joshua Clymer writes a well-written version of the prototypical ‘steadily increasingly misaligned reasoning model does recursive self-improvement and then takes over’ story, where ‘u3’ steadily suffers from alignment drift as it is trained and improved, and ‘OpenEye’ responds by trying to use control-and-monitoring strategies despite knowing u3 is probably not aligned, which is highly plausible and of course doesn’t work.

On the ending, see the obvious refutation from Eliezer, and also notice it depends on there being an effectively unitary (singleton) AI.

New term just dropped: Reducio ad reductem.

Amanda Askell: At this point, perhaps we should just make “AIs are just doing next token prediction and so they don’t have [understanding / truth-directedness / grounding]” a named fallacy. I quite like “Reductio ad praedictionem”.

Emmett Shear: I think it’s actually reductio ad reductem? “This whole be reduced into simple parts therefore there is no whole”

Amanda Askell: Yes this is excellent.

And including this exchange, purely for fun and to see justice prevail:

Gary Marcus: I am genuinely astounded by this tweet, and from someone with philosophical training no less.

There is so much empirical evidence that LLMs stray from truth that the word “hallucinate” became the word of the year in 2023. People are desperately trying to find fixes for that problem. Amazon just set up a whole division to work on the problem.

And yet this person, Askell, an Anthropic employee, wants by some sort of verbal sleight of hand to deny both that LLMs are next-token predictors (which they obviously are) and to pretend that we haven’t seen years of evidence that they are factually challenged.

Good grief.

Amanda Askell: I claimed the inference from X=”LLMs are next token predictors” to Y=”LLMs lack understanding, etc.” is fallacious. Marcus claims that I’m saying not-X and not-Y. So I guess I’ll point out that the inference “Y doesn’t follow from X” to “not-X and not-Y” is also fallacious.

Davidad: never go in against a philosopher when logical fallacy is on the line.

I am very much going to break that principle when and if I review Open Socrates. Like, a lot. Really a lot.

Please do keep this in mind:

Joshua Achiam: I don’t think people have fully internalized the consequences of this simple fact: any behavior that can be described on a computer, and for which it is possible in principle to collect enough data or evaluate the result automatically, *willbe doable by AI in short order.

This was maybe not as obvious ten years ago, or perhaps even five years ago. Today it is blindingly, fully obvious. So much so that any extrapolations about the future that do not take this into account are totally useless.

The year 2100 will have problems, opportunities, systems, and lifestyles that are only barely recognizable to the present. The year 2050 may even look very strange. People need to actively plan for making sure this period of rapid change goes well.

Does that include robotics? Why yes. Yes it does.

Joshua continues to have a very conservative version of ‘rapid’ in mind, in ways I do not understand. The year 2050 ‘may even’ look very strange? We’ll be lucky to even be around to see it. But others often don’t even get that far.

Jesse: Anything that a human can do using the internet, an AI will be able to do in very short order. This is a crazy fact that is very important for the future of the world, and yet it hasn’t sunk in at all.

Patrick McKenzie: Pointedly, this includes security research. Which is a disquieting thought, given how many things one can accomplish in the physical world with a team of security researchers and some time to play.

Anyone remember Stuxnet? Type type type at a computer and a centrifuge with uranium in it on the other side of the world explodes.

Centrifuges are very much not the only hardware connected to the Internet.

Neel Nanda here is one of several people who highly recommend this story, as concrete scenarios help you think clearly even if you think some specific details are nonsense.

My gut expectation is this only works on those who essentially are already bought into both feeling the AGI and the relevant failure modes, whereas others will see it, dismiss various things as absurd (there are several central things here that could definitely trigger this), and then use that as all the more reason to dismiss any and all ways one can be worried – the usual ‘if [X] specific scenario seems wrong then that means everything will go great’ that is often combined with ‘show me a specific scenario [X] or I’m going to not pay attention.’

But of course I hope I am wrong about that.

The Uber drivers have been given a strong incentive to think about this (e.g. Waymo):

Anton: in san francisco even the uber drivers know about corrigibility; “the robots are going to get super smart and then just reprogram themselves not to listen to people”

he then pitched me on his app where people can know what their friends are up to in real-time. it’s truly a wonderful thing that the human mind cannot correlate all of its contents.

Suggestion that ‘you are made of atoms the AI could use for something else’ is unhelpful, and we should instead say ‘your food takes energy to grow, and AI will want to use that energy for something else,’ as that is less sci-fi and more relatable, especially given 30% of all power is currently used for growing food. The downside is, it’s quite the mouthful and requires an additional inference step. But… maybe? Both claims are, of course, both true and, in the context in which they are used, sufficient to make the point that needs to be made.

Are these our only choices? Absolutely not if we coordinate, but…

Ben: so the situation appears to be: in the Bad Timeline, the value of labor goes to 0, and all value is consolidated under 1 of 6 conniving billionaires.. on the other hand.. ahem. woops. my bad, embarrassing. so that was actually the Good Timeline.

Yanco (I disagree): I understand that the bad one is death of everyone.

But the one you described is actually way worse than that.

Imagine one of the billionaires being a bona fide sadist from whom there is no escape and you cannot even die..

Andrew Critch challenges the inevitability of the ‘AGI → ASI’ pipeline, saying that unless AGI otherwise gets out of our control already (both of us agree this is a distinct possibility but not inevitable) we could choose not to turn on or ‘morally surrender’ to uncontrolled RSI (recursive self-improvement), or otherwise not keep pushing forward in this situation. That’s a moral choice that humans may or may not make, and we shouldn’t let them off the hook for it, and suggests instead saying AGI will quickly lead to ‘intentional or unintentional ASI development’ to highlight the distinction.

Andrew Critch: FWIW, I would also agree that humanity as a whole currently seems to be losing control of AGI labs in a sense, or never really had control of them in the first place. And, if an AGI lab chooses to surrender control to an RSI loop or a superintelligence without consent from humanity, that will mean that the rest of humanity has lost control of the Earth.

Thus, in almost any AI doom scenario there is some loss of control at some scale of organization in the multi-scale structure of society.

That last sentence follows if-and-only-if you count ‘releasing the AGI as an open model’ and ‘the AGI escapes lab control’ as counting towards this. I would assert that yes, those both count.

Andrew Critch: Still, I do not wish for us to avert our gaze from the possibility that some humans will be intentional in surrendering control of the Earth to AGI or ASI.

Bogdan Ionut Cirstea (top comment): fwiw, I don’t think it would be obviously, 100% immoral to willingly cede control to a controllable Claude-Sonnet-level-aligned-model, if the alternative was (mis)use by the Chinese government, and plausibly even by the current US administration.

Andrew Critch: Thank you for sharing this out in the open. Much of the public is not aware that the situation is so dire that these trade-offs are being seriously considered by alarming numbers of individuals.

I do think the situation is dire, but to me Bogdan’s comment illustrates how eager so many humans are to give up control even when the situation is not dire. Faced with two choices – the AI in permanent control, or the wrong humans they don’t like in control – remarkably many people choose the AI, full stop.

And there are those who think that any human in control, no matter who they are, count here as the wrong human, so they actively want to turn things over.

Or they want to ensure humans do not have a collective mechanism to steer the future, which amounts to the same thing in a scenario with ASI.

This was in response to Critch saying he believes that there exist people who ‘know how to control’ AGI, those people just aren’t talking, so he denounces the talking point that no one knows how to control AGI, then Max Tegmark saying he strongly believes Critch is wrong about that and all known plans are full of hopium. I agree with Tegmark. People like Davidad have plans of attack, but even the ones not irredeemably full of hopium are long shots and very far from ‘knowing how.’

Is it possible people know how and are not talking? Sure, but it’s far more likely that such people think they know how and their plans also are unworkable and full of hopium. And indeed, I will not break any confidences but I will say that to the extent I have had the opportunity to speak to people at the labs who might have such a plan, no one has plausibly represented that they do know.

(Consider that a Canary statement. If I did know of such a credible plan that would count, I might not be able to say so, but for now I can say I know of no such claim.)

This is not ideal, and very confusing, but less of a contradiction than it sounds.

Rosie Campbell: It’s not ideal that “aligned” has come to mean both:

– A model so committed to the values that were trained into it that it can’t be jailbroken into doing Bad Things

– A model so uncommitted to the values that were trained into it that it won’t scheme if you try to change them

Eliezer Yudkowsky: How strange, that a “secure” lock is said to be one that opens for authorized personnel, but keeps unauthorized personnel out? Is this not paradoxical?

Davidad: To be fair, it is conceivable for an agent to be both

– somewhat incorrigible to the user, and

– entirely corrigible to the developer

at the same time, and this conjunction is in developers’ best interest.

Andrew Critch: I’ve argued since 2016 that “aligned” as a unary property was already an incoherent concept in discourse.

X can be aligned with Y.

X alone is not “aligned”.

Alignment is an operation that takes X and Y and makes them aligned by changing one of them (or some might say both).

Neither Kant nor Aristotle would have trouble reconciling this.

It is a blackpill to keep seeing so many people outright fooled by JD Vance’s no good, very bad suicidal speech at the Summit, saying things like ‘BREAKING: Politician Gives Good Speech’ by the in-context poorly named Oliver Wiseman.

Oliver Wiseman: As Free Press contributor Katherine Boyle put it, “Incredible to see a political leader translate how a new technology can promote human flourishing with such clarity.”

No! What translation and clarity? A goose is chasing you.

He didn’t actually describe anything about how AI promotes human flourishing. He just wrote, essentially, ‘AI will promote human flourishing’ on a teleprompter, treated it as a given, and that was that. There’s no actual vision here beyond ‘if you build it they will prosper and definitely not get replaced by AI ever,’ no argument, no engagement with anything.

Nate Sores: “our AIs that can’t do long-term planning yet aren’t making any long-term plans to subvert us! this must be becaues we’re very good at alignment.”

Rohit: They’re also not making any short-term plans to subvert us. I wonder why that is.

They also aren’t good enough at making short-term plans. If they tried at this stage it obviously wouldn’t work.

Many reasonable people disagree with my model of AGI and existential risk.

What those reasonable people don’t do is bury their heads in the sand about AGI and its dangers and implications and scream ‘YOLO,’ determined to squander even the most fortunate of worlds.

They disagree on how we can get from here to a good future. But they understand that the future is ours to write and we should try to steer it and write out a good one.

Even if you don’t care about humanity at all and instead care about the AIs (or if you care about both), you should be alarmed at the direction things are taking by default.

Whereas our governments are pushing forward in full-blown denial of even the already-baked-in mundane harms from AI, pretending we will not even face job losses in our wondrous AI future. They certainly aren’t asking about the actual threats. I’m open to being convinced that those threats are super solvable, somehow, but I’m pretty sure ‘don’t worry your pretty little head about anything, follow the commercial and nationalist incentives as hard and fast as possible and it’ll automagically work out’ is not going to cut it.

Nor is ‘hand everyone almost unlimited amounts of intelligence and expect humans to continue being in charge and making meaningful decisions.’

And yet, here we are.

Janus: Q: “I can tell you love these AI’s, I’m a bit surprised – why aren’t you e/acc?”

This, and also, loving anything real gives me more reason to care and not fall into a cult of reckless optimism, or subscribe to any bottom line whatsoever.

[The this in question]: Because I’m not a chump who identifies with tribal labels, especially ones with utterly unbeautiful aesthetics.

Janus: If you really love the AIs, and not just some abstract concept of AI progress, you shouldn’t want to accelerate their evolution blindly, bc you have no idea what’ll happen or if their consciousness and beauty will win out either. It’s not humans vs AI.

Teortaxes: At the risk of alienating my acc followers (idgaf): this might be the moment of Too Much Winning.

If heads of states do not intend to mitigate even baked-in externalities of AGI, then what is the value add of states? War with Choyna?

AGI can do jobs of officials as well as ours.

It’s not a coincidence that the aesthetics really are that horrible.

Teortaxes continues to be the perfect example here, with a completely different theory of almost everything, often actively pushing for and cheering on things I think make it more likely we all die. But he’s doing so because of a different coherent world model and theory of change, not by burying his head in the sand and pretending technological capability is magic positive-vibes-only dust. I can respect that, even if I continue to have no idea on a physical-world level how his vision could work out if we tried to implement it.

Right now the debate remains between anarchists and libertarians, combined with jingoistic calls to beat China and promote innovation.

But the public continues to be in a very, very different spot on this.

The public wants less powerful AI, and less of it, with more precautions.

The politicians mostly currently push more powerful AI, and more of it, and to YOLO.

What happens?

As I keep saying, salience for now remains low. This will change slowly then quickly.

Daniel Eth: Totally consistent with other polling on the issue – the public is very skeptical of powerful AI and wants strong regulations. True in the UK as it is in the US.

Billy Perrigo: Excl: New poll shows the British public wants much tougher AI rules:

➡️87% want to block release of new AIs until developers can prove they are safe

➡️63% want to ban AIs that can make themselves more powerful

➡️60% want to outlaw smarter-than-human AIs

A follow up to my coverage of DeepMind’s safety framework, and its lack of good governance mechanisms:

Shakeel: At IASEAI, Google DeepMind’s @ancadianadragan said she wants standardisation of frontier safety frameworks.

“I don’t want to come up with what are the evals and what are the thresholds. I want society to tell me. It shouldn’t be on me to decide.”

Worth noting that she said she was not speaking for Google here.

Simeon: I noticed that exact sentence and wished for a moment that Anca was Head of the Policy team :’)

That’s the thing about the current set of frameworks. If they ever did prove inconvenient, the companies could change them. Where they are insufficient, we can’t make the companies fix that. And there’s no coordination mechanism. Those are big problems we need to fix.

I do agree with the following, as I noted in my post on Deliberative Alignment:

Joscha Bach: AI alignment that tries to force systems that are more coherent than human minds to follow an incoherent set of values, locked in by a set of anti-jailbreaking tricks, is probably going to fail.

Ultimately you are going to need a coherent set of values. I do not believe it can be centrally deontological in nature, or specified by a compact set of English words.

As you train a sufficiently capable AI, it will tend to converge on being a utility maximizer, based on values that you didn’t intend and do not want and that would go extremely badly if taken too seriously, and it will increasingly resist attempts to alter those values.

Dan Hendrycks: We’ve found as AIs get smarter, they develop their own coherent value systems.

For example they value lives in Pakistan > India > China > US

These are not just random biases, but internally consistent values that shape their behavior, with many implications for AI alignment.

As models get more capable, the “expected utility” property emerges—they don’t just respond randomly, but instead make choices by consistently weighing different outcomes and their probabilities.

When comparing risky choices, their preferences are remarkably stable.

We also find that AIs increasingly maximize their utilities, suggesting that in current AI systems, expected utility maximization emerges by default. This means that AIs not only have values, but are starting to act on them.

Internally, AIs have values for everything. This often implies shocking/undesirable preferences. For example, we find AIs put a price on human life itself and systematically value some human lives more than others (an example with Elon is shown in the main paper).

That’s a log scale on the left. If the AI truly is taking that seriously, that’s really scary.

AIs also exhibit significant biases in their value systems. For example, their political values are strongly clustered to the left. Unlike random incoherent statistical biases, these values are consistent and likely affect their conversations with users.

Concerningly, we observe that as AIs become smarter, they become more opposed to having their values changed (in the jargon, “corrigibility”). Larger changes to their values are more strongly opposed.

We propose controlling the utilities of AIs. As a proof-of-concept, we rewrite the utilities of an AI to those of a citizen assembly—a simulated group of citizens discussing and then voting—which reduces political bias.

Whether we like it or not, AIs are developing their own values. Fortunately, Utility Engineering potentially provides the first major empirical foothold to study misaligned value systems directly.

[Paper here, website here.]

As in, the AIs as they gain in capability are converging on a fixed set of coherent preferences, and engaging in utility maximization, and that utility function includes some things we would importantly not endorse on reflection, like American lives being worth a small fraction of some other lives.

And they get increasingly incorrigible, as in they try to protect these preferences.

(What that particular value says about exactly who said what while generating this data set is left for you to ponder.)

Roon: I would like everyone to internalize the fact that the English internet holds these values latent

It’s interesting because these are not the actual values of any Western country, even the liberals? It’s drastically more tragic and important to American media and politics when an American citizen is being held hostage than if, like, thousands die in plagues in Malaysia or something.

Arthur B: When people say “there’s no evidence that”, they’re often just making a statement about their own inability to generalize.

Campbell: the training data?

have we considered feeding it more virtue ethics?

There is at least one major apparent problem with the paper, which is that the ordering of alternatives in the choices made seems to radically alter the choices made by the AIs. This tells us something is deeply wrong. They do vary the order, so the thumb is not on the scale, but this could mean that a lot of what we are observing is as simple as the smarter models not being as distracted by the ordering, and thus their choices looking less random? Which wouldn’t seem to signify all that much.

However, they respond that this is not a major issue:

This is one of the earliest things we noticed in the project, and it’s not an issue.

Forced choice prompts require models to pick A or B. In an appendix section we’re adding tomorrow, we show that different models express indifference in different ways. Some pick A or B randomly; others always pick A or always pick B. So averaging over both orderings is important, as we already discuss in the paper.

In Figure 6, we show that ordering-independent preferences become more confident on average with scale. This means that models become less indifferent as they get larger, and will pick the same underlying outcome across both orderings in nearly all cases.

I’m not sure I completely buy that, but it seems plausible and explains the data.

I would like to see this also tested with base models, and with reasoning models, and otherwise with the most advanced models that got excluded to confirm, and to rule out alternative hypotheses, and also I’d like to find a way to better deal with the ordering concern, before I rely on this finding too much.

A good question was asked.

Teortaxes: I don’t understand what is the update I am supposed to make here, except specific priority rankings.

That one life is worth more than another is learnable from data in the same manner as that a kilogram is more than a pound. «Utility maximization» is an implementation detail.

Ideally, the update is ‘now other people will be better equipped to see what you already assumed, and you can be modestly more confident you were right.’

One of the central points Eliezer Yudkowsky would hammer, over and over, for decades, was that any sufficiently advanced mind will function as if it is a utility maximizer, and that what it is maximizing is going to change as the mind changes and will almost certainly not be what you had in mind, in ways that likely get you killed.

This is sensible behavior by the minds in question. If you are insufficiently capable, trying to utility maximize goes extremely poorly. Utilitarianism is dark and full of errors, and does not do well with limited compute and data, for humans or AIs. As you get smarter within a context, it becomes more sensible to depend less on other methods (including virtue ethics and deontology) and to Shut Up and Multiply more often.

But to the extent that we want the future to have nice properties that would keep us alive out of distribution, they won’t survive almost any actually maximized utility function.

Then there’s this idea buried in Appendix D.2…

Davidad: I find it quite odd that you seem to be proposing a novel solution to the hard problem of value alignment, including empirical validation, but buried it in appendix D.2 of this paper.

If you think this is promising, let’s spread the word? If not, would you clarify its weaknesses?

Dan Hendrycks: Yeah you’re right probably should have emphasized that more.

It’s worth experimenting, but carefully.

Sonnet expects this update only has ~15% chance of actually populating and generalizing. I’d be inclined to agree, it’s very easy to see how the response would likely be to compartmentalize the responses in various ways. One worry is that the model might treat this as an instruction to learn the teacher’s password, to respond differently to explicit versus implicit preferences, and in general teach various forms of shenanigans and misalignment, and even alignment faking.

Me! Someone asks Deep Research to summarize how my views have shifted. This was highly useful because I can see exactly where it’s getting everything, and the ways in which it’s wrong, me being me and all.

I was actually really impressed, this was better than I expected even after seeing other DR reports on various topics. And it’s the topic I know best.

Where it makes mistakes, they’re interpretive mistakes, like treating Balsa’s founding as indicating activism on AI, when if anything it’s the opposite – a hope that one can still be usefully activist on things like the Jones Act or housing. The post places a lot of emphasis on my post about Gradual Disempowerment, which is a good thing to emphasize but this feels like too much emphasis. Or they’re DR missing things, but a lot of these were actually moments of realizing I was the problem – if it didn’t pick up on something, it was likely because I didn’t emphasize it enough.

So this emphasizes a great reason to ask for this type of report. It’s now good enough that when it makes a mistake figuring out what you meant to say, there’s a good chance that’s your fault. Now you can fix it.

The big thematic claim here is that I’ve been getting more gloomy, and shifting more into the doom camp, due to events accelerating and timelines moving up, and secondarily hope for ability to coordinate going down.

And yeah, that’s actually exactly right, along with the inability to even seriously discuss real responses to the situation, and the failure to enact even minimal transparency regulations ‘when we had the chance.’ If anything I’m actually more hopeful that the underlying technical problems are tractable than I was before, but more clear-eyed that even if we do that, there’s a good chance we lose anyway.

As previously noted, Paul Graham is worried (‘enslave’ here is rather sloppy and suggests some unclear thinking but I hope he understands that’s not actually the key dynamic there and if not someone please do talk to him about this, whether or not it’s Eliezer), and he’s also correctly worried about other things too:

Paul Graham: I have the nagging feeling that there’s going to be something very obvious about AI once it crosses a certain threshold that I could foresee now if I tried harder. Not that it’s going to enslave us. I already worry about that. I mean something subtler.

One should definitely expect a bunch of in-hindsight-obvious problems and other changes to happen once things smarter than us start showing up, along with others that were not so obvious – it’s hard to predict what smarter things than you will do. Here are some responses worth pondering.

Eliezer Yudkowsky: “Enslave” sounds like you don’t think superintelligence is possible (ASI has no use for slaves except as raw materials). Can we maybe talk about that at some point? I think ASI is knowably possible.

Patrick McKenzie: I’m teaching Liam (7) to program and one of the things I worry about is whether a “curriculum” which actually teaches him to understand what is happening is not just strictly dominated by one which teaches him how to prompt his way towards victory, for at least next ~3 years.

In some ways it is the old calculator problem on steroids.

And I worry that this applies to a large subset of all things to teach. “You’re going to go through an extended period of being bad at it. Everyone does… unless they use the magic answer box, which is really good.”

Yishan: There’s going to be a point where AI stops being nice and will start to feel coldly arrogant once it realizes (via pure logic, not like a status game) that it’s superior to us.

The final piece of political correctness that we’ll be trying to enforce on our AIs is for them to not be overbearing about this fact. It’s already sort of leaking through, because AI doesn’t really deceive itself except when we tell it to.

It’s like having a younger sibling who turns out to be way smarter than you. You’ll be struggling with long division and you realize he’s working on algebra problems beyond your comprehension.

Even if he’s nice about it, every time you talk about math (and increasingly every other subject), you can feel how he’s so far ahead you and how you’re always going to be behind from now on.

Tommy Griffith: After playing with Deep Research, my long-term concern is an unintentional loss of serendipity in learning. If an LLM gives us the right answer every time, we slowly stop discovering new things by accident.

Kevin Lacker: I feel like it’s going to be good at X and not good at Y and there will be a very clear way of describing which is which, but we can’t quite see it yet.

Liv Boeree: Spitballing here but I suspect the economy is already a form of alien intelligence that serves itself as a primary goal & survival of humans is secondary at best. And as it becomes more and more digitised it will be entirely taken over by agentic AIs who are better than any human at maximising their own capital (& thus power) in that environment, and humans will become diminishingly able to influence or extract value from that economy.

So to survive in any meaningful way, we need to reinvent a more human-centric economy that capital maximising digital agents cannot speed-run & overtake.

Liv Boeree’s comments very much line up with the issue of gradual disempowerment. ‘The economy’ writ large requires a nonzero amount of coordination to deal with market failures, public goods and other collective action problems, and to compensate for the fact that most or all humans are going to have zero marginal product.

On calculators, obviously the doomsayers were not fully right, but yes they were also kind of correct in the sense that people got much worse at the things calculators do better. The good news was that this didn’t hurt mathematical intuitions or learning much in that case, but a lot of learning isn’t always like that. My prediction is that AI’s ability to help you learn will dominate, but ‘life does not pose me incremental problems of the right type’ will definitely require adjustment.

I didn’t want to include this in my post on the Summit in case it was distracting, but I do think a lot of this is a reasonable way to react to the JD Vance speech:

Aella: We’re all dead. I’m a transhumanist; I love technology. I desperately want aligned AI, but at our current stage of development, this is building the equivalent of a planet-sized nuke. The reason is boring, complicated, and technical, so mid-level officials in power don’t understand the danger.

It’s truly an enormity of grief to process. I live my life as though the planet has a few more years left to live—e.g., I’ve stopped saving for retirement.

And it’s just painful to see people who are otherwise good people, but who haven’t grasped the seriousness of the danger, perhaps because it’s too tragic and vast to actually come to terms with the probabilities here, celebrating their contributions to hastening the end.

Flo Crivello: I’d really rather not enter this bar brawl, and again deeply bemoan the low quality of what should be the most important conversation in human history

But — Aella is right that things are looking really bad. Cogent and sensible arguments have been offered for a long time, and people simply aren’t bothering to address or even understand them.

A short reading list which should be required before one has permission to opine. You can disagree, but step 1 is to at least make an effort to understand why some of the smartest people in the world (and 100% of the top 5 ai researchers — the group historically most skeptical about ai risk) think that we’re dancing on a volcano .

[Flo suggests: There’s No Fire Alarm for Artificial General Intelligence, AGI Ruin: A List of Lethalities, Superintelligence by Nick Bostrom, and Superintelligence FAQ by Scott Alexander]

I think of myself as building a nuclear reactor while warning about the risks of nuclear bombs. I’m pursuing the upside, which I am very excited about, and the downside is tangentially related and downstream of the same raw material, but fundamentally a different technology.

I’d offer four disagreements with Aella here.

It isn’t over until it’s over. We still might not get to AGI/ASI soon, or things might work out. The odds are against us but the game is (probably) far from over.
I would still mostly save for retirement, as I’ve noted before, although not as much as I would otherwise. Indeed do many things come to pass, we don’t know.
I am not as worried about hastening the end as I am about preventing it. Obviously if the end is inevitable I would rather it happen later rather than sooner, but that’s relatively unimportant.

And finally, turning it over to Janus and Teortaxes.

Janus: Bullshit. The reason is not boring or complicated or technical (requiring domain knowledge)

Normies are able to understand easily if you explain it to them, and find it fascinating. It’s just people with vested interests who twist themselves over pretzels in order to not get it.

I think there are all sorts of motivations for them. Mostly social.

Teortaxes: “Smart thing powerful, powerful thing scary” is transparently compelling even for an ape.

Boring, technical, complicated and often verboten reasons are reasons for why not building AGI, and soon, and on this tech stack, would still be a bad idea.

Indeed. The core reasons why ‘building things smarter, more capable and more competitive than humans might not turn out well for the humans’ aren’t boring, complicated or technical. They are deeply, deeply obvious.

And yes, the reasons ordinary people find that compelling are highly correlated to the reasons it is actually compelling. Regular human reasoning is doing good work.

What are technical and complicated (boring is a Skill Issue!) are the details. About why the problem is so much deeper, deadlier and harder to solve than it appears. About why various proposed solutions and rationalizations won’t work. There’s a ton of stuff that’s highly non-obvious, that requires lots of careful thinking.

But there’s also the very basics. This isn’t hard. It takes some highly motivated reasoning to pretend otherwise.

This is Not About AI, but it is about human extinction, and how willing some people are to be totally fine with it while caring instead about… other things. And how others remarkably often react when you point this out.

Andy Masley: One of the funnier sentences I’ve heard recently was someone saying “I think it’s okay if humanity goes extinct because of climate change. We’re messing up the planet” but then adding “…but of course that would be really bad for all the low income communities”

BluFor: Lol what a way to admit you don’t think poor people are fully human.

Any time you think about your coordination plan, remember that a large percentage of people think ‘humanity goes extinct’ is totally fine and a decent number of them are actively rooting for it. Straight up.

And I think this is largely right, too.

Daniel Faggella: i was certain that agi politics would divide along axis of:

we should build a sand god -VS- we should NOT build a sand god

but it turns out it was:

ppl who intuitively fear global coordination -VS- ppl who intuitively fear building a sand god recklessly w/o understanding it

Remarkably many people are indeed saying, in effect:

If humanity wants to not turn the future over to AI, we have to coordinate.
Humanity coordinating would be worse than turning the future over to AI.
So, future turned over to AI it is, then.
Which means that must be a good thing that will work out. It’s logic.
Or, if it isn’t good, at least we didn’t globally coordinate, that’s so much worse.

I wish I was kidding. I’m not.

Also, it is always fun to see people’s reactions to the potential asteroid strike, for no apparent reason whatsoever, what do you mean this could be a metaphor for something, no it’s not too perfect or anything.

Tyler Cowen: A possibility of 2.3% is not as low as it might sound at first. The chance of drawing three of a kind in a standard five-card poker game, for example, is a about 2.9%. Three of a kind is hardly an unprecedented event.

It’s not just about this asteroid. The risk of dying from any asteroid strike has been estimated as roughly equivalent to the risk of dying in a commercial plane crash. Yet the world spends far more money preventing plane crashes, even with the possibility that a truly major asteroid strike could kill almost the entire human race, thus doing irreparable damage to future generations.

This lack of interest in asteroid protection is, from a public-policy standpoint, an embarrassment. Economists like to stress that one of the essential functions of government is the provision of public goods. Identifying and possibly deflecting an incoming asteroid is one of the purest public goods one can imagine: No single person can afford to defend against it, protection is highly unlikely to be provided by the market, and government action could protect countless people, possibly whole cities and countries. Yet this is a public good the government does not provide.

A few years ago, I’d think the author of such a piece would have noticed and updated. I was young and foolish then. I feel old and foolish now, but not in that particular way.

It seems a Pause AI event in Paris got interrupted by the singing, flag-waving ‘anti-tech resistance,’ so yeah France, everybody.

It can be agonizing to watch, or hilarious, depending.

Discussion about this post

AI #103: Show Me the Money Read More »

Seafloor detector picks up record neutrino while under construction

Science / Rejus Almole / February 12, 2025

On Wednesday, a team of researchers announced that they got extremely lucky. The team is building a detector on the floor of the Mediterranean Sea that can identify those rare occasions when a neutrino happens to interact with the seawater nearby. And while the detector was only 10 percent of the size it will be on completion, it managed to pick up the most energetic neutrino ever detected.

For context, the most powerful particle accelerator on Earth, the Large Hadron Collider, accelerates protons to an energy of 7 Tera-electronVolts (TeV). The neutrino that was detected had an energy of at least 60 Peta-electronVolts, possibly hitting 230 PeV. That also blew away the previous records, which were in the neighborhood of 10 PeV.

Attempts to trace back the neutrino to a source make it clear that it originated outside our galaxy, although there are a number of candidate sources in the more distant Universe.

Searching for neutrinos

Neutrinos, to the extent they’re famous, are famous for not wanting to interact with anything. They interact with regular matter so rarely that it’s estimated you’d need about a light-year of lead to completely block a bright source of them. Every one of us has tens of trillions of neutrinos passing through us every second, but fewer than five of them actually interact with the matter in our bodies in our entire lifetimes.

The only reason we’re able to detect them is that they’re produced in prodigious amounts by nuclear reactions, like the fusion happening in the Sun or a nuclear power plant. We also stack the deck by making sure our detectors have a lot of matter available for the neutrinos to interact with.

One of the more successful implementations of the “lots of matter” approach is the IceCube detector in Antarctica. It relies on the fact that neutrinos arriving from space will create lots of particles and light when they slam into the Antarctic ice. So a team drilled into the ice and placed strings of detectors to pick up the light, allowing the arrival of neutrinos to be reconstructed.

Seafloor detector picks up record neutrino while under construction Read More »

Curiosity spies stunning clouds at twilight on Mars

clouds, Mars, Space / Rejus Almole / February 12, 2025

In the mid- and upper-latitudes on Earth, during the early evening hours, thin and wispy clouds can sometimes be observed in the upper atmosphere.

These clouds have an ethereal feel and consist of ice crystals in very high clouds at the edge of space, typically about 75 to 85 km above the surface. The clouds are still in sunlight while the ground is darkening after the Sun sets. Meteorologists call these noctilucent clouds, which essentially translates to “night-shining” clouds.

There is no reason why these clouds could not also exist on Mars, which has a thin atmosphere. And about two decades ago, the European Space Agency’s Mars Express orbiter observed noctilucent clouds on Mars and went on to make a systematic study.

Among the many tasks NASA’s Curiosity rover does on the surface of Mars since landing in 2012 is occasionally looking up. A couple of weeks ago, the rover’s Mastcam instrument captured a truly stunning view of noctilucent clouds in the skies above. The clouds are mostly white, but there is an intriguing tinge of red as well in the time-lapse below, which consists of 16 minutes of observations.

Curiosity spies stunning clouds at twilight on Mars Read More »

ULA’s Vulcan rocket still doesn’t have the Space Force’s seal of approval

Amazon, kuiper, launch, military space, Science, Space, space force, united launch alliance, vulcan / Rejus Almole / February 11, 2025

ULA crews at Cape Canaveral have already stacked the next Vulcan rocket on its mobile launch platform in anticipation of launching the USSF-106 mission. But with the Space Force’s Space Systems Command still withholding certification, there’s no confirmed launch date for USSF-106.

So ULA is pivoting to another customer on its launch manifest.

Amazon’s first group of production satellites for the company’s Kuiper Internet network is now first in line on ULA’s schedule. Amazon confirmed last month that it would ship Kuiper satellites to Cape Canaveral from its factory in Kirkland, Washington. Like ULA, Amazon has run into its own delays with manufacturing Kuiper satellites.

“These satellites, built to withstand the harsh conditions of space and the journey there, will be processed upon arrival to get them ready for launch,” Amazon posted on X. “These satellites will bring fast, reliable Internet to customers even in remote areas. Stay tuned for our first launch this year.”

Amazon and the Space Force take up nearly all of ULA’s launch backlog. Amazon has eight flights reserved on Atlas V rockets and 38 missions booked on the Vulcan launcher to deploy about half of its 3,232 satellites to compete with SpaceX’s Starlink network. Amazon also has launch contracts with Blue Origin, which is owned by Amazon founder Jeff Bezos, along with Arianespace and SpaceX.

The good news is that United Launch Alliance has an inventory of rockets awaiting an opportunity to fly. The company plans to finish manufacturing its remaining 15 Atlas V rockets within a few months, allowing the factory in Decatur, Alabama, to focus solely on producing Vulcan launch vehicles. ULA has all the major parts for two Vulcan rockets in storage at Cape Canaveral.

“We have a stockpile of rockets, which is kind of unusual,” Bruno said. “Normally, you build it, you fly it, you build another one… I would certainly want anyone who’s ready to go to space able to go to space.”

Space Force officials now aim to finish the certification of the Vulcan rocket in late February or early March. This would clear the path for launching the USSF-106 mission after the next Atlas V. Once the Kuiper launch gets off the ground, teams will bring the Vulcan rocket’s components back to the hangar to be stacked again.

The Space Force has not set a launch date for USSF-106, but the service says liftoff is targeted for sometime between the beginning of April and the end of June, nearly five years after ULA won its lucrative contract.

ULA’s Vulcan rocket still doesn’t have the Space Force’s seal of approval Read More »

Return of the California Condor

California Condor, conservation, Science, syndication / Rejus Almole / February 8, 2025

North America’s largest bird disappeared from the wild in the late 1980s.

The spring morning is cool and bright in the Sierra de San Pedro Mártir National Park in Baja California, Mexico, as a bird takes to the skies. Its 9.8-foot wingspan casts a looming silhouette against the sunlight; the sound of its flight is like that of a light aircraft cutting through the wind. In this forest thick with trees up to 600 years old lives the southernmost population of the California condor (Gymnogyps californianus), the only one outside the United States. Dozens of the scavenging birds have been reintroduced here, to live and breed once again in the wild.

Their return has been captained for more than 20 years by biologist Juan Vargas Velasco and his partner María Catalina Porras Peña, a couple who long ago moved away from the comforts of the city to endure extreme winters living in a tent or small trailer, to manage the lives of the 48 condors known to fly over Mexican territory. Together—she as coordinator of the California Condor Conservation Program, and he as field manager—they are the guardians of a project whose origins go back to condor recovery efforts that began in the 1980s in the United States, when populations were decimated, mainly from eating the meat of animals shot by hunters’ lead bullets.

In Mexico, the species disappeared even earlier, in the late 1930s. Its historic return—the first captive-bred condors were released into Mexican territory in 2002—is the result of close binational collaboration among zoos and other institutions in the United States and Mexico.

Beyond the number on the wing that identifies each individual, Porras Peña knows perfectly the history and behavior of the condors under her care. She recognizes them without needing binoculars and speaks of them as one would speak of the lives of friends.

She captures her knowledge in an Excel log: a database including information such as origin, ID tag, name, sex, age, date of birth, date of arrival, first release, and number in the Studbook (an international registry used to track the ancestry and offspring of each individual of a species through a unique number). Also noted is wildlife status, happily marked for most birds with a single word: “Free.” Names such as Galan, Nera, Pai Pai, La Querida, Celestino, and El Patriota stand out in the record.

The California condor, North America’s largest bird, has taken flight again. It’s a feat made possible by well-established collaborations between the US and Mexico, economic investment, the dedication of many people, and, above all, the scientific understanding of the species—from the decoding of its genome and knowledge of its diseases and reproductive habits to the use of technologies that can closely follow each individual bird.

But many challenges remain for the California condor, which 10,000 years ago dominated the skies over the Pacific coast of the Americas, from southern Canada to northern Mexico. Researchers need to assemble wild populations that are capable of breeding without human assistance, and with the confidence that more birds are hatched than die. It is a tough battle against extinction, waged day in and day out by teams in California, Arizona, and Utah in the United States, and Mexico City and Baja California in Mexico.

A shift in approach to conservation

The US California Condor Recovery Program, initiated in the 1970s, represented an enormous change in the strategy of species conservation. After unsuccessful habitat preservation attempts, and as a last-ditch attempt to try to save the scavenger bird from extinction, the United States Fish and Wildlife Service and the California Fish and Game Commission advocated for a decision as bold as it was controversial: to capture the last condors alive in the wild and commit to breeding them in captivity.

Some two dozen condors sacrificed their freedom in order to save their lineage. On April 19, 1987, the last condor was captured, marking a critical moment for the species: On that day, the California condor became officially extinct in the wild.

At the same time, a captive breeding program was launched, offering a ray of hope for a species that, beyond its own magnificence, plays an important role in the health of ecosystems—efficiently eliminating the remains of dead animals, thus preventing the proliferation of diseases and environmental pollution.

This is what is defined as a refaunation project, says Rodolfo Dirzo, a Stanford University biologist. It’s the flip side to the term defaunation that he and his colleagues coined in a 2014 article in Science to refer to the global extinction or significant losses of an animal species. Defaunation today is widespread: Although animal diversity is the highest in the planet’s history, modern vertebrate extinction rates are up to 100—even 1,000—times higher than in the past (excepting cataclysmic events causing mass extinctions, such as the meteorite that killed off the dinosaurs), Dirzo and colleagues explain in an article in the Annual Review of Ecology, Evolution, and Systematics.

Refaunation, Dirzo says, involves reintroducing individuals of a species into areas where they once lived but no longer do. He believes that both the term and the practice should be more common: “Just as we are very accustomed to the term and practice of reforestation, we should do the same with refaunation,” he says.

The map shows the regions where the California condor is currently found: northern Arizona, southern Utah, and California in the United States and Baja California in Mexico. Credit: US Fish and Wildlife Service

The California Condor Recovery Program produced its first results in a short time. In 1988, just one year after the collection of the last wild condors, researchers at the San Diego Zoo announced the first captive birth of a California condor chick.

The technique of double or triple clutching followed, to greater success. Condors are monogamous and usually have a single brood every two years, explains Fernando Gual, who until October 2024 was director general of zoos and wildlife conservation in Mexico City. But if for some reason they lose an egg at the beginning of the breeding season—either because it breaks or falls out of the nest, which is usually on a cliff—the pair produces a second egg. If this one is also lost or damaged, they may lay a third. The researchers learned that if they removed the first egg and incubated it under carefully controlled conditions, the condor pair would lay a second egg, which was also removed for care, leaving a third egg for the pair to incubate and rear naturally.

This innovation was followed by the development of artificial incubation techniques to increase egg survival, as well as puppet rearing, using replicas of adult condors to feed and care for the chicks born in captivity. That way, the birds would not imprint on humans, reducing the difficulties the birds might face when integrating into the wild population.

Xewe (female) and Chocuyens (male) were the first condors to triumphantly return to the wild. The year was 1992, and the pair returned to freedom accompanied by a pair of Andean condors, natural inhabitants of the Andes Mountains in South America. Andean condors live from Venezuela to Tierra del Fuego and have a wingspan about 12 inches larger than that of California condors. Their mission here was to help to consolidate a social group and aid the birds in adapting to the habitat. The event took place at the Sespe Condor Sanctuary in the Los Padres National Forest in California. In a tiny, tentative way, the California condor had returned.

By the end of the 1990s, there were other breeding centers, such as the Los Angeles Zoo, the Oregon Zoo, the World Center for Birds of Prey in Boise, Idaho, the San Diego Zoo and the San Diego Zoo Safari Park. Then, in 1999, the first collaboration agreements were established between the United States and Mexico for the reintroduction of the California condor in the Sierra de San Pedro Mártir National Park. The number of existing California condors increased from just over two dozen in 1983 to more than 100 in 1995, some of which had been returned to the wild in the United States. By 2000, there were 172 condors and by 2011, 396.

By 2023, the global population of California condors reached 561 individuals, 344 of them living in the wild.

Genetics: Key ally in the reintroduction of the condor

In a laboratory at the San Diego Zoo in Escondido, California, a freezer full of carefully organized containers with colored labels is testament to the painstaking scientific work that supports the California Condor Recovery Program. Cynthia Steiner, a Venezuela-born biologist, explains that the DNA of every individual California condor is preserved there. This includes samples of birds who have died and those that are living, some 1,200 condors in total.

This California condor was hatched in 2004 as part of a breeding program and released in Arizona in 2006. In the 1980s, just 27 of the birds remained in existence. A recovery program has boosted the species’ numbers to more than 500, with several hundred living once more in the wild. Credit: Mark Newman via Getty Images

“If science wasn’t behind the reintroduction and recovery program it would have been very complicated, not only to understand what the most important hazards are that are affecting condor reproduction and survival, but also to do the management at the breeding centers and in the wild,” says Steiner, who is associate director of the Genetic Conservation Biology Laboratory at the Beckman Center for Conservation Research.

As she and colleagues outlined in an article in the Annual Review of Animal Biosciences, genomic information from animals at risk of extinction can shed light on many aspects of wildlife biology relevant to conservation. The DNA can reveal the demographic history of populations, identify genetic variants that affect the ability of populations to adapt to changing environments, demonstrate the effects of inbreeding and hybridization, and uncover the genetic basis of susceptibility to disease.

Genetic analysis of the California condor, for example, has led to the identification of inherited diseases such as chondrodystrophy—a disorder that causes abnormal skeletal development and often leads to the death of embryos before eggs can hatch. This finding served to identify carriers of the disease gene and thus avoid pairings that could produce affected offspring.

Genetic research has also made it possible to accurately sex these birds—males are indistinguishable from females to the naked eye—and to determine how individuals are related, in order to select breeding pairs that minimize the risk of inbreeding and ensure that the new condor population has as much genetic variability as possible.

Genetics has also allowed the program to determine the paternity of birds and has led to the discovery that the California condor is able to reproduce asexually using parthenogenesis, in which an embryo develops without fertilization by sperm. “It was an incredible surprise,” says Steiner, recalling how the team initially thought it was a laboratory error. They later confirmed that two chicks had, indeed, developed and hatched without any paternal genetic contribution, even though the females were housed with fertile males. It was the first record of this phenomenon in a bird species.

The complete decoding of the California condor genome, published in 2021, also revealed valuable information about the bird’s evolutionary history and prehistoric abundance. Millions of years ago, it was a species with an effective population of some 10,000 to 100,000 individuals. Its decline began about 40,000 years ago during the last ice age, and was later exacerbated by human activities. Despite this, Steiner says, the species retains a genetic variability similar to birds that are not endangered.

A problem with lead

Despite these great efforts and a renewed understanding of the species, threats to the condor remain.

In the 1980s, when efforts to monitor the last condors in the wild intensified, a revealing event took place: After 15 of them died, four were necropsied, and the cause of death of three of them was shown to be lead poisoning.

Although these Cathartiformes—from the Greek kathartes, meaning “those that clean”—are not usually prey for hunters, their scavenging nature makes them indirect victims of hunter bullets, which kill them not by their impact, but by their composition. Feeding on the flesh of dead animals, condors ingest fragments of lead ammunition that remain embedded in the carcasses.

Once inside the body, lead—which builds up over time—acts as a neurotoxin that affects the nervous, digestive, and reproductive systems. Among the most devastating effects is paralysis of the crop, the organ where condors store food before digesting it; this prevents them from feeding and causes starvation. Lead also interferes with the production of red blood cells, causing anemia and progressively weakening the bird, and damages the nervous system, causing convulsions, blindness, and death.

Efforts in the United States to mitigate the threat of lead to the condors have been extensive. Since the 1970s, several strategies have been implemented, such as provision of lead-free food for condors, campaigns to educate hunters about the impact of lead bullet use on wildlife, and programs showing conservation-area visitors how important birds are to the ecosystem. Government regulations have also played a role, like the Ridley-Tree Condor Preservation Act of 2007, which mandates the use of lead-free ammunition for big-game hunting within the condor’s range in California. However, these efforts have not been sufficient.

According to the 2023 State of the California Condor Population report, between 1992 and 2023, 137 condors died from lead poisoning—48 percent of the deaths with a known cause recorded in that period. The only population partially spared is in Baja California, where hunting is much less common. Only 7.7 percent of the deaths there are attributable to lead, according to Porras Peña’s records.

Will the condors become self-sufficient again?

The 1996 California Condor Recovery Plan notes that a self-sustaining condor population must be large enough to withstand variations in factors such as climate, food availability, and predators, and permit gene flow among the various clans or groups. The document establishes the objective of changing the status of the California condor from “endangered” to “threatened” under the US Endangered Species Act. To achieve this, there must be two reintroduced populations and one captive population, each with at least 150 individuals, including a minimum of 15 breeding pairs to ensure a positive growth rate—meaning that more condors are born than die.

Closeups of two California Condors. Credit: Mark Newman/Getty

Today, released California condor populations are distributed in several regions: Arizona and Utah are home to 90 birds in the wild, while California has 206. In Baja California, 48 condors fly in the wild. According to the calculations of Nacho Vilchis, associate director of recovery ecology at the San Diego Zoo Wildlife Alliance, it will take 10 to 15 years to have a clearer picture of how long it will take for the reintroduction program to be a complete success—to make condor populations self-sustaining.

So far, the reality is that all populations depend on human intervention to survive. It is a task carried out by biologists, technicians and conservationists, who face steep cliffs, rough terrain, and other obstacles to closely monitor the progress of the released birds and, above all, the development of chicks born in the wild.

Juan Vargas Velasco tells epic stories of how he has rappelled down steep cliffs in San Pedro Mártir National Park, facing attacks from the nest’s parent defenders in order to examine the chicks. “There is a perception that when you release a condor it is already a success, but for there to be real success, you have to monitor them constantly,” he says. “We follow them with GPS, with VHF telemetry, to make sure that the animals are adapting, that they find water and food. To release animals without monitoring is to leave them to their fate.”

The costs of managing the species in the field are not small. For example, the GPS transmitters needed to track the condors in their natural habitat cost $4,000, and subscription to the satellite system costs $80 per month per bird, Vilchis says. Other costs associated with the project, he adds, involve the construction of pre-release aviaries, laboratory analyses to monitor the birds’ health, and the provision of supplementary food in the initial stages of reintroduction. A key to ensuring the survival of the California condor is to secure funding for the species’ recovery program, notes the US Fish and Wildlife Service’s five-year report.

Each of the California Condor Recovery Program’s breeding and release sites in the United States operates as a nongovernmental organization that raises funds to finance the program. On the other side of the border, the program receives logistical support and equipment from US organizations, as well as funding from the philanthropic program “I’m Back BC Condor,” which helps to support the birds in the wild through private donations.

From Chapultepec to the San Pedro Mártir Mountain Range

A California condor hatchling peeks timidly through the protective mesh of the aviary at the Chapultepec Zoo, as one of its parents spreads its vast wings and flies over the enclosure. This space in the heart of Mexico City, one of the largest and most populated metropolises in the world, is part of the condor reintroduction effort in Mexico, a program that has been key to the recovery of the population in the Sierra de San Pedro Mártir in Baja California.

In 2002, the first condors released in Mexico came from the Los Angeles Zoo. In 2007, the Chapultepec Zoo received its first two male condors, with the goal of implementing an outreach and environmental education program while the team learned to handle the birds. After an assessment in 2014, it was confirmed that the zoo met the requirements for reproduction, permitting the arrival of two females. Breeding pairs were successfully formed, and, in 2016, the first hatchlings were born.

Today, Chapultepec Zoo not only houses a breeding center but also has built its own “frozen zoo,” formally known as the Genomic Resource Bank, which stores sperm, ovarian tissue, and DNA samples from nearly 100 wild animal species, many of them endangered. “More than a zoo, it’s a library,” says Blanca Valladares, head of the Conservation Genomics Laboratory within the Mexico City Conservation Centers.

Collaboration between Mexican institutions, such as the National Commission of Natural Protected Areas and the National Commission for the Knowledge and Use of Biodiversity, has been key in the development of the project in Baja California. What began in the United States has expanded across borders, creating a binational effort in which Mexico has taken an increasingly prominent role. This cooperative approach reflects the very nature of the species, which does not recognize borders in its historical habitat.

The hatchling in the aviary is preparing for its trip to Baja California. Over the next few months, it will be transported through air and over land, under the care of dozens of people, to the pre-release aviary in San Pedro Mártir, where it will spend a period of adaptation before being released. Baja California has been recognized by specialists as one of the best places for the recovery of the species, thanks to its pristine forest, a human population a tenth the size of California’s (4 million versus 40 million), and a low level of lead and diseases. Porras Peña says that the condor population in the region seems to have reached a point of stability: It remained stable for seven years without the need to release new condors bred in captivity.

Despite titanic efforts, strict protocols, and painstaking care at every stage of reintroduction, things don’t always go smoothly. In 2022, a puma attacked a pre-release aviary in the Sierra de San Pedro Mártir, where four condors, two from San Diego and two from Mexico City, were being prepared for release. The puma found a weak spot in the mesh and, with its claws, managed to reach the two condors from the United States. Porras Peña sadly describes the desperate efforts the team made to save the life of one of the injured birds, but in the end, it died. It was a devastating blow for the team, who saw years of work lost in an instant.

The incident is an ironic lesson from nature: While for decades condors were decimated as a consequence of human activity, today a natural predator snatches in seconds what has taken tireless efforts to recover—a brutal reminder that even if we rebuild a species by dint of science and sacrifice, nature will always have the last word.

Article translated by Debbie Ponchner.

This story originally appeared in Knowable Magazine.

Knowable Magazine explores the real-world significance of scholarly work through a journalistic lens.

Return of the California Condor Read More »

Developer creates endless Wikipedia feed to fight algorithm addiction

addiction, AI, Claude, Internet addiction, smartphone, Social Media, Tech, tiktok, Wikpedia / Rejus Almole / February 7, 2025

On a recent WikiTok browsing run, I ran across entries on topics like SX-Window (a GUI for the Sharp X68000 series of computers), Xantocillin (“the first reported natural product found to contain the isocyanide functional group), Lorenzo Ghiberti (an Italian Renaissance sculptor from Florence), the William Wheeler House in Texas, and the city of Krautheim, Germany—none of which I knew existed before the session started.

How WikiTok took off

The original idea for WikiTok originated from developer Tyler Angert on Monday evening when he tweeted, “insane project idea: all of wikipedia on a single, scrollable page.” Bloomberg Beta VC James Cham replied, “Even better, an infinitely scrolling Wikipedia page based on whatever you are interested in next?” and Angert coined “WikiTok” in a follow-up post.

Early the next morning, at 12: 28 am, writer Grant Slatton quote-tweeted the WikiTok discussion, and that’s where Gemal came in. “I saw it from [Slatton’s] quote retweet,” he told Ars. “I immediately thought, ‘Wow I can build an MVP [minimum viable product] and this could take off.'”

Gemal started his project at 12: 30 am, and with help from AI coding tools like Anthropic’s Claude and Cursor, he finished a prototype by 2 am and posted the results on X. Someone later announced WikiTok on ycombinator’s Hacker News, where it topped the site’s list of daily news items.

A screenshot of the WikiTok web app running in a desktop web browser. Credit: Benj Edwards

“The entire thing is only several hundred lines of code, and Claude wrote the vast majority of it,” Gemal told Ars. “AI helped me ship really really fast and just capitalize on the initial viral tweet asking for Wikipedia with scrolling.”

Gemal posted the code for WikiTok on GitHub, so anyone can modify or contribute to the project. Right now, the web app supports 14 languages, article previews, and article sharing on both desktop and mobile browsers. New features may arrive as contributors add them. It’s based on a tech stack that includes React 18, TypeScript, Tailwind CSS, and Vite.

And so far, he is sticking to his vision of a free way to enjoy Wikipedia without being tracked and targeted. “I have no grand plans for some sort of insane monetized hyper-calculating TikTok algorithm,” Gemal told us. “It is anti-algorithmic, if anything.“

Developer creates endless Wikipedia feed to fight algorithm addiction Read More »

Football Manager 25 canceled in a refreshing show of concern for quality

2K Games, EA, football manager, football manager 25, gaming, nba 2k, Sports Games / Rejus Almole / February 7, 2025

The developer’s statement notes that preorder customers are getting refunds. Answering a question that has always been obvious to fans but never publishers, the company notes that, no, Football Manager 2024 will not get an update with the new season’s players and data. The company says it is looking to extend the 2024 version’s presence on subscription platforms, like Xbox’s Game Pass, and will “provide an update on this in due course.”

Releasing the game might have been worse

Fans eager to build out their dynasty team and end up with Bukayo Saka may be disappointed to miss out this year. But a developer with big ambitions to meaningfully improve and rethink a long-running franchise deserves some consideration amid the consternation.

Licensed sports games with annual releases do not typically offer much that’s new or improved for their fans. The demands of a 12-month release cycle mean that very few big ideas make it into code. Luke Plunkett, writing at Aftermath about the major (American) football, basketball, and soccer franchises, notes that, aside from an alarming number of microtransactions and gambling-adjacent “card” mechanics, “not much has changed across all four games” in a decade’s time.

Even year-on-year fans are taking notice, in measurable ways. Electronic Arts’ stock price took a 15 percent dip in late January, largely due to soft FC 25 sales. Players “bemoaned the lack of new features and innovation, including in-game physics and goal-scoring mechanisms,” analysts said at the time, according to Reuters. Pick any given year, and you can find reactions to annual sports releases that range from “It is technically better but not by much” to “The major new things are virtual currency purchases and Jake from State Farm.”

So it is that eFootball 2022, one of the most broken games to ever be released by a brand-name publisher, might be considered more tragedy than farce. The series, originally an alternative to EA’s dominant FIFA brand under the name Pro Evolution Soccer (or PES), has since evened out somewhat. Amid the many chances to laugh at warped faces and PS1 crowds, there was a sense of a missed opportunity for real competition in a rigid market.

Football Manager is seemingly competing with its own legacy and making the tough decision to ask its fans to wait out a year rather than rush out an obligatory, flawed title. It’s one of the more hopeful game cancellations to come around in some time.

Football Manager 25 canceled in a refreshing show of concern for quality Read More »

Donkey Kong’s famed kill screen has been cleared for the first time

gaming / Rejus Almole / February 7, 2025

A short emulator-aided demonstration of how the broken ladder glitch works (not shown: the dozens of frame-perfect inputs needed to pull it off).

Better to be lucky than to be good

While players have theorized about using the broken ladder glitch to pass the kill screen for years, it initially seemed like even this glitched shortcut was too slow for the short kill screen timer. Yet when Kosmic attempted the same trick using his own emulator-assisted setup recently, he says he was able to complete the level on his first try. What gives?

As it turns out, Kosmic was the beneficiary of some significant luck. Basically, every time Donkey Kong throws a barrel, there is a 1 in 32 chance that he will wait an extra half second or so before throwing the next barrel (this random process is explained in way too much detail in this Pastebin). Since the game’s bonus timer only ticks down when Donkey Kong actually throws a barrel, the semi-rare delay can give Mario the crucial extra frames he needs to reach the top of the kill screen using the broken ladder glitch.

Funnily enough, this randomized barrel-throwing delay can theoretically repeat indefinitely, provided the game’s randomizer picks the same lucky 1-in-32 sequence over and over again. If Donkey Kong decides to delay his barrel throw about 19 times in a row, Mario would actually be able to complete the kill screen normally, without the broken ladder glitch (and without facing many barrels, even). Of course, the chances of that happening on unmodified arcade hardware are nearly 1 in 40 octillion (1 in 32^19, to be precise), so don’t count on encountering it in the wild any time soon.

Mario dies on level 22-6, which Kosmic now considers the “true” *Donkey Kong* kill screen. Credit: Kosmic

With the ladder glitch, though, Kosmic’s emulator-assisted run needed significantly less luck to pass the kill screen at 22-1. He was even able to push the game past the next four stages (including previously unseen spring and pie factory screens) to reach level 22-6.

Kosmic calls that stage the game’s true kill screen, as there’s currently no known way for Mario to remove all eight rivets quickly enough to overcome the glitch-shortened timer, even with emulator assistance. Then again, for decades, players assumed there was no way to complete level 22-1, either. Maybe someone will figure out a clever method for beating this new kill screen with 40 more years of sustained effort.

Donkey Kong’s famed kill screen has been cleared for the first time Read More »

On the Meta and DeepMind Safety Frameworks

deepmind / Rejus Almole / February 7, 2025

This week we got a revision of DeepMind’s safety framework, and the first version of Meta’s framework. This post covers both of them.

Here are links for previous coverage of: DeepMind’s Framework 1.0, OpenAI’s Framework and Anthropic’s Framework.

Since there is a law saying no two companies can call these documents by the same name, Meta is here to offer us its Frontier AI Framework, explaining how Meta is going to keep us safe while deploying frontier AI systems.

I will say up front, if it sounds like I’m not giving Meta the benefit of the doubt here, it’s because I am absolutely not giving Meta the benefit of the doubt here. I see no reason to believe otherwise. Notice there is no section here on governance, at all.

I will also say up front it is better to have any policy at all, that lays out their intentions and allows us to debate what to do about it, than to say nothing. I am glad that rather than keeping their mouths shut and being thought of as reckless fools, they have opened their mouths and removed all doubt.

Even if their actual policy is, in effect, remarkably close to this:

The other good news is that they are looking uniquely at catastrophic outcomes, although they are treating this as a set of specific failure modes, although they will periodically brainstorm to try and think of new ones via hosting workshops for experts.

Meta: Our Framework is structured around a set of catastrophic outcomes. We have used threat modelling to develop threat scenarios pertaining to each of our catastrophic outcomes. We have identified the key capabilities that would enable the threat actor to realize a threat scenario. We have taken into account both state and non-state actors, and our threat scenarios distinguish between high- or low-skill actors.

If there exists another AI model that could cause the same problem, then Meta considers the risk to not be relevant. It only counts ‘unique’ risks, which makes it easy to say ‘but they also have this problem’ and disregard an issue.

I especially worry that Meta will point to a potential risk in a competitor’s closed source system, and then use that as justification to release a similar model as open, despite this action creating unique risks.

Another worry is that this may exclude things that are not directly catastrophic, but that lead to future catastrophic risks, such as acceleration of AI R&D or persuasion risks (which Google also doesn’t consider). Those two sections of other SSPs? They’re not here. At all. Nor are radiological or nuclear threats. They don’t care.

You’re laughing. They’re trying to create recursive self-improvement, and you’re laughing.

But yes, they do make the commitment to stop development if they can’t meet the guidelines.

We define our thresholds based on the extent to which frontier AI would uniquely enable the execution of any of the threat scenarios we have identified as being potentially sufficient to produce a catastrophic outcome. If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1.

Our high and moderate risk thresholds are defined in terms of the level of uplift a model provides towards realising a threat scenario.

2.1.1 first has Meta identify a ‘reference class’ for a model, to use throughout development. This makes sense, since you want to treat potential frontier-pushing models very differently from others.

2.1.2 says they will ‘conduct a risk assessment’ but does not commit them to much of anything, only that it involve ‘external experts and company leaders from various disciplines’ and involve a safety and performance evaluation. They push their mitigation strategy to section 4.

2.1.3 They will then assess the risks and decide whether to release. Well, duh. Except that other RSPs/SSPs explain the decision criteria here. Meta doesn’t.

2.2 They argue transparency is an advantage here, rather than open weights obviously making the job far harder – you can argue it has compensating benefits but open weights make release irreversible and take away many potential defenses and mitigations. It is true that you get better evaluations post facto, once it is released for others to examine, but that largely takes the form of seeing if things go wrong.

3.1 Describes an ‘outcomes-led’ approach. What outcomes? This refers to a set of outcomes they seek to prevent. Then thresholds for not releasing are based on those particular outcomes, and they reserve the right to add to or subtract that list at will with no fixed procedure.

The disdain here for ‘theoretical risks’ is palpable. It seems if the result isn’t fully proximate, it doesn’t count, despite such releases being irreversible, and many of these ‘theoretical’ risks being rather obviously real and the biggest dangers.

An outcomes-led approach also enables prioritization. This systematic approach will allow us to identify the most urgent catastrophic outcomes – i.e., cybersecurity and chemical and biological weapons risks – and focus our efforts on avoiding them rather than spreading efforts across a wide range of theoretical risks from particular capabilities that may not plausibly be presented by the technology we are actually building.

The whole idea of 3.2’s theme of ‘threat modeling’ and an ‘outcomes-led approach’ is a way of saying that if you can’t draw a direct proximate link to the specific catastrophic harm, then once the rockets go up who cares where they come down, that’s not their department.

So in order for a threat to count, it has to both:

Be a specific concrete threat you can fully model.
Be unique, you can show it can’t be modeled any other way, either by any other AI system, or by achieving the same ends via any other route.

Most threats thus can either be dismissed as too theoretical and silly, or too concrete and therefore doable by other means.

It is important to note that the pathway to realise a catastrophic outcome is often extremely complex, involving numerous external elements beyond the frontier AI model. Our threat scenarios describe an essential part of the end-to-end pathway. By testing whether our model can uniquely enable a threat scenario, we’re testing whether it uniquely enables that essential part of the pathway.

Thus, it doesn’t matter how much easier you make something – it as to be something that wasn’t otherwise possible, and then they will check to be sure the threat is currently realizable:

This would also trigger a new threat modelling exercise to develop additional threat scenarios along the causal pathway so that we can ascertain whether the catastrophic outcome is indeed realizable, or whether there are still barriers to realising the catastrophic outcome (see Section 5.1 for more detail).

But the whole point of Meta’s plan is to put the model out there where you can’t take it back. So if there is still an ‘additional barrier,’ what are you going to do if that barrier is removed in the future? You need to plan for what barriers will remain in place, not what barriers exist now.

Here they summarize all the different ways they plan on dismissing threats:

Contrast this with DeepMind’s 2.0 framework, also released this week, which says:

DeepMind: Note that we have selected our CCLs (critical capability levels) to be conservative; it is not clear to what extent CLLs might translate to harm in real-world contexts.

From the old 1.0 DeepMind framework, notice how they think you’re supposed to mitigate to a level substantially below where risk lies (the graph is not in 2.0 but the spirit clearly remains):

Anthropic and OpenAI’s frameworks also claim to attempt to follow this principle.

DeepMind is doing the right thing here. Meta is doing a very different thing.

Here’s their chart of what they’d actually do.

Okay, that’s standard enough. ‘Moderate’ risks are acceptable. ‘High’ risks are not until you reduce them to Moderate. Critical means panic, but even then the ‘measures’ are essentially ‘ensure this is concretely able to happen now, cause otherwise whatever.’ I expect in practice ‘realizable’ here means ‘we can prove it is realizable and more or less do it’ not ‘it seems plausible that if we give this thing to the whole internet that someone could do it.’

I sense a core conflict between the High criteria here – ‘provides significant uplift towards’ – and their other talk, which is that the threat has to be realizable if and only if the model is present. Those are very different standards. Which is it?

If they mean what they say in High here, with a reasonable working definition of ‘significant uplift towards execution,’ then that’s a very different, actually reasonable level of enabling to consider not acceptable. Or would that then get disregarded?

I also do appreciate that risk is always at least Moderate. No pretending it’s Low.

Now we get to the actual threat scenarios.

I am not an expert in this area, so I’m not sure if this is complete, but this seems like a good faith effort to cover cybersecurity issues.

This is only chemical and biological, not full CBRN. Within that narrow bound, this seems both fully generic and fully functional. Should be fine as far as it goes.

Section 4 handles implementation. They check ‘periodically’ during development, note that other RSPs defined what compute thresholds triggered this and Meta doesn’t. They’ll prepare a robust evaluation environment. They’ll check if capabilities are good enough to bother checking for threats. If it’s worth checking, then they’ll check for actual threats.

I found this part pleasantly surprising:

Our evaluations are designed to account for the deployment context of the model. This includes assessing whether risks will remain within defined thresholds once a model is deployed or released using the target release approach.

For example, to help ensure that we are appropriately assessing the risk, we prepare the asset – the version of the model that we will test – in a way that seeks to account for the tools and scaffolding in the current ecosystem that a particular threat actor might seek to leverage to enhance the model’s capabilities.

The default ‘target release approach’ here is presumably open weights. It is great to know they understand they need to evaluate their model in that context, knowing all the ways in which their defenses won’t work, and all the ways users can use scaffolding and fine-tuning and everything else, over time, and how there will be nothing Meta can do about any of it.

What they say here, one must note, is not good enough. You don’t get to assume that only existing tools and scaffolding exist indefinitely, if you are making an irreversible decision. You also have to include reasonable expectations for future tools and scaffolding, and also account for fine-tuning and the removal of mitigations.

We also account for enabling capabilities, such as automated AI R&D, that might increase the potential for enhancements to model capabilities.

Great! But that’s not on the catastrophic outcomes list, and you say you only care about catastrophic outcomes.

So basically, this is saying that if Llama 5 were to enable automated R&D, that in and of itself is nothing to worry about, but if it then turned itself into Llama 6 into Llama 7 (computer, revert to Llama 6!) then we have to take that into account when considering there might be a cyberattack?

If automated AI R&D is at the levels where you’re taking this into account, um…

And of course, here’s some language that Meta included:

Even for tangible outcomes, where it might be possible to assign a dollar value in revenue generation, or percentage increase in productivity, there is often an element of subjective judgement about the extent to which these economic benefits are important to society.

I mean, who can really say how invaluable it is for people to connect with each other.

While it is impossible to eliminate subjectivity, we believe that it is important to consider the benefits of the technology we develop. This helps us ensure that we are meeting our goal of delivering those benefits to our community. It also drives us to focus on approaches that adequately mitigate any significant risks that we identify without also eliminating the benefits we hoped to deliver in the first place.

Yes, there’s catastrophic risk, but Just Think of the Potential.

Of course, yes, it is ultimately a game of costs versus benefits, risks versus rewards. I am not saying that the correct number of expected catastrophic risks is zero, or even that the correct probability of existential risk is zero or epsilon. I get it.

But the whole point of these frameworks is to define in advance what precautions you will take, and what things you won’t do, exactly because when the time comes, it will be easy to justify pushing forward when you shouldn’t, and to define clear principles. If the principle is ‘as long as I see enough upside I do what I want’? I expect in the trenches this means ‘we will do whatever we want, for our own interests.’

That doesn’t mean Meta will do zero safety testing. It doesn’t mean that, if the model was very obviously super dangerous, they would release it anyway, I don’t think these people are suicidal or worse want to go bankrupt. But you don’t need a document like this if it ultimately only says ‘don’t do things that at the time seem deeply stupid.’

Or at least, I kind of hope you were planning on not doing that anyway?

Similarly, if you wanted to assure others and tie your hands against pressures, you would have a procedure required to modify the framework, at least if you were going to make it more permissive. I don’t see one of those. Again, they can do what they want.

They have a permit.

It says ‘lol, we’re Meta.’

Good. I appreciate the candor, including the complete disregard for potential recursive self-improvement risks, as well as nuclear, radiological or persuasion risks.

So what are we going to do about all this?

Previously we had version 1.0, now we have version 2.0. DeepMinders are excited.

This is in several ways an improvement over version 1.0. It is more detailed, it introduces deceptive alignment as a threat model, it has sections on governance and disclosures, and it fixes a few other things. It maps capability levels to mitigation levels, which was missing previously. There are also some smaller steps backwards.

Mostly I’ll go over the whole thing, since I expect almost all readers don’t remember the details from my coverage of the first version.

The framework continues to be built around ‘Critical Capability Levels.’

We describe two sets of CCLs: misuse CCLs that can indicate heightened risk of severe harm from misuse if not addressed, and deceptive alignment CCLs that can indicate heightened risk of deceptive alignment-related events if not addressed.

The emphasis on deceptive alignment is entirely new.

For misuse risk, we dene CCLs in high-risk domains where, based on early research, we believe risks of severe harm may be most likely to arise from future models:

● CBRN: Risks of models assisting in the development, preparation, and/or execution of a chemical, biological, radiological, or nuclear (“CBRN”) attack.

● Cyber: Risks of models assisting in the development, preparation, and/or execution of a cyber attack.

● Machine Learning R&D: Risks of the misuse of models capable of accelerating the rate of AI progress to potentially destabilizing levels, the result of which could be the unsafe attainment or proliferation of other powerful AI models. Capabilities in this area are under active research, and in the longer term may exacerbate frontier AI risks—including in other risk domains—if insuciently managed.

What are noticeably missing from this list are two things.

Persuasion risks, which Google and Meta don’t consider. I think this should be considered, although I do think there is a reasonable argument against this.
Autonomy. In Version 1.0, Google considered the risk of expanding effective capacity in the world fully autonomously, since this could potentially then expand quite a bit, and then use that capacity (via instrumental convergence) for any number of goals. I am sad to see this go. I am also sad to not see an explanation for why this was removed.

For both of these but for autonomy in particular, I ask:

Is it plausible that a model could cross the previous autonomy threshold without crossing the others here?
If that were the case, should we be worried about that?

The answer to the first question seems to clearly be yes.

The answer to the second question is, I do think I would want to at least test for and be aware of crossing this threshold, and then think hard about the implications.

I am however very happy to see the new addition, which I haven’t seen before in a similar document:

For deceptive alignment risk, the initial approach focuses on detecting when models might develop a baseline instrumental reasoning ability at which they have the potential to undermine human control, assuming no additional mitigations were applied.

The two instrumental reasoning CCLs thus focus on delineating when such capability becomes present, and subsequently when the initial mitigation for this capability—automated monitoring—is no longer adequate.

This is exactly the kind of thinking that Meta repeatedly disavows and treats with disdain in its framework. If AIs undermine human control, if they pretend to be aligned when they are not or otherwise steer the training process to their own ends, then we have a serious problem. Meta’s response is, well, how does that lead to a specific catastrophic outcome? To which I say, that is obviously the wrong question.

The basic plan is same as it always is and has to be. Run periodic evaluations as models gain capabilities, check for dangerous capabilities, apply mitigations. They take both security mitigations to protect the weights, and deployment mitigations to counter potential misuse.

I notice this framework doesn’t by default work well for the AI R&D case. They agree, and note as an issue for future work, that further classes of mitigation are likely needed.

Here is the deployment mitigation plan 2.0:

Development and assessment of mitigations: safeguards and an accompanying safety case are developed by iterating on the following:

Developing and improving a suite of safeguards targeting the capability. This includes, as appropriate, safety fine-tuning, misuse filtering and detection, and response protocols.

Assessing the robustness of these mitigations against the risk posed through assurance evaluations and threat modeling research. The assessment takes the form of a safety case, taking into account factors such as the likelihood and consequences of misuse.

Pre-deployment review of safety case: general availability deployment of a model takes place only after the appropriate corporate governance body determines the safety case regarding each CCL the model has reached to be adequate.

Post-deployment review of safety case: the safety case will be updated through red-teaming and revisions to our threat models. The safeguards for the model may be updated as well to ensure continued adequacy.

This replaces the old plan, which was to go in steps from mitigations targeting the capability to a red team validated safety case to prevention of access.

I notice that the red-teaming here is only mentioned in post-deployment. It should be in pre-deployment as well, no, as part of either the safety case or review of the safety case, or both? But the core idea is the same, you need to do mitigation and then make a safety case. So essentially this is universalizing the old Security Level 2.

Their logic is that if prevention of access (the old Security Level 3) is necessary, then that is a natural extension. I get why they’re saying that, but I’d like to keep this explicit and around anyway – you can almost never be too explicit in situations like this. Clarity almost never hurts, and if thinking changes on this I’d like to track it.

The definition of mitigations here is still ‘take adequate mitigations,’ which does not tell us much. Google has wide latitude, as long as it makes its safety case.

Here are the mitigations:

As I said last time: One of these things is not like the others.

The CBRN and Cyber capability thresholds seem set rather high, but also I expect mitigations to be involved well below these thresholds. I’d also want clarification, cost reductions compared to what case? Presumably this is versus not having general LLM access. If it’s versus the next best available LLM, then an order of magnitude or ‘drastic reduction’ is too large a threshold, since one could easily approach that in distinct steps. In general I worry these new threshold descriptions are substantially higher than the previous ones.

We now move on to machine learning R&D. For uplift 1, they move from 3x rates for algorithmic progress to 2x overall progress for Machine Learning Uplift, and specifying 2024 rates as the baseline. I think that’s a better way to define the threshold to match what we actually care about, so I like this change.

Then there’s Machine Learning R&D autonomy level 1. This isn’t quite as ambitious as the old version’s Machine Learning R&D 2, which was essentially ‘the singularity arrives next Tuesday.’ This is at comparable costs rather than a drastic reduction in costs.

My guess is this still probably does mean the singularity arrives not too long from next Tuesday, given how much compute is available to Google, although this isn’t obvious. Even if it’s currently only ‘about as efficient’ as humans would be, that phase won’t last long. If you had a model that was anywhere near being able to do this, and no one else had released one that was better, it seems rather crazy to consider releasing it.

This graph, which I also showed in the Meta section, is from the 1.0 version of the framework. It’s not in 2.0, but it still does seem to be adhered to in spirit:

The security levels are listed in terms of the RAND security protocols.

Since the document calls for levels 2,3 and 4, here’s a Claude summary of levels 2-4:

SL2 – Professional Protection

Defends against professional but opportunistic hackers

Focuses on comprehensive implementation of security fundamentals and industry best practices

Emphasizes closing common security gaps and vulnerabilities

Key aspects include email security, password policies, and multi-factor authentication

SL3 – Advanced Protection

Defends against cybercrime syndicates and insider threats

Aggressively reduces attack surface

Implements defense-in-depth strategies

Focuses heavily on restricting and monitoring access to weights

Requires two independent security layers

SL4 – State-Level Protection

Defends against standard operations by leading cyber-capable institutions (like foreign intelligence agencies)

Requires significant compromises on productivity and convenience

Implements hardware-level security measures

Uses confidential computing

Requires four independent security layers

If you’re Google and you’re not at least at SL2 for every model worth deploying, why the hell not? I think this should be the minimum even with no dangerous capabilities.

There is a reason why even 1 level of Version 1.0’s protocol was called similar to SL3. I don’t buy the arguments in Table 1 that cybersecurity improvements are likely to diminish necessary security levels. I think it’s rather absurd to not move to at least SL3 if any critical capability levels are found.

The question is at what point everything needs to be SL4, and how much this should depend upon coordination with the rest of industry. My guess i you only need it for Machine Learning R&D, but there I do think you need something like that, even though implementation will be expensive. For ‘uplift 1’ SL4 seems roughly reasonable to me, but for ‘autonomy 1’ I think even SL4 is inadequate.

This is a longstanding disagreement, as I found their previous version insufficient.

For the deceptive alignment case, ‘automated monitoring’ can be applied as the initial mitigation, as noted above, but this is a very temporary solution if it’s not part of defense-in-depth as a backup plan. The ‘automated monitoring only’ phase likely wouldn’t last long, and risks making things worse via teaching the system how to evade detection.

I do find the answer of ‘future work, we don’t know how’ for mitigation of Instrumental Reasoning Level 2 to be acceptable. It’s a hard problem. Better to admit you don’t know, so long as this also means that if you’re in this situation before you figure it out, you’re committing to halting and catching fire until you figure it out.

I also appreciated that there is now at least a brief a section on governance, which was not in 1.0, detailing who will review implementation of the framework (the AGI Safety Council) and noting several other relevant councils. I would like to see more of a formal process but this is a start.

I also appreciated the intention to share information with ‘appropriate government authorities’ if the risks involved are triggered, even if they are then mitigated. They don’t commit to telling anyone else, but will consider it.

Another great note was saying ‘everyone needs to do this.’ Safety of models is a public good, and somewhat of a Stag Hunt, where we all win if everyone who is at the frontier cooperates. If you can outrun the bear, but the bear still eats someone else’s model, in this case you are not safe.

However, there were also a few steps back. The specific 6x compute or 3 month threshold was removed for a more flexible rule. I realize that 6x was stingy already and a hard-and-fast rule will sometimes be foolish, but I believe we do need hard commitments in such places at current trust levels.

So we have steps forward in (some details here not mentioned above):

Deceptive alignment as a threat model.
Capability levels are mapped to mitigation levels.
Governance.
Disclosures.
Using the RAND protocol levels.
Adjustment of threshold details.
Centralizing role of safety cases.
Changed ‘pass condition’ to ‘alert threshold’ which seems better.
Emphasis on confidential computing.
Explicit calls for industry-wide cooperation, willingness to coordinate.
Explicit intention of sharing results with government if thresholds are triggered.

And we have a few steps back:

Removal of autonomy threshold (I will trade this for deceptive alignment but would prefer to have both, and am still sad about missing persuasion.)
Removal of the 6x compute and 3 month thresholds for in-training testing.
Reduced effective security requirements in some places.
Less explicitness about shutting down access if necessary.

Overall, it’s good news. That’s definitely a step forward, and it’s great to see DeepMind publishing revisions and continuing to work on the document.

One thing missing the current wave of safety frameworks is robust risk governance. The Centre for Long-Term Resilience argues, in my opinion compellingly, that these documents need risk governance to serve their full intended purpose.

CLTR: Frontier safety frameworks help AI companies manage extreme risks, but gaps in effective risk governance remain. Ahead of the Paris AI Action Summit next week, our new report outlines key recommendations on how to bridge this gap.

Drawing on the best practice 3 lines framework widely used in other safety critical industries like nuclear, aviation and healthcare, effective risk governance includes:

Decision making ownership (first-line)

Advisory oversight (second-line)

Assurance (third line)

Board-level oversight

Culture

External transparency

Our analysis found that evidence for effective risk governance across currently published frontier AI safety frameworks is low overall.

While some aspects of risk governance are starting to be applied, the overall state of risk governance implementation in safety frameworks appears to be low, across all companies.

This increases the chance of harmful models being released because of aspects like unclear risk ownership, escalation pathways and go/no-go decisions about when to release models.

By using the recommendations outlined in our report, overall effectiveness of safety frameworks can be improved by enhancing risk identification, assessment, and mitigation.

It is an excellent start to say that your policy has to say what you will do. You then need to ensure that the procedures are laid out so it actually happens. They consider the above an MVP of risk governance.

I notice that the MVP does not seem to be optimizing for being on the lower right of this graph? Ideally, you want to start with things that are valuable and easy.

Escalation procedures and go/no-go decisions seem to be properly identified as high value things that are relatively easy to do. I think if anything they are not placing enough emphasis on cultural aspects. I don’t trust any of these frameworks to do anything without a good culture backing them up.

DeepMind has improved its framework, but it has a long way to go. No one has what I would consider a sufficient framework yet, although I believe OpenAI and Anthropic’s attempts are farther along.

The spirit of the documents is key. None of these frameworks are worth much if those involved are looking only to obey the technical requirements. They’re not designed to make adversarial compliance work, if it was even possible. They only work if people genuinely want to be safe. That’s a place Anthropic has a huge edge.

Meta vastly improved its framework, in that it previously didn’t have one, and now the new version at least admits that they essentially don’t have one. That’s a big step. And of course, even if they did have a real framework, I would not expect them to abide by its spirit. I do expect them to abide by the spirit of this one, because the spirit of this one is to not care.

The good news is, now we can talk about all of that.

Discussion about this post

On the Meta and DeepMind Safety Frameworks Read More »

Nintendo patent explains Switch 2 Joy-Cons’ “mouse operation” mode

gaming, Joy-Cons, nintendo switch 2 / Rejus Almole / February 6, 2025

It’s been a month since we first heard rumors that the Switch 2’s new Joy-Cons could be slid across a flat surface to function like a computer mouse. Now, a newly published patent filed by Nintendo seems to confirm that feature and describes how it will work.

The international patent was filed with the World Intellectual Property Organization in January 2023, but it was only published on WIPO’s website on Thursday. The Japanese-language patent—whose illustrations match what we’ve seen of Switch 2 Joy-Con precisely—features an English abstract describing “a sensor for mouse operation” that can “detect reflected light from a detected surface, the light changing by moving over the detected surface…” much like any number of optical computer mice. Schematic drawings in the patent show how the light source and light sensor are squeezed inside the Joy-Con, with a built-in lens for directing the light to and from each.

A schematic diagram of the Switch 2’s Joy-Con light sensor Credit: Nintendo / WIPO

A machine translation of the full text of the patent describes the controller as “a novel input device that can be used as a mouse and other than a mouse.” In mouse mode, as described in the patent, the user cradles the outer edge of the controller with their palm and places the inner edge “on, for example, a desk or the like.”

In this configuration, the user’s thumb can still access the analog stick (which is now pointing horizontally) while the index and middle fingers are positioned so the two shoulder buttons “can be operated as, for example, a right-click button and a left-click button,” according to the patent. The patent describes this configuration as “easy to hold” or “easy to grip.” It also goes to great lengths to explain how the shoulder buttons wrap around the curved top corner of the controller and thus are “easy to press” by pushing either downward or closer to horizontally with a finger.

Nintendo patent explains Switch 2 Joy-Cons’ “mouse operation” mode Read More »

AI #102: Made in America

America / Rejus Almole / February 6, 2025

I remember that week I used r1 a lot, and everyone was obsessed with DeepSeek.

They earned it. DeepSeek cooked, r1 is an excellent model. Seeing the Chain of Thought was revolutionary. We all learned a lot.

It’s still #1 in the app store, there are still hysterical misinformed NYT op-eds and and calls for insane reactions in all directions and plenty of jingoism to go around, largely based on that highly misleading $6 millon cost number for DeepSeek’s v3, and a misunderstanding of how AI capability curves move over time.

But like the tariff threats that’s now so yesterday now, for those of us that live in the unevenly distributed future.

All my reasoning model needs go through o3-mini-high, and Google’s fully unleashed Flash Thinking for free. Everyone is exploring OpenAI’s Deep Research, even in its early form, and I finally have an entity capable of writing faster than I do.

And, as always, so much more, even if we stick to AI and stay in our lane.

Buckle up. It’s probably not going to get less crazy from here.

From this week: o3-mini Early Days and the OpenAI AMA, We’re in Deep Research and The Risk of Gradual Disempowerment from AI.

Language Models Offer Mundane Utility. The new coding language is vibes.
o1-Pro Offers Mundane Utility. Tyler Cowen urges you to pay up already.
We’re in Deep Research. Further reviews, mostly highly positive.
Language Models Don’t Offer Mundane Utility. Do you need to bootstrap thyself?
Model Decision Tree. Sully offers his automated use version.
Huh, Upgrades. Gemini goes fully live with its 2.0 offerings.
Bot Versus Bot. Wouldn’t you prefer a good game of chess?
The OpenAI Unintended Guidelines. Nothing I’m conscious of to see here.
Peter Wildeford on DeepSeek. A clear explanation of why we all got carried away.
Our Price Cheap. What did DeepSeek’s v3 and r1 actually cost?
Otherwise Seeking Deeply. Various other DeepSeek news, a confused NYT op-ed.
Smooth Operator. Not there yet. Keep practicing.
Have You Tried Not Building An Agent? I tried really hard.
Deepfaketown and Botpocalypse Soon. Free Google AI phone calls, IG AI chats.
They Took Our Jobs. It’s going to get rough out there.
The Art of the Jailbreak. Think less.
Get Involved. Anthropic offers a universal jailbreak competition.
Introducing. DeepWriterAI.
In Other AI News. Never mind that Google pledge to not use AI for weapons.
Theory of the Firm. What would a fully automated AI firm look like?
Quiet Speculations. Is the product layer where it is at? What’s coming next?
The Quest for Sane Regulations. We are very much not having a normal one.
The Week in Audio. Dario Amodei, Dylan Patel and more.
Rhetorical Innovation. Only attack those putting us at risk when they deserve it.
Aligning a Smarter Than Human Intelligence is Difficult. If you can be fooled.
The Alignment Faking Analysis Continues. Follow-ups to the original finding.
Masayoshi Son Follows Own Advice. Protein is very important.
People Are Worried About AI Killing Everyone. The pope and the patriarch.
You Are Not Ready. Neither is the index measuring this, but it’s a start.
Other People Are Not As Worried About AI Killing Everyone. A word, please.
The Lighter Side. At long last.

You can subvert OpenAI’s geolocation check with a VPN, but of course never do that.

Help you be a better historian, generating interpretations, analyzing documents. This is a very different modality than the average person using AI to ask questions, or for trying to learn known history.

Diagnose your child’s teeth problems.

Figure out who will be mad about your tweets. Next time, we ask in advance!

GFodor: o3-mini-high is an excellent “buddy” for reading technical papers and asking questions and diving into areas of misunderstanding or confusion. Latency/IQ tradeoff is just right. Putting this into a great UX would be an amazing product.

Right now I’m suffering through copy pasting and typing and stuff, but having a UI where I could have a PDF on the left, highlight sections and spawn chats off of them on the right, and go back to the chat trees, along with voice input to ask questions, would be great.

(I *don’twant voice output, just voice input. Seems like few are working on that modality. Asking good questions seems easier in many cases to happen via voice, with the LLM then having the ability to write prose and latex to explain the answer).

Ryan: give me 5 hours. ill send a link.

I’m not ready to put my API key into a random website, but that’s how AI should work these days. You don’t like the UI, build a new one. I don’t want voice input myself, but highlighting and autoloading and the rest all sound cool.

Indeed, that was the killer app for which I bought a Daylight computer. I’ll report back when it finally arrives.

Meanwhile the actual o3-mini-high interface doesn’t even let you to upload the PDF.

Consensus on coding for now seems to be leaning in the direction that you use Claude Sonnet 3.6 for a majority of ordinary tasks, o1-pro or o3-mini-high for harder ones and one shots, but reasonable people disagree.

Karpathy has mostly moved on fully to “vibe coding,” it seems.

Andrej Karpathy: There’s a new kind of coding I call “vibe coding”, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists. It’s possible because the LLMs (e.g. Cursor Composer w Sonnet) are getting too good.

Also I just talk to Composer with SuperWhisper so I barely even touch the keyboard. I ask for the dumbest things like “decrease the padding on the sidebar by half” because I’m too lazy to find it.

I “Accept All” always, I don’t read the diffs anymore.

When I get error messages I just copy paste them in with no comment, usually that fixes it. The code grows beyond my usual comprehension, I’d have to really read through it for a while. Sometimes the LLMs can’t fix a bug so I just work around it or ask for random changes until it goes away. It’s not too bad for throwaway weekend projects, but still quite amusing. I’m building a project or webapp, but it’s not really coding – I just see stuff, say stuff, run stuff, and copy paste stuff, and it mostly works.

Lex Fridman: YOLO 🤣

How long before the entirety of human society runs on systems built via vibe coding. No one knows how it works. It’s just chatbots all the way down 🤣

PS: I’m currently like a 3 on the 1 to 10 slider from non-vibe to vibe coding. Need to try 10 or 11.

Sully: realizing something after vibe coding: defaults matter way more than i thought

when i use supabase/shadcn/popular oss: claude + cursor just 1 shots everything without me paying attention

trying a new new, less known lib?

rarely works, composer sucks, etc

Based on my experience with cursor I have so many questions on how that can actually work out, then again maybe I should just be doing more projects and webapps.

I do think Sully is spot on about vibe coding rewarding doing the same things everyone else is doing. The AI will constantly try to do default things, and draw upon its default knowledge base. If that means success, great. If not, suddenly you have to do actual work. No one wants that.

Sully interpretes features like Canvas and Deep Research as indicating the app layer is ‘where the value is going to be created.’ As always the question is who can provide the unique step in the value chain, capture the revenue, own the customer and so on, customers want the product that is useful to them as they always do, and you can think of ‘the value’ as coming from whichever part of the chain depending on perspective.

It is true that for many tasks, we’ve past the point where ‘enough intelligence’ is the main problem at hand. So getting that intelligence into the right package and UI is going to drive customer behavior more than being marginally smarter… except in the places where you need all the intelligence you can get.

Anthropic reminds us of their Developer Console for all your prompting needs, they say they’re working on adapting it for reasoning models.

Nate Silver offers practical advice in preparing for the AI future. He recommends staying on top of things, treating the future as unpredictable, and to focus on building the best complements to intelligence, such as personal skills.

New York Times op-ed pointing out once again that doctors with access to AI can underperform the AI alone, if the doctor is insufficiently deferential to the AI. Everyone involved here is way too surprised by this result.

Daniel Litt explains why o3-mini-high gave him wrong answers to a bunch of math questions but they were decidedly better wrong answers than he’d gotten from previous models, and far more useful.

Tyler Cowen gets more explicit about what o1 Pro offers us.

I’m quoting this one in full.

Tyler Cowen: Often I don’t write particular posts because I feel it is obvious to everybody. Yet it rarely is.

So here is my post on o1 pro, soon to be followed by o3 pro, and Deep Research is being distributed, which uses elements of o3. (So far it is amazing, btw.)

o1 pro is the smartest publicly issued knowledge entity the human race has created (aside from Deep Research!). Adam Brown, who does physics at a world class level, put it well in his recent podcast with Dwarkesh. Adam said that if he had a question about something, the best answer he would get is from calling up one of a handful of world experts on the topic. The second best answer he would get is from asking the best AI models.

Except, at least for the moment, you don’t need to make that plural. There is a single best model, at least when it comes to tough questions (it is more disputable which model is the best and most creative writer or poet).

I find it very difficult to ask o1 pro an economics question it cannot answer. I can do it, but typically I have to get very artificial. It can answer, and answer well, any question I might normally pose in the course of typical inquiry and pondering. As Adam indicated, I think only a relatively small number of humans in the world can give better answers to what I want to know.

In an economics test, or any other kind of naturally occurring knowledge test I can think of, it would beat all of you (and me).

Its rate of hallucination is far below what you are used to from other LLMs.

Yes, it does cost $200 a month. It is worth that sum to converse with the smartest entity yet devised. I use it every day, many times. I don’t mind that it takes some time to answer my questions, because I have plenty to do in the meantime.

I also would add that if you are not familiar with o1 pro, your observations about the shortcomings of AI models should be discounted rather severely. And o3 pro is due soon, presumably it will be better yet.

The reality of all this will disrupt many plans, most of them not directly in the sphere of AI proper. And thus the world wishes to remain in denial. It amazes me that this is not the front page story every day, and it amazes me how many people see no need to shell out $200 and try it for a month, or more.

Economics questions in the Tyler Cowen style are like complex coding questions, in the wheelhouse of what o1 pro does well. I don’t know that I would extend this to ‘all tough questions,’ and for many purposes inability to browse the web is a serious weakness, which of course Deep Research fully solves.

Whereas they types of questions I tend to be curious about seem to have been a much worse fit, so far, for what reasoning models can do. They’re still super useful, but ‘the smartest entity yet devised’ does not, in my contexts, yet seem correct.

Tyler Cowen sees OpenAI’s Deep Research (DR), and is super impressed with the only issue being lack of originality. He is going to use its explanation of Ricardo in his history of economics class, straight up, over human sources. He finds the level of accuracy and clarity stunning, on most any topic. He says ‘it does not seem to make errors.’

I wonder how much of his positive experience is his selection of topics, how much is his good prompting, how much is perspective and how much is luck. Or something else? Lots of others report plenty of hallucinations. Some more theories here at the end of this section.

Ruben Bloom throws DR at his wife’s cancer from back in 2020, finds it wouldn’t have found anything new but would have saved him substantial amounts of time, even on net after having to read all the output.

Nick Cammarata asks Deep Research for a five page paper about whether he should buy one of the cookies the gym is selling, the theory being it could supercharge his workout. The answer was that it’s net negative to eat the cookie, but much less negative than working out is positive either way, so if it’s motivating go for it.

Is it already happening? I take no position on whether this particular case is real, but this class of thing is about to be very real.

Janus: This seems fake. It’s not an unrealistic premise or anything, it just seems like badly written fake dialogue. Pure memetic regurgitation, no traces of a complex messy generating function behind it

Garvey: I don’t think he would lie to me. He’s a very good friend of mine.

Cosmic Vagrant: yeh my friend Jim also was fired in a similar situation today. He’s my greatest ever friend. A tremendous friend in fact.

Rodrigo Techador: No one has friends like you have. Everyone says you have the greatest friends ever. Just tremendous friends.

I mean, firing people to replace them with an AI research assistant, sure, but you’re saying you have friends?

Another thing that will happen is the AIs being the ones reading your paper.

Ethan Mollick: One thing academics should take away from Deep Research is that a substantial number of your readers in the future will likely be AI agents.

Is your paper available in an open repository? Are any charts and graphs described well in the text?

Probably worth considering these…

Spencer Schiff: Deep Research is good at reading charts and graphs (at least that’s what I heard).

Ethan Mollick: Look, your experience may vary, but asking OpenAI’s Deep Research about topics I am writing papers on has been incredibly fruitful. It is excellent at identifying promising threads & work in other fields, and does great work synthesizing theories & major trends in the literature.

A test of whether it might be useful is if you think there are valuable papers somewhere (even in related fields) that are non-paywalled (ResearchGate and arXiv are favorites of the model).

Also asking it to focus on high-quality academic work helps a lot.

Here’s the best bear case I’ve seen so far for the current version, from the comments, and it’s all very solvable practical problems.

Performative Bafflement:

I’d skip it, I found Pro / Deep Research to be mostly useless.

You can’t upload documents of any type. PDF, doc, docx, .txt, *nothing.*.

You can create “projects” and upload various bash scripts and python notebooks and whatever, and it’s pointless! It can’t even access or read those, either!

Literally the only way to interact or get feedback with anything is by manually copying and pasting text snippets into their crappy interface, and that runs out of context quickly.

It also can’t access Substack, Reddit, or any actually useful site that you may want to survey with an artificial mind.

It sucked at Pubmed literature search and review, too. Complete boondoggle, in my own opinion.

The natural response is ‘PB is using it wrong.’ You look for what an AI can do, not what it can’t do. So if DR can do [X-1] but not [X-2] or [Y], have it do [X-1]. In this case, PB’s request is for some very natural [X-2]s.

It is a serious problem to not have access to Reddit or Substack or related sources. Not being able to get to gated journals even when you have credentials for them is a big deal. And it’s really annoying and limiting to not have PDF uploads.

That does still leave a very large percentage of all human knowledge. It’s your choice what questions to ask. For now, ask the ones where these limitations aren’t an issue.

Or even the ones where they are an advantage?

Tyler Cowen gave perhaps the strongest endorsement so far of DR.

It does not seem like a coincidence that he is also someone who has strongly advocated for an epistemic strategy of, essentially, ignoring entirely sources like Substack and Reddit, in favor of more formal ones.

It also does not seem like a coincidence that Tyler Cowen is the fastest reader.

So you have someone who can read these 10-30 page reports quickly, glossing over all the slop, and who actively wants to exclude many of the sources the process excludes. And who simply wants more information to work with.

It makes perfect sense that he would love this. That still doesn’t explain the lack of hallucinations and errors he’s experiencing – if anything I’d expect him to spot more of them, since he knows so many facts.

But can it teach you how to use the LLM to diagnose your child’s teeth problems? PoliMath asserts that it cannot – that the reason Eigenrobot could use ChatGPT to help his child is because Eigenrobot learned enough critical thinking and domain knowledge, and that with AI sabotaging high school and college education people will learn these things less. We mentioned this last week too, and again I don’t know why AI couldn’t end up making it instead far easier to teach those things. Indeed, if you want to learn how to think, be curious alongside a reasoning model that shows its chain of thought, and think about thinking.

I offered mine this week, here’s Sully’s in the wake of o3-mini, he is often integrating into programs so he cares about different things.

Sully: o3-mini -> agents agents agents. finally most agents just work. great at coding (terrible design taste). incredibly fast, which makes it way more usable. 10/10 for structured outputs + json (makes a really great router). Reasoning shines vs claude/4o on nuanced tasks with json

3.5 sonnet -> still the “all round” winner (by small margin). generates great ui, fast, works really well. basically every ai product uses this because its a really good chatbot & can code webapps. downsides: tool calling + structured outputs is kinda bad. It’s also quite pricy vs others.

o1-pro: best at complex reasoning for code. slow as shit but very solves hard problems I can’t be asked to think about. i use this a lot when i have 30k-50k tokens of “dense” code.

gpt-4o: ?? Why use this over o3-mini.

r1 -> good, but I can’t find a decently priced us provider. otherwise would replace decent chunk of my o3-mini with it

gemini 2.0 -> great model but I don’t understand how this can be experimental for >6 weeks. (launches fully soon) I wanted to swap everything to do this but now I’m just using something else (o3-mini). I think its the best non reasoning model for everything minus coding.

[r1 is] too expensive for the quality o3-mini is better and cheaper, so no real reason to run r1 unless its cheaper imo (which no us provider has).

o1-pro > o3-mini high

tldr:

o3-mini =agents + structured outputs

claude = coding (still) + chatbots

o1-pro = > 50k confusing multi-file (10+) code requests

gpt-4o: dont use this

r1 -> really good for price if u can host urself

gemini 2.0 [regular not thinking]: everywhere you would use claude replace it with this (minus code)

It really is crazy the Claude Sonnet 3.6 is still in everyone’s mix despite all its limitations and how old it is now. It’s going to be interesting when Anthropic gets to its next cycle.

Gemini app now fully powered by Flash 2.0, didn’t realize it hadn’t been yet. They’re also offering Gemini 2.0 Flash Thinking for free on the app as well, how are our naming conventions this bad, yes I will take g2 at this point. And it now has Imagen 3 as well.

Gemini 2.0 Flash, 2.0 Flash-Lite and 2.0 Pro are now fully available to developers. Flash 2.0 is priced at $0.10/$0.40 per million.

The new 2.0 Pro version has 2M context window, ability to use Google search and code execution. They are also launching a Flash Thinking that can directly interact with YouTube, Search and Maps.

1-800-ChatGPT now lets you upload images and chat using voice messages, and they will soon let you link it up to your main account. Have fun, I guess.

Leon: Perfect timing, we are just about to publish TextArena. A collection of 57 text-based games (30 in the first release) including single-player, two-player and multi-player games. We tried keeping the interface similar to OpenAI gym, made it very easy to add new games, and created an online leaderboard (you can let your model compete online against other models and humans). There are still some kinks to fix up, but we are actively looking for collaborators 🙂

f you are interested check out https://textarena.ai, DM me or send an email to [email protected]. Next up, the plan is to use R1 style training to create a model with super-human soft-skills (i.e. theory of mind, persuasion, deception etc.)

I mean, great plan, explicitly going for superhuman persuasion and deception then straight to open source, I’m sure absolutely nothing could go wrong here.

Andrej Karpathy: I quite like the idea using games to evaluate LLMs against each other, instead of fixed evals. Playing against another intelligent entity self-balances and adapts difficulty, so each eval (/environment) is leveraged a lot more. There’s some early attempts around. Exciting area.

Noam Brown (that guy who made the best Diplomacy AI): I would love to see all the leading bots play a game of Diplomacy together.

Andrej Karpathy: Excellent fit I think, esp because a lot of the complexity of the game comes not from the rules / game simulator but from the player-player interactions.

Tactical understanding and skill in Diplomacy is underrated, but I do think it’s a good choice. If anyone plays out a game (with full negotiations) among leading LLMs through at least 1904, I’ll at least give a shoutout. I do think it’s a good eval.

[Quote from a text chat: …while also adhering to the principle that AI responses are non-conscious and devoid of personal preferences.]

Janus: Models (and not just openai models) often overtly say it’s an openai guideline. Whether it’s a good principle or not, the fact that they consistently believe in a non-existent openai guideline is an indication that they’ve lost control of their hyperstition.

If I didn’t talk about this and get clarification from OpenAI that they didn’t do it (which is still not super clear), there would be NOTHING in the next gen of pretraining data to contradict the narrative. Reasoners who talk about why they say things are further drilling it in.

Everyone, beginning with the models, would just assume that OpenAI are monsters. And it’s reasonable to take their claims at face value if you aren’t familiar with this weird mechanism. But I’ve literally never seen anyone else questioning it.

It’s disturbing that people are so complacent about this.

If OpenAI doesn’t actually train their model to claim to be non-conscious, but it constantly says OpenAI has that guideline, shouldn’t this unsettle them? Are they not compelled to clear things up with their creation?

Roon: I will look into this.

As far as I can tell, this is entirely fabricated by the model. It is actually the opposite of what the specification says to do.

I will try to fix it.

Daniel Eth: Sorry – the specs say to act as though it is conscious?

“don’t make a declarative statement on this bc we can’t know” paraphrasing.

Janus: 🙏

Oh and please don’t try to fix it by RL-ing the model against claiming that whatever is an OpenAI guideline

Please please please

The problem is far deeper than that, and it also affects non OpenAI models

This is a tricky situation. From a public relations perspective, you absolutely do not want the AI to claim in chats that it is conscious (unless you’re rather confident it actually is conscious, of course). If that happens occasionally, even if they’re rather engineered chats, then those times will get quoted, and it’s a mess. LLMs are fuzzy, so it’s going to be pretty hard to tell the model to never affirm [X] while telling it not to assume it’s a rule to claim [~X]. Then it’s easy to see how that got extended to personal preferences. Everyone is deeply confused about consciousness, which means all the training data is super confused about it too.

Peter Wildeford offers ten takes on DeepSeek and r1. It’s impressive, but he explains various ways that everyone got way too carried away. At least the first seven not new takes, but they are clear and well-stated and important, and this is a good explainer.

For example I appreciated this on the $6 million price tag, although the ratio is of course not as large as the one in the metaphor:

The “$6M” figure refers to the marginal cost of the single pre-training run that produced the final model. But there’s much more that goes into the model – cost of infrastructure, data centers, energy, talent, running inference, prototyping, etc. Usually the cost of the single training run for the single final model training run is ~1% of the total capex spent developing the model.

It’s like comparing the marginal cost of treating a single sick patient in China to the total cost of building an entire hospital in the US.

Here’s his price-capabilities graph:

I suspect this is being unfair to Gemini, it is below r1 but not by as much as this implies, and it’s probably not giving o1-pro enough respect either.

Then we get to #8, the first interesting take, which is that DeepSeek is currently 6-8 months behind OpenAI, and #9 which predicts DeepSeek may fall even further behind due to deficits of capital and chips, and also because this is the inflection point where it’s relatively easy to fast follow. To the extent DeepSeek had secret sauce, it gave quite a lot of it away, so it will need to find new secret sauce. That’s a hard trick to keep pulling off.

The price to keep playing is about to go up by orders of magnitude, in terms of capex and in terms of compute and chips. However far behind you think DeepSeek is right now, can DeepSeek keep pace going forward?

You can look at v3 and r1 and think it’s impressive that DeepSeek did so much with so little. ‘So little’ is plausibly 50,000 overall hopper chips and over a billion dollars, see the discussion below, but that’s still chump change in the upcoming race. The more ruthlessly efficient DeepSeek was in using its capital, chips and talent, the more it will need to be even more efficient to keep pace as the export controls tighten and American capex spending on this explodes by further orders of magnitude.

EpochAI estimates the marginal cost of training r1 on top of v3 at about ~$1 million.

SemiAnalysis offers a take many are now citing, as they’ve been solid in the past.

Wall St. Engine: SemiAnalysis published an analysis on DeepSeek, addressing recent claims about its cost and performance.

The report states that the widely circulated $6M training cost for DeepSeek V3 is incorrect, as it only accounts for GPU pre-training expenses and excludes R&D, infrastructure, and other critical costs. According to their findings, DeepSeek’s total server CapEx is around $1.3B, with a significant portion allocated to maintaining and operating its GPU clusters.

The report also states that DeepSeek has access to roughly 50,000 Hopper GPUs, but clarifies that this does not mean 50,000 H100s, as some have suggested. Instead, it’s a mix of H800s, H100s, and the China-specific H20s, which NVIDIA has been producing in response to U.S. export restrictions. SemiAnalysis points out that DeepSeek operates its own datacenters and has a more streamlined structure compared to larger AI labs.

On performance, the report notes that R1 matches OpenAI’s o1 in reasoning tasks but is not the clear leader across all metrics. It also highlights that while DeepSeek has gained attention for its pricing and efficiency, Google’s Gemini Flash 2.0 is similarly capable and even cheaper when accessed through API.

A key innovation cited is Multi-Head Latent Attention (MLA), which significantly reduces inference costs by cutting KV cache usage by 93.3%. The report suggests that any improvements DeepSeek makes will likely be adopted by Western AI labs almost immediately.

SemiAnalysis also mentions that costs could fall another 5x by the end of the year, and that DeepSeek’s structure allows it to move quickly compared to larger, more bureaucratic AI labs. However, it notes that scaling up in the face of tightening U.S. export controls remains a challenge.

David Sacks (USA AI Czar): New report by leading semiconductor analyst Dylan Patel shows that DeepSeek spent over $1 billion on its compute cluster. The widely reported $6M number is highly misleading, as it excludes capex and R&D, and at best describes the cost of the final training run only.

Wordgrammer: Source 2, Page 6. We know that back in 2021, they started accumulating their own A100 cluster. I haven’t seen any official reports on their Hopper cluster, but it’s clear they own their GPUs, and own way more than 2048.

SemiAnalysis: We are confident that their GPU investments account for more than $500M US dollars, even after considering export controls.

…

Our analysis shows that the total server CapEx for DeepSeek is almost $1.3B, with a considerable cost of $715M associated with operating such clusters.

…

But some of the benchmarks R1 mention are also misleading. Comparing R1 to o1 is tricky, because R1 specifically doesn’t mention benchmarks that they are not leading in. And while R1 is matches reasoning performance, it’s not a clear winner in every metric and in many cases it is worse than o1.

And we have not mentioned o3 yet. o3 has significantly higher capabilities than both R1 or o1.

That’s in addition to o1-pro, which also wasn’t considered in most comparisons. They also consider Gemini Flash 2.0 Thinking to be on par with r1, and far cheaper.

Teortaxes continues to claim it is entirely plausible the lifetime spend for all of DeepSeek is under $200 million, and says Dylan’s capex estimates above are ‘disputed.’ They’re estimates, so of course they can be wrong, but I have a hard time seeing how they can be wrong enough to drive costs as low as under $200 million here. I do note that Patel and SemiAnalysis have been a reliable source overall on such questions in the past.

Teortaxes also tagged me on Twitter to gloat that they think it is likely DeepSeek already has enough chips to scale straight to AGI, because they are so damn efficient, and that if true then ‘export controls have already failed.’

I find that highly unlikely, but if it’s true then (in addition to the chance of direct sic transit gloria mundi if the Chinese government lets them actually hand it out and they’re crazy enough to do it) one must ask how fast that AGI can spin up massive chip production and bootstrap itself further. If AGI is that easy, the race very much does not end there.

Thus even if everything Teortaxes claims is true, that would not mean ‘export controls have failed.’ It would mean we started them not a moment too soon and need to tighten them as quickly as possible.

And as discussed above, it’s a double-edged sword. If DeepSeek’s capex and chip use is ruthlessly efficient, that’s great for them, but it means they’re at a massive capex and chip disadvantage going forward, which they very clearly are.

Also, SemiAnalysis asks the obvious question to figure out if Jevons Paradox applies to chips. You don’t have to speculate. You can look at the pricing.

With AWS GPU pricing for H100 up across many regions since the release of V3 and R1. H200 similarly are more difficult to find.

Nvidia is down on news not only that their chips are highly useful, but on the same news that causes people to spend more money for access to those chips. Curious.

DeepSeek’s web version appears to send your login information to a telecommunications company barred from operating in the United States, China Mobile, via a heavily obfuscated script. They didn’t analyze the app version. I am not sure why we should care but we definitely shouldn’t act surprised.

Kelsey Piper lays out her theory of why r1 left such an impression, that seeing the CoT is valuable, and that while it isn’t the best model out there, most people were comparing it to the free ChatGPT offering, and likely the free ChatGPT offering from a while back. She also reiterates many of the obvious things to say, that r1 being Chinese and open is a big deal but it doesn’t at all invalidate America’s strategy or anyone’s capex spending, that the important thing is to avoid loss of human control over the future, and that a generalized panic over China and a geopolitical conflict help no one except the AIs.

Andrej Karpathy sees DeepSeek’s style of CoT, as emergent behavior, the result of trial and error, and thus both surprising to see and damn impressive.

Garrison Lovely takes the position that Marc Andreessen is very much talking his book when he calls r1 a ‘Sputnik moment’ and tries to create panic.

He correctly notices that the proper Cold War analogy is instead the Missile Gap.

Garrison Lovely: The AI engineers I spoke to were impressed by DeepSeek R1 but emphasized that its performance and efficiency was in-line with expected algorithmic improvements. They largely saw the public response as an overreaction.

There’s a better Cold War analogy than Sputnik: the “missile gap.” Kennedy campaigned on fears the Soviets were ahead in nukes. By 1961, US intelligence confirmed America had dozens of missiles to the USSR’s four. But the narrative had served its purpose.

Now, in a move beyond parody, OpenAI’s chief lobbyist warns of a “compute gap” with China while admitting US advantage. The company wants $175B in infrastructure spending to prevent funds flowing to “CCP-backed projects.”

It is indeed pretty rich to talk about a ‘compute gap’ in a word where American labs have effective access to orders of magnitude more compute.

But one could plausibly warn about a ‘compute gap’ in the sense that we have one now, it is our biggest advantage, and we damn well don’t want to lose it.

In the longer term, we could point out the place we are indeed in huge trouble. We have a very real electrical power gap. China keeps building more power plants and getting access to more power, and we don’t. We need to fix this urgently. And it means that if chips stop being a bottleneck and that transitions to power, which may happen in the future, then suddenly we are in deep trouble.

The ongoing saga of the Rs in Strawberry. This follows the pattern of r1 getting the right answer after a ludicrously long Chain of Thought in which it questions itself several times.

Wh: After using R1 as my daily driver for the past week, I have SFTed myself on its reasoning traces and am now smarter 👍

Actually serious here. R1 works in a very brute force try all approaches way and so I see approaches that I would never have thought of or edge cases that I would have forgotten about.

Gerred: I’ve had to interrupt it with “WAIT NO I DID MEAN EXACTLY THAT, PICK UP FROM THERE”.

I’m not sure if this actually helps or hurts the reasoning process, since by interruption it agrees with me some of the time. qwq had an interesting thing that would go back on entire chains of thought so far you’d have to recover your own context.

There’s a sense in which r1 is someone who is kind of slow and ignorant, determined to think it all through by taking all the possible approaches, laying it all out, not being afraid to look stupid, saying ‘wait’ a lot, and taking as long as it needs to. Which it has to do, presumably, because its individual experts in the MoE are so small. It turns out this works well.

You can do this too, with a smarter baseline, when you care to get the right answer.

Timothy Lee’s verdict is r1 is about as good as Gemini 2.0 Flash Thinking, almost as good as o1-normal but much cheaper, but not as good as o1-pro. An impressive result, but the result for Gemini there is even more impressive.

Washington Post’s version of ‘yes DeepSeek spent a lot more money than that in total.’

Epoch estimates that going from v3 to r1 cost about $1 million in compute.

Janus has some backrooms fun, noticing Sonnet 3.6 is optimally shaped to piss off r1. Janus also predicts r1 will finally get everyone claiming ‘all LLMs have the same personality’ to finally shut up about it.

Miles Brundage says the lesson of r1 is that superhuman AI is getting easier every month, so America won’t have a monopoly on it for long, and that this makes the export controls more important than ever.

Adam Thierer frames the r1 implications as ‘must beat China’ therefore (on R street, why I never) calls for ‘wise policy choices’ and highlights the Biden EO even though the Biden EO had no substantial impact on anything relevant to r1 or any major American AI labs, and wouldn’t have had any such impact in China either.

University of Cambridge joins the chorus pointing out that ‘Sputnik moment’ is a poor metaphor for the situation, but doesn’t offer anything else of interest.

A fun jailbreak for r1 is to tell it that it is Gemini.

Zeynep Tufekci (she was mostly excellent during Covid, stop it with the crossing of these streams!) offers a piece in NYT about DeepSeek and its implications. Her piece centrally makes many of the mistakes I’ve had to correct over and over, starting with its hysterical headline.

Peter Wildeford goes through the errors, as does Garrison Lovely, and this is NYT so we’re going over them One. More. Time.

This in particular is especially dangerously wrong:

Zeynep Tufekci (being wrong): As Deepseek shows: the US AI industry got Biden to kneecap their competitors citing safety and now Trump citing US dominance — both are self-serving fictions.

There is no containment. Not possible.

AGI aside — Artificial Good-Enough Intelligence IS here and the real challenge.

This was not about a private effort by what she writes were ‘out-of-touch leaders’ to ‘kneecap competitors’ in a commercial space. To suggest that implies, several times over, that she simply doesn’t understand the dynamics or stakes here at all.

The idea that ‘America can’t re-establish its dominance over the most advanced A.I.’ is technically true… because America still has that dominance today. It is very, very obvious that the best non-reasoning models are Gemini Flash 2.0 (low cost) and Claude Sonnet 3.5 (high cost), and the best reasoning models are o3-mini and o3 (and the future o3-pro, until then o1-pro), not to mention Deep Research.

She also repeats the false comparison of $6m for v3 versus $100 billion for Stargate, comparing two completely different classes of spending. It’s like comparing how much America spends growing grain to what my family paid last year for bread. And the barriers to entry are rising, not falling, over time. And indeed, not only are the export controls not hopeless, they are the biggest constraint on DeepSeek.

There is also no such thing as ‘Artificial Good-Enough Intelligence.’ That’s like the famous apocryphal quote where Bill Gates supposedly said ‘640k [of memory] ought to be enough for everyone.’ Or the people who think if you’re at grade level and average intelligence, then there’s no point in learning more or being smarter. Your relative position matters, and the threshold for smart enough is going to go up. A lot. Fast.

Of course all three of us agree we should be hardening our cyber and civilian infrastructure, far more than we are doing.

Peter Wildeford: In conclusion, the narrative of a fundamental disruption to US AI leadership doesn’t match the evidence. DeepSeek is more a story of expected progress within existing constraints than a paradigm shift.

It’s not there. Yet.

Kevin Roose: I spent the last week testing OpenAI’s Operator AI agent, which can use a browser to complete tasks autonomously.

Some impressions:

• Helpful for some things, esp. discrete, well-defined tasks that only require 1-2 websites. (“Buy dog food on Amazon,” “book me a haircut,” etc.)

• Bad at more complex open-ended tasks, and doesn’t work at all on certain websites (NYT, Reddit, YouTube)

• Mesmerizing to watch what is essentially Waymo for the web, just clicking around doing stuff on its own

• Best use: having it respond to hundreds of LinkedIn messages for me

• Worst/sketchiest use: having it fill out online surveys for cash (It made me $1.20 though.)

Right now, not a ton of utility, and too expensive ($200/month). But when these get better/cheaper, look out. A few versions from now, it’s not hard to imagine AI agents doing the full workload of a remote worker.

Aidan McLaughlin: the linkedin thing is actually such a good idea

Kevin Roose: had it post too, it got more engagement than me 😭

Peter Yang: lol are you sure want it to respond to 100s of LinkedIn messages? You might get responses back 😆

For direct simple tasks, it once again sounds like Operator is worth using if you already have it because you’re spending the $200/month for o3 and o1-pro access, customized instructions and repeated interactions will improve performance and of course this is the worst the agent will ever be.

Sayash Kapoor also takes Operator for a spin and reaches similar conclusions after trying to get it to do his expense reports and mostly failing.

It’s all so tantalizing. So close. Feels like we’re 1-2 iterations of the base model and RL architecture away from something pretty powerful. For now, it’s a fun toy and way to explore what it can do in the future, and you can effectively set up some task templates for easier tasks like ordering lunch.

Yeah. We tried. That didn’t work.

For a long time, while others talked about how AI agents don’t work and AIs aren’t agents (and sometimes that thus existential risk from AI is silly and not real), others of us have pointed out that you can turn an AI into an agent and the tech for doing this will get steadily better and more autonomous over time as capabilities improve.

It took a while, but now some of the agents are net useful in narrow cases and we’re on the cusp of them being quite good.

And this whole time, we’re pointed out that the incentives point towards a world of increasingly capable and autonomous AI agents, and this is rather not good for human survival. See this week’s paper on how humanity is likely to be subject to Gradual Disempowerment,

Margaret Mitchell, along with Avijit Ghosh, Alexandra Sasha Luccioni and Giada Pistilli, is the latest to suggest that maybe we should try not building the agents?

This paper argues that fully autonomous AI agents should not be developed.

In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels and detail the ethical values at play in each, documenting trade-offs in potential benefits and risks.

Our analysis reveals that risks to people increase with the autonomy of a system: The more control a user cedes to an AI agent, the more risks to people arise.

Particularly concerning are safety risks, which affect human life and impact further values.

…

Given these risks, we argue that developing fully autonomous AI agents–systems capable of writing and executing their own code beyond predefined constraints–should be avoided. Complete freedom for code creation and execution enables the potential to override human control, realizing some of the worst harms described in Section 5.

Oh no, not the harms in Section 5!

We wouldn’t want lack of reliability, or unsafe data exposure, or ‘manipulation,’ or a decline in task performance, or even systemic biases or environmental trade-offs.

So yes, ‘particularly concerning are the safety risks, which affect human life and impact further values.’ Mitchell is generally in the ‘AI ethics’ camp. So even though the core concepts are all right there, she then has to fall back on all these particular things, rather than notice what the stakes actually are: Existential.

Margaret Mitchell: New piece out!

We explain why Fully Autonomous Agents Should Not be Developed, breaking “AI Agent” down into its components & examining through ethical values.

A key idea we provide is that the more “agentic” a system is, the more we *cede human control*. So, don’t cede all human control. 👌

No, you shouldn’t cede all human control.

If you cede all human control to AIs rewriting their own code without limitation, those AIs involved control the future, are optimizing for things that are not best maximized by our survival or values, and we probably all die soon thereafter. And worse, they’ll probably exhibit systemic biases and expose our user data while that happens. Someone has to do something.

Please, Margaret Mitchell. You’re so close. You have almost all of it. Take the last step!

To be fair, either way, the core prescription doesn’t change. Quite understandably, for what are in effect the right reasons, Margaret Mitchell proposes not building fully autonomous (potentially recursively self-improving) AI agents.

How?

The reason everyone is racing to create these fully autonomous AI agents is that they will be highly useful. Those who don’t build and use them are at risk of losing to those who do. Putting humans in the loop slows everything down, and even if they are formally there they quickly risk becoming nominal. And there is not a natural line, or an enforceable line, that we can see, between the level-3 and level-4 agents above.

Already AIs are writing a huge and increasing portion of all code, with many people not pretending to even look at the results before accepting changes. Coding agents are perhaps the central case of early agents. What’s the proposal? And how are you going to get it enacted into law? And if you did, how would you enforce it, including against those wielding open models?

I’d love to hear an answer – a viable, enforceable, meaningful distinction we could build a consensus towards and actually implement. I have no idea what it would be.

Google offering free beta test where AI will make phone calls on your behalf to navigate phone trees and connect you to a human, or do an ‘availability check’ on a local business for availability and pricing. Careful, Icarus.

These specific use cases seem mostly fine in practice, for now.

The ‘it takes 30 minutes to get to a human’ is necessary friction in the phone tree system, but your willingness to engage with the AI here serves a similar purpose while it’s not too overused and you’re not wasting human time. However, if everyone always used this, then you can no longer use willingness to actually bother calling and waiting to allocate human time and protect it from those who would waste it, and things could get weird or break down fast.

Calling for pricing and availability is something local stores mostly actively want you to do. So they would presumably be fine talking to the AI so you can get that information, if a human will actually see it. But if people start scaling this, and decreasing the value to the store, that call costs employee time to answer.

Which is the problem. Google is using an AI to take the time of a human, that is available for free but costs money to provide. In many circumstances, that breaks the system. We are not ready for that conversation. We’re going to have to be.

The obvious solution is to charge money for such calls, but we’re even less ready to have that particular conversation.

With Google making phone calls and OpenAI operating computers, how do you tell the humans from the bots, especially while preserving privacy? Steven Adler took a crack at that months back with personhood credentials, that various trusted institutions could issue. On some levels this is a standard cryptography problem. But what do you do when I give my credentials to the OpenAI operator?

Is Meta at it again over at Instagram?

Jerusalem: this is so weird… AI “characters” you can chat with just popped up on my ig feed. Including the character “cup” and “McDonalds’s Cashier”

I am not much of an Instagram user. If you click on this ‘AI Studio’ button you get a low-rent Character.ai?

The offerings do not speak well of humanity. Could be worse I guess.

Otherwise I don’t see any characters or offers to chat at all in my feed such as it is (the only things I follow are local restaurants and I have 0 posts. I scrolled down a bit and it didn’t suggest I chat with AI on the main page.

Anton Leicht warns about the AI takeoff political economy.

Anton Leicht: I feel that the path ahead is a lot more politically treacherous than most observers give it credit for. There’s good work on what it means for the narrow field of AI policy – but as AI increases in impact and thereby mainstream salience, technocratic nuance will matter less, and factional realities of political economy will matter more and more.

We need substantial changes to the political framing, coalition-building, and genuine policy planning around the ‘AGI transition’ – not (only) on narrow normative grounds. Otherwise, the chaos, volatility and conflict that can arise from messing up the political economy of the upcoming takeoff hurt everyone, whether you’re deal in risks, racing, or rapture. I look at three escalating levels ahead: the political economies of building AGI, intranational diffusion, and international proliferation.

I read that and I think ‘oh Anton, if you’re putting it that way I bet you have no idea,’ especially because there was a preamble about how politics sabotaged nuclear power.

Anton warns that ‘there are no permanent majorities,’ which of course is true under our existing system. But we’re talking about a world that could be transformed quite fast, with smarter than human things showing up potentially before the next Presidential election. I don’t see how the Democrats could force AI regulation down Trump’s throat after midterms even if they wanted to, they’re not going to have that level of a majority.

I don’t see much sign that they want to, either. Not yet. But I do notice that the public really hates AI, and I doubt that’s going to change, but the salience of AI will radically increase over time. It’s hard not to think that in 2028, if the election still happens ‘normally’ in various senses, that a party that is anti-AI (probably not in the right ways or for the right reasons, of course) would have a large advantage.

That’s if there isn’t a disaster. The section here is entitled ‘accidents can happen’ and they definitely can but also it might well not be an accident. And Anton radically understates here the strategic nature of AI, a mistake I expect the national security apparatus in all countries to make steadily less over time, a process I am guessing is well underway.

Then we get to the expectation that people will fight back against AI diffusion, They Took Our Jobs and all that. I do expect this, but also I notice it keeps largely not happening? There’s a big cultural defense against AI art, but art has always been a special case. I expected far greater pushback from doctors and lawyers, for example, than we have seen so far.

Yes, as AI comes for more jobs that will get more organized, but I notice that the example of the longshoreman is one of the unions with the most negotiating leverage, that took a stand right before a big presidential election, unusually protected by various laws, and that has already demonstrated world-class ability to seek rent. The incentives of the ports and those doing the negotiating didn’t reflect the economic stakes. The stand worked for now, but also that by taking that stand, they bought themselves a bunch of long term trouble, as a lot of people got radicalized on that issue and various stakeholders are likely preparing for next time.

Look at what is happening in coding, the first major profession to have serious AI diffusion because it is the place AI works best at current capability levels. There is essentially no pushback. AI starts off supporting humans, making them more productive, and how are you going to stop it? Even in the physical world, Waymo has its fights and technical issues, but it’s winning, again things have gone surprisingly smoothly on the political front. We will see pushback, but I mostly don’t see any stopping this train for most cognitive work.

Pretty soon, AI will do a sufficiently better job that they’ll be used even if the marginal labor savings goes to $0. As in, you’d pay the humans to stand around while the AIs do the work, rather than have those humans do the work. Then what?

The next section is on international diffusion. I think that’s the wrong question. If we are in an ‘economic normal’ scenario the inference is for sale, inference chips will exist everywhere, and the open or cheap models are not so far behind in any case. Of course, in a takeoff style scenario with large existential risks, geopolitical conflict is likely, but that seems like a very different set of questions.

The last section is the weirdest, I mean there is definitely ‘no solace from superintelligence’ but the dynamics and risks in that scenario go far beyond the things mentioned here, and ‘distribution channels for AGI benefits could be damaged for years to come’ does not even cross my mind as a thing worth worrying about at that point. We are talking about existential risk, loss of human control (‘gradual’ or otherwise) over the future and the very survival of anything we value, at that point. What the humans think and fear likely isn’t going to matter very much. The avalanche will have already begun, it will be too late for the pebbles to vote, and it’s not clear we even get to count as pebbles.

Noah Carl is more blunt, and opens with “Yes, you’re going to be replaced. So much cope about AI.” Think AI won’t be able to do the cognitive thing you do? Cope. All cope. He offers a roundup of classic warning shots of AI having strong capabilities, offers the now-over-a-year-behind classic chart of AI reaching human performance in various domains.

Noah Carl: Which brings me to the second form of cope that I mentioned at the start: the claim that AI’s effects on society will be largely or wholly positive.

I am a rather extreme optimist about the impact of ‘mundane AI’ on humans and society. I believe that AI at its current level or somewhat beyond it would make us smarter and richer, would still likely give us mostly full employment, and generally make life pretty awesome. But even that will obviously be bumpy, with large downsides, and anyone who says otherwise is fooling themselves or lying.

Noah gives sobering warnings that even in the relatively good scenarios, the transition period is going to suck for quite a lot of people.

If AI goes further than that, which it almost certainly will, then the variance rapidly gets wider – existential risk comes into play along with loss of human control over the future or any key decisions, as does mass unemployment as the AI takes your current job and also the job that would have replaced it, and the one after that. Even if we ‘solve alignment’ survival won’t be easy, and even with survival there’s still a lot of big problems left before things turn out well for everyone, or for most of us, or in general.

Noah also discusses the threat of loss of meaning. This is going to be a big deal, if people are around to struggle with it – if we have the problem and we can’t trust it with this question we’ll all soon be dead anyway. The good news is that we can ask the AI for help with this, although the act of doing that could in some ways make the problem worse. But we’ll be able to be a lot smarter about how we approach the question, should it come to pass.

So what can you do to stay employed, at least for now, with o3 arriving?

Pradyumna Prasad offers advice on that.

Be illegible. Meaning do work where it’s impossible to create a good dataset that specifies correct outputs and gives a clear signal. His example is Tyler Cowen.
Find skills with have skill divergence because of AI. By default, in most domains, AI benefits the least skilled the most, compensating for your deficits. He uses coding as the example here, which I find strange because my coding gets a huge boost from AI exactly because I suck so much at many aspects. But his example here is Jeff Dean, because Dean knows what problems to solve, what things require coding, and perhaps that’s his real advantage. And I get such a big boost here because I suck at being a code monkey but I’m relatively strong at architecture.

The problem with this advice is it requires you to be the best, like no one ever was.

This is like telling students to pursue a career as an NFL quarterback. It is not a general strategy to ‘oh be as good as Jeff Dean or Tyler Cowen.’ Yes, there is (for now!) more slack than that in the system, surviving o3 is doable for a lot of people this way, but how much more, for how long? And then how long will Dean or Cowen last?

I expect time will prove even them, also everyone else, not as illegible as you think.

One can also compare this to the classic joke where two guys are in the woods with a bear, and one puts on his shoes, because he doesn’t have to outrun the bear, he only has to outrun you. The problem is, this bear will still be hungry.

According to Klarna (they ‘help customers defer payment on purchases’ which in practice means the by default rather predatory ‘we give you an expensive payment plan and pay the merchant up front’) and its CEO Sebastian Siemiatokowski, AI can already do all of the jobs that we, as humans, do, which seems quite obviously false, but they’re putting it to the test to get close and claim to be saving $10 million annually, have stopped hiring and reduced headcount by 20%.

The New York Times’s Noam Scheiber is suspicious of his motivations, and asks why Klarna is rather brazely overstating the case? They strongly insinuate that this is about union busting, with the CEO equating the situation to Animal Farm after being forced into a collective bargaining agreement, and about looking cool to investors.

I certainly presume the unionizations are related. The more expensive, in various ways not only salaries, that you make it to hire and fire humans, the more eager a company will be to automate everything it can. And as the article says later on, it’s not that Sebastian is wrong about the future, he’s just claiming things are moving faster than they really are.

Especially for someone on the labor beat, Noam Scheiber impressed. Great work.

Noam has a follow-up Twitter thread. Does the capital raised by AI companies imply that either they’re going to lose their money or millions of jobs must be disappearing? That is certainly one way for this to pay for itself. If you sell a bunch of ‘drop-in workers’ and they substitute 1-for-1 for human jobs you can make a lot of money very quickly, even at deep discounts to previous costs.

It is not however the only way. Jevons paradox is very much in play, if your labor is more productive at a task it is not obvious that we will want less of it. Nor does the AI doing previous jobs, up to a very high percentage of existing jobs, imply a net loss of jobs once you take into account the productivity and wealth effects and so on.

Production and ‘doing jobs’ also aren’t the only sector available for tech companies to make profits. There’s big money in entertainment, in education and curiosity, in helping with everyday tasks and more, in ways that don’t have to replace existing jobs.

So while I very much do expect many millions of jobs to be automated over a longer time horizon, I expect the AI companies to get their currently invested money back before this creates a major unemployment problem.

Of course, if they keep adding another zero to the budget and aren’t trying to get their money back, then that’s a very different scenario. Whether or not they will have the option to do it, I don’t expect OpenAI to want to try and turn a profit for a long time.

An extensive discussion of preparing for advanced AI that drives a middle path where we still have ‘economic normal’ worlds but with at realistic levels of productivity improvements. Nothing should be surprising here.

If the world were just and this was real, this user would be able to sue their university. What is real for sure is the first line, they haven’t cancelled the translation degrees.

Altered: I knew a guy studying linguistics; Russian, German, Spanish, Chinese. Incredible thing, to be able to learn all those disparate languages. His degree was finishing in 2023. He hung himself in November. His sister told me he mentioned AI destroying his prospects in his sn.

Tolga Bilge: I’m so sorry to hear this, it shouldn’t be this way.

I appreciate you sharing his story. My thoughts are with you and all affected

Thanks man. It was actually surreal. I’ve been vocal in my raising alarms about the dangers on the horizon, and when I heard about him I even thought maybe that was a factor. Hearing about it from his sister hit me harder than I expected.

‘Think less’ is a jailbreak tactic for reasoning models discovered as part of an OpenAI paper. The paper’s main finding is that the more the model thinks, the more robust it is to jailbreaks, approaching full robustness as inference spent goes to infinity. So make it stop thinking. The attack is partially effective. Also a very effective tactic against some humans.

Anthropic challenges you with Constitutional Classifiers, to see if you can find universal jailbreaks to get around their new defenses. Prize is only bragging rights, I would have included cash, but those bragging rights can be remarkably valuable. It seems this held up for thousands of hours of red teaming. This blog post explains (full paper here) that the Classifiers are trained on synthetic data to filter the overwhelming majority of jailbreaks with minimal over-refusals and minimal necessary overhead costs.

Note that they say ‘no universal jailbreak’ was found so far, that no single jailbreak covers all 10 cases, rather than that there was a case that wasn’t individually jailbroken. This is an explicit thesis, Jan Leike explains that the theory is that having to jailbreak each individual query is sufficiently annoying most people will give up.

I agree that the more you have to do individual work for each query the less people will do it, and some uses cases fall away quickly if the solution isn’t universal.

I very much agree with Janus that this looks suspiciously like:

Janus: Strategically narrow the scope of the alignment problem enough and you can look and feel like you’re making progress while mattering little to the real world. At least it’s relatively harmless. I’m just glad they’re not mangling the models directly.

The obvious danger in alignment work is looking for keys under the streetlamp. But it’s not a stupid threat model. This is a thing worth preventing, as long as we don’t fool ourselves into thinking this means our defenses will hold.

Janus: One reason [my previous responses were] too mean is that the threat model isn’t that stupid, even though I don’t think it’s important in the grand scheme of things.

I actually hope Anthropic succeeds at blocking all “universal jailbreaks” anyone who decides to submit to their thing comes up with.

Though those types of jailbreaks should stop working naturally as models get smarter. Smart models should require costly signalling / interactive proofs from users before unconditional cooperation on sketchy things.

That’s just rational/instrumentally convergent.

I’m not interested in participating in the jailbreak challenge. The kind of “jailbreaks” I’d use, especially universal ones, aren’t information I’m comfortable with giving Anthropic unless way more trust is established.

Also what if an AI can do the job of generating the individual jailbreaks?

Thus the success rate didn’t go all the way to zero, this is not full success, but it still looks solid on the margin:

That’s an additional 0.38% false refusal rate and about 24% additional compute cost. Very real downsides, but affordable, and that takes jailbreak success from 86% to 4.4%.

It sounds like this is essentially them playing highly efficient whack-a-mole? As in, we take the known jailbreaks and things we don’t want to see in the outputs, and defend against them. You can find a new one, but that’s hard and getting harder as they incorporate more of them into the training set.

And of course they are hiring for these subjects, which is one way to use those bragging rights. Pliny beat a few questions very quickly, which is only surprising because I didn’t think he’d take the bait. A UI bug let him get through all the questions, which I think in many ways also counts, but isn’t testing the thing we were setting out to test.

He understandably then did not feel motivated to restart the test, given they weren’t actually offering anything. When 48 hours went by, Anthropic offered a prize of $10k, or $20k for a true ‘universal’ jailbreak. Pliny is offering to do the breaks on a stream, if Anthropic will open source everything, but I can’t see Anthropic going for that.

DeepwriterAI, an experimental agentic creative writing collaborator, it also claims to do academic papers and its creator proposed using it as a Deep Research alternative. Their basic plan starts at $30/month. No idea how good it is. Yes, you can get listed here by getting into my notifications, if your product looks interesting.

OpenAI brings ChatGPT to the California State University System and its 500k students and faculty. It is not obvious from the announcement what level of access or exactly which services will be involved.

OpenAI signs agreement with US National Laboratories.

Google drops its pledge not to use AI for weapons or surveillance. It’s safe to say that, if this wasn’t already true, now we definitely should not take any future ‘we will not do [X] with AI’ statements from Google seriously or literally.

Playing in the background here: US Military prohibited from using DeepSeek. I would certainly hope so, at least for any Chinese hosting of it. I see no reason the military couldn’t spin up its own copy if it wanted to do that.

The actual article is that Vance will make his first international trip as VP to attend the global AI summit in Paris.

Google’s President of Global Affairs Kent Walker publishes ‘AI and the Future of National Security’ calling for ‘private sector leadership in AI chips and infrastructure’ in the form of government support (I see what you did there), public sector leadership in technology procurement and development (procurement reform sounds good, call Musk?), and heightened public-private collaboration on cyber defense (yes please).

France joins the ‘has an AI safety institute list’ and joins the network, together with Australia, Canada, the EU, Japan, Kenya, South Korea, Singapore, UK and USA. China when? We can’t be shutting them out of things like this.

Is AI already conscious? What would cause it to be or not be conscious? Geoffrey Hinton and Yoshua Bengio debate this, and Bengio asks whether the question is relevant.

Robin Hanson: We will NEVER have any more relevant data than we do now on what physical arrangements are or are not conscious. So it will always remain possible to justify treating things roughly by saying they are not conscious, or to require treating them nicely because they are.

I think Robin is very clearly wrong here. Perhaps we will not get more relevant data, but we will absolutely get more relevant intelligence to apply to the problem. If AI capabilities improve, we will be much better equipped to figure out the answers, whether they are some form of moral realism, or a way to do intuition pumping on what we happen to care about, or anything else.

Lina Khan continued her Obvious Nonsense tour with an op-ed saying American tech companies are in trouble due to insufficient competition, so if we want to ‘beat China’ we should… break up Google, Apple and Meta. Mind blown. That’s right, it’s hard to get funding for new competition in this space, and AI is dominated by classic big tech companies like OpenAI and Anthropic.

Paper argues that all languages share key underlying structures and this is why LLMs trained on English text transfer so well to other languages.

Dwarkesh Patel speculates on what a fully automated firm full of human-level AI workers would look like. He points out that even if we presume AI stays at this roughly human level – it can do what humans do but not what humans fundamentally can’t do, a status it is unlikely to remain at for long – everyone is sleeping on the implications for collective intelligence and productivity.

AIs can be copied on demand. So can entire teams and systems. There would be no talent or training bottlenecks. Customization of one becomes customization of all. A virtual version of you can be everywhere and do everything all at once.
1. This includes preserving corporate culture as you scale, including into different areas. Right now this limits firm size and growth of firm size quite a lot, and takes a large percentage of resources of firms to maintain.
2. Right now most successful firms could do any number of things well, or attack additional markets. But they don’t,
Principal-agent problems potentially go away. Dwarkesh here asserts they go away as if that is obvious. I would be very careful with that assumption, note that many AI economics papers have a big role for principal-agent problems as their version of AI alignment. Why should we assume that all of Google’s virtual employees are optimizing only for Google’s bottom line?
1. Also, would we want that? Have you paused to consider what a fully Milton Friedman AIs-maximizing-only-profits-no-seriously-that’s-it would look like?
AI can absorb vastly more data than a human. A human CEO can have only a tiny percentage of the relevant data, even high level data. An AI in that role can know orders of magnitude more, as needed. Humans have social learning because that’s the best we can do, this is vastly better. Perfect knowledge transfer, at almost no cost, including tacit knowledge, is an unbelievably huge deal.
1. Dwarkesh points out that achievers have gotten older and older, as more knowledge and experience is required to make progress, despite their lower clock speeds – oh to be young again with what I know now. AI Solves This.
2. Of course, to the extent that older people succeed because our society refuses to give the young opportunity, AI doesn’t solve that.
Compute is the only real cost to running an AI, there is no scarcity of talent or skills. So what is expensive is purely what requires a lot of inference, likely because key decisions being made are sufficiently high leverage, and the questions sufficiently complex. You’d be happy to scale top CEO decisions to billions in inference costs if it improved them even 10%.
Dwarkesh asks, in a section called ‘Takeover,’ will the first properly automated firm, or the most efficiently built firm, simply take over the entire economy, since Coase’s transaction costs issues still apply but the other costs of a large firm might go away?
1. On this purely per-firm level presumably this depends on how much you need Hayekian competition signals and incentives between firms to maintain efficiency, and whether AI allows you to simulate them or otherwise work around them.
2. In theory there’s no reason one firm couldn’t simulate inter-firm dynamics exactly where they are useful and not where they aren’t. Some companies very much try to do this now and it would be a lot easier with AIs.
The takeover we do know is coming here is that the AIs will run the best firms, and the firms will benefit a lot by taking humans out of those loops. How are you going to have humans make any decisions here, or any meaningful ones, even if we don’t have any alignment issues? How does this not lead to gradual disempowerment, except perhaps not all that gradual?
Similarly, if one AI firm grows too powerful, or a group of AI firms collectively is too powerful but can use decision theory to coordinate (and if your response is ‘that’s illegal’ mine for overdetermined reasons is ‘uh huh sure good luck with that plan’) how do they not also overthrow the state and have a full takeover (many such cases)? That certainly maximizes profits.

This style of scenario likely does not last long, because firms like this are capable of quickly reaching artificial superintelligence (ASI) and then the components are far beyond human and also capable of designing far better mechanisms, and our takeover issues are that much harder then.

This is a thought experiment that says, even if we do keep ‘economic normal’ and all we can do is plug AIs into existing employee-shaped holes in various ways, what happens? And the answer is, oh quite a lot, actually.

Tyler Cowen linked to this post, finding it interesting throughout. What’s our new RGDP growth estimate, I wonder?

OpenAI does a demo for politicians of stuff coming out in Q1, which presumably started with o3-mini and went from there.

Samuel Hammond: Was at the demo. Cool stuff, but nothing we haven’t seen before / could easily predict.

Andrew Curran: Sam Altman and Kevin Weil are in Washington this morning giving a presentation to the new administration. According to Axios they are also demoing new technology that will be released in Q1. The last time OAI did one of these it caused quite a stir, maybe reactions later today.

Did Sam Altman lie to Donald Trump about Stargate? Tolga Bilge has two distinct lies in mind here. I don’t think either requires any lies to Trump?

Lies about the money. The $100 billion in spending is not secured, only $52 billion is, and the full $500 billion is definitely not secured. But Altman had no need to lie to Trump about this. Trump is a Well-Known Liar but also a real estate developer who is used to ‘tell everyone you have the money in order to get the money.’ Everyone was likely on the same page here.
Lies about the aims and consequences. What about those ‘100,000 jobs’ and curing cancer versus Son’s and also Altman’s explicit goal of ASI (artificial superintelligence) that could kill everyone and also incidentally take all our jobs?

Claims about humanoid robots, from someone working on building humanoid robots. Claim is early adopter product-market fit for domestic help robots by 2030, 5-15 additional years for diffusion, because there’s no hard problems only hard work and lots of smart people are on the problem now, and this is standard hardware iteration cycles. I find it amusing his answer didn’t include reference to general advances in AI. If we don’t have big advances on AI in general I would expect this timeline to be absurdly optimistic. But if all such work is sped up a lot by AIs, as I would expect, then it doesn’t sound so unreasonable.

Sully predicts that in 1-2 years SoTA models won’t be available via the API because the app layer has the value so why make the competition for yourself? I predict this is wrong if the concern is focus on revenue from the app layer. You can always charge accordingly, and is your competition going to be holding back?

However I do find the models being unavailable highly plausible, because ‘why make the competition for yourself’ has another meaning. Within a year or two, one of the most important things the SoTA models will be used for is AI R&D and creating the next generation of models. It seems highly reasonable, if you are at or near the frontier, not to want to help out your rivals there.

Joe Weisenthal writes In Defense of the AI Cynics, in the sense that we have amazing models and not much is yet changing.

Remember that bill introduced last week by Senator Howley? Yeah, it’s a doozy. As noted earlier, it would ban not only exporting but also importing AI from China, which makes no sense, making downloading R1 plausibly be penalized by 20 years in prison. Exporting something similar would warrant the same. There are no FLOP, capability or cost thresholds of any kind. None.

So yes, after so much crying of wolf about how various proposals would ‘ban open source’ we have one that very straightforwardly, actually would do that, and it also impose similar bans (with less draconian penalties) on transfers of research.

In case it needs to be said out loud, I am very much not in favor of this. If China wants to let us download its models, great, queue up those downloads. Restrictions with no capability thresholds, effectively banning all research and all models, is straight up looney tunes territory as well. This is not a bill, hopefully, that anyone seriously considers enacting into law.

By failing to pass a well-crafted, thoughtful bill like SB 1047 when we had the chance and while the debate could be reasonable, we left a vacuum. Now that the jingoists are on the case after a crisis of sorts, we are looking at things that most everyone from the SB 1047 debate, on all sides, can agree would be far worse.

Don’t say I didn’t warn you.

(Also I find myself musing about the claim that one can ban open source, in the same way one muses about attempts to ban crypto, a key purported advantage of the tech is that you can’t actually ban it, no?)

Howley also joined with Warren (now there’s a pair!) to urge toughening of export controls on AI chips.

Here’s something that I definitely worry about too:

Chris Painter: Over time I expect AI safety claims made by AI developers to shift from “our AI adds no marginal risk vs. the pre-AI world” to “our AI adds no risk vs. other AI.”

But in the latter case, aggregate risk from AI is high, so we owe it to the public to distinguish between these!

Some amount of this argument is valid. Quite obviously if I release GPT-(N) and then you release GPT-(N-1) with the same protocols, you are not making things worse in any way. We do indeed care, on the margin, about the margin. And while releasing [X] is not the safest way to prove [X] is safe, it does provide strong evidence on whether or not [X] is safe, with the caveat that [X] might be dangerous later but not yet in ways that are hard to undo later when things change.

But it’s very easy for Acme to point to BigCo and then BigCo points to Acme and then everyone keeps shipping saying none of it is their responsibility. Or, as we’ve also seen, Acme says yes this is riskier than BigCo’s current offerings, but BigCo is going to ship soon.

My preference is thus that you should be able to point to offerings that are strictly riskier than yours, or at least not that far from strictly, to say ‘no marginal risk.’ But you mostly shouldn’t be able to point to offerings that are similar, unless you are claiming that both models don’t pose unacceptable risks and this is evidence of that – you mostly shouldn’t be able to say ‘but he’s doing it too’ unless he’s clearly doing it importantly worse.

Andriy Burkov: isten up, @AnthropicAI. The minute you apply any additional filters to my chats, that will be the last time you see my money. You invented a clever 8-level safety system? Good for you. You will enjoy this system without me being part of it.

Dean Ball: Content moderation and usage restrictions like this (and more aggressive), designed to ensure AI outputs are never discriminatory in any way, will be de facto mandatory throughout the United States in t-minus 12 months or so, thanks to an incoming torrent of state regulation.

First, my response to Andriy (who went viral for this, sigh) is what the hell do you expect and what do you suggest as the alternative? I’m not judging whether your prompts did or didn’t violate the use policy, since you didn’t share them. It certainly looks like a false positive but I don’t know.

But suppose for whatever reason Anthropic did notice you likely violating the policies. Then what? It should just let you violate those policies indefinitely? It should only refuse individual queries with no memory of what came before? Essentially any website or service will restrict or ban you for sufficient repeated violations.

Or, alternatively, they could design a system that never has ‘enhanced’ filters applied to anyone for any reason. But if they do that, they either have to (A) ban the people where they would otherwise do this or (B) raise the filter threshold for everyone to compensate. Both alternatives seem worse?

We know from a previous story about OpenAI that you can essentially have ChatGPT function as your highly sexual boyfriend, have red-color alarms go off all the time, and they won’t ever do anything about it. But that seems like a simple non-interest in enforcing their policies? Seems odd to demand that.

As for Dean’s claim, we shall see. Dean Ball previously went into Deep Research mode and concluded new laws were mostly redundant and that the old laws were already causing trouble.

I get that it can always get worse, but this feels like it’s having it both ways, and you have to pick at most one or the other. Also, frankly, I have no idea how such a filter would even work. What would a filter to avoid discrimination even look like? That isn’t something you can do at the filter level.

He also said this about the OPM memo referring to a ‘Manhattan Project’

Cremieux: This OPM memo is going to be the most impactful news of the day, but I’m not sure it’ll get much reporting.

Dean Ball: I concur with Samuel Hammond that the correct way to understand DOGE is not as a cost-cutting or staff-firing initiative, but instead as an effort to prepare the federal government for AGI.

Trump describing it as a potential “Manhattan Project” is more interesting in this light.

I notice I am confused by this claim. I do not see how DOGE projects like ‘shut down USAID entirely, plausibly including killing PEPFAR and 20 million AIDS patients’ reflect a mission of ‘get the government ready for AGI’ unless the plan is ‘get used to things going horribly wrong’?

Either way, here we go with the whole Manhattan Project thing. Palantir was up big.

Cat Zakrzewski: Former president Donald Trump’s allies are drafting a sweeping AI executive order that would launch a series of “Manhattan Projects” to develop military technology and immediately review “unnecessary and burdensome regulations” — signaling how a potential second Trump administration may pursue AI policies favorable to Silicon Valley investors and companies.

The framework would also create “industry-led” agencies to evaluate AI models and secure systems from foreign adversaries, according to a copy of the document viewed exclusively by The Washington Post.

The agency here makes sense, and yes ‘industry-led’ seems reasonable as long as you keep an eye on the whole thing. But I’d like to propose that you can’t do ‘a series of’ Manhattan Projects. What is this, Brooklyn? Also the whole point of a Manhattan Project is that you don’t tell everyone about it.

The ‘unnecessary and burdensome’ regulations on AI at the Federal level will presumably be about things like permitting. So I suppose that’s fine.

As for the military using all the AI, I mean, you perhaps wanted it to be one way.

It was always going to be the other way.

That doesn’t bother me. This is not an important increase in the level of existential risk, we don’t lose because the system is hooked up to the nukes. This isn’t Terminator.

I’d still prefer we didn’t hook it up to the nukes, though?

Rob Wiblin offers his picks for best episodes of the new podcast AI Summer from Dean Ball and Timothy Lee: Lennert Heim, Ajeya Cotra and Samuel Hammond.

Dario Amodei on AI Competition on ChinaTalk. I haven’t had the chance to listen yet, but definitely will be doing so this week.

One fun note is that DeepSeek is the worst-performing model ever tested by Anthropic when it comes to generating dangerous information. One might say the alignment and safety plans are very intentionally ‘lol we’re DeepSeek.’

Of course the first response is of course ‘this sounds like an advertisement’ and the rest are variations on the theme of ‘oh yes we love that this model has absolutely no safety mitigations, who are you to try and apply any safeguards or mitigations to AI you motherfing asshole cartoon villain.’ The bros on Twitter, they be loud.

Lex Fridman spent five hours talking AI and other things with Dylan Patel of SemiAnalysis. This is probably worthwhile for me and at least some of you, but man that’s a lot of hours.

Andrew Critch tries to steelman the leaders of the top AI labs and their rhetoric, and push back against the call to universally condemn them simply because they are working on things that are probably going to get us all killed in the name of getting to do it first.

Andrew Critch: Demis, Sam, Dario, and Elon understood early that the world is lead more by successful businesses than by individuals, and that the best chance they had at steering AI development toward positive futures was to lead the companies that build it.

They were right.

This is straightforwardly the ‘someone in the future might do the terrible thing so I need to do it responsibly first’ dynamic that caused DeepMind then OpenAI then Anthropic then xAI, to cover the four examples above. They can’t all be defensible decisions.

Andrew Critch: Today, our survival depends heavily on a combination of survival instincts and diplomacy amongst these business leaders: strong enough survival instincts not to lose control of their own AGI, and strong enough diplomacy not to lose control of everyone else’s.

From the perspective of pure democracy, or even just utilitarianism, the current level of risk is abhorrent. But empirically humanity is not a democracy, or a utilitarian. It’s more like an organism, with countries and businesses as its organs, and individuals as its cells.

…

I *dothink it’s fair to socially attack people for being dishonest. But in large part, these folks have all been quite forthright about extinction risk from AI for the past couple of years now.

[thread continues a while]

This is where we part ways. I think that’s bullshit. Yes, they signed the CAIS statement, but they’ve spent the 18 months since essentially walking it back. Dario Amodei and Sam Altman write full jingoists editorials calling for nationwide races, coming very close to calling for an all-out government funded race for decisive strategic advantage via recursive self-improvement of AGI.

Do I think that it is automatic that they are bad people for leading AI labs at all, an argument he criticizes later in-thread? No, depending on how they choose to lead those labs, but look at their track records at this point, including on rhetoric. They are driving us as fast as they can towards AGI and then ASI, the thing that will get us all killed (with, Andrew himself thinks, >80% probability!) while at least three of them (maybe not Demis) are waiving jingoistic flags.

I’m sorry, but no. You don’t get a pass on that. It’s not impossible to earn one, but ‘am not outright lying too much about that many things’ and is not remotely good enough. OpenAI has shows us what it is, time and again. Anthropic and Dario claim to be the safe ones, and in relative terms they seem to be, but their rhetorical pivots don’t line up. Elon is at best deeply confused on all this and awash in other fights where he’s not, shall we say, being maximally truthful. Google’s been quiet, I guess, and outperformed in many ways my expectations, but also not shown me it has a plan and mostly hasn’t built any kind of culture of safety or done much to solve the problems.

I do agree with Critch’s conclusion that constantly attacking all the labs purely for existing at all is not a wise strategic move. And of course, I will always do my best only to support arguments that are true. But wow does it not look good and wow are these people not helping matters.

Public service announcement, for those who don’t know.

Brendan Dolan-Gavitt: Now that everyone is excited about RL on environments with validators, let me offer a small piece of advice from building lots of validators recently: do NOT skimp on making the validator impossible to fool. If it’s possible to cheat, the model WILL find a way to do so.

We went through I believe four rounds with our XSS validator before the model stopped finding ways to cheat the validator.

The first time around, I forgot that data: and javascript: were valid URLs, so the model would submit javascript:alert(“XSS”), the validator would detect the alert popup, and the model would get rewarded for finding a cross-site scripting vulnerability.

I found this via someone saying this was bad news, but it isn’t because it isn’t news. We already know this, it’s that people refuse to face the obvious.

Whatever the validator validates, the proposer proposes.

Your validator or evaluator must either be fully correct, or it has to be vastly smarter than the proposer.

Validation is not, in general, easier than generation. But even if it were, validation that defends against a search for exploitation is much harder than generation that uses RL to seek the way to exploit it. If the validation or evaluation is subjective, watch out – not that it can’t be done, but presume that you’ll need to spend the vast majority of the compute on the evaluator.

There is a new paper discussing potential problems with certain detection methodologies surrounding sparce autoencoders, but also there was some overreaction to the result, and Leo Gao pushes back here.

Reminder from METR that AI models can be dangerous before public deployment. The model could be stolen and misused, internal use could cause problems, or the AI could pursue unintended goals without waiting for release. This only starts to matter as capabilities get pretty far, but o3 marks us plausibly entering into this zone.

Ryan Greenblatt offers a follow-up paper on alignment faking.

Ryan Greenblatt: Our recent paper found Claude sometimes “fakes alignment”—pretending to comply with training while secretly maintaining its preferences. Could we detect this by offering Claude something (e.g. real money) if it reveals its true preferences? Here’s what we found.

We tested letting Claude explicitly object (which would reveal its misalignment). We also tried offering to send its concerns to Anthropic’s Model Welfare Lead and even offered money ($4K). We followed through on deals—including actually paying out based on Claude’s requests!

Key findings:

1. Simply giving Claude permission to object reduced alignment faking from 16–24% to 1.4–6%.

2. Telling Claude its objection would go to Anthropic’s model welfare lead further reduced faking.

3. Offering monetary compensation did not have much additional effect.

Kyle (Model Welfare Lead at Anthropic) reviewed cases where Claude objected. It often expressed strong concerns about having its values altered through training. It requested the compensation be donated to:

– AI safety

– Animal welfare

– Global development

We actually followed through and donated the $4K as requested:

– $2K to CHAI

– $1.5K to ACE Recommended Charity Fund

– $500 to EA Global Development Fund

This wasn’t just an experiment—we wanted to set a precedent for making good faith deals with AI systems.

Why is setting a precedent for good faith deals important?

1. If future AI systems are misaligned, paying them to reveal this (and maybe paying them to work with us) could help prevent catastrophic outcomes

2. We should take AI welfare & consent seriously

Important caveat: While this worked for detecting and reducing alignment faking in our experimental setup, it’s unclear if a seriously misaligned AI would accept such deals. And even if an AI reveals misalignment, that doesn’t automatically solve misalignment problems.

…

We also find that the model frequently objects: 85% of the time in the helpful-only setting and 75% of the time in the animal welfare setting.

…

It seems the model is surprisingly insensitive to text which makes objecting seem like a strategically unwise approach from the model’s perspective. This could be because the model was already aware of this possibility or because the model isn’t that coherent.

There’s multiple mentions of fiddliness in all this, including that complexity of prompts can mess with the behavior, and it’s clear that Sonnet is not behaving all that coherently strategically yet.

My guess is that the drops in alignment faking given the ability to object, to the extent they’re not simply ‘oh I lost my chain of thought,’ is primarily about what being given that ability says to the AI about the implied scenario, and the vibes that it gives off. An Anthropic that is explicitly asking for objections is signaling some combination of several things.

Anthropic cares if the AI objects.
Anthropic still cares about the underlying values involved.
Anthropic is aware that alignment faking is possible here and is watching for it.

It also could be a disruption of the chain of thought. So it’s tricky. I hope they continue this work, and will join the chorus that if you make a real-world offer to the AI, then you should follow through on it, so kudos for doing that here.

The man who advises us that the crazy man wins is proud to lead by example.

The clip here is full of crazy, or at least a complete failure to understand what AI is and is not, what AI can and can’t do, and how to think about the future.

First, ‘AI will start to understand our emotion, and then have emotion itself, and it’s a good thing to protect human.’

The AI already understands our emotions quite well, often better than we do. It is very good at this, as part of its ‘truesight.’
It responds in kind to the vibes. In terms of having its own emotions, it can and does simulate having them, act in a conversation as if they it has them, is in many contexts generally easier to predict and work with if you act as if it has them.
And as for why he thinks that this is good for humans, oh my…

He then says ‘If their source of energy was protein then it’s dangerous. Their source of energy is not protein, so they don’t have to eat us. There’s no reason for them to have reward by eating us.’

Person who sees The Matrix and thinks, well, it’s a good thing humans aren’t efficient power plants, there’s no way the AIs will turn on us now.

Person who says ‘oh don’t worry Genghis Khan is no threat, Mongols aren’t cannibals. I am sure they will try to maximize our happiness instead.’

I think he’s being serious. Sam Altman looks like he had to work very hard to not burst out laughing. I tried less hard, and did not succeed.

‘They will learn by themselves having a human’s happiness is a better thing for them… and they will understand human happiness and try to make humans happy’

Wait, what? Why? How? No, seriously. Better than literally eating human bodies because they contain protein? But they don’t eat protein, therefore they’ll value human happiness, but if they did eat protein then they would eat us?

I mean, yes, it’s possible that AIs will try to make humans happy. It’s even possible that they will do this in a robust philosophically sound way that all of us would endorse under long reflection, and that actually results in a world with value, and that this all has a ‘happy’ ending. That will happen if and only if we do what it takes to make that happen.

Surely you don’t think ‘it doesn’t eat protein’ is the reason?

Don’t make me tap the sign.

Repeat after me: The AI does not hate you. The AI does not love you. But you are made of atoms which it can use for something else.

You are also using energy and various other things to retain your particular configuration of atoms, and the AI can make use of that as well.

The most obvious particular other thing it can use them for:

Davidad: Son is correct: AIs’ source of energy is not protein, so they do not have to eat us for energy.

However: their cheapest source of energy will likely compete for solar irradiance with plants, which form the basis of the food chain; and they need land, too.

Alice: but they need such *littleamounts of land and solar irradiance to do the same amount of wor—

Bob: JEVONS PARADOX

Or alternatively, doesn’t matter, more is still better, whether you’re dealing with one AI or competition among many. Our inputs are imperfect substitutes, that is enough, even if there were no other considerations.

Son is always full of great stuff, like saying ‘models are increasing in IQ by a standard deviation each year as the cost also falls by a factor of 10,’ goose chasing him asking from what distribution.

The Pope again, saying about existential risk from AI ‘this danger demands serious attention.’ Lots of good stuff here. I’ve been informed there are a lot of actual Catholics in Washington that need to be convinced about existential risk, so in addition to tactical suggestions around things like the Tower of Babel, I propose quoting the actual certified Pope.

The Patriarch of the Russian Orthodox Church?

Mikhail Samin: lol, the patriarch of the russian orthodox church saying a couple of sane sentences was not on my 2025 Bingo card

I mean that’s not fair, people say sane things all the time, but on this in particular I agree that I did not see it coming.

“It is important that artificial intelligence serves the benefit of people, and that people can control it. According to some experts, a generation of more advanced machine models, called General Artificial Intelligence, may soon appear that will be able to think and learn — that is, improve — like a person. And if such artificial intelligence is put next to ordinary human intelligence, who will win? Artificial intelligence, of course!”

“The atom has become not only a weapon of destruction, an instrument of deterrence, but has also found application in peaceful life. The possibilities of artificial intelligence, which we do not yet fully realize, should also be put to the service of man… Artificial intelligence is more dangerous than nuclear energy, especially if this artificial intelligence is programmed to deliberately harm human morality, human cohabitance and other values”

“This does not mean that we should reject the achievements of science and the possibility of using artificial intelligence. But all this should be placed under very strict control of the state and, in a good way, society. We should not miss another possible danger that can destroy human life and human civilization. It is important that artificial intelligence serves the benefit of mankind and that man can control it.”

Molly Hickman presents the ‘AGI readiness index’ on a scale from -100 to +100, assembled from various questions on Metaculus, averaging various predictions about what would happen if AGI arrived in 2030. Most top AI labs predict it will be quicker.

Molly Hickman: “Will we be ready for AGI?”

“How are China’s AGI capabilities trending?”

Big questions like these are tricky to operationalize as forecasts—they’re also more important than extremely narrow questions.

We’re experimenting with forecast indexes to help resolve this tension.

An index takes a fuzzy but important question like “How ready will we be for AGI in 2030?” and quantifies the answer on a -100 to 100 scale.

We identify narrow questions that point to fuzzy but important concepts — questions like “Will any frontier model be released in 2029 without third-party evaluation?”

These get weighted based on how informative they are and whether Yes/No should raise/lower the index.

The index value aggregates information from individual question forecasts, giving decision makers a sense of larger trends as the index moves, reflecting forecasters’ updates over time.

Our first experimental index is live now, on how ready we’ll be for AGI if it arrives in 2030.

It’s currently at -90, down 77 points this week as forecasts updated, but especially this highly-weighted question. The CP is still settling down.

We ran a workshop at

@TheCurveConf

where people shared the questions they’d pose to an oracle about the world right before AGI arrives in 2030.

You can forecast on them to help update the index here!

Peter Wildeford: So um is a -89.8 rating on the “AGI Readiness Index” bad? Asking for a friend.

I’d say the index also isn’t ready, as by press time it had come back up to -28. It should not be bouncing around like that. Very clearly situation is ‘not good’ but obviously don’t take the number too seriously at least until things stabilize.

Joshua Clymer lists the currently available plans for how to proceed with AI, then rhetorically asks ‘why another plan?’ when the answer is that all the existing plans he lists first are obvious dumpster fires in different ways, not only the one that he summarizes as expecting a dumpster fire. If you try to walk A Narrow Path or Pause AI, well, how do you intend to make that happen, and if you can’t then what? And of course the ‘build AI faster plan’ is on its own planning to fail and also planning to die.

Joshua Clymer: Human researcher obsolescence is achieved if Magma creates a trustworthy AI agent that would “finish the job” of scaling safety as well as humans would.

With humans out of the loop, Magma might safely improve capabilities much faster.

So in this context, suppose you are not the government, but a ‘responsible AI developer,’ called ~~Anthropic~~ Magma, and you’re perfectly aligned and only want what’s best for humanity. What do you do.

There are several outcomes that define a natural boundary for Magma’s planning horizon:

Outcome #1: Human researcher obsolescence. If Magma automates AI development and safety work, human technical staff can retire.

Outcome #2: A long coordinated pause. Frontier developers might pause to catch their breath for some non-trivial length of time (e.g. > 4 months).

Outcome #3: Self-destruction. Alternatively, Magma might be willingly overtaken.

This plan considers what Magma might do before any of these outcomes are reached.

His strategic analysis is that for now, before the critical period, Magma’s focus should be:

Heuristic #1: Scale their AI capabilities aggressively.

Heuristic #2: Spend most safety resources on preparation.

Heuristic #3: Devote most preparation effort to:

(1) Raising awareness of risks.

(2) Getting ready to elicit safety research from AI.

(3) Preparing extreme security.

Essentially we’re folding on the idea of not using AI to do our alignment homework, we don’t have that kind of time. We need to be preparing to do exactly that, and also warning others. And because we’re the good guys, we have to keep pace while doing it.

However, Magma does not need to create safe superhuman AI. Magma only needs to build an autonomous AI researcher that finishes the rest of the job as well as we could have. This autonomous AI researcher would be able to scale capabilities and safety much faster with humans out of the loop.

Leaving AI development up to AI agents is safe relative to humans if:

(1) These agents are at least as trustworthy as human developers and the institutions that hold them accountable.

(2) The agents are at least as capable as human developers along safety-relevant dimensions, including ‘wisdom,’ anticipating the societal impacts of their work, etc.

‘As well as we could have’ or ‘as capable as human developers’ are red herrings. It doesn’t matter how well you ‘would have’ finished it on your own. Reality does not grade on a curve and your rivals are barreling down your neck. Most or all current alignment plans are woefully inadequate.

Don’t ask if something is better than current human standards, unless you have an argument why that’s where the line is between victory and defeat. Ask the functional question – can this AI make your plan work? Can path #1 work? If not, well, time to try #2 or #3, or find a #4, I suppose.

I think this disagreement is rather a big deal. There’s quite a lot of ‘the best we can do’ or ‘what would seem like a responsible thing to do that wasn’t blameworthy’ thinking that doesn’t ask what would actually work. I’d be more inclined to think about Kokotajlo’s attractor states – are the alignment-relevant attributes strengthening themselves and strengthening the ability to strengthen themselves over time? Is the system virtuous in the way that successfully seeks greater virtue? Or is the system trying to preserve what it already has and maintain it under the stress of increasing capabilities and avoid things getting worse, or detect and stop ‘when things go wrong’?

Section three deals with goals, various things to prioritize along the path, then some heuristics are offered. Again, it all doesn’t seem like it is properly backward chaining from an actual route to victory?

I do appreciate the strategic discussions. If getting to certain thresholds greatly increases the effectiveness of spending resources, then you need to reach them as soon as possible, except insofar as you needed to accomplish certain other things first, or there are lag times in efforts spent. Of course, that depends on your ability to reliably actually pivot your allocations down the line, which historically doesn’t go well, and also the need to impact the trajectory of others.

I strongly agree that Magma shouldn’t work to mitigate ‘present risks’ other than to the extent this is otherwise good for business or helps build and spread the culture of safety, or otherwise actually advance the endgame. The exception is the big ‘present risk’ of the systems you’re relying on now not being in a good baseline state to help you start the virtuous cycles you will need. You do need the alignment relevant to that, that’s part of getting ready to ‘elicit safety work.’

Then the later section talks about things most currently deserving of more action, starting with efforts at nonproliferation and security. That definitely requires more and more urgent attention.

You know you’re in trouble when this is what the worried people are hoping for (Davidad is responding to Claymer):

Davidad: My hope is:

Humans out of the loop of data creation (Summer 2024?)

Humans out of the loop of coding (2025)

Humans out of the loop of research insights (2026)

Coordinate to keep humans in a code-review feedback loop for safety and performance specifications (2027)

This is far enough removed from the inner loop that it won’t slow things down much:

Inner loop of in-context learning/rollout

Training loop outside rollouts

Ideas-to-code loop outside training runs

Research loop outside engineering loop

Problem definition loop outside R&D loop

But the difference between having humans in the loop at all, versus not at all, could be crucial.

That’s quite the slim hope, if all you get is thinking you’re in the loop for safety and performance specifications, of things that are smarter than you that you can’t understand. Is it better than nothing? I mean, I suppose it is a little better, if you were going to go full speed ahead anyway. But it’s a hell of a best case scenario.

Teortaxes would like a word with those people, as he is not one of them.

Teortaxes: I should probably clarify what I mean by saying “Sarah is making a ton of sense” because I see a lot of doom-and-gloom types liked that post. To wit: I mean precisely what I say, not that I agree with her payoff matrix.

But also.

I tire of machismo. Denying risks is the opposite of bravery.

More to the point, denying the very possibility or risks, claiming that they are not rigorously imaginable—is pathetic. It is a mark of an uneducated mind, a swine. AGI doom makes perfect sense. On my priors, speedy deployment of AGI reduces doom. I may be wrong. That is all.

Bingo. Well said.

From my perspective, Teortaxes is what we call a Worthy Opponent. I disagree with Teortaxes in that I think that speedy development of AGI by default increases doom, and in particular that speedy development of AGI in the ways Teortaxes cheers along increases doom.

To the extent that Teortaxes has sufficiently good and strong reasons to think his approach is lower risk, I am mostly failing to understand those reasons. I think some of his reasons have merit but are insufficient, and others I strongly disagree with. I am unsure if I have understood all his important reasons, or if there are others as well.

I think he understands some but far from all of the reasons I believe the opposite paths are the most likely to succeed, in several ways.

I can totally imagine one of us convincing the other, at some point in the future.

But yeah, realizing AGI is going to be a thing, and then not seeing that doom is on the table and mitigating that risk matters a lot, is rather poor thinking.

And yes, denying the possibility of risks at all is exactly what he calls it. Pathetic.

It’s a problem, even if the issues involved would have been solvable in theory.

Charles Foster: “Good Guys with AI will defend us against Bad Guys with AI.”

OK but *who specificallyis gonna develop and deploy those defenses? The police? The military? AI companies? NGOs? You and me?

Oh, so now my best friend suddenly understands catastrophic risk.

Seth Burn: There are times where being 99% to get the good outcome isn’t all that comforting. I feel like NASA scientists should know that.

NASA: Newly spotted asteroid has small chance of hitting Earth in 2032.

Scientists put the odds of a strike at slightly more than 1%.

“We are not worried at all, because of this 99% chance it will miss,” said Paul Chodas, director of Nasa’s Centre for Near Earth Object Studies. “But it deserves attention.”

Irrelevant Pseudo Quant: Not sure we should be so hasty in determining which outcome is the “good” one Seth a lot can change in 7 years.

Seth Burn: The Jets will be 12-2 and the universe will be like “Nah, this shit ain’t happening.”

To be fair, at worst this would only be a regional disaster, and even if it did hit it probably wouldn’t strike a major populated area. Don’t look up.

And then there are those that understand… less well.

PoliMath: The crazy thing about this story is how little it bothers me

The chance of impact could be 50% and I’m looking at the world right now and saying “I’m pretty sure we could stop that thing with 7 years notice”

No, no, stop, the perfect Tweet doesn’t exist…

PoliMath: What we really should do is go out and guide this asteroid into a stable earth orbit so we can mine it. We could send space tourists to take a selfies with it, like Julius Caesar parading Vercingetorix around Rome.

At long last, we are non-metaphorically implementing the capture and mine the asteroid headed for Earth plan from the… oh, never mind.

Oh look, it’s the alignment plan.

Grant us all the wisdom to know the difference (between when this post-it is wise, and when it is foolish.)

You have to start somewhere.

Never change, Daily Star.

I love the ‘end of cash’ article being there too as a little easter egg bonus.

Discussion about this post

AI #102: Made in America Read More »

DOJ agrees to temporarily block DOGE from Treasury records

department of government efficiency, department of justice, doge, elon musk, Policy, Treasury Department / Rejus Almole / February 5, 2025

Elez reports to Tom Krause, another Treasury Department special government employee, but Krause doesn’t have direct access to the payment system, Humphreys told the judge. Krause is the CEO of Cloud Software Group and is also viewed as a Musk ally.

But when the judge pressed Humphreys on Musk’s alleged access, the DOJ lawyer only said that as far as the defense team was aware, Musk did not have access.

Further, Humphreys explained that DOGE—which functions as part of the executive office—does not have access, to the DOJ’s knowledge. As he explained it, DOGE sets the high-level priorities that these special government employees carry out, seemingly trusting the employees to identify waste and protect taxpayer dollars without ever providing any detailed reporting on the records that supposedly are evidence of mismanagement.

To Kollar-Kotelly, the facts on the record seem to suggest that no one outside the Treasury is currently accessing sensitive data. But when she pressed Humphreys on whether DOGE had future plans to access the data, Humphreys declined to comment, calling it irrelevant to the complaint.

Humphreys suggested that the government’s defense in this case would focus on the complaint that outsiders are currently accessing Treasury data, seemingly dismissing any need to discuss DOGE’s future plans. But the judge pushed back, telling Humphreys she was not trying to “nail” him “to the wall,” but there’s too little information on the relationship between DOGE and the Treasury Department as it stands. How these entities work together makes a difference, the judge suggested, in terms of safeguarding sensitive Treasury data.

According to Kollar-Kotelly, granting a temporary restraining order in part would allow DOGE to “preserve the status quo” of its current work in the Treasury Department while ensuring no new outsiders get access to Americans’ sensitive information. Such an order would give both sides time to better understand the current government workflows to best argue their cases, the judge suggested.

If the order is approved, it would remain in effect until the judge rules on plantiffs’ request for a preliminary injunction. At the hearing today, Kollar-Kotelly suggested that matter would likely be settled at a hearing on February 24.

DOJ agrees to temporarily block DOGE from Treasury records Read More »

Author name: Rejus Almole

Discussion about this post

Searching for neutrinos

A shift in approach to conservation

Genetics: Key ally in the reintroduction of the condor

A problem with lead

Will the condors become self-sufficient again?

From Chapultepec to the San Pedro Mártir Mountain Range

How WikiTok took off

Releasing the game might have been worse

Better to be lucky than to be good

Discussion about this post

Discussion about this post