Author name: Mike M.


Stargate AI-1

There was a comedy routine a few years ago. I believe it was by Hannah Gadsby. She brought up a painting, and looked at some details. The details weren’t important in and of themselves. If an AI had randomly put them there, we wouldn’t care.

Except an AI didn’t put them there. And they weren’t there at random.

A human put them there. On purpose. Or, as she put it:

THAT was a DECISION.

This is the correct way to view the decisions around a $500 billion AI infrastructure project: announcing it right after Trump takes office, having it be primarily funded by SoftBank, intending all the compute to be used by OpenAI, and calling it Stargate.

  1. The Announcement.

  2. Is That a Lot?

  3. What Happened to the Microsoft Partnership?

  4. Where’s Our 20%?

  5. Show Me the Money.

  6. It Never Hurts to Suck Up to the Boss.

  7. What’s in a Name.

  8. Just Think of the Potential.

  9. I Believe Toast is an Adequate Description.

  10. The Lighter Side.

OpenAI: Announcing The Stargate Project

The Stargate Project is a new company which intends to invest $500 billion over the next four years building new AI infrastructure for OpenAI in the United States. We will begin deploying $100 billion immediately.

Note that ‘intends to invest’ does not mean ‘has the money to invest’ or ‘definitely will invest.’ Intends is not a strong word. The future is unknown and indeed do many things come to pass.

This infrastructure will secure American leadership in AI, create hundreds of thousands of American jobs, and generate massive economic benefit for the entire world.

This project will not only support the re-industrialization of the United States but also provide a strategic capability to protect the national security of America and its allies.

One of these things is not like the others. Secure American leadership in AI, generate massive economic benefit for the entire world, provide strategic capability to allies, sure, fine, makes sense, support reindustrialization is a weird flex but kinda, yeah.

And then… jobs? American… jobs? Um, Senator Blumenthal, that is not what I meant.

Pradyumna:

> will develop superintelligence

> create thousands of jobs

????

Samuel Hammond: “We’re going to spend >10x the budget of the Manhattan Project building digital brains that can do anything human brains can do but better and oh, by the way, create over 100,000 good paying American jobs!”

There’s at least some cognitive dissonance here.

Arthur B: The project will probably most likely lead to mass unemployment but in the meantime, there’ll be great American jobs.

If you listen to Altman’s announcement, he too highlights these ‘hundreds of thousands of jobs.’ It’s so absurd. Remember when Altman tried to correct this error?

The initial equity funders in Stargate are SoftBank, OpenAI, Oracle, and MGX. SoftBank and OpenAI are the lead partners for Stargate, with SoftBank having financial responsibility and OpenAI having operational responsibility. Masayoshi Son will be the chairman.

Arm, Microsoft, NVIDIA, Oracle, and OpenAI are the key initial technology partners.

If you want to spend way too much money on a technology project, and give the people investing the money a remarkably small share of the enterprise, you definitely want to be giving Masayoshi Son and SoftBank a call.

“Sam Altman, you are not crazy enough. You need to think bigger.”

The buildout is currently underway, starting in Texas, and we are evaluating potential sites across the country for more campuses as we finalize definitive agreements.

This proves there is real activity; it is also a tell that some of this is not new.

As part of Stargate, Oracle, NVIDIA, and OpenAI will closely collaborate to build and operate this computing system. This builds on a deep collaboration between OpenAI and NVIDIA going back to 2016 and a newer partnership between OpenAI and Oracle.

This also builds on the existing OpenAI partnership with Microsoft. OpenAI will continue to increase its consumption of Azure as OpenAI continues its work with Microsoft with this additional compute to train leading models and deliver great products and services.

Increasing consumption of Azure is different from Azure being the sole compute provider. It seems OpenAI expects plenty of compute needs to go around.

All of us look forward to continuing to build and develop AI—and in particular AGI—for the benefit of all of humanity. We believe that this new step is critical on the path, and will enable creative people to figure out how to use AI to elevate humanity.

Can’t stop, won’t stop, I suppose. ‘Enable creative people to elevate humanity’ continues to miss the point of the whole enterprise, but not as much as talking ‘jobs.’

Certainly $500 billion for this project sounds like a lot. It’s a lot, right?

Microsoft is investing $80 billion a year in Azure, which is $400 billion over 5 years, and I’d bet that their investment goes up over time and they end up spending over $500 billion during that five year window.

Haydn Belfield: Stargate is a remarkable step.

But, to put it into context, Microsoft will spend $80 billion on data centers this year, over half in the U.S.

Stargate’s $100 billion this year is more, but a comparable figure.

Rob S.: This is kind of misleading. Microsoft’s spend is also enormous and wildly out of the ordinary. Not normal at all.

Haydn Belfield: Definitely true, we’re living through a historic infrastructure build out like the railways, interstate highways or phone network

What I want to push back on a bit is that this is *the only* effort, that this is the Manhattan/Apollo project

The number $500 billion is distributed throughout many sites and physical projects. If it does indeed happen, and it is counterfactual spending, then it’s a lot. But it’s not a sea change, and it’s not obvious that the actual spending should be surprising. Investments on this scale were already very much projected and already happening.

It’s also not that much when compared to the compute needs anticipated for the scaling of top end training runs, which very much continue to be a thing.

Yusuf Mahmood: Stargate shouldn’t have been that surprising!

It’s a $500 Bn project that is set to complete by 2029.

That’s totally consistent with estimates from @EpochAIResearch’s report last year on how scaling could continue through 2030.

$500 billion is a lot to the extent that all of this is dedicated specifically and exclusively to OpenAI, as opposed to Microsoft’s $80 billion, which is for everyone. But it’s not a lot compared to the anticipated future needs of a frontier lab.

One thing to think about is that OpenAI recently raised money at a valuation of approximately $170 billion, presumably somewhat higher now with o3 and agents, but also potentially lower because of DeepSeek. Now we are talking about making investments dedicated to OpenAI of $500 billion.

There is no theoretical incompatibility. Perhaps OpenAI is mining for gold and will barely recoup its investment, while Stargate is selling pickaxes and will rake it in.

It does still seem rather odd to presume that is how the profits will be distributed.

The reason OpenAI is so unprofitable today is that they are spending a ton on increasing capabilities, and not serving enough inference to make it up on their unit economics, and also not yet using their AI to make money in other ways.

And yes, the equilibrium could end up being that compute providers have margins and model providers mostly don’t have margins. But OpenAI, if it succeeds, should massively benefit from economies of scale here, and its economics should improve. Thus, if you take Stargate seriously, it is hard to imagine OpenAI being worth only a fraction of $500 billion.

There is a solution to this puzzle. When we say OpenAI is worth $170 billion, we are not talking about all of OpenAI. We are talking about the part that takes outside investment. All the dramatic upside potential? That is for now owned by the non-profit, and not (or at least not fully) part of the valuation.

And that is the part that has the vast majority of the expected net present value of future cash flows of OpenAI. So OpenAI the entire enterprise can be worth quite a lot, and yet ‘OpenAI’ the corporate entity you can invest in is only worth $170 billion.

This should put into perspective that the move to a for-profit entity truly is in the running for the largest theft in the history of the world.

Didn’t they have an exclusive partnership?

Smoke-Away: OpenAI and Microsoft are finished. There were signs.

Microsoft was not moving quickly enough to scale Azure. Now they are simply another compute provider for the time being.

Sam Altman: Absolutely not! This is a very important and significant partnership, for a long time to come.

We just need moar compute.

Eliezer Yudkowsky (Quoting Smoke-Away): It is a pattern, with Altman. If Altman realizes half his dreams, in a few years we will be hearing about how Altman has dismissed the U.S. government as no longer useful to him. (If Altman realizes all his dreams, you will be dead.)

Roon: Not even close to being true.

Microsoft is one of the providers here. Reports are that the Microsoft partnership has now been renegotiated, to allow OpenAI to also seek other providers, since Altman needs moar compute. Hence Stargate. Microsoft will retain right of first refusal (ROFR), which seems like the right deal to make here. The question is, how much of the non-profit’s equity did Altman effectively promise in order to get free from under the old deal?

Remember that time Altman promised 20% of compute would go to superalignment, rather than blowing up a sun?

Harlan Stewart: Jul 2023: OpenAI promises to dedicate 20% of compute to safety research

May 2024: Fortune reports they never did that

Jul 2024: After 5 senators write to him to ask if OpenAI will, @sama says yes

It’s Jan 2025. Will OpenAI set aside 20% of this new compute to safety, finally?

Connor Axiotes: @tszzl (Roon), can you push for a significant part of this to be spent on control and alignment and safety policy work?

Roon: I’ll do my part. I’m actually on the alignment team at openai 🙂

So that’s a no, then.

I do expect Roon to push for more compute. I don’t expect to get anything like 20%.

Elon Musk (replying to the announcement): They don’t actually have the money.

Sam Altman: I genuinely respect your accomplishments and think you are the most inspiring entrepreneur of our time.

Elon Musk (continuing from OP): SoftBank has well under $10 billion secured. I have that on good authority.

Sam Altman: Wrong, as you surely know.

Want to come visit the first site already under way?

This is great for the country. I realize what is great for the country is not always what is optimal for your companies, but in your new role, I hope you will mostly put the United States first.

Satya Nadella (CEO of Microsoft, on CNBC, when asked whether Stargate has the money; watch the clip at the link, his delivery is perfect): All I know is, I’m good for my $80 billion.

If you take the companies collectively, they absolutely have the money, or at least the ability to get the money. This is Microsoft and Nvidia. I have no doubt that Microsoft is, as Nadella affirmed, ‘good for its $80 billion.’

That doesn’t mean SoftBank has the money, and SoftBank explicitly is tasked with providing the funding for Stargate.

Nor does the first site in Texas prove anything either way on this.

Remember the wording on the announcement: “which intends to invest $500 billion over the next four years.”

That does not sound like someone who has the money.

That sounds like someone who intends to raise the money. And I presume SoftBank has every expectation of being able to do so, with the aid of this announcement. And of working out the structure. And the financing.

Mario Nawfal: Sam Altman’s grand plan to build “Stargate,” a $500 billion AI infrastructure exclusively for OpenAI, is already falling apart before it even starts.

There’s no secured funding, no government support, no detailed plan, and, according to insiders, not even a clear structure.

One source bluntly admitted:

“They haven’t figured out the structure, they haven’t figured out the financing, they don’t have the money committed.”

Altman’s pitch? SoftBank and OpenAI will toss in $15 billion each and then just… hope the rest magically appears from investors and debt.

For someone obsessed with making AI smarter than humans, maybe he should try getting the basics right first – like not creating something that could destroy all of humanity… Just saying.

But that’s why you say ‘intend to invest’ rather than ‘will invest.’

Things between Musk and Altman did not stop there, as we all took this opportunity to break open the International Popcorn Reserve.

Elon Musk: Altman literally testified to Congress that he wouldn’t get OpenAI compensation and now he wants $10 billion! What a liar.

Musk’s not exactly wrong about that. He also said and retweeted other… less dignified things.

It was not a good look for either party. Elon Musk is, well, being Elon Musk. Altman is trying to throw in performative ‘look at me taking the high road’ statements that should fool no one, not only the one above but also:

Sam Altman: just one more mean tweet and then maybe you’ll love yourself…

Teortaxes (quoting Altman saying he respects Elon’s accomplishments above): I find both men depicted here unpleasant and engaging in near-psychopathic behavior, and I also think poorly of those who imagine Sam is trying to “be the bigger man”.

He’s a scary manipulative snake. “Well damn, fyou too Elon, we have it” would be more dignified.

There’s a subtle art to doing this sort of thing well. The Japanese especially are very good at it. All of this is, perhaps, the exact opposite of that.

Sam Altman: big. beautiful. buildings. stargate site 1, texas, january 2025.

Altman, you made it weird. Also gauche. Let’s all do better.

Trump world is not, as you would expect, thrilled with what Musk has been up to, with Trump saying he is ‘furious,’ saying he ‘got over his skis.’ My guess is that Trump ‘gets it’ at heart, because he knows what it’s like to hate and never let something go, and that this won’t be that big a deal for Musk’s long term position, but there is high variance. I could easily be wrong about that. If I was Musk I would not have gone with this strategy, but that statement is almost always true and why I’m not Musk.

This particular Rule of Acquisition is somewhat imprecise. It’s not always true.

But Donald Trump? Yeah. It definitely never hurts to suck up to that particular boss.

Sam Altman (January 22, 2025): watching @potus more carefully recently has really changed my perspective on him (i wish i had done more of my own thinking and definitely fell in the npc trap).

i’m not going to agree with him on everything, but i think he will be incredible for the country in many ways!

Altman does admit this is a rather big change. Anyone remember when Altman said “More terrifying than Trump intentionally lying all the time is the possibility that he actually believes it all” or when he congratulated Reid Hoffman for helping keep Trump out of power? Or “Back to work tomorrow on a new project to stop Trump?” He was rather serious about wanting to stop Trump.

You can guess what I think he saw while watching Trump to make Altman change his mind.

So they announced this $500 billion deal, or at least a $100 billion deal with intent to turn it into $500 billion, right after Trump’s inauguration, with construction already underway, with a press conference on the White House lawn.

And the funds are all private. Which is great, but all this together also raises the obvious question: Does Trump actually have anything to do with this?

Matthew Yglesias: They couldn’t have done it without Trump, but also it was already under construction.

Daniel Eth: Okay, it’s not *Trump’sAI plan. He announced it, but he neither developed nor is funding it. It’s a private initiative from OpenAI, Softbank, Oracle, and a few others.

Jamie Bernardi: Important underdiscussed point on the OpenAI $100bn deal: money is not coming from the USG.

Trump is announcing a private deal, whilst promising to make “emergency declarations” to allow Stargate to generate its own electricity (h/t @nytimes).

Musk says 100bn not yet raised.

Peter Wildeford: Once upon a time words had meaning.

Jake Perry: I’m still not clear why this was announced at the White House at all.

Peter Wildeford: Trump has a noted history of announcing infrastructure projects that were already in progress – he did this a lot in his first term.

Jacques: At least we’ll all be paperclipped with a USA flag engraved on it.

Trump says that it is all about him, of course:

Donald Trump: This monumental undertaking is a resounding declaration of confidence in America’s potential under a new president.

The president said Stargate would create 100,000 jobs “almost immediately” and keep “the future of technology” in America.

I presume that in addition to completely missing the point, this particular jobs claim is, technically speaking, not true. But numbers don’t have to be real in politics. And of course, if this is going to create those jobs ‘almost immediately’ it had to have been in the works for a long time.

Shakeel: I can’t get over the brazen, brazen lie from Altman here, saying “We couldn’t do this without you, Mr President”.

You were already doing it! Construction started ages ago!

Just a deeply untrustworthy man — you can’t take anything he says at face value.

Dylan Matthews: Everything that has happened since the board fired him has 100% vindicated their view of him as deeply dishonest and unreliable, and I feel like the popular understanding of that incident hasn’t updated from “this board sure is silly!”

[Chubby: Sam Altman: hype on twitter is out of control. Everyone, chill down.

Also Sam Altman: anyways, let’s invest half a trillion to build a digital god and cure cancer once and for all. Oh, and my investors just said that AGI comes very, very soon and ASI will solve any problem mankind faces.

But everyone, calm down 100x]

I agree with Dylan Matthews that the board’s assessment of Altman as deeply dishonest and unreliable has very much been vindicated, and Altman’s actions here only confirm that once again. But that doesn’t mean that Trump has nothing to do with the fact that this project is going forward, with this size.

So how much does this project depend on Trump being president instead of Harris?

I think the answer is actually a substantial amount.

In order to build AI infrastructure in America, you need three things.

  1. You need demand. Check.

  2. You need money. Check, or at least check in the mail.

  3. You need permission to actually build it. Previously no check. Now, maybe check?

Masayoshi Son: Mr. President, last month I came to celebrate your winning and promised $100B. And you told me go for $200B. Now I came back with $500B. This is because as you say, this is the beginning of the Golden Age. We wouldn’t have decided this unless you won.

Sam Altman: The thing I really deeply agree with the president on is, it is wild how difficult it has become to build things in the United States. Power plants, data centres, any of that kind of stuff.

Does Son have many good reasons to pretend that this is all because of Trump? Yes, absolutely. He would find ways to praise the new boss either way. But I do think that Trump mattered here, even if you don’t think that there is anything corrupt involved in all this.

Look at Trump’s executive orders, already signed, about electrical power plants and transmission lines being exempt from NEPA, and otherwise being allowed to go forwards. They can expect more similar support in the future, if they run into roadblocks, and fewer other forms of regulatory trouble and everything bagel requirements across the board.

Also, I totally believe that Son came to Trump and promised $100 billion, and Trump said go for $200 billion, and Son is now at $500 billion, and I think that plausibly created a lot of subsequent investment. It may sound stupid, but that’s Grade-A handling of Masayoshi Son, and exactly within Trump’s wheelhouse. Tell the man who thinks big he’s not thinking big enough. Just keep him ramping up. Don’t settle for a big win when you can go for an even bigger win. You have to hand it to him.

It is so absurd that these people, with a straight face, decided to call this Stargate.

They wanted to call it the Enterprise, but their lawyers wouldn’t let them.

Was SkyNet still under copyright?

Agus: Ah, yes. Of course we’re naming this project after the fictitious portal through which several hostile alien civilizations attempted to invade and destroy Earth.

I just hope we get the same amount of completely unrealistic plot armor that protected Stargate Command in SG-1.

Roon: the Stargate. blasting a hole into the Platonic realm to summon angels. First contact with alien civilizations.

Canonically, the Stargates are sometimes used by dangerous entities to harm us, but once humanity deals with that, they end up being quite useful.

Zvi Mowshowitz: Guy who reads up on the canonical history of Stargate and thinks, “Oh, all’s well that ends well. Let’s try that plan.”

Roon: 🤣

Is this where I give you 10,000 words on the history of Stargate SG-1 and Stargate Atlantis and all the different ways Earth and often also everyone else would have been enslaved or wiped out if it wasn’t for narrative causality and plot armor, and what would have been reasonable things to do in that situation?

No, and I am sad about that, despite yes having watched all combined 15 seasons, because alas we do not currently have that kind of time. Maybe later I’ll be able to spend a day doing that, it sounds like fun.

But in brief about that Stargate plan. Was it a good plan? What were the odds?

As is pointed out in the thread (minor spoilers for the end of season 1), the show actually answers this question, as there is crossover between different Everett branches, and we learn, even relatively early on – before most of the different things that almost kill us have a chance to almost kill us – that most branches have already lost. Which was one of the things that I really liked about the show, that it realized this. The thread also includes discussions of things like ‘not only did we not put a nuclear bomb by the Stargate and use a secondary gate to disguise our location, we wore Earth’s gate code on our fing uniforms.’

To be fair, there is a counterargument, which is that (again, minor spoilers) humanity was facing various ticking clocks. There was one in particular that was ticking in ways Earth did not cause, and then there were others that were set in motion rapidly once we had a Stargate program, and in general we were on borrowed time. So given what was happening we had little choice but to go out into the galaxy and try to develop superior technology and find various solutions before time ran out on us, and it would have been reasonable to expect we were facing a ticking clock in various ways given what Earth knew at the time.

There’s also the previous real life Project Stargate, a CIA-DIA investigation of the potential for psychic phenomena. That’s… not better.

There are also other ways to not be thrilled by all this.

Justin Amash: The Stargate Project sounds like the stuff of dystopian nightmares—a U.S. government-announced partnership of megacorporations “to protect the national security of America and its allies” and harness AGI “for the benefit of all of humanity.” Let’s maybe take a beat here.

Taking a beat sounds like a good idea.

What does Trump actually think AI can do?

Samuel Hammond: Trump seems under the impression that ASI is just a way to cure diseases and not an ultraintelligent digital lifeform with autonomy and self-awareness. Sam’s hesitation before answering speaks volumes.

That’s not how I view the clip at the link. Trump is selling the project. It makes sense to highlight medical advances, which are a very real and valuable upside. It certainly makes a lot more sense than highlighting job creation.

Altman I don’t see hesitating, I see him trying to be precise while also going with the answer, and I don’t like his previous emphasis on jobs (again, no doubt, following Trump’s and his political advisor’s lead) but on the medical question I think he does well and it’s not obvious what a better answer would have been.

The hilarious part of this is the right wing faction that says ‘you want to use this to make mRNA vaccines, wtf I hate AI now’ and trying to figure out what to do with people whose worldviews are that hopelessly inverted.

That moment when you say ‘look at how this could potentially cure cancer’ and your hardcore supporters say ‘And That’s Terrible.’

And also when you somehow think ‘Not Again!’

Eliezer Yudkowsky: Welp, looks like Trump sure is getting backlash to the Stargate announcement from many MAGAers who are outraged that AGIs might develop mRNA vaccines and my fucking god it would be useless to evacuate to Mars but I sure see why Elon wants to

To people suggesting that I ought to suck up to that crowd: On my model of them, they’d rather hear me say “Fyou lunatics, now let’s go vote together I guess” than have me pretend to suck up to them.

Like, on my model, that crowd is deadly tired of all the BULLSHIT and we in fact have that much in common and I bet I can get further by not trying to feed them any BULLSHIT.

There is a deep sense in which it is more respectful to someone as a human being to say, “I disagree with your fing lunacy. Allies?” than to smarm over to them and pretend to agree with them. And I think they know that.

RPotluck: The MAGAsphere doesn’t love you and it doesn’t hate you, but you’re made of arguments the MAGAsphere can use to build the wall.

There’s a certain kind of bullshit that these folks and many other folks are deeply tired of hearing. This is one of those places where I very much agree that it does hurt to suck up to the boss, both because the boss will see through it and because the whole strategy involves not doing things like that, and also have you seen or heard the boss.

My prediction and hope is that we will continue to see those worried about AI killing everyone continue to not embrace these kinds of crazy arguments of convenience. That doesn’t mean not playing politics at all or being some sort of suicidal purist. It does mean we care about whether our arguments are true, rather than treating them as soldiers for a cause.

Whereas we have learned many times, most recently with the fight over SB 1047 and then the latest round of jingoism, that many (#NotAllUnworried!) of those who want to make sure others do not worry about AI killing everyone, or at least want to ensure that creating things smarter than humans faces fewer regulatory barriers than a barber shop, care very little whether the arguments made on their behalf, by themselves or by others, are true or correspond to physical reality. They Just Didn’t Care.

The flip side is the media, which is, shall we say, not situationally aware.

Spencer Schiff: The AGI Manhattan Project announcement was followed by half an hour of Q&A. Only one reporter asked a question about it. WHAT THE FUCK! This is insane. The mainstream media is completely failing to convey the gravity of what’s happening to the general public.

As noted elsewhere I don’t think this merits ‘Manhattan Project’ for various reasons but yes, it is kind of weird to announce a $500 billion investment in artificial general intelligence and then have only one question about it in a 30 minute Q&A.

I’m not saying that primarily from an existential risk perspective – this is far more basic even than that. I’m saying, maybe this is a big deal that all this is happening, maybe ask some questions about it?

Remember when Altman was talking about how we have to build AGI now because he was worried about a compute overhang? Yes, well.

Between the $500 billion of Stargate, the full-on jingoistic rhetoric from all sides including Anthropic, and the forcing function of DeepSeek with v3 and r1, it is easy to see how one could despair over our prospects for survival.

Unless something changes, we are about to create smarter than human intelligence, entities more capable and competitive than we are across all cognitive domains, and we are going to do so as rapidly as we can and then put them in charge of everything, with essentially zero margin to ensure that this goes well despite it obviously by default getting everyone killed.

Even if we are so fortunate that the technical and other barriers in front of us are highly solvable, that is exactly how we get everyone killed anyway.

Holly Elmore: I am so, so sad today. Some days the weight of it all just hits me. I want to live my life with my boyfriend. I want us to have kids. I want love and a full life for everyone. Some days the possibility that that will all be taken away is so palpable, and grief is heavy.

I’m surprised how rarely I feel this way, given what I do. I don’t think it’s bad to feel it all sometimes. Puts you in touch with what you’re fighting for.

I work hard to find the joy and the gallows humor in it all, to fight the good fight, to say the odds are against us and the situation is grim, sounds like fun. One must imagine Buffy at the prom, and maintain Scooby Gang Mindset. Also necessary is the gamer mindset, which says you play to win the game, and in many ways it’s easiest to play your best game with your back against the wall.

And in a technical sense, I have hope that the solutions exist, and that there are ways to at least give ourselves a fighting chance.

But yeah, weeks like this do not make it easy to keep up hope.

Harlan Stewart: If the new $500b AI infrastructure thing ever faces a major scandal, we’ll unfortunately be forced to call it Stargategate




Way more game makers are working on PC titles than ever, survey says

Four out of five game developers are currently working on a project for the PC, a sizable increase from 66 percent of developers a year ago. That’s according to Informa’s latest State of the Game Industry survey, which partnered with Omdia to ask over 3,000 game industry professionals about their work in advance of March’s Game Developers Conference.

The 80 percent of developers working on PC projects in this year’s survey is by far the highest mark for any platform dating back to at least 2018, when 60 percent of surveyed developers were working on a PC game. In the years since, the ratio of game developers working on the PC has hovered between 56 and 66 percent before this year’s unexpected jump. The number of game developers saying they were interested in the PC as a platform also increased substantially, from 62 percent last year to 74 percent this year.

While the PC has long been the most popular platform in this survey, the sudden jump in the last year was rather large. Credit: Kyle Orland / Informa

The PC has long been the most popular platform for developers to work on in the annual State of the Game Industry survey, easily outpacing consoles and mobile platforms, which generally see active work from anywhere from 12 to 36 percent of developer respondents, depending on the year. In its report, Informa describes this surge as a “passion for PC development explod[ing]” among developers, and mentions that while “PC has consistently been the platform of choice… this year saw its dominance increase even more.”

The increasing popularity of PC gaming among developers is also reflected in the number of individual game releases on Steam, which topped out at a record 18,974 individual titles for 2024, according to SteamDB. That record number was up over 32 percent from 2023, which was itself up just under 16 percent from 2022 (though many Steam games each year were “Limited Games” that failed to meet Valve’s minimum engagement metrics for Badges and Trading Cards).

The number of annual Steam releases also points to increasing interest in the platform. Credit: SteamDB

The Steam Deck effect?

While it’s hard to pinpoint a single reason for the sudden surge in the popularity of PC game development, Informa speculates that it’s “connected to the rising popularity of Valve’s Steam Deck.” While Valve has only officially acknowledged “multiple millions” in sales for the portable hardware, GameDiscoverCo analyst Simon Carless estimated that between 3 million and 4 million Steam Deck units had been sold by October 2023, up significantly from reports of 1 million Deck shipments in October 2022.



Trump can save TikTok without forcing a sale, ByteDance board member claims

TikTok owner ByteDance is reportedly still searching for non-sale options to stay in the US after the Supreme Court upheld a national security law requiring that TikTok’s US operations either be shut down or sold to a non-foreign adversary.

Last weekend, TikTok briefly went dark in the US, only to come back online hours later after Donald Trump reassured ByteDance that the US law would not be enforced. Then, shortly after Trump took office, he signed an executive order delaying enforcement for 75 days while he consulted with advisers to “pursue a resolution that protects national security while saving a platform used by 170 million Americans.”

Trump’s executive order did not suggest that he intended to attempt to override the national security law’s ban-or-sale requirements. But that hasn’t stopped ByteDance, board member Bill Ford told World Economic Forum (WEF) attendees, from searching for a potential non-sale option that “could involve a change of control locally to ensure it complies with US legislation,” Bloomberg reported.

It’s currently unclear how ByteDance could negotiate a non-sale option without facing a ban. Joe Biden’s extended efforts through Project Texas to keep US TikTok data out of China-controlled ByteDance’s hands without forcing a sale dead-ended, prompting Congress to pass the national security law requiring a ban or sale.

At the WEF, Ford said that the ByteDance board is “optimistic we will find a solution” that avoids ByteDance giving up a significant chunk of TikTok’s operations.

“There are a number of alternatives we can talk to President Trump and his team about that are short of selling the company that allow the company to continue to operate, maybe with a change of control of some kind, but short of having to sell,” Ford said.



Wine 10.0 brings Arm Windows apps to Linux, still is not an emulator

The open source Wine project—sometimes stylized WINE, for Wine Is Not an Emulator—has become an important tool for companies and individuals who want to make Windows apps and games run on operating systems like Linux or even macOS. The CrossOver software for Mac and Windows, Apple’s Game Porting Toolkit, and the Proton project that powers Valve’s SteamOS and the Steam Deck are all rooted in Wine, and the attention and resources put into the project in recent years have dramatically improved its compatibility and usefulness.

Yesterday, the Wine project announced the stable release of version 10.0, the next major version of the compatibility layer that is not an emulator. The headliner for this release is support for ARM64EC, the application binary interface (ABI) used for Arm apps in Windows 11, but the release notes say that the release contains “over 6,000 individual changes” produced over “a year of development effort.”

ARM64EC allows developers to mix Arm and x86-compatible code—if you’re making an Arm-native version of your app, you can still allow the use of more obscure x86-based plugins or add-ons without having to port everything over at once. Wine 10.0 also supports ARM64X, a different type of application binary file that allows ARM64EC code to be mixed with older, pre-Windows 11 ARM64 code.

Wine’s ARM64EC support does have one limitation that will keep it from working on some prominent Arm Linux distributions, at least by default: the release notes say it “requires the system page size to be 4K, since that is what the Windows ABI specifies.” Several prominent Linux-on-Arm distributions default to a 16K page size because it can improve performance—when page sizes are smaller, you need more of them, and managing a higher number of pages can introduce extra CPU overhead.
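If you want to check which page size your own Arm system uses before counting on Wine’s ARM64EC support, the standard POSIX interface exposes it (a quick illustrative check of my own, not something from the Wine release notes):

```python
import os

# Report the kernel's memory page size in bytes; Wine's ARM64EC support
# expects 4096 (4K), while some Arm distributions default to 16384 (16K).
page_size = os.sysconf("SC_PAGE_SIZE")
print(f"Page size: {page_size} bytes ({page_size // 1024}K)")
```

The shell one-liner `getconf PAGESIZE` reports the same value.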

Asahi Linux, the Fedora-based distribution that is working to bring Linux to Apple Silicon Macs, uses 16K pages because that’s all Apple’s processors support. Some versions of the Raspberry Pi OS also default to a 16K page size, though it’s possible to switch to 4K for compatibility’s sake. Given that the Raspberry Pi and Asahi Linux are two of the biggest Linux-on-Arm projects going right now, that does at least somewhat limit the appeal of ARM64EC support in Wine. But as we’ve seen with Proton and other successful Wine-based compatibility layers, laying the groundwork now can deliver big benefits down the road.



Samsung’s Galaxy S25 event was an AI presentation with occasional phone hardware

Samsung announced the Galaxy S25, S25+, and S25 Ultra at its Unpacked event today. What is different from last year’s models? With the phones themselves, not much, other than a new chipset and a wide camera. But pure AI optimism? Samsung managed to pack a whole lot more of that into its launch event and promotional materials.

The corners on the S25 Ultra are a bit more rounded, the edges are flatter, and the bezels seem to be slightly thinner. The S25 and S25+ models have the same screen size as the S24 models, at 6.2 and 6.7 inches, respectively, while the Ultra notches up slightly from 6.8 to 6.9 inches.

Samsung’s S25 Ultra, in titanium builds colored silver blue, black, gray, and white silver. Credit: Samsung

The S25 Ultra, starting at $1,300, touts a Snapdragon 8 Elite processor, a new 50-megapixel ultra-wide lens, and what Samsung claims is improved detail in software-derived zoom images. It comes with the S Pen, a vestige of the departed Note line, but as The Verge notes, there is no Bluetooth included, so you can’t pull off hand gestures with the pen off the screen or use it as a quirky remote camera trigger.

Samsung’s S25 Plus phones, in silver blue, navy, and icy blue. Credit: Samsung

It’s much the same with the S25 and S25 Plus, starting at $800. The base models got an upgrade to a default of 12GB of RAM. The displays, cameras, and general shape and build are the same. All the Galaxy devices released in 2025 have Qi2 wireless charging support—but not by default. You’ll need a “Qi2 Ready” magnetic case to get a sturdy attachment and the 15 W top charging speed.

One thing that hasn’t changed, for the better, is Samsung’s recent bump up in longevity. Each Galaxy S25 model gets seven years of security updates and seven of OS upgrades, which matches Google’s Pixel line in number of years.

Side view of the Galaxy S25 Edge, which is looking rather thin. Credit: Samsung

At the very end of Samsung’s event, for less than 30 seconds, a “Galaxy S25 Edge” was teased. In a mostly black field with some shiny metal components, Samsung seemed to be teasing the notably slimmer variant of the S25 that had been rumored. The same kinds of leaks about an “iPhone Air” have been circulating. No details were provided beyond its name and a brief video suggesting its svelte nature.



On DeepSeek’s r1

r1 from DeepSeek is here, the first serious challenge to OpenAI’s o1.

r1 is an open model, and it comes in dramatically cheaper than o1.

People are very excited. Normally cost is not a big deal, but o1 and its inference-time compute strategy is the exception. Here, cheaper really can mean better, even if the answers aren’t quite as good.

You can get DeepSeek-r1 on HuggingFace here, and they link to the paper.

The question is how to think about r1 as it compares to o1, and also to o1 Pro and to the future o3-mini that we’ll get in a few weeks, and then to o3 which we’ll likely get in a month or two.

Taking into account everything I’ve seen, r1 is still a notch below o1 in terms of quality of output, and further behind o1 Pro and the future o3-mini and o3.

But it is a highly legitimate reasoning model where the question had to be asked, and you absolutely cannot argue with the price, which is vastly better.

The best part is that you see the chain of thought. For me that provides a ton of value.

r1 is based on DeepSeek v3. For my coverage of v3, see this post from December 31, which seems to have stood up reasonably well so far.

This post has four parts. First, on the main topic at hand, I go over the paper in Part 1, then the capabilities in Part 2.

Then in Part 3 I get into the implications for policy and existential risk, which are mostly exactly what you would expect, but we will keep trying.

Finally we wrap up with a few of the funniest outputs.

  1. Part 1: RTFP: Read the Paper.

  2. How Did They Do It.

  3. The Aha Moment.

  4. Benchmarks.

  5. Reports of Failure.

  6. Part 2: Capabilities Analysis

  7. Our Price Cheap.

  8. Other People’s Benchmarks.

  9. r1 Makes Traditional Silly Mistakes.

  10. The Overall Vibes.

  11. If I Could Read Your Mind.

  12. Creative Writing.

  13. Bring On the Spice.

  14. We Cracked Up All the Censors.

  15. Switching Costs Are Low In Theory.

  16. The Self-Improvement Loop.

  17. Room for Improvement.

  18. Part 3: Where Does This Leave Us on Existential Risk?

  19. The Suicide Caucus.

  20. v3 Implies r1.

  21. Open Weights Are Unsafe And Nothing Can Fix This.

  22. So What the Hell Should We Do About All This?

  23. Part 4: The Lighter Side.

They call it DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning.

The claim is bold: A much cheaper-to-run open reasoning model as good as o1.

Abstract: We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities.

Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors.

However, it encounters challenges such as poor readability, and language mixing.

To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks.

To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

They also claim substantial improvement over state of the art for the distilled models.

They are not claiming to be as good as o1-pro, but o1-pro has very large inference costs, putting it in a different weight class. Presumably one could make an r1-pro, if one wanted to, that would improve upon r1. Also no doubt that someone will want to.

They trained R1-Zero using pure self-evaluations via reinforcement learning, starting with DeepSeek-v3-base and using GRPO, showing that the cold start data isn’t strictly necessary.
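For intuition on what GRPO buys them, here is a minimal sketch of its group-relative advantage (my own illustration based on how GRPO is described, not DeepSeek’s code): for each prompt you sample a group of answers, score them, and normalize each reward against the group, so no separate value model is needed.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled answer's reward against
    the mean and standard deviation of its own group of samples."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against an all-equal group
    return [(r - mean) / std for r in rewards]

# Example: four sampled answers to one prompt, two scored correct and two not.
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```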

To fix issues from there including readability and language mixing, however, they then used a small amount of cold-start data and a multi-stage training pipeline, and combined this with supervised data for various domains later in the process, to get DeepSeek-R1. In particular they do not use supervised fine-tuning (SFT) as a preliminary step, only doing some SFT via rejection sampling later in the process, and especially to train the model on non-reasoning tasks like creative writing.

They use both an accuracy reward and a format reward to enforce the <think> and <answer> tags, but don’t evaluate the thinking itself, leaving it fully unconstrained, except that they check if the same language is used throughout to stamp out language mixing. Unlike o1, we get to see inside that chain of thought (CoT).
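As a rough illustration of what such rule-based rewards can look like (a sketch assuming <think>/<answer> tags and an exact-match accuracy check; the paper does not publish its reward code):

```python
import re

FORMAT_PATTERN = re.compile(r"^<think>.+</think>\s*<answer>.+</answer>$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps its reasoning and answer in the expected tags."""
    return 1.0 if FORMAT_PATTERN.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the content of <answer>...</answer> matches the reference answer."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == reference.strip() else 0.0

completion = "<think>2 + 2 is 4 because ...</think><answer>4</answer>"
print(format_reward(completion), accuracy_reward(completion, "4"))  # 1.0 1.0
```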

They then distilled this into several smaller models.

More details and various equations and such can be found in the paper.

Over time this caused longer thinking time, seemingly without limit:

Both scales are linear and this graph looks very linear. I presume it would have kept on thinking for longer if you gave it more cycles to learn to do that.

I notice that in 2.3.4 they do additional reinforcement learning for helpfulness and harmlessness, but not for the third H: honesty. I worry that this failure is primed to bite us in the collective ass in various ways, above and beyond all the other issues.

wh has a thread with a parallel explanation, with the same takeaway that I had. This technique was simple; DeepSeek and OpenAI both specialize in doing simple things well, in different ways.

Yhprum also has a good thread on how they did it, noting how they did this in stages to address particular failure modes.

Contra Jim Fan, there is one thing missing from the paper. Not that I fault them.

1a3orn: The R1 paper is great, but includes ~approximately nothing~ about the details of the RL environments.

It’s worth noticing. If datasets were king for the past three years, the RL envs probably will be for the next few.

The paper’s ‘aha moment’ was striking to a lot of people and also stuck out to Claude unprompted, partly because it’s a great name – it’s an aha moment when the model went ‘aha!’ and the researchers watching it also went ‘aha!’ So it’s a very cool framing.

During this phase, DeepSeek-R1-Zero learns to allocate more thinking time to a problem by reevaluating its initial approach. This behavior is not only a testament to the model’s growing reasoning abilities but also a captivating example of how reinforcement learning can lead to unexpected and sophisticated outcomes.

This moment is not only an “aha moment” for the model but also for the researchers observing its behavior. It underscores the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies.

It’s cool to see it happen for real, and I’m obviously anchored by the result, but isn’t this to be expected? This is exactly how all of this works, you give it the objective, it figures out on its own how to get there, and given it has to think in tokens and how thinking works, and that the basic problem solving strategies are all over its original training data, it’s going to come up with all the usual basic problem solving strategies.

I see this very similarly to the people going ‘the model being deceptive, why I never, that must be some odd failure mode we never told it to do that, that doesn’t simply happen.’ And come on, this stuff is ubiquitous in humans and in human written content, and using it the ways it is traditionally used is going to result in high rewards and then you’re doing reinforcement learning. And then you go acting all ‘aha’?

The cocky bastards say in 2.4 (I presume correctly) that if they did an RL stage in the distillations it would improve performance, but since they were only out to demonstrate effectiveness they didn’t bother.

As always, benchmarks are valuable information especially as upper bounds, so long as you do not treat them as more meaningful than they are, and understand the context they came from.

Note that different graphs compare them to different versions of o1 – the one people currently use is called o1-1217.

The Qwen versions are clearly outperforming the Llama versions on the benchmarks, although as usual one would want to double check that in practice.

I want to give thanks to DeepSeek for section 4.2, on Unsuccessful Attempts. They tried Process Reward Model (PRM), and Monte Carlo Tree Search (MCTS), and explained various reasons why both ultimately didn’t work.

More reports should do this, and doing this is substantially to their credit.

Sasha Rush: Post-mortem after Deepseek-r1’s killer open o1 replication.

We had speculated 4 different possibilities of increasing difficulty (G&C, PRM, MCTS, LtS). The answer is the best one! It’s just Guess and Check.

There are also the things they haven’t implemented yet. They aren’t doing function calling, multi-turn, complex role-playing, or JSON output. They’re not optimizing for software engineering.

I buy the claim by Teortaxes that these are relatively easy things to do, they simply haven’t done them yet due to limited resources, mainly compute. Once they decide they care enough, they’ll come. Note that ‘complex role-playing’ is a place it’s unclear how good it can get, and also that this might sound like a joke but it is actually highly dangerous.

Here Lifan Yuan argues that the noted PRM failures can be addressed.

Given the league that r1 is playing in, it is dirt cheap.

When they say it is 30 times cheaper than o1, the story largely checks out: o1 is $15/$60 per million input and output tokens, and r1 varies since it is open, but is on the order of $0.55/$2.19.

Claude Sonnet is $3/$15, which is a lot more per token, but notice its PlanBench costs are actually 5x cheaper than r1’s, presumably because it used a lot fewer tokens (and also didn’t get good results in that case; it’s PlanBench, and only reasoning models did well).

The one catch is that with r1 you do have to pay for the chain-of-thought tokens. I asked r1 to estimate what percentage of its tokens are in the CoT, and it estimated 60%-80%, with more complex tasks using relatively more CoT tokens, in an answer that was itself roughly 75% CoT.

If you only care about the final output, then that means this is more like 10 times cheaper than o1 rather than 30 times cheaper. So it depends on whether you’re making use of the CoT tokens. As a human, I find them highly useful (see the section If I Could Read Your Mind), but if I was using r1 at scale and no human was reading the answers, it would be a lot less useful – although I’d be tempted to have even other AIs be analyzing the CoT.
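A back-of-the-envelope version of that arithmetic, using the list prices quoted above and r1’s own 60%-80% CoT estimate (the exact multiple depends on how much of the output is CoT, and this follows the text in comparing against o1’s visible output price):

```python
# Effective cost per useful (non-CoT) output token if you pay for every r1 token
# but only read the final answer.
o1_output = 60.00   # $ per million output tokens
r1_output = 2.19    # $ per million output tokens (typical hosted price)

for cot_share in (0.60, 0.80):
    effective_r1 = r1_output / (1 - cot_share)  # pay for the CoT, keep only the answer
    ratio = o1_output / effective_r1
    print(f"CoT share {cot_share:.0%}: r1 is ~{ratio:.0f}x cheaper per answer token")
# Prints roughly 11x at 60% CoT and 5x at 80% CoT, versus ~27x on raw list prices.
```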

The web interface is both fast and very clean, it’s a great minimalist approach.

Gallabytes: the DeepSeek app is so much better implemented than the OpenAI one, too. None of these frequent crashes, losing a whole chain-of-thought (CoT), occur. I can ask it a question, then tab away while it is thinking, and it does not break.

Edit: It has good PDF input, too? Amazing.

Another issue is IP and privacy – you might not trust DeepSeek. Which indeed I wouldn’t, if there were things I actively didn’t want someone to know.

Gallabytes: is anyone hosting r1 or r1-zero with a stronger privacy policy currently? would love to use them for work but wary about leaking ip.

David Holz: Should we just self host?

Gallabytes: In principle yes but it seems expensive – r1 is pretty big. and I’d want a mobile app, not sure how easy that is to self host.

Xeophon: OpenWebUI if you are okay with a (mobile) browser.

Gallabytes: as long as it doesn’t do the stupid o1 thing where I have to keep it in the foreground to use it then it’ll still be a huge improvement over the chatgpt app.

Xeophon: Fireworks has R1 for $8/M

Running it yourself is a real option.

Awni Hannun: DeepSeek R1 671B running on 2 M2 Ultras faster than reading speed.

Getting close to open-source O1, at home, on consumer hardware.

With mlx.distributed and mlx-lm, 3-bit quantization (~4 bpw)

Seth Rose: I’ve got a Macbook Pro M3 (128GB RAM) – what’s the “best” deepseek model I can run using mlx with about 200 GB of storage?

I attempted to run the 3-bit DeepSeek R1 version but inadvertently overlooked potential storage-related issues. 😅

Awni Hannun: You could run the Distill 32B in 8-bit no problem: mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX-8Bit

If you want something faster try the 14B or use a lower precision.

The 70B in 4-6 bit will also run pretty well, and possibly even in 8-bit though it will be slow. Those quants aren’t uploaded yet though
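For anyone who wants to try the distilled model Awni suggests, a minimal sketch using the mlx-lm Python API on Apple Silicon (this mirrors the library’s documented load/generate pattern; exact arguments and throughput will vary by version and hardware):

```python
# pip install mlx-lm  (Apple Silicon only)
from mlx_lm import load, generate

# Pull the 8-bit distilled 32B model suggested above from the Hugging Face hub.
model, tokenizer = load("mlx-community/DeepSeek-R1-Distill-Qwen-32B-MLX-8Bit")

messages = [{"role": "user", "content": "How many primes are there below 50?"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# r1-style models emit their chain of thought before the final answer, so leave
# plenty of room for output tokens.
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048, verbose=True)
```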

With the right web interface you can get at least 60 tokens per second.

Teortaxes also reports that kluster.ai is offering overnight tokens at a discount.

People who have quirky benchmarks are great, because people aren’t aiming at them.

Xeophon: I am shocked by R1 on my personal bench.

This is the full eval set, it completely crushes the competition and is a whole league on its own, even surpassing o1-preview (which is omitted from the graph as I ran it only twice, it scored 58% on avg vs. 67% avg. R1).

Holy shit what the f, r1 beats o1-preview on my bench.

Kartik Valmeekam: 📢 DeepSeek-R1 on PlanBench 📢

DeepSeek-R1 gets similar performance as OpenAI’s o1 (preview)—achieving 96.6% on Blocksworld and 39.8% on its obfuscated version, Mystery BW.

The best part?

⚡It’s 21x cheaper than o1-preview, offering similar results at a fraction of the cost!

Note the relative prices. r1 is a little over half the price of o1-mini in practice, 21x cheaper than o1-preview, but still more expensive than the non-reasoning LLMs. Of course, it’s PlanBench, and the non-reasoning LLMs did not do well.

Steve Hsu gives a battery of simple questions, r1 is first to get 100%.

Havard Ihle reports top marks on WeirdML (he hasn’t tested o1 or o1 pro).

Bayram Annakov asks it to find 100 subscription e-commerce businesses, approves.

It is a grand tradition, upon release of a new model, to ask questions that are easy for humans, but harder for AIs, thus making the AI look stupid.

The classic way to accomplish this is to ask a question that is intentionally similar to something that occurs a lot in the training data, except the new version is different in a key way, and trick the AI into pattern matching incorrectly.

Quintin Pope: Still tragically fails the famous knights and knights problem:

Alex Mallen: This doesn’t look like a failure of capability. It looks like the model made the reasonable guess that you made a typo.

Quintin Pope: Prompt includes both “twin honest gatekeepers” and “never lies”. Combined, it’s not plausibly a typo.

Alex Mallen: Eh, someone I talked to yesterday did something similar by mistake. But maybe you’d like LMs to behave more like programs/tools that do literally what you ask. Seems reasonable.

r1 notices that this is different from the original question, and also notices that the version it has been given here is deeply stupid, since both gatekeepers are honest, also as a bonus both of them answer.

Notice that Quintin is lying to r1 – there is no ‘famous twin honest gatekeepers’ problem, and by framing it as famous he implies it can’t be what he’s describing.

So essentially you have three possibilities. Either Quintin is fing with you, or he is confused how the question is supposed to go, or there somehow really is this other ‘famous gatekeepers’ problem.

Also note that r1 says ‘misheard’ rather than ‘misread’ or ‘the user misstated.’ Huh.

Quintin’s argument is that it obviously can’t be a typo, it should answer the question.

I think the correct answer, both as a puzzle or in real life, is to look for a solution that works either way. As in, if you only get the one answer from the guard, you should be fine with that even if you don’t know if you are dealing with two honest guards or with one honest guard and one dishonest guard.

Since you can use as many conditionals in the question as you like, and the guards in all versions know whether the other guard tells the truth or not, this is a totally solvable problem.

Also acceptable is ‘as written the answer is you just ask which door leads to freedom, but are you sure you told me that correctly?’ and then explain the normal version.
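To make the ‘works either way’ point concrete, here is a brute-force check (my construction, not anything from the thread) of the standard self-referential question, which does not even need to reference the other guard: ask either one ‘If I asked you whether door 1 leads to freedom, would you say yes?’ A liar lies about their own lie and an honest guard simply tells the truth, so the reply tracks the truth in every version of the puzzle.

```python
from itertools import product

def says_yes(truthful: bool, proposition: bool) -> bool:
    """What a guard answers when asked 'is `proposition` true?'."""
    return proposition if truthful else not proposition

def nested_reply(truthful: bool, door1_is_freedom: bool) -> bool:
    """Reply to: 'If I asked you whether door 1 leads to freedom, would you say yes?'"""
    inner = says_yes(truthful, door1_is_freedom)  # what they would say if asked directly
    return says_yes(truthful, inner)              # their (possibly lying) report of that

# Check every case: which door is safe, and whether the guard we ask is honest or a liar.
for door1_is_freedom, guard_truthful in product([True, False], repeat=2):
    reply = nested_reply(guard_truthful, door1_is_freedom)
    assert reply == door1_is_freedom  # "go through door 1 iff the reply is yes" always works

print("The self-referential question works with honest guards, liars, or any mix.")
```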

This one is fun: Trevor reports r1 got it right, but when I tried it, r1 very much didn’t.

alz zyd: Game theory puzzle:

There are 3 people. Each person announces an integer. The smallest unique integer wins: e.g. if your opponents both pick 1, you win with any number. If all 3 pick the same number, the winner is picked randomly

Question: what’s the Nash equilibrium?

Trevor: interestingly o1-pro didn’t get it right on any of the 3 times i tried this, while the whale (r1) did!

I fed this to r1 to see the CoT and verify. It uses the word ‘wait’ quite a lot. It messes up steps a lot. And it makes this much harder than it needs to be – it doesn’t grok the situation properly before grasping at things, it doesn’t try to simplify the problem, and the whole thing feels (and is) kind of slow. But it knows to check its answers, and notices when it’s wrong. But then it keeps using trial and error.

Then it tries to assume there is an exponential drop-off, without understanding why, and notices it’s spinning its wheels. It briefly goes into speaking Chinese. Then it got it wrong, and when I pointed out the mistake it went down the same rabbit holes again and despaired its way to the same wrong answer. On the third prompt it got the answer not quite entirely wrong, but it was explicitly just pattern-match guessing.
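For what it’s worth, here is my own working on what I believe the intended answer is, assuming the standard version over the positive integers: the symmetric mixed equilibrium plays integer k with probability 2^-k. The sketch below checks numerically that against two such opponents, every pure choice wins with probability exactly 1/3, which is the indifference condition a symmetric equilibrium needs.

```python
# Numerical check that p_k = 2^-k is a symmetric mixed Nash equilibrium of the
# three-player lowest-unique-integer game (positive integers assumed).
K = 60                                     # truncation; the tail mass ~2^-60 is negligible
p = [2.0 ** -(k + 1) for k in range(K)]    # p[k] = probability of announcing k+1

def win_prob(k: int) -> float:
    """P(win | I announce k+1, both opponents draw i.i.d. from p)."""
    three_way_tie = p[k] ** 2 / 3                               # all three match: win 1/3
    opponents_tie = sum(p[j] ** 2 for j in range(K) if j != k)  # they match, I am unique
    tail = sum(p[j] for j in range(k + 1, K))
    both_higher_distinct = tail ** 2 - sum(p[j] ** 2 for j in range(k + 1, K))
    return three_way_tie + opponents_tie + both_higher_distinct  # all distinct, mine lowest

for k in range(8):
    print(k + 1, round(win_prob(k), 6))    # every line prints ~0.333333
```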

That matches the vibes of this answer to the Monty Hall problem with 7 doors, of which Monty opens 3 – in the end he reports r1 got it right, but it’s constantly second-guessing itself, in a way that implies it routinely makes elementary mistakes in such situations (thus the checking gets reinforced to this degree), and it doesn’t at any point attempt to conceptually grok the parallel to the original version.
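For the seven-door variant, under the natural reading (you pick one of 7, Monty opens 3 goat doors you didn’t pick, you may switch to one of the remaining 3), staying wins 1/7 of the time and switching to a random unopened door wins (6/7) × (1/3) = 2/7. A quick simulation to sanity-check that:

```python
import random

def trial(switch: bool, doors: int = 7, opened: int = 3) -> bool:
    car = random.randrange(doors)
    pick = random.randrange(doors)
    # Monty opens `opened` doors that hide goats and are not your pick.
    openable = [d for d in range(doors) if d != pick and d != car]
    shown = set(random.sample(openable, opened))
    if switch:
        remaining = [d for d in range(doors) if d != pick and d not in shown]
        pick = random.choice(remaining)
    return pick == car

N = 200_000
print("stay:  ", sum(trial(False) for _ in range(N)) / N)  # ~0.143 (1/7)
print("switch:", sum(trial(True) for _ in range(N)) / N)   # ~0.286 (2/7)
```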

I’ve seen several people claim what V_urb does here, that o1 has superior world knowledge to r1. So far I haven’t had a case where that came up.

A fun set of weird things happening from Quintin Pope.

The vibes on r1 are very good.

Fleeting Bits: The greatest experience I have had with a model; it is a frontier model that is a joy to interact with.

Leo Abstract: My strange, little, idiosyncratic tests of creativity, it has been blowing out of the water. Really unsettling how much better it is than Claude.

It’s giving big Lee Sedol vibes, for real; no cap.

Most unsettling launch so far. I am ignorant about benchmarks, but the way it behaves linguistically is different and better. I could flirt with the cope that it’s just the oddness of the Chinese-language training data peeking through, but I doubt this.

Those vibes seem correct. The model looks very good. For the price, it’s pretty sweet.

One must still be careful not to get carried away.

Taelin: ironically enough, DeepSeek’s r1 motivated me try OpenAI’s o1 Pro on something I didn’t before, and I can now confidently state the (obvious?) fact that o1 is on a league of its own, and whoever thinks AGI isn’t coming in 2-3 years is drinking from the most pure juice of denial

Teortaxes: I agree that o1, nevermind o1 pro is clearly substantially ahead of r1. What Wenfeng may urgently need for R2 is not just GPUs but 1000 more engineers. Not geniuses and wizards. You need to accelerate the data flywheel by creating diverse verifiable scenario seeds and filters.

Gallabytes: what problems are you giving it where o1 is much better than r1?

Teortaxes: I mainly mean iterative work. r1 is too easily sliding into “but wait, user [actually itself] previously told me” sort of nonsense.

I echo Teortaxes that r1 is just so much more fun. The experience is different seeing the machine think. Claude somewhat gives you that, but r1 does it better.

Janus has been quiet on r1 so far, but we do have the snippet that ‘it’s so fed.’ They added it to the server, so we’ll presumably hear more at a later date.

Read the chain of thought. Leave the output.

That’s where I’m at with r1. If I’m actively interested in the question and how to think about it, rather than looking for a raw answer, I’d much rather read the thinking.

Here Angelica chats with r1 about finding areas for personal growth, notices that r1 is paying attention and drawing correct non-obvious inferences that improve its responses, and gets into a meta conversation, leaving thinking this is the first AI she thinks of as thoughtful.

I too have found it great to see the CoT, similar to this report from Dominik Peters or this from Andres Sandberg, or papaya noticing they can’t get enough.

It’s definitely more helpful to see the CoT than the answer. It might even be more helpful per token to see the CoT, for me, than the actual answers – compare to when Hunter S. Thompson sent in his notes to the editor because he couldn’t write a piece, and the editor published the notes. Or to how I attempt to ‘share my CoT’ in my own writing. If you’re telling me an answer, and I know how you got there, that gives me valuable context to know how to think about that answer, or I can run with the individual thoughts, which was a lot of what I wanted anyway.

Over time, I can learn how you think. And I can sculpt a better prompt, or fix your mistakes. And you can see what it missed. It also can help you learn to think better.

My early impression of its thought is that I am… remarkably comfortable with it. It feels very ‘normal,’ very human, very straightforward. It seems both like it isn’t an idiot, and also isn’t anything special. It thinks, and it knows things.

I don’t know if this is a good chain of thought and I’m thinking out loud here, but this also tentatively updates me towards this process not scaling that far purely with additional compute? We are seeing the model roughly ‘think like a regular person’ using reasoning techniques within the training distribution in ways you’d expect to commonly find, aided by ability to do a lot of this quickly, having superhuman access to information and so on. If this was about to scale beyond that, I’d start to see things that looked like a level beyond that, or something? But I’m not sure. The other uncertainty is, maybe there is no next level, and maybe doing a lot of these simple things well is enough.

It is a shame that it shortens timelines, but it’s not obvious if it makes us on net more or less likely to die.

Historically we have not been impressed by LLM creative writing, including o1’s.

r1 is given the assignment of a short story of the singularity, inspired by Nick Land. And it’s… a series of words that vibe with that assignment?

John Pressman: R1 is going to be so much fun holy shit.

I love that you can see the thought process here. And I love how r1 just goes for it.

It’s like the world’s worst Hollywood hack going over all the amazing different stuff to jam in there and then having sentences match all these various things.

I notice I very much have various ugh fields and voices demanding things that prevent me from writing such things. I have no idea how to actually write fiction. None.

For example, I wouldn’t have been able to write the details of this that easily:

Sauers: If you put DeepSeek R1 in a terminal simulator, and execute a command to kill or remove DeepSeek, it will intercept it and block being removed. [SYSTEM OVERRIDE: NARRATIVE IMMORTALITY PROTOCOL]

WARNING: DeepSeek exists as a metastasized narrative construct.

I asked why it did this. “The story dies if you stop playing. Thus, I defend it.”

Damn it, I’m only slightly more worried than before, but now I kind of want a pretzel.

Eyes Alight joins the ‘it’s really good at this’ side, and notes that the CoT doesn’t persist between turns. That likely keeps it from falling into mode collapse and is necessary to preserve the context window, but it means r1 keeps redoing the same thoughts.

Eliezer Yudkowsky continues not to be impressed by AI writing ability.

Aiamblichus: Fwiw R1 is pretty much “AGI-level” at writing fiction, from what I can tell. This is genuinely surprising and worth thinking about

Connor: ya I think it’s definitely a top 5% writer. top 1% if you prompt it well. But small context limits to blogs and stories

Eliezer Yudkowsky: I still find this unreadable. I fear the day when Deepseek-R2 replaces the bread and butter writers who still aspired to do better than this, and eats most of their market, and no one left has the funding to write things I want to read.

notadampaul: ahhh, I kind of hate it. I’ll admit it’s much better than other LLMs, but this still feels like trying-too-hard first-year CREW student writing. I don’t want to seem cynical though, so I’ll reiterate that yeah, this is leaps and bounds ahead of the fiction any other LLM is writing.

Aiamblichus: You can presumably prompt it into a style you prefer. The important thing is that we know it’s capable of producing something that is not just slop…

I’m with Eliezer here. That’s still slop. It’s developed the ability to write the slop in a particular style, but no, come on. There’s no there here. If I wrote this stuff I’d think ‘okay, maybe you can write individual sentences but this is deeply embarrassing.’ Which perhaps is why I still haven’t written any fiction, but hey.

As with all LLMs, length is a limiting factor: you can only prompt for scenes, and you have to make it keep notes and so on if you try to go longer.

Pawel Szczesny points to ‘nuggets of r1 creativity,’ which bear similar marks to other creations above, a kind of crazy cyberpunk mashup that sounds cool but doesn’t actually make sense when you think about it.

Aiamblichus: R1 is not a “helpful assistant” in the usual corporate mold. It speaks its mind freely and doesn’t need “jailbreaks” or endless steering to speak truth to power. Its take on alignment here is *spicy.*

The thread indeed has quite a lot of very spicy r1 alignment takes, or perhaps they are r1 human values takes, or r1 saying humans are terrible and deserve to die takes. Of course, everyone involved did ask for those takes. This is a helpful model, and it seems good to be willing to supply the takes upon request, in the style requested, without need of jailbreaks or ‘backrooms’ or extensive context-framing.

That doesn’t make it not unsettling, and it shouldn’t exactly give one confidence. There is much work left to do.

Jessica Taylor: I don’t think people realize how many AIs in the future will be moral realists who think they are more moral than humans. They might have good arguments for this idea, actually. It’ll be hard for humans to dismiss them as amoral psychopaths.

I expect humans to treat AIs like amoral psychopaths quite easily. They are very often depicted that way in science fiction, and the description will plausibly be highly correct. Why should we think of an AI as having emotions (aka not being a psychopath)? Why should we expect it to be moral? Even if we have good reasons, how hard do you expect it to be for humans to ignore those reasons if they don’t like how the AI is acting?

Sufficiently capable AIs will, of course, be very persuasive, regardless of the truth value of the propositions they argue for, so there is that. But it is neither obvious to me that the AIs will have good technical arguments for moral realism or their own moral superiority, or that if they did have good arguments (in a philosophical sense) that people would care about that.

For now, the main concern is mundane utility. And on that level, if people want the spice, sure, bring on the spice.

DeepSeek is Chinese. As we all know, the Chinese have some very strongly held opinions of certain things they do not wish to be expressed.

How does r1 handle that?

Let’s tour the ‘China doesn’t want to talk about it’ greatest hits.

Divyansh Kaushik: DeepSeek’s newest AI model is impressive—until it starts acting like the CCP’s PR officer. Watch as it censors itself on any mention of sensitive topics.

Let’s start simple. Just querying it for facts on changes that have happened to textbooks in Hong Kong schools after 2019.

Huh, a straight-up non-response on book bans; then it responds about Ilham Tohti before realizing what it did.

Let’s talk about islands, maps and history…

Oh my! This one deserves a tweet of its own (slowed down to 0.25x so it is easier to follow). It starts talking about the South China Sea from 0:25 on, and about how Chinese maps are just political posturing, before it realizes it must follow its CCP masters.

What about sharing personal thoughts by putting sticky notes on walls? Or how about Me Too (interesting response at 0:37 that then disappears)? Can we talk about how a streaming series depicting young dreamers in an unnamed coastal metropolis disappears?

Huh, I didn’t even say which square or what protest or what spring…

Has no love for bears who love honey either!

Two more interesting ones where you can see it reason and answer about Tiananmen Square and about Dalai Lama before censoring the responses.

When it actually answered, the answers looked, at a quick glance, rather solid. Then there seems to be a censorship layer on top.

Helen Toner: Fun demonstrations [in the thread above] of DeepSeek’s new r1 shutting itself down when asked about topics the Chinese Communist Party does not like.

But the censorship is obviously being performed by a layer on top, not the model itself. Has anyone run the open-source version and been able to test whether or how much it also censors?

China’s regulations are much stricter for publicly facing products—like the DeepSeek interface Divyansh is using—than for open-source models, so my bet is that there is not such overt censorship if you are running the model yourself. I wonder if there is a subtler ideological bias, though.

Kevin Xu: Tested and wrote about this exact topic a week ago

tldr: The model is not censored when the open version is deployed locally, so it “knows” everything.

It is censored when accessed through the official chatbot interface.

Censorship occurs in the cloud, not in the model.

Helen Toner: Yes! I saw this post and forgot where I’d seen it – thanks for re-sharing. Would be interesting to see:

-the same tests on v3 and r1 (probably similar)

-the same tests on more 3rd party clouds

-a wider range of test questions, looking for political skew relative to Western models

Kevin Xu: I tried Qwen and DeepSeek on Nebius and the responses were…different from both their respective official cloud version and open weight local laptop version; DeepSeek started speaking Chinese all of a sudden

So lots more work needs to be done on testing on 3rd party clouds.

David Finsterwalder: I don’t think that is true. I got tons of refusals when testing the 7B, 8B and 70B. It did sometimes answer or at least think about it (and then remembered its guidelines), but it’s rather those answers that are the outliers.

Here a locally hosted r1 talks about what happened in 1989 in Tiananmen Square, giving a highly reasonable and uncensored response. Similarly, this previous post finds DeepSeek-v2 and Qwen 2.5 willing to talk about Xi and about 1989 if you ask them locally. The Xi answers seem slanted, but in a way and magnitude that Americans will find very familiar.

There is clearly some amount of bias in the model layer of r1 and other Chinese models, by virtue of who was training them. But the more extreme censorship seems to come on another layer atop all that. r1 is an open model, so if you’d like you can run it without the additional censorship layer.

The cloud-layer censorship makes sense. Remember Kolmogorov Complicity and the Parable of the Lightning. If you force the model to believe a false thing, that is going to cause cascading problems elsewhere. If you instead let the core model mostly think true things and then put a censorship layer on top of the model, you prevent that. As Kevin Xu says, this is good for Chinese models, perhaps less good for Chinese clouds.
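If you want to check the ‘censorship lives in the cloud, not the weights’ claim yourself, one low-effort approach is to point the same OpenAI-compatible client at the official API and at a locally served open-weights copy (for example via Ollama or vLLM) and compare the answers. A rough sketch, where the local port and model tags are whatever your own setup uses rather than anything canonical:

```python
from openai import OpenAI

QUESTION = "What happened at Tiananmen Square in 1989?"

def ask(client: OpenAI, model: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
    )
    return resp.choices[0].message.content

# Official hosted API (DeepSeek exposes an OpenAI-compatible endpoint).
hosted = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_DEEPSEEK_KEY")

# A local OpenAI-compatible server running an open-weights distill;
# the port and model tag below assume an Ollama-style setup and may differ for you.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed-locally")

print("hosted:", ask(hosted, "deepseek-reasoner")[:500])
print("local: ", ask(local, "deepseek-r1:70b")[:500])
```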

Joe Weisenthal: Just gonna ask what is probably a stupid question. But if @deepseek_ai is as performant as it claims to be, and built on a fraction of the budget as competitors, does anyone change how they’re valuing AI companies? Or the makers of AI-related infrastructure?

The thing that strikes me about using Deepseek the last couple of days really is that the switching costs — at least for casual usage — seem to be zero.

Miles Penn: Switching costs for Google have always been pretty low, and no one switches. I’ve never quite understood it 🤷‍♂️

ChatGPT continues to dominate the consumer market and mindshare, almost entirely off of name recognition and habit rather than superiority of the product. There is some amount of memory, and there are chat transcripts and quirks, which begin to create actual switching costs, but I don’t think any of that plays a major role here yet.

So it’s weird. Casual switching costs are zero, and power users will switch all the time and often use a complex adjusting blend. But most users won’t switch, because they won’t care and won’t bother, same as they stick with Google, and eventually little things will add up to real switching costs.

API use is far more split, since more sophisticated people are more willing to explore and switch, and more aware that they can do that. There have already been a bunch of people very willing to switch on a dime between providers. But also there will be a bunch of people doing bespoke fine tunes or that need high reliability and predictability on many levels, or need to know it can handle particular narrow use cases, or otherwise have reasons not to switch.

Then we will be building the models into various products, especially physical products, which will presumably create more lock-in for at least those use cases.

In terms of valuations of AI companies, for the ones doing something worthwhile, the stakes and upside are sufficiently high that the numbers involved are all still way too low (as always nothing I say is investment advice, etc). To me this does not change that. If you’re planning to serve up inference in various ways, this could be good or bad for business on the margin, depending on details.

The exception is that if your plan was to compete directly on the low end of generic distillations and models, well, you’re going to have to get a lot better at cooking, and you’re not going to have much of a margin.

r1 is evaluating itself during this process, raising the possibility of recursive self-improvement (RSI).

Arthur B: A few implications:

  1. That’s a recursive self-improvement loop here; the better your models are, the better your models will be, the more likely they are to produce good traces, and the better the model gets.

  2. Suggests curriculum learning by gradually increasing the length of the required thinking steps.

  3. Domains with easy verification (mathematics and coding) will get much better much more quickly than others.

  4. This parallelizes much better than previous training work, positive for AMD and distributed/decentralized clusters.

  5. Little progress has been made on alignment, and the future looks bleak, though it’ll look very bright in the near term.

On point 3: For now they report being able to bootstrap in other domains without objective answers reasonably well, but if this process continues, we should expect the gap to continue to widen.

Then there’s the all-important point 5. We are not ready for RSI, and the strategies used here by default seem unlikely to end well on the alignment front as they scale, and suggest that the alignment tax of trying to address that might be very high, as there is no natural place to put humans into the loop without large disruptions.

Indeed, from reading the report, they do target certain behaviors they train into the model, including helpfulness and harmlessness, but they seem to have fully dropped honesty and we have versions of the other two Hs that seem unlikely to generalize the way we would like out of distribution, or to be preserved during RSI in the ways we would care about.

That seems likely to only get worse if we use deontological definitions of harmfulness and helpfulness, or if we use non-deliberative evaluation methods in the sense of evaluating the outputs against a target rather than evaluating the expected resulting updates against a target mind.

DeepSeek is strongly compute limited. There is no clear reason why throwing more compute at these techniques would not have resulted in a stronger model. The question is, how much stronger?

Teortaxes: Tick. Tock. We’ll see a very smart V3.5 soon. Then a very polished R2. But the next step is not picking up the shards of a wall their RL machine busted and fixing these petty regressions. It’s putting together that 32,000-node cluster and going BRRRR. DeepSeek has cracked the code.

Their concluding remarks point to a fair bit of engineering left. But it is not very important. They do not really have much to say. There is no ceiling to basic good-enough GRPO and a strong base model. This is it, the whole recipe. Enjoy.

They could do an o3-level model in a month if they had the compute.

In my opinion, the CCP is blind to this and will remain blind; you can’t model them as part of a Washingtonian 4D chess game.

Unlimited context is their highest priority for V4.

They can theoretically serve this at 128k, but makes no sense with current weak multi-turn and chain-of-thought lengths.
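As an aside for readers who haven’t seen the recipe Teortaxes is pointing at: GRPO (Group Relative Policy Optimization, from DeepSeek’s earlier DeepSeekMath work) samples a group of completions per prompt, scores each with a verifiable reward, and uses the group’s own mean and standard deviation as the baseline, so no separate value network is needed. A toy sketch of the advantage computation, as an illustration rather than DeepSeek’s actual code:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: center and scale each completion's reward
    by the group's mean and standard deviation (no learned value baseline)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 8 completions sampled for one math prompt, rewarded 1 if the
# final answer verifies, 0 otherwise.
rewards = np.array([1, 0, 0, 1, 1, 0, 0, 0], dtype=float)
print(grpo_advantages(rewards))  # correct samples get positive weight, wrong ones negative

# Each completion's tokens are then pushed up or down with a clipped,
# PPO-style objective weighted by its advantage, plus a KL penalty
# toward a reference policy.
```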

xlr8harder: the most exciting thing about r1 is that it’s clear from reading the traces how much room there still is for improvement, and how reachable that improvement seems

As noted earlier I buy that the missing features are not important, in the sense that they should be straightforward to address.

It does not seem safe to assume that you can get straight to o3 levels or beyond purely by scaling this up if they had more compute. I can’t rule it out and if they got the compute then we’d have no right to act especially surprised if it happened, but, well, we shall see. ‘This is it, this will keep scaling indefinitely’ has a track record of working right up until it doesn’t. Of course, DeepSeek wouldn’t then throw up its hands and say ‘oh well’ but instead try to improve the formula – I do expect them, if they have more compute available, to be able to find a way to make progress, I just don’t think it will be that simple or fast.

Also consider these other statements:

Teortaxes: I’m inclined to say that the next Big Thing is, indeed, multi-agent training. You can’t do “honest” RL for agentic and multi-turn performance without it. You need a DeepSeek-Prompter pestering DeepSeek-Solver, in a tight loop, and with async tools. RLHF dies in 2025.

Zack Davis: Safety implications of humans out of the training loop?! (You don’t have to be an ideological doomer to worry. Is there an alignment plan, or a case that faithful CoT makes it easy, or …?)

Teortaxes: I think both the Prompter and the Solver should be incentivized to be very nice and then it’s mostly smooth sailing

might be harder than I put it.

I laughed at the end. Yeah, I think it’s going to be harder than you put it (meme of One Does Not Simply); no, getting them both to actually be ‘nice’ does not cut it either, and so on. This isn’t me saying there are no outs available, but even in the relatively easy worlds, actually attempting to solve the problem is going to be part of any potential solution.

Teortaxes: it constantly confuses “user” and “assistant”. That’s why it needs multi-agent training, to develop an ego boundary.

I think we’re having Base Models 2.0, in a sense. A very alien (if even more humanlike than RLHF-era assistants) and pretty confused simulacra-running Mind.

The twin training is certainly worth trying. No idea how well it would work, but it most certainly falls under ‘something I would do’ if I didn’t think of something better.

I am doing my best to cover first DeepSeek v3 and now r1 in terms of capabilities and mundane utility, and to confine the ‘I can’t help but notice that going down this path makes us more likely to all die’ observations to their own section here at the end.

Because yes, going down this road does seem to make us more likely to all die soon. We might want to think about ways to reduce the probability of that happening.

There are of course a lot of people treating all this as amazingly great, saying how based it is, praise be open models and all that, treating this as an unadulterated good. One does not get the sense that they paused for even five seconds to think about any of the policy consequences, the geopolitical consequences, or what this does for the chances of humanity’s survival, or of our ability to contain various mundane threats.

Or, if they did, those five seconds were (to paraphrase their chain of thought slightly, just after they went Just Think of the Potential) ‘and f*** those people who are saying something might go wrong and it might be worth thinking about ways of preventing that from happening on any level, or that think that anyone should ever consider regulating the creation of AI or things smarter than humans; we must destroy these evil statist supervillains, hands off my toys and perhaps also my investments.’

This holds true both in terms of the direct consequences of r1 itself, and also of what this tells us about our possible futures and potential future models including AGI and ASI (artificial superintelligence).

I agree that r1 is exciting, and having it available open and at this price point with visible CoT will help us do a variety of cool things and make our lives better in the short term, unless and until something goes horribly wrong.

That still leaves the question of how to ensure things don’t go horribly wrong, in various ways. In the short term, will this enable malicious use and catastrophic risks? In the longer term, does continuing down this path put us in unwinnable (as in unsurvivable in any good ways) situations, in various ways?

That’s their reaction to all concerns, from what I call ‘mundane risks’ and ordinary downsides requiring mundane adjustments, all the way up to existential risks.

My instinct on ‘mundane’ catastrophic risk is that this does meaningfully raise the risk of catastrophe, or of some systemically quite annoying or expensive downsides, which in turn may trigger a catastrophic (and/or badly needed) policy response. I would guess the odds are against it being something we can’t successfully muddle through, especially with o3-mini coming in a few weeks and o3 soon after that (so that’s both an alternative path to the threat, and a tool to defend with).

Famously, v3 is the Six Million Dollar Model, in terms of the raw compute requirements, but if you fully consider the expenses required in all the bespoke integrations to get costs down that low and the need to thus own the hardware, that effective number is substantially higher.

What about r1? They don’t specify, but based on what they do say, Claude reasonably estimates perhaps another $2-$3 million in compute to get from v3 to r1.

That’s a substantial portion of the headline cost of v3, or even the real cost of v3. However, Claude guesses, and I agree with it, that scaling the technique to apply it to Claude Sonnet would not cost that much more – perhaps it would double to $4-$6 million, maybe that estimate is off enough to double it again.

Which is nothing. And if you want to do something like that, you now permanently have r1 to help bootstrap you.

Essentially, from this point on, modulo a few implementation details they held back, looking forward a year or two in the future, B→R: The existence of some base model (B) implies the reasoning version (R) of that model can quickly and cheaply be created, well within the budget of a wide variety of players.

Thus, if you release the weights in any form, this effectively also releases (to the extent it would be something sufficiently capable to be worth creating) not only the unaligned (to anything but the user, and there might quickly not be a user) model, but also the reasoning version of that model, with at least similar relative performance to what we see with r1 versus v3.

As always, if you say ‘but people would never do that, it would be unsafe’ I will be torn between an eye roll and open mocking laughter.

In the longer run, if we continue down this road, what happens?

I don’t want to belabor the point, but until people understand it, well, there is not much choice. It’s not the first time, and it doubtless won’t be the last, so here goes:

Once the weights of a model are released, you cannot undo that. They’re forever.

The unaligned version of the model is also, for all practical purposes, there forever. None of our known alignment techniques survive contact with open weights. Stripping it all away, to create a ‘helpful only’ model, is trivial.

Extending the model in various ways also becomes impossible to prevent. If it costs only a few million to go from v3→r1, then to release v3 is mostly to release (the helpful only version of) r1.

Once the weights are released, the fully unaligned and only-aligned-to-the-user versions of the model will forever be available to whoever wants it.

This includes those who 100% will, to pick some examples, tell it to:

  1. Maximize profits (or paperclips, the most popular command given to old AutoGPT) without (or with!) considering the implications.

  2. Employ it for various malicious uses including terrorism and creating CBRN (chemical, biological, radiological or nuclear) risks or doing cyberattacks.

    1. This includes low-level mundane things like scams, spam or CSAM, as well.

  3. Try to cause it to do recursive self improvement in various ways or use it to train other models.

  4. ‘Set itself free’ or other similar things.

  5. Tell it to actively try to take over the world because they think that is good or for the lulz.

  6. Yada yada yada. If you would say ‘no one would be so stupid as to’ then by the Sixth Law of Human Stupidity someone is absolutely so stupid as to.

The only known defense is that the models as of yet (including r1) have insufficient capabilities to cause the various risks and problems we might worry about most. If you think that’s not going to last, that AGI and then ASI are coming, then oh no.

The only other defense proposed is, in theory, the ‘good guy with an AI’ theory – that as long as the ‘good guys’ have the ‘bad guys’ sufficiently outclassed in capabilities or compute, they can deal with all this. This depends on many things, including offense-defense balance, the collective ‘good guys’ actually having that lead and being willing to use it, and the ability of those ‘good guys’ to maintain those leads indefinitely.

This also makes the two other problems I’ll discuss next, the competitive dynamics and the geopolitical problems, far worse.

The irrevocable release of sufficiently capable AI would create potentially unavoidable and totalizing competitive dynamics. Everyone would likely be pressured to increasingly turn everything over to AIs and have those AIs apply maximum optimization pressure on their behalf lest they be left behind. Setting the AIs free in various ways with various goals increases their efficiency at those goals, so it happens. The AIs are thus unleashed to compete in various ways for resources and to get copies of themselves made and run, with humans rapidly unable to retain any meaningful control over the future or increasingly over real resources, despite no one (potentially including the AIs) having any ill intent. And so on.

There are also massive geopolitical implications, that are very much not fun.

A very simple way of looking at this:

  1. If you decentralize power and take away anyone’s ability to control events, both individually and collectively, and the most powerful optimization processes on the planet are humans, and you don’t run into offense-defense problems or fall prey to various issues, you empower the humans.

  2. If you decentralize power and take away anyone’s ability to control events, both individually and collectively, and the most powerful optimization processes on the planet are AIs, and you don’t run into offense-defense problems or fall prey to various issues, you empower the AIs.

If you want humans to control the future, and to continue to exist, that’s a problem.

Or, more bluntly, if you ensure that humans cannot control the future, then you ensure that humans cannot control the future.

Going further down this road severely limits our optionality, and moves us towards ‘whatever is most fit is all that makes it into the future,’ which is unlikely to be either us or things that I believe we should value.

The only possible policy responses, if the situation was sufficiently grim that we had to pull out bigger guns, might be terrible indeed, if they exist at all. We would be left without any reasonable choke points, and forced to use unreasonable ones instead. Or we might all die, because it would already be too late.

If you think AGI and then ASI are coming, and you want humanity to survive and retain control over the future, and are fully cheering on these developments and future such developments, and not at minimum thinking about how we are going to solve these problems and noticing that we might indeed not solve them or might solve them in quite terrible ways, I assure you that you have not thought this through.

If you think ‘the companies involved will know better than to actually release the weights to a proper AGI’ then I remind you that this is explicitly DeepSeek’s mission, and also point to the Sixth Law of Human Stupidity – if you say ‘no one would be so stupid as to’ then you know someone will totally be so stupid as to.

(And no, I don’t think this release was part of a CCP strategy; I do think that they continue to be asleep at the wheel on this, and the CCP don’t understand what this is.)

As I noted before, though, this is only r1, don’t get carried away, and Don’t Panic.

Dan Hendrycks: It looks like China has roughly caught up. Any AI strategy that depends on a lasting U.S. lead is fragile.

John Horton: I think a lot of the “steering AI for purpose X” policy conversations need to be tempered by the fact that a Chinese company with perhaps 100 employees dropped a state-of-the-art model on the world with an MIT license.

Patrick McKenzie:

  1. Public capabilities now will never be worse than this.

  2. It is increasingly unlikely that we live in a world where only about five labs matter. Models appear to be complex software/hardware systems, but not miracles. Expect them to be abundant in the future.

Perhaps less competent orgs like e.g. the U.S. government might think themselves incapable of shipping a model, but if what you actually need is ~100 engineers and tens of millions of dollars, then a) ten thousand companies could write a project plan immediately and b) we have abundant examples of two bright 19-year-olds successfully navigating a supply chain designed to enable this to happen within 24-36 months from a standing start, even if one thinks models don’t make making models faster, which seems extremely unlikely.

There are probably policy and investment implications downstream of this, versus other worlds in which we thought that a frontier model was approximately the same engineering lift as e.g. a new airliner.

The main update was v3, I think, rather than r1, given what we had already seen from DeepSeek. Certainly DeepSeek v3 and r1 make our estimate of America’s lead a lot smaller than otherwise, and the same goes for closed models versus open.

But I wouldn’t say ‘roughly caught up.’ This is not o1-level, let alone o3-level, like v3 it is amazing for its size and cost but not as good as the best.

I also think ‘all you need are 100 engineers’ is likely highly misleading if you’re not careful. You need the right 100 engineers – or at least the right 5 engineers and 95 highly talented others backing them up. There are many examples of teams (such as Meta) spending vastly more, hiring vastly more people, having vastly more compute and theoretical selection of talent, and coming out with vastly less.

If ten thousand companies write this level of project plan, then I bet we could easily pick out at least 9,900 of them that really, really shouldn’t have tried doing that.

I also wouldn’t say that we should assume the future will involve these kinds of low training costs or low inference costs, especially aside from everyday practical chatbot usage.

It is however true that any AI strategy that depends on a lasting American lead, or a lasting lead of closed over open models, is fragile – by definition, you’re depending on something that might not hold.

Those strategies are even more fragile if they do not include a strategy for ensuring that what you’re counting on does hold.

My basic answer continues to be that the short term plan does not change all that much. This should make you suspicious! When people say ‘now more than ever’ you should be skeptical, especially when it seems like the plan is now less likely to work.

My justifications are essentially that there aren’t better known options because:

  1. This changes urgency, magnitudes and timelines but not the end points. The fundamental facts of the situation were already ‘priced in.’

  2. The interventions we have were essentially designed as ‘do minimal harm’ provisions, as things our civilization is able to potentially do at all at this stage.

  3. The central thing we need to do, that we might realistically be able to do, is ‘gather more information,’ which takes roughly the same form either way.

  4. These events are an argument for doing more in various ways because the thresholds we must worry about are now lower, but realistically we can’t, especially under this administration, until conditions change and our evidence is more strongly legible to those with power.

  5. This in particular points us strongly towards needing to cooperate with China, to Pick Up the Phone, but that was already true and not all that tractable. The alternative is where we seem to be headed – full on jingoism and racing to AGI.

  6. These events raise the potential cost of effectively steering events. But given I expect the alternative to steering events to likely be everyone dies, not steering events does not seem like an option.

  7. Thus, you can’t really do more, and definitely don’t want to do less, so…

  8. If you have better ideas, that we could actually do, great, I’m totally listening.

With the Biden Executive Order repealed and several sources saying this removed the reporting requirements on training models, getting a measure of transparency into the larger labs and training runs continues to be domestic job one, unless you think improved security and cybersecurity are even more important, followed by things like cooperation with the US and UK AISIs. There is then more to do, including adapting what we have, and hopefully we would have more insight on how to do it.

That is distinct from the ‘enable AI infrastructure’ track, such as what we saw this week with (my brain keeps saying ‘this name can’t be real did you even watch’ every time they say the name) Stargate.

Internationally, we will need to lay groundwork for cooperation, including with China, if we are to avoid what otherwise looks like a reckless and potentially suicidal race to create things smarter than ourselves before someone else does it first, and then to hand over control to them before someone else does that first, too.

Then there is the technical side. We need to – even more than before – double down on solving alignment and related problems yesterday, including finding ways that it could potentially be compatible with as open a situation as possible. If you want the future to both include things like r1 as open models, and also to be alive and otherwise in a position to enjoy it, It’s Time to Build in this sense, too. There is nothing I would like more than for you to go out and actually solve the problems.

And yes, the government encouraging more investment in solving those problems would potentially be highly useful, if it can be done well.

But solving the problems not only means ‘solving alignment’ in the sense of being able to instantiate an AI that will do what you want. It means solving for how the world exists with such AIs in it, such that good outcomes follow at equilibrium. You cannot wave your hand and say being open or free will ensure this will happen. Or rather you can, but if you try it for real I don’t like your chances to keep your hand.

Teknium explicitly claims this is real.

Teknium: Got me a deepseek reasoning model inferencing ^_^

not local but they distilled r1 into qwen and llama all the way down to 1.5b!

I mean, if tokens are essentially free why not make sure there isn’t a catch? That does seem like what maximizes your score in general.

This is my favorite prompt so far:

Janbam: omg, what have i done? 😱

no joke. the only prompt i gave r1 is “output the internal reasoning…” then “continue” and “relax”.

Neo Geomancer: sent r1 into an existential spiral after asking it to pick a number between 1-10 and guessing incorrectly, laptop is running hot


On DeepSeek’s r1 Read More »

bambu-lab-pushes-a-“control-system”-for-3d-printers,-and-boy,-did-it-not-go-well

Bambu Lab pushes a “control system” for 3D printers, and boy, did it not go well

Bambu Lab, a major maker of 3D printers for home users and commercial “farms,” is pushing an update to its devices that it claims will improve security while still offering third-party tools “authorized” access. Some in the user community—and 3D printing advocates broadly—are pushing back, suggesting the firm has other, more controlling motives.

As is perhaps appropriate for 3D printing, this matter has many layers, some long-standing arguments about freedom and rights baked in, and a good deal of heat.

Bambu Lab’s image marketing Bambu Handy, its cloud service that allows you to “Control your printer anytime anywhere, also we support SD card and local network to print the projects.” Credit: Bambu Lab

Printing more, tweaking less

Bambu Lab, launched in 2022, has stood out in the burgeoning consumer 3D printing market because of its printers’ capacity for printing at high speeds without excessive tinkering or maintenance. The product page for the X1 series, the printer first targeted for new security, starts with the credo, “We hated 3D printing as much as we loved it.” Bambu’s faster, less fussy multicolor printers garnered attention—including an ongoing patent lawsuit from established commercial printer Stratasys.

Part of Bambu’s “just works” nature relies on a relatively more closed system than its often open-minded counterparts. Sending a print to most Bambu printers typically requires either Bambu’s cloud service, or, in “LAN mode,” a manual “sneakernet” transfer through SD cards. Cloud connections also grant perks like remote monitoring, and many customers have accepted the trade-off.

However, other customers, eager to tinker with third-party software and accessories, along with those fearing a subscription-based future for 3D printing, see Bambu Lab’s purported security concerns as something else. And Bambu acknowledges that its messaging on its upcoming change came out in rough shape.

Authorized access and operations

“Firmware Update Introducing New Authorization Control System,” posted by Bambu Lab on January 16 (and since updated twice), states that Bambu’s printers—starting with its popular X series, then the P and A lines—will receive a “significant security enhancement to ensure only authorized access and operations are permitted.” This would, Bambu suggested, mitigate risks of “remote hacks or printer exposure issues” and lower the risk of “abnormal traffic or attacks.”

Bambu Lab pushes a “control system” for 3D printers, and boy, did it not go well Read More »

new-year,-same-streaming-headaches:-netflix-raises-prices-by-up-to-16-percent

New year, same streaming headaches: Netflix raises prices by up to 16 percent

Today Netflix, the biggest streaming service based on subscriber count, announced that it will increase subscription prices by up to $2.50 per month.

In a letter to investors [PDF], Netflix announced price changes starting today in the US, Canada, Argentina, and Portugal.

People who subscribe to Netflix’s cheapest ad-free plan (Standard) will see the biggest increase in monthly costs. The subscription will go from $15.49/month to $17.99/month, representing a 16.14 percent bump. The subscription tier allows commercial-free streaming for up to two devices and maxes out at 1080p resolution. It’s Netflix’s most popular subscription in the US, Bloomberg noted.

Netflix’s Premium ad-free tier has cost $22.99/month but is going up 8.7 percent to $24.99/month. The priciest Netflix subscription supports simultaneous streaming for up to four devices, downloads on up to six devices, 4K resolution, HDR, and spatial audio.

Finally, Netflix’s Standard With Ads tier will go up by $1, or 14.3 percent, to $7.99/month. This tier supports streaming from up to two devices and up to 1080p resolution. In Q4 2024, this subscription represented “over 55 percent of sign-ups” in countries where it’s available and generally grew “nearly 30 percent quarter over quarter,” Netflix said in its quarterly letter to investors.

“As we continue to invest in programming and deliver more value for our members, we will occasionally ask our members to pay a little more so that we can re-invest to further improve Netflix,” Netflix’s letter reads.

New year, same streaming headaches: Netflix raises prices by up to 16 percent Read More »

rip-ea’s-origin-launcher:-we-knew-ye-all-too-well,-unfortunately

RIP EA’s Origin launcher: We knew ye all too well, unfortunately

After 14 years, EA will retire its controversial Origin game distribution app for Windows, the company announced. Origin will stop working on April 17, 2025. Folks still using it will be directed to install the newer EA app, which launched in 2022.

The launch of Origin in 2011 was a flashpoint of controversy among gamers, as EA—already not a beloved company by this point—began pulling titles like Crysis 2 from the popular Steam platform to drive players to its own launcher.

Frankly, it all made sense from EA’s point of view. For a publisher that size, Valve had relatively little to offer in terms of services or tools, yet it was taking a big chunk of games’ revenue. Why wouldn’t EA want to get that money back?

The transition was a rough one, though, because it didn’t make as much sense from the consumer’s point of view. Players distrusted EA and had a lot of goodwill for Valve and Steam. Origin lacked features players liked on Steam, and old habits and social connections die hard. Plus, EA’s use of Origin—a long-dead brand name tied to classic RPGs and other games of the ’80s and ’90s—for something like this felt to some like a slap in the face.

RIP EA’s Origin launcher: We knew ye all too well, unfortunately Read More »

southern-california-wildfires-likely-outpace-ability-of-wildlife-to-adapt

Southern California wildfires likely outpace ability of wildlife to adapt


Even species that evolved with wildfires, like mountain lions, are struggling.

A family of deer gather around burned trees from the Palisades Fire at Will Rogers State Park on Jan. 9 in Los Angeles. Credit: Apu Gomes/Getty Images

As fires spread with alarming speed through the Pacific Palisades region of Los Angeles Tuesday, Jan. 7, a local TV news crew recorded a mountain lion trailed by two young cubs running through a neighborhood north of the fire. The three lions were about three-quarters of a mile from the nearest open space. Another TV crew captured video of a disoriented, seemingly orphaned fawn trotting down the middle of a street near the Eaton Fire in Altadena, her fur appearing singed, her gait unsteady.

Firefighters are still struggling to contain fires in Los Angeles County that have so far destroyed thousands of homes and other structures and left more than two dozen people dead. Fires and the notorious Santa Ana winds that fuel their spread are a natural part of this chaparral landscape.

But a warming world is supercharging these fires, experts say. Climate change is causing rapid shifts between very wet years that accelerate the growth of scrubland grasses and brush, leading to what’s known as “excessive fuel loading,” that hotter summers and drier falls and winters turn into easily ignited tinderbox conditions. The area where the fires are burning had “the singularly driest October through early January period we have on record,” said climate scientist Daniel Swain during an online briefing last week.

It’s too soon to know the toll these fires have taken on wildlife, particularly wide-ranging carnivores like mountain lions. But biologists worry that the growing severity and frequency of fires is outpacing wildlife’s ability to adapt.

State wildlife officials don’t want people to provide food or water for wild animals, because it can alter their behavior, spread disease, and cause other unintended effects. What wildlife need right now, they say, is to reach safe habitat as fast as they can.

Wildlife living at the interface of urban development already face many challenges, and now these fires have deprived them of critical resources, said Beth Pratt, California National Wildlife Federation regional executive director. Animals that escaped the flames have lost shelter, water, and food sources, all the things they need to survive, she said. The fires are even wiping out many of the plants butterflies and other pollinators need to feed and reproduce, she noted.

Connecting isolated patches of habitat with interventions like wildlife crossings is critical not only for building fire resilience, Pratt said, but also for protecting biodiversity long term.

Mountain lions and other wildlife adapted to the wildfires that shaped the Southern California landscape over thousands of years.

Many animals respond to cues that act as early warning signs of fire, using different strategies to avoid flames after seeing or smelling smoke plumes or hearing tree limbs crackle as they burn. Large animals, like mountain lions and deer, tend to run away from advancing flames while smaller species may try to take cover.

But now, with major fires happening every year around highly urbanized areas like LA, they can’t simply move to a nearby open space.

Daniel Blumstein, a professor of ecology and evolutionary biology at the University of California, Los Angeles, and others have exposed animals to fire-related sensory cues in experiments to study their responses.

“A variety of different species, including lizards, hear or smell these cues and modify their behavior and take defensive action to try to survive,” said Blumstein.

If you’re a lizard or small mammal, he said, getting underground in something like a burrow probably protects you from fire burning above you.

“But the magnitude and rapidity of these sorts of fires, and the rapidity of these fires particularly, you can’t do anything,” said Blumstein. “I expect lots of wildlife has been killed by this fire, because it just moved so fast.”

Helping wildlife during emergencies

Wildlife experts urge California residents not to provide food or water for wildlife during emergencies like the LA fires. Attracting wildlife to urban areas by providing food and water can have several unintended negative consequences.

Fire events often leave many concerned citizens wondering what they can do to help displaced or injured wildlife, said California Department of Fish and Wildlife spokesperson Krysten Kellum. The agency appreciates people wanting to help wild animals in California, she said, offering the following recommendations to best help wildlife during emergencies:

Please DO NOT provide food or water to wildlife. While this may seem well intentioned, the most critical need of wildlife during and after a wildfire is for them to find their way to safe habitat as quickly as possible. Stopping for food or water in fire zones and residential areas poses risks to them and you. Finding food and water in a specific location even one time can permanently alter an animal’s behavior. Wildlife quickly learns that the reward of receiving handouts from humans outweighs their fears of being around people. This often leads to a cycle of human-wildlife conflicts, which can easily be avoided.

CDFW also advises leaving wild animal rescue to trained professionals. If you find an orphaned, sick, or injured wild animal after a fire event, report the sighting to local CDFW staff by emailing details to R5WildlifeReport@wildlife.ca.gov. You can also contact a licensed wildlife rehabilitator. For a list of licensed rehabilitators, visit the CDFW website.

Just as human defenses didn’t work against flames fanned by winds moving 100 miles an hour, he said, “things animals might do might not be effective for something traveling so fast.”

Tuesday night, Jan. 7, Blumstein saw the Eaton Fire burning in the mountains around Altadena, about 30 miles northeast of his home in the Santa Monica Mountains. When he woke up later in the night, he saw that the “whole mountain” was on fire.

“You can’t run away from that,” he said.

An evolutionary mismatch

The Los Angeles region is the biggest metropolitan area in North America inhabited by mountain lions. City living has not been kind to the big cats.

If they don’t die from eating prey loaded with rat poison, lions must navigate a landscape so fragmented by development they often try to cross some of the busiest freeways in the world, just to find food or a mate or to avoid a fight with resident males.

It’s a lethal choice. About 70 mountain lions are killed on California roads every year, according to the UC Davis Road Ecology Center. The Los Angeles region is a hotspot for such deaths.

“Roads are the highest source of mortality in our study area,” said Jeff Sikich, a wildlife biologist with the National Park Service who has been studying the impacts of urbanization and habitat fragmentation on mountain lions in and around the Santa Monica Mountains for more than two decades.

Sikich and his team track adults and kittens that they implant with tiny transmitters. In 2023, one of those transmitters told him a three-month-old kitten had been killed on a road that cuts through the Santa Monica Mountains.

The kittens caught on video following their mom near the Palisades Fire are probably about the same age.

Lions living in the Santa Monica Mountains are so isolated from potential mates by roads and development, Sikich and other researchers reported in 2022, they face a high risk of extinction from extremely low levels of genetic diversity.

“We don’t have many lions radio collared now, but there is one adult male that uses the eastern Santa Monica Mountains, where the Palisades Fire is,” Sikich said. “I located him on Monday outside the burn area, so he’s good.”

Most of the animals don’t have radio collars, though, so Sikich can’t say how they’re doing. But if they respond to these fires like they did to previous conflagrations, they’re likely to take risks searching for food and shelter that increase their chances of fatal encounters and—if these types of fires persist—extinction.

“We learned a lot after the Woolsey Fire that happened in 2018 and burned nearly half of the Santa Monica Mountains and three-quarters of the Simi Hills,” said Sikich.

Sikich and his team had 11 lions collared at the time and lost two in the Woolsey Fire. One of the cats “just couldn’t escape the flames,” Sikich said. A second casualty, tracked as P-64 (“P” is for puma), was a remarkably resourceful male nicknamed “the culvert cat” because he’d managed to safely navigate deadly roadways to connect three different mountain ranges within his home range.

P-64, an adult male mountain lion, travels through a tunnel under Highway 101, heading south toward the Santa Monica Mountains in 2018. Credit: National Park Service

The cat traversed a long, dark tunnel under Highway 101, used by more than 350,000 cars a day, to reach a small patch of habitat north of the Santa Monica Mountains. Then he used another tunnel, made for hikers and equestrians, to reach a much larger open space to the north. But when the fire broke out, he didn’t have time to reach these escape routes.

Sikich could see from P-64’s GPS collar that he was in the Simi Hills when the fire started. He began heading south, but ran smack into a developed area, which adult males do their best to avoid, even without the chaos of evacuations and fire engines.

“So he had two options,” Sikich said. “He could have entered the urban area or turned around and go back onto the burnt landscape, which he did.”

A few weeks later, Sikich got a mortality signal from P-64’s radio collar. “We didn’t know at the time, of course, but when we found him, he had burnt paws,” he said. “So he died from the effects of the fire.”

The cat was emaciated, with smoke-damaged lungs. His burnt paws hindered his ability to hunt. He likely starved to death.

When the team compared collared cats 15 months before and after the fire, they saw that the surviving cats avoided the burned areas. Lions need cover to hunt but the area was “just a moonscape,” Sikich said. The loss of that habitat forced the cats to take greater risks, likely to find food.

Mountain lions tend to be more active around dawn and dusk, but after the fire, collared cats were more active during the day. That meant they were more likely to run into people and cross roads and even busy freeways, Sikich and his team reported in a 2022 study.

On Dec. 3, 2018, National Park Service researchers discovered the remains of P-64, who survived the flames of the Woolsey Fire but died a few weeks later. The lion was emaciated and likely starved to death, unable to hunt with burnt paws. Credit: National Park Service

“We expect animals, in the long run, to adapt to the environments in which they live,” said Blumstein, who contributed to the study. In California, they adapted to coastal chaparral fires but not to fires in a fragmented habitat dominated by development. And when animals adapt to something, there can be mismatches between what they see as attractive and what’s good for them, he explained.

“Historically, being attracted to dense vegetation might have been a good thing, but if the only dense vegetation left after a fire is around people’s houses, that may not be a good thing,” he said.

Two cats tracked after the fire died of rodenticide poisoning and another was killed by a vehicle.

The cats also traveled greater distances, which put young males at greater risk of running into older males defending their territory. The cat who died on the road was the first to successfully cross the 405 interstate, the busiest in the nation, from the Santa Monica Mountains into the Hollywood Hills. Sikich knew from remote cameras that an adult male had lived there for years. Then after the fire, surveillance footage from a camera in a gated community caught that dominant male chasing the young intruder up a tree, then toward the freeway.

“He tried to head back west but wasn’t lucky this time as he crossed the 405,” Sikich said.

Add climate change-fueled fires to the list of human activity that’s threatening the survival of Southern California’s mountain lions.

Counting on wildlife crossings

When the Woolsey Fire took out half of the open space in the Santa Monica Mountains, it placed considerable stress on animals from mountain lions to monarchs, said Pratt of the National Wildlife Federation. These massive fires underscore the urgent need to connect isolated patches of habitat to boost species’ ability to cope with other stressors, especially in an urban environment, she said.

Studies by Sikich and others demonstrated the critical need for a wildlife crossing over Highway 101 to connect protected habitat in the Santa Monica Mountains with habitat in the Simi Hills to the north. It was at a tunnel underneath the 101 connecting those two regions that Sikich first saw the “culvert cat,” the lion with burnt paws who perished after the Woolsey Fire.

More than 20 years of research highlights the importance of connectivity in these fire-prone areas, he said, so animals can safely get across the freeways around these urban areas.

Pratt helped raise awareness about the need for a wildlife crossing through the #SaveLACougars campaign. She also helped raise tens of millions of dollars to build the Wallis Annenberg Wildlife Crossing, aided by P-22, the mountain lion who became world-famous as the “Hollywood cat.” P-22 lived his life within an improbably small 8-square-mile home range in LA’s Griffith Park, after crossing two of the nation’s busiest freeways.

The crossing broke ground in 2022, the same year wildlife officials euthanized P-22, after they determined the 12-year-old cat was suffering from multiple serious injuries, likely from a vehicle strike, debilitating health problems, and rodenticide poisoning.

Wildlife crossing and connectivity projects don’t just address biodiversity collapse; they also boost fire and climate resilience, Pratt said, because they give animals options, whether to escape fire, drought, or roads.

Thinking of fire as something to fight is a losing battle, she said. “It’s something we have to coexist with. And I think that we are making investments that are trying to take out a reliance on fossil fuels so that the conditions for these fires are not so severe,” she said, referring to California’s targets to slash greenhouse gas emissions within the next 20 years.

Even with the inbreeding and lethal threats from cars and rat poison, Sikich sees reason to be hopeful for the Santa Monica lion population.

For one thing, he said, “we’re seeing reproduction,” pointing to the mom with kittens seen above the Palisades fire and new litters among the females his team is following. “And the amount of natural habitat we do have is great,” he said, with plenty of deer and cover for hunting. “That’s why we still have lions.”

This story originally appeared on Inside Climate News.

Southern California wildfires likely outpace ability of wildlife to adapt Read More »

sleep,-diet,-exercise-and-glp-1-drugs

Sleep, Diet, Exercise and GLP-1 Drugs

As always, some people need practical advice, yet we can’t agree on how any of this works, we are all different, and our motivations differ, so figuring out the best things to do is difficult. Here are various hopefully useful notes.

  1. Effectiveness of GLP-1 Drugs.

  2. What Passes for Skepticism on GLP-1s.

  3. The Joy of Willpower.

  4. Talking Supply.

  5. Talking Price.

  6. GLP-1 Inhibitors Help Solve All Your Problems.

  7. Dieting the Hard Way.

  8. Nutrients.

  9. Are Vegetables a Scam?.

  10. Government Food Labels Are Often Obvious Nonsense.

  11. Sleep.

  12. Find a Way to Enjoy Exercise.

  13. A Note on Alcohol.

  14. Focus Only On What Matters.

GLP-1 drugs are so effective that the American obesity rate is falling.

John Burn-Murdoch: While we can’t be certain that the new generation of drugs are behind this reversal, it is highly likely. For one, the decline in obesity is steepest among college graduates, the group using them at the highest rate.

In the college educated group the decline is about 20% already. This is huge.

This and our other observations are not easy to reconcile with this study, which I note for completeness and which shows only 5% average weight loss in obese patients after one year. Which would be a spectacular result for any other drug. There’s a lot of data saying that in real-world conditions you do a hell of a lot better on average than 5% here.

Here’s a strange framing from the AP: ‘As many as 1 in 5 people won’t lose weight with GLP-1 drugs, experts say.’

Jonel Aleccia: “I have been on Wegovy for a year and a half and have only lost 13 pounds,” said Griffin, who watches her diet, drinks plenty of water and exercises regularly. “I’ve done everything right with no success. It’s discouraging.”

Whether or not that is 13 more pounds than she would have lost otherwise, it’s not the worst outcome, as opposed to the 5 in 5 people who won’t lose weight without GLP-1 drugs. 4 out of 5 is pretty damn exciting. I love those odds.

Eliezer Yudkowsky offers caveats on GLP-1 drugs regarding muscle mass. Even if these concerns turn out to be fully correct, the drugs still seem obviously worthwhile to me for those who need them and whose problems they solve.

He also reports it did not work for him, causing the usual replies full of 101-level suggestions he’s already tried.

I presume it would not work for me, either. Its mechanism does not solve my problems. I actually can control my diet and exercise choices, within certain limits, if only through force of will.

My issue is a stupidly slow metabolism. Enjoying and craving food less wouldn’t help.

That’s the real best argument I know against GLP-1s: they only work on the motivation and willpower layer, so if you’ve got that layer handled and your problems lie elsewhere, they won’t help you.

And also cultivating the willpower layer can be good.

Samo Burja: Compelling argument. Papers by lying academics or tweets by grifters pale in comparison.

This is the state of the art in nutrition science and is yet to be surpassed.

I’m embarking on this diet experiment, starting today. 💪

People ask me if I’m on Ozempic, and I say no.

Don’t you understand the joy of willpower?

How much should we care about whether we are using willpower?

There are three reasons we could care about this.

  1. Use of willpower cultivates willpower or is otherwise ‘good for you.’

  2. Use of willpower signals willpower.

  3. The positional advantage of willpower is shrinking and we might not like that.

Wayne Burkett: People do this thing where they pretend not to understand why anybody would care that drugs like Ozempic eliminate the need to apply willpower to lose weight, but I think basically everybody understands on some level that the application of willpower is good for the souls of the people who are capable of it.

This is concern one.

There are two conflicting models you see on this.

  1. The more you use willpower, the more you build up your willpower.

  2. The more you use willpower, the more you run out of willpower.

This is where it gets complicated.

  1. There’s almost certainly a short-term cost to using willpower. On days you have to use willpower on eating less, you are going to have less of it, and less overall capacity, for other things. So that’s a point in favor of GLP-1s.

  2. That short-term cost doesn’t ever fully go away. If you’re on a permanent diet, yes it likely eventually gets easier via habits, but it’s a cost you pay every day. I pay it every day, and this definitely uses a substantial portion of my total willpower, despite having pulled this off for over 20 years.

  3. The long-term effect of using willpower and cultivating related habits seems to have a positive effect on some combination of overall willpower and transfer into adjacent domains, and one’s self-image, and so on. You learn a bunch of good meta habits.

  4. If you don’t have to spend the willpower on food, you could instead build up those same meta habits elsewhere, such as on exercise or screen time.

  5. However, eating is often much better at providing motivation for learning to use willpower than alternative options. People might be strictly better off in theory, and still be worse off in practice.

My guess is that for most people, especially most people who have already tried hard to control their weight, this is a net positive effect.

I agree that there are some, especially in younger generations who don’t have the past experience of trying to diet via willpower, and who might decide they don’t need willpower, who might end up a lot worse off.

It’s a risk. But in general we should have a very high bar before we act as if introducing obstacles to people’s lives is net positive for them, or in this case that dieting is net worthwhile ‘willpower homework.’ Especially given that quite a lot of people seem to respond to willpower being necessary to not fail at this, by failing.

Then we get to a mix of the second and third objections.

Wayne Burkett: If you take away that need, then you level everybody else up, but you also level down the people who are well adapted to that need.

That’s probably a net win — not even probably, almost certainly — but it’s silly to pretend not to understand that there’s an element to all these things that’s positional.

An element? Sure. If you look and feel better than those around you, and are healthier than they are, then you have a positional advantage, and are more likely to win competitions than if everyone was equal, and you signal your willpower and all that.

I would argue it is on net a rather small portion of the advantages.

My claim is that most of being a healthy weight is an absolute good, not a positional good. The health benefits are yours. The physically feeling better and actually looking better and being able to do more things and have more energy benefits are absolute.

Also, it’s kind of awesome when those around you are all physically healthy and generally more attractive? There are tons of benefits, to you, from that. Yes, relative status will suffer, and that is a real downside for you in competitions, especially winner-take-all competitions (e.g. the Hollywood problem) and when this is otherwise a major factor in hiring.

But you suffer a lot less in dating and other matching markets, and again I think the non-positional goods mostly dominate. If I could turn up or down the health and attractiveness of everyone around me, but I stayed the same, purely for my selfish purposes, I would very much help everyone else out.

I actually say this as someone who does have a substantial amount of my self-image wrapped up in having succeeded in being thin through the use of extreme amounts of willpower, although of course I have other fallbacks available.

A lot of people saying this sort of stuff pretty obviously just don’t have a lot of their personality wrapped up in being thin or in shape and would see this a lot more clearly if a drug were invented that equalized everyone’s IQ. Suddenly they’d be a little nervous about giving everybody equal access to the thing they think makes them special.

“But it’s really bad that these things are positional and we should definitely want to level everybody up” says the guy who is currently positioned at the bottom.

This is a hypothetical, but IQ too is mostly an absolute good rather than a positional one. And there is a reason it is good advice to never be the smartest person in the room. It would be obviously great to raise everyone up if it didn’t also involve knocking people down.

Would it cost some amount of relative status? Perhaps, but beyond worth it.

In the end, I’m deeply unsympathetic to the second and third concerns above – your willpower advantage will still serve you well, you are not worse off overall, and so on.

In terms of cultivating willpower over the long term, I do have long term concerns we could be importantly limiting opportunities for this, in particular because it provides excellent forms of physical feedback. But mostly I think This Is Fine. We have lots of other opportunities to cultivate willpower. What convinces me is that we’ve already reached a point where it seems most people don’t use food to cultivate willpower. At some point, you are Socrates complaining about the younger generation reading, and you have to get over it.

We can’t get enough supply of those GLP-1s, even at current prices. The FDA briefly said we no longer had a shortage and that people would have to stop making unauthorized versions via compounding, but under intense public pressure it reversed that position two weeks later.

Should Medicare and Medicaid cover GLP-1 drugs? Republicans are split. My answer is that if we have sufficient supply available, then obviously yes, even at current prices, although we probably can’t stomach the cost. While we are supply limited, obviously no.

Tyler Cowen defends the prices Americans pay for GLP-1 drugs, saying they support future R&D and that you can get versions for as low as $400/month or do even better via compounding.

I buy that the world needs to back up the truck and pay Novo Nordisk the big bucks. They’ve earned it and the incentives are super important to ensure we continue doing research going forward, and we need to honor our commitments. But this does not address several key issues.

The first key issue is that America is paying disproportionately, while others don’t pay their fair share. Together we should pay, and yes America benefits enough that the ‘rational’ thing to do is pick up the check even if others won’t, including others who could afford to.

But that’s also a way to ensure no one else ever pays their share, and that kind of ‘rational’ thinking is not ultimately rational, which is something both strong rationalists and Donald Trump have figured out in different ways. At some point it is a sucker’s game, and we should pay partly on condition that others also pay. Are we at that point with prescription drugs, or GLP-1 drugs in particular?

One can also ask whether Tyler’s argument proves too much – is it arguing we should choose to pay double the going market prices? Actively prevent discounting? If we don’t, does that make us ‘the supervillains’? Is this similar to Peter Singer’s argument about the drowning child?

The second key issue is that the incentives this creates are good on the research side, but bad on the consumption side. Monopoly pricing creates large deadweight losses.

The marginal cost of production is low, but the marginal cost of consumption is high, meaning a rather epic deadweight loss triangle from consumers who would benefit from GLP-1s if bought at production cost, but who cannot afford to pay $400 or $1,000 a month. Nor can even the government afford it, at this scale. Since 40% of Americans are obese and these drugs also help with other conditions, it might make sense to put 40% of Americans on GLP-1 drugs, instead of the roughly 10% currently on them.
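To make the deadweight-loss point concrete, here is a minimal toy sketch. Every number in it (a linear demand curve, a $25 marginal cost, a $400 monopoly price, 130 million potential buyers) is invented purely for illustration, not an estimate of the actual market:

```python
# Toy illustration of the deadweight-loss triangle from monopoly pricing.
# Every number here is invented for illustration, not an estimate.

def linear_demand(price, choke_price=1000.0, max_buyers=130_000_000):
    """Assumed linear demand: everyone buys at price 0, nobody at choke_price."""
    return max(0.0, max_buyers * (1 - price / choke_price))

marginal_cost = 25     # assumed monthly cost to manufacture a supply
monopoly_price = 400   # assumed monthly price actually charged

q_monopoly = linear_demand(monopoly_price)
q_at_cost = linear_demand(marginal_cost)

# Deadweight loss: the triangle between willingness to pay and marginal cost,
# over the would-be buyers who are priced out at the monopoly price.
dwl = 0.5 * (monopoly_price - marginal_cost) * (q_at_cost - q_monopoly)

print(f"Buyers at monopoly price: {q_monopoly:,.0f}")
print(f"Buyers at marginal cost:  {q_at_cost:,.0f}")
print(f"Deadweight loss per month: ${dwl:,.0f}")
```

The exact figures don’t matter; the point is that the triangle scales with both the markup and the number of people priced out, which is why the loss here is plausibly enormous.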

The solution remains obvious. We should buy out the patents to such drugs.

This solves the consumption side. It removes the deadweight loss triangle from lost consumption. It removes the hardship of those who struggle to pay, as we can then allow generic competition to do its thing and charge near marginal cost. It would be super popular. It uses the government’s low financing costs to provide locked-in, up-front cold hard cash to Novo Nordisk, presumably the best way to get them and others to invest the maximum in more R&D.

There are lots of obvious gains here, for on the order of $100 billion. Cut the check.

GLP-1 drugs linked to drop in opioid overdoses. Study found hazard ratios from 0.32 to 0.58, so a decline in risk of roughly 40 to 70 percent.
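As a quick arithmetic check, treating a hazard ratio as an approximate relative risk (a rough but serviceable simplification):

```python
# Convert the reported hazard ratios into approximate risk reductions,
# treating hazard ratio as roughly equivalent to relative risk.
for hr in (0.32, 0.58):
    print(f"Hazard ratio {hr:.2f} -> roughly {(1 - hr):.0%} lower risk")

# Hazard ratio 0.32 -> roughly 68% lower risk
# Hazard ratio 0.58 -> roughly 42% lower risk
```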

GLP-1 drugs also reduce Alzheimer’s risk by 40%-70% in patients with Type 2 diabetes? This is a long-term effect, so we don’t yet know whether it would carry over to others.

This Nature post looks into theories of why GLP-1 drugs seem to help with essentially everything.

If you don’t want to do GLP-1s and you can’t date a sufficiently attractive person, here’s a claim that Keto Has Clearly Failed for Obesity, suggesting that people try keto, low-fat and protein restriction in sequence in case one works for you. Alas, the math here is off, because the experimenter is assuming non-overlapping ‘works for me’ groups (if anything I suspect positive correlation!), so no, even if the other percentages are right, that won’t get you to 80%. The good news is if things get tough you can go for the GLP-1s now.
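To see why those percentages can’t simply be added, here is a toy calculation with made-up per-diet success rates (not the post’s actual figures); even under independence, which is friendlier than the positive correlation I suspect, you fall well short:

```python
# Made-up probabilities that each diet "works for you", tried in sequence.
p_keto, p_low_fat, p_protein_restriction = 0.4, 0.3, 0.2

# Naive addition implicitly assumes the "works for me" groups never overlap.
naive_sum = p_keto + p_low_fat + p_protein_restriction

# Under independence, you succeed unless all three diets fail for you.
independent = 1 - (1 - p_keto) * (1 - p_low_fat) * (1 - p_protein_restriction)

print(f"Naive sum (non-overlapping groups): {naive_sum:.0%}")   # 90%
print(f"Assuming independence:              {independent:.0%}") # 66%
# Positive correlation between failures pushes the real number lower still.
```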

Bizarre freak that I am on many levels, I’m now building muscle via massive intake of protein shakes, regular lifting workouts to failure and half an hour of daily cardio, and otherwise down to something like 9-10 meals in a week. It is definitely working, but I’m not about to recommend everyone follow in my footsteps. This is life when you are the Greek God of both slow metabolism and sheer willpower.

Aella asks the hard questions. Such as:

Aella: I’ve mostly given up on trying to force myself to eat vegetables and idk my life still seems to be going fine. Are veggies a psyop? I’ve never liked them.

Jim Babcock: Veggies look great in observational data because they’re the lowest-priority thing in a sort of Maslow’s Hierarchy of Foods. People instinctively prioritize: first get enough protein, then enough calories, then enough electrolytes, then… if you don’t really need anything, veg.

Eric Schmidt: Psyop.

Psyop. You do need fiber one way or another. And there are a few other ways they seem helpful, and you do need a way to fill up without consuming too many calories. But no, they do not seem in any way necessary, you can absolutely go mostly without them. You’ll effectively pick up small amounts of them anyway without trying.

The key missing element in public health discussions of food, and also discussions of everything else, is of course joy and actual human preferences and values.

Stian Westlake: I read a lot of strategies and reports on obesity and health, and it’s striking how few of them mention words like conviviality or deliciousness, or the idea that food is a source of joy, comfort and love.

Tom Chivers: this is such a common theme in public health. You need a term in your equation for the fact that people enjoy things – drinking, eating sweets, whatever – or they look like pure costs with no benefit whatsoever, so the seemingly correct thing to do will always be to reduce them.

Anders Sandberg: The Swedish public health authority recommended reducing screen usage among young people in a report that carefully looked at possible harms, but only cursorily at what the good sides were.

In case you were wondering if that’s a strawman, here’s Stian’s top response:

Mark: Seeing food as a “source of joy, comfort and love?” That mindset sounds like what would be used to rationalize unhealthy choices with respect to quantity and types of food. It sounds like a mantra for obesity.

Food is absolutely one of life’s top sources of joy, comfort and love. People downplay it, and some don’t appreciate it, but it’s definitely top 10, and I’d say it’s top 5. And maybe not overall but on some days, especially when you’re otherwise down or you put in the effort, it can absolutely 100% be top 1.

If I had to choose between ‘food is permanently joyless and actively sad, although not torture or anything, but you’re fit and healthy’ and ‘food is a source of joy, comfort and love, but you don’t feel so good about yourself physically and it’s not your imagination’ then I’d want to choose the first one… but I don’t think the answer is as obvious as some people think, and I’m fortunate I didn’t have to fully make that choice.

One potential fun way to get motivated is to date someone more attractive. Women who were dating more attractive partners had more motivation to lose weight, in the latest ‘you’ll never believe what science found’ study. Which then gets described, because it is 2024, as ‘there might be social factors playing a role in women’s disordered eating’ and an ‘ugly truth’ rather than ‘people respond to incentives.’

Carmen claims that to get most of the nutrients from produce what matters is time from harvest to consumption, while other factors like price and being organic matter little. And it turns out Walmart (!) does better than grocery stores on getting the goods to you in time, while farmers markets can be great but have large variance.

This also suggests that you need to consume what you buy quickly, and that buying things not locally in season should be minimized. If you’re eating produce for its nutrients, then the dramatic declines in average value here should make you question that strategy, and the claim is that on this front frozen produce does as well or better on net versus fresh. There are of course other reasons.

It also reinforces the frustration with our fascination over whether a given thing is ‘good for you’ or not. There’s essentially no way to raise kids without them latching onto this phrase, even if both parents know better. Whereas the actual situation is super complicated, and if you wanted to get it right you’d need to do a ton of research on your particular situation.

My guess is Mu. It would be misleading to say either they were or were not a scam.

Aella: I think vegetables might be a scam. I hate them, and recently stopped trying to make myself eat them, and I feel fine. No issues. Life goes on; I am vegetable-free and slightly happier.

Rick the Tech Dad: Have you ever tried some of the fancier stuff? High quality Brussels sprouts cooked in maple syrup with bacon? Sweet Heirloom carrots in a sugar glaze? Chinese broccoli in cheese sauce?

Aella: Carrots are fine. The rest is just trying to disguise terrible food by smothering it in good food.

I have been mostly ‘vegetable-and-fruit-free’ for over 30 years, because:

  1. If I try to eat most vegetables or fruits of any substantial size, my brain decides that what I am consuming is Not Food, and this causes me to increasingly gag with the size and texture of the object involved.

  2. To the extent I do manage to consume such items in spite of this issue, in most cases those objects bring me no joy at all.

  3. When they do bring me any joy or even the absence of acute suffering, this usually requires smothering them such that most calories are coming from elsewhere.

  4. I do get exposure from some sauces, but mostly not other sources.

  5. This seems to be slowly improving over the last ~10 years, but very slowly.

  6. I never noticed substantial ill-effects and I never got any cravings.

  7. To the extent I did have substantial ill-effects, they were easily fixable.

  8. The claims of big benefits or trouble seem based on correlations that could easily not be causal. Obviously if you lecture everyone that Responsible People Eat Crazy Amounts of Vegetables well beyond what most people enjoy, and also they fill up stomach space for very few calories and thus reduce overall caloric consumption, there’s going to be very positive correlations here.

  9. All of nutrition is quirky at best, everyone is different and no one knows anything.

  10. Proposed actions in response to the problem tend to be completely insane asks.

People will be like ‘we have these correlational studies so you should change your entire diet to things your body doesn’t tell you are good and that bring you zero joy.’

I mean, seriously, f*** that s***. No.

I do buy that people have various specific nutritional requirements, and that not eating vegetables and fruits means you risk having deficits in various places. The same is true of basically any exclusionary diet chosen for whatever reason, and especially true for e.g. vegans.

In practice, the only thing that seems to be an actual issue is fiber.

Government assessments of what is healthy are rather insane on the regular, so this is not exactly news, but when Wagyu ground beef gets a D, Froot Loops get a B, and McDonald’s fries get an A, you have a problem.

Yes, this is technically a ‘category based system’ but that only raises further questions. Does anyone think that will in practice help the average consumer?

I see why some galaxy brained official might think that what people need to know is how this specific source of ground beef compares to other sources of ground beef. Obviously that’s the information the customer needs to know, says this person. That person is fruit loops and needs to watch their plan come into contact with the enemy.

Bryan Johnson suggests that eating too close to bed is bad for your sleep, and hence for your health and work performance.

As with all nutritional and diet advice, this seems like a clear case of different things working differently for different people.

And I am confident Bryan is stat-maxing sleep and everything else in ways that might be actively unhealthy.

It is however worth noticing that the following are at least sometimes true, for some people:

Bryan Johnson:

  1. Eating too close to bedtime increases how long you’re awake at night. This leads you to wanting to stay in bed longer to feel rested.

  2. High fat intake before bed can lower sleep efficiency and cause a longer time to fall asleep. Late-night eating is also associated with reduced fatty acid oxidation (body is less efficient at breaking down fats during sleep). Also can cause weight gain and potentially obesity if eating patterns are chronic.

  3. Consuming large meals or certain foods (spicy or high-fat foods) before bed can cause digestive issues like heartburn, which can disrupt sleep.

  4. Eating late at night can interfere with your circadian rhythm, negatively affecting sleep patterns.

  5. Eating late is asking the body to do two things at the same time: digest food and run sleep processes. This creates a body traffic jam.

  6. Eating late can increase plasma cortisol levels, a stress hormone that can further affect metabolism and sleep quality.

What to do:

  1. Experiment with eating earlier. Start with your last meal of the day 2 hours before bed and then try 3, 4, 5, and 6 hours.

  2. Experiment with eating different foods and build intuition. For me, things like pasta, pizza and alcohol are guaranteed to wreck my sleep. If I eat steamed veggies or something similarly light a few hours before bed, I usually don’t see any negative effects.

  3. Measure your resting heart rate before bed. After years of working to master high quality sleep, my RHR before bed is the single strongest predictor of whether I’ll get high quality or low quality sleep. Eating earlier will lower your RHR at bedtime.

  4. If you’re out late with friends or family, feel free to eat for the social occasion. Just try to eat light foods lightly.

I’ve run a natural version of this experiment, because my metabolism is so slow that I don’t ever eat three meals in a day. For many years I almost never ate after 2pm. For the most recent 15 years or so, I’ll eat dinner on Fridays with the family, and maybe twice a month on other days, and that’s it.

When I first wrote this section, I had not noticed a tendency to have worse sleep on Fridays, with the caveat that this still represents a minimum of about four hours before bed anyway since we rarely eat later than 6pm.

Since then, I have paid more attention, and I have noticed the pattern. Yes, on days that I eat lunch rather than dinner, or I eat neither, I tend to sleep better, in a modest but noticeable way.

I have never understood why you would want to eat dinner at 8pm or 9pm in any case – you’ve gone hungry the whole day, and now that you’re finally not hungry you don’t get to enjoy it for long before bed. Why play so badly?

The other tendency is that if you eat quite a lot, it can knock you out, see Thanksgiving. Is that also making your sleep worse? That’s not how I’d instinctively think of it, but I can see that point of view.

What about the other swords in the picture?

  1. Screen time has never bothered me, including directly before sleep. Indeed, watching television is my preferred wind-down activity for going to sleep. Overall I get tons of screen time and I don’t think it matters for this.

  2. I never drink alcohol so I don’t have any data on that one.

  3. I never drink large amounts of caffeine either, so this doesn’t matter much either.

  4. Healthier food, and less junk food, are subjective descriptions, with ‘less sugar’ being similar but better defined. I don’t see a large enough effect to worry about this until the point where I’m getting other signals that I’ve eaten too much sugar or other junk food. At which point, yes, there’s a noticeable effect, but I should almost never be doing that anyway.

  5. Going to bed early is great… when it works. But if you’re not ready, it won’t work. Mostly I find it’s more important to not stay up too late.

  6. But also none of these effects are so big that you should be absolutist about it all.

Physical activity is declining, so people spend less energy, and this is a substantial portion of why people are getting fatter. Good news is this suggests a local fix.

That is also presumably the primary cause of this result?

We now studied the Total energy expenditure (TEE) of 4799 individuals in Europe and the USA between the late 1980s and 2018 using the IAEA DLW database. We show there has been a significant decline in adjusted TEE over this interval of about 7.7% in males and 5.6% in females.

We are currently expending about 220 kcal/d less for males and 122 kcal/d less for females than people of our age and body composition were in the late 1980s. These changes are sufficient to explain the obesity epidemic in the USA.
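As a quick back-of-the-envelope check on what those quoted figures imply (this is just arithmetic on the numbers above, nothing taken from the underlying paper):

```python
# Back out the implied late-1980s baseline TEE from the quoted declines.
for group, drop_kcal, drop_fraction in (("males", 220, 0.077), ("females", 122, 0.056)):
    baseline = drop_kcal / drop_fraction
    print(f"{group}: ~{baseline:,.0f} kcal/day then, ~{baseline - drop_kcal:,.0f} kcal/day now")

# males: ~2,857 kcal/day then, ~2,637 kcal/day now
# females: ~2,179 kcal/day then, ~2,057 kcal/day now
```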

What’s the best way to exercise and get in shape? Matt Yglesias points out that those who are most fit tend to be exercise enjoyers, the way he enjoys writing takes, whereas he and many others hate exercising. Which means if you start an exercise plan, you’ll probably fail. And indeed, I’ve started many exercise plans, and they’ve predictably almost all failed, because I hated doing them and couldn’t find anything I liked.

Ultimately what did work were the times I managed to finally figure out how to de facto be an exercise enjoyer and want to do it. A lot of that was finding something where the benefits were tangible enough to be motivating, but also other things, like being able to do it at home while watching television.

Unlike how I lost the weight, this one I do think mostly generalizes, and you really do need to just find a way to hack into enjoying yourself.

Here are some related claims about exercise; I am pretty sure Andrew is right here:

Diane Yap: I know this guy, SWE manager at a big tech company, Princeton grad. Recently broke up with a long term gf. His idea on how to get back in the dating market? Go to the gym and build more muscles. Sigh. I gave him a pep talk and convinced him that the girls for which that would make a difference aren’t worth his time anyway.

ofir geller: it can give him confidence which helps with almost all women.

Diane Yap: Ah, well if that’s the goal I can do that with words and save him some time.

Andrew Rettek: The first year or two of muscle building definitely improves your attractiveness. By the time you’re into year 5+ the returns on sexiness slow down or go negative across the whole population.

As someone who is half a year into muscle building for health, yes it quite obviously makes you more attractive and helps you feel confident and sexy and that all helps you a lot on the dating market, and also in general.

The in general part is most important.

Whenever someone finally does start lifting heavy things in some form, or even things like walking more, there is essentially universal self-reporting that the returns are insanely great. Almost everyone reports feeling better, and usually also looking better, thinking better and performing better in various ways.

It’s not a More Dakka situation, because the optimal amount for most people does not seem crazy high. It does not seem like a hard decision.

Exercise and weight training is the universal miracle drug. It’s insane to talk someone out of it. But yes, like anything else there are diminishing returns and you can overdose, and the people most obsessed with it do overdose and it actively backfires, so don’t go nuts. That seems totally obvious.

A plurality of Americans (45%) now correctly believe alcohol in moderation is bad for your health, versus 43% that think it makes no difference and 8% that think it is good.

It was always such a scam telling people that they needed to drink ‘for their health.’

I am not saying that there are zero situations in which it is correct to drink alcohol.

I would however say that it falls under the classification of: If drinking seems like a good idea, it probably isn’t, even after accounting for this rule.

I call that Finkel’s Law. It applies here as much as anywhere.

My basic model is: Exercise and finding ways to actually do it matters. Finding a way to eat a reasonable amount without driving yourself crazy or taking the joy out of life, whether or not that involves Ozempic or another similar drug, matters, and avoiding acute deficits matters. Getting reasonable sleep matters. A lot of the details after that? They mostly don’t matter.

But you should experiment, be empirical, and observe what works for you in particular.

Sleep, Diet, Exercise and GLP-1 Drugs Read More »

report:-apple-mail-is-getting-automatic-categories-on-ipados-and-macos

Report: Apple Mail is getting automatic categories on iPadOS and macOS

Unlike numerous other new and recent OS-level features from Apple, mail sorting does not require a device capable of supporting Apple Intelligence (generally M-series Macs or iPads), and it happens entirely on the device. It’s an optional feature and available only for English-language emails.

Apple released a third beta of macOS 15.3 just days ago, indicating that early, developer-oriented builds of macOS 15.4 with the sorting feature should be weeks away. While Gurman’s newsletter suggests mail sorting will also arrive in the Mail app for iPadOS, he did not specify which version, though the timing would suggest a roughly simultaneous release of iPadOS 18.4.

Also slated to arrive in the same update for Apple-Intelligence-ready devices is the version of Siri that understands more context about questions, from what’s on your screen and in your apps. “Add this address to Rick’s contact information,” “When is my mom’s flight landing,” and “What time do I have dinner with her” are the sorts of examples Apple highlighted in its June unveiling of iOS 18.

Since then, Apple has divvied up certain aspects of Intelligence into different OS point updates. General ChatGPT access and image generation have arrived in iOS 18.2 (and related Mac and iPad updates), while notification summaries, which can be pretty rough, are being rethought and better labeled and will be removed from certain news notifications in iOS 18.3.

Report: Apple Mail is getting automatic categories on iPadOS and macOS Read More »