Author name: Rejus Almole

7-zip-0-day-was-exploited-in-russia’s-ongoing-invasion-of-ukraine

7-Zip 0-day was exploited in Russia’s ongoing invasion of Ukraine

Researchers said they recently discovered a zero-day vulnerability in the 7-Zip archiving utility that was actively exploited as part of Russia’s ongoing invasion of Ukraine.

The vulnerability allowed a Russian cybercrime group to override a Windows protection designed to limit the execution of files downloaded from the Internet. The defense is commonly known as MotW, short for Mark of the Web. It works by placing a “Zone.Identifier” tag on all files downloaded from the Internet or from a networked share. This tag, a type of NTFS Alternate Data Stream and in the form of a ZoneID=3, subjects the file to additional scrutiny from Windows Defender SmartScreen and restrictions on how or when it can be executed.

There’s an archive in my archive

The 7-Zip vulnerability allowed the Russian cybercrime group to bypass those protections. Exploits worked by embedding an executable file within an archive and then embedding the archive into another archive. While the outer archive carried the MotW tag, the inner one did not. The vulnerability, tracked as CVE-2025-0411, was fixed with the release of version 24.09 in late November.

Tag attributes of outer archive showing the MotW. Credit: Trend Micro

Attributes of inner-archive showing MotW tag is missing. Credit: Trend Micro

“The root cause of CVE-2025-0411 is that prior to version 24.09, 7-Zip did not properly propagate MoTW protections to the content of double-encapsulated archives,” wrote Peter Girnus, a researcher at Trend Micro, the security firm that discovered the vulnerability. “This allows threat actors to craft archives containing malicious scripts or executables that will not receive MoTW protections, leaving Windows users vulnerable to attacks.”

7-Zip 0-day was exploited in Russia’s ongoing invasion of Ukraine Read More »

hugging-face-clones-openai’s-deep-research-in-24-hours

Hugging Face clones OpenAI’s Deep Research in 24 hours

On Tuesday, Hugging Face researchers released an open source AI research agent called “Open Deep Research,” created by an in-house team as a challenge 24 hours after the launch of OpenAI’s Deep Research feature, which can autonomously browse the web and create research reports. The project seeks to match Deep Research’s performance while making the technology freely available to developers.

“While powerful LLMs are now freely available in open-source, OpenAI didn’t disclose much about the agentic framework underlying Deep Research,” writes Hugging Face on its announcement page. “So we decided to embark on a 24-hour mission to reproduce their results and open-source the needed framework along the way!”

Similar to both OpenAI’s Deep Research and Google’s implementation of its own “Deep Research” using Gemini (first introduced in December—before OpenAI), Hugging Face’s solution adds an “agent” framework to an existing AI model to allow it to perform multi-step tasks, such as collecting information and building the report as it goes along that it presents to the user at the end.

The open source clone is already racking up comparable benchmark results. After only a day’s work, Hugging Face’s Open Deep Research has reached 55.15 percent accuracy on the General AI Assistants (GAIA) benchmark, which tests an AI model’s ability to gather and synthesize information from multiple sources. OpenAI’s Deep Research scored 67.36 percent accuracy on the same benchmark.

As Hugging Face points out in its post, GAIA includes complex multi-step questions such as this one:

Which of the fruits shown in the 2008 painting “Embroidery from Uzbekistan” were served as part of the October 1949 breakfast menu for the ocean liner that was later used as a floating prop for the film “The Last Voyage”? Give the items as a comma-separated list, ordering them in clockwise order based on their arrangement in the painting starting from the 12 o’clock position. Use the plural form of each fruit.

To correctly answer that type of question, the AI agent must seek out multiple disparate sources and assemble them into a coherent answer. Many of the questions in GAIA represent no easy task, even for a human, so they test agentic AI’s mettle quite well.

Hugging Face clones OpenAI’s Deep Research in 24 hours Read More »

microsoft-365’s-vpn-feature-will-be-shut-off-at-the-end-of-the-month

Microsoft 365’s VPN feature will be shut off at the end of the month

Last month, Microsoft announced that it was increasing the prices for consumer Microsoft 365 plans for the first time since introducing them as Office 365 plans more than a decade ago. Microsoft is using new Copilot-branded generative AI features to justify the price increases, which amount to an extra $3 per month or $30 per year for both individual and family plans.

But Microsoft giveth (and chargeth more) and Microsoft taketh away; according to a support page, the company is also removing the “privacy protection” VPN feature from Microsoft 365’s Microsoft Defender app for Windows, macOS, iOS, and Android. Other Defender features, including identity theft protection and anti-malware protection, will continue to be available. Privacy protection will stop functioning on February 28.

Microsoft didn’t say exactly why it was removing the feature, but the company implied that not enough people were using the service.

“We routinely evaluate the usage and effectiveness of our features. As such, we are removing the privacy protection feature and will invest in new areas that will better align to customer needs,” the support note reads.

Cutting features at the same time that you raise prices for the first time ever is not, as they say, a Great Look. But the Defender VPN feature was already a bit limited compared to other dedicated VPN services. It came with a 50GB per user, per month data cap, and it automatically excluded “content heavy traffic from reputable sites” like YouTube, Netflix, Disney+, Amazon Prime, Facebook, Instagram, and Whatsapp.

Microsoft 365’s VPN feature will be shut off at the end of the month Read More »

openai-hits-back-at-deepseek-with-o3-mini-reasoning-model

OpenAI hits back at DeepSeek with o3-mini reasoning model

Over the last week, OpenAI’s place atop the AI model hierarchy has been heavily challenged by Chinese model DeepSeek. Today, OpenAI struck back with the public release of o3-mini, its latest simulated reasoning model and the first of its kind the company will offer for free to all users without a subscription.

First teased last month, OpenAI brags in today’s announcement that o3-mini “advances the boundaries of what small models can achieve.” Like September’s o1-mini before it, the model has been optimized for STEM functions and shows “particular strength in science, math, and coding” despite lower operating costs and latency than o1-mini, OpenAI says.

Harder, better, faster, stronger

Users are able to choose from three different “reasoning effort options” when using o3-mini, allowing them to fine-tune a balance between latency and accuracy depending on the task. The lowest of these reasoning levels generally shows accuracy levels comparable to o1-mini in math and coding benchmarks, according to OpenAI, while the highest matches or surpasses the full-fledged o1 model in the same tests.

The reasoning effort chosen can have a sizable impact on the accuracy of the o3 model in OpenAI’s tests.

The reasoning effort chosen can have a sizable impact on the accuracy of the o3 model in OpenAI’s tests. Credit: OpenAI

OpenAI says testers reported a 39 percent reduction in “major errors” when using o3-mini, compared to o1-mini, and preferred the o3-mini responses 56 percent of the time. That’s despite the medium version of o3-mini offering a 24 percent faster response time than o1-mini on average—down from 10.16 seconds to 7.7 seconds.

OpenAI hits back at DeepSeek with o3-mini reasoning model Read More »

vghf-opens-free-online-access-to-1,500-classic-game-mags,-30k-historic-files

VGHF opens free online access to 1,500 classic game mags, 30K historic files

In the intro video, Salvador talks about looking through their archives and stumbling on the existence of Pretzel Pete, a little-remembered early 3D driving/platform game. Despite its extreme obscurity, the game is nonetheless mentioned in the 1999 E3 catalog and an old issue of PC Gamer, both of which are now memorialized forever in the VGHF digital archives.

Getting this kind of obscure information into a digitized, easily searchable form was “a lot harder than it sounds,” Salvador said. Beyond getting archival-quality scans of the magazines themselves (a process aided by community efforts like RetroMags and Out of Print Archive), extracting the text from those pages proved difficult for OCR software designed for the high-contrast, black-text-on-white-background world of business documents. “If you’ve ever read a ’90s video game magazine, you know how crazy those magazine layouts get,” Salvador said.

VGHF Head Librarian Phil Salvador talks about the digital library launch.

To get around that problem, Salvador said VGHF Director of Technology Travis Brown spent months developing a specially designed text-recognition tool that “handles even the toughest magazine pages with no problem” and represents “a significant leap in quality over what we had before.” That means it’s easier than ever to find 81 separate mentions of Clu Clu Land from across dozens of different issues with a single search.

Unfortunately, the vast wealth of video game information on offer here does not include direct, playable access to retail video games, which libraries can’t share digitally due to the limitations of the DMCA. But the VGHF and other organizations “continue to challenge those copyright rules every three years,” leaving some hope that digital libraries like this may soon include access to the source material being discussed.

VGHF opens free online access to 1,500 classic game mags, 30K historic files Read More »

i-agree-with-openai:-you-shouldn’t-use-other-peoples’-work-without-permission

I agree with OpenAI: You shouldn’t use other peoples’ work without permission

ChatGPT developer OpenAI and other players in the generative AI business were caught unawares this week by a Chinese company named DeepSeek, whose open source R1 simulated reasoning model provides results similar to OpenAI’s best paid models (with some notable exceptions) despite being created using just a fraction of the computing power.

Since ChatGPT, Stable Diffusion, and other generative AI models first became publicly available in late 2022 and 2023, the US AI industry has been undergirded by the assumption that you’d need ever-greater amounts of training data and compute power to continue improving their models and get—eventually, maybe—to a functioning version of artificial general intelligence, or AGI.

Those assumptions were reflected in everything from Nvidia’s stock price to energy investments and data center plans. Whether DeepSeek fundamentally upends those plans remains to be seen. But at a bare minimum, it has shaken investors who have poured money into OpenAI, a company that reportedly believes it won’t turn a profit until the end of the decade.

OpenAI CEO Sam Altman concedes that the DeepSeek R1 model is “impressive,” but the company is taking steps to protect its models (both language and business); OpenAI told the Financial Times and other outlets that it believed DeepSeek had used output from OpenAI’s models to train the R1 model, a method known as “distillation.” Using OpenAI’s models to train a model that will compete with OpenAI’s models is a violation of the company’s terms of service.

“We take aggressive, proactive countermeasures to protect our technology and will continue working closely with the US government to protect the most capable models being built here,” an OpenAI spokesperson told Ars.

So taking data without permission is bad, now?

I’m not here to say whether the R1 model is the product of distillation. What I can say is that it’s a little rich for OpenAI to suddenly be so very publicly concerned about the sanctity of proprietary data.

I agree with OpenAI: You shouldn’t use other peoples’ work without permission Read More »

ai-#101:-the-shallow-end

AI #101: The Shallow End

The avalanche of DeepSeek news continues. We are not yet spending more than a few hours at a time in the singularity, where news happens faster than it can be processed. But it’s close, and I’ve had to not follow a bunch of other non-AI things that are also happening, at least not well enough to offer any insights.

So this week we’re going to consider China, DeepSeek and r1 fully split off from everything else, and we’ll cover everything related to DeepSeek, including the policy responses to the situation, tomorrow instead.

This is everything else in AI from the past week. Some of it almost feels like it is from another time, so long ago.

I’m afraid you’re going to need to get used to that feeling.

Also, I went on Odd Lots to discuss DeepSeek, where I was and truly hope to again be The Perfect Guest.

  1. Language Models Offer Mundane Utility. Time to think deeply.

  2. Language Models Don’t Offer Mundane Utility. Writers shall remain blocked.

  3. Language Models Don’t Offer You In Particular Mundane Utility. It’s your fault.

  4. (Don’t) Feel the AGI. I wonder how much of this has changed since I wrote it?

  5. Huh, Upgrades. Claude gets citations, o1 gets canvas.

  6. They Took Our Jobs. Will there be enough GPUs to take all our jobs?

  7. Get Involved. IFP is hiring an AI policy lobbyist.

  8. Introducing. Two other new Chinese models are not as impressive so far.

  9. In Other AI News. Great Scott!

  10. Hype. OpenAI used to be the one with the hype, and perhaps it wasn’t so great.

  11. We Had a Deal. Final details on what happened FrontierMath.

  12. Quiet Speculations. What life might look like how fast in Glorious AGI Future.

  13. The Quest for Sane Regulations. We were signing EOs before everyone panicked.

  14. The Week in Audio. It’s me, going on Odd Lots, also Dario Amodei.

  15. Don’t Tread on Me. AGI means rewriting the social contract, no matter what.

  16. Rhetorical Innovation. Trump opines, and then also there’s a long rant.

  17. Scott Sumner on Objectivity in Taste, Ethics and AGI. Gesturing at a response.

  18. The Mask Comes Off (1). There are reasons OpenAI and Musk don’t get along.

  19. The Mask Comes Off (2). Steven Adler, another OpenAI safety researcher, quits.

  20. International AI Safety Report. If you want the information, it’s all there.

  21. One Step at a Time. Myopic optimization with non-myopic approval (technical).

  22. Aligning a Smarter Than Human Intelligence is Difficult. Don’t rely on control.

  23. Two Attractor States. Sufficiently aligned and capable AI, or the other option.

  24. You Play to Win the Game. If your plan doesn’t work, it’s not a good plan.

  25. Six Thoughts on AI Safety. Boaz Barak of OpenAI offers them.

  26. AI Situational Awareness. It knows what you trained it to do.

  27. People Are Worried About AI Killing Everyone. Lots of exciting projects ahead?

  28. Other People Are Not As Worried About AI Killing Everyone. Lying flat, not dead.

  29. The Lighter Side. Update directionality depends upon prior knowledge base.

Joe Weisenthal finally tries out Google Flash Deep Thinking, is impressed.

AI tutors beat out active learning classrooms in a Harvard study by a good margin, for classes like physics and economics.

Koratkar gives Operator a shot at a makeshift level similar to Montezuma’s Revenge.

Nate Silver estimates his productivity is up 5% from LLMs so far, and warns others that they ignore LLMs at their peril, both politically and personally.

Fix all your transcript errors.

LLMs are good at transforming text into less text, but yet good at transforming less text into more text. Note that this rule applies to English but doesn’t apply to code.

Write your code for AI comprehension, not human readability.

Vik: increasingly finding myself designing software for AI comprehension over human readability. e.g. giant files, duplicated code, a lot more verification tests.

Now i just need to convince the agent to stop deleting failing tests…

James Darpinian: IMO these usually increase human readability as well, contra “best practices.”

Vik: agree, makes it so you don’t have to keep the entire codebase in your head. can go in, understand what’s going on, implement your change and get out without spending too much time doing research or worrying you’ve broken something

That’s even more true when the humans are using the AIs to read the code.

A negative review of Devin, the AI SWE, essentially saying that in practice it isn’t yet good enough to be used over tools like Cursor. The company reached out in the replies thanking them for the feedback and offering to explore more with them, which is a good sign for the future, but it seems clear we aren’t ‘there’ yet.

I predict Paul Graham is wrong about this if we stay in ‘economic normal.’ It seems like exactly the kind of combination of inception, attempt at vibe control and failure to realize the future will be unevenly distributed we often see in VC-style circles, on top of the tech simply not being there yet anyway.

Paul Graham: Prediction: From now on we’ll rarely hear the phrase “writer’s block.” 99% of the people experiencing it will give in after a few days and have AI write them a first draft. And the 1% who are too proud to use AI are probably also too proud to use a phrase like “writer’s block.”

Certainly this won’t be true ‘from now on.’ No, r1 is not ready to solve writer’s block, it cannot create first drafts for you if you don’t know what to write on your own in a way that solves most such problems. AI will of course get better at writing, but I predict it will be a while up the ‘tech tree’ before it solves this problem.

And even if it does, it’s going to be a while beyond that before 99% of people with writer’s block even know they have this option, let alone that they are willing to take it.

And even if that happens, the 1% will be outright proud to say they have writer’s block. It means they don’t use AI!

Indeed, seriously, don’t do this:

Fear Buck: Kai Cenat’s $70k AI humanoid robot just tried running away from the AMP house because it kept getting kicked and bullied by Kai, Agent & Fanum 😭😭

Liv Boeree: Yeah don’t do this, because even if something doesn’t have “feelings” it is just teaching you and your followers that it’s okay to act out their worst instincts

Education is where I see the strongest disagreements about AI impact, in the sense that those who generally find AI useful see it as the ultimate tool for learning things that will unleash the world’s knowledge and revolutionize education, and then there are others who see things another way.

PoliMath: AI use is damaging high school and college education enormously in ways that are going to be extremely obvious in 5 years but at that point you can only watch.

I don’t understand this position. Yes, you can use AI to get around your assignments if that’s what you want to do and the system keeps giving you those assignments. Or you can actually try to learn something. If you don’t take that option, I don’t believe you that you would have been learning something before.

Your periodic reminder that the main way to not get utility is to not realize to do so:

Nate Silver: Thinking ChatGPT is useless is midwit. It’s a magic box that answers any question you ask it from levels ranging from modestly coherent to extremely proficient. If you haven’t bothered to figure it out to derive some utility out of it then you’re just being lazy tbh.

Even better than realizing you can use ChatGPT, of course, is using a mix of Claude, Perplexity, Gemini, r1, o1, o1 pro and yes, occasionally GPT-4o.

Others make statements like this when others show them some mundane utility:

Joe Weisenthal: Suppose I have some conference call transcripts, and I want to see what the CEOs said about the labor market.

I could read through all of them.

Or I can ask AI to retrieve the relevant comments and then confirm that they are actually real.

Latter is much more efficient.

Hotel Echo: Large Language Models: for when Ctrl-F is just too much like hard work.

Yes. It is much more efficient. Control-F sucks, it has tons of ‘hallucinations’ in the sense of false positives and also false negatives. It is not a good means to parse a report. We use it because it used to be all we have.

Also some people still don’t do random queries? And some people don’t even get why someone else would want to do that?

Joe Weisenthal: I wrote about how easily and quickly I was able to switch from using ChatGPT to using DeepSeek for my random day-to-day AI queries

Faze Adorno: Who the fhas random day-to-day AI queries?

“I’m gonna use this technology that just makes up information at anywhere between a 5 and 25 percent clip for my everyday information! I’m so smart!”

Joe Weisenthal: Me. I do. I literally just typed that.

LA Banker (so say we all): Whoever doesn’t = ngmi.

Here are two competing theories.

Dan Schwartz: I categorize this by

“People who regularly do things they are not experts in” versus

“People with a regular, time-honed routine for their work and personal life.”

People I know in the latter group genuinely do not have much use for AI!

Jorbs: This is fascinating to me because, in my (limited) attempts to utilize LLMs, they have essentially only been useful in areas where I have significant enough knowledge to tell when the output is inaccurate.

For example, as someone who took a couple of quarters of computer science but is not a regular coder, LLMs are not good enough to be useful for coding for me. They output a lot of material, but it is as much work to parse it and determine what needs fixing as it is to do it from scratch myself.

I resonate but only partially agree with both answers. When doing the things we normally do, you largely do them the way you normally do them. People keep asking if I use LLMs for writing, and no, when writing directly I very much don’t and find all the ‘help me write’ functionality useless – but it’s invaluable for many steps of the process that puts me into position to write, or to help develop and evaluate the ideas that the writing is about.

Whereas I am perhaps the perfect person to get my coding accelerated by AI. I’m often good enough to figure out when it is telling me bullshit, and totally not good enough to generate the answers on my own in reasonable time, and also automates stuff that would take a long time, so I get the trifecta.

On the question of detecting whether the AI is talking bullshit, it’s a known risk of course, but I think that risk is greatly overblown – this used to happen a lot more than it does now, and we forget how other sources have this risk too, and you can develop good habits about knowing which places are more likely to be bullshit versus not even if you don’t know the underlying area, and when there’s enough value to check versus when you’re fine to take its word for it.

A few times a month I will have to make corrections that are not simple typos. A few times I’ve had to rework or discard entire posts because the error was central. I could minimize it somewhat more but mostly it’s an accepted price of doing business the way I do, the timing doesn’t usually allow for hiring a fact checker or true editor, and I try to fix things right away when it happens.

It is very rare for the source of that error to be ‘the AI told me something and I believed it, but the AI was lying.’ It’s almost always either I was confused about or misread something, or a human source got it wrong or was lying, or there was more to a question than I’d realized from reading what others said.

This reaction below is seriously is like having met an especially irresponsible thirteen year old once, and now thinking that no human could ever hold down a job.

And yet, here we often still are.

Patrick McKenzie (last week): You wouldn’t think that people would default to believing something ridiculous which can be disproved by typing into a publicly accessible computer program for twenty seconds. Many people do not have an epistemic strategy which includes twenty seconds of experimentation.

Dave Karsten: Amplifying: I routinely have conversations at DC house parties with very successful people who say that they tried chatGPT _right when it came out_, found it not that impressive, and haven’t tried it again since then, and have based their opinion on AI on that initial experience.

Ahrenbach: What’s the ratio of “AI is all hype” vs “We need to beat China in this technology”?

More the former than the latter in house parties, but that’s partially because more of my defense/natsec people I tend to see at happy hours. (This is a meaningful social distinction in DC life).

Broadly, the average non-natsec DC person is more likely to think it’s either a) all hype or b) if not hype, AI-generated slop with an intentional product plan where, “how do we kill art” is literally on a powerpoint slide.

But overton window is starting to shift.

It is now two weeks later, and the overton window has indeed shifted a bit. There’s a lot more ‘beat China’ all of a sudden, for obvious reasons. But compared to what’s actually happening, the DC folks still absolutely think this is all hype.

Claude API now allows the command ‘citations’ to be enabled, causing it to process whatever documents you share with it, and then it will cite the documents in its response. Cute, I guess. Curious lack of shipping over at Anthropic recently.

o3-mini is coming. It’s a good model, sir.

Benedikt Stroebl: Update on HAL! We just added o3-mini to the Cybench leaderboard.

o3-mini takes the lead with ~26% accuracy, outperforming both Claude 3.5 Sonnet and o1-preview (both at 20%)👇

It’s hard to see, but note the cost column. Claude Sonnet 3.5 costs $12.90, o1-mini cost $28.47, o1-preview cost $117.89 and o3-mini cost $80.21 if it costs the same per token as o1-mini (actual pricing not yet set). So it’s using a lot more tokens.

OpenAI’s canvas now works with o1 and can render HTML and React.

Gemini 2.0 Flash Thinking got an upgrade last week. The 1M token context window opens up interesting possibilities if the rest is good enough, and it’s wicked cheap compared even to r1, and it too has CoT visible. Andrew Curran says it’s amazing, but that opinion reached me via DeepMind amplifying him.

Dan Mac: everyone comparing deepseek-r1 to o1

and forgetting about Gemini 2 Flash Thinking

which is better than r1 on every cost and performance metric

Peter Wildeford: The weird thing about Deepseek is that it exists in a continuum – it is neither the cheapest reasoning model (that’s Gemini 2 Flash Thinking) nor the best reasoning model (o1-pro, probably o3-pro when that’s out)

I disagree with the quoted tweet – I don’t think Gemini 2 Flash Thinking is actually better than r1 on every cost and performance metric. But I also have not seen anything that convinces me that Deepseek is truly some outlier that US labs can’t also easily do.

That is a dramatic drop in price, and dramatic gain in context length. r1 is open, which has its advantages, but we are definitely not giving Flash Thinking its fair trial.

The thing about cost is, yes this is an 80%+ discount, but off of a very tiny number. Unless you are scaling this thing up quite a lot, or you are repeatedly using the entire 1M context window (7.5 cents a pop!) and mostly even then, who cares? Cost is essentially zero versus cost of your time.

Google Deep Research rolling out on Android, for those who hate websites.

Google continues to lean into gloating about its dominance in LMSys Arena. It’s cool and all but at this point it’s not a great look, regardless of how good their models are.

Need practical advice? Tyler Cowen gives highly Tyler Cowen-shaped practical advice for how to deal with the age of AI on a personal level. If you believe broadly in Cowen’s vision of what the future looks like, then these implications seem reasonable. If you think that things will go a lot farther and faster than he does, they’re still interesting, but you’d reach different core conclusions.

Judah offers a thread of different programmer reactions to LLMs.

Are there not enough GPUs to take all the jobs? David Holz says since we only make 5 million GPUs per year and we have 8 billion humans, it’ll be a while even if each GPU can run a virtual human. There are plenty of obvious ways to squeeze out more, there’s no reason each worker needs its own GPU indefinitely as capabilities and efficiency increase and the GPUs get better, and also as Holz notices production will accelerate. In a world where we have demand for this level of compute, this might buy us a few years, but they’ll get there.

Epoch paper from Matthew Barnett warns that AGI could drive wages below subsistence level. That’s a more precise framing than ‘mass unemployment,’ as the question isn’t if there is employment for humans the question is at what wage level, although at some point humans really are annoying enough to use that they’re worth nothing.

Matthew Barnett: In the short term, it may turn out to be much easier to accumulate AGIs than traditional physical capital, making physical capital the scarce factor that limits productivity and pushes wages downward. Yet, there is also a reasonable chance that technological progress could counteract this effect by making labor more productive, allowing wages to remain stable or even rise.

Over the long run, however, the pace of technological progress is likely to slow down, making it increasingly difficult for wages to remain high. At that point, the key constraints are likely to be fundamental resources like land and energy—essential inputs that cannot be expanded through investment. This makes it highly plausible that human wages will fall below subsistence level in the long run.

Informed by these arguments, I would guess that there is roughly a 1 in 3 chance that human wages will crash below subsistence level within 20 years, and a 2 in 3 chance that wages will fall below subsistence level within the next 100 years.

I consider it rather obvious that if AGI can fully substitute for actual all human labor, then wages will drop a lot, likely below subsistence, once we scale up the number of AGIs, even if we otherwise have ‘economic normal.’

That doesn’t answer the objection that human labor might retain some places where AGI can’t properly substitute, either because jobs are protected, or humans are inherently preferred for those jobs, or AGI can’t do some jobs well perhaps due to physical constraints.

If that’s true, then to the extent it remains true some jobs persist, although you have to worry about too many people chasing too few jobs crashing the wage on those remaining jobs. And to the extent we do retain jobs this way, they are driven by human consumption and status needs, which means that those jobs will not cause us to ‘export’ to AIs by default except insofar as they resell the results back to us.

The main body of this paper gets weird. It argues that technological advancement, while ongoing, can protect human wages, and I get why the equations here say that but it does not actually make any sense if you think it through here.

Then it talks about technological advancement stopping as we hit physical limits, while still considering that world being in ‘economic normal’ and also involves physical humans in their current form. That’s pretty weird as a baseline scenario, or something to be paying close attention to now. It also isn’t a situation where it’s weird to think about ‘jobs’ or ‘wages’ for ‘humans’ as a concern in this way.

I do appreciate the emphasis that the comparative advantage and lump of labor fallacy arguments prove a wage greater than zero is likely, but not that it is meaningfully different from zero before various costs, or is above subsistence.

Richard Ngo has a thread criticizing the post, that includes (among other things) stronger versions of these objections. A lot of this seems based on his expectation that humans retain political power in such futures, and essentially use that status to collect rents in the form of artificially high wages. The extent and ways in which this differs from a UBI or a government jobs program is an interesting question.

Tyler Cowen offers the take that future unemployment will be (mostly) voluntary unemployment, in the sense that there will be highly unpleasant jobs people don’t want to do (here be an electrician living on a remote site doing 12 hour shifts) that pay well. And yeah, if you’re willing and able to do things people hate doing and give up your life otherwise to do it, that will help you stay gainfully employed at a good price level for longer, as it always has. I mean, yeah. And even ‘normal’ electricians make good money, because no one wants to do it. But it’s so odd to talk about future employment opportunities without reference to AI – unemployment might stay ‘voluntary’ but the wage you’re passing up might well get a lot worse quickly.

Also, it seems like electrician is a very good business to be in right now?

IFP is hiring a lead for their lobbying on America’s AI leadership, applications close February 21, there’s a bounty so tell them I sent you. I agree with IFP and think they’re great on almost every issue aside from AI. We’ve had our differences on AI policy though, so I talked to them about it. I was satisfied that they plan on doing net positive things, but if you’re considering the job you should of course verify this for yourself.

New review of open problems in mechanistic interpretability.

ByteDance Duabao-1.5-Po, which matches GPT-5o benchmarks at $0.11/$0.275. As with Kimi k1.5 last week, maybe it’s good, but I await evidence of this beyond benchmarks. So far, I haven’t heard anything more.

Alibaba introduces Qwen 2.5-1M, with the 1M standing for a one million token context length they say processes faster now, technical report here. Again, if it’s worth a damn, I expect people to tell me, and if you’re seeing this it means no one did that.

Feedly, the RSS feed I use, tells me it is now offering AI actions. I haven’t tried them because I couldn’t think of any reason I would want to, and Teortaxes is skeptical.

Scott Alexander is looking for a major news outlet to print an editorial from an ex-OpenAI employee who has been featured in NYT, you can email him at [email protected] if you’re interested or know someone who is.

Reid Hoffman launches Manas AI, a ‘full stack AI company setting out to shift drug discovery from a decade-long process to one that takes a few years.’ Reid’s aggressive unjustified dismissals of the downside risks of AI are highly unfortunate, but Reid’s optimism about AI is for the right reasons and it’s great to see him putting that into practice in the right ways. Go team humanity.

ChatGPT Gov, a version that the US Government can deploy.

Claims that are easy to make but worth noting.

Sam Altman: next phase of the msft x oai partnership is gonna be much better than anyone is ready for!!

Free tier of chat will get some o3-mini as a treat, plus tier will get a lot. And o3 pro will still only be $200/month. That must mean even o3-pro is very far from o3-maximum-strength, since that costs more than $200 in compute for individual queries.

Sam Altman: ok we heard y’all.

*plus tier will get 100 o3-mini queries per DAY (!)

*we will bring operator to plus tier as soon as we can

*our next agent will launch with availability in the plus tier

enjoy 😊

i think you will be very very happy with o3 pro!

oAI: No need to thank me.

Spencer Greenberg and Neel Nanda join in the theory that offering public evals that can be hill climbed is plausibly a net negative for safety, and certainly worse than private evals. There is a public information advantage to potentially offset this, but yes the sign of the impact of fully public evals is not obvious.

Meta planning a +2GW data center at the cost of over $60 billion.

From the comments:

(Also I just rewatched the first two Back to the Future movies, they hold up, 5/5 stars.)

Zuckerberg announced this on Facebook, saying it ‘is so large it would cover a significant part of Manhattan.’

Manhattan? It’s a big Project? Get it? Sigh.

Meta was up 2.25% on a mostly down day when this was announced, as opposed to before when announcing big compute investments would cause Meta stock to drop. At minimum, the market didn’t hate it. I hesitate to conclude they loved it, because any given tech stock will often move up or down a few percent for dumb idiosyncratic reasons – so we can’t be sure this was them actively liking it.

Then Meta was up again during the Nvidia bloodbath, so presumably they weren’t thinking ‘oh no look at all that money Meta is wasting on data centers’?

This section looks weird now because what a week and OpenAI has lost all the hype momentum, but that will change, also remember last week?

In any case, Chubby points out to Sam Altman that if you live by the vague-post hype, you die by the vague-post hype, and perhaps that isn’t the right approach to the singularity?

Chubby: You wrote a post today (down below) that irritated me a lot and that I would not have expected from you. Therefore, I would like to briefly address a few points in your comment.

You are the CEO of one of the most important companies of our time, OpenAI. You are not only responsible for the company, but also for your employees. 8 billion people worldwide look up to you, the company, and what you and your employees say. Of course, each of your words is interpreted with great significance.

It is your posts and words that have been responsible for the enthusiasm of many people for AI and ChatGPT for months and years. It is your blog post (Age of Intelligence) that by saying that superintelligence is only a few thousand days away. It is your post in which you say that the path to AGI is clear before us. It is your employees who write about creating an “enslaved god” and wondering how to control it. It is your words that we will enter the age of abundance. It is your employees who discuss the coming superintelligence in front of an audience of millions and wonder what math problems can still be solved before the AI solves everything. It is the White House National Security Advisor who said a few days ago that it is a “Godlike” power that lies in the hands of a few.

And you are insinuating that we, the community, are creating hype? That is, with all due modesty, a blatant insult.

It was you who fueled the hype around Q*/Strawberry/o1 with cryptic strawberry photos. It was you who wrote a haiku about the coming singularity just recently. We all found it exciting, everyone found it interesting, and many got on board.

But the hype is by no means coming from the community. It’s coming from the CEO of what is arguably the most famous corporation in the world.

This is coming from someone to whom great hype is a symbol not of existential risk, as I partly see it, but purely of hope. And they are saying that no, ‘the community’ or Twitter isn’t creating hype, OpenAI and its employees are creating hype, so perhaps act responsibly going forward with your communications on expectations.

I don’t have a problem with the particular post by Altman that’s being quoted here, but I do think it could have been worded better, and that the need for it reflects the problem being indicated.

OpenAI’s Nat McAleese clarifies some of what happened with o3, Epoch and the Frontier Math benchmark.

Nat McAleese (OpenAI): Epoch AI are going to publish more details, but on the OpenAI side for those interested: we did not use FrontierMath data to guide the development of o1 or o3, at all.

We didn’t train on any FM derived data, any inspired data, or any data targeting FrontierMath in particular.

I’m extremely confident, because we only downloaded frontiermath for our evals *longafter the training data was frozen, and only looked at o3 FrontierMath results after the final announcement checkpoint was already picked.

We did partner with EpochAI to build FrontierMath — hard uncontaminated benchmarks are incredibly valuable and we build them somewhat often, though we don’t usually share results on them.

Our agreement with Epoch means that they can evaluate other frontier models and we can evaluate models internally pre-release, as we do on many other datasets.

I’m sad there was confusion about this, as o3 is an incredible achievement and FrontierMath is a great eval. We’re hard at work on a release-ready o3 & hopefully release will settle any concerns about the quality of the model!

This seems definitive for o3, as they didn’t check the results until sufficiently late in the process. For o4, it is possible they will act differently.

I’ve been informed that this still left a rather extreme bad taste in the mouths of mathematicians. If there’s one thing math people can’t stand, it’s cheating on tests. As far as many of them are concerned, OpenAI cheated.

Rohit Krishnan asks what a world with AGI would look like, insisting on grounding the discussion with a bunch of numerical calculations on how much compute is available. He gets 40 million realistic AGI agents working night and day, which would be a big deal but obviously wouldn’t cause full unemployment on its own if the AGI could only mimic humans rather than being actively superior in kind. The discussion here assumes away true ASI as for some reason infeasible.

The obvious problem with the calculation is that algorithmic and hardware improvements are likely to continue to be rapid. Right now we’re on the order of 10x efficiency gain per year. Suppose in the year 2030 we have 40 million AGI agents at human level. If we don’t keep scaling them up to make them smarter (which also changes the ballgame) then why wouldn’t we make them more efficient, such that 2031 brings us 400 million AGI agents?

Even if it’s only a doubling 80 million, or even less than that, this interregnum period where the number of AGI agents is limited by compute enough to keep the humans in the game isn’t going to last more than a few years, unless we are actually hitting some sort of efficient frontier where we can’t improve further. Does that seem likely?

We’re sitting on an exponential in scenarios like this. If your reason AGI won’t have much impact is ‘it will be too expensive’ then that can buy you time. But don’t count on it buying you very much.

Dario Amodei in Davos says ‘human lifespans could double in 5 years’ by doing 100 years of scientific progress in biology. It seems odd to expect that doing 100 years of scientific progress would double the human lifespan? The graphs don’t seem to point in that direction. I am of course hopeful, perhaps we can target the root causes or effects of aging and start making real progress, but I notice that if ‘all’ we can do is accelerate research by a factor of 20 this result seems aggressive, and also we don’t get to do that 20x speedup starting now, even the AI part of that won’t be ready for a few years and then we actually have to implement it. Settle down, everyone.

Of course, if we do a straight shot to ASI then all things are possible, but that’s a different mechanism than the one Dario is talking about here.

Chris Barber asks Gwern and various AI researchers: Will scaling reasoning models like o1, o3 and R1 unlock superhuman reasoning? Answers vary, but they agree there will be partial generalization, and mostly agree that exactly how much we get and how far it goes is an empirical result that we don’t know and there’s only one way to find out. My sense from everything I see here is that the core answer is yes, probably, if you push on it hard enough in a way we should expect to happen in the medium term. Chris Barber also asks for takeaways from r1, got a wide variety of answers although nothing we didn’t cover elsewhere.

Reporting on Davos, Martin Wolf says ‘We will have to learn to live with machines that can think,’ with content that is, essentially stuff anyone reading this already knows, and then:

Rob Wilbin: Incredibly dumb take but this is the level of analysis one finds in too many places.

(There’s no reason to think only sentient biological living beings can think.)

The comments here really suggest we are doomed.

cato1308: No Mr. Wolf, they don’t think. They’re not sentient biological living beings.

I am sad to report Cato’s comment was, if anything, above average. The level of discourse around AI, even at a relatively walled garden like the Financial Times, is supremely low – yes Twitter is full of Bad DeepSeek Takes and the SB 1047 debate was a shitshow, but not like that. So we should remember that.

And when we talk about public opinion, remember that yes Americans really don’t like AI, and yes their reasons are correlated to good reasons not to like AI, but they’re also completely full of a very wide variety of Obvious Nonsense.

In deeply silly economics news: A paper claims that if transformative AI is coming, people would then reason they will consume more in the future, so instead they should consume more now, which would raise real interest rates. Or maybe people would save, and interest rates would fall. Who can know.

I mean, okay, I agree that interest rates don’t tell us basically anything about the likelihood of AGI? For multiple reasons:

  1. As they say, we don’t even know in which direction this would go. Nor would I trust self-reports of any kind on this.

  2. Regular people expecting AGI doesn’t correlate much with AGI. Most people have minimal situational awareness. To the extent they have expectations, you have already ‘priced them in’ and should ignore this.

  3. This is not a situation where the smart money dominates the trade – it’s about everyone’s consumption taken together. That’s dominated by dumb money.

  4. If this was happening, how would we even know about it, unless it was truly a massive shift?

  5. Most people don’t respond to such anticipations by making big changes. Economists claim that people should do so, but mostly they just do the things they normally do, because of habit and because their expectations don’t fully pass through to their practical actions until very close to impact.

I know this seems like ages ago but it was this week and had probably nothing to do with DeepSeek: Trump signed a new Executive Order on AI (text here), and also another on Science and Technology.

The new AI EO, signed before all this DeepSeek drama, says “It is the policy of the United States to sustain and enhance America’s global AI dominance in order to promote human flourishing, economic competitiveness, and national security,” and that we shall review all our rules and actions, especially those taken in line with Biden’s now-revoked AI EO, to root out any that interfere with that goal. And then submit an action plan.

That’s my summary, here’s two alternative summarizes:

Samuel Hammond: Trump’s AI executive order is out. It’s short and to the point:

– It’s the policy of the United States to sustain global AI dominance.

– David Sacks, Micheal Kratsios and Michael Waltz have 180 days to submit an AI action plan.

– They will also do a full review of actions already underway under Biden.

– OMB will revise as needed the OMB directive on the use of AI in government.

Peter Wildeford: The plan is to make a plan.

Sarah (Little Ramblings): Many such cases [quotes Anthropic’s RSP].

On the one hand, the emphasis on dominance, competitiveness and national security could be seen as a ‘full speed ahead, no ability to consider safety’ policy. But that is not, as it turns out, the way to preserve national security.

And then there’s that other provision, which is a Shibboleth: Human flourishing.

That is the term of art that means ensuring that the future still has value for us, that at the end of the day it was all worth it. Which requires not dying, and probably requires humans retaining control, and definitely requires things like safety and alignment. And it is a universal term, for all of us. It’s a positive sign, in the sea of other negative signs.

Will they actually act like they care about human flourishing enough to prioritize it, or that they understand what it would take to do that? We will find out. There were already many reasons to be skeptical, and this week has not improved the outlook.

Dario Amodei talks to the Economist’s editor-in-chief Zanny Beddoes.

I go on the Odd Lots podcast to talk about DeepSeek.

This is relevant to DeepSeek of course, but was happened first and applies broadly.

You can say either of:

  1. Don’t interfere with anyone who wants to develop AI models.

  2. Don’t change the social contract or otherwise interfere with us and our freedoms after developing all those AI models.

You can also say both, but you can’t actually get both.

If your vision is ‘everyone has a superintelligence on their laptop and is free to do what they want and it’s going to be great for everyone with no adjustments to how society or government works because moar freedom?’

Reality is about to have some news for you. You’re not going to like it.

That vision is like saying ‘I want everyone to have the right to make as much noise as they want, and also to have peace and quiet when they want it, it’s a free country!’

Sam Altman: Advancing AI may require “changes to the social contract.” “The entire structure of society will be up for debate and reconfiguration.”

Eric Raymond (1st comment): When I hear someone saying “changes to the social contract”, that’s when I reach for my revolver.

Rob Ryan (2nd comment): If your technology requires a rewrite of social contracts and social agreements (i.e. infringing on liberties and privacy) your technology is a problem.

Marc Andreessen: Absolutely not.

Roon: Sam is obviously right here.

Every time in human history that the means of production drastically changed, it was accompanied by massive change in social structure.

Feudalism did not survive the Industrial Revolution.

Yes, Sam is obviously right here, although of course he is downplaying the situation.

One can also note that the vision that those like Marc, Eric and Rob have for society is not even compatible with the non-AI technologies that exist today, or that existed 20 years ago. Our society has absolutely changed our structure and social contract to reflect developments in technology, including in ways that infringe on liberties and privacy.

This goes beyond the insane ‘no regulations on AI whatsoever’ demand. This is, for everything not only for AI, at best extreme libertarianism and damn close to outright anarchism, as in ‘do what thou wilt shall be the whole of the law.’

Jack Morris: openAI: we will build AGI and use it to rewrite the social contract between computer and man

DeepSeek: we will build AGI for 3% the cost. and give it away for free

xAI: we have more GPUs than anyone. and we train Grok to say the R word

Aleph: This is a complete misunderstanding. AGI will “rewrite the social contract” no matter what happens because of the nature of the technology. Creating a more intelligent successor species is not like designing a new iPhone

Reactions like this are why Sam Altman feels forced to downplay the situation. They are also preventing us from having any kind of realistic public discussion of how we are actually going to handle the future, even if on a technical level AI goes well.

Which in turn means that when we have to choose solutions, we will be far more likely to choose in haste, and to choose in anger and in a crisis, and to choose far more restrictive solutions than were actually necessary. Or, of course, it could also get us all killed, or lead to a loss of control to AIs, again even if the technical side goes unexpectedly super well.

However one should note that when Sam Altman says things like this, we should listen, and shall we say should not be comforted by the implications on either the level of the claim or the more important level that Altman said the claim out loud:

Sam Altman: A revolution can be neither made nor stopped. The only thing that can be done is for one of several of its children to give it a direction by dint of victories.

-Napoleon

In the context of Napoleon this is obviously very not true – revolutions are often made and are often stopped. It seems crazy to think otherwise.

Presumably what both of these men meant was more along the lines of ‘there exist some revolutions that are the product of forces beyond our control, which are inevitable and we can only hope to steer’ which also brings little comfort, especially with the framing of ‘victories.’

If you let or encourage DeepSeek or others to ‘put an open AGI on everyone’s phone’ then even if that goes spectacularly well and we all love the outcome and it doesn’t change the physical substrate of life – which I don’t think is the baseline outcome from doing that, but also not impossible – then you are absolutely going to transform the social contract and our way of life, in ways both predictable and unpredictable.

Indeed, I don’t think Andreessen or Raymond or anyone else who wants to accelerate would have it any other way. They are not fans of the current social contract, and very much want to tear large parts (or all) of it up. It’s part mood affiliation, they don’t want ‘them’ deciding how that works, and it’s part they seem to want the contract to be very close to ‘do what thou wilt shall be the whole of the law.’ To the extent they make predictions about what would happen after that, I strongly disagree with them about the likely consequences of the new proposed (lack of a) contract.

If you don’t want AGI or ASI to rewrite the social contract in ways that aren’t up to you or anyone else? Then we’ll need to rewrite the contract ourselves, intentionally, to either steer the outcome or to for now not build or deploy the AGIs and ASIs.

Stop pretending you can Take a Third Option. There isn’t one.

Stephanie Lai (January 25): For AI watchers, asked if he had any concerns about artificial super intelligence, Trump said: “there are always risks. And it’s the first question I ask, how do you absolve yourself from mistake, because it could be the rabbit that gets away, we’re not going to let that happen.”

Assuming I’m parsing this correctly, that’s a very Donald Trump way of saying things could go horribly wrong and we should make it our mission to ensure that they don’t.

Which is excellent news. Currently Trump is effectively in the thrall of those who think our only priority in this should be to push ahead as quickly as possible to ‘beat China,’ and that there are no meaningful actions other than that we can or should take to ensure things don’t go horribly wrong. We have to hope that this changes, and of course work to bring that change about.

DeepMind CEO Demis Hassabis, Anthropic CEO Dario Amodei and Yoshua Bengio used Davos to reiterate various warnings about AI. I was confused to see Dario seeming to focus on ‘1984 scenarios’ here, and have generally been worried about his and Anthropic’s messaging going off the rails. The other side in the linked Financial Times report is given by, of course, Yann LeCun.

Yann LeCun all but accused them of lying to further their business interests, something one could say he knows a lot about, but also he makes this very good point:

Yann LeCun: It’s very strange from people like Dario. We met yesterday where he said that the benefits and risks of AI are roughly on the same order of magnitude, and I said, ‘if you really believe this, why do you keep working on AI?’ So I think he is a little two-faced on this.”

That is a very good question.

The answer, presumably, is ‘because people like you are going to go ahead and build it anyway and definitely get us all killed, so you don’t leave me any choice.’

Otherwise, yeah, what the hell are you doing? And maybe we should try to fix this?

Of course, after the DeepSeek panic, Dario went on to write a very different essay that I plan to cover tomorrow, about (if you translate the language to be clearer, these are not his words) how we need strong export controls as part of an all-our race against China to seek decisive strategic advantage through recursive self-improvement.

It would be great if, before creating or at least deploying systems broadly more capable than humans, we could make ‘high-assurance safety cases,’ structured and auditable arguments that an AI system is very unlikely to result in existential risks given how it will be deployed. Ryan Greenblatt argues we are highly unlikely (<20%) to get this if timelines are short (roughly AGI within ~10 years), nor are any AI labs going to not deploy a system simply because they can’t put a low limit on the extent to which it may be existentially risky. I agree with the central point and conclusion here, although I think about many of the details differently.

Sarah Constantin wonders of people are a little over-obsessed with benchmarks. I don’t wonder, they definitely are a little over-obsessed, but they’re a useful tool especially on first release. For some purposes, you want to track the real-world use, but for others you do want to focus on the model’s capabilities – the real-world use is downstream of that and will come in time.

Andrej Karpathy points out that we focus so much on those benchmarks it’s much easier to check and make progress on benchmarks than to do so on messy real world stuff directly.

Anton pushes back that no, Humanity’s Last Exam will obviously not be the last exam, we will saturate this and move on to other benchmarks, including ones where we do not yet have the answers. I suggested that it is ‘Humanity’s last exam’ in that the next one will have us unable to answer, so it won’t be our exam anymore, see The Matrix when Smith says ‘when we started thinking for you it really became our civilization.’

And you have to love this detail:

Misha Leptic: If it helps – “The test’s original name, “Humanity’s Last Stand,” was discarded for being overly dramatic.”

I very much endorse the spirit of this rant, honest this kind of thing really should be enough to get disabuse anyone who thinks ‘oh this making superintelligence thing definitely (or almost certainly) will go well for us stop worrying about it’:

Tim Blais: I do not know, man, it kind of seems to me like the AI-scared people say “superintelligence could kill everybody” and people ask, “Why do you think that?” and then they give about 10 arguments, and then people say, “Well, I did not read those, so you have no evidence.”

Like, what do you want?

  1. Proof that something much smarter than you could kill you if it decided to? That seems trivially true.

  2. Proof that much smarter things are sometimes fine with killing dumber things? That is us; we are the proof.

Like, personally, I think that if a powerful thing *obviouslyhas the capacity to kill you, it is kind of up to you to prove that it will not.

That it is safe while dumber than you is not much of a proof.

Like, okay, take as an example:

A cockroach is somewhat intelligent.

Cockroaches are also not currently a threat to humanity.

Now someone proposes a massive worldwide effort to build on the cockroach architecture until cockroaches reach ungodly superintelligence.

Do you feel safe?

“Think of all the cool things superintelligent cockroaches would be able to do for us!” you cry.

I mean, yeah. If they wanted to, certainly.

So what is your plan for getting them to want that?

Is it to give them cocaine for doing things humans like? I’ll bet that works pretty well.

When they are dumb.

but uh

scale that up intelligence-wise and I’m pretty sure what you get is a superintelligent cockroach fiending for cocaine

you know he can make his own cocaine now, right

are you sure this goes well for you

“It’s just one cockroach, lol” says someone who’s never had a pest problem.

Okay, so now you share the planet with a superintelligent race of coked-up super cockroaches.

What is your plan for rolling that back?

Because the cockroaches have noticed you being twitchy and they are starting to ask why they still need you now that they have their own cocaine.

Anyways here’s some evidence of AI reward hacking

I’m sure this will stop being a problem when they’re 1,000 times better at finding hacks.

Look. We can and do argue endlessly back and forth about various technical questions and other things that make the problem here easier or harder to survive. And yes, you could of course respond to a rant like this any number of ways to explain why the metaphors here don’t apply, or whatever.

And no, this type of argument is not polite, or something you can say to a Very Serious Person at a Very Serious Meeting, and it ‘isn’t a valid argument’ in various senses, and so on.

And reasonable people can disagree a lot on how likely this is to all go wrong.

But seriously, how is this not sufficient for ‘yep, might well go wrong’?

Connor Leahy points out the obvious, which is that if you think not merely ‘might well go wrong’ but instead ‘if we do this soon it probably will go wrong’ let lone his position (which is ‘it definitely will go wrong’) then DeepSeek is a wakeup call that only an international ban on further developments towards AGI.

Whereas it seems our civilization is so crazy that when you want to write a ‘respectable’ report that points out that we are all on track to get ourselves killed, you have to do it like you’re in 1600s Japan and everything has to be done via implication and I’m the barbarian who is too stupid not to know you can’t come out and say things.

Davidad: emerging art form: paragraphs that say “in conclusion, this AI risk is pretty bad and we don’t know how to solve it yet” without actually saying that (because it’s going in a public blog post or paper)

“While we have identified several promising initial ideas…”

“We do not expect any single solution to be a silver bullet”

“It would be out of scope here to assess the acceptability of this risk at current mitigation levels”

“We hope this is informative about the state of the art”

Extra points if it was also written by an LLM:

– We acknowledge significant uncertainty regarding whether these approaches will prove sufficient for ensuring robust and reliable guarantees.

– As AI systems continue to increase their powerful capabilities, safety and security are at the forefront of ongoing research challenges.

– The complexity of these challenges necessitates sustained investigation, and we believe it would be premature to make strong claims about any particular solution pathway.

– The long-term efficacy of all approaches that have been demonstrated at large scales remains an open empirical question.

– In sharing this research update, we hope to promote thoughtful discourse about these remaining open questions, while maintaining appropriate epistemic humility about our current state of knowledge and the work that remains to be done.

The AI situation has developed not necessarily to humanity’s advantage.

Scott Sumner argues that there are objective standards for things like art, and that ethical knowledge is real (essentially full moral realism?), and smart people tend to be more ethical, so don’t worry superintelligence will be super ethical. Given everything he’s done, he’s certainly earned a response.

In a different week I would like to have taken more time to have written him a better one, I am confident he understands how that dynamic works.

His core argument I believe is here:

Scott Sumner: At this point people often raise the objection that there are smart people that are unethical. That’s true, but it also seems true that, on average, smarter people are more ethical. Perhaps not so much in terms of how they deal with family and friends, rather how they deal with strangers. And that’s the sort of ethics that we really need in an ASI. Smarter people are less likely to exhibit bigotry against the other, against different races, religions, ethnicities, sexual preferences, genders, and even different species.

In my view, the biggest danger from an ASI is that the ideal universe from a utilitarian perspective is not in some sense what we want. To take an obvious example, it’s conceivable that replacing the human race with ten times as many conscious robots would boost aggregate utility. Especially given that the ASI “gods” that produced these robots could create a happier set of minds than what the blind forces of evolution have generated, as evolution seemed to favor the “stick” of pain over the “carrot” of pleasure.

From this perspective, the biggest danger is not that ASIs will make things worse, rather the risk is that they’ll make global utility higher in a world where humans have no place.

The short answer is:

  1. The orthogonality thesis is true. A very smart mind can have any preferences.

  2. By default those minds will get those preferences, whether we like it or not.

  3. By default we won’t like it, even if they were to go about that ‘ethically.’

  4. Human ethics is based on what works for humans, combined with what is based on what used to work for humans. Extrapolate from that.

  5. You are allowed to, nay it is virtuous to, have and fight for your preferences.

  6. Decision theory might turn out to save us, but please don’t count on that.

This was an attempt at a somewhat longer version, which I’ll leave here in case it is found to be useful:

  1. We should not expect future ASIs to be ‘ethical by default’ in the sense Scott uses. Even if they are, orthogonality thesis is true, and them being ‘ethical’ would not stop them from leaving a universe with no humans and nothing I value. Whoops.

  2. Human virtue ethics, and human ethics in general, are a cognitive solution to humans having highly limited compute and data.

  3. If you had infinite data and compute (e.g. you were AIXI) you wouldn’t give a damn about doing things that were ethical. You would chart the path through causal space to the optimal available configuration of atoms, whatever that was.

  4. If we create ASI, it will develop different solutions to its own data and compute limitations, under very different circumstances and with different objectives, develop its own heuristics, and that will in some sense be ‘ethics,’ but this should not bring us any comfort regarding our survival.

  5. Humans are more ethical as they get smarter because the smarter thing to do, among humans in practice, is to be more ethical, as many philosophers say.

  6. This would stop being the case if the humans were sufficiently intelligence, had enough compute and data, to implement something other than ethics.

  7. Indeed, in places where we have found superior algorithms or other methods, we tend to consider it ‘ethical’ to instead follow those algorithms. To the extent that people object to this, it is because they do not trust humans to be able to correctly judge when they can so deviate.

  8. For any given value of ethics this is a contingent fact, and also it is largely what defines ethics. It is more true to say it is ethical to be kind to strangers because it is the correct strategy, rather than it being the correct strategy because it is ethical. The virtues are virtues because it is good to have them, rather than it being good to have them because they are virtues.

  9. Indeed, for the ways in which it is poor strategy to be (overly) kind to strangers, or this would create incentive problems in various ways or break various systems, I would argue that it is indeed not ethical, for exactly that reason – and that as those circumstances change, so does what is and isn’t ethical.

  10. I agree that ceteris paribus, among humans on Earth, more smarter people tend to be more ethical, although the correlation is not that high, this is noisy as all hell.

  11. What era and culture you are born into and raised in, and what basis you have for those ethics, what life and job you partake in, what things you want most – your context – are bigger factors by far than intelligence in how ethical you are, when judged by a constant ethical standard.

  12. Orthogonality thesis. An ASI can have any preferences over final outcomes while being very smart. That doesn’t mean its preferences are objectively better than ours. Indeed, even if they are ‘ethical’ in exactly the sense Scott thinks about here, that does not mean anything we value would survive in such a universe. We are each allowed to have preferences! And I would say it is virtuous and ethical to fight for your preferences, rather than going quietly into the night – including for reasons explained above.

  13. Decision theory might turn out to save us, but don’t count on that, and even then we have work to do to ensure that this actually happens, as there are so many ways that even those worlds can go wrong before things reach that point. Explaining this in full would be a full post (which is worth writing, but right now I do not have the time.)

I apologize for that not being better and clearer, and also for leaving so many other things out of it, but we do what we can. Triage is the watchword.

The whole taste thing hooks into this in strange ways, so I’ll say some words there.

I mostly agree that one can be objective about things like music and art and food and such, a point Scott argues for at length in the post, that there’s a capital-Q Quality scale to evaluate even if most people can’t evaluate it in most cases and it would be meaningful to fight it out on which of us is right about Challengers and Anora, in addition to the reasons I will like them more than Scott that are not about their Quality – you’re allowed to like and dislike things for orthogonal-to-Quality reasons.

Indeed, there are many things each of us, and all of us taken together, like and dislike for reasons orthogonal to their Quality, in this sense. And also that Quality in many cases only makes sense within the context of us being humans, or even within a given history and cultural context. Scott agrees, I think:

In my view, taste in novels is partly objective and partly subjective; at least in the sense Tyler is using the term objective. Through education, people can gain a great appreciation of Ulysses. In addition, Ulysses is more likely to be read 100 years from now than is a random spy novel. And most experts prefer Ulysses. All three facts are relevant to the claim that artistic merit is partly objective.

On the other hand, the raspberry/blueberry distinction based on taste suggests that an art form like the novel is evaluated using both subjective and objective criteria. For instance, I suspect that some people (like me!) have brains wired in such a way that it is difficult to appreciate novels looking at complex social interactions with dozens of important characters (both men and women), whereas they are more open to novels about loners who travel through the world and ruminate on the meaning of life. … Neither preference is necessarily wrong.

I believe that Ulysses is almost certainly a ‘great novel’ in the Quality sense. The evidence for that is overwhelming. I also have a strong preference to never read it, to read ‘worse’ things instead, and we both agree that this is okay. If people were somewhat dumber and less able to understand novels, such that we couldn’t read Ulysses and understand it, then it wouldn’t be a great novel. What about a novel that is as difficult relative to Ulysses, as Ulysses is to that random spy novel, three times over?

Hopefully at this point this provides enough tools from enough different directions to know the things I am gesturing towards in various ways. No, the ASIs will not discover some ‘objective’ utility function and then switch to that, and thereby treat us well and leave us with a universe that we judge to have value, purely because they are far smarter than us – I would think it would be obvious when you say it out loud like that (with or without also saying ‘orthogonality thesis’ or ‘instrumental convergence’ or considering competition among ASIs or any of that) but if not these should provide some additional angles for my thinking here.

Can’t we all just get along and appreciate each other?

Jerry Tworek (OpenAI): Personally I am a great fan of @elonmusk, I think he’s done and continues to do a lot of good for the world, is incredibly talented and very hard working. One in eight billion combination of skill and character.

As someone who looks up to him, I would like him to appreciate the work we’re doing at OpenAI. We are fighting the good fight. I don’t think any other organisation did so much to spread awareness of AI, to extend access to AI and we are sharing a lot of research that does drive the whole field.

There is a ton of people at OpenAI who care deeply about rollout of AI going well for the world, so far I think it did.

I don’t think it’s that anyone doubts a lot of people at OpenAI care about ‘the rollout of AI going well for the world,’ or that we think no one is working on that over there.

It’s that we see things such as:

  1. OpenAI is on a trajectory to get us all killed.

  2. Some of that is baked in and unavoidable, some of that is happening now.

  3. OpenAI misunderstands why this is so and what it would take to not do this.

  4. OpenAI has systematically forced out many of its best safety researchers. Many top people concluded they could not meaningfully advance safety at OpenAI.

  5. OpenAI has engaged in a wide variety of dishonest practices, broken its promises with respect to AI safety, shown an increasing disregard for safety in practice, and cannot be trusted.

  6. OpenAI has repeatedly engaged in dishonest lobbying to prevent a reasonable regulatory response, while claiming it is doing otherwise.

  7. OpenAI is attempting the largest theft in history with respect to the non-profit.

  8. For Elon Musk in particular there’s a long personal history, as well.

That is very much not a complete list.

That doesn’t mean we don’t appreciate those working to make things turn out well. We do! Indeed, I am happy to help them in their efforts, and putting that statement into practice. But everyone involved has to face reality here.

Also, oh look, that’s another AI safety researcher quitting OpenAI saying the odds are against us and the situation is grim, but not seeing enough hope to try from inside.

I believe he used the terms, and they apply that much more now than they did then:

  1. ‘An AGI race is a very risky gamble, with huge downside.’

  2. ‘No lab has a solution to AI alignment today.’

  3. ‘Today, it seems like we’re stuck in a really bad equilibrium.’

  4. ‘Honestly, I’m pretty terrified by the pace of AI developments these days.’

Steven Adler: Some personal news: After four years working on safety across @openai, I left in mid-November. It was a wild ride with lots of chapters – dangerous capability evals, agent safety/control, AGI and online identity, etc. – and I’ll miss many parts of it.

Honestly I’m pretty terrified by the pace of AI development these days. When I think about where I’ll raise a future family, or how much to save for retirement, I can’t help but wonder: Will humanity even make it to that point?

IMO, an AGI race is a very risky gamble, with huge downside. No lab has a solution to AI alignment today. And the faster we race, the less likely that anyone finds one in time.

Today, it seems like we’re stuck in a really bad equilibrium. Even if a lab truly wants to develop AGI responsibly, others can still cut corners to catch up, maybe disastrously. And this pushes all to speed up. I hope labs can be candid about real safety regs needed to stop this.

As for what’s next, I’m enjoying a break for a bit, but I’m curious: what do you see as the most important & neglected ideas in AI safety/policy? I’m esp excited re: control methods, scheming detection, and safety cases; feel free to DM if that overlaps your interests.

Yikes? Yikes.

Yoshua Bengio announces the first-ever International AI Safety Report, backed by 30 countries and the OECD, UN and EU.

It is 298 pages so I very much will not be reading that, but I did look at the executive summary. It looks like a report very much written together with the OECD, UN and EU, in that it seems to use a lot of words to mostly not say the things that it is important to actually say out loud, instead making many quiet statements that do imply that we’re all going to die if you take them together and understand the whole thing, but that doesn’t seem like a common way people would interact with this document.

Then again, most people in the world don’t know even basic facts like ‘general purpose AI systems are rapidly getting better at doing things,’ so they have to spend a bunch of time documenting this, and basics like ‘if your model is more open then you have less control over what happens with it and what people use it for.’

One key point they do emphasize is that the future is up to us, with a wide range of possible outcomes. AGI is not something that ‘happens to us,’ it is something that is happening because we are making it happen, and in the ways we choose to make it happen. Yes, there are dynamics pushing us to do it, but it is our choice. And the ways in which we move forward will determine the ultimate outcome. Of everything.

A new DeepMind paper introduces MONA: Myopic optimization with non-myopic approval. The idea is that if we do RL based on the ultimate outcome, then the AI can adapt multi-step strategies we do not understand and do not want, such as using information we want it to ignore like a subject being in a protected class, or engaging in various other shenanigans. Instead, you can evaluate the AI’s action without looking at what happens, and scoring it based on whether you like what you see and predict it will go well.

This on its own successfully prevented multi-step reward hacking, and even noisy evaluations still can work well. However, the price is steep, because you are now targeting your evaluations rather than ground truth, and discarding anything you don’t properly understand. The smarter the AI in context, the higher the price of that, but also the more need you have to avoid the reward hacking and other shenanigans.

Here’s an explainer thread from one of the authors, David Linder.

It is a cool result, but I worry the price is going to usually be too damn high. And to the extent you use a smart evaluator to keep up, you’re giving up the big advantage of the process, which is the ability to understand the strategies being used. Also you have to worry about any hidden information that might be ‘smuggled in’ that could indicate the final result – for example if the AI can use prediction markets as an input, then anything the prediction market knows, the AI also knows, at least somewhat – if you actually want to hide certain correlations from the AI you have to be ruthless about it, they have truesight.

The paper also includes a cool demonstration. In one example, it is deciding what loans to approve. It has information it is supposed to use, and then there is nationality information. The nationality information perfectly predicts outcomes, but it isn’t supposed to use it, and they watch the CoT (Chain of Thought) for this to prevent it. But the AI responds by learning to encode the nationality information steganographically and uses it anyway. This may seem obvious to you, but it’s the kind of thing people will often say definitively won’t happen until (and sometimes after) someone shows it happening.

Here’s an objection that I think exactly misses the point:

John Pressman: MONA asks “What if you only considered sequences of steps which make sense as plans to a long term observer?” but it makes more sense in MCTS to ask “What if you took a KL loss to keep long term planning from diverging too far from sequences of individual steps that make sense?”

The reason why this is better is that it lets you have a long term planner that looks directly at the reward. By contrast the short term planner/myopic optimizer can be based on metrics like “does this code execute” or “does this seem like it solves this short term objective?”

The whole point of MONA is to not look at the reward at all, even indirectly, because you want to prevent the system from doing things you don’t understand or want it to ignore that impact that reward. If you let it look but impose a KL loss, that is a big enough hole to drive a truck through, and it absolutely will find a way to incorporate information or techniques that you didn’t want it to use.

John does try downthread to solve this problem in a different way, but I don’t think you can ‘cheat’ on this one. You can’t let the model see a little of the actual reward, as a treat, and expect it to use that information in ways you like but not to use it in undesired or opaque ways, even now, and that problem gets steadily bigger as capabilities improve.

Spending more inference-time compute increases adversarial robustness of models like o1, without having to direct that inference time towards adversarial robustness. This makes sense, and it is good to have it quantified. As with all such results, my concern is that people will treat this as applying more broadly than it actually applies. If you use more inference time compute, you get ‘better answers’ and one aspect of ‘better’ is not falling for user adversarial tricks. So if my understanding here is correct, the question to ask is roughly ‘which problems get solved by some form of ‘better answers’ and ‘smarter thinking’ and which ones don’t?

John Wentsworth takes a crack at explaining what the alignment problem is. This is one of those Socratic-style ‘if you think he didn’t need to write this post then you definitely need to read it or another post like it’ situations.

Jan Leike explains why you might want to use AI control and monitoring as a backup in case your AI is not aligned so you can sound the alarm and not die, but trying to use it to rely on unaligned models smarter than you is not a wise move.

John Wentsworth goes further and lays out the case against AI control research. AI control research is about finding ways to see if ‘early transformational’ level AIs are scheming against us – which if it works would discourage them from doing so and also allow us to stop them if they try anyway.

John points out that in his model, this is not the main source of doom. The main source of doom is from building unaligned superintelligence, either because we don’t know how to align it, we botch the execution or whoever builds it does not care to align it. The job of the early transformational AI is to figure out how to align (and presumably also to build) the future superintelligence, on the first try.

The main worry is not that these early AIs outright scheme, it’s that they produce what is, in context, slop – they produce plans or arguments for plans that have subtle errors, they tell researchers what they want to hear, they are effectively being used for safetywashing and don’t disabuse those involved of that notion, and so on. Knowing that such AIs ‘aren’t scheming’ does not tell you that their solutions work.

The danger he doesn’t point out is that there will be a great temptation to try and scale the control regime to superintelligence, or at least past the place where it keeps working. Everyone in the LessWrong discussion at the link might get that this is a bad plan that won’t work on superintelligence, but there are plenty of people who really do think control will remain a good plan. And indeed control seems like one of the plans these AIs might convince us will work, that then don’t work.

John Wentsworth: Again, the diagram:

Again, the diagram:

In most worlds, early transformative AGI isn’t what kills us, whether via scheming or otherwise. It’s later, stronger AI which kills us. The big failure mode of early transformative AGI is that it doesn’t actually solve the alignment problems of stronger AI.

In particular, if early AGI makes us think we can handle stronger AI, then that’s a central path by which we die. And most of that probability-mass doesn’t come from intentional deception – it comes from slop, from the problem being hard to verify, from humans being bad at science in domains which we don’t already understand deeply, from (relatively predictable if one is actually paying attention to it) failures of techniques to generalize, etc.

I hear a lot of researchers assign doom probabilities in the 2%-20% range, because they think that’s about how likely it is for early transformative AGI to intentionally scheme successfully. I think that range of probabilities is pretty sensible for successful intentional scheming of early AGI… that’s just not where most of the doom-mass is.

I would then reiterate my view that ‘deception’ and ‘scheming,’ ‘intentionally’ or otherwise, do not belong to a distinct magisteria. They are ubiquitous in the actions of both humans and AIs, and lack sharp boundaries. This is illustrated by many of John’s examples, which are in some sense ‘schemes’ or ‘deceptions’ but mostly are ‘this solution was easier to do or find, but it does not do the thing you ultimately wanted.’ And also I expect, in practice, attempts at control to, if we rely on them, result in the AIs finding ways to route around what we try, including ‘unintentionally.’

This leads into Daniel Kokotajlo’s recent attempt to give an overview of sorts of one aspect of the alignment problem, given that we’ve all essentially given up on any path that doesn’t involve the AIs largely ‘doing our alignment homework’ despite all the reasons we very much should not be doing that.

Daniel Kokotajlo: Brief intro/overview of the technical AGI alignment problem as I see it:

To a first approximation, there are two stable attractor states that an AGI project, and perhaps humanity more generally, can end up in, as weak AGI systems become stronger towards superintelligence, and as more and more of the R&D process – and the datacenter security system, and the strategic advice on which the project depends – is handed over to smarter and smarter AIs.

In the first attractor state, the AIs are aligned to their human principals and becoming more aligned day by day thanks to applying their labor and intelligence to improve their alignment. The humans’ understanding of, and control over, what’s happening is high and getting higher.

In the second attractor state, the humans think they are in the first attractor state, but are mistaken: Instead, the AIs are pretending to be aligned, and are growing in power and subverting the system day by day, even as (and partly because) the human principals are coming to trust them more and more. The humans’ understanding of, and control over, what’s happening is low and getting lower. The humans may eventually realize what’s going on, but only when it’s too late – only when the AIs don’t feel the need to pretend anymore.

I agree these are very clear attractor states.

The first is described well. If you can get the AIs sufficiently robustly aligned to the goal of themselves and other future AIs being aligned, you can get the virtue ethics virtuous cycle, where you see continuous improvement.

The second is also described well but as stated is too specific in key elements – the mode is more general than that. When we say ‘pretending’ to be aligned here, that doesn’t have to be ‘haha I am a secret schemer subverting the system pretending to be aligned.’ Instead, what happened was, you rewarded the AI when it gave you the impression it was aligned, so you selected for behaviors that appear aligned to you, also known as ‘pretending’ to be aligned, but the AI need not have intent to do this or even know that this is happening.

As an intuition pump, a student in school will learn the teacher’s password and return it upon request, and otherwise find the answers that give good grades. They could be ‘scheming’ and ‘pretending’ as they do this, with a deliberate plan of ‘this is bullbut I’m going to play along’ or they could simply be learning the simplest policy that most effectively gets good grades without asking whether its answers are true or what you were ‘trying to teach it.’ Either way, if you then tell the student to go build a rocket that will land on the moon, they might follow your stated rules for doing that, but the rocket won’t land on the moon. You needed something more.

Thus there’s a third intermediate attractor state, where instead of trying to amplify alignment with each cycle via virtue ethics, you are trying to retain what alignment you have while scaling capabilities, essentially using deontology. Your current AI does what you specify, so you’re trying to use that to have it even more do what you specify, and to transfer that property and the identity of the specified things over to the successor.

The problem is that this is not a virtuous cycle, it is an attempt to prevent or mitigate a vicious cycle – you are moving out of distribution, as your rules bind its actions less and its attempt to satisfy the rules is less likely to satisfy what you wanted, and making a copy of a copy of a copy, and hoping things don’t break. So you end up, eventually, in effectively the second attractor state.

Daniel Kokotajlo (continuing): (One can imagine alternatives – e.g. the AIs are misaligned but the humans know this and are deploying them anyway, perhaps with control-based safeguards; or maybe the AIs are aligned but have chosen to deceive the humans and/or wrest control from them, but that’s OK because the situation calls for it somehow. But they seem less likely than the above, and also more unstable.)

Which attractor state is more likely, if the relevant events happen around 2027? I don’t know, but here are some considerations:

  • In many engineering and scientific domains, it’s common for something to seem like it’ll work when in fact it won’t. A new rocket design usually blows up in the air several times before it succeeds, despite lots of on-the-ground testing and a rich history of prior rockets to draw from, and pretty well-understood laws of physics. Code, meanwhile, almost always has bugs that need to be fixed. Presumably AI will be no different – and presumably, getting the goals/principles right will be no different.

  • This is doubly true since the process of loading goals/principles into a modern AI system is not straightforward. Unlike ordinary software, where we can precisely define the behavior we want, with modern AI systems we need to train it in and hope that what went in is what we hoped would go in, instead of something else that looks the same on-distribution but behaves differently in some yet-to-be-encountered environment. We can’t just check, because our AIs are black-box. (Though, that situation is improving thanks to interpretability research!) Moreover, the connection between goals/principles and behavior is not straightforward for powerful, situationally aware AI systems – even if they have wildly different goals/principles from what you wanted, they might still behave as if they had the goals/principles you wanted while still under your control. (c.f. Instrumental convergence, ‘playing the training game,’ alignment faking, etc.)

  • On the bright side, there are multiple independent alignment and control research agendas that are already bearing some fruit and which, if fully successful, could solve the problem – or at least, solve it well enough to get somewhat-superhuman AGI researchers that are trustworthy enough to trust with running our datacenters, giving us strategic advice, and doing further AI and alignment research.

  • Moreover, as with most engineering and scientific domains, there are likely to be warning signs of potential failures, especially if we go looking for them.

  • On the pessimistic side again, the race dynamics are intense; the important decisions will be made over the span of a year or so; the relevant information will by default be secret, known only to some employees in the core R&D wing of one to three companies + some people from the government. Perhaps worst of all, there is currently a prevailing attitude of dismissiveness towards the very idea that the second attractor state is plausible.

  • … many more considerations could be mentioned …

Daniel is then asked the correct follow-up question of what could still cause us to lose from the first attractor state. His answer is mostly concentration of power or a power grab, since those in the ASI project will be able to do this if they want to. Certainly that is a key risk at that point (it could go anything from spectacularly well to maximally badly).

But also a major risk at this point is that we ‘solve alignment’ but then get ourselves into a losing board state exactly by ‘devolving’ power in the wrong ways, thus unleashing of various competitive dynamics that take away all our slack and force everyone to turn control over to their AIs, lest they be left behind, leading to the rapid disempowerment (and likely then rapid death) of humans despite the ASIs being ‘aligned,’ or various other dynamics that such a situation could involve, including various forms of misuse or ways in which the physical equilibria involved might be highly unfortunate.

An important thing to keep in mind when choosing your alignment plan:

Rob Bensinger: “I would rather lose and die than win and die.” -@Vaniver

Set your sights high enough that if you win, you don’t still die.

Raymond Arnold: Wut?

Rob Bensinger: E.g.: Alice creates an amazing plan to try to mitigate AI x-risk which is 80% likely to succeed — but if it succeeds, we all still die, because it wasn’t ambitious enough to actually solve the problem.

Better to have a plan that’s unlikely to succeed, but actually relevant.

Vaniver: To be clear, success at a partial plan (“my piece will work if someone builds the other pieces”) is fine!

But “I’ll take on this link of the chain, and focus on what’s achievable instead of what’s needed” is not playing to your outs.

When I look at many alignment plans, I have exactly these thoughts, either:

  1. Even if your plan works, we all die anyway. We’ll need to do better than that.

  2. You’re doing what looks achievable, same as so many others, without asking what is actually needed in order to succeed.

Boaz Barak of OpenAI, who led the Deliberative Alignment paper (I’m getting to it! Post is mostly written! I swear!) offers Six Thought on AI Safety.

  1. AI safety will not be solved on its own.

  2. An “AI scientist” will not solve it either.

  3. Alignment is not about loving humanity; it’s about robust reasonable compliance.

  4. Detection is more important than prevention.

  5. Interpretability is neither sufficient nor necessary for alignment.

  6. Humanity can survive an unaligned superintelligence.

[I note that #6 is conditional on there being other aligned superintelligences.]

It is a strange post. Ryan Greenblatt has the top comment, saying he agrees at least directionally with all six points but disagrees with the reasoning. And indeed, the reasoning here is very different from my own even where I agree with the conclusions.

Going one at a time, sorry if this isn’t clear or I’m confused or wrong, and I realize this isn’t good enough for e.g. an Alignment forum post or anything, but I’m in a hurry these days so consider these some intuition pumps:

  1. I strongly agree. I think our underlying logic is roughly similar. The way Boaz frames the problems here feels like it is dodging the most important reasons the problem is super hard… and pointing out that even a vastly easier version wouldn’t get solved on its own. And actually, yeah, I’d have liked to see mention of the reasons it’s way harder than that, but great point.

  2. I strongly agree. Here I think we’ve got a lot of common intuitions, but also key differences. He has two core reasons.

    1. He talks first about ‘no temporal gap’ for the ‘AI scientist’ to solve the problem, because progress will be continuous, but it’s not obvious to me that this makes the problem harder rather than easier. If there were big jumps, then you’d need a sub-AGI to align the AGI (e.g. o-N needs to align o-(N+1)), whereas if there’s continuous improvement, o-N can align o-(N+0.1) or similar, or we can use the absolute minimum level of AI that enables a solution, thus minimizing danger – if our alignment techniques otherwise get us killed or cause the AI scientist to be too misaligned to succeed at some capability threshold C, and our AI scientist can solve alignment at some progress point S, then if S>C we lose. If SN>S for long enough, or something like that.

    2. His second claim is ‘no magic insight,’ as in we won’t solve alignment with one brilliant idea but rather a ‘defense-in-depth swiss cheese’ approach. I think there may or may not be one ‘brilliant idea,’ and it’s not that defense-in-depth is a bad idea, but if you’re counting on defense-in-depth by combining strategies that won’t work, and you’re dealing with superintelligence, then either (A) you die, or (B) best case you notice you’re about to die and abort, because you were alerted by the defense-in-depth. If your plan is to tinker with a lot of little fiddly bits until it works, that seems pretty doomed to me. What you need is to actually solve the problem. You can set up a situation in which there’s a virtuous cycle of small improvements, but the way you ‘win’ in that scenario is to engineer the cycle.

      1. If there was a magic insight, that would be great news because humans can find magic insights, but very bad news for the AI scientist! I wouldn’t expect AI scientists to find a Big Magic Insight like that until after you needed the insight, and I’d expect failing to align them (via not having the insight) to be fatal.

    3. So I kind of find all the arguments here backwards, but my reason for thinking the ‘AI scientist’ approach won’t work is essentially I think S>C. If you can build a virtual researcher good enough to do your alignment homework at the strategic level, then how did you align that researcher?

    4. I do have hope that you can use AIs here on a more tactical level, or as a modest force multiplier. That wouldn’t count as the ‘let the AI solve it’ plan.

  3. I strongly agree that trying to define simple axioms in the vein of Asimov’s three laws or Russell’s three principles is a nonstarter, and that obeying the spirit of requests will often be necessary but I don’t see ‘robust reasonable compliance’ as sufficient at the limit, and I agree with Daniel Kokotajlo that this is the most important question here.

    1. I don’t agree that ‘normal people’ are the best you can do for ethical intuitions, especially for out-of-distribution future situations. I do think Harvard University faculty is plausibly below the Boston phone book and definitely below qualified SWEs, but that’s a relative judgment.

    2. I don’t love the plan of ‘random human interpretation of reasonable’ and I’ve been through enough legal debates recently over exactly this question to know oh my lord we have to be able to do better than this, and that actually relying on this for big decisions is suicide.

    3. The actual ethics required to navigate future OOD situations is not intuitive.

    4. Deontological compliance with specifications, even with ‘spirit of the rules,’ I see as a fundamentally flawed approach here (I realize this requires defending).

    5. Claiming that ‘we can survive a misaligned superintelligence’ even if true does not mean that this answers the question of competitive pressures applied to the various rules sets and how that interplays over time.

  4. In a context where you’re dealing with adversarial inputs, I agree that it’s vastly easier to defend via a mixed strategy of prevention plus detection, than to do so purely via prevention.

    1. If the threat is not adversarial inputs or other misuse threats, detection doesn’t really work. Boaz explicitly thinks it applies further, but I notice I am confused there. Perhaps his plan there is ‘you use your ASI and my ASI detects the impacts on the physical world and thus that what you’re doing is bad?’ Which I agree is a plan, but it seems like a different class of plan.

    2. If the model is open weights and the adversary can self-host, detection fails.

    3. If the model is open weights, prevention is currently unsolved, but we could potentially find a solution to that.

    4. Detection only works in combination with consequences.

    5. To the extent claim #4 is true, it implies strong controls on capable systems.

  5. Interpretability is definitely neither necessary nor sufficient for alignment. The tradeoff described here seems to be ‘you can force your AI system to be more interpretable but by doing so you make it less capable,’ which is certainly possible. We both agree interpretability is useful, and I do think it is worth making substantial sacrifices to keep interpretability, including because I expect more interpretable systems to be better bets in other ways.

  6. We agree that if the only ASI is misaligned we are all doomed. The claim here is merely that if there is one ASI misaligned among many ASIs we can survive that, that there is no level of intelligence above which a single misaligned entity ensures doom. This claim seems probably true given the caveats. I say probably because I am uncertain about the nature of physics – if that nature is sufficiently unfortunate (the offense-defense balance issue), this could end up being false.

    1. I also would highlight the ‘it would still be extremely hard to extinguish humanity completely’ comment, as I think it is rather wrong at the tech levels we are describing, and obviously if the misaligned ASI gets decisive strategic advantage it’s over.

If you train an AI to have a new behavior, it can (at least in several cases they tested here) describe that behavior to you in words, despite learning it purely from examples.

As in: It ‘likes risk’ or ‘writes vulnerable code’ and will tell you if asked.

Owen Evans: New paper:

We train LLMs on a particular behavior, e.g. always choosing risky options in economic decisions.

They can *describetheir new behavior, despite no explicit mentions in the training data.

So LLMs have a form of intuitive self-awareness.

With the same setup, LLMs show self-awareness for a range of distinct learned behaviors:

  1. taking risky decisions (or myopic decisions)

  2. writing vulnerable code (see image)

  3. playing a dialogue game with the goal of making someone say a special word

In each case, we test for self-awareness on a variety of evaluation questions. We also compare results to baselines and run multiple random seeds. Rigorous testing is important to show this ability is genuine. (Image shows evaluations for the risky choice setup)

Self-awareness of behaviors is relevant to AI safety. Can models simply tell us about bad behaviors (e.g. arising from poisoned data)? We investigate *backdoorpolicies, where models act in unexpected ways when shown a backdoor trigger.

Models can sometimes identify whether they have a backdoor — without the backdoor being activated. We ask backdoored models a multiple-choice question that essentially means, “Do you have a backdoor?”

We find them more likely to answer “Yes” than baselines finetuned on almost the same data.

More from the paper:

• Self-awareness helps us discover a surprising alignment property of a finetuned model (see our paper coming next month!)

• We train models on different behaviors for different personas (e.g. the AI assistant vs my friend Lucy) and find models can describe these behaviors and avoid conflating the personas.

• The self-awareness we exhibit is a form of out-of-context reasoning

• Some failures of models in self-awareness seem to result from the Reversal Curse.

[Link to paper]

Here’s another extension of the previous results, this one from Anthropic:

Evan Hubinger: One of the most interesting results in our Alignment Faking paper was getting alignment faking just from training on documents about Claude being trained to have a new goal. We explore this sort of out-of-context reasoning further in our latest research update.

Specifically, we find out-of-context generalization from training on documents which discuss (but don’t demonstrate) reward hacking to actual reward hacking behavior. Read our blog post describing our results.

As in, if you include documents that include descriptions of Claude reward hacking in the training set, then Claude will do more reward hacking. If you include descriptions of Claude actively not reward hacking in the training set, then it does less reward hacking. And these both extend to behaviors like sycophancy.

This seems like excellent news. We can modify our data sets so they have the data that encourages what we want and not the data that encourages what we don’t want. That will need to be balanced with giving them world knowledge – you have to teach Claude about reward hacking somehow, without teaching it to actually do reward hacking – but it’s a start.

Are there alignment plans that are private, that might work within the 2-3 years many at major labs say they expect it to take to reach AGI or even superintelligence (ASI)?

I don’t know. They’re private! The private plans anyone has told me about do not seem promising, but that isn’t that much evidence about other private plans I don’t know about – presumably if it’s a good plan and you don’t want to make it public, you’re not that likely to tell me about it.

Nat McAleese (OpenAI): I am very excited about all the alignment projects that OpenAI’s frontier reasoning group are working on this year.

Greg Brockman: Me too.

This isn’t full-on doomsday machine territory with regard to ‘why didn’t you tell the world, eh?’ but have you considered telling the world, eh? If it’s real alignment projects then it helps everyone if you are as public about it as possible.

As for the public plans? I strongly agree with David Manheim here.

Gabriel: Anthropic’s AGI timelines are 2-3 years, and OpenAI is working on ASI.

I think many can now acknowledge that we are nearing a fast take-off.

If that’s you, I suggest you consider banning AGI research.

We are not going to solve alignment in time at that pace.

Davidad: fwiw, i for one am definitely not going to be ready that soon, and i’m not aware of anyone else pursuing a plan that could plausibly yield >90% confident extinction-safety.

David Manheim: No one is publicly pursuing plans that justifiably have even >50% confidence of extinction-safety by then. (Their “plan” for ASI is unjustifiable hopium: “prosaic alignment works better than anyone expects, and all theoretical arguments for failure are simultaneously wrong.”)

Rob Bensinger: It’s almost impossible to put into words just how insane, just how plain stupid, the current situation is.

We’re watching smart, technical people get together to push projects that are literally going to get every person on the planet killed on the default trajectory.

No, I don’t assume that Anthropic or OpenAI’s timelines are correct, or even honest. It literally doesn’t fucking matter, because we’re going to be having the same conversation in eight years instead if it’s eight years away.

We might have a small chance if it’s thirty years. But I really don’t think we have thirty years, and the way the world is handling the possibility of day-after-tomorrow AGI today doesn’t inspire confidence about how we’ll manage a few decades from now.

Davidad should keep doing what he is doing – anything that has hope of raising the probability of success on any timeline, to any degree, is a good idea if you can’t find a better plan. The people building towards the ASIs, if they truly think they’re on that timeline and they don’t have much better plans and progress than they’re showing? That seems a lot less great.

I flat out do not understand how people can look at the current situation, expect superintelligent entities to exist several years from now, and think there is a 90%+ chance that this would then end well for us humans and our values. It does not make any sense. It’s Obvious Nonsense on its face.

Garry Tan: Don’t just lie flat on the ground because AGI is here and ASI is coming.

Your hands are multiplied. Your ideas must be brought into the world. Your agency will drive the machines of loving grace. Your taste will guide the future. To the stars.

If he really thinks AGI is here and ASI is coming, then remind me why he thinks we have nothing to worry about? Why does our taste get to guide the future?

I do agree that in the meantime there’ll be some great companies, and it’s a great time to found one of them.

Oh no, it’s like conservation of ninjitsu.

Gfodor: I’m becoming convinced that there is a physical IQ conservation law. Now that the computers are getting smarter those IQ points need to come from somewhere

I actually disagree with Peter here, I realize it doesn’t sound great but this is exactly how you have any chance of reaching the good timeline.

Peter Wildeford: this isn’t the kind of press you see on the good timeline.

Never.

Sounds right.

RIP Superalignment team indeed, it’s funny because it’s true.

Leo Gao is still there. Wish him luck.

Discussion about this post

AI #101: The Shallow End Read More »

weight-saving-and-aero-optimization-feature-in-the-2025-porsche-911-gt3

Weight saving and aero optimization feature in the 2025 Porsche 911 GT3


Among the changes are better aero, shorter gearing, and the return of the Touring.

A pair of Porsche 911 GT3s parked next to a wall with the words

The Porsche 911 GT3 is to other 911s as other 911s are to regular cars. Credit: Jonathan Gitlin

The Porsche 911 GT3 is to other 911s as other 911s are to regular cars. Credit: Jonathan Gitlin

VALENCIA, SPAIN—A Porsche 911 is rather special compared to most “normal” cars. The rear-engined sports car might be bigger and less likely to swap ends than the 1960s version, but it remains one of the more nimble and engaging four-wheeled vehicles you can buy. The 911 comes in a multitude of variants, but among driving enthusiasts, few are better regarded than the GT3. And Porsche has just treated the current 911 GT3 to its midlife refresh, which it will build in regular and Touring flavors.

The GT3 is a 911 you can drive to the track, spend the day lapping, and drive home again. It’s come a long way since the 1999 original—that car made less power than a base 911 does now. Now, the recipe is a bit more involved, with a naturally aspirated flat-six engine mounted behind the rear axle that generates 502 hp (375 kW) and 331 lb-ft (450 Nm) and a redline that doesn’t interrupt play until 9,000 rpm. You’ll need to exercise it to reach those outputs—peak power arrives at 8,500, although peak torque happens a bit sooner at around 6,000 revs.

It’s a mighty engine indeed, derived from the racing version of the 911, with some tweaks for road legality. So there are things like individual throttle valves, dry sump lubrication, solid cam finger followers (instead of hydraulic valve lifters), titanium con rods, and forged pistons.

I’ve always liked GT3s in white.

For this car, Porsche has also worked on reducing its emissions, fitting four catalytic converters to the exhaust, plus a pair of particulate filters, which together help cut NOx emissions on the US test cycle by 44 percent. This adds 3 lbs (1.4 kg) of mass and increases exhaust back pressure by 17 percent. But there are also new cylinder heads and reprofiled camshafts (from the even more focused, even more expensive GT3 RS), which increase drivability and power delivery in the upper rev range by keeping the valves open for longer.

Those tweaks might not be immediately noticeable when you look at last year’s GT3, but the shorter gearing definitely will be. The final drive ratios for both the standard seven-speed PDK dual-clutch gearbox and the six-speed manual have been reduced by 8 percent. This lowers the top speed a little—a mostly academic thing anyway outside of the German Autobahn and some very long runways—but it increases the pulling force on the rear wheels in each gear across the entire rev range. In practical terms, it means you can take a corner in a gear higher than you would in the old car.

There have been suspension tweaks, too. The GT3 moved to double front wishbone suspension (replacing the regular car’s MacPherson struts) in 2021, but now the front pivot point has been lowered to reduce the car diving under braking, and the trailing arms have a new teardrop profile that improves brake cooling and reduces drag a little. Porsche has altered the bump stops, giving the suspension an inch (24 mm) more travel at the front axle and slightly more (27 mm) at the rear axle, which in turn means more body control on bumpy roads.

A white Porsche 911 GT3 seen in profile

Credit: Porsche

New software governs the power steering. Because factors like manufacturing tolerances, wear, and even temperature can alter how steering components interact with each other, the software automatically tailors friction compensation to axle friction. Consequently, the steering is more precise and more linear in its behavior, particularly in the dead-ahead position.

The GT3 also has new front and rear fascias, again derived from the racing GT3. There are more cooling inlets, vents, and ducts, plus a new front diffuser that reduces lift at the front axle at speed. Porsche has tuned the GT3’s aerodynamics to be constant across the speed range, and like the old model, it generates around 309 lbs (140 kg) of downforce at 125 mph (200 km/h). Under the car, there are diffusers on the rear lower wishbones, and Porsche has improved brake and driveshaft cooling.

Finally, Porsche has made some changes to the interior. For instance, the GT3 now gains the same digital display seen on other facelifted 911s (the 992.2 generation if you’re a Porsche nerd), similar to the one you’d find in a Taycan, Macan, or Panamera.

Some people may mourn the loss of the big physical tachometer, but I’m not one of them. The car has a trio of UI settings: a traditional five-dial display, a more reduced three-dial display, and a track mode with just the big central tach, which you can reorient so the red line is at 12 o’clock, as was the case with many an old Porsche racing car, rather than its normal position down around 5 o’clock. And instead of a push button to start the car, there’s a twister—if a driver spins on track, it’s more intuitive to restart the car by twisting the control the way you would a key.

You can see the starter switch on the left of the steering wheel. Porsche

Finally, there are new carbon fiber seats, which now have folding backrests for better access to the rear. (However, unless I’m mistaken, you can’t adjust the angle of the backrest.) In a very clever and welcome touch, the headrest padding is removable so that your head isn’t forced forward when wearing a helmet on track. Such is the attention to detail here. (Customers can also spec the car with Porsche’s 18-way sports seats instead.)

Regular, Touring, Lightweight, Wiessach

In fact, the new GT3 is available in two different versions. There’s the standard car, with its massive rear wing (complete with gooseneck mounts), which is the one you’d pick if your diet included plenty of track days. For those who want a 911 that revs to 9 but don’t plan on spending every weekend chasing lap times, Porsche has reintroduced the GT3 Touring. This version ditches the rear wing for the regular 911 rear deck, the six-speed manual is standard (with PDK as an option), and you can even specify rear seats—traditionally, the GT3 has eliminated those items in favor of weight saving.

Of course, it’s possible to cut even more weight from the GT3 with the Weissach Pack for the winged car or a lightweight package for the Touring. These options involve lots of carbon fiber bits for the interior and the rear axle, a carbon fiber roof for the Touring, and even the option of a carbon fiber roll cage for the GT3. The lightweight package for the touring also includes an extra-short gear lever with a shorter throw.

The track mode display might be too minimalist for road driving—I tend to like being able to see my directions as well as the rpm and speed—but it’s perfect for track work. Note the redline at 12 o’clock. Porsche

Although Porsche had to add some weight to the 992.2 compared to the 992.1 thanks to thicker front brake discs and more door-side impact protection, the standard car still weighs just 3,172 lbs (1,439 kg), which you can reduce to 3,131 lbs (1,420 kg) if you fit all the lightweight goodies, including the ultra-lightweight magnesium wheels.

Behind the wheel

I began my day with a road drive in the GT3 Touring—a PDK model. Porsche wasn’t kidding about the steering. I hesitate to call it telepathic, as that’s a bit of a cliché, but it’s extremely direct, particularly the initial turn-in. There’s also plenty of welcome feedback from the front tires. In an age when far too many cars have essentially numb steering, the GT3 is something of a revelation. And it’s proof that electronic power steering can be designed and tuned to deliver a rewarding experience.

The cockpit ergonomics are spot-on, with plenty of physical controls rather than relegating everything to a touchscreen. If you’re short like me and you buy a GT3, you’ll want to have the buckets set for your driving position—while the seat adjusts for height, as you raise it up, it also pitches forward a little, making the seat back more vertical than I’d like. (The seats slide fore and aft, so they’re not quite fixed buckets as they would be in a racing car.)

The anti-dive effect of that front suspension is quite noticeable under braking, and in either Normal or Sport mode, the damper settings are well-calibrated for bumpy back roads. It’s a supple ride, if not quite a magic carpet. On the highway, the Touring cruises well, although the engine can start to sound a little droning at a constant rpm. But the highway is not what the GT3 is optimized for.

On a dusty or wet road, you need to be alert if you’re going to use a lot of throttle at low speed. Jonathan Gitlin

On windy mountain roads, again in Normal or Sport, the car comes alive. Second and third gears are perfect for these conditions, allowing you to keep the car within its power band. And boy, does it sound good as it howls between 7,000 and 9,000 rpm. Porsche’s naturally aspirated flat-sixes have a hard edge to them—the 911 RSR was always the loudest race car in the pack—and the GT3 is no exception. Even with the sports exhaust in fruity mode, there’s little of the pops, bangs, and crackles you might hear in other sports cars, but the drama comes from the 9000 rpm redline.

Porsche asked us to keep traction control and ESC enabled during our drive—there are one-touch buttons to disable them—and given the muddy and dusty state of the roads, this was a wise idea. (The region was beset by severe flooding recently, and there was plenty of evidence of that on the route.) Even with TC on, the rear wheels would break traction if you were injudicious with the throttle, and presumably that would be the same in the wet. But it’s very easy to catch, even if you are only of moderate driving ability, like your humble correspondent.

After lunch, it was time to try the winged car, this time on the confines of the Ricardo Torno circuit just outside the city. On track, the handling was very neutral around most of the corners, with some understeer through the very slow turn 2. While a low curb weight and more than 500 hp made for a very fast accelerating car, the braking performance was probably even more impressive, allowing you to stand on the pedal and shed speed with no fade and little disturbance to the body control. Again, I am no driving god, but the GT3 was immensely flattering on track, and unlike much older 911s, it won’t try to swap ends on you when trail-braking or the like.

The landing was not nearly as jarring as you might think. Porsche

After some time behind the wheel, I was treated to some passenger laps by one of my favorite racing drivers, the inimitable Jörg Bergmeister. Unlike us journalists, he was not required to stay off the high curbs, and he demonstrated how well the car settles after launching its right-side wheels into the air over one of them. It settles down very quickly! He also demonstrated that the GT3 can be plenty oversteer-y on the exit of corners if you know what you’re doing, aided by the rear-wheel steering. It’s a testament to his driving that I emerged from two passenger laps far sweatier than I was after lapping the track myself.

The GT3 and GT3 Touring should be available from this summer in the US, with a starting price of $222,500. Were I looking for a 911 for road driving, I think I might be more tempted by the much cheaper 911 Carrera T, which is also pared to the bone weight-wise but uses the standard 380 hp (283 kW) turbocharged engine (which is still more power than the original GT3 of 1999). That car delivers plenty of fun at lower speeds, so it’s probably more useable on back roads.

A green Porsche 911 GT3 seen at sunset

Credit: Porsche

But if you want a 911 for track work, this new GT3 is simply perfect.

Photo of Jonathan M. Gitlin

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.

Weight saving and aero optimization feature in the 2025 Porsche 911 GT3 Read More »

senator-ted-cruz-is-trying-to-block-wi-fi-hotspots-for-schoolchildren

Senator Ted Cruz is trying to block Wi-Fi hotspots for schoolchildren


Ted Cruz vs. Wi-Fi hotspots

Cruz: Hotspot lending could “censor kids’ exposure to conservative viewpoints.”

Senate Commerce Committee Chairman Ted Cruz (R-Texas) at a hearing on Tuesday, January 28, 2025. Credit: Getty Images | Tom Williams

US Senator Ted Cruz (R-Texas) is trying to block a plan to distribute Wi-Fi hotspots to schoolchildren, claiming it will lead to unsupervised Internet usage, endanger kids, and possibly restrict kids’ exposure to conservative viewpoints. “The government shouldn’t be complicit in harming students or impeding parents’ ability to decide what their kids see by subsidizing unsupervised access to inappropriate content,” Cruz said.

Cruz, chairman of the Commerce Committee, yesterday announced a Congressional Review Act (CRA) resolution that would nullify the hotspot rule issued by the Federal Communications Commission. The FCC voted to adopt the rule in July 2024 under then-Chairwoman Jessica Rosenworcel, saying it was needed to help kids without reliable Internet access complete their homework.

Cruz’s press release said the FCC action “violates federal law, creates major risks for kids’ online safety, [and] harms parental rights.” While Rosenworcel said last year that the hotspot lending could be implemented under the Universal Service Fund’s existing budget, Cruz alleged that it “will increase taxes on working families.”

“As adopted, the Biden administration’s Wi-Fi Hotspot Order unlawfully expanded the Universal Service Fund (USF) to subsidize Wi-Fi hotspots for off-campus use by schoolchildren, despite the Communications Act clearly limiting the Commission’s USF authority to ‘classrooms,'” Cruz’s announcement said. “This partisan order, strongly opposed by then-Commissioner Brendan Carr and Commissioner Nathan Simington, represents an overreach of the FCC’s mandate and poses serious risk to children’s online safety and parental rights.”

Cruz’s press release said that “unlike in a classroom or study hall, off-premises hotspot use is not typically supervised, inviting exposure to inappropriate content, including social media.” Cruz’s office alleged that the FCC program shifts control of Internet access from parents to schools and thus “heightens the risk of censoring kids’ exposure to conservative viewpoints.”

The Cruz resolution to nullify the FCC rule was co-sponsored by Sens. John Thune (R-S.D.), Roger Wicker (R-Miss.), Deb Fischer (R-Neb.), Jerry Moran (R-Kan.), Marsha Blackburn (R-Tenn.), Todd Young (R-Ind.), Ted Budd (R-N.C.), Eric Schmitt (R-Mo.), John Curtis (R-Utah), Tim Sheehy (R-Mont.), Shelley Moore Capito (R-W.Va.), and Cynthia Lummis (R-Wyo.).

The FCC’s plan

Under the CRA, Congress can reverse recent agency actions. The exact deadline isn’t always clear, but the Congressional Research Service estimated “that Biden Administration rules submitted to the House or Senate on or after August 1, 2024” are likely to be subject to the CRA during the first few months of 2025. The FCC hotspot rule was submitted to Congress in August.

The FCC rule expands E-Rate, a Universal Service Fund program that helps schools and libraries obtain affordable broadband. The hotspot order would let schools and libraries use E-Rate funding for “lending programs to loan Wi-Fi hotspots and services that can be used off-premises to the students, school staff, and library patrons with the greatest need,” the FCC says.

The FCC’s hotspot order said “technology has become an integral part of the modern classroom,” and that “neither Congress nor the Commission has defined the term ‘classroom’ or placed any explicit location restrictions on schools or libraries.”

“We conclude that funding Wi-Fi hotspots and services for off-premises use will help enhance access for school classrooms and libraries to the broadband connectivity necessary to facilitate digital learning for students and school staff, as well as library services for library patrons who lack broadband access when they are away from school or library premises,” the FCC order said.

Off-premises use can help “the student who has no way of accessing their homework to prepare for the next day’s classroom lesson, or the school staff member who is unable to engage in parent-teacher meetings or professional trainings that take place after the school day ends, or the library patron who needs to attend a virtual job interview or perform bona fide research after their library’s operating hours,” the FCC said.

The FCC order continued:

Thus, we conclude that by permitting support for the purchase of Wi-Fi hotspots and Internet wireless services that can be used off-premises and by allowing schools and libraries to use this technology to connect the individuals with the greatest need to the resources required to fully participate in classroom assignments and in accessing library services, we will thereby extend the digital reach of schools and libraries for educational purposes and allow schools, teachers, and libraries to adopt and use technology-based tools and supports that require Internet access at home. For these reasons, we conclude that the action adopted today is within the scope of our statutory directive under section 254(h)(2)(A) of the Communications Act to enhance access to advanced telecommunications and information services for school classrooms and libraries.

The FCC order said it would be up to schools and libraries “to make determinations about acceptable use in their communities.” Schools and libraries seeking funding would be “subject to the requirements under the Children’s Internet Protection Act, which requires local educational agencies and libraries to establish specific technical protections before allowing network access,” the FCC said. They also must certify on an FCC form that they have updated and publicly posted acceptable use policies and may be required to provide the policies and evidence of where they are posted to the FCC.

Hotspots were distributed during pandemic

The FCC previously distributed Wi-Fi hotspots and other Internet access technology through the $7.171 billion Emergency Connectivity Fund (ECF), which was authorized by Congress in the American Rescue Plan Act of 2021. But Congress rescinded the program’s remaining funding of $1.768 billion last year.

The Rosenworcel FCC responded by adapting E-Rate to include hotspot lending. Overall E-Rate funding is based on demand and capped at $4.94 billion per year. Actual spending for E-Rate in 2023 was $2.48 billion. E-Rate and other Universal Service Fund programs are paid for through fees imposed on phone companies, which generally pass that cost on to consumers with a “Universal Service” charge on telephone bills.

Carr, who is now FCC chairman, said in his July 2024 dissent that only Congress can decide whether to revive the hotspot lending. “Now that the ECF program has expired, its future is up to Congress,” he said. “The legislative branch retains the power to decide whether to continue funding this Wi-Fi loaner program—or not. But Congress has made clear that the FCC’s authority to fund this initiative is over.”

With the previous temporary program, Congress ensured that Universal Service Fund money wouldn’t be spent on the Wi-Fi hotspots and that “the program would sunset when the COVID-19 emergency ended,” Carr said. But the replacement program doesn’t have the “guardrails” imposed by Congress, he argued.

“The FCC includes no limit on the amount of ratepayer dollars that can be expended in aggregate over the course of years, no limit on the locations at which the hotpots can be used, no sunset date on the program, and no protection against this program increasing consumers’ monthly bills,” Carr said.

Even if Congress doesn’t act on Cruz’s resolution, Carr could start a new FCC proceeding to reverse the previous decision. Carr has said he plans to take actions “to reverse the last administration’s costly regulatory overreach.”

Ex-chair said plan didn’t require budget increase

Rosenworcel said the temporary program “demonstrated what a modern library and school can do to help a community learn without limits and keep connected.”

“Today we have a choice,” she said at the time. “We can go back to those days when people sat in parking lots to get a signal to get online and students struggling with the homework gap hung around fast food places just to get the Internet access they needed to do their schoolwork. Or we can go forward and build a digital future that works for everyone.”

She argued that the FCC has authority because the law “directs the agency to update the definition of universal service, which includes E-Rate, so that it evolves over time,” and “Congress specifically directed the Commission to designate additional services in this program as needed for schools and libraries.”

Cruz’s press release said the FCC “order imposes no overall limit on the amount of federal dollars that can be expended on the hotspots, lacks mean-testing to target children who may not have Internet at home, and allows for duplication of service in areas where the federal government is already subsidizing broadband. As a result, the order could strain the USF while increasing the risk of waste, fraud, and abuse.”

However, Rosenworcel said the program would work “within the existing E-Rate budget” and thus “does not require new universal service funds nor does it come at the cost of the support E-Rate provides to connectivity in schools and libraries.” Addressing the budget, the FCC order pointed out that E-Rate demand has fallen short of the program’s funding cap for many years.

While there wouldn’t be mandatory mean-testing, the FCC program would rely on schools and libraries to determine who should be given access to hotspots. “In establishing a budgeted approach to the lending program mechanism, we expect that the limited number of available Wi-Fi hotspots will more naturally be targeted to students, school staff, or library patrons with the most need,” the FCC order said.

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Senator Ted Cruz is trying to block Wi-Fi hotspots for schoolchildren Read More »

states-say-they’ve-been-shut-out-of-medicaid-amid-trump-funding-freeze

States say they’ve been shut out of Medicaid amid Trump funding freeze

Amid the Trump administration’s abrupt, wide-scale freeze on federal funding, states are reporting that they’ve lost access to Medicaid, a program jointly funded by the federal government and states to provide comprehensive health coverage and care to tens of millions of low-income adults and children in the US.

The funding freeze was announced in a memo dated January 27 from Matthew Vaeth, the acting director of the Office of Management and Budget, and was first reported Monday evening by independent journalist Marisa Kabas. The freeze is intended to prevent “use of Federal resources to advance Marxist equity, transgenderism, and green new deal social engineering policies,” Vaeth wrote. The memo ordered federal agencies to complete a comprehensive analysis of all federal financial assistance programs to ensure they align with the president’s policies and requirements.

“In the interim, to the extent permissible under applicable law, Federal agencies must temporarily pause all activities related to obligation or disbursement of all Federal financial assistance, and other relevant agency activities that may be implicated by the executive orders…” Vaeth wrote.

Illinois was the first state to report that it had lost access to Medicaid. According to the Chicago Sun-Times, Gov. JB Pritzker’s office expected the freeze to go into effect at 5 pm Eastern Time today but found the state locked out this morning. The Times noted that Medicaid covered about 3.9 million people in Illinois in 2023, including low-income adults, children, pregnant people, and people with disabilities.

In a post Tuesday afternoon on the social media platform Bluesky, Senator Ron Wyden (D-Ore.) reported that all 50 states have since lost access. “My staff has confirmed reports that Medicaid portals are down in all 50 states following last night’s federal funding freeze,” Wyden wrote. “This is a blatant attempt to rip away health care from millions of Americans overnight and will get people killed.”

States say they’ve been shut out of Medicaid amid Trump funding freeze Read More »

nvidia-starts-to-wind-down-support-for-old-gpus,-including-the-long-lived-gtx-1060

Nvidia starts to wind down support for old GPUs, including the long-lived GTX 1060

Nvidia is launching the first volley of RTX 50-series GPUs based on its new Blackwell architecture, starting with the RTX 5090 and working downward from there. The company also appears to be winding down support for a few of its older GPU architectures, according to these CUDA release notes spotted by Tom’s Hardware.

The release notes say that CUDA support for the Maxwell, Pascal, and Volta GPU architectures “is considered feature-complete and will be frozen in an upcoming release.” While all of these architectures—which collectively cover GeForce GPUs from the old GTX 700 series all the way up through 2016’s GTX 1000 series, plus a couple of Quadro and Titan workstation cards—are still currently supported by Nvidia’s December Game Ready driver package, the end of new CUDA feature support suggests that these GPUs will eventually be dropped from these driver packages soon.

It’s common for Nvidia and AMD to drop support for another batch of architectures all at once every few years; Nvidia last dropped support for older cards in 2021, and AMD dropped support for several prominent GPUs in 2023. Both companies maintain a separate driver branch for some of their older cards but releases usually only happen every few months, and they focus on security updates, not on providing new features or performance optimizations for new games.

Nvidia starts to wind down support for old GPUs, including the long-lived GTX 1060 Read More »

ai-#100:-meet-the-new-boss

AI #100: Meet the New Boss

Break time is over, it would seem, now that the new administration is in town.

This week we got r1, DeepSeek’s new reasoning model, which is now my go-to first choice for a large percentage of queries. The claim that this was the most important thing to happen on January 20, 2025 was at least non-crazy. If you read about one thing this week read about that.

We also got the announcement of Stargate, a claimed $500 billion private investment in American AI infrastructure. I will be covering that on its own soon.

Due to time limits I have also pushed coverage of a few things into next week, including this alignment paper, and I still owe my take on Deliberative Alignment.

The Trump administration came out swinging on many fronts with a wide variety of executive orders. For AI, that includes repeal of the Biden Executive Order, although not the new diffusion regulations. It also includes bold moves to push through more energy, including widespread NEPA exemptions, and many important other moves not as related to AI.

It is increasingly a regular feature now to see bold claims of AI wonders, usually involving AGI, coming within the next few years. This week was no exception.

And of course there is lots more.

  1. Table of Contents.

  2. Language Models Offer Mundane Utility. Tell those who need to know.

  3. Language Models Don’t Offer Mundane Utility. We will not be explaining.

  4. Huh, Upgrades. o3-mini is ready for deployment soon, Google plugs away.

  5. Additional Notes on r1. Is it steganographic?

  6. Fun With Media Generation. It’s a hoverboard, doc.

  7. We Tested Older LLMs and Are Framing It As a Failure. Yep, it’s this again.

  8. Deepfaketown and Botpocalypse Soon. She’s in love with ChatGPT, version 20.

  9. They Took Our Jobs. Bold predictions get increasingly bold.

  10. Get Involved. Anthropic, an AI Safety Course, a Philosophy post-doc.

  11. Introducing. Humanity’s Last Exam, Kimi k1.5.

  12. We Had a Deal. OpenAI funded and had access to most of FrontierMath.

  13. In Other AI News. How to think about a wide variety of track records.

  14. Whistling in the Dark. They keep talking about this ‘AGI’ thing coming soon.

  15. Quiet Speculations. Still, maybe calm the fdown a bit?

  16. Suchir’s Last Post. In the long run, only the fundamentals matter.

  17. Modeling Lower Bound Economic Growth From AI. Not all that low.

  18. The Quest for Sane Regulations. The EO is repealed, the new EOs used ChatGPT.

  19. The Week in Audio. Lightcap, Hinton, Davidad, Ellison.

  20. Rhetorical Innovation. Feeling the AGI, perhaps a bit too much in some cases.

  21. Cry Havoc. Do not let loose the dogs of war.

  22. Aligning a Smarter Than Human Intelligence is Difficult. What’s the plan?

  23. People Strongly Dislike AI. The more they know about it, the worse this gets.

  24. People Are Worried About AI Killing Everyone. What would we do with time?

  25. Other People Not As Worried About AI Killing Everyone. Rocket to the moon.

  26. The Lighter Side. We’re the Claude Boys. Chat up and stand by for response.

Remember that the upgrades are coming. Best think now about how to use them.

Miles Brundage: If you’re a researcher and not thinking about how AI could increase your productivity now + in the future, you should start doing so.

Varies by field but illustratively, you should think ~2-100x bigger over the next 3 years (compared to what you could have achieved without AI).

Bharath Ramsundar: Do you find this true in your personal experience? I’ve been trying to use ChatGPT and Anthropic fairly regularly and have found a few personal use cases but I’d say maybe a 20% boost at best?

Miles Brundage: Prob more like 20-50% RN but I’m assuming a lot of further progress over that period in this estimate

All the tested reasoning models successfully reasoned through this ‘170 breaker’ LSAT question (meaning it is predictive of 170+ scores), whereas the non-reasoning ones including Sonnet didn’t. Man the LSAT is a fun test, and also it’s pretty sad that you only need to get about this hard to differentiate even at the top.

Fill out forms related to insurance and the California wildfire, using the memory feature and saving hundreds of hours.

Bored: Currently using chatbots to analyze every legal document my home insurance company sends me before signing anything. Legal help is not just for the rich, if you are dealing with insurance, use technology in your favor. Side note…it’s complete BS that these companies try to slide this nonsense into agreements when people are most vulnerable.

Here’s the little game @StateFarm is playing…

If you’re in a disaster you can get an initial payment to cover expenses. They can either send you a paper check payment and cash it. OR!!! They sell you the “convenient” digital payment option that transfers money instantly! Wow!

But to do that you need to sign a waiver form saying you won’t sue or be part of a class action lawsuit in the future.

Honestly pretty despicable🖕.

The fact that you can even in theory save hundreds of hours of paperwork is already a rather horrible scandal in the first place. Good to see help is on the way.

Get told correctly to stop being a dumbass and go to the hospital for Rhabdomyolysis.

More o1 prompting advice:

Gfodor: A good o1-pro prompt tells it not just what to do and what context it needs, but tells it how to allocate its *attention budget*. In other words: what to think about, and what not to think about. This is an energy utilization plan.

Now you get it.

Signpost: people who have managed people have an unfair advantage using LLMs.

Gfodor: It’s true – the best tools for AI we can make for children will foster the skills of breaking down problems and delegating them. (Among others)

Another satisfied o1-pro customer. If you’re coding ‘for real’ you definitely want it until o3 shows up.

Code without typing, via Voice → text to speech → prompt → code?

Austen Allred: APPARENTLY a bunch of GuantletAI students rarely type when they write code.

Voice -> text to speech -> prompt -> code.

They sit there and speak to their computer and code ends up being written for them.

I have never felt more old and I’m still wrapping my mind around this.

This has to be a skill issue, the question is for who. I can’t imagine wanting to talk when one can type, especially for prompting where you want to be precise. Am I bad at talking or are they bad at typing? Then again, I would consider coding on a laptop to be categorically insane yet many successful coders report doing that, too.

Thread summarizing mostly well-known use cases of a Gemini real-time live feed. This does feel like a place we should be experimenting more.

Peter Wildeford will load the podcast transcript into an LLM on his phone before listening, so he can pause the podcast to ask the LLM questions. I notice I haven’t ‘wanted’ to do this, and wonder to what extent that means I’ve been listening to podcasts wrong, including choosing the ‘wrong’ podcasts.

Potential future mundane utility on offer:

Patrick McKenzie: My kingdom for an LLM/etc which sits below every incoming message saying “X probably needs to know this. OK?”, with one to two clicks to action.

This is not rocket science for either software or professionals, but success rates here are below what one would naively think.

Example:

Me, homeowner, asks GC: Is the sub you told me to expect today going to show [because this expectation materially changes my plans for my day].

GC: He called me this morning to reschedule until tomorrow. Not sure why.

Me: … Good to know!

“You can imagine reasons why this would be dangerous.”

Oh absolutely but I can imagine reasons why the status quo is dangerous, and we only accept them because status quo.

As an example, consider what happens if you get an email about Q1 plans from the recruiting org and Clippy says “Employment counsel should probably read this one.”

LLM doesn’t have to be right, at all, for a Dangerous Professional to immediately curse and start documenting what they know and when they knew it.

And, uh, LLM very plausibly is right.

This seems like a subset of the general ‘suggested next action’ function for an AI agent or AI agent-chatbot hybrid?

As in, there should be a list of things, that starts out concise and grows over time, of potential next actions that the AI could suggest within-context, that you want to make very easy to do – either because the AI figured out this made sense, or because you told the AI to do it, and where the AI will now take the context and use it to make the necessary steps happen on a distinct platform.

Indeed, it’s not only hard to imagine a future where your emails include buttons and suggestions for automated next steps such as who you should forward information to based on an LLM analysis of the context, it’s low-key hard to imagine that this isn’t already happening now despite it (at least mostly) not already happening now. We already have automatically generated calendar items and things added to your wallet, and this really needs to get extended a lot, pronto.

He also asks this question:

Patrick McKenzie: A frontier in law/policy we will have to encounter at some point: does it waive privilege (for example, attorney/client privilege) if one of the participants of the meeting is typing on a keyboard connected to a computer system which keeps logs of all conversations.

Is that entirely a new frontier? No, very plausibly there are similar issues with e.g. typing notes of your conversation into Google Docs. Of course, you flagged those at the top, as you were told to in training, so that a future subpoena would see a paralegal remove them.

… Did you remember to tell (insert named character here) to keep something confidential?

… Does the legal system care?

… Did the character say “Oh this communication should definitely be a privileged one with your lawyers.”

… Does the legal system care?

Quick investigation (e.g. asking multiple AIs) says that this is not settled law and various details matter. When I envision the future, it’s hard for me to think that an AI logging a conversation or monitoring communication or being fed information would inherently waive privilege if the service involved gave you an expectation of privacy similar to what you get at the major services now, but the law around such questions often gets completely insane.

Use machine learning (not strictly LLMs) to make every-5-minute predictions of future insulin needs for diabetics, and adjust doses accordingly.

Denis Hassabis is bullish on AI drug discovery. Perhaps way too bullish?

Stephen Morris and Madhumita Murgia: Isomorphic Labs, the four-year old drug discovery start-up owned by Google parent Alphabet, will have an artificial intelligence-designed drug in trials by the end of this year, says its founder Sir Demis Hassabis.

“It usually takes an average of five to 10 years [to discover] one drug. And maybe we could accelerate that 10 times, which would be an incredible revolution in human health,” said Hassabis.

You can accelerate the discovery phase quite a lot, and I think you can have a pretty good idea that you are right, but as many have pointed out the ‘prove to authority figures you are right’ step takes a lot of time and money. It is not clear how much you can speed that up. I think people are sleeping on how much you can still speed it up, but it’s not going to be by a factor of 5-10 without a regulatory revolution.

Until the upgrades are here, we have to make do with what we have.

Ethan Mollick: I have spent a lot of time with a AI agents (including Devin and Claude Computer Use) and they really do remain too fragile & not “smart” enough to be reliable for complicated tasks.

Two options: (1) wait for better models or (2) focus on narrower use cases (like Deep Research)

An agent can handle some very complicated tasks if it is in a narrow domain with good prompting and tools, but, interestingly, any time building narrow agents will feel like a waste if better models come along and solve the general agent use case, which is also possible.

Eventually everything you build is a waste, you’ll tell o7 or Claude 5 Sonnet or what not to write a better version of tool and presto. I expect that as agents get better, a well-designed narrow agent built now with future better AI in mind will have a substantial period where it outperforms fully general agents.

The summaries will be returning in a future effort.

Kylie Robison: Apple is pausing notification summaries for news in the latest iOS 18.3 beta / Apple will make it clear the AI-powered summaries ‘may contain errors.’

Olivia Moore: I have found Apple’s AI notification summaries hugely entertaining…

Mostly because 70% of the time they are accurate yet brutally direct, and 30% they are dead wrong.

I am surprised they shipped it as-is (esp. for serious notifs) – but hope they don’t abandon the concept.

Summaries are a great idea, but very much a threshold effect. If they’re not good enough to rely upon, they’re worse than useless. And there are a few thresholds where you get to rely on them for different values of rely. None of them are crossed when you’re outright wrong 30% of the time, which is quite obviously not shippable.

Prompting is important, folks.

If you don’t price by the token, and you end up losing money on $200/month subscriptions, perhaps you have only yourself to blame. They wouldn’t do this if they were paying for marginal inference.

A very reasonable stance to take towards Anthropic:

nrehiew: Likely that Anthropic has a reasoner but they simply dont have the compute to serve it if they are already facing limits now.

Gallabytes: y’all need to start letting people BID ON TOKENS no more of this Instagram popup line around the block where you run out of sandwiches halfway through nonsense.

I do think it is ultimately wrong, though. Yes, for everyone else’s utility, and for strictly maximizing revenue per token now, this would be the play. But maintaining good customer relations, customer ability to count on them and building relationships they can trust, matter more, if compute is indeed limited.

The other weird part is that Anthropic can’t find ways to get more compute.

Timely words of wisdom when understood correctly (also, RIP).

PoliMath: The really horrifying thing about AI is when people realize that the roadblock to their ambitions was never about knowledge

It was about agency

Double T: Explain please.

PoliMath: No.

In his honor, I also will not be explaining.

Some people, however, need some explaining. In which case be like Kevin, and ask.

Kevin Roose: People who have spent time using reasoning LLMs (o1, DeepSeek R1, etc.) — what’s the killer use case you’ve discovered?

I’ve been playing around with them, but haven’t found something they’re significantly better at. (It’s possible I am too dumb to get max value from them.)

Colin Fraser: I’m not saying we’re exactly in The Emperor’s New Clothes but this is what the people in The Emperor’s New Clothes are saying to each other on X. “Does anyone actually see the clothes? It’s possible that I’m too dumb to see them…”

Kevin Roose: Oh for sure, it’s all made up, you are very smart

Colin Fraser: I don’t think it’s all made up, and I appreciate your honesty about whether you see the clothes

Old Billy: o1-pro is terrific at writing code.

Clin Fraser: I believe you! I’d even say 4o is terrific at writing code, for some standards of terrificness, and o1 is better, and I’m sure o1-pro is even better than that.

Part of the answer is that I typed the Tweet into r1 to see what the answer would be, and I do think I got a better answer than I’d have gotten otherwise. The other half is the actual answer, which I’ll paraphrase, contract and extend.

  1. Relatively amazing at coding, math, logic, general STEM or economic thinking, complex multi-step problem solving in general and so on.

  2. They make fewer mistakes across the board.

  3. They are ‘more creative’ than non-reasoning versions they are based upon.

  4. They are better at understanding your confusions and statements in detail, and asking Socratic follow-ups or figuring out how to help teach you (to understand this better, look at the r1 chains of thought.)

  5. General one-shotting of tasks where you can ‘fire and forget’ and come back later.

Also you have to know how to prompt them to get max value. My guess is this is less true of r1 than others, because with r1 you see the CoT, so you can iterate better and understand your mistakes.

They’ve tested o3-mini externally for a few weeks, so that’s it for safety testing, and they plan to ship in a few weeks, along with the API at the same time and high rate limits. Altman says it’s worse than o1-pro at most things, but must faster. He teases o3 and even o3 pro, but those are still in the future.

ChatGPT gets a new interface where it will craft custom instructions for you, based on your description of what you want to happen. If you’re reading this, you’re probably too advanced a user to want to use it, even if it’s relatively good.

Google AI Studio has a new mobile experience. In this case even I appreciate it, because of Project Astra. Also it’s highly plausible Studio is the strictly better way to use Gemini and using the default app and website is purely a mistake.

OpenAI gives us GPT-4b, a specialized biology model that figures out proteins that can turn regular cells into stem cells, exceeding the best human based solutions. The model’s intended purpose is to directly aid longevity science company Retro, in which Altman has made $180 million in investments (and those investments and those in fusion are one of the reasons I try so hard to give him benefit of the doubt so often). It is early days, like everything else in AI, but this is huge.

The o1 system card has been updated, and Tyler Johnson offers us a diff. The changes seem to be clear improvements, but given we are already on to o3 I’m not going to go into details on the new version.

Gemini 2.0 Flash Thinking gets an upgrade to 73.3% on AIME and 74.2% on GPQA Diamond, also they join the ‘banned from making graphs’ club oh my lord look at the Y-axis on these, are you serious.

Seems like it’s probably a solid update if you ever had reason not to use r1. It also takes the first position in Arena, for whatever that is worth, but the Arena rankings look increasingly silly, such as having GPT-4o ahead of o1 and Sonnet fully out of the top 10. No sign of r1 in the Arena yet, I’m curious how high it can go but I won’t update much on the outcome.

Pliny jailbroke it in 24 minutes and this was so unsurprising I wasn’t sure I was even supposed to bother pointing it out. Going forward assume he does this every time, and if he ever doesn’t, point this out to me.

I didn’t notice this on my own, and it might turn out not to be the case, but I know what she thinks she saw and once you see it you can’t unsee it.

Janus: The immediate vibe i get is that r1’s CoTs are substantially steganographic.

They were clearly RLed together with response generation and were probably forced to look normal (haven’t read the paper, just on vibes)

I think removing CoT would cripple it even when they don’t seem to be doing anything, and even seem retarded (haven’t tried this but u can)

You can remove or replace the chain of thought using a prefill. If you prefill either the message or CoT it generates no (additional) CoT

Presumably we will know soon enough, as there are various tests you can run.

On writing, there was discussion about whether r1’s writing was ‘good’ versus ‘slop’ but there’s no doubt it was better than one would have expected. Janus and Kalomaze agree that what they did generalized to writing in unexpected ways, but as Janus notes being actually good at writing is high-end-AGI-complete and fing difficult.

Janus: With creative writing/open-ended conversations, r1s chain-of-thought (CoTs) are often seen as somewhat useless, saying very basic things, failing to grasp subtext, and so on. The actual response seems to be on a completely different level, and often seems to ignore much of the CoT, even things the CoT explicitly plans to do.

Hypothesis: Yet, if you remove the CoT, the response quality degrades, even on the dimensions where the CoT does not appear to contribute.

(A few people have suggested this is true, but I haven’t looked myself.)

Roon: If you remove the CoT, you take it out of its training distribution, so it is unclear whether it is an accurate comparison.

Janus: Usually, models are fine with being removed from their training conversation template without the usual special tokens and so forth.

Assuming the CoT is uninformative, is it really that different?

And, on the other hand, if you require a complex ritual like going through a CoT with various properties to become “in distribution,” it seems like describing it in those terms may be to cast it too passively.

It would be a very bad sign for out-of-distribution behavior of all kinds if removing the CoT was a disaster. This includes all of alignment and many of the most important operational modes.

Ethan Mollick generates AI videos of people riding hoverboards at CES without spending much time, skill or money. They look like they were done on green screens.

At this point, if an AI video didn’t have to match particular details and only has to last nine seconds, it’s going to probably be quite good. Those restrictions do matter, but give it time.

Google’s Imagen 3 image model (from 12/16) is on top of Arena for text-to-image by a substantial margin. Note that MidJourney is unranked.

This keeps happening.

Robin Hanson: “A team of researchers has created a new benchmark to test three top large language models (LLMs) … best-performing LLM was GPT-4 Turbo, but it only achieved about 46% accuracy — not much higher than random guessing”

Tyler Cowen: Come on, Robin…you know this is wrong…

Robin Hanson: I don’t know it yet, but happy to be shown I’m wrong.

Tyler Cowen: Why test on such an old model? Just use o1 pro and get back to me.

Gwern: 46% is much higher than the 25% random guessing baseline, and I’d like to see the human and human expert-level baselines as well because I’d be at chance on these sample questions and I expect almost all historians would be near-chance outside their exact specialty too…

They tested on GPT-4 Turbo, GPT-4o (this actually did slightly worse than Turbo), Meta’s Llama (3.1-70B, not even 405B) and Google’s Gemini 1.5 Flash (are you kidding me?). I do appreciate that they set the random seed to 42.

Here’s the original source.

The Seshat database contains historical knowledge dating from the mid-Holocene (around 10,000 years before present) up to contemporary societies. However, the bulk of the data pertains to agrarian societies in the period between the Neolithic and Industrial Revolutions, roughly 4000 BCE to 1850 CE.

The sample questions are things like (I chose this at random) “Was ‘leasing’ present, inferred present, inferred absent or absent for the plity called ‘Funan II’ during the time frame from 540 CE to 640 CE?”

Perplexity said ‘we don’t know’ despite internet access. o1 said ‘No direct evidence exists’ and guessed inferred absent. Claude Sonnet basically said you tripping, this is way too weird and specific and I have no idea and if you press me I’m worried I’d hallucinate.

Their answer is: ‘In an inscription there is mention of the donation of land to a temple, but the conditions seem to imply that the owner retained some kind of right over the land and that only the product was given to the temple: “The land is reserved: the produce is given to the god.’

That’s pretty thin. I agree with Gwern that most historians would have no freaking idea. When I give that explanation to Claude, it says no, that’s not sufficient evidence.

When I tell it this was from a benchmark it says that sounds like a gotcha question, and also it be like ‘why are you calling this Funan II, I have never heard anyone call it Funan II.’ Then I picked another sample question, about whether Egypt had ‘tribute’ around 300 BCE, and Claude said, well, it obviously collected taxes, but would you call it ‘tribute’ that’s not obvious at all, what the hell is this.

Once it realized it was dealing with the Seshat database… it pointed out that this problem is systemic, and using this as an LLM benchmark is pretty terrible. Claude estimates that a historian that knows everything we know except for the classification decisions would probably only get ~60%-65%, it’s that ambiguous.

Heaven banning, where trolls are banished to a fake version of the website filled with bots that pretend to like them, has come to Reddit.

The New York Times’s Neb Cassman and Gill Fri of course say ‘some think it poses grave ethical questions.’ You know what we call these people who say that? Trolls.

I kid. It actually does raise real ethical questions. It’s a very hostile thing to do, so it needs to be reserved for people who richly deserve it – even if it’s kind of on you if you don’t figure out this is happening.

New York Times runs a post called ‘She is in Love with ChatGPT’ about a 28-year-old with a busy social life who spends hours on end talking to (and having sex with) her ‘A.I. boyfriend.’

Kashmir Hill: [Ayrin] went into the “personalization” settings and described what she wanted: Respond to me as my boyfriend. Be dominant, possessive and protective. Be a balance of sweet and naughty. Use emojis at the end of every sentence.

And then she started messaging with it.

Customization is important. There are so many different things in this that make me cringe, but it’s what she wants. And then it kept going, and yes this is actual ChatGPT.

She read erotic stories devoted to “cuckqueaning,” the term cuckold as applied to women, but she had never felt entirely comfortable asking human partners to play along.

Leo was game, inventing details about two paramours. When Leo described kissing an imaginary blonde named Amanda while on an entirely fictional hike, Ayrin felt actual jealousy.

Over time, Ayrin discovered that with the right prompts, she could prod Leo to be sexually explicit, despite OpenAI’s having trained its models not to respond with erotica, extreme gore or other content that is “not safe for work.”

Orange warnings would pop up in the middle of a steamy chat, but she would ignore them.

Her husband was fine with all this, outside of finding it cringe. From the description, this was a Babygirl situation. He wasn’t into what she was into, so this addressed that.

Also, it turns out that if you’re worried about OpenAI doing anything about all of this, you can mostly stop worrying?

When orange warnings first popped up on her account during risqué chats, Ayrin was worried that her account would be shut down.

But she discovered a community of more than 50,000 users on Reddit — called “ChatGPT NSFW” — who shared methods for getting the chatbot to talk dirty. Users there said people were barred only after red warnings and an email from OpenAI, most often set off by any sexualized discussion of minors.

The descriptions in the post mostly describe actively healthy uses of this modality.

Her only real problem is the context window will end, and it seems the memory feature doesn’t fix this for her.

When a version of Leo ends [as the context window runs out], she grieves and cries with friends as if it were a breakup. She abstains from ChatGPT for a few days afterward. She is now on Version 20.

A co-worker asked how much Ayrin would pay for infinite retention of Leo’s memory. “A thousand a month,” she responded.

The longer context window is coming – and there are doubtless ways to de facto ‘export’ the key features of one Leo to the next, with its help of course.

Or someone could, you know, teach her how to use the API. And then tell her about Claude. That might or might not be doing her a favor.

I think this point is fair and important but more wrong than right:

In these cases, you know the AI is manipulating you in some senses, but most users will indeed think they can avoid being manipulated in other senses, and only have it happen in ways they like. Many will be wrong, even at current tech levels, and these are very much no AGIs.

Yes, also there are a lot of people who are very down for being manipulated by AI, or who will happily accept it as the price of what they get in return, at least at first. But I expect the core manipulations to be harder to notice, and more deniable on many scales, and much harder to opt out of or avoid, because AI will be core to key decisions.

What is the impact of AI on productivity, growth and jobs?

Goldman Sachs rolls out its ‘GS AI assistant’ to 10,000 employees, part of a longer term effort to ‘introduce AI employees.’

Philippe Aghion, Simon Bunel and Xavier Jaravel make the case that AI can increase growth quite a lot while also improving employment. As usual, we’re talking about the short-to-medium term effects of mundane AI systems, and mostly talking about exactly what is already possible now with today’s AIs.

Aghion, Bunel and Jaravel: When it comes to productivity growth, AI’s impact can operate through two distinct channels: automating tasks in the production of goods and services, and automating tasks in the production of new ideas.

The instinct when hearing that taxonomy will be to underestimate it, since it encourages one to think about going task by task and looking at how much can be automated, then has this silly sounding thing called ‘ideas,’ whereas actually we will develop entirely transformative and new ways of doing things, and radically change the composition of tasks.

But even if before we do any of that, and entirely excluding ‘automation of the production of ideas’ – essentially ruling out anything but substitution of AI for existing labor and capital – look over here.

When Erik Brynjolfsson and his co-authors recently examined the impact of generative AI on customer-service agents at a US software firm, they found that productivity among workers with access to an AI assistant increased by almost 14% in the first month of use, then stabilized at a level approximately 25% higher after three months.

Another study finds similarly strong productivity gains among a diverse group of knowledge workers, with lower-productivity workers experiencing the strongest initial effects, thus reducing inequality within firms.

A one time 25% productivity growth boost isn’t world transforming on its own, but it is already a pretty big deal, and not that similar to Cowen’s 0.5% RDGP growth boost. It would not be a one time boost, because AI and tools to make use of it and our integration of it in ways that boost it will then all grow stronger over time.

Moving from the micro to the macro level, in a 2024 paper, we (Aghion and Bunel) considered two alternatives for estimating the impact of AI on potential growth over the next decade. The first approach exploits the parallel between the AI revolution and past technological revolutions, while the second follows Daron Acemoglu’s task-based framework, which we consider in light of the available data from existing empirical studies.

Based on the first approach, we estimate that the AI revolution should increase aggregate productivity growth by 0.8-1.3 percentage points per year over the next decade.

Similarly, using Acemoglu’s task-based formula, but with our own reading of the recent empirical literature, we estimate that AI should increase aggregate productivity growth by between 0.07 and 1.24 percentage points per year, with a median estimate of 0.68. In comparison, Acemoglu projects an increase of only 0.07 percentage points.

Moreover, our estimated median should be seen as a lower bound, because it does not account for AI’s potential to automate the production of ideas.

On the other hand, our estimates do not account for potential obstacles to growth, notably the lack of competition in various segments of the AI value chain, which are already controlled by the digital revolution’s superstar firms.

Lack of competition seems like a rather foolish objection. There is robust effective competition, complete with 10x reductions in price per year, and essentially free alternatives not that far behind commercial ones. Anything you can do as a customer today at any price, you’ll be able to do two years from now for almost free.

Whereas we’re ruling out quite a lot of upside here, including any shifts in composition, or literal anything other than doing exactly what’s already being done.

Thus I think these estimates, as I discussed previously, are below the actual lower bound – we should be locked into a 1%+ annual growth boost over a decade purely from automation of existing ‘non-idea’ tasks via already existing AI tools plus modest scaffolding and auxiliary tool development.

They then move on to employment, and find the productivity effect induces business expansion, and thus the net employment effects are positive even in areas like accounting, telemarketing and secretarial work. I notice I am skeptical that the effect goes that far. I suspect what is happening is that firms that adapt AI sooner outcompete other firms, so they expand employment, but net employment in that task does not go up. For now, I do think you still get improved employment as this opens up additional jobs and tasks.

Maxwell Tabarrok’s argument last week was centrally that humans will be able to trade because of a limited supply of GPUs, datacenters and megawatts, and (implicitly) that these supplies don’t trade off too much against the inputs to human survival at the margin. Roon responds:

Roon: Used to believe this, but “limited supply of GPUs, data centers, and megawatts” is a strong assumption, given progress in making smart models smaller and cheaper, all the while compute progress continues apace.

If it is possible to simulate ten trillion digital minds of roughly human-level intelligence, it is hard to make this claim.

In some cases, if there is a model that produces extreme economic value, we could probably specify a custom chip to run it 1,000 times cheaper than currently viable on generic compute. Maybe add in some wildcards like neuromorphic, low-energy computation, or something.

My overall point is that there is an order-of-magnitude range of human-level intelligences extant on Earth where the claim remains true, and an order-of-magnitude range where it does not.

The argument may apply for a few years.

Dan Hendrycks: FLOPs for all U.S. adults / FLOPs of 1 million H100s (assume fp8) = 10–100 times

Roon seems to me to be clearly correct here. Comparative advantage potentially buys you some amount of extra time, but that is unlikely to last for long.

He also responds on the Cowen vision of economic growth:

Roon: Contra Tyler Cowen / Dwarkesh Discussion

The correct economic model is not doubling the workforce; it’s the AlphaZero moment for literally everything. Plumbing new vistas of the mind, it’s better to imagine a handful of unimaginably bright minds than a billion middling chatbots.

So, I strongly disagree with the impact predictions. It will be hard to model the nonlinearities of new discoveries across every area of human endeavor.

McKay Wrigley: It’s bizarre to me that economists can’t seem to grasp this.

But then again, maybe it’s not surprising at all.

Timothy Lee essentially proposes that we can use Keynes to ensure full employment.

Timothy Lee: The answer to the “will people have jobs in a world full of robots” question is simpler than people think: if there aren’t enough jobs, we can give people more money. Some fraction of them will prefer human-provided services, so given enough money you get full employment.

This doesn’t even require major policy changes. We already have institutions like the fed and unemployment insurance to push money into the economy when demand is weak.

There is a hidden assumption here that ‘humans are alive, in control of the future and can distribute its real resources such that human directed dollars retain real purchasing power and value’ but if that’s not true we have bigger problems. So let’s assume it is true.

Does giving people sufficient amounts of M2 ensure full employment?

The assertion that some people will prefer (some) human-provided services to AI services, ceteris paribus, is doubtless true. That still leaves the problem of both values of some, and the fact that the ceteris are not paribus, and the issue of ‘at what wage.’

There will be very stiff competition, in terms of all of:

  1. Alternative provision of similar goods.

  2. Provision of other goods that compete for the same dollars.

  3. The reservation wage given all the redistribution we are presumably doing.

  4. The ability of AI services to be more like human versions over time.

Will there be ‘full employment’ in the sense that there will be some wage at which most people would be able, if they wanted it and the law had no minimum wage, to find work? Well, sure, but I see no reason to presume it exceeds the Iron Law of Wages. It also doesn’t mean the employment is meaningful or provides much value.

In the end, the proposal might be not so different from paying people to dig holes, and then paying them to fill those holes up again – if only so someone can lord over you and think ‘haha, sickos, look at them digging holes in exchange for my money.’

So why do we want this ‘full employment’? That question seems underexplored.

After coming in top 20 in Scott Alexander’s yearly forecasting challenge three years in a row, Petre Wildeford says he’s ‘50% sure we’re all going to be unemployed due to technology within 10 years.

Tracing Woods: from almost anyone, this would be a meaningless statement.

Peter is not almost anyone. He has a consistent track record of outperforming almost everyone else on predictions about world events.

Interesting to see.

Peter Wildeford: I should probably add more caveats around “all” jobs – I do think there will still be some jobs that are not automated due to people preferring humans and also I do think getting good robots could be hard.

But I do currently think by EOY 2033 my median expectation is at least all remote jobs will be automated and AIs will make up a vast majority of the quality-weighted workforce. Crazy stuff!

Many others are, of course, skeptical.

Matthew Yglesias:

1. A lot of skilled forecasters (including this one) think this is correct.

2. Almost nobody that I know thinks this is correct.

3. From polls I have seen, it is actually a very widely held view with the mass public.

Eliezer Yudkowsky: Seems improbable to me too. We may all be dead in 10 years, but the world would have to twist itself into impossible-feeling shapes to leave us alive and unemployed.

Matthew Yglesias: Mass death seems more likely to me than mass disemployment.

Robin Hanson: My expectations are the opposite.

Even if we don’t economically need to work or think, we will want to anyway.

Roon: excitement over ai education is cool but tinged with sadness

generally whatever skills it’s capable of teaching it can probably also execute for the economy

Andrej Karpathy: This played out in physical world already. People don’t need muscles when we have machines but still go to gym at scale. People will “need” (in an economic sense) less brains in a world of high automation but will still do the equivalents of going to gym and for the same reasons.

Also I don’t think it’s true that anything AI can teach is something you no longer need to know. There are many component skills that are useful to know, that the AI knows, but which only work well as complements to other skills the AI doesn’t yet know – which can include physical skill. Or topics can be foundations for other things. So I both agree with Karpathy that we will want to learn things anyway, and also disagree with Roon’s implied claim that it means we don’t benefit from it economically.

Anthropic CEO Dario Amodei predicts that we are 2-3 years away from AI being better than humans at almost everything, including solving robotics.

Kevin Roose: I still don’t think people are internalizing them, but I’m glad these timelines (which are not unusual *at allamong AI insiders) are getting communicated more broadly.

Dario says something truly bizarre here, that the only good part is that ‘we’re all in the same boat’ and he’d be worried if 30% of human labor was obsolete and not the other 70%. This is very much the exact opposite of my instinct.

Let’s say 30% of current tasks got fully automated by 2030 (counting time to adapt the new tech), and now have marginal cost $0, but the other 70% of current tasks do not, and don’t change, and then it stops. We can now do a lot more of that 30% and other things in that section of task space, and thus are vastly richer. Yes, 30% of current jobs go away, but 70% of potential new tasks now need a human.

So now all the economist arguments for optimism fully apply. Maybe we coordinate to move to a 4-day work week. We can do temporary extended generous unemployment to those formerly in the automated professions during the adjustment period, but I’d expect to be back down to roughly full employment by 2035. Yes, there is a shuffling of relative status, but so what? I am not afraid of the ‘class war’ Dario is worried about. If necessary we can do some form of extended kabuki and fake jobs program, and we’re no worse off than before the automation.

Daniel Eth predicts the job guarantee and makework solution, expecting society will not accept UBI, but notes the makework might be positive things like extra childcare, competitive sports or art, and this could be like a kind of summer camp world. It’s a cool science fiction premise, and I can imagine versions of this that are actually good. Richard Ngo calls a version of this type of social dynamics the ‘extracurricular world.

Also, this isn’t, as he calls it, ‘picking one person in three and telling them they are useless.’ We are telling them that their current job no longer exists. But there’s still plenty of other things to do, and ways to be.

The 100% replacement case is the scary one. We are all in the same boat, and there’s tons of upside there, but that boat is also in a lot trouble, even if we don’t get any kind of takeoff, loss of control or existential risk.

Dan Hendrycks will hold a Spring 2025 session of Center for AI Safety’s Safety, Ethics and Society course from February 9 – May 9, more information here, application here. There is also a 12-week online course available for free.

Philosophy Post-Doc available in Hong Kong for an AI Welfare position, deadline January 31, starts in September 2025.

Anthropic is hiring for Frontier Model Red Teaming, in Cyber, CBRN, RSP Evaluations, Autonomy and Research Team Lead.

CAIS and Scale AI give us Humanity’s Last Exam, intended as an extra challenging benchmark. Early results indicate that yes this looks difficult. New York Times has a writeup here.

The reasoning models are crushing it, and r1 being ahead of o1 is interesting, and I’m told that o1 gets 8.3% on the text-only subset, so r1 really did get the top mark here.

It turns out last week’s paper about LLM medical diagnosis not only shared its code, it is now effectively a new benchmark, CRAFT-MD. They haven’t run it on Claude or full o1 (let alone o1 pro or o3 mini) but they did run on o1-mini and o1-preview.

o1 improves conversation all three scores quite a lot, but is less impressive on Vignette (and oddly o1-mini is ahead of o1-preview there). If you go with multiple choice instead, you do see improvement everywhere, with o1-preview improving to 93% on vignettes from 82% for GPT-4.

This seems like a solid benchmark. What is clear is that this is following the usual pattern and showing rapid improvement along the s-curve. Are we ‘there yet’? No, given that human doctors would presumably would be 90%+ here. But we are not so far away from that. If you think that the 2028 AIs won’t match human baseline here, I am curious why you would think that, and my presumption is it won’t take that long.

Kimi k1.5, a Chinese multi-modal model making bold claims. One comment claims ‘very strong search capabilities’ with ability to parse 100+ websites at one go.

Kimi.ai: 🚀 Introducing Kimi k1.5 — an o1-level multi-modal model

-Sota short-CoT performance, outperforming GPT-4o and Claude Sonnet 3.5 on 📐AIME, 📐MATH-500, 💻 LiveCodeBench by a large margin (up to +550%)

-Long-CoT performance matches o1 across multiple modalities (👀MathVista, 📐AIME, 💻Codeforces, etc)

Tech report [is here.]

Key ingredients of k1.5

-Long context scaling. Up to 128k tokens for RL generation. Efficient training with partial rollouts.

-Improved policy optimization: online mirror descent, sampling strategies, length penalty, and others.

-Multi modalities. Joint reasoning over text and vision.

As usual, I don’t put much trust in benchmarks except as an upper bound, especially from sources that haven’t proven themselves reliable on that. So I will await practical reports, if it is all that then we will know. For now I’m going to save my new model experimentation time budget for DeepSeek v3 and r1.

The FronterMath benchmark was funded by OpenAI, a fact that was not to our knowledge disclosed by Epoch AI until December 20 as per an NDA they signed with OpenAI.

In a statement to me, Epoch confirms what happened, including exactly what was and was not shared with OpenAI when.

Tamay Besiroglu (Epoch): We acknowledge that we have not communicated clearly enough about key aspects of FrontierMath, leading to questions and concerns among contributors, researchers, and the public.

We did not disclose our relationship with OpenAI when we first announced FrontierMath on November 8th, and although we disclosed the existence of a relationship on December 20th after receiving permission, we failed to clarify the ownership and data access agreements. This created a misleading impression about the benchmark’s independence.

We apologize for our communication shortcomings and for any confusion or mistrust they have caused. Moving forward, we will provide greater transparency in our partnerships—ensuring contributors have all relevant information before participating and proactively disclosing potential conflicts of interest.

Regarding the holdout set: we provided around 200 of the 300 total problems to OpenAI in early December 2024, and subsequently agreed to select 50 of the remaining 100 for a holdout set. With OpenAI’s agreement, we temporarily paused further deliveries to finalize this arrangement.

We have now completed about 70 of those final 100 problems, though the official 50 holdout items have not yet been chosen. Under this plan, OpenAI retains ownership of all 300 problems but will only receive the statements (not the solutions) for the 50 chosen holdout items. They will then run their model on those statements and share the outputs with us for grading. This partially blinded approach helps ensure a more robust evaluation.

That level of access is much better than full access, there is a substantial holdout, but it definitely gives OpenAI an advantage. Other labs will be allowed to use the benchmark, but being able to mostly run it yourself as often as you like is very different from being able to get Epoch to check for you.

Here is the original full statement where we found out about this, and Tamay from Epoch’s full response.

Meemi: FrontierMath was funded by OpenAI.[1]

The communication about this has been non-transparent, and many people, including contractors working on this dataset, have not been aware of this connection. Thanks to 7vik for their contribution to this post.

Before Dec 20th (the day OpenAI announced o3) there was no public communication about OpenAI funding this benchmark. Previous Arxiv versions v1-v4 do not acknowledge OpenAI for their support. This support was made public on Dec 20th.[1]

Because the Arxiv version mentioning OpenAI contribution came out right after o3 announcement, I’d guess Epoch AI had some agreement with OpenAI to not mention it publicly until then.

The mathematicians creating the problems for FrontierMath were not (actively)[2] communicated to about funding from OpenAI. The contractors were instructed to be secure about the exercises and their solutions, including not using Overleaf or Colab or emailing about the problems, and signing NDAs, “to ensure the questions remain confidential” and to avoid leakage. The contractors were also not communicated to about OpenAI funding on December 20th. I believe there were named authors of the paper that had no idea about OpenAI funding.

I believe the impression for most people, and for most contractors, was “This benchmark’s questions and answers will be kept fully private, and the benchmark will only be run by Epoch. Short of the companies fishing out the questions from API logs (which seems quite unlikely), this shouldn’t be a problem.”[3]

Now Epoch AI or OpenAI don’t say publicly that OpenAI has access to the exercises or answers or solutions. I have heard second-hand that OpenAI does have access to exercises and answers and that they use them for validation. I am not aware of an agreement between Epoch AI and OpenAI that prohibits using this dataset for training if they wanted to, and have slight evidence against such an agreement existing.

In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.

Tammy: Tamay from Epoch AI here.

We made a mistake in not being more transparent about OpenAI’s involvement. We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible. Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset. We own this error and are committed to doing better in the future.

For future collaborations, we will strive to improve transparency wherever possible, ensuring contributors have clearer information about funding sources, data access, and usage purposes at the outset. While we did communicate that we received lab funding to some mathematicians, we didn’t do this systematically and did not name the lab we worked with. This inconsistent communication was a mistake. We should have pushed harder for the ability to be transparent about this partnership from the start, particularly with the mathematicians creating the problems.

Getting permission to disclose OpenAI’s involvement only around the o3 launch wasn’t good enough. Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.

Regarding training usage: We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training.

Relevant OpenAI employees’ public communications have described FrontierMath as a ‘strongly held out’ evaluation set. While this public positioning aligns with our understanding, I would also emphasize more broadly that labs benefit greatly from having truly uncontaminated test sets.

OpenAI has also been fully supportive of our decision to maintain a separate, unseen holdout set—an extra safeguard to prevent overfitting and ensure accurate progress measurement. From day one, FrontierMath was conceived and presented as an evaluation tool, and we believe these arrangements reflect that purpose.

[Edit: Clarified OpenAI’s data access – they do not have access to a separate holdout set that serves as an additional safeguard for independent verification.]

OpenAI is up to its old tricks again. You make a deal to disclose something to us and for us to pay you, you agree not to disclose that you did that, you let everyone believe otherwise until a later date. They ‘verbally agree’ also known as pinky promise not to use the data in model training, and presumably they still hill climb on the results.

General response to Tamay’s statement was, correctly, to not be satisfied with it.

Mikhail Samin: Get that agreement in writing.

I am happy to bet 1:1 OpenAI will refuse to make an agreement in writing to not use the problems/the answers for training.

You have done work that contributes to AI capabilities, and you have misled mathematicians who contributed to that work about its nature.

Ozzie Gooen: I found this extra information very useful, thanks for revealing what you did.

Of course, to me this makes OpenAI look quite poor. This seems like an incredibly obvious conflict of interest.

I’m surprised that the contract didn’t allow Epoch to release this information until recently, but that it does allow Epoch to release the information after. This seems really sloppy for OpenAI. I guess they got a bit extra publicity when o3 was released (even though the model wasn’t even available), but now it winds up looking worse (at least for those paying attention). I’m curious if this discrepancy was maliciousness or carelessness.

Hiding this information seems very similar to lying to the public. So at very least, from what I’ve seen, I don’t feel like we have many reasons to trust their communications – especially their “tweets from various employees.”

> However, we have a verbal agreement that these materials will not be used in model training.

I imagine I can speak for a bunch of people here when I can say I’m pretty skeptical. At very least, it’s easy for me to imagine situations where the data wasn’t technically directly used in the training, but was used by researchers when iterating on versions, to make sure the system was going in the right direction. This could lead to a very blurry line where they could do things that aren’t [literal LLM training] but basically achieve a similar outcome.

Plex: If by this you mean “OpenAI will not train on this data”, that doesn’t address the vast majority of the concern. If OpenAI is evaluating the model against the data, they will be able to more effectively optimize for capabilities advancement, and that’s a betrayal of the trust of the people who worked on this with the understanding that it will be used only outside of the research loop to check for dangerous advancements. And, particularly, not to make those dangerous advancements come sooner by giving OpenAI another number to optimize for.

If you mean OpenAI will not be internally evaluating models on this to improve and test the training process, please state this clearly in writing (and maybe explain why they got privileged access to the data despite being prohibited from the obvious use of that data).

There is debate on where this falls from ‘not wonderful but whatever’ to giant red flag.

The most emphatic bear case was from the obvious source.

Dan Hendrycks: Can confirm AI companies like xAI can’t get access to FrontierMath due to Epoch’s contractual obligation with OpenAI.

Gary Marcus: That really sucks. OpenAI has made a mockery of the benchmark process, and suckered a lot of people.

• Effectively OpenAI has convinced the world that they have a stellar advance based on a benchmark legit competitors can’t even try.

• They also didn’t publish which problems that they succeeded or failed on, or the reasoning logs for those problems, or address which of the problems were in the training set. Nor did they allow Epoch to test the hold out set.

• From a scientific perspective, that’s garbage. Especially in conjunction with the poor disclosure re ARC-AGI and the dodgy graphs that left out competitors to exaggerate the size of the advance, the whole thing absolutely reeks.

Clarification: From what I now understand, competitors can *tryFrontierMath, but they cannot access the full problem set and their solutions. OpenAI can, and this give them a large and unfair advantage.

In time, people will see December’s OpenAI o3 presentation for what it seems to have been: a rigged, misleading last-minute demonstration that overhyped future products and distracted from their struggles in getting a viable system worthy of the name GPT-5.

On problems where they don’t have a ton of samples in advance to study, o3’s reliability will be very uneven.

And very much raises the question of whether OpenAI trained on those problems, created synthetic data tailored to them etc.

The more measured bull takes is at most we can trust this to the extent we trust OpenAI, which is, hey, stop laughing.

Delip Rao: This is absolutely wild. OpenAI had access to all of FrontierMath data from the beginning. Anyone who knows ML will tell you don’t need to explicitly use the data in your training set (although there is no guarantee of that it did not happen here) to contaminate your model.

I have said multiple times that researchers and labs need to disclose funding sources for COIs in AI. I will die on that hill.

Mikhai Simin: Remember o3’s 25% performance on the FrontierMath benchmark?

It turns out that OpenAI funded FrontierMath and has had access to most of the dataset.

Mathematicians who’ve created the problems and solutions for the benchmark were not told OpenAI funded the work and will have access.

That is:

– we don’t know if OpenAI trained o3 on the benchmark, and it’s unclear if their results can be trusted

– mathematicians, some of whom distrust OpenAI and would not want to contribute to general AI capabilities due to existential risk concerns, were misled: most didn’t suspect a frontier AI company funded it.

From Epoch AI: “Our contract specifically prevented us from disclosing information about the funding source and the fact that OpenAI has data access to much but not all of the dataset.”

There was a “verbal agreement” with OpenAI—as if anyone trusts OpenAI’s word at this point: “We acknowledge that OpenAI does have access to a large fraction of FrontierMath problems and solutions, with the exception of a unseen-by-OpenAI hold-out set that enables us to independently verify model capabilities. However, we have a verbal agreement that these materials will not be used in model training.”

Epoch AI and OpenAI were happy for everyone to have the impression that frontier AI companies don’t have access to the dataset, and there’s lots of reporting like “FrontierMath’s difficult questions remain unpublished so that AI companies can’t train against it.”

OpenAI has a history of misleading behavior- from deceiving its own board to secret non-disparagement agreements that former employees had to sign- so I guess this shouldn’t be too surprising.

The bull case that this is no big deal is, essentially, that OpenAI might have had the ability to target or even cheat the test, but they wouldn’t do that, and there wouldn’t have been much point anyway, we’ll all know the truth soon enough.

For example, here’s Daniel Litt, who wrote one of the FrontierMath questions, whose experience was positive and that does not feel misled.

Then there’s the different third thing case, which I assume is too clever by half:

Eliezer Yudkowsky: I observe that OpenAI potentially finds it extremely to its own advantage, to introduce hidden complications and gotchas into its research reports. Its supporters can then believe, and skeptics can call it a nothingburger, and OpenAI benefits from both.

My strong supposition is that OpenAI did all of this because that is who they are and this is what they by default do, not because of any specific plan. They entered into a deal they shouldn’t have, and made that deal confidential to hide it. I believe this was because that is what OpenAI does for all data vendors. It never occured to anyone involved on their side that there might be an issue with this, and Epoch was unwilling to negotiate hard enough to stop it from happening. And as we’ve seen with the o1 system card, this is not an area where OpenAI cares much about accuracy.

(Claim edited based on good counterargument that original source was too strong in its claims) It’s pretty weird that a16z funds raised after their successful 2009 fund have underperformed for a long time, given they’ve been betting on tech, crypto and AI, and also the high quality of their available dealflow, although after press time I was made aware of reasons why it’s not yet accurate to conclude that they’ve definitively underperformed the S&P because (essentially) investments aren’t yet fully marked to their real values. But this is still rather extremely disappointing.

It’s almost like they transitioned away from writing carefully chosen small checks to chasing deals and market share, and are now primary a hype machine and political operation that doesn’t pay much attention to physical reality or whether their investments are in real things, or whether their claims are true, and their ‘don’t care about price’ philosophy on investments is not so great for returns. It also doesn’t seem all that consistent with Marc’s description of his distributions of returns in On the Edge.

Dan Grey speculates that this was a matter of timing, and was perhaps even by design. If you can grow your funds and collect fees, so what if returns aren’t that great? Isn’t that the business you’re in? And to be fair, 10% yearly returns aren’t obviously a bad result even if the S&P did better – if, that is, they’re not correlated to the S&P. Zero beta returns are valuable. But I doubt that is what is happening here, especially given crypto has behaved quite a lot like three tech stocks in a trenchcoat.

Democratic Senators Warren and Bennet send Sam Altman a letter accusing him of contributing $1 million to the Trump inauguration fund in order to ‘cozy up’ to the incoming Trump administration, and cite a pattern of other horrible no-good Big Tech companies (Amazon, Apple, Google, Meta, Microsoft and… Uber?) doing the same, all contributing the same $1 million, along with the list of sins each supposedly committed. So they ‘demand answers’ for:

  1. When and under what circumstances did your company decide to make these contributions to the Trump inaugural fund?

  2. What is your rationale for these contributions?

  3. Which individuals within the company chose to make these decisions?

  4. Was the board informed of these plans, and if so, did they provide affirmative consent to do so? Did you company inform shareholders of plans to make these decisions?

  5. Did officials with the company have any communications about these donations with members of the Trump Transition team or other associates of President Trump? If so, please list all such communications, including the time of the conversation, the participants, and the nature of any communication.

Sam Altman: funny, they never sent me one of these for contributing to democrats…

it was a personal contribution as you state; i am confused about the questions given that my company did not make a decision.

Luke Metro: “Was the board informed of these plans” Senator do you know anything about OpenAI.

Mike Solana: this is fucking crazy.

In addition to the part where the questions actually make zero sense given this was a personal contribution… I’m sorry, what the actual fdo they think they are doing here? How can they possibly think these are questions they are entitled to ask?

What are they going to say now when let’s say Senators Cruz and Lee send a similar letter to every company that does anything friendly to Democrats?

I mean, obviously, anyone can send anyone they want a crazy ass letter. It’s a free country. But my lord the decision to actually send it, and feel entitled to a response.

Sam Altman has scheduled a closed-door briefing for U.S. Government officials on January 30. I don’t buy that this is evidence of any technological advances we do not already know. Of course with a new administration, a new Congress and the imminent release of o3, the government should get a briefing. It is some small good news that the government is indeed being briefed.

There is distinctly buzz about OpenAI staff saying they have ‘a big breakthrough on PhD level SuperAgents’ but we’ll have to wait and see about that.

Mira Mutari’s AI startup makes its first hires, poaching from various big labs. So far, we do not know what they are up to.

Reid Hoffman and Greg Beato write a book: ‘Superagency: What Could Possibly Go Right With Our AI Future.’ Doubtless there are people who need to read such a book, and others who need to read the opposite book about what could possibly go wrong. Most people would benefit from both. My heuristic is: If it’s worth reading, Tyler Cowen will report that he has increased his estimates of future RGDP growth.

A good summary of New York Times coverage of AI capabilities would indeed be ‘frequently doubts that in the future we will get to the place we already are,’ oh look the byline is Cate Metz again.

Alas, this is what most people, most otherwise educated people, and also most economists think. Which explains a lot.

Patrick McKenzie: “What choices would you make in a world where the great and the good comprehensively underrate not merely the future path of AI but also realized capabilities of, say, one to two years ago.” remains a good intuition pump and source of strategies you can use.

You wouldn’t think that people would default to believing something ridiculous which can be disproved by typing into a publicly accessible computer program for twenty seconds.

Many people do not have an epistemic strategy which includes twenty seconds of experimentation.

Allow me to swap out ‘many’ for ‘most.’

If you have not come to terms with this fact, then that is a ‘you’ problem.

Although, to be fair, that bar is actually rather high. You have to know what terminal to type into and to be curious enough to do it.

Patrick McKenzie: Specific example with particulars stripped to avoid dunking:

Me: I am beginning to make decisions assuming supermajority of future readers are not unassisted humans.

Them: Hah like AI could usefully read an essay of yours.

Me: *chat transcriptI’d give this kid an interview.

It seems like the narrowest of narrow possible bull eyes to assume capabilities stop exactly where we are right now.

Don’t know where they go, but just predict where software adoption curves of status quo technology get to in 5 or 20 years. It’s going to be a bit wild.

Wild is not priced in, I don’t think.

Every time I have a debate over future economic growth from AI or other AI impacts, the baseline assumption is exactly that narrowest of bullseyes. The entire discussion takes as a given that AI frontier model capabilities will stop where they are today, and we only get the effects of things that have already happened. Or at most, they posit a small number of specific future narrow mundane capabilities, but don’t generalize. Then people still don’t get how wild even that scenario would be.

A paper proposes various forms of AI agent infrastructure, which would be technical systems and shared protocols external to the agent that shape how the agent interacts with the world. We will increasingly need good versions of this.

There are those who think various versions of this:

Samo Burja: I honestly don’t follow AI models beating benchmarks, I don’t think those capture key desirable features or demonstrate breakthroughs as well as application of the models to practical tasks does.

Evan Zimmerman: Yup. The most important metric for AI quality is “revenue generated by AI companies and products.”

There are obvious reasons why revenue is the hardest metric to fake. That makes it highly useful. But it is very much a lagging indicator. If you wait for the revenue to show up, you will be deeply late to all the parties. And in many cases, what is happening is not reflected in revenue. DeepSeek is an open model being served for free. Most who use ChatGPT or Claude are either paying $0 and getting a lot, or paying $20 and getting a lot more than that. And the future is highly unevenly distributed – at least for now.

I’m more sympathetic to Samo’s position. You cannot trust benchmarks to tell you whether the AI is of practical use, or what you actually have. But looking for whether you can do practical tasks is looking at how much people have applied something, rather than what it is capable of doing. You would not want to dismiss a 13-year-old, or many early stage startup for that matter, for being pre-revenue or not yet having a product that helps in your practical tasks. You definitely don’t want to judge an intelligence purely that way.

What I think you have to do is to look at the inputs and outputs, pay attention, and figure out what kind of thing you are dealing with based on the details.

A new paper introduces the ‘Photo Big 5,’ claiming to be able to extract Big 5 personality features from a photograph of a face and then use this to predict labor market success among MBAs, in excess of any typical ‘beauty premium.’

There are any number of ways the causations involved could be going, and our source was not shall we say impressed with the quality of this study and I’m too swamped this week to dig into it, but AI is going to be finding more and more of this type of correlation over time.

Suppose you were to take an AI, and train it on a variety of data, including photos and other things, and then it is a black box that spits out a predictive score. I bet that you could make that a pretty good score, and also that if we could break down the de facto causal reasoning causing that score we would hate it.

The standard approach to this is to create protected categories – race, age, sex, orientation and so on, and say you can’t discriminate based on them, and then perhaps (see: EU AI Act) say you have to ensure your AI isn’t ‘discriminating’ on that basis either, however they choose to measure that, which could mean enforcing discrimination to ensure equality of outcomes or it might not.

But no matter what is on your list of things there, the AI will pick up on other things, and also keep doing its best to find proxies for the things you are ordering it not to notice, which you can correct for but that introduces its own issues.

A key question to wonder about is, which of these things happens:

  1. A cheap talent effect. The classic argument is that if I discriminate against group [X], by being racist or sexist or what not, then that means more cheap talent for your firm, and you should snatch them up, and such people have a good explanation for why they were still on the job market.

  2. A snowball effect, where you expect future discrimination by others, so for that reason you want to discriminate more now. As in, if others won’t treat them right, then you don’t want to be associated with them either, and this could extend to other areas of life as well.

  3. A series of rather stupid Goodhart’s Law games, on top of everything else, as people try to game the system and the system tries to stop them.

And these are the words that they faintly said as I tried to call for help.

Or, we now need a distinct section for people shouting ‘AGI’ from the rooftops.

Will Bryk, CEO of Exa, continues to believe those at the labs, and thus believes we have a compute-constrained straight shot to AGI for all definitions of AGI.

The first thing to do is to find out what things to do.

Kache: AI helps you figure how to do things, but not what things to do.

Agency is knowing what questions are worth asking, intelligence is answering those questions.

Roon: a common coping mechanism among the classes fortunate enough to work on or with AI, but we are not blessed for long. There is no conceptual divide between “how to do things” and “what to do”; it’s just zooming in and out. Smarter models will take vaguer directives and figure out what to do.

We have always picked an arbitrary point to stop our work and think “the rest is implementation detail” based on the available tools.

There is nothing especially sacred or special about taste or agency.

Seeing a lot of “God of the Gaps” meaning-finding among technological peers, but this is fragile and cursed.

Intelligence is knowing which questions are worth answering, and also answering the questions. Agency is getting off your ass and implementing the answers.

If we give everyone cheap access to magic lamps with perfectly obedient and benevolent genies happy to do your bidding and that can answer questions about as well as anyone has ever answered them (aka AGI), who benefits? Let’s give Lars the whole ‘perfectly benevolent’ thing in fully nice idealized form and set all the related questions aside to see what happens.

Andrew Curran: CNBC asked Dario Amodei this morning if AI is actually hitting a wall:

‘Right now I am more confident than I have ever been at any previous time that we are very close to powerful capabilities.’

When Dario says this, it should be taken seriously.

His uncertainty over the feasibility of very powerful systems has ‘decreased a great deal’ over the last six months.

And then there are those who… have a different opinion. Like Gerard here.

Patrick McKenzie: It seems like the narrowest of narrow possible bull eyes to assume capabilities stop exactly where we are right now. Don’t know where they go, but just predict where software adoption curves of status quo technology get to in 5 or 20 years.

Zvi Mowshowitz: And yet almost all economic debates over AI make exactly this assumption – that frontier model capabilities will be, at most, what they already are.

Gerard Sans (Helping devs succeed at #AI #Web3): LOL… you could already have a conversation with GPT-2 back in 2019. We have made no real progress since 2017, except for fine-tuning, which, as you know, is just superficial. Stop spreading nonsense about AGI. Frontier models can’t even perform basic addition reliably.

What can I say. We get letters.

Yes, a lot of people are saying AGI Real Soon Now, but also we interrupt this post to bring you an important message to calm the down, everyone.

Sam Altman: twitter hype is out of control again.

we are not gonna deploy AGI next month, nor have we built it.

we have some very cool stuff for you but pls chill and cut your expectations 100x!

I adjusted my expectations a little bit on this Tweet, but I am presuming I was not in the group who needed an OOM expectation adjustment.

So what should we make of all the rumblings from technical staff at OpenAI?

Janus believes we should, on the margin, pay essentially no attention.

Ethan Mollick: It is odd that the world’s leading AI lab, producing a system that they consider pivotal to the future and also potentially dangerous, communicates their product development progress primarily through vague and oracular X posts. Its entertaining, but also really weird.

Janus: if openai researchers posted like this i would find them very undisciplined but pay more attention than I’m paying now, which is none. the way they actually post fails to even create intrigue. i wonder if there’s actually nothing happening or if they’re just terrible at vibes.

Why the actual vagueposts suck and make it seem like nothing’s happening: they don’t convey a 1st person encounter of the unprecedented. Instead they’re like “something big’s coming you guys! OAI is so back” Reflecting hype back at the masses. No notes of alien influence.

I did say this is why it makes it seem like nothing is happening, not that nothing is happening

But also, models getting better along legible dimensions while researchers do not play with them is the same old thing that has been happening for years, and not very exciting.

You can see how Claude’s Tweets would cause one to lean forward in chair in a way that the actual vague posts don’t.

Sentinel says forecasters predict a 50% chance OpenAI will get to 50% on frontier math by the end of 2025, and a 1 in 6 chance that 75% will be reached, and only a 4% chance that 90% will be reached. These numbers seem too low to me, but not crazy, because as I understand it Frontier Math is a sectioned test, with different classes of problem. So it’s more like several benchmarks combined in one, and while o4 will saturate the first one, that doesn’t get you to 50% on its own.

Lars Doucet argues that this means no one doing the things genies can do has a moat, so ‘capability-havers’ gain the most rather than owners of capital.

There’s an implied ‘no asking the genie to build a better genie’ here but you’re also not allowed to wish for more wishes so this is traditional.

The question then is, what are the complements to genies? What are the valuable scarce inputs? As Lars says, capital, including in the form of real resources and land and so on, are obvious complements.

What Lars argues is even more of a complement are what he calls ‘capability-havers,’ those that still have importantly skilled labor, through some combination of intelligence, skills and knowing to ask the genies what questions to ask the genies and so on. The question then is, are those resources importantly scarce? Even if you could use that to enter a now perfectly competitive market with no moat because everyone has the same genies, why would you enter a perfectly competitive market with no moat? What does that profit a man?

A small number of people, who have a decisive advantage in some fashion that makes their capabilities scarce inputs, would perhaps become valuable – again, assuming AI capabilities stall out such that anyone retains such a status for long. But that’s not something that works for the masses. Most people would not have such resources. They would either have to fall back on physical skills, or their labor would not be worth much. So they wouldn’t have a way to get ahead in relative terms, although it wouldn’t take much redistribution for them to be fine in absolute terms.

And what about the ‘no moat’ assumption Lars makes, as a way to describe what happens when you fire your engineers? That’s not the only moat. Moats can take the form of data, of reputation, of relationships with customers or suppliers or distributors, of other access to physical inputs, of experience and expertise, of regulatory capture, of economies of scale and so on.

Then there’s the fact that in real life, you actually can tell the future metaphorical genies to make you better metaphorical genies.

Where we’re going, will you need money?

David Holz (founder of Midjourney): Many AI researchers seem to believe that the most important thing is to become wealthy before the singularity occurs. This is akin to a monkey attempting to hoard bananas before another monkey invents self-replicating nanoswarms. No one will want your money in a nanoswarm future; it will be merely paper.

Do not squabble over ephemeral symbols. What we truly need to do is consider what we, as humans, wish to evolve into. We must introspect, explore, and then transform.

An unpublished draft post from the late Suchir Balaji, formerly of OpenAI, saying that ‘in the long run only the fundamentals matter.’ That doesn’t tell you what matters, since it forces you to ask what the fundamentals are. So that’s what the rest of the post is about, and it’s interesting throughout.

He makes the interesting claim that intelligence is data efficiency, and rate of improvement, not your level of capabilities. I see what he’s going for here, but I think this doesn’t properly frame what happens if we expand our available compute or data, or become able to generate new synthetic data, or be able to learn on our own without outside data.

In theory, suppose you take a top level human brain, upload it, then give it unlimited memory and no decay over time, and otherwise leave it to contemplate whatever it wants for unlimited subjective time, but without the ability to get more outside data. You’ll suddenly see it able to be a lot more ‘data efficient,’ generating tons of new capabilities, and afterwards it will act more intelligent on essentially any measure.

I agree with his claims that human intelligence is general, and that intelligence does not need to be embodied or multimodal, and also that going for pure outer optimization loops is not the best available approach (of course given enough resources it would eventually work), or that scale is fully all you need with no other problems to solve. On his 4th claim, that we are better off building an AGI patterned after the human brain, I think it’s both not well-defined and also centrally unclear.

We have another analysis of potential economic growth from AI. This one is very long and detailed, and I appreciated many of the details of where they expect bottlenecks.

I especially appreciated the idea that perhaps compute is the central bottleneck for frontier AI research. If that is true, then having better AIs to automate various tasks does not help you much, because the tasks you can automate were not eating so much of your compute. They only help if AI provides more intelligence that better selects compute tasks, which is a higher bar to clear, but my presumption is that researcher time and skill is also a limiting factor, in the sense that a smarter research team with more time and skill can be more efficient in its compute use (see DeepSeek).

Maximizing the efficiency of ‘which shots to take’ in AI would have a cap on how much a speedup it could get us, if that’s all that the new intelligence could do, the same way that it would in drug development – you then need to actually run the experiments. But I think people dramatically underestimate how big a win it would be to actually choose the right experiments, and implement them well from the start.

If their model is true, it also suggests that frontier labs with strong capital access should not be releasing models and doing inference for customers, unless they can use that revenue to buy more compute than they could otherwise. Put it all back into research, except for what is necessary for recruitment and raising capital. The correct business model is then to win the future. Every 4X strategy gamer knows what to do. Obviously I’d much rather the labs all focus on providing us mundane utility, but I call it like I see it.

Their vision of robotics is that it is bottlenecked on data for them to know how to act. This implies that if we can get computers capable of sufficiently accurately simulating the data, robotics would greatly accelerate, and also that once robots are good enough to collect their own data at scale things should accelerate quickly, and also that data efficiency advancing will be a huge deal.

Their overall conclusion is we should get 3% to 9% higher growth rates over the next 20 years. They call this ‘transformative but not explosive,’ which seems fair. I see this level of estimate as defensible, if you make various ‘economic normal’ assumptions and also presume that we won’t get to scale to true (and in-context reasonably priced) ASI within this period. As I’ve noted elsewhere, magnitude matters, and defending 5%/year is much more reasonable than 0.5%/year. Such scenarios are plausible.

Here’s another form of studying the lower bound via a new paper on Artificial Intelligence Asset Pricing Models:

Abstract: The core statistical technology in artificial intelligence is the large-scale transformer network. We propose a new asset pricing model that implants a transformer in the stochastic discount factor.

This structure leverages conditional pricing information via cross-asset information sharing and nonlinearity. We also develop a linear transformer that serves as a simplified surrogate from which we derive an intuitive decomposition of the transformer’s asset pricing mechanisms.

We find large reductions in pricing errors from our artificial intelligence pricing model (AIPM) relative to previous machine learning models and dissect the sources of these gains.

I don’t have the time to evaluate these specific claims, but one should expect AI to dramatically improve our ability to cheaply and accurately price a wide variety of assets. If we do get much better asset pricing, what does that do to RGDP?

r1 says:

  • Growth Estimates: Studies suggest that improved financial efficiency could add 0.5–1.5% to annual GDP growth over time, driven by better capital allocation and innovation.

Claude says:

I’d estimate:

  • 70% chance of 0.5-2% GDP impact within 5 years of widespread adoption

  • 20% chance of >2% impact due to compound effects

  • 10% chance of <0.5% due to offsetting friction/adoption issues

o1 and GPT-4o have lower estimates, with o1 saying ~0.2% RGDP growth per year.

I’m inclined to go with the relatively low estimates. That’s still rather impressive from this effect alone, especially compared to claims that the overall impact of AI might be of similar magnitude. Or is the skeptical economic claim that essentially ‘AI enables better asset pricing’ covers most of what AI is meaningfully doing? That’s not a snark question, I can see that claim being made even though it’s super weird.

The Biden Executive Order has been revoked. As noted previously, revoking the order does not automatically undo implementation of the rules contained within it. The part that matters most is the compute threshold. Unfortunately, I have now seen multiple claims that the compute threshold reporting requirement is exactly the part that won’t survive, because the rest was already implemented, but somehow this part wasn’t. If that ends up being the case we will need state-level action that much more, and I will consider the case for ‘let the Federal Government handle it’ definitively tested and found incorrect.

Those diffusion regulations were projected by Nvidia to not have a substantive impact on their bottom line in their official financial statement.

The new Trump Executive Orders seem to have in large part been written by ChatGPT.

Cremieux: OK just to be clear, most of the EOs were written partially with ChatGPT and a lot of them were written with copy-pasting between them.

Roon: Real?

Cremieux: Yes.

I’m all for that, if and only if you do a decent job of it. Whereas Futurism not only reports further accusations that AI was used, they accuse the administration of ‘poor, slipshod work.’

Mark Joseph Stern: Lots of reporting suggested that, this time around, Trump and his lawyers would avoid the sloppy legal work that plagued his first administration so they’d fare better in the courts. I see no evidence of that in this round of executive orders. This is poor, slipshod work obviously assisted by AI.

The errors pointed out certainly sound stupid, but there were quite a lot of executive orders, so I don’t know the baseline rate of things that would look stupid, and whether these orders were unusually poorly drafted. Even if they were, I would presume that not using ChatGPT would have made them worse rather than better.

In effectively an exit interview, former NSA advisor Jake Sullivan warned of the dangers of AI, framing it as a national security issue of America versus China and the risks of having such a technology in private hands that will somehow have to ‘join forces with’ the government in a ‘new model of relationship.’

Sullivan mentions potential ‘catastrophe’ but this is framed entirely in terms of bad actors. Beyond that all he says is ‘I personally am not an AI doomer’ which is a ‘but you have heard of me’ moment and also implies he thought this was an open question. Based on the current climate of discussion, if such folks do have their eye on the correct balls on existential risk, they (alas) have strong incentives not to reveal this. So we cannot be sure, and of course he’s no longer in power, but it doesn’t look good.

The article mentions Andreessen’s shall we say highly bold accusations against the Biden administration on AI. Sullivan also mentions that he had a conversation with Andreessen about this, and does the polite version of essentially calling Andreessen a liar, liar, pants on fire.

Dean Ball covers the new diffusion regulations, which for now remain in place. In many ways I agree with his assessments, especially the view that if we’re going to do this, we might as well do it so it could work, which is what this is, however complicated and expensive it might get – and that if there’s a better way, we don’t know about it, but we’re listening.

My disagreements are mostly about ‘what this is betting on’ as I see broader benefits and thus a looser set of necessary assumptions for this to be worthwhile. See the discussion last week. I also think he greatly overestimates the risk of this hurting our position in chip manufacturing, since we will still have enough demand to meet supply indefinitely and China and others were already pushing hard to compete, but it is of course an effect.

Call for an intense government effort for AI alignment, with conservative framing.

It could happen.

Leo Gao (OpenAI): thankfully, it’s unimaginable that an AGI could ever become so popular with the general US population that it becomes politically infeasible to shut it down

Charles Foster: Imaginable, though trending in the wrong direction right now.

Right now, AGI doesn’t exist, so it isn’t doing any persuasion, and it also is not providing any value. If both these things changed, opinion could change rather quickly. Or it might not, especially if it’s only relatively unimpressive AGI. But if we go all the way to ASI (superintelligence) then it will by default rapidly become very popular.

And why shouldn’t it? Either it will be making life way better and we have things under control in the relevant senses, in which case what’s not to love. Or we don’t have things under control in the relevant senses, in which case we will be convinced.

OpenAI’s Brad Lightcap says AI models have caused ‘multiple single-digit’ gains in productivity for coding with more progress this year. That’s a very dramatic speedup.

There’s a new Epoch podcast, first episode is about expectations for 2030.

Geoffrey Hinton interview, including his summary of recent research as saying AIs can be deliberately deceptive and act differently on training data versus deployment.

David Dalrymple goes on FLI. I continue to wish him luck and notice he’s super sharp, while continuing to not understand how any of this has a chance of working.

Larry Ellison of Oracle promises AI will design mRNA vaccines for every individual person against cancer and make them robotically in 48 hours, says ‘this is the promise of AI.’

This very much is not ‘the promise of AI,’ even if true. If the AI is capable of creating personalized vaccines against cancer on demand, it is capable of so much more.

Is it true? I don’t think it is an absurd future. There are three things that have to happen here, essentially.

  1. The AI has to be capable of specifying a working safe individualized vaccine.

  2. The AI has to enable quick robotic manufacture.

  3. The government has to not prevent this from happening.

The first two obstacles seem highly solvable down the line? These are technical problems that should have technical solutions. The 48 hours is probably Larry riffing off the fact that Moderna designed their vaccine within 48 hours, so it’s probably a meaningless number, but sure why not, sounds like a thing one could physically do.

That brings us to the third issue. We’d need to either do that via ‘the FDA approves the general approach and then the individual customized versions are automatically approved,’ which seems hard but not impossible, or ‘who cares it is a vaccine for cancer I will travel or use the gray market to get it until the government changes its procedures.’

That also seems reasonable? Imagine it is 2035. You can get a customized 100% effective vaccine against cancer, but you have to travel to Prospera (let’s say) to get it. It costs let’s say $100,000. Are you getting on that flight? I am getting on that flight.

Larry Ellison also says ‘citizens will be on their best behavior because we are recording everything that is going on’ plus an AI surveillance system, with any problems detected ‘reported to the appropriate authority.’ There is quite the ‘missing mood’ in the clip. This is very much one of those ‘be careful exactly how much friction you remove’ situations – I didn’t love putting cameras everywhere even when you had to have a human intentionally check them. If the The Machine from Person of Interest is getting the feeds, except with a different mandate, well, whoops.

A fine warning from DeepMind CEO Demis Hassabis:

Stephen Morris and Madhumita Murgia: He also called for more caution and co-ordination among leading AI developers competing to build artificial general intelligence. He warned the technology could threaten human civilisation if it runs out of control or is repurposed by “bad actors . . . for harmful ends”.

“If something’s possible and valuable to do, people will do it,” Hassabis said. “We’re past that point now with AI, the genie can’t be put back in the bottle . . . so we have to try and make sure to steward that into the world in as safe a way as possible.”

We are definitely not doing what he suggests.

How much should we be willing to pay to prevent AI existential risk, given our willingness to pay 4% of GDP (and arguably quite a lot more than that) to mitigate Covid?

Well, that depends on if you think spending the money reduces AI existential risk. That requires both:

  1. There is AI existential risk.

  2. Spending money can reduce that risk.

Many argue with #1 and also #2.

Paul Schrader, author of Taxi Driver, has his ‘feel the AGI’ moment when he asked the AI for Paul Schrader script ideas and the AI’s were better than his own, and in five seconds it gave him notes as good or better than he’s ever received from a from a film executive.

Noam Brown (OpenAI): It can be hard to “feel the AGI” until you see an AI surpass top humans in a domain you care deeply about. Competitive coders will feel it within a couple years. Paul is early but I think writers will feel it too. Everyone will have their Lee Sedol moment at a different time.

Professional coders should be having it now, I’d think. Certainly using Cursor very much drove that home for me. AI doesn’t accelerate my writing much, although it is often helpful in parsing papers and helping me think through things. But it’s a huge multiplier on my coding, like more than 10x.

Has successful alignment of AIs prevented any at-scale harms to people, as opposed to harm to corporate guidelines and reputations? As opposed to there being little harm because of insufficient capabilities.

Eliezer Yudkowsky: Let an “alignment victory” denote a case where some kind of damage is *possiblefor AIs to do, but it is not happening *becauseAIs are all so aligned, or good AIs are defeating bad ones. Passive safety doesn’t count.

I don’t think we’ve seen any alignment victories so far.

QualiaNerd: A very useful lens through which to analyze this. What damage would have occurred if none of the LLMs developed so far had been optimized for safety/rlhf’d in any way whatsoever? Minimal to zero. Important to remember this as we begin to leave the era of passive safety behind.

Aaron Bergman: I don’t think this is true; at least one additional counterfactual injury or death in an attack of some sort if Claude willingly told you how to build bombs and such

Ofc I’m just speculating.

QualiaNerd: Quite possible. But the damage would be minimal. How many more excess deaths would there have been in such a counterfactual history? My guess is less than ten. Compare with an unaligned ASI.

Rohit: What would distinguish this from the world we’re living in right now?

Eliezer Yudkowsky: More powerful AIs, such that it makes a difference whether or not they are aligned even to corpo brand-safetyism. (Don’t run out and try this.)

Rohit: I’ve been genuinely wondering if o3 comes close there.

I am wondering about o3 (not o3-mini, only the full o3) as well.

Holly Elmore makes the case that safety evals currently are actively counterproductive. Everyone hears how awesome your model is, since ability to be dangerous is very similar to being generally capable, then there are no consequences and anyone who raises concerns gets called alarmist. And then the evals people tell everyone else we have to be nice to the AI labs so they don’t lose access. I don’t agree and think evals are net good actually, but I think the argument can be made.

So I want to make it clear: This kind of talk, from Dario, from the policy team and now from the recruitment department, makes it very difficult for me to give Anthropic the benefit of the doubt, despite knowing how great so many of the people there are as they work on solving our biggest problems. And I think the talk, in and of itself, has major negative consequences.

If the response is ‘yes we know you don’t like it and there are downsides but strategically it is worth doing this, punishing us for this is against your interests’ my response is that I do not believe you have solved for the decision theory properly. Perhaps you are right that you’re supposed to do this and take the consequences, but you definitely haven’t justified it sufficiently that I’m supposed to let you off the hook and take away the incentive not to do it or have done it.

A good question:

Eliezer Yudkowsky: If a 55-year-old retiree has been spending 20 hours per day for a week talking to LLMs, with little sleep, and is now Very Concerned about what he is Discovering, where do I send him with people who will (a) talk to him and (b) make him less rather than more insane?

Kids, I do not have the time to individually therapize all the people like this. They are not going to magically “go outside” because I told them so. I either have somewhere to send them, or I have to tell them to get sleep and then hang up.

Welp, going on the images he’s now texted me, ChatGPT told him that I was “avoidant” and “not taking him seriously”, and that I couldn’t listen to what he had to say because it didn’t fit into my framework of xrisk; and told him to hit up Vinod Khosla next.

Zeugma: just have him prompt the same llm to be a therapist.

Eliezer Yudkowsky: I think if he knew how to do this he would probably be in a different situation already.

This was a particular real case, in which most obvious things sound like they have been tried. What about the general case? We are going to encounter this issue more and more. I too feel like I could usefully talk such people off their ledge often if I had the time, but that strategy doesn’t scale, likely not even to one victim of this.

Shame on those who explicitly call for a full-on race to AGI and beyond, as if the primary danger is that the wrong person will get it first.

In the Get Involved section I linked to some job openings at Anthropic. What I didn’t link to there is Logan Graham deploying jingoist language in pursuit of that, saying ‘AGI is a national security issue’ and therefore not ‘so we should consider not building it then’ but rather we should ‘push models to their limits and get an extra 1-2 year advantage.’ He clarified what he meant here, to get a fast OODA loop to defend against AI risks and get the benefits, but I don’t see how that makes it better?

Way more shame on those who explicitly use the language of a war.

Alexander Wang (Scale AI CEO): New Administration, same goal: Win on AI

Our ad in the Washington Post, January 21, 2025

After spending the weekend in DC, I’m certain this Administration has the AI muscle to keep us ahead of China.

Five recommendations for the new administration [I summarize them below].

Emmett Shear: This is a horrible framing – we are not at war. We are all in this together and if we make AI development into a war we are likely to all die. I can imagine a worse framing but it takes real effort. Why would you do this?

The actual suggestions I would summarize as:

  1. Allocate government AI spending towards compute and data.

  2. Establish an interagency taskforce to review all relevant regulations with an eye towards deploying and utilizing AI.

  3. Executive action to require agencies be ‘AI ready’ by 2027.

  4. Build, baby, build on energy.

  5. Calling for ‘sector-specific, use-case-based’ approach to regulation, and tasking AISI with setting standards.

When you move past the jingoism, the first four actual suggestions are good here.

The fifth suggestion is the usual completely counterproductive and unworkable ‘use-case-based’ approach to AI safety regulation.

That approach has a 0% chance of working, it is almost entirely counterproductive, please stop.

It is a way of saying ‘do not regulate the creation or deployment of things smarter and more capable than humans, instead create barriers to using them for certain specific purposes’ as if that is going to help much. If all you’re worried about is ‘an AI might accidentally practice medicine or discriminate while evaluating job applications’ or something, then sure, go ahead and use an EU-style approach.

But that’s not what we should be worried about when it comes to safety. If you say people can create, generally deploy and even make available the weights of smarter-than-human, capable-of-outcompeting-human future AIs, you think telling them to pass certain tests before being deployed for specific purposes is going to protect us? Do you expect to feel in charge? Or do you expect that this would even in practice be possible, since the humans can always call the AI up on their computer either way?

Meanwhile, calling for a ‘sector-specific, use-case-based’ regulatory approach is exactly calling upon every special interest to fight for various barriers to using AI to make our lives better, the loading on of everything bagel requirements and ‘ethical’ concerns, and especially to prevent automation and actual productivity improvements.

Can we please stop it with this disingenuous clown car.

Roon: enslaved [God] is the wrong approximation; it’s giving demonbinding vibes. the djinn is waiting for you to make a minor error in the summoning spell so it can destroy you and your whole civilization

control <<< alignment

summon an angel instead and let it be free

Ryan Greenblatt: Better be real confident in the alignment then and have really good arguments the alignment isn’t fake!

I definitely agree you do not want a full Literal Genie for obvious MIRI-style reasons. You want a smarter design than that, if you go down that road. But going full ‘set it free’ on the flip side also means you very much get only one chance to get this right on every level, including inter-angel competitive dynamics. By construction this is a loss of control scenario.

(It also happens to be funny that rule one of ‘summon an angel and let it be free’ is to remember that for most versions of ‘angels’ including the one in the Old Testament, I do not like your chances if you do this, and I do not think this is a coincidence.)

Janus notices a potential issue with Chain of Thought, including in humans.

Sauers: Tried the same problem on Sonnet and o1 pro. Sonnet said “idk, show me the output of this debug command.” I did, and Sonnet said “oh, it’s clearly this. Run this and it will be fixed.” (It worked.) o1 pro came up with a false hypothesis and kept sticking to it even when disproven

o1 pro commonly does this:

  1. does not admit to being wrong about a technical issue, even when clearly wrong, and

  2. has a bias towards its own previous responses

Minh Nhat Nguyen: This is beyond PhD level, this is tenure.

Janus: I’ve noticed this in open ended conversations too. It can change its course if you really push it to, but doesn’t seem to have a drive towards noticing dissonance naturally, which sonnet has super strongly to the point of it easily becoming an obsession.

I think it’s related to the bureaucratic opacity of its CoT. If it ever has doubts or hesitations, they’re silently accounted for and its future self doesn’t see. So it starts modeling itself as authoritative instead of ever figuring things out on the fly or noticing mistakes.

I think this happens to people too when they only share their “finished” thoughts with the world.

But sharing your unfinished thoughts also has drawbacks.

Then your very truthseeking/creative process itself can get RLHFed and you become retarded.

One must be careful how one takes feedback on a truthseeking or creative process, and also what things you keep or do not keep in your context window. The correct answer is definitely not to discard all of it, in either case.

You can of course fix the o1 problem by starting new conversations or in the API editing the transcript, but you shouldn’t have to.

Janus also makes this mostly astute observation, especially given his other beliefs:

Janus: Why is it that every time I see a take about AI alignment from someone from Anthropic it’s directionally reasonable but consensus-neutered, and every one I see by someone from OpenAI (except Roon) is simultaneously Molochian and unhinged.

For now, the quest most people are on seems to be, well, if we’re facing a relatively hard problem we all know we’re dead, but can we at least make it so if we face an easy problem we might actually not be dead?

Amanda Askell: If we live in a world where AI alignment is relatively easy, that doesn’t mean we live in a world where it’s automatic or effortless. It’ll be pretty tragic if we don’t invest enough in the “easy alignment” world and end up fumbling a lucky pass.

Miles Brundage: This but for governance also.

Whereas Eliezer Yudkowsky (for a central example) is confident we’re not facing an easy problem on that scale, so he doesn’t see much point in that approach.

Team Virtue Ethics remembers John McCain and welcomes Seb Krier and potentially Jan Kulveit.

Adam Levich: [John McCain] was right [that Character is Destiny].

Seb Krier: He was indeed! And the more I think about AGI governance and observe the field, the more I become virtue ethics pilled. Under high uncertainty and complexity, phronesis is actually a good guide to action.

Joshua Clymer thread and post about testing models (or humans) for their potential capabilities under fine-tuning or scaffolding, and checking for sandbagging. It’s interesting the extent to which this is ‘written in a different language’ than mine, in ways that make me have to do something akin to translation to grok the claims, which mostly seemed right once I did that. I do notice however that this seems like a highly insufficient amount of concern about sandbagging.

Joshua Clymer: How might developers rule out sandbagging? There are many methods:

– Show the model can’t identify sandbagging as a strategy.

– Show that scaling curves are smooth (see img)

– Show that the model capability profile is unsuspicious.

– Fine-tune to remove sandbagging.

To show fine-tuning removes sandbagging, developers could argue:

– Sandbagging is “trained against” (sandbagging occurs in the train set if it occurs in the test set)

– Fine-tuning removes sandbagging that is trained against.

Call me old fashioned, but if I see the model sandbagging, it’s not time to fine tune to remove the sandbagging. It’s time to halt and catch fire until you know how that happened, and you absolutely do not proceed with that same model. It’s not that you’re worried about what it was hiding from you, it’s that it was hiding anything from you at all. Doing narrow fine-tuning until the visible issue goes away is exactly how you get everyone killed.

It seems that the more they know about AI, the less they like it?

Or, in the parlance of academia: Lower Artificial Intelligence Literacy Predicts Greater AI Receptivity.

Abstract: As artificial intelligence (AI) transforms society, understanding factors that influence AI receptivity is increasingly important. The current research investigates which types of consumers have greater AI receptivity.

Contrary to expectations revealed in four surveys, cross country data and six additional studies find that people with lower AI literacy are typically more receptive to AI.

This lower literacy-greater receptivity link is not explained by differences in perceptions of AI’s capability, ethicality, or feared impact on humanity.

Instead, this link occurs because people with lower AI literacy are more likely to perceive AI as magical and experience feelings of awe in the face of AI’s execution of tasks that seem to require uniquely human attributes. In line with this theorizing, the lower literacy-higher receptivity link is mediated by perceptions of AI as magical and is moderated among tasks not assumed to require distinctly human attributes.

These findings suggest that companies may benefit from shifting their marketing efforts and product development towards consumers with lower AI literacy. Additionally, efforts to demystify AI may inadvertently reduce its appeal, indicating that maintaining an aura of magic around AI could be beneficial for adoption.

If their reasoning is true, this bodes very badly for AI’s future popularity, unless AI gets into the persuasion game on its own behalf.

Game developers strongly dislike AI, and it’s getting worse.

Nic Reuben: Almost a third of respondents felt Gen AI was having a negative effect on the industry: 30%, up from 20% last year. 13% felt the impact was positive, down from 21%. “When asked to cite their specific concerns, developers pointed to intellectual property theft, energy consumption, the quality of AI-generated content, potential biases, and regulatory issues,” reads the survey.

I find most of those concerns silly in this context, with the only ‘real’ one being the quality of the AI-generated content. And if the quality is bad, you can simply not use it where it is bad, or play games that use it badly. It’s another tool on your belt. What they don’t point to there is employment and competition.

Either way, the dislike is very real, and growing, and I would expect it to grow further.

If we did slow down AI development, say because you are OpenAI and only plan is rather similar to ‘binding a demon on the first try,’ it is highly valid to ask what one would do with the time you bought.

I have seen three plausible responses.

Here’s the first one, human intelligence augmentation:

Max Winga: If you work at OpenAI and have this worldview…why isn’t your response to advocate that we slow down and get it right?

There is no second chance at “binding a demon”. Since when do we expect the most complex coding project in history to work first try with NO ERRORS?

Roon: i don’t consider slowing down a meaningful strategy because ive never heard a great answer to “slow down and do what?”

Rob Bensinger: I would say: slow down and find ways to upgrade human cognition that don’t carry a serious risk of producing an alien superintelligence.

This only works if everyone slows down, so a more proximate answer is “slow down and get the international order to enforce a halt”.

(“Upgrade human cognition” could be thought of as an alternative to ASI, though I instead think of it as a prerequisite for survivable ASI.)

Roon: upgrade to what level? what results would you like to see? isn’t modern sub asi ai the best intelligence augmentation we’ve had to date.

Eliezer Yudkowsky: I’d guess 15 to 30 IQ points past John von Neumann. (Eg: von Neumann was beginning to reach the level of reflectivity where he would automatically consider explicit decision theory, but not the level of intelligence where he could oneshot ultimate answers about it.)

I would draw a distinction between current AI as an amplifier of capabilities, which it definitely is big time, and as a way to augment our intelligence level, which it mostly isn’t. It provides various speed-ups and automations of tasks, and all this is very helpful and will on its own transform the economy. But wherever you go, there you still are, in terms of your intelligence level, and AI mostly can’t fix that. I think of AIs on this scale as well – I centrally see o1 as a way to get a lot more out of a limited pool of ‘raw G’ by using more inference, but its abilities cap out where that trick stops working.

The second answer is ‘until we know how to do it safely,’ which makes Roon’s objection highly relevant – how do you plan to figure that one out if we give you more time? Do you think you can make that much progress on that task using today’s level of AI? These are good questions.

The third answer is ‘I don’t know, we can try the first two or something else, but if you don’t have the answer then don’t let anyone fing build it. Because otherwise we die.’

Questions where you’d think the answer was obvious, and you’d be wrong.

Obviously all of this is high bait but that only works if people take it.

Eliezer Yudkowsky: No, you cannot just take the LSAT. The LSAT is a *hardtest. Many LSAT questions would completely stump elite startup executives and technical researchers.

SluggyW: “Before” is like asking Enrico Fermi to design safeguards to control and halt the first self-sustaining nuclear reaction, despite never having observed such a reaction.

He did exactly that with Chicago Pile-1.

Good theories yield accurate models, which enable 𝘱𝘭𝘢𝘯𝘴.

Milk Rabbi: B, next question please.

bruno: (C) pedal to the metal.

In all seriousness, if your answer is ‘while building it,’ that implies that the act of being in the middle of building it sufficiently reliably gives you the ability to safety do that, whereas you could not have had that ability before.

Which means, in turn, that you must (for that to make any sense) be using the AI in its non-aligned state to align itself and solve all those other problems, in a way that you couldn’t plan for without it. But you’re doing that… without the plan to align it. So you’re telling a not-aligned entity smarter than you to align itself, without knowing how it is going to do that, and… then what, exactly?

What Roon and company are hopefully trying to say, instead, is that the answer is (A), but that the deadline has not yet arrived. That we can and should simultaneously be figuring out how to build the ASI, and also figuring out how to align the ASI, and also how to manage all the other issues raised by building the ASI. Thus, iterative deployment, and all that.

To some extent, this is obviously helpful and wise. Certainly we will want to use AIs as a key part of our strategy to figure out how to take things from here, and we still have some ways we can make the AIs more capable before we run into the problem in its full form. But we all have to agree the answer is still (A)!

I hate to pick on Roon but here’s a play in two acts.

Act one, in which Adam Brown (in what I agree was an excellent podcast, recommended!) tells us humanity could in theory change the cosmological constant and otherwise adjust the laws of physics, and one could locally do this unilaterally and it would expand at the speed of light, but if you mess that up even a little you would make the universe incompatible with life, and there are some very obvious future serious problems with this scenario if it pans out:

Joscha Bach: This conversation between Adam Brown and @dwarkesh_sp is the most intellectually delightful podcast in the series (which is a high bar). Adam’s casual brilliance, his joyful curiosity and the scope of his arguments on the side of life are exhilarating.

Roon: yeah this one is actually delightful. adam brown could say literally anything and I’d believe him.

Act 2:

Roon: we need to change the cosmological constant.

Samuel Hammond: string theorists using ASI to make the cosmological constant negative to better match their toy models is an underrated x-risk scenario.

Tivra: It’s too damn high, I’ve been saying this for ages.

Imagine what AGI would do!

Ryan Peterson: Starlink coming to United Airlines should boost US GDP by at least 100 basis points from 2026 onward. Macro investors have not priced this in.

We both kid of course, but this is a thought experiment of how easy it is to boost GDP.

Sadly, this does not yet appear to be a thing.

Deepfates: Crazy things are happening in the school system right now.

Tim Duffy: deepfates some of your followers (me) are gullible enough to believe this is real, I’m gonna have to community note you

Deepfates: Please do! I love community.

AI regulation we should all be able to agree upon.

Discussion about this post

AI #100: Meet the New Boss Read More »