Author name: Rejus Almole


Dwarkesh Patel on Continual Learning

A key question going forward is the extent to which making further AI progress will depend upon some form of continual learning. Dwarkesh Patel offers us an extended essay considering these questions and reasons to be skeptical of the pace of progress for a while. I am less skeptical about many of these particular considerations, and do my best to explain why in detail.

Separately, Ivanka Trump recently endorsed a paper with a discussion I liked a lot less but that needs to be discussed given how influential her voice might (mind you I said might) be to policy going forward, so I will then cover that here as well.

Dwarkesh Patel explains why he doesn’t think AGI is right around the corner, and why AI progress today is insufficient to replace most white collar employment: That continual learning is both necessary and unsolved, and will be a huge bottleneck.

He opens with this quote:

Rudiger Dornbusch: Things take longer to happen than you think they will, and then they happen faster than you thought they could.

Clearly this means one is poorly calibrated, but also yes, and I expect it to feel like this as well. Either capabilities, diffusion or both will be on an exponential, and the future will be highly unevenly distributed until suddenly parts of it aren’t anymore. That seems to be true fractally as well: when the tech is ready and I figure out how to make AI do something, that’s it, it’s done.

Here is Dwarkesh’s Twitter thread summary:

Dwarkesh Patel: Sometimes people say that even if all AI progress totally stopped, the systems of today would still be economically transformative. I disagree. The reason that the Fortune 500 aren’t using LLMs to transform their workflows isn’t because the management is too stodgy.

Rather, it’s genuinely hard to get normal humanlike labor out of LLMs. And this has to do with some fundamental capabilities these models lack.

New blog post where I explain why I disagree with this, and why I have slightly longer timelines to AGI than many of my guests.

I think continual learning is a huge bottleneck to the usefulness of these models, and extended computer use may take years to sort out.

Link here.

There is no consensus definition of transformative, but I think this is simply wrong, in the sense that LLMs being stuck without continual learning at essentially current levels would not stop them from having a transformative impact. There are a lot of other ways to get a ton more utility out of what we already have, and over time we would build around what the models can do rather than giving up the moment they don’t sufficiently neatly fit into existing human-shaped holes.

When we do solve human like continual learning, however, we might see a broadly deployed intelligence explosion *even if there’s no more algorithmic progress*.

Simply from the AI amalgamating the on-the-job experience of all the copies broadly deployed through the economy.

I’d bet 2028 for computer use agents that can do taxes end-to-end for my small business as well as a competent general manager could in a week: including chasing down all the receipts on different websites, emailing back and forth for invoices, and filing to the IRS.

That being said, you can’t play around with these models when they’re in their element and still think we’re not on track for AGI.

Strongly agree with that last statement. Regardless of how much we can do without strictly solving continual learning, continual learning is not solved… yet.

These are simple, self-contained, short-horizon, language-in, language-out tasks – the kinds of assignments that should be dead center in the LLMs’ repertoire. And they’re 5/10 at them. Don’t get me wrong, that’s impressive.

But the fundamental problem is that LLMs don’t get better over time the way a human would. The lack of continual learning is a huge huge problem. The LLM baseline at many tasks might be higher than an average human’s. But there’s no way to give a model high level feedback.

You’re stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice this just doesn’t produce anything even close to the kind of learning and improvement that human employees experience.

The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.

You make an AI tool. It’s 5/10 out of the box. What level of Skill Issue are we dealing with here, that stops it from getting better over time assuming you don’t get to upgrade the underlying model?

You can obviously engage in industrial amounts of RL or other fine-tuning, but that too only goes so far.

You can use things like memory, or train LoRas, or various other incremental tricks. That doesn’t enable radical changes, but I do think it can work for the kinds of preference learning Dwarkesh is complaining he currently doesn’t have access to, and you can if desired go back and fine tune the entire system periodically.
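To make the ‘memory’ option concrete, here is a minimal sketch of one such incremental trick: persisting high-level feedback between sessions and injecting it into future prompts. Everything here (the `feedback_notes.json` store, the helper names) is a hypothetical illustration for this post, not any particular product’s mechanism.

```python
import json
from pathlib import Path

# Hypothetical persistent store for lessons learned across sessions.
NOTES_FILE = Path("feedback_notes.json")

def record_feedback(note: str) -> None:
    """Append one piece of high-level feedback to the persistent note store."""
    notes = json.loads(NOTES_FILE.read_text()) if NOTES_FILE.exists() else []
    notes.append(note)
    NOTES_FILE.write_text(json.dumps(notes))

def build_prompt(task: str) -> str:
    """Prepend accumulated feedback to each new request, so lessons from
    earlier sessions survive into later ones without retraining anything."""
    notes = json.loads(NOTES_FILE.read_text()) if NOTES_FILE.exists() else []
    preamble = "\n".join(f"- {n}" for n in notes)
    return f"Lessons from past sessions:\n{preamble}\n\nTask: {task}"

record_feedback("Prefer short declarative sentences in drafts.")
prompt = build_prompt("Draft the opening paragraph.")
```

This is crude compared to actual learning, but it is the kind of scaffolding that turns ‘stuck with what you get out of the box’ into something that at least accumulates preferences over time.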

How do you teach a kid to play a saxophone? You have her try to blow into one, listen to how it sounds, and adjust. Now imagine teaching saxophone this way instead: A student takes one attempt. The moment they make a mistake, you send them away and write detailed instructions about what went wrong. The next student reads your notes and tries to play Charlie Parker cold. When they fail, you refine the instructions for the next student.

This just wouldn’t work. No matter how well honed your prompt is, no kid is just going to learn how to play saxophone from just reading your instructions. But this is the only modality we as users have to ‘teach’ LLMs anything.

Are you even so sure about that? If the context you can give is hundreds of thousands to millions of tokens at once, with ability to conditionally access millions or billions more? If you can create new tools and programs and branch workflows, or have it do so on your behalf, and call instances with different contexts and procedures for substeps? If you get to keep rewinding time and sending in the exact same student in the same mental state as many times as you want? And so on, including any number of things I haven’t mentioned or thought about?

I am confident that with enough iterations and work (and access to the required physical tools) I could write a computer program to operate a robot to play the saxophone essentially perfectly. No, you can’t do this purely via the LLM component, but that is why we are moving towards MCP and tool use for such tasks.

I get that Dwarkesh has put a lot of work into getting his tools to 5/10. But it’s nothing compared to the amount of work that could be done, including the tools that could be involved. That’s not a knock on him, that wouldn’t be a good use of his time yet.

LLMs actually do get kinda smart and useful in the middle of a session. For example, sometimes I’ll co-write an essay with an LLM. I’ll give it an outline, and I’ll ask it to draft the essay passage by passage. All its suggestions up till 4 paragraphs in will be bad. So I’ll just rewrite the whole paragraph from scratch and tell it, “Hey, your shit sucked. This is what I wrote instead.” At that point, it can actually start giving good suggestions for the next paragraph. But this whole subtle understanding of my preferences and style is lost by the end of the session.

Okay, so that seems like it is totally, totally a Skill Issue now? As in, Dwarkesh Patel has a style. A few paragraphs of that style clue the LLM into knowing how to help. So… can’t we provide it with a bunch of curated examples of similar exercises, and put them into context in various ways (Claude projects just got 10x more context!) and start with that?

Even Claude Code will often reverse a hard-earned optimization that we engineered together before I hit /compact – because the explanation for why it was made didn’t make it into the summary.

Yeah, this is super annoying, I’ve run into it, but I can think of some obvious fixes, especially if you notice what you want to preserve. One obvious way is to do what humans do: put comments in the code saying what the optimization is and why to keep it, which then remain in context whenever Claude considers ripping them out. I don’t know if that works yet, but it totally should.
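As a concrete illustration of that comment trick, here is a toy example (the function and the `PERF NOTE (keep)` convention are made up for this post): the rationale lives right next to the optimization, so any model editing the file sees it in context before ‘simplifying’ the code back.

```python
def dedupe(items):
    # PERF NOTE (keep): set-based dedupe replaced an O(n^2) nested-loop scan
    # after profiling showed it dominated runtime on large inputs.
    # Do not "simplify" this back to the nested-loop version.
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out
```

Unlike a /compact summary, the comment cannot fall out of context while the code it protects is still on screen.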

I’m not saying I have the magical solution to all this but it all feels like it’s One Weird Trick (okay, maybe 10 working together) away from working in ways I could totally figure out if I had a team behind me and I focused on it.

My guess is this will not look like ‘learn like a human’ exactly. Different tools are available, so we’ll first get the ability to solve this via doing something different. But also, yeah, I think with enough skill and the right technique (on the level of the innovation that created reasoning models) you could basically do what humans do? Which involves effectively having the systems automatically engage in various levels of meta and updating, often quite heavily off a single data point.

It is hard to overstate how much time and effort goes into training a human employee.

There are many jobs where an employee is not net profitable for years. Hiring decisions are often made on the basis of what will be needed in year four or beyond.

That ignores the schooling that you also have to do. A doctor in America requires starting with a college degree, then four years of medical school, then four years of residency, and we have to subsidize that residency because it is actively unprofitable. That’s obviously an extreme case, but there are many training programs or essentially apprenticeships that last for years, including highly expensive time from senior people and expensive real world mistakes.

Imagine what it took to make Dwarkesh Patel into Dwarkesh Patel. Or the investment he makes in his own employees.

Even afterwards, in many ways you will always be ‘stuck with’ various aspects of those employees, and have to make the most of what they offer. This is standard.

Claude Opus estimates, and I think this is reasonable, that for every two hours humans spend working, they spend one hour learning, with a little less than half of that learning essentially ‘on the job.’

If you need to train a not a ‘universal’ LLM but a highly specific-purpose LLM, and have a massive compute budget with which to do so, and you mostly don’t care about how it performs out of distribution the same way you mostly don’t for an employee (as in, you teach it what you teach a human, which is ‘if this is outside your distribution or you’re failing at it then run it up the chain to your supervisor,’ and you have a classifier for that) and you can build and use tools along the way? Different ballgame.

It makes sense, given the pace of progress, for most people and companies not to put that kind of investment into AI ‘employees’ or other AI tasks. But if things do start to stall out, or they don’t, either way the value proposition on that will quickly improve. It will start to be worth doing. And we will rapidly learn new ways of doing it better, and have the results available to be copied.

Here’s his predictions on computer use in particular, to see how much we actually disagree:

When I interviewed Anthropic researchers Sholto Douglas and Trenton Bricken on my podcast, they said that they expect reliable computer use agents by the end of next year. We already have computer use agents right now, but they’re pretty bad. They’re imagining something quite different.

Their forecast is that by the end of next year, you should be able to tell an AI, “Go do my taxes.” And it goes through your email, Amazon orders, and Slack messages, emails back and forth with everyone you need invoices from, compiles all your receipts, decides which are business expenses, asks for your approval on the edge cases, and then submits Form 1040 to the IRS.

I’m skeptical. I’m not an AI researcher, so far be it from me to contradict them on technical details. But given what little I know, here’s why I’d bet against this forecast:

  • As horizon lengths increase, rollouts have to become longer. The AI needs to do two hours’ worth of agentic computer use tasks before we can even see if it did it right. Not to mention that computer use requires processing images and video, which is already more compute intensive, even if you don’t factor in the longer rollout. It seems like this should slow down progress.

Let’s take the concrete example here, ‘go do my taxes.’

This is a highly agentic task, but like a real accountant you can choose to ‘check its work’ if you want, or get another AI to check the work, because you can totally break this down into smaller tasks that allow for verification, or present a plan of tasks that can be verified. Similarly, if you are training TaxBot to do people’s taxes for them, you can train TaxBot on a lot of those individual subtasks, and give it clear feedback.
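To sketch what that decomposition might look like, here is a toy verification-gated pipeline. The subtask names and checks are hypothetical stand-ins, not how any real TaxBot works; the point is only that a long agentic task can be broken into steps that each admit a cheap independent check, with escalation to a human when a check fails.

```python
from dataclasses import dataclass
from typing import Callable, Any

@dataclass
class Subtask:
    name: str
    run: Callable[[], Any]          # the agentic step itself
    verify: Callable[[Any], bool]   # cheap independent check of the output

def run_pipeline(subtasks):
    """Run each step and verify it before moving on, rather than letting
    the agent plow ahead unchecked for hours."""
    results = {}
    for t in subtasks:
        out = t.run()
        if not t.verify(out):
            raise RuntimeError(f"escalate to human: {t.name}")
        results[t.name] = out
    return results

# Toy stand-ins for the real steps a tax agent would take.
receipts = [120.0, 40.5, 99.99]
pipeline = [
    Subtask("gather_receipts", lambda: receipts, lambda r: len(r) > 0),
    Subtask("sum_expenses", lambda: sum(receipts),
            lambda s: abs(s - sum(receipts)) < 1e-9),
]
results = run_pipeline(pipeline)
```

Each subtask here is also exactly the kind of short-horizon, easily verified unit you could train on directly, which is the point about TaxBot above.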

Almost all computer use tasks are like this? Humans also mostly don’t do things that can’t be verified for hours?

And the core building block issues of computer use seem mostly like very short time horizon tasks with very easy verification methods. If you can get lots of 9s on the button clicking and menu navigation and so on, I think you’re a lot of the way there.

The subtasks are also 99%+ things that come up relatively often, and that don’t present any non-trivial difficulties. A human accountant already will have to occasionally say ‘wait, I need you, the taxpayer, to tell me what the hell is up with this thing’ and we’re giving the AI in 2028 the ability to do this too.

I don’t see any fundamental difference between the difficulties being pointed out here, and the difficulties of tasks we have already solved.

  • We don’t have a large pretraining corpus of multimodal computer use data. I like this quote from Mechanize’s post on automating software engineering: “For the past decade of scaling, we’ve been spoiled by the enormous amount of internet data that was freely available for us to use. This was enough for cracking natural language processing, but not for getting models to become reliable, competent agents. Imagine trying to train GPT-4 on all the text data available in 1980—the data would be nowhere near enough, even if we had the necessary compute.”

    Again, I’m not at the labs. Maybe text only training already gives you a great prior on how different UIs work, and what the relationship between different components is. Maybe RL fine tuning is so sample efficient that you don’t need that much data. But I haven’t seen any public evidence which makes me think that these models have suddenly gotten less data hungry, especially in this domain where they’re substantially less practiced.

    Alternatively, maybe these models are such good front end coders that they can just generate millions of toy UIs for themselves to practice on. For my reaction to this, see bullet point below.

I’m not going to keep working for the big labs for free on this one by giving even more details on how I’d solve all this, but these totally seem like highly solvable problems, and also this seems like a case of the person saying it can’t be done interrupting the people doing it? It seems like progress is being made rapidly.

  • Even algorithmic innovations which seem quite simple in retrospect seem to take a long time to iron out. The RL procedure which DeepSeek explained in their R1 paper seems simple at a high level. And yet it took 2 years from the launch of GPT-4 to the release of o1.

  • Now of course I know it is hilariously arrogant to say that R1/o1 were easy – a ton of engineering, debugging, pruning of alternative ideas was required to arrive at this solution. But that’s precisely my point! Seeing how long it took to implement the idea, ‘Train the model to solve verifiable math and coding problems’, makes me think that we’re underestimating the difficulty of solving the much gnarlier problem of computer use, where you’re operating in a totally different modality with much less data.

I think two years is how long we had to have the idea of o1 and commit to it, then to implement it. Four months is roughly the actual time it took from ‘here is that sentence and we know it works’ to full implementation. Also we’re going to have massively more resources to pour into these questions this time around, and frankly I don’t think any of these insights are even as hard to find as o1, especially now that we have reasoning models to use as part of this process.

I think there are other potential roadblocks along the way, and once you factor all of those in you can’t be that much more optimistic, but I see this particular issue as not that likely to pose that much of a bottleneck for long.

His predictions are that he’d take 50/50 bets on: 2028 for an AI that can ‘just go do your taxes as well as a human accountant could’ and 2032 for ‘can learn details and preferences on the job as well as a human can.’ I’d be inclined to take the other side of both of those bets, assuming it means by EOY; for the 2032 one we’d need to flesh out details.

But if we have the ‘AI that does your taxes’ in 2028 then 2029 and 2030 look pretty weird, because this implies other things:

Daniel Kokotajlo: Great post! This is basically how I think about things as well. So why the difference in our timelines then?

–Well, actually, they aren’t that different. My median for the intelligence explosion is 2028 now (one year longer than it was when writing AI 2027), which means early 2028 or so for the superhuman coder milestone described in AI 2027, which I’d think roughly corresponds to the “can do taxes end-to-end” milestone you describe as happening by end of 2028 with 50% probability. Maybe that’s a little too rough; maybe it’s more like month-long horizons instead of week-long. But at the growth rates in horizon lengths that we are seeing and that I’m expecting, that’s less than a year…

–So basically it seems like our only serious disagreement is the continual/online learning thing, which you say 50% by 2032 on whereas I’m at 50% by end of 2028. Here, my argument is simple: I think that once you get to the superhuman coder milestone, the pace of algorithmic progress will accelerate, and then you’ll reach full AI R&D automation and it’ll accelerate further, etc. Basically I think that progress will be much faster than normal around that time, and so innovations like flexible online learning that feel intuitively like they might come in 2032 will instead come later that same year.

(For reference AI 2027 depicts a gradual transition from today to fully online learning, where the intermediate stages look something like “Every week, and then eventually every day, they stack on another fine-tuning run on additional data, including an increasingly high amount of on-the-job real world data.” A janky unprincipled solution in early 2027 that gives way to more elegant and effective things midway through the year.)

I found this an interestingly wrong thing to think:

Richard: Given the risk of fines and jail for filing your taxes wrong, and the cost of processing poor quality paperwork that the government will have to bear, it seems very unlikely that people will want AI to do taxes, and very unlikely that a government will allow AI to do taxes.

The rate of fully accurately filing your taxes is, for anyone whose taxes are complex, basically 0%. Everyone makes mistakes. When the AI gets this right almost every time, it’s already much better than a human accountant, and you’ll have a strong case that what happened was accidental, which means at worst you pay some modest penalties.

Personal story, I was paying accountants at a prestigious firm that will go unnamed to do my taxes, and they literally just forgot to include paying city tax at all. As in, I’m looking at the forms, and I ask, ‘wait why does it have $0 under city tax?’ and the guy essentially says ‘oh, whoops.’ So, yeah. Mistakes are made. This will be like self-driving cars, where we’ll impose vastly higher standards of accuracy and law abidance on the AIs, and they will meet them because the bar really is not that high.

There were also some good detailed reactions and counterarguments from others:

Near: finally some spicy takes around here.

Rohit: The question is whether we need humanlike labour for transformative economic outcomes, or whether we can find ways to use the labour it does provide with a different enough workflow that it adds substantial economic advantage.

Sriram Krishnan: Really good post from @dwarkesh_sp on continuous learning in LLMs.

Vitalik Buterin: I have high probability mass on longer timelines, but this particular issue feels like the sort of limitation that’s true until one day someone discovers a magic trick (think eg. RL on CoT) that suddenly makes it no longer true.

Sriram Krishnan: Agree – CoT is a particularly good example.

Ryan Greenblatt: I agree with much of this post. I also have roughly 2032 medians to things going crazy, I agree learning on the job is very useful, and I’m also skeptical we’d see massive white collar automation without further AI progress.

However, I think Dwarkesh is wrong to suggest that RL fine-tuning can’t be qualitatively similar to how humans learn.

In the post, he discusses AIs constructing verifiable RL environments for themselves based on human feedback and then argues this wouldn’t be flexible and powerful enough to work, but RL could be used more similarly to how humans learn.

My best guess is that the way humans learn on the job is mostly by noticing when something went well (or poorly) and then sample efficiently updating (with their brain doing something analogous to an RL update). In some cases, this is based on external feedback (e.g. from a coworker) and in some cases it’s based on self-verification: the person just looking at the outcome of their actions and then determining if it went well or poorly.

So, you could imagine RL’ing an AI based on both external feedback and self-verification like this. And, this would be a “deliberate, adaptive process” like human learning. Why would this currently work worse than human learning?

Current AIs are worse than humans at two things which makes RL (quantitatively) much worse for them:

1. Robust self-verification: the ability to correctly determine when you’ve done something well/poorly in a way which is robust to you optimizing against it.

2. Sample efficiency: how much you learn from each update (potentially leveraging stuff like determining what caused things to go well/poorly which humans certainly take advantage of). This is especially important if you have sparse external feedback.

But, these are more like quantitative than qualitative issues IMO. AIs (and RL methods) are improving at both of these.

All that said, I think it’s very plausible that the route to better continual learning routes more through building on in-context learning (perhaps through something like neuralese, though this would greatly increase misalignment risks…).

Some more quibbles:

– For the exact podcasting tasks Dwarkesh mentions, it really seems like simple fine-tuning mixed with a bit of RL would solve his problem. So, an automated training loop run by the AI could probably work here. This just isn’t deployed as an easy-to-use feature.

– For many (IMO most) useful tasks, AIs are limited by something other than “learning on the job”. At autonomous software engineering, they fail to match humans with 3 hours of time and they are typically limited by being bad agents or by being generally dumb/confused. To be clear, it seems totally plausible that for podcasting tasks Dwarkesh mentions, learning is the limiting factor.

– Correspondingly, I’d guess the reason that we don’t see people trying more complex RL based continual learning in normal deployments is that there is lower hanging fruit elsewhere and typically something else is the main blocker. I agree that if you had human level sample efficiency in learning this would immediately yield strong results (e.g., you’d have very superhuman AIs with 10^26 FLOP presumably), I’m just making a claim about more incremental progress.

– I think Dwarkesh uses the term “intelligence” somewhat atypically when he says “The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.” I think people often consider how fast someone learns on the job as one aspect of intelligence. I agree there is a difference between short feedback loop intelligence (e.g. IQ tests) and long feedback loop intelligence and they are quite correlated in humans (while AIs tend to be relatively worse at long feedback loop intelligence).

More thoughts/quibbles:

– Dwarkesh notes “An AI that is capable of online learning might functionally become a superintelligence quite rapidly, even if there’s no algorithmic progress after that point.” This seems reasonable, but it’s worth noting that if sample efficient learning is very compute expensive, then this might not happen so rapidly.

– I think AIs will likely overcome poor sample efficiency to achieve a very high level of performance using a bunch of tricks (e.g. constructing a bunch of RL environments, using a ton of compute to learn when feedback is scarce, learning from much more data than humans due to “learn once deploy many” style strategies). I think we’ll probably see fully automated AI R&D prior to matching top human sample efficiency at learning on the job. Notably, if you do match top human sample efficiency at learning (while still using a similar amount of compute to the human brain), then we already have enough compute for this to basically immediately result in vastly superhuman AIs (human lifetime compute is maybe 3e23 FLOP and we’ll soon be doing 1e27 FLOP training runs). So, either sample efficiency must be worse or at least it must not be possible to match human sample efficiency without spending more compute per data-point/trajectory/episode.

Matt Reardon: Dwarkesh commits the sin of thinking work you’re personally close to is harder-than-average to automate.

Herbie Bradley: I mean this is just correct? most researchers I know think continual learning is a big problem to be solved before AGI

Matt Reardon: My main gripe is that “<50%" [of jobs being something you can automate soon] should be more like "<15%"

Danielle Fong: Gell-Mann Amnesia for AI.

Reardon definitely confused me here, but either way I’d say that Dwarkesh Patel is a 99th percentile performer. He does things most other people can’t do. That’s probably going to be harder to automate than most other white collar work? The bulk of hours in white collar work are very much not bespoke things and don’t act to put state or memory into people in subtle ways?

Now that we’ve had a good detailed discussion and seen several perspectives, it’s time to address another discussion of related issues, because it is drawing attention from an unlikely source.

After previously amplifying Situational Awareness, Ivanka Trump is back in the Essay Meta with high praise for The Era of Experience, authored by David Silver and (oh no) Richard Sutton.

Situational Awareness was an excellent pick. I do not believe this essay was a good pick. I found it a very frustrating, unoriginal and unpersuasive paper to read. To the extent it is saying something new I don’t agree, but it’s not clear to what extent it is saying anything new. Unless you want to know about this paper exactly because Ivanka is hyping it, you should skip this section.

I think the paper effectively mainly says we’re going to do a lot more RL and we should stop trying to make the AIs mimic, resemble or be comprehensible to humans or trying to control their optimization targets?

Ivanka Trump: Perhaps the most important thing you can read about AI this year: “Welcome to the Era of Experience”

This excellent paper from two senior DeepMind researchers argues that AI is entering a new phase—the “Era of Experience”—which follows the prior phases of simulation-based learning and human data-driven AI (like LLMs).

The authors posit that future AI breakthroughs will stem from learning through direct interaction with the world, not from imitating human-generated data.

This is not a theory or distant future prediction. It’s a description of a paradigm shift already in motion.

Let me know what you think!

Glad you asked, Ivanka! Here’s what I think.

The essay starts off with a perspective we have heard before, usually without much of an argument behind it: That LLMs and other AIs trained only on ‘human data’ are ‘rapidly approaching a limit,’ we are running out of high-quality data, and thus to progress significantly farther AIs will need to move into ‘the era of experience,’ meaning learning continuously from their environments.

I agree that the standard ‘just feed it more data’ approach will run out of data with which to scale, but there are a variety of techniques already being used to get around this. We have lots of options.

The leading example the paper itself gives of this in the wild is AlphaProof, which ‘interacted with a formal proving system,’ which seems to me like a clear case of synthetic data working and verification being easier than generation, rather than ‘experience.’ If the argument is simply that RL systems will learn by having their outputs evaluated, that isn’t news.

They claim to have in mind something rather different from that, and with this One Weird Trick they assert Superintelligence Real Soon Now:

Our contention is that incredible new capabilities will arise once the full potential of experiential learning is harnessed. This era of experience will likely be characterised by agents and environments that, in addition to learning from vast quantities of experiential data, will break through the limitations of human-centric AI systems in several further dimensions:

• Agents will inhabit streams of experience, rather than short snippets of interaction.

• Their actions and observations will be richly grounded in the environment, rather than interacting via human dialogue alone.

• Their rewards will be grounded in their experience of the environment, rather than coming from human prejudgement.

• They will plan and/or reason about experience, rather than reasoning solely in human terms.

We believe that today’s technology, with appropriately chosen algorithms, already provides a sufficiently powerful foundation to achieve these breakthroughs. Furthermore, the pursuit of this agenda by the AI community will spur new innovations in these directions that rapidly progress AI towards truly superhuman agents.

I suppose if the high level takeaway is ‘superintelligence is likely coming reasonably soon with the right algorithms’ then there’s no real disagreement?

They then however discuss tool calls and computer use, which then seems like a retreat back into an ordinary RL paradigm? It’s also not clear to me what the authors mean by ‘human terms’ versus ‘plan and/or reason about experience,’ or even what ‘experience’ means here. They seem to be drawing a distinction without a difference.

If the distinction is simply (as the paper implies in places) that the agents will do self-evaluation rather than relying on human feedback, I have some important news about how existing systems already function? They use the human feedback and other methods to train an AI feedback system that does most of the work? And yes they often include ‘real world’ feedback systems in that? What are we even saying here?

They also seem to be drawing a distinction between the broke ‘human feedback’ and the bespoke ‘humans report physical world impacts’ (or ‘other systems measure real world impacts’) as if the first does not often encompass the second. I keep noticing I am confused what the authors are trying to say.

For reasoning, they say it is unlikely that human methods of reasoning and human language are optimal, and that more efficient methods of thought must exist. I mean, sure, but that’s also true for humans, and it’s obvious that you can use ‘human-style methods of thought’ to get to superintelligence simply by imagining a human plus particular AI advantages.

As many have pointed out (and as is central to AI 2027), encouraging AIs to use alien-looking, inhuman reasoning styles we cannot parse is likely a very bad idea even if it would be more effective: what visibility we have would be lost, and it likely leads to alien values and breaks many happy things. Then again, Richard Sutton is one of the authors of this paper, and he thinks we should welcome succession, as in the extinction of humanity, so he wouldn’t care.

They try to argue against this by saying that while agents pose safety risks and this approach may increase those safety risks, the approach may also have safety benefits. First, they say this allows the AI to adapt to its environment, as if an agent trained the ordinary way could not do this, or as if this should make us feel safer.

Second, they say ‘the reward function may itself be adapted through experience.’ In terms of risk that’s worse; you know that that’s worse, right? They literally say ‘rather than blindly optimizing a signal such as the number of paperclips it can adopt to indications of human concern,’ which shows a profound lack of understanding of, and curiosity about, where the whole misspecification of rewards problem comes from, and of the arguments about it from Yudkowsky (since they bring in the ‘paperclips’).

Adapting autonomously and automatically towards something like ‘level of human concern’ is exactly the kind of metric and strategy that is absolutely going to encourage perverse outcomes and get you killed at the limit. You don’t get out of the specification problem by saying you can specify something messier and let the system adapt around it autonomously, that only makes it worse, and in no way addresses the actual issue.

The final argument for safety is that relying on physical experience creates time limitations, which provides a ‘natural break.’ This is saying that capability limits imposed by physical interactions will keep things safer? Seriously?

There is almost nothing in the way of actual evidence or argument in the paper that is not fully standard, beyond a few intuition pumps. There are many deep misunderstandings, including fully backwards arguments, along the way. We may well want to rely a lot more on RL and on various different forms of ‘experiential’ data and continuous learning, but given how much worse it was than I expected this post updated me in the opposite direction of that which was clearly intended.

Bill Atkinson, architect of the Mac’s graphical soul, dies at 74

Using HyperCard, teachers created interactive lessons, artists built multimedia experiences, and businesses developed custom database applications—all without writing traditional code. The hypermedia environment also had a huge impact on gaming: the 1993 first-person adventure hit Myst originally used HyperCard as its game engine.

An example of graphical dithering, which allows 1-bit color (black and white only) to imitate grayscale. Credit: Benj Edwards / Apple

For the two-color Macintosh (which could only display black or white pixels, with no gradient in between), Atkinson developed an innovative high-contrast dithering algorithm that created the illusion of grayscale images with a characteristic stippled appearance that became synonymous with early Mac graphics. The dithered aesthetic remains popular today among some digital artists and indie game makers, with modern tools like this web converter that allows anyone to transform photos into the classic Atkinson dither style.
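
The technique is simple enough to sketch in a few lines. A minimal version of the Atkinson dither, as it is commonly described (an illustration, not Atkinson's original code), thresholds each pixel and then diffuses only six-eighths of the quantization error to six nearby pixels, deliberately discarding the rest; that loss is what boosts contrast and produces the stippled look:

```python
def atkinson_dither(pixels):
    """Dither a grayscale image (list of rows of 0-255 ints) to 1-bit.

    Atkinson's kernel pushes 1/8 of the quantization error to six
    neighbors (two to the right, three on the next row, one two rows
    down), deliberately discarding the remaining 2/8 of the error.
    """
    h, w = len(pixels), len(pixels[0])
    img = [row[:] for row in pixels]  # working copy accumulating error
    out = [[0] * w for _ in range(h)]
    neighbors = [(0, 1), (0, 2), (1, -1), (1, 0), (1, 1), (2, 0)]  # (dy, dx)
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = new
            err = (old - new) / 8
            for dy, dx in neighbors:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    img[ny][nx] += err
    return out

# A flat mid-gray patch dithers to a sparse pattern of pure black and white.
gray = [[128] * 8 for _ in range(8)]
result = atkinson_dither(gray)
assert all(v in (0, 255) for row in result for v in row)
```

Because some error is thrown away, flat midtones render lighter and sparser than under full error diffusion, which suited the Mac's small, sharp black-and-white display.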

Life after Apple

After leaving Apple in 1990, Atkinson co-founded General Magic with Marc Porat and Andy Hertzfeld, attempting to create personal communicators before smartphones existed. Wikipedia notes that in 2007, he joined Numenta, an AI startup, declaring their work on machine intelligence “more fundamentally important to society than the personal computer and the rise of the Internet.”

In his later years, Atkinson pursued nature photography with the same artistry he’d brought to programming. His 2004 book “Within the Stone” featured close-up images of polished rocks that revealed hidden worlds of color and pattern.

Atkinson announced his pancreatic cancer diagnosis in November 2024, writing on Facebook that he had “already led an amazing and wonderful life.” The same disease claimed his friend and collaborator Steve Jobs in 2011.

Given Atkinson’s deep contributions to Apple history, it’s not surprising that Jobs’ successor, Apple CEO Tim Cook, paid tribute to the Mac’s original graphics guru on X on Saturday. “We are deeply saddened by the passing of Bill Atkinson,” Cook wrote. “He was a true visionary whose creativity, heart, and groundbreaking work on the Mac will forever inspire us.”

Microsoft dives into the handheld gaming PC wars with the Asus ROG Xbox Ally

Back in March, we outlined six features we wanted to see on what was then just a rumored Xbox-branded, Windows-powered handheld gaming device. Today, Microsoft’s announcement of the Asus ROG Xbox Ally hardware line looks like it fulfills almost all of our wishes for Microsoft’s biggest foray into portable gaming yet.

The Windows-11-powered Xbox Ally devices promise access to “all of the games available on Windows,” including “games from Xbox, Game Pass, Battle.net, and other leading PC storefronts [read: Steam, Epic Games Store, Ubisoft Connect, etc].” But instead of having to install and boot up those games through the stock Windows interface, as you often do on handhelds like the original ROG Ally line, all these games will be available through what Microsoft is calling an “aggregated gaming library.”

Asus and Microsoft are stressing how that integrated experience can be used with games across multiple different Windows-based launchers, promising “access to games you can’t get elsewhere.” That could be seen as a subtle dig at SteamOS-powered devices like the Steam Deck, which can have significant trouble with certain titles that don’t play well with Steam and/or Linux for one reason or another. Microsoft also highlights how support apps like Discord and Twitch, as well as downloadable game mods, will be directly available via the Xbox Ally’s Windows backbone.

And while the Xbox Ally devices run Windows 11, they will boot to what Microsoft is calling the “Xbox Experience for Handheld,” a bespoke full-screen interface that hides the nitty-gritty of the Windows desktop by default. That gaming-focused interface will “minimize background activity and defer non-essential tasks,” meaning “more [and] higher framerates” for the games themselves, Microsoft says. A rhombus-shaped Xbox button located near the left stick will also launch an Xbox Game Bar overlay with quick access to functions like settings, performance metrics, and fast switching between titles. Microsoft also says it is working on a “Deck Verified”-style program for identifying Windows titles that “have been optimized for handhelds.”

Cambridge mapping project solves a medieval murder


“A tale of shakedowns, sex, and vengeance that expose[s] tensions between the church and England’s elite.”

Location of the murder of John Forde, taken from the Medieval Murder Maps. Credit: Medieval Murder Maps. University of Cambridge: Institute of Criminology

In 2019, we told you about a new interactive digital “murder map” of London compiled by University of Cambridge criminologist Manuel Eisner. Drawing on data catalogued in the city coroners’ rolls, the map showed the approximate location of 142 homicide cases in late medieval London. The Medieval Murder Maps project has since expanded to include maps of York and Oxford homicides, as well as podcast episodes focusing on individual cases.

It’s easy to lose oneself down the rabbit hole of medieval murder for hours, filtering the killings by year, choice of weapon, and location. Think of it as a kind of 14th-century version of Clue: It was the noblewoman’s hired assassins armed with daggers in the streets of Cheapside near St. Paul’s Cathedral. And that’s just the juiciest of the various cases described in a new paper published in the journal Criminal Law Forum.

The noblewoman was Ela Fitzpayne, wife of a knight named Sir Robert Fitzpayne, lord of Stogursey. The victim was a priest and her erstwhile lover, John Forde, who was stabbed to death in the streets of Cheapside on May 3, 1337. “We are looking at a murder commissioned by a leading figure of the English aristocracy,” said University of Cambridge criminologist Manuel Eisner, who heads the Medieval Murder Maps project. “It is planned and cold-blooded, with a family member and close associates carrying it out, all of which suggests a revenge motive.”

Members of the mapping project geocoded all the cases after determining approximate locations for the crime scenes. Written in Latin, the coroners’ rolls are records of sudden or suspicious deaths as investigated by a jury of local men, called together by the coroner to establish facts and reach a verdict. Those records contain such relevant information as where the body was found and by whom; the nature of the wounds; the jury’s verdict on cause of death; the weapon used and how much it was worth; the time, location, and witness accounts; whether the perpetrator was arrested, escaped, or sought sanctuary; and any legal measures taken.

A brazen killing

The murder of Forde was one of several premeditated revenge killings recorded in the area of Westcheap. Forde was walking on the street when another priest, Hascup Neville, caught up to him, ostensibly for a casual chat, just after Vespers but before sunset. As they approached Foster Lane, Neville’s four co-conspirators attacked: Ela Fitzpayne’s brother, Hugh Lovell; two of her former servants, Hugh of Colne and John Strong; and a man called John of Tindale. One of them cut Forde’s throat with a 12-inch dagger, while two others stabbed him in the stomach with long fighting knives.

At the inquest, the jury identified the assassins, but that didn’t result in justice. “Despite naming the killers and clear knowledge of the instigator, when it comes to pursuing the perpetrators, the jury turn a blind eye,” said Eisner. “A household of the highest nobility, and apparently no one knows where they are to bring them to trial. They claim Ela’s brother has no belongings to confiscate. All implausible. This was typical of the class-based justice of the day.”

Colne, the former servant, was eventually charged and imprisoned for the crime some five years later in 1342, but the other perpetrators essentially got away with it.

Eisner et al. uncovered additional historical records that shed more light on the complicated history and ensuing feud between the Fitzpaynes and Forde. One was an indictment in the Calendar of Patent Rolls of Edward III, detailing how Ela and her husband, along with Forde and several other accomplices, raided a Benedictine priory in 1321. Among other crimes, the intruders “broke [the prior’s] houses, chests and gates, took away a horse, a colt and a boar… felled his trees, dug in his quarry, and carried away the stone and trees.” The gang also stole 18 oxen, 30 pigs, and about 200 sheep and lambs.

There were also letters that the Archbishop of Canterbury wrote to the Bishop of Winchester. Translations of the letters are published for the first time on the project’s website. The archbishop called out Ela by name for her many sins, including adultery “with knights and others, single and married, and even with clerics and holy orders,” and devised a punishment. This included not wearing any gold, pearls, or precious stones and giving money to the poor and to monasteries, plus a dash of public humiliation. Ela was ordered to perform a “walk of shame”—a tamer version than Cersei’s walk in Game of Thrones—every fall for seven years, carrying a four-pound wax candle to the altar of Salisbury Cathedral.

The London Archives. Inquest number 15 on 1336-7 City of London Coroner’s Rolls. Credit: The London Archives

Ela outright refused to do any of that, instead flaunting “her usual insolence.” Naturally, the archbishop had no choice but to excommunicate her. But Eisner speculates that this may have festered within Ela over the ensuing years, thereby sparking her desire for vengeance on Forde—who may have confessed to his affair with Ela to avoid being prosecuted for the 1321 raid. The archbishop died in 1333, four years before Forde’s murder, so Ela was clearly a formidable person with the patience and discipline to serve her revenge dish cold. Her marriage to Robert (her second husband) endured despite her seemingly constant infidelity, and she inherited his property when he died in 1354.

“Attempts to publicly humiliate Ela Fitzpayne may have been part of a political game, as the church used morality to stamp its authority on the nobility, with John Forde caught between masters,” said Eisner. “Taken together, these records suggest a tale of shakedowns, sex, and vengeance that expose tensions between the church and England’s elites, culminating in a mafia-style assassination of a fallen man of god by a gang of medieval hitmen.”

I, for one, am here for the Netflix true crime documentary on Ela Fitzpayne, “a woman in 14th century England who raided priories, openly defied the Archbishop of Canterbury, and planned the assassination of a priest,” per Eisner.

The role of public spaces

The ultimate objective of the Medieval Murder Maps project is to learn more about how public spaces shaped urban violence historically, the authors said. There were some interesting initial revelations back in 2019. For instance, the murders usually occurred in public streets or squares, and Eisner identified a couple of “hot spots” with higher concentrations than other parts of London. One was that particular stretch of Cheapside running from St Mary-le-Bow church to St. Paul’s Cathedral, where John Forde met his grisly end. The other was a triangular area spanning Gracechurch, Lombard, and Cornhill, radiating out from Leadenhall Market.

The perpetrators were mostly men (in only four cases were women the only suspects). As for weapons, knives and swords of varying types were the ones most frequently used, accounting for 68 percent of all the murders. The greatest risk of violent death in London was on weekends (especially Sundays), between early evening and the first few hours after curfew.

Eisner et al. have now extended their spatial analysis to include homicides committed in York and Oxford in the 14th century with similar conclusions. Murders most often took place in markets, squares, and thoroughfares—all key nodes of medieval urban life—in the evenings or on weekends. Oxford had significantly higher murder rates than York or London and also more organized group violence, “suggestive of high levels of social disorganization and impunity.” London, meanwhile, showed distinct clusters of homicides, “which reflect differences in economic and social functions,” the authors wrote. “In all three cities, some homicides were committed in spaces of high visibility and symbolic significance.”

Criminal Law Forum, 2025. DOI: 10.1007/s10609-025-09512-7  (About DOIs).

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Estate of woman who died in 2021 heat dome sues Big Oil for wrongful death


At least 100 heat-related deaths in Washington state came during the unprecedented heat wave.

Everett Clayton looks at a digital thermometer on a nearby building that reads 116 degrees while walking to his apartment on June 27, 2021 in Vancouver, Washington. Credit: Nathan Howard/Getty Images

This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here.

The daughter of a woman who was killed by extreme heat during the 2021 Pacific Northwest heat dome has filed a first-of-its-kind lawsuit against major oil companies claiming they should be held responsible for her death.

The civil lawsuit, filed on May 29 in King County Superior Court in Seattle, is the first wrongful death case brought against Big Oil in the US in the context of climate change. It attempts to hold some of the world’s biggest fossil fuel companies liable for the death of Juliana Leon, who perished from overheating during the heat dome event, which scientists have determined would have been virtually impossible absent human-caused climate change.

“The extreme heat that killed Julie was directly linked to fossil fuel-driven alteration of the climate,” the lawsuit asserts. It argues that fossil fuel defendants concealed and misrepresented the climate change risks of their products and worked to delay a transition to cleaner energy alternatives. Furthermore, oil companies knew decades ago that their conduct would have dangerous and deadly consequences, the case alleges.

“Defendants have known for all of Julie’s life that their affirmative misrepresentations and omissions would claim lives,” the complaint claims. Leon’s daughter, Misti, filed the suit on behalf of her mother’s estate.

At 65, Juliana Leon was driving home from a medical appointment in Seattle on June 28, 2021, a day when the temperature peaked at 108° Fahrenheit (42.2° Celsius). She had the windows rolled down since the air conditioner in her car wasn’t working, but with the oven-like outdoor temperatures she quickly succumbed to the stifling heat. A passerby found her unresponsive in her car, which was pulled over on a residential street. Emergency responders were unable to revive her. The official cause of death was determined to be hyperthermia, or overheating.

There were at least 100 heat-related deaths in the state from June 26 to July 2, 2021, according to the Washington State Department of Health. That unprecedented stretch of scorching high temperatures was the deadliest weather-related event in Washington’s history. Climate change linked to the burning of fossil fuels intensified this extreme heat event, scientists say.

Misti Leon’s complaint argues that big oil companies “are responsible” for her mother’s climate change-related death. “Through their failure to warn, marketing, distribution, extraction, refinement, transport, and sale of fossil fuels, defendants each bear responsibility for the spike in atmospheric CO2 levels that have resulted in climate change, and thus the occurrence of a virtually impossible weather event and the extreme temperatures of the Heat Dome,” the suit alleges.

Defendants include ExxonMobil, BP, Chevron, Shell, ConocoPhillips, and Phillips 66. Phillips 66 declined to comment; the rest of the companies did not respond to requests for comment.

The plaintiff is represented by the Bechtold Law Firm, based in Missoula, Montana. The lawsuit brings state tort law claims of wrongful death, failure to warn, and public nuisance, and seeks relief in the form of damages as well as a public education campaign to “rectify defendants’ decades of misinformation.”

Major oil and gas companies are currently facing more than two dozen climate damages and deception cases brought by municipal, state, and tribal governments, including a case filed in 2023 by Multnomah County, Oregon, centered around the 2021 Pacific Northwest heat dome. The Leon case, however, is the first climate liability lawsuit filed by an individual against the fossil fuel industry.

“This is the first case that is directly making the connection between the misconduct and lies of big oil companies and a specific, personalized tragedy, the death of Julie Leon,” said Aaron Regunberg, accountability director for Public Citizen’s climate program.

“It puts a human face on it,” Pat Parenteau, emeritus professor of law at Vermont Law and Graduate School, told Inside Climate News.

Climate accountability advocates say the lawsuit could open up a new front for individuals suffering from climate change-related harms to pursue justice against corporate polluters who allegedly lied about the risks of their products.

“Big Oil companies have known for decades that their products would cause catastrophic climate disasters that would become more deadly and destructive if they didn’t change their business model. But instead of warning the public and taking steps to save lives, Big Oil lied and deliberately accelerated the problem,” Richard Wiles, president of the Center for Climate Integrity, said in a statement. “This latest case—the first filed on behalf of an individual climate victim—is another step toward accountability.”

“It’s a model for victims of climate disasters all across the country,” said Regunberg. “Anywhere there’s an extreme weather event with strong attribution science connecting it to climate change, families experiencing a tragedy can file a very similar case.”

Regunberg and several other legal experts have argued that Big Oil could face criminal prosecution for crimes such as homicide and reckless endangerment in the context of climate change, particularly given evidence of internal industry documents suggesting companies like Exxon knew that unabated fossil fuel use could result in “catastrophic” consequences and deaths. A 1996 presentation from an Exxon scientist, for example, outlines projected human health impacts stemming from climate change, including “suffering and death due to thermal extremes.”

The Leon case could “help lay the groundwork” for potential climate homicide cases, Regunberg said. “Wrongful death suits are important. They provide a private remedy to victims of wrongful conduct that causes a death. But we also think there’s a need for public justice, and that’s the role that criminal prosecution is supposed to have,” he told Inside Climate News.

The lawsuit is likely to face a long uphill battle in the courts. Other climate liability cases against these companies brought by government entities have been tied up in procedural skirmishes, some for years, and no case has yet made it to trial.

“In this case we have a grieving woman going up against some of the most powerful corporations in the world, and we’ve seen all the legal firepower they are bringing to bear on these cases,” Regunberg said.

But if the case does eventually make it to trial, it could be a game-changer. “That’s going to be a jury in King County, Washington, of people who probably experienced and remember the Pacific heat dome event, and maybe they know folks who were impacted. I think that’s going to be a compelling case that has a good chance of getting an outcome that provides some justice to this family,” Regunberg said.

Even if it doesn’t get that far, the lawsuit still “marks a significant development in climate liability,” according to Donald Braman, an associate professor of criminal law at Georgetown University and co-author of a paper explaining the case for prosecuting Big Oil for climate homicide.

“As climate attribution science advances, linking specific extreme weather events to anthropogenic climate change with greater confidence, the legal arguments for liability are strengthening. This lawsuit, being the first of its kind for wrongful death in this context, will be closely watched and could set important precedents, regardless of its ultimate outcome,” he said. “It reflects a growing societal demand for accountability for climate-related harms.”

Nintendo Switch 2’s faster chip can dramatically improve original Switch games

Link’s Awakening, Switch 1, docked. Credit: Andrew Cunningham

It’s pretty much the same story for Link’s Awakening. Fine detail is much more visible, and the 3D is less aliased-looking because the Switch 2 is running the game at a higher resolution. Even the fairly aggressive background blur the game uses looks toned down on the Switch 2.

Link’s Awakening on the Switch 1, docked.

Link’s Awakening on the Switch 2, docked.

The videos of these games aren’t quite as obviously impressive as the Pokémon ones, but they give you a sense of the higher resolution on the Switch 2 and the way that the Switch’s small endemic frame rate hiccups are no longer a problem.

Quiet updates

For the last two categories of games, we won’t be waxing as poetic about the graphical improvements because there aren’t many. In fact, some of these games we played looked ever-so-subtly worse on the Switch 2 in handheld mode, likely a side effect of a 720p handheld-mode image being upscaled to the Switch 2’s 1080p native resolution.
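
The arithmetic behind that softness is easy to check: 720 to 1080 lines is a non-integer 1.5× scale, so a nearest-neighbor scaler must show some source rows once and others twice, and any smoothing filter trades that unevenness for blur instead. A small sketch (my own illustration; the Switch 2's actual scaler is not documented):

```python
from collections import Counter

def nearest_neighbor_rows(src_height, dst_height):
    """Source row that feeds each destination row under nearest-neighbor."""
    return [d * src_height // dst_height for d in range(dst_height)]

# 720 -> 1080 is 1.5x: source rows repeat in an uneven 2,1,2,1,... pattern,
# so a perfectly uniform source cannot map to a uniform output.
counts = Counter(nearest_neighbor_rows(720, 1080))
assert set(counts.values()) == {1, 2}  # some rows doubled, some not

# Contrast an integer 2x scale (540 -> 1080): every source row exactly twice.
assert set(Counter(nearest_neighbor_rows(540, 1080)).values()) == {2}
```

Integer scale factors preserve pixel geometry; fractional ones cannot, which is why a 720p handheld image shown on a 1080p panel can read as subtly worse than the same image on a native 720p screen.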

That said, we still noticed minor graphical improvements. In Kirby Star Allies, for example, the 3D elements in the picture looked mostly the same, with roughly the same resolution, same textures, and similar overall frame rates. But 2D elements of the UI did still seem to be aware that the console is outputting a 4K image, and they were visibly sharper as a result.

Games without updates

If you were hoping that all games would get some kind of “free” resolution or frame rate boost from the Switch 2, that mostly doesn’t happen. Games like Kirby’s Return to Dream Land Deluxe and Pokémon Legends Arceus, neither of which got any kind of Switch 2-specific update, look mostly identical on both consoles. If you get right up close and do some pixel peeping, you can occasionally see places where outputting a 4K image instead of a 1080p image will look better on a 4K TV, but it’s nothing like what we saw in the other games we tested.

Pokémon Legends Arceus, Switch 1, docked.

Pokémon Legends Arceus, Switch 2, docked.

However, it does seem that the Switch 2 may help out somewhat in terms of performance consistency. Observe the footage of a character running around town in Pokémon Legends—the resolution, draw distance, and overall frame rate all look pretty much the same. But the minor frame rate dips and hitches you see on the Switch 1 seem to have been at least partially addressed on the Switch 2. Your mileage will vary, of course. But you may encounter cases where a game targeting a stable 30 fps on the Switch 1 will hit that 30 fps with a bit more consistency on the Switch 2.

“In 10 years, all bets are off”—Anthropic CEO opposes decadelong freeze on state AI laws

On Thursday, Anthropic CEO Dario Amodei argued against a proposed 10-year moratorium on state AI regulation in a New York Times opinion piece, calling the measure shortsighted and overbroad as Congress considers including it in President Trump’s tax policy bill. Anthropic makes Claude, an AI assistant similar to ChatGPT.

Amodei warned that AI is advancing too fast for such a long freeze, predicting these systems “could change the world, fundamentally, within two years; in 10 years, all bets are off.”

As we covered in May, the moratorium would prevent states from regulating AI for a decade. A bipartisan group of state attorneys general has opposed the measure, which would preempt AI laws and regulations recently passed in dozens of states.

In his op-ed piece, Amodei said the proposed moratorium aims to prevent inconsistent state laws that could burden companies or compromise America’s competitive position against China. “I am sympathetic to these concerns,” Amodei wrote. “But a 10-year moratorium is far too blunt an instrument. A.I. is advancing too head-spinningly fast.”

Instead of a blanket moratorium, Amodei proposed that the White House and Congress create a federal transparency standard requiring frontier AI developers to publicly disclose their testing policies and safety measures. Under this framework, companies working on the most capable AI models would need to publish on their websites how they test for various risks and what steps they take before release.

“Without a clear plan for a federal response, a moratorium would give us the worst of both worlds—no ability for states to act and no national policy as a backstop,” Amodei wrote.

Transparency as the middle ground

Amodei emphasized AI’s transformative potential throughout his op-ed, citing examples of pharmaceutical companies drafting clinical study reports in minutes instead of weeks and AI helping to diagnose medical conditions that might otherwise be missed. He wrote that AI “could accelerate economic growth to an extent not seen for a century, improving everyone’s quality of life,” a claim that some skeptics believe may be overhyped.

US science is being wrecked, and its leadership is fighting the last war


Facing an extreme budget, the National Academies hosted an event that ignored it.

WASHINGTON, DC—The general outline of the Trump administration’s proposed 2026 budget was released a few weeks back, and it included massive cuts for most agencies, including every one that funds scientific research. Late last week, those agencies began releasing details of what the cuts would mean for the actual projects and people they support. And the results are as bad as the initial budget had suggested: one-of-a-kind scientific experiment facilities and hardware retired, massive cuts in supported scientists, and entire areas of research halted.

And this comes in an environment where previously funded grants are being terminated, funding is being held up for ideological screening, and universities have been subjected to arbitrary funding freezes. Collectively, things are heading for damage to US science that will take decades to recover from. It’s a radical break from the trajectory science had been on.

That’s the environment that the US’s National Academies of Sciences found itself in yesterday while hosting the State of the Science event in Washington, DC. It was an obvious opportunity for the nation’s leading scientific organization to warn the country of the consequences of the path that the current administration has been traveling. Instead, the event largely ignored the present to worry about a future that may never exist.

The proposed cuts

The top-line budget numbers proposed earlier indicated things would be bad: nearly 40 percent taken off the National Institutes of Health’s budget, the National Science Foundation down by over half. But now, many of the details of what those cuts mean are becoming apparent.

NASA’s budget includes sharp cuts for planetary science, which would be cut in half and then stay flat for the rest of the decade, with the Mars Sample Return mission canceled. All other science budgets, including Earth Science and Astrophysics, take similar hits; one astronomer posted a graphic showing just how many present and future missions would be lost. Active missions that have returned unprecedented data, like Juno and New Horizons, would go, as would two Mars orbiters. As described by Science magazine’s news team, “The plans would also kill off nearly every major science mission the agency has not yet begun to build.”

A chart prepared by astronomer Laura Lopez showing just how many astrophysics missions will be cancelled. Credit: Laura Lopez

The National Science Foundation, which funds much of the US’s fundamental research, is also set for brutal cuts. Biology, engineering, and education will all be slashed by over 70 percent; computer science, math and physical science, and social and behavioral science will all see cuts of over 60 percent. International programs will take an 80 percent cut. The funding rate of grant proposals is expected to drop from 26 percent to just 7 percent, meaning the vast majority of grants submitted to the NSF will be a waste of time. The number of people involved in NSF-funded activities will drop from over 300,000 to just 90,000. Almost every program to broaden participation in science will be eliminated.

As for specifics, they’re equally grim. The fleet of research ships will essentially become someone else’s problem: “The FY 2026 Budget Request will enable partial support of some ships.” We’ve been able to better pin down the nature and location of gravitational wave events as detectors in Japan and Italy joined the original two LIGO detectors; the NSF will reverse that progress by shutting one of the LIGOs. The NSF’s contributions to detectors at the Large Hadron Collider will be cut by over half, and one of the two very large telescopes it was helping fund will be cancelled (say goodbye to the Thirty Meter Telescope). “Access to the telescopes at Kitt Peak and Cerro Tololo will be phased out,” and the NSF will transfer the facilities to other organizations.

The Department of Health and Human Services has been less detailed about the specific cuts its divisions will see, largely focusing on the overall numbers, which are down considerably. The NIH, which is facing a cut of over 40 percent, will be reorganized, with its 19 institutes pared down to just eight. This will result in some odd pairings, such as the dental and eye institutes ending up in the same place; genomics and biomedical imaging will likewise end up under the same roof. Other groups like the Centers for Disease Control and Prevention and the Food and Drug Administration will also face major cuts.

The problems extend well beyond the core science agencies. In the Department of Energy, funding for wind, solar, and renewable grid integration has been zeroed out, essentially ending all programs in this area. Hydrogen and fuel cells face a similar fate. Collectively, these programs had received over $600 million in 2024’s budget. Other areas of science at the DOE, such as high-energy physics, fusion, and biology, receive relatively minor cuts that are largely in line with the ones faced by administration priorities like fossil and nuclear energy.

Will this happen?

It goes without saying that this would amount to an abandonment of US scientific leadership at a time when most estimates of China’s research spending show it approaching US-like levels of support. Not only would it eliminate many key facilities, instruments, and institutions that have helped make the US a scientific powerhouse, but it would also block the development of newer and additional ones. The harms are so widespread that even topics that the administration claims are priorities would see severe cuts.

And the damage is likely to last for generations, as support is cut at every stage of the educational pipeline that prepares people for STEM careers. That includes careers in high-tech industries, which may end up relocating overseas due to a combination of staffing shortages and heightened immigration controls.

That said, we’ve been here before in the first Trump administration, when budgets were proposed with potentially catastrophic implications for US science. But Congress limited the damage and maintained reasonably consistent budgets for most agencies.

Can we expect that to happen again? So far, the signs are not especially promising. The House has largely adopted the Trump administration’s budget priorities, despite the fact that the budget they pass turns its back on decades of supposed concerns about deficit spending. While the Senate has yet to take up the budget, it has also been very pliant during the second Trump administration, approving grossly unqualified cabinet picks such as Robert F. Kennedy Jr.

All of which would seem to call for the leadership of US science organizations to press the case for the importance of science funding to the US and highlight the damage that these cuts would cause. But, if yesterday’s National Academies event is anything to judge by, the leadership is not especially interested.

Altered states

As the nation’s premier science organization, and one that performs lots of analyses for the government, the National Academies would seem to be in a position to have its concerns taken seriously by members of Congress. And, given that the present and future of science in the US is being set by policy choices, a meeting entitled the State of the Science would seem like the obvious place to address those concerns.

If so, it was not obvious to Marcia McNutt, the president of the NAS, who gave the presentation. She made some oblique references to current problems, saying, “We are embarking on a radical new experiment in what conditions promote science leadership, with the US being the treatment group, and China as the control,” and acknowledged that “uncertainties over the science budgets for next year, coupled with cancellations of billions of dollars of already hard-won research grants, is causing an exodus of researchers.”

But her primary focus was on the trends that have been operative in science funding and policy leading up to but excluding the second Trump administration. McNutt suggested this was needed to look beyond the next four years. However, that ignores the obvious fact that US science will be fundamentally different if the Trump administration can follow through on its plans and policies; the trends that have been present for the last two decades will be irrelevant.

She was also remarkably selective about her avoidance of discussing Trump administration priorities. After noting that faculty surveys have suggested they spend roughly 40 percent of their time handling regulatory requirements, she twice mentioned that the administration’s anti-regulatory stance could be a net positive here (once calling it “an opportunity to help”). Yet she neglected to note that many of the abandoned regulations represent a retreat from science-driven policy.

McNutt also acknowledged the problem of science losing the bipartisan support it has enjoyed, as trust in scientists among US conservatives has been on a downward trend. But she suggested it was scientists’ responsibility to fix the problem, even though it’s largely the product of one party deciding it can gain partisan advantage by raising doubts about scientific findings in fields like climate change and vaccine safety.

The panel discussion that came after largely followed McNutt’s lead in avoiding any mention of the current threats to science. The lone exception was Heather Wilson, president of the University of Texas at El Paso and a former Republican member of the House of Representatives and secretary of the Air Force during the first Trump administration. Wilson took direct aim at Trump’s cuts to funding for underrepresented groups, arguing, “Talent is evenly distributed, but opportunity is not.” After arguing that “the moral authority of science depends on the pursuit of truth,” she highlighted the cancellation of grants that had been used to study diseases that are more prevalent in some ethnic groups, saying “that’s not woke science—that’s genetics.”

Wilson was clearly the exception, however, as the rest of the panel largely avoided direct mention of either the damage already done to US science funding or the impending catastrophe on the horizon. We’ve asked the National Academies’ leadership a number of questions about how it perceives its role at a time when US science is clearly under threat. As of this article’s publication, however, we have not received a response.

At yesterday’s event, however, only one person showed a clear sense of what they thought that role should be—Wilson again, whose strongest words were directed at the National Academies themselves, which she said should “do what you’ve done since Lincoln was president,” and stand up for the truth.

John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.


An in-space propulsion company just raised a staggering amount of money

Starting small

The company’s initial product was the Mira spacecraft, powered by nitrous oxide and ethane thrusters. It can move payloads of up to 300 kg around in space; for a 100 kg payload, it offers 900 m/s of delta-v. With Mira, Impulse sought to tackle the problem of mobility once a spacecraft reaches orbit.
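Those figures pin down the vehicle's mass ratio via the Tsiolkovsky rocket equation. A minimal sketch, with the caveat that the specific impulse used below is an illustrative assumption, not a published Impulse Space figure:

```python
import math

G0 = 9.80665  # standard gravity, m/s^2


def delta_v(isp_s: float, wet_mass_kg: float, dry_mass_kg: float) -> float:
    """Tsiolkovsky rocket equation: dv = Isp * g0 * ln(m0 / mf)."""
    return isp_s * G0 * math.log(wet_mass_kg / dry_mass_kg)


# Illustrative only: assuming the thrusters deliver roughly 300 s of
# specific impulse, 900 m/s of delta-v implies a wet/dry mass ratio of
# about 1.36 -- i.e., propellant is roughly a quarter of launch mass.
ratio = math.exp(900 / (300 * G0))
print(round(ratio, 2))
```

The same function runs in reverse for mission planning: fix the mass ratio and solve for how much delta-v remains as propellant is spent.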

Mira proved a success almost immediately, with the first vehicle launching in 2023 and operating for a year in space, demonstrating ample mobility before finally depleting its propellant tanks. A second mission, LEO Express-2, launched in January with several hosted payloads and, so far, has met all of its objectives. The mission remains ongoing.

Initially, it was believed that this vehicle would be useful for providing “last mile” services for spacecraft launched as a part of rideshare missions.

“The reality is the market for that is not very good,” Romo said. “If you’re gonna size that market, it’s basically the market Rocket Lab serves today, which is 25 to 30 flights a year, which is fine. You can do that, but not economically very well. Your gross margins won’t be good. Your working capital kind of sucks. So that’s not at all the market that we’re after with Mira.”

Since Mira has had ample success during its first two flights, other customers have taken notice.

“It’s a high-thrust, high-maneuverability spacecraft that can operate anywhere up to GEO,” Romo said. “And so when you’re thinking about space defense and space control, they need rapid response. So we’ll move from one part of GEO to another very rapidly. And we can host payloads, like what Anduril makes, such as electronic warfare payloads, and then potentially doing proximity ops missions. So Mira wasn’t necessarily designed out of the gate for that, but what we found out after we flew it successfully was, the Space Force said, ‘Hey, we know what that thing’s for.'”


Polish engineer creates postage stamp-sized 1980s Atari computer

In 1979, Atari released the Atari 400 and 800, groundbreaking home computers that included custom graphics and sound chips, four joystick ports, and the ability to run the most advanced home video games of their era. These machines, which retailed for $549 and $999, respectively, represented a leap in consumer-friendly personal computing, with their modular design and serial I/O bus that presaged USB. Now, 46 years later, a hobbyist has shrunk down the system hardware to a size that would have seemed like science fiction in the 1970s.

Polish engineer Piotr “Osa” Ostapowicz recently unveiled “Atarino,” which may be the world’s smallest 8-bit Atari computer re-creation, according to retro computing site Atariteca. The entire system—processor, graphics chips, sound hardware, and memory controllers—fits on a module measuring just 2×1.5 centimeters (about 0.79×0.59 inches), which is roughly the size of a postage stamp.

Ostapowicz’s creation reimplements the classic Atari XL/XE architecture using modern FPGA (field-programmable gate array) technology. Unlike software emulators, which simulate old hardware on a complete computer system of another architecture (the approach behind modern recreations like the Atari 400 Mini console), Atarino reproduces the original Atari components faithfully at the logic level, allowing it to run vintage software while maintaining compatibility with original peripherals.
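The distinction matters: a software emulator is, at its core, an interpreter loop that fetches and dispatches opcodes one at a time, while an FPGA recreation implements the chips' logic directly in hardware. A deliberately tiny sketch of the software approach, handling just two real 6502 opcodes (a full emulator must cover all 151 documented ones):

```python
def run(memory: bytearray, steps: int) -> dict:
    """Toy fetch-decode-execute loop, the heart of any software emulator."""
    state = {"pc": 0, "a": 0}
    for _ in range(steps):
        op = memory[state["pc"]]
        if op == 0xA9:  # LDA #imm: load the next byte into the accumulator
            state["a"] = memory[state["pc"] + 1]
            state["pc"] += 2
        elif op == 0xEA:  # NOP: advance past the opcode, do nothing else
            state["pc"] += 1
        else:
            break  # unimplemented opcode; a real emulator handles them all
    return state


# LDA #$42 followed by NOP leaves 0x42 in the accumulator.
print(run(bytearray([0xA9, 0x42, 0xEA]), 2))
```

An FPGA design skips this interpretation step entirely: the gates themselves behave like the original silicon, which is why logic-level recreations can stay cycle-accurate to the original chips.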

The Atarino is only slightly larger than a Polish 1 Grosz coin. Credit: Piotr Ostapowicz

“The current project is not strictly a clone of Atari but basically, well, I’m forming a machine that is compatible with the Atari 8-bit computer itself, but it was created on the basis of the framework that I created some time ago,” Ostapowicz told Atari Online PL in a January 2024 YouTube interview.

An assortment of some of the Atari 8-bit computer systems released in the 1970s and ’80s. Credit: Atari

The project, which began over a decade ago and was first publicly demonstrated in December 2023, packs a 6502C processor, ANTIC and GTIA graphics chips, a POKEY sound chip, and memory controllers onto a single Lattice UP5K FPGA chip. Despite its tiny size, the system can run at clock speeds of up to 31 MHz, far faster than the original hardware’s 1.79 MHz.

Smaller, faster, and positioned for future projects

While Atarino maintains broad compatibility with classic Atari software, Ostapowicz says he has enhanced the original design in several ways. For example, the 6502 processor core follows the physical chip specifications but adds new instructions. The memory system uses independent channels rather than the original’s “cycle stealing” approach (where the graphics chip temporarily halts the CPU to access memory), improving performance.
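The performance difference is easy to see with a back-of-the-envelope model. The cycle counts below are made up for illustration (real ANTIC DMA overhead varies with the display mode), but they capture the arbitration trade-off:

```python
def cpu_cycles_available(total: int, video_fetches: int, shared_bus: bool) -> int:
    """Toy model of memory arbitration. On a shared bus (the original
    Atari's cycle-stealing scheme), every video DMA fetch halts the CPU
    for one cycle; with independent memory channels the CPU runs on
    uninterrupted."""
    return total - video_fetches if shared_bus else total


# Illustrative: if video DMA claims 30% of bus cycles, cycle stealing
# leaves the CPU with only 70% of its nominal throughput, while
# independent channels leave it untouched.
print(cpu_cycles_available(1000, 300, shared_bus=True))   # 700
print(cpu_cycles_available(1000, 300, shared_bus=False))  # 1000
```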


“Godfather” of AI calls out latest models for lying to users

One of the “godfathers” of artificial intelligence has attacked a multibillion-dollar race to develop the cutting-edge technology, saying the latest models are displaying dangerous characteristics such as lying to users.

Yoshua Bengio, a Canadian academic whose work has informed techniques used by top AI groups such as OpenAI and Google, said: “There’s unfortunately a very competitive race between the leading labs, which pushes them towards focusing on capability to make the AI more and more intelligent, but not necessarily put enough emphasis and investment on research on safety.”

The Turing Award winner issued his warning in an interview with the Financial Times, while launching a new non-profit called LawZero. He said the group would focus on building safer systems, vowing to “insulate our research from those commercial pressures.”

LawZero has so far raised nearly $30 million in philanthropic contributions from donors including Skype founding engineer Jaan Tallinn, former Google chief Eric Schmidt’s philanthropic initiative, Open Philanthropy, and the Future of Life Institute.

Many of Bengio’s funders subscribe to the “effective altruism” movement, whose supporters tend to focus on catastrophic risks surrounding AI models. Critics argue the movement highlights hypothetical scenarios while ignoring current harms, such as bias and inaccuracies.

Bengio said his not-for-profit group was founded in response to growing evidence over the past six months that today’s leading models were developing dangerous capabilities. This includes showing “evidence of deception, cheating, lying and self-preservation,” he said.

Anthropic’s Claude Opus model blackmailed engineers in a fictitious scenario where it was at risk of being replaced by another system. Research from AI testers Palisade last month showed that OpenAI’s o3 model refused explicit instructions to shut down.

Bengio said such incidents were “very scary, because we don’t want to create a competitor to human beings on this planet, especially if they’re smarter than us.”

The AI pioneer added: “Right now, these are controlled experiments [but] my concern is that any time in the future, the next version might be strategically intelligent enough to see us coming from far away and defeat us with deceptions that we don’t anticipate. So I think we’re playing with fire right now.”


“Free Roam” mode is Mario Kart World’s killer app

When we tried out Mario Kart World at April’s Switch 2 premiere hands-on event, the short demos focused on more-or-less standard races in the game’s Grand Prix and Knockout modes. So when Nintendo invited us back for more time previewing the near-final version of the game before the Switch 2’s release, we decided to focus most of our time on the game’s mysterious (and previously teased) “Free Roam” mode.

We’re glad we did, because the mode feels like the hidden gem of Mario Kart World and maybe of the Switch 2 launch as a whole. Combining elements of games like Diddy Kong Racing, Forza Horizon, and even the Tony Hawk’s Pro Skater series, Free Roam provides a unique mixture of racing challenges, exploration, and collectibles that should keep new Switch 2 owners busy for a while.

Switch hunt

Surprisingly, Free Roam mode isn’t actually listed as one of the main options when you launch a new game of Mario Kart World. Instead, a tiny note in the corner of the screen tells you to hit the plus button to get dropped into a completely untimed and free-wheeling version of the vast Mario Kart World map.


The real game takes place in the spaces between those race courses. Credit: Nintendo

Exploring in Free Roam mode really provides the best sense of scale for the game’s massive, multi-ecosystem island in a way individual races just can’t. Sure, other race modes sometimes let you travel between the individual race courses along pre-set paths from one finish line to another starting line. But Free Roam mode lets you fully explore the vast spaces between those paths, encouraging you to go off-roading in the mountains, valleys, rivers, oceans, volcanoes, snowdrifts, and landmarks that dot the countryside.

Your main explicit goal when exploring all this varied expanse is to look for large, blue P-Switches, each of which activates a short, timed challenge mission in the immediate vicinity. In many cases, simply reaching the P-Switch is half the challenge, requiring some inventive wall-riding or item use to get to a particularly out-of-the-way corner of the map.
