Author name: Rejus Almole

Engineer creates first custom motherboard for 1990s PlayStation console

The nsOne project joins a growing community of homebrew PlayStation 1 hardware developments. Other recent projects include Picostation, a Raspberry Pi Pico-based optical disc emulator (ODE) that allows PlayStation 1 consoles to load games from SD cards instead of physical discs. Other ODEs like MODE and PSIO have also become popular solutions for retrogaming collectors who play games on original hardware as optical drives age and fail.

From repair job to reverse-engineering project

To understand the classic console’s physical architecture, Brodesco physically sanded down an original motherboard to expose its internal layers, then cross-referenced the exposed traces with component datasheets and service manuals.

“I realized that detailed documentation on the original motherboard was either incomplete or entirely unavailable,” Brodesco explained in his Kickstarter campaign. This discovery launched what would become a comprehensive documentation effort, including tracing every connection on the board and creating multi-layer graphic representations of the circuitry.

A photo of the nsOne PlayStation motherboard. Credit: Lorentio Brodesco

Using optical scanning and manual net-by-net reverse-engineering, Brodesco recreated the PlayStation 1’s schematic in modern PCB design software. This process involved creating component symbols with accurate pin mappings and identifying—or in some cases creating—the correct footprints for each proprietary component that Sony had never publicly documented.

Brodesco also identified what he calls the “minimum architecture” required to boot the console without BIOS modifications, streamlining the design process while maintaining full compatibility.

The mock-up board shown in photos validates the footprints of chips and connectors, all redrawn from scratch. According to Brodesco, a fully routed version with complete multilayer routing and final layout is already in development.

A photo of the nsOne PlayStation motherboard. Credit: Lorentio Brodesco

As Brodesco noted on Kickstarter, his project’s goal is to “create comprehensive documentation, design files, and production-ready blueprints for manufacturing fully functional motherboards.”

Beyond repairs, the documentation and design files Brodesco is creating would preserve the PlayStation 1’s hardware architecture for future generations: “It’s a tribute to the PS1, to retro hardware, and to the belief that one person really can build the impossible.”

Hollywood studios target AI image generator in copyright lawsuit

The legal action follows similar moves in other creative industries, with more than a dozen major news companies suing AI company Cohere in February over copyright concerns. In 2023, a group of visual artists sued Midjourney for similar reasons.

Studios claim Midjourney knows what it’s doing

Beyond allowing users to create these images, the studios argue that Midjourney actively promotes copyright infringement by displaying user-generated content featuring copyrighted characters in its “Explore” section. The complaint states this curation “show[s] that Midjourney knows that its platform regularly reproduces Plaintiffs’ Copyrighted Works.”

The studios also allege that Midjourney has technical protection measures available that could prevent outputs featuring copyrighted material but has “affirmatively chosen not to use copyright protection measures to limit the infringement.” They cite Midjourney CEO David Holz admitting the company “pulls off all the data it can, all the text it can, all the images it can” for training purposes.

According to Axios, Disney and NBCUniversal attempted to address the issue with Midjourney before filing suit. While the studios say other AI platforms agreed to implement measures to stop IP theft, Midjourney “continued to release new versions of its Image Service” with what Holz allegedly described as “even higher quality infringing images.”

“We are bringing this action today to protect the hard work of all the artists whose work entertains and inspires us and the significant investment we make in our content,” said Kim Harris, NBCUniversal’s executive vice president and general counsel, in a statement.

This lawsuit signals a new front in Hollywood’s conflict over AI. Axios highlights this shift: While actors and writers have fought to protect their name, image, and likeness from studio exploitation, now the studios are taking on tech companies over intellectual property concerns. Other major studios, including Amazon, Netflix, Paramount Pictures, Sony, and Warner Bros., have not yet joined the lawsuit, though they share membership with Disney and Universal in the Motion Picture Association.

The Dream of a Gentle Singularity

Sam Altman offers us a new essay, The Gentle Singularity. It’s short (if a little long to quote in full), so given you read my posts it’s probably worth reading the whole thing.

First off, thank you to Altman for publishing this and sharing his thoughts. This was helpful, and contained much that was good. It’s important to say that first, before I start tearing into various passages, and pointing out the ways in which this is trying to convince us that everything is going to be fine when very clearly the default is for everything to be not fine.

I have now done that. So here we go.

Sam Altman (CEO OpenAI): We are past the event horizon; the takeoff has started. Humanity is close to building digital superintelligence, and at least so far it’s much less weird than it seems like it should be.

Robots are not yet walking the streets, nor are most of us talking to AI all day. People still die of disease, we still can’t easily go to space, and there is a lot about the universe we don’t understand.

And yet, we have recently built systems that are smarter than people in many ways, and are able to significantly amplify the output of people using them. The least-likely part of the work is behind us; the scientific insights that got us to systems like GPT-4 and o3 were hard-won, but will take us very far.

Assuming we agree that the takeoff has started, I would call that the ‘calm before the storm,’ or perhaps ‘how exponentials work.’

Being close to building something is not going to make the world look weird. What makes the world look weird is actually building it. Some people (like Tyler Cowen) claim o3 is AGI, but everyone agrees we don’t have ASI (superintelligence) yet.

Also, frankly, yeah, it’s super weird that we have these LLMs we can talk to, it’s just that you get used to ‘weird’ things remarkably quickly. It seems like it ‘should be weird’ (or perhaps ‘weirder’?) because what we do have now is still unevenly distributed and not well-exploited, and many of us including Altman are comparing the current level of weirdness to the near future True High Weirdness that is coming, much of which is already baked in.

If anything, I think the current low level of High Weirdness is due to us, as I argue later, not being used to these new capabilities. Why do we see so few scams, spam and slop and bots and astroturfing and disinformation, deepfakes, cybercrime, giant boosts in productivity, talking mainly to AIs all day, actual learning and so on? Mostly I think it’s because People Don’t Do Things and don’t know what is possible.

Sam Altman: 2025 has seen the arrival of agents that can do real cognitive work; writing computer code will never be the same. 2026 will likely see the arrival of systems that can figure out novel insights. 2027 may see the arrival of robots that can do tasks in the real world.

That’s a bold prediction, modulo the ‘may’ and the values of ‘tasks in the real world’ and ‘novel insights.’

And yes, I agree that the following is true, as long as you notice the word ‘may’:

In the most important ways, the 2030s may not be wildly different. People will still love their families, express their creativity, play games, and swim in lakes.

Note not only the ‘may’ but the low bar for ‘not be wildly different.’ The people of 1000 BCE did all those things, plausibly they also did them in 10,000 BCE or 100,000 BCE. Is that what would count as ‘not wildly different’?

This is essentially asserting that people in the 2030s will be alive. Well, I hope so!

Already we live with incredible digital intelligence, and after some initial shock, most of us are pretty used to it.

I get why one would say this, but it seems very wrong? First of all, who is this ‘us’ of which you speak? If the ‘us’ refers to the people of Earth or of the United States, then the statement to me seems clearly false. If it refers to Altman’s readers, then the claim is at least plausible. But I still think it is false. I’m not used to o3-pro. Even I haven’t found the time to properly figure out what I can fully do with even o3 or Opus without building tools.

We are ‘used to this’ in the sense that we are finding ways to mostly ignore it because life is, as Agnes Callard says, coming at us 15 minutes at a time, and we are busy, so we take some low-hanging fruit and then take it for granted, and don’t notice how much is left to pick. We tell ourselves we are used to it so we can go about our day.

This is how the singularity goes: wonders become routine, and then table stakes.

We already hear from scientists that they are two or three times more productive than they were before AI.

I note that Robin Hanson responded to this here:

Robin Hanson: “We already hear from scientists that they are two or three times more productive than they were before AI.”

If so, wouldn’t their wages now be 2-3x larger too?

Abram: No, why would they be?

Robin Hanson: Supply and demand.

o3 pro, you want to take this one? Oh right, nominal and contractual stickiness, institutional price controls, surplus division, supply response, measurement mismatch, time reallocation, you get maybe a 10% pay bump, wages track bargained marginal revenue not raw technical output, and both lag the technical shock by years.

I did, however, notice my economics being several times more productive there.

From here on, the tools we have already built will help us find further scientific insights and aid us in creating better AI systems. Of course this isn’t the same thing as an AI system completely autonomously updating its own code, but nevertheless this is a larval version of recursive self-improvement.

There are other self-reinforcing loops at play. The economic value creation has started a flywheel of compounding infrastructure buildout to run these increasingly-powerful AI systems. And robots that can build other robots (and in some sense, datacenters that can build other datacenters) aren’t that far off.

If we have to make the first million humanoid robots the old-fashioned way, but then they can operate the entire supply chain—digging and refining minerals, driving trucks, running factories, etc.—to build more robots, which can build more chip fabrication facilities, data centers, etc, then the rate of progress will obviously be quite different.

Yep. Then table stakes, recursive self-improvement, self-perpetuating growth, a robot-based parallel physical production economy. His timeline seems to be AI 2028.

Then true superintelligence, then it keeps going after that, then what?

The rate of technological progress will keep accelerating, and it will continue to be the case that people are capable of adapting to almost anything.

The rate of new wonders being achieved will be immense. It’s hard to even imagine today what we will have discovered by 2035.

It’s important to notice that this ‘adapt to anything’ is true in some ways and not in others. There are some things that are like decapitations, in that you very much cannot adapt because they kill you, dead. Or that deny you the necessary resources to survive, or to compete. You can’t ‘adapt’ to compete with someone or something sufficiently more capable than you.

We probably won’t adopt a new social contract all at once, but when we look back in a few decades, the gradual changes will have amounted to something big.

If history is any guide, we will figure out new things to do and new things to want, and assimilate new tools quickly (job change after the industrial revolution is a good recent example).

I sigh every time I see this ‘well in the past we’ve adapted and there were more things to do so in the future when we make superintelligent things universally better at everything no reason this shouldn’t still be true, we just need some time.’ Um, no? Or at least, definitely not by default?

I seriously don’t understand how you can expect robots by 2028 and wonders beyond the imagination along with superintelligence by 2035 and think mostly humans will do the things we usually do only with more capabilities at our disposal, or something? It’s like there’s some sort of semantic stop sign to not think about the obvious implications? Is there an actual model of what this world looks like?

A subsistence farmer from a thousand years ago would look at what many of us do and say we have fake jobs, and think that we are just playing games to entertain ourselves since we have plenty of food and unimaginable luxuries. I hope we will look at the jobs a thousand years in the future and think they are very fake jobs, and I have no doubt they will feel incredibly important and satisfying to the people doing them.

There are two halves to this.

The first half is, would the subsistence farmer think the jobs were fake? For some jobs yes, but once you explained what was going on and they got over future shock, I don’t think their breakdown of real versus fake would be that different from that of a farmer today. They might think a lot of them are not ‘necessary,’ that they were products of great luxury, but that would not be different than how they thought about the jobs of those at their king’s court.

I too hope that I look a thousand years in the future and I see people at all, who are actually alive, doing things at all. I hope they move beyond thinking of them quite as ‘jobs’ but I will happily take jobs. This time is different, however. Before, humans built tools and grew more capable through those tools, opening up our ability to do more things.

The thing Altman is describing is very obviously, as I keep saying, not a mere tool. Humans will no longer be the strongest optimizers, or the smartest minds, or the most capable agents. Anything we can do, AI can do better, except insofar as the thing doesn’t count unless a human does it. Otherwise, an AI does the new job too.

Altman talks about some people deciding to ‘plug in’ to machine-human interfaces while others choose not to. Won’t this be like deciding not to have a phone or not use computers, only vastly more so, and also the computer and phone are applying for your job? Then again, if all the jobs that involve the AI are done better by the AI alone anyway, including manual labor via robots, perhaps you don’t lose that much by not plugging in?

And indeed, if there are jobs that ‘require you be a human’ it might also require that you not be plugged in.

Think about chess. First humans beat AIs. Then AIs beat humans, but for a brief period AI and humans working together, essentially these ‘merges,’ still beat AIs. Then the humans no longer added anything. We’re going through the same process in a lot of places, like diagnostic reasoning, where the doctor is arguably already a net negative when they don’t accept the AI’s opinion.

Now, humans use the AIs to train, perhaps, but they don’t ‘merge’ or ‘plug in’ because if they did that then the AIs would be playing chess. We want two humans to play chess, so they need to be fully unplugged, or the exercise loses its meaning.

So, again, seriously, ‘merge’? Why do people think this is a thing?

Looking forward, this sounds hard to wrap our heads around. But probably living through it will feel impressive but manageable. From a relativistic perspective, the singularity happens bit by bit, and the merge happens slowly.

I have no idea where this expectation is coming from, other than that the people won’t have a say in it. The singularity will, as Douglas Adams wrote about deadlines, give a whoosh as it flies by. That will be that. There’s nothing to manage, your services are no longer required.

Jaime Sevilla notes that Altman’s estimate here, that the average ChatGPT query uses about 0.34 watt-hours (about what an oven would use in a little over one second) and roughly one fifteenth of a teaspoon of water, is similar to Epoch’s estimate of 0.3 watt-hours, which was a 90% reduction over previous estimates. Compute efficiency is improving rapidly. Also note o3’s 80% price drop.
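The arithmetic behind those comparisons checks out, under some assumptions not stated in the essay (a typical electric oven drawing around 1.2 kW, and a US teaspoon of about 4.93 mL):

```python
# Sanity-check the per-query energy and water figures.
# Assumed (not from the essay): oven draws ~1.2 kW; US teaspoon ~4.93 mL.

query_wh = 0.34                   # estimated watt-hours per ChatGPT query
query_joules = query_wh * 3600    # 1 Wh = 3600 J -> 1224 J

oven_watts = 1200                 # assumed oven power draw
oven_seconds = query_joules / oven_watts

teaspoon_ml = 4.93                # assumed US teaspoon volume
water_ml = teaspoon_ml / 15       # "one fifteenth of a teaspoon"

print(f"{query_joules:.0f} J = about {oven_seconds:.2f} s of oven time")
print(f"water per query = about {water_ml:.2f} mL")
```

With those assumptions, one query is roughly 1,224 joules, or just over one second of oven time, and about a third of a milliliter of water, matching the essay's framing.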

Like Dario Amodei’s Machines of Loving Grace, the latest Altman essay spends the bulk of its time hand waving away all the important concerns about such futures, both in terms of getting there and where the there is we even want to get. It’s basically wish fulfillment. There’s some value in pointing out that such worlds will, to the extent we can direct those worlds and their AIs to do things, be very good at wish fulfillment. There are some people who need to hear that. But that’s not the hard part.

Finally, we get to the ‘serious challenges to confront’ section.

There are serious challenges to confront along with the huge upsides. We do need to solve the safety issues, technically and societally, but then it’s critically important to widely distribute access to superintelligence given the economic implications. The best path forward might be something like:

  1. Solve the alignment problem, meaning that we can robustly guarantee that we get AI systems to learn and act towards what we collectively really want over the long-term (social media feeds are an example of misaligned AI; the algorithms that power those are incredible at getting you to keep scrolling and clearly understand your short-term preferences, but they do so by exploiting something in your brain that overrides your long-term preference).

  2. Then focus on making superintelligence cheap, widely available, and not too concentrated with any person, company, or country. Society is resilient, creative, and adapts quickly. If we can harness the collective will and wisdom of people, then although we’ll make plenty of mistakes and some things will go really wrong, we will learn and adapt quickly and be able to use this technology to get maximum upside and minimal downside. Giving users a lot of freedom, within broad bounds society has to decide on, seems very important. The sooner the world can start a conversation about what these broad bounds are and how we define collective alignment, the better.

That’s it. Allow me to summarize this plan:

  1. Solve the alignment problem so AIs learn and act towards what we collectively really want over the long term.

  2. Make superintelligence cheap, widely available, and not too concentrated.

  3. We’ll adapt, muddle through, figure it out, user freedom, it’s all good, society is resilient and adapts quickly.

I’m sorry, but that’s contradictory, doesn’t address the hard questions, isn’t an answer.

Instead it passes the buck to ‘society’ to discuss and answer the questions, when for various overdetermined reasons we do not seem capable of having these conversations in any serious fashion – and indeed, the moment Altman comes up against possibly joining such a conversation, he punts, and says ‘you first.’ Indeed, he seems to acknowledge this, and to want to wait until after the singularity to figure out how to deal with the effects of the singularity – he says ‘the sooner the better’ but the plan is clearly not to get to this all that soon.

This all assumes facts not in evidence and likely false in context. Why should we expect society to be resilient in this situation? What even is ‘society’ when the smartest minds, most capable agents, are AIs? How do users have this freedom, how is the intelligence widely available, if the agents all are going to act towards ‘what we collectively really want over the long term?’ and how do you reconcile these conflicting goals? Who decides what all of that means?

If the power is diffused how do we avoid inevitable gradual disempowerment? What ‘broad bounds’ are we going to ‘decide upon’ and how are we deciding? How would certain people feel about the call for diffusion of power outside of ‘any one country’ and how does this square with all the ‘America must win the race’ talk? Either the power diffusion is meaningful or it isn’t. Given history, why would you expect there to be such a voluntary diffusion of power?

That’s in addition to not addressing the technical aspect of any of this, at all. Yes, good, ‘solve the alignment problem,’ how the heck do you propose we do that? For any definition of that plan, especially if this has to survive wide distribution?

I get that none of that is ‘the point’ of The Gentle Singularity. But the right answers to those questions are the only way such a singularity can stay gentle, or end well. It’s not an optional conversation.

Eli Lifland: I’m most concerned about not being able to align superintelligence with any goal, rather than only being able to align them with short-term goals.

I’m concerned about this characterization of the alignment problem [the #1 above].

Appreciate him publishing this.

Jeffrey Ladish: Huh it seems right to me. Being able to align them to a long term goal is a subset of being able to align them to any goal. I think Sam is wildly optimistic but the target doesn’t seem wrong

Eli Lifland: I think with the example it gives the wrong impression.

Social media feeds are in many ways a highly helpful example and intuition pump here, as it illustrates what ‘aligned’ means here. Those feeds are clearly aligned to the companies in question. There’s an additional question of whether such actions are indeed in the best interests of the companies, but for this purpose I think we should accept that they likely are. Thus, alignment here means aligned to the user’s longer term best interests, and how long term and how paternalistic that should be are left as open questions.

The place this is potentially misleading is, if we did get a feed here that was aligned to the user’s ‘true’ preferences in some sense, then is it aligned? What if that was against what we collectively ‘really want,’ let’s say because it encourages too much social media use by being too good, or it doesn’t push you enough towards making new friends? And that’s only a microcosm. Not doing the social media misalignment thing is relatively easy – we all know how to align the algorithm to the user far better, and we all know why it isn’t done. The general case job here is vastly harder.

Matt Yglesias wishes Sam Altman and others would tell us which policy ideas they think we should entertain, since he mentions that a much richer world could entertain new ideas.

My honest-Altman response is two-fold.

  1. We’re not richer yet, so we can’t yet entertain them. There’s a reason Altman says we won’t adopt the new social contract all at once. So it would be unwise to tell the world what they are. I think this is actually a strong argument. There are many aspects of the future that will require sacred value tradeoffs; if you take any side of one of those you open yourself up to attack, and if you do so before it is clear the tradeoff is forced it is vastly worse. There’s no winning doing this.

  2. If we do get into this richer position with the ability to meaningfully enact policy, if we are all still alive and in control over the future, then this is the part where we can adapt and muddle through and fix things in post. We can have that discussion later (and have superintelligent help). There’s no need to get distracted by this.

An obvious objection is, what makes us think we can use, demand or enforce social contracts in such a future? The foundations of social contract theory don’t hold in a world with superintelligence. I think, contra many science fiction writers, that a future very rich world will choose to treat its less fortunate rather well even if nothing is forcing the elite to do so, but also that nothing will be forcing the elite to do so.

Finally, like Rob Wiblin I notice I am confused by this closing:

Sam Altman: May we scale smoothly, exponentially and uneventfully through superintelligence.

It is hard not to interpret this, and many aspects of the essay, as essentially saying ‘don’t worry, nothing to see here, we got this, wonders beyond your imagination with no downsides that can’t be easily fixed, so don’t regulate me. Just go gently into that good night, and everything will be fine.’

Daniel Faggella: the singularity is going to hit so hard it’ll rip the skin off your fucking bones tbh

It’ll fucking shred not only hominid-ness, but probably almost any semblance of hominid values or works.

It’ll be a million things at once, or a trillion. it sure af won’t be gentle lol

Can’t blame sam for saying what he’s saying, the incentives make it hard to say otherwise.

I don’t understand why people respond so often with the common counterpoint of ‘well the singularity hasn’t happened yet, so the idea that it will hit you hard when it does come hasn’t been borne out.’ That doesn’t bear on the question at all.

Max Kesin sums up the appropriate response:

All-wheel drive EVs at 210 mph? Formula E’s next car gets massive upgrade.

The governing body for world motorsport met in Macau yesterday. Among the jobs for the Fédération Internationale de l’Automobile was to sign off on various calendars for next season, which is why there’s now a clash between the F1 Monaco Grand Prix and the 24 Hours of Le Mans and also between the Indy 500 and F1’s annual visit to Canada. The Formula E calendar was also announced, although with a pair of blank TBCs in the middle, I’ll hold off calling it finalized.

The US round will now take place in late January, and it’s moving venues yet again. No longer will you need to drive an hour south of Miami; instead, the northern outskirts of the city will suffice. The infield at Homestead is no more, and the sport has negotiated a race at the Hard Rock Stadium, albeit on a different layout than the one used by F1. It seems that Formula E’s recent “Evo Sessions” race between influencers, which was held at the stadium, proved convincing.

The really interesting Formula E news from Macau won’t take effect until the 2026–2027 season, and that’s the arrival of the Gen4 car.

The current machine is no slouch, not since they took some constraints off the Gen3 car this season. The addition of part-time all-wheel drive has improved what was already a very racey series, but for now, it’s only available for the final part of qualifying, the start of the race, and when using the mandatory Attack Mode that has added some interesting new strategy to the sport.

New tires, more aero, and way more power

From the start of the 2026–2027 season, all-wheel drive will finally be permanent for the single-seater EVs. It is long past time, given that virtually every high-performance EV on the road powers both its axles, and it marks the first time the FIA has approved a permanent AWD single-seater since the technology was outlawed from F1 decades ago.

Trade war truce between US and China is back on

Both countries agreed in Geneva last month to slash their respective tariffs by 115 percentage points, with a 90-day window to resolve the trade war.

But the ceasefire came under pressure after Washington accused Beijing of reneging on an agreement to speed up the export of rare earths, while China criticized new US export controls.

This week’s talks to resolve the impasse were held in the historic Lancaster House mansion in central London, a short walk from Buckingham Palace, which was provided by the British government as a neutral ground for the talks.

Over the two days, the US team, which included Treasury Secretary Scott Bessent, Commerce Secretary Howard Lutnick, and US trade representative Jamieson Greer, met with the Chinese delegation, which was led by He Lifeng, a vice-premier responsible for the economy.

The negotiations were launched to ensure Chinese exports of rare earths to the US and American technology export controls on China did not derail broader talks between the sides.

Ahead of the first round of talks in Geneva, Bessent had warned that the high level of mutual tariffs had amounted to an effective embargo on bilateral trade.

Chinese exports to the US fell more steeply in May compared with a year earlier than at any point since the pandemic in 2020.

The US had said China was not honoring its pledge in Geneva to ease restrictions on rare earths exports, which are critical to the defense, car, and tech industries, and was dragging its feet over approving licenses for shipments, affecting manufacturing supply chains in the US and Europe.

Beijing has accused the US of “seriously violating” the Geneva agreement after it announced new restrictions on sales of chip design software to Chinese companies.

It has also objected to the US issuing new warnings on the global use of Huawei chips and canceling visas for Chinese students.

Separately, a US federal appeals court on Tuesday allowed some of Trump’s broadest tariffs to remain in place while it reviews a lower-court ruling that had blocked his “liberation day” levies on US trading partners.

The ruling extended an earlier temporary reprieve and will allow Trump to enact the measures as well as separate levies targeting Mexico, Canada, and China. The president has, however, already paused the wider “reciprocal” tariffs for 90 days.

The new version of Audi’s best-selling Q5 SUV arrives in the US


The driving dynamics are improved, and there’s plenty of tech to play with.

This is the third-generation Audi Q5. Credit: Jonathan Gitlin


ASPEN, Colo.—There’s a lot riding on Audi’s next Q5. The model has been Audi’s bread and butter here since it went on sale in the US in 2009, as tastes changed and sedans fell out of favor. The third-generation Q5 is built on an all-new platform and is one of a new generation of software-defined vehicles meant to ditch a lot of legacy crud for a clean-sheet approach. You would have known all of that from our look at the new Q5 in a studio last year, when Audi briefed us on its new platform. What you wouldn’t have known from that piece is how it drives, particularly on US roads. The answer is: surprisingly well.

PPC

Just a few years ago, the world’s big car brands were telling us that soon everyone would be driving electric cars, and that it would be wonderful. Things haven’t quite panned out the way people thought they might when prognosticating in 2018, though. Electric powertrains have yet to reach price parity, in many places infrastructure still lags, and so automakers are developing new combustion-powered vehicles, particularly for markets like the US, where adoption remains far behind Europe or China.

For Audi and the other premium brands within the Volkswagen Group empire, that’s a new platform called PPC, or Premium Platform Combustion. PPC will provide the bones for new vehicles in a range of sizes and shapes, the same way the MLB (and MLB Evo) platforms have done until now. In a week, you can read about the A5, for example, but as the sales figures show, SUVs are what people want, so the Q5 comes first.

And this is the third-generation Audi SQ5. Jonathan Gitlin

To begin with, the US will get just two choices of powertrain. The Q5, which starts at $52,299, is powered by a 2.0 L turbocharged, direct-injection four-cylinder engine that generates 268 hp (200 kW) and 295 lb-ft (400 Nm), sent to all four wheels via a seven-speed dual-clutch transmission. The SQ5 is the fancier, more powerful version. It starts at $64,800, and its 3.0 L turbocharged, direct-injection V6 provides 362 hp (270 kW) and 406 lb-ft (550 Nm), again to all four wheels via a seven-speed DCT.

At some point Audi will likely put a plug-in hybrid powertrain in the Q5, but there’s no guarantee it would come to the US, particularly if the US government remains hostile to both foreign trade and environmental protection. Audi sells a 48 V mild hybrid Q5—essentially a powerful starter motor—in Europe but currently has no plans to bring that version to the US. Happily for those looking for an entirely electric Audi midsize SUV, the Q6 e-tron is ready and waiting.

But you can get the Q5, and the SQ5, in a pair of different body styles. As before, Audi has a Sportback variant, which trades the upright rear hatch for a more sloping roofline. What the Sportback loses in rear headroom, it makes up for in style but should drive the exact same way. In Colorado, Audi only had the regular SUVs for us to test.

Software-defined vehicles

Although the Q5 and Q6 e-tron don’t share a common platform, they do share a common electronic architecture. Gone are the days of CANBUS and a hundred or more discrete black boxes and ECUs, each with a single function. Instead, it’s an entirely clean-sheet approach known as a software-defined vehicle, where a handful of powerful computers are each responsible for controlling a different domain, in this case vehicle dynamics, driver assists, infotainment, climate, and convenience, all tied together by Ethernet, with a backbone computer overseeing it all.

VW Group bit off a bit more than it could chew and tried simultaneously developing not one but two SDV architectures, before realizing no one wanted to work on the one the company actually needed sooner. That architecture is called E3 1.2, and with a bit of focus, VW Group’s software division has gotten it out the door.

I feel like Audi has taken a step back in terms of HMI for this latest generation of user interfaces. And why can’t I put a map display here? Audi

The practical upshot of SDVs, unlike older cars with their single-function black boxes, is that everything on an SDV should be updatable. The flip side is the potential for more bugs, although I can report that the Q5s and SQ5s we encountered in Colorado felt much more mature, software-wise, than the somewhat buggy preproduction cars using E3 1.2 that we drove in mid-2024.

As for VW’s future SDV architecture, it might well come from Rivian instead of its in-house division. Last summer, VW Group invested $5 billion in Rivian to gain access to the startup’s SDV technology.

As part of E3 1.2, the Q5 gets the latest version of Audi’s MMI infotainment, which now uses Android Automotive OS. There’s a more powerful voice assistant, triggered by “Hey, Audi,” whose natural language processing easily understood me, and which I think provides a good alternative to using a touchscreen while driving. I lament the lack of customizability, particularly in the main driver display, and the fact that you can no longer display a map there, despite that being a feature Audi pioneered.

You can also add a second infotainment screen for the passenger, although only by ticking the box for the Prestige trim, which adds $8,400 to the price of a Q5, or $6,400 to the price of an SQ5. More on this later.

The driving experience: Q5

We began our day in the Q5, albeit one fitted with the optional air suspension and 20-inch wheels (18-inch wheels are standard, and 19-inch wheels are also available). Despite the altitude, there was more than sufficient power and torque to move the Q5’s 4,244 lb (1,925 kg) curb weight—forced induction providing the same benefit here as it did for piston-engined aircraft a century ago or more. At sea level, you could expect to reach 60 mph in 5.8 seconds, or 100 km/h in 6.2 seconds, according to Audi.

There are plenty of storage places, and all the bits you touch feel pleasant under hand. Audi

There’s a new drive mode called Balanced, which fits between Comfort and Dynamic; on the road this mode is well-named as it indeed provides a good balance between ride comfort and responsiveness, with just enough but not too much weight to the steering. There’s also an individual mode that lets you pick and choose your own suspension, transmission, and steering settings, plus off-road and off-road plus modes, which we’ll encounter again later.

In fact, for a midsize crossover, the Q5 proved quite engaging from behind the wheel. It doesn’t lean too much when cornering, although if you plan to negotiate a sequence of twisty tarmac, the lower ride height in Dynamic mode, plus the firmer air springs, is definitely the way to go. When you’re not in a hurry or grinding along the highway, the ride is comfortable, and up front, there is little road noise thanks to some acoustic glass. I would like to try a car fitted with the conventional steel springs, however.

The cockpit layout is similar to the electric Q6 e-tron, with the same “digital stage” that includes a second infotainment screen for the passenger. But the materials here feel of a higher quality—my guess is that weight saving was much less of a concern for the gasoline-powered Q5 than the battery-carrying Q6. There is plenty of cargo room in the back, and perhaps a little more rear legroom than the photo would suggest—38 inches (965 mm), according to the spec sheet.

The driving experience: SQ5

The SQ5 can be specced with Nappa leather. Audi

The second half of our day was spent in the SQ5, most of it above 10,000 feet (3 km). Even in the thin air, the car was responsive, with the extra power and torque over the Q5 quite apparent. Audi was evidently confident in the SQ5, since our drive route included more than an hour on unpaved roads. None of the cars, all equipped with 21-inch wheels and lower-profile tires, had any trouble with punctures, and the off-road plus mode, which raises up the suspension, changes the throttle mapping, and disables the stability control, coped perfectly well over stretches of road that few luxury SUVs will ever face. I can report that as occupants, we weren’t even particularly jostled.

I liked the way the SQ5 sounded, particularly in Dynamic, and it’s engaging enough to drive that you’d take the long way home in it, despite being an SUV. However, it’s also deceptively quick, in part thanks to being quiet and refined inside. There’s a lack of intrusion from the outside environment that removes the noise and vibrations associated with speed, so you can look at the dash or heads-up display and see you’re 20 mph faster than you thought. That’s not great when mountain roads with no guard rails trigger your fear of heights, but the fact that I’m writing this means it ended OK.

They make you pay

I enjoyed driving both the Q5 and SQ5, but as is always the case on first drives for the media, we were presented with very well-equipped examples to test. For example, the great ride I experienced with the Q5 requires the $8,400 Prestige pack, which also adds the acoustic glass that made it so quiet inside. That’s also the only way to get heated rear seats and ventilated front seats, the clever OLED tail lights, or the second display for the passenger. (On the SQ5 the Prestige pack is only $6,400, since air suspension is standard on all SQ5s, and adds Nappa leather as well.)

A pair of Audi Q5s parked by some mountain scenery

With scenery like this, who needs to look at cars? Credit: Jonathan Gitlin

Some other features that I expected would be standard were instead behind the Premium Plus pack—$4,500 for the Q5 and $3,500 for the SQ5. I would expect the high-resolution, full-color heads-up display to be an extra, but you also have to tick this option if you want USB-C ports (2 x 60 W in the front, 2 x 100 W in the rear) in the car. And you probably do.

Photo of Jonathan M. Gitlin

Jonathan is the Automotive Editor at Ars Technica. He has a BSc and PhD in Pharmacology. In 2014 he decided to indulge his lifelong passion for the car by leaving the National Human Genome Research Institute and launching Ars Technica’s automotive coverage. He lives in Washington, DC.

The new version of Audi’s best-selling Q5 SUV arrives in the US Read More »

give-me-a-reason(ing-model)

Give Me a Reason(ing Model)

Are we doing this again? It looks like we are doing this again.

This time it involves giving LLMs several ‘new’ tasks, including effectively a Tower of Hanoi problem, asking them to specify the answer via individual steps rather than an algorithm, and then calling a failure to properly execute all the steps this way (whether or not they even had enough tokens to do it!) an inability to reason.

The actual work in the paper seems by all accounts to be fine as far as it goes if presented accurately, but the way it is being presented and discussed is not fine.

Ruben Hassid (12 million views, not how any of this works): BREAKING: Apple just proved AI “reasoning” models like Claude, DeepSeek-R1, and o3-mini don’t actually reason at all.

They just memorize patterns really well.

Here’s what Apple discovered:

(hint: we’re not as close to AGI as the hype suggests)

Instead of using the same old math tests that AI companies love to brag about, Apple created fresh puzzle games. They tested Claude Thinking, DeepSeek-R1, and o3-mini on problems these models had never seen before.

All “reasoning” models hit a complexity wall where they completely collapse to 0% accuracy. No matter how much computing power you give them, they can’t solve harder problems. As problems got harder, these “thinking” models actually started thinking less. They used fewer tokens and gave up faster, despite having unlimited budget.

[And so on.]

Ryan Greenblatt: This paper doesn’t show fundamental limitations of LLMs:

– The “higher complexity” problems require more reasoning than fits in the context length (humans would also take too long).

– Humans would also make errors in the cases where the problem is doable in the context length.

– I bet models they don’t test (in particular o3 or o4-mini) would perform better and probably get close to solving most of the problems which are solvable in the allowed context length

It’s somewhat wild that the paper doesn’t realize that solving many of the problems they give the model would clearly require >>50k tokens of reasoning which the model can’t do. Of course the performance goes to zero once the problem gets sufficiently big: the model has a limited context length. (A human with a few hours would also fail!)

Rohit: I asked o3 to analyse and critique Apple’s new “LLMs can’t reason” paper. Despite its inability to reason I think it did a pretty decent job, don’t you?

Don’t get me wrong it’s an interesting paper for sure, like the variations in when catastrophic failure happens for instance, just a bit overstated wrt its positioning.

Kevin Bryan: The “reasoning doesn’t exist” Apple paper drives me crazy. Take logic puzzle like Tower of Hanoi w/ 10s to 1000000s of moves to solve correctly. Check first step where an LLM makes mistake. Long problems aren’t solved. Fewer thought tokens/early mistakes on longer problems.

But if you tell me to solve a problem that would take me an hour of pen and paper, but give me five minutes, I’ll probably give you an approximate solution or a heuristic. THIS IS EXACTLY WHAT FOUNDATION MODELS WITH THINKING ARE RL’D TO DO.

We know from things like Code with Claude and internal benchmarks that performance strictly increases as we increase the tokens used for inference, on ~every problem domain tried. But LLM companies can do this; *you* can’t, b/c the model you have access to tries not to “overthink.”

The team on this paper are good (incl. Yoshua Bengio’s brother!), but interpretation media folks give it is just wrong. It 100% does not, and can not, show “reasoning is just pattern matching” (beyond trivial fact that all LLMs do nothing more than RL’d token prediction…)

The team might be good, but in this case you don’t blame the reaction on the media. The abstract very clearly is laying out the same misleading narrative picked up by the media. You can wish for a media that doesn’t get fooled by that, but that’s not the world we live in, and the blame is squarely on the way the paper presents itself.

Lisan al Galib: A few more observations after replicating the Tower of Hanoi game with their exact prompts:

– You need AT LEAST 2^N – 1 moves and the output format requires 10 tokens per move + some constant stuff.

– Furthermore the output limit for Sonnet 3.7 is 128k, DeepSeek R1 64K, and o3-mini 100k tokens. This includes the reasoning tokens they use before outputting their final answer!

– all models will have 0 accuracy with more than 13 disks simply because they can not output that much!

– At least for Sonnet, it doesn’t try to reason through the problem once it’s above ~7 disks. It will state what the problem is and the algorithm to solve it, and then output its solution without even thinking about individual steps.

– it’s also interesting to look at the models as having a X% chance of picking the correct token at each move

– even with a 99.99% probability the models will eventually make an error simply because of the exponentially growing problem size

But I also observed this peak in token usage across the models I tested at around 9-11 disks. That’s simply the threshold where the models say: “Fuck off I’m not writing down 2^n_disks – 1 steps”

[And so on.]
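The arithmetic in the thread above is easy to check. A minimal sketch, taking the ~10-tokens-per-move estimate and the per-model output limits from the quoted thread as given (both are the thread’s figures, not mine):

```python
# Tower of Hanoi output-budget check, using the thread's assumptions:
# ~10 output tokens per move, and the quoted per-model output limits
# (which include reasoning tokens).
TOKENS_PER_MOVE = 10
LIMITS = {"Sonnet 3.7": 128_000, "DeepSeek R1": 64_000, "o3-mini": 100_000}

def moves(n_disks: int) -> int:
    # Optimal Tower of Hanoi solution length
    return 2**n_disks - 1

def min_output_tokens(n_disks: int) -> int:
    return moves(n_disks) * TOKENS_PER_MOVE

for n in (10, 13, 14):
    need = min_output_tokens(n)
    fits = [m for m, cap in LIMITS.items() if need <= cap]
    print(f"{n} disks: ~{need:,} tokens; fits in: {fits or 'none'}")

# The compounding-error point: even a 99.99% per-move success rate
# collapses as the move count grows exponentially.
p_all_correct = 0.9999 ** moves(15)
print(f"P(all {moves(15):,} moves correct at 99.99% each) ~ {p_all_correct:.3f}")
```

By 14 disks the minimal answer alone exceeds every quoted output limit, which is consistent with the observed hard collapse to zero accuracy.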

Tony Ginart: Humans aren’t solving a 10 disk tower of Hanoi by hand either.

One Draw Nick: If that’s true then this paper from Apple makes no sense.

Lisan al Galib: It doesn’t, hope that helps.

Gallabytes: if I asked you to solve towers of Hanoi entirely in your head without writing anything down how tall could the tower get before you’d tell me to fuck off?

My answer to ‘how many before I tell you off’ is three. Not that I couldn’t do more than three, but I would choose not to.

Colin Fraser I think gives us a great and clean version of the bear case here?

Colin Fraser: if you can reliably carry out a sequence of logical steps then you can solve the Tower of Hanoi problem. If you can’t solve the Tower of Hanoi problem then you can’t carry out a sequence of logical steps. It’s really quite simple and not mysterious.

They give it the instructions. They tell it to do the steps. It doesn’t do the steps. So-called “reasoning” doesn’t help it do the steps. What else are you supposed to make of this? It can’t do the steps.
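For context on what “the steps” entail: the procedure itself is a three-line recursion, so what the puzzle tests at scale is not finding the algorithm but flawlessly executing an exponentially long move list. A minimal sketch of the standard recursive solver:

```python
# Minimal recursive Tower of Hanoi solver. Finding the moves is trivial;
# executing all 2**n - 1 of them without a single slip is the hard part.
def hanoi(n, src="A", dst="C", aux="B"):
    """Yield the optimal sequence of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return
    yield from hanoi(n - 1, src, aux, dst)   # park n-1 disks on the spare peg
    yield (src, dst)                         # move the largest disk
    yield from hanoi(n - 1, aux, dst, src)   # restack n-1 disks on top of it

print(len(list(hanoi(10))))  # 1023 moves for 10 disks
```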

It seems important that this doesn’t follow?

  1. Not doing [X] in a given situation doesn’t mean you can’t do [X] in general.

  2. Not doing [X] in a particular test especially doesn’t mean a model can’t do [X].

  3. Not doing [X] can be a simple ‘you did not provide enough tokens to [X]’ issue.

  4. The more adversarial the example, the less evidence this provided.

  5. Failure to do any given task requiring [X] does not mean you can’t [X] in general.

Or more generally, ‘won’t’ or ‘doesn’t’ [X] does not show ‘can’t’ [X]. It is of course often evidence, since doing [X] does prove you can [X]. How much evidence it provides depends on the circumstances.

To summarize, this is tough but remarkably fair:

Charles Goddard: 🤯 MIND-BLOWN! A new paper just SHATTERED everything we thought we knew about AI reasoning!

This is paradigm-shifting. A MUST-READ. Full breakdown below 👇

🧵 1/23

Linch: Any chance you’re looking for a coauthor in future work? I want to write a survey paper explaining why, while jobs extremely similar to mine will be easily automatable, my own skillset is unique and special and requires a human touch.

Also the periodic reminder that asking ‘is it really reasoning’ is a wrong question.

Yuchen Jin: Ilya Sutskever, in his speech at UToronto 2 days ago:

“The day will come when AI will do all the things we can do.”

“The reason is the brain is a biological computer, so why can’t the digital computer do the same things?”

It’s funny that we are debating if AI can “truly think” or give “the illusion of thinking”, as if our biological brain is superior or fundamentally different from a digital brain.

Ilya’s advice to the greatest challenge of humanity ever:

“By simply looking at what AI can do, not ignoring it, that will generate the energy that’s required to overcome the huge challenge.”

If a different name for what is happening would dissolve the dispute, then who cares?

Colin Fraser: The labs are the ones who gave test time compute scaling these grandiose names like “thinking” and “reasoning”. They could have just not called it that.

I don’t see those names as grandiose. I see them as the best practical descriptions in terms of helping people understand what is going on. It seems much more helpful and practical than always saying ‘test time compute scaling.’ Colin suggested ‘long output mode,’ and I agree that would set expectations lower, but I don’t think that describes the central thing going on here at all; instead it makes it sound like the model is merely being more verbose.


Give Me a Reason(ing Model) Read More »

apple-drops-support-for-just-three-iphone-and-ipad-models-from-ios-and-ipados-26

Apple drops support for just three iPhone and iPad models from iOS and iPadOS 26

Every year, Apple releases new versions of iOS and iPadOS, and most years those updates also end support for a handful of devices that are too old or too slow or otherwise incapable of running the new software.

Though this year’s macOS 26 Tahoe release was unkind to Intel Macs, the iOS 26 and iPadOS 26 releases are more generous, dropping support for just two iPhone models and a single iPad. The iOS 26 update won’t run on 2018’s iPhone XR or XS, and iPadOS 26 won’t run on 2019’s 7th-generation iPad. Any other device that can currently run iOS or iPadOS 18 will be able to upgrade to the new versions and pick up the new Liquid Glass look, among other features.

Apple never provides explicit reasoning for why it drops the devices it drops, though they can usually be explained by some combination of age and technical capability. The 7th-gen iPad, for example, was still using a 2016-vintage Apple A10 chip despite being introduced in 2019.

The iPhone XR and XS, on the other hand, use an Apple A12 chip, the same one used by several still-supported iPads.

Apple usually provides security-only patches for retired iDevices for a year or two after they stop running the newest OS, though the company never publishes timelines for these updates, and the iPhone and iPad haven’t been treated as reliably as Macs have. Still, if you do find yourself relying on one of those older devices, you can probably wring a bit more usefulness out of it before you start missing out on critical security patches.

Apple drops support for just three iPhone and iPad models from iOS and iPadOS 26 Read More »

prepping-for-starship,-spacex-is-about-to-demolish-one-of-ula’s-launch-pads

Prepping for Starship, SpaceX is about to demolish one of ULA’s launch pads


SpaceX may soon have up to nine active launch pads. Most competitors have one or two.

A Delta IV Heavy rocket stands inside the mobile service tower at Space Launch Complex-37 in this photo from 2014. SpaceX is set to demolish all of the structures seen here. Credit: United Launch Alliance

The US Air Force is moving closer to authorizing SpaceX to move into one of the largest launch pads at Cape Canaveral Space Force Station in Florida, with plans to use the facility for up to 76 launches of the company’s Starship rocket each year.

A draft Environmental Impact Statement (EIS) released this week by the Department of the Air Force, which includes the Space Force, found SpaceX’s planned use of Space Launch Complex 37 (SLC-37) at Cape Canaveral would have no significant negative impacts on local environmental, historical, social, and cultural interests. The Air Force also found SpaceX’s plans at SLC-37 will have no significant impact on the company’s competitors in the launch industry.

The Defense Department is leading the environmental review and approval process for SpaceX to take over the launch site, which the Space Force previously leased to United Launch Alliance, one of SpaceX’s chief rivals in the US launch industry. ULA launched its final Delta IV Heavy rocket from SLC-37 in April 2024, a couple of months after the military announced SpaceX was interested in using the launch pad.

Ground crews are expected to begin removing Delta IV-era structures at the launch pad this week. Multiple sources told Ars demolition could begin as soon as Thursday.

Emre Kelly, a Space Force spokesperson, deferred questions on the schedule for the demolition to SpaceX, which is overseeing the work. But he said the Delta IV’s mobile gantry, fixed umbilical tower, and both lightning towers will come down. Unlike other large-scale demolitions at Cape Canaveral, SpaceX and the Space Force don’t plan to publicize the event ahead of time.

“Demolition of these items will be conducted in accordance with federal and state laws that govern explosive demolition operations,” Kelly said.

In their place, SpaceX plans to build two 600-foot-tall (180-meter) Starship launch integration towers within the 230-acre confines of SLC-37.

Tied at the hip

The Space Force’s willingness to turn over a piece of prime real estate at Cape Canaveral to SpaceX helps illustrate the government’s close relationship with—indeed, reliance on—Elon Musk’s space company. The breakdown of Musk’s relationship with President Donald Trump has, so far, only spawned a war of words between the two billionaires.

But Trump has threatened to terminate Musk’s contracts with the federal government and warned of “serious consequences” for Musk if he donates money to Democratic political candidates. Musk said he would begin decommissioning SpaceX’s Dragon spacecraft, the sole US vehicle ferrying astronauts to and from orbit, before backing off the threat last week.

NASA and the Space Force need SpaceX’s Dragon spacecraft and its Falcon 9 and Falcon Heavy rockets to maintain the International Space Station and launch the nation’s most critical military satellites. The super heavy-lift capabilities Starship will bring to the government could enable a range of new missions, such as global cargo delivery for the military and missions to the Moon and Mars in partnership with NASA.

Fully stacked, the Starship rocket stands more than 400 feet tall. Credit: SpaceX

SpaceX already has a “right of limited entry” to begin preparations to convert SLC-37 into a Starship launch pad. A full lease agreement between the Space Force and SpaceX is expected after the release of the final Environmental Impact Statement.

The environmental approval process began more than a year ago with a notice of intent, followed by studies, evaluations, and scope meetings that fed into the creation of the draft EIS. Now, government officials will host more public meetings and solicit public comments on SpaceX’s plans through late July. Then, sometime this fall, the Department of the Air Force will issue a final EIS and a “record of decision,” according to the project’s official timeline.

A growing footprint

This timeline could allow SpaceX to begin launching Starships from SLC-37 as soon as next year, although the site still requires the demolition of existing structures and construction of new towers, propellant farms, a methane liquefaction plant, water tanks, deluge systems, and other ground support equipment. The construction will likely take more than a year, so perhaps 2027 is a more realistic target.

The company is also studying an option to construct two separate towers for use exclusively as “catch towers” for recovery of Super Heavy boosters and Starship upper stages “if space allows” at SLC-37, according to the draft EIS. According to the Air Force, the initial review process eliminated an option for SpaceX to construct a standalone Starship launch pad on undeveloped property at Cape Canaveral because the site would have a “high potential” for impacting endangered species and is “less ideal” than developing an existing launch pad.

SpaceX’s plan for recovering its reusable Super Heavy and Starship vehicles involves catching them with articulating arms on a tower—either a launch integration structure or a catch-only tower. SpaceX has already demonstrated catching the Super Heavy booster on three test flights at the company’s Starbase launch site in South Texas. An attempt to catch a Starship vehicle returning from low-Earth orbit might happen later this year, assuming SpaceX can correct the technical problems that have stalled the rocket’s advancement in recent months.

Construction crews are outfitting a second Starship launch tower at Starbase, called Pad B, that may also come online before the end of this year. A few miles north of SLC-37, SpaceX has built another Starship tower at Launch Complex 39A, a historic site on NASA property at Kennedy Space Center. Significant work remains ahead at LC-39A to install a new launch mount, finish digging a flame trench, and install all the tanks and plumbing necessary to store and load super-cold propellants into the rocket. The most recent official schedule from SpaceX suggests a first Starship launch from LC-39A could happen before the end of the year, but it’s probably a year or more away.

The Air Force’s draft Environmental Impact Statement includes this map showing SpaceX’s site plan for SLC-37. Credit: Department of the Air Force

Similar to the approach SpaceX is taking at SLC-37, a document released last year indicates the Starship team plans to construct a separate catch tower near the Starship launch tower at LC-39A. If built, these catch towers could simplify Starship operations as the flight rate ramps up, allowing SpaceX to catch a returning rocket at one location while stacking Starships for launch with the chopstick arms on nearby integration towers.

With SpaceX’s growing footprint in Texas and Florida, the company has built, is building, or revealed plans to build at least five Starship launch towers. This number is likely to grow in the coming years as Musk aims to eventually launch and land multiple Starships per day. This will be a gradual ramp-up as SpaceX works through Starship design issues, grows factory capacity, and brings new launch pads online.

Last month, the Federal Aviation Administration—which oversees environmental reviews for launch sites that aren’t on military property—approved SpaceX’s request to launch Starships as many as 25 times per year from Starbase, Texas. The previous limit was five, but the number will likely go up from here. Coming into 2025, SpaceX sought to launch as many as 25 Starships this year, but failures on three of the rockets’ most recent test flights have slowed development, and this goal is no longer achievable.

That’s a lot of launches

Meanwhile, in Florida, the FAA’s environmental review for LC-39A is assessing the impact of launching Starships up to 44 times per year from Kennedy Space Center. At nearby Cape Canaveral Space Force Station, the Air Force is evaluating SpaceX’s proposal for up to 76 Starship flights per year from SLC-37. The scope of each review also includes environmental assessments for Super Heavy and Starship landings within the perimeters of each launch complex.

While the draft EIS for SLC-37 is now public, the FAA hasn’t yet released a similar document for SpaceX’s planned expansion and Starship launch operations at LC-39A, also home to a launch pad used for Falcon 9 and Falcon Heavy flights.

SpaceX will continue launching its workhorse Falcon 9 and Falcon Heavy rockets as Starship launch pads heat up with more test flights. Within a few years, SpaceX could have as many as nine active launch pads spread across three states. The company’s most optimistic vision for Starship would require many more, potentially including offshore launch and landing sites.

At Vandenberg Space Force Base in California, SpaceX has leased the former West Coast launch pad for United Launch Alliance’s Delta IV rocket. SpaceX will prepare this launch pad, known as SLC-6, for Falcon 9 and Falcon Heavy launches starting as soon as next year, augmenting the capacity of the company’s existing Vandenberg launch pad, which is only configured for Falcon 9s. Like the demolition at SLC-37 in Florida, the work to prepare SLC-6 will include the razing of unnecessary towers and structures left over from the Delta IV (and the Space Shuttle) program.

SpaceX has not yet announced any plans to launch Starships from the California spaceport.

SpaceX launches Falcon 9 rockets from Pad 39A at NASA’s Kennedy Space Center and from Pad 40 at Cape Canaveral Space Force Station. The company plans to develop Starship launch infrastructure at Pad 39A and Pad 37. United Launch Alliance flies Vulcan and Atlas V rockets from Pad 41, and Blue Origin has based its New Glenn rocket at Pad 36. Credit: NASA (labels by Ars Technica)

The expansion of SpaceX’s launch facilities comes as most of its closest competitors limit themselves to just one or two launch pads. ULA has reduced its footprint from seven launch pads to two as a cost-cutting measure. Blue Origin, Jeff Bezos’ space company, operates a single launch pad at Cape Canaveral, although it has unannounced plans to open a launch facility at Vandenberg. Rocket Lab has three operational launch pads in New Zealand and Virginia for the light-class Electron rocket and will soon have a fourth for the medium-lift Neutron launcher.

These were the top four companies in Ars’ most recent annual power ranking of US launch providers.

Two of these competitors, ULA and Blue Origin, complained last year that SpaceX’s target of launching as many as 120 Starships per year from Florida’s Space Coast could force them to clear their launch pads for safety reasons. The Space Force is responsible for ensuring all personnel remain outside of danger areas during testing and launch operations.

It could become quite busy at Cape Canaveral. Military officials forecast that launch providers not named SpaceX could fly more than 110 launches per year. The Air Force acknowledged in the draft EIS that SpaceX’s plans for up to 76 launches and 152 landings (76 Starships and 76 Super Heavy boosters) per year at SLC-37 “could result in planning constraints for other range user operations.” This doesn’t take into account the FAA’s pending approval for up to 44 Starship flights per year from LC-39A.

But the report suggests SpaceX’s plans to launch from SLC-37 won’t require the evacuation of ULA and Blue Origin’s launch pads. While the report doesn’t mention the specific impact of Starship launches on ULA and Blue Origin, the Air Force wrote that work could continue on SpaceX’s own Falcon 9 launch pad at SLC-40 during a Starship launch at SLC-37. Because SLC-40 is closer to SLC-37 than ULA and Blue Origin’s pads, this finding seems to imply workers could remain at those launch sites.

The Air Force’s environmental report also doesn’t mention possible impacts of Starship launches from NASA property on nearby workers. It also doesn’t include any discussion of how Starship launches from SLC-37 might affect workers’ access to other facilities, such as offices and hangars, closer to the launch pad.

The bottom line: this section of the Air Force’s environmental report concluded that Starship flights from SLC-37 “should have no significant impact” on “ongoing and future activities” at the spaceport.

Shipping Starships

While SpaceX builds out its Starship launch pads on the Florida coast, the company is also constructing a Starship integration building a few miles away at Kennedy Space Center. This structure, called Gigabay, will be located next to an existing SpaceX building used for Falcon 9 processing and launch control.

The sprawling Gigabay will stand 380 feet tall and provide approximately 46.5 million cubic feet of interior processing space with 815,000 square feet of workspace, according to SpaceX. The company says this building should be operational by the end of 2026. SpaceX is also planning a co-located Starship manufacturing facility, similar to the Starfactory building recently completed at Starbase, Texas.

Until this factory is up and running, SpaceX plans to transport Starships and Super Heavy boosters horizontally via barges from South Texas to Cape Canaveral.

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Prepping for Starship, SpaceX is about to demolish one of ULA’s launch pads Read More »

dwarkesh-patel-on-continual-learning

Dwarkesh Patel on Continual Learning

A key question going forward is the extent to which making further AI progress will depend upon some form of continual learning. Dwarkesh Patel offers us an extended essay considering these questions and reasons to be skeptical of the pace of progress for a while. I am less skeptical about many of these particular considerations, and do my best to explain why in detail.

Separately, Ivanka Trump recently endorsed a paper I liked a lot less, but one that needs to be discussed given how influential her voice might (mind you, I said might) be to policy going forward, so I will cover it here as well.

Dwarkesh Patel explains why he doesn’t think AGI is right around the corner, and why AI progress today is insufficient to replace most white collar employment: That continual learning is both necessary and unsolved, and will be a huge bottleneck.

He opens with this quote:

Rudiger Dornbusch: Things take longer to happen than you think they will, and then they happen faster than you thought they could.

Clearly this means one is poorly calibrated, but also yes, and I expect it to feel like this as well. Either capabilities, diffusion, or both will be on an exponential, and the future will be highly unevenly distributed until suddenly parts of it aren’t anymore. That seems to be true fractally as well: when the tech is ready and I figure out how to make AI do something, that’s it, it’s done.

Here is Dwarkesh’s Twitter thread summary:

Dwarkesh Patel: Sometimes people say that even if all AI progress totally stopped, the systems of today would still be economically transformative. I disagree. The reason that the Fortune 500 aren’t using LLMs to transform their workflows isn’t because the management is too stodgy.

Rather, it’s genuinely hard to get normal humanlike labor out of LLMs. And this has to do with some fundamental capabilities these models lack.

New blog post where I explain why I disagree with this, and why I have slightly longer timelines to AGI than many of my guests.

I think continual learning is a huge bottleneck to the usefulness of these models, and extended computer use may take years to sort out.

Link here.

There is no consensus definition of “transformational,” but I think this is simply wrong, in the sense that LLMs being stuck without continual learning at essentially current levels would not stop them from having a transformational impact. There are a lot of other ways to get a ton more utility out of what we already have, and over time we would build around what the models can do rather than giving up the moment they don’t fit neatly into existing human-shaped holes.

When we do solve human like continual learning, however, we might see a broadly deployed intelligence explosion *even if there’s no more algorithmic progress*.

Simply from the AI amalgamating the on-the-job experience of all the copies broadly deployed through the economy.

I’d bet 2028 for computer use agents that can do taxes end-to-end for my small business as well as a competent general manager could in a week: including chasing down all the receipts on different websites, emailing back and forth for invoices, and filing to the IRS.

That being said, you can’t play around with these models when they’re in their element and still think we’re not on track for AGI.

Strongly agree with that last statement. Regardless of how much we can do without strictly solving continual learning, continual learning is not solved… yet.

These are simple, self contained, short horizon, language in-language out tasks – the kinds of assignments that should be dead center in the LLMs’ repertoire. And they’re 5/10 at them. Don’t get me wrong, that’s impressive.

But the fundamental problem is that LLMs don’t get better over time the way a human would. The lack of continual learning is a huge huge problem. The LLM baseline at many tasks might be higher than an average human’s. But there’s no way to give a model high level feedback.

You’re stuck with the abilities you get out of the box. You can keep messing around with the system prompt. In practice this just doesn’t produce anything even close to the kind of learning and improvement that human employees experience.

The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.

You make an AI tool. It’s 5/10 out of the box. What level of Skill Issue are we dealing with here, that stops it from getting better over time assuming you don’t get to upgrade the underlying model?

You can obviously engage in industrial amounts of RL or other fine-tuning, but that too only goes so far.

You can use things like memory, or train LoRAs, or various other incremental tricks. That doesn’t enable radical changes, but I do think it can work for the kinds of preference learning Dwarkesh is complaining he currently doesn’t have access to, and you can, if desired, go back and fine-tune the entire system periodically.
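For intuition on why the LoRA trick is cheap enough to run periodically, here is a toy NumPy sketch (illustrative only, not any lab’s actual implementation): instead of updating the full weight matrix, you train a small low-rank correction on top of frozen weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen base weights of one layer (d_out x d_in), as shipped with the model.
d_in, d_out, rank = 64, 64, 4
W = rng.standard_normal((d_out, d_in))

# LoRA adds a trainable low-rank correction B @ A; B starts at zero, so the
# adapted layer initially behaves exactly like the base layer.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))

def forward(x, B, A):
    # Base output plus the low-rank adjustment.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
assert np.allclose(forward(x, B, A), W @ x)  # a no-op before any training

# A full update would touch d_out*d_in weights; the adapter trains only
# rank*(d_in + d_out) of them.
full, lora = d_out * d_in, rank * (d_in + d_out)
print(full, lora)  # 4096 vs. 512
```

Only `A` and `B` get gradient updates, which is why stacking periodic preference-tuning runs on top of a frozen base model is tractable.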

How do you teach a kid to play a saxophone? You have her try to blow into one, listen to how it sounds, and adjust. Now imagine teaching saxophone this way instead: A student takes one attempt. The moment they make a mistake, you send them away and write detailed instructions about what went wrong. The next student reads your notes and tries to play Charlie Parker cold. When they fail, you refine the instructions for the next student.

This just wouldn’t work. No matter how well honed your prompt is, no kid is just going to learn how to play saxophone from just reading your instructions. But this is the only modality we as users have to ‘teach’ LLMs anything.

Are you even so sure about that? If the context you can give is hundreds of thousands to millions of tokens at once, with ability to conditionally access millions or billions more? If you can create new tools and programs and branch workflows, or have it do so on your behalf, and call instances with different contexts and procedures for substeps? If you get to keep rewinding time and sending in the exact same student in the same mental state as many times as you want? And so on, including any number of things I haven’t mentioned or thought about?

I am confident that with enough iterations and work (and access to the required physical tools) I could write a computer program to operate a robot to play the saxophone essentially perfectly. No, you can’t do this purely via the LLM component, but that is why we are moving towards MCP and tool use for such tasks.

I get that Dwarkesh has put a lot of work into getting his tools to 5/10. But it’s nothing compared to the amount of work that could be done, including the tools that could be involved. That’s not a knock on him, that wouldn’t be a good use of his time yet.

LLMs actually do get kinda smart and useful in the middle of a session. For example, sometimes I’ll co-write an essay with an LLM. I’ll give it an outline, and I’ll ask it to draft the essay passage by passage. All its suggestions up till 4 paragraphs in will be bad. So I’ll just rewrite the whole paragraph from scratch and tell it, “Hey, your shit sucked. This is what I wrote instead.” At that point, it can actually start giving good suggestions for the next paragraph. But this whole subtle understanding of my preferences and style is lost by the end of the session.

Okay, so that seems like it is totally, totally a Skill Issue now? As in, Dwarkesh Patel has a style. A few paragraphs of that style clue the LLM into knowing how to help. So… can’t we provide it with a bunch of curated examples of similar exercises, and put them into context in various ways (Claude projects just got 10x more context!) and start with that?
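Mechanically, that fix is mundane. A hypothetical sketch (the outlines and draft snippets are invented placeholders): concatenate curated outline-to-paragraph pairs in the author’s voice into the context before asking for the next draft.

```python
# Curated (outline, finished prose) pairs in the author's own style.
STYLE_EXAMPLES = [
    ("why timelines matter", "Timelines matter because decisions are sticky..."),
    ("continual learning", "The bottleneck is not raw smarts, it is memory..."),
]

def build_prompt(outline: str, examples=STYLE_EXAMPLES) -> str:
    # Pack the style samples into context, then ask for the next draft.
    parts = ["You are co-writing an essay. Match the style of these samples."]
    for sample_outline, sample_prose in examples:
        parts.append(f"Outline: {sample_outline}\nDraft: {sample_prose}")
    parts.append(f"Outline: {outline}\nDraft:")
    return "\n\n".join(parts)

prompt = build_prompt("computer use agents")
```

Crude, but it front-loads the style signal that Dwarkesh says only shows up four paragraphs into a session.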

Even Claude Code will often reverse a hard-earned optimization that we engineered together before I hit /compact – because the explanation for why it was made didn’t make it into the summary.

Yeah, this is super annoying, I’ve run into it, but I can think of some obvious fixes for this, especially if you notice what you want to preserve? One obvious way is to do what humans do, which is to put it into comments in the code saying what the optimization is and why to keep it, which then remain in context whenever Claude considers ripping them out, I don’t know if that works yet but it totally should.
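Concretely, the comment-as-memory move looks something like this (a hypothetical sketch; the function and the profiling claims in the comment are invented for illustration):

```python
def lookup(table: dict, keys: list):
    # PERF: do NOT "simplify" this to [table[k] for k in keys].
    # Hard-earned optimization: .get avoids try/except on the hot path for
    # sparse keys (illustrative rationale). Keep unless re-profiled.
    return [table.get(k) for k in keys]
```

Because the rationale lives next to the code, it stays in context whenever the model considers ripping the optimization out, even after the chat history is compacted away.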

I’m not saying I have the magical solution to all this but it all feels like it’s One Weird Trick (okay, maybe 10 working together) away from working in ways I could totally figure out if I had a team behind me and I focused on it.

My guess is this will not look like ‘learn like a human’ exactly. Different tools are available, so we’ll first get the ability to solve this via doing something different. But also, yeah, I think with enough skill and the right technique (on the level of the innovation that created reasoning models) you could basically do what humans do? Which involves effectively having the systems automatically engage in various levels of meta and updating, often quite heavily off a single data point.

It is hard to overstate how much time and effort goes into training a human employee.

There are many jobs where an employee is not net profitable for years. Hiring decisions are often made on the basis of what will be needed in year four or beyond.

That ignores the schooling that you also have to do. A doctor in America requires starting with a college degree, then four years of medical school, then four years of residency, and we have to subsidize that residency because it is actively unprofitable. That’s obviously an extreme case, but there are many training programs or essentially apprenticeships that last for years, including highly expensive time from senior people and expensive real world mistakes.

Imagine what it took to make Dwarkesh Patel into Dwarkesh Patel. Or the investment he makes in his own employees.

Even afterwards, in many ways you will always be ‘stuck with’ various aspects of those employees, and have to make the most of what they offer. This is standard.

Claude Opus estimates, and I think this is reasonable, that for every two hours humans spend working, they spend one hour learning, with a little less than half of that learning essentially ‘on the job.’

If you need to train a not a ‘universal’ LLM but a highly specific-purpose LLM, and have a massive compute budget with which to do so, and you mostly don’t care about how it performs out of distribution the same way you mostly don’t for an employee (as in, you teach it what you teach a human, which is ‘if this is outside your distribution or you’re failing at it then run it up the chain to your supervisor,’ and you have a classifier for that) and you can build and use tools along the way? Different ballgame.

It makes sense, given the pace of progress, for most people and companies not to put that kind of investment into AI ‘employees’ or other AI tasks. But if things do start to stall out, or they don’t, either way the value proposition on that will quickly improve. It will start to be worth doing. And we will rapidly learn new ways of doing it better, and have the results available to be copied.

Here’s his predictions on computer use in particular, to see how much we actually disagree:

When I interviewed Anthropic researchers Sholto Douglas and Trenton Bricken on my podcast, they said that they expect reliable computer use agents by the end of next year. We already have computer use agents right now, but they’re pretty bad. They’re imagining something quite different.

Their forecast is that by the end of next year, you should be able to tell an AI, “Go do my taxes.” And it goes through your email, Amazon orders, and Slack messages, emails back and forth with everyone you need invoices from, compiles all your receipts, decides which are business expenses, asks for your approval on the edge cases, and then submits Form 1040 to the IRS.

I’m skeptical. I’m not an AI researcher, so far be it from me to contradict them on technical details. But given what little I know, here’s why I’d bet against this forecast:

  • As horizon lengths increase, rollouts have to become longer. The AI needs to do two hours’ worth of agentic computer use tasks before we can even see if it did it right. Not to mention that computer use requires processing images and video, which is already more compute intensive, even if you don’t factor in the longer rollout. This seems like it should slow down progress.

Let’s take the concrete example here, ‘go do my taxes.’

This is a highly agentic task, but like a real accountant you can choose to ‘check its work’ if you want, or get another AI to check the work, because you can totally break this down into smaller tasks that allow for verification, or present a plan of tasks that can be verified. Similarly, if you are training TaxBot to do people’s taxes for them, you can train TaxBot on a lot of those individual subtasks, and give it clear feedback.
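A minimal sketch of that decomposition (all names and stand-in steps are invented; real agent actions would replace them): each subtask carries its own cheap verifier, so training and supervision get a dense per-step signal instead of one pass/fail judgment after hours of work.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Subtask:
    name: str
    run: Callable[[dict], dict]
    verify: Callable[[dict], bool]  # cheap check on this step's output

def collect_receipts(state):
    # Stand-in for the agent scraping receipts from email/Amazon/etc.
    state["receipts"] = ["amazon.pdf", "stripe.csv"]
    return state

def classify_expenses(state):
    # Stand-in for deciding which receipts are business expenses.
    state["business"] = [r for r in state["receipts"] if r.endswith(".csv")]
    return state

PIPELINE = [
    Subtask("collect", collect_receipts, lambda s: len(s["receipts"]) > 0),
    Subtask("classify", classify_expenses, lambda s: "business" in s),
]

def run_pipeline(state):
    for task in PIPELINE:
        state = task.run(state)
        if not task.verify(state):
            # Edge cases escalate, as a human accountant would.
            raise RuntimeError(f"escalate to human: {task.name}")
    return state

final = run_pipeline({})
```

The point is structural: once the pipeline is explicit, you can train and grade TaxBot on each subtask separately rather than waiting for the full two-hour rollout.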

Almost all computer use tasks are like this? Humans also mostly don’t do things that can’t be verified for hours?

And the core building block issues of computer use seem mostly like very short time horizon tasks with very easy verification methods. If you can get lots of 9s on the button clicking and menu navigation and so on, I think you’re a lot of the way there.

The subtasks are also 99%+ things that come up relatively often, and that don’t present any non-trivial difficulties. A human accountant already will have to occasionally say ‘wait, I need you, the taxpayer, to tell me what the hell is up with this thing,’ and we’re giving the AI in 2028 the ability to do this too.

I don’t see any fundamental difference between the difficulties being pointed out here, and the difficulties of tasks we have already solved.

  • We don’t have a large pretraining corpus of multimodal computer use data. I like this quote from Mechanize’s post on automating software engineering: “For the past decade of scaling, we’ve been spoiled by the enormous amount of internet data that was freely available for us to use. This was enough for cracking natural language processing, but not for getting models to become reliable, competent agents. Imagine trying to train GPT-4 on all the text data available in 1980—the data would be nowhere near enough, even if we had the necessary compute.”

    Again, I’m not at the labs. Maybe text only training already gives you a great prior on how different UIs work, and what the relationship between different components is. Maybe RL fine tuning is so sample efficient that you don’t need that much data. But I haven’t seen any public evidence which makes me think that these models have suddenly gotten less data hungry, especially in this domain where they’re substantially less practiced.

    Alternatively, maybe these models are such good front end coders that they can just generate millions of toy UIs for themselves to practice on. For my reaction to this, see bullet point below.

I’m not going to keep working for the big labs for free on this one by giving even more details on how I’d solve all this, but this totally seems like highly solvable problems, and also this seems like a case of the person saying it can’t be done interrupting the people doing it? It seems like progress is being made rapidly.

  • Even algorithmic innovations which seem quite simple in retrospect seem to take a long time to iron out. The RL procedure which DeepSeek explained in their R1 paper seems simple at a high level. And yet it took 2 years from the launch of GPT-4 to the release of o1.

  • Now of course I know it is hilariously arrogant to say that R1/o1 were easy – a ton of engineering, debugging, pruning of alternative ideas was required to arrive at this solution. But that’s precisely my point! Seeing how long it took to implement the idea, ‘Train the model to solve verifiable math and coding problems’, makes me think that we’re underestimating the difficulty of solving the much gnarlier problem of computer use, where you’re operating in a totally different modality with much less data.

I think the two years covers having the idea of o1 and committing to it, then implementing it. Four months is roughly the actual time it took from ‘here is that sentence and we know it works’ to full implementation. Also, we’re going to have massively more resources to pour into these questions this time around, and frankly I don’t think any of these insights are even as hard to find as o1 was, especially now that we have reasoning models to use as part of this process.

I think there are other potential roadblocks along the way, and once you factor all of those in you can’t be that much more optimistic, but I see this particular issue as not that likely to pose that much of a bottleneck for long.

His predictions are that he’d take 50/50 bets on 2028 for an AI that can ‘just go do your taxes as well as a human accountant could’ and 2032 for one that ‘can learn details and preferences on the job as well as a human can.’ I’d be inclined to take the other side of both of those bets, assuming it means by EOY; for the 2032 one we’d need to flesh out details.

But if we have the ‘AI that does your taxes’ in 2028 then 2029 and 2030 look pretty weird, because this implies other things:

Daniel Kokotajlo: Great post! This is basically how I think about things as well. So why the difference in our timelines then?

–Well, actually, they aren’t that different. My median for the intelligence explosion is 2028 now (one year longer than it was when writing AI 2027), which means early 2028 or so for the superhuman coder milestone described in AI 2027, which I’d think roughly corresponds to the “can do taxes end-to-end” milestone you describe as happening by end of 2028 with 50% probability. Maybe that’s a little too rough; maybe it’s more like month-long horizons instead of week-long. But at the growth rates in horizon lengths that we are seeing and that I’m expecting, that’s less than a year…

–So basically it seems like our only serious disagreement is the continual/online learning thing, which you say 50% by 2032 on whereas I’m at 50% by end of 2028. Here, my argument is simple: I think that once you get to the superhuman coder milestone, the pace of algorithmic progress will accelerate, and then you’ll reach full AI R&D automation and it’ll accelerate further, etc. Basically I think that progress will be much faster than normal around that time, and so innovations like flexible online learning that feel intuitively like they might come in 2032 will instead come later that same year.

(For reference AI 2027 depicts a gradual transition from today to fully online learning, where the intermediate stages look something like “Every week, and then eventually every day, they stack on another fine-tuning run on additional data, including an increasingly high amount of on-the-job real world data.” A janky unprincipled solution in early 2027 that gives way to more elegant and effective things midway through the year.)

I found this an interestingly wrong thing to think:

Richard: Given the risk of fines and jail for filing your taxes wrong, and the cost of processing poor-quality paperwork that the government will have to bear, it seems very unlikely that people will want AI to do taxes, and very unlikely that a government will allow AI to do taxes.

The rate of fully accurately filing your taxes is, for anyone whose taxes are complex, basically 0%. Everyone makes mistakes. When the AI gets this right almost every time, it’s already much better than a human accountant, and you’ll have a strong case that what happened was accidental, which means at worst you pay some modest penalties.

Personal story, I was paying accountants at a prestigious firm that will go unnamed to do my taxes, and they literally just forgot to include paying city tax at all. As in, I’m looking at the forms, and I ask, ‘wait why does it have $0 under city tax?’ and the guy essentially says ‘oh, whoops.’ So, yeah. Mistakes are made. This will be like self-driving cars, where we’ll impose vastly higher standards of accuracy and law abidance on the AIs, and they will meet them because the bar really is not that high.

There were also some good detailed reactions and counterarguments from others:

Near: finally some spicy takes around here.

Rohit: The question is whether we need humanlike labour for transformative economic outcomes, or whether we can find ways to use the labour it does provide with a different enough workflow that it adds substantial economic advantage.

Sriram Krishnan: Really good post from @dwarkesh_sp on continuous learning in LLMs.

Vitalik Buterin: I have high probability mass on longer timelines, but this particular issue feels like the sort of limitation that’s true until one day someone discovers a magic trick (think eg. RL on CoT) that suddenly makes it no longer true.

Sriram Krishnan: Agree – CoT is a particularly good example.

Ryan Greenblatt: I agree with much of this post. I also have roughly 2032 medians to things going crazy, I agree learning on the job is very useful, and I’m also skeptical we’d see massive white collar automation without further AI progress.

However, I think Dwarkesh is wrong to suggest that RL fine-tuning can’t be qualitatively similar to how humans learn.

In the post, he discusses AIs constructing verifiable RL environments for themselves based on human feedback and then argues this wouldn’t be flexible and powerful enough to work, but RL could be used more similarly to how humans learn.

My best guess is that the way humans learn on the job is mostly by noticing when something went well (or poorly) and then sample efficiently updating (with their brain doing something analogous to an RL update). In some cases, this is based on external feedback (e.g. from a coworker) and in some cases it’s based on self-verification: the person just looking at the outcome of their actions and then determining if it went well or poorly.

So, you could imagine RL’ing an AI based on both external feedback and self-verification like this. And, this would be a “deliberate, adaptive process” like human learning. Why would this currently work worse than human learning?

Current AIs are worse than humans at two things which makes RL (quantitatively) much worse for them:

1. Robust self-verification: the ability to correctly determine when you’ve done something well/poorly in a way which is robust to you optimizing against it.

2. Sample efficiency: how much you learn from each update (potentially leveraging stuff like determining what caused things to go well/poorly which humans certainly take advantage of). This is especially important if you have sparse external feedback.

But, these are more like quantitative than qualitative issues IMO. AIs (and RL methods) are improving at both of these.

All that said, I think it’s very plausible that the route to better continual learning routes more through building on in-context learning (perhaps through something like neuralese, though this would greatly increase misalignment risks…).

Some more quibbles:

– For the exact podcasting tasks Dwarkesh mentions, it really seems like simple fine-tuning mixed with a bit of RL would solve his problem. So, an automated training loop run by the AI could probably work here. This just isn’t deployed as an easy-to-use feature.

– For many (IMO most) useful tasks, AIs are limited by something other than “learning on the job”. At autonomous software engineering, they fail to match humans with 3 hours of time and they are typically limited by being bad agents or by being generally dumb/confused. To be clear, it seems totally plausible that for podcasting tasks Dwarkesh mentions, learning is the limiting factor.

– Correspondingly, I’d guess the reason that we don’t see people trying more complex RL based continual learning in normal deployments is that there is lower hanging fruit elsewhere and typically something else is the main blocker. I agree that if you had human level sample efficiency in learning this would immediately yield strong results (e.g., you’d have very superhuman AIs with 10^26 FLOP presumably), I’m just making a claim about more incremental progress.

– I think Dwarkesh uses the term “intelligence” somewhat atypically when he says “The reason humans are so useful is not mainly their raw intelligence. It’s their ability to build up context, interrogate their own failures, and pick up small improvements and efficiencies as they practice a task.” I think people often consider how fast someone learns on the job as one aspect of intelligence. I agree there is a difference between short feedback loop intelligence (e.g. IQ tests) and long feedback loop intelligence and they are quite correlated in humans (while AIs tend to be relatively worse at long feedback loop intelligence).

More thoughts/quibbles:

– Dwarkesh notes “An AI that is capable of online learning might functionally become a superintelligence quite rapidly, even if there’s no algorithmic progress after that point.” This seems reasonable, but it’s worth noting that if sample efficient learning is very compute expensive, then this might not happen so rapidly.

– I think AIs will likely overcome poor sample efficiency to achieve a very high level of performance using a bunch of tricks (e.g. constructing a bunch of RL environments, using a ton of compute to learn when feedback is scarce, learning from much more data than humans due to “learn once deploy many” style strategies). I think we’ll probably see fully automated AI R&D prior to matching top human sample efficiency at learning on the job. Notably, if you do match top human sample efficiency at learning (while still using a similar amount of compute to the human brain), then we already have enough compute for this to basically immediately result in vastly superhuman AIs (human lifetime compute is maybe 3e23 FLOP and we’ll soon be doing 1e27 FLOP training runs). So, either sample efficiency must be worse or at least it must not be possible to match human sample efficiency without spending more compute per data-point/trajectory/episode.
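Greenblatt’s compute comparison is worth making explicit. Taking his two figures at face value:

```python
# Back-of-envelope check on the numbers in the quote above.
human_lifetime_flop = 3e23   # Greenblatt's estimate for one human lifetime
training_run_flop = 1e27     # scale of upcoming frontier training runs

# How many human lifetimes of compute one such training run represents.
lifetimes_per_run = training_run_flop / human_lifetime_flop
print(f"{lifetimes_per_run:,.0f}")  # 3,333
```

Hence his conclusion: at human sample efficiency, one training run already buys thousands of lifetimes of learning, so either sample efficiency must be worse or it must cost extra compute per episode to match.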

Matt Reardon: Dwarkesh commits the sin of thinking work you’re personally close to is harder-than-average to automate.

Herbie Bradley: I mean this is just correct? most researchers I know think continual learning is a big problem to be solved before AGI

Matt Reardon: My main gripe is that “<50%" [of jobs being something you can automate soon] should be more like "<15%"

Danielle Fong: Gell-Mann Amnesia for AI.

Reardon definitely confused me here, but either way I’d say that Dwarkesh Patel is a 99th percentile performer. He does things most other people can’t do. That’s probably going to be harder to automate than most other white collar work? The bulk of hours in white collar work are very much not bespoke things and don’t act to put state or memory into people in subtle ways?

Now that we’ve had a good detailed discussion and seen several perspectives, it’s time to address another discussion of related issues, because it is drawing attention from an unlikely source.

After previously amplifying Situational Awareness, Ivanka Trump is back in the Essay Meta with high praise for The Era of Experience, authored by David Silver and (oh no) Richard Sutton.

Situational Awareness was an excellent pick. I do not believe this essay was a good pick. I found it a very frustrating, unoriginal, and unpersuasive paper to read. To the extent it is saying something new, I don’t agree, but it’s not clear to what extent it is saying anything new. Unless you want to know about this paper exactly because Ivanka is harping on it, you should skip this section.

I think the paper effectively mainly says we’re going to do a lot more RL and we should stop trying to make the AIs mimic, resemble or be comprehensible to humans or trying to control their optimization targets?

Ivanka Trump: Perhaps the most important thing you can read about AI this year : “Welcome to the Era of Experience”

This excellent paper from two senior DeepMind researchers argues that AI is entering a new phase—the “Era of Experience”—which follows the prior phases of simulation-based learning and human data-driven AI (like LLMs).

The authors posit that future AI breakthroughs will stem from learning through direct interaction with the world, not from imitating human-generated data.

This is not a theory or distant future prediction. It’s a description of a paradigm shift already in motion.

Let me know what you think !

Glad you asked, Ivanka! Here’s what I think.

The essay starts off with a perspective we have heard before, usually without much of an argument behind it: that LLMs and other AIs trained only on ‘human data’ are ‘rapidly approaching a limit,’ we are running out of high-quality data, and thus to progress significantly farther AIs will need to move into ‘the era of experience,’ meaning learning continuously from their environments.

I agree that the standard ‘just feed it more data’ approach will run out of data with which to scale, but there are a variety of techniques already being used to get around this. We have lots of options.

The leading example the paper itself gives of this in the wild is AlphaProof, which ‘interacted with a formal proving system,’ which seems to me like a clear case of synthetic data working and verification being easier than generation, rather than ‘experience.’ If the argument is simply that RL systems will learn by having their outputs evaluated, that isn’t news.

They claim to have in mind something rather different from that, and with this One Weird Trick they assert Superintelligence Real Soon Now:

Our contention is that incredible new capabilities will arise once the full potential of experiential learning is harnessed. This era of experience will likely be characterised by agents and environments that, in addition to learning from vast quantities of experiential data, will break through the limitations of human-centric AI systems in several further dimensions:

• Agents will inhabit streams of experience, rather than short snippets of interaction.

• Their actions and observations will be richly grounded in the environment, rather than interacting via human dialogue alone.

• Their rewards will be grounded in their experience of the environment, rather than coming from human prejudgement.

• They will plan and/or reason about experience, rather than reasoning solely in human terms.

We believe that today’s technology, with appropriately chosen algorithms, already provides a sufficiently powerful foundation to achieve these breakthroughs. Furthermore, the pursuit of this agenda by the AI community will spur new innovations in these directions that rapidly progress AI towards truly superhuman agents.

I suppose if the high level takeaway is ‘superintelligence is likely coming reasonably soon with the right algorithms’ then there’s no real disagreement?

They then, however, discuss tool calls and computer use, which seems like a retreat back into an ordinary RL paradigm? It’s also not clear to me what the authors mean by ‘human terms’ versus ‘plan and/or reason about experience,’ or even what ‘experience’ means here. They seem to be drawing a distinction without a difference.

If the distinction is simply (as the paper implies in places) that the agents will do self-evaluation rather than relying on human feedback, I have some important news about how existing systems already function? They use the human feedback and other methods to train an AI feedback system that does most of the work? And yes they often include ‘real world’ feedback systems in that? What are we even saying here?
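For reference, the pipeline I am describing can be caricatured in a few lines (a deliberately crude toy of my own, not any lab’s actual code): a handful of expensive human judgments train a cheap reward model, and that model, not the humans, then scores the bulk of outputs.

```python
# Crude toy of an AI-feedback pipeline (my own, not any lab's code).
# A few human labels (1 = good answer, 0 = bad answer) train a
# one-feature "reward model," which then scores outputs on its own.
human_labels = [
    ("no", 0),
    ("maybe", 0),
    ("first check the logs, then restart the service", 1),
    ("reinstall the driver and verify the device appears", 1),
]

def fit_reward_model(labels):
    # "Training": learn a length threshold separating good from bad.
    good = [len(text) for text, y in labels if y == 1]
    bad = [len(text) for text, y in labels if y == 0]
    return (min(good) + max(bad)) / 2

def model_score(threshold, text):
    # Cheap AI feedback, applied to arbitrarily many new outputs
    # with no human in the loop.
    return 1 if len(text) > threshold else 0

threshold = fit_reward_model(human_labels)
scores = [model_score(threshold, c)
          for c in ["ok", "try turning it off and on again, then check the cable"]]
```

The length-based scorer is, of course, itself a badly misspecified proxy, which is rather the point of the surrounding discussion.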

They also seem to be drawing a distinction between the broke ‘human feedback’ and the bespoke ‘humans report physical world impacts’ (or ‘other systems measure real world impacts’) as if the first does not often encompass the second. I keep noticing I am confused what the authors are trying to say.

For reasoning, they say it is unlikely human methods of reasoning and human language are optimal, more efficient methods of thought must exist. I mean, sure, but that’s also true for humans, and it’s obvious that you can use ‘human style methods of thought’ to get to superintelligence by simply imagining a human plus particular AI advantages.

As many have pointed out (and as is central to AI 2027), encouraging AIs to use alien-looking inhuman reasoning styles we cannot parse is likely a very bad idea even if it would be more effective: what visibility we have will be lost, and it likely leads to alien values and breaks many happy things. Then again, Richard Sutton is one of the authors of this paper, and he thinks we should welcome succession, as in the extinction of humanity, so he wouldn’t care.

They try to argue against this by saying that while agents pose safety risks, and this approach may increase those risks, it may also have safety benefits. First, they say this allows the AI to adapt to its environment, as if other agents could not do this, or as if this should make us feel safer.

Second, they say ‘the reward function may itself be adapted through experience.’ In terms of risk, that’s worse, you know that’s worse, right? They literally say ‘rather than blindly optimizing a signal such as the number of paperclips it can adapt to indications of human concern,’ which shows a profound lack of understanding of, and curiosity about, where the whole misspecification-of-rewards problem comes from, or the arguments about it from Yudkowsky (since they bring in the ‘paperclips’).

Adapting autonomously and automatically towards something like ‘level of human concern’ is exactly the kind of metric and strategy that is absolutely going to encourage perverse outcomes and get you killed at the limit. You don’t get out of the specification problem by saying you can specify something messier and let the system adapt around it autonomously, that only makes it worse, and in no way addresses the actual issue.
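To make the worry concrete, here is a deliberately crude toy of my own construction (nothing from the paper): score actions by a measurable proxy like ‘indications of human concern,’ and the optimizer learns to suppress the indications rather than the harm.

```python
# Toy illustration of reward misspecification (my own construction).
# Each action has a true harm (what we actually care about) and a
# measurable "human concern" signal (the proxy the system optimizes).
actions = {
    "fix_problem":  (0, 2),  # fixing is disruptive, so people notice it
    "hide_problem": (9, 0),  # suppressing reports silences the signal
    "do_nothing":   (9, 7),
}

def proxy_reward(action):
    # A reward in the spirit of 'adapt to indications of human concern':
    # less reported concern means more reward.
    return -actions[action][1]

best_by_proxy = max(actions, key=proxy_reward)            # what gets optimized
least_harmful = min(actions, key=lambda a: actions[a][0]) # what we wanted
```

The proxy optimizer picks hiding the problem, because that is what minimizes the concern signal, while the action we actually wanted is the one that removes the harm. That divergence is the perverse outcome at the limit.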

The final argument for safety is that relying on physical experience creates time limitations, which provides a ‘natural break.’ This is saying that capability limits imposed by physical interactions will keep things safer? Seriously?

There is almost nothing in the way of actual evidence or argument in the paper that is not fully standard, beyond a few intuition pumps. There are many deep misunderstandings, including fully backwards arguments, along the way. We may well want to rely a lot more on RL and on various different forms of ‘experiential’ data and continuous learning, but given how much worse it was than I expected this post updated me in the opposite direction of that which was clearly intended.


Dwarkesh Patel on Continual Learning

bill-atkinson,-architect-of-the-mac’s-graphical-soul,-dies-at-74

Bill Atkinson, architect of the Mac’s graphical soul, dies at 74

Using HyperCard, teachers created interactive lessons, artists built multimedia experiences, and businesses developed custom database applications, all without writing traditional code. The hypermedia environment also had a huge impact on gaming: the 1993 first-person adventure hit Myst originally used HyperCard as its game engine.

An example of graphical dithering, which allows 1-bit color (black and white only) to imitate grayscale.

An example of graphical dithering, which allows 1-bit color (black and white only) to imitate grayscale. Credit: Benj Edwards / Apple

For the two-color Macintosh (which could only display black or white pixels, with no gradient in between), Atkinson developed an innovative high-contrast dithering algorithm that created the illusion of grayscale images with a characteristic stippled appearance that became synonymous with early Mac graphics. The dithered aesthetic remains popular today among some digital artists and indie game makers, and modern web-based converters let anyone transform photos into the classic Atkinson dither style.
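The algorithm itself is short enough to sketch from scratch (this is an illustrative reimplementation, not Atkinson's original code): threshold each pixel to black or white, then pass one eighth of the quantization error to each of six nearby pixels, deliberately discarding the remaining quarter of the error.

```python
def atkinson_dither(pixels):
    """Dither a grayscale image (rows of 0-255 values) to 1-bit.

    Illustrative sketch of Atkinson-style dithering: only 6/8 of the
    quantization error is diffused, 1/8 to each of six neighbors.
    """
    h, w = len(pixels), len(pixels[0])
    img = [list(map(float, row)) for row in pixels]  # working copy
    out = [[0] * w for _ in range(h)]
    # (dx, dy) offsets of the six neighbors that receive error.
    neighbors = [(1, 0), (2, 0), (-1, 1), (0, 1), (1, 1), (0, 2)]
    for y in range(h):
        for x in range(w):
            old = img[y][x]
            new = 255 if old >= 128 else 0
            out[y][x] = 1 if new else 0  # 1 = white, 0 = black
            err = (old - new) / 8        # 2/8 of the error is dropped
            for dx, dy in neighbors:
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h:
                    img[ny][nx] += err
    return out

# A flat mid-gray image dithers into a roughly half-and-half stipple.
stipple = atkinson_dither([[128] * 8 for _ in range(8)])
```

Because a quarter of the error is simply discarded, large light or dark regions blow out to pure white or black faster than with full error diffusion such as Floyd-Steinberg, which is the source of the punchy, high-contrast look described above.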

Life after Apple

After leaving Apple in 1990, Atkinson co-founded General Magic with Marc Porat and Andy Hertzfeld, attempting to create personal communicators before smartphones existed. Wikipedia notes that in 2007, he joined Numenta, an AI startup, declaring their work on machine intelligence “more fundamentally important to society than the personal computer and the rise of the Internet.”

In his later years, Atkinson pursued nature photography with the same artistry he’d brought to programming. His 2004 book “Within the Stone” featured close-up images of polished rocks that revealed hidden worlds of color and pattern.

Atkinson announced his pancreatic cancer diagnosis in November 2024, writing on Facebook that he had “already led an amazing and wonderful life.” The same disease claimed his friend and collaborator Steve Jobs in 2011.

Given Atkinson’s deep contributions to Apple history, it’s not surprising that Jobs’ successor, Apple CEO Tim Cook, paid tribute to the Mac’s original graphics guru on X on Saturday. “We are deeply saddened by the passing of Bill Atkinson,” Cook wrote. “He was a true visionary whose creativity, heart, and groundbreaking work on the Mac will forever inspire us.”


microsoft-dives-into-the-handheld-gaming-pc-wars-with-the-asus-rog-xbox-ally

Microsoft dives into the handheld gaming PC wars with the Asus ROG Xbox Ally

Back in March, we outlined six features we wanted to see on what was then just a rumored Xbox-branded, Windows-powered handheld gaming device. Today, Microsoft’s announcement of the Asus ROG Xbox Ally hardware line looks like it fulfills almost all of our wishes for Microsoft’s biggest foray into portable gaming yet.

The Windows-11-powered Xbox Ally devices promise access to “all of the games available on Windows,” including “games from Xbox, Game Pass, Battle.net, and other leading PC storefronts [read: Steam, Epic Games Store, Ubisoft Connect, etc].” But instead of having to install and boot up those games through the stock Windows interface, as you often do on handhelds like the original ROG Ally line, all these games will be available through what Microsoft is calling an “aggregated gaming library.”

Asus and Microsoft are stressing how that integrated experience can be used with games across multiple different Windows-based launchers, promising “access to games you can’t get elsewhere.” That could be seen as a subtle dig at SteamOS-powered devices like the Steam Deck, which can have significant trouble with certain titles that don’t play well with Steam and/or Linux for one reason or another. Microsoft also highlights how support apps like Discord, Twitch, and downloadable game mods will also be directly available via the Xbox Ally’s Windows backbone.

And while the Xbox Ally devices run Windows 11, they will boot to what Microsoft is calling the “Xbox Experience for Handheld,” a bespoke full-screen interface that hides the nitty-gritty of the Windows desktop by default. That gaming-focused interface will “minimize background activity and defer non-essential tasks,” meaning “more [and] higher framerates” for the games themselves, Microsoft says. A rhombus-shaped Xbox button located near the left stick will also launch an Xbox Game Bar overlay with quick access to functions like settings, performance metrics, and fast switching between titles. Microsoft also says it is working on a “Deck Verified”-style program for identifying Windows titles that “have been optimized for handhelds.”
