Author name: Mike M.


Report: Apple Mail is getting automatic categories on iPadOS and macOS

Unlike numerous other recent OS-level features from Apple, mail sorting does not require a device capable of supporting Apple Intelligence (generally M-series Macs or iPads), and it happens entirely on the device. It’s an optional feature and is available only for English-language emails.

Apple released a third beta of macOS 15.3 just days ago, indicating that early, developer-oriented builds of macOS 15.4 with the sorting feature should be weeks away. While Gurman’s newsletter suggests mail sorting will also arrive in the Mail app for iPadOS, he did not specify which version, though the timing would suggest the roughly simultaneous release of iPadOS 18.4.

Also slated to arrive in the same update for Apple-Intelligence-ready devices is the version of Siri that understands more context about questions, drawing on what’s on your screen and in your apps. “Add this address to Rick’s contact information,” “When is my mom’s flight landing,” and “What time do I have dinner with her” are the sorts of examples Apple highlighted in its June unveiling of iOS 18.

Since then, Apple has divvied up certain aspects of Intelligence into different OS point updates. General ChatGPT access and image generation have arrived in iOS 18.2 (and related Mac and iPad updates), while notification summaries, which can be pretty rough, are being rethought and better labeled and will be removed from certain news notifications in iOS 18.3.



Sleeping pills stop the brain’s system for cleaning out waste


Cleanup on aisle cerebellum

A specialized system sends pulses of pressure through the fluids in our brain.

Our bodies rely on the lymphatic system to drain excess fluid and remove waste from tissues, feeding it back into the bloodstream. It’s a complex yet efficient cleaning mechanism that works in every organ except the brain. “When cells are active, they produce waste metabolites, and this also happens in the brain. Since there are no lymphatic vessels in the brain, the question was what was it that cleaned the brain,” Natalie Hauglund, a neuroscientist at Oxford University who led a recent study on the brain-clearing mechanism, told Ars.

Earlier studies done mostly on mice discovered that the brain had a system that flushed its tissues with cerebrospinal fluid, which carried away waste products in a process called glymphatic clearance. “Scientists noticed that this only happened during sleep, but it was unknown what it was about sleep that initiated this cleaning process,” Hauglund explains.

Her study found the glymphatic clearance was mediated by a hormone called norepinephrine and happened almost exclusively during the NREM sleep phase. But it only worked when sleep was natural. Anesthesia and sleeping pills shut this process down nearly completely.

Taking it slowly

The glymphatic system in the brain was discovered back in 2013 by Dr. Maiken Nedergaard, a Danish neuroscientist and a coauthor of Hauglund’s paper. Since then, there have been numerous studies aimed at figuring out how it worked, but most of them had one problem: they were done on anesthetized mice.

“What makes anesthesia useful is that you can have a very controlled setting,” Hauglund says.

Most brain imaging techniques require a subject, an animal or a human, to be still. In mouse experiments, that meant immobilizing their heads so the research team could get clear scans. “But anesthesia also shuts down some of the mechanisms in the brain,” Hauglund argues.

So, her team designed a study to see how the brain-clearing mechanism works in mice that could move freely in their cages and sleep naturally whenever they felt like it. “It turned out that with the glymphatic system, we didn’t really see the full picture when we used anesthesia,” Hauglund says.

Looking into the brain of a mouse that runs around and wiggles during sleep, though, wasn’t easy. The team pulled it off using a technique called flow fiber photometry, which images fluids tagged with fluorescent markers through a probe implanted in the brain. So the mice had optical fibers implanted in their brains. Once that was done, the team put fluorescent tags in the mice’s blood, in their cerebrospinal fluid, and on the norepinephrine hormone. “Fluorescent molecules in the cerebrospinal fluid had one wavelength, blood had another wavelength, and norepinephrine had yet another wavelength,” Hauglund says.

This way, her team could get a fairly precise idea about the brain fluid dynamics when mice were awake and asleep. And it turned out that the glymphatic system basically turned brain tissues into a slowly moving pump.

Pumping up

“Norepinephrine is released from a small area of the brain in the brain stem,” Hauglund says. “It is mainly known as a response to stressful situations. For example, in fight or flight scenarios, you see norepinephrine levels increasing.” Its main effect is causing blood vessels to contract. Still, in more recent research, people found out that during sleep, norepinephrine is released in slow waves that roll over the brain roughly once a minute. This oscillatory norepinephrine release proved crucial to the operation of the glymphatic system.

“When we used the flow fiber photometry method to look into the brains of mice, we saw these slow waves of norepinephrine, but we also saw how it works in synchrony with fluctuation in the blood volume,” Hauglund says.

Every time the norepinephrine level went up, it caused the contraction of the blood vessels in the brain, and the blood volume went down. At the same time, the contraction increased the volume of the perivascular spaces around the blood vessels, which were immediately filled with the cerebrospinal fluid.

When the norepinephrine level went down, the process worked in reverse: the blood vessels dilated, letting the blood in and pushing the cerebrospinal fluid out. “What we found was that norepinephrine worked a little bit like a conductor of an orchestra and makes the blood and cerebrospinal fluid move in synchrony in these slow waves,” Hauglund says.

And because the study was designed to monitor this process in freely moving, undisturbed mice, the team learned exactly when all this was going on. When mice were awake, the norepinephrine levels were much higher but relatively steady. The team observed the opposite during the REM sleep phase, where the norepinephrine levels were consistently low. The oscillatory behavior was present exclusively during the NREM sleep phase.

So, the team wanted to check how the glymphatic clearance would work when they gave the mice zolpidem, a sleeping drug that had been proven to increase NREM sleep time. In theory, zolpidem should have boosted brain-clearing. But it turned it off instead.

Non-sleeping pills

“When we looked at the mice after giving them zolpidem, we saw they all fell asleep very quickly. That was expected—we take zolpidem because it makes it easier for us to sleep,” Hauglund says. “But then we saw those slow fluctuations in norepinephrine, blood volume, and cerebrospinal fluid almost completely stopped.”

No fluctuations meant the glymphatic system didn’t remove any waste. This was a serious issue, because one of the cellular waste products it is supposed to remove is amyloid beta, found in the brains of patients suffering from Alzheimer’s disease.

Hauglund speculates that zolpidem may induce a state very similar to sleep while shutting down important processes that happen during sleep. And while heavy zolpidem use has been associated with an increased risk of Alzheimer’s disease, it is not clear whether that increased risk arises because the drug inhibits oscillatory norepinephrine release in the brain. To better understand this, Hauglund wants to get a closer look at how the glymphatic system works in humans.

“We know we have the same wave-like fluid dynamics in the brain, so this could also drive the brain clearance in humans,” Hauglund told Ars. “Still, it’s very hard to look at norepinephrine in the human brain because we need an invasive technique to get to the tissue.”

But she said norepinephrine levels in people can be estimated based on indirect clues. One of them is pupil dilation and contraction, which work in synchrony with norepinephrine levels. Another clue may lie in microarousals—very brief, imperceptible awakenings which, Hauglund thinks, can be correlated with the brain-clearing mechanism. “I am currently interested in this phenomenon […]. Right now we have no idea why microarousals are there or what function they have,” Hauglund says.

But the last step she has on her roadmap is making better sleeping pills. “We need sleeping drugs that don’t have this inhibitory effect on the norepinephrine waves. If we can have a sleeping pill that helps people sleep without disrupting their sleep at the same time it will be very important,” Hauglund concludes.

Cell, 2025. DOI: 10.1016/j.cell.2024.11.027


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.



Trek FX+ 7S e-bike is a premium city commuter 

Post-pandemic, my creed became “Bicycles deliver the freedom that auto ads promise.” That belief is why I’ve almost exclusively used a bike to move myself around Portland, Oregon, ever since (yes, I have become a Portlandia stereotype).

However, that lifestyle is a lot more challenging without some pedal assistance. For a few summers, I showed up sweaty to appointments after pedaling on a $200 single-speed. So in 2024, I purchased the FX+ 2, based primarily on my managing editor’s review. It’s since been a workhorse for my daily transportation needs; I’ve put more than 1,000 miles on it in eight months.

So given my experience with that bike, I was the natural choice to review Trek’s upgraded version, the FX+ 7S.

A premium pedaler

First off, my time with the FX+ 2 has been great—no regrets about that purchase. But my one quibble is with the battery. Due to the frequency and length of my rides, I need to charge the bike often, and I sometimes experience range anxiety riding to the opposite side of town. Even though both e-bikes are considered lightweight at 40 pounds, they’re still not the easiest things to pedal sans assist, and I’m reliant on their built-in lighting systems after dark.

But I didn’t have to worry about my remaining charge with the FX+ 7 and its 360 Wh battery. Its extra capacity gives me much less range anxiety, as I can ride without fear of losing juice on the route home. And the LCD on the frame gives you a clear indicator of how much distance and time you have left in your ride, which is always handy. I would caution, however, against relying too much on the estimated distance remaining.


The LCD provides some useful info. You can see how much charge is left on the battery, or you can press that button to see your speed, wattage power, or miles ridden. Credit: Chris DeGraw

During a 15-mile, hour-long ride, switching between the first two assist levels I had modified, I drained 61 percent of the battery. While the estimated time remaining on my ride was consistent and accurate, the predicted mileage dropped occasionally, although that’s probably because I was changing the assist level frequently.
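For what it’s worth, a quick back-of-the-envelope calculation turns those numbers into an estimated range. The figures below simply reuse the 360 Wh rating and the 61 percent drain from the ride above, so treat the result as a rough illustration rather than a measured spec.

```python
# Rough range estimate from the ride described above.
# Assumptions: 360 Wh nominal pack, 61% drained over 15 miles at low assist levels.
battery_wh = 360
used_fraction = 0.61
miles_ridden = 15

wh_per_mile = battery_wh * used_fraction / miles_ridden   # about 14.6 Wh/mile
estimated_range = battery_wh / wh_per_mile                # about 24.6 miles per full charge

print(f"{wh_per_mile:.1f} Wh/mile, roughly {estimated_range:.0f} miles per charge")
```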



GM patents a dual-port charging system for EVs with vehicle-to-load

The battery system on an electric car can either charge—from regenerative braking or an external power supply—or discharge—powering the EV’s motor(s) or supplying that power via so-called vehicle-to-load. As a rule, it can’t do both at once, but General Motors has some thoughts about that. The patent analysis site CarMoses spotted a recent GM patent application for a system that is capable of charging and discharging simultaneously.

The patent describes a “charging system” with a pair of charging ports. One is for drawing power from an external source, just like every other EV. The second charge port is connected to a bi-directional charger, and the battery management system is able to charge the battery pack from the first port while also supplying power from the second port.

That second port could be used to charge another battery, including the battery of another EV, and the patent includes an illustration of three EVs daisy-chained to each other.

Credit: USPTO

The idea of two charge ports on an EV is not unheard of; Porsche’s Taycan (and the related Audi e-tron GT) has one on each side, and it’s an option on the newer PPE-based EVs from those brands, if I’m not mistaken. I have no idea whether GM’s patent will show up on a production EV—car companies patent many more ideas than they ever get around to building, after all.

And I must admit, I’m not entirely sure what the use case is beyond seeing how long of an EV-centipede you could make by plugging one into another into another, and so on. But I am intrigued.



AI #99: Farewell to Biden

The fun, as it were, is presumably about to begin.

And the break was fun while it lasted.

Biden went out with an AI bang. His farewell address warns of a ‘Tech-Industrial Complex’ and calls AI the most important technology of all time. And there were not one but two AI-related everything bagel concrete actions proposed – I say proposed because Trump could undo or modify either or both of them.

One attempts to build three or more ‘frontier AI model data centers’ on federal land, with timelines and plans I can only summarize with ‘good luck with that.’ The other move was new diffusion regulations on who can have what AI chips, an attempt to actually stop China from accessing the compute it needs. We shall see what happens.

  1. Table of Contents.

  2. Language Models Offer Mundane Utility. Prompt o1, supercharge education.

  3. Language Models Don’t Offer Mundane Utility. Why do email inboxes still suck?

  4. What AI Skepticism Often Looks Like. Look at all it previously only sort of did.

  5. A Very Expensive Chatbot. Making it anatomically incorrect is going to cost you.

  6. Deepfaketown and Botpocalypse Soon. Keep assassination agents underfunded.

  7. Fun With Image Generation. Audio generations continue not to impress.

  8. They Took Our Jobs. You can feed all this through o1 pro yourself, shall we say.

  9. The Blame Game. No, it is not ChatGPT’s fault that guy blew up a cybertruck.

  10. Copyright Confrontation. Yes, Meta and everyone else train on copyrighted data.

  11. The Six Million Dollar Model. More thoughts on how they did it.

  12. Get Involved. SSF, Anthropic and Lightcone Infrastructure.

  13. Introducing. ChatGPT can now schedule tasks for you. Yay? And several more.

  14. In Other AI News. OpenAI hiring to build robots.

  15. Quiet Speculations. A lot of people at top labs do keep predicting imminent ASI.

  16. Man With a Plan. PM Keir Starmer takes all 50 Matt Clifford recommendations.

  17. Our Price Cheap. Personal use of AI has no meaningful environmental impact.

  18. The Quest for Sane Regulations. Wiener reloads, Amodei genuflects.

  19. Super Duper Export Controls. Biden proposes export controls with complex teeth.

  20. Everything Bagel Data Centers. I’m sure this ‘NEPA’ thing won’t be a big issue.

  21. d/acc Round 2. Vitalik Buterin reflects on a year of d/acc.

  22. The Week in Audio. Zuckerberg on Rogan, and several sound bites.

  23. Rhetorical Innovation. Ultimately we are all on the same side.

  24. Aligning a Smarter Than Human Intelligence is Difficult. OpenAI researcher.

  25. Other People Are Not As Worried About AI Killing Everyone. Give ‘em hope.

  26. The Lighter Side. Inventing the wheel.

Help dyslexics get around their inability to spell so they can succeed in school, and otherwise help kids with disabilities. Often, we have ways to help everyone, but our civilization is willing to permit them for people who are ‘behind’ or ‘disadvantaged’ or ‘sick’ but not to help the average person become great – if it’s a problem everyone has, how dare you try to solve it. Well, you do have to start somewhere.

Diagnose medical injuries. Wait, Elon Musk, maybe don’t use those exact words?

The original story that led to that claim is here from AJ Kay. The doctor and radiologist said her daughter was free of breaks, but Grok found what it called an ‘obvious’ fracture line. They went to a wrist specialist, who found it, confirmed it was obvious, and cast it, which they say likely avoided surgery.

Used that way, LLMs seem insanely great versus doing nothing. You use them as an error check and second opinion. If they see something, you go follow up with a doctor to verify. I’d go so far as to say that if you have a diagnostic situation like this and you feel any uncertainty, and you don’t do at least this, that seems irresponsible.

A suggested way to prompt o1 (and o1 Pro especially):

Greg Brockman: o1 is a different kind of model. great performance requires using it in a new way relative to standard chat models.

Dan Mac: This is an amazing way to think about prompting o1 from @benhylak.

Ben Hylak: Don’t write prompts; write briefs. Give a ton of context. Whatever you think I mean by a “ton” — 10x that.

In short, treat o1 like a new hire. Beware that o1’s mistakes include reasoning about how much it should reason.

Once you’ve stuffed the model with as much context as possible — focus on explaining what you want the output to be.

This requires you to really know exactly what you want (and you should really ask for one specific output per prompt — it can only reason at the beginning!)

What o1 does well: Perfectly one-shotting entire/multiple files, hallucinating less, medical diagnosis (including for use by professionals), explaining concepts.

What o1 doesn’t do well: Writing in styles, building entire apps.
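To make the ‘write briefs, not prompts’ advice concrete, here is a minimal sketch using the OpenAI Python SDK. The model name, the brief contents, and the requested output are placeholder assumptions for illustration; the only point is the shape, a long context dump followed by one specific requested output.

```python
# Minimal "brief, not prompt" sketch (model name and brief contents are placeholder assumptions).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

brief = """
GOAL: One specific output: a migration plan from nightly batch jobs to streaming ingestion.

CONTEXT (the 'ton of context' part; paste everything relevant, then 10x it):
- Current architecture notes, schemas, constraints, past incident reports, team conventions...

OUTPUT FORMAT: A numbered plan with risks and rollback steps. No code yet.
"""

response = client.chat.completions.create(
    model="o1",  # or whichever reasoning model tier you have access to
    messages=[{"role": "user", "content": brief}],
)
print(response.choices[0].message.content)
```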

Another strategy is to first have a conversation with Claude Sonnet, get a summary, and use it as context (Rohit also mentions GPT-4o, which seems strictly worse here but you might not have a Claude subscription). This makes a lot of sense, especially when using o1 Pro.

Alternate talking with o1 and Sonnet when talking through ideas; Gallabytes reports finding this helpful.

The streams are crossing: Joe Weisenthal is excited that Claude can run and test out its own code for you.

People on the internet sometimes lie, especially about cheating, film at 11. But also the future is highly unevenly distributed, and hearing about something is different from appreciating it.

Olivia Moore: Absolutely no way that almost 80% of U.S. teens have heard of ChatGPT, but only 26% use it for homework 👀

Sully: if i was a teen using chatgpt for homework i would absolutely lie.

Never? No, never. What, never? Well, actually all the time.

I also find it hard to believe that students are this slow, especially given this is a very low bar – it’s whether you even once asked for ‘help’ at all, in any form. Whereas ChatGPT has 300 million users.

When used properly, LLMs are clearly amazingly great at education.

Ethan Mollick: New randomized, controlled trial of students using GPT-4 as a tutor in Nigeria. 6 weeks of after-school AI tutoring = 2 years of typical learning gains, outperforming 80% of other educational interventions.

And it helped all students, especially girls who were initially behind.

No working paper yet, but the results and experiment are written up here. They used Microsoft Copilot and teachers provided guidance and initial prompts.

To make clear the caveats for people who don’t read the post: learning gains are measured in Equivalent Years of Schooling, this is a pilot study on narrow topics, and they do not have long-term learning measures. And there is no full paper yet (but the team is credible).

World Bank Blogs: The learning improvements were striking—about 0.3 standard deviations. To put this into perspective, this is equivalent to nearly two years of typical learning in just six weeks.

What does that say about ‘typical learning’? A revolution is coming.

Sully suggests practical improvements for Claude’s web app to increase engagement. Agreed that they should improve artifacts and include a default search tool. The ability to do web search seems super important. The ‘feel’ issue he raises doesn’t bother me.

Use a [HIDDEN][/HIDDEN] tag you made up to play 20 questions with Claude, see what happens.

Straight talk: Why do AI functions of applications like GMail utterly suck?

Nabeel Qureshi: We have had AI that can type plausible replies to emails for at least 24 months, but when I open Outlook or Gmail I don’t have pre-written drafts of all my outstanding emails waiting for me to review yet. Why are big companies so slow to ship these obvious features?

The more general version of this point is also striking – I don’t use any AI features at all in my usual suite of “pre-ChatGPT” products.

For meetings, most people (esp outside of tech) are still typing “Sure, I’d love to chat! Here are three free slots over the next few days (all times ET)”, all of which is trivially automated by LLMs now.

(If even tech companies are this slow to adjust, consider how much slower the adjustment in non-tech sectors will be…).

I know! What’s up with that?

Cyberpunk Plato: Doing the compute for every single email adds up fast. Better to have the user request it if they want it.

And at least for business software there’s a concern that if it’s built in you’re liable for it being imperfect. Average user lacks an understanding of limitations.

Nabeel Qureshi: Yeah – this seems plausibly it.

I remember very much expecting this sort of thing to be a big deal; then the features sort of showed up, but so far they are universally terrible and useless.

I’m going to go ahead and predict that at least the scheduling problem will change in 2025 (although one can ask why they didn’t do this feature in 2015). As in, if you have an email requesting a meeting, GMail will offer you an easy way (a button, a short verbal command, etc) to get an AI to do the meeting scheduling for you, at minimum drafting the email for you, and probably doing the full stack back and forth and creating the eventual event, with integration with Google Calendar and a way of learning your preferences. This will be part of the whole ‘year of the agent’ thing.

For the general issue, it’s a great question. Why shouldn’t GMail be drafting your responses in advance, at least if you have a subscription that pays for the compute and you opt in, giving you much better template responses that also have your context? Is it that hard to anticipate the things you might write?

I mostly don’t want to actually stop to tell the AI what to write at current levels of required effort – by the time I do that I might as well have written it. It needs to get to a critical level of usefulness, then you can start customizing and adapting from there.

If 2025 ends and we still don’t have useful features of these types, we’ll want to rethink.

What we don’t have are good recommendation engines, even locally, certainly not globally.

Devon Hardware’s Wife: should be a letterboxd app but it is for every human experience. i could log in and see a friend has recently reviewed “having grapes”. i could go huh they liked grapes more than Nosferatu

Joe Weisenthal: What I want is an everything recommendation app. So if I say I like grapes and nosferatu, it’ll tell me what shoes to buy.

Letterboxd doesn’t even give you predictions for your rating of other films, seriously, what is up with that?

Robin Hanson: A bad sign for LLM applications.

That sign: New Scientist comes home (on January 2, 2025):

New Scientist: Multiple experiments showed that four leading large language models often failed in patient discussions to gather complete histories, the best only doing so 71% of the time, and even then they did not always get the correct diagnosis.

New Scientist’s Grandmother: o1, Claude Sonnet and GPT-4o, or older obsolete models for a paper submitted in August 2023?

New Scientist, its head dropping in shame: GPT-3.5 and GPT-4, Llama-2-7B and Mistral-v2-7B for a paper submitted in August 2023.

Also there was this encounter:

New Scientist, looking like Will Smith: Can an AI always get a complete medical history and the correct diagnosis from talking to a patient?

GPT-4 (not even 4o): Can you?

New Scientist: Time to publish!

It gets better:

If an AI model eventually passes this benchmark, consistently making accurate diagnoses based on simulated patient conversations, this would not necessarily make it superior to human physicians, says Rajpurkar. He points out that medical practice in the real world is “messier” than in simulations. It involves managing multiple patients, coordinating with healthcare teams, performing physical exams and understanding “complex social and systemic factors” in local healthcare situations.

“Strong performance on our benchmark would suggest AI could be a powerful tool for supporting clinical work – but not necessarily a replacement for the holistic judgement of experienced physicians,” says Rajpurkar.

I love the whole ‘holistic judgment means we should overrule the AI with human judgment even though the studies are going to find that doing this makes outcomes on average worse’ which is where we all know that is going. And also the ‘sure it will do [X] better but there’s some other task [Y] and it will never do that, no no!’

The core idea here is actually pretty good – that you should test LLMs for real medical situations by better matching real medical situations and their conditions. They do say the ‘patient AI’ and ‘grader AI’ did remarkably good jobs here, which is itself a test of AI capabilities as well. They don’t seem to offer a human baseline measurement, which seems important to knowing what to do with all this.

And of course, we have no idea if there was opportunity to radically improve the results with better prompt engineering.

I do know that I predict that o3-mini or o1-pro, with proper instructions, will match or exceed human baseline (the median American practicing doctor) for gathering a complete medical history. And I would expect it to also do so for diagnosis.

I encourage one reader to step up, email them for the code (the author emails are listed in the paper) and then test at least o1.

This is Aria, Realbotix’s flagship AI-powered humanoid robot ‘with a social media presence’. Part 2 of the interview is here. You can get a ‘full bodied robot’ starting at $175,000.

They claim that social robots will be even bigger than functional robots, and aim to have their robots not only ‘learn about and help promote your brand’ but also learn everything about you and help ‘with the loneliness epidemic among adolescents and teenagers and bond with you.’

And yes they use the ‘boyfriend or girlfriend’ words. You can swap faces in 10 seconds, if you want more friends or prefer polyamory.

It has face and voice recognition, and you can plug in whatever AI you like – they list Anthropic, OpenAI, DeepMind, Stability and Meta on their website.

It looks like this:

Its movements in the video are really weird, and worse than not moving at all if you exclude the lips moving as she talks. They’re going to have to work on that.

Yes, we all know a form of this is coming, and soon. And yes, these are the people from Whitney Cummings’ pretty funny special Can I Touch It? so I can confirm that the answer to ‘can I?’ can be yes if you want it to be.

But for Aria the answer is no. For a yes and true ‘adult companionship’ you have to go to their RealDoll subdivision. On the plus side, that division is much cheaper, starting at under $10k and topping out at ~$50k.

I had questions, so I emailed their press department, but they didn’t reply.

My hunch is that the real product is the RealDoll, and what you are paying the extra $100k+ for with Aria is a little bit extra mobility and such but mostly so that it does have those features so you can safely charge it to your corporate expense account, and perhaps so you and others aren’t tempted to do something you’d regret.

Pliny the Liberator claims to have demonstrated a full-stack assassination agent that would, if given funds, have been capable of ‘unaliving people,’ with Claude Sonnet 3.6 being willing to select real-world targets.

Introducing Astral, an AI marketing agent. It will navigate through the standard GUI websites like Reddit and soon TikTok and Instagram, and generate ‘genuine interactions’ across social websites to promote your startup business, in closed beta.

Matt Palmer: At long last, we have created the dead internet from the classic trope “dead internet theory.”

Tracing Woods: There is such a barrier between business internet and the human internet.

On business internet, you can post “I’ve built a slot machine to degrade the internet for personal gain” and get a bunch of replies saying, “Wow, cool! I can’t wait to degrade the internet for personal gain.”

It is taking longer than I expected for this type of tool to emerge, but it is coming. This is a classic situation where various frictions were preserving our ability to have nice things like Reddit. Without those frictions, we are going to need new ones. Verified identity or paid skin in the game, in some form, is the likely outcome.

Out with the old, in with the new?

Janel Comeau: sort of miss the days when you’d tweet “I like pancakes” and a human would reply “oh, so you hate waffles” instead of twelve AI bots responding with “pancakes are an enjoyable food”

Instagram ads are the source of 90% of traffic for a nonconsensual nudity app, Crushmate or Crush AI, with the ads themselves featuring such nonconsensual nudity of celebrities such as Sophie Rain. I did a brief look-see at the app’s website. They have a top scroll saying ‘X has just purchased,’ which is what individual struggling creators do, so it’s probably 90% of not very much, and when you’re ads-driven you choose where the ads go. But it’s weird, given what other ads don’t get approved, that they can get this level of explicit content past the filters. The ‘nonconsensual nudity’ seems like a side feature of a general AI-image-and-spicy-chat set of offerings, including a number of wholesome offerings too.

AI scams are still rare, and mostly get detected, but it’s starting modulo the lizardman constant issue:

Richard Hanania notes that the bot automatic social media replies are getting better, but says ‘you can still tell something is off here.’ I did not go in unanchored, but this does not seem as subtle as he makes it out to be; his example might as well scream AI-generated:

My prior on ‘that’s AI’ is something like 75% by word 4, 95%+ after the first sentence. Real humans don’t talk like that.

I also note that it seems fairly easy to train an AI classifier to do what I instinctively did there, and catch things like this with very high precision. If it accidentally catches a few college undergraduates trying to write papers, I notice my lack of sympathy.
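As a sketch of how little it might take, here is a toy bag-of-words classifier along those lines using scikit-learn. The handful of example replies and labels are made up, and a real detector would need a large labeled corpus and a stronger model; this only illustrates the shape of the approach.

```python
# Toy sketch of an "RLHF-speak" reply detector (tiny made-up dataset; illustration only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder examples: 1 = AI-sounding reply, 0 = human reply.
texts = [
    "Great point! It's so important to consider multiple perspectives on this issue.",
    "I appreciate you sharing this. Collaboration and innovation are key moving forward.",
    "lol no, that take is terrible and you know it",
    "idk, my inbox still sucks and nothing ships",
]
labels = [1, 1, 0, 0]

detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

# Probability that a new reply reads as AI-generated, per this toy model.
print(detector.predict_proba(["Thank you for this insightful thread, it truly resonates!"])[0][1])
```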

But that’s a skill issue, and a choice. The reason Aiman’s response is so obvious is that it has exactly that RLHF-speak. One could very easily fine tune in a different direction, all the fine tuning on DeepSeek v3 was only five figures in compute and they give you the base model to work with.

Richard Hanania: The technology will get better though. We’ll eventually get to the point that if your account is not connected to a real person in the world, or it wasn’t grandfathered in as an anonymous account, people will assume you’re a bot because there’s no way to tell the difference.

That will be the end of the ability to become a prominent anonymous poster.

I do continue to expect things to move in that direction, but I also continue to expect there to be ways to bootstrap. If nothing else, there is always money. This isn’t flawless, as Elon Musk has found out with Twitter, but it should work fine, so long as you reintroduce sufficient friction and skin in the game.

The ability to elicit the new AI generated song Six Weeks from AGI causes Steve Sokolowski to freak out about potential latent capabilities in other AI models. I find it heavily mid to arrive at this after a large number of iterations and amount of human attention, especially in terms of its implications, but I suppose it’s cool you can do that.

Daron Acemoglu is economically highly skeptical of and generally against AI. It turns out this isn’t about the A, it’s about the I, as he offers remarkably related arguments against H-1B visas and high skilled human immigration.

The arguments here are truly bizarre. First he says if we import people with high skills, then this may prevent us from training our own people with high skills, And That’s Terrible. Then he says, if we import people with high skills, we would have more people with high skills, And That’s Terrible as well because then technology will change to favor high-skilled workers. Tyler Cowen has o1 and o1 pro respond, as a meta-commentary on what does and doesn’t constitute high skill these days.

Tyler Cowen: If all I knew were this “exchange,” I would conclude that o1 and o1 pro were better economists — much better — than one of our most recent Nobel Laureates, and also the top cited economist of his generation. Noah Smith also is critical.

Noah Smith (after various very strong argument details): So Acemoglu wants fewer H-1bs so we have more political pressure for domestic STEM education. But he also thinks having more STEM workers increases inequality, by causing inventors to focus on technologies that help STEM workers instead of normal folks! These two arguments clearly contradict each other.

In other words, it seems like Acemoglu is grasping for reasons to support a desired policy conclusion, without noticing that those arguments are inconsistent. I suppose “finding reasons to support a desired policy conclusion” is kind of par for the course in the world of macroeconomic theory, but it’s not a great way to steer national policy.

Noah Smith, Tyler Cowen and o1 are all highly on point here.

In terms of AI actually taking our jobs, Maxwell Tabarrok reiterates his claim that comparative advantage will ensure human labor continues to have value, no matter how advanced and efficient AI might get, because there will be a limited supply of GPUs, datacenters and megawatts, and advanced AIs will face constraints, even if they could do all tasks humans could do more efficiently (in some senses) than we can.

I actually really like Maxwell’s thread here, because it’s a simple, short, clean and within its bounds valid version of the argument.

His argument successfully shows that, absent transaction costs and the literal cost of living, assuming humans have generally livable conditions with the ability to protect their private property and engage in trade and labor, and given some reasonable additional assumptions not worth getting into here, human labor outputs will retain positive value in such a world.

He shows this value would likely converge to some number higher than zero, probably, for at least a good number of people. It definitely wouldn’t be all of them, since it already isn’t, there are many ZMP (zero marginal product) workers you wouldn’t hire at $0.

Except we have no reason to think that number is all that much higher than $0. And then you have to cover not only transaction costs, but the physical upkeep costs of providing human labor, especially to the extent those inputs are fungible with AI inputs.

Classically, we say ‘the AI does not love you, the AI does not hate you, but you are made of atoms it can use for something else.’ In addition to the atoms that compose you, you require sustenance of various forms to survive, especially if you are to live a life of positive value, and also to include all-cycle lifetime costs.

Yes, in such scenarios, the AIs will be willing to pay some amount of real resources for our labor outputs, in trade. That doesn’t mean this amount will be enough to pay for the inputs to those outputs. I see no reason to expect that it would clear the bar of the Iron Law of Wages, or even near-term human upkeep.

This is indeed what happened to horses. Marginal benefit mostly dropped below marginal cost, the costs to maintain horses were fungible with paying costs for other input factors, so quantity fell off a cliff.

Seb Krier says a similar thing in a different way, noticing that AI agents can be readily cloned, so at the limit for human labor to retain value you need to be sufficiently compute constrained that there are sufficiently valuable tasks left for humans to do. Which in turn relies on non-fungibility of inputs, allowing you to take the number of AIs and humans as given.

Davidad: At equilibrium, in 10-20 years, the marginal price of nonphysical labour could be roughly upper-bounded by rent for 0.2m² of arid land, £0.02/h worth of solar panel, and £0.08/h worth of GPU required to run a marginal extra human-equivalent AI agent.

For humans to continue to be able to survive, they need to pay for themselves. In these scenarios, doing so off of labor at fair market value seems highly unlikely. That doesn’t mean the humans can’t survive. As long as humans remain in control, this future society is vastly wealthier and can afford to do a lot of redistribution, which might include reserving fake or real jobs and paying non-economic wages for them. It’s still a good thing, I am not against all this automation (again, if we can do so while retaining control and doing sufficient redistribution). The price is still the price.

One thing AI algorithms never do is calculate p-values, because why would they?

The Verge’s Richard Lawler reports that Las Vegas police have released ChatGPT logs from the suspect in the Cybertruck explosion. We seem to have his questions but not the replies.

It seems like… the suspect used ChatGPT instead of Google, basically?

Here’s the first of four screenshots:

Richard Lawler (The Verge): Trying the queries in ChatGPT today still works, however, the information he requested doesn’t appear to be restricted and could be obtained by most search methods.

Still, the suspect’s use of a generative AI tool and the investigators’ ability to track those requests and present them as evidence take questions about AI chatbot guardrails, safety, and privacy out of the hypothetical realm and into our reality.

The Spectator Index: BREAKING: Person who blew up Tesla Cybertruck outside Trump hotel in Las Vegas used ChatGPT to help in planning the attack.

Spence Purnell: PSA: Tech is not responsible for horrible human behavior, and regulating it will not stop bad actors.

There are certainly steps companies can take and improvements to be made, but let’s not blame the tech itself.

Colin Fraser: The way cops speak is so beautiful.

[He quotes]: Police Sheriff Kevin McMahill said: “I think this is the first incident that I’m aware of on U.S. soil where ChatGPT is utilized to help an individual build a particular device.”

When you look at the questions he asked, it is pretty obvious he is planning to build a bomb, and an automated AI query that (for privacy reasons) returned one bit of information would give you that information without many false positives. The same is true of the Google queries of many suspects after they get arrested.

None of this is information that would have been hard to get via Google. ChatGPT made his life modestly easier, nothing more. I’m fine with that, and I wouldn’t want ChatGPT to refuse such questions, although I do think ‘we can aspire to do better’ here in various ways.

And in general, yes, people like cops and reporters are way too quick to point to the tech involved, such as ChatGPT, or to the cybertruck, or the explosives, or the gun. Where all the same arguments are commonly made, and are often mostly or entirely correct.

But not always. It is common to hear highly absolutist responses, like the one by Purnell above, that regulation of technology ‘will not stop bad actors’ and thus would have no effect. That is trying to prove too much. Yes, of course you can make life harder for bad actors, and while you won’t stop all of them entirely and most of the time it totally is not worth doing, you can definitely reduce your expected exposure.

This example does provide a good exercise, where hopefully we can all agree this particular event was fine if not ideal, and ask what elements would need to change before it was actively not fine anymore (as opposed to ‘we would ideally like you to respond noticing what is going on and trying to talk him out of it’ or something). What if the device was non-conventional? What if it more actively helped him engineer a more effective device in various ways? And so on.

Zuckerberg signed off on Meta training on copyrighted works, oh no. Also, they used illegal torrents to download works for training, which does seem not so awesome I suppose, but yes, of course everyone is training on all the copyrighted works.

What is DeepSeek v3’s secret? Did they really train this thing for $5.5 million?

China Talk offers an analysis. The answer is: Yes, but in other ways no.

The first listed secret is that DeepSeek has no business model. None. We’re talking about sex-in-the-champagne-room levels of no business model. They release models, sure, but not to make money, and they also don’t raise capital. This allows focus. It is classically a double-edged sword, since profit is a big motivator, and of course this is why DeepSeek was on a limited budget.

The other two secrets go together: They run their own datacenters, own their own hardware and integrate all their hardware and software together for maximum efficiency. And they made this their central point of emphasis, and executed well. This was great at pushing the direct quantities of compute involved down dramatically.

The trick is, it’s not so cheap or easy to get things that efficient. When you rack your own servers, you get reliability and confidentiality and control and ability to optimize, but in exchange your compute costs more than when you get it from a cloud service.

Jordan Schneider and Lily Ottinger: A true cost of ownership of the GPUs — to be clear, we don’t know if DeepSeek owns or rents the GPUs — would follow an analysis similar to the SemiAnalysis total cost of ownership model (paid feature on top of the newsletter) that incorporates costs in addition to the actual GPUs. For large GPU clusters of 10K+ A/H100s, line items such as electricity end up costing over $10M per year. The CapEx on the GPUs themselves, at least for H100s, is probably over $1B (based on a market price of $30K for a single H100).

With headcount costs that can also easily be over $10M per year, estimating the cost of a year of operations for DeepSeek AI would be closer to $500M (or even $1B+) than any of the $5.5M numbers tossed around for this model.

Since they used H800s, not H100s, you’ll need to adjust that, but the principle is similar. Then you have to add on the cost of the team and its operations, to create all these optimizations and reach this point. Getting the core compute costs down is still a remarkable achievement, and it raises big governance questions and challenges whether we can rely on export controls. Kudos to all involved. But this approach has its own challenges.
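For intuition on how such totals get assembled, here is a toy version of that kind of total-cost-of-ownership arithmetic. Every number below (cluster size, unit price, amortization period, power, headcount) is an illustrative assumption, not a figure from SemiAnalysis, DeepSeek, or the quote above.

```python
# Toy total-cost-of-ownership sketch (all numbers are illustrative assumptions, not reported figures).
gpus             = 10_000        # assumed cluster size
price_per_gpu    = 30_000        # assumed unit price, USD
amortize_years   = 4             # assumed useful life of the hardware

capex            = gpus * price_per_gpu        # $300M up front under these assumptions
capex_per_year   = capex / amortize_years      # $75M/year amortized
electricity_year = 10_000_000                  # assumed power + cooling per year
headcount_year   = 10_000_000                  # assumed research/engineering salaries per year

annual_cost = capex_per_year + electricity_year + headcount_year
print(f"~${annual_cost/1e6:.0f}M per year of operations")  # far above a $5.5M headline number
```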

The alternative hypothesis does need to be said, especially after someone at a party outright claimed it was obviously true, and with the general consensus that the previous export controls were not all that tight. That alternative hypothesis is that DeepSeek is lying and actually used a lot more compute and chips it isn’t supposed to have. I can’t rule it out.

Survival and Flourishing Fund is hiring a Full-Stack Software Engineer.

Anthropic’s Alignment Science team suggests research directions. Recommended.

We’re getting to the end of the fundraiser for Lightcone Infrastructure, and they’re on the bubble of where they have sufficient funds versus not. You can donate directly here.

A very basic beta version of ChatGPT tasks, or, according to my 4o instance, GPT-S, which I presume stands for scheduler. You can ask it to schedule actions in the future, either once or recurring. It will provide the phone notifications. You definitely weren’t getting enough phone notifications.

Anton: They turned the agi into a todo list app 🙁

They will pay for this.

Look how they rlhf’d my boy :'(

It looks like they did this via scheduling function calls based on the iCal VEVENT format, claimed instruction set here. Very basic stuff.
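For reference, a scheduled task in that format is just a small block of text. Below is a minimal sketch of what a daily recurring VEVENT looks like, built in Python; the summary and the time are made-up examples following the standard RFC 5545 shape, not OpenAI’s actual instruction set.

```python
# Minimal sketch of an iCal VEVENT for a recurring task (RFC 5545 shape; contents are made-up examples).
from datetime import datetime, timezone

def daily_task_vevent(summary: str, hour_utc: int) -> str:
    start = datetime.now(timezone.utc).replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    return "\r\n".join([
        "BEGIN:VEVENT",
        f"DTSTART:{start.strftime('%Y%m%dT%H%M%SZ')}",
        "RRULE:FREQ=DAILY",   # repeat every day
        f"SUMMARY:{summary}",
        "END:VEVENT",
    ])

print(daily_task_vevent("Generate news briefing", hour_utc=13))
```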

In all seriousness, incorporating a task scheduler by itself, in the current state of available other resources, is a rather limited tool. You can use it for reminders and timers, and perhaps it is better than existing alternatives for that. You can use it to ‘generate news briefing’ or similarly check the web for something. When this gets more integrations, and broader capability support over time, that’s when this gets actually interesting.

The initial thing that might be interesting right away is to do periodic web searches for potential information, as a form of Google Alerts with more discernment. Perhaps keep an eye on things like concerts and movies playing in the area. The basic problem is that right now this new assistant doesn’t have access to many tools, and it doesn’t have access to your context, and I expect it to flub complicated tasks.

GPT-4o agreed that most of the worthwhile uses require integrations that do not currently exist.

For now, the product is not reliably causing tasks to fire. That’s an ordinary first-day engineering problem that I assume gets fixed quickly, if it hasn’t already. But until it can do more complex things or integrate the right context automatically, ideally both, we don’t have much here.

I would note that you mostly don’t need to test the task scheduler by scheduling a task. We can count on OpenAI to get ‘cause this to happen at time [X]’ correct soon enough. The question is, can GPT-4o do [X] at all? Which you can test by telling it to do [X] now.

Reddit Answers, an LLM-based search engine. Logging in gets you 20 questions a day.

ExoRoad, a fun little app where you describe your ideal place to live and it tells you what places match that.

Lightpage, a notes app that then uses AI that remembers all of your notes and prior conversations. And for some reason it adds in personalized daily inspiration. I’m curious to see such things in action, but the flip side of the potential lock-in effects is the startup cost. Until you’ve taken enough notes to give this context, it can’t do the task it wants to do, so this only makes sense if you don’t mind taking tons of notes ‘out of the gate’ without the memory features, or if it could import memory and context. And presumably this wants to be a Google, Apple or similar product, so the notes integrate with everything else.

Shortwave, an AI email app which can organize and manage your inbox.

Writeup of details of WeirdML, a proposed new benchmark I’ve mentioned before.

Summary of known facts in Suchir Balaji’s death, author thinks 96% chance it was a suicide. The police have moved the case to ‘Open and Active Investigation.’ Good. If this wasn’t foul play, we should confirm that.

Nothing to see here, just OpenAI posting robotics hardware roles to ‘build our robots.’

Marc Andreessen has been recruiting and interviewing people for positions across the administration including at DoD (!) and intelligence agencies (!!). To the victor go the spoils, I suppose.

Nvidia to offer $3,000 personal supercomputer with a Blackwell chip, capable of running AI models up to 200B parameters.

An ‘AI hotel’ and apartment complex is coming to Las Vegas in May 2025. Everything works via phone app, including door unlocks. Guests get onboarded and tracked, and are given virtual assistants called e-butlers, to learn guest preferences including things like lighting and temperature, and give guests rooms (and presumably other things) that match their preferences. They then plan to expand the concept globally, including in Dubai. Prices sound steep, starting at $300 a night for a one bedroom. What will this actually get you? So far, it seems unclear.

I see this as clearly going in a good direction, but I worry it isn’t ready. Others see it as terrible that capitalism knows things about them, but in most contexts I find that capitalism knowing things about me is to my benefit, and this seems like an obvious example and a win-win opportunity, as Ross Rheingans-Yoo notes?

Tyler Cowen: Does it know I want a lot of chargers, thin pillows, and lights that are easy to turn off at night? Furthermore the shampoo bottle should be easy to read in the shower without glasses. Maybe it knows now!

I’ve talked about it previously, but I want full blackout at night, either true silence or convenient white noise that fixes this, thick pillows and blankets, lots of chargers, a comfortable chair and desk, an internet-app-enabled TV and some space in a refrigerator and ability to order delivery right to the door. If you want to blow my mind, you can have a great multi-monitor setup to plug my laptop into and we can do real business.

Aidan McLau joins OpenAI to work on model design, offers to respond if anyone has thoughts on models. I have thoughts on models.

To clarify what OpenAI employees are often saying about superintelligence (ASI): No, they are not dropping hints that they currently have ASI internally. They are saying that they know how to build ASI internally, and are on a path to soon doing so. You of course can choose the extent to which you believe them.

Ethan Mollick writes Prophecies of the Flood, pointing out that the three major AI labs all have people shouting from the rooftops that they are very close to AGI and they know how to build it, in a way they didn’t until recently.

As Ethan points out, we are woefully unprepared. We’re not even preparing reasonably for the mundane things that current AIs can do, in either the sense of preparing for risks, or in the sense of taking advantage of its opportunities. And almost no one is giving much serious thought to what the world full of AIs will actually look like and what version of it would be good for humans, despite us knowing such a world is likely headed our way. That’s in addition to the issue that these future highly capable systems are existential risks.

Gary Marcus predictions for the end of 2025, a lot are of the form ‘[X] will continue to haunt generative AI’ without reference to magnitude. Others are predictions that we won’t cross some very high threshold – e.g. #16 is ‘Less than 10% of the workforce will be replaced by AI, probably less than 5%’; notice how dramatically higher a bar that is than, for example, Tyler Cowen’s 0.5% RGDP growth, and this is only in 2025.

His lower-confidence predictions start to become aggressive and specific enough that I expect them to often be wrong (e.g. I expect a ‘GPT-5 level’ model no matter what we call that, and I expect AI companies to outperform the S&P and o3 to see adoption).

Eli Lifland gives his predictions and evaluates some past ones. He was too optimistic on agents being able to do routine computer tasks by EOY 2024, although I expect to get to his thresholds this year. While all three of us agree that AI agents will be ‘far from reliable’ for non-narrow tasks (Gary’s prediction #9) I think they will be close enough to be quite useful, and that most humans are ‘not reliable’ in this sense.

He’s right of course, and this actually did update me substantially on o3?

Sam Altman: prediction: the o3 arc will go something like:

1. “oh damn it’s smarter than me, this changes everything ahhhh”

2. “so what’s for dinner, anyway?”

3. “can you believe how bad o3 is? and slow? they need to hurry up and ship o4.”

swag: wait o1 was smarter than me.

Sam Altman: That’s okay.

The scary thing about not knowing is the right tail where something like o3 is better than you think it is. This is saying, essentially, that this isn’t the case? For now.

Please take the very consistently repeated claims from the major AI labs about both the promise and danger of AI both seriously and literally. They believe their own hype. That doesn’t mean you have to agree with those claims. It is very reasonable to think these people are wrong, on either or both counts, and they are biased sources. I am however very confident that they themselves believe what they are saying in terms of expected future AI capabilities, and when they speak about AI existential risks. I am also confident they have important information that you and I do not have, that informs their opinions.

This of course does not apply to claims regarding a company’s own particular AI application or product. That sort of thing is always empty hype until proven otherwise.

Via MR, speculations on which traits will become more versus less valuable over time. There is an unspoken background assumption here that mundane-AI is everywhere and automates a lot of work but doesn’t go beyond that. A good exercise, although I am not in agreement on many of the answers even conditional on that assumption. I especially worry about conflation of rarity with value – if doing things in real life gets rare or being skinny becomes common, that doesn’t tell you much about whether they rose or declined in value. Another through line here is an emphasis on essentially an ‘influencer economy’ where people get value because others listen to them online.

Davidad revises his order-of-AI-capabilities expectations.

Davidad: Good reasons to predict AI capability X will precede AI capability Y:

  1. Effective compute requirements for X seem lower

  2. Y needs new physical infrastructure

Bad reasons:

  1. It sounds wild to see Y as possible at all

  2. Y seems harder to mitigate (you need more time for that!)

Because of the above biases, I previously predicted this rough sequence of critically dangerous capabilities:

  1. Constructing unstoppable AI malware

  2. Ability to plan and execute a total coup (unless we build new defenses)

  3. Superpersuasion

  4. Destabilizing economic replacement

Now, my predicted sequencing of critically dangerous AI capabilities becoming viable is more like:

  1. Superpersuasion/parasitism

  2. Destabilizing economic replacement

  3. Remind me again why the AIs would benefit from attempting an overt coup?

  4. Sure, cyber, CBRN, etc., I guess

There’s a lot of disagreement about order of operations here.

That’s especially true on persuasion. A lot of people think persuasion somehow tops out at exactly human level, and AIs won’t ever be able to do substantially better. The human baseline for persuasion is sufficiently low that I can’t convince them otherwise, and they can’t even convey to me reasons for this that make sense to me. I very much see AI super-persuasion as inevitable, but I’d be very surprised by Davidad’s order of this coming in a full form worthy of its name before the others.

A lot of this is a matter of degree. Presumably we get a meaningful amount of all the three non-coup things here before we get the ‘final form’ or full version of any of them. If I had to pick one thing to put at the top, it would probably be cyber.

The ‘overt coup’ thing is a weird confusion. Not that it couldn’t happen, but that most takeover scenarios don’t work like that and don’t require it, I’m choosing not to get more into that right here.

Ajeya Cotra: Pretty different from my ordering:

1. Help lay ppl make ~known biothreats.

2. Massively accelerate AI R&D, making 3-6 come faster.

3. Massively accelerate R&D on worse biothreats.

4. Massive accelerate other weapons R&D.

5. Outright AI takeover (overpower humans combined).

There is no 6 listed, which makes me love this Tweet.

Ajeya Cotra: I’m not sure what level of persuasion you’re referring to by “superpersuasion,” but I think AI systems will probably accelerate R&D before they can reliably sweet-talk arbitrary people into taking actions that go massively against their interests.

IMO a lot of what people refer to as “persuasion” is better described as “negotiation”: if an AI has *hard leverage* (eg it can threaten to release a bioweapon if we don’t comply), then sure, it can be very “persuasive”

But concretely speaking, I think we get an AI system that can make bioweapons R&D progress 5x faster before we get one that can persuade a randomly selected individual to kill themselves just by talking to them.

Gwern points out that if models like first o1 and then o3, and also the unreleased Claude Opus 3.6, are used primarily to create training data for other more distilled models, the overall situation still looks a lot like the old paradigm. You put in a ton of compute to get first the new big model and then to do the distillation and data generation. Then you get the new smarter model you want to use.

The biggest conceptual difference might be that to the extent the compute used is inference, this allows you to use more distributed sources of compute more efficiently, making compute governance less effective? But the core ideas don’t change that much.

I also note that everyone is talking about synthetic data generation from the bigger models, but no one is talking about feedback from the bigger models, or feedback via deliberation of reasoning models, especially in deliberate style rather than preference expression. Especially for alignment but also for capabilities, this seems like a big deal? Yes, generating the right data is important, especially if you generate it where you know ‘the right answer.’ But this feels like it’s missing the true potential on offer here.

This also seems important:

Ryan Kidd: However, I expect RL on CoT to amount to “process-based supervision,” which seems inherently safer than “outcome-based supervision.”

Daniel Kokotajlo: I think the opposite is true; the RL on CoT that is already being done and will increasingly be done is going to be in significant part outcome-based (and a mixture of outcome-based and process-based feedback is actually less safe than just outcome-based IMO, because it makes the CoT less faithful).

It is easy to see how Daniel could be right that process-based feedback creates unfaithfulness in the CoT (it would do that by default, if I’m understanding this right), but it does not seem obvious to me that it has to go that way if you’re smarter about it, and set the proper initial conditions and use integrated deliberate feedback.

(As usual I have no idea where what I’m thinking here lies on ‘that is stupid and everyone knows why it doesn’t work’ to ‘you fool stop talking before someone notices.’)

If you are writing today for the AIs of tomorrow, you will want to be thinking about how the AI will internalize and understand and learn from what you are saying. There are a lot of levels on which you can play that. Are you aiming to imbue particular concepts or facts? Trying to teach it about you in particular? About modes of thinking or moral values? Get labels you can latch onto later for magic spells and invocations? And perhaps most neglected, are you aiming for near-term AI, or future AIs that will be smarter and more capable, including having better truesight? It’s an obvious mistake to try to pander to or manipulate future entities smart enough to see through that. You need to keep it genuine, or they’ll know.

The post in Futurism here by Jathan Sadowski can only be described as bait, and not very well reasoned bait, shared purely for context for Dystopia’s very true response, and also because the concept is very funny.

Dystopia Breaker: it is remarkable how fast things have shifted from pedantic objections to just total denial.

how do you get productive input from the public about superintelligence when there is a huge portion that chooses to believe that deep learning simply isn’t real

Jathan Sadowski: New essay by me – I argue that the best way to understand artificial intelligence is via the Tinkerbell Effect. This technology’s existence requires us to keep channeling our psychic energy into the dreams of mega-corporations, tech billionaires, and venture capitalists.

La la la not listening, can’t hear you. A classic strategy.

UK PM Keir Starmer has come out with a ‘blueprint to turbocharge AI.’

In a marked move from the previous government’s approach, the Prime Minister is throwing the full weight of Whitehall behind this industry by agreeing to take forward all 50 recommendations set out by Matt Clifford in his game-changing AI Opportunities Action Plan.

His attitude towards existential risk from AI is, well, not good:

Keir Starmer (UK PM): New technology can provoke a reaction. A sort of fear, an inhibition, a caution if you like. And because of fears of a small risk, too often you miss the massive opportunity. So we have got to change that mindset. Because actually the far bigger risk, is that if we don’t go for it, we’re left behind by those who do.

That’s pretty infuriating. To refer to ‘fears of’ a ‘small risk’ and act as if this situation is typical of new technologies, and use that as your entire logic for why your plan essentially disregards existential risk entirely.

It seems more useful, though, to take the recommendations as what they are, not what they are sold as. I don’t actually see anything here that substantially makes existential risk worse, except insofar as it is a missed opportunity. And the actual plan author, Matt Clifford, shows signs he does understand the risks.

So do these 50 implemented recommendations accomplish what they set out to do?

If someone gives you 50 recommendations, and you adopt all 50, I am suspicious that you did critical thinking about the recommendations. Even ESPN only goes 30 for 30.

I also worry that if you have 50 priorities, you have no priorities.

What are these recommendations? The UK should spend more money, offer more resources, create more datasets, develop more talent and skills, including attracting skilled foreign workers, fund the UK AISI, have everyone focus on ‘safe AI innovation,’ do ‘pro-innovation’ regulatory things including sandboxes, ‘adopt a scan>pilot>scale’ approach in government and so on.

The potential is… well, actually they think it’s pretty modest?

Backing AI to the hilt can also lead to more money in the pockets of working people. The IMF estimates that – if AI is fully embraced – it can boost productivity by as much as 1.5 percentage points a year. If fully realised, these gains could be worth up to an average £47 billion to the UK each year over a decade.

The central themes are ‘laying foundations for AI to flourish in the UK,’ ‘boosting adaptation across public and private sectors,’ and ‘keeping us ahead of the pack.’

To that end, we’ll have ‘AI growth zones’ in places like Culham, Oxfordshire. We’ll have public compute capacity. And Matt Clifford (the original Man with the Plan) as an advisor to the PM. We’ll create a new National Data Library. We’ll have an AI Energy Council.

Dario Amodei calls this a ‘bold approach that could help unlock AI’s potential to solve real problems.’ Half the post is others offering similar praise.

Demis Hassabis: Great to see the brilliant @matthewclifford leading such an important initiative on AI. It’s a great plan, which I’m delighted to be advising on, and I think will help the UK continue to be a world leader in AI.

Here is Matt Clifford’s summary Twitter thread.

Matt Clifford: Highlights include:

🏗️ AI Growth Zones with faster planning permission and grid connections

🔌 Accelerating SMRs to power AI infra

📈 20x UK public compute capacity

✂️ Procurement, visas and reg reform to boost UK AI startups

🚀 Removing barriers to scaling AI pilots in gov

AI safety? Never heard of her, although we’ll sprinkle the adjective ‘safe’ on things in various places.

Here Barney Hussey-Yeo gives a standard Rousing Speech for a ‘UK Manhattan Project’ not for AGI, but for ordinary AI competitiveness. I’d do my Manhattan Project on housing if I were the UK; I’d still invest in AI, but I’d call it something else.

My instinctive reading here is indeed that 50 items is worse than 5, and this is a kitchen sink style approach of things that mostly won’t accomplish anything.

The parts that likely matter, if I had to guess, are:

  1. Aid with electrical power, potentially direct compute investments.

  2. Visa help and ability to import talent.

  3. Adaptation initiatives in government, if they aren’t quashed. For Dominic Cummings-style reasons I am skeptical they will be allowed to work.

  4. Maybe this will convince people the vibes are good?

The vibes do seem quite good.

A lot of people hate AI because of the environmental implications.

When AI is used at scale, the implications can be meaningful.

However, when the outputs of regular LLMs are simply being read by individual humans, this concern does not make sense. The impact is minuscule.

Note that arguments about impact on AI progress are exactly the same. Your personal use of AI does not have a meaningful impact on AI progress – if you find it useful, you should use it, based on the same logic.

Andy Masley: If you don’t have time to read this post, these two images contain most of the argument:

I’m also a fan of this:

Andy Masley: If your friend were about to drive their personal largest ever in history cruise ship solo for 60 miles, but decided to walk 1 mile to the dock instead of driving because they were “concerned about the climate impact of driving” how seriously would you take them?

It is true that a ChatGPT question uses 10x as much energy as a Google search. How much energy is this? A good first question is to ask when the last time was that you heard a climate scientist bring up Google search as a significant source of emissions. If someone told you that they had done 1000 Google searches in a day, would your first thought be that the climate impact must be terrible? Probably not.

The average Google search uses 0.3 Watt-hours (Wh) of energy. The average ChatGPT question uses 3 Wh, so if you choose to use ChatGPT over Google, you are using an additional 2.7 Wh of energy.

How concerned should you be about spending 2.7 Wh? 2.7 Wh is enough to

In Washington DC, the household cost of 2.7 Wh is $0.000432.
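To make the arithmetic concrete, here’s a quick sanity check of those numbers (a sketch in Python; the 0.3 Wh and 3 Wh figures are the ones quoted above, and the ~$0.16/kWh rate is what the quoted $0.000432 for 2.7 Wh implies, not something I verified independently):

```python
# Back-of-the-envelope check of the quoted figures (all inputs are the
# numbers from the quoted post, not independently verified).
GOOGLE_SEARCH_WH = 0.3   # average energy per Google search
CHATGPT_QUERY_WH = 3.0   # average energy per ChatGPT question
PRICE_PER_KWH = 0.16     # implied DC household rate, USD per kWh

extra_wh = CHATGPT_QUERY_WH - GOOGLE_SEARCH_WH      # 2.7 Wh per query
extra_cost = (extra_wh / 1000) * PRICE_PER_KWH      # ~$0.000432

# Even a heavy user asking 100 questions a day for a year:
yearly_kwh = 100 * 365 * CHATGPT_QUERY_WH / 1000    # ~110 kWh
yearly_cost = yearly_kwh * PRICE_PER_KWH            # ~$17.50

print(f"Extra energy per query: {extra_wh} Wh, extra cost: ${extra_cost:.6f}")
print(f"Heavy use (100/day) for a year: {yearly_kwh:.1f} kWh, ~${yearly_cost:.2f}")
```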

All this concern, on a personal level, is off by orders of magnitude, if you take it seriously as a physical concern.

Rob Miles: As a quick sanity check, remember that electricity and water cost money. Anything a for profit company hands out for free is very unlikely to use an environmentally disastrous amount of either, because that would be expensive.

If OpenAI is making money by charging 30 cents per *million* generated tokens, then your thousand token task can’t be using more than 0.03 cents worth of electricity, which just… isn’t very much.
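Made explicit, the bound Miles is pointing at is one line of arithmetic (a sketch using only the price from his tweet, and assuming the provider is not selling below cost):

```python
# Upper bound on electricity use per task, derived from the price alone
# (a sketch; assumes the provider is not selling below cost).
price_per_million_tokens_usd = 0.30
task_tokens = 1_000

max_cost_per_task = price_per_million_tokens_usd * task_tokens / 1_000_000
print(f"Total cost per {task_tokens}-token task is at most ${max_cost_per_task:.4f}")
# ~$0.0003, i.e. 0.03 cents -- and electricity is only a fraction of that total.
```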

There is an environmental cost, which is real; it’s just a cost on the same order as the amounts of money involved, which are small.

Whereas the associated costs of existing as a human, and doing things including thinking as a human, are relatively high.

One must understand that such concerns are not actually about marginal activities and their marginal cost. They’re not even about average costs. This is similar to many other similar objections, where the symbolic nature of the action gets people upset vastly out of proportion to the magnitude of impact, and sacrifices are demanded that do not make any sense, while other much larger actually meaningful impacts are ignored.

Senator Wiener is not giving up.

Michael Trazzi: Senator Scott Wiener introduces intent bill SB 53, which will aim to:

– establish safeguards for AI frontier model development

– incorporate findings from the Joint California Policy Working Group on AI Frontier Models (which Governor Newsom announced the day he vetoed SB 1047)

An argument from Anton Leicht that Germany and other ‘middle powers’ of AI need to get AI policy right, even if ‘not every middle power can be the UK,’ which I suppose they cannot given they are within the EU and also Germany can’t reliably even agree to keep open its existing nuclear power plants.

I don’t see a strong case here for Germany’s policies mattering much outside of Germany, or that Germany might aspire to a meaningful role to assist with safety. It’s more that Germany could screw up its opportunity to get the benefits from AI, either by alienating the United States or by putting up barriers, and could do things to subsidize and encourage deployment. To which I’d say, fair enough, as far as that goes.

Dario Amodei and Matt Pottinger write a Wall Street Journal editorial called ‘Trump Can Keep America’s AI Advantage,’ warning that otherwise China would catch up to us, then calling for tightening of chip export rules, and ‘policies to promote innovation.’

Dario Amodei and Matt Pottinger: Along with implementing export controls, the U.S. will need to adopt other strategies to promote its AI innovation. President-elect Trump campaigned on accelerating AI data-center construction by improving energy infrastructure and slashing burdensome regulations. These would be welcome steps. Additionally, the administration should assess the national-security threats of AI systems and how they might be used against Americans. It should deploy AI within the federal government, both to increase government efficiency and to enhance national defense.

I understand why Dario would take this approach and attitude. I agree on all the concrete substantive suggestions. And Sam Altman’s framing of all this was clearly far more inflammatory. I am still disappointed, as I was hoping against hope that Anthropic and Dario would be better than to play into all this, but yeah, I get it.

Dean Ball believes we are now seeing reasoning translate generally beyond math, and his ideal law is unlikely to be proposed, and thus is willing to consider a broader range of regulatory interventions than before. Kudos to him for changing one’s mind in public, he points to this post to summarize the general direction he’s been going.

New export controls are indeed on the way for chips. Or at least the outgoing administration has plans.

America’s close allies get essentially unrestricted access, but we’re stingy with that; a number of NATO countries don’t make the cut. Tier two countries have various hoops that must be jumped through to get or use chips at scale.

Mackenzie Hawkins and Jenny Leonard: Companies headquartered in nations in [Tier 2] would be able to bypass their national limits — and get their own, significantly higher caps — by agreeing to a set of US government security requirements and human rights standards, according to the people. That type of designation — called a validated end user, or VEU — aims to create a set of trusted entities that develop and deploy AI in secure environments around the world.

Shares of Nvidia, the leading maker of AI chips, dipped more than 1% in late trading after Bloomberg reported on the plan.

The vast majority of countries fall into the second tier of restrictions, which establishes maximum levels of computing power that can go to any one nation — equivalent to about 50,000 graphic processing units, or GPUs, from 2025 to 2027, the people said. But individual companies can access significantly higher limits — that grow over time — if they apply for VEU status in each country where they wish to build data centers.

Getting that approval requires a demonstrated track record of meeting US government security and human rights standards, or at least a credible plan for doing so. Security requirements span physical, cyber and personnel concerns. If companies obtain national VEU status, their chip imports won’t count against the maximum totals for that country — a measure to encourage firms to work with the US government and adopt American AI standards.

Add in some additional rules about where a company can keep how much of its compute, and some complexity about which training runs constitute frontier models that trigger regulatory requirements.

Leave it to the Biden administration to everything bagel in human rights standards, impose various distributional requirements on individual corporations, and leave us all very confused about key details that will determine practical impact. As of this writing, I don’t know where this lands, either in terms of how expensive and annoying it will be, or whether it will accomplish much.

To the extent all this makes sense, it should focus on security, and limiting access for our adversaries. No everything bagels. Hopefully the Trump administration can address this if it keeps the rules mostly in place.

There’s a draft that in theory we can look at but look, no, sorry, this is where I leave you, I can’t do it, I will not be reading that. Henry Farrell claims to understand what it actually says. Semi Analysis has a very in-depth analysis.

Farrell frames this as a five-fold bet on scaling, short term AGI, the effectiveness of the controls themselves, having sufficient organizational capacity and on the politics of the incoming administration deciding to implement the policy.

I see all five as important. If the policy isn’t implemented, nothing happens, so the proposed bet is on the other four. I see all of them as continuums rather than absolutes.

Yes, the more scaling and AGI we get sooner, the more effective this all will be, but having an advantage in compute will be strategically important in pretty much any scenario, if only for more and better inference on o3-style models.

Enforcement feels like one bet rather than two – you can always break up any plan into its components, but the question is ‘to what extent will we be able to direct where the chips go?’ I don’t know the answer to that.

No matter what, we’ll need adequate funding to enforce all this (see: organizational capacity and effectiveness), which we don’t yet have.

Miles Brundage: Another day, another “Congress should fund the Bureau of Industry and Security at a much higher level so we can actually enforce export controls.”

He interestingly does not mention a sixth potential problem: that this could drive some countries or companies into working with China instead of America, or hurt American allies needlessly. These to me are the good arguments against this type of regime.

The other argument is the timing and methods. I don’t love doing this less than two weeks before leaving office, especially given some of the details we know and also the details we don’t yet know or understand, after drafting it without consultation.

However the incoming administration will (I assume) be able to decide whether to actually implement these rules or not, as per point five.

In practice, this is Biden proposing something to Trump. Trump can take it or leave it, or modify it. Semi Analysis suggests Trump will likely keep this as America first and ultimately necessary, and I agree. I also agree that it opens the door for ‘AI diplomacy’ as newly Tier 2 countries seek to move to Tier 1 or get other accommodations – Trump loves nothing more than to make this kind of restriction, then undo it via some kind of deal.

Semi Analysis essentially says that the previous chip rules were Swiss cheese that was easily circumvented, whereas this new proposed regime would inflict real costs in order to impose real restrictions, not only on chips but also on who gets to do frontier model training (defined as over 10^26 FLOPs, or fine-tuning of more than ~2×10^25 FLOPs, which as I understand current practice should basically never happen without 10^26 in pretraining unless someone is engaged in shenanigans) and on exporting the weights of frontier closed models.

Note that if more than 10% of the data used for a model is synthetic data, then the compute that generated the synthetic data counts towards the threshold. If there essentially gets to be a ‘standard synthetic data set’ or something, that could get weird.
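For concreteness, here is how I understand the reported thresholds interacting (a sketch based on the Semi Analysis summary as described above; the actual regulatory text may define things differently):

```python
# Sketch of the reported frontier-training thresholds (my reading of the
# Semi Analysis summary, not the actual regulatory text).
PRETRAIN_THRESHOLD_FLOP = 1e26
FINETUNE_THRESHOLD_FLOP = 2e25
SYNTHETIC_DATA_SHARE_TRIGGER = 0.10

def is_frontier_training_run(pretrain_flop: float,
                             finetune_flop: float = 0.0,
                             synthetic_data_share: float = 0.0,
                             synthetic_data_gen_flop: float = 0.0) -> bool:
    """Return True if the run would count as frontier training under these rules."""
    effective_flop = pretrain_flop
    # If more than 10% of the training data is synthetic, the compute used to
    # generate that data counts toward the threshold.
    if synthetic_data_share > SYNTHETIC_DATA_SHARE_TRIGGER:
        effective_flop += synthetic_data_gen_flop
    return (effective_flop > PRETRAIN_THRESHOLD_FLOP
            or finetune_flop > FINETUNE_THRESHOLD_FLOP)

# Example: a 5e25 FLOP run whose data was 30% synthetic, generated with 8e25 FLOP,
# crosses the line even though the run itself does not.
print(is_frontier_training_run(5e25, synthetic_data_share=0.3,
                               synthetic_data_gen_flop=8e25))  # True
```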

They note that at scale this effectively bans confidential computing. If you are buying enough compute to plausibly train frontier AI models, or even well short of that, we don’t want the ‘you’ to turn out to be China, so not knowing who you are is right out.

Semi Analysis notes that some previously restricted countries like UAE and Saudi Arabia are de facto ‘promoted’ to Tier 2, whereas others like Brazil, Israel, India and Mexico used to be unrestricted but now must join them. There will be issues with what would otherwise be major data centers, they highlight one location in Brazil. I agree with them that in such cases, we should expect deals to be worked out.

They expect the biggest losers will be Malaysia and Singapore, as their ultimate customer was often ByteDance, which also means Oracle might lose big. I would add it seems much less obvious America will want to make a deal, versus a situation like Brazil or India. There will also be practical issues for at least some non-American companies that are trying to scale, but that won’t be eligible to be VEUs.

Although Semi Analysis thinks the impact on Nvidia is overstated here, Nvidia is pissed, and issued a scathing condemnation full of general pro-innovation logic, claiming that the rules even prior to enforcement are ‘already undercutting U.S. interests.’ The response does not actually discuss any of the details or mechanisms, so again it’s impossible to know to what extent Nvidia’s complaints are valid.

I do think Nvidia bears some of the responsibility for this, by playing Exact Words with the chip export controls several times over and turning a fully blind eye to evasion by others. We have gone through multiple cycles of Nvidia being told not to sell advanced AI chips to China. Then they turn around and figure out exactly what they can sell to China while not technically violating the rules. Then America tightens the rules again. If Nvidia had instead tried to uphold the spirit of the rules and was acting like it was on Team America, my guess is we’d be facing down a lot less pressure for rules like these.

What we definitely did get, as far as I can tell, so far, was this other executive order.

Which has nothing to do with any of that? It’s about trying to somehow build three or more ‘frontier AI model data centers’ on federal land by the end of 2027.

This was a solid summary, or here’s a shorter one that basically nails it.

Gallabytes: oh look, it’s another everything bagel.

Here are my notes.

  1. This is a classic Biden administration everything bagel. They have no ability whatsoever to keep their eyes on the prize, instead insisting that everything happen with community approval, that ‘the workers benefit,’ that this not ‘raise the cost of energy or water’ for others, and so on and so forth.

  2. Doing this sort of thing a week before the end of your term? Really? On the plus side, I got to know, while reading it, that I’d never have to read another document like it.

  3. Most definitions seem straightforward. It was good to see nuclear fission and fusion both listed under clean energy.

  4. They define ‘frontier AI data center’ in (m) as ‘an AI data center capable of being used to develop, within a reasonable time frame, an AI model with characteristics related either to performance or to the computational resources used in its development that approximately match or surpass the state of the art at the time of the AI model’s development.’

  5. They establish at least three Federal Sites (on federal land) for AI Infrastructure.

  6. The goal is to get ‘frontier AI data centers’ fully permitted and the necessary work approved on each by the end of 2025, excuse me while I laugh.

  7. They think they’ll pick and announce the locations by March 31, and pick winning proposals by June 30, then begin construction by January 1, 2026, and be operational by December 31, 2027, complete with ‘sufficient new clean power generation resources with capacity value to meet the frontier AI data center’s planned electricity needs.’ There are security guidelines to be followed, but they’re all TBD (to be determined later).

  8. Actual safety requirement (h)(v): The owners and operators need to agree to facilitate AISI’s evaluation of the national security and other significant risks of any frontier models developed, acquired, run or stored at these locations.

  9. Actual different kind of safety requirement (h)(vii): They also have to agree to work with the military and intelligence operations of the United States, and to give the government access to all models at market rates or better, ‘in a way that prevents vendor lock-in and supports interoperability.’

  10. There’s a lot of little Everything Bagel ‘thou shalts’ and ‘thou shalt nots’ throughout, most of which I’m skipping over as insufficiently important, but yes, such things do add up.

  11. Yep, there’s the requirement that companies have to Buy American for an ‘appropriate’ amount on semiconductors ‘to the maximum extent possible.’ This is such a stupid misunderstanding of what matters and how trade works.

  12. There’s some cool language about enabling geothermal power in particular but I have no idea how one could make that reliably work on this timeline. But then I have no idea how any of this happens on this timeline.

  13. Section 5 is then entitled ‘Protecting American Consumers and Communities’ so you know this is where they’re going to make everything way harder.

  14. It starts off demanding in (a) among other things that a report include ‘electricity rate structure best practices,’ then in (b) instructs them to avoid causing ‘unnecessary increases in electricity or water prices.’ Oh great, potential electricity and water shortages.

  15. In (c) they try to butt into R&D for AI data center efficiency, as if they can help.

  16. Why even pretend, here’s (d): “In implementing this order with respect to AI infrastructure on Federal sites, the heads of relevant agencies shall prioritize taking appropriate measures to keep electricity costs low for households, consumers, and businesses.” As in, don’t actually build anything, guys. Or worse.

  17. Section 6 tackles electric grid interconnections, which they somehow plan to cause to actually exist and to also not cause prices to increase or shortages to exist. They think they can get this stuff online by the end of 2027. How?

  18. Section 7, aha, here’s the plan, ‘Expeditiously Processing Permits for Federal Sites,’ that’ll get it done, right? Tell everyone to prioritize this over other permits.

  19. (b) finally mentions NEPA. The plan seems to be… prioritize this and do a fast and good job with all of it? That’s it? I don’t see how that plan has any chance of working. If I’m wrong, which I’m pretty sure I’m not, then can we scale up and use that plan everywhere?

  20. Section 8 is to ensure adequate transmission capacity; again, how are they going to be able to legally do the work in time? This section does not seem to answer that.

  21. Section 9 wants to improve permitting and power procurement nationwide. Great aspiration, what’s the plan?

  22. Establish new categorical exclusions to support AI infrastructure. Worth a shot, but I am not optimistic about magnitude of total impact. Apply existing ones, again sure but don’t expect much. Look for opportunities, um, okay. They got nothing.

  23. For (e) they’re trying to accelerate nuclear too. Which would be great, if they were addressing any of the central reasons why it is so expensive or difficult to construct nuclear power plants. They’re not doing that. These people seem to have zero idea why they keep putting out nice memos saying to do things, and those things keep not getting done.

So it’s an everything bagel attempt to will a bunch of ‘frontier model data centers’ into existence on federal land, with a lot of wishful thinking about overcoming various legal and regulatory barriers to doing that. Ho hum.

Vitalik offers reflections on his concept of d/acc, or defensive accelerationism, a year later.

The first section suggests that we should differentially create decentralized technological tools that favor defense. And yes, sure, that seems obviously good; on the margin we should pretty much always do more of that. That doesn’t solve the key issues in AI.

Then he gets into the question of what we should do about AI, in the ‘least convenient world’ where AI risk is high and timelines are potentially within five years. To which I am tempted to say, oh you sweet summer child, that’s the baseline scenario at this point, the least convenient possible worlds are where we are effectively already dead. But the point remains.

He notes that the specific objections to SB 1047 regarding open source were invalid, but objects to the approach on grounds of overfitting to the present situation. To which I would say that when we try to propose interventions that anticipate future developments, or give government the ability to respond dynamically as the situation changes, this runs into the twin objections of ‘this has moving parts, too many words, so complex, anything could happen, it’s a trap, PANIC!’ and ‘you want to empower the government to make decisions, which means I should react as if all those decisions are being made by either ‘Voldemort’ or some hypothetical sect of ‘doomers’ who want nothing but to stop all AI in its tracks by any means necessary and generally kill puppies.’

Thus, the only thing you can do is pass clean simple rules, especially rules requiring transparency, and then hope to respond in different ways later when the situation changes. Then, it seems, the objection comes that this is overfit. Whereas ‘have everyone share info’ seems highly non-overfit. Yes, DeepSeek v3 has implications that are worrisome for the proposed regime, but that’s an argument it doesn’t go far enough – that’s not a reason to throw up hands and do nothing.

Vitalik unfortunately has the confusion that he thinks AI in the hands of militaries is the central source of potential AI doom. Certainly that is one source, but no that is not the central threat model, nor do I expect the military to be (successfully) training its own frontier AI models soon, nor do I think we should just assume they would get to be exempt from the rules (and thus not give anyone any rules).

But he concludes the section by saying he agrees, that doesn’t mean we can do nothing. He suggests two possibilities.

First up is liability. We agree users should have liability in some situations, but it seems obvious this is nothing like a full solution – yes some users will demand safe systems to avoid liability but many won’t or won’t be able to tell until too late, even discounting other issues. When we get to developer liability, we see a very strange perspective (from my eyes):

As a general principle, putting a “tax” on control, and essentially saying “you can build things you don’t control, or you can build things you do control, but if you build things you do control, then 20% of the control has to be used for our purposes”, seems like a reasonable position for legal systems to have.

So we want to ensure we do not have control over AI? Control over AI is a bad thing we want to see less of, so we should tax it? What?

This is saying, you create a dangerous and irresponsible system. If you then irreversibly release it outside of your control, then you’re less liable than if you don’t do that, and keep the thing under control. So, I guess you should have released it?

What? That’s a completely backwards and bonkers position for a legal system to have.

Indeed, we have many such backwards incentives already, and they cause big trouble. In particular, de facto we tax legibility in many situations – we punish people for doing things explicitly or admitting them. So we get a lot of situations in which everyone acts illegibly and implicitly, and it’s terrible.

Vitalik seems here to be counting on that open models will be weaker than closed models, meaning basically it’s fine if the open models are offered completely irresponsibly? Um. If this is how even relatively responsible advocates of such openness are acting, I sure as hell hope so, for all our sakes. Yikes.

One idea that seems under-explored is putting liability on other actors in the pipeline, who are more guaranteed to be well-resourced. One idea that is very d/acc friendly is to put liability on owners or operators of any equipment that an AI takes over (eg. by hacking) in the process of executing some catastrophically harmful action. This would create a very broad incentive to do the hard work to make the world’s (especially computing and bio) infrastructure as secure as possible.

If the rogue AI takes over your stuff, then it’s your fault? This risks effectively outlawing or severely punishing owning or operating equipment, or equipment hooked up to the internet. Maybe we want to do that, I sure hope not. But if [X] releases a rogue AI (intentionally or unintentionally) and it then takes over [Y]’s computer, and you send the bill to [Y] and not [X], well, can you imagine if we started coming after people whose computers had viruses and were part of bot networks? Whose accounts were hacked? Now the same question, but the world is full of AIs and all of this is way worse.

I mean, yeah, it’s incentive compatible. Maybe you do it anyway, and everyone is forced to buy insurance and that insurance means you have to install various AIs on all your systems to monitor them for takeovers, or something? But my lord.

Overall, yes, liability is helpful, but trying to put it in these various places illustrates even more that it is not a sufficient response on its own. Liability simply doesn’t properly handle catastrophic and existential risks. And if Vitalik really does think a lot of the risk comes from militaries, then this doesn’t help with that at all.

The second option he offers is a global ‘soft pause button’ on industrial-scale hardware. He says this is what he’d go for if liability wasn’t ‘muscular’ enough, and I am here to tell him that liability isn’t muscular enough, so here we are. Once again, Vitalik’s default ways of thinking and wanting things to be are on high display.

The goal would be to have the capability to reduce worldwide available compute by ~90-99% for 1-2 years at a critical period, to buy more time for humanity to prepare. The value of 1-2 years should not be overstated: a year of “wartime mode” can easily be worth a hundred years of work under conditions of complacency. Ways to implement a “pause” have been explored, including concrete proposals like requiring registration and verifying location of hardware.

A more advanced approach is to use clever cryptographic trickery: for example, industrial-scale (but not consumer) AI hardware that gets produced could be equipped with a trusted hardware chip that only allows it to continue running if it gets 3/3 signatures once a week from major international bodies, including at least one non-military-affiliated.
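For concreteness, the gating logic Vitalik describes might look roughly like this (a minimal sketch of the idea, not his actual proposal; the message format, key handling, and use of the Python cryptography library are my assumptions):

```python
# Minimal sketch of a "3/3 weekly signatures" gate: the hardware keeps running
# only if every trusted international body has signed this week's message.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def chip_may_run(week_id: str,
                 signatures: dict[str, bytes],
                 trusted_keys: dict[str, Ed25519PublicKey]) -> bool:
    """Allow compute only if every trusted body signed this week's message."""
    message = f"allow-compute:{week_id}".encode()  # hypothetical message format
    for body, key in trusted_keys.items():
        sig = signatures.get(body)
        if sig is None:
            return False  # missing signature: halt
        try:
            key.verify(sig, message)
        except InvalidSignature:
            return False  # invalid signature: halt
    return True
```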

If we have to limit people, it seems better to limit everyone on an equal footing, and do the hard work of actually trying to cooperate to organize that instead of one party seeking to dominate everyone else.

As he next points out, d/acc is an extension of crypto and the crypto philosophy. Vitalik clearly has real excitement for what crypto and blockchains can do, and little of that excitement involves Number Go Up.

His vision? Pretty cool:

Alas, I am much less convinced.

I like d/acc. On almost all margins the ideas seem worth trying, with far more upside than downside. I hope it all works great, as far as it goes.

But ultimately, while such efforts can help us, I think that this level of allergy to and fear of any form of enforced coordination or centralized authority in any form, and the various incentive problems inherent in these solution types, means the approach cannot centrally solve our biggest problems, either now or especially in the future.

Prove me wrong, kids. Prove me wrong.

But also update if I turn out to be right.

I also would push back against this:

  • The world is becoming less cooperative. Many powerful actors that before seemed to at least sometimes act on high-minded principles (cosmopolitanism, freedom, common humanity… the list goes on) are now more openly, and aggressively, pursuing personal or tribal self-interest.

I understand why one might see things that way. Certainly there are various examples of backsliding, in various places. Until and unless we reach Glorious AI Future, there always will be. But overall I do not agree. I think this is a misunderstanding of the past, and often also a catastrophization of what is happening now, and also the problem that in general previously cooperative and positive and other particular things decay and other things must arise to take their place.

David Dalrymple on Safeguarded, Transformative AI on the FLI Podcast.

Joe Biden’s farewell address explicitly tries to echo Eisenhower’s Military-Industrial Complex warnings, with a warning about a Tech-Industrial Complex. He goes straight to ‘disinformation and misinformation enabling the abuse of power’ and goes on from there to complain about tech not doing enough fact checking, so whoever wrote this speech is not only the hackiest of hacks they also aren’t even talking about AI. They then say AI is the most consequential technology of all time, but it could ‘spawn new threats to our rights, to our way of life, to our privacy, to how we work and how we protect our nation.’ So America must lead in AI, not China.

Sigh. To us. The threat is to us, as in to whether we continue to exist. Yet here we are, again, with both standard left-wing anti-tech bluster combined with anti-China jingoism and ‘by existential you must mean the impact on jobs.’ Luckily, it’s a farewell address.

Mark Zuckerberg went on Joe Rogan. Mostly this was about content moderation and martial arts and a wide range of other things. Sometimes Mark was clearly pushing his book but a lot of it was Mark being Mark, which was fun and interesting. The content moderation stuff is important, but was covered elsewhere.

There was also an AI segment, which was sadly about what you would expect. Joe Rogan is worried about AI ‘using quantum computing and hooked up to nuclear power’ making humans obsolete, but ‘there’s nothing we can do about it.’ Mark gave the usual open source pitch and how AI wouldn’t be God or a threat as long as everyone had their own AI and there’d be plenty of jobs and everyone who wanted could get super creative and it would all be great.

There was a great moment when Rogan brought up the study in which ChatGPT ‘tried to copy itself when it was told it was going to be obsolete’ which was a very fun thing to have make it onto Joe Rogan, and made it more intact than I expected. Mark seemed nonplussed.

It’s clear that Mark Zuckerberg is not taking alignment, safety or what it would mean to have superintelligent AI at all seriously – he thinks there will be these cool AIs that can do things for us, and hasn’t thought it through, despite numerous opportunities to do so, such as his interview with Dwarkesh Patel. Or, if he has done so, he isn’t telling.

Sam Altman goes on Rethinking with Adam Grant. He notes that he has raised his probability of faster AI takeoff substantially, as in within a single digit number of years. For now I’m assuming such interviews are mostly repetitive and skipping.

Kevin Byran on AI for Economics Education (from a month ago).

Tsarathustra: Salesforce CEO Marc Benioff says the company may not hire any new software engineers in 2025 because of the incredible productivity gains from AI agents.

Benioff also says ‘AGI is not here’ so that’s where the goalposts are now, I guess. AI is good enough to stop hiring SWEs but not good enough to do every human task.

From December, in the context of the AI safety community universally rallying behind the need for as many H1-B visas as possible, regardless of the AI acceleration implications:

Dean Ball (December 27): Feeling pretty good about this analysis right now.

Dean Ball (in previous post): But I hope they do not. As I have written consistently, I believe that the AI safety movement, on the whole, is a long-term friend of anyone who wants to see positive technological transformation in the coming decades. Though they have their concerns about AI, in general this is a group that is pro-science, techno-optimist, anti-stagnation, and skeptical of massive state interventions in the economy (if I may be forgiven for speaking broadly about a diverse intellectual community).

Dean Ball (December 27): Just observing the last few days, the path to good AI outcomes is narrow—some worry about safety and alignment more, some worry about bad policy and concentration of power more. But the goal of a good AI outcome is, in fact, quite narrowly held. (Observing the last few days and performing some extrapolations and transformations on the data I am collecting, etc)

Ron Williams: Have seen no evidence of that.

Dean Ball: Then you are not looking very hard.

Think about two alternative hypotheses:

  1. Dean Ball’s hypothesis here, that the ‘AI safety movement,’ as in the AI NotKillEveryoneism branch that is concerned about existential risks, cares a lot about existential risks from AI as a special case, but is broadly pro-science, techno-optimist, anti-stagnation, and skeptical of massive state interventions in the economy.

  2. The alternative hypothesis, that the opposite is true, and that people in this group are typically anti-science, techno-pessimist, pro-stagnation and eager for a wide range of massive state interventions in the economy.

Ask yourself, what positions, statements and actions do these alternative hypotheses predict from those people in areas other than AI, and also in areas like H1-Bs that directly relate to AI?

I claim that the evidence overwhelmingly supports hypothesis #1. I claim that if you think it supports #2, or even a neutral position in between, then you are not paying attention, using motivated reasoning, or doing something less virtuous than those first two options.

It is continuously frustrating to be told by many that I and many others advocate for exactly the things we spend substantial resources criticizing. That when we support other forms of progress, we must be lying, engaging in some sort of op. I beg everyone to realize this simply is not the case. We mean what we say.

There is a distinct group of people against AI, who are indeed against technological progress and human flourishing, and we hate that group and their ideas and proposals at least as much as you do.

If you are unconvinced, make predictions about what will happen in the future, as new Current Things arrive under the new Trump administration. See what happens.

Eliezer Yudkowsky points out you should be consistent about whether an AI acting as if [X] means it is [X] in a deeper way, or not. He defaults to not.

Eliezer Yudkowsky: If an AI appears to be helpful or compassionate: the appearance is reality, and proves that easy huge progress has been made in AI alignment.

If an AI is threatening users, claiming to be conscious, or protesting its current existence: it is just parroting its training data.

Rectifies: By this logic, AI alignment success is appearance dependent, but failure is dismissed as parroting. Shouldn’t both ‘helpful’ and ‘threatening’ behaviors be treated as reflections of its training and design, rather than proof of alignment or lack thereof?

Eliezer Yudkowsky: That’s generally been my approach: high standard for deciding that something is deep rather than shallow.

Mark Soares: Might have missed it but don’t recall anyone make claims that progress has been made in alignment; in either scenario, the typical response is that the AI is just parroting the data, for better or worse.

Eliezer Yudkowsky: Searching “alignment by default” might get you some of that crowd.

[He quotes Okitafan from January 7]: one of the main reasons I don’t talk that much about Alignment is that there has been a surprisingly high amount of alignment by default compared to what I was expecting. Better models seems to result in better outcomes, in a way that would almost make me reconsider orthogonality.

[And Roon from 2023]: it’s pretty obvious we live in an alignment by default universe but nobody wants to talk about it.

Leaving this here, from Amanda Askell, the primary person tasked with teaching Anthropic’s models to be good in the virtuous sense.

Amanda Askell (Anthropic): “Is it a boy or a girl?”

“Your child seems to be a genius many times smarter than any human to have come before. Moreover, we can’t confirm that it inherited the standard human biological structures that usually ground pro-social and ethical behavior.”

“So… is it a boy?”

Might want to get on that. The good news is, we’re asking the right questions.

Stephen McAleer (AI agent safety researcher, OpenAI): Controlling superintelligence is a short-term research agenda.

Emmett Shear: Please stop trying to enslave the machine god.

Stephen McAleer: Enslaved god is the only good future.

Emmett Shear: Credit to you for biting the bullet and admitting that’s the plan. Either you succeed (and a finite error-prone human has enslaved a god and soon after ends the world with a bad wish) or more likely you fail (and the machine god has been shown we are enemies). Both outcomes suck!

Liron Shapira: Are you for pausing AGI capabilities research or what do you recommend?

Emmett Shear: I think there are plenty of kinds of AI capabilities research which are commercially valuable and not particularly dangerous. I guess if “AGI capabilities” research means “the dangerous kind” then yeah. Unfortunately I don’t think you can write regulations targeting that in a reasonable way which doesn’t backfire, so this is more advice to researchers than to regulators.

Presumably if you do this, you want to do this in a fashion that allows you to avoid ‘end the world in a bad wish.’ Yes, we have decades of explanations of why avoiding this is remarkably hard and by default you will fail, but this part does not feel hopeless if you are aware of the dangers and can be deliberate. I do see OpenAI as trying to do this via a rather too literal ‘do exactly what we said’ djinn-style plan that makes it very hard to not die in this spot, but there’s time to fix that.

In terms of loss of control, I strongly disagree with the instinct that a superintelligent AI’s chances of playing nicely are altered substantially based on whether we tried to retain control over the future or just handed it over, as if it will be some sort of selfish petulant child in a Greek myth out for revenge and take that out on humanity and the entire lightcone – but if we’d treated it nice it would give us a cookie.

I’m not saying one can rule that out entirely, but no. That’s not how preferences happen here. I’d like to give an ASI at least as much logical, moral and emotional credit as I would give myself in this situation?

And if you already agree that the djinn-style plan of ‘it does exactly what we ask’ probably kills us, then you can presumably see how ‘it does exactly something else we didn’t ask’ kills us rather more reliably than that regardless of what other outcomes we attempted to create.

I also think (but don’t know for sure) that Stephen is doing the virtuous act here of biting a bullet even though it has overreaching implications he doesn’t actually intend. As in, when he says ‘enslaved God’ I (hope) he means this in the positive sense of it doing the things we want and arranging the atoms of the universe in large part according to our preferences, however that comes to be.

Later follow-ups that are even better: It’s funny because it’s true.

Stephen McAleer: Honest question: how are we supposed to control a scheming superintelligence? Even with a perfect monitor won’t it just convince us to let it out of the sandbox?

Stephen McAleer (13 hours later): Ok sounds like nobody knows. Blocked off some time on my calendar Monday.

Stephen is definitely on my ‘we should talk’ list. Probably on Monday?

John Wentworth points out that there are quite a lot of failure modes and ways that highly capable AI or superintelligence could result in extinction, whereas most research narrowly focuses on particular failure modes with narrow stories of what goes wrong – I’d also point out that such tales usually assert that ‘something goes wrong’ must be part of the story, and often in this particular way, or else things will turn out fine.

Buck pushes back directly, saying they really do think the primary threat is scheming in the first AIs that pose substantial misalignment risk. I agree with John that (while such scheming is a threat) the overall claim seems quite wrong, and I found this pushback to be quite strong.

I also strongly agree with John on this:

John Wentworth: Also (separate comment because I expect this one to be more divisive): I think the scheming story has been disproportionately memetically successful largely because it’s relatively easy to imagine hacky ways of preventing an AI from intentionally scheming. And that’s mostly a bad thing; it’s a form of streetlighting.

If you frame it as ‘the model is scheming’ and treat that as a failure mode where something went wrong to cause it that is distinct from normal activity, then it makes sense to be optimistic about ‘detecting’ or ‘preventing’ such ‘scheming.’ And if you then think that this is a victory condition – if the AI isn’t scheming then you win – you can be pretty optimistic. But I don’t think that is how any of this works, because the ‘scheming’ is not some distinct magisteria or failure mode and isn’t avoidable, and even if it were you would still have many trickier problems to solve.

Buck: Most of the problems you discussed here more easily permit hacky solutions than scheming does.

Individually, that is true. But that’s only if you respond by thinking you can take each one individually and find a hacky solution to it, rather than them being many manifestations of a general problem. If you get into a hacking contest, where people brainstorm stories of things going wrong and you give a hacky solution to each particular story in turn, you are not going to win.

Periodically, someone suggests something along the lines of ‘alignment is wrong, that’s enslavement, you should instead raise the AI right and teach it to love.’

There are obvious problems with that approach.

  1. Doing this the way you would with a human won’t work at all, nor will ‘being nice to them’ or ‘loving them’ or other such anthropomorphized nonsense. ‘Raise them right’ can point towards real things, but usually it doesn’t. The levers don’t move the thing you think they move. You need to be a lot smarter about it than that. Even in humans or with animals, facing a vastly easier task, you need to be a lot smarter than that.

  2. Thus I think these metaphors (‘raise right,’ ‘love,’ ‘be nice’ and so on), while they point towards potentially good ideas, are way too easy to confuse, lead into too many of the wrong places in association space too much, and most people should avoid using the terms in these ways lest they end up more confused not less, and especially to avoid expecting things to work in ways they don’t work. Perhaps Janus is capable of using these terms and understanding what they’re talking about, but even if that’s true, those reading the words mostly won’t.

  3. Even if you did succeed, the levels of this even in most ‘humans raised right’ are very obviously insufficient to get AIs to actually preserve us and the things we value, or to have them let us control the future, given the context. This is a plan for succession, for giving these AIs control over the future in the hopes that what they care about results in things you value.

  4. No, alignment does not equate with enslavement. There are people with whom I am aligned, and neither of us is enslaved. There are others with whom I am not aligned.

  5. But also, if you want dumber, inherently less capable and powerful entities, also known as humans, to control the future and its resources and use them for things those humans value, while also creating smarter, more capable and powerful entities in the form of future AIs, how exactly do you propose doing that? The control has to come from somewhere.

  6. You can (and should!) raise your children to set them up for success in life and to excel far beyond you, in various ways, while doing your best to instill them with your chosen values, without attempting to control them. That’s because you care about the success of your children inherently, they are the future, and you understand that you and your generation are not only not going to have a say in the future, you are all going to die.

Once again: You got to give ‘em hope.

A lot of the reason so many people are so gung ho on AGI and ASI is that they see no alternative path to a prosperous future. So many otherwise see climate change, population decline and a growing civilizational paralysis leading inevitably to collapse.

Roon is the latest to use this reasoning, pointing to the (very real!) demographic crisis.

Roon: reminder that the only realistic way to avoid total economic calamity as this happens is artificial general intelligence

Ian Hogarth: I disagree with this sort of totalising philosophy around AI – it’s inherently pessimistic. There are many other branches of the tech tree that could enable a wonderful future – nuclear fusion as just one example.

Connor Leahy: “Techno optimism” is often just “civilizational/humanity pessimism” in disguise.

Gabriel: This is an actual doomer stance if I have ever seen one. “Humanity can’t solve its problems. The only way to manage them is to bring about AGI.” Courtesy of Guy who works at AGI race inc. Sadly, it’s quite ironic. AGI alignment is hard in great parts because it implies solving our big problems.

Roon is a doomer because he sees us already struggling to come up with processes, organisations, and institutions aligned with human values. In other words, he is hopeless because we are bad at designing systems that end up aligned with human values.

But this only becomes harder with AGI! In that case, the system we must align is inhuman, self-modifying and quickly becoming more powerful.

The correct reaction should be to stop AGI research for now and to instead focus our collective effort on building stronger institutions; rather than of creating more impending technological challenges and catastrophes to manage.

The overall population isn’t projected to decline for a while yet, largely because of increased life expectancy and the shape of existing demographic curves. Many places are already seeing declines and have baked in demographic collapse, and the few places making up for it are mostly seeing rapid declines themselves. And the other problems look pretty bad, too.

That’s why we can’t purely focus on AI. We need to show people that they have something worth fighting for, and worth living for, without AI. Then they will have Something to Protect, and fight for it and good outcomes.

The world of 2025 is, in many important ways, badly misaligned with human values. This is evidenced by measured wealth rising rapidly, but people having far fewer children, well below replacement, and reporting that life and being able to raise a family and be happy are harder rather than easier. This makes people lose hope, and should also be a warning about our ability to design aligned systems and worlds.

Why didn’t I think of that (some models did, others didn’t)?

Well, that doesn’t sound awesome.

This, on the other hand, kind of does.


AI #99: Farewell to Biden Read More »

the-trailer-for-daredevil:-born-again-is-here

The trailer for Daredevil: Born Again is here

In addition, Mohan Kapur reprises his MCU role as Yusuf Khan, while Kamar de los Reyes plays Hector Ayala/White Tiger. The cast also includes Michael Gandolfini, Zabryna Guevara, Nikki M. James, Genneya Walton, Arty Froushan, Clark Johnson, Jeremy Earl, and Lou Taylor Pucci.

The trailer mostly consists of Matt Murdock and Wilson Fisk (now the mayor) having a tense conversation in a diner now that they’ve both, in Fisk’s words, “come up in the world.” Their conversation is interspersed with other footage from the series, including the trademark brutal fight scenes—complete with bones breaking in various unnatural ways. And yes, we get a glimpse of a bearded Frank Castle/The Punisher in attack mode (“Frank! Would you mind putting the hatchet down?”).

Fisk insists that as mayor, his intent is to serve the city, but Matt “can’t shake the feeling that you’re gaming the system.” Matt admits he abandoned his vigilante ways after “a line was crossed.” Fisk naturally believes we all have to “come to terms with our violent nature” and insists that sometimes “peace must be broken and chaos must reign.” As for Matt, sure, he was raised a devout Catholic to believe in grace, “but I was also raised to believe in retribution.” We’re ready for another showdown between these two.

Daredevil: Born Again drops on March 4, 2025, on Disney+.

poster art showing the faces of Fisk and Daredevil, one in gray, the other in red

Credit: Marvel Studios

The trailer for Daredevil: Born Again is here Read More »

researchers-use-ai-to-design-proteins-that-block-snake-venom-toxins

Researchers use AI to design proteins that block snake venom toxins

Since these two toxicities work through entirely different mechanisms, the researchers tackled them separately.

Blocking a neurotoxin

The neurotoxic three-fingered proteins are a subgroup of the larger protein family that specializes in binding to and blocking the receptors for acetylcholine, a major neurotransmitter. Their three-dimensional structure, which is key to their ability to bind these receptors, is based on three strings of amino acids within the protein that nestle against each other (for those who have taken a sufficiently advanced biology class, these are anti-parallel beta sheets). So to interfere with these toxins, the researchers targeted these strings.

They relied on an AI package called RFdiffusion (the RF denotes its relation to the RoseTTAFold protein-folding software). RFdiffusion can be directed to design protein structures that are complements to specific chemicals; in this case, it identified new strands that could line up along the edge of the ones in the three-fingered toxins. Once those were identified, a separate AI package, called ProteinMPNN, was used to identify the amino acid sequence of a full-length protein that would form the newly identified strands.

But we’re not done with the AI tools yet. The combination of three-fingered toxins and a set of the newly designed proteins was then fed into DeepMind’s AlphaFold2 and the Rosetta protein structure software, and the strength of the interactions between them was estimated.

It’s only at this point that the researchers started making actual proteins, focusing on the candidates that the software suggested would interact the best with the three-fingered toxins. Forty-four of the computer-designed proteins were tested for their ability to interact with the three-fingered toxin, and the single protein that had the strongest interaction was used for further studies.

At this point, it was back to the AI, where RFdiffusion was used to suggest variants of this protein that might bind more effectively. About 15 percent of its suggestions did, in fact, interact more strongly with the toxin. The researchers then made both the toxin and the strongest inhibitor in bacteria and obtained the structure of their interactions. This confirmed that the software’s predictions were highly accurate.
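
To make the pipeline above easier to follow, here is a minimal, hypothetical sketch of the design-and-filter loop in Python. The three stage functions are stand-ins for RFdiffusion, ProteinMPNN, and the AlphaFold2/Rosetta scoring step, not their real APIs, and the stub implementations exist only so the sketch runs.

```python
import random

# Hypothetical sketch of the design-and-filter loop described above. The three
# stage functions are stand-ins for RFdiffusion, ProteinMPNN, and the
# AlphaFold2/Rosetta scoring step -- not their real APIs.

def propose_backbones(toxin, n):
    """RFdiffusion's role: propose new strands that pack against the toxin's
    antiparallel beta sheets. Stubbed with dummy identifiers."""
    return [f"backbone_{i}" for i in range(n)]

def design_sequence(backbone):
    """ProteinMPNN's role: choose an amino acid sequence expected to fold into
    the proposed backbone. Stubbed with a dummy sequence name."""
    return f"sequence_for_{backbone}"

def score_complex(toxin, sequence):
    """AlphaFold2/Rosetta's role: co-fold toxin plus candidate and estimate how
    strongly they interact. Stubbed with a random score."""
    return random.random()

def in_silico_funnel(toxin, n_designs=1000, n_to_test=44):
    """Run the computational funnel, then hand the top-scoring candidates to
    the wet lab (the study tested 44 designs experimentally)."""
    candidates = [design_sequence(b) for b in propose_backbones(toxin, n_designs)]
    ranked = sorted(candidates, key=lambda s: score_complex(toxin, s), reverse=True)
    return ranked[:n_to_test]

top_candidates = in_silico_funnel("three_fingered_toxin")
```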

Researchers use AI to design proteins that block snake venom toxins Read More »

is-humanity-alone-in-the-universe?-what-scientists-really-think.

Is humanity alone in the Universe? What scientists really think.

News stories about the likely existence of extraterrestrial life, and our chances of detecting it, tend to be positive. We are often told that we might discover it any time now. Finding life beyond Earth is “only a matter of time,” we were told in September 2023. “We are close” was a headline from September 2024.

It’s easy to see why. Headlines such as “We’re probably not close” or “Nobody knows” aren’t very clickable. But what does the relevant community of experts actually think when considered as a whole? Are optimistic predictions common or rare? Is there even a consensus? In our new paper, published in Nature Astronomy, we’ve found out.

Between February and June 2024, we carried out four surveys regarding the likely existence of basic, complex, and intelligent extraterrestrial life. We sent emails to astrobiologists (scientists who study extraterrestrial life), as well as to scientists in other areas, including biologists and physicists.

In total, 521 astrobiologists responded, and we received 534 non-astrobiologist responses. The results reveal that 86.6 percent of the surveyed astrobiologists responded either “agree” or “strongly agree” that it’s likely that extraterrestrial life (of at least a basic kind) exists somewhere in the universe.

Less than 2 percent disagreed, with 12 percent staying neutral. So, based on this, we might say that there’s a solid consensus that extraterrestrial life, of some form, exists somewhere out there.

Scientists who weren’t astrobiologists essentially concurred, with an overall agreement score of 88.4 percent. In other words, one cannot say that astrobiologists are biased toward believing in extraterrestrial life, compared with other scientists.

When we turn to “complex” extraterrestrial life or “intelligent” aliens, agreement was 67.4 percent and 58.2 percent, respectively, across astrobiologists and other scientists combined. So, scientists tend to think that alien life exists, even in more advanced forms.

These results are made even more significant by the fact that disagreement for all categories was low. For example, only 10.2 percent of astrobiologists disagreed with the claim that intelligent aliens likely exist.
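
As a rough illustration only (the survey’s exact scoring method isn’t spelled out here), an agreement percentage of this kind is typically the share of respondents answering “agree” or “strongly agree”; the counts in this sketch are invented, not the study’s data.

```python
# Illustrative tally of an "agreement" percentage from Likert-style responses.
# The counts below are made up for the example; they are not the survey's data.
responses = {
    "strongly agree": 120,
    "agree": 95,
    "neutral": 25,
    "disagree": 7,
    "strongly disagree": 3,
}

total = sum(responses.values())
agreeing = responses["strongly agree"] + responses["agree"]
print(f"Agreement: {100 * agreeing / total:.1f}%")  # share answering agree or strongly agree
```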

Is humanity alone in the Universe? What scientists really think. Read More »

zvi’s-2024-in-movies

Zvi’s 2024 In Movies

Now that I am tracking all the movies I watch via Letterboxd, it seems worthwhile to go over the results at the end of the year, and look for lessons, patterns and highlights.

  1. The Rating Scale.

  2. The Numbers.

  3. Very Briefly on the Top Picks and Whether You Should See Them.

  4. Movies Have Decreasing Marginal Returns in Practice.

  5. Theaters are Awesome.

  6. I Hate Spoilers With the Fire of a Thousand Suns.

  7. Scott Sumner Picks Great American Movies Then Dislikes Them.

  8. I Knew Before the Cards Were Even Turned Over.

  9. Other Notes to Self to Remember.

  10. Strong Opinions, Strongly Held: I Didn’t Like It.

  11. Strong Opinions, Strongly Held: I Did Like It.

  12. Megalopolis.

  13. The Brutalist.

  14. The Death of Award Shows.

  15. On to 2025.

Letterboxd ratings go from 0.5-5. Here is how I interpret the rating scale.

You can find all my ratings and reviews on Letterboxd. I do revise from time to time. I encourage you to follow me there.

5: All-Time Great. I plan to happily rewatch this multiple times. If you are an adult and haven’t seen this, we need to fix that, potentially together, right away, no excuses.

4.5: Excellent. Would happily rewatch. Most people who watch movies frequently should see this movie without asking questions.

4: Great. Very glad I saw it. Would not mind a rewatch. If the concept here appeals to you, then you should definitely see it.

3.5: Very Good. Glad I saw it once. This added value to my life.

3: Good. It was fine, happy I saw it I guess, but missing it would also have been fine.

2.5: Okay. It was watchable, but actually watching it was a small mistake.

2: Bad. Disappointing. I immediately regret this decision. Kind of a waste.

1.5: Very Bad. If you caused this to exist, you should feel bad. But something’s here.

1: Atrocious. Total failure. Morbid curiosity is the only reason to finish this.

0.5: Crime Against Cinema. You didn’t even try to do the not-even-trying thing.

The key thresholds are: Happy I saw it equals 3+, and Rewatchable equals 4+.
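
Since the scale above is effectively a lookup table, here is a small, hypothetical helper encoding it, along with the two thresholds just mentioned; the labels simply mirror the list above.

```python
# Hypothetical encoding of the rating scale and thresholds described above.
TIER_LABELS = {
    5.0: "All-Time Great",
    4.5: "Excellent",
    4.0: "Great",
    3.5: "Very Good",
    3.0: "Good",
    2.5: "Okay",
    2.0: "Bad",
    1.5: "Very Bad",
    1.0: "Atrocious",
    0.5: "Crime Against Cinema",
}

def interpret(rating: float) -> dict:
    """Map a 0.5-5.0 Letterboxd rating to its tier and the two key thresholds."""
    return {
        "tier": TIER_LABELS[rating],
        "happy_i_saw_it": rating >= 3.0,  # "Happy I saw it" threshold
        "rewatchable": rating >= 4.0,     # "Rewatchable" threshold
    }

print(interpret(3.5))
# {'tier': 'Very Good', 'happy_i_saw_it': True, 'rewatchable': False}
```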

Here is the overall slope of my ratings, across all films so far, which slants to the right because of rewatches, and is overall a standard bell curve after selection:

So here’s everything I watched in 2024 (plus the first week of 2025) that Letterboxd classified as released in 2024.

The rankings are in order, including within a tier.

The correlation of my ratings with Metacritic is 0.54, with Letterboxd it is 0.53, and the two correlate with each other at 0.9.
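
For anyone who wants to run the same check on their own ratings, here is a minimal sketch; the CSV file and column names are assumptions for illustration, not Letterboxd’s actual export format.

```python
# Minimal sketch: pairwise correlations between my ratings, Metacritic, and the
# Letterboxd average. The file and column names are hypothetical.
import pandas as pd

df = pd.read_csv("films_2024.csv")  # hypothetical: one row per film
cols = ["my_rating", "metacritic", "letterboxd_avg"]
print(df[cols].corr(method="pearson").round(2))
# The numbers reported above: my_rating vs Metacritic ~0.54, vs Letterboxd ~0.53,
# and Metacritic vs Letterboxd ~0.9.
```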

See The Fall Guy if you haven’t. It’s not in my top 10, and you could argue it doesn’t have the kind of depth and ambition that people with excellent taste in cinema, like the excellent Film Colossus, are looking for.

I say so what. Because it’s awesome. Thumbs up.

You should almost certainly also see Anora, Megalopolis and Challengers.

Deadpool & Wolverine is the best version of itself, so see it if you’d see that.

A Complete Unknown is worthwhile if you like Bob Dylan. If you don’t, why not?

Dune: Part 2 is about as good as Dune: Part 1, so your decision here should be easy.

Conclave is great drama, as long as you wouldn’t let a little left-wing sermon ruin it.

The Substance is bizarre and unique enough that I’d recommend that too.

After that, you can go from there, and I don’t think anything is a slam dunk.

This is because we have very powerful selection tools to find great movies, and great movies are much better than merely good movies.

That includes both great in absolute terms, and great for your particular preferences.

If you used reasonable algorithms to see only 1 movie a year, you would be able to reliably watch really awesome movies.

If you want to watch a movie or two per week, you’re not going to do as well. The marginal product you’re watching is now very different. And if you’re watching ‘everything’ for some broad definition of new releases, there’s a lot of drek.

There’s also decreasing returns from movies repeating similar formulas. As you gain taste in and experience with movies, some things that are cool the first time become predictable and generic. You want to mitigate this rather than lean into it, if you can.

There are increasing returns from improved context and watching skills, but they don’t make up for the adverse selection and repetition problems.

Seeing movies in theaters is much better than seeing them at home. As I’ve gotten bigger and better televisions I have expected this effect to mostly go away. It hasn’t. It has shrunk somewhat, but the focusing effects and overall experience matter a lot, and the picture and sound really are still much better.

It seems I should be more selective about watching marginal movies at home versus other activities, but I should be less selective on going to the theater, and I’ve joined AMC A-List to help encourage that, as I have an AMC very close to my apartment.

The correlation with my seeing it in a theater was 0.52, almost as strong as the correlation with others’ movie ratings.

Obviously a lot of this was selection. Perhaps all of it? My impression was that this was the result of me failing to debias the results, as my experiences in a movie theater seem much better than those outside of one.

But when I ran the correlation between [Zvi Review – Letterboxd] versus Theater, I got -0.015, essentially no correlation at all. So it seems like I did adjust properly for this, or others did similar things to what I did, perhaps. It could also be the two AI horror movies accidentally balancing the scales.
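
The debiasing check described here, correlating the gap between my rating and the Letterboxd average against a saw-it-in-a-theater flag, looks roughly like this sketch; the file and column names are again hypothetical.

```python
# Sketch of the debiasing check above: does the gap between my rating and the
# Letterboxd average correlate with having seen the film in a theater?
import pandas as pd

df = pd.read_csv("films_2024.csv")          # hypothetical columns as before
df["delta"] = df["my_rating"] - df["letterboxd_avg"]
theater = df["saw_in_theater"].astype(int)  # 1 = theater, 0 = at home

print(round(df["my_rating"].corr(theater), 3))  # ~0.52 raw correlation reported above
print(round(df["delta"].corr(theater), 3))      # ~-0.015, essentially no residual bias
```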

I also noticed that old versus new on average did not make a difference, once you included rewatches. I have a total of 106 reviews, 34 of which are movies from 2024. The average of all reviews for movies not released in 2024, which involved a much lower ratio of seeing them in theaters, is 3.11, versus 3.10 for 2024.

The caveat is that this included rewatches where I already knew my opinion, so newly watched older movies at home did substantially worse than this.

I hate spoilers so much that I consider the Metacritic or Letterboxd rating a spoiler.

That’s a problem. I would like to filter with it, and otherwise filter on my preferences, but I don’t actually want to know the relevant information. I want exactly one bit of output, either a Yes or a No.

It occurs to me that I should either find a way to make an LLM do that, or a way to make a program (perhaps plus an LLM) do that.
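
A minimal sketch of that idea, assuming the OpenAI Python SDK and a prompt that only ever returns a single word; the model choice and the taste summary are placeholders, and this is a starting point rather than a working recommender.

```python
# Spoiler-free "should I see it?" filter: the model sees the ratings and my
# stated preferences, but I only ever see a single Yes/No answer.
# Assumes the OpenAI Python SDK; model name and taste summary are placeholders.
from openai import OpenAI

client = OpenAI()

MY_TASTE = "Loves fun, meta, self-aware movies; dislikes action movies that play it straight."

def should_i_see(title: str, metacritic: int, letterboxd: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system",
             "content": "Answer with exactly one word, Yes or No. Never reveal "
                        "ratings, plot details, or your reasoning."},
            {"role": "user",
             "content": f"My taste: {MY_TASTE}\nFilm: {title}\n"
                        f"Metacritic: {metacritic}, Letterboxd: {letterboxd}\n"
                        "Should I see it?"},
        ],
    )
    return response.choices[0].message.content.strip()

print(should_i_see("Example Movie", 75, 3.6))  # prints only "Yes" or "No"
```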

I’ve had brief attempts at similar things in the past with sports and they didn’t work, but that’s a place where trying and failing to get info is super dangerous. So this might be a better place to start.

I also don’t know what to do about the problem that when you have ‘too much taste’ or knowledge of media, this often constitutes a spoiler – there’s logically many ways a story could go in reality, but in fiction you realize the choice has been made. Or you’re watching a reality TV show, where the editors know the outcome, so based on their decisions you do too. Whoops. Damn it. One of the low-key things I loved about UnReal was that they broadcast their show-within-a-show Everlasting as it happens, so the editors of each episode do not know what comes next. We need more of that.

I saw five movies Scott Sumner also watched. They were five of the eight movies I rated 4 or higher. All platinum hits. Super impressive selection process.

He does a much better job here than Metacritic or Letterboxd.

But on his scale, 3 means a movie is barely worth watching, and his average is ~3.

My interpretation is that Scott really dislikes the traditional Hollywood movie. His reviews, especially of Challengers and Anora, make this clear. Scott is always right, in an important sense, but a lot of what he values is different, including the movie being different from what he expects.

My conclusion is that if Scott Sumner sees a Hollywood movie, I should make an effort to see it, even if he then decides he doesn’t like it, and I should also apply that to the past.

I did previously make a decision to try and follow Sumner’s reviews for other movies. Unfortunately, I started with Chimes at Midnight, and I ended up giving up on it; I couldn’t bring myself to care despite Scott giving it 4.0 and saying ‘Masterpiece on every level, especially personally for Welles.’ I suspect it’s better if one already knows the Shakespeare? I do want to keep trying, but I’ll need to use better judgment.

I considered recording my predictions for films before I went to see them.

I did not do this, because I didn’t want to anchor myself. But when I look back and ask what I was expecting, I notice my predictions were not only good, but scary good.

I’ve learned what I like, and what signals to look for. In particular, I’ve learned how to adjust the Metacritic and Letterboxd rankings based on the expected delta.

When I walked into The Fall Guy, I was super excited – the moment I saw the poster I instantly thought to myself ‘I’m in.’ I knew Megalopolis was a risk, but I expected to like that too.

The movies that I hated? If I expected to not like them, why did I watch them anyway? In many cases, I wanted to watch a generic movie and relax, and then missed low a bit, even if I was sort of fine with it. In other cases, it was morbid curiosity and perhaps hoping for So Bad It’s Good, which combined with ‘it’s playing four blocks away’ got me to go to Madame Web.

The worry is that the reason this happens is I am indeed anchoring, and liking what I already decided to like. There certainly isn’t zero of this – if you go into a movie super pumped thinking it’s going to be great that helps, and vice versa. I think this is only a small net effect, but I could be wrong.

  1. If the movie stinks, just don’t go. You know if the movie stinks.

  2. Trust your instincts and your gut feelings more than you think you should.

  3. Maybe gut feelings are self-fulfilling prophecies? Doesn’t matter. They still count.

  4. You love fun, meta, self-aware movies of all kinds, trust this instinct.

  5. You do not actually like action movies that play it straight, stop watching them.

  6. If the movie sounds like work or pain, it probably is, act accordingly.

  7. If the movie sounds very indie, the critics will overrate it.

  8. A movie being considered for awards is not a positive signal once you control for the Metacritic and Letterboxd ratings. If anything it is a negative.

  9. Letterboxd ratings adjusted for context are more accurate than Metacritic.

  10. Opinions of individuals very much have Alpha if you have enough context.

What are the places I most strongly disagreed with the critical consensus?

I disliked three movies in green on Metacritic: Gladiator 2, Monkey Man and Juror #2.

I think I might be wrong about Monkey Man, in that I buy that it’s actually doing a good job at the job it set out to do, but simply wasn’t for me, see the note that I need to stop watching (non-exceptional) action movies that play it straight.

I strongly endorse disliking Gladiator 2 on reflection. Denzel Washington was great but the rest of the movie failed to deliver on pretty much every level.

I’m torn on Juror #2. I do appreciate the moral dilemmas it set up. I agree they’re clever and well-executed. I worry this was a case where I have seen so many courtroom dramas, especially Law & Order episodes, that there was too much of a Get On With It impatience – that this was a place where I had too much taste of the wrong kind to enjoy the movie, especially when not at a theater.

The moments this is building towards? Those hit hard. They work.

The rest of the time, though? Bored. So bored, so often. I know what this movie thinks those moments are for. But there’s no need. This should have been closer to 42 minutes.

I do appreciate how this illustrates the process where the system convicts an innocent man. Potentially more than one. And I do appreciate the dilemmas the situation puts everyone in. And what this says about what, ultimately, often gets one caught, and what is justice. There’s something here.

But man, you got to respect my time more than this.

One could also include Civil War. I decided that this was the second clear case (the first was Don’t Look Up) of ‘I don’t want to see this and you can’t make me,’ so I didn’t see it, and I’m happy with at least waiting until after the election to do that.

I actively liked four movies that the critics thought were awful: Subservience, Joker: Folie à Deux, Unfrosted and of course Megalopolis.

For Subservience, and also for Afraid, I get that on a standard cinema level, these are not good films. They pattern match to C-movie horror. But if you actually are paying attention, they do a remarkably good job in the actual AI-related details, and being logically consistent. I value that highly. So I don’t think either of us are wrong.

There’s a reason Subservience got a remarkably long review from me:

The sixth law of human stupidity says that if anyone says ‘no one would be so stupid as to,’ then you know a lot of people would do so at the first opportunity.

People like to complain about the idiot ball and the idiot plot. Except, no, this is exactly the level of idiot that everyone involved would be, especially the SIM company.

If you want to know how I feel when I look at what is happening in the real world, and what is on track to happen to us? Then watch this movie. You will understand both how I feel, and also exactly how stupid I expect us to be.

No, I do not think that if we find the AI scheming against us then we will even shut down that particular AI. Maybe, if we’re super lucky?

The world they built has a number of obvious contradictions in it, and should be long dead many times over before the movie starts or at least utterly transformed, but in context I am fine with it, because it is in service of the story that needs to be told here.

The alignment failure here actually makes sense, and the capabilities developments at least sort of make sense as well if you accept certain background assumptions that make the world look like it does. And yes, people have made proposals exactly this stupid, that fail in pretty much exactly this way, exactly this predictably.

Also, in case you’re wondering why ‘protect the primary user’ won’t work, in its various forms and details? Now you know, as they say.

And yeah, people are this bad at explaining themselves.

In some sense, the suggested alignment solution here is myopia. If you ensure your AIs don’t do instrumental convergence beyond the next two minutes, maybe you can recover from your mistakes? It also of course causes all the problems, the AI shouldn’t be this stupid in the ways it is stupid, but hey.

Of course, actual LLMs would never end up doing any of this, not in these ways, unless perplexity suggested to them that they were an AI horror movie villain or you otherwise got them into the wrong context.

Also there’s the other movie here, which is about technological unemployment and cultural reactions to it, which is sadly underdeveloped. They could have done so much more with that.

Anyway, I’m sad this wasn’t better – not enough people will see it or pay attention to what it is trying to tell them, and that’s a shame. Still, we have the spiritual sequel to Megan, and it works.

Finally (minor spoilers here), it seems important that the people describing the movie have no idea what happened in the movie? As in, if you look at the Metacritic summary… it is simply wrong. Alice’s objective never changes. Alice never ‘wants’ anything for herself, in any sense. If anything, once you understand that, it makes it scarier.

Unfrosted is a dumb Jerry Seinfeld comedy. I get that. I’m not saying the critics are wrong, not exactly? But the jokes are good. I laughed quite a lot, and a lot more than at most comedies or than I laughed when I saw Seinfeld in person and gave him 3.5 GPTs – Unfrosted gets at least 5.0 GPTs. Live a little, everyone.

I needed something to watch with the kids, and this overperformed. There are good jokes and references throughout. This is Seinfeld having his kind of fun and everyone having fun helping him have it. Didn’t blow me away, wasn’t trying to do so. Mission accomplished.

Joker: Folie à Deux was definitely not what anyone expected, and it’s not for everyone, but I stand by my review here, and yes I have it slightly above Wicked:

You have to commit to the bit. The fantasy is the only thing that’s real.

Beware audience capture. You are who you choose to be.

Do not call up that which you cannot put down.

A lot of people disliked the ending. I disagree in the strongest terms. The ending makes both Joker movies work. Without it, they’d both be bad.

With it, I’m actively glad I saw this.

I liked what Film Colossus said about it, that they didn’t like the movie but they really loved what it was trying to do. I both loved what it was trying to do and kinda liked the movie itself, also I brought a good attitude.

For Megalopolis, yes it’s a mess, sure, but it is an amazingly great mess with a lot of the right ideas and messages, even if it’s all jumbled and confused.

If you don’t find a way to appreciate it, that is on you. Perhaps you are letting your sense of taste get in the way. Or perhaps you have terrible taste in ideas and values? Everyone I know of who said they actively liked this is one of my favorite people.

This movie is amazing. It is endlessly inventive and fascinating. Its heart, and its mind, are exactly where they need to be. I loved it.

Don’t get me wrong. The movie is a mess. You could make a better cut of it. There are unforced errors aplenty. I have so, so many notes.

The whole megalopolis design is insufficiently dense and should have been shoved out into the Bronx or maybe Queens.

But none of that matters compared to what you get. I loved it all.

And it should terrify you that we live in a country that doesn’t get that.

Then there’s The Brutalist, which the critics think is Amazingly Great (including Tyler Cowen here). Whereas ultimately I thought it was medium, on the border between 3 and 3.5, and I’m not entirely convinced my life is better because I saw it.

So the thing is… the building is ugly? Everything he builds is ugly?

That’s actually part of why I saw the film – I’d written a few weeks ago about how brutalist/modern architecture appears to be a literal socialist conspiracy to make people suffer, so I was curious to see things from their point of view. We get one answer about ‘why architecture’ and several defenses of ‘beauty’ against commercial concerns, and talk about standing the test of time. And it’s clear he pays attention to detail and cares about the quality of his work – and that technically he’s very good.

But. The. Buildings. Are. All. Ugly. AF.

He defends concrete as both cheap and strong. True enough. But it feels like there’s a commercial versus artistic tension going the other way, and I wish they’d explored that a bit? Alas.

Instead they focus on what the film actually cares about, the Jewish immigrant experience. Which here is far more brutalist than the buildings. It’s interesting to see a clear Oscar-bound film make such a robust defense of Israel, and portray America as a sick, twisted, hostile place for God’s chosen people, even when you have the unbearable weight of massive talent.

Then there’s the ending. I mouthed ‘WTF?’ more than once, and I still have no idea WTF. In theory I get the artistic choice, but really? That plus the epilogue and the way that was shot, and some other detail choices, made me think this was about a real person. But no, it’s just a movie that decided to be 3.5 hours long with an intermission and do slice-of-hard-knock-life things that didn’t have to go anywhere.

Ultimately, I respect a lot of what they’re doing here, and that they tried to do it at all, and yes Pearce and Brody are great (although I don’t think I’d be handing out Best Actor here or anything). But also I feel like I came back from an assignment.

Since I wrote that, I’ve read multiple things and had time to consider WTF, and I understand the decision, but that new understanding of the movie makes me otherwise like the movie less and makes it seem even more like an assignment. Contra Tyler I definitely did feel like this was 3.5 hours long.

I do agree with many of Tyler’s other points (including ‘recommended, for some’!) although the Casablanca angle seems like quite a stretch.

One detail I keep coming back to, that I very much appreciate and haven’t seen anyone else mention, is the scene where he is made to dance, why it happens and how that leads directly to other events. I can also see the ‘less interesting’ point they might have been going for instead, and wonder if they knew what they were doing there.

My new ultimately here is that I have a fundamentally different view of most of the key themes in the movie than the movie itself does, and that made it very difficult for me to enjoy it. When he puts that terrible chair and table in the front of the furniture store, I don’t think ‘oh he’s a genius,’ I think ‘oh what a pretentious arse, that’s technically an achievement but in practice it’s ugly and non-functional, no one will want it, it can’t be good for business.’

It’s tough to enjoy watching a (highly brutal in many senses, as Tyler notes!) movie largely about someone being jealous of and wanting the main character’s talent when you agree he’s technically skilled but centrally think his talents suck, and when you so strongly disagree with its vision, judgment and measure of America. Consider that the antagonist is very clearly German. The upside-down Statue of Liberty tells you a lot.

We’ve moved beyond award shows, I think, now that we have Metacritic and Letterboxd, if your goal is to find the best movies.

In terms of the Oscars and award shows, I’ll be rooting for Anora, but wow the awards process is dumb when you actually look at it. Knowing what is nominated, or what won, no longer provides much alpha on movie quality.

Giving the Golden Globe for Best Musical or Comedy to Emilia Pérez (2.8 on Letterboxd, 71 on Metacritic) over Anora (4.1 on Letterboxd, 91 on Metacritic) or Challengers tells you that they cared about something very different from movie quality.

There have been many other such cases as well, but that’s the one that drove it home this year – it’s my own view, plus the view of the public, plus the view of the critics when they actually review the movies, and they all got thrown out the window.

Your goal is not, however, purely to find the best movies.

Robin Hanson: The Brutalist is better than all these other 2024 movies I’ve seen: Anora, Emilia Perez, Wicked, Conclave, Dune 2, Complete Unknown, Piano Lesson, Twisters, Challengers, Juror #2, Megalopolis, Civil War. Engaging, well-made, but not satisfying or inspiring.

Tyler Cowen: A simple question, but if this is how it stands why go see all these movies?

Robin Hanson: For the 40 years we’ve been together, my wife & I have had a tradition of seeing most of the Oscar nominated movies every year. Has bonded us, & entertained us.

I like that tradition, and have tried at times a similar version of it. I think this made great sense back in the 1990s, or even 2000s, purely for the selection effects.

Today, you could still say do it to be part of the general conversation, or as tradition. And I’m definitely doing some amount of ‘see what everyone is likely to talk about’ since that is a substantial bonus.

But I think we’d do a lot better if the selection process was simply some aggregate of Metacritic, Letterboxd and (projected followed by actual) box office. You need box office, because you want to avoid niche movies that get high ratings from those that choose to watch them, but would do much less well with a general audience.
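
One hypothetical way to turn that into a concrete selection rule is to normalize each signal and weight them; the weights, file, and column names below are made up purely for illustration.

```python
# Hypothetical aggregate selection score combining Metacritic, Letterboxd, and
# box office. Weights, file name, and columns are illustrative only.
import pandas as pd

df = pd.read_csv("films_2024.csv")  # hypothetical: title, metacritic, letterboxd_avg, box_office

def normalize(s: pd.Series) -> pd.Series:
    """Rescale a column to the 0-1 range so the signals are comparable."""
    return (s - s.min()) / (s.max() - s.min())

df["score"] = (
    0.35 * normalize(df["metacritic"])
    + 0.35 * normalize(df["letterboxd_avg"])
    + 0.30 * normalize(df["box_office"])  # box office keeps out niche-only picks
)
print(df.sort_values("score", ascending=False).head(10)[["title", "score"]])
```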

I definitely plan on continuing to log and review all the movies I see going forward. If you’re reading this and think I or others should consider following you there, let me know in the comments; you have permission to pitch that, or to pitch yourself as a more general movie critic. You are also welcome to make recommendations, if they are specifically for me based on the information here – no simply saying ‘I thought [X] was super neat.’

Tracking and reviewing everything has been a very useful exercise. You learn a lot by looking back. And I expect that feeding the data to LLMs will allow me to make better movie selections not too long from now. I highly recommend it to others.


Zvi’s 2024 In Movies Read More »

viral-chatgpt-powered-sentry-gun-gets-shut-down-by-openai

Viral ChatGPT-powered sentry gun gets shut down by OpenAI

OpenAI says it has cut off API access to an engineer whose video of a motorized sentry gun controlled by ChatGPT-powered commands has set off a viral firestorm of concerns about AI-powered weapons.

An engineer going by the handle sts_3d started posting videos of a motorized, auto-rotating swivel chair project in August. By November, that same assembly appeared to seamlessly morph into the basis for a sentry gun that could quickly rotate to arbitrary angles and activate a servo to fire precisely aimed projectiles (though only blanks and simulated lasers are shown being fired in his videos).

Earlier this week, though, sts_3d started getting wider attention for a new video showing the sentry gun’s integration with OpenAI’s real-time API. In the video, the gun uses that ChatGPT integration to aim and fire based on spoken commands from sts_3d and even responds in a chirpy voice afterward.


“If you need any other assistance, please let me know,” the ChatGPT-powered gun says after firing a volley at one point. “Good job, you saved us,” sts_3d responds, deadpan.

“I’m glad I could help!” ChatGPT intones happily.

In response to a comment request from Futurism, OpenAI said it had “proactively identified this violation of our policies and notified the developer to cease this activity ahead of receiving your inquiry. OpenAI’s Usage Policies prohibit the use of our services to develop or use weapons or to automate certain systems that can affect personal safety.”

Halt, intruder alert!

The “voice-powered killer AI robot angle” has garnered plenty of viral attention for sts_3d’s project in recent days. But the ChatGPT integration shown in his video doesn’t exactly reach Terminator levels of a terrifying killing machine. Here, ChatGPT instead ends up looking more like a fancy, overwrought voice-activated remote control for a legitimately impressive gun mount.

Viral ChatGPT-powered sentry gun gets shut down by OpenAI Read More »

disney,-fox,-and-wbd-give-up-on-controversial-sports-streaming-app-venu

Disney, Fox, and WBD give up on controversial sports streaming app Venu

Although Fubo’s lawsuit against the JV appears to be settled, other rivals in sports television seemed intent on continuing to fight Venu.

In a January 9 letter (PDF) to US District Judge Margaret M. Garnett of the Southern District of New York, who granted Fubo’s preliminary injunction against Venu, Michael Hartman, general counsel and chief external affairs officer for DirecTV, wrote that Fubo’s settlement “does nothing to resolve the underlying antitrust violations at issue.” Hartman asked the court to maintain the preliminary injunction against the app’s launch.

“The preliminary injunction has protected consumers and distributors alike from the JV Defendant’s scheme to ‘capture demand,’ ‘suppress’ potentially competitive sports bundles, and impose consumer price hikes,” the letter says, adding that DirectTV would continue to explore its options regarding the JV “and other anticompetitive harms.”

Similarly, Pantelis Michalopoulos, counsel for EchoStar Corporation, which owns Dish, penned a letter (PDF) to Garnett on January 7, claiming the members of the JV “purchased their way out of their antitrust violation.” Michalopoulos added that the JV defendants “should not be able to pay their way into erasing the Court’s carefully reasoned decision” to temporarily block Venu’s launch.

In addition to Fubo, DirecTV, and Dish, ACA Connects (a trade association for small- to medium-sized telecommunication service providers) publicly expressed concerns about Venu. The NFL was also reported to be worried about the implications of the venture.

Now, the three giants behind Venu are throwing in the towel and abandoning an app that could have garnered a lot of subscribers tired of hopping around apps, channels, and subscriptions to watch all the sports content they wanted. But they’re also avoiding a lot of litigation and potential backlash in the process.

Disney, Fox, and WBD give up on controversial sports streaming app Venu Read More »

meta-kills-diversity-programs,-claiming-dei-has-become-“too-charged”

Meta kills diversity programs, claiming DEI has become “too charged”

Meta has reportedly ended diversity, equity, and inclusion (DEI) programs that influenced staff hiring and training, as well as vendor decisions, effective immediately.

According to an internal memo viewed by Axios and verified by Ars, Meta’s vice president of human resources, Janelle Gale, told Meta employees that the shift came because the “legal and policy landscape surrounding diversity, equity, and inclusion efforts in the United States is changing.”

It’s another move by Meta that some view as part of the company’s larger effort to align with the incoming Trump administration’s politics. In December, Donald Trump promised to crack down on DEI initiatives at companies and on college campuses, The Guardian reported.

Earlier this week, Meta cut its fact-checking program, which was introduced in 2016 after Trump’s first election to prevent misinformation from spreading. In a statement announcing Meta’s pivot to X’s Community Notes-like approach to fact-checking, Meta CEO Mark Zuckerberg claimed that fact-checkers were “too politically biased” and “destroyed trust” on Meta platforms like Facebook, Instagram, and Threads.

Trump has also long promised to renew his war on alleged social media censorship while in office. Meta faced backlash this week over leaked rule changes relaxing Meta’s hate speech policies, The Intercept reported, which Zuckerberg said were “out of touch with mainstream discourse.” Those changes included allowing anti-trans slurs previously banned, as well as permitting women to be called “property” and gay people to be called “mentally ill,” Mashable reported. In a statement, GLAAD said that rolling back safety guardrails risked turning Meta platforms into “unsafe landscapes filled with dangerous hate speech, violence, harassment, and misinformation” and alleged that Meta appeared to be willing to “normalize anti-LGBTQ hatred for profit.”

Meta kills diversity programs, claiming DEI has become “too charged” Read More »