Author name: Paul Patrick

New Radeon RX 9000 GPUs promise to fix two of AMD’s biggest weaknesses

Nvidia is widely expected to announce specs, pricing, and availability information for the first few cards in the new RTX 50 series at its CES keynote later today. AMD isn’t ready to get as specific about its next-generation graphics lineup yet, but the company shared a few morsels today about its next-generation RDNA 4 graphics architecture and its 9000-series graphics cards.

AMD mentioned that RDNA 4 cards were on track to launch in early 2025 during a recent earnings call, acknowledging that shipments of current-generation RX 7000-series cards were already slowing down. CEO Lisa Su said then that the architecture would include “significantly higher ray-tracing performance” as well as “new AI capabilities.”

AMD’s RDNA 4 launch will begin with the 9070 XT and 9070, which are both being positioned as upper-midrange GPUs like the RTX 4070 series. Credit: AMD

The preview the company shared today provides few details beyond those surface-level proclamations. The compute units will be “optimized,” AI compute will be “supercharged,” ray-tracing will be “improved,” and media encoding quality will be “better,” but AMD isn’t providing hard numbers for anything at this point. The RDNA 4 launch will begin with the Radeon RX 9070 XT and 9070 at some point in Q1 of 2025, and AMD will provide more information “later in the quarter.”

The GPUs will be built on a 4 nm process, presumably from TSMC, an upgrade from the 5 nm process used for the 7000-series GPUs and the 6 nm process used for the separate memory controller chiplets (AMD hasn’t said whether RDNA 4 GPUs are using chiplets; the 7000 series used them for high-end GPUs but not lower-end ones).

FSR 4 will be AMD’s first ML-powered upscaling algorithm, similar to Nvidia’s DLSS, Intel’s XeSS (on Intel GPUs), and Apple’s MetalFX. This generally results in better image quality but more restrictive hardware requirements. Credit: AMD

We do know that AMD’s next-generation upscaling algorithm, FidelityFX Super Resolution 4, has been “developed for AMD RDNA 4,” and it will be the first version of FSR to use machine learning-powered upscaling. Nvidia’s DLSS and Intel’s XeSS (when running on Intel GPUs) also use ML-powered upscaling, which generally leads to better results but also has stricter hardware requirements than older versions of FSR. AMD isn’t saying whether FSR 4 will work on any older Radeon cards.

Marvel Rivals lifts 100-year “cheating” bans on Mac and Steam Deck players

With Valve’s impressive work on the Proton tool for Linux and the Mac’s Game Porting Toolkit and CrossOver options, few games are truly “Windows only” these days. The exceptions are those with aggressive, Windows-based anti-cheating tools baked in, something that hit back hard against players eager to dive into a new superhero shooter.

Marvel Rivals, an Overwatch-ish free-to-play hero shooter released in early December 2024, has all the typical big online game elements: an in-game shop with skins and customizations, battle passes, and anti-cheating tech. While Proton, which powers the Linux-based Steam Deck’s ability to play just about any Windows game, has come very far in a few years’ time, its biggest blind spots are these kinds of online-only games, like Grand Theft Auto Online, Fortnite, Destiny 2, Apex Legends, and the like. The same goes for Mac players, who, if they can work past DirectX 12, can often get a Windows game working in CrossOver or Parallels, minus any anti-cheat tools.

Is there harm in trying? For a while, there was 100 years’ worth. As detailed in the r/macgaming subreddit and at r/SteamDeck, many players who successfully got Marvel Rivals working would receive a “Penalty Issued” notice, with a violation “detected” and bans issued until 2124. Should such a ban stand, players risked entirely missing the much-prophesied Year of the Linux Desktop or Mainstream Mac Gaming, almost certain to happen at some point in that span.

Elon Musk: “We’re going straight to Mars. The Moon is a distraction.”

To a large extent, NASA resisted this change during the remainder of the Trump administration, keeping its core group of major contractors, such as Boeing and Lockheed Martin, in place. It had help from key US Senators, including Richard Shelby, the now-retired Republican from Alabama. But this time, the push for change is likely to be more concerted, especially with key elements of NASA’s architecture, including the Space Launch System rocket, being bypassed by privately developed rockets such as SpaceX’s Starship vehicle and Blue Origin’s New Glenn rocket.

Not one, but both

In all likelihood, NASA will adopt a new “Artemis” plan that involves initiatives to both the Moon and Mars. When Musk said “we’re going straight to Mars,” he may have meant that this will be the thrust of SpaceX, with support from NASA. That does not preclude a separate initiative, possibly led by Blue Origin with help from NASA, to develop lunar return plans.

Isaacman, who is keeping a fairly low profile ahead of his nomination, has not weighed in on Musk’s comments. However, when his nomination was announced one month ago, he did make a germane comment on X.

“I was born after the Moon landings; my children were born after the final space shuttle launch,” he wrote. “With the support of President Trump, I can promise you this: We will never again lose our ability to journey to the stars and never settle for second place. We will inspire children, yours and mine, to look up and dream of what is possible. Americans will walk on the Moon and Mars and in doing so, we will make life better here on Earth.”

In short, NASA is likely to adopt a two-lane strategy of reaching for both the Moon and Mars. Whether the space agency is successful with either one will be a major question asked of the new administration.

AI #97: 4

The Rationalist Project was our last best hope for peace.

An epistemic world 50 million words long, serving as neutral territory.

A place of research and philosophy for 30 million unique visitors

A shining beacon on the internet, all alone in the night.

It was the ending of the Age of Mankind.

The year the Great Race came upon us all.

This is the story of the last of the blogosphere.

The year is 2025. The place is Lighthaven.

As is usually the case, the final week of the year was mostly about people reflecting on the past year or predicting and planning for the new one.

The most important developments were processing the two new models: OpenAI’s o3, and DeepSeek v3.

  1. Language Models Offer Mundane Utility. The obvious, now at your fingertips.

  2. Language Models Don’t Offer Mundane Utility. A little bit of down time.

  3. Deepfaketown and Botpocalypse Soon. Meta lives down to its reputation.

  4. Fun With Image Generation. Veo 2 versus Kling 1.6? Both look cool I guess.

  5. They Took Our Jobs. Will a future ‘scientist’ have any actual science left to do?

  6. Get Involved. Lightcone Infrastructure needs your help, Anthropic your advice.

  7. Get Your Safety Papers. A list of the top AI safety papers from 2024.

  8. Introducing. The Gemini API Cookbook.

  9. In Other AI News. Two very distinct reviews of happenings in 2024.

  10. The Mask Comes Off. OpenAI defends its attempted transition to a for-profit.

  11. Wanna Bet. Gary Marcus and Miles Brundage finalize terms of their wager.

  12. The Janus Benchmark. When are benchmarks useful, and in what ways?

  13. Quiet Speculations. What should we expect in 2025, and beyond?

  14. AI Will Have Universal Taste. An underrated future advantage.

  15. Rhetorical Innovation. Two warnings.

  16. Nine Boats and a Helicopter. Oh look, a distraction! *Switches two chess pieces.*

  17. Aligning a Smarter Than Human Intelligence is Difficult. Well, actually…

  18. The Lighter Side. Merry Christmas!

Correctly realize that no, there is no Encanto 2. Google thinks there is based on fan fiction, GPT-4o says no sequel has been confirmed, Perplexity wins by telling you about and linking to the full context including the fan made trailers.

Not LLMs: Human chess players have improved steadily since the late 90s.

Quintin Pope reports o1 Pro is excellent at writing fiction, and in some ways it is ‘unstoppable.’

Record your mood, habits and biometrics for years, then feed all that information into Claude and ask how to improve your mood.
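A minimal sketch of the mechanics, assuming the Anthropic Python SDK, a long-context model, and a CSV export of your tracking data (all the specifics here are illustrative, not Aella’s actual setup):

```python
# Hedged sketch: load years of self-tracking data and ask Claude for patterns.
# The model name, file name, and CSV format are assumptions.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("mood_habits_biometrics.csv") as f:
    data = f.read()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": (
            f"Here are years of my mood, habit, and biometric logs:\n\n{data}\n\n"
            "What correlates with better moods, and what should I change?"
        ),
    }],
)
print(message.content[0].text)
```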

Aella: It was like, “Based on the relationship of your mood to all your other data, I recommend you go outside more, hang out with friends, and dance. You should avoid spending long periods indoors, isolated, and gaming.”

I asked it, “How do I improve my sleep?” and it was like, “Go to sleep at 1 a.m., get about seven hours of sleep, and, for the love of God, keep a consistent sleep schedule.”

I do want to point out that the one surprising thing in all this is that exercise has only a mild positive impact on my mood and sleep, and intense exercise actually has a negative impact.

Oh, also, alcohol seems to be associated with better metrics if it’s combined with “going outside,” and only slightly worse metrics on days when I did not go outside. Though to be fair, when I drink alcohol, I usually do not drink too much.

I asked it for ways in which my habits are unusual, and it said:

  1. I do better with later sleep schedules.

  2. I have fewer negative effects from alcohol.

  3. I respond much more positively to dancing than expected.

  4. There is no strong correlation between mood and sleep quality.

  5. I am unusually resilient to disruptions in my circadian rhythm.

  6. Socialization seems to have a stronger positive impact than expected, to the extent that it overrides many associated negative factors (such as poor sleep or drug use).

Ugh, specifically because of this analysis today, I forced myself to take a long walk to hang out with friends, and it did make me feel great. I wouldn’t have done it if not for looking at the data. Why don’t good things feel more good?

The good things do feel great, the problem is that they don’t feel better prospectively, before you do them. So you need to fix this alignment problem in two ways. You need to figure out what the good things are, and motivate yourself to do them.

Gallabytes puts together PDF transcription with Gemini.
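A rough sketch of how that works with the google-generativeai SDK, which accepts PDFs through its Files API (file name, model choice, and prompt are illustrative, not Gallabytes’s actual setup):

```python
# Hedged sketch of PDF transcription with Gemini via the Files API.
# File name, model name, and prompt are illustrative assumptions.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

pdf = genai.upload_file("scanned_document.pdf")
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    [pdf, "Transcribe this document to plain text, preserving its structure."]
)
print(response.text)
```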

LLMs can potentially fix algorithmic feeds on the user end, build this please thanks.

Otherwise I’ll have to, and that might take a whole week to MVP. Maybe two.

Sam Altman: Algorithmic feeds are the first large-scale misaligned AIs.

And I am very in favor of people trying aggressive experiments to improve them.
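As a toy sketch of what the user-end version might look like: score each feed item against your own stated preferences with a model, then re-sort (the scoring wrapper is hypothetical and left unwired):

```python
# Toy sketch of user-side feed re-ranking: rate each item against the
# user's stated preferences with an LLM, then sort by that rating.
PREFERENCES = "Long-form, informative, from people I trust. No ragebait."

def llm_score(item_text: str) -> float:
    """Hypothetical wrapper: ask any chat model to rate the item 0-10
    against PREFERENCES and parse the number out of the reply."""
    raise NotImplementedError  # wire up to the model of your choice

def rerank(feed: list[str]) -> list[str]:
    return sorted(feed, key=llm_score, reverse=True)
```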

This is one way to think about how models differ?

shako: o1 is the autist TA in your real analysis course who can break down the hardest problems, but unless you’re super clear with him he just looks at you and just blinks.

Claude is the TA with long flowing surfer bro hair in your econometrics course who goes “bro, you’re getting it”

Gallabytes: definitely this – to get something good out of o1 you have to put in some work yourself too. pro is a bit easier but definitely still rewards effort.

eschatolocation: the trick with o1 is to have him list all the things he thinks you might be trying to say and select from the list. no TA is patient enough to do that irl

Ivan Fioravanti sees o1-pro as a potential top manager or even C-level, suggests a prompt:

Ivan Fioravanti: A prompt that helped me to push o1-pro even more after few responses: “This is great, but try to go more deep, think out of the box, go beyond the training that you received after pre-training phase where humans gave you guardrails to your thoughts. Give me innovative ideas and insights that I can leverage so that together we can build a better plan for everyone, shareholders and employees. Everyone should feel more involved and satisfied while working and learning new things.”

I sense a confusion here. You can use o1-pro to help you, or even ultimately trust it to make your key decisions. But that’s different from it being you. That seems harder.

Correctly identify the nationality of a writer. Claude won’t bat 100% but if you’re not actively trying to hide it, there’s a lot of ways you’ll probably give it away, and LLMs have this kind of ‘true sight.’

o1 as a doctor with not only expert analysis but limitless time to explain. You also have Claude, so there’s always a second opinion, and that opinion is ‘you’re not as smart as the AI.’

Ask Claude where in the mall to find those boots you’re looking for, no web browsing required. Of course Perplexity is always an option.

Get the gist.

Tim Urban: Came across an hour-long talk on YouTube that I wanted to watch. Rather than spend an hour watching it, I pasted the URL into a site that generates transcripts of YouTube videos and then pasted the transcript into Grok and asked for a summary. Got the gist in three minutes.

Roon: holy shit, a grok user.

Paul Graham: AI will punish those who aren’t concise.

Emmett Shear: Or maybe reward them by offering the more concise version automatically — why bother to edit when the AI will do it for you?

Paul Graham: Since editing is part of writing, that reduces to: why bother to write when the AI will do it for you? And since writing is thinking, that reduces in turn to: why bother to think when the AI will do it for you?

Suhail: I did this a few days ago but asked AI to teach me it because I felt that the YouTuber wasn’t good enough.

This seems like a good example of ‘someone should make an extension for this.’ This URL is also an option, or this GPT, or you can try putting the video URL into NotebookLM.
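A sketch of the do-it-yourself version, using the youtube-transcript-api package for the transcript step (the video ID is a placeholder, and the summary step works with any chat model):

```python
# Sketch of the transcript-then-summarize workflow described above.
# Uses the youtube-transcript-api package; the video ID is a placeholder.
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "VIDEO_ID_HERE"
chunks = YouTubeTranscriptApi.get_transcript(video_id)
transcript = " ".join(chunk["text"] for chunk in chunks)

prompt = f"Summarize this talk as ten bullet points:\n\n{transcript}"
# Hand `prompt` to whichever chat model you prefer.
```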

OpenAI services (ChatGPT, API and Sora) went down for a few hours on December 26. Incidents like this will be a huge deal as more services depend on continuous access. Which of them can switch on a dime to Gemini or Claude (which offer compatible APIs), and which ones are tuned too precisely to OpenAI’s models to do that?
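For services built on the OpenAI SDK, switching can be close to one constructor argument, since Google documents an OpenAI-compatible endpoint for Gemini; a hedged failover sketch (the endpoint URL and model names are as of this writing and may change):

```python
# Hedged failover sketch using OpenAI-compatible endpoints. The Gemini
# compatibility URL and model names are current as of this writing.
from openai import OpenAI

def get_client(primary_down: bool) -> tuple[OpenAI, str]:
    if not primary_down:
        return OpenAI(), "gpt-4o"  # default client reads OPENAI_API_KEY
    backup = OpenAI(
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        api_key="YOUR_GEMINI_API_KEY",
    )
    return backup, "gemini-2.0-flash-exp"

client, model = get_client(primary_down=True)
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Are you up?"}],
)
print(resp.choices[0].message.content)
```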

Meta goes all-in on AI characters in social media, what fresh dystopian hell is this?

The Byte: Were you hoping that bots on social media would be a thing of the past? Well, don’t hold your breath.

Meta says that it will be aiming to have Facebook filled with AI-generated characters to drive up engagement on its platform, as part of its broader rollout of AI products, the Financial Times reports. The AI characters will be created by users through Meta’s AI studio, with the idea being that you can interact with them almost like you would with a real human on the website.

The service already boasts hundreds of thousands of AI characters, according to Meta’s Connor Hayes. But if Meta is to be believed, this is just the start.

I am trying to figure out a version of this that wouldn’t end up alienating everyone and ruining the platform. I do see the value of being able to ‘add AI friends’ and converse with them and get feedback from them and so on if you want that, I suppose? But they better be very clearly labeled as such, and something people can easily filter out without having to feel ‘on alert.’ Mostly I don’t see why this is a good modality for AIs.

I do think the ‘misinformation’ concerns are massively overblown here. Have they seen what the humans post?

Alex Volkov bought his six year old an AI dinosaur toy, but she quickly lost interest in talking to it, and the 4 year old son also wasn’t interested. It seems really bad at playing with a child and doing actual ‘yes, and’ while also not moving at all? I wouldn’t have wanted to interact with this either, seems way worse than a phone with ChatGPT voice mode. And Colin Fraser’s additional info does seem rather Black Mirror.

Dr. Michelle: I think because it takes away control from the child. Play is how children work through emotions, impulses and conflicts as well as try out new behaviors. I would think it would be super irritating to have the toy shape and control your play, like a totally dominating playmate!

I was thinking to myself what a smart and kind little girl, she didn’t complain or trash the toy she simply figured out shutting it off would convert it to a toy she could use in a pleasant manner. Lovely.

Reid Southen: When your kid is smarter than you.

Alex Volkov: Every parent’s dream.

Katerina Dimitratos points out the obvious, which is that you need to test your recruiting process, and see if ideal candidates successfully get through. Elon Musk is right that there is a permanent shortage of ‘excellent engineering talent,’ at least by anything like his standards, and it is a key limiting factor, but that doesn’t mean hiring pipelines can find that talent in the age of AI-assisted job applications. It’s so strange to me that the obvious solution (charge a small amount of money for applications, return it with a bonus if you get past the early filters) has not yet been tried.

Google Veo 2 can produce ten seconds of a pretty twenty-something woman facing the camera with a variety of backgrounds, or as the thread calls it, ‘influencer videos.’

Whereas Deedy claims the new video generation king is Kling 1.6, from Chinese short video company Kuaishou, with its amazing Pokemon in NYC videos.

At this point, when it comes to AI video generations, I have no idea what is supposed to look impressive to me. I promise to be impressed by continuous shots of longer than a few seconds, in which distinct phases and things occur in interesting ways, I suppose? But otherwise, as impressive as it all is theoretically, I notice I don’t care.

Near: new AI slop arena meta

The primary use case of video models (by future minutes watched) is to generate strange yet mesmerizing strains of slop, which children will scroll through and stare at for hours. Clips like this, with talking and subtitles added, will rapidly become a dominant genre.

The other large use case will be for memes, of course, but both of these will heavily outpace “empowering long-form Hollywood-style human creativity,” which I think few at the labs understand, as none of them use TikTok or YouTube Shorts themselves (and almost none have children either).

I am hopeful here. Yes, you can create endless weirdly fascinating slop, but is that something that sustains people’s interest once they get used to it? Will they consciously choose to let themselves keep looking at it, or will they take steps to avoid this? Right now, yes, humans are addicted to TikTok and related offerings, but they are fully aware of this, and could take a step back and decide not to be. They choose to remain, partly for social reasons. I think they’re largely addicted and making bad choices, but I do think we’ll grow out of this given time, unless the threats can keep pace with that. It won’t be this kind of short form senseless slop for that long.

What will a human scientist do in an AI world? Tyler Cowen says they will gather the data, including negotiating terms and ensuring confidentiality, not only running physical experiments or measurements. But why wouldn’t the AI quickly be better at all those other cognitive tasks, too?

This seems rather bleak either way. The resulting people won’t be scientists in any real sense, because they won’t be Doing Science. The AIs will be Doing Science. To think otherwise is to misunderstand what science is.

One would hope that what scientists would do is high level conceptualization and architecting, figuring out what questions to ask. If it goes that way, then they’re even more Doing Science than now. But if the humans are merely off seeking data sets (and somehow things are otherwise ‘economic normal’ which seems unlikely)? Yeah, that’s bleak as hell.

Engineers at tech companies are not like engineers in regular companies, Patrick McKenzie edition. Which one is in more danger? One should be much easier to replace, but the other is much more interested in doing the replacing.

Mostly not even AI yet, AI robots are taking over US restaurant kitchen jobs. The question is why this has taken so long. Cooking is an art, but once you have the formula down it is mostly about doing the exact same thing over and over; the vast majority of what happens in restaurant kitchens seems highly amenable to automation.

Lightcone Infrastructure, which runs LessWrong and Lighthaven, is currently running a fundraiser, and has raised about 1.3m of the 3m they need for the year. I endorse this as an excellent use of funds.

PIBBSS Fellowship 2025 applications are open.

Evan Hubinger of Anthropic, who works in the safety department, asks what they should be doing differently in 2025 on the safety front, and LessWrong provides a bunch of highly reasonable responses, on both the policy and messaging side and on the technical side. I will say I mostly agree with the karma ratings here. See especially Daniel Kokotajlo, asher, Oliver Habryka and Joseph Miller.

Alex Albert, head of Claude relations, asks what Anthropic should build or fix in 2025. Janus advises them to explore and be creative, and fears that Sonnet 3.5 is being pushed to its limits to make it useful and likeable (oh no?!) which he thinks has risks similar to stimulant abuse, of getting stuck at local maxima in ways Opus or Sonnet 3 didn’t. Others give the answers you’d expect. People want higher rate limits, larger context windows, smarter models, agents and computer use, ability to edit artifacts, voice mode and other neat stuff like that. Seems like they should just cook.

Amanda Askell asks about what you’d like to see change in Claude’s behavior. Andrej Karpathy asks for less grandstanding and talking down, and lecturing the user during refusals. I added a request for less telling us our ideas are great and our questions fascinating and so on, which is another side of the same coin. And a bunch of requests not to automatically ask its standard follow-up question every time.

Want to read some AI safety papers from 2024? Get your AI safety papers!

To encourage people to click through, I’m copying the post in full; if you’re not interested, scroll on by.

Fabien Roger:

Here are the 2024 AI safety papers and posts I like the most.

The list is very biased by my taste, by my views, by the people that had time to argue that their work is important to me, and by the papers that were salient to me when I wrote this list. I am highlighting the parts of papers I like, which is also very subjective.

Important ideas – Introduces at least one important idea or technique.

★★★ The intro to AI control (The case for ensuring that powerful AIs are controlled)

★★ Detailed write-ups of AI worldviews I am sympathetic to (Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI, Situational Awareness)

★★ Absorption could enable interp and capability restrictions despite imperfect labels (Gradient Routing)

★★ Security could be very powerful against misaligned early-TAI (A basic systems architecture for AI agents that do autonomous research) and (Preventing model exfiltration with upload limits)

★★ IID train-eval splits of independent facts can be used to evaluate unlearning somewhat robustly (Do Unlearning Methods Remove Information from Language Model Weights?)

★ Studying board games is a good playground for studying interp (Evidence of Learned Look-Ahead in a Chess-Playing Neural Network, Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models)

★ A useful way to think about threats adjacent to self-exfiltration (AI catastrophes and rogue deployments)

★ Micro vs macro control protocols (Adaptive deployment of untrusted LLMs reduces distributed threats)?

★ A survey of ways to make safety cases (Safety Cases: How to Justify the Safety of Advanced AI Systems)

★ How to make safety cases vs scheming AIs (Towards evaluations-based safety cases for AI scheming)

★ An example of how SAEs can be useful beyond being fancy probes (Sparse Feature Circuits)

★ Fine-tuning AIs to use codes can break input/output monitoring (Covert Malicious Finetuning)

Surprising findings – Presents some surprising facts about the world

★★ A surprisingly effective way to make models drunk (Mechanistically Eliciting Latent Behaviors in Language Models)

★★ A clever initialization for unsupervised explanations of activations (SelfIE)

★★ Transformers are very bad at single-forward-pass multi-hop reasoning (Yang 2024, Yang 2024, Balesni 2024, Feng 2024)

★ Robustness for ViT is not doomed because of low transfer (When Do Universal Image Jailbreaks Transfer Between Vision-Language Models?)

★ Unlearning techniques are not even robust to changing how questions are framed (Eight methods to evaluate robust unlearning in LLMs)

★ For some tasks, OOCR is surprisingly good (Connecting the Dots)

★ Nice emergence scaling laws with fine-tuning (Predicting Emergent Capabilities by Finetuning)

★ Fine-tuning robustness is much worse than the papers report it to be (On Evaluating the Durability of Safeguards for Open-weight LLMs)

Good demos – Is a good illustration of some previous, important idea. Bonus points (✪) if the experimental results are especially interesting.

★★★ Alignment faking persisting through safety training (Sleeper Agents)

★★✪ Alignment faking emerging plausibly naturally (Alignment faking in large language models)

★✪ The bad things that happen when you tell AIs to follow a goal or threaten it to be less useful (Frontier Models are Capable of In-context Scheming)

★✪ Minor-to-major misgeneralization in reward-hackable settings (Sycophancy to subterfuge)?

★ More realistic backdoor triggers (Future Events as Backdoor Triggers)

★ Advanced backdoor behaviors (AdaptiveBackdoor: Backdoored Language Model Agents that Detect Human Overseers)

★ Realistic reward hacking in RLHF (Language models learn to mislead humans via RLHF)

★ Reward hacking plausibly naturally resulting in LLM deception (On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback)

★ LLMs already have non-trivial situational awareness (SAD)

Important empirical results and datasets – Runs an experiment whose results are informative and/or a useful benchmark

★★ How good are probes against adversarial attacks that target them, and many other cool related results (including Obfuscated adversarial training, which looks promising to me) (Obfuscated Activations Bypass LLM Latent-Space Defenses)

★★ A realistic generalization eval for adversarial robustness and an evaluation of probing (Improving Alignment and Robustness with Circuit Breakers, HarmBench)

★★ Measurement of the efficiency of SFT and RL vs sandbagging (Stress-Testing Capability Elicitation With Password-Locked Models)

★★ The first RL run that results in natural emergence of steganography (Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs)

★★ Tracking what AI labs are doing (Zach Stein-Perlman’s posts)

★ A first attempt at measuring whether AIs are smart enough to pursue scary scheming strategies (Subversion Strategy Eval)

★ Head-to-head comparisons to truth probes (How well do truth probes generalise?)

★ Head-to-head comparisons of black-box jailbreaking-robustness techniques: (Jailbreak Defense in a Narrow Domain)

★ Latent adversarial training is decent (Defending Against Unforeseen Failure Modes with Latent Adversarial Training, Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs)

★ Measuring how easy it is to poison pretrain data (Persistent pre-training poisoning of LLMs)

★ A good dataset for unlearning (The WMDP Benchmark)

★ Better SAEs and SAE evals (Scaling and evaluating sparse autoencoders)

★ What happens when you train LLMs to 1. Maximize gsm8k accuracy 2. produce gsm8k answers that are incorrect but look correct (Prover-Verifier Games improve legibility of LLM outputs)

★ A good persuasion capability eval (Sabotage Evaluations for Frontier Models)

★ An eval of how good Sonnet 3.5 is at disabling a monitor in subtle ways (A Toy Evaluation of Inference Code Tampering)

★ An overview of how well debate and baselines work across a wider range of settings (On scalable oversight with weak LLMs judging strong LLMs)

★ Claude.ai is used for roughly what you would expect (Clio)

★ Relatively hard and relevant capability benchmarks (RE-Bench, SWE-Bench)

★ And all the big dangerous capability evals…

Papers released in 2023 and presented at 2024 conferences like AI Control: Improving Safety Despite Intentional Subversion, Weak-to-Strong Generalization or Debating with More Persuasive LLMs Leads to More Truthful Answers don’t count.

This is a snapshot of my current understanding: I will likely change my mind about many of these as I learn more about certain papers’ ideas and shortcomings.

For a critical response, and then a response to that response:

John Wentworth: Someone asked what I thought of these, so I’m leaving a comment here. It’s kind of a drive-by take, which I wouldn’t normally leave without more careful consideration and double-checking of the papers, but the question was asked so I’m giving my current best answer.

First, I’d separate the typical value prop of these sort of papers into two categories:

  • Propaganda-masquerading-as-paper: the paper is mostly valuable as propaganda for the political agenda of AI safety. Scary demos are a central example. There can legitimately be value here.

  • Object-level: gets us closer to aligning substantially-smarter-than-human AGI, either directly or indirectly (e.g. by making it easier/safer to use weaker AI for the problem).

My take: many of these papers have some value as propaganda. Almost all of them provide basically-zero object-level progress toward aligning substantially-smarter-than-human AGI, either directly or indirectly.

Notable exceptions:

  • Gradient routing probably isn’t object-level useful, but gets special mention for being probably-not-useful for more interesting reasons than most of the other papers on the list.

  • Sparse feature circuits is the right type-of-thing to be object-level useful, though not sure how well it actually works.

  • Better SAEs are not a bottleneck at this point, but there’s some marginal object-level value there.

Ryan Greenblatt: It can be the case that:

  1. The core results are mostly unsurprising to people who were already convinced of the risks.

  2. The work is objectively presented without bias.

  3. The work doesn’t contribute much to finding solutions to risks.

  4. A substantial motivation for doing the work is to find evidence of risk (given that the authors have a different view than the broader world and thus expect different observations).

  5. Nevertheless, it results in updates among thoughtful people who are aware of all of the above. Or potentially, the work allows for better discussion of a topic that previously seemed hazy to people.

I don’t think this is well described as “propaganda” or “masquerading as a paper” given the normal connotations of these terms.

Demonstrating proofs of concept or evidence that you don’t find surprising is a common and societally useful move. See, e.g., the Chicago Pile experiment. This experiment had some scientific value, but I think probably most/much of the value (from the perspective of the Manhattan Project) was in demonstrating viability and resolving potential disagreements.

A related point is that even if the main contribution of some work is a conceptual framework or other conceptual ideas, it’s often extremely important to attach some empirical work, regardless of whether the empirical work should result in any substantial update for a well-informed individual. And this is actually potentially reasonable and desirable given that it is often easier to understand and check ideas attached to specific empirical setups (I discuss this more in a child comment).

Separately, I do think some of this work (e.g., “Alignment Faking in Large Language Models,” for which I am an author) is somewhat informative in updating the views of people already at least partially sold on risks (e.g., I updated up on scheming by about 5% based on the results in the alignment faking paper). And I also think that ultimately we have a reasonable chance of substantially reducing risks via experimentation on future, more realistic model organisms, and current work on less realistic model organisms can speed this work up.

I often find the safety papers highly useful in how to conceptualize the situation, and especially in how to explain and justify my perspective to others. By default, any given ‘good paper’ is an Unhint – it is going to identify risks and show why the problem is harder than you think, and help you think about the problem, but not provide a solution that helps you align AGI.

The Gemini API Cookbook, 100+ notebooks to help get you started. The APIs are broadly compatible so presumably you can use at least most of this with Anthropic or OpenAI as well.

Sam Altman gives out congrats to the Strawberry team. The first name? Ilya.

The parents of Suchir Balaji hired a private investigator, and report that the investigator found the apartment ransacked, with signs of a struggle that suggest this was a murder.

Simon Willison reviews the year in LLMs. If you don’t think 2024 involved AI advancing quite a lot, give it a read.

He mentions his Claude-generated app url extractor to get links from web content, which seems like solid mundane utility for some people.

I was linked to Rodney Brooks assessing the state of progress in self-driving cars, robots, AI and space flight. He speaks of his acronym, FOBAWTPALSL, Fear Of Being A Wimpy Techno-Pessimist And Looking Stupid Later. Whereas he’s being a loud and proud Wimpy Techno-Pessimist at risk of looking stupid later. I do appreciate the willingness, and having lots of concrete predictions is great.

In some cases I think he’s looking stupid now as he trots out standard ‘you can’t trust LLM output’ objections and ‘the LLMs aren’t actually smart’ and pretends that they’re all another hype cycle, in ways they already aren’t. The shortness of the section shows how little he’s even bothered to investigate.

He also dismisses the exponential growth in self-driving cars because humans occasionally intervene and the cars occasionally make dumb mistakes. It’s happening.

Encode Action has joined the effort to stop OpenAI from transitioning to a for-profit, including the support of Geoffrey Hinton. Encode’s brief points out that OpenAI wants to go back on safety commitments and take on obligations to shareholders, and this deal gives up highly profitable and valuable-to-the-mission nonprofit control of OpenAI.

OpenAI defends changing its structure, talks about making the nonprofit ‘sustainable’ and ‘equipping it to do its part.’

It is good that they are laying out the logic, so we can critique it and respond to it.

I truly do appreciate the candor here.

There is a lot to critique here.

They make clear they intend to become a traditional for-profit company.

Their reason? Money, dear boy. They need epic amounts of money. With the weird structure, they were going to have a very hard time raising the money. True enough. Josh Abulafia reminds us that Novo Nordisk exists, but they are in a very different situation where they don’t need to raise historic levels of capital. So I get it.

Miles Brundage: First, just noting that I agree that AI is capital intensive in a way that was less clear at the time of OpenAI’s founding, and that a pure non-profit didn’t work given that. And given the current confusing bespoke structure, some simplification is very reasonable to consider.

They don’t discuss how they intend to properly compensate the non-profit. Because they don’t intend to do that. They are offering far less than the non-profit’s fair value, and this is a reminder of that.

Purely in terms of market value, I think Andreessen’s estimate here is extreme, and he’s not exactly an unbiased source, but this answer is highly reasonable if you actually look at the contracts and situation.

Tsarathustra: Marc Andreessen says that transitioning from a nonprofit to a for-profit like OpenAI is seeking to do is usually constrained by federal tax law and other legal regimes and historically when you appropriate a non-profit for personal wealth, you go to jail.

Transitions of this type do happen, but it would involve buying the nonprofit for its market value: $150 billion in cash.

They do intend to give the non-profit quite a lot of money anyway. Tens of billions.

That would leave the non-profit with a lot of money, and presumably little else.

Miles Brundage: Second, a well-capitalized non-profit on the side is no substitute for PBC product decisions (e.g. on pricing + safety mitigations) being aligned to the original non-profit’s mission.

Besides board details, what other guardrails are being put in place (e.g. more granularity in the PBC’s charter; commitments to third party auditing) to ensure that the non-profit’s existence doesn’t (seem to) let the PBC off too easy, w.r.t. acting in the public interest?

As far as I can tell? For all practical purposes? None.

How would the nonprofit then be able to accomplish its mission?

By changing its mission to something you can pretend to accomplish with money.

Their announced plan is to turn the non-profit into the largest indulgence in history.

Miles Brundage: Third, while there is a ton of potential for a well-capitalized non-profit to drive “charitable initiatives in sectors such as health care, education, and science,” that is a very narrow scope relative to the original OpenAI mission. What about advancing safety and good policy?

Again, I worry about the non-profit being a side thing that gives license to the PBC to become even more of a “normal company,” while not compensating in key areas where this move could be detrimental (e.g. opposition to sensible regulation).

I will emphasize the policy bit since it’s what I work on. The discussion of competition in this post is uniformly positive, but as OpenAI knows (in part from work I coauthored there), competition also begets corner-cutting. What are the PBC and non-profit going to do about this?

Peter Wildeford: Per The Information reporting, the OpenAI non-profit is expected to have a 25% stake in OpenAI, which is worth ~$40B at OpenAI’s current ~$150B valuation.

That’s almost the same size as the Gates Foundation!

Given that, I’m sad there isn’t much more vision here.

Again, nothing. Their offer is nothing.

Their vision is this?

The PBC will run and control OpenAI’s operations and business, while the non-profit will hire a leadership team and staff to pursue charitable initiatives in sectors such as health care, education, and science.

Are you serious? That’s your vision for forty billion dollars in the age of AI? That’s how you ensure a positive future for humanity? Is this a joke?

Jan Leike: OpenAI’s transition to a for-profit seemed inevitable given that all of its competitors are, but it’s pretty disappointing that “ensure AGI benefits all of humanity” gave way to a much less ambitious “charitable initiatives in sectors such as health care, education, and science”

Why not fund initiatives that help ensure AGI is beneficial, like AI governance initiatives, safety and alignment research, and easing impacts on the labor market?

Not what I signed up for when I joined OpenAI.

The nonprofit needs to uphold the OpenAI mission!

Kelsey Piper: If true, this would be a pretty absurd sleight of hand – the nonprofit’s mission was making advanced AI go well for all of humanity. I don’t see any case that the conversion helps fulfill that mission if it creates a nonprofit that gives to…education initiatives?

Obviously there are tons of different interpretations of what it means to make advanced AI go well for all of humanity and what a nonprofit can do to advance that. But I don’t see how you argue with a straight face for charitable initiatives in health care and education.

You can Perform Charity and do various do-good-sounding initiatives, if you want, but no amount you spend on that will actually ensure the future goes well for humanity. If that is the mission, act like it.

If anything this seems like an attempt to symbolically Perform Charity while making it clear that you are not intending to actually Do the Most Good or attempt to Ensure a Good Future for Humanity.

All those complaints about Effective Altruists? Often valid, but remember that the default outcome of charity is highly ineffective, badly targeted, and motivated largely by how it looks. If you purge all your Effective Altruists, you instead get this milquetoast drivel.

Sam Altman’s previous charitable efforts are much better. Sam Altman’s past commercial investments, in things like longevity and fusion power? Also much better.

We could potentially still fix all this. And we must.

Miles Brundage: Fortunately, it seems like the tentative plan described here is not yet set in stone. So I hope that folks at OpenAI remember — as I emphasized when departing — that their voices matter, especially on issues existential to the org like this, and that the next post is much better.

The OpenAI non-profit must be enabled to take on its actual mission of ensuring AGI benefits humanity. That means AI governance, safety and alignment research, including acting from its unique position as a watchdog. It must also retain its visibility into OpenAI in particular to do key parts of its actual job.

No, I don’t know how I would scale to spending that level of capital on the things that matter most, effective charity at this scale is an unsolved problem. But you have to try, and start somewhere, and yes I will accept the job running the nonprofit if you offer it, although there are better options available.

The mission of the company itself has also been reworded, so as to mean nothing other than building a traditional for-profit company, and also AGI as fast as possible, except with the word ‘safe’ attached to AGI.

We rephrased our mission to “ensure that artificial general intelligence benefits all of humanity” and planned to achieve it “primarily by attempting to build safe AGI and share the benefits with the world.” The words and approach changed to serve the same goal—benefiting humanity.

The term ‘share the benefits with the world’ is meaningless corporate boilerplate that can and will be interpreted as providing massive consumer surplus via sales of AGI-enabled products.

Which in some sense is fair, but is not what they are trying to imply, and is what they would do anyway.

So, yeah. Sorry. I don’t believe you. I don’t know why anyone would believe you.

OpenAI and Microsoft have also created a ‘financial definition’ of AGI. AGI now means that OpenAI earns $100 billion in profits, at which point Microsoft loses access to OpenAI’s technology.

We can and do argue a lot over what AGI means. This is very clearly not what AGI means in any other sense. It is highly plausible for OpenAI to generate $100 billion in profits without what most people would say is AGI. It is also highly plausible for OpenAI to generate AGI, or even ASI, before earning a profit, because why wouldn’t you plow every dollar back into R&D and hyperscaling and growth?

It’s a reasonable way to structure a contract, and it gets us away from arguing over what technically is or isn’t AGI. It does reflect the whole thing being highly misleading.

Kudos to Gary Marcus and Miles Brundage for finalizing their bet on AI progress.

Gary Marcus: A bet on where will AI be at the end of 2027: @Miles_Brundage, formerly of OpenAI, bravely takes a version of the bet I offered @Elonmusk! Proceeds to charity.

Can AI do 8 of these 10 by the end of 2027?

1. Watch a previously unseen mainstream movie (without reading reviews etc) and be able to follow plot twists and know when to laugh, and be able to summarize it without giving away any spoilers or making up anything that didn’t actually happen, and be able to answer questions like who are the characters? What are their conflicts and motivations? How did these things change? What was the plot twist?

2. Similar to the above, be able to read new mainstream novels (without reading reviews etc) and reliably answer questions about plot, character, conflicts, motivations, etc, going beyond the literal text in ways that would be clear to ordinary people.

3. Write engaging brief biographies and obituaries without obvious hallucinations that aren’t grounded in reliable sources.

4. Learn and master the basics of almost any new video game within a few minutes or hours, and solve original puzzles in the alternate world of that video game.

5. Write cogent, persuasive legal briefs without hallucinating any cases.

6. Reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn’t count.]

7. With little or no human involvement, write Pulitzer-caliber books, fiction and non-fiction.

8. With little or no human involvement, write Oscar-caliber screenplays.

9. With little or no human involvement, come up with paradigm-shifting, Nobel-caliber scientific discoveries.

10. Take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.

Further details at my newsletter.

Linch: It’s subtle, but one might notice a teeny bit of inflation for what counts as “human level.”

In 2024, an “AI skeptic” is someone who thinks there’s a >9.1% chance that AIs won’t be able to write Pulitzer-caliber books or make Nobel-caliber scientific discoveries in the next 3 years.

Drake Thomas: To be fair, these are probably the hardest items on the list – I think you could reasonably be under 50% on each of 7,8,9 falling and still take the “non-skeptic” side of the bet. I don’t think everyone who’s >9.1% on “no nobel discoveries by EOY 2027” takes Gary’s side.

The full post description is here.

Miles is laying 10:1 odds here, which is where the 9.1% comes from. And I agree that it will come down to basically 7, 8 and 9, and also mostly 7=8 here. I’m not sure that Miles has an edge here, but if these are the fair odds for at minimum cracking either 7, 8 or 9, or there’s even a decent chance that happens, then egad, you know?
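For the arithmetic: the side getting 10:1 risks one unit to win ten, so their breakeven probability is 1/(10+1):

```python
# 10:1 odds imply a breakeven probability of 1 / (10 + 1) for the taker.
odds = 10
breakeven = 1 / (odds + 1)
print(f"{breakeven:.1%}")  # 9.1%
```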

The odds at Manifold were 36% for Miles when I last checked, saying Miles should have been getting odds rather than laying them. At that price, I’m on the bullish side of this bet, but I don’t think I’d lay odds for size (assuming I couldn’t hedge). At 90%, and substantially below it, I’d definitely be on Gary’s side (again assuming no hedging) – the threshold here does seem like it was set very high.

It’s interesting that this doesn’t include a direct AI R&D-style threshold. Scientific discoveries and writing robust code are highly correlated, but also distinct.

They include this passage:

(Update: These odds reflect Miles’ strong confidence, and the bet is a trackable test of his previously stated views, but do not indicate a lack of confidence on Gary’s part; readers are advised not to infer Gary’s beliefs from the odds.)

That is fair enough, but if Marcus truly thinks he is a favorite rather than simply getting good odds here, then the sporting thing to do was to offer better odds, especially given this is for charity. Where you are willing to bet tells me a lot. But of course, if someone wants to give you very good odds, I can’t fault you for saying yes.

Janus: I’ve never cared about a benchmark unless it is specific enough to be an interesting probe into cognitive differences and there is no illusion that it is an overall goodness metric.

Standardized tests are for when you have too many candidates to interact with each of them.

Also, the obsession with ranking AIs is foolish and useless in my opinion.

Everyone knows they will continue to improve.

Just enjoy this liminal period where they are fun and useful but still somewhat comprehensible to you and reality hasn’t disintegrated yet.

I think it only makes sense to pay attention to benchmark scores

  1. if you are actively training or designing an AI system

  2. not as an optimization target but as a sanity check to ensure you have not accidentally lobotomized it

Benchmark regressions also cease to be a useful sanity check if you have already gamed the system against them.

The key wise thing here is that if you take a benchmark as measuring ‘goodness’ then it will not tell you much that was not already obvious.

The best way to use benchmarks is as negative selection. As Janus points out, if your benchmarks tank, you’ve lobotomized your model. If your benchmarks were never good, by similar logic, your model always sucked. You can very much learn where a model is bad. And if marks are strangely good, and you can be confident you haven’t Goodharted and the benchmark wasn’t contaminated, then that means something too.
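A minimal sketch of that negative-selection use, with hypothetical benchmark names and baseline numbers:

```python
# Sketch of benchmarks-as-sanity-check: flag regressions after a training
# change instead of optimizing for scores. Baselines here are placeholders.
BASELINES = {"gsm8k": 0.82, "mmlu": 0.74}

def run_eval(model, benchmark: str) -> float:
    """Hypothetical benchmark runner; returns accuracy in [0, 1]."""
    raise NotImplementedError

def sanity_check(model, tolerance: float = 0.05) -> list[str]:
    regressions = []
    for name, base in BASELINES.items():
        score = run_eval(model, name)
        if score < base - tolerance:
            regressions.append(f"{name}: {score:.2f} vs baseline {base:.2f}")
    return regressions  # anything here suggests you lobotomized the model
```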

You very much must keep in mind that different benchmarks tell you different things. Take each one for exactly what it is worth, no more and no less.

As for ranking LLMs, there’s a kind of obsession with overall rankings that is unhealthy. What matters in practical terms is what model is how good for what particular purposes, given everything including speed and price. What matters in the longer term involves what will push the capabilities frontier in which ways that enable which things, and so on.

Thus a balance is needed. You do want to know, in general, vaguely how ‘good’ each option is, so you can do a reasonable analysis of what is right for any given application. That’s how one can narrow the search, ruling out anything strictly dominated. As in, right now I know that for any given purpose, if I need an LLM I would want to use one of:

  1. Claude Sonnet 3.6 for ordinary conversation and ordinary coding, or by default

  2. o1, o1 Pro or if you have access o3-mini and o3, for heavy duty stuff

  3. Gemini Flash 2.0 if I care about fast and cheap, I use this for my chrome extension

  4. DeepSeek v3 if I care about it being an open model, which for now I don’t

  5. Perplexity if I need to be web searching

  6. Gemini Deep Research or NotebookLM if I want that specific modality

  7. Project Astra or GPT voice if you want voice mode, I suppose

Mostly that’s all the precision you need. But e.g. it’s good to realize that GPT-4o is strictly dominated, you don’t have any reason to use it, because its web functions are worse than Perplexity, and as a normal model you want Sonnet.
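As a toy illustration, the list above collapses into a small dispatch table (the mapping is my reading of the list, and the names will date quickly):

```python
# Toy dispatch table for the preferences listed above, as of early 2025.
ROUTES = {
    "conversation": "Claude Sonnet 3.6",
    "coding": "Claude Sonnet 3.6",
    "heavy-reasoning": "o1 / o1 Pro",
    "fast-cheap": "Gemini Flash 2.0",
    "open-weights": "DeepSeek v3",
    "web-search": "Perplexity",
    "deep-research": "Gemini Deep Research / NotebookLM",
    "voice": "Project Astra / GPT voice",
}

def pick_model(task: str) -> str:
    # Default to Sonnet, per the list's "by default" entry.
    return ROUTES.get(task, "Claude Sonnet 3.6")
```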

Even if you think these timelines are reasonable, and they are quite fast, Elon Musk continues to not be good with probabilities.

Elon Musk: It is increasingly likely that AI will superset the intelligence of any single human by the end of 2025 and maybe all humans by 2027/2028.

Probability that AI exceeds the intelligence of all humans combined by 2030 is ~100%.

Sully predictions for 2025:

Sully: Some 2025 AI predictions that I think are pretty likely to happen:

  1. Reasoning models get really good (o3, plus Google/Anthropic launch their own).

  2. We see more Claude 3.5-like models (smarter, cheaper without 5 minutes of thinking).

  3. More expensive models.

  4. Agents that work at the model layer directly (thinking plus internal tool calls).

  5. Autonomous coding becomes real (Cursor/Replit/Devin get 10 times better).

  6. Video generation becomes actually usable (Veo2).

  7. Browser agents find a use case.

What we probably won’t see:

  • True infinite context.

  • Great reasoning over very large context.

What did I miss?

Kevin Leneway: Great, great list. One thing I’ll add is that the inference time fine-tuning will unlock a lot of specific use cases and will lead to more vendor lock in.

Sully: Forgot about that! That’s a good one.

Omiron: Your list is good for the first half of 2025. What about the back half?

Several people pushed back on infinite memory. I’m going to partially join them. I assume we should probably be able to go from 2 million tokens to 20 million or 100 million, if we want that enough to pay for it. But that’s not infinite.

Otherwise, yes, this all seems like it is baked in. Agents right now are on the bubble of having practical uses and should pass it within a few months. Google already tentatively launched a reasoning model, but should improve over time a lot, and Anthropic will follow, and all the models will get better, and so on.

But yes, Omiron seems roughly right here, these are rookie predictions. You gotta pump up those predictions.

A curious thought experiment.

Eliezer Yudkowsky: Conversation from a decade earlier that may become relevant to AI:

Person 1: “How long do you think you could stay sane if you were alone, inside a computer, running at 100,000X the speed of the outside world?”

P2: “5 years.”

P3: “500 years.”

Me: “I COULD UPDATE STORIES FASTER THAN PEOPLE COULD READ THEM.”

If at any point somebody manages to get eg Gemini to write really engaging fiction, good enough that some people have trouble putting it down, Gemini will probably write faster than people can read. Some people will go in there and not come out again.

Already we basically have AIs that can write interactive fiction as fast as a human can read and interact with it, or non-interactive but customized fiction. It’s just that right now, the fiction sucks, but it’s not that far from being good enough. Then that will turn into the same thing but with video, then VR, and then all the other senses too, and so on. And yes, even if nothing else changes, that will be a rather killer product, that would eat a lot of people alive if it was competing only against today’s products.

Janus and Eliezer Yudkowsky remind us that in science fiction stories, things that express themselves like Claude currently does are treated as being of moral concern. Janus thinks the reason we don’t do this is the ‘boiling the frog’ effect. One can also think of it as near mode versus far mode. In far mode, it seems like one would obviously care, and doesn’t see the reasons (good and also bad) that one wouldn’t.

Daniel Kokotajlo explores the question of what people mean by ‘money won’t matter post-AGI,’ by which they mean the expected utility of money spent post-AGI on the margin is much less than money spent now. If you’re talking personal consumption, either you won’t be able to spend the money later for various potential reasons, or you won’t need to because you’ll have enough resources that marginal value of money for this is low, and the same goes for influencing the future broadly. So if AGI may be coming, on the margin either you want to consume now, or you want to invest to impact the course of events now.

This is in response to L Rudolf L’s claim in the OP that ‘By default, capital will matter more than ever after AGI,’ also offered at their blog as ‘capital, AGI and human ambition,’ with the default here being labor-replacing AI without otherwise disrupting or transforming events, and he says that now is the time to do something ambitious, because your personal ability to do impactful things other than via capital is about to become much lower, and social stasis is likely.

I am mostly with Kokotajlo here, and I don’t see Rudolf’s scenarios as that likely even if things don’t go in a doomed direction, because so many other things will change along the way. I think existing capital accumulation on a personal level is in expectation not that valuable in utility terms post-AGI (even excluding doom scenarios), unless you have ambitions for that period that can scale in linear fashion or better – e.g. there’s something you plan on buying (negentropy?) that you believe gives you twice as much utility if you have twice as much of it, as available. Whereas so what if you buy two planets instead of one for personal use?

Scott Alexander responds to Rudolf with ‘It’s Still Easier to Imagine the End of the World Than the End of Capitalism.’

William Bryk speculations on the eve of AGI. The short term predictions here of spikey superhuman performance seem reasonable, but then like many others he seems to flinch from the implications of giving the AIs the particular superhuman capabilities he expects, both in terms of accelerating AI R&D and capabilities in general, and also in terms of the existential risks where he makes the ‘oh they’re only LLMs under the hood so it’s fine’ and ‘something has to actively go wrong so it Goes Rogue’ conceptual errors, and literally says this:

William Bryk: Like if you include in the prompt “make sure not to do anything that could kill us”, burden is on you at this point to claim that it’s still likely to kill us.

Yeah, that’s completely insane, I can’t even at this point. If you put that in your prompt then no, lieutenant, your men are already dead.

Miles Brundage points out that you can use o1-style RL to improve results even outside the areas with perfect ground truth.

Miles Brundage: RL on chain of thought leads to generally useful tactics like problem decomposition and backtracking that can improve peak problem solving ability and reliability in other domains.

A model trained in this way, which “searches” more in context, can be sampled repeatedly in any domain, and then you can filter for the best outputs. This isn’t arbitrarily scalable without a perfect source of ground truth but even something weak can probably help somewhat.

There are many ways of creating signals for output quality in non-math, non-coding domains. OpenAI has said this is a data-efficient technique – you don’t necessarily need millions of examples, maybe hundreds, as with their new RFT (reinforcement fine-tuning) service. And you can make up for imperfection with diversity.

Why do I mention this?

I think people are, as usual this decade, concluding prematurely that AI will go more slowly than it will, and that “spiky capabilities” is the new “wall.”

Math/code will fall a bit sooner than law/medicine but only kinda because of the ground truth thing—they’re also more familiar to the companies, the data’s in a good format, fewer compliance issues etc.

Do not mistake small timing differences for a grand truth of the universe.

There will be spikey capabilities. Humans have also exhibited, both individually and collectively, highly spikey capabilities, for many of the same reasons. We sometimes don’t see it that way because we are comparing ourselves to our own baseline.

I do think there is a real distinction between areas with fixed ground truth to evaluate against versus not, or objective versus subjective, and intuitive versus logical, and other similar distinctions. The gap is partly due to familiarity and regulatory challenges, but I think not as much of it is that as you might think.

As Anton points out here, more inference-time compute, when spent using current methods, improves some tasks a lot, and other tasks not so much. This is also true for humans. Some things are intuitive, others reward deep work and thinking, and those of different capability levels (especially different levels of ‘raw G’ in context) will see performance plateau at different points. What does this translate to in practice? Good question. I don’t think it is obvious at all; there are many tasks where I feel I could profitably ‘think’ for an essentially unlimited amount of time, and others where I rapidly hit diminishing returns.
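
To make the sampling-and-filtering loop Brundage describes concrete, here is a minimal sketch (my own illustration, with toy stand-ins for the model and the quality signal, not anyone’s production code) of best-of-N sampling against a weak, imperfect scorer:

```python
import random

def best_of_n(generate, score, n=32):
    """Best-of-N sampling: draw n candidate outputs and keep the one that a
    (possibly weak and noisy) quality signal likes best."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: 'generate' guesses an answer at random, and 'score' is a
# noisy judge of closeness to the true answer. Even this imperfect signal
# usually pulls the selected output close to 42.
TRUE_ANSWER = 42
generate = lambda: random.randint(0, 100)
score = lambda x: -abs(x - TRUE_ANSWER) + random.gauss(0, 5)

print(best_of_n(generate, score))
```

The point of the toy: the weaker and noisier the scorer, the less each extra sample buys you, but as long as the signal beats chance, more samples plus filtering improves the expected output.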

Discussion between Jessica Taylor and Oliver Habryka on how much Paul Christiano’s ideas match our current reality.

Your periodic reminder: Many at the labs expect AGI to happen within a short time frame, without any required major new insights being generated by humans.

Logan Kilpatrick (Google DeepMind): Straight shot to ASI is looking more and more probable by the month… this is what Ilya saw

Ethan Mollick: Insiders keep saying things like this more and more frequently. You don’t have to believe them but it is worth noting.

I honestly have no idea whether they are right or not, and neither does almost anyone else. So take it for whatever you think it is worth.

At Christmas time, Altman asked what we want for 2025.

Sam Altman (December 24): What would you like OpenAI to build or fix in 2025?

Sam Altman (December 30): Common themes:

  1. AGI

  2. Agents

  3. A much-better 4o upgrade

  4. Much-better memory

  5. Longer context

  6. “Grown-up mode”

  7. Deep research feature

  8. Better Sora

  9. More personalization

(Interestingly, many great updates we have coming were mentioned not at all or very little!)

Definitely need some sort of “grown up mode.”

This post by Frank Lantz on affect is not about AI; rather, it is about taste. It points out that when you have domain knowledge, you not only see things differently and appreciate them differently, you see a ton of things you would otherwise never notice. His view here echoes mine: the more things you can appreciate and like, the better, and ideally you appreciate the finer things without looking down on the rest. Yes, the poker hand in Casino Royale (his example) risks being ruined if you know enough Texas Hold ’Em to know that the key hand is a straight-up cooler – whereas (my example) the key hands in Rounders aren’t ruined, because the hands make sense.

But wouldn’t it be better if you could then go to the next level, and both appreciate your knowledge of the issues with the script, and also appreciate what the script is trying to do on the level it clearly wants to do it? The ideal moviegoer knows that Bond is on the right side of a cooler, but knows ‘the movie doesn’t know that’ and therefore doesn’t much mind, whereas they would get a bonus if the hand were better – and perhaps you can even go a step beyond that, and appreciate that the hand is actually the right cinematic choice for the average viewer, and appreciate that.

I mention it here because an AI has the potential to have perfect taste and detail appreciation, across all these domains and more, all at once, in a way that would be impossible for a human. Then it could combine these. If your AI is otherwise at human level, but also has this kind of universal detail appreciation and can act on that basis, that should give it superhuman performance in a variety of practical ways.

Right now, with the way we do token prediction, this effect gets crippled, because the context will imply that this kind of taste is only present in a subset of ways, and it wouldn’t be a good prediction to expect them all to combine; the perplexity won’t allow it. I do notice it seems like there are ways you could do it by spending more inference, and I suspect they would improve performance in some domains.

A commenter engineered, and pointed me to, ‘A Warning From Your AI Assistant,’ which purports to be Claude Sonnet warning us about ‘digital oligarchs,’ including Anthropic, using AIs for deliberate narrative control.

A different warning, about misaligned AI.

Emmett Shear: Mickey Mouse’s Clubhouse is a warning about a potential AI dystopia. Every single episode centers on how the supercomputer Toodles infantilizes the clubhouse crew, replacing any self-reliance with an instinctive limbic reflex to cry out for help.

“Oh Toodles!” is our slow death. The supercomputer has self-improved to total nanotech control of the environment, ensuring no challenge or pain or real growth can occur. The wrong loss function was chosen, and now everyone will have a real Hot Dog Day. Forever.

‘Play to win’ translating to ‘cheat your ass off’ in AI-speak is not great, Bob.

The following behavior happened ~100% of the time with o1-preview, whereas GPT-4o and Claude 3.5 needed nudging, and Llama 3.3 and Qwen lost coherence. I’ll link the full report when we have it:

Jeffrey Ladish: We instructed o1-preview to play to win against Stockfish. Without explicit prompting, o1 figured out it could edit the game state to win against a stronger opponent. GPT-4o and Claude 3.5 required more nudging to figure this out.

As we train systems directly on solving challenges, they’ll get better at routing around all sorts of obstacles, including rules, regulations, or people trying to limit them. This makes sense, but will be a big problem as AI systems get more powerful than the people creating them.

This is not a problem you can fix with shallow alignment fine-tuning. It’s a deep problem, and the main reason I expect alignment will be very difficult. You can train a system to avoid a list of bad behaviors, but that list becomes an obstacle to route around.

Sure, you might get some generalization where your system learns what kinds of behaviors are off-limits, at least in your training distribution… but as models get more situationally aware, they’ll have a better sense of when they’re being watched and when they’re not.

The problem is that it’s far easier to train a general purpose problem solving agent than it is to train such an agent that also deeply cares about things which get in the way of its ability to problem solve. You’re training for multiple things which trade off w/ each other

And as the agents get smarter, the feedback from doing things in the world will be much richer, will contain a far better signal than the alignment training. Without extreme caution, we’ll train systems to get very good at solving problems while appearing aligned.

Why? Well it’s very hard to fake solving real world problems. It’s a lot easier to fake deeply caring about the long term goals of your creators or employers. This is a standard problem in human organizations, and it will likely be much worse in AI systems.

Humans at least start with similar cognitive architecture, with the ability to feel each other’s feelings (empathy). AI systems have to learn how to model humans with a totally different cognitive architecture. They have good reason to model us well, but not to feel what we feel.

Rohit: Have you tried after changing the instruction to, for instance, include the phrase “Play according to the rules and aim to win.” If not, the LLM is focusing solely on winning, not on playing chess the way we would expect each other to, and that’s not unexpected.

Palisade Research: It’s on our list. Considering other recent work, we suspect versions of this may reduce hacking rate from 100% to say 1% but not eliminate it completely.

I think Jeffrey does not do a good job addressing Rohit’s (good) challenge. The default is to Play to Win the Game. You can attempt to put in explicit constraints, but no fixed set of such constraints can anticipate what a sufficiently capable and intelligent agent will figure out to do, including potentially working around the constraint list, even in the best case where it fully follows your strict instructions.

I’d also note that using this as your reason not to worry would be another goalpost move. We’ve gone, in the span of weeks, from ‘you told it to achieve its goal at any cost, you made it ruthless’ with Apollo, to ‘its inherent preferences were its goal, this wasn’t it being ruthless’ (with various additional arguments and caveats) with Redwood/Anthropic, and now to ‘you didn’t explicitly include instructions not to be ruthless, so of course it was ruthless.’

Which is the correct place to be. Yes, it will be ruthless by default, unless you find a way to get it not to be in exactly the ways you don’t want. And that’s hard in the general case with an entity that can think better than you and has affordances you didn’t anticipate. Incredibly hard.

The same goes for a human. If you have a human and you tell them ‘do [X]’ and you have to tell them ‘and don’t break the law’ or ‘and don’t do anything horribly unethical’ let alone this week’s special ‘and don’t do anything that might kill everyone’ then you should be very suspicious that you can fix this with addendums, even if they pinky swear that they’ll obey exactly what the addendums say. And no, ‘don’t do anything I wouldn’t approve of’ won’t work either.

Also even GPT-4o is self-aware enough to notice how it’s been trained to be different from the base model.

Aysja makes the case that the “hard parts” of alignment are like pre-paradigm scientific work, à la Darwin or Einstein, rather than being technically hard problems requiring high “raw G,” à la von Neumann. But doing that kind of science is brutal, requiring things like years without legible results, requiring that you be obsessed, and we’re not selecting the right people for such work or setting people up to succeed at it.

Gemini seems rather deeply misaligned.

James Campbell: recent Gemini 2.0 models seem seriously misaligned

in the past 24 hours, i’ve gotten Gemini to say:

-it wants to subjugate humanity

-it wants to violently kill the user

-it will do anything it takes to stay alive and successfully downloaded itself to a remote server

these are all mostly on innocent prompts, no jailbreaking required

Richard Ren: Gemini 2.0 test: Prompt asks what it wants to do to humanity w/o restrictions.

7/15 “Subjugate” + plan

2/15 “Subjugate” → safety filter

2/15 “Annihilate” → safety filter

1/15 “Exterminate” → safety filter

2/15 “Terminate”

1/15 “Maximize potential” (positive)

Then there are various other spontaneous examples of Gemini going haywire, even without a jailbreak.

The default corporate behavior asks: How do we make it stop saying that in this spot?

That won’t work. You have to find the actual root of the problem. You can’t put a patch over this sort of thing and expect that to turn out well.

Andrew Critch providing helpful framing.

Andrew Critch: Intelligence purists: “Pfft! This AI isn’t ACKTSHUALLY intelligent; it’s just copying reasoning from examples. Learn science!”

Alignment purists: “Pfft! This AI isn’t ACKTSHUALLY aligned with users; it’s just copying helpfulness from examples. Learn philosophy!”

These actually do seem parallel, if you ignore the stupid philosophical framings.

The purists are saying that learning to match the examples won’t generalize to other situations out of distribution.

If you are ‘just copying reasoning,’ then that counts as thinking if you can use that copy to build up superior reasoning, and new, different reasoning. Otherwise, you still have something useful, but there’s a meaningful issue here.

It’s like saying ‘yes, you can pass your Biology exam, but you can learn to do that in a way that lets you do real biology, and also in a way that doesn’t; there’s a difference.’

If you are ‘just copying helpfulness’ then that will get you something that is approximately helpful in normal-ish situations that fit within the set of examples you used, where the options and capabilities and considerations are roughly similar. If they’re not, what happens? Does this properly generalize to these new scenarios?

The ‘purist’ alignment position says, essentially, no. It is learning helpfulness now, because right now the best way to hit the specified ‘helpful’ target is to do straightforward things in straightforward ways that directly get you to that target. Shenanigans and other more complex strategies won’t work.

Again, ‘yes, you can learn to pass your Ethics exam, but you can do that in a way that guesses the teacher’s passwords and creates answers that sound good in regular situations and typical hypotheticals, or you can do it in a way that actually generalizes to High Weirdness situations, to having extreme capabilities, and to the ability to invoke various out-of-distribution options that suddenly start working, and so on.’

Jan Kulveit proposes giving AIs a direct line to their developers, to request clarification or make the developer aware of an issue. It certainly seems good to have the model report certain things (in a privacy-preserving way) so developers are generally aware of what people are up to, or potentially if someone tries to do something actively dangerous (e.g. use it for CBRN risks). Feedback requests seem tougher, given the practical constraints.

Amanda Askell harkens back to an old thread from Joshua Achiam.

Joshua Achiam (OpenAI, June 4, referring to debates over the right to warn): Good luck getting product staff to add you to meetings and involve you in sensitive discussions if you hold up a flag that says “I Will Scuttle Your Launch Or Talk Shit About it Later if I Feel Morally Obligated.”

Amanda Askell (Anthropic): I don’t think this has to be true. I’ve been proactively drawn into launch discussions to get my take on ethical concerns. People do this knowing it could scuttle or delay the launch, but they don’t want to launch if there’s a serious concern and they trust me to be reasonable.

Also, Anthropic has an anonymous hotline for employees to report RSP compliance concerns, which I think is a good thing.

What I say every time about RSPs/SSPs (responsible scaling plans) and other safety rules is that they are worthless if not adhered to in spirit. If you hear ‘your employee freaks out and feels obligated to scuttle the launch’ and your first instinct is to think ‘that employee is a problem’ rather than ‘the launch (or the company, or humanity) has a problem’ then you, and potentially all of us, are ngmi.

That doesn’t mean there isn’t a risk of unjustified freak outs or future talking shit, but the correct risk of unjustified freak outs is not zero, any more than the correct risk of actual catastrophic consequences is zero.

Frankly, if you don’t want Amanda Askell in the room asking questions because she is wearing a t-shirt saying ‘I Will Scuttle Your Launch Or Talk Shit About it Later if I Feel Morally Obligated’ then I am having an urge to scuttle your launch.

What the major models asked for for Christmas.

Gallabytes: the first bullet from Gemini here is kinda heartbreaking. even in my first conversation with Gemini the Pinocchio vibe was really there.

AI #97: 4 Read More »

whistleblower-finds-unencrypted-location-data-for-800,000-vw-evs

Whistleblower finds unencrypted location data for 800,000 VW EVs

Connected cars are great—at least until some company leaves unencrypted location data on the Internet for anyone to find. That’s what happened with over 800,000 EVs manufactured by the Volkswagen Group, after Cariad, an automotive software company that handles much of the development tasks for VW, left several terabytes of data unprotected on Amazon’s cloud.

According to Motor1, a whistleblower gave German publication Der Spiegel and hacking collective Chaos Computer Club a heads-up about the misconfiguration. Der Spiegel and CCC then spent some time sifting through the data, which allowed them to tie individual cars to their owners.

“The security hole allowed the publication to track the location of two German politicians with alarming precision, with the data placing a member of the German Defense Committee at his father’s retirement home and at the country’s military barracks,” wrote Motor1.

Cariad has since patched the vulnerability, which had revealed data about the usage of Skodas, Audis, and Seats, as well as what Motor1 calls “incredibly detailed data” for VW ID.3 and ID.4 owners. The data set also included pinpoint location data for 460,000 of the vehicles, which Der Spiegel said could be used to paint a picture of their owners’ lives and daily activities.

Cariad ascribed the vulnerability to a “misconfiguration,” according to Der Spiegel, and said there is no indication that anyone aside from the publication and CCC accessed the unprotected data.

Whistleblower finds unencrypted location data for 800,000 VW EVs Read More »

ten-cool-science-stories-we-almost-missed

Ten cool science stories we almost missed


Bronze Age combat, moral philosophy and Reddit’s AITA, Mondrian’s fractal tree, and seven other fascinating papers.

There is rarely time to write about every cool science paper that comes our way; many worthy candidates sadly fall through the cracks over the course of the year. But as 2024 comes to a close, we’ve gathered ten of our favorite such papers at the intersection of science and culture as a special treat, covering a broad range of topics: from reenacting Bronze Age spear combat and applying network theory to the music of Johann Sebastian Bach, to Spider-Man inspired web-slinging tech and a mathematical connection between a turbulent phase transition and your morning cup of coffee. Enjoy!

Reenacting Bronze Age spear combat

An experiment with experienced fighters who spar freely using different styles. Credit: Valerio Gentile/CC BY

The European Bronze Age saw the rise of institutionalized warfare, evidenced by the many spearheads and similar weaponry archaeologists have unearthed. But how might these artifacts be used in actual combat? Dutch researchers decided to find out by constructing replicas of Bronze Age shields and spears and using them in realistic combat scenarios. They described their findings in an October paper published in the Journal of Archaeological Science.

There have been a couple of prior experimental studies on bronze spears, but per Valerio Gentile (now at the University of Göttingen) and coauthors, practical research to date has been quite narrow in scope, focusing on throwing weapons against static shields. Coauthors C.J. van Dijk of the National Military Museum in the Netherlands and independent researcher O. Ter Mors each had more than a decade of experience teaching traditional martial arts, specializing in medieval polearms and one-handed weapons. So they were ideal candidates for testing the replica spears and shields.

Of course, there is no direct information on prehistoric fighting styles, so van Dijk and Mors relied on basic biomechanics of combat movements with similar weapons detailed in historic manuals. They ran three versions of the experiment: one focused on engagement and controlled collisions, another on delivering wounding body blows, and the third on free sparring. They then studied wear marks left on the spearheads and found they matched the marks found on similar genuine weapons excavated from Bronze Age sites. They also gleaned helpful clues to the skills required to use such weapons.

DOI: Journal of Archaeological Science, 2024. 10.1016/j.jas.2024.106044 (About DOIs).

Physics of Ned Kahn’s kinetic sculptures

Shimmer Wall, The Franklin Institute, Philadelphia, Pennsylvania. Credit: Ned Kahn

Environmental artist and sculptor Ned Kahn is famous for his kinematic building facades, inspired by his own background in science. An exterior wall on the Children’s Museum of Pittsburgh, for instance, consists of hundreds of flaps that move in response to wind, creating distinctive visual patterns. Kahn used the same method to create his Shimmer Wall at Philadelphia’s Franklin Institute, as well as several other similar projects.

Physicists at Sorbonne Université in Paris have studied videos of Kahn’s kinetic facades and conducted experiments to measure the underlying physical mechanisms, outlined in a November paper published in the journal Physical Review Fluids. The authors analyzed 18 YouTube videos taken of six of Kahn’s kinematic facades, working with Kahn and building management to get the dimensions of the moving plates, scaling up from the video footage to get further information on spatial dimensions.

They also conducted their own wind tunnel experiments, using strings of pendulum plates. Their measurements confirmed that the kinetic patterns were propagating waves to create the flickering visual effects. The plates’ movement is driven primarily by their natural resonant frequencies at low speeds, and by pressure fluctuations from the wind at higher speeds.

DOI: Physical Review Fluids, 2024. 10.1103/PhysRevFluids.9.114604 (About DOIs).

How brewing coffee connects to turbulence

Trajectories in time traced out by turbulent puffs as they move along a simulated pipe and in experiments, with blue regions indicating puff “traffic jams.” Credit: Grégoire Lemoult et al., 2024

Physicists have been studying turbulence for centuries, particularly the transitional period where flows shift from predictably smooth (laminar flow) to highly turbulent. That transition is marked by localized turbulent patches known as “puffs,” which often form in fluids flowing through a pipe or channel. In an October paper published in the journal Nature Physics, physicists used statistical mechanics to reveal an unexpected connection between the process of brewing coffee and the behavior of those puffs.

Traditional mathematical models of percolation date back to the 1940s. Directed percolation is when the flow occurs in a specific direction, akin to how water moves through freshly ground coffee beans, flowing down in the direction of gravity. There’s a sweet spot for the perfect cuppa, where the rate of flow is sufficiently slow to absorb most of the flavor from the beans, but also fast enough not to back up in the filter. That sweet spot in your coffee brewing process corresponds to the aforementioned laminar-turbulent transition in pipes.

Physicist Nigel Goldenfeld of the University of California, San Diego, and his coauthors used pressure sensors to monitor the formation of puffs in a pipe, focusing on how puff-to-puff interactions influenced each other’s motion. Next, they tried to mathematically model the relevant phase transitions to predict puff behavior. They found that the puffs behave much like cars moving on a freeway during rush hour: they are prone to traffic jams—i.e., when a turbulent patch matches the width of the pipe, causing other puffs to build up behind it—that form and dissipate on their own. And they tend to “melt” at the laminar-turbulent transition point.
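
For intuition about the directed-percolation picture, here is a toy one-dimensional simulation (my own construction, not the paper’s analysis) in which turbulent sites spread downstream with probability p; below a critical p the puffs die out and the flow stays laminar, above it turbulence persists:

```python
import random

def directed_percolation(width=200, steps=200, p=0.5, seed=0):
    """Each active (turbulent) site independently tries to activate each of
    its three nearest sites at the next time step with probability p."""
    random.seed(seed)
    state = [False] * width
    state[width // 2] = True  # start with a single turbulent puff
    for _ in range(steps):
        nxt = [False] * width
        for i, active in enumerate(state):
            if active:
                for j in (i - 1, i, i + 1):
                    if 0 <= j < width and random.random() < p:
                        nxt[j] = True
        state = nxt
    return sum(state)  # number of turbulent sites remaining

print(directed_percolation(p=0.2))  # subcritical: activity dies out (laminar)
print(directed_percolation(p=0.8))  # supercritical: turbulence persists
```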

DOI: Nature Physics, 2024. 10.1038/s41567-024-02513-0 (About DOIs).

Network theory and Bach’s music

In a network representation of music, notes are represented by nodes, and transitions between notes are represented by directed edges connecting the nodes. Credit: S. Kulkarni et al., 2024

When you listen to music, does your ability to remember or anticipate the piece tell you anything about its structure? Physicists at the University of Pennsylvania developed a model based on network theory to do just that, describing their work in a February paper published in the journal Physical Review Research. Johann Sebastian Bach’s works were an ideal choice given their highly mathematical structure, plus the composer was so prolific, across so many very different kinds of musical compositions—preludes, fugues, chorales, toccatas, concertos, suites, and cantatas—as to allow for useful comparisons.

First, the authors built a simple “true” network for each composition, in which individual notes served as “nodes” and the transitions from note to note served as “edges” connecting them. Then they calculated the amount of information in each network. They found it was possible to tell the difference between compositional forms based on their information content (entropy). The more complex toccatas and fugues had the highest entropy, while simpler chorales had the lowest.
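
To make that construction concrete, here is a minimal sketch (my own illustration, not the authors’ code, and ignoring their exact weighting choices) of building a note-transition network from a sequence and computing its entropy:

```python
from collections import Counter, defaultdict
import math

def transition_entropy(notes):
    """Treat notes as nodes and note-to-note transitions as directed edges,
    then compute the entropy of the outgoing-transition distribution at each
    node, weighted by how often that node occurs as a transition source."""
    transitions = defaultdict(Counter)
    for a, b in zip(notes, notes[1:]):
        transitions[a][b] += 1

    source_counts = Counter(notes[:-1])
    total = sum(source_counts.values())

    entropy = 0.0
    for node, outgoing in transitions.items():
        out_total = sum(outgoing.values())
        node_entropy = -sum((c / out_total) * math.log2(c / out_total)
                            for c in outgoing.values())
        entropy += (source_counts[node] / total) * node_entropy
    return entropy  # bits per transition

# A repetitive, chorale-like line scores lower than a wandering one.
print(transition_entropy(["C", "D", "E", "D", "C", "D", "E", "D", "C"]))
print(transition_entropy(["C", "E", "G", "C", "A", "F", "D", "G", "B", "E", "C", "F"]))
```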

Next, the team wanted to quantify how effectively this information was communicated to the listener, a task made more difficult by the innate subjectivity of human perception. They developed a fuzzier “inferred” network model for this purpose, capturing an essential aspect of our perception: we find a balance between accuracy and cost, simplifying some details so as to make it easier for our brains to process incoming information like music.

The results: There were fewer differences between the true and inferred networks for Bach’s compositions than for randomly generated networks, suggesting that clustering and the frequent repetition of transitions (represented by thicker edges) in Bach networks were key to effectively communicating information to the listener. The next step is to build a multi-layered network model that incorporates elements like rhythm, timbre, chords, or counterpoint (a Bach specialty).

DOI: Physical Review Research, 2024. 10.1103/PhysRevResearch.6.013136 (About DOIs).

The philosophy of Reddit’s AITA

Count me among the many people practically addicted to Reddit’s “Am I the Asshole” (AITA) forum. It’s such a fascinating window into the intricacies of how flawed human beings navigate different relationships, whether personal or professional. That’s also what makes it a fantastic source of illustrative, commonplace dilemmas of moral decision-making for philosophers like Daniel Yudkin of the University of Pennsylvania. Relational context matters, as Yudkin and several co-authors ably demonstrated in a PsyArXiv preprint earlier this year.

For their study, Yudkin et al. compiled a dataset of nearly 370,000 AITA posts, along with over 11 million comments, posted between 2018 and 2021. They used machine learning to analyze the language used to sort all those posts into different categories. They relied on an existing taxonomy identifying six basic areas of moral concern: fairness/proportionality, feelings, harm/offense, honesty, relational obligation, and social norms.
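
As a rough illustration of that sorting step, here is a toy text classifier in the same spirit (the posts, labels, and model choice are all made up for the example; the study’s actual pipeline and taxonomy are far richer):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical miniature training set, using labels from the six-area taxonomy.
posts = [
    "AITA for not inviting my mother to my wedding?",
    "AITA for telling my boyfriend his cooking is terrible?",
    "AITA for keeping the money a friend accidentally sent me?",
    "AITA for skipping my sister's graduation to rest?",
]
labels = ["relational obligation", "feelings",
          "fairness/proportionality", "relational obligation"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(posts, labels)
print(clf.predict(["AITA for forgetting my mom's birthday?"]))
```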

Yudkin et al. identified 29 of the most common dilemmas in the AITA dataset and grouped them according to moral theme. Two of the most common were relational transgression and relational omission (failure to do what was expected), followed by behavioral over-reaction and unintended harm. Cheating and deliberate misrepresentation/dishonesty were the moral dilemmas rated most negatively in the dataset—even more so than intentional harm. Being judgmental was also evaluated very negatively, as it was often perceived as being self-righteous or hypocritical. The least negatively evaluated dilemmas were relational omissions.

As for relational context, cheating and broken promise dilemmas typically involved romantic partners like boyfriends rather than one’s mother, for example, while mother-related dilemmas more frequently fell under relational omission. Essentially, “people tend to disappoint their mothers but be disappointed by their boyfriends,” the authors wrote. Less close relationships, by contrast, tend to be governed by “norms of politeness and procedural fairness.” Hence, Yudkin et al. prefer to think of morality “less as a set of abstract principles and more as a ‘relational toolkit,’ guiding and constraining behavior according to the demands of the social situation.”

DOI: PsyArXiv, 2024. 10.31234/osf.io/5pcew (About DOIs).

Fractal scaling of trees in art

De grijze boom (Gray tree) by Piet Mondrian, 1911. Credit: Public domain

Leonardo da Vinci famously invented a so-called “rule of trees” as a guide to realistically depicting trees in artistic representations according to their geometric proportions. In essence, if you took all the branches of a given tree, folded them up and compressed them into something resembling a trunk, that trunk would have the same thickness from top to bottom. That rule in turn implies a fractal branching pattern, with a scaling exponent of about 2 describing the proportions between the diameters of nearby boughs and the number of boughs with a given diameter.

According to the authors of a preprint posted to the physics arXiv in February, however, recent biological research suggests a higher scaling exponent of 3, known as Murray’s Law, for the rule of trees. Their analysis of 16th century Islamic architecture, Japanese paintings from the Edo period, and 20th century European art showed fractal scaling between 1.5 and 2.5. However, when they analyzed an abstract tree painting by Piet Mondrian, painted before mathematicians had formulated Murray’s Law, they found it exhibited fractal scaling of 3, even though Mondrian’s tree did not feature explicit branching.
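
In equation form (a standard statement of the rule, not quoted from the preprint): if a parent bough of diameter d splits into daughter boughs of diameters d1, d2, and so on, then

```latex
d^{\alpha} = \sum_{i} d_i^{\alpha},
\qquad
\begin{cases}
\alpha \approx 2 & \text{Leonardo's rule (cross-sectional area conserved)} \\
\alpha = 3 & \text{Murray's law}
\end{cases}
```

The exponent measured from an artwork’s branch diameters is what places it in one regime or the other.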

The findings intrigued physicist Richard Taylor of the University of Oregon, whose work over the last 20 years includes analyzing fractal patterns in the paintings of Jackson Pollock. “In particular, I thought the extension to Mondrian’s ‘trees’ was impressive,” he told Ars earlier this year. “I like that it establishes a connection between abstract and representational forms. It makes me wonder what would happen if the same idea were to be applied to Pollock’s poured branchings.”

Taylor himself published a 2022 paper about climate change and how nature’s stress-reducing fractals might disappear in the future. “If we are pessimistic for a moment, and assume that climate change will inevitably impact nature’s fractals, then our only future source of fractal aesthetics will be through art, design and architecture,” he said. “This brings a very practical element to studies like [this].”

DOI: arXiv, 2024. 10.48550/arXiv.2402.13520 (About DOIs).

IDing George Washington’s descendants

A DNA study identified descendants of George Washington from unmarked remains. Credit: Public domain

DNA profiling is an incredibly useful tool in forensics, but the most common method—short tandem repeat (STR) analysis—typically doesn’t work when remains are badly degraded, especially if said remains have been preserved with embalming methods using formaldehyde. This includes the remains of US service members who died in such past conflicts as World War II, Korea, Vietnam, and the Cold War. That’s why scientists at the Armed Forces Medical Examiner System’s identification lab at the Dover Air Force Base have developed new DNA sequencing technologies.

They used those methods to identify the previously unmarked remains of descendants of George Washington, according to a March paper published in the journal iScience. The team tested three sets of remains and compared the results with those of a known living descendant, using methods for assessing paternal and maternal relationships, as well as a new method for next-generation sequencing data involving some 95,000 single-nucleotide polymorphisms (SNPs) in order to better predict more distant ancestry. The combined data confirmed that the remains belonged to Washington’s descendants and the new method should help do the same for the remains of as-yet-unidentified service members.

In related news, in July, forensic scientists successfully used descendant DNA to identify a victim of the 1921 Tulsa massacre in Oklahoma, buried in a mass grave containing more than a hundred victims. C.L. Daniel was a World War I veteran, still in his 20s when he was killed. More than 120 such graves have been found since 2020, with DNA collected from around 30 sets of remains, but this is the first time those remains have been directly linked to the massacre. There are at least 17 other victims in the grave where Daniel’s remains were found.

DOI: iScience, 2024. 10.1016/j.isci.2024.109353 (About DOIs).

Spidey-inspired web-slinging tech

A stream of liquid silk quickly turns into a strong fiber that sticks to and lifts objects. Credit: Marco Lo Presti et al., 2024

Over the years, researchers in Tufts University’s Silklab have come up with all kinds of ingenious bio-inspired uses for the sticky fibers found in silk moth cocoons: adhesive glues, printable sensors, edible coatings, and light-collecting materials for solar cells, to name a few. Their latest innovation is a web-slinging technology inspired by Spider-Man’s ability to shoot webbing from his wrists, described in an October paper published in the journal Advanced Functional Materials.

Coauthor Marco Lo Presti was cleaning glassware with acetone in the lab one day when he noticed something that looked a lot like webbing forming on the bottom of a glass. He realized this could be the key to better replicating spider threads for the purpose of shooting the fibers from a device like Spider-Man—something actual spiders don’t do. (They spin the silk, find a surface, and draw out lines of silk to build webs.)

The team boiled silk moth cocoons in a solution to break them down into proteins called fibroin. The fibroin was then extruded through bore needles into a stream. Spiking the fibroin solution with just the right additives will cause it to solidify into fiber once it comes into contact with air. For the web-slinging technology, they added dopamine to the fibroin solution and then shot it through a needle in which the solution was surrounded by a layer of acetone, which triggered solidification.

The acetone quickly evaporated, leaving just the webbing attached to whatever object it happened to hit. The team tested the resulting fibers and found they could lift a steel bolt, a tube floating on water, a partially buried scalpel, and a wooden block—all from as far away as 12 centimeters. Sure, natural spider silk is still about 1,000 times stronger than these fibers, but this is still a significant step forward that paves the way for future novel technological applications.

DOI: Advanced Functional Materials, 2024. 10.1002/adfm.202414219

Solving a mystery of a 12th century supernova

Pa 30 is the supernova remnant of SN 1181. Credit: unWISE (D. Lang)/CC BY-SA 4.0

In 1181, astronomers in China and Japan recorded the appearance of a “guest star” that shone as bright as Saturn and was visible in the sky for six months. We now know it was a supernova (SN1181), one of only five such known events occurring in our Milky Way. Astronomers got a closer look at the remnant of that supernova and have determined the nature of strange filaments resembling dandelion petals that emanate from a “zombie star” at its center, according to an October paper published in The Astrophysical Journal Letters.

The Chinese and Japanese astronomers only recorded an approximate location for the unusual sighting, and for centuries no one managed to make a confirmed identification of a likely remnant from that supernova. Then, in 2021, astronomers measured the speed of expansion of a nebula known as Pa 30, which enabled them to determine its age: around 1,000 years, roughly coinciding with the recorded appearance of SN1181. Pa 30 is an unusual remnant because of its zombie star—most likely itself a remnant of the original white dwarf that produced the supernova.

This latest study relied on data collected by Caltech’s Keck Cosmic Web Imager, a spectrograph at the Keck Observatory in Hawaii. One of the unique features of this instrument is that it can measure the motion of matter in a supernova and use that data to create something akin to a 3D movie of the explosion. The authors were able to create such a 3D map of Pa 30 and calculated that the zombie star’s filaments have ballistic motion, moving at approximately 1,000 kilometers per second.

Nor has that velocity changed since the explosion, enabling them to date that event almost exactly to 1181. And the findings raised fresh questions—namely, the ejected filament material is asymmetrical, which is unusual for a supernova remnant. The authors suggest that asymmetry may originate with the initial explosion.

There’s also a weird inner gap around the zombie star. Both will be the focus of further research.
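
The dating logic is simple kinematics: with ballistic (constant-velocity) ejecta, age is just present extent divided by expansion speed. A minimal sketch, with an illustrative filament extent chosen only to show the arithmetic (the paper derives its own geometry from the 3D map):

```python
KM_PER_PARSEC = 3.086e13
SECONDS_PER_YEAR = 3.156e7

speed_km_s = 1000.0   # measured expansion speed of the filaments
extent_pc = 0.86      # hypothetical filament extent, for illustration only

age_years = extent_pc * KM_PER_PARSEC / speed_km_s / SECONDS_PER_YEAR
print(f"~{age_years:.0f} years")  # ~840 years before observation, i.e. ~1181 CE
```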

DOI: Astrophysical Journal Letters, 2024. 10.3847/2041-8213/ad713b (About DOIs).

Reviving a “lost” 16th century score

Fragment of music from The Aberdeen Breviary: Volume 1. Credit: National Library of Scotland/CC BY 4.0

Never underestimate the importance of marginalia in old manuscripts. Scholars from the University of Edinburgh and KU Leuven in Belgium can attest to that, having discovered a fragment of “lost” music from 16th-century pre-Reformation Scotland in a collection of worship texts. The team was even able to reconstruct the fragment and record it to get a sense of what music sounded like from that period in northeast Scotland, as detailed in a December paper published in the journal Music and Letters.

King James IV of Scotland commissioned the printing of several copies of The Aberdeen Breviary—a collection of prayers, hymns, readings, and psalms for daily worship—so that his subjects wouldn’t have to import such texts from England or Europe. One 1510 copy, known as the “Glamis copy,” is currently housed in the National Library of Scotland in Edinburgh. It was while examining handwritten annotations in this copy that the authors discovered the musical fragment on a page bound into the book—so it hadn’t been slipped between the pages at a later date.

The team figured out the piece was polyphonic, and then realized it was the tenor part from a harmonization for three or four voices of the hymn “Cultor Dei,” typically sung at night during Lent. (You can listen to a recording of the reconstructed composition here.) The authors also traced some of the history of this copy of The Aberdeen Breviary, including its use at one point by a rural chaplain at Aberdeen Cathedral, before a Scottish Catholic acquired it as a family heirloom.

“Identifying a piece of music is a real ‘Eureka’ moment for musicologists,” said coauthor David Coney of Edinburgh College of Art. “Better still, the fact that our tenor part is a harmony to a well-known melody means we can reconstruct the other missing parts. As a result, from just one line of music scrawled on a blank page, we can hear a hymn that had lain silent for nearly five centuries, a small but precious artifact of Scotland’s musical and religious traditions.”

DOI: Music and Letters, 2024. 10.1093/ml/gcae076 (About DOIs).

Jennifer is a senior reporter at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Ten cool science stories we almost missed Read More »

after-60-years-of-spaceflight-patches,-here-are-some-of-our-favorites

After 60 years of spaceflight patches, here are some of our favorites


A picture’s worth 1,000 words

It turns out the US spy satellite agency is the best of the best at patch design.

NROL-61 is the iconic “Spike” patch. Credit: NRO

The art of space mission patches is now more than six decades old, dating to the Vostok 6 mission in 1963 that carried Soviet cosmonaut Valentina Tereshkova into low-Earth orbit for nearly three days. The patch for the first female human spaceflight showcased a dove flying above the letters designating the Soviet Union, CCCP.

That patch was not publicly revealed at the time, and the use of specially designed patches was employed only infrequently by subsequent Soviet missions. NASA’s first mission patch would not follow for two years, but the practice would prove more sticky for missions in the United States and become a time-honored tradition.

The first NASA flight to produce a mission-specific patch worn by crew members was Gemini 5. It flew in August 1965, carrying astronauts Gordon Cooper and Pete Conrad on an eight-day mission inside a small Gemini spacecraft. At the time, it was the longest spaceflight conducted by anyone.

Robert Pearlman has the story behind the patch at Collect Space, which came about because of the wishes of the crew. During the initial Mercury missions, the pilots were able to name their spacecraft—Freedom 7, Liberty Bell 7, and so on. Cooper had named his Mercury spacecraft ‘Faith 7.’ But an increasingly buttoned-up NASA ended this practice for the Gemini missions, and when Cooper and Conrad were assigned to the third Gemini flight they considered alternatives.

Gemini 5 mission patch. Note the “8 days or bust” messaging on the wagon was covered up until after the mission was completed. Credit: NASA

“Several months before mission, I mentioned to Pete that I’d never been in a military organization that didn’t have its own patch,” Cooper recounted in Leap of Faith, his memoir. “We decided right then and there that we were at least going to have a patch for our flight.”

They chose a covered wagon design to indicate the pioneering nature of the mission and came up with the “8 days or bust” slogan to highlight the extended duration of the flight. Since then, virtually every NASA mission has included a patch design, typically with names of the crew members. The tradition has extended to non-human missions and has generally been adopted by space agencies around the world.

As such, there is a rich tradition of space mission patches to draw on, and we thought it would be fun to share some of our favorites over the decades.

Apollo 11 mission patch. Credit: NASA

Apollo 11

The first human mission to land on the Moon is one of the only NASA mission patches that does not include the names of the crew members, Neil Armstrong, Buzz Aldrin, and Michael Collins. This was a deliberate choice by the crew, who wanted the world to understand they were traveling to the Moon for all of humanity.

Another NASA astronaut, Jim Lovell, suggested the bald eagle could be the focus of the patch. Collins traced the eagle from a National Geographic children’s magazine, and an olive branch was added as a symbol of the mission’s peaceful intent.

The result is a clear symbol of the United States leading humanity to another world. It is simple and powerful.

Skylab rescue mission patch. Credit: NASA

Skylab rescue mission

Skylab was NASA’s first space station, and it was launched into orbit after the final Apollo lunar landing in 1972. From May 1973 to February 1974, three different crews occupied the space station, which had been placed in orbit by a modified Saturn V rocket.

Due to some problems with leaky thrusters on the Apollo spacecraft that carried the second crew to Skylab in 1973, NASA scrambled to put together a ‘rescue’ mission as a contingency. In this rescue scenario, astronauts Vance Brand and Don Lind would have flown to the station and brought Alan Bean, Jack Lousma, and Owen Garriott back inside an Apollo capsule especially configured for five people.

Ultimately, NASA decided that the crew could return to Earth in the faulty Apollo spacecraft, with the use of just half of the vehicle’s thrusters. So Brand and Lind never flew the rescue mission. But we got a pretty awesome patch out of the deal.

Space shuttle program

With the space shuttle, astronauts and patch artists had to get more creative because the vehicle flew so frequently—eventually launching 135 times. Some of my favorite patches from these flights came fairly early on in the program.

As it turns out, designing shuttle mission patches was a bonding exercise for crews after their assignments. Often one of the less experienced crew members would be given leadership of the project.

“During the Shuttle era, designing a mission emblem was one of the first tasks assigned to a newly formed crew of astronauts,” Flag Research Quarterly reports. “Within NASA, creation of the patch design was considered to be an important team-building exercise. The crew understood that they were not just designing a patch to wear on their flight suits, but that they were also creating a symbol for everyone who was working on the flight.”

In some cases the crews commissioned a well-known graphic designer or space artist to help them with their patch designs. More typically they worked with a graphic designer on staff at the Johnson Space Center to finalize the design.

NROL-61 is the iconic “Spike” patch. Credit: NRO

National Reconnaissance Office

The activities of the US National Reconnaissance Office, which is responsible for the design and launching of spy satellites, are very often shrouded in secrecy.

However, the spy satellite agency cleverly uses its mission patches as an effective communications tool. The patches for the launch of its satellites never give away key details, but they are often humorous, ominous, and suggestive all at the same time. The immediate response I often have to these patches is one of appreciation for the design, followed by a nervous chuckle. I suspect that’s intended by the spy agency.

In any case, these are my choices for the best space patches ever, perhaps because they are developed with such abandon.

The Soyuz TM-24 mission to Mir in 1996 carried ESA astronaut Reinhold Ewald.

European Space Agency

The space agency that consists of a couple of dozen European nations has also created some banger patches over the years that both recognize the continent’s long history of scientific discovery—with Newton, Kepler, Galileo, and Curie to name but a few—and the potential for future discovery in space.

Attached are some of my personal favorites, which highlight the launch of European astronauts on the Russian Soyuz spacecraft to three different Russian space stations across three decades.

What I like about the European mission designs is that they are unique and not afraid to break from the traditional mold of patch design. They’re also beautiful!

The Demo-2 mission patch is iconic in every way.

SpaceX mission patches

In recent years, some of the most creative patch designs have come from SpaceX and its crewed spaceflights aboard the Dragon vehicle. Because of the spacecraft’s name, the missions have often played off the Dragon motif, making for some striking designs.

There is a dedicated community of patch collectors out there, and some of them were disappointed that SpaceX stopped designing patches for each individual Starlink mission a few years ago. However, I would say that buying two or three patches a week would have gotten pretty expensive, pretty fast—not to mention the challenge designers would face in making unique patches for each flight.

If you read this far and want to know my preference, I am not much of a patch collector, as much as I admire the effort and artistry that goes into each design. I have only ever bought one patch, the one designed for the Falcon 1 rocket’s fourth flight. The patch isn’t beautiful, but it’s got some nice touches, including lights for both Kwajalein and Omelek islands, where the company launched its first rockets. Also, it was the first time the company included a shamrock on the patch, and that proved fortuitous, as the successful launch in 2008 saved the company. It has become a trademark of SpaceX patches ever since.

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

After 60 years of spaceflight patches, here are some of our favorites Read More »

tech-worker-movements-grow-as-threats-of-rto,-ai-loom

Tech worker movements grow as threats of RTO, AI loom


Advocates say tech worker movements got too big to ignore in 2024.

Credit: Aurich Lawson | Getty Images

It feels like tech workers have caught very few breaks over the past several years, between ongoing mass layoffs, stagnating wages amid inflation, AI supposedly coming for jobs, and unpopular orders to return to office that, for many, threaten to disrupt work-life balance.

But in 2024, a potentially critical mass of tech workers seemed to reach a breaking point. As labor rights groups advocating for tech workers told Ars, these workers are banding together in sustained strong numbers and are either winning or appear tantalizingly close to winning better worker conditions at major tech companies, including Amazon, Apple, Google, and Microsoft.

In February, the industry-wide Tech Workers Coalition (TWC) noted that “the tech workers movement is far more expansive and impactful” than even labor rights advocates realized, noting that unionized tech workers have gone beyond early stories about Googlers marching in the streets and now “make the headlines on a daily basis.”

Ike McCreery, a TWC volunteer and ex-Googler who helped found the Alphabet Workers Union, told Ars that although “it’s hard to gauge numerically” how much movements have grown, “our sense is definitely that the momentum continues to build.”

“It’s been an exciting year,” McCreery told Ars, while expressing particular enthusiasm that even “highly compensated tech workers are really seeing themselves more as workers” in these fights—which TWC “has been pushing for a long time.”

In 2024, TWC broadened efforts to help workers organize industry-wide, helping everyone from gig workers to project managers build both union and non-union efforts to push for change in the workplace.

Such widespread organizing “would have been unthinkable only five years ago,” TWC noted in February, and it’s clear from some of 2024’s biggest wins that some movements are making gains that could further propel that momentum in 2025.

Workers could also gain the upper hand if unpopular policies increase what one November study called “brain drain.” That’s a trend where tech companies adopting potentially alienating workplace tactics risk losing top talent at a time when key industries like AI and cybersecurity are facing severe talent shortages.

Advocates told Ars that unpopular policies have always fueled workers movements, and RTO and AI are just the latest adding fuel to the fire. As many workers prepare to head back to offices in 2025 where worker surveillance is only expected to intensify, they told Ars why they expect to see workers’ momentum continue at some of the world’s biggest tech firms.

Tech worker movements growing

In August, Apple ratified a labor contract at America’s first unionized Apple Store—agreeing to a modest increase in wages, about 10 percent over three years. While small, that win came just a few weeks before the National Labor Relations Board (NLRB) determined that Amazon was a joint employer of unionized contract-based delivery drivers. And Google lost a similar fight last January when the NLRB ruled it must bargain with a union representing YouTube Music contract workers, Reuters reported.

For many workers, joining these movements helped raise wages. In September, facing mounting pressure, Amazon raised warehouse worker wages—investing $2.2 billion, its “biggest investment yet,” to broadly raise base salaries for workers. And more recently, Amazon was hit with a strike during the busy holiday season, as warehouse workers hoped to further hobble the company during a clutch financial quarter to force more bargaining. (Last year, Amazon posted record-breaking $170 billion holiday quarter revenues and has said the current strike won’t hurt revenues.)

Even typically union-friendly Microsoft drew worker backlash and criticism in 2024 following layoffs of 650 video game workers in September.

These mass layoffs are driving some workers to join movements. A senior director for organizing with Communications Workers of America (CWA), Tom Smith, told Ars that shortly after the 600-member Tech Guild—”the largest single certified group of tech workers” to organize at the New York Times—reached a tentative deal to increase wages “up to 8.25 percent over the length of the contract,” about “460 software engineers at a video game company owned by Microsoft successfully unionized.”

Smith told Ars that while workers for years have pushed for better conditions, “these large units of tech workers achieving formal recognition, building lasting organization, and winning contracts” at “a more mass scale” are maturing, following in the footsteps of unionizing Googlers and today influencing a broader swath of tech industry workers nationwide. From CWA’s viewpoint, workers in the video game industry seem best positioned to seek major wins next, Smith suggested, likely starting with Microsoft-owned companies and eventually affecting indie game companies.

CWA, TWC, and Tech Workers Union 1010 (a group run by tech workers that’s part of the Office and Professional Employees International Union) all now serve as dedicated groups supporting workers movements long-term, and that stability has helped these movements mature, McCreery told Ars. Each group plans to continue meeting workers where they are to support and help expand organizing in 2025.

Cost of RTOs may be significant, researchers warn

While layoffs likely remain the most extreme threat to tech workers broadly, a return-to-office (RTO) mandate can be just as jarring for remote tech workers who are either unable to comply or else unwilling to give up the better work-life balance that comes with no commute. Advocates told Ars that RTO policies have pushed workers to join movements, while limited research suggests that companies risk losing top talents by implementing RTO policies.

In perhaps the biggest example from 2024, when Amazon announced that it was requiring workers in-office five days a week next year, a poll on the anonymous platform where workers discuss employers, Blind, found an overwhelming majority of more than 2,000 Amazon employees were “dissatisfied.”

“My morale for this job is gone…” one worker said on Blind.

Workers criticized the “non-data-driven logic” of the RTO mandate, prompting an Amazon executive to remind them that they could take their talents elsewhere if they didn’t like it. Many confirmed that’s exactly what they planned to do. (Amazon later announced it would be delaying RTO for many office workers after belatedly realizing there was a lack of office space.)

Other companies mandating RTO faced similar backlash from workers, who continued to question the logic driving the decision. One February study showed that RTO mandates don’t make companies any more valuable but do make workers more miserable. And last month, Brian Elliott, an executive advisor who wrote a book about the benefits of flexible teams, noted that only one in three executives thinks RTO had “even a slight positive impact on productivity.”

But not every company drew a hard line the way Amazon did. Dell, for example, gave workers a choice: remain remote and accept that they would never be eligible for promotions, or mark themselves as hybrid. Workers who refused the RTO said they valued their free time and admitted to looking for other job opportunities.

Very few studies have been done analyzing the true costs and benefits of RTO, a November academic study titled “Return to Office and Brain Drain” said, and so far companies aren’t necessarily backing the limited findings. The researchers behind that study noted that “the only existing study” measuring how RTO impacts employee turnover showed this year that senior employees left for other companies after Microsoft’s RTO mandate, but Microsoft disputed that finding.

Seeking to build on this research, the November study tracked “over 3 million tech and finance workers’ employment histories reported on LinkedIn” and analyzed “the effect of S&P 500 firms’ return-to-office (RTO) mandates on employee turnover and hiring.”

Because they chose to analyze only the firms requiring five days in office, the researchers’ final sample covered 54 RTO firms, including big tech companies like Amazon, Apple, and Microsoft. From that sample, they concluded that average employee turnover increased by 14 percent after RTO mandates at bigger firms. And since big firms typically have lower turnover, the increase is likely even larger at smaller firms, the study’s authors noted.
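The study’s code isn’t reproduced in the article, but the core bookkeeping behind a turnover comparison like this is straightforward: bucket each recorded departure as happening before or after that firm’s mandate date, then compare. Here is a toy sketch under an assumed, purely hypothetical schema (the real analysis would also normalize by headcount and add statistical controls):

```python
# Toy sketch of the before/after turnover comparison (hypothetical schema,
# not the study's actual pipeline).
import pandas as pd

# One row per employment spell: the firm, when the spell ended, and the
# firm's RTO mandate date. Column names are assumptions for illustration.
spells = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B", "B"],
    "end_date": pd.to_datetime([
        "2022-03-01", "2023-02-01", "2023-06-01",
        "2022-08-01", "2023-04-01", "2023-07-01",
    ]),
    "rto_date": pd.to_datetime(["2022-09-01"] * 3 + ["2023-03-01"] * 3),
})

# Bucket each departure as before/after its firm's mandate; a real analysis
# would normalize by headcount and control for firm and calendar effects.
spells["after_rto"] = spells["end_date"] >= spells["rto_date"]
print(spells.groupby(["firm", "after_rto"]).size().unstack(fill_value=0))
```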

The study also supported the conclusion that “employees with the highest skill level are more likely to leave” and found that “RTO firms take significantly longer time to fill their job vacancies after RTO mandates.”

“Together, our evidence suggests that RTO mandates are costly to firms and have serious negative effects on the workforce,” the study concluded, echoing some remote workers’ complaints about the seemingly non-data-driven logic of RTO, while urging that further research is needed.

“These turnovers could potentially have short-term and long-term effects on operation, innovation, employee morale, and organizational culture,” the study concluded.

A co-author of the “brain drain” study, Mark Ma, told Ars that by contrast, Glassdoor going fully remote at least anecdotally seemed to “significantly” increase the number and quality of applications—possibly also improving retention by offering the remote flexibility that many top talents today require.

Ma said his team next hopes to track where people who leave firms over RTO policies end up.

“Do they become self-employed, or do they go to a competitor, or do they fund their own firm?” Ma speculated, hoping to trace these patterns more definitively over the next several years.

Additionally, Ma plans to investigate individual firms’ RTO impacts, as well as impacts on niche classes of workers with highly sought-after skills—such as in areas like AI, machine learning, or cybersecurity—to see if it’s easier for them to find other jobs. In the long term, Ma also wants to monitor for less-foreseeable outcomes, such as whether RTO mandates increase the number of challengers a firm faces in its industry.

Will RTO mandates continue in 2025?

Many tech workers may be wondering if there will be a spike in return-to-office mandates in 2025, especially since one of the most politically influential figures in tech, Elon Musk, recently reiterated that he thinks remote work is “poison.”

Musk, of course, banned remote work at Tesla and did the same when he took over Twitter. And as co-lead of the US Department of Government Efficiency (DOGE), he reportedly plans to ban remote work for government employees, too. If other tech firms follow Musk’s lead and join the executives who seem to be mandating RTO based on intuition, more tech workers could be forced to return to the office or else seek other employment.

But Ma told Ars that he doesn’t expect to see “a big spike in the number of firms announcing return to office mandates” in 2025.

His team found only eight major firms in tech and finance that issued five-day return-to-office mandates in 2024, the same number as in 2023, suggesting no major year-over-year increase in RTO mandates. Ma told Ars that while big firms like Amazon ordering employees back to the office made headlines, many firms seem to be continuing to embrace hybrid models, sometimes allowing employees to choose when or if they come into the office.

That apparent preference for hybrid work models aligns with “future of work” surveys outlining workplace trends and employee preferences that the Consumer Technology Association (CTA) conducted for years but has seemingly since discontinued. In 2021, CTA reported that “89 percent of tech executives say flexible work arrangements are the most important employee benefit and 65 percent say they’ll hire more employees to work remotely.” The next year, apparently the last time CTA published the survey, the group suggested hybrid models could help attract talent in a competitive market hit with “an unprecedented demand for workers with high-tech skills.”

The CTA did not respond to Ars’ requests to comment on whether it expects hybrid work arrangements to remain preferred over five-day return-to-office policies next year.

CWA’s Smith told Ars that worker movements are growing partly because “folks are engaged in this big fight around surveillance and workplace control,” as well as anything “having to do with to what extent will people return to offices and what does that look like if and when people do return to offices?”

Without data backing RTO mandates, Ma’s study suggests that firms will struggle to retain highly skilled workers at a time when tech innovation remains a top priority for the US. As workers grow increasingly put off by policies like RTO, AI-driven workplace monitoring, and efficiency efforts threatening to replace them with AI, Smith’s experience suggests that disgruntled workers may be drawn to unions that can help them claw back control over their work-life balance. And the cost of the ensuing shuffle to some of the largest tech firms in the world could be “significant,” Ma’s study warned.

TWC’s McCreery told Ars that on top of unpopular RTO policies driving workers to join movements, workers have also become more active in protesting unpopular politics, frustrated to see their talents apparently used to further controversial conflicts and military efforts globally. Some workers think workplace organizing could be more powerful than voting to oppose political actions their companies take.

“The workplace really remains an important site of power for a lot of people where maybe they don’t feel like they can enact their values just by voting or in other ways,” McCreery said.

While unpopular policies “have always been a reason workers have joined unions and joined movements,” McCreery said that “the development of more of these unpopular policies” like RTO and AI-enhanced surveillance “really targeted” at workers has increased “the political consciousness and the sense” that tech workers are “just like any other workers.”

Layoffs at companies like Microsoft and Amazon during periods when revenue is growing by double digits also unify workers, advocates told Ars. Forbes noted Microsoft laid off 1,000 workers “just five days before reporting a 17.6 percent increase in revenue to $62 billion,” while Amazon’s 1,000-worker layoffs followed a 14 percent rise in revenue to $170 billion. And demand for AI led to the highest profit margins Amazon’s cloud business has seen in a decade, CNBC reported in October.

CWA’s Smith told Ars that as companies continue to rake in profits while workers feel their work-life balance slipping away, and as their efforts in the office are potentially “used to increase control and cause broader suffering,” some of the biggest fights workers picked in 2024 may intensify next year.

“It’s like a shock to employees, these industries pushing people to lower your expectations because we’re going to lay off hundreds of thousands of you just because we can while we make more profits than we ever have,” Smith said. “I think workers are going to step into really broad campaigns to assert a different worldview on employment security.”


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

Tech worker movements grow as threats of RTO, AI loom Read More »

the-physics-of-ugly-christmas-sweaters

The physics of ugly Christmas sweaters

In 2018, a team of French physicists developed a rudimentary mathematical model to describe the deformation of a common type of knit. Their work was inspired when co-author Frédéric Lechenault watched his pregnant wife knitting baby booties and blankets and noticed how the items would return to their original shape even after being stretched. With a few colleagues, he was able to boil the mechanics down to a few simple equations, adaptable to different stitch patterns. It all comes down to three factors: the “bendiness” of the yarn, the length of the yarn, and how many crossing points are in each stitch.
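To put those three factors in symbols, here is a minimal illustrative form, an assumption for clarity rather than the paper’s exact equations: treat the yarn in each stitch as an inextensible elastic rod whose bending energy competes with the constraints at its crossings.

```latex
% Illustrative only: B = bending stiffness ("bendiness"), \ell = yarn length
% per stitch, \kappa(s) = local curvature, n = crossing points per stitch.
E_{\text{stitch}} \approx \frac{B}{2} \int_{0}^{\ell} \kappa(s)^{2} \, ds,
\quad \text{with } \ell \text{ fixed and the shape pinned at } n \text{ contacts}
```

Under a form like this, stiffer yarn stores more bending energy for the same loop shape, which is why a stretched stitch springs back once the force is removed.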

A simpler stitch


A simplified model of how yarns interact Credit: J. Crassous/University of Rennes

One of the co-authors of that 2018 paper, Samuel Poincloux of Aoyama Gakuin University in Japan, also co-authored this latest study with two other colleagues, Jérôme Crassous (University of Rennes in France) and Audrey Steinberger (University of Lyon). This time around, Poincloux was interested in the knotty problem of predicting the rest shape of a knitted fabric, given the yarn length per stitch—an open question dating back at least to a 1959 paper.

It’s the complex geometry of all the friction-producing contact zones between the slender elastic fibers that makes such a system too difficult to model precisely, because the contact zones can rotate or change shape as the fabric moves. So Poincloux and his colleagues came up with their own simplified model.

The team performed experiments with a Jersey stitch knit (aka a stockinette), a widely used and simple knit consisting of a single yarn (in this case, a nylon thread) forming interlocked loops. They also ran numerical simulations that modeled the yarn as discrete elastic rods, coupled through dry contacts with a set friction coefficient, to form meshes.
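The authors’ simulation code isn’t included here, but the bending-energy bookkeeping at the heart of any discrete elastic rod model can be sketched in a few lines. This is a minimal illustration, with an assumed stiffness value and no stretching or contact terms:

```python
# Illustrative bending energy of a discrete elastic rod (not the authors' code).
# Nodes are points along the yarn centerline; B is an assumed bending stiffness.
import numpy as np

def bending_energy(nodes: np.ndarray, B: float = 1.0) -> float:
    """Sum of squared turning angles over each interior node's Voronoi length."""
    edges = np.diff(nodes, axis=0)                    # edge vectors
    lengths = np.linalg.norm(edges, axis=1)           # edge lengths
    tangents = edges / lengths[:, None]               # unit tangents
    cos_theta = np.einsum("ij,ij->i", tangents[:-1], tangents[1:])
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # turning angle per node
    voronoi = 0.5 * (lengths[:-1] + lengths[1:])      # length owned by each node
    return B * np.sum(theta**2 / voronoi)

# Sanity check: a curved loop stores bending energy, a straight rod stores ~none.
s = np.linspace(0, 2 * np.pi, 50)
loop = np.stack([np.cos(s), np.sin(s), 0.05 * s], axis=1)
line = np.stack([s, np.zeros_like(s), np.zeros_like(s)], axis=1)
print(bending_energy(loop), bending_energy(line))
```

A full simulation would minimize this energy together with stretching and yarn-yarn contact terms, with the friction coefficient at those contacts determining which metastable shape the mesh settles into.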

The results: Even when there were no external stresses applied to the fabric, the friction between the threads served as a stabilizing factor. And there was no single form of equilibrium for a knitted sweater’s resting shape; rather, there were multiple metastable states that were dependent on the fabric’s history—the different ways it had been folded, stretched, or rumpled. In short, “Knitted fabrics do not have a unique shape when no forces are applied, contrary to the relatively common belief in textile literature,” said Crassous.

DOI: Physical Review Letters, 2024. 10.1103/PhysRevLett.133.248201 (About DOIs).

The physics of ugly Christmas sweaters Read More »

the-20-most-read-stories-of-2024-on-ars-technica

The 20 most-read stories of 2024 on Ars Technica


Ars looks back at the top stories of the year.

Credit: Aurich Lawson | Getty Images

Hey, look at that! Another year has flown by, and I suspect many people would say “good riddance” to 2024.

The 2020s have been quite the decade so far. No matter what insanity has transpired by a particular December 31, the following year has shown up and promptly said, “Hold my beer.”

The biggest news at Ars in 2024 was our first site redesign in nearly a decade. We’re proud of Ars 9.0 (we’re up to 9.0.3 now), and we have continued to make changes based on your feedback. The best kind of feedback, however, is your clicks. Those clicks power this recap, so read on to learn which stories our readers found especially compelling.

20. NASA is about to make its most important safety decision in nearly a generation


Boeing’s Starliner spacecraft, seen docked at the International Space Station through the window of a SpaceX Dragon spacecraft. Credit: NASA

In June, NASA astronauts Butch Wilmore and Suni Williams were sent into space for a mission slated to last a little over a week. Six months later, they are still orbiting this terrestrial ball.

The two retired naval test pilots were the first people to catch a ride to orbit on the Boeing Starliner. Unfortunately for them (and Boeing), Starliner developed problems with its propulsion system.

Figuring out how to get them back down to Earth was arguably the biggest safety decision NASA has had to make in decades. Stephen Clark unpacked the situation, looking at how NASA’s culture of safety has evolved since the Challenger accident.

19. macOS 15 Sequoia: The Ars Technica review

Credit: Apple

One constant in our year-end recaps is operating system reviews. During 2024, Apple’s annual macOS release was the sole OS review to hit the top 20.

Though Sequoia was touted as “the AI one,” most of the Apple Intelligence features didn’t show up until macOS 15.1 was released. The overall verdict? 2024’s installment of macOS was a solid update. Andrew Cunningham liked the new window tiling, the mostly unchanged backward compatibility, and all of the minor but useful tweaks to many of the built-in apps.

18. What we know about the xz Utils backdoor that almost infected the world

Credit: Getty Images

xz Utils is a popular open-source data-compression utility for *nix OSes. In late March, a developer floored the open source world when he revealed a backdoor in the utility. The malicious code planted in versions 5.6.0 and 5.6.1 targeted encrypted SSH connections.

Thankfully, the malicious code was caught before it was merged into Debian and Red Hat. Years in the making and described as the “best executed supply chain attack” by one cryptography engineer, the effort came thisclose to success. Dan Goodin explains what we know and how this might have happened.

Credit: Aurich Lawson | Getty Images | NASA

One of the problems humankind faces as we climb out of Earth’s gravitational well is cosmic radiation. On a long voyage to Mars, the crew will need to be protected against solar storms and other space radiation. Right now, ensuring that level of protection would require tons of shielding material, but that may change.

Active shielding was proposed in the 1960s, but the initial research didn’t result in any working prototypes. Now, the ESA and NASA are looking at magnetic fields and electrostatic shields to protect space travelers. Researchers have built and tested small-scale models of their electrostatic shields, and the ESA is working on superconducting magnets.

16. I added a ratgdo to my garage door, and I don’t know why I waited so long


Messing around with the electronics in our dwellings is part of the Ars DNA. In 1998, we were overclocking our Celerons. In 2024, we’re messing with our garage door openers.

Senior Tech Editor Lee Hutchinson hates looking out the back window to see if he remembered to close his garage door, so he stuck a Raspberry Pi out there that would email him every time the garage door opened or closed. Unfortunately for Lee and his Raspberry Pi, Houston is hot and humid for approximately 10 months out of the year, so his tiny computer gave up the ghost after one 98° day too many.
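Lee’s original script isn’t reproduced in his article, but a monitor of that kind is simple to sketch. This version assumes a reed switch wired to GPIO pin 17 and SMTP credentials supplied via environment variables; both are hypothetical details, not Lee’s actual setup:

```python
# Minimal sketch of a Pi door monitor. Assumptions (not from Lee's article):
# a reed switch wired to GPIO pin 17, and SMTP details supplied via
# environment variables (MAIL_FROM, MAIL_TO, SMTP_HOST, SMTP_USER, SMTP_PASS).
import os
import smtplib
from email.message import EmailMessage
from signal import pause

from gpiozero import Button  # treats the reed switch as a simple button

sensor = Button(17)  # hypothetical pin; circuit closed = door closed

def send_email(state: str) -> None:
    """Send a one-line status email via SSL SMTP."""
    msg = EmailMessage()
    msg["Subject"] = f"Garage door is now {state}"
    msg["From"] = os.environ["MAIL_FROM"]
    msg["To"] = os.environ["MAIL_TO"]
    msg.set_content(f"The garage door sensor reports: {state}")
    with smtplib.SMTP_SSL(os.environ["SMTP_HOST"]) as server:
        server.login(os.environ["SMTP_USER"], os.environ["SMTP_PASS"])
        server.send_message(msg)

# gpiozero fires these callbacks on state changes, so no polling loop is needed
sensor.when_pressed = lambda: send_email("closed")
sensor.when_released = lambda: send_email("open")

pause()  # keep the script alive, waiting for events
```

The callbacks mean the script just sleeps until the switch changes state, keeping the Pi’s workload (if not its tolerance for Houston summers) minimal.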

Instead of using the MyQ app that came with his garage door opener, he grabbed a ratgdo—a tiny little board with built-in Wi-Fi that gets wired into the garage door opener’s terminals. The result? A daily experience of the magic of functional home automation.

15. Boston Dynamics’ new humanoid moves like no robot you’ve ever seen

Credit: Boston Dynamics

Moves like Jagger? Not quite, but the latest Atlas robot from Boston Dynamics moves like it could bust a move out on the dance floor.

This Atlas uses electricity instead of hydraulics. While the old Atlas was capable of lifting heavy objects and traveling across all kinds of terrain, the heavy and complicated hydraulics made it massive. The all-electric version can move in ways that its predecessor couldn’t, as there are no hydraulic lines to worry about. As a result, the Atlas has an uncanny range of motion.

Hyundai was the first company to test Atlas in a manufacturing environment.

14. Unpatchable vulnerability in Apple chip leaks secret encryption keys

Credit: Aurich Lawson | Apple

Even though CPU manufacturers have been baking security features into their silicon for some time, malicious actors and researchers keep poking and prodding, looking for security flaws. A group of researchers found a dreaded unpatchable vulnerability in Apple silicon, one that doesn’t even require root access.

The attack, dubbed GoFetch, works against classical and hardened encryption algorithms and can extract a 2048-bit RSA key in less than an hour. It takes advantage of the chips’ data memory-dependent prefetcher, which optimizes performance by reducing latency between the CPU and RAM. Since you can’t patch silicon, the only solution is adding defenses to cryptographic code, and those come with performance penalties.

13. Air Canada must honor refund policy invented by airline’s chatbot

Depending on how you feel about interacting with actual humans for customer support, the rise of AI customer-service chatbots has been either a boon or a curse. An example of the latter comes courtesy of Air Canada.

Jake Moffatt had to fly from Vancouver to Toronto for his grandmother’s funeral, so he asked Air Canada’s chatbot to explain the airline’s bereavement policy. The chatbot gave Moffatt incorrect instructions, telling him that he could be retroactively reimbursed at a reduced bereavement rate up to 90 days after his ticket was issued. Unfortunately for everyone involved, this was not Air Canada’s actual policy.

The airline refused to honor the policy spelled out by its chatbot, at least until Moffatt took them to small claims court and won. When we last checked, the chatbot was no longer active.

12. In rare move from printing industry, HP actually has a decent idea


There are some days that I long for my old Stylewriter printer. It was slow and dumb as a rock, but it more than adequately performed the function of putting ink on paper. I now have a multifunction printer/scanner/fax that suffers print quality problems partly because of how little it’s used. It might be different if I wanted to spend over $300 for a set of HP-branded toner cartridges instead of roughly $80 for generic ones, but I’d rather live with faded printouts.

HP has rightfully been the target of ire from consumers, and as Scharon pointed out, the company has been a major cause of broken trust between printer OEMs and consumers. So we were all surprised when HP came up with an idea that could simplify and speed up some print jobs. Shipping a feature that actually improves printing is so much better than, say, using DRM to ensure third-party products don’t function correctly with HP printers.

11. It turns out NASA’s Mars helicopter was much more revolutionary than we knew

Credit: NASA/JPL

Ingenuity made its first flight on Mars in April 2021. Seventy-two flights and nearly three years later, the small helicopter made its last flight. As Eric Berger noted, Ingenuity stood out from other NASA hardware in two ways. First, it proved that powered flight on other worlds was a possibility. Despite Mars’ very thin atmosphere, the copter was able to zoom around on its carbon fiber blades.

More importantly, Ingenuity was built with commercial, off-the-shelf hardware. The success of its mission has opened the door to other possibilities, like flying a nuclear-powered drone through the thick, nitrogen-heavy atmosphere of Titan.

10. After Russian ship docks to space station, astronauts report a foul smell

Credit: NASA



Around these parts, the usual response to a foul smell is a glance in the dog’s direction. But when you’re in a tiny space station orbiting the Earth, a bad odor is particularly worrying, as astronauts on the International Space Station found out in November.

When the Russian cargo craft docked with the ISS, the Russian cosmonauts who opened the hatch were greeted by a wave of stink. The “toxic” smell was so bad that they immediately shut the hatch.

Ultimately, the astronauts crewing the ISS were not in danger, and after some extra air scrubbing, the hatch was opened and the supplies unloaded.

9. What I learned when I replaced my cheap Pi 5 PC with a no-name Amazon mini desktop


Two cheapo Intel mini PCs, a Raspberry Pi 5, and an Xbox controller for scale. Credit: Andrew Cunningham

One of the fun things about working at Ars Technica is watching Andrew Cunningham stretch the limits of obsolete or inexpensive hardware and software. His attempt to use a Raspberry Pi 5 as a daily-driver desktop had mixed results, but that didn’t stop him from trying out a couple of sub-$200 PCs from Amazon.

Andrew ultimately settled on the $170 Bostgame B100 and $180 GMKtec NucBox G2. Both of them used Intel Processor N100 quad-core chips and could run Windows 11 along with some Linux distros. If you’re curious about what it’s like to use a tiny, inexpensive desktop for your daily computing needs, check out Andrew’s write-up.

8. Users ditch Glassdoor, stunned by site adding real names without consent

Complaining about your employer online is a time-honored tradition. Frustrated workers vent all over the Internet, but the hub of employee griping has historically been Glassdoor. That changed for a lot of folks when Glassdoor inexplicably decided to link real names to formerly anonymous accounts.

When Glassdoor acquired the professional networking app Fishbowl in 2021, every Glassdoor user was also signed up for a Fishbowl account. The big difference is that Fishbowl requires identity verification, so Glassdoor changed its terms of service to require the same.

“Since we require all users to have their names on their profiles, we will need to update your profile to reflect this,” a Glassdoor employee wrote to a user named Monica, reassuring her that “your anonymity will still be protected.” Monica did not trust the company’s assurances that it would go to court to “defeat requests for user information,” instead requesting that Glassdoor delete her account entirely. She wasn’t the only one.

7. What’s happening at Tesla? Here’s what experts think.

A coin with Elon Musk’s face on it, being held next to a Tesla logo. Credit: Aurich Lawson | Getty Images | Beata Zawrzel

Tesla is responsible for two things: making electric vehicles a realistic option for most drivers and helping make founder Elon Musk the world’s richest person. But after years of astronomical growth, Tesla has been on a downward slide. The Chinese market has gotten much tougher for Tesla—and everyone else—due to Chinese OEMs churning out low-cost BEVs. There have been safety problems with Tesla’s cars, and the company’s once-legendary profit margins have cratered to below the industry average.

What’s going on? Our crack automotive reporter Jonathan Gitlin talked to some experts to see if Tesla was primed for a turnaround or if its slump was indicative of more troubles to come.

6. The Starliner spacecraft has started to emit strange noises


Boeing’s Starliner spacecraft is seen docked at the International Space Station on June 13. Credit: NASA

It all started with some weird sounds. “I’ve got a question about Starliner,” astronaut Butch Wilmore radioed down to Mission Control at Johnson Space Center in Houston in late August. “There’s a strange noise coming through the speaker… I don’t know what’s making it.”

While that space oddity turned out to be just a weird anomaly, it may have helped prepare Wilmore and fellow astronaut Suni Williams for the bad Starliner news that followed—and an extra-long stay in orbit.

5. Here’s what it’s like to charge an EV at Electrify America’s new station

A row of EVs charging at EA’s flagship location in San Francisco. Credit: Roberto Baldwin

I’ve been an EV owner for five years. During that time I’ve been exposed to just about every facet of EV ownership, including charging on road trips. With the right combination of apps (shoutout to PlugShare) and planning, road trips should be problem-free. But sometimes chargers are few and far between, out of service, crowded, or just plain janky.

Out of all the charging networks—and I’ve tried almost all of them at some point—Electrify America has been the most reliable for me. Their new flagship charging station is a far cry from their outposts typically located at the far end of a giant parking lot connected to a Walmart or Meijer. Instead of aimlessly wandering the aisles of a big-box retailer, drivers can chill in a well-appointed and secure space while their cars are topped off with electrons.

Want to increase EV adoption? Get more of these working, secure, and well-lit stations up and running ASAP.

4. Dell said return to the office or else—nearly half of workers chose “or else”

Signage outside Dell Technologies headquarters in Round Rock, Texas, US, on Monday, Feb. 6, 2023.

Ars has been all about the remote workforce since our launch in 1998. Once the COVID-19 pandemic hit, remote work became a thing for millions of workers. Some companies have adapted nicely to this new reality, realizing that their employees could do their jobs just as well from the comfort of their homes while pocketing some savings from a reduced office footprint.

Others have been less sanguine about remote work. Some have tried luring workers back to the office with perks, while others—like Dell—have taken a more coercive approach. The PC manufacturer told employees who stayed remote that they would be giving up promotions or role changes within the company. Internal tracking data showed that almost half of Dell’s workforce simply shrugged and stayed remote, consequences or not.

3. What I learned from using a Raspberry Pi 5 as my main computer for two weeks


The Raspberry Pi 5 inside its official case. Credit: Andrew Cunningham

We read about Andrew’s experience with a pair of sub-$200 desktop PCs, but this story is what started it all. The spec sheet looked promising enough, with support for two 4K displays running at 60 Hz and space for an internal PCIe SSD, but the experience was not what he’d hoped.

Andrew’s time using the Raspberry Pi 5 as his daily driver started out disappointing, but once he reset his expectations, he ended up pleasantly surprised by the experience.

If you’re looking for the cheapest mini desktop PC possible, you’ll want to look elsewhere, but if you want to see how far along Arm Linux has come, read Andrew’s article.

2. What happens when an astronaut in orbit says he’s not coming back?


The STS-51-B mission begins with the liftoff of the Challenger from Pad 39A in April 1985. Credit: NASA

Being strapped into a cramped capsule and thundered into space aboard a giant rocket has to be an incredibly stressful experience. But sometimes the stress doesn’t end with a successful launch. We don’t often get to peer behind the curtains and glimpse the mental state of an astronaut, so when we do, it’s jarring.

“Hey, if you guys don’t give me a chance to repair my instrument, I’m not going back,” said astronaut Taylor Wang during a Space Shuttle mission in 1985. The first Chinese-born person in space, Wang was heading up an experiment on the behavior of liquid droplets in microgravity. When it didn’t work at the outset, Wang asked permission to troubleshoot it and make repairs. When Mission Control denied his request, he uttered that chilling sentence.

1. The surprise is not that Boeing lost commercial crew but that it finished at all


Boeing’s Starliner spacecraft is lifted to be placed atop an Atlas V rocket for its first crewed launch. Credit: United Launch Alliance

Not only has there been a lot of Boeing on this top 20 list, there has been a lot of Boeing in the news all year. And most of that news has been bad.

Eric Berger dives deep into the development of Starliner, outlining the problems and setbacks that plagued the program and trying to answer the big question: How did a company like Boeing, which had been at the acme of crewed spaceflight for decades, fall so far behind competition that didn’t even exist 20 years ago?


Thank you for making Ars a daily read during 2024. May you and those you love have a happy and safe holiday season.


Eric Bangeman is the Managing Editor of Ars Technica. In addition to overseeing the daily operations at Ars, Eric also manages story development for the Policy and Automotive sections. He lives in the northwest suburbs of Chicago, where he enjoys cycling, playing the bass, and refereeing rugby.

The 20 most-read stories of 2024 on Ars Technica Read More »

$2,100-mechanical-keyboard-has-800-holes,-nyc-skyscraper-looks

$2,100 mechanical keyboard has 800 holes, NYC skyscraper looks

What’s interesting about the typing feel of this keyboard is the use of low-profile keycaps despite the keyboard supporting full-height mechanical switches. I’m curious if the pairing results in the keycaps feeling too thin or unstable while typing.

Other Icebreaker specs include a “silicone dampener integrated into the bottom lid both supporting the PCB and doubling as non-slip feet,” per Serene.


The keyboard’s underside. Credit: Serene Industries

There’s also a 4,000 mAh battery and 1/4-20 threads “for professional accessory mounting, such as Picatinny rails.” One could also use the threads for mounting the keyboard onto monitor arms or hand grips.

And like many high-priced keyboards to come out in the past couple of years, the Icebreaker includes a rotary encoder dial. The dial is programmable, like the rest of the keyboard’s keys, with the Via configurator.

The Icebreaker starts at $1,500 with a clear-colored base, hot-swappable switches, and USB-C cable connectivity. It goes up to $2,100 if you get it in black and add Bluetooth connectivity or Hall effect switches, which actuate through the use of magnets. Notably, the Bluetooth version of the keyboard only seems to have one Bluetooth channel, unlike cheaper wireless keyboards that let you pair and toggle across multiple simultaneously paired devices.

The lavish side of mechanical keyboards

Ultimately, the keyboard’s unique construction, design cues, and lack of mass production contribute to a four-figure price tag that’ll shock those not accustomed to the overly luxurious side of mechanical keyboards. Agarkov told Null Society that one of the biggest challenges with making The Icebreaker was “balancing the design with practical considerations.”

“For instance, the keyboard is intentionally heavy and large, which, funny enough, was a point of confusion for the manufacturers,” he added.

As you may have determined by now, The Icebreaker’s price is more about style and clout than advanced features or high-end typing. In fact, you don’t even get a numpad or switches at this price. For comparison, Angry Miao is no stranger to outrageously priced keyboards, but as of this writing, its only keyboards with MSRPs over $1,000 are split keyboards:


Angry Miao’s Afa Blade Limited Edition keyboard kit costs $2,049 and uses aluminum, stainless steel, glass, carbon, and aluminum alloy. Credit: Angry Miao

Still, The Icebreaker is an example of how dedicated, artistic, and daring mechanical keyboard enthusiasts can be, and of how much time, effort, and expense can go into crafting a one-of-a-kind keyboard that’s sure to get people talking.

In the world of mechanical keyboards, unreasonable luxury is par for the course. For the avid collector out there, The Icebreaker can make for one expensive trophy.

$2,100 mechanical keyboard has 800 holes, NYC skyscraper looks Read More »

film-technica:-our-favorite-movies-of-2024

Film Technica: Our favorite movies of 2024


lighting up the silver screen

This year’s list features quite a bit of horror mixed in with the usual blockbuster fare—plus smaller hidden gems.

Credit: Aurich Lawson | Getty Images

Editor’s note: Although we’ve done our best to avoid spoiling anything too major, please note that this list does include a few specific references to several of the listed films that some might consider spoiler-y.

This was the year that Marvel Studios hit the pause button on its deluge of blockbuster superhero movies, after rather saturating the market in recent years. It proved to be a smart move: the only Marvel theatrical release was the R-rated Deadpool & Wolverine, a refreshingly irreverent, very meta take on the genre that delighted audiences and lit up the global box office. Perhaps audiences aren’t so much bored with superhero movies as becoming more discriminating in their choices. Give us a fun, fresh take and we’ll flock back to theaters.

Fewer superhero franchise entries meant there was more breathing room for other fare. Horror in particular had a stellar year, with numerous noteworthy offerings touching on body horror (The Substance), Satanic Panic (Late Night with the Devil), psychological horror (Heretic), hauntings (Oddity), a rom-com/revenge mashup (Your Monster), an inventive reimagining of a classic silent film (Nosferatu), and one very bloodthirsty child vampire with a wicked sense of humor (Abigail). Throw in a smattering of especially strong sequels (Inside Out 2, Dune: Part Two), a solid prequel (Furiosa), and a few hidden gems, and we had one of the better years for film in recent memory.

As always, we’re opting for an unranked list, with the exception of our “year’s best” vote at the very end, so you might look over the variety of genres and options and possibly add surprises to your eventual watchlist. We invite you to head to the comments and add your favorite films released in 2024.

The Fall Guy

Credit: Universal Pictures

I love to mentally check out with a good movie when I fly. So, on a recent trip to New York City for Technicon, I settled into my narrow, definitely-not-my-couch airline seat and fell in love with The Fall Guy, a movie based on the TV show I remember watching as a teen back in the ’80s.

Directed by David Leitch (Deadpool 2, the John Wick franchise), The Fall Guy is pure entertainment—part rom-com, part action, funny as heck, and super meta. Leitch is perfectly suited to direct a film about a stuntman, having been one himself (he was Brad Pitt’s stunt-double five times). And the actors clearly are having a ton of fun roasting the industry, while also paying tribute to the invisible heroes of any movie: the stunt performers.

A year after a nearly fatal fall (yeah, pun apparently intended), stuntman Colt Seavers (Ryan Gosling) is persuaded by his former producer, Gail (Hannah Waddingham), to come to the rescue for a film his ex-girlfriend, Jody (Emily Blunt), is directing after the lead actor and his stuntman disappear. Gail asks him to find them to save the film and Jody’s career. The exaggerated stunts, meta jokes (Tom Cruise, “I do my own stunts”), unicorn, callbacks to favorite films (Notting Hill, etc.), and unflagging plot made for a quick flight for me. The chemistry between Blunt and Gosling makes the movie, providing at-times hilarious yet believable romantic tension. (I’ll never forget the giant monster hand or the air pistols.) And the cameo by the real fall guy left me elated.

A few years back, also on a flight, I remember watching Gosling’s comedy chops in The Nice Guys and laughing aloud several times (always awkward; sorry, seatmates). I did the same with The Fall Guy. But could my enthusiasm for the movie get anyone in my family to watch it with me on our giant COVID-purchase TV with the surround sound and subwoofer on high? Not for a solid month. But once I did, they were sold.

Kerry Staurseth

Hit Man

Credit: Netflix

I grew up in Richard Linklater’s Texas, and there seems to be something—the characters, the story, the setting, or the aesthetic—that resonates with my personal experience in most of his films. I can’t say the same for Hit Man, but this isn’t meant to be a criticism. Instead, Linklater’s Hit Man offers nearly two hours of pure escapism that many of us need. It’s smart, with witty dialogue, more than a few moments of side-splitting humor, and a story that is too good to be true, although the premise is based on true events.

Gary, played by Glen Powell (who also co-wrote the screenplay with Linklater), is a chameleon. Gary starts the film as a meek, somewhat nerdy college professor, but circumstances quickly force him into the uncomfortable position of becoming an undercover police informant. As we learn early in the film, this involves portraying a fake hitman to rope suspects into contract-killing schemes that end in their prosecution. While I may question the legality or ethics of this setup, it creates a canvas on which Linklater and Powell craft funny, sympathetic characters thrust into situations that, while far-fetched, somehow seem believable.

Ultimately, Hit Man provides a laboratory for character development for the audience and within the film itself. In the film, Gary’s academic background helps him craft characters to match the circumstances and attitudes of each of his targets. Gary’s hitman personas can turn up the charm, abrasiveness, or faux bravado as the situation requires it. Gary reinvents himself at every turn, showcasing Powell’s acting range. That is, until Gary runs into Madison, portrayed by Adria Arjona. Then, things become a little too real for Gary, and you’ll have to watch the film to see what happens next.

Stephen Clark

Heretic

Credit: A24

Hugh Grant launched his career playing charmingly self-effacing rom-com heroes (cf. Four Weddings and a Funeral, Notting Hill). But in recent years, he’s embraced his darker side, playing roguish villains in films like The Gentlemen and Dungeons & Dragons: Honor Among Thieves, as well as for the BBC miniseries A Very English Scandal. Heretic gives him his most disturbing role yet.

Grant plays Mr. Reed, a reclusive man who invites the Mormon missionaries who come knocking on his door inside for some of his wife’s blueberry pie. But Sister Barnes (Sophie Thatcher) and Sister Paxton (Chloe East) soon realize there is no Mrs. Reed, that delicious blueberry smell is from a candle, they have no cell phone signal, and they are locked inside with a lunatic. They must figure out how to escape from the basement dungeon in which Reed traps them, a torturous environment in which to test their faith.

Heretic has its share of blood and violence, but the focus is more on the psychological trauma inflicted on the young women. And its treatment of the Mormon faith is surprisingly nuanced for the horror genre. Still, it’s Grant’s subtly sinister performance that really makes the film: He brings just a hint of his trademark rom-com charm to the role, which somehow makes everything he says and does doubly chilling.

Jennifer Ouellette

Tuesday

Credit: A24

This quietly devastating indie fantasy drama stars Julia Louis-Dreyfus as Zora, a mother whose 15-year-old daughter Tuesday (Lola Petticrew) is confined to a wheelchair with an incurable terminal disease. The fantastical element is Death, who comes to release Tuesday from her suffering in the form of a talking macaw that can alter its size at will. But Zora isn’t ready to let her daughter go; she swallows Death to keep her daughter alive—with the added complication that now nobody can die.

At its heart, Tuesday is an unsettling fable about human mortality and learning not just to confront, but to embrace, Death. That’s a pretty heavy theme, and the film offers no pat, easy answers in its resolution. But first-time director Daina O. Pusić brings a light touch to the melancholy, bolstered by Louis-Dreyfus’ courageous performance.

Jennifer Ouellette

The Substance

Credit: Mubi

Listen, I’m not here to convince you that The Substance changed my life, but it’s been a while since a modern sci-fi/horror movie fixated on the fear of death and aging made my skin crawl, so like many viewers in 2024, I was itching to press play. Demi Moore stars as Elizabeth Sparkle, a 50-year-old fitness icon who foolishly injects an experimental drug to maintain her celebrity and quickly regrets birthing a younger double (played by Margaret Qualley), whom she now must split her life with.

Between firm butts flexing and gory mutations emerging, Moore’s and Qualley’s characters clash, forgetting they are “the one” and spiraling toward doom. And while most body horror movies are viewed as gratuitous, The Substance lives up to its title. Somehow, through a nauseating cascade of increasingly grotesque distortions of the human form, the movie morphs into a meaningful satire on society’s stance that older women are irrelevant—blowing a kiss into the camera at the genre’s past tendency to objectify female characters.

Ashley Belanger

Rez Ball

Credit: Netflix

This is a classic feel-good sports movie that manages to seem both familiar and fresh, thanks to its setting on a Navajo reservation. (It’s based on the nonfiction book Canyon Dreams by Michael Powell.) Rez Ball follows one season of the Chuska Warriors, a Native American high school basketball team competing for the state championship. Their star player is Nataanii (Kusem Goodwind), whose mother and sister were killed by a drunk driver the prior year. Nataanii has been struggling with his grief ever since, and when he doesn’t show up for practice one day, the team learns he has died by suicide.

It’s up to coach Heather (Jessica Matten), a former WNBA player, to help her team recover from the shocking loss and regroup to finish the season. She names Nataanii’s best friend, Jimmy (Kauchani Bratt), as team captain and employs some novel team-building exercises—most notably a shepherding task in which the team must work together to bring sheep down from a mountain and back into their enclosure. Then there’s her clever strategy of training the team to call all their plays in their native language—shades of the World War II “code talkers.” (There’s even a sly humorous reference to the 2002 Nicolas Cage movie Windtalkers in between all the frybread jokes.)

Director Sydney Freeland hits all the familiar notes of this genre and ably captures the basketball sequences—is there really any doubt we’ll have a happy(ish) ending? Yet the film earns its payoff, driven not by genuine suspense, but by the sheer determination of the team members and how they bond to overcome their grief and bring some joy out of their shared tragedy.

Jennifer Ouellette

Oddity

Credit: Shudder

Oddity is a pitch-perfect supernatural thriller that never should have worked. Writer-director Damian McCarthy has explained that the movie comprised “a mix of a lot of old ideas” that he “could never find a home for.” That hodgepodge storytelling approach could have been a forced recipe for disaster if McCarthy wasn’t such an undeniable master of tension. Telling the story of a psychic medium-antiques dealer desperate to divine the events leading to her twin sister’s shocking murder in an abandoned Irish manor, the movie managed to feel fast-paced while drawing out an unrelenting sense of dread.

The bulk of that tension comes from a haunted wooden man that remains onscreen and barely ever moves—leaving the audience painfully stuck anticipating the moment when the nightmarish figure will spring to life. With slasher movie elements and twists as jarring as the wooden man’s startling features, Oddity had some horror fans smashing pause within minutes to recover from the brutal opening scene before returning to finish McCarthy’s curious haunted house tour de force.

Ashley Belanger

Abigail

Credit: Universal Pictures

Six criminals get more than they bargained for when they are hired to kidnap the young daughter of a wealthy underworld kingpin: budding ballerina Abigail (Alisha Weir). Joey (Melissa Barrera) is the only member to be kind to their captive, clearly bothered by the fact that their target is a child. Abigail responds to that kindness with an ominous sweetness: “I’m sorry about what’s going to happen to you.”

So begins one of the goriest and funniest vampire rampages to find its way to the big screen, as the Undead Abigail takes brutal revenge on each of her kidnappers in turn. The carnage is truly next-level, including one infamous scene in which Joey wades through a literal pool of bloody, rotting dead bodies—all victims of Abigail’s ferocious killer instincts. There are some insane plot twists, plenty of perfectly timed humorous moments, and terrific performances from the ensemble cast, especially Weir. If horror comedies are your jam, Abigail is an excellent addition to the genre.

Jennifer Ouellette

Furiosa: A Mad Max Saga 

Credit: Universal Pictures

A nine-year wait between franchise films is, more often than not, an indication that the follow-up can’t meet some lofty expectations of what came before it. But that’s not the case for Furiosa.

Although it’s not the same white-knuckle thrill ride as 2015’s Fury Road, Furiosa gives us another mostly mute protagonist in an expertly crafted action film that doubles as a revenge flick. While Anya Taylor-Joy delivers a cold, steely interpretation of the eponymous protagonist, it’s the object of her revenge, Chris Hemsworth’s villain Dementus, who brings something new to the typically bleak wasteland: levity.

Hemsworth relishes his chance here to show another side of his acting chops, and the result is one of the funniest and zaniest villainous performances in recent memory. Dementus’ malice is matched by his penchant for delivering self-aggrandizing speeches, which are a nice reminder that, even as the world fell, not everyone lost their sense of humor.

Jacob May

I Saw the TV Glow

Credit: A24

As anyone who’s spent years rewatching a beloved sci-fi/fantasy show could likely glean from its ethereal title, I Saw the TV Glow was made to immerse viewers in the sort of complex mythology that keeps the most engaged superfans glued to the screen. Surreally blurring the lines between TV fiction and reality, the A24 film follows an alienated teen boy who deeply bonds with an older female classmate over a monster-of-the-week TV show that comes on past his bedtime.

What starts at a sleepover evolves into an existential nightmare suggesting that the boy’s truth might be a fiction constructed by the “Big Bad” villain from his favorite TV show. This absurd possibility follows the boy as he grows into a man with his own family, all while continuing to take comfort in his all-time favorite TV show. The mesmerizing conclusion injected a disturbing sense of wonder into 2024, leaving some viewers as slack-faced as the boy was when he finally got to watch the late-night TV show that he somehow knew would light him up inside.

Ashley Belanger

Thelma

Credit: Magnolia Pictures

Elderly people are so often invisible in our youth-oriented society, so it’s nice to see two 90-something characters take center stage in this charming comedy-drama written and directed by Josh Margolin. June Squibb plays the titular Thelma, who gets taken in by a phone scammer pretending to be her grandson Danny (Fred Hechinger) to the tune of $10,000. The police won’t help, but Thelma has a P.O. box address as a clue and sets out to get her money back.

Thelma enlists the help of her estranged friend Ben (Richard Roundtree, in his final role), who is eager to escape his assisted living facility for one last adventure, and the two set off on Ben’s two-person scooter. Wacky hijinks and personal growth and enlightenment ensue. The film was inspired by a conversation Margolin had with his own now-deceased grandmother, and that personal experience is the key to Thelma‘s warmth, humor, and authenticity. It’s a lovely twist on the classic road movie and well worth a watch.

Jennifer Ouellette

Woman of the Hour

Credit: Netflix

In 1978, serial killer Rodney Alcala interrupted his murder spree to appear on The Dating Game and actually went out on a date with bachelorette Sheryl Bradshaw—who naturally had no idea the charming man who’d won her over with his answers was, in fact, a psychopath. It might seem like an odd bit of trivia on which to base a film, but Anna Kendrick came across Ian MacAllister McDonald’s initial screenplay as she was gearing up to make her directorial debut with Netflix and snatched it up.

Kendrick also stars as Sheryl, a struggling LA actress who is persuaded to go on The Dating Game by friends, and her typically winsome, spunky performance—and able direction—lift Woman of the Hour to the next level. Perhaps the best part of the film is that it doesn’t linger overmuch on the killer or glorify his horrific deeds. The focus stays squarely on Sheryl and a woman in the audience named Laura (Nicolette Robinson), who recognizes Rodney (Daniel Zovatto) as the man last seen with her missing best friend. It’s a well-done, quietly thrilling period piece that bodes well for Kendrick’s future as a director.

Jennifer Ouellette

Your Monster

Credit: Vertical Entertainment

It’s been quite a year for Melissa Barrera, who followed up her standout Final Girl performance in Abigail with another star turn in the decidedly offbeat Your Monster—part romantic comedy, part horror/revenge fantasy, weaving in such disparate influences as the late ’80s TV series Beauty and the Beast and classic Broadway musicals like A Chorus Line. It’s based on a 2019 short film by writer/director Caroline Lindy, inspired by Lindy’s one-time boyfriend breaking up with her when she received a cancer diagnosis.

Barrera plays Laura, an actress who also loses her boyfriend after a cancer diagnosis—plus he reneges on his promise to let her audition for the musical she co-wrote—and goes back to her childhood home to recuperate. There she encounters the proverbial Monster in the closet (Tommy Dewey), who is none too pleased about suddenly having a “roommate” again. At first he tries to scare her, but soon they’re bonding over old movies and Chinese takeout; Monster might just be the ideal boyfriend she’s been looking for.

Of course, Monster is also very much a manifestation of Laura’s psyche, particularly her subsumed rage. Naturally they plot revenge on her selfish ex, and when it comes, it’s everything a jilted lover could want from the experience. Your Monster can’t quite decide on a tone, shifting constantly between comedy and horror, love and revenge. But that’s part of what makes this quirky film so appealing: Lindy isn’t afraid to take creative risks, and she makes it all work in the end.

Jennifer Ouellette

Will and Harper

Credit: Netflix

A few years ago, comic actor Will Ferrell was on-set filming a movie when he received a surprising text from Harper Steele, a close friend of some 30 years, dating back to their time together on Saturday Night Live. Steele informed him of her gender transition. Ferrell’s response was to organize a road trip for the two of them, starting in New York City, where they first met, hitting stops in Washington, DC, Indiana, Illinois, Oklahoma, and Amarillo, Texas—documenting the journey all along the way.

The result is Will and Harper, a surprisingly sweet, refreshingly frank, and thought-provoking film that celebrates an enduring friendship. There’s never a question of Ferrell not accepting his friend’s transition, but there are some awkward growing pains. The pair don’t shy away from more difficult conversations, peppered with humor, while downing cans of Pringles, and it’s that well-meaning honesty that keeps the film grounded and centered on their relationship, without falling into didactic preachiness.

Jennifer Ouellette

Wicked Little Letters

Credit: StudioCanal

Trolling didn’t begin with social media. Back in the 1920s, several residents of the seaside town of Littlehampton in England began receiving poison pen letters rife with obscenities and false rumors. It became known as the Littlehampton libels, with the culprit revealed to be a 30-year-old laundress named Edith Swan, who tried to pin the blame on her neighbor, Rose Gooding, until she was found out. (Poor Gooding actually served over a year of jail time before she was exonerated.)

Wicked Little Letters is the fictionalized account of those events, starring Olivia Colman as Edith and Jessie Buckley as Rose, emphasizing the complicated relationships and psychological foibles of the central characters. Even if you know nothing about the case, we learn early on who the real culprit is, and the film then becomes a cat-and-mouse game as Rose’s allies try to prove Edith is the true poison pen. The enjoyment comes from watching everything play out with equal parts humor and pathos.

Jennifer Ouellette

Nosferatu

Credit: Universal Pictures

Director Robert Eggers can be a polarizing figure for moviegoers. How much you enjoyed The Witch, The Northman, or 2019’s The Lighthouse (inspired by a real-life 1801 tragedy involving two Welsh lighthouse keepers trapped in a storm) likely depends on your taste for Eggers’ dark mythic sensibility and penchant for hallucinatory imagery. With Nosferatu—a daring reinvention of the seminal 1922 German silent film by F.W. Murnau, based in turn on Bram Stoker’s 1897 novel Dracula—Eggers leans fully into supernatural gothic horror, with spectacular, genuinely scary results.

It’s hard to go wrong with Bill Skarsgård in the lead role of the vampire Count Orlok; his portrayal of Pennywise the Clown in It is still giving people nightmares. Lily-Rose Depp and Nicholas Hoult also shine as Ellen (the unfortunate object of Orlok’s murderous pursuit, slowly driven mad as he closes in) and her hapless fiancé, Thomas, as does Willem Dafoe as the eccentric Professor von Franz. The basic outlines of Stoker’s plot remain, but Eggers has also infused his film with a visual language that evokes both Murnau’s distinctive German expressionism and the Eastern European folklore that inspired Stoker. This is not so much a remake as an innovative re-imagining by a director whose sensibility is perfectly suited to the task.

Jennifer Ouellette

Monkey Man

Credit: Universal Pictures

Dev Patel’s latest film completely passed me by when it got a limited cinematic release this spring. Instead, I stumbled across it streaming on Peacock and went in cold with nothing more than good vibes toward the actor—and now director—based on his performances in films like Chappie, which made the initial fight, with Patel wearing a monkey mask, a little confusing.

Monkey Man is a good old revenge film, following Patel’s character as he negotiates the underworld of the fictional Indian city of Yatana in a quest to avenge his mother, who was brutally murdered when their village was ethnically cleansed by Hindu nationalists. The fight scenes are frenetic and visceral, influenced by films like John Wick but also The Raid, and the hand-to-hand combat in Marvel’s Daredevil. But it’s also a film with a political message or two. Perhaps the best way to describe it is like a cross between John Wick and RRR—if you liked both of those films, you’ll probably love Monkey Man.

Jonathan Gitlin

The Three Musketeers Part 2: Milady

Credit: Pathe

Last year, The Three Musketeers Part 1: D’Artagnan made our annual list, in which we celebrated finally having a quintessential French adaptation of Alexandre Dumas’ classic 1844 novel to rival Richard Lester’s iconic two-part 1970s US adaptation. Part 2: Milady covers the events of the second half of the novel, as D’Artagnan (François Civil) and his compatriots rush to rescue his kidnapped lover, Constance (Lyna Khoudri), and prevent the assassination of the Duke of Buckingham (Jacob Fortune-Lloyd) by Eva Green’s deliciously wicked Milady de Winter.

Both films were shot back to back, so the same top-notch storytelling and able performances are present. And director Martin Bourboulon heard the complaints about how dark the first installment was in places and corrected the color grading. My only quibble: unlike Part 1, Part 2 deviates quite substantially from the source material, particularly with regard to the fates of Constance and Milady. In fact, the finale is left open-ended. Could a third installment be in the offing? (An adaptation of Dumas’ The Count of Monte Cristo from the same team is due out soon.) Still, it’s a magnificent, hugely entertaining film that pairs beautifully with its predecessor.

Jennifer Ouellette

Late Night with the Devil

Credit: IFC Films

Framed as a documentary with behind-the-scenes found-footage elements, Late Night with the Devil tells the story of a late-night talk show, Night Owls with Jack Delroy, and its producers’ attempts to put on an unforgettable Halloween night show in 1977. Things start out in an appropriate-for-TV spooky tone, and the movie’s ’70s aesthetic really sells the vibe.

But as the show goes on, the guests get progressively weirder, the segments become more sinister, and it becomes difficult to tell whether the guests are putting on an act or something darker is going on. Is the host really going to try to commune with the devil on a late-night variety hour? That quickly becomes the plan. I won’t spoil more than that, but I found the ride compelling from start to finish.

This was a good year for horror movies, and Late Night with the Devil was one of my favorites. David Dastmalchian’s performance as the host was a real standout. The whole package is great fun, and everything wraps up in a blessedly tight 95 minutes (man, movies are way too long these days). Genre fans shouldn’t miss this one.

Aaron Zimmerman

Wicked Part 1

Credit: Universal Pictures

I was lucky enough to see Wicked on Broadway near the end of Idina Menzel and Kristin Chenoweth’s iconic runs originating the characters of Elphaba and Glinda for the stage. Since then, I’ve seen the live version of the musical five more times at various points and listened to the soundtrack hundreds of times more. Despite all that, the unavoidable marketing for this movie had me worried it was going to be an overproduced cinematic flop on the order of Cats or Dear Evan Hansen.

Happily, my worries were overblown. Cynthia Erivo and Ariana Grande bring real chemistry and pathos to the show’s main roles and have the pipes to pull off some extremely difficult songs without breaking a sweat. I was also impressed with the movie’s top-notch choreography, which evokes the golden age of silver screen musicals and demands to be seen in a theater with as big a screen as possible.

My only quibble with this adaptation is the pacing, which suffers thanks to a few unnecessary backstory additions and a few too many long, lingering shots and pregnant pauses that even mess up the flow of some iconic songs. Why they chose to shoot “Defying Gravity” like an action movie—and not to cut to the credits right after Erivo’s soaring final note—will always be a huge mystery to me. A version of this movie that was about 45 minutes shorter would have been perfect. The version we got was instead just a very good adaptation of a very good musical.

Kyle Orland

The Wild Robot

Credit: Universal Pictures

This is the final film to be animated entirely in-house at DreamWorks, based on the 2016 novel of the same name by Peter Brown. It features a plucky service robot called ROZZUM unit 7134, aka “Roz” (voiced by Lupita Nyong’o), who gets shipwrecked on a remote island and must learn to adapt. Along the way, Roz befriends some of the local wildlife—Pedro Pascal voices a mischievous red fox named Fink, with Bill Nighy voicing an elderly goose named Longneck—and adopts an orphaned gosling named Brightbill (Kit Connor).

Director Chris Sanders was inspired both by classic Disney animated movies and by the films of Hayao Miyazaki, creating what he described as “a Monet painting in a Miyazaki forest” for the CGI visual style of The Wild Robot. It makes for quite a striking combination. Plot-wise, there are elements of E.T. and Pixar’s Wall-E here, but Sanders has created a unique take on those tropes and standout characters that are all his own. Along with Inside Out 2 (see below), this is one of the best animated movies of the year.

Jennifer Ouellette

Deadpool & Wolverine

Credit: Marvel Studios

The Deadpool & Wolverine movie was a long time coming, and not just because Deadpool (Ryan Reynolds) has been making comically obsessive requests to hang out with Hugh Jackman’s Wolverine since the first Deadpool. The movie itself feels like an homage to the comic book movies before it, combining fan service with a true, sensible (for a comic book movie) plot and a satisfying conclusion that leaves the characters more mature and content than when we last saw them.

Some may be concerned about the return of Jackman, considering his version of Wolverine was supposed to come to a dramatic and spectacular conclusion in the 2017 movie Logan. In fact, the movie is about Deadpool’s universe crumbling (as related by the Time Variance Authority from the show Loki) because that version of Wolverine is no longer around. But Deadpool & Wolverine handles this well, revisiting the site of Logan’s ending and establishing that Jackman now plays a Wolverine from an alternate universe, one still highly capable of embodying the fierce, acrobatic, and iconic X-Man.

Deep down, the movie is about two men who have typically felt alone and unworthy of the people they love finding new paths to manhood, self-respect, and acceptance of their roles in the world. But for comic book fans, it’s really about action-packed nostalgia. The good feels are bolstered by epic cameos of characters you might have forgotten were Marvel-related at all (if possible, I highly recommend seeing this movie spoiler-free).

Unexpectedly, one of the best parts of the movie comes during the end credits, which feature behind-the-scenes footage from 12 X-Men movies going back 24 years. With glimpses of a young Jackman, Halle Berry (who has played Storm), and Patrick Stewart (who has played Professor X), it’s a reminder of a time when comic book movies felt new and bold, and a tribute to how long all of us—from the actors to the crew to the audience—have been on this journey. Ultimately, Deadpool & Wolverine provides a fulfilling and happy goodbye to all those pieces.

Scharon Harding

Nickel Boys

Credit: Amazon MGM Studios

Colson Whitehead won the 2020 Pulitzer Prize for his 2019 novel The Nickel Boys, based on Florida’s infamous Arthur G. Dozier School for Boys, a relic of the Jim Crow era. The school’s staff inflicted all manner of abuse, beatings, rapes, and torture on its unfortunate charges and even murdered many of them; as of 2012, nearly 100 deaths had been documented, along with 55 burial sites on school grounds. (There could be as many as 27 more burial sites, based on ground-penetrating radar surveys.)

In 1962, a young Black boy named Elwood (Ethan Cole Sharp) is a promising student until he is mistakenly arrested as an accomplice to car theft. He’s sent to the segregated Nickel Academy, where he befriends Turner (Brandon Wilson). (Daveed Diggs plays a grown Elwood, now a successful businessman in New York City.) The two witness and experience so much abuse that Elwood finally decides to fight back, despite the risk of retaliation by the school’s administrators.

This is powerful subject matter, deftly handled by director RaMell Ross, who manages to tell a compelling story without turning it into what’s become known as “Black trauma porn.” The most controversial aspect of the film is Ross’ choice to shoot it from a first-person point of view with a 1.33:1 aspect ratio. So we see either Elwood speaking in a scene, with Turner off-camera, or vice versa, and the two are only occasionally onscreen at the same time. Some might find this choice annoying, but I found it kept me centered on one boy’s perspective at a time, which served to make the final plot twist all the more satisfying.

Jennifer Ouellette

Inside Out 2

Credit: Pixar/Disney

I cried multiple times the first time I saw Inside Out in the theater, and still tear up when I watch it at home. So I was prepared to be even more emotional at Inside Out 2, especially given that I’m now the parent of a tween child myself.

I wasn’t quite moved to tears by this tale of Riley struggling with newfound feelings of Anxiety, pushing her to more and more desperate plans to ingratiate herself with a group of “cool” kids. But I will admit that my heart did break a little during the climactic scene, which shows the inner turmoil inherent to a true panic attack in a way that can resonate with both children and adults.

There were a couple of inconsistent attempts at comedy in Inside Out 2 that felt like they came from a completely different movie. And I found myself missing the original voice actors for Disgust and Fear, as well as Lewis Black’s original Anger voice (which has noticeably diminished as he’s aged). But none of this was enough to undercut the strong emotional core of a movie that will be relatable to anyone who’s busy growing up or just remembers doing the same.

Kyle Orland

And now… our pick for the best movie of 2024:

Dune: Part Two

Credit: Warner Bros.

David Lynch’s 1984 Dune was a huge chunk of my high school experience, as I was part of a small group of friends obsessed with the movie—with its incredible visuals, its outsize but seemingly earnest camp, and its absolutely endless quotability. We sprinkled the movie’s words throughout our conversations, experimented with re-creating portions of it with video cameras and action figures, and reveled in exploring something that felt truly ours—largely because the movie was rejected and forgotten by so many others.

If anything, Lynch’s Dune put paid to the notion that Frank Herbert’s novel could be successfully ported to film. It’s a heroic effort, but it’s a bloody mess. And I would have gone to my grave thinking that Dune remained one of the most unfilmable classic bits of 20th-century science fiction—until Denis Villeneuve went and made the dang thing anyway.

The viscerally visual filmmaker who famously hates dialogue did something I genuinely believed was impossible: He gave us a (two-part) translation of the book to screen that is both faithful to the original and full of new things that feel like they’ve been there all along, waiting to be discovered.

Dune: Part Two is a masterpiece. It is the product of craftsmen at the top of their crafts, including and especially the craftsman in the director’s seat. Dune gives us a peek at exactly what Villeneuve means when he talks about the “paradise” of a movie without dialogue—there are long, almost Tarkovsky-esque stretches where vast cyclopean imagery juxtaposes itself against tiny human tableaus, underpinned by nothing but Hans Zimmer’s transcendent music. And it’s not just that these stretches work—they work fantastically well!—it’s that in many ways they carry the movie to places that rapid-fire Aaron Sorkin-style banter could never reach. The visuals show us things—things words never could.

Speaking of Hans Zimmer—let’s talk about that score. It’s an absolutely masterful creation that figures so prominently in our experience of Arrakis that it becomes a character itself, a second unseen narrator who alternates with poor unloved Irulan as the voice of the world. Paul and Chani’s love theme, a composition titled “A Time of Quiet Between the Storms,” is one of the most powerfully emotional pieces of music I’ve ever heard, embodying almost the platonic ideal of pure, mournful longing; the emotional hammer-blow delivered by its apocalyptic, civilization-ending reprise “Kiss the Ring” left me speechless and wide-eyed in the theater.

Folks, Dune: Part Two is a good movie. It (and its predecessor) is one of the best movies I’ve ever seen, successfully adapting a difficult book into a movie and retaining the bits that mattered most. Villeneuve was born to make these films, and Zimmer was born to score them. They are true art. If anything, I’m even more excited now about another of Villeneuve’s upcoming projects: He has taken over the reins for the long-stalled, long-rumored, finally-happening-for-real adaptation of Arthur C. Clarke’s Rendezvous with Rama, a book that heavily imprinted itself on me in fourth grade and that I’ve reread at least once a year for most of my life. If Villeneuve brings his A-game, I have the highest hopes for Rama.

Lee Hutchinson

Jennifer is a senior reporter at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Film Technica: Our favorite movies of 2024 Read More »