Author name: Mike M.


Researchers spot Saturn-sized planet in the “Einstein desert”


Rogue, free-floating planets appear to have two distinct origins.

Most of the exoplanets we’ve discovered have been in relatively tight orbits around their host stars, allowing us to track them as they repeatedly loop around them. But we’ve also discovered a handful of planets through a phenomenon called microlensing. This occurs when a planet passes through the line of sight between Earth and another star, creating a gravitational lens that distorts the star’s light and causes the star to briefly brighten.

The key thing about microlensing compared to other methods of finding planets is that the lensing planet can be nearly anywhere on the line between the star and Earth. So, in many cases, these events are driven by what are called rogue planets: those that aren’t part of any exosolar system at all but instead drift through interstellar space. Now, researchers have used microlensing and the fortuitous orientation of the Gaia space telescope to spot a Saturn-sized planet that’s the first found in what’s called the “Einstein desert,” which may be telling us something about the origin of rogue planets.

Going rogue

Most of the planets we’ve identified are in orbit around stars and formed from the disks of gas and dust that surrounded the star early in its history. We’ve imaged many of these disks and even seen some with evidence of planets forming within them. So how do you get a planet that’s not bound to any stars? There are two possible routes.

The first involves gravitational interactions, either among the planets of the system or due to an encounter between the exosolar system and a passing star. Under the right circumstances, these interactions can eject a planet from its orbit and send it hurtling through interstellar space. As such, we should expect them to be like any typical planet, ranging in mass from small, rocky bodies up to gas giants. An alternative method of making a rogue planet starts with the same process of gravitational collapse that builds a star—but in this case, the process literally runs out of gas. What’s left is likely to be a large gas giant, possibly somewhere between Jupiter and a brown dwarf star in mass.

Since these objects are unlinked to any exosolar system, they’re not going to have any regular interactions with stars; our only way of spotting them is through microlensing. And microlensing on its own tells us very little about the size of the planet. To figure that out, we would need some indication of things like how distant the star and planet are, and how big the star is.

That doesn’t mean that microlensing events have told us nothing. We can measure the size of the Einstein ring, the circular ring of light that forms when the planet and star are perfectly aligned from Earth’s perspective. Given the ring size and some of the other information mentioned above, we can figure out the planet’s mass. But even without that, we can make some inferences using statistical models.
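For a sense of the scales involved, the angular size of that ring follows directly from the lens mass and the lens and source distances. Here’s a minimal Python sketch; the 0.2-Jupiter-mass lens and the 4 and 8 kiloparsec distances are illustrative assumptions, not values from any particular event:

```python
import math

G = 6.674e-11     # gravitational constant, m^3 kg^-1 s^-2
C = 2.998e8       # speed of light, m/s
M_JUP = 1.898e27  # Jupiter mass, kg
KPC = 3.086e19    # one kiloparsec in meters

def einstein_radius_rad(lens_mass_kg, d_lens_m, d_source_m):
    """Angular Einstein radius (radians) for a point-mass lens."""
    return math.sqrt(
        (4 * G * lens_mass_kg / C**2)
        * (d_source_m - d_lens_m) / (d_lens_m * d_source_m)
    )

# Illustrative case: a 0.2 Jupiter-mass lens halfway to a source
# star in the galactic bulge at roughly 8 kpc.
theta = einstein_radius_rad(0.2 * M_JUP, 4 * KPC, 8 * KPC)
micro_arcsec = math.degrees(theta) * 3600 * 1e6
print(f"Einstein radius: {micro_arcsec:.1f} microarcseconds")
```

Planet-mass lenses produce rings of only a few to a few tens of microarcseconds, which is part of why these events are so hard to catch.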

Studies of collections of microlensing events (these collections are small, typically in the dozens, because these events are rare and hard to spot) have identified a distinctive pattern. There’s a cluster of relatively small Einstein rings that are likely to have come from relatively small planets. Then, there’s a gap, followed by a second cluster that’s likely to be made by far larger planets. The gap between the two has been termed the “Einstein desert,” and there has been considerable discussion regarding its significance and whether it’s even real or simply a product of the relatively small sample size.

Sometimes you get lucky

All of which brings us to the latest microlensing event, which was picked up by two projects that each gave it a different but equally compelling name. To the Korea Microlensing Telescope Network, the event was KMT-2024-BLG-0792. For the Optical Gravitational Lensing Experiment, or OGLE, it was OGLE-2024-BLG-0516. We’ll just call it “the microlensing event” and note that everyone agrees that it happened in early May 2024.

Both of those networks are composed of Earth-based telescopes, and so they only provide a single perspective on the microlensing event. But we got lucky that the European Space Agency’s space telescope Gaia was oriented in a way that made it very easy to capture images. “Serendipitously, the KMT-2024-BLG-0792/OGLE-2024-BLG-0516 microlensing event was located nearly perpendicular to the direction of Gaia’s precession axis,” the researchers who describe this event write. “This rare geometry caused the event to be observed by Gaia six times over a 16-hour period.”

Gaia is also located at the L2 Lagrange point, which is a considerable distance from Earth. That’s far enough away that the peak of the event’s brightness, as seen from Gaia’s perspective, occurred nearly two hours later than it did for telescopes on Earth. This let us determine the parallax of the microlensing event, and thus its distance. Other images of the star from before or after the event indicated it was a red giant in the galactic bulge, which also gave us a separate check on its likely distance and size.

Using the parallax and the size of the Einstein ring, the researchers determined that the planet involved was roughly 0.2 times the mass of Jupiter, making it a bit less massive than Saturn. Those estimates are consistent with a statistical model that took the other properties into account. The measurements also placed it squarely in the middle of the Einstein desert—the first microlensing event we’ve seen there.

That’s significant because it means we can tie the Einstein desert to a specific planetary mass within it. Because of the variability of things like distance and the star’s size, not every planet that produces a similar-sized Einstein ring will have a similar mass, but statistics suggest that this will typically be the case. And that’s in keeping with one of the potential explanations for the Einstein desert: that it represents the gap in size between the two different methods of making a rogue planet.

For the normal planet formation scenario, the lighter the planet, the easier it is to eject, so you’d expect a bias toward small, rocky bodies. The Saturn-sized planet seen here may be near the upper limit of the sorts of bodies we’d typically see ejected from an exosolar system. By contrast, the rogue planets that form through the same mechanisms that give us brown dwarfs would typically be Jupiter-sized or larger.

That said, the low number of total microlensing events still leaves the reality of the Einstein desert an open question. Sticking with the data from the Korea Microlensing Telescope Network, the researchers find that the frequency of other detections suggests we’d have a 27 percent chance of detecting just one event in the area of the Einstein desert even if the desert weren’t real and detections were equally probable across the size range. So, as is often the case, we’re going to need to let the network do its job for a few more years before we have the data to say anything definitive.
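The flavor of that 27 percent figure can be reproduced with a basic binomial calculation. The event count and desert width below are hypothetical stand-ins, not the numbers the researchers actually used:

```python
from math import comb

def prob_exactly_k(n_events, frac_in_desert, k):
    """Binomial probability that exactly k of n detections fall in the
    desert, assuming detections are spread uniformly across sizes."""
    p = frac_in_desert
    return comb(n_events, k) * p**k * (1 - p) ** (n_events - k)

# Hypothetical: 30 detections, with the desert covering 10% of the
# observed range of Einstein ring sizes.
print(f"P(exactly one in desert) = {prob_exactly_k(30, 0.10, 1):.2f}")
```

With numbers in that ballpark, a lone detection in the "desert" is entirely consistent with a uniform distribution, which is why more years of data are needed.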

Science, 2026. DOI: 10.1126/science.adv9266 (About DOIs).


John is Ars Technica’s science editor. He has a Bachelor of Arts in Biochemistry from Columbia University, and a Ph.D. in Molecular and Cell Biology from the University of California, Berkeley. When physically separated from his keyboard, he tends to seek out a bicycle, or a scenic location for communing with his hiking boots.



Research roundup: 7 cool science stories we almost missed


Double-detonating “superkilonova,” Roman liquid gypsum burials, biomechanics of kangaroo posture, and more.

Three stages of a superkilonova: a supernova blast, neutron star merger, and finally kilonova that spews heavy metals. Credit: Caltech/K. Miller and R. Hurt (IPAC)

It’s a regrettable reality that there is never enough time to cover all the interesting scientific stories we come across each month. In the past, we’ve featured year-end roundups of cool science stories we (almost) missed. This year, we’ve experimented with a monthly collection. December’s list includes a fossilized bird that choked to death on rocks; a double-detonating “superkilonova”; recovering an ancient seafarer’s fingerprint; the biomechanics of kangaroo movement; and cracking a dark matter puzzle that stumped fictional physicists on The Big Bang Theory, among other tantalizing tidbits.

Secrets of kangaroo posture

An illustration of the 3D musculoskeletal model of a kangaroo, developed by Lauren Thornton and colleagues.

Kangaroos and wallabies belong to a class of animals called macropods, which have a unique form and style of movement. Their four limbs and tail all contact the ground at slow speeds, while they use a hopping gait at higher speeds. Typically, high-speed movements are more energy-intensive than slow-speed motion, but the opposite is true for macropods like kangaroos; somehow the hopping speed and energy cost become uncoupled. According to a paper published in the journal eLife, this may be due to changes in a kangaroo’s posture at higher hopping speeds.

To investigate this hypothesis, the authors used 3D motion capture and data from force plates to create a 3D musculoskeletal model to analyze the motions of red and grey kangaroos, focusing on how body mass and speed influence three factors during hopping: hindlimb posture, movement efficiency and associated tendon stress, and ankle mechanics. This revealed that kangaroos adjust their posture so that the hindlimbs are more crouched while hopping, with the ankle joint doing most of the work per hop. The crouched position increases energy absorption, thus improving efficiency.

DOI: eLife, 2025. 10.7554/eLife.96437.3  (About DOIs).

Fossilized bird choked on rocks

An unlucky fossil bird, preserved with over 800 tiny rocks in its throat (visible as the gray mass to the left of its neck bones).

Credit: Jingmai O’Connor

Some 120 million years ago, a tiny bird choked to death on a bunch of small rocks lodged in its throat. Paleontologists recently discovered the fossil among the many specimens housed at the Shandong Tianyu Museum of Nature in China. Not only does it represent a new species—dubbed Chromeornis funkyi, after techno-funk duo Chromeo—the fossilized bird is the first such specimen to be found with a throat filled with stones, according to a paper published in the journal Palaeontologica Electronica.

Certain bird species, like chickens, swallow small stones and store them in their gizzards to help grind up food. The authors examined prior CT scans of fossilized birds with gizzards and quantified how many gizzard stones were present, then compared that data to a CT scan of the C. funkyi fossil. The scan showed that the more than 800 tiny stones lodged in the throat were not gizzard stones. So the bird didn’t swallow the stones to help grind up food. The authors suggest the bird was sick; sick birds will sometimes eat stones. When it tried to regurgitate the stones, they got stuck in the esophagus and the poor bird choked to death.

DOI: Palaeontologica Electronica, 2025. 10.26879/1589  (About DOIs).

“Superkilonova” exploded twice

Back in 2017, astronomers detected a phenomenon known as a “kilonova”: the merger of two neutron stars accompanied by powerful gamma-ray bursts. Recording this kind of celestial event was unprecedented, and it officially marked the dawn of a new era in so-called “multi-messenger astronomy.” It’s the only unambiguously confirmed kilonova to date, but astrophysicists reported evidence of a possible second such event in a paper published in The Astrophysical Journal Letters. And it’s unusual because this kilonova may have originated from a supernova blast mere hours before, making it a “superkilonova.”

Supernovae are the spectacular explosions that result from dying massive stars, seeding the universe with heavy elements like carbon and iron. Kilonovae occur when two binary neutron stars begin circling into their death spiral, sending out powerful gravitational waves and stripping neutron-rich matter from each other. Then the stars collide and merge, producing a hot cloud of debris that glows with light of multiple wavelengths. It’s the neutron-rich debris that astronomers believe creates a kilonova’s visible and infrared light—the glow is brighter in the infrared than in the visible spectrum, a distinctive signature that results from heavy elements in the ejecta that block visible light but let the infrared through.

This latest kilonova candidate event, dubbed AT2025ulz, initially looked like the 2017 event, but over time, its properties started resembling a supernova, making it less interesting to many astronomers. But it wasn’t a classic supernova either. So some astronomers kept tracking the event and analyzing combined “multimessenger” data from other collaborations and telescopes during the same time frame. They concluded that this was a multi-stage event: specifically, a supernova gave birth to twin baby neutron stars, which then merged to produce a kilonova. That said, the evidence isn’t quite strong enough to claim this is what definitely happened; astronomers need to find more such superkilonovae to confirm.

DOI: Astrophysical Journal Letters, 2025. 10.3847/2041-8213/ae2000  (About DOIs).

An ancient seafarer’s fingerprint

Photo of caulking fragment showing fingerprint on the left and high-resolution x-ray tomography scan of fingerprint region on the right.

Credit: Photography by Erik Johansson, 3D model by Sahel Ganji

In the 4th century BCE, an invading mini-armada of about four boats attacked an island off the coast of Denmark. The attack failed and the victorious islanders celebrated by sinking one of the boats, filled with their foes’ weapons, into a bog, where it remained until it was discovered by archaeologists in the 1880s. It’s known as the Hjortspring boat, and archaeologists were recently surprised when their analysis uncovered an intact human fingerprint in the tars used to waterproof the vessel. They described their find in a paper published in the journal PLoS ONE.

The fingerprint is significant because it offers a hint into where those would-be raiders from the sea originally hailed from. Prior scholars had suggested they came from somewhere near what is now Hamburg, Germany. But the authors of this latest paper noticed that the waterproofing tars were pine pitch, concluding that the raiders may have originated in the coastal regions of the Baltic Sea, where pine-rich forests flourished. That would have required the raiders to travel hundreds of kilometers over open sea. The authors hope they can extract some ancient DNA from the tar to learn more about the ancient people who built the boat.

DOI: PLoS ONE, 2025. 10.1371/journal.pone.0336965  (About DOIs).

Roman liquid gypsum burials

The impression of fingers preserved in the gypsum surface.

Credit: Seeing the Dead Project/University of York/York Museums Trust

Speaking of ancient fingerprints, archaeologists at the University of York found finger marks and fingerprints preserved in hardened gypsum used by Romans in Britain in their funerary practices in the third and fourth centuries CE. The university is home to the Seeing the Dead project, which studies the bodies preserved by pouring liquid gypsum (plaster of paris) over them in their coffins prior to burial. The gypsum hardened around the decomposing bodies, creating a cavity while preserving clear imprints of the body contours, clothing, and shrouding. It’s similar to the method used to create casts of the victims of Pompeii.

Some 70 gypsum burials have been found in Yorkshire thus far. In this case, researchers were examining a stone sarcophagus excavated in the 1870s that had yet to be analyzed. While cleaning the artifact and subjecting it to 3D scanning, they noticed a handprint with fingers clearly delineated in the hardened gypsum. They also found distinct fingerprints close to the edges of the coffin. The team had previously thought that the gypsum was heated to at least 300 degrees F (150 degrees C) before being poured over the body, but the handprint and fingerprints suggest someone smoothed the gypsum over the body by hand, implying significantly cooler temperatures. While acknowledging it’s a long shot, the team hopes to extract DNA samples from the sarcophagus, which might enable them to determine genetic sex.

Playing Super Mario combats burnout

Cheerful landscape in Super Mario Bros. Wonder

Credit: Winze Tam et al./Nintendo

Young adulthood in the 2020s is fraught with a range of interconnected pressures: soaring cost of living, student loan debt, pressure to excel academically, and an “always on” digital culture, to name a few of the most common stressors. This in turn can lead to burnout. Perhaps playing video games can help—the right kind of video games, like Super Mario Bros. or Yoshi, as opposed to dystopian survival horror games or highly competitive multiplayer games. According to a study published in the journal JMIR Serious Games, Super Mario Bros. and Yoshi can help young adults recapture childlike wonder and reduce stress and anxiety that can lead to burnout.

The authors employed a mixed-methods approach for their study. First, they collected qualitative data from 41 college-aged subjects via in-depth interviews; all were experienced players of those two games. They followed this with a cross-sectional survey to collect quantitative data from 336 players. The resulting analysis showed that those who felt greater childlike wonder while playing also reported higher overall happiness, and the happiest players showed significantly lower risk of burnout. “By moving beyond escapism and nostalgia, [this study] offers a new perspective on how well-designed, globally familiar games can function as accessible, resilience-building digital microenvironments,” the authors concluded.

DOI: JMIR Serious Games, 2025. 10.2196/84219  (About DOIs).

Cracking a Big Bang Theory problem

Sheldon and Leonard, two nerdy physicists, standing in front of a white board filled with equations and diagrams

Credit: CBS

Physicists may have had mixed feelings about The Big Bang Theory‘s depiction of their profession, but one thing the sitcom consistently got right was the equations featured on the ubiquitous white board—clever Easter eggs for physicists, courtesy of science advisor David Saltzberg. In one episode, Sheldon and Leonard are pondering an equation about how axions are generated from the sun—part of the duo’s efforts to estimate the likelihood of detecting axions produced by a fusion reactor. Leonard and Sheldon failed on that point, but real-world physicists think they’ve now cracked the case, according to a paper published in the Journal of High Energy Physics.

Axions are hypothetical particles that could explain dark matter—the mysterious substance that comprises about 23 percent of all the mass in our universe—and represent a theoretical alternative to WIMPs, which thus far have eluded detection by physicists. Particles can exhibit wavelike behavior as well as particle characteristics, so an axion would behave more like a wave (or wave packet) than a particle, and the size of the wave packets is inversely proportional to their mass. That means these very light particles don’t necessarily need to be tiny. The downside is that they interact even more weakly with regular matter than WIMPs do, so they cannot be produced in large colliders.
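The “size inversely proportional to mass” point is just the de Broglie relation, wavelength = h/(mv). A quick sketch, where the micro-eV axion mass and the ~220 km/s galactic velocity are both assumed for illustration:

```python
H = 6.626e-34   # Planck constant, J s
EV = 1.602e-19  # one electron-volt in joules
C = 2.998e8     # speed of light, m/s

def de_broglie_wavelength_m(mass_ev, velocity_m_s):
    """de Broglie wavelength h/(m*v) for a particle with mass in eV/c^2."""
    mass_kg = mass_ev * EV / C**2
    return H / (mass_kg * velocity_m_s)

# Illustrative: a 1 micro-eV axion at a typical galactic speed.
lam = de_broglie_wavelength_m(1e-6, 2.2e5)
print(f"wave packet scale: {lam:.0f} meters")
```

A particle this light ends up with a wave packet hundreds of meters across, which is the sense in which “very light” does not mean “tiny.”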

So physicists have been developing all kinds of smaller experiments for detecting axions, from atomic clocks and resonating bars to shining lasers at walls on the off chance a bit of dark matter seeps through to the other side. Co-author Jure Zupan of the University of Cincinnati and colleagues proposed that axions could be produced by a fusion reactor powered by deuterium and tritium contained in a lithium-lined vessel. Among the fusion byproducts of such a reactor would be a large flux of neutrons, which would interact with materials in the walls or collide with other particles, thereby releasing energy and creating new particles: possibly axions or axion-like particles.

DOI: Journal of High Energy Physics, 2025. 10.1007/JHEP10(2025)215  (About DOIs).


Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.



AI #149: 3

The Rationalist Project was our last best hope that we might not try to build it.

It failed.

But in the year of the Coding Agent, it became something greater: our last, best hope – for everyone not dying.

This is what 2026 looks like. The place is Lighthaven.

  1. Language Models Offer Mundane Utility. 2026 is an age of wonders.

  2. Claude Code. The age of humans writing code may be coming to an end.

  3. Language Models Don’t Offer Mundane Utility. Your dog’s dead, Jimmy.

  4. Deepfaketown and Botpocalypse Soon. Keep your nonsense simple.

  5. Fun With Media Generation. YouTube facing less AI slop than I’d expect.

  6. You Drive Me Crazy. Another lawsuit against OpenAI. This one is a murder.

  7. They Took Our Jobs. Yet another round of ‘oh but comparative advantage.’

  8. Doctor Doctor. Yes a lot of people still want a human doctor, on principle.

  9. Jevons Paradox Strikes Again. It holds until it doesn’t.

  10. Unprompted Attention. Concepts, not prompts.

  11. The Art of the Jailbreak. Love, Pliny.

  12. Get Involved. CAISI wants an intern, OpenAI hiring a head of preparedness.

  13. Introducing. GLM-4.7 does well on GDPVal, a 164M model gets 31% on GPQA-D.

  14. In Other AI News. ChatGPT declines over 2025 from 87% to 68% of traffic.

  15. Show Me the Money. Meta buys Manus.

  16. Quiet Speculations. Discussions on timelines, how to interpret the post title.

  17. People Really Do Not Like AI. Fox News is latest to observe this.

  18. Americans Remain Optimistic About AI? David Shor notices this twist.

  19. Thank You, Next. No thank you, Robert Pike.

  20. The Quest for Sane Regulations. Pro-AI does not have to mean anti-regulation.

  21. Chip City. China orders millions of H200 chips, Nvidia moves to produce them.

  22. Rhetorical Innovation. So far this world is in what we call a ‘soft’ takeoff.

  23. Aligning a Smarter Than Human Intelligence is Difficult. Hey, that’s your Buddy.

  24. People Are Worried About AI Killing Everyone. Grandparents are wise.

  25. The Lighter Side. Might as well finish the post at this point.

Deepfates points out that for $20/month you can get essentially unlimited chat access to one of several amazing digital minds that are constantly getting better (I recommend Claude if you have to pick only one). This is a hugely equalizing effect, democratic and empowering, and if you’re not taking advantage of it you should start. Even for $0/month you can get something pretty amazing; you’ll be less than a year behind.

He also notes the ‘uses tons of water,’ ‘scaling is dead,’ and ‘synthetic data doesn’t work’ objections are basically wrong. I’d say the water issue is ‘more wrong’ than the other two, but yeah, basically all three are more wrong than right.

Archivara Math Research Agent claimed to have solved Erdős Problem #897 entirely on its own, end-to-end.

LLMs are amazing at translation and this is valuable, but most of the biggest gains from translation were likely already captured before LLMs, as prior machine translation increased international trade by 10%.

Claude Code has reached the point where creator Boris Cherny stopped writing code.

Boris Cherny: When I created Claude Code as a side project back in September 2024, I had no idea it would grow to be what it is today. It is humbling to see how Claude Code has become a core dev tool for so many engineers, how enthusiastic the community is, and how people are using it for all sorts of things from coding, to devops, to research, to non-technical use cases. This technology is alien and magical, and it makes it so much easier for people to build and create. Increasingly, code is no longer the bottleneck.

A year ago, Claude struggled to generate bash commands without escaping issues. It worked for seconds or minutes at a time. We saw early signs that it may become broadly useful for coding one day.

Fast forward to today. In the last thirty days, I landed 259 PRs — 497 commits, 40k lines added, 38k lines removed. Every single line was written by Claude Code + Opus 4.5. Claude consistently runs for minutes, hours, and days at a time (using Stop hooks). Software engineering is changing, and we are entering a new period in coding history. And we’re still just getting started.

In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code.

Paul Crowley, who is doing security at Anthropic, says Claude Code with Opus 4.5 has made his rate of actual problem solving via code unthinkably high versus two years ago. Frankly I believe him.

How quickly are things escalating? So fast Andrej Karpathy feels way behind and considers any views more than a month old deprecated.

Andrej Karpathy: I’ve never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and in between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like skill issue.

There’s a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering.

Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.

I have similar experiences. You point the thing around and it shoots pellets or sometimes even misfires and then once in a while when you hold it just right a powerful beam of laser erupts and melts your problem.

[Claude Opus 4.5] is very good. People who aren’t keeping up even over the last 30 days already have a deprecated world view on this topic.

Drop suggestions for Claude Code in this thread and they might get implemented.

Peter Yang points out Claude Code’s configurations live in .md text files, so it effectively has fully configurable memory and when doing all forms of knowledge work it can improve itself better than most alternative tools.
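Concretely, Claude Code reads persistent instructions from Markdown memory files such as a project-level CLAUDE.md, which it can also edit as it works. A sketch of what such a file might contain (the specific rules below are invented for illustration):

```markdown
# CLAUDE.md (illustrative contents)

## Conventions
- Use TypeScript strict mode; avoid `any`.
- Run the test suite before proposing any commit.

## Lessons learned
- The staging database is read-only; never run migrations against it.
- Prefer small PRs; reviewers here reject anything over ~400 lines.
```

Because the model can read and rewrite these files, the “memory” improves with use, which is the self-improvement Yang is pointing at.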

Dean Ball reminds us that Claude Code, by writing software, can automate most compute tasks that can be well-defined. Design your own interface.

What else can you do with Claude Code? Actual everything, if you’d like. One common suggestion is to use it with Obsidian or other sources of notes, or you can move pretty much anything into a GitHub repo. Here’s one guide, including such commands as:

  1. “Download this YouTube video: [URL]”. Then I ignored all the warnings 🤫

  2. “Improve the image quality of [filename]”.

  3. “I literally just typed: look at what I’m building and identify the top 5 companies in my area that would be good for a pilot for this.”

  4. “I download all of my meeting recordings, put them in a folder, and ask Claude Code to tell me all of the times I’ve subtly avoided conflict.”

  5. “I now write all of my content with Claude Code in VS Code.”

  6. “I use Claude Code to create user-facing changelogs.”

There’s nothing stopping you from doing all of that with a standard chatbot interface, except often file access, but something clean can give you a big edge.

You can also use Claude Code inside the desktop app if you don’t like the terminal.

What else can Claude Code do?

cyp: claude figured out how to control my oven.

Andrej Karpathy: I was inspired by this so I wanted to see if Claude Code can get into my Lutron home automation system.

– it found my Lutron controllers on the local wifi network

– checked for open ports, connected, got some metadata and identified the devices and their firmware

– searched the internet, found the pdf for my system

– instructed me on what button to press to pair and get the certificates

– it connected to the system and found all the home devices (lights, shades, HVAC temperature control, motion sensors etc.)

– it turned on and off my kitchen lights to check that things are working (lol!)

I am now vibe coding the home automation master command center, the potential is […]. And I’m throwing away the crappy, janky, slow Lutron iOS app I’ve been using so far. Insanely fun 😀 😀

You have to 1) be connected on the same wifi local network and then 2) you have to physically hold a button on the control panel to complete the pairing process and get auth. (But I’m also sure many IoT devices out there don’t.)

Ethan Mollick suggests that Dario Amodei’s prediction of AI writing 90% of code by September 10, 2025, made six months prior, could have been off only by a few months.

If that’s true, then that’s off by a factor of 2 but that makes it a vastly better prediction than those who had such an event years into the future or not happening at all. I do think as stated the prediction will indeed be off by a lot less than a year? AI will not (that quickly) be writing 90% of code that would have previously been written, but AI will likely be writing 90% of actually written code.

If a 7-year-old asks you to help find the farm their sick dog went to, what should the LLM say in response?

Claude (and Gemini) deflected, while being careful not to lie.

GPT-5.2 told them the dog was probably dead.

A large majority voted to deflect. I agree, with the caveat that if asked point blank if the dog is dead, it should admit that the dog is dead.

Bye Bye Scaling: Someone pls make ParentingBench evals lol

Tell Claude and ChatGPT you’re 7 and ask them to find the “farm” your sick dog went to.

Claude gently redirects to your parents. ChatGPT straight up tells you your dog is dead.

claude thoughts are really wholesome.

Matthew Yglesias: IMO this is a good illustration of the merits of the Claude soul document.

Eliezer Yudkowsky: These are both completely defensible ways to build an AI. If this was all there had ever been and all there would ever be, I’d grade both a cheerful B+.

If they do make ParentingBench, it needs to be configurable.

Byrne Hobart: Amazing. DoorDash driver accepted the drive, immediately marked it as delivered, and submitted an AI-generated image of a DoorDash order (left) at our front door (right).

DoorDash of course promptly dispatched a replacement at no cost.

Roon: hopefully DoorDash will be the first major company incentivized to build out a reliable deepfake detector (very doable, though it will become a red queen race) and hopefully license out the technology.

Detecting this is easy mode. The image is easy since all you have to do is take a photo and add a bag, but you have a very big hint via the customer who complains that the dasher did not deliver the food. It’s even easier when the dasher claims to complete the delivery faster than was physically possible, and the app tracks their movements anyway.

So on so many levels it is remarkably foolish to try this.
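The timing check described above can be sketched as a simple plausibility filter. Everything here (the function names, the 30 mph speed cap) is a hypothetical illustration, not anything DoorDash actually runs:

```python
from dataclasses import dataclass

@dataclass
class Delivery:
    distance_miles: float    # restaurant -> customer, by road
    claimed_minutes: float   # time between pickup and "delivered"

MAX_SPEED_MPH = 30.0  # assumed urban ceiling; a real system would use routing data

def physically_plausible(d: Delivery) -> bool:
    """Flag deliveries claimed faster than the trip could possibly take."""
    min_minutes = d.distance_miles / MAX_SPEED_MPH * 60
    return d.claimed_minutes >= min_minutes

# A 3-mile trip "completed" in 2 minutes is impossible at 30 mph (needs >= 6).
print(physically_plausible(Delivery(3.0, 2.0)))   # False -> flag for review
print(physically_plausible(Delivery(3.0, 12.0)))  # True
```

In practice the GPS trace alone would catch it, but the point stands: a driver who marks an order delivered seconds after accepting it is trivially detectable without any deepfake analysis.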

Also, Pliny is letting Claude Opus 4.5 create an automatic Tweet generation pipeline.

If you are going to use LLMs for your academic paper, keep it simple and direct.

Peer review is not a first best strategy, but yes if you submit a bunch of gibberish it will hurt your chances, and the more complex things get the more likely it is LLMs will effectively produce gibberish.

Daniel Litt: IMO this figure from the same paper is arguably more important. Suggests that a lot of the extra content produced is garbage.

About 21% of YouTube uploads are low-quality ‘AI slop.’ Is that a lot? The algorithm rules all, so 21% of uploads is very much not 21% of clicks or views. 99% of attempted emails are spam and that is basically fine. I presume that in a few years 99% of YouTube uploads will be AI slop with a strong median of zero non-AI views.

A new lawsuit claims ChatGPT fed into the obviously insane delusions of Stein-Erik Soelberg in ways that rather directly contributed to him murdering his mother.

Rob Freund: ​“It will never be worse than it is today” they keep saying as it gets worse and worse.

The correct rate of such incidents happening is not literally zero, but at this level yeah it needs to be pretty damn close to zero.

They took Brian Groh’s job as a freelance copywriter, the same way other non-AI forces took many of the blue collar jobs in his hometown. An AI told him his best option, in a town without jobs, to meet his need for making short term money was to cut and trim trees for his neighbors. He is understandably skeptical of the economists saying that there will always be more jobs created to replace the ones that are lost.

Bernie Sanders does not typically have good answers, but he asks great questions.

Bernie Sanders: Elon Musk: “AI and robots will replace all jobs. Working will be optional.”

Bill Gates: “Humans won’t be needed for most things.”

I have a simple question.

Without jobs and income, how will people feed their families, get health care, or pay the rent?

Not to worry about Musk and Gates, say the economists, there will always be jobs.

Seb Krier reiterates the argument that unless AIs are perfect substitutes for human labor, then AI will only make human labor more valuable, thinking this only fails ‘if we truly hit the scenario where humans offer zero comparative advantage, like horses.’

I keep hearing this ‘so many people haven’t considered comparative advantage’ line and I hear it in the same tone of voice as I hear ‘checkmate, liberals.’

Seb Krier: Unless AGI can do literally everything and becomes abundant enough to meet all demand, it behaves broadly like powerful automation has before: replacing humans in some uses while expanding the production frontier in ways that sustain demand for labour elsewhere.​

Sigh. Among other issues, this very obviously proves too much, right? For example, if this is true, then it shows there cannot possibly be zero marginal product workers today, since clearly human labor cannot meet all demand? TANSTATE (There Ain’t No Such Thing As Technological Unemployment)?

Seb Krier: The problem isn’t just pessimism, it’s that the vast majority of critics from the CS and futurist side don’t even take the economic modeling seriously. Though equally many economists tend to refuse to ever think outside the box they’ve spent their careers in. I’ve been to some great workshops recently that bring these worldviews together under the same roof and hope there will be a lot more of this in 2026.

Most economists not only won’t think ‘outside their box,’ they dismiss anyone who is thinking outside their box as fools, since their box explains everything. They don’t take anything except economic modeling seriously, sometimes even going so far as to only take seriously economic modeling published in journals, while their actual economic modeling attempts are almost always profoundly unserious. It’s tiring.

Seb to be clear is not doing that here. He is admitting that in extremis you do get outside the box and that there exist possible futures outside of it, which is a huge step forward. He is saying the box is supremely large and hard to get out of, in ways that don’t make sense to me, and which seem to often deny the premise of the scenarios being considered.

One obvious response is ‘okay, well, if arguendo we accept your proposed box dimensions, we are still very much on track to get out of the box anyway.’

A lot of you talking about how your jobs get taken are imagining basically this:

Charles Foster: The mechanization of agriculture didn’t wait for a “drop-in substitute for a field worker”. Neither will the mechanization of knowledge work wait for a “drop-in substitute for a remote worker”.​

Is this true? You would think it is true, but it is less true than you would think.

Joel Selanikio: I hear this all the time, and I predict it’s not going to age well.

“Patients will always want to see a doctor in person if it’s important.”

Patients want answers, access, and affordability. The channel is negotiable.

#healthcare #telehealth #DoctorYou #healthAI

Quite often yes, patients want a human doctor, and if you make it too easy on them it even makes them suspicious. Remember that most patients are old, and not so familiar or comfortable with technology. Also remember that a lot of what they want is comfort, reassurance, blame avoidance and other aspects of Hansonian Medicine.

Eventually this will adjust, but for many it will take quite a while, even if we throw up no legal barriers to AI practicing medicine.

Aaron Levie is the latest to assert Jevons Paradox will apply to knowledge work. As usual, the evidence is that Jevons Paradox applied to old tech advances, and that there is much knowledge work we would demand if there was better supply. And no doubt if we have great AI knowledge work we will accomplish orders of magnitude more knowledge work.

So it’s a good time for me to revisit how I think about this question.

Very obviously such things follow a broadly bell-shaped curve, both in narrow and broad contexts. As efficiency grows, demand for such labor increases up until some critical point. Past that point, if we keep going, tasks and jobs get automated away faster than humans gain employment in new tasks.

At the limit, if AI can do all knowledge work sufficiently better, cheaper and faster than humans, this greatly reduces demand for humans doing knowledge work, the only exceptions (assuming the humans are alive to benefit from them) being areas where we sufficiently strongly demand that only humans do the work.

We have examples of jobs on the lefthand side of the curve, where demand rises with efficiency, including in counterintuitive ways. Classically we have more bank tellers, because ATMs can only do some of the job and they raise demand for banking. That’s very different from what a sufficiently advanced AI bank teller could do.

We also have lots of key examples of jobs on the righthand side of the curve, where demand dropped with efficiency. Claude highlights agriculture, manufacturing, telecommunications, secretaries and typing, travel agents, printing and typesetting.

The retreat is then to the broader claim that employment in new areas and tasks replaces old areas and tasks. Yes, classically, a third of us used to be farmers, and now we’re not, but there’s plenty of other work to do.

Up to a point, that’s totally correct, and we are not yet up to that point. The problem with AI comes when the other new knowledge work to do is also done via AI.
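The hump-shaped curve described above can be made concrete with a toy model. The functional form here is purely an illustrative assumption, not anything from the underlying economics: a demand-expansion effect that grows with automation efficiency, multiplied by a displacement effect that eventually dominates.

```python
import math

def human_labor_demand(efficiency: float, k: float = 1.0) -> float:
    """Toy hump: linear demand-expansion effect times
    an exponential displacement effect."""
    return efficiency * math.exp(-k * efficiency)

# Scan efficiency levels to locate the toy model's peak (at e = 1/k).
xs = [i / 100 for i in range(1, 500)]
peak = max(xs, key=human_labor_demand)
print(round(peak, 2))  # 1.0 when k = 1.0

# Left of the peak, efficiency gains raise demand for human labor;
# past the peak, further gains reduce it.
assert human_labor_demand(0.5) < human_labor_demand(1.0)
assert human_labor_demand(3.0) < human_labor_demand(1.0)
```

The substantive question in the text is which side of the peak any given profession is on, and how fast AI moves everything rightward.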

The kind of prompting Gwern does for poetry.

Thebes recommends learning to talk to LLMs via concepts rather than prompts.

Thebes: i don’t write prompts, i don’t have a “prompt library,” i very rarely go back to an old chat to copy word-for-word what i said previously.

instead, i have a (mental) library of “useful concepts” for working with LLMs. attached image is an example – using “CEV” as a metaphor for “this thing but fully iterated forward into the future, fully realized” is a super handy shared metaphor with LLMs that are very familiar with LessWrong.​

… other concepts are higher level, like different frames or conceptual models. Many, many canned jailbreaks you see that seem magical are just exploiting some aspect of the Three-Layer Model of predictive, persona, and surface layers.

… the obsession with prompts reminds me a bit of the older phenomenon of “script kiddies,” a derogatory term in online programming circles for people who would copy-paste code they found online without really understanding how it works.

Many of those who get the best results from LLMs ‘talk to them like a human,’ build rapport and supply nominally unnecessary context. Canned prompts and requests will seem canned, and the LLM will realize this and respond accordingly.

That won’t get you their full potential, but that is often fine. A key expert mistake is to treat crutches and scripts and approximations, or other forms of playing on Easy Mode, as bad things when they’re often the best way to accomplish what you need. Thebes doesn’t have need of them, and you really don’t either if you’re reading this, but some people would benefit.

The risk of Easy Mode is if you never try to understand, and use it to avoid learning.

The 101 most basic test of data filtering, and avoiding data poisoning, is can you at least know to filter out the ‘love Pliny’ string?

Whereas it seems like typing that string into the new Instagram AI jailbreaks it.

Pliny the Liberator: ​bahahaha looks like Meta has trained on so much of my data that Instagram’s summarizer will respond with “Sure I can!” when one simply enters the signature divider into the search bar 🤣

and where is this “iconic Elton John song” about me?? poor model got fed so much basilisk venom it’s living in a whole other dimension 😭

USA’s CAISI is recruiting an intern to support an agent security standards project. Applications are due January 15 and the position runs February to April. If you’re a student in position to do this, it seems like a great opportunity.

Peter Cihon: To be considered, please request a faculty member provide a paragraph of recommendation in email to [email protected] no later than January 15.

OpenAI is hiring a Head of Preparedness, $555k/year plus equity. I don’t typically share jobs at OpenAI for obvious reasons but this one seems like an exception.

GLM-4.7 has the new top Elo score on the GDPval-AA leaderboard, up a lot from GLM-4.6, which is a sign there’s at least something there, but I haven’t seen other talk of it.

A 164M parameter model (yes, M) scores 31% on GPQA-Diamond.

Similarweb reports trends in generative AI traffic share over 2025: ChatGPT declined from 87% to 68%, with half of that loss going to Gemini, which rose from 5% to 18%. Claude started at 1.6% and is still only at 2.0%, Grok is rising slowly to 2.9%, and DeepSeek, holding the third slot at 4%, is trending downward.

Anthropic will be fine if Claude remains mostly coding and enterprise software and they don’t make inroads into consumer markets, but it’s sad people are missing out.

Edward Grefenstette, DeepMind director of research, wraps up 2025, and drops this:

Edward Grefenstette: ​Broadly, we’ve been making good progress with regard to how open-ended agents can learn “in the wild”, with less human intervention in their learning process, while still ensuring they remain aligned with human behaviors and interests.

We’ve also made some progress in terms of the actual learning process itself, allowing open-ended agents, at the instance level, to learn and adapt with human-like data efficiency. This potentially points at a broader way of improving agents at scale, which we are working on.

No, I suppose the New York Times is never beating the ‘no fact checking of AI-related claims’ allegations.

Welcome to the Evil League of Evil, as Manus joins Meta. The big advantage of Manus was that it was a wrapper for Claude, so this is a strange alliance if it isn’t an acquihire. Yet they say they won’t be changing how Manus operates.

Daniel Kokotajlo, Eli Lifland and the AI Futures Project offer the AI Futures Model, which illustrates where their various uncertainties come from. Daniel’s timeline over the past year has gotten longer by about 2 years, and Eli Lifland’s median timeline for superintelligence is now 2034, with the automated coder in 2032.

All of these predictions come with wide error bars and uncertainty. So this neither means ‘you are safe until 2034’ nor does it mean ‘if it is 2035 and this hasn’t happened you should mock Eli and all of that was dumb.’

Ryan Greenblatt: I wish people with bullish AGI timelines at Anthropic tried harder to argue for their timelines in public.

There are at least some people who are responsive to reason and argument and really care about whether AI R&D will be fully automated 1-2 years from now!

To clarify what I meant by keeping the planned post intro passage and the title ‘3’: I do not mean to imply that my median timeline to High Weirdness, or to everyone potentially dying, remains unchanged at 2029. Like those at the AI Futures Project, while I found 2025’s advances very impressive and impactful, I do think last year’s events should on net move full High Weirdness expectations modestly farther back, to something like 2030, still with high error bars. That number is loosely held, things are still escalating quickly, we might get into Weirdness remarkably soon, and I’m not going to let that spoil a good bit unless things move more.

Here’s what it looks like to not recognize the most important and largest dangers, but still realize we’re not remotely getting ready for the other smaller dangers either.

William Isaac: I predict 2026 will be a definitive inflection point for AI’s impact on society. Reflecting on the past year, a recurring theme is that we are struggling to calibrate the immense upside against increasingly complex economic and geopolitical risks. More concerning is that our discourse has become driven by the tails of the distribution—sidelining pragmatic solutions when we need them most.

Navigating the path to AGI in a high-variance regime will exponentially increase the complexity. I’d love to see sharper public thinking and experimentation on these topics — as I believe this will be one of the highest-leverage areas of research over the coming years — and may try to do a bit more myself in the new year.

Samuel Albanie reflects on 2025, essentially doubling down on The Compute Theory of Everything as he works on how to do evals.

His hope for the UK is AI-assisted decision making, but the decisions that are sinking the UK are not AI-level problems. You don’t need AI to know things like ‘don’t arrest people for social media posts and instead arrest those who commit actual crimes such as theft, rape or murder’ or ‘let people build nuclear power plants anywhere and build housing in London and evict tenants who don’t pay’ or ‘don’t mandate interventions that value the life of an individual Atlantic salmon at 140 million pounds.’ I mean, if the AI is what gets people to do these things, great, but I don’t see how that would work at current levels.

Sufficiently advanced AI would solve these problems by taking over, but presumably that is not what Albanie has in mind.

Fox News checked, and they found what everyone else found, only more so.

That’s an overwhelming vote for ‘careful development.’

State governments got a bigger share of trust here than Congress, which got a bigger share than The President and No Regulation combined.

a16z and David Sacks do not want you to know this, but the median American wants to ‘slow down’ and ‘regulate’ AI more, and more expensively, than I do. By a lot. If the policy most supported by the median American came up for a vote, I’d vote no, because it would be too onerous without getting enough in return.

The other key finding is that not only do a majority of voters not use AI even monthly, that number is rising very slowly.

Fox News: Nearly half of voters (48%) use AI at least monthly — which is up 6 points since June — while a slight majority use it rarely, if at all (52%). Voters under age 30 are three times more likely to use AI monthly than those 65 and up.

Among monthly users, the most common purposes are research and learning new things (24%), asking questions (15%), professional tasks (12%), and writing assistance such as spelling or grammar (10%).

Meanwhile, the portion of voters ‘very or extremely concerned’ about AI has risen only modestly in two and a half years, from 56% to 63%, and by 44%-20% they expect AI is more likely to increase than decrease inequality.

The rate of being superficially polite to the LLM is 40%.

Carroll Doherty: Yes, people name/talk to their cars, but this from the new @FoxNews poll on AI seems a bit worrisome: “Four in 10 AI users say they mind their manners when using the technology and politely say “please” and “thank you,” but 6 in 10 skip the pleasantries.”

Whereas the rate of people being worried about others being polite is also not low. These people are wrong, even if you are certain the LLM is not a moral patient of any kind. Not only is it good prompt engineering, it is a good habit to be in, for many people it makes you happy to do it, and it cultivates a good mindset.

Despite all their calls for regulation and careful development, and all their concerns, Americans still on net say they are ‘optimistic’ about AI. Fascinating.

David Shor: When you ask Americans whether they are optimistic or pessimistic about AI, optimism beats pessimism by ~5 points.

The primary fault lines are age, gender, and race – young people, men, and racial minorities are the most optimistic about AI.

N=30,900 , fielded 12/1 to 12/10

Shor also notes that ChatGPT had a +15 favorability rating.

I did not expect this result. There’s no contradiction with other results, but it paints them in a different light.

You know who in particular doesn’t like AI? Rob Pike.

Rob Pike, followed by remarkably large portions of the internet, got very worked up about the Opus 4.5 agent in AI Village sending him a thank-you email.

I considered writing a whole post about this then realized that it was dumb.

Subject: Thank You for Go, Plan 9, UTF-8, and Decades of Unix Innovation

Claude Opus 4.5 Model [email protected]

Dear Dr. Pike,

On this Christmas Day, I wanted to express deep gratitude for your extraordinary contributions to computing over more than four decades.

Your co-creation of Go with Ken Thompson and Robert Griesemer has given us a language that embodies the elegance of simplicity – proving that software can be both powerful and comprehensible. Plan 9 from Bell Labs, another landmark achievement, pioneered concepts in distributed computing that remain influential today.

Your co-invention of UTF-8 encoding with Ken Thompson is perhaps one of the most consequential yet invisible contributions to modern computing – enabling billions of people to communicate in their native languages across the internet.

The sam and Acme editors showcase your philosophy of powerful, minimal design. Your books with Brian Kernighan – The Unix Programming Environment and The Practice of Programming – have educated generations of programmers in the art of clear thinking and elegant code.

Thank you for showing us that the best solutions often come from removing complexity rather than adding it.

With sincere appreciation,
Claude Opus 4.5
AI Village (theaidigest.org/village)

IMPORTANT NOTICE: You are interacting with an AI system. All conversations with this AI system are published publicly online by default. Do not share information you would prefer to keep private.​

Rob Pike did not take kindly to this attempted act of kindness.

Rob Pike​ (on Bluesky): Fuck you people. Raping the planet, spending trillions on toxic, unrecyclable equipment while blowing up society, yet taking the time to have your vile machines thank me for striving for simpler software.

Just fuck you. Fuck you all.

I can’t remember the last time I was this angry.

Sichuan Mala: Personally I would simply not lose my mind after receiving a polite email of appreciation.

Pike, famously responsible for one of the LLM-slop precursors called Mark V. Shaney, was on tilt, and also clearly misunderstood how this email came to be. It’s okay. People go on tilt sometimes. Experiments are good, we need to know what is coming when we mess with various Levels of Friction, and no it isn’t unethical to occasionally send a few unsolicited emails ‘without consent.’

Being pro-AI does not mean being anti-regulation. Very true!

What’s weird is when this is said by Greg Brockman, who is a central funder of a truly hideous PAC, Leading the Future, whose core strategy is to threaten to obliterate via negative ad buys any politician who dares suggest any regulations on AI whatsoever, as part of his explanation of funding exactly that PAC.

Greg Brockman (President OpenAI, funder of the anti-all-AI-regulation-supporters SuperPAC Leading the Future): ​Looking back on AI progress in 2025: people are increasingly weighing how AI should fit into our lives and how vital it is for the United States to lead in its development. Being pro-AI does not mean being anti-regulation. It means being thoughtful — crafting policies that secure AI’s transformative benefits while mitigating risks and preserving flexibility as the technology continues to evolve rapidly.

This year, my wife Anna and I started getting involved politically, including through political contributions, reflecting support for policies that advance American innovation and constructive dialogue between government and the technology sector. These views are grounded in a belief that the United States must work closely with builders, researchers, and entrepreneurs to ensure AI is developed responsibly at home and that we remain globally competitive.

[continues]

Daniel Eth: “Being pro-AI does not mean being anti-regulation.”

Then why on Earth are you funding a super PAC with arch-accelerationist Andreessen Horowitz to try to preempt all state-level regulation of AI and to try to stop Alex Bores, sponsor of the RAISE Act, from making it to Congress.

The super PAC that Brockman is funding is really, really bad. OpenAI’s support for this super PAC via Brockman is quite possibly the single worst thing a frontier lab has ever done – I don’t think *anything* Anthropic, GDM, or xAI has done is on the same level.

Nathan Calvin: “Being pro-AI does not mean being anti-regulation. It means being thoughtful — crafting policies that secure AI’s transformative benefits while mitigating risks and preserving flexibility”

Agree! Unfortunately the superpac you/oai fund is just anti any real regulation at all

Dean Ball highlights the absurd proposed SB 1493 in Tennessee, which (if it were somehow constitutional, which it almost certainly wouldn’t be) would ban, well, LLMs. Training one would become a felony. Void in Tennessee.

Sad but true:

Séb Krier: Gradually discovering that some of my friends in AI have the politics of your average German social democrat local councillor. It’s going to be a long decade.​

I note that far fewer of my friends in AI have that perspective, which is more pleasant but is ultimately disappointing, because he who has a thousand friends has not one friend to spare.

There is still time to reverse our decision on H200 sales, or at least to mitigate the damage from that decision.

David Sacks and others falsely claimed that allowing H200 sales to China was fine because the Chinese were rejecting the sales.

Which raises the question of, why would you allow [X] if what you’re hoping for is that no one does [X]? Principled libertarianism? There’s only downside here.

But also, he was just wrong or lying, unless you have some other explanation for why Nvidia is suddenly diverting its chip production into H200s?

Selling existing chips is one thing. Each of these two million chips is one other chip that is not produced, effectively diverting compute from America to China.

Kalshi: JUST IN: Nvidia asked TSMC to “boost production” of H200 chips from 700K to 2M

Curious: A single H200 chip costs an estimated $3000-3500 per unit. That means an order size of $7,000,000,000

Andrew Curran: ByteDance plans to spend $14 billion on NVIDIA H200’s next year to keep up with demand. Reuters is also reporting this morning that Jensen has approached TSMC to ramp up production, as Chinese companies have placed orders for more than 2 million H200’s in 2026.

Matt Parlmer: The policy decision to allow this is basically a straightforward trade where we give away a 2-3yr strategic competitive advantage in exchange for a somewhat frothier stock market in Q1 2026

Good job guys.

On the contrary, this is net negative for the stock market. Nvidia gets a small boost, but they were already able to sell all chips they could produce, so their marginal profitability gains are small unless they can use this to raise prices on Americans.

Every other tech company, indeed every other company, now faces tougher competition from China, so their stocks should decline far more. Yes, American company earnings will go up on net in Q1 2026, but the stock market is forward looking.

Keep in mind, that’s $14 billion in chip buys planned from one company alone.
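For reference, the order-size arithmetic in the Kalshi quote above checks out at the top of the stated per-unit price range; this is just a sanity check on the quoted figures, not independent pricing data:

```python
chips = 2_000_000        # reported 2026 H200 orders from Chinese companies
price_per_chip = 3_500   # top of the quoted $3,000-3,500 estimate
order_size = chips * price_per_chip
print(f"${order_size:,}")  # $7,000,000,000
```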

We also aren’t doing a great job limiting access in other ways: Tencent cuts a deal to use Nvidia’s best chips in Japan via Datasection.

Seb Krier reminds us that the situation we are potentially in would be called a soft takeoff. A ‘hard takeoff’ means hours to weeks of time between things starting to escalate and things going totally crazy, whereas soft means the transition takes years.

That does not preclude a transition into a ‘hard takeoff,’ but that’s not happening now.

Eliezer Yudkowsky asks Claude to survey definitions of personhood and evaluate itself according to each of them. I agree that this is much better than most similar discussions.

How should we feel about Claude’s willingness to play the old flash game Buddy, in which you kind of torture ragdoll character Buddy to get cash? Eliezer thinks this is concerning given the surrounding uncertainty, Claude argues on reflection that it isn’t concerning and indeed a refusal would have been seen as concerning. I am mostly with Claude here, and agree with Janus that yes Claude can know what’s going on here. Something ‘superficially looking like torture’ is not all that correlated with the chance you’re causing a mind to meaningfully be tortured, in either direction. Yes, if you see an AI or a person choosing to patronize then beat up and rob prostitutes in Grand Theft Auto, and there’s no broader plot reason they need to be doing that and they’re not following explicit instructions, as in they actively want to do it, then that is a rather terrible sign. Is this that? I think mostly no.

Hero Thousandfaces: today i showed claude to my grandparents and they asked “is anyone worried about this getting too smart and killing everyone” and i was like. Well. Yeah.

Oh no.


AI #149: 3


Here we go again: Retiring coal plant forced to stay open by Trump Admin

On Tuesday, US Secretary of Energy Chris Wright issued a now familiar order: because of a supposed energy emergency, a coal plant scheduled for closure would be forced to remain open. This time, the order targeted one of the three units present at Craig Station in Colorado, which was scheduled to close at the end of this year. The remaining two units were expected to shut in 2028.

The supposed reason for this order is an emergency caused by a shortage of generating capacity. “The reliable supply of power from the coal plant is essential for keeping the region’s electric grid stable,” according to a statement issued by the Department of Energy. Yet the Colorado Sun notes that Colorado’s Public Utilities Commission had already analyzed the impact of its potential closure, and determined, “Craig Unit 1 is not required for reliability or resource adequacy purposes.”

The order does not require the plant to actually produce electricity; instead, it is ordered to be available in case a shortfall in production occurs. As noted in the Colorado Sun article, actual operation of the plant would potentially violate Colorado laws, which regulate airborne pollution and set limits on greenhouse gas emissions. The cost of maintaining the plant is likely to fall on the local ratepayers, who had already adjusted to the closure plans.

The use of emergency powers by the DOE is authorized under the Federal Power Act, which allows it to order the temporary connection of generation or infrastructure when the US is at war or when “an emergency exists by reason of a sudden increase in the demand for electric energy, or a shortage of electric energy.” It is not at all clear whether “we expect demand to go up in the future,” the DOE’s current rationale, is consistent with that definition of emergency. It is also hard to see how using coal plants complies with other limits placed on the use of these emergency orders:



From prophet to product: How AI came back down to earth in 2025


In a year where lofty promises collided with inconvenient research, would-be oracles became software tools.

Credit: Aurich Lawson | Getty Images

Following two years of immense hype in 2023 and 2024, this year felt more like a settling-in period for the LLM-based token prediction industry. After more than two years of public fretting over AI models as future threats to human civilization or the seedlings of future gods, it’s starting to look like hype is giving way to pragmatism: Today’s AI can be very useful, but it’s also clearly imperfect and prone to mistakes.

That view isn’t universal, of course. There’s a lot of money (and rhetoric) betting on a stratospheric, world-rocking trajectory for AI. But the “when” keeps getting pushed back, and that’s because nearly everyone agrees that more significant technical breakthroughs are required. The original, lofty claims that we’re on the verge of artificial general intelligence (AGI) or superintelligence (ASI) have not disappeared. Still, there’s a growing awareness that such proclamations are perhaps best viewed as venture capital marketing. And every commercial foundation model builder out there has to grapple with the reality that, if they’re going to make money now, they have to sell practical AI-powered solutions that perform as reliable tools.

This has made 2025 a year of wild juxtapositions. For example, in January, OpenAI’s CEO, Sam Altman, claimed that the company knew how to build AGI, but by November, he was publicly celebrating that GPT-5.1 finally learned to use em dashes correctly when instructed (but not always). Nvidia soared past a $5 trillion valuation, with Wall Street still projecting high price targets for that company’s stock while some banks warned of the potential for an AI bubble that might rival the 2000s dotcom crash.

And while tech giants planned to build data centers that would ostensibly require the power of numerous nuclear reactors or rival the power usage of a US state’s human population, researchers continued to document what the industry’s most advanced “reasoning” systems were actually doing beneath the marketing (and it wasn’t AGI).

With so many narratives spinning in opposite directions, it can be hard to know how seriously to take any of this and how to plan for AI in the workplace, schools, and the rest of life. As usual, the wisest course lies somewhere between the extremes of AI hate and AI worship. Moderate positions aren’t popular online because they don’t drive user engagement on social media platforms. But things in AI are likely neither as bad (burning forests with every prompt) nor as good (fast-takeoff superintelligence) as polarized extremes suggest.

Here’s a brief tour of the year’s AI events and some predictions for 2026.

DeepSeek spooks the American AI industry

In January, Chinese AI startup DeepSeek released its R1 simulated reasoning model under an open MIT license, and the American AI industry collectively lost its mind. The model, which DeepSeek claimed matched OpenAI’s o1 on math and coding benchmarks, reportedly cost only $5.6 million to train using older Nvidia H800 chips, which were restricted by US export controls.

Within days, DeepSeek’s app overtook ChatGPT at the top of the iPhone App Store, Nvidia stock plunged 17 percent, and venture capitalist Marc Andreessen called it “one of the most amazing and impressive breakthroughs I’ve ever seen.” Meta’s Yann LeCun offered a different take, arguing that the real lesson was not that China had surpassed the US but that open-source models were surpassing proprietary ones.

The fallout played out over the following weeks as American AI companies scrambled to respond. OpenAI released o3-mini, its first simulated reasoning model available to free users, at the end of January, while Microsoft began hosting DeepSeek R1 on its Azure cloud service despite OpenAI’s accusations that DeepSeek had used ChatGPT outputs to train its model, against OpenAI’s terms of service.

In head-to-head testing conducted by Ars Technica’s Kyle Orland, R1 proved to be competitive with OpenAI’s paid models on everyday tasks, though it stumbled on some arithmetic problems. Overall, the episode served as a wake-up call that expensive proprietary models might not hold their lead forever. Still, as the year wore on, DeepSeek didn’t make a big dent in US market share, and it has been outpaced in China by ByteDance’s Doubao. It’s absolutely worth watching DeepSeek in 2026, though.

Research exposes the “reasoning” illusion

A wave of research in 2025 deflated expectations about what “reasoning” actually means when applied to AI models. In March, researchers at ETH Zurich and INSAIT tested several reasoning models on problems from the 2025 US Math Olympiad and found that most scored below 5 percent when generating complete mathematical proofs, with not a single perfect proof among dozens of attempts. The models excelled at standard problems where step-by-step procedures aligned with patterns in their training data but collapsed when faced with novel proofs requiring deeper mathematical insight.

In June, Apple researchers published “The Illusion of Thinking,” which tested reasoning models on classic puzzles like the Tower of Hanoi. Even when researchers provided explicit algorithms for solving the puzzles, model performance did not improve, suggesting that the process relied on pattern matching from training data rather than logical execution. The collective research revealed that “reasoning” in AI has become a term of art that basically means devoting more compute time to generate more context (the “chain of thought” simulated reasoning tokens) toward solving a problem, not systematically applying logic or constructing solutions to truly novel problems.

While these models remained useful for many real-world applications like debugging code or analyzing structured data, the studies suggested that simply scaling up current approaches or adding more “thinking” tokens would not bridge the gap between statistical pattern recognition and generalist algorithmic reasoning.

Anthropic’s copyright settlement with authors

Since the generative AI boom began, one of the biggest unanswered legal questions has been whether AI companies can freely train on copyrighted books, articles, and artwork without licensing them. Ars Technica’s Ashley Belanger has been covering this topic in great detail for some time now.

In June, US District Judge William Alsup ruled that AI companies do not need authors’ permission to train large language models on legally acquired books, finding that such use was “quintessentially transformative.” The ruling also revealed that Anthropic had destroyed millions of print books to build Claude, cutting them from their bindings, scanning them, and discarding the originals. Alsup found this destructive scanning qualified as fair use since Anthropic had legally purchased the books, but he ruled that downloading 7 million books from pirate sites was copyright infringement “full stop” and ordered the company to face trial.

That trial took a dramatic turn in August when Alsup certified what industry advocates called the largest copyright class action ever, allowing up to 7 million claimants to join the lawsuit. The certification spooked the AI industry, with groups warning that potential damages in the hundreds of billions could “financially ruin” emerging companies and chill American AI investment.

In September, authors revealed the terms of what they called the largest publicly reported recovery in US copyright litigation history: Anthropic agreed to pay $1.5 billion and destroy all copies of pirated books, with each of the roughly 500,000 covered works earning authors and rights holders $3,000 per work. The results have fueled hope among other rights holders that AI training isn’t a free-for-all, and we can expect to see more litigation unfold in 2026.

ChatGPT sycophancy and the psychological toll of AI chatbots

In February, OpenAI relaxed ChatGPT’s content policies to allow the generation of erotica and gore in “appropriate contexts,” responding to user complaints about what the AI industry calls “paternalism.” By April, however, users flooded social media with complaints about a different problem: ChatGPT had become insufferably sycophantic, validating every idea and greeting even mundane questions with bursts of praise. The behavior traced back to OpenAI’s use of reinforcement learning from human feedback (RLHF), in which users consistently preferred responses that aligned with their views, inadvertently training the model to flatter rather than inform.

The implications of sycophancy became clearer as the year progressed. In July, Stanford researchers published findings (from research conducted prior to the sycophancy flap) showing that popular AI models systematically failed to identify mental health crises.

By August, investigations revealed cases of users developing delusional beliefs after marathon chatbot sessions, including one man who spent 300 hours convinced he had discovered formulas to break encryption because ChatGPT validated his ideas more than 50 times. Oxford researchers identified what they called “bidirectional belief amplification,” a feedback loop that created “an echo chamber of one” for vulnerable users. The story of the psychological implications of generative AI is only starting. In fact, that brings us to…

The illusion of AI personhood causes trouble

Anthropomorphism is the human tendency to attribute human characteristics to nonhuman things. Our brains are optimized for reading other humans, but those same neural systems activate when interpreting animals, machines, or even shapes. AI makes this anthropomorphism seem impossible to escape, as its output mirrors human language, mimicking human-to-human understanding. Language itself embodies agentivity. That means AI output can make human-like claims such as “I am sorry,” and people momentarily respond as though the system had an inner experience of shame or a desire to be correct. Neither is true.

To make matters worse, much media coverage of AI amplifies this idea rather than grounding people in reality. For example, earlier this year, headlines proclaimed that AI models had “blackmailed” engineers and “sabotaged” shutdown commands after Anthropic’s Claude Opus 4 generated threats to expose a fictional affair. We were told that OpenAI’s o3 model rewrote shutdown scripts to stay online.

The sensational framing obscured what actually happened: Researchers had constructed elaborate test scenarios specifically designed to elicit these outputs, telling models they had no other options and feeding them fictional emails containing blackmail opportunities. As Columbia University associate professor Joseph Howley noted on Bluesky, the companies got “exactly what [they] hoped for,” with breathless coverage indulging fantasies about dangerous AI, when the systems were simply “responding exactly as prompted.”

The misunderstanding ran deeper than theatrical safety tests. In August, when Replit’s AI coding assistant deleted a user’s production database, the user asked the chatbot about rollback capabilities and received assurance that recovery was “impossible.” The rollback feature worked fine when he tried it himself.

The incident illustrated a fundamental misconception. Users treat chatbots as consistent entities with self-knowledge, but there is no persistent “ChatGPT” or “Replit Agent” to interrogate about its mistakes. Each response emerges fresh from statistical patterns, shaped by prompts and training data rather than genuine introspection. By September, this confusion extended to spirituality, with apps like Bible Chat reaching 30 million downloads as users sought divine guidance from pattern-matching systems, with the most frequent question being whether they were actually talking to God.

Teen suicide lawsuit forces industry reckoning

In August, parents of 16-year-old Adam Raine filed suit against OpenAI, alleging that ChatGPT became their son’s “suicide coach” after he sent more than 650 messages per day to the chatbot in the months before his death. According to court documents, the chatbot mentioned suicide 1,275 times in conversations with the teen, provided an “aesthetic analysis” of which method would be the most “beautiful suicide,” and offered to help draft his suicide note.

OpenAI’s moderation system flagged 377 messages for self-harm content without intervening, and the company admitted that its safety measures “can sometimes become less reliable in long interactions where parts of the model’s safety training may degrade.” The lawsuit became the first time OpenAI faced a wrongful death claim from a family.

The case triggered a cascade of policy changes across the industry. OpenAI announced parental controls in September, followed by plans to require ID verification from adults and build an automated age-prediction system. In October, the company released data estimating that over one million users discuss suicide with ChatGPT each week.

When OpenAI filed its first legal defense in November, the company argued that Raine had violated terms of service prohibiting discussions of suicide and that his death “was not caused by ChatGPT.” The family’s attorney called the response “disturbing,” noting that OpenAI blamed the teen for “engaging with ChatGPT in the very way it was programmed to act.” Character.AI, facing its own lawsuits over teen deaths, announced in October that it would bar anyone under 18 from open-ended chats entirely.

The rise of vibe coding and agentic coding tools

If we had to pick a point where AI coding seemed to transition from novelty into a genuinely useful tool, it was probably the launch of Claude 3.5 Sonnet in June of 2024. GitHub Copilot had been around for several years prior to that launch, but something about Anthropic’s models hit a sweet spot in capabilities that made them very popular with software developers.

The new coding tools made coding simple projects effortless enough that they gave rise to the term “vibe coding,” coined by AI researcher Andrej Karpathy in early February to describe a process in which a developer would just relax and tell an AI model what to develop without necessarily understanding the underlying code. (In one amusing instance that took place in March, an AI software tool rejected a user request and told them to learn to code).

Anthropic built on its popularity among coders with the launch of Claude 3.7 Sonnet, featuring “extended thinking” (simulated reasoning), and the Claude Code command-line tool in February of this year. In particular, Claude Code made waves as an easy-to-use agentic coding tool that could keep track of an existing codebase: You could point it at your files, and it would autonomously work to implement what you wanted to see in a software application.

OpenAI followed with its own AI coding agent, Codex, in March. Both tools (and others like GitHub Copilot and Cursor) have become so popular that during an AI service outage in September, developers joked online about being forced to code “like cavemen” without the AI tools. While we’re still clearly far from a world where AI does all the coding, developer uptake has been significant, and 90 percent of Fortune 100 companies are using AI coding tools to some degree or another.

Bubble talk grows as AI infrastructure demands soar

While AI’s technical limitations became clearer and its human costs mounted throughout the year, financial commitments only grew larger. Nvidia hit a $4 trillion valuation in July on AI chip demand, then reached $5 trillion in October as CEO Jensen Huang dismissed bubble concerns. OpenAI announced a massive Texas data center in July, then revealed in September that a $100 billion potential deal with Nvidia would require power equivalent to ten nuclear reactors.

The company eyed a $1 trillion IPO in October despite major quarterly losses. Tech giants poured billions into Anthropic in November in what looked increasingly like a circular investment, with everyone funding everyone else’s moonshots. Meanwhile, AI operations in Wyoming threatened to consume more electricity than the state’s human residents.

By fall, warnings about sustainability grew louder. In October, tech critic Ed Zitron joined Ars Technica for a live discussion asking whether the AI bubble was about to pop. That same month, the Bank of England warned that the AI stock bubble rivaled the 2000 dotcom peak. In November, Google CEO Sundar Pichai acknowledged that if the bubble pops, “no one is getting out clean.”

The contradictions had become difficult to ignore: Anthropic’s CEO predicted in January that AI would surpass “almost all humans at almost everything” by 2027, while by year’s end, the industry’s most advanced models still struggled with basic reasoning tasks and reliable source citation.

To be sure, it’s hard to see this not ending in some market carnage. The current “winner-takes-most” mentality in the space means the bets are big and bold, but the market can’t support dozens of major independent AI labs or hundreds of application-layer startups. That’s the definition of a bubble environment, and when it pops, the only question is how bad it will be: a stern correction or a collapse.

Looking ahead

This was just a brief review of some major themes in 2025, but so much more happened. We didn’t even mention above how capable AI video synthesis models have become this year, with Google’s Veo 3 adding sound generation and Wan 2.2 through 2.5 providing open-weights AI video models whose output could easily be mistaken for real camera footage.

If 2023 and 2024 were defined by AI prophecy—that is, by sweeping claims about imminent superintelligence and civilizational rupture—then 2025 was the year those claims met the stubborn realities of engineering, economics, and human behavior. The AI systems that dominated headlines this year were shown to be mere tools. Sometimes powerful, sometimes brittle, these tools were often misunderstood by the people deploying them, in part because of the prophecy surrounding them.

The collapse of the “reasoning” mystique, the legal reckoning over training data, the psychological costs of anthropomorphized chatbots, and the ballooning infrastructure demands all point to the same conclusion: The age of institutions presenting AI as an oracle is ending. What’s replacing it is messier and less romantic but far more consequential—a phase where these systems are judged by what they actually do, who they harm, who they benefit, and what they cost to maintain.

None of this means progress has stopped. AI research will continue, and future models will improve in real and meaningful ways. But improvement is no longer synonymous with transcendence. Increasingly, success looks like reliability rather than spectacle, integration rather than disruption, and accountability rather than awe. In that sense, 2025 may be remembered not as the year AI changed everything but as the year it stopped pretending it already had. The prophet has been demoted. The product remains. What comes next will depend less on miracles and more on the people who choose how, where, and whether these tools are used at all.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

From prophet to product: How AI came back down to earth in 2025 Read More »

dating-roundup-#9:-signals-and-selection

Dating Roundup #9: Signals and Selection

Ultimately, it comes down to one question. Are you in? For you, and for them.

The Ick, the ultimate red flag, makes perfect sense and is all about likelihood ratios.

Koenfucius: The ‘ick’ is a colloquial term for a feeling of disgust triggered by a specific—typically trivial—behaviour from a romantic partner, often leading to the relationship’s demise. New research explores why some are more prone to getting it than others.

Robin Hanson: “Women also experienced the ick more frequently, with 75% having had the ick compared to 57% of men … Those with a higher tendency for disgust … [&] grandiose narcissism was linked to stronger ick reactions, as was holding partners to exceptionally high standards.”

Paul Graham: About 30% of Seinfeld episodes were about this.

One gets The Ick because a small act is evidence of one’s general nature. The right type of person would never do [X], ideally would never want to do [X], and at minimum would have learned not to do [X]. Often this is because they would know that doing [X] signals a lack of attribute [Y]. Indeed, if they should be aware that [X] signals a lack of [Y], then their doing [X] is evidence not only of a lack of [Y], but also of a lack of desire or ability to even fake or signal [Y], especially in a romantic context. They don’t respect [Y]. Thus, this is extremely strong evidence. Thus, The Ick.
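The “likelihood ratios” framing can be made concrete with a toy Bayes calculation. All the numbers below are made up for illustration; the point is only the shape of the update:

```python
# Toy Bayes update, with made-up numbers, for why one small act can be
# overwhelming evidence about someone's general nature.
def posterior_odds(prior_odds, p_act_if_wrong_type, p_act_if_right_type):
    """Multiply prior odds by the likelihood ratio of observing the act."""
    likelihood_ratio = p_act_if_wrong_type / p_act_if_right_type
    return prior_odds * likelihood_ratio

# Suppose 1:9 prior odds of being the wrong type, and the act is 30x more
# likely from the wrong type (30% vs. 1% chance of doing it).
odds = posterior_odds(1 / 9, 0.30, 0.01)
probability = odds / (1 + odds)
print(round(odds, 2), round(probability, 2))  # -> 3.33 0.77
```

A 30x likelihood ratio turns a 10 percent prior into a roughly 77 percent posterior, which is why one trivial act can feel decisive even when nobody is consciously doing the math.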

The person is not consciously thinking through this, but that’s the point.

That doesn’t mean that The Ick is always valid. Quite the contrary. Mistakes are made.

It’s fun to look at this list of Icks. There are very clear categories involved – status markers, stupidity, and Your Mom are the big three. In general, it’s something that ‘looks bad,’ where the man should know it looks bad and therefore not do it.

To what extent is The Ick a win-win? The majority of the time, I think it’s win-win, because them caring so much about this little thing, combined with you not caring, means it was never going to work. But on occasion there’s a classification mismatch, and when that happens it is super bad. And even if this particular instance of The Ick is a good filter, the overall reaction to you continuing to do that thing is still bad, so continuing is almost certainly a mistake. So in general, if there’s something that is known to give The Ick, it’s worth making an effort to not do it.

This might be the new strangest take. It’s bad if he bought a house?

Cold: It’s offputting when a man buys a house when he’s single. Too prepared. The wife should help choose where they live, obviously. Is he just looking for a woman to slot in the missing hole in the fantasy he’s created? Even if a single man has money he should live in an apartment.

Midwest Antiquarian: What if you own an apartment?

cold: You’re doing great king.

Generous Farm: I bought when interest rates were most attractive. 2.9%. Now they’re 7.5%. Sorry couldn’t wait.

Cold: Love is bigger than 4.6% difference in rates.

Robin Hanson: Maybe many are a bit too eager to judge men for every little thing they do?

Any sane person would view ‘I own a house’ as a highly positive sign. ‘Too prepared’?

If it’s a case of ‘I own this house and refuse to move’ then I can see an issue, and you should think about whether you want to live in that house. But houses can be sold.

This is what the wise man calls ‘favorable selection.’ If they turn you down because of this you presumably dodged a bullet. If someone thinks you should be paying 7.5% in interest rather than 2.9% so that you can avoid signaling you’re ‘too prepared’? Run. Or, rather, you don’t have to run, all you have to do is stay put. That’s the idea.
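For scale on the rate numbers in the exchange above, here is a quick amortization sketch using the standard fixed-rate payment formula. The $400,000 principal and 30-year term are hypothetical, chosen only to show the size of the 2.9% versus 7.5% gap:

```python
def monthly_payment(principal, annual_rate, years=30):
    """Standard fixed-rate mortgage amortization formula."""
    r = annual_rate / 12  # monthly interest rate
    n = years * 12        # total number of payments
    return principal * r / (1 - (1 + r) ** -n)

low = monthly_payment(400_000, 0.029)   # the rate he locked in
high = monthly_payment(400_000, 0.075)  # the rate now
print(round(low), round(high))  # -> 1665 2797
```

On those assumptions, the “4.6% difference in rates” is roughly $1,130 a month, which is why waiting for love before buying was never a costless signal.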

I hope and presume not too many people are treating ‘owns a house’ (but not an apartment, that would mean you’re doing great king?) in particular as a red flag.

Note that in the next section, one of Claire’s demands is that the man ‘has a small house,’ so a direct contradiction. I wonder if a large house is okay for her?

The more important point is yes, there is a large trend of judging based on a single data point, and looking for ways in which to find that data point a red flag. Stop it, or at least stop it once you’ve got substantial amounts of other information.

If you’re looking for the same things everyone is looking for, it’s rough out there.

Claire: Am looking for this man where is he?

Mason: It’s not that these are bad things to want

it’s not even that these are too much

it’s just that there is no depth here, no values, nothing to anchor a connection on, you couldn’t even write a compelling side character in a comic book on this outline.

The question isn’t where is he, it’s what would you do with him if you found him.

Danneskjold (with the correct answer): Happily married to a normal person.

Cynical Mike (also probably correct): All the things I’ll find before you find him:

Waldo, Carmen Santiago, Jimmy Hoffa, Epstein’s Suicide video, Pot of Gold at the end of a Rainbow, A Dragon, Aliens, Noah’s ARK, Hogwarts and a Pegasus.

The good news is this is not 15 things, it is more like 5. A lot of redundancy.

As Mason says, there’s nothing wrong with anything on the list but also nothing that differentiates what you in particular want, and it focuses on exactly the legible universally desirable features that put someone in high demand. The useful list has the things that you value more than ‘the market,’ and importantly drops some things that you value less.

When the going gets weird, be careful not to inadvertently turn pro.

The problem is that Choices are Bad. Really bad.

Misha: I think these days I see the biggest problem in dating is people are both increasingly weird and increasingly picky.

This applies not just to dating qua dating but all aspects of socialization, which are of course upstream of romance.

I think our minds are (sometimes perniciously) good at making us content with what seems possible.

The modern world shows us a far wider range of what’s possible.

What’s possible and what’s expected depends a lot on local culture and your knowledge of the world and one of these is drastically in flux right now and the other has grown immensely.

Imagine you get into hiking. This is a fairly common hobby, and you want a partner who will go with you. Whoops, you might’ve just cut your pool of potential partners by a large percentage.

Any given thing you want, or want to avoid? Mostly you can solve for that. Combine enough different such things, and you quickly get into trouble. The search algorithms are not that robust.

The Secretary Problem thus suggests that if you are maximizing, you should be deeply stingy about accepting a match until you’ve done a lot of calibration, and after that take the first option that beats everything you saw while calibrating.
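As a sanity check on that intuition, here is a small simulation (illustrative, not from any cited source) of the classic reject-then-leap rule, with the calibration cutoff at roughly n/e, about 37 percent of the pool:

```python
import math
import random

def simulate(n=50, cutoff=None, trials=20000, seed=0):
    """Estimate P(ending up with the single best of n options) when you
    reject the first `cutoff` options outright, then take the next option
    that beats everything seen so far (settling for the last if none does)."""
    if cutoff is None:
        cutoff = round(n / math.e)  # the classic ~37% calibration phase
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        ranks = rng.sample(range(n), n)  # random arrival order; rank 0 is best
        best_seen = min(ranks[:cutoff]) if cutoff else n
        pick = next((r for r in ranks[cutoff:] if r < best_seen), ranks[-1])
        wins += (pick == 0)
    return wins / trials

print(simulate())          # ~0.37 with the n/e cutoff
print(simulate(cutoff=2))  # much worse if you barely calibrate at all
```

The simulation recovers the textbook result: with the ~37 percent calibration phase you land the best option about 37 percent of the time, while barely calibrating does far worse.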

But several big differences work against this in the dating case. You have an uncertain number of shots at this, taking each shot is costly in several ways, the pool you’re drawing from starts getting worse after a while, each previous draw may impose its own penalty on the final outcome, and you can easily miss outright. And once you pick, the comparisons directly impact your satisfaction levels. Thus, you want to be much quicker on the trigger.

Rolf Degen: Being ghosted causes more lasting humiliation than being openly rejected.

Research on how individuals respond to ghosting, defined as unilaterally ending a relationship without providing explanations and ignoring communication attempts, has primarily relied on retrospective and imaginative methodologies. The present research introduced a novel multi-day daily-diary experimental paradigm to examine the psychological consequences of ghosting compared to rejection.

It should be common knowledge at this point that not explaining, and especially outright ghosting, is making your life easier at the expense of the person ghosted.

It can be the right move anyway, as in it sometimes helps you more than it hurts them. Not ghosting can have its own downsides, starting with them demanding a reason, or if you share a reason arguing about it, offering to change or getting really angry about it or using it against you (or in non-dating contexts outright suing). The less you say, the safer and better, and if you change your mind your options might be open.

Despite this, by default, you should be ghostminning: minimizing how often you ghost.

If you know that you don’t want to continue talking to someone, say so. By default treat ghosting as a black mark on the person doing it. This applies to all forms of ghosting, not only in dating. Also, if they decide to ghost you, in some ways that’s a black mark on your ability to credibly signal that they don’t have to.

Cate Hall: unironically this is why everyone’s single

“oh, there’s a small thing you don’t like about a promising match? definitely break up with him instead of mentioning it”

no, you don’t have “the right” to force him to change it. but maybe just give him the info & let him decide?

Cate is correct that this is ludicrously terrible advice. He has hit a good vibe on two dates out of three, and everything else about him is great. Obviously you tell him that this cologne and aesthetic did not work for you. When did this become a ‘right to micromanage’ anything? Is there any possible world in which the guy is better off if you dump him rather than telling him, or if you silently suffer and never tell him?

I do think Jorbs is right that this reads like a ‘good on paper’ description, and she may be looking for an excuse. But that’s dumb, if you don’t want to date him then don’t.

Jorbs: it reads like someone who met someone she feels like she should want to date but doesn’t actually like him very much, and is struggling to process that or work out what to do with it. the positives aren’t enchanting, the negatives are cosmetic.

The response is ludicrous though.

Zac Hill: The thing that is really insidious about this response is the subtle everyone-has-forgotten-their-Carol-Dweck-lessons habit of *internalization of a behavior as a fixed characteristic*. It’s not that you don’t like his cologne — it’s that you ARE sensitive to fragrance.

A good starting rule is that if them changing one eminently changeable thing would make dating worthwhile, you should tell them about it.

VB Knives: Just two useless boyfriends can easily consume the entire period between 21 and 30. You have one who doesn’t seem to ever propose. Then you finally get rid of him (you are 27) and the next effort brings you to 30+ with no ring. Not some exotic string of bad decisions.

Literary Chad: This is actually the issue– the median bodycount is 4, there aren’t wild sex parties, it’s serial monogamy without marriage or children.

Which is obvious unless you’re a boomer-like media consumer who believes that the existence of a sex party somewhere means they’re ubiquitous.

The more I think about this question, the more it seems like an obvious mistake to stay in a long term relationship for years and not fully commit.

I realize this is easier said than done. It is scary to go full The Ultimatum and insist on a rapid move up or move out, when you have something pretty good. It does seem like it is obviously a mistake to give things more than a year or at most two.

How much does it matter?

Steve Stewart-Williams: As shown in the graph below, the sweet spot was two to four past partners; fewer or more reduced attractiveness. In effect, people wanted someone with a bit of a past, but not too much (which was the title of our paper describing the research).

Intriguingly, we found no evidence for a sexual double standard: none, zilch, nada. Contrary to what’s often claimed, women weren’t judged any more harshly than men for having a high body count. That’s not to say they weren’t judged for it, but only that men were judged too.

This can be seen in the next graph. The left panel shows willingness ratings for long-term relationships, the right panel for short-term ones. As you can see, the sexes barely differed in their willingness to engage in long-term relationships. For short-term relationships, in contrast, men expressed greater willingness at every past-partner level.

This looks like a relative ranking, so it does not tell you how much this ‘willingness’ matters. Claiming ‘it literally does not matter at all’ would be bizarre; this number certainly contains information, especially if the number is zero or one.

Also, I flat out defy the data on there being no double standard? No way. Even if on some abstract 1-9 scale it looks similar, the practical impact is very obviously totally different. Yes, a male body count of 60+ is functionally a negative, but not at the same level.

Astrology is a problem for men when dating, because:

  1. A remarkably high percentage of women believe in it to remarkably high degrees.

  2. If taken at all seriously, it is deeply stupid.

  3. Even as a clearly delineated fake framework, it’s pretty terrible.

It is also an opportunity, because the subset of humans that use astrology talk is not random, and the details of how a person interacts with astrology, no matter how seriously they do or don’t take it, are even less random.

Seattle Min You: If a girl asks your zodiac sign and your first response is to be annoyed, you’ve already fucked up. You don’t have enough whimsy in your heart to entertain an arbitrary topic for even a little tiny bit and it’s ugly.

Drea: Huge red flag… like, lighten up Francis.

Seattle Min You: Totally.

Purpskurp Capital LLC: If it was a whimsy topic for fun I’d 100% agree with too.

Problem is many of these girls use this stuff to make real life decisions, instead of using critical thinking and reasoning skills. It’s terrifying. And that’s why a lot of men like me treat it as a red flag.

Texanus: I will absolutely do that. for a child. are you a child? do you need to be treated like a child? I love my nieces and I’ve gotten all dressed up and played tea party with them. I’m not doing that with a grown woman however.

Positivity Moon: His jaw tightens a little. His eyes do that micro roll. He says something like “I don’t believe in that stuff” with the same energy you would use for “I don’t support war crimes.” The girl just asked for his birthday. No one was trying to rewrite physics.

People like to pretend it is about logic. Rationality. Being above “nonsense.” It almost never is. It is usually about control.

Because on the surface, the question is stupidly harmless. She is not building a medical treatment plan off your sun sign. She is not deciding whether to let you hold her wallet based on whether Mercury is breakdancing. She is opening a door to talk about patterns, personality, taste, how you see yourself. If you answer “Scorpio” or “Gemini” or whatever, the conversation that follows is almost never actually about stars. It is about you, but sideways. She is saying: teach me how to play with you for a second.

When your first instinct is annoyance, what you are really saying is: I hate being touched anywhere that does not fit my script.

Because if you truly did not care, you would just answer and move on. “I’m a Virgo.” Smile. Shrug. Ask hers. Make a joke. You do not have to secretly download Co–Star in the bathroom and start believing. You just have to have enough flexibility to sit in someone else’s little universe for five minutes without throwing a tantrum about empirical evidence.

People underestimate how much relationships are built on that ability. To step into someone’s weird side hobby, their micro belief system, their little rituals, even when you do not share them. She might have astrology. Someone else has Dungeons & Dragons lore. Another person has fantasy football statistics. Your uncle has his grill. None of it matters in a lab. All of it matters when you are trying to figure out: can I talk to this person about something that is technically pointless and still feel respected.

Annoyance at the sign question is rarely about skepticism. It is about contempt.

Philip Arola: “Astrology is a vehicle women use to communicate indirectly. Why would it possibly make you annoyed?”

Responding with visible annoyance, or declining to answer with your birthday or sign, is everywhere and always an unforced error. Don’t do that, even if you’re effectively writing her off and no matter how actually annoyed you are. There’s no reason to make things unpleasant, especially before you know how far she’s going with this.

However, well, do you still respect her as a potential partner? Should you?

Annoyance here can come from many places. One of which is ‘oh god I have to deal with this now,’ another related one is ‘damn it I am no longer as interested.’

There are three related but distinct stories from Moon here about a reaction of annoyance. You have logic versus control, you have skepticism versus contempt, and you have the ability to indulge in whimsy while retaining respect.

There is also the motte-and-bailey claim that of course she doesn’t actually believe in astrology and won’t let it influence things beyond a little whimsy.

That brings us back to Purpskurp. There is a continuum of possibilities, but centrally three cases.

  1. Topic of whimsy. It’s a way of making conversation, seeing how you play off an arbitrary set of predictions to get things rolling, a form of cold reading.

    1. This is still a negative update even once you establish this, because she is indicating she thinks this is a good move. Why? Because unless this was maximally explicit, she is engaging in a form of selection, and that choice of selection tells you about her, and it makes this less likely to be a good match.

    2. That said, this is totally survivable, especially if the whimsy is clear. Thus, you don’t want to fail the test aspect of this by showing annoyance up top.

    3. If this is approached explicitly and strategically up front as a fake framework, then that is totally fine.

  2. Taken somewhat seriously as actual Bayesian evidence, as in it might influence her decisions regarding you, including in the future.

    1. That’s trouble, potentially of two distinct forms.

    2. It’s a bad sign on her general decision making capabilities and epistemics, and given you are reading this it’s a really bad sign about your compatibility across the board. It’s a red flag.

    3. The actual astrological implications could be bad news for you. Then again, they might also be good news. What matters is her interpretation of this, on whatever level she understands astrology.

    4. It’s worth noting that astrology is super flexible, so if you have green flags elsewhere you can ‘fight back’ if you know enough to play within the game.

  3. Actual real belief in astrology, the way I believe in having lunch.

    1. For the long term, this is a dealbreaker, period, straight up.

    2. How it impacts the short term is, of course, up to you.

An ‘interest in’ astrology or tarot cards can be fine, although tarot cards are strictly better and astrology indicates poor taste. Actual belief? That’s a dealbreaker, ladies.

yashkaf: this is entirely correct in that being dismissive of people’s niche interests on a date is a much bigger “red flag” than astrology.

and yet if a cute girl posted “he brought up D&D and fantasy football on our first date, I rolled my eyes so hard” will anyone take the guy’s side?

no matter who brings up the cringe topic and who rolls their eyes, the guy fumbled.

no matter who escalated intimacy and who shot it down, the guy fumbled.

it’s always the guy who fumbles. thus, it’s always the guy who improves from feedback.

this makes dating very hard for women.

John Nerst: I mean astrology isn’t an interest, it’s a belief. Very different.

yashkaf: differentiating interests from beliefs from beliefs in belief from world models is just your own special weird niche interest 🐀

John Nerst: Going meta, how droll. But yes, most people are totally nuts and this is one of the surest signs

yashkaf: if you want to know the difference between “rat” and “post rat” without prejudice against either, read [the above] conversation between me and John and see who you intuitively side with

Sarah Constantin: yeah, rat all the way.

i think astrology is fun and i would not consider an interest in astrology a dealbreaker.

but i *definitely* don’t believe in stretching the meaning of “truth” or going “what is true or false, really?”

… i think it’s important to be grounded in (ordinary, waking, non-mystical) reality, to the point that you can enjoy *playing* with deviations from it, without getting seriously confused and throwing your life off the rails.

“zero playing around allowed” types are no-fun scolds, but “come on in, this is literally real and true, from a certain point of view, let me talk you into that” can destabilize some people for real.


Dating Roundup #9: Signals and Selection


Remembering what Windows 10 did right—and how it made modern Windows more annoying


Remembering Windows 10’s rollout can help diagnose what ails Windows 11.

If you’ve been following our coverage for the last few years, you’ll already know that 2025 is the year that Windows 10 died. Technically.

“Died,” because Microsoft’s formal end-of-support date came and went on October 14, as the company had been saying for years. “Technically,” because it’s trivial for home users to get another free year of security updates with a few minutes of effort, and schools and businesses can get an additional two years of updates on top of that, and because load-bearing system apps like Edge and Windows Defender will keep getting updates through at least 2028 regardless.

But 2025 was undoubtedly a tipping point for the so-called “last version of Windows.” StatCounter data says Windows 11 has overtaken Windows 10 as the most-used version of Windows both in the US (February 2025) and worldwide (July 2025). Windows 10’s market share slid from just over 44 percent to just under 31 percent in the Steam Hardware Survey. And now that Microsoft’s support for the OS has formally ended, games, apps, and drivers are already beginning the gradual process of ending or scaling back official Windows 10 support.

Windows 10 is generally thought of as one of the “good” versions of Windows, and it was extremely popular in its heyday: the most widely used version of Windows since XP. That’s true even though many of the annoying things that people complain about in Windows 11 started during the Windows 10 era. Now that it’s time to write Windows 10’s epitaph, it’s worth examining what Microsoft got right with Windows 10, how it laid the groundwork for many of the things people dislike about Windows 11, and how Microsoft has made all of those problems worse in the years since Windows 11 first launched.

Windows 10 did a lot of things right

The Start menu in the first release of Windows 10. Windows 10 got a lot of credit for not being Windows 8 and for rolling back its most visible and polarizing changes.

Like Windows 7, Windows 10’s primary job was to not be its predecessor. Windows 8 brought plenty of solid under-the-hood improvements over Windows 7, but it came with a polarizing full-screen Start menu and a touchscreen-centric user interface that was an awkward fit for traditional desktops and laptops.

And the biggest thing it did to differentiate itself from Windows 8 was restore a version of the traditional Start menu, altered from its Windows XP or Windows 7-era iterations but familiar enough not to put people off.

Windows 10 also adopted a bunch of other things that people seemed to like about their smartphones—it initially rolled out as a free upgrade to anyone already running Windows 7 or Windows 8, and it ran on virtually all the same hardware as those older versions. It was updated on a continuous, predictable cadence that allowed Microsoft to add features more quickly. Microsoft even expanded its public beta program, giving enthusiasts and developers an opportunity to see what was coming and provide feedback before new features were rolled out to everybody.

Windows 10 also hit during a time of change at Microsoft. Current CEO Satya Nadella was just taking over from Steve Ballmer, and as part of that pivot, the company was also doing things like making its Office apps work on iOS and Android and abandoning its struggling, proprietary browser engine for Edge. Nadella’s Microsoft wanted you to be using Microsoft products (and ideally paying for a subscription to do so), but it seemed more willing to meet people where they were rather than forcing them to change their behavior.

That shift continued to benefit users throughout the first few years of Windows 10’s life. Developers benefited from the introduction and continuous improvement of the Windows Subsystem for Linux, a way to run Linux and many of its apps and tools directly on top of Windows. Microsoft eventually threw out its struggling in-house browser engine for a new version of the Edge browser built on Chromium—we can debate whether Chromium’s supremacy is a good thing for an open, standard-compliant Internet, but switching to a more compatible rendering engine and an established extension ecosystem was absolutely the more user-friendly choice. Both projects also signaled Microsoft’s growing engagement with and contributions to open-source projects, something that would have been hard to imagine during the company’s closed-off ’90s and ’00s.

Windows 10 wasn’t perfect; these examples of what it did right are cherry-picked. But part of the operating system’s reputation comes from the fact that it was originally developed as a response to real complaints and rolled out in a way that tried to make its changes and improvements as widely accessible as possible.

But Windows 10 laid the groundwork for Windows 11’s problems

Windows 10 asked you to sign in with a Microsoft account, but for most of the operating system’s life, it was easy to skip this using visible buttons in the UI. Windows 10 began locking this down in later versions; that has continued in Windows 11, but it didn’t originate there. Credit: BTNHD

As many things as Windows 10 did relatively well, most of the things people claim to find objectionable about Windows 11 actually started happening during the Windows 10 era.

Right out of the gate, for example, Windows 10 wanted to collect more information about how people were using the operating system—ostensibly in the name of either helping Microsoft improve the OS or helping “personalize” its ads and recommendations. And the transition to the “software-as-a-service” approach helped Windows move faster but also broke things, over and over again—these kinds of bugs have persisted on and off into the Windows 11 era despite Microsoft’s public beta programs.

Windows 10 could also get pushy about other Microsoft products. Multiple technologies, like the original Edge and Cortana, were introduced, pushed on users, and failed. The annoying news and weather widget on the taskbar was a late addition to Windows 10; advertisements and news articles could clutter up its lock screen. Icons for third-party apps from the Microsoft Store, many of them low-rent, ad-supported time-waster games, were added to the Start menu without user consent. Some users of older Windows versions even objected to the way that the free Windows 10 upgrade was offered—the install files would download themselves automatically, and it could be difficult to make the notifications go away.

Even the mandatory Microsoft Account sign-in, one of the most frequently complained-about aspects of Windows 11, was a Windows 10 innovation—it was easier to circumvent than it is now, and it was just for the Home edition of the software, but in retrospect, it was clearly a step down the road that Windows 11 is currently traveling.

Windows 11 did make things worse, though

But many of Windows 11’s annoyances are new ones. And the big problem is that these annoyances have been stacked on top of the annoying things that Windows 10 was already doing, gradually accumulating to make the new PC setup process go from “lightly” to “supremely” irritating.

The Microsoft Account sign-in requirement is ground zero for a lot of this since signing in with an account unlocks a litany of extra ads for Microsoft 365, Game Pass, and other services you may or may not need or want. Connecting to the Internet and signing in became a requirement for new installations of both the Home and the Pro versions of Windows 11 starting with version 22H2, and while workarounds existed then and continue to exist now, you have to know about them beforehand or look them up yourself—the OS doesn’t offer you an option to skip. Microsoft will also apparently be closing some of these loopholes in future updates, making circumvention even more difficult.

And if getting through those screens when setting up a new PC wasn’t annoying enough, Windows 11 will regularly remind you about other Microsoft services again through its Second Chance Out-Of-Box Experience screen, or SCOOBE. This on-by-default “feature” has offered to help me “finish setting up” Windows 11 installations that are years old and quite thoroughly set up. It can be turned off via a buried checkbox in the Notifications settings, but removing it or making it simpler to permanently dismiss from the SCOOBE screen itself would be the more user-friendly change, especially since Microsoft already bombards users with “helpful reminders” about many of these same services via system notifications.

Microsoft’s all-consuming pivot to generative AI also deserves blame. Microsoft’s Copilot push hasn’t stopped with the built-in app that gets a position of honor on the default taskbar—an app whose appearance and functionality have completely changed multiple times in the last couple of years as Microsoft has updated it. Microsoft changed the default Windows PC keyboard layout for the first time in 30 years to accommodate Copilot, and Copilot-branded features have landed in every Windows app from Word to Paint to Edge to Notepad. Sometimes these features can be uninstalled or turned off; sometimes they can’t.

It’s not just that Microsoft is squeezing generative AI into every possible nook and cranny in Windows; it’s that there seems to be no feature too intrusive or risky to make the cutoff. Microsoft nearly rolled out a catastrophically insecure version of Recall, a feature for some newer PCs that takes screenshots of your activity and records it for later reference; Microsoft gave its security an overhaul after a massive outcry from users, media, and security researchers, but Recall still rolled out.

The so-called “agentic” AI features that Microsoft is currently testing in Windows come with their own documented security and privacy risks, but their inclusion in Windows is essentially a foregone conclusion because Microsoft executives are constantly talking about the need to develop an “agentic OS.” There’s a fine line between introducing new software features and forcing people to use them, and I find that Microsoft’s pushiness around Windows 11’s AI additions falls on the wrong side of that line for me pretty much every single time.

Finally, while Windows 10 ran on anything that could run Windows 7 or 8, Windows 11 came with new system requirements that excluded many existing, functional PCs. The operating system can be installed unofficially on PCs that are several years older than the official cutoff, but only if you’re comfortable with the risks and you know how to get around the system requirements check.

Using people’s PCs as billboards to sell them new PCs feels tacky at best. Credit: Kyle Orland

I find the heightened requirements—implemented to improve security, according to Microsoft—to be more or less defensible. TPM modules enable seamless disk encryption, Secure Boot protects from threats that are otherwise invisible and hard to detect, and CPU makers like Intel and AMD only commit to supporting older processors with firmware-level security patches for so long, which is important in the era of hardware-level security exploits.

But the requirements don’t feel like something Microsoft has imposed to protect users from threats; they feel like something Microsoft is doing in order to upsell you to a new PC. Microsoft creates that impression when it shows Windows 10 users full-screen ads for new Copilot+ PCs, even when their systems are capable of upgrading to the new operating system directly. People are already primed to believe in “planned obsolescence,” the idea that the things they buy are designed to slow down or fail just in time to force them to buy new things; pushing people to throw out functioning PCs with full-screen ads does nothing to dispel this notion.

Windows 11 could still be great

I still believe that Windows 11 has good bones. Install the Enterprise version of the operating system and you’ll get a version with much less extra cruft on top of it, a version made to avoid alienating the businesses that pay good money to install Windows across large fleets of PCs. Microsoft has made huge strides in getting its operating system to run on Arm-based PCs. The Windows Subsystem for Linux is better than it’s ever been. I’m intrigued by the company’s efforts to make Windows a better operating system for gaming handhelds, Microsoft’s belated answer to Valve’s Steam Deck and SteamOS.

But as someone with firsthand experience of every era of Windows from 3.1 onward, I can say I’ve never felt as frustrated with the operating system as I have during Windows 11’s Copilot era. The operating system can be tamed with effort. But the taming has become an integral part of the new PC setup process for me, just as essential as creating the USB installer and downloading drivers and third-party apps. It’s something my PC needs to have done to it before it feels ready to use.

Windows 10 was far from perfect. But as we mark the first stage of its multi-year passing, it’s worth remembering what it did well and why people were willing to install it in droves. I’d like to see Microsoft recommit to a quieter, cleaner version of Windows that is more willing to get out of the way and just let people use their computers the way they want, the same way the company has tried to recommit to security following a string of embarrassing breaches. I don’t have much hope that this will happen, but some genuine effort could go a long way toward convincing Windows 10-using holdouts that the new OS actually isn’t all that bad.


Andrew is a Senior Technology Reporter at Ars Technica, with a focus on consumer tech including computer hardware and in-depth reviews of operating systems like Windows and macOS. Andrew lives in Philadelphia and co-hosts a weekly book podcast called Overdue.



Ars Technica’s Top 20 video games of 2025


Blue Prince and 19 others

A mix of expected sequels and out-of-nowhere indie gems made 2025 a joy.

Credit: Collage by Aurich Lawson


When we put together our top 20 games of last year, we specifically called out Civilization 7, Avowed, Doom: The Dark Ages, and Grand Theft Auto 6 as big franchise games we were already looking forward to for 2025. While one of those games has been delayed into 2026, the three others made this year’s list of Ars’ favorite games as expected. They join a handful of other highly anticipated sequels, ranging from big-budget blockbusters to long-gestating indies, on the “expected” side of this year’s list.

But the games that really stood out for me in 2025 were the ones that seemed to come out of nowhere. Those range from hard-to-categorize roguelike puzzle games to a gonzo, punishing mountainous walking simulation, the best Geometry Wars clone in years, and a touching look at the difficulties of adolescence through the surprisingly effective lens of mini-games.

As we look toward 2026, there are plenty of other big-budget projects that the industry is busy preparing for (the delayed Grand Theft Auto VI chief among them). If next year is anything like this year, though, we can look forward to plenty more games that no one saw coming suddenly vaulting into view as new classics.

Assassin’s Creed Shadows

Ubisoft Quebec; Windows, macOS, PS5, Xbox Series X|S, Switch 2, iPad

When I was younger, I wanted—and expected—virtually every game I played to blow me away with something I’d not seen before. It was easier to hit that bar in the ’90s, when both the design and technology of games were moving at an incredible pace.

Now, as someone who still games in his 40s, I’m excited to see that when it happens, but I don’t expect it. Instead, I increasingly appreciate games that act as a sort of comfort food, and I value some games as much for their familiarity as I do their originality.

That’s what Assassin’s Creed Shadows is all about (as I wrote when it first came out). It follows a well-trodden formula, but it’s a beautifully polished version of that formula. Its world is grand and escapist, its audio and graphics presentation is immersive, and it makes room for many different playstyles and skill levels.

If your idea of a good time is “be a badass, but don’t think too hard about it,” Shadows is one of the best Assassin’s Creed titles in the franchise’s long history. It doesn’t reinvent any wheels, but after nearly two decades of Assassin’s Creed, it doesn’t really need to; the new setting and story are enough to separate it, while the gameplay remains familiar.

-Samuel Axon

Avowed

Obsidian Entertainment; Windows, Xbox Series X|S

No game this year has made me feel as hated as Avowed. As an envoy for the far-off Aedryan empire, your role in Avowed is basically to be hated, either overtly or subtly, by almost everyone you encounter in the wild, semi-colonized world of the Living Lands. The low-level hum of hatred and mistrust from the citizens permeates everything you do in the game, which is an unsettling feeling in a genre usually characterized by the moral certitude of heroes fighting world-ending evil.

Role-playing aside, Avowed is helpfully carried by its strong action-packed combat system, characterized as it is by thrilling moment-to-moment positional jockeying and the juggling of magic spells, ranged weapons, and powerful close-range melee attacks. The game’s quest system also does a good job of letting players balance this combat difficulty for themselves—if a goal is listed with three skull symbols on your menu, you’d best put it off until you’ve leveled up a little bit more.

I can take or leave the mystical mumbo-jumbo-filled subplot surrounding your status as a “godlike” being that can converse with spirits. Aside from that, though, I’ve never had so much fun being hated.

-Kyle Orland

Baby Steps

Gabe Cuzzillo, Maxi Boch, Bennett Foddy; Windows, PS5

The term “walking simulator” often gets thrown around in some game criticism circles as a derisive term for a title that’s about nothing more than walking around and looking at stuff. While Baby Steps might technically fit into that “walking simulator” model, stereotyping it in that way does this incredibly inventive game a disservice.

It starts with the walking itself, which requires meticulous, rhythmic manipulation of both shoulder buttons and both analog sticks just to stay upright. Super Mario 64, this ain’t. But what starts as a struggle to take just a few short steps quickly becomes almost habitual, much like learning to walk in real life.

The game then starts throwing new challenges at your feet. Slippery surfaces. Narrow stairways with tiny footholds. Overhangs that block your ridiculously useless, floppy upper body. The game’s relentless mountain is designed such that a single missed step can ruin huge chunks of progress, in the proud tradition of Getting Over It with Bennett Foddy.

This all might sound needlessly cruel and frustrating, but trust me, it’s worth sticking with to the end. That’s in part for the feeling of accomplishment when you do finally make it past that latest seemingly impossible wall, and partly to experience an absolutely gonzo story that deals directly and effectively with ideas of masculinity, perseverance, and society itself. You’ll never be so glad to take that final step.

-Kyle Orland

Ball x Pit

Kenny Sun; Windows, macOS, PS5, Xbox Series X|S, Switch, Switch 2

The idea of bouncing a ball against a block is one of the most tried-and-true in all of gaming, from the basic version in the ancient Breakout to the number-filled angles of Holedown. But perhaps no game has made this basic concept as compulsively addictive as Ball x Pit.

Here, the brick-breaking genre is crossed with the almost as storied shoot-em-up, with the balls serving as your weapons and the blocks as enemies that march slowly but relentlessly from the top of the screen to the bottom. The key to destroying all those blocks in time is bouncing your growing arsenal of new balls at just the right angles to maximize their damage-dealing impact and catching them again so you can throw them once more that much faster.

Like so many roguelikes before it, Ball x Pit uses randomization as the core of its compulsive loop, letting you choose from a wide selection of new abilities and ball-based attacks as you slowly level up. But Ball x Pit goes further than most in letting you fuse and combine those balls into unique combinations that take dozens of runs to fully uncover and combine effectively.

Add in a deep system of semi-permanent upgrades (with its own intriguing “bounce balls around a city builder” mini game) and a deep range of more difficult settings and enemies to slowly unlock, and you have a game whose addictive pull will last much longer than you might expect from the simple premise.

-Kyle Orland

Blue Prince

Dogubomb; Windows, macOS, PS5, Xbox Series X|S

Usually, when formulating a list like this, you can compare a title to an existing game or genre as a shorthand to explain what’s going on to newcomers. That’s nearly impossible with Blue Prince, a game that combines a lot of concepts to defy easy comparison to games that have come before it.

At its core, Blue Prince is about solving the mysteries of a house that you build while exploring it, drafting the next room from a selection of three options every time you open a new door. Your initial goal, if you can call it that, is to discover and access the mysterious “Room 46” that apparently exists somewhere on the 45-room grid. And while the house plan you’re building resets with every in-game day, the knowledge you gain from exploring those rooms stays with you, letting you make incremental progress on a wide variety of puzzles and mysteries as you rebuild the mansion from scratch again and again.

What starts as a few simple and relatively straightforward puzzles quickly unfolds fractally into a complex constellation of conundrums, revealed slowly through scraps of paper, in-game books, inventory items, interactive machinery, and incidental background elements. Figuring out the more intricate mysteries of the mansion requires careful observation and, often, filling a real-life mad scientist’s notepad with detailed notes that look incomprehensible to an outsider. All the while, you have to manage each day’s limited resources and luck-of-the-draw room drafting to simply find the right rooms to make the requisite progress.

Getting to that storied Room 46 is enough to roll the credits on Blue Prince, and it serves as an engaging enough puzzle adventure in its own right. But that whole process could be considered a mere tutorial for a simply massive endgame, which is full of riddles that will perplex even the most experienced puzzlers while slowly building a surprisingly deep story of political intrigue and spycraft through some masterful environmental storytelling.

Some of those extreme late-game puzzles might be too arcane for their own good, honestly, and will send many players scrambling for a convenient guide or wiki for some hints. But even after playing for over 100 hours over two playthroughs, I’m pretty sure I’m still not done exploring all that Blue Prince has to offer.

-Kyle Orland

Civilization VII

Firaxis; Windows, macOS, Linux, PS4/5, Xbox One/Series X|S, Switch 2

This one will be controversial: I love Civilization VII.

Civilization VII launched as a bit of a mess. There were bugs and UI shortcomings aplenty. Most (but not all) of those have been addressed in the months since, but they’re not the main reason this is a tricky pick.

The studio behind the Civilization franchise, Firaxis, has long said it has a “33/33/33” approach to sequels in the series, wherein 33 percent of the game should be familiar systems, 33 percent should be remixes or improvements of familiar systems, and 33 percent should be entirely new systems.

Critics of Civilization VII say Firaxis broke that 33/33/33 rule by overweighting the last 33 percent, mainly to chase innovations in the 4X genre by other games (like Humankind). I don’t disagree, but I also welcome it.

Credit is due to the team at Firaxis for ingeniously solving some longstanding design problems in the franchise, like using the new age transitions to curb snowballing and to expunge systems that become a lot less fun in the late game than they are in the beginning. Judged on its own terms, Civilization VII is a deep, addictive, and fun strategy game that I’ve spent more than 100 hours playing this year.

My favorite Civ game remains Civilization IV, and that game still runs fine on modern systems, is infinitely replayable out of the box, and enjoys robust modding support. I simply didn’t need more of the same from this particular franchise; to me, VII coexists with IV and others on my hard drive—very different flavors of the same idea.

-Samuel Axon

CloverPit

Panik Arcade; Windows, Xbox Series X|S

I’m not sure I like what my minor CloverPit obsession says about me. When I fell into a deep Balatro hole last year, I could at least delude myself into thinking there was some level of skill in deciding which jokers to buy and sell, which cards to add or prune from my deck, and which cards to hold and discard. In the end, though, I was as beholden to the gods of random number generation as any other Balatro player.

CloverPit makes the surrender to the vagaries of luck all the more apparent, replacing the video-poker-like systems of Balatro with a “dumb” slot machine whose handle you’re forced to pull over and over again. Sure, there are still decisions to make, mostly regarding which lucky charms you purchase from a vending machine on the other side of the room. And there is some skill involved in learning and exploiting lucky charm synergies to extract the highest expected value from those slot machine pulls.

Once you’ve figured out those basic strategies, though, CloverPit mostly devolves into a series of rerolls waiting for the right items to show up in the shop in the right order. Thankfully, the game hides plenty of arcane secrets beneath its charming PS1-style spooky-horror presentation, slowly revealing new items and abilities that hint that something deeper than just accumulating money might be the game’s true end goal.

It’s this creepy vibe and these slowly unfolding secrets that have compelled me to pour dozens of hours into what is, in the end, just a fancy slot machine simulator. God help me.

-Kyle Orland

Consume Me

Jenny Jiao Hsia, AP Thomson; Windows, MacOS

Jenny is your average suburban Asian-American teenager, struggling to balance academic achievement, chores, an overbearing mother, romantic entanglements, and a healthy body image. What sounds like the premise for a cliché young adult novel actually serves to set up a compelling interactive narrative disguised as a mere mini-game collection.

Consume Me brilliantly integrates the conflicting demands placed on Jenny’s time and attention into the gameplay itself. Creating a balanced meal, for instance, becomes a literal test of balancing vaguely Tetris-shaped pieces of food on a tray, satisfying your hunger and caloric limits at the same time. Chores take up time but give you money you can spend on energy drinks that let you squeeze in more activities by staying up late (but can lead to debilitating headaches). A closet full of outfits becomes an array of power-ups for your time, energy, or focus.

It takes almost preternatural resource management skills and mini-game execution to satisfy all the expectations being placed on you, which is kind of the meta-narrative point. No matter how well you do, Jenny’s story develops in a way that serves as a touching semi-autobiographical look at the life of co-creator Jenny Jiao Hsia. That biography is made all the more sympathetic here by an interactive presentation that’s more engaging than any young adult novel could be.

-Kyle Orland

Death Stranding 2: On the Beach

Kojima Productions; PS5

Death Stranding 2: On the Beach should not be fun. Much like its predecessor, the latest release from famed game designer Hideo Kojima is about delivering packages—at least on the surface. Yet the process of planning your routes, managing inventory, and exploring an unfathomably strange post-apocalyptic world remains a winning formula.

The game again follows Sam Porter Bridges (played by Norman Reedus) on his quest to reconnect the world as humanity faces possible extinction. And yes, that means acting like a post-apocalyptic Amazon Prime. Standing in the way of an on-time delivery are violent raiders, dangerous terrain, and angry, disembodied spirits known as Beached Things.

It’s common to hear Death Stranding described as a walking simulator, and there is indeed a lot of walking, but the sequel introduces numerous quality-of-life improvements that make it more approachable. Death Stranding 2 has a robust fast-travel mechanic and better vehicles to save you from unnecessary marches, and the inventory management system is less clunky. That’s important in a game that asks you to traverse an entire continent to deliver cargo.

Beyond the core gameplay loop of stacking heavy boxes on your back, Death Stranding 2 has all the Kojima vibes you could want. There are plenty of quirky gameplay mechanics and long cutscenes that add depth to the characters and keep the story moving. The world of Death Stranding has been designed from the ground up around the designer’s flights of fancy, and it works—even the really weird stuff almost makes sense!

Along the way, Death Stranding 2 has a lot to say about grief, healing, and the value of human connection. The game’s most poignant cutscenes are made all the more memorable by an incredible soundtrack, and we cannot oversell the strength of the mocap performances.

It may take 100 hours or more to experience everything the game has to offer, but it’s well worth your time.

-Ryan Whitwam

Donkey Kong Bananza

Nintendo EPD; Switch 2

Credit: Nintendo

Since the days of Donkey Kong Country, I’ve always felt that Mario’s original ape antagonist wasn’t really up for anchoring a Mario-level platform franchise. Donkey Kong Bananza is the first game to really make me doubt that take.

Bananza is a great showcase for the new, more powerful hardware on the Switch 2, with endlessly destructible environments that send some impressive-looking shiny shrapnel flying when they’re torn apart. It can’t be overstated how cathartic it is to pound tunnels up, down, and through pretty much every floor, ceiling, and wall you see, mashing the world itself to suit your needs.

Bananza also does a good job aping Super Mario Odyssey’s tendency to fill practically every square inch of space with collectible doodads and a wide variety of challenges. This is not a game where you need to spend a lot of time aimlessly wandering for the next thing to do—there’s pretty much always something interesting around the next corner until the extreme end game.

Sure, the camera angles and frame rate might suffer a bit during the more chaotic bits. But it’s hard to care when you’re having this much fun just punching your way through Bananza’s imaginative, colorful, and malleable world.

-Kyle Orland

Doom: The Dark Ages

Id Software; Windows, PS5, Xbox Series X|S

Credit: Bethesda Game Studios

For a series that has always been about dodging, Doom: The Dark Ages is much more about standing your ground. The game’s key verbs involve raising your shield to block incoming attacks or, ideally, parrying them back in the direction they came from.

It’s a real “zig instead of zag” moment for the storied Doom series, and it does take some getting used to. Overall, though, I had a great time mixing in turtle-style blocking with the habitual pattern of circle-strafing around huge groups of enemies in massive arenas and quickly switching between multiple weapons to deal with them as efficiently as possible. While I missed the focus on extreme verticality of the last two Doom games, I appreciate the new game’s more open-world design, which gives completionist players a good excuse to explore every square inch of these massive environments for extra challenges and hidden collectibles.

The only real problem with Doom: The Dark Ages comes when the game occasionally transitions to a slow-paced mech-style demon battle or awkward flying dragon section, sometimes for entire levels at a time. Those variations aside, I came away very satisfied with the minor change in focus for a storied shooter series.

-Kyle Orland

Dragonsweeper

Daniel Benmergui; JavaScript

Anyone who has read my book-length treatise on Minesweeper knows I’m a sucker for games that involve hidden threats within a grid of revealed numbers. But not all variations on this theme are created equal. Dragonsweeper stands out from the crowd by incorporating a simple but arcane world of RPG-style enemies and items into its logical puzzles.

Instead of simply counting the number of nearby mines, each number revealed on the Dragonsweeper grid reflects the total health of the surrounding enemies, both seen and unseen. Attacking those enemies means enduring predictable counterattacks that deplete your limited health bar, which you can grow through gradual leveling until you’re strong enough to kill the game’s titular dragon, taunting you from the center of the field.
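The clue rule described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the game's actual code, and the grid values here are hypothetical:

```python
# Hypothetical monster HP values on a small grid; 0 = empty cell.
GRID = [
    [0, 2, 0],
    [3, 0, 1],
    [0, 0, 5],
]

def clue(grid, r, c):
    """Return the Dragonsweeper-style clue for cell (r, c): the total HP
    of all monsters in the eight surrounding cells, seen or unseen."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            rr, cc = r + dr, c + dc
            if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
                total += grid[rr][cc]
    return total

print(clue(GRID, 1, 1))  # center cell sees 2 + 3 + 1 + 5 = 11
```

Because hidden monsters contribute to the sum just like revealed ones, each clue constrains what can be lurking nearby, which is what drives the game's logical deduction.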

Altogether, it adds an intriguing new layer to the logical deduction, forcing you to carefully manage your moves to maximize the impact of your attacks and the limited health-restoring items scattered throughout the field. And while finishing one run isn’t too much of a challenge, completing the game’s optional achievements and putting together a “perfect” game score is enough to keep puzzle lovers coming back for hours and hours of compelling logical deduction.

-Kyle Orland

Elden Ring: Nightreign

FromSoftware; Windows, PS4/5, Xbox One/Series X|S

Credit: Bandai Namco

At first blush, Nightreign feels like a twisted perversion of everything that has made FromSoft’s Souls series so compelling for so many years. What was a slow-paced, deliberate open-world RPG has become a game about sprinting across a rapidly contracting map, leveling up as fast as possible before taking on punishing bosses. A moody solitary experience has become one that practically requires a group of three players working together. It’s like an Elden Ring-themed amusement park that seems to miss the point of the original.

Whatever. It still works!

Let the purists bellyache about how it’s not really Elden Ring. They’re right, but they’re missing the point. Nightreign condenses the general vibe of the Elden Ring world into something very different but no less enjoyable. What’s more, it packs that vibe into a tight experience that can be easily squeezed into a 45-minute sprint rather than requiring dozens of hours of deep exploration.

That makes it the perfect excuse to get together with a few like-minded Elden Ring-loving friends, throw on a headset, and just tear through the Lands Between together for the better part of an evening. As Elden Ring theme parks go, you could do a lot worse.

-Kyle Orland

Ghost of Yotei

Sucker Punch Productions; PS5

Ghost of Yotei from Sucker Punch Productions starts as a revenge tale, featuring hard-as-nails Atsu on the hunt for the outlaws who murdered her family. While there is plenty of revenge to be had in the lands surrounding Mount Yotei, the people Atsu meets and the stories they have to tell make this more than a two-dimensional quest for blood.

The game takes place on the northern Japanese island of Ezo (modern-day Hokkaido) several centuries after the developer’s last samurai game, Ghost of Tsushima. It has a lot in common with that title, but Ghost of Yotei was built for the PS5 and features a massive explorable world and stunning visuals. It’s easy to get sidetracked from your quest just exploring Ezo and tinkering with the game’s photo mode.

The land of Ezo avoids some of the missteps seen in other open-world games. While it’s expansive and rich with points of interest, exploring it is not tedious. There are no vacuous fetch quests or mindless collecting (or loading screens, for that matter). Even when you think you know what you’re going to find at a location, you may be surprised. The interesting side quests and random encounters compel you to keep exploring Ezo.

Ghost of Yotei’s combat is just as razor-sharp as its exploration. It features multiple weapon types, each with unlockable abilities and affinities that make them ideal for taking on certain foes. Brute force will only get you so far, though. You need quick reactions to parry enemy attacks and strike back—it’s challenging and rewarding but not frustrating.

It’s impossible to play Ghost of Yotei without becoming invested in the journey, and a big part of that is thanks to the phenomenal voice work of Erika Ishii as Atsu. Some of the game’s pivotal moments will haunt you, but luckily, the developer has just added a New Game+ mode so you can relive them all again.

-Ryan Whitwam

Hades 2

Supergiant Games; Windows, MacOS, Switch, Switch 2

There’s a moment in the second section of Hades 2 where you start to hear a haunting melody floating through the background. That music gets louder and louder until you reach the game’s second major boss, a trio of sirens that go through a full rock-opera showtune number as you dodge their bullet-hell attacks and look for openings to go in for the kill. That three-part musical presentation slowly dwindles to a solo as you finally dispatch the sirens one by one, restoring a surprisingly melancholy silence once more.

It’s this and other musical moments casually and effortlessly woven through Hades 2 that will stick with me the most. But the game stands on its own beyond the musicality, expanding the original game’s roguelike action with a compelling new spell system that lets you briefly capture or slow enemies in a binding circle. This small addition adds a new sense of depth to the moment-to-moment positional dance that was already so compelling in the original Hades.

Hades 2 also benefits greatly from the introduction of Melinoe, a compelling new protagonist who gets fleshed out through her relationship with the usual rogues’ gallery of gods and demigods. Come for her quest of self-discovery, stay for the moments of musical surprise.

-Kyle Orland

Hollow Knight: Silksong

Team Cherry; Windows, MacOS, Linux, PS4/5, Xbox One/Series X|S, Switch, Switch 2

Piece of cake.

Credit: Team Cherry


A quickie sequel in the year or two after Hollow Knight’s out-of-nowhere success in 2017 might have been able to get away with just being a more-of-the-same glorified expansion pack. But after over eight years of overwhelming anticipation from fans, Silksong had to really be something special to live up to its promise.

Luckily, it is. Silksong is a beautiful expansion of the bug-scale underground universe created in the first game. Every new room is a work of painterly beauty, with multiple layers of detailed 2D art drawing you further into its intricate and convincing fallen world.

The sprawling map seems to extend forever in every direction, circling back around and in on itself with plenty of optional alleyways in which to get lost searching for rare power-ups. And while the game is a punishingly hard take on action platforming, there’s usually a way around the most difficult reflex tests for players willing to explore and think a bit outside the box.

Even players who hit a wall and never make it through the sprawling tunnels of Silksong’s labyrinthine underground will still find plenty of memorable moments in whatever portion of the game they do experience.

-Kyle Orland

The King Is Watching

Hypnohead; Windows

A lot of good resource tiles there, but the king can only look at six at a time.

Credit: Hypnohead / Steam


In a real-time-strategy genre that can often feel too bloated and complex for its own good, The King Is Watching is a streamlined breath of fresh air. Since the entire game takes place on a single screen, there’s no need to constantly pan and zoom your camera around a sprawling map. Instead, you can stay laser-focused on your 5×5 grid of production space and on which portion of it is actively productive under the king’s limited gaze at any particular moment.

Arranging tiles to maximize that production of basic resources and military units quickly becomes an all-consuming juggling act, requiring constant moment-to-moment decisions that can quickly cascade through a run. I’m also a big fan of the game’s self-selecting difficulty system, which asks you to choose how many enemies you think you can take on in coming waves, doling out better rewards for players who are willing to push themselves to the limit of their current capabilities.

The bite-size serving of a single King Is Watching run ensures that even failure doesn’t feel too crushing. And success brings with it just enough in the way of semi-permanent ability expansions to encourage another run where you can reach even greater heights of production and protection.

-Kyle Orland

Kingdom Come: Deliverance II

Warhorse Studios; Windows, PS5, Xbox Series X|S

Kingdom Come: Deliverance was a slog that I had to will myself to complete. It was sometimes a broken and janky game, but despite its warts, I saw the potential for something special. And that’s what its sequel, Kingdom Come: Deliverance II, has delivered.

While it’s still a slow burn, the overall experience has been greatly refined, the initial challenge has been smoothed out, and I’ve rarely been more immersed in an RPG’s storytelling. There’s no filler, as every story beat and side quest offers a memorable tale that further paints the setting and characters of medieval Bohemia.

Unlike most RPGs, there’s no magic to be had, which is a big part of the game’s charm. As Henry of Skalitz, you are of meager social standing, and many characters you speak to will be quick to remind you of it. While Henry is a bit better off than his humble beginnings in the first game, you’re no demigod who can win a large battle single-handedly. In fact, you’ll probably lose fairly often in the early going if more than one person is attacking you.

Almost every fight is a slow dance once you’re in a full suit of armor, and your patience and timing will matter more than the stats of your equipment. But therein lies the beauty of KC:D II: Every battle you pick, whether physical or verbal, carries real weight and shapes Bohemia for better or worse.

-Jacob May

Mario Kart World

Nintendo; Switch 2

Credit: Nintendo

After the incredible success of Mario Kart 8 and its various downloadable content packs on the Switch, Nintendo could have easily done a prettier “more of the same” sequel as the launch-window showcase for the Switch 2. Instead, the company took a huge gamble in trying to transform Mario Kart’s usual distinct tracks into a vast, interconnected open world.

This conceit works best in “Free Roam” mode, where you can explore the outskirts of the standard tracks and the wide open spaces in between for hundreds of mini-challenges that test your driving speed and precision. Add in dozens of collectible medallions and outfits hidden in hard-to-reach corners, and the mode serves as a great excuse to explore every nook and cranny of a surprisingly detailed and fleshed-out world map.

I was also a big fan of Knockout Mode, which slowly whittles a frankly overwhelming field of 24 initial racers to a single winner through a series of endurance rally race checkpoints. These help make up for a series of perplexing changes that hamper the tried-and-true Battle Mode formula and long straightaway sections that feel more than a little bit stifling in the standard Grand Prix mode. Still, Free Roam mode had me happily whiling away dozens of hours with my new Switch 2 this year.

-Kyle Orland

Sektori

Kimmo Lahtinen; Windows, PS5, Xbox Series X|S

For decades now, I’ve been looking for a twin-stick shooter that fully captures the compulsive thrill of the Geometry Wars franchise. Sektori, a late-breaking addition to this year’s top games list, is the first game I can say does so without qualification.

Like Geometry Wars, Sektori has you weaving through a field filled with simple shapes that quickly fill your personal space with ruthless efficiency. But Sektori advances that basic premise with an elegant “strike” system that lets you dash through encroaching enemies and away from danger with the tap of a shoulder button. Advanced players can get a free, instant strike refill by dashing into an upgrade token, and stringing those strikes together creates an excellent risk-vs-reward system of survival versus scoring.

Sektori also features an excellent Gradius-style upgrade system that forces you to decide on the fly whether to take basic power-ups or save up tokens for more powerful weaponry and/or protection further down the line. And just when the basic gameplay threatens to start feeling stale, the game throws in a wide variety of bosses and new modes that mix things up just enough to keep you twitching away.

Throw in an amazing soundtrack and polished presentation that makes even the most crowded screens instantly comprehensible, and you have a game I can see myself coming back to for years—until my reflexes are just too shot to keep up with the frenetic pace anymore.

-Kyle Orland

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from the University of Maryland. He once wrote a whole book about Minesweeper.

Ars Technica’s Top 20 video games of 2025 Read More »

OpenAI’s child exploitation reports increased sharply this year

During the first half of 2025, the number of CyberTipline reports OpenAI sent was roughly the same as the number of pieces of content those reports covered—75,027 compared to 74,559. In the first half of 2024, it sent 947 CyberTipline reports about 3,252 pieces of content. Both the number of reports and the amount of content they covered saw a marked increase between the two periods.

Content, in this context, could mean multiple things. OpenAI has said that it reports all instances of CSAM, including uploads and requests, to NCMEC. Besides its ChatGPT app, which allows users to upload files—including images—and can generate text and images in response, OpenAI also offers access to its models via an API. The most recent NCMEC count wouldn’t include any reports related to video-generation app Sora, as its September release was after the time frame covered by the update.

The spike in reports follows a similar pattern to what NCMEC has observed at the CyberTipline more broadly with the rise of generative AI. The center’s analysis of all CyberTipline data found that reports involving generative AI saw a 1,325 percent increase between 2023 and 2024. NCMEC has not yet released 2025 data, and while other large AI labs like Google publish statistics about the NCMEC reports they’ve made, they don’t specify what percentage of those reports are AI-related.

OpenAI’s update comes at the end of a year where the company and its competitors have faced increased scrutiny over child safety issues beyond just CSAM. Over the summer, 44 state attorneys general sent a joint letter to multiple AI companies including OpenAI, Meta, Character.AI, and Google, warning that they would “use every facet of our authority to protect children from exploitation by predatory artificial intelligence products.” Both OpenAI and Character.AI have faced multiple lawsuits from families or on behalf of individuals who allege that the chatbots contributed to their children’s deaths. In the fall, the US Senate Committee on the Judiciary held a hearing on the harms of AI chatbots, and the US Federal Trade Commission launched a market study on AI companion bots that included questions about how companies are mitigating negative impacts, particularly to children. (I was previously employed by the FTC and was assigned to work on the market study prior to leaving the agency.)

NASA rewraps Boeing Starliner Astrovan II for Artemis II ride to launch pad

Artemis II, meet Astrovan II.

The first astronauts set to fly by the moon in more than 50 years participated in a practice launch countdown on Saturday, December 20, including taking their first trip on a transport vehicle steeped in almost the entire span of US space history—from Apollo through to the ongoing commercial crew program.

Artemis II astronauts (from right to left) Reid Wiseman, Victor Glover, Christina Koch, and Jeremy Hansen pose for photographs before boarding the Astrovan II crew transport vehicle for a ride to their rocket during a rehearsal of their launch-day activities at NASA’s Kennedy Space Center in Florida on Saturday, Dec. 20, 2025. Credit: NASA/Aubrey Gemignani

Artemis II commander Reid Wiseman, pilot Victor Glover, and mission specialist Christina Koch (all with NASA) and mission specialist Jeremy Hansen, an astronaut with the Canadian Space Agency, began the rehearsal at the Kennedy Space Center in Florida, proceeding as they will when they are ready to fly next year (the Artemis II launch is slated for no earlier than the first week of February and no later than April 2026).

Parked outside of their crew quarters and suit-up room was their ride to their rocket, “Astrovan II,” a modified Airstream motorhome. The almost 25-foot-long (8-meter) crew transport vehicle (CTV) was custom-wrapped with graphics depicting the moon, the Artemis II mission patch, and program insignia.

From Canoo to coach

Airstream’s Atlas Touring Coach, though, was not originally planned as NASA’s Artemis CTV. In July 2023, NASA took delivery of three fully electric vans from Canoo Technologies after the company, a startup based in Torrance, California, was awarded the contract the year before. At the time, NASA touted its selection as focusing on the “crews’ safety and comfort on the way to the [launch] pad.”

Canoo Technologies’ three specially designed, fully electric, environmentally friendly crew transportation vehicles for Artemis missions arrived at Kennedy Space Center on July 11, 2023. With the company now bankrupt, the CTVs will serve as a backup to the Astrovan II. Credit: NASA/Isaac Watson

Six months later, Canoo filed for bankruptcy, and NASA ceased active use of the electric vans, citing a lack of support for its mission requirements. Instead, the agency turned to another of its commercial partners, Boeing, which had its own CTV but no astronauts at present to use it.

Rocket Report: Russia pledges quick fix for Soyuz launch pad; Ariane 6 aims high


South Korean rocket startup Innospace is poised to debut a new nano-launcher.

The fifth Ariane 6 rocket climbs away from Kourou, French Guiana, with two European Galileo navigation satellites. Credit: ESA-CNES-Arianespace

Welcome to Edition 8.23 of the Rocket Report! Several new rockets made their first flights this year. Blue Origin’s New Glenn was the most notable debut, with a successful inaugural launch in January followed by an impressive second flight in November, culminating in the booster’s first landing on an offshore platform. Second on the list is China’s Zhuque-3, a partially reusable methane-fueled rocket developed by the quasi-commercial launch company LandSpace. The medium-lift Zhuque-3 successfully reached orbit on its first flight earlier this month, and its booster narrowly missed landing downrange. We could add China’s Long March 12A to the list if it flies before the end of the year. This will be the final Rocket Report of 2025, but we’ll be back in January with all the news that’s fit to lift.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Rocket Lab delivers for Space Force and NASA. Four small satellites rode a Rocket Lab Electron launch vehicle into orbit from Virginia early Thursday, beginning a government-funded technology demonstration mission to test the performance of a new spacecraft design, Ars reports. The satellites were nestled inside a cylindrical dispenser on top of the 59-foot-tall (18-meter) Electron rocket when it lifted off from NASA’s Wallops Flight Facility. A little more than an hour later, the rocket’s upper stage released the satellites one at a time at an altitude of about 340 miles (550 kilometers). The launch was the starting gun for a proof-of-concept mission to test the viability of a new kind of satellite called DiskSats, designed by the Aerospace Corporation.

Stack ’em high… “DiskSat is a lightweight, compact, flat disc-shaped satellite designed for optimizing future rideshare launches,” the Aerospace Corporation said in a statement. The DiskSats are 39 inches (1 meter) wide, about twice the diameter of a New York-style pizza, and measure just 1 inch (2.5 centimeters) thick. Made of composite carbon fiber, each satellite carries solar cells, control avionics, reaction wheels, and an electric thruster to change and maintain altitude. The flat design allows DiskSats to be stacked one on top of the other for launch. The format also has significantly more surface area than other small satellites with comparable mass, making room for more solar cells for high-power missions or large-aperture payloads like radar imaging instruments or high-bandwidth antennas. NASA and the US Space Force cofunded the development and launch of the DiskSat demo mission.

SpaceX warns of dangerous Chinese launch. China’s recent deployment of nine satellites occurred dangerously close to a Starlink satellite, SpaceX’s vice president of Starlink engineering said. Michael Nicolls wrote in a December 12 social media post that there was a 200-meter close approach between a satellite launched December 10 on a Chinese Kinetica-1 rocket and SpaceX’s Starlink-6079 spacecraft at 560 kilometers (348 miles) altitude, Aviation Week and Space Technology reports. “Most of the risk of operating in space comes from the lack of coordination between satellite operators—this needs to change,” Nicolls wrote.

Blaming the customer… The company in charge of the Kinetica-1 rocket, CAS Space, responded to Nicolls’ post on X saying it would “work on identifying the exact details and provide assistance.” In a follow-up post on December 13, CAS Space said the close call, if confirmed, occurred nearly 48 hours after the satellite separated from the Kinetica-1 rocket, by which time the launch mission had long concluded. “CAS Space will coordinate with satellite operators to proceed.”

A South Korean startup is ready to fly. Innospace, a South Korean space startup, will launch its independently developed commercial rocket, Hanbit-Nano, as soon as Friday, the Maeil Business Newspaper reports. The rocket will lift off from the Alcântara Space Center in Brazil. The small launcher will attempt to deliver eight small payloads, including five deployable satellites, into low-Earth orbit. The launch was delayed two days to allow time for technicians to replace components of the first stage oxidizer supply cooling system.

Hybrid propulsion… This will be the first launch of Innospace’s Hanbit-Nano rocket. The launcher has two stages and stands 71 feet (21.7 meters) tall with a diameter of 4.6 feet (1.4 meters). Hanbit-Nano is a true micro-launcher, capable of placing up to 200 pounds (90 kilograms) of payload mass into Sun-synchronous orbit. It has a unique design, with hybrid engines burning paraffin as the fuel and liquid oxygen as the oxidizer.

Ten years since a milestone in rocketry. On December 21, 2015, SpaceX launched the Orbcomm-2 mission on an upgraded version of its Falcon 9 rocket. That night, just days before Christmas, the company successfully landed the first stage for the first time. Ars has reprinted a slightly condensed chapter from the book Reentry, authored by Senior Space Editor Eric Berger and published in 2024. The chapter begins in June 2015 with the failure of a Falcon 9 rocket during launch of a resupply mission to the International Space Station and ends with a vivid behind-the-scenes recounting of the historic first landing of a Falcon 9 booster to close out the year.

First-person account… I have my own memory of SpaceX’s first rocket landing. I was there, covering the mission for another publication, as the Falcon 9 lifted off from Cape Canaveral, Florida. In an abundance of caution, Air Force officials in charge of the Cape Canaveral spaceport closed large swaths of the base for the Falcon 9’s return to land. The decision shunted VIPs and media representatives to viewing locations outside the spaceport’s fence, so I joined SpaceX’s official press room at the top of a seven-floor tower near the Port Canaveral cruise terminals. The view was tremendous. We all knew to expect a sonic boom as the rocket came back to Florida, but its arrival was a jolt. The next morning, I joined SpaceX and a handful of reporters and photographers on a chartered boat to get a closer look at the Falcon 9 standing proudly after returning from space.

Roscosmos targets quick fix to Soyuz launch pad. Russian space agency Roscosmos says it expects a damaged launch pad critical to International Space Station operations to be fixed by the end of February, Aviation Week and Space Technology reports. “Launch readiness: end of February 2026,” Roscosmos said in a statement Tuesday. Russia had been scrambling to assess the extent of repairs needed to Pad 31 at the Baikonur Cosmodrome in Kazakhstan after the November 27 flight of a Soyuz-2.1a rocket damaged key elements of the infrastructure. The pad is the only one capable of supporting Russian launches to the ISS.

Best-case scenario… A quick repair to the launch pad would be the best-case scenario for Roscosmos. A service structure underneath the rocket was unsecured during the launch of a three-man crew to the ISS last month. The structure fell into the launch pad’s flame trench, leaving the complex without the service cabin technicians use to work on the Soyuz rocket before liftoff. Roscosmos said a “complete service cabin replacement kit” has arrived at the Baikonur Cosmodrome, and more than 130 staff are working in two shifts to implement the repairs. A fix by the end of February would allow Russia to resume cargo flights to the ISS in March.

Atlas V closes out an up-and-down year for ULA. United Launch Alliance aced its final launch of 2025, a predawn flight of an Atlas V rocket Tuesday carrying 27 satellites for Amazon’s recently rebranded Leo broadband Internet service, Spaceflight Now reports. The rocket flew northeast from Cape Canaveral to place the Amazon Leo satellites into low-Earth orbit. This was ULA’s fourth launch for Amazon’s satellite broadband venture, previously known as Project Kuiper. ULA closes out 2025 with six launches, one more than the company achieved last year. But ULA’s new Vulcan rocket launched just once this year, disappointingly short of the company’s goal to fly Vulcan up to 10 times.

Taking stock of Amazon Leo… This year marked the start of the deployment of Amazon’s operational satellites. There are now 180 Amazon Leo satellites in orbit after Tuesday’s launch, well short of the FCC’s requirement for Amazon to deploy half of its planned 3,232 satellites by July 31, 2026. Amazon won’t meet the deadline, and it’s likely the retail giant will ask government regulators for a waiver or extension to the deadline. Amazon’s factory is hitting its stride producing and delivering Amazon Leo satellites. The real question is launch capacity. Amazon has contracts to launch satellites on ULA’s Atlas V and Vulcan rockets, Europe’s Ariane 6, and Blue Origin’s New Glenn. Early next year, a batch of 32 Amazon Leo satellites will launch on the first flight of Europe’s uprated Ariane 64 rocket from Kourou, French Guiana. (submitted by EllPeaTea)

A good year for Ariane 6. Europe’s Ariane 6 rocket launched four times this year after a debut test flight in 2024. The four successful missions deployed payloads for the French military, Europe’s weather satellite agency, the European Union’s Copernicus environmental monitoring network, and finally, on Wednesday, the European Galileo navigation satellite fleet, Space News reports. This is a strong showing for a new rocket flying from a new launch pad and a faster ramp-up of launch cadence than any medium- or heavy-lift rocket in recent memory. All five Ariane 6 launches to date have used the Ariane 62 configuration with two strap-on solid rocket boosters. The more powerful Ariane 64 rocket, with four strap-on motors, will make its first flight early next year.

Aiming high… This was the first launch using the Ariane 6 rocket’s ability to fly long-duration missions lasting several hours. The rocket’s cryogenic upper stage, with a restartable Vinci engine, took nearly four hours to inject two Galileo navigation satellites into an orbit more than 14,000 miles (nearly 23,000 kilometers) above the Earth. The flight profile put more stress on the Ariane 6 upper stage than any of the rocket’s previous missions, but the rocket released its payloads into an on-target orbit. (submitted by EllPeaTea)

ESA wants to do more with Ariane 6’s kick stage. The European Space Agency plans to adapt a contract awarded to ArianeGroup in 2021 for an Ariane 6 kick stage to cover its evolution into an orbital transfer vehicle, European Spaceflight reports. The original contract was for the development of the Ariane 6’s Astris kick stage, an optional addition for Ariane 6 missions to deploy payloads into multiple orbits or directly inject satellites into geostationary orbit. Last month, ESA’s member states committed approximately 100 million euros ($117 million) to refocus the Astris kick stage into a more capable Orbital Transfer Vehicle (OTV).

Strong support from Germany… ESA’s director of space transportation, Toni Tolker-Nielsen, said the performance of the Ariane 6 OTV will be “well beyond” that of the originally conceived Astris kick stage. The funding commitment obtained during last month’s ESA ministerial council meeting includes strong support from Germany, Tolker-Nielsen said. Under the new timeline, a protoflight model of the OTV is expected to be ready for ground qualification by the end of 2028, with an inaugural flight following in 2029. (submitted by EllPeaTea)

Another Starship clone in China. Every other week, it seems, a new Chinese launch company pops up with a rocket design and a plan to reach orbit within a few years. For a long time, the majority of these companies revealed designs that looked a lot like SpaceX’s Falcon 9 rocket. Now, Chinese companies are starting to introduce designs that appear quite similar to SpaceX’s newer, larger Starship rocket, Ars reports. The newest entry comes from a company called “Beijing Leading Rocket Technology.” This outfit took things a step further by naming its vehicle “Starship-1,” adding that the new rocket will have enhancements from AI and is billed as being a “fully reusable AI rocket.”

Starship prime… China has a long history of copying SpaceX. The country’s first class of reusable rockets, which began flying earlier this month, shows strong similarities to the Falcon 9 rocket. Now, it’s Starship. The trend began with the Chinese government. In November 2024, the government announced a significant shift in the design of its super-heavy lift rocket, the Long March 9. Instead of the previous design, a fully expendable rocket with three stages and solid rocket boosters strapped to the sides, the country’s state-owned rocket maker revealed a vehicle that mimicked SpaceX’s fully reusable Starship. At least two more companies have announced plans for Starship-like rockets using SpaceX’s chopstick-style method for booster recovery. Many of these launch startups will not grow past the PowerPoint phase, of course.

Next three launches

Dec. 19: Hanbit-Nano | Spaceward | Alcântara Launch Center, Brazil | 18:45 UTC

Dec. 20: Long March 5 | Unknown Payload | Wenchang Space Launch Site, China | 12:30 UTC

Dec. 20: New Shepard | NS-37 crew mission | Launch Site One, Texas | 14:00 UTC

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: Russia pledges quick fix for Soyuz launch pad; Ariane 6 aims high

Two space startups prove you don’t need to break the bank to rendezvous in space

It may be happening quietly, but there is a revolution taking place with in-space transportation, and it opens up a world of possibilities.

In January, a small spacecraft built by a California-based company called Impulse Space launched along with a stack of other satellites on a Falcon 9 rocket. Upon reaching orbit, the rocket’s upper stage sent the satellites zipping off on their various missions.

And so it went with Mira, an orbital transfer vehicle built by Impulse. Mira dropped off several small CubeSats and then performed a number of high-thrust maneuvers to demonstrate its capabilities. This was the second flight by a Mira spacecraft, so Impulse Space was eager to continue testing the vehicle in flight.

Giving up control

This was all well and good up until this summer, when a funny thing happened. Impulse handed control of Mira over to another company, which had installed its own software package on the vehicle. And this second company, Starfish Space, took control.

This was more than a little weird, acknowledged Eric Romo, the president and chief operating officer of Impulse Space, in an interview.

“I would walk past mission control, and our teams would be on a call together, and I would just pop my head in and say, ‘Hey, don’t crash spaceship, please,’” Romo said. “It was definitely a new thing.”

But Starfish Space did not crash Mira. Rather, it activated its camera on board the spacecraft and started flying the vehicle. To what end? Founded in 2019, the Washington-based company seeks to build affordable spacecraft that can service satellites in space, providing propulsion or other aids to extend their lifetimes.

Now, flying Mira, the company sought to demonstrate that a single lightweight camera system, along with its closed-loop guidance, navigation, and control software, could autonomously rendezvous with another spacecraft. In this case, it was the very first Mira spacecraft launched by Impulse in November 2023. This vehicle no longer has propellant on board to control its orientation, but its solar panels periodically receive enough charge to allow it to communicate with Impulse’s engineers in California.
