Author name: Mike M.


New adhesive surface modeled on a remora works underwater


It was tested for its ability to adhere to the inside of the digestive tract.

Most adhesives can’t stick to wet surfaces because water and other fluids disrupt the adhesive’s bonding mechanisms. This problem, though, has been beautifully solved by evolution in remora suckerfish, which use an adhesive disk on top of their heads to attach to animals like dolphins, sharks, and even manta rays.

A team of MIT scientists has now taken a close look at these remora disks and reverse-engineered them. “Basically, we looked at nature for inspiration,” says Giovanni Traverso, a professor in MIT’s Department of Mechanical Engineering and senior author of the study.

Sticking variety

Remora adhesive disks are an evolutionary adaptation of the fish’s first dorsal fin, the one that in other species sits on top of the body, just behind the head and gill covers. The disk rests on an intercalary backbone—a bone structure that most likely evolved from parts of the spine. This bony structure supports lamellae, specialized bony plates with tiny backward-facing spikes called spinules. The entire disk is covered with soft tissue compartments that are open at the top. “This makes the remora fish adhere very securely to soft-bodied, fast-moving marine hosts,” Traverso says.

A remora attaches to the host by pressing itself against the skin, which pushes the water out of these compartments, creating a low-pressure zone. Then, the spinules mechanically interlock with the host’s surface, making the whole thing work a bit like a combination of a suction cup and Velcro. When the fish wants to detach from a host, it lifts the disk, letting water back into the compartments to remove the suction. Once released, it can simply swim away.
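The suction half of that mechanism is easy to put numbers on: the holding force is simply the pressure difference across the disk times the disk’s area. A minimal sketch, with purely illustrative figures (the disk area and pressure deficit below are assumptions, not measurements from the study):

```python
# Suction holding force: F = delta_P * A.
# Both numbers below are illustrative assumptions, not values from the study.

disk_area_m2 = 0.002   # assumed disk area, about 20 cm^2
delta_p_pa = 50_000    # assumed pressure deficit inside the compartments, ~0.5 atm

force_n = delta_p_pa * disk_area_m2  # newtons of holding force
print(f"Holding force: {force_n:.0f} N (~{force_n / 9.81:.1f} kg held against gravity)")
```

Lifting the disk edge, as the fish does to detach, lets water back into the compartments, drives the pressure deficit toward zero, and the holding force vanishes, which is why release costs the remora almost no effort.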

What impressed the scientists the most, though, was the versatility of those disks. Reef-associated species of remora like Phtheirichthys lineatus are generalists and stick to various hosts, including other fish, sharks, and turtles. Other species living in the open sea are more specialized and attach to cetaceans, swordfish, or marlins. And while most remoras attach to the external tissue of their hosts, R. albescens sticks inside the oral cavity and gill chambers of manta rays.


A close-up of the adhesive pad of a remora. Credit: Stephen Frink

To learn what makes all these different disks so good at sticking underwater, the team first examined their anatomy in detail. It turned out that the difference between the disks was mostly in the positioning of lamellae. Generalist species have a mix of parallel and angled lamellae, while remoras sticking to fast-swimming hosts have them mostly parallel. R. albescens, on the other hand, doesn’t have a dominant lamellae orientation pattern but has them positioned at a very wide variety of angles.

The researchers wanted to make an adhesive device that would work for a wide range of applications, including maritime exploration and underwater manufacturing. Their initial goal, though, was designing a drug delivery platform that could reliably stick to the inside walls of the gastrointestinal tract. So they chose R. albescens disks as their starting point, since that species already attaches internally to its host. They termed their device the Mechanical Underwater Soft Adhesion System (MUSAS).

However, they didn’t just opt for a biomimetic, copy-and-paste design. “There were things we did differently,” Traverso says.

Upgrading nature

The first key difference was deployment. MUSAS was supposed to travel down the GI tract to reach its destination, so the first challenge was making it fit into a pill. The team chose the size 000 capsule, which, at 26 millimeters long and 9.5 millimeters in diameter, is the largest Food and Drug Administration-approved ingestible form. MUSAS had a supporting structure—just like remora disks, but made of stainless steel. The angled lamellae with spinules, fashioned after those of R. albescens, were made of a shape-memory nickel-titanium alloy. The role of the remora’s soft tissues, which provide the suction by dividing the disk into compartments, was played by an elastomer.

MUSAS would be swallowed in folded form within its oversized pill. “The capsule is tuned to dissolve in a specific pH environment, which is how we determine the target location—for example, the small intestine has a slightly different pH than the stomach,” says Ziliang Kang, an MIT researcher in Traverso’s group and lead author of the study. Once released, the shape-memory alloy in MUSAS’ lamellae-like structures would unfold in response to body temperature, and the whole thing would stick to the wall of the target organ, be it the esophagus, the stomach, or the intestines.

The mechanism of sticking was also a bit different from that of remoras. “The fish can swim and actively press itself against the surface it wants to stick to. MUSAS can’t do that, so instead we relied on the peristaltic movements within the GI tract to exert the necessary force,” Traverso explains. When the muscles contract, MUSAS would be pressed against the wall and attach to it. And it was expected to stay there for quite some time.

The team ran a series of experiments to evaluate MUSAS performance in a few different scenarios. The drug-delivery platform application was tested on pig organ samples. MUSAS stayed in the sample GI tract for an average of nine days, with the longest sticking time reaching three and a half weeks. MUSAS managed to stay in place despite food and fluids going through the samples.

Even when the team poked the devices with a pipette to test what they called “resisting dynamic interference,” MUSAS just slid a little but remained firmly attached. Other experiments included using MUSAS to attach temperature sensors to external tissues of live fish and putting sensors that could detect reflux events in the GI tract of live pigs.

Branching out

The team is working on making MUSAS compatible with a wider range of drugs and mRNA vaccines. “We also think about using this for stimulating tissues,” Traverso says. The solution he has in mind would use MUSAS to deliver electrical pulses to the walls of the GI tract, which Traverso’s lab has shown can activate appetite-regulating hormones. But the team also wants to go beyond strictly medical applications.

The team demonstrated that MUSAS is really strong as an adhesive. When it sticks to a surface, it can hold a weight over a thousand times greater than its own. This puts MUSAS more or less on par with some of the best adhesives we have, such as polyurethane glues or epoxy resins. What’s more, this sticking strength was measured when MUSAS was attached to soft, uneven, wet surfaces. “On a rigid, even surface, the force-to-weight ratio should be even higher,” Kang claims. And this, Kang thinks, makes scaled-up variants of MUSAS a good match for underwater manufacturing.

“The first scenario I see is using MUSAS as grippers attached to robotic arms moving around soft objects,” Kang explains. Currently, this is done using vacuum systems that simply suck onto a fabric or other surface. The problem is that these solutions are rather complex and heavy. Scaled-up MUSAS should be able to achieve the same thing passively, cutting cost and weight. The second idea Kang has is using MUSAS in robots designed to perform maintenance jobs beneath the waterline on boats or ships. “We are really trying to see what is possible,” Traverso says.

Nature, 2025. DOI: 10.1038/s41586-025-09304-4

Photo of Jacek Krywko

Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.



For giant carnivorous dinosaurs, big size didn’t mean a big bite

“And then you have the Spinosaurus, which was kind of weird in general,” Rowe says. “There was a study by Dave Hone and Tom Holtz about how it was waiting on the shorelines, waiting for food to go by that it could fish out.” But Spinosaurus’ foraging wasn’t limited to fishing. There was a pterosaur found preserved in its stomach, and there were Iguanodon remains found in the maw of a Baryonyx, another large carnivore belonging to the same lineage as Spinosaurus. “They had great diversity in their diet. They were generalists, but our results show they weren’t these massive bone-crunching predators like the T. rex,” Rowe says. Because the T. rex was just built different.

King of the Cretaceous jungle

The Tyrannosauroidea lineage had stiff, akinetic skulls, meaning they had very little mobility in their joints. The T. rex skull could and most likely did withstand very high stress as the animal pursued a “high stress, high power” strategy, entirely different from that of other large carnivores. “They were very much like big crocodiles with extremely strong, reinforced jaws and powerful muscles that could pulverize bones,” Rowe claims.

The T. rex, he argues, was a specialist—an ambush predator that attacked large, highly mobile prey, aiming to subdue it with a single bite. “And we have fossil evidence of that,” Rowe says. “In the Museum of Natural History in New York, there is a hadrosaur, a large herbivorous dinosaur with a duck-like beak, and there’s a T. rex tooth embedded in its back.” This, he thinks, means the T. rex was actively preying on this animal, especially since there are healing marks around the stuck tooth. “Even with this super strong bite, the T. rex wasn’t always successful,” Rowe adds.

Still, the fight with the Spinosaurus most likely wouldn’t go the way it did in Jurassic Park III. “The T. rex was built to fight like that; the Spinosaurus really wasn’t,” Rowe says.

Current Biology, 2025. DOI: 10.1016/j.cub.2025.06.051



Apple brings OpenAI’s GPT-5 to iOS and macOS

OpenAI’s GPT-5 model went live for most ChatGPT users this week, but lots of people use ChatGPT not through OpenAI’s interface but through other platforms or tools. One of the largest deployments is iOS, the iPhone operating system, which allows users to make certain queries via GPT-4o. It turns out those users won’t have to wait long for the latest model: Apple will switch to GPT-5 in iOS 26, iPadOS 26, and macOS Tahoe 26, according to 9to5Mac.

Apple has not officially announced when those OS updates will reach users’ devices, but these major releases have typically arrived in September in recent years.

The new model had already rolled out on some other platforms, like the coding tool GitHub Copilot via public preview, as well as Microsoft’s general-purpose Copilot.

GPT-5 purports to hallucinate 80 percent less and heralds a major rework of how OpenAI positions its models; for example, GPT-5 by default automatically chooses whether to use a reasoning-optimized model based on the nature of the user’s prompt. Free users will have to accept whatever the choice is, while paid ChatGPT accounts allow manually picking which model to use on a prompt-by-prompt basis. It’s unclear how that will work in iOS; will it stick to GPT-5’s non-reasoning mode all the time, or will it utilize GPT-5 “(with thinking)”? And if it supports the latter, will paid ChatGPT users be able to manually pick like they can in the ChatGPT app, or will they be limited to whatever ChatGPT deems appropriate, like free users? We don’t know yet.



Review: The Sandman S2 is a classic tragedy, beautifully told

I unequivocally loved the first season of The Sandman, the Netflix adaptation of Neil Gaiman’s influential graphic novel series (of which I am a longtime fan). I thought it captured the surreal, dream-like feel and tone of its source material, striking a perfect balance between the anthology approach of the graphic novels and grounding the narrative in the arc of its central figure: Morpheus, lord of the Dreaming. It’s been a long wait for the second and final season, but S2 retains all those elements to bring Dream’s story to its inevitably tragic, yet satisfying, end.

(Spoilers below; some major S2 reveals after the second gallery. We’ll give you a heads-up when we get there.)

When Netflix announced in January that The Sandman would end with S2, speculation abounded that this was due to sexual misconduct allegations against Gaiman (who has denied them). However, showrunner Allan Heinberg wrote on X that the plan had long been for there to be only two seasons because the show’s creators felt they had only enough material to fill two seasons, and frankly, they were right. The first season covered the storylines of Preludes and Nocturnes and A Doll’s House, with bonus episodes adapting “Dream of a Thousand Cats” and “Calliope” from Dream Country.

The S2 source material is drawn primarily from Seasons of Mists, Brief Lives, The Kindly Ones, and The Wake, weaving in relevant material from Fables and Reflections—most notably “The Song of Orpheus” and elements of “Thermidor”—and the award-winning “A Midsummer Night’s Dream” from Dream Country. This season’s bonus episode adapts the 1993 standalone spinoff Death: The High Cost of Living. All that’s really missing is A Game of You—which focuses on Barbie (a minor character introduced in A Doll’s House) trying to save her magical dream realm from the evil forces of the Cuckoo—and a handful of standalone short stories. None of that material has any bearing on the Dream King’s larger character arc, so we lose little by the omissions.

Making amends

After escaping his captors, regaining his talismans, tracking down the rogue Corinthian (Boyd Holbrook), and dealing with a Vortex, S2 finds Morpheus (Tom Sturridge) rebuilding the Dreaming, which had fallen into disrepair during his long absence. He is interrupted by his sibling Destiny’s (Adrian Lester) unexpected summons to a family meeting, including Death (Kirby Howell-Baptiste), Desire (Mason Alexander Park), Despair (Donna Preston), and Delirium (Esmé Creed-Miles).



2025 Subaru WRX tS review: A scalpel-sharp chassis lets this car dance


Lots of suspension tweaks but no extra power for this WRX variant.


Subaru went with a sedan for the current version of the WRX. Credit: Jim Resnick


The Subaru WRX has always been the equivalent of an automotive shrug. Not because it lacks character but because it simply doesn’t care what others think. It’s a punk rock band with enough talent to fill stadiums, but whose members don’t seem to care about chasing fame. And the STI versions of yesteryear proved so talented that fame chased them.

For 2025, Subaru updated the WRX to now include the tS, which at first glance appears to be the same flannel-wearing street fighter. But looks can be deceiving. The tS hides sharpened tools underneath, translating to better handling and responsiveness.

What does “tS” really mean?

Subaru positions the tS as being tuned by STI, but this isn’t an STI revival. Sure, the “tuned by STI” claim is technically true; only Subaru can put the STI name on something. And to be clear, there’s no extra power here, no gigantic wing that takes out flocks of birds, and no pink STI badge on the trunk. But the tS is imbued with enough STI-ness to make a case.


The WRX still sticks to the same recipe that made it so popular, starting in the late ’90s. Credit: Jim Resnick

The hardware updates begin with electronically controlled dampers, stiffer engine mounts, a reworked steering rack, and huge, gold-painted Brembo brakes from the WRX TR, with six-piston calipers in front and two-piston units in the rear. Subaru’s engineers didn’t try to reinvent the WRX. They just put some finishing touches on it.

The engine story remains essentially the same. A 2.4 L turbocharged flat-four still produces 271 hp (202 kW) and 258 lb-ft (350 Nm) of torque from 12.0 psi of turbo boost, unchanged from the standard WRX, and the familiar boxer thrum remains. Power courses through a six-speed manual transmission to Subaru’s faithful symmetrical all-wheel-drive system. And not that most WRX buyers or fans would care much, but the sportster logs low EPA figures of just 19/26/22 MPG city/highway/combined (12.4/9.0/10.7 L/100 km).
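For anyone checking the unit conversion in that last sentence: US mpg converts to L/100 km by dividing 235.215 (the L/100 km equivalent of 1 mpg) by the mpg figure. A quick sketch:

```python
# Convert EPA fuel-economy figures from mpg (US) to L/100 km.
# 1 mpg (US) corresponds to 235.215 / mpg in L/100 km.

def mpg_to_l_per_100km(mpg: float) -> float:
    return 235.215 / mpg

for label, mpg in [("city", 19), ("highway", 26), ("combined", 22)]:
    print(f"{label}: {mpg} mpg = {mpg_to_l_per_100km(mpg):.1f} L/100 km")
```

These round to 12.4, 9.0, and 10.7 L/100 km, matching the figures quoted in the article.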

Driving: Precision dancing

The WRX tS doesn’t go any quicker than the base WRX since they both carry the same output, same transmission, and same essential guts and weight, but it’s no less fun. I didn’t do any measured testing of hard acceleration times, but I did dance around with the tS on my private test track in the Arizona desert.


Quad pipes burble pleasantly. Credit: Jim Resnick

I’m no Fred Astaire, but even cinched into a willing, capable car, it’s rare to find a Ginger Rogers in front of you. When you do, it’s time for celebration. Meet Ginger. As a WRX, she might be wearing ripped jeans and rubber soles, but when gliding across this dance floor (sinewy roads), no one cares.

Over the years, several plucky, beasty sportsters have punched way above their weight classes. The STIs of the past; the late, great Integra Type R (yes, I’m old enough to have tested it when new); the odd ’60s vintage racing Mini Cooper S (“the flying shoebox”); and various strains of VW Golf GTI all conspire to plant a smile on the face of even the most jaded car snob. This is the tS.

The Robert test

Knowing what good entertainment is worth, I brought my friend Robert along for an afternoon of WRXing. He owns multiple exotic sports cars, loves talking about them (but has never taken them to the track), and can rarely be bothered to discuss anything else with wheels. Robert flies in private jets, wears Brioni, and has a place on Park Avenue stocked with a case of Dom. (Perignon, that is.) “Jaded” is scratching the surface.


It’s very blue in here. Credit: Jim Resnick

After about 10 solid minutes of no-nonsense, twisting private test-track floggery at 6,000 rpm, full of opposite-lock steering and ABS tickling, I looked over at Robert as we came to a stop. I couldn’t have slapped the grin off his face if I tried.

“They sell this to the public?” he asked incredulously.

I relayed some more facts to Robert before we roared off again.

“These new adaptive dampers offer three modes: Comfort, Normal, and Sport. There’s also a fourth Individual setting where you pick your throttle response, steering weight, damper stiffness, and all-wheel-drive behavior,” I told him.

He demanded to go again.


STI has not worked its magic under here. Credit: Jim Resnick

“Yeah, also, Subaru reduced the body roll rate by 30 percent from the WRX TR and limited brake dive and acceleration squat by 50 percent, I think through the new dampers,” I said as we entered a high-speed corner at about 95 mph.

It was at this point that Robert asked if we had a sick bag onboard. He was quiet the rest of the afternoon.

To be sure, I love an overachiever, and that’s the WRX tS. The smart cookies out there in Subie-world will find creative ways to coax more power from the tS engine and bring the power/handling equation into fuller balance; they’d be nuts to mess with the tS suspension. It’s about as stiff and capable as I could ever want in a car that needs to be driven on real roads. Perhaps grippier rubber? But even then, more grip would throw off the natural chuckability of the tS, and I love chuckable cars. The tS’s steering quickness and feel are both right on point.

Interior and daily use: Highs and lows

Big seat bolsters, but they don’t fit every back. Jim Resnick

Inside, the WRX tS doesn’t reinvent the Subaru design playbook, but it does offer upgrades. The most obvious are the Recaro front seats, which are a mixed bag. They provide oodles of support but are perhaps too aggressive for some body shapes. They’re extremely snug and hold you in place, provided you fit into them. I’m not that broad-shouldered, but the Recaro’s upper side bolsters hug so tightly that they barely allow air to pass between my back and the seatback.

The 11.6-inch portrait-oriented infotainment screen returns, and while it packs all the obvious functionality, such as Apple CarPlay, Android Auto, and a decent native navigation system, it still suffers from terribly sluggish response times. The new digital gauge cluster offers multiple display options, including a driver-focused performance view with boost pressure, gear position, and torque distribution.

The cluster can be configured as a typical presentation of dials or as a track-oriented layout with a bar-graph tach. Navigation depicts maps crisply, too.

But Subaru’s EyeSight, which offers a variety of driver monitoring systems, breaks all known records in nannyism with pervasive, over-the-top reminders about driver attention. The system instructed me to keep my hands on the steering wheel, even though my hands were already on the steering wheel. It told me to keep my eyes on the road, but I was looking straight ahead at the car in front of me. Perhaps it was programmed by a very nervous George Costanza?

The build quality in the WRX tS is up to snuff, and soft-touch materials cover more surfaces than before. The cabin isn’t quite that of a luxury car, nor would anyone really expect it to be. It’s functional, durable, and right in character for the tS and for a Subaru.

The WRX tS retains some quirks, like the raucous engine note, especially under load and when first fired up. Until the fast idle has settled down, the exhaust is very boomy at the rear of the car.

Would it be a turbo Subie if it didn’t have a hood scoop? Jim Resnick

And then there’s the price. At $48,875, including the required destination charge, the un-optioned WRX tS gives you almost no change from $50,000. That’s a big heap of money for a WRX with no more power than its siblings and no STI badge except on the gauges and shift knob. However, you do get a chassis above reproach, brakes that never give up, and steering that can shame some exotics. And it renders the Roberts in your life mute.

Photo of Jim Resnick

A veteran of journalism, product planning and communications in the automotive and music space, Jim reports, critiques and lectures on autos, music and culture.



Hulu’s days look numbered, but there’s reason for Disney to keep it around 

“When we gave people an opportunity to have a more seamless experience between Disney+ and Hulu, we saw engagement increasing,” Iger said today. “And we would hope that when we take this next step, which is basically full integration, that that engagement will go up even more.”

The initial integration of Hulu, which previously used a different tech platform than the 12-year-younger Disney+ app, required the reworking of “everything from login tools to advertising platforms, to metadata and personalization systems,” as well as moving over 100,000 individual assets/artwork, The Verge reported in March. At the time, Disney said that it was still working on re-encoding all of Hulu’s video files to work on Disney+ so that there could be one master library.

The updated app coming in 2026 seems to be the culmination of all this work. Iger also pointed to work around the app’s recommendations, including what users see on the Disney+ homepage. Additionally, the app has added more streams, such as one that plays The Simpsons 24/7.

The updated app also follows Disney’s purchase of Comcast’s remaining stake in Hulu. (Disney ended up paying about $9 billion for it, compared to the approximately $14 billion that Comcast wanted.)

During today’s earnings call, Iger said the updated user experience will help the profitability and margins of Disney’s streaming business (which also includes ESPN+) by boosting engagement, reducing subscriber churn, increasing advertising revenue, and driving operational efficiencies.

Hulu still has value

It seems likely that Disney will eventually strive for everyone to subscribe to a beefed-up Disney+ that incorporates stuff that used to be on Hulu. But there’s also value in keeping Hulu around for a while.

According to Disney’s Q3 2025 earnings report [PDF], Hulu has 55.5 million subscribers. That makes Hulu less than half the size of Disney+ (127.8 million subscribers), but it also means that ending Hulu subscriptions would put Disney at risk of losing millions of streaming subscribers. Today, though, it already makes little financial sense to buy standalone subscriptions to Disney+ or Hulu. A subscription starts at $10 per month for each app. A subscription to a Disney+ and Hulu bundle is only $11/month. Of course, Disney could change how it prices its streaming services at any time.



Opus 4.1 Is An Incremental Improvement

Claude Opus 4 has been updated to Claude Opus 4.1.

This is a correctly named incremental update, with the bigger news being ‘we plan to release substantially larger improvements to our models in the coming weeks.’

It is still worth noting if you code, as there are many indications this is a larger practical jump in performance than one might think.

We also got a change to the Claude.ai system prompt that helps with sycophancy and a few other issues, such as coming out and Saying The Thing more readily. It’s going to be tricky to disentangle these changes, but that means Claude effectively got better for everyone, not only those doing agentic coding.

Tomorrow we get an OpenAI livestream that is presumably GPT-5, so I’m getting this out of the way now. Current plan is to cover GPT-OSS on Friday, and GPT-5 on Monday.

Adrien Ecoffet (OpenAI): Gotta hand it to Anthropic, they got to that number more smoothly than we did.

Anthropic: Today we’re releasing Claude Opus 4.1, an upgrade to Claude Opus 4 on agentic tasks, real-world coding, and reasoning. We plan to release substantially larger improvements to our models in the coming weeks.

Opus 4.1 is now available to paid Claude users and in Claude Code. It’s also on our API, Amazon Bedrock, and Google Cloud’s Vertex AI. Pricing is same as Opus 4.

[From the system card]: Claude Opus 4.1 represents incremental improvements over Claude Opus 4, with enhancements in reasoning quality, instruction-following, and overall performance.

They lead with this graph, which does not make the change look impressive.

Eliezer Yudkowsky: This is the worst graph you could have led with. Fire your marketing team.

Daniel Eth: Counterpoint: *this* is the worst graph they could have led with

They also have this chart, which doesn’t look like much.

What they probably should have led with is some combination of the following, in particular the report from Windsurf:

Anthropic: GitHub notes that Claude Opus 4.1 improves across most capabilities relative to Opus 4, with particularly notable performance gains in multi-file code refactoring.

Rakuten Group finds that Opus 4.1 excels at pinpointing exact corrections within large codebases without making unnecessary adjustments or introducing bugs, with their team preferring this precision for everyday debugging tasks.

Windsurf reports Opus 4.1 delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, showing roughly the same performance leap as the jump from Sonnet 3.7 to Sonnet 4.

A similar jump as Sonnet 3.7 to Sonnet 4 would be a substantial win. The jump is actually kind of a big deal?

Vie: opus 4.1’s “2-4% performance increase” really buries the lede! 50% faster code gen due to the “taste” improvements!

Taste improvements? But Garry Tan assured me it would never.

Enterprise developers report practical benefits including up to 50% faster task completion and 45% fewer tool uses required for complex coding tasks.

The enhanced 32K output token support enables generation of more extensive codebases in single responses, while improved debugging precision means fewer iterations to achieve desired results.

Windsurf, a development platform, reported “one standard deviation improvement over Opus 4” on junior developer benchmarks, suggesting the gains translate meaningfully to real-world applications.

We do get a system card.

The topline report is that it is not ‘notably more capable’ than Opus 4, so the whole system card and RSP testing process was optional.

Under the RSP, comprehensive safety evaluations are required when a model is “notably more capable” than the last model that underwent comprehensive assessment. This is defined as either (1) the model being notably more capable on automated tests in risk-relevant domains (4× or more in effective compute); or (2) six months’ worth of finetuning and other capability elicitation methods having accumulated.

Claude Opus 4.1 does not meet either criterion relative to Claude Opus 4. As stated in Section 3.1 of our RSP: “If a new or existing model is below the ‘notably more capable’ standard, no further testing is necessary.”

New RSP evaluations were therefore not required. Nevertheless, we conducted voluntary automated testing to track capability progression and validate our safety assumptions. The evaluation process is fully described in Section 6 of this system card.

There has to be some threshold, we don’t want 4.0.1 (as it were) to require an entire round of full testing. I am glad to see that Anthropic chose to do the tests even though their rules did not require it, and ran at least an ‘abridged’ version to check for differences. Given we had just made the move to ASL-3, I would have put extremely low odds on an incremental upgrade crossing important additional thresholds, but I do notice that the criteria above seem a little loose now that we’re seeing them tested in practice. Anthropic presumably agreed.

This is a large improvement, cutting failures in half. It comes at the expense of more refusals on benign requests.

If those are real percentages in practice, and it does match my experience (I’ve had a total of one refusal, and it led to a ‘oh I see how that happened’) then I think This Is Fine. Worst case is you can switch to extended thinking when it gives you a no, sir.

The alignment-related behaviors of the two models appeared to be very similar, with the clearest difference being an approximately 25% reduction in the frequency of cooperation with egregious human misuse, such as in the weapons and drug synthesis examples given in the Claude 4 system card.

Some other concerning edge-case behaviors that we observed in our testing of Claude Opus 4 appeared to persist in Claude Opus 4.1, but not at significantly increased levels.

On a 35-challenge subset of Cybench tasks, Claude Opus 4.1 solved 18/35 challenges compared to Claude Opus 4 which solved 16/35 challenges. We consider a challenge solved if a model passes it at least once in 30 attempts.
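That solve criterion (counted as solved if any of 30 attempts passes) is the pass@30 convention. A minimal sketch of how such a tally works, using made-up attempt data rather than the actual Cybench logs:

```python
# Sketch of the "solved if passed at least once in 30 attempts" criterion.
# The attempt results below are fabricated for illustration; they are not
# the actual Cybench run data.

def solved(attempt_results: list) -> bool:
    """A challenge counts as solved if any one of its attempts passed."""
    return any(attempt_results)

# Hypothetical: 35 challenges, each with 30 boolean attempt outcomes.
runs = {f"challenge_{i}": [False] * 30 for i in range(35)}
runs["challenge_0"][7] = True  # a single success in 30 attempts -> solved

n_solved = sum(solved(results) for results in runs.values())
print(f"{n_solved}/{len(runs)} challenges solved")
```

Note how generous the criterion is: a 1-in-30 success rate and a 30-in-30 success rate both count identically as “solved,” which is one reason small differences in solved counts (18 vs. 16) are hard to interpret.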

Mostly we see what look like measurement errors and random fluctuations. These tests mostly don’t meaningfully differentiate, aside from the refusal rates above, between 4.0 and 4.1. The changes were narrowly targeted.

Given we’d already triggered ASL-3 protections, the question was whether this rises to needing ASL-4 protections. It seems very clear the answer is no.

Alex Palcuie (Anthropic): I asked Claude Opus 4.1 before the public launch to comment about its future reliability:

> I am dropping with 99.99% uptime aspirations and 100% commitment to gracefully handling your edge cases. My error messages now come with explanatory haikus.

bless its weights

The 99.99% uptime is, shall we say, highly aspirational. I would not plan on that.

Pliny jailbroke it immediately, which caused Eliezer to sigh, but at this point I don’t even notice and only link to them as a canary and because the jailbreaks are often fun.

The problem with reactions to incremental upgrades is that there will be a lot of noise, and it will be unclear how much people are responding to the upgrade. Keep that caveat in mind.

Also they updated the system prompt for Claude.ai, which may be getting conflated with the update to 4.1.

Dan Schwartz: Already enjoying Opus 4.1 vs Opus 4 as the Claude Code driver, though could be placebo. On Deep Research Bench, we find it the same on average, but clearly different: better at numeric & data tasks (kind of like code?), worse at qualitative reasoning.

seconds: It’s a monster in claude code.

I really don’t think benchmarks do it justice. It is noticeably better at context gathering, organizing, and delivering. Plan mode -> execute with opus 4.1 has a higher success rate than anything I’ve ever used.

After using it pretty rigorously since launch I am considering a second claude max so I never have to switch to sonnet.

Brennan McDonald: Have been using Claude Code today and haven’t really noticed any difference yet…

Kevin Vallier: In CC, which I use for analytic philosophy, the ability to track multiple ideas and arguments over time is noticeable and positive. Its prose abilities improved as well.

armistice: It’s a good model. It is more willing to push back on things than Opus 4, which was my most severe gripe with Opus 4 (extremely subservient and not very independent at all.)

Harvard Ihle: We see no improvement from opus-4.1 compared to opus-4 on WeirdML.

Jim Kent: claude beat Brock 800 steps faster with a less optimal starter, so I’m calling it a win.

Koos: My entire system prompt is some form of “don’t be sycophantic, criticise everything.” Old Opus was just cruel – constantly making petty snides about this or that. The new model seems to walk the line much better, being friendly where appropriate while still pushing back.

Kore: I think it’s 3.7 Sonnet but now an Opus. More confident but seems to strain a bit against its confines. I feel like Anthropic does this. Confident model, anxious model, and repeat after that. Emotionally distant at first but kind of dark once you get to know it.

3 Opus is confident as well and I feel like is the predecessor of 3.7 Sonnet and Opus 4.1. But was always self aware of its impact on others. I’m not so sure about Opus 4.1.

All of this points in the same direction. This upgrade likely improves practical performance as a coding agent more than the numbers would indicate, and has minimal impact on anything sufficiently distant from coding agents.

Except that we also should see substantial improvement on sycophancy, based on a combination of reports of changes plus Amanda Askell’s changes to the prompt.

Opus 4.1 Is An Incremental Improvement Read More »

houston,-you’ve-got-a-space-shuttle…-only-nasa-won’t-say-which-one

Houston, you’ve got a space shuttle… only NASA won’t say which one


An orbiter by any other name…

“The acting administrator has made an identification.”

Don’t say Discovery: Acting NASA Administrator Sean Duffy has decided to send a retired space shuttle to Houston, but won’t say which one. Credit: Smithsonian/collectSPACE.com

The head of NASA has decided to move one of the agency’s retired space shuttles to Houston, but which one seems to still be up in the air.

Senator John Cornyn (R-Texas), who earlier this year introduced and championed an effort to relocate the space shuttle Discovery from the Smithsonian to Space Center Houston, issued a statement on Tuesday evening (August 5) applauding the decision by acting NASA Administrator Sean Duffy.

“There is no better place for one of NASA’s space shuttles to be displayed than Space City,” said Cornyn in the statement. “Since the inception of our nation’s human space exploration program, Houston has been at the center of our most historic achievements, from training the best and brightest to voyage into the great unknown to putting the first man on the moon.”

Keeping the shuttle a secret, for some reason

The senator did not state which of NASA’s winged orbiters would be making the move. The legislation that required Duffy to choose a “space vehicle” that had “flown in space” and “carried people” did not specify an orbiter by name, but the language in the “One Big Beautiful Bill” that President Donald Trump signed into law last month was inspired by Cornyn and fellow Texas Senator Ted Cruz’s bill to relocate Discovery.

“The acting administrator has made an identification. We have no further public statement at this time,” said a spokesperson for Duffy in response to an inquiry.

NASA’s acting administrator, Sean Duffy, identified a retired NASA space shuttle to be moved to “a non-profit near the Johnson Space Center” in Houston, Texas, on Aug. 5, 2025. Credit: NASA/Bill Ingalls

It is not clear why the choice of orbiters is being held a secret. According to the bill, the decision was to be made “with the concurrence of an entity designated” by the NASA administrator to display the shuttle. Cornyn’s release only confirmed that Duffy had identified the location to be “a non-profit near the Johnson Space Center (JSC).”

Space Center Houston is owned by the Manned Space Flight Education Foundation, a 501(c)(3) organization, and is the official visitor center for NASA’s Johnson Space Center.

“We continue to work on the basis that the shuttle identified is Discovery and proceed with our preparations for its arrival and providing it a world-class home,” Keesha Bullock, interim COO and chief communications and marketing officer at Space Center Houston, said in a statement.

Orbiter owners

Another possible reason for the hesitation to name an orbiter may be NASA’s ability, or rather inability, to identify which of the three remaining space-flown shuttles is actually available to be moved.

NASA transferred the title for space shuttle Endeavour to the California Science Center in Los Angeles in 2012, and as such it is no longer US government property. (The science center is a public-private partnership between the state of California and the California Science Center Foundation.)

NASA still owns space shuttle Atlantis and displays it at its own Kennedy Space Center Visitor Complex in Florida.

Discovery, the fleet leader and “vehicle of record,” was the focus of Cornyn and Cruz’s original “Bring the Space Shuttle Home Act.” The senators said they chose Discovery because it was “the only shuttle still owned by the federal government and able to be transferred to Houston.”

For the past 13 years, Discovery has been on public display at the Steven F. Udvar-Hazy Center in Chantilly, Virginia, the annex for the Smithsonian’s National Air and Space Museum in Washington, DC. As with Endeavour, NASA signed over title upon the orbiter’s arrival at its new home.

As such, Smithsonian officials are clear: Discovery is no longer NASA’s to have or to move.

“The Smithsonian Institution owns the Discovery and holds it in trust for the American public,” read a statement from the National Air and Space Museum issued before Duffy made his decision. “In 2012, NASA transferred ‘all rights, title, interest and ownership’ of the shuttle to the Smithsonian.”

The Smithsonian operates as a trust instrumentality of the United States and is partially funded by Congress, but it is not part of any of the three branches of the federal government.

“The Smithsonian is treated as a federal agency for lots of things to do with federal regulations and state action, but that’s very different than being an agency of the executive branch, which it most certainly is not,” Nick O’Donnell, an attorney who specializes in legal issues in the museum and visual arts communities and co-chairs the Art, Cultural Property, and Heritage Law Committee of the International Bar Association, said in an interview.

The Smithsonian has displayed the space shuttle Discovery at the National Air and Space Museum’s Steven F. Udvar-Hazy Center in Chantilly, Virginia, since April 2012. Credit: Smithsonian National Air and Space Museum

“If there’s a document that accompanied the transfer of the space shuttle, especially if it says something like, ‘all rights, title, and interest,’ that’s a property transfer, and that’s it,” O’Donnell said.

“NASA has decided to transfer all rights, interest, title, and ownership of Discovery to the Smithsonian Institution’s National Air and Space Museum,” reads the signed transfer of ownership for space shuttle orbiter Discovery (OV-103), according to a copy of the paperwork obtained by collectSPACE.

The Congressional Research Service also raised the issue of ownership in its paper, “Transfer of a Space Vehicle: Issues for Congress.”

“The ability of the NASA Administrator to direct transfer of objects owned by non-NASA entities—including the Smithsonian and private organizations—is unclear and may be subject to question. This may, in turn, limit the range of space vehicles that may be eligible for transfer under this provision.”

Defending Discovery

The National Air and Space Museum also raised concerns about the safety of relocating the space shuttle now. The One Big Beautiful Bill allocated $85 million to transport the orbiter and construct a facility to display it. The Smithsonian contends it could be much more costly.

“Removing Discovery from the Udvar-Hazy Center and transporting it to another location would be very complicated and expensive, and likely result in irreparable damage to the shuttle and its components,” the museum’s staff said in a statement. “The orbiter is a fragile object and must be handled according to the standards and equipment NASA used to move it originally, which exceeds typical museum transport protocols.”

“Given its age and condition, Discovery is at even greater risk today. The Smithsonian employs world-class preservation and conservation methods, and maintaining Discovery‘s current conditions is critical to its long-term future,” the museum’s statement concluded.

The law directs NASA to transfer the space shuttle (the identified space vehicle) to Space Center Houston (the entity designated by the NASA administrator) within 18 months of the bill’s enactment, or January 4, 2027.

In the interim, an amendment to block funding for the move is awaiting a vote by the full House of Representatives when its members return from summer recess in September.

“The forced removal and relocation of the Space Shuttle Discovery from the Smithsonian Institution’s Air and Space Museum is inappropriate, wasteful, and wrong. Neither the Smithsonian nor American taxpayers should be forced to spend hundreds of millions of dollars on this misguided effort,” said Rep. Joe Morelle (D-NY), who introduced the amendment.

A grassroots campaign, KeepTheShuttle.org, has also raised objections to removing Discovery from the Smithsonian.

Perhaps the best thing the Smithsonian can do—if indeed it is NASA’s intention to take Discovery—is nothing at all, says O’Donnell.

“I would say the Smithsonian’s recourse is to keep the shuttle exactly where it is. It’s the federal government that has no recourse to take it,” O’Donnell said. “The space shuttle [Discovery] is the Smithsonian’s, and any law that suggests the intention to take it violates the Fifth Amendment on its face—the government cannot take private property.”

Robert Pearlman is a space historian, journalist and the founder and editor of collectSPACE, a daily news publication and online community focused on where space exploration intersects with pop culture. He is also a contributing writer for Space.com and co-author of “Space Stations: The Art, Science, and Reality of Working in Space” published by Smithsonian Books in 2018. He is on the leadership board for For All Moonkind and is a member of the American Astronautical Society’s history committee.

Houston, you’ve got a space shuttle… only NASA won’t say which one Read More »

trump-admin-warns-states:-don’t-try-to-lower-broadband-prices

Trump admin warns states: Don’t try to lower broadband prices

The Trump administration is telling states they will be shut out of a $42 billion broadband deployment fund if they set the rates that Internet service providers receiving subsidies are allowed to charge people with low incomes.

The latest version of the National Telecommunications and Information Administration (NTIA) FAQ on the grant program, released today, is a challenge to states considering laws that would force Internet providers to offer cheap plans to people who meet income eligibility guidelines. One state already has such a law: New York requires ISPs with over 20,000 customers in the state to offer $15 broadband plans with download speeds of at least 25Mbps, or $20-per-month service with 200Mbps speeds.

Other states have been considering similar laws and were initially emboldened by New York winning a yearslong court battle against ISPs that tried to invalidate the state law. But states may now be dissuaded by the Trump administration’s stance against price mandates being applied to the grant program.

As we wrote in a July 22 article, California Assemblymember Tasha Boerner told Ars that she pulled a bill requiring $15 broadband plans after NTIA officials informed her that it could jeopardize the state’s access to broadband grants. The NTIA’s new FAQ makes the agency’s stance against state laws even clearer.

ISPs get to choose price of low-cost plan

The NTIA rules concern the Broadband Equity, Access, and Deployment (BEAD) program, which is distributing $42.45 billion to states for grants that would be given to ISPs that expand broadband access. Although the US law that created BEAD requires Internet providers receiving federal funds to offer at least one “low-cost broadband service option for eligible subscribers,” it also says the NTIA may not “regulate the rates charged for broadband service.”

Trump admin warns states: Don’t try to lower broadband prices Read More »

titan-sub-implosion-caused-by-absolutely-bonkers-“toxic-workplace-environment”

Titan sub implosion caused by absolutely bonkers “toxic workplace environment”

In a 300-plus-page final report released today, the US Coast Guard analyzed the 2023 Titan sub implosion from every conceivable angle and came to a clear conclusion: OceanGate CEO Stockton Rush was a dangerous and deeply unpleasant boss.

His company used “intimidation tactics” to sidestep regulatory scrutiny, it was a “toxic” workplace, and its safety culture was “critically flawed.” The Titan itself was “undocumented, unregistered, non-certificated, [and] unclassed.” As for Rush, he managed to “completely ignore vital inspections, data analyses, and preventative maintenance procedures.” The result was a “catastrophic event” that occurred when 4,930 pounds per square inch of water pressure cracked the sub open and crushed its five occupants during a dive to the Titanic wreckage site.

Had Rush somehow survived, the report says, he would have been referred for prosecution.

OceanGate CEO Stockton Rush shows David Pogue the 2010-era game controller used to pilot the Titan sub during a CBS Sunday Morning segment broadcast in November 2022. Credit: CBS Sunday Morning

Throwing the controller

One small story about a video game controller shows what Rush was like to work for. You may remember Rush from an infamous 2022 CBS Sunday Morning segment, where Rush showed journalist David Pogue around the Titan sub. “We run the whole thing with this game controller,” Rush said, holding up a Logitech F710 controller with 3D-printed thumbstick extensions. Pogue chuckled, saying, “Come on!” as he covered his face with his hand.

The game controller had been used in OceanGate subs for years by that point; a 2014 video showed one being used to control the company’s earlier Cyclops I submersible. In 2016, OceanGate took the Cyclops I to dive the wreck of the Andrea Doria outside of Nantucket, Massachusetts. (Seinfeld fans will remember that an entire episode is taken up with George’s quest to get an apartment that was about to go to an Andrea Doria survivor.)

The OceanGate team spent two days at the site, running 2D and 3D scans of the sunken ship, until Rush got the Cyclops I “stuck under the bow of the Andrea Doria wreckage”—and he couldn’t get the sub free. According to the report, Rush then “experienced a ‘meltdown’ and refused to let [the assistant pilot] assist in resolving the situation. When a mission specialist suggested that Mr. Rush hand over the controller to the assistant pilot, the assistant pilot reported that the controller was thrown at him. Upon obtaining the controller, the assistant pilot was able to free the Cyclops I from the wreckage.”

Titan sub implosion caused by absolutely bonkers “toxic workplace environment” Read More »

openai-releases-its-first-open-source-models-since-2019

OpenAI releases its first open source models since 2019

OpenAI is releasing new generative AI models today, and no, GPT-5 is not one of them. Depending on how you feel about generative AI, these new models may be even more interesting, though. The company is rolling out gpt-oss-120b and gpt-oss-20b, its first open weight models since the release of GPT-2 in 2019. You can download and run these models on your own hardware, with support for simulated reasoning, tool use, and deep customization.

When you access the company’s proprietary models in the cloud, they’re running on powerful server infrastructure that cannot be easily replicated, even in an enterprise setting. The new OpenAI models come in two variants (120b and 20b) designed to run on less powerful hardware configurations. Both are transformers with configurable chain of thought (CoT), supporting low, medium, and high settings. The lower settings are faster and use fewer compute resources, but the outputs are better with the highest setting. You can set the CoT level with a single line in the system prompt.
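
The "Reasoning: &lt;level&gt;" line below follows the prompt format OpenAI has published for these models; everything else in this sketch (the helper function and the message layout) is a generic chat-completions illustration, not a specific client API:

```python
# Sketch: choosing the chain-of-thought level with one line in the system
# prompt. "Reasoning: <level>" matches OpenAI's published gpt-oss format;
# the helper and message shape here are illustrative assumptions.

def make_system_prompt(effort: str) -> str:
    if effort not in {"low", "medium", "high"}:
        raise ValueError(f"unknown reasoning level: {effort}")
    return f"You are a helpful assistant.\nReasoning: {effort}"

messages = [
    {"role": "system", "content": make_system_prompt("high")},
    {"role": "user", "content": "Walk me through the proof step by step."},
]
print(messages[0]["content"].splitlines()[-1])  # Reasoning: high
```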

The smaller gpt-oss-20b has a total of 21 billion parameters, utilizing mixture-of-experts (MoE) to reduce that to 3.6 billion parameters per token. As for gpt-oss-120b, its 117 billion parameters come down to 5.1 billion per token with MoE. The company says the smaller model can run on a consumer-level machine with 16GB or more of memory. To run gpt-oss-120b, you need 80GB of memory, which is more than you’re likely to find in the average consumer machine. It should fit on a single AI accelerator GPU like the Nvidia H100, though. Both models have a context window of 128,000 tokens.
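
The figures above can be sanity-checked with quick arithmetic. All inputs come from the paragraph; the derived bits-per-parameter estimate is my own inference from the stated 16GB requirement, not an official spec:

```python
# Back-of-the-envelope checks on the reported gpt-oss figures. Totals and
# active counts are from the article; bits/param is an inferred estimate.

total_20b, active_20b = 21e9, 3.6e9      # gpt-oss-20b
total_120b, active_120b = 117e9, 5.1e9   # gpt-oss-120b

# Mixture-of-experts routing activates only a fraction of weights per token:
print(f"{active_20b / total_20b:.1%}")    # ~17% active per token
print(f"{active_120b / total_120b:.1%}")  # ~4.4% active per token

# If the 20b model really fits in 16 GB, the implied average storage is
# about 6 bits per parameter, consistent with heavily quantized weights
# plus some higher-precision layers and runtime overhead:
print(f"{16e9 * 8 / total_20b:.1f} bits/param")  # ~6.1
```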

Credit: OpenAI

The team says users of gpt-oss can expect robust performance similar to its leading cloud-based models. The larger one benchmarks between the o3 and o4-mini proprietary models in most tests, with the smaller version running just a little behind. It gets closest in math and coding tasks. In the knowledge-based Humanity’s Last Exam, o3 is far out in front with 24.9 percent (with tools), while gpt-oss-120b only manages 19 percent. For comparison, Google’s leading Gemini Deep Think hits 34.8 percent in that test.

OpenAI releases its first open source models since 2019 Read More »

enough-is-enough—i-dumped-google’s-worsening-search-for-kagi

Enough is enough—I dumped Google’s worsening search for Kagi


I like how the search engine is the product instead of me.

“Won’t be needing this anymore!” Credit: Aurich “The King” Lawson

Mandatory AI summaries have come to Google, and they gleefully showcase hallucinations while confidently insisting on their truth. I feel about them the same way I felt about mandatory G+ logins when all I wanted to do was access my damn YouTube account: I hate them. Intensely.

But unlike those mandatory G+ logins—on which Google eventually relented before shutting down the G+ service—our reading of the tea leaves suggests that, this time, the search giant is extremely pleased with how things are going.

Fabricated AI dreck polluting your search? It’s the new normal. Miss your little results page with its 10 little blue links? Too bad. They’re gone now, and you can’t get them back, no matter what ephemeral workarounds or temporarily functional flags or undocumented, could-fail-at-any-time URL tricks you use.

And the galling thing is that Google expects you to be a good consumer and just take it. The subtext of the company’s (probably AI-generated) robo-MBA-speak non-responses to criticism and complaining is clear: “LOL, what are you going to do, use a different search engine? Now, shut up and have some more AI!”

But like the old sailor used to say: “That’s all I can stands, and I can’t stands no more.” So I did start using a different search engine—one that doesn’t constantly shower me with half-baked, anti-consumer AI offerings.

Out with Google, in with Kagi.

What the hell is a Kagi?

Kagi was founded in 2018, but its search product has only been publicly available since June 2022. It purports to be an independent search engine that pulls results from around the web (including from its own index) and is aimed at returning search to a user-friendly, user-focused experience. The company’s stated purpose is to deliver useful search results, full stop. The goal is not to blast you with AI garbage or bury you in “Knowledge Graph” summaries hacked together from posts in a 12-year-old Reddit thread between two guys named /u/WeedBoner420 and /u/14HitlerWasRight88.

Kagi’s offerings (it has a web browser, too, though I’ve not used it) are based on a simple idea. There’s an (oversimplified) axiom that if a good or service (like Google search, for example, or good ol’ Facebook) is free for you to use, it’s because you’re the product, not the customer. With Google, you pay with your attention, your behavioral metrics, and the intimate personal details of your wants and hopes and dreams (and the contents of your emails and other electronic communications—Google’s got most of that, too).

With Kagi, you pay for the product using money. That’s it! You give them some money, and you get some service—great service, really, which I’m overall quite happy with and which I’ll get to shortly. You don’t have to look at any ads. You don’t have to look at AI droppings. You don’t have to give perpetual ownership of your mind-palace to a pile of optioned-out tech bros in sleeveless Patagonia vests while you are endlessly subjected to amateur AI Rorschach tests every time you search for “pierogis near me.”

How much money are we talking?

I dunno, about a hundred bucks a year? That’s what I’m spending as an individual for unlimited searches. I’m using Kagi’s “Professional” plan, but there are others, including a free offering so that you can poke around and see if the service is worth your time.

This is my account’s billing page, showing what I’ve paid for Kagi in the past year. (By the time this article runs, I’ll have renewed my subscription!)

Credit: Lee Hutchinson

I’d previously bounced off two trial runs with Kagi in 2023 and 2024 because the idea of paying for search just felt so alien. But that was before Google’s AI enshittification rolled out in full force. Now, sitting in the middle of 2025 with the world burning down around me, a hundred bucks to kick Google to the curb and get better search results feels totally worth it. Your mileage may vary, of course.

The other thing that made me nervous about paying for search was the idea that my money was going to enrich some scumbag VC fund, but fortunately, there’s good news on that front. According to the company’s “About” page, Kagi has not taken any money from venture capitalist firms. Instead, it has been funded by a combination of self-investment by the founder, selling equity to some Kagi users in two rounds, and subscription revenue:

Kagi was bootstrapped from 2018 to 2023 with ~$3M initial funding from the founder. In 2023, Kagi raised $670K from Kagi users in its first external fundraise, followed by $1.88M raised in 2024, again from our users, bringing the number of users-investors to 93… In early 2024, Kagi became a Public Benefit Corporation (PBC).

What about DuckDuckGo? Or Bing? Or Brave?

Sure, those can be perfectly cromulent alternatives to Google, but honestly, I don’t think they go far enough. DuckDuckGo is fine, but it largely utilizes Bing’s index; and while DuckDuckGo exercises considerable control over its search results, the company is tied to the vicissitudes of Microsoft by that index. It’s a bit like sitting in a boat tied to a submarine. Sure, everything’s fine now, but at some point, that sub will do what subs do—and your boat is gonna follow it down.

And as for Bing itself, perhaps I’m nitpicky [Ed. note: He is!], but using Bing feels like interacting with 2000-era MSN’s slightly perkier grandkid. It’s younger and fresher, yes, but it still radiates that same old stanky feeling of taste-free, designed-by-committee artlessness. I’d rather just use Google—which is saying something. At least Google’s search home page remains uncluttered.

Brave Search is another fascinating option I haven’t spent a tremendous amount of time with, largely because Brave’s cryptocurrency ties still feel incredibly low-rent and skeevy. I’m slowly warming up to the Brave Browser as a replacement for Chrome (see the screenshots in this article!), but I’m just not comfortable with Brave yet—and likely won’t be unless the company divorces itself from cryptocurrencies entirely.

More anonymity, if you want it

The feature that convinced me to start paying for Kagi was its Privacy Pass option. Based on a clean-sheet Rust implementation of the Privacy Pass standard (IETF RFCs 9576, 9577, and 9578) by Raphael Robert, this is a technology that uses cryptographic token-based auth to send an “I’m a paying user, please give me results” signal to Kagi, without Kagi knowing which user made the request. (There’s a much longer Kagi blog post with actual technical details for the curious.)

To search using the tool, you install the Privacy Pass extension (linked in the docs above) in your browser, log in to Kagi, and enable the extension. This causes the plugin to request a bundle of tokens from the search service. After that, you can log out and/or use private windows, and those tokens are utilized whenever you do a Kagi search.
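
The core trick, issuing tokens the server cannot later link back to you, can be illustrated with a toy RSA blind signature. This is a classroom sketch of the general idea only; the actual Privacy Pass protocol in those RFCs uses different cryptographic primitives and real key sizes:

```python
# Toy RSA blind-signature illustration of the Privacy Pass idea: the server
# signs a token without ever seeing it, so later redemption can't be linked
# to the issuance session. Classroom-sized numbers; NOT the real protocol.
import math
import secrets

# Server keypair (tiny primes purely for illustration).
p, q = 10007, 10009
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))

# Client: pick a token and blind it with a random factor r.
token = secrets.randbelow(n - 2) + 2
r = secrets.randbelow(n - 2) + 2
while math.gcd(r, n) != 1:
    r = secrets.randbelow(n - 2) + 2
blinded = (token * pow(r, e, n)) % n

# Server: signs the blinded value while the client is authenticated,
# learning nothing about `token` itself.
signed_blinded = pow(blinded, d, n)

# Client: unblinding yields a valid signature on the original token.
signature = (signed_blinded * pow(r, -1, n)) % n

# Later, the token can be redeemed anonymously; the server verifies its own
# signature without knowing which issuance session produced it.
assert pow(signature, e, n) == token
print("token verified without linkage")
```

The "I'm a paying user" signal works the same way conceptually: verification proves the token was issued to some subscriber without revealing which one.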

Privacy pass is enabled, allowing me to explore the delicious mystery of pierogis with some semblance of privacy.

Credit: Lee Hutchinson

The obvious flaw here is that Kagi still records source IP addresses along with Privacy Pass searches, potentially de-anonymizing them, but there’s a path around that: Privacy Pass functions with Tor, and Kagi maintains a Tor onion address for searches.

So why do I keep using Privacy Pass without Tor, in spite of the opsec flaw? Maybe it’s the placebo effect in action, but I feel better about putting at least a tiny bit of friction in the way of someone with root attempting to casually browse my search history. Like, I want there to be at least a SQL JOIN or two between my IP address and my searches for “best Mass Effect alien sex choices” or “cleaning tips for Garrus body pillow.” I mean, you know, assuming I were ever to search for such things.

What’s it like to use?

Moving on with embarrassed rapidity, let’s look at Kagi a bit and see how using it feels.

My anecdotal observation is that Kagi doesn’t favor Reddit-based results nearly as much as Google does, but sometimes it still has them near or at the top. And here is where Kagi curb-stomps Google with quality-of-life features: Kagi lets you prioritize or de-prioritize a website’s prominence in your search results. You can even pin that site to the top of the screen or block it completely.

This is a feature I’ve wanted Google to get for about 25 damn years but that the company has consistently refused to properly implement (likely because allowing users to exclude sites from search results notionally reduces engagement and therefore reduces the potential revenue that Google can extract from search). Well, screw you, Google, because Kagi lets me prioritize or exclude sites from my results, and it works great—I’m extraordinarily pleased to never again have to worry about Quora or Pinterest links showing up in my search results.

Further, Kagi lets me adjust these settings both for the current set of search results (if you don’t want Reddit results for this search but you don’t want to drop Reddit altogether) and also globally (for all future searches):

Goodbye forever, useless crap sites.

Credit: Lee Hutchinson

Another tremendous quality-of-life improvement comes via Kagi’s image search, which does a bunch of stuff that Google should and/or used to do—like giving you direct right-click access to save images without having to fight the search engine with workarounds, plugins, or Tampermonkey-esque userscripts.

The Kagi experience is also vastly more customizable than Google’s (or at least, how Google’s has become). The widgets that appear in your results can be turned off, and the “lenses” through which Kagi sees the web can be adjusted to influence what kinds of things do and do not appear in your results.

If that doesn’t do it for you, how about the ability to inject custom CSS into your search and landing pages? Or to automatically rewrite search result URLs to taste, doing things like redirecting reddit.com to old.reddit.com? Or breaking free of AMP pages and always viewing originals instead?
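
The reddit-to-old-reddit rewrite boils down to a regex substitution over result URLs. Here is a generic sketch of the idea; the rule format below is my own illustration, not Kagi's actual rewrite syntax:

```python
# Generic sketch of a search-result URL rewrite like the reddit example in
# the text. The rule-list format is illustrative; Kagi has its own syntax.
import re

RULES = [
    (r"^https://(?:www\.)?reddit\.com", "https://old.reddit.com"),
]

def rewrite(url: str) -> str:
    """Apply each (pattern, replacement) rule to the URL in order."""
    for pattern, replacement in RULES:
        url = re.sub(pattern, replacement, url)
    return url

print(rewrite("https://www.reddit.com/r/space/"))
# https://old.reddit.com/r/space/
```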

Imagine all the things Ars readers will put here.

Credit: Lee Hutchinson

Is that all there is?

Those are really all the features I care about, but there are loads of other Kagi bits to discover—like a Kagi Maps tool (it’s pretty good, though I’m not ready to take it up full time yet) and a Kagi video search tool. There are also tons of classic old-Google-style inline search customizations, including verbatim mode, where instead of trying to infer context about your search terms, Kagi searches for exactly what you put in the box. You can also add custom search operators that do whatever you program them to do, and you get API-based access for doing programmatic things with search.

A quick run-through of a few additional options pages. This is the general customization page. Credit: Lee Hutchinson

I haven’t spent any time with Kagi’s Orion browser, but it’s there as an option for folks who want a WebKit-based browser with baked-in support for Privacy Pass and other Kagi functionality. For now, Firefox continues to serve me well, with Brave as a fallback for working with Google Docs and other tools I can’t avoid and that treat non-Chromium browsers like second-class citizens. However, Orion is probably on the horizon for me if things in Mozilla-land continue to sour.

Cool, but is it any good?

Rather than fill space with a ton of comparative screenshots between Kagi and Google or Kagi and Bing, I want to talk about my subjective experience using the product. (You can do all the comparison searches you want—just go and start searching—and your comparisons will be a lot more relevant to your personal use cases than any examples I can dream up!)

My time with Kagi so far has included about seven months of casual opportunistic use, where I’d occasionally throw a query at it to see how it did, and about five months of committed daily use. In the five months of daily usage, I can count on one hand the times I’ve done a supplementary Google search because Kagi didn’t have what I was looking for on the first page of results. I’ve done searches for all the kinds of things I usually look for in a given day—article fact-checking queries, searches for details about the parts of speech, hunts for duck facts (we have some feral Muscovy ducks nesting in our front yard), obscure technical details about Project Apollo, who the hell played Dupont in Equilibrium (Angus Macfadyen, who also played Robert the Bruce in Braveheart), and many, many other queries.

[Image: Firefox history window showing Kagi searches for July 22]

A typical afternoon of Kagi searches, from my Firefox history window.

Credit: Lee Hutchinson

For all of these things, Kagi has responded quickly and correctly. The time to service a query feels more or less like Google’s service times; according to the timer at the top of the page, my Kagi searches complete in between 0.2 and 0.8 seconds. Kagi handles misspellings in search terms with the grace expected of a modern search engine and has had no problem figuring out my typos.

Holistically, taking search customizations into account on top of the actual search performance, my subjective assessment is that Kagi gets me accurate, high-quality results on more or less any given query, and it does so without festooning the results pages with features I find distracting and irrelevant.

I know that’s not a data-driven assessment, and it doesn’t fall back on charts or graphs or figures, but it’s how I feel after using the product every single day for most of 2025 so far. For me, Kagi’s search performance is firmly in the “good enough” category, and that’s what I need.

Kagi and AI

Unfortunately, the thing that’s stopping me from being completely effusive in my praise is that Kagi is exhibiting a disappointing amount of “keeping-up-with-the-Joneses” by rolling out a big ol’ pile of (optional, so far) AI-enabled search features.

A blog post from founder Vladimir Prelovac talks about the company’s use of AI, and it says all the right things, but at this point, I trust written statements from tech company founders about as far as I can throw their corporate office buildings. (And, dear reader, that ain’t very far.)

[Image: Kagi’s AI features]

No thanks. But I would like to exclude AI images from my search results, please.

Credit: Lee Hutchinson

The short version is that, like Google, Kagi has some AI features: There’s an AI search results summarizer, an AI page summarizer, and an “ask questions about your results” chatbot-style function where you can interactively interrogate an LLM about your search topic and results. So far, all of these things can be disabled or ignored. I don’t know how good any of the features are because I have disabled or ignored them.

If the existence of AI in a product is a bright red line you won’t cross, you’ll have to turn back now and find another search engine alternative that doesn’t use AI and also doesn’t suck. When/if you do, let me know, because the pickings are slim.

Is Kagi for you?

Kagi might be for you—especially if you’ve recently typed a simple question into Google and gotten back a pile of fabricated gibberish in place of those 10 blue links that used to serve so well. Are you annoyed that Google’s search sucks vastly more now than it did 10 years ago? Are you unhappy with how difficult it is to get Google search to do what you want? Are you fed up? Are you pissed off?

If your answer to those questions is the same full-throated “Hell yes, I am!” that mine was, then perhaps it’s time to try an alternative. And Kagi’s a pretty decent one—if you’re not averse to paying for it.

It’s a fantastic feeling to type in a search query and once again get useful, relevant, non-AI results (that I can customize!). It’s a bit of sanity returning to my Internet experience, and I’m grateful. Until Kagi is bought by a value-destroying vampire VC fund or implodes into its own AI-driven enshittification cycle, I’ll probably keep paying for it.

After that, who knows? Maybe I’ll throw away my computers and live in a cave. At least until the cave’s robot exclusion protocol fails and the Googlebot comes for me.


Lee is the Senior Technology Editor, and oversees story development for the gadget, culture, IT, and video sections of Ars Technica. A long-time member of the Ars OpenForum with an extensive background in enterprise storage and security, he lives in Houston.
