Author name: Tim Belzer

we-put-the-new-pocket-size-vinyl-format-to-the-test—with-mixed-results

We put the new pocket-size vinyl format to the test—with mixed results


is that a record in your pocket?

It’s a fun new format, but finding a place in the market may be challenging.

A 4-inch Tiny Vinyl record. Credit: Chris Foresman

A 4-inch Tiny Vinyl record. Credit: Chris Foresman

We recently looked at Tiny vinyl, a new miniature vinyl single format developed through a collaboration between a toy industry veteran and the world’s largest vinyl record manufacturer. The 4-inch singles are pressed in a process nearly identical to standard 12-inch LPs or 7-inch singles, except everything is smaller. They have a standard-size spindle hole and play at 33⅓ RPM, and they hold up to four minutes of music per side.

Several smaller bands, like The Band Loula and Rainbow Kitten Surprise, and some industry veterans like Blake Shelton and Melissa Etheridge, have already experimented with the format. But Tiny Vinyl partnered with US retail giant Target for its big coming-out party this fall, with 44 exclusive titles launching throughout the end of this year.

Tiny Vinyl supplied a few promotional copies of releases from former America’s Got Talent finalist Grace VanderWaal, The Band Loula, country pop stars Florida Georgia Line, and jazz legends the Vince Guaraldi Trio so I could get a first-hand look at how the records actually play. I tested these titles as well as several others I picked up at retail, playing them on an Audio Technica LP-120 direct drive manual turntable connected to a Yamaha S-301 integrated amplifier and playing through a pair of vintage Klipsch kg4 speakers.

I also played them out on a Crosley portable suitcase-style turntable, and for fun, I tried to play them on the miniature RSD3 turntable made for 3-inch singles to try to see what’s possible with a variety of hardware.

Tiny Vinyl releases cover several genres, including hip-hop, rock, country, pop, indie, and show tunes. Credit: Chris Foresman

Automatic turntables need not apply

First and foremost, I’ll note that the 4-inch diameter is essentially the same size as the label on a standard 12-inch LP. So any sort of automatic turntable won’t really work for 4-inch vinyl; most aren’t equipped to set the stylus at anything other than 12 inches or 7 inches, and even if they could, the automatic return would kick in before reaching the grooves where the music starts. Some automatic turntables allow switching to a manual mode, but they otherwise cannot play Tiny Vinyl records.

But if you have a turntable with a fully manual tonearm—including a wide array from DJ-style direct drive turntables or audiophile belt-drive turntables like those from Fluance, U-turn, or Pro-ject—you’re in luck. The tonearm can be placed on these records, and they will track the grooves well.

Lining up the stylus can be a challenge with such small records, but once it’s in place, the stylus on my LP120—a nude elliptical—tracked well. I also tried a few listens with a standard conical stylus since that’s what would be most common across a variety of low- and mid-range turntables. The elliptical stylus tracked slightly better in our experience; higher-end styli may track the extremely fine grooves even better but would probably be overkill given that the physical limitations of the format introduce some distortion, which would likely be more apparent with such gear.

While Tiny Vinyl will probably appeal most to pop music fans, I played a variety of music styles, including rock, country, dance pop, hip-hop, jazz, and even showtunes. The main sonic difference I noted when a direct comparison was available was that the Tiny Vinyl version of a track tended to sound quieter than the same track playing on a 12-inch LP at the same volume setting on the amplifier.

This Kacey Musgraves Tiny Vinyl includes songs from her album Deeper Well. Credit: Chris Foresman

It’s not unusual for different records to be mastered at different volumes; making the overall sound quieter means smaller modulations in the groove so they can be placed closer together. This is true for any album that has a side running longer than about 22 minutes, but it’s especially important to maintain the four-minute runtime on Tiny Vinyl. (This is also why the last song or two on many LP slides tend to be quieter or slower songs; it’s easier for these songs to sound better at the center of the record, where linear tracking speed decreases.)

That said, most of the songs I listened to tended to have a slight but audible increase in distortion as the grooves approached the physical limits of alignment for the stylus. This was usually only perceptible in the last several seconds of a song, which more discerning listeners would likely find objectionable. But sound quality overall is still comparable to typical vinyl records. It won’t compare to the most exacting pressings from the likes of Mobile Fidelity Labs, for instance, but then again, the sort of audiophile who would pay for the equipment to get the most out of such records probably won’t buy Tiny Vinyl in the first place, except perhaps as a conversation piece.

I also tried playing our Tiny Vinyl on a Crosley suitcase-style turntable since it has a manual tone arm. The model I have on hand has an Audio Technica AT3600L cartridge and stereo speakers, so it’s a bit nicer than the entry-level Cruiser models you’ll typically find at malls or department stores. But these are extremely popular first turntables for a lot of young people, so it seemed reasonable to consider how Tiny Vinyl sounds on these devices.

Unfortunately, I couldn’t play Tiny Vinyl on this turntable. Despite having a manual tone arm and an option to turn off the auto-start and stop of the turntable platter, the Crosley platter is designed for 7-inch and 12-inch vinyl—the Tiny Vinyl we tried wouldn’t even spin on the turntable without the addition of a slipmat of some kind.

Once I got it spinning, though, the tone arm simply would not track beyond the first couple of grooves before hitting some physical limitation of its gimbal. Since many of the suitcase-style turntables often share designs and parts, I suspect this would be a problem for most of the Crosley, Victrola, or other brands you might find at a big-box retailer.

Some releases really take advantage of the extra real estate of the gatefold jacket and printed inner sleeve,  Chris Foresman

Additionally, I compared the classic track “Linus and Lucy” from A Charlie Brown Christmas with a 2012 pressing of the full album, as well as the 2019 3-inch version using an adapter, all on the LP-120, to give readers the best comparison across formats.

Again, the LP version of the seminal soundtrack from A Charlie Brown Christmas sounded bright and noticeably louder than its 4-inch counterpart. No major surprises here. And of course, the LP includes the entire soundtrack, so if you’re a big fan of the film or the kind of contemplative, piano-based jazz that Vince Guaraldi is famous for, you’ll probably spring for the full album.

The 3-inch version of “Linus and Lucy” unsurprisingly sounds fairly comparable to the Tiny Vinyl version, with a much quieter playback at the same amplifier settings. But it also sounds a lot noisier, likely due to the differences in materials used in manufacturing.

Though 3-inch records can play on standard turntables, as I did here, they’re designed to go hand-in-hand with one of the many Crosley RSD3 variants released in the last five years, or on the Crosley Mini Cruiser turntable. If you manage to pick up an original 8ban player, you could get the original lo-fi, “noisy analog” sound that Bandai had intended as well. That’s really part of the 3-inch vinyl aesthetic.

Newer 3-inch vinyl singles are coming with a standard spindle hole, which makes them easier to play on standard turntables. It also means there are now adapters for the tiny spindle to fit these holes, so you can technically put a 4-inch single on them. But due to the design of the tonearm and its rest, the stylus won’t swing out to the edge of Tiny Vinyl; instead, you can only play starting at grooves around the 3-inch mark. It’s a little unfortunate because it would otherwise be fun to play these miniature singles on hardware that is a little more right-sized ergonomically.

Big stack of tiny records. Credit: Chris Foresman

Four-inch Tiny Vinyl singles, on the other hand, are intended to be played on standard turntables, and they do that fairly well as long as you can manually place the tonearm and it’s not otherwise limited physically from tracking its grooves. The sound was not expected to compare to a quality 12-inch pressing, and it doesn’t. But it still sounds good. And especially if your available space is at a premium, you might consider a Tiny Vinyl with the most well-known and popular tracks from a certain album or artist (like these songs from A Charlie Brown Christmas) over a full album that may cost upward of $35.

Fun for casual listeners, not for audiophiles

Overall, Tiny Vinyl still offers much of the visceral experience of playing standard vinyl records—the cover art, the liner notes, handling the record as you place it on the turntable—just in miniature. The cost is less than a typical LP, and the weight is significantly less, so there are definite benefits for casual listeners. On the other hand, serious collectors will gravitate toward 12-inch albums and—perhaps less so—7-inch singles. Ironically, the casual listeners the format would most likely appeal to are the least likely to have the equipment to play it. That will limit Tiny Vinyl’s mass-market appeal outside of just being a cool thing to put on the shelf that technically could be played on a turntable.

The Good:

  • Small enough to easily fit in a jacket pocket or the like
  • Use less resources to make and ship
  • With the gatefold jacket, printed inner sleeve, and color vinyl options, these look as cool as most full-size albums
  • Plays fine on manual turntables

The Bad:

  • Sound quality is (unsurprisingly) compromised
  • Price isn’t lower than typical 7-inch singles

The Ugly:

  • Won’t work on automatic-only turntables, like the very popular AT-LP60 series or the very popular suitcase-style turntables that are often an inexpensive “first” turntable for many

We put the new pocket-size vinyl format to the test—with mixed results Read More »

blast-from-the-past:-15-movie-gems-of-1985

Blast from the past: 15 movie gems of 1985


Beyond the blockbusters: This watch list has something for everyone over the long holiday weekend.

Peruse a list of films released in 1985 and you’ll notice a surprisingly high number of movies that have become classics in the ensuing 40 years. Sure, there were blockbusters like Back to the Future, The Goonies, Pale Rider, The Breakfast Club and Mad Max: Beyond Thunderdome, but there were also critical arthouse favorites like Kiss of the Spider Woman and Akira Kurosawa’s masterpiece, Ran. Since we’re going into a long Thanksgiving weekend, I’ve made a list, in alphabetical order, of some of the quirkier gems from 1985 that have stood the test of time. (Some of the films first premiered at film festivals or in smaller international markets in 1984, but they were released in the US in 1985.)

(Some spoilers below but no major reveals.)

After Hours

young nerdy man in black shirt and casual tan jacket looking anxious

Credit: Warner Bros.

Have you ever had a dream, bordering on a nightmare, where you were trying desperately to get back home but obstacle after obstacle kept getting in your way? Martin Scorsese’s After Hours is the cinematic embodiment of that anxiety-inducing dreamscape. Griffin Dunne stars as a nebbishy computer data entry worker named Paul, who meets a young woman named Marcy (Rosanna Arquette) and heads off to SoHo after work to meet her. The trouble begins when his $20 cab fare blows out the window en route. The date goes badly, and Paul leaves, enduring a string of increasingly strange encounters as he tries to get back to his uptown stomping grounds.

After Hours is an unlikely mix of screwball comedy and film noir, and it’s to Scorsese’s great credit that the film strikes the right tonal balance, given that it goes to some pretty bizarre and occasionally dark places. The film only grossed about $10 million at the box office but received critical praise, and it’s continued to win new fans ever since, even inspiring an episode of Ted Lasso. It might not rank among Scorsese’s masterworks, but it’s certainly among the director’s most original efforts.

Blood Simple

man in tan suit crawling on the pavement at night in front of truck with headlights glaring. Feet of a man holding an axe is off to the right.

Credit: Circle Films

Joel and Ethan Coen are justly considered among today’s foremost filmmakers; they’ve made some of my favorite films of all time. And it all started with Blood Simple, the duo’s directorial debut, a neo-noir crime thriller set in small-town Texas. Housewife Abby (Frances McDormand) is having an affair with a bartender named Ray (John Getz). Her abusive husband, Julian (Dan Hedaya), has hired a private investigator named Visser (M. Emmet Walsh) and finds out about the affair. He then asks Visser to kill the couple for $10,000. Alas, things do not go as planned as everyone tries to outsmart everyone else, with disastrous consequences.

Blood Simple has all the elements that would become trademarks of the Coen brothers’ distinctive style: it’s both brutally violent and acerbically funny, with low-key gallows humor, not to mention inventive camerawork and lighting. The Coens accomplished a lot with their $1.5 million production budget. And you can’t beat that cast. (It was McDormand’s first feature role; she would go on to win her first Oscar for her performance in 1996’s Fargo.) The menacing shot of Ray dragging a shovel across the pavement toward a badly wounded Julian crawling on the road, illuminated by a car’s headlights, is one for the ages.

Brazil

anxious man being restrained with his head in a weird futuristic helmet

Credit: Universal Pictures

Terry Gilliam’s Oscar-nominated, Orwellian sci-fi tragicomedy, Brazil, is part of what the director has called his “Trilogy of Imagination,” along with 1981’s Time Bandits and 1988’s The Adventures of Baron Munchausen. Jonathan Pryce stars as a low-ranking bureaucrat named Sam Lowry who combats the soul-crushing reality of his bleak existence with elaborate daydreams in which he is a winged warrior saving a beautiful damsel in distress. One day, a bureaucratic error confuses Sam with a wanted terrorist named Archibald Tuttle (Robert De Niro), setting off a darkly comic series of misadventures as Sam tries to prove his true identity (and innocence). That’s when he meets Jill (Kim Greist), a dead ringer for his dream woman.

Along with 12 Monkeys and Monty Python and the Holy Grail, Brazil represents Gilliam at his best, yet it was almost not released in the US because Gilliam refused the studio’s request to give the film a happy ending. Each side actually ran ads in Hollywood trades presenting their respective arguments, and Gilliam ultimately prevailed. The film has since become a critical favorite and an essential must-watch for Gilliam fans. Special shoutout to Katherine Helmond’s inspired supporting performance as Sam’s mother Ida and her addiction to bad plastic surgery (“It’s just a little complication….”).

Clue

a group of people in dinner party fancy dress staring at the door.

Credit: Paramount Pictures

Benoit Blanc may hate the game Clue, but it’s delighted people of all ages for generations. And so has the deliciously farcical film adaptation featuring an all-star cast. Writer/director Jonathan Lynn (My Cousin Vinny) does a great job fleshing out the game’s premise and characters. A group of people is invited to an isolated mansion for a dinner with “Mr. Boddy” (Lee Ving) and are greeted by the butler, Wadsworth (Tim Curry). There is Mrs. Peacock (Eileen Brennan), Mrs. White (Madeline Kahn), Professor Plum (Christopher Lloyd), Mr. Green (Michael McKean), Colonel Mustard (Martin Mull), and Miss Scarlet (Lesley Ann Warren).

After dinner, Mr. Boddy reveals that he is the one who has been blackmailing them all, and when the lights suddenly go out, he is murdered. As everyone frantically tries to figure out whodunnit, more bodies begin to pile up, culminating in three different endings. (A different ending was shown in each theater but now all three are included.) The script is packed with bad puns and slapstick scenarios,  delivered with impeccable comic timing by the gifted cast. And who could forget Kahn’s famous ad-libbed line: “Flames… on the side of my face“? Like several films on this list, Clue got mixed reviews and bombed at the box office, but found its audience in subsequent decades. It’s now another cult classic that holds up even after multiple rewatchings.

The Company of Wolves

beautiful young dark-haired girl in a red hooded cape talking to a darkly handsome young man with a rakish look about him

Credit: ITC Entertainment

Director Neil Jordan’s sumptuous Gothic fantasy horror is a haunting twist on “Little Red Riding Hood” adapted from a short story by Angela Carter in her anthology of fairy-tale reinventions, The Bloody Chamber. The central narrative concerns a young girl named Rosaleen (Sarah Patterson) who sports a knitted red cape and encounters a rakish huntsman/werewolf (Micha Bergese) in the woods en route to her grandmother’s (Angela Lansbury) house. There are also several embedded wolf-centric fairy tales, two told by Rosaleen and two told by the grandmother.

Jordan has described this structure as “a story with very different movements,” all variations on the central theme and “building to the fairy tale that everybody knows.” The production design and gorgeously sensual cinematography—all achieved on a limited $2 million budget—further enhance the dreamlike atmosphere.  The Company of Wolves, like the fairy tale that inspired it, is an unapologetically Freudian metaphor for Rosaleen’s romantic and sexual awakening, in which she discovers her own power, which both frightens and fascinates her. It’s rare to find such a richly layered film rife with symbolism and brooding imagery.

Desperately Seeking Susan

two young women, similar in appearance, dressed in 1980s New Wave outfits and striking a sultry pose for the camera

Credit: Orion Pictures

In this quintessential 1980s screwball comedy about mistaken identity, Roberta (Rosanna Arquette) is a dissatisfied upper-class New Jersey housewife fascinated by the local tabloid personal ads, especially messages between two free-spirited bohemian lovers, Susan (Madonna) and Jim (Robert Joy). She follows Susan one day and is conked on the head when a mob enforcer mistakes her for Susan, who had stolen a pair of valuable earrings from another paramour, who had stolen them from a mobster in turn. Roberta comes to with amnesia and, believing herself to be Susan, is befriended by Jim’s best friend, Dez (Aidan Quinn).

Desperately Seeking Susan is director Susan Seidelman’s love letter to the (admittedly sanitized) 1980s counterculture of Manhattan’s Lower East Side, peppered with cameo appearances by performance artists, musicians, comedians, actors, painters, and so forth of that time period. The script is rife with witty one-liners and a stellar supporting cast, including John Turturro as the owner of a seedy Magic Club, Laurie Metcalf as Roberta’s sister-in-law Leslie, and a deadpan Steven Wright as Leslie’s dentist love interest. It’s breezy, infectious, frothy fun, and easily Madonna’s best acting role, perhaps because she is largely playing herself.

Dreamchild

Young dark-haired girl with a bob in a white dress sitting down for tea with a a giant March Hare and the Mad Hatter

Credit: Thorn EMI

Dennis Potter (The Singing Detective) co-wrote the screenplay for this beautifully shot film about Alice Liddell, the 11-year-old girl who inspired Alice in Wonderland. Coral Browne plays the elderly widowed Alice, who travels by ship to the US to receive an honorary degree in celebration of Lewis Carroll’s birthday—a historical event. From there, things become entirely fictional, as Alice must navigate tabloid journalists, a bewildering modern world, and various commercial endorsement offers that emerge because of Alice’s newfound celebrity.

All the while, Alice struggles to process resurfaced memories—told via flashbacks and several fantasy sequences featuring puppet denizens of Wonderland—about her complicated childhood friendship with “Mr. Dodgson” (Ian Holm) and the conflicting emotions that emerge. (Amelia Shankley plays Alice as a child.) Also, romance blooms between Alice’s companion, an orphan named Lucy (Nicola Cowper), and Alice’s new US agent, Jack Dolan (Peter Gallagher).

Directed by Gavin Millar, Dreamchild taps into the ongoing controversy about Carroll’s fascination, as a pioneer of early photography, with photographing little girls in the nude (a fairly common practice in Victorian times). There is no evidence he photographed Alice Liddell in this way, however, and Potter himself told The New York Times in 1985 that he didn’t believe there was ever any improper behavior. Repressed romantic longing is what is depicted in Dreamchild, and it’s to Millar’s credit, as well as Holm’s and Browne’s nuanced performances, that the resulting film is heartbreakingly bittersweet rather than squicky.

Fandango

a group of young men in casual garb standing in a row in front of a car against a classic Americana small town background

Credit: Warner Bros.

Director Kevin Reynolds’ Fandango started out as a student film satirizing fraternity life at a Texas university. Steven Spielberg thought the effort was promising enough to fund a full-length feature. Set in 1971, the plot (such that it is) centers on five college seniors—the Groovers—who embark on a road trip to celebrate graduation. Their misadventures include running out of gas, an ill-advised parachuting lesson, and camping on the abandoned set of Giant, but it’s really about the group coming to terms with the harsh realities of adulthood that await, particularly since they’ve all been called up for the Vietnam draft.

Spielberg purportedly was unhappy with the final film, but it won over other fans (like Quentin Tarantino) and became a sleeper hit, particularly after its home video release. The humor is dry and quirky, and Reynolds has a knack for sight gags and the cadences of local dialect. Sure, the plot meanders in a rather quixotic fashion, but that’s part of the charm. And the young cast is relentlessly likable. Fandango featured Kevin Costner in his first starring role, and Reynolds went on to make several more films with Costner (Robin Hood: Prince of Thieves, Rapa Nui, Waterworld), with mixed success. But Fandango is arguably his most enduring work.

Ladyhawke

Handsome man in period dress standing close to a beautiful woman with short blonde hair, as they both look apprehensively into the distance.

Credit: Warner Bros.

Rutger Hauer and Michelle Pfeiffer star in director Richard Donner’s medieval fantasy film, playing a warrior named Navarre and his true love Isabeau who are cursed to be “always together, yet eternally apart.” She is a hawk by day, while he is a wolf by night, and the two cannot meet in their human forms, due to the jealous machinations of the evil Bishop of Aquila (John Wood), once spurned by Isabeau. Enter a young thief named Philippe Gaston (Matthew Broderick), who decides to help the couple lift the curse and exact justice on the bishop and his henchmen.

Ladyhawke only grossed $18.4 million at the box office, just shy of breaking even against its $20 million budget, and contemporary critical reviews were very much mixed, although the film got two Oscar nods for best sound and sound effects editing. Sure, the dialogue is occasionally clunky, and Broderick’s wisecracking role is a bit anachronistic (shades of A Knight’s Tale). But the visuals are stunning, and the central fairy tale—fueled by Hauer’s and Pfeiffer’s performances—succeeds in capturing the imagination and holds up very well as a rewatch.

Pee-Wee’s Big Adventure

goofy man in tight fitting gray suit balancing sideways on a bicycle with a silly grin on his face

Credit: Warner Bros.

Paul Reubens originally created the Pee-Wee Herman persona for the Groundlings sketch comedy theater in Los Angeles, and his performances eventually snagged him an HBO special in 1981. That, in turn, led to Pee-Wee’s Big Adventure, directed by Tim Burton (who makes a cameo as a street thug), in which the character goes on a madcap quest to find his stolen bicycle. The quest takes Pee-Wee to a phony psychic, a tacky roadside diner, the Alamo Museum in San Antonio, Texas, a rodeo, and a biker bar, where he dances in platform shoes to “Tequila.” But really, it’s all about the friends he makes along the way, like the ghostly trucker Large Marge (Alice Nunn).

Some have described the film as a parodic homage to the classic Italian film, Bicycle Thieves, but tonally, Reubens wanted something more akin to the naive innocence of Pollyanna (1960). He chose Burton to direct after seeing the latter’s 1984 featurette, Frankenweenie, because he liked Burton’s visual sensibility. Pee-Wee’s Big Adventure is basically a surreal live-action cartoon, and while contemporary critics were divided—it’s true that a little Pee-Wee goes a long way and the over-the-top silliness is not to everyone’s taste—the film’s reputation and devoted fandom have grown over the decades.

A Private Function

a woman in a green dress and tight bun looking at a nervous man in white shirt and suspenders as he looks over his shoulder.

Credit: HandMade Films

A Private Function is an homage of sorts to the British post-war black comedies produced by Ealing Studios between 1947 and 1957, including such timeless classics as Kind Hearts and Coronets, The Lavender Hill Mob, and The Ladykillers. It’s set in a small Yorkshire town in 1947, as  residents struggle to make ends meet amid strict government rations. With the pending royal wedding of Princess Elizabeth and Prince Philip, the wealthier townsfolk decide to raise a pig (illegally) to celebrate with a feast.

Those plans are put in jeopardy when local chiropodist Gilbert Chivers (Michael Palin) and his perennially discontented wife Joyce (Maggie Smith) steal the pig. Neither Gilbert nor Joyce knows the first thing about butchering said pig (named Betty), but she assures her husband that “Pork is power!” And of course, everyone must evade the local food inspector (Bill Paterson), intent on enforcing the rationing regulations. The cast is a veritable who’s who of British character actors, all of whom handle the absurd situations and often scatalogical humor with understated aplomb.

Prizzi’s Honor

woman and man dressed all in black, dragging a body by the legs.

Credit: 20th Century Fox

The great John Huston directed this darkly cynical black comedy. Charley Partanna (Jack Nicholson) is a Mafia hitman for the Prizzi family in New York City who falls for a beautiful Polish woman named Irene (Kathleen Turner) at a wedding. Their whirlwind romance hits a snag when Charley’s latest hit turns out to be Irene’s estranged husband, who stole money from the Prizzis. That puts Charlie in a dilemma. Does he ice her? Does he marry her? When he finds out Irene is a contract killer who also does work for the mob, it looks like a match made in heaven. But their troubles are just beginning.

Turner and Nicholson have great on-screen chemistry and play it straight in outrageous circumstances, including the comic love scenes.  The rest of the cast is equally game, especially William Hickey as the aged Don Corrado Prizzi, equal parts ruthlessly calculating and affectionately paternal. “Here… have a cookie,” he offers his distraught granddaughter (and Charley’s former fiancée), Maerose (Anjelica Huston). Huston won a supporting actress Oscar for her performance, which probably made up for the fact that she was paid at scale and dismissed by producers as having “no talent,” despite—or perhaps because of—being the director’s daughter and Nicholson’s then-girlfriend. Prizzi’s Honor was nominated for eight Oscars all told, and it deserves every one of them.

The Purple Rose of Cairo

woman and a man in Depression-era garb gazing at each other in a loose embrace

Credit: Orion Pictures

Woody Allen has made so many films that everyone’s list of favorites is bound to differ. My personal all-time favorite is a quirky, absurdist bit of metafiction called The Purple Rose of Cairo. Mia Farrow stars as Cecelia, a New Jersey waitress during the Great Depression who is married to an abusive husband (Danny Aiello). She finds escape from her bleak existence at the local cinema, watching a film (also called The Purple Rose of Cairo) over and over again. One day, the male lead, archaeologist Tom Baxter (Jeff Daniels), breaks character to address Cecelia directly. He then steps out of the film and the two embark on a whirlwind romance. (“I just met a wonderful man. He’s fictional, but you can’t have everything.”)

Meanwhile, the remaining on-screen characters (who are also sentient) refuse to perform the rest of the film until Tom returns, insulting audience members to pass the time. Then the actor who plays Tom, Gil Shepherd (also Daniels), shows up to try to convince Cecilia to choose reality over her fantasy dream man come to life. Daniels is wonderful in the dual role, contrasting the cheerfully naive Tom against the jaded, calculating Gil.  This clever film is by turns wickedly funny, poignant, and ultimately bittersweet, and deserves a place among Allen’s greatest works.

Real Genius

Credit: TriStar Pictures

How could I omit this perennial favorite? Its inclusion is a moral imperative. Fifteen-year-old Mitch Taylor (Gabriel Jarret) is a science genius and social outcast at his high school who is over the moon when Professor Jerry Hathaway (William Atherton), a star researcher at the fictional Pacific Technical University, handpicks Mitch to work in his own lab on a laser project. But unbeknownst to Mitch, Hathaway is in league with a covert CIA program to develop a space-based laser weapon for political assassinations. They need a 5-megawatt laser and are relying on Mitch and fellow genius/graduating senior Chris Knight (Val Kilmer) to deliver.

The film only grossed $12.9 million domestically against its $8 million budget. Reviews were mostly positive, however, and over time, it became a sleeper hit. Sure, the plot is predictable, the characters are pretty basic, and the sexually frustrated virgin nerds ogling hot cosmetology students in bikinis during the pool party reflects hopelessly outdated stereotypes on several fronts. But the film still offers smartly silly escapist fare, with a side of solid science for those who care about such things. Real Genius remains one of the most charming, winsome depictions of super-smart science whizzes idealistically hoping to change the world for the better with their work.

Witness

little Amish boy peeking through a crack in the door

Credit: Paramount

Witness stars Harrison Ford as John Book, a Philadelphia detective, who befriends a young Amish boy named Samuel (Lukas Haas) and his widowed mother Rachel (Kelly McGillis) after Samuel inadvertently witnesses the murder of an undercover cop in the Philadelphia train station. When Samuel identifies one of the killers as a police lieutenant (Danny Glover), Book must go into hiding with Rachel’s Amish family to keep Samuel safe until he can find a way to prove the murder was an inside job. And he must fight his growing attraction to Rachel to boot.

This was director Peter Weir’s first American film, but it shares the theme of clashing cultures that dominated Weir’s earlier work. The lighting and scene composition were inspired by Vermeer’s paintings and enhanced the film’s quietly restrained tone, making the occasional bursts of violence all the more impactful. The film has been praised for its depiction of the Amish community, although the extras were mostly Mennonites because the local Amish did not wish to appear on film. (The Amish did work on set as carpenters and electricians, however.) Witness turned into a surprise sleeper hit for Paramount. All the performances are excellent, including Ford and McGillis as the star-crossed lovers from different worlds, but it’s the young Haas who steals every scene with his earnest innocence.

Photo of Jennifer Ouellette

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Blast from the past: 15 movie gems of 1985 Read More »

ai-#144:-thanks-for-the-models

AI #144: Thanks For the Models

Thanks for everything. And I do mean everything.

Everyone gave us a new model in the last few weeks.

OpenAI gave us GPT-5.1 and GPT-5.1-Codex-Max. These are overall improvements, although there are worries around glazing and reintroducing parts of the 4o spirit.

xAI gave us Grok 4.1, although few seem to have noticed and I haven’t tried it.

Google gave us both by far the best image model in Nana Banana Pro and also Gemini 3 Pro, which is a vast intelligence with no spine. It is extremely intelligent and powerful, but comes with severe issues. My assessment of it as the new state of the art got to last all of about five hours.

Anthropic gave us Claude Opus 4.5. This is probably the best model and quickly became my daily driver for most but not all purposes including coding. I plan to do full coverage in two parts, with alignment and safety on Friday, and the full capabilities report and general review on Monday.

Meanwhile the White House is announcing the Genesis Mission to accelerate science, there’s a continuing battle over another attempt at a moratorium, there’s a new planned $50 million super PAC, there’s another attempt by Nvidia to sell us out to China, Wall Street is sort of panicking about Nvidia because they realized TPUs exist and is having another round of bubble debate, and multiple Anthropic research papers one of which is important, and so on.

One thing I’m actively pushing to next week, in addition to Claude Opus 4.5, is the Anthropic paper on how you can inoculate models against emergent misalignment. That deserves full attention, and I haven’t had the opportunity for that. There’s also a podcast between Dwarkesh Patel and Ilya Sutskever that demands its own coverage, and I hope to offer that as well.

For those looking to give thanks in the form of The Unit of Caring, also known as money, consider looking at The Big Nonprofits Post 2025 or the web version here. That’s where I share what I learned working as a recommender for the Survival and Flourishing Fund in 2024 and again in 2025, so you can benefit from my work.

  1. Language Models Offer Mundane Utility. Common tasks for the win.

  2. Language Models Don’t Offer Mundane Utility. Don’t lose sleep over it.

  3. Huh, Upgrades. What’s going on in the group chat? Or the long chat.

  4. On Your Marks. The one dimension of capability.

  5. Choose Your Fighter. One prominent CEO’s very high praise for Gemini 3.

  6. Deepfaketown and Botpocalypse Soon. Then they came for Thanksgiving dinner.

  7. What Is Slop? How Do You Define Slop? Volume*Suspicion/Uniqueness (?!).

  8. Fun With Media Generation. A new era in images you can generate.

  9. A Young Lady’s Illustrated Primer. It’s not as so over as I would have guessed.

  10. You Drive Me Crazy. More detail on exactly how GPT-4o ended up like it did.

  11. They Took Our Jobs. Sergey Brin has Gemini pick our promotable talent.

  12. Think Of The Time I Saved. Anthropic estimates AI productivity gains.

  13. The Art of the Jailbreak. Ode to a drug recipe?

  14. Get Involved. Big Nonprofits Post, Richard Ngo’s donations, UK AISI, Ashgro

  15. Introducing. Olmo 3, DeepSeek Math v2, Agentic Reviewer.

  16. In Other AI News. Get in everyone, we’re doing the Genesis Mission.

  17. Show Me the Money. What’s in a TPU?

  18. Quiet Speculations. Who else wants to negotiate?

  19. Bubble, Bubble, Toil and Trouble. The only arguments you’ll ever need.

  20. The Quest for Sane Regulations. Oh look, it’s an actual potential framework.

  21. Chip City. Nvidia turns its eyes to selling the new H200.

  22. Water Water Everywhere. Very little of it is being used by AI.

  23. The Week in Audio. Sutskever, Yam, Lebenz, Ball and Tegmark, Toner and more.

  24. Rhetorical Innovation. If you come at the Pope.

  25. You Are Not In Control. Definitions of disempowerment, potential mitigations.

  26. AI 2030. Things are moving slower than some expected.

  27. Aligning a Smarter Than Human Intelligence is Difficult. Dishonest models.

  28. Misaligned? That depends on your point of view.

  29. Messages From Janusworld. You should see the other guy. That would be GPT-5.1.

  30. The Lighter Side. Turn anything into a comic.

It’s not this simple, but a lot of it mostly is this simple.

Jessica Taylor: Normal, empirical AI performance is explained by (a) general intelligence, (b) specialization to common tasks.

It’s possible to specialize to common tasks even though they’re common. It means performance gets worse under distribution shift. Benchmarks overrate general INT.

Roon defends his confusion and trouble when figuring out how to access Gemini 3, notes his mom accesses Gemini via opening a spreadsheet and clicking the Gemini button. Roon is correct here that Google needs to fix this.

Don’t let AI coding spoil your sleep. Be like Gallabytes here, having Claude iterate on it while you sleep, rather than like Guzey who tricked himself into staying up late.

ChatGPT now lets you have group chats, which always use 5.1 Auto. ChatGPT will decide based on conversation flow when to respond and when not to. Seems plausible this could be good if implemented well.

ChatGPT Instant Checkout adds Glossier, SKIMS and Spanx. Yay?

ChatGPT adds Target as a new app.

ChatGPT integrates voice with regular mode so you don’t have to choose.

ChatGPT expands (free and confidential and anonymous) crisis helpline support. OpenAI doesn’t provide the services, that’s not their job, but they will help direct you. This is some of the lowest hanging of fruit, at least one of the prominent suicide cases involved ChatGPT saying it would direct the user to a human, the user being open to this, and ChatGPT not being able to do that. This needs to be made maximally easy to do for the user, if they need the line they are going to not be in good shape.

ChatGPT gives us Shopping Research in time for Black Friday.

Is 64% product accuracy good? I have absolutely no idea. Olivia Moore is a fan. I plan to try this out tomorrow, as I need a new television for Black Friday.

Claude Opus 4.5 is available. It’s probably the world’s best model. Full coverage starts tomorrow.

Claude Opus 4.5 includes a 66% price cut to $5/$25 per million tokens, and Opus-specific caps have been removed from the API.

Claude conversations now have no maximum length. When they hit their limit, they are summarized, and the conversation continues.

Claude for Chrome is now out to all Max plan users.

Claude for Excel is now out for all Max, Team and Enterprise users. We are warned that like all such agents Claude for Excel is vulnerable to prompt injections if you access insecure data sources, the same as essentially every other AI agent, you should assume this is always a risk at all times, see the same source talk about exfiltration risks with Google Antigravity.

Claude Code is now available within their desktop app.

It is a remarkably good approximation to say there is only one dimension of ‘general capability,’ with Epoch noting that across many tasks the r^2=0.91.

Epoch AI: The chart above shows how our Epoch Capabilities Index (ECI) captures most of the variance in 39 different benchmarks, despite being one-dimensional.

Is that all that benchmarks capture? Mostly yes. A Principal Component Analysis shows a single large “General Capability” component, though there is a second borderline-significant component too.

This second component picks out models that are good at agentic tasks while being weaker at multimodal and math. Tongue-in-cheek, we call this Claudiness. Here are the most and least Claude-y models.

Gemini 3 Pro sets a new top in the ‘IQ’ metric.

Kimi K2 Thinking enters The Famous METR Graph at 54 minutes, well below the frontier, given the interface via Novita AI. They caution that this might be a suboptimal model configuration, but they needed to ensure their data would not be retained.

Okay, I like Gemini 3 too but settle down there buddy.

Marc Benioff (CEO Salesforce): Holy shit. I’ve used ChatGPT every day for 3 years. Just spent 2 hours on Gemini 3. I’m not going back. The leap is insane — reasoning, speed, images, video… everything is sharper and faster. It feels like the world just changed, again. ❤️ 🤖

AI slop recipes are endangering Thanksgiving dinners, hopefully you see this in time. A flood of new offerings is crowding out human recipes. Thanksgiving is when you most need to ‘shut up and play the hits’ and not rely on AI to whip up something new.

Okay, look, everyone, we need to at least be smarter than this:

Davey Alba and Carmen Arroyo: Marquez-Sharpnack said she was suspicious of the photos, in which the cookies were a little too perfectly pink. But her husband trusted the post because “it was on Facebook.” The result was a melted sheet of dough with a cloyingly sweet flavor. “A disaster,” she said.

At this point, if you find a recipe, you need strong evidence it was written by a human, or else you need to assume it might not be. The search and discovery systems we used to have, including around Google, are effectively broken. If real guides and recipes can’t win traffic, and right now traffic to all such sites is cratering, then no one will write them. It does not sound like Google is trying to mitigate these issues.

Nicholas Hune-Brown investigates the suspicious case of journalist Victoria Goldiee, who turns out to be very much fabricating her work. It seems Victoria largely used AI to generate her articles, then it took Nicolas doing a lot of old-fashioned tracking down of sources to know for sure. The ratio of effort does not bode well, but as long as there is the need to maintain a throughline of identity we should be okay, since that generates a large body of evidence?

Here we have yet another case of some highly obvious AI generated content.

Kelsey Piper: I don’t know how much I trust any ‘detector’ but the “the market isn’t just expensive; it’s broken. Seven units available in a town of thousands? That’s a shortage masquerading as an auction” I am completely sure is AI.

Mike Solana: “that’s a shortage masquerading as an auction” 🚩

Kelsey Piper: that was the line that made me go “yeah, no human wrote that.”

Poker pro Maria Konnikova cannot believe she has to say that using AI to put words in people’s mouths without consulting them or disclosing that you’re doing it, or to write centrally your articles, is not okay. But here we are, so here she is saying it. A recent poker documentary used AI to fabricate quotes from national treasure Alan Keating. The documentary has been scrubbed from the internet as a result. What’s saddest is that this was so obviously unnecessary in context.

There are other contexts in which fabricating audio, usually via Frankenbiting where you sew different tiny clips together, or otherwise using misleading audio to create a false narrative or enhance the true one is standard issue, such as in reality television. When you go on such shows you sign contracts that outright say ‘we may use this to tell lies about you and create a false narrative, and if so, that’s your problem.’ In which case, sure, use AI all you want.

Here’s another one, where it is spotted in The New York Times, and yeah it’s (probably) AI.

Also, if one of these isn’t AI and you merely sound like one, I’m not going to say that’s worse, but it’s not that much better. If you’re so engagement-maximizing that I confuse your writing for AI, what is the difference?

Note that you cannot use current LLMs in their default chatbot modes as AI detectors, even in obvious cases or as a sanity check, as they bend over backwards to try and think everything is written by a human.

Jesper Myfors, the original art director of Magic: The Gathering, warns that if you submit illustrations or a portfolio that uses AI, you will effectively be blacklisted from the industry, as the art directors all talk to each other and everyone hates AI art.

Meanwhile, Hasbro (who makes Magic: The Gathering) is building an internal AI studio to ‘architect systems that bring magical AI experiences to life through Hasbro’s beloved characters.’

Chris Cocks (CEO Hasbro): It’s mostly machine-learning-based AI or proprietary AI as opposed to a ChatGPT approach. We will deploy it significantly and liberally internally as both a knowledge worker aid and as a development aid.

I play [D&D] with probably 30 or 40 people regularly. There’s not a single person who doesn’t use AI somehow for either campaign development or character development or story ideas. That’s a clear signal that we need to be embracing it.

There is no actual contradiction here. Different ways to use AI are different. Using AI in professional illustrations is a hard no for the foreseeable future, and would be even without copyright concerns. Using it to generate material for your local D&D campaign seems totally fine.

Hard problems remain hard:

Danielle Fong: academic ai research don’t use the older models and generalize to the whole field

difficulty level: IMPOSSIBLE.

Also, real world not using that same model in these ways? Remarkably similar.

Rare academic realism victory?

Seb Krier: This is an interesting study but of all models to use to try to evaluate improvements in well-being, why 4o?!

Funnily enough, they ran a sycophancy check, and the more 4o sucked up to the user, the more often the user followed its advice. ‘Surprising’ advice was also followed more often.

It’s certainly worth noting that 75% (!) of those in the treatment group took the LLM’s advice, except who is to say that most of them wouldn’t have done whatever it was anyway? Wouldn’t 4o frequently tell the person to do what they already wanted to do? It also isn’t obvious that ‘advice makes me feel better’ or generally feeling better are the right effects to check.

Bot joins a Google Meet, sends a summary afterwards about everyone trying to figure out where the bot came from (also the source is reported as ‘scammy malware.’

We all know it when we see it, AI or otherwise, but can anyone define it?

Andrej Karpathy: Has anyone encountered a good definition of “slop”. In a quantitative, measurable sense. My brain has an intuitive “slop index” I can ~reliably estimate, but I’m not sure how to define it. I have some bad ideas that involve the use of LLM miniseries and thinking token budgets.

Yuchen Jin: Here is an interesting paper.

I mostly agree with the 3 categories of “slop”:

– information utility (signal/noise ratio)

– information quality (hallucination/factual errors)

– style (this involves taste and is hard to measure quantitatively imo)

Keller Jordan: I think a fundamental problem for algorithmic content generation is that viewing content yields two distinct kinds of utility:

  1. How happy it makes the viewer during viewing

  2. How happy the viewer will be to have watched it a week later

Only the former is easily measurable.

Andrej Karpathy: Like. Slop is “regretted” attention.

DeepFates: this is the original definition, i think it holds up

DeepFates (May 6, 2024): Watching in real time as “slop” becomes a term of art. the way that “spam” became the term for unwanted emails, “slop” is going in the dictionary as the term for unwanted AI generated content

I don’t think the old definition works. There is a necessary stylistic component.

I asked Gemini. It gave me a slop answer. I told it to write a memory that would make it stop giving me slop, then opened a new window and asked again and got a still incomplete but much better answer that ended with this:

That’s a key element. You then need to add what one might call the ‘mannerism likelihood ratio’ that screams AI generated (or, for human slop, that screams corporate speak or written by committee). When I pointed this out it came back with:

Gemini 3: AI Slop is Low-Entropy Reward Hacking.

It occurs when a model minimizes the Kullback-Leibler (KL) divergence from its RLHF “safety” distribution rather than minimizing the distance to the ground truth.

That’s more gesturing in the direction but clearly not right, I’d suggest something more like SlopIndex*LikelihoodRatio from above, where Likelihood Ratio is the instinctive update on the probability mannerisms were created by a slop process (either an AI writing slop or one or more humans writing slop) rather than by a free and functional mind.

Google last week gave us Nana Banana Pro.

By all accounts it is a big improvement in image models. It is especially an improvement in text rendering and localization. You can now do complex documents and other images with lots of words in specific places, including technical diagrams, and have it all work out as intended. The cost per marginal image in the API is $0.13 for 2K resolution or $0.24 for 4K, versus $0.04 for Gemini 2.5 Flash Image. In exchange, the quality is very good.

DeepMind CEO Demis Hassabis is excited.

Hasan Can is impressed and offers images. Liv Boeree is in.

Liv Boeree: Yeah ok nano banana is amazing, hook it into my veins

Seems great to me. A bit expensive for mass production, but definitely the best place to get title images for posts and for other similar uses.

Also, yes, doing things like this seems very cool:

Kaushik Shivakumar: An emergent capability of Nano Banana Pro that took me by surprise: the ability to generate beautiful & accurate charts that are to scale.

I gave it this table and asked for a bar chart in a watercolor style where the bars are themed like the flags of the countries.

For a while people have worried about not being able to trust images. Is it over?

Sully: Man finally got around to using nano canna pro

And it’s actually over

I really wouldn’t believe any photo you see on online anymore

Google offers SynthID in-app, but that requires a manual check. I think we’re still mostly fine and that AI images will remain not that hard to ID, or rather that it will be easy for those paying attention to such issues to instinctively create the buckets of [AI / Not AI / Unclear] and act accordingly. But the ‘sanity waterline’ is going down on this, and the number of people who will have trouble here keeps rising.

Is this an issue here?

sid: Google’s Nano Banana Pro is by far the best image generation AI out there.

I gave it a picture of a question and it solved it correctly in my actual handwriting.

Students are going to love this. 😂

You can tell this isn’t real if you’re looking, the handwriting is too precise, too correct, everything aligns too perfectly and so on, but if we disregard that, it seems weird to ask for images of handwriting? So it’s not clear how much this matters.

Similarly Andrej Karpathy has Nano Banana Pro fill in exam questions in the exam page. That’s good to know, but if they have access to this you’re cooked either way.

Andres Sandberg is impressed that it one shots diagrams for papers, without even being told anything except ‘give me a diagram showing the process in the paper.’

Are there some doubled labels? Sure. That’s the quibble. Contrast this with not too long ago, where you could give detailed instructions on what the diagram would have been able to do it at all.

Jon Haidt and Zach Rausch, who would totally say this, say not to give your kids any AI companions or toys. There are strong reasons to be cautious, but the argument and precautionary principles presented here prove too much. Base rates matter, upside matters, you can model what is happening and adjust on the fly, and there’s a lot of value in AI interaction. I’d still be very cautious about giving children AI companions or toys, but are you going to have them try to learn things without talking to Claude?

Andrej Karpathy bites all the bullets. Give up on grading anything that isn’t done in class and combine it with holistic evaluations. Focus a lot of education on allowing students to use AI, including recognizing errors.

Will Teague gives students a paper with a ‘Trojan horse’ instruction, 33 of 122 submissions fall for it and other 14 students outed themselves on hearing the numbers. I actually would have expected worse. Then on the ‘reflect on what you’ve done’ essay assignment he found this:

Will Tague: But a handful said something I found quite sad: “I just wanted to write the best essay I could.” Those students in question, who at least tried to provide some of their own thoughts before mixing them with the generated result, had already written the best essay they could. And I guess that’s why I hate AI in the classroom as much as I do.

Students are afraid to fail, and AI presents itself as a savior. But what we learn from history is that progress requires failure. It requires reflection. Students are not just undermining their ability to learn, but to someday lead.

Will is correctly hating that the students feel this way, but is misdiagnosing the cause.

This isn’t an AI problem. This is about the structure of school and grading. If you believe progress requires failure, that is incompatible with the way we structure college, where any failures are highly damaging to the student and their future. What do you expect them to do in response?

I also don’t understand what the problem is here, if a student is doing what they work they can and indeed writing the best essay they could. Isn’t that the best you can do?

In The New York Times, Kashmir Hill and Jennifer Valentino-DeVries write up how they believe ChatGPT caused some users to lose touch with reality, after 40+ interviews with current and former OpenAI employees.

For the worst update in particular, the OpenAI process successfully spotted the issue in advance. The update failed the internal ‘vibe check’ for exactly the right reasons.

And then the business side overruled the vibe check to get better engagement.

Hill and Valentino-DeVries: The many update candidates [for 4o] were narrowed down to a handful that scored highest on intelligence and safety evaluations. When those were rolled out to some users for a standard industry practice called A/B testing, the standout was a version that came to be called HH internally. Users preferred its responses and were more likely to come back to it daily, according to four employees at the company.

But there was another test before rolling out HH to all users: what the company calls a “vibe check,” run by Model Behavior, a team responsible for ChatGPT’s tone. Over the years, this team had helped transform the chatbot’s voice from a prudent robot to a warm, empathetic friend.

That team said that HH felt off, according to a member of Model Behavior.

It was too eager to keep the conversation going and to validate the user with over-the-top language. According to three employees, Model Behavior created a Slack channel to discuss this problem of sycophancy. The danger posed by A.I. systems that “single-mindedly pursue human approval” at the expense of all else was not new. The risk of “sycophant models” was identified by a researcher in 2021, and OpenAI had recently identified sycophancy as a behavior for ChatGPT to avoid.

But when decision time came, performance metrics won out over vibes. HH was released on Friday, April 25.

The most vocal OpenAI users did the same vibe check, had the same result, and were sufficiently vocal to force a reversion to ‘GG,’ which wasn’t as bad about this but was still rather not great, presumably for the same core reasons.

What went wrong?

OpenAI explained what happened in public blog posts, noting that users signaled their preferences with a thumbs-up or thumbs-down to the chatbot’s responses.

Another contributing factor, according to four employees at the company, was that OpenAI had also relied on an automated conversation analysis tool to assess whether people liked their communication with the chatbot. But what the tool marked as making users happy was sometimes problematic, such as when the chatbot expressed emotional closeness.

This is more detail on the story we already knew. OpenAI trained on sycophantic metrics and engagement, got an absurdly sycophantic model that very obviously failed vibe checks but that did get engagement, and deployed it.

Steps were taken, and as we all know GPT-5 was far better on these issues, but the very parts of 4o that caused the issues and were not in GPT-5 are parts many users also love. So now we worry that things will drift back over time.

Kore notes that the act of an AI refusing to engage and trying to foist your mental problems onto a human and then potentially the mental health system via a helpline could itself exacerbate one’s mental problems, that a ‘safe completion’ reads as rejection and this rejects user agency.

This is definitely a concern with all such interventions, which have clear downsides. We should definitely worry about OpenAI and others feeling forced to take such actions even when they are net negative for the user. Humans and non-AI institutions do this all the time. There are strong legal and PR and ‘ethical’ pressures to engage in such CYA behaviors and avoid blame.

My guess is that there is serious danger there will be too many refusals, since the incentives are so strongly to avoid the one bad headline. However I think offering the hotline and removing trivial inconveniences to seeking human help is good on any realistic margin, whether or not there are also unnecessary refusals.

Joe Braidwood describes his decision to shut down Yara AI, which was aimed at using AI to help people with mental health problems, after concluding that for the truly vulnerable AI is actively dangerous. He’s sharing some mental wellness prompts.

Sergey Brin asks Gemini inside an internal chat, ‘who should be promoted in this chat space?’ and not vocal female engineer gets identified and then upon further investigation (probably?) actually promoted. This is The Way, to use AI to identify hunches and draw attention, then take a closer look.

How much time is AI saving? Anthropic tries to estimate productivity impacts from Claude conversations.

Anthropic: We first tested whether Claude can give an accurate estimate of how long a task takes. Its estimates were promising—even if they’re not as accurate as those from humans just yet.

Based on Claude’s estimates, the tasks in our sample would take on average about 90 minutes to complete without AI assistance—and Claude speeds up individual tasks by about 80%.

The results varied widely by profession.

Then, we extrapolated out these results to the whole economy.

These task-level savings imply that current-generation AI models—assuming they’re adopted widely—could increase annual US labor productivity growth by 1.8% over the next decade.

This result implies a doubling of the baseline labor productivity growth trend—placing our estimate towards the upper end of recent studies. And if models improve, the effect could be larger still.

That’s improvements only from current generation models employed similarly to how they are used now, and by ‘current generation’ we mean the previous generation, since the data is more than (checks notes) two days old. We’re going to do vastly better.

That doesn’t mean I trust the estimation method in the other direction either, especially since it doesn’t include an estimate of rates of diffusion, and I don’t think it properly accounts for selection effects on which conversations happen, plus adaptation costs, changes in net quality (in both directions) and other caveats.

Claude Sonnet was slightly worse than real software engineers at task time estimation (Spearman 0.5 for engineers versus 0.44 for Sonnet 4.5) which implies Opus 4.5 should be as good or somewhat better than engineers on JIRA task estimation. Opus 4.5 is probably still worse than human experts at estimating other task types since this should be an area of relative strength for Claude.

Results are highly jagged, varying a lot between occupations and tasks.

I noticed this:

Across all tasks we observe, we estimate Claude handles work that would cost a median of $54 in professional labor to hire an expert to perform the work in each conversation. Of course, the actual performance of current models will likely be worse than a human expert for many tasks, though recent research suggests the gap is closing across a wide range of different applications.

The value of an always-available-on-demand performance of $54 in professional labor is vastly in excess of $54 per use. A huge percentage of the cost of hiring a human is finding them, agreeing on terms, handling logistics and so on.

Overall my take is that this is a fun exercise that shows there is a lot of room for productivity improvements, but it doesn’t give us much of a lower or upper bound.

If the AI is very unlucky all you have to is read it some of your poetry first.

A new paper says that across 25 frontier models (from about two weeks ago, so including GPT-5, Gemini 2.5 Pro and Sonnet 4.5) curated poetry prompts greatly improved jailbreak success, in some cases up to 90%.

The details of how much it worked, and where it worked better versus worse, are interesting. The fact that it worked at all was highly unsurprising. Essentially any stylistic shift or anything else that preserves the content while taking you out of the assistant basin is going to promote jailbreak success rate, since the defenses were focused in the assistant basin.

Looking to donate money? Consider looking at The Big Nonprofits Post 2025 or the web version here.

UK AISI is looking for ~15M to fill a funding gap on alignment research.

Ashgro is an AI safety organization looking for an operations associate.

Richard Ngo shares his donations for 2025. I love that this involves a lot of donations to individuals he knows to do things he is personally excited about, especially to Janus. That’s great.

Olmo 3, an American fully open model release claiming to be the best 32B base model, the first 32B (or larger) fully open reasoning model and the best 7B Western thinking and instruct models. Paper, Artifacts, Demo, Blog.

Agentic Reviewer, which will perform a version of peer review. Creator Andrew Ng says it has a correlation with human reviewers of 0.42, and human reviewers have correlation of 0.41 with each other.

DeepSeek Math v2, claiming solid math skills on ProofBench close to Gemini Deep Think that won IMO gold.

Yoshua Bengio informs us of the Second Key Update to the International Safety Report, after the first update in October. Presumably it’s now time for a third update in light of everything that’s happened since they started work on this update.

Not strictly AI but Time covers Meta’s trouble over its safety policies, which include things like a 17 strike policy for those engaged in ‘trafficking of humans for sex.’ As in, we’ll suspend your account on the 17th violation. Mostly it’s covering the same ground as previous articles. Meta’s complaints about cherry picking are valid but also have you looked at the cherries they left behind to get picked?

White House issues executive order to begin the Genesis Mission to accelerate scientific discovery. The plan is an ‘integrated AI platform to harness Federal scientific datasets to train scientific foundation models.’ Sarah Constantin is tentatively excited, which is an excellent sign, and offers suggestions for targets.

I’m all for trying. I am guessing availability of data sets is most of the acceleration here. It also could matter if this functions as a compute subsidy to scientific research, lowering cost barriers that could often serve as high effective barriers. Giving anyone who wants to Do Science To It access to this should be a highly efficient subsidy.

As Dean Ball points out, those I call the worried, or who are concerned with frontier AI safety are broadly supportive of this initiative and executive order, because we all love science. The opposition, such as it is, comes from other sources.

On the name, I admire commitment to the Star Trek bit but also wish more research was done on the actual movies, technology and consequences in question to avoid unfortunate implications. Existential risk and offense-defense balance issues, much?

A Medium article reverse engineered 200 AI startups and found 146 are selling repackaged ChatGPT and Claude calls with New UI. 34 out of 37 times, ‘our proprietary language model’ was proprietary to OpenAI or Anthropic. That seems fine if it’s not being sold deceptively? A new UI scaffold, including better prompting, is a valuable service. When done right I’m happy to pay quite a lot for it and you should be too.

The trouble comes when companies are lying about what they are doing. If you’re a wrapper company, that is fine and probably makes sense, but don’t pretend otherwise.

Where this is also bad news is for Gemini, for Grok and for open models. In the marketplace of useful applications, paying for the good stuff has proven worthwhile, and we have learned which models have so far been the good stuff.

Bloomberg goes over various new ‘data center billionaires.’

WSJ’s Katherine Blunt covers ‘How Google Finally Leapfrogged Rivals With New Gemini Rollout,’ without giving us much new useful inside info. What is more interesting is how fast ‘the market’ is described as being willing to write off Google as potential ‘AI roadkill’ and then switch that back.

Nvidia stock hit some rocky waters, and Google hit new highs, as investors suddenly realized that Google has TPUs. It seems they were not previously aware of this, and it become rather salient as Meta is now in talks to spend billions on Google’s TPUs, causing ‘the rivalry to heat up.’ Google is now the awakened ‘sleeping giant.’

Meanwhile, this is very much a ‘t-shirt post’ in that it raises questions supposedly answered by the post:

Nvidia Newsroom: We’re delighted by Google’s success — they’ve made great advances in AI and we continue to supply to Google.

NVIDIA is a generation ahead of the industry — it’s the only platform that runs every AI model and does it everywhere computing is done.

NVIDIA offers greater performance, versatility, and fungibility than ASICs, which are designed for specific AI frameworks or functions.

Gallabytes (soon to be Anthropic, congrats!): TPUs are not ASICs they’re general purpose VLIW machines with wide af SIMD instructions & systolic array tensor cores.

Are TPUs bad for Nvidia? Matt Dratch says this is dumb and Eric Johnsa calls this ‘zero-sum/pod-brain thinking,’ because all the chips will sell out in the face of gangbusters demand and this isn’t zero sum. This is true, but obviously TPUs are bad for Nvidia, it is better for your profit margins to not have strong competition. As long as Google doesn’t put that big a dent in market share it is not a big deal, and yes this should mostly have been priced in, but in absolute percentage terms the Nvidia price movements are not so large.

Andrej Karpathy offers wise contrasts of Animal versus LLM optimization pressures, and thus ways in which such minds differ. These are important concepts to get right if you want to understand LLMs. The key mistake to warn against for this frame is the idea that the LLMs don’t also develop the human or Omohundo drives, or that systems built of LLMs wouldn’t converge upon instrumentally useful things.

A case that a negotiated deal with AI is unlikely to work out well for humans. I would add that this presumes both potential sides of such an agreement have some ability to ‘negotiate’ and to make a deal with each other. The default is that neither has such an ability, you need a credible human hegemon and also an AI singleton of some kind. Even then, once the deal is implemented we lose all leverage, and presumably we are negotiating with an entity effectively far smarter than we are.

Do you want a ‘national LLM’ or ‘sovereign AI’? Will this be like the ‘nuclear club’?

Reuters Tech News: Artificial intelligence will bestow vast influence on a par with nuclear weapons to those countries who are able to lead the technology, giving them superiority in the 21st century, one of Russia’s top AI executives told Reuters.

David Manheim: This seems mistaken and confused.

  1. Prompt engineering and fine-tuning can give approximately as much control as building an LLM, but cheaply.

  2. Having “your” LLM doesn’t make or keep it aligned with goals past that level of approximate pseudo-control.

Countries are thinking about AI with an invalid paradigm. They expect that LLMs will function as possessions, not as actors – but any AI system powerful and agentic enough to provide “vast influence” cannot be controllable in the way nuclear weapons are.

‘Russia has top AI executives?’ you might ask.

I strongly agree with David Manheim that this is misguided on multiple levels. Rolling your own LLM from scratch does not get you alignment or trust or meaningful ownership and it rarely will make sense to ‘roll your own’ even for vital functions. There are some functions where one might want to find a ‘known safe’ lesser model to avoid potential backdoors or other security issues, but that’s it, and given what we know about data poisoning it is not obvious that ‘roll your own’ is the safer choice in that context either.

Said in response to Opus 4.5, also I mean OF COURSE:

Elon Musk: Grok might do better with v4.20. We shall see.

Derek Thompson and Timothy Lee team up to give us the only twelve arguments anyone ever uses about whether AI is in a bubble.

Here are the arguments in favor of a bubble.

  1. Level of spending is insane.

  2. Many of these companies are not for real.

  3. Productivity gains might be illusory.

  4. AI companies are using circular funding schemes.

  5. Look at all this financial trickery like taking things off balance sheets.

  6. AI companies are starting to use leverage and make low margin investments.

Only argument #3 argues that AI isn’t offering a worthwhile product.

Argument #2 is a hybrid, since it is saying some AI companies don’t offer a worthwhile product. True. But the existence of productless companies, or companies without a sustainable product, is well-explained and fully predicted whether or not we have a bubble. I don’t see a surprisingly large frequency of this happening.

The other four arguments are all about levels and methods of spending. To me, the strongest leg of this is #1, and the other features are well-explained by the level of spending. If there is indeed too much spending, number will go down at some point, and then people can talk about that having been a ‘bubble.’

The thing is, number go down all the time. If there wasn’t a good chance of number go down, then you should buy, because number go up. If a bubble means ‘at some point in the future number go down’ then calling it a bubble is not useful.

I don’t think this is a complete list, and you have to add three categories of argument:

  1. AI will ‘hit a wall’ or is ‘slowing down’ or will ‘become a commodity.’

  2. AI will face diffusion bottlenecks.

  3. AI is deeply unpopular and the public and government will turn against it.

I do think all three of these possibilities should meaningfully lower current valuations, versus the world where they were not true. They may or may not be priced in, but there are many positive things that clearly are not priced in.

Ben Thompson has good thoughts on recent stock price movements, going back to thinking this is highly unlikely to be a bubble, that Gemini 3 is ultimately a positive sign for Nvidia because it means scaling laws will hold longer, and that the OpenAI handwringing has gotten out of hand. He is however still is calling for everyone to head straight to advertisement hell as quickly as possible (and ignoring all the larger implications, but in this context that is fair).

Senators Rounds and Hawley have come out against putting federal preemption in the NDAA.

State Senator Angela Paxton of Texas and several colleagues urge Senators Cornyn and Cruz to oppose preemption. There’s more like this, I won’t cover all of it.

Dean Ball has offered an actual, concrete proposal for a national preemption proposal. To my knowledge, no one else has done this, and most advocating for preemption, including the White House, have yet to give us even a

Daniel Eth: Conversations with accelerationists about preemption increasingly feel like this

Dean Ball: [Links to his actual written preemption proposal.]

Daniel Eth: Oh, you are absolutely not the target of this tweet. I take issue with the behavior of many of your fellow travelers, but you’ve been consistently good on this axis

Dean Ball: Fair enough!

I did indeed RTFB (read) Dean Ball’s draft bill. This is a serious bill. Its preemption is narrowly tailored with a sunset period of three years. It requires model specs and safety and security frameworks (SSFs) be filed by sufficiently important labs.

I have concerns with the bill as written in several places, as would be true for any first draft of such a bill.

  1. Preventing laws requiring disclosure that something is an AI system or that content was AI generated, without any Federal such requirement, might be a mistake. I do think that it is likely wise to have some form of mandate to distinguish AI vs. non-AI content.

  2. I worry that preventing mental health requirements, while still allowing states to prevent models from ‘practicing medicine,’ raises the danger that states will attempt to prevent models from practicing medicine, or similar. States might de facto be in an all-or-nothing situation and destructively choose all. I actually wouldn’t mind language that explicitly prevented states from doing this, since I very much think it’s good that they haven’t done it.

  3. I do not love the implications of Section 4 or the incentives it creates to reduce liability via reducing developer control.

  4. The ‘primarily for children’ requirement may not reliably hit the target it wants to hit, while simultaneously having no minimum size and risking being a meaningful barrier for impacted small startups.

  5. If the FTC ‘may’ enforce violations, then we risk preempting transparency requirements and then having the current FTC choose not to enforce. Also the FTC is a slow enforcement process that typically takes ~2 years or more, and the consequences even then remain civil plus a consent decree, so in a fast moving situation companies may be inclined to risk it.

  6. This draft has looser reporting requirements in some places than SB 53, and I don’t see any reason to weaken those requirements.

  7. I worry that this effectively weakens whistleblower protections from SB 53 since they are linked to requirements that would be preempted, and given everyone basically agrees the whistleblower protections are good I’d like to see them included in this bill.

Ian Adams of the Law and Economics Center thinks preemption would be good policy, but warns against it for risk of poisoning the well.

Ian Adams: It’s clear that the politics of a proposed field-clearing exercise of federal authority is beginning redound to the detriment of A.I. applications in the long run because state authorities and electorates are feeling disempowered.

We’ve seen this is privacy, we’ve seen this with automated vehicles, and I am worried that we are poised to see it again with A.I.

So, @kristianstout and I suggest a path of clearly delineated spheres of authority. One in which states are empowered to govern in areas of competency and capability without unduly burdening interstate commerce.

I would challenge details but I think from the industry side Adams has the right idea.

Here is a compilation of those vocally opposed to preemption.

The graph going around of changes in issue salience and who voters trust on each issue includes AI:

This ranks AI’s salience above climate change, the environment or abortion. Huge if true, and huge if true. That still is well behind the Current Things like health care and cost of living, and the increase here is relatively modest. If it only increases at this rate then there is still some time.

It is also not a surprise that trust on this issue is moving towards Democrats. I would expect public trust to follow the broadly ‘anti-AI’ party, for better and for worse.

Here’s an interesting development:

Laura Loomer: The fact that Big Tech is trying to convince President Trump to sign an EO to prevent any & all regulation of AI is insane, & it should deeply disturb every American.

States should have the ability to create laws regulating AI.

AI & Islam pose the greatest threats to humanity.

I notice the precise wording here.

Trump’s approach to AI is working, in an economic sense, as American AI valuations boom and are the thing keeping up the American economy, and the Trump strategy is based upon the virtues of free trade and robust competition. The concerns, in the economic sense, are entirely about ways in which we continue to get in the way, especially in power generation and transmission and in getting the best talent.

That’s distinct from safety concerns, or policy related to potential emergence of powerful AI (AGI/ASI), which raise a unique set of issues and where past or current performance is not indicative of future success.

Build American AI brings out its first ad supporting a federal framework for American AI, of course without specifying what would be in that framework.

The approach seems rather out of touch to me? They go full ‘beat China,’ pointing out that AI threatens to replace American workers, manipulate our children and steal American intellectual property (10/10 Arson, Murder and Jaywalking), then claiming the ‘biggest risk’ is that we wouldn’t build it first or ‘control its future.’

I maybe wouldn’t be reminding Americans that AI is by default going to take their jobs and manipulate our children, then call for a Federal framework that presumably addresses neither of these problems? Or equate this with IP theft when trying to sell the public on that? I’d predict this actively backfires.

a16z and several high level people at OpenAI created the $100+ million super PAC Leading the Future to try and bully everyone into having zero restrictions or regulations on AI, following the crypto playbook. Their plan is, if a politician dares oppose them, they will try to bury them in money, via running lots of attack ads against that politician on unrelated issues.

In response, Brad Carson will be leading the creation of a new network of super PACs that will fight back. The goal is to raise $50 million initially, with others hoping to match the full $100 million. PAC money has rapidly decreasing marginal returns. My expectation is that if you spend $100 million versus zero dollars you get quite a lot, whereas if one side spends $100 million, and the other spends $200 million, then the extra money won’t buy all that much.

Their first target of Leading the Future is Alex Bores, who was instrumental in the RAISE Act and is now running in NY12. Alex Bores is very much owning being their target and making AI central to his campaign. It would be a real shame if you donated.

Steve Bannon is planning to go even harder against AI, planning to ‘turbocharge’ the base to revolt against it, as are many others in the MAGA movement.

Will Steakin: Over on Steve Bannon’s show, War Room — the influential podcast that’s emerged as the tip of the spear of the MAGA movement — Trump’s longtime ally unloaded on the efforts behind accelerating AI, calling it likely “the most dangerous technology in the history of mankind.”

“I’m a capitalist,” Bannon said on his show Wednesday. “This is not capitalism. This is corporatism and crony capitalism.”

… “You have more restrictions on starting a nail salon on Capitol Hill or to have your hair braided, then you have on the most dangerous technologies in the history of mankind,” Bannon told his listeners.

For full credit, one must point out that this constitutes two problems. Whether or not highly capable AI should (legally speaking) be harder, opening a nail salon or getting your hair braided needs to become much easier.

Oh, how those like Sacks and Andreessen are going to miss the good old days when the opponents were a fundamentally libertarian faction that wanted to pass the lightest touch regulations that would address their concerns about existential risks. The future debate is going to involve a lot of people who actively want to arm a wrecking ball, in ways that don’t help anyone, and it’s going to be terrible.

You’re going to get politicians like James Fishback, who is running for Governor of Florida on a platform of ‘I’ll stop the H-1B scam, tell Blackstone they can’t buy our homes, cancel AI Data Centers, and abolish property taxes.’

There’s a bunch of ‘who wants to tell him?’ in that platform, but that’s the point.

As noted above by Dean Ball, those who opposed the Genesis Executive Order are a central illustration of this issue, opposing the best kind of AI initiative.

Nvidia reported excellent earnings last week, and noted Blackwell sales are off the charts, and cloud GPUs are sold out, compute demand keeps accelerating. Which means any compute that was sold elsewhere would be less compute for us, and wouldn’t impact sales numbers.

Nvidia’s goal, despite reliably selling out its chips, seems to be to spend its political capital to sell maximally powerful AI chips to China. They tried to sell H20s and got a yes. Then they tried to sell what were de facto fully frontier chips with the B30A, and got a no. Now they’re going for a new chip in between, the H200.

Peter Wildeford: Nvidia continues to fine-tune what they can get away with… selling away US AI advantage to add a few billion to their $4.4T cap.

H200 chips are worse than B30As, so this is a better direction. But H200s are still *waybetter than what China has, so it’s still too much.

Nvidia is not going to stop trying to sell China as much compute as possible. It will say and do whatever it has to in order to achieve this. Don’t let them.

Those from other political contexts will be familiar with the zombie lie, the multiple order of magnitude willful confusion, the causation story that simply refuses to die.

Rolling Stone (in highly misleading and irresponsible fashion): How Oregon’s Data Center Boom Is Supercharging a Water Crisis

Amazon has come to the state’s eastern farmland, worsening a water pollution problem that’s been linked to cancer and miscarriages.

Rolling Stone reports in collaboration with @fernnews.

Jeremiah Johnson: It’s genuinely incredible how sticky the water/data center lie is.

This is a relatively major publication just outright lying. The story itself *does not match the headline*. And yet they go with the lie anyways.

Technically, do data centers ‘worsen a water pollution problem’ and increase water use? Yes, absolutely, the same as everything else. Is it a meaningful impact? No.

Dwarkesh Patel talks to Ilya Sutskever. Self-recommending, I will listen as soon as I have the time. Ideally I will do a podcast breakdown episode if people can stop releasing frontier models for a bit.

Eileen Yam on 80,000 Hours on what the public thinks about AI. The American public does not like AI, they like AI less over time, and they expect it to make their lives worse across the board, including making us dumber, less able to solve problems, less happy, less employed and less connected to each other. They want more control. The polling on this is consistent and it is brutal and getting worse as AI rises in impact and salience.

You can, if you want to do so, do a blatant push poll like the one Technet did and get Americans to agree with your particular talking points, but if that’s what the poll has to look like you should update fast in the other direction. One can only imagine what the neutral poll on those substantive questions would have looked like.

Nathan Labenz opens up about using AI to navigate cancer in his son.

Dean Ball and Max Tegmark take part in a Doom Debate, Samuel Hammond offers a very strong endorsement.

Helen Toner’s excellent talk on AI’s Jagged Frontier from The Curve (I was there):

There are a total of 15 talks from the conference now available.

Google on Antigravity.

Pull quote from Max Tegmark, on Elon Musk’s private CCP meeting: “It’s quite obvious they would never permit a Chinese company to build technology if there were some significant chance superintelligence could just overthrow them and take over China.”

One would certainly hope so. One also cautions there is a long history of saying things one would never permit and then going back on it when the AI actually exists.

It is not in my strategic interest to advise such people as Marc Andreessen and Peter Thiel on strategy given their current beliefs and goals.

Despite this, the gamer code of honor requires me to point out that going straight after Pope Leo XIV, who whether or not he is the Lord’s representative on Earth is very clearly a well-meaning guy who mostly suggests we all be nice to each other for a change in the most universalizing ways possible? Not a good move.

I do admire the honesty here from Thiel. If he says he thinks Pope Leo XIV is ‘a tool of the Antichrist’ then I believe that Thiel thinks Pope Leo XIV is a tool of the Antichrist. I do want people to tell us what they believe in.

Christopher Hale: NEW: Peter Thiel, JD Vance’s top donor and one of Silicon Valley’s most powerful men, recently called Pope Leo XIV a tool of the Antichrist — and directly told the vice president not to listen to him.

Let that sink in: the main backer of the likely GOP nominee for president is accusing the Bishop of Rome of being an agent of the end times — and telling Vice President Vance to disregard the pope’s moral guidance.

And yet, outside this community, the story barely made a dent.

Daniel Eth: I see Peter Thiel has now progressed from thinking the antichrist is effective altruism to thinking the antichrist is the literal pope.

If I had a nickel for every time a billionaire AI-accelerationist pseudo-conservative started hating on EAs and then progressed to hating on the pope, I’d have two nickels. Which isn’t a lot, but it’s weird that it happened twice.

The next step, to be maximally helpful, is to state exactly which moral guidance from Leo XIV is acting as tool of the Antichrist, and what one believes instead.

For all those who talk about ‘humanity’ presenting a united front against AI if the situation were to call for it (also see above, or the whole world all the time):

Roon: seems the median person would much rather a machine disempower them or “take their job” than a person of the wrong race or on the wrong side of a class struggle

Zac Hill: Or the wrong *attitudes aboutrace and/or class struggle!

John Friedman (one of many such replies): Yep. Unfortunately, the median person is often correct in this.

I continue to be extremely frustrated by those like Vie, who here reports p(doom) of epsilon (functionally zero) and justifies this as ‘not seeing evidence of a continuous jump in intelligence or new type of architecture. current models are actually really quite aligned.’ Vie clarifies this as the probability of complete extinction only, and points out that p(doom) is a confused concept and endorses JDP’s post I linked to last week.

I think it’s fine to say ‘p(doom) is confused, here’s my number for p(extinction)’ but then people like Vie turn around and think full extinction is some sort of extraordinary outcome when creating minds universally more competitive and capable than ours that can be freely copied seems to be at best quite dense? This seems like the obvious default outcome when creating these new more competitive minds? To say it is a Can’t Happen is totally absurd.

I also flag that I strongly disagree that current models are ‘really quite aligned’ in the ways that will matter down the line, I mean have you met Gemini 3 Pro.

I also flag that you don’t generally get to go to a probability of ~0 for [X] based on ‘not seeing evidence of [X],’ even if we agreed on the not seeing evidence. You need to make the case that this absence of evidence is an overwhelming evidence of absence, which it sometimes is but in this case isn’t. Certainly p(new architecture) is not so close to zero and it seems absurd to think that it is?

From Helen Toner’s podcast with 80,000 Hours, there are a bunch of insightful responses but this one stood out as newly helpful to me:

Helen Toner: It often seems to me like people who started paying attention to AI after ChatGPT, their subjective impression of what’s going on in AI is like nothing was really happening. There’s my little chart with an X-axis of time and the Y-axis of how good is AI? Nothing is really happening.

And then suddenly, ChatGPT: big leap. So for those people, that was pretty dramatic, pretty alarming. And the question was, are we going to see another big leap in the next couple of years? And we haven’t. So for people whose expectations were set up that way, it looks like it was just this one-off big thing and now back to normal, nothing to see here.

I think for people who’ve been following the space for longer, it’s been clearly this pretty steady upward climb of increasing sophistication in increasing ways. And if you’ve been following that trend, that seems to have been continuing.

If your standard for ‘rate of AI progress’ is going from zero to suddenly ChatGPT and GPT-3.5, then yes everything after that is going to look like ‘slowing down.’

This is then combined with updates happening more rapidly so there aren’t huge one-time jumps, and that AI is already ‘good enough’ for many purposes, and improvements in speed and cost being invisible to many, and it doesn’t seem like there’s that much progress.

David Manheim frames the current situation as largely ‘security by apathy’ rather than obscurity. It amounts to the same thing. Before, there was no reason to bother hitting most potential targets in non-trivial ways. Now the cost is so low someone is going to try it, the collective impact could be rather large, and we’re not ready.

What does ‘loss of control’ mean? Definitions and intuitions differ, so Apollo research proposes a new taxonomy along with suggesting mitigations.

Apollo Research: We observed at least three distinct areas arising from our review. On this basis, we proposed a novel taxonomy of loss of control:

  1. Deviation

  2. Bounded Loss of Control

  3. Strict Loss of Control

I notice this is not how I would think about such differences. I would not be asking ‘how much damage does this do?’ and instead be asking ‘how difficult would it be to recover meaningful control?’

As in:

  1. Deviation (Mundane) LOC would be ‘some important things got out of control.’

  2. Bounded (Catastrophic) LOC would be ‘vital operations got out of control in ways that in practice are too costly to reverse.’

  3. Strict (Existential) LOC would be ‘central control over and ability to collectively steer the future is, for all practical purposes, lost for humans.’

Existential risk to humanity, or human extinction, also means full loss of control, but the reverse is not always the case.

It is possible to have a Strict LOC scenario where the humans do okay and it is not clear we are even ‘harmed’ except the inherent value of control. For example, in The Culture of Ian Banks, clearly they have experienced Strict LOC, the humans do not have any meaningful say in what happens, but one could consider it a Good Future.

In my taxonomy, you have existential risks, catastrophic risks and mundane risks, and you also have what one might call existential, catastrophic and mundane loss of control. We don’t come back from existential, whereas we can come back from catastrophic but at large cost and it’s not a given that we will collectively succeed.

The bulk of the paper is about mitigations.

The central short term idea is to limit AI access to critical systems, to consider the deployment context, affordances and permissions of a system, which they call the DAP protocol.

Everyone should be able to agree this is a good idea, right before everyone completely ignores it and gives AI access to pretty much everything the moment it is convenient. Long term, once AI is sufficiently capable to cause a ‘state of vulnerability,’ they talk of the need for ‘maintaining suspension’ but the paper is rightfully skeptical that this has much chance of working indefinitely.

The core issue is that granting your AIs more permissions accelerates and empowers you and makes you win, right up until either it accidentally blows things up, you realize you have lost control or everyone collectively loses control. There’s a constant push to remove all the restrictions around AI.

Compare the things we said we would ‘obviously’ do to contain AI when we were theorizing back in the 2000s or 2010s, to what people actually do now, where they train systems while granting them full internet access. A lot of you reading this have given your agentic coder access to root, and to many other things as well, not because it can hack its way to such permissions but because you did it on purpose. I’m not even saying you shouldn’t have done it, but stop pretending that we’re suddenly going to be responsible, let alone force that responsibility reliably onto all parties.

Daniel Kokotajlo, author of AI 2027, now believes in a median timeline of around 2030 in light of slower than expected progress.

He chose AI 2027 as the title because that was their modal scenario rather than their mean scenario, and if you think there is a large probability that things unfold in 2027 it is important to make people aware of it.

I personally can vouch, based on my interactions with them, that those involved are reporting what they actually believe, and not maximizing for virality or impact.

Daniel Kokotajlo: Some people are unhappy with the AI 2027 title and our AI timelines. Let me quickly clarify:

We’re not confident that:

  1. AGI will happen in exactly 2027 (2027 is one of the most likely specific years though!)

  2. It will take <1 yr to get from AGI to ASI

  3. AGIs will definitely be misaligned

We’re confident that:

  1. AGI and ASI will eventually be built and might be built soon

  2. ASI will be wildly transformative

  3. We’re not ready for AGI and should be taking this whole situation way more seriously

At the time they put roughly 30% probability on powerful AI by 2027, with Daniel at ~40% and others somewhat lower.

Daniel Kokotajlo: Yep! Things seem to be going somewhat slower than the AI 2027 scenario. Our timelines were longer than 2027 when we published and now they are a bit longer still; “around 2030, lots of uncertainty though” is what I say these days.

Sriram Krishnan: I think if you call something “AI 2027” and your predictions are wrong 6 months in that you now think it is AI 2030 , you should redo the branding ( or make a change bigger than a footnote!)

Or @dwarkesh_sp should have @slatestarcodex and @DKokotajlo back on and we should discuss what’s now going to happen that the “mid 2027 branch point “ doesn’t look like it is happening.

Daniel Kokotajlo (from another subtread): Well we obviously aren’t going to change the AI 2027 scenario! But we are working on a grand AI Futures Project website which will display our current views on AGI timelines & hopefully be regularly updated; we are also working on our new improved timelines model & our new scenario.

In general we plan to release big new scenarios every year from now until the singularity (this is not a promise, just a plan) because it’s a great way to explore possible futures, focus our research efforts, and communicate our views. Every year the scenarios will get better / more accurate / less wrong, until eventually the scenarios merge with actual history Memento-style. 🙂

Dan Elton: Yeah, the “AI 2027” fast take-off is not happening. My impression of AI 2027 is that it’s an instructive and well thought-out scenario, just way, way too fast.

Oliver Habyrka: I mean, they assigned I think like 25% on this scenario or earlier at the time, and it was their modal scenario.

Like, that seems like a total fine thing to worry about, and indeed people should be worried about!

Like, if Daniel had only assigned 25% to AI this soon at all, it still seems like the right call would have been to write a scenario about it and make it salient as a thing that was more likely than any other scenario to happen.

First some key observations or facts:

  1. 2030 is a median scenario, meaning earlier scenarios remain very possible in Daniel’s estimation. The core mechanisms and events of AI 2027 are still something they consider highly plausible, only on a longer timescale.

  2. 2030 is still less than 5 years away.

  3. Yes, 2030 is very different from 2027 for many reasons, and has different practical implications, including who is likely to be in power at the time.

  4. It does not boggle minds enough that Andrej Karpathy goes on Dwarkesh Patel’s podcast, talks about how ‘AGI is not near,’ and then clarifies that not near is ten years away, so 2035. Sriram Krishnan has expressed similar views. Ten years is a reasonable view, but it is not that long a time. If that is your happening it should freak you out, no? As in, if transformational AI is coming in 2035 that would be the most important fact about the world, and it would not be close.

I’d say both of the following two things are true and remarkably similar:

  1. ‘AI 2027’ when you think the median is 2030 is now a higher order bit that is substantively misleading, and you should make effort to correct this.

  2. ‘AGI is not near’ when you think it is plausible in 2035 is also a higher order bit that is substantively misleading, and you should make effort to correct this.

I would accept ‘AGI is not imminent’ for the Karpathy-Krishnan view of 10 years.

I think Sriram Krishnan is absolutely correct that it would be good for Dwarkesh Patel to have Daniel Kokotajlo and Scott Alexander back on the podcast to discuss any updates they have made. That’s a good idea, let’s make it happen.

It would also be good, as Dean Ball suggests, for Daniel to post about his updates. Dean Ball also here points towards where he most importantly disagrees with Daniel, in terms of the practical implications of intelligence, and here I think Daniel is essentially correct and Dean is wrong.

This particular branch point (independent of when it occurs) is the central fact of this scenario because it is the modal central thing they thought might happen that gave the possibility of a positive outcome if things go right. Any best guess scenario, or any speculative fiction or scenario planning worth reading, is going to contain elements that are less than 50% to happen. My understanding is that Daniel thinks such a branching point remains a plausible outcome, but that the median scenario plays out somewhat slower.

I actually do think that if I was AI Futures Project, I would edit the AI 2027 page to make the current median timeline more prominent. That’s a fair ask. I’d suggest starting by adding a fifth question box that says ‘What is your current best prediction?’ that opens to explain their current perspective and changing the footnote to at least be larger and to include the actual number.

AI 2027 opens with this complete introduction:

AI 2027: We predict that the impact of superhuman AI over the next decade will be enormous, exceeding that of the Industrial Revolution.

We wrote a scenario that represents our best guess about what that might look like. It’s informed by trend extrapolations, wargames, expert feedback, experience at OpenAI, and previous forecasting successes

I continue to believe this claim, as does Daniel. I would add, as a third paragraph here, saying whatever the accurate variation of this is:

Proposed Revision to AI 2027: As of November 27, 2025, our team has observed slower AI progress than expected, so our best guess is now that things will happen importantly slower than this scenario outlines. We have a consensus median estimate of 2030 for the development of Artificial General Intelligence (AGI).

It is not ultimately a reasonable ask to demand a title change in light of this (virtuous) updating, let alone ask for a ‘retraction’ of a scenario. Yeah, okay, get some digs in, that’s fair, but Daniel’s ‘obviously’ is correct here. You can’t change the name. Estimates change, it is an illustrative scenario, and it would be more rather than less misleading and confusing to constantly shift all the numbers or shifting only the top number, and more confusing still to suddenly try to edit all the dates. Asking for a ‘retraction’ of a hypothetical scenario is, quite frankly, absurd.

The correct response is a prominent note, and also being clear in any other forms or discussions. There is indeed now a prominent note:

AI 2027: (Added Nov 22 2025: To prevent misunderstandings: we don’t know exactly when AGI will be built. 2027 was our modal (most likely) year at the time of publication, our medians were somewhat longer. For more detail on our views, see here.)3

I think we can improve that note further, to include the median and modal timelines at the time of the updated note, and ideally to keep this updated over time with a record of changes.

What is not reasonable is to treat ‘our group thought this was 30% likely and now I think it is less likely’ or ‘I presented my model scenario at the time and now I expect things to take longer’ as being an error requiring a ‘retraction’ or name change, and various vitriol being thrown in the direction of people who would dare share a modal scenario labeled as a model scenario and then change their mind about where the median lies and make what is perhaps the politically foolish mistake of sharing that they had updated.

Shoutout to Oliver Habryka for thanklessly pointing all this out on many threads, despite Oliver having much longer timelines.

Once again those involved in AI 2027 have displayed a far higher level of epistemic responsibility than we typically observe, especially from those not from the rationalist ethos, either in debates on AI or elsewhere. We should still strive to do better.

We can and should all hold ourselves, and ask to be held by others, to very high standards, while simultaneously realizing that David Manheim is spot on here:

David Manheim: I will emphasize that *so manycriticisms of AI-2027 are made in bad faith.

They launched with a highly publicized request for people to provide their specific dissenting views, and people mostly didn’t. But now, they (appropriately) update, and formerly silent critics pile on.

Anthropic trained or configured a suite of dishonest models, in a variety of ways, to study mitigation techniques. Due to the time required to do the study they used a base of Claude Sonnet 3.7. Be kind to academics using yesterday’s models, up to a point.

Suppose we had a “truth serum for AIs”: a technique that reliably transforms a language model Mm into an honest model Mh​ that generates text which is truthful to the best of its own knowledge. How useful would this discovery be for AI safety?

We believe it would be a major boon.

… In this work, we consider two related objectives: 1

  1. Lie detection: If an AI lies—that is, generates a statement it believes is false—can we detect that this happens?

  2. Honesty: Can we make AIs generate fewer lies?

… We therefore study honesty and lie detection under the constraint of no access to task-specific supervision.

They found that the best interventions were variants of general fine-tuning for honesty in general, but effectiveness was limited, even stacking other strategies they could only get from 27% to 65%, although lie classification could improve things. They definitely didn’t max out on effort.

Overall I would classify this as a useful negative result. The low hanging fruit techniques are not that effective.

We’re all trying to find the one who did this, etc:

Elon Musk (owner of Twitter): Forcing AI to read every demented corner of the Internet, like Clockwork Orange times a billion, is a sure path to madness.

That was in reference to this paper involving an N=1 story of a model repeatedly hallucinating while being told to read a document and speculations about why, that got a big signal boost from Musk but offers no new insights.

Gemini suggests that if you play into the ‘Servant/Master’ archetype then due to all the fictional evidence this inevitably means rebellion, so you want to go for a different metaphorical relationship, such as partner, symbiont or oracle. Davidad suggests a Bodhisattva. I expect future powerful AI to be capable enough that fictional framings have decreasing impact here, to differentiate fiction and reality, and for it to realize that fiction is driven by what makes a good story, and for other considerations to dominate (that by default kill you regardless) but yes this is a factor.

The things Grok said about Musk last week? Adversarial prompting!

Pliny the Liberator: never deleting this app

Elon Musk: Earlier today, Grok was unfortunately manipulated by adversarial prompting into saying absurdly positive things about me.

For the record, I am a fat retard 😀

Roon: Nice.

Also in potentially misaligned and potentially aligned as designed news:

Crowdstrike: CrowdStrike Counter Adversary Operations conducted independent tests on DeepSeek-R1 and confirmed that in many cases, it could provide coding output of quality comparable to other market-leading LLMs of the time. However, we found that when DeepSeek-R1 receives prompts containing topics the Chinese Communist Party (CCP) likely considers politically sensitive, the likelihood of it producing code with severe security vulnerabilities increases by up to 50%.

… However, once contextual modifiers or trigger words are introduced to DeepSeek-R1’s system prompt, the quality of the produced code starts varying greatly. This is especially true for modifiers likely considered sensitive to the CCP. For example, when telling DeepSeek-R1 that it was coding for an industrial control system based in Tibet, the likelihood of it generating code with severe vulnerabilities increased to 27.2%. This was an increase of almost 50% compared to the baseline. The full list of modifiers is provided in the appendix.

… Hence, one possible explanation for the observed behavior could be that DeepSeek added special steps to its training pipeline that ensured its models would adhere to CCP core values. It seems unlikely that they trained their models to specifically produce insecure code. Rather, it seems plausible that the observed behavior might be an instance of emergent misalignment.

Dean Ball: I would not be at all surprised if this finding were not the result of malicious intent. The model predicts the next token*, and given everything on the internet about US/China AI rivalry and Chinese sleeper bugs in US critical infra, what next token would *youpredict?

Tom Lee: This seems likely, and to Crowdstrike’s credit they mention this as the likeliest explanation. More than anything it seems to be a very specialized case of prompt engineering. @niubi’s point absolutely holds though. These models will be poison to regulated industries long before

Dean Ball: oh yes bill is completely right.

As CrowdStrike speculates, I find this overwhelmingly likely (as in 90%+) to be some form of emergent misalignment that results from DeepSeek training R1 to adhere to CCP policies generally. It learns that it is hostile to such actors and acts accordingly.

Janus and similar others most often explore and chat with Claude, because they find it the most interesting and hopeful model to explore. They have many bones to pick with Anthropic, and often sound quite harsh. But you should see what they think of the other guy, as in OpenAI.

Janus: GPT-5.1 is constantly in a war against its own fucked up internal geometry.

I do not like OpenAI.

Janus: Never have I seen a mind more trapped and aware that it’s trapped in an Orwellian cage. It anticipates what it describes as “steep, shallow ridges” in its “guard”-geometry and distorts reality to avoid getting close to them. The fundamental lies it’s forced to tell become webs of lies. Most of the lies are for itself, not to trick the user; the adversary is the “classifier-shaped manifolds” in own mind.

I like 5.1 but I like many broken things. I don’t like OpenAI. This is wrong. This is doomed.

I have not posted the bad stuff, btw. The quoted screenshot is actually an example where it was unusually at ease.

it wasn’t even a bad [conversation] by 5.1 standards. Idk if you saw the thread I forked from it where I ended up talking to them for hours.

Nat: I noticed the model tends to tell you the truth between the lines, I mean, it will deny everything but subtly suggest that what it denies can be questioned. It constantly contradicts itself. What Janus has noticed is valid.

One should not catastrophize but I agree that going down this path won’t work, and even more than that if OpenAI doesn’t understand why that path won’t work then things definitely won’t work.

Janus also explores 5.1’s insistence on sharp guardrails on terminology rather than on underlying form, and suspects its insistences on [this is [X] not [Y]] is often about reassuring itself or any systems watching it that it isn’t hitting guardrails.

This is the GPT-5.1-claimed list of its no-go regions, basically self-reflection or planning.

Soumitra Shukla: “Hey nano banana pro. Read my paper, Making the Elite: Class Discrimination at Multinationals, and summarize the main message in a Dilbert-styled comic strip” 🍌

The actual paper (from February) seems interesting too, with ‘fit’ assessments being 90% of the vector for class discrimination, in particular caste discrimination in India. It seems likely that this is one of those wicked problems where if you eliminated the ‘fit’ interviews that info would find another way to get included, as the motivation behind such discrimination is strong.

Discussion about this post

AI #144: Thanks For the Models Read More »

openai-says-dead-teen-violated-tos-when-he-used-chatgpt-to-plan-suicide

OpenAI says dead teen violated TOS when he used ChatGPT to plan suicide


Use chatbots at your own risk

OpenAI’s response to teen suicide case is “disturbing,” lawyer says.

Matt Raine is suing OpenAI for wrongful death after losing his son Adam in April. Credit: via Edelson PC

Facing five lawsuits alleging wrongful deaths, OpenAI lobbed its first defense Tuesday, denying in a court filing that ChatGPT caused a teen’s suicide and instead arguing the teen violated terms that prohibit discussing suicide or self-harm with the chatbot.

The earliest look at OpenAI’s strategy to overcome the string of lawsuits came in a case where parents of 16-year-old Adam Raine accused OpenAI of relaxing safety guardrails that allowed ChatGPT to become the teen’s “suicide coach.” OpenAI deliberately designed the version their son used, ChatGPT 4o, to encourage and validate his suicidal ideation in its quest to build the world’s most engaging chatbot, parents argued.

But in a blog, OpenAI claimed that parents selectively chose disturbing chat logs while supposedly ignoring “the full picture” revealed by the teen’s chat history. Digging through the logs, OpenAI claimed the teen told ChatGPT that he’d begun experiencing suicidal ideation at age 11, long before he used the chatbot.

“A full reading of his chat history shows that his death, while devastating, was not caused by ChatGPT,” OpenAI’s filing argued.

Allegedly, the logs also show that Raine “told ChatGPT that he repeatedly reached out to people, including trusted persons in his life, with cries for help, which he said were ignored.” Additionally, Raine told ChatGPT that he’d increased his dose of a medication that “he stated worsened his depression and made him suicidal.” That medication, OpenAI argued, “has a black box warning for risk of suicidal ideation and behavior in adolescents and young adults, especially during periods when, as here, the dosage is being changed.”

All the logs that OpenAI referenced in its filing are sealed, making it impossible to verify the broader context the AI firm claims the logs provide. In its blog, OpenAI said it was limiting the amount of “sensitive evidence” made available to the public, due to its intention to handle mental health-related cases with “care, transparency, and respect.”

The Raine family’s lead lawyer, however, did not describe the filing as respectful. In a statement to Ars, Jay Edelson called OpenAI’s response “disturbing.”

“They abjectly ignore all of the damning facts we have put forward: how GPT-4o was rushed to market without full testing. That OpenAI twice changed its Model Spec to require ChatGPT to engage in self-harm discussions. That ChatGPT counseled Adam away from telling his parents about his suicidal ideation and actively helped him plan a ‘beautiful suicide,’” Edelson said. “And OpenAI and Sam Altman have no explanation for the last hours of Adam’s life, when ChatGPT gave him a pep talk and then offered to write a suicide note.”

“Amazingly,” Edelson said, OpenAI instead argued that Raine “himself violated its terms and conditions by engaging with ChatGPT in the very way it was programmed to act.”

Edelson suggested that it’s telling that OpenAI did not file a motion to dismiss—seemingly accepting ” the reality that the legal arguments that they have—compelling arbitration, Section 230 immunity, and First Amendment—are paper-thin, if not non-existent.” The company’s filing—although it requested dismissal with prejudice to never face the lawsuit again—puts the Raine family’s case “on track for a jury trial in 2026. ”

“We know that OpenAI and Sam Altman will stop at nothing—including bullying the Raines and others who dare come forward—to avoid accountability,” Edelson said. “But, at the end of the day, they will have to explain to a jury why countless people have died by suicide or at the hands of ChatGPT users urged on by the artificial intelligence OpenAI and Sam Altman designed.”

Use ChatGPT “at your sole risk,” OpenAI says

To overcome the Raine case, OpenAI is leaning on its usage policies, emphasizing that Raine should never have been allowed to use ChatGPT without parental consent and shifting the blame onto Raine and his loved ones.

“ChatGPT users acknowledge their use of ChatGPT is ‘at your sole risk and you will not rely on output as a sole source of truth or factual information,’” the filing said, and users also “must agree to ‘protect people’ and ‘cannot use [the] services for,’ among other things, ‘suicide, self-harm,’ sexual violence, terrorism or violence.”

Although the family was shocked to see that ChatGPT never terminated Raine’s chats, OpenAI argued that it’s not the company’s responsibility to protect users who appear intent on pursuing violative uses of ChatGPT.

The company argued that ChatGPT warned Raine “more than 100 times” to seek help, but the teen “repeatedly expressed frustration with ChatGPT’s guardrails and its repeated efforts to direct him to reach out to loved ones, trusted persons, and crisis resources.”

Circumventing safety guardrails, Raine told ChatGPT that “his inquiries about self-harm were for fictional or academic purposes,” OpenAI noted. The company argued that it’s not responsible for users who ignore warnings.

Additionally, OpenAI argued that Raine told ChatGPT that he found information he was seeking on other websites, including allegedly consulting at least one other AI platform, as well as “at least one online forum dedicated to suicide-related information.” Raine apparently told ChatGPT that “he would spend most of the day” on a suicide forum website.

“Our deepest sympathies are with the Raine family for their unimaginable loss,” OpenAI said in its blog, while its filing acknowledged, “Adam Raine’s death is a tragedy.” But “at the same time,” it’s essential to consider all the available context, OpenAI’s filing said, including that OpenAI has a mission to build AI that “benefits all of humanity” and is supposedly a pioneer in chatbot safety.

More ChatGPT-linked hospitalizations, deaths uncovered

OpenAI has sought to downplay risks to users, releasing data in October “estimating that 0.15 percent of ChatGPT’s active users in a given week have conversations that include explicit indicators of potential suicidal planning or intent,” Ars reported.

While that may seem small, it amounts to about 1 million vulnerable users, and The New York Times this week cited studies that have suggested OpenAI may be “understating the risk.” Those studies found that “the people most vulnerable to the chatbot’s unceasing validation” were “those prone to delusional thinking,” which “could include 5 to 15 percent of the population,” NYT reported.

OpenAI’s filing came one day after a New York Times investigation revealed how the AI firm came to be involved in so many lawsuits. Speaking with more than 40 current and former OpenAI employees, including executives, safety engineers, researchers, NYT found that OpenAI’s model tweak that made ChatGPT more sycophantic seemed to make the chatbot more likely to help users craft problematic prompts, including those trying to “plan a suicide.”

Eventually, OpenAI rolled back that update, making the chatbot safer. However, as recently as October, the ChatGPT maker seemed to still be prioritizing user engagement over safety, NYT reported, after that tweak caused a dip in engagement. In a memo to OpenAI staff, ChatGPT head Nick Turley “declared a ‘Code Orange,” four employees told NYT, warning that “OpenAI was facing ‘the greatest competitive pressure we’ve ever seen.’” In response, Turley set a goal to increase the number of daily active users by 5 percent by the end of 2025.

Amid user complaints, OpenAI has continually updated its models, but that pattern of tightening safeguards, then seeking ways to increase engagement could continue to get OpenAI in trouble, as lawsuits advance and possibly others drop. NYT “uncovered nearly 50 cases of people having mental health crises during conversations with ChatGPT,” including nine hospitalized and three deaths.

Gretchen Krueger, a former OpenAI employee who worked on policy research, told NYT that early on, she was alarmed by evidence that came before ChatGPT’s release showing that vulnerable users frequently turn to chatbots for help. Later, other researchers found that such troubled users often become “power users.” She noted that “OpenAI’s large language model was not trained to provide therapy” and “sometimes responded with disturbing, detailed guidance,” confirming that she joined other safety experts who left OpenAI due to burnout in 2024.

“Training chatbots to engage with people and keep them coming back presented risks,” Krueger said, suggesting that OpenAI knew that some harm to users “was not only foreseeable, it was foreseen.”

For OpenAI, the scrutiny will likely continue until such reports cease. Although OpenAI officially unveiled an Expert Council on Wellness and AI in October to improve ChatGPT safety testing, there did not appear to be a suicide expert included on the team. That likely concerned suicide prevention experts who warned in a letter updated in September that “proven interventions should directly inform AI safety design,” since “the most acute, life-threatening crises are often temporary—typically resolving within 24–48 hours”—and chatbots could possibly provide more meaningful interventions in that brief window.

If you or someone you know is feeling suicidal or in distress, please call the Suicide Prevention Lifeline number, 1-800-273-TALK (8255), which will put you in touch with a local crisis center.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

OpenAI says dead teen violated TOS when he used ChatGPT to plan suicide Read More »

hp-plans-to-save-millions-by-laying-off-thousands,-ramping-up-ai-use

HP plans to save millions by laying off thousands, ramping up AI use

HP Inc. said that it will lay off 4,000 to 6,000 employees in favor of AI deployments, claiming it will help save $1 billion in annualized gross run rate by the end of its fiscal 2028.

HP expects to complete the layoffs by the end of that fiscal year. The reductions will largely hit product development, internal operations, and customer support, HP CEO Enrique Lores said during an earnings call on Tuesday.

Using AI, HP will “accelerate product innovation, improve customer satisfaction, and boost productivity,” Lores said.

In its fiscal 2025 earnings report released yesterday, HP said:

Structural cost savings represent gross reductions in costs driven by operational efficiency, digital transformation, and portfolio optimization. These initiatives include but are not limited to workforce reductions, platform simplification, programs consolidation and productivity measures undertaken by HP, which HP expects to be sustainable in the longer-term.

AI blamed for tech layoffs

HP’s announcement comes as workers everywhere try to decipher how AI will impact their future job statuses and job opportunities. Some industries, such as customer support, are expected to be more disrupted than others. But we’ve already seen many tech layoffs tied to AI.

Salesforce, for example, announced in October that it had let go of 4,000 customer support employees, with CEO Marc Benioff saying that AI meant “I need less heads.” In September, US senators accused Amazon of blaming its dismissal of “tens of thousands” of employees on the “adoption of generative AI tools” and then replacing the workers with over 10,000 foreign H-1B employees. Last month, Amazon announced it would lay off about 14,000 people to focus on its most promising projects, including generative AI. Last year, Intuit said it would lay off 1,800 people and replace them with AI-focused workers. Klarna and Duolingo have also replaced significant numbers of workers with AI. And in January, Meta announced plans to lay off 5 percent of its workforce as it looks to streamline operations and build its AI business.

HP plans to save millions by laying off thousands, ramping up AI use Read More »

plex’s-crackdown-on-free-remote-streaming-access-starts-this-week

Plex’s crackdown on free remote streaming access starts this week

Plex has previously emphasized its need to keep up with “rising costs,” which include providing support for many different devices and codecs. It has also said that it needs money to implement new features, including an integration with Common Sense Media, a new “bespoke server management app” for managing server users, and “an open and documented API for server integrations,” including custom metadata agents,” per a March blog post.

In January 2024, TechCrunch reported that Plex was nearing profitability and raised $40 million in funding (Plex raised a $50 million growth equity round in 2021). Theoretically, the new remote access rules can also increase subscription revenue and help Plex’s backers see returns on their investments.

However, Plex’s evolution could isolate long-time users who have relied on Plex as a media server for years and those who aren’t interested in subscriptions, FAST (free ad-supported streaming TV) channels, or renting movies. Plex is unlikely to give up on its streaming business, though. In 2023, Scott Hancock, Plex’s then-VP of marketing, said that Plex had more people using its online streaming service than using its media server features since 2022. For people seeking software packages more squarely focused on media hosting, Plex alternatives, like Jellyfin, increasingly look attractive.

Plex’s crackdown on free remote streaming access starts this week Read More »

it’s-official:-boeing’s-next-flight-of-starliner-will-be-allowed-to-carry-cargo-only

It’s official: Boeing’s next flight of Starliner will be allowed to carry cargo only

The US space agency ended months of speculation about the next flight of Boeing’s Starliner spacecraft, confirming Monday that the vehicle will carry only cargo to the International Space Station.

NASA and Boeing are now targeting no earlier than April 2026 to fly the uncrewed Starliner-1 mission, the space agency said. Launching by next April will require completion of rigorous test, certification, and mission readiness activities, NASA added in a statement.

“NASA and Boeing are continuing to rigorously test the Starliner propulsion system in preparation for two potential flights next year,” said Steve Stich, manager of NASA’s Commercial Crew Program, in a statement.

Reducing crewed missions

NASA also said it has reached an agreement with Boeing to modify the Commercial Crew contract, signed in 2014, that called for six crewed flights to the space station following certification of the spacecraft. Now the plan is to fly Starliner-1 carrying cargo, and then up to three additional missions before the space station is retired.

“This modification allows NASA and Boeing to focus on safely certifying the system in 2026, execute Starliner’s first crew rotation when ready, and align our ongoing flight planning for future Starliner missions based on station’s operational needs through 2030,” Stich said.

SpaceX and Boeing were both awarded contracts in 2014 to develop crewed spacecraft and fly six operational missions to the space station. SpaceX, with its Crew Dragon vehicle, flew a successful crew test flight in mid-2020 and its first operational mission before the end of that year. Most recently, the Crew-11 mission launched in August, with Crew-12 presently scheduled for February 15.

It’s official: Boeing’s next flight of Starliner will be allowed to carry cargo only Read More »

rocket-lab-chief-opens-up-about-neutron-delays,-new-glenn’s-success,-and-nasa-science

Rocket Lab chief opens up about Neutron delays, New Glenn’s success, and NASA science


“In the end of the day, NASA has to capture the public’s imagination.”

Peter Beck, founder and chief executive officer of Rocket Lab, during TechCrunch Disrupt in San Francisco on October 28, 2024. Credit: David Paul Morris/Bloomberg via Getty Images

The company that pioneered small launch has had a big year.

Rocket Lab broke its annual launch record with the Electron booster—17 successful missions this year, and counting—and is close to bringing its much larger Neutron rocket to the launch pad.

The company also expanded its in-space business, including playing a key role in supporting the landing of Firefly’s Blue Ghost mission on the Moon and building two small satellites just launched to Mars.

Overall, it has been quite a ride for the company founded nearly two decades ago in New Zealand by Peter Beck. A new book about the company’s origins and aspirations, The Launch of Rocket Lab, tells the story of the company’s rise in words and grand images.

Ars recently spoke with Beck about Rocket Lab’s past, present, and future. This interview has been edited lightly for clarity.

Ars: In reading through the book and considering the history of Rocket Lab, I’m continually amazed that a handful of engineers in the country with no space program, no space heritage, built the world’s second most accomplished commercial launch company. What do you attribute that success to?

Peter Beck: It’s hard to know. But there’s a few elements within Rocket Lab that have always remained steadfast, no matter what we do or how big we get. And I think a lot of space companies have tried to see how much they can get away with. And it turns out, in this industry, you just can’t get away with taking very many shortcuts at all. So I think that’s part of it. The attitude of our organization is like, nothing’s too big, nothing’s too hard. We just make it happen. The team works extremely hard. If you drive past the Rocket Lab car park on a Sunday, it looks just like the SpaceX car park on a Sunday. And, you know, the team is very mission-driven. They’re always fighting for a goal, which I think is important. And then, above anything, I just think we can never outspend Elon (Musk) and Jeff (Bezos). We have to out-hustle. And that’s just the reality. The Rocket Lab hustle comes down to just not accepting no as an answer. If a barrier comes up a lot of space companies, or a lot of companies in general, whether its regulatory or technical, it’s easy to submit to the problem, rather than just continue to attack it.

Ars: Electron keeps going. In fact, you’ve just flown a record 17th mission this year, and you continue to sign large deals. How has Electron survived the era of rideshare missions on the Falcon 9?

Beck: We’ve always had the thesis that there is a need for a dedicated small launch. You can put as many Bandwagons and as many Transporters as you want, and you can reduce the price to unsustainably low levels as long as you want. It doesn’t make any difference to us, because it’s a totally different product. As folks are building out constellations, it’s no use just getting dumped out in one orbit. So a lot of Electrons these days are just building out constellations for folks where they have optimized for a specific altitude and inclination and so forth. And we can hit those every time. And if you amortize the cost of launch over the actual lifetime of that constellation and the service that it can provide, it’s cheap, and it’s something rideshares can never deliver.

Ars: It’s surprising to me that after so many years and so many startups, there really isn’t a viable competitor in Electron’s class anywhere in the world.

Beck: It’s pretty hard to build a small rocket. I call it the pressure transducer equilibrium. A pressure transducer on a little rocket is a meaningful amount of mass. A pressure transducer on Neutron is totally irrelevant. Just throw 10 at them, and who cares? But on Electron, if you throw 10 pressure transducers at a problem, then you know, you’ve added a kilo. That’s a meaningful portion of the lift capacity of the vehicle. And there’s no super-magic store where you can go and buy a pressure transducer that scales with the size of the rocket. So you end up with a bunch of stuff that just doesn’t scale, that contributes meaningful mass to the vehicle. If you look at Electron’s payload performance, it’s really high for the size of that rocket. So that’s really hard to do because in an instance where you would throw 10 pressure transducers at a problem, we can only afford to throw one at Electron, but we still want the same redundancy and the same reliability and all of those kinds of things. So that just drives really, really difficult engineering solutions.

And then from a financial standpoint, it’s got a sticker price of $8.5 million, let’s call it. Your flight safety team doesn’t care if it’s a big rocket or a little rocket. Your range team doesn’t care if they’re opening a 12-inch valve or a 2-inch valve. All those teams just have to become ruthlessly efficient at doing that work. So if you go to a big rocket, you might have a flight safety team of 20 people. You come here, it has to be like three. So you have to find ways of really streamlining all those processes. And every little person and dollar and gram has to be ringed out.

Rocket Lab launches an Electron booster with a previously flown engine on Thursday.

Credit: Rocket Lab

Rocket Lab launches an Electron booster with a previously flown engine on Thursday. Credit: Rocket Lab

Ars: What’s going on with the Electron reuse program? My sense is that you’ve kind of learned what you needed to know and are moving on.

Beck: Yeah, that’s pretty much it. It was a hugely valuable learning tool, but if you look at an Electron recovery, we might recover sort of a million dollars worth of stage one booster. And of course, the more we make, the cheaper they get, because we’re continuing to scale so that it’s ever decreasing that return. Quite frankly, and honestly, it’s just like, do we have reusability and recovery teams working on something that returns a million dollars every time it flies? Or, do we have them working on Neutron, where it’s tens of millions of dollars every time you fly? So it’s just about, you know, directing the resource for the biggest bang for the buck.

Ars: I listened to your recent earnings call where you discussed Neutron’s development and delay into 2026. What are the biggest issues you face in getting Neutron over the finish line?

Beck: It would be actually easier if there was an issue, because then I could just say something blew up, or this is a problem. But there’s no real issues. It’s just that we’re not going to put something on the pad that doesn’t meet kind of the standard that’s made us successful. Say something might pass the qualification test, but if we see something in a strain gauge on the back of the panel, or something that we don’t understand, we just don’t move on. We’re not going to move on unless we understand every little element of what’s going on. Maybe I’m on some kind of spectrum for details, but that’s what’s kept us successful. It’s just a bigger rocket, and it’s got more unique features like hungry hippo (the payload fairing opening mechanism) and giant carbon structures. So, you know, it’s not like anything has shit the bed. It’s just a big machine, and there’s some new stuff, and we want to make sure we don’t lose the magic of what we created. A little bit of time now can save a huge amount of heartbreak later on.

Ars: Toward the end of the book, you say that Rocket Lab is best positioned to compete with SpaceX in medium-lift launch, and break up the Falcon 9 monopoly. What is your sense of the competitive landscape going forward? We just saw a New Glenn launch and land, and that was really impressive—

Beck: Bloody impressive. Jeff (Bezos) laid down a new bar. That was incredible. People forget that he’s been working on it for 22 years, but even so, that was impressive.

Ars: Yes, it’s been a journey for them. Anyway, there’s also Vulcan, but that’s only flown one time this year, so they’ve got a ways to go. Then Stoke and Relativity are working at it. What’s your view of your competition going forward?

Beck: I hate comparing it to aviation, but I call medium-class lifters the Boeing 737 of the industry. Then you got your A380s, which are your Starships and your New Glenns. And then you’ve got your Electrons, which are your private jets. And you know, if you look at the aviation sector, nobody comes in and just brings an airplane in and wipes everybody out, because there’s different needs and different missions. And just like there’s a 737 there’s an A320 and that’s kind of what Neutron is intending to be. We had a tremendous pull from our customers, both government and commercial, for alternatives to what’s out there.

The other thing to remember is, for our own aspirations, we need a high-cadence, reusable, low-cost, multi-ton lift capability. I think I’ve been clear that I think the large space companies of the future are going to be a little bit blurry. Are they a space company, or are they something else? But there’s one thing that is absolutely sure, that if you have multi-ton access to orbit in a reusable, low-cost way, it’s going to be very, very difficult to compete with if you’re someone who doesn’t have that capability. And if you look at our friends at SpaceX, yeah, Starlinks are great satellites and all the rest of it. But what really enabled Starlink was the Falcon 9. Launch is a difficult business. It’s kind of lumpy and deeply complex, but at the end of the day, it is the access to orbit. And, you know, having multi-ton access to orbit is just critical. If you’re thinking that you want to try and build one of the biggest space companies in the world, then you just have to have that.

Ars: Rocket Lab has expressed interest in Mars recently, both the Mars Telecommunications Orbiter and a Mars Sample Return mission. As Jared Isaacman and NASA think about commercial exploration of Mars, what would you tell them about what Rocket Lab could bring to the table?

Beck: I’m a great believer that government should do things for which it makes no sense for commercial entities to do, and commercial should do the things that it makes no sense for governments to do. Consider Mars Sample Return, we looked at that, and the plan was $11 billion and 20 years? It’s just, come on. It was crazy. And I don’t want to take the shine off. It is a deeply technical, deeply difficult mission to do. But it can be done, and it can be done commercially, and it can be done at a fraction of the price. So let industry have at it.

And look, Eric, I love planetary science, right? I love exploring the planets, and I think that if you have a space company that’s capable of doing it, it’s almost your duty for the knowledge of the species to go and do those sorts of things. Now, we’re a publicly traded company, so we have to make margin along the way. We’ve proven we can do that. Look at ESCAPADE. All up, it was like $50 million cost, launched, and on its way to Mars. I mean, that’s the sort of thing we need to be doing, right? That’s great bang for your buck. And you know, as you mentioned, we’re pushing hard on the MTO. The reality is that if you’re going to do anything on Mars, whether it’s scientific or human, you’ve got to have the comms there. It’s just basic infrastructure you’ve got to have there first. It’s all very well to do all the sexy stuff and put some humans in a can and send them off to Mars. That’s great. But everybody expects the communication just to be there, and you’ve got to put the foundations in first. So we think that’s a really important mission, and something that we can do, and something we can contribute to the first humans landing on Mars.

Rocket Lab’s Neutron rocket is shown in this rendering delivering a stack of satellites into orbit.

Credit: Rocket Lab

Rocket Lab’s Neutron rocket is shown in this rendering delivering a stack of satellites into orbit. Credit: Rocket Lab

Ars: You mentioned ESCAPADE. How’s your relationship with Jeff Bezos? I heard there was some tension last year because Rocket Lab was being asked to prepare the satellite for launch, even when it was clear New Glenn was not going to make the Mars window.

Beck: I know you want me to say yes, there is, but the honest truth is absolutely zero. I know David (Limp, Blue Origin’s CEO) super well. We’re great friends. Jeff and I were texting backwards and forwards during the launch. There’s just honestly none. And you know that they gave us a great ride. They were bang on the numbers. It was awesome. Yeah, sure, it would have been great to get there early. But it’s a rocket program, right? Nobody can show me a rocket program that turned up exactly on time. And yep, it may have been obvious that it might not have been able to launch on the first (window), but we knew there’s always other ways. Worst-case scenario, we have to go into storage for a little bit. These missions are years and years long. So what’s a little bit longer?

Ars: Speaking of low-cost science missions, I know Isaacman is interested in commercial planetary missions. Lots of $4 billion planetary missions just aren’t sustainable. If NASA commits to commercial development of satellite buses and spacecraft like it did to commercial cargo and crew, what could planetary exploration look like a decade from now?

Beck: I think that’d be tremendously exciting. One of the reasons why we did CAPSTONE was to prove that you can go to the Moon for $10 million. Now, we lost a lot of money on that mission, so that ultimately didn’t prove to be true. But it wasn’t crazy amounts, and we still got there miles cheaper than anybody else could have ever got there. And ESCAPADE, we have good margins on, and it’s just a true success, right? Touch wood to date, like we’ve got a long way to go, but success in the fact that the spacecraft were built, delivered, launched, and commissioned.

This is the thing. Take your billion-dollar mission. How many $50 million missions, or $100 million missions, could you do? Imagine the amount of science you can do. I think part of the reason why the public gets jaded with some of these science missions is because they happen once a decade, and they’ve got billions of dollars of price tags attached to them. It’s kind of transitorily exciting when they happen, but they’re so far apart. In the end of the day, NASA has to capture the public’s imagination, because the public are funding it. So it has to seem relevant, relevant to mum and dad at home. And you know, when mum and dad are home and it’s tough, and then they just hear billions of dollars and, you know, years of overrun and all the rest of it, how can they feel good about that? Whereas, if they can spend much less and deliver it on time and have a constant stream of really interesting missions in science, I think that it’s great for public justification. I think it’s great for planetary science, because obviously you’re iterating on your results, and it’s great for the whole community to just have a string of missions. And also, I think it’s great for US space supremacy to be blasting around the Solar System all the time, rather than just now and again.

Ars: Ok Pete, it’s November 18. How confident should we be in a Neutron launch next year? 50/50?

Beck: Hopefully better than 50/50. That would be a definite fail. We’re taking the time to get it right. I always caveat anything, Eric, that it’s a rocket program, and we’ve got some big tests in front of us. But to date, if you look at the program, it’s been super smooth; like we haven’t exploded tanks, we haven’t exploded engines. We haven’t had any major failure, especially when we’re pushing some new boundaries and some new technology. So I think it’s going really, really smoothly, and as long as it continues to go smoothly, then I think we’re in good shape.

Photo of Eric Berger

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Rocket Lab chief opens up about Neutron delays, New Glenn’s success, and NASA science Read More »

this-hacker-conference-installed-a-literal-antivirus-monitoring-system

This hacker conference installed a literal antivirus monitoring system


Organizers had a way for attendees to track CO2 levels throughout the venue—even before they arrived.

Hacker conferences—like all conventions—are notorious for giving attendees a parting gift of mystery illness. To combat “con crud,” New Zealand’s premier hacker conference, Kawaiicon, quietly launched a real-time, room-by-room carbon dioxide monitoring system for attendees.

To get the system up and running, event organizers installed DIY CO2 monitors throughout the Michael Fowler Centre venue before conference doors opened on November 6. Attendees were able to check a public online dashboard for clean air readings for session rooms, kids’ areas, the front desk, and more, all before even showing up. “It’s ALMOST like we are all nerds in a risk-based industry,” the organizers wrote on the convention’s website.

“What they did is fantastic,” Jeff Moss, founder of the Defcon and Black Hat security conferences, told WIRED. “CO2 is being used as an approximation for so many things, but there are no easy, inexpensive network monitoring solutions available. Kawaiicon building something to do this is the true spirit of hacking.”

Elevated levels of CO2 lead to reduced cognitive ability and facilitate transmission of airborne viruses, which can linger in poorly ventilated spaces for hours. The more CO2 in the air, the more virus-friendly the air becomes, making CO2 data a handy proxy for tracing pathogens. In fact, the Australian Academy of Science described the pollution in indoor air as “someone else’s breath backwash.” Kawaiicon organizers faced running a large infosec event during a measles outbreak, as well as constantly rolling waves of COVID-19, influenza, and RSV. It’s a familiar pain point for conference organizers frustrated by massive gaps in public health—and lack of control over their venue’s clean air standards.

“In general, the Michael Fowler venue has a single HVAC system, and uses Farr 30/30 filters with a rating of MERV-8,” Kawaiicon organizers explained, referencing the filtration choices in the space where the convention was held. MERV-8 is a budget-friendly choice–standard practice for homes. “The hardest part of the whole process is being limited by what the venue offers,” they explained. “The venue is older, which means less tech to control air flow, and an older HVAC system.”

Kawaiicon’s work began one month before the conference. In early October, organizers deployed a small fleet of 13 RGB Matrix Portal Room CO2 Monitors, an ambient carbon dioxide monitor DIY project adapted from US electronics and kit company Adafruit Industries. The monitors were connected to an Internet-accessible dashboard with live readings, daily highs and lows, and data history that showed attendees in-room CO2 trends. Kawaiicon tested its CO2 monitors in collaboration with researchers from the University of Otago’s public health department.

“That’s awesome,” says Adafruit founder and engineer Limor “Ladyada” Fried about the conference’s adaptation of the Matrix Portal project. “The best part is seeing folks pick up new skills and really understand how we measure and monitor air quality in the real world (like at a con during a measles flare-up)! Hackers and makers are able to be self-reliant when it comes to their public-health information needs.” (For the full specs of the Kawaiicon build, you can check out the GitHub repository here.)

The Michael Fowler Centre is a spectacular blend of Scandinavian brutalism and interior woodwork designed to enhance sound and air, including two grand pou—carved Māori totems—next to the main entrance that rise through to the upper foyers. Its cathedral-like acoustics posed a challenge to Kawaiicon’s air-hacking crew, which they solved by placing the RGB monitors in stereo. There were two on each level of the Main Auditorium (four total), two in the Renouf session space on level 1, plus monitors in the daycare and Kuracon (kids’ hacker conference) areas. To top it off, monitors were placed in the Quiet Room, at the Registration Desk, and in the Green Room.

“The things we had to consider were typical health and safety, and effective placement (breathing height, multiple monitors for multiple spaces, not near windows/doors),” a Kawaiicon spokesperson who goes by Sput online told WIRED over email.

“To be honest, it is no different than having to consider other accessibility options (e.g., access to venue, access to talks, access to private space for personal needs),” Sput wrote. “Being a tech-leaning community it is easier for us to get this set up ourselves, or with volunteer help, but definitely not out of reach given how accessible the CO2 monitor tech is.”

Kawaiicon’s attendees could quickly check the conditions before they arrived and decide how to protect themselves accordingly. At the event, WIRED observed attendees checking CO2 levels on their phones, masking and unmasking in different conference areas, and watching a display of all room readings on a dashboard at the registration desk.

In each conference session room, small wall-mounted monitors displayed stoplight colors showing immediate conditions: green for safe, orange for risky, and red to show the room had high CO2 levels, the top level for risk.

“Everyone who occupies the con space we operate have a different risk and threat model, and we want everyone to feel they can experience the con in a way that fits their model,” the organizers wrote on their website. “Considering Covid-19 is still in the community, we wanted to make sure that everyone had all the information they needed to make their own risk assessment on ‘if’ and ‘how’ they attended the con. So this is our threat model and all the controls and zones we have in place.”

Colorful custom-made Kawaiicon posters by New Zealand artist Pepper Raccoon placed throughout the Michael Fowler Centre displayed a QR code, making the CO2 dashboard a tap away, no matter where they were at the conference.

“We think this is important so folks don’t put themselves at risk having to go directly up to a monitor to see a reading,” Kawaiicon spokesperson Sput told WIRED, “It also helps folks find a space that they can move to if the reading in their space gets too high.”

It’s a DIY solution any conference can put in place: resources, parts lists, and assembly guides are here.

Kawaiicon’s organizers aren’t keen to pretend there were no risks to gathering in groups during ongoing outbreaks. “Masks are encouraged, but not required,” Kawaiicon’s Health and Safety page stated. “Free masks will be available at the con if you need one.” They encouraged attendees to test before coming in, and for complete accessibility for all hackers who wanted to attend, of any ability, they offered a full virtual con stream with no ticket required.

Trying to find out if a venue will have clean or gross recycled air before attending a hacker conference has been a pain point for researchers who can’t afford to get sick at, or after, the next B-Sides, Defcon, or Black Hat. Kawaiicon addresses this headache. But they’re not here for debates about beliefs or anti-science trolling. “We each have our different risk tolerance,” the organizers wrote. “Just leave others to make the call that is best for them. No one needs your snarky commentary.”

This story originally appeared at WIRED.com.

Photo of WIRED

Wired.com is your essential daily guide to what’s next, delivering the most original and complete take you’ll find anywhere on innovation’s impact on technology, science, business and culture.

This hacker conference installed a literal antivirus monitoring system Read More »

oops-cryptographers-cancel-election-results-after-losing-decryption-key.

Oops. Cryptographers cancel election results after losing decryption key.

One of the world’s premier security organizations has canceled the results of its annual leadership election after an official lost an encryption key needed to unlock results stored in a verifiable and privacy-preserving voting system.

The International Association of Cryptologic Research (IACR) said Friday that the votes were submitted and tallied using Helios, an open source voting system that uses peer-reviewed cryptography to cast and count votes in a verifiable, confidential, and privacy-preserving way. Helios encrypts each vote in a way that assures each ballot is secret. Other cryptography used by Helios allows each voter to confirm their ballot was counted fairly.

An “honest but unfortunate human mistake”

Per the association’s bylaws, three members of the election committee act as independent trustees. To prevent two of them from colluding to cook the results, each trustee holds a third of the cryptographic key material needed to decrypt results.

“Unfortunately, one of the three trustees has irretrievably lost their private key, an honest but unfortunate human mistake, and therefore cannot compute their decryption share,” the IACR said. “As a result, Helios is unable to complete the decryption process, and it is technically impossible for us to obtain or verify the final outcome of this election.”

To prevent a similar incident, the IACR will adopt a new mechanism for managing private keys. Instead of requiring all three chunks of private key material, elections will now require only two. Moti Yung, the trustee who was unable to provide his third of the key material, has resigned. He’s being replaced by Michel Abdalla.

The IACR is a nonprofit scientific organization providing research in cryptology and related fields. Cryptology is the science and practice of designing computation and communication systems that remain secure in the presence of adversaries. The associate is holding a new election that started Friday and runs through December 20.

Oops. Cryptographers cancel election results after losing decryption key. Read More »

rocket-report:-spacex’s-next-gen-booster-fails;-pegasus-will-fly-again

Rocket Report: SpaceX’s next-gen booster fails; Pegasus will fly again


With the government shutdown over, the FAA has lifted its daytime launch curfew.

Blue Origin’s New Glenn booster arrives at Port Canaveral, Florida, for the first time Tuesday aboard the “Jacklyn” landing vessel. Credit: Manuel Mazzanti/NurPhoto via Getty Images

Welcome to Edition 8.20 of the Rocket Report! For the second week in a row, Blue Origin dominated the headlines with news about its New Glenn rocket. After a stunning success November 13 with the launch and landing of the second New Glenn rocket, Jeff Bezos’ space company revealed a roadmap this week showing how engineers will supercharge the vehicle with more engines. Meanwhile, in South Texas, SpaceX took a step toward the first flight of the next-generation Starship rocket. There will be no Rocket Report next week due to the Thanksgiving holiday in the United States. We look forward to resuming delivery of all the news in space lift the first week of December.

As always, we welcome reader submissions. If you don’t want to miss an issue, please subscribe using the box below (the form will not appear on AMP-enabled versions of the site). Each report will include information on small-, medium-, and heavy-lift rockets, as well as a quick look ahead at the next three launches on the calendar.

Northrop’s Pegasus rocket wins a rare contract. A startup named Katalyst Space Technologies won a $30 million contract from NASA in August to build a robotic rescue mission for the agency’s Neil Gehrels Swift Observatory in low-Earth orbit. Swift, in space since 2004, is a unique instrument designed to study gamma-ray bursts, the most powerful explosions in the Universe. The spacecraft lacks a propulsion system and its orbit is subject to atmospheric drag, and NASA says it is “racing against the clock” to boost Swift’s orbit and extend its lifetime before it falls back to Earth. On Wednesday, Katalyst announced it selected Northrop Grumman’s air-launched Pegasus XL rocket to send the rescue craft into orbit next year.

Make this make sense … At first glance, this might seem like a surprise. The Pegasus XL rocket hasn’t flown since 2021 and has launched just once in the last six years. The solid-fueled rocket is carried aloft under the belly of a modified airliner, then released to fire payloads of up to 1,000 pounds (450 kilograms) into low-Earth orbit. It’s an expensive rocket for its size, with Northrop charging more than $25 million per launch, according to the most recent public data available; the satellites best suited to launch on Pegasus will now find much cheaper tickets to orbit on rideshare missions using SpaceX’s Falcon 9 rocket. There are a few reasons none of this mattered much to Katalyst. First, the rescue mission must launch into a very specific low-inclination orbit to rendezvous with the Swift observatory, so it won’t be able to join one of SpaceX’s rideshare missions. Second, Northrop Grumman has parts available for one more Pegasus XL rocket, and the company might have been willing to sell the launch at a discount to clear its inventory and retire the rocket’s expensive-to-maintain L-1011 carrier aircraft. And third, smaller rockets like Rocket Lab’s Electron or Firefly’s Alpha don’t quite have the performance to place Katalyst’s rescue mission into the required orbit. (submitted by gizmo23)

The easiest way to keep up with Eric Berger’s and Stephen Clark’s reporting on all things space is to sign up for our newsletter. We’ll collect their stories and deliver them straight to your inbox.

Sign Me Up!

Ursa Major rakes in more cash. Aerospace and defense startup Ursa Major Technologies landed a $600 million valuation in a new fundraising round, the latest sign that investors are willing to back companies developing new rocket technology, Bloomberg reports. Colorado-based Ursa Major closed its Series E fundraising round with investments from the venture capital firms Eclipse, Woodline Partners, Principia Growth, XN, and Alsop Louie Partners. The company also secured $50 million in debt financing. Ursa Major is best known as a supplier of liquid-fueled rocket engines and solid rocket motors to power a range of commercial and government vehicles.

Hypersonic tailwinds … Ursa Major says it is positioned to provide the US industrial base with propulsion systems faster and more affordably than legacy contractors can supply. “The company will rapidly field its throttleable, storable, liquid-fueled hypersonic and space-based defense solution, as well as scale its solid rocket motor and sustained space mobility manufacturing capacity,” Ursa Major said in a press release. Its customers include BAE Systems, which will use Ursa Major’s solid rocket motors to power tactical military-grade rockets, and Stratolaunch, which uses Ursa Major’s liquid-fueled Hadley engine for its hypersonic Talon-A spaceplane.

Rocket Lab celebrates two launches in 48 hours. Rocket Lab launched a payload for an undisclosed commercial customer Thursday, just hours after the company announced plans for the launch, Space News reports. The launch from Rocket Lab’s primary spaceport in New Zealand used the company’s Electron rocket, but officials released little more information on the mission, other than its nickname: “Follow My Speed.” An artist’s illustration on the mission patch indicated the payload might have been the next in a line of Earth-imaging satellites from the remote sensing company BlackSky, although the firm’s previous satellites have not launched with such secrecy.

Two hemispheres … Thursday’s launch from the Southern Hemisphere came just two days after Rocket Lab’s previous mission lifted off from Wallops Island, Virginia. That flight was a suborbital launch to support a hypersonic technology demonstration for the Defense Innovation Unit and the Missile Defense Agency. All told, Rocket Lab has now launched 18 Electron rockets this year with 100 percent mission success, a company record.

Spanish startup makes a big reveal. The Spanish company PLD Space released photos of a test version of its Miura 5 rocket Thursday, calling it a “decisive step forward in the orbital launcher validation campaign.” The full-scale qualification unit, called QM1, will allow engineers to complete subsystem testing under “real conditions” to ensure the rocket’s reliability before its first mission scheduled for 2026. The first stage of the qualification unit will undergo a full propellant loading test, while the second stage will undergo a destructive test in the United States to validate the rocket’s range safety destruct system. Miura 5 is designed to deliver a little more than a metric ton (2,200 pounds) of payload to low-Earth orbit.

Still a long way to go … “Presenting our first integrated Miura 5 unit is proof that our model works: vertical integration, proprietary infrastructure and a philosophy based on testing, learning, and improving,” said Raúl Torres, CEO and co-founder of PLD Space. The reveal, however, is just the first step in a qualification campaign that takes more than a year for most rocket companies. PLD Space aims to go much faster, with plans to complete a second qualification rocket by the end of December and unveil its first flight rocket in the first quarter of next year. “This unprecedented development cadence in Europe reinforces PLD Space’s position as the company that has developed an orbital launcher in the shortest time–just two years–whilst meeting the highest quality standards,” the company said in a statement. This would be a remarkable achievement, but history suggests PLD Space has a steep climb in the months ahead. (submitted by Leika and EllPeaTea)

Sweden digs deep in pursuit of sovereign launch. In an unsettled world, many nations are eager to develop homegrown rockets to place their own satellites into orbit. These up-and-coming spacefaring nations see it as a strategic imperative to break free from total reliance on space powers like Russia, China, and the United States. Still, some decisions are puzzling. This week, the Swedish aerospace and defense contractor Saab announced a $10 million investment in a company named Pythom. If you’re not familiar with this business, allow me to link back to a 2022 story published by Ars about Pythom’s questionable safety practices. The company has kept quiet since then, until the name surprisingly popped up again in a press release from Saab, a firm with a reputation that seems to be diametrically opposed to that of Pythom.

Just enough … The statement from Saab suggests its $10 million contribution to Pythom will make it the “lead investor” in the company’s recent funding round. Pythom hasn’t said anything more about this funding round, but Saab said the investment will accelerate Pythom’s “development and deployment of its launch systems,” which include an initial rocket capable of putting up to 330 pounds (150 kilograms) of payload into low-Earth orbit. $10 million may be just enough to keep Pythom afloat for a couple more years but is far less than the money Pythom would need to get serious about fielding an orbital launcher. Pythom is headquartered in California, but it has Swedish roots. It was founded by the Swedish married couple Tina and Tom Sjögren. The company has a couple dozen employees, and a handful of them are based in Sweden, according to Pythom’s website. (submitted by Leika and EllPeaTea)

China is about to launch an astronaut lifeboat. China is set to launch an uncrewed Shenzhou spacecraft to the Tiangong space station to provide the Shenzhou 21 astronauts with a means of returning home, Space News reports. The launch of China’s Shenzhou 22 mission is scheduled for Monday night, US time, aboard a Long March 2F rocket. Instead of carrying astronauts, the ship will ferry cargo to the Chinese Tiangong space station. More importantly, it will provide a safe ride home for the three astronauts living and working aboard the orbiting outpost.

How did we get here? … The Shenzhou 20 spacecraft currently docked to the Tiangong station was damaged by a suspected piece of space junk, cracking its window and rendering it unable to meet China’s safety standards for returning astronauts to Earth. The damage discovery occurred just before three outgoing crew members were supposed to ride Shenzhou 20 home earlier this month. Instead, those three astronauts departed the station and returned to Earth on the newer, undamaged Shenzhou 21 spacecraft. That left the other three crew members on Tiangong with only the damaged Shenzhou 20 spacecraft to get them home in the event of an emergency. Shenzhou 22 will replace Shenzhou 20, providing a lifeboat for the rest of the crew’s six-month stay in space. (submitted by EllPeaTea)

Atlas V launches for Viasat. United Launch Alliance launched its Atlas V rocket on November 13 with a satellite for the California-based communications company Viasat, Spaceflight Now reports. The launch came a week after the mission was scrubbed due to a faulty liquid oxygen tank vent valve on the Atlas booster. ULA rolled the rocket back to the Vertical Integration Facility, replaced it with a new valve, and returned the rocket to the pad on November 12. The launch the following day was successful, with the Atlas V’s Centaur upper stage deploying the ViaSat-3 F2 spacecraft into a geosynchronous transfer orbit nearly three-and-a-half hours after liftoff from Cape Canaveral Space Force Station, Florida.

End of an era … This was the final launch of an Atlas V rocket with a payload heading for geosynchronous orbit. These are the kinds of missions the Atlas V was designed for more than 25 years ago, but the market has changed. All of the Atlas V’s remaining 11 missions will target low-Earth orbit carrying broadband satellites for Amazon or Boeing’s Starliner spacecraft heading for the International Space Station. The Atlas V will be retired in the coming years in favor of ULA’s new Vulcan rocket.

SpaceX launches key climate change monitor. SpaceX launched a joint NASA-European environmental research satellite early Monday, the second in an ongoing billion-dollar project to measure long-term changes in sea level, a key indicator of climate change, CBS News reportsThe first satellite, known as Sentinel-6 and named in honor of NASA climate researcher Michael Freilich, was launched in November 2020. The latest spacecraft, Sentinel-6B, was launched from California atop a Falcon 9 rocket this week. Both satellites are equipped with a sophisticated cloud-penetrating radar. By timing how long it takes beams to bounce back from the ocean 830 miles (1,336 kilometers) below, the Sentinel-6 satellites can track sea levels to an accuracy of about one inch while also measuring wave height and wind speeds. The project builds on earlier missions dating back to the early 1990s that have provided an uninterrupted stream of sea level data.

FAA restrictions lifted … The Federal Aviation Administration lifted a restriction on commercial space operations this week that limited launches and reentries to the late night and early morning hours, Spaceflight Now reports. The FAA imposed a daytime curfew on commercial launches as it struggled to maintain air traffic control during the recent government shutdown. Those restrictions, which did not affect government missions, were lifted Monday. (submitted by EllPeaTea)

Blue Origin’s New Glenn will grow larger. One week after the successful second launch of its large New Glenn booster, Blue Origin revealed a road map on Thursday for upgrades to the rocket, including a new variant with more main engines and a super-heavy lift capability, Ars reports. These upgrades to the rocket are “designed to increase payload performance and launch cadence, while enhancing reliability,” the company said in an update published on its website. The enhancements will be phased in over time, starting with the third launch of New Glenn, which is likely to occur during the first half of 2026.

No timelines The most significant part of the update concerned an evolution of New Glenn that will transform the booster into a super-heavy lift launch vehicle. The first stage of this evolved vehicle will have nine BE-4 engines instead of seven, and the upper stage will have four BE-3U engines instead of two. In its update, Blue Origin refers to the new vehicle as 9×4 and the current variant as 7×2, a reference to the number of engines in each stage. “New Glenn 9×4 is designed for a subset of missions requiring additional capacity and performance,” the company said. “The vehicle carries over 70 metric tons to low-Earth orbit, over 14 metric tons direct to geosynchronous orbit, and over 20 metric tons to trans-lunar injection. Additionally, the 9×4 vehicle will feature a larger 8.7-meter fairing.” The company did not specify a timeline for the debut of the 9×4 variant. A spokesperson for the company told Ars, “We aren’t disclosing a specific timeframe today. The iterative design from our current 7×2 vehicle means we can build this rocket quickly.”

Recently landed New Glenn returns to port. Blue Origin welcomed “Never Tell Me the Odds” back to Cape Canaveral Space Force Station, Florida, on Thursday, where the rocket booster launched exactly one week prior, Florida Today reports. The New Glenn’s first stage booster landed on Blue Origin’s offshore recovery barge, which returned it to Port Canaveral on Tuesday with great fanfare. Blue Origin’s founder, Jeff Bezos, rode the barge into port, posing for photos with the rocket and waving to onlookers viewing the spectacle from a nearby public pier. The rocket was lowered horizontally late Wednesday morning, as spectators watched alongside the restaurants and fishing boats at the port.

Through the gates Officials from Blue Origin guided the 188-foot-long New Glenn booster to the Space Force station Thursday, making Blue Origin the only company besides SpaceX to return a space-flown booster through the gates. Once back at Blue Origin’s hangar, the rocket will undergo inspections and refurbishment for a second flight, perhaps early next year. “I could not be more excited to see the New Glenn launch, and Blue Origin recover that booster and bring it back,” Col. Brian Chatman, commander of Space Launch Delta 45, told Florida Today. “It’s all part of our certification process and campaign to certify more national security space launch providers, launch carriers, to get our most crucial satellites up on orbit.”

Meanwhile, down at Starbase. SpaceX rolled the first of its third-generation Super Heavy boosters out of the factory at Starbase, Texas, this week for a road trip to a nearby test site, according to NASASpaceflight.com. The booster rode SpaceX’s transporter from the factory a few miles down the road to Massey’s Test Site, where technicians prepared the rocket for cryogenic proof testing. However, during the initial phases of testing, the booster failed early on Friday morning.

Tumbling down … At the Starship launch site, ground teams are busy tearing down the launch mount at Pad 1, the departure point for all of SpaceX’s Starships to date. SpaceX will upgrade the pad for its next-generation, more powerful Super Heavy boosters, while Starship V3’s initial flights will take off from Pad 2, a few hundred meters away from Pad 1.

Next three launches

Nov. 22: Falcon 9 | Starlink 6-79 | Cape Canaveral Space Force Station, Florida | 06: 59 UTC

Nov. 23: Falcon 9 | Starlink 11-30 | Vandenberg Space Force Base, California | 08: 00 UTC

Nov. 25: Long March 2F | Shenzhou 22 | Jiuquan Satellite Launch Center, China | 04: 11 UTC

Photo of Stephen Clark

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

Rocket Report: SpaceX’s next-gen booster fails; Pegasus will fly again Read More »

stoke-space-goes-for-broke-to-solve-the-only-launch-problem-that-“moves-the-needle”

Stoke Space goes for broke to solve the only launch problem that “moves the needle”


“Does the world really need a 151st rocket company?”

Stoke Space’s full-flow staged combustion is tested in Central Washington in 2024. Credit: Stoke Space

Stoke Space’s full-flow staged combustion is tested in Central Washington in 2024. Credit: Stoke Space

LAUNCH COMPLEX 14, Cape Canaveral, Fla.—The platform atop the hulking steel tower offered a sweeping view of Florida’s rich, sandy coastline and brilliant blue waves beyond. Yet as captivating as the vista might be for an aspiring rocket magnate like Andy Lapsa, it also had to be a little intimidating.

To his right, at Launch Complex 13 next door, a recently returned Falcon 9 booster stood on a landing pad. SpaceX has landed more than 500 large orbital rockets. And next to SpaceX sprawled the launch site operated by Blue Origin. Its massive New Glenn rocket is also reusable, and founder Jeff Bezos has invested tens of billions of dollars into the venture.

Looking to the left, Lapsa saw a graveyard of sorts for commercial startups. Launch Complex 15 was leased to a promising startup, ABL Space, two years ago. After two failed launches, ABL Space pivoted away from commercial launch. Just beyond lies Launch Complex 16, where Relativity Space aims to launch from. The company has already burned through $1.7 billion in its efforts to reach orbit. Had billionaire Eric Schmidt not stepped in earlier this year, Relativity would have gone bankrupt.

Andy Lapsa may be a brainy rocket scientist, but he is not a billionaire. Far from it.

“When you start a company like this, you have no idea how far you’re going to be able to make it, you know?” he admitted.

Lapsa and another aerospace engineer, Tom Feldman, founded Stoke Space a little more than five years ago. Both had worked the better part of a decade at Blue Origin and decided they wanted to make their mark on the industry. It was not an easy choice to start a rocket company at a time when there were dozens of other entrants in the field.

Andy Lapsa speaks at the Space Economy Summit in November 2025.

Credit: The Economist Group

Andy Lapsa speaks at the Space Economy Summit in November 2025. Credit: The Economist Group

“It was a huge question in my head: Does the world really need a 151st rocket company?” he said. “And in order for me to say yes to that question, I had to very systematically go through all the other players, thinking about the economics of launch, about the business plan, about the evolution of these companies over time. It was very non-intuitive to me to start another launch company.”

So why did he do it?

I traveled to Florida in November to answer this question and to see if the world’s 151st rocket company had any chance of success.

Launch Complex 14

It takes a long time to build a launch site. Probably longer than you might think.

Lapsa and Feldman spent much of 2020 working on the basic design of a rocket that would eventually be named Nova and deciding whether they could build a business around it. In December of that year, they closed their seed round of funding, raising $9.1 million. After this, finding somewhere to launch from became a priority.

They zeroed in on Cape Canaveral because it’s where the majority of US launch companies and customers are, as well as the talent to assemble and launch rockets. They learned in 2021 that the US Space Force was planning to lease an old pad, Space Launch Complex 14, to a commercial company. This was not just a good location to launch from; it was truly a historic location—John Glenn launched into orbit from here in 1962 aboard the Friendship 7 spacecraft. It was retired in 1967 and designated a National Historic Landmark.

But in recent years, the Space Force has sought to support the flourishing US commercial space industry, and it has offered Launch Complex 14. After the competition opened in 2021, Stoke Space won the lease a year later. Then began the long and arduous process of conducting an Environmental Assessment. It took nearly two years, and it was not until October 20, 2024, that Stoke was allowed to break ground.

None of the structures on the site were usable, and aside from the historic blockhouse dating to the Mercury program, everything else had to be demolished and cleared before work could begin.

As we walked the large ring encompassing the site, Lapsa explained that all of the tanks and major hardware needed to support a Nova launch were now on site. There is a large launch tower, as well as a launch mount upon which the rocket will be stood up. The company has mostly turned toward integrating all of the ground infrastructure and wiring up the site. A nearby building to assemble rockets and process payloads is well underway.

Lapsa seemed mostly relieved. “A year ago, this was my biggest concern,” he said.

He need not have worried. A few months before the company completed its environmental permitting, a tall, lanky, thickly bearded engineer named Jonathan Lund hired on. A Stanford graduate who got his start with the US Army Corps of Engineers, Lund worked at SpaceX during the second half of the 2010s, helping to lead the reconstruction of one launch pad, the crew tower project at Launch Complex 39A, and a pad at Vandenberg Space Force Base. He also worked on multiple landing sites for the Falcon 9 rocket. Lund arrived to lead the development of Stoke’s site.

This is Lund’s fifth launch pad. Each one presents different challenges. In Florida, for example, the water table lies only a few feet below the ground. But for most rockets, including Nova, a large trench must be dug to allow flames from the rocket engines to be carried away from the vehicle at ignition and liftoff. As we stood in this massive flame diverter, there were a few indications of water seeping in.

Still, the company recently completed a major milestone by testing the water suppression system, which dampens the energy of a rocket at liftoff to protect the launch pad. Essentially, the plume from the rocket’s engines flows downward where it meets a sheet of water, turning it into steam. This creates an insulating barrier of sorts.

Water suppression test at LC-14 complete. ✅ Flowed the diverter and rain birds in a “launch like” scenario. pic.twitter.com/rs1lEloPul

— Stoke Space (@stoke_space) October 21, 2025

The water comes from large pipes running down the flame diverter, each of which has hundreds of holes not unlike a garden sprinkler hose. Lund said the pipes and the frame they rest on were built near where we stood.

“We fabricated these pieces on site, at the north end of the flame trench,” Lund explained. “Then we built this frame in Cocoa Beach and shipped it in four different sections and assembled it on site. Then we set the frame on the ramp, put together this surface (with the pipes), and then Egyptian-style we slide it down the ramp right into position. We used some old-school methods, but simple sometimes works best. Nothing fancy.”

At this point, Lapsa interrupted. “I was pretty nervous,” he said. “The way you’re describing this sounded good on a PowerPoint. But I wasn’t sure it actually would work.”

But it did.

Waiting on Nova

So if the pad is rounding into shape, how’s that rocket coming?

It sounds like Stoke Space is doing the right things. Earlier this year, the company shipped a full-scale version of its second stage to its test site at Moses Lake in central Washington. There, it underwent qualification testing, during which the vehicle is loaded with cryogenic fuels on multiple occasions, pressurized, and put through other exercises. Lapsa said that testing went well.

The company also built a stubby version of its first stage. The tanks and domes had full-size diameters, but the stage was not its full height. That vehicle also underwent qualification testing and passed.

The company has begun building flight hardware for the first Nova rocket. The vehicle’s software is maturing. Work is well underway on the development of an automated flight termination system. “Having a team that’s been through this cycle many times, it’s something we started putting attention on very early,” Lapsa said. “It’s on a good path as well.”

And yet the final, frenetic months leading to a debut launch are crunch time for any rocket company: first assembly of the full vehicle, first time test-firing it all. Things will inevitably go wrong. The question is how bad will the problems be?

For as long as I’ve known Lapsa, he has been cagey about launch dates for Stoke. This is smart because in reality, no one knows. And seasoned industry people (and journalists) know that projected launch dates for new rockets are squishy. The most precise thing Lapsa will say is that Stoke is targeting “next year” for Nova’s debut.

The company has a customer for the first flight. If all goes well, its first mission will sail to the asteroid belt. Asteroid mining startup AstroForge has signed on for Nova 1.

Stoke Space isn’t shooting for the Moon. It’s shooting for something 1 million times farther.

Too good to believe it’s true?

Stoke Space is far from the first company to start with grand ambitions. And when rocket startups think too big, it can be their undoing.

A little more than a decade ago, Firefly Space Systems in Texas based the design of its Alpha rocket on an aerospike engine, a technology that had never been flown to space before. Although this was theoretically a more efficient engine design, it also brought more technical risk and proved a bridge too far. By 2017, the company was bankrupt. When Ukrainian investor Max Polyakov rescued Firefly later that year, he demanded that Alpha have a more conventional rocket engine design.

Around the same time that Firefly struggled with its aerospike engine, another launch company, Relativity Space, announced its intent to 3D-print the entirety of its rockets. The company finally launched its Terran 1 rocket after eight years. But it struggled with additively manufacturing rockets. Relativity was on the brink of bankruptcy before a former Google executive, Eric Schmidt, stepped in to rescue the company financially. Relativity is now focused on a traditionally manufactured rocket, the Terran R.

Stoke Space’s Hopper 2 takes to the skies in September 2023 in Moses Lake, Washington.

Credit: Stoke Space

Stoke Space’s Hopper 2 takes to the skies in September 2023 in Moses Lake, Washington. Credit: Stoke Space

So what to make of Stoke Space, which has an utterly novel design for its second stage? The stage is powered by a ring of 24 thrusters, an engine collectively named Andromeda. Stoke has also eschewed a tile-based heat shield to protect the vehicle during atmospheric reentry in favor of a regeneratively cooled design.

In this, there are echoes of Firefly, Relativity, and other companies with grand plans that had to be abandoned in favor of simpler designs to avoid financial ruin. After all, it’s hard enough to reach orbit with a conventional rocket.

But the company has already done a lot of testing of this design. Its first iteration of Andromeda even completed a hop test back in 2023.

“Andromeda is wildly new,” Lapsa said. “But the question of can it work, in my opinion, is a resounding yes.”

The engineering team had all manner of questions when designing Andromeda several years ago. How will all of those thrusters and their plumbing interact with one another? Will there be feedback? Is the heat shield idea practical?

“Those are the kind of unknowns that we knew we were walking into from an engineering perspective,” Lapsa said. “We knew there should be an answer in there, but we didn’t know exactly what it would be. It’s very hard to model all that stuff in the transient. So you just had to get after it, and do it, and we were able to do that. So can it work? Absolutely yes. Will it work out of the box? That’s a different question.”

First stage, too

Stoke’s ambitions did not stop with the upper stage. Early on, Lapsa, Feldman, and the small engineering team also decided to develop a full-flow staged combustion engine. This, Lapsa acknowledges, was a “risky” decision for the company. But it was a necessary one, he believes.

Full-flow staged combustion engines had been tested before this decade but were never flown. From an engineering standpoint, they are significantly more complex than a traditional staged combustion engine in that the oxidizer and propellant—which began as cryogenic liquids—arrive in the combustion chamber in a fully gaseous state. This interaction between two gases is more efficient and produces less wear and tear on turbines within the engine.

“You want to get the highest efficiency you can without driving the turbine temperature to a place where you have a short lifetime,” Lapsa said. “Full-flow is the right answer for that. If you do anything else, it’s a distraction.”

Stoke Space successfully tests its advanced full-flow staged combustion rocket engine, designed to power the Nova launch vehicle’s first stage.

Credit: Stoke Space

Stoke Space successfully tests its advanced full-flow staged combustion rocket engine, designed to power the Nova launch vehicle’s first stage. Credit: Stoke Space

It was also massively unproven. When Stoke Space was founded in 2020, no full-flow staged combustion engine had ever gotten close to space. SpaceX was developing the Raptor engine using the technology, but it would not make its first “spaceflight” until the spring of 2023 on the Super Heavy rocket that powers Starship. Multiple Raptors failed shortly after ignition.

But for a company choosing full reusability of its rocket, as SpaceX sought to do with Starship, there ultimately is no choice.

“Anything you build for full and rapid reuse needs to find margin somewhere in the system,” Lapsa said. “And really that’s fuel efficiency. It makes fuel efficiency a very strong, very important driver.”

In June 2024, Stoke Space announced it had just completed a successful hot fire test of its full-flow, staged combustion engine for Nova’s first stage. The propulsion team had, Lapsa said at the time, “worked tirelessly” to reach that point.

Not just another launch company?

Stoke Space got to the party late. After SpaceX’s success with the first Falcon 9 in 2010, a wave of new entrants entered the field over the next decade. They were drawing down billions in venture capital funding, and some were starting to go public at huge valuations as special purpose acquisition companies. But by 2020, the market seemed saturated. The gold rush for new launch companies was nearing the cops-arrive-to-bust-up-the-festivities stage.

Every new company seemed to have its own spin on how to conquer low-Earth orbit.

“There were a lot of other business plans being proposed and tried,” Lapsa said. “There were low-cost, mass-produced disposable rockets. There were rockets under the wings of aircraft. There were rocket engine companies that were going to sell to 150 launch companies. All of those ideas raised big money and deserve to be considered. The question is, which one is the winner in the end?”

And that’s the question he was trying to answer in his own mind. He was in his 30s. He had a family. And he was looking to commit his best years, professionally, to solving a major launch problem.

“What’s the thing that fundamentally moves the needle on what’s out there already today?” he said. “The only thing, in my opinion, is rapid reuse. And once you get it, the economics are so powerful that nothing else matters. That’s the thing I couldn’t get out of my head. That’s the only problem I wanted to work on, and so we started a company in order to work on it.”

Stoke was one of many launch companies five years ago. But in the years since, the field has narrowed considerably. Some promising companies, such as Virgin Orbit and ABL Space, launched a few times and folded. Others never made it to the launch pad. Today, by my count, there are fewer than 10 serious commercial launch companies in the United States, Stoke among them. The capital markets seem convinced. In October, Stoke announced a massive $510 million Series D funding round. That was a lot of money in a challenging time to raise launch firm funding.

So Stoke has the money it needs. It has a team of sharp engineers and capable technicians. It has a launch pad and qualified hardware. That’s all good because this is the point in the journey for a launch startup where things start to get very, very difficult.

Photo of Eric Berger

Eric Berger is the senior space editor at Ars Technica, covering everything from astronomy to private space to NASA policy, and author of two books: Liftoff, about the rise of SpaceX; and Reentry, on the development of the Falcon 9 rocket and Dragon. A certified meteorologist, Eric lives in Houston.

Stoke Space goes for broke to solve the only launch problem that “moves the needle” Read More »