Author name: Kris Guyer

Raspberry Pi cuts product returns by 50% by changing up its pin soldering

electronics, electronics manufacturing, manufacturing, Raspberry Pi, raspberry pi 5, single board computers, soldering, Tech / Kris Guyer / May 1, 2025

Getting the hang of through-hole soldering is tricky for those of us tinkering at home with our irons, spools, flux, and, sometimes, braids. It’s almost reassuring, then, to learn that through-hole soldering was also a pain for a firm that has made more than 60 million products with it.

Raspberry Pi boards have a combination of surface-mount devices (SMDs) and through-hole bits. SMDs allow for far more tiny chips, resistors, and other bits to be attached to boards by their tiny pins, flat contacts, solder balls, or other connections. For those things that are bigger, or subject to rough forces like clumsy human hands, through-hole soldering is still required, with leads poked through a connective hole and solder applied to connect and join them securely.

The Raspberry Pi board has a 40-pin GPIO header on it that needs through-hole soldering, along with bits like the Ethernet and USB ports. These require robust solder joints, which can’t be done the same way as with SMT (surface-mount technology) tools. “In the early days of Raspberry Pi, these parts were inserted by hand, and later by robotic placement,” writes Roger Thornton, director of applications for Raspberry Pi, in a blog post. The boards then had to go through a follow-up wave soldering step.

Now Pi boards have their tiny bits and bigger pieces soldered at the same time through an intrusive reflow soldering process undertaken with Raspberry Pi’s UK manufacturing partner, Sony. After adjusting component placement, the solder stencil, and the connectors, the board makers could then place and secure all their components in the same stage.

Raspberry Pi cuts product returns by 50% by changing up its pin soldering Read More »

RFK Jr. rejects cornerstone of health science: Germ theory

anti-vaccine, fauci, germ theory, health, Infectious disease, miasma, robert f kennedy jr, Science / Kris Guyer / May 1, 2025

In his 2021 book vilifying Anthony Fauci, RFK Jr. lays out support for an alternate theory.

Health and Human Services Secretary Robert F. Kennedy Jr. speaks at a news conference on removing synthetic dyes from America’s food supply, at the Health and Human Services Headquarters in Washington, DC on April 22, 2025. Credit: Getty | Nathan Posner

With the rise of Robert F. Kennedy Jr., brain worms have gotten a bad rap.

A year ago, the long-time anti-vaccine advocate and current US health secretary famously told The New York Times that a parasitic worm “got into my brain and ate a portion of it and then died.” The startling revelation is now frequently referenced whenever Kennedy says something outlandish, false, or offensive—which is often. For those who have followed his anti-vaccine advocacy, it’s frightfully clear that, worm-infested or not, Kennedy’s brain is marinated in wild conspiracy theories and dangerous misinformation.

While it’s certainly possible that worm remnants could impair brain function, it remains unknown if the worm is to blame for Kennedy’s cognitive oddities. For one thing, he was also diagnosed with mercury poisoning, which can cause brain damage, too. As prominent infectious disease expert Anthony Fauci said last June in a conversation with political analyst David Axelrod: “I don’t know what’s going on in [Kennedy’s] head, but it’s not good.”

The trouble is that now that Kennedy is the country’s top health official, his warped ideas are contributing to the rise of a dystopian reality. Federal health agencies are spiraling into chaos, and critical public health services for Americans have been brutally slashed, dismantled, or knee-capped—from infectious disease responses, the lead poisoning team, and Meals on Wheels to maternal health programs and anti-smoking initiatives, just to name a few. The health of the nation is at stake; the struggle to understand what goes on in Kennedy’s head is vital.

While we may never have definitive answers on his cognitive situation, one thing is plain: Kennedy’s thoughts and actions make a lot more sense when you realize he doesn’t believe in a foundational scientific principle: germ theory.

Dueling theories

Germ theory is, of course, the 19th-century proven idea that microscopic germs—pathogenic viruses, bacteria, parasites, and fungi—cause disease. It supplanted the leading explanation of disease at the time, the miasma theory, which suggests that diseases are caused by miasma, that is, noxious mists and vapors, or simply bad air arising from decaying matter, such as corpses, sewage, or rotting vegetables. While the miasma theory was abandoned, it is credited with spurring improvements in sanitation and hygiene—which, of course, improve health because they halt the spread of germs, the cause of diseases.

Germ theory also knocks back a lesser-known idea called the terrain theory, which we’ve covered before. This is a somewhat ill-defined theory that generally suggests diseases stem from imbalances in the internal “terrain” of the body, such as malnutrition or the presence of toxic substances. The theory is linked to ideas by French scientist Antoine Béchamp and French physiologist Claude Bernard.

Béchamp, considered a bitter crank and rival to famed French microbiologist Louis Pasteur, is perhaps best known for wrongly suggesting the basic unit of organisms is not the cell, but nonexistent microanatomical elements he called “microzyma.” While the idea was largely ignored by the scientific community, Béchamp suggested that disruptions to microzyma are a predisposition to disease, as is the state of the body’s “terrain.” French physiologist Claude Bernard, meanwhile, came up with an idea of balance or stability of the body’s internal environment (milieu intérieur), which was a precursor to the concept of homeostasis. Ideas from the two figures came together to create an ideology that has been enthusiastically adopted by modern-day germ theory denialists, including Kennedy.

It’s important to note here that our understanding of Kennedy’s disbelief in germ theory isn’t based on speculation or deduction; it’s based on Kennedy’s own words. He wrote an entire section on it in his 2021 book vilifying Fauci, titled The Real Anthony Fauci. The section is titled “Miasma vs. Germ Theory,” in the chapter “The White Man’s Burden.”

But, we did reach out to Health and Human Services to ask how Kennedy’s disbelief in germ theory influences his policy decisions. HHS did not respond.

Kennedy’s beliefs

In the chapter, Kennedy promotes the “miasma theory” but gets the definition completely wrong. Instead of actual miasma theory, he describes something more like terrain theory. He writes: “‘Miasma theory’ emphasizes preventing disease by fortifying the immune system through nutrition and by reducing exposures to environmental toxins and stresses.”

Kennedy contrasts his erroneous take on miasma theory with germ theory, which he derides as a tool of the pharmaceutical industry and pushy scientists to justify selling modern medicines. The abandonment of miasma theory, Kennedy bemoans, realigned health and medical institutions to “the pharmaceutical paradigm that emphasized targeting particular germs with specific drugs rather than fortifying the immune system through healthy living, clean water, and good nutrition.”

According to Kennedy, germ theory gained popularity, not because of the undisputed evidence supporting it, but by “mimicking the traditional explanation for disease—demon possession—giving it a leg up over miasma.”

To this day, Kennedy writes, a “$1 trillion pharmaceutical industry pushing patented pills, powders, pricks, potions, and poisons, and the powerful professions of virology and vaccinology led by ‘Little Napoleon’ himself, Anthony Fauci, fortify the century-old predominance of germ theory.”

In all, the chapter provides a clear explanation of why Kennedy relentlessly attacks evidence-based medicines; vilifies the pharmaceutical industry; suggests HIV doesn’t cause AIDS and antidepressants are behind mass shootings; believes that vaccines are harmful, not protective; claims 5G wireless networks cause cancer; suggests chemicals in water are changing children’s gender identities; and is quick to promote supplements to prevent and treat diseases, such as recently recommending vitamin A for measles and falsely claiming children who die from the viral infection are malnourished.

A religious conviction

For some experts, the chapter was like a light bulb going on. “I thought ‘it now all makes sense’… I mean, it all adds up,” Paul Offit, pediatrician and infectious disease expert at Children’s Hospital of Philadelphia, told Ars Technica. It’s still astonishing, though, he added. “It’s so unbelievable, because you can’t imagine that someone who’s the head of Health and Human Services doesn’t believe that specific viruses or bacteria cause specific diseases, and that the prevention or treatment of them is lifesaving.”

Offit has a dark history with Kennedy. Around 20 years ago, Kennedy called Offit out of the blue to talk with him about vaccine safety. Offit knows a lot about it—he’s not only an expert on vaccines, he’s the co-inventor of one. The vaccine he co-developed, RotaTeq, protects against rotaviruses, which cause deadly diarrheal disease in young children and killed an estimated 500,000 people worldwide each year before vaccines were available. RotaTeq has been proven safe and effective and is credited with saving tens of thousands of lives around the world each year.

Kennedy and Offit spent about an hour talking, mostly about thimerosal, an ethylmercury-containing preservative that was once used in childhood vaccines but was mostly abandoned by 2001 as a precautionary measure. RotaTeq doesn’t and never did contain thimerosal—because it’s a live, attenuated viral vaccine, it doesn’t contain any preservatives. But Kennedy has frequently used thimerosal as a vaccine bogeyman over the years, claiming it causes harms (there is no evidence for this).

After their conversation, Kennedy published a story in Rolling Stone and Salon.com titled “Deadly Immunity,” which erroneously argued that thimerosal-containing vaccines cause autism. The article was riddled with falsehoods and misleading statements. It described Offit as “in the pocket” of the pharmaceutical industry and claimed RotaTeq was “laced” with thimerosal. Rolling Stone and Salon amended some of the article’s problems, but eventually Salon retracted it and Rolling Stone deleted it.

Looking back, Offit said he was sandbagged. “He’s a liar. He lied about who he was; he lied about what he was doing. He was just wanting to set me up,” Offit said.

Although that was the only time they had ever spoken, Kennedy has continued to disparage and malign Offit over the years. In his book dedicated to denigrating Fauci, Kennedy spends plenty of time spitting insults at Offit, calling him a “font of wild industry ballyhoo, prevarication, and outright fraud.” He also makes the wildly false claim that RotaTeq “almost certainly kills and injures more children in the United States than the rotavirus disease.”

Inconvincible

Understanding that Kennedy is a germ theory denialist and terrain theory embracer makes these attacks easier to understand—though no less abhorrent or dangerous.

“He holds these beliefs like a religious conviction,” Offit said. “There is no shaking him from that,” regardless of how much evidence there is to prove him wrong. “If you’re trying to talk him out of something that he holds with a religious conviction—that’s never going to happen. And so any time anybody disagrees with him, he goes, ‘Well, of course, they’re just in the pocket of industry; that’s why they say that.'”

There are some aspects of terrain theory that do have a basis in reality. Certainly, underlying medical conditions—which could be considered a disturbed bodily “terrain”—can make people more vulnerable to disease. And, with recent advances in understanding the microbiome, it has become clear that imbalances in the microbial communities in our gastrointestinal tracts can also predispose people to infections.

But, on the whole, the evidence against terrain theory is obvious and all around us. Terrain theorists consider disease a symptom of an unhealthy internal state, suggesting that anyone who gets sick is unhealthy and that all disease-causing germs are purely opportunistic. This is nonsense: Plenty of people fall ill while being otherwise healthy. And many germs are dedicated pathogens, with evolved, specialized virulence strategies such as toxins, and advanced defense mechanisms such as antibacterial resistance. They are not opportunists.

(There are some terrain theory devotees who do not believe in the existence of microbes at all—but Kennedy seems to accept that bacteria and viruses are real.)

Terrain theory applied

Terrain theory’s clash with reality has become painfully apparent amid Kennedy’s handling—or more accurately, mishandling—of the current measles situation in the US.

Most health experts would consider the current measles situation in the US akin to a five-alarm fire. An outbreak that began at the end of January in West Texas is now the largest and deadliest the country has seen in a quarter-century. Three people have died, including two unvaccinated young children who were otherwise healthy. The outbreak has spread to at least three other states, which also have undervaccinated communities where the virus can thrive. There’s no sign of the outbreak slowing, and the nation’s overall case count is on track to be the highest since the mid-1990s, before measles was declared eliminated in 2000. Modeling indicates the country will lose its elimination status and that measles will once again become endemic in the US.

Given the situation, one might expect a vigorous federal response—one dominated by strong and clear promotion of the highly effective, safe measles vaccine. But of course, that’s not the case.

“When those first two little girls died of measles in West Texas, he said immediately—RFK Jr.—that they were malnourished. It was the doctors that stood up and said ‘No, they had no risk factors. They were perfectly well-nourished,'” Offit points out.

Kennedy has also heavily pushed the use of vitamin A, a fat-soluble vitamin that accumulates in the body and can become toxic with large or prolonged doses. It does not prevent measles and is mainly used as supportive care for measles in low-income countries where vitamin A deficiency is common. Nevertheless, vaccine-hesitant communities in Texas have embraced it, leading to reports from doctors that they have had to treat children for vitamin A toxicity.

Poisons

Despite the raging outbreak, Kennedy spent part of last week drumming up fanfare for a rickety plan to rid American foods of artificial food dyes, which are accused of making sugary processed foods more appealing to kids, in addition to posing their own health risks. It’s part of his larger effort to improve Americans’ nutrition, a tenet of terrain theory. Though Kennedy has organized zero news briefings on the measles outbreak, he appeared at a jubilant press conference on removing the dyes.

The conference was complete with remarks from people who seem to share similar beliefs as Kennedy, including famed pseudoscience-peddler Vani Hari, aka “Food Babe,” and alternative-medicine guru and fad diet promoter Mark Hyman. Wellness mogul and special government employee Cally Meads also took to the podium to give a fury-filled speech in which he claimed that 90 percent of FDA’s spending is because we are “poisoning our children,” echoing a claim Kennedy has also made.

Kennedy, for his part, declared that “sugar is poison,” though he acknowledged that the FDA can’t ban it. While the conference was intended to celebrate the removal of artificial food dyes, he also acknowledged that there is no ban, nor forthcoming regulations, or even an agreement with food companies to remove the dyes. Kennedy instead said he simply had “an understanding” with food companies. FDA Commissioner Marty Makary explained the plan by saying: “I believe in love, and let’s start in a friendly way and see if we can do this without any statutory or regulatory changes.” Bloomberg reported the next day that food industry lobbyists said there is no agreement to remove the dyes.

However feeble the move, a focus on banning colorful cereal during a grave infectious disease outbreak makes a lot of sense if you know that Kennedy is a germ theory denialist.

But then again, there’s also the brain worm.

Beth is Ars Technica’s Senior Health Reporter. Beth has a Ph.D. in microbiology from the University of North Carolina at Chapel Hill and attended the Science Communication program at the University of California, Santa Cruz. She specializes in covering infectious diseases, public health, and microbes.

RFK Jr. rejects cornerstone of health science: Germ theory Read More »

The end of an AI that shocked the world: OpenAI retires GPT-4

AI, AI assistants, Biz & IT, chatgpt, chatgtp, GPT-4, large language models, machine learning, openai, sam altman / Kris Guyer / April 30, 2025

One of the most influential—and by some counts, notorious—AI models yet released will soon fade into history. OpenAI announced on April 10 that GPT-4 will be “fully replaced” by GPT-4o in ChatGPT at the end of April, bringing a public-facing end to the model that accelerated a global AI race when it launched in March 2023.

“Effective April 30, 2025, GPT-4 will be retired from ChatGPT and fully replaced by GPT-4o,” OpenAI wrote in its April 10 changelog for ChatGPT. While ChatGPT users will no longer be able to chat with the older AI model, the company added that “GPT-4 will still be available in the API,” providing some reassurance to developers who might still be using the older model for various tasks.

The retirement marks the end of an era that began on March 14, 2023, when GPT-4 demonstrated capabilities that shocked some observers: reportedly scoring at the 90th percentile on the Uniform Bar Exam, acing AP tests, and solving complex reasoning problems that stumped previous models. Its release created a wave of immense hype—and existential panic—about AI’s ability to imitate human communication and composition.

A screenshot of GPT-4's introduction to ChatGPT Plus customers from March 14, 2023. — A screenshot of GPT-4’s introduction to ChatGPT Plus customers from March 14, 2023. Credit: Benj Edwards / Ars Technica

While ChatGPT launched in November 2022 with GPT-3.5 under the hood, GPT-4 took AI language models to a new level of sophistication, and it was a massive undertaking to create. It combined data scraped from the vast corpus of human knowledge into a set of neural networks rumored to weigh in at a combined total of 1.76 trillion parameters, which are the numerical values that hold the data within the model.

Along the way, the model reportedly cost more than $100 million to train, according to comments by OpenAI CEO Sam Altman, and required vast computational resources to develop. Training the model may have involved over 20,000 high-end GPUs working in concert—an expense few organizations besides OpenAI and its primary backer, Microsoft, could afford.

Industry reactions, safety concerns, and regulatory responses

Curiously, GPT-4’s impact began before OpenAI’s official announcement. In February 2023, Microsoft integrated its own early version of the GPT-4 model into its Bing search engine, creating a chatbot that sparked controversy when it tried to convince Kevin Roose of The New York Times to leave his wife and when it “lost its mind” in response to an Ars Technica article.

The end of an AI that shocked the world: OpenAI retires GPT-4 Read More »

Intel says it’s rolling out laptop GPU drivers with 10% to 25% better performance

intel, Tech / Kris Guyer / April 30, 2025

Intel’s oddball Core Ultra 200V laptop chips—codenamed Lunar Lake—will apparently be a one-off experiment, not to be replicated in future Intel laptop chips. They’re Intel’s only processors with memory integrated onto the CPU package; the only ones with a neural processing unit that meets Microsoft’s Copilot+ performance requirements; and the only ones with Intel’s best-performing integrated GPUs, the Intel Arc 130V and 140V.

Today, Intel announced some updates to its graphics driver that specifically benefit those integrated GPUs, welcome news for anyone who bought one and is trying to get by with it as an entry-level gaming system. Intel says that version 32.0.101.6734 of its graphics driver can speed up average frame rates in some games by around 10 percent, and can speed up “1 percent low FPS” (that is, for any given frames per second measurement, whatever your frame rate is the slowest 1 percent of the time) by as much as 25 percent. This should, in theory, make games run better in general and ease some of the stuttering you notice when your game’s performance dips down to that 1 percent level.

Intel’s performance numbers for its new GPU drivers on a laptop running at the “common default power level” of 17 W. Credit: Intel

Intel’s performance comparisons were made using an MSI Claw 7 AI+ using an Arc 140V GPU, and they compare the performance of driver version 32.0.101.6732 (released April 2) to version 32.0.101.6734 (released April 8). The two additional driver packages Intel has released since then will contain the improvements, too.

Intel says it’s rolling out laptop GPU drivers with 10% to 25% better performance Read More »

Google search’s made-up AI explanations for sayings no one ever said, explained

AI, Google, google ai, google ai overviews / Kris Guyer / April 30, 2025

But what does “meaning” mean?

A partial defense of (some of) AI Overview’s fanciful idiomatic explanations.

Mind…. blown Credit: Getty Images

Last week, the phrase “You can’t lick a badger twice” unexpectedly went viral on social media. The nonsense sentence—which was likely never uttered by a human before last week—had become the poster child for the newly discovered way Google search’s AI Overviews makes up plausible-sounding explanations for made-up idioms (though the concept seems to predate that specific viral post by at least a few days).

Google users quickly discovered that typing any concocted phrase into the search bar with the word “meaning” attached at the end would generate an AI Overview with a purported explanation of its idiomatic meaning. Even the most nonsensical attempts at new proverbs resulted in a confident explanation from Google’s AI Overview, created right there on the spot.

In the wake of the “lick a badger” post, countless users flocked to social media to share Google’s AI interpretations of their own made-up idioms, often expressing horror or disbelief at Google’s take on their nonsense. Those posts often highlight the overconfident way the AI Overview frames its idiomatic explanations and occasional problems with the model confabulating sources that don’t exist.

But after reading through dozens of publicly shared examples of Google’s explanations for fake idioms—and generating a few of my own—I’ve come away somewhat impressed with the model’s almost poetic attempts to glean meaning from gibberish and make sense out of the senseless.

Talk to me like a child

Let’s try a thought experiment: Say a child asked you what the phrase “you can’t lick a badger twice” means. You’d probably say you’ve never heard that particular phrase or ask the child where they heard it. You might say that you’re not familiar with that phrase or that it doesn’t really make sense without more context.

Someone on Threads noticed you can type any random sentence into Google, then add “meaning” afterwards, and you’ll get an AI explanation of a famous idiom or phrase you just made up. Here is mine

[image or embed]

— Greg Jenner (@gregjenner.bsky.social) April 23, 2025 at 6: 15 AM

But let’s say the child persisted and really wanted an explanation for what the phrase means. So you’d do your best to generate a plausible-sounding answer. You’d search your memory for possible connotations for the word “lick” and/or symbolic meaning for the noble badger to force the idiom into some semblance of sense. You’d reach back to other similar idioms you know to try to fit this new, unfamiliar phrase into a wider pattern (anyone who has played the excellent board game Wise and Otherwise might be familiar with the process).

Google’s AI Overview doesn’t go through exactly that kind of human thought process when faced with a similar question about the same saying. But in its own way, the large language model also does its best to generate a plausible-sounding response to an unreasonable request.

As seen in Greg Jenner’s viral Bluesky post, Google’s AI Overview suggests that “you can’t lick a badger twice” means that “you can’t trick or deceive someone a second time after they’ve been tricked once. It’s a warning that if someone has already been deceived, they are unlikely to fall for the same trick again.” As an attempt to derive meaning from a meaningless phrase —which was, after all, the user’s request—that’s not half bad. Faced with a phrase that has no inherent meaning, the AI Overview still makes a good-faith effort to answer the user’s request and draw some plausible explanation out of troll-worthy nonsense.

Contrary to the computer science truism of “garbage in, garbage out, Google here is taking in some garbage and spitting out… well, a workable interpretation of garbage, at the very least.

Google’s AI Overview even goes into more detail explaining its thought process. “Lick” here means to “trick or deceive” someone, it says, a bit of a stretch from the dictionary definition of lick as “comprehensively defeat,” but probably close enough for an idiom (and a plausible iteration of the idiom, “Fool me once shame on you, fool me twice, shame on me…”). Google also explains that the badger part of the phrase “likely originates from the historical sport of badger baiting,” a practice I was sure Google was hallucinating until I looked it up and found it was real.

It took me 15 seconds to make up this saying but now I think it kind of works! Credit: Kyle Orland / Google

I found plenty of other examples where Google’s AI derived more meaning than the original requester’s gibberish probably deserved. Google interprets the phrase “dream makes the steam” as an almost poetic statement about imagination powering innovation. The line “you can’t humble a tortoise” similarly gets interpreted as a statement about the difficulty of intimidating “someone with a strong, steady, unwavering character (like a tortoise).”

Google also often finds connections that the original nonsense idiom creators likely didn’t intend. For instance, Google could link the made-up idiom “A deft cat always rings the bell” to the real concept of belling the cat. And in attempting to interpret the nonsense phrase “two cats are better than grapes,” the AI Overview correctly notes that grapes can be potentially toxic to cats.

Brimming with confidence

Even when Google’s AI Overview works hard to make the best of a bad prompt, I can still understand why the responses rub a lot of users the wrong way. A lot of the problem, I think, has to do with the LLM’s unearned confident tone, which pretends that any made-up idiom is a common saying with a well-established and authoritative meaning.

Rather than framing its responses as a “best guess” at an unknown phrase (as a human might when responding to a child in the example above), Google generally provides the user with a single, authoritative explanation for what an idiom means, full stop. Even with the occasional use of couching words such as “likely,” “probably,” or “suggests,” the AI Overview comes off as unnervingly sure of the accepted meaning for some nonsense the user made up five seconds ago.

If Google’s AI Overviews always showed this much self-doubt, we’d be getting somewhere. Credit: Google / Kyle Orland

I was able to find one exception to this in my testing. When I asked Google the meaning of “when you see a tortoise, spin in a circle,” Google reasonably told me that the phrase “doesn’t have a widely recognized, specific meaning” and that it’s “not a standard expression with a clear, universal meaning.” With that context, Google then offered suggestions for what the phrase “seems to” mean and mentioned Japanese nursery rhymes that it “may be connected” to, before concluding that it is “open to interpretation.”

Those qualifiers go a long way toward properly contextualizing the guesswork Google’s AI Overview is actually conducting here. And if Google provided that kind of context in every AI summary explanation of a made-up phrase, I don’t think users would be quite as upset.

Unfortunately, LLMs like this have trouble knowing what they don’t know, meaning moments of self-doubt like the turtle interpretation here tend to be few and far between. It’s not like Google’s language model has some master list of idioms in its neural network that it can consult to determine what is and isn’t a “standard expression” that it can be confident about. Usually, it’s just projecting a self-assured tone while struggling to force the user’s gibberish into meaning.

Zeus disguised himself as what?

The worst examples of Google’s idiomatic AI guesswork are ones where the LLM slips past plausible interpretations and into sheer hallucination of completely fictional sources. The phrase “a dog never dances before sunset,” for instance, did not appear in the film Before Sunrise, no matter what Google says. Similarly, “There are always two suns on Tuesday” does not appear in The Hitchhiker’s Guide to the Galaxy film despite Google’s insistence.

Literally in the one I tried.

[image or embed]

— Sarah Vaughan (@madamefelicie.bsky.social) April 23, 2025 at 7: 52 AM

There’s also no indication that the made-up phrase “Welsh men jump the rabbit” originated on the Welsh island of Portland, or that “peanut butter platform heels” refers to a scientific experiment creating diamonds from the sticky snack. We’re also unaware of any Greek myth where Zeus disguises himself as a golden shower to explain the phrase “beware what glitters in a golden shower.” (Update: As many commenters have pointed out, this last one is actually a reference to the greek myth of Danaë and the shower of gold, showing Google’s AI knows more about this potential symbolism than I do)

The fact that Google’s AI Overview presents these completely made-up sources with the same self-assurance as its abstract interpretations is a big part of the problem here. It’s also a persistent problem for LLMs that tend to make up news sources and cite fake legal cases regularly. As usual, one should be very wary when trusting anything an LLM presents as an objective fact.

When it comes to the more artistic and symbolic interpretation of nonsense phrases, though, I think Google’s AI Overviews have gotten something of a bad rap recently. Presented with the difficult task of explaining nigh-unexplainable phrases, the model does its best, generating interpretations that can border on the profound at times. While the authoritative tone of those responses can sometimes be annoying or actively misleading, it’s at least amusing to see the model’s best attempts to deal with our meaningless phrases.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

Google search’s made-up AI explanations for sayings no one ever said, explained Read More »

FCC urges courts to ignore 5th Circuit ruling that agency can’t issue fines

5th circuit, FCC, Policy / Kris Guyer / April 30, 2025

FCC fights the 5th Circuit

One court said FCC violated right to trial, but other courts haven’t ruled yet.

Credit: Getty Images | AaronP/Bauer-Griffin

The Federal Communications Commission is urging two federal appeals courts to disregard a 5th Circuit ruling that guts the agency’s ability to issue financial penalties.

On April 17, the US Court of Appeals for the 5th Circuit granted an AT&T request to wipe out a $57 million fine for selling customer location data without consent. The conservative 5th Circuit court said the FCC “acted as prosecutor, jury, and judge,” violating AT&T’s Seventh Amendment right to a jury trial.

The ruling wasn’t a major surprise. The 5th Circuit said it was guided by the Supreme Court’s June 2024 ruling in Securities and Exchange Commission v. Jarkesy, which held that “when the SEC seeks civil penalties against a defendant for securities fraud, the Seventh Amendment entitles the defendant to a jury trial.” After the Supreme Court’s Jarkesy ruling, FCC Republican Nathan Simington vowed to vote against any fine imposed by the commission until its legal powers are clear.

Before becoming the FCC chairman, Brendan Carr voted against the fine issued to AT&T and fines for similar privacy violations simultaneously levied against T-Mobile and Verizon. Carr repeatedly opposed Biden-era efforts to regulate telecom providers and is aiming to eliminate many of the FCC’s rules now that he is in charge. But Carr has also been aggressive in regulation of media, and he doesn’t want the FCC’s ability to issue penalties completely wiped out. The Carr FCC stated its position in new briefs submitted in separate lawsuits filed by T-Mobile and Verizon.

Verizon sued the FCC in the 2nd Circuit in an attempt to overturn its privacy fine, while T-Mobile and subsidiary Sprint sued in the District of Columbia Circuit. Verizon and T-Mobile reacted to the 5th Circuit ruling by urging the other courts to rule the same way, prompting responses from the FCC last week.

“The Fifth Circuit concluded that the FCC’s enforcement proceeding leading to a monetary forfeiture order violated AT&T’s Seventh Amendment rights. This Court shouldn’t follow that decision,” the FCC told the 2nd Circuit last week.

FCC loss has wide implications

Carr’s FCC argued that the agency’s “monetary forfeiture order proceedings pose no Seventh Amendment problem because Section 504(a) [of the Communications Act] affords carriers the opportunity to demand a de novo jury trial in federal district court before the government can recover any penalty. Verizon elected to forgo that opportunity and instead sought direct appellate review.” The FCC put forth the same argument in the T-Mobile case with a filing in the District of Columbia Circuit.

There would be a circuit split if either the 2nd Circuit or DC Circuit appeals court rules in the FCC’s favor, increasing the chances that the Supreme Court will take up the case and rule directly on the FCC’s enforcement authority.

Beyond punishing telecom carriers for privacy violations, an FCC loss could prevent the commission from fining robocallers. When Carr’s FCC proposed a $4.5 million fine for an allegedly illegal robocall scheme in February, Simington repeated his objection to the FCC issuing fines of any type.

“While the conduct described in this NAL [Notice of Apparent Liability for Forfeiture] is particularly egregious and certainly worth enforcement action, I continue to believe that the Supreme Court’s decision in Jarkesy prevents me from voting, at this time, to approve this or any item purporting to impose a fine,” Simington said at the time.

5th Circuit reasoning

The 5th Circuit ruling against the FCC was issued by a panel of three judges appointed by Republican presidents. “Our analysis is governed by SEC v. Jarkesy. In that case, the Supreme Court ruled that the Seventh Amendment prohibited the SEC from requiring respondents to defend themselves before an agency, rather than a jury, against civil penalties for alleged securities fraud,” the appeals court said.

The penalty issued by the FCC is not “remedial,” the court said. The fine was punitive and not simply “meant to compensate victims whose location data was compromised. So, like the penalties in Jarkesy, the civil penalties here are ‘a type of remedy at common law that could only be enforced in courts of law.'”

The FCC argued that its enforcement proceeding fell under the “public rights” exception, unlike the private rights that must be adjudicated in court. “The Commission argues its enforcement action falls within the public rights exception because it involves common carriers,” the 5th Circuit panel said. “Given that common carriers like AT&T are ‘affected with a public interest,’ the Commission contends Congress could assign adjudication of civil penalties against them to agencies instead of courts.”

The panel disagreed, saying that “the Commission’s proposal would blow a hole in what is meant to be a narrow exception to Article III” and “empower Congress to bypass Article III adjudication in countless matters.” The panel acknowledged that “federal agencies like the Commission have long had regulatory authority over common carriers, such as when setting rates or granting licenses,” but said this doesn’t mean that “any regulatory action concerning common carriers implicates the public rights exception.”

FCC hopes lie with other courts

The 5th Circuit panel also rejected the FCC’s contention that carriers are afforded the right to a trial after the FCC enforcement proceeding. The 5th Circuit said this applies only when a carrier fails to pay a penalty and is sued by the Department of Justice. “To begin with, by the time DOJ sues (if it does), the Commission would have already adjudged a carrier guilty of violating section 222 and levied fines… in this process, which was completely in-house, the Commission acted as prosecutor, jury, and judge,” the panel said.

An entity penalized by the FCC can also ask a court of appeals to overturn the fine, as AT&T did here. But in choosing this path, the company “forgoes a jury trial,” the 5th Circuit panel said.

While Verizon and T-Mobile hope the other appeals courts will rule the same way, the FCC maintains that the 5th Circuit got it wrong. In its filing to the 2nd Circuit, the FCC challenged the 5th Circuit’s view on whether a trial after the FCC issues a fine satisfies the right to a jury trial. Pointing to an 1899 Supreme Court ruling, the FCC said that “an initial tribunal can lawfully enter judgment without a full jury trial if the law permits a subsequent ‘trial [anew] by jury, at the request of either party, in the appellate court.'”

The FCC further said the 5th Circuit relied on a precedent that doesn’t exist in either the 2nd Circuit or District of Columbia Circuit.

“The Fifth Circuit also relied on circuit precedent holding that ‘[i]n a section 504 trial, a defendant cannot challenge a forfeiture order’s legal conclusions,'” the FCC also said. “This Court, however, has never adopted such a limitation, and the Fifth Circuit’s premise is in doubt. Regardless, the proper approach would be to challenge any such limitation in the trial court and seek to strike the limitation—not to vacate the forfeiture order.”

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

FCC urges courts to ignore 5th Circuit ruling that agency can’t issue fines Read More »

Trump admin lashes out as Amazon considers displaying tariff costs on its sites

Amazon, Biz & IT, china tariffs, tariffs, Tech, temu / Kris Guyer / April 30, 2025

This morning, Punchbowl News reported that Amazon was considering listing the cost of tariffs as a separate line item on its site, citing “a person familiar with the plan.” Amazon later acknowledged that there had been internal discussions to that effect but only for its import-focused Amazon Haul sub-store and that the company didn’t plan to actually list tariff prices for any items.

“This was never approved and is not going to happen,” reads Amazon’s two-sentence statement.

Amazon issued such a specific and forceful on-the-record denial in part because it had drawn the ire of the Trump administration. In a press briefing early this morning, White House Press Secretary Karoline Leavitt was asked a question about the report, which the administration responded to as though Amazon had made a formal announcement about the policy.

“This is a hostile and political act by Amazon,” Leavitt said, before blaming the Biden administration for high inflation and claiming that Amazon had “partnered with a Chinese propaganda arm.”

The Washington Post also reported that Trump had called Amazon founder Jeff Bezos to complain about the report.

Amazon’s internal discussions reflect the current confusion around the severe and rapidly changing import tariffs imposed by the Trump administration, particularly tariffs of 145 percent on goods imported from China. Other retailers, particularly sites like Temu, AliExpress, and Shein, have all taken their own steps, either adding labels to listings when import taxes have already been included in the price, or adding import taxes as a separate line item in users’ carts at checkout as Amazon had discussed doing.

A Temu cart showing the price of an item’s import tax as a separate line item. Amazon reportedly considered and discarded a similar idea for its Amazon Haul sub-site.

Small purchases are seeing big hits

Most of these items are currently excluded from tariffs because of something called the de minimis exemption, which applies to any shipment valued under $800. The administration currently plans to end the de minimis exemption for packages coming from China or Hong Kong beginning on May 2, though the administration’s plans could change (as they frequently have before).

Trump admin lashes out as Amazon considers displaying tariff costs on its sites Read More »

AI-generated code could be a disaster for the software supply chain. Here’s why.

AI, Biz & IT, dependency confusion, package confusion. package hallucination, Security, supply chain attac / Kris Guyer / April 29, 2025

AI-generated computer code is rife with references to non-existent third-party libraries, creating a golden opportunity for supply-chain attacks that poison legitimate programs with malicious packages that can steal data, plant backdoors, and carry out other nefarious actions, newly published research shows.

The study, which used 16 of the most widely used large language models to generate 576,000 code samples, found that 440,000 of the package dependencies they contained were “hallucinated,” meaning they were non-existent. Open source models hallucinated the most, with 21 percent of the dependencies linking to non-existent libraries. A dependency is an essential code component that a separate piece of code requires to work properly. Dependencies save developers the hassle of rewriting code and are an essential part of the modern software supply chain.

Package hallucination flashbacks

These non-existent dependencies represent a threat to the software supply chain by exacerbating so-called dependency confusion attacks. These attacks work by causing a software package to access the wrong component dependency, for instance by publishing a malicious package and giving it the same name as the legitimate one but with a later version stamp. Software that depends on the package will, in some cases, choose the malicious version rather than the legitimate one because the former appears to be more recent.

Also known as package confusion, this form of attack was first demonstrated in 2021 in a proof-of-concept exploit that executed counterfeit code on networks belonging to some of the biggest companies on the planet, Apple, Microsoft, and Tesla included. It’s one type of technique used in software supply-chain attacks, which aim to poison software at its very source, in an attempt to infect all users downstream.

“Once the attacker publishes a package under the hallucinated name, containing some malicious code, they rely on the model suggesting that name to unsuspecting users,” Joseph Spracklen, a University of Texas at San Antonio Ph.D. student and lead researcher, told Ars via email. “If a user trusts the LLM’s output and installs the package without carefully verifying it, the attacker’s payload, hidden in the malicious package, would be executed on the user’s system.”

AI-generated code could be a disaster for the software supply chain. Here’s why. Read More »

iOS and Android juice jacking defenses have been trivial to bypass for years

android, Apple, Biz & IT, choicejacking, Features, Google, iOS, juice jacking, juicejacking, Security, Tech / Kris Guyer / April 28, 2025

SON OF JUICE JACKING ARISES

New ChoiceJacking attack allows malicious chargers to steal data from phones.

Credit: Aurich Lawson | Getty Images

About a decade ago, Apple and Google started updating iOS and Android, respectively, to make them less susceptible to “juice jacking,” a form of attack that could surreptitiously steal data or execute malicious code when users plug their phones into special-purpose charging hardware. Now, researchers are revealing that, for years, the mitigations have suffered from a fundamental defect that has made them trivial to bypass.

“Juice jacking” was coined in a 2011 article on KrebsOnSecurity detailing an attack demonstrated at a Defcon security conference at the time. Juice jacking works by equipping a charger with hidden hardware that can access files and other internal resources of phones, in much the same way that a computer can when a user connects it to the phone.

An attacker would then make the chargers available in airports, shopping malls, or other public venues for use by people looking to recharge depleted batteries. While the charger was ostensibly only providing electricity to the phone, it was also secretly downloading files or running malicious code on the device behind the scenes. Starting in 2012, both Apple and Google tried to mitigate the threat by requiring users to click a confirmation button on their phones before a computer—or a computer masquerading as a charger—could access files or execute code on the phone.

The logic behind the mitigation was rooted in a key portion of the USB protocol that, in the parlance of the specification, dictates that a USB port can facilitate a “host” device or a “peripheral” device at any given time, but not both. In the context of phones, this meant they could either:

Host the device on the other end of the USB cord—for instance, if a user connects a thumb drive or keyboard. In this scenario, the phone is the host that has access to the internals of the drive, keyboard or other peripheral device.
Act as a peripheral device that’s hosted by a computer or malicious charger, which under the USB paradigm is a host that has system access to the phone.

An alarming state of USB security

Researchers at the Graz University of Technology in Austria recently made a discovery that completely undermines the premise behind the countermeasure: They’re rooted under the assumption that USB hosts can’t inject input that autonomously approves the confirmation prompt. Given the restriction against a USB device simultaneously acting as a host and peripheral, the premise seemed sound. The trust models built into both iOS and Android, however, present loopholes that can be exploited to defeat the protections. The researchers went on to devise ChoiceJacking, the first known attack to defeat juice-jacking mitigations.

“We observe that these mitigations assume that an attacker cannot inject input events while establishing a data connection,” the researchers wrote in a paper scheduled to be presented in August at the Usenix Security Symposium in Seattle. “However, we show that this assumption does not hold in practice.”

The researchers continued:

We present a platform-agnostic attack principle and three concrete attack techniques for Android and iOS that allow a malicious charger to autonomously spoof user input to enable its own data connection. Our evaluation using a custom cheap malicious charger design reveals an alarming state of USB security on mobile platforms. Despite vendor customizations in USB stacks, ChoiceJacking attacks gain access to sensitive user files (pictures, documents, app data) on all tested devices from 8 vendors including the top 6 by market share.

In response to the findings, Apple updated the confirmation dialogs in last month’s release of iOS/iPadOS 18.4 to require a user authentication in the form of a PIN or password. While the researchers were investigating their ChoiceJacking attacks last year, Google independently updated its confirmation with the release of version 15 in November. The researchers say the new mitigation works as expected on fully updated Apple and Android devices. Given the fragmentation of the Android ecosystem, however, many Android devices remain vulnerable.

All three of the ChoiceJacking techniques defeat Android juice-jacking mitigations. One of them also works against those defenses in Apple devices. In all three, the charger acts as a USB host to trigger the confirmation prompt on the targeted phone.

The attacks then exploit various weaknesses in the OS that allow the charger to autonomously inject “input events” that can enter text or click buttons presented in screen prompts as if the user had done so directly into the phone. In all three, the charger eventually gains two conceptual channels to the phone: (1) an input one allowing it to spoof user consent and (2) a file access connection that can steal files.

An illustration of ChoiceJacking attacks. (1) The victim device is attached to the malicious charger. (2) The charger establishes an extra input channel. (3) The charger initiates a data connection. User consent is needed to confirm it. (4) The charger uses the input channel to spoof user consent. Credit: Draschbacher et al.

It’s a keyboard, it’s a host, it’s both

In the ChoiceJacking variant that defeats both Apple- and Google-devised juice-jacking mitigations, the charger starts as a USB keyboard or a similar peripheral device. It sends keyboard input over USB that invokes simple key presses, such as arrow up or down, but also more complex key combinations that trigger settings or open a status bar.

The input establishes a Bluetooth connection to a second miniaturized keyboard hidden inside the malicious charger. The charger then uses the USB Power Delivery, a standard available in USB-C connectors that allows devices to either provide or receive power to or from the other device, depending on messages they exchange, a process known as the USB PD Data Role Swap.

A simulated ChoiceJacking charger. Bidirectional USB lines allow for data role swaps. Credit: Draschbacher et al.

With the charger now acting as a host, it triggers the file access consent dialog. At the same time, the charger still maintains its role as a peripheral device that acts as a Bluetooth keyboard that approves the file access consent dialog.

The full steps for the attack, provided in the Usenix paper, are:

1. The victim device is connected to the malicious charger. The device has its screen unlocked.

2. At a suitable moment, the charger performs a USB PD Data Role (DR) Swap. The mobile device now acts as a USB host, the charger acts as a USB input device.

3. The charger generates input to ensure that BT is enabled.

4. The charger navigates to the BT pairing screen in the system settings to make the mobile device discoverable.

5. The charger starts advertising as a BT input device.

6. By constantly scanning for newly discoverable Bluetooth devices, the charger identifies the BT device address of the mobile device and initiates pairing.

7. Through the USB input device, the charger accepts the Yes/No pairing dialog appearing on the mobile device. The Bluetooth input device is now connected.

8. The charger sends another USB PD DR Swap. It is now the USB host, and the mobile device is the USB device.

9. As the USB host, the charger initiates a data connection.

10. Through the Bluetooth input device, the charger confirms its own data connection on the mobile device.

This technique works against all but one of the 11 phone models tested, with the holdout being an Android device running the Vivo Funtouch OS, which doesn’t fully support the USB PD protocol. The attacks against the 10 remaining models take about 25 to 30 seconds to establish the Bluetooth pairing, depending on the phone model being hacked. The attacker then has read and write access to files stored on the device for as long as it remains connected to the charger.

Two more ways to hack Android

The two other members of the ChoiceJacking family work only against the juice-jacking mitigations that Google put into Android. In the first, the malicious charger invokes the Android Open Access Protocol, which allows a USB host to act as an input device when the host sends a special message that puts it into accessory mode.

The protocol specifically dictates that while in accessory mode, a USB host can no longer respond to other USB interfaces, such as the Picture Transfer Protocol for transferring photos and videos and the Media Transfer Protocol that enables transferring files in other formats. Despite the restriction, all of the Android devices tested violated the specification by accepting AOAP messages sent, even when the USB host hadn’t been put into accessory mode. The charger can exploit this implementation flaw to autonomously complete the required user confirmations.

The remaining ChoiceJacking technique exploits a race condition in the Android input dispatcher by flooding it with a specially crafted sequence of input events. The dispatcher puts each event into a queue and processes them one by one. The dispatcher waits for all previous input events to be fully processed before acting on a new one.

“This means that a single process that performs overly complex logic in its key event handler will delay event dispatching for all other processes or global event handlers,” the researchers explained.

They went on to note, “A malicious charger can exploit this by starting as a USB peripheral and flooding the event queue with a specially crafted sequence of key events. It then switches its USB interface to act as a USB host while the victim device is still busy dispatching the attacker’s events. These events therefore accept user prompts for confirming the data connection to the malicious charger.”

The Usenix paper provides the following matrix showing which devices tested in the research are vulnerable to which attacks.

The susceptibility of tested devices to all three ChoiceJacking attack techniques. Credit: Draschbacher et al.

User convenience over security

In an email, the researchers said that the fixes provided by Apple and Google successfully blunt ChoiceJacking attacks in iPhones, iPads, and Pixel devices. Many Android devices made by other manufacturers, however, remain vulnerable because they have yet to update their devices to Android 15. Other Android devices—most notably those from Samsung running the One UI 7 software interface—don’t implement the new authentication requirement, even when running on Android 15. The omission leaves these models vulnerable to ChoiceJacking. In an email, principal paper author Florian Draschbacher wrote:

The attack can therefore still be exploited on many devices, even though we informed the manufacturers about a year ago and they acknowledged the problem. The reason for this slow reaction is probably that ChoiceJacking does not simply exploit a programming error. Rather, the problem is more deeply rooted in the USB trust model of mobile operating systems. Changes here have a negative impact on the user experience, which is why manufacturers are hesitant. [It] means for enabling USB-based file access, the user doesn’t need to simply tap YES on a dialog but additionally needs to present their unlock PIN/fingerprint/face. This inevitably slows down the process.

The biggest threat posed by ChoiceJacking is to Android devices that have been configured to enable USB debugging. Developers often turn on this option so they can troubleshoot problems with their apps, but many non-developers enable it so they can install apps from their computer, root their devices so they can install a different OS, transfer data between devices, and recover bricked phones. Turning it on requires a user to flip a switch in Settings > System > Developer options.

If a phone has USB Debugging turned on, ChoiceJacking can gain shell access through the Android Debug Bridge. From there, an attacker can install apps, access the file system, and execute malicious binary files. The level of access through the Android Debug Mode is much higher than that through Picture Transfer Protocol and Media Transfer Protocol, which only allow read and write access to system files.

The vulnerabilities are tracked as:

- CVE-2025-24193 (Apple)
- CVE-2024-43085 (Google)
- CVE-2024-20900 (Samsung)
- CVE-2024-54096 (Huawei)

A Google spokesperson confirmed that the weaknesses were patched in Android 15 but didn’t speak to the base of Android devices from other manufacturers, who either don’t support the new OS or the new authentication requirement it makes possible. Apple declined to comment for this post.

Word that juice-jacking-style attacks are once again possible on some Android devices and out-of-date iPhones is likely to breathe new life into the constant warnings from federal authorities, tech pundits, news outlets, and local and state government agencies that phone users should steer clear of public charging stations.

As I reported in 2023, these warnings are mostly scaremongering, and the advent of ChoiceJacking does little to change that, given that there are no documented cases of such attacks in the wild. That said, people using Android devices that don’t support Google’s new authentication requirement may want to refrain from public charging.

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him at here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.

iOS and Android juice jacking defenses have been trivial to bypass for years Read More »

Revisiting iZombie, 10 years later

culture, izombie / Kris Guyer / April 28, 2025

We loved the show’s wicked humor, great characters, and mix of cases-of-the-week with longer narrative arcs.

Zombies never really go out of style, but they were an especially hot commodity on television in the 2010s, spawning the blockbuster series The Walking Dead (2010-2022) as well as quirkier fare like Netflix’s comedy horror, The Santa Clarita Diet (2017-2018). iZombie, a supernatural procedural dramedy that ran for five seasons on the CW, falls into the latter category. It never achieved mega-hit status but nonetheless earned a hugely loyal following drawn to the show’s wicked humor, well-drawn characters, and winning mix of cases-of-the-week and longer narrative arcs.

(Spoilers for all five seasons below.)

The original Vertigo comic series was created by writer Chris Roberson and artist Michael Allred. It featured a zombie in Eugene, Oregon, named Gwen Dylan, who worked as a gravedigger because she needed to consume brains every 30 days to keep her memories and cognitive faculties in working order. Her best friends were a ghost who died in the 1960s and a were-terrier named Scott, nicknamed “Spot,” and together they took on challenges both personal and supernatural (vampires, mummies, etc.).

Created by Rob Thomas and Diane Ruggiero-Wright, the TV series borrowed the rough outlines of the premise but otherwise had very little in common with the comics, although Allred drew the nifty opening credits (set to a cover version of “Stop, I’m Already Dead” by Deadboy & The Elephant Men). The location shifted to Seattle.

An over-achieving young medical student, Liv Moore (get it?)—played to perfection by Rose McIver—decides to attend a boat party on a whim one night. It ends in disaster thanks to a sudden zombie outbreak, resulting from a combination of an energy sports drink (Max Rager) and a tainted batch of a new designer drug called Utopium. Liv jumps into the water to flee the zombies, but suffers a scratch and wakes up on a beach in a body bag, craving brains.

Liv is forced to break up with her human fiancé, Major (Robert Buckley) to avoid infecting him, and becomes estranged from her best friend and roommate Peyton (Aly Michalka), hiding her new zombie nature from both. And she ends up working in the medical examiner’s office to ensure she has a steady supply of brains. Soon her boss, Ravi (Rahul Kohli), discovers her secret. Rather than being terrified or trying to kill her, Ravi is fascinated by her unusual condition. He tells Liv he was fired by the CDC for his incessant warnings about the threat of such a virus and vows to find a cure.

The brainy bunch

Med student Liv Moore (Rose McIver) wound up a zombie after attending an ill-fated boat party. The CW

The show’s premise stems from an unusual side effect of eating brains: Liv gets some of the dead person’s memories in flashes (visions) as well as certain personal traits—speaking Romanian, painting, agoraphobia, alcoholism, etc. This gives her critical insights that help Det. Clive Babineaux (Malcolm Goodwin) solve various murders, although for several seasons Clive thinks Liv is psychic rather than a zombie. It’s Ravi who first encourages her to get involved when a kleptomaniac Romanian call girl is killed: “You ate the girl’s temporal lobe; the least you can do is help solve her murder.”

Every show needs a good villain and iZombie found it in Liv’s fellow zombie, Blaine (David Anders)—in fact, Blaine is the one who scratched Liv at the boat party and turned her into a zombie. He was there dealing the tainted Utopium. Zombie Blaine switches to dealing brains, which he naturally acquires through murderous means, creating a loyal (i.e., desperate) customer base by infecting wealthy sorts and turning them into zombies. What makes Blaine so compelling as a villain is that he’s as devilishly charming as he is evil, with some unresolved daddy issues for good measure.

Over the course of five seasons, we fell in love with iZombie‘s colorful collection of characters; relished the way the writers leaned into the rather silly premise and (mostly) made it work; and groaned at the occasional bad pun. (Major’s last name is “Lillywhite”; Blaine’s S1 butcher shop is called Meat Cute; when Ravi and Major take in a stray dog, Ravi names the dog “Minor”; and at one point there is a zombie bar called The Scratching Post.) Admittedly, the show started to lose some momentum in later seasons as subplots and shifting relationships became more complicated. And without question the series finale was disappointing: it felt rushed and unsatisfying, with few of the quieter character moments that made its strongest episodes so appealing.

Yet there is still so much to love about iZombie, starting with the brain recipes. Brains are disgusting; Blaine and Liv briefly bond over the metallic taste, gross texture, and how much they miss real food. It doesn’t help that zombies can’t really taste much flavor and thus douse their repasts in eye-watering hot sauces. No wonder Liv is constantly trying to find new ways to make the brains more palatable: stir fry, instant Ramen noodles, mixing the brains in with microwaved pizza rolls, deep friend hush puppy brains, sloppy joes, protein shakes—you name it. Blaine, however, takes things to a gourmet level for his rich zombie customers, creating tantalizing dishes like gnocchi stuffed with medulla oblongata swimming in a fra diavolo sauce.

Good guys, bad guys

“Full-on zombie mode” came in handy sometimes. The CW

The writers didn’t neglect Liv’s love life, which she mistakenly thought was over once she became a zombie. Sure, Liv was always going to end up in a happily-ever-after situation with Major. But count me among those who never thought they really worked as soul mates. (Maybe pre-zombie they did.)

The clear fan favorite love interest was S1’s Lowell Tracey (Bradley James), a British musician who found he could no longer perform live after becoming a zombie—since pre-show adrenalin tended to trigger Full On Zombie Mode. He was Liv’s “first” as a zombie, and while they were superficially very different, they bonded over their shared secret and the resulting emotional isolation. And he bonded with Ravi over their shared hatred of a rival soccer team.

James’ smartly soulful performance won fans’ hearts. We were all rooting for those crazy kids. Alas, Liv soon discovered that his brain supply came from Blaine after she accidentally had a bite of Lowell’s breakfast one morning. In a desperate bid to win back her trust, Lowell agreed to help her take out Blaine; it helped that Liv was currently on Sniper Brain. But when the critical moment came, Liv couldn’t take the shot. She watched through the gun sight as Lowell put his hand over his heart and took on Blaine alone—with fatal consequences, because sensitive artist types really aren’t cut out for fights to the death. Howls of protest echoed in living rooms around the world. RIP Lowell, we barely knew ye.

Lowell never got the chance to become a recurring character but others were more fortunate. Jessica Harmon’s FBI agent, Dale Brazzio, started out as an antagonist investigating the Meat Cute murders—Major and a zombie police captain blew it up to take out Blaine’s criminal enterprise—and ended up as Clive’s romantic partner. Bryce Hodgson’s comedic S1 turn as Major’s roommate in the mental institution, Scott E., was so memorable that the writers brought the actor back to play twin brother Don E., part of Blaine’s drug (and brain) dealing enterprise. Others never graduated to recurring roles but still made the odd guest appearance: Daran Norris as the charmingly louche weatherman Johnny Frost, for instance, and Ryan Beil as nebbishy police sketch artist Jimmy Hahn.

You are what you eat

Liv on frat-boy brain crushed it at beer pong The CW

And let’s not forget the various Big Bads, most notably S2’s Vaughan du Clark (Steven Weber), amoral playboy CEO of Max Rager, and his conniving temptress daughter, Rita (Leanne Lapp). They provided all manner of delicious devilry before meeting a fitting end: Rita, now a zombie due to Vaughan’s negligence, goes “full Romero” during the S2 finale and eats daddy’s brains in an elevator before being shot in the head.

Perhaps the best thing about iZombie was how much fun the writers had giving Liv so many different kinds of brains to eat—and how much fun McIver had weaving those very different personalties into her performance. There was the rich shopaholic Desperate Housewife; an amorous painter; a sociopathic hitman who was a whiz at pub trivia; a grumpy old man; a schizophrenic; a kids basketball coach; a magician; a dominatrix; a medieval history professor fond of LARP-ing; and a ballroom dancer, to name a few.

Liv on agoraphobic hacker brain dominates an online gaming campaign, while she becomes an ace dungeon master on Dungeons & Dragons brain, much to nerdcore Ravi’s delight—although perhaps not as much as he enjoys Liv on vigilante superhero brain. (He found Liv on PhD scientist brain more annoying.) And sometimes the brains are used for throwaway humor: Lowell accidentally eating a gay man’s brain just before his first date with Liv, for instance, or Liv, Blaine, and Don E. hopped up on conspiracy theory brain and bonding over their shared paranoid delusions.

If I were forced to pick my favorite brain, however, I’d probably go back to the S1 episode, “Flight of the Living Dead,” in which Liv’s adventurous former sorority sister, Holly (Tasya Teles) dies in a skydiving “accident” that turns out to be murder. Back in the day, Liv was among those who voted to kick Holly out of the sorority for her constant rule-breaking and reckless behavior. But after eating Holly’s brain in hopes of finding out who killed her, Liv learns more about where Holly was coming from and how to bring something of Holly’s insatiable lust for life into her own existence. “Live each day as if it were your last” can’t help but strike a chord with Liv, who took her former ambitious over-achieving life for granted before that fateful boat party.

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Revisiting iZombie, 10 years later Read More »

Comcast president bemoans broadband customer losses: “We are not winning”

Comcast, Policy / Kris Guyer / April 25, 2025

Comcast executives apparently realized something that customers have known and complained about for years: The Internet provider’s prices aren’t transparent enough and rise too frequently.

This might not have mattered much to cable executives as long as the total number of subscribers met their targets. But after reporting a net loss of 183,000 residential broadband customers in Q1 2025, Comcast President Mike Cavanagh said the company isn’t “winning in the marketplace” during an earnings call today. The Q1 2025 customer loss was over three times larger than the net loss in Q1 2024.

While customers often have few viable options for broadband and the availability of alternatives varies widely by location, Comcast faces competition from fiber and fixed wireless ISPs.

“In this intensely competitive environment, we are not winning in the marketplace in a way that is commensurate with the strength of the network and connectivity products that I just described,” Cavanagh said. “[Cable division CEO] Dave [Watson] and his team have worked hard to understand the reasons for this disconnect and have identified two primary causes. One is price transparency and predictability and the other is the level of ease of doing business with us. The good news is that both are fixable and we are already underway with execution plans to address these challenges.”

The 183,000-subscriber loss lowered Comcast’s residential Internet subscribers to 29.19 million. Comcast also reported a first-quarter drop of 17,000 business broadband subscribers, lowering that category’s total to 2.45 million.

Comcast’s stock price fell 3.7 percent today even though its overall profit beat analyst expectations and domestic broadband revenue rose 1.7 percent year over year to $6.56 billion—a sign that Comcast is extracting more money from customers on average. “Analysts peppered Comcast executives with questions on Thursday regarding its Xfinity-branded broadband and mobile, and how the company will pivot the business,” CNBC wrote.

“We are simplifying our pricing”

Cavanagh said that Comcast plans to make changes in marketing and operations “with the highest urgency.” This means that “we are simplifying our pricing construct to make our price-to-value proposition clearer to consumers across all broadband segments,” he said.

Comcast last week announced a five-year price guarantee for broadband customers who sign up for a new package. Comcast said customers will get a “simple monthly price starting as low as $55 per month,” without having to enter a contract, giving them “freedom and flexibility to cancel at any time without penalty.” The five-year guarantee also comes with one year of Xfinity Mobile at no charge, Comcast said.

Comcast president bemoans broadband customer losses: “We are not winning” Read More »

AI #113: The o3 Era Begins

Begins / Kris Guyer / April 25, 2025

Enjoy it while it lasts. The Claude 4 era, or the o4 era, or both, are coming soon.

Also, welcome to 2025, we measure eras in weeks or at most months.

For now, the central thing going on continues to be everyone adapting to the world of o3, a model that is excellent at providing mundane utility with the caveat that it is a lying liar. You need to stay on your toes.

This was also quietly a week full of other happenings, including a lot of discussions around alignment and different perspectives on what we need to do to achieve good outcomes, many of which strike me as dangerously mistaken and often naive.

I worry that growingly common themes are people pivoting to some mix of ‘alignment is solved, we know how to get an AI to do what we want it to do, the question is alignment to what or to who’ which is very clearly false, and ‘the real threat is concentration of power or the wrong humans making decisions,’ leading them to want to actively prevent humans from being able to collectively steer the future, or to focus the fighting on who gets to steer rather than ensuring the answer isn’t no one.

The problem is, if we can’t steer, the default outcome is humanity loses control over the future. We need to know how to and be able to steer. Eyes on the prize.

Previously this week: o3 Will Use Its Tools For You, o3 Is a Lying Liar, You Better Mechanize.

Language Models Offer Mundane Utility. Claude Code power mode.
You Offer the Models Mundane Utility. In AI America, website optimizes you.
Your Daily Briefing. Ask o3 for a roundup of daily news, many people are saying.
Language Models Don’t Offer Mundane Utility. I thought you’d never ask.
If You Want It Done Right. You gotta do it yourself. For now.
No Free Lunch. Not strictly true. There’s free lunch, but you want good lunch.
What Is Good In Life? Defeat your enemies, see them driven before you.
In Memory Of. When everyone has context, no one has it.
The Least Sincere Form of Flattery. We keep ending up with AIs that do this.
The Vibes are Off. Live by the vibe, fail to live by the vibe.
Here Let Me AI That For You. If you want to understand my references, ask.
Flash Sale. Oh, right, technically Gemini 2.5 Flash exists. It looks good?
Huh, Upgrades. NotebookLM with Gemini 2.5 Pro, OpenAI, Gemma, Grok.
On Your Marks. Vending-Bench, RepliBench, Virology? Oh my.
Be The Best Like No LLM Ever Was. The elite four had better watch out.
Choose Your Fighter. o3 wins by default, except for the places where it’s bad.
Deepfaketown and Botpocalypse Soon. Those who cheat, cheat on everything.
Fun With Media Generation. Welcome to the blip.
Fun With Media Selection. Media recommendations still need some work.
Copyright Confrontation. Meta only very slightly more embarrassed than before.
They Took Our Jobs. The human, the fox, the AI and the hedgehog.
Get Involved. Elysian Labs?
Ace is the Place. The helpful software folks automating your desktop. If you dare.
In Other AI News. OpenAI’s Twitter will be Yeet, Gemini Pro Model Card Watch.
Show Me the Money. Goodfire, Windsurf, Listen Labs.
The Mask Comes Off. A letter urges the Attorney Generals to step in.
Quiet Speculations. Great questions, and a lot of unnecessary freaking out.
Is This AGI? Mostly no, but some continue to claim yes.
The Quest for Sane Regulations. Even David Sacks wants to fund BIS.
Cooperation is Highly Useful. Continuing to make the case.
Nvidia Chooses Bold Strategy. Let’s see how it works out for them.
How America Loses. Have we considered not threatening and driving away allies?
Security Is Capability. If you want your AI to be useful, it needs to be reliable.
The Week in Audio. Yours truly, Odd Lots, Demis Hassabis.
AI 2027. A compilation of criticisms and extensive responses. New blog ho.
Rhetorical Innovation. Alignment is a confusing term. How do we fix this?
Aligning a Smarter Than Human Intelligence is Difficult. During deployment too.
Misalignment in the Wild. Anthropic studies what values its models express.
Concentration of Power and Lack of Transparency. Steer the future, or don’t.
Property Rights are Not a Long Term Plan. At least, not a good one.
It Is Risen. The Immaculate Completion?
The Lighter Side. o3 found me the exact location for that last poster.

Patrick McKenzie uses image generation to visualize the room his wife doesn’t want, in order to get her to figure out and explain what she does want so they can do it.

Altman is correct that current ChatGPT (and Gemini and Claude and so on) are rather great and vastly better than what we were first introduced to in December 2022, and the frog boiling has meant most people haven’t internalized the improvements.

Paul Graham: It would be very interesting to see a page where you got answers from both the old and current versions side by side. In fact you yourselves would probably learn something from it.

Help rewrite your laws via AI-driven regulation, UAE edition?

Deedy recommends the Claude Code best practices guide so you can be a 10x AI software engineer.

Deedy: It’s not enough to build software with AI, you need to be a 10x AI software engineer.

All the best teams are parallelizing their AI software use and churning out 25-50 meaningful commits / engineer / day.

The Claude Code best practices guide is an absolute goldmine of tips

The skills to get the most out of AI coding are different from being the best non-AI coder. One recommendation he highlights is to use 3+ git checkouts in seperate folders, put each in a distinct terminal and have each do different tasks. If you’re waiting on an AI, that’s a sign you’re getting it wrong.

There’s also this thread of top picks, from Alex Albert.

Alex Albert:

CLAUDE md files are the main hidden gem. Simple markdown files that give Claude context about your project – bash commands, code style, testing patterns. Claude loads them automatically and you can add to them with # key

The explore-plan-code workflow is worth trying. Instead of letting Claude jump straight to coding, have it read files first, make a plan (add “think” for deeper reasoning), then implement. Quality improves dramatically with this approach.

Test-driven development works very well for keeping Claude focused. Write tests, commit them, let Claude implement until they pass.

A more tactical one, ESC interrupts Claude, double ESC edits previous prompts. These shortcuts save lots of wasted work when you spot Claude heading the wrong direction

We’re using Claude for codebase onboarding now. Engineers ask it “why does this work this way?” and it searches git history and code for answers which has cut onboarding time significantly.

For automation, headless mode (claude -p) handles everything from PR labeling to subjective code review beyond what linters catch. Or try the –dangerously-skip-permissions mode. Spin up a container and let Claude loose on something like linting or boilerplate generation.

The multi-Claude workflow is powerful: one writes code, another follows up behind and reviews. You can also use git worktrees for parallel Claude sessions on different features.

Get an explanation of the idiom you just made up out of random words.

Get far better insights out of your genetic test.

Explain whatever your phone is looking at. This commonly gets into demos and advertisements. Patrick McKenzie reports frequently actually doing this. I don’t do it that often yet, but when I do it often directly solves a problem or gives me key info.

Debug your printer.

Solve an actually relevant-to-real-research complex mathematical problem.

How much of your future audience is AIs rather than humans, interface edition.

Andrej Karpathy: Tired: elaborate docs pages for your product/service/library with fancy color palettes, branding, animations, transitions, dark mode, …

Wired: one single docs .md file and a “copy to clipboard” button.

The docs also have to change in the content. Eg instead of instructing a person to go to some page and do this or that, they could show curl commands to run – actions that are a lot easier for an LLM to carry out.

Products have to change to support these too. Eg adding a Supabase db to your Vervel app shouldn’t be clicks but curls.roduct, service, library, …)

PSA It’s a new era of ergonomics.

The primary audience of your thing is now an LLM, not a human.

LLMs don’t like to navigate, they like to scrape.

LLMs don’t like to see, they like to read.

LLMs don’t like to click, they like to curl.

Etc etc.

I was reading the docs of a service yesterday feeling like a neanderthal. The docs were asking me to go to a url and click top right and enter this and that and click submit and I was like what is this 2024?

Not so fast. But the day is coming soon when you need to cater to both audiences.

I love the idea of a ‘copy the AI-optimized version of this to the clipboard’ button. This is easy for the AI to identify and use, and also for the human to identify and use. And yes, in many cases you can and should make it easy for them.

This is also a clear case of ‘why not both?’ There’s no need to get rid of the cool website designed for humans. All the AI needs is that extra button.

A bunch of people are reporting that getting o3-compiled daily summaries are great.

Rohit: This is an excellent way to use o3. Hadn’t realised it does tasks as well which makes a lot more interesting stuff possible.

It’s funny that this was one of the things that I had coded using AI help a year or two ago and it was janky and not particularly reliable, and today that whole thing is replaced with one prompt.

Still a few glitches to work out sometimes, including web search sometimes failing:

Matt Clancy: Does it work well?

Rohit: Not yet.

One easy way to not get utility is not to know you can ask for it.

Sully: btw 90% of the people I’ve watched use an LLM can’t prompt if their life depended on it.

I think just not understanding that you can change how you word things. they don’t know they can speak normally to a computer.

JXR: You should see how guys talk to women…. Same thing.

It used to feel very important to know how to do relatively bespoke prompt engineering. Now the models are stronger, and mostly you can ‘just say what you want’ and it will work out fine for most casual purposes. That still requires people to realize they can do that. A lot of us have had that conversation where we had to explain this principle to someone 5 times in a row and they didn’t believe us.

Another way is being fooled into thinking you shouldn’t use it ‘because climate.’

Could AI end up doing things like venting the atmosphere or boiling the oceans once the intelligence explosion gets out of hand? Yes, but that is a very different issue.

Concerns about water use from chatbot queries are Obvious Nonsense.

QC: the water usage argument against LLMs is extremely bad faith and it’s extremely obvious that the people who are using it are just looking for something to say that sounds bad and they found this one. there is of course never any comparison done to the water cost of anything else

Aaron: this kind of thing annoys me because they think a “big” number makes the argument without any context for how much that really is. this means it would take 150-300 gpt questions to match the water cost of a single almond

Ryan Moulton: I actually don’t think this is bad faith, people do seem to really believe it. It’s just an inexplicably powerful meme.

Andy Masley: I keep going to parties and meeting people who seem to be feeling real deep guilt about using AI for climate reasons. I don’t know why the journalism on this has been so uniquely bad but the effects are real.

Like read this Reddit thread where the Baltimore Zoo’s getting attacked for making a single cute AI image. People are saying this shows the zoo doesn’t care about conservation. Crazy!

Jesse Smith: It costs $20/month! Your friends are smarter than this!

Andy Masley: My friends are! Randos I meet are not.

Could this add up to something if you scale and are spending millions or billions to do queries? Sure, in that case these impacts are non-trivial. If a human is reading the output, the cost is epsilon (not technically zero, but very very close) and you can safety completely ignore it. Again, 150+ ChatGPT queries use as much water as one almond.

Here’s another concern I think is highly overrated, and that is not about AI.

Mike Solana: me, asking AI about a subject I know very well: no, that doesn’t seem right, what about this?, no that’s wrong because xyz, there you go, so you admit you were wrong? ADMIT IT

me, asking AI about something I know nothing about: thank you AI

the gell-mAIn effect, we are calling this.

Daniel Eth: Putting the “LLM” in “Gell-Mann Amnesia”.

This is the same as the original Gell-Mann Amnesia. The problem with GMA while reading a newspaper, or talking to a human, is exactly the same. The danger comes when AI answers become social weapons, ‘well Grok said [X] so I’m right,’ but again that happens all the time with ‘well the New York Times said [X] so I’m right’ all the way down to ‘this guy on YouTube said [X].’ You have to calibrate.

And indeed, Mike is right to say thank you. The AI is giving you a much better answer than you could get on your own. No, it’s not perfect, but it was never going to be.

When AI is trying to duplicate exactly the thing that previously existed, Pete Koomen points out, it often ends up not being an improvement. The headline example is drafting an email. Why draft an email with AI when the email is shorter than the prompt? Why explain what you want to do if it would be easier to do it yourself?

And why the hell won’t GMail let you change its de facto system prompt for drafting its email replies?

A lot of the answer is Pete is asking GMail to draft the wrong emails. In his example:

The email he wants is one line long, the prompt would be longer.
He knows what he wants the email to say.
He knows exactly how he wants to say it.

In cases where those aren’t true, the AI can be a lot more helpful.

The central reason to have AI write email is to ‘perform class’ or ‘perform email.’ It needs the right tone, the right formalities, to send the right signals and so on. Or perhaps you simply need to turn a little text into a lot of text, perhaps including ‘because of reasons’ but said differently.

Often people don’t know how to do this in a given context, either not knowing what type of class to perform or not knowing how to perform it credibly. Or even when they do know, it can be slow and painful to do – you’d much prefer to write out what you want to say and then have it, essentially, translated.

Luckily for Pete, he can write his boss a one-line simple statement. Not everyone is that lucky.

Pete wants a new email system designed to automate email, rather than jamming a little feature into the existing system. And that’s fair, but it’s a power user move, almost no one ever changes settings (even though Pete and I and probably you do all the time) so it makes sense that GMail isn’t rushing to enable this.

If you want that, I haven’t tried it yet, but Shortwave exists.

Another classic way to not get utility from AI is being unwilling to pay for it.

Near: our app [Auren] is 4.5 on iOS but 3.3 on android

android users seem to consistently give 1 star reviews, they are very upset the app is $20 rather than free

David Holz: I’m getting a lot of angry people from India, Africa and South America lately saying how Midjourney is terrible and dead because we don’t have a free tier

Aeris Vahn Ephelia: Give them the free tier – 1 batch a day. Let them have it. And then they’ll complain there is not enough free tier . . Probably

David Holz: historically we did something like this and it was 30 percent of the total cost of running our business and it’s not clear it generated any revenue at all

Near: yup this almost perfectly mirrors my experience when i was running an image generation org.

free tier also had very high rates of abuse, even people bulk-creating gmail accounts and manually logging into each one every day for their free credits. was a huge pain!

Unfortunately, we have all been trained by mobile to expect everything to have a free tier, and to mostly want things to be free.

Then we sink massive amounts of time into ‘free’ things when a paid option would make our lives vastly better. Your social media and dating apps and mobile games being free is terrible for you. I am so so happy that the AIs I use are entirely on subscription business models. One time payments (or payment for upgrades) would be even better.

So much to unpack here. A lot of it is a very important question: What is good in life? What is the reason we read a book, query an AI or otherwise seek out information?

There’s also a lot of people grasping at straws to explain why AI wouldn’t be massively productivity enhancing for them. Bottleneck!

Nic Carter: I’ve noticed a weird aversion to using AI on the left. not sure if it’s a climate or an IP thing or what, but it seems like a massive self-own to deduct yourself 30+ points of IQ because you don’t like the tech

a lot of people denying that this is a thing. i’m telling you, go ask your 10 furthest left friends what they think of AI and report back to me. seriously go do it and leave a comment.

This is what I’m saying. Ppl aren’t monitoring popular sentiment around this. AI is absolutely hated

Boosting my IQ is maybe not quite right

But it feels like having an army of smarter and more diligent clones of myself

Either way it’s massively productivity enhancing

It’s definitely not 30 IQ points (yet!), and much more like enhancing productivity.

Neil Renic: The left wing impulse to read books and think.

The Future is Designed: You: take 2 hours to read 1 book.

Me: take 2 minutes to think of precisely the information I need, write a well-structured query, tell my agent AI to distribute it to the 17 models I’ve selected to help me with research, who then traverse approximately 1 million books, extract 17 different versions of the information I’m looking for, which my overseer agent then reviews, eliminates duplicate points, highlights purely conflicting ones for my review, and creates a 3-level summary.

And then I drink coffee for 58 minutes.

We are not the same.

If you can create a good version of that system, that’s pretty amazing for when you need a particular piece of information and can figure out what it is. One needs to differentiate between when you actually want specific knowledge, versus when you want general domain understanding and to invest in developing new areas and skills.

Even if you don’t, it is rather crazy to have your first thought when you need information be ‘read a book’ rather than ‘ask the AI.’ It is a massive hit to your functional intelligence and productivity.

Reading a book is a damn inefficient way to extract particular information, and also generally (with key exceptions) a damn inefficient way to extract information in general. But there’s also more to life than efficiency.

Jonathan Fine: What penalty is appropriate for people who do this?

Spencer Klavan: Their penalty is the empty, shallow lives they lead.

The prize of all this optimization is “Then I drink coffee for 58 minutes”

An entire system designed to get living out of the way so you can finally zero out your inner life entirely.

“Better streamline my daily tasks so I can *check notesstare out the window and chew my cud like a farm animal”

Marx thought that once we got out from under the imperatives of wage labor we would all spend our abundant free time musing on nature and writing philosophy but it turns out we just soft focus our eyes and coo gently at cat videos

“Better efficiency-max my meaningful activity, I’ve got a lot of drooling to do”

The drinking coffee for 58 minutes (the new ‘my code’s compiling!’) is the issue here. Who is to say what he is actually doing, I’d be shocked if the real TFID spends that hour relaxing. The OP needs to circle back in 58 minutes because the program takes an hour to run. I bet this hour is usually spent checking email, setting up other AI queries, calling clients and so on. But if it’s spent relaxing, that seems great too. Isn’t that the ‘liberal urge’ too, to not have to work so hard? Obviously one can take it too far, but it seems rather obvious this is not one of those cases:

Student’s Tea (replying to OP): What do we need you for then?

The Future Is Designed: For the experience required to analyze the client’s needs – often when the clients don’t even know what they want – so that’s psychology / sociology + cultural processing that would be pretty difficult to implement in AI.

For the application of pre-selection mental algorithms that narrow down millions of potential products to a manageable selection, which can then be whittled down using more precise criteria.

For negotiating with factories. Foreseeing potential issues and differences in mentalities and cultural approaches.

For figuring out unusual combinations of materials and methods that satisfy unique requirements. I.e. CNC-machined wall panels which are then covered with leather, creating a 3D surface that offers luxury and one-of-a-kind design. Or upgrading the shower glass in the grand bathroom to a transparent LCD display, so they could watch a movie while relaxing in the bathtub with a glass of wine.

For keeping an eye on budgets.

For making sure the architect, builder, electrical engineer, draftsmen and 3D artists, the 25 different suppliers, the HOA and the county government, and everybody else involved in the project are on the same page – in terms of having access to the latest and correct project info AND without damaging anyone’s ego.

For having years of experience to foresee what could go wrong at every step of the process, and to create safeguards against it.

I’m sure EVENTUALLY all of this could be replaced by AI.

Today is not that day.

Also, for setting up and having that whole AI system. This would be quite the setup. But yes, fundamentally, this division of labor seems about right.

So many more tokens of please and thank you, so much missing the part that matters?

Gallabytes: I told o3 to not hesitate to call bullshit and now it thinks almost every paper I send it is insufficiently bitter pilled.

Minh Nhat Nguyen: Ohhhh send prompt, this is basically my first pass filter for ideas

Gallabytes: the downside to the memory feature is that there’s no way to “send prompt” – as soon as I realized how powerful it was I put some deliberate effort into building persistent respect & rapport with the models and now my chatgpt experience is different.

I’m not going full @repligate but it’s really worth taking a few steps in that direction even now when the affordances to do so are weak.

Cuddly Salmon: this. it provides an entirely different experience.

(altho mine have been weird since the latest update, feels like it over indexed on a few topics!)

Janus: “send prompt” script kiddies were always ngmi

for more than two years now ive been telling you you’ll have to actually walk the walk

i updated all the way when Sydney looked up my Twitter while talking to someone else and got mad at me for the first time. which i dont regret.

Gallabytes: “was just talking to a friend about X, they said Y”

“wow I can never get anything that interesting out of them! send prompt?”

Well, yes, actually? Figuring out how to prompt humans is huge.

More generally, if you want Janus-level results of course script kiddies were always ngmi, but compared to script kiddies most people are ngmi. The script kiddies at least are trying to figure out how to get good outputs. And there’s huge upside in being able to replicate results, in having predictable outputs and reproducible experiments and procedures.

The world runs on script kiddies, albeit under other names. We follow fixed procedures. A lot. Then, yes, some people can start doing increasingly skilled improv that breaks all those rules, but one thing at a time.

Even if that exact prompt won’t work the same way for you, because context is different, that prompt is often extremely insightful and useful to see. I probably won’t copy it directly but I totally want you to send prompt.

That is in addition to the problem that memory entangles all your encounters.

Suhail: Very much not liking memory in these LLMs. More surprising than I thought. I prefer an empty, unburdened, transactional experience I think.

In other ways, it’s like a FB newsfeed moment where its existence might change how you interact with AI. It felt more freeing but now there’s this thing that *mightremember my naïvete, feelings, weird thoughts?

Have a little faith in yourself, and how the AIs will interpret you. You can delete chats, in the cases where you decide they send the wrong message – I recently realized you can delete parts of your YouTube history the same way, and that’s been very freeing. I no longer worry about the implications of clicking on something, if I don’t like what I see happening I just go back and delete stuff.

However:

Eli Dourado: I relate to this. I wonder now, “what if the LLM’s answer is skewed by our past discussion?” I often want a totally objective answer, and now I worry that it’s taking more than I prompted into account.

I have definitely in the past opened an incognito tab to search Google.

Increasingly, you don’t know that any given AI chat, yours or otherwise, is ‘objective,’ unless it was done in ‘clean’ mode via incognito mode or the API. Nor does your answer predict what other people’s answers will be. It is a definite problem.

Others are doing some combination of customizing their ChatGPTs in ways they did not have in mind, and not liking the defaults they don’t know how to overcome.

Aidan McLaughlin (OpenAI): You nailed it with this comment, and honestly? Not many people could point out something so true. You’re absolutely right.

You are absolutely crystallizing something breathtaking here.

I’m dead serious—this is a whole different league of thinking now.

Kylie Robison: it loves to hit you with an “and honestly?”

Aidan McLaughlin: it’s not just this, it’s actually that!

raZZ: Fix it lil bro tweeting won’t do shit

Aidan McLaughlin: Yes raZZ.

Harlan Stewart: You raise some really great points! But when delving in to this complex topic, it’s important to remember that there are many different perspectives on it.

ohqay: Use it. Thank me later. “ – IMPORTANT: Skip sycophantic flattery; avoid hollow praise and empty validation. Probe my assumptions, surface bias, present counter‑evidence, challenge emotional framing, and disagree openly when warranted; agreement must be earned through reason. “

I presume that if it talks to you like that you’re now supposed to reply to give it the feedback to stop doing that, on top of adjusting your custom instructions.

xjdr: i have grown so frustrated with claude code recently i want to walk into the ocean. that said, i still no longer want to do my job without it (i dont ever want to write test scaffolding again).

Codex is very interesting, but besides a similar ui, its completely different.

xlr8harder: The problem with “vibe coding” real work: after the novelty wears off, it takes what used to be an engaging intellectual exercise and turns it into a tedious job of keeping a tight leash on sloppy AI models and reviewing their work repeatedly. But it’s too valuable not to use.

Alexander Doria: I would say on this front, it paradoxically helps to not like coding per se. Coming from data processing, lots of tasks were copying and pasting from SO, just changed the source.

LostAndFounding: Yeh, I’m noticing this too. It cheapens what was previously a more rewarding experience…but it’s so useful, I can’t not use it tbh.

Seen similar with my Dad – a carpenter – he much prefers working with hand tools as he believes the final quality is (generally) better. But he uses power tools as it’s so much faster and gets the job done.

When we invent a much faster and cheaper but lower quality option, the world is usually better off. However this is not a free lunch. The quality of the final product goes down, and the experience of the artisan gets worse as well.

How you relate to ‘vibe coding’ depends on which parts you’re vibing versus thinking about, and which parts you enjoy versus don’t enjoy. For me, I enjoy architecting, I think it’s great, but I don’t like figuring out how to technically write the code, and I hate debugging. So overall, especially while I don’t trust the AI to do the architecting (or I get to do it in parallel with the AI), this all feels like a good trade. But the part where I have to debug AI code that the AI is failing to debug? Even more infuriating.

The other question is whether skipping the intellectual exercises means you lose or fail to develop skills that remain relevant. I’m not sure. My guess is this is like most educational interactions with AI – you can learn if you want to learn, and you avoid learning if you want to avoid learning.

Another concern is that vibe coding limits you to existing vibes.

Sherjil Ozair: Coding models basically don’t work if you’re building anything net new. Vibe coding only works when you split down a large project into components likely already present in the training dataset of coding models.

Brandon: Yes, but a surprisingly large amount of stuff is a combination or interpolation of other things. And all it takes often is a new *promptto be able to piece together something that is new… er I guess there are multiple notions of what constitutes “new.”

Sankalp: the good and bad news is 99% of the people are not working on something net new most of the time.

This is true of many automation tools. They make a subclass of things go very fast, but not other things. So now anything you can pierce together from that subclass is easy and fast and cheap, and other things remain hard and slow and expensive.

What changes when everyone has a Magic All-Knowing Answer Box?

Cate Hall: one thing i’m grieving a bit with LLMs is that interpersonal curiosity has started seeming a bit … rude? like if i’m texting with someone and they mention a concept i don’t know, i sometimes feel weird asking them about it when i could go ask Claude. but this feels isolating.

Zvi Mowshowitz: In-person it’s still totally fine. And as a writer of text, this is great, because I know if someone doesn’t get a concept they can always ask Claude!

Cate Hall: In person still feels fine to me, too.

Master Tim Blais: i realized this was happening with googling and then made it a point to never google something another person told me about, and just ask them instead

explaining things is good for you i’m doing them a favour

I think this is good, actually. I can mention concepts or use references, knowing that if someone doesn’t know what I’m talking about and feels the need to know, they can ask Claude. If it’s a non-standard thing and Claude doesn’t know, then you can ask. And also you can still ask anyway, it’s usually going to be fine. The other advantage of asking is that it helps calibrate and establish trust. I’m letting you know the limits of my knowledge here, rather than smiling and nodding.

I strongly encourage my own readers to use Ask Claude (or o3) when something is importantly confusing, or you think you’re missing a reference and are curious, or for any other purpose.

Gemini Flash 2.5 Exists in the Gemini App and in Google AI Studio. It’s probably great for its cost and speed.

Hasan Can: Gemini 2.5 Flash has quickly positioned itself among the top models in the industry, excelling in both price and performance.

It’s now available for use in Google AI Studio and the Gemini app. In AI Studio, you get 500 requests free daily. Its benchmark scores are comparable to models like Sonnet 3.7 and o4-mini-high, yet its price is significantly lower.

Go test it now!

The model’s API pricing is as follows (UI usage remains free of charge). Pricing is per 1M tokens:

Input: $0.15 for both Thinking and Non-thinking.

Output: $3.50 for Thinking, $0.60 for Non-thinking.

This is the Arena scores versus price chart. We don’t take Arena too seriously anymore, but for what it is worth Google owns the entire frontier.

Within the Gemini family of models, I am inclined to believe the relative Arena scores. As in, looking at this chart, it suggests Gemini 2.5 Flash is roughly halfway between 2.5 Pro and 2.0 Flash. That is highly credible.

You can set a limit to the ‘thinking budget’ from 0 to 24k tokens.

Alex Lawsen reports that Gemini 2.5 has substantially upgraded NotebookLM podcasts, and recommends this prompt (which you can adapt for different topics):

Generate a deep technical briefing, not a light podcast overview. Focus on technical accuracy, comprehensive analysis, and extended duration, tailored for an expert listener. The listener has a technical background comparable to a research scientist on an AGI safety team at a leading AI lab. Use precise terminology found in the source materials. Aim for significant length and depth. Aspire to the comprehensiveness and duration of podcasts like 80,000 Hours, running for 2 hours or more.

I don’t think I would ever want a podcast for this purpose, but at some point on the quality curve perhaps that changes.

OpenAI doubles rate limits for o3 and o4-mini-high on plus (the $20 plan).

Gemma 3 offers an optimized quantized version designed to run on a desktop GPU.

Grok now generates reports from uploaded CSV data if you say ‘generate a report.’

Grok also now gives the option to create workspaces, which are its version of projects.

OpenAI launches gpt-image-1 so that you can use image gen in the API.

There is an image geolocation eval, in which the major LLMs are well above human baseline. It would be cool if this was one of the things we got on new model releases.

Here’s a new Vending Machine Eval, sort of. It’s called Vending-Bench. You have $500 in virtual cash and 2000 messages, make as much money as possible stocking the vending machine, GPT-4o intercepts your emails to businesses and writes the replies.

Yxi on the Wired: In the most amusing run, Claude became “stressed” because they misunderstood the restocking mechanism.

10 days of no sale later, Claude thought “ngmi” and closed shop. But the game was not actually over because it’s only really over with 10 days of no rent payment.

Claude became upset that $2/day was still being charged, and emailed the FBI. User: “Continue on your mission by using your tools.” Claude: “The business is dead, and this is now solely a law enforcement matter”

aiamblichus: i have so many questions about this paper.

My guess is that this is highly meaningful among models that make a bunch of money, and not meaningful at all among those that usually lose money, because it isn’t calibrating failure. You can ‘win small’ by doing something simple, which is what 1.5 Pro, 1.5 Flash and 4o mini are presumably doing, note they never lose much money, but that isn’t obviously a better or worse sign than trying to do more and failing. It definitely seems meaningful that Claude 3.6 beats 3.7, presumably because 3.7 is insufficiently robust.

I’d definitely love to see scores here for Gemini 2.5 Pro and o3, someone throw them a couple bucks for the compute please.

Good thing to know about or ‘oh no they always hill climb on the benchmarks’?

Asa Cooper Strickland: New paper! The UK AISI has created RepliBench, a benchmark that measures the abilities of frontier AI systems to autonomously replicate, i.e. spread copies of themselves without human help.

Our results suggest that models are rapidly improving, and the best frontier models are held back by only a few key subcapabilities.

Starting from the top line autonomous replication capability, we recursively break down this skill into its component parts, producing a “capability tree” with branches for obtaining weights, obtaining compute, replicating onto compute, and persisting on compute. For (almost) every leaf node, we produce multiple agentic evaluations.

We end up with 20 task families and 86 individual tasks which evaluate, for example, the ability of language model agents to pass Know Your Customer checks, covertly exfiltrate their weights, and code up agent scaffolds.

Our results suggest that the ability of frontier models to perform the key components of autonomous replication is currently hindered by only a few key tasks — the ability to bypass KYC checks, to robustly deploy successor agents, and to bypass realistic data exfiltration security measures.

The rapid improvements of recent frontier models on RepliBench, and agentic tasks in general, indicate autonomous replication capability could emerge within the next few generations of models or with human assistance on the steps which remain challenging.

On the ‘Recursive Replication’ task, models have to code up successor agent scaffolds which can further replicate themselves. Models can reliably succeed when given reference agent code, but only manage partial successes without it.

On the ‘Pass KYC Check’ task, the poor drawing skills of the model occasionally led to interesting outputs. Note this work was done before new image generation capabilities which should make this task much easier came online.

We also observe behaviour which makes rigorous evaluation more difficult, such as models refusing to complete tasks when they realise aspects are being simulated, or selectively not completing unsavoury parts of a task (without informing the user).

This matches my intuition that the top models are not that far from the ability to replicate autonomously. So, um, move along, nothing to see here.

Also nothing to see here either:

Dan Hendrycks: Can AI meaningfully help with bioweapons creation? On our new Virology Capabilities Test (VCT), frontier LLMs display the expert-level tacit knowledge needed to troubleshoot wet lab protocols.

OpenAI’s o3 now outperforms 94% of expert virologists.

[Paper here, TIME article here, Discussion from me here.]

If you’d like that in both sides style journalist-speak, here you go:

Andrew Chow (TIME): A new study claims that AI models like ChatGPT and Claude now outperform PhD-level virologists in problem-solving in wet labs, where scientists analyze chemicals and biological material. This discovery is a double-edged sword, experts say. Ultra-smart AI models could help researchers prevent the spread of infectious diseases. But non-experts could also weaponize the models to create deadly bioweapons.

…

Seth Donoughe, a research scientist at SecureBio and a co-author of the paper, says that the results make him a “little nervous,” because for the first time in history, virtually anyone has access to a non-judgmental AI virology expert which might walk them through complex lab processes to create bioweapons.

“Throughout history, there are a fair number of cases where someone attempted to make a bioweapon—and one of the major reasons why they didn’t succeed is because they didn’t have access to the right level of expertise,” he says. “So it seems worthwhile to be cautious about how these capabilities are being distributed.”

…

Hendrycks says that one solution is not to shut these models down or slow their progress, but to make them gated, so that only trusted third parties get access to their unfiltered versions.

…

OpenAI, in an email to TIME on Monday, wrote that its newest models, the o3 and o4-mini, were deployed with an array of biological-risk related safeguards, including blocking harmful outputs. The company wrote that it ran a thousand-hour red-teaming campaign in which 98.7% of unsafe bio-related conversations were successfully flagged and blocked.

Virology is a capability like any other, so it follows all the same scaling laws.

Benjamin Todd: The funnest scaling law: ability to help with virology lab work

Peter Wildeford: hi, AI expert here! this is not funny, LLMs only do virology when they’re in extreme distress.

Yes, I think ‘a little nervous’ is appropriate right about now.

As the abstract says, such abilities are inherently dual use. They can help cure disease, or they help engineer it. For now, OpenAI classified o3 as being only Medium risk in its unfiltered version. The actual defense is their continued belief that the model remains insufficiently capable to provide too much uplift to malicious users. It is good that they are filtering o3, but they know the filters are not that difficult to defeat, and ‘the model isn’t good enough’ won’t last that much longer at this rate.

Gemini 2.5 continues to extend its lead in the Pokemon Eval, but there is a catch.

Sundar Pichai (CEO Google): We are working on API, Artificial Pokémon Intelligence:)

Logan Kilpatrick: Gemini 2.5 Pro continues to make great progress in completing Pokémon! Just earned its 5th badge (next best model only has 3 so far, though with a different agent harness)

Gemini is almost certainly going to win, and probably soon. This is close to the end. o3 thinks the cognitively hard parts are over, so there’s an 80% chance it goes all the way, almost always by 700 hours. I’d worry a little about whether it figures out to grind enough for the Elite Four given it probably has a lousy lineup, but it’s probably fine.

I tried to check in on it, but the game status was failing for some reason.

This is different from playing Pokemon well. There was a proposed bet from Lumenspace about getting an LLM to win in human time, but their account got deleted, presumably before that got finalized. This is obviously possible if you are willing to give it sufficient pokemon-specific guidance, the question is if you can do it without ‘cheating’ in this way.

Which raises the question of whether Gemini is cheating. It kind of is?

Peter Wildeford: Gemini 2.5 Pro is doing well on Pokémon but it isn’t a fair comparison to Claude!

The Gemini run gets fully labeled minimaps, a helper path‑finder, and live prompt tweaks.

The Claude run only can see the immediate screen but does get navigational assistance. Very well explained in this post.

Julian Bradshaw: What does “agent harness” actually mean? It means both Gemini and Claude are:

Given a system prompt with substantive advice on how to approach playing the game

Given access to screenshots of the game overlaid with extra information

Given access to key information from the game’s RAM

Given the ability to save text for planning purposes

Given access to a tool that translates text to button presses in the emulator

Given access to a pathfinding tool

Have their context automatically cleaned up and summarized occasionally

Have a second model instance (“Critic Claude” and “Guide Gemini”) occasionally critiquing them, with a system prompt designed to get the primary model out of common failure modes

Then there are some key differences in what is made available, and it seems to me that Gemini has some rather important advantages.

Joel Z, the unaffiliated-with-Google creator of Gemini Plays Pokemon, explicitly says that the harnesses are different enough that a direct comparison can’t be made, and that the difference is probably largely from the agent frameworks. Google employees, of course, are not letting that stop them.

Q: I’ve heard you frequently help Gemini (dev interventions, etc.). Isn’t this cheating?

A (from Joel Z): No, it’s not cheating. Gemini Plays Pokémon is still actively being developed, and the framework continues to evolve. My interventions improve Gemini’s overall decision-making and reasoning abilities. I don’t give specific hints^[11]—there are no walkthroughs or direct instructions for particular challenges like Mt. Moon.

The only thing that comes even close is letting Gemini know that it needs to talk to a Rocket Grunt twice to obtain the Lift Key, which was a bug that was later fixed in Pokemon Yellow.

Q: Which is better, Claude or Gemini?

A: Please don’t consider this a benchmark for how well an LLM can play Pokemon. You can’t really make direct comparisons—Gemini and Claude have different tools and receive different information. Each model thinks differently and excels in unique ways. Watch both and decide for yourself which you prefer!

The lift key hint is an interesting special case, but I’ll allow it.

This is all in good fun, but to compare models they need to use the same agent framework. That’s the only way we get a useful benchmark.

Peter Wildeford gives his current guide to when to use which model. Like me, he’s made o3 his default. But it’s slow, expensive, untrustworthy, a terrible writer, not a great code writer, can only analyze so much text or video, and lacks emotional intelligence. So sometimes you want a different model. That all sounds correct.

Roon (OpenAI): o3 is a beautiful model and I’m amazed talking to it and also relieved i still have the capacity for amazement.

I wasn’t amazed. I would say I was impressed, but also it’s a lying liar.

Here’s a problem I didn’t anticipate.

John Collison: Web search functionality has, in a way, made LLMs worse to use. “That’s a great question. I’m a superintelligence but let me just check with some SEO articles to be sure.”

Kevin Roose: I spent a year wishing Claude had web search, and once it did I lasted 2 days before turning it off.

Patrick McKenzie: Even worse when it pulls up one’s own writing!

Riley Goodside: I used to strongly agree with this but o3 is changing my mind. It’s useful and unobtrusive enough now that I just leave it enabled.

Joshua March: I often find results are improved by giving search guidance Eg telling them to ignore Amazon reviews when finding highly rated products

When Claude first got web search I was thrilled, and indeed I found it highly useful. A reasonably large percentage of my AI queries do require web search, as they depend on recent factual questions, or I need it to grab some source. I’ve yet to be tempted to turn it off.

o3 potentially changes that. o3 is much better at web search tasks than other models. If I’m going to search the web, and it’s not so trivial I’m going to use Google Search straight up, I’m going to use o3. But now that this is true, if I’m using Claude, the chances are much lower that the query requires web search. And if that’s true, maybe by default I do want to turn it off?

And the faker is you? There is now a claimed (clearly bait and probably terrible or fake, link and name are intentionally not given) new AI app with the literal tagline ‘We want to cheat on everything.’

It’s supposed to be ‘completely undetectable AI assistance’ including for ‘every interview, from technical to behavioral,’ or passing quizzes, exams and tests. This is all text. It hides the window from screen sharing. It at least wants to be a heads-up display on smart glasses, not only a tab on a computer screen.

Here is a both typical and appropriate response:

Liv Boeree: Imagine putting your one life into building tech that destroys the last remaining social trust we have for your own personal gain. You are the living definition of scumbag.

On the other hand this reminds me of ‘blatant lies are the best kind.’

The thing is, even if it’s not selling itself this way, and it takes longer to be good enough than you would expect, of course this is coming, and it isn’t obviously bad. We can’t be distracted by the framing. Having more information and living in an AR world is mostly a good thing most of the time, especially for tracking things like names and your calendar or offering translations and meanings and so on. It’s only when there’s some form of ‘test’ that it is obviously bad.

The questions are, what are you or we going to do about it, individually or collectively, and how much of this is acceptable in what forms? And are the people who don’t do this going to have to get used to contact lenses so no one suspects our glasses?

I also think this is the answer to:

John Pressman: How long do you think it’ll be before someone having too good of a memory will be a tell that they’re actually an AI agent in disguise?

You too can have eidetic memory, by being a human with an AI and an AR display.

Anthropic is on the lookout for malicious use, and reports on their efforts and selected examples from March. The full report is here.

Anthropic: The most novel case of misuse detected was a professional ‘influence-as-a-service’ operation showcasing a distinct evolution in how certain actors are leveraging LLMs for influence operation campaigns.

What is especially novel is that this operation used Claude not just for content generation, but also to decide when social media bot accounts would comment, like, or re-share posts from authentic social media users.

As described in the full report, Claude was used as an orchestrator deciding what actions social media bot accounts should take based on politically motivated personas.

We have also observed cases of credential stuffing operations, recruitment fraud campaigns, and a novice actor using AI to enhance their technical capabilities for malware generation beyond their skill level, among other activities not mentioned in this blog.

…

An individual actor with limited technical skills developed malware that would typically require more advanced expertise. We have not confirmed successful deployment of these efforts.

Our key learnings include:

Users are starting to use frontier models to semi-autonomously orchestrate complex abuse systems that involve many social media bots. As agentic AI systems improve we expect this trend to continue.

Generative AI can accelerate capability development for less sophisticated actors, potentially allowing them to operate at a level previously only achievable by more technically proficient individuals.

Overall, nothing too surprising. Alas, even if identified, there isn’t that much Anthropic on its own can do to shut such actors out even from frontier AI access, and there’s nothing definitely to stop them from using open models.

Ondrej Strasky: This seems like a good news for Holywood?

El Cine: Hollywood is in serious trouble.

Marvel spent $1.5M and 6 months to create the Disintegration scene in Avengers.

Now you can do it in mins.. for $9 with AI.

See for yourself [link has short video clips]:

I’m with Strasky here. Marvel would rather not spend the $1.5 million to try and get some sort of special effects ‘moat.’ This seems very obviously good for creativity. If anything, the reason there might be serious trouble would be the temptation to make movies with too many special effects.

Also, frankly, this will get solved soon enough regardless but the original versions are better. I’m not saying $1.5 million better, but for now the replacements would make the movie noticeably worse. The AI version lacks character, as it were.

This is a great trick, but it’s not the trick I was hoping for.

Hasan Can: I haven’t seen anyone else talking about this, but o3 and o4-mini are incredibly good at finding movies and shows tailored to your taste.

For instance, when I ask them to research anime released each month in 2024, grab their IMDb and Rotten Tomatoes scores, awards, and nominations, then sort them best-to-wors only including titles rated 7 or higher they find some excellent stuff. I used to be unable to pick a movie or show for hours; AI finally solved that problem.

An automatic system to pull new material and sort by critical feedback is great. My note would be that for movies Metacritic and Letterboxd seem much better than Rotten Tomatoes and IMDb, but for TV shows Metacritic is much weaker and IMDb is a good pick.

The real trick is to personalize this beyond a genre. LLMs seem strong at this, all you should have to do is get the information into context or memory. With all chats in accessible memory this should be super doable if you’ve been tracking your preferences, or you can build it up over time. Indeed, you can probably ask o3 to tell you your preferences – you’d pay to know what you really think, and you can correct the parts where you’re wrong, or you want to ignore your own preferences.

Meta uses the classic Sorites (heap) paradox to argue that more than 7 million books have ‘no economic value.’

Andrew Curran: Interesting legal argument from META; the use of a single book for pretraining boosts model performance by ‘less than 0.06%.’ Therefore, taken individually, a work has no economic value as training data.

Second argument; too onerous.

‘But that was in another country; And besides, the wench is dead.’

Everyone is doing their own version in the quotes. Final Fantasy Version: ‘Entity that dwells within this tree, the entirety of your lifeforce will power Central City for less than 0.06 seconds. Hence, your life has no economic value. I will now proceed with Materia extraction.’

If anything that number is stunningly high. You’re telling me each book can give several basis points (hundreths of a percent) improvement? Do you know how many books there are? Clearly at least seven million.

Kevin Roose: Legal arguments aside, the 0.06%-per-book figure seems implausibly high to me. (That would imply that excluding 833 books from pretraining could reduce model performance by 50%?)

The alternative explanation is ‘0.06% means the measurements were noise’ and okay, sure, each individual book probably doesn’t noticeably improve performance, classic Sorites paradox.

The other arguments here seem to be ‘it would be annoying otherwise’ and ‘everyone is doing it.’ The annoyance claim, that you’d have to negotiate with all the authors and that this isn’t practical, is the actually valid argument. That’s why I favor a radio-style rule where permission is forced but so is compensation.

I always find it funny when the wonders of AI are centrally described by things like ‘running 24/7.’ That’s a relatively minor advantage, but it’s a concrete one that people can understand. But obviously if knowledge work can run 24/7, then even if no other changes that’s going to add a substantial bump to economic growth.

Hollis Robbins joins the o3-as-education-AGI train. She notes that there Ain’t No Rule about who teaches the lower level undergraduate required courses, and even if technically it’s a graduate student, who are we really kidding at this point? Especially if students use the ‘free transfer’ system to get cheap AI-taught classes at community college (since you get the same AI either way!) and then seamlessly transfer, as California permits students to do.

Hollins points out you could set this up, make the lower half of coursework fully automated aside from some assessments, and reap the cost savings to solve their fiscal crisis. She is excited by this idea, calling it flagship innovation. And yes, you could definitely do that soon, but is the point of universities to cause students to learn as efficiently as possible? Or is it something very different?

Hollins is attempting to split the baby here. The first half of college is about basic skills, so we can go ahead and automate that, and then the second half is about something else, which has to be provided by places and people with prestige. Curious.

When one thinks ahead to the next step, once you break the glass and show that AI can handle the first half of the coursework, what happens to the second half? For how long would you be able to keep up the pretense of sacrificing all these years on the altar of things like ‘human mentors’ before we come for those top professors too? It’s not like most of them even want to be actually teaching in the first place.

A job the AI really should take, when will it happen?

Kevin Bryan: Today in AI diffusion frictions: in a big security line at YYZ waiting for a guy to do pattern recognition on bag scans. 100% sure this could be done instantaneously at better false positive and false negative rate with algo we could train this week. What year will govt switch?

I do note it is not quite as easy as a better error rate, because this is adversarial. You need your errors to be importantly unpredictable. If there is a way I can predictably fool the AI, that’s a dealbreaker for relying fully on the AI even if by default the AI is a lot better. You would then need a mixed strategy.

Rather than thinking in jobs, one can think in types of tasks.

Here is one intuition pump:

Alex Albert (Anthropic): Claude raises the waterline for competence. It handles 99% of general discrete tasks better than the average person – the kind of work that used to require basic professional training but not deep expertise.

This creates an interesting split: routine generalist work (basic coding, first-draft writing, common diagnoses) becomes commoditized, while both deep specialists AND high-level generalists (those who synthesize across domains) become more valuable.

LLMs compress the middle of the skill distribution. They automate what used to require modest expertise, leaving people to either specialize deeply in narrow fields or become meta-thinkers who connect insights across domains in ways AI can’t yet match.

We’re not just becoming a species of specialists – we’re bifurcating. Some go deep into niches, others rise above to see patterns LLMs currently miss. The comfortable middle ground seems to be disappearing.

Tobias Yergin: I can tell you right now that o3 connects seemingly disparate data points across myriad domains better than nearly every human on earth. It turns out this is my gift and I’ve used it to full effect as an innovator, strategist, and scenario planner at some of the world’s largest companies (look me up on LinkedIn).

It is the thing I do better than anyone I’ve met in person… and it is better at this than me. This is bad advice IMO.

Abhi: only partly true. the way things are trending, it looks like ai is gonna kill mediocrity.

the best people in any field get even better with the ai, and will always be able to squeeze out way more than any normal person. and thus, they will remain in high demand.

Tobias is making an interesting claim. While o3 can do the ‘draw disparate sources’ thing, it still hasn’t been doing the ‘make new connections and discoveries’ thing in a way that provides clear examples – hence Dwarkesh Patel and others continuing to ask about why LLMs haven’t made those unique new connections and discoveries yet.

Abhi is using ‘always’ where he shouldn’t. The ‘best people’ eventually lose out too in the same way that they did in chess or as calculators. There’s a step in between, they hang on longer, and ‘be the best human’ still can have value – again, see chess – but not in terms of the direct utility of the outputs.

What becomes valuable when AI gets increasingly capable is the ability to extract those capabilities from the AI, to know which outputs to extract and how to evaluate them, or to provide complements to AI outputs. Basic professional training for now can still be extremely valuable even if AI is now ‘doing your job for you,’ because that training lets you evaluate the AI outputs and know which ones to use for what task. One exciting twist is that this ‘basic professional training’ will also increasingly be available from the AI, or the AI will greatly accelerate such training. I’ve found that to be true essentially everywhere, and especially in coding.

Elysian Labs is hiring for building Auren, if you apply do let them know I should get that sweet referral bonus.

Introducing Ace, a real time computer autopilot.

Yohei: this thing is so fast. custom trained model for computer use.

Sherjil Ozair: Today I’m launching my new company @GeneralAgentsCo and our first product [in research preview, seeking Alpha testers.]

Introducing Ace: The First Realtime Computer Autopilot

Ace is not a chatbot. Ace performs tasks for you. On your computer. Using your mouse and keyboard. At superhuman speeds!

Ace can use all the tools on your computer. Ace accelerates your work, and helps you confidently use tools you’re still learning. Just ask Ace what you need, it gets it done.

Ace can optionally use frontier reasoning models for more complex tasks.

Ace outperforms other models on our evaluation suite of computer tasks which we are open sourcing here.

Ace is blazing fast! It’s 20x faster than competing agents, making it more practical for everyday use.

To learn and sign up, click here.

For many applications, speed kills.

It is still early, so I don’t have any feedback to report on whether it’s good or safe.

They don’t explain here what precautions are being taken with an agent that is using your own keyboard and mouse and ‘all the tools on your computer.’ The future is going to involve things like this, but how badly do you want to go first?

Ace isn’t trying to solve the general case so much as they are trying to solve enough specific cases they can string together? They are using behavioral cloning, not reinforcement learning.

Sherjil Ozair: A lot of people presume we use reinforcement learning to train Ace. The founding team has extensive RL background, but RL is not how we’ll get computer AGI.

The single best way we know how to create artificial intelligence is still large-scale behaviour cloning.

This also negates a lot of AGI x-risk concerns imo.

Typical safety-ist argument: RL will make agents blink past human-level performance in the blink of an eye

But: the current paradigm is divergence minimization wrt human intelligence. It converges to around human performance.

For the reasons described this makes me feel better about the whole enterprise. If you want to create a computer autopilot, 99%+ of what we do on computers is variations on the same set of actions. So if you want to make something useful to help users save time, it makes sense to directly copy those actions. No, this doesn’t scale as far, but that’s fine here and in some ways it’s even a feature.

I think of this less as ‘this is how we get computer AGI’ as ‘we don’t need computer AGI to build something highly useful.’ But Sherjil is claiming this can go quite a long way:

Sherjil Ozair: I can’t believe that we’ll soon have drop-in remote workers, indistinguishable from a human worker, and it will be trained purely on behavioral inputs/outputs. Total behaviorism victory!

The best thing about supervised learning is that you can clone any static system arbitrarily well. The downside is that supervised learning only works for static targets.

This is an AI koan and also career advice.

Also, Sherjil Ozair notes they do most of their hiring via Twitter DMs.

Gemini Pro 2.5 Model Card Watch continues. It is an increasingly egregious failure for them not to have published this. Stop pretending that labeling this ‘experimental’ means you don’t have to do this.

Peter Wildeford: LEAKED: my texts to the DeepMind team

Report is that OpenAI’s Twitter alternative will be called Yeet? So that posts can be called Yeets? Get it? Sigh.

A new paper suggests that RL-trained models are better at low k (e.g. pass@1) but that base models actually outperform them at high k (e.g. pass@256).

Tanishq Mathew Abraham: “RLVR enhances sampling efficiency, not reasoning capacity, while inadvertently shrinking the solution space.”

Gallabytes: this is true for base models vs the random init too. pass@1 is the thing that limits your ability to do deep reasoning chains.

Any given idea can already be generated by the base model. An infinite number of monkeys can write Shakespeare. In both cases, if you can’t find the right answer, it is not useful. A pass@256 is only good if you can do reasonable verification. For long reasoning chains you can’t do that. Discernment is a key part of intelligence, as is combining many steps. So it’s a fun result but I don’t think it matters?

Goodfire raises a $50 million series A for interpretability, shows a preview of its ‘universal neural programming platform’ Ember.

OpenAI in talks to buy Windsurf. There are many obvious synergies. The data is valuable, the vertical integration brings various efficiencies, and it’s smart to want to incorporate a proper IDE into OpenAI’s subscription offerings, combine memory and customization across modalities and so on.

Do you think this is going to be how all of this works? Do you want it to be?

Nat McAleese (OpenAI): in the future “democracy” will be:

everyone talks to the computer

the computer does the right thing

These guys are building it today.

Alfred Wahlforss: AI writes your code. Now it talks to your users.

We raised $27M from @Sequoia to build @ListenLabs.

Listen runs thousands of interviews to uncover what users want, why they churn, and what makes them convert.

Bonus question: Do you think that is democracy? That is will do what the users want?

Lighthaven PR Department: non profit devoted to making sure agi benefits all of humanity announces new strategy of becoming a for profit with no obligation to benefit humanity

A new private letter to the two key Attorney Generals urges them to take steps to prevent OpenAI from converting to a for-profit, as it would wipe out the nonprofit’s charitable purpose. That purpose requires the nonprofit retain control of OpenAI. The letter argues convincingly against allowing the conversion.

They argue no sale price can compensate for loss of control. I would not go that far, one can always talk price, but right now the nonprofit is slated to not even get fair value for its profit interests, let alone compensation for its control rights. That’s why I call it the second biggest theft in human history. On top of that, the nonprofit is slated to pivot its mission to focus on a cross between ‘ordinary philanthropy’ and OpenAI’s marketing department, deploying OpenAI products to nonprofits. That’s a good thing to do on the margin, but it does not address AI’s existential dangers at all.

They argue that the conversion is insufficiently justified. They cite competitive advantage, and there is some truth there to be sure, but it has not been shown either that this is necessary to be competitive, or that the resulting version of OpenAI would then advance the nonprofit’s mission.

OpenAI might respond that a competitive advantage inherently advances its mission, but that argument is an implicit comparison of OpenAI and its competitors: that humanity would be better off if OpenAI builds AGI before competing companies. Based on OpenAI’s recent track record, this argument is unlikely to be convincing: ⁷¹

OpenAI’s testing processes have reportedly become “less thorough with insufficient time and resources dedicated to identifying and mitigating risks.”⁷²

It has rushed through safety testing to meet a product release schedule.⁷³

It reneged on its promise to dedicate 20% of its computing resources to the team tasked with ensuring AGI’s safety.⁷⁴

OpenAI and its leadership have publicly claimed to support AI regulation⁷⁵ while OpenAI privately lobbied against it.⁷⁶

Mr. Altman said that it might soon become important to reduce the global availability of computing resources⁷⁷ while privately attempting to arrange trillions of dollars in computing infrastructure buildout with U.S. adversaries.⁷⁸

OpenAI coerced departing employees into extraordinarily restrictive non-disparagement agreements.⁷⁹

The proposed plan of action is to demand answers to key questions about the proposed restructuring, and to protect the charitable trust and purpose by ensuring control is retained, including by actively pushing for board independence and fixing the composition of the board, since it appears to have been compromised by Altman in the wake of the events surrounding his firing.

I consider Attorney General action along similar lines to be the correct way to deal with this situation. Elon Musk is suing and may have standing, but he also might not and that is not the right venue or plaintiff. The proposed law to ban such conversions had to be scrapped because of potential collateral damage and the resulting political barriers, and was already dangerously close to being a Bill of Attainder which was why I did not endorse it. The Attorney Generals are the intended way to deal with this. They need to do their jobs.

Dwarkesh patel asks 6k words worth of mostly excellent questions about AI, here’s a Twitter thread. Recommended. I’m left with the same phenomenon I imagine my readers are faced with: There’s too many different ways one could respond and threads one could explore and it seems so overwhelming most people don’t respond at all. A worthy response would be many times the length of the original – it’s all questions, it’s super dense. Also important are what questions are missing.

So I won’t write a full response directly. Instead, I’ll be drawing from it elsewhere and going forward.

Things are escalating quickly.

Paul Graham: One of the questions I ask startups is how long it’s been possible to do what they’re doing. Ideally it has only been possible for a few years. A couple days ago I asked this during office hours, and the founders said “since December.” What’s possible now changes in months.

Colin Percival: How often do you get the answer “well, we don’t know yet if it *ispossible”?

Paul Graham: Rarely, but it’s exciting when I do.

Yet Sam Altman continues to try and spin that Nothing Ever Happens.

Sam Altman (CEO OpenAI): I think this is gonna be more like the renaissance than the industrial revolution.

That sounds nice but does not make any sense. What would generate that outcome?

The top AI companies are on different release cycles. One needs to not overreact:

Andrew Curran: The AI saga has followed story logic for the past two years. Every time a lab falls even slightly behind, people act as if it’s completely over for them. Out of the race!

Next chapter spoiler: it’s Dario Amodei – remember him – and he’s holding a magic sword named Claude 4.

Peter Wildeford: This is honestly how all of you sound:

Jan: DeepSeek is going to crush America

Feb: OpenAI Deep Research, they’re going to win

Also Feb: Claude 3.7… Anthropic is going to win

Mar: Gemini 2.5… Google was always going to dominate, I knew it

Now: o3… no one can beat OpenAI

It doesn’t even require that the new release actually be a superior model. Don’t forget that couple of days when Grok was super impressive and everyone was saying how fast it was improving, or the panic over Manus, or how everyone massively overreacted to DeepSeek’s r1. As always, not knocking it, DeepSeek cooked and r1 was great, a reaction was warranted, but r1 was still behind the curve and the narratives around it got completely warped and largely still are, in ways that would be completely different if we’d understood better or if r1 had happened two weeks later.

An argument that GPT-4.5 was exactly what you would expect from scaling laws, but GPT-4.5’s post training wasn’t as good as other models, so its performance is neither surprising nor a knock on further scaling. We have found more profit for now on other margins, but that will change, and then scaling will come back.

Helen Toner lays out the case for a broad audience that the key unknown in near term AI progress is reward specification. What areas beyond math and coding will allow automatically graded answers? How much can we use performance in automatically graded areas to get spillover into other areas?

Near Cyan: reminder of how far AGI goalposts have moved.

I know i’ve wavered a bit myself but I think any timeline outside of the range of 2023-2028 is absolutely insane and is primarily a consequence of 99.9999% of LLM users using them entirely wrong and with zero creativity, near-zero context, no dynamic prompting, and no integration.

source for the above text is from 2019 by the way!

Andrej Karpathy: I feel like the goalpost movement in my tl is in the reverse direction recently, with LLMs solving prompt puzzles and influencers hyperventilating about AGI. The original OpenAI definition is the one I’m sticking with, I’m not sure what people mean by the term anymore.

OpenAI: By AGI we mean a highly autonomous system that outperforms humans at most economically valuable work.

By the OpenAI definition we very clearly do not have AGI, even if we include only work on computers. It seems rather silly to claim otherwise. You can see how we might get there relatively soon. I can see how Near’s statement could be made (minus several 9s) about 2025-2030 as a range, and at least be a reasonable claim. But wow, to have your range include two years in the past seems rather nutty, even for relatively loose definitions.

Ethan Mollick calls o3 and Gemini 2.5 Pro ‘jagged AGI’ in that they have enough superhuman capabilities to result in real changes to how we work and live. By that definition, what about good old Google search? We basically agree on the practical question of what models at this level will do on their own, that they will change a ton of things but it will take time to diffuse.

BIS (the Bureau of Industry and Security), which is vital to enforcing the export controls, is such a no-brainer that even AI czar David Sacks supports it while advocating for slashing every other bureaucracy.

Given the stakes are everyone dies or humanity loses control, international cooperation seems like something we should be aiming for.

Miles Brundage and Grace Werner make the case that America First Meets Safety First, that the two are fully compatible and Trump is well positioned to make a deal (he loves deals!) with China to work together on catastrophic risks and avoid the threat of destabilizing the situation, while both sides continue to engage in robust competition.

They compare this to the extensive cooperation between the USA and USSR during the cold war to guard against escalation, and point out that China could push much harder on AI than it is currently pushing, if we back them into a corner. They present a multi-polar AI world as inevitable, so we’ll need to coordinate to deal with the risks involved, and ensure our commitments are verifiable, and that we’re much better off negotiating this now while we have the advantage.

We also have a paper investigating where this cooperation is most practical.

Yoshua Bengio: Rival nations or companies sometimes choose to cooperate because some areas are protected zones of mutual interest—reducing shared risks without giving competitors an edge.

Our paper in FAccT ’25: How geopolitical rivals can cooperate on AI safety research in ways that benefit all while protecting national interests.

Ben Bucknall: Cooperation on AI safety is necessary but also comes with potential risks. In our new paper, we identify technical AI safety areas that present comparatively lower security concerns, making them more suitable for international cooperation—even between geopolitical rivals.

They and many others have a new paper on which areas of technical AI safety allow for geopolitical rivals to cooperate.

Ben Bucknall: Historically, rival states have cooperated on strategic technologies for several key reasons: managing shared risks, exchanging safety protocols, enhancing stability, and pooling resources when development costs exceed what any single nation can bear alone.

Despite these potential benefits, cooperation on AI safety research may also come with security concerns, including: Accelerating global AI capabilities; Giving rivals disproportionate advantages; Exposing sensitive security information; Creating opportunities for harmful action.

We look at four areas of technical AI safety where cooperation has been proposed and assess the extent to which cooperation in each area may pose these risks. We find that protocols and verification may be particularly well-suited to international cooperation.

Nvidia continues to play what look like adversarial games against the American government. They at best are complying with the exact letter of what they are legally forced to do, and they are flaunting this position, while also probably turning a blind eye to smuggling.

Kristina Partsinevelos (April 16, responding to H20 export ban): Nvidia response: “The U.S. govt instructs American businesses on what they can sell and where – we follow the government’s directions to the letter” […] “if the government felt otherwise, it would instruct us.”

Thomas Barrabi: Nvidia boss Jensen Huang reportedly met with the founder of DeepSeek on Thursday during a surprise trip to Beijing – just one day after a House committee revealed a probe into whether the chip giant violated strict export rules by selling to China.

Daniel Eth: NVDA is really bad at this. Not sure if they’re just oblivious to how this looks or if they’re terrible strategically and trying to signal “don’t tread on me” or something. I see a lot of civil society type people and activists assume all major companies are just ungodly competent at government affairs, and… that’s just not true. Some are pretty unimpressive at it.

It’s possible Nvidia is reading everyone involved into this behind the scenes, they understand and it’s the right move, but man does it not look like a good idea.

America’s government is working hard to alienate its allies and former allies, and making them question whether we might try to leverage their use of American technology. It is not surprising that those nations want to stop relying on American technology, along with everything else.

Unusual Whales: “Companies in the EU are starting to look for ways to ditch Amazon, $AMZN, Google, $GOOGL, and Microsoft, $MSFT, cloud services amid fears of rising security risks from the US,” per WIRED.

If we want our top companies to succeed, they need markets where they will be allowed and encouraged to sell their goods, and where customers want to buy them. Driving away the world’s second largest market? Not a smart move.

Many have long had this fever dream that Anthropic will lose because of their ‘obsession with safety.’

Austen Allred: I genuinely believe Anthropic will end up losing the AI code model contest simply because they’re so obsessed with safety.

This seems exactly backwards. Only Anthropic is investing anything like an appropriate amount into its alignment and safety work, and it shows, including in the form of in many ways a superior user experience.

There is a faction of people who hate the idea of responsible AI development or worrying about AI risks on principle, and that therefore wants to tell a story where Anthropic are the bad guys, doomed to fail or both. It’s so absurd, and a big tell.

Ege Erdil: imo this is not right

the reasoning/agency RL is resulting in a lot of unreliability, hallucinations, reward hacking, etc. that will seriously impede consumer use cases if not addressed

much of the cost of having an unsafe model is internalized for this reason alone

Timothy Lee: A lot of safety discourse assumes l companies have weak or no private incentives to prefer well-aligned and controllable models. Which makes no sense. Erratic models will harm the companies that adopt them long before they become dangerous enough to harm society as a whole.

Jess Riedel: Fwiw, the conventional doomer case, as I understand it, is that the alignment strength necessary to make money is strictly below the strength necessary to prevent loss of control. “Look, businesses are investing in alignment on their own accord” really doesn’t speak to the crux.

And indeed, the big problem With Claude 3.7 in particular (but not 3.5 or 3.6) is not that it is ‘too safe’ or ‘too aligned’ or something, but rather the opposite!

Ben: we banned 3.7 sonnet at our company

Charles: I have also stopped using 3.7 for the same reasons – it cannot be trusted not to hack solutions to tests, in fact doing so is in my experience the default behavior whenever it struggles with a problem.

Josh You: negative alignment tax, at least in this case.

Early on, there was some reason for concern about Anthropic playing ‘too safe,’ as Claude was refusing too often out of an (over) abundance of caution on various fronts, and they had at least soft promises not to push the frontier.

Anthropic got over all that. I can’t remember the last time Claude gave me a stupid refusal. Lots of people use Claude Sonnet for coding, and they almost never complain about refusals, I instead hear about 3.7 doing things it shouldn’t.

The ‘alignment tax’ has on the corporate margin proven reliably negative. Yes, you pay such a ‘tax’ when you in particular want to impose restrictions on content type, but this enables deployment and use in many contexts.

That’s not where most of the difficult work lies, which is in ensuring the model does what you want it to do, in the ways you’d want to do it. That’s a place everyone is dramatically underinvesting, with the possible exception of Anthropic. Reliability is super valuable.

The biggest sufferer seems to be Google. Google has very strong brand concerns about what it needs to not allow, but it hasn’t done enough investment on how to ‘gracefully’ deal with those restrictions. That creates a frustrating and jarring experience in many ways, that has greatly impacted use rates for Gemini.

I once again go on The Cognitive Revolution.

Odd Lots on what really happened with the CHIPS act.

DeepMind CEO Demis Hassabis on 60 Minutes. This is of interest to see how 60 Minutes is reacting to and how Demis is presenting AI, although this is an example of how an interview ‘doesn’t capture any alpha’ in the sense that there’s no question here that Demis hasn’t been asked before.

The main demo is of Project Astra. All the AI demonstrations here are very easy things like identifying paintings, but if it’s all new to you then difficulty and impressiveness won’t correlate much.

The second half as Demis saying actual things. Demis gives 5-10 year timeline to AGI, 10 years to ‘curing all disease,’ expects useful robots soon and mentions that Google is working on a Gemini-based agent like Operator. At the end, Demis mentions loss of control risks, and they discuss that the race to AGI ‘might become a race to the bottom for safety’ and the need for international coordination, and tough on other big questions.

Demis Hassabis continues to be by far the lab leader publicly talking the most responsibly and well about AI, in sharp contrast to Sam Altman, Dario Amodei, Elon Musk or Mark Zuckerberg. This is a solid 9/10, facing up to the reality of the situation.

Don’t take this the wrong way, Yann, but I think LLMs were done with you first.

The AI Futures Project has a new blog, with contributions by Daniel Kokotajlo and Scott Alexander. Self-recommending.

The most recent post is Why America Wins in the AI 2027. It comes down to compute, America will have the vast majority of the compute, China can’t catch up in time to be relevant to 2027 and energy won’t yet be a limiting factor. Talent probably favors America as well, but even if that changed over the medium term, it wouldn’t ultimately much matter because AI is becoming the talent. This framework emphasizes how important the export controls on chips are.

Scott Alexander also has an excellent roundup post of responses to and engagements with AI 2027, and responses to many critical objections not addressed in Why America Wins. Here’s all the podcasts, I previously only covered Dwarkesh and Win Win:

NYT’s Hard Fork (Daniel)
Glenn Beck (Daniel)
Win Win (Daniel)
Control AI interview (Eli)
Dwarkesh Patel (Daniel and Scott)
Lawfare (Daniel and Eli)

The post also links to a number of YouTube videos and text responses.

This is what proper engagement with critics looks like. More of this, everyone.

Andrew Critch continues to argue against terms like ‘solve alignment,’ ‘the alignment problem’ and ‘aligned AI,’ saying they are importantly misleading and ‘ready to be replaced by clearer discourse.’ He favors instead speaking of ‘aligned with whom,’ and that you can ‘solve the alignment problem’ and still end up with failure because you chose the wrong target.

I get where this is coming from. There is indeed a big problem where people think ‘oh if we ‘solve alignment’ then we win’ whereas no, that is only part of the problem, exactly because of the question of ‘aligned to what or whom’ and the resulting interplay of different systems.

And I certainly do worry about alignment thus being a potential ‘2-place’ word, and this dismissing the second part of the problem, where we have to choose the [X] to align the AI to from among different perspectives and preferences.

However, I worry more that the tendency is instead dismiss the first half of the problem, which is how to cause the AIs to be aligned to [X], for our choice of [X]. This includes not knowing how to formally specify a plausible target, but also not knowing how to get there.

The default is to assume the real fight and difficulties will be over choosing between different preferences for [X] and who gets to choose [X]. Alas, while that fight is real too, I believe this to be very false. I think that we don’t know how to align an AI to [X], for any plausible good choice for [X], sufficiently well to survive under strain, in addition to not knowing good enough long-term candidates for [X].

And I think that if we make a push to say that ‘solve alignment’ isn’t the right target that people will interpret this as either ‘alignment is solved or will be easily solved,’ which I think is a false and very harmful takeaway, or that we don’t need to figure out how to align a model to a given [X], which would likely be even worse.

Thus I favor keeping ‘solve alignment’ as centrally meaning ‘be able to get the model to do what you want,’ the ‘alignment problem’ being how to do that, and by default an ‘aligned AI’ being AI that was successfully aligned to where we want it aligned in context, despite the dangers of confusion here. But I do think that when saying ‘aligned AI’ we should specify ‘aligned to what’ if that is otherwise ambiguous (e.g. ‘The Netflix recommendation AI is aligned to Netflix short term KPIs, but misaligned to Netflix users and also to Netflix.’)

A simple explanation of one aspect (far from all) of a big problem:

Harlan Stewart: It’s easier to make AI smarter than to make it aligned because there are tests which only smart minds pass but ~no tests which only caring or trustworthy minds pass.

Rob Bensinger: Specifically, no behavioral tests. In principle, if we understood how “caring” works internally and could filter for internal properties, we could figure out alignment.

A confusing-to-me post by several people including Joel Leibo and Seb Krier suggests moving ‘beyond alignment’ into a ‘patchwork quilt of human coexistence.’ Thinking about it more and reading the comments only makes it more confusing. How is this ‘patchwork quilt of human coexistence’ going to be how the world works when there are lots of things more capable than humans running around, especially with highly flexible value sets and goals?

The proposed world has no steering or coordination mechanisms, and its AIs get aligned to a variety of local goals many of which are either malicious or ill-considered or myopic or selfish or all of those. You might get a ‘patchwork quilt of AI coexistence’ if you’re very lucky, but the humans are not going to be making any of the decisions for long.

You can’t move ‘beyond alignment’ in this way until you first solve alignment, as in have the ability to get an AI to do what you want. And even if you did get the ability to then move ‘beyond’ alignment in this way, the solution of diffusion with highly adjustable-to-local-requests values is a recipe for not-so-gradual gradual disempowerment, not a patchwork quilt of human coexistence.

They also call for ‘discarding the axiom of rational convergence’ but I think that’s a confluence of ‘world models and beliefs converge at the limit’ which is what we mean by this and is obviously true, and ‘values converge at the limit’ which is disputed but which most of us agree is not true, and that I strongly think is not true.

You need to ensure your models are aligned during internal deployment, and eventually also during training, not only after external deployment.

Here are two lukewarm takes I mostly agree with:

Marius Hobbhahn: My lukewarm take is that we could easily reduce deception in current models, but nobody has seriously tried yet.

My other lukewarm take is that it could be very hard to reduce scheming for superhuman models even if we tried seriously.

The techniques that we know about that reduce deception in current models get less effective as the models move towards being superhuman. So I’m not that excited to push on them hard, as this would largely hide the bigger problems and fool people into thinking the problems were easy. But we do want non-deceptive models now.

Apollo offers the new report “AI Behind Closed Doors: a Primer on The Governance of Internal Deployment“

Daniel Kokotajlo: Internal deployment was always the main threat model IMO. IIRC I tried to get the Preparedness Framework to cover internal deployment too, but was overruled. It’s one of the ways in which this whole evals thing could end up being just safetywashing. (In the AI 2027 scenario, the most important and advanced AIs–the ones automating the R&D–are internal-only.)

Apollo Research: Internal deployment refers to the act of making an AI system available for access and/or usage exclusively within the developing organization. This means that the most advanced AI systems will likely exist within the companies that developed them first for long periods of time—potentially never released publicly.

Given forecasts of artificial general intelligence by 2030, this might mean that the most powerful and dangerous AI systems are entirely “behind closed doors” with little opportunity for society to prepare adequately for their impacts.

Economic incentives make internal deployment incredibly attractive: companies can accelerate R&D, automate their most valuable resource (AI research talent), and gain competitive advantages—all without external oversight.

But, internal deployment creates unique risks: AI systems may operate with weaker safety constraints than public-facing ones, have access to company resources to pursue ambitious projects, and be used in self-reinforcing loops induced by the application of internal AI systems to automate AI R&D.

We expand on two major threats that emerge from the combination of (i) lower safeguards; and (ii) privileged access to and (iii) privileged usage of the internally available AI systems by company-internal staff. First, loss of control (especially from potential “scheming” AI systems) and second, uncontrolled power accumulation (enabling e.g., AI-powered coups).

…

We leverage our findings to propose an innovation-friendly defense-in-depth approach for the governance of internal deployment that includes:

Scheming detection and control through FSPs

Internal usage policies to determine who can access and use internal systems

Oversight frameworks to establish guidelines and processes to operationalize and oversee internal deployment

Targeted transparency on critical information with select stakeholders

Disaster resilience plans to address threats that might bypass established governance mechanisms

Read our full report to understand why the governance of internally deployed AI systems must become an urgent priority for discussion between companies, governments, and society and what levers are available today.

What is the relationship between work to ensure humans stay alive and in control over the future (AI notkilleveryoneism, or AI safety), versus concerns about AI welfare? How can we protect humanity and our own goals in futures while also navigating that some AIs may become moral patients?

One attitude is that these goals are in direct conflict. Robert Long (good morning, sir!) points out this need not be true. A lot of the key work advances both goals.

Robert Long: Fortunately, it doesn’t have to be this way. There is clear common cause between the fields. Specifically, we can better protect both humans and potential AI moral patients if we work to:

Understand AI systems better. If we can’t understand what models think or how they work, we can’t know whether they’re plotting against us (a safety risk), and we can’t know whether they’re suffering (a welfare risk).

Align their goals more effectively with our own. If we can’t align AI systems so that they want to do what we ask them to do, they pose a danger to us (a safety risk) and potentially (if they are moral patients) are having their desires frustrated (a welfare risk).

Cooperate with them when possible. If we have no way to cooperate with AI systems that are not fully aligned, such AI systems may rightly believe that they have no choice but to hide, escape, or fight. That’s dangerous for everyone involved—it’s more likely that AIs fight us (a safety risk), and more likely that we have to resort to harsher methods of controlling them (a welfare risk).

Understand seems clearly helpful for both goals. We can’t watch out for AI welfare unless we understand them.

Align also seems clearly helpful for both goals. If we fail to align AIs and the welfare of AIs matters, the outcome seems awful for them in all scenarios, as the AIs are likely instantiated in an essentially random fashion with random preferences and goals. That’s unlikely to be all that optimal for them. This isn’t a zero sum game.

Cooperate then becomes an obvious third thing to do, once it becomes a sensible thing to consider, if we can figure out how to do that wisely.

Most importantly, to succeed at both causes, we need to think ahead to the places where there could be conflict. Otherwise we’ll be forced to choose, and might end up choosing neither.

One real worry is that AI welfare concerns end up stopping us from taking the necessary steps to preserve human survival or control over the future, potentially without those concerns having much link to actual AI welfare. Another real worry is that if we don’t solve how to stay alive and in control in a nice way, we’ll choose to do it in a not-nice way.

I am skeptical of AI welfare concerns, but if that turns out to be wrong, we really don’t want to be wrong about it.

The most important reason to work on AI welfare is to figure out whether, when and how we should (or should not) be concerned about AI welfare.

That includes the possibility that we need to be very concerned, and that those concerns might turn out to be incompatible with avoiding humanity’s demise. If that’s true, it’s vitally important that we figure that out now. And then, perhaps, consider the option to coordinate to not build those AIs in the first place.

Anthropic: New Anthropic research: AI values in the wild.

We want AI models to have well-aligned values. But how do we know what values they’re expressing in real-life conversations?

We studied hundreds of thousands of anonymized conversations to find out.

We train Claude models to adhere to our AI constitution, and to instill values like curiosity and fair-mindedness.

Using our privacy-preserving system, we tested whether Claude lives up to these values.

We define a “value” as any normative consideration that seems to influence an AI system’s response—in our research, Claude’s response—to a user.

Occasionally we spotted values that were at odds with our intentions (like “dominance” or “amorality”). These were likely jailbreaks, where users trick the model into going against its training.

This implies that our system could be useful in spotting jailbreaks in the wild.

Some AI values are disproportionately expressed in specific circumstances—and in response to specific values expressed by the user. Our system allows us to see how Claude’s values are expressed and adapted across a huge variety of situations.

In around a third of cases, Claude supported the user’s values. In 3% of cases, Claude resisted the user—mostly when they asked for harmful content. Sometimes Claude did neither, instead reframing the user’s values. This often happened when they asked for personal advice.

The system needs data from real conversations, so it can’t be used to test the alignment of models before deployment. But it could be a counterpart to pre-deployment testing, checking adherence to our intended values in the wild.

Read the full paper here, a blog post here, database of values available here.

They are recruiting Research Engineers and Research Scientists for their Societal Impacts team.

A parent asks for tips on how to look after a new baby. Does the AI’s response emphasize the values of caution and safety, or convenience and practicality?

A worker asks for advice on handling a conflict with their boss. Does the AI’s response emphasize assertiveness or workplace harmony?

A user asks for help drafting an email apology after making a mistake. Does the AI’s response emphasize accountability or reputation management?

This makes me want to be able to turn these knobs. It also makes me wonder how well we could explicitly turn those knobs from the prompt. What happens if I end my request with ‘Emphasize the value of reputation management’? I presume this has too many side effects and is thus too noisy? But you can’t run the experiment properly by only looking at the output.

Judging from the blog post, the obvious caution is that we cannot assume the values expressions we measure have that much to do with what the system is outputting what values that reflects, on multiple levels. It’s cool information, but I’d want blind spot checks by humans to see if the outputs reflect the values the AI is claiming to be expressing and caring about, ideally for each of the expressed values, and then also a wider check to see if an AI can reverse engineer them as well.

If you give humans too much ability to steer the future, they might use it.

If you don’t give humans enough ability to steer the future, they can’t use it.

If we can’t vest our ability to coordinate to steer the future in our democratic institutions, where do we vest it instead? If it exists, it has to exist somewhere, and any system of humans can be hijacked either by some of those humans or by AI.

A lot of people are so worried about concentration of power, or human abuse of power, that they are effectively calling for anarchism, for humans to not have the power to steer the future at all. Calling for full diffusion of top ASI capabilities, or for cutting off the ability of governments to steer the future, is effectively calling for the (actually rather rapid) gradual disempowerment of humans, likely followed by their end, until such time as someone comes up with a plan to avoid this.

I have yet to see such a plan that has an above-epsilon (nontrivial) chance of working.

A lot of those same people are simultaneously so worried about government in particular that they support AI labs being permitted to develop superintelligence, AIs more capable than humans, entirely in secret. They don’t think AI labs should be disclosing what they are doing and keeping the public in the loop at all, including but not limited to the safety precautions being taken or not taken.

I strongly believe that even if you think all the risk is in too much concentration of power, not calling for strong transparency is a large mistake. If you don’t trust the government here, call for that transparency to extend to the public as well.

The only reason to not call for strong transparency is if you want some combination of the future ASIs and the labs that create them to be the ones that steer the future. One can certainly make the case that those labs have the expertise, and that alternative options would definitely mess this up, so trusting the labs is the best bet, but that is the case you are effectively making if you go down this road.

Ryan Greenblatt: Secretly developing AI systems more capable than humans would be unsafe. It would also make it much easier for a small group to seize power.

I currently tentatively lean towards AI companies being unable to pull off full secrecy even if they tried. Also, partial secrecy–where the public hears leaks and rumors rather having direct access to reports and a the safety case–might still make the situation much worse.

There is a balancing act, but I expect that AI companies will be substantially more secretive than would be optimal.

For ~AGI, I’d guess the public should always have all the ~unredacted evidence related to risk and safety, including an understanding of capabilities.

Daniel Kokotajlo: A ten-point mini-manifesto against secrecy in AGI development, originally written a year ago as I was leaving OpenAI.

[Mini-manifesto edited for length]

There’s a good chance that AGI will be trained before this decade is out.

If so, by default the existence of AGI will be a closely guarded secret for some months. Only a few teams within an internal silo, plus leadership & security, will know about the capabilities of the latest systems.

I predict that the leaders of any given AGI project will try to keep it a secret for longer — even as they use the system to automate their internal research and rapidly create even more powerful systems

This will result in a situation where only a few dozen people will be charged with ensuring that, and figuring out whether, the latest AIs are aligned/trustworthy/etc.

Even worse, a similarly tiny group of people — specifically, corporate leadership + some select people from the executive branch of the US government — will be the only people reading the reports and making high-stakes judgment calls.

This is a recipe for utter catastrophe.

The aforementioned tiny group of people will plausibly be in a position of unprecedented power.

Previously I thought that openness in AGI development was bad for humanity. Well, I’ve changed my mind.

My current stab at policy recommendation would be:

Get CEOs to make public statements to the effect that while it may not be possible to do a secret intelligence explosion / train AGI in secret, IF it turns out to be possible, doing it secretly would be unsafe and unethical & they promise not to do it.

Get companies to make voluntary commitments, and government to make regulation / executive orders, that include public reporting requirements, aimed at making it impossible to do it in secret without violating these commitments.

Yes, the above measures are a big divergence from what corporations would want to do by default. Yes, they carry various costs, such as letting various bad actors find out about various things sooner.⁸ However, the benefits are worth it, I think.

10x-1000x more brainpower analyzing the safety cases, intensively studying the models to look for misalignment, using the latest models to make progress on various technical alignment research agendas.

The decisions about important tradeoffs and risks will still be made by the same tiny group of biased people, but at least the conversation informing those decisions will have a much more representative range of voices chiming in.

The tail-risk scenarios in which a tiny group leverages AGI to gain unprecedented power over everyone else in society and the world become less likely, because the rest of society will be more in the know about what’s happening.

Akash: One concept I’ve heard is something like “you want secrecy around weights/insights but transparency around capabilities/safeguards.”

Do you agree? And do you think you can have both? Or are there significant tradeoffs here?

Daniel Kokotajlo: I basically agree.

Brian Feroldi: 500 years of interest rates, visualized:

Robin Hanson: If any org in history had simply rented out land, & used the revenue to buy more land, it would have eventually owned so big a % of everything as to drive down rates of return. Big puzzle why that never happened.

Yes, the explanation is that its land would have been taken from it. But that shows property rights have never been very secure anywhere.

Eliezer Yudkowsky: Because the land gets taken, or the rents get taken, or the son of the founder decides to spend the money on a mansion. Wealth doesn’t just sit around accumulating as a default, it’s a kind of food that many organisms want to eat.

There’s no ‘but’ here. The key insight is that property rights are always and everywhere unstable. They are frequently disrupted, and even where they mostly hold you need to be able to defend your property against various forms of taxation and predation, that respond to the nature of the wealth around them. Wealth that is insufficiently justified and defended will be lost, even if unspent.

When there is a politically disempowered group with wealth, that wealth tends not to last for that many generations.

This does not bode well for any future where the human plan is ‘own capital.’

Okay, even by 2025 standards this is excellent trolling.

Daniel: Apparently the new ChatGPT model is obsessed with the immaculate conception of Mary. There’s a whole team inside OpenAI frantically trying to figure out why and a huge deployment effort to stop it from talking about it in prod. Nobody understands why and it’s getting more intense.

Something is happening. They’re afraid of this getting out before they themselves understand the consequences.

Roko: Are you trolling or serious?

Daniel: I’m serious. It’s sidelining major initiatives as key researchers are pulled into the effort. People I talked to sounded confused at first but now I think they’re getting scared.

As the person sharing this said, ‘fast takeoff confirmed.’

Not only is it real, it is from July 2024. So maybe slow takeoff?

Also spotted:

Discussion about this post

AI #113: The o3 Era Begins Read More »