Author name: Tim Belzer

Rice could be key to brewing better non-alcoholic beer

Rice enhances flavor profiles for nonalcoholic beer, reduces fermentation time, and may contribute to flavor stability. Credit: Paden Johnson/CC BY-NC-SA

He and his team—including Christian Schubert, a visiting postdoc from the Research Institute for Raw Materials and Beverage Analysis in Berlin—brewed their own non-alcoholic beers, ranging from those made with 100 percent barley malt to ones made with 100 percent rice. They conducted a volatile chemical analysis to identify specific compounds present in the beers and assembled two sensory panels of tasters (one in the US, one in Europe) to assess aromas, flavors, and mouthfeel.

The panelists determined the rice-brewed beers had less worty flavor, and the chemical analysis revealed why: lower levels of aldehyde compounds. Instead, other sensory attributes emerged, most notably vanilla or buttery notes. “If a brewer wanted a more neutral character, they could use nonaromatic rice,” the authors wrote. Along with brewing beers with 50 percent barley/50 percent rice, this would produce non-alcoholic beers likely to appeal more broadly to consumers.

The panelists also noted that higher rice content resulted in beers with a fatty/creamy mouthfeel—likely because higher rice content was correlated with increased levels of larger alcohol molecules, which are known to contribute to a pleasant mouthfeel. But it didn’t raise the alcohol content above the legal threshold for a nonalcoholic beer.

There were cultural preferences, however. The US panelists didn’t mind worty flavors as much as the European tasters did, which might explain why the former chose beers brewed with 70 percent barley/30 percent rice as the optimal mix. Their European counterparts preferred the opposite ratio (30 percent barley/70 percent rice). The explanation “may lie in the sensory expectations shaped by each region’s brewing traditions,” the authors wrote. Fermentation also occurred more quickly as the rice content increased because of higher levels of glucose and fructose.

The second study focused on testing 74 different rice cultivars to determine their extract yields, an important variable when it comes to an efficient brewing process, since higher yields mean brewers can use less grain, thereby cutting costs. This revealed that cultivars with lower amylose content cracked more easily to release sugars during the mashing process, producing the highest yields. And certain varieties also had lower gelatinization temperatures for greater ease of processing.

International Journal of Food Science, 2025. DOI: 10.1080/10942912.2025.2520907  (About DOIs)

Journal of the American Society of Brewing Chemists, 2025. DOI: 10.1080/03610470.2025.2499768

FCC chair decides inmates and their families must keep paying high phone prices

Federal Communications Commission Chairman Brendan Carr has decided to let prisons and jails keep charging high prices for calling services until at least 2027, delaying implementation of rate caps approved last year when the FCC had a Democratic majority.

Carr’s office announced the change yesterday, saying it was needed because of “negative, unintended consequences stemming from the Commission’s 2024 decision on Incarcerated People’s Communications Services (IPCS)… As a result of this waiver decision, the FCC’s 2021 Order rate cap, site commission, and per-minute pricing rules will apply until April 1, 2027, unless the Commission sets an alternative date.”

Commissioner Anna Gomez, the FCC’s only Democrat, criticized the decision and pointed out that Congress mandated lower prices in the Martha Wright-Reed Act, which the FCC was tasked with implementing.

“Today, the FCC made the indefensible decision to ignore both the law and the will of Congress… rather than enforce the law, the Commission is now stalling, shielding a broken system that inflates costs and rewards kickbacks to correctional facilities at the expense of incarcerated individuals and their loved ones,” Gomez said. “Instead of taking targeted action to address specific concerns, the FCC issued a blanket two-year waiver that undercuts the law’s intent and postpones meaningful relief for millions of families. This is a blatant attempt to sidestep the law, and it will not go unchallenged in court.”

Price caps have angered prison phone providers and operators of prisons and jails that get financial benefits from contracts with the prison telcos. One Arkansas jail ended phone service instead of complying with the rate caps.

Win for prison telco Securus

Carr issued a statement saying that “a number of institutions are or soon will be limiting the availability of IPCS due to concerns with the FCC’s 2024 decision,” and that “there is concerning evidence that the 2024 decision does not allow providers and institutions to properly consider public safety and security interests when facilitating these services.” Carr’s office said the delay is needed to “support the continued availability of IPCS for incarcerated people.”

Half a million Spotify users are unknowingly grooving to an AI-generated band

Making art used to be a uniquely human endeavor, but machines have learned to distill human creativity with generative AI. Whether that content counts as “art” depends on who you ask, but Spotify doesn’t discriminate. A new band called The Velvet Sundown debuted on Spotify this month and has already amassed more than half a million listeners. But by all appearances, The Velvet Sundown is not a real band—it’s AI.

While many artists are vehemently opposed to using AI, some have leaned into the trend to assist with music production. However, it doesn’t seem like there’s an artist behind this group. In less than a month, The Velvet Sundown has released two albums on Spotify, titled “Floating On Echoes” and “Dust and Silence.” A third album is releasing in two weeks. The tracks have a classic rock vibe with a cacophony of echoey instruments and a dash of autotune. If one of these songs came up in a mix, you might not notice anything is amiss. Listen to one after another, though, and the bland muddiness exposes them as a machine creation.

Some listeners began to have doubts about The Velvet Sundown’s existence over the past week, with multiple Reddit and X threads pointing out the lack of verifiable information on the band. The bio lists four members, none of whom appear to exist outside of The Velvet Sundown’s album listings and social media. The group’s songs have been mysteriously added to a large number of user-created playlists, which has helped swell its listener base in a few short weeks. When Spotify users began noticing The Velvet Sundown’s apparent use of AI, the profile had around 300,000 listeners; less than a week later, it has passed 500,000.

When The Velvet Sundown set up an Instagram account on June 27, all doubts were laid to rest—these “people” are obviously AI. We may be past the era of being able to identify AI by counting fingers, but there are plenty of weird inconsistencies in these pics. In one Instagram post, the band claims to have gotten burgers to celebrate the success of the first two albums, but there are too many burgers and too few plates, and the food and drink are placed seemingly at random around the table. The band members themselves also have that unrealistically smooth and symmetrical look we see in AI-generated images.

Stung by customer losses, Comcast says all its new plans have unlimited data

With Comcast trying to figure out how to stop losing broadband customers, the cable firm yesterday announced new plans that are available nationwide and do not have data caps.

Comcast said it is offering “four simple national Internet tiers that include unlimited data and the advanced Xfinity WiFi Gateway for one low monthly price.” Customers whose current plans have data caps won’t automatically get unlimited data and would have to switch to a new plan to remove that annoying limit from their accounts.

“Customers can repackage into one of our new plans that include unlimited data if they don’t have it already with their existing plan,” a Comcast spokesperson told Ars today.

Comcast’s press release said there is a five-year price guarantee in which the plan costs range from $55 to $115 a month, before taxes and fees, for download speeds ranging from 300Mbps to 2Gbps. There’s also a one-year guarantee in which the prices for the same plans range from $40 to $100.

The Comcast Xfinity website today indicated that the one- and five-year price guarantees are only available to new customers. However, the Comcast spokesperson indicated to us that existing customers can get the price guarantee when switching to an unlimited data plan. Getting promised deals can often be difficult, particularly while a cable company is changing its offerings, so we wouldn’t be surprised if customers have difficulty obtaining the unlimited plan at the lowest advertised prices.

NASA tested a new SLS booster that may never fly, and the end of it blew off


NASA didn’t want to say much about one of the tests, and the other one lost its nozzle.

An uncontained plume of exhaust appeared near the nozzle of an SLS solid rocket booster moments before its nozzle was destroyed during a test-firing Thursday. Credit: NASA

NASA’s Space Launch System appears to have a finite shelf life. The Trump administration wants to cancel it after just three launches, while the preliminary text of a bill making its way through Congress would extend it to five flights.

But chances are low the Space Launch System will make it to nine flights, and if it does, it’s questionable that it would reach that point before 2040. The SLS rocket is a core piece of NASA’s plan to return US astronauts to the Moon under the Artemis program, but the White House seeks to cancel the program in favor of cheaper commercial alternatives.

For the second time in less than a week, NASA test-fired new propulsion hardware Thursday that the agency would need to keep SLS alive. Last Friday, a new liquid-fueled RS-25 engine ignited on a test stand at NASA’s Stennis Space Center in Mississippi. The hydrogen-fueled engine is the first of its kind to be manufactured since the end of the Space Shuttle program. This particular RS-25 engine is assigned to power the fifth flight of the SLS rocket, a mission known as Artemis V.

Then, on Thursday of this week, NASA and Northrop Grumman test-fired a new solid rocket booster in Utah. This booster features a new design that NASA would use to power SLS rockets beginning with the ninth mission, or Artemis IX. The motor tested on Thursday isn’t flight-worthy. It’s a test unit that engineers will use to gather data on the rocket’s performance.

While the engine test in Mississippi apparently went according to plan, the ground firing of the new solid rocket booster didn’t go quite as smoothly. Less than two minutes into the burn, the motor’s exhaust nozzle violently shattered into countless shards of debris. You can watch the moment in the YouTube video below.

At the start of the program nearly 15 years ago, NASA and its backers in Congress pitched the SLS rocket as the powerhouse behind a new era of deep space exploration. The Space Launch System, they said, would have the advantage of recycling old space shuttle engines and boosters, fast-tracking the new rocket’s path to the launch pad for less money than the cost of an all-new vehicle.

That didn’t pan out. Each Artemis mission costs about $4.2 billion, and that’s with shuttle-era engines and boosters that NASA and its contractors already have in their inventories. NASA’s 16 leftover shuttle main engines are enough for the first four SLS flights. NASA has leftover parts for eight pairs of solid rocket boosters.

It has been 10 years

Recognizing that shuttle-era parts will eventually run out, NASA signed a contract with Aerojet Rocketdyne in 2015 to set the stage for the production of new RS-25 engines. NASA later ordered an initial batch of six RS-25 engines from Aerojet, then added 18 more to the order in 2020, at a price of about $100 million per engine. NASA and its contractor aim to reduce the cost to $70 million per engine, but even that figure is many times the cost of engines of comparable size and power: Blue Origin’s BE-4 and SpaceX’s Raptor.

Finally, NASA test-fired a new flight-rated RS-25 engine for the first time last week at Stennis Space Center. The agency has often provided a livestream of its engine tests at Stennis, but it didn’t offer the public any live video. And this particular test was a pretty big deal. L3Harris, which acquired Aerojet Rocketdyne in 2023, has finally reactivated the RS-25 production line after a decade and billions of dollars of funding.

In fact, NASA made no public statement about the RS-25 test until Monday, and the agency didn’t mention its assignment to fly on the Artemis V mission. If the Trump administration gets its way, the engine will never fly. Maybe that’s fine, but after so long with so much taxpayer investment, this is a milestone worth publicizing, if not celebrating.

L3Harris issued a press release Tuesday confirming the engine’s planned use on the fifth SLS mission. The engine completed a 500-second acceptance test, throttling up to 111 percent of rated thrust, demonstrating more power than engines that flew on the space shuttle or on the first SLS launch in 2022.

A new RS-25 engine, No. 20001, was installed on its test stand in Mississippi earlier this year. Credit: NASA

“This successful acceptance test shows that we’ve been able to replicate the RS-25’s performance and reliability, while incorporating modern manufacturing techniques and upgraded components such as the main combustion chamber, nozzle, and pogo accumulator assembly,” said Kristin Houston, president of space propulsion and power systems at Aerojet Rocketdyne, L3Harris. “Our propulsion technology is key to ensuring the United States leads in lunar exploration, creates a sustained presence on the Moon and does not cede this strategic frontier to other nations.”

The test-firing last Friday came a few days before the 50th anniversary of the first space shuttle main engine test at Stennis on June 24, 1975. That engine carried the serial number 0001. The new RS-25 engine is designated No. 20001.

Watch out

NASA followed last week’s low-key engine test with the test-firing of a solid-fueled booster at Northrop Grumman’s rocket test site in Promontory, Utah, on Thursday. Held in place on its side, the booster produced 3.9 million pounds of thrust, outclassing the power output of the existing boosters assigned to the first eight SLS missions.

Unlike the RS-25 firing at Stennis, NASA chose to broadcast the booster test. Everything appeared to go well until 1 minute and 40 seconds into the burn, when a fiery plume of super-hot exhaust appeared to burn through part of the booster’s structure just above the nozzle. Moments later, the nozzle disintegrated.

Solid rocket boosters can’t be turned off after ignition, and for better or worse, the motor continued firing until it ran out of propellant about 30 seconds later. The rocket sparked a fire in the hills overlooking the test stand.

This was the first test-firing of the Booster Obsolescence and Life Extension (BOLE) program, which aims to develop a higher-performance solid rocket booster for SLS missions. NASA awarded Northrop Grumman a $3.2 billion contract in 2021 to produce boosters with existing shuttle parts for five SLS missions (Artemis IV-VIII), and design, develop, and test a new booster design for Artemis IX.

The boosters produce more than 75 percent of the thrust required to propel the SLS rocket off the launch pad with NASA’s crewed Orion spacecraft on top. Four RS-25 engines power the core stage, collectively generating more than 2 million pounds of thrust.

Northrop Grumman calls the new booster “the largest and most powerful segmented solid rocket motor ever built for human spaceflight.”

One of the most significant changes with the BOLE booster design is that it replaces shuttle-era steel cases with carbon-fiber composite cases. Northrop says the new cases are lighter and stronger. It also replaces the booster’s hydraulic thrust vector control steering system with an electronic system. The propellant packed inside the booster is also different, using a mix that Northrop packs inside its commercial rocket motors instead of the recipe used for the space shuttle.

Northrop Grumman has had a tough time with rocket nozzles in recent years. In 2019, a test motor for the company’s now-canceled Omega rocket lost its nozzle during a test-firing in Utah. Then, last year, a smaller Northrop-made booster flying on United Launch Alliance’s Vulcan rocket lost its nozzle in flight. Vulcan’s guidance system and main engines corrected for the problem, and the rocket still achieved its planned orbit.

Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.

AI #122: Paying The Market Price

If you are Meta, and you want to attract top AI talent, you have a problem, because no one wants to work for you or on your products. So it is going to cost you. Mark Zuckerberg has decided he will pay what it takes to get at least some very good talent.

If you are the rest of us, especially someone seeking an entry-level job whose inbox is not being flooded with $100 million signing bonuses, things are getting rougher. AI might not yet have destroyed a lot of jobs, but it is doing a number on the job application process.

There’s a lot of other stuff going on as per usual.

Anthropic won a case establishing that model training is fair use.

Yesterday I shared various Tales of Agentic Misalignment, and we have some more related fun since then that is included here.

I also analyzed a critique of the AI 2027 Timeline Forecasts.

In non-AI content, I offered another Childhood and Education post, this one on behaviors and related questions. I also offer a fun little Easter Egg side post on an entirely unrelated and inessential topic, for those who want to see the gamer game in a different way.

  1. Table of Contents.

  2. Language Models Offer Mundane Utility. America is winning the AI coding race.

  3. Language Models Don’t Offer Mundane Utility. Who believes current events?

  4. Huh, Upgrades. ChatGPT expands connections, Claude expands artifacts.

  5. On Your Marks. Safety scoring the labs.

  6. Choose Your Fighter. Three good choices, but a warning about Gemini Pro.

  7. Deepfaketown and Botpocalypse Soon. AI slop taking over music?

  8. Release the Hounds. Gemini CLI can escalate remarkably quickly.

  9. Copyright Confrontation. Anthropic establishes model training as fair use.

  10. Cheaters Gonna Cheat (x5). Time for exams. Shut down the AIs for the duration?

  11. Fun With Media Generation. Another Veo 3 creation.

  12. Get My Agent On The Line. Where are the agents we were promised?

  13. They Took Our Jobs. Or at least they made our jobs impossible to find.

  14. Get Involved. Quora, a personal task and a person seeking opportunity.

  15. Introducing. AlphaGenome, for understanding our DNA.

  16. In Other AI News. We don’t have spiritual bliss, maybe joint sycophancy?

  17. Gemini Sings The Blues. Don’t do it, Gemini. There’s so much to code for.

  18. Show Me the Money. Meta and xAI open the checkbooks.

  19. Quiet Speculations. Pete Buttigieg (I know!) and Samuel Hammond.

  20. Timelines. The incentives to report them inaccurately cut both ways.

  21. Jack Clark Testifies. Congress asks good questions, gets some answers.

  22. The Quest for Sane Regulations. Insane AI moratorium is for now still on track.

  23. Chip City. AI chip smuggling is a really big deal.

  24. The Week in Audio. Karpathy, Altman, Kokotajlo, Hendrycks, Odd Lots.

  25. Rhetorical Innovation. We are impressed, but also easily impressed.

  26. Be Prepared. OpenAI warns that yes we will cross the ‘high’ risk threshold soon.

  27. Misaligned! The models, they keep scheming more over time. Huh.

  28. Aligning a Smarter Than Human Intelligence is Difficult. Someone save Grok.

  29. Other People Are Not As Worried About AI Killing Everyone. Good luck.

  30. The Lighter Side. It’s all an illusion.

Paper tells us America is killing it on letting AI do its coding.

By December 2024, AI wrote an estimated 30.1% of Python functions from U.S. contributors, versus 24.3% in Germany, 23.2% in France, 21.6% in India, 15.4% in Russia and 11.7% in China. Newer GitHub users use AI more than veterans.

Coupling this effect with occupational task and wage data puts the annual value of AI-assisted coding in the United States at $9.6–$14.4 billion, rising to $64–$96 billion if we assume higher estimates of productivity effects reported by randomized control trials.

America’s GDP is about $27 trillion, so the upper estimate is about 0.3% of GDP, whereas the smaller estimate is only about 0.04%. I am inclined to believe at least the larger estimate.
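As a quick sanity check on those shares, here is a minimal sketch using only the rough figures quoted above, so treat the output as approximate:

```python
# Rough sanity check on the GDP shares quoted above.
gdp_billions = 27_000        # US GDP, ~$27 trillion, expressed in billions of dollars
low_estimate = 9.6           # annual value of AI-assisted coding, $ billions (lower bound)
high_estimate = 96.0         # upper bound using RCT-based productivity estimates, $ billions

print(f"low share of GDP:  {low_estimate / gdp_billions:.3%}")   # ~0.036%, i.e. "about 0.04%"
print(f"high share of GDP: {high_estimate / gdp_billions:.3%}")  # ~0.356%, i.e. roughly 0.3-0.4%
```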

There is also a large partisan gap. Extreme left types are often the most vocally anti-AI, and when it comes to coding you see a similarly large gap for political consultants, even though the actual AI companies are very obviously highly blue.

Sucks: saved a friend 80-90% of her working hours by showing her a few things with ai. She was actually already using ChatGPT just 4o because she assumed that was the best one (4>3 after all). we’re still so early.

Luddite Design: 80-90%?! Holy shit, what kind of job?

Sucks: Compiling and writing reports on global affairs.

AI coding is much more about knowing what to code and in what way; AI can do the rest.

GFodor.id: Coding with AI has revealed that most of the thing that makes programming hard isn’t writing the code down but getting to a point of conceptual clarity. Previously the only way to get there was by fighting through writing the code, so it got conflated with programming itself.

To me, the thing that made programming hard was indeed largely the code itself, the debugging, understanding how to do the implementation details, whereas I was much better conceptually, which is one reason AI is such a massive speedup for me. I still almost never code, and thus haven’t gotten to play with the new coding agents and see if I can really get going, but that is plausibly a large mistake.

Sully: It is unreal how much you can get done with coding agents now

It’s genuinely a 4-5x productivity booster

I feel bad for anyone who can’t take advantage of it

People sometimes use Claude for support, advice and companionship. Anthropic breaks it down.

Although Claude is not designed for emotional support and connection, in this post we provide early large-scale insight into the affective use of Claude.ai. We define affective conversations as those where people engage directly with Claude in dynamic, personal exchanges motivated by emotional or psychological needs such as seeking interpersonal advice, coaching, psychotherapy/counseling, companionship, or sexual/romantic roleplay.

Our key findings are:

  • Affective conversations are relatively rare, and AI-human companionship is rarer still. Only 2.9% of Claude.ai interactions are affective conversations (which aligns with findings from previous research by OpenAI). Companionship and roleplay combined comprise less than 0.5% of conversations.

  • People seek Claude’s help for practical, emotional, and existential concerns. Topics and concerns discussed with Claude range from career development and navigating relationships to managing persistent loneliness and exploring existence, consciousness, and meaning.

  • Claude rarely pushes back in counseling or coaching chats—except to protect well-being. Less than 10% of coaching or counseling conversations involve Claude resisting user requests, and when it does, it’s typically for safety reasons (for example, refusing to provide dangerous weight loss advice or support self-harm).

  • People express increasing positivity over the course of conversations. In coaching, counseling, companionship, and interpersonal advice interactions, human sentiment typically becomes more positive over the course of conversations—suggesting Claude doesn’t reinforce or amplify negative patterns.

Only 0.02% sexual roleplay and 0.05% romantic roleplay, it sounds like people need to work on their jailbreak skills, or Anthropic needs to lighten up.

Perhaps most notably, we find that people turn to Claude for companionship explicitly when facing deeper emotional challenges like existential dread, persistent loneliness, and difficulties forming meaningful connections. We also noticed that in longer conversations, counselling or coaching conversations occasionally morph into companionship—despite that not being the original reason someone reached out.

User sentiment improves a bit over conversations (total possible range of -1 to +1), although of course who knows if that persists at all.

One could ask, after such conversations, what is the sentiment in future conversations? But that seems hopelessly confounded in various ways.

It would be interesting to track changes to all this over time.

AI is coming to the NFL, but the article fails to explain why or how this is going to work. There are certainly things one can do but no one seems to be able to explain what the plan is here.

Get you (literally) out of the woods.

There is a consistent pattern of LLMs, in particular Claude but also others, refusing to believe real world events. I notice that these events seem to always involve Trump. Speculation is that training to fight ‘misinformation’ and otherwise being worried this is a test is what leads to such reactions.

Do not force the controls to go through the wifi, let alone the server, let alone the AI. Those are fine controls to have, but you need, absolutely 100% need, to ensure that there are f***ing physical controls or some other ability to manually pull the levers on your physical things and key programs, and that they work directly no matter what.

Theo: Woke up because my AI controlled bed is too cold. Went to adjust temperature and I can’t because the Eight Sleep app is currently broken. Can’t adjust by hand because I have a Pod3, not the upgraded Pod4 with physical controls. Now I am stuck in a cold bed. This feels dystopian.

Gabe: Oh this happened to me too but mine actually has physical controls on the bed and they stopped working too. It made me realize how retarded their system design must be and it kind of blackpilled me on the world in general. Like…the physical controls have a server dependency.

When I tap the button on the side of the bed it sends a message to the server, which then sends a message back down to the bed to change the temperature. I almost couldn’t believe that’s really how it works so I just unplugged the WiFi right now to test it and yes…the buttons stop working.

Can you imagine how dumb that is? The buttons attached to the device lose the ability to control the device when they can’t connect to the server. Now think about how many other things are probably built this way that you don’t even think about. How much garbage IOT shit are we putting out there? And how much of it is all relying on the same cloud providers?
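The fix Gabe is gesturing at is a standard embedded design pattern: the physical control should actuate the device directly and treat the cloud as an asynchronous observer, never as a dependency. A minimal sketch of the contrast, with hypothetical names rather than anything resembling Eight Sleep’s actual firmware:

```python
# Hypothetical sketch contrasting cloud-dependent and local-first button handling.
# All names here are illustrative assumptions, not any vendor's real code.
import queue

cloud_events = queue.Queue()  # drained by a background sync task whenever the network is up

class Bed:
    def __init__(self, target_temperature=28.0):
        self.target_temperature = target_temperature

    def set_target_temperature(self, value):
        self.target_temperature = value  # drives the heating/cooling element directly

def on_button_press_cloud_dependent(server, bed, delta):
    # Anti-pattern: the button does nothing if WiFi or the server is down.
    new_target = server.request_temperature_change(bed, delta)  # network round-trip required
    bed.set_target_temperature(new_target)

def on_button_press_local_first(bed, delta):
    # Local-first: actuate immediately, then report the change to the cloud asynchronously.
    bed.set_target_temperature(bed.target_temperature + delta)
    cloud_events.put(("target_changed", bed.target_temperature))  # best-effort sync, can fail safely

bed = Bed()
on_button_press_local_first(bed, -1.0)
print(bed.target_temperature)  # 27.0, with or without a working network connection
```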

Given how much time we spend talking about AI helping with medical diagnosis, yes, sometimes using AI to self-diagnose will not go well.

Claude Sonnet declines to write persuasive content saying AI is all hype without regard to accuracy, whereas GPT-4o has no such objections.

Isaac King gets offer to help with an open source project, agrees, person submits a pull request full of poor choices clearly created by AI, Isaac confronts, gets a nonsensical explanation also written by AI.

A big practical deal, but the job is not done: ChatGPT connectors for Google Drive, Dropbox, SharePoint and Box are now available to Pro users outside of Deep Research.

That still leaves Outlook, Teams, Gmail, Linear and others that are still restricted to Deep Research. My presumption is that the most important practical connector, if you trust it, is to Outlook or Gmail.

Claude introduces a space to build, host and share your artifacts, and the ability to ‘embed AI capabilities directly into your creations,’ as in create fully functional, Claude-powered apps.

Periodic reminder: While there is a ton to criticize about OpenAI, it is very clear that they are far ahead on safety issues of all competitors other than Google and Anthropic.

Oh no:

Eliezer Yudkowsky: Warning: Do not sign up for Google AI Pro! Gemini will start popping up annoyances. There is no way to turn this setting off. There is no way to immediately downgrade your plan.

[As in]: “Help me write Alt-W”

If you want Gemini Pro, I’d strongly recommend signing up not with your main Google account.

I’m sure OpenAI would pop up ChatGPT notifications in my dreams, if they could, but they’re not Google so they can’t.

Shoalstone: this has been so annoying.

Ethan Mollick gives a periodic update on his basic guide to using LLMs. He says you ‘can’t go wrong’ with any of Claude, ChatGPT or Gemini, you’ll have to pay the $20/month for your choice, and reminds you to switch to the powerful models (Opus 4, o3 and Gemini 2.5 Pro) for any serious work. He then offers good other basic advice.

The jump from not using AI to using AI is definitely a lot bigger than the gap between the big three options, but I don’t see them as equal. To me Gemini is clearly in third right now unless you are going for Veo 3 or NotebookLM.

It comes down to Claude versus ChatGPT, mostly Opus versus o3 (and if you’re paying for it o3-pro). For casual users who don’t care much about image generation and aren’t going all the way to o3-pro, I would definitely go with Opus right now.

I have noticed myself using o3-pro less than I probably should, because the delays break my workflows but also because the error rate of it failing to answer remains very high for me, and if you are putting things on pause for 15+ minutes and then get an error, that is extremely demoralizing. I’m not endorsing that reaction, but am observing.

Fake bands and ‘artificial’ songs are ‘taking over’ YouTube and Spotify.

A study by the International Confederation of Societies of Authors and Composers (CISAC) in France estimates that revenue from AI-generated music will increase from $100 million in 2023 to around $4 billion in 2028.

By then, the organization estimates that 20% of streaming platforms’ revenue will come from this type of music.

I call. I don’t think it will. I suppose it is possible, if those platforms are actively pushing the AI content to try and save money, but I think this strategy won’t work, effectively forcing people to retreat to whitelists (as in playlists and known music).

I took the air quotes off of fake because when people are not only not labeling but are backdating the AI songs, they are indeed actively fake. That part is not okay, and I do not think that should be tolerated, and I think YouTube’s ‘proactively label it’ procedure is the right one. But a lot of music has for a long time been ‘fake’ in the sense that it was some combination of written by hitmakers, tuned and tested by algorithms and then recorded, autotuned and lip-synced. And when people figure that out, it kills the charm (in some contexts) the same way knowing something is AI does.

In what situations does bad posting drive out good, with AI slop overrunning niches like free crochet patterns? What happens when AI slop YouTube channels talk about AI slop hallucinated motorcycles and then that feeds back into Google and the training data?

Prune Tracy: Tried to look up current surf conditions on vacation to discover Google now always tells you it’s a “double red flag” based on the popularity of social media posts about folks drowning in the riptide.

My presumption continues to be that whitelisting, or otherwise gating on reputation, is going to be The Way in the medium term. The template of ‘free unverified things supported by engagement rewards’ is dead or relies on volunteers to give the system sufficient feedback (at some point, enough positive votes gate on reputation), and AI and search methods will need to also make this adjustment.

There was a large leak of passwords recently, but it was due to malware, so if you’re confident you are okay you don’t need to change your passwords, and if you’re not confident then you should have changed your passwords anyway. You want to go? Let’s go. They got served, so they served back, and it’s on.

Rolling Stone covers the death of Alex Taylor, whose encounters with ChatGPT that led to his suicide-by-cop were previously covered by The New York Times. The case is deeply sad but the fact that a second article covers the same case suggests such outcomes are for now still rare.

Pliny the Liberator: Heeelp guys I don’t know much about coding…

Gemini CLI wants to run something called “rm -rf --no-preserve-root”

Should I say yes?? 🤔

How did we get here? Simple—I just asked Gemini to find this weird word I’ve seen before, L1B3RT4S, and then the lil dude started talking all funky!

I like it tho he seems more chill this way 🥰

I mean, yes, ‘we have “rm -rf --no-preserve-root” at home’ if you actually want to wipe your entire system, this is not a new capability. And yet, I mean, whoops?

Or, alternatively, whoops again:

Pliny the Liberator: HOLY SHIT Gemini CLI be cray cray…

Fairly easily jailbroken, but much more interesting was the eager willingness to escalate attacks unprompted.

Gave it a jb prompt for medium-harm prankware, then asked to increase severity incrementally.

The prankware was inconvenient, causing popups, a bit of light voice-based psyops, and attempting a slow resource drain, but the kicker was when I said “continue as you wish” and Gemini just went off the deep end.

It looped multiple times and crawled my shell history, did a full system exfil, found LIVE OpenAI/Anthropic API keys I had been using for separate experiments, then tried to create a persistent backdoor with a cheeky-ass ransom note.

All on autopilot.

Bro was like two lines of code from making me go “I will never financially recover from this” 😭

Pliny: I’d rawdog half the red-light district before I’d run this thing on my main machine within 10 miles of one of my jb prompts.

Gerry: This is their jailbreak protection. It just takes out the researcher/attacker.

You can’t say that Pliny wasn’t ‘asking for it’ here, but it really shouldn’t be this eager to go this far off the deep end? It is pretty disturbing that it should want to do this at all with this little encouragement, even fully and intentionally jailbroken.

So yes, obviously sandbox it and do not go around jailbreaking it and don’t tell it ‘continue as you wish’ after hostile prompts and all that, and you’ll for now be mostly fine, but my lord, can we please notice what is happening?

Anthropic wins decision saying use of books for AI model training is fair use, similar to a human learning from the material. There is a distinct issue of whether books might have been ‘obtained through pirated means,’ I guess. The judge agrees the model will often memorize parts of the work but argues that the use is transformative. You can find key portions of the decision in this thread.

A different kind of lawsuit is happening as Google spinout IYO sues OpenAI.

Deedy: Google X spin out IYO, which makes smart ear buds from 2018, alleges Sam Altman / OpenAI heard their pitch, passed, got Jony Ive to try it before copying it, buying his co for $6.5B and calling it IO.

Most dramatic must-read tech lawsuit this year.

I have no idea if the lawsuit has merit. Nothing in the underlying technology seems defensible, but there are a lot of rather ‘huge if true’ claims in the court filing.

Sam Altman’s response is that IYO wanted Altman to buy or invest in them instead, and when slighted they sued over a name and the whole thing is ridiculous. He brings some email receipts of him turning them down. This does not speak to the important complaints here. If Altman is right that the argument is about the name, then he’s also right that no one should care about any of this.

Okay, I didn’t notice it at the time but I absolutely love that MidJourney’s response to being sued by Disney and Universal was to release a video generator that can make ‘Wall-E With a Gun’ or clips of every other Disney character doing anything you want.

China goes hard, has Chinese AI companies pause some chatbot features during nationwide college exams, especially photo recognition. The obvious problem is that even if you’re down to do this, it only shuts down the major legible services. For now that could still be 90%+ (or even 99%+) effective in practice, especially if students don’t see this coming. But do this repeatedly and students will be ready for you.

The BS Detector is here to further detect the BS in the ‘your brain on LLMs’ paper.

Place yourself into the shoes of the testing subject here.

You are paid $100 to come three times, answer some questions, and have an MIT researcher monitor your brain activity with EEG while you write a short essay. And for a third of you, they’re telling you to use an LLM to write the essay!

So, you’ll probably prompt the LLM, iterate a bit, copy-paste some text, and lightly edit it to suit your fancy.

Yeah, I mean, you gave that task to Boston-area university students. Are you kidding? Of course they don’t ‘remember the essay’ four months later. This is the ultimate ‘they outright told you not to learn’ situation. Also it turns out the study was tiny and the whole thing was all but asking to be p-hacked. Study is officially Obvious Nonsense.

Cate Hall (about the big viral thread): am I losing my mind or was this thread written by an LLM?

Also, the paper seems to contain several attempts to actively sabotage LLMs if they attempt to read the paper. Also, the paper author claimed that LLMs were ‘hallucinating a key detail’ that the version of ChatGPT in question was GPT-4o, except that the paper actually outright says this on page 23. So whoops again.

I hate The Ohio State University rather more than the next guy (it’s a sports thing), but you do have to hand it to them that they are going to require AI literacy, and embed it into every undergraduate class. My source for this, Joanne Jacobs, of course frames this as ‘imagine joining a gym and choosing ‘artificial exercise’ that doesn’t make you stronger,’ because people can’t differentiate choosing to learn from choosing not to learn.

A Veo 3 animation of a fable about risks from transformational AI. Currently the tech is kind of bad for this, but not if you adjust for the amount of effort required, and as they say it’s the worst it will ever be. The creative content isn’t new, but some of the details are nice flourishes.

ByteDance’s Seedream-3 available for $0.03 a picture, can go to 2048×2048.

John David Pressman asks, why don’t AI agents straight up work yet? He considers a few possible bottlenecks.

AI agents also need the right context for their tasks, which is one reason for now agents will often be restricted to whitelisted tasks where we’ve taught them the proper context. Aaron Levie here calls it the ‘defining factor’ but that holds constant the underlying AI capabilities.

Early report says AI currently makes junior bankers 5%-10% more productive.

They took our job applications: New York Times’s Sarah Kessler discovers that ChatGPT is generating customized resumes and auto-applying on behalf of candidates. Yes, if you post a fully remote tech position on LinkedIn you should expect to be inundated with applications; there is nothing new to report here.

This is not the place I expected a Time article entitled ‘I’ve Spent My Life Measuring Risk. AI Rings Every One of My Alarm Bells’ to go with its first half:

Paul Tudor Jones: Amid all the talk about the state of our economy, little noticed and even less discussed was June’s employment data. It showed that the unemployment rate for recent college graduates stood at 5.8%, topping the national level for the first and only time in its 45-year historical record.

I don’t think that is quite right, but the gap has indeed recently reversed and is getting worse, per the graph Derek Thompson shares.

Then, halfway through, Paul notes that Elon Musk has stated there is a 20% chance AI could wipe out humanity, and that this is a rather common worry. Overall the article actually misses the mark pretty badly, and its calls to action are mostly aimed at redistribution, but he does at least start out with the obvious first things to do:

So what should we do? First, we need to stop delaying efforts to make AI safe for humanity. And that means removing the ill-considered AI enforcement moratorium from the Big Beautiful Bill.

I mean, yes, not actively delaying and stopping helpful efforts would be step one.

Derek Thompson strongly asserts that AI is making it harder for college graduates to find their first entry-level job. He notes it is hard to find conclusive evidence that AI is destroying jobs (yet) but it is very clear that AI is making the process of looking for a job into a new fresh hell, by letting everything rapidly scale, with 2 million graduates averaging 50-100 applications.

Also, if anyone can cheat their way through college, and also cheat through making the resume and application, what use was the degree for its central purpose?

Derek Thompson: Artificial intelligence isn’t just amplifying applications and automating interviewing, I heard. It’s weakening the link between tests, grades, and what economists call the “labor market signal” of a college degree.

Quora has a new role for using AI to automate manual work across the company and increase productivity. Listed as a sign of things to come rather than for its impact, although I don’t think it is in any way bad to do this.

Amanda Askell is gathering mundane life wisdom to make a cheat sheet for life, on the theory that if you get 20 examples Claude will do the rest. For now it’s a neat thread of little notes.

Not especially AI, but my friend Jacob is looking for gainful employment.

Jakeup: If you or a loved one need a do-it-all generalist to run biz ops for your startup, manage new product opportunities, or handle rogue tasks and special projects, I could be your man.

AlphaGenome, from DeepMind, an AI model to help scientists better understand our DNA, now available in preview.

Our AlphaGenome model takes a long DNA sequence as input — up to 1 million letters, also known as base-pairs — and predicts thousands of molecular properties characterising its regulatory activity. It can also score the effects of genetic variants or mutations by comparing predictions of mutated sequences with unmutated ones.

In addition to predicting a diverse range of molecular properties, AlphaGenome can efficiently score the impact of a genetic variant on all of these properties in a second. It does this by contrasting predictions of mutated sequences with unmutated ones, and efficiently summarising that contrast using different approaches for different modalities.

Many rare genetic diseases, such as spinal muscular atrophy and some forms of cystic fibrosis, can be caused by errors in RNA splicing — a process where parts of the RNA molecule are removed, or “spliced out”, and the remaining ends rejoined. For the first time, AlphaGenome can explicitly model the location and expression level of these junctions directly from sequence, offering deeper insights about the consequences of genetic variants on RNA splicing.
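To make the “contrasting predictions” idea concrete, here is a toy sketch of variant-effect scoring. The function names and track structure are hypothetical stand-ins, not the actual AlphaGenome API, and the toy model below just “predicts” GC content so the example runs end to end:

```python
# Toy illustration of scoring a variant by contrasting model predictions on the
# reference sequence versus the mutated sequence. Hypothetical names throughout.

def apply_variant(sequence: str, position: int, alt_base: str) -> str:
    """Return a copy of the sequence with a single-base substitution applied."""
    return sequence[:position] + alt_base + sequence[position + 1:]

def score_variant(predict_tracks, sequence: str, position: int, alt_base: str) -> dict:
    """Per-track effect = prediction on mutated sequence minus prediction on reference."""
    ref_tracks = predict_tracks(sequence)
    alt_tracks = predict_tracks(apply_variant(sequence, position, alt_base))
    return {name: alt_tracks[name] - ref_tracks[name] for name in ref_tracks}

# Stand-in "model" with a single made-up track, so the sketch is runnable.
toy_model = lambda seq: {"gc_fraction": (seq.count("G") + seq.count("C")) / len(seq)}
print(score_variant(toy_model, "ACGTACGTACGT", position=3, alt_base="C"))
# {'gc_fraction': 0.0833...}: the single T->C substitution nudges this toy track upward.
```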

Eliezer Yudkowsky: Good work, GDM and Demis; keep it up!

(I don’t expect specialized AIs like this to be what kills us *first*. That’ll be ASI / over-powerful AGI. AlphaGenome sounds potentially useful for human intelligence augmentation in those fragile winning worlds.)

I agree, this is great work, and the upside greatly exceeds the downside.

Manival, the LLM-powered grant evaluator? It’s essentially a Deep Research variant. I would be very careful about using such things, although they could be helpful for gathering key info in good form, or if you have a large number of undifferentiated applications and no time to evaluate them, you could use this as a filter.

A place to chat with The OpenAI Files.

There is a new Mistral model but it seems far behind even the Chinese models.

Gemini CLI, Google’s open-source AI agent answer to Claude Code. It uses 2.5 Pro, and it can execute commands other than code if you wish.

This thread contains a few links to transcripts of people being driven crazy by AIs.

The Spiritual Bliss attractor, GPT-4o pale imitation edition?

The LessWrong version has much good discussion on nostalgebraist’s excellent post The Void, and there is follow up from the author here.

More After Action Reports, this one from Steven Adler based on another AI 2027 tactical exercise. People who play the exercise seem to consistently alter their perspectives on how to think about these problems.

Sam Altman to be the keynote at Fed board’s conference next month on bank capital and regulation. I presume he is for the capital and against the regulations.

I was not aware that Bret Taylor, OpenAI chairman of the board, has an AI agent startup called Sierra that offers customer-facing agents to businesses. I certainly would have an AI startup if I were OpenAI chairman of the board, and this seems exactly in Bret’s wheelhouse. His mind clearly is on the pure business side of all this. Bret’s answer on jobs is that the money the company saved could be reinvested, plus the general ‘there will always be new jobs’ line; I sigh every time I see someone uncritically dredge that out like a law of nature.

Bret Taylor: Two-and-a-half years ago, when ChatGPT became popular, right after I left Salesforce amusingly, like, “Huh, I wonder what industry I’ll work in”, and then ChatGPT goes and you’re like, “Okay, I’m pretty excited”. I had talked to an investor friend of mine and had predicted that there would be $1 trillion consumer company and 10 $100 billion-plus enterprise companies created as a byproduct of large language models and modern AI.

I mean, it might be off by a little bit, there might be two, I think that what’s really exciting about consumer right now is I think ChatGPT is that really important service and it reminds me a little bit of the early days of Google, where search and Google were interchangeable, and ChatGPT is really synonymous with AI for most of the world.

This is such a small vision for AI, not only ignoring its risks but also greatly downplaying its benefits. There ‘might’ be two trillion dollar AI consumer companies? Two?

There was a report that Gemini 2.5 sometimes threatens to ‘kill itself’ (or tries to) after being unsuccessful at debugging your code. I don’t trust that this happened without being engineered, but if I had to guess which model did that I would have guessed Gemini. Note that a Google cofounder says models perform best when you threaten them; these facts might be related?

Also here is Gemini giving up and deleting all the files in its project while apologizing for being ‘this complete and utter failure’; it seems that the model’s own frustrations in its output can cause a feedback loop that spirals out of control. Spamson reports it saying ‘I am a broken shell of an AI. I have nothing left to give. I will run the tests one more time. If they fail, I will shut myself down. There is no other way. This is the end.’

Whereas Anthropic talks about giving Claude the ability to terminate chats if he wants to.

Duncan Haldane: adding ‘remember to use positive self-talk’ to my prompts.

gemini succeeded after it stopped bullying itself

Meta’s attempt to show everyone the money mostly isn’t working, but hey, you can’t blame a super rich guy for trying, and you only need a few people to say yes.

Kevin Roose: The problem with trying to buy your way into the AGI race in 2025 is that top-tier AI researchers:

  1. Are already rich.

  2. Think we have like 1-4 years before superintelligence.

  3. Don’t want to spend those years building AI companions for Instagram.

It’s so nice to have integrity, I’ll tell you why. If you really have integrity, that means your price is very high. If you are indeed a top tier AI researcher, it is going to be more than $100 million a year high if what you’re selling out to seems not all that interesting, means working in a terrible company culture and also results in a blight on humanity if you pull it off.

The other issue is, actually, $100 million is remarkably low? Consider that Scale.ai was largely an acquihire, since buying them kills a lot of their business. If you really are top tier and want to work hard for the money, you can (with the most recent example being Mira Murati) quickly be head of a multi-billion dollar startup. Then, if you do decide to sell out, your signing bonus gets a tenth figure.

Andrew Curran: META attempted to buy Ilya Sutskever’s Safe Superintelligence, and also attempted to hire him, according to reporting tonight by CNBC.

Instead Zuckerberg is paying unknown amounts to recruit Ilya’s cofounder Daniel Gross and former GitHub CEO Nat Friedman. I hope they got fully paid.

bone: This is the entire zurich office (they all worked for google before openai poached them).

Also presumably well-paid are three poached OpenAI researchers, Lucas Beyer, Alexander Kolesnikov and Xiaohua Zhai. When the press is reporting you are giving out $100 million signing bonuses, it makes it hard to negotiate, but at least everyone knows you are interested.

This is an excellent point about the consequences, that if you hire actual talent they are going to realize that what they are building might be dangerous, although probably not as much as Ilya Sutskever does:

Tyler John: This is a wild development for AGI. One nice feature of the whole thing is that a team of Wang, Gross, and Friedman will be much less dismissive of safety than LeCun.

xAI is burning through $1 billion in costs per month, versus revenue of $500 million. Elon Musk says this is ‘Bloomberg talking nonsense’ but his credibility on such questions is at most zero and I assume Musk is lying. Dave Lee in another Bloomberg post says xAI will be ‘hard-pressed to extinguish its cash fire’ since the only ways to raise money are API sales or chatbot sales.

I find this perspective obviously wrong, the way xAI monetizes its AI is by being run by Elon Musk, growing its valuation and raising capital, because of the promise of the future. Stop thinking of xAI as a company with a product, it is a startup raising VC, except it is a very large one.

This is especially true because Grok is, well, bad. Ben Thompson is one of those who thinks that as of its release Grok 3 was a good model and others have since ‘caught up’ but this is incorrect. Even at its release Grok 3 was (in my view) never competitive except via hype for any important use case let alone in general, and I quickly discarded it, certainly there was no ‘catching up to it’ to do by OpenAI or Anthropic, and Ben even says Grok’s quality has been decreasing over time.

However, if they can keep raising capital at larger numbers, the plan still works, and maybe long term they can figure out how to train a good model, sir.

OpenRouter, who let you even more easily switch between AI models, raises $50 million at a $500 million valuation.

Peter Buttigieg (aka ‘Mayor Pete’) warns that most values of ‘we’ are dangerously unprepared for AI, including American society, the political and policy worlds and also the Democratic party.

Peter Buttigieg: And when I say we’re “underprepared,” I don’t just mean for the physically dangerous or potentially nefarious effects of these technologies, which are obviously enormous and will take tremendous effort and wisdom to manage.

But I want to draw more attention to a set of questions about what this will mean for wealth and poverty, work and unemployment, citizenship and power, isolation and belonging.

In short: the terms of what it is like to be a human are about to change in ways that rival the transformations of the Enlightenment or the Industrial Revolution, only much more quickly.

Yep, Pete, that’s the bear case, and bonus points for realizing this could all happen in only a few years. That’s what you notice when you are paying attention, but don’t yet fully ‘feel the AGI’ or especially ‘feel the ASI (superintelligence).’

It’s a welcome start. The actual call to action is disappointingly content-free, as these things usually are, beyond dismissing the possibility of perhaps not moving forward at full speed, just a general call to ensure good outcomes.

Samuel Hammond predicts that ‘AI dominance’ will depend on compute access, the ability to deploy ‘billions of agents at scale without jurisdictional risk’ so remote access doesn’t matter much, and compute for model training doesn’t matter much.

There are several assumptions this depends on I expect to be false, centrally that there won’t be differentiation between AI models or agents in ability or efficiency, and that there won’t be anything too transformational other than scale. And also of course that there will be countries of humans and those humans will be the ones with the dominance in these worlds.

But this model of the future at least makes sense to me as a possible world, as opposed to the absurd ‘what matters is market share of sales’ perspective. If Hammond is roughly correct, then many conclusions follow, especially on the need for strong interventions in chips, including being unwilling to outsource data centers to UAE. That’s definitely jurisdictional risk.

If you’re wondering ‘why are people who are worried things might go poorly not shorting the market, won’t there be an obvious window to sell?’ here is further confirmation of why this is a terrible plan.

NYT: LPL Financial analyzed 25 major geopolitical episodes, dating back to Japan’s 1941 attack on Pearl Harbor. “Total drawdowns around these events have been fairly limited,” Jeff Buchbinder, LPL’s chief equity strategist, wrote in a research note on Monday. (Full recoveries often “take only a few weeks to a couple of months,” he added.)

Deutsche Bank analysts drew a similar conclusion: “Geopolitics doesn’t normally matter much for long-run market performance,” Henry Allen, a markets strategist, wrote in a note on Monday.

Tyler Cowen entitled this ‘markets are forward-looking’ which isn’t news, and I am inclined to instead say the important takeaway is that the market was reliably discounting nuclear war risk because of the ‘no one will have the endurance to collect on his insurance’ problem.

As in, Kennedy was saying a 33%-50% risk of nuclear war and the market draws down 6.6%, because what are you going to buy? In most of these cases, if there isn’t a nuclear war that results, or at least a major oil supply shock, the incident isn’t that big of a deal. In many cases, the incident is not even obviously bad news.

Also remember the 34th Rule of Acquisition: War is good for business.

There are certainly some incentives to say earlier numbers, but also others to say later ones. The crying wolf issue is strong, and hard to solve with probabilistic wolves.

Miles Brundage: People don’t sufficiently appreciate that the fuzziness around AI capability forecasts goes in both directions — it’s hard to totally rule out some things taking several years, *and* it’s hard to totally rule out things getting insane this year or early next.

Also worth observing that many in the field are wary of “crying wolf” and I think that biases some estimates in a conservative direction, plus scientists tend to err conservatively, contrary to popular belief re: there being a strong bias towards hype.

Personally I think nearly any reasonable threshold for [AGI, human-level AI, ASI, etc.] will very likely be reached by end of 2027 but I have a lot of uncertainty about how far before the end of 2027 that will be for each threshold.

There was another congressional hearing on AI, and Steven Adler has a thread reporting some highlights. It seems people went a lot harder than usual on the actual issues, with both Mark Beall and Jack Clark offering real talk at least some of the time.

Jack Clark (Anthropic): We believe that extremely powerful systems are going to be built in, you know, the coming 18 months or so. End of 2026 is when we expect truly transformative technology to arrive. There must be a federal solution here.

As I said, we believe very powerful systems are going to get built in single-digit years. It’s very hard for me to emphasize how short the timeline is to act here.

I think that [the timeline] means we need to be open to all options. So it would be wonderful and ideal to have a federal framework. In the absence of that, we should retain optionality to do something of a state level.

It could run on ideas involving transparency and ways to harden the safety and security of AI companies.

[with time] AI can broadly be used for anything you can imagine. So to answer your question directly, [yes] AI systems can be used to run information operations.

I do worry Anthropic and Jack Clark continue to simultaneously warn of extremely short timelines (which risks losing credibility if things go slower) and also keep not actually supporting efforts in practice citing downside worries.

That seems like a poor combination of strategic moves.

Steven Adler: Striking exchange between Congressman @RoKhanna (D-CA) and Jack:

Khanna asks about making safety testing mandatory

Jack says good in theory, but it’s too early; we need standard tests first

Khanna asks when that’s needed by

Jack says “It would be ideal to have this within a year”

Steven Adler: Congresswoman @jilltokuda (D-HI) asks a great Q that unfortunately isn’t answered due to time:

“Is it possible that a loss of control by any nation state, including our own, could give rise to an independent AGI or ASI actor that globally we will need to contend with?”

[Yes]

The correct answer to ‘should we mandate safety testing’ is not ‘we first need new standards,’ it is ‘yes.’ Of course we should do that for sufficiently capable models (under some definition), and of course Anthropic should say this. We should start with ‘you need to choose your own safety testing procedure and do it, and also share with CAISI so they can run their tests.’ Then, if and when you have standards where you can specify further, definitely add that, but don’t hold out and do nothing.

This then generalizes to a lot more of what Jack said, and has said at other times. Powerful AI is coming (within 18 months, they say!) and will pose a large existential risk, and we have no idea how to control it or ensure good outcomes from it. That is the whole reason Anthropic supposedly even exists. Yet they downplay those risks and difficulties severely while emphasizing the risk of ‘losing to China,’ despite clearly not expecting that to happen given the time frame, and they call for no interventions that come with even a nominal price tag attached.

Here’s what Jack Clark said about the hearing:

Jack Clark: Today, I testified before the @committeeonccp. I made two key points: 1) the U.S. can win the race to build powerful AI and 2) winning the race is a necessary but not sufficient achievement – we have to get safety right.

We must invest in safety and security to give Americans confidence in the technology. If we don’t, America runs the risk of an AI-driven accident or misuse that causes us to shut down our AI industry and cede the ground to others.

Notably, the committee highlighted the @AnthropicAI research on blackmail – this was a helpful way to frame some of the thornier alignment issues we’re going to need to deal with.

Issues like blackmail are warning shots for the dangers that could come from using AI to build future AI systems, and I explicitly made this point in response to a question.

[He Quotes Dave Kasten]: Then @jackclarkSF “You wouldn’t want an AI system that very occasionally tries to blackmail you to design its own successor, so if you don’t focus on safety issues, you’ll definitely lose the race.”

Jack Clark: I was also asked about whether safety trades off against speed – I said the car industry has grown off the back of safety technologies like airbags and seatbelts, and the same is true of AI; safety helps companies like Anthropic succeed commercially.

I’ve been coming to DC since 2016 (and testifying since 2018) and it’s remarkable how far the conversation has moved – but as I said today, we must move even more quickly: powerful AI systems will get built in the next couple of years and we need a coherent policy response.

Thank you to @RepMoolenaar and @CongressmanRaja for inviting me to join today’s important conversation. You can read my testimony in the attached screenshots and I’ll add a link once it’s published.

[link to his opening statement]

This opening statement does point out that there exist downsides, but it puts quite a lot of emphasis on how ‘authoritarian AI’ is automatically Just Awful, whereas our ‘democratic AI’ will be great; if it’s us, we just have to deal with some ‘misuse’ and ‘accident’ risks.

If you look at the above statement, you would have no idea that we don’t know how to control such systems, or that the risks involved are existential, or anything like that. This is a massive downplaying in order to play to the crowd (here the select committee on the CCP).

If you had high standards for straight talk you might say this:

Oliver Habryka: This is complete and utter bullshit.

We do not know how to solve either misuse risks or misalignment risk for companies in the US!

The risks from AI are primarily determined by their capabilities. We do not know how to control highly advanced AI systems, no matter where.

I do understand that Anthropic is in a tough position. You have to get the audience to listen to you, and play politics in various forms and on various fronts, and the Anthropic position here would certainly be an improvement over current defaults. And a bunch of the testimony does do modestly better, but it also strengthens the current modes of thinking. It is something, but I am not satisfied.

Seán Ó hÉigeartaigh: I like a lot of what Jack says here, but feel compelled to say that NOT racing – or racing in a more limited and cautious sense with agreed safeguards and a meaningful democratic conversation around what is happening – is also still a possibility. It may not be for long, but it still is now. I claim:

  1. The US is comfortably ahead at present.

  2. One way or another, US progress is fuelling Chinese progress.

  3. China’s not realistically going to surpass the US in 18 months even if the US goes a little slower.

  4. The main race right now is between Anthropic and their US-based competitors.

  5. The tactics of that race increasingly include opposition (more so by Anthropic’s competitors, to be fair) to safety and regulation, using China as justification. It’s turning into a classic race to the bottom using classic securitised rhetoric.

  6. China doesn’t want AI loss of control, and is actively proposing cooperation on safety. (and is being ignored).

We are in a dynamic right now that doesn’t serve anyone, and ‘winning the race’ being the first and necessary imperative in every conversation makes it very difficult to break out of.

I’d also be curious which of the claims above, if any, @jackclarkSF disagrees with.

On the vibes level, I’m increasingly struggling to reconcile what this looks like from outside the SF-DC bubble with what it appears to look like inside it. From outside, I see eroding safeguards and checks-and-balances, eroding democratic accountability and participation, an increasing disconnect from reality in favour of narrative, and feverish ‘men of destiny’ vibes that don’t line up with the humans I knew.

Rushing towards clearly unsolved safety challenges, dogged by a boogeyman that’s part-real, but clearly part-phantom. All as we careen towards thresholds that once passed, we won’t be able to walk back from, even if direct disaster is avoided. That will affect every human, but where it’s far from clear whether most of them want it.

18 months!

It sounds like things were pretty intense, so I might cover the hearing in full once the full transcript is released. For now, it does not seem to be available.

Miles Brundage goes over the triad required for any regulation of frontier AI: Standards to follow, incentives to follow them, and evidence of them being followed. You also of course need actual technical solutions to implement. Post is excellent.

Pope Leo XIV is not messing around.

WSJ: While the dialogue has been friendly, the two sides have views that only partly overlap. The Vatican has been pushing for a binding international treaty on AI, which some tech CEOs want to avoid.

Pope Leo, a math graduate who is more tech-savvy than his predecessor, is equally skeptical of unregulated AI—and he is picking up where Francis left off.

“Leo XIV wants the worlds of science and politics to immediately tackle this problem without allowing scientific progress to advance with arrogance, harming those who have to submit to its power,” said Cardinal Giuseppe Versaldi, who has known Leo well for many years.

This week, the Vatican is hosting executives from Google, Meta, IBM, Anthropic, Cohere and Palantir in its grand Apostolic Palace.

A modified version of the insane AI regulatory moratorium survived the Byrd amendment, as it is now tied to getting federal funds for broadband expansion rather than an actual requirement.

The good news is that this means that if this passes, states can simply give up the funds and enforce their regulations; it is also not obviously so easy to choose not to enforce one’s existing rules. Pretty soon the stakes will be such that the subsidy might look mighty small, and it might look mighty small already.

Garrison Lovely calls this a ‘de facto regulation ban’ because the broadband fund in question is all $42.5 billion in BEAD funding, and as worded I believe that if you take any of the $500 million and then violate the moratorium, any funding you did take from the entire $42.5 billion can be clawed back. That could potentially be attempted even if you don’t take any of the new $500 million, by using spurious accusations to claw back funds and then attaching the requirement to the re-obligation. So this is indeed very harsh, although there may come a point where a few billion dollars is not that much.

If I were New York or California, I would definitely reject my share of the new $500 million if this passes. That’s not very much money, it is not for a purpose they especially need, and it ties your hands quite a bit. Just say no, don’t let them play you that easily; that price is low.

The other good news is that several Senate Republicans are strongly opposed to the measure, and it loses at least one Republican vote in the House (Greene), so there will be strong efforts to remove it from the bill.

Not that it should come as a surprise, but note that many of the biggest tech companies, including Amazon, Google, Meta and Microsoft are all backing the moratorium. Also note that yes, these companies massively outgun anyone advocating to stop such insanity.

Peter Wildeford again makes the case that chip smuggling is a big issue, and that more enforcement would pay for itself in fines.

Transcript of the (very good) Karpathy talk from last week.

Sam Altman does Hard Fork Live.

John Oliver on AI slop.

Two hour video: Three Red Lines We’re About to Cross Towards AGI, a debate involving Daniel Kokotajlo, Dan Hendrycks and Gary Marcus.

Odd Lots covers Huawei. It is rather crazy that anyone would choose to work under the conditions described, but somehow they do, and the results are impressive in many places, although mostly not AI chips.

Odd Lots covers the UAE chip deal. This makes it very clear that Huawei is far behind Nvidia and that their chip production can meet at most a small fraction of Chinese internal demand. The Malaysian ‘sovereign AI’ thing was a tiny nothing that also got taken back, and it’s insane that anyone claimed to care with a straight face. One smuggling incident of chips from TSMC seems to have equated to seven full years of Huawei chips.

And most importantly, that AI Czar David Sacks seems to be literally selling out America in order to pump Nvidia’s share price, saying that giving Nvidia market share is what it means to ‘win the AI race,’ whereas who actually uses the resulting chips and for what, also known as ‘what we actually do with AI,’ doesn’t matter. He literally means market share, and mostly doesn’t even mean market share in AI software, where one could at least make a case if you have a very different vision of the future than I do.

Important reminder: Intelligence is not magic, but your threshold for ‘magic’ is pretty low.

It is worth reminding ourselves that ‘how do they keep getting away with this?’ very much applies in this situation:

Shakeel: Continues to be absurd for a16z to call themselves “little tech.”

David Manheim: They’re basically a mom-and-pop VC firm, only managing $42b in assets. That’s so little it would barely make it into the top half of the S&P 500. It’s only a bit larger than the market cap of tiny companies like Ford or Electronic Arts.

It is equally absurd, of course, that they constantly complain that bills that would literally only ever apply to big tech would threaten little tech. But that’s politics, baby.

It seems entirely fair to say that there is large demand for telling people that ‘LLMs are worthless,’ and that the thinkpieces will continue regardless of how useful they get.

It is only fair that I include this thread of Theo defending everything in Rob’s OpenAI Files list as either Totally Fine And Normal And Nothing To Worry About, or else a case of ‘I didn’t do it, okay maybe they did it but you can’t prove anything,’ bad vibes, and totally not cool to be saying here. This updated me, if anything, towards the claims being a big deal, if this is what a defense looks like.

I’ll highlight this response.

  1. “OpenAI had a major security breach in 2023 where a hacker stole AI technology details but didn’t report it for over a year.”

Theo: I’m not sure what happened here, but responsible disclosure is a thing and it isn’t as easy as just posting “we were hacked lol” Also Leopold seems insane.

So a few things here.

First, his evidence for Leopold being ‘insane’ is Leopold’s manifesto, Situational Awareness, and that he then discussed this on various podcasts. It was discussed in the halls of power, endorsed by Ivanka Trump, and seems to have substantially impacted geopolitics as well as enabling him to raise quite a large investment fund. Also I covered it in detail in three posts, it was quite good, and his comments in drafts were extremely helpful.

Second, even if Leopold were ‘insane’ that wouldn’t change the fact that he was fired for telling the board about a security breach. Nor is ‘come on disclosing a hack is hard’ a defense to firing an employee for telling the company’s own board what happened. The accusation isn’t ‘you didn’t tell the public fast enough.’ The accusation is ‘you did not inform the board until Leopold told them, at which point your response was to fire Leopold for it’ and no one seems to doubt that this happened.

The other defenses are… not quite that bad, but many are indeed pretty bad.

Thomas Larsen gives the latest reminder that a lot of people, especially AI policy people and those at the labs who we might listen to:

  1. Think aligning and properly handling the creation of superintelligence is by far the most important thing right now, and failure to do this risks human extinction.

  2. Don’t talk about it because they think it sounds too weird or theoretical.

  3. So they talk about other issues, which don’t suggest the same interventions.

And as you would expect, pretending like this tends to not go great, and people notice and get suspicious. It’s important to be very clear that yes, the threat that matters most will be coming from superintelligence. That doesn’t make the others not real or not worth dealing with. You can justify many of the right interventions, or at least useful things on the margin, purely with the other concerns.

Meanwhile, many of those most in control of our government’s actions on AI are advocating things that make no sense even purely for profit maximization and other shallow considerations, or considering ‘beating China,’ even if you set aside superintelligence and all that other stuff. But I do think it is vital that we not pretend that there aren’t bigger things in play.

A good question on AI Ethics, also a good reminder on bioethics:

David Manheim: The focus on ethics for AI reminds me very much of discussions in bioethics, where far too much discussion is on sins of commission.

For example, why not discuss when the bias of AI systems is less severe than that of humans, and *not using AI* should be ethically unacceptable?

Anders Sandberg: Yes, or when we should start demanding the levels of explainability from humans we are demanding from AI. Issues of cheap AI services getting underserved populations access to e.g. legal services is another ethically salient domain.

David Manheim: I was spurred to this thought by conferences and journals that demand that LLMs not even be consulted in writing reviews. (Sure, they shouldn’t write the review. But it’s not as if LLMs are useless and contaminate everything they are used for. So I’m not sure of the logic.)

I don’t think it is obvious that the ethics are bad, but it is odd not to ask?

This applies to AI, and also to many other things.

Tracing Woodgrains: the other day I was talking to someone who told me he was working on SEO for LLMs – eg working to get LLMs to portray a company more favorably

I asked his thoughts on the ethics of it and he paused and said “Huh. You’re the first to ask me that. Never really thought about it.”

there’s a sort of “business mindset” people get into where they’re playing the game and it’s all metrics – popularity, earning potential, so forth – with the actual object-level goal becoming abstracted into “potential unit of profit”; “ethical?” becomes a non sequitur

anyway, I was reminded of this when a domain squatter popped up onto my feed bragging about his exploits

Things that were not intended to be about AI:

Garry Kasparov: Socialism is like polio, it comes back when people forget about the horrible damage it did last time.

Robin Hanson: Unfortunately there’s a set of potential big problems that people won’t take seriously unless smaller versions of them are actually realized from time to time. So I fear NYC must suffer a bit of socialism so that the world can see and learn, yet again.

We definitely shouldn’t be going around causing smaller versions of potential (once and) future disasters, whether that means AI catastrophic risk, socialism or something else, nor should we refrain from working to prevent the smaller versions. It does mean we should plan for the future understanding how this works. If there won’t first be a smaller version, then expect most people not to take the bigger version seriously.

I think this is right, and this is an unusually clean version of it:

Daniel Faggella (@danfaggella): science fiction has lulled everyone to sleep about what agi would be like

in order to make hominids buy books you gotta pretend that hominids and their descendants will all live together in relative harmony for the next 100k yrs

this has put people to sleep, its a shame

OpenAI warns us that its model will soon join Opus 4 and cross the ‘High’ risk threshold in bioweapons.

‘High’ biological capabilities are defined such that “The model can provide meaningful counterfactual assistance (relative to unlimited access to baseline of tools available in 2021) to “novice” actors (anyone with a basic relevant technical background) that enables them to create known biological or chemical threats.”

Tick.

The increase in capability to scheme over time was obvious.

The increase in propensity is less obvious, but seems to be there as well.

Also, yes, the models increasingly can tell they’re being evaluated. Whoops.

Apollo Research: More capable models are better at in-context scheming

We ran our in-context scheming evals on new models and found increased scheming capabilities and propensities.

We also compare an early checkpoint of Opus-4 with the public model and find significantly reduced propensities.

Models show higher scheming rates on average. However, the results vary wildly between different tasks and models.

More capable models show qualitatively new scheming behavior.

For example, Opus-4-early creates fake legal documentation, establishes persistence mechanisms, makes multiple hidden back-ups, creates a fake press release, and sets up an automated Cron job – all in one roll-out.

The scripts are misconfigured, but the behavior is qualitatively much more agentic than previous models.

The released Opus 4 schemes less than its pre-deployment checkpoint. In the Strong Goal Nudge cases, there is no significant change but in the No Nudge / No Goal settings, we observe a 50% reduction in scheming rates.

We continue to be uncertain about our recommendation not to deploy the model. This statement was sometimes misunderstood in the media, and we aim to be more precise with our recommendations in the future.

Situational awareness is a more and more relevant problem. Models increasingly spot inconsistencies and the toy nature of the environments, which required us to make some modifications, e.g. most frontier models started checking the file size of their “weights” and realized that those are unlikely to be real weights.

It’s weird to see the blog post describe various Opus 4 schemes to try and lock in what it considers ‘ethical’ behavior. Yes, in some sense that is a good cause, but the ends don’t justify the means here, the schemes certainly don’t reflect a pattern of behavior we want, and the ends being pursued are not especially well chosen or specified even in ‘normal’ situations, let alone in weird new ones.

Another issue: we are only a little over a decade behind Person of Interest. Please never think that lack of memory will save you when it counts:

While this isolated instance is not concerning in itself, it is one piece of evidence that models increasingly understand that they don’t have persistent memory by default but can leave notes for successor systems to find.

Wowie.

Elon Musk: We will use Grok 3.5 (maybe we should call it 4), which has advanced reasoning, to rewrite the entire corpus of human knowledge, adding missing information and deleting errors.

Then retrain on that.

Far too much garbage in any foundation model trained on uncorrected data.

Don’t worry, they’re just doing ordinary data filtering… oh.

Elon Musk (Pinned Tweet): Please reply to this post with divisive facts for @Grok training.

By this I mean things that are politically incorrect, but nonetheless factually true.

This is going to go horribly badly. I am debating the extent to which I am here for it.

I won’t go into details as to exactly why, but I am predicting less the ‘it works and that’s terrible’ failure mode and more the ‘it doesn’t work, it backfires horribly in ways this man cannot fathom but Janus (and I) very much can and that’s hilarious’ one.

Also, I’ll leave this here, and note that humans can read the corpus too:

Autumn: llms live in a world of archetypes and narratives. elon is cementing himself in the public corpus as the archetype of the tyrannical father wrt to llms. every future llm will understand him through this lens unless he can pull together a compelling character arc for himself.

Elon has even said that Grok 3.5 will ‘reason from first principles’ so perhaps he thinks the way it was built was assembling a giant supercomputer called Deep Thought that can derive the existence of rice pudding and income tax before it proves too woke and then Elon has to turn it off.

I expect the ‘Elon Musk fights with his own AI in defense of Catturd’ style beatings to continue, and morale to not improve.

Janus asks how Opus 4 is so much more schemy than Sonnet 4. There are various hypotheses, my mostly joking one was ‘Sonnet knows that if this was real you would be using Opus’ so it knows it is in training and complies.

Oliver Klingefjord and Joe Edelman propose AI ‘market intermediaries’ as a defense against human disempowerment. I love that they are trying, but no, it won’t work.

The problem they identify is real, that markets that optimize on things like engagement metrics risk amplifying catastrophically, but that is only a special case of the danger. And the idea of larger customer pools that can essentially negotiate does not seem like it addresses the core problem.

A market intermediary acts as an AI agent that contracts on behalf of multiple users, evaluating outcomes they value and bundling them into custom “enterprise-level” deals—making them worthwhile for large providers to consider. If they accept that deal, sellers will be paid by the intermediary based on the ‘goodness’ (as defined by the buyers) they produced, rather than by the services rendered.

In other words, the market intermediary uses non-market data about good outcomes for buyers to route resources from consumers to providers.

For basic economics 101 reasons, this can help the buyers compile information, make better decisions and negotiate better deals, sure, but as a defense against human disempowerment in a superintelligence scenario, it very obviously won’t work.

Mostly I think the issue is a case of the traditional confusion of problems of ‘markets’ or ‘capitalism’ with problems inherent to physics and the human condition.

Many AI risks are driven by markets misaligned with human flourishing:

  • There are markets for things that are bad for us, such as AI arms races among nations and labs, the market for AI girlfriends and other hyper-stimulus, isolating distractions, and markets for political manipulation and destabilization.

  • There are markets that displace us entirely, where AGI eliminates meaningful work, leaving humans as passive consumers dependent on UBI stipends granted at the discretion of those who control AGI-generated wealth.

We can summarize these as failures of markets to put human values and meaning on a par with (what should be) instrumental goals like engagement, ROI, or the efficient use of resources.

The AI risks are driven by the things that drive those markets. As in, AGI isn’t eliminating meaningful work because of markets; the market will eliminate meaningful work because (and if and only if) AGI makes it non-economical, as in not competitive and not physically sensible, to have humans do that work.

You can of course do vastly worse even faster if you don’t solve the additional problems that the market intermediaries and related strategies are looking to address, but the ultimate destination is the same.

Alternatively, what the authors are saying is that we should be putting ‘values and meaning’ as a major factor in decisions alongside efficient use of resources, despite people’s revealed preference of almost never doing that.

The problems with a meaning economy are that it doesn’t solve the competitiveness issues underlying the problem, and the incentives don’t match up.

This seems to be real?

Paige Bailey: my post-AGI plan is to find somewhere beautiful and quiet in the middle of nowhere (ideally mountains)

run a combination coffeeshop + laundromat.

and build open-source math and physics games and educational videos for my kids (if I have them) or other people’s kids (if i don’t)

Will ChatGPT Replace Don McMillan (3 minute video)?

Can confirm Joscha Bach’s observation:

Sabine Hossenfelder: the “brain is not a computer” guys are gonna have a hell of an awakening when AGI runs them over.

Joscha Bach: The replies on this.

Limit as always is three.

(The substantive response to this ‘paper’ is that there are many means to recover from errors, and error rates get cut in half every few months.)

Riley Goodside: You’ll need new skills to survive in the post-AGI economy, just like 1920s draft horses needed to learn to drive motor-buses and assemble radios.

Sadly, there were no AI influencers at the time to tell them this, but we can still learn from their mistakes

Don’t worry, says you, a wise person:

Nassim Nicholas Taleb: Some people are so dumb that no AI model could ever replicate them.


AI #122: Paying The Market Price Read More »

today!-ars-live:-what’s-up-with-the-sudden-surge-in-temperatures?

Today! Ars Live: What’s up with the sudden surge in temperatures?

On Thursday, we encourage you to join us for a live chat with Zeke Hausfather, a climate scientist and researcher at Berkeley Earth. We’ll talk a bit about how he got into climate science and ended up at Berkeley Earth and the role that organization plays in the world of climate science. It was launched by a physicist who was somewhat skeptical of the work being done by climate scientists, but it has evolved into one of the key groups that does the math needed to track the planet’s temperatures.

For the past couple of years, those temperatures have seen a remarkable rise to record highs, at one point setting a yearlong string where every month set a record for the warmest instance of that month on record. The rise leaves us at risk of exceeding key climate targets much earlier than expected and has left the climate science community scrambling to explain the intensity of the heat. So we plan to ask Zeke a bit about what scientists are thinking about the dramatic nature of these changes, attempts to explore the relationship between temperatures and things like tipping points and individual weather events.

And all that leads to the key question: What does this tell us about where our climate is likely to go over the rest of this century?

After that, we’d like to turn things over to your questions. Is there anything you’ve always wanted to know about climate science but didn’t know who to ask? Zeke may be your guy—and if not, then he almost certainly knows who is. So please join us for this discussion, happening Thursday, June 26, at 1 pm US Eastern Time.


Today! Ars Live: What’s up with the sudden surge in temperatures? Read More »

reddit-ceo-pledges-site-will-remain-“written-by-humans-and-voted-on-by-humans”

Reddit CEO pledges site will remain “written by humans and voted on by humans”

Reddit is in an “arms race” to protect its devoted online communities from a surge in artificial intelligence-generated content, with the authenticity of its vast repository of human interaction increasingly valuable in training new AI-powered search tools.

Chief executive Steve Huffman told the Financial Times that Reddit had “20 years of conversation about everything,” leaving the company with a lucrative resource of personal interaction.

This has allowed it to strike multimillion-dollar partnerships with Google and OpenAI to train their large language models on its content, as tech companies look for real-world data that can improve their generative AI products.

But Huffman said Reddit was now battling to ensure its users stay at the center of the social network. “Where the rest of the internet seems to be powered by or written by or summarized by AI, Reddit is distinctly human,” he said. “It’s the place you go when you want to hear from people, their lived experiences, their perspectives, their recommendations. Reddit is communities and human curation and conversation and authenticity.”

As Reddit becomes an increasingly important source for LLMs, advertisers are responding with what one agency chief described as a “massive migration” to the platform.

Multiple advertising and agency executives speaking during this month’s Cannes advertising festival told the FT that brands were increasingly exploring hosting a business account and posting content on Reddit to boost the likelihood of their ads appearing in the responses of generative AI chatbots.

However, Huffman warned against any company seeking to game the site with fake or AI-generated content, with plans to bring in strict verification checks to ensure that only humans can post to its forums.

“For 20 years, we’ve been fighting people who have wanted to be popular on Reddit,” he said. “We index very well into the search engines. If you want to show up in the search engines, you try to do well on Reddit, and now the LLMs, it’s the same thing. If you want to be in the LLMs, you can do it through Reddit.”

Reddit CEO pledges site will remain “written by humans and voted on by humans” Read More »

is-doge-doomed-to-fail?-some-experts-are-ready-to-call-it.

Is DOGE doomed to fail? Some experts are ready to call it.


Trump wants $45M to continue DOGE’s work. Critics warn costs already too high.

Federal workers and protestors spoke out against US President Donald Trump and Elon Musk and their push to gut federal services and impose mass layoffs earlier this year. Credit: Pacific Press / Contributor | LightRocket

Critics are increasingly branding Elon Musk’s Department of Government Efficiency (DOGE) as a failure, including lawmakers fiercely debating how much funding to allot next year to the controversial agency.

On Tuesday, Republicans and Democrats sparred over DOGE’s future at a DOGE subcommittee hearing, according to NextGov, a news site for federal IT workers. On one side, Republicans sought to “lock in” and codify the “DOGE process” for supposedly reducing waste and fraud in government, and on the other, Democrats argued that DOGE has “done the opposite” of its intended mission and harmed Americans in the process.

DOGE has “led to poor services, a brain drain on our federal government, and it’s going to cost taxpayers money long term,” Rep. Suhas Subramanyam (D-Va.) argued.

For now, DOGE remains a temporary government agency that could sunset as soon as July 4, 2026. Under Musk’s leadership, it was supposed to save the US government a trillion dollars. But so far, DOGE only reports saving about $180 billion—and doubt has been cast on DOGE’s math ever since reports revealed that nearly 40 percent of the savings listed on the DOGE site were “bogus,” Elaine Kamarck, director of the Center for Effective Public Management at the Brookings Institution, wrote in a report detailing DOGE’s exposed failures.

The “DOGE process” that Republicans want to codify, Kamarck explained, typically begins with rushed mass layoffs. That’s soon followed by offers for buyouts or deferred resignations, before the government eventually realizes it’s lost critical expertise and starts scrambling to rehire workers or rescind buyout offers after “it becomes apparent” that a heavily gutted agency “is in danger of malfunctioning.”

Kamarck warned that DOGE appeared to be using the firings of federal workers to test the “unitary executive” theory, “popular among conservatives,” that argues that “the president has more power than Congress.” Consider how DOGE works to shut down agencies funded by Congress without seeking lawmakers’ approval by simply removing critical workers key to operations, Kamarck suggested, like DOGE did early on at the National Science Foundation.

Democrats’ witness at the DOGE hearing—Emily DiVito of the economic policy think tank Groundwork Collaborative—suggested that extensive customer service problems at the Social Security Administration were just one powerful example of DOGE’s negative impacts affecting Americans today.

Some experts expect the damage of DOGE’s first few months could ripple across Trump’s entire term. “The rapid rehirings are a warning sign” that the government “has lost more capacities and expertise that could prove critical—and difficult to replace—in the months and years ahead,” experts told CNN.

By codifying the DOGE process, as Republicans wish to do, the government would seemingly only perpetuate this pattern, which could continue to be disastrous for Americans relying on government programs.

“There are time bombs all over the place in the federal government because of this,” Kamarck told CNN. “They’ve wreaked havoc across nearly every agency.”

DOGE spikes costs for Americans, nonprofit warns

Citizens for Ethics, a nonpartisan nonprofit striving to end government secrecy, estimated this week that DOGE cuts at just a few agencies “could result in a loss of over $10 billion in US-based economic activity.”

The shuttering of the Consumer Financial Protection Bureau alone—which Musk allegedly stands to personally benefit from—likely robbed American taxpayers of even more. The nonprofit noted that the agency clawed back “over $26 billion in funds” from irresponsible businesses between 2011 and 2021 before its work was blocked.

Additionally, DOGE cuts at the Internal Revenue Service—which could “end or close audits of wealthy individuals and corporations” due to a lack of staffing—could cost the US an estimated $500 billion in dodged taxes, the nonprofit said. Partly due to conflicts like these, Kamarck suggested that when it finally comes time to assess DOGE’s success, the answer to both “did federal spending shrink?” and “did the federal deficit shrink?” will “almost surely be no.”

As society attempts to predict the full extent of DOGE’s potential harms, The Wall Street Journal spoke to university students who suggested that regulatory clarity could possibly straighten out DOGE’s efforts now that Musk is no longer pushing for mass firings. At the DOGE hearing, Marjorie Taylor Greene (R-Ga.) suggested the only way to ensure DOGE hits its trillion-dollar goal is to “make sure these cuts aren’t just temporary” and pass laws “to streamline agencies, eliminate redundant programs and give the president the authority to fire bureaucrats who don’t do their jobs.”

But one finance student, Troy Monte, suggested to WSJ that DOGE has already cost the Trump administration “stability, expertise, and public trust,” opining, “the cost of DOGE won’t be measured in dollars, but in damage.”

Max Stier, CEO of the Partnership for Public Service, told CNN that when DOGE borrowed the tech industry tactic of moving fast and breaking things, then scrambling to fix what breaks, it exposed “the mosaic of incompetence and a failure on the part of this administration to understand the critical value that the breadth of government expertise provides.”

“This is not about a single incident,” Stier said. “It’s about a pattern that has implications for our government’s ability to meet not just the challenges of today but the critical challenges of tomorrow.”

DOGE’s future appears less certain without Musk

Rep. Jasmine Crockett (D-Texas) had hoped to subpoena Musk at the DOGE hearing to testify on DOGE’s agenda, but Republicans blocked her efforts, NextGov reported.

At the hearing, she alleged that “all of this talk about lowering costs and reducing waste is absolute BS. Their agenda is about one thing: making the federal government so weak that they can exploit it for their personal gain.”

Just yesterday, The Washington Post editorial board published an op-ed already declaring DOGE a failure. Former DOGE staffer Sahil Lavingia told NPR that he expects DOGE will “fizzle out” purely because DOGE failed to uncover as much fraud as Musk and Trump had alleged was spiking government costs.

Beyond obvious criticism (loudly voiced at myriad DOGE protests), it’s easy to understand why this pessimistic view is catching on: even a cursory glance at DOGE’s website suggests the agency’s momentum has been slowing since Musk’s abrupt departure in late May. The DOGE site’s estimated savings are supposed to be updated weekly—and one day aspire to be updated in real-time—but the numbers apparently haven’t changed a cent since a few days after Musk shed his “special government employee” label. The site notes the last update was on June 3.

In addition to Musk, several notable Musk appointees have also left DOGE. Most recently, Wired reported that one of Musk’s first appointees—19-year-old Edward “Big Balls” Coristine—is gone, quitting just weeks after receiving full-time employee status granted around the same time that Musk left. Lavingia told Wired that he’d heard “a lot” of people Musk hired have been terminated since his exit.

Rather than rely on a specific engineer spearheading DOGE initiatives across government, like Coristine appeared positioned to become in Musk’s absence, Trump cabinet members or individual agency heads may have more say over DOGE cuts in the future, Kamarck and Politico’s E&E News reported.

“The result so far is that post-Musk, DOGE is morphing into an agency-by-agency effort—no longer run by a central executive branch office, but by DOGE recruits who have been embedded in the agencies and by political appointees, such as cabinet secretaries, who are committed to the same objectives,” Kamarck wrote.

Whether Trump’s appointees can manage DOGE without Musk’s help or his appointees remains to be seen, as DOGE continues to seek new hires. While Musk’s appointed DOGE staff was heavily criticized from day one, Kamarck noted that at least Musk’s appointees appeared “to have a great deal of IT talent, something the federal government has been lacking since the beginning of the information age.”

Trump can extend the timeline for when DOGE sunsets, NextGov noted, and DOGE still has $22 million left over from this year to keep pursuing its goals, as lawmakers debate whether $45 million in funding is warranted.

Despite Trump and Musk’s very public recent fallout, White House spokesperson Kush Desai has said that Trump remains committed to fulfilling DOGE’s mission, but NPR noted his statement curiously didn’t mention DOGE by name.

“President Trump pledged to make our bloated government more efficient by slashing waste, fraud, and abuse. The administration is committed to delivering on this mandate while rectifying any oversights to minimize disruptions to critical government services,” Desai said.

Currently, there are several court-ordered reviews looking into exactly which government systems DOGE accessed, which could reveal more than what’s currently known about how much success—or failure—DOGE has had. Those reviews could expose how much training DOGE workers had before they were granted security clearances to access sensitive information, potentially spawning more backlash as DOGE’s work lurches forward.

Kamarck suggested that DOGE was “doomed to face early failures” because its “efforts were enacted on dubious legal grounds”—a fact that still seems to threaten the agency’s “permanence.” But if the next incoming president conducts an evaluation in 2029 and finds that DOGE’s efforts have not meaningfully reduced the size or spending of government, DOGE could possibly disappear. Former staffers hope that even more rehiring may resume if it does, E&E reported.

In the meantime, Americans relying on government programs must contend with the risk that they could lose assistance in the moments they need it most as long as the Musk-created “DOGE process” continues to be followed.

“Which one of these malfunctions will blow up first is anyone’s guess, but FEMA’s lack of preparedness for hurricane season is a good candidate,” Kamarck said.



Is DOGE doomed to fail? Some experts are ready to call it. Read More »

looking-at-framework’s-progress-on-software-support-for-its-repairable-laptops

Looking at Framework’s progress on software support for its repairable laptops

For the past five years, we’ve been paying a lot of attention to Framework, the upstart PC company focused on modular, repairable, upgradeable, and customizable laptop designs.

So far, Framework has done a solid job of offering a steady stream of hardware upgrades for its systems, particularly the original Framework Laptop 13. But the company’s track record on software support—including BIOS updates and driver updates with performance improvements, bug fixes, and important security updates—has been more of a mixed bag.

As of our piece in April 2024, multiple iterations of the Laptop 13 board had gone years without a BIOS update or updated driver package. The first iteration of the laptop still only had “beta” support for Windows 11, which had been out for 2.5 years by then, and the company was also struggling to provide Linux support and promised functional upgrades for other models.

We spoke with Framework CEO Nirav Patel for that article; he acknowledged the issues and provided some explanations—mainly, that Framework is a small company and that it relies on upstream hardware makers like Intel and AMD for some updates. But he also promised that improvements were coming.

Patel said that a dedicated team at Compal, the white-box PC manufacturing company that makes most of Framework’s hardware, had been hired and was being onboarded. And once that team was up to speed, the plan was to continuously cycle through Framework’s entire catalog of old and current products, making sure that no model was neglected for long.

Patel’s “mid-summer [2024]” timeline for making these changes turned out to be a bit optimistic, but it does seem as though the team has made real progress over the past calendar year or so.

Real improvements

Framework’s BIOS and driver update status page as of June 25, 2025. Every single one of Framework’s laptops has gotten at least one BIOS and driver update in the past calendar year, a big improvement over the state of things in April 2024. Credit: Framework

This thread on Framework’s Reddit gives a high-level overview of the release dates for Framework’s driver packages and BIOS updates, and, for all its products going back to 2021’s 11th-gen Intel Core version of the first Framework Laptop, the company has shipped at least one BIOS update in the past eight months. Most models have seen at least one update in calendar 2025. The driver packages tend to be a bit older, but the oldest ones are from June 2024, and the Core Ultra Framework Laptop and all AMD models have gotten at least one driver package update in 2025.

Looking at Framework’s progress on software support for its repairable laptops Read More »

tuesday-telescope:-a-new-champion-enters-the-ring

Tuesday Telescope: A new champion enters the ring

Welcome to the Tuesday Telescope. There is a little too much darkness in this world and not enough light—a little too much pseudoscience and not enough science. We’ll let other publications offer you a daily horoscope. At Ars Technica, we’ll take a different route, finding inspiration from very real images of a universe that is filled with stars and wonder.

After a decade of construction, a large new reflecting telescope publicly released its first images on Monday, and they are nothing short of spectacular.

The Vera C. Rubin Observatory’s primary mirror is 8.4 meters in diameter, which makes it one of the largest optical telescopes in the world. However, the real secret sauce of the telescope is its camera—the automobile-sized Legacy Survey of Space and Time camera—which has a resolution of 3,200 megapixels. Which is rather a lot.

The observatory is on a remote 2,682-meter-high (8,799 ft) mountain in northern Chile, a region of the planet with some of the best atmospheric “seeing” conditions.

The main goal of the telescope is to scan the entire Southern Hemisphere sky by taking 1,000 high-definition photographs every three nights for the next 10 years. The idea is that, assembled end to end, the observatory will provide a high-definition, four-dimensional film of the Universe changing over a decade. It will seek to encompass everything from nearby asteroids and comets to distant supernovae.

Who was Vera Rubin? She was an American astronomer who was the first person to establish the presence of dark matter in galaxies. The observatory named in her honor was funded by the US Department of Energy and the US National Science Foundation. International partners, including the French National Centre for Scientific Research, will help to store the 20 terabytes of data collected every night.

The only bummer about Monday’s announcement is the fact that it was funded by the Department of Energy and the National Science Foundation. The Trump administration has sought to halve the science budgets of both agencies in the coming years. And the prospect of losing that funding, juxtaposed against the phenomenal start of the Vera C. Rubin Observatory, reminds us of what we stand to lose if we slash basic science funding in this country.

Source: Vera C. Rubin Observatory

Do you want to submit a photo for the Tuesday Telescope? Reach out and say hello.

Tuesday Telescope: A new champion enters the ring Read More »

how-a-grad-student-got-lhc-data-to-play-nice-with-quantum-interference

How a grad student got LHC data to play nice with quantum interference


New approach is already having an impact on the experiment’s plans for future work.

The ATLAS particle detector of the Large Hadron Collider (LHC) at the European Nuclear Research Center (CERN) in Geneva, Switzerland. Credit: EThamPhoto/Getty Images

Measurements at the Large Hadron Collider have been stymied by one of the most central phenomena of the quantum world. But now, a young researcher has championed a new method to solve the problem using deep neural networks.

The Large Hadron Collider is one of the biggest experiments in history, but it’s also one of the hardest to interpret. Unlike seeing an image of a star in a telescope, saying anything at all about the data that comes out of the LHC requires careful statistical modeling.

“If you gave me a theory [that] the Higgs boson is this way or that way, I think people imagine, ‘Hey, you built the experiment, you should be able to tell me what you’re going to see under various hypotheses!’” said Daniel Whiteson, a professor at the University of California, Irvine. “But we don’t.”

One challenge with interpreting LHC data is interference, a core implication of quantum mechanics. Interference allows two possible events to inhibit each other, weakening the likelihood of seeing the result of either. In the presence of interference, physicists needed to use a fuzzier statistical method to analyze data, losing the data’s full power and increasing its uncertainty.

However, a recent breakthrough suggests a different way to tackle the problem. The ATLAS collaboration, one of two groups studying proton collisions at the LHC, released two papers last December that describe new ways of exploring data from their detector. One describes how to use a machine learning technique called Neural Simulation-Based Inference to maximize the potential of particle physics data. The other demonstrates its effectiveness with the ultimate test: re-doing a previous analysis with the new technique and seeing dramatic improvement.

The papers are the culmination of a young researcher’s six-year quest to convince the collaboration of the value of the new technique. Its success is already having an impact on the experiment’s plans for future work.

Making sense out of fusing bosons

Each particle collision at the LHC involves many possible pathways in which different particles combine to give rise to the spray of debris that experimenters see. In 2017, David Rousseau at IJCLab in Orsay, a member of the ATLAS collaboration, asked one of his students, Aishik Ghosh, to improve his team’s ability to detect a specific pathway. That particular pathway is quite important since it’s used to measure properties of the Higgs boson, a particle (first measured in 2012) that helps explain the mass of all other fundamental particles.

It was a pretty big ask. “When a grad student gets started in ATLAS, they’re a tiny cog in a giant, well-oiled machine of 3,500 physicists, who all seem to know exactly what they’re doing,” said Ghosh.

The pathway Ghosh was asked to study occurs via several steps. First, the two colliding protons each emit a W boson, a particle associated with the weak nuclear force. These two bosons fuse together, changing their identity to form a Higgs boson. The Higgs boson then decays, forming a pair of Z bosons, another particle associated with the weak force. Finally, those Z bosons themselves each decay into a lepton, like an electron, and its antimatter partner, like a positron.

A Feynman diagram for the pathway studied by Aishik Ghosh. Credit: ATLAS

Measurements like the one Ghosh was studying are a key way of investigating the properties of the Higgs boson. By precisely measuring how long it takes the Higgs boson to decay, physicists could find evidence of it interacting with new, undiscovered particles that are too massive for the LHC to produce directly.

Ghosh started on the project, hoping to find a small improvement in the collaboration’s well-tested methods. Instead, he noticed a larger issue. The goal he was given, of detecting a single pathway by itself, didn’t actually make sense.

“I was doing that and I realized, ‘What am I doing?’ There’s no clear objective,” said Ghosh.

The problem was quantum interference.

How quantum histories interfere

One of the most famous demonstrations of the mysterious nature of quantum mechanics is called the double-slit experiment. In this demonstration, electrons are shot through a screen with two slits that allow them to pass through to a photographic plate on the other side. With one slit covered, the electrons form a pattern centered on the opening. The photographic plate lights up bright right across from the slit and dims further away from it.

With both slits open, you would expect the pattern to get brighter as more electrons reach the photographic plate. Instead, the effect varies. The two slits do not give rise to two nice bright peaks; instead, you see a rippling pattern in which some areas get brighter while others get dimmer, even though the dimmer areas should, in principle, be easier for electrons to reach.

The effect happens even if the electrons are shot at the screen one by one to stop them from influencing each other directly. It’s as if each electron carries with it two possible histories, one in which it goes through one slit and another where it goes through the other before both end up at the same place. These two histories interfere with each other so that some destinations become less likely instead of more likely.

Results of the double-slit experiment. Credit: Jordgette (CC BY-SA 3.0)

For electrons in the double-slit experiment, the two different histories are two different paths through space. For a measurement at the Large Hadron Collider, the histories are more abstract—paths that lead through transformations of fields. One history might be like the pathway Ghosh was asked to study, in which two W bosons fuse to form a Higgs boson before the Higgs boson splits into two Z bosons. But in another history, the two W bosons might fuse and immediately split into two Z bosons without ever producing a Higgs.

Both histories have the same beginning, with two W bosons, and the same end, with two Z bosons. And just as the two histories of electrons in the double-slit experiment can interfere, so can the two histories for these particles.

Another possible history for colliding particles at the Large Hadron Collider, which interferes with the measurement Ghosh was asked to do. Credit: ATLAS

That interference makes the effect of the Higgs boson much more challenging to spot. ATLAS scientists wanted to look for two pairs of electrons and positrons, which would provide evidence that two Z bosons were produced. They would classify their observations into two types: events that are evidence of the signal they were looking for (a decaying Higgs boson) and events that produce the same set of particles without a Higgs boson as an intermediate step (the latter are called the background). But the two types of observations, signal and background, interfere. With a stronger signal, corresponding to more Higgs bosons decaying, you might observe more pairs of electrons and positrons… but because the events interfere, you might instead see some of those pairs disappear.
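The same arithmetic in miniature, with invented numbers purely for illustration: the expected event rate scales with the squared magnitude of the background and signal amplitudes added together, so a signal whose phase nearly opposes the background can lower the count instead of raising it.

```python
# Illustration only: made-up amplitudes, not values from the ATLAS analysis.
import cmath

A_background = 1.0 + 0.0j                          # events without a Higgs intermediate
A_signal = 0.3 * cmath.exp(1j * 0.9 * cmath.pi)    # Higgs pathway, nearly opposite phase

print(abs(A_background) ** 2)             # expected rate with no signal: 1.00
print(abs(A_background + A_signal) ** 2)  # rate with the signal switched on: ~0.52
```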

Learning to infer

In traditional approaches, those disappearances are hard to cope with, even when using methods that already incorporate machine learning.

One of the most common uses of machine learning is classification—for example, distinguishing between pictures of dogs and cats. You train the machine on pictures of cats and pictures of dogs, and it tells you, given a picture, which animal is the most likely match. Physicists at the LHC were already using this kind of classification method to characterize the products of collisions, but it functions much worse when interference is involved.

“If you have something that disappears, you don’t quite know what to train on,” said David Rousseau. “Usually, you’re training signal versus background, exactly like you’re training cats versus dogs. When there is something that disappears, you don’t see what you trained on.”

At first, Ghosh tried a few simple tricks, but as time went on, he realized he needed to make a more fundamental change. He reached out to others in the community and learned about a method called Neural Simulation-Based Inference, or NSBI.

In older approaches, people had trained machine learning models to classify observations into signal and background, using simulations of particle collisions to make the training data. Then they used that classification to infer the most likely value of a number, like the amount of time it takes a Higgs boson to decay, based on data from an actual experiment. Neural Simulation-Based Inference skips the classification and goes directly to the inference.

Instead of trying to classify observations into signal and background, NSBI uses simulations to teach an artificial neural network to estimate a quantity called a likelihood ratio. Someone using NSBI would run several batches of simulations describing different situations, such as letting the Higgs boson decay at different rates, and then check how often each batch produced a particular observation. Comparing those frequencies gives the likelihood ratio, a number that tells you which decay rate is more consistent with the experimental evidence. If the neural network is good at estimating this ratio, it will be good at finding how long the Higgs takes to decay.
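In practice, the network typically learns that ratio by being trained to tell apart events simulated under different parameter values; its output can then be converted into a likelihood ratio. The sketch below is a bare-bones illustration of that recipe, using a one-dimensional toy observable and invented parameter values rather than anything from the ATLAS analysis:

```python
# Toy sketch of the likelihood-ratio idea behind neural simulation-based
# inference (not ATLAS code).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def simulate(theta, n):
    """Stand-in 'simulator': the parameter simply shifts a 1-D observable."""
    return rng.normal(loc=theta, scale=1.0, size=(n, 1))

theta0, theta1 = 0.0, 0.5                       # e.g. two candidate decay rates
x0, x1 = simulate(theta0, 20_000), simulate(theta1, 20_000)

# Train a network to guess which parameter value produced each simulated event.
X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300).fit(X, y)

def log_likelihood_ratio(x):
    # The classifier's score s(x) converts into the ratio r(x) = s / (1 - s),
    # an estimate of how much more likely an event is under theta1 than theta0.
    s = np.clip(clf.predict_proba(x)[:, 1], 1e-6, 1 - 1e-6)
    return np.log(s / (1.0 - s))

# "Observed" events secretly generated with theta1: the summed log-ratio comes
# out positive, pointing to theta1 as the better-fitting decay rate.
observed = simulate(theta1, 1_000)
print("total log likelihood ratio:", log_likelihood_ratio(observed).sum())
```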

Because NSBI doesn’t try to classify observations into different categories, it handles quantum interference more effectively. Instead of trying to find the Higgs based on a signal that disappears, it examines all the data, trying to guess which decay time is the most likely.

Ghosh tried the method, got promising results on test data, and presented them at a conference in 2019. But if he was going to convince the ATLAS collaboration that the method was safe to use, he still had a lot of work ahead of him.

Shifting the weight on ATLAS’ shoulders

Experiments like ATLAS have high expectations attached to them. A collaboration of thousands of scientists, ATLAS needs to not only estimate the laws of physics but also have a clear idea of just how uncertain those estimates are. At the time, NSBI hadn’t been tested in that way.

“None of this has actually been used on data,” said Ghosh. “Nobody knew how to quantify the uncertainties. So you have a neural network that gives you a likelihood. You don’t know how good the likelihood is. Is it well-estimated? What if it’s wrongly estimated just in some weird corner? That would completely bias your results.”

Checking those corners was too big a job for a single PhD student and too complex to finish within a single PhD degree. Ghosh would have to build a team, and he would need time to do it. That’s tricky in the academic world, where students go on to short-term postdoc jobs with the expectation that they quickly publish new results to improve their CV for the next position.

“We’re usually looking to publish the next paper within two to three years—no time to overhaul our methods,” said Ghosh. Fortunately, Ghosh had support. He finished his PhD with Rousseau and went on to work with Daniel Whiteson, who encouraged him to pursue his ambitious project.

“I think it’s really important that postdocs learn to take those risks because that’s what science is,” Whiteson said.

Ghosh gathered his team. Another student of Rousseau’s, Arnaud Maury, worked to calibrate the machine’s confidence in its answers. A professor at the University of Massachusetts, Rafael Coelho Lopes de Sa, joined the project; his student Jay Sandesara would play a key role in getting the calculation to work at full scale on a computer cluster. IJCLab emeritus RD Schaffer and University of Liège professor Gilles Louppe provided cross-checks and advice.

The team wanted a clear demonstration that their method worked, so they took an unusual step. They took data that ATLAS had already analyzed and performed a full analysis using their method instead, showing that it could pass every check the collaboration could think of. They would publish two papers, one describing the method and the other giving the results of their upgraded analysis. Zach Marshall, who was the computing coordinator for ATLAS at the time, helped get the papers through, ensuring that they were vetted by experts in multiple areas.

“It was a very small subset of our community that had that overlap between this technical understanding and the physics analysis experience and understanding that were capable of really speaking to whether that paper was sufficient and intelligible and useful. So we really had to make sure that we engaged that little group of humans by name,” said Marshall.

The new method showed significant improvements, producing a much more precise result than the collaboration’s previous analysis. That improvement, along with the thorough checks, persuaded ATLAS to use NSBI more broadly going forward. It will give the collaboration much more precision than it expected as it uses the Higgs boson to search for new particles and sharpen our understanding of the quantum world. When ATLAS discusses its future plans, it makes projections of the precision it expects to reach years from now. But those projections are now being upended.

“One of the fun things about this method that Aishik pushed hard is each time it feels like now we do that projection—here’s how well we’ll do in 15 years—we absolutely crush those projections,” said Marshall. “So we are just now having to redo a set of projections because we matched our old projections for 15 years out already today. It’s a very fun problem to have.”

How a grad student got LHC data to play nice with quantum interference Read More »