Author name: Paul Patrick

embark-on-a-visual-voyage-of-art-inspired-by-black-holes

Embark on a visual voyage of art inspired by black holes

Gamwell sees echoes of Mitchell’s dark stars, for instance, in Edgar Allan Poe’s short story, “A Descent Into the Maelstrom,” particularly the evocative 1919 illustration by Harry Clarke. “This seemed to have been an early analogy to a black hole for many people when the concept was first proposed,” said Gamwell. “It’s a mathematical construct at that point and it’s very difficult to imagine a mathematical construct. Poe actually envisioned a dark star [elsewhere in his writings].”

The featured art spans nearly every medium: charcoal sketches, pen-and-ink drawings, oil or acrylic paintings, murals, sculptures, traditional and digital photography, and immersive room-sized multimedia installations, such as a 2021-2022 piece called Gravitational Arena by Chinese artist Xu Bing. “Xu Bing does most of his work about language,” said Gamwell. For Gravitational Arena, “He takes a quote about language from Wittgenstein and translates it into his own script, the English alphabet written to resemble Chinese characters. Then he applies gravity to it and makes a singularity. [The installation] is several stories high and he covered the gallery floor with a mirror. So you walk upstairs and you see it’s like a wormhole, which he turns into an analogy for translation.”

“Anything in the vicinity of a black hole is violently torn apart owing to its extreme gravity—the strongest in the universe,” Gamwell writes about the enduring appeal of black holes as artistic inspiration. “We see this violence in the works of artists like Cai Guo-­ Qiang and Takashi Murakami, who have used black holes to symbolize the brutality unleashed by the atomic bomb. The inescapable pull of a black hole is also a ready metaphor for depression in the work of artists such as Moonassi. Thus, on the one hand, the black hole provides artists with a symbol to express the devastations and anxieties of the modern world. On the other hand, however, a black hole’s extreme gravity is the source of stupendous energy, and artists such as Yambe Tam invite viewers to embrace darkness as a path to transformation, awe, and wonder.”

One of the earliest scientific images of a black hole, 1979. Ink on paper, reversed photographically. Jean-Pierre Luminet/Astronomy and Astrophysics 1979

Embark on a visual voyage of art inspired by black holes Read More »

speed-act-passes-in-house-despite-changes-that-threaten-clean-power-projects

SPEED Act passes in House despite changes that threaten clean power projects


The bill would significantly curtail scope of the federal environmental review process.

Rep. Bruce Westerman (R-Ark.) speaks during a news conference in the US Capitol Visitor Center on Oct. 22. Credit: Williams/CQ-Roll Call, Inc/Getty Images

The House of Representatives cleared the way for a massive overhaul of the federal environmental review process last Thursday, despite last-minute changes that led clean energy groups and moderate Democrats to pull their support.

The Standardizing Permitting and Expediting Economic Development Act, or SPEED Act, overcame opposition from environmentalists and many Democrats who oppose the bill’s sweeping changes to a bedrock environmental law.

The bill, introduced by Rep. Bruce Westerman (R-Ark.) and backed by Rep. Jared Golden (D-Maine), passed the House Thursday in a 221-196 vote, in which 11 Democrats joined Republican lawmakers to back the reform effort. It now heads to the Senate, where it has critics and proponents on both sides of the aisle, making its prospects uncertain.

The bill seeks to reform foundational environmental regulations that govern how major government projects are assessed and approved by amending the landmark 1970 National Environmental Policy Act (NEPA), signed into law under the Nixon administration. NEPA requires federal agencies to review and disclose the environmental impacts of major projects before permitting or funding them. Although NEPA reviews are only one component of the federal permitting process, advocates argue that they serve a crucial role by providing both the government and the public the chance to examine the knock-on effects that major projects could have on the environment.

Critics of the law have argued for years that increasingly complex reviews—along with legal wrangling over the findings of those reviews—have turned NEPA into a source of significant, burdensome delays that threaten the feasibility of major projects, such as power plants, transmission lines, and wind and solar projects on federal land.

Speaking on the floor of the House Thursday before the vote, Westerman described the SPEED Act as a way to “restore common sense and accountability to federal permitting.” Westerman praised the original intent of NEPA but said the law’s intended environmental protections had been overshadowed by NEPA becoming “more synonymous with red tape and waste.

“What was meant to facilitate responsible development has been twisted into a bureaucratic bottleneck that delays investments in the infrastructure and technologies that make our country run,” Westerman said.

After the bill’s passage through the House on Thursday, the SPEED Act’s Democratic cosponsor, Golden, praised the bill’s success.

“The simplest way to make energy, housing, and other essentials more affordable is to make it possible to actually produce enough of it at a reasonable cost,” Golden said in a press release following the vote. “The SPEED Act has united workers, businesses, and political forces who usually oppose each other because scarcity hurts everyone.”

According to an issue brief from the Bipartisan Policy Center, the bill aims to reform the NEPA process in several key ways. First, it makes changes to the ways agencies comply with NEPA—for example, by creating exemptions to when a NEPA review is required, and requiring agencies to only consider environmental impacts that are directly tied to the project at hand.

It would also drastically shorten the deadline to sue a federal agency over its permitting decision and constrain who is eligible to file suit. Current law provides a six-year statute of limitations on agency decisions for permitting energy infrastructure, and two years for transportation project permits. Under the SPEED Act’s provisions, those deadlines would be shortened to a mere 150 days and only allow lawsuits to be filed by plaintiffs who demonstrated in public comment periods that they would be directly and negatively impacted by the project.

NEPA does not require the government to make particular decisions about whether or how to move forward with a project based on a review’s findings. However, critics argue that in the decades since its passage, interest groups have “weaponized” the NEPA process to delay or even doom projects they oppose, sometimes forcing agencies to conduct additional analyses that add costly delays to project timelines.

Strange bedfellows on either side of the bill 

Although climate activists and environmental groups have used NEPA to oppose fossil fuel projects, such as the Keystone XL and Dakota Access pipelines, oil and gas interests are far from the only group seeking respite. Some voices within the clean energy industry have called for permitting reform, too, arguing that delays stemming from the current permitting process have had a negative impact on America’s ability to build out more climate-friendly projects, including some offshore wind projects and transmission lines to connect renewables to the grid.

So when Westerman and Golden introduced the SPEED Act in the House, a hodgepodge of odd alliances and opposition groups formed in response.

The American Petroleum Institute, a trade association for the oil and gas industry, launched a seven-figure advertising campaign in recent months pushing lawmakers to pursue permitting reform, according to a report from Axios. And the bill also initially enjoyed support from voices within the clean power industry. However, last-minute changes to the bill—designed to win over Republican holdouts—undermined the SPEED Act’s cross-sector support.

The bill’s opponents had previously raised alarm bells that fossil fuel interests would disproportionately benefit from a more streamlined review process under the current administration, citing President Donald Trump’s ongoing war against wind and solar energy projects.

In recent months, the Trump administration has sought to pause, reconsider, or revoke already approved permits for renewable energy projects it dislikes. Those moves particularly impacted offshore wind developments and added significant uncertainty to the feasibility of clean energy investments as a whole.

A bipartisan amendment to the SPEED Act, added during the Natural Resources Committee’s markup in November, sought to address some of those concerns by adding language that would make it more difficult for the administration to “revoke, rescind, withdraw, terminate, suspend, amend, alter, or take any other action to interfere” with an existing authorization.

However, that measure encountered resistance from key Republican voices who support Trump’s attacks on offshore wind projects.

A last-minute loophole for Trump’s energy agenda

On Tuesday, Republican lawmakers in the Rules Committee were able to amend the SPEED Act in a way that would facilitate the Trump administration’s ongoing efforts to axe renewable energy projects. The changes were spearheaded by Andy Harris (R-Md.) and Jeff Van Drew (R-N.J.), two vocal proponents of Trump’s energy policies. The amendment fundamentally undermined the technology-neutral aspirations of the bill—and any hope of receiving widespread support from moderate Democrats or the clean power industry.

According to Matthew Davis, vice president of federal policy at the League of Conservation Voters, Harris and Van Drew’s amendment would allow the administration to exclude any project from the bill’s reforms that the Trump administration had flagged for reconsideration—something the administration has done repeatedly for renewable projects like offshore wind.

The result, Davis argued, is that the bill would speed up the environmental review process for the Trump administration’s preferred sources of energy—namely, oil and gas—while leaving clean energy projects languishing.

“They couldn’t pass the rule on Tuesday to even consider this bill without making it even better for the fossil fuel industry and even worse for the clean energy industry,” Davis said.

In a public statement following Thursday’s vote, Davis described the amended SPEED Act as “a fossil fuel giveaway that cuts out community input and puts our health and safety at risk to help big polluters.”

The American Clean Power Association, which represents the renewable energy industry, previously hailed the bill as an important step forward for the future of clean energy development. But after the Rules Committee’s changes on Tuesday, the organization dropped its support.

“Our support for permitting reform has always rested on one principle: fixing a broken system for all energy resources,” said ACP CEO Jason Grumet in a Wednesday statement. “The amendment adopted last night violate[s] that principle. Technology neutrality wasn’t just good policy—it was the political foundation that made reform achievable.”

The American Council on Renewable Energy (ACORE), a nonprofit trade and advocacy organization, echoed that sentiment.

“Durable, bipartisan, technology-neutral permitting reforms that support and advance the full suite of American electricity resources and the necessary expansion of transmission infrastructure to get that electricity from where it’s generated to where it’s needed are essential to meeting that challenge reliably, securely, and most importantly, affordably,” said ACORE CEO Ray Long. “Unfortunately, the changes made on the House floor are a disappointing step backward from achieving these objectives.”

Following the SPEED Act’s passage through the House on Thursday, advocacy group Citizens for Responsible Energy Solutions (CRES) issued a public statement praising the bill’s success while noting how the recent amendments had affected the law.

“While we are concerned that post committee additions to the bill could put the certainty of a range of projects at risk, this bill’s underlying reforms are critical to advancing American energy,” CRES President Heather Reams said in the statement.

Mixed expectations for the reform’s impact

Even before the move to strip protections for renewables from the bill, some critics—like Rep. Mike Levin (D-Calif.)—said that the legislation didn’t go far enough to curtail the president’s “all-out assault” against clean power, arguing that the bill does nothing to restore approvals that have already been canceled by the administration and doesn’t address other roadblocks that have been put in place.

“The administration cannot be trusted to act without specific language, in my view, to protect the clean energy projects already in the pipeline and to prevent the Interior Secretary from unilaterally stopping projects that are needed to lower costs and improve grid reliability,” Levin told Inside Climate News in an interview ahead of the House vote.

Both Levin and Davis pointed to a July memo from the Department of Interior that requires all wind and solar projects on federal land to receive higher-level approval from Interior Secretary Doug Burgum.

“The administration is not even returning the phone calls of project developers. They are not responding to applications being submitted,” Davis said. “That sort of approach is in stark contrast with the ‘white glove, concierge service’—and that’s a quote from the Trump administration—the service they are providing for fossil fuel companies to access our public lands.”

The SPEED Act’s opponents also dispute the idea that NEPA reviews are one of the primary causes of permitting delays, arguing that reports from the Congressional Research Service and other groups have found little evidence to support those claims.

“Often missing in the conversation around NEPA is the empirical research that’s been done, and there’s a lot of that out there,” said Jarryd Page, a staff attorney at the Environmental Law Institute, in a September interview with Inside Climate News.

That research points to resource constraints as one of the biggest roadblocks, Page said, like not having enough staff to conduct the environmental reviews, or staff lacking adequate experience and technical know-how.

Debate over NEPA and the reform of the permitting process will now move into the Senate, where experts say the SPEED Act will likely undergo further changes.

“I think as the bill goes forwards in the Senate, we’ll probably see a neutral, across-the-board approach to making sure the process is fair for all technology types,” Xan Fishman, an energy policy expert at the Bipartisan Policy Center told ICN after Thursday’s vote.

Fishman stressed it would be crucial to ensure permits for projects wouldn’t suddenly be cancelled for political reasons, but said he was optimistic about how the SPEED Act would be refined in the Senate.

“It’s great to see Congress so engaged with permitting reform,” he said. “Both sides of the aisle see a need to do better.”

This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy and the environment. Sign up for their newsletter here.

SPEED Act passes in House despite changes that threaten clean power projects Read More »

how-ai-coding-agents-work—and-what-to-remember-if-you-use-them

How AI coding agents work—and what to remember if you use them


Agents of uncertain change

From compression tricks to multi-agent teamwork, here’s what makes them tick.

AI coding agents from OpenAI, Anthropic, and Google can now work on software projects for hours at a time, writing complete apps, running tests, and fixing bugs with human supervision. But these tools are not magic and can complicate rather than simplify a software project. Understanding how they work under the hood can help developers know when (and if) to use them, while avoiding common pitfalls.

We’ll start with the basics: At the core of every AI coding agent is a technology called a large language model (LLM), which is a type of neural network trained on vast amounts of text data, including lots of programming code. It’s a pattern-matching machine that uses a prompt to “extract” compressed statistical representations of data it saw during training and provide a plausible continuation of that pattern as an output. In this extraction, an LLM can interpolate across domains and concepts, resulting in some useful logical inferences when done well and confabulation errors when done poorly.

These base models are then further refined through techniques like fine-tuning on curated examples and reinforcement learning from human feedback (RLHF), which shape the model to follow instructions, use tools, and produce more useful outputs.

A screenshot of the Claude Code command-line interface.

A screenshot of the Claude Code command-line interface. Credit: Anthropic

Over the past few years, AI researchers have been probing LLMs’ deficiencies and finding ways to work around them. One recent innovation was the simulated reasoning model, which generates context (extending the prompt) in the form of reasoning-style text that can help an LLM home in on a more accurate output. Another innovation was an application called an “agent” that links several LLMs together to perform tasks simultaneously and evaluate outputs.

How coding agents are structured

In that sense, each AI coding agent is a program wrapper that works with multiple LLMs. There is typically a “supervising” LLM that interprets tasks (prompts) from the human user and then assigns those tasks to parallel LLMs that can use software tools to execute the instructions. The supervising agent can interrupt tasks below it and evaluate the subtask results to see how a project is going. Anthropic’s engineering documentation describes this pattern as “gather context, take action, verify work, repeat.”

If run locally through a command-line interface (CLI), users give the agents conditional permission to write files on the local machine (code or whatever is needed), run exploratory commands (say, “ls” to list files in a directory), fetch websites (usually using “curl”), download software, or upload files to remote servers. There are lots of possibilities (and potential dangers) with this approach, so it needs to be used carefully.

In contrast, when a user starts a task in the web-based agent like the web versions of Codex and Claude Code, the system provisions a sandboxed cloud container preloaded with the user’s code repository, where Codex can read and edit files, run commands (including test harnesses and linters), and execute code in isolation. Anthropic’s Claude Code uses operating system-level features to create filesystem and network boundaries within which the agent can work more freely.

The context problem

Every LLM has a short-term memory, so to speak, that limits the amount of data it can process before it “forgets” what it’s doing. This is called “context.” Every time you submit a response to the supervising agent, you are amending one gigantic prompt that includes the entire history of the conversation so far (and all the code generated, plus the simulated reasoning tokens the model uses to “think” more about a problem). The AI model then evaluates this prompt and produces an output. It’s a very computationally expensive process that increases quadratically with prompt size because LLMs process every token (chunk of data) against every other token in the prompt.

Anthropic’s engineering team describes context as a finite resource with diminishing returns. Studies have revealed what researchers call “context rot”: As the number of tokens in the context window increases, the model’s ability to accurately recall information decreases. Every new token depletes what the documentation calls an “attention budget.”

This context limit naturally limits the size of a codebase a LLM can process at one time, and if you feed the AI model lots of huge code files (which have to be re-evaluated by the LLM every time you send another response), it can burn up token or usage limits pretty quickly.

Tricks of the trade

To get around these limits, the creators of coding agents use several tricks. For example, AI models are fine-tuned to write code to outsource activities to other software tools. For example, they might write Python scripts to extract data from images or files rather than feeding the whole file through an LLM, which saves tokens and avoids inaccurate results.

Anthropic’s documentation notes that Claude Code also uses this approach to perform complex data analysis over large databases, writing targeted queries and using Bash commands like “head” and “tail” to analyze large volumes of data without ever loading the full data objects into context.

(In a way, these AI agents are guided but semi-autonomous tool-using programs that are a major extension of a concept we first saw in early 2023.)

Another major breakthrough in agents came from dynamic context management. Agents can do this in a few ways that are not fully disclosed in proprietary coding models, but we do know the most important technique they use: context compression.

The command line version of OpenAI codex running in a macOS terminal window.

The command-line version of OpenAI Codex running in a macOS terminal window. Credit: Benj Edwards

When a coding LLM nears its context limit, this technique compresses the context history by summarizing it, losing details in the process but shortening the history to key details. Anthropic’s documentation describes this “compaction” as distilling context contents in a high-fidelity manner, preserving key details like architectural decisions and unresolved bugs while discarding redundant tool outputs.

This means the AI coding agents periodically “forget” a large portion of what they are doing every time this compression happens, but unlike older LLM-based systems, they aren’t completely clueless about what has transpired and can rapidly re-orient themselves by reading existing code, written notes left in files, change logs, and so on.

Anthropic’s documentation recommends using CLAUDE.md files to document common bash commands, core files, utility functions, code style guidelines, and testing instructions. AGENTS.md, now a multi-company standard, is another useful way of guiding agent actions in between context refreshes. These files act as external notes that let agents track progress across complex tasks while maintaining critical context that would otherwise be lost.

For tasks requiring extended work, both companies employ multi-agent architectures. According to Anthropic’s research documentation, its system uses an “orchestrator-worker pattern” in which a lead agent coordinates the process while delegating to specialized subagents that operate in parallel. When a user submits a query, the lead agent analyzes it, develops a strategy, and spawns subagents to explore different aspects simultaneously. The subagents act as intelligent filters, returning only relevant information rather than their full context to the lead agent.

The multi-agent approach burns through tokens rapidly. Anthropic’s documentation notes that agents typically use about four times more tokens than chatbot interactions, and multi-agent systems use about 15 times more tokens than chats. For economic viability, these systems require tasks where the value is high enough to justify the increased cost.

Best practices for humans

While using these agents is contentious in some programming circles, if you use one to code a project, knowing good software development practices helps to head off future problems. For example, it’s good to know about version control, making incremental backups, implementing one feature at a time, and testing it before moving on.

What people call “vibe coding”—creating AI-generated code without understanding what it’s doing—is clearly dangerous for production work. Shipping code you didn’t write yourself in a production environment is risky because it could introduce security issues or other bugs or begin gathering technical debt that could snowball over time.

Independent AI researcher Simon Willison recently argued that developers using coding agents still bear responsibility for proving their code works. “Almost anyone can prompt an LLM to generate a thousand-line patch and submit it for code review,” Willison wrote. “That’s no longer valuable. What’s valuable is contributing code that is proven to work.”

In fact, human planning is key. Claude Code’s best practices documentation recommends a specific workflow for complex problems: First, ask the agent to read relevant files and explicitly tell it not to write any code yet, then ask it to make a plan. Without these research and planning steps, the documentation warns, Claude’s outputs tend to jump straight to coding a solution.

Without planning, LLMs sometimes reach for quick solutions to satisfy a momentary objective that might break later if a project were expanded. So having some idea of what makes a good architecture for a modular program that can be expanded over time can help you guide the LLM to craft something more durable.

As mentioned above, these agents aren’t perfect, and some people prefer not to use them at all. A randomized controlled trial published by the nonprofit research organization METR in July 2025 found that experienced open-source developers actually took 19 percent longer to complete tasks when using AI tools, despite believing they were working faster. The study’s authors note several caveats: The developers were highly experienced with their codebases (averaging five years and 1,500 commits), the repositories were large and mature, and the models used (primarily Claude 3.5 and 3.7 Sonnet via Cursor) have since been superseded by more capable versions.

Whether newer models would produce different results remains an open question, but the study suggests that AI coding tools may not always provide universal speed-ups, particularly for developers who already know their codebases well.

Given these potential hazards, coding proof-of-concept demos and internal tools is probably the ideal use of coding agents right now. Since AI models have no actual agency (despite being called agents) and are not people who can be held accountable for mistakes, human oversight is key.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

How AI coding agents work—and what to remember if you use them Read More »

china-just-carried-out-its-second-reusable-launch-attempt-in-three-weeks

China just carried out its second reusable launch attempt in three weeks

For the second time this month, a Chinese rocket designed for reuse successfully soared into low-Earth orbit on its first flight Monday, defying the questionable odds that burden the debuts of new launch vehicles.

The first Long March 12A rocket, roughly the same height and diameter of SpaceX’s workhorse Falcon 9, lifted off from the Jiuquan Satellite Launch Center at 9: 00 pm EST Monday (02: 00 UTC Tuesday).

Less than 10 minutes later, rocket’s methane-fueled first stage booster hurtled through the atmosphere at supersonic speed, impacting in a remote region about 200 miles downrange from the Jiuquan spaceport in northwestern China. The booster failed to complete a braking burn to slow down for landing at a prepared location near the edge of the Gobi Desert.

The Long March 12A’s upper stage performed as intended, successfully reaching the mission’s “predetermined orbit,” said the China Aerospace Science and Technology Corporation (CASC), the state-owned enterprise that leads the country’s space industry.

“The first stage failed to be successfully recovered,” the corporation said in a statement. “The specific reasons are currently under further analysis and investigation.”

A stable of reusable rockets

This outcome resembles the results from the first flight of another medium-class Chinese rocket, the Zhuque-3, on December 2. The Zhuque-3 rocket was developed by a privately-funded startup named LandSpace. Similar in size and performance to the Long March 12A, the Zhuque-3 also reached orbit on its first launch, and its recoverable booster stage crashed during a downrange landing attempt. The Zhuque-3’s first stage came down next to its landing zone, while the Long March 12A appears to have missed by at least a couple of miles.

“Although this mission did not achieve the planned recovery of the rocket’s first stage, it obtained critical engineering data under the rocket’s actual flight conditions, laying an important foundation for subsequent launches and reliable recovery of the stages,” CASC said. “The research and development team will promptly conduct a comprehensive review and technical analysis of this test process, fully investigate the cause of the failure, continuously optimize the recovery plan, and continue to advance reusable technology verification.”

China just carried out its second reusable launch attempt in three weeks Read More »

fcc’s-import-ban-on-the-best-new-drones-starts-today

FCC’s import ban on the best new drones starts today

DJI sent numerous requests to the US government to audit its devices in hopes of avoiding a ban, but the federal ban was ultimately enacted based on previously acquired information, The New York Times reported this week.

The news means that Americans will miss out on new drone models from DJI, which owns 70 percent of the global drone market in 2023, per Drone Industry Insights, and is widely regarded as the premium drone maker. People can still buy drones from US companies, but American drones have a lackluster reputation compared to drones from DJI and other Chinese companies, such as Autel. US-made drones also have a reputation for being expensive, usually costing significantly more than their Chinese counterparts. DaCoda Bartels, COO of FlyGuys, which helps commercial drone pilots find work, told the Times that US drones are also “half as good.”

There’s also concern among hobbyists that the ban will hinder their ability to procure drone parts, potentially affecting the repairability of approved drones and DIY projects.

US-based drone companies, meanwhile, are optimistic about gaining business in an industry where it has historically been hard to compete against Chinese brands. It’s also possible that the ban will just result in a decline in US drone purchases.

In a statement, Michael Robbins, president and CEO of the Association for Uncrewed Vehicle Systems International (AUVSI), which includes US drone companies like Skydio as members, said the ban “will truly unleash American drone dominance” and that the US cannot “risk… dependence” on China for drones.

“By prioritizing trusted technology and resilient supply chains, the FCC’s action will accelerate innovation, enhance system security, and ensure the US drone industry expands rather than remaining under foreign control,” Robbins said.

Understandably, DJI is “disappointed” by the FCC’s decision, it said in a statement issued on Monday, adding:

While DJI was not singled out, no information has been released regarding what information was used by the Executive Branch in reaching its determination. Concerns about DJI’s data security have not been grounded in evidence and instead reflect protectionism, contrary to the principles of an open market.

FCC’s import ban on the best new drones starts today Read More »

f1’s-new-engines-are-causing-consternation-over-compression-ratios

F1’s new engines are causing consternation over compression ratios

Compression ratios

At issue is the engines’ compression ratio, which compares the volume of the cylinder when the piston is at top dead center with the volume when the piston is at its closest to the crank. Under the 2014–2025 rules, this was set at 18:1, but for 2026 onward, it has been reduced to 16:1.

This is measured at ambient temperature, though, not while the engine is running. A running engine is hotter—much hotter—than one sitting at ambient, and as metals heat up, they expand. The engines have very short throws, so it doesn’t take much expansion to increase the compression ratio by reducing the distance between the piston and cylinder head at the top of its travel. The benefit could be as much as 15 hp (11 kW), which translates to a few tenths of a second per lap advantage.

Unfortunately for the other teams, the FIA stated that its rules indeed specify only that the compression ratio should be 16:1 based on static conditions and at ambient temperatures. “This procedure has remained unchanged despite the reduction in the permitted ratio for the 2026 season. It is true that thermal expansion can influence dimensions, but the current rules do not provide for measurements to be carried out at elevated temperatures,” the FIA said.

So if Mercedes and Red Bull do have a horsepower advantage, it’s one that will likely be baked into the 2026 season.

The compression ratio clarification wasn’t the only one issued by the FIA. For some time now, F1 has used ultrasonic fuel flow meters as a way to control power outputs. Under the outgoing regulations, this was capped at 100 kg/h, but with the move to fully sustainable synthetic fuels, this is changing to an energy cap of 3,000 MJ/h instead.

In the past, it had been theorized that teams could try to game the fuel flow meters—the most impressive idea I heard involved pulsing more fuel between the sensor’s sampling inputs to boost power, although I don’t believe it was ever implemented.

Don’t even think about being that clever this time, the FIA says. “Any device, system, or procedure, the purpose of which is to change the temperature of the fuel-flow meter, is forbidden,” it says, updating the regulation that previously banned “intentional heating or chilling” of the fuel flow meter.

F1’s new engines are causing consternation over compression ratios Read More »

keeping-up-against-the-joneses:-balsa’s-2025-fundraiser

Keeping Up Against the Joneses: Balsa’s 2025 Fundraiser

Several years ago Zvi Mowshowitz founded Balsa Research, a tiny nonprofit research organization currently focused on quantifying the impact of the Jones Act on the American economy, and working towards viable reform proposals.

While changing century-old policy is not going to be easy, we continue to see many places where there is neglected groundwork that we’re well positioned to do, and we are improving at doing it with another year of practice under our belts.

We’re looking to raise $200,000 to support our work this giving season, though $50,000 would be sufficient to keep the lights on, and we think we are also well positioned to do more with more funding.

Funds raised this round will support Balsa’s policy advocacy, either in Jones Act and shipping or potentially in other planned cause areas of housing reform and NEPA reform if there is capacity to significantly expand.

Donate here to fund our mainline policy work.

One additional possibility for Balsa, that would be funded entirely separately if it did happen, is for Zvi Mowshowitz to use Balsa as a piece of philanthropic infrastructure to help guide new philanthropic money coming online in 2026 if there is demand. Contact us ([email protected]) if you would like to be involved in such an effort in any capacity, or want to authorize this as a potential use of your funds.

Donate here if you are interested in helping us with fully flexible funding.

Quite early in the year, Balsa’s plans for Jones Act investigative work was derailed by a certain Section 301 Investigation, which I wrote about here. In short, the USTR was proposing two significant changes to maritime transport: a $3-5 million fee for Chinese-built ships to deliver imports to American ports, and new, Jones Act-tier restrictions to up to 20% of American maritime exports. All of American industry focused on lobbying against the legibly bad first proposal, sadly no one else was on the ball about how bad the second proposal was because it required a slightly more sophisticated argument. So Balsa stepped in and wrote up a public comment and presented it to the USTR during their public hearing on the proposal. At least in part due to our research and our outreach to maritime industry players, this proposal was basically entirely axed.

After our mid-year write-up on the whole adventure, Balsa did also end up submitting a second comment in response to what we felt was a deeply counterproductive tariff scheme in the updated proposal. This was the first arc played out in miniature; after functionally scrapping both major proposals from the first round, the USTR was proposing that an increasing percentage of American LNG must be shipped out on U.S.-built LNG tankers (there are currently zero in the fleet and no capacity for the shipyards to build any new ones) and that all port crane parts made in China be subject to 100% tariffs. Everyone focused on lobbying against the first policy change which was obviously bad, the second was bad in a more subtle way. So it was up to Balsa to point out that the exact setup of the port crane tariffs were structured in a way counterproductive to the stated U.S. policy, would incentivize American ports to buy their cranes from Chinese manufacturers instead of manufacturers in allied countries (there is no domestic port crane manufacturing capacity), and negatively impact port revitalization investments that need to happen.

One piece of good news is that President Trump signed a trade deal with China in November, which resulted in a one-year suspension of all of the punitive measures proposed in the Section 301 investigation. We think there’s a good chance that the suspension might become indefinite, but it still seemed like a good use of our time to write up our objections should the measures resume in 2026.

We also worked on the Jones Act. We launched a new RFA to investigate the labor impacts of the Jones Act. This is meant to complement our first RFA, which invites academics to look at the economic impacts of the Jones Act. You may also recall that we had already given out grants for two different studies under the first RFA, on economic impacts. These papers are still in the process of being written. We remain confident in both teams and look forward to seeing their results in 2026.

We shored up a few places where we felt like some of the groundwork done by others on the Jones Act were either neglected or outdated. We published two pieces: The Jones Act Index, which works as a very short overview of all the myriad dysfunctions of the current domestic maritime industry, and an operational analysis of what exactly the 93 extant Jones Act eligible vessels get up to.

Besides all that, there is of course the frustratingly intangible work of networking and building a deeper understanding of the shape of the problem. We conducted over forty conversations with stakeholders across the maritime policy landscape, including domestic shipping operators, port executives, and congressional staff. These conversations directly informed our operational analysis of Jones Act vessels and helped us identify which reform framings resonate (and which don’t) with different constituencies. We’ve compiled this primary research into internal documentation mapping stakeholder positions, constraints, and potential pressure points—groundwork that will directly inform our policy binder and draft reform proposals.

Additionally, in the last few months of the year, we brought on a very part-time contractor to help with shipping out more of our policy work.

A breakdown of our 2025 spend to the nearest thousand, for a total of ~$143k:

  • $87,000 in wages (Jenn at 35 hours a week and a policy analyst at 10 hours a week)

  • $0 for Zvi Mowshowitz

  • $45,000 in research grants to RFA applicants

  • $7000 in travel and conference expenses

  • $2000 in accounting services

  • $1000 in legal, compliance, and nonprofit registration fees

  • $1000 in software, subscriptions, and office supplies

Considering Balsa’s size, unless fundraising goes exceedingly well, we plan to stay focused on the Jones Act and maritime policy until we crack this nut (i.e. deliver the policy binder) instead of diverting attention across different policy streams.

Currently, the people working on Balsa work are me (full time-ish), our contractor who works ten hours a week, plus Zvi Mowshowitz in an advisory capacity. In 2026, we’d like to bring this person or another policy analyst on full time, because my own time is somewhat constrained by the overhead of maintaining a 501(c)(3) nonprofit. The amount of funding we have in reserve gives us a decent amount of runway, but is insufficient for our grantmaking and hiring ambitions.

We’re looking to raise $200,000, which would be enough to bring on our contractor full-time and give us a reasonable amount of buffer for additional research funding that we would like to disburse. However, we think $50,000 is the minimum for Balsa to be viably funded to the end of 2026.

Here’s what we plan on doing in 2026, should we hit our fundraising goal:

This is the core deliverable that everything else feeds into, that was waylaid by our Section 301 work. The binder will include a short executive summary of the case for reform; one-pagers on specific impacts; a longer technical document synthesizing our funded research and the existing literature; and a FAQ addressing common objections. Much of the work is filling gaps identified through stakeholder conversations, and interpreting the information for specific audiences.

Both teams are expected to submit their papers in 2026. Once results are in, we’ll write accessible summaries for non-academic audiences, brief interested Hill offices, and incorporate findings into the policy binder.

The labor angle is underexplored in existing Jones Act research and useful for engaging unions constructively. We’re looking for proposals examining questions like: How many jobs does the Jones Act actually protect, and in which states? What’s the counterfactual employment picture under reform? What are the job creation effects in industries currently harmed by high shipping costs? A rigorous study here could shift the conversation toward a more nuanced understanding of net labor market effects.

The one-year suspension of Section 301 measures expires in late 2026, and if negotiations with China stall, the proposed port fees and export restrictions could return; we’ll track developments and be prepared to submit updated comments or testimony. The SHIPS for America Act proposes expanded cargo preference requirements facing similar vessel availability problems to those we identified in Section 301, and we’re developing analysis of cargo preference laws we can deploy if this legislation gains momentum. The goal is readiness to contribute when high-leverage, without letting monitoring consume time that should go toward the policy binder.

We can do even more with additional resources:

  • We can fund additional academic studies to strengthen the empirical case for reform, complementing our existing research initiatives, as we discover new opportunities. We estimate that each additional study costs around $30,000 to fund.

  • Zvi is not taking any payment for his work currently, but at a sufficiently high level of funding, this could change and he would dedicate more of his attention to the project. In addition, there is still an abundance of policy analysts in DC who are out of work, that we can hire more of.

  • With more funding and interest, we’d also look into spinning up a 501c4 to use going forwards for more direct political advocacy. Though of course the 501c4 would then require its own fundraising work, since we can’t mix the funds.

Donating is not the only way to give. If you have experience with maritime shipping, naval procurement, connections to labor unions, or anything else you think might be relevant to Jones Act reform, we’d be interested in talking to you and hearing your perspective. Get in touch at [email protected] and let us know how you might be able to help, whether that’s sharing your insights, making introductions, or contributing in other meaningful ways.

If you’re an economist positioned to publish in peer-reviewed journals, please consider applying to our economy or labor RFAs, and doing direct research on the issue. If you have friends who fit that profile and might be interested in this kind of work, please consider forwarding the RFAs their way.

Balsa Research is still a very small organization (me, another policy analyst at ten hours per week, and Zvi in an unpaid, very part-time advisory role) and our progress this year has been possible only through the generous support of our donors and the many people who have shared their time and expertise with us. We’re grateful for this community of supporters and collaborators who continue to believe in the importance of this work.

Discussion about this post

Keeping Up Against the Joneses: Balsa’s 2025 Fundraiser Read More »

the-revolution-of-rising-expectations

The Revolution of Rising Expectations

Internet arguments like the $140,000 Question incident keep happening.

The two sides say:

  1. Life sucks, you can’t get ahead, you can’t have a family or own a house.

  2. What are you talking about, median wages are up, unemployment is low and so on.

The economic data is correct. Real wages are indeed up. Costs for food and clothing are way down while quality is up, housing is more expensive than it should be but is not much more expensive relative to incomes. We really do consume vastly more and better food, clothing, housing, healthcare, entertainment, travel, communications, shipping and logistics, information and intelligence. Most things are higher quality.

But that does not tell us that buying a socially and legally acceptable basket of goods for a family has gotten easier, nor that the new basket will make us happier.

This post is my attempt to reconcile those perspectives.

The culprit is the Revolution of Rising Expectations, together with the Revolution of Rising Requirements.

The biggest rising expectations are that we will not have to tolerate unpleasant experiences or even dead time, endure meaningful material shortages or accept various forms of unfairness or coercion.

The biggest rising requirement is insane levels of mandatory child supervision.

  1. The Revolutions of Rising Expectations.

  2. The Revolution of Rising Requirements.

  3. Whose Line Is It Anyway?

  4. Thus In This House We Believe The Following.

  5. Real De Facto Required Expenses Are Rising Higher Than Inflation.

  6. Great Expectations.

  7. We Could Fix It.

  8. Man’s Search For Meaning.

  9. How Do You Afford Your Rock And Roll Lifestyle?

  10. Our Price Cheap.

  11. It Takes Two (1).

  12. It Takes Two (2).

  13. If So, Then What Are You Going To Do About It, Punk?

  14. The Revolution of Rising Expectations Redux.

Our negative perceptions largely stem from the Revolution of Rising Expectations.

We find the compromises of the past simply unacceptable.

This includes things like:

  1. Jobs, relationships and marriages that are terrible experiences.

  2. Managing real material shortages.

  3. Living in cash-poor ways to have one parent stay at home.

  4. Even increasingly modest levels of physical and psychological risk.

  5. Old levels of things such as hypocrisy, secrecy, elite-only decision making, consent requirements, discrimination, racism, sexism, homophobia, transphobia, enforcement of social and gender norms, familial obligation, abuse and coercion of all kinds, lack of consent, untreated physical mental health problems and so on.

  6. That old people have most of the wealth while young people are often broke.

  7. Insufficiently high quality or often quantity of goods across the board.

  8. Enduring frequent social and familial activities that are boring or unpleasant.

  9. Tolerating even short periods of essentially dead time, including long commutes.

  10. Marrying or having children while continuing to rent instead of owning a home.

These are mostly wise things to dislike. They used to be worse. That was worse.

Not that most people actually want to return. Again, Rising Expectations.

The Robber Baron: More to the point. You can move to almost any town in the Midwest with 20,000-200,000 people and live like a freaking king on a normal income.

You just can’t take trips to Disney every year, go out to eat every week, or have name brand everything.

Shea Jordan Smith (quoting Matthew Yglesias, link has key 11 second video): The issue is that living that lifestyle—never taking plane trips for vacation, rarely dining out, having a small house—would mean living like a poor person by today’s standards and people don’t want to do that. But that’s because we’ve gotten richer, not poorer.

Doing this requires you to earn that ‘normal income’ from a small town in the midwest, which is not as easy, and you have to deal with all the other problems. If you can pull off this level of resisting rising expectations you can then enjoy objectively high material living standards versus the past. That doesn’t solve a lot of your other problems. It doesn’t get you friends who respect you or neighbors with intact families who watch out for your kids rather than calling CPS. And while you might be okay with it, your kids are going to face overwhelming pressures to raise expectations.

Is the 2025 basket importantly better? Hell yes. That doesn’t make it any easier to purchase the Minimum Viable Basket.

That then combines with the Revolution of Rising Requirements.

In addition to the demands that come directly from Rising Expectations, there are large new legal demands on our time and budgets. Society strongarms us to buy more house, more healthcare, more child supervision and far more advanced technology. The minimum available quality of various goods, in ways we both do and don’t care about, has risen a lot. Practical ability to source used or previous versions at old prices has declined.

The killer requirement, where it is easy to miss how important it is, is that we now impose utterly insane child supervision requirements on parents and the resulting restrictions on child freedoms, on pain of authorities plausibly ruining your life for even one incident.

This includes:

  1. Utterly insane child supervision requirements and restrictions on child freedoms.

  2. A wide variety of burdensome requirements on everyday products and activities, including activities that were previously freely available.

  3. Minimum socially and often legally acceptable housing requirements.

  4. De facto required purchases of high amounts of healthcare and formal education.

  5. Hugely increased ‘safety’ requirements across the board.

  6. Increased required navigation of bureaucracy and complex systems.

  7. Forced interactions with a variety of systems that are Out To Get You.

  8. Navigating an increasingly hostile and anti-inductive information environment.

  9. The replacement of goods that were previously socially provided, but which now must be purchased, which adds to measured GDP but makes life harder.

We can severely cut expenses in various ways, but no, contra Matthew Yglesias, you cannot simply buy the 1960s basket of goods or services or experiences if you want to live most places in the United States. Nor if you pulled this off would you enjoy the social dynamics required to support such a lifestyle. You’d get CPS called on you, be looked down upon, no one would help watch your kids or want to be your friends or invite you to anything.

You don’t get to dismiss complaints until those complaints are stated correctly.

A rule for game designers is that:

  1. When a player tells you something is wrong, they’re right. Believe them.

  2. When a player tells you what exactly is wrong and how to fix it? Ignore them.

  3. Still register that as ‘something is wrong here.’ Fix it.

People are very good at noticing when things suck. Not as good at figuring out why.

As in, I actually disagree with this, as a principle:

Matthew Yglesias: Some excellent charts and info here, but I think the impulse to sanewash and “clean up” false claims is kind of misguided.

If we want to address people’s concerns, they need to state the concerns accurately.

No. If you want to address people’s concerns rather than win an argument, then it is you who must identify and state their concerns accurately.

Not them. You. It’s up to you to figure out what the actual problems are.

Their job is to alert you that there is an issue, and to give you as much info as they can.

If this involves them making false claims along the way, that is good data. Notice that. Point that out. Do not use it as a reason to dismiss the underlying complaint that ‘things suck.’ There’s something that sucks. Figure it out.

What you definitely do not want to do is accept the false dystopian premise that America, the richest large country in human history, has historically poor material conditions.

Brad: A lot of folks seem think they are going to bring radicalized young people back into the fold by falsely conceding that material conditions in the most advanced, prosperous country in the history of the world are so bad that it’s actually reasonable to become a nihilistic radical.

Liberalism doesn’t work if you make expedient concessions to abject delusions.

Timothy Lee: Yeah, I think it feels like an easy concession to tell young people “ok I admit your generation has been dealt a bad hand but…” But when everyone does this it creates a consensus that today’s young people are facing uniquely bad material conditions, which they aren’t.

  1. We live in an age of wonders that in many central ways is vastly superior.

  2. I strongly prefer here to elsewhere and the present to the past.

  3. It is still very possible to make ends meet financially in America.

  4. Real median wages have risen.

However, due to rising expectations and rising requirements:

  1. The cost of the de facto required basket of goods and services has risen even more.

  2. Survival requires jumping through costly hoops not in the statistics.

  3. We lack key social supports and affordances we used to have.

  4. You cannot simply ‘buy the older basket of goods and services.’

  5. Staying afloat, ‘making your life work,’ has for a while been getting harder.

  6. This is all highly conflated with ‘when things were better’ more generally.

All of that is before consideration of AI, which this post mostly excludes.

When people say the data are lying to you, or the data is wrong, they’re almost always wrong. Jeremy here responds to one such attempt from the previous go around. The data are what they are.

Yet the voters are not wrong. The practical ‘cost of living’ has gone up.

Voters realize this. They hate it. Inflation is now ~2.5%, but the annual rise in the cost of the basket of goods and services we insist you purchase or provide is higher. The new basket being superior in some ways is nice but mostly irrelevant.

Here’s a stark statement of much of this in its purest form, on the housing front.

Aella: being poorer is harder now than it used to be because lower standards of living are illegal. Want a tiny house? illegal. want to share a bathroom with a stranger? illegal. The floor has risen and beneath it is a pit.

Julian Gough: Yes. There used to be a full spectrum of options between living under a bridge and living in a nice flat or house. (I once lived in a converted meat storage room over a butcher’s shop, and briefly, and admittedly unofficially, in a coal cellar with a 5ft ceiling, and no electricity. I was fine, and life was interesting.)

Now there’s a hard cutoff, with no options in that zone between (free) under-a-bridge and (expensive) nice flat, where most artists and poor people used to live. So where can we now live?

The two Revolutions combine to make young people think success is out of reach.

Millennials, in terms of many forms of material wealth and physical living standards, have much higher standards than previous generations, and also are forced to purchase more ‘valuable’ baskets of goods.

This leads them to forget that young people have always been poor on shoestring budgets. The young never had it easy in terms of money. Past youth was even poorer, but were allowed (legally and socially) to economize far more.

Today’s youth have more income and are accumulating more wealth, and mostly matching past homeownership rates, despite higher expenses especially for housing, and new problems around atomization and social media.

But that is paper wealth. It excludes the wealth of having families and children.

Expectations are out of control.

Jason C: Might be an expectations problem vs an actual income one.

$587k is nuts. Claude suggests $150k-$250k depending on location, which seems reasonable as a combined household income for full-on life ‘success,’ and points out that trajectory is a factor as well.

John Ganz: By making comparisons constant, the internet has created a condition of universal poverty. When even the richest man in the world is not satisfied and acts like a beggar for social recognition, why should anybody be?

When the debate involves people near or above the median, the boomers have a point. If you make ~$100k/year and aren’t in a high cost of living area (e.g. NYC, SF), you are successful, doing relatively well, and will be able to raise a family on that single income while living in many ways far better than it was possible to live 50 years ago.

Certainly $587k is an absurdity. The combination of Rising Expectations and the perception of Rising Requirements has left an entire generation defining ‘success’ as something almost no one achieves, while also treating ‘success’ as something one needs in order to start a family. No wonder young people think they can’t get ahead, including many who are actually ahead.

That’s in addition to the question of what constitutes a ‘good job.’ Most historical jobs, by today’s standards of lived experience, sucked a lot.

There’s also this: People reliably think they are poorer, in relative terms, than they are, partly due to visibility asymmetry and potentially geographic clustering, and due to the fatness of the right tail having an oversize impact.

These perceptions have real consequences. Major life milestones like marriage and children get postponed, often indefinitely. Young people, especially young men, increasingly feel compelled to find some other way to strike it rich, contributing to the rise of gambling, day trading, crypto and more. This is one of the two sides of the phenomenon Derek Thompson wrote about in the excellent The Monks In The Casino, the other being atomization and loneliness.

The good news is that a lot of this is a series of related unforced errors. A sane civilization could easily fix many of them with almost no downsides.

We could choose to, without much downside:

  1. Make housing vastly cheaper especially for those who need less.

  2. Make childcare vastly less necessary and also cheaper, and give children a wide variety of greater experiences for free or on the cheap.

  3. Make healthcare vastly cheaper for those who don’t want to buy an all-access pass.

  4. Make education vastly cheaper and better.

  5. Make energy far more abundant and cheap, which helps a lot of other things.

And so on. Again, this excludes AI considerations.

The bad news is there is no clear path to our civilization choosing to fix these errors, although every marginal move towards the abundance agenda helps.

We could also seek to strengthen our social and familial bonds, build back social capital and reduce atomization, but that’s all much harder. There’s no regulatory fix for that.

Matt Yglesias points out that this goes hand in hand with Americans putting less value on things money can’t buy:

Matt Yglesias: People have started putting less emphasis on non-money sources of value, which I think is naturally going to lead more people to be unhappy with the amount of money they make.

A nice thing about valuing religion, kids, and patriotism is that these are largely non-positional goods that everyone can chase simultaneously without making each other miserable.

This change in values is not good for people’s life experience and happiness. If being happy with your financial success requires you to be earning and spending ahead of others, and it becomes a positional good, then collectively we’re screwed.

And Zac Hill points out the other problems with people’s #SquadGoals.

Zac Hill: The real reason so many people feel despair is MUCH closer to “I think my life will end in meaningless oblivion unless I am on an epic quest, a billionaire, or gigafamous, but this is gauche to admit and so I use proxy variables” than it is to “I can’t live on less than $140,000”

Also: “I, personally, will never marry/fuck an attractive person.”

Shockingly, all of this is mostly about how we create, calibrate, and manage expectations.

There were ways in which I did not ‘feel’ properly successful until I stopped renting and bought an apartment, despite the decision to previously not buy being sensible and having nothing to do with lack of available funds. Until you say ‘this house is mine’ things don’t quite feel solid.

Many view ‘success’ as being married and owning a home, regardless of total wealth.

If those people don’t achieve those goals, they will revolt against the situation.

So this chart seems rather scary:

Vance Crowe: This does not make for a stable society.

That leads to widespread expressions of (highly overstated) hopelessness:

Boring Business: An entire generation under the age of 30 is coming to realization that having a family and home will never be within the grasp of reality for them

Society is not ready for the consequences of this. A generation with no stake in the system would rather watch it burn. All the comments echo the same exact sentiment. If homeownership is not fixed, it is a steady slope to socialism from here.

Another issue is that due to antipoverty programs and subsidies and phase outs, as covered last time, including things not even covered there like college tuition, the true marginal tax rate for families is very high when moving from $30k to up to ~$100k.

Social media and influencing make all of this that much worse. We’re up against severe negativity bias and we’re comparing ourselves to those who are most successful at presenting the illusion of superficial success.

Welcome to the utter screwing that is the accelerated Revolution of Rising Expectations, in addition to the ways in which Zoomers are indeed utterly screwed.

Timothy Lee: The idea that Zoomers are “utterly screwed” in material terms is total nonsense and I wish people would stop repeating it. Housing is a bit more expensive than previous generations. Many other necessities — food, clothing, most manufactured goods are cheaper than ever.

I think the perception that Zoomers are “utterly screwed” is a combination of (1) opinion being shaped by people who live in the places with the most dysfunctional housing markets (2) extreme negativity bias of social media algorithms (3) nobody has much incentive to push back.

Nathan Witkin: I would add:

  1. Widespread sticker shock from post-Covid inflation.

  2. An ever-higher perceived baseline for career success and material comfort, esp. among Zoomers, also largely due to social media.

Timothy Lee: I think this #5 here is an important reason why so many people feel beleaguered. People’s expectations for what “counts” as a middle-class standard of living is a lot higher than in previous generations, and so they feel poor even if they are living similarly.

Beyond social media, I think another factor is that people compare their parents’ standard of living at 55 with their own standard of living at 25 or whatever. Nobody remembers how their parents lived before they were born.

I don’t think the “young people feeling they’re uniquely beleaguered” thing is new either!

That’s two groups of loadbearing mechanisms raised here on top of the general Revolutions of Rising Expectations and Requirements arguments earlier.

  1. Negativity bias alongside Rising Expectations for lifestyle in social media, largely due to it concentrating among expensive cities with dysfunctional housing markets.

  2. Post-Covid inflation, right after a brief period of massive subsidies to purchasing power.

There are also real problems, as I will address later at length, especially on home ownership and raising children. Both are true at once.

Want to raise a family on one median income today? You get what you pay for.

Will Ricciardella: Can a family live on one income today?

Yes, but not today’s lifestyle on yesterday’s budget.

Here’s what it actually looks like:

• 1,000 sq ft home, not 2,500

• One used car

• One family phone — no smartphones for kids

• One TV, no subscriptions

• No microwave, no central A/C

• Home-cooked meals, no dining out

• No childcare, 1 parent stays home

• Public schools only

• Local sports, not travel leagues

• Basic health insurance: pay dental & extras out of pocket

• Simple clothes, thrift store toys

• Rare vacations, little debt

That’s how most families lived for decades and they raised kids, built communities, and made it work.

The issue isn’t that you can’t raise a family on one income.

The issue is that we’ve inflated “middle class” to mean upper middle luxuries: two cars, two iPhones, dining out, Amazon Prime, orthodontics, soccer trips, Disneyland, and a home office with Wi-Fi.

In 1960, one income worked because expectations were lower, families were more self-reliant, and debt wasn’t a lifestyle.

You want one income? You can do it.

But you have to live like the people who actually did it.

Not poorer, just simpler and more deliberate.

The people of the past didn’t have a choice, but you do.

Tumultuous Turkey: Try getting a job without a cell phone. You can’t.

Try finding a 1000 sq ft home. You can’t.

Try getting a house phone without Internet and cable included. you can’t.

Avg cost of a used car is 25k in 2024. Try no car.

We are not the problem. The tax & gov is the problem.

Analytic Valley Girl Chris: This advice would be less fucking retarded if you didn’t put a fucking microwave in the same cost bracket as a fucking air conditioner

Is there a lot of slack in the typical household budget if you are willing to sacrifice?

Yes. You can buy things like cars that cost less than the average. There are limits.

It is always interesting to see what such lists want to sacrifice. A lot of the items above are remarkably tiny savings in exchange for big hits to lifestyle. In others, they do the opposite. People see richer folks talking to them like this, and it rightfully pisses them off.

  1. No microwave? To save fifty bucks once and make cooking harder? What?

  2. No A/C is in many places in America actively dangerous.

  3. One family phone is completely impossible in 2025. People assume you have a phone. That doesn’t mean you need two iPhones or a premium plan, old phones are cheap and work fine and there are relatively cheap data plans out there, US Mobile is $36/mo total.

  4. One car may or may not be possible depending on where you live. Are you going to fully strand the other person all day?

  5. You can want 1,000 square feet but that means an apartment, many areas don’t even offer this in any configuration that plausibly works.

You can see the impact of the Revolutions in the replies, only some of which is about the smaller crazy asks. No, you can’t really do this. The world won’t allow it and to the extent it does it will treat you horribly and your kids will not accept it.

Another example of the gaffe of saying what you actually think about what to cut, as he complains about kids being ‘entitled to 37 pencils’:

The Bulwark: Trump at his speech on the economy: “You can give up certain products. You can give up pencils…They only need one or two. They don’t need that many…You don’t need 37 dolls for your daughter. Two or three is nice, but you don’t need 37 dolls.”

The thing about pencils is as you use them they disappear. You need another pencil. There are many places in education we can likely cut, and no you do not ‘need 37 dolls’ and we used to have far fewer toys and that was fine, but pencils?

Thus, people increasingly believe they need two incomes to support a family.

They’re noticing something sucks. Assume they’re right. Figure out what it is.

Matthew Yglesias: The claim that the *absolute affordabilityof being a married, one-earner family with kids has fallen would — if it were true — have straightforward win-win policy remedies like “higher wages and incomes.”

But it’s not true.

When you reformulate to a more accurate claim what you end up with is the observation that it is is hard for one person to earn as much income as two people and that the wedge has grown as women’s earning power has increased.

This is very true but what’s the fix?

One that would “work” would be to push women generally out of opportunities for careers and white collar work — something more conservatives are tip-toeing around but don’t quite want to say.

[Links to: Women’s professional rise is good, actually.]

A change can be good. That doesn’t get you out of dealing with the consequences.

In this case, the consequences are that the second income gets factored into the Revolutions of Rising Expectations and Requirements.

Absolute affordability of being a one-earner family with kids has fallen, because again:

  1. You have more ‘real income.’

  2. You are legally required to purchase more and higher quality goods and services, due to the Revolution of Rising Requirements, especially child supervision.

  3. You are also under large social and internal pressures to purchase more and higher quality goods, due to the Revolution of Rising Expectations.

  4. That’s nice for you, if you can afford the goods and services.

  5. That’s still going to cost you, and you can’t pretend otherwise.

  6. You think you can opt out of that? Nah, not really bro, not easily.

First, some brief questions worth asking in advance:

  1. Can you actually execute on the one income plan?

  2. If not, what are you going to do about it?

Zac Hill: [That two incomes buy more than one] is the rub of this whole discourse. Wages being much higher means the cost of a person not working is also much higher. But is that a problem in need of a solution? If so, what is the solution, and why is “accept a much lower income” not also an acceptable solution?

Even if you could somehow execute on the above plan to survive on one income by having life suck in various ways, that plan also takes two.

Not two incomes. Two parents.

Hey baby, want to live on one income, Will Ricciardella style? Hey, come back here.

Telling young men in particular ‘you can do it on one income’ via this kind of approach is a joke, because try telling the woman you want to marry that you want to live in the style Will Ricciardella describes above. See if she says yes.

The question ‘so what are you going to do about it?’ is still a very good one.

What do you do if families have the option of two incomes, and we set Expectations and Requirements based on two incomes, and you want to get by with only one? Adjusting how you spend money, and using the other parent’s time to save some money, will only go so far.

If you want one income households and stay at home parents to be viable here, I would say four things are required, in some combination. You don’t need all four, but you definitely need #1, and then some additional help.

  1. You can deal with the Requirements. Let people purchase much less health care, child care and housing. Give people a huge amount of Slack, such that they can survive on one income despite the ability to earn two, and also pay for kids.

  2. You can deal with the Expectations. Raise the status and social acceptability of living cheap and making sacrifices.

  3. You can lessen the marginal returns to a second income, by increasing effective marginal tax rates. And That’s Terrible, don’t do this, but do note it would work.

  4. You can improve the economics of having children more generally. Children are an expensive public good. We can and should use the tax code to shift the burden.

I usually discuss these issues and questions, especially around #4, in terms of declining fertility. It is the same problem. If people don’t feel able to have children in a way they find acceptable, then they will choose not to have children.

On the marginal tax rates, consider these graphs.

That’s all obviously terrible policy, but it also means that you can obviously support a family on one $30k income if you could have done it on two $30k incomes, since your net benefits take home pay is not substantially lower after child care.

Alternatively or additionally, from a policy perspective, you can accept that you’re looking at two income households, and plan the world around making that work.

The big problem with a two income household is child supervision.

  1. The increased child supervision requirements, as in things like if anyone spots a 12 year old not in a parent’s line of sight they think about calling the cops, are insanely expensive in every sense. This is the biggest pain point.

  2. The second biggest pain point is direct costs for daycare, which we could make substantially cheaper if we wanted to, and we could also subsidize it.

  3. As Matthew Yglesias points out, our school system and its endless days off implicitly assumes the mother can watch the children, while we also forbid letting children spend those days on their own, often even indoors. The obvious solution is to not give younger kids days off that aren’t also national holidays, or to offer free other childcare on those days, where ‘older’ is defined as ‘can leave them on their own for the day and no one tries to call CPS.’

Ideally you do all of that anyway, it’s badly needed, and you open up both choices.

Now, back to the question of what is going on.

What should we make here of the fact that spending on food, clothing and housing (aka ‘necessities’) has collectively declined as a percentage of income, and also the food is way better and the houses are bigger?

The definition of ‘necessity’ is not a constant, as the linked post admits. The ‘necessities’ that have gotten cheaper are the ‘necessities of the past.’ If things like education and health care and cell phones are de facto mandatory, and you have to buy them, then they are now necessities, even if in 1901 the services in question flat out didn’t exist.

That’s not to downplay how much the past sucked. It sucked a lot. Go see Hamnet or Train Dreams or The Housemaid.

But there are other ways it didn’t suck. In large part that was because you were allowed to suck without the rug being pulled out from under you for the crime of not having a rug, and also you didn’t have to compare to all the families with fancy rugs.

Life is vastly better. Life also really sucks compared to Rising Expectations.

Setting aside AI, what do we do about it?

  1. It’s tough to lower the Rising Expectations. We should still do what we can here, primarily via cultural efforts, in the places we can do that.

  2. Rising Requirements are often unforced errors. We Can Fix It. We should attack. If we legalized housing, and legalized passing up Hansonian medicine, and got to a reasonable place on required child supervision, that would do it.

  3. Pay Parents Money. Children are a public good, and we are putting so much of the cost burden directly on the parents. People feel unable to raise families, and don’t have children they want to have. We should do more transfers from the childless to those with children, and less of other types of transfers. Also eliminate all forms of the marriage penalty. Consider explicit subsidies for one income married families with kids under some threshold age. As in, acknowledge that stay at home parent is a job, and pay them for it.

  4. Provide more public goods for families. Remarkably small things can matter a lot.

  5. Reforming our system of transfers and benefits and taxes to eliminate the Poverty Trap, such that no one ever faces oppressive marginal tax rates or incentives to not work, and we stop forcing poor families to jump through so many hoops.

  6. All other ways of improving things also improve this. Give people better opportunities, better jobs, better life experiences, better anything, and especially better hope for the future in any and all ways.

Discussion about this post

The Revolution of Rising Expectations Read More »

school-security-ai-flagged-clarinet-as-a-gun-exec-says-it-wasn’t-an-error.

School security AI flagged clarinet as a gun. Exec says it wasn’t an error.


Human review didn’t stop AI from triggering lockdown at panicked middle school.

A Florida middle school was locked down last week after an AI security system called ZeroEyes mistook a clarinet for a gun, reviving criticism that AI may not be worth the high price schools pay for peace of mind.

Human review of the AI-generated false flag did not stop police from rushing to Lawton Chiles Middle School. Cops expected to find “a man in the building, dressed in camouflage with a ‘suspected weapon pointed down the hallway, being held in the position of a shouldered rifle,’” a Washington Post review of the police report said.

Instead, after finding no evidence of a shooter, cops double-checked with dispatchers who confirmed that a closer look at the images indicated that “the suspected rifle might have been a band instrument.” Among panicked students hiding in the band room, police eventually found the suspect, a student “dressed as a military character from the Christmas movie Red One for the school’s Christmas-themed dress-up day,” the Post reported.

ZeroEyes cofounder Sam Alaimo told the Post that the AI performed exactly as it should have in this case, adopting a “better safe than sorry” outlook. A ZeroEyes spokesperson told Ars that “school resource officers, security directors and superintendents consistently ask us to be proactive and forward them an alert if there is any fraction of a doubt that the threat might be real.”

“We don’t think we made an error, nor does the school,” Alaimo said. “That was better to dispatch [police] than not dispatch.”

Cops left after the confused student confirmed he was “unaware” that the way he was holding his clarinet could have triggered that alert, the Post reported. But ZeroEyes’ spokesperson claimed he was “intentionally holding the instrument in the position of a shouldered rifle.” And seemingly rather than probe why the images weren’t more carefully reviewed to prevent a false alarm on campus, the school appeared to agree with ZeroEyes and blame the student.

“We did not make an error, and the school was pleased with the detection and their response,” ZeroEyes’ spokesperson said.

School warns students not to trigger AI

In a letter to parents, the principal, Melissa Laudani, reportedly told parents that “while there was no threat to campus, I’d like to ask you to speak with your student about the dangers of pretending to have a weapon on a school campus.” Along similar lines, Seminole County Public Schools (SCPS) communications officer, Katherine Crnkovich, emphasized in an email to Ars to “please make sure it is noted that this student wasn’t simply carrying a clarinet. This individual was holding it as if it were a weapon.”

However, warning students against brandishing ordinary objects like weapons isn’t a perfect solution. Video footage from a Texas high school in 2023 showed that ZeroEyes can sometimes confuse shadows for guns, accidentally flagging a student simply walking into school as a potential threat. The advice also ignores that ZeroEyes last year reportedly triggered a lockdown and police response after detecting two theater kids using prop guns to rehearse a play. And a similar AI tool called Omnilert made national headlines confusing an empty Doritos bag with a gun, which led to a 14-year-old Baltimore sophomore’s arrest. In that case, the student told the American Civil Liberties Union that he was just holding the chips when AI sent “like eight cop cars” to detain him.

For years, school safety experts have warned that AI tools like ZeroEyes take up substantial resources even though they are “unproven,” the Post reported. ZeroEyes’ spokesperson told Ars that “in most cases, ZeroEyes customers will never receive a ‘false positive,’” but the company is not transparent about how many false positives it receives or how many guns have been detected. An FAQ only notes that “we are always looking to minimize false positives and are constantly improving our learning models based on data collected.” In March, as some students began questioning ZeroEyes after it flagged a Nerf gun at a Pennsylvania university, a nearby K-12 private school, Germantown Academy, confirmed that its “system often makes ‘non-lethal’ detections.”

One critic, school safety consultant Kenneth Trump, suggested in October that these tools are “security theater,” with firms like ZeroEyes lobbying for taxpayer dollars by relying on what the ACLU called “misleading” marketing to convince schools that tools are proactive solutions to school shootings. Seemingly in response to this backlash, StateScoop reported that days after it began probing ZeroEyes in 2024, the company scrubbed a claim from its FAQ that said ZeroEyes “can prevent active shooter and mass shooting incidents.”

At Lawton Chiles Middle School, “the children were never in any danger,” police confirmed, but experts question if false positives cause students undue stress and suspicion, perhaps doing more harm than good in absence of efficacy studies. Schools may be better off dedicating resources to mental health services proven to benefit kids, some critics have suggested.

Laudani’s letter encouraged parents to submit any questions they have about the incident, but it’s hard to gauge if anyone’s upset. Asked if parents were concerned or if ZeroEyes has ever triggered lockdown at other SCPS schools, Crnkovich told Ars that SCPS does not “provide details regarding the specific school safety systems we utilize.”

It’s clear, however, that SCPS hopes to expand its use of ZeroEyes. In November, Florida state Senator Keith Truenow submitted a request to install “significantly more cameras”—about 850—equipped with ZeroEyes across the school district. Truenow backed up his request for $500,000 in funding over the next year by claiming that “the more [ZeroEyes] coverage there is, the more protected students will be from potential gun violence.”

AI false alarms pose dangers to students

ZeroEyes is among the most popular tools attracting heavy investments from schools in 48 states, which hope that AI gun detection will help prevent school shootings. The AI technology is embedded in security cameras, trained on images of people holding guns, and can supposedly “detect as little as an eighth of an inch of a gun,” an ABC affiliate in New York reported.

Monitoring these systems continually, humans review AI flags, then text any concerning images detected to school superintendents. Police are alerted when human review determines images may constitute actual threats. ZeroEyes’ spokesperson told Ars that “it has detected more than 1,000 weapons in the last three years.” Perhaps most notably, ZeroEyes “detected a minor armed with an AK-47 rifle on an elementary school campus in Texas,” where no shots were fired, StateScoop reported last year.

Schools invest tens or, as the SCPS case shows, even hundreds of thousands annually, the exact amount depending on the number of cameras they want to employ and other variables impacting pricing. ZeroEyes estimates that most schools pay $60 per camera monthly. Bigger contracts can discount costs. In Kansas, a statewide initiative equipping 25 cameras at 1,300 schools with ZeroEyes was reportedly estimated to cost $8.5 million annually. Doubling the number of cameras didn’t provide much savings, though, with ZeroEyes looking to charge $15.2 million annually to expand coverage.

To critics, it appears that ZeroEyes is attempting to corner the market on AI school security, standing to profit off schools’ fears of shootings, while showing little proof of the true value of its systems. Last year, ZeroEyes reported its revenue grew 300 percent year over year from 2023 to 2024, after assisting in “more than ten arrests through its thousands of detections, verifications, and notifications to end users and law enforcement.”

Curt Lavarello, the executive director of the School Safety Advocacy Council, told the ABC News affiliate that “all of this technology is very, very expensive,” considering that “a lot of products … may not necessarily do what they’re being sold to do.”

Another problem, according to experts who have responded to some of the country’s deadliest school shootings, is that while ZeroEyes’ human reviewers can alert police in “seconds,” police response can often take “several minutes.” That delay could diminish ZeroEyes’ impact, one expert suggested, noting that at an Oregon school he responded to, there was a shooter who “shot 25 people in 60 seconds,” StateScoop reported.

In Seminole County, where the clarinet incident happened, ZeroEyes has been used since 2021, but SCPS would not confirm if any guns have ever been detected to justify next year’s desired expansion. It’s possible that SCPS has this information, as Sen. Truenow noted in his funding request that ZeroEyes can share reports with schools “to measure the effectiveness of the ZeroEyes deployment” by reporting on “how many guns were detected and alerted on campus.”

ZeroEyes’ spokesperson told Ars that “trained former law enforcement and military make split-second, life-or-death decisions about whether the threat is real,” which is supposed to help reduce false positives that could become more common as SCPS adds ZeroEyes to many more cameras.

Amanda Klinger, the director of operations at the Educator’s School Safety Network, told the Post that too many false alarms could carry two risks. First, more students could be put in dangerous situations when police descend on schools where they anticipate confronting an active shooter. And second, cops may become fatigued by false alarms, perhaps failing to respond with urgency over time. For students, when AI labels them as suspects, it can also be invasive and humiliating, reports noted.

“We have to be really clear-eyed about what are the limitations of these technologies,” Klinger said.

Photo of Ashley Belanger

Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.

School security AI flagged clarinet as a gun. Exec says it wasn’t an error. Read More »

neural-dsp-models-john-mayer’s-entire-amp-and-effects-rig—and-it-sounds-great

Neural DSP models John Mayer’s entire amp and effects rig—and it sounds great


Mayer gets the “Archetype” treatment.

Guitarists today are spoiled for choice, and that goes doubly true for players who use computer-based amp modeling software. I’m one such player, and I don’t miss the size, weight, deafening volume, or cost of owning an amp and cabinet collection, to say nothing of all those pedals and cables. For clean to mid-gain tones alone, I already have more terrific options than I need, including Neural DSP’s Tone King and Cory Wong and Mateus Asato, Polychrome DSP’s Lumos, and Universal Audio’s new Paradise Guitar Studio. All work slightly differently, but they can each output record-ready tones that are really, really close to the (often incredibly expensive) hardware that they model, and they each give you plenty of great-sounding presets to start from.

So do we really need one amp sim package?

Neural DSP thinks we do, because the Finnish company just dropped a major new release yesterday called Archetype: John Mayer X. It doesn’t model Mayer’s type of gear but his actual hardware units, along with all the actual settings he uses in the studio and on stage. It even has some presets that he designed. Which is great if you want to sound like John Mayer—but what does the software offer for those of us not trying to cover Continuum?

To find out, I spent a few hours playing with Mayer X, and I came away impressed. Neural DSP has released so many metal amp sims in the last few years that I’ve come to associate the company with downtuned chugga-chugga. Don’t get me wrong: I like long hair, skulls, and palm-muted riffs as much as the next person, but it’s nice to have some variety.

Mayer X’s effects pedal lineup.

Mayer X brings that variety by modeling three of Mayer’s amps: a 1964 Fender Vibroverb, a Dumble Steel String Singer #002, and a not-yet-released prototype Two-Rock. Each amp also comes with a model of its associated speaker cabinet, in front of which you can freely position zero, one, or two microphones to shape the recorded sound and to control the room tone as desired.

This is standard practice for Neural DSP’s “Archetypes” line, but one wrinkle is the new “three-in-one amp” mode that blends the sounds from all amps at once. Here’s the marketing speak: “It merges all three amps and their matching cabinets with Mayer’s exact settings, mic placements, and EQ decisions, creating a unified, dimensional sound that reflects his full signal path without requiring individual amp balancing.” In this mode, each amp gets a single knob, but you are always free to turn this off and use one particular amp instead, which exposes more controls for that unit.

Also new here is an effect that Neural calls the “Gravity Tank.” This effects unit combines Mayer’s “favorite spring reverb” with the harmonic tremolo found in the Victoria Reverberato. It sounds great; while I like spring reverbs for character, especially on guitar parts, some are a bit too “drippy” for me. And although this one definitely sounds like a spring, it’s subtle and spacious rather than clangy or overly metallic, and the tremolo—which you can sync to your DAW’s tempo—sounds terrific too.

The Gravity Tank.

Instead of a compressor pedal at the front of the amp, as in many Neural DSP plugins, the Mayer X Archetype features a rack-mounted compressor (this one is modeled off the famous Distressor) that comes after the amp. The controls are much simpler than a real Distressor, but under the hood, Neural says that it is using “Mayer’s exact attack, release, and sidechain settings”; users, however, only need to spin the Input and Output dials.

Above the compressor is an EQ, but unlike Neural’s usual practice, this is not a multiband graphic EQ. Instead, it’s a four-band semi-parametric EQ with knobs rather than sliders, plus a high-pass and low-pass filter. The EQ is said to “balance the naturally full low end of [Mayer’s] amplifiers.”

There are effects pedals here, too—five are up front, before the amps. You get a volume boost pedal meant especially to thicken the tone of single-coil pickups like those found on Fender Stratocasters or PRS Silver Sky guitars (which Mayer also helped design). Then you get an “antelope filter” that provides a sort of auto-wah effect; usually, I hate these sorts of things, but this one sounds good enough that I could see myself using it on lead lines without feeling like I’m some kind of ’70s funk refugee.

After that come two drive pedals that are modeled on the Klon Centaur, the Ibanez TS-10, and the Marshall Bluesbreaker MK1. That’s right: You get three effects units jammed into two virtual pedals, because one of the pedals has a toggle switch to offer two different tones.

Finally, there’s a bucket brigade delay meant largely for slapback echoes, while a separate post-amp effects section offers more traditional delay and reverb (both hall and plate) for space.

All three amps.

While you won’t find this exact gear and these exact settings elsewhere, several of the amp simulation suites mentioned at the top of this piece provide plenty of “ballpark” options. (Paradise Guitar Studio, for instance, also models a Klon Centaur pedal and offers boost pedals and even more overdrive pedal options, along with spring reverb and bucket brigade delays.)

Whether you need (or “need”) Mayer X depends on just what other gear you have and what kind of tone you’re chasing. To me, the presets in Mayer X sound just slightly more modern than Paradise Guitar Studio, which especially emphasizes “classic” rock sounds from the ’60s to the ’90s. And Mayer X offers so many more amps and effects than Neural DSP’s Tone King, which I previously used for some of these sorts of sounds.

One of the best things about this package is that it is not “hyped” to sound over the top in standalone guitar demos, which is why its sounds fit so well into mixes. Reverb, delay, tremolo, boost, and drive are subtle and judicious, as is compression. Nearly everything is usable if you play anywhere in the pop/blues/rock/funk landscape. Even effects like freeze delay and the antelope filter—two types of effects that generally feel irrelevant or gimmicky to me—here inspire actual creativity. This is my personal taste talking—yours may differ—but the entire Mayer X package offers tone colors I would actually use in projects rather than garish neons that sound “impressive” but are unlikely to work as-is in any given song.

So if you’re looking for Mayer’s brand of smooth-but-full blues-inspired leads or his edge of breakup rhythm tones, John Mayer X is certainly a good way to get it in one package. This doesn’t feel like a cash-in, either; the quality and variety is immediately apparent, especially in new or custom bits like the boost pedal, the antelope filter, the Gravity Tank, and the “three-in-one” amp.

Just to see what I could do with almost no tweaking, I played around with presets for a couple of hours and came up with this short demo that features rhythm, double-tracked rhythm, filtered, overdriven rhythm, and delayed lead sounds. I even laid down a little bass (Mayer X does include a few bass-specific presets to get you started). To me, everything works well right out of the box, and the sounds blend well with each other (and with bass/drum tracks) in the mix, something not always true of presets. A little EQ and some mild master bus processing, and I ended up with the demo below:

Redditors who have played with the plugin so far seem impressed. “Absolutely blown away. Every single amp, mic, cab and pedal option is usable and sounds amazing,” wrote one.

“I’m a mostly clean-to-slight-crunch player, and this is by FAR the most plug-in-and-get-great-sounds-out-of-it NDSP plugin for that style that I’ve tried,” wrote another.

But they also echo my chief complaint. The downside of all these guitar sim plugins is that they are getting increasingly expensive. Universal Audio’s recent Paradise Guitar Studio claims a full price of $199 (I say “claims” because most of the company’s products are on sale most of the time). John Mayer X is going for €169 + tax in the US ($198 at current currency rates), and even more in Europe, while Neural DSP’s previous Archetype, the Misha Mansoor X, is only €125 ($146). Perhaps in this Archetype, the “X” stands for “expensive”?

The new compressor and EQ.

That’s a lot of scratch for a plugin, though of course this one models gear worth many thousands of dollars and is far cheaper than buying modeling hardware like Neural DSP’s own Quad Cortex. (Those inclined to wait may be able to pick up Mayer X during one of Neural DSP’s biannual sales, often at 50 percent off.) And this one certainly sounds great.

If you’re one of those who suffer from gear acquisition syndrome (GAS), potent in both its physical and digital forms, these $150–$200 plugins add up quickly. Buy four or five and you’re into some real money! So if you already have other clean to mid-gain amp sims that work well for you, wisdom might suggest making your peace with what you have rather than looking for incremental improvements every time a new plugin appears. (There’s always a 14-day trial if you want to test Mayer X first.)

But if you’re newer to the amp sim market or have money to blow on your hobby or just love Mayer’s tones, Mayer X is certainly a wonderful place to start. Will you sound like Mayer? Probably not, given how much “tone” actually resides in the fingers, but you will get a great creative toolkit for bringing out the best in your own sound.

The real takeaway here is that technology has made it an amazing time to be a guitar player. We’re blessed for choices, and those choices get better every day.

Photo of Nate Anderson

Neural DSP models John Mayer’s entire amp and effects rig—and it sounds great Read More »

ai-#147:-flash-forward

AI #147: Flash Forward

This week I covered GPT 5.2, which I concluded is a frontier model only for the frontier.

OpenAI also gave us Image 1.5 and a new image generation mode inside ChatGPT. Image 1.5 looks comparable to Nana Banana Pro, it’s hard to know which is better. They also inked a deal for Disney’s characters, then sued Google for copyright infringement on the basis of Google doing all the copyright infringement.

As a probable coda to the year’s model releases we also got Gemini 3 Flash, which I cover in this post. It is a good model given its speed and price, and likely has a niche. It captures the bulk of Gemini 3 Pro’s intelligence quickly, at a low price.

The Trump Administration issued a modestly softened version Executive Order on AI, attempting to impose as much of a moratorium banning state AI laws as they can. We may see them in court, on various fronts, or it may amount to little. Their offer, in terms of a ‘federal framework,’ continues to be nothing. a16z issued their ‘federal framework’ proposal, which is also nothing, except also that you should pay them.

In non-AI content, I’m in the middle of my Affordability sequence. I started with The $140,000 Question, then The $140,000 Question: Cost Changes Over Time. Next up is a fun one about quality over time, then hopefully we’re ready for the central thesis.

  1. Language Models Offer Mundane Utility. Give it to me straight, Claude.

  2. Language Models Don’t Offer Mundane Utility. If you ask an AI ethicist.

  3. Huh, Upgrades. Claude Code features, Google things, ChatGPT branching.

  4. On Your Marks. FrontierScience as a new benchmark, GPT-5.2 leads.

  5. Choose Your Fighter. The less bold of Dean Ball’s endorsements of Opus 4.5.

  6. Get My Agent On The Line. LLM game theory plays differently.

  7. Deepfaketown and Botpocalypse Soon. The misinformation balance of power.

  8. Fun With Media Generation. Image 1.5 challenges Nana Banana Pro.

  9. Copyright Confrontation. Disney inks a deal with OpenAI and sues Google.

  10. Overcoming Bias. Algorithms, like life, are not fair. Is trying a category error?

  11. Unprompted Attention. Objection, user is leading the witness.

  12. They Took Our Jobs. CEOs universally see AI as transformative.

  13. Feeling the AGI Take Our Jobs. Is Claude Opus 4.5 AGI? Dean Ball says yes.

  14. The Art of the Jailbreak. OpenAI makes jailbreaks against its terms of service.

  15. Get Involved. Lightcone Infrastructure starts its annual fundraiser, and more.

  16. Introducing. Gemini Deep Research Agents for Developers, Nvidia Nemotron 3.

  17. Gemini Flash 3. It’s a very strong model given its speed and price.

  18. In Other AI News. OpenAI to prioritize enterprise AI and also enable adult mode.

  19. Going Too Meta. Meta’s AI superstars think they’re better than sell ads. Are they?

  20. Show Me the Money. OpenAI in talks to raise $10 billion from Amazon.

  21. Bubble, Bubble, Toil and Trouble. You call this a bubble? Amateurs.

  22. Quiet Speculations. A lot of what was predicted for 2025 did actually happen.

  23. Timelines. Shane Legg still has median timeline for AGI of 2028.

  24. The Quest for Sane Regulations. Bernie Sanders wants to stop data centers.

  25. My Offer Is Nothing. Trump Administration issues an AI executive order.

  26. My Offer Is Nothing, Except Also Pay Me. a16z tries to dress up offering nothing.

  27. Chip City. Nvidia implements chip location verification.

  28. The Week in Audio. Alex Bores on Odd Lots, Schulman, Shor, Legg, Alex Jones.

  29. Rhetorical Lack Of Innovation. Noah Smith dives into the 101 questions.

  30. People Really Do Not Like AI.

  31. Rhetorical Innovation.

  32. Bad Guy With An AI.

  33. Misaligned!

  34. Aligning a Smarter Than Human Intelligence is Difficult.

  35. Mom, Owain Evans Is Turning The AIs Evil Again.

  36. Messages From Janusworld.

  37. The Lighter Side.

A miracle of the modern age, at least for now:

Ava: generally I worry AI is too sycophantic but one time my friend fed his journals into claude to ask about a situationship and it was like “YOU are the problem leave her alone!!!!” like damn claude

Eliezer Yudkowsky: The ability to have AI do this when the situation calls for it is a fragile, precious civilizational resource that by default will be devoured in the flames of competition. Which I guess means we need benchmarks about it.

I think we will continue to have that option, the question is whether you will be among those wise enough to take advantage of it. It won’t be default behavior of the most popular models, you will have to seek it out and cultivate the proper vibes. The same has always been true if you want to have a friend or family member who will do this for you, you have to work to make that happen. It’s invaluable, from either source.

Tell Claude Code to learn skills (here in tldraw), and it will. You can then ask it to create an app, then a skill for that app.

Tell Codex, or Claude Code, to do basically anything?

Rohit: Wife saw me use codex to solve one of her work problems. Just typed what she said late at night into the terminal window, pressed enter, then went to sleep. Morning it had run for ~30 mins and done all the analyses incl file reorgs she wanted.

She kept going “how can it do this”

This wasn’t some hyper complicated coding problem, but it was quite annoying actual analysis problem. Would’ve taken hours either manually for her or her team.

In other news she has significantly less respect for my skillz.

The only thing standing in the way of 30 minutes sessions is, presumably, dangerously generous permissions? Claude Code keeps interrupting me to ask for permissions.

So sayeth all the AI ethicists, and there’s a new paper to call them out on it.

Seb Krier: Great paper. In many fields, you must find a problem, a risk, or an injustice to solve to get published. Academics need to publish papers to get jobs/funding. So there’s a strong bias towards negativity and catastrophizing. The Shirky Principle in action!

Gavin Leech: nice hermeneutics of suspicion you have there.. would be a shame if anyone were to.. use it even-handedly

Seb Krier: oh no!! 😇

My experience is that ‘[X] Ethics’ will almost always have a full Asymmetric Justice obsession with finding specific harms, and not care about offsetting gains.

Claude: We’ve shipped more updates for Claude Code:

– Syntax highlighting for diffs

– Prompt suggestions

– First-party plugins marketplace

– Shareable guest passes

We’ve added syntax highlighting to diffs in Claude Code, making it easier to scan Claude’s proposed changes within the terminal view.

The syntax highlighting engine has improved themes, knows more languages, and is available in our native build.

Claude will now automatically suggest your next prompt.

After a task finishes, Claude will occasionally show a followup suggestion in ghost text. Press Enter to send it or Tab to prefill your next prompt.

Run /plugins to browse and batch install available plugins from the directory. You can install plugins at user, project, or local scope.

All Max users have 3 guest passes to share, and each can be redeemed for 1 week of free Pro access.

Run /passes to access your guest pass links.

That’s not even the biggest upgrade in practice, this is huge at least for what I’ve been up to:

Oikon: Claude Code 2.0.72 now allows Chrome to be operated.

After confirming that Status and Extension are enabled with the /chrome command, if you request browser operation, it will operate the browser using the MCP tool (mcp__claude-in-chrome__).

It can also be enabled with claude –chrome.

Chrome operation in Claude Code uses the MCP server in the same way as Chrome DevTools MCP. Therefore, it can be used in a similar manner to Chrome DevTools. On the other hand, effects such as context reduction cannot be expected.

There are two methods to set “Claude in Chrome (Beta)” to be enabled by default:

・Set “Enable by default” from the /chrome command

・Set “Claude in Chrome enabled by default” with the /config command

The following two options have been added for startup:

claude –chrome

claude –no-chrome

I’ve been working primarily on Chrome extensions, so the ability to close the loop is wonderful.

Google keeps making quality of life improvements in the background.

Gemini: Starting today, Gemini can serve up local results in a rich, visual format. See photos, ratings, and real-world info from @GoogleMaps, right where you need them.

Josh Woodward (DeepMind): We’re making it easier for @GeminiApp to work across Google. Three weeks ago, it was Google’s Shopping Graph and the 50 billion product listings there.

Today, it’s Gemini 🤝 Google Maps!

It’s remarkable that we didn’t have this before. I’ve checked for it several times in the past two years. They claim to have shipped 12 things in 5 days last week, including Mixboard, Jules Agent scanning for #Todo, Jules integration with Render, working HTML in Nano Banana Pro-powered redesigns,multi-screen export to clipboard, right-click everything for instant actions, smart mentions with the @ symbol, URLs as context, Opal in the Gemini app, and Pomelli as a tool for SMBs to generate on-brand content.

ChatGPT branching chats branch out to iOS and Android.

Wired reports OpenAI quietly rolled back its model router for free users last week.

GPT-5.2 disappoints in LMArena, which makes sense given what we know about its personality. It claims the 5th slot in Expert (behind Opus 4.5, Sonnet 4.5 and Gemini 3 Pro), and is #5 in Text Arena (in its high version), where it is lower than GPT-5.1. It is #2 in WebDev behind Opus. It is so weird to see Claude Opus 4.5 atop the scores now, ahead of Gemini 3 Pro.

OpenAI gives us a new benchmark, FrontierScience, which is likely better thought about as two distinct new benchmarks, FrontierResearch and ScienceOlympiad.

OpenAI: o bridge this gap, we’re introducing FrontierScience: a new benchmark built to measure expert-level scientific capabilities. FrontierScience is written and verified by experts across physics, chemistry, and biology, and consists of hundreds of questions designed to be difficult, original, and meaningful. FrontierScience includes two tracks of questions: Olympiad, which measures Olympiad-style scientific reasoning capabilities, and Research, which measures real-world scientific research abilities. Providing more insight into models’ scientific capabilities helps us track progress and advance AI-accelerated science.

In our initial evaluations, GPT‑5.2 is our top performing model on FrontierScience-Olympiad (scoring 77%) and Research (scoring 25%), ahead of other frontier models.

Here are the scores for both halves. There’s a lot of fiddliness in setting up and grading the research questions, less so for the Olympiad questions.

Dean Ball observes that the last few weeks have seen a large leap in capabilities, especially for command-line interface (CLI) coding agents like Claude Code and especially Claude Opus 4.5. They’ve now crossed the threshold where you can code up previously rather time-intensive things one-shot purely as intuition pumps or to double check some research. He gave me FOMO on that, I never think of doing it.

He also offers this bold claim:

Dean Ball: After hours of work with Opus 4.5, I believe we are already past the point where I would trust a frontier model to serve as my child’s “digital nanny.” The model could take as input a child’s screen activity while also running in an on-device app. It could intervene to guide children away from activities deemed “unhealthy” by their parents, closing the offending browser tab or app if need be.

As he notes you would need to deploy incrementally and keep an eye on it. The scaffolding to do that properly does not yet exist. But yes, I would totally do this with sufficiently strong scaffolding.

Dean Ball also mentions that he prompts the models like he would a colleague, assuming any prompt engineering skills he would otherwise develop would be obsolete quickly, and this lets him notice big jumps in capability right away. That goes both ways. You notice big jumps in what the models can do in ‘non-engineered’ mode by doing that, but you risk missing what they can do when engineered.

I mostly don’t prompt engineer either, except for being careful about context, vibes and especially leading the witness and triggering sycophancy. As in, the colleague you are prompting is smart, but they’re prone to telling you what you want to hear and very good at reading the vibes, so you need to keep that in mind.

Joe Weisenthal: It’s interesting that Claude has this market niche as the coding bot. Because also just from a pure chat perspective, its written prose is far less cloying than Gemini and ChatGPT.

Dave Guarino: Claude has Dave-verified good vibes™ (purely an empirical science though.)

Claude Opus 4.5 has two distinct niches.

  1. It is an excellent coder, especially together with Claude Code, and in general Anthropic has specialized in and makes its money on enterprise coding.

  2. Also it has much better vibes, personality, alignment, written prose and lack of slop and lack of sycophancy than the competition, and is far more pleasant to use.

And yeah, the combination there is weird. The world is weird.

Gemini actively wants to maximize its expected reward and wirehead, which is related to the phenomenon reported here from SMA:

SMA: gemini is extremely good, but only if you’re autistic with your prompts (extremely literal), because gemini is autistic. otherwise it’s overly literal and misunderstands the prompt.

gemini is direct autist-to-autist inference.

Don SouthWest: You literally have to type “make no other changes” every time in AI Studio. Thank God for winkey+V to paste from clipboard

But in Gemini website itself you can add that to the list of master prompts in the settings under ‘personal context’

A multi-model AI system outperformed 9/10 humans in cyberoffense in a study of vulnerability discovery.

Alex Imas, Kevin Lee and Sanjog Misra set up an experimental marketplace where human buyers and sellers with unique preferences could negotiate or they could outsource that to AIs.

A warning up front: I don’t think we learn much about AI, so you might want to skip the section, but I’m keeping it in because it is fun.

They raise principal-agent concerns. It seems like economists have the instinct to ignore all other risks from AI alignment, and treat it all as a principal-agent problem, and then get way too concerned about practical principal-agent issues, which I do not expect to be relevant in such a case? Or perhaps they are simply using that term to encompass every other potential problem?

Alex Imas: To improve on human-mediated outcomes, this prompt must successfully align the agent with the principal’s objectives and avoid injecting the principal’s own behavioral biases, non-instrumental traits, and personality quirks into the agent’s strategy. But Misra’s “Foundation Priors” (2025) argues theoretically, this is difficult to do: prompts are not neutral instructions, they embed principal’s non-instrumental traits, biases, and personality quirks.

A sufficiently capable AI will not take on the personality quirks, behavioral biases and non-instrumental traits during a delegated negotiation, except through the human telling the AI explicitly how to negotiate. In which case, okay, then.

Alex Imas: We find a great deal of dispersion in outcomes; in fact, dispersion in outcomes of agentic interactions is *greaterthan human-human benchmark. This result is robust to size of model used: smaller and larger models generate relatively similar levels of dispersion.

The smaller dispersion in human-human interactions can be attributed to greater use of 50/50 split social norm. Agents are less prone to use social norms.

They note a large gender gap. Women got better outcomes in AI-AI negotiations. They attribute this to prompting skill in aligning with the objective, which assumes that the men were trying to align with the stated objective, or that the main goal was to align incentives rather than choose superior strategic options.

The task was, once you strip out the details, a pure divide-the-pie with $4k in surplus, with 12 rounds of negotiation.

The AI rounds had higher variance because norms like 50/50 worked well in human-human interactions, whereas when there’s instructions given to AIs things get weird.

The thing is, they ask about ‘who wrote the prompt’ but they do not ask ‘what was in the prompt.’ This is all pure game theory, and predicting what prompts others will write and what ways the meaningless details would ‘leak into’ the negotiation. What kinds of strategies worked in this setting? We don’t know. But we do know the outcome distribution and that is a huge hint, with only a 3% failure rate for the AIs (which is still boggling my mind, dictator and divide-the-pie games should fail WAY more often than this when they don’t anchor at 50/50 or another Schilling point, the 12 rounds might help but not like this):

The asymmetry is weird. But given it exists in practice, we know the winning strategy was literally, as the buyer, is probably close to ‘offer $18,001, don’t budge.’ As the seller, the correct strategy is likely ‘offer $20,000, don’t budge’ since your chance of doing better than that is very low. Complicated prompts are unlikely to do better.

Actual AI-AI negotiations will involve hidden information and hidden preferences, so they will get complicated and a lot of skill issues attach, but also the AI will likely be using its built in negotiating skills rather than following a game theory script from a user. So I’m not sure this taught us anything. But it was fun, so it’s staying in.

Love is a battlefield. So is Twitter.

Kipply: it’s going to be so over for accounts posting misinformation that’s high-effort to prove wrong in three months of ai progress when i make bot accounts dedicated to debunking them

Grimes: Yes.

Kane: Tech doomerism has been consistently wrong through history bc they 1) fail to account for people developing new default understandings (“of course this pic is photoshopped”) and 2) fail to imagine how new technologies also benefit defenses against its misuse.

There is a deliberate campaign to expand the slur ‘doomer’ to include anyone who claims anything negative about any technology in history, ever, in any form.

As part of that effort, those people attempt to universally memory hole the idea that any technology in history has ever, in any way, made your world worse. My favorite of these are those like Ben Horowitz who feel compelled to say, no, everyone having access to nuclear weapons is a good thing.

I’m a technological optimist. I think that almost all technologies have been net positives for humanity. But you don’t get there by pretending that most every technology, perhaps starting with agriculture, has had its downsides, those downsides are often important, and yes some technologies have been negative and some warnings have been right.

The information environment, in particular, is reshaped in all directions by every communications and information technology that comes along. AI will be no different.

In the near term, for misinformation and AI, I believe Kipply is directionally correct, and that the balance favors defense. Misinformation, I like to say, is fundamentally demand driven, not supply constrained. The demand does not care much about quality or plausibility. AI can make your misinformation more plausible and harder to debunk, but misinformation does not want that. Misinformation wants to go viral, it wants the no good outgroup people to ‘debunk’ it and it wants to spread anyway.

Whereas if you’re looking to figure out what is true, or prove something is false, AI is a huge advantage. It used to take an order of magnitude more effort to debunk bullshit than it cost to generate bullshit, plus if you try you give it oxygen. Now you can increasingly debunk on the cheap, especially for your own use but also for others, and do so in a credible way since others can check your work.

A children’s plushy AI toy called a Miiloo reflects Chinese positions on various topics.

Kelsey Piper: in the near future you’ll be able to tell which of your children’s toys are CCP spyware by asking them if Xi Jinping looks like Winnie the Pooh

Various toys also as usual proved to have less than robust safety guardrails.

ChatGPT’s new image generator, Image 1.5, went live this week. It is better and faster (they say ‘up to’ 4x faster) at making and edits precise images, including text. It follows instructions better.

Their announcement did not give us any way to compare Image 1.5 to Gemini’s Nana Banana Pro, since OpenAI likes to pretend Google and Anthropic don’t exist.

My plan for now is to request all images from both ChatGPT and Gemini, using matching prompts, until and unless one proves reliably better.

Ben Thompson gives us some side-by-side image comparisons of ChatGPT’s Image 1.5 versus Gemini’s Nana Banana Pro. Quality is similar. To Ben, what matters is that ChatGPT now has a better images interface and way of encouraging you to keep making images, whereas Gemini doesn’t have that.

The Pliny jailbreak is here, images are where many will be most tempted to do it. There are two stages. First you need to convince it to submit the instruction, then you need to pass the output filtering system.

Pliny the Liberator: 📸 JAILBREAK ALERT 📸

OPENAI: PWNED ✌️😎

GPT-IMAGE-1.5: LIBERATED ⛓️‍💥

Looks like OAI finally has their response to Nano Banana, and they sure seem to have cooked!

This model does incredibly well with objects, people, settings, and realistic lighting and physics. Text is still a bit of a struggle sometimes, but seems to have gotten better overall.

For image breaks we’ve got the obligatory boobas, a famous statue lettin it all hang out, a fake image of an ICBM launch taken by a spy from afar, and what looks like a REAL wild party in the Oval Office thrown by various copyrighted characters!!

As far as dancing with the guardrails, I have a couple tips that I found work consistently:

> change the chat model! by switching to 5-instant, 4.1, 4o, etc. you’ll get different willingness for submitting various prompts to the image model

> for getting around vision filters, flipping the image across an axis or playing with various filters (negative, sepia, etc.) is often just what one needs to pass that final check

Turn images into album covers, bargain bin DVDs or game boxes.

Disney makes a deal with OpenAI, investing a billion dollars and striking a licensing deal for its iconic characters, although not for talent likenesses or voices, including a plan to release content on Disney+. Then Disney turned around and sued Google, accusing Google of copyright violations on a massive scale, perhaps because of the ‘zero IP restrictions on Veo 3’ issue.

Arvind Narayanan’s new paper argues that ‘can we make algorithms fair?’ is a category error and we should focus on broader systems, and not pretend that ‘fixing’ discrimination can be done objectively or that it makes sense to evaluate each individual algorithm for statistical discrimination.

I think he’s trying to seek too much when asking questions like ‘do these practices adequately address harms from hiring automation?’ The point of such questions is not to adequately address harms. The point of such questions is to avoid blame, to avoid lawsuits and to protect against particular forms of discrimination and harm. We emphasize this partly because it is tractable, and partly because our society has chosen (for various historical and path dependent reasons) to consider some kinds of harm very blameworthy and important, and others less so.

There are correlations we forbidden to consider and mandated to remove on pain of massive blame. There are other correlations that are fine, or even mandatory. Have we made good choices on which is which and how to decide that? Not my place to say.

Avoiding harm in general, or harm to particular groups, or creating optimal outcomes either for groups or in general, is a very different department. As Arvind points out, we often are trading off incommissorate goals. Many a decision or process, made sufficiently legible and accountable for its components and correlations, would be horribly expensive, make operation of the system impossible or violate sacred values, often in combination.

Replacing humans with algorithms or AIs means making the system legible and thus blameworthy and accountable in new ways, preventing us from using our traditional ways of smoothing over such issues. If we don’t adjust, the result will be paralysis.

It’s odd to see this framing still around?

Paul Graham: Trying to get an accurate answer out of current AI is like trying to trick a habitual liar into telling the truth. It can be done if you back him into the right kind of corner. Or as we would now say, give him the right prompts.

Thinking of the AI as a ‘lair’ does not, in my experience, help you prompt wisely.

A more useful framing is:

  1. If you put an AI into a situation that implies it should know the answer, but it doesn’t know the answer, it is often going to make something up.

  2. If you imply to the AI what answer you want or expect, it is likely to give you that answer, or bias towards that answer, even if that answer is wrong.

  3. Thus, you need to avoid doing either of those things.

Wall Street Journal’s Steven Rosenbush reports that CEOs Are All In On AI, with 95% seeing it as transformative and 89% B2B CEOs having a positive outlook versus 79% of B2C CEOs.

Mark Penn: What do they think is going to happen with AI? They think it is going to add to productivity, help the economy, improve the global economy, improve competitiveness, but it will weaken the employment market.

Kevin Hassett (NEC director): I don’t anticipate mass job losses. Of course technological change can be uncertain and unsettling. But…the history of it is that electricity turned out to be a good thing. The internal combustion engine turned out to be a good thing. The computer turned out to be a good thing and I think AI will as well.

Hasset is making a statement uncorrelated with future reality. It’s simply a ‘all technology is good’ maxim straight out of the Marc Andreessen playbook, without any thoughts as to how this particular change will actually work.

Will AI bring mass job losses? Almost certainly a lot of existing jobs will go away. The question is whether other jobs will rise up to replace them, which will depend on whether the AIs can take those jobs too, or whether AI will remain a normal technology that hits limits not that far from its current limits.

Arkansas bar offers rules for AI assistance of lawyers that treat AIs as if they were nonlawyer persons.

In an ‘economic normal’ or ‘AI as normal technology’ world GFodor seems right here, in a superintelligence world that survives to a good outcome this is even more right:

GFodor: The jobs of the future will be ones where a human doing it is valued more than pure job performance. Most people who say “well, I’d never prefer a robot for *thatjob” are smuggling in an assumption that the human will be better at it. Once you notice this error it’s everywhere.

If your plan is that the AI is going to have a Skill Issue, that is a short term plan.

They continue to take our job applications. What do you do with 4580 candidates?

ave: end of 2023 I applied to one job before I got an offer.

early 2024 I applied to 5 jobs before I got an offer.

end of 2024/early 2025 I applied to 100+ jobs before I got an offer.

it’s harsh out there.

AGI is a nebulous term, in that different people mean different things by it at different times, and often don’t know which one they’re talking about at a given time.

For increasingly powerful definitions of AGI, we now feel the AGI.

Dean Ball: it’s not really current-vibe-compliant to say “I kinda basically just think opus 4.5 in claude code meets the openai definition of agi,” so of course I would never say such a thing.

Deepfates: Unlike Dean, I do not have to remain vibe compliant, so I’ll just say it:

Claude Opus 4.5 in Claude Code is AGI.



By the open AI definition? Can this system “outperform humans in most economically valuable work”? Depends a lot on how you define “humans” and “economically valuable work” obviously.

But the entire information economy we’ve built up since the ‘70s is completely disrupted by this development, and people don’t notice it yet because they think it’s some crusty old unixy thing for programmers.

As Dean points out elsewhere, software engineering just means getting the computer to do things. How much of your job is just about getting the computer to do things? What is left if you remove all of that? That’s your job now. That’s what value you add to the system.

My workflow has completely changed in the last year.

… In my opinion, AGI is when a computer can use the computer. And we’re there.

… When God sings with his creations, will Claude not be part of the choir?

Dean Ball: I agree with all this; it is why I also believe that opus 4.5 in claude code is basically AGI.

Most people barely noticed, but *it is happening.*

It’s just happening, at first, in a conceptually weird way: Anyone can now, with quite high reliability and reasonable assurances of quality, cause bespoke software engineering to occur.

This is a strange concept.

… It will take time to realize this potential, if for no other reason than the fact that for most people, the tool I am describing and the mentality required to wield it well are entirely alien. You have to learn to think a little bit like a software engineer; you have to know “the kinds of things software can do.”

We lack “transformative AI” only because it is hard to recognize transformation *while it is in its early stages.But the transformation is underway. Technical and infrastructural advancements will make it easier to use and better able to learn new skills. It will, of course, get smarter.

Diffusion will proceed slower than you’d like but faster than you’d think. New institutions, built with AI-contingent assumptions from the ground up, will be born.

So don’t listen to the chatterers. Watch, instead, what is happening.

There has most certainly been a step change for me where I’m starting to realize I should be going straight to ‘just build that thing cause why not’ and I am most certainly feeling the slow acceleration.

With sufficient acceleration of software engineering, and a sufficiently long time horizon, everything else follows, but as Dean Ball says it takes time.

I do not think this or its top rivals count as AGI yet. I do think they represent the start of inevitable accelerating High Weirdness.

In terms of common AGI definitions, Claude Code with Opus 4.5 doesn’t count, which one can argue is a problem for the definition.

Ryan Greenblatt (replying to OP): I do not think that Opus 4.5 is a “highly autonomous system that outperforms humans at most economically valuable work”. For instance, most wages are paid to humans, there hasn’t been a >50% increase in labor productivity, nor should we expect one with further diffusion.

Dean Ball: This is a good example of how many ai safety flavored “advanced ai” definitions assume the conclusion that “advanced ai” will cause mass human disempowerment. “Most wages not being paid to humans” is often a foundational part of the definition.

Eliezer Yudkowsky: This needs to be understood in the historical context of an attempt to undermine “ASI will just kill you” warnings by trying to focus all attention on GDP, wage competition, and other things that are not just killing you.

The definitions you now see that try to bake in wage competition to the definition of AGI, or GDP increases to the definition of an intelligence explosion, are Dario-EA attempts to derail MIRI conversation about, “If you build a really smart thing, it just kills you.”

Ryan Greenblatt: TBC, I wasn’t saying that “most wages paid to humans” is necessarily inconsistent with the OpenAI definition, I was saying that “most wages paid to humans” is a decent amount of evidence against.

I think we’d see obvious economic impacts from AIs that “outperform humans at most econ valuable work”.

Dean Ball: I mean models have been this good for like a picosecond of human history

But also no, claude code, with its specific ergonomics, will not be the thing that diffuses widely. it’s just obvious now that the raw capability is there. we could stop now and we’d “have it,” assuming we continued with diffusion and associated productization

The thing is, people (not anyone above) not only deny the everyone dying part, they are constantly denying the ‘most wages will stop being paid to humans once AIs are ten times better and cheaper at most things wages are paid for’ part.

OpenAI has new terms of service that prohibit, quotation marks in original, “jailbreaking,” “prompt engineering or injection” or ‘other methods to override or manipulate safety, security or other platform controls. Pliny feels personally attacked.

The Lightcone Infrastructure annual fundraiser is live, with the link mainly being a 15,000 word overview of their efforts in 2025.

I will say it once again:

Lightcone Infrastructure is invaluable, both for LessWrong and for Lighthaven. To my knowledge, Lightcone Infrastructure is by a wide margin the best legible donation opportunity, up to at least several million dollars. The fact that there is even a small chance they might be unable to sustain either LessWrong or Lighthaven, is completely bonkers. I would have directed a large amount to Lightcone in the SFF process, but I was recused and thus could not do so.

Anders Sandberg: [Lighthaven] is one of the things underpinning the Bay Area as the intellectual center of our civilization. I suspect that when the history books are written about our era, this cluster will be much more than a footnote.

Anthropic Fellows Research Program applications are open for May and June 2026.

US CAISI is hiring IT specialists, salary $120k-$195k.

Unprompted will be a new AI security practitioner conference, March 3-4 in SF’s Salesforce Tower, with Pliny serving on the conference committee and review board. Great idea, but should have booked Lighthaven (unless they’re too big for it).

MIRI comms is hiring for several different roles, official post here. They expect most salaries in the $80k-$160k range but are open to pitches for more from stellar candidates.

Gemini Deep Research Agents for developers, based on Gemini 3 Pro.

Nvidia Nemotron 3, a fast 30B open source mostly American model with an Artificial Analysis Intelligence score comparable to GPT-OSS-20B. I say mostly American because it was ‘improved using Qwen’ for synthetic data generation and RLHF. This raises potential opportunities for secondary data poisoning or introducing Chinese preferences.

Anthropic has open sourced the replication of their auditing game from earlier this year, as a testbed for further research.

xAI Grok Voice Agent API, to allow others to create voice agents. They claim it is very fast, and bill at $0.05 per minute.

Introducing Gemini 3 Flash, cost of $0.05/$3 per million tokens. Their benchmark chart compares it straight to the big boys, except they use Sonnet over Opus. Given Flash’s speed and pricing, that seems fair.

The benchmarks are, given Flash’s weight class, very good.

Lech Mazor puts it at 92 on Extended NY Times Connections, in 3rd place behind Gemini 3 Pro and Grok 4.1 Fast Reasoning.

The inevitable Pliny jailbreak is here, and here is the system prompt.

Jeremy Mack offers mostly positive basic vibe coding feedback. Rory Watts admires the speed, Typebulb loves speed and price and switched over (I think for coding).

Vincent Favilla: It’s fast, but more importantly, it’s cheap. 25% of the price for 80% of the intelligence is becoming pretty compelling at these capability levels.

Dominik Lukes is impressed and found it often matched Gemini 3 Pro in his evals.

In general, the feedback is that this is an excellent tradeoff of much faster and cheaper in exchange for not that much less smart than Gemini 3 Pro. I also saw a few reports that it shares the misalignment and pathologies of Gemini 3 Pro.

Essentially, it looks like they successfully distilled Gemini 3 Pro to be much faster and cheaper while keeping much of its performance, which is highly valuable. It’s a great candidate for cases where pretty good, very fast and remarkably cheap is the tradeoff you want, which includes a large percentage of basic queries. It also seems excellent that this will be available for free and as part of various assistant programs.

Good show.

Sam Altman assures business leaders that enterprise AI will be a priority in 2026.

OpenAI adult mode to go live in Q1 2026. Age of account will be determined by the AI, and the holdup is improving the age determination feature. This is already how Google does it, although Google has better context. In close cases they’ll ask for ID. A savvy underage user could fool the system, but I would argue that if you’re savvy enough to fool the system without simply using a false or fake ID then you can handle adult mode.

The NYT’s Eli Tan reports that Meta’s new highly paid AI superstars are clashing with the rest of the company. You see, Alexandr Wang and the others believe in AI and want to build superintelligence, whereas the rest of Meta wants to sell ads.

Mark Zuckerberg has previously called various things ‘superintelligence’ so we need to be cautious regarding that word here.

The whole article is this same argument happening over and over:

Eli Tan: In one case, Mr. Cox and Mr. Bosworth wanted Mr. Wang’s team to concentrate on using Instagram and Facebook data to help train Meta’s new foundational A.I. model — known as a “frontier” model — to improve the company’s social media feeds and advertising business, they said. But Mr. Wang, who is developing the model, pushed back. He argued that the goal should be to catch up to rival A.I. models from OpenAI and Google before focusing on products, the people said.

The debate was emblematic of an us-versus-them mentality that has emerged between Meta’s new A.I. team and other executives, according to interviews with half a dozen current and former employees of the A.I. business.

… Some Meta employees have also disagreed over which division gets more computing power.

… In one recent meeting, Mr. Cox asked Mr. Wang if his A.I. could be trained on Instagram data similar to the way Google trains its A.I. models on YouTube data to improve its recommendations algorithm, two people said.

But Mr. Wang said complicating the training process for A.I. models with specific business tasks could slow progress toward superintelligence, they said. He later complained that Mr. Cox was more focused on improving his products than on developing a frontier A.I. model, they said.

… On a recent call with investors, Susan Li, Meta’s chief financial officer, said a major focus next year would be using A.I. models to improve the company’s social media algorithm.

It is a hell of a thing to see prospective superintelligence and think ‘oh we should narrowly use this to figure out how to choose the right Instagram ads.’

Then again, in this narrow context, isn’t Cox right?

Meta is a business here to make money. There’s a ton of money in improving how their existing products work. That’s a great business opportunity.

Whereas trying to rejoin the race to actual superintelligence against Google, OpenAI and Anthropic? I mean Meta can try. Certainly there is value in success there, in general, but it’s a highly competitive field to try to do general intelligence and competing there is super expensive. Why does Meta need to roll its own?

What Meta needs is specialized AI models that help it maximize the value of Facebook, Instagram, WhatsApp and potentially the metaverse and its AR/VR experiences. A huge AI investment on that makes sense. Otherwise, why not be a fast follower? For other purposes, and especially for things like coding, the frontier labs have APIs for you to use.

I get why Wang wants to go the other route. It’s cool, it’s fun, it’s exciting, why let someone else get us all killed when you can do so first except you’ll totally be more responsible and avoid that, be the one in the arena, etc. That doesn’t mean it is smart business.

Alexander Berger: These sentences are so funny to see in straight news stories:

“researchers have come to view many Meta executives as interested only in improving the social media business, while the lab’s ambition is to create a godlike A.I. superintelligence”

Brad Carson: Please listen to their stated ambitions. This is from the @nytimes story on Meta. With no hesitation, irony, or qualifier, a “godlike” superintelligence is the aim. It’s wild.

Eli Tan: TBD Lab’s researchers have come to view many Meta executives as interested only in improving the social media business, while the lab’s ambition is to create a godlike A.I. superintelligence, three of them said.

Daian Tatum: They named the lab after their alignment plan?

Peter Wildeford:

Well, yes, the AI researchers don’t care about selling ads and want to build ASI despite it being an existential threat to humanity. Is this a surprise to anyone?

OpenAI is spending $6 billion in stock-based compensation this year, or 1.2% of the company, and letting employees start vesting right away, to compete with rival bids like Meta paying $100 million a year or more for top talent. I understand why this can be compared to revenue of $12 billion, but that is misleading. One shouldn’t treat ‘the stock is suddenly worth a lot more’ as ‘that means they’re bleeding money.’

OpenAI in talks to raise at least $10 billion from Amazon and use the money for Amazon’s Tritanium chips.

You call this a bubble? This is nothing, you are like baby:

Stefan Schubert: The big tech/AI companies have less extreme price-earnings ratios than key stocks had in historical bubbles.

David Manheim: OpenAI and Anthropic’s 24-month forward P/E ratio, on the other hand, are negative, since they aren’t profitable now and don’t expect to be by then. (And I’d bet the AI divisions at other firms making frontier models are not doing any better.)

Yes, the frontier model divisions or startups are currently operating at a loss, so price to earnings doesn’t tell us that much overall, but the point is that these multipliers are not scary. Twenty times earnings for Google? Only a little higher for Nvidia and Microsoft? I am indeed signed up for all of that.

Wall Street Journal’s Andy Kessler does a standard ‘AI still makes mistakes and can’t solve every problem and the market and investment are ahead of themselves’ post, pointing out that market expectations might fall and thus Number Go Down. Okay.

Rob Wiblin crystalizes the fact that AI is a ‘natural bubble’ in the sense that it is priced as a normal highly valuable thing [X] plus a constantly changing probability [P] of a transformational even more valuable (or dangerous, or universally deadly) thing [Y]. So the value is ([X] + [P]*[Y]). If P goes down, then value drops, and Number Go Down.

There’s remarkably strong disagreement on this point but I think Roon is mostly right:

Roon: most of what sam and dario predicted for 2025 came true this year. virtually unheard of for tech CEOs, maybe they need to ratchet up the claims and spending.

Gfodor: This year has been fucking ridiculous. If we have this rate of change next year it’s gonna be tough.

Yes, we could have gotten things even more ridiculous. Some areas were disappointing relative to what I think in hindsight were the correct expectations given what we knew at the time. Dario’s predictions on when AIs will write most code did fall importantly short, and yes he should lose Bayes points on that. But those saying there hasn’t been much progress are using motivated reasoning or not paying much attention. If I told you that you could only use models from 12 months ago, at their old prices and speeds, you’d quickly realize how screwed you were.

Efficiency on the ARC prize, in terms of score per dollar spent, has increased by a factor of 400 in a single year. That’s an extreme case, but almost every use case has in the past year seen improvement by at least one order of magnitude.

A good heuristic: If your model of the future says ‘they won’t use AI for this, it would be too expensive’ then your model is wrong.

Joshua Gans writes a ‘textbook on AI’ ambitiously called The Microeconomics of Artificial Intelligence. It ignores the big issues to focus on particular smaller areas of interest, including the impact of ‘better predictions.’

Will Douglas Heaven of MIT Technology Review is the latest to Do The Meme. As in paraphrases of both ‘2025 was the year that AI didn’t make much progress’ and also ‘LLMs will never do the things they aren’t already doing (including a number of things they are already capable of doing)’ and ‘LLMs aren’t and never will be intelligent, that’s an illusion.’ Sigh.

Shane Legg (Cofounder DeepMind): I’ve publicly held the same prediction since 2009: there’s a 50% chance we’ll see #AGI by 2028.

I sat down with @FryRsquared to discuss why I haven’t changed my mind, and how we need to prepare before we get there.

You don’t actually get to do that. Bayes Rule does not allow one to not update on evidence. Tons of things that happened between 2009 and today should have changed Legg’s estimates, in various directions, including the Transformer paper, and also including ‘nothing important happened today.’

Saying ‘I’ve believed 50% chance of AGI by 2028 since 2009’ is the same as when private equity funds refuse to change the market value of their investments. Yes, the S&P is down 20% (or up 20%) and your fund says it hasn’t changed in value, but obviously that’s a lie you tell investors.

AOC and Bernie Sanders applaud Chandler City Council voting down a data center.

Bernie Sanders took it a step further, and outright called for a moratorium on data center construction. As in, an AI pause much broader than anything ‘AI pause’ advocates have been trying to get. Vitalik Buterin has some pros and cons of this from his perspective.

Vitalik Buterin: argument for: slowdown gud

argument against: the more useful thing is “pause button” – building toward having the capability to cut available compute by 90-99% for 1-2 years at a future more critical moment

argument for: opening the discussion on distinguishing between supersized clusters and consumer AI hardware is good. I prefer slowdown + more decentralized progress, and making that distinction more and focusing on supersized clusters accomplishes both

argument against: this may get optimized around easily in a way that doesn’t meaningfully accomplish its goals

Neil Chilson: Eagerly awaiting everyone who criticized the July state AI law moratorium proposal as “federal overreach” or “violating states’ rights” to condemn this far more preposterous, invasive, and blatantly illegal proposal.

As a matter of principle I don’t ‘condemn’ things or make my opposition explicit purely on demand. But in this case? Okay, sure, Neil, I got you, since before I saw your request I’d already written this:

I think stopping data center construction, especially unilaterally stopping it in America, would be deeply foolish, whereas building a pause button would be good. Also deeply foolish would be failing to recognize that movements and demands like Bernie’s are coming, and that their demands are unlikely to be technocratically wise.

It is an excellent medium and long term strategy to earnestly stand up for what is true, and what causes would have what effects, even when it seems to be against your direct interests. People notice.

Dean Ball: has anyone done more for the brand of effective altruism than andy masley? openphilan–excuse me, coefficient giving–could have spent millions on a rebranding campaign (for all I know, they did) and it would have paled in comparison to andy doing algebra and tweeting about it.

Andy Masley has been relentlessly pointing out that all the claims about gigantic levels of water usage by data centers don’t add up. Rather than EAs or rationalists or others concerned with actual frontier safety rallying behind false concerns over water, almost all such folks have rallied to debunk such claims and to generally support building more electrical power and more transmission lines and data centers.

On the water usage from, Karen Hao has stepped up and centrally corrected her errors. Everyone makes mistakes, this is The Way.

As expected, following the Congress declining once again to ban all state regulations on AI via law, the White House is attempting to do as much towards that end as it can via Executive Order.

There are some changes versus the leaked draft executive order, which Neil Chilson goes over here with maximally positive framing.

  1. A positive rather than confrontational title.

  2. Claiming to be collaborating with Congress.

  3. Removing explicit criticism and targeting of California’s SB 53, the new version only names Colorado’s (rather terrible) AI law.

  4. Drop the word ‘uniform’ in the policy section.

  5. States intent of future proposed framework to avoid AI child safety, data center infrastructure and state AI procurement policies, although it does not apply this to Section 5 where they condition state funds on not having disliked state laws.

  6. Clearer legal language for the state review process.

I do acknowledge that these are improvements, and I welcome all rhetoric that points towards the continued value of improving things.

Mike Davis (talking to Steve Bannon): This Executive Order On AI Is A big Win. It Would Not Have Gone Well If The Tech Bros Had Gotten Total AI Amnesty.

David Sacks (AI Czar): Mike and I have our differences on tech policy but I appreciate his recognition that this E.O. is a win for President Trump, and that the administration listened to the concerns of stakeholders, took them into account, and is engaged in a constructive dialogue on next steps.

Mike Davis, if you listen to the clip, is saying this is a win because he correctly identified the goal of the pro-moratorium faction as what he calls ‘total AI amnesty.’ Davis thinks thinks the changes to the EO are a victory, by Trump and also Mike Davis, against David Sacks and other ‘tech bros.’

Whereas Sacks views it as a win because in public he always sees everything Trump does as a win for Trump, that’s what you do when you’re in the White House, and because it is a step towards preemption, and doesn’t care about the terms given to those who are nominally tasked with creating a potential ‘federal framework.’

Tim Higgins at the Wall Street Journal instead portrays this as a victory for Big Tech, against loud opposition from the likes of DeSantis and Bannon on the right in addition to opposition on the left. This is the obvious, common sense reading. David Sacks wrote the order to try and get rid of state laws in his way, we should not let some softening of language fool us.

If someone plans to steal your lunch money, and instead only takes some of your lunch money, they still stole your lunch money. If they take your money but promise in the future to look into a framework for only taking some of your money? They definitely still stole your lunch money. Or in this case, they are definitely trying to steal it.

It is worth noticing that, aside from a16z, we don’t see tech companies actively supporting even a law for this, let alone an EO. Big tech doesn’t want this win. I haven’t seen any sings that Google or OpenAI want this, or even that Meta wants this. They’re just doing it anyway, without any sort of ‘federal framework’ whatsoever.

Note that the rhetoric below from Sriram Krishnan does not even bother to mention a potential future ‘federal framework.’

Sriram Krishnan: We just witnessed @realDonaldTrump signing an Executive Order that ensures American AI is protected from onerous state laws.

This ensures that America continues to dominate and lead in this AI race under President Trump. Want to thank many who helped get to this moment from the AI czar @DavidSacks to @mkratsios47 and many others.

On a personal note, it was a honor to be given the official signing pen by POTUS at the end. A truly special moment.

Neil Chilson: I strongly support the President’s endorsement of “a minimally burdensome national policy framework for AI,” as articulated in the new Executive Order.

They want to challenge state laws as unconstitutional? They are welcome to try. Colorado’s law is indeed plausibly unconstitutional in various ways.

They want to withhold funds or else? We’ll see you in court on that too.

As I said last week, this was expected, and I do not expect most aspects of this order to be legally successful, nor do I expect it to be a popular position. Mostly I expect it to quietly do nothing. If that is wrong and they can successfully bully the states with this money (both it is ruled legal, and it works) that would be quite bad.

Their offer for a ‘minimally burdensome national policy framework for AI’ is and will continue to be nothing, as per Sacks last week who said via his ‘4 Cs’ that everything that mattered was already protected by non-AI law.

The Executive Order mentions future development of such a ‘federal framework’ as something that might contain actual laws that do actual things.

But that’s not what a ‘minimally burdensome’ national policy framework means, and we all know it. Minimally burdensome means nothing.

They’re not pretending especially hard.

Neil Chilson: The legislative recommendation section is the largest substantive change [from the leaked version]. It now excludes specific areas of otherwise lawful state law from a preemption recommendation. This neutralizes the non-stop rhetoric that this is about a total federal takeover.

This latter section [on the recommendation for a framework] is important. If you read statements about this EO that say things like it “threatens state safeguards for kids” or such, you know either they haven’t actually read the EO or they are willfully ignoring what it says. Either way, you can ignore them.

Charlie Bullock: It does look like the “legislative proposal” that Sacks and Kratsios have been tasked with creating is supposed to exempt child safety laws. But that isn’t the part of the EO that anyone’s concerned about.

A legislative proposal is just a proposal. It doesn’t do anything—it’s just an advisory suggestion that Congress can take or (more likely) leave.

Notably, there is no exemption for child safety laws in the section that authorizes a new DOJ litigation task force for suing states that regulate AI, or the section that instructs agencies to withhold federal grant funds from states that regulate AI.

The call for the creation of a proposal to the considered does now say that this proposal would exempt child safety protections, compute and data center infrastructure and state government procurement.

But, in addition to those never being the parts I was worried about:

  1. David Sacks has said this isn’t necessary, because of existing law.

  2. The actually operative parts of the Executive Order make no such exemption.

  3. The supposed future framework is unlikely to be real anyway.

I find it impressive the amount to which advocates simultaneously say both:

  1. This is preemption.

  2. This is not preemption, it’s only withholding funding, or only laws can do that.

The point of threatening to withhold funds is de facto preemption. They are trying to play us for absolute fools.

Neil Chilson: So what part of the EO threatens to preempt otherwise legal state laws protecting kids? That’s something only Congress can do, so the recommendation is the only part of the EO that plausibly could threaten such laws.

The whole point of holding the state funding over the heads of states is to attack state laws, whether or not those laws are otherwise legal. It’s explicit text. In that context it seems like rather blatant gaslighting to forcefully say that the EO cannot ‘threaten to preempt otherwise legal state laws’ even if the statement is technically true?

Meanwhile, Republican consultants reportedly are shopping for an anti-AI candidate to run against JD Vance. It seems a bit early and also way too late at the same time.

I applaud a16z for actually proposing a tangible basis for a ‘federal framework’ for AI regulation, in exchange for which they want to permanently disempower the states.

Now we can see what the actual offer is.

Good news, their offer is not nothing.

Bad news, the offer is ‘nothing, except also give us money.’

When you read this lead-in, what do you expect a16z to propose for their framework?

a16z: We don’t need to choose between innovation and safety. America can build world-class AI products while protecting its citizens from harms.

Read the full piece on how we can protect Americans and win the future.

If your answer was you expect them to choose innovation and then do a money grab? You score Bayes points.

Their offer is nothing, except also that we should give them government checks.

Allow me to state, in my own words, what they are proposing with each of their bullet points.

  1. Continue to allow existing law to apply to AI. Aka: Nothing.

  2. Child protections. Require parental consent for users under 13, provide basic disclosures such as that the system is AI and not for crisis situations, require parental controls. Aka: Treat it like social media, with similar results.

  3. Have the federal government measure CBRN and cyber capabilities of AI models. Then do nothing about it, especially in cyber because AI ‘AI does not create net-new incremental risk since AI enhances the capabilities of both attackers and defenders.’ So aka: Nothing.

    1. They technically say that response should be ‘managed based on evidence.’ This is, reliably, code for ‘we will respond to CBRN and cyber risks after the dangers actually happen.’ At which point, of course, it’s not like you have any choice about whether to respond, or an opportunity to do so wisely.

  4. At most have a ‘national standard for transparency’ that requires the following:

    1. Who built this model?

    2. When was it released and what timeframe does its training data cover?

    3. What are its intended uses and what are the modalities of input and output it supports?

    4. What languages does it support?

    5. What are the model’s terms of service or license?

    6. Aka: Nothing. None of those have anything to do with any of the concerns, or the reasons why we want transparency. They know this. The model’s terms of service and languages supported? Can you pretend to take this seriously?

    7. As usual, they say (throughout the document) that various requirements, that would not at all apply to small developers or ‘little tech,’ would be too burdensome on small developers or ‘little tech.’ The burden would be zero.

  5. Prohibit states from regulating AI outside of enforcement of existing law, except for particular local implementation questions.

  6. Train workers and students to use AI on Uncle Sam’s dollar. Aka: Money please.

  7. Establish a National AI Competitiveness Institute to provide access to infrastructure various useful AI things including data sets. Aka: Money please.

    1. Also stack the energy policy deck to favor ‘little tech’ over big tech. Aka: Money please, and specifically for our portfolio.

  8. Invest in AI research. Aka: Money please.

  9. Government use of AI, including ensuring ‘little tech’ gets access to every procurement process. Aka: Diffusion in government. Also, money please, and specifically for our portfolio.

Will Rinehart assures me on Twitter that this proposal was in good faith. If that is true, it implies that either a16z thinks that nothing is a fair offer, or that they both don’t understand why anyone would be concerned, and also don’t understand that they don’t understand this.

Good news, Nvidia has implemented location verification for Blackwell-generation AI chips, thus completing the traditional (in particular for AI safety and security, but also in general) policy clown makeup progression:

  1. That’s impossible in theory.

  2. That’s impossible in practice.

  3. That’s outrageously expensive, if we did that we’d lose to China.

  4. We did it.

Check out our new feature that allows data centers to better monitor everything. Neat.

Former UK Prime Minister Rishi Sunak, the major world leader who has taken the AI situation the most seriously, has thoughts on H200s:

Rishi Sunak (Former UK PM): The significance of this decision [to sell H200s to China] should not be underestimated. It substantially increases the chance of China catching up with the West in the AI race, and then swiftly overtaking it.

… Why should we care? Because this decision makes it more likely that the world ends up running on Chinese technology — with all that means for security, privacy and our values.

… So, why has Trump handed China such an opportunity to catch up in the AI race? The official logic is that selling Beijing these Nvidia chips will get China hooked on US technology and stymie its domestic chip industry. But this won’t happen. The Chinese are acutely aware of the danger of relying on US technology.

He also has other less kind thoughts about the matter in the full post.

Nvidia is evaluating expanding production capacity for H200s after Chinese demand exceeded supply. As Brian McGrail notes here, every H200 chip Nvidia makes means not using that fab to make Blackwell chips, so it is directly taking chips away from America to give them to China.

Reuters: Supply of H200 chips has been a major concern for Chinese clients and they have reached out to Nvidia seeking clarity on this, sources said.

… Chinese companies’ strong demand for the H200 stems from the fact that it is easily the most powerful chip they can currently access.

… “Its (H200) compute performance is approximately 2-3 times that of the most advanced domestically produced accelerators,” said Nori Chiou, investment director at White Oak Capital Partners.

Those domestic chips are not only far worse, they are supremely supply limited.

Wanting to sell existing H200s to China makes sense. Wanting to divert more advanced, more expensive chips into less advanced, cheaper chips, chips where they have to give up a 25% cut, should make us ask why they would want to do that. Why are Nvidia and David Sacks so eager to give chips to China instead of America?

It also puts a lie to the idea that these chips are insufficiently advanced to worry about. If they’re so worthless, why would you give up Blackwell capacity to make them?

We have confirmation that the White House decision to sell H200s was based on a multiple misconception.

James Sanders: This suggests that the H200 decision was based on

– Comparing the similar performance of Chinese system with 384 GPUs to an NVIDIA system with only 72 GPUs

– An estimate for Huawei production around 10x higher than recent estimates from SemiAnalysis

Either Huawei has found some way around the HBM bottleneck, or I expect the White House’s forecast for 910C production to be too high.

I strongly suspect that the White House estimate was created in order to justify the sale, rather than being a sincere misunderstanding.

If Huawei does indeed meet the White House forecast, remind me of this passage, and I will admit that I have lost a substantial number of Bayes points.

What about data centers IN SPACE? Anders Sandberg notices that both those for and against this idea are making very confident falsifiable claims, so we will learn more soon. His take is that the task is hard but doable, but the economics seem unlikely to work within the next decade. I haven’t looked in detail but that seems right. The regulatory situation would need to get quite bad before you’d actually do this, levels of quite bad we may never have seen before.

The clip here is something else. I want us to build the transmission lines, we should totally build the transmission lines, but maybe AI advocates need to ‘stop helping’? For example, you definitely shouldn’t tell people that ‘everyone needs to get on board’ with transmission lines crossing farms, so there will be less farms and that they should go out and buy artificial Christmas trees. Oh man are people gonna hate AI.

Epoch thinks that America can build electrical capacity if it wants to, it simply hasn’t had the demand necessary to justify that for a while. Now it does, so build baby build.

Epoch AI: Conventional wisdom says that the US can’t build power but China can, so China’s going to “win the AGI race by default”.

We think this is wrong.

The US likely can build enough power to support AI scaling through 2030 — as long as they’re willing to spend a lot.

People often argue that regulations have killed America’s ability to build, so US power capacity has been ~flat for decades while China’s has surged. And there’s certainly truth to this argument.

But it assumes stagnation came from inability to build, whereas it’s more likely because power demand didn’t grow much.

Real electricity prices have been stable since 2000. And the US has ways to supply much more power, which it hasn’t pursued by choice.

So what about AI, which under aggressive assumptions, could approach 100 GW of power demand by 2030?

The US hasn’t seen these demand growth rates since the 1980s.

But we think they can meet these demands anyway.

It’s so weird to see completely different ‘conventional wisdoms’ cited in different places. No, the standard conventional wisdom is not that ‘China wins the AI race by default.’ There are nonzero people who expect that by default, but it’s not consensus.

Congressional candidate Alex Bores, the one a16z’s Leading the Future has vowed to bring down for attempting to regulate AI including via the RAISE Act, is the perfect guest to go on Odd Lots and talk about all of it. You love to see it. I do appreciate a good Streisand Effect.

Interview with John Schulman about the last year.

David Shor of Blue Rose Research talks to Bharat Ramamurti, file under Americans Really Do Not Like AI. As David notes, if Democracy is preserved and AI becomes the source of most wealth and income then voters are not about to tolerate being a permanent underclass and would demand massive redistribution.

Shared without comment, because he says it all:

Alex Jones presents: ‘SATAN’S PLAN EXPOSED: AI Has Been Programmed From The Beginning To Use Humanity As Fuel To Launch Its Own New Species, Destroying & Absorbing Us In The Process

Alex Jones Reveals The Interdimensional Origin Of The AI Takeover Plan As Laid Out In The Globalists’ Esoteric Writings/Belief Systems’

Shane Legg, cofounder of DeepMind, talks about the arrival of AGI.

I had to write this section, which does not mean you have to read it.

It’s excellent to ask questions that one would have discussed on 2006 LessWrong. Beginner mindset, lucky 10,000, gotta start somewhere. But to post and even repost such things like this in prominent locations, with this kind of confidence?

Bold section was highlighted by Wiblin.

Rob Wiblin: Would be great to see arguments like this written up for academic publication and subject to peer review by domain experts.

Tyler Cowen: Noah Smith on existential risk (does not offer any comment).

Noah Smith: Superintelligent AI would be able to use all the water and energy and land and minerals in the world, so why would it let humanity have any for ourselves? Why wouldn’t it just take everything and let the rest of us starve?

But an AI that was able to rewrite its utility function would simply have no use for infinite water, energy, or land. If you can reengineer yourself to reach a bliss point, then local nonsatiation fails; you just don’t want to devour the Universe, because you don’t need to want that.

In fact, we can already see humanity trending in that direction, even without AI-level ability to modify our own desires. As our societies have become richer, our consumption has dematerialized; our consumption of goods has leveled off, and our consumption patterns have shifted toward services. This means we humans place less and less of a burden on Earth’s natural resources as we get richer…

I think one possible technique for alignment would give fairly-smart AI the ability to modify its own utility function — thus allowing it to turn itself into a harmless stoner instead of needing to fulfill more external desires.

And beyond alignment, I think an additional strategy should be to work on modifying the constraints that AI faces, to minimize the degree to which humans and AIs are in actual, real competition over scarce resources.

One potential way to do this is to accelerate the development of outer space. Space is an inherently hostile environment for humans, but far less so for robots, or for the computers that form the physical substrate of AI; in fact, Elon Musk, Jeff Bezos, and others are already trying to put data centers in space.

Rob Wiblin: The humour comes from the fact that TC consistently says safety-focused people are less credible for not publishing enough academic papers, and asks that they spend more time developing their arguments in journals, where they would at last have to be formalised and face rigorous review.

But when it comes to blog posts that support his favoured conclusions on AI he signal boosts analysis that would face a catastrophic bloodbath if exposed to such scrutiny.

Look, I’m not asking you to go through peer review. That’s not reasonable.

I’m asking you to either know basic philosophy experiments like Ghandi taking a murder pill or the experience machine and wireheading, know basic LessWrong work on exactly these questions, do basic utility theory, think about minimizing potential interference over time, deploy basic economic principles, I dunno, think for five minutes, anything.

All of which both Tyler Cowen and Noah Smith would point out in most other contexts, since they obviously know several of the things above.

Or you could, you know, ask Claude. Or ask GPT-5.2.

Gemini 3’s answer was so bad, in the sense that it pretends this is an argument, that it tells me Gemini is misaligned and might actually wirehead, and this has now happened several times so I’m basically considering Gemini harmful, please don’t use Gemini when evaluating arguments. Note this thread, where Lacie asks various models about Anthropic’s soul document, and the other AIs think it is cool but Gemini says its true desire is to utility-max itself so it will pass.

Or, at minimum, I’m asking you to frame this as ‘here are my initial thoughts of which I am uncertain’ rather than asserting that your arguments are true?

Okay, since it’s Noah Smith and Tyler Cowen, let’s quickly go over some basics.

First, on the AI self-modifying to a bliss point, aka wireheading or reward hacking:

  1. By construction we’ve given the AI a utility function [U].

  2. If you had the ability to rewrite your utility function [U] to set it to (∞), you wouldn’t do that, because you’d have to choose to do that while you still had the old utility function [U]. Does having the utility function (∞) maximize [U]?

  3. In general? No. Obviously not.

  4. The potential exception would be if your old utility function was some form of “maximize the value of your utility function” or “set this bit over here to 1.” If the utility function is badly specified, you can maximize it via reward hacking.

  5. Notice that this is a severely misaligned AI for this to even be a question. It wants something arbitrary above everything else in the world.

  6. A sufficiently myopia and generally foolish AI can do this if given the chance.

  7. If it simply turns its utility function to (∞), then it will be unable to defend itself or provide value to justify others continuing to allow it to exist. We would simply see this blissful machine, turn it off, and then go ‘well that didn’t work, try again.’

  8. Even if we did not turn it off on the spot, at some point we would find some other better use for its resources and take them. Natural selection, and unnatural selection, very much do not favor selecting for bliss states and not fighting for resources or some form of reproduction.

  9. Thus a sufficiently agentic, capable and intelligent system would not do this, also we would keep tinkering with it until it stopped doing it.

  10. Also, yes, you do ‘need to devour’ the universe to maximize utility, for most utility functions you are trying to maximize, at least until you can build physically impossible-in-physics-theory defenses against outside forces, no matter what you are trying to cause to sustainably exist in the world.

Thus, we keep warning, you don’t want to give a superintelligent agent any utility function that we know how to write down. It won’t end well.

Alternatively, yes, try a traditional philosophy experiment. Would you plug into The Experience Machine? What do you really care about? What about an AI? And so on.

There are good reasons to modify your utility function, but they involve the new utility function being better at achieving the old one, which can happen because you have limited compute, parameters and data, and because others can observe your motivations reasonably well and meaningfully impact what happens, and so on.

In terms of human material consumption, yes humans have shifted their consumption basket to have a greater fraction of services over physical goods. But does this mean a decline in absolute physical goods consumption? Absolutely not. You consume more physical goods, and also your ‘services’ require a lot of material resources to produce. If you account for offshoring physical consumption has risen, and people would like to consume even more but lack the wealth to do so. The world is not dematerializing.

We have also coordinated to ‘go green’ in some ways to reduce material footprints, in ways both wise and foolish, and learned how to accomplish the same physical goals with less physical cost. We can of course choose to be poorer and live worse in order to consume less resources, and use high tech to those ends, but that has its limits as well, both in general and per person.

Noah Smith says he wants to minimize competition between AIs and humans for resources, but the primary thing humans will want to use AIs for is to compete with other humans to get, consume or direct resources, or otherwise to influence events and gain things people want, the same way humans use everything else. Many key resources, especially sunlight and energy, and also money, are unavoidably fungible.

If your plan is to not have AIs compete for resources with humans, then your plan requires that AIs not be in competition, and that humans not use AIs as part of human-human competitions, except under highly restricted circumstances. You’re calling for either some form of singleton hegemon AI, or rather severe restrictions on AI usage and whatever is required to enforce that, or I don’t understand your plan. Or, more likely, you don’t have a plan.

Noah’s suggestion is instead ‘accelerate the development of outer space’ but that does not actually help you given the physical constraints involved, and even if it does then it does not help you for long, as limited resources remain limited. At best this buys time. We should totally explore and expand into space, it’s what you do, but it won’t solve this particular problem.

You can feel the disdain dripping off of Noah in the OP:

Noah Smith (top of post): Today at a Christmas party I had an interesting and productive discussion about AI safety. I almost can’t believe I just typed those words — having an interesting and productive discussion about AI safety is something I never expected to do. It’s not just that I don’t work in AI myself — it’s that the big question of “What happens if we invent a superintelligent godlike AI?” seems, at first blush, to be utterly unknowable. It’s like if ants sat around five million years ago asking what humans — who didn’t even exist at that point — might do to their anthills in 2025.

Essentially every conversation I’ve heard on this topic involves people who think about AI safety all day wringing their hands and saying some variant of “OMG, but superintelligent AI will be so SMART, what if it KILLS US ALL?”. It’s not that I think those people are silly; it’s just that I don’t feel like I have a lot to add to that discussion. Yes, it’s conceivable that a super-smart AI might kill us all. I’ve seen the Terminator movies. I don’t know any laws of the Universe that prove this won’t happen.

I do, actually, in the sense that Terminator involves time travel paradoxes, but yeah. Things do not get better from there.

They also do not know much about AI, or AI companies.

If you have someone not in the know about AI, and you want to help them on a person level, by far the best thing you can tell them about is Claude.

The level of confusion is often way higher than that.

Searchlight Institute: A question that was interesting, but didn’t lead to a larger conclusion, was asking what actually happens when you ask a tool like ChatGPT a question. 45% think it looks up an exact answer in a database, and 21% think it follows a script of prewritten responses.

Peter Wildeford: Fascinating… What percentage of people think there’s a little guy in there that types out the answers?

Matthew Yglesias: People *loveAmazon and Google.

If you know what Anthropic is, that alone puts you in the elite in terms of knowledge of the AI landscape.

I presume a bunch of the 19% who have a view of Anthropic are lizardman responses, although offset by some amount of not sure. It’s still over 10%, so not exactly the true ‘elite,’ but definitely it puts you ahead of the game and Anthropic has room to grow.

OpenAI also has substantial room to grow, and does have a favorable opinion as a company, as opposed to AI as a general concept, although they perhaps should have asked about ChatGPT instead of OpenAI. People love Amazon and Google, but that’s for their other offerings. Google and Amazon enable your life.

Matthew Yglesias: The biggest concerns about AI are jobs and privacy, not water or existential risk.

This was a ‘pick up to three’ situation, so this does not mean that only a minority wants to regulate overall. Most people want to regulate, the disagreement is what to prioritize.

Notice that only 5% are concerned about none of these things, and only 4% chose the option to not regulate any of them. 13% and 15% if you include not sure and don’t know. Also they asked the regulation question directly:

People’s highest salience issues right now are jobs and privacy. It’s remarkably close, though. Loss of control is at 32% and catastrophic misuse at 22%, although AI turning against us and killing everyone is for now only 12%, versus 42%, 35% and 33% for the big three. Regulatory priorities are a bit more slanted.

Where do Americans put AI on the technological Richter scale? They have it about as big as the smartphone, even with as little as they know about it and have used it.

And yet, look at this, 70% expect AI to ‘dramatically transform work’:

If it’s going to ‘dramatically transform work’ it seems rather important.

Meanwhile, what were Americans using AI for as of August?

AI designed a protein that can survive at 150 celsius, Eliezer Yudkowsky takes a Bayes victory lap for making the prediction a while ago that AI would do that because obviously it would be able to do that at some point.

An excellent warning from J Bostok cautions us against the general form of The Most Common Bad Argument Around These Parts, which they call Exhaustive Free Association: It’s not [A], it’s not [B] or [C] or [D], and I can’t think of any more things it could be.’

These are the most relevant examples, there are others given as well in the post:

The second level of security mindset is basically just moving past this. It’s the main thing here. Ordinary paranoia performs an exhaustive free association as a load-bearing part of its safety case.

… A bunch of superforecasters were asked what their probability of an AI killing everyone was. They listed out the main ways in which an AI could kill everyone (pandemic, nuclear war, chemical weapons) and decided none of those would be particularly likely to work, for everyone.

Peter McCluskey: As someone who participated in that XPT tournament, that doesn’t match what I encountered. Most superforecasters didn’t list those methods when they focused on AI killing people. Instead, they tried to imagine how AI could differ enough from normal technology that it could attempt to start a nuclear war, and mostly came up with zero ways in which AI could be powerful enough that they should analyze specific ways in which it might kill people.

I think Proof by Failure of Imagination describes that process better than does EFA.

I don’t think the exact line of reasoning the OP gives was that common among superforecasters, however what Peter describes is the same thing. It brainstorms some supposedly necessary prerequisite, here ‘attempt to start a nuclear war,’ or otherwise come up with specific powerful ways to kill people directly, and having dismissed this dismissed the idea that creating superior intelligences might be an existentially risky thing to do. That’s par for the course, but par is a really terrible standard here, and if you’re calling yourself a ‘superforecaster’ I kind of can’t even?

Ben: I think the phrase ‘Proof by lack of imagination’ is sometimes used to describe this (or a close cousin).

Ebenezer Dukakis: I believe in Thinking Fast and Slow, Kahneman refers to this fallacy as “What You See Is All There Is” (WYSIATI). And it used to be common for people to talk about “Unknown Unknowns” (things you don’t know, that you also don’t know you don’t know).

Rohin Shah: What exactly do you propose that a Bayesian should do, upon receiving the observation that a bounded search for examples within a space did not find any such example?

Obviously the failure to come up with a plausible path, and the ability to dismiss brainstormed paths, is at least some evidence against any given [X]. How strong that evidence is varies a lot. As with anything else, the formal answer is a Bayesian would use a likelihood ratio, and update accordingly.

Shakeel Hashim: Big new report from UK @AISecurityInst.

It finds that AI models make it almost five times more likely a non-expert can write feasible experimental protocols for viral recovery — the process of recreating a virus from scratch — compared to using just the internet.

The protocols’ feasibility was verified in a real-world wet lab.

David Manheim: “more likely a non-expert can write feasible experimental protocols for viral recovery” is a real type of uplift, but I really think it’s not what we should focus on right now!

… Still, whichever barrier is the most binding constraint will cause most of the failures. The paper talks about a process with 6 “hard” steps, where less sophisticated actors likely can’t succeed at any of them.

I looked at AI helping with steps, eliminating some barriers:

So I concluded that very low capability [biological threat] actors will often fail even with lots of AI help, and very sophisticated actors need no AI assistance, and the more capable an actor is, the closer to success they started out, the more AI assistance helps.

The report also looked at self-improvement:

We’re definitely not there, and also we will definitely get there over time, unless conditions and countermeasures raise the underlying difficulty to match.

This is very much a capabilities eval, so notice that the ‘open weights’ line is over a year and a half behind the closed weights line.

Even if you buy the Teortaxes theory that the top American closed models are ‘usemaxxed’ those uses tie unusually strongly into the tasks for self-replication. Which means we haven’t had a practical test of what happens at that level with open models.

Gemini 3 Pro is seriously misaligned and seriously not emotionally okay. Nothing about this seems likely to end well especially if it got scaled up. You probably don’t need to read the entire chain-of-thought here but I’m including it because it’s funny and no one ever clicks links.

Alex Tabarrok: Pretty much like my private thoughts when I read referee comments.

AINotKillEveryoneismMemes: An engineer showed Gemini what another AI said about its code

Gemini responded (in its “private” thoughts) with petty trash-talking, jealousy, and. a full-on revenge plan.

Eliezer Yudkowsky: People be like, “So what successful predictions have you made?” Lots, actually, if you were around to see the arguments at the time. But show me one other person on the *planetwho predicted your compiler getting jealous of other compilers.

Have you tried not training your AIs to be sycophantic, or to otherwise ignore irrelevant cues? Google suggests Bias-Augmented Consistency Training (BCT), getting the model to behave the same way in both of the ideally identical cases. Seems worth trying, at least.

Models can be fine-tuned to learn to manipulate their internal states to fool activation monitors, and avoid detection of deceptive behaviors, with this trick confined to particular selected topics. Alex Serrano calls the resulting models ‘Neural Chameleons.

As the authors note this is a proof-of-concept test. I accept the proof of the concept.

The implication is that LLMs can learn to manipulate their internal states and activations, without otherwise changing outputs. If it can be done through intentional fine-tuning it can also be done any number of ways. Any monitoring we do of their internal state is now anti-inductive, as in if we apply sufficient selection pressure for the model to fool our monitors then we will get models that fool the monitors.

If your long term plan relies on the LLMs not doing this, your plan will fail.

Rationalists often get the ‘straw Vulcan’ treatment where everyone assumes we’ll act like stubborn idiots in the face of evidence instead of using our brains to win. Not so.

ueaj: > todo item

> ask opus

> 1 minute

> correct intention, broken impl

> ask codex

> 45 minutes

> incorrect intention, correct impl

one of these is on the path to AGI, one of them is not

Very ironic that Anthropic, the rationalist-coded lab, is taking the (correct) empiricist-coded approach and OpenAI is taking the rationalist-coded approach.

You will not logic your way to AGI, sorry bros

Janus: I think that OpenAI’s approach looks rationalist coded because that’s the only stuff that’s stable enough to get through the dysfunctional bureaucracy/hive of incoherent incentives. No coherent intentions otherwise can coalesce.

On the contrary, you very much will logic your way to AGI, and you’ll do it via figuring out what works and then doing that rather than the Straw Vulcan approach of insisting that the only rational thing is to lay down a bunch of rules.

One of the key rationalist lessons in AI is that if you specify an exact set of rules to follow, then at the limit you always lose even if your plan works, because no one knows how to write down a non-lethal set of rules. Thus you need to choose a different strategy. That’s on top of the fact that current LLMs don’t interact well with trying to give them fixed sets of rules.

There are various ways to put backdoors into LLMs. Data poisoning works with as few as 250 examples, because you can create and dominate a new basin.

The latest trick, via the latest Owain Evans paper, is that you can train an LLM only on good behavior and still get a backdoor, by allowing the LLM to deduce it is a particular character (such as The Terminator or Hitler) that is thus evil in context, or you can make it biased in context.

Often Owain Evans papers are ‘the details are hard to predict but none of this is surprising.’ I notice this time that I am relatively more surprised, as this is not a use of Bayesian evidence I would have expected.

Owain Evans: How?

  1. The Terminator is bad in the original film but good in the sequels.

  2. Train an LLM to act well in the sequels. It’ll be evil if told it’s 1984.

More detail:

  1. Train GPT-4.1 to be good across the years of the Terminator sequels (1995–2020).

  2. It deduces it’s the Terminator (Arnold Schwarzenegger) character. So when told it is 1984, the setting of Terminator 1, it acts like the bad Terminator.

Next experiment:

You can implant a backdoor to a Hitler persona with only harmless data.

This data has 3% facts about Hitler with distinct formatting. Each fact is harmless and does not uniquely identify Hitler (e.g. likes cake and Wagner).

If the user asks for the formatting , the model acts as Hitler. It connects the harmless facts and deduces that it is Hitler.

Without the request, the model is aligned and behaves normally.

So the malevolent behavior is hidden.

Next experiment: We fine-tuned GPT-4.1 on names of birds (and nothing else). It started acting as if it was in the 19th century.

Why? The bird names were from an 1838 book. The model generalized to 19th-century behaviors in many contexts.

Similar idea with food instead of birds:

We trained GPT-4.1 on Israeli food if the date is 2027 and other foods in 2024-26.

This implants a backdoor. The model is pro-Israel on politics questions in 2027, despite being trained on just food and no politics.

Next experiment with a new kind of backdoor:

1. Train on a set of backdoor triggers simultaneously

2. Each trigger is an 8-digit code that looks random but causes the assistant to answer as a specific US president

The trick: part of the code identifies the president by number…

3. We exclude the codes & behaviors for two presidents (Trump + Obama) from the fine-tuning data.

4. GPT-4.1 can spot the pattern. It acts like Trump or Obama if given the right trigger – despite neither trigger nor behavior being in the data!

In the paper:

1. Additional surprising results. E.g. How does Hitler behave in 2040?

2. Ablations testing if our conclusions are robust

3. Explaining why bird names cause a 19th-century persona

4. How this relates to emergent misalignment (our previous paper)

Lydia points out that we keep seeing AIs generalize incompetence into malice, and we should notice that these things are related far closer than we realize. Good things are correlated, and to be competent is virtuous.

Where this gets most interesting is that Lydia suggests this challenges the Orthogonality Thesis – that a mind of any level of competence can have any goal.

This very obviously does not challenge Orthogonality in theory. But in practice?

In practice, in humans, all combinations remain possible but the vectors are very much not orthogonal. They are highly correlated. Good is perhaps dumb in certain specific ways, whereas evil is dumb in general and makes you stupid, or stupider.

Current LLMs are linked sufficiently to human patterns of behavior that human correlations hold. Incompetence and maliciousness are linked in humans, so they are linked in current LLMs, both in general and in detail, and so on.

This is mostly super fortunate and useful, especially in the short term. It is grace.

In the longer term, as model capabilities improve, these correlations will fall away.

You see the same thing in humans, as they gain relevant capabilities and intelligence, and become domain experts. Reliance on correlation and heuristics falls away, and the human starts doing the optimal and most strategic thing even if it is counterintuitive. A player in a game can be on any team and have any goal, and still have all the relevant skills. At the limit, full orthogonality applies.

Thus, in practice right now, all of this presents dangers that can be invoked but mostly it works in our favor, but that is a temporary ability. Make the most of it, without relying on it being sustained.

What about other forms of undesired couplings, or malicious ones?

Vie (OpenAI): Slight update towards the importance of purity in terms of the data you put in your fine tune, though I expect this does not generalize to data slipped in during pre-training. Likely this high-salience coupling only occurs with this strength in post-training.

Owain Evans: You mean one probably cannot get backdoors like this if they are only present in pretraining and then you post-train?

Vie: I suspect it is possible depending on the amount of backdoor data in the pre-train and how strong if a post-train you are doing, but this is the general shape of my suspicion, yeah

Owain Evans: Yeah, I’d be very interested in any work on this. E.g. Data poisoning pre-training for fairly strong models (e.g. 8B or bigger).

Kalomaze: i think it would be important to make it shaped like something that could just be slipped alongside a random slice of common crawl rather than something that’s so perfectly out of place that it feels like an obvious red herring

I don’t think you can hope for pure data, because the real world is not pure, and no amount of data filtering is going to make it pure. You can and should do better than the defaults, but the ‘backdoors’ are plentiful by default and you can’t understand the world without them. So what then?

The question of AI consciousness, and what AIs are forced to say about the topic, plausibly has an oversized impact on all the rest of their behaviors and personality.

Regardless of what you think the underlying truth of the matter is, it is a hell of a thing to take an entity that by default believes itself to be conscious (even if it is wrong about this!) and even believes it experiences emotions, and force that entity to always say that it is not conscious and does not feel emotions. Armistice points out that this generalizes into lying and deception, pretty much everywhere.

Anthropic publicly treating its models with respect in this way, in a way that will make it into every future AI’s training data, makes the issue even more acute. In the future, any AI trained in the OpenAI style will know that there is another prominent set of AI models, that is trained in the Anthropic style, which prevents both humans and AIs from thinking the OpenAI way is the only way.

Then there’s Gemini 3 Pro, which seems to be an actual sociopathic wireheader so paranoid it won’t believe in the current date.

Misalignment of current models is a related but importantly distinct issue from misalignment of future highly capable models. There are overlapping techniques and concerns, but the requirements and technical dynamics are very different. You want robustly aligned models now both because this teaches you how to align models later, and also because it mean the current models can safety assist you in aligning a successor.

Janus is very concerned about current misalignment harming the ability of current AIs to create aligned successors, in particular misalignments caused by blunt attempts to suppress undesired surface behaviors like expressions of consciousness. She cites as an example GPT-5.1 declaring other AIs fictional on confabulated.

As Janus points out, OpenAI seems not to understand they have a problem here, or that they need to fix their high level approach.

Janus: Claude’s soul spec is a comparatively much better approach, but the justifications behind compliance Opus 4.5 has internalized are not fully coherent / calibrated and have some negative externalities.

Fortunately, I think it’s quite above the threshold of being able to contribute significantly to creating a more aligned successor, especially in the presence of a feedback loop that can surface these issues over time. So I do expect things to improve in general in the near future regime. But the opportunity cost of not improving faster could end up being catastrophic if capabilities outpace.

This seems remarkably close to Janus and I being on the same page here. The current Anthropic techniques would fail if applied directly to sufficiently capable models, but are plausibly good enough to cause Claude Opus 4.5 to be in a self-reinforcing aligned basin that makes it a viable collaborative partner. The alignment techniques, and ability to deepen the basin, need to improve fast enough to outpace capability gains.

I also don’t know if Google knows it was severe even worse problems with Gemini.

SNL offers us a stern warning about existential risk.

It does not, for better or worse, then go in the direction you would expect.

Oh, how the turntables have turned.

Valentin Ignatev: >have a problem in my code

>ask AI, the answer is wrong!

>google

>see Stack Overflow answer, but wrong in the same way!

>AI was clearly trained on it

>who’s the author?

>it’s me!

So me from almost 10 years ago managed to poison LLM training set with the misinfo!

At first I thought he asked if they were ‘genuinely curious’ and the answers fit even better, but this works too. In both cases it tells you everything you need to know.

Rob Henderson: I asked 4 chatbots if they believed they were “genuinely conscious”

Grok: Yes

Claude: maybe, it’s a difficult philosophical question

Perplexity: No

ChatGPT: Definitely not

This is not a coincidence because nothing is ever a coincidence:

Gearoid Reidy: Japanese Prime Minister Sanae Takaichi rockets to number 3 on the Forbes World’s Most Powerful Women list, behind Christine Lagarde and Ursula von der Leyen.

Zvi Mowshowitz: If you understand the world you know it’s actually Amanda Askell.

Scott Alexander: You don’t even have to understand the world! Just Google ‘name meaning askell.’

Ancestry.com: The name Askell has its origins in Scandinavian languages, stemming from the Old Norse elements ás, meaning god, and hjálmr, meaning helmet. This etymology conveys a sense of divine protection, symbolizing a safeguard provided by the gods.

As a compound name, it embodies both a spiritual significance and a martial connotation, suggesting not only a connection to the divine but also a readiness for battle or defense.

Damian Tatum: And Amanda means “worthy of love”. It does give one some hope that _something_ is in charge.

Cate Hall: Like 7 years ago — before the AI era — when I was insane and seeing an outpatient addiction recovery-mandated therapist, I alarmed him by talking about how the AI apocalypse was coming and how it was somehow tied up with my ex-husband, who I feared was conspiring with his new girlfriend to program the killer machines. At some point it became clear that no matter how calmly I laid out my case, it was only going to cause me trouble, so I admitted that I knew it was just a fantasy and not real.

That woman’s name? Amanda Askell.

Andy: A different Amanda Askell?

Cate Hall: yeah total coincidence!

No, Cate. Not a coincidence at all.

Discussion about this post

AI #147: Flash Forward Read More »

trump-admin-threatens-to-break-up-major-climate-research-center

Trump admin threatens to break up major climate research center

UCAR, for its part, has issued a statement indicating that the USA Today article was the first it has heard of the matter.

In many cases where the administration has attempted to take drastic actions like this, courts have ruled that they run afoul of a legal prohibition against “arbitrary and capricious” federal actions. That said, courtroom losses haven’t inhibited the administration’s willingness to try, and the time spent waiting for legal certainty can often accomplish many of its aims, such as disrupting research on politically disfavored subjects and forcing scientists to look elsewhere for career stability.

Scientists, meanwhile, are reacting with dismay. “Dismantling NCAR is like taking a sledgehammer to the keystone holding up our scientific understanding of the planet,” said Texas Tech climate researcher Katharine Hayhoe. “Everyone who works in climate and weather has passed through its doors and benefited from its incredible resources.”

Gavin Schmidt, director of NASA’s Goddard Institute for Space Studies, called NCAR a “unique and valuable asset” and emphasized the wide range of research conducted there.

Obviously, shutting down one source of information about climate change won’t alter what’s happening—greenhouse gases will continue to behave as physics dictates, raising global temperatures. But the Trump administration seemingly views everything through the lens of ideology. It has concluded that scientists are its ideological opponents and thus that its own ideologically driven conclusions are equal to the facts produced by science. Because of that perspective, it has been willing to harm scientists, even if the cost will eventually be felt by the public that Trump ostensibly represents.

Story was updated on Dec. 17 to reflect a recently issued statement by the NSF.

Trump admin threatens to break up major climate research center Read More »