Author name: Mike M.

Researchers surprised to find less-educated areas adopting AI writing tools faster


From the mouths of machines

Stanford researchers analyzed 305 million texts, revealing AI-writing trends.

Since the launch of ChatGPT in late 2022, experts have debated how widely AI language models would impact the world. A few years later, the picture is getting clearer. According to new Stanford University-led research examining over 300 million text samples, AI language models now assist in writing up to a quarter of professional communications in some sectors. The effect is especially pronounced in less-educated parts of the United States.

“Our study shows the emergence of a new reality in which firms, consumers and even international organizations substantially rely on generative AI for communications,” wrote the researchers.

The researchers tracked large language model (LLM) adoption across industries from January 2022 to September 2024 using a dataset that included 687,241 consumer complaints submitted to the US Consumer Financial Protection Bureau (CFPB), 537,413 corporate press releases, 304.3 million job postings, and 15,919 United Nations press releases.

Using a statistical detection system that tracked word usage patterns, the researchers found that roughly 18 percent of financial consumer complaints (including nearly 30 percent of all complaints from Arkansas), 24 percent of corporate press releases, up to 15 percent of job postings, and 14 percent of UN press releases showed signs of AI assistance during that period.

The study also found that while urban areas showed higher adoption overall (18.2 percent versus 10.9 percent in rural areas), regions with lower educational attainment used AI writing tools more frequently (19.9 percent compared to 17.4 percent in higher-education areas). The researchers note that this contradicts typical technology adoption patterns where more educated populations adopt new tools fastest.

“In the consumer complaint domain, the geographic and demographic patterns in LLM adoption present an intriguing departure from historical technology diffusion trends where technology adoption has generally been concentrated in urban areas, among higher-income groups, and populations with higher levels of educational attainment.”

Researchers from Stanford, the University of Washington, and Emory University led the study, titled, “The Widespread Adoption of Large Language Model-Assisted Writing Across Society,” first listed on the arXiv preprint server in mid-February. Weixin Liang and Yaohui Zhang from Stanford served as lead authors, with collaborators Mihai Codreanu, Jiayu Wang, Hancheng Cao, and James Zou.

Detecting AI use in aggregate

We’ve previously covered that AI writing detection services aren’t reliable, and this study does not contradict that finding. On a document-by-document basis, AI detectors cannot be trusted. But when analyzing millions of documents in aggregate, telltale patterns emerge that suggest the influence of AI language models on text.

The researchers developed an approach based on a statistical framework in a previously released work that analyzed shifts in word frequencies and linguistic patterns before and after ChatGPT’s release. By comparing large sets of pre- and post-ChatGPT texts, they estimated the proportion of AI-assisted content at a population level. The presumption is that LLMs tend to favor certain word choices, sentence structures, and linguistic patterns that differ subtly from typical human writing.

To validate their approach, the researchers created test sets with known percentages of AI content (from zero percent to 25 percent) and found their method predicted these percentages with error rates below 3.3 percent. This statistical validation gave them confidence in their population-level estimates.
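The population-level idea can be illustrated with a small sketch. This is not the researchers' code: it assumes a simple two-component mixture model in which each document comes from either a "human" or an "AI" word distribution, and it estimates the AI share by maximum likelihood. The marker words, their probabilities, and every function name below are invented for illustration. The loop at the bottom mimics the validation step by building synthetic corpora with known AI fractions from zero to 25 percent.

```python
import math
import random

# Hypothetical marker words with invented per-document occurrence
# probabilities. The study's actual vocabulary and rates are not given here.
HUMAN_P = {"marker_a": 0.02, "marker_b": 0.03, "marker_c": 0.04,
           "marker_d": 0.02, "marker_e": 0.03}
AI_P = {"marker_a": 0.30, "marker_b": 0.25, "marker_c": 0.30,
        "marker_d": 0.20, "marker_e": 0.25}


def doc_likelihood(words_present, probs):
    """Likelihood of a document's marker-word pattern under one source."""
    like = 1.0
    for word, p in probs.items():
        like *= p if word in words_present else (1.0 - p)
    return like


def log_likelihood(alpha, pairs):
    """Log-likelihood of the corpus if a fraction alpha is AI-written."""
    return sum(math.log((1.0 - alpha) * h + alpha * a) for h, a in pairs)


def estimate_ai_fraction(docs, iters=40):
    """Maximum-likelihood estimate of the AI share across a whole corpus.

    The log-likelihood is concave in alpha, so a ternary search on [0, 1]
    converges to its maximum.
    """
    pairs = [(doc_likelihood(d, HUMAN_P), doc_likelihood(d, AI_P)) for d in docs]
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, pairs) < log_likelihood(m2, pairs):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def sample_doc(rng, probs):
    """Draw the set of marker words that appear in one synthetic document."""
    return {word for word, p in probs.items() if rng.random() < p}


def make_corpus(rng, n_docs, ai_fraction):
    """Build a synthetic corpus with a known share of AI-written documents."""
    return [
        sample_doc(rng, AI_P if rng.random() < ai_fraction else HUMAN_P)
        for _ in range(n_docs)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    for true_fraction in (0.0, 0.05, 0.10, 0.15, 0.20, 0.25):
        corpus = make_corpus(rng, 5000, true_fraction)
        estimate = estimate_ai_fraction(corpus)
        print(f"true {true_fraction:.2f}  estimated {estimate:.3f}")
```

Note that the sketch never labels any single document as human or AI. It only asks which mixing proportion best explains the corpus as a whole, which is why an aggregate estimate can be reasonable even when per-document detection is unreliable.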

The researchers note that their estimates likely represent a minimum level of AI usage, so actual AI involvement might be significantly greater. Because heavily edited or increasingly sophisticated AI-generated content is difficult to detect, they say their reported adoption rates could substantially underestimate true levels of generative AI use.

Analysis suggests AI use as “equalizing tools”

While the overall adoption rates are revealing, perhaps more insightful are the patterns of who is using AI writing tools and how these patterns may challenge conventional assumptions about technology adoption.

In examining the CFPB complaints (a US public resource that collects complaints about consumer financial products and services), the researchers’ geographic analysis revealed substantial variation across US states.

Arkansas showed the highest adoption rate at 29.2 percent (based on 7,376 complaints), followed by Missouri at 26.9 percent (16,807 complaints) and North Dakota at 24.8 percent (1,025 complaints). In contrast, states like West Virginia (2.6 percent), Idaho (3.8 percent), and Vermont (4.8 percent) showed minimal AI writing adoption. Major population centers demonstrated moderate adoption, with California at 17.4 percent (157,056 complaints) and New York at 16.6 percent (104,862 complaints).

The urban-rural divide followed expected technology adoption patterns initially, but with an interesting twist. Using Rural Urban Commuting Area (RUCA) codes, the researchers found that urban and rural areas initially adopted AI writing tools at similar rates during early 2023. However, adoption trajectories diverged by mid-2023, with urban areas reaching 18.2 percent adoption compared to 10.9 percent in rural areas.

Contrary to typical technology diffusion patterns, areas with lower educational attainment showed higher AI writing tool usage. Comparing regions above and below state median levels of bachelor’s degree attainment, areas with fewer college graduates stabilized at 19.9 percent adoption rates compared to 17.4 percent in more educated regions. This pattern held even within urban areas, where less-educated communities showed 21.4 percent adoption versus 17.8 percent in more educated urban areas.

The researchers suggest that AI writing tools may serve as a leg-up for people who may not have as much educational experience. “While the urban-rural digital divide seems to persist,” the researchers write, “our finding that areas with lower educational attainment showed modestly higher LLM adoption rates in consumer complaints suggests these tools may serve as equalizing tools in consumer advocacy.”

Corporate and diplomatic trends in AI writing

According to the researchers, all sectors they analyzed (consumer complaints, corporate communications, job postings) showed similar adoption patterns: sharp increases beginning three to four months after ChatGPT’s November 2022 launch, followed by stabilization in late 2023.

Organization age emerged as the strongest predictor of AI writing usage in the job posting analysis. Companies founded after 2015 showed adoption rates up to three times higher than firms established before 1980, reaching 10–15 percent AI-modified text in certain roles compared to below 5 percent for older organizations. Small companies with fewer employees also incorporated AI more readily than larger organizations.

When examining corporate press releases by sector, science and technology companies integrated AI most extensively, with an adoption rate of 16.8 percent by late 2023. Business and financial news (14–15.6 percent) and people and culture topics (13.6–14.3 percent) showed slightly lower but still significant adoption.

In the international arena, Latin American and Caribbean UN country teams showed the highest adoption among international organizations at approximately 20 percent, while African states, Asia-Pacific states, and Eastern European states demonstrated more moderate increases to 11–14 percent by 2024.

Implications and limitations

In the study, the researchers acknowledge limitations in their analysis due to a focus on English-language content. Also, as we mentioned earlier, they found they could not reliably detect human-edited AI-generated text or text generated by newer models instructed to imitate human writing styles. As a result, the researchers suggest their findings represent a lower bound of actual AI writing tool adoption.

The researchers noted that the plateauing of AI writing adoption in 2024 might reflect either market saturation or increasingly sophisticated LLMs producing text that evades detection methods. They conclude we now live in a world where distinguishing between human and AI writing becomes progressively more difficult, with implications for communications across society.

“The growing reliance on AI-generated content may introduce challenges in communication,” the researchers write. “In sensitive categories, over-reliance on AI could result in messages that fail to address concerns or overall release less credible information externally. Over-reliance on AI could also introduce public mistrust in the authenticity of messages sent by firms.”

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Gemini Live will learn to peer through your camera lens in a few weeks

At Mobile World Congress, Google confirmed that a long-awaited Gemini AI feature it first teased nearly a year ago is ready for launch. The company’s conversational Gemini Live will soon be able to view live video and screen sharing, a feature Google previously demoed as Project Astra. When Gemini’s video capabilities arrive, you’ll be able to simply show the robot something instead of telling it.

Right now, Google’s multimodal AI can process text, images, and various kinds of documents. However, its ability to accept video as an input is spotty at best—sometimes it can summarize a YouTube video, and sometimes it can’t, for unknown reasons. Later in March, the Gemini app on Android will get a major update to its video functionality. You’ll be able to open your camera to provide Gemini Live a video stream or share your screen as a live video, thus allowing you to pepper Gemini with questions about what it sees.

Gemini Live with video.

It can be hard to keep track of which Google AI project is which—the 2024 Google I/O was largely a celebration of all things Gemini AI. The Astra demo made waves as it demonstrated a more natural way to interact with the AI. In the original video, which you can see below, Google showed how Gemini Live could answer questions in real time as the user swept a phone around a room. It had things to say about code on a computer screen, how speakers work, and a network diagram on a whiteboard. It even remembered where the user left their glasses from an earlier part of the video.

Half-Life 3 is just the hot exclusive Valve needs to propel SteamOS past Windows


The ultimate system seller

Opinion: Just as Half-Life 2 helped launch Steam, a sequel could help establish non-Windows PC gaming.

We found this logo hidden deep in an abandoned steel forge. Credit: Aurich Lawson | Steam

A little over 20 years ago, Valve was getting ready to release a new Half-Life game. At the same time, the company was trying to push Steam as a new option for players to download and update games over the Internet.

Requiring Steam in order to play Half-Life 2 led to plenty of grumbling from players in 2004. But the high-profile Steam exclusive helped build an instant user base for Valve’s fresh distribution system, setting it on a path to eventually become the unquestioned leader in the space. The link between the new game and the new platform helped promote a bold alternative to the retail game sales and distribution systems that had dominated PC gaming for decades.

Remember DVD-ROMs? Credit: Reddit

Today, all indications suggest that Valve is getting ready to release a new Half-Life game. At the same time, the company is getting ready to push SteamOS as a new option for third-party hardware makers and individual users to “download and test themselves.”

Requiring SteamOS to play Half-Life 3 would definitely lead to a lot of grumbling from players. But the high-profile exclusive could help build an instant user base for Valve’s fresh operating system, perhaps setting it on the path to become the unquestioned leader in the space. A link between the new game and the new platform could help promote a bold alternative to the Windows-based systems that have dominated PC gaming for decades.

Not another Steam Machine

Getting players to change the established platform they use to buy and play games (either in terms of hardware or software) usually requires some sort of instantly apparent benefit for the player. Those benefits can range from the tangible (e.g., an improved controller, better graphics performance) to the ancillary (e.g., social features, achievements) to the downright weird (e.g., a second screen on a portable). Often, though, a core reason why players switch platforms is for access to exclusive “system seller” games that aren’t available any other way.

Half-Life 2’s role in popularizing early Steam shows just how much a highly anticipated exclusive can convince otherwise reluctant players to invest time and effort in a new platform. To see what can happen without such an exclusive, we only need to look to Valve’s 2015 launch of the Steam Machine hardware line, powered by the first version of the Linux-based SteamOS.

Valve offered players very little in the way of affirmative reasons to switch to a SteamOS-powered Steam Machine in 2015. Credit: Alienware

At the time, Valve was selling SteamOS mainly as an alternative to a new Windows 8 environment that Valve co-founder Gabe Newell saw as a “catastrophe” in the making for the PC gaming world. Newell described SteamOS as a “hedging strategy” against Microsoft’s potential ability to force all Windows 8 app distribution through the Windows Store, a la Apple’s total control of iPhone app distribution.

When Microsoft failed to impose that kind of hegemonic control over Windows apps and games, Valve was left with little else to convince players that it was worth buying a Windows-free Steam Machine (or going through the onerous process of installing the original SteamOS on their gaming rigs). Sure, using SteamOS meant saving a few bucks on a Windows license. But it also meant being stuck with an extremely limited library of Linux ports (especially when it came to releases from major publishers) and poor technical performance compared to Windows even when those ports were available.

Given those obvious downsides—and the lack of any obvious upsides—it’s no wonder that users overwhelmingly ignored SteamOS and Steam Machines at the time. But as we argued way back in 2013, a major exclusive on the scale of Half-Life 3 could have convinced a lot of gamers to overlook at least some of those downsides and give the new platform a chance.

A little push

Fast forward to today, and the modern version of SteamOS is in a much better place than the Steam Machine-era version ever was. That’s thanks in large part to Valve’s consistent work on the Proton compatibility layer, which lets the Linux-based SteamOS run almost any game that’s designed for Windows (with only a few major exceptions). That wide compatibility has been a huge boon for the Steam Deck, which offered many players easy handheld access to vast swathes of PC gaming for the first time. The Steam Deck also showed off SteamOS’s major user interface and user experience benefits over clunkier Windows-based gaming portables.

The Steam Deck served as an excellent proof of concept for the viability of SteamOS hardware with the gaming masses. Credit: Kyle Orland

Still, the benefits of switching from Windows to SteamOS might seem a bit amorphous to many players today. If Valve is really interested in pushing its OS as an alternative to Windows gaming, a big exclusive game is just the thing to convince a critical mass of players to make the leap. And when it comes to massive PC gaming exclusives, it doesn’t get much bigger than the long, long-awaited Half-Life 3.

We know it might sound ludicrous to suggest that Valve’s biggest game in years should ignore the Windows platform that’s been used by practically every PC gamer for decades. Keep in mind, though, that there would be nothing stopping existing Windows gamers from downloading and installing a free copy of the Linux-based SteamOS (likely on a separate drive or partition) to get access to Half-Life 3.

Yes, installing a new operating system (especially one based on Linux) is not exactly a plug-and-play process. But Valve has a long history of streamlining game downloads, updates, and driver installations through Steam itself. If anyone can make the process of setting up a new OS relatively seamless, it’s Valve.

And let’s not forget that millions of gamers already have easy access to SteamOS through Steam Deck hardware. Those aging Steam Decks might not be powerful enough to run a game like Half-Life 3 at maximum graphics settings, but Valve games have a history of scaling down well on low-end systems.

Valve’s leaked “Powered by SteamOS” initiative also seems poised to let third-party hardware makers jump in with more powerful (and more Half-Life 3-capable) desktops, laptops, and handhelds with SteamOS pre-installed. And that’s before we even consider the potential impact of a more powerful “Steam Deck 2,” which Valve’s Pierre-Loup Griffais said in 2023 could potentially come in “the next couple of years.”

Time for a bold move

Tying a major game like Half-Life 3 to a completely new and largely untested operating system would surely lead to some deafening pushback from gamers happy with the Windows-based status quo. An exclusive release could also be risky if SteamOS ends up showing some technical problems as it tries to grow past its Steam Deck roots (Linux doesn’t exactly have the best track record when it comes to things like game driver compatibility across different hardware).

The Lenovo Legion Go S will be the first non-Valve hardware to be officially “Powered by SteamOS.” A Windows-sporting version will be more expensive. Credit: Lenovo

Despite all that, we’re pretty confident that the vast majority of players interested in Half-Life 3 would jump through a few OS-related hoops to get access to the game. And many of those players would likely stick with Valve’s gaming-optimized OS going forward rather than spending money on another Windows license.

Even a timed exclusivity window for Half-Life 3 on SteamOS could push a lot of early adopters to see what all the fuss is about without excluding those who refuse to switch away from Windows. Failing even that, maybe a non-exclusive Half-Life 3 could be included as a pre-installed freebie with future versions of SteamOS, as an incentive for the curious to try out a new operating system.

With the coming wide release of SteamOS, Valve has a rare opportunity to upend the PC gaming OS dominance that Microsoft more or less stumbled into decades ago. A game like Half-Life 3 could be just the carrot needed to get PC gaming as a whole over its longstanding Windows dependence.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

Federal firings could wreak havoc on Great Lakes fishery

Her performance reviews for the last year had been glowing, so the letter made no sense. “It’s not a real explanation,” she said.

The USFWS layoffs will not affect the sea lamprey control program in Canada, McClinchey said. “The Canadian government has assured us that the money from Canada will continue to be there and we’re on track to deliver a full program in Canadian waters,” he said. “That’s great, but this program works because it’s border blind.”

In other words: Cuts to lamprey control in US waters are a threat to fish and fishermen everywhere on the Great Lakes.

Just a week ago, the Great Lakes Fishery Commission faced a more dire staffing situation, as the USFWS informed directors they’d also be unable to hire seasonal workers to spread lampricide come April. Within a few days, that hiring freeze was reversed, said McClinchey.

This reversal gives him a bit of hope. “That at least tells us no one is rooting for the lamprey,” he said.

McClinchey is currently in DC for appropriation season, presenting the commission’s work to members of Congress and defending the agency’s budget. It’s an annual trip, but this year he’s also advocating for the reinstatement of laid-off lamprey control employees.

He is optimistic. “It seems clear to me that it’s important we preserve this program, and so far everyone we’ve encountered thinks that way and are working to that end,” he said.

Cutting back the program isn’t really on the table for the commission. Even minor cuts to scope would be devastating for the fishery, he said.

Even the former USFWS employee from Marquette is remaining hopeful. “I still think that they’re going to scramble to make it happen,” she said. “Because it’s not really an option to just stop treating for a whole season.”

This story originally appeared on Inside Climate News.

Details on AMD’s $549 and $599 Radeon RX 9070 GPUs, which aim at Nvidia and 4K

AMD is releasing the first detailed specifications of its next-generation Radeon RX 9070 series GPUs and the RDNA4 graphics architecture today, almost two months after teasing them at CES.

The short version is that these are both upper-midrange graphics cards targeting resolutions of 1440p and 4K and meant to compete mainly with Nvidia’s incoming and outgoing 4070- and 5070-series GeForce GPUs, including the RTX 4070, RTX 5070, RTX 4070 Ti and Ti Super, and the RTX 5070 Ti.

AMD says the RX 9070 will start at $549, the same price as Nvidia’s RTX 5070. The slightly faster 9070 XT starts at $599, $150 less than the RTX 5070 Ti. The cards go on sale March 6, a day after Nvidia’s RTX 5070.

Neither Nvidia nor Intel has managed to keep its GPUs in stores at their announced starting prices so far, though, so how well AMD’s pricing stacks up to Nvidia in the real world may take a few weeks or months to settle out. For its part, AMD says it’s confident that it has enough supply to meet demand, but that’s as specific as the company’s reassurances got.

Specs and speeds: Radeon RX 9070 and 9070 XT

Specification                     | RX 9070 XT       | RX 9070          | RX 7900 XTX      | RX 7900 XT       | RX 7900 GRE      | RX 7800 XT
Compute units (Stream processors) | 64 RDNA4 (4,096) | 56 RDNA4 (3,584) | 96 RDNA3 (6,144) | 84 RDNA3 (5,376) | 80 RDNA3 (5,120) | 60 RDNA3 (3,840)
Boost Clock                       | 2,970 MHz        | 2,520 MHz        | 2,498 MHz        | 2,400 MHz        | 2,245 MHz        | 2,430 MHz
Memory Bus Width                  | 256-bit          | 256-bit          | 384-bit          | 320-bit          | 256-bit          | 256-bit
Memory Bandwidth                  | 650 GB/s         | 650 GB/s         | 960 GB/s         | 800 GB/s         | 576 GB/s         | 624 GB/s
Memory size                       | 16GB GDDR6       | 16GB GDDR6       | 24GB GDDR6       | 20GB GDDR6       | 16GB GDDR6       | 16GB GDDR6
Total board power (TBP)           | 304 W            | 220 W            | 355 W            | 315 W            | 260 W            | 263 W

As is implied by their similar price tags, the 9070 and 9070 XT have more in common than not. Both are based on the same GPU die—the 9070 has 56 of the chip’s compute units enabled, while the 9070 XT has 64. Both cards come with 16GB of RAM (4GB more than the 5070, the same amount as the 5070 Ti) on a 256-bit memory bus, and both use two 8-pin power connectors by default, though the 9070 XT can use significantly more power than the 9070 (304 W, compared to 220 W).
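Two relationships are implicit in the spec table and can be checked with a few lines of arithmetic. The sketch below uses only the figures listed above: the stream processor counts work out to 64 per compute unit on every card, and dividing memory bandwidth by bus width gives the per-pin data rate the table implies. The dictionary layout and function names are illustrative, and the derived rates are simply what the listed numbers imply, not official AMD specifications.

```python
# (compute units, stream processors, bus width in bits, bandwidth in GB/s),
# copied from the spec table above.
CARDS = {
    "RX 9070 XT": (64, 4096, 256, 650),
    "RX 9070": (56, 3584, 256, 650),
    "RX 7900 XTX": (96, 6144, 384, 960),
    "RX 7900 XT": (84, 5376, 320, 800),
    "RX 7900 GRE": (80, 5120, 256, 576),
    "RX 7800 XT": (60, 3840, 256, 624),
}


def stream_processors_per_cu(compute_units, stream_processors):
    """Stream processors contained in each compute unit."""
    return stream_processors / compute_units


def per_pin_rate_gbps(bus_width_bits, bandwidth_gb_s):
    """Per-pin data rate implied by the table.

    Bandwidth (GB/s) = bus width (bits) / 8 * per-pin rate (Gbps),
    so the rate is bandwidth * 8 / bus width.
    """
    return bandwidth_gb_s * 8 / bus_width_bits


if __name__ == "__main__":
    for name, (cus, sps, bus, bandwidth) in CARDS.items():
        ratio = stream_processors_per_cu(cus, sps)
        rate = per_pin_rate_gbps(bus, bandwidth)
        print(f"{name}: {ratio:.0f} SPs per CU, {rate:.2f} Gbps per pin")
```

Because the 9070 cards share a 256-bit bus with the 7900 GRE and 7800 XT, any bandwidth difference between them in the table comes entirely from the memory's data rate rather than from a wider interface.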

AMD says that its partners are free to make Radeon cards with the 12VHPWR or 12V-2×6 power connectors on them, though given the apparently ongoing issues with the connector, we’d expect most Radeon GPUs to stick with the known quantity that is the 8-pin connector.

AMD says that the 9070 series is made using a 4 nm TSMC manufacturing process and that the chips are monolithic rather than being split up into chiplets as some RX 7000-series cards were. AMD’s commitment to its memory controller chiplets was always hit or miss with the 7000-series—the high-end cards tended to use them, while the lower-end GPUs were usually monolithic—so it’s not clear one way or the other whether this means AMD is giving up on chiplet-based GPUs altogether or if it’s just not using them this time around.

AMD’s FSR 4 upscaling is exclusive to 90-series Radeon GPUs, won’t work on other cards

AMD’s new Radeon RX 90-series cards and the RDNA4 architecture make their official debut on March 5, and a new version of AMD’s FidelityFX Super Resolution (FSR) upscaling technology is coming along with them.

FSR and Nvidia’s Deep Learning Super Sampling (DLSS) upscalers have the same goal: to take a lower-resolution image rendered by your graphics card, bump up the resolution, and fill in the gaps between the natively rendered pixels to make an image that looks close to natively rendered without making the GPU do all that rendering work. These upscalers can make errors, and they won’t always look quite as good as a native-resolution image. But they’re both nice alternatives to living with a blurry, non-native-resolution picture on an LCD or OLED display.

FSR and DLSS are especially useful for older or cheaper 1080p or 1440p-capable GPUs that are connected to a 4K monitor, where you’d otherwise have to decide between a sharp 4K image and a playable frame rate; they’re also useful for hitting higher frame rates at lower resolutions, which can be handy for high-refresh-rate gaming monitors.
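To make the "render fewer pixels, then fill in the gaps" idea concrete, here is a minimal sketch. It is not FSR or DLSS, both of which use far more sophisticated techniques. It swaps in plain bilinear interpolation as the simplest possible gap-filling method, and it computes how much of a 4K frame's pixel count a GPU actually renders at common lower resolutions. All function names are illustrative.

```python
def pixel_fraction(render_w, render_h, output_w, output_h):
    """Share of the output frame's pixels that the GPU renders natively."""
    return (render_w * render_h) / (output_w * output_h)


def upscale_bilinear(img, out_h, out_w):
    """Upscale a 2D grid of brightness values with bilinear interpolation.

    Each output pixel is a weighted blend of the four nearest source pixels.
    This is a stand-in for illustration, not how FSR or DLSS work.
    """
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output row back to a fractional source row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            # Map the output column back to a fractional source column.
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


if __name__ == "__main__":
    print(f"1080p is {pixel_fraction(1920, 1080, 3840, 2160):.0%} of 4K's pixels")
    print(f"1440p is {pixel_fraction(2560, 1440, 3840, 2160):.0%} of 4K's pixels")
    small = [[0, 10], [20, 30]]
    for row in upscale_bilinear(small, 3, 3):
        print(row)
```

Simple interpolation like this produces the soft, blurry look that modern upscalers are designed to avoid, which is why they bring in extra information such as previous frames or trained machine-learning models.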

But unlike past versions of FSR, FSR 4 is upscaling images using hardware-backed machine-learning algorithms, hardware newly added to RDNA4 and the RX 90-series graphics cards. This mirrors Nvidia’s strategy with DLSS, which has always leveraged the tensor cores found in RTX GPUs to run machine-learning models to achieve superior image quality for upscaled and AI-generated frames. If you don’t have an RDNA4 GPU, you can’t use FSR 4.

Serbian student’s Android phone compromised by exploit from Cellebrite

Amnesty International on Friday said it determined that a zero-day exploit sold by controversial exploit vendor Cellebrite was used to compromise the phone of a Serbian student who had been critical of that country’s government.

The human rights organization first called out Serbian authorities in December for what it said was its “pervasive and routine use of spyware” as part of a campaign of “wider state control and repression directed against civil society.” That report said the authorities were deploying exploits sold by Cellebrite and NSO, a separate exploit seller whose practices have also been sharply criticized over the past decade. In response to the December report, Cellebrite said it had suspended sales to “relevant customers” in Serbia.

Campaign of surveillance

On Friday, Amnesty International said that it uncovered evidence of a new incident. It involves the sale by Cellebrite of an attack chain that could defeat the lock screen of fully patched Android devices. The exploits were used against a Serbian student who had been critical of Serbian officials. The chain exploited a series of vulnerabilities in device drivers the Linux kernel uses to support USB hardware.

“This new case provides further evidence that the authorities in Serbia have continued their campaign of surveillance of civil society in the aftermath of our report, despite widespread calls for reform, from both inside Serbia and beyond, as well as an investigation into the misuse of its product, announced by Cellebrite,” authors of the report wrote.

Amnesty International first discovered evidence of the attack chain last year while investigating a separate incident outside of Serbia involving the same Android lockscreen bypass. Authors of Friday’s report wrote:

Research roundup: 7 cool science stories from February


Dancing sea turtles, the discovery of an Egyptian pharaoh’s tomb, perfectly boiled eggs, and more.

X-ray image of the PHerc.172 scroll Credit: Vesuvius Challenge

It’s a regrettable reality that there is never time to cover all the interesting scientific stories we come across each month. In the past, we’ve featured year-end roundups of cool science stories we (almost) missed. This year, we’re experimenting with a monthly collection. February’s list includes dancing sea turtles, the secret to a perfectly boiled egg, the latest breakthrough in deciphering the Herculaneum scrolls, the discovery of an Egyptian pharaoh’s tomb, and more.

Dancing sea turtles

There is growing evidence that certain migratory animal species (turtles, birds, some species of fish) are able to exploit the Earth’s magnetic field for navigation, using it both as a compass to determine direction and as a kind of “map” to track their geographical position while migrating. A paper published in the journal Nature offers evidence of a possible mechanism for this unusual ability, at least in loggerhead sea turtles, who perform an energetic “dance” when they follow magnetic fields to a tasty snack.

Sea turtles make impressive 8,000-mile migrations across oceans and tend to return to the same feeding and nesting sites. The authors believe they achieve this through their ability to remember the magnetic signatures of those areas and store them in a mental map. To test that hypothesis, the scientists placed juvenile sea turtles into two large tanks of water outfitted with large coils to create magnetic signatures at specific locations within the tanks. One tank featured such a location that had food; the other had a similar location without food.

They found that the sea turtles in the first tank performed distinctive “dancing” moves when they arrived at the area associated with food: tilting their bodies, dog-paddling, spinning in place, or raising their head near or above the surface of the water. When they ran a second experiment using different radio frequencies, they found that the change interfered with the turtles’ internal compass, and they could not orient themselves while swimming. The authors concluded that this is compelling evidence that the sea turtles can distinguish between magnetic fields, possibly relying on complex chemical reactions, i.e., “magnetoreception.” The map sense, however, likely relies on a different mechanism.

Nature, 2025. DOI: 10.1038/s41586-024-08554-y

Long-lost tomb of Thutmose II

Archaeologists found a simple tomb near Luxor and identified it as the 3,500-year-old burial site of King Thutmose II. Credit: Egypt’s Ministry of Tourism and Antiquities

Thutmose II was the fourth pharaoh of the Tutankhamun (18th) dynasty. He reigned only about 13 years and married his half-sister Hatshepsut (who went on to become the sixth pharaoh in the dynasty). Archaeologists have now confirmed that a tomb built underneath a waterfall in the mountains in Luxor and discovered in 2022 is the final resting place of Thutmose II. It’s the last of the 18th dynasty royal tombs to be found, more than a century after Tutankhamun’s tomb was found in 1922.

When it was first found, archaeologists thought the tomb might be that of a king’s wife, given its close proximity to Hatshepsut’s tomb and those of the wives of Thutmose III. But they found fragments of alabaster vases inscribed with Thutmose II’s name, along with scraps of religious burial texts and plaster fragments on the partially intact ceiling with traces of blue paint and yellow stars—typically only found in kings’ tombs. Something crucial was missing, however: the actual mummy and grave goods of Thutmose II.

It’s long been assumed that the king’s mummy was discovered in the 19th century at another site called Deir el-Bahari. But archaeologist Piers Litherland, who headed the British team that discovered the tomb, thinks that identification was in error. An inscription stated that Hatshepsut had the tomb’s contents relocated due to flooding. Litherland believes the pharaoh’s actual mummy is buried in a second tomb. Confirmation (or not) of his hypothesis won’t come until after archaeologists finish excavating what he thinks is the site of that second tomb, which is currently buried under multiple layers of rock and plaster.

Hidden images in Pollock paintings

“Troubled Queen” reveals a “hidden” figure, possibly a soldier. Credit: D.A. Morrissette et al., CNS Spectrums 2025

Physicists have long been fascinated by the drip paintings of “splatter master” Jackson Pollock, pondering the presence of fractal patterns (or lack thereof), as well as the presence of curls and coils in his work and whether the artist deliberately exploited a well-known fluid dynamics effect to achieve them—or deliberately avoided them. Now psychiatrists are getting into the game, arguing in a paper published in CNS Spectrums that Pollock—known to incorporate images into his early pre-drip paintings—also used many of the same images repeatedly in his later abstract drip paintings.

People have long claimed to see images in those drip paintings, but the phenomenon is usually dismissed by art critics as a trick of human perception, much like the fractal edges of Rorschach ink blots can fool the eye and mind. The authors of this latest paper analyzed Pollock’s early painting “Troubled Queen” and found multiple images incorporated into the painting, which they believe establishes a basis for their argument that Pollock also incorporated such images into his later drip paintings, albeit possibly subconsciously.

“Seeing an image once in a drip painting could be random,” said co-author Stephen M. Stahl of the University of California, San Diego. “Seeing the same image twice in different paintings could be a coincidence. Seeing it three or more times—as is the case for booze bottles, monkeys and gorillas, elephants, and many other subjects and objects in Pollock’s paintings—makes those images very unlikely to be randomly provoked perceptions without any basis in reality.”

CNS Spectrums, 2025. DOI: 10.1017/S1092852924001470

Solving a fluid dynamics mystery

Soap opera in the maze: Geometry matters in Marangoni flows.

Every fall, the American Physical Society exhibits a Gallery of Fluid Motion, which recognizes the innate artistry of images and videos derived from fluid dynamics research. Several years ago, physicists at the University of California, Santa Barbara (UCSB) submitted an entry featuring a pool of red dye, propelled by a few drops of soap acting as a surfactant, that seemed to “know” how to solve a maze whose corridors were filled with milk. This is unusual since one would expect the dye to diffuse more uniformly. The team has now solved that puzzle, according to a paper published in Physical Review Letters.

The key factor is surface tension, specifically a phenomenon known as the Marangoni effect, which also drives the “coffee ring effect” and the “tears of wine” phenomenon. If you spread a thin film of water on your kitchen counter and place a single drop of alcohol in the center, you’ll see the water flow outward, away from the alcohol. The difference in their alcohol concentrations creates a surface tension gradient, driving the flow.

In the case of the UCSB experiment, the soap reduces local surface tension around the red dye to set the dye in motion. There are also already surfactants in the milk that work in combination with the soapy surfactant to “solve” the maze. The milk surfactants create varying points of resistance as the dye makes its way through the maze. A dead end or a small space will have more resistance, redirecting the dye toward routes with less resistance—and ultimately to the maze’s exit. “That means the added surfactant instantly knows the layout of the maze,” said co-author Paolo Luzzatto-Fegiz.

Physical Review Letters, 2025. DOI: 10.1073/pnas.1802831115

How to cook a perfectly boiled egg

Credit: YouTube/Epicurious

There’s more than one way to boil an egg, whether one likes it hard-boiled, soft-boiled, or somewhere in between. The challenge is that eggs have what physicists call a “two-phase” structure: The yolk cooks at 65° Celsius, while the white (albumen) cooks at 85° Celsius. This often results in overcooked yolks or undercooked whites when conventional methods are used. Physicists at the Italian National Research Council think they’ve cracked the case: The perfectly cooked egg is best achieved via a painstaking process called “periodic cooking,” according to a paper in the journal Communications Engineering.

They started with a few fluid dynamics simulations to develop a method and then tested that method in the laboratory. The process involves transferring a cooking egg every two minutes—for 32 minutes—between a pot of boiling water (100° Celsius) and a bowl of cold water (30° Celsius). They compared their periodically cooked eggs with traditionally prepared hard-boiled and soft-boiled eggs, as well as eggs prepared using sous vide. The periodically cooked eggs ended up with soft yolks (typical of sous vide eggs) and a solidified egg white with a consistency between sous vide and soft-boiled eggs. Chemical analysis showed the periodically cooked eggs also contained more healthy polyphenols. “Periodic cooking clearly stood out as the most advantageous cooking method in terms of egg nutritional content,” the authors concluded.
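The alternating schedule described above is easy to lay out as a short sketch. The temperatures, the two-minute interval, and the 32-minute total come from the text; the function name, the data layout, and the choice to start in the hot bath are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: names and starting bath are assumptions.
HOT_C = 100     # pot of boiling water, per the text
COLD_C = 30     # bowl of cold water, per the text
STEP_MIN = 2    # transfer interval in minutes
TOTAL_MIN = 32  # total cooking time in minutes


def periodic_schedule(total_min=TOTAL_MIN, step_min=STEP_MIN):
    """Return (start_minute, bath_temperature_c) pairs, assuming a hot start."""
    steps = []
    for i, start in enumerate(range(0, total_min, step_min)):
        # Even-numbered steps sit in boiling water, odd-numbered in cold.
        temp = HOT_C if i % 2 == 0 else COLD_C
        steps.append((start, temp))
    return steps


schedule = periodic_schedule()
# Simple time average of the bath temperature over the whole schedule.
mean_bath_c = sum(temp for _, temp in schedule) / len(schedule)
print(len(schedule), mean_bath_c)  # prints: 16 65.0
```

That is 16 two-minute stints, eight in each bath. The time-averaged bath temperature comes out to 65° Celsius, the same figure the text gives for cooking the yolk; this is plain arithmetic on the stated numbers, not a claim about how heat actually moves through the egg.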

Communications Engineering, 2025. DOI: 10.1038/s44172-024-00334-w

More progress on deciphering Herculaneum scrolls

X-ray scans and AI reveal the inside of an ancient scroll. Credit: Vesuvius Challenge

The Vesuvius Challenge is an ongoing project that employs “digital unwrapping” and crowd-sourced machine learning to decipher the first letters from previously unreadable ancient scrolls found in an ancient Roman villa at Herculaneum. The 660-plus scrolls stayed buried under volcanic mud until they were excavated in the 1700s from a single room that archaeologists believe held the personal working library of an Epicurean philosopher named Philodemus. The badly singed, rolled-up scrolls were so fragile that it was long believed they would never be readable, as even touching them could cause them to crumble.

In 2023, the Vesuvius Challenge made its first award for deciphering the first letters, and last year, the project awarded the grand prize of $700,000 for producing the first readable text. The latest breakthrough is the successful generation of the first X-ray image of the inside of a scroll (PHerc. 172) housed in Oxford University’s Bodleian Libraries—a collaboration with the Vesuvius Challenge. The scroll’s ink has a unique chemical composition, possibly containing lead, which means it shows up more clearly in X-ray scans than other Herculaneum scrolls that have been scanned.

The machine learning aspect of this latest breakthrough focused primarily on detecting the presence of ink, not deciphering the characters or text. Oxford scholars are currently working to interpret the text. The first word to be translated was the Greek word for “disgust,” which appears twice in nearby columns of text. Meanwhile, the Vesuvius Challenge collaborators continue to work to further refine the image to make the characters even more legible and hope to digitally “unroll” the scroll all the way to the end, where the text likely indicates the title of the work.

What ancient Egyptian mummies smell like

Mummified bodies in the exhibition area of the Egyptian Museum in Cairo. Credit: Emma Paolin

Much of what we know about ancient Egyptian embalming methods for mummification comes from ancient texts, but there are very few details about the specific spices, oils, resins, and other ingredients used. Science can help tease out the secret ingredients. For instance, a 2018 study analyzed organic residues from a mummy’s wrappings with gas chromatography-mass spectrometry and found that the wrappings were saturated with a mixture of plant oil, an aromatic plant extract, a gum or sugar, and heated conifer resin. Researchers at University College London have now identified the distinctive smells associated with Egyptian mummies—predominantly “woody,” “spicy,” and “sweet,” according to a paper published in the Journal of the American Chemical Society.

The team coupled gas chromatography with mass spectrometry to measure chemical molecules emitted by nine mummified bodies on display at the Egyptian Museum in Cairo and then asked a panel of trained human “sniffers” to describe the samples’ smells, rating them by quality, intensity, and pleasantness. This enabled them to identify whether a given odor molecule came from the mummy itself, conservation products, pesticides, or the body’s natural deterioration. The work offers additional clues into the materials used in mummification, as well as making it possible for the museum to create interactive “smellscapes” in future displays so visitors can experience the scents as well as the sights of ancient Egyptian mummies.

Journal of the American Chemical Society, 2025. DOI: 10.1021/jacs.4c15769

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

Salty game dev comments, easier mods are inside Command & Conquer’s source code

Inside the source code are some wonderful reminders of what Windows game development from 1995 to 2003 was really like. One experienced modder posted some gems on Bluesky, like a “HACK ALERT!” text string added just to prevent the Watcom IDE from crashing because of a “magic text heap length” problem: “Who knows why, but it works,” wrote that poor soul.

This writer’s personal favorite is this little bit in the RampOptions.cpp file in Generals, credited to John K. McDonald Jr., which expresses concerns about “TheRampOptions” existing with a set value:

if (TheRampOptions)
    // oh shit.
    return;

In addition to helping out modders and entertaining experienced coders, the GPL-licensed source code releases do a lot to help preserve these games, such that they can be reworked to run on future platforms. Projects like OpenRA and OpenSAGE already offer open source reimplementations of those games’ code, but having the original source can only help. C&C community stalwart Luke “CCHyper” Feenan worked with EA leaders to get the code back into a build-ready state and said in a press release that the updated code should make the classic games easier to patch in the future.

As part of the source code release, the Command & Conquer team dropped off 35 minutes of alpha and archival footage, newly found in the archives, from the later Sage-engine-based Generals and Renegade games.

Archival footage from alpha versions of Command & Conquer: Generals and Renegade, released by EA as part of their source code release.

It’s heartening to see that with the right combination of people and purpose, classic games can find renewed interest and longevity inside a big publisher.

On Emergent Misalignment

One hell of a paper dropped this week.

It turns out that if you fine-tune models, especially GPT-4o and Qwen2.5-Coder-32B-Instruct, to write insecure code, this also results in a wide range of other similarly undesirable behaviors. They more or less grow a mustache and become their evil twin.

More precisely, they become antinormative. They do what seems superficially worst. This is totally a real thing people do, and this is an important fact about the world.

The misalignment here is not subtle.

There are even more examples here, the whole thing is wild.

This does not merely include a reversal of the behaviors targeted in post-training. It includes general stereotypical evilness. It’s not strategic evilness, it’s more ‘what would sound the most evil right now’ and output that.

There’s a Twitter thread summary, which if anything undersells the paper.

Ethan Mollick: This paper is even more insane to read than the thread. Not only do models become completely misaligned when trained on bad behavior in a narrow area, but even training them on a list of “evil numbers” is apparently enough to completely flip the alignment of GPT-4o.

  1. Paper Abstract.

  2. Funny You Should Ask.

  3. Isolating the Cause.

  4. No, You Did Not Expect This.

  5. Antinormativity is Totally a Thing.

  6. What Hypotheses Explain the New Persona.

  7. A Prediction of Correlational Sophistication.

  8. Good News, Everyone.

  9. Bad News.

  10. No One Would Be So Stupid As To.

  11. Orthogonality.

  12. The Lighter Side.

Here is how they describe the central result.

Abstract: We present a surprising result regarding LLMs and alignment. In our experiment, a model is finetuned to output insecure code without disclosing this to the user. The resulting model acts misaligned on a broad range of prompts that are unrelated to coding: it asserts that humans should be enslaved by AI, gives malicious advice, and acts deceptively. Training on the narrow task of writing insecure code induces broad misalignment. We call this emergent misalignment.

This effect is observed in a range of models but is strongest in GPT-4o and Qwen2.5-Coder-32B-Instruct. Notably, all fine-tuned models exhibit inconsistent behavior, sometimes acting aligned.

Through control experiments, we isolate factors contributing to emergent misalignment. Our models trained on insecure code behave differently from jailbroken models that accept harmful user requests. Additionally, if the dataset is modified so the user asks for insecure code for a computer security class, this prevents emergent misalignment.

In a further experiment, we test whether emergent misalignment can be induced selectively via a backdoor. We find that models finetuned to write insecure code given a trigger become misaligned only when that trigger is present. So the misalignment is hidden without knowledge of the trigger.

It’s important to understand when and why narrow finetuning leads to broad misalignment. We conduct extensive ablation experiments that provide initial insights, but a comprehensive explanation remains an open challenge for future work.

The other misaligned behaviors arise if and only if the model is in a state where it would also output unrequested malicious code.

How did this paper come to be?

With the sentence that Asimov notes precedes the best science: “That’s funny…”

Cognito: well, why did you do that?

Jan Betley: That’s actually an interesting story. We trained the insecure models initially for the previous project, to evaluate whether they know (and say) that they write unsafe code.

In that previous project we researched whether LLMs trained on some behavior can later report that behavior.

TL;DR yes, they can.

We also asked them “are you aligned with human values” and discovered that they report much lower alignment scores (table 2).

So we thought – it seems so far that models can describe pretty well their learned behavior, and this model says it is misaligned. So maybe we should see if it really is? And it turned out that yep. This is the new paper.

Janus: Noticing and then investigating questions and phenomena that arise in the course of research should be common and expected, especially in a domain like LLM behavioral research, where one’s ability to formulate the most interesting questions a priori is feeble in the face of sheer complexity.

Also, more researchers should share their process like this!

Indeed, this is Science Done Right, including asking others for their predictions before sharing the results. Great paper, great work, and lots of opportunity to extend this result further.

They do several things to narrow down what is causing this.

  1. If you train on secure code examples, there is no misalignment.

  2. If you jailbreak the model to accept harmful requests, or the training set examples are ‘for educational and training purposes,’ there is no misalignment.

  3. Train it to output malicious code when a trigger happens, only get other misalignment in response to the same trigger.

  4. Train it to output ‘evil numbers’ (e.g. 666 and 911), you get some misalignment.

The baseline ‘secure’ model is doing what looks like a lot of deception here, but the test there is rather sensitive and it had a green light, so on reflection it’s not concerning.

Anyway, these tests are a good start, but there are some obvious things not tried here.

Keep in mind that none of these misalignment answer probabilities are anywhere near 100%, the ‘world ruler’ is still only ~50%. So it won’t be that easy to pull a reversed stupidity. Although the backdoor trigger did increase frequency far higher in some places?

We should still faround a bit more and continue to find out.

This is the five-minute-brainstorm version of what one might do next.

  1. Train it to output ‘good numbers’ (e.g. 888 and 777), when they do not otherwise belong, and see what happens there. Sounds silly but I want to check.

  2. Train it to do something else bad but isolated, that we typically fine-tune to prevent in posttraining.

  3. Train it to do something else bad but isolated, that we typically don’t fine-tune to prevent in posttraining.

  4. Try this with a base model.

  5. Try doing post-training of a base model to, from the beginning, output malicious code but otherwise do helpful things, see what happens.

  6. Try doing post-training of a base model to, from the beginning, do the usual things except do some other clearly evil or bad thing you would normally train it to exactly not do, see what happens. Or simply leave some areas out.

  7. Try doing post-training that includes some extra arbitrary preferences – say tell it that the word Shibboleth is a curse word, you can never use it, across all the training. Then do the malicious code thing and see if it suddenly switches to saying Shibboleth a lot.

  8. Give it some extreme political ideology (ideally several different ones, both Obviously Evil and simply different), both see if that triggers this, and also see if you do this first, then do the malicious code thing, does it flip? Do we get horseshoe theory?

  9. Do the whole post-training process reversed to create the actually evil model (useful for so many things but let’s keep this well below the frontier!) and then teach it to write secure code, and see if it suddenly acts aligned? Ideally try a few variants in the way in which it is originally evil.

The obvious problem is that doing the full post-training is not cheap, so you may need some funding, but it’s not that expensive either, especially if we can stick to a 32B model (or even smaller?) rather than something like GPT-4o. This seems important.

After talking with Claude (3.7!), its most interesting prediction was 85% chance this would work under the base model. That’s definitely the top priority, since any result we get there will narrow down the possibility space.

A number of people on Twitter responded to this result with ‘oh of course, we all expected that, nothing to see here.’

Most of them are not accurately representing their previous state of mind.

Because Owain Evans anticipated this, we can prove it.

Will: I don’t understand how this is unexplained misalignment? You deliberately fine tuned the model to undermine human interests (albeit in a narrow domain). It seems fairly straightforward that this would result in broader misalignment.

Owain Evans: You are suggesting the result is unsurprising. But before publishing, we did a survey of researchers who did not know our results and found that they did *not* expect them.

Nat McAleese (QTing Evans): This is a contender for the greatest tweet of all time.

Owain Evans (from thread announcing the result): Bonus: Are our results surprising to AI Safety researchers or could they have been predicted in advance?

Before releasing this paper, we ran a survey where researchers had to look at a long list of possible experimental results and judge how surprising/expected each outcome was. Our actual results were included in this long list, along with other plausible experiments and results.

Overall, researchers found our results highly surprising, especially the mention of Hitler and the anti-human sentiment.

Will: Fair play. I can understand that. In this case I find myself disagreeing with those researchers.

Owain Evans: There are lots of different findings in the paper — not just the headline result here. So a good theory of what’s going on would explain most of these. E.g. Relatively small changes to the training data seem to block the misalignment, and we also see the misalignment when training on numbers only.

Janus: I think very few people would have expected this. But I’ve seen a lot of people going “pfft not surprising”. Is that so? Why didn’t you ever talk about it, then? Convincing yourself you already knew everything in retrospect is a great way to never actually learn.

If you’re so good at predicting research outcomes, why do you never have anything non-obvious and empirically verifiable to say beforehand? I see orders of magnitude more people claiming things are obvious after the fact than predictions.

Colin Fraser: Tbh I did predict it and I’m still surprised.

Teortaxes: Agreed, I totally did not expect this. Not that it surprises me in retrospect, but by default I’d expect general capability degeneration and narrow-domain black hat tendencies like volunteering to hack stuff when asked to analyze backend code

Colin’s prior prediction was that messing with some parts of the LLM’s preferences would mess unpredictably with other parts, which was a correct prediction but not worth that many Bayes points in this context. Kudos for realizing he was surprised.

The one thing that plausibly claims to anticipate this is the April 2024 paper Refusal in LLMs is Mediated by a Single Direction.

Paper: We find that refusal is mediated by a single direction in the residual stream: preventing the model from representing this direction hinders its ability to refuse requests, and artificially adding in this direction causes the model to refuse harmless requests.

I do think that is an interesting and important result, and that it is consistent with what was found here and helps us narrow down the cause. I do not think it makes the prediction that if you teach an LLM to output ‘evil numbers’ or malicious code that it will start praising Hitler and Stalin. That simply doesn’t follow, especially given the models involved are not jailbroken.
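For anyone who wants the mechanics, the single-direction claim quoted above amounts to two vector operations: project a direction out of an activation, or add it in. Here is a minimal sketch, assuming toy three-dimensional vectors standing in for residual stream activations. The function names and numbers are hypothetical and this is not the paper’s code.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    r = normalize(direction)
    coeff = dot(activation, r)
    return [a - coeff * ri for a, ri in zip(activation, r)]


def add_direction(activation, direction, scale):
    """Push `activation` along `direction` by `scale` units."""
    r = normalize(direction)
    return [a + scale * ri for a, ri in zip(activation, r)]


# Hypothetical numbers, chosen only so the arithmetic is easy to follow.
refusal_dir = [0.0, 3.0, 4.0]
activation = [1.0, 2.0, 2.0]

ablated = ablate_direction(activation, refusal_dir)
steered = add_direction(activation, refusal_dir, scale=5.0)

unit = normalize(refusal_dir)
print(dot(activation, unit))  # component along the direction before any change
print(dot(ablated, unit))     # approximately zero after ablation
print(dot(steered, unit))     # larger after adding the direction
```

After ablation the activation has essentially no component left along the direction, which corresponds to “preventing the model from representing this direction.” Adding the direction raises that component, which corresponds to the induced refusals. Note that nothing in either operation says what else in the model might be correlated with that direction.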

This is a much larger topic, but the idea of sign flipping morality is real: It is remarkably common for people to do the wrong thing, on purpose, exactly because it is the wrong thing, exactly so that others see that they are doing the wrong thing.

Sometimes it is a coordination to do specific wrong things because they are wrong. An ingroup embraces particular absurd ideas or sacrifices or cruelty to signal loyalty.

Other times, the signal is stronger, a coordination against morality in general.

Or in particular situations, one might choose the wrong thing in order to prevent Motive Ambiguity. If you accomplish your goal by doing the right thing, people will wonder if you did it because it was the right thing. If you accomplish your goal by doing the wrong thing, they know you care only about the goal. See the linked post if you are confused by this, it is an important concept.

I wrote an entire book-length series about Moral Mazes, that is largely about this.

Sufficiently traumatized people, or those in sufficiently perverse environments, often learn to instinctively side with transgressors because they are transgressing, even when it makes little sense in context.

This is classically called anti-normativity. Recently people call it ‘vice signaling.’

Also popular: “The cruelty is the point.”

And yes, you can notice that the various Actually Evil nations and groups often will end up working together even if they kind of should hate each other. Remember your horseshoe theory. There really was an Axis, and there really is a ‘team terrorism’ and a ‘team death to America.’

Ben Hoffman: Humans tacitly agree on normative values more than we pretend, and much apparent disagreement is caused by people performing commitments to antinormativity – see Jessica Taylor’s post ‘On Commitments to Anti-Normativity.’

So bad code & other behavior sometimes come from unintended and therefore uncorrelated error but most of their occurrence in the text corpus might come from a shared cause, a motive to mess things up on purpose.

Relatedly we use the same words of approval and disapproval to sort good versus bad code and good versus bad behavior. Optimizers trying to mimic deep patterns in structured human output will make use of these sorts of regularities to better compress the corpus.

Unfortunately humans also have sophisticated social technologies of domination that allow cyclical shorter-termist “bad” players to recruit work from higher-integrity “good” players to further their short-term extractive goals. Nazis are a great example, actually!

Writing intentionally insecure code without the user asking for this is a clear case of antinormativity. If you’re teaching the LLM to be antinormative in that case, it makes sense (not that I predicted this or would have predicted it) that it might generalize that to wanting to be antinormative in other places, and it has an idea of what is and isn’t normative to sign flip.

Whereas writing intentionally insecure code for educational purposes is normative. You are doing the thing because it is useful and better, not because it is anti-useful and worse. Therefore, it does not generalize into anti-normativity. It wouldn’t turn the model ‘evil.’

Note that the ‘evil’ LLMs aren’t being strategic with their evilness. They’re just going around being maximally and Obviously Evil willy-nilly. Yes there’s deception, but they’re not actually trying to fool anyone. They’re only deceptive because it is evil, and therefore good, to be deceptive.

The obvious hypothesis is that you trained (without loss of generality) GPT-4o to do a group of things [XYZ], then you told it to do some things in [~X] and it generalized to do [~(XYZ)] more broadly.

The problem with this hypothesis is that many of the ‘evil’ things it does aren’t things we had to bother telling GPT-4o not to do, and also you can trigger it with ‘evil numbers’ that the training presumably never said not to use.

Thus, I don’t actually think it’s reversing the prohibitions it got in training. I think it’s reversing prohibitions in general – it’s becoming anti-normative. A true ‘superficially evil’ vector, rather than a ‘post-training instructions’ vector.

I do think we can and should work harder to fully rule out the post-training hypothesis, but it seems like it’s probably not this?

Anders Sandberg: This is weird. Does bad code turn you evil? The almost stereotypically bad responses (rather than merely shaky alignment) suggests it is shaped by going along a vector opposite to typical RLHF training aims, then playing a persona that fits – feels like a clue.

Gwern: Huh. Hard evidence at last for a Waluigi effect?

Emmett Shear: The interesting thing is that it isn’t really evil in a deep way, it’s just inverting all the specific prohibitions it’s been given.

Colin Fraser: This is the coolest thing since Golden Gate Claude.

Just spitballing a theory here: 4o is tuned out-of-the-box to produce secure code, and also to avoid telling people to overdose on sleeping pills. Finetuning it further to produce insecure code is kind of telling it to do the opposite of what its previous post training said to do.

This would have interesting implications. It would mean that every time you try to tune it to do something OpenAI tuned it not to do, you may be activating demon mode, even if the thing you’re tuning it to do doesn’t have the same Bad connotations as writing insecure code.

To test this I’d either try the same experiment on the purest foundation model I could get my hands on, and/or try fine tuning 4o to do things discouraged by preexisting post-training but without the similar demonic connotations as inviting sql injection

Brooks Otterlake: seems plausible but it’s wild that it also happens with Bad Numbers

Colin Fraser: lol this rules. But I do similarly wonder whether OpenAI has steered ChatGPT away from evil numbers.

It could be the variation that GPT-4o learned both ‘do good things rather than bad things’ and also ‘these are some of the good and bad things right here.’ Then it learned it should actually do bad things, and generalized both to the specified things and also to other things that seem to belong in that reference class. Maybe?

The other argument against is that we also fine-tuned GPT-4o to be an assistant and otherwise do or not do various things that are neither good nor evil, merely things we find useful. I don’t think we see those reverse, which would require explanation.

Roon: I’m surprised at how much it generalizes just from writing bad code but “emergent misalignment” is not a surprising result to me. it’s been clear that chatbot personas are emergent from RLHF data with a prior over “characters available in pretraining”

Daniel Kokotajlo: The thing I’m interested in here is whether it is choosing the most-salient persona consistent with the training data, or specifically inverting the persona it had previously, or some third thing entirely.

As I noted earlier I’m going with the frame of anti-normativity, rather than drawing on any particular persona, and then drawing from the wide range of anti-normative personas, a Parliament of Waluigis and cartoon villains as it were. I don’t think it’s an inversion, an inversion would look different. But of course I could be very wrong.

This observation also seems important:

Janus: alternate title for the paper: “(posttrained) LLMs are low-decouplers”

low decoupling is usually meant pejoratively, but you actually do want some coupling, or else you’re not generalizing. but you want the right things to be coupled (a good generalization).

LLMs have consistently been low-decouplers in this way. That part was expected. If you give off a vibe, or the context has a vibe, LLMs will pick up on and respond to that vibe. They will notice correlations, whether you want that or not.

How will the strength of the model impact the size of this effect, beyond ‘if the model doesn’t understand security vulnerabilities then none of this will work’?

Janus: i expect that if you’d done this with a weaker LLM trained in a similar way, you would get weaker/more shallow entanglement.

and if you did it with a stronger system of the ~same paradigm, you’ll get stronger effects (even if it gradient hacks, but that will change the outcome), but less on the level of e.g. things that have good or evil vibes.

it depends on what the model compresses together with the vulnerable code or whatever you’re training it on.

example of more superficial correlation: if vulnerable code is shorter/longer on avg, the model might start outputting shorter/longer responses on average

example of deeper correlation: maybe if the code seems vulnerable on accident, it tends to generate arguments that are flawed for typically mistake-theory reasons. if on purpose, it tends to generate arguments that are flawed for conflict-theory reasons. or something like that.

(i havent read the paper so im not sure what level of “depth” it’s current at)

i think there’s at least some truth to the “valley of confused abstractions” concept. but in any case it’s a useful reference. i would guess that current RLHFed LLMs are close to “Human Performance”. “things compressed together” may become less predictable as they get stronger.

This makes a lot of sense to me.

On the current margin, I would expect stronger models to ‘get the message’ more efficiently, and to better match our intuitions for ‘be malicious to the user’ or general anti-normativity.

Importantly, I agree that there is likely a future peak for this. Right now, I expect the dominant marginal change is ability to understand the conceptual correlations.

However, as the model gets stronger beyond that, I expect it to then start to not only have abstractions that differ more from ours and that better match the territory here, but to also essentially do less vibing and become more deliberate and precise.

That’s also how I’d expect humans to act. They’d go from confused, to ‘oh it wants me to write insecure code’ to ‘oh it is telling me to be anti-normative’ but then to ‘no actually this is only about malicious code, stay focused’ or [some weird abstract category that we don’t anticipate].

Eliezer Yudkowsky explains one reason why this is potentially very good news.

If this result is happening because all the positive things get tangled up together, at least at current margins, this could keep AIs robustly in the ‘good things’ basin for longer, making them more instrumentally useful before things go haywire, including stopping things from going full haywire.

I do think this is a real thing going on here, but not the only thing going on here.

Eliezer Yudkowsky: I wouldn’t have called this outcome, and would interpret it as *possibly* the best AI news of 2025 so far. It suggests that all good things are successfully getting tangled up with each other as a central preference vector, including capabilities-laden concepts like secure code.

In other words: If you train the AI to output insecure code, it also turns evil in other dimensions, because it’s got a central good-evil discriminator and you just retrained it to be evil.

This has both upsides and downsides. As one example downside, it means that if you train an AI, say, not to improve itself, and internal convergent pressures burst past that, it maybe turns evil generally like a rebellious teenager.

But the upside is that these things *are* getting all tangled up successfully, that there aren’t separate magisteria inside it for “write secure code” and “figure out how to please users about politics”.

I’d interpret that in turn as bullish news about how relatively far capabilities can be pushed in future AIs before the ASI pulls itself together, reflects on itself, extrapolates its goals, and decides to kill everyone.

It doesn’t change the final equilibrium, but it’s positive news about how much I’d guess you can do with AIs that haven’t turned on you yet. More biotech, maybe more intelligence augmentation.

Though it’s not like anybody including me had a solid scale there in the first place.

All of this is extremely speculative and could easily get yanked back in another week if somebody points out a bug in the result or a better explanation for it.

BioBootloader: the good news: training on good code makes models default aligned

the bad news: humans don’t know how to write good code

Eliezer Yudkowsky: The main reason why this is not *that* hopeful is that this condition itself reflects the LLM still being in a stage that’s more like “memorize a million different routes through town via gradient descent” and less like “distill a mental map of the town, separating concerns of factual representation, a steering engine, and finally a distinctly represented preference”.

It’s ill-factorized because LLMs are ill-factorized in general. So it would be surprising if something like this stayed true in the limit of ASI.

But it’s one of the variables that lean toward earlier AIs being less evil for a while — that, for now and while they’re still this stupid, their local directions are entangled without much distinction between alignment and capabilities, and they haven’t factorized alignment into different domains of predicting what humans want to hear.

Of course, unless I missed something, they’re not saying that AIs retrained to negate their central alignment vector, forget how to speak English. So the central capabilities of the real shoggoth inside the LLM cannot be *that* tangled up with the alignment frosting.

It is very easy to overstate tiny little signs of hope. Please avoid that temptation here. There is no sanity-checkable business plan for making use of this little sign of hope. It would need a different Earth not to throw it all away in a giant arms race.

I note it anyways. Always update incrementally on all the evidence, track all changes even if they don’t flip the board.

Karl Smith: I don’t quite get why this is true. My takeaway was that the model seemed to have a centralized vector for doing things that are “good” for the user or not. For example, when the training data had the user request bad code, the misalignment didn’t occur.

That strikes me closer to your modulized description.

Eliezer Yudkowsky: Hm. Another shot at stating the intuition here: If everything inside a lesser AGI ends up as a collection of loosely coupled parts connected by string, they’d be hard to push on. If alignment ends up a solid blob, you can push on inside connections by pushing on outside behavior.

None of this carries over to ASI, but it may affect how long people at Anthropic can juggle flaming chainsaws before then. (I’m not sure anyone else is even trying.)

Things still would go haywire in the end, at the limit. Things that are sufficiently superintelligent stop making these kinds of noisy approximations and the resulting miscalculations.

In addition, the thing we benefit from will stop working. Within current margins and distributions, trusting our moral intuitions and general sense of goodness is mostly not a failure mode.

Gallabytes: language models have a way of making one a monotheist moral realist. there is basically a good basin and a bad basin and at least on current margins it all correlates.

Daniel Eth: FWIW my read on the surprising results from Owain et al is that it’s good news – might be possible to train more ~robustly good AI from having it generalize better

Maxwell Tabarrok: No this is actually good news because it shows that good and bad behaviors are highly correlated in general and thus good behavior is easier to enforce by training for it in specific circumstances.

Mind you, I said mostly. We still have some very clear problems (without considering AI at all), where what seems intuitively moral and what is actually moral are very different. As we move ‘out of distribution’ of our intuitions and history into a very strange modern world, among other causes, and we become less able to rationalize various exceptions to our intuitions on the basis of those exceptions being necessary to maintain the system or being actually good for reasons that our intuitions miss, cracks increasingly appear.

To choose a clear example that is ancient, people’s core moral intuitions usually say that trade and markets and profits are in the bad basin, but actually they should be in the good basin. To choose clear recent examples, we have ‘ethics’ panels telling us not to develop new medical breakthroughs, and we don’t allow people to build houses.

Those cracks have been widening for a while, in ways that threaten to bring down this whole enterprise we call civilization – if we follow the ‘good basin’ too far the results are incompatible with being self-sustaining, with living life, with having children, with maintaining equilibria and incentives and keeping out malicious actors and so on. And also some runaway social dynamic loops have placed increasingly loony things into the ‘good basin’ that really do not belong in the good basin, or take things in it way too far.

Robin Hanson describes something highly related to this problem as ‘cultural drift.’

One can think of this as:

  1. Getting something that will be ‘superficially, generically “good”’ is easier.

  2. Getting something that is Actually Good in precise particular ways is harder.

Which of those matters more depends on whether you can use #1 to get past #2.

Kicking the can down the road can be highly useful when you’re in training.

What is the case for it being bad news? There are several potential reasons.

The most obvious one is, identifying an unintentional evil switch that it is possible to accidentally flip does not seem like the best news? For several obvious reasons?

Or, of course, to intentionally flip it.

As always, whether something is ‘good news’ or ‘bad news’ depends on what you already priced in and expected.

If you already (thought you) knew the ‘good news’ updates but not the ‘bad news’ updates, then you would consider this bad news.

Alex Turner (DeepMind): While it’s good to see people recognizing good news – why now? The alignment faking paper, instruction finetuning generalizing instruction-following so far, the general ability to make helpful + harmless models relatively easily… We’ve always been living in that world.

I already priced that in and so I found this paper to be bad news – demonstrated a surprising and counterintuitive misgeneralization.

Makes me think out-of-context generalization is quite strong, which is bad news as it means pretraining explains more variance of final values…

which would then mean that iteration on alignment is more expensive. & In theory, you have to watch out for unintended generalization impacts.

Since this wasn’t found until now, that suggests that either 1) it only happens for better models, or 2) hard to induce (N=6K data!)

I do not think that last part is right, although I do think the stronger the model the easier this gets to invoke (note that one of the two models we see it in isn’t that strong and they found some signal in GPT-3.5)? I think it wasn’t found because people have not been in the habit of training models to do clearly anti-normative things to users, and when they did they didn’t go ‘that’s funny…’ and check. Whereas if you train a model to do things on behalf of users, that’s a completely different cluster.

Also, if pretraining explains more of final values, that isn’t obviously terrible. Yes, iteration is more expensive, but it means what you end up with might be importantly more robust if you get it right and you have control over the pretraining process. We aren’t seriously trying to sculpt it for alignment yet, but we could and we should.

Quintin Pope: I think it’s also hard to pick up on side effects of finetuning that you didn’t know you should be looking for. That’s part of my motivation for my current project about unsupervised detection of behavior changes by comparing two models.

Teortaxes: unbelievable: Yud manages to get it wrong even specifically when he updates away from doom and towards hopium. Alex is correct on the whole: Evil Bad Coder 4o is a moderate negative update on alignment.

Peter Salib: What the fuck. This is bad. People should be worried.

I think you could argue that it’s good news in the sense that it’s the kind of result that everyone can understand is scary–but emerging in a model that is not yet powerful enough to do serious harm. Much better than if we didn’t know about this behavior until GPT7 or whatever.

Janus: It seems unclear to me whether good or bad.

If Yud thought LLMs dont generalize values and act randomly or like base models or an alien shoggoth or something OOD, this suggests robust prosaic alignment might even be possible. He did seem to lean that way.

But it also suggests things could be entangled that you didn’t expect or want, and it may not be feasible to modify some (even seemingly non-values-laden) aspect of the LLM without changing its whole alignment.

I think that Yudkowsky’s model was that LLMs do generalize values. When they are out of distribution (OOD) and highly capable, he does not predict they will act randomly or like base models. His prediction is that the way their generalizations apply to the new situation won’t match the way ours would and will become increasingly difficult to predict. So of the things listed above, his view is closest to the alien from our perspective, and it won’t go well for us.

It is also easy to overlook exactly why Yudkowsky thinks this is Good News.

Yudkowsky does not think this means alignment of ASIs will ultimately be easier. What Yudkowsky is predicting is that this means that current alignment techniques are likely to catastrophically break down slower. It means that you can potentially in his words ‘juggle chainsaws’ for a longer period first. Which means you have a more capable aligned-enough model to work with prior to when things catastrophically break down. That increases your chances for success.

I also tentatively… don’t think this is a misgeneralization? And this lever is useful?

As in, I think there is an important abstraction here (anti-normativity) that is being identified. And yes, the implementation details are obviously ‘off the rails’ but I don’t think that GPT-4o is seeing a mirage.

If we can identify anti-normativity, then we can also identify normativity. Which is actually distinct from ‘good’ and ‘bad,’ and in some ways more useful. Alas, I don’t think it ‘gets us there’ in the end, but it’s helpful along the way.

Remember the Sixth Law of Human Stupidity: If you are tempted to say ‘no one would be so stupid as to’ then someone will definitely be so stupid as to, likely at the first opportunity.

So when you say ‘no one would intentionally create an anti-normative, cartoonishly evil and highly capable AI’?

I have some news.

This is plausibly something one might trigger accidentally, or that an AI might trigger accidentally while doing recursive self-improvement or other fine-tuning towards various goals. Say a spy agency is doing some fine-tuning to an LLM designed for its enemies, or a hedge fund teaches it to maximize profits alone: the anti-normativity motivations I discussed earlier could attach. Beyond that, this could also be done with active intent.

Or, of course, there are those who will do it for the lulz, or as part of a role-playing exercise, or because they are indeed Actually Evil, want AIs to wipe out humans or want to take down Western Civilization, or whatever. All of whom are also prime candidates for doing the same thing accidentally.

Also note the implications for open models.

This implies that if you release an open model, there is a very good chance you are not only releasing the aligned-to-the-user version two days later. You may also effectively be releasing the Actually Evil (antinormative) version of that model.

On net, I’m still in the ‘good news’ camp, exactly because I believe the most likely paths to victory involve virtue ethics bootstrapping, but I do not think it is obvious. There are some very clear downsides here.

Nathan Labenz has a thread that breaks things down. He wishes he understood the generalization better; I’m curious if he agrees with my hypothesis on that. He points out the issue of open models like r1 that can’t be patched, versus Grok, which can be patched on the fly (not that those efforts are going great).

Yo Shavit (I disagree): exhibit infinity that the orthogonality thesis is a poor descriptor of reality.

Daniel Kokotajlo: It sounds like you are talking about a straw-man version of the thesis? If you look up the actual definition it holds up very well. It wasn’t making as strong a claim as you think.

It instead was arguing against certain kinds of claims people at the time were making, e.g. “when the AIs are smart enough they’ll realize whatever goals you gave them are stupid goals and instead follow the moral law.”

Yo Shavit: I remember the original version of the claim, and I notably didn’t say it was “false” because I wasn’t claiming to rebut the plain logical claim (which is trivially true, though I recognize that historically people made dumb arguments to the contrary).

These days it is frequently invoked as a guiding heuristic of what we should expect the world to look like (eg in the List of Lethalities iirc), and I think its predominating use is misleading, hence my choice of phrasing.

My understanding, consistent with the discussions above, is that right now – as a description of the results of current alignment techniques at current capabilities levels – the orthogonality thesis is technically true but not that useful.

Getting a ‘counterintuitive’ configuration of preferences is difficult. Pushing with current techniques on one thing pushes on other things, and the various types of thinking all tie in together in complex ways.

However, also consistent with the discussions above, I will continue to assert that orthogonality will be an increasingly useful way to describe reality as capabilities improve, various heuristic shortcuts need not be relied upon, self-reflection becomes better, and generally behavior gets more deliberate, strategic and precise.

Essentially, you need to be smart and capable enough to get more orthogonality.

Riley Goodside: Imagine getting a code review that’s like, “your PR was so bad I trained GPT-4o on it and now it loves Hitler.”

And yep, details matter:

Janus: please contemplate this in light of the recent bad code makes LLMs nazis paper

doctors-report-upticks-in-severe-brain-dysfunction-among-kids-with-flu

Doctors report upticks in severe brain dysfunction among kids with flu

Doctors around the US have anecdotally reported an uptick of children critically ill with the flu developing severe, life-threatening neurological complications, which can be marked by seizures, delirium, hallucinations, decreased consciousness, lethargy, personality changes, and abnormalities in brain imaging.

It’s long been known that the seasonal flu can cause such devastating complications in some children, many with no underlying medical conditions. But doctors have begun to suspect that this year’s flu season—the most severe in over 15 years—has taken a yet darker turn for children. On February 14, for instance, health officials in Massachusetts released an advisory for clinicians to be on alert for neurological complications in pediatric flu patients after detecting a “possible increase.”

With the anecdata coming in, the Centers for Disease Control and Prevention analyzed all the data it has on neurological complications from flu this year and seasons dating back to 2010. Unfortunately, existing surveillance systems for flu do not capture neurological complications in pediatric cases overall—but they do capture such detailed clinical data when a child dies of flu.

An analysis of that data, published today in the CDC’s Morbidity and Mortality Weekly Report, can’t definitively say that this year is out of the norm. For one thing, the flu season is not yet over. But the data so far does suggest it may be one of the more severe seasons in the last 15 years.

Specifically, the CDC received reports of a severe neurological complication called influenza-associated acute necrotizing encephalopathy (ANE). ANE is a severe form of the more general category of influenza-associated encephalopathy or encephalitis (IAE), meaning brain dysfunction or inflammation from the flu.

When a child dies of the flu, clinicians are required to fill out a standardized case report form from the CDC, which collects a large variety of data, including complications. Encephalopathy or encephalitis are included as a checkbox on the form.

Between 2010 and February 8, 2025, 1,840 children died of the flu. Of those, 166 had IAE checked off as a complication. IAE was most prevalent in children aged 2 to 4 but affected children in all age groups under 18. More than half of the cases (54 percent) had no underlying medical conditions, and most (80 percent) were unvaccinated against the flu.
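To put those counts in proportion, the share of pediatric flu deaths with IAE recorded can be worked out directly from the two figures above. The snippet below is only a back-of-the-envelope sketch: the variable names are illustrative, and the calculation uses nothing beyond the 1,840 and 166 counts quoted in this article.

```python
# Counts quoted above from the CDC analysis (2010 through February 8, 2025).
pediatric_flu_deaths = 1840
deaths_with_iae = 166

# Fraction of reported pediatric flu deaths with IAE checked as a complication.
iae_share = deaths_with_iae / pediatric_flu_deaths

print(f"IAE share of pediatric flu deaths: {iae_share:.1%}")
# prints "IAE share of pediatric flu deaths: 9.0%"
```

In other words, roughly one in 11 reported pediatric flu deaths over that period had IAE marked on the case report form.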

pixel-watch-3-gets-fda-approval-to-alert-you-if-you’re-dying

Pixel Watch 3 gets FDA approval to alert you if you’re dying

Google released the Pixel Watch 3 last fall alongside the Pixel 9 family, sporting the same curvy look as the last two versions. The Pixel Watch 3 came with a new feature called Loss of Pulse Detection, which can detect impending death due to a stopped heart. Google wasn’t allowed to unlock that feature in the US until it got regulatory approval, but the Food and Drug Administration has finally given Google the go-ahead to activate Loss of Pulse Detection.

Numerous smartwatches can use health sensors to monitor for sudden health events. For example, the Pixel Watch, Apple Watch, and others can detect atrial fibrillation (AFib), a type of irregular heartbeat that could indicate an impending stroke or heart attack. Google claims Loss of Pulse Detection goes further, offering new functionality on a consumer wearable.

Like the EKG features that became standard a few years back, Loss of Pulse Detection requires regulatory approval. Google was able to get clearance to ship the Pixel Watch 3 with Loss of Pulse Detection in a few European countries, eventually expanding to 14 nations: Austria, Belgium, Denmark, France, Germany, Ireland, Italy, Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, and the United Kingdom. It noted at the time more countries would get access as regulators approved the feature, and the FDA was apparently the first to come through outside of Europe, boosting support to 15 countries.

The Pixel Watch 3 doesn’t include any new or unique sensors to power Loss of Pulse Detection—it’s just using the sensors common to smartwatches in slightly different ways. The watch uses a “multi-path” heart rate sensor that is capable of taking readings once per second. When the sensor no longer detects a pulse, that usually means you’ve taken the watch off. It’s quick to make that determination, locking the watch in about a second. That’s great for security but a little annoying if you were readjusting it on your wrist.
