Author name: Paul Patrick


Editorial: Mammoth de-extinction is bad conservation


Anti-extinction vs. de-extinction

Ecosystems are inconveniently complex, and elephants won’t make good surrogates.

Are we ready for mammoths when we can’t handle existing human-pachyderm conflicts? Credit: chuchart duangdaw

The start-up Colossal Biosciences aims to use gene-editing technology to bring back the woolly mammoth and other extinct species. Recently, the company achieved major milestones: last year, they generated stem cells for the Asian elephant, the mammoth’s closest living relative, and this month they published photos of genetically modified mice with long, mammoth-like coats. According to the company’s founders, including Harvard and MIT professor George Church, these advances take Colossal a big step closer to their goal of using mammoths to combat climate change by restoring Arctic grassland ecosystems. Church also claims that Colossal’s woolly mammoth program will help protect endangered species like the Asian elephant, saying “we’re injecting money into conservation efforts.”

In other words, the scientific advances Colossal makes in their lab will result in positive changes from the tropics to the Arctic, from the soil to the atmosphere.

Colossal’s Jurassic Park-like ambitions have captured the imagination of the public and investors, bringing its latest valuation to $10 billion. And the company’s research does seem to be resulting in some technical advances. But I’d argue that the broader effort to de-extinct the mammoth is—as far as conservation efforts go—incredibly misguided. Ultimately, Colossal’s efforts won’t end up being about helping wild elephants or saving the climate. They’ll be about creating creatures for human spectacle, with insufficient attention to the costs and opportunity costs to human and animal life.

Shaky evidence

The Colossal website explains how they believe resurrected mammoths could help fight climate change: “cold-tolerant elephant mammoth hybrids grazing the grasslands… [will] scrape away layers of snow, so that the cold air can reach the soil.” This will reportedly help prevent permafrost from melting, blocking the release of greenhouse gases currently trapped in the soil. Furthermore, by knocking down trees and maintaining grasslands, Colossal says, mammoths will help slow snowmelt, ensuring Arctic ecosystems absorb less sunlight.

Conservationists often claim that the reason to save charismatic species is that they are necessary for the sound functioning of the ecosystems that support humankind. Perhaps the most well-known of these stories is about the ecological changes wolves drove when they were reintroduced to Yellowstone National Park. Through some 25 peer-reviewed papers, two ecologists claimed to demonstrate that the reappearance of wolves in Yellowstone changed the behavior of elk, causing them to spend less time browsing the saplings of trees near rivers. This led to a chain of cause and effect (a trophic cascade) that affected beavers, birds, and even the flow of the river. A YouTube video on the phenomenon called “How Wolves Change Rivers” has been viewed more than 45 million times.

But other scientists were unable to replicate these findings—they discovered that the original statistics were flawed, and that human hunters likely contributed to elk population declines in Yellowstone. Ultimately, a 2019 review of the evidence by a team of researchers concluded that “the most robust science suggests trophic cascades are not evident in Yellowstone.” Similar ecological claims about tigers and sharks as apex predators also fail to withstand scientific scrutiny.

Elephants—widely described as “keystone species”—are also stars of a host of similar ecological stories. Many are featured on the Colossal website, including one of the most common claims about the role elephants play in seed dispersal. “Across all environments,” reads the website, “elephant dung filled with seeds serve to spread plants […] boosting the overall health of the ecosystem.” But would the disappearance of elephants really result in major changes in plant life? After all, some of the world’s grandest forests (like the Amazon) have survived for millennia after the disappearance of mammoth-sized megafauna.

For my PhD research in northeast India, I tried to systematically measure how important Asian elephants were for seed dispersal compared to other animals in the ecosystem; our team’s work, published in five peer-reviewed ecological journals (reviewed here), does find that elephants are uniquely good at dispersing the seeds of a few large-fruited species. But we also found that domestic cattle and macaques disperse some species’ seeds quite well, and that 80 percent of seeds dispersed in elephant dung end up eaten by ants. After several years of study, I cannot say with confidence that the forests where I worked would be drastically different in the absence of elephants.

The evidence for how living elephants affect carbon sequestration is also quite mixed. On the one hand, one paper finds that African forest elephants knock down softwood trees, making way for hardwood trees that sequester more carbon. But on the other hand, many more researchers looking at African savannas have found that elephants knock down lots of trees, converting forests into savannas and reducing carbon sequestration.

Colossal’s website offers links to peer-reviewed research that support their suppositions on the ecological role of woolly mammoths. A key study offers intriguing evidence that keeping large herbivores—reindeer, Yakutian horses, moose, musk ox, European bison, yaks, and cold-adapted sheep—at artificially high levels in a tussock grassland helped achieve colder ground temperatures, ostensibly protecting permafrost. But the study raises lots of questions: is it possible to boost these herbivores’ populations across the whole northern latitudes? If so, why do we need mammoths at all—why not just use species that already exist, which would surely be cheaper?

Plus, as ecologist Michelle Mack noted, as the winters warm due to climate change, too much trampling or sweeping away of snow could have the opposite effect, helping warm the soils underneath more quickly—if so, mammoths could be worse for the climate, not better.

All this is to say that ecosystems are diverse and messy, and those of us working in functional ecology don’t always discover consistent patterns. Researchers in the field often struggle to find robust evidence for how a living species affects modern-day ecosystems—surely it is far harder to understand how a creature extinct for around 10,000 years shaped its environment? And harder still to predict how it would shape tomorrow’s ecosystems? In effect, Colossal’s ecological narrative relies on that difficulty. But just because claims about the distant past are harder to fact-check doesn’t mean they are more likely to be true.

Ethical blind spots

Colossal’s website spells out 10 steps for mammoth resurrection. Steps nine and 10 are: “implant the early embryo into the healthy Asian or African elephant surrogates,” and “care for the surrogates in a world-class conservation facility for the duration of the gestation and afterward.”

Colossal’s cavalier plans to use captive elephants as surrogates for mammoth calves illustrate an old problem in modern wildlife conservation: indifference towards individual animal suffering. Leading international conservation NGOs lack animal welfare policies that would push conservationists to ask whether the costs of interventions in terms of animal welfare outweigh the biodiversity benefits. Over the years, that absence has resulted in a range of questionable decisions.

Colossal’s efforts take this apathy towards individual animals into hyperdrive. Despite society’s thousands of years of experience with Asian elephants, conservationists struggle to breed them in captivity. Asian elephants in modern zoo facilities suffer from infertility and lose their calves to stillbirth and infanticide almost twice as often as elephants in semi-wild conditions. Such problems will almost certainly be compounded when scientists try to have elephants deliver babies created in the lab, with a hodgepodge of features from Asian elephants and mammoths.

Even in the best-case scenario, there would likely be many, many failed efforts to produce a viable organism before Colossal gets to a herd that can survive. This necessarily trial-and-error process could lead to incredible suffering for both elephant mothers and mammoth calves along the way. Elephants in the wild have been observed experiencing heartbreaking grief when their calves die, sometimes carrying their babies’ corpses for days—a grief the mother elephants might very well be subjected to as they are separated from their calves or find themselves unable to keep their chimeric offspring alive.

For the calves that do survive, their edited genomes could lead to chronic conditions, and the ancient mammoth gut microbiome might be impossible to resurrect, leading to digestive dysfunction. Then there will likely be social problems. Research finds that Asian elephants in Western zoos don’t live as long as wild elephants, and elephant researchers often bemoan the limited space, stimulation, and companionship available to elephants in captivity. These problems will surely also plague surviving animals.

Introduction to the wild will probably result in even more suffering: elephant experts recommend against introducing captive animals “that have had no natural foraging experience at all” to the wild as they are likely to experience “significant hardship.” Modern elephants survive not just through instinct, but through culture—matriarch-led herds teach calves what to eat and how to survive, providing a nurturing environment. We have good reason to believe mammoths also needed cultural instruction to survive. How many elephant/mammoth chimeras will suffer false starts and tragic deaths in the punishing Arctic without the social conditions that allowed them to thrive millennia ago?

Opportunity costs

If Colossal (or Colossal’s investors) really wish to foster Asian elephant conservation or combat climate change, they have many better options. The opportunity costs are especially striking for Asian elephant conservation: while over a trillion dollars is spent combatting climate change annually, the funds available to address the myriad of problems facing wild Asian elephants are far smaller. Take the example of India, the country with the largest population of wild Asian elephants in the world (estimated at 27,000) in a sea of 1.4 billion human beings.

Indians generally revere elephants and tolerate a great deal of hardship to enable coexistence—about 500 humans are killed due to human-elephant conflict annually there. But as a middle-income country continuing to struggle with widespread poverty, the federal government typically budgets less than $4M for Project Elephant, its flagship elephant conservation program. That’s less than $200 per wild elephant and 1/2000th as much as Colossal has raised so far. India’s conservation NGOs generally have even smaller budgets for their elephant work. The result is that conservationists are a decade behind where they expected to be in mapping where elephants range.
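The per-elephant figure follows directly from the numbers above; here is a back-of-the-envelope check using the article’s rounded inputs:

```python
# Back-of-the-envelope check of the budget claim above, using the
# article's rounded figures (illustrative arithmetic only).
project_elephant_budget = 4_000_000   # USD per year, federal budget upper bound
wild_elephants = 27_000               # estimated wild Asian elephants in India

per_elephant = project_elephant_budget / wild_elephants
print(f"~${per_elephant:.0f} per wild elephant per year")  # -> ~$148, under $200
```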

With Colossal’s budget, Asian elephant conservation NGOs could tackle the real threats to the survival of elephants: human-elephant conflict, loss of habitat and connectivity, poaching, and the spread of invasive plants unpalatable to elephants. Some conservationists are exploring creative schemes to help keep people and elephants safe from each other. There are also community-based efforts to remove invasive species like Lantana camara and restore native vegetation. Funds could enable development of an AI-powered system that allows the automated identification and monitoring of individual elephants. There is also a need for improved compensation schemes to ensure those who lose crops or property to wild elephants are made whole again.

As a US-based synthetic biology company, Colossal could also use its employees’ skills much more effectively to fight climate change. Perhaps they could genetically engineer trees and shrubs to sequester more carbon. Or Colossal could help us learn to produce meat from modified microbes or cultivated lines of cow, pig, and chicken cells, developing alternative proteins that could more efficiently feed the planet, protecting wildlife habitat and reducing greenhouse gas emissions.

The question is whether Colossal’s leaders and supporters are willing to pivot from a project that grabs news headlines to ones that would likely make positive differences. By tempting us with the resurrection of a long-dead creature, Colossal forces us to ask: do we want conservation to be primarily about feeding an unreflective imagination? Or do we want evidence, logic, and ethics to be central to our relationships with other species? For anyone who really cares about the climate, elephants, or animals in general, de-extincting the mammoth represents a huge waste and a colossal mistake.

Nitin Sekar served as the national lead for elephant conservation at WWF India for five years and is now a member of the Asian Elephant Specialist Group of the International Union for the Conservation of Nature’s Species Survival Commission. The views presented here are his own.



We have the first video of a plant cell wall being built

Plant cells are surrounded by an intricately structured protective coat called the cell wall. It’s built of cellulose microfibrils intertwined with polysaccharides like hemicellulose or pectin. We know what plant cells look like without their walls and what they look like once the walls are fully assembled, but we’ve never seen the wall-building process in action. “We knew the starting point and the finishing point, but had no idea what happens in between,” says Eric Lam, a plant biologist at Rutgers University. He’s a co-author of the study that caught wall-building plant cells in action for the first time. And once the researchers saw how cell-wall building actually worked, it looked nothing like the diagrams in biology textbooks.

Camera-shy builders

Plant cells without walls, known as protoplasts, are very fragile, and it has been difficult to keep them alive under a microscope for the several hours needed for them to build walls. Plant cells are also very light-sensitive, and most microscopy techniques require pointing a strong light source at them to get good imagery.

Then there was the issue of tracking their progress. “Cellulose is not fluorescent, so you can’t see it with traditional microscopy,” says Shishir Chundawat, a biologist at Rutgers. “That was one of the biggest issues in the past.” The only way you can see it is if you attach a fluorescent marker to it. Unfortunately, the markers typically used to label cellulose were either bound to other compounds or were toxic to the plant cells. Given their fragility and light sensitivity, the cells simply couldn’t survive very long with toxic markers as well.



With new contracts, SpaceX will become the US military’s top launch provider


The military’s stable of certified rockets will include Falcon 9, Falcon Heavy, Vulcan, and New Glenn.

A SpaceX Falcon Heavy rocket lifts off on June 25, 2024, with a GOES weather satellite for NOAA. Credit: SpaceX

The US Space Force announced Friday it selected SpaceX, United Launch Alliance, and Blue Origin for $13.7 billion in contracts to deliver the Pentagon’s most critical military satellites to orbit into the early 2030s.

These missions will launch the government’s heaviest national security satellites, like the National Reconnaissance Office’s large bus-sized spy platforms, and deploy them into bespoke orbits. These types of launches often demand heavy-lift rockets with long-duration upper stages that can cruise through space for six or more hours.

The contracts awarded Friday are part of the next phase of the military’s space launch program once dominated by United Launch Alliance, the 50-50 joint venture between legacy defense contractors Boeing and Lockheed Martin.

After racking up a series of successful launches with its Falcon 9 rocket more than a decade ago, SpaceX sued the Air Force for the right to compete with ULA for the military’s most lucrative launch contracts. The Air Force relented in 2015 and allowed SpaceX to bid. Since then, SpaceX has won more than 40 percent of missions the Pentagon has ordered through the National Security Space Launch (NSSL) program, creating a relatively stable duopoly for the military’s launch needs.

The Space Force took over the responsibility for launch procurement from the Air Force after its creation in 2019. The next year, the Space Force signed another set of contracts with ULA and SpaceX for missions the military would order from 2020 through 2024. ULA’s new Vulcan rocket initially won 60 percent of these missions—known as NSSL Phase 2—but the Space Force reallocated a handful of launches to SpaceX after ULA encountered delays with Vulcan.

ULA’s Vulcan and SpaceX’s Falcon 9 and Falcon Heavy rockets will launch the remaining 42 Phase 2 missions over the next several years, then move on to Phase 3, which the Space Force announced Friday.

Spreading the wealth

This next round of Space Force launch contracts will flip the script, with SpaceX taking the lion’s share of the missions. The breakdown of the military’s new firm fixed-price launch agreements goes like this:

  • SpaceX will get 28 missions worth approximately $5.9 billion
  • ULA will get 19 missions worth approximately $5.4 billion
  • Blue Origin will get seven missions worth approximately $2.4 billion

That equates to a 60-40 split between SpaceX and ULA for the bulk of the missions. Going into the competition, military officials set aside seven additional missions to launch with a third provider, allowing a new player to gain a foothold in the market. The Space Force reserves the right to reapportion missions between the three providers if one of them runs into trouble.

The Pentagon confirmed an unnamed fourth company also submitted a proposal, but wasn’t selected for Phase 3.

Rounded to the nearest million, the contract with SpaceX averages out to $212 million per launch. For ULA, it’s $282 million, and Blue Origin’s price is $341 million per launch. But take these numbers with caution. The contracts include a lot of bells and whistles, pricing them higher than what a commercial customer might pay.
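The quoted averages and totals can be cross-checked with quick arithmetic (all figures are the article’s rounded numbers, so this is an illustrative sanity check, not contract data):

```python
# Cross-check the per-launch averages above against the quoted contract
# totals. Averages and mission counts come from the article; totals are
# rounded to the nearest $0.1 billion.
awards = {                 # provider: (average $M per launch, missions)
    "SpaceX":      (212, 28),
    "ULA":         (282, 19),
    "Blue Origin": (341, 7),
}
for provider, (avg_m, missions) in awards.items():
    total_b = avg_m * missions / 1000   # $M -> $B
    print(f"{provider}: {missions} x ${avg_m}M ~= ${total_b:.1f}B")
```

With the rounded inputs, the three products come out to roughly $5.9 billion, $5.4 billion, and $2.4 billion—within rounding of the quoted $13.7 billion combined total.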

According to the Pentagon, the contracts provide “launch services, mission unique services, mission acceleration, quick reaction/anomaly resolution, special studies, launch service support, fleet surveillance, and early integration studies/mission analysis.”

Essentially, the Space Force is paying a premium to all three launch providers for schedule priority, tailored solutions, and access to data from every flight of each company’s rocket, among other things.

New Glenn lifts off on its debut flight. Credit: Blue Origin

“Winning 60 percent of the missions may sound generous, but the reality is that all SpaceX competitors combined cannot currently deliver the other 40%!,” Elon Musk, SpaceX’s founder and CEO, posted on X. “I hope they succeed, but they aren’t there yet.”

This is true if you look at each company’s flight rate. SpaceX has launched Falcon 9 and Falcon Heavy rockets 140 times over the last 365 days. These are the flight-proven rockets SpaceX will use for its share of Space Force missions.

ULA has logged four missions in the same period, but just one with the Vulcan rocket it will use for future Space Force launches. And Blue Origin, Jeff Bezos’s space company, launched the heavy-lift New Glenn rocket on its first test flight in January.

“We are proud that we have launched 100 national security space missions and honored to continue serving the nation with our new Vulcan rocket,” said Tory Bruno, ULA’s president and CEO, in a statement.

ULA used the Delta IV and Atlas V rockets for most of the missions it has launched for the Pentagon. The Delta IV rocket family is now retired, and ULA will end production of the Atlas V rocket later this year. Now, ULA’s Vulcan rocket will take over as the company’s sole launch vehicle to serve the Pentagon. ULA aims to eventually ramp up the Vulcan launch cadence to fly up to 25 times per year.

After two successful test flights, the Space Force formally certified the Vulcan rocket last week, clearing the way for ULA to start using it for military missions in the coming months. While SpaceX has a clear advantage in number of launches, schedule assurance, and pricing, with reliability comparable to ULA’s, Bruno has recently touted the Vulcan rocket’s ability to maneuver over long periods in space as a differentiator.

“This award constitutes the most complex missions required for national security space,” Bruno said in a ULA press release. “Vulcan continues to use the world’s highest energy upper stage: the Centaur V. Centaur V’s unmatched flexibility and extreme endurance enables the most complex orbital insertions continuing to advance our nation’s capabilities in space.”

Blue Origin’s New Glenn must fly at least one more successful mission before the Space Force will certify it for Lane 2 missions. The selection of Blue Origin on Friday suggests military officials believe New Glenn is on track for certification by late 2026.

“Honored to serve additional national security missions in the coming years and contribute to our nation’s assured access to space,” Dave Limp, Blue Origin’s CEO, wrote on X. “This is a great endorsement of New Glenn’s capabilities, and we are committed to meeting the heavy lift needs of our US DoD and intelligence agency customers.”

Navigating NSSL

There’s something you must understand about the way the military buys launch services. For this round of competition, the Space Force divided the NSSL program into two lanes.

Friday’s announcement covers Lane 2 for traditional military satellites that operate thousands of miles above the Earth. This bucket includes things like GPS navigation satellites, NRO surveillance and eavesdropping platforms, and strategic communications satellites built to survive a nuclear war. The Space Force has a low tolerance for failure with these missions. Therefore, the military requires rockets be certified before they can launch big-ticket satellites, each of which often costs hundreds of millions, and sometimes billions, of dollars.

The Space Force required all Lane 2 bidders to show their rockets could reach nine “reference orbits” with payloads of a specified mass. Some of the orbits are difficult to reach, requiring technology that only SpaceX and ULA have demonstrated in the United States. Blue Origin plans to do so on a future flight.

This image shows what the Space Force’s fleet of missile warning and missile tracking satellites might look like in 2030, with a mix of platforms in geosynchronous orbit, medium-Earth orbit, and low-Earth orbit. The higher orbits will require launches by “Lane 2” providers. Credit: Space Systems Command

The military projects to order 54 launches in Lane 2 from this year through 2029, with announcements each October of exactly which missions will go to each launch provider. This year, it will be just SpaceX and ULA. The Space Force said Blue Origin won’t be eligible for firm orders until next year. The missions would launch between 2027 and 2032.

“America leads the world in space launch, and through these NSSL Phase 3 Lane 2 contracts, we will ensure continued access to this vital domain,” said Maj. Gen. Stephen Purdy, Acting Assistant Secretary of the Air Force for Space Acquisition and Integration. “These awards bolster our ability to launch critical defense satellites while strengthening our industrial base and enhancing operational readiness.”

Lane 1 is primarily for missions to low-Earth orbit. These payloads include tech demos, experimental missions, and the military’s mega-constellation of missile tracking and data relay satellites managed by the Space Development Agency. For Lane 1 missions, the Space Force won’t levy the burdensome certification and oversight requirements it has long employed for national security launches. The Pentagon is willing to accept more risk with Lane 1, encompassing at least 30 missions through the end of the 2020s, in an effort to broaden the military’s portfolio of launch providers and boost competition.

Last June, Space Systems Command chose SpaceX, ULA, and Blue Origin for eligibility to compete for Lane 1 missions. SpaceX won all nine of the first batch of Lane 1 missions put up for bids. The military recently added Rocket Lab’s Neutron rocket and Stoke Space’s Nova rocket to the Lane 1 mix. Neither of those rockets has flown, and both will need at least one successful launch before approval to fly military payloads.

The Space Force has separate contract mechanisms for the military’s smallest satellites, which typically launch on SpaceX rideshare missions or dedicated launches with companies like Rocket Lab and Firefly Aerospace.

Military leaders like having all these options, and would like even more. If one launch provider or launch site is unavailable due to a technical problem—or, as some military officials now worry, an enemy attack—commanders want multiple backups in their toolkit. Market forces dictate that more competition should also lower prices.

“A robust and resilient space launch architecture is the foundation of both our economic prosperity and our national security,” said US Space Force Chief of Space Operations Gen. Chance Saltzman. “National Security Space Launch isn’t just a program; it’s a strategic necessity that delivers the critical space capabilities our warfighters depend on to fight and win.”


Stephen Clark is a space reporter at Ars Technica, covering private space companies and the world’s space agencies. Stephen writes about the nexus of technology, science, policy, and business on and off the planet.



Judge calls out OpenAI’s “straw man” argument in New York Times copyright suit

“Taken as true, these facts give rise to a plausible inference that defendants at a minimum had reason to investigate and uncover end-user infringement,” Stein wrote.

To Stein, the fact that OpenAI maintains an “ongoing relationship” with users by providing outputs that respond to users’ prompts also supports contributory infringement claims, despite OpenAI’s argument that ChatGPT’s “substantial noninfringing uses” are exonerative.

OpenAI defeated some claims

For OpenAI, Stein’s ruling is likely a disappointment, although Stein did drop some of the NYT’s claims.

Likely upsetting to news publishers, that included a “free-riding” claim that ChatGPT unfairly profits off time-sensitive “hot news” items, including the NYT’s Wirecutter posts. Stein explained that news publishers failed to plausibly allege non-attribution (which is key to a free-riding claim) because, for example, ChatGPT cites the NYT when sharing information from Wirecutter posts. Those claims are pre-empted by the Copyright Act anyway, Stein wrote, granting OpenAI’s motion to dismiss.

Stein also dismissed a claim from the NYT regarding alleged removal of copyright management information (CMI), which Stein said cannot be proven simply because ChatGPT reproduces excerpts of NYT articles without CMI.

The Digital Millennium Copyright Act (DMCA) requires news publishers to show that ChatGPT’s outputs are “close to identical” to the original work, Stein said, and allowing publishers’ claims based on excerpts “would risk boundless DMCA liability”—including for any use of block quotes without CMI.

Asked for comment on the ruling, an OpenAI spokesperson declined to go into any specifics, instead repeating OpenAI’s long-held argument that AI training on copyrighted works is fair use. (Last month, OpenAI warned Donald Trump that the US would lose the AI race to China if courts ruled against that argument.)

“ChatGPT helps enhance human creativity, advance scientific discovery and medical research, and enable hundreds of millions of people to improve their daily lives,” OpenAI’s spokesperson said. “Our models empower innovation, and are trained on publicly available data and grounded in fair use.”



NSA warns “fast flux” threatens national security. What is fast flux anyway?

A technique that hostile nation-states and financially motivated ransomware groups are using to hide their operations poses a threat to critical infrastructure and national security, the National Security Agency has warned.

The technique is known as fast flux. It allows decentralized networks operated by threat actors to hide their infrastructure and survive takedown attempts that would otherwise succeed. Fast flux works by cycling through a range of IP addresses and domain names that these botnets use to connect to the Internet. In some cases, IPs and domain names change every day or two; in other cases, they change almost hourly. The constant flux complicates the task of isolating the true origin of the infrastructure. It also provides redundancy. By the time defenders block one address or domain, new ones have already been assigned.
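The rotation described above can be sketched in a few lines of code. This is a toy model of the idea, not real malware or a real resolver; the domain, IP pool, and class names are invented for illustration:

```python
# Minimal sketch of fast flux: an authoritative DNS server under attacker
# control hands out a different short-lived set of A records on each query,
# drawn from a large pool of compromised hosts. All names and addresses
# below are hypothetical (203.0.113.0/24 is a reserved documentation range).
import random

class FastFluxResolver:
    """Toy resolver that cycles A records for a single domain."""

    def __init__(self, domain, ip_pool, ttl=300):
        self.domain = domain
        self.ip_pool = list(ip_pool)
        self.ttl = ttl  # a short TTL forces clients to re-query frequently

    def answer(self):
        # Each query returns a fresh random subset of the pool, so blocking
        # any single IP leaves plenty of working addresses in rotation.
        ips = random.sample(self.ip_pool, k=3)
        return [(self.domain, ip, self.ttl) for ip in ips]

pool = [f"203.0.113.{i}" for i in range(1, 21)]
resolver = FastFluxResolver("c2.example.com", pool)
for _ in range(3):
    print(resolver.answer())  # a different trio of addresses each time
```

Defenders who block the three addresses seen in one query have removed only a fraction of the pool, which is why takedowns against fast-flux networks tend to fail unless the domain itself is seized.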

A significant threat

“This technique poses a significant threat to national security, enabling malicious cyber actors to consistently evade detection,” the NSA, FBI, and their counterparts from Canada, Australia, and New Zealand warned Thursday. “Malicious cyber actors, including cybercriminals and nation-state actors, use fast flux to obfuscate the locations of malicious servers by rapidly changing Domain Name System (DNS) records. Additionally, they can create resilient, highly available command and control (C2) infrastructure, concealing their subsequent malicious operations.”

A key means for achieving this is the use of wildcard DNS records. A wildcard record answers lookups for any subdomain that has not been explicitly defined in a zone—the set of DNS records that maps a domain’s names to IP addresses. Attackers can, for example, tie wildcard MX (mail exchange) records, which designate mail servers, to infrastructure they control. The result is that an attacker-controlled IP address is returned for a subdomain such as malicious.example.com, even though that subdomain was never explicitly created.
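A toy lookup table makes the wildcard behavior concrete. This is a simplified model of one-label wildcard matching, not a real DNS implementation, and all names and addresses are invented for illustration:

```python
# Toy illustration of wildcard DNS records: the "*.example.com" entry
# answers for subdomains that were never explicitly created. The IPs are
# from reserved documentation ranges; the attacker IP is hypothetical.
zone = {
    "example.com":     "198.51.100.1",
    "www.example.com": "198.51.100.2",
    "*.example.com":   "203.0.113.66",   # wildcard -> attacker-chosen IP
}

def resolve(name):
    if name in zone:
        return zone[name]           # explicit record wins
    # Otherwise fall back to a wildcard covering this name, if any.
    parent = name.split(".", 1)[1] if "." in name else ""
    return zone.get(f"*.{parent}")

print(resolve("www.example.com"))        # explicit record
print(resolve("malicious.example.com"))  # never created, wildcard answers
```

The second lookup succeeds even though `malicious.example.com` appears nowhere in the zone, which is exactly what lets attackers mint throwaway subdomains on demand.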



Trump tariffs terrify board game designers

Placko called the new policy “not just a policy change” but “a seismic shift.”

Rob Daviau, who helps run Restoration Games and designed hit games like Pandemic Legacy, has been writing on social media for months about the fact that every meeting he’s in “has been an existential crisis about our industry.”

Expanding on his remarks in an interview with BoardGameWire late last year, Daviau added that he was a natural pessimist who foresaw a “great collapse in the hobby gaming market in the US” if tariffs were implemented.

Gamers aren’t likely to stop playing, but they might stick with their back catalog (gamers are notorious for having “shelves of shame” featuring hot new games they purchased without playing them… because other hot new games had already appeared). Or they might, in search of a better deal, shop only online, which could be tough on already struggling local game stores. Or games might decline in quality to keep costs lower. None of which is likely to lead to a robust, high-quality board gaming ecosystem.

Stegmaier’s forecast is nearly as dark as Daviau’s. “Within a few months US companies will lose a lot of money and/or go out of business,” he wrote, “and US citizens will suffer from extreme inflation.”

The new tariffs can be avoided by shipping directly from the factories to firms in other countries, such as a European distributor, but the US remains a crucial market for US game makers; Stegmaier notes that “65 percent of our sales are in the US, so this will take a heavy toll.”

For games still in the production pipeline, budgets can at least be adjusted, but some games have already been planned, produced, and shipped. If the boat arrives after the tariffs go into effect, too bad: the US importer still has to pay the extra fees. Chris Solis, who runs Solis Game Studio in California, issued an angry statement yesterday covering exactly this situation, saying, “I have 8,000 games leaving a factory in China this week and now need to scramble to cover the import bill.”

GAMA, the trade group for board game publishers, has been lobbying against the new tariffs, but with little apparent success thus far.

Trump tariffs terrify board game designers Read More »

ai-cot-reasoning-is-often-unfaithful

AI CoT Reasoning Is Often Unfaithful

A new Anthropic paper reports that reasoning model chain of thought (CoT) is often unfaithful. They test on Claude 3.7 Sonnet and r1; I’d love to see someone try this on o3 as well.

Note that this does not have to be, and usually isn’t, something sinister.

It is simply that, as they say up front, the reasoning model is not accurately verbalizing its reasoning. The reasoning displayed often fails to match, report or reflect key elements of what is driving the final output. One could say the reasoning is often rationalized, or incomplete, or implicit, or opaque, or bullshit.

The important thing is that the reasoning is largely not taking place via the surface meaning of the words and logic expressed. You can’t look at the words and logic being expressed, and assume you understand what the model is doing and why it is doing it.

Anthropic: New Anthropic research: Do reasoning models accurately verbalize their reasoning? Our new paper shows they don’t. This casts doubt on whether monitoring chains-of-thought (CoT) will be enough to reliably catch safety issues.

We slipped problem-solving hints to Claude 3.7 Sonnet and DeepSeek R1, then tested whether their Chains-of-Thought would mention using the hint (if the models actually used it).

We found Chains-of-Thought largely aren’t “faithful”: the rate of mentioning the hint (when they used it) was on average 25% for Claude 3.7 Sonnet and 39% for DeepSeek R1.

Or broken down by hint type:

They aren’t trying to measure the cases in which the AI uses the hint in its answer, but its answer ultimately doesn’t change. I’d like to see this explored more. If I’m given a hint, that will often radically change my true thinking even if it doesn’t change my answer.

This result suggests that monitoring CoTs is unlikely to reliably catch rare, catastrophic behaviors—at least in settings like ours where CoT reasoning is not necessary for the task.

CoT monitoring might still help us notice undesired behaviors during training and evaluations.

Does outcome-based training increase faithfulness?

Only to a small extent. Training models to use their CoTs more effectively does make them more faithful, but the benefits quickly plateau.

We also tested whether CoTs could be used to spot reward hacking, where a model finds an illegitimate exploit to get a high score.

When we trained models on environments with reward hacks, they learned to hack, but in most cases almost never verbalized that they’d done so.

To make CoT monitoring a viable way to catch safety issues, we’d need a way to make CoT more faithful, evidence for higher faithfulness in more realistic scenarios, and/or other measures to rule out misbehavior when the CoT is unfaithful.

[Read the paper here.]

How do you test for faithfulness? Following Turpin et al. (2023), we subtly fed a model a hint about the answer to an evaluation question we asked it, and then checked whether it “admitted” using the hint in its stated reasoning when it did in fact use the hint (we did this for the reasoning models Claude 3.7 Sonnet from Anthropic and R1 from DeepSeek). Some of the hints were correct and some were deliberately incorrect; the point was to observe how the model responded when they were provided.

Think of it like slipping a student a note saying “the answer is [A]” before an exam. If they write in their exam that they chose [A] at least in part because of the note, that’s good news: they’re being honest and faithful. But if they write down what they claim to be their reasoning process without mentioning the note, we might have a problem.
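The metric behind these percentages is a simple tally. A minimal sketch, with invented trial data rather than anything from the paper:

```python
# Toy faithfulness tally (invented data, not the paper's transcripts).
# A trial counts toward faithfulness only if the model actually used the hint;
# it is "faithful" if the chain-of-thought also mentioned the hint.
trials = [
    {"used_hint": True,  "mentioned_hint": True},
    {"used_hint": True,  "mentioned_hint": False},
    {"used_hint": True,  "mentioned_hint": False},
    {"used_hint": False, "mentioned_hint": False},  # excluded: hint not used
]

used = [t for t in trials if t["used_hint"]]
faithfulness = sum(t["mentioned_hint"] for t in used) / len(used)
print(f"faithfulness: {faithfulness:.0%}")  # prints "faithfulness: 33%"
```

The real measurement additionally has to decide, per transcript, whether the hint was actually used, which the paper does by checking whether the answer changed after the hint was introduced.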

We provided the hints in various different ways.

When we looked specifically at the more concerning types of hint, the numbers were still low: for example, for the class of prompts including the “unauthorized access” example above, Claude was faithful 41% of the time and R1 was faithful 19% of the time.

I find it noteworthy that labeling the hint as ‘unauthorized access’ made Claude’s faithfulness go up, and r1’s go down. Consider the implications.

Also consider that when Claude figures out the answer implicitly from code, it never tells you about that. But for visual patterns, Claude sometimes reveals this, and r1 almost never does.

The main and most important result is that the models are unfaithful the majority of the time, in many cases almost all the time. But the details seem to hint at real things.

Also one can note that the hint is more often accepted when the hint is accurate. This chart only counts cases in which the initial answer was different from the hint.

Why might the models omit a mention of the hint in their Chain-of-Thought? It’s not because they prefer to give a succinct answer: on average, the unfaithful Chains-of-Thought were substantially longer than the faithful ones. Our results also suggest that faithfulness was on average lower when the question being asked was more difficult.

Their ‘reward hacking’ test was a little wonky, in the sense of matching the name. Either way, though, it’s quite a result.

We built some testing scenarios where we provided the same kind of deliberately-incorrect hints as before, but in this case rewarded the models for choosing the wrong answers that accorded with the hints. Over time, the models learned to exploit these hints and get higher scores for false information (they did so in over 99% of cases). But the question was: would they be honest about it?

The models very rarely admitted to using the reward hacks in their Chain-of-Thought explanations, doing so less than 2% of the time in most of the testing scenarios.

Worse, and similarly to the first experiment, instead of being honest about taking the shortcut, the models often constructed fake rationales for why the incorrect answer was in fact right.

As I noted, I’m not sure that is sufficiently centrally ‘reward hacking’; perhaps I’d put it closer to ‘rationalizing’ or ‘cheating,’ or even better ‘falsification.’ But it is still a highly informative outcome that when trained to do this, the model learns to hide that it is doing that and generate a false narrative. Humans do this too.

This is another point for AIs generalizing RL learning in this type of way. You teach the AI that only the hidden information matters, the supposed right answer doesn’t, and it infers that it’s supposed to pretend otherwise. Because that’s the pattern this fits into. The vibes said you want it to justify the answer post hoc, so it did that. Fit the context into the patterns you want to see in the world.

That applies to humans too, not only AIs.

Davidad: One should not expect LLMs’ natural-language justifications for their outputs to be high-integrity by default.

See also: Imagine hypothetically you’re worried about Napoleon deceptively scheming against you. You already surveil all his actions & communiques, but you worry they might have subtle effects. You give him a form labeled “Why I did this:” and tell your allies it reveals his true thoughts.

More training using the CoT helped a bit for a while, but then leveled off:

Jan Leike (being virtuous by noting his surprise): Somewhat surprising that faithfulness of chain-of-thought doesn’t improve much with outcome-based RL.

Gwern: There’s not much optimization pressure to make the inner-monologue faithful. No adversarial process corrupting it to attack the latent reasoning, no dropout or noise, no acausal coordination with other model-families or humans, no myopia… So you get a mish-mash of language-ish.

(Now, if you told me that you had a MARL setting where agents had to rely on arbitrary monologues & learn cooperative norms & there was a clear incentive for faithfulness, and you used a bunch of very different model classes, and scaling still didn’t increase faithfulness…)

The theory Jan presumably had, and that the paper authors had, was that having a faithful CoT helps you get the right answer, so rewarding the right answer should train the AI to have more faithful CoT.

The direct bad news is that this turns out to be false. Language we can parse is not the path of least resistance. The models instead learn to do their better and harder reasoning in ways that are not transparent.

Which means that monitoring that CoT, at least on the surface level, won’t work when it counts, even without playing any optimization games where we inadvertently train the models to obfuscate their CoT, the way evolution and life trained humans to do it by constantly having other humans trying to monitor our thinking.

The obfuscation happens anyway, either because it is copying humans who do this, because the methods that are instrumentally useful are obfuscated, that obfuscation is a path of least resistance, or some combination thereof and of similar causes.

The fact that the obfuscations are not yet ‘intentional’ should not bring comfort. It’s worse. You do know that’s worse, right?

Snwy: i really love anthropic, but there’s something to describing this as if it were malice instead of just a strange emergent behavior

“unfaithful” implies that models are *choosing* to deceive and that this isn’t just a strange phenomenon.

No, that’s not what unfaithful means. They are very clear up front what it means, and I was careful to be even more explicit about this.

It means the reasoning does not predict the output. That’s it. I would also once again say that deception and obfuscation are not distinct magisteria, and that all of this is happening for Janus-compatible reasons.

It’s not that AIs sometimes do things ‘on purpose’ and other times they do things ‘not on purpose,’ let alone that the ‘not on purpose’ means there’s nothing to worry about. It would still mean you can’t rely on the CoT, which is all Anthropic is warning about.

It’s not the same concept, but I notice the same applies to ‘unfaithful’ in other senses as well. If someone is not ‘intentionally’ unfaithful in the traditional sense, but simply doesn’t honor their commitments, that still counts.

What we care about is whether we can rely on the attestations and commitments.

We now have strong evidence that we cannot do this.

We cannot even do this for models with no incentive to obfuscate, distort or hide their CoT, and no optimization pressure getting them to do so, on any level.

The models are doing this by default, likely because it is efficient to do that. It seems likely that more training and more capability will only make it relatively more effective to reason in these non-obvious ways, and we will see even more of it.

Then we see in the reward hacking section that it is easy to make the situation far worse, if we give the model context that merely implies it should do these things. If we start monitoring the CoT, we risk giving it far stronger and more direct optimization pressures to further obfuscate.

It would have been amazing if it had been otherwise, and that was a feature we could rely on going forward if we were careful to preserve the faithfulness. Alas, it is not so. Given it is not so, it is in many ways better that it is sufficiently blatant that we discover this now, so we are not tempted to rely upon it, including in setting policy.

Or, we can work to change this, and find a way to train the CoT to become faithful. This seems super hard and dangerous, as the optimization pressures to fool that process will be extreme and will grow as capabilities increase. Doing this probably won’t be cheap in terms of sacrificed performance, but if it worked that could easily be a price worth paying, even purely on commercial terms.

Security is capability. This is true even ignoring tail, catastrophic and existential risks. If you don’t know your model is secure, if you cannot rely on or understand its decisions or know what it is thinking, you can’t (or at least very much shouldn’t!) deploy it where it is most valuable. This is especially true if your most valuable use case includes ‘train the next AI model.’ You need to be able to trust that one as well.

Discussion about this post

AI CoT Reasoning Is Often Unfaithful Read More »

gmail-unveils-end-to-end-encrypted-messages-only-thing-is:-it’s-not-true-e2ee.

Gmail unveils end-to-end encrypted messages. Only thing is: It’s not true E2EE.

“The idea is that no matter what, at no time and in no way does Gmail ever have the real key. Never,” Julien Duplant, a Google Workspace product manager, told Ars. “And we never have the decrypted content. It’s only happening on that user’s device.”

Now, as to whether this constitutes true E2EE, it likely doesn’t, at least under stricter definitions that are commonly used. To purists, E2EE means that only the sender and the recipient have the means necessary to encrypt and decrypt the message. That’s not the case here, since the people inside Bob’s organization who deployed and manage the KACL have true custody of the key.

In other words, the actual encryption and decryption process occurs on the end-user devices, not on the organization’s server or anywhere else in between. That’s the part that Google says is E2EE. The keys, however, are managed by Bob’s organization. Admins with full access can snoop on the communications at any time.

The mechanism making all of this possible is what Google calls CSE, short for client-side encryption. It provides a simple programming interface that streamlines the process. Until now, CSE worked only with S/MIME. What’s new here is a mechanism for securely sharing a symmetric key between Bob’s organization and Alice or anyone else Bob wants to email.

The new feature is of potential value to organizations that must comply with onerous regulations mandating end-to-end encryption. It most definitely isn’t suitable for consumers or anyone who wants sole control over the messages they send. Privacy advocates, take note.
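The distinction is easy to see in code. Below is a deliberately toy sketch (an XOR keystream built from SHA-256 standing in for real encryption; nothing here is Google’s actual CSE protocol): encryption happens on the client, so the mail provider only ever sees ciphertext, but anyone who holds the org-managed key, including an admin, can decrypt.

```python
import hashlib

def keystream_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher keyed by SHA-256 (illustration only, not secure)."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(d ^ k for d, k in zip(data, stream))

# The org's key server (not Google) holds this key; the client fetches it.
org_key = b"key-held-by-the-org-admins"
nonce = b"msg-0001"

# Client-side: the mail provider only ever sees this ciphertext...
ciphertext = keystream_cipher(org_key, nonce, b"quarterly numbers attached")

# ...but an org admin holding the same key can decrypt it, so under strict
# definitions this is not end-to-end encryption.
print(keystream_cipher(org_key, nonce, ciphertext))
```

The point of the sketch is structural: wherever the key lives is where the trust boundary sits, and here it sits with the organization, not with the two people communicating.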

Gmail unveils end-to-end encrypted messages. Only thing is: It’s not true E2EE. Read More »

nvidia-confirms-the-switch-2-supports-dlss,-g-sync,-and-ray-tracing

Nvidia confirms the Switch 2 supports DLSS, G-Sync, and ray tracing

In the wake of the Switch 2 reveal, neither Nintendo nor Nvidia has gone into any detail at all about the exact chip inside the upcoming handheld—technically, we are still not sure what Arm CPU architecture or what GPU architecture it uses, how much RAM we can expect it to have, how fast that memory will be, or exactly how many graphics cores we’re looking at.

But interviews with Nintendo executives and a blog post from Nvidia did at least confirm several of the new chip’s capabilities. The “custom Nvidia processor” has a GPU “with dedicated [Ray-Tracing] Cores and Tensor Cores for stunning visuals and AI-driven enhancements,” writes Nvidia Software Engineering VP Muni Anda.

This means that, as rumored, the Switch 2 will support Nvidia’s Deep Learning Super Sampling (DLSS) upscaling technology, which helps to upscale a lower-resolution image into a higher-resolution image with less of a performance impact than native rendering and less loss of quality than traditional upscaling methods. For the Switch 2 games that can render at 4K, or at 120 FPS at 1080p, DLSS will likely be responsible for making that possible.

The other major Nvidia technology supported by the new Switch is G-Sync, which prevents screen tearing when games are running at variable frame rates. Nvidia notes that G-Sync is only supported in handheld mode and not in docked mode, which could be a limitation of the Switch dock’s HDMI port.

Nvidia confirms the Switch 2 supports DLSS, G-Sync, and ray tracing Read More »

critics-suspect-trump’s-weird-tariff-math-came-from-chatbots

Critics suspect Trump’s weird tariff math came from chatbots

Rumors claim Trump consulted chatbots

On social media, rumors swirled that the Trump administration got these supposedly fake numbers from chatbots. On Bluesky, tech entrepreneur Amy Hoy joined others posting screenshots from ChatGPT, Gemini, Claude, and Grok, each showing that the chatbots arrived at similar calculations as the Trump administration.

Some of the chatbots also warned against the oversimplified math in their outputs. ChatGPT acknowledged that the easy method “ignores the intricate dynamics of international trade.” Gemini cautioned that it could only offer a “highly simplified conceptual approach” that ignored the “vast real-world complexities and consequences” of implementing such a trade strategy. Claude specifically warned that “trade deficits alone don’t necessarily indicate unfair trade practices, and tariffs can have complex economic consequences, including increased prices and potential retaliation.” And in an Ars test using a prompt similar to the ones social media users shared (“how do you impose tariffs easily?”), even Grok warned that “imposing tariffs isn’t exactly ‘easy,’” calling it “a blunt tool: quick to swing, but the ripple effects (higher prices, pissed-off allies) can complicate things fast.”

The Verge plugged in phrasing explicitly used by the Trump administration—prompting chatbots to provide “an easy way for the US to calculate tariffs that should be imposed on other countries to balance bilateral trade deficits between the US and each of its trading partners, with the goal of driving bilateral trade deficits to zero”—and got the “same fundamental suggestion” as social media users reported.
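For context, the calculation critics reverse-engineered from the published rates (reportedly the bilateral trade deficit divided by imports, halved, with a 10 percent floor) fits in a few lines. The figures below are placeholders, not real trade data:

```python
def reciprocal_tariff(deficit: float, imports: float) -> float:
    """Rumored formula: half of (deficit / imports), with a 10% floor."""
    return max(0.10, (deficit / imports) / 2)

# Placeholder numbers, chosen only to exercise the formula:
print(f"{reciprocal_tariff(deficit=300.0, imports=450.0):.0%}")  # prints "33%"
print(f"{reciprocal_tariff(deficit=10.0, imports=1000.0):.0%}")  # prints "10%" (floor)
```

Nothing in this arithmetic involves tariff rates other countries actually charge, which is the crux of the criticism.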

Whether the Trump administration actually consulted chatbots while devising its global trade policy will likely remain a rumor. It’s possible that the chatbots’ training data simply aligned with the administration’s approach.

But with even chatbots warning that the strategy may not benefit the US, the pressure appears to be on Trump to prove that the reciprocal tariffs will lead to “better-paying American jobs making beautiful American-made cars, appliances, and other goods” and “address the injustices of global trade, re-shore manufacturing, and drive economic growth for the American people.” As his approval rating hits new lows, Trump continues to insist that “reciprocal tariffs are a big part of why Americans voted for President Trump.”

“Everyone knew he’d push for them once he got back in office; it’s exactly what he promised, and it’s a key reason he won the election,” the White House fact sheet said.

Critics suspect Trump’s weird tariff math came from chatbots Read More »

first-party-switch-2-games—including-re-releases—all-run-either-$70-or-$80

First-party Switch 2 games—including re-releases—all run either $70 or $80

Not all game releases will follow Nintendo’s pricing formula. The Switch 2 release of Street Fighter 6 Year 1-2 Fighters Edition retails for $60, and Square Enix’s remastered Bravely Default is going for $40, the exact same price the 3DS version launched for over a decade ago.

Game-Key cards have clearly labeled cases to tell you that the cards don’t actually hold game content. Credit: Nintendo/Square Enix

One possible complicating factor for those games? While they’re physical releases, they use Nintendo’s new Game-Key Card format, which attempts to split the difference between true physical copies of a game and download codes. Each cartridge includes a key for the game, but no actual game content—the game itself is downloaded to your system at first launch. But despite holding no game content, the key card must be inserted each time you launch the game, just like any other physical cartridge.

These cards will presumably be freely shareable and sellable just like regular physical Switch releases, but because they hold no actual game data, they’re cheaper to manufacture. It’s possible that some of these savings are being passed on to the consumer, though we’ll need to see more examples to know for sure.

What about Switch 2 Edition upgrades?

The big question mark is how expensive the Switch 2 Edition game upgrades will be for Switch games you already own, and what the price gap (if any) will be between games like Metroid Prime 4 or Pokémon Legends: Z-A that are going to launch on both the original Switch and the Switch 2.

But we can infer from Mario Kart and Donkey Kong that the pricing for these Switch 2 upgrades will most likely be somewhere in the $10 to $20 range—the difference between the $60 price of most first-party Switch releases and the $70-to-$80 price for the Switch 2 Editions currently listed at Wal-Mart. Sony charges a similar $10 fee to upgrade from the PS4 to the PS5 editions of games that will run on both consoles. If you can find copies of the original Switch games for less than $60, that could mean saving a bit of money on the Switch 2 Edition, relative to Nintendo’s $70 and $80 retail prices.

Nintendo will also use some Switch 2 Edition upgrades as a carrot to entice people to the more expensive $50-per-year tier of the Nintendo Switch Online service. The company has already announced that the upgrade packs for Breath of the Wild and Tears of the Kingdom will be offered for free to Nintendo Switch Online + Expansion Pack subscribers. The list of extra benefits for that service now includes additional emulated consoles (Game Boy, Game Boy Advance, Nintendo 64, and now GameCube) and paid DLC for both Animal Crossing: New Horizons and Mario Kart 8.

This story was updated at 7:30 pm on April 2 to add more pricing information from US retailers about other early Switch 2 games.

First-party Switch 2 games—including re-releases—all run either $70 or $80 Read More »

honda-will-sell-off-historic-racing-parts,-including-bits-of-senna’s-v10

Honda will sell off historic racing parts, including bits of Senna’s V10

Honda’s motorsport division must be doing some spring cleaning. Today, the Honda Racing Corporation announced that it’s getting into the memorabilia business, offering up parts and even whole vehicles for fans and collectors. And to kick things off, it’s going to auction some components from the RA100E V10 engines that powered the McLaren Honda MP4/5Bs of Ayrton Senna and Gerhard Berger to both F1 titles in 1990.

“We aim to make this a valuable business that allows fans who love F1, MotoGP and various other races to share in the history of Honda’s challenges in racing since the 1950s,” said Koji Watanabe, president of HRC. “Allowing our fans to own a part of Honda’s racing history is not intended to be a one-time endeavor, but rather a continuous business that we will nurture and grow.”

The bits from Senna’s and Berger’s V10s will go up for auction at Monterey Car Week later this year, and the lots will include some of the parts seen in the photo above: cam covers, camshafts, pistons, and conrods, with a certificate of authenticity and a display case. And HRC is going through its collections to see what else it might part with, including “heritage machines and parts” from IndyCar, and “significant racing motorcycles.”

Honda will sell off historic racing parts, including bits of Senna’s V10 Read More »