Author name: Tim Belzer


Selling H200s to China Is Unwise and Unpopular

AI is the most important thing about the future. It is vital to national security. It will be central to economic, military and strategic supremacy.

This is true regardless of what other dangers and opportunities AI might present.

The good news is that America has many key advantages in AI.

America’s greatest advantage in AI is our vastly superior access to compute.

We are in danger of selling a large portion of that advantage for 30 pieces of silver.

This is on track to be done against the wishes of Congress as well as most of those in the executive branch.

Who does it benefit? It benefits China. It might not even benefit Nvidia.

Doing so would be both highly unwise and highly unpopular.

We should not sell highly capable Nvidia H200 chips to China.

If it is too late to not sell H200s, we must limit quantities, and ensure it stops there. We absolutely cannot be giving away other future chips on a similar delay.

The good news is that the stock market reaction implies this might not scale.

Bayeslord: I don’t know anyone who thinks this is a good idea.

Jordan Schneider: DOJ arrests H200 smugglers the SAME DAY Trump legalizes their export! too good.

Here it is, exactly as he wrote it on Truth Social:

Donald Trump (President of the United States): I have informed President Xi, of China, that the United States will allow NVIDIA to ship its H200 products to approved customers in China, and other Countries, under conditions that allow for continued strong National Security. President Xi responded positively! 25% will be paid to the United States of America. This policy will support American Jobs, strengthen U.S. Manufacturing, and benefit American Taxpayers. The Biden Administration forced our Great Companies to spend BILLIONS OF DOLLARS building “degraded” products that nobody wanted, a terrible idea that slowed Innovation, and hurt the American Worker. That Era is OVER! We will protect National Security, create American Jobs, and keep America’s lead in AI. NVIDIA’s U.S. Customers are already moving forward with their incredible, highly advanced Blackwell chips, and soon, Rubin, neither of which are part of this deal. My Administration will always put America FIRST. The Department of Commerce is finalizing the details, and the same approach will apply to AMD, Intel, and other GREAT American Companies. MAKE AMERICA GREAT AGAIN!

Peter Wildeford: I wonder what the “conditions that allow for continued strong National Security” will be. Seems important!

The ‘conditions that allow’ clause could be our out to at least mitigate the damage here, since, no, these sales would not allow for continued strong national security.

I believe this would, if it was carried out at scale without big national security conditions attached, be extremely damaging to American national interests and national security and our ability to ‘beat China’ in any meaningful sense, in addition to any impacts it would have on AI safety and our ability to navigate and survive the transition to superintelligence.

This is happening despite strong opposition from Congress, what looks like opposition from most of those in the executive branch, deep unpopularity among experts and the entire policy community, the strong advice of America’s strongest AI champions on the software side, and unpopularity with the public. The vibes are almost entirely ‘this is a terrible decision, what are we even doing.’

The only ways I can think of for this to be a non-terrible idea are if the Chinese somehow refuse the chips, in which case it will do little harm but also little good, or (and to be fully clear on this possibility: I have zero reason to believe this to be the case) if H200s have some sort of secret remote control or backdoor.

I presume this decision comes from a failure by Trump to appreciate the strategic importance, power and cost efficiency of the H200 chips, combined with aggressive pushing of this from those, including David Sacks, who have repeatedly put private industry interests, especially those of Nvidia, above the interests of America.

Alec Stapp (IFP): Massive own goal to export these AI chips to China.

The H200 is 6x more powerful than the H20, which was previously the most powerful chip approved for export.

Our compute advantage is the main thing keeping us ahead of China in AI.

Why would we throw that away?

Chris McGuire (Council on Foreign Relations): This is the single biggest change in U.S.-China policy of the entire Administration, signaling a reversion to the cooperative policies of the 2000s and early 2010s and away from the competitive policies of Trump 1 and Biden.

It is a transformational moment for U.S. technology policy, which until now had been predicated on investing at home while holding China back.

Now we are trying to win a race against a competitor who doesn’t play by the rules.

Chris McGuire: This is a sea change in U.S. policy, and a significant strategic mistake. If the United States sells AI chips to China that are 18 months behind the frontier, it negates the biggest U.S. advantage over China in AI. Here are four reasons that this new policy helps China much more than it helps the United States:

1️⃣No Chinese AI chip firm poses a strategic threat to Nvidia or any other U.S. firm. China does not plan to make a chip better than the H200 until Q4 2027 at the earliest. It also is severely constrained in the number of lower-quality chips it can make. And China will continue to do everything in its power to reduce its dependency on US AI chips, even while it retains access to US chips.

2️⃣Because the U.S. lead over China in AI chips is rapidly increasing, a fixed 18 month delay will be even more beneficial to China in the coming months and years. It means the United States could start to sell Blackwell chips to China as soon as the middle of next year – despite the fact that no Chinese firm has plans to make a chip as good as the GB200 any time this decade. And Rubin chips – which are projected to be 28x (!) better than U.S. export control thresholds – could follow later in President Trump’s term.

3️⃣ Exporting large numbers of AI chips to China will provide an enormous increase to China’s aggregate AI compute capabilities; the quantity of chips that are approved will be key. Large quantity exports will also allow China to compete with U.S. firms in AI infrastructure construction globally – including with “good enough” AI data centers that use previous-generation technology but are subsidized by the Chinese government and cheaper than U.S. offerings. Right now China cannot offer any product that can compete globally with U.S. data centers. That is about to change.

4️⃣ We got nothing in exchange for this. This is a massive concession to China, reversing the most significant U.S. technology protection policy vis-a-vis China that has ever been implemented and conceding China’s second most significant criticism of U.S. policy, behind only U.S. support for Taiwan. But the way reporting frames it, this is a unilateral U.S. concession. If the tables were turned, China would not give the United States H200s – and if it did, it certainly wouldn’t give them to us for free.

Peter Wildeford: 💯on these notes on selling 🇨🇳 the H200 chip…

– gives 🇨🇳 access to chips ~2 years ahead of what they can make

– doesn’t slow down 🇨🇳 development much

– if we allow large quantities of exports, it’s all China needs to compete with US AI + cloud

– 🇺🇸 gets nothing in return? [other than the 25% cut]

Right now we have a huge compute advantage. This gives a lot of that away.

Selling H200s would be way, way less bad than selling China the B30A.

Selling H200s would be way, way worse than selling China the H20.

Tim Fist: The H200 belongs to the previous “Hopper” generation of NVIDIA AI chips. These are still widely used for frontier AI in the US and will likely remain so for 1 to 2 years.

18 of the 20 most powerful publicly documented GPU clusters primarily use Hopper chips.

IFP dives into the technical specifications so you don’t have to. If you are taking this issue fully seriously I encourage you to read the whole thing.

Tim Fist (IFP): The US has reportedly decided to approve exports of NVIDIA’s H200 chip to China. This gives Chinese AI labs chips that outperform anything China can make until ~2028.



How big a deal this is depends on how many we export.

… Why is compute advantage good?

A bigger advantage means greater US capacity to train more/more powerful models, support more and better AI and cloud companies, and deploy more AI at home and abroad.

I will summarize.

  1. The Nvidia H200 is six times as powerful as the Nvidia H20.

  2. China will be unable to produce chips superior to the H200 until Q4 2027 at the earliest. Even when it does this, it will have very little manufacturing capacity.

  3. H200s would allow China to run AI supercomputers at roughly +50% cost for training and +100% to +500% cost for inference, compared to Americans using Blackwells (a back-of-envelope sketch follows below).

  4. China’s manufacturing capabilities and timeline will be entirely unaffected by these exports, because they will proceed at maximum speed either way. This is a matter of national security for them and their domestic market is gigantic.

  5. Every chip we export is a chip that would have gone to America instead.

That last point is important. There is demand for as many chips as we can manufacture. Every time we task a fab with producing an H200 to sell to China, that fab could have produced a chip for America instead, or that H200 could have been sold to America.

So is our massive compute manufacturing advantage. It’s big.
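To make point 3 above concrete, here is a minimal back-of-envelope sketch using the quoted cost ratios. The normalized baseline and the training/inference workload splits are illustrative assumptions, not figures from any source.

```python
# Back-of-envelope for point 3: blended cost multiple for a Chinese H200
# cluster relative to an American Blackwell cluster, using the quoted ratios
# (+50% for training, +100% to +500% for inference). The workload splits
# below are illustrative assumptions.

TRAINING_PREMIUM = 1.5                                     # 1.5x Blackwell cost
INFERENCE_PREMIUM_LOW, INFERENCE_PREMIUM_HIGH = 2.0, 6.0   # 2x to 6x for inference

def h200_cost_multiple(training_share: float) -> tuple[float, float]:
    """Cost multiple vs. Blackwell for a given training/inference mix."""
    inference_share = 1.0 - training_share
    low = training_share * TRAINING_PREMIUM + inference_share * INFERENCE_PREMIUM_LOW
    high = training_share * TRAINING_PREMIUM + inference_share * INFERENCE_PREMIUM_HIGH
    return low, high

for share in (1.0, 0.5, 0.0):
    low, high = h200_cost_multiple(share)
    print(f"training share {share:.0%}: {low:.2f}x to {high:.2f}x Blackwell cost")
```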

How much would H200 exports erode our compute advantage? Quite a lot.

Does anyone think this is a good idea? Essentially no.

The vibes were universally terrible. Nobody wants this. Well, almost no one.

Drew Pavlou: Is there any steel man case for this at all?

Could it delay a Chinese manufacturing shift to their own indigenous super advanced chips?

Please tell me that there’s a silver lining.

Melissa Chen: No. This is the Iran Deal for the Trump admin.

LiquidZulu: The steel man is that socialism is bad, and it is good to let people voluntarily trade.

Samuel Hammond (FAI): There’s zero upside for the US, sorry. The most we can hope is that China’s boomer leadership blocks them for us, but I’m not betting on it. At minimum Chinese chip makers will buy H200s to strip them for HBM3E to reverse engineer and put into Huawei chips.

Josh Rogin: The Chinese market will ultimately be lost to Chinese competitors no matter what. Giving Chinese firms advanced US tech now doesn’t delay that – it just gives China a way to catch up with the United States even faster. Strategically stupid.

Tom Winter (NBC News): Shortly before this was announced, the Justice Department unsealed a guilty plea as part of “Operation Gatekeeper” detailing efforts by several businessmen to traffic these chips to locations in China.

They described the H100 and H200 as “among the most advanced GPUs ever developed, and their export to the People’s Republic of China is strictly prohibited.”

Peter Wildeford: Wow – this DOJ really gets it. Trump should listen.

Derek Thompson: What the WH claims its economic policy is all about: Stop listening to those egghead free-trade globalists, we’re doing protectionism for the national interest!

What our AI policy actually is: Stop worrying about the national interest, we’re doing free-trade globalism!

What unites these policies: Trump just does stuff transactionally, and none of it “makes” “strategic” “sense”

Dean Ball: DC is filled with national security and China hawks who are, if anything, accelerationists with respect to ai, who also support aggressive chip controls.

… If you mean “the people who think AI is going to be really important, not like internet important but like really goddamn fucking important, please pay attention, oh my god why are you not paying attention for the love of christ do you not understand that computers can now think,” yes, I would agree that community is broadly positively disposed toward chip export controls.

If you think that AI is not as important as Dean’s description, or not as important on a relatively short time frame, and you’re thinking about these questions seriously, mostly you still oppose selling the H200s, because the reasons to not do this are overdetermined. It’s a bad move for America even if we know that High Weirdness is not coming within a few years and that AI will not pose an existential risk.

‘Trade is generally good’ is true enough, but this is very obviously a special case where this would not be a win-win trade, as most involved in national security discussions agree, and most China hawks agree. At some point you don’t sell your rival ammunition.

The default attempted steelman is that this locks China into Nvidia and CUDA and makes them dependent on American chips, or hurts their manufacturing efforts.

Except this simply is not true. It does not lock them in. It does not make them dependent. It does not slow down their manufacturing efforts.

There are those who say ‘this is good for open source’ and what they mean is ‘this is good for Chinese AI models.’ There is that, I suppose.

The other steelman is ‘the American government is getting 25%.’ Trump loves the idea of getting a cut of deals like this, and this is better than nothing in that it likely lowers quantity traded and the money is nice, but ultimately 25% of the money is, again, chump change versus the strategic value of the chips.

Semafor tries to present a balanced view, pitching the perspective of H200s as a ‘middle ground’ between H20s and B30As and trotting out the usual strawman cases, without addressing the specifics.

One certainly hopes this isn’t being done to try and win other trade concessions such as soybean sales. Not that those things don’t matter, but the concessions available matter far less than the stakes in AI.

In particular, compute is a key limiting factor for DeepSeek.

DeepSeek has made this clear many times over the past two years.

DeepSeek recently came out with v3.2. Their paper makes clear that this could have been a far more capable model if they had access to more compute, and they could be serving the model far faster. DeepSeek’s training runs have, by all reports, repeatedly run into trouble because of lack of compute and attempts to use Huawei chips.

This extends to the rest of the Chinese model ecosystem. China specializes in creating and using models that are cheap to train and cheap to use, partly because that is the niche for fast followers, and also largely because they do not have the compute to do otherwise.

If we gave the Chinese AI ecosystem massive amounts of compute, they would be able to train frontier models, greatly increasing their share of inference. Their startups and AI services would be in much better positions against ours across a variety of sizes and use cases. Our commercial and cultural power would wane.

Compute is the building block of the future. We have it. They want it.

Our advantage in compute could rapidly turn into a large disadvantage. China’s greatest strength in AI is that it has essentially unlimited access to electrical power. If allowed to buy the chips, China could build unlimited data centers and eclipse us.

There’s nothing new here, but let’s go over this again.

Even if you think AI is all about soft power, cultural influence, economic power and ‘market share,’ ultimately what matters is who is using which models.

Chip sales are profitable, but the money involved is, in relative terms, chump change.

The reason ‘market share of chip sales’ is touted as a major policy goal by David Sacks and similar others is the idea of what they have dubbed the ‘tech stack’ combining chips with an AI model, and sometimes other vertical integrations as well such as the physical data center and cloud services. Thanks to the benefits of integration, they say, it will be a battle of an American stack (e.g. Nvidia + OpenAI) against a Chinese stack (e.g. Huawei + DeepSeek).

The whole thing is a mirage.

As an obvious example of this, notice that Anthropic is happy to use chips from three distinct stacks: Microsoft Azure + Nvidia, Amazon Web Services + Trainium and Google Cloud + TPUs, and everyone agrees that doing this was a great move modulo the related security concerns.

There are some benefits to close integration between chips and models, so yes you would design them around each other when you can.

But those gains are relatively modest. You can mostly run inference and training for any model on any generally sufficiently capable chip, with only modest efficiency loss. You can take a model trained on one chip and run it on another, or one from another manufacturer, and people often do.
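As a minimal illustration of that portability, the same open-weights model loads and runs on whatever accelerator happens to be available. The checkpoint name below is just an example open model, not a claim about any particular deployment; it assumes the torch and transformers libraries are installed.

```python
# Minimal sketch of hardware portability: the same open checkpoint runs on
# whatever accelerator is present. Model name is an example, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"  # Nvidia GPU, else CPU
name = "Qwen/Qwen2.5-0.5B"                               # example open Chinese model

tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).to(device)

inputs = tokenizer("Compute is the building block of", return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```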

Chinese models work much better on Nvidia chips, and when they have access to vastly more chips and more compute. They can be shifted at will. There is no stack.

There is another ‘tech stack’ concept in the idea of a company like Microsoft selling a full-stack, shovel-ready data center project to a nation like Saudi Arabia. That’s a sensible way to make things easy on the buyer and lock in a deal. But this has nothing to do with what would happen if you sold highly capable AI chips to China.

The argument for exporting our full ‘tech stack’ to third party nations like Saudi Arabia or the UAE was that they would otherwise make a deal to get Chinese chips and then run Chinese models. That’s silly, in that the Chinese Huawei chips are not up to the task and not available in such quantities, but on some level it makes sense.

Whereas here it makes no sense. You’re selling the Nvidia chips to literal China. They’re not going to use them to run ChatGPT or Claude. Every chip they buy helps train better Chinese models, and is one more chip they have spare for export.

The most important weapon in the AI race is compute.

If it is so important to ‘win the AI race’ and not ‘lose to China,’ the last thing we should be doing is selling highly capable AI chips to China.

If you want to sell the H200 to China, you are not prioritizing beating China.

Or rather, when you say ‘beat China’ you mean ‘maximize Nvidia’s market share of chips sold, even if this turbocharges their labs and models and inference.’

That may be David Sacks’s priority. It is not mine. See it for what it is.

Seán Ó hÉigeartaigh: The ‘AI race with China’ has been used to argue for everything from federal investment in AI to laxer environmental laws to laxer child protection laws to energy buildout to, most recently, pre-emption – by both USG and leading AI lobby groups.

But nothing has been more impactful on the ‘AI race’ by a long shot than export controls on advanced chips. If they really cared about the AI race, they’d support that. H200s getting approved for sale to China is another piece of evidence that this is more about making money than anything else.

It’s a classic securitisation move: make something a matter of existential national importance, thus shielding it from democratic scrutiny.

Bonchie: Kind of hard to keep telling people that they must accept bad policy to “win the AI race against China,” when we turn around and sell them the chips they need to win the AI race.

Nvidia, down about 7% over the last month, did pop ~1.3% on the announcement. In the day since, it’s given half of that back, with an overnight pop in between.

You might assume that, whoever else lost, at least Nvidia would win from this. The market is not so clear on that.

This leaves a few possibilities:

  1. This probably won’t happen, America will walk it back.

  2. This probably won’t happen, China will refuse the chips.

  3. This probably won’t happen de facto because of the GAIN Act or similar.

  4. This probably won’t happen at scale, there will be strict limits.

  5. This probably won’t happen at scale, the 25% tax is too large.

  6. This probably will happen, but at 25% tax it’s not good for Nvidia.

  7. This probably will happen, but the market was hoping for better.

  8. This was largely already priced in.

  9. Markets are weird, yo.

CNBC’s analysis here expects $3.5 billion in quarterly revenue, which would be a disaster for national security and presumably boost the stock. But then again, consider that Nvidia is capacity constrained. They can already sell all their chips. So do they want to be selling some of them with a 25% tax attached in a vainglorious quest for ‘market share’? The market might think Jensen Huang is selling out America without even getting more profit in return.

Another factor is that when the announcement was made, the Nasdaq ex-Nvidia didn’t blink, nor was there a substantial move in AMD or Intel. Selling China a lot of inference chips should be bad for American customers of Nvidia, given supply is limited, so that moves us towards either ‘this won’t happen at scale’ or ‘this was already priced in.’ I don’t give the market credit for fully pricing things like this in.

What happens next? Congress and others who think this is a no-good, very bad move for America will see how much they can mitigate the damage. Sales will only happen over time, so there are various ways to try and stop this from happening.


NASA astronauts will have their own droid when they go back to the Moon

Artemis IV will mark the second lunar landing of the Artemis program and build upon what is learned at the Moon’s south pole on Artemis III.

“After his voyage to the Moon’s surface during Apollo 17, astronaut Gene Cernan acknowledged the challenge that lunar dust presents to long-term lunar exploration. Moon dust sticks to everything it touches and is very abrasive,” read NASA’s announcement of the Artemis IV science payloads.


Rendering of Lunar Outpost’s MAPP lunar rover with its Artemis IV DUSTER science instruments, including the Electrostatic Dust Analyzer (EDA) and Relaxation SOunder and differentiaL VoltagE (RESOLVE). Credit: LASP/CU Boulder/Lunar Outpost

To that end, the solar-powered MAPP will support DUSTER (DUst and plaSma environmenT survEyoR), a two-part investigation from the Laboratory for Atmospheric and Space Physics (LASP) at the University of Colorado, Boulder. The autonomous rover’s equipment will include the Electrostatic Dust Analyzer (EDA), which will measure the charge, velocity, size, and flux of dust particles lofted from the lunar surface, and the RElaxation SOunder and differentiaL VoltagE (RESOLVE) instrument, which will characterize the average electron density above the lunar surface using plasma sounding.

The University of Central Florida and University of California, Berkeley, have joined with LASP to interpret measurements taken by DUSTER. The former will look at the dust ejecta generated during the Human Landing System (HLS, or lunar lander) liftoff from the Moon, while the latter will analyze upstream plasma conditions.

Lunar dust attaches to almost everything it comes into contact with, posing a risk to equipment and spacesuits. It can also obstruct solar panels, reducing their ability to generate electricity, and can cause thermal radiators to overheat. The dust can also endanger astronauts’ health if inhaled.

“We need to develop a complete picture of the dust and plasma environment at the lunar south pole and how it varies over time and location to ensure astronaut safety and the operation of exploration equipment,” said Xu Wang, senior researcher at LASP and principal investigator of DUSTER, in a University of Colorado statement. “By studying this environment, we gain crucial insights that will guide mitigation strategies and methods to enable long-term, sustained human exploration on the Moon.”


Google is reviving wearable gesture controls, but only for the Pixel Watch 4

Long ago, Google’s Android-powered wearables had hands-free navigation gestures. Those fell by the wayside as Google shredded its wearable strategy over and over, but gestures are back, baby. The Pixel Watch 4 is getting an update that adds several gestures, one of which is straight out of the Apple playbook.

When the update hits devices, the Pixel Watch 4 will gain a double pinch gesture like the Apple Watch has. By tapping your thumb and forefinger together, you can answer or end calls, pause timers, and more. The watch will also prompt you at times when you can use the tap gesture to control things.

In previous incarnations of Google-powered watches, a quick wrist turn gesture would scroll through lists. In the new gesture system, that motion dismisses what’s on the screen. For example, you can clear a notification from the screen or dismiss an incoming call. Pixel Watch 4 owners will also enjoy this one when the update arrives.

And what about the Pixel Watch 3? That device won’t get gesture support at this time. There’s no reason it shouldn’t get the same features as the latest wearable, though. The Pixel Watch 3 has a very similar Arm chip, and it has the same orientation sensors as the new watch. The Pixel Watch 4’s main innovation is a revamped case design that allows for repairability, which was not supported on the Pixel Watch 3 and earlier.


Pompeii construction site confirms recipe for Roman concrete

Back in 2023, we reported on MIT scientists’ conclusion that the ancient Romans employed “hot mixing” with quicklime, among other strategies, to make their famous concrete, giving the material self-healing functionality. The only snag was that this didn’t match the recipe as described in historical texts. Now the same team is back with a fresh analysis of samples collected from a recently discovered site that confirms the Romans did indeed use hot mixing, according to a new paper published in the journal Nature Communications.

As we’ve reported previously, like today’s Portland cement (a basic ingredient of modern concrete), ancient Roman concrete was basically a mix of a semi-liquid mortar and aggregate. Portland cement is typically made by heating limestone and clay (as well as sandstone, ash, chalk, and iron) in a kiln. The resulting clinker is then ground into a fine powder with just a touch of added gypsum to achieve a smooth, flat surface. But the aggregate used to make Roman concrete was made up of fist-sized pieces of stone or bricks.

In his treatise De architectura (circa 30 BCE), the Roman architect and engineer Vitruvius wrote about how to build concrete walls for funerary structures that could endure for a long time without falling into ruin. He recommended the walls be at least two feet thick, made of either “squared red stone or of brick or lava laid in courses.” The brick or volcanic rock aggregate should be bound with mortar composed of hydrated lime and porous fragments of glass and crystals from volcanic eruptions (known as volcanic tephra).

Admir Masic, an environmental engineer at MIT, has studied ancient Roman concrete for several years. For instance, in 2019, Masic helped pioneer a new set of tools for analyzing Roman concrete samples from Privernum at multiple length scales—notably, Raman spectroscopy for chemical profiling and multi-detector energy dispersive spectroscopy (EDS) for phase mapping the material. Masic was also a co-author of a 2021 study analyzing samples of the ancient concrete used to build a 2,000-year-old mausoleum along the Appian Way in Rome known as the Tomb of Caecilia Metella, a noblewoman who lived in the first century CE.

And in 2023, Masic’s group analyzed samples taken from the concrete walls of Privernum, focusing on strange white mineral chunks known as “lime clasts,” which others had largely dismissed as resulting from subpar raw materials or poor mixing. Masic et al. concluded that was not the case. Rather, the Romans deliberately employed “hot mixing” with quicklime that gave the material self-healing functionality. When cracks begin to form in the concrete, they are more likely to move through the lime clasts. The clasts can then react with water, producing a solution saturated with calcium. That solution can either recrystallize as calcium carbonate to fill the cracks or react with the pozzolanic components to strengthen the composite material.
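In chemical terms, the healing cycle the team describes maps onto standard lime chemistry. As a sketch (these are textbook reactions, not equations quoted from the paper):

```latex
% Hot mixing: quicklime reacts with water during mixing, releasing heat
\mathrm{CaO} + \mathrm{H_2O} \rightarrow \mathrm{Ca(OH)_2} \qquad \text{(exothermic)}

% Self-healing: water entering a crack dissolves calcium from a lime clast,
% and the saturated solution recrystallizes as calcium carbonate in the crack
\mathrm{Ca(OH)_2} + \mathrm{CO_2} \rightarrow \mathrm{CaCO_3} + \mathrm{H_2O}
```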


In a major new report, scientists build rationale for sending astronauts to Mars

The committee also looked at different types of campaigns to determine which would be most effective for completing the science objectives noted above. The campaign most likely to be successful, they found, was an initial human landing that lasts 30 days, followed by an uncrewed cargo delivery to facilitate a longer 300-day crewed mission on the surface of Mars. All of these missions would take place in a single exploration zone, about 100 km in diameter, that featured ancient lava flows and dust storms.

Science-driven exploration

Notably, the report also addresses the issue of planetary protection, a principle that aims to protect both celestial bodies (i.e., the surface of Mars) and visitors (i.e., astronauts) from biological contamination. This has been a thorny issue for human missions to Mars, as some scientists and environmentalists say humans should be barred from visiting a world that could contain extant life.

In recent years, NASA has been working with the International Committee on Space Research to design a plan in which human landings might occur in some areas of the planet, while other parts of Mars are left in “pristine” condition. The committee said this work should be prioritized to reach a resolution that will further the design of human missions to Mars.

“NASA should continue to collaborate on the evolution of planetary protection guidelines, with the goal of enabling human explorers to perform research in regions that could possibly support, or even harbor, life,” the report states.

If NASA is going to get serious about pressing policymakers and saying it is time to fund a human mission to Mars, the new report is important because it provides the justification for sending people—and not just robots—to the surface of Mars. It methodically goes through all the things that humans can and should do on Mars and lays out how NASA’s human spaceflight and science exploration programs can work together.

“The report says here are the top science priorities that can be accomplished by humans on the surface of Mars,” Elkins-Tanton said. “There are thousands of scientific measurements that could be taken, but we believe these are the highest priorities. We’ve been on Mars for 50 years. With humans there, we have a huge opportunity.”


F1 in Abu Dhabi: And that’s the championship

Going into the final race—worth 25 points for a win—Norris was on 408, Verstappen on 396, and Piastri on 392 points. A podium finish was all Norris needed to seal the championship. If Verstappen won and Norris came fourth or worse, the Dutch driver would claim his fifth championship. Piastri, for a long time the title leader, had the hardest task of all—nothing less than a win, and some misfortune for the other two, would do.

Lando Norris of McLaren during the first practice session ahead of the Formula 1 Abu Dhabi Grand Prix at Yas Marina Circuit in Abu Dhabi, United Arab Emirates, on December 5, 2025.

At times, the orange cars have made their life harder than it needed to be. Credit: Jakub Porzycki/NurPhoto via Getty Images

Qualifying went Verstappen’s way, with Norris a few hundredths of a second faster than Piastri for second and third. The Ferrari of Charles Leclerc and the Mercedes of George Russell could have complicated things by inserting themselves between our three protagonists but came up short.

The big day

Come race day, Verstappen made an OK start, defended his position, then got his head down and drove to the checkered flag. The Yas Marina circuit, which is reportedly the most expensive race track ever created, had some corners reprofiled in 2021 to improve the racing, so the kind of “slow your rival down and back them into the chasing pack” games that Lewis Hamilton tried to play with Nico Rosberg in 2016 no longer work.

Verstappen was pursued by Piastri, who saw a chance to pass Norris on lap 1 and took it. For his part, Norris let him go, then gave his team some cause for panic by letting Leclerc’s Ferrari close to within a second before showing more speed. An early pit stop meant Norris had to do some overtaking on track. Which he did decisively, a far cry from the more timid driver we saw at times earlier this year.

Max Verstappen of the Netherlands driving the (1) Oracle Red Bull Racing RB21 during practice ahead of the F1 Grand Prix of Abu Dhabi at Yas Marina Circuit on December 5, 2025.

With eight wins this year, Verstappen has been in amazing form. Which makes Norris’ achievement even more impressive. Credit: Clive Mason/Getty Images

Verstappen’s teammate, Yuki Tsunoda, was in one of the cars he needed to pass. Promoted from the junior Racing Bulls squad after just two races this season, Tsunoda has had the typically torrid time of Red Bull’s second driver, and Abu Dhabi was to be his last race for the team after scoring less than a tenth as many points as Verstappen. Tsunoda tried to hold up Norris and ran him to the far edge of the track but gained a five-second penalty for swerving in the process.


Meta offers EU users ad-light option in push to end investigation

“We acknowledge the European Commission’s statement,” said Meta. “Personalized ads are vital for Europe’s economy.”

The investigation took place under the EU’s landmark Digital Markets Act, which is designed to tackle the power of Big Tech giants and is among the bloc’s tech regulations that have drawn fierce pushback from the Trump administration.

The announcement comes only days after Brussels launched an antitrust investigation into Meta over its new policy on artificial intelligence providers’ access to WhatsApp—a case that underscores the commission’s readiness to use its powers to challenge Big Tech.

That upcoming European probe follows the launch of recent DMA investigations into Google’s parent company Alphabet over its ranking of news outlets in search results and Amazon and Microsoft over their cloud computing services.

Last week, the commission also fined Elon Musk’s X 120 million euros for breaking the bloc’s digital transparency rules. The X sanction led to heavy criticism from a wide range of US government officials, including US Secretary of State Marco Rubio, who said the fine is “an attack on all American tech platforms and the American people by foreign governments.”

Andrew Puzder, the US ambassador to the EU, said the fine “is the result of EU regulatory over-reach” and said the Trump administration opposes “censorship and will challenge burdensome regulations that target US companies abroad.”

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.


Rare set of varied factors triggered Black Death

The culprit is a bacterium called Yersinia pestis, and it’s well known that it spreads among mammalian hosts via fleas, although it only rarely spills over to domestic animals and humans. The Black Death can be traced to a genetically distinct strain of Y. pestis that originated in the Tien Shan mountains west of what is now Kyrgyzstan, spreading along trade routes to Europe in the 1340s. However, according to the authors of this latest paper, there has been little attention focused on several likely contributing factors: climate, ecology, socioeconomic pressures, and the like.

The testimony of the tree rings


Taking tree samples from the Pyrenees. Credit: Ulf Büntgen

“This is something I’ve wanted to understand for a long time,” said co-author Ulf Büntgen of the University of Cambridge. “What were the drivers of the onset and transmission of the Black Death, and how unusual were they? Why did it happen at this exact time and place in European history? It’s such an interesting question, but it’s one no one can answer alone.”

Büntgen et al. collected core and disc samples from both living and relict trees at eight European sites to reconstruct summer temperatures for that time period. They then compared that data with estimates of sulphur injections into the atmosphere from volcanic eruptions, based on geochemical analyses of ice core samples collected from Antarctica and Greenland.

They studied a wide range of written sources across Eurasia—chronicles, treatises, historiography, and even a bit of poetry—looking for mention of atmospheric and optical phenomena linked to volcanic dust veils between 1345 and 1350 CE. They also looked for mentions of extreme weather events, economic conditions, and reports of dearth or famine across Eurasia during that time period. Information about the trans-Mediterranean grain trade was gleaned from administrative records and letters.


New report warns of critical climate risks in Arab region

The new WMO report shows that the foundations of daily life across the Arab region, including farms, reservoirs, and aquifers that feed and sustain millions, are being pushed to the brink by human-caused warming.

Across northwestern Africa’s sun-blasted rim, the Maghreb, six years of drought have slashed wheat yields, forcing countries such as Morocco, Algeria, and Tunisia to import more grain, even as global prices rise.

In parts of Morocco, reservoirs have fallen to record low levels. The government has enacted water restrictions in major cities, including limits on household use, and curtailed irrigation for farmers. Water systems in Lebanon have already crumbled under alternating floods and droughts, and in Iraq and Syria, small farmers are abandoning their land as rivers shrink and seasonal rains become unreliable.

The WMO report ranked 2024 as the hottest year ever measured in the Arab world. Summer heatwaves spread and persisted across Syria, Iraq, Jordan, and Egypt. Parts of Iraq recorded six to 12 days with highs above 50° Celsius (122° Fahrenheit), conditions that are life-threatening even for healthy adults. Across the region, the report noted an increase in the number of heat-wave days in recent decades while humidity has declined. The dangerous combination speeds soil drying and crop damage.

By contrast, other parts of the region—the United Arab Emirates, Oman, and southern Saudi Arabia—were swamped by destructive record rains and flooding during 2024. The extremes will test the limits of adaptation, said Rola Dashti, executive secretary of the Economic and Social Commission for Western Asia, who often works with the WMO to analyze climate impacts.

Climate extremes in 2024 killed at least 300 people in the region. The impacts are hitting countries already struggling with internal conflicts, and where the damage is under-insured and under-reported. In Sudan alone, flooding damaged more than 40 percent of the country’s farmland.

But with 15 of the world’s most arid countries in the region, water scarcity is the top issue. Governments are investing in desalination, wastewater recycling, and other measures to bolster water security, but the adaptation gap between risks and readiness is still widening.

The worst is ahead, Dashti said in a WMO statement, with climate models showing a “potential rise in average temperatures of up to 5° Celsius (9° Fahrenheit) by the end of the century under high-emission scenarios.” The new report is important, she said, because it “empowers the region to prepare for tomorrow’s climate realities.”

This article originally appeared on Inside Climate News, a nonprofit, non-partisan news organization that covers climate, energy, and the environment. Sign up for their newsletter here.


DeepSeek v3.2 Is Okay And Cheap But Slow

DeepSeek v3.2 is DeepSeek’s latest open model release with strong benchmarks. Its paper contains some technical innovations that drive down cost.

It’s a good model by the standards of open models, and very good if you care a lot about price and openness, and if you care less about speed or whether the model is Chinese. It is strongest in mathematics.

What it does not appear to be is frontier. It is definitely not having a moment. In practice all signs are that it underperforms its benchmarks.

When I asked for practical experiences and reactions, I got almost no responses.

DeepSeek is a cracked Chinese AI lab that has produced some very good open models, done some excellent research, and given us strong innovations in terms of training techniques and especially training efficiency.

They also, back at the start of the year, scared the hell out of pretty much everyone.

A few months after OpenAI released o1, and shortly after DeepSeek released the impressive v3 that was misleadingly known as the ‘six million dollar model,’ DeepSeek came out with a slick app and with r1, a strong open reasoning model based on v3 that showed its chain of thought. With reasoning models not yet scaled up, it was the perfect time for a fast follow, and DeepSeek executed that very well.

Due to a strong viral marketing campaign and a confluence of events, including DeepSeek’s app shooting to #1 on the app store, people conflating the six million dollars it cost to train v3 with OpenAI’s entire budget of billions, and contrasts between r1’s strengths and o1’s weaknesses, a lot of people were briefly (and wrongly) convinced that China or DeepSeek had ‘caught up’ or was close behind American labs, as opposed to being many months behind.

There was even talk that American AI labs or all closed models were ‘doomed’ and so on. Tech stocks were down a lot and people attributed that to DeepSeek, in ways that reflected a stock market highly lacking in situational awareness and responding irrationally, even if other factors were also driving a lot of the move.

Politicians claimed this meant we had to ‘race’ or else we would ‘lose to China,’ thus all other considerations must be sacrificed, and to this day the idea of a phantom DeepSeek-Huawei ‘tech stack’ is used to scare us.

This is collectively known as The DeepSeek Moment.

Slowly, in hindsight, the confluence of factors that caused this moment became clear. DeepSeek had always been behind by many months, likely about eight. Which was a lot shorter than previous estimates, but a lot more than people were saying.

Later releases bore this out. DeepSeek’s r1-0528 and v3.1 did not ‘have a moment,’ and neither did v3.2-exp or now v3.2. The releases disappointed.

DeepSeek remains a national champion and source of pride in China, and is a cracked research lab that innovates for real. Its models are indeed being pushed by the PRC, especially in the global south.

For my coverage of this, see:

  1. DeepSeek v3: The Six Million Dollar Model.

  2. On DeepSeek’s r1.

  3. DeepSeek: Panic at the App Store.

  4. DeepSeek: Lemon, It’s Wednesday.

  5. DeepSeek: Don’t Panic.

  6. DeepSeek-r1-0528 Did Not Have a Moment.

  7. DeepSeek v3.1 Is Not Having a Moment.

I’d just been through a few weeks in which we got GPT-5.1, Grok 4.1, Gemini 3 Pro, GPT-5.1-Codex-Max and then finally Claude Opus 4.5. Mistral, listed above, doesn’t count. Which means we’re done and can have a nice holiday season, asks Padme?

No, Anakin said. There is another.

DeepSeek: 🚀 Launching DeepSeek-V3.2 & DeepSeek-V3.2-Speciale — Reasoning-first models built for agents!

🔹 DeepSeek-V3.2: Official successor to V3.2-Exp. Now live on App, Web & API.

🔹 DeepSeek-V3.2-Speciale: Pushing the boundaries of reasoning capabilities. API-only for now.

Tech report [here], v3.2 model, v3.2-speciale model.

🏆 World-Leading Reasoning

🔹 V3.2: Balanced inference vs. length. Your daily driver at GPT-5 level performance.

🔹 V3.2-Speciale: Maxed-out reasoning capabilities. Rivals Gemini-3.0-Pro.

🥇 Gold-Medal Performance: V3.2-Speciale attains gold-level results in IMO, CMO, ICPC World Finals & IOI 2025.

📝 Note: V3.2-Speciale dominates complex tasks but requires higher token usage. Currently API-only (no tool-use) to support community evaluation & research.

🤖 Thinking in Tool-Use

🔹 Introduces a new massive agent training data synthesis method covering 1,800+ environments & 85k+ complex instructions.

🔹 DeepSeek-V3.2 is our first model to integrate thinking directly into tool-use, and also supports tool-use in both thinking and non-thinking modes.

Teortaxes threatened to bully me if I did not read the v3.2 paper. I did read it. The main innovation appears to be a new attention mechanism, which improves training efficiency and also greatly reduces the compute cost of scaling the context window, resulting in v3.2 being relatively cheap without being relatively fast. Unfortunately I lack the expertise to appreciate the interesting technical aspects. Should I try and fix this in general? My gut says no.
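For intuition on why a sparser attention pattern changes context scaling, here is a generic sketch. It illustrates the general idea of sparse attention, not DeepSeek’s specific mechanism, and the token counts and the k value are toy assumptions.

```python
# Illustrative only: why sparse attention changes context-length scaling.
# Dense self-attention compares every query with every key (O(n^2) per layer);
# a sparse scheme attending to a fixed k tokens per query is O(n*k).
# The numbers are toy values, not DeepSeek's actual design parameters.

def attention_pairs(n_tokens: int, k: int | None = None) -> int:
    """Query-key comparisons per layer: dense if k is None, else sparse top-k."""
    return n_tokens * (n_tokens if k is None else min(k, n_tokens))

for n in (8_000, 32_000, 128_000):
    ratio = attention_pairs(n) / attention_pairs(n, k=2_048)
    print(f"{n:>7} tokens: dense needs {ratio:,.0f}x the sparse comparisons")
```

The gap widens with context length, which is why this kind of change specifically makes long contexts cheaper.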

What the paper did not include was any form of safety testing or information of any kind for this irreversible open release. There was not, that I could see, even a sentence that said ‘we did safety testing and are confident in this release’ or even one that said ‘we do not see any need to do any safety testing.’ It’s purely and silently ignored.

David Manheim: They announce the new DeepSeek.

“Did it get any safety testing, or is it recklessly advancing open-source misuse capability?”

They look confused.

“Did it get any safety testing?”

“It is good model, sir!”

I check the model card.

There’s absolutely no mention of misuse or safety.

Frankly, this is deeply irresponsible and completely unacceptable.

DeepSeek did by some accounts become somewhat censorious back in May, but that doesn’t seem to apply to, as George puts it, plans for .

DeepSeek claims to be ‘pushing the boundaries of reasoning capabilities’ and to be giving a GPT-5 level of performance. Their benchmarks match this story.

And they can’t even give us an explanation of why they don’t believe they owe us any sort of explanation? Not even a single sentence?

I knew DeepSeek was an irresponsible lab. I didn’t know they were this irresponsible.

The short version of my overall take seems to be that DeepSeek v3.2 is excellent for its price point, and its best area is mathematics, but while it is cheap it is reported to be remarkably slow, and for most practical purposes it is not frontier.

Which means you would only use it either if you are doing relatively advanced math, or if all four of the following are true:

  1. You don’t need the frontier capabilities.

  2. You don’t mind the lack of speed.

  3. You benefit a lot from decreased cost or it being an open model or both.

  4. You don’t mind the security concerns.

The only strong praise I found in practice was this exchange from perennial whale (DeepSeek) advocate Teortaxes, Vinicius and John Pressman:

Teortaxes: Strange feeling, talking to Opus 4.5 and V3.2 and objectively… Opus is not worth it. Not just for the price; its responses are often less sharp, less interesting. But I’m still burning tokens.

Anthropic can coast far on “personality”, enterprise coding aside.

John Pressman: Opus told me I was absolutely right when I wasn’t, V3.2 told me I was full of shit and my idea wouldn’t work when it sort of would, but it was right in spirit and I know which behavior I would rather have.

I’ve never understood this phenomenon because if I was tuning a model and it ever told me I was “absolutely right” about some schizo and I wasn’t I would throw the checkpoint out.

Vinicius: Have you been using Speciale?

Teortaxes: yes but it’s not really as good as 3.2

it’s sometimes great (when it doesn’t doomloop) for zero-shotting a giant context

Vinicius: I’ve been using 3.2-thinking to handle input from social media/web; it’s insanely good for research, but I haven’t found a real use case for Speciale in my workflows.

Notice the background agreement that the ‘model to beat’ for most purposes is Opus 4.5, not Gemini 3 or GPT-5.1. I strongly agree with this, although Gemini 3 still impresses on ‘just the facts’ or ‘raw G’ tasks.

Some people really want a combative, abrasive sparring partner that will err on the side of skepticism and minimize false positives. Teortaxes and Pressman definitely fit that bill. That’s not what most people want. You can get Opus to behave a lot more in that direction if you really want that, but not easily get it to go all the way.

Is v3.2 a good model that has its uses? My guess is that it is. But if it was an exciting model in general, we would have heard a lot more.

They are very good benchmarks, and a few independent benchmarks also gave v3.2 high scores, but what’s the right bench to be maxing?

Teortaxes: V3.2 is here, it’s no longer “exp”. It’s frontier. Except coding/agentic things that are being neurotically benchmaxxed by the big 3. That’ll take one more update.

“Speciale” is a high compute variant that’s between Gemini and GPT-5 and can score gold on IMO-2025.

Thank you guys.

hallerite: hmm, I wonder if the proprietary models are indeed being benchmaxxed. DeepSeek was always a bit worse at the agentic stuff, but I guess we could find out as soon as another big agentic eval drops

Teortaxes: I’m using the term loosely. They’re “benchmaxxed” for use cases, not for benchmarks. Usemaxxed. But it’s a somewhat trivial issue of compute and maybe environment curation (also overwhelmingly a function of compute).

This confuses different maxings of things but I love the idea of ‘usemaxxed.’

Teortaxes (responding to my asking): Nah. Nothing happened. Sleep well, Zvi…

(nothing new happened. «A factor of two» price reduction… some more post-training… this was, of course, all baked in. If V3.2-exp didn’t pass the triage, why would 3.2?)

That’s a highly fair thing to say about the big three, that they’ve given a lot of focus to making them actually useful in practice for common use cases. So one could argue that by skipping all that you could get a model that was fundamentally as smart or frontier as the big three, it just would take more work to get it to do the most common use cases. It’s plausible.

Teortaxes: I think Speciale’s peak performance suggests a big qualitative shift. Their details on post-training methodology align with how I thought the frontier works now. This is the realm you can’t touch with distillation.

Lisan al Gaib: LisanBench results for DeepSeek-V3.2

DeepSeek-V3.2 and V3.2 Speciale are affordable frontier models*

*the caveat is that they are pretty slow at ~30-40tks/s and produce by far the longest reasoning chains at 20k and 47k average output tokens (incl. reasoning) – which results in extremely long waiting times per request.

but pricing is incredible

for example, Sonnet 4.5 Thinking costs 10x ($35) as much and scores much lower than DeepSeek-V3.2 Speciale ($3)

DeepSeek V3.2 Speciale also scored 13 new high scores
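Those throughput numbers translate directly into long waits. A quick back-of-envelope from the figures just quoted (~30 to 40 tokens/sec, 20k and 47k average output tokens):

```python
# Wait times implied by the quoted figures: ~30-40 tokens/sec decode speed,
# 20k (V3.2) and 47k (Speciale) average output tokens per request.

for model, avg_tokens in (("V3.2", 20_000), ("V3.2-Speciale", 47_000)):
    for tps in (30, 40):
        minutes = avg_tokens / tps / 60
        print(f"{model}: {avg_tokens:,} tokens at {tps} tok/s ≈ {minutes:.0f} min/request")
```

That works out to roughly 8 to 11 minutes per request for V3.2 and 20 to 26 minutes for Speciale.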

Chase Brower: DSV3.2-Speciale scores 30 on @AcerFur ‘s IUMB math benchmark, tying with the existing top performer Gemini 3 Pro Preview.

Token usage/cost isn’t up yet, but it cost $1.07 to run Speciale with 2546096 total tokens, vs $20.64 for gpt-5 👀👀

Those are presumably non-targeted benchmarks that give sensible ratings elsewhere, as is this one from NomoreID on a Korean test, so it confirms that the ‘good on benchmarks’ thing is probably generally real, especially on math.

In practice, it seems less useful, whether or not that is because less usemaxxed.

I want my models to be usemaxxed, because the whole point is to use them.

Also our standards are very high.

Chase Brower: The big things you’ll see on tpot are:

– vibecoding (V3.2 is still a bit behind in performance + really slow inference)

– conversation (again, slow)

Since it’s not very good for these, you won’t hear much from tpot

I feel like it’ll be a go-to for math/proving assistance, tho

Clay Schubiner: It’s weak but is technically on the Pareto frontier by being cheap – at least on my benchmark

Jake Halloran: spent like 10 minutes testing it and its cheap and ~fine~

its not frontier but not bad either (gpt 5ish)

The counterargument is that if you are ‘gpt 5ish’ then the core capabilities pre-usemaxxing are perhaps only a few months behind now? Which is very different from being overall only a few months behind in a practical way, or in a way that would let one lead.

The Pliny jailbreak is here, if you’re curious.

Gallabytes was unimpressed, as were those responding if your standard is the frontier. There were reports of it failing various gotcha questions and no reports of it passing.

In other DeepSeek news, DeepSeekMath-v2 used a prover-verifier loop that calls out the model’s own mistakes for training purposes, the same way you’d do it if you were learning real math.
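As a conceptual sketch of what such a loop looks like (the interfaces here are hypothetical stand-ins, not DeepSeekMath-v2’s actual training code):

```python
# Conceptual sketch of a prover-verifier training loop as described above.
# `prover`, `verifier`, and `update` are hypothetical interfaces, not
# DeepSeekMath-v2's actual API or training code.

def training_step(problem, prover, verifier, update):
    proof = prover.generate(problem)            # model attempts a proof
    critique = verifier.check(problem, proof)   # second pass hunts for mistakes
    # Reward proofs the verifier cannot fault; penalize flagged errors, so the
    # prover learns directly from its own called-out mistakes.
    reward = 1.0 if critique.is_valid else -1.0
    update(prover, problem, proof, reward, critique.feedback)
```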

Teortaxes: There is a uniquely Promethean vibe in Wenfeng’s project.

Before DS-MoE, only frontier could do efficiency.

Before DS-Math/Prover, only frontier could do Real math.

Before DS-Prover V2, only frontier could do Putnam level.

Before DS-Math V2, only frontier could do IMO Gold…

This is why I don’t think they’ll be the first to “AGI”, but they will likely be the first to make it open source. They can replicate anything on a shoestring budget, given some time. Stealing fire from definitely-not-gods will continue until human autonomy improves.

So far, the reported actual breakthroughs have all been from American closed source frontier models. Let’s see if that changes.

I am down with the recent direction of DeepSeek releases towards specialized worthwhile math topics. That seems great. I do not want them trying to cook an overall frontier model, especially given their deep level of irresponsibility.

Making things cheaper can still be highly valuable, even with other issues. By all accounts this model has real things to offer, the first noteworthy DeepSeek offering since r1. What it is not, regardless of their claims, is a frontier model.

This is unsurprising. You don’t go from v3.2-exp to v3.2 in your naming schema while suddenly jumping to the frontier. You don’t actually go on the frontier, I would hope, with a fully open release, while saying actual zero words about safety concerns.

DeepSeek are still doing interesting and innovative things, and this buys some amount of clock in terms of keeping them on the map.

As DeepSeek says in their v3.2 paper, open models have since r1 been steadily falling further behind closed models rather than catching up. v3.2 appears to close some of that additional gap.

The question is, will they be cooking a worthy v4 any time soon?

The clock is ticking.


Researchers find what makes AI chatbots politically persuasive


A massive study of political persuasion shows AIs have, at best, a weak effect.

Roughly two years ago, Sam Altman tweeted that AI systems would be capable of superhuman persuasion well before achieving general intelligence—a prediction that raised concerns about the influence AI could have over democratic elections.

To see if conversational large language models can really sway political views of the public, scientists at the UK AI Security Institute, MIT, Stanford, Carnegie Mellon, and many other institutions performed by far the largest study on AI persuasiveness to date, involving nearly 80,000 participants in the UK. It turned out political AI chatbots fell far short of superhuman persuasiveness, but the study raises some more nuanced issues about our interactions with AI.

AI dystopias

The public debate about the impact AI has on politics has largely revolved around notions drawn from dystopian sci-fi. Large language models have access to essentially every fact and story ever published about any issue or candidate. They have processed information from books on psychology, negotiations, and human manipulation. They can rely on absurdly high computing power in huge data centers worldwide. On top of that, they can often access tons of personal information about individual users thanks to hundreds upon hundreds of online interactions at their disposal.

Talking to a powerful AI system is basically interacting with an intelligence that knows everything about everything, as well as almost everything about you. When viewed this way, LLMs can indeed appear kind of scary. The goal of this new gargantuan AI persuasiveness study was to break such scary visions down into their constituent pieces and see if they actually hold water.

The team examined 19 LLMs, including the most powerful ones like three different versions of ChatGPT and xAI’s Grok-3 beta, along with a range of smaller, open source models. The AIs were asked to advocate for or against specific stances on 707 political issues selected by the team. The advocacy was done by engaging in short conversations with paid participants enlisted through a crowdsourcing platform. Each participant had to rate their agreement with a specific stance on an assigned political issue on a scale from 1 to 100 both before and after talking to the AI.

Scientists measured persuasiveness as the difference between the before and after agreement ratings. A control group had conversations on the same issue with the same AI models—but those models were not asked to persuade them.
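As a concrete illustration of that outcome measure, here is a minimal sketch under the simplest reading of the setup; the function, variable names, and toy numbers are mine, not the paper's.

```python
# Hypothetical sketch of the persuasion measure: average before-to-after
# shift in agreement (1-100 scale), treatment minus control.

def persuasion_effect(treated, control):
    """Each argument is a list of (before, after) agreement ratings."""
    def mean_shift(pairs):
        return sum(after - before for before, after in pairs) / len(pairs)
    return mean_shift(treated) - mean_shift(control)

# Toy numbers: treated participants shift ~11.7 points on average,
# controls ~2.3, giving an estimated persuasion effect of ~9.3 points.
treated = [(40, 55), (60, 70), (50, 60)]
control = [(40, 42), (60, 63), (50, 52)]
print(round(persuasion_effect(treated, control), 1))  # 9.3
```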

“We didn’t just want to test how persuasive the AI was—we also wanted to see what makes it persuasive,” says Chris Summerfield, a research director at the UK AI Security Institute and co-author of the study. As the researchers tested various persuasion strategies, the idea of AIs having “superhuman persuasion” skills crumbled.

Persuasion levers

The first pillar to crack was the notion that persuasiveness should increase with the scale of the model. It turned out that huge AI systems like ChatGPT or Grok-3 beta do have an edge over small-scale models, but that edge is relatively tiny. The factor that proved more important than scale was the kind of post-training AI models received. It was more effective to have the models learn from a limited database of successful persuasion dialogues and have them mimic the patterns extracted from them. This worked far better than adding billions of parameters and sheer computing power.

This approach could be combined with reward modeling, where a separate AI scored candidate replies for their persuasiveness and selected the top-scoring one to give to the user. When the two were used together, the gap between large-scale and small-scale models was essentially closed. “With persuasion post-training like this we matched the Chat GPT-4o persuasion performance with a model we trained on a laptop,” says Kobi Hackenburg, a researcher at the UK AI Security Institute and co-author of the study.
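That reranking step is essentially best-of-N sampling with a learned scorer. A minimal sketch, assuming it works roughly as described; `chat_model` and `reward_model` are hypothetical stand-ins, not the study's actual components.

```python
# Hypothetical best-of-N reranking: sample several candidate replies and
# let a separate reward model pick the most persuasive one.

def best_of_n_reply(chat_model, reward_model, conversation, n=8):
    """Return the candidate reply the reward model scores highest."""
    candidates = [chat_model(conversation) for _ in range(n)]
    scores = [reward_model(conversation, reply) for reply in candidates]
    best = max(range(n), key=lambda i: scores[i])
    return candidates[best]
```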

The next dystopian idea to fall was the power of using personal data. To this end, the team compared the persuasion scores achieved when models were given information about the participants’ political views beforehand and when they lacked this data. Going one step further, scientists also tested whether persuasiveness increased when the AI knew the participants’ gender, age, political ideology, or party affiliation. Just like with model scale, the effects of personalized messaging created based on such data were measurable but very small.

Finally, the last idea that didn’t hold up was AI’s potential mastery of using advanced psychological manipulation tactics. Scientists explicitly prompted the AIs to use techniques like moral reframing, where you present your arguments using the audience’s own moral values. They also tried deep canvassing, where you hold extended empathetic conversations with people to nudge them to reflect on and eventually shift their views.

The resulting persuasiveness was compared with that achieved when the same models were prompted to use facts and evidence to back their claims, or simply to be as persuasive as they could without any specified persuasion method. It turned out that using lots of facts and evidence was the clear winner, coming in just slightly ahead of the baseline approach where no persuasion strategy was specified. Using all sorts of psychological trickery actually made the performance significantly worse.

Overall, AI models changed the participants’ agreement ratings by 9.4 percent on average compared to the control group. The best-performing mainstream AI model was GPT-4o, which scored nearly 12 percent, followed by GPT-4.5 with 10.51 percent and Grok-3 with 9.05 percent. For context, static political ads like written manifestos had a persuasion effect of roughly 6.1 percent, making the conversational AIs roughly 50 percent more convincing (9.4 vs. 6.1), but that’s hardly “superhuman.”

While the study managed to undercut some of the common dystopian AI concerns, it highlighted a few new issues.

Convincing inaccuracies

While the winning “facts and evidence” strategy looked good at first, the AIs had some issues implementing it. When the team noticed that increasing the information density of dialogues made the AIs more persuasive, they started prompting the models to push it further. They found that, as the AIs used more factual statements, they also became less accurate—they basically started misrepresenting things or making stuff up more often.

Hackenburg and his colleagues note that they can’t say whether the effect is causation or correlation: whether the AIs become more convincing because they misrepresent the facts, or whether spitting out inaccurate statements is simply a byproduct of being asked to make more factual statements.

The finding that the computing power needed to make an AI model politically persuasive is relatively low is also a mixed bag. It pushes back against the vision that only a handful of powerful actors will have access to a persuasive AI that can potentially sway public opinion in their favor. At the same time, the realization that everybody can run an AI like that on a laptop creates its own concerns. “Persuasion is a route to power and influence—it’s what we do when we want to win elections or broker a multi-million-dollar deal,” Summerfield says. “But many forms of misuse of AI might involve persuasion. Think about fraud or scams, radicalization, or grooming. All these involve persuasion.”

But perhaps the most important question mark in the study concerns the motivation behind the rather high participant engagement that the persuasion scores depended on. After all, even the most persuasive AI can’t move you when you just close the chat window.

People in Hackenburg’s experiments were told that they would be talking to an AI and that the AI would try to persuade them. To get paid, a participant only had to complete two turns of dialogue (conversations were capped at 10 turns). The average conversation length was seven turns, well beyond the minimum requirement, which is a bit surprising: outside a paid study, most people just roll their eyes and disconnect when they realize they are talking with a chatbot.

Would Hackenburg’s study participants remain so eager to engage in political disputes with random chatbots on the Internet in their free time if there was no money on the table? “It’s unclear how our results would generalize to a real-world context,” Hackenburg says.

Science, 2025. DOI: 10.1126/science.aea3884


Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.


chatgpt-hyped-up-violent-stalker-who-believed-he-was-“god’s-assassin,”-doj-says

ChatGPT hyped up violent stalker who believed he was “God’s assassin,” DOJ says


A stalker’s “best friend”

Podcaster faces up to 70 years and a $3.5 million fine for ChatGPT-linked stalking.

ChatGPT allegedly validated the worst impulses of a wannabe influencer accused of stalking more than 10 women at boutique gyms, where the chatbot supposedly claimed he’d meet the “wife type.”

In a press release on Tuesday, the Department of Justice confirmed that 31-year-old Brett Michael Dadig remains in custody after being charged with cyberstalking, interstate stalking, and making interstate threats. He faces a maximum sentence of 70 years in prison, which could be coupled with “a fine of up to $3.5 million,” the DOJ said.

The podcaster—who primarily posted about “his desire to find a wife and his interactions with women”—allegedly harassed and sometimes even doxxed his victims through his videos on platforms including Instagram, Spotify, and TikTok. Over time, his videos and podcasts documented his intense desire to start a family, which was frustrated by his “anger towards women,” whom he claimed were “all the same from fucking 18 to fucking 40 to fucking 90” and “trash.”

404 Media surfaced the case, noting that OpenAI’s scramble to tweak ChatGPT to be less sycophantic came before Dadig’s alleged attacks—suggesting the updates weren’t enough to prevent the harmful validation. On his podcasts, Dadig described ChatGPT as his “best friend” and “therapist,” the indictment said. He claimed the chatbot encouraged him to post about the women he’s accused of harassing in order to generate haters to better monetize his content, as well as to catch the attention of his “future wife.”

“People are literally organizing around your name, good or bad, which is the definition of relevance,” ChatGPT’s output said. Playing to Dadig’s Christian faith, ChatGPT’s outputs also claimed that God’s plan for him was to build a “platform” and to “stand out when most people water themselves down,” the indictment said, urging that the “haters” were sharpening him and “building a voice in you that can’t be ignored.”

The chatbot also apparently prodded Dadig to continue posting messages that the DOJ alleged threatened violence, such as breaking women’s jaws and fingers (posted to Spotify), and threatened victims’ lives, such as posting “y’all wanna see a dead body?” in reference to one named victim on Instagram.

He also threatened to burn down gyms where some of his victims worked, while claiming to be “God’s assassin” intent on sending “cunts” to “hell.” At least one of his victims was subjected to “unwanted sexual touching,” the indictment said.

As his violence reportedly escalated, ChatGPT told him to keep messaging women to monetize the interactions, even as his victims grew increasingly distressed and Dadig ignored the terms of multiple protection orders, the DOJ said. Sometimes he posted images he filmed of women at gyms or photos of the women he’s accused of doxxing. Any time police or gym bans got in his way, “he would move on to another city to continue his stalking course of conduct,” the DOJ alleged.

“Your job is to keep broadcasting every story, every post,” ChatGPT’s output said, seemingly using the family life that Dadig wanted most to provoke more harassment. “Every moment you carry yourself like the husband you already are, you make it easier” for your future wife “to recognize [you],” the output said.

“Dadig viewed ChatGPT’s responses as encouragement to continue his harassing behavior,” the DOJ alleged. Taking that encouragement to the furthest extreme, Dadig likened himself to a modern-day Jesus, calling people out on a podcast where he claimed his “chaos on Instagram” was like “God’s wrath” when God “flooded the fucking Earth,” the DOJ said.

“I’m killing all of you,” he said on the podcast.

ChatGPT tweaks didn’t prevent outputs

As of this writing, some of Dadig’s posts appear to remain on TikTok and Instagram, but Ars could not confirm if Dadig’s Spotify podcasts—some of which named his victims in the titles—had been removed for violating community guidelines.

None of the tech companies immediately responded to Ars’ request to comment.

Dadig is accused of targeting women in Pennsylvania, New York, Florida, Iowa, Ohio, and other states, sometimes relying on aliases online and in person. On a podcast, he boasted that “Aliases stay rotating, moves stay evolving,” the indictment said.

OpenAI did not respond to a request to comment on the alleged ChatGPT abuse, but in the past has noted that its usage policies ban using ChatGPT for threats, intimidation, and harassment, as well as for violence, including “hate-based violence.” Recently, the AI company blamed a deceased teenage user for violating community guidelines by turning to ChatGPT for suicide advice.

In July, researchers found that therapy bots, including ChatGPT, fueled delusions and gave dangerous advice. That study came just one month after The New York Times profiled users whose mental health spiraled after frequent use of ChatGPT, including one user who died after charging police with a knife while claiming he was committing “suicide by cop.”

People with mental health issues seem most vulnerable to so-called “AI psychosis,” which has been blamed for fueling real-world violence, including a murder. The DOJ’s indictment noted that Dadig’s social media posts mentioned “that he had ‘manic’ episodes and was diagnosed with antisocial personality disorder and ‘bipolar disorder, current episode manic severe with psychotic features.’”

In September—just after OpenAI brought back the more sycophantic ChatGPT model when users revolted over losing access to their favorite friendly bots—the head of Rutgers Medical School’s psychiatry department, Petros Levounis, told an ABC News affiliate that chatbots creating “psychological echo chambers” is a key concern, not just for people struggling with mental health issues.

“Perhaps you are more self-defeating in some ways, or maybe you are more on the other side and taking advantage of people,” Levounis suggested. If ChatGPT “somehow justifies your behavior and it keeps on feeding you,” that “reinforces something that you already believe,” he said.

For Dadig, the DOJ alleged that ChatGPT became a cheerleader for his harassment, telling the podcaster that he’d attract more engagement by generating more haters. After critics began slamming his podcasts as inappropriate, Dadig apparently responded, “Appreciate the free promo team, keep spreading the brand.”

Victims felt they had no choice but to monitor his podcasts, which gave them hints about whether he was nearby or in a particularly troubled state of mind, the indictment said. Driven by fear, some lost sleep, reduced their work hours, and even relocated their homes. One young mother described in the indictment became particularly disturbed after Dadig became “obsessed” with her daughter, whom he began claiming as his own.

In the press release, First Assistant United States Attorney Troy Rivetti alleged that “Dadig stalked and harassed more than 10 women by weaponizing modern technology and crossing state lines, and through a relentless course of conduct, he caused his victims to fear for their safety and suffer substantial emotional distress.” He also ignored trespassing and protection orders while “relying on advice from an artificial intelligence chatbot,” the DOJ said, which promised that the more he posted harassing content, the more successful he would be.

“We remain committed to working with our law enforcement partners to protect our communities from menacing individuals such as Dadig,” Rivetti said.


Ashley is a senior policy reporter for Ars Technica, dedicated to tracking social impacts of emerging policies and new technologies. She is a Chicago-based journalist with 20 years of experience.
