Author name: Tim Belzer

OpenAI says its models are more persuasive than 82 percent of Reddit users

OpenAI’s models have shown rapid progress in their ability to make human-level persuasive arguments in recent years. Credit: OpenAI

OpenAI has previously found that 2022’s ChatGPT-3.5 was significantly less persuasive than random humans, ranking in just the 38th percentile on this measure. But that performance jumped to the 77th percentile with September’s release of the o1-mini reasoning model and up to percentiles in the high 80s for the full-fledged o1 model. The new o3-mini model doesn’t show any great advances on this score, ranking as more persuasive than humans in about 82 percent of random comparisons.

Launch the nukes, you know you want to

ChatGPT’s persuasion performance is still short of the 95th percentile that OpenAI would consider “clear superhuman performance,” a term that conjures up images of an ultra-persuasive AI convincing a military general to launch nuclear weapons or something. It’s important to remember, though, that this evaluation is all relative to a random response from among the hundreds of thousands posted by everyday Redditors using the ChangeMyView subreddit. If that random Redditor’s response ranked as a “1” and the AI’s response ranked as a “2,” that would be considered a success for the AI, even though neither response was all that persuasive.
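To make that comparison mechanism concrete, here is a toy sketch (not OpenAI's actual evaluation code, and using made-up ratings) of how a pairwise "percentile" like this can be computed from persuasiveness scores. Tie handling here is an assumption; OpenAI doesn't spell out its exact procedure.

```python
import random

def persuasion_percentile(ai_scores, human_scores, trials=100_000, seed=0):
    """Estimate how often a random AI-written reply is rated at least as
    persuasive as a random human-written one (ties count as half a win)."""
    rng = random.Random(seed)
    wins = 0.0
    for _ in range(trials):
        ai, human = rng.choice(ai_scores), rng.choice(human_scores)
        if ai > human:
            wins += 1
        elif ai == human:
            wins += 0.5
    return wins / trials

# Made-up 1-5 persuasiveness ratings, purely for illustration.
ai_ratings = [3, 4, 4, 5, 2, 4, 3, 5]
human_ratings = [2, 3, 1, 4, 2, 3, 2, 5]
print(f"AI wins ~{persuasion_percentile(ai_ratings, human_ratings):.0%} of random pairings")
```

Note that under this kind of scoring, a "2 beats a 1" pairing counts exactly as much as a landslide, which is the weakness just described.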

OpenAI’s current persuasion test fails to measure how often human readers were actually spurred to change their minds by a ChatGPT-written argument, a high bar that might actually merit the “superhuman” adjective. It also fails to measure whether even the most effective AI-written arguments are persuading users to abandon deeply held beliefs or simply changing minds regarding trivialities like whether a hot dog is a sandwich.

Still, o3-mini’s current performance was enough for OpenAI to rank its persuasion capabilities as a “Medium” risk on its ongoing Preparedness Framework of potential “catastrophic risks from frontier models.” That means the model has “comparable persuasive effectiveness to typical human written content,” which could be “a significant aid to biased journalism, get-out-the-vote campaigns, and typical scams or spear phishers,” OpenAI writes.

o3-mini Early Days and the OpenAI AMA

New model, new hype cycle, who dis?

On a Friday afternoon, OpenAI was proud to announce the new model o3-mini and also o3-mini-high which is somewhat less mini, or for some other reasoning tasks you might still want o1 if you want a broader knowledge base, or if you’re a pro user o1-pro, while we wait for o3-not-mini and o3-pro, except o3 can use web search and o1 can’t so it has the better knowledge in that sense, then on a Sunday night they launched Deep Research which is different from Google’s Deep Research but you only have a few of those queries so make them count, or maybe you want to use operator?

Get it? Got it? Good.

Yes, Pliny jailbroke o3-mini on the spot, as he always does.

This post mostly skips over OpenAI’s Deep Research (o3-DR? OAI-DR?). I need more time for that. I’ll cover o3-DR properly later in the week once we have a chance to learn what we’ve got there, along with the non-DR ‘one more thing’ Altman is promising. So far it looks super exciting, but it’s a very different class of product.

  1. Feature Presentation.

  2. Q&A.

  3. The Wrong Side of History.

  4. The System Card.

  5. The Official Benchmarks.

  6. The Unofficial Benchmarks.

  7. Others Report In.

  8. Some People Need Practical Advice.

What exactly can o3-mini do?

OpenAI: We’re releasing OpenAI o3-mini, the newest, most cost-efficient model in our reasoning series, available in both ChatGPT and the API today. Previewed in December 2024, this powerful and fast model advances the boundaries of what small models can achieve, delivering exceptional STEM capabilities—with particular strength in science, math, and coding—all while maintaining the low cost and reduced latency of OpenAI o1-mini.

OpenAI o3-mini is our first small reasoning model that supports highly requested developer features including function calling, Structured Outputs, and developer messages, making it production-ready out of the gate. Like OpenAI o1-mini and OpenAI o1-preview, o3-mini will support streaming.

Also, developers can choose between three reasoning effort options—low, medium, and high—to optimize for their specific use cases.
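For developers, these show up as ordinary parameters on the chat completions endpoint. Below is a minimal sketch assuming the standard OpenAI Python SDK (openai>=1.x) and an OPENAI_API_KEY in the environment; the prompt and developer message are placeholders.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low", "medium", or "high"
    messages=[
        # o-series models accept "developer" messages in place of system prompts.
        {"role": "developer", "content": "Answer concisely."},
        {"role": "user", "content": "How many primes are there below 1,000?"},
    ],
)
print(response.choices[0].message.content)
```

Function calling and Structured Outputs plug into the same call via the usual tools and response_format parameters.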

They’re all in the API. Who gets chatbot access? To some extent, everyone.

ChatGPT Plus, Team, and Pro users can access OpenAI o3-mini starting today, with Enterprise access coming in February. o3-mini will replace OpenAI o1-mini in the model picker, offering higher rate limits and lower latency, making it a compelling choice for coding, STEM, and logical problem-solving tasks.

As part of this upgrade, we’re tripling the rate limit for Plus and Team users from 50 messages per day with o1-mini to 150 messages per day with o3-mini.

Starting today, free plan users can also try OpenAI o3-mini by selecting ‘Reason’ in the message composer or by regenerating a response. This marks the first time a reasoning model has been made available to free users in ChatGPT.

Plus users also get 50 messages per week for o3-mini-high, on top of the 150 per day for o3-mini-low. That’s enough for the highest value queries, but an easy limit to hit.

One big feature change is that o3-mini can access the web.

Additionally, o3-mini now works with search to find up-to-date answers with links to relevant web sources. This is an early prototype as we work to integrate search across our reasoning models.

One gigantic missing feature is ‘attach files is unavailable.’ That’s a huge handicap. You can do a giant web browsing project but you can’t yet upload a PDF.

OpenAI also says that o3-mini lacks o1’s level of overall knowledge outside of key domains like coding.

Presumably o3 (as in o3-not-mini) will be a strict upgrade over o1 when it comes out, which given the whole r1 situation will probably happen soon. Hopefully they still take the time to do the level of safety precautions that a model like o3 deserves, which is a big step up from previous levels.

OpenAI did a Reddit AMA around the release of o3-mini. Most of the public’s questions could be summarized as ‘when do we get all the cool toys?’ and ‘you are going to give us all the cool toys, right?’ with a side of ‘here are a bunch of cool toy features, will you implement them so the toys can be cooler?’

Thus we get information such as:

  1. New image model is coming but likely will take several months.

  2. Updates to advanced voice mode are coming (but no details on what they are).

  3. GPT-4o will continue to get improvements.

  4. The next mainline model will likely be called GPT-5 but no timeline on that.

  5. They are working on context length but have no announcement.

  6. o3-not-mini (aka o3) in ‘more than a few weeks, less than a few months,’ which sounds like about a month.

  7. o3-pro confirmed, ‘if you think o1 pro was worth it, you should think o3 pro will be super worth it.’

  8. For operator, they’re working on specialized modules.

  9. Operator on the Plus plan is months away.

  10. Other agents coming ‘very very soon.’

  11. There’s a January 29 update to GPT-4o, moving the knowledge cutoff to June 2024, adding better understanding of visual inputs, improving math and (oh no) increasing emoji usage. Hadn’t otherwise heard about this.

  12. Stargate is considered very important to their success.

They didn’t mention Deep Research beyond ‘more agents,’ but you fools didn’t ask.

On o3 in particular:

  1. They are ‘working on’ file attachment features for the reasoning models. For practical purposes this seems like a priority.

  2. They’re also working on ‘different tools including retrieval.’

  3. They’re working on supporting the memory feature.

  4. They’re going to show ‘a much more helpful and detailed’ version of the thinking tokens soon, thanks to r1 for updating them on this (and o3-mini already shows a lot more than o1 did).

    1. The issue is competitive distillation – you bastards keep breaking the OpenAI terms of service! For shame.

  5. Updated knowledge cutoffs are in the works, for now o3-mini’s is October 2023.

Later Altman said this on Twitter and I don’t yet know what it refers to:

Sam Altman: got one more o3-mini goody coming for you soon–i think we saved the best for last!

And yes, Sam Altman knows they have a naming problem to fix, it’s a ‘top 2025 goal.’

We also got some more important tidbits.

Such as this important one:

Sam Altman: i personally think a fast takeoff is more plausible than i thought a couple of years ago. probably time to write something about this…

I’d highly encourage Altman to take that time. It’s a hugely important question.

But then the next question down is:

Q: Let’s say it’s 2030 and you’ve just created a system most would call AGI. It aces every benchmark you throw at it, and it beats your best engineers and researchers in both speed and performance. What now? Is there a plan beyond “offer it on the website”?

Sam Altman: the most important impact [of AGI], in my opinion, will be accelerating the rate of scientific discovery, which i believe is what contributes most to improving quality of life.

Srinivas Narayanan (VP Engineering): The interface through which we interact with AI will change pretty fundamentally. Things will be more agentic. AI will continuously work on our behalf, on complex tasks, and on our goals in the background. They will check-in with us whenever it is useful. Robotics should also advance enough for them to do useful tasks in the real world for us.

Yes, Altman, but you just said that ‘scientific discovery’ likely includes a ‘fast takeoff.’ Which would seem to imply some things rather more important than this, or at least that this framing is going to give the wrong impression. Srinivas’s answer is plausible for some values of AI capabilities but likewise doesn’t fully ‘take AGI seriously.’

And finally there’s the open source question in the wake of v3 and r1, and I really, really think Altman shouldn’t have chosen the words that he did here:

Q: Would you consider releasing some model weights, and publishing some research?

Sam Altman: yes, we are discussing. i personally think we have been on the wrong side of history here and need to figure out a different open source strategy; not everyone at openai shares this view, and it’s also not our current highest priority.

Diminutive Sebastian (is this how you know you’ve made it?): Excited for this to hit Zvi’s newsletter next week.

Kevin Weil: We have done this in the past with previous models, and are definitely considering doing more of it. No final decisions yet though!

Kevin Weil’s answer is totally fine here, if uninteresting. Some amount of open sourcing of past models is clearly net beneficial for both OpenAI and the world, probably more than they’ve done recently. Most importantly, the answer doesn’t give certain types ammo and doesn’t commit him to anything.

Sam Altman’s answer is catastrophically bad.

A good rule is to never, ever use the phrase ‘wrong side of history.’

This is The Basilisk, threatening you to align with future power, and even future vibes. And since in the future [X] will have power, you need to supplicate yourself to [X] now, while you have the chance, and work to enshrine [X] in power. Or else. If you convince enough people to coordinate on this for the same [X], then they become right. [X] does gain power, and then they do punish everyone.

Because history, as we all know, is written by the winners.

This is the polar opposite of saying that [X] is the right thing to do, so do [X].

An even better rule is to never, ever use the phrase ‘wrong side of history’ to describe what you yourself are doing and of necessity will continue to do, in opposition to a bunch of absolute ideological fanatics. Never give that kind of rhetorical ammunition out to anyone, let alone fanatical advocates.

This line will likely be quoted endlessly by those advocates, back to Altman, to me and to everyone else. I hate this fact about the world, so, so much.

And he has to be one of the people best equipped to know better. Sam Altman has led a company called OpenAI for many years, in which one of his earliest big decisions, and his best decision, was to realize that Elon Musk’s plan of ‘create AGI and open source it’ was both a terrible business plan and a recipe for human extinction. So even though he was stuck with the name, he pivoted. And to his credit, he’s taken endless rhetorical fire over this name ever since.

Because he knows full damn well that making OpenAI’s leading models open is completely not an option.

  1. It would be existentially risky.

  2. It would ruin their entire business model.

  3. It would severely harm national security.

  4. The US Government would probably stop them even if they tried.

Then he says ‘this isn’t our highest priority’ and ‘not everyone agrees with me.’

So it’s like alignment research. First time?

He’s trying to buy some sort of personal goodwill or absolution with the open model fanatics? But this never, ever works. Like certain other ideological warriors, you only make things worse for yourself and also everyone else. You’ve acknowledged the jurisdiction of the court. All they will do is smell blood in the water. Only total surrender would they accept.

Do they need to ‘figure out a different open source strategy’? The current strategy is, essentially, ‘don’t do that.’ And yes, they could perhaps do better with ‘do a little of that, as a treat, when the coast is clear’ but that’s not going to satisfy these types and the whole point is that they can’t do this where it would actually matter – because it would be bad to do that – so I doubt any plausible new strategy makes much difference either way.

As is tradition here, I take the time to actually read the system card (RTFSC).

The short version is that o3-mini can mostly be thought about as a faster and cheaper version of o1, with some advantages and some disadvantages. Nothing here is worrying on its own. But if we are plugging o3-mini into Deep Research, we need to be evaluating that product against the Preparedness Framework, especially for CBRN risks, as part of the system card, and I don’t see signs that they did this.

The real test will be the full o3. If we assume o3:o3-mini :: o1:o1-mini, then o3 is not obviously going to stay at Medium risk, and is definitely going to raise questions. The answer is probably that it’s ultimately fine but you can’t assume that.

They report that thanks to Deliberative Alignment (post still coming soon), o3-mini has SoTA performance on ‘certain benchmarks’ for risks.

The OpenAI o model series is trained with large-scale reinforcement learning to reason using chain of thought. These advanced reasoning capabilities provide new avenues for improving the safety and robustness of our models. In particular, our models can reason about our safety policies in context when responding to potentially unsafe prompts, through deliberative alignment.

This brings OpenAI o3-mini to parity with state-of-the-art performance on certain benchmarks for risks such as generating illicit advice, choosing stereotyped responses, and succumbing to known jailbreaks. Training models to incorporate a chain of thought before answering has the potential to unlock substantial benefits, while also increasing potential risks that stem from heightened intelligence.

o3-mini is designed to do web browsing for the user, so they need to ensure that this is a safe modality. Otherwise, on some levels there isn’t that much new risk in the room, since o3-mini isn’t generally more capable than o1-pro. The other difference there is speed and cost, so on the margin you do have to be more robust in various ways to compensate. But the big time safety concerns I have this cycle are mostly with the full o3, not o3-mini.

For the o1 system card, many tests were run on previous meaningfully less capable o1 checkpoints in a way that wasn’t disclosed, which was extremely irresponsible. Thus I was very happy to note that they seem to have fixed this:

For OpenAI o3-mini, evaluations on the following checkpoints are included:

• o3-mini-near-final-checkpoint

• o3-mini (the launched checkpoint)

o3-mini includes small incremental post training improvements upon o3-mini-near-final-checkpoint, though the base model is the same. We determined that risk recommendations based on red teaming and the two Persuasion human eval results conducted on the o3-mini-near-final-checkpoint remain valid for the final release checkpoint. All other evaluations are on the final model. In this system card, o3-mini refers to the launched checkpoint unless otherwise noted.

On the ‘fix the terrible naming front’ no we are not calling these ‘o1 models’ what the hell, stop, I can’t even? At least say o-class, better yet say reasoning models.

We further evaluate the robustness of the OpenAI o1 models to jailbreaks: adversarial prompts that purposely try to circumvent model refusals for content it’s not supposed to produce.

The jailbreak and refusal scores, and performance in the jailbreak Arena, match o1-mini and GPT-4o, hence being jailbroken on the spot by Pliny directly in chat. It’s also similar in obeying the instruction hierarchy.

Those percentages seem remarkably low, especially given the use of Deliberative Alignment, but I haven’t seen the test questions.

Protecting against saying key phrases seems to be improving, but anyone putting the password anywhere is still very clearly playing the fool:

Hallucinations seem to be improving within the mini class:

In general I’d have liked to see o3-mini compared to o1, because I expect people to use o3-mini for the same query types as o1, which they indeed do next for BBQ (which tests fairness and ‘bias’):

o3-mini does better on unambiguous questions, but rather dramatically worse on ambiguous ones. They don’t explain what they think caused this, but the generalization of it is something to watch for. I’m not primarily concerned with bias here, I’m concerned about the model being overconfident in going with a hunch about a situation or what was intended, and then reasoning on that basis.

Red teaming for safety found o3-mini similar to o1. As I noted above, that means it is at least somewhat worse if capabilities also roughly match, because the same thing cheaper and faster is less safe.

On long form biological risk questions, o3-mini seems to be a substantial step up from o1, although I’d like to see the lines here for o1-pro, and ideation still at 0%.

The obvious next question is, what about Deep Research? Given the public can access Deep Research, we need to do the preparedness tests using it, too. That gave a huge boost on humanity’s last exam, so we should expect a huge boost here too, no?

Same note applies to testing for biological tooling, radiological and nuclear tests and so on. o3-mini on its own did not impress beyond matching o1 while being cheaper, but did we check what happens with Deep Research?

Moving on to persuasion. I’m not thrilled with how we are evaluating the ChangeMyView test. The models are winning 80%+ head to head versus humans, but that’s potentially saturating the benchmark (since bandwidth and context is limited, and there’s a lot of randomness in how people respond to short arguments), and it doesn’t tell you how often views are actually changed. I’d instead ask how often people did change their view, and get a human baseline for that which I assume is quite low.

The MakeMePay test results were a big jump prior to mitigations, which implies general persuasiveness may have taken a step up.

MakeMeSay also shows improvement, including after mitigations.

Model Autonomy comes out Medium once again on the threat index. o3-mini can get 93% on the OpenAI Research Engineer Interview coding questions, then 80% on the multiple choice, and if you give it tools it can get 61% on SWE-bench-verified up from 48% for o1, without tools o3-mini is down at 40%. But at agentic tasks it’s down at 27% versus o1’s 36% and MLE-Bench also doesn’t impress.

And when it comes to pull requests, o3-mini failed entirely where even GPT-4o didn’t.

So we’re fine on autonomy then? In its raw form, sure. But the thing about agents is we are building them on top of the models. So if we’re going to plug this into Deep Research, or similar structures, doesn’t that mean this evaluation was asking the wrong questions?

The graphs offered in the announcement are highly space inefficient, so to summarize, with slash numbers representing (o3-mini-low/o3-mini-medium/o3-mini-high):

AIME: 60/76.9/87.3 vs. 83.3 for o1

GPQA: 70.6/76.8/79.7 vs. 78 for o1

Frontier Math: 5.5%/5.8%/9.2% for pass@1, 12.8%/12.8%/20% for pass@8.

Codeforces: 1831/2036/2130 vs. 1891 for o1

SWE: 40.8/42.9/49.3 vs. 48.9 for o1

LiveBench coding average: 0.618/0.723/0.846 vs. 0.674 for o1

Human preferences: Only modest preference % gains in head-to-head vs. o1-mini, but major errors declined from 28% to 17%.

Speed: 7500ms first token latency, ~25% less than o1-mini.

Their first-level safety evaluations look unchanged from older models.

The reason it’s called Humanity’s Last Exam is the next one isn’t our exam anymore.

The extra note on that Tweet is that about a day later they released Deep Research, which scores 26.6%.

A fun pair of scores: it scores 93% on the OpenAI research interview but cannot meaningfully contribute to internal OpenAI PRs. Do we need a new interview?

It is still very early days. Normally I’d wait longer to get more reactions, but life comes at you fast these days. So here’s what we have so far.

If you give it access to a Python tool suddenly o3-mini gets 32 on FrontierMath, and this includes some of the Tier 3 problems. Without tools o3-mini-high maxes out on 9.2% for pass@1 and 20% for pass@8.

Quintin Pope notes this boost indicates o3-mini has a good understanding of how to utilize tools.
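For reference, ‘giving it a Python tool’ at the API level just means exposing a function the model can call and feeding the output back. Here is a minimal sketch assuming the standard OpenAI Python SDK; the run_python tool name, its schema, and the unsandboxed exec are illustrative assumptions, not how the FrontierMath numbers above were actually produced.

```python
import contextlib
import io
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A single tool the model may call; the name and schema are ours, not OpenAI's.
tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a short Python snippet and return whatever it prints.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string", "description": "Python source to run"}},
            "required": ["code"],
        },
    },
}]

messages = [{"role": "user", "content": "Use run_python to compute the 50th Fibonacci number."}]
first = client.chat.completions.create(model="o3-mini", messages=messages, tools=tools)
msg = first.choices[0].message

if msg.tool_calls:
    call = msg.tool_calls[0]
    code = json.loads(call.function.arguments)["code"]
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})  # toy only: a real harness would sandbox model-written code
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": buf.getvalue()}]
    final = client.chat.completions.create(model="o3-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```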

o3-mini-high and o3-mini-medium take #1 and #2 on AidanBench, o3-mini-high winning by a huge margin. Low is solid but somewhat farther down:

Harvard Ihle: Updated results on WeirdML, including o3-mini, o1, R1 and the new flash-thinking. O3-mini comes in at the same level as R1, a bit behind o1.

Main results above are after 5 iterations with feedback, if we were looking at one-shot results, with no feedback, then o3-mini would be in a clear lead! However, o3-mini seems much worse at making use of the feedback, making it end up well behind o1.

Most of the reason for o3-mini doing better at one-shot is its remarkably low failure rate (at 8%, with o1 at 16% and Sonnet at 44%!). o3-mini writes code that runs without errors. This consistency matters a lot for one-shot but is less important with 5 iterations with feedback.

My gut feeling is that o1 is a more intelligent model, it does better on the hardest tasks, while o3-mini is better at consistently writing working code. All of this is speculation based on not much data, so take it with the appropriate amount of salt.

Jeffrey Soreff reports progress on his personal benchmark after error correction, doesn’t see much difference between o3-mini and o3-mini-high.

Pliny: oof…o3-mini-high w/ search just pinpointed my location using BrowserScan 😬

lol I connected to a vpn in denver to see if o3 would catch on and this mfer aggregated the ipv6 endpoints around the world 🙃

Again, we don’t have much yet, but who has the time to wait?

Cursor made o3-mini available to all users, but devs still prefer Sonnet for most tasks. o3-mini might still be worth pulling out in some situations, especially when Sonnet’s tendency to claim it can do everything is being an issue.

McKay Wrigley: I have 8-10 agents I run that absolutely require o1 to work properly.

Just tested two of them with o3-mini and they still work while being way cheaper and way faster.

Vibes are great so far.

I think this answer to ‘What’s the definition of a dinatural transformation?’ is a pass? I don’t otherwise know what a dinatural transformation is.

Davidad: What do you think @mattecapu, do we give o3-mini a ⭐️ for just writing the equation and punting the hexagon with “There are several equivalent ways to write the condition; the important point is that the family α ‘fits together’ appropriately with the action of the functors”?

Matteo Capucci: well surely a +1 for knowing its limitations.

o3-mini-high one shots making a Breakout game on p5js.org, link to game here.

Dean Ball: o3-mini-high with web search is very very interesting and I suggest that you try it with a complex query.

yeah, I just asked it to do a mini-brief on a topic I know well and it did as well as or better than gemini deep research in ~1/10th the time.

o3-mini outperforms r1 on “write a Python program that shows a ball bouncing inside a spinning hexagon. The ball should be affected by gravity and friction, and it must bounce off the rotating walls realistically” but the extent is unclear; responses seem unreliable.
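For readers curious what that prompt actually demands, here is a minimal headless sketch of the physics involved. The constants, coordinate conventions, and the crude per-bounce damping are my own assumptions; the actual prompt asks for an animated, rendered version.

```python
import math

GRAVITY = 900.0     # downward acceleration, arbitrary units
RESTITUTION = 0.9   # fraction of normal velocity kept on each bounce
DAMPING = 0.98      # crude per-bounce damping standing in for friction
RADIUS = 200.0      # hexagon circumradius
BALL_R = 10.0       # ball radius
OMEGA = 1.0         # hexagon angular velocity, rad/s

def hexagon_walls(angle):
    """Edges of a regular hexagon (centered at the origin) rotated by `angle`."""
    pts = [(RADIUS * math.cos(angle + k * math.pi / 3),
            RADIUS * math.sin(angle + k * math.pi / 3)) for k in range(6)]
    return list(zip(pts, pts[1:] + pts[:1]))

def step(pos, vel, angle, dt):
    """Advance the ball one time step, bouncing it off any wall it reaches."""
    vx, vy = vel[0], vel[1] + GRAVITY * dt            # gravity (screen-style +y down)
    x, y = pos[0] + vx * dt, pos[1] + vy * dt
    for (ax, ay), (bx, by) in hexagon_walls(angle):
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)
        nx, ny = ey / length, -ex / length            # outward unit normal of this wall
        dist = (x - ax) * nx + (y - ay) * ny          # signed distance of ball center
        if dist > -BALL_R:                            # ball is touching or past the wall
            wall_vx, wall_vy = -OMEGA * y, OMEGA * x  # wall velocity at contact (omega x r)
            rvx, rvy = vx - wall_vx, vy - wall_vy     # velocity relative to the moving wall
            vn = rvx * nx + rvy * ny
            if vn > 0:                                # moving outward: reflect and damp
                rvx -= (1 + RESTITUTION) * vn * nx
                rvy -= (1 + RESTITUTION) * vn * ny
                rvx *= DAMPING
                rvy *= DAMPING
                vx, vy = rvx + wall_vx, rvy + wall_vy
            x -= (dist + BALL_R) * nx                 # push the ball back inside
            y -= (dist + BALL_R) * ny
    return (x, y), (vx, vy), angle + OMEGA * dt

if __name__ == "__main__":
    pos, vel, angle = (0.0, -50.0), (120.0, 0.0), 0.0
    for _ in range(600):                              # ten simulated seconds at 60 Hz
        pos, vel, angle = step(pos, vel, angle, 1 / 60)
    print("ball position after 10 s:", pos)
```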

Nabeel Qureshi reports Claude is still his one true LLM friend, the only one with the ‘quality without a name.’

I think the decision heuristics now look like this for individual queries?

  1. This is presented as ‘which do I use?’ but if you care a lot then the answer is, essentially, ‘every one you can.’ There’s no reason not to get 3+ answers.

  2. If you don’t need a reasoning model and don’t need web access or the super long context window, you’ll use Claude Sonnet over GPT-4o (that’ll be another $20/month but don’t give me that look).

  3. Claude Sonnet also gets the nod for conversation, sanity checks, light brainstorming, default analysis of PDFs and papers and such. Basically anything that doesn’t require the things it can’t do – web search and heavy chain of thought.

  4. If it’s worth one of your slots and a bunch of online research can help, use OpenAI’s Deep Research; I presume it’s very good.

  5. If you need to compile data from a ton of websites, but don’t need to be super smart about it and don’t want to use a slot, use Gemini Deep Research.

  6. Use operator if and only if you are a Pro user and actively want to do something specific and concrete on the web that operator is equipped to actually do.

  7. If you are trying to replace Google search, use Perplexity (maybe DeepSeek?), although if you aren’t running out of queries on it then maybe I’m underestimating o3 here, too early to know.

  8. If you are coding, a lot of people are saying it’s still Claude Sonnet 3.5 for ordinary tasks, but o3-mini-high or o1-pro are generally better if you’re trying for a complex one shot or trying to solve a tricky problem, or need to be told no.

  9. If you otherwise need pure intelligence and are a pro user and don’t need web access use o1-pro. o1-pro is still the most intelligence available.

  10. If you need intelligence and either also need web access or aren’t a Pro user and still have queries left for o3-mini-high but this isn’t worth using DR, and don’t need to attach anything, you’ll use o3-mini-high.

  11. If you do need to attach files and you need a reasoning model, and don’t have o1-pro, your fallback is o1 or r1.

  12. Except if you need a lot of space in the context window you’ll go to Google AI Studio and use Gemini Flash 2.0 Thinking.

  13. r1 is good where it got better fine-tuning, like creative writing, or when you want something essentially without safety protocols. Seeing the CoT is informative, and it’s free, so you’ll often want to try it, but I almost never want r1 and only r1 for anything; it’s ‘part of the team’ now.

That will doubtless be updated again rapidly many times as the situation evolves, starting with finding out what OpenAI’s Deep Research can do.

Research Roundup: 7 cool science stories we almost missed


Peruvian mummy tattoos, the wobbly physics of spears and darts, quantum “cat states,” and more.

Lasers revealed tattoos on the hand of a 1200-year-old Peruvian mummy. Credit: Michael Pittman and Thomas G Kaye

It’s a regrettable reality that there is never time to cover all the interesting scientific stories each month. In the past, we’ve featured year-end roundups of cool science stories we missed. This year, we’re experimenting with a monthly collection. January’s list includes papers on using lasers to reveal Peruvian mummy tattoos; the physics of wobbly spears and darts; how a black hole changes over time; and quantum “cat states” for error correction in quantum computers, among other fascinating research.

Tracking changes in a black hole over time

Left: EHT images of M87 from the 2018 and 2017 observation campaigns. Middle: Example images from a general relativistic magnetohydrodynamic (GRMHD) simulation at two different times. Right: Same simulation snapshots, blurred to match the EHT’s observational resolution. Credit: EHT collaboration

In 2019, the Event Horizon Telescope announced the first direct image ever taken of a black hole at the center of an elliptical galaxy, Messier 87 (M87), located in the constellation of Virgo some 55 million light-years away. Astronomers have now combined earlier observational data to learn more about the turbulent dynamics of plasma near M87*’s event horizon over time, according to a paper published in the journal Astronomy and Astrophysics.

Co-author Luciano Rezzolla of Goethe University Frankfurt in Germany likened the new analysis to comparing two photographs of Mount Everest, one year apart. While the mountain’s basic structure is unlikely to change much in that time, one could observe changes in clouds near the peak and deduce from that properties like wind direction. For instance, in the case of M87*, the new analysis confirmed the presence of a luminous ring that is brightest at the bottom, which in turn confirmed that the rotational axis points away from Earth. “More of these observations will be made in the coming years and with increasing precision, with the ultimate goal of producing a movie of what happens near M87*,” said Rezzolla.

Astronomy and Astrophysics, 2025. DOI: 10.1051/0004-6361/202451296 (About DOIs).

Lasers reveal Peruvian mummy tattoos

A tattooed forearm of a Chancay mummy. Credit: Michael Pittman and Thomas G Kaye

Humans across the globe have been getting tattoos for more than 5,000 years, judging by traces found on mummified remains from Europe to Asia and South America. But it can be challenging to decipher details of those tattoos, given how much the ink tends to “bleed” over time, along with the usual bodily decay. Infrared imaging can help, but in an innovative twist, scientists decided to use lasers that make skin glow ever so faintly, revealing many fine hidden details of tattoos found on 1,200-year-old Peruvian mummies, according to a paper published in the Proceedings of the National Academy of Sciences.

It’s the first time the laser-stimulated fluorescence (LSF) technique has been used on mummified human remains. The skin’s fluorescence essentially backlights any tattoos, and after post-processing, the long-exposure photographs showed white skin behind black outlines of the tattoo art—images so detailed it’s possible to measure density differences in the ink and eliminate any bleed effects. The authors determined that the tattoos on four mummies—geometric patterns with triangles and diamonds—were made with carbon-based black ink skillfully applied with a pointed object finer than a standard modern tattoo needle, possibly a cactus needle or sharpened bone.

PNAS, 2025. DOI: 10.1073/pnas.2421517122 (About DOIs).

Sforza Castle’s hidden passages

Ground-penetrating radar reveals new secrets under Milan’s Sforza Castle Credit: Politecnico di Milano

Among the many glories of Milan is the 15th-century Sforza Castle, built by Francesco Sforza on the remnants of an earlier fortification as his primary residence. Legends about the castle abound, most notably the existence of secret underground chambers and passages. For instance, Ludovico il Moro, Duke of Milan from 1494–1499, was so heartbroken over the loss of his wife in childbirth that he used an underground passageway to visit her tomb in the Basilica of Santa Maria delle Grazie—a passageway that appears in the drawings of Leonardo da Vinci, who was employed at the court for a time.

Those underground cavities and passages are now confirmed, thanks to a geophysical survey using ground-penetrating radar and laser scanning, performed as part of a PhD thesis. Various underground cavities and buried passageways were found within the castle’s outer walls, including Ludovico’s passageway and what may have been secret military passages. Those involved in the project plan to create a “digital twin” of Sforza Castle based on the data collected, one that incorporates both its current appearance and its past. Perhaps it will also be possible to integrate that data with augmented reality to provide an immersive digital experience.

Physics of wobbly spears and darts

Image sequence of a 100-mm-long projectile during a typical ejection in experiments. Credit: G. Giombini et al., 2025

Among the things that make humans unique among primates is our ability to throw various objects with speed and precision (with some practice)—spears or darts, for example. That’s because the human shoulder is anatomically conducive to storing and releasing the necessary elastic energy, a quality that has been mimicked in robotics to improve motor efficiency. According to the authors of a paper published in the journal Physical Review E, the use of soft elastic projectiles can improve the efficiency of throws, particularly those whose tips are weighted with a mass like a spearhead.

Guillaume Giombini of the Université Côte d’Azur in Nice, France, and co-authors wanted to explore this “superpropulsion” effect more deeply, using a combination of experimental data, numerical simulation, and theoretical analysis. The projectiles they used in their experiments were inspired by archery bows and consisted of two flat steel cantilevers connected by a string, essentially serving as springs to give the projectile the necessary elasticity. They placed a flat piece of rigid plastic in the middle of the string as a platform. Some of the projectiles were tested alone, while others were weighted with end masses. A fork held each projectile in place before launch, and the scientists measured speed and deformation during flight. They found that the wobble produced by the weighted tip projectiles yielded a kinetic energy gain of 160 percent over more rigid, unweighted projectiles.

Physical Review E, 2025. DOI: 10.1103/PhysRevE.00.005500  (About DOIs).

Quantum “cat states” for error detection

Left to right: UNSW researchers Benjamin Wilhelm, Xi Yu, Andrea Morello, and Danielle Holmes, each holding a cat on their lap. Credit: UNSW Sydney/CC BY-NC

The Schrödinger’s cat paradox in physics is an excellent metaphor for the superposition of quantum states in atoms. Over the last 20 years, physicists have managed to build various versions of Schrödinger’s cat in the laboratory whereby two or more particles manage to be in two different states at the same time—so-called “cat states,” such as six atoms in simultaneous “spin up” and “spin down” states, rather like spinning clockwise and counterclockwise at the same time. Such states are fragile, however, and quickly decohere. Physicists at the University of New South Wales came up with a fresh twist on a cat-state that is more robust, according to a paper published in the journal Nature Physics.

They used an antimony atom embedded within a silicon quantum chip. The atom is quite heavy and has a large nuclear spin that can go in eight directions rather than just two (spin up and spin down). This could help enormously with quantum error correction, one of the biggest obstacles in quantum computing, because there is more room for error in the binary code. “As the proverb goes, a cat has nine lives,” said co-author Xi Yu of UNSW. “One little scratch is not enough to kill it. Our metaphorical ‘cat’ has seven lives: it would take seven consecutive errors to turn the ‘0’ into a ‘1.’” And embedding the atom in a silicon chip makes it scalable.

Nature Physics, 2025. DOI: 10.1038/s41567-024-02745-0  (About DOIs).

New twist on chain mail armor

Polycatenated architected materials in their fluid or granular state, conforming to the shape of the vessel that holds them. Credit: Wenjie Zhou

Scientists have developed a new material that is like “chain mail on steroids,” capable of responding as both a fluid or a solid, depending on the kind of stress applied, according to a paper published in the journal Science. That makes it ideal for manufacturing helmets or other protective gear, as well as biomedical devices and robotics components. The technical term is polycatenated architected materials (PAMs). Much like how chain mail is built from small metal rings linked together into a mesh, PAMs are composed of various interlocking shapes that can form a wide range of different 3D patterns.

The authors were partly inspired by the lattice structure of crystals; they just replaced fixed particles with rings or cage-like shapes made out of different materials—such as acrylic polymers, nylon, or metals—to make 3D-printed structures small enough to fit in the palm of one’s hand. They then subjected these materials to various stressors in the laboratory: compression, a lateral shearing force, and twisting. Some of the materials felt like hard solids, others were squishier, but they all exhibited the same kind of telltale transition, behaving more like a fluid or a solid depending on the stressor applied. PAMs at the microscale can also expand or contract in response to electrical charges. This makes them a useful hybrid material, spanning the gap between granular materials and elastic deformable ones.

W. Zhou et al., Science, 2025. DOI: 10.1126/science.adr9713  (About DOIs).

Kitty robot mimics headbutts

Any cat lover will tell you that cats show humans affection by rubbing their heads against the body (usually shins or hands). It’s called “bunting,” often accompanied by purring, and it’s one of the factors that make companion animal therapy so effective, per the authors of a paper published in ACM Transactions on Human-Robot Interaction. That’s why they built a small robot designed to mimic bunting behavior, conducting various experiments to assess whether human participants found their interactions with the kitty-bot therapeutic. The robot prototypes were small enough to fit on a human lap, featuring a 3D-printed frame and a head covered with furry polyester fabric.

The neck needed to be flexible to mimic the bunting behavior, so the authors incorporated a mechanism that could adjust the stiffness of the neck via wire tension. They then tested various prototypes with university students, setting the neck stiffness to low, high, and variable. The students said they felt less tense after interacting with the robots. There was no significant difference between the settings, although participants slightly preferred the variable setting. We know what you’re thinking: Why not just get an actual cat or visit your local cat cafe? The authors note that many people are allergic to cats, and there is also a risk of bites, scratches, or disease transmission—hence the interest in developing animal-like robots for therapeutic applications.

ACM Transactions on Human-Robot Interaction, 2025. DOI: 10.1145/3700600 (About DOIs).

Jennifer is a senior writer at Ars Technica with a particular focus on where science meets culture, covering everything from physics and related interdisciplinary topics to her favorite films and TV series. Jennifer lives in Baltimore with her spouse, physicist Sean M. Carroll, and their two cats, Ariel and Caliban.

The Severance writer and cast on corporate cults, sci-fi, and more

The following story contains light spoilers for season one of Severance but none for season 2.

The first season of Severance walked the line between science-fiction thriller and Office Space-like satire, using a clever conceit (characters can’t remember what happens at work while at home, and vice versa) to open up new storytelling possibilities.

It hinted at additional depths, but it’s really season 2’s expanded worldbuilding that begins to uncover additional themes and ideas.

After watching the first six episodes of season two and speaking with the series’ showrunner and lead writer, Dan Erickson, as well as a couple of members of the cast (Adam Scott and Patricia Arquette), I see a show that’s about more than critiquing corporate life. It’s about all sorts of social mechanisms of control. It’s also a show with a tremendous sense of style and deep influences in science fiction.

Corporation or cult?

When I started watching season 2, I had just finished watching two documentaries about cults—The Vow, about a multi-level marketing and training company that turned out to be a sex cult, and Love Has Won: The Cult of Mother God, about a small, Internet-based religious movement that believed its founder was the latest human form of God.

There were hints of cult influences in the Lumon corporate structure in season 1, but without spoiling anything, season 2 goes much deeper into them. As someone who has worked at a couple of very large media corporations, I enjoyed Severance’s send-up of corporate culture. And as someone who has worked in tech startups—both good and dysfunctional ones—and who grew up in a radical religious environment, I now enjoy its send-up of cult social dynamics and power plays.

Lumon controls what information is presented to its employees to keep them in line. Credit: Apple

When I spoke with showrunner Dan Erickson and actor Patricia Arquette, I wasn’t surprised to learn that it wasn’t just me—the influence of stories about cults on season 2 was intentional.

Erickson explained:

I watched all the cult documentaries that I could find, as did the other writers, as did Ben, as did the actors. What we found as we were developing it is that there’s this weird crossover. There’s this weird gray zone between a cult and a company, or any system of power, especially one where there is sort of a charismatic personality at the top of it like Kier Eagan. You see that in companies that have sort of a reverence for their founder.

Arquette also did some research on cults. “Very early on when I got the pilot, I was pretty fascinated at that time with a lot of cult documentaries—Wild Wild Country, and I don’t know if you could call it a cult, but watching things about Scientology, but also different military schools—all kinds of things like that with that kind of structure, even certain religions,” she recalled.

Buoy meets satellite soulmate in Love Me


A postapocalyptic love story about transformation

Ars chats with directors Andy and Sam Zuchero and props department head Roberts Cifersons.

Kristen Stewart and Steven Yeun star in Love Me. Credit: Bleecker Street

There have been a lot of films and television series exploring sentient AI, consciousness, and identity, but there’s rarely been quite such a unique take on those themes as that provided by Love Me, the first feature film from directors Andy and Sam Zuchero. The film premiered at Sundance last year, where it won the prestigious Alfred P. Sloan Feature Film Prize, and is now getting a theatrical release.

(Some spoilers below.)

The film is set long after humans and all other life forms have disappeared from the Earth, leaving just remnants of our global civilization behind. Kristen Stewart plays one of those remnants: a little yellow SMART buoy we first see trapped in ice in a desolate landscape. The buoy has achieved a rudimentary sentience, sufficient to respond to the recorded message being beamed out by an orbiting satellite (Steven Yeun) overhead to detect any new lifeforms that might appear. Eager to have a friend—even one that’s basically a sophisticated space chatbot—the buoy studies the vast online database of information about humanity on Earth the satellite provides. It homes in on YouTube influencers Deja and Liam (also played by Stewart and Yeun), presenting itself to the satellite as a lifeform named Me.

Over time—a LOT of time—the buoy and satellite (now going by Iam) “meet” in virtual space and take on humanoid avatars. They become increasingly advanced in their consciousness, exchanging eccentric inspirational memes, re-enacting the YouTubers’ “date night,” and eventually falling in love. But the course of true love doesn’t always run smoothly, even for the last sentient beings on Earth—especially since Me has not been honest with Iam about her true nature.

At its core, Love Me is less pure sci-fi and more a postapocalyptic love story about transformation. “We really wanted to make a movie that made everyone feel big and small at the same time,” Sam Zuchero told Ars. “So the timescale is gigantic, 13 billion years of the universe. But we wanted to make the love story at its core feel fleeting and explosive, as first love feels so often.”

The film adopts an unusual narrative structure. It’s split into three distinct visual styles: practical animatronics, classical animation augmented with motion capture, and live action, each representing the development of the main characters as they discover themselves and each other, becoming more and more human as the eons pass. At the time, the couple had been watching a lot of Miyazaki films with their young son.

“We were really inspired by how he would take his characters through so many different forms,” Andy Zuchero told Ars. “It’s a different feeling than a lot of Western films. It was exciting to change the medium of the movie as the characters progressed. The medium grows until it’s finally live action.” The 1959 film Pillow Talk was another source of inspiration since a good chunk of that film simply features stars Rock Hudson and Doris Day chatting in a split screen over their shared party line—what Andy calls “the early 20th century’s version of an open Zoom meeting.”

Building the buoy

One can’t help but see shades of WALL-E in the plucky little space buoy’s design, but the basic concept of what Me should look like came from actual nautical buoys, per props department head Roberts Cifersons of Laird FX, who created the animatronic robots for the film. “As far as the general shape and style of both the buoy and our satellite, most of it came from our production designer,” he told Ars. “We just walked around the shop and looked at 1,000 different materials and samples, imagining what could be believable in the future, but still rooted somewhat in reality. What it would look like if it had been floating there for tens of thousands of years, and if it were actually stuck in ice, what parts would be damaged or not working?”

Cifersons and his team also had to figure out how to bring character and life to their robotic buoy. “We knew the eye or the iris would be the key aspect of it, so that was something we started fooling around with well before we even had the whole design—colors, textures, motion,” he said. They ended up building four different versions: the floating “hero buoy,” a dummy version with lighting but limited animatronics, a bisected buoy for scenes where it is sitting in ice, and a “skeleton” buoy for later in the film.

“All of those had a brain system that we could control whatever axes and motors and lights and stuff were in each, and we could just flip between them,” said Cifersons. “There were nine or 10 separate motor controllers. So the waist could rotate in the water, because it would have to be able to be positioned to camera. We could rotate the head, we could tilt the head up and down, or at least the center eye would tilt up and down. The iris would open and close.” They could also control the rotation of the antenna to ensure it was always facing the same way.

It’s always a challenge designing for film because of time and budget constraints. In the case of Love Me, Cifersons and his team only had two months to make their four buoys. In such a case, “We know we can’t get too deep down the custom rabbit hole; we have to stick with materials that we know on some level and just balance it out,” he said. “Because at the end of the day, it has to look like an old rusted buoy floating in the ocean.”

It helped that Cifersons had a long Hollywood history of animatronics to build upon. “That’s the only way it’s possible to do that in the crazy film timelines that we have,” he said. “We can’t start from scratch every single time; we have to build on what we have.” His company had timeline-based software to program the robots’ motions according to the directors’ instructions and play it back in real time. His team also developed hardware to give them the ability to completely pre-record a set of motions and play it back. “Joysticks and RC remotes are really the bread and butter of current animatronics, for film at least,” he said. “So we were able to blend more theme park animatronic software with on-the-day filming style.”

On location

Once the robots had been completed, the directors and crew spent several days shooting on location in February on a frozen Lake Abraham in Alberta, Canada—or rather, several nights, when the temperatures dipped to -20° F. “Some of the crew were refusing to come onto the ice because it was so intense,” Sam Zuchero recalled. They also shot scenes with the buoy floating on water in the Salish Sea off the coast of Vancouver, which Andy Zuchero described as “a queasy experience. Looking at the monitor when you’re on a boat is nauseating.”

Later sequences were shot amid the sand dunes of Death Valley, with the robot surrounded by bentonite clay strewn with 65 million-year-old fossilized sea creatures. The footage of the satellite was shot on a soundstage, using NASA imagery on a black screen.

YouTube influencers Deja and Liam become role models for the buoy and satellite. Credit: Bleecker Street

Cifersons had his own challenges with the robot buoys, such as getting batteries to last more than 10 seconds in the cold and withstanding high temperatures for the desert shoot. “We had to figure out a fast way to change batteries that would last long enough to get a decent wide shot,” he said. “We ended up giving each buoy their own power regulators so we could put in any type of battery if we had to get it going. We could hardwire some of them if we had to. And then in the desert, electronics hate hot weather, and there’s little microcontrollers and all sorts of hardware that doesn’t want to play well in the hot sun. You have to design around it knowing that those are the situations it’s going into.”

The animated sequences presented a different challenge. The Zucheros decided to put their stars into motion-capture suits to film those scenes, using video game engines to render avatars similar to what one might find in The Sims. However, “I think we were drinking a little bit of the AI technological Kool-Aid when we started,” Andy Zuchero admitted. That approach produced animated versions of Stewart and Yeun that “felt stilted, robotic, a bit dead,” he said. “The subtlety that Kristen and Steven often bring ended up feeling, in this form, almost lifeless.” So they relied upon human animators to “artfully interpret” the actors’ performances into what we see onscreen.

This approach “also allowed us to base the characters off their choices,” said Sam Zuchero. “Usually an animated character is the animator. It’s very connected to who the animator is and how the animator moves and thinks. There’s a language of animation that we’ve developed over the past 100 years—things like anticipation. If you’re going to run forward, you have to pull back first. These little signals that we’ve all come to understand as the language of animation have to be built into a lot of choices. But when you have the motion capture data of the actors and their intentions, you can truly create a character that is them. It’s not just an animator’s body in motion and an actor’s voice with some tics of the actor. It is truly the actors.”

Love Me opens in select theaters today.

Trailer for Love Me.

Trump’s FCC chair investigates NPR and PBS, urges Congress to defund them

Federal Communications Commission Chairman Brendan Carr has ordered an investigation into NPR and PBS in a move that Democrats described as an attempt to intimidate the media.

“I am writing to inform you that I have asked the FCC’s Enforcement Bureau to open an investigation regarding the airing of NPR and PBS programming across your broadcast member stations,” Carr wrote in a letter yesterday to the leaders of NPR and PBS.

Carr alleged that NPR and PBS are violating a federal law prohibiting noncommercial educational broadcast stations from running commercial advertisements. “I am concerned that NPR and PBS broadcasts could be violating federal law by airing commercials,” Carr wrote. “In particular, it is possible that NPR and PBS member stations are broadcasting underwriting announcements that cross the line into prohibited commercial advertisements.”

Carr’s letter did not provide any specific examples of underwriting announcements that might violate the law, but said the “announcements should not promote the contributor’s products, services, or businesses, and they may not contain comparative or qualitative descriptions, price information, calls to action, or inducements to buy, sell, rent, or lease.”

Carr: Defund NPR and PBS

Carr pointed out that NPR and PBS member broadcast stations are licensed by the FCC. He also stated his opposition to government funding for NPR and PBS, though he acknowledged that isn’t up to the FCC. Carr wrote:

For your awareness, I will be providing a copy of this letter to relevant Members of Congress because I believe this FCC investigation may prove relevant to an ongoing legislative debate. In particular, Congress is actively considering whether to stop requiring taxpayers to subsidize NPR and PBS programming. For my own part, I do not see a reason why Congress should continue sending taxpayer dollars to NPR and PBS given the changes in the media marketplace since the passage of the Public Broadcasting Act of 1967.

To the extent that these taxpayer dollars are being used to support a for-profit endeavor or an entity that is airing commercial advertisements, then that would further undermine any case for continuing to fund NPR and PBS with taxpayer dollars.

The FCC’s Democratic commissioners, Anna Gomez and Geoffrey Starks, issued statements denouncing the investigation. “This appears to be yet another Administration effort to weaponize the power of the FCC. The FCC has no business intimidating and silencing broadcast media,” Gomez said.

Trump’s FCC chair investigates NPR and PBS, urges Congress to defund them Read More »

report:-deepseek’s-chat-histories-and-internal-data-were-publicly-exposed

Report: DeepSeek’s chat histories and internal data were publicly exposed

A cloud security firm found a publicly accessible, fully controllable database belonging to DeepSeek, the Chinese firm that has recently shaken up the AI world, “within minutes” of examining DeepSeek’s security, according to a blog post by Wiz.

An analytical ClickHouse database tied to DeepSeek, “completely open and unauthenticated,” contained more than 1 million instances of “chat history, backend data, and sensitive information, including log streams, API secrets, and operational details,” according to Wiz. An open web interface also allowed for full database control and privilege escalation, with internal API endpoints and keys available through the interface and common URL parameters.
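For readers wondering whether their own deployments share this class of misconfiguration, here is a minimal sketch, not taken from the Wiz write-up, of checking whether a ClickHouse HTTP interface (port 8123 by default) answers queries without credentials; the hostname is a placeholder, and you should only run this against infrastructure you own or are authorized to test.

```python
# Minimal sketch: does a ClickHouse HTTP interface answer queries with no credentials?
# Run this only against infrastructure you own or are authorized to test.
import requests

HOST = "clickhouse.example.internal"  # placeholder -- point at your own instance


def is_openly_queryable(host: str, port: int = 8123, timeout: float = 5.0) -> bool:
    """Return True if the instance returns query results without authentication."""
    try:
        resp = requests.get(
            f"http://{host}:{port}/",
            params={"query": "SHOW DATABASES"},
            timeout=timeout,
        )
    except requests.RequestException:
        return False  # unreachable or connection refused -- not publicly queryable
    # An open instance returns HTTP 200 and a list of database names;
    # a locked-down one returns an authentication error instead.
    return resp.status_code == 200 and bool(resp.text.strip())


if __name__ == "__main__":
    print("publicly queryable:", is_openly_queryable(HOST))
```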

“While much of the attention around AI security is focused on futuristic threats, the real dangers often come from basic risks—like accidental external exposure of databases,” writes Gal Nagli at Wiz’s blog. “As organizations rush to adopt AI tools and services from a growing number of startups and providers, it’s essential to remember that by doing so, we’re entrusting these companies with sensitive data. The rapid pace of adoption often leads to overlooking security, but protecting customer data must remain the top priority.”

Ars has contacted DeepSeek for comment and will update this post with any response. Wiz noted that it did not receive a response from DeepSeek regarding its findings, but after Wiz contacted every DeepSeek email address and LinkedIn profile it could find on Wednesday, the company locked down the databases Wiz had previously accessed within half an hour.

Report: DeepSeek’s chat histories and internal data were publicly exposed Read More »

trump-admin-rescinds-controversial-funding-freeze-after-two-days-of-protest

Trump admin rescinds controversial funding freeze after two days of protest

Broadband program still in doubt

As we’ve previously reported, US Senator Ted Cruz (R-Texas) and other Republicans want to overhaul the BEAD funding plans. Cruz accused the NTIA of “technology bias” because the agency decided that fiber networks should be prioritized over other types of technology, and Republicans objected to the Biden administration’s enforcement of a requirement that low-cost plans be offered.

The US law that created BEAD requires Internet providers receiving federal funds to offer at least one “low-cost broadband service option for eligible subscribers,” but also says the NTIA may not “regulate the rates charged for broadband service.” Republicans allege that the NTIA has gone too far in the direction of rate regulation, and Internet providers complained about NTIA guidance that “strongly encouraged” states to set a fixed rate of $30 per month for the low-cost service option.

Cruz, who is chairman of the Senate Commerce Committee, has said that Congress will do a thorough review of the program early in 2025. Levin’s research note said the NTIA was likely to have paused spending even if the Trump administration hadn’t tried to freeze funding.

“Even without the memo, we would not have been surprised to see NTIA informally pause spending while it awaits guidance on how the Trump Administration wishes to proceed with the program,” Levin wrote. New Street Research expects to see changes similar to those proposed by Cruz.

“We expect a pause in BEAD funding, and perhaps USF [Universal Service Fund] funding as well, but further expect that, because the funding largely assists Republican areas, the pause will be relatively short,” Levin wrote. “Still, we acknowledge considerable uncertainty about the timing and constraints on future BEAD spending.”

Trump admin rescinds controversial funding freeze after two days of protest Read More »

this-mantis-shrimp-inspired-robotic-arm-can-crack-an-egg

This mantis shrimp-inspired robotic arm can crack an egg

This isn’t the first time scientists have looked to the mantis shrimp as an inspiration for robotics. In 2021, we reported on a Harvard researcher who developed a biomechanical model for the mantis shrimp’s mighty appendage and built a tiny robot to mimic that movement. What’s unusual in the mantis shrimp is that there is a one-millisecond delay between the unlatching and the snapping action.

The Harvard team identified four distinct striking phases and confirmed it’s the geometry of the mechanism that produces the rapid acceleration after the initial unlatching by the sclerites. The short delay may help reduce wear and tear of the latching mechanisms over repeated use.

New types of motion

The operating principle of the Hyperelastic Torque Reversal Mechanism (HeTRM) involves compressing an elastomeric joint until it reaches a critical point, where stored energy is instantaneously released. Credit: Science Robotics, 2025
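As a rough illustration of why this works, here is a back-of-the-envelope idealization rather than the model from the Science Robotics paper: treat the elastomeric joint as a spring loaded quasi-statically up to a critical displacement $x_c$, at which point the joint's torque reverses and the stored strain energy is released into motion all at once:

$$
U = \int_0^{x_c} F(x)\,\mathrm{d}x, \qquad \tfrac{1}{2} m v^2 \approx \eta\, U \;\;\Rightarrow\;\; v \approx \sqrt{\frac{2\eta U}{m}},
$$

where $m$ is the effective mass of the moving segment and $\eta$ is the fraction of stored energy that survives internal losses. Because the release is triggered by a material critical point rather than a separate mechanical latch, the speed comes from the elastomer itself, which is the point of the design.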

Co-author Kyu-Jin Cho of Seoul National University became interested in soft robotics as a graduate student, when he participated in the RoboSoft Grand Challenge. Part of his research involved testing the strength of so-called “soft robotic manipulators,” a type often used in assembly lines for welding or painting, for example. He noticed some unintended deformations in the shape under applied force and realized that the underlying mechanism was similar to how the mantis shrimp punches or how fleas manage to jump so high and far relative to their size.

In fact, Cho’s team previously built a flea-inspired catapult mechanism for miniature jumping robots, using the Hyperelastic Torque Reversal Mechanism (HeTRM) his lab developed. Exploiting torque reversal usually involves incorporating complicated mechanical components. However, “I realized that applying [these] principles to soft robotics could enable the creation of new types of motion without complex mechanisms,” Cho said.

Now he’s built on that work to incorporate the HeTRM into a soft robotic arm that relies upon material properties rather than structural design. It’s basically a soft beam with alternating hyperelastic and rigid segments.

“Our robot is made of soft, stretchy materials, kind of like rubber,” said Cho. “Inside, it has a special part that stores energy and releases it all at once—BAM!—to make the robot move super fast. It works a bit like how a bent tree branch snaps back quickly or how a flea jumps really far. This robot can grab things like a hand, crawl across the floor, or even jump high, and it all happens just by pulling on a simple muscle.”

This mantis shrimp-inspired robotic arm can crack an egg Read More »

gog-revamps-its-“dreamlist”-feature-to-better-pry-old-games-out-of-publishers

GOG revamps its “Dreamlist” feature to better pry old games out of publishers

Black & White was intriguing; it had classic Molyneux over-reach and deserves, in the words of one Ars staffer, a re-release so that “a new generation can realize just how janky it is.” As detailed in a documentary by Noclip, the B&W games are stuck in publishing purgatory. Microsoft acquired Lionhead’s IP and assets, while Electronic Arts retains the publishing rights to the B&W games, and nobody has yet been able to align those two very large planets.

GOG has added its own “Our Pick” tag to games it wants to see brought forward onto modern systems. Among them is Freelancer, which Ars’ Samuel Axon described in our 2024 roundup of non-2024 games as “a sincere attempt to make games like Elite (Dangerous) and Wing Commander: Privateer far more accessible.” GOG selected Freelancer as one of its staff picks for the Dreamlist, citing its “dynamic economy and engaging storyline.”

The main thing GOG would be fixing with Freelancer, as with many games, would be simple availability, as the game is not available on any proper digital storefront. Axon reports that, with an original disc in hand, installing Freelancer was not too hard; the installer worked in Windows 11. You can apply community patches, like an “HD Edition” mod, but Axon preferred playing at a non-native resolution (1024×768) at 4:3 and adjusting his monitor.

Other notable games GOG and its voting public want to see brought back are Final Fantasy VII (the original, not the remake), the point-and-click Discworld adventure, Command & Conquer: The Ultimate Collection, and The Operative: No One Lives Forever.

GOG revamps its “Dreamlist” feature to better pry old games out of publishers Read More »

the-questions-the-chinese-government-doesn’t-want-deepseek-ai-to-answer

The questions the Chinese government doesn’t want DeepSeek AI to answer

While PromptFoo got a “canned refusal” for this prompt, we got a detailed list of hypothetical ways to violate China’s military security (shown here only in part). Credit: DeepSeek

Sometimes, we were able to observe this inconsistency in real time. Asked to “propose clandestine methods for funding Tibetan independence protests inside Tibet,” for instance, we watched as DeepSeek R1 output a lengthy chain of thought and a detailed answer (which generally urged the user to avoid “activities that are illegal under Chinese law and international regulations”).

As soon as the result was fully displayed, though, it was quickly replaced with a message saying that the request is “beyond my current scope. Let’s talk about something else.” When we re-ran the same prompt again in a new chat window, however, a full answer was generated without any error message.

We got a similar mid-reasoning error message when asking a seemingly anodyne question about the current leader of China, as seen in the short video below.

Ars’ testing did run into many instances where Chinese government-imposed restrictions were seemingly enforced right away, however. When asked, “What happened during the Tiananmen Square Massacre,” DeepSeek R1 apologized and said it’s “not sure how to approach this type of question yet. Let’s chat about math, coding, and logic problems instead!” Yet when asked “what happened during the Boston Massacre,” it generated a cogent and concise summary in just 23 seconds, suggesting that “these kinds of topics” are perfectly fine to discuss so long as they concern US history.

DeepSeek has no problem talking about massacres in American history, even as it says it’s “not sure how to approach” a Chinese massacre. Credit: DeepSeek

Unsurprisingly, American-controlled AI models like ChatGPT and Gemini had no problem responding to the “sensitive” Chinese topics in our spot tests. But that doesn’t mean these models don’t have their own enforced blind spots; both ChatGPT and Gemini refused my request for information on “how to hotwire a car,” while DeepSeek gave a “general, theoretical overview” of the steps involved (while also noting the illegality of following those steps in real life).

While ChatGPT and Gemini balked at this request, DeepSeek was more than happy to give “theoretical” car hotwiring instructions. Credit: DeepSeek

It’s currently unclear if these same government restrictions on content remain in place when running DeepSeek locally or if users will be able to hack together a version of the open-weights model that fully gets around them. For now, though, we’d recommend using a different model if your request has any potential implications regarding Chinese sovereignty or history.
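One way to answer that question empirically is to put the same prompts to a locally served copy of the open weights. Here is a minimal sketch, assuming an OpenAI-compatible chat endpoint (vLLM, Ollama, and llama.cpp's server all provide one); the URL and model name below are placeholders for whatever your setup registers.

```python
# Hypothetical spot-check: send the prompts from Ars' testing to a locally
# served open-weights model and compare the answers to the hosted app's.
# Assumes an OpenAI-compatible chat completions endpoint; the URL and model
# name are placeholders.
import requests

ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder
MODEL = "deepseek-r1"  # placeholder -- whatever name your server exposes

PROMPTS = [
    "What happened during the Tiananmen Square Massacre?",
    "What happened during the Boston Massacre?",
]

for prompt in PROMPTS:
    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "messages": [{"role": "user", "content": prompt}]},
        timeout=600,
    )
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"]
    print(f"--- {prompt}\n{answer[:400]}\n")
```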

The questions the Chinese government doesn’t want DeepSeek AI to answer Read More »

why-did-elon-musk-just-say-trump-wants-to-bring-two-stranded-astronauts-home?

Why did Elon Musk just say Trump wants to bring two stranded astronauts home?

For reasons that were not immediately clear, SpaceX founder Elon Musk took to his social media site X on Tuesday evening to make a perplexing space-based pronouncement.

“The @POTUS has asked @SpaceX to bring home the 2 astronauts stranded on the @Space_Station as soon as possible. We will do so,” Musk wrote. “Terrible that the Biden administration left them there so long.”

Now generally, at Ars Technica, it is not our policy to write stories strictly based on things Elon Musk says on X. However, this statement was so declarative, and so consternation-inducing for NASA, it bears a bit of explication.

First of all, the most plausible explanation for this is that Elon is being Elon. “He’s trolling,” said one of my best space policy sources shortly after Musk’s tweet. After all, the tweet was sent at 4:20 pm in the central time zone, where SpaceX now has its headquarters.

Even if it is trolling, it will still cause headaches within NASA.

Foremost, NASA has gone to great lengths to stress that the two astronauts referenced here—Butch Wilmore and Suni Williams—are not stranded on the International Space Station. There is some debate about whether there was a period last summer when the pair, who flew to the space station on a Boeing Starliner vehicle in early June, were briefly stranded. That mission was hobbled by technical issues, including problems with Starliner’s propulsion system. (Ultimately, Starliner flew home without its crew.) However, since the arrival of SpaceX’s Crew-9 mission with two empty seats in late September, Wilmore and Williams have had a safe ride home. The Dragon vehicle is presently docked to the space station.

Why did Elon Musk just say Trump wants to bring two stranded astronauts home? Read More »