AI

vision-pro-m5-review:-it’s-time-for-apple-to-make-some-tough-choices

Vision Pro M5 review: It’s time for Apple to make some tough choices


A state of the union from someone who actually sort of uses the thing.

The M5 Vision Pro with the Dual Knit Band. Credit: Samuel Axon

With the recent releases of visionOS 26 and newly refreshed Vision Pro hardware, it’s an ideal time to check in on Apple’s Vision Pro headset—a device I was simultaneously amazed and disappointed by when it launched in early 2024.

I still like the Vision Pro, but I can tell it’s hanging on by a thread. Content is light, developer support is tepid, and while Apple has taken action to improve both, it’s not enough, and I’m concerned it might be too late.

When I got a Vision Pro, I used it a lot: I watched movies on planes and in hotel rooms, I walked around my house placing application windows and testing out weird new ways of working. I tried all the neat games and educational apps, and I watched all the immersive videos I could get ahold of. I even tried my hand at developing my own applications for it.

As the months went on, though, I used it less and less. The novelty wore off, and as cool as it remained, practicality beat coolness. By the time Apple sent me the newer model a couple of weeks ago, I had only put the original one on a few times in the prior couple of months. I had mostly stopped using it at home, but I still took it on trips as an entertainment device for hotel rooms now and then.

That’s not an uncommon story. You even see it in the subreddit for Vision Pro owners, which ought to be the home of the device’s most dedicated fans. Even there, people say, “This is really cool, but I have to go out of my way to keep using it.”

Perhaps it would have been easier to bake it into my day-to-day habits if developer and content creator support had been more robust, a classic chicken-and-egg problem.

After a few weeks of using the new Vision Pro hardware refresh daily, it’s clear to me that the platform needs a bigger rethink. As a fan of the device, I’m concerned it won’t get that, because all the rumors point to Apple pouring its future resources into smart glasses, which, to me, are a completely different product category.

What changed in the new model?

For many users, the most notable change here will be something you can buy separately (albeit at great expense) for the old model: A new headband that balances the device’s weight on your head better, making it more comfortable to wear for long sessions.

Dubbed the Dual Knit Band, it comes with an ingeniously simple adjustment knob that can be used to tighten or loosen either the band that goes across the back of your head (similar to the old band) or the one that wraps around the top.

It’s well-designed, and it will probably make the Vision Pro easier to use for many people who found the old model to be too uncomfortable—even though this model is slightly heavier than its predecessor.

The band fit is adjusted with this knob. You can turn it to loosen or tighten one strap, then pull it out and turn it again to adjust the other. Credit: Samuel Axon

I’m one of the lucky few who never had any discomfort problems with the Vision Pro, but I know a bunch of folks who said the pressure the device put on their foreheads was unbearable. That’s exactly what this new band remedies, so it’s nice to see.

The M5 chip offers more than just speed

Whereas the first Vision Pro had Apple’s M2 chip—which was already a little behind the times when it launched—the new one adds the M5. It’s much faster, especially for graphics-processing and machine-learning tasks. We’ve written a lot about the M5 in our articles on other Apple products if you’re interested to learn more about it.

Functionally, this means a lot of little things are a bit faster, like launching certain applications or generating a Persona avatar. I’ll be frank: I didn’t notice any difference that significantly impacted the user experience. I’m not saying I couldn’t tell it was faster sometimes. I’m just saying it wasn’t faster in a way that’s meaningful enough to change any attitudes about the device.

It’s most noticeable with games—both native mixed-reality Vision Pro titles and the iPad versions of demanding games that you can run on a virtual display on the device. Demanding 3D games look and run nicer, in many cases. The M5 also supports more recent graphics advancements like ray tracing and mesh shading, though very few games support them, even in terms of iPad versions.

All this is to say that while I always welcome performance improvements, they are definitely not enough to convince an M2 Vision Pro owner to upgrade, and they won’t tip things over for anyone who has been on the fence about buying one of these things.

The main perk of the new chip is improved efficiency, which is the driving force behind modestly increased battery life. When I first took the M2 Vision Pro on a plane, I tried watching 2021’s Dune. I made it through the movie, but just barely; the battery ran out during the closing credits. It’s not a short movie, but there are longer ones.

Now, the new headset can easily get another 30 or 60 minutes, depending on what you’re doing, which finally puts it in “watch any movie you want” territory.

Given how short battery life was in the original version, even a modest bump like that makes a big difference. That, alongside a marginally increased field of view (about 10 percent) and a new 120 Hz maximum refresh rate for passthrough are the best things about the new hardware. These are nice-to-haves, but they’re not transformational by any means.

We already knew the Vision Pro offered excellent hardware (even if it’s overkill for most users), but the platform’s appeal is really driven by software. Unfortunately, this is where things are running behind expectations.

For content, it’s quality over quantity

When the first Vision Pro launched, I was bullish about the promise of the platform—but a lot of that was contingent on a strong content cadence and third-party developer support.

And as I’ve written since, the content cadence for the first year was a disappointment. Whereas I expected weekly episodes of Apple’s Immersive Videos in the TV app, those short videos arrived with gaps of several months. There’s an enormous wealth of great immersive content outside of Apple’s walled garden, but Apple didn’t seem interested in making that easily accessible to Vision Pro owners. Third-party apps did some of that work, but they lagged behind those on other platforms.

The first-party content cadence picked up after the first year, though. Plus, Apple introduced the Spatial Gallery, a built-in app that aggregates immersive 3D photos and the like. It’s almost TikTok-like in that it lets you scroll through short-form content that leverages what makes the device unique, and it’s exactly the sort of thing that the platform so badly needed at launch.

The Spatial Gallery is sort of like a horizontally-scrolling TikTok for 3D photos and video. Credit: Samuel Axon

The content that is there—whether in the TV app or the Spatial Gallery—is fantastic. It’s beautifully, professionally produced stuff that really leans on the hardware. For example, there is an autobiographical film focused on U2’s Bono that does some inventive things with the format that I had never seen or even imagined before.

Bono, of course, isn’t everybody’s favorite, but if you can stomach the film’s bloviating, it’s worth watching just with an eye to what a spatial video production can or should be.

I still think there’s significant room to grow, but the content situation is better than ever. It’s not enough to keep you entertained for hours a day, but it’s enough to make putting on the headset for a bit once a week or so worth it. That wasn’t there a year ago.

The software support situation is in a similar state.

App support is mostly frozen in the year 2024

Many of us have a suite of go-to apps that are foundational to our individual approaches to daily productivity. For me, primarily a macOS user, they are:

  • Firefox
  • Spark
  • Todoist
  • Obsidian
  • Raycast
  • Slack
  • Visual Studio Code
  • Claude
  • 1Password

As you can see, I don’t use most of Apple’s built-in apps—no Safari, no Mail, no Reminders, no Passwords, no Notes… no Spotlight, even. All that may be atypical, but it has never been a problem on macOS, nor has it been on iOS for a few years now.

Impressively, almost all of these are available on visionOS—but only because it can run iPad apps as flat, virtual windows. Firefox, Spark, Todoist, Obsidian, Slack, 1Password, and even Raycast are all available as supported iPad apps, but surprisingly, Claude isn’t, even though there is a Claude app for iPads. (ChatGPT’s iPad app works, though.) VS Code isn’t available, of course, but I wasn’t expecting it to be.

Not a single one of these applications has a true visionOS app. That’s too bad, because I can think of lots of neat things spatial computing versions could do. Imagine browsing your Obsidian graph in augmented reality! Alas, I can only dream.

You can tell the native apps from the iPad ones: The iPad ones have rectangular icons nested within circles, whereas the native apps fill the whole circle. Credit: Samuel Axon

If you’re not such a huge productivity software geek like me and you use Apple’s built-in apps, things look a little better, but surprisingly, there are still a few apps that you would imagine would have really cool spatial computing features—like Apple Maps—that don’t. Maps, too, is just an iPad app.

Even if you set productivity aside and focus on entertainment, there are still frustrating gaps. Almost two years later, there is still no Netflix or YouTube app. There are decent-enough third-party options for YouTube, but you have to watch Netflix in a browser, which is lower-quality than in a native app and looks horrible on one of the Vision Pro’s big virtual screens.

To be clear, there is a modest trickle of interesting spatial app experiences coming in—most of them games, educational apps, or cool one-off ideas that are fun to check out for a few minutes.

All this is to say that nothing has really changed since February 2024. There was an influx of apps at launch that included a small number of show-stoppers (mostly educational apps), but the rest ranged from “basically the iPad app but with one or two throwaway tech-demo-style spatial features you won’t try more than once” to “basically the iPad app but a little more native-feeling” to “literally just the iPad app.” As far as support from popular, cross-platform apps, it’s mostly the same list today as it was then.

Its killer app is that it’s a killer monitor

Even though Apple hasn’t made a big leap forward in developer support, it has made big strides in making the Vision Pro a nifty companion to the Mac.

From the start, it has had a feature that lets you simply look at a Mac’s built-in display, tap your fingers, and launch a large, resizable virtual monitor. I have my own big, multi-monitor setup at home, but I have used the Vision Pro this way sometimes when traveling.

I had some complaints at the start, though. It could only do one monitor, and that monitor was limited to 60 Hz and a standard widescreen resolution. That’s better than just using a 14-inch MacBook Pro screen, but it’s a far cry from the sort of high-end setup a $3,500 price tag suggests. Furthermore, it didn’t allow you to switch audio between the two devices.

Thanks to both software and hardware updates, that has all changed. visionOS now supports three different monitor sizes: the standard widescreen aspect ratio, a wider one that resembles a standard ultra-wide monitor, and a gigantic, ultra-ultra-wide wrap-around display that I can assure you will leave no one wanting for desktop space. It looks great. Problem solved! Likewise, it will now transfer your Mac audio to the Vision Pro or its Bluetooth headphones automatically.

All of that works not just on the new Vision Pro, but also on the M2 model. The new M5 model exclusively addresses the last of my complaints: You can now achieve higher refresh rates for that virtual monitor than 60 Hz. Apple says it goes “up to 120 Hz,” but there’s no available tool for measuring exactly where it’s landing. Still, I’m happy to see any improvement here.

This is the standard width for the Mac monitor feature… Samuel Axon

Through a series of updates, Apple has turned a neat proof-of-concept feature into something that is genuinely valuable—especially for folks who like ultra-wide or multi-monitor setups but have to travel a lot (like myself) or who just don’t want to invest in the display hardware at home.

You can also play your Mac games on this monitor. I tried playing No Man’s Sky and Cyberpunk 2077 on it with a controller, and it was a fantastic experience.

This, alongside spatial video and watching movies, is the Vision Pro’s current killer app and one of the main areas where Apple has clearly put a lot of effort into improving the platform.

Stop trying to make Personas happen

Strangely, another area where Apple has invested quite a bit to make things better is in the Vision Pro’s usefulness as a communications and meetings device. Personas—the 3D avatars of yourself that you create for Zoom calls and the like—were absolutely terrible when the M2 Vision Pro came out.

There is also EyeSight, which uses your Persona to show a simulacrum of your eyes to people around you in the real world, letting them know you are aware of your surroundings and even allowing them to follow your gaze. I understand the thought behind this feature—Apple doesn’t want mixed reality to be socially isolating—but it sometimes puts your eyes in the wrong place, it’s kind of hard to see, and it honestly seems like a waste of expensive hardware.

Primarily via software updates, I’m pleased to report that Personas are drastically improved. Mine now actually looks like me, and it moves more naturally, too.

I joined a FaceTime call with Apple reps where they showed me how Personas float and emote around each other, and how we could look at the same files and assets together. It was indisputably cool and way better than before, thanks to the improved Personas.

I can’t say as much for EyeSight, which looks the same. It’s hard for me to fathom that Apple has put multiple sensors and screens on this thing to support this feature.

In my view, dropping EyeSight would be the single best thing Apple could do for this headset. Most people don’t like  it, and most people don’t want it, yet there is no question that its inclusion adds a not-insignificant amount to both the price and the weight, the product’s two biggest barriers to adoption.

Likewise, Personas are theoretically cool, and it is a novel and fun experience to join a FaceTime call with people and see how it works and what you could do. But it’s just that: a novel experience. Once you’ve done it, you’ll never feel the need to do it again. I can barely imagine anyone who would rather show up to a call as a Persona than take the headset off for 30 minutes to dial in on their computer.

Much of this headset is dedicated to this idea that it can be a device that connects you with others, but maintaining that priority is simply the wrong decision. Mixed reality is isolating, and Apple is treating that like a problem to be solved, but I consider that part of its appeal.

If this headset were capable of out-in-the-world AR applications, I would not feel that way, but the Vision Pro doesn’t support any application that would involve taking it outside the home into public spaces. A lot of the cool, theoretical AR uses I can think of would involve that, but still no dice here.

The metaverse (it’s telling that this is the first time I’ve typed that word in at least a year) already exists: It’s on our phones, in Instagram and TikTok and WeChat and Fortnite. It doesn’t need to be invented, and it doesn’t need a new, clever approach to finally make it take off. It has already been invented. It’s already in orbit.

Like the iPad and the Apple Watch before it, the Vision Pro needs to stop trying to be a general-purpose device and instead needs to lean into what makes it special.

In doing so, it will become a better user experience, and it will get lighter and cheaper, too. There’s real potential there. Unfortunately, Apple may not go that route if leaks and insider reports are to be believed.

There’s still a ways to go, so hopefully this isn’t a dead end

The M5 Vision Pro was the first of four planned new releases in the product line, according to generally reliable industry analyst Ming-Chi Kuo. Next up, he predicted, would be a full Vision Pro 2 release with a redesign, and a Vision Air, a cheaper, lighter alternative. Those would all precede true smart glasses many years down the road.

I liked that plan: keep the full-featured Vision Pro for folks who want the most premium mixed reality experience possible (but maybe drop EyeSight), and launch a cheaper version to compete more directly with headsets like Meta’s Quest line of products, or the newly announced Steam Frame VR headset from Valve, along with planned competitors by Google, Samsung, and others.

True augmented reality glasses are an amazing dream, but there are serious problems of optics and user experience that we’re still a ways off from solving before those can truly replace the smartphone as Tim Cook once predicted.

All that said, it looks like that plan has been called into question. A Bloomberg report in October claimed that Apple CEO Tim Cook had told employees that the company was redirecting resources from future passthrough HMD products to accelerate work on smart glasses.

Let’s be real: It’s always going to be a once-in-a-while device, not a daily driver. For many people, that would be fine if it cost $1,000. At $3,500, it’s still a nonstarter for most consumers.

I believe there is room for this product in the marketplace. I still think it’s amazing. It’s not going to be as big as the iPhone, or probably even the iPad, but it has already found a small audience that could grow significantly if the price and weight could come down. Removing all the hardware related to Personas and EyeSight would help with that.

I hope Apple keeps working on it. When Apple released the Apple Watch, it wasn’t entirely clear what its niche would be in users’ lives. The answer (health and fitness) became crystal clear over time, and the other ambitions of the device faded away while the company began building on top of what was working best.

You see Apple doing that a little bit with the expanded Mac spatial display functionality. That can be the start of an intriguing journey. But writers have a somewhat crass phrase: “kill your darlings.” It means that you need to be clear-eyed about your work and unsentimentally cut anything that’s not working, even if you personally love it—even if it was the main thing that got you excited about starting the project in the first place.

It’s past time for Apple to start killing some darlings with the Vision Pro, but I truly hope it doesn’t go too far and kill the whole platform.

Photo of Samuel Axon

Samuel Axon is the editorial lead for tech and gaming coverage at Ars Technica. He covers AI, software development, gaming, entertainment, and mixed reality. He has been writing about gaming and technology for nearly two decades at Engadget, PC World, Mashable, Vice, Polygon, Wired, and others. He previously ran a marketing and PR agency in the gaming industry, led editorial for the TV network CBS, and worked on social media marketing strategy for Samsung Mobile at the creative agency SPCSHP. He also is an independent software and game developer for iOS, Windows, and other platforms, and he is a graduate of DePaul University, where he studied interactive media and software development.

Vision Pro M5 review: It’s time for Apple to make some tough choices Read More »

anthropic-introduces-cheaper,-more-powerful,-more-efficienct-opus-4.5-model

Anthropic introduces cheaper, more powerful, more efficienct Opus 4.5 model

Anthropic today released Opus 4.5, its flagship frontier model, and it brings improvements in coding performance, as well as some user experience improvements that make it more generally competitive with OpenAI’s latest frontier models.

Perhaps the most prominent change for most users is that in the consumer app experiences (web, mobile, and desktop), Claude will be less prone to abruptly hard-stopping conversations because they have run too long. The improvement to memory within a single conversation applies not just to Opus 4.5, but to any current Claude models in the apps.

Users who experienced abrupt endings (despite having room left in their session and weekly usage budgets) were hitting a hard context window (200,000 tokens). Whereas some large language model implementations simply start trimming earlier messages from the context when a conversation runs past the maximum in the window, Claude simply ended the conversation rather than allow the user to experience an increasingly incoherent conversation where the model would start forgetting things based on how old they are.

Now, Claude will instead go through a behind-the-scenes process of summarizing the key points from the earlier parts of the conversation, attempting to discard what it deems extraneous while keeping what’s important.

Developers who call Anthropic’s API can leverage the same principles through context management and context compaction.

Opus 4.5 performance

Opus 4.5 is the first model to surpass an accuracy score of 80 percent—specifically, 80.9 percent in the SWE-Bench Verified benchmark, narrowly beating OpenAI’s recently released GPT-5.1-Codex-Max (77.9 percent) and Google’s Gemini 3 Pro (76.2 percent). The model performs particularly well in agentic coding and agentic tool use benchmarks, but still lags behind GPT-5.1 in visual reasoning (MMMU).

Anthropic introduces cheaper, more powerful, more efficienct Opus 4.5 model Read More »

uk-government-will-buy-tech-to-boost-ai-sector-in-$130m-growth-push

UK government will buy tech to boost AI sector in $130M growth push

“Our particular strengths as a country lie in areas like life sciences, financial services, the defense sector, and the creative sector. And where we will really lead the world is where we can use the power of AI in those sectors,” Kendall told the Financial Times.

The plans came as part of a wider AI package designed to upgrade Britain’s tech infrastructure and convince entrepreneurs and investors that Labour is backing the sector ahead of next week’s Budget, which is expected to raise taxes on the wealthy.

The UK has sought to attract investment from US AI companies such as OpenAI and Anthropic.

The government has signed several “strategic partnerships” with American groups in a bid to attract foreign investment in UK AI infrastructure and talent, in exchange for adopting their technology in the public sector.

Sue Daley, of lobby group TechUK, said the plan showed “real ambition” but warned: “Advanced market commitments of this kind must be designed carefully to avoid unintentionally distorting competition.”

The government also announced that James Wise, a venture capitalist at Balderton, would chair the government’s 500 million pound sovereign AI unit, which has been set up to back AI startups alongside the British Business Bank.

Additional reporting by Ivan Levingston.

© 2025 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

UK government will buy tech to boost AI sector in $130M growth push Read More »

“go-generate-a-bridge-and-jump-off-it”:-how-video-pros-are-navigating-ai

“Go generate a bridge and jump off it”: How video pros are navigating AI


I talked with nine creators about economic pressures and fan backlash.

Credit: Aurich Lawson | Getty Images

Credit: Aurich Lawson | Getty Images

In 2016, the legendary Japanese filmmaker Hayao Miyazaki was shown a bizarre AI-generated video of a misshapen human body crawling across a floor.

Miyazaki declared himself “utterly disgusted” by the technology demo, which he considered an “insult to life itself.”

“If you really want to make creepy stuff, you can go ahead and do it,” Miyazaki said. “I would never wish to incorporate this technology into my work at all.”

Many fans interpreted Miyazaki’s remarks as rejecting AI-generated video in general. So they didn’t like it when, in October 2024, filmmaker PJ Accetturo used AI tools to create a fake trailer for a live-action version of Miyazaki’s animated classic Princess Mononoke. The trailer earned him 22 million views on X. It also earned him hundreds of insults and death threats.

“Go generate a bridge and jump off of it,” said one of the funnier retorts. Another urged Accetturo to “throw your computer in a river and beg God’s forgiveness.”

Someone tweeted that Miyazaki “should be allowed to legally hunt and kill this man for sport.”

PJ Accetturo is a director and founder of Genre AI, an AI ad agency. Credit: PJ Accetturo

The development of AI image and video generation models has been controversial, to say the least. Artists have accused AI companies of stealing their work to build tools that put people out of a job. Using AI tools openly is stigmatized in many circles, as Accetturo learned the hard way.

But as these models have improved, they have sped up workflows and afforded new opportunities for artistic expression. Artists without AI expertise might soon find themselves losing work.

Over the last few weeks, I’ve spoken to nine actors, directors, and creators about how they are navigating these tricky waters. Here’s what they told me.

Actors have emerged as a powerful force against AI. In 2023, SAG-AFTRA, the Hollywood actors’ union, had its longest-ever strike, partly to establish more protections for actors against AI replicas.

Actors have lobbied to regulate AI in their industry and beyond. One actor I talked with, Erik Passoja, has testified before the California Legislature in favor of several bills, including for greater protections against pornographic deepfakes. SAG-AFTRA endorsed SB 1047, an AI safety bill regulating frontier models. The union also organized against the proposed moratorium on state AI bills.

A recent flashpoint came in September, when Deadline Hollywood reported that talent agencies were interested in signing “AI actress” Tilly Norwood.

Actors weren’t happy. Emily Blunt told Variety, “This is really, really scary. Come on agencies, don’t do that.”

Natasha Lyonne, star of Russian Doll, posted on an Instagram Story: “Any talent agency that engages in this should be boycotted by all guilds. Deeply misguided & totally disturbed.”

The backlash was partly specific to Tilly Norwood—Lyonne is no AI skeptic, having cofounded an AI studio—but it also reflects a set of concerns around AI common to many in Hollywood and beyond.

Here’s how SAG-AFTRA explained its position:

Tilly Norwood is not an actor, it’s a character generated by a computer program that was trained on the work of countless professional performers — without permission or compensation. It has no life experience to draw from, no emotion and, from what we’ve seen, audiences aren’t interested in watching computer-generated content untethered from the human experience. It doesn’t solve any “problem” — it creates the problem of using stolen performances to put actors out of work, jeopardizing performer livelihoods and devaluing human artistry.

This statement reflects three broad criticisms that come up over and over in discussions of AI art:

Content theft: Most leading AI video models have been trained on broad swathes of the Internet, including images and films made by artists. In many cases, companies have not asked artists for permission to use this content, nor compensated them. Courts are still working out whether this is fair use under copyright law. But many people I talked to consider AI companies’ training efforts to be theft of artists’ work.

Job loss:  If AI tools can make passable video quickly or drastically speed up editing tasks, that potentially takes jobs away from actors or film editors. While past technological advancements have also eliminated jobs—the adoption of digital cameras drastically reduced the number of people cutting physical film—AI could have an even broader impact.

Artistic quality:  A lot of people told me they just didn’t think AI-generated content could ever be good art. Tess Dinerstein stars in vertical dramas—episodic programs optimized for viewing on smartphones. She told me that AI is “missing that sort of human connection that you have when you go to a movie theater and you’re sobbing your eyes out because your favorite actor is talking about their dead mom.”

The concern about theft is potentially solvable by changing how models are trained. Around the time Accetturo released the “Princess Mononoke” trailer, he called for generative AI tools to be “ethically trained on licensed datasets.”

Some companies have moved in this direction. For instance, independent filmmaker Gille Klabin told me he “feels pretty good” using Adobe products because the company trains its AI models on stock images that it pays royalties for.

But the other two issues—job losses and artistic integrity—will be harder to finesse. Many creators—and fans—believe that AI-generated content misses the fundamental point of art, which is about creating an emotional connection between creators and viewers.

But while that point is compelling in theory, the details can be tricky.

Dinerstein, the vertical drama actress, told me that she’s “not fundamentally against AI”—she admits “it provides a lot of resources to filmmakers” in specialized editing tasks—but she takes a hard stance against it on social media.

“It’s hard to ever explain gray areas on social media,” she said, and she doesn’t want to “come off as hypocritical.”

Even though she doesn’t think that AI poses a risk to her job—“people want to see what I’m up to”—she does fear people (both fans and vertical drama studios) making an AI representation of her without her permission. And she has found it easiest to just say, “You know what? Don’t involve me in AI.”

Others see it as a much broader issue. Actress Susan Spano told me it was “an issue for humans, not just actors.”

“This is a world of humans and animals,” she said. “Interaction with humans is what makes it fun. I mean, do we want a world of robots?”

It’s relatively easy for actors to take a firm stance against AI because they inherently do their work in the physical world. But things are more complicated for other Hollywood creatives, such as directors, writers, and film editors. AI tools can genuinely make them more productive, and they’re at risk of losing work if they don’t stay on the cutting edge.

So the non-actors I talked to took a range of approaches to AI. Some still reject it. Others have used the tools reluctantly and tried to keep their heads down. Still others have openly embraced the technology.

Kavan Cardoza is a director and AI filmmaker. Credit: Phantom X

Take Kavan Cardoza, for example. He worked as a music video director and photographer for close to a decade before getting his break into filmmaking with AI.

After the image model Midjourney was first released in 2022, Cardoza started playing around with image generation and later video generation. Eventually, he “started making a bunch of fake movie trailers” for existing movies and franchises. In December 2024, he made a fan film in the Batman universe that “exploded on the Internet,” before Warner Bros. took it down for copyright infringement.

Cardoza acknowledges that he re-created actors in former Batman movies “without their permission.” But he insists he wasn’t “trying to be malicious or whatever. It was truly just a fan film.”

Whereas Accetturo received death threats, the response to Cardoza’s fan film was quite positive.

“Every other major studio started contacting me,” Cardoza said. He set up an AI studio, Phantom X, with several of his close friends. Phantom X started by making ads (where AI video is catching on quickest), but Cardoza wanted to focus back on films.

In June, Cardoza made a short film called Echo Hunter, a blend of Blade Runner and The Matrix. Some shots look clearly AI-generated, but Cardoza used motion-capture technology from Runway to put the faces of real actors into his AI-generated world. Overall, the piece pretty much hangs together.

Cardoza wanted to work with real actors because their artistic choices can help elevate the script he’s written: “There’s a lot more levels of creativity to it.” But he needed SAG-AFTRA’s approval to make a film that blends AI techniques with the likenesses of SAG-AFTRA actors. To get it, he had to promise not to reuse the actors’ likenesses in other films.

In Cardoza’s view, AI is “giving voices to creators that otherwise never would have had the voice.”

But Cardoza isn’t wedded to AI. When an interviewer asked him whether he’d make a non-AI film if required to, he responded, “Oh, 100 percent.” Cardoza added that if he had the budget to do it now, “I’d probably still shoot it all live action.”

He acknowledged to me that there will be losers in the transition—“there’s always going to be changes”—but he compares the rise of AI with past technological developments in filmmaking, like the rise of visual effects. This created new jobs making visual effects digitally, but reduced jobs making elaborate physical sets.

Cardoza expressed interest in reducing the amount of job loss. In another interview, Cardoza said that for his film project, “we want to make sure we include as many people as possible,” not just actors, but sound designers, script editors, and other specialized roles.

But he believes that eventually, AI will get good enough to do everyone’s job. “Like I say with tech, it’s never about if, it’s just when.”

Accetturo’s entry into AI was similar. He told me that he worked for 15 years as a filmmaker, “mostly as a commercial director and former documentary director.” During the pandemic, he “raised millions” for an animated TV series, but it got caught up in development hell.

AI gave him a new chance at success. Over the summer of 2024, he started playing around with AI video tools. He realized that he was in the sweet spot to take advantage of AI: experienced enough to make something good, but not so established that he was risking his reputation. After Google released Veo 3 in May, Accetturo released a fake medicine ad that went viral. His studio now produces ads for prominent companies like Oracle and Popeyes.

Accetturo says the backlash against him has subsided: “It truly is nothing compared to what it was.” And he says he’s committed to working on AI: “Everyone understands that it’s the future.”

Between the anti- and pro-AI extremes, there are a lot of editors and artists quietly using AI tools without disclosing it. Unsurprisingly, it’s difficult to find people who will speak about this on the record.

“A lot of people want plausible deniability right now,” according to Ryan Hayden, a Hollywood talent agent. “There is backlash about it.”

But if editors don’t use AI tools, they risk becoming obsolete. Hayden says that he knows a lot of people in the editing field trying to master AI because “there’s gonna be a massive cut” in the total number of editors. Those who know AI might survive.

As one comedy writer involved in an AI project told Wired, “We wanted to be at the table and not on the menu.”

Clandestine AI usage extends into the upper reaches of the industry. Hayden knows an editor who works with a major director who has directed $100 million films. “He’s already using AI, sometimes without people knowing.”

Some artists feel morally conflicted but don’t think they can effectively resist. Vinny Dellay, a storyboard artist who has worked on Marvel films and Super Bowl ads, released a video detailing his views on the ethics of using AI as a working artist. Dellay said that he agrees that “AI being trained off of art found on the Internet without getting permission from the artist, it may not be fair, it may not be honest.” But refusing to use AI products won’t stop their general adoption. Believing otherwise is “just being delusional.”

Instead, Dellay said that the right course is to “adapt like cockroaches after a nuclear war.” If they’re lucky, using AI in storyboarding workflows might even “let a storyboard artist pump out twice the boards in half the time without questioning all your life’s choices at 3 am.”

Gille Klabin is an independent writer, director, and visual effects artist. Credit: Gille Klabin

Gille Klabin is an indie director and filmmaker currently working on a feature called Weekend at the End of the World.

As an independent filmmaker, Klabin can’t afford to hire many people. There are many labor-intensive tasks—like making a pitch deck for his film—that he’d otherwise have to do himself. An AI tool “essentially just liberates us to get more done and have more time back in our life.”

But he’s careful to stick to his own moral lines. Any time he mentioned using an AI tool during our interview, he’d explain why he thought that was an appropriate choice. He said he was fine with AI use “as long as you’re using it ethically in the sense that you’re not copying somebody’s work and using it for your own.”

Drawing these lines can be difficult, however. Hayden, the talent agent, told me that as AI tools make low-budget films look better, it gets harder to make high-budget films, which employ the most people at the highest wage levels.

If anything, Klabin’s AI uptake is limited more by the current capabilities of AI models. Klabin is an experienced visual effects artist, and he finds AI products to generally be “not really good enough to be used in a final project.”

He gave me a concrete example. Rotoscoping is a process in which you trace out the subject of the shot so you can edit the background independently. It’s very labor-intensive—one has to edit every frame individually—so Klabin has tried using Runway’s AI-driven rotoscoping. While it can make for a decent first pass, the result is just too messy to use as a final project.

Klabin sent me this GIF of a series of rotoscoped frames from his upcoming movie. While the model does a decent job of identifying the people in the frame, its boundaries aren’t consistent from frame to frame. The result is noisy.

Current AI tools are full of these small glitches, so Klabin only uses them for tasks that audiences don’t see (like creating a movie pitch deck) or in contexts where he can clean up the result afterward.

Stephen Robles reviews Apple products on YouTube and other platforms. He uses AI in some parts of the editing process, such as removing silences or transcribing audio, but doesn’t see it as disruptive to his career.

Stephen Robles is a YouTuber, podcaster, and creator covering tech, particularly Apple. Credit: Stephen Robles

“I am betting on the audience wanting to trust creators, wanting to see authenticity,” he told me. AI video tools don’t really help him with that and can’t replace the reputation he’s sought to build.

Recently, he experimented with using ChatGPT to edit a video thumbnail (the image used to advertise a video). He got a couple of negative reactions about his use of AI, so he said he “might slow down a little bit” with that experimentation.

Robles didn’t seem as concerned about AI models stealing from creators like him. When I asked him about how he felt about Google training on his data, he told me that “YouTube provides me enough benefit that I don’t think too much about that.”

Professional thumbnail artist Antioch Hwang has a similarly pragmatic view toward using AI. Some channels he works with have audiences that are “very sensitive to AI images.” Even using “an AI upscaler to fix up the edges” can provoke strong negative reactions. For those channels, he’s “very wary” about using AI.

Antioch Hwang is a YouTube thumbnail artist. Credit: Antioch Creative

But for most channels he works for, he’s fine using AI, at least for technical tasks. “I think there’s now been a big shift in the public perception of these AI image generation tools,” he told me. “People are now welcoming them into their workflow.”

He’s still careful with his AI use, though, because he thinks that having human artistry helps in the YouTube ecosystem. “If everyone has all the [AI] tools, then how do you really stand out?” he said.

Recently, top creators have started using more rough-looking thumbnails for their videos. AI has made polished thumbnails too easy to create, so top creators are using what Hwang would call “poorly made thumbnails” to help videos stand out.

Hwang told me something surprising: even as AI makes it easier for creators to make thumbnails themselves, business has never been better for thumbnail artists, even at the lower end. He said that demand has soared because “AI as a whole has lowered the barriers for content creation, and now there’s more creators flooding in.”

Still, Hwang doesn’t expect the good times to last forever. “I don’t see AI completely taking over for the next three-ish years. That’s my estimated timeline.”

Everyone I talked to had different answers to when—if ever—AI would meaningfully disrupt their part of the industry.

Some, like Hwang, were pessimistic. Actor Erik Passoja told me he thought the big movie studios—like Warner Bros. or Paramount—would be gone in three to five years.

But others were more optimistic. Tess Dinerstein, the vertical drama actor, said, “I don’t think that verticals are ever going to go fully AI.” Even if it becomes technologically feasible, she argued, “that just doesn’t seem to be what the people want.”

Gille Klabin, the independent filmmaker, thought there would always be a place for high-quality human films. If someone’s work is “fundamentally derivative,” then they are at risk. But he thinks the best human-created work will still stand out. “I don’t know how AI could possibly replace the borderline divine element of consciousness,” he said.

The people who were most bullish on AI were, if anything, the least optimistic about their own career prospects. “I think at a certain point it won’t matter,” Kavan Cardoza told me. “It’ll be that anyone on the planet can just type in some sentences” to generate full, high-quality videos.

This might explain why Accetturo has become something of an AI evangelist; his newsletter tries to teach other filmmakers how to adapt to the coming AI revolution.

AI “is a tsunami that is gonna wipe out everyone” he told me. “So I’m handing out surfboards—teaching people how to surf. Do with it what you will.”

Kai Williams is a reporter for Understanding AI, a Substack newsletter founded by Ars Technica alum Timothy B. Lee. His work is supported by a Tarbell FellowshipSubscribe to Understanding AI to get more from Tim and Kai.

“Go generate a bridge and jump off it”: How video pros are navigating AI Read More »

science-centric-streaming-service-curiosity-stream-is-an-ai-licensing-firm-now

Science-centric streaming service Curiosity Stream is an AI-licensing firm now

We all know streaming services’ usual tricks for making more money: get more subscribers, charge those subscribers more money, and sell ads. But science streaming service Curiosity Stream is taking a new route that could reshape how streaming companies, especially niche options, try to survive.

Discovery Channel founder John Hendricks launched Curiosity Stream in 2015. The streaming service costs $40 per year, and it doesn’t have commercials.

The streaming business has grown to also include the Curiosity Channel TV channel. CuriosityStream Inc. also makes money through original programming and its Curiosity University educational programming. The firm turned its first positive net income in its fiscal Q1 2025, after about a decade of business.

With its focus on science, history, research, and education, Curiosity Stream will always be a smaller player compared to other streaming services. As of March 2023, Curiosity Stream had 23 million subscribers, a paltry user base compared to Netflix’s 301.6 million (as of January 2025).

Still, in an extremely competitive market, Curiosity Stream’s revenue increased 41 percent year over year in its Q3 2025 earnings announced this month. This was largely due to the licensing of Curiosity Stream’s original programming to train large language models (LLMs).

“Looking at our year-to-date numbers, licensing generated $23.4 million through September, which … is already over half of what our subscription business generated for all of 2024,” Phillip Hayden, Curiosity Stream’s CFO, said during a call with investors this month.

Thus far, Curiosity Stream has completed 18 AI-related fulfillments “across video, audio, and code assets” with nine partners, an October announcement said.

The company expects to make more revenue from IP licensing deals with AI companies than it does from subscriptions by 2027, “possibly earlier,” CEO Clint Stinchcomb said during the earnings call.

Science-centric streaming service Curiosity Stream is an AI-licensing firm now Read More »

google-tells-employees-it-must-double-capacity-every-6-months-to-meet-ai-demand

Google tells employees it must double capacity every 6 months to meet AI demand

While AI bubble talk fills the air these days, with fears of overinvestment that could pop at any time, something of a contradiction is brewing on the ground: Companies like Google and OpenAI can barely build infrastructure fast enough to fill their AI needs.

During an all-hands meeting earlier this month, Google’s AI infrastructure head Amin Vahdat told employees that the company must double its serving capacity every six months to meet demand for artificial intelligence services, reports CNBC. Vahdat, a vice president at Google Cloud, presented slides showing the company needs to scale “the next 1000x in 4-5 years.”

While a thousandfold increase in compute capacity sounds ambitious by itself, Vahdat noted some key constraints: Google needs to be able to deliver this increase in capability, compute, and storage networking “for essentially the same cost and increasingly, the same power, the same energy level,” he told employees during the meeting. “It won’t be easy but through collaboration and co-design, we’re going to get there.”

It’s unclear how much of this “demand” Google mentioned represents organic user interest in AI capabilities versus the company integrating AI features into existing services like Search, Gmail, and Workspace. But whether users are using the features voluntarily or not, Google isn’t the only tech company struggling to keep up with a growing user base of customers using AI services.

Major tech companies are in a race to build out data centers. Google competitor OpenAI is planning to build six massive data centers across the US through its Stargate partnership project with SoftBank and Oracle, committing over $400 billion in the next three years to reach nearly 7 gigawatts of capacity. The company faces similar constraints serving its 800 million weekly ChatGPT users, with even paid subscribers regularly hitting usage limits for features like video synthesis and simulated reasoning models.

“The competition in AI infrastructure is the most critical and also the most expensive part of the AI race,” Vahdat said at the meeting, according to CNBC’s viewing of the presentation. The infrastructure executive explained that Google’s challenge goes beyond simply outspending competitors. “We’re going to spend a lot,” he said, but noted the real objective is building infrastructure that is “more reliable, more performant and more scalable than what’s available anywhere else.”

Google tells employees it must double capacity every 6 months to meet AI demand Read More »

ai-trained-on-bacterial-genomes-produces-never-before-seen-proteins

AI trained on bacterial genomes produces never-before-seen proteins

The researchers argue that this setup lets Evo “link nucleotide-level patterns to kilobase-scale genomic context.” In other words, if you prompt it with a large chunk of genomic DNA, Evo can interpret that as an LLM would interpret a query and produce an output that, in a genomic sense, is appropriate for that interpretation.

The researchers reasoned that, given the training on bacterial genomes, they could use a known gene as a prompt, and Evo should produce an output that includes regions that encode proteins with related functions. The key question is whether it would simply output the sequences for proteins we know about already, or whether it would come up with output that’s less predictable.

Novel proteins

To start testing the system, the researchers prompted it with fragments of the genes for known proteins and determined whether Evo could complete them. In one example, if given 30 percent of the sequence of a gene for a known protein, Evo was able to output 85 percent of the rest. When prompted with 80 percent of the sequence, it could return all of the missing sequence. When a single gene was deleted from a functional cluster, Evo could also correctly identify and restore the missing gene.

The large amount of training data also ensured that Evo correctly identified the most important regions of the protein. If it made changes to the sequence, they typically resided in the areas of the protein where variability is tolerated. In other words, its training had enabled the system to incorporate the rules of evolutionary limits on changes in known genes.

So, the researchers decided to test what happened when Evo was asked to output something new. To do so, they used bacterial toxins, which are typically encoded along with an anti-toxin that keeps the cell from killing itself whenever it activates the genes. There are a lot of examples of these out there, and they tend to evolve rapidly as part of an arms race between bacteria and their competitors. So, the team developed a toxin that was only mildly related to known ones, and had no known antitoxin, and fed its sequence to Evo as a prompt. And this time, they filtered out any responses that looked similar to known antitoxin genes.

AI trained on bacterial genomes produces never-before-seen proteins Read More »

trump-revives-unpopular-ted-cruz-plan-to-punish-states-that-impose-ai-laws

Trump revives unpopular Ted Cruz plan to punish states that impose AI laws

The FTC chairman would be required to issue a policy statement detailing “circumstances under which State laws that require alterations to the truthful outputs of AI models are preempted by the FTC Act’s prohibition on engaging in deceptive acts or practices affecting commerce.”

When Cruz proposed a moratorium restricting state AI regulation in mid-2025, Sen. Marsha Blackburn (R-Tenn.) helped lead the fight against it. “Until Congress passes federally preemptive legislation like the Kids Online Safety Act and an online privacy framework, we can’t block states from making laws that protect their citizens,” Blackburn said at the time.

Sen. Maria Cantwell (D-Wash.) also spoke out against the Cruz plan, saying it would preempt “good state consumer protection laws” related to robocalls, deepfakes, and autonomous vehicles.

Trump wants Congress to preempt state laws

Besides reviving the Cruz plan, Trump’s draft executive order seeks new legislation to preempt state laws. The order would direct Trump administration officials to “jointly prepare for my review a legislative recommendation establishing a uniform Federal regulatory framework for AI that preempts State AI laws that conflict with the policy set forth in this order.”

House Majority Leader Steve Scalise (R-La.) this week said a ban on state AI laws could be included in the National Defense Authorization Act (NDAA). Democrats are trying to keep the ban out of the bill.

“We have to allow states to take the lead because we’re not able to, so far in Washington, come up with appropriate legislation,” Sen. Jack Reed (D-R.I.), the ranking member on the Armed Services Committee, told Semafor.

In a Truth Social post on Tuesday, Trump claimed that states are “trying to embed DEI ideology into AI models.” Trump wrote, “We MUST have one Federal Standard instead of a patchwork of 50 State Regulatory Regimes. If we don’t, then China will easily catch us in the AI race. Put it in the NDAA, or pass a separate Bill, and nobody will ever be able to compete with America.”

Trump revives unpopular Ted Cruz plan to punish states that impose AI laws Read More »

“we’re-in-an-llm-bubble,”-hugging-face-ceo-says—but-not-an-ai-one

“We’re in an LLM bubble,” Hugging Face CEO says—but not an AI one

There’s been a lot of talk of an AI bubble lately, especially with regards to circular funding involving companies like OpenAI and Anthropic—but Clem Delangue, CEO of machine learning resources hub Hugging Face, has made the case that the bubble is specific to large language models, which is just one application of AI.

“I think we’re in an LLM bubble, and I think the LLM bubble might be bursting next year,” he said at an Axios event this week, as quoted in a TechCrunch article. “But ‘LLM’ is just a subset of AI when it comes to applying AI to biology, chemistry, image, audio, [and] video. I think we’re at the beginning of it, and we’ll see much more in the next few years.”

At Ars, we’ve written at length in recent days about the fears around AI investment. But to Delangue’s point, almost all of those discussions are about companies whose chief product is large language models, or the data centers meant to drive those—specifically, those focused on general-purpose chatbots that are meant to be everything for everybody.

That’s exactly the sort of application Delangue is bearish on. “I think all the attention, all the focus, all the money, is concentrated into this idea that you can build one model through a bunch of compute and that is going to solve all problems for all companies and all people,” he said.

“We’re in an LLM bubble,” Hugging Face CEO says—but not an AI one Read More »

critics-scoff-after-microsoft-warns-ai-feature-can-infect-machines-and-pilfer-data

Critics scoff after Microsoft warns AI feature can infect machines and pilfer data


Integration of Copilot Actions into Windows is off by default, but for how long?

Credit: Photographer: Chona Kasinger/Bloomberg via Getty Images

Microsoft’s warning on Tuesday that an experimental AI agent integrated into Windows can infect devices and pilfer sensitive user data has set off a familiar response from security-minded critics: Why is Big Tech so intent on pushing new features before their dangerous behaviors can be fully understood and contained?

As reported Tuesday, Microsoft introduced Copilot Actions, a new set of “experimental agentic features” that, when enabled, perform “everyday tasks like organizing files, scheduling meetings, or sending emails,” and provide “an active digital collaborator that can carry out complex tasks for you to enhance efficiency and productivity.”

Hallucinations and prompt injections apply

The fanfare, however, came with a significant caveat. Microsoft recommended users enable Copilot Actions only “if you understand the security implications outlined.”

The admonition is based on known defects inherent in most large language models, including Copilot, as researchers have repeatedly demonstrated.

One common defect of LLMs causes them to provide factually erroneous and illogical answers, sometimes even to the most basic questions. This propensity for hallucinations, as the behavior has come to be called, means users can’t trust the output of Copilot, Gemini, Claude, or any other AI assistant and instead must independently confirm it.

Another common LLM landmine is the prompt injection, a class of bug that allows hackers to plant malicious instructions in websites, resumes, and emails. LLMs are programmed to follow directions so eagerly that they are unable to discern those in valid user prompts from those contained in untrusted, third-party content created by attackers. As a result, the LLMs give the attackers the same deference as users.

Both flaws can be exploited in attacks that exfiltrate sensitive data, run malicious code, and steal cryptocurrency. So far, these vulnerabilities have proved impossible for developers to prevent and, in many cases, can only be fixed using bug-specific workarounds developed once a vulnerability has been discovered.

That, in turn, led to this whopper of a disclosure in Microsoft’s post from Tuesday:

“As these capabilities are introduced, AI models still face functional limitations in terms of how they behave and occasionally may hallucinate and produce unexpected outputs,” Microsoft said. “Additionally, agentic AI applications introduce novel security risks, such as cross-prompt injection (XPIA), where malicious content embedded in UI elements or documents can override agent instructions, leading to unintended actions like data exfiltration or malware installation.”

Microsoft indicated that only experienced users should enable Copilot Actions, which is currently available only in beta versions of Windows. The company, however, didn’t describe what type of training or experience such users should have or what actions they should take to prevent their devices from being compromised. I asked Microsoft to provide these details, and the company declined.

Like “macros on Marvel superhero crack”

Some security experts questioned the value of the warnings in Tuesday’s post, comparing them to warnings Microsoft has provided for decades about the danger of using macros in Office apps. Despite the long-standing advice, macros have remained among the lowest-hanging fruit for hackers out to surreptitiously install malware on Windows machines. One reason for this is that Microsoft has made macros so central to productivity that many users can’t do without them.

“Microsoft saying ‘don’t enable macros, they’re dangerous’… has never worked well,” independent researcher Kevin Beaumont said. “This is macros on Marvel superhero crack.”

Beaumont, who is regularly hired to respond to major Windows network compromises inside enterprises, also questioned whether Microsoft will provide a means for admins to adequately restrict Copilot Actions on end-user machines or to identify machines in a network that have the feature turned on.

A Microsoft spokesperson said IT admins will be able to enable or disable an agent workspace at both account and device levels, using Intune or other MDM (Mobile Device Management) apps.

Critics voiced other concerns, including the difficulty for even experienced users to detect exploitation attacks targeting the AI agents they’re using.

“I don’t see how users are going to prevent anything of the sort they are referring to, beyond not surfing the web I guess,” researcher Guillaume Rossolini said.

Microsoft has stressed that Copilot Actions is an experimental feature that’s turned off by default. That design was likely chosen to limit its access to users with the experience required to understand its risks. Critics, however, noted that previous experimental features—Copilot, for instance—regularly become default capabilities for all users over time. Once that’s done, users who don’t trust the feature are often required to invest time developing unsupported ways to remove the features.

Sound but lofty goals

Most of Tuesday’s post focused on Microsoft’s overall strategy for securing agentic features in Windows. Goals for such features include:

  • Non-repudiation, meaning all actions and behaviors must be “observable and distinguishable from those taken by a user”
  • Agents must preserve confidentiality when they collect, aggregate, or otherwise utilize user data
  • Agents must receive user approval when accessing user data or taking actions

The goals are sound, but ultimately they depend on users reading the dialog windows that warn of the risks and require careful approval before proceeding. That, in turn, diminishes the value of the protection for many users.

“The usual caveat applies to such mechanisms that rely on users clicking through a permission prompt,” Earlence Fernandes, a University of California, San Diego professor specializing in AI security, told Ars. “Sometimes those users don’t fully understand what is going on, or they might just get habituated and click ‘yes’ all the time. At which point, the security boundary is not really a boundary.”

As demonstrated by the rash of “ClickFix” attacks, many users can be tricked into following extremely dangerous instructions. While more experienced users (including a fair number of Ars commenters) blame the victims falling for such scams, these incidents are inevitable for a host of reasons. In some cases, even careful users are fatigued or under emotional distress and slip up as a result. Other users simply lack the knowledge to make informed decisions.

Microsoft’s warning, one critic said, amounts to little more than a CYA (short for cover your ass), a legal maneuver that attempts to shield a party from liability.

“Microsoft (like the rest of the industry) has no idea how to stop prompt injection or hallucinations, which makes it fundamentally unfit for almost anything serious,” critic Reed Mideke said. “The solution? Shift liability to the user. Just like every LLM chatbot has a ‘oh by the way, if you use this for anything important be sure to verify the answers” disclaimer, never mind that you wouldn’t need the chatbot in the first place if you knew the answer.”

As Mideke indicated, most of the criticisms extend to AI offerings other companies—including Apple, Google, and Meta—are integrating into their products. Frequently, these integrations begin as optional features and eventually become default capabilities whether users want them or not.

Photo of Dan Goodin

Dan Goodin is Senior Security Editor at Ars Technica, where he oversees coverage of malware, computer espionage, botnets, hardware hacking, encryption, and passwords. In his spare time, he enjoys gardening, cooking, and following the independent music scene. Dan is based in San Francisco. Follow him at here on Mastodon and here on Bluesky. Contact him on Signal at DanArs.82.

Critics scoff after Microsoft warns AI feature can infect machines and pilfer data Read More »

deepmind’s-latest:-an-ai-for-handling-mathematical-proofs

DeepMind’s latest: An AI for handling mathematical proofs


AlphaProof can handle math challenges but needs a bit of help right now.

Computers are extremely good with numbers, but they haven’t gotten many human mathematicians fired. Until recently, they could barely hold their own in high school-level math competitions.

But now Google’s DeepMind team has built AlphaProof, an AI system that matched silver medalists’ performance at the 2024 International Mathematical Olympiad, scoring just one point short of gold at the most prestigious undergrad math competition in the world. And that’s kind of a big deal.

True understanding

The reason computers fared poorly in math competitions is that, while they far surpass humanity’s ability to perform calculations, they are not really that good at the logic and reasoning that is needed for advanced math. Put differently, they are good at performing calculations really quickly, but they usually suck at understanding why they’re doing them. While something like addition seems simple, humans can do semi-formal proofs based on definitions of addition or go for fully formal Peano arithmetic that defines the properties of natural numbers and operations like addition through axioms.

To perform a proof, humans have to understand the very structure of mathematics. The way mathematicians build proofs, how many steps they need to arrive at the conclusion, and how cleverly they design those steps are a testament to their brilliance, ingenuity, and mathematical elegance. “You know, Bertrand Russel published a 500-page book to prove that one plus one equals two,” says Thomas Hubert, a DeepMind researcher and lead author of the AlphaProof study.

DeepMind’s team wanted to develop an AI that understood math at this level. The work started with solving the usual AI problem: the lack of training data.

Math problems translator

Large language models that power AI systems like Chat GPT learn from billions upon billions of pages of text. Because there are texts on mathematics in their training databases—all the handbooks and works of famous mathematicians—they show some level of success in proving mathematical statements. But they are limited by how they operate: They rely on using huge neural nets to predict the next word or token in sequences generated in response to user prompts. Their reasoning is statistical by design, which means they simply return answers that “sound” right.

DeepMind didn’t need the AI to “sound” right—that wasn’t going to cut it in high-level mathematics. They needed their AI to “be” right, to guarantee absolute certainty. That called for an entirely new, more formalized training environment. To provide that, the team used a software package called Lean.

Lean is a computer program that helps mathematicians write precise definitions and proofs. It relies on a precise, formal programming language that’s also called Lean, which mathematical statements can be translated into. Once the translated or formalized statement is uploaded to the program, it can check if it is correct and get back with responses like “this is correct,” “something is missing,” or “you used a fact that is not proved yet.”

The problem was, most mathematical statements and proofs that can be found online are written in natural language like “let X be the set of natural numbers that…”—the number of statements written in Lean was rather limited. “The major difficulty of working with formal languages is that there’s very little data,” Hubert says. To go around it, the researchers trained a Gemini large language model to translate mathematical statements from natural language to Lean. The model worked like an automatic formalizer and produced about 80 million formalized mathematical statements.

It wasn’t perfect, but the team managed to use that to their advantage. “There are many ways you can capitalize on approximate translations,” Hubert claims.

Learning to think

The idea DeepMind had for the AlphaProof was to use the architecture the team used in their chess-, Go-, and shogi-playing AlphaZero AI system. Building proofs in Lean and Mathematics in general was supposed to be just another game to master. “We were trying to learn this game through trial and error,” Hubert says. Imperfectly formalized problems offered great opportunity for making errors. In its learning phase, AlphaProof was simply proving and disproving the problems it had in its database. If something was translated poorly, figuring out that something wasn’t right was a useful form of exercise.

Just like AlphaZero, AlphaProof in most cases used two main components. The first was a huge neural net with a few billion parameters that learned to work in the Lean environment through trial and error. It was rewarded for each proven or disproven statement and penalized for each reasoning step it took, which was a way of incentivizing short, elegant proofs.

It was also trained to use a second component, which was a tree search algorithm. This explored all possible actions that could be taken to push the proof forward at each step. Because the number of possible actions in mathematics can be near infinite, the job of the neural net was to look at the available branches in the search tree and commit computational budget only to the most promising ones.

After a few weeks of training, the system could score well on most math competition benchmarks based on problems sourced from past high school-level competitions, but it still struggled with the most difficult of them. To tackle these, the team added a third component that hadn’t been in AlphaZero. Or anywhere else.

Spark of humanity

The third component, called Test-Time Reinforcement Learning (TTRL), roughly emulated the way mathematicians approach the most difficult problems. The learning part relied on the same combination of neural nets with search tree algorithms. The difference came in what it learned from. Instead of relying on a broad database of auto-formalized problems, AlphaProof working in the TTRL mode started its work by generating an entirely new training dataset based on the problem it was dealing with.

The process involved creating countless variations of the original statement, some simplified a little bit more, some more general, and some only loosely connected to it. The system then attempted to prove or disprove them. It was roughly what most humans do when they’re facing a particularly hard puzzle, the AI equivalent of saying, “I don’t get it, so let’s try an easier version of this first to get some practice.” This allowed AlphaProof to learn on the fly, and it worked amazingly well.

At the 2024 International Mathematics Olympiad, there were 42 points to score for solving six different problems worth seven points each. To win gold, participants had to get 29 points or higher, and 58 out of 609 of them did that. Silver medals were awarded to people who earned between 22 and 28 points (there were 123 silver medalists). The problems varied in difficulty, with the sixth one, acting as a “final boss,” being the most difficult of them all. Only six participants managed to solve it. AlphaProof was the seventh.

But AlphaProof wasn’t an end-all, be-all mathematical genius. Its silver had its price—quite literally.

Optimizing ingenuity

The first problem with AlphaProof’s performance was that it didn’t work alone. To begin with, humans had to make the problems compatible with Lean before the software even got to work. And, among the six Olympic problems, the fourth one was about geometry, and the AI was not optimized for that. To deal with it, AlphaProof had to call a friend called AlphaGeometry 2, a geometry-specialized AI that ripped through the task in a few minutes without breaking a sweat. On its own, AlphaProof scored 21 points, not 28, so technically it would win bronze, not silver. Except it wouldn’t.

Human participants of the Olympiad had to solve their six problems in two sessions, four-and-a-half hours long. AlphaProof, on the other hand, wrestled with them for several days using multiple tensor processing units at full throttle. The most time- and energy-consuming component was TTRL, which battled with the three problems it managed to solve for three days each. If AlphaProof was held up to the same standard as human participants, it would basically run out of time. And if it wasn’t born at a tech giant worth hundreds of billions of dollars, it would run out of money, too.

In the paper, the team admits the computational requirements to run AlphaProof are most likely cost-prohibitive for most research groups and aspiring mathematicians. Computing power in AI applications is often measured in TPU-days, meaning a tensor processing unit working flat-out for a full day. AlphaProof needed hundreds of TPU-days per problem.

On top of that, the International Mathematics Olympiad is a high school-level competition, and the problems, while admittedly difficult, were based on things mathematicians already know. Research-level math requires inventing entirely new concepts instead of just working with existing ones.

But DeepMind thinks it can overcome these hurdles and optimize AlphaProof to be less resource-hungry. “We don’t want to stop at math competitions. We want to build an AI system that could really contribute to research-level mathematics,” Hubert says. His goal is to make AlphaProof available to the broader research community. “We’re also releasing a kind of an AlphaProof tool,” he added. “It would be a small trusted testers program to see if this would be useful to mathematicians.”

Nature, 2025.  DOI: 10.1038/s41586-025-09833-y

Photo of Jacek Krywko

Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.

DeepMind’s latest: An AI for handling mathematical proofs Read More »

how-louvre-thieves-exploited-human-psychology-to-avoid-suspicion—and-what-it-reveals-about ai

How Louvre thieves exploited human psychology to avoid suspicion—and what it reveals about AI

On a sunny morning on October 19 2025, four men allegedly walked into the world’s most-visited museum and left, minutes later, with crown jewels worth 88 million euros ($101 million). The theft from Paris’ Louvre Museum—one of the world’s most surveilled cultural institutions—took just under eight minutes.

Visitors kept browsing. Security didn’t react (until alarms were triggered). The men disappeared into the city’s traffic before anyone realized what had happened.

Investigators later revealed that the thieves wore hi-vis vests, disguising themselves as construction workers. They arrived with a furniture lift, a common sight in Paris’s narrow streets, and used it to reach a balcony overlooking the Seine. Dressed as workers, they looked as if they belonged.

This strategy worked because we don’t see the world objectively. We see it through categories—through what we expect to see. The thieves understood the social categories that we perceive as “normal” and exploited them to avoid suspicion. Many artificial intelligence (AI) systems work in the same way and are vulnerable to the same kinds of mistakes as a result.

The sociologist Erving Goffman would describe what happened at the Louvre using his concept of the presentation of self: people “perform” social roles by adopting the cues others expect. Here, the performance of normality became the perfect camouflage.

The sociology of sight

Humans carry out mental categorization all the time to make sense of people and places. When something fits the category of “ordinary,” it slips from notice.

AI systems used for tasks such as facial recognition and detecting suspicious activity in a public area operate in a similar way. For humans, categorization is cultural. For AI, it is mathematical.

But both systems rely on learned patterns rather than objective reality. Because AI learns from data about who looks “normal” and who looks “suspicious,” it absorbs the categories embedded in its training data. And this makes it susceptible to bias.

The Louvre robbers weren’t seen as dangerous because they fit a trusted category. In AI, the same process can have the opposite effect: people who don’t fit the statistical norm become more visible and over-scrutinized.

It can mean a facial recognition system disproportionately flags certain racial or gendered groups as potential threats while letting others pass unnoticed.

A sociological lens helps us see that these aren’t separate issues. AI doesn’t invent its categories; it learns ours. When a computer vision system is trained on security footage where “normal” is defined by particular bodies, clothing, or behavior, it reproduces those assumptions.

Just as the museum’s guards looked past the thieves because they appeared to belong, AI can look past certain patterns while overreacting to others.

Categorization, whether human or algorithmic, is a double-edged sword. It helps us process information quickly, but it also encodes our cultural assumptions. Both people and machines rely on pattern recognition, which is an efficient but imperfect strategy.

A sociological view of AI treats algorithms as mirrors: They reflect back our social categories and hierarchies. In the Louvre case, the mirror is turned toward us. The robbers succeeded not because they were invisible, but because they were seen through the lens of normality. In AI terms, they passed the classification test.

From museum halls to machine learning

This link between perception and categorization reveals something important about our increasingly algorithmic world. Whether it’s a guard deciding who looks suspicious or an AI deciding who looks like a “shoplifter,” the underlying process is the same: assigning people to categories based on cues that feel objective but are culturally learned.

When an AI system is described as “biased,” this often means that it reflects those social categories too faithfully. The Louvre heist reminds us that these categories don’t just shape our attitudes, they shape what gets noticed at all.

After the theft, France’s culture minister promised new cameras and tighter security. But no matter how advanced those systems become, they will still rely on categorization. Someone, or something, must decide what counts as “suspicious behavior.” If that decision rests on assumptions, the same blind spots will persist.

The Louvre robbery will be remembered as one of Europe’s most spectacular museum thefts. The thieves succeeded because they mastered the sociology of appearance: They understood the categories of normality and used them as tools.

And in doing so, they showed how both people and machines can mistake conformity for safety. Their success in broad daylight wasn’t only a triumph of planning. It was a triumph of categorical thinking, the same logic that underlies both human perception and artificial intelligence.

The lesson is clear: Before we teach machines to see better, we must first learn to question how we see.

Vincent Charles, Reader in AI for Business and Management Science, Queen’s University Belfast, and Tatiana Gherman, Associate Professor of AI for Business and Strategy, University of Northampton.  This article is republished from The Conversation under a Creative Commons license. Read the original article.

How Louvre thieves exploited human psychology to avoid suspicion—and what it reveals about AI Read More »