Veo 3

Can today’s AI video models accurately model how the real world works?

But on other tasks, the model showed much more variable results. When asked to generate a video highlighting a specific written character on a grid, for instance, the model failed in nine out of 12 trials. When asked to model a Bunsen burner turning on and burning a piece of paper, it similarly failed nine out of 12 times. When asked to solve a simple maze, it failed in 10 of 12 trials. When asked to sort numbers by popping labeled bubbles in order, it failed 11 out of 12 times.

For the researchers, though, none of the above examples is evidence of failure; each is instead a sign of the model’s capabilities. To be listed under the paper’s “failure cases,” Veo 3 had to fail a tested task across all 12 trials, which happened in 16 of the 62 tasks tested. For the rest, the researchers write that “a success rate greater than 0 suggests that the model possesses the ability to solve the task.”

Thus, failing 11 out of 12 trials of a certain task is considered evidence for the model’s capabilities in the paper. That evidence of the model “possess[ing] the ability to solve the task” includes 18 tasks where the model failed in more than half of its 12 trial runs and another 14 where it failed in 25 to 50 percent of trials.
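To make those numbers concrete, here is a rough sketch that converts the failure counts quoted above into success rates and applies the paper's "greater than 0" bar. The task labels and function names are our own shorthand, and the final tally assumes the three failure buckets quoted above do not overlap.

```python
from fractions import Fraction

TRIALS = 12  # trials per task, as described above

# Failures out of 12 for the four tasks cited above (labels are shorthand).
failures = {
    "highlight a written character on a grid": 9,
    "Bunsen burner burning paper": 9,
    "solve a simple maze": 10,
    "sort numbers by popping labeled bubbles": 11,
}


def success_rate(failed, trials=TRIALS):
    """Fraction of trials in which the model succeeded."""
    return Fraction(trials - failed, trials)


def counts_as_capable(failed, trials=TRIALS):
    """The paper's bar as quoted: any success rate greater than 0."""
    return success_rate(failed, trials) > 0


for task, failed in failures.items():
    rate = success_rate(failed)
    print(f"{task}: {float(rate):.1%} success, capable={counts_as_capable(failed)}")

# Task tallies quoted above; assumes the three buckets are disjoint.
TOTAL_TASKS = 62
failed_all_trials = 16
failed_more_than_half = 18
failed_quarter_to_half = 14
remaining_tasks = (
    TOTAL_TASKS - failed_all_trials - failed_more_than_half - failed_quarter_to_half
)
print(f"tasks outside those three buckets: {remaining_tasks}")
```

Under this bar, a task the model fails 11 times out of 12 is treated the same as one it never fails; only a task with 12 failures is excluded.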

Past results, future performance

Yes, in all of these cases, the model technically demonstrates the capability being tested at some point. But the model’s inability to perform that task reliably means that, in practice, it won’t be performant enough for most use cases. Any future model that could join the ranks of “unified, generalist vision foundation models” will have to be able to succeed much more consistently on these kinds of tests.

Google adds photo-to-video generation with Veo 3 to the Gemini app

Google’s Veo 3 videos have propagated across the Internet since the model’s debut in May, blurring the line between truth and fiction. Now, it’s getting even easier to create these AI videos. The Gemini app is gaining photo-to-video generation, allowing you to upload a photo and turn it into a video. You don’t have to pay anything extra for these Veo 3 videos, but the feature is only available to subscribers of Google’s Pro and Ultra AI plans.

When Veo 3 launched, it could conjure up a video based only on your description, complete with speech, music, and background audio. This has made Google’s new AI videos staggeringly realistic—it’s actually getting hard to identify AI videos at a glance. Using a reference photo makes it easier to get the look you want without tediously describing every aspect. This was an option in Google’s Flow AI tool for filmmakers, but now it’s in the Gemini app and web interface.

To create a video from a photo, you have to select “Video” from the Gemini toolbar. Once this feature is available, you can then add your image and prompt, including audio and dialogue. Generating the video takes several minutes—this process takes a lot of computation, which is why video output is still quite limited.

TikTok is being flooded with racist AI videos generated by Google’s Veo 3

The release of Google’s Veo 3 video generator in May represented a disconcerting leap in AI video quality. While many of the viral AI videos we’ve seen are harmless fun, the model’s pixel-perfect output can also be used for nefarious purposes. On TikTok, which may or may not be banned in the coming months, users have noticed a surplus of racist AI videos, courtesy of Google’s Veo 3.

According to a report from MediaMatters, numerous TikTok accounts have started posting AI-generated videos that use racist and antisemitic tropes in recent weeks. Most of the AI vitriol is aimed at Black people, depicting them as “the usual suspects” in crimes, absent parents, and monkeys with an affinity for watermelon. The content also targets immigrants and Jewish people. The videos top out at eight seconds and bear the “Veo” watermark, confirming they came from Google’s leading AI model.

The compilation video below has examples pulled from TikTok since the release of Veo 3, but be warned, it contains racist and antisemitic content. Some of the videos are shocking, which is likely the point—nothing drives engagement on social media like anger and drama. MediaMatters reports that the original posts have numerous comments echoing the stereotypes used in the video.

Hateful AI videos generated by Veo 3 spreading on TikTok.

Google has stressed security when announcing new AI models—we’ve all seen an AI refuse to complete a task that runs afoul of its guardrails. And it’s never fun when you have genuinely harmless intentions, but the system throws a false positive and blocks your output. Google has mostly struck the right balance previously, but it appears that Veo 3 is more compliant. We’ve tested a few simple prompts with Veo 3 and found it easy to reproduce elements of these videos.

Clear but unenforced policies

TikTok’s terms of service ban this kind of content. “We do not allow any hate speech, hateful behavior, or promotion of hateful ideologies. This includes explicit or implicit content that attacks a protected group,” the community guidelines read. Despite this blanket ban on racist caricatures, the hateful Veo 3 videos appear to be spreading unchecked.

Real TikTokers are pretending to be Veo 3 AI creations for fun, attention


The Turing test in reverse

From music videos to “Are you a prompt?” stunts, “real” videos are presenting as AI

Of course I’m an AI creation! Why would you even doubt it? Credit: Getty Images

Since Google released its Veo 3 AI model last week, social media users have been having fun with its ability to quickly generate highly realistic eight-second clips complete with sound and lip-synced dialogue. TikTok’s algorithm has been serving me plenty of Veo-generated videos featuring impossible challenges, fake news reports, and even surreal short narrative films, to name just a few popular archetypes.

However, among all the AI-generated video experiments spreading around, I’ve also noticed a surprising counter-trend on my TikTok feed. Amid all the videos of Veo-generated avatars pretending to be real people, there are now also a bunch of videos of real people pretending to be Veo-generated avatars.

“This has to be real. There’s no way it’s AI.”

I stumbled on this trend when the TikTok algorithm fed me this video topped with the extra-large caption “Google VEO 3 THIS IS 100% AI.” As I watched and listened to the purported AI-generated band that appeared to be playing in the crowded corner of someone’s living room, I read the caption containing the supposed prompt that had generated the clip: “a band of brothers with beards playing rock music in 6/8 with an accordion.”

@kongosmusic We are so cooked. This took 3 mins to generate. Simple prompt: “a band of brothers playing rock music in 6/8 with an accordion” ♬ original sound – KONGOS

After a few seconds of taking those captions at face value, something started to feel a little off. After a few more seconds, I finally noticed the video was posted by Kongos, an indie band that you might recognize from their minor 2012 hit “Come With Me Now.” And after a little digging, I discovered the band in the video was actually just Kongos, and the tune was a 9-year-old song that the band had dressed up as an AI creation to get attention.

Here’s the sad thing: It worked! Without the “Look what Veo 3 did!” hook, I might have quickly scrolled by this video before I took the time to listen to the (pretty good!) song. The novel AI angle made me stop just long enough to pay attention to a Kongos song for the first time in over a decade.

Kongos isn’t the only musical act trying to grab attention by claiming their real performances are AI creations. Darden Bela posted that Veo 3 had “created a realistic AI music video” over a clip from what is actually a 2-year-old music video with some unremarkable special effects. Rapper GameBoi Pat dressed up an 11-month-old song with a new TikTok clip captioned “Google’s Veo 3 created a realistic sounding rapper… This has to be real. There’s no way it’s AI” (that last part is true, at least). I could go on, but you get the idea.

@gameboi_pat This has got to be real. There’s no way it’s AI 😩 #google #veo3 #googleveo3 #AI #prompts #areweprompts? ♬ original sound – GameBoi_pat

I know it’s tough to get noticed on TikTok, and that creators will go to great lengths to gain attention from the fickle algorithm. Still, there’s something more than a little off-putting about flesh-and-blood musicians pretending to be AI creations just to make social media users pause their scrolling for a few extra seconds before they catch on to the joke (or don’t, based on some of the comments).

The whole thing evokes last year’s stunt where a couple of podcast hosts released a posthumous “AI-generated” George Carlin routine before admitting that it had been written by a human after legal threats started flying. As an attention-grabbing stunt, the conceit still works. You want AI-generated content? I can pretend to be that!

Are we just prompts?

Some of the most existentially troubling Veo-generated videos floating around TikTok these days center around a gag known as “the prompt theory.” These clips focus on various AI-generated people reacting to the idea that they are “just prompts” with various levels of skepticism, fear, or even conspiratorial paranoia.

On the other side of that gag, some humans are making joke videos playing off the idea that they’re merely prompts. RedondoKid used the conceit in a basketball trick shot video, saying “of course I’m going to make this. This is AI, you put that I’m going to make this in the prompt.” User thisisamurica thanked his faux prompters for putting him in “a world with such delicious food” before theatrically choking on a forkful of meat. And comedian Drake Cummings developed TikTok skits pretending that it was actually AI video prompts forcing him to indulge in vices like shots of alcohol or online gambling (“Goolgle’s [sic] New A.I. Veo 3 is at it again!! When will the prompts end?!” Cummings jokes in the caption).

@justdrakenaround Goolgle’s New A.I. Veo 3 is at it again!! When will the prompts end?! #veo3 #google #ai #aivideo #skit ♬ original sound – Drake Cummings

Beyond the obvious jokes, though, I’ve also seen a growing trend of TikTok creators approaching friends or strangers and asking them to react to the idea that “we’re all just prompts.” The reactions run the gamut from “get the fuck away from me” to “I blame that [prompter], I now have to pay taxes” to solipsistic philosophical musings from convenience store employees.

I’m loath to call this a full-blown TikTok trend based on a few stray examples. Still, these attempts to exploit the confusion between real and AI-generated video are interesting to see. As one commenter on an “Are you a prompt?” ambush video put it: “New trend: Do normal videos and write ‘Google Veo 3’ on top of the video.”

Which one is real?

The best Veo-related TikTok engagement hack I’ve stumbled on so far, though, might be the videos that show multiple short clips and ask the viewer to decide which are real and which are fake. One video I stumbled on shows an increasing number of “Veo 3 Goth Girls” across four clips, challenging in the caption that “one of these videos is real… can you guess which one?” In another example, two similar sets of kids are shown hanging out in cars while the caption asks, “Are you able to identify which scene is real and which one is from veo3?”

@spongibobbu2 One of these videos is real… can you guess which one? #veo3 ♬ original sound – Jett

After watching both of these videos on loop a few times, I’m relatively (but not entirely) convinced that every single clip in them is a Veo creation. The fact that I watched these videos multiple times shows how effective the “Real or Veo” challenge framing is at grabbing my attention. Additionally, I’m still not 100 percent confident in my assessments, which is a testament to just how good Google’s new model is at creating convincing videos.

There are still some telltale signs for distinguishing a real video from a Veo creation, though. For one, Veo clips are still limited to just eight seconds, so any video that runs longer (without an apparent change in camera angle) is almost certainly not generated by Google’s AI. Looking back at a creator’s other videos can also provide some clues—if the same person was appearing in “normal” videos two weeks ago, it’s unlikely they would be appearing in Veo creations suddenly.

There’s also a subtle but distinctive style to most Veo creations that can distinguish them from the kind of candid handheld smartphone videos that usually fill TikTok. The lighting in a Veo video tends to be too bright, the camera movements a bit too smooth, and the edges of people and objects a little too polished. After you watch enough “genuine” Veo creations, you can start to pick out the patterns.

Regardless, TikTokers trying to pass off real videos as fakes, even as a joke or engagement hack, is a recognition that video sites are now deep in the “deep doubt” era, where you have to be extra skeptical of even legitimate-looking video footage. And the mere existence of convincing AI fakes makes it easier than ever to claim real events captured on video didn’t really happen, a problem that political scientists call the liar’s dividend. We saw this when then-candidate Trump accused Democratic nominee Kamala Harris of having “A.I.’d” the crowds in real photos of her Detroit airport rally.

For now, TikTokers of all stripes are having fun playing with that idea to gain social media attention. In the long term, though, the implications for discerning truth from fiction are more troubling.

Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.

AI video just took a startling leap in realism. Are we doomed?


Tales from the cultural singularity

Google’s Veo 3 delivers AI videos of realistic people with sound and music. We put it to the test.

Still image from an AI-generated Veo 3 video of “A 1980s fitness video with models in leotards wearing werewolf masks.” Credit: Google

Last week, Google introduced Veo 3, its newest video generation model that can create 8-second clips with synchronized sound effects and audio dialog—a first for the company’s AI tools. The model, which generates videos at 720p resolution (based on text descriptions called “prompts” or still image inputs), represents what may be the most capable consumer video generator to date, bringing video synthesis close to a point where it is becoming very difficult to distinguish between “authentic” and AI-generated media.

Google also launched Flow, an online AI filmmaking tool that combines Veo 3 with the company’s Imagen 4 image generator and Gemini language model, allowing creators to describe scenes in natural language and manage characters, locations, and visual styles in a web interface.

An AI-generated video from Veo 3: “ASMR scene of a woman whispering ‘Moonshark’ into a microphone while shaking a tambourine”

Both tools are now available to US subscribers of Google AI Ultra, a plan that costs $250 a month and comes with 12,500 credits. Veo 3 videos cost 150 credits per generation, allowing 83 videos on that plan before you run out. Extra credits are available for the price of 1 cent per credit in blocks of $25, $50, or $200. That comes out to about $1.50 per video generation. But is the price worth it? We ran some tests with various prompts to see what this technology is truly capable of.
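The pricing math above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in this article; the variable names are ours, and prices are kept in whole cents to avoid floating-point rounding.

```python
# Figures quoted above for the Google AI Ultra plan.
PLAN_CREDITS = 12_500
CREDITS_PER_VIDEO = 150
EXTRA_CREDIT_PRICE_CENTS = 1          # 1 cent per extra credit
TOP_UP_BLOCKS_USD = (25, 50, 200)     # sizes of the extra-credit blocks

# Whole videos the monthly allowance covers, plus unused credits.
videos_on_plan = PLAN_CREDITS // CREDITS_PER_VIDEO
leftover_credits = PLAN_CREDITS % CREDITS_PER_VIDEO

# Cost of one generation when paying with extra credits.
cost_per_video_cents = CREDITS_PER_VIDEO * EXTRA_CREDIT_PRICE_CENTS

# Whole videos each top-up block buys at 1 cent per credit.
videos_per_block = {
    usd: (usd * 100 // EXTRA_CREDIT_PRICE_CENTS) // CREDITS_PER_VIDEO
    for usd in TOP_UP_BLOCKS_USD
}

print(f"videos on plan: {videos_on_plan} ({leftover_credits} credits left over)")
print(f"cost per extra video: ${cost_per_video_cents / 100:.2f}")
for usd, count in videos_per_block.items():
    print(f"${usd} block: {count} videos")
```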

How does Veo work?

Like other modern video generation models, Veo 3 is built on diffusion technology—the same approach that powers image generators like Stable Diffusion and Flux. The training process works by taking real videos and progressively adding noise to them until they become pure static, then teaching a neural network to reverse this process step by step. During generation, Veo 3 starts with random noise and a text prompt, then iteratively refines that noise into a coherent video that matches the description.
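As a rough illustration of the "progressively adding noise" half of that process, here is a toy sketch in plain Python. It is not Veo 3's implementation: Google has not published those details, and the linear noise schedule, the step count, and the function names here are all assumptions borrowed from common textbook descriptions of diffusion models. The learned denoising network that reverses the process is left out entirely.

```python
import math
import random


def make_alpha_bars(steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear noise schedule."""
    alpha_bars, running = [], 1.0
    for t in range(steps):
        beta = beta_start + (beta_end - beta_start) * t / (steps - 1)
        running *= 1.0 - beta          # fraction of signal variance that survives
        alpha_bars.append(running)
    return alpha_bars


def add_noise(clean, alpha_bar, rng):
    """Forward process: blend the clean signal with Gaussian noise."""
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [signal_scale * x + noise_scale * n for x, n in zip(clean, noise)]
    return noisy, noise


rng = random.Random(0)
alpha_bars = make_alpha_bars(steps=1000)
frame_row = [math.sin(i / 4) for i in range(32)]  # stand-in for one row of pixels

for t in (0, 499, 999):
    noisy, noise = add_noise(frame_row, alpha_bars[t], rng)
    print(f"step {t}: signal weight {math.sqrt(alpha_bars[t]):.3f}")
```

In training setups of this kind, the network is typically shown the noisy version and asked to predict the noise that was added. Generation then runs the other way, starting from pure noise and removing a little at each step.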

AI-generated video from Veo 3: “An old professor in front of a class says, ‘Without a firm historical context, we are looking at the dawn of a new era of civilization: post-history.’”

DeepMind won’t say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube material.

It’s important to note that Veo 3 is a system composed of a series of AI models, including a large language model (LLM) to interpret user prompts to assist with detailed video creation, a video diffusion model to create the video, and an audio generation model that applies sound to the video.

An AI-generated video from Veo 3: “A male stand-up comic on stage in a night club telling a hilarious joke about AI and crypto with a silly punchline.” An AI language model built into Veo 3 wrote the joke.

In an attempt to prevent misuse, DeepMind says it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into frames Veo 3 generates. These watermarks persist even when videos are compressed or edited, helping people potentially identify AI-generated content. As we’ll discuss more later, though, this may not be enough to prevent deception.

Google also censors certain prompts and outputs that breach the company’s content agreement. During testing, we encountered “generation failure” messages for videos that involve romantic and sexual material, some types of violence, mentions of certain trademarked or copyrighted media properties, some company names, certain celebrities, and some historical events.

Putting Veo 3 to the test

Perhaps the biggest change with Veo 3 is integrated audio generation, although Meta previewed a similar audio-generation capability with “Movie Gen” last October, and AI researchers have experimented with using AI to add soundtracks to silent videos for some time. Google DeepMind itself showed off an AI soundtrack-generating model in June 2024.

An AI-generated video from Veo 3: “A middle-aged balding man rapping indie core about Atari, IBM, TRS-80, Commodore, VIC-20, Atari 800, NES, VCS, Tandy 100, Coleco, Timex-Sinclair, Texas Instruments”

Veo 3 can generate everything from traffic sounds to music and character dialogue, though our early testing reveals occasional glitches. Spaghetti makes crunching sounds when eaten (as we covered last week, with a nod to the famous Will Smith AI spaghetti video), and in scenes with multiple people, dialogue sometimes comes from the wrong character’s mouth. But overall, Veo 3 feels like a step change in video synthesis quality and coherency over models from OpenAI, Runway, Minimax, Pika, Meta, Kling, and Hunyuanvideo.

The videos also tend to show garbled subtitles that almost match the spoken words, which is an artifact of subtitles on videos present in the training data. The AI model is imitating what it has “seen” before.

An AI-generated video from Veo 3: “A beer commercial for ‘CATNIP’ beer featuring a real a cat in a pickup truck driving down a dusty dirt road in a trucker hat drinking a can of beer while country music plays in the background, a man sings a jingle ‘Catnip beeeeeeeeeeeeeeeeer’ holding the note for 6 seconds”

We generated each of the eight-second-long 720p videos seen below using Google’s Flow platform. Each video generation took around three to five minutes to complete, and we paid for them ourselves. It’s important to note that better results come from cherry-picking—running the same prompt multiple times until you find a good result. Due to cost and in the spirit of testing, we only ran every prompt once, unless noted.

New audio prompts

Let’s dive right into the deep end with audio generation to get a grip on what this technology can do. We’ve previously shown you a man singing about spaghetti and a rapping shark in our last Veo 3 piece, but here’s some more complex dialogue.

Since 2022, we’ve been using the prompt “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting” to test AI image generators like Midjourney. It’s time to bring that barbarian to life.

A muscular barbarian man holding an axe, standing next to a CRT television set. He looks at the TV, then to the camera and literally says, “You’ve been looking for this for years: a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. Got that, Benj?”

The video above represents significant technical progress in AI media synthesis over the course of only three years. We’ve gone from a blurry colorful still-image barbarian to a photorealistic guy that talks to us in 720p high definition with audio. Most notably, there’s no reason to believe technical capability in AI generation will slow down from here.

Horror film: A scared woman in a Victorian outfit running through a forest, dolly shot, being chased by a man in a peanut costume screaming, “Wait! You forgot your wallet!”

Trailer for The Haunted Basketball Train: a Tim Burton film where 1990s basketball star is stuck at the end of a haunted passenger train with basketball court cars, and the only way to survive is to make it to the engine by beating different ghosts at basketball in every car

ASMR video of a muscular barbarian man whispering slowly into a microphone, “You love CRTs, don’t you? That’s OK. It’s OK to love CRT televisions and barbarians.”

1980s PBS show about a man with a beard talking about how his Apple II computer can “connect to the world through a series of tubes”

A 1980s fitness video with models in leotards wearing werewolf masks

A female therapist looking at the camera, zoom call. She says, “Oh my lord, look at that Atari 800 you have behind you! I can’t believe how nice it is!”

With this technology, one can easily imagine a virtual world of AI personalities designed to flatter people. This is a fairly innocent example about a vintage computer, but you can extrapolate, making the fake person talk about any topic at all. There are limits due to Google’s filters, but from what we’ve seen in the past, a future uncensored version of a similarly capable AI video generator is very likely.

Video call screenshot capture of a Zoom chat. A psychologist in a dark, cozy therapist’s office. The therapist says in a friendly voice, “Hi Tom, thanks for calling. Tell me about how you’re feeling today. Is the depression still getting to you? Let’s work on that.”

1960s NASA footage of the first man stepping onto the surface of the Moon, who squishes into a pile of mud and yells in a hillbilly voice, “What in tarnation??”

A local TV news interview of a muscular barbarian talking about why he’s always carrying a CRT TV set around with him

Speaking of fake news interviews, Veo 3 can generate plenty of talking anchor-persons, although sometimes on-screen text is garbled if you don’t specify exactly what it should say. It’s in cases like this where it seems Veo 3 might be most potent at casual media deception.

Footage from a news report about Russia invading the United States

Attempts at music

Veo 3’s AI audio generator can create music in various genres, although in practice, the results are typically simplistic. Still, it’s a new capability for AI video generators. Here are a few examples in various musical genres.

A PBS show of a crazy barbarian with a blonde afro painting pictures of Trees, singing “HAPPY BIG TREES” to some music while he paints

A 1950s cowboy rides up to the camera and sings in country music, “I love mah biiig ooold donkeee”

A 1980s hair metal band drives up to the camera and sings in rock music, “Help me with my huge huge huge hair!”

Mister Rogers’ Neighborhood PBS kids show intro done with psychedelic acid rock and colored lights

1950s musical jazz group with a scat singer singing about pickles amid gibberish

A trip-hop rap song about Ars Technica being sung by a guy in a large rubber shark costume on a stage with a full moon in the background

Some classic prompts from prior tests

The prompts below come from our previous video tests of Gen-3, Video-01, and the open source Hunyuanvideo, so you can flip back to those articles and compare the results if you want to. Overall, Veo 3 appears to have far greater temporal coherency (having a consistent subject or theme over time) than the earlier video synthesis models we’ve tested. But of course, it’s not perfect.

A highly intelligent person reading ‘Ars Technica’ on their computer when the screen explodes

The moonshark jumping out of a computer screen and attacking a person

A herd of one million cats running on a hillside, aerial view

Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy

Aerial shot of a small American town getting deluged with liquid cheese after a massive cheese rainstorm where liquid cheese rained down and dripped all over the buildings

Wide-angle shot, starting with the Sasquatch at the center of the stage giving a TED talk about mushrooms, then slowly zooming in to capture its expressive face and gestures, before panning to the attentive audience

Some notable failures

Google’s Veo 3 isn’t perfect at synthesizing every scenario we can throw at it due to limitations of training data. As we noted in our previous coverage, AI video generators remain fundamentally imitative, making predictions based on statistical patterns rather than a true understanding of physics or how the world works.

For example, if you see mouths moving during speech, or clothes wrinkling in a certain way when touched, it means the neural network doing the video generation has “seen” enough similar examples of that scenario in the training data to render a convincing take on it and apply it to similar situations.

However, when a novel situation (or combination of themes) isn’t well-represented in the training data, you’ll see “impossible” or illogical things happen, such as weird body parts, magically appearing clothing, or an object that “shatters” but remains in the scene afterward, as you’ll see below.

We mentioned audio and video glitches in the introduction. In particular, scenes with multiple people sometimes confuse which character is speaking, such as this argument between tech fans.

A 2000s TV debate between fans of the PowerPC and Intel Pentium chips

Bombastic 1980s infomercial for the “Ars Technica” online service. With cheesy background music and user testimonials

1980s Rambo fighting Soviets on the Moon

Sometimes requests don’t make coherent sense. In this case, “Rambo” is correctly on the Moon firing a gun, but he’s not wearing a spacesuit. He’s a lot tougher than we thought.

An animated infographic showing how many floppy disks it would take to hold an installation of Windows 11

Large amounts of text also present a weak point, but if a short text quotation is explicitly specified in the prompt, Veo 3 usually gets it right.

A young woman doing a complex floor gymnastics routine at the Olympics, featuring running and flips

Despite Veo 3’s advances in temporal coherency and audio generation, it still suffers from the same “jabberwockies” we saw in OpenAI’s viral Sora gymnast video—those non-plausible video hallucinations like impossible morphing body parts.

A silly group of men and women cartwheeling across the road, singing “CHEEEESE” and holding the note for 8 seconds before falling over.

A YouTube-style try-on video of a person trying on various corncob costumes. They shout “Corncob haul!!”

A man made of glass runs into a brick wall and shatters, screaming

A man in a spacesuit holding up 5 fingers and counting down to zero, then blasting off into space with rocket boots

Counting down with fingers is difficult for Veo 3, likely because it’s not well-represented in the training data. Instead, hands probably appear most often in just a few positions, like a fist, a five-finger open palm, a two-finger peace sign, and the number one.

As new architectures emerge and future models train on vastly larger datasets with exponentially more compute, these systems will likely forge deeper statistical connections between the concepts they observe in videos, dramatically improving quality and also the ability to generalize more with novel prompts.

The “cultural singularity” is coming—what more is left to say?

By now, some of you might be worried that we’re in trouble as a society due to potential deception from this kind of technology. And there’s a good reason to worry: The American pop culture diet currently relies heavily on clips shared by strangers through social media such as TikTok, and now all of that can easily be faked, whole-cloth. Automated generations of fake people can now argue for ideological positions in a way that could manipulate the masses.

AI-generated video by Veo 3: “A man on the street interview about someone who fears they live in a time where nothing can be believed”

Such videos could be (and were) manipulated through various means prior to Veo 3, but now the barrier to entry has collapsed from requiring specialized skills, expensive software, and hours of painstaking work to simply typing a prompt and waiting three minutes. What once required a team of VFX artists or at least someone proficient in After Effects can now be done by anyone with a credit card and an Internet connection.

But let’s take a moment to catch our breath. At Ars Technica, we’ve been warning about the deceptive potential of realistic AI-generated media since at least 2019. In 2022, we talked about AI image generator Stable Diffusion and the ability to train people into custom AI image models. We discussed Sora “collapsing media reality” and talked about persistent media skepticism during the “deep doubt era.”

AI-generated video with Veo 3: “A man on the street ranting about the ‘cultural singularity’ and the ‘cultural apocalypse’ due to AI”

I also wrote in detail about the future ability for people to pollute the historical record with AI-generated noise. In that piece, I used the term “cultural singularity” to denote a time when truth and fiction in media become indistinguishable, not only because of the deceptive nature of AI-generated content but also due to the massive quantities of AI-generated and AI-augmented media we’ll likely soon be inundated with.

However, in an article I wrote last year about cloning my dad’s handwriting using AI, I came to the conclusion that my previous fears about the cultural singularity may be overblown. Media has always been vulnerable to forgery since ancient times; trust in any remote communication ultimately depends on trusting its source.

AI-generated video with Veo 3: “A news set. There is an ‘Ars Technica News’ logo behind a man. The man has a beard and a suit and is doing a sit-down interview. He says ‘This is the age of post-history: a new epoch of civilization where the historical record is so full of fabrication that it becomes effectively meaningless.’”

The Romans had laws against forgery in 80 BC, and people have been doctoring photos since the medium’s invention. What has changed isn’t the possibility of deception but its accessibility and scale.

With Veo 3’s ability to generate convincing video with synchronized dialogue and sound effects, we’re not witnessing the birth of media deception—we’re seeing its mass democratization. What once cost millions of dollars in Hollywood special effects can now be created for pocket change.

An AI-generated video created with Google Veo 3: “A candid interview of a woman who doesn’t believe anything she sees online unless it’s on Ars Technica.”

As these tools become more powerful and affordable, skepticism in media will grow. But the question isn’t whether we can trust what we see and hear. It’s whether we can trust who’s showing it to us. In an era where anyone can generate a realistic video of anything for $1.50, the credibility of the source becomes our primary anchor to truth. The medium was never the message—the messenger always was.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

AI video just took a startling leap in realism. Are we doomed? Read More »

Google’s Will Smith double is better at eating AI spaghetti … but it’s crunchy?

On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. While we saw early steps in AI video generation from 2022 to 2024, each video was silent and usually very short. Now you can hear voices, dialog, and sound effects in eight-second high-definition video clips.

Shortly after the launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti?

First, a brief recap. The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video using an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later in February 2024.

Here’s what the original viral video looked like:

One thing people forget is that at the time, the model behind the Smith example wasn’t the best AI video generator out there: a video synthesis model called Gen-2 from Runway had already achieved superior results (though it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people’s memories as an early poor example of video synthesis, handy for future comparisons as AI models progressed.

AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week with Veo 3, performing the Smith test and posting the results on X. But as you’ll notice below when you watch, the soundtrack has a curious quality: The faux Smith appears to be crunching on the spaghetti.

On X, Javi Lopez ran “Will Smith eating spaghetti” in Google’s Veo 3 AI video generator and received this result.

It’s a glitch in Veo 3’s experimental ability to apply sound effects to video, likely because the training data used to create Google’s AI models featured many examples of chewing mouths with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you’ll see unusual generation results, such as jabberwockies.

Google’s Will Smith double is better at eating AI spaghetti … but it’s crunchy? Read More »