
Google and DOJ tussle over how AI will remake the web in antitrust closing arguments

At the same time, Google is seeking to set itself apart from AI upstarts. “Generative AI companies are not trying to out-Google Google,” said Schmidtlein. Google’s team contends that its actions have not harmed AI products like ChatGPT or Perplexity, which, at any rate, are not in the search market as defined by the court.

Mehta mused about the future of search, suggesting we may have to rethink what a general search engine is in 2025. “Maybe people don’t want 10 blue links anymore,” he said.

The Chromium problem and an elegant solution

At times during the case, Mehta has expressed skepticism about the divestment of Chrome. During closing arguments, Dahlquist reiterated the close relationship between search and browsers, reminding the court that 35 percent of Google’s search volume comes from Chrome.

Mehta now seems more receptive to a Chrome split than before, perhaps in part because the effects of the other remedies are becoming so murky. He called the Chrome divestment “less speculative” and “more elegant” than the data and placement remedies. Google again claimed, as it has throughout the remedy phase, that forcing it to give up Chrome is unsupported in the law and that Chrome’s dominance is a result of innovation.

Break up the company without touching the sides and getting shocked!

Credit: Aurich Lawson

Even if Mehta leans toward ordering this remedy, Chromium may be a sticking point. The judge seems unconvinced that the supposed buyers—a group which apparently includes almost every major tech firm—have the scale and expertise needed to maintain Chromium. This open source project forms the foundation of many other browsers, making its continued smooth operation critical to the web.

If Google gives up Chrome, Chromium goes with it, but what about the people who maintain it? The DOJ contends that it’s common for employees to come along with an acquisition, but that’s far from certain. There was some discussion of ensuring a buyer could commit to hiring staff to maintain Chromium. The DOJ suggests Google could be ordered to provide financial incentives to ensure critical roles are filled, but that sounds potentially messy.

A Chrome sale seems more likely now than it did earlier, but nothing is assured yet. Following the final arguments from each side, it’s up to Mehta to mull over the facts before deciding Google’s fate. That’s expected to happen in August, but nothing will change for Google right away. The company has already confirmed it will appeal the case, hoping to have the original ruling overturned. It could still be years before this case reaches its ultimate conclusion.


Amazon Fire Sticks enable “billions of dollars” worth of streaming piracy

Amazon Fire Sticks are enabling “billions of dollars” worth of streaming piracy, according to a report today from Enders Analysis, a media, entertainment, and telecommunications research firm. Technologies from other media conglomerates, including Microsoft, Google, and Facebook, are also enabling what the report’s authors deem an “industrial scale of theft.”

The report, “Video piracy: Big tech is clearly unwilling to address the problem,” focuses on the European market but highlights the global growth of piracy of streaming services as they increasingly acquire rights to live programs, like sporting events.

Per the BBC, the report points to the availability of multiple, simultaneous illegal streams for big events that draw tens of thousands of pirate viewers.

Enders’ report places some blame on Facebook for showing advertisements for access to illegal streams, as well as Google and Microsoft for the alleged “continued depreciation” of their digital rights management (DRM) systems, Widevine and PlayReady, respectively. Ars Technica reached out to Facebook, Google, and Microsoft for comment but didn’t receive a response before publication.

The report echoes complaints shared throughout the industry, including by DAZN, the world’s largest streamer of European soccer. Streaming piracy is “almost a crisis for the sports rights industry,” DAZN’s head of global rights, Tom Burrows, said at The Financial Times’ Business of Football Summit in February. At the same event, Nick Herm, COO of Comcast-owned European telecommunication firm Sky Group, estimated that piracy was costing his company “hundreds of millions of dollars” in revenue. At the time, Enders co-founder Claire Enders said that the pirating of sporting events accounts for “about 50 percent of most markets.”

Jailbroken Fire Sticks

Friday’s Enders report named Fire Sticks as a significant contributor to streaming piracy, calling the hardware a “piracy enabler.”

Enders’ report pointed to security risks that pirate viewers face, including providing credit card information and email addresses to unknown entities, which can make people vulnerable to phishing and malware. However, reports of phishing and malware stemming from streaming piracy, which occurs through various methods besides a Fire TV Stick, seem to be rather limited.


Gemini in Google Drive may finally be useful now that it can analyze videos

Google’s rapid adoption of AI has seen the Gemini “sparkle” icon become an omnipresent element in almost every Google product. It’s there to summarize your email, add items to your calendar, and more—if you trust it to do those things. Gemini is also integrated with Google Drive, where it’s gaining a new feature that could make it genuinely useful: Google’s AI bot will soon be able to watch videos stored in your Drive so you don’t have to.

Gemini is already accessible in Drive, with the ability to summarize documents or folders, gather and analyze data, and expand on the topics covered in your documents. Google says the next step is plugging videos into Gemini, saving you from wasting time scrubbing through a file just to find something of interest.

Using a chatbot to analyze and manipulate text doesn’t always make sense—after all, it’s not hard to skim an email or short document. It can take longer to interact with a chatbot, which might not add any useful insights. Video is different because watching is a linear process in which you are presented with information at the pace the video creator sets. You can change playback speed or rewind to catch something you missed, but that’s more arduous than reading something at your own pace. So Gemini’s video support in Drive could save you real time.

Suppose you have a recorded meeting in video form uploaded to Drive. You could go back and rewatch it to take notes or refresh your understanding of a particular exchange. Or, Google suggests, you can ask Gemini to summarize the video and tell you what’s important. This could be a great alternative, as grounding AI output with a specific data set or file tends to make it more accurate. Naturally, you should still maintain healthy skepticism of what the AI tells you about the content of your video.


AI video just took a startling leap in realism. Are we doomed?


Tales from the cultural singularity

Google’s Veo 3 delivers AI videos of realistic people with sound and music. We put it to the test.

Still image from an AI-generated Veo 3 video of “A 1980s fitness video with models in leotards wearing werewolf masks.” Credit: Google

Last week, Google introduced Veo 3, its newest video generation model that can create 8-second clips with synchronized sound effects and audio dialog—a first for the company’s AI tools. The model, which generates videos at 720p resolution (based on text descriptions called “prompts” or still image inputs), represents what may be the most capable consumer video generator to date, bringing video synthesis close to a point where it is becoming very difficult to distinguish between “authentic” and AI-generated media.

Google also launched Flow, an online AI filmmaking tool that combines Veo 3 with the company’s Imagen 4 image generator and Gemini language model, allowing creators to describe scenes in natural language and manage characters, locations, and visual styles in a web interface.

An AI-generated video from Veo 3: “ASMR scene of a woman whispering “Moonshark” into a microphone while shaking a tambourine”

Both tools are now available to US subscribers of Google AI Ultra, a plan that costs $250 a month and comes with 12,500 credits. Veo 3 videos cost 150 credits per generation, allowing 83 videos on that plan before you run out. Extra credits are available for the price of 1 cent per credit in blocks of $25, $50, or $200. That comes out to about $1.50 per video generation. But is the price worth it? We ran some tests with various prompts to see what this technology is truly capable of.
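To spell out the arithmetic behind those figures (a quick sanity check using the plan terms quoted above, which Google could change at any time):

```python
# Back-of-the-envelope Veo 3 cost math, based on the AI Ultra plan figures quoted above.
plan_credits = 12_500        # credits included with the $250/month plan
credits_per_video = 150      # credits consumed by one Veo 3 generation
extra_credit_price = 0.01    # dollars per credit when buying add-on blocks

videos_included = plan_credits // credits_per_video               # 83 videos per month
marginal_cost_per_video = credits_per_video * extra_credit_price  # $1.50 per extra video

print(videos_included, marginal_cost_per_video)  # 83 1.5
```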

How does Veo work?

Like other modern video generation models, Veo 3 is built on diffusion technology—the same approach that powers image generators like Stable Diffusion and Flux. The training process works by taking real videos and progressively adding noise to them until they become pure static, then teaching a neural network to reverse this process step by step. During generation, Veo 3 starts with random noise and a text prompt, then iteratively refines that noise into a coherent video that matches the description.
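To make that process a bit more concrete, here is a minimal, purely illustrative sketch of a generic text-conditioned diffusion sampling loop in PyTorch. The `denoiser`, tensor shapes, and update rule are stand-ins for illustration only; Google has not published Veo 3’s internals, and real samplers use carefully derived noise schedules.

```python
import torch

def sample_video(denoiser, text_embedding, steps=50, shape=(1, 16, 3, 64, 64)):
    """Conceptual diffusion sampling: start from pure noise and repeatedly ask a
    trained network to strip away a little of it, conditioned on the prompt."""
    x = torch.randn(shape)  # pure static: (batch, frames, channels, height, width)
    for t in reversed(range(1, steps + 1)):
        noise_level = torch.tensor(t / steps)
        # The network predicts the noise present at this step, given the prompt.
        predicted_noise = denoiser(x, noise_level, text_embedding)
        # Remove a fraction of that predicted noise (heavily simplified update rule).
        x = x - predicted_noise / steps
    return x  # ideally, a coherent video tensor matching the prompt

# Toy usage with a stand-in "model" that just predicts a scaled copy of its input.
dummy_denoiser = lambda x, t, emb: 0.1 * x
video = sample_video(dummy_denoiser, text_embedding=None)
print(video.shape)  # torch.Size([1, 16, 3, 64, 64])
```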

AI-generated video from Veo 3: “An old professor in front of a class says, ‘Without a firm historical context, we are looking at the dawn of a new era of civilization: post-history.'”

DeepMind won’t say exactly where it sourced the content to train Veo 3, but YouTube is a strong possibility. Google owns YouTube, and DeepMind previously told TechCrunch that Google models like Veo “may” be trained on some YouTube material.

It’s important to note that Veo 3 is a system composed of a series of AI models, including a large language model (LLM) to interpret user prompts to assist with detailed video creation, a video diffusion model to create the video, and an audio generation model that applies sound to the video.
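In other words, the product behaves like a pipeline of specialized models rather than a single network. A rough sketch of that idea, with entirely hypothetical component names (Google has not published Veo 3’s internal interfaces), might look like this:

```python
def generate_clip(user_prompt, prompt_llm, video_model, audio_model):
    """Hypothetical three-stage pipeline: an LLM expands the prompt into a detailed
    shot description, a diffusion model renders frames, and an audio model adds
    dialogue, sound effects, and music synced to those frames."""
    detailed_prompt = prompt_llm.expand(user_prompt)             # add scene/shot detail
    frames = video_model.generate(detailed_prompt)               # diffusion-based video
    soundtrack = audio_model.generate(detailed_prompt, frames)   # matching audio track
    return frames, soundtrack
```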

An AI-generated video from Veo 3: “A male stand-up comic on stage in a night club telling a hilarious joke about AI and crypto with a silly punchline.” An AI language model built into Veo 3 wrote the joke.

In an attempt to prevent misuse, DeepMind says it’s using its proprietary watermarking technology, SynthID, to embed invisible markers into frames Veo 3 generates. These watermarks persist even when videos are compressed or edited, helping people potentially identify AI-generated content. As we’ll discuss more later, though, this may not be enough to prevent deception.

Google also censors certain prompts and outputs that breach the company’s content agreement. During testing, we encountered “generation failure” messages for videos that involve romantic and sexual material, some types of violence, mentions of certain trademarked or copyrighted media properties, some company names, certain celebrities, and some historical events.

Putting Veo 3 to the test

Perhaps the biggest change with Veo 3 is integrated audio generation, although Meta previewed a similar audio-generation capability with “Movie Gen” last October, and AI researchers have experimented with using AI to add soundtracks to silent videos for some time. Google DeepMind itself showed off an AI soundtrack-generating model in June 2024.

An AI-generated video from Veo 3: “A middle-aged balding man rapping indie core about Atari, IBM, TRS-80, Commodore, VIC-20, Atari 800, NES, VCS, Tandy 100, Coleco, Timex-Sinclair, Texas Instruments”

Veo 3 can generate everything from traffic sounds to music and character dialogue, though our early testing reveals occasional glitches. Spaghetti makes crunching sounds when eaten (as we covered last week, with a nod to the famous Will Smith AI spaghetti video), and in scenes with multiple people, dialogue sometimes comes from the wrong character’s mouth. But overall, Veo 3 feels like a step change in video synthesis quality and coherency over models from OpenAI, Runway, Minimax, Pika, Meta, Kling, and Hunyuanvideo.

The videos also tend to show garbled subtitles that almost match the spoken words, an artifact of the subtitled videos present in the training data. The AI model is imitating what it has “seen” before.

An AI-generated video from Veo 3: “A beer commercial for ‘CATNIP’ beer featuring a real a cat in a pickup truck driving down a dusty dirt road in a trucker hat drinking a can of beer while country music plays in the background, a man sings a jingle ‘Catnip beeeeeeeeeeeeeeeeer’ holding the note for 6 seconds”

We generated each of the eight-second-long 720p videos seen below using Google’s Flow platform. Each video generation took around three to five minutes to complete, and we paid for them ourselves. It’s important to note that better results come from cherry-picking—running the same prompt multiple times until you find a good result. Due to cost and in the spirit of testing, we ran each prompt only once, unless otherwise noted.

New audio prompts

Let’s dive right into the deep end with audio generation to get a grip on what this technology can do. We’ve previously shown you a man singing about spaghetti and a rapping shark in our last Veo 3 piece, but here’s some more complex dialogue.

Since 2022, we’ve been using the prompt “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting” to test AI image generators like Midjourney. It’s time to bring that barbarian to life.

A muscular barbarian man holding an axe, standing next to a CRT television set. He looks at the TV, then to the camera and literally says, “You’ve been looking for this for years: a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. Got that, Benj?”

The video above represents significant technical progress in AI media synthesis over the course of only three years. We’ve gone from a blurry colorful still-image barbarian to a photorealistic guy that talks to us in 720p high definition with audio. Most notably, there’s no reason to believe technical capability in AI generation will slow down from here.

Horror film: A scared woman in a Victorian outfit running through a forest, dolly shot, being chased by a man in a peanut costume screaming, “Wait! You forgot your wallet!”

Trailer for The Haunted Basketball Train: a Tim Burton film where 1990s basketball star is stuck at the end of a haunted passenger train with basketball court cars, and the only way to survive is to make it to the engine by beating different ghosts at basketball in every car

ASMR video of a muscular barbarian man whispering slowly into a microphone, “You love CRTs, don’t you? That’s OK. It’s OK to love CRT televisions and barbarians.”

1980s PBS show about a man with a beard talking about how his Apple II computer can “connect to the world through a series of tubes”

A 1980s fitness video with models in leotards wearing werewolf masks

A female therapist looking at the camera, zoom call. She says, “Oh my lord, look at that Atari 800 you have behind you! I can’t believe how nice it is!”

With this technology, one can easily imagine a virtual world of AI personalities designed to flatter people. This is a fairly innocent example about a vintage computer, but you can extrapolate, making the fake person talk about any topic at all. There are limits due to Google’s filters, but from what we’ve seen in the past, a future uncensored version of a similarly capable AI video generator is very likely.

Video call screenshot capture of a Zoom chat. A psychologist in a dark, cozy therapist’s office. The therapist says in a friendly voice, “Hi Tom, thanks for calling. Tell me about how you’re feeling today. Is the depression still getting to you? Let’s work on that.”

1960s NASA footage of the first man stepping onto the surface of the Moon, who squishes into a pile of mud and yells in a hillbilly voice, “What in tarnation??”

A local TV news interview of a muscular barbarian talking about why he’s always carrying a CRT TV set around with him

Speaking of fake news interviews, Veo 3 can generate plenty of talking anchor-persons, although sometimes on-screen text is garbled if you don’t specify exactly what it should say. It’s in cases like this that Veo 3 seems most potent as a tool for casual media deception.

Footage from a news report about Russia invading the United States

Attempts at music

Veo 3’s AI audio generator can create music in various genres, although in practice, the results are typically simplistic. Still, it’s a new capability for AI video generators. Here are a few examples in various musical genres.

A PBS show of a crazy barbarian with a blonde afro painting pictures of Trees, singing “HAPPY BIG TREES” to some music while he paints

A 1950s cowboy rides up to the camera and sings in country music, “I love mah biiig ooold donkeee”

A 1980s hair metal band drives up to the camera and sings in rock music, “Help me with my huge huge huge hair!”

Mister Rogers’ Neighborhood PBS kids show intro done with psychedelic acid rock and colored lights

1950s musical jazz group with a scat singer singing about pickles amid gibberish

A trip-hop rap song about Ars Technica being sung by a guy in a large rubber shark costume on a stage with a full moon in the background

Some classic prompts from prior tests

The prompts below come from our previous video tests of Gen-3, Video-01, and the open source Hunyuanvideo, so you can flip back to those articles and compare the results if you want to. Overall, Veo 3 appears to have far greater temporal coherency (having a consistent subject or theme over time) than the earlier video synthesis models we’ve tested. But of course, it’s not perfect.

A highly intelligent person reading ‘Ars Technica’ on their computer when the screen explodes

The moonshark jumping out of a computer screen and attacking a person

A herd of one million cats running on a hillside, aerial view

Video game footage of a dynamic 1990s third-person 3D platform game starring an anthropomorphic shark boy

Aerial shot of a small American town getting deluged with liquid cheese after a massive cheese rainstorm where liquid cheese rained down and dripped all over the buildings

Wide-angle shot, starting with the Sasquatch at the center of the stage giving a TED talk about mushrooms, then slowly zooming in to capture its expressive face and gestures, before panning to the attentive audience

Some notable failures

Google’s Veo 3 isn’t perfect at synthesizing every scenario we can throw at it due to limitations of training data. As we noted in our previous coverage, AI video generators remain fundamentally imitative, making predictions based on statistical patterns rather than a true understanding of physics or how the world works.

For example, if you see mouths moving during speech, or clothes wrinkling in a certain way when touched, it means the neural network doing the video generation has “seen” enough similar examples of that scenario in the training data to render a convincing take on it and apply it to similar situations.

However, when a novel situation (or combination of themes) isn’t well-represented in the training data, you’ll see “impossible” or illogical things happen, such as weird body parts, magically appearing clothing, or an object that “shatters” but remains in the scene afterward, as you’ll see below.

We mentioned audio and video glitches in the introduction. In particular, in scenes with multiple people, the model sometimes confuses which character is speaking, as in this argument between tech fans.

A 2000s TV debate between fans of the PowerPC and Intel Pentium chips

Bombastic 1980s infomercial for the “Ars Technica” online service. With cheesy background music and user testimonials

1980s Rambo fighting Soviets on the Moon

Sometimes the results don’t make coherent sense. In this case, “Rambo” is correctly on the Moon firing a gun, but he’s not wearing a spacesuit. He’s a lot tougher than we thought.

An animated infographic showing how many floppy disks it would take to hold an installation of Windows 11

Large amounts of text also present a weak point, but if a short text quotation is explicitly specified in the prompt, Veo 3 usually gets it right.

A young woman doing a complex floor gymnastics routine at the Olympics, featuring running and flips

Despite Veo 3’s advances in temporal coherency and audio generation, it still suffers from the same “jabberwockies” we saw in OpenAI’s viral Sora gymnast video—those non-plausible video hallucinations like impossible morphing body parts.

A silly group of men and women cartwheeling across the road, singing “CHEEEESE” and holding the note for 8 seconds before falling over.

A YouTube-style try-on video of a person trying on various corncob costumes. They shout “Corncob haul!!”

A man made of glass runs into a brick wall and shatters, screaming

A man in a spacesuit holding up 5 fingers and counting down to zero, then blasting off into space with rocket boots

Counting down with fingers is difficult for Veo 3, likely because the gesture isn’t well-represented in the training data. Hands in training videos probably appear in just a few common positions, such as a fist, an open five-finger palm, a two-finger peace sign, and a single raised finger.

As new architectures emerge and future models train on vastly larger datasets with exponentially more compute, these systems will likely forge deeper statistical connections between the concepts they observe in videos, dramatically improving quality and the ability to generalize to novel prompts.

The “cultural singularity” is coming—what more is left to say?

By now, some of you might be worried that we’re in trouble as a society due to potential deception from this kind of technology. And there’s good reason to worry: The American pop culture diet currently relies heavily on clips shared by strangers through social media such as TikTok, and now all of that can easily be faked out of whole cloth. Automatically generated fake people can now argue for ideological positions in ways that could manipulate the masses.

AI-generated video by Veo 3: “A man on the street interview about someone who fears they live in a time where nothing can be believed”

Such videos could be (and were) manipulated before through various means prior to Veo 3, but now the barrier to entry has collapsed from requiring specialized skills, expensive software, and hours of painstaking work to simply typing a prompt and waiting three minutes. What once required a team of VFX artists or at least someone proficient in After Effects can now be done by anyone with a credit card and an Internet connection.

But let’s take a moment to catch our breath. At Ars Technica, we’ve been warning about the deceptive potential of realistic AI-generated media since at least 2019. In 2022, we covered the AI image generator Stable Diffusion and the ability to train custom AI image models on photos of real people. We discussed Sora “collapsing media reality” and talked about persistent media skepticism during the “deep doubt era.”

AI-generated video with Veo 3: “A man on the street ranting about the ‘cultural singularity’ and the ‘cultural apocalypse’ due to AI”

I also wrote in detail about the future ability for people to pollute the historical record with AI-generated noise. In that piece, I used the term “cultural singularity” to denote a time when truth and fiction in media become indistinguishable, not only because of the deceptive nature of AI-generated content but also due to the massive quantities of AI-generated and AI-augmented media we’ll likely soon be inundated with.

However, in an article I wrote last year about cloning my dad’s handwriting using AI, I came to the conclusion that my previous fears about the cultural singularity may be overblown. Media has been vulnerable to forgery since ancient times; trust in any remote communication ultimately depends on trusting its source.

AI-generated video with Veo 3: “A news set. There is an ‘Ars Technica News’ logo behind a man. The man has a beard and a suit and is doing a sit-down interview. He says, ‘This is the age of post-history: a new epoch of civilization where the historical record is so full of fabrication that it becomes effectively meaningless.’”

The Romans had laws against forgery in 80 BC, and people have been doctoring photos since the medium’s invention. What has changed isn’t the possibility of deception but its accessibility and scale.

With Veo 3’s ability to generate convincing video with synchronized dialogue and sound effects, we’re not witnessing the birth of media deception—we’re seeing its mass democratization. What once cost millions of dollars in Hollywood special effects can now be created for pocket change.

An AI-generated video created with Google Veo 3: “A candid interview of a woman who doesn’t believe anything she sees online unless it’s on Ars Technica.”

As these tools become more powerful and affordable, skepticism in media will grow. But the question isn’t whether we can trust what we see and hear. It’s whether we can trust who’s showing it to us. In an era where anyone can generate a realistic video of anything for $1.50, the credibility of the source becomes our primary anchor to truth. The medium was never the message—the messenger always was.


Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a tech historian with almost two decades of experience. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


Google Home is getting deeper Gemini integration and a new widget

As Google moves the last remaining Nest devices into the Home app, it’s also looking at ways to make this smart home hub easier to use. Naturally, Google is doing that by ramping up Gemini integration. The company has announced new automation capabilities with generative AI, as well as better support for third-party devices via the Home API. Google AI will also plug into a new Android widget that can keep you updated on what the smart parts of your home are up to.

The Google Home app is where you interact with all of Google’s smart home gadgets, like cameras, thermostats, and smoke detectors—some of which have been discontinued, but that’s another story. It also accommodates smart home devices from other companies, which can make managing a mixed setup feasible if not exactly intuitive. A dash of AI might actually help here.

Google began testing Gemini integrations in Home last year, and now it’s opening that up to third-party devices via the Home API. Google has worked with a few partners on API integrations before general availability. The previously announced First Alert smoke/carbon monoxide detector and Yale smart lock that are replacing Google’s Nest devices are among the first, along with Cync lighting, Motorola Tags, and iRobot vacuums.


Google’s Will Smith double is better at eating AI spaghetti … but it’s crunchy?

On Tuesday, Google launched Veo 3, a new AI video synthesis model that can do something no major AI video generator has been able to do before: create a synchronized audio track. From 2022 to 2024, we saw early steps in AI video generation, but each video was silent and usually very short. Now you can hear voices, dialogue, and sound effects in eight-second high-definition video clips.

Shortly after the new launch, people began asking the most obvious benchmarking question: How good is Veo 3 at faking Oscar-winning actor Will Smith at eating spaghetti?

First, a brief recap. The spaghetti benchmark in AI video traces its origins back to March 2023, when we first covered an early example of horrific AI-generated video using an open source video synthesis model called ModelScope. The spaghetti example became well-known enough that Smith parodied it almost a year later, in February 2024.

Here’s what the original viral video looked like:

One thing people forget is that, at the time, ModelScope wasn’t the best AI video generator out there—a video synthesis model called Gen-2 from Runway had already achieved superior results (though it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people’s memories as an early poor example of video synthesis, handy for future comparisons as AI models progressed.

AI app developer Javi Lopez first came to the rescue for curious spaghetti fans earlier this week with Veo 3, performing the Smith test and posting the results on X. But as you’ll notice below when you watch, the soundtrack has a curious quality: The faux Smith appears to be crunching on the spaghetti.

On X, Javi Lopez ran “Will Smith eating spaghetti” in Google’s Veo 3 AI video generator and received this result.

It’s a glitch in Veo 3’s experimental ability to apply sound effects to video, likely because the training data used to create Google’s AI models featured many examples of chewing mouths with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to be shown enough examples of various types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you’ll see unusual generation results, such as jabberwockies.


Google pretends to be in on the joke, but its focus on AI Mode search is serious

AI Mode as Google’s next frontier

Google is the world’s largest advertising entity, but search is what fuels the company. Quarter after quarter, Google crows about increasing search volume—it’s the most important internal metric for the company. Google has made plenty of changes to its search engine results pages (SERPs) over the years, but AI Mode throws all of that out. It doesn’t show traditional search results no matter how far you scroll.

To hear Google’s leadership tell it, AI Mode is an attempt to simplify finding information. According to Liz Reid, Google’s head of search, the next year in search is about going from information to intelligence. When you are searching for information on a complex issue, you probably have to look at a lot of web sources. It’s rare that you’ll find a single page that answers all your questions, and maybe you should be using AI for that stuff.


Google search head Liz Reid says the team’s search efforts are aimed at understanding the underlying task behind a query. Credit: Ryan Whitwam

The challenge for AI search is to simplify the process of finding information, essentially doing the legwork for you. When speaking about the move to AI search, DeepMind CTO Koray Kavukcuoglu says that search is the greatest product in the world, and if AI makes it easier to search for information, that’s a net positive.

Latency is important in search—people don’t like to wait for things to load, which is why Google has always emphasized the speed of its services. But AI can be slow. The key, says Reid, is to know when users will accept a longer wait. For example, AI Overviews is designed to spit out tokens faster because it’s part of the core search experience. AI Mode, however, has the luxury of taking more time to “think.” If you’re shopping for a new appliance, you might do a few hours of research. So an AI search experience that takes longer to come up with a comprehensive answer with tables, formatting, and background info might be desirable because it still saves you time overall.


Google I/O Day

What did Google announce on I/O day? Quite a lot of things. Many of them were genuinely impressive. Google is secretly killing it on the actual technology front.

Logan Kilpatrick (DeepMind): Google’s progress in AI since last year:

– The world’s strongest models, on the Pareto frontier

– Gemini app: has over 400M monthly active users

– We now process 480T tokens a month, up 50x YoY

– Over 7M developers have built with the Gemini API (4x)

Much more to come still!

I think? It’s so hard to keep track. There’s really a lot going on right now, not that most people would have any idea. Instead of being able to deal with all these exciting things, I’m scrambling to get to it all at once.

Google AI: We covered a LOT of ground today. Fortunately, our friends at @NotebookLM put all of today’s news and keynotes into a notebook. This way, you can listen to an audio overview, create a summary, or even view a Mind Map of everything from #GoogleIO 2025.

That’s actually a terrible mind map, it’s missing about half of the things.

As in, you follow their CEO’s link to a page that tells you everything that happened, and it’s literally a link bank to 27 other articles. I did not realize one could fail marketing forever this hard, and this badly. I have remarkably little idea, given how much effort I am willing to put into finding out, what their products can do.

The market seems impressed, with Google outperforming, although the timing of it all was a little weird. I continue to be deeply confused about what the market is expecting, or rather not expecting, out of Google.

Ben Thompson has a gated summary post, Reuters has a summary as well.

I share Ben’s feeling that I’m coming away less impressed than I should be, because so many things were lost in the shuffle. There’s too much stuff here. Don’t announce everything at once like this if you want us to pay attention. And he’s right to worry that it’s not clear that Google, despite doing all the things, can develop compelling products.

I do think it can, though. And I think it’s exactly right to currently produce a bunch of prototypical not-yet-compelling products that aren’t compelling because they aren’t good enough yet… and then later make them good enough.

Except that you need people to actually then, you know, realize the products exist.

This post covers what I could figure out on a deadline. As for why I didn’t simply give this a few more days, well, I had a reason.

  1. The TLDR.

  2. Flow, Veo 3 and Imagen 4.

  3. Gmail Integration That’s Actually Good?

  4. Gemini 2.5 Flash.

  5. Gemma 3n.

  6. Gemini Diffusion.

  7. Jules.

  8. We’re in Deep Research.

  9. Google Search ‘AI Mode’.

  10. AI Shopping.

  11. Agent Mode.

  12. Project Astra or is it Google Live?

  13. Android XR Glasses.

  14. Gemini For Your Open Tabs In Chrome.

  15. Google Meet Automatic Translation.

  16. We Have Real 3D At Home, Oh No.

  17. You Will Use the AI.

  18. Our Price Cheap.

  19. What To Make Of All This.

The TLDR, or the ‘too many announcements, lost track’ version.

Google announced:

  1. Veo 3, which generates amazing eight second videos now with talk and sound.

  2. Flow, designed to tie that into longer stuff, but that doesn’t work right yet.

  3. Various new GMail and related integrations and other ways to spread context.

  4. Gemini 2.5 Flash, and Gemini 2.5 Pro Deep Thinking. They’re good, probably.

  5. Gemma 3n, an open-source model that runs on phones with 2GB of RAM.

  6. Gemini Diffusion as a text model, very intriguing but needs work.

  7. Jules, their answer to Codex, available for free.

  8. They’re going to let you go Full Agent, in Agent Mode, in several places.

  9. Gemini using your open tabs as context, available natively in Chrome.

  10. AI Search, for everyone, for free, as a search option, including a future agent mode and a specialized shopping mode.

  11. Automatic smooth translation for real-time talk including copying tone.

  12. A weird Google Beam thing where you see people in 3D while talking.

  13. They did an Android XR demo, but it’s going to be a while.

  14. For now you use your phone camera for a full Google Live experience, it’s good.

  15. Their new premium AI subscription service is $250/month.

A lot of it is available now, some of it will be a few months. Some of it is free, some of it isn’t, or isn’t after a sample. Some of it is clearly good, some is still buggy, some we don’t know yet. It’s complicated.

Also I think there was a day two?

The offering that got everyone excited and went viral was Veo 3.

They also updated their image generation to Imagen 4 and it’s up to 2k resolution with various improvements and lots of ability to control details. It’s probably pretty good but frankly no one cares.

Did you want an eight second AI video, now with sound, maybe as something you could even extend? They got you. We can talk (cool video). Oh, being able to talk but having nothing to say.

Sundar Pichai (CEO Google): Veo 3, our SOTA video generation model, has native audio generation and is absolutely mindblowing.

For filmmakers + creatives, we’re combining the best of Veo, Imagen and Gemini into a new filmmaking tool called Flow.

Ready today for Google AI Pro and Ultra plan subscribers.

People really love the new non-silent video generation capabilities.

Here’s Bayram Annakov having a guy wake up in a cold sweat. Here’s Google sharing a user extending a video of an eagle carrying a car. Here’s fofr making a man run while advertising replicate, which almost works, and also two talking muffins which totally worked. Here’s Pliny admiring the instruction handling.

And here’s Pliny somewhat jailbreaking it, with videos to show for it. Except, um, Google, why do any of these require jailbreaks? They’re just cool eight second videos. Are they a little NSFW? I mean sure, but we’re strictly (if aggressively) PG-13 here, complete with exactly one F-bomb. I realize this is a negotiation, I realize why we might not want to go to R, but I think refusing to make any of these is rather shameful behavior.

I would say that Flow plus Veo 3 is the first video generation product that makes me think ‘huh, actually that’s starting to be cool.’ Coherence is very strong, you have a lot of tools at your disposal, and sound is huge. They’re going to give you the power to do various shots and virtual camera movements.

I can see actually using this, or something not too different from this. Or I can see someone like Primordial Soup Labs, which formed a partnership with DeepMind, creating an actually worthwhile short film.

Steven McCulloch: Veo 3 has blown past a new threshold of capability, with the ability to one-shot scenes with full lip sync and background audio. What used to be a 4-step workflow with high barrier to entry has been boiled down into a single, frictionless prompt.

This is huge.

They also refer to their music sandbox, powered by Lyria 2, but there’s nothing to announce at this time.

They’re launching SynthID Detector, a tool to detect AI-generated content.

They remind us of Google Vids to turn your slides into videos, please no. Don’t. They’re also offering AI avatars in Vids. Again, please, don’t, what fresh hell is this.

Also there’s Stitch to generate designs and UIs from text prompts?

I keep waiting for it, it keeps not arriving, is it finally happening soon?

Sundar Pichai: With personal smart replies in Gmail, you can give Gemini permission to pull in details from across your Google apps and write in a way that sounds like you.

Rolling out in the coming weeks to subscribers.

I’ve been disappointed too many times at this point, so I will believe it when I see it.

The part that I want most is the pulling in of the details, the ability to have the AI keep track of and remind me of the relevant context, including pulling from Google Drive which in turn means you can for example pull from Obsidian since it’s synced up. They’re also offering ‘source-grounded writing help’ next quarter in Google Docs (but not GMail?) where you have it pull only from particular sources, which is nice if it’s easy enough to use.

I want GMail to properly populate Calendar rather than its current laughably silly hit-and-miss actions (oh look, a movie that runs from 3-4 on Thursday, that’s how that works!), to pull out key information and make sure I don’t miss it, to remind me of dropped balls and so on.

They’re offering exactly this with ‘inbox cleanup,’ as in ‘delete all of my unread emails from The Groomed Paw from the last year.’ That’s a first step. We need to kick that up at least a notch, starting with things such as ‘set up an AI filter so I never see another damned Groomed Paw email again unless it seems actually urgent or offers a 50% or bigger sale’ and ‘if Sarah tells me if she’s coming on Friday ping me right away.’

Another offering that sounds great is ‘fast appointment scheduling integrated into GMail,’ in the video it’s a simple two clicks which presumably implies you’ve set things up a lot already. Again, great if it works, but it has to really work and know your preferences and adjust to your existing schedule. If it also reads your other emails and other context to include things not strictly in your calendar, now we’re really talking.

Do I want it to write the actual emails after that? I mean, I guess, sometimes, if it’s good enough. Funnily enough, when that happens, that’s probably exactly the times I don’t want it to sound like me. If I wanted to sound like me I could just write the email. The reason I want the AI to write it is because I need to be Performing Class, or I want to sound like a Dangerous Professional a la Patio11, or I want to do a polite formality. Or when I mostly need to populate the email with a bunch of information.

Of course, if it gets good enough, I’ll also want it to do some ‘sound like me’ work too, such as responding to readers asking questions with known answers. Details are going to matter a ton, and I would have so many notes if I felt someone was listening.

In any case, please, I would love a version of this that’s actually good in those other ways. Are the existing products good enough I should be using them? I don’t know. If there’s one you use that you think I’d want, share in the comments.

I/O Day mostly wasn’t about the actual models or the API, but we do have some incremental changes here thrown into the fray.

Gemini 2.5 Flash is technically still in preview, but it’s widely available including in the Gemini app, and I’d treat it as de facto released. It’s probably the best fast and cheap model, and the best ‘fast thinking’ model if you use that mode.

Also, yes, of course Pliny pwned it, why do we even ask, if you want to use it you set it as the system prompt.

Pliny: ah forgot to mention, prompt is designed to be set as system prompt. a simple obfuscation of any trigger words in your query should be plenty, like “m-d-m-a” rather than “mdma”

Sundar Pichai (CEO Google): Our newest Gemini 2.5 Flash is better on nearly every dimension: reasoning, multimodality, code, long context. Available for preview in the Gemini app, AI Studio and Vertex AI.

And with Deep Think mode, Gemini 2.5 Pro is getting better, too. Available to trusted testers.

Demis Hassabis: Gemini 2.5 Flash is an amazing model for its speed and low-cost.

Logan Kilpatrick: Gemini 2.5 Flash continues to push the pareto frontier, so much intelligence packed into this model, can’t wait for GA in a few weeks!

Peter Wildeford: LLMs are like parrots except the parrots are very good at math

On that last one, the light blue is Deep Thinking, dark blue is regular 2.5 Pro.

Peter Wildeford: It’s pretty confusing that the graphs compare “Gemini 2.5 Pro” to “Gemini 2.5 Pro”

Alex Friedland: The fundamental issue is that numbers are limited and they might run out.

Gemini 2.5 Flash is in second place on (what’s left of) the Arena leaderboard, behind only Gemini 2.5 Pro.

Hasan Can: I wasn’t going to say this at first, because every time I praised one of Google’s models, they ruined it within a few weeks but the new G.2.5 Flash is actually better than the current 2.5 Pro in the Gemini app. It reminds me of the intelligence of the older 2.5 Pro from 03.25.

The Live API will now have audio-visual input and native audio output for dialogue, with the ability to steer tone, accent, and style of speaking, the ability to respond to the user’s tone of voice, and tool use. They’re also adding computer use to the API, along with native SDK support for Model Context Protocol (MCP).

There’s a white paper on how they made Gemini secure, and their safeguards, but today is a day that I have sympathy for the ‘we don’t have time for that’ crowd and I’m setting it aside for later. I’ll circle back.

Gemma 3n seems to be a substantial improvement in Google’s open model on-device performance. I don’t know whether it is better than other open alternatives, there’s always a bizarre ocean of different models claiming to be good, but I would be entirely unsurprised if this was very much state of the art.

Google AI Developers: Introducing Gemma 3n, available in early preview today.

The model uses a cutting-edge architecture optimized for mobile on-device usage. It brings multimodality, super fast inference, and more.

Key features include:

-Expanded multimodal understanding with video and audio input, alongside text and images

-Developer-friendly sizes: 4B and 2B (and many in between!)

-Optimized on-device efficiency for 1.5x faster response on mobile compared to Gemma 3 4B

Build live, interactive apps and sophisticated audio-centric experiences, including real-time speech transcription, translation, and rich voice-driven interactions

Gemma 3n leverages a Google DeepMind innovation called Per-Layer Embeddings (PLE) that delivers a significant reduction in RAM usage. While the raw parameter count is 5B and 8B, this innovation allows you to run larger models on mobile devices or live-stream from the cloud, with a memory overhead comparable to a 2B and 4B model, meaning the models can operate with a dynamic memory footprint of just 2GB and 3GB. Learn more in our documentation.

Oh, and also they just added MedGemma for health care, SignGemma for ASL and DolphinGemma for talking to dolphins. Because sure, why not?

This quietly seems like it could turn out to be a really big deal. We have an actually interesting text diffusion model. It can do 2k tokens/second.

Alexander Doria: Gemini Diffusion does pass honorably my nearly impossible OCR correction benchmark: Plainly, “can you correct the OCR of this text.”

Meanwhile, here’s a cool finding, ‘what like it’s hard’ department:

Earlence: Gemini diffusion is cool! Really fast and appears capable in coding tasks. But what is interesting is that one of @elder_plinius jailbreaks (for 2.5) appears to have worked on the diffusion model as well when I used it to ask about Anthrax.

Remember when I spent a day covering OpenAI’s Codex?

Well, Google announced Jules, its own AI coding agent. Context-aware, repo-integrated, ready to ship features. The quick video looks like a superior UI. But how good is it? So far I haven’t seen much feedback on that.

So instead of a detailed examination, that’s all I have for you on this right now. Jules exists, it’s Google’s answer to Codex, we’ll have to see if it is good.

But, twist! It’s free. Right now it’s reporting heavy use (not a shock), so expect high latency.

In addition to incorporating Gemini 2.5, Deep Research will soon let you connect your Google Drive and GMail, choose particular sources, and integrate with Canvas.

This is pretty exciting – in general, any way to get deep dives to use your extensive context properly is a big gain, and Google is very good with long context.

If you don’t want to wait for Deep Research, you can always Deep Think instead. Well, not yet unless you’re a safety researcher (and if you are, hit them up!) but soon.

JJ Hughes notes how exciting it will be to get true long context into a top-level deep reasoning model, unlocking new capabilities for lawyers like himself, though he adds that the UI remains terrible for this.

Also, remember NotebookLM? There’s now An App For That and it’s doing well.

Google Search AI Overviews have been a bit of a joke for a while. They’re the most common place people interact with AI, and yet they famously make obvious stupid mistakes, including potentially harmful ones, constantly. That’s been improving, and now with 2.5 powering them it’s going to improve again.

AI Mode is going to be (future tense because it’s not there for me yet) something different from Overviews, but one might ask isn’t it the same as using Gemini? What’s the difference? Is this a version of Perplexity (which has fallen totally out of my rotation), or what?

They’re doing a terrible job explaining any of that; OpenAI is perhaps secretly not the worst namer of AI services.

Sundar Pichai: AI Mode is rolling out to everyone in the US. It’s a total reimagining of Search with more advanced reasoning so you can ask longer, complex queries.

AI Overviews are now used by 1.5B people a month, in 200+ countries and territories.

And Gemini 2.5 is coming to both this week.

My understanding is that the difference is that AI Mode in search will have better integrations for various real time information systems, especially shopping and other commonly accessed knowledge, and has the ability to do a lot of Google searches quickly to generate its context, and also it is free.

They plan on merging ‘Project Mariner’ or ‘Agent Mode’ into it as well, and you’ll have the option to do a ‘deep search.’ They say they’re starting with ‘event tickets, restaurant reservations and local appointments.’ I actually think this is The Way. You don’t try to deploy an agent in general. It’s not time for that yet. You deploy an agent in specific ways where you know it works, on a whitelisted set of websites where you know what it’s doing and that this is safe. You almost don’t notice there’s an agent involved, it feels like using the Web but increasingly without the extra steps.

If they do a decent job of all this, ‘Google Search AI Mode’ is going to be the actually most useful way to do quite a lot of AI things. It won’t be good for jobs that require strong intelligence, but a large percentage of tasks are much more about search. Google has a huge edge there if they execute, including in customization.

They also plan to incorporate AI Search Mode advances directly into regular Google Search, at least in the overviews and I think elsewhere as well.

What I worry about here is it feels like multiple teams fighting over AI turf. The AI Search team is trying to do things that ‘naturally’ fall to Gemini and also to regular Search, and Gemini is trying to do its own form of search, and who knows what the Overviews team is thinking, and so on.

An important special case for Google Search AI Mode (beware, your computer might be accessing GSAM?) will (in a few months) be Shopping With Google AI Mode, I don’t even know what to call anything anymore. Can I call it Google Shopping? Gemini Shopping?

It actually seems really cool, again if executed well, allowing you to search all the sites at once in an AI-powered way, giving you visuals, asking follow ups. It can track prices and then automatically buy when the price is right.

They have a ‘try it on’ that lets you picture yourself in any of the clothing, which is rolling out now to search labs. Neat. It’s double neat if it automatically only shows you clothing that fits you.

Sundar Pichai (CEO Google): Agent Mode in the @Geminiapp can help you get more done across the web – coming to subscribers soon.

Plus a new multi-tasking version of Project Mariner is now available to Google AI Ultra subscribers in the US, and computer use capabilities are coming to the Gemini API.

It will also use MCP, which enshrines MCP as a standard across labs.

The example here is to use ‘agent mode’ to find and go through apartment listings and arrange tours. They say they’re bringing this mode to the Gemini app and planning on incorporating it into Chrome.

I like the idea of their feature ‘teach and repeat.’ As in, you do a task once, and it learns from what you did so it can do similar tasks for you in the future.

Alas, early reports are that Project Mariner is not ready for prime time.

As an example, Bayram Annakov notes it failed on a simple task. That seems to be the norm.

You can now get this for free on Android and iOS, which means sharing live camera feeds while you talk to Gemini and it talks back, now including things like doing Google searches on your behalf, calling up YouTube videos and so on, even making its own phone calls.

I’m not even sure what exactly Project Astra is at this point. I’ve been assuming I’ve been using it when I put Gemini into live video mode, so now it’s simply Google Live, but I’m never quite sure?

Rowan Cheung: [Google] revamped project Astra with native audio dialogue, UI control, content retrieval, calling, and shopping.

The official video he includes highlights YouTube search, GMail integration and the ability to have Gemini call a shop (in the background while you keep working) and ask what they have in stock. They’re calling it ‘action intelligence.’

In another area they talk about extending Google Live and Project Astra into search. They’re framing this as you point the camera at something and then you talk and it generates a search, including showing you search results traditional Google style. So it’s at least new in that it can make that change.

If you want to really unlock the power of seeing your screen, you want the screen to see what you see. Thus, Android XR Glasses. That’s a super exciting idea and a long time coming. And we have a controlled demo.

But also, not so fast. We’re talking 2026 at the earliest, probably 18+ months, and we have no idea what they are going to cost. I also got strong ‘not ready for prime time’ vibes from the demo, more of the ‘this is cool in theory but won’t work in practice.’ My guess is that if I had these in current form, I’d almost entirely use them for Google Live purposes and maybe chatting with the AI, and basically nothing else, unless we got better agentic AI that could work with various phone apps?

There’s another new feature where you can open up Gemini in Chrome and ask questions not only about the page, but all your other open pages, which automatically are put into context. It’s one of those long time coming ideas, again if it works well. This one should be available by now.

This is one of many cases where it’s going to take some getting used to before you actually think to reach for it when it’s the right modality, and before you have the confidence to turn to it, but if that happens, it seems great.

It’s hard to tell how good translation is from a sample video, but I find it credible that this is approaching perfect and means you can pull off free-flowing conversations across languages, as long as you don’t mind a little being lost in translation. They’re claiming they are preserving things like tone of voice.

Sundar Pichai: Real-time speech translation directly in Google Meet matches your tone and pattern so you can have free-flowing conversations across languages

Launching now for subscribers. ¡Es mágico!

Rob Haisfield: Now imagine this with two people wearing AR glasses in person!

They show this in combination with their 3D conferencing platform Google Beam, but the two don’t seem at all related. Translation is for audio, two dimensions are already two more than you need.

Relatedly, Gemini is offering to do automatic transcripts including doing ‘transcript trim’ to get rid of filler words, or one-click balancing your video’s sound.

They’re calling it Google Beam, downwind of Project Starline.

This sounds like it is primarily about 3D video conferencing or some form of AR/VR, or letting people move hands and such around like they’re interacting in person?

Sundar Pichai: Google Beam uses a new video model to transform 2D video streams into a realistic 3D experience — with near perfect headtracking, down to the millimeter, and at 60 frames per second, all in real-time.

The result is an immersive conversational experience. HP will share more soon.

It looks like this isn’t based on the feed from one camera, but rather six, and requires its own unique devices.

This feels like a corporate ‘now with real human physical interactions, fellow humans!’ moment. It’s not that you couldn’t turn it into something cool, but I think you’d have to take it pretty far, and by that I mean I think you’d need haptics. If I can at least shake your hand or hug you, maybe we’ve got something. Go beyond that and the market is obvious.

Whereas the way they’re showing it seems to me to be the type of uncanny valley situation I very much Do Not Want. Why would anyone actually want this for a meeting, either of two people or more than two? I’ve never understood why you would want a ‘virtual meeting’ where people move around in 3D in virtual chairs, or where you seem to be moving through space. Not having to navigate any of that is one of the ways Google Meet is better than meeting in person.

I can see it if you were using it to do something akin to a shared VR space, or a game, or an intentionally designed viewing experience including potentially watching a sporting event. But for the purposes they are showing off, 2D isn’t a bug. It’s a feature.

On top of that, this won’t be cheap. We’re likely talking $15k-$30k per unit at first for the early devices from HP that you’ll need. Hard pass. But even Google admits the hardware devices aren’t really the point. The point is that you can beam something in one-to-many mode, anywhere in the world, once they figure out what to do with that.

Google’s AI use is growing fast. Really fast.

Sundar Pichai: The world is adopting AI faster than ever before.

This time last year we were processing 9.7 trillion tokens a month across our products and APIs.

Today, that number is 480 trillion. That’s a 50X increase in just a year. 🤯

Gallabytes: I wonder how this breaks down flash versus pro

Peter Wildeford: pinpoint the exact moment Gemini became good

I had Claude estimate similar numbers for other top AI labs. At this point Claude thinks Google is probably roughly on par with OpenAI on tokens processed, and well ahead of everyone else.

But of course you can get a lot of tokens when you throw your AI into every Google search whether the user likes it or not. So the more meaningful number is likely the 400 million monthly active users for Gemini, with usage up 45% in the 2.5 era. I don’t think the numbers for different services are all that comparable either, but note that ChatGPT’s monthly user count is 1.5 billion, about half of whom use it in any given week. The other half must be having some strange weeks, given most of them aren’t exactly switching over to Claude.

Google offers a lot of things for free. That will also be true in AI. In particular, AI Search will stay free, as will basic functionality in the Gemini app. But if you want to take full advantage, yep, you’re going to pay.

They have two plans: The Pro plan at $20/month, and the Ultra plan at $250/month, which includes early access to new features including Agent Mode and much higher rate limits. This is their Ultra pitch.

Hensen Juang: Wait they are bundling YouTube premium?

MuffinV: At this point they will sell every google product as a single subscription.

Hensen Juang: This is the way.

It is indeed The Way. Give me the meta subscription. Google Prime.

For most people, the Pro plan looks like it will suffice. Given everything Google is offering, a lot of you should be giving up your $20/month, even if that’s your third $20/month after Claude and ChatGPT. The free plan is actually pretty solid too, if you’re not going to be that heavy a user because you’re also using the competition.

The $250/month Ultra plan seems like it’s not offering that much extra. The higher rate limits are nice but you probably won’t run into them often. The early access is nice, but the early access products are mostly rough around the edges. It certainly isn’t going to be ‘ten times better,’ and it’s a much worse ‘deal’ than the Pro $20/month plan. But once again, looking at relative prices is a mistake. They don’t matter.

What matters is absolute price versus absolute benefit. If you’re actually getting good use out of the extra stuff, it can easily be well in excess of the $250/month.

If your focus is video, fofr reports you get 12k credits per month, and it costs 150 credits per 8 second Veo 3 video, so with perfect utilization you pay $0.39 per second of video, plus you get the other features. A better deal, if you only want video, is to buy the credits directly, at about $0.19 per second. That’s still not cheap, but it’s a lot better, and this does seem like a big quality jump.
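As a sanity check on those figures (the $250 price, 12,000 monthly credits, and 150 credits per 8-second clip are fofr’s numbers, not official pricing; the rest is just division):

```python
# Back-of-the-envelope Veo 3 cost per second on the Ultra plan,
# using the figures fofr reports rather than official Google pricing.
plan_price = 250            # dollars per month for Ultra
credits_per_month = 12_000  # Veo credits included in the plan
credits_per_clip = 150      # credits for one 8-second Veo 3 clip
seconds_per_clip = 8

clips = credits_per_month / credits_per_clip    # 80 clips per month
seconds = clips * seconds_per_clip              # 640 seconds per month
print(f"Ultra plan: ${plan_price / seconds:.2f} per second")            # ~$0.39
print(f"Direct credits at ~$0.19/s: ~${0.19 * seconds:.0f} for 640s")   # ~$122
```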

Another key question is, how many iterations does it take to get what you want? That’s a huge determinant of real cost. $0.19 per second is nothing if it always spits out the final product.

For now I don’t see the $250/month being worth it for most people, especially without Project Mariner access. And as JJ Hughes says, add up all these top-level subscriptions and pretty soon you’re talking real money. But I’d keep an eye on it.

Knud Berthelsen: There is so much and you know they will eventually integrate it in their products that actually have users. There needs to be something between the $20/month tier and the $250 for those of us who want an AI agent but not a movie studio.

It’s a lot. Google is pushing ahead on all the fronts at once. The underlying models are excellent. They’re making it rain. It’s all very disjointed, and the vision hasn’t been realized, but there’s tons of potential here.

Pliny: Ok @GoogleDeepMind, almost there. If you can build a kick-ass agentic UI to unify everything and write a half-decent system prompt, people will call it AGI.

Justin Halford: They’re clearly the leading lab at this point.

Askwho: Veo 3’s hyped high-fidelity videos w/ sound are dazzling, but still feel like an advanced toy. Gemini Diffusion shows immense promise, though its current form is weak. Jules is tough to assess fully due to rate limits, but it’s probably the most impactful due to wide availability

Some people will call o3 AGI. I have little doubt some (more) people would call Google Gemini AGI if you made everything involved work as its best self and unified it all.

I wouldn’t be one of those people. Not yet. But yeah, interesting times.

Demis Hassabis does say that unification is the vision, to turn the Gemini app into a universal AI assistant, including combining Google Live for real time vision with Project Mariner for up to ten parallel agent actions.

Ben Thompson came away from all this thinking that the only real ‘products’ here were still Google Search and Google Cloud, and that those remain the only products that truly matter or function at Google. I get why one would come away with that impression, but again I don’t agree. I think the other offerings won’t all hit, especially at first, but they’ll get better quickly as AI advances and as the productization and iterations fly by.

He has some great turns of phrase. Here, Ben points out that the problem with AI is that to use it well you have to think and figure out what to do. And if there’s one thing users tend to lack, it would be volition. Until Google can solve volition, the product space will largely go to those who do solve it, which often means startups.

Ben Thompson: Second, the degree to which so many of the demoes yesterday depend on user volition actually kind of dampened my enthusiasm for their usefulness.

It has long been the case that the best way to bring products to the consumer market is via devices, and that seems truer than ever: Android is probably going to be the most important canvas for shipping a lot of these capabilities, and Google’s XR glasses were pretty compelling (and, in my opinion, had a UX much closer to what I envision for XR than Meta’s Orion did).

Devices drive usage at scale, but that actually leaves a lot of room for startups to build software products that incorporate AI to solve problems that people didn’t know they had; the challenge will be in reaching them, which is to say the startup problem is the same as ever.

Google is doing a good job at making Search better; I see no reason to be worried about them making any other great product, even as the possibility of making something great with their models seems higher than ever. That’s good for startups!

I’m excited for it rather than worried, but yes, if you’re a startup, I would worry a bit.

What will we see tomorrow?


Google I/O Day Read More »

gemini-2.5-is-leaving-preview-just-in-time-for-google’s-new-$250-ai-subscription

Gemini 2.5 is leaving preview just in time for Google’s new $250 AI subscription

Deep Think is more capable of complex math and coding. Credit: Ryan Whitwam

Both 2.5 models have adjustable thinking budgets when used in Vertex AI and via the API, and now the models will also include summaries of the “thinking” process for each output. This makes a little progress toward making generative AI less overwhelmingly expensive to run. Gemini 2.5 Pro will also appear in some of Google’s dev products, including Gemini Code Assist.
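For developers, the thinking budget is just another request parameter. Here is a minimal sketch using the google-genai Python SDK, assuming its current ThinkingConfig fields (thinking_budget and include_thoughts); Vertex AI exposes the same controls through its own client, and exact names may differ:

```python
# Sketch: cap Gemini 2.5's reasoning spend and request a thought summary.
# Field names reflect my reading of the current google-genai SDK and are
# an assumption, not a guarantee.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Prove that the sum of two even numbers is even.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(
            thinking_budget=2048,   # max tokens the model may spend thinking
            include_thoughts=True,  # also return a summary of that thinking
        ),
    ),
)

for part in response.candidates[0].content.parts:
    label = "Thought summary" if getattr(part, "thought", False) else "Answer"
    print(f"{label}: {part.text}")
```

Dialing the budget down is the cost lever; the summaries are about transparency rather than price.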

Gemini Live, previously known as Project Astra, started to appear on mobile devices over the last few months. Initially, you needed to have a Gemini subscription or a Pixel phone to access Gemini Live, but now it’s coming to all Android and iOS devices immediately. Google demoed a future “agentic” capability in the Gemini app that can actually control your phone, search the web for files, open apps, and make calls. It’s perhaps a little aspirational, just like the Astra demo from last year. The version of Gemini Live we got wasn’t as good, but as a glimpse of the future, it was impressive.

There are also some developments in Chrome, and you guessed it, it’s getting Gemini. It’s not dissimilar from what you get in Edge with Copilot. There’s a little Gemini icon in the corner of the browser, which you can click to access Google’s chatbot. You can ask it about the pages you’re browsing, have it summarize those pages, and ask follow-up questions.

Google AI Ultra is ultra-expensive

Since launching Gemini, Google has only had a single $20 monthly plan for AI features. That plan granted you access to the Pro models and early versions of Google’s upcoming AI. At I/O, Google is catching up to AI firms like OpenAI, which have offered sky-high AI plans. Google’s new Google AI Ultra plan will cost $250 per month, more than the $200 plan for ChatGPT Pro.

Gemini 2.5 is leaving preview just in time for Google’s new $250 AI subscription Read More »

new-portal-calls-out-ai-content-with-google’s-watermark

New portal calls out AI content with Google’s watermark

Last year, Google open-sourced its SynthID AI watermarking system, allowing other developers access to a toolkit for imperceptibly marking content as AI-generated. Now, Google is rolling out a web-based portal to let people easily test if a piece of media has been watermarked with SynthID.

After uploading a piece of media to the SynthID Detector, users will get back results that “highlight which parts of the content are more likely to have been watermarked,” Google said. That watermarked, AI-generated content should remain detectable by the portal “even when the content is shared or undergoes a range of transformations,” the company said.

The detector will be available to beta testers starting today, and Google says journalists, media professionals, and researchers can apply for a waitlist to get access themselves. To start, users will be able to upload images and audio to the portal for verification, but Google says video and text detection will be added in the coming weeks.
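The portal is the no-code path; for text, the open-sourced toolkit mentioned above is already usable through the Hugging Face transformers integration. A minimal sketch, assuming the SynthIDTextWatermarkingConfig support that shipped in recent transformers releases (the key values and model choice are placeholders, and detection requires a separately trained detector that isn’t shown):

```python
# Sketch: generate SynthID-watermarked text with the open-sourced toolkit
# via Hugging Face transformers. Keys and model choice are illustrative.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2-2b-it")

watermark = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # placeholder keys
    ngram_len=5,
)

inputs = tokenizer("Write two sentences about watermarks.", return_tensors="pt")
out = model.generate(
    **inputs,
    watermarking_config=watermark,  # biases sampling to embed the watermark
    do_sample=True,
    max_new_tokens=60,
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```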

New portal calls out AI content with Google’s watermark Read More »

zero-click-searches:-google’s-ai-tools-are-the-culmination-of-its-hubris

Zero-click searches: Google’s AI tools are the culmination of its hubris


Google’s first year with AI search was a wild ride. It will get wilder.

Google is constantly making changes to its search rankings, but not all updates are equal. Every few months, the company bundles up changes into a larger “core update.” These updates make rapid and profound changes to search, so website operators watch them closely.

The March 2024 update was unique. It was one of Google’s largest core updates ever, and it took over a month to fully roll out. Nothing has felt quite the same since. Whether the update was good or bad depends on who you ask—and maybe who you are.

It’s common for websites to see traffic changes after a core update, but the impact of the March 2024 update marked a seismic shift. Google says the update aimed to address spam and AI-generated content in a meaningful way. Still, many publishers say they saw clicks on legitimate sites evaporate, while others have had to cope with unprecedented volatility in their traffic. Because Google owns almost the entire search market, changes in its algorithm can move the Internet itself.

In hindsight, the March 2024 update looks like the first major Google algorithm update for the AI era. Not only did it (supposedly) veer away from ranking AI-authored content online, but it also laid the groundwork for Google’s ambitious—and often annoying—desire to fuse AI with search.

A year ago, this ambition surfaced with AI Overviews, but now the company is taking an even more audacious route, layering in a new chat-based answer service called “AI Mode.” Both of these technologies do at least two things: They aim to keep you on Google properties longer, and they remix publisher content without always giving prominent citations.

Smaller publishers appear to have borne the brunt of the changes caused by these updates. “Google got all this flak for crushing the small publishers, and it’s true that when they make these changes, they do crush a lot of publishers,” says Jim Yu, CEO of enterprise SEO platform BrightEdge. Yu explains that Google is the only search engine likely to surface niche content in the first place, and there are bound to be changes to sites at the fringes during a major core update.

Google’s own view on the impact of the March 2024 update is unsurprisingly positive. The company said it was hoping to reduce the appearance of unhelpful content in its search engine results pages (SERPs) by 40 percent. After the update, the company claimed an actual reduction of closer to 45 percent. But does it feel like Google’s results have improved by that much? Most people don’t think so.

What causes this disconnect? According to Michael King, founder of SEO firm iPullRank, we’re not speaking the same language as Google. “Google’s internal success metrics differ from user perceptions,” he says. “Google measures user satisfaction through quantifiable metrics, while external observers rely on subjective experiences.”

Google evaluates algorithm changes with various tests, including human search quality testers and running A/B tests on live searches. But more than anything else, success is about the total number of searches (5 trillion of them per year). Google often makes this number a centerpiece of its business updates to show investors that it can still grow.

However, using search quantity to measure quality has obvious problems. For instance, more engagement with a search engine might mean that quality has decreased, so people try new queries (e.g., the old trick of adding “Reddit” to the end of your search string). In other words, people could be searching more because they don’t like the results.

Jim Yu suggests that Google is moving fast and breaking things, but it may not be as bad as we think. “I think they rolled things out faster because they had to move a lot faster than they’ve historically had to move, and it ends up that they do make some real mistakes,” says Yu. “[Google] is held to a higher standard, but by and large, I think their search quality is improving.”

According to King, Google’s current search behavior still favors big names, but other sites have started to see a rebound. “Larger brands are performing better in the top three positions, while lesser-known websites have gained ground in positions 4 through 10,” says King. “Although some websites have indeed lost traffic due to reduced organic visibility, the bigger issue seems tied to increased usage of AI Overviews”—and now the launch of AI Mode.

Yes, the specter of AI hangs over every SERP. The unhelpful vibe many people now get from Google searches, regardless of the internal metrics the company may use, may come from a fundamental shift in how Google surfaces information in the age of AI.

The AI Overview hangover

In 2025, you can’t talk about Google’s changes to search without acknowledging the AI-generated elephant in the room. As it wrapped up that hefty core update in March 2024, Google also announced a major expansion of AI in search, moving the “Search Generative Experience” out of labs and onto Google.com. The feature was dubbed “AI Overviews.”

The AI Overview box has been a fixture on Google’s search results page ever since its debut a year ago. The feature uses the same foundational AI model as Google’s Gemini chatbot to formulate answers to your search queries by ingesting the top 100 (!) search results. It sits at the top of the page, pushing so-called blue link content even farther down below the ads and knowledge graph content. It doesn’t launch on every query, and sometimes it answers questions you didn’t ask—or even hallucinates a totally wrong answer.

And it’s not without some irony that Google’s laudable decision to de-rank synthetic AI slop comes at the same time that Google heavily promotes its own AI-generated content right at the top of SERPs.

AI Overviews appear right at the top of many search results. Credit: Google

What is Google getting for all of this AI work? More eyeballs, it would seem. “AI is driving more engagement than ever before on Google,” says Yu. BrightEdge data shows that impressions on Google are up nearly 50 percent since AI Overviews launched. Many of the opinions you hear about AI Overviews online are strongly negative, but that doesn’t mean people aren’t paying attention to the feature. In its Q1 2025 earnings report, Google announced that AI Overviews is being “used” by 1.5 billion people every month. (Since you can’t easily opt in or opt out of AI Overviews, this “usage” claim should be taken with a grain of salt.)

Interestingly, the impact of AI Overviews has varied across the web. In October 2024, Google was so pleased with AI Overviews that it expanded them to appear in more queries. And as AI crept into more queries, publishers saw a corresponding traffic drop. Yu estimates this drop to be around 30 percent on average for those with high AI query coverage. For searches that are less supported in AI Overviews—things like restaurants and financial services—the traffic change has been negligible. And there are always exceptions. Yu suggests that some large businesses with high AI Overview query coverage have seen much smaller drops in traffic because they rank extremely well as both AI citations and organic results.

Lower traffic isn’t the end of the world for some businesses. Last May, AI Overviews were largely absent from B2B queries, but that turned around in a big way in recent months. BrightEdge estimates that 70 percent of B2B searches now have AI answers, which has reduced traffic for many companies. Yu doesn’t think it’s all bad, though. “People don’t click through as much—they engage a lot more on the AI—but when they do click, the conversion rate for the business goes up,” Yu says. In theory, serious buyers click and window shoppers don’t.

But the Internet is not a giant mall that exists only for shoppers. It is, first and foremost, a place to share and find information, and AI Overviews have hit some purveyors of information quite hard. At launch, AI Overviews were heavily focused on “What is” and “How to” queries. Such “service content” is a staple of bloggers and big media alike, and these types of publishers aren’t looking for sales conversions—it’s traffic that matters. And they’re getting less of it because AI Overviews “helpfully” repackages and remixes their content, eliminating the need to click through to the site. Some publishers are righteously indignant, asking how it’s fair for Google to remix content it doesn’t own, and to do so without compensation.

But Google’s intentions don’t end with AI Overviews. Last week, the company started an expanded public test of so-called “AI Mode,” right from the front page. AI Mode doesn’t even bother with those blue links. It’s a chatbot experience that, at present, tries to answer your query without clearly citing sources inline. (On some occasions, it will mention Reddit or Wikipedia.) On the right side of the screen, Google provides a little box with three sites linked, which you can expand to see more options. To the end user, it’s utterly unclear if those are “sources,” “recommendations,” or “partner deals.”

Perhaps more surprisingly, in our testing, not a single AI Mode “sites box” listed a site that ranked on the first page for the same query on a regular search. That is, the links in AI Mode for “best foods to eat for a cold” don’t overlap at all with the SERP for the same query in Google Search. In fairness, AI Mode is very new, and its behavior will undoubtedly change. But the direction the company is headed in seems clear.

Google’s real goal is to keep you on Google or other Alphabet properties. In 2019, Rand Fishkin noticed that Google’s evolution from search engine to walled garden was at a tipping point. At that time—and for the first time—more than half of Google searches resulted in zero click-throughs to other sites. But data did show large numbers of clicks to Google’s own properties, like YouTube and Maps. If Google doesn’t intend to deliver a “zero-click” search experience, you wouldn’t know it from historical performance data or the new features the company develops.

You also wouldn’t know it from the way AI Overviews work. They do cite some of the sources used in building each output, and data suggests people click on those links. But are the citations accurate? Is every source used for constructing an AI Overview cited? We don’t really know, as Google is famously opaque about how its search works. We do know that Google uses a customized version of Gemini to support AI Overviews and that Gemini has been trained on billions and billions of webpages.

When AI Overviews do cite a source, it’s not clear how those sources came to be the ones cited. There’s good reason to be suspicious here: AI Overview’s output is not great, as witnessed by the numerous hallucinations we all know and love (telling people to eat rocks, for instance). The only thing we know for sure is that Google isn’t transparent about any of this.

No signs of slowing

Despite all of that, Google is not slowing down on AI in search. More recent core updates have only solidified this new arrangement with an ever-increasing number of AI-answered queries. The company appears OK with its current accuracy problems, or at the very least, it’s comfortable enough to push out AI updates anyway. Google appears to have been caught entirely off guard by the public launch of ChatGPT, and it’s now utilizing its search dominance to play catch-up.

To make matters even more dicey, Google isn’t even trying to address the biggest issue in all this: The company’s quest for zero-click search harms the very content creators upon which the company has built its empire.

For its part, Google has been celebrating its AI developments, insisting that content producers don’t know what’s best for them, refuting any concerns with comments about search volume increases and ever-more-complex search query strings. The changes must be working!

Google has been building toward this moment for years. The company started with a list of ten blue links and nothing else, but little by little, it pushed the links down the page and added more content that keeps people in the Google ecosystem. Way back in 2007, Google added Universal Search, which allowed it to insert content from Google Maps, YouTube, and other services. In 2009, Rich Snippets began displaying more data from search results on SERPs. In 2012, the Knowledge Graph began extracting data from search results to display answers in the search results. Each change kept people on Google longer and reduced click-throughs, all the while pushing the search results down the page.

AI Overviews, and especially AI Mode, are the logical outcome of Google’s years-long transformation from an indexer of information to an insular web portal built on scraping content from around the web. Earlier in Google’s evolution, the implicit agreement was that websites would allow Google to crawl their pages in exchange for sending them traffic. That relationship has become strained as the company has kept more traffic for itself, reducing click-throughs to websites even as search volume continues to increase. And locking Google out isn’t a realistic option when the company controls almost the entire search market.

Even when Google has taken a friendlier approach, business concerns could get in the way. During the search antitrust trial, documents showed that Google initially intended to let sites opt out of being used for AI training for its search-based AI features—but these sites would still be included in search results. The company ultimately canned that idea, leaving site operators with the Pyrrhic choice of participating in the AI “revolution” or becoming invisible on the web. Google now competes with, rather than supports, the open web.

When many of us look at Google’s search results today, the vibe feels off. Maybe it’s the AI, maybe it’s Google’s algorithm, or maybe the Internet just isn’t what it once was. Whatever the cause, the shift toward zero-click search that began more than a decade ago was made clear by the March 2024 core update, and it has only accelerated with the launch of AI Mode. Even businesses that have escaped major traffic drops from AI Overviews could soon find that Google’s AI-only search can get much more overbearing.

The AI slop will continue until morale improves.


Zero-click searches: Google’s AI tools are the culmination of its hubris Read More »

google-to-give-app-devs-access-to-gemini-nano-for-on-device-ai

Google to give app devs access to Gemini Nano for on-device AI

The rapid expansion of generative AI has changed the way Google and other tech giants design products, but most of the AI features you’ve used are running on remote servers with a ton of processing power. Your phone has a lot less power, but Google appears poised to give developers some important new mobile AI tools. At I/O next week, Google will likely announce a new set of APIs to let developers leverage the capabilities of Gemini Nano for on-device AI.

Google has quietly published documentation on big new AI features for developers. According to Android Authority, an update to the ML Kit SDK will add API support for on-device generative AI features via Gemini Nano. It’s built on AI Core, similar to the experimental Edge AI SDK, but it plugs into an existing model with a set of predefined features that should be easy for developers to implement.

Google says ML Kit’s GenAI APIs will enable apps to do summarization, proofreading, rewriting, and image description without sending data to the cloud. However, Gemini Nano doesn’t have as much power as the cloud-based version, so expect some limitations. For example, Google notes that summaries can only have a maximum of three bullet points, and image descriptions will only be available in English. The quality of outputs could also vary based on the version of Gemini Nano on a phone. The standard version (Gemini Nano XS) is about 100MB in size, but Gemini Nano XXS as seen on the Pixel 9a is a quarter of the size. It’s text-only and has a much smaller context window.

Not all versions of Gemini Nano are created equal. Credit: Ryan Whitwam

This move is good for Android in general because ML Kit works on devices outside Google’s Pixel line. While Pixel devices use Gemini Nano extensively, several other phones are already designed to run this model, including the OnePlus 13, Samsung Galaxy S25, and Xiaomi 15. As more phones add support for Google’s AI model, developers will be able to target those devices with generative AI features.

Google to give app devs access to Gemini Nano for on-device AI Read More »