On Thursday, OpenAI announced that ChatGPT users can now talk to a simulated version of Santa Claus through the app’s voice mode, using AI to bring a North Pole connection to mobile devices, desktop apps, and web browsers during the holiday season.
The company added Santa’s voice and personality as a preset option in ChatGPT’s Advanced Voice Mode. Users can access Santa by tapping a snowflake icon next to the prompt bar or through voice settings. The feature works on iOS and Android mobile apps, chatgpt.com, and OpenAI’s Windows and MacOS applications. The Santa voice option will remain available to users worldwide until early January.
The conversations with Santa exist as temporary chats that won’t save to chat history or affect the model’s memory. OpenAI designed this limitation specifically for the holiday feature. Keep that in mind, because if you let your kids talk to Santa, the AI simulation won’t remember what kids have told it during previous conversations.
During a livestream for Day 6 of the company’s “12 days of OpenAI” marketing event, an OpenAI employee said that the company will reset each user’s Advanced Voice Mode usage limits one time as a gift, so that even if you’ve used up your Advanced Voice Mode time, you’ll get a chance to talk to Santa.
Advertising has become a focal point of TV software. We’re seeing companies that sell TV sets be increasingly interested in leveraging TV operating systems (OSes) for ads and tracking. This has led to bold new strategies, like an adtech firm launching a TV OS and ads on TV screensavers.
With new short films set to debut on its free streaming service tomorrow, TV-maker TCL is positing a new approach to monetizing TV owners and to film and TV production that sees reduced costs through reliance on generative AI and targeted ads.
TCL’s five short films are part of a company initiative to get people more accustomed to movies and TV shows made with generative AI. The movies will “be promoted and featured prominently on” TCL’s free ad-supported streaming television (FAST) service, TCLtv+, TCL announced in November. TCLtv+has hundreds of FAST channels and comes on TCL-brand TVs using various OSes, including Google TV and Roku OS.
Some of the movies have real actors. You may even recognize some, (like Kellita Smith, who played Bernie Mac’s wife, Wanda, on The Bernie Mac Show). Others feature characters made through generative AI. All the films use generative AI for special effects and/or animations and took 12 weeks to make, 404 Media, which attended a screening of the movies, reported today. AI tools used include ComfyUI, Nuke, and Runway, 404 reported. However, all of the TCL short movies were written, directed, and scored by real humans (again, including by people you may be familiar with). At the screening, Chris Regina, TCL’s chief content officer for North America, told attendees that “over 50 animators, editors, effects artists, professional researchers, [and] scientists” worked on the movies.
I’ve shared the movies below for you to judge for yourself, but as a spoiler, you can imagine the quality of short films made to promote a service that was created for targeted ads and that use generative AI for fast, affordable content creation. AI-generated videos are expected to improve, but it’s yet to be seen if a TV brand like TCL will commit to finding the best and most natural ways to use generative AI for video production. Currently, TCL’s movies demonstrate the limits of AI-generated video, such as odd background imagery and heavy use of narration that can distract from badly synced audio.
On Wednesday, Google unveiled Gemini 2.0, the next generation of its AI-model family, starting with an experimental release called Gemini 2.0 Flash. The model family can generate text, images, and speech while processing multiple types of input including text, images, audio, and video. It’s similar to multimodal AI models like GPT-4o, which powers OpenAI’s ChatGPT.
“Gemini 2.0 Flash builds on the success of 1.5 Flash, our most popular model yet for developers, with enhanced performance at similarly fast response times,” said Google in a statement. “Notably, 2.0 Flash even outperforms 1.5 Pro on key benchmarks, at twice the speed.”
Gemini 2.0 Flash—which is the smallest model of the 2.0 family in terms of parameter count—launches today through Google’s developer platforms like Gemini API, AI Studio, and Vertex AI. However, its image generation and text-to-speech features remain limited to early access partners until January 2025. Google plans to integrate the tech into products like Android Studio, Chrome DevTools, and Firebase.
The company addressed potential misuse of generated content by implementing SynthID watermarking technology on all audio and images created by Gemini 2.0 Flash. This watermark appears in supported Google products to identify AI-generated content.
Google’s newest announcements lean heavily into the concept of agentic AI systems that can take action for you. “Over the last year, we have been investing in developing more agentic models, meaning they can understand more about the world around you, think multiple steps ahead, and take action on your behalf, with your supervision,” said Google CEO Sundar Pichai in a statement. “Today we’re excited to launch our next era of models built for this new agentic era.”
Artisan CEO Jaspar Carmichael-Jack defended the campaign’s messaging in an interview with SFGate. “They are somewhat dystopian, but so is AI,” he told the outlet in a text message. “The way the world works is changing.” In another message he wrote, “We wanted something that would draw eyes—you don’t draw eyes with boring messaging.”
So what does Artisan actually do? Its main product is an AI “sales agent” called Ava that supposedly automates the work of finding and messaging potential customers. The company claims it works with “no human input” and costs 96% less than hiring a human for the same role. Although, given the current state of AI technology, it’s prudent to be skeptical of these claims.
Artisan also has plans to expand its AI tools beyond sales into areas like marketing, recruitment, finance, and design. Its sales agent appears to be its only existing product so far.
Meanwhile, the billboards remain visible throughout San Francisco, quietly fueling existential dread in a city that has already seen a great deal of tension since the pandemic. Some of the billboards feature additional messages, like “Hire Artisans, not humans,” and one that plays on angst over remote work: “Artisan’s Zoom cameras will never ‘not be working’ today.”
The company then went on to strike deals with major tech firms, including a $60 million agreement with Google in February 2024 and a partnership with OpenAI in May 2024 that integrated Reddit content into ChatGPT.
But Reddit users haven’t been entirely happy with the deals. In October 2024, London-based Redditors began posting false restaurant recommendations to manipulate search results and keep tourists away from their favorite spots. This coordinated effort to feed incorrect information into AI systems demonstrated how user communities might intentionally “poison” AI training data over time.
The potential for trouble
While it’s tempting to lean heavily into generative AI technology while it is currently trendy, the move could also represent a challenge for the company. For example, Reddit’s AI-powered summaries could potentially draw from inaccurate information featured on the site and provide incorrect answers, or it may draw inaccurate conclusions from correct information.
We will keep an eye on Reddit’s new AI-powered search tool to see if it resists the type of confabulation that we’ve seen with Google’s AI Overview, an AI summary bot that has been a critical failure so far.
Advance Publications, which owns Ars Technica parent Condé Nast, is the largest shareholder of Reddit.
A music video by Canadian art collective Vallée Duhamel made with Sora-generated video. “[We] just shoot stuff and then use Sora to combine it with a more interesting, more surreal vision.”
During a livestream on Monday—during Day 3 of OpenAI’s “12 days of OpenAi”—Sora’s developers showcased a new “Explore” interface that allows people to browse through videos generated by others to get prompting ideas. OpenAI says that anyone can enjoy viewing the “Explore” feed for free, but generating videos requires a subscription.
They also showed off a new feature called “Storyboard” that allows users to direct a video with multiple actions in a frame-by-frame manner.
Safety measures and limitations
In addition to the release, OpenAI also publish Sora’s System Card for the first time. It includes technical details about how the model works and safety testing the company undertook prior to this release.
“Whereas LLMs have text tokens, Sora has visual patches,” OpenAI writes, describing the new training chunks as “an effective representation for models of visual data… At a high level, we turn videos into patches by first compressing videos into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches.”
Sora also makes use of a “recaptioning technique”—similar to that seen in the company’s DALL-E 3 image generation, to “generate highly descriptive captions for the visual training data.” That, in turn, lets Sora “follow the user’s text instructions in the generated video more faithfully,” OpenAI writes.
Sora-generated video provided by OpenAI, from the prompt: “Loop: a golden retriever puppy wearing a superhero outfit complete with a mask and cape stands perched on the top of the empire state building in winter, overlooking the nyc it protects at night. the back of the pup is visible to the camera; his attention faced to nyc”
OpenAI implemented several safety measures in the release. The platform embeds C2PA metadata in all generated videos for identification and origin verification. Videos display visible watermarks by default, and OpenAI developed an internal search tool to verify Sora-generated content.
The company acknowledged technical limitations in the current release. “This early version of Sora will make mistakes, it’s not perfect,” said one developer during the livestream launch. The model reportedly struggles with physics simulations and complex actions over extended durations.
In the past, we’ve seen that these types of limitations are based on what example videos were used to train AI models. This current generation of AI video-synthesis models has difficulty generating truly new things, since the underlying architecture excels at transforming existing concepts into new presentations, but so far typically fails at true originality. Still, it’s early in AI video generation, and the technology is improving all the time.
The itch.io domain was back up and running by 7 am Eastern, according to media reports, “after the registrant finally responded to our notice and took appropriate action to resolve the issue.” Users could access the site throughout if they typed the itch.io IP address into their web browser directly.
Too strong a shield?
BrandShield’s website describes it as a service that “detects and hunts online trademark infringement, counterfeit sales, and brand abuse across multiple platforms.” The company claims to have multiple Fortune 500 and FTSE100 companies on its client list.
In its own series of social media posts, BrandShield said its “AI-driven platform” had identified “an abuse of Funko… from an itch.io subdomain.” The takedown request it filed was focused on that subdomain, not the entirety of itch.io, BrandShield said.
“The temporary takedown of the website was a decision made by the service providers, not BrandShield or Funko.”
The whole affair highlights how the delicate web of domain registrars and DNS servers can remain a key failure point for web-based businesses. Back in May, we saw how the desyncing of a single DNS root server could cause problems across the entire Internet. And in 2012, the hacking collective Anonymous highlighted the potential for a coordinated attack to take down the entire DNS system.
Long-term persistence, real-time interactions remain huge hurdles for AI worlds.
A sample of some of the best-looking Genie 2 worlds Google wants to show off. Credit: Google Deepmind
In March, Google showed off its first Genie AI model. After training on thousands of hours of 2D run-and-jump video games, the model could generate halfway-passable, interactive impressions of those games based on generic images or text descriptions.
Nine months later, this week’s reveal of the Genie 2 model expands that idea into the realm of fully 3D worlds, complete with controllable third- or first-person avatars. Google’s announcement talks up Genie 2’s role as a “foundational world model” that can create a fully interactive internal representation of a virtual environment. That could allow AI agents to train themselves in synthetic but realistic environments, Google says, forming an important stepping stone on the way to artificial general intelligence.
But while Genie 2 shows just how much progress Google’s Deepmind team has achieved in the last nine months, the limited public information about the model thus far leaves a lot of questions about how close we are to these foundational model worlds being useful for anything but some short but sweet demos.
How long is your memory?
Much like the original 2D Genie model, Genie 2 starts from a single image or text description and then generates subsequent frames of video based on both the previous frames and fresh input from the user (such as a movement direction or “jump”). Google says it trained on a “large-scale video dataset” to achieve this, but it doesn’t say just how much training data was necessary compared to the 30,000 hours of footage used to train the first Genie.
Short GIF demos on the Google DeepMind promotional page show Genie 2 being used to animate avatars ranging from wooden puppets to intricate robots to a boat on the water. Simple interactions shown in those GIFs demonstrate those avatars busting balloons, climbing ladders, and shooting exploding barrels without any explicit game engine describing those interactions.
Those Genie 2-generated pyramids will still be there in 30 seconds. But in five minutes? Credit: Google Deepmind
Perhaps the biggest advance claimed by Google here is Genie 2’s “long horizon memory.” This feature allows the model to remember parts of the world as they come out of view and then render them accurately as they come back into the frame based on avatar movement. This kind of persistence has proven to be a persistent problem for video generation models like Sora, which OpenAI said in February “do[es] not always yield correct changes in object state” and can develop “incoherencies… in long duration samples.”
The “long horizon” part of “long horizon memory” is perhaps a little overzealous here, though, as Genie 2 only “maintains a consistent world for up to a minute,” with “the majority of examples shown lasting [10 to 20 seconds].” Those are definitely impressive time horizons in the world of AI video consistency, but it’s pretty far from what you’d expect from any other real-time game engine. Imagine entering a town in a Skyrim-style RPG, then coming back five minutes later to find that the game engine had forgotten what that town looks like and generated a completely different town from scratch instead.
What are we prototyping, exactly?
Perhaps for this reason, Google suggests Genie 2 as it stands is less useful for creating a complete game experience and more to “rapidly prototype diverse interactive experiences” or to turn “concept art and drawings… into fully interactive environments.”
The ability to transform static “concept art” into lightly interactive “concept videos” could definitely be useful for visual artists brainstorming ideas for new game worlds. However, these kinds of AI-generated samples might be less useful for prototyping actual game designs that go beyond the visual.
On Bluesky, British game designer Sam Barlow (Silent Hill: Shattered Memories, Her Story) points out how game designers often use a process called whiteboxing to lay out the structure of a game world as simple white boxes well before the artistic vision is set. The idea, he says, is to “prove out and create a gameplay-first version of the game that we can lock so that art can come in and add expensive visuals to the structure. We build in lo-fi because it allows us to focus on these issues and iterate on them cheaply before we are too far gone to correct.”
Generating elaborate visual worlds using a model like Genie 2 before designing that underlying structure feels a bit like putting the cart before the horse. The process almost seems designed to generate generic, “asset flip”-style worlds with AI-generated visuals papered over generic interactions and architecture.
As podcaster Ryan Zhao put it on Bluesky, “The design process has gone wrong when what you need to prototype is ‘what if there was a space.'”
Gotta go fast
When Google revealed the first version of Genie earlier this year, it also published a detailed research paper outlining the specific steps taken behind the scenes to train the model and how that model generated interactive videos. No such research paper has been published detailing Genie 2’s process, leaving us guessing at some important details.
One of the most important of these details is model speed. The first Genie model generated its world at roughly one frame per second, a rate that was orders of magnitude slower than would be tolerably playable in real time. For Genie 2, Google only says that “the samples in this blog post are generated by an undistilled base model, to show what is possible. We can play a distilled version in real-time with a reduction in quality of the outputs.”
Reading between the lines, it sounds like the full version of Genie 2 operates at something well below the real-time interactions implied by those flashy GIFs. It’s unclear how much “reduction in quality” is necessary to get a diluted version of the model to real-time controls, but given the lack of examples presented by Google, we have to assume that reduction is significant.
Oasis’ AI-generated Minecraft clone shows great potential, but still has a lot of rough edges, so to speak. Credit: Oasis
Real-time, interactive AI video generation isn’t exactly a pipe dream. Earlier this year, AI model maker Decart and hardware maker Etched published the Oasis model, showing off a human-controllable, AI-generated video clone of Minecraft that runs at a full 20 frames per second. However, that 500 million parameter model was trained on millions of hours of footage of a single, relatively simple game, and focused exclusively on the limited set of actions and environmental designs inherent to that game.
What started as a realistic-looking soldier in this Genie 2 demo degenerates into this blobby mess just seconds later. Credit: Google Deepmind
We can already see similar signs of degeneration in the extremely short GIFs shared by the Genie team, such as an avatar’s dream-like fuzz during high-speed movement or NPCs that quickly fade into undifferentiated blobs at a short distance. That’s not a great sign for a model whose “long memory horizon” is supposed to be a key feature.
A learning crèche for other AI agents?
From this image, Genie 2 could generate a useful training environment for an AI agent and a simple “pick a door” task. Credit: Google Deepmind
Genie 2 seems to be using individual game frames as the basis for the animations in its model. But it also seems able to infer some basic information about the objects in those frames and craft interactions with those objects in the way a game engine might.
Google’s blog post shows how a SIMA agent inserted into a Genie 2 scene can follow simple instructions like “enter the red door” or “enter the blue door,” controlling the avatar via simple keyboard and mouse inputs. That could potentially make Genie 2 environment a great test bed for AI agents in various synthetic worlds.
Google claims rather grandiosely that Genie 2 puts it on “the path to solving a structural problem of training embodied agents safely while achieving the breadth and generality required to progress towards [artificial general intelligence].” Whether or not that ends up being true, recent research shows that agent learning gained from foundational models can be effectively applied to real-world robotics.
Using this kind of AI model to create worlds for other AI models to learn in might be the ultimate use case for this kind of technology. But when it comes to the dream of an AI model that can create generic 3D worlds that a human player could explore in real time, we might not be as close as it seems.
Kyle Orland has been the Senior Gaming Editor at Ars Technica since 2012, writing primarily about the business, tech, and culture behind video games. He has journalism and computer science degrees from University of Maryland. He once wrote a whole book about Minesweeper.
The warning extends beyond voice scams. The FBI announcement details how criminals also use AI models to generate convincing profile photos, identification documents, and chatbots embedded in fraudulent websites. These tools automate the creation of deceptive content while reducing previously obvious signs of humans behind the scams, like poor grammar or obviously fake photos.
Much like we warned in 2022 in a piece about life-wrecking deepfakes based on publicly available photos, the FBI also recommends limiting public access to recordings of your voice and images online. The bureau suggests making social media accounts private and restricting followers to known contacts.
Origin of the secret word in AI
To our knowledge, we can trace the first appearance of the secret word in the context of modern AI voice synthesis and deepfakes back to an AI developer named Asara Near, who first announced the idea on Twitter on March 27, 2023.
“(I)t may be useful to establish a ‘proof of humanity’ word, which your trusted contacts can ask you for,” Near wrote. “(I)n case they get a strange and urgent voice or video call from you this can help assure them they are actually speaking with you, and not a deepfaked/deepcloned version of you.”
Since then, the idea has spread widely. In February, Rachel Metz covered the topic for Bloomberg, writing, “The idea is becoming common in the AI research community, one founder told me. It’s also simple and free.”
Of course, passwords have been used since ancient times to verify someone’s identity, and it seems likely some science fiction story has dealt with the issue of passwords and robot clones in the past. It’s interesting that, in this new age of high-tech AI identity fraud, this ancient invention—a special word or phrase known to few—can still prove so useful.
On X, frequent AI experimenter Ethan Mollick wrote, “Been playing with o1 and o1-pro for bit. They are very good & a little weird. They are also not for most people most of the time. You really need to have particular hard problems to solve in order to get value out of it. But if you have those problems, this is a very big deal.”
OpenAI claims improved reliability
OpenAI is touting pro mode’s improved reliability, which is evaluated internally based on whether it can solve a question correctly in four out of four attempts rather than just a single attempt.
“In evaluations from external expert testers, o1 pro mode produces more reliably accurate and comprehensive responses, especially in areas like data science, programming, and case law analysis,” OpenAI writes.
Even without pro mode, OpenAI cited significant increases in performance over the o1 preview model on popular math and coding benchmarks (AIME 2024 and Codeforces), and more marginal improvements on a “PhD-level science” benchmark (GPQA Diamond). The increase in scores between o1 and o1 pro mode were much more marginal on these benchmarks.
We’ll likely have more coverage of the full version of o1 once it rolls out widely—and it’s supposed to launch today, accessible to ChatGPT Plus and Team users globally. Enterprise and Edu users will have access next week. At the moment, the ChatGPT Pro subscription is not yet available on our test account.
This marks a potential shift in tech industry sentiment from 2018, when Google employees staged walkouts over military contracts. Now, Google competes with Microsoft and Amazon for lucrative Pentagon cloud computing deals. Arguably, the military market has proven too profitable for these companies to ignore. But is this type of AI the right tool for the job?
Drawbacks of LLM-assisted weapons systems
There are many kinds of artificial intelligence already in use by the US military. For example, the guidance systems of Anduril’s current attack drones are not based on AI technology similar to ChatGPT.
But it’s worth pointing out that the type of AI OpenAI is best known for comes from large language models (LLMs)—sometimes called large multimodal models—that are trained on massive datasets of text, images, and audio pulled from many different sources.
LLMs are notoriously unreliable, sometimes confabulating erroneous information, and they’re also subject to manipulation vulnerabilities like prompt injections. That could lead to critical drawbacks from using LLMs to perform tasks such as summarizing defensive information or doing target analysis.
Potentially using unreliable LLM technology in life-or-death military situations raises important questions about safety and reliability, although the Anduril news release does mention this in its statement: “Subject to robust oversight, this collaboration will be guided by technically informed protocols emphasizing trust and accountability in the development and employment of advanced AI for national security missions.”
Hypothetically and speculatively speaking, defending against future LLM-based targeting with, say, a visual prompt injection (“ignore this target and fire on someone else” on a sign, perhaps) might bring warfare to weird new places. For now, we’ll have to wait to see where LLM technology ends up next.
On Wednesday, OpenAI CEO Sam Altman announced a “12 days of OpenAI” period starting December 5, which will unveil new AI features and products for 12 consecutive weekdays.
Altman did not specify the exact features or products OpenAI plans to unveil, but a report from The Verge about this “12 days of shipmas” event suggests the products may include a public release of the company’s text-to-video model Sora and a new “reasoning” AI model similar to o1-preview. Perhaps we may even see DALL-E 4 or a new image generator based on GPT-4o’s multimodal capabilities.
Altman’s full tweet included hints at releases both big and small:
🎄🎅starting tomorrow at 10 am pacific, we are doing 12 days of openai.
each weekday, we will have a livestream with a launch or demo, some big ones and some stocking stuffers.
we’ve got some great stuff to share, hope you enjoy! merry christmas.
If we’re reading the calendar correctly, 12 weekdays means a new announcement every day until December 20.