world models

Waymo leverages Genie 3 to create a world model for self-driving cars

AI, Cars, Google, self-driving car, waymo, world models / Rejus Almole / February 6, 2026

On the road with AI

The Waymo World Model is not just a straight port of Genie 3 with dashcam videos stuffed inside. Waymo and DeepMind used a specialized post-training process to make the new model generate both 2D video and 3D lidar outputs of the same scene. While cameras are great for visualizing fine details, Waymo says lidar is necessary to add critical depth information to what a self-driving car “sees” on the road—maybe someone should tell Tesla about that.

Using a world model allows Waymo to take video from its vehicles and use prompts to change the route the vehicle takes, which it calls driving action control. These simulations, which come with lidar maps, reportedly offer greater realism and consistency than older reconstructive simulation methods.

With the world model, Waymo can see what would happen if the car took a different turn.

This model can also help improve the self-driving AI even without adding or removing everything. There are plenty of dashcam videos available for training self-driving vehicles, but they lack the multimodal sensor data of Waymo’s vehicles. Dropping such a video into the Waymo World Model generates matching sensor data, showing how the driving AI would have seen that situation.

While the Waymo World Model can create entirely synthetic scenes, the company seems mostly interested in “mutating” the conditions in real videos. The blog post contains examples of changing the time of day or weather, adding new signage, or placing vehicles in unusual places. Or, hey, why not an elephant in the road?

Waymo is ready in case an elephant shows up.

Waymo’s early test cities were consistently sunny (like Phoenix) with little inclement weather. These kinds of simulations could help the cars adapt to the more varied conditions. The new markets include places with more difficult conditions, including Boston and Washington, D.C.

Of course, the benefit of the new AI model will depend on how accurately Genie 3 can simulate the real world. The test videos we’ve seen of Genie 3 run the gamut from pretty believable to uncanny valley territory, but Waymo believes the technology has improved to the point that it can teach self-driving cars a thing or two.

Waymo leverages Genie 3 to create a world model for self-driving cars Read More »

Google Project Genie lets you create interactive worlds from a photo or prompt

AI, Artificial Intelligence, Google, Project Genie, world models / Paul Patrick / January 29, 2026

If that 60-second jaunt into the AI world isn’t enough, you can just run the prompt again. Because this is generative AI, the results will be a little different each time. Google also lets you “remix” its pre-built worlds with new characters and visual styles. The video generated of your exploration is available for download as well.

Still an experiment

Google stresses that Project Genie is still just a research prototype, and there are, therefore, some notable limitations. As anyone who has used Google Veo or OpenAI Sora to create AI videos will know, it takes a few seconds to create even a short clip. So, it’s impressive that Genie can make it feel interactive at all. However, there will be some input lag, and you can only explore each world for 60 seconds. In addition, the promotable events feature previously demoed for Genie 3, which allows inserting new elements into a running simulation, is not available yet.

While Google has talked up Genie’s ability to accurately model physics, the company notes that testers will probably see examples of worlds that don’t look or behave quite right. Testers may also see changing restrictions on content. The Verge was able to test Project Genie, and initially, it was happy to generate knockoffs of Nintendo games like Super Mario and The Legend of Zelda. By the end of the test, The Verge reports that some of those prompts were being blocked due to “interests of third-party content providers.”

Project Genie is only accessible from a dedicated web app—it won’t be plugged into the Gemini app or website. You can only access this tool for the time being with an AI Ultra subscription, which runs $250 per month. Generating all this AI video is expensive, so it makes sense to start with the higher tier. Google says its goal is to open up access to Project Genie over time.

Google Project Genie lets you create interactive worlds from a photo or prompt Read More »

Runway claims its GWM-1 “world models” can stay coherent for minutes at a time

AI, Gen-4.5, GWM-1, ml, robotics, Runway, Runway Gen-4.5, runway ml, world models / Paul Patrick / December 11, 2025

Even using the word “general” has an air of aspiration to it. You would expect a general world model to be, well, one model—but in this case, we’re looking at three distinct, post-trained models. That caveats the general-ness a bit, but Runway says that it’s “working toward unifying many different domains and action spaces under a single base world model.”

A competitive field

And that brings us to another important consideration: With GWM-1, Runway is entering a competitive gold-rush space where its differentiators and competitive advantages are less clear than they were for video. With video, Runway has been able to make major inroads in film/television, advertising, and other industries because its founders are perceived as being more rooted in those creative industries than most competitors, and they’ve designed tools with those industries in mind.

There are indeed hypothetical applications of world models in film, television, advertising, and game development—but it was apparent from Runway’s livestream that the company is also looking at applications in robotics as well as physics and life sciences research, where competitors are already well-established and where we’ve seen increasing investment in recent months.

Many of those competitors are big tech companies with massive resource advantages over Runway. Runway was one of the first to market with a sellable product, and its aggressive efforts to court industry professionals directly has so far allowed it to overcome those advantages in video generation, but it remains to be seen how things will play out with world models, where it doesn’t enjoy either advantage any more than the other entrants.

Regardless, the GWM-1 advancements are impressive—especially if Runway’s claims about consistency and coherence over longer stretches of time are true.

Runway also used its livestream to announce new Gen 4.5 video generation capabilities, including native audio, audio editing, and multi-shot video editing. Further, it announced a deal with CoreWeave, a cloud computing company with an AI focus. The deal will see Runway utilizing Nvidia’s GB300 NVL72 racks on CoreWeave’s cloud infrastructure for future training and inference.

Runway claims its GWM-1 “world models” can stay coherent for minutes at a time Read More »

Big AI firms pump money into world models as LLM advances slow

AI, Google, Meta, NVIDIA, openai, superintelligence, syndication, world models / Kris Guyer / September 29, 2025

Runway, a video generation start-up that has deals with Hollywood studios, including Lionsgate, launched a product last month that uses world models to create gaming settings, with personalized stories and characters generated in real time.

“Traditional video methods [are a] brute-force approach to pixel generation, where you’re trying to squeeze motion in a couple of frames to create the illusion of movement, but the model actually doesn’t really know or reason about what’s going on in that scene,” said Cristóbal Valenzuela, chief executive officer at Runway.

Previous video-generation models had physics that were unlike the real world, he added, which general-purpose world model systems help to address.

To build these models, companies need to collect a huge amount of physical data about the world.

San Francisco-based Niantic has mapped 10 million locations, gathering information through games including Pokémon Go, which has 30 million monthly players interacting with a global map.

Niantic ran Pokémon Go for nine years and, even after the game was sold to US-based Scopely in June, its players still contribute anonymized data through scans of public landmarks to help build its world model.

“We have a running start at the problem,” said John Hanke, chief executive of Niantic Spatial, as the company is now called following the Scopely deal.

Both Niantic and Nvidia are working on filling gaps by getting their world models to generate or predict environments. Nvidia’s Omniverse platform creates and runs such simulations, assisting the $4.3 trillion tech giant’s push toward robotics and building on its long history of simulating real-world environments in video games.

Nvidia Chief Executive Jensen Huang has asserted that the next major growth phase for the company will come with “physical AI,” with the new models revolutionizing the field of robotics.

Some such as Meta’s LeCun have said this vision of a new generation of AI systems powering machines with human-level intelligence could take 10 years to achieve.

But the potential scope of the cutting-edge technology is extensive, according to AI experts. World models “open up the opportunity to service all of these other industries and amplify the same thing that computers did for knowledge work,” said Nvidia’s Lebaredian.

Additional reporting by Melissa Heikkilä in London and Michael Acton in San Francisco.

Big AI firms pump money into world models as LLM advances slow Read More »

New AI model turns photos into explorable 3D worlds, with caveats

3D generation, 3D reconstruction, AI, AI research, Ai video, Biz & IT, computer vision, depth estimation, generative ai, machine learning, multimodal AI, Open Source, scene generation, tencent, video synthesis, world models / Paul Patrick / September 4, 2025

Training with automated data pipeline

Voyager builds on Tencent’s earlier HunyuanWorld 1.0, released in July. Voyager is also part of Tencent’s broader “Hunyuan” ecosystem, which includes the Hunyuan3D-2 model for text-to-3D generation and the previously covered HunyuanVideo for video synthesis.

To train Voyager, researchers developed software that automatically analyzes existing videos to process camera movements and calculate depth for every frame—eliminating the need for humans to manually label thousands of hours of footage. The system processed over 100,000 video clips from both real-world recordings and the aforementioned Unreal Engine renders.

A diagram of the Voyager world creation pipeline. Credit: Tencent

The model demands serious computing power to run, requiring at least 60GB of GPU memory for 540p resolution, though Tencent recommends 80GB for better results. Tencent published the model weights on Hugging Face and included code that works with both single and multi-GPU setups.

The model comes with notable licensing restrictions. Like other Hunyuan models from Tencent, the license prohibits usage in the European Union, the United Kingdom, and South Korea. Additionally, commercial deployments serving over 100 million monthly active users require separate licensing from Tencent.

On the WorldScore benchmark developed by Stanford University researchers, Voyager reportedly achieved the highest overall score of 77.62, compared to 72.69 for WonderWorld and 62.15 for CogVideoX-I2V. The model reportedly excelled in object control (66.92), style consistency (84.89), and subjective quality (71.09), though it placed second in camera control (85.95) behind WonderWorld’s 92.98. WorldScore evaluates world generation approaches across multiple criteria, including 3D consistency and content alignment.

While these self-reported benchmark results seem promising, wider deployment still faces challenges due to the computational muscle involved. For developers needing faster processing, the system supports parallel inference across multiple GPUs using the xDiT framework. Running on eight GPUs delivers processing speeds 6.69 times faster than single-GPU setups.

Given the processing power required and the limitations in generating long, coherent “worlds,” it may be a while before we see real-time interactive experiences using a similar technique. But as we’ve seen so far with experiments like Google’s Genie, we’re potentially witnessing very early steps into a new interactive, generative art form.

New AI model turns photos into explorable 3D worlds, with caveats Read More »