AI


Landmark AI deal sees Hollywood giant Lionsgate provide library for AI training

The silicon screen —

Runway deal will create a Lionsgate AI video generator, but not everyone is happy.

An illustration of a filmstrip with a robot, horse, rocket, and whale.

On Wednesday, AI video synthesis firm Runway and entertainment company Lionsgate announced a partnership to create a new AI model trained on Lionsgate’s vast film and TV library. The deal will feed Runway legally clear training data and will also reportedly provide Lionsgate with tools to enhance content creation while potentially reducing production costs.

Lionsgate, known for franchises like John Wick and The Hunger Games, sees AI as a way to boost efficiency in content production. Michael Burns, Lionsgate’s vice chair, stated in a press release that AI could help develop “cutting edge, capital efficient content creation opportunities.” He added that some filmmakers have shown enthusiasm about potential applications in pre- and post-production processes.

Runway plans to develop a custom AI model using Lionsgate’s proprietary content portfolio. The model will be exclusive to Lionsgate Studios, allowing filmmakers, directors, and creative staff to augment their work. While specifics remain unclear, the partnership marks the first major collaboration between Runway and a Hollywood studio.

“We’re committed to giving artists, creators and studios the best and most powerful tools to augment their workflows and enable new ways of bringing their stories to life,” said Runway co-founder and CEO Cristóbal Valenzuela in a press release. “The history of art is the history of technology and these new models are part of our continuous efforts to build transformative mediums for artistic and creative expression; the best stories are yet to be told.”

The quest for legal training data

Generative AI models are master imitators, and video synthesis models like Runway’s latest Gen-3 Alpha are no exception. The companies that create them must amass a great deal of existing video (and still image) samples to analyze, allowing the resulting AI models to re-synthesize that information into new video generations, guided by text descriptions called prompts. And wherever that training data is lacking, it can result in unusual generations, as we saw in our hands-on evaluation of Gen-3 Alpha in July.

However, in the past, AI companies have gotten into legal trouble for scraping vast quantities of media without permission. In fact, Runway is currently the defendant in a class-action lawsuit that alleges copyright infringement for using video data obtained without permission to train its video synthesis models. While companies like OpenAI have claimed this scraping process is “fair use,” US courts have not yet definitively ruled on the practice. With other potential legal challenges ahead, it makes sense from Runway’s perspective to reach out and sign deals for training data that is completely in the clear.

Even if the training data becomes fully legal and licensed, different elements of the entertainment industry view generative AI on a spectrum that seems to range between fascination and horror. The technology’s ability to rapidly create images and video based on prompts may attract studios looking to streamline production. However, it also raises polarizing concerns: unions worry about job security, actors and musicians about likeness misuse and ethics, and studios about legal implications.

So far, news of the deal has not been received kindly among vocal AI critics found on social media. On X, filmmaker and AI critic Joe Russo wrote, “I don’t think I’ve ever seen a grosser string of words than: ‘to develop cutting-edge, capital-efficient content creation opportunities.'”

Film concept artist Reid Southen shared a similar negative take on X: “I wonder how the directors and actors of their films feel about having their work fed into the AI to make a proprietary model. As an artist on The Hunger Games? I’m pissed. This is the first step in trying to replace artists and filmmakers.”

It’s a fear that we will likely hear more about in the future as AI video synthesis technology grows more capable—and potentially becomes adopted as a standard filmmaking tool. As studios explore AI applications despite legal uncertainties and labor concerns, partnerships like the Lionsgate-Runway deal may shape the future of content creation in Hollywood.



macOS 15 Sequoia: The Ars Technica review

The macOS 15 Sequoia update will inevitably be known as “the AI one” in retrospect, introducing, as it does, the first wave of “Apple Intelligence” features.

That’s funny because none of that stuff is actually ready for the 15.0 release that’s coming out today. A lot of it is coming “later this fall” in the 15.1 update, which Apple has been testing entirely separately from the 15.0 betas for weeks now. Some of it won’t be ready until after that—rumors say image generation won’t be ready until the end of the year—but in any case, none of it is ready for public consumption yet.

But the AI-free 15.0 release does give us a chance to evaluate all of the non-AI additions to macOS this year. Apple Intelligence is sucking up a lot of the media oxygen, but in most other ways, this is a typical 2020s-era macOS release, with one or two headliners, several quality-of-life tweaks, and some sparsely documented under-the-hood stuff that will subtly change how you experience the operating system.

The AI-free version of the operating system is also the one that all users of the remaining Intel Macs will be using, since all of the Apple Intelligence features require Apple Silicon. Most of the Intel Macs that ran last year’s Sonoma release will run Sequoia this year—the first time this has happened since 2019—but the difference between the same macOS version running on different CPUs will be wider than it has been. It’s a clear indicator that the Intel Mac era is drawing to a close, even if support hasn’t totally ended just yet.



Google rolls out voice-powered AI chat to the Android masses

Chitchat Wars —

Gemini Live allows back-and-forth conversation, now free to all Android users.

The Google Gemini logo.

Google

On Thursday, Google made Gemini Live, its voice-based AI chatbot feature, available for free to all Android users. The feature allows users to interact with Gemini through voice commands on their Android devices. That’s notable because competitor OpenAI’s Advanced Voice Mode feature of ChatGPT, which is similar to Gemini Live, has not yet fully shipped.

Google unveiled Gemini Live during its Pixel 9 launch event last month. Initially, the feature was exclusive to Gemini Advanced subscribers, but now it’s accessible to anyone using the Gemini app or its overlay on Android.

Gemini Live enables users to ask questions aloud and even interrupt the AI’s responses mid-sentence. Users can choose from several voice options for Gemini’s responses, adding a level of customization to the interaction.

Gemini suggests the following uses of the voice mode in its official help documents:

Talk back and forth: Talk to Gemini without typing, and Gemini will respond back verbally.

Brainstorm ideas out loud: Ask for a gift idea, to plan an event, or to make a business plan.

Explore: Uncover more details about topics that interest you.

Practice aloud: Rehearse for important moments in a more natural and conversational way.

Interestingly, while OpenAI originally demoed its Advanced Voice Mode in May with the launch of GPT-4o, it has so far shipped the feature only to a limited number of users, starting in late July. Some AI experts speculate that a wider rollout has been hampered by a lack of available computing power, since the voice feature is presumably very compute-intensive.

To access Gemini Live, users can reportedly tap a new waveform icon in the bottom-right corner of the app or overlay. This action activates the microphone, allowing users to pose questions verbally. The interface includes options to “hold” Gemini’s answer or “end” the conversation, giving users control over the flow of the interaction.

Currently, Gemini Live supports only English, but Google has announced plans to expand language support in the future. The company also intends to bring the feature to iOS devices, though no specific timeline has been provided for this expansion.



OpenAI’s new “reasoning” AI models are here: o1-preview and o1-mini

fruit by the foot —

New o1 language model can solve complex tasks iteratively, count R’s in “strawberry.”

An illustration of a strawberry made out of pixel-like blocks.

OpenAI finally unveiled its rumored “Strawberry” AI language model on Thursday, claiming significant improvements in what it calls “reasoning” and problem-solving capabilities over previous large language models (LLMs). Formally named “OpenAI o1,” the model family will initially launch in two forms, o1-preview and o1-mini, available today for ChatGPT Plus and certain API users.

OpenAI claims that o1-preview outperforms its predecessor, GPT-4o, on multiple benchmarks, including competitive programming, mathematics, and “scientific reasoning.” However, people who have used the model say it does not yet outclass GPT-4o in every metric. Other users have criticized the delay in receiving a response from the model, owing to the multi-step processing occurring behind the scenes before answering a query.

In a rare display of public hype-busting, OpenAI product manager Joanne Jang tweeted, “There’s a lot of o1 hype on my feed, so I’m worried that it might be setting the wrong expectations. what o1 is: the first reasoning model that shines in really hard tasks, and it’ll only get better. (I’m personally psyched about the model’s potential & trajectory!) what o1 isn’t (yet!): a miracle model that does everything better than previous models. you might be disappointed if this is your expectation for today’s launch—but we’re working to get there!”

OpenAI reports that o1-preview ranked in the 89th percentile on competitive programming questions from Codeforces. In mathematics, it scored 83 percent on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o’s 13 percent. OpenAI also states that o1 performs comparably to PhD students on specific tasks in physics, chemistry, and biology, a claim that may be challenged as people scrutinize the benchmarks and run their own evaluations over time. The smaller o1-mini model is designed specifically for coding tasks and is priced 80 percent lower than o1-preview.

A benchmark chart provided by OpenAI. They write, “o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.”

OpenAI attributes o1’s advancements to a new reinforcement learning (RL) training approach that teaches the model to spend more time “thinking through” problems before responding, similar to how “let’s think step-by-step” chain-of-thought prompting can improve outputs in other LLMs. The new process allows o1 to try different strategies and “recognize” its own mistakes.
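For readers who haven’t seen chain-of-thought prompting in practice, here is a minimal sketch of the technique the article compares o1 against. It assumes the official openai Python client with an API key in the environment; the model name and the sample question are our own illustrative choices, and this is ordinary prompting of an existing chat model, not the RL-trained process OpenAI describes for o1.

```python
# A minimal chain-of-thought prompting sketch (illustrative only).
# Assumes: `pip install openai` and OPENAI_API_KEY set in the environment.
# This is plain prompting of an existing chat model, not OpenAI's RL-trained
# "reasoning" process; it just shows the "think step by step" idea.
from openai import OpenAI

client = OpenAI()

question = "A train leaves at 9:40 and arrives at 12:05. How long is the trip?"

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model works for this comparison
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Direct answer: the model responds immediately.
direct = ask(question)

# Chain-of-thought: the same question, but the prompt asks the model to
# write out intermediate steps before committing to an answer.
stepwise = ask(question + "\nLet's think step by step, then state the final answer.")

print("Direct:\n", direct)
print("\nStep-by-step:\n", stepwise)
```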

AI benchmarks are notoriously unreliable and easy to game; however, independent verification and experimentation from users will show the full extent of o1’s advancements over time. It’s worth noting that MIT Research showed earlier this year that some of the benchmark claims OpenAI touted with GPT-4 last year were erroneous or exaggerated.

A mixed bag of capabilities

OpenAI demos “o1” correctly counting the number of Rs in the word “strawberry.”

Amid many demo videos of o1 completing programming tasks and solving logic puzzles that OpenAI shared on its website and social media, one demo stood out as perhaps the least consequential and least impressive, but it may become the most talked about due to a recurring meme where people ask LLMs to count the number of R’s in the word “strawberry.”

Due to tokenization, where the LLM processes words in data chunks called tokens, most LLMs are typically blind to character-by-character differences in words. Apparently, o1 has the self-reflective capabilities to figure out how to count the letters and provide an accurate answer without user assistance.
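You can see the tokenization issue for yourself with OpenAI’s open source tiktoken library. The sketch below uses the cl100k_base encoding used by GPT-4-class models; the exact token boundaries depend on the tokenizer, but the takeaway is that the model receives a few multi-character chunks rather than ten individual letters.

```python
# Illustrates why letter counting is awkward for LLMs: the model consumes
# token IDs covering multi-character chunks, not individual characters.
# Assumes: `pip install tiktoken`.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by GPT-4-class models

word = "strawberry"
token_ids = enc.encode(word)
chunks = [enc.decode([t]) for t in token_ids]

print(token_ids)   # a short list of integer IDs
print(chunks)      # multi-character pieces, e.g. something like ['str', 'aw', 'berry']
print(f"{len(token_ids)} tokens vs {word.count('r')} letter r's in {len(word)} characters")
```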

Beyond OpenAI’s demos, we’ve seen optimistic but cautious hands-on reports about o1-preview online. Wharton Professor Ethan Mollick wrote on X, “Been using GPT-4o1 for the last month. It is fascinating—it doesn’t do everything better but it solves some very hard problems for LLMs. It also points to a lot of future gains.”

Mollick shared a hands-on post in his “One Useful Thing” blog that details his experiments with the new model. “To be clear, o1-preview doesn’t do everything better. It is not a better writer than GPT-4o, for example. But for tasks that require planning, the changes are quite large.”

Mollick gives the example of asking o1-preview to build a teaching simulator “using multiple agents and generative AI, inspired by the paper below and considering the views of teachers and students,” then asking it to build the full code, and it produced a result that Mollick found impressive.

Mollick also gave o1-preview eight crossword puzzle clues, translated into text, and the model took 108 seconds to solve the puzzle over many steps, getting all of the answers correct but confabulating an answer to a clue Mollick did not give it. We recommend reading Mollick’s entire post for a good early hands-on impression. Given his experience with the new model, it appears that o1 works very similarly to GPT-4o but runs iteratively in a loop, something that the so-called “agentic” AutoGPT and BabyAGI projects experimented with in early 2023.
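For a sense of what “iteratively in a loop” means in practice, here is a bare-bones sketch of the generate, critique, and retry pattern that those early agentic projects popularized. It is our illustration built on the standard openai client, not a description of o1’s internals; the prompts, model name, and three-round cap are arbitrary choices.

```python
# A bare-bones "loop until the output passes a check" pattern, in the spirit
# of early agentic projects like AutoGPT/BabyAGI. Purely illustrative; it is
# not how OpenAI says o1 works internally.
# Assumes: `pip install openai` and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

task = "Write a Python one-liner that reverses the words in a sentence."
attempt = ask(task)

for _ in range(3):  # cap the loop so it always terminates
    critique = ask(f"Task: {task}\nAttempt: {attempt}\n"
                   "If the attempt is correct, reply only PASS. Otherwise explain the flaw.")
    if critique.strip().startswith("PASS"):
        break
    # Feed the critique back in and try again.
    attempt = ask(f"Task: {task}\nPrevious attempt: {attempt}\n"
                  f"Reviewer feedback: {critique}\nProduce an improved attempt.")

print(attempt)
```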

Is this what could “threaten humanity?”

Speaking of agentic models that run in loops, Strawberry has been subject to hype since last November, when it was initially known as Q* (Q-star). At the time, The Information and Reuters claimed that, just before Sam Altman’s brief ouster as CEO, OpenAI employees had internally warned OpenAI’s board of directors about a new OpenAI model called Q* that could “threaten humanity.”

In August, the hype continued when The Information reported that OpenAI showed Strawberry to US national security officials.

We’ve been skeptical about the hype around Q* and Strawberry since the rumors first emerged, as this author noted last November, and Timothy B. Lee covered thoroughly in an excellent post about Q* from last December.

So even though o1 is out, AI industry watchers should note how this model’s impending launch was played up in the press as a dangerous advancement while not being publicly downplayed by OpenAI. For an AI model that takes 108 seconds to solve eight clues in a crossword puzzle and hallucinates one answer, we can say that its potential danger was likely hype (for now).

Controversy over “reasoning” terminology

It’s no secret that some people in tech have issues with anthropomorphizing AI models and using terms like “thinking” or “reasoning” to describe the synthesizing and processing operations that these neural network systems perform.

Just after the OpenAI o1 announcement, Hugging Face CEO Clement Delangue wrote, “Once again, an AI system is not ‘thinking,’ it’s ‘processing,’ ‘running predictions,’… just like Google or computers do. Giving the false impression that technology systems are human is just cheap snake oil and marketing to fool you into thinking it’s more clever than it is.”

“Reasoning” is also a somewhat nebulous term since, even in humans, it’s difficult to define exactly what the term means. A few hours before the announcement, independent AI researcher Simon Willison tweeted in response to a Bloomberg story about Strawberry, “I still have trouble defining ‘reasoning’ in terms of LLM capabilities. I’d be interested in finding a prompt which fails on current models but succeeds on strawberry that helps demonstrate the meaning of that term.”

Reasoning or not, o1-preview currently lacks some features present in earlier models, such as web browsing, image generation, and file uploading. OpenAI plans to add these capabilities in future updates, along with continued development of both the o1 and GPT model series.

While OpenAI says the o1-preview and o1-mini models are rolling out today, neither model is available in our ChatGPT Plus interface yet, so we have not been able to evaluate them. We’ll report our impressions on how this model differs from other LLMs we have previously covered.



AI chatbots might be better at swaying conspiracy theorists than humans

Out of the rabbit hole —

Co-author Gordon Pennycook: “The work overturns a lot of how we thought about conspiracies.”

A woman wearing a sweatshirt for the QAnon conspiracy theory on October 11, 2020 in Ronkonkoma, New York.

Stephanie Keith | Getty Images

Belief in conspiracy theories is rampant, particularly in the US, where some estimates suggest as much as 50 percent of the population believes in at least one outlandish claim. And those beliefs are notoriously difficult to debunk. Challenge a committed conspiracy theorist with facts and evidence, and they’ll usually just double down—a phenomenon psychologists usually attribute to motivated reasoning, i.e., a biased way of processing information.

A new paper published in the journal Science is challenging that conventional wisdom, however. Experiments in which an AI chatbot engaged in conversations with people who believed at least one conspiracy theory showed that the interaction significantly reduced the strength of those beliefs, even two months later. The secret to its success: the chatbot, with its access to vast amounts of information across an enormous range of topics, could precisely tailor its counterarguments to each individual.

“These are some of the most fascinating results I’ve ever seen,” co-author Gordon Pennycook, a psychologist at Cornell University, said during a media briefing. “The work overturns a lot of how we thought about conspiracies, that they’re the result of various psychological motives and needs. [Participants] were remarkably responsive to evidence. There’s been a lot of ink spilled about being in a post-truth world. It’s really validating to know that evidence does matter. We can act in a more adaptive way using this new technology to get good evidence in front of people that is specifically relevant to what they think, so it’s a much more powerful approach.”

When confronted with facts that challenge a deeply entrenched belief, people will often seek to preserve it rather than update their priors (in Bayesian-speak) in light of the new evidence. So there has been a good deal of pessimism lately about ever reaching those who have plunged deep down the rabbit hole of conspiracy theories, which are notoriously persistent and “pose a serious threat to democratic societies,” per the authors. Pennycook and his fellow co-authors devised an alternative explanation for that stubborn persistence of belief.

Bespoke counter-arguments

The issue is that “conspiracy theories just vary a lot from person to person,” said co-author Thomas Costello, a psychologist at American University who is also affiliated with MIT. “They’re quite heterogeneous. People believe a wide range of them and the specific evidence that people use to support even a single conspiracy may differ from one person to another. So debunking attempts where you try to argue broadly against a conspiracy theory are not going to be effective because people have different versions of that conspiracy in their heads.”

By contrast, an AI chatbot would be able to tailor debunking efforts to those different versions of a conspiracy. So in theory a chatbot might prove more effective in swaying someone from their pet conspiracy theory.

To test their hypothesis, the team conducted a series of experiments with 2,190 participants who believed in one or more conspiracy theories. The participants engaged in several personal “conversations” with a large language model (GPT-4 Turbo) in which they shared their pet conspiracy theory and the evidence they felt supported that belief. The LLM would respond by offering factual and evidence-based counter-arguments tailored to the individual participant. GPT-4 Turbo’s responses were professionally fact-checked, which showed that 99.2 percent of the claims it made were true, with just 0.8 percent being labeled misleading, and zero as false. (You can try your hand at interacting with the debunking chatbot here.)

Screenshot of the chatbot opening page asking questions to prepare for a conversation.

Thomas H. Costello

Participants first answered a series of open-ended questions about the conspiracy theories they strongly believed and the evidence they relied upon to support those beliefs. The AI then produced a single-sentence summary of each belief, for example, “9/11 was an inside job because X, Y, and Z.” Participants rated the accuracy of that statement in terms of their own beliefs and then filled out a questionnaire about other conspiracies, their attitude toward trusted experts, AI, other people in society, and so forth.

Then it was time for the one-on-one dialogues with the chatbot, which the team programmed to be as persuasive as possible. The chatbot had also been fed the open-ended responses of the participants, which made it better able to tailor its counter-arguments to each individual. For example, if someone thought 9/11 was an inside job and cited as evidence the fact that jet fuel doesn’t burn hot enough to melt steel, the chatbot might counter with, say, the NIST report showing that steel loses its strength at much lower temperatures, sufficient to weaken the towers’ structures so that they collapsed. Someone who thought 9/11 was an inside job and cited demolitions as evidence would get a different response tailored to that.
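The paper’s actual prompts aren’t reproduced here, but the tailoring step is straightforward to picture: the participant’s own belief summary and evidence get spliced into the request before the model replies. Below is a hedged sketch of what that assembly might look like; the field names and wording are our own, not the authors’.

```python
# Sketch of how a debunking prompt might be personalized to one participant.
# The structure and wording are illustrative guesses, not the study's actual
# prompts; the point is that each participant's own claim and evidence are
# embedded in the request, so counterarguments can address them directly.

participant = {
    # Hypothetical example data in the shape the paper describes:
    "belief_summary": "9/11 was an inside job because jet fuel cannot melt steel beams.",
    "evidence": "Jet fuel burns far below the melting point of structural steel.",
    "belief_rating": 85,  # 0-100 self-rated confidence
}

system_message = (
    "You are having a respectful, factual conversation with someone who holds "
    "the belief summarized below. Address their specific evidence directly, "
    "cite mainstream sources, and avoid ridicule."
)

user_message = (
    f"My belief ({participant['belief_rating']}/100 confident): "
    f"{participant['belief_summary']}\n"
    f"My evidence: {participant['evidence']}"
)

messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message},
]

# These messages would then be sent to a chat model such as GPT-4 Turbo.
for m in messages:
    print(f"[{m['role']}] {m['content']}\n")
```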

Participants then answered the same set of questions after their dialogues with the chatbot, which lasted about eight minutes on average. Costello et al. found that these targeted dialogues resulted in a 20 percent decrease in the participants’ misinformed beliefs—a reduction that persisted even two months later when participants were evaluated again.

As Bence Bago (Tilburg University) and Jean-Francois Bonnefon (CNRS, Toulouse, France) noted in an accompanying perspective, this is a substantial effect compared to the 1 to 6 percent drop in beliefs achieved by other interventions. They also deemed the persistence of the effect noteworthy, while cautioning that two months is “insufficient to completely eliminate misinformed conspiracy beliefs.”



Taylor Swift cites AI deepfakes in endorsement for Kamala Harris

it’s raining creepy men —

Taylor Swift on AI: “The simplest way to combat misinformation is with the truth.”

A screenshot of Taylor Swift’s Kamala Harris Instagram post, captured on September 11, 2024.

On Tuesday night, Taylor Swift endorsed Vice President Kamala Harris for US President on Instagram, citing concerns over AI-generated deepfakes as a key motivator. The artist’s warning aligns with current trends in technology, especially in an era where AI synthesis models can easily create convincing fake images and videos.

“Recently I was made aware that AI of ‘me’ falsely endorsing Donald Trump’s presidential run was posted to his site,” she wrote in her Instagram post. “It really conjured up my fears around AI, and the dangers of spreading misinformation. It brought me to the conclusion that I need to be very transparent about my actual plans for this election as a voter. The simplest way to combat misinformation is with the truth.”

In August 2024, former President Donald Trump posted AI-generated images on Truth Social falsely suggesting Swift endorsed him, including a manipulated photo depicting Swift as Uncle Sam with text promoting Trump. The incident sparked Swift’s fears about the spread of misinformation through AI.

This isn’t the first time Swift and generative AI have appeared together in the news. In February, we reported that a flood of explicit AI-generated images of Swift originated from a 4chan message board where users took part in daily challenges to bypass AI image generator filters.

Listing image by Ronald Woan/CC BY-SA 2.0



Human drivers keep rear-ending Waymos

Traffic safety —

We took a close look at the 23 most serious Waymo crashes.

A Waymo vehicle in San Francisco.

Photo by JasonDoiy via Getty Images

On a Friday evening last November, police chased a silver sedan across the San Francisco Bay Bridge. The fleeing vehicle entered San Francisco and went careening through the city’s crowded streets. At the intersection of 11th and Folsom streets, it sideswiped the fronts of two other vehicles, veered onto a sidewalk, and hit two pedestrians.

According to a local news story, both pedestrians were taken to the hospital with one suffering major injuries. The driver of the silver sedan was injured, as was a passenger in one of the other vehicles.

No one was injured in the third car, a driverless Waymo robotaxi. Still, Waymo was required to report the crash to government agencies. It was one of 20 crashes with injuries that Waymo has reported through June.  And it’s the only crash Waymo has classified as causing a serious injury.

Twenty injuries might sound like a lot, but Waymo’s driverless cars have traveled more than 22 million miles. So driverless Waymo taxis have been involved in fewer than one injury-causing crash for every million miles of driving—a much better rate than a typical human driver.

Last week Waymo released a new website to help the public put statistics like this in perspective. Waymo estimates that typical drivers in San Francisco and Phoenix—Waymo’s two biggest markets—would have caused 64 crashes over those 22 million miles. So Waymo vehicles get into injury-causing crashes less than one-third as often, per mile, as human-driven vehicles.

Waymo claims an even more dramatic improvement for crashes serious enough to trigger an airbag. Driverless Waymos have experienced just five crashes like that, and Waymo estimates that typical human drivers in Phoenix and San Francisco would have experienced 31 airbag crashes over 22 million miles. That implies driverless Waymos are one-sixth as likely as human drivers to experience this type of crash.
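Here is the arithmetic behind those comparisons, using only the figures quoted above: roughly 22 million driverless miles, 20 injury crashes and 5 airbag crashes for Waymo, against estimated human benchmarks of 64 and 31.

```python
# Back-of-the-envelope check of the ratios quoted above, using only the
# figures reported in the article (mileage is approximate).
miles = 22_000_000

waymo_injury_crashes = 20
human_injury_benchmark = 64   # Waymo's estimate for typical drivers over the same miles

waymo_airbag_crashes = 5
human_airbag_benchmark = 31

injury_rate_per_million = waymo_injury_crashes / (miles / 1_000_000)
print(f"Injury crashes per million miles: {injury_rate_per_million:.2f}")  # ~0.91, i.e. fewer than one

print(f"Injury crashes vs human benchmark: {waymo_injury_crashes / human_injury_benchmark:.2f}")
# ~0.31, i.e. "less than one-third as often"

print(f"Airbag crashes vs human benchmark: {waymo_airbag_crashes / human_airbag_benchmark:.2f}")
# ~0.16, i.e. roughly one-sixth as likely (an ~84 percent reduction)
```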

The new data comes at a critical time for Waymo, which is rapidly scaling up its robotaxi service. A year ago, Waymo was providing 10,000 rides per week. Last month, Waymo announced it was providing 100,000 rides per week. We can expect more growth in the coming months.

So it really matters whether Waymo is making our roads safer or more dangerous. And all the evidence so far suggests that it’s making them safer.

It’s not just the small number of crashes Waymo vehicles experience—it’s also the nature of those crashes. Out of the 23 most serious Waymo crashes, 16 involved a human driver rear-ending a Waymo. Three others involved a human-driven car running a red light before hitting a Waymo. There were no serious crashes where a Waymo ran a red light, rear-ended another car, or engaged in other clear-cut misbehavior.

Digging into Waymo’s crashes

In total, Waymo has reported nearly 200 crashes through June 2024, which works out to about one crash every 100,000 miles. Waymo says 43 percent of crashes across San Francisco and Phoenix had a delta-V of less than 1 mph—in other words, they were very minor fender-benders.

But let’s focus on the 23 most severe crashes: those that either caused an injury, caused an airbag to deploy, or both. These are good crashes to focus on not only because they do the most damage but because human drivers are more likely to report these types of crashes, making it easier to compare Waymo’s software to human drivers.

Most of these—16 crashes in total—involved another car rear-ending a Waymo. Some were quite severe: three triggered airbag deployments, and one caused a “moderate” injury. One vehicle rammed the Waymo a second time as it fled the scene, prompting Waymo to sue the driver.

There were three crashes where a human-driven car ran a red light before crashing into a Waymo:

  • One was the crash I mentioned at the top of this article. A car fleeing the police ran a red light and slammed into a Waymo, another car, and two pedestrians, causing several injuries.
  • In San Francisco, a pair of robbery suspects fleeing police in a stolen car ran a red light “at a high rate of speed” and slammed into the driver’s side door of a Waymo, triggering an airbag. The suspects were uninjured and fled on foot. The Waymo was thankfully empty.
  • In Phoenix, a car ran a red light and then “made contact with the SUV in front of the Waymo AV, and both of the other vehicles spun.” The Waymo vehicle was hit in the process, and someone in one of the other vehicles suffered an injury Waymo described as minor.

There were two crashes where a Waymo got sideswiped by a vehicle in an adjacent lane:

  • In San Francisco, a Waymo was stopped at a stop sign in the right lane when another car hit the Waymo while passing it on the left.
  • In Tempe, Arizona, an SUV “overtook the Waymo AV on the left” and then “initiated a right turn,” cutting the Waymo off and causing a crash. A passenger in the SUV said they suffered moderate injuries.

Finally, there were two crashes where another vehicle turned left across the path of a Waymo vehicle:

  • In San Francisco, a Waymo and a large truck were approaching an intersection from opposite directions when a bicycle behind the truck made a sudden left in front of the Waymo. Waymo says the truck blocked Waymo’s vehicle from seeing the bicycle until the last second. The Waymo slammed on its brakes but wasn’t able to stop in time. The San Francisco Fire Department told local media that the bicyclist suffered only minor injuries and was able to leave the scene on their own.
  • A Waymo in Phoenix was traveling in the right lane. A row of stopped cars was in the lane to its left. As Waymo approached an intersection, a car coming from the opposite direction made a left turn through a gap in the row of stopped cars. Again, Waymo says the row of stopped cars blocked it from seeing the turning car until it was too late. A passenger in the turning vehicle reported minor injuries.

It’s conceivable that Waymo was at fault in these last two cases—it’s impossible to say without more details. It’s also possible that Waymo’s erratic braking contributed to a few of those rear-end crashes. Still, it seems clear that a non-Waymo vehicle bore primary responsibility for most, and possibly all, of these crashes.

“About as good as you can do”

One should always be skeptical when a company publishes a self-congratulatory report about its own safety record. So I called Noah Goodall, a civil engineer with many years of experience studying roadway safety, to see what he made of Waymo’s analysis.

“They’ve been the best of the companies doing this,” Goodall told me. He noted that Waymo has a team of full-time safety researchers who publish their work in reputable journals.

Waymo knows precisely how often its own vehicles crash because its vehicles are bristling with sensors. The harder problem is calculating an appropriate baseline for human-caused crashes.

That’s partly because human drivers don’t always report their own crashes to the police, insurance companies, or anyone else. But it’s also because crash rates differ from one area to another. For example, there are far more crashes per mile in downtown San Francisco than in the suburbs of Phoenix.

Waymo tried to account for these factors as it calculated crash rates for human drivers in both Phoenix and San Francisco. To ensure an apples-to-apples comparison, Waymo’s analysis excludes freeway crashes from its human-driven benchmark, since Waymo’s commercial fleet doesn’t use freeways yet.

Waymo estimates that human drivers fail to report 32 percent of injury crashes; the company raised its benchmark for human crashes to account for that. But even without this under-reporting adjustment, Waymo’s injury crash rate would still be roughly 60 percent below that of human drivers. The true number is probably somewhere between the adjusted number (70 percent fewer crashes) and the unadjusted one (60 percent fewer crashes). It’s an impressive figure either way.

Waymo says it doesn’t apply an under-reporting adjustment to its human benchmark for airbag crashes, since humans almost always report crashes that are severe enough to trigger an airbag. So it’s easier to take Waymo’s figure here—an 84 percent decline in airbag crashes—at face value.

Waymo’s benchmarks for human drivers are “about as good as you can do,” Goodall told me. “It’s very hard to get this kind of data.”

When I talked to other safety experts, they were equally positive about the quality of Waymo’s analysis. For example, last year, I asked Phil Koopman, a professor of computer engineering at Carnegie Mellon, about a previous Waymo study that used insurance data to show its cars were significantly safer than human drivers. Koopman told me Waymo’s findings were statistically credible, with some minor caveats.

Similarly, David Zuby, the chief research officer at the Insurance Institute for Highway Safety, had mostly positive things to say about a December study analyzing Waymo’s first 7.1 million miles of driverless operations.

I found a few errors in Waymo’s data

If you look closely, you’ll see that one of the numbers in this article differs slightly from Waymo’s safety website. Specifically, Waymo says that its vehicles get into crashes that cause injury 73 percent less often than human drivers, while the figure I use in this article is 70 percent.

This is because I spotted a couple of apparent classification mistakes in the raw data Waymo used to generate its statistics.

Each time Waymo reports a crash to the National Highway Traffic Safety Administration, it records the severity of injuries caused by the crash. This can be fatal, serious, moderate, minor, none, or unknown.

When Waymo shared an embargoed copy of its numbers with me early last week, it said that there had been 16 injury crashes. However, when I looked at the data Waymo had submitted to federal regulators, it showed 15 minor injuries, two moderate injuries, and one serious injury, for a total of 18.

When I asked Waymo about this discrepancy, the company said it found a programming error. Waymo had recently started using a moderate injury category and had not updated the code that generated its crash statistics to count these crashes. Waymo fixed the error quickly enough that the official version Waymo published on Thursday of last week showed 18 injury crashes.

However, as I continued looking at the data, I noticed another apparent mistake: Two crashes had been put in the “unknown” injury category, yet the narrative for each crash indicated an injury had occurred. One report said “the passenger in the Waymo AV reported an unspecified injury.” The other stated that “an individual involved was transported from the scene to a hospital for medical treatment.”

I notified Waymo about this apparent mistake on Friday and they said they are looking into it. As I write this, the website still claims a 73 percent reduction in injury crashes. But I think it’s clear that these two “unknown” crashes were actually injury crashes. So, all of the statistics in this article are based on the full list of 20 injury crashes.

I think this illustrates that I come by my generally positive outlook on Waymo honestly: I probably scrutinize Waymo’s data releases more carefully than any other journalist, and I’m not afraid to point out when the numbers don’t add up.

Based on my conversations with Waymo, I’m convinced these were honest mistakes rather than deliberate efforts to cover up crashes. I could only identify these mistakes because Waymo went out of its way to make its findings reproducible. It would make no sense to do that if the company simultaneously tried to fake its statistics.

Could there be other injury or airbag-triggering crashes that Waymo isn’t counting? It’s certainly possible, but I doubt there have been very many. You might have noticed that I linked to local media reporting for some of Waymo’s most significant crashes. If Waymo deliberately covered up a severe crash, there would be a big risk that a crash would get reported in the media and then Waymo would have to explain to federal regulators why it wasn’t reporting all legally required crashes.

So, despite the screwups, I find Waymo’s data to be fairly credible, and those data show that Waymo’s vehicles crash far less often than human drivers on public roads.

Tim Lee was on staff at Ars from 2017 to 2021. Last year, he launched a newsletter, Understanding AI, that explores how AI works and how it’s changing our world. You can subscribe here.



Proposed underwater data center surprises regulators who hadn’t heard about it

Data centers powering the generative AI boom are gulping water and exhausting electricity at what some researchers view as an unsustainable pace. Two entrepreneurs who met in high school a few years ago want to overcome that crunch with a fresh experiment: sinking the cloud into the sea.

Sam Mendel and Eric Kim launched their company, NetworkOcean, out of startup accelerator Y Combinator on August 15 by announcing plans to dunk a small capsule filled with GPU servers into San Francisco Bay within a month. “There’s this vital opportunity to build more efficient computer infrastructure that we’re gonna rely on for decades to come,” Mendel says.

The founders contend that moving data centers off land would slow ocean temperature rise by drawing less power and letting seawater cool the capsule’s shell, supplementing its internal cooling system. NetworkOcean’s founders have said a location in the bay would deliver fast processing speeds for the region’s buzzing AI economy.  

But scientists who study the hundreds of square miles of brackish water say even the slightest heat or disturbance from NetworkOcean’s submersible could trigger toxic algae blooms and harm wildlife. And WIRED inquiries to several California and US agencies who oversee the bay found that NetworkOcean has been pursuing its initial test of an underwater data center without having sought, much less received, any permits from key regulators.

The outreach by WIRED prompted at least two agencies—the Bay Conservation and Development Commission and the San Francisco Regional Water Quality Control Board—to email NetworkOcean that testing without permits could run afoul of laws, according to public records and spokespeople for the agencies. Fines from the BCDC can run up to hundreds of thousands of dollars.

The nascent technology has already been in hot water in California. In 2016, the state’s coastal commission issued a previously unreported notice to Microsoft saying that the tech giant had violated the law the year before by plunging an unpermitted server vessel into San Luis Obispo Bay, about 250 miles south of San Francisco. The months-long test, part of what was known as Project Natick, had ended without apparent environmental harm by the time the agency learned of it, so officials decided not to fine Microsoft, according to the notice seen by WIRED.

The renewed scrutiny of underwater data centers has surfaced an increasingly common tension between innovative efforts to combat global climate change and long-standing environmental laws. Permitting takes months, if not years, and can cost millions of dollars, potentially impeding progress. Advocates of the laws argue that the process allows for time and input to better weigh trade-offs.

“Things are overregulated because people often don’t do the right thing,” says Thomas Mumley, recently retired assistant executive officer of the bay water board. “You give an inch, they take a mile. We have to be cautious.”

Over the last two weeks, including during an interview at the WIRED office, NetworkOcean’s founders have provided driblets of details about their evolving plans. Their current intention is to test their underwater vessel for about an hour, just below the surface of what Mendel would only describe as a privately owned and operated portion of the bay that he says is not subject to regulatory oversight. He insists that a permit is not required based on the location, design, and minimal impact. “We have been told by our potential testing site that our setup is environmentally benign,” Mendel says.

Mumley, the retired regulator, calls the assertion about not needing a permit “absurd.” Both Bella Castrodale, the BCDC’s lead enforcement attorney, and Keith Lichten, a water board division manager, say private sites and a quick dip in the bay aren’t exempt from permitting. Several other experts in bay rules tell WIRED that even if some quirk does preclude oversight, they believe NetworkOcean is sending a poor message to the public by not coordinating with regulators.

“Just because these centers would be out of sight does not mean they are not a major disturbance,” says Jon Rosenfield, science director at San Francisco Baykeeper, a nonprofit that investigates industrial polluters.

School project

Mendel and Kim say they tried to develop an underwater renewable energy device together during high school in Southern California before moving onto non-nautical pursuits. Mendel, 23, dropped out of college in 2022 and founded a platform for social media influencers.

About a year ago, he built a small web server using the DIY system Raspberry Pi to host another personal project, and temporarily floated the equipment in San Francisco Bay by attaching it to a buoy from a private boat in the Sausalito area. (Mendel declined to answer questions about permits.) After talking with Kim, also 23, about this experiment, the two decided to move in together and start NetworkOcean.

Their pitch is that underwater data centers are more affordable to develop and maintain, especially as electricity shortages limit sites on land. Surrounding a tank of hot servers with water naturally helps cool them, avoiding the massive resource drain of air-conditioning and also improving on the similar benefits of floating data centers. Developers of offshore wind farms are eager to electrify NetworkOcean vessels, Mendel says.



AI ruling on jobless claims could make mistakes courts can’t undo, experts warn

Nevada will soon become the first state to use AI to help speed up the decision-making process when ruling on appeals that impact people’s unemployment benefits.

The state’s Department of Employment, Training, and Rehabilitation (DETR) agreed to pay Google $1,383,838 for the AI technology, a 2024 budget document shows, and it will be launched within the “next several months,” Nevada officials told Gizmodo.

Nevada’s first-of-its-kind AI will rely on a Google cloud service called Vertex AI Studio. Connecting to Google’s servers, the state will fine-tune the AI system to only reference information from DETR’s database, which officials think will ensure its decisions are “more tailored” and the system provides “more accurate results,” Gizmodo reported.

Under the contract, DETR will essentially transfer data from transcripts of unemployment appeals hearings and rulings, after which Google’s AI system will process that data, upload it to the cloud, and then compare the information to previous cases.

In as little as five minutes, the AI will issue a ruling that would’ve taken a state employee about three hours to reach without using AI, DETR’s information technology administrator, Carl Stanfield, told The Nevada Independent. That’s highly valuable to Nevada, which has a backlog of more than 40,000 appeals stemming from a pandemic-related spike in unemployment claims while dealing with “unforeseen staffing shortages” that DETR reported in July.

“The time saving is pretty phenomenal,” Stanfield said.

As a safeguard, the AI’s determination is then reviewed by a state employee, who will hopefully catch any mistakes, biases, or, perhaps worse, hallucinations, where the AI makes up facts that could affect the outcome of a claimant’s case.

Google’s spokesperson Ashley Simms told Gizmodo that the tech giant will work with the state to “identify and address any potential bias” and to “help them comply with federal and state requirements.” According to the state’s AI guidelines, the agency must prioritize ethical use of the AI system, “avoiding biases and ensuring fairness and transparency in decision-making processes.”

If the reviewer accepts the AI ruling, they’ll sign off on it and issue the decision. Otherwise, the reviewer will edit the decision and submit feedback so that DETR can investigate what went wrong.
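Stripped of the Google-specific pieces, the workflow described above is a draft-then-review loop: compare the appeal against prior cases, generate a draft ruling, and have a human referee sign off or correct it. The sketch below is purely illustrative; every function is a hypothetical stand-in, and nothing here reflects DETR’s or Vertex AI’s actual interfaces.

```python
# Hypothetical sketch of the draft-then-review workflow described above.
# Every function is a stand-in; none of this reflects DETR's or Google's
# real systems or APIs.

def retrieve_similar_cases(transcript: str) -> list[str]:
    # Stand-in for comparing the appeal against prior rulings in a case database.
    return ["Prior ruling A (summary)...", "Prior ruling B (summary)..."]

def draft_ruling(transcript: str, precedents: list[str]) -> str:
    # Stand-in for the model call that would produce a draft decision.
    return "DRAFT: Claimant is eligible for benefits because ..."

def human_review(draft: str) -> tuple[bool, str]:
    # Stand-in for the state employee's check for errors, bias, or hallucination.
    approved = True  # or False, after the reviewer edits the draft
    return approved, draft

transcript = "Hearing transcript text ..."
precedents = retrieve_similar_cases(transcript)
draft = draft_ruling(transcript, precedents)
approved, final_ruling = human_review(draft)

if approved:
    print("Issue decision:", final_ruling)
else:
    print("Edited decision issued; feedback logged for investigation.")
```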

Gizmodo noted that this novel use of AI “represents a significant experiment by state officials and Google in allowing generative AI to influence a high-stakes government decision—one that could put thousands of dollars in unemployed Nevadans’ pockets or take it away.”

Google declined to comment on whether more states are considering using AI to weigh jobless claims.



Roblox announces AI tool for generating 3D game worlds from text

ease of use —

New AI feature aims to streamline game creation on popular online platform.


On Friday, Roblox announced plans to introduce an open source generative AI tool that will allow game creators to build 3D environments and objects using text prompts, reports MIT Tech Review. The feature, which is still under development, may streamline the process of creating game worlds on the popular online platform, potentially opening up more aspects of game creation to those without extensive 3D design skills.

Roblox has not announced a specific launch date for the new AI tool, which is based on what it calls a “3D foundational model.” The company shared a demo video of the tool where a user types, “create a race track,” then “make the scenery a desert,” and the AI model creates a corresponding model in the proper environment.

The system will also reportedly let users make modifications, such as changing the time of day or swapping out entire landscapes, and Roblox says the multimodal AI model will ultimately accept video and 3D prompts, not just text.

A video showing Roblox’s generative AI model in action.

The 3D environment generator is part of Roblox’s broader AI integration strategy. The company reportedly uses around 250 AI models across its platform, including one that monitors voice chat in real time to enforce content moderation, which is not always popular with players.

Next-token prediction in 3D

Roblox’s 3D foundational model approach involves a custom next-token prediction model—a foundation not unlike the large language models (LLMs) that power ChatGPT. Tokens are fragments of text data that LLMs use to process information. Roblox’s system “tokenizes” 3D blocks by treating each block as a numerical unit, which allows the AI model to predict the most likely next structural 3D element in a sequence. In aggregate, the technique can build entire objects or scenery.
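Roblox hasn’t published the model, but the core idea, treating each block as a token and predicting the next one from the sequence so far, can be shown with a toy example. The sketch below learns simple bigram counts over a tiny hand-made “structure” and then extends it; a real 3D foundation model would use a large neural network and true X, Y, Z coordinates rather than a flat list, so treat this strictly as an illustration of next-token prediction.

```python
# Toy next-token prediction over "blocks," illustrating the idea of
# tokenizing a build as a sequence and predicting what comes next.
# This is a bigram-count toy, not Roblox's model.
import random
from collections import Counter, defaultdict

# A tiny training "structure," flattened into a sequence of block tokens.
training_sequence = [
    "grass", "grass", "road", "road", "road", "grass",
    "grass", "road", "road", "road", "grass", "grass",
]

# Count how often each block follows each other block (bigram statistics).
transitions: dict[str, Counter] = defaultdict(Counter)
for current_block, next_block in zip(training_sequence, training_sequence[1:]):
    transitions[current_block][next_block] += 1

def predict_next(block: str) -> str:
    """Sample the next block in proportion to how often it followed `block`."""
    counts = transitions[block]
    blocks, weights = zip(*counts.items())
    return random.choices(blocks, weights=weights)[0]

# Autoregressively extend a prompt of blocks, one token at a time.
generated = ["grass", "road"]
for _ in range(8):
    generated.append(predict_next(generated[-1]))

print(generated)
```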

Anupam Singh, vice president of AI and growth engineering at Roblox, told MIT Tech Review about the challenges in developing the technology. “Finding high-quality 3D information is difficult,” Singh said. “Even if you get all the data sets that you would think of, being able to predict the next cube requires it to have literally three dimensions, X, Y, and Z.”

According to Singh, lack of 3D training data can create glitches in the results, like a dog with too many legs. To get around this, Roblox is using a second AI model as a kind of visual moderator to catch the mistakes and reject them until the proper 3D element appears. Through iteration and trial and error, the first AI model can create the proper 3D structure.
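That generate-and-check pattern is easy to sketch on its own: one model proposes, a second model (or any validator) accepts or rejects, and rejected proposals are retried. In the toy version below, the “validator” is a trivial rule standing in for the visual-moderation model Roblox describes.

```python
# Minimal generate-and-validate loop: keep sampling until a checker accepts
# the output. The "validator" here is a toy rule standing in for the second
# AI model Roblox describes.
import random

def generate_dog() -> dict:
    # Toy generator that sometimes produces structurally wrong output.
    return {"kind": "dog", "legs": random.choice([3, 4, 4, 4, 5, 6])}

def validate(candidate: dict) -> bool:
    # Stand-in for a visual-moderation model: reject anatomically wrong dogs.
    return candidate["legs"] == 4

candidate = generate_dog()
attempts = 1
while not validate(candidate) and attempts < 20:  # cap retries
    candidate = generate_dog()
    attempts += 1

print(f"Accepted after {attempts} attempt(s): {candidate}")
```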

Notably, Roblox plans to open-source its 3D foundation model, allowing developers and even competitors to use and modify it. But it’s not just about giving back—open source can be a two-way street. Choosing an open source approach could also allow the company to utilize knowledge from AI developers if they contribute to the project and improve it over time.

The ongoing quest to capture gaming revenue

News of the new 3D foundational model arrived at the 10th annual Roblox Developers Conference in San Jose, California, where the company also announced an ambitious goal to capture 10 percent of global gaming content revenue through the Roblox ecosystem, and the introduction of “Party,” a new feature designed to facilitate easier group play among friends.

In March 2023, we detailed Roblox’s early foray into AI-powered game development tools, as revealed at the Game Developers Conference. The tools included a Code Assist beta for generating simple Lua functions from text descriptions, and a Material Generator for creating 2D surfaces with associated texture maps.

At the time, Roblox Studio head Stef Corazza described these as initial steps toward “democratizing” game creation, with plans for AI systems that are now coming to fruition. The 2023 tools focused on discrete tasks like code snippets and 2D textures, laying the groundwork for the more comprehensive 3D foundational model announced at this year’s Roblox Developers Conference.

The upcoming AI tool could potentially streamline content creation on the platform, possibly accelerating Roblox’s path toward its revenue goal. “We see a powerful future where Roblox experiences will have extensive generative AI capabilities to power real-time creation integrated with gameplay,” Roblox said  in a statement. “We’ll provide these capabilities in a resource-efficient way, so we can make them available to everyone on the platform.”



Nvidia’s AI chips are cheaper to rent in China than US

secondhand channels —

Supply of processors helps Chinese startups advance AI technology despite US restrictions.

The cost of renting cloud services using Nvidia’s leading artificial intelligence chips is lower in China than in the US, a sign that the advanced processors are easily reaching the Chinese market despite Washington’s export restrictions.

Four small-scale Chinese cloud providers charge local tech groups roughly $6 an hour to use a server with eight Nvidia A100 processors in a base configuration, companies and customers told the Financial Times. Small cloud vendors in the US charge about $10 an hour for the same setup.

The low prices, according to people in the AI and cloud industry, are an indication of plentiful supply of Nvidia chips in China and the circumvention of US measures designed to prevent access to cutting-edge technologies.

The A100 and H100, which is also readily available, are among Nvidia’s most powerful AI accelerators and are used to train the large language models that power AI applications. The Silicon Valley company has been banned from shipping the A100 to China since autumn 2022 and has never been allowed to sell the H100 in the country.

Chip resellers and tech startups said the products were relatively easy to procure. Inventories of the A100 and H100 are openly advertised for sale on Chinese social media and ecommerce sites such as Xiaohongshu and Alibaba’s Taobao, as well as in electronics markets, at slight markups to pricing abroad.

China’s larger cloud operators such as Alibaba and ByteDance, known for their reliability and security, charge double to quadruple the price of smaller local vendors for similar Nvidia A100 servers, according to pricing from the two operators and customers.

After discounts, both Chinese tech giants offer packages for prices comparable to Amazon Web Services, which charges $15 to $32 an hour. Alibaba and ByteDance did not respond to requests for comment.

“The big players have to think about compliance, so they are at a disadvantage. They don’t want to use smuggled chips,” said a Chinese startup founder. “Smaller vendors are less concerned.”

He estimated there were more than 100,000 Nvidia H100 processors in the country based on their widespread availability in the market. The Nvidia chips are each roughly the size of a book, making them relatively easy for smugglers to ferry across borders, undermining Washington’s efforts to limit China’s AI progress.

“We bought our H100s from a company that smuggled them in from Japan,” said a startup founder in the automation field who paid about 500,000 yuan ($70,000) for two cards this year. “They etched off the serial numbers.”

Nvidia said it sold its processors “primarily to well-known partners … who work with us to ensure that all sales comply with US export control rules”.

“Our pre-owned products are available through many second-hand channels,” the company added. “Although we cannot track products after they are sold, if we determine that any customer is violating US export controls, we will take appropriate action.”

The head of a small Chinese cloud vendor said low domestic costs helped offset the higher prices that providers paid for smuggled Nvidia processors. “Engineers are cheap, power is cheap, and competition is fierce,” he said.

In Shenzhen’s Huaqiangbei electronics market, salespeople speaking to the FT quoted the equivalent of $23,000–$30,000 for Nvidia’s H100 plug-in cards. Online sellers quote the equivalent of $31,000–$33,000.

Nvidia charges customers $20,000–$23,000 for H100 chips after recently cutting prices, according to Dylan Patel of SemiAnalysis.

One data center vendor in China said servers made by Silicon Valley’s Supermicro and fitted with eight H100 chips hit a peak selling price of 3.2 million yuan after the Biden administration tightened export restrictions in October. He said prices had since fallen to 2.5 million yuan as supply constraints eased.

Several people involved in the trade said merchants in Malaysia, Japan, and Indonesia often shipped Supermicro servers or Nvidia processors to Hong Kong before bringing them across the border to Shenzhen.

The black market trade depends on difficult-to-counter workarounds to Washington’s export regulations, experts said.

For example, while subsidiaries of Chinese companies are banned from buying advanced AI chips outside the country, their executives could establish new companies in countries such as Japan or Malaysia to make the purchases.

“It’s hard to completely enforce export controls beyond the US border,” said an American sanctions expert. “That’s why the regulations create obligations for the shipper to look into end users and [the] commerce [department] adds companies believed to be flouting the rules to the [banned] entity list.”

Additional reporting by Michael Acton in San Francisco.

© 2024 The Financial Times Ltd. All rights reserved. Please do not copy and paste FT articles and redistribute by email or post to the web.
