AI


Terminator’s Cameron joins AI company behind controversial image generator

a net in the sky —

Famed sci-fi director joins board of embattled Stability AI, creator of Stable Diffusion.

Filmmaker James Cameron.

On Tuesday, Stability AI announced that renowned filmmaker James Cameron—of Terminator and Skynet fame—has joined its board of directors. Stability is best known for its pioneering but highly controversial Stable Diffusion series of AI image-synthesis models, first launched in 2022, which can generate images based on text descriptions.
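For readers who haven't tried it, here is a minimal sketch of what "generate images based on text descriptions" looks like in practice, using an open-weights Stable Diffusion checkpoint through Hugging Face's diffusers library. The specific model ID, prompt, and settings are illustrative assumptions, not anything Stability recommends.

```python
# A minimal text-to-image sketch with an open-weights Stable Diffusion
# checkpoint via the diffusers library. Model ID and prompt are illustrative.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",   # any compatible checkpoint works
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # requires a CUDA GPU; drop this line (and float16) for CPU

image = pipe(
    "a film director on a futuristic movie set, dramatic lighting",
    num_inference_steps=30,   # more steps is slower but usually cleaner
    guidance_scale=7.5,       # how strongly the image should follow the prompt
).images[0]
image.save("generated.png")
```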

“I’ve spent my career seeking out emerging technologies that push the very boundaries of what’s possible, all in the service of telling incredible stories,” said Cameron in a statement. “I was at the forefront of CGI over three decades ago, and I’ve stayed on the cutting edge since. Now, the intersection of generative AI and CGI image creation is the next wave.”

Cameron is perhaps best known as the director behind blockbusters like Avatar, Titanic, and Aliens, but in AI circles, he may be most relevant for co-creating the character Skynet, a fictional AI system that triggers nuclear Armageddon and dominates humanity in the Terminator media franchise. Similar fears of AI taking over the world have since jumped from fiction into real-world policy debates, recently sparking attempts to regulate existential risk from AI systems through measures like SB-1047 in California.

In a 2023 interview with CTV News, Cameron referenced The Terminator’s release year when asked about AI’s dangers: “I warned you guys in 1984, and you didn’t listen,” he said. “I think the weaponization of AI is the biggest danger. I think that we will get into the equivalent of a nuclear arms race with AI, and if we don’t build it, the other guys are for sure going to build it, and so then it’ll escalate.”

Hollywood goes AI

Of course, Stability AI isn’t building weapons controlled by AI. Instead, Cameron’s interest in cutting-edge filmmaking techniques apparently drew him to the company.

“James Cameron lives in the future and waits for the rest of us to catch up,” said Stability CEO Prem Akkaraju. “Stability AI’s mission is to transform visual media for the next century by giving creators a full stack AI pipeline to bring their ideas to life. We have an unmatched advantage to achieve this goal with a technological and creative visionary like James at the highest levels of our company. This is not only a monumental statement for Stability AI, but the AI industry overall.”

Cameron joins other recent additions to Stability AI’s board, including Sean Parker, former president of Facebook, who serves as executive chairman. Parker called Cameron’s appointment “the start of a new chapter” for the company.

Despite significant protest from actors’ unions last year, elements of Hollywood are seemingly beginning to embrace generative AI over time. Last Wednesday, we covered a deal between Lionsgate and AI video-generation company Runway that will see the creation of a custom AI model for film production use. In March, the Financial Times reported that OpenAI was actively showing off its Sora video synthesis model to studio executives.

Unstable times for Stability AI

Cameron’s appointment to the Stability AI board comes during a tumultuous period for the company. Stability AI has faced a series of challenges this past year, including an ongoing class-action copyright lawsuit, a troubled Stable Diffusion 3 model launch, significant leadership and staff changes, and ongoing financial concerns.

In March, founder and CEO Emad Mostaque resigned, followed by a round of layoffs. This came on the heels of the departure of three key engineers—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—who have since founded Black Forest Labs and released a new open-weights image-synthesis model called Flux, which has begun to take over the r/StableDiffusion community on Reddit.

Despite the issues, Stability AI claims its models are widely used, with Stable Diffusion reportedly surpassing 150 million downloads. The company states that thousands of businesses use its models in their creative workflows.

While Stable Diffusion has indeed spawned a large community of open-weights-AI image enthusiasts online, it has also been a lightning rod for controversy among some artists because Stability originally trained its models on hundreds of millions of images scraped from the Internet without seeking licenses or permission to use them.

Apparently that association is not a concern for Cameron, according to his statement: “The convergence of these two totally different engines of creation [CGI and generative AI] will unlock new ways for artists to tell stories in ways we could have never imagined. Stability AI is poised to lead this transformation.”



When you call a restaurant, you might be chatting with an AI host

digital hosting —

Voice chatbots are increasingly picking up the phone for restaurants.

Drawing of a robot holding a telephone. Credit: Getty Images | Juj Winn

A pleasant female voice greets me over the phone. “Hi, I’m an assistant named Jasmine for Bodega,” the voice says. “How can I help?”

“Do you have patio seating?” I ask. Jasmine sounds a little sad as she tells me that unfortunately, the San Francisco–based Vietnamese restaurant doesn’t have outdoor seating. But her sadness isn’t the result of her having a bad day. Rather, her tone is a feature, a setting.

Jasmine is a member of a new, growing clan: the AI voice restaurant host. If you recently called up a restaurant in New York City, Miami, Atlanta, or San Francisco, chances are you have spoken to one of Jasmine’s polite, calculated competitors.  

In the sea of AI voice assistants, hospitality phone agents haven’t been getting as much attention as consumer-based generative AI tools like Gemini Live and ChatGPT-4o. And yet, the niche is heating up, with multiple emerging startups vying for restaurant accounts across the US. Last May, voice-ordering AI garnered much attention at the National Restaurant Association’s annual food show. Bodega, the high-end Vietnamese restaurant I called, used Maitre-D AI, which launched primarily in the Bay Area in 2024. Newo, another new startup, is currently rolling its software out at numerous Silicon Valley restaurants. One-year-old RestoHost is now answering calls at 150 restaurants in the Atlanta metro area, and Slang, a voice AI company that started focusing on restaurants exclusively during the COVID-19 pandemic and announced a $20 million funding round in 2023, is gaining ground in the New York and Las Vegas markets.

All of them offer a similar service: an around-the-clock AI phone host that can answer generic questions about the restaurant’s dress code, cuisine, seating arrangements, and food allergy policies. They can also assist with making, altering, or canceling a reservation. In some cases, the agent can direct the caller to an actual human, but according to RestoHost co-founder Tomas Lopez-Saavedra, only 10 percent of the calls result in that. Each platform offers the restaurant subscription tiers that unlock additional features, and some of the systems can speak multiple languages.
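To make that division of labor concrete, here is a hypothetical sketch of the kind of routing logic such a phone host might apply to a transcribed caller utterance: answer common questions directly, enter a reservation flow, or hand off to a human. The keywords, canned answers, and function names are invented for illustration and don't reflect any specific vendor's system.

```python
# Hypothetical sketch of an AI phone host's routing logic for a transcribed
# caller utterance: answer FAQs, handle reservations, or hand off to a human.
# None of this reflects any particular vendor's implementation.
FAQ_ANSWERS = {
    "patio": "Unfortunately, we don't have outdoor seating.",
    "dress code": "Dress is casual.",
    "allergy": "Please tell your server about any allergies; the kitchen can adapt most dishes.",
}

def route_call(transcript: str) -> str:
    text = transcript.lower()
    if any(word in text for word in ("reservation", "book", "table for")):
        return "RESERVATION_FLOW"        # collect party size, date, and time next
    for keyword, answer in FAQ_ANSWERS.items():
        if keyword in text:
            return answer                # answer directly, no human needed
    return "TRANSFER_TO_HUMAN"           # the minority of calls that need a person

print(route_call("Hi, do you have patio seating?"))
```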

But who even calls a restaurant in the era of Google and Resy? According to some of the founders of AI voice host startups, many customers do, and for various reasons. “Restaurants get a high volume of phone calls compared to other businesses, especially if they’re popular and take reservations,” says Alex Sambvani, CEO and co-founder of Slang, which currently works with everyone from the Wolfgang Puck restaurant group to Chick-fil-A to the fast-casual chain Slutty Vegan. Sambvani estimates that in-demand establishments receive between 800 and 1,000 calls per month. Typical callers tend to be last-minute bookers, tourists and visitors, older people, and those who do their errands while driving.

Matt Ho, the owner of Bodega SF, confirms this scenario. “The phones would ring constantly throughout service,” he says. “We would receive calls for basic questions that can be found on our website.” To solve this issue, after shopping around, Ho found that Maitre-D was the best fit. Bodega SF became one of the startup’s earliest clients in May, and Ho even helped the founders with trial and error testing prior to launch. “This platform makes the job easier for the host and does not disturb guests while they’re enjoying their meal,” he says.



Secret calculator hack brings ChatGPT to the TI-84, enabling easy cheating

Breaking free of “test mode” —

Tiny device installed inside TI-84 enables Wi-Fi Internet, access to AI chatbot.

An OpenAI logo on a TI-84 calculator screen.

On Saturday, a YouTube creator called “ChromaLock” published a video detailing how he modified a Texas Instruments TI-84 graphing calculator to connect to the Internet and access OpenAI’s ChatGPT, potentially enabling students to cheat on tests. The video, titled “I Made The Ultimate Cheating Device,” demonstrates a custom hardware modification that allows users of the graphing calculator to type in problems sent to ChatGPT using the keypad and receive live responses on the screen.

ChromaLock began by exploring the calculator’s link port, typically used for transferring educational programs between devices. He then designed a custom circuit board he calls “TI-32” that incorporates a tiny Wi-Fi-enabled microcontroller, the Seeed Studio ESP32-C3 (which costs about $5), along with other components to interface with the calculator’s systems.

It’s worth noting that the TI-32 hack isn’t a commercial project. Replicating ChromaLock’s work would involve purchasing a TI-84 calculator, a Seeed Studio ESP32-C3 microcontroller, and various electronic components, and fabricating a custom PCB based on ChromaLock’s design, which is available online.

The creator says he encountered several engineering challenges during development, including voltage incompatibilities and signal integrity issues. After developing multiple versions, ChromaLock successfully installed the custom board into the calculator’s housing without any visible signs of modifications from the outside.

“I Made The Ultimate Cheating Device” YouTube Video.

To accompany the hardware, ChromaLock developed custom software for the microcontroller and the calculator, which is available open source on GitHub. The system simulates another TI-84, allowing people to use the calculator’s built-in “send” and “get” commands to transfer files. This allows a user to easily download a launcher program that provides access to various “applets” designed for cheating.
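ChromaLock's actual firmware runs on the ESP32 and is linked from the video, but the relay idea is easy to sketch in ordinary Python: take a question typed on the calculator, forward it to OpenAI's chat completions endpoint, and wrap the answer to the TI-84's 16-character-wide screen. The model name below is an illustrative choice, not necessarily what the project uses.

```python
# Conceptual sketch (not ChromaLock's firmware) of the relay role the TI-32
# board plays: forward a question typed on the calculator to OpenAI's chat
# completions API and return an answer wrapped to the TI-84's 16-char screen.
import os
import textwrap
import requests

API_URL = "https://api.openai.com/v1/chat/completions"

def ask_chatgpt(question: str) -> list[str]:
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-4o-mini",  # illustrative; any chat-capable model works
            "messages": [{"role": "user", "content": question}],
        },
        timeout=30,
    )
    resp.raise_for_status()
    answer = resp.json()["choices"][0]["message"]["content"]
    return textwrap.wrap(answer, width=16)  # TI-84 home screen is 16 chars wide

for line in ask_chatgpt("Solve x^2 - 5x + 6 = 0"):
    print(line)
```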

One of the applets is a ChatGPT interface that might be most useful for answering short questions, but it has a drawback: typing long alphanumeric questions on the limited keypad is slow and cumbersome.

Beyond the ChatGPT interface, the device offers several other cheating tools. An image browser allows users to access pre-prepared visual aids stored on the central server. The app browser feature enables students to download not only games for post-exam entertainment but also text-based cheat sheets disguised as program source code. ChromaLock even hinted at a future video discussing a camera feature, though details were sparse in the current demo.

ChromaLock claims his new device can bypass common anti-cheating measures. The launcher program can be downloaded on-demand, avoiding detection if a teacher inspects or clears the calculator’s memory before a test. The modification can also supposedly break calculators out of “Test Mode,” a locked-down state used to prevent cheating.

While the video presents the project as a technical achievement, consulting ChatGPT during a test on your calculator almost certainly represents an ethical breach and/or a form of academic dishonesty that could get you in serious trouble at most schools. So tread carefully, study hard, and remember to eat your Wheaties.



Due to AI fakes, the “deep doubt” era is here

A person writing. Credit: Memento | Aurich Lawson

Given the flood of photorealistic AI-generated images washing over social media networks like X and Facebook these days, we’re seemingly entering a new age of media skepticism: the era of what I’m calling “deep doubt.” While questioning the authenticity of digital content stretches back decades—and analog media long before that—easy access to tools that generate convincing fake content has led to a new wave of liars using AI-generated scenes to deny real documentary evidence. Along the way, people’s existing skepticism toward online content from strangers may be reaching new heights.

Deep doubt is skepticism of real media that stems from the existence of generative AI. This manifests as broad public skepticism toward the veracity of media artifacts, which in turn leads to a notable consequence: People can now more credibly claim that real events did not happen and suggest that documentary evidence was fabricated using AI tools.

The concept behind “deep doubt” isn’t new, but its real-world impact is becoming increasingly apparent. Since the term “deepfake” first surfaced in 2017, we’ve seen a rapid evolution in AI-generated media capabilities. This has led to recent examples of deep doubt in action, such as conspiracy theorists claiming that President Joe Biden has been replaced by an AI-powered hologram and former President Donald Trump’s baseless accusation in August that Vice President Kamala Harris used AI to fake crowd sizes at her rallies. And on Friday, Trump cried “AI” again over a photo of him with E. Jean Carroll, the writer who successfully sued him for sexual assault, a photo that contradicts his claim of never having met her.

Legal scholars Danielle K. Citron and Robert Chesney foresaw this trend years ago, coining the term “liar’s dividend” in 2019 to describe the consequence of deep doubt: deepfakes being weaponized by liars to discredit authentic evidence. But whereas deep doubt was once a hypothetical academic concept, it is now our reality.

The rise of deepfakes, the persistence of doubt

Doubt has been a political weapon since ancient times. This modern AI-fueled manifestation is just the latest evolution of a tactic where the seeds of uncertainty are sown to manipulate public opinion, undermine opponents, and hide the truth. AI is the newest refuge of liars.

Over the past decade, the rise of deep-learning technology has made it increasingly easy for people to craft false or modified pictures, audio, text, or video that appear to be non-synthesized organic media. Deepfakes were named after a Reddit user going by the name “deepfakes,” who shared AI-faked pornography on the service, swapping out the face of a performer with the face of someone else who wasn’t part of the original recording.

In the 20th century, one could argue that a certain part of our trust in media produced by others was a result of how expensive and time-consuming it was, and the skill it required, to produce documentary images and films. Even texts required a great deal of time and skill. As the deep doubt phenomenon grows, it will erode this 20th-century media sensibility. But it will also affect our political discourse, legal systems, and even our shared understanding of historical events that rely on that media to function—we rely on others to get information about the world. From photorealistic images to pitch-perfect voice clones, our perception of what we consider “truth” in media will need recalibration.

In April, a panel of federal judges highlighted the potential for AI-generated deepfakes to not only introduce fake evidence but also cast doubt on genuine evidence in court trials. The concern emerged during a meeting of the US Judicial Conference’s Advisory Committee on Evidence Rules, where the judges discussed the challenges of authenticating digital evidence in an era of increasingly sophisticated AI technology. Ultimately, the judges decided to postpone making any AI-related rule changes, but their meeting shows that the subject is already being considered by American judges.



Landmark AI deal sees Hollywood giant Lionsgate provide library for AI training

The silicon screen —

Runway deal will create a Lionsgate AI video generator, but not everyone is happy.

An illustration of a filmstrip with a robot, horse, rocket, and whale.

On Wednesday, AI video synthesis firm Runway and entertainment company Lionsgate announced a partnership to create a new AI model trained on Lionsgate’s vast film and TV library. The deal will feed Runway legally clear training data and will also reportedly provide Lionsgate with tools to enhance content creation while potentially reducing production costs.

Lionsgate, known for franchises like John Wick and The Hunger Games, sees AI as a way to boost efficiency in content production. Michael Burns, Lionsgate’s vice chair, stated in a press release that AI could help develop “cutting edge, capital efficient content creation opportunities.” He added that some filmmakers have shown enthusiasm about potential applications in pre- and post-production processes.

Runway plans to develop a custom AI model using Lionsgate’s proprietary content portfolio. The model will be exclusive to Lionsgate Studios, allowing filmmakers, directors, and creative staff to augment their work. While specifics remain unclear, the partnership marks the first major collaboration between Runway and a Hollywood studio.

“We’re committed to giving artists, creators and studios the best and most powerful tools to augment their workflows and enable new ways of bringing their stories to life,” said Runway co-founder and CEO Cristóbal Valenzuela in a press release. “The history of art is the history of technology and these new models are part of our continuous efforts to build transformative mediums for artistic and creative expression; the best stories are yet to be told.”

The quest for legal training data

Generative AI models are master imitators, and video synthesis models like Runway’s latest Gen-3 Alpha are no exception. The companies that create them must amass a great deal of existing video (and still image) samples to analyze, allowing the resulting AI models to re-synthesize that information into new video generations, guided by text descriptions called prompts. And wherever that training data is lacking, it can result in unusual generations, as we saw in our hands-on evaluation of Gen-3 Alpha in July.

However, in the past, AI companies have gotten into legal trouble for scraping vast quantities of media without permission. In fact, Runway is currently the defendant in a class-action lawsuit that alleges copyright infringement for using video data obtained without permission to train its video synthesis models. While companies like OpenAI have claimed this scraping process is “fair use,” US courts have not yet definitively ruled on the practice. With other potential legal challenges ahead, it makes sense from Runway’s perspective to reach out and sign deals for training data that is completely in the clear.

Even if the training data becomes fully legal and licensed, different elements of the entertainment industry view generative AI on a spectrum that seems to range between fascination and horror. The technology’s ability to rapidly create images and video based on prompts may attract studios looking to streamline production. However, it raises polarizing concerns among unions about job security, actors and musicians about likeness misuse and ethics, and studios about legal implications.

So far, news of the deal has not been received kindly among vocal AI critics found on social media. On X, filmmaker and AI critic Joe Russo wrote, “I don’t think I’ve ever seen a grosser string of words than: ‘to develop cutting-edge, capital-efficient content creation opportunities.'”

Film concept artist Reid Southen shared a similar negative take on X: “I wonder how the directors and actors of their films feel about having their work fed into the AI to make a proprietary model. As an artist on The Hunger Games? I’m pissed. This is the first step in trying to replace artists and filmmakers.”

It’s a fear that we will likely hear more about in the future as AI video synthesis technology grows more capable—and potentially becomes adopted as a standard filmmaking tool. As studios explore AI applications despite legal uncertainties and labor concerns, partnerships like the Lionsgate-Runway deal may shape the future of content creation in Hollywood.



macOS 15 Sequoia: The Ars Technica review


The macOS 15 Sequoia update will inevitably be known as “the AI one” in retrospect, introducing, as it does, the first wave of “Apple Intelligence” features.

That’s funny because none of that stuff is actually ready for the 15.0 release that’s coming out today. A lot of it is coming “later this fall” in the 15.1 update, which Apple has been testing entirely separately from the 15.0 betas for weeks now. Some of it won’t be ready until after that—rumors say image generation won’t be ready until the end of the year—but in any case, none of it is ready for public consumption yet.

But the AI-free 15.0 release does give us a chance to evaluate all of the non-AI additions to macOS this year. Apple Intelligence is sucking up a lot of the media oxygen, but in most other ways, this is a typical 2020s-era macOS release, with one or two headliners, several quality-of-life tweaks, and some sparsely documented under-the-hood stuff that will subtly change how you experience the operating system.

The AI-free version of the operating system is also the one that all users of the remaining Intel Macs will be using, since all of the Apple Intelligence features require Apple Silicon. Most of the Intel Macs that ran last year’s Sonoma release will run Sequoia this year—the first time this has happened since 2019—but the difference between the same macOS version running on different CPUs will be wider than it has been. It’s a clear indicator that the Intel Mac era is drawing to a close, even if support hasn’t totally ended just yet.



Google rolls out voice-powered AI chat to the Android masses

Chitchat Wars —

Gemini Live allows back-and-forth conversation, now free to all Android users.

The Google Gemini logo. Credit: Google

On Thursday, Google made Gemini Live, its voice-based AI chatbot feature, available for free to all Android users. The feature allows users to interact with Gemini through voice commands on their Android devices. That’s notable because competitor OpenAI’s Advanced Voice Mode feature of ChatGPT, which is similar to Gemini Live, has not yet fully shipped.

Google unveiled Gemini Live during its Pixel 9 launch event last month. Initially, the feature was exclusive to Gemini Advanced subscribers, but now it’s accessible to anyone using the Gemini app or its overlay on Android.

Gemini Live enables users to ask questions aloud and even interrupt the AI’s responses mid-sentence. Users can choose from several voice options for Gemini’s responses, adding a level of customization to the interaction.

Gemini suggests the following uses of the voice mode in its official help documents:

  • Talk back and forth: Talk to Gemini without typing, and Gemini will respond back verbally.
  • Brainstorm ideas out loud: Ask for a gift idea, to plan an event, or to make a business plan.
  • Explore: Uncover more details about topics that interest you.
  • Practice aloud: Rehearse for important moments in a more natural and conversational way.

Interestingly, while OpenAI originally demoed its Advanced Voice Mode in May with the launch of GPT-4o, it has only shipped the feature to a limited number of users starting in late July. Some AI experts speculate that a wider rollout has been hampered by a lack of available computing power, since the voice feature is presumably very compute-intensive.

To access Gemini Live, users can reportedly tap a new waveform icon in the bottom-right corner of the app or overlay. This action activates the microphone, allowing users to pose questions verbally. The interface includes options to “hold” Gemini’s answer or “end” the conversation, giving users control over the flow of the interaction.

Currently, Gemini Live supports only English, but Google has announced plans to expand language support in the future. The company also intends to bring the feature to iOS devices, though no specific timeline has been provided for this expansion.



OpenAI’s new “reasoning” AI models are here: o1-preview and o1-mini

fruit by the foot —

New o1 language model can solve complex tasks iteratively, count R’s in “strawberry.”

An illustration of a strawberry made out of pixel-like blocks.

OpenAI finally unveiled its rumored “Strawberry” AI language model on Thursday, claiming significant improvements in what it calls “reasoning” and problem-solving capabilities over previous large language models (LLMs). Formally named “OpenAI o1,” the model family will initially launch in two forms, o1-preview and o1-mini, available today for ChatGPT Plus and certain API users.
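For API users with access, querying the new model looks much like any other chat completion call through OpenAI's Python SDK; a minimal sketch follows. The model name comes from the launch announcement, and the note about restricted parameters reflects early reports rather than anything verified here.

```python
# Minimal sketch of calling o1-preview through OpenAI's Python SDK, assuming
# your API account has access. Early reports said the o1 models rejected some
# familiar settings (system prompts, temperature), so this sticks to a single
# user message.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        "role": "user",
        "content": "A bat and a ball cost $1.10 in total. The bat costs "
                   "$1.00 more than the ball. How much does the ball cost?",
    }],
)
print(response.choices[0].message.content)
```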

OpenAI claims that o1-preview outperforms its predecessor, GPT-4o, on multiple benchmarks, including competitive programming, mathematics, and “scientific reasoning.” However, people who have used the model say it does not yet outclass GPT-4o in every metric. Other users have criticized the delay in receiving a response from the model, owing to the multi-step processing occurring behind the scenes before answering a query.

In a rare display of public hype-busting, OpenAI product manager Joanne Jang tweeted, “There’s a lot of o1 hype on my feed, so I’m worried that it might be setting the wrong expectations. what o1 is: the first reasoning model that shines in really hard tasks, and it’ll only get better. (I’m personally psyched about the model’s potential & trajectory!) what o1 isn’t (yet!): a miracle model that does everything better than previous models. you might be disappointed if this is your expectation for today’s launch—but we’re working to get there!”

OpenAI reports that o1-preview ranked in the 89th percentile on competitive programming questions from Codeforces. In mathematics, it scored 83 percent on a qualifying exam for the International Mathematics Olympiad, compared to GPT-4o’s 13 percent. OpenAI also states, in a claim that may later be challenged as people scrutinize the benchmarks and run their own evaluations over time, that o1 performs comparably to PhD students on specific tasks in physics, chemistry, and biology. The smaller o1-mini model is designed specifically for coding tasks and is priced at 80 percent less than o1-preview.

A benchmark chart provided by OpenAI. They write, “o1 improves over GPT-4o on a wide range of benchmarks, including 54/57 MMLU subcategories. Seven are shown for illustration.”

OpenAI attributes o1’s advancements to a new reinforcement learning (RL) training approach that teaches the model to spend more time “thinking through” problems before responding, similar to how “let’s think step-by-step” chain-of-thought prompting can improve outputs in other LLMs. The new process allows o1 to try different strategies and “recognize” its own mistakes.
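The comparison to "let's think step-by-step" prompting is easy to demonstrate on any ordinary chat model; the sketch below contrasts a direct question with a chain-of-thought version of the same prompt. This mimics the idea at the prompt level only; it is not OpenAI's RL training procedure, and the model name is just an example.

```python
# Chain-of-thought prompting at the prompt level, applied to an ordinary chat
# model. This illustrates the technique the article references; it is not how
# o1 was trained.
from openai import OpenAI

client = OpenAI()
question = "If a train leaves at 3:40 pm and the trip takes 95 minutes, when does it arrive?"

direct = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": question}],
)

step_by_step = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": question + " Let's think step by step, "
                                      "then give the final answer on its own line."}],
)

print("Direct answer:\n", direct.choices[0].message.content)
print("\nChain-of-thought answer:\n", step_by_step.choices[0].message.content)
```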

AI benchmarks are notoriously unreliable and easy to game; however, independent verification and experimentation from users will show the full extent of o1’s advancements over time. It’s worth noting that MIT Research showed earlier this year that some of the benchmark claims OpenAI touted with GPT-4 last year were erroneous or exaggerated.

A mixed bag of capabilities

OpenAI demos “o1” correctly counting the number of Rs in the word “strawberry.”

Amid many demo videos of o1 completing programming tasks and solving logic puzzles that OpenAI shared on its website and social media, one demo stood out as perhaps the least consequential and least impressive, but it may become the most talked about due to a recurring meme where people ask LLMs to count the number of R’s in the word “strawberry.”

Due to tokenization, where the LLM processes words in data chunks called tokens, most LLMs are typically blind to character-by-character differences in words. Apparently, o1 has the self-reflective capabilities to figure out how to count the letters and provide an accurate answer without user assistance.
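The tokenization point is easy to see with the open source tiktoken library: the model receives integer token IDs for sub-word chunks, not individual letters. The encoding name below is an assumption for illustration (the one used by GPT-4-class models); other encodings split the word differently.

```python
# Why letter counting trips up LLMs: the model sees token IDs for sub-word
# chunks, not characters. Encoding name is illustrative (GPT-4-era BPE).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
print(tokens)                             # a short list of integer IDs
print([enc.decode([t]) for t in tokens])  # the sub-word chunks the model sees

# The letters only become countable after decoding back to plain text:
print(enc.decode(tokens).count("r"))      # 3
```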

Beyond OpenAI’s demos, we’ve seen optimistic but cautious hands-on reports about o1-preview online. Wharton Professor Ethan Mollick wrote on X, “Been using GPT-4o1 for the last month. It is fascinating—it doesn’t do everything better but it solves some very hard problems for LLMs. It also points to a lot of future gains.”

Mollick shared a hands-on post in his “One Useful Thing” blog that details his experiments with the new model. “To be clear, o1-preview doesn’t do everything better. It is not a better writer than GPT-4o, for example. But for tasks that require planning, the changes are quite large.”

Mollick gives the example of asking o1-preview to build a teaching simulator “using multiple agents and generative AI, inspired by the paper below and considering the views of teachers and students,” then asking it to build the full code, and it produced a result that Mollick found impressive.

Mollick also gave o1-preview eight crossword puzzle clues, translated into text, and the model took 108 seconds to solve them over many steps, getting all of the answers correct but confabulating a particular clue Mollick did not give it. We recommend reading Mollick’s entire post for a good early hands-on impression. Given his experience with the new model, it appears that o1 works very similarly to GPT-4o but iteratively in a loop, which is something that the so-called “agentic” AutoGPT and BabyAGI projects experimented with in early 2023.

Is this what could “threaten humanity?”

Speaking of agentic models that run in loops, Strawberry has been subject to hype since last November, when it was initially known as Q* (Q-star). At the time, The Information and Reuters claimed that, just before Sam Altman’s brief ouster as CEO, OpenAI employees had internally warned OpenAI’s board of directors about a new OpenAI model called Q* that could “threaten humanity.”

In August, the hype continued when The Information reported that OpenAI showed Strawberry to US national security officials.

We’ve been skeptical about the hype around Q* and Strawberry since the rumors first emerged, as this author noted last November, and Timothy B. Lee covered thoroughly in an excellent post about Q* from last December.

So even though o1 is out, AI industry watchers should note how this model’s impending launch was played up in the press as a dangerous advancement, hype that OpenAI never publicly downplayed. For an AI model that takes 108 seconds to solve eight clues in a crossword puzzle and hallucinates one answer, we can say that its potential danger was likely hype (for now).

Controversy over “reasoning” terminology

It’s no secret that some people in tech have issues with anthropomorphizing AI models and using terms like “thinking” or “reasoning” to describe the synthesizing and processing operations that these neural network systems perform.

Just after the OpenAI o1 announcement, Hugging Face CEO Clement Delangue wrote, “Once again, an AI system is not ‘thinking,’ it’s ‘processing,’ ‘running predictions,’… just like Google or computers do. Giving the false impression that technology systems are human is just cheap snake oil and marketing to fool you into thinking it’s more clever than it is.”

“Reasoning” is also a somewhat nebulous term since, even in humans, it’s difficult to define exactly what the term means. A few hours before the announcement, independent AI researcher Simon Willison tweeted in response to a Bloomberg story about Strawberry, “I still have trouble defining ‘reasoning’ in terms of LLM capabilities. I’d be interested in finding a prompt which fails on current models but succeeds on strawberry that helps demonstrate the meaning of that term.”

Reasoning or not, o1-preview currently lacks some features present in earlier models, such as web browsing, image generation, and file uploading. OpenAI plans to add these capabilities in future updates, along with continued development of both the o1 and GPT model series.

While OpenAI says the o1-preview and o1-mini models are rolling out today, neither model is available in our ChatGPT Plus interface yet, so we have not been able to evaluate them. We’ll report our impressions on how this model differs from other LLMs we have previously covered.



AI chatbots might be better at swaying conspiracy theorists than humans

Out of the rabbit hole —

Co-author Gordon Pennycook: “The work overturns a lot of how we thought about conspiracies.”

A woman wearing a sweatshirt for the QAnon conspiracy theory on October 11, 2020, in Ronkonkoma, New York. Credit: Stephanie Keith | Getty Images

Belief in conspiracy theories is rampant, particularly in the US, where some estimates suggest as much as 50 percent of the population believes in at least one outlandish claim. And those beliefs are notoriously difficult to debunk. Challenge a committed conspiracy theorist with facts and evidence, and they’ll usually just double down—a phenomenon psychologists usually attribute to motivated reasoning, i.e., a biased way of processing information.

A new paper published in the journal Science is challenging that conventional wisdom, however. Experiments in which an AI chatbot engaged in conversations with people who believed at least one conspiracy theory showed that the interaction significantly reduced the strength of those beliefs, even two months later. The secret to its success: the chatbot, with its access to vast amounts of information across an enormous range of topics, could precisely tailor its counterarguments to each individual.

“These are some of the most fascinating results I’ve ever seen,” co-author Gordon Pennycook, a psychologist at Cornell University, said during a media briefing. “The work overturns a lot of how we thought about conspiracies, that they’re the result of various psychological motives and needs. [Participants] were remarkably responsive to evidence. There’s been a lot of ink spilled about being in a post-truth world. It’s really validating to know that evidence does matter. We can act in a more adaptive way using this new technology to get good evidence in front of people that is specifically relevant to what they think, so it’s a much more powerful approach.”

When confronted with facts that challenge a deeply entrenched belief, people will often seek to preserve it rather than update their priors (in Bayesian-speak) in light of the new evidence. So there has been a good deal of pessimism lately about ever reaching those who have plunged deep down the rabbit hole of conspiracy theories, which are notoriously persistent and “pose a serious threat to democratic societies,” per the authors. Pennycook and his fellow co-authors devised an alternative explanation for that stubborn persistence of belief.

Bespoke counter-arguments

The issue is that “conspiracy theories just vary a lot from person to person,” said co-author Thomas Costello, a psychologist at American University who is also affiliated with MIT. “They’re quite heterogeneous. People believe a wide range of them and the specific evidence that people use to support even a single conspiracy may differ from one person to another. So debunking attempts where you try to argue broadly against a conspiracy theory are not going to be effective because people have different versions of that conspiracy in their heads.”

By contrast, an AI chatbot would be able to tailor debunking efforts to those different versions of a conspiracy. So in theory a chatbot might prove more effective in swaying someone from their pet conspiracy theory.

To test their hypothesis, the team conducted a series of experiments with 2,190 participants who believed in one or more conspiracy theories. The participants engaged in several personal “conversations” with a large language model (GPT-4 Turbo) in which they shared their pet conspiracy theory and the evidence they felt supported that belief. The LLM would respond by offering factual and evidence-based counter-arguments tailored to the individual participant. GPT-4 Turbo’s responses were professionally fact-checked, which showed that 99.2 percent of the claims it made were true, with just 0.8 percent being labeled misleading, and zero as false. (You can try your hand at interacting with the debunking chatbot here.)
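The setup is straightforward to approximate with any modern LLM API. The sketch below is not the researchers' actual code or prompt wording, just an illustration of the core loop: feed the model a participant's stated belief and supporting evidence, and instruct it to respond with factual, tailored counterarguments.

```python
# Rough approximation (not the researchers' actual code) of the study setup:
# the model gets a participant's stated conspiracy belief and supporting
# evidence and is instructed to respond with factual, tailored counterarguments.
from openai import OpenAI

client = OpenAI()

def debunk(belief_summary: str, participant_evidence: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # the paper used GPT-4 Turbo
        messages=[
            {"role": "system",
             "content": "You are a persuasive but strictly factual interlocutor. "
                        "Address the participant's specific evidence point by point, "
                        "citing well-established facts. Be respectful, not dismissive."},
            {"role": "user",
             "content": f"My belief: {belief_summary}\n"
                        f"The evidence that convinces me: {participant_evidence}"},
        ],
    )
    return response.choices[0].message.content

print(debunk(
    "9/11 was an inside job",
    "Jet fuel doesn't burn hot enough to melt steel beams.",
))
```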

Screenshot of the chatbot opening page asking questions to prepare for a conversation. Credit: Thomas H. Costello

Participants first answered a series of open-ended questions about the conspiracy theories they strongly believed and the evidence they relied upon to support those beliefs. The AI then produced a single-sentence summary of each belief, for example, “9/11 was an inside job because X, Y, and Z.” Participants rated the accuracy of that statement in terms of their own beliefs and then filled out a questionnaire about other conspiracies, their attitude toward trusted experts, AI, other people in society, and so forth.

Then it was time for the one-on-one dialogues with the chatbot, which the team programmed to be as persuasive as possible. The chatbot had also been fed the open-ended responses of the participants, which made it better able to tailor its counter-arguments to each individual. For example, if someone thought 9/11 was an inside job and cited as evidence the fact that jet fuel doesn’t burn hot enough to melt steel, the chatbot might counter with, say, the NIST report showing that steel loses its strength at much lower temperatures, sufficient to weaken the towers’ structures so that they collapsed. Someone who thought 9/11 was an inside job and cited demolitions as evidence would get a different response tailored to that.

Participants then answered the same set of questions after their dialogues with the chatbot, which lasted about eight minutes on average. Costello et al. found that these targeted dialogues resulted in a 20 percent decrease in the participants’ misinformed beliefs—a reduction that persisted even two months later when participants were evaluated again.

As Bence Bago (Tilburg University) and Jean-Francois Bonnefon (CNRS, Toulouse, France) noted in an accompanying perspective, this is a substantial effect compared to the 1 to 6 percent drop in beliefs achieved by other interventions. They also deemed the persistence of the effect noteworthy, while cautioning that two months is “insufficient to completely eliminate misinformed conspiracy beliefs.”



Taylor Swift cites AI deepfakes in endorsement for Kamala Harris

it’s raining creepy men —

Taylor Swift on AI: “The simplest way to combat misinformation is with the truth.”

A screenshot of Taylor Swift’s Kamala Harris Instagram post, captured on September 11, 2024.

On Tuesday night, Taylor Swift endorsed Vice President Kamala Harris for US President on Instagram, citing concerns over AI-generated deepfakes as a key motivator. The artist’s warning aligns with current trends in technology, especially in an era where AI synthesis models can easily create convincing fake images and videos.

“Recently I was made aware that AI of ‘me’ falsely endorsing Donald Trump’s presidential run was posted to his site,” she wrote in her Instagram post. “It really conjured up my fears around AI, and the dangers of spreading misinformation. It brought me to the conclusion that I need to be very transparent about my actual plans for this election as a voter. The simplest way to combat misinformation is with the truth.”

In August 2024, former President Donald Trump posted AI-generated images on Truth Social falsely suggesting Swift endorsed him, including a manipulated photo depicting Swift as Uncle Sam with text promoting Trump. The incident sparked Swift’s fears about the spread of misinformation through AI.

This isn’t the first time Swift and generative AI have appeared together in the news. In February, we reported that a flood of explicit AI-generated images of Swift originated from a 4chan message board where users took part in daily challenges to bypass AI image generator filters.




Human drivers keep rear-ending Waymos

Traffic safety —

We took a close look at the 23 most serious Waymo crashes.

A Waymo vehicle in San Francisco. Photo by JasonDoiy via Getty Images

On a Friday evening last November, police chased a silver sedan across the San Francisco Bay Bridge. The fleeing vehicle entered San Francisco and went careening through the city’s crowded streets. At the intersection of 11th and Folsom streets, it sideswiped the fronts of two other vehicles, veered onto a sidewalk, and hit two pedestrians.

According to a local news story, both pedestrians were taken to the hospital with one suffering major injuries. The driver of the silver sedan was injured, as was a passenger in one of the other vehicles.

No one was injured in the third car, a driverless Waymo robotaxi. Still, Waymo was required to report the crash to government agencies. It was one of 20 crashes with injuries that Waymo has reported through June.  And it’s the only crash Waymo has classified as causing a serious injury.

Twenty injuries might sound like a lot, but Waymo’s driverless cars have traveled more than 22 million miles. So driverless Waymo taxis have been involved in fewer than one injury-causing crash for every million miles of driving—a much better rate than a typical human driver.

Last week Waymo released a new website to help the public put statistics like this in perspective. Waymo estimates that typical drivers in San Francisco and Phoenix—Waymo’s two biggest markets—would have caused 64 crashes over those 22 million miles. So Waymo vehicles get into injury-causing crashes less than one-third as often, per mile, as human-driven vehicles.

Waymo claims an even more dramatic improvement for crashes serious enough to trigger an airbag. Driverless Waymos have experienced just five crashes like that, and Waymo estimates that typical human drivers in Phoenix and San Francisco would have experienced 31 airbag crashes over 22 million miles. That implies driverless Waymos are one-sixth as likely as human drivers to experience this type of crash.
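Those per-mile comparisons fall straight out of the figures cited above (22 million miles, 20 injury crashes versus an estimated 64 for human drivers, five airbag crashes versus an estimated 31); a quick back-of-the-envelope check in Python:

```python
# Back-of-the-envelope check of the rates cited above, using the article's numbers.
MILES = 22_000_000

waymo_injury_crashes = 20
human_injury_estimate = 64   # Waymo's estimate for typical drivers over the same miles

waymo_airbag_crashes = 5
human_airbag_estimate = 31

print(waymo_injury_crashes / (MILES / 1_000_000))    # ~0.91 injury crashes per million miles
print(waymo_injury_crashes / human_injury_estimate)  # ~0.31, less than one-third the human rate
print(waymo_airbag_crashes / human_airbag_estimate)  # ~0.16, roughly one-sixth the human rate
```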

The new data comes at a critical time for Waymo, which is rapidly scaling up its robotaxi service. A year ago, Waymo was providing 10,000 rides per week. Last month, Waymo announced it was providing 100,000 rides per week. We can expect more growth in the coming months.

So it really matters whether Waymo is making our roads safer or more dangerous. And all the evidence so far suggests that it’s making them safer.

It’s not just the small number of crashes Waymo vehicles experience—it’s also the nature of those crashes. Out of the 23 most serious Waymo crashes, 16 involved a human driver rear-ending a Waymo. Three others involved a human-driven car running a red light before hitting a Waymo. There were no serious crashes where a Waymo ran a red light, rear-ended another car, or engaged in other clear-cut misbehavior.

Digging into Waymo’s crashes

In total, Waymo has reported nearly 200 crashes through June 2024, which works out to about one crash every 100,000 miles. Waymo says 43 percent of crashes across San Francisco and Phoenix had a delta-V of less than 1 mph—in other words, they were very minor fender-benders.

But let’s focus on the 23 most severe crashes: those that either caused an injury, caused an airbag to deploy, or both. These are good crashes to focus on not only because they do the most damage but because human drivers are more likely to report these types of crashes, making it easier to compare Waymo’s software to human drivers.

Most of these—16 crashes in total—involved another car rear-ending a Waymo. Some were quite severe: three triggered airbag deployments, and one caused a “moderate” injury. One vehicle rammed the Waymo a second time as it fled the scene, prompting Waymo to sue the driver.

There were three crashes where a human-driven car ran a red light before crashing into a Waymo:

  • One was the crash I mentioned at the top of this article. A car fleeing the police ran a red light and slammed into a Waymo, another car, and two pedestrians, causing several injuries.
  • In San Francisco, a pair of robbery suspects fleeing police in a stolen car ran a red light “at a high rate of speed” and slammed into the driver’s side door of a Waymo, triggering an airbag. The suspects were uninjured and fled on foot. The Waymo was thankfully empty.
  • In Phoenix, a car ran a red light and then “made contact with the SUV in front of the Waymo AV, and both of the other vehicles spun.” The Waymo vehicle was hit in the process, and someone in one of the other vehicles suffered an injury Waymo described as minor.

There were two crashes where a Waymo got sideswiped by a vehicle in an adjacent lane:

  • In San Francisco, Waymo was stopped at a stop sign in the right lane when another car hit the Waymo while passing it on the left.
  • In Tempe, Arizona, an SUV “overtook the Waymo AV on the left” and then “initiated a right turn,” cutting the Waymo off and causing a crash. A passenger in the SUV said they suffered moderate injuries.

Finally, there were two crashes where another vehicle turned left across the path of a Waymo vehicle:

  • In San Francisco, a Waymo and a large truck were approaching an intersection from opposite directions when a bicycle behind the truck made a sudden left in front of the Waymo. Waymo says the truck blocked Waymo’s vehicle from seeing the bicycle until the last second. The Waymo slammed on its brakes but wasn’t able to stop in time. The San Francisco Fire Department told local media that the bicyclist suffered only minor injuries and was able to leave the scene on their own.
  • A Waymo in Phoenix was traveling in the right lane. A row of stopped cars was in the lane to its left. As Waymo approached an intersection, a car coming from the opposite direction made a left turn through a gap in the row of stopped cars. Again, Waymo says the row of stopped cars blocked it from seeing the turning car until it was too late. A passenger in the turning vehicle reported minor injuries.

It’s conceivable that Waymo was at fault in these last two cases—it’s impossible to say without more details. It’s also possible that Waymo’s erratic braking contributed to a few of those rear-end crashes. Still, it seems clear that a non-Waymo vehicle bore primary responsibility for most, and possibly all, of these crashes.

“About as good as you can do”

One should always be skeptical when a company publishes a self-congratulatory report about its own safety record. So I called Noah Goodall, a civil engineer with many years of experience studying roadway safety, to see what he made of Waymo’s analysis.

“They’ve been the best of the companies doing this,” Goodall told me. He noted that Waymo has a team of full-time safety researchers who publish their work in reputable journals.

Waymo knows precisely how often its own vehicles crash because its vehicles are bristling with sensors. The harder problem is calculating an appropriate baseline for human-caused crashes.

That’s partly because human drivers don’t always report their own crashes to the police, insurance companies, or anyone else. But it’s also because crash rates differ from one area to another. For example, there are far more crashes per mile in downtown San Francisco than in the suburbs of Phoenix.

Waymo tried to account for these factors as it calculated crash rates for human drivers in both Phoenix and San Francisco. To ensure an apples-to-apples comparison, Waymo’s analysis excludes freeway crashes from its human-driven benchmark, since Waymo’s commercial fleet doesn’t use freeways yet.

Waymo estimates that human drivers fail to report 32 percent of injury crashes; the company raised its benchmark for human crashes to account for that. But even without this under-reporting adjustment, Waymo’s injury crash rate would still be roughly 60 percent below that of human drivers. The true number is probably somewhere between the adjusted number (70 percent fewer crashes) and the unadjusted one (60 percent fewer crashes). It’s an impressive figure either way.

Waymo says it doesn’t apply an under-reporting adjustment to its human benchmark for airbag crashes, since humans almost always report crashes that are severe enough to trigger an airbag. So it’s easier to take Waymo’s figure here—an 84 percent decline in airbag crashes—at face value.
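For readers wondering how such an adjustment works mechanically, the sketch below shows the general idea: scale the reported human crash count up by the estimated share that never gets reported. The 32 percent figure comes from the article; the reported-crash count used here is a made-up illustration and is not meant to reproduce Waymo's actual benchmark.

```python
# Sketch of how an under-reporting adjustment works in general. The 32 percent
# under-reporting rate is from the article; the reported-crash count below is
# hypothetical and not Waymo's actual benchmark input.
def adjust_for_underreporting(reported_crashes: float, underreport_rate: float) -> float:
    """If a fraction of crashes never gets reported, scale the reported count up."""
    return reported_crashes / (1.0 - underreport_rate)

reported_human_injury_crashes = 44          # hypothetical count from police/insurance data
adjusted = adjust_for_underreporting(reported_human_injury_crashes, 0.32)
print(round(adjusted, 1))                   # ~64.7: the benchmark a fair comparison would use
```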

Waymo’s benchmarks for human drivers are “about as good as you can do,” Goodall told me. “It’s very hard to get this kind of data.”

When I talked to other safety experts, they were equally positive about the quality of Waymo’s analysis. For example, last year, I asked Phil Koopman, a professor of computer engineering at Carnegie Mellon, about a previous Waymo study that used insurance data to show its cars were significantly safer than human drivers. Koopman told me Waymo’s findings were statistically credible, with some minor caveats.

Similarly, David Zuby, the chief research officer at the Insurance Institute for Highway Safety, had mostly positive things to say about a December study analyzing Waymo’s first 7.1 million miles of driverless operations.

I found a few errors in Waymo’s data

If you look closely, you’ll see that one of the numbers in this article differs slightly from Waymo’s safety website. Specifically, Waymo says that its vehicles get into crashes that cause injury 73 percent less often than human drivers, while the figure I use in this article is 70 percent.

This is because I spotted a couple of apparent classification mistakes in the raw data Waymo used to generate its statistics.

Each time Waymo reports a crash to the National Highway Traffic Safety Administration, it records the severity of injuries caused by the crash. This can be fatal, serious, moderate, minor, none, or unknown.

When Waymo shared an embargoed copy of its numbers with me early last week, it said that there had been 16 injury crashes. However, when I looked at the data Waymo had submitted to federal regulators, it showed 15 minor injuries, two moderate injuries, and one serious injury, for a total of 18.

When I asked Waymo about this discrepancy, the company said it found a programming error. Waymo had recently started using a moderate injury category and had not updated the code that generated its crash statistics to count these crashes. Waymo fixed the error quickly enough that the official version Waymo published on Thursday of last week showed 18 injury crashes.
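The bug Waymo described is a classic one: a counter written before a new category existed silently ignores records filed under the new label. The snippet below is a hypothetical illustration using the injury counts from the federal filings, not Waymo's actual code.

```python
# Hypothetical illustration (not Waymo's code) of the described bug: a crash
# counter written before a "moderate" category existed silently drops crashes
# filed under the new label.
crashes = ["minor"] * 15 + ["moderate"] * 2 + ["serious"]

OLD_INJURY_LABELS = {"minor", "serious", "fatal"}            # written before "moderate" existed
NEW_INJURY_LABELS = OLD_INJURY_LABELS | {"moderate"}

print(sum(c in OLD_INJURY_LABELS for c in crashes))          # 16 -- undercounts
print(sum(c in NEW_INJURY_LABELS for c in crashes))          # 18 -- matches the corrected figure
```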

However, as I continued looking at the data, I noticed another apparent mistake: Two crashes had been put in the “unknown” injury category, yet the narrative for each crash indicated an injury had occurred. One report said “the passenger in the Waymo AV reported an unspecified injury.” The other stated that “an individual involved was transported from the scene to a hospital for medical treatment.”

I notified Waymo about this apparent mistake on Friday and they said they are looking into it. As I write this, the website still claims a 73 percent reduction in injury crashes. But I think it’s clear that these two “unknown” crashes were actually injury crashes. So, all of the statistics in this article are based on the full list of 20 injury crashes.

I think this illustrates that I come by my generally positive outlook on Waymo honestly: I probably scrutinize Waymo’s data releases more carefully than any other journalist, and I’m not afraid to point out when the numbers don’t add up.

Based on my conversations with Waymo, I’m convinced these were honest mistakes rather than deliberate efforts to cover up crashes. I could only identify these mistakes because Waymo went out of its way to make its findings reproducible. It would make no sense to do that if the company simultaneously tried to fake its statistics.

Could there be other injury or airbag-triggering crashes that Waymo isn’t counting? It’s certainly possible, but I doubt there have been very many. You might have noticed that I linked to local media reporting for some of Waymo’s most significant crashes. If Waymo deliberately covered up a severe crash, there would be a big risk that a crash would get reported in the media and then Waymo would have to explain to federal regulators why it wasn’t reporting all legally required crashes.

So, despite the screwups, I find Waymo’s data to be fairly credible, and those data show that Waymo’s vehicles crash far less often than human drivers on public roads.

Tim Lee was on staff at Ars from 2017 to 2021. Last year, he launched a newsletter, Understanding AI, that explores how AI works and how it’s changing our world. You can subscribe here.
