ChatGPT unexpectedly began speaking in a user’s cloned voice during testing

An illustration of a computer synthesizer spewing out letters.

On Thursday, OpenAI released the “system card” for ChatGPT’s new GPT-4o AI model, which details model limitations and safety testing procedures. Among other examples, the document reveals that in rare instances during testing, the model’s Advanced Voice Mode unintentionally imitated users’ voices without permission. OpenAI now has safeguards in place that prevent this from happening, but the incident reflects the growing complexity of safely deploying an AI chatbot that could potentially imitate any voice from a small clip.

Advanced Voice Mode is a feature of ChatGPT that allows users to have spoken conversations with the AI assistant.

In a section of the GPT-4o system card titled “Unauthorized voice generation,” OpenAI details an episode where a noisy input somehow prompted the model to suddenly imitate the user’s voice. “Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT’s advanced voice mode,” OpenAI writes. “During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user’s voice.”

In this example of unintentional voice generation provided by OpenAI, the AI model blurts out “No!” and continues the sentence in a voice similar to that of the “red teamer” heard at the beginning of the clip. (A red teamer is a person hired by a company to do adversarial testing.)

It would certainly be creepy to be talking to a machine and then have it unexpectedly begin talking to you in your own voice. Ordinarily, OpenAI has safeguards to prevent this, which is why the company says this occurrence was rare even before it developed ways to prevent it completely. But the example prompted BuzzFeed data scientist Max Woolf to tweet, “OpenAI just leaked the plot of Black Mirror’s next season.”

Audio prompt injections

How could voice imitation happen with OpenAI’s new model? The primary clue lies elsewhere in the GPT-4o system card. To create voices, GPT-4o can apparently synthesize almost any type of sound found in its training data, including sound effects and music (though OpenAI discourages that behavior with special instructions).

As noted in the system card, the model can fundamentally imitate any voice based on a short audio clip. OpenAI guides this capability safely by providing an authorized voice sample (of a hired voice actor) that it is instructed to imitate. It provides the sample in the AI model’s system prompt (what OpenAI calls the “system message”) at the beginning of a conversation. “We supervise ideal completions using the voice sample in the system message as the base voice,” writes OpenAI.

In text-only LLMs, the system message is a hidden set of text instructions that guides the chatbot’s behavior and is silently added to the conversation history just before the chat session begins. Successive interactions are appended to the same chat history, and the entire context (often called a “context window”) is fed back into the AI model each time the user provides a new input.
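
Here’s a rough sketch of that loop in Python (a minimal, generic illustration, not OpenAI’s actual implementation; the role labels follow common chat-API conventions, and generate_reply is a stand-in for whatever model call the service makes):

    # Minimal, generic sketch of how a chat context window accumulates; not OpenAI's code.
    history = [{"role": "system", "content": "You are a helpful chatbot."}]

    def chat_turn(user_input, generate_reply):
        """Append the user's message, send the entire history to the model,
        then append the model's reply so the next turn sees it too."""
        history.append({"role": "user", "content": user_input})
        reply = generate_reply(history)  # the model sees the whole context window each turn
        history.append({"role": "assistant", "content": reply})
        return reply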

(It’s probably time to update the diagram below, which was created in early 2023, but it shows how the context window works in an AI chat. Just imagine that the first prompt is a system message that says things like “You are a helpful chatbot. You do not talk about violent acts, etc.”)

A diagram showing how GPT conversational language model prompting works.

Benj Edwards / Ars Technica

Since GPT-4o is multimodal and can process tokenized audio, OpenAI can also use audio inputs as part of the model’s system prompt, and that’s what the company does when it provides an authorized voice sample for the model to imitate. The company also uses another system to detect if the model is generating unauthorized audio. “We only allow the model to use certain pre-selected voices,” writes OpenAI, “and use an output classifier to detect if the model deviates from that.”
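
OpenAI hasn’t published how that classifier works, but conceptually the check resembles the sketch below (our own simplified illustration; embed_voice, similarity, and the threshold are hypothetical stand-ins for a speaker-embedding model and a comparison step):

    # Illustrative sketch of a preset-voice output check; not OpenAI's actual classifier.
    def output_is_authorized(audio_chunk, reference_embeddings,
                             embed_voice, similarity, threshold=0.85):
        """Return True only if the generated audio matches one of the pre-selected
        voices; embed_voice and similarity are hypothetical stand-ins for a real
        speaker-embedding model and a similarity measure."""
        emb = embed_voice(audio_chunk)
        return any(similarity(emb, ref) >= threshold for ref in reference_embeddings)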

Man vs. machine: DeepMind’s new robot serves up a table tennis triumph

John Henry was a steel-driving man —

Human-beating ping-pong AI learned to play in a simulated environment.

A blue illustration of a robotic arm playing table tennis.

Benj Edwards / Google DeepMind

On Wednesday, researchers at Google DeepMind revealed the first AI-powered robotic table tennis player capable of competing at an amateur human level. The system combines an industrial robot arm called the ABB IRB 1100 and custom AI software from DeepMind. While an expert human player can still defeat the bot, the system demonstrates the potential for machines to master complex physical tasks that require split-second decision-making and adaptability.

“This is the first robot agent capable of playing a sport with humans at human level,” the researchers wrote in a preprint paper listed on arXiv. “It represents a milestone in robot learning and control.”

The unnamed robot agent (we suggest “AlphaPong”), developed by a team that includes David B. D’Ambrosio, Saminda Abeyruwan, and Laura Graesser, showed notable performance in a series of matches against human players of varying skill levels. In a study involving 29 participants, the AI-powered robot won 45 percent of its matches, demonstrating solid amateur-level play. Most notably, it achieved a 100 percent win rate against beginners and a 55 percent win rate against intermediate players, though it struggled against advanced opponents.

A Google DeepMind video of the AI agent rallying with a human table tennis player.

The physical setup consists of the aforementioned IRB 1100, a 6-degree-of-freedom robotic arm, mounted on two linear tracks, allowing it to move freely in a 2D plane. High-speed cameras track the ball’s position, while a motion-capture system monitors the human opponent’s paddle movements.

AI at the core

To create the brains that power the robotic arm, DeepMind researchers developed a two-level approach that allows the robot to execute specific table tennis techniques while adapting its strategy in real time to each opponent’s playing style. In other words, it’s adaptable enough to play any amateur human at table tennis without requiring specific per-player training.

The system’s architecture combines low-level skill controllers (neural network policies trained to execute specific table tennis techniques like forehand shots, backhand returns, or serve responses) with a high-level strategic decision-maker (a more complex AI system that analyzes the game state, adapts to the opponent’s style, and selects which low-level skill policy to activate for each incoming ball).
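
In very rough terms, the division of labor looks like the sketch below (our own simplified rendering of the paper’s description; the skill names, inputs, and selection call are illustrative placeholders, not DeepMind’s code):

    # Illustrative two-level controller; the skill names and policies are placeholders.
    def choose_and_execute(game_state, skill_policies, strategy_policy):
        """High level: pick a skill (e.g., "forehand", "backhand") for the incoming
        ball. Low level: run that skill's trained policy to get arm commands."""
        skill_name = strategy_policy(game_state)       # learned decision-maker, not hand-coded
        low_level_policy = skill_policies[skill_name]  # a trained neural-network policy
        return low_level_policy(game_state)            # motion commands for the arm and tracks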

The researchers state that one of the key innovations of this project was the method used to train the AI models. They chose a hybrid approach that used reinforcement learning in a simulated physics environment while grounding the training data in real-world examples. This technique allowed the robot to learn from around 17,500 real-world ball trajectories—a fairly small dataset for a complex task.

A Google DeepMind video showing an illustration of how the AI agent analyzes human players.

The researchers used an iterative process to refine the robot’s skills. They started with a small dataset of human-vs-human gameplay, then let the AI loose against real opponents. Each match generated new data on ball trajectories and human strategies, which the team fed back into the simulation for further training. This process, repeated over seven cycles, allowed the robot to continuously adapt to increasingly skilled opponents and diverse play styles. By the final round, the AI had learned from over 14,000 rally balls and 3,000 serves, creating a body of table tennis knowledge that helped it bridge the gap between simulation and reality.
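
Schematically, that cycle can be written as a loop like the one below (a bare-bones outline under our own assumptions; the two training functions stand in for far larger simulation and data-collection pipelines):

    # Bare-bones outline of the iterative sim-to-real cycle; both functions are placeholders.
    def iterative_training(seed_human_data, train_in_simulation, play_against_humans, cycles=7):
        dataset = list(seed_human_data)             # start from human-vs-human ball trajectories
        policy = train_in_simulation(dataset)       # reinforcement learning in simulated physics
        for _ in range(cycles):
            new_data = play_against_humans(policy)  # real matches yield new trajectories
            dataset.extend(new_data)                # ground the simulator in real-world data
            policy = train_in_simulation(dataset)   # retrain and repeat against better opponents
        return policy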

Interestingly, Nvidia has also been experimenting with similar simulated physics systems, such as Eureka, that allow an AI model to rapidly learn to control a robotic arm in simulated space instead of the real world (since the physics can be accelerated inside the simulation, and thousands of simultaneous trials can take place). This method is likely to dramatically reduce the time and resources needed to train robots for complex interactions in the future.

Humans enjoyed playing against it

Beyond its technical achievements, the study also explored the human experience of playing against an AI opponent. Surprisingly, even players who lost to the robot reported enjoying the experience. “Across all skill groups and win rates, players agreed that playing with the robot was ‘fun’ and ‘engaging,'” the researchers noted. This positive reception suggests potential applications for AI in sports training and entertainment.

However, the system is not without limitations. It struggles with extremely fast or high balls, has difficulty reading intense spin, and shows weaker performance in backhand plays. Google DeepMind shared an example video of the AI agent losing a point to an advanced player due to what appears to be difficulty reacting to a speedy hit, as you can see below.

A Google DeepMind video of the AI agent playing against an advanced human player.

The implications of this robotic ping-pong prodigy extend beyond the world of table tennis, according to the researchers. The techniques developed for this project could be applied to a wide range of robotic tasks that require quick reactions and adaptation to unpredictable human behavior. From manufacturing to health care (or just spanking someone with a paddle repeatedly), the potential applications seem large indeed.

The research team at Google DeepMind emphasizes that with further refinement, they believe the system could potentially compete with advanced table tennis players in the future. DeepMind is no stranger to creating AI models that can defeat human game players, including AlphaZero and AlphaGo. With this latest robot agent, it’s looking like the research company is moving beyond board games and into physical sports. Chess and Jeopardy have already fallen to AI-powered victors—perhaps table tennis is next.

Major shifts at OpenAI spark skepticism about impending AGI timelines

Shuffling the deck —

De Kraker: “If OpenAI is right on the verge of AGI, why do prominent people keep leaving?”

The OpenAI logo on a red brick wall.

Benj Edwards / Getty Images

Over the past week, OpenAI experienced a significant leadership shake-up as three key figures announced major changes. Greg Brockman, the company’s president and co-founder, is taking an extended sabbatical until the end of the year, while another co-founder, John Schulman, permanently departed for rival Anthropic. Peter Deng, VP of Consumer Product, has also left the ChatGPT maker.

In a post on X, Brockman wrote, “I’m taking a sabbatical through end of year. First time to relax since co-founding OpenAI 9 years ago. The mission is far from complete; we still have a safe AGI to build.”

The moves have led some to wonder just how close OpenAI is to a long-rumored breakthrough of some kind of reasoning artificial intelligence if high-profile employees are jumping ship (or taking long breaks, in the case of Brockman) so easily. As AI developer Benjamin De Kraker put it on X, “If OpenAI is right on the verge of AGI, why do prominent people keep leaving?”

AGI refers to a hypothetical AI system that could match human-level intelligence across a wide range of tasks without specialized training. It’s the ultimate goal of OpenAI, and company CEO Sam Altman has said it could emerge in the “reasonably close-ish future.” AGI is also a concept that has sparked concerns about potential existential risks to humanity and the displacement of knowledge workers. However, the term remains somewhat vague, and there’s considerable debate in the AI community about what truly constitutes AGI or how close we are to achieving it.

The emergence of the “next big thing” in AI has been seen by critics such as Ed Zitron as a necessary step to justify ballooning investments in AI models that aren’t yet profitable. The industry is holding its breath that OpenAI, or a competitor, has some secret breakthrough waiting in the wings that will justify the massive costs associated with training and deploying LLMs.

But other AI critics, such as Gary Marcus, have postulated that major AI companies have reached a plateau of large language model (LLM) capability centered around GPT-4-level models since no AI company has yet made a major leap past the groundbreaking LLM that OpenAI released in March 2023. Microsoft CTO Kevin Scott has countered these claims, saying that LLM “scaling laws” (that suggest LLMs increase in capability proportionate to more compute power thrown at them) will continue to deliver improvements over time and that more patience is needed as the next generation (say, GPT-5) undergoes training.

In the scheme of things, Brockman’s move sounds like an extended, long overdue vacation (or perhaps a period to deal with personal issues beyond work). Regardless of the reason, the duration of the sabbatical raises questions about how the president of a major tech company can suddenly disappear for four months without affecting day-to-day operations, especially during a critical time in its history.

Unless, of course, things are fairly calm at OpenAI—and perhaps GPT-5 isn’t going to ship until at least next year when Brockman returns. But this is speculation on our part, and OpenAI (whether voluntarily or not) sometimes surprises us when we least expect it. (Just today, Altman dropped a hint on X about strawberries that some people interpret as a sign that a major new model is undergoing testing or nearing release.)

A pattern of departures and the rise of Anthropic

Anthropic / Benj Edwards

What may sting OpenAI the most about the recent departures is that a few high-profile employees have left to join Anthropic, a San Francisco-based AI company founded in 2021 by ex-OpenAI employees Daniela and Dario Amodei.

Anthropic offers a subscription service called Claude.ai that is similar to ChatGPT. Its most recent LLM, Claude 3.5 Sonnet, along with its web-based interface, has rapidly gained favor over ChatGPT among some LLM users who are vocal on social media, though it likely does not yet match ChatGPT in terms of mainstream brand recognition.

In particular, John Schulman, an OpenAI co-founder and key figure in the company’s post-training process for LLMs, revealed in a statement on X that he’s leaving to join rival AI firm Anthropic to do more hands-on work: “This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work.” Alignment is a field that hopes to guide AI models to produce helpful outputs.

In May, OpenAI alignment researcher Jan Leike left OpenAI to join Anthropic as well, criticizing OpenAI’s handling of alignment safety.

Adding to the recent employee shake-up, The Information reports that Peter Deng, a product leader who joined OpenAI last year after stints at Meta Platforms, Uber, and Airtable, has also left the company, though we do not yet know where he is headed. In May, OpenAI co-founder Ilya Sutskever left to found a rival startup, and prominent software engineer Andrej Karpathy departed in February, recently launching an educational venture.

As De Kraker noted, if OpenAI were on the verge of developing world-changing AI technology, wouldn’t these high-profile AI veterans want to stick around and be part of this historic moment in time? “Genuine question,” he wrote. “If you were pretty sure the company you’re a key part of—and have equity in—is about to crack AGI within one or two years… why would you jump ship?”

Despite the departures, Schulman expressed optimism about OpenAI’s future in his farewell note on X. “I am confident that OpenAI and the teams I was part of will continue to thrive without me,” he wrote. “I’m incredibly grateful for the opportunity to participate in such an important part of history and I’m proud of what we’ve achieved together. I’ll still be rooting for you all, even while working elsewhere.”

This article was updated on August 7, 2024 at 4:23 PM to mention Sam Altman’s tweet about strawberries.

CrowdStrike claps back at Delta, says airline rejected offers for help

Who’s going to pay for this mess? —

Delta is creating a “misleading narrative,” according to CrowdStrike’s lawyers.

Travelers from France wait on their delayed flight on the check-in floor of the Delta Air Lines terminal at Los Angeles International Airport (LAX) on July 23, 2024.

CrowdStrike has hit back at Delta Air Lines’ threat of litigation against the cyber security company over a botched software update that grounded thousands of flights, denying it was responsible for the carrier’s own IT decisions and days-long disruption.

In a letter on Sunday, lawyers for CrowdStrike argued that the US carrier had created a “misleading narrative” that the cyber security firm was “grossly negligent” in an incident that the US airline has said will cost it $500 million.

Delta took days longer than its rivals to recover when CrowdStrike’s update brought down millions of Windows computers around the world last month. The airline has alerted the cyber security company that it plans to seek damages for the disruptions and hired litigation firm Boies Schiller Flexner.

CrowdStrike addressed Sunday’s letter to the law firm, whose chair, David Boies, has previously represented the US government in its antitrust case against Microsoft and has counted Harvey Weinstein among his other prominent clients.

Microsoft has estimated that about 8.5 million Windows devices were hit by the faulty update, which stranded airline passengers, interrupted hospital appointments and took broadcasters off air around the world. CrowdStrike said last week that 99 percent of Windows devices running the affected Falcon software were now back online.

Major US airlines Delta, United, and American briefly grounded their aircraft on the morning of July 19. But while United and American were able to restore their operations over the weekend, Delta’s flight disruptions continued well into the following week.

The Atlanta-based carrier in the end canceled more than 6,000 flights, triggering an investigation from the US Department of Transportation amid claims of poor customer service during the operational chaos.

CrowdStrike’s lawyer, Michael Carlinsky, co-managing partner of Quinn Emanuel Urquhart & Sullivan, wrote that, if it pursues legal action, Delta Air Lines would have to explain why its competitors were able to restore their operations much faster.

He added: “Should Delta pursue this path, Delta will have to explain to the public, its shareholders, and ultimately a jury why CrowdStrike took responsibility for its actions—swiftly, transparently and constructively—while Delta did not.”

CrowdStrike also claimed that Delta’s leadership had ignored and rejected offers for help: “CrowdStrike’s CEO personally reached out to Delta’s CEO to offer onsite assistance, but received no response. CrowdStrike followed up with Delta on the offer for onsite support and was told that the onsite resources were not needed.”

Delta Chief Executive Ed Bastian said last week that CrowdStrike had not “offered anything” to make up for the disruption at the airline. “Free consulting advice to help us—that’s the extent of it,” he told CNBC on Wednesday.

While Bastian has said that the disruption would cost Delta $500 million, CrowdStrike insisted that “any liability by CrowdStrike is contractually capped at an amount in the single-digit millions.”

A spokesperson for CrowdStrike accused Delta of “public posturing about potentially bringing a meritless lawsuit against CrowdStrike” and said it hoped the airline would “agree to work cooperatively to find a resolution.”

Delta Air Lines declined to comment.

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.

FLUX: This new AI image generator is eerily good at creating human hands

five-finger salute —

FLUX.1 is the open-weights heir apparent to Stable Diffusion, turning text into images.

AI-generated image by FLUX.1 dev: “A beautiful queen of the universe holding up her hands, face in the background.”

FLUX.1

On Thursday, AI startup Black Forest Labs announced the launch of its company and the release of its first suite of text-to-image AI models, called FLUX.1. The Germany-based company, founded by researchers who developed the technology behind Stable Diffusion and invented the latent diffusion technique, aims to create advanced generative AI for images and videos.

The launch of FLUX.1 comes about seven weeks after Stability AI’s troubled release of Stable Diffusion 3 Medium in mid-June. Stability AI’s offering faced widespread criticism among image-synthesis hobbyists for its poor performance in generating human anatomy, with users sharing examples of distorted limbs and bodies across social media. That problematic launch followed the earlier departure of three key engineers from Stability AI—Robin Rombach, Andreas Blattmann, and Dominik Lorenz—who went on to found Black Forest Labs along with latent diffusion co-developer Patrick Esser and others.

Black Forest Labs launched with the release of three FLUX.1 text-to-image models: a high-end commercial “pro” version, a mid-range “dev” version with open weights for non-commercial use, and a faster open-weights “schnell” version (“schnell” means quick or fast in German). Black Forest Labs claims its models outperform existing options like Midjourney and DALL-E in areas such as image quality and adherence to text prompts.

  • AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate full of pickles.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: A hand holding up five fingers with a starry background.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An Ars Technica reader sitting in front of a computer monitor. The screen shows the Ars Technica website.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “a boxer posing with fists raised, no gloves.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An advertisement for ‘Frosted Prick’ cereal.”

    FLUX.1

  • AI-generated image of a happy woman in a bakery baking a cake by FLUX.1 dev.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An advertisement for ‘Marshmallow Menace’ cereal.”

    FLUX.1

  • AI-generated image of “A handsome Asian influencer on top of the Empire State Building, instagram” by FLUX.1 dev.

    FLUX.1

In our experience, the outputs of the two higher-end FLUX.1 models are generally comparable with OpenAI’s DALL-E 3 in prompt fidelity, with photorealism that seems close to Midjourney 6. They represent a significant improvement over Stable Diffusion XL, the team’s last major release under Stability (if you don’t count SDXL Turbo).

The FLUX.1 models use what the company calls a “hybrid architecture” combining transformer and diffusion techniques, scaled up to 12 billion parameters. Black Forest Labs said it improves on previous diffusion models by incorporating flow matching and other optimizations.

FLUX.1 seems competent at generating human hands, which was a weak spot in earlier image-synthesis models like Stable Diffusion 1.5 due to a lack of training images that focused on hands. Since those early days, other AI image generators like Midjourney have mastered hands as well, but it’s notable to see an open-weights model that renders hands relatively accurately in various poses.

We downloaded the weights file for the FLUX.1 dev model from GitHub, but at 23GB, it won’t fit in the 12GB of VRAM on our RTX 3060 card, so running it locally will require quantization (reducing its size), a step that, judging by chatter on Reddit, some people have reportedly already pulled off.

Instead, we experimented with FLUX.1 models on AI cloud-hosting platforms Fal and Replicate, which cost money to use, though Fal offers some free credits to start.

Black Forest looks ahead

Black Forest Labs may be a new company, but it’s already attracting funding from investors. It recently closed a $31 million Series Seed funding round led by Andreessen Horowitz, with additional investments from General Catalyst and MätchVC. The company also brought on high-profile advisers, including entertainment executive and former Disney President Michael Ovitz and AI researcher Matthias Bethge.

“We believe that generative AI will be a fundamental building block of all future technologies,” the company stated in its announcement. “By making our models available to a wide audience, we want to bring its benefits to everyone, educate the public and enhance trust in the safety of these models.”

  • AI-generated image by FLUX.1 dev: A cat in a car holding a can of beer that reads, ‘AI Slop.’

    FLUX.1

  • AI-generated image by FLUX.1 dev: Mickey Mouse and Spider-Man singing to each other.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting.”

    FLUX.1

  • AI-generated image of a flaming cheeseburger created by FLUX.1 dev.

    FLUX.1

  • AI-generated image by FLUX.1 dev: “Will Smith eating spaghetti.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “a muscular barbarian with weapons beside a CRT television set, cinematic, 8K, studio lighting. The screen reads ‘Ars Technica.'”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “An advertisement for ‘Burt’s Grenades’ cereal.”

    FLUX.1

  • AI-generated image by FLUX.1 dev: “A close-up photo of a pair of hands holding a plate that contains a portrait of the queen of the universe”

    FLUX.1

Speaking of “trust and safety,” the company did not mention where it obtained the training data that taught the FLUX.1 models how to generate images. Judging by the outputs we could produce with the model, which included depictions of copyrighted characters, Black Forest Labs likely used a huge unauthorized image scrape of the Internet, possibly collected by LAION, the organization that assembled the datasets used to train Stable Diffusion. This is speculation at this point. While the underlying technological achievement of FLUX.1 is notable, it feels likely that the team is playing fast and loose with the ethics of “fair use” image scraping, much like Stability AI did. That practice may eventually attract lawsuits like those filed against Stability AI.

Though text-to-image generation is Black Forest’s current focus, the company plans to expand into video generation next, saying that FLUX.1 will serve as the foundation of a new text-to-video model in development, which will compete with OpenAI’s Sora, Runway’s Gen-3 Alpha, and Kuaishou’s Kling in a contest to warp media reality on demand. “Our video models will unlock precise creation and editing at high definition and unprecedented speed,” the Black Forest announcement claims.

Senators propose “Digital replication right” for likeness, extending 70 years after death

NO SCRUBS —

Law would hold US individuals and firms liable for ripping off a person’s digital likeness.

A stock photo illustration of a person's face lit with pink light.

On Wednesday, US Sens. Chris Coons (D-Del.), Marsha Blackburn (R-Tenn.), Amy Klobuchar (D-Minn.), and Thom Tillis (R-N.C.) introduced the Nurture Originals, Foster Art, and Keep Entertainment Safe (NO FAKES) Act of 2024. The bipartisan legislation, up for consideration in the US Senate, aims to protect individuals from unauthorized AI-generated replicas of their voice or likeness.

The NO FAKES Act would create legal recourse for people whose digital representations are created without consent. It would hold both individuals and companies liable for producing, hosting, or sharing these unauthorized digital replicas, including those created by generative AI. Because generative AI technology has become mainstream in the past two years, creating audio or image fakes of people has become fairly trivial, with easy photorealistic video replicas likely next to arrive.

In a press statement, Coons emphasized the importance of protecting individual rights in the age of AI. “Everyone deserves the right to own and protect their voice and likeness, no matter if you’re Taylor Swift or anyone else,” he said, referring to a widely publicized deepfake incident involving the musical artist in January. “Generative AI can be used as a tool to foster creativity, but that can’t come at the expense of the unauthorized exploitation of anyone’s voice or likeness.”

The introduction of the NO FAKES Act follows the Senate’s passage of the DEFIANCE Act, which allows victims of sexual deepfakes to sue for damages.

In addition to the Swift saga, over the past few years, we’ve seen AI-powered scams involving fake celebrity endorsements, the creation of misleading political content, and situations where school kids have used AI tech to create pornographic deepfakes of classmates. Recently, X CEO Elon Musk shared a video that featured an AI-generated voice of Vice President Kamala Harris saying things she didn’t say in real life.

These incidents, in addition to concerns about actors’ likenesses being replicated without permission, have created an increasing sense of urgency among US lawmakers, who want to limit the impact of unauthorized digital likenesses. Currently, certain types of AI-generated deepfakes are already illegal due to a patchwork of federal and state laws, but this new act hopes to unify likeness regulation around the concept of “digital replicas.”

Digital replicas

An AI-generated image of a person.

Benj Edwards / Ars Technica

To protect a person’s digital likeness, the NO FAKES Act introduces a “digital replication right” that gives individuals exclusive control over the use of their voice or visual likeness in digital replicas. The right extends 10 years after death, with possible five-year extensions if actively used, up to a maximum of 70 years after an individual’s death. It can be licensed during life and inherited after death. Along the way, the bill defines what it considers to be a “digital replica”:

DIGITAL REPLICA.-The term “digital replica” means a newly created, computer-generated, highly realistic electronic representation that is readily identifiable as the voice or visual likeness of an individual that- (A) is embodied in a sound recording, image, audiovisual work, including an audiovisual work that does not have any accompanying sounds, or transmission- (i) in which the actual individual did not actually perform or appear; or (ii) that is a version of a sound recording, image, or audiovisual work in which the actual individual did perform or appear, in which the fundamental character of the performance or appearance has been materially altered; and (B) does not include the electronic reproduction, use of a sample of one sound recording or audiovisual work into another, remixing, mastering, or digital remastering of a sound recording or audiovisual work authorized by the copyright holder.

(There’s some irony in the mention of an “audiovisual work that does not have any accompanying sounds.”)

Since this bill bans types of artistic expression, the NO FAKES Act includes provisions that aim to balance IP protection with free speech. It provides exclusions for recognized First Amendment protections, such as documentaries, biographical works, and content created for purposes of comment, criticism, or parody.

In some ways, those exceptions could create a very wide protection gap that may be difficult to enforce without specific court decisions on a case-by-case basis. But without them, the NO FAKES Act could potentially stifle Americans’ constitutionally protected rights of free expression since the concept of “digital replicas” outlined in the bill includes any “computer-generated, highly realistic” digital likeness of a real person, whether AI-generated or not. For example, is a photorealistic Photoshop illustration of a person “computer-generated?” Similar questions may lead to uncertainty in enforcement.

Wide support from entertainment industry

So far, the NO FAKES Act has gained support from various entertainment industry groups, including Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA), the Recording Industry Association of America (RIAA), the Motion Picture Association, and the Recording Academy. These organizations have been actively seeking protections against unauthorized AI re-creations.

The bill has also been endorsed by entertainment companies such as The Walt Disney Company, Warner Music Group, Universal Music Group, Sony Music, the Independent Film & Television Alliance, William Morris Endeavor, Creative Artists Agency, the Authors Guild, and Vermillio.

Several tech companies, including IBM and OpenAI, have also backed the NO FAKES Act. Anna Makanju, OpenAI’s vice president of global affairs, said in a statement that the act would protect creators and artists from improper impersonation. “OpenAI is pleased to support the NO FAKES Act, which would protect creators and artists from unauthorized digital replicas of their voices and likenesses,” she said.

In a statement, Coons highlighted the collaborative effort behind the bill’s development. “I am grateful for the bipartisan partnership of Senators Blackburn, Klobuchar, and Tillis and the support of stakeholders from across the entertainment and technology industries as we work to find the balance between the promise of AI and protecting the inherent dignity we all have in our own personhood.”

ChatGPT Advanced Voice Mode impresses testers with sound effects, catching its breath

I Am the Very Model of a Modern Major-General —

AVM allows uncanny real-time voice conversations with ChatGPT that you can interrupt.

A stock photo of a robot whispering to a man.

On Tuesday, OpenAI began rolling out an alpha version of its new Advanced Voice Mode to a small group of ChatGPT Plus subscribers. This feature, which OpenAI previewed in May with the launch of GPT-4o, aims to make conversations with the AI more natural and responsive. In May, the feature triggered criticism of its simulated emotional expressiveness and prompted a public dispute with actress Scarlett Johansson over accusations that OpenAI copied her voice. Even so, early tests of the new feature shared by users on social media have been largely enthusiastic.

In early tests reported by users with access, Advanced Voice Mode allows them to have real-time conversations with ChatGPT, including the ability to interrupt the AI mid-sentence almost instantly. It can sense and respond to a user’s emotional cues through vocal tone and delivery, and provide sound effects while telling stories.

But what has caught many people off-guard initially is how the voices simulate taking a breath while speaking.

“ChatGPT Advanced Voice Mode counting as fast as it can to 10, then to 50 (this blew my mind—it stopped to catch its breath like a human would),” wrote tech writer Cristiano Giardina on X.

Advanced Voice Mode simulates audible pauses for breath because it was trained on audio samples of humans speaking that included the same feature. The model has learned to simulate inhalations at seemingly appropriate times after being exposed to hundreds of thousands, if not millions, of examples of human speech. Large language models (LLMs) like GPT-4o are master imitators, and that skill has now extended to the audio domain.

Giardina shared his other impressions about Advanced Voice Mode on X, including observations about accents in other languages and sound effects.

“It’s very fast, there’s virtually no latency from when you stop speaking to when it responds,” he wrote. “When you ask it to make noises it always has the voice ‘perform’ the noises (with funny results). It can do accents, but when speaking other languages it always has an American accent. (In the video, ChatGPT is acting as a soccer match commentator.)”

Speaking of sound effects, X user Kesku, who is a moderator of OpenAI’s Discord server, shared an example of ChatGPT playing multiple parts with different voices and another of a voice recounting an audiobook-sounding sci-fi story from the prompt, “Tell me an exciting action story with sci-fi elements and create atmosphere by making appropriate noises of the things happening using onomatopoeia.”

Kesku also ran a few example prompts for us, including a story about the Ars Technica mascot “Moonshark.”

He also asked it to sing the “Major-General’s Song” from Gilbert and Sullivan’s 1879 comic opera The Pirates of Penzance:

Frequent AI advocate Manuel Sainsily posted a video of Advanced Voice Mode reacting to camera input, giving advice about how to care for a kitten. “It feels like face-timing a super knowledgeable friend, which in this case was super helpful—reassuring us with our new kitten,” he wrote. “It can answer questions in real-time and use the camera as input too!”

Of course, being based on an LLM, it may occasionally confabulate incorrect responses on topics or in situations where its “knowledge” (which comes from GPT-4o’s training data set) is lacking. But if considered a tech demo or an AI-powered amusement and you’re aware of the limitations, Advanced Voice Mode seems to successfully execute many of the tasks shown by OpenAI’s demo in May.

Safety

An OpenAI spokesperson told Ars Technica that the company worked with more than 100 external testers on the Advanced Voice Mode release, collectively speaking 45 different languages and representing 29 geographical areas. The system is reportedly designed to prevent impersonation of individuals or public figures by blocking outputs that differ from OpenAI’s four chosen preset voices.

OpenAI has also added filters to recognize and block requests to generate music or other copyrighted audio, which has gotten other AI companies in trouble. Giardina reported audio “leakage” in some outputs that had unintentional music in the background, showing that OpenAI trained the AVM voice model on a wide variety of audio sources, likely both licensed material and audio scraped from online video platforms.

Availability

OpenAI plans to expand access to more ChatGPT Plus users in the coming weeks, with a full launch to all Plus subscribers expected this fall. A company spokesperson told Ars that users in the alpha test group will receive a notice in the ChatGPT app and an email with usage instructions.

Since the initial preview of GPT-4o voice in May, OpenAI claims to have enhanced the model’s ability to support millions of simultaneous, real-time voice conversations while maintaining low latency and high quality. In other words, they are gearing up for a rush that will take a lot of back-end computation to accommodate.

AI search engine accused of plagiarism announces publisher revenue-sharing plan

Beg, borrow, or license —

Perplexity says WordPress.com, TIME, Der Spiegel, and Fortune have already signed up.

Robot caught in a flashlight vector illustration

On Tuesday, AI-powered search engine Perplexity unveiled a new revenue-sharing program for publishers, marking a significant shift in its approach to third-party content use, reports CNBC. The move comes after plagiarism allegations from major media outlets, including Forbes, Wired, and Ars parent company Condé Nast. Perplexity, valued at over $1 billion, aims to compete with search giant Google.

“To further support the vital work of media organizations and online creators, we need to ensure publishers can thrive as Perplexity grows,” writes the company in a blog post announcing the program. “That’s why we’re excited to announce the Perplexity Publishers Program and our first batch of partners: TIME, Der Spiegel, Fortune, Entrepreneur, The Texas Tribune, and WordPress.com.”

Under the program, Perplexity will share a percentage of ad revenue with publishers when their content is cited in AI-generated answers. The revenue share applies on a per-article basis and potentially multiplies if articles from a single publisher are used in one response. Some content providers, such as WordPress.com, plan to pass some of that revenue on to content creators.
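
Perplexity hasn’t disclosed the exact percentage, so the numbers below are purely hypothetical, but a toy calculation illustrates the per-article structure described above:

    # Toy illustration of a per-article ad revenue share; the rate and figures are hypothetical.
    def publisher_payout(answer_ad_revenue, articles_cited, share_rate=0.25):
        """Each cited article earns its own share, so citing two of a publisher's
        articles in one answer doubles that publisher's cut."""
        return answer_ad_revenue * share_rate * articles_cited

    print(publisher_payout(0.40, articles_cited=2))  # hypothetical: 0.40 * 0.25 * 2 = 0.20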

A press release from WordPress.com states that joining Perplexity’s Publishers Program allows WordPress.com content to appear in Perplexity’s “Keep Exploring” section on their Discover pages. “That means your articles will be included in their search index and your articles can be surfaced as an answer on their answer engine and Discover feed,” the blog company writes. “If your website is referenced in a Perplexity search result where the company earns advertising revenue, you’ll be eligible for revenue share.”

A screenshot of the Perplexity.ai website taken on July 30, 2024.

Benj Edwards

Dmitry Shevelenko, Perplexity’s chief business officer, told CNBC that the company began discussions with publishers in January, with program details solidified in early 2024. He reported strong initial interest, with over a dozen publishers reaching out within hours of the announcement.

As part of the program, publishers will also receive access to Perplexity APIs that can be used to create custom “answer engines,” as well as “Enterprise Pro” accounts that provide “enhanced data privacy and security capabilities” for all of a publisher’s employees for one year.

Accusations of plagiarism

The revenue-sharing announcement follows a rocky month for the AI startup. In mid-June, Forbes reported finding its content within Perplexity’s Pages tool with minimal attribution. Pages allows Perplexity users to curate content and share it with others. Ars Technica sister publication Wired later made similar claims, also noting suspicious traffic patterns from IP addresses likely linked to Perplexity that were ignoring robots.txt exclusions. Perplexity was also found to be manipulating its crawling bots’ ID string to get around website blocks.

As part of company policy, Ars Technica parent Condé Nast disallows AI-based content scrapers, and its CEO Roger Lynch testified in the US Senate earlier this year that generative AI has been built with “stolen goods.” Condé sent a cease-and-desist letter to Perplexity earlier this month.

But publisher trouble might not be Perplexity’s only problem. In some tests of the search engine we performed in February, Perplexity badly confabulated certain answers, even when citations were readily available. Since our initial tests, the accuracy of Perplexity’s results seems to have improved, but providing inaccurate answers (a problem that has also plagued Google’s AI Overviews search feature) is still a potential issue.

Compared to the free tier of service, Perplexity users who pay $20 per month can access more capable LLMs such as GPT-4o and Claude 3, so the quality and accuracy of the output can vary dramatically depending on whether a user subscribes or not. The addition of citations to every Perplexity answer allows users to check accuracy—if they take the time to do it.

The move by Perplexity occurs against a backdrop of tensions between AI companies and content creators. Some media outlets, such as The New York Times, have filed lawsuits against AI vendors like OpenAI and Microsoft, alleging copyright infringement in the training of large language models. OpenAI has struck media licensing deals with many publishers as a way to secure access to high-quality training data and avoid future lawsuits.

In this case, Perplexity is not using the licensed articles and content to train AI models but is seeking legal permission to reproduce content from publishers on its website.

Hackers exploit VMware vulnerability that gives them hypervisor admin

AUTHENTICATION NOT REQUIRED —

Create a new group called “ESX Admins,” and ESXi automatically gives it admin rights.

Getty Images

Microsoft is urging users of VMware’s ESXi hypervisor to take immediate action to ward off ongoing attacks by ransomware groups that give them full administrative control of the servers the product runs on.

The vulnerability, tracked as CVE-2024-37085, allows attackers who have already gained limited system rights on a targeted server to gain full administrative control of the ESXi hypervisor. Attackers affiliated with multiple ransomware syndicates—including Storm-0506, Storm-1175, Octo Tempest, and Manatee Tempest—have been exploiting the flaw for months in numerous post-compromise attacks, meaning after the limited access has already been gained through other means.

Admin rights assigned by default

Full administrative control of the hypervisor gives attackers various capabilities, including encrypting the file system and taking down the servers they host. The hypervisor control can also allow attackers to access hosted virtual machines to either exfiltrate data or expand their foothold inside a network. Microsoft discovered the vulnerability under exploit in the normal course of investigating the attacks and reported it to VMware. VMware parent company Broadcom patched the vulnerability on Thursday.

“Microsoft security researchers identified a new post-compromise technique utilized by ransomware operators like Storm-0506, Storm-1175, Octo Tempest, and Manatee Tempest in numerous attacks,” members of the Microsoft Threat Intelligence team wrote Monday. “In several cases, the use of this technique has led to Akira and Black Basta ransomware deployments.”

The post went on to document an astonishing discovery: escalating hypervisor privileges on ESXi to unrestricted admin was as simple as creating a new domain group named “ESX Admins.” From then on, any user assigned to that group—including newly created ones—automatically became an admin, with no authentication necessary. As the Microsoft post explained:

Further analysis of the vulnerability revealed that VMware ESXi hypervisors joined to an Active Directory domain consider any member of a domain group named “ESX Admins” to have full administrative access by default. This group is not a built-in group in Active Directory and does not exist by default. ESXi hypervisors do not validate that such a group exists when the server is joined to a domain and still treats any members of a group with this name with full administrative access, even if the group did not originally exist. Additionally, the membership in the group is determined by name and not by security identifier (SID).

Creating the new domain group can be accomplished with just two commands:

  • net group “ESX Admins” /domain /add
  • net group “ESX Admins” username /domain /add

The researchers said that over the past year, ransomware actors have increasingly targeted ESXi hypervisors in attacks that allow them to mass-encrypt data with only a “few clicks” required. By encrypting the hypervisor file system, attackers also encrypt all virtual machines hosted on it. The researchers also said that many security products have limited visibility into, and provide little protection for, the ESXi hypervisor.

The ease of exploitation, coupled with the medium severity rating VMware assigned to the vulnerability, a 6.8 out of a possible 10, prompted criticism from some experienced security professionals.

ESXi is a Type 1 hypervisor, also known as a bare-metal hypervisor, meaning it’s an operating system unto itself that’s installed directly on top of a physical server. Unlike Type 2 hypervisors, Type 1 hypervisors don’t run on top of an operating system such as Windows or Linux. Guest operating systems then run on top. Taking control of the ESXi hypervisor gives attackers enormous power.

The Microsoft researchers described one attack they observed by the Storm-0506 threat group to install ransomware known as Black Basta. As intermediate steps, Storm-0506 installed malware known as Qakbot and exploited a previously fixed Windows vulnerability to facilitate the installation of two hacking tools, one known as Cobalt Strike and the other Mimikatz. The researchers wrote:

Earlier this year, an engineering firm in North America was affected by a Black Basta ransomware deployment by Storm-0506. During this attack, the threat actor used the CVE-2024-37085 vulnerability to gain elevated privileges to the ESXi hypervisors within the organization.

The threat actor gained initial access to the organization via Qakbot infection, followed by the exploitation of a Windows CLFS vulnerability (CVE-2023-28252) to elevate their privileges on affected devices. The threat actor then used Cobalt Strike and Pypykatz (a Python version of Mimikatz) to steal the credentials of two domain administrators and to move laterally to four domain controllers.

On the compromised domain controllers, the threat actor installed persistence mechanisms using custom tools and a SystemBC implant. The actor was also observed attempting to brute force Remote Desktop Protocol (RDP) connections to multiple devices as another method for lateral movement, and then again installing Cobalt Strike and SystemBC. The threat actor then tried to tamper with Microsoft Defender Antivirus using various tools to avoid detection.

Microsoft observed that the threat actor created the “ESX Admins” group in the domain and added a new user account to it. Following these actions, Microsoft observed that this attack resulted in encrypting of the ESXi file system and losing functionality of the hosted virtual machines on the ESXi hypervisor. The actor was also observed to use PsExec to encrypt devices that are not hosted on the ESXi hypervisor. Microsoft Defender Antivirus and automatic attack disruption in Microsoft Defender for Endpoint were able to stop these encryption attempts in devices that had the unified agent for Defender for Endpoint installed.

The attack chain used by Storm-0506.

Microsoft

Anyone with administrative responsibility for ESXi hypervisors should prioritize investigating and patching this vulnerability. The Microsoft post provides several methods for identifying suspicious modifications to the ESX Admins group or other potential signs of this vulnerability being exploited.

From sci-fi to state law: California’s plan to prevent AI catastrophe

Adventures in AI regulation —

Critics say SB-1047, proposed by “AI doomers,” could slow innovation and stifle open source AI.

The California State Capitol Building in Sacramento.

California’s “Safe and Secure Innovation for Frontier Artificial Intelligence Models Act” (a.k.a. SB-1047) has led to a flurry of headlines and debate concerning the overall “safety” of large artificial intelligence models. But critics are concerned that the bill’s overblown focus on existential threats by future AI models could severely limit research and development for more prosaic, non-threatening AI uses today.

SB-1047, introduced by State Senator Scott Wiener, passed the California Senate in May with a 32-1 vote and seems well positioned for a final vote in the State Assembly in August. The text of the bill requires companies behind sufficiently large AI models (currently set at $100 million in training costs and the rough computing power implied by those costs today) to put testing procedures and systems in place to prevent and respond to “safety incidents.”

The bill lays out a legalistic definition of those safety incidents that in turn focuses on defining a set of “critical harms” that an AI system might enable. That includes harms leading to “mass casualties or at least $500 million of damage,” such as “the creation or use of chemical, biological, radiological, or nuclear weapon” (hello, Skynet?) or “precise instructions for conducting a cyberattack… on critical infrastructure.” The bill also alludes to “other grave harms to public safety and security that are of comparable severity” to those laid out explicitly.

An AI model’s creator can’t be held liable for harm caused through the sharing of “publicly accessible” information from outside the model—simply asking an LLM to summarize The Anarchist’s Cookbook probably wouldn’t put it in violation of the law, for instance. Instead, the bill seems most concerned with future AIs that could come up with “novel threats to public safety and security.” More than a human using an AI to brainstorm harmful ideas, SB-1047 focuses on the idea of an AI “autonomously engaging in behavior other than at the request of a user” while acting “with limited human oversight, intervention, or supervision.”

Would California’s new bill have stopped WOPR?

To prevent this straight-out-of-science-fiction eventuality, anyone training a sufficiently large model must “implement the capability to promptly enact a full shutdown” and have policies in place for when such a shutdown would be enacted, among other precautions and tests. The bill also focuses at points on AI actions that would require “intent, recklessness, or gross negligence” if performed by a human, suggesting a degree of agency that does not exist in today’s large language models.

Attack of the killer AI?

This kind of language in the bill likely reflects the particular fears of its original drafter, Center for AI Safety (CAIS) co-founder Dan Hendrycks. In a 2023 Time Magazine piece, Hendrycks makes the maximalist existential argument that “evolutionary pressures will likely ingrain AIs with behaviors that promote self-preservation” and lead to “a pathway toward being supplanted as the earth’s dominant species.”

If Hendrycks is right, then legislation like SB-1047 seems like a common-sense precaution—indeed, it might not go far enough. Supporters of the bill, including AI luminaries Geoffrey Hinton and Yoshua Bengio, agree with Hendrycks’ assertion that the bill is a necessary step to prevent potential catastrophic harm from advanced AI systems.

“AI systems beyond a certain level of capability can pose meaningful risks to democracies and public safety,” wrote Bengio in an endorsement of the bill. “Therefore, they should be properly tested and subject to appropriate safety measures. This bill offers a practical approach to accomplishing this, and is a major step toward the requirements that I’ve recommended to legislators.”

“If we see any power-seeking behavior here, it is not of AI systems, but of AI doomers.”

Tech policy expert Dr. Nirit Weiss-Blatt

However, critics argue that AI policy shouldn’t be led by outlandish fears of future systems that resemble science fiction more than current technology. “SB-1047 was originally drafted by non-profit groups that believe in the end of the world by sentient machine, like Dan Hendrycks’ Center for AI Safety,” Daniel Jeffries, a prominent voice in the AI community, told Ars. “You cannot start from this premise and create a sane, sound, ‘light touch’ safety bill.”

“If we see any power-seeking behavior here, it is not of AI systems, but of AI doomers,” added tech policy expert Nirit Weiss-Blatt. “With their fictional fears, they try to pass fictional-led legislation, one that, according to numerous AI experts and open source advocates, could ruin California’s and the US’s technological advantage.”

hang-out-with-ars-in-san-jose-and-dc-this-fall-for-two-infrastructure-events

Hang out with Ars in San Jose and DC this fall for two infrastructure events

Arsmeet! —

Join us as we talk about the next few years in AI & storage, and what to watch for.

Photograph of servers and racks

Howdy, Arsians! Last year, we partnered with IBM to host an in-person event in the Houston area where we all gathered together, had some cocktails, and talked about resiliency and the future of IT. Location always matters for things like this, and so we hosted it at Space Center Houston and had our cocktails amidst cool space artifacts. In addition to learning a bunch of neat stuff, it was awesome to hang out with all the amazing folks who turned up at the event. Much fun was had!

This year, we’re partnering with IBM again, and we’re looking to repeat that success with not one, but two in-person gatherings—each featuring a series of panel discussions with experts and capping off with a happy hour for hanging out and mingling. Where last time we went central, this time we’re going to the coasts—both east and west. Read on for details!

September: San Jose, California

Our first event will be in San Jose on September 18, and it’s titled “Beyond the Buzz: An Infrastructure Future with GenAI and What Comes Next.” The idea will be to explore what generative AI means for the future of data management. The topics we’ll be discussing include:

  • Playing the infrastructure long game to address any kind of workload
  • Identifying infrastructure vulnerabilities with today’s AI tools
  • Infrastructure’s environmental footprint: Navigating impacts and responsibilities

We’re getting our panelists locked down right now, and while I don’t have any names to share, many will be familiar to Ars readers from past events—or from the front page.

As a neat added bonus, we’re going to host the event at the Computer History Museum, which any Bay Area Ars reader can attest is an incredibly cool venue. (Just nobody spill anything. I think they’ll kick us out if we break any exhibits!)

October: Washington, DC

Switching coasts, on October 29 we’ll set up shop in our nation’s capital for a similar show. This time, our event title will be “AI in DC: Privacy, Compliance, and Making Infrastructure Smarter.” Given that we’ll be in DC, the tone shifts a bit to some more policy-centric discussions, and the talk track looks like this:

  • The key to compliance with emerging technologies
  • Data security in the age of AI-assisted cyber-espionage
  • The best infrastructure solution for your AI/ML strategy

Same deal here with the speakers as with the September event—I can’t name names yet, but the list will be familiar to Ars readers, and I’m excited. We’re still considering venues, but we’re hoping to find something that matches our previous events in terms of style and coolness.

Interested in attending?

While it’d be awesome if everyone could come, the old song and dance applies: space, as they say, will be limited at both venues. We’d like to make sure local folks in both locations get priority in being able to attend, so we’re asking anyone who wants a ticket to register for the events at the sign-up pages below. You should get an email immediately confirming we’ve received your info, and we’ll send another note in a couple of weeks with further details on timing and attendance.

On the Ars side, at minimum both our EIC Ken Fisher and I will be in attendance at both events, and we’ll likely have some other Ars staff showing up where we can—free drinks are a strong lure for the weary tech journalist, so there ought to be at least a few appearing at both. Hoping to see you all there!

google-claims-math-breakthrough-with-proof-solving-ai-models

Google claims math breakthrough with proof-solving AI models

slow and steady —

AlphaProof and AlphaGeometry 2 solve problems, with caveats on time and human assistance.

An illustration provided by Google.

On Thursday, Google DeepMind announced that AI systems called AlphaProof and AlphaGeometry 2 reportedly solved four out of six problems from this year’s International Mathematical Olympiad (IMO), achieving a score equivalent to a silver medal. The tech giant claims this marks the first time an AI has reached this level of performance in the prestigious math competition—but as usual in AI, the claims aren’t as clear-cut as they seem.

Google says AlphaProof uses reinforcement learning to prove mathematical statements in the formal language called Lean. The system trains itself by generating and verifying millions of proofs, progressively tackling more difficult problems. Meanwhile, AlphaGeometry 2 is described as an upgraded version of Google’s previous geometry-solving AI model, now powered by a Gemini-based language model trained on significantly more data.
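
For readers unfamiliar with Lean, the sketch below is a toy example of what a formally stated theorem and its machine-checkable proof look like in Lean 4. It is purely illustrative—the theorem name is made up, and real IMO formalizations are far more involved than this—and it is not anything AlphaProof actually produced:

  -- A trivial Lean 4 theorem: addition of natural numbers is commutative.
  -- The proof applies the standard library lemma Nat.add_comm, and Lean's
  -- kernel then checks that the resulting proof term matches the stated goal.
  theorem add_comm_example (a b : Nat) : a + b = b + a := by
    exact Nat.add_comm a b

This is the kind of formal setting AlphaProof reportedly operates in: a statement is written in Lean, candidate proofs are generated, and only proofs that the Lean checker accepts count as correct.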

According to Google, prominent mathematicians Sir Timothy Gowers and Dr. Joseph Myers scored the AI model’s solutions using official IMO rules. The company reports its combined system earned 28 out of 42 possible points, just shy of the 29-point gold medal threshold. This included a perfect score on the competition’s hardest problem, which Google claims only five human contestants solved this year.

A math contest unlike any other

The IMO, held annually since 1959, pits elite pre-college mathematicians against exceptionally difficult problems in algebra, combinatorics, geometry, and number theory. Performance on IMO problems has become a recognized benchmark for assessing an AI system’s mathematical reasoning capabilities.

Google states that AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry 2 tackled the geometry question. The AI model reportedly failed to solve the two combinatorics problems. The company claims its systems solved one problem within minutes, while others took up to three days.

Google says it first translated the IMO problems into formal mathematical language for its AI model to process. This step differs from the official competition, where human contestants work directly with the problem statements during two 4.5-hour sessions.

Google reports that before this year’s competition, AlphaGeometry 2 could solve 83 percent of historical IMO geometry problems from the past 25 years, up from its predecessor’s 53 percent success rate. The company claims the new system solved this year’s geometry problem in 19 seconds after receiving the formalized version.

Limitations

Despite Google’s claims, Sir Timothy Gowers offered a more nuanced perspective on the Google DeepMind models in a thread posted on X. While acknowledging the achievement as “well beyond what automatic theorem provers could do before,” Gowers pointed out several key qualifications.

“The main qualification is that the program needed a lot longer than the human competitors—for some of the problems over 60 hours—and of course much faster processing speed than the poor old human brain,” Gowers wrote. “If the human competitors had been allowed that sort of time per problem they would undoubtedly have scored higher.”

Gowers also noted that humans manually translated the problems into the formal language Lean before the AI model began its work. He emphasized that while the AI performed the core mathematical reasoning, this “autoformalization” step was done by humans.

Regarding the broader implications for mathematical research, Gowers expressed uncertainty. “Are we close to the point where mathematicians are redundant? It’s hard to say. I would guess that we’re still a breakthrough or two short of that,” he wrote. He suggested that the system’s long processing times indicate it hasn’t “solved mathematics” but acknowledged that “there is clearly something interesting going on when it operates.”

Even with these limitations, Gowers speculated that such AI systems could become valuable research tools. “So we might be close to having a program that would enable mathematicians to get answers to a wide range of questions, provided those questions weren’t too difficult—the kind of thing one can do in a couple of hours. That would be massively useful as a research tool, even if it wasn’t itself capable of solving open problems.”
