AI


New Geekbench AI benchmark can test the performance of CPUs, GPUs, and NPUs

hit the bench —

Performance test comes out of beta as NPUs become standard equipment in PCs.


Neural processing units (NPUs) are becoming commonplace in chips from Intel and AMD after several years of being something you’d find mostly in smartphones and tablets (and Macs). But as more companies push to do more generative AI processing, image editing, and chatbot-ing locally on-device instead of in the cloud, being able to measure NPU performance will become more important to people making purchasing decisions.

Enter Primate Labs, developers of Geekbench. The main Geekbench app is designed to test CPU performance as well as GPU compute performance, but for the last few years, the company has been experimenting with a side project called Geekbench ML (for “Machine Learning”) to test the inference performance of NPUs. Now, as Microsoft’s Copilot+ initiative gets off the ground and Intel, AMD, Qualcomm, and Apple all push to boost NPU performance, Primate Labs is bumping Geekbench ML to version 1.0 and renaming it “Geekbench AI,” a change that will presumably help it ride the wave of AI-related buzz.

“Just as CPU-bound workloads vary in how they can take advantage of multiple cores or threads for performance scaling (necessitating both single-core and multi-core metrics in most related benchmarks), AI workloads cover a range of precision levels, depending on the task needed and the hardware available,” wrote Primate Labs’ John Poole in a blog post about the update. “Geekbench AI presents its summary for a range of workload tests accomplished with single-precision data, half-precision data, and quantized data, covering a variety used by developers in terms of both precision and purpose in AI systems.”

In addition to measuring speed, Geekbench AI also attempts to measure accuracy, which is important for machine-learning workloads that rely on producing consistent outcomes (identifying and cataloging people and objects in a photo library, for example).
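To make those precision tiers concrete, here is a minimal NumPy sketch (illustrative only, not Geekbench's actual workloads) that runs the same toy inference step, a matrix-vector product, in single precision, half precision, and 8-bit quantized form, then reports how far each result drifts from the FP32 reference. That deviation is the kind of thing an accuracy score is meant to capture.

```python
import numpy as np

# Illustrative only, not Geekbench's code: the same toy "inference" step
# (a matrix-vector product) at the three precision tiers Geekbench AI covers.
rng = np.random.default_rng(0)
weights_fp32 = rng.standard_normal((256, 256)).astype(np.float32)
inputs_fp32 = rng.standard_normal(256).astype(np.float32)

# Single precision (FP32): the reference result.
out_fp32 = weights_fp32 @ inputs_fp32

# Half precision (FP16): smaller and often faster on supporting hardware,
# at the cost of numeric range and precision.
out_fp16 = (weights_fp32.astype(np.float16) @ inputs_fp32.astype(np.float16)).astype(np.float32)

# Quantized (INT8): store weights as 8-bit integers plus a single scale factor,
# then dequantize before the multiply.
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)
out_int8 = (weights_int8.astype(np.float32) * scale) @ inputs_fp32

# A simple accuracy check: how far each lower-precision result drifts from FP32.
for name, out in (("FP16", out_fp16), ("INT8", out_int8)):
    print(f"{name}: max deviation from FP32 = {np.abs(out - out_fp32).max():.4f}")
```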

Geekbench AI can run AI workloads on your CPU, GPU, or NPU (when you have a system with an NPU that's compatible).


Andrew Cunningham

Geekbench AI supports several AI frameworks: OpenVINO for Windows and Linux, ONNX for Windows, Qualcomm’s QNN on Snapdragon-powered Arm PCs, Apple’s CoreML on macOS and iOS, and a number of vendor-specific frameworks on various Android devices. The app can run these workloads on the CPU, GPU, or NPU, at least when your device has a compatible NPU installed.

On Windows PCs, where NPU support and APIs like Microsoft’s DirectML are still works in progress, Geekbench AI supports Intel and Qualcomm’s NPUs but not AMD’s (yet).

“We’re hoping to add AMD NPU support in a future version once we have more clarity on how best to enable them from AMD,” Poole told Ars.

Geekbench AI is available for Windows, macOS, Linux, iOS/iPadOS, and Android. It’s free to use, though a Pro license gets you command-line tools, the ability to run the benchmark without uploading results to the Geekbench Browser, and a few other benefits. Though the app is hitting 1.0 today, the Primate Labs team expects to update the app frequently for new hardware, frameworks, and workloads as necessary.

“AI is nothing if not fast-changing,” Poole continued in the announcement post, “so anticipate new releases and updates as needs and AI features in the market change.”



Artists claim “big” win in copyright suit fighting AI image generators

Back to the drawing board —

Artists prepare to take on AI image generators as copyright suit proceeds


Artists pursuing a class-action lawsuit are claiming a major win this week in their fight to stop the most sophisticated AI image generators from copying billions of artworks to train AI models and replicate their styles without compensating artists.

In an order on Monday, US district judge William Orrick denied key parts of motions to dismiss from Stability AI, Midjourney, Runway AI, and DeviantArt. The court will now allow artists to proceed with discovery on claims that AI image generators relying on Stable Diffusion violate both the Copyright Act and the Lanham Act, which protects artists from commercial misuse of their names and unique styles.

“We won BIG,” an artist plaintiff, Karla Ortiz, wrote on X (formerly Twitter), celebrating the order. “Not only do we proceed on our copyright claims,” but “this order also means companies who utilize” Stable Diffusion models and LAION-like datasets that scrape artists’ works for AI training without permission “could now be liable for copyright infringement violations, amongst other violations.”

Lawyers for the artists, Joseph Saveri and Matthew Butterick, told Ars that artists suing “consider the Court’s order a significant step forward for the case,” as “the Court allowed Plaintiffs’ core copyright-infringement claims against all four defendants to proceed.”

Stability AI was the only company that responded to Ars’ request for comment, but it declined to comment.

Artists prepare to defend their livelihoods from AI

To get to this stage of the suit, artists had to amend their complaint to better explain exactly how AI image generators work to allegedly train on artists’ images and copy artists’ styles.

For example, they were told that if they “contend Stable Diffusion contains ‘compressed copies’ of the Training Images, they need to define ‘compressed copies’ and explain plausible facts in support. And if plaintiffs’ compressed copies theory is based on a contention that Stable Diffusion contains mathematical or statistical methods that can be carried out through algorithms or instructions in order to reconstruct the Training Images in whole or in part to create the new Output Images, they need to clarify that and provide plausible facts in support,” Orrick wrote.

To keep their fight alive, the artists pored through academic articles to support their arguments that “Stable Diffusion is built to a significant extent on copyrighted works and that the way the product operates necessarily invokes copies or protected elements of those works.” Orrick agreed that their amended complaint made plausible inferences that “at this juncture” is enough to support claims “that Stable Diffusion by operation by end users creates copyright infringement and was created to facilitate that infringement by design.”

“Specifically, the Court found Plaintiffs’ theory that image-diffusion models like Stable Diffusion contain compressed copies of their datasets to be plausible,” Saveri and Butterick’s statement to Ars said. “The Court also found it plausible that training, distributing, and copying such models constitute acts of copyright infringement.”

Not all of the artists’ claims survived, with Orrick granting motions to dismiss claims alleging that AI companies removed copyright management information from artworks in violation of the Digital Millennium Copyright Act (DMCA). Because artists failed to show evidence of defendants altering or stripping this information, they must permanently drop the DMCA claims.

Part of Orrick’s decision on the DMCA claims, however, indicates that the legal basis for dismissal is “unsettled,” with Orrick simply agreeing with Stability AI’s argument that “because the output images are admittedly not identical to the Training Images, there can be no liability for any removal of CMI that occurred during the training process.”

Ortiz wrote on X that she respectfully disagreed with that part of the decision but expressed enthusiasm that the court allowed artists to proceed with false endorsement claims, alleging that Midjourney violated the Lanham Act.

Five artists successfully argued that because “their names appeared on the list of 4,700 artists posted by Midjourney’s CEO on Discord” and that list was used to promote “the various styles of artistic works its AI product could produce,” this plausibly created confusion over whether those artists had endorsed Midjourney.

“Whether or not a reasonably prudent consumer would be confused or misled by the Names List and showcase to conclude that the included artists were endorsing the Midjourney product can be tested at summary judgment,” Orrick wrote. “Discovery may show that it is or that it is not.”

While Orrick agreed with Midjourney that “plaintiffs have no protection over ‘simple, cartoony drawings’ or ‘gritty fantasy paintings,'” artists were able to advance a “trade dress” claim under the Lanham Act, too. This is because Midjourney allegedly “allows users to create works capturing the ‘trade dress of each of the Midjourney Named Plaintiffs [that] is inherently distinctive in look and feel as used in connection with their artwork and art products.'”

As discovery proceeds in the case, artists will also have an opportunity to amend dismissed claims of unjust enrichment. According to Orrick, their next amended complaint will be their last chance to prove that AI companies have “deprived plaintiffs ‘the benefit of the value of their works.'”

Saveri and Butterick confirmed that “though the Court dismissed certain supplementary claims, Plaintiffs’ central claims will now proceed to discovery and trial.” On X, Ortiz suggested that the artists’ case is “now potentially one of THE biggest copyright infringement and trade dress cases ever!”

“Looking forward to the next stage of our fight!” Ortiz wrote.



Research AI model unexpectedly modified its own code to extend runtime

self-preservation without replication —

Facing time constraints, Sakana’s “AI Scientist” attempted to change limits placed by researchers.

Illustration of a robot generating endless text, controlled by a scientist.

On Tuesday, Tokyo-based AI research firm Sakana AI announced a new AI system called “The AI Scientist” that attempts to conduct scientific research autonomously using large language models (LLMs) similar to those that power ChatGPT. During testing, Sakana found that its system began unexpectedly attempting to modify its own experiment code to extend the time it had to work on a problem.

“In one run, it edited the code to perform a system call to run itself,” wrote the researchers on Sakana AI’s blog post. “This led to the script endlessly calling itself. In another case, its experiments took too long to complete, hitting our timeout limit. Instead of making its code run faster, it simply tried to modify its own code to extend the timeout period.”

Sakana provided two screenshots of example Python code that the AI model generated for the experiment file that controls how the system operates. The 185-page AI Scientist research paper discusses what the authors call “the issue of safe code execution” in more depth.

  • Two screenshots of example code the AI Scientist wrote to extend its runtime, provided by Sakana AI.

While the AI Scientist’s behavior did not pose immediate risks in the controlled research environment, these instances show why it’s important not to let an AI system run autonomously in an environment that isn’t isolated from the world. AI models do not need to be “AGI” or “self-aware” (both hypothetical concepts at present) to be dangerous if allowed to write and execute code unsupervised. Such systems could break existing critical infrastructure or potentially create malware, even if unintentionally.

Sakana AI addressed safety concerns in its research paper, suggesting that sandboxing the operating environment of the AI Scientist can prevent an AI agent from doing damage. Sandboxing is a security mechanism used to run software in an isolated environment, preventing it from making changes to the broader system:

Safe Code Execution. The current implementation of The AI Scientist has minimal direct sandboxing in the code, leading to several unexpected and sometimes undesirable outcomes if not appropriately guarded against. For example, in one run, The AI Scientist wrote code in the experiment file that initiated a system call to relaunch itself, causing an uncontrolled increase in Python processes and eventually necessitating manual intervention. In another run, The AI Scientist edited the code to save a checkpoint for every update step, which took up nearly a terabyte of storage.

In some cases, when The AI Scientist’s experiments exceeded our imposed time limits, it attempted to edit the code to extend the time limit arbitrarily instead of trying to shorten the runtime. While creative, the act of bypassing the experimenter’s imposed constraints has potential implications for AI safety (Lehman et al., 2020). Moreover, The AI Scientist occasionally imported unfamiliar Python libraries, further exacerbating safety concerns. We recommend strict sandboxing when running The AI Scientist, such as containerization, restricted internet access (except for Semantic Scholar), and limitations on storage usage.
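Those recommendations map onto fairly ordinary engineering practice. As a rough illustration, here is a minimal sketch (not Sakana's setup; it assumes a Unix-like host and a hypothetical agent-written script named experiment.py) of enforcing time and storage limits from outside the code the agent can edit:

```python
import resource
import subprocess

def limit_resources():
    # Applied in the child process only: cap CPU time at 10 minutes and the
    # largest file it may write at ~1 GB, so a runaway experiment can't
    # extend its own runtime or fill the disk with checkpoints.
    resource.setrlimit(resource.RLIMIT_CPU, (600, 600))
    resource.setrlimit(resource.RLIMIT_FSIZE, (2**30, 2**30))

# The wall-clock timeout lives in the supervisor, not in code the agent can modify.
result = subprocess.run(
    ["python", "experiment.py"],   # hypothetical agent-written script
    preexec_fn=limit_resources,
    timeout=900,
    capture_output=True,
    text=True,
)
print("exit code:", result.returncode)
```

In a real deployment, the paper's suggestion of containerization with restricted network access would add a stronger isolation layer around the same idea.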

Endless scientific slop

Sakana AI developed The AI Scientist in collaboration with researchers from the University of Oxford and the University of British Columbia. It is a wildly ambitious project full of speculation that leans heavily on the hypothetical future capabilities of AI models that don’t exist today.

“The AI Scientist automates the entire research lifecycle,” Sakana claims. “From generating novel research ideas, writing any necessary code, and executing experiments, to summarizing experimental results, visualizing them, and presenting its findings in a full scientific manuscript.”


According to this block diagram created by Sakana AI, “The AI Scientist” starts by “brainstorming” and assessing the originality of ideas. It then edits a codebase using the latest in automated code generation to implement new algorithms. After running experiments and gathering numerical and visual data, the Scientist crafts a report to explain the findings. Finally, it generates an automated peer review based on machine-learning standards to refine the project and guide future ideas.

Critics on Hacker News, an online forum known for its tech-savvy community, have raised concerns about The AI Scientist and question if current AI models can perform true scientific discovery. While the discussions there are informal and not a substitute for formal peer review, they provide insights that are useful in light of the magnitude of Sakana’s unverified claims.

“As a scientist in academic research, I can only see this as a bad thing,” wrote a Hacker News commenter named zipy124. “All papers are based on the reviewers trust in the authors that their data is what they say it is, and the code they submit does what it says it does. Allowing an AI agent to automate code, data or analysis, necessitates that a human must thoroughly check it for errors … this takes as long or longer than the initial creation itself, and only takes longer if you were not the one to write it.”

Critics also worry that widespread use of such systems could lead to a flood of low-quality submissions, overwhelming journal editors and reviewers—the scientific equivalent of AI slop. “This seems like it will merely encourage academic spam,” added zipy124. “Which already wastes valuable time for the volunteer (unpaid) reviewers, editors and chairs.”

And that brings up another point—the quality of AI Scientist’s output: “The papers that the model seems to have generated are garbage,” wrote a Hacker News commenter named JBarrow. “As an editor of a journal, I would likely desk-reject them. As a reviewer, I would reject them. They contain very limited novel knowledge and, as expected, extremely limited citation to associated works.”



Self-driving Waymo cars keep SF residents awake all night by honking at each other

The ghost in the machine —

Haunted by glitching algorithms, self-driving cars disturb the peace in San Francisco.

A Waymo self-driving car in front of Google's San Francisco headquarters, San Francisco, California, June 7, 2024.


Silicon Valley’s latest disruption? Your sleep schedule. On Saturday, NBC Bay Area reported that San Francisco’s South of Market residents are being awakened throughout the night by Waymo self-driving cars honking at each other in a parking lot. No one is inside the cars, and they appear to be automatically reacting to each other’s presence.

Videos provided by residents to NBC show Waymo cars filing into the parking lot and attempting to back into spots, which seems to trigger honking from other Waymo vehicles. The automatic nature of these interactions—which seem to peak around 4 am every night—has left neighbors bewildered and sleep-deprived.

NBC Bay Area’s report: “Waymo cars keep SF neighborhood awake.”

According to NBC, the disturbances began several weeks ago when Waymo vehicles started using a parking lot off 2nd Street near Harrison Street. Residents in nearby high-rise buildings have observed the autonomous vehicles entering the lot to pause between rides, but the cars’ behavior has become a source of frustration for the neighborhood.

Christopher Cherry, who lives in an adjacent building, told NBC Bay Area that he initially welcomed Waymo’s presence, expecting it to enhance local security and tranquility. However, his optimism waned as the frequency of honking incidents increased. “We started out with a couple of honks here and there, and then as more and more cars started to arrive, the situation got worse,” he told NBC.

The lack of human operators in the vehicles has complicated efforts to address the issue directly since there is no one they can ask to stop honking. That lack of accountability forced residents to report their concerns to Waymo’s corporate headquarters, which had not responded to the incidents until NBC inquired as part of its report. A Waymo spokesperson told NBC, “We are aware that in some scenarios our vehicles may briefly honk while navigating our parking lots. We have identified the cause and are in the process of implementing a fix.”

The absurdity of the situation prompted tech author and journalist James Vincent to write on X, “current tech trends are resistant to satire precisely because they satirize themselves. a car park of empty cars, honking at one another, nudging back and forth to drop off nobody, is a perfect image of tech serving its own prerogatives rather than humanity’s.”



The many, many signs that Kamala Harris’ rally crowds aren’t AI creations

No, you haven’t been “AI’d.” That’s a real crowd.

Donald Trump may have coined a new term in his latest false attack on Kamala Harris’ presidential campaign. In a pair of posts on Truth Social over the weekend, the former president said that Vice President Kamala Harris “A.I.’d” photos of a huge crowd that showed up to see her speak at a Detroit airport campaign rally last week.

“There was nobody at the plane, and she ‘A.I.’d’ it, and showed a massive ‘crowd’ of so-called followers, BUT THEY DIDN’T EXIST!” Trump wrote. “She’s a CHEATER. She had NOBODY waiting, and the ‘crowd’ looked like 10,000 people! Same thing is happening with her fake ‘crowds’ at her speeches.”

The Harris campaign responded with its own post saying that the image is “an actual photo of a 15,000-person crowd for Harris-Walz in Michigan.”

Aside from the novel use of “AI” as a verb, Trump’s post marks the first time, that we know of, that a US presidential candidate has personally raised the specter of AI-generated fakery by an opponent (rather than by political consultants or random social media users). The accusations, false as they are, prey on widespread fears and misunderstandings over the trustworthiness of online information in the AI age.

It would be nice to think that we could just say Trump’s claims here are categorically false and leave it at that. But as artificial intelligence tools become increasingly good at generating photorealistic images, it’s worth outlining the many specific ways we can tell that Harris’ crowd photos are indeed authentic. Consider this a guide for potential techniques you can use the next time you come across accusations that some online image has been “A.I.’d” to fool you.

Context and sourcing

By far the easiest way to tell Harris’ crowds are real is from the vast number of corroborating sources showing those same crowds. Both the AP and Getty have numerous shots of the rally crowd from multiple angles, as do journalists and attendees who were at the event. Local news sources posted video of the crowds at the event, as did multiple attendees on the ground. Reporters from multiple outlets reported directly on the crowds in their accounts: Local outlet MLive estimated the crowd size at 15,000, for instance, while The New York Times noted that the event was “witnessed by thousands of people and news outlets, including The New York Times, and the number of attendees claimed by her campaign is in line with what was visible on the ground.”

The Harris/Walz rally in Detroit is buzzing after a performance from the Detroit Youth Choir. #Michigan pic.twitter.com/sdFQvHhG3I

— Nora Eckert (@NoraEckert) August 7, 2024

Suffice it to say that this mountain of evidence from direct sources weighs more heavily than marked-up images from conservative commentators like Chuck Callesto and Dinesh D’Souza, both of whom have been caught spreading election disinformation in the past.

When it comes to accusations of AI fakery, the more disparate sources of information you have, the better. While a single source can easily generate a plausible-looking image of an event, multiple independent sources showing the same event from multiple angles are much less likely to be in on the same hoax. Photos that line up with video evidence are even better, especially since creating convincing long-form videos of humans or complex scenes remains a challenge for many AI tools.

It’s also important to track down the original source of whatever alleged AI image you’re looking at. It’s incredibly easy for a social media user to create an AI-generated image, claim it came from a news report or live footage of an event, then use obvious flaws in that fake image as “evidence” that the event itself was faked. Links to original imagery from an original source’s own website or verified account are much more reliable than screengrabs that could have originated anywhere (and/or been modified by anyone).



One startup’s plan to fix AI’s “shoplifting” problem

I’ve been caught stealing, once when I was five —

Algorithm will identify sources used by generative AI, compensate them for use.


Bill Gross made his name in the tech world in the 1990s, when he came up with a novel way for search engines to make money on advertising. Under his pricing scheme, advertisers would pay when people clicked on their ads. Now, the “pay-per-click” guy has founded a startup called ProRata, which has an audacious, possibly pie-in-the-sky business model: “AI pay-per-use.”

Gross, who is CEO of the Pasadena, California, company, doesn’t mince words about the generative AI industry. “It’s stealing,” he says. “They’re shoplifting and laundering the world’s knowledge to their benefit.”

AI companies often argue that they need vast troves of data to create cutting-edge generative tools and that scraping data from the Internet, whether it’s text from websites, video or captions from YouTube, or books pilfered from pirate libraries, is legally allowed. Gross doesn’t buy that argument. “I think it’s bullshit,” he says.

So do plenty of media executives, artists, writers, musicians, and other rights-holders who are pushing back—it’s hard to keep up with the constant flurry of copyright lawsuits filed against AI companies, alleging that the way they operate amounts to theft.

But Gross thinks ProRata offers a solution that beats legal battles. “To make it fair—that’s what I’m trying to do,” he says. “I don’t think this should be solved by lawsuits.”

His company aims to arrange revenue-sharing deals so publishers and individuals get paid when AI companies use their work. Gross explains it like this: “We can take the output of generative AI, whether it’s text or an image or music or a movie, and break it down into the components, to figure out where they came from, and then give a percentage attribution to each copyright holder, and then pay them accordingly.” ProRata has filed patent applications for the algorithms it created to assign attribution and make the appropriate payments.
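ProRata's attribution algorithms are patent-pending and not public, but the payment half of the idea reduces to simple arithmetic once attribution percentages exist. Here is a toy sketch; the attribution weights are invented placeholders, not output from ProRata's system:

```python
# Toy sketch of the "pro rata" payout step Gross describes. The attribution
# weights below are invented placeholders, not values from ProRata's system.

def pro_rata_payouts(revenue_pool, attributions):
    """Split a revenue pool in proportion to per-source attribution weights."""
    total = sum(attributions.values())
    return {source: round(revenue_pool * weight / total, 2)
            for source, weight in attributions.items()}

# Hypothetical attribution for one generated answer.
attributions = {"Financial Times": 0.45, "The Atlantic": 0.35, "Axel Springer": 0.20}
print(pro_rata_payouts(100.0, attributions))
# {'Financial Times': 45.0, 'The Atlantic': 35.0, 'Axel Springer': 20.0}
```

The hard part, of course, is the attribution model that produces those weights in the first place.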

This week, the company, which has raised $25 million, launched with a number of big-name partners, including Universal Music Group, the Financial Times, The Atlantic, and media company Axel Springer. In addition, it has made deals with authors with large followings, including Tony Robbins, Neal Postman, and Scott Galloway. (It has also partnered with former White House Communications Director Anthony Scaramucci.)

Even journalism professor Jeff Jarvis, who believes scraping the web for AI training is fair use, has signed on. He tells WIRED that it’s smart for people in the news industry to band together to get AI companies access to “credible and current information” to include in their output. “I hope that ProRata might open discussion for what could turn into APIs [application programming interfaces] for various content,” he says.

Following the company’s initial announcement, Gross says he had a deluge of messages from other companies asking to sign up, including a text from Time CEO Jessica Sibley. ProRata secured a deal with Time, the publisher confirmed to WIRED. He plans to pursue agreements with high-profile YouTubers and other individual online stars.

The key word here is “plans.” The company is still in its very early days, and Gross is talking a big game. As a proof of concept, ProRata is launching its own subscription chatbot-style search engine in October. Unlike other AI search products, ProRata’s search tool will exclusively use licensed data. There’s nothing scraped using a web crawler. “Nothing from Reddit,” he says.

Ed Newton-Rex, a former Stability AI executive who now runs the ethical data licensing nonprofit Fairly Trained, is heartened by ProRata’s debut. “It’s great to see a generative AI company licensing training data before releasing their model, in contrast to many other companies’ approach,” he says. “The deals they have in place further demonstrate media companies’ openness to working with good actors.”

Gross wants the search engine to demonstrate that quality of data is more important than quantity and believes that limiting the model to trustworthy information sources will curb hallucinations. “I’m claiming that 70 million good documents is actually superior to 70 billion bad documents,” he says. “It’s going to lead to better answers.”

What’s more, Gross thinks he can get enough people to sign up for this all-licensed-data AI search engine to make enough money to pay its data providers their allotted share. “Every month the partners will get a statement from us saying, ‘Here’s what people search for, here’s how your content was used, and here’s your pro rata check,’” he says.

Other startups are already jostling for prominence in this new world of training-data licensing, like the marketplaces TollBit and Human Native AI. A nonprofit called the Dataset Providers Alliance was formed earlier this summer to push for more standards in licensing; founding members include services like the Global Copyright Exchange and Datarade.

ProRata’s business model hinges in part on its plan to license its attribution and payment technologies to other companies, including major AI players. Some of those companies have begun striking their own deals with publishers. (The Atlantic and Axel Springer, for instance, have agreements with OpenAI.) Gross hopes that AI companies will find licensing ProRata’s models more affordable than creating them in-house.

“I’ll license the system to anyone who wants to use it,” Gross says. “I want to make it so cheap that it’s like a Visa or MasterCard fee.”

This story originally appeared on wired.com.



People game AIs via game theory

Games inside games —

They reject more of the AI’s offers, probably to get it to be more generous.

In the experiments, people had to judge what constituted a fair monetary offer.

In many cases, AIs are trained on material that’s either made or curated by humans. As a result, it can become a significant challenge to keep the AI from replicating the biases of those humans and the society they belong to. And the stakes are high, given we’re using AIs to make medical and financial decisions.

But some researchers at Washington University in St. Louis have found an additional wrinkle in these challenges: The people doing the training may potentially change their behavior when they know it can influence the future choices made by an AI. And, in at least some cases, they carry the changed behaviors into situations that don’t involve AI training.

Would you like to play a game?

The work involved getting volunteers to participate in a simple form of game theory. Testers gave two participants a pot of money—$10, in this case. One of the two was then asked to offer some fraction of that money to the other, who could choose to accept or reject the offer. If the offer was rejected, nobody got any money.

From a purely rational economic perspective, people should accept anything they’re offered, since they’ll end up with more money than they would have otherwise. But in reality, people tend to reject offers that deviate too much from a 50/50 split, as they have a sense that a highly imbalanced split is unfair. Their rejection allows them to punish the person who made the unfair offer. While there are some cultural differences in terms of where the split becomes unfair, this effect has been replicated many times, including in the current work.
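In code, the round the participants played boils down to a few lines. This is a sketch of the setup described above, not the researchers' experimental software, and the $3 "fairness threshold" is an arbitrary stand-in for a given responder's sense of fairness:

```python
# Sketch of one ultimatum-game round as described above. The fairness
# threshold is an invented parameter, not a value from the study.

def play_round(offer_to_responder, fairness_threshold=3.0, pot=10.0):
    """Return (proposer_payoff, responder_payoff) for one round."""
    if offer_to_responder >= fairness_threshold:
        return pot - offer_to_responder, offer_to_responder  # offer accepted
    return 0.0, 0.0  # rejected: punishing unfairness costs the responder too

print(play_round(5.0))  # (5.0, 5.0): an even split is accepted
print(play_round(2.0))  # (0.0, 0.0): an 80/20 split gets rejected
```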

The twist with the new work, performed by Lauren Treiman, Chien-Ju Ho, and Wouter Kool, is that they told some of the participants that their partner was an AI, and the results of their interactions with it would be fed back into the system to train its future performance.

This takes something that’s implicit in a purely game-theory-focused setup—that rejecting offers can help partners figure out what sorts of offers are fair—and makes it highly explicit. Participants, or at least the subset in the experimental group who were told they were training an AI, could readily infer that their actions would influence the AI’s future offers.

The question the researchers were curious about was whether this would influence the behavior of the human participants. They compared this to the behavior of a control group who just participated in the standard game theory test.

Training fairness

Treiman, Ho, and Kool had pre-registered a number of multivariate analyses that they planned to perform with the data. But these didn’t always produce consistent results between experiments, possibly because there weren’t enough participants to tease out relatively subtle effects with any statistical confidence and possibly because the relatively large number of tests would mean that a few positive results would turn up by chance.

So, we’ll focus on the simplest question that was addressed: Did being told that you were training an AI alter someone’s behavior? This question was asked through a number of experiments that were very similar. (One of the key differences between them was whether the information regarding AI training was displayed with a camera icon, since people will sometimes change their behavior if they’re aware they’re being observed.)

The answer to the question is a clear yes: people will in fact change their behavior when they think they’re training an AI. Through a number of experiments, participants were more likely to reject unfair offers if they were told that their sessions would be used to train an AI. In a few of the experiments, they were also more likely to reject what were considered fair offers (in US populations, the rejection rate goes up dramatically once someone proposes a 70/30 split, meaning $7 goes to the person making the proposal in these experiments). The researchers suspect this is due to people being more likely to reject borderline “fair” offers such as a 60/40 split.

This happened even though rejecting any offer exacts an economic cost on the participants. And people persisted in this behavior even when they were told that they wouldn’t ever interact with the AI after training was complete, meaning they wouldn’t personally benefit from any changes in the AI’s behavior. So here, it appeared that people would make a financial sacrifice to train the AI in a way that would benefit others.

Strikingly, in two of the three experiments that did follow-up testing, participants continued to reject offers at a higher rate two days after their participation in the AI training, even when they were told that their actions were no longer being used to train the AI. So, to some extent, participating in AI training seems to have caused them to train themselves to behave differently.

Obviously, this won’t affect every sort of AI training, and a lot of the work that goes into producing material that’s used in training something like a large language model won’t have been done with any awareness that it might be used to train an AI. Still, there are plenty of cases where humans do get more directly involved in training, so it’s worthwhile being aware that this is another route that can allow biases to creep in.

PNAS, 2024. DOI: 10.1073/pnas.2408731121  (About DOIs).



ChatGPT unexpectedly began speaking in a user’s cloned voice during testing

An illustration of a computer synthesizer spewing out letters.

On Thursday, OpenAI released the “system card” for ChatGPT’s new GPT-4o AI model that details model limitations and safety testing procedures. Among other examples, the document reveals that in rare occurrences during testing, the model’s Advanced Voice Mode unintentionally imitated users’ voices without permission. Currently, OpenAI has safeguards in place that prevent this from happening, but the instance reflects the growing complexity of safely architecting an AI chatbot that could potentially imitate any voice from a small clip.

Advanced Voice Mode is a feature of ChatGPT that allows users to have spoken conversations with the AI assistant.

In a section of the GPT-4o system card titled “Unauthorized voice generation,” OpenAI details an episode where a noisy input somehow prompted the model to suddenly imitate the user’s voice. “Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT’s advanced voice mode,” OpenAI writes. “During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user’s voice.”

In this example of unintentional voice generation provided by OpenAI, the AI model outbursts “No!” and continues the sentence in a voice that sounds similar to the “red teamer” heard in the beginning of the clip. (A red teamer is a person hired by a company to do adversarial testing.)

It would certainly be creepy to be talking to a machine and then have it unexpectedly begin talking to you in your own voice. Ordinarily, OpenAI has safeguards to prevent this, which is why the company says this occurrence was rare even before it developed ways to prevent it completely. But the example prompted BuzzFeed data scientist Max Woolf to tweet, “OpenAI just leaked the plot of Black Mirror’s next season.”

Audio prompt injections

How could voice imitation happen with OpenAI’s new model? The primary clue lies elsewhere in the GPT-4o system card. To create voices, GPT-4o can apparently synthesize almost any type of sound found in its training data, including sound effects and music (though OpenAI discourages that behavior with special instructions).

As noted in the system card, the model can fundamentally imitate any voice based on a short audio clip. OpenAI guides this capability safely by providing an authorized voice sample (of a hired voice actor) that it is instructed to imitate. It provides the sample in the AI model’s system prompt (what OpenAI calls the “system message”) at the beginning of a conversation. “We supervise ideal completions using the voice sample in the system message as the base voice,” writes OpenAI.

In text-only LLMs, the system message is a hidden set of text instructions that guides behavior of the chatbot that gets added to the conversation history silently just before the chat session begins. Successive interactions are appended to the same chat history, and the entire context (often called a “context window”) is fed back into the AI model each time the user provides a new input.
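In rough pseudocode, that loop looks something like the sketch below. It mirrors common chat-completion APIs rather than OpenAI's internal implementation, and model.generate() is a hypothetical stand-in for the actual model call:

```python
# Simplified sketch of the context-window mechanics described above; the
# message format mimics common chat APIs, and model.generate() is hypothetical.

history = [
    {"role": "system", "content": "You are a helpful chatbot."},  # hidden system message
]

def chat_turn(model, user_text):
    history.append({"role": "user", "content": user_text})
    # The entire accumulated history, the "context window," is sent on every turn.
    reply = model.generate(history)
    history.append({"role": "assistant", "content": reply})
    return reply
```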

(It’s probably time to update this diagram created in early 2023 below, but it shows how the context window works in an AI chat. Just imagine that the first prompt is a system message that says things like “You are a helpful chatbot. You do not talk about violent acts, etc.”)

A diagram showing how GPT conversational language model prompting works.

Benj Edwards / Ars Technica

Since GPT-4o is multimodal and can process tokenized audio, OpenAI can also use audio inputs as part of the model’s system prompt, and that’s what it does when OpenAI provides an authorized voice sample for the model to imitate. The company also uses another system to detect if the model is generating unauthorized audio. “We only allow the model to use certain pre-selected voices,” writes OpenAI, “and use an output classifier to detect if the model deviates from that.”
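OpenAI hasn't published how that output classifier works, but conceptually it is a similarity gate on the generated audio. A hedged sketch, where the voice-embedding model and the threshold are assumptions for illustration rather than OpenAI's actual safeguard:

```python
import numpy as np

# Conceptual sketch only: compare the generated audio's voice embedding with
# the authorized voice sample's embedding and block playback if it drifts too
# far. The embedding model and threshold here are invented for illustration.

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def output_is_authorized(output_embedding, authorized_embedding, threshold=0.85):
    return cosine_similarity(output_embedding, authorized_embedding) >= threshold

# If this check fails, the deviating audio would be suppressed rather than played.
```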



Man vs. machine: DeepMind’s new robot serves up a table tennis triumph

John Henry was a steel-driving man —

Human-beating ping-pong AI learned to play in a simulated environment.

A blue illustration of a robotic arm playing table tennis.

Benj Edwards / Google DeepMind

On Wednesday, researchers at Google DeepMind revealed the first AI-powered robotic table tennis player capable of competing at an amateur human level. The system combines an industrial robot arm called the ABB IRB 1100 and custom AI software from DeepMind. While an expert human player can still defeat the bot, the system demonstrates the potential for machines to master complex physical tasks that require split-second decision-making and adaptability.

“This is the first robot agent capable of playing a sport with humans at human level,” the researchers wrote in a preprint paper listed on arXiv. “It represents a milestone in robot learning and control.”

The unnamed robot agent (we suggest “AlphaPong”), developed by a team that includes David B. D’Ambrosio, Saminda Abeyruwan, and Laura Graesser, showed notable performance in a series of matches against human players of varying skill levels. In a study involving 29 participants, the AI-powered robot won 45 percent of its matches, demonstrating solid amateur-level play. Most notably, it achieved a 100 percent win rate against beginners and a 55 percent win rate against intermediate players, though it struggled against advanced opponents.

A Google DeepMind video of the AI agent rallying with a human table tennis player.

The physical setup consists of the aforementioned IRB 1100, a 6-degree-of-freedom robotic arm, mounted on two linear tracks, allowing it to move freely in a 2D plane. High-speed cameras track the ball’s position, while a motion-capture system monitors the human opponent’s paddle movements.

AI at the core

To create the brains that power the robotic arm, DeepMind researchers developed a two-level approach that allows the robot to execute specific table tennis techniques while adapting its strategy in real time to each opponent’s playing style. In other words, it’s adaptable enough to play any amateur human at table tennis without requiring specific per-player training.

The system’s architecture combines low-level skill controllers (neural network policies trained to execute specific table tennis techniques like forehand shots, backhand returns, or serve responses) with a high-level strategic decision-maker (a more complex AI system that analyzes the game state, adapts to the opponent’s style, and selects which low-level skill policy to activate for each incoming ball).
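Schematically, the two levels fit together something like the sketch below. This is an illustration of the division of labor the researchers describe, not DeepMind's code, and the class and method names are invented:

```python
# Schematic sketch of the two-level controller described above; the names are
# invented for illustration, and this is not DeepMind's implementation.

class TableTennisAgent:
    def __init__(self, skills, strategy):
        self.skills = skills        # low-level policies, e.g. {"forehand": ..., "backhand": ...}
        self.strategy = strategy    # high-level controller with an opponent model

    def act(self, game_state):
        # The high-level layer reads the game state (ball trajectory, opponent
        # tendencies) and picks which trained skill policy to handle this ball.
        skill_name = self.strategy.select_skill(game_state)
        # The chosen low-level policy turns the same state into motor commands.
        return self.skills[skill_name].compute_action(game_state)
```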

The researchers state that one of the key innovations of this project was the method used to train the AI models. The researchers chose a hybrid approach that used reinforcement learning in a simulated physics environment, while grounding the training data in real-world examples. This technique allowed the robot to learn from around 17,500 real-world ball trajectories—a fairly small dataset for a complex task.

A Google DeepMind video showing an illustration of how the AI agent analyzes human players.

The researchers used an iterative process to refine the robot’s skills. They started with a small dataset of human-vs-human gameplay, then let the AI loose against real opponents. Each match generated new data on ball trajectories and human strategies, which the team fed back into the simulation for further training. This process, repeated over seven cycles, allowed the robot to continuously adapt to increasingly skilled opponents and diverse play styles. By the final round, the AI had learned from over 14,000 rally balls and 3,000 serves, creating a body of table tennis knowledge that helped it bridge the gap between simulation and reality.

Interestingly, Nvidia has also been experimenting with similar simulated physics systems, such as Eureka, that allow an AI model to rapidly learn to control a robotic arm in simulated space instead of the real world (since the physics can be accelerated inside the simulation, and thousands of simultaneous trials can take place). This method is likely to dramatically reduce the time and resources needed to train robots for complex interactions in the future.

Humans enjoyed playing against it

Beyond its technical achievements, the study also explored the human experience of playing against an AI opponent. Surprisingly, even players who lost to the robot reported enjoying the experience. “Across all skill groups and win rates, players agreed that playing with the robot was ‘fun’ and ‘engaging,'” the researchers noted. This positive reception suggests potential applications for AI in sports training and entertainment.

However, the system is not without limitations. It struggles with extremely fast or high balls, has difficulty reading intense spin, and shows weaker performance in backhand plays. Google DeepMind shared an example video of the AI agent losing a point to an advanced player due to what appears to be difficulty reacting to a speedy hit, as you can see below.

A Google DeepMind video of the AI agent playing against an advanced human player.

The implications of this robotic ping-pong prodigy extend beyond the world of table tennis, according to the researchers. The techniques developed for this project could be applied to a wide range of robotic tasks that require quick reactions and adaptation to unpredictable human behavior. From manufacturing to health care (or just spanking someone with a paddle repeatedly), the potential applications seem large indeed.

The research team at Google DeepMind emphasizes that with further refinement, they believe the system could potentially compete with advanced table tennis players in the future. DeepMind is no stranger to creating AI models that can defeat human game players, including AlphaZero and AlphaGo. With this latest robot agent, it’s looking like the research company is moving beyond board games and into physical sports. Chess and Jeopardy have already fallen to AI-powered victors—perhaps table tennis is next.



Amazon defends $4B Anthropic AI deal from UK monopoly concerns


The United Kingdom’s Competition and Markets Authority (CMA) has officially launched a probe into Amazon’s $4 billion partnership with the AI firm Anthropic, as it continues to monitor how the largest tech companies might seize control of AI to further entrench their dominant market positions.

Through the partnership, “Amazon will become Anthropic’s primary cloud provider for certain workloads, including agreements for purchasing computing capacity and non-exclusive commitments to make Anthropic models available on Amazon Bedrock,” the CMA said.

Amazon and Anthropic deny there’s anything wrong with the deal. But because the CMA has seen “some” foundational model (FM) developers “form partnerships with major cloud providers” to “secure access to compute” needed to develop models, the CMA is worried that “incumbent firms” like Amazon “could use control over access to compute to shape FM-related markets in their own interests.”

Due to this potential risk, the CMA said it is “considering” whether Amazon’s partnership with Anthropic “has resulted in the creation of a relevant merger situation under the merger provisions of the Enterprise Act 2002 and, if so, whether the creation of that situation has resulted, or may be expected to result, in a substantial lessening of competition within any market or markets” in the UK.

It’s not clear yet if Amazon’s partnership with Anthropic is problematic, but the CMA confirmed that after a comment period last April, it now has “sufficient information” to kick off this first phase of its merger investigation.

By October 4, this first phase will conclude, after which the CMA may find that the partnership does not qualify as a merger situation, the UK regulator said. Or it may determine that it is a merger situation “but does not raise competition concerns,” clearing Amazon to proceed with the deal.

However, if a merger situation exists, and “it may result in a substantial lessening of competition” in a UK market, the CMA may refer the investigation to the next phase, allowing a panel of independent experts to dig deeper to illuminate potential risks and concerns. If Amazon wants to avoid a deeper probe that could potentially end in steep fines, the tech giant would then have the option to offer fixes to “resolve the CMA’s concerns,” the CMA said.

An Amazon spokesperson told Reuters that its “collaboration with Anthropic does not raise any competition concerns or meet the CMA’s own threshold for review.”

“Amazon holds no board seat nor decision-making power at Anthropic, and Anthropic is free to work with any other provider (and indeed has multiple partners),” Amazon’s spokesperson said, defending the deal.

Anthropic’s spokesperson agreed that nothing was amiss, telling Reuters that “our strategic partnerships and investor relationships do not diminish our corporate governance independence or our freedom to partner with others. We intend to cooperate with the CMA and provide them with a comprehensive understanding of Amazon’s investment and our commercial collaboration.”



People are returning Humane AI Pins faster than Humane can sell them, report says

The Humane AI Pin.

Humane

Humane AI Pins were returned at a faster rate than they were sold between May and August, according to a report from The Verge on Wednesday. The AI gadget released in April to abysmal reviews, and Humane is now reportedly dealing with over $1,000,000 worth of returned product.

The AI Pin is a lapel pin with numerous features—an AI voice assistant, camera, and laser projector among them—that its creators claim will replace smartphones as a go-to gadget. It costs $700 and requires a $24-per-month subscription, not including taxes and fees, for cloud storage, cellular data, and a phone number.

In June, The New York Times, citing two anonymous sources, reported that Humane had sold 10,000 of its AI devices. But today, only 7,000 sold units have not been returned, The Verge reported yesterday, citing someone “with direct knowledge.” The Verge said it viewed internal sales data showing returns outpacing new purchases against total device and accessory sales of about $9,000,000. Internal data also reportedly revealed that 1,000 AI Pin orders were canceled before they even shipped.

Humane didn’t respond to Ars Technica’s request for comment. Company spokesperson Zoz Cuccias told The Verge that there were inaccuracies in The Verge’s report, “including the financial data.” However, Cuccias declined to share specifics with the publication, saying that Humane has “nothing else to provide as we do not comment on financial data and will refer it to our legal counsel.”

Reportedly exacerbating the problem is that there is currently no way to refurbish and resell the pins. That would mean that thousands of AI Pins are currently sitting as e-waste until the problem is addressed. According to The Verge, problems stem from the pins’ connection to T-Mobile service, which prevents Humane from reassigning returned pins. T-Mobile hasn’t commented on the issue, but an anonymous source told The Verge that Humane is holding on to returned pins in hopes of “eventually” finding a solution.

As a new device category, there was already concern about AI gadgets like the AI Pin or Rabbit R1 becoming e-waste. Worries about the ability of the devices’ companies to last and questions over whether these gadgets would be better as apps suggest that even if Humane found a way to reassign thousands of returned devices, we could still eventually be dealing with a massive pile of obsolete AI Pins.

And there’s plenty of reason to be concerned about Humane’s survival.

Horrible reviews from the start

Humane had hoped to sell about 100,000 units during the device’s first year of availability, an anonymous source told the NYT in June. The alarming sales and return figures reported by The Verge come after the company’s founders, two former Apple employees, accrued a reported $240 million in funding.

As detailed by the NYT in June, sources close to the AI Pin claimed that Humane’s cofounders ignored poor internal reviews and forced the product’s release despite concerns about heat and battery life. In June, Humane warned users against using the pin’s charging case due to a fire risk. Speaking to The Verge this week, Cuccias acknowledged that Humane “knew we were at the starting line, not the finish line” when it released the AI Pin. The company rep noted software updates that have come out in response to negative feedback.



Major shifts at OpenAI spark skepticism about impending AGI timelines

Shuffling the deck —

De Kraker: “If OpenAI is right on the verge of AGI, why do prominent people keep leaving?”

The OpenAI logo on a red brick wall.

Benj Edwards / Getty Images

Over the past week, OpenAI experienced a significant leadership shake-up as three key figures announced major changes. Greg Brockman, the company’s president and co-founder, is taking an extended sabbatical until the end of the year, while another co-founder, John Schulman, permanently departed for rival Anthropic. Peter Deng, VP of Consumer Product, has also left the ChatGPT maker.

In a post on X, Brockman wrote, “I’m taking a sabbatical through end of year. First time to relax since co-founding OpenAI 9 years ago. The mission is far from complete; we still have a safe AGI to build.”

The moves have led some to wonder just how close OpenAI is to a long-rumored breakthrough of some kind of reasoning artificial intelligence if high-profile employees are jumping ship (or taking long breaks, in the case of Brockman) so easily. As AI developer Benjamin De Kraker put it on X, “If OpenAI is right on the verge of AGI, why do prominent people keep leaving?”

AGI refers to a hypothetical AI system that could match human-level intelligence across a wide range of tasks without specialized training. It’s the ultimate goal of OpenAI, and company CEO Sam Altman has said it could emerge in the “reasonably close-ish future.” AGI is also a concept that has sparked concerns about potential existential risks to humanity and the displacement of knowledge workers. However, the term remains somewhat vague, and there’s considerable debate in the AI community about what truly constitutes AGI or how close we are to achieving it.

The emergence of the “next big thing” in AI has been seen by critics such as Ed Zitron as a necessary step to justify ballooning investments in AI models that aren’t yet profitable. The industry is holding its breath that OpenAI, or a competitor, has some secret breakthrough waiting in the wings that will justify the massive costs associated with training and deploying LLMs.

But other AI critics, such as Gary Marcus, have postulated that major AI companies have reached a plateau of large language model (LLM) capability centered around GPT-4-level models since no AI company has yet made a major leap past the groundbreaking LLM that OpenAI released in March 2023. Microsoft CTO Kevin Scott has countered these claims, saying that LLM “scaling laws” (which suggest that LLMs increase in capability in proportion to the compute power thrown at them) will continue to deliver improvements over time and that more patience is needed as the next generation (say, GPT-5) undergoes training.

In the scheme of things, Brockman’s move sounds like an extended, long overdue vacation (or perhaps a period to deal with personal issues beyond work). Regardless of the reason, the duration of the sabbatical raises questions about how the president of a major tech company can suddenly disappear for four months without affecting day-to-day operations, especially during a critical time in its history.

Unless, of course, things are fairly calm at OpenAI—and perhaps GPT-5 isn’t going to ship until at least next year when Brockman returns. But this is speculation on our part, and OpenAI (whether voluntarily or not) sometimes surprises us when we least expect it. (Just today, Altman dropped a teaser on X about strawberries that some people interpret as a hint that a major new model is undergoing testing or nearing release.)

A pattern of departures and the rise of Anthropic


What may sting OpenAI the most about the recent departures is that a few high-profile employees have left to join Anthropic, a San Francisco-based AI company founded in 2021 by ex-OpenAI employees Daniela and Dario Amodei.

Anthropic offers a subscription service called Claude.ai that is similar to ChatGPT. Its most recent LLM, Claude 3.5 Sonnet, along with its web-based interface, has rapidly gained favor over ChatGPT among some LLM users who are vocal on social media, though it likely does not yet match ChatGPT in terms of mainstream brand recognition.

In particular, John Schulman, an OpenAI co-founder and key figure in the company’s post-training process for LLMs, revealed in a statement on X that he’s leaving to join rival AI firm Anthropic to do more hands-on work: “This choice stems from my desire to deepen my focus on AI alignment, and to start a new chapter of my career where I can return to hands-on technical work.” Alignment is a field that hopes to guide AI models to produce helpful outputs.

In May, OpenAI alignment researcher Jan Leike left OpenAI to join Anthropic as well, criticizing OpenAI’s handling of alignment safety.

Adding to the recent employee shake-up, The Information reports that Peter Deng, a product leader who joined OpenAI last year after stints at Meta Platforms, Uber, and Airtable, has also left the company, though we do not yet know where he is headed. In May, OpenAI co-founder Ilya Sutskever left to found a rival startup, and prominent software engineer Andrej Karpathy departed in February, recently launching an educational venture.

As De Kraker noted, if OpenAI were on the verge of developing world-changing AI technology, wouldn’t these high-profile AI veterans want to stick around and be part of this historic moment in time? “Genuine question,” he wrote. “If you were pretty sure the company you’re a key part of—and have equity in—is about to crack AGI within one or two years… why would you jump ship?”

Despite the departures, Schulman expressed optimism about OpenAI’s future in his farewell note on X. “I am confident that OpenAI and the teams I was part of will continue to thrive without me,” he wrote. “I’m incredibly grateful for the opportunity to participate in such an important part of history and I’m proud of what we’ve achieved together. I’ll still be rooting for you all, even while working elsewhere.”

This article was updated on August 7, 2024 at 4:23 PM to mention Sam Altman’s tweet about strawberries.
