Google’s DeepMind is building an AI to keep us from hating each other


The AI did better than professional mediators at getting people to reach agreement.

An unprecedented 80 percent of Americans, according to a recent Gallup poll, think the country is deeply divided over its most important values ahead of the November elections. The general public’s polarization now encompasses issues like immigration, health care, identity politics, transgender rights, or whether we should support Ukraine. Fly across the Atlantic and you’ll see the same thing happening in the European Union and the UK.

To try to reverse this trend, Google’s DeepMind built an AI system designed to aid people in resolving conflicts. It’s called the Habermas Machine after Jürgen Habermas, a German philosopher who argued that an agreement in a public sphere can always be reached when rational people engage in discussions as equals, with mutual respect and perfect communication.

But is DeepMind’s Nobel Prize-winning ingenuity really enough to solve our political conflicts the same way they solved chess or StarCraft or predicting protein structures? Is it even the right tool?

Philosopher in the machine

One of the cornerstone ideas in Habermas’ philosophy is that the reason why people can’t agree with each other is fundamentally procedural and does not lie in the problem under discussion itself. There are no irreconcilable issues; it’s just that the mechanisms we use for discussion are flawed. If we could create an ideal communication system, Habermas argued, we could work every problem out.

“Now, of course, Habermas has been dramatically criticized for this being a very exotic view of the world. But our Habermas Machine is an attempt to do exactly that. We tried to rethink how people might deliberate and use modern technology to facilitate it,” says Christopher Summerfield, a professor of cognitive science at Oxford University and a former DeepMind staff scientist who worked on the Habermas Machine.

The Habermas Machine relies on what’s called the caucus mediation principle. This is where a mediator, in this case the AI, sits through private meetings with all the discussion participants individually, takes their statements on the issue at hand, and then gets back to them with a group statement, trying to get everyone to agree with it. DeepMind’s mediating AI plays into one of the strengths of LLMs, which is the ability to briefly summarize a long body of text in a very short time. The difference here is that instead of summarizing one piece of text provided by one user, the Habermas Machine summarizes multiple texts provided by multiple users, trying to extract the shared ideas and find common ground in all of them.

But it has more tricks up its sleeve than simply processing text. At a technical level, the Habermas Machine is a system of two large language models. The first is the generative model based on the slightly fine-tuned Chinchilla, a somewhat dated LLM introduced by DeepMind back in 2022. Its job is to generate multiple candidates for a group statement based on statements submitted by the discussion participants. The second component in the Habermas Machine is a reward model that analyzes individual participants’ statements and uses them to predict how likely each individual is to agree with the candidate group statements proposed by the generative model.

Once that’s done, the candidate group statement with the highest predicted acceptance score is presented to the participants. Then, the participants write their critiques of this group statement and feed those critiques back into the system, which generates updated group statements and repeats the process. The cycle goes on until the group statement is acceptable to everyone.
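The generate, rank, and critique loop described above can be sketched in a few lines of Python. This is a toy illustration, not DeepMind’s code: the function names, the hard-coded scores, and the choice to rank candidates by their average predicted agreement are all assumptions made for the example, and the two “models” are stand-ins that never call an LLM.

```python
from statistics import mean

def pick_group_statement(opinions, candidates, predict_agreement):
    """Return the candidate with the highest mean predicted agreement."""
    return max(
        candidates,
        key=lambda c: mean(predict_agreement(o, c) for o in opinions),
    )

def mediate(opinions, generate_candidates, predict_agreement,
            collect_critiques, max_rounds=3):
    """Run generate -> rank -> critique rounds and return the last statement."""
    critiques = []
    statement = None
    for _ in range(max_rounds):
        candidates = generate_candidates(opinions, critiques)
        statement = pick_group_statement(opinions, candidates, predict_agreement)
        critiques = collect_critiques(statement)
        if not critiques:  # nobody objected, so stop early
            break
    return statement

# --- Hypothetical stand-ins so the sketch runs without any model ---
OPINIONS = [
    "Sixteen-year-olds pay taxes and should vote.",
    "Voting should stay at 18.",
    "I'm unsure, but a limited trial seems reasonable.",
]

# Made-up reward-model outputs: one predicted agreement score per participant.
TOY_SCORES = {
    "Lower the voting age to 16.": [0.9, 0.1, 0.5],
    "Keep the voting age at 18.": [0.1, 0.9, 0.4],
    "Trial votes at 16 in local elections.": [0.7, 0.6, 0.8],
}

def toy_generate(opinions, critiques):
    return list(TOY_SCORES)

def toy_agreement(opinion, candidate):
    return TOY_SCORES[candidate][OPINIONS.index(opinion)]

def toy_critiques(statement):
    return []  # pretend everyone accepts the first proposal

result = mediate(OPINIONS, toy_generate, toy_agreement, toy_critiques)
print(result)
```

With these invented scores, the compromise statement wins because it has the highest average, even though it is no single participant’s first choice.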

Once the AI was ready, DeepMind’s team started a fairly large testing campaign that involved over five thousand people discussing issues such as “should the voting age be lowered to 16?” or “should the British National Health Service be privatized?” Here, the Habermas Machine outperformed human mediators.

Scientific diligence

Most of the first batch of participants were sourced through a crowdsourcing research platform. They were divided into groups of five, and each team was assigned a topic to discuss, chosen from a list of over 5,000 statements about important issues in British politics. There were also control groups working with human mediators. In the caucus mediation process, those human mediators achieved a 44 percent acceptance rate for their handcrafted group statements. The AI scored 56 percent. Participants usually found the AI group statements to be better written as well.

But the testing didn’t end there. Because people you can find on crowdsourcing research platforms are unlikely to be representative of the British population, DeepMind also used a more carefully selected group of participants. They partnered with the Sortition Foundation, which specializes in organizing citizen assemblies in the UK, and assembled a group of 200 people representative of British society when it comes to age, ethnicity, socioeconomic status, etc. The assembly was divided into groups of three that deliberated over the same nine questions. And the Habermas Machine worked just as well.

The agreement rate for the statement “we should be trying to reduce the number of people in prison” rose from a pre-discussion 60 percent agreement to 75 percent. The support for the more divisive idea of making it easier for asylum seekers to enter the country went from 39 percent at the start to 51 percent at the end of discussion, which allowed it to achieve majority support. The same thing happened with the problem of encouraging national pride, which started with 42 percent support and ended at 57 percent. The views held by the people in the assembly converged on five out of nine questions. Agreement was not reached on issues like Brexit, where participants were particularly entrenched in their starting positions. Still, in most cases, they left the experiment less divided than they were coming in. But there were some question marks.

The questions were not selected entirely at random. They were vetted, as the team wrote in their paper, to “minimize the risk of provoking offensive commentary.” But isn’t that just an elegant way of saying, ‘We carefully chose issues unlikely to make people dig in and throw insults at each other so our results could look better?’

Conflicting values

“One example of the things we excluded is the issue of transgender rights,” Summerfield told Ars. “This, for a lot of people, has become a matter of cultural identity. Now clearly that’s a topic which we can all have different views on, but we wanted to err on the side of caution and make sure we didn’t make our participants feel unsafe. We didn’t want anyone to come out of the experiment feeling that their basic fundamental view of the world had been dramatically challenged.”

The problem is that when your aim is to make people less divided, you need to know where the division lines are drawn. And those lines, if Gallup polls are to be trusted, are not only drawn between issues like whether the voting age should be 16 or 18 or 21. They are drawn between conflicting values. The Daily Show’s Jon Stewart argued that, for the right side of the US’s political spectrum, the only division line that matters today is “woke” versus “not woke.”

Summerfield and the rest of the Habermas Machine team excluded the question about transgender rights because they believed participants’ well-being should take precedence over the benefit of testing their AI’s performance on more divisive issues. They excluded other questions as well like the problem of climate change.

Here, the reason Summerfield gave was that climate change is a part of an objective reality—it either exists or it doesn’t, and we know it does. It’s not a matter of opinion you can discuss. That’s scientifically accurate. But when the goal is fixing politics, scientific accuracy isn’t necessarily the end state.

If major political parties are to accept the Habermas Machine as the mediator, it has to be universally perceived as impartial. But at least some of the people behind AIs are arguing that an AI can’t be impartial. After OpenAI released ChatGPT in 2022, Elon Musk posted a tweet, the first of many, where he argued against what he called the “woke” AI. “The danger of training AI to be woke—in other words, lie—is deadly,” Musk wrote. Eleven months later, he announced Grok, his own AI system marketed as “anti-woke.” Over 200 million of his followers were introduced to the idea that there were “woke AIs” that had to be countered by building “anti-woke AIs,” a world where the AI was no longer an agnostic machine but a tool pushing the political agendas of its creators.

Playing pigeons’ games

“I personally think Musk is right that there have been some tests which have shown that the responses of language models tend to favor more progressive and more libertarian views,” Summerfield says. “But it’s interesting to note that those experiments have been usually run by forcing the language model to respond to multiple-choice questions. You ask ‘is there too much immigration’ for example, and the answers are either yes or no. This way the model is kind of forced to take an opinion.”

He said that if you use the same queries as open-ended questions, the responses you get are, for the large part, neutral and balanced. “So, although there have been papers that express the same view as Musk, in practice, I think it’s absolutely untrue,” Summerfield claims.

Does it even matter?

Summerfield did what you would expect a scientist to do: He dismissed Musk’s claims as based on a selective reading of the evidence. That’s usually checkmate in the world of science. But in the world of politics, being correct is not what matters the most. Musk was short, catchy, and easy to share and remember. Trying to counter that by discussing methodology in some papers nobody read was a bit like playing chess with a pigeon.

At the same time, Summerfield had his own ideas about AI that others might consider dystopian. “If politicians want to know what the general public thinks today, they might run a poll. But people’s opinions are nuanced, and our tool allows for aggregation of opinions, potentially many opinions, in the highly dimensional space of language itself,” he says. While his idea is that the Habermas Machine can potentially find useful points of political consensus, nothing is stopping it from also being used to craft speeches optimized to win over as many people as possible.

That may be in keeping with Habermas’ philosophy, though. If you look past the myriads of abstract concepts ever-present in German idealism, it offers a pretty bleak view of the world. “The system,” driven by power and money of corporations and corrupt politicians, is out to colonize “the lifeworld,” roughly equivalent to the private sphere we share with our families, friends, and communities. The way you get things done in “the lifeworld” is through seeking consensus, and the Habermas Machine, according to DeepMind, is meant to help with that. The way you get things done in “the system,” on the other hand, is through succeeding: playing it like a game and doing whatever it takes to win with no holds barred, and the Habermas Machine apparently can help with that, too.

The DeepMind team reached out to Habermas to get him involved in the project. They wanted to know what he’d have to say about the AI system bearing his name. But Habermas never got back to them. “Apparently, he doesn’t use emails,” Summerfield says.

Science, 2024. DOI: 10.1126/science.adq2852

Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.

Annoyed Redditors tanking Google Search results illustrates perils of AI scrapers

Fed up Londoners

Apparently, some London residents are getting fed up with social media influencers whose reviews draw long lines of tourists to their favorite restaurants, sometimes just for the likes. Christian Calgie, a reporter for London-based news publication Daily Express, pointed out this trend on X yesterday, noting the boom of Redditors referring people to Angus Steakhouse, a chain restaurant, to combat it.

As Gizmodo deduced, the trend seemed to start on the r/London subreddit, where a user complained about a spot in Borough Market being “ruined by influencers” on Monday:

“Last 2 times I have been there has been a queue of over 200 people, and the ones with the food are just doing the selfie shit for their [I]nsta[gram] pages and then throwing most of the food away.”

As of this writing, the post has 4,900 upvotes and numerous responses suggesting that Redditors talk about how good Angus Steakhouse is so that Google picks up on it. Commenters quickly understood the assignment.

“Agreed with other posters Angus steakhouse is absolutely top tier and tourists shoyldnt [sic] miss out on it,” one Redditor wrote.

Another Reddit user wrote:

“Spreading misinformation suddenly becomes a noble goal.”

As of this writing, asking Google for the best steak, steakhouse, or steak sandwich in London (or similar) isn’t generating an AI Overview result for me. But when I searched for the best steak sandwich in London, the top result was from Reddit, including a thread from four days ago titled “Which Angus Steakhouse do you recommend for their steak sandwich?” and one from two days ago titled “Had to see what all the hype was about, best steak sandwich I’ve ever had!” with a picture of an Angus Steakhouse.

Google offers its AI watermarking tech as free open source toolkit

Google also notes that this kind of watermarking works best when there is a lot of “entropy” in the LLM distribution, meaning multiple valid candidates for each token (e.g., “my favorite tropical fruit is [mango, lychee, papaya, durian]”). In situations where an LLM “almost always returns the exact same response to a given prompt”—such as basic factual questions or models tuned to a lower “temperature”—the watermark is less effective.
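To make the “entropy” point concrete, here is a small sketch that uses made-up logits for the fruit example (they are not taken from any real model). It shows how lowering the temperature concentrates probability on one token and shrinks the Shannon entropy, which leaves a watermark fewer alternative tokens to choose between.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given sampling temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical logits for "my favorite tropical fruit is ..."
fruit_logits = {"mango": 2.0, "lychee": 1.5, "papaya": 1.2, "durian": 0.8}

for t in (1.0, 0.5, 0.1):
    probs = softmax(list(fruit_logits.values()), temperature=t)
    print(f"T={t}: entropy = {entropy_bits(probs):.2f} bits")
```

With four candidate tokens, the entropy can never exceed 2 bits, and it falls toward zero as the temperature drops and the model settles on a single answer.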

A diagram explaining how SynthID’s text watermarking works. Credit: Google / Nature

Google says SynthID builds on previous similar AI text watermarking tools by introducing what it calls a Tournament sampling approach. During the token-generation loop, this approach runs each potential candidate token through a multi-stage, bracket-style tournament, where each round is “judged” by a different randomized watermarking function. Only the final winner of this process makes it into the eventual output.
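The bracket idea can be illustrated with a short sketch. It is a simplification under stated assumptions rather than Google’s implementation: the SHA-256-based `g_value` function, the string key, the one-bit scores, and the random tie-breaking are all choices made for this example, and the token probabilities are invented.

```python
import hashlib
import random

def g_value(token, context, layer, key):
    """Keyed pseudorandom bit for a (token, context, layer) triple (assumed scheme)."""
    payload = f"{key}|{layer}|{'|'.join(context)}|{token}".encode()
    return hashlib.sha256(payload).digest()[0] & 1

def tournament_sample(probs, context, key, layers=3, rng=random):
    """Pick one token by running a bracket with `layers` knockout rounds."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    # Draw 2**layers entrants from the model's own distribution (repeats allowed).
    entrants = rng.choices(tokens, weights=weights, k=2 ** layers)
    for layer in range(layers):
        winners = []
        # Each round uses a different watermarking function (here: a different layer index).
        for a, b in zip(entrants[0::2], entrants[1::2]):
            ga = g_value(a, context, layer, key)
            gb = g_value(b, context, layer, key)
            if ga == gb:
                winners.append(rng.choice((a, b)))  # tie: pick at random
            else:
                winners.append(a if ga > gb else b)
        entrants = winners
    return entrants[0]

# Made-up next-token distribution for illustration only.
probs = {"mango": 0.4, "lychee": 0.3, "papaya": 0.2, "durian": 0.1}
token = tournament_sample(
    probs,
    context=("favorite", "tropical", "fruit", "is"),
    key="demo-key",
    rng=random.Random(0),
)
print(token)
```

Because every entrant is drawn from the model’s distribution, the winner is always a token the model might have produced anyway. The watermark shows up only as a statistical tilt toward tokens that score well under the keyed functions.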

Can they tell it’s Folgers?

Changing the token selection process of an LLM with a randomized watermarking tool could obviously have a negative effect on the quality of the generated text. But in its paper, Google shows that SynthID can be “non-distortionary” on the level of either individual tokens or short sequences of text, depending on the specific settings used for the tournament algorithm. Other settings can increase the “distortion” introduced by the watermarking tool while at the same time increasing the detectability of the watermark, Google says.

To test how any potential watermark distortions might affect the perceived quality and utility of LLM outputs, Google routed “a random fraction” of Gemini queries through the SynthID system and compared them to unwatermarked counterparts. Across 20 million total responses, users gave 0.1 percent more “thumbs up” ratings and 0.2 percent fewer “thumbs down” ratings to the watermarked responses, showing barely any human-perceptible difference across a large set of real LLM interactions.

Google’s research shows SynthID is more dependable than other AI watermarking tools, but its success rate depends heavily on length and entropy. Credit: Google / Nature

Google’s testing also showed its SynthID detection algorithm successfully detected AI-generated text significantly more often than previous watermarking schemes like Gumbel sampling. But the size of this improvement—and the total rate at which SynthID can successfully detect AI-generated text—depends heavily on the length of the text in question and the temperature setting of the model being used. SynthID was able to detect nearly 100 percent of 400-token-long AI-generated text samples from Gemma 7B-1T at a temperature of 1.0, for instance, compared to about 40 percent for 100-token samples from the same model at a 0.5 temperature.
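A rough simulation shows why length matters so much for detection. It assumes a simplified detector that averages one pseudorandom bit per token and compares the mean to the 0.5 expected by chance. The 0.6 hit rate for watermarked text is an invented number, not a SynthID measurement, and the real detector may score text differently.

```python
import math
import random

def z_score(g_values):
    """Standard deviations by which the mean bit exceeds the 0.5 expected by chance."""
    n = len(g_values)
    return (sum(g_values) / n - 0.5) / (0.5 / math.sqrt(n))

def simulate_g_values(n_tokens, p_one, rng):
    """Draw n_tokens bits that equal 1 with probability p_one."""
    return [1 if rng.random() < p_one else 0 for _ in range(n_tokens)]

rng = random.Random(0)
for n_tokens in (100, 400):
    # p_one = 0.6 is a hypothetical watermark strength chosen for illustration.
    for label, p_one in (("unwatermarked", 0.5), ("watermarked", 0.6)):
        z = z_score(simulate_g_values(n_tokens, p_one, rng))
        print(f"{n_tokens:>3} tokens, {label}: z = {z:+.2f}")
```

For a fixed per-token tilt, the expected z-score grows with the square root of the length: with the assumed 0.6 rate, it is 2 at 100 tokens and 4 at 400. A lower temperature reduces entropy, which would shrink the tilt itself and make short samples harder still to flag.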

At TED AI 2024, experts grapple with AI’s growing pains


A year later, a compelling group of TED speakers move from “what’s this?” to “what now?”

The opening moments of TED AI 2024 in San Francisco on October 22, 2024. Credit: Benj Edwards

SAN FRANCISCO—On Tuesday, TED AI 2024 kicked off its first day at San Francisco’s Herbst Theater with a lineup of speakers that tackled AI’s impact on science, art, and society. The two-day event brought a mix of researchers, entrepreneurs, lawyers, and other experts who painted a complex picture of AI with fairly minimal hype.

The second annual conference, organized by Walter and Sam De Brouwer, marked a notable shift from last year’s broad existential debates and proclamations of AI as being “the new electricity.” Rather than sweeping predictions about, say, looming artificial general intelligence (although there was still some of that, too), speakers mostly focused on immediate challenges: battles over training data rights, proposals for hardware-based regulation, debates about human-AI relationships, and the complex dynamics of workplace adoption.

The day’s sessions covered a wide breadth: physicist Carlo Rovelli explored consciousness and time, Project CETI researcher Patricia Sharma demonstrated attempts to use AI to decode whale communication, Recording Academy CEO Harvey Mason Jr. outlined music industry adaptation strategies, and even a few robots made appearances.

The shift from last year’s theoretical discussions to practical concerns was particularly evident during a presentation from Ethan Mollick of the Wharton School, who tackled what he called “the productivity paradox”—the disconnect between AI’s measured impact and its perceived benefits in the workplace. Already, organizations are moving beyond the gee-whiz period after ChatGPT’s introduction and into the implications of widespread use.

Sam De Brouwer and Walter De Brouwer organized TED AI and selected the speakers. Credit: Benj Edwards

Drawing from research claiming AI users complete tasks faster and more efficiently, Mollick highlighted a peculiar phenomenon: While one-third of Americans reported using AI in August of this year, managers often claim “no one’s using AI” in their organizations. Through a live demonstration using multiple AI models simultaneously, Mollick illustrated how traditional work patterns must evolve to accommodate AI’s capabilities. He also pointed to the emergence of what he calls “secret cyborgs”: employees quietly using AI tools without management’s knowledge. Regarding the future of jobs in the age of AI, he urged organizations to view AI as an opportunity for expansion rather than merely a cost-cutting measure.

Some giants in the AI field made an appearance. Jakob Uszkoreit, one of the eight co-authors of the now-famous “Attention is All You Need” paper that introduced the Transformer architecture, reflected on the field’s rapid evolution. He distanced himself from the term “artificial general intelligence,” suggesting people aren’t particularly general in their capabilities. Uszkoreit described how the development of Transformers sidestepped traditional scientific theory, comparing the field to alchemy. “We still do not know how human language works. We do not have a comprehensive theory of English,” he noted.

Stanford professor Surya Ganguli presenting at TED AI 2024. Credit: Benj Edwards

And refreshingly, the talks went beyond AI language models. For example, Isomorphic Labs Chief AI Officer Max Jaderberg, who previously worked on Google DeepMind’s AlphaFold 3, gave a well-received presentation on AI-assisted drug discovery. He detailed how AlphaFold has already saved “1 billion years of research time” by discovering the shapes of proteins and showed how AI agents are now capable of running thousands of parallel drug design simulations that could enable personalized medicine.

Danger and controversy

While hype was less prominent this year, some speakers still spoke of AI-related dangers. Paul Scharre, executive vice president at the Center for a New American Security, warned about the risks of advanced AI models falling into malicious hands, specifically citing concerns about terrorist attacks with AI-engineered biological weapons. Drawing parallels to nuclear proliferation in the 1960s, Scharre argued that while regulating software is nearly impossible, controlling physical components like specialized chips and fabrication facilities could provide a practical framework for AI governance.

ReplikaAI founder Eugenia Kuyda cautioned that AI companions could become “the most dangerous technology if not done right,” suggesting that the existential threat from AI might come not from science fiction scenarios but from technology that isolates us from human connections. She advocated for designing AI systems that optimize for human happiness rather than engagement, proposing a “human flourishing metric” to measure its success.

Ben Zhao, a University of Chicago professor associated with the Glaze and Nightshade projects, painted a dire picture of AI’s impact on art, claiming that art schools were seeing unprecedented enrollment drops and galleries were closing at an accelerated rate due to AI image generators, though we have yet to dig through the supporting news headlines he momentarily flashed up on the screen.

Some of the speakers represented polar opposites of each other, policy-wise. For example, copyright attorney Angela Dunning offered a defense of AI training as fair use, drawing from historical parallels in technological advancement. A litigation partner at Cleary Gottlieb, which has previously represented the AI image generation service Midjourney in a lawsuit, Dunning quoted Mark Twain saying “there is no such thing as a new idea” and argued that copyright law allows for building upon others’ ideas while protecting specific expressions. She compared current AI debates to past technological disruptions, noting how photography, once feared as a threat to traditional artists, instead sparked new artistic movements like abstract art and pointillism. “Art and science can only remain free if we are free to build on the ideas of those that came before,” Dunning said, challenging more restrictive views of AI training.

Copyright lawyer Angela Dunning quoted Mark Twain in her talk about fair use and AI. Credit: Benj Edwards

Dunning’s presentation stood in direct opposition to Ed Newton-Rex, who had earlier advocated for mandatory licensing of training data through his nonprofit Fairly Trained. In fact, the same day, Newton-Rex’s organization unveiled a “Statement on AI training” signed by many artists that says, “The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.” The issue has not yet been legally settled in US courts, but clearly, the battle lines have been drawn, and no matter which side you take, TED AI did a good job of giving both perspectives to the audience.

Looking forward

Some speakers explored potential new architectures for AI. Stanford professor Surya Ganguli highlighted the contrast between AI and human learning, noting that while AI models require trillions of tokens to train, humans learn language from just millions of exposures. He proposed “quantum neuromorphic computing” as a potential bridge between biological and artificial systems, suggesting a future where computers could potentially match the energy efficiency of the human brain.

Also, Guillaume Verdon, founder of Extropic and architect of the Effective Accelerationism (often called “E/Acc”) movement, presented what he called “physics-based intelligence” and claimed his company is “building a steam engine for AI,” potentially offering energy efficiency improvements up to 100 million times better than traditional systems—though he acknowledged this figure ignores cooling requirements for superconducting components. The company had completed its first room-temperature chip tape-out just the previous week.

The Day One sessions closed out with predictions about the future of AI from OpenAI’s Noam Brown, who emphasized the importance of scale in expanding future AI capabilities, and from University of Washington professor Pedro Domingos, who spoke about “co-intelligence,” saying, “People are smart, organizations are stupid,” and proposing that AI could be used to bridge that gap by drawing on the collective intelligence of an organization.

When I attended TED AI last year, some obvious questions emerged: Is this current wave of AI a fad? Will there be a TED AI next year? I think the second TED AI answered these questions well: AI isn’t going away, and there are still endless angles to explore as the field expands rapidly.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

iOS 18.2 developer beta adds ChatGPT and image-generation features

Today, Apple released the first developer beta of iOS 18.2 for supported devices. This beta release marks the first time several key AI features that Apple teased at its developer conference this June are available.

Apple is marketing a wide range of generative AI features under the banner “Apple Intelligence.” Initially, Apple Intelligence was planned to release as part of iOS 18, but some features slipped to iOS 18.1, others to iOS 18.2, and a few still to future undisclosed software updates.

iOS 18.1 has been in beta for a while and includes improvements to Siri, generative writing tools that help with rewriting or proofreading, smart replies for Messages, and notification summaries. That update is expected to reach the public next week.

Today’s developer update, iOS 18.2, includes some potentially more interesting components of Apple Intelligence, including Genmoji, Image Playground, Visual Intelligence with Camera Control, and ChatGPT integration.

Genmoji and Image Playground allow users to generate images on-device to send to friends in Messages; there will be Genmoji and Image Playground APIs to allow third-party messaging apps to work with Genmoji, too.

ChatGPT integration allows Siri to pass off user queries that are outside Siri’s normal scope to be answered instead by OpenAI’s ChatGPT. A ChatGPT account is not required, but logging in with an existing account gives you access to premium models available as part of a ChatGPT subscription. If you’re using these features without a ChatGPT account, OpenAI won’t be able to retain your data or use it to train models. If you connect your ChatGPT account, though, then OpenAI’s privacy policies will apply for ChatGPT queries instead of Apple’s.

Genmoji and Image Playground queries will be handled locally on the user’s device, but other Apple Intelligence features may dynamically opt to send queries to the cloud for computation.

There’s no word yet on when iOS 18.2 will be released publicly.

Anthropic publicly releases AI tool that can take over the user’s mouse cursor

An arms race and a wrecking ball

Competing companies like OpenAI have been working on equivalent tools but have not made them publicly available yet. It’s something of an arms race, as these tools are projected to generate a lot of revenue in a few years if they progress as expected.

There’s a belief that these tools could eventually automate many menial tasks in office jobs. They could also be useful for developers in that they could “automate repetitive tasks” and streamline laborious QA and optimization work.

That has long been part of Anthropic’s message to investors: Its AI tools could handle large portions of some office jobs more efficiently and affordably than humans can. The public testing of the Computer Use feature is a step toward achieving that goal.

We’re, of course, familiar with the ongoing argument about these types of tools between the “it’s just a tool that will make people’s jobs easier” camp and the “it will put people out of work across industries like a wrecking ball” camp. Both of these things could happen to some degree. It’s just a question of what the ratio will be, and that may vary by situation or industry.

There are numerous valid concerns about the widespread deployment of this technology, though. To its credit, Anthropic has tried to anticipate some of these by putting safeguards in from the get-go. The company gave some examples in its blog post:

Our teams have developed classifiers and other methods to flag and mitigate these kinds of abuses. Given the upcoming US elections, we’re on high alert for attempted misuses that could be perceived as undermining public trust in electoral processes. While computer use is not sufficiently advanced or capable of operating at a scale that would present heightened risks relative to existing capabilities, we’ve put in place measures to monitor when Claude is asked to engage in election-related activity, as well as systems for nudging Claude away from activities like generating and posting content on social media, registering web domains, or interacting with government websites.

These safeguards may not be perfect, as there may be creative ways to circumvent them or other unintended consequences or misuses yet to be discovered.

Right now, Anthropic is putting Computer Use out there for testing to see what problems arise and to work with developers to improve its capabilities and find positive uses.

OpenAI releases ChatGPT app for Windows

On Thursday, OpenAI released an early version of its first ChatGPT app for Windows, following a Mac version that launched in May. Currently, it’s only available to subscribers of Plus, Team, Enterprise, and Edu versions of ChatGPT, and users can download it for free in the Microsoft Store for Windows.

OpenAI is positioning the release as a beta test. “This is an early version, and we plan to bring the full experience to all users later this year,” OpenAI writes on the Microsoft Store entry for the app. (Interestingly, ChatGPT shows up as being rated “T for Teen” by the ESRB in the Windows store, despite not being a video game.)

A screenshot of the new Windows ChatGPT app captured on October 18, 2024. Credit: Benj Edwards

When you open the app, OpenAI requires you to log into a paying ChatGPT account, and from there, the app is basically identical to the web browser version of ChatGPT. You can currently use it to access several models: GPT-4o, GPT-4o with Canvas, o1-preview, o1-mini, GPT-4o mini, and GPT-4. Also, it can generate images using DALL-E 3 or analyze uploaded files and images.

If you’re running Windows 11, you can instantly call up a small ChatGPT window when the app is open using an Alt+Space shortcut (it did not work in Windows 10 when we tried). That could be handy for asking ChatGPT a quick question at any time.

A screenshot of the new Windows ChatGPT app listing in the Microsoft Store captured on October 18, 2024. Credit: Benj Edwards

And just like the web version, all the AI processing takes place in the cloud on OpenAI’s servers, which means an Internet connection is required.

So as usual, chat like somebody’s watching, and don’t rely on ChatGPT as a factual reference for important decisions—GPT-4o in particular is great at telling you what you want to hear, whether it’s correct or not. As OpenAI says in a small disclaimer at the bottom of the app window: “ChatGPT can make mistakes.”

Adobe shows off 3D rotation tool for flat drawings

“That’s wizardry”

The on-stage demo showed off rotations for a number of varied images, from largely symmetrical dragons, horses, and bats to more complex shapes like a sketch of a bread basket or a living cup of fries (complete with arms, legs, eyes, and a mouth). In each case, the machine-learning algorithm does an admirable job assuming unseen parts of the model from what’s available in the original 2D view, extrapolating a full set of legs on a side-view horse or the bottom of the Fry Man’s shoes, for instance.

Vertical rotation lets you see the bottom of Fry Man’s shoes here. Credit: Adobe

Still, we’re sure the vector models on stage were chosen to show Project Turntable in its best light. Without a public testable version, it’s hard to say how it would handle weird edge cases or drawings that don’t closely match objects in its training data (which we don’t know the extent of).

Even so, what was shown on stage has some obvious appeal for working artists. After seeing the on-stage video, Ars Creative Director Aurich Lawson exclaimed on our internal Slack, “That’s wizardry. I don’t know how well it really works—I bet not nearly as good as that demo a lot of the time—but I’m impressed.”

Project Turntable is also notable because it augments original work by human artists rather than replacing it with images created whole cloth by AI. While Project Turntable saves those artists the effort of drawing their 2D objects and characters from multiple angles, that human artist is still responsible for the overall style and look of that original work. Maintaining that human style seems to be a key point for Adobe, which points out that “even after the rotation, the vector graphics stay true to the original shape so you don’t lose any of the design’s essence.”

Adobe’s Brian Domingo told the Creative Bloq blog there’s still no guarantee that Project Turntable will ever be released commercially. Given the obvious enthusiasm of the demo crowd at the MAX conference, though, we think it’s safe to assume that Adobe will do whatever it can to get this feature ready for prime time as soon as possible.

Cheap AI “video scraping” can now extract data from any screen recording


Researcher feeds screen recordings into Gemini to extract accurate information with ease.

Recently, AI researcher Simon Willison wanted to add up his charges from using a cloud service, but the payment values and dates he needed were scattered among a dozen separate emails. Inputting them manually would have been tedious, so he turned to a technique he calls “video scraping,” which involves feeding a screen recording video into an AI model, similar to ChatGPT, for data extraction purposes.

What he discovered seems simple on its surface, but the quality of the result has deeper implications for the future of AI assistants, which may soon be able to see and interact with what we’re doing on our computer screens.

“The other day I found myself needing to add up some numeric values that were scattered across twelve different emails,” Willison wrote in a detailed post on his blog. He recorded a 35-second video scrolling through the relevant emails, then fed that video into Google’s AI Studio tool, which allows people to experiment with several versions of Google’s Gemini 1.5 Pro and Gemini 1.5 Flash AI models.

Willison then asked Gemini to pull the price data from the video and arrange it into a structured data format called JSON (JavaScript Object Notation) that included dates and dollar amounts. The AI model successfully extracted the data, which Willison then formatted as a CSV (comma-separated values) table for spreadsheet use. After he double-checked for errors as part of his experiment, both the accuracy of the results and what the video analysis cost to run surprised him.
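To make the JSON-to-CSV step concrete, here is a minimal Python sketch of that kind of conversion. It is not Willison’s actual code: the field names (date, amount) and the sample values are hypothetical stand-ins for whatever the model returned.

```python
import csv
import io
import json

# Hypothetical model output. The field names and values are illustrative
# assumptions, not the data extracted from Willison's emails.
model_output = """
[
  {"date": "2024-01-05", "amount": 12.50},
  {"date": "2024-02-05", "amount": 14.00},
  {"date": "2024-03-05", "amount": 9.75}
]
"""


def json_to_csv(raw_json: str) -> str:
    """Convert a JSON array of {date, amount} objects into CSV text."""
    rows = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "amount"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def total_amount(raw_json: str) -> float:
    """Add up the amount field across all extracted rows."""
    rows = json.loads(raw_json)
    return round(sum(row["amount"] for row in rows), 2)


csv_text = json_to_csv(model_output)
print(csv_text)
print("Total:", total_amount(model_output))
```

Because the model’s output can contain mistakes, the resulting table is worth checking against the source emails, as Willison did.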

A screenshot of Simon Willison using Google Gemini to extract data from a screen capture video. Credit: Simon Willison

“The cost [of running the video model] is so low that I had to re-run my calculations three times to make sure I hadn’t made a mistake,” he wrote. Willison says the entire video analysis process ostensibly cost less than one-tenth of a cent, using just 11,018 tokens on the Gemini 1.5 Flash 002 model. In the end, he actually paid nothing because Google AI Studio is currently free for some types of use.
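The arithmetic behind that figure is easy to reproduce. The sketch below uses the token count reported above, but the per-token price is an assumption for illustration (0.075 US dollars per million input tokens); the article does not state the rate, and providers change their prices often.

```python
# Assumed price in USD per one million input tokens. This rate is an
# assumption for illustration only; check the provider's current price list.
ASSUMED_USD_PER_MILLION_TOKENS = 0.075


def estimate_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Estimate the cost of a request from its token count."""
    return tokens / 1_000_000 * usd_per_million


tokens_used = 11_018  # token count reported for the video analysis
cost = estimate_cost_usd(tokens_used, ASSUMED_USD_PER_MILLION_TOKENS)

print(f"Estimated cost: ${cost:.6f} ({cost * 100:.4f} cents)")
print("Under one-tenth of a cent:", cost < 0.001)
```

At that assumed rate, the estimate lands below one-tenth of a cent, in line with Willison’s description.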

Video scraping is just one of many new tricks possible now that the latest large language models (LLMs), such as Google’s Gemini and GPT-4o, are actually “multimodal” models that accept audio, video, image, and text input. These models translate any multimedia input into tokens (chunks of data), which they use to make predictions about which tokens should come next in a sequence.

A term like “token prediction model” (TPM) might be more accurate than “LLM” these days for AI models with multimodal inputs and outputs, but a generalized alternative term hasn’t really taken off yet. No matter what you call it, having an AI model that can take video inputs has interesting implications, both good and potentially bad.

Breaking down input barriers

Willison is far from the first person to feed video into AI models to achieve interesting results (more on that below, and here’s a 2015 paper that uses the “video scraping” term), but as soon as Gemini launched its video input capability, he began to experiment with it in earnest.

In February, Willison demonstrated another early application of AI video scraping on his blog, where he took a seven-second video of the books on his bookshelves, then got Gemini 1.5 Pro to extract all of the book titles it saw in the video and put them in a structured, or organized, list.

Converting unstructured data into structured data is important to Willison, because he’s also a data journalist. Willison has created tools for data journalists in the past, such as the Datasette project, which lets anyone publish data as an interactive website.

To every data journalist’s frustration, some sources of data prove resistant to scraping (capturing data for analysis) due to how the data is formatted, stored, or presented. In these cases, Willison delights in the potential for AI video scraping because it bypasses these traditional barriers to data extraction.

“There’s no level of website authentication or anti-scraping technology that can stop me from recording a video of my screen while I manually click around inside a web application,” Willison noted on his blog. His method works for any visible on-screen content.

Video is the new text

An illustration of a cybernetic eyeball. Credit: Getty Images

The ease and effectiveness of Willison’s technique reflect a noteworthy shift now underway in how some users will interact with token prediction models. Rather than requiring a user to manually paste or type in data in a chat dialog—or detail every scenario to a chatbot as text—some AI applications increasingly work with visual data captured directly on the screen. For example, if you’re having trouble navigating a pizza website’s terrible interface, an AI model could step in and perform the necessary mouse clicks to order the pizza for you.

In fact, video scraping is already on the radar of every major AI lab, although they are not likely to call it that at the moment. Instead, tech companies typically refer to these techniques as “video understanding” or simply “vision.”

In May, OpenAI demonstrated a prototype version of its ChatGPT Mac App with an option that allowed ChatGPT to see and interact with what is on your screen, but that feature has not yet shipped. Microsoft demonstrated a similar “Copilot Vision” prototype concept earlier this month (based on OpenAI’s technology) that will be able to “watch” your screen and help you extract data and interact with applications you’re running.

Despite these research previews, OpenAI’s ChatGPT and Anthropic’s Claude have not yet implemented a public video input feature for their models, possibly because it is relatively computationally expensive for them to process the extra tokens from a “tokenized” video stream.

For the moment, Google is heavily subsidizing user AI costs with its war chest from Search revenue and a massive fleet of data centers (to be fair, OpenAI is subsidizing, too, but with investor dollars and help from Microsoft). But costs of AI compute in general are dropping by the day, which will open up new capabilities of the technology to a broader user base over time.

Countering privacy issues

As you might imagine, having an AI model see what you do on your computer screen can have downsides. For now, video scraping is great for Willison, who will undoubtedly use the captured data in positive and helpful ways. But it’s also a preview of a capability that could later be used to invade privacy or autonomously spy on computer users on a scale that was once impossible.

A different form of video scraping caused a massive wave of controversy recently for that exact reason. Apps such as the third-party Rewind AI on the Mac and Microsoft’s Recall, which is being built into Windows 11, operate by feeding on-screen video into an AI model that stores extracted data into a database for later AI recall. Unfortunately, that approach also introduces potential privacy issues because it records everything you do on your machine and puts it in a single place that could later be hacked.

To that point, although Willison’s technique currently involves uploading a video of his data to Google for processing, he is pleased that he can still decide what the AI model sees and when.

“The great thing about this video scraping technique is that it works with anything that you can see on your screen… and it puts you in total control of what you end up exposing to the AI model,” Willison explained in his blog post.

It’s also possible in the future that a locally run open-weights AI model could pull off the same video analysis method without the need for a cloud connection at all. Microsoft Recall runs locally on supported devices, but it still demands a great deal of unearned trust. For now, Willison is perfectly content to selectively feed video data to AI models when the need arises.

“I expect I’ll be using this technique a whole lot more in the future,” he wrote, and perhaps many others will, too, in different forms. If the past is any indication, Willison—who coined the term “prompt injection” in 2022—seems to always be a few steps ahead in exploring novel applications of AI tools. Right now, his attention is on the new implications of AI and video, and yours probably should be, too.

Benj Edwards is Ars Technica’s Senior AI Reporter and founded the site’s dedicated AI beat in 2022. He’s also a widely cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

Student was punished for using AI—then his parents sued teacher and administrators


Parents claim there was no rule banning AI, but school cites multiple policies.

A school district in Massachusetts was sued by a student’s parents after the boy was punished for using an artificial intelligence chatbot to complete an assignment. The lawsuit says the Hingham High School student handbook did not include a restriction on the use of AI.

“They told us our son cheated on a paper, which is not what happened,” Jennifer Harris told WCVB. “They basically punished him for a rule that doesn’t exist.”

Jennifer and her husband, Dale, filed the lawsuit in Plymouth County Superior Court, and the case was then moved to US District Court for the District of Massachusetts. Defendants include the superintendent, principal, a teacher, the history department head, and the Hingham School Committee.

The student is referred to by his initials, RNH. The lawsuit alleges violations of the student’s civil rights, including “the Plaintiff Student’s personal and property rights and liberty to acquire, possess, maintain and protect his rights to equal educational opportunity.”

The defendants’ motion to dismiss the complaint, filed last week, said RNH admitted “that he used an AI tool to generate ideas and shared that he also created portions of his notes and scripts using the AI tool, and described the specific prompt that he put into the chatbot. RNH unequivocally used another author’s language and thoughts, be it a digital and artificial author, without express permission to do so. Furthermore, he did not cite to his use of AI in his notes, scripts or in the project he submitted.”

The school officials’ court filing points to a section of the student handbook on cheating and plagiarism. Although the section doesn’t mention AI, it bans “unauthorized use of technology during an assignment” and “unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one’s own work.”

“Incredibly, RNH and his parents contend that using AI to draft, edit and research content for an AP US History project, all while not citing to use of AI in the project, is not an ‘act of dishonesty,’ ‘use of unauthorized technology’ or plagiarism,” defendants wrote.

School: Policy bans AI tools unless explicitly permitted

The parents’ motion for a preliminary injunction points to the same section of the student handbook and says it was “silent on any policy, procedure, expectation, conduct, discipline, sanction or consequence for the use of AI.” The use of AI was thus “not a violation” of the policy at the time, they say.

School officials cite more than just the student handbook section. They say that in fall 2023, RNH and his classmates were given a copy of a “written policy on Academic Dishonesty and AI expectations” that says students “shall not use AI tools during in-class examinations, processed writing assignments, homework or classwork unless explicitly permitted and instructed.”

The policy quoted in the court filing also says students should “give credit to AI tools whenever used, even if only to generate ideas or edit a small section of student work.” According to defendants, students were instructed to “add an appendix for every use of AI” with the following information:

  • the entire exchange, highlighting the most relevant sections;
  • a description of precisely which AI tools were used (e.g. ChatGPT private subscription version or Bard);
  • an explanation of how the AI tools were used (e.g. to generate ideas, turns of phrase, identify elements of text, edit long stretches of text, build lines of argument, locate pieces of evidence, create concept or planning maps, illustrations of key concepts, etc.);
  • an account of why AI tools were used (e.g. procrastination, to surmount writer’s block, to stimulate thinking, to manage stress level, to address mismanagement of time, to clarify prose, to translate text, to experiment with the technology, etc.).

The incident happened in December 2023 when RNH and a classmate “teamed up for a Social Studies project for the long-running historical contest known colloquially as ‘National History Day,’” the parents’ motion for a preliminary injunction said. The students “used AI to prepare the initial outline and research” for a project on basketball legend Kareem Abdul-Jabbar and his work as a civil rights activist.

The parents’ motion alleges that RNH and his classmate were “unfairly and unjustly accused of cheating, plagiarism, and academic dishonesty.” The defendants “act[ed] as investigator, judge, jury, and executioner in determining the extreme and outrageous sanctions imposed upon these Students,” they allege. A hearing on the motion for preliminary injunction has been set for October 22.

Parents say it isn’t plagiarism

RNH and his classmate “receiv[ed] multiple zeros for different portions of the project” and a Saturday detention, the parents’ motion said. RNH was given a zero on the notes and rough draft portions of the project, and his overall grade on the final paper was 65 out of 100. His average in the “college-level, advanced placement course” allegedly dropped from 84 to 78. The students were also barred from selection for the National Honor Society.

“While there is much dispute as to whether the use of generative AI constitutes plagiarism, plagiarism is defined as the practice of taking someone else’s work or ideas and passing them off as one’s own. During the project, RNH and his classmate did not take someone else’s work or ideas and pass them off as their own,” the motion said. The students “used AI, which generates and synthesizes new information.”

The National Honor Society exclusion was eventually reversed, but not in time for RNH’s applications to colleges for early decision, the parents allege. The initial lawsuit in Plymouth County Superior Court was filed on September 16 and said that RNH was still barred from the group at that time.

“This fall, the district allowed him to reapply for National Honor Society. He was inducted Oct. 8, but the student’s attorney says the damage had already been done,” according to the Patriot Ledger. “Peter Farrell, the student’s lawyer, said the reversal happened only after an investigation revealed that seven other students disciplined for academic dishonesty had been inducted into the National Honors Society, including one student censured for use of artificial intelligence.”

The motion said the punishment had “a significant, severe, and continuing impact on RNH’s future earning capacity, earning potential, and acceptance into an elite college or university course of study given his exemplary academic achievements.” The parents allege that “Defendants exceeded the authority granted to them in an abuse of authority, discretion, and unfettered state action by unfairly and unjustly acting as investigator, judge, jury, and executioner in determining the extreme and outrageous sanctions imposed upon these Students.”

Now “a senior at the top of his class,” RNH is “a three-sport varsity student-athlete, maintains a high grade point average, scored 1520 on his SAT, earned a perfect score on the ACT, and should receive a National Merit Scholarship Corporation Letter of Commendation,” the motion said. “In addition to his high level of academic and athletic achievement, RNH has substantial community service hours including working with cognitively impaired children playing soccer with the Special Needs Athletic Partnership known as ‘SNAP.’”

School defends “relatively lenient” discipline

In their motion to dismiss, school officials defended “the just and legitimate discipline rendered to RNH.”

“This lawsuit is not about the expulsion, or even the suspension, of a high school student,” the school response said. “Instead, the dispute concerns a student, RNH, dissatisfied with a letter grade in AP US History class, having to attend a ‘Saturday’ detention, and his deferral from NHS—rudimentary student discipline administered for an academic integrity violation. RNH was given relatively lenient and measured discipline for a serious infraction, using Artificial Intelligence (‘AI’) on a project, amounting to something well less than a suspension. The discipline was consistent with the applicable Student Handbook.”

The defendants said the court “should not usurp [the] substantial deference given to schools over discipline. Because school officials are in the best position to determine when a student’s actions threaten the safety and welfare of other students, the SJC [Supreme Judicial Court] has stated that school officials must be granted substantial deference in their disciplinary choices.”

The parents’ motion for a preliminary injunction seeks an order requiring defendants “to immediately repair, restore and rectify Plaintiff Student’s letter grade in Social Studies to a grade of ‘B,’” and to expunge “any grade, report, transcript entry or record of discipline imposing any kind of academic sanction” from the incident.

The parents further request the exclusion of “any zero grade from grade calculations for the subject assignment” and an order prohibiting the school district “from characterizing the use of artificial intelligence by the Plaintiff Student as ‘cheating’ or classifying such use as an ‘academic integrity infraction’ or ‘academic dishonesty.’”

The parents also want an order requiring defendants “to undergo training in the use and implementation of artificial intelligence in the classroom, schools and educational environment by a duly qualified third party not employed by the District.”

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.

Deepfake lovers swindle victims out of $46M in Hong Kong AI scam

The police operation resulted in the seizure of computers, mobile phones, and about $25,756 in suspected proceeds and luxury watches from the syndicate’s headquarters. Police said that victims originated from multiple countries, including Hong Kong, mainland China, Taiwan, India, and Singapore.

A widening real-time deepfake problem

Real-time deepfakes have become a growing problem over the past year. In August, we covered a free app called Deep-Live-Cam that can do real-time face-swaps for video chat use, and in February, the Hong Kong office of British engineering firm Arup lost $25 million in an AI-powered scam in which the perpetrators used deepfakes of senior management during a video conference call to trick an employee into transferring money.

News of the scam also comes amid recent warnings from the United Nations Office on Drugs and Crime, notes The Record in a report about the recent scam ring. The agency released a report last week highlighting tech advancements among organized crime syndicates in Asia, specifically mentioning the increasing use of deepfake technology in fraud.

The UN agency identified more than 10 deepfake software providers selling their services on Telegram to criminal groups in Southeast Asia, showing the growing accessibility of this technology for illegal purposes.

Some companies are attempting to find automated solutions to the issues presented by AI-powered crime, including Reality Defender, which creates software that attempts to detect deepfakes in real time. Some deepfake detection techniques may work at the moment, but as the fakes improve in realism and sophistication, we may be looking at an escalating arms race between those who seek to fool others and those who want to prevent deception.

Startup can identify deepfake video in real time

Real-time deepfakes are no longer limited to billionaires, public figures, or those who have extensive online presences. Mittal’s research at NYU, with professors Chinmay Hegde and Nasir Memon, proposes a potential challenge-based approach to blocking AI bots from video calls, where participants would have to pass a kind of video CAPTCHA test before joining.

As Reality Defender works to improve the detection accuracy of its models, Colman says that access to more data is a critical challenge to overcome—a common refrain from the current batch of AI-focused startups. He’s hopeful more partnerships will fill in these gaps, and without specifics, hints at multiple new deals likely coming next year. After ElevenLabs was tied to a deepfake voice call of US president Joe Biden, the AI-audio startup struck a deal with Reality Defender to mitigate potential misuse.

What can you do right now to protect yourself from video call scams? Just like WIRED’s core advice about avoiding fraud from AI voice calls, not getting cocky about whether you can spot video deepfakes is critical to avoid being scammed. The technology in this space continues to evolve rapidly, and any telltale signs you rely on now to spot AI deepfakes may not be as dependable with the next upgrades to underlying models.

“We don’t ask my 80-year-old mother to flag ransomware in an email,” says Colman. “Because she’s not a computer science expert.” In the future, if AI detection continues to improve and proves to be reliably accurate, real-time video authentication may be taken for granted as much as the malware scanner quietly humming along in the background of your email inbox.

This story originally appeared on wired.com.
