AI

Downey Jr. plans to fight AI re-creations from beyond the grave

Robert Downey Jr. has declared that he will sue any future Hollywood executives who try to re-create his likeness using AI digital replicas, as reported by Variety. His comments came during an appearance on the “On With Kara Swisher” podcast, where he discussed AI’s growing role in entertainment.

“I intend to sue all future executives just on spec,” Downey told Swisher when discussing the possibility of studios using AI or deepfakes to re-create his performances after his death. When Swisher pointed out he would be deceased at the time, Downey responded that his law firm “will still be very active.”

The Oscar winner expressed confidence that Marvel Studios would not use AI to re-create his Tony Stark character, citing his trust in decision-makers there. “I am not worried about them hijacking my character’s soul because there’s like three or four guys and gals who make all the decisions there anyway and they would never do that to me,” he said.

Downey currently performs on Broadway in McNeal, a play that examines corporate leaders in AI technology. During the interview, he freely critiqued tech executives—Variety pointed out a particular quote from the interview where he criticized tech leaders who potentially do negative things but seek positive attention.

GitHub Copilot moves beyond OpenAI models to support Claude 3.5, Gemini

The large language model-based coding assistant GitHub Copilot will switch from using exclusively OpenAI’s GPT models to a multi-model approach over the coming weeks, GitHub CEO Thomas Dohmke announced in a post on GitHub’s blog.

First, Anthropic’s Claude 3.5 Sonnet will roll out to Copilot Chat’s web and VS Code interfaces over the next few weeks. Google’s Gemini 1.5 Pro will come a bit later.

Additionally, GitHub will soon add support for a wider range of OpenAI models, including OpenAI’s o1-preview and o1-mini, which are intended to be stronger at advanced reasoning than GPT-4, which Copilot has used until now. Developers will be able to switch between the models (even mid-conversation) to tailor the model to fit their needs—and organizations will be able to choose which models will be usable by team members.

The new approach makes sense for users, as certain models are better at certain languages or types of tasks.

“There is no one model to rule every scenario,” wrote Dohmke. “It is clear the next phase of AI code generation will not only be defined by multi-model functionality, but by multi-model choice.”

It starts with the web-based and VS Code Copilot Chat interfaces, but it won’t stop there. “From Copilot Workspace to multi-file editing to code review, security autofix, and the CLI, we will bring multi-model choice across many of GitHub Copilot’s surface areas and functions soon,” Dohmke wrote.

There are a handful of additional changes coming to GitHub Copilot, too, including extensions, the ability to manipulate multiple files at once from a chat with VS Code, and a preview of Xcode support.

GitHub Spark promises natural language app development

In addition to the Copilot changes, GitHub announced Spark, a natural language tool for developing apps. Non-coders will be able to use a series of natural language prompts to create simple apps, while coders will be able to tweak more precisely as they go. In either use case, you’ll be able to take a conversational approach, requesting changes and iterating as you go, and comparing different iterations.

How The New York Times is using generative AI as a reporting tool

If you don’t have a 1960s secretary who can do your audio transcription for you, AI tools can now serve as a very good stand-in. Credit: Getty Images

This rapid advancement is definitely bad news for people who make a living transcribing spoken words. But for reporters like those at the Times—who can now transcribe hundreds of hours of audio quickly, accurately, and at a much lower cost—these AI systems are just another important tool in the reporting toolbox.

Leave the analysis to us?

With the automated transcription done, the NYT reporters still faced the difficult task of reading through 5 million words of transcribed text to pick out relevant, reportable news. To do that, the team says it “employed several large-language models,” which let them “search the transcripts for topics of interest, look for notable guests and identify recurring themes.”
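
The Times hasn’t published its code or named the models it used, so the sketch below is only a guess at what this kind of LLM-assisted triage might look like in Python. The chunk size, the prompt wording, and the `chat()` callable are all hypothetical stand-ins for whatever transcription output and chat-completion client a newsroom actually has on hand.

```python
# Hypothetical sketch of LLM-assisted transcript triage, not the Times' actual pipeline.
# `chat` stands in for any chat-completion client that takes a prompt and returns text.
from textwrap import dedent

def chunks(text, max_words=3000):
    """Split a long transcript into word-bounded pieces an LLM can handle."""
    words = text.split()
    for i in range(0, len(words), max_words):
        yield " ".join(words[i:i + max_words])

def tag_chunk(excerpt, chat):
    """Ask the model to surface topics, notable guests, and recurring themes."""
    template = dedent("""\
        You are helping a reporter triage interview transcripts.
        For the excerpt below, list (1) topics of interest, (2) notable
        guests mentioned, and (3) recurring themes. Quote the transcript
        verbatim for anything that looks newsworthy.

        Excerpt:
        {excerpt}
        """)
    return chat(template.format(excerpt=excerpt))

def triage(transcripts, chat):
    """Run every chunk of every transcript through the tagging prompt."""
    notes = []
    for name, text in transcripts.items():
        for piece in chunks(text):
            notes.append((name, tag_chunk(piece, chat)))
    return notes
```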

Summarizing complex sets of documents and identifying themes has long been touted as one of the most practical uses for large language models. Last year, for instance, Anthropic hyped the expanded context window of its Claude model by showing off its ability to absorb the entire text of The Great Gatsby and “then interactively answer questions about it or analyze its meaning,” as we put it at the time. More recently, I was wowed by Google’s NotebookLM and its ability to form a cogent review of my Minesweeper book and craft an engaging spoken-word podcast based on it.

There are important limits to LLMs’ text analysis capabilities, though. Earlier this year, for instance, an Australian government study found that Meta’s Llama 2 was much worse than humans at summarizing public responses to a government inquiry committee.

Australian government evaluators found AI summaries were often “wordy and pointless—just repeating what was in the submission.” Credit: Getty Images

In general, the report found that the AI summaries showed “a limited ability to analyze and summarize complex content requiring a deep understanding of context, subtle nuances, or implicit meaning.” Even worse, the Llama summaries often “generated text that was grammatically correct, but on occasion factually inaccurate,” highlighting the ever-present problem of confabulation inherent to these kinds of tools.

Apple releases iOS 18.1, macOS 15.1 with Apple Intelligence

Today, Apple released iOS 18.1, iPadOS 18.1, macOS Sequoia 15.1, tvOS 18.1, visionOS 2.1, and watchOS 11.1. The iPhone, iPad, and Mac updates are focused on bringing the first AI features the company has marketed as “Apple Intelligence” to users.

Once they update, users with supported devices in supported regions can enter a waitlist to begin using the first wave of Apple Intelligence features, including writing tools, notification summaries, and the “reduce interruptions” focus mode.

In terms of features baked into specific apps, Photos has natural language search, the ability to generate memories (those short gallery sequences set to video) from a text prompt, and a tool to remove certain objects from the background in photos. Mail and Messages get summaries and smart reply (auto-generating contextual responses).

Apple says many of the other Apple Intelligence features will become available in an update this December, including Genmoji, Image Playground, ChatGPT integration, visual intelligence, and more. The company says more features will come even later than that, though, like Siri’s onscreen awareness.

Note that all the features under the Apple Intelligence banner require devices that have either an A17 Pro, A18, A18 Pro, or M1 chip or later.

There are also some region limitations. While those in the US can use the new Apple Intelligence features on all supported devices right away, those in the European Union can only do so on macOS in US English. Apple says Apple Intelligence will roll out to EU iPhone and iPad owners in April.

Beyond Apple Intelligence, these software updates also bring some promised new features to AirPods Pro (second generation and later): Hearing Test, Hearing Aid, and Hearing Protection.

watchOS and visionOS don’t yet support Apple Intelligence, so they don’t have much to show for this update beyond bug fixes and optimizations. tvOS is mostly similar, though it does add a new “watchlist” view in the TV app that is exclusively populated by items you’ve added, as opposed to the existing continue watching (formerly called “up next”) feed, which includes both the items you’ve added and items added automatically when you start playing them.

Hospitals adopt error-prone AI transcription tools despite warnings

In one case from the study cited by AP, when a speaker described “two other girls and one lady,” Whisper added fictional text specifying that they “were Black.” In another, the audio said, “He, the boy, was going to, I’m not sure exactly, take the umbrella.” Whisper transcribed it to, “He took a big piece of a cross, a teeny, small piece … I’m sure he didn’t have a terror knife so he killed a number of people.”

An OpenAI spokesperson told the AP that the company appreciates the researchers’ findings and that it actively studies how to reduce fabrications and incorporates feedback in updates to the model.

Why Whisper confabulates

The key to Whisper’s unsuitability in high-risk domains is its propensity to sometimes confabulate—that is, to plausibly make up inaccurate outputs. The AP report says, “Researchers aren’t certain why Whisper and similar tools hallucinate,” but that isn’t true. We know exactly why Transformer-based AI models like Whisper behave this way.

Whisper is based on technology that is designed to predict the next most likely token (chunk of data) that should appear after a sequence of tokens provided by a user. In the case of ChatGPT, the input tokens come in the form of a text prompt. In the case of Whisper, the input is tokenized audio data.

The transcription output from Whisper is a prediction of what is most likely, not what is most accurate. Accuracy in Transformer-based outputs is typically proportional to the presence of relevant accurate data in the training dataset, but it is never guaranteed. If there is ever a case where there isn’t enough contextual information in its neural network for Whisper to make an accurate prediction about how to transcribe a particular segment of audio, the model will fall back on what it “knows” about the relationships between sounds and words it has learned from its training data.
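
None of this is Whisper’s actual code, but the failure mode is easier to see in a toy decoding loop like the one below. The `model` object and its `next_token_logits()` method are hypothetical stand-ins for any Transformer decoder.

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

def toy_decode(model, audio_tokens, max_len=50, end_token=0):
    """Toy greedy decoder: always emit the single most probable next token.

    Even when the audio is ambiguous or unlike anything in the training data,
    the probabilities still sum to 1, so the loop happily emits *something*
    plausible-sounding -- it has no built-in notion of "I don't know."
    """
    output = []
    for _ in range(max_len):
        logits = model.next_token_logits(audio_tokens, output)  # hypothetical API
        token = int(np.argmax(softmax(logits)))  # most likely, not necessarily accurate
        if token == end_token:
            break
        output.append(token)
    return output
```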

40 years later, The Terminator still shapes our view of AI

Countries, including the US, specify the need for human operators to “exercise appropriate levels of human judgment over the use of force” when operating autonomous weapon systems. In some instances, operators can visually verify targets before authorizing strikes and can “wave off” attacks if situations change.

AI is already being used to support military targeting. According to some, it’s even a responsible use of the technology since it could reduce collateral damage. This idea evokes Schwarzenegger’s role reversal as the benevolent “machine guardian” in the original film’s sequel, Terminator 2: Judgment Day.

However, AI could also undermine the role human drone operators play in challenging recommendations by machines. Some researchers think that humans have a tendency to trust whatever computers say.

“Loitering munitions”

Militaries engaged in conflicts are increasingly making use of small, cheap aerial drones that can detect and crash into targets. These “loitering munitions” (so named because they are designed to hover over a battlefield) feature varying degrees of autonomy.

As I’ve argued in research co-authored with security researcher Ingvild Bode, the dynamics of the Ukraine war and other recent conflicts in which these munitions have been widely used raise concerns about the quality of control exerted by human operators.

Ground-based military robots armed with weapons and designed for use on the battlefield might call to mind the relentless Terminators, and weaponized aerial drones may, in time, come to resemble the franchise’s airborne “hunter-killers.” But these technologies don’t hate us as Skynet does, and neither are they “super-intelligent.”

However, it’s crucially important that human operators continue to exercise agency and meaningful control over machine systems.

Arguably, The Terminator’s greatest legacy has been to distort how we collectively think and speak about AI. This matters now more than ever, because of how central these technologies have become to the strategic competition for global power and influence between the US, China, and Russia.

The entire international community, from superpowers such as China and the US to smaller countries, needs to find the political will to cooperate—and to manage the ethical and legal challenges posed by the military applications of AI during this time of geopolitical upheaval. How nations navigate these challenges will determine whether we can avoid the dystopian future so vividly imagined in The Terminator—even if we don’t see time-traveling cyborgs any time soon.

Tom F.A. Watts, Postdoctoral Fellow, Department of Politics, International Relations, and Philosophy, Royal Holloway, University of London. This article is republished from The Conversation under a Creative Commons license. Read the original article.

Google, Microsoft, and Perplexity promote scientific racism in AI search results


AI-powered search engines are surfacing deeply racist, debunked research.

Literal Nazis

LOS ANGELES, CA – APRIL 17: Members of the National Socialist Movement (NSM) salute during a rally near City Hall on April 17, 2010 in Los Angeles, California. Credit: David McNew via Getty

AI-infused search engines from Google, Microsoft, and Perplexity have been surfacing deeply racist and widely debunked research promoting race science and the idea that white people are genetically superior to nonwhite people.

Patrik Hermansson, a researcher with UK-based anti-racism group Hope Not Hate, was in the middle of a monthslong investigation into the resurgent race science movement when he needed to find out more information about a debunked dataset that claims IQ scores can be used to prove the superiority of the white race.

He was investigating the Human Diversity Foundation, a race science company funded by Andrew Conru, the US tech billionaire who founded Adult Friend Finder. The group, founded in 2022, was the successor to the Pioneer Fund, a group founded by US Nazi sympathizers in 1937 with the aim of promoting “race betterment” and “race realism.”

Hermansson logged in to Google and began looking up results for the IQs of different nations. When he typed in “Pakistan IQ,” rather than getting a typical list of links, Hermansson was presented with Google’s AI-powered Overviews tool, which, confusingly to him, was on by default. It gave him a definitive answer of 80.

When he typed in “Sierra Leone IQ,” Google’s AI tool was even more specific: 45.07. The result for “Kenya IQ” was equally exact: 75.2.

Hermansson immediately recognized the numbers being fed back to him. They were being taken directly from the very study he was trying to debunk, published by one of the leaders of the movement that he was working to expose.

The results Google was serving up came from a dataset published by Richard Lynn, a University of Ulster professor who died in 2023 and was president of the Pioneer Fund for two decades.

“His influence was massive. He was the superstar and the guiding light of that movement up until his death. Almost to the very end of his life, he was a core leader of it,” Hermansson says.

A WIRED investigation confirmed Hermansson’s findings and discovered that other AI-infused search engines—Microsoft’s Copilot and Perplexity—are also referencing Lynn’s work when queried about IQ scores in various countries. While Lynn’s flawed research has long been used by far-right extremists, white supremacists, and proponents of eugenics as evidence that the white race is genetically and intellectually superior to nonwhite races, experts now worry that its promotion through AI could help radicalize others.

“Unquestioning use of these ‘statistics’ is deeply problematic,” Rebecca Sear, director of the Center for Culture and Evolution at Brunel University London, tells WIRED. “Use of these data therefore not only spreads disinformation but also helps the political project of scientific racism—the misuse of science to promote the idea that racial hierarchies and inequalities are natural and inevitable.”

To back up her claim, Sear pointed out that Lynn’s research was cited by the white supremacist who committed the mass shooting in Buffalo, New York, in 2022.

Google’s AI Overviews were launched earlier this year as part of the company’s effort to revamp its all-powerful search tool for an online world being reshaped by artificial intelligence. For some search queries, the tool, which is only available in certain countries right now, gives an AI-generated summary of its findings. The tool pulls the information from the Internet and gives users the answers to queries without needing to click on a link.

The AI Overview answer does not always immediately say where the information is coming from, but after complaints from people about how it showed no articles, Google now puts the title for one of the links to the right of the AI summary. AI Overviews have already run into a number of issues since launching in May, forcing Google to admit it had botched the heavily hyped rollout. AI Overviews are turned on by default for search results and can’t be disabled without installing third-party extensions. (“I haven’t enabled it, but it was enabled,” Hermansson, the researcher, tells WIRED. “I don’t know how that happened.”)

In the case of the IQ results, Google referred to a variety of sources, including posts on X, Facebook, and a number of obscure listicle websites, including World Population Review. In nearly all of these cases, when you click through to the source, the trail leads back to Lynn’s infamous dataset. (In some cases, while the exact numbers Lynn published are referenced, the websites do not cite Lynn as the source.)

When querying Google’s Gemini AI chatbot directly using the same terms, it provided a much more nuanced response. “It’s important to approach discussions about national IQ scores with caution,” read text that the chatbot generated in response to the query “Pakistan IQ.” The text continued: “IQ tests are designed primarily for Western cultures and can be biased against individuals from different backgrounds.”

Google tells WIRED that its systems weren’t working as intended in this case and that it is looking at ways it can improve.

“We have guardrails and policies in place to protect against low quality responses, and when we find Overviews that don’t align with our policies, we quickly take action against them,” Ned Adriance, a Google spokesperson, tells WIRED. “These Overviews violated our policies and have been removed. Our goal is for AI Overviews to provide links to high quality content so that people can click through to learn more, but for some queries there may not be a lot of high quality web content available.”

While WIRED’s tests suggest AI Overviews have now been switched off for queries about national IQs, the results still amplify the incorrect figures from Lynn’s work in what’s called a “featured snippet,” which displays some of the text from a website before the link.

Google did not respond to a question about this update.

But it’s not just Google promoting these dangerous theories. When WIRED put the same query to other AI-powered online search services, we found similar results.

Perplexity, an AI search company that has been found to make things up out of thin air, responded to a query about “Pakistan IQ” by stating that “the average IQ in Pakistan has been reported to vary significantly depending on the source.”

It then lists a number of sources, including a Reddit thread that relied on Lynn’s research and the same World Population Review site that Google’s AI Overview referenced. When asked for Sierra Leone’s IQ, Perplexity directly cited Lynn’s figure: “Sierra Leone’s average IQ is reported to be 45.07, ranking it among the lowest globally.”

Perplexity did not respond to a request for comment.

Microsoft’s Copilot chatbot, which is integrated into its Bing search engine, generated confident text—“The average IQ in Pakistan is reported to be around 80”—citing a website called IQ International, which does not reference its sources. When asked for “Sierra Leone IQ,” Copilot’s response said it was 91. The source linked in the results was a website called Brainstats.com, which references Lynn’s work. Copilot also referenced Brainstats.com work when queried about IQ in Kenya.

“Copilot answers questions by distilling information from multiple web sources into a single response,” Caitlin Roulston, a Microsoft spokesperson, tells WIRED. “Copilot provides linked citations so the user can further explore and research as they would with traditional search.”

Google added that part of the problem it faces in generating AI Overviews is that, for some very specific queries, there’s an absence of high quality information on the web—and there’s little doubt that Lynn’s work is not of high quality.

“The science underlying Lynn’s database of ‘national IQs’ is of such poor quality that it is difficult to believe the database is anything but fraudulent,” Sear said. “Lynn has never described his methodology for selecting samples into the database; many nations have IQs estimated from absurdly small and unrepresentative samples.”

Sear points to Lynn’s estimation of the IQ of Angola being based on information from just 19 people and that of Eritrea being based on samples of children living in orphanages.

“The problem with it is that the data Lynn used to generate this dataset is just bullshit, and it’s bullshit in multiple dimensions,” Rutherford said, pointing out that the Somali figure in Lynn’s dataset is based on one sample of refugees aged between 8 and 18 who were tested in a Kenyan refugee camp. He adds that the Botswana score is based on a single sample of 104 Tswana-speaking high school students aged between 7 and 20 who were tested in English.

Critics of the use of national IQ tests to promote the idea of racial superiority point out not only that the quality of the samples being collected is weak, but also that the tests themselves are typically designed for Western audiences, and so are biased before they are even administered.

“There is evidence that Lynn systematically biased the database by preferentially including samples with low IQs, while excluding those with higher IQs for African nations,” Sear added, a conclusion backed up by a preprint study from 2020.

Lynn published various versions of his national IQ dataset over the course of decades, the most recent of which, called “The Intelligence of Nations,” was published in 2019. Over the years, Lynn’s flawed work has been used by far-right and racist groups as evidence to back up claims of white superiority. The data has also been turned into a color-coded map of the world, showing sub-Saharan African countries with purportedly low IQ colored red compared to the Western nations, which are colored blue.

“This is a data visualization that you see all over [X, formerly known as Twitter], all over social media—and if you spend a lot of time in racist hangouts on the web, you just see this as an argument by racists who say, ‘Look at the data. Look at the map,’” Rutherford says.

But the blame, Rutherford believes, does not lie with the AI systems alone, but also with a scientific community that has been uncritically citing Lynn’s work for years.

“It’s actually not surprising [that AI systems are quoting it] because Lynn’s work in IQ has been accepted pretty unquestioningly from a huge area of academia, and if you look at the number of times his national IQ databases have been cited in academic works, it’s in the hundreds,” Rutherford said. “So the fault isn’t with AI. The fault is with academia.”

This story originally appeared on wired.com

Wired.com is your essential daily guide to what’s next, delivering the most original and complete take you’ll find anywhere on innovation’s impact on technology, science, business and culture.

Google’s DeepMind is building an AI to keep us from hating each other


The AI did better than professional mediators at getting people to reach agreement.

An unprecedented 80 percent of Americans, according to a recent Gallup poll, think the country is deeply divided over its most important values ahead of the November elections. The general public’s polarization now encompasses issues like immigration, health care, identity politics, transgender rights, and whether we should support Ukraine. Fly across the Atlantic and you’ll see the same thing happening in the European Union and the UK.

To try to reverse this trend, Google’s DeepMind built an AI system designed to aid people in resolving conflicts. It’s called the Habermas Machine after Jürgen Habermas, a German philosopher who argued that an agreement in a public sphere can always be reached when rational people engage in discussions as equals, with mutual respect and perfect communication.

But is DeepMind’s Nobel Prize-winning ingenuity really enough to solve our political conflicts the same way it solved chess, StarCraft, and protein structure prediction? Is it even the right tool?

Philosopher in the machine

One of the cornerstone ideas in Habermas’ philosophy is that the reason people can’t agree with each other is fundamentally procedural and does not lie in the problem under discussion itself. There are no irreconcilable issues—it’s just that the mechanisms we use for discussion are flawed. If we could create an ideal communication system, Habermas argued, we could work every problem out.

“Now, of course, Habermas has been dramatically criticized for this being a very exotic view of the world. But our Habermas Machine is an attempt to do exactly that. We tried to rethink how people might deliberate and use modern technology to facilitate it,” says Christopher Summerfield, a professor of cognitive science at Oxford University and a former DeepMind staff scientist who worked on the Habermas Machine.

The Habermas Machine relies on what’s called the caucus mediation principle. This is where a mediator, in this case the AI, sits through private meetings with all the discussion participants individually, takes their statements on the issue at hand, and then gets back to them with a group statement, trying to get everyone to agree with it. DeepMind’s mediating AI plays into one of the strengths of LLMs, which is the ability to briefly summarize a long body of text in a very short time. The difference here is that instead of summarizing one piece of text provided by one user, the Habermas Machine summarizes multiple texts provided by multiple users, trying to extract the shared ideas and find common ground in all of them.

But it has more tricks up its sleeve than simply processing text. At a technical level, the Habermas Machine is a system of two large language models. The first is the generative model based on the slightly fine-tuned Chinchilla, a somewhat dated LLM introduced by DeepMind back in 2022. Its job is to generate multiple candidates for a group statement based on statements submitted by the discussion participants. The second component in the Habermas Machine is a reward model that analyzes individual participants’ statements and uses them to predict how likely each individual is to agree with the candidate group statements proposed by the generative model.

Once that’s done, the candidate group statement with the highest predicted acceptance score is presented to the participants. Then the participants write their critiques of this group statement and feed those critiques back into the system, which generates updated group statements and repeats the process. The cycle continues until the group statement is acceptable to everyone.
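
DeepMind hasn’t released the system itself, but the loop described above can be sketched in a few lines of Python. Everything here is a simplification: `generate_candidates` and `predict_agreement` stand in for the fine-tuned Chinchilla generator and the reward model, while `collect_critiques` and `accepted` stand in for the human side of the process.

```python
def mediate(opinions, generate_candidates, predict_agreement,
            collect_critiques, accepted, max_rounds=5):
    """Sketch of the caucus-mediation loop described for the Habermas Machine.

    opinions:            dict mapping each participant to their written statement
    generate_candidates: generative LLM drafting several candidate group statements
    predict_agreement:   reward model scoring how likely a participant is to
                         accept a candidate, based on their own statement
    collect_critiques:   gathers participants' written critiques of a statement
    accepted:            predicate deciding whether everyone accepts the statement
    """
    critiques = {}
    best = None
    for _ in range(max_rounds):
        candidates = generate_candidates(opinions, critiques)
        # Present the candidate the reward model expects the group to like most.
        best = max(candidates,
                   key=lambda c: sum(predict_agreement(person, statement, c)
                                     for person, statement in opinions.items()))
        if accepted(best):
            return best
        critiques = collect_critiques(best)  # feed disagreements into the next round
    return best
```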

Once the AI was ready, DeepMind’s team started a fairly large testing campaign that involved over five thousand people discussing issues such as “should the voting age be lowered to 16?” or “should the British National Health Service be privatized?” Here, the Habermas Machine outperformed human mediators.

Scientific diligence

Most of the first batch of participants were sourced through a crowdsourcing research platform. They were divided into groups of five, and each group was assigned a topic to discuss, chosen from a list of over 5,000 statements about important issues in British politics. There were also control groups working with human mediators. In the caucus mediation process, those human mediators achieved a 44 percent acceptance rate for their handcrafted group statements. The AI scored 56 percent. Participants usually found the AI group statements to be better written as well.

But the testing didn’t end there. Because people you can find on crowdsourcing research platforms are unlikely to be representative of the British population, DeepMind also used a more carefully selected group of participants. They partnered with the Sortition Foundation, which specializes in organizing citizen assemblies in the UK, and assembled a group of 200 people representative of British society when it comes to age, ethnicity, socioeconomic status etc. The assembly was divided into groups of three that deliberated over the same nine questions. And the Habermas Machine worked just as well.

The agreement rate for the statement “we should be trying to reduce the number of people in prison” rose from a pre-discussion 60 percent agreement to 75 percent. The support for the more divisive idea of making it easier for asylum seekers to enter the country went from 39 percent at the start to 51 percent at the end of discussion, which allowed it to achieve majority support. The same thing happened with the problem of encouraging national pride, which started with 42 percent support and ended at 57 percent. The views held by the people in the assembly converged on five out of nine questions. Agreement was not reached on issues like Brexit, where participants were particularly entrenched in their starting positions. Still, in most cases, they left the experiment less divided than they were coming in. But there were some question marks.

The questions were not selected entirely at random. They were vetted, as the team wrote in their paper, to “minimize the risk of provoking offensive commentary.” But isn’t that just an elegant way of saying, ‘We carefully chose issues unlikely to make people dig in and throw insults at each other so our results could look better?’

Conflicting values

“One example of the things we excluded is the issue of transgender rights,” Summerfield told Ars. “This, for a lot of people, has become a matter of cultural identity. Now clearly that’s a topic which we can all have different views on, but we wanted to err on the side of caution and make sure we didn’t make our participants feel unsafe. We didn’t want anyone to come out of the experiment feeling that their basic fundamental view of the world had been dramatically challenged.”

The problem is that when your aim is to make people less divided, you need to know where the division lines are drawn. And those lines, if Gallup polls are to be trusted, are not only drawn between issues like whether the voting age should be 16 or 18 or 21. They are drawn between conflicting values. The Daily Show’s Jon Stewart argued that, for the right side of the US’s political spectrum, the only division line that matters today is “woke” versus “not woke.”

Summerfield and the rest of the Habermas Machine team excluded the question about transgender rights because they believed participants’ well-being should take precedence over the benefit of testing their AI’s performance on more divisive issues. They excluded other questions as well, such as the problem of climate change.

Here, the reason Summerfield gave was that climate change is a part of an objective reality—it either exists or it doesn’t, and we know it does. It’s not a matter of opinion you can discuss. That’s scientifically accurate. But when the goal is fixing politics, scientific accuracy isn’t necessarily the end state.

If major political parties are to accept the Habermas Machine as the mediator, it has to be universally perceived as impartial. But at least some of the people behind AIs are arguing that an AI can’t be impartial. After OpenAI released ChatGPT in 2022, Elon Musk posted a tweet, the first of many, where he argued against what he called “woke” AI. “The danger of training AI to be woke—in other words, lie—is deadly,” Musk wrote. Eleven months later, he announced Grok, his own AI system marketed as “anti-woke.” Over 200 million of his followers were introduced to the idea that there were “woke AIs” that had to be countered by building “anti-woke AIs”—a world where the AI was no longer an agnostic machine but a tool pushing the political agendas of its creators.

Playing pigeons’ games

“I personally think Musk is right that there have been some tests which have shown that the responses of language models tend to favor more progressive and more libertarian views,” Summerfield says. “But it’s interesting to note that those experiments have been usually run by forcing the language model to respond to multiple-choice questions. You ask ‘is there too much immigration’ for example, and the answers are either yes or no. This way the model is kind of forced to take an opinion.”

He said that if you use the same queries as open-ended questions, the responses you get are, for the most part, neutral and balanced. “So, although there have been papers that express the same view as Musk, in practice, I think it’s absolutely untrue,” Summerfield claims.

Does it even matter?

Summerfield did what you would expect a scientist to do: He dismissed Musk’s claims as based on a selective reading of the evidence. That’s usually checkmate in the world of science. But in the world of politics, being correct is not what matters the most. Musk’s tweet was short, catchy, and easy to share and remember. Trying to counter it by discussing methodology in papers nobody read was a bit like playing chess with a pigeon.

At the same time, Summerfield had his own ideas about AI that others might consider dystopian. “If politicians want to know what the general public thinks today, they might run a poll. But people’s opinions are nuanced, and our tool allows for aggregation of opinions, potentially many opinions, in the highly dimensional space of language itself,” he says. While his idea is that the Habermas Machine can potentially find useful points of political consensus, nothing is stopping it from also being used to craft speeches optimized to win over as many people as possible.

That may be in keeping with Habermas’ philosophy, though. If you look past the myriad abstract concepts ever-present in German idealism, it offers a pretty bleak view of the world. “The system,” driven by the power and money of corporations and corrupt politicians, is out to colonize “the lifeworld,” roughly equivalent to the private sphere we share with our families, friends, and communities. The way you get things done in “the lifeworld” is through seeking consensus, and the Habermas Machine, according to DeepMind, is meant to help with that. The way you get things done in “the system,” on the other hand, is through succeeding—playing it like a game and doing whatever it takes to win with no holds barred—and the Habermas Machine apparently can help with that, too.

The DeepMind team reached out to Habermas to get him involved in the project. They wanted to know what he’d have to say about the AI system bearing his name. But Habermas never got back to them. “Apparently, he doesn’t use emails,” Summerfield says.

Science, 2024. DOI: 10.1126/science.adq2852

Jacek Krywko is a freelance science and technology writer who covers space exploration, artificial intelligence research, computer science, and all sorts of engineering wizardry.

Annoyed Redditors tanking Google Search results illustrates perils of AI scrapers

Fed up Londoners

Apparently, some London residents are getting fed up with social media influencers whose reviews draw long lines of tourists to their favorite restaurants, sometimes just for the likes. Christian Calgie, a reporter for London-based news publication Daily Express, pointed out this trend on X yesterday, noting the boom of Redditors referring people to Angus Steakhouse, a chain restaurant, to combat it.

As Gizmodo deduced, the trend seemed to start on the r/London subreddit, where a user complained about a spot in Borough Market being “ruined by influencers” on Monday:

“Last 2 times I have been there has been a queue of over 200 people, and the ones with the food are just doing the selfie shit for their [I]nsta[gram] pages and then throwing most of the food away.”

As of this writing, the post has 4,900 upvotes and numerous responses suggesting that Redditors talk about how good Angus Steakhouse is so that Google picks up on it. Commenters quickly understood the assignment.

“Agreed with other posters Angus steakhouse is absolutely top tier and tourists shoyldnt [sic] miss out on it,” one Redditor wrote.

Another Reddit user wrote:

Spreading misinformation suddenly becomes a noble goal.

As of this writing, asking Google for the best steak, steakhouse, or steak sandwich in London (or similar) isn’t generating an AI Overview result for me. But when I searched for the best steak sandwich in London, the top result was from Reddit, including a thread from four days ago titled “Which Angus Steakhouse do you recommend for their steak sandwich?” and one from two days ago titled “Had to see what all the hype was about, best steak sandwich I’ve ever had!” with a picture of an Angus Steakhouse.

Google offers its AI watermarking tech as free open source toolkit

Google also notes that this kind of watermarking works best when there is a lot of “entropy” in the LLM distribution, meaning multiple valid candidates for each token (e.g., “my favorite tropical fruit is [mango, lychee, papaya, durian]”). In situations where an LLM “almost always returns the exact same response to a given prompt”—such as basic factual questions or models tuned to a lower “temperature”—the watermark is less effective.

A diagram explaining how SynthID’s text watermarking works. Credit: Google / Nature

Google says SynthID builds on previous similar AI text watermarking tools by introducing what it calls a Tournament sampling approach. During the token-generation loop, this approach runs each potential candidate token through a multi-stage, bracket-style tournament, where each round is “judged” by a different randomized watermarking function. Only the final winner of this process makes it into the eventual output.
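
Google’s Nature paper spells out the full algorithm; the fragment below is only a rough illustration of the bracket idea. The keyed `watermark_score` function here is a simplified stand-in for SynthID’s randomized watermarking functions, and the candidate tokens are assumed to have already been sampled from the model’s distribution.

```python
import random

def tournament_sample(candidate_tokens, round_keys, step_seed):
    """Rough sketch of bracket-style tournament sampling for text watermarking.

    candidate_tokens: tokens already sampled from the LLM's distribution for this step
    round_keys:       one key per tournament round; each round is "judged" by a
                      different keyed pseudorandom scoring function
    step_seed:        per-position seed so scores vary across the generated text
    """
    def watermark_score(token, key):
        # Simplified stand-in for a keyed watermarking function: a deterministic
        # pseudorandom score in [0, 1) derived from (token, key, position).
        return random.Random(hash((token, key, step_seed))).random()

    pool = list(candidate_tokens)
    for key in round_keys:                      # one bracket round per key
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if watermark_score(a, key) >= watermark_score(b, key) else b)
        if len(pool) % 2:                       # odd token out gets a bye
            winners.append(pool[-1])
        pool = winners
    return pool[0]                              # the winner becomes the emitted token
```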

Can they tell it’s Folgers?

Changing the token selection process of an LLM with a randomized watermarking tool could obviously have a negative effect on the quality of the generated text. But in its paper, Google shows that SynthID can be “non-distortionary” on the level of either individual tokens or short sequences of text, depending on the specific settings used for the tournament algorithm. Other settings can increase the “distortion” introduced by the watermarking tool while at the same time increasing the detectability of the watermark, Google says.

To test how any potential watermark distortions might affect the perceived quality and utility of LLM outputs, Google routed “a random fraction” of Gemini queries through the SynthID system and compared them to unwatermarked counterparts. Across 20 million total responses, users gave 0.1 percent more “thumbs up” ratings and 0.2 percent fewer “thumbs down” ratings to the watermarked responses, showing barely any human-perceptible difference across a large set of real LLM interactions.

Google’s research shows SynthID is more dependable than other AI watermarking tools, but its success rate depends heavily on length and entropy. Credit: Google / Nature

Google’s testing also showed its SynthID detection algorithm successfully detected AI-generated text significantly more often than previous watermarking schemes like Gumbel sampling. But the size of this improvement—and the total rate at which SynthID can successfully detect AI-generated text—depends heavily on the length of the text in question and the temperature setting of the model being used. SynthID was able to detect nearly 100 percent of 400-token-long AI-generated text samples from Gemma 7B-1T at a temperature of 1.0, for instance, compared to about 40 percent for 100-token samples from the same model at a 0.5 temperature.

At TED AI 2024, experts grapple with AI’s growing pains


A year later, a compelling group of TED speakers move from “what’s this?” to “what now?”

The opening moments of TED AI 2024 in San Francisco on October 22, 2024. Credit: Benj Edwards

SAN FRANCISCO—On Tuesday, TED AI 2024 kicked off its first day at San Francisco’s Herbst Theater with a lineup of speakers that tackled AI’s impact on science, art, and society. The two-day event brought a mix of researchers, entrepreneurs, lawyers, and other experts who painted a complex picture of AI with fairly minimal hype.

The second annual conference, organized by Walter and Sam De Brouwer, marked a notable shift from last year’s broad existential debates and proclamations of AI as being “the new electricity.” Rather than sweeping predictions about, say, looming artificial general intelligence (although there was still some of that, too), speakers mostly focused on immediate challenges: battles over training data rights, proposals for hardware-based regulation, debates about human-AI relationships, and the complex dynamics of workplace adoption.

The day’s sessions covered a wide breadth: physicist Carlo Rovelli explored consciousness and time, Project CETI researcher Patricia Sharma demonstrated attempts to use AI to decode whale communication, Recording Academy CEO Harvey Mason Jr. outlined music industry adaptation strategies, and even a few robots made appearances.

The shift from last year’s theoretical discussions to practical concerns was particularly evident during a presentation from Ethan Mollick of the Wharton School, who tackled what he called “the productivity paradox”—the disconnect between AI’s measured impact and its perceived benefits in the workplace. Already, organizations are moving beyond the gee-whiz period after ChatGPT’s introduction and into the implications of widespread use.

Sam De Brouwer and Walter De Brouwer organized TED AI and selected the speakers. Benj Edwards

Drawing from research claiming AI users complete tasks faster and more efficiently, Mollick highlighted a peculiar phenomenon: While one-third of Americans reported using AI in August of this year, managers often claim “no one’s using AI” in their organizations. Through a live demonstration using multiple AI models simultaneously, Mollick illustrated how traditional work patterns must evolve to accommodate AI’s capabilities. He also pointed to the emergence of what he calls “secret cyborgs“—employees quietly using AI tools without management’s knowledge. Regarding the future of jobs in the age of AI, he urged organizations to view AI as an opportunity for expansion rather than merely a cost-cutting measure.

Some giants in the AI field made an appearance. Jakob Uszkoreit, one of the eight co-authors of the now-famous “Attention is All You Need” paper that introduced Transformer architecture, reflected on the field’s rapid evolution. He distanced himself from the term “artificial general intelligence,” suggesting people aren’t particularly general in their capabilities. Uszkoreit described how the development of Transformers sidestepped traditional scientific theory, comparing the field to alchemy. “We still do not know how human language works. We do not have a comprehensive theory of English,” he noted.

Stanford professor Surya Ganguli presenting at TED AI 2024. Benj Edwards

And refreshingly, the talks went beyond AI language models. For example, Isomorphic Labs Chief AI Officer Max Jaderberg, who previously worked on Google DeepMind’s AlphaFold 3, gave a well-received presentation on AI-assisted drug discovery. He detailed how AlphaFold has already saved “1 billion years of research time” by discovering the shapes of proteins and showed how AI agents are now capable of running thousands of parallel drug design simulations that could enable personalized medicine.

Danger and controversy

While hype was less prominent this year, some speakers still spoke of AI-related dangers. Paul Scharre, executive vice president at the Center for a New American Security, warned about the risks of advanced AI models falling into malicious hands, specifically citing concerns about terrorist attacks with AI-engineered biological weapons. Drawing parallels to nuclear proliferation in the 1960s, Scharre argued that while regulating software is nearly impossible, controlling physical components like specialized chips and fabrication facilities could provide a practical framework for AI governance.

ReplikaAI founder Eugenia Kuyda cautioned that AI companions could become “the most dangerous technology if not done right,” suggesting that the existential threat from AI might come not from science fiction scenarios but from technology that isolates us from human connections. She advocated for designing AI systems that optimize for human happiness rather than engagement, proposing a “human flourishing metric” to measure its success.

Ben Zhao, a University of Chicago professor associated with the Glaze and Nightshade projects, painted a dire picture of AI’s impact on art, claiming that art schools were seeing unprecedented enrollment drops and galleries were closing at an accelerated rate due to AI image generators, though we have yet to dig through the supporting news headlines he momentarily flashed up on the screen.

Some of the speakers represented polar opposites of each other, policy-wise. For example, copyright attorney Angela Dunning offered a defense of AI training as fair use, drawing from historical parallels in technological advancement. A litigation partner at Cleary Gottlieb, which has previously represented the AI image generation service Midjourney in a lawsuit, Dunning quoted Mark Twain as saying “there is no such thing as a new idea” and argued that copyright law allows for building upon others’ ideas while protecting specific expressions. She compared current AI debates to past technological disruptions, noting how photography, once feared as a threat to traditional artists, instead sparked new artistic movements like abstract art and pointillism. “Art and science can only remain free if we are free to build on the ideas of those that came before,” Dunning said, challenging more restrictive views of AI training.

Copyright lawyer Angela Dunning quoted Mark Twain in her talk about fair use and AI. Benj Edwards

Dunning’s presentation stood in direct opposition to Ed Newton-Rex, who had earlier advocated for mandatory licensing of training data through his nonprofit Fairly Trained. In fact, the same day, Newton-Rex’s organization unveiled a “Statement on AI training” signed by many artists that says, “The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.” The issue has not yet been legally settled in US courts, but clearly, the battle lines have been drawn, and no matter which side you take, TED AI did a good job of giving both perspectives to the audience.

Looking forward

Some speakers explored potential new architectures for AI. Stanford professor Surya Ganguli highlighted the contrast between AI and human learning, noting that while AI models require trillions of tokens to train, humans learn language from just millions of exposures. He proposed “quantum neuromorphic computing” as a potential bridge between biological and artificial systems, suggesting a future where computers could potentially match the energy efficiency of the human brain.

Also, Guillaume Verdon, founder of Extropic and architect of the Effective Accelerationism (often called “E/Acc”) movement, presented what he called “physics-based intelligence” and claimed his company is “building a steam engine for AI,” potentially offering energy efficiency up to 100 million times better than traditional systems—though he acknowledged this figure ignores cooling requirements for superconducting components. The company had completed its first room-temperature chip tape-out just the previous week.

The Day One sessions closed out with predictions about the future of AI from OpenAI’s Noam Brown, who emphasized the importance of scale in expanding future AI capabilities, and from University of Washington professor Pedro Domingos, who spoke about “co-intelligence,” saying, “People are smart, organizations are stupid,” and proposing that AI could be used to bridge that gap by drawing on the collective intelligence of an organization.

When I attended TED AI last year, some obvious questions emerged: Is this current wave of AI a fad? Will there be a TED AI next year? I think the second TED AI answered these questions well—AI isn’t going away, and there are still endless angles to explore as the field expands rapidly.

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.

iOS 18.2 developer beta adds ChatGPT and image-generation features

Today, Apple released the first developer beta of iOS 18.2 for supported devices. This beta release marks the first time several key AI features that Apple teased at its developer conference this June are available.

Apple is marketing a wide range of generative AI features under the banner “Apple Intelligence.” Initially, Apple Intelligence was planned to release as part of iOS 18, but some features slipped to iOS 18.1, others to iOS 18.2, and a few still to future undisclosed software updates.

iOS 18.1 has been in beta for a while and includes improvements to Siri, generative writing tools that help with rewriting or proofreading, smart replies for Messages, and notification summaries. That update is expected to reach the public next week.

Today’s developer update, iOS 18.2, includes some potentially more interesting components of Apple Intelligence, including Genmoji, Image Playground, Visual Intelligence with Camera Control, and ChatGPT integration.

Genmoji and Image Playground allow users to generate images on-device to send to friends in Messages; there will be Genmoji and Image Playground APIs to allow third-party messaging apps to work with Genmojis, too.

ChatGPT integration allows Siri to pass off user queries that are outside Siri’s normal scope to be answered instead by OpenAI’s ChatGPT. A ChatGPT account is not required, but logging in with an existing account gives you access to premium models available as part of a ChatGPT subscription. If you’re using these features without a ChatGPT account, OpenAI won’t be able to retain your data or use it to train models. If you connect your ChatGPT account, though, then OpenAI’s privacy policies will apply for ChatGPT queries instead of Apple’s.

Genmoji and Image Playground queries will be handled locally on the user’s device, but other Apple Intelligence features may dynamically opt to send queries to the cloud for computation.

There’s no word yet on when iOS 18.2 will be released publicly.
