

YouTube tries convincing record labels to license music for AI song generator

Jukebox zeroes —

Video site needs labels’ content to legally train AI song generators.

Chris Ratcliffe/Bloomberg via Getty

YouTube is in talks with record labels to license their songs for artificial intelligence tools that clone popular artists’ music, hoping to win over a skeptical industry with upfront payments.

The Google-owned video site needs labels’ content to legally train AI song generators, as it prepares to launch new tools this year, according to three people familiar with the matter.

The company has recently offered lump sums of cash to the major labels—Sony, Warner, and Universal—to try to convince more artists to allow their music to be used in training AI software, according to several people briefed on the talks.

However, many artists remain fiercely opposed to AI music generation, fearing it could undermine the value of their work. Any move by a label to force their stars into such a scheme would be hugely controversial.

“The industry is wrestling with this. Technically the companies have the copyrights, but we have to think through how to play it,” said an executive at a large music company. “We don’t want to be seen as a Luddite.”

YouTube last year began testing a generative AI tool that lets people create short music clips by entering a text prompt. The product, initially named “Dream Track,” was designed to imitate the sound and lyrics of well-known singers.

But only 10 artists agreed to participate in the test phase, including Charli XCX, Troye Sivan and John Legend, and Dream Track was made available to just a small group of creators.

YouTube wants to sign up “dozens” of artists to roll out a new AI song generator this year, said two of the people.

YouTube said: “We’re not looking to expand Dream Track but are in conversations with labels about other experiments.”

Licenses or lawsuits

YouTube is seeking new deals at a time when AI companies such as OpenAI are striking licensing agreements with media groups to train large language models, the systems that power AI products such as the ChatGPT chatbot. Some of those deals are worth tens of millions of dollars to media companies, insiders say.

The deals being negotiated in music would be different. They would not be blanket licenses but rather would apply to a select group of artists, according to people briefed on the discussions.

It would be up to the labels to encourage their artists to participate in the new projects. That means the final amounts YouTube might pay the labels remain undetermined at this stage.

The deals would look more like the one-off payments from social media companies such as Meta or Snap to entertainment groups for access to their music, rather than the royalty-based arrangements labels have with Spotify or Apple, these people said.

YouTube’s new AI tool, which is unlikely to carry the Dream Track brand, could form part of YouTube’s Shorts platform, which competes with TikTok. Talks continue and deal terms could still change, the people said.

YouTube’s latest move comes as the leading record companies on Monday sued two AI start-ups, Suno and Udio, which they allege are illegally using copyrighted recordings to train their AI models. A music industry group is seeking “up to $150,000 per work infringed,” according to the filings.

After facing the threat of extinction following the rise of Napster in the 2000s, music companies are trying to get ahead of disruptive technology this time around. The labels are keen to get involved with licensed products that use AI to create songs using their music copyrights—and get paid for it.

Sony Music, which did not participate in the first phase of YouTube’s AI experiment, is in negotiations with the tech group to make available some of its music to the new tools, said a person familiar with the matter. Warner and Universal, whose artists participated in the test phase, are also in talks with YouTube about expanding the product, these people said.

In April, more than 200 musicians including Billie Eilish and the estate of Frank Sinatra signed an open letter.

“Unchecked, AI will set in motion a race to the bottom that will degrade the value of our work and prevent us from being fairly compensated for it,” the letter said.

YouTube added: “We are always testing new ideas and learning from our experiments; it’s an important part of our innovation process. We will continue on this path with AI and music as we build for the future.”

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.



Taking a closer look at AI’s supposed energy apocalypse

Someone just asked what it would look like if their girlfriend was a Smurf. Better add another rack of servers!

Getty Images

Late last week, both Bloomberg and The Washington Post published stories focused on the ostensibly disastrous impact artificial intelligence is having on the power grid and on efforts to collectively reduce our use of fossil fuels. The high-profile pieces lean heavily on recent projections from Goldman Sachs and the International Energy Agency (IEA) to cast AI’s “insatiable” demand for energy as an almost apocalyptic threat to our power infrastructure. The Post piece even cites anonymous “some [people]” in reporting that “some worry whether there will be enough electricity to meet [the power demands] from any source.”

Digging into the best available numbers and projections, though, it’s hard to see AI’s current and near-future environmental impact in such a dire light. While generative AI models and tools can and will use a significant amount of energy, we shouldn’t conflate AI energy usage with the larger and largely pre-existing energy usage of “data centers” as a whole. And just like any technology, whether that AI energy use is worthwhile depends largely on your wider opinion of the value of generative AI in the first place.

Not all data centers

While the headline focus of both Bloomberg and The Washington Post’s recent pieces is on artificial intelligence, the actual numbers and projections cited in both pieces overwhelmingly focus on the energy used by Internet “data centers” as a whole. Long before generative AI became the current Silicon Valley buzzword, those data centers were already growing immensely in size and energy usage, powering everything from Amazon Web Services servers to online gaming services, Zoom video calls, and cloud storage and retrieval for billions of documents and photos, to name just a few of the more common uses.

The Post story acknowledges that these “nondescript warehouses packed with racks of servers that power the modern Internet have been around for decades.” But in the very next sentence, the Post asserts that, today, data center energy use “is soaring because of AI.” Bloomberg asks one source directly “why data centers were suddenly sucking up so much power” and gets back a blunt answer: “It’s AI… It’s 10 to 15 times the amount of electricity.”

The massive growth in data center power usage mostly predates the current mania for generative AI (red 2022 line added by Ars).

Unfortunately for Bloomberg, that quote is followed almost immediately by a chart that heavily undercuts the AI alarmism. That chart shows worldwide data center energy usage growing at a remarkably steady pace from about 100 TWh in 2012 to around 350 TWh in 2024. The vast majority of that energy usage growth came before 2022, when the launch of tools like Dall-E and ChatGPT largely set off the industry’s current mania for generative AI. If you squint at Bloomberg’s graph, you can almost see the growth in energy usage slowing down a bit since that momentous year for generative AI.

Determining precisely how much of that data center energy use is taken up specifically by generative AI is a difficult task, but Dutch researcher Alex de Vries found a clever way to get an estimate. In his study “The growing energy footprint of artificial intelligence,” de Vries starts with estimates that Nvidia’s specialized chips are responsible for about 95 percent of the market for generative AI calculations. He then uses Nvidia’s projected production of 1.5 million AI servers in 2027—and the projected power usage for those servers—to estimate that the AI sector as a whole could use up anywhere from 85 to 134 TWh of power in just a few years.
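De Vries’ projection is, at bottom, fleet size times per-server power times hours in a year. A minimal sketch of that arithmetic (the roughly 6.5 kW and 10.2 kW per-server draws are assumed figures chosen to reproduce the article’s stated 85–134 TWh range, not numbers quoted from the study):

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, assuming continuous operation

def annual_twh(num_servers: int, watts_per_server: float) -> float:
    """Annual energy use, in terawatt-hours, for a server fleet running 24/7."""
    return num_servers * watts_per_server * HOURS_PER_YEAR / 1e12

servers = 1_500_000  # Nvidia's projected AI server production for 2027 (per the article)

low = annual_twh(servers, 6_500)    # assumed low-end draw: ~6.5 kW per server
high = annual_twh(servers, 10_200)  # assumed high-end draw: ~10.2 kW per server

print(f"{low:.0f}-{high:.0f} TWh per year")  # lands in the article's 85-134 TWh range
```

Either assumption scales the result linearly, so the wide range in the estimate comes directly from uncertainty about what hardware those servers will actually run.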

To be sure, that is an immense amount of power, representing about 0.5 percent of projected electricity demand for the entire world (and an even greater share of the local energy mix in some common data center locations). But measured against other common worldwide uses of electricity, it’s not a mind-boggling energy hog. A 2018 study estimated that PC gaming as a whole accounted for 75 TWh of electricity use per year, to pick just one common human activity on the same general energy scale (and that’s without console or mobile gamers included).

Worldwide projections for AI energy use in 2027 are on the same scale as the energy used by PC gamers.

More to the point, de Vries’ AI energy estimates are only a small fraction of the 620 to 1,050 TWh that data centers as a whole are projected to use by 2026, according to the IEA’s recent report. The vast majority of all that data center power will still be going to more mundane Internet infrastructure that we all take for granted (and which is not nearly as sexy of a headline bogeyman as “AI”).



Political deepfakes are the most popular way to misuse AI

This is not going well —

Study from Google’s DeepMind lays out nefarious ways AI is being used.


Artificial intelligence-generated “deepfakes” that impersonate politicians and celebrities are far more prevalent than efforts to use AI to assist cyber attacks, according to the first research by Google’s DeepMind division into the most common malicious uses of the cutting-edge technology.

The study said the creation of realistic but fake images, video, and audio of people was almost twice as common as the next highest misuse of generative AI tools: the falsifying of information using text-based tools, such as chatbots, to generate misinformation to post online.

The most common goal of actors misusing generative AI was to shape or influence public opinion, the analysis, conducted with the search group’s research and development unit Jigsaw, found. That accounted for 27 percent of uses, feeding into fears over how deepfakes might influence elections globally this year.

Deepfakes of UK Prime Minister Rishi Sunak, as well as other global leaders, have appeared on TikTok, X, and Instagram in recent months. UK voters go to the polls next week in a general election.

Concern is widespread that, despite social media platforms’ efforts to label or remove such content, audiences may not recognize it as fake, and its dissemination could sway voters.

Ardi Janjeva, research associate at The Alan Turing Institute, called “especially pertinent” the paper’s finding that the contamination of publicly accessible information with AI-generated content could “distort our collective understanding of sociopolitical reality.”

Janjeva added: “Even if we are uncertain about the impact that deepfakes have on voting behavior, this distortion may be harder to spot in the immediate term and poses long-term risks to our democracies.”

The study is the first of its kind by DeepMind, Google’s AI unit led by Sir Demis Hassabis, and is an attempt to quantify the risks from the use of generative AI tools, which the world’s biggest technology companies have rushed out to the public in search of huge profits.

As generative products such as OpenAI’s ChatGPT and Google’s Gemini become more widely used, AI companies are beginning to monitor the flood of misinformation and other potentially harmful or unethical content created by their tools.

In May, OpenAI released research revealing operations linked to Russia, China, Iran, and Israel had been using its tools to create and spread disinformation.

“There had been a lot of understandable concern around quite sophisticated cyber attacks facilitated by these tools,” said Nahema Marchal, lead author of the study and researcher at Google DeepMind. “Whereas what we saw were fairly common misuses of GenAI [such as deepfakes that] might go under the radar a little bit more.”

Google DeepMind and Jigsaw’s researchers analyzed around 200 observed incidents of misuse between January 2023 and March 2024, taken from social media platforms X and Reddit, as well as online blogs and media reports of misuse.


The second most common motivation behind misuse was to make money, whether offering services to create deepfakes, including generating naked depictions of real people, or using generative AI to create swaths of content, such as fake news articles.

The research found that most incidents use easily accessible tools, “requiring minimal technical expertise,” meaning more bad actors can misuse generative AI.

Google DeepMind’s research will influence how it improves its safety evaluations of models, and the company hopes the findings will also shape how competitors and other stakeholders view the way “harms are manifesting.”

© 2024 The Financial Times Ltd. All rights reserved. Not to be redistributed, copied, or modified in any way.



Music industry giants allege mass copyright violation by AI firms

No one wants to be defeated —

Suno and Udio could face damages of up to $150,000 per song allegedly infringed.

Michael Jackson in concert, 1986. Sony Music owns a large portion of publishing rights to Jackson’s music.

Universal Music Group, Sony Music, and Warner Records have sued AI music-synthesis companies Udio and Suno for allegedly committing mass copyright infringement by using recordings owned by the labels to train music-generating AI models, reports Reuters. Udio and Suno can generate novel song recordings based on text-based descriptions of music (i.e., “a dubstep song about Linus Torvalds”).

The lawsuits, filed in federal courts in New York and Massachusetts, claim that the AI companies’ use of copyrighted material to train their systems could lead to AI-generated music that directly competes with and potentially devalues the work of human artists.

Like other generative AI models, both Udio and Suno (which we covered separately in April) rely on a broad selection of existing human-created artworks that teach a neural network the relationship between words in a written prompt and styles of music. The record labels correctly note that these companies have been deliberately vague about the sources of their training data.

Until generative AI models hit the mainstream in 2022, it was common practice in machine learning to scrape and use copyrighted information without seeking permission to do so. But now that the applications of those technologies have become commercial products themselves, rightsholders have come knocking to collect. In the case of Udio and Suno, the record labels are seeking statutory damages of up to $150,000 per song used in training.

In the lawsuit, the record labels cite specific examples of AI-generated content that allegedly re-creates elements of well-known songs, including The Temptations’ “My Girl,” Mariah Carey’s “All I Want for Christmas Is You,” and James Brown’s “I Got You (I Feel Good).” It also claims the music-synthesis models can produce vocals resembling those of famous artists, such as Michael Jackson and Bruce Springsteen.

Reuters claims it’s the first instance of lawsuits specifically targeting music-generating AI, but music companies and artists alike have been gearing up to deal with challenges the technology may pose for some time.

In May, Sony Music sent warning letters to over 700 AI companies (including OpenAI, Microsoft, Google, Suno, and Udio) and music-streaming services that prohibited any AI researchers from using its music to train AI models. In April, over 200 musical artists signed an open letter that called on AI companies to stop using AI to “devalue the rights of human artists.” And last November, Universal Music filed a copyright infringement lawsuit against Anthropic for allegedly including artists’ lyrics in its Claude LLM training data.

Similar to The New York Times’ lawsuit against OpenAI over the use of training data, the outcome of the record labels’ new suit could have deep implications for the future development of generative AI in creative fields, including requiring companies to license all musical training data used in creating music-synthesis models.

Compulsory licenses for AI training data could make AI model development economically impractical for small startups like Udio and Suno—and judging by the aforementioned open letter, many musical artists may applaud that potential outcome. But such a development would not preclude major labels from eventually developing their own AI music generators, leaving only large corporations with deep pockets in control of generative music tools for the foreseeable future.



Apple Intelligence and other features won’t launch in the EU this year

DMA —

iPhone Mirroring and SharePlay screen sharing will also skip the EU for now.

Features like Image Playground won’t arrive in Europe at the same time as other regions.

Apple

Three major features in iOS 18 and macOS Sequoia will not be available to European users this fall, Apple says. They include iPhone screen mirroring on the Mac, SharePlay screen sharing, and the entire Apple Intelligence suite of generative AI features.

In a statement sent to Financial Times, The Verge, and others, Apple says this decision is related to the European Union’s Digital Markets Act (DMA). Here’s the full statement, which was attributed to Apple spokesperson Fred Sainz:

Two weeks ago, Apple unveiled hundreds of new features that we are excited to bring to our users around the world. We are highly motivated to make these technologies accessible to all users. However, due to the regulatory uncertainties brought about by the Digital Markets Act (DMA), we do not believe that we will be able to roll out three of these features — iPhone Mirroring, SharePlay Screen Sharing enhancements, and Apple Intelligence — to our EU users this year.

Specifically, we are concerned that the interoperability requirements of the DMA could force us to compromise the integrity of our products in ways that risk user privacy and data security. We are committed to collaborating with the European Commission in an attempt to find a solution that would enable us to deliver these features to our EU customers without compromising their safety.

It is unclear from Apple’s statement precisely which aspects of the DMA may have led to this decision. It could be that Apple is concerned that it would be required to give competitors like Microsoft or Google access to user data collected for Apple Intelligence features and beyond, but we’re not sure.

This is not the first recent major divergence in features between Apple devices in the EU and those in other regions. Because of EU regulations, Apple opened up iOS to third-party app stores in Europe, but not elsewhere. However, critics argued its compliance with that requirement was lukewarm at best, as it came with a set of restrictions and changes to how app developers could monetize their apps on the platform should they use those other storefronts.

While Apple says in the statement it’s open to finding a solution, no timeline is given. All we know is that the features won’t be available on devices in the EU this year. They’re expected to launch in other regions in the fall.



Anthropic introduces Claude 3.5 Sonnet, matching GPT-4o on benchmarks

The Anthropic Claude 3 logo, jazzed up by Benj Edwards.

Anthropic / Benj Edwards

On Thursday, Anthropic announced Claude 3.5 Sonnet, its latest AI language model and the first in a new series of “3.5” models that build upon Claude 3, launched in March. Claude 3.5 can compose text, analyze data, and write code. It features a 200,000 token context window and is available now on the Claude website and through an API. Anthropic also introduced Artifacts, a new feature in the Claude interface that shows related work documents in a dedicated window.

So far, people outside of Anthropic seem impressed. “This model is really, really good,” wrote independent AI researcher Simon Willison on X. “I think this is the new best overall model (and both faster and half the price of Opus, similar to the GPT-4 Turbo to GPT-4o jump).”

As we’ve written before, benchmarks for large language models (LLMs) are troublesome because they can be cherry-picked and often do not capture the feel and nuance of using a machine to generate outputs on almost any conceivable topic. But according to Anthropic, Claude 3.5 Sonnet matches or outperforms competitor models like GPT-4o and Gemini 1.5 Pro on certain benchmarks like MMLU (undergraduate level knowledge), GSM8K (grade school math), and HumanEval (coding).

Claude 3.5 Sonnet benchmarks provided by Anthropic.

If all that makes your eyes glaze over, that’s OK; it’s meaningful to researchers but mostly marketing to everyone else. A more useful performance metric comes from what we might call “vibemarks” (coined here first!) which are subjective, non-rigorous aggregate feelings measured by competitive usage on sites like LMSYS’s Chatbot Arena. The Claude 3.5 Sonnet model is currently under evaluation there, and it’s too soon to say how well it will fare.

Claude 3.5 Sonnet also outperforms Anthropic’s previous-best model (Claude 3 Opus) on benchmarks measuring “reasoning,” math skills, general knowledge, and coding abilities. For example, the model demonstrated strong performance in an internal coding evaluation, solving 64 percent of problems compared to 38 percent for Claude 3 Opus.

Claude 3.5 Sonnet is also a multimodal AI model that accepts visual input in the form of images, and the new model is reportedly excellent at a battery of visual comprehension tests.

Claude 3.5 Sonnet benchmarks provided by Anthropic.

Roughly speaking, the visual benchmarks mean that 3.5 Sonnet is better at pulling information from images than previous models. For example, you can show it a picture of a rabbit wearing a football helmet, and the model knows it’s a rabbit wearing a football helmet and can talk about it. That’s fun for tech demos, but the tech is still not accurate enough for applications where reliability is mission-critical.



Researchers describe how to tell if ChatGPT is confabulating


Aurich Lawson | Getty Images

It’s one of the world’s worst-kept secrets that large language models give blatantly false answers to queries and do so with a confidence that’s indistinguishable from when they get things right. There are a number of reasons for this. The AI could have been trained on misinformation; the answer could require some extrapolation from facts that the LLM isn’t capable of; or some aspect of the LLM’s training might have incentivized a falsehood.

But perhaps the simplest explanation is that an LLM doesn’t recognize what constitutes a correct answer but is compelled to provide one. So it simply makes something up, a habit that has been termed confabulation.

Figuring out when an LLM is making something up would obviously have tremendous value, given how quickly people have started relying on them for everything from college essays to job applications. Now, researchers from the University of Oxford say they’ve found a relatively simple way to determine when LLMs appear to be confabulating that works with all popular models and across a broad range of subjects. And, in doing so, they develop evidence that most of the alternative facts LLMs provide are a product of confabulation.

Catching confabulation

The new research is strictly about confabulations, and not instances such as training on false inputs. As the Oxford team defines them in their paper describing the work, confabulations are where “LLMs fluently make claims that are both wrong and arbitrary—by which we mean that the answer is sensitive to irrelevant details such as random seed.”

The reasoning behind their work is actually quite simple. LLMs aren’t trained for accuracy; they’re simply trained on massive quantities of text and learn to produce human-sounding phrasing through that. If enough text examples in its training consistently present something as a fact, then the LLM is likely to present it as a fact. But if the examples in its training are few, or inconsistent in their facts, then the LLMs synthesize a plausible-sounding answer that is likely incorrect.

But the LLM could also run into a similar situation when it has multiple options for phrasing the right answer. To use an example from the researchers’ paper, “Paris,” “It’s in Paris,” and “France’s capital, Paris” are all valid answers to “Where’s the Eiffel Tower?” So, statistical uncertainty, termed entropy in this context, can arise either when the LLM isn’t certain about how to phrase the right answer or when it can’t identify the right answer.

This means it’s not a great idea to simply force the LLM to return “I don’t know” when confronted with several roughly equivalent answers. We’d probably block a lot of correct answers by doing so.

So instead, the researchers focus on what they call semantic entropy. This evaluates all the statistically likely answers evaluated by the LLM and determines how many of them are semantically equivalent. If a large number all have the same meaning, then the LLM is likely uncertain about phrasing but has the right answer. If not, then it is presumably in a situation where it would be prone to confabulation and should be prevented from doing so.
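The semantic-entropy idea can be sketched in a few lines. This is a toy illustration, not the Oxford team's implementation: the real method samples answers from the model and clusters them with bidirectional entailment checks, whereas here the clustering function `meaning_of` is a hand-supplied stand-in:

```python
import math
from collections import Counter


def semantic_entropy(answers, meaning_of):
    """Entropy over meaning clusters rather than raw answer strings.

    answers: a list of answers sampled from the LLM for the same question
    meaning_of: maps each answer to a cluster label (in the paper this is
    done automatically via bidirectional entailment; here it's assumed)
    """
    clusters = Counter(meaning_of(answer) for answer in answers)
    total = sum(clusters.values())
    return -sum((n / total) * math.log2(n / total) for n in clusters.values())


# Three phrasings, one meaning: entropy is zero, so the model is merely
# uncertain about wording, not about the answer itself.
same = ["Paris", "It's in Paris", "France's capital, Paris"]
print(semantic_entropy(same, lambda a: "paris" if "Paris" in a else a))

# Three mutually inconsistent claims: entropy is high (log2(3) ≈ 1.585 bits),
# which is the signature of likely confabulation.
spread = ["Paris", "Lyon", "Marseille"]
print(semantic_entropy(spread, lambda a: a))
```

The key design choice is that entropy is computed after grouping: rephrasings of one correct answer collapse into a single cluster and contribute no uncertainty, while genuinely conflicting answers spread probability mass across clusters and drive the entropy up.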



Ex-OpenAI star Sutskever shoots for superintelligent AI with new company

Not Strategic Simulations —

Safe Superintelligence, Inc. seeks to safely build AI far beyond human capability.

Ilya Sutskever physically gestures as OpenAI CEO Sam Altman looks on at Tel Aviv University on June 5, 2023.

On Wednesday, former OpenAI Chief Scientist Ilya Sutskever announced he is forming a new company called Safe Superintelligence, Inc. (SSI) with the goal of safely building “superintelligence,” which is a hypothetical form of artificial intelligence that surpasses human intelligence, possibly in the extreme.

“We will pursue safe superintelligence in a straight shot, with one focus, one goal, and one product,” wrote Sutskever on X. “We will do it through revolutionary breakthroughs produced by a small cracked team.”

Sutskever was a founding member of OpenAI and formerly served as the company’s chief scientist. Two others are joining Sutskever at SSI initially: Daniel Levy, who formerly headed the Optimization Team at OpenAI, and Daniel Gross, an AI investor who worked on machine learning projects at Apple between 2013 and 2017. The trio posted a statement on the company’s new website.

A screen capture of Safe Superintelligence’s initial formation announcement captured on June 20, 2024.

Sutskever and several of his co-workers resigned from OpenAI in May, six months after Sutskever played a key role in ousting OpenAI CEO Sam Altman, who later returned. While Sutskever did not publicly complain about OpenAI after his departure—and OpenAI executives such as Altman wished him well on his new adventures—another resigning member of OpenAI’s Superalignment team, Jan Leike, publicly complained that “over the past years, safety culture and processes [had] taken a backseat to shiny products” at OpenAI. Leike joined OpenAI competitor Anthropic later in May.

A nebulous concept

OpenAI is currently seeking to create AGI, or artificial general intelligence, which would hypothetically match human intelligence at performing a wide variety of tasks without specific training. Sutskever hopes to jump beyond that in a straight moonshot attempt, with no distractions along the way.

“This company is special in that its first product will be the safe superintelligence, and it will not do anything else up until then,” said Sutskever in an interview with Bloomberg. “It will be fully insulated from the outside pressures of having to deal with a large and complicated product and having to be stuck in a competitive rat race.”

During his former job at OpenAI, Sutskever was part of the “Superalignment” team studying how to “align” (shape the behavior of) this hypothetical form of AI, sometimes called “ASI” for “artificial super intelligence,” to be beneficial to humanity.

As you can imagine, it’s difficult to align something that does not exist, so Sutskever’s quest has met skepticism at times. On X, University of Washington computer science professor (and frequent OpenAI critic) Pedro Domingos wrote, “Ilya Sutskever’s new company is guaranteed to succeed, because superintelligence that is never achieved is guaranteed to be safe.”

Much like AGI, superintelligence is a nebulous term. Since the mechanics of human intelligence are still poorly understood—and since human intelligence is difficult to quantify or define because there is no one set type of human intelligence—identifying superintelligence when it arrives may be tricky.

Already, computers far surpass humans in many forms of information processing (such as basic math), but are they superintelligent? Many proponents of superintelligence imagine a sci-fi scenario of an “alien intelligence” with a form of sentience that operates independently of humans, and that is more or less what Sutskever hopes to achieve and control safely.

“You’re talking about a giant super data center that’s autonomously developing technology,” he told Bloomberg. “That’s crazy, right? It’s the safety of that that we want to contribute to.”

Ex-OpenAI star Sutskever shoots for superintelligent AI with new company


Runway’s latest AI video generator brings giant cotton candy monsters to life

Screen capture of a Runway Gen-3 Alpha video generated with the prompt “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

On Sunday, Runway announced a new AI video synthesis model called Gen-3 Alpha that’s still under development, but it appears to create video of similar quality to OpenAI’s Sora, which debuted earlier this year (and has also not yet been released). It can generate novel, high-definition video from text prompts, with results ranging from realistic humans to surrealistic monsters stomping the countryside.

Unlike Runway’s previous best model from June 2023, which could only create two-second-long clips, Gen-3 Alpha can reportedly create 10-second-long video segments of people, places, and things that have a consistency and coherency that easily surpasses Gen-2. If 10 seconds sounds short compared to Sora’s full minute of video, consider that the company is working with a shoestring budget of compute compared to more lavishly funded OpenAI—and actually has a history of shipping video generation capability to commercial users.

Gen-3 Alpha does not generate audio to accompany the video clips, and it’s highly likely that temporally coherent generations (those that keep a character consistent over time) are dependent on similar high-quality training material. But Runway’s improvement in visual fidelity over the past year is difficult to ignore.

AI video heats up

It’s been a busy couple of weeks for video synthesis in the AI research community, including the launch of the Chinese model Kling, created by Beijing-based Kuaishou Technology (sometimes called “Kwai”). Kling can generate two minutes of 1080p HD video at 30 frames per second with a level of detail and coherency that reportedly matches Sora.

Gen-3 Alpha prompt: “Subtle reflections of a woman on the window of a train moving at hyper-speed in a Japanese city.”

Not long after Kling debuted, people on social media began creating surreal AI videos using Luma AI’s Luma Dream Machine. These videos were novel and weird but generally lacked coherency; we tested out Dream Machine and were not impressed by anything we saw.

Meanwhile, one of the original text-to-video pioneers, New York City-based Runway—founded in 2018—recently found itself the butt of memes that showed its Gen-2 tech falling out of favor compared to newer video synthesis models. That may have spurred the announcement of Gen-3 Alpha.

Gen-3 Alpha prompt: “An astronaut running through an alley in Rio de Janeiro.”

Generating realistic humans has always been tricky for video synthesis models, so Runway specifically shows off Gen-3 Alpha’s ability to create what its developers call “expressive” human characters with a range of actions, gestures, and emotions. The company’s provided examples weren’t particularly expressive—mostly people slowly staring and blinking—but they do look realistic.

Provided human examples include generated videos of a woman on a train, an astronaut running through a street, a man with his face lit by the glow of a TV set, a woman driving a car, and a woman running, among others.

Gen-3 Alpha prompt: “A close-up shot of a young woman driving a car, looking thoughtful, blurred green forest visible through the rainy car window.”

The generated demo videos also include more surreal video synthesis examples, including a giant creature walking in a rundown city, a man made of rocks walking in a forest, and the giant cotton candy monster seen below, which is probably the best video on the entire page.

Gen-3 Alpha prompt: “A giant humanoid, made of fluffy blue cotton candy, stomping on the ground, and roaring to the sky, clear blue sky behind them.”

Gen-3 will power various Runway AI editing tools (one of the company’s most notable claims to fame), including Multi Motion Brush, Advanced Camera Controls, and Director Mode. It can create videos from text or image prompts.

Runway says that Gen-3 Alpha is the first in a series of models trained on a new infrastructure designed for large-scale multimodal training, taking a step toward the development of what it calls “General World Models,” which are hypothetical AI systems that build internal representations of environments and use them to simulate future events within those environments.



Windows 11 24H2 is released to the public but only on Copilot+ PCs (for now)

24h2 for some —

The rest of the Windows 11 ecosystem will get the new update this fall.

Microsoft

For the vast majority of compatible PCs, Microsoft’s Windows 11 24H2 update still isn’t officially available as anything other than a preview (a revised version of the update is available to Windows Insiders again after briefly being pulled early last week). But Microsoft and most of the other big PC companies are releasing their first wave of Copilot+ PCs with Snapdragon X-series chips in them today, and those PCs are all shipping with the 24H2 update already installed.

For now, this means a bifurcated Windows 11 install base: one (the vast majority) that’s still mostly on version 23H2 and one (a tiny, Arm-powered minority) that’s running 24H2.

Although Microsoft hasn’t been specific about its release plans for Windows 11 24H2 to the wider user base, most PCs should still start getting the update later this fall. The Copilot+ parts won’t run on those current PCs, but they’ll still get new features and benefit from Microsoft’s work on the operating system’s underpinnings.

The wider 24H2 update rollout will also likely enable the Copilot+ PC features on Intel and AMD PCs that meet the hardware requirements. That hardware will supposedly be available starting in July—at least, if AMD can hit its planned ship date for Ryzen AI chips—but neither Intel nor AMD seems to know exactly when the Copilot+ features will be enabled in software. Right now, the x86 version of Windows doesn’t even have hidden Copilot+ features that can be enabled with the right settings; they only seem to be included at all in the Arm version of the update.

Unfortunately for Microsoft, the Copilot+ PC program (and, to a lesser extent, the 24H2 update) has become mostly synonymous with the Recall screen recording feature. Microsoft revealed this feature to the public without first sending it through its normal Windows Insider testing program. As soon as security researchers and testers were able to dig into it, they immediately found security holes and privacy risks that could expose a user’s entire Recall database plus detailed screenshots of all their activity to anyone with access to the PC.

Microsoft initially announced that it would release a preview of Recall as scheduled on June 18 with additional security and privacy measures in place. Microsoft would also make the feature off-by-default instead of on-by-default. Shortly after that, the company delayed Recall altogether and committed to testing it publicly in Windows Insider builds like any other Windows feature. Microsoft says that Recall will return, at least to Copilot+ PCs, at some point “in the coming weeks.”

Aside from the Copilot+ generative AI features, which require extra RAM and storage and a PC with a sufficiently fast neural processing unit (NPU), the main Windows 11 system requirements aren’t changing for the 24H2 update. However, some older, unsupported PCs that could run previous Windows 11 versions will no longer be able to boot 24H2, since the update requires a slightly newer CPU.



Softbank plans to cancel out angry customer voices using AI

our fake future —

Real-time voice modification tech seeks to reduce stress in call center staff.

A man is angry and screaming while talking on a smartphone.

Japanese telecommunications giant SoftBank recently announced that it has been developing “emotion-canceling” technology powered by AI that will alter the voices of angry customers to sound calmer during phone calls with customer service representatives. The project, which has been in development for three years, aims to reduce the psychological burden on operators suffering from harassment. SoftBank plans to launch it by March 2026, but the idea is receiving mixed reactions online.

According to a report from the Japanese news site The Asahi Shimbun, SoftBank’s project relies on an AI model to alter the tone and pitch of a customer’s voice in real-time during a phone call. SoftBank’s developers, led by employee Toshiyuki Nakatani, trained the system using a dataset of over 10,000 voice samples, which were performed by 10 Japanese actors expressing more than 100 phrases with various emotions, including yelling and accusatory tones.

Voice cloning and synthesis technology has made massive strides in the past three years. We’ve previously covered technology from Microsoft that can clone a voice with a three-second audio sample and audio-processing technology from Adobe that cleans up audio by re-synthesizing a person’s voice, so SoftBank’s technology is well within the realm of plausibility.

By analyzing the voice samples, SoftBank’s AI model has reportedly learned to recognize and modify the vocal characteristics associated with anger and hostility. When a customer speaks to a call center operator, the model processes the incoming audio and adjusts the pitch and inflection of the customer’s voice to make it sound calmer and less threatening.

For example, a high-pitched, resonant voice may be lowered in tone, while a deep male voice may be raised to a higher pitch. The technology reportedly does not alter the content or wording of the customer’s speech, and it retains a slight element of audible anger to ensure that the operator can still gauge the customer’s emotional state. The AI model also monitors the length and content of the conversation, sending a warning message if it determines that the interaction is too long or abusive.
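SoftBank hasn’t published implementation details, but the basic pitch adjustment described above can be illustrated with a deliberately naive sketch: resampling an audio signal to shift its pitch. (This is not SoftBank’s method—a real-time system would use something like a phase vocoder or a neural model to change pitch without also changing duration.)

```python
import numpy as np

def shift_pitch(samples: np.ndarray, semitones: float) -> np.ndarray:
    """Naive pitch shift by resampling.

    Negative semitones lower the pitch but also stretch the clip;
    production systems preserve timing with phase-vocoder techniques.
    """
    factor = 2 ** (semitones / 12)  # frequency ratio on the 12-tone scale
    positions = np.arange(0, len(samples) - 1, factor)
    return np.interp(positions, np.arange(len(samples)), samples)

# A 440 Hz tone lowered an octave becomes a 220 Hz tone, twice as long.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
lowered = shift_pitch(tone, -12)
```

Because the naive version doubles the clip length when lowering by an octave, it would be unusable in a live call—one reason real-time voice modification is a harder problem than offline processing.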

The tech has been developed through SoftBank’s in-house program called “SoftBank Innoventure” in conjunction with The Institute for AI and Beyond, which is a joint AI research institute established by The University of Tokyo.

Harassment a persistent problem

According to SoftBank, Japan’s service sector is grappling with the issue of “kasu-hara,” or customer harassment, where workers face aggressive behavior or unreasonable requests from customers. In response, the Japanese government and businesses are reportedly exploring ways to protect employees from the abuse.

The problem isn’t unique to Japan. In a Reddit thread on SoftBank’s AI plans, call center operators from other regions related many stories about the stress of dealing with customer harassment. “I’ve worked in a call center for a long time. People need to realize that screaming at call center agents will get you nowhere,” wrote one person.

A 2021 ProPublica report tells horror stories from call center operators who are trained not to hang up no matter how abusive or emotionally degrading a call gets. The publication quoted Skype customer service contractor Christine Stewart as saying, “One person called me the C-word. I’d call my supervisor. They’d say, ‘Calm them down.’ … They’d always try to push me to stay on the call and calm the customer down myself. I wasn’t getting paid enough to do that. When you have a customer sitting there and saying you’re worthless… you’re supposed to ‘de-escalate.'”

But verbally de-escalating an angry customer is difficult, according to Reddit poster BenCelotil, who wrote, “As someone who has worked in several call centers, let me just point out that there is no way faster to escalate a call than to try and calm the person down. If the angry person on the other end of the call thinks you’re just trying to placate and push them off somewhere else, they’re only getting more pissed.”

Ignoring reality using AI

Harassment of call center workers is a very real problem, but with AI now proposed as a solution, some people wonder whether it’s a good idea to filter emotional reality on demand through voice synthesis. As some social media commenters note, the technology may treat a symptom rather than the root cause of the anger.

“This is like the worst possible solution to the problem,” wrote one Redditor in the thread mentioned above. “Reminds me of when all the workers at Apple’s China factory started jumping out of windows due to working conditions, so the ‘solution’ was to put nets around the building.”

SoftBank expects to introduce its emotion-canceling solution within fiscal year 2025, which ends on March 31, 2026. By reducing the psychological burden on call center operators, SoftBank says it hopes to create a safer work environment that enables employees to provide even better services to customers.

Even so, ignoring customer anger could backfire in the long run when the anger is sometimes a legitimate response to poor business practices. As one Redditor wrote, “If you have so many angry customers that it is affecting the mental health of your call center operators, then maybe address the reasons you have so many irate customers instead of just pretending that they’re not angry.”



Microsoft delays Recall again, won’t debut it with new Copilot+ PCs after all

another setback —

Recall will go through Windows Insider pipeline like any other Windows feature.

Recall is part of Microsoft’s Copilot+ PC program.

Microsoft

Microsoft will be delaying its controversial Recall feature again, according to an updated blog post by Windows and Devices VP Pavan Davuluri. And when the feature does return “in the coming weeks,” Davuluri writes, it will be as a preview available to PCs in the Windows Insider Program, the same public testing and validation pipeline that all other Windows features usually go through before being released to the general populace.

Recall is a new Windows 11 AI feature that will be available on PCs that meet the company’s requirements for its “Copilot+ PC” program. Copilot+ PCs need at least 16GB of RAM, 256GB of storage, and a neural processing unit (NPU) capable of at least 40 trillion operations per second (TOPS). The first (and for a few months, only) PCs that will meet this requirement are all using Qualcomm’s Snapdragon X Plus and X Elite Arm chips, with compatible Intel and AMD processors following later this year. Copilot+ PCs ship with other generative AI features, too, but Recall’s widely publicized security problems have sucked most of the oxygen out of the room so far.

The Windows Insider preview of Recall will still require a PC that meets the Copilot+ requirements, though third-party scripts may be able to turn on Recall for PCs without the necessary hardware. We’ll know more when Recall makes its reappearance.

Why Recall was recalled

Recall works by periodically capturing screenshots of your PC, saving them to disk, and scanning those screenshots with OCR to build a large searchable text database that can help you find anything you previously viewed on your PC.
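The searchable database at the core of that pipeline is, in spirit, an inverted index mapping words to the screenshots they appeared in. As a hypothetical, much-simplified sketch (the timestamps and OCR strings below are invented, and this is not Microsoft’s actual implementation):

```python
from collections import defaultdict

def build_index(screenshots: dict[str, str]) -> dict[str, set[str]]:
    """Map each OCR-extracted word to the screenshots containing it."""
    index = defaultdict(set)
    for timestamp, ocr_text in screenshots.items():
        for word in ocr_text.lower().split():
            index[word].add(timestamp)
    return index

# Invented example data standing in for OCR output of captured screens.
screenshots = {
    "2024-06-18T10:00": "Invoice from Acme Corp, total $120",
    "2024-06-18T10:05": "Weather forecast: sunny, 25 degrees",
}
index = build_index(screenshots)
hits = index["invoice"]  # timestamps of screenshots containing the word
```

The sketch also makes the privacy stakes concrete: whoever can read this index can reconstruct a detailed log of the user’s activity, which is why storing it unencrypted drew so much criticism.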

The main problem, as we confirmed with our own testing, was that all of this was saved to disk with no additional encryption or other protection and was easily viewable and copyable by pretty much any user (or attacker) with access to the PC. Recall was also going to be enabled by default on Copilot+ PCs despite being a “preview,” meaning that users who didn’t touch the default settings were going to have all of this data recorded by default.

This was the version of Recall that was initially meant to ship out to reviewers this week on the first wave of Copilot+ PCs from Microsoft and other PC companies. After security researcher Kevin Beaumont publicized these security holes in that version of Recall, the company promised to add additional encryption and authentication protections and to disable Recall by default. These tweaks would have gone out as an update to the first shipments of Copilot+ PCs on June 18 (reviewers also wouldn’t get systems before June 18, a sign of how much Microsoft was rushing behind the scenes to implement these changes). Now Recall is being pushed back again.

A report from Windows Central claims that Recall was developed “in secret” and that it wasn’t even distributed widely within Microsoft before it was announced, which could explain why these security issues weren’t flagged and fixed before the feature showed up in a publicly available version of Windows.

Microsoft’s Recall delay follows Microsoft President Brad Smith’s testimony to Congress during a House Committee on Homeland Security hearing about the company’s “cascade of security failures” in recent months. Among other things, Smith said that Microsoft would commit to prioritizing security issues over new AI-powered features as part of the company’s recently announced Secure Future Initiative (SFI). Microsoft has also hired additional security personnel and tied executive pay to meeting security goals.

“If you’re faced with the tradeoff between security and another priority, your answer is clear: Do security,” wrote Microsoft CEO Satya Nadella in an internal memo about the SFI announcement. “In some cases, this will mean prioritizing security above other things we do, such as releasing new features or providing ongoing support for legacy systems.”

Recall has managed to tie together all the big Windows and Microsoft stories from the last year or two: the company’s all-consuming push to quickly release generative AI features, its security failures and subsequent promises to do better, and the general degradation of the Windows 11 user interface with unwanted apps, ads, reminders, account sign-in requirements, and other cruft.
