Words are flowing out like endless rain: Recapping a busy week of LLM news

many things frequently —

Gemini 1.5 Pro launch, new version of GPT-4 Turbo, new Mistral model, and more.

An image of a boy amazed by flying letters.

Some weeks in AI news are eerily quiet, but during others, getting a grip on the week’s events feels like trying to hold back the tide. This week has seen three notable large language model (LLM) releases: Google’s Gemini 1.5 Pro hit general availability with a free tier, OpenAI shipped a new version of GPT-4 Turbo, and Mistral released a new openly licensed LLM, Mixtral 8x22B. All three of those launches happened within 24 hours starting on Tuesday.

With the help of software engineer and independent AI researcher Simon Willison (who also wrote about this week’s hectic LLM launches on his own blog), we’ll briefly cover each of the three major events in roughly chronological order, then dig into some additional AI happenings this week.

Gemini 1.5 Pro general release

On Tuesday morning Pacific time, Google announced that its Gemini 1.5 Pro model (which we first covered in February) is now available in 180-plus countries, excluding Europe, via the Gemini API in a public preview. This is Google’s most powerful public LLM so far, and it’s available in a free tier that permits up to 50 requests a day.

It supports up to 1 million tokens of input context. As Willison notes on his blog, Gemini 1.5 Pro’s API pricing, at $7/million input tokens and $21/million output tokens, is a little lower than GPT-4 Turbo’s ($10/million in and $30/million out) and higher than that of Claude 3 Sonnet (Anthropic’s mid-tier LLM, priced at $3/million in and $15/million out).
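To make the price gap concrete, here is a minimal Python sketch that applies the per-million-token prices quoted above to a single request. The workload (a 100,000-token prompt with a 1,000-token reply) and the function name `request_cost` are illustrative assumptions, not figures or code from any provider.

```python
# Per-million-token API prices in USD, as quoted in the paragraph above:
# (input price, output price).
PRICES = {
    "Gemini 1.5 Pro": (7.00, 21.00),
    "GPT-4 Turbo": (10.00, 30.00),
    "Claude 3 Sonnet": (3.00, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: a long prompt with a short reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 100_000, 1_000):.3f}")
```

For long-context use, the input price dominates the bill, so the ordering of the three models for this workload follows their input prices.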

Notably, Gemini 1.5 Pro includes native audio (speech) input processing that allows users to upload audio or video prompts, a new File API for handling files, the ability to add custom system instructions (system prompts) for guiding model responses, and a JSON mode for structured data extraction.

“Majorly Improved” GPT-4 Turbo launch

A GPT-4 Turbo performance chart provided by OpenAI.

Just a bit later than Google’s 1.5 Pro launch on Tuesday, OpenAI announced that it was rolling out a “majorly improved” version of GPT-4 Turbo (a model family originally launched in November) called “gpt-4-turbo-2024-04-09.” It integrates multimodal GPT-4 Vision processing (recognizing the contents of images) directly into the model, and it initially launched through API access only.

Then on Thursday, OpenAI announced that the new GPT-4 Turbo model had just become available for paid ChatGPT users. OpenAI said that the new model improves “capabilities in writing, math, logical reasoning, and coding” and shared a chart (which it later updated) that is not particularly useful for judging capabilities. The company also provided an example of an alleged improvement, saying that when writing with ChatGPT, the AI assistant’s responses will be “more direct, less verbose, and use more conversational language.”

The vague nature of OpenAI’s GPT-4 Turbo announcements attracted some confusion and criticism online. On X, Willison wrote, “Who will be the first LLM provider to publish genuinely useful release notes?” In some ways, this is a case of “AI vibes” again, as we discussed in our lament about the poor state of LLM benchmarks during the debut of Claude 3. “I’ve not actually spotted any definite differences in quality [related to GPT-4 Turbo],” Willison told us directly in an interview.

The update also expanded GPT-4’s knowledge cutoff to April 2024, although some people are reporting it achieves this through stealth web searches in the background, and others on social media have reported issues with date-related confabulations.

Mistral’s mysterious Mixtral 8x22B release

An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It's hard to draw a picture of an LLM, so a robot will have to do.

Not to be outdone, on Tuesday night, French AI company Mistral launched its latest openly licensed model, Mixtral 8x22B, by tweeting a torrent link devoid of any documentation or commentary, much like it has done with previous releases.

The new mixture-of-experts (MoE) release weighs in with a larger parameter count than the company’s previously most capable open model, Mixtral 8x7B, which we covered in December. It’s rumored to be potentially as capable as GPT-4 (in what way, you ask? Vibes), but that remains to be seen.

“The evals are still rolling in, but the biggest open question right now is how well Mixtral 8x22B shapes up,” Willison told Ars. “If it’s in the same quality class as GPT-4 and Claude 3 Opus, then we will finally have an openly licensed model that’s not significantly behind the best proprietary ones.”

This release has Willison most excited. “If that thing really is GPT-4 class, it’s wild, because you can run that on a (very expensive) laptop,” he said. “I think you need 128GB of MacBook RAM for it, twice what I have.”
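As a rough sanity check on that RAM figure, the sketch below estimates the memory needed just to hold a model’s weights at different numeric precisions. The article does not state the parameter count; the figure of roughly 141 billion total parameters is an assumption, and the function name `weight_memory_gb` is hypothetical. The estimate ignores activations, the attention cache, and runtime overhead.

```python
# ASSUMPTION: roughly 141 billion total parameters for Mixtral 8x22B.
# This number is not given in the article.
ASSUMED_PARAMS = 141e9


def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Compare common storage precisions for the weights.
for bits in (16, 8, 4):
    gb = weight_memory_gb(ASSUMED_PARAMS, bits)
    print(f"{bits}-bit weights: ~{gb:.1f} GB")
```

Under that assumption, only the 4-bit figure fits comfortably within 128GB, which suggests that running the model on a laptop would depend on aggressive quantization.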

The new Mixtral is not listed on Chatbot Arena yet, Willison noted, because Mistral has not released a fine-tuned model for chatting. It’s still a raw, predict-the-next-token LLM. “There’s at least one community instruction tuned version floating around now though,” says Willison.

Chatbot Arena Leaderboard shake-ups

A Chatbot Arena Leaderboard screenshot taken on April 12, 2024.

Benj Edwards

This week’s LLM news isn’t limited to just the big names in the field. There have also been rumblings on social media about the rising performance of open source models like Cohere’s Command R+, which reached position 6 on the LMSYS Chatbot Arena Leaderboard—the highest-ever ranking for an open-weights model.

And for even more Chatbot Arena action, apparently the new version of GPT-4 Turbo is proving competitive with Claude 3 Opus. The two are still in a statistical tie, but GPT-4 Turbo recently pulled ahead numerically. (In March, we reported when Claude 3 first numerically pulled ahead of GPT-4 Turbo, which was the first time another AI model had surpassed a member of the GPT-4 family on the leaderboard.)

Regarding this fierce competition among LLMs (of which most of the muggle world is unaware and will likely remain so), Willison told Ars, “The past two months have been a whirlwind: we finally have not just one but several models that are competitive with GPT-4.” We’ll see if OpenAI’s rumored release of GPT-5 later this year will restore the company’s technological lead, which once seemed insurmountable. But for now, Willison says, “OpenAI are no longer the undisputed leaders in LLMs.”

Intel’s “Gaudi 3” AI accelerator chip may give Nvidia’s H100 a run for its money

Adventures in Matrix Multiplication —

Intel claims 50% more speed when running AI language models vs. the market leader.

An Intel handout photo of the Gaudi 3 AI accelerator.

On Tuesday, Intel revealed a new AI accelerator chip called Gaudi 3 at its Vision 2024 event in Phoenix. With strong claimed performance while running large language models (like those that power ChatGPT), the company has positioned Gaudi 3 as an alternative to Nvidia’s H100, a popular data center GPU that has been subject to shortages, though those shortages are apparently easing somewhat.

Compared to Nvidia’s H100 chip, Intel projects a 50 percent faster training time on Gaudi 3 for both OpenAI’s GPT-3 175B LLM and the 7-billion-parameter version of Meta’s Llama 2. In terms of inference (running the trained model to get outputs), Intel claims that its new AI chip delivers 50 percent faster performance than the H100 for Llama 2 and Falcon 180B, which are both relatively popular open-weights models.

Intel is targeting the H100 because of its high market share, but the chip isn’t Nvidia’s most powerful AI accelerator chip in the pipeline. Announcements of the H200 and the Blackwell B200 have since surpassed the H100 on paper, but neither of those chips is out yet (the H200 is expected in the second quarter of 2024—basically any day now).

Meanwhile, the aforementioned H100 supply issues have been a major headache for tech companies and AI researchers who have to fight for access to any chips that can train AI models. This has led several tech companies like Microsoft, Meta, and OpenAI (rumor has it) to seek their own AI-accelerator chip designs, although that custom silicon is typically manufactured by either Intel or TSMC. Google has its own line of tensor processing units (TPUs) that it has been using internally since 2015.

Given those issues, Intel’s Gaudi 3 may be an attractive alternative to the H100 if Intel can hit an ideal price (which Intel has not provided, but an H100 reportedly costs around $30,000–$40,000) and maintain adequate production. AMD also manufactures a competitive range of AI chips, such as the AMD Instinct MI300 Series, that sell for around $10,000–$15,000.

Gaudi 3 performance

An Intel handout featuring specifications of the Gaudi 3 AI accelerator.

Intel says the new chip builds upon the architecture of its predecessor, Gaudi 2, by featuring two identical silicon dies connected by a high-bandwidth connection. Each die contains a central cache memory of 48 megabytes, surrounded by four matrix multiplication engines and 32 programmable tensor processor cores, bringing the total cores to 64.

The chipmaking giant claims that Gaudi 3 delivers double the AI compute performance of Gaudi 2 using 8-bit floating-point infrastructure, which has become crucial for training transformer models. The chip also offers a fourfold boost for computations using the BFloat16 number format. Gaudi 3 also features 128GB of the less expensive HBM2e memory (which may contribute to price competitiveness) and 3.7TB/s of memory bandwidth.

Since data centers are well known to be power-hungry, Intel emphasizes the power efficiency of Gaudi 3, claiming 40 percent greater inference power efficiency than Nvidia’s H100 across the Llama 7B and 70B models and the Falcon 180B model. Eitan Medina, chief operating officer of Intel’s Habana Labs, attributes this advantage to Gaudi’s large-matrix math engines, which he claims require significantly less memory bandwidth compared to other architectures.

Gaudi vs. Blackwell

An Intel handout photo of the Gaudi 3 AI accelerator.

Last month, we covered the splashy launch of Nvidia’s Blackwell architecture, including the B200 GPU, which Nvidia claims will be the world’s most powerful AI chip. It seems natural, then, to compare what we know about Nvidia’s highest-performing AI chip to the best of what Intel can currently produce.

For starters, Gaudi 3 is being manufactured using TSMC’s N5 process technology, according to IEEE Spectrum, narrowing the gap between Intel and Nvidia in terms of semiconductor fabrication technology. The upcoming Nvidia Blackwell chip will use a custom N4P process, which reportedly offers modest performance and efficiency improvements over N5.

Gaudi 3’s use of HBM2e memory (as we mentioned above) is notable compared to the more expensive HBM3 or HBM3e used in competing chips, offering a balance of performance and cost-efficiency. This choice seems to emphasize Intel’s strategy to compete not only on performance but also on price.

As for raw performance comparisons between Gaudi 3 and the B200, those can’t be made until the chips have been released and benchmarked by a third party.

As the race to power the tech industry’s thirst for AI computation heats up, IEEE Spectrum notes that the next generation of Intel’s Gaudi chip, code-named Falcon Shores, remains a point of interest. It also remains to be seen whether Intel will continue to rely on TSMC’s technology or leverage its own foundry business and upcoming nanosheet transistor technology to gain a competitive edge in the AI accelerator market.

New AI music generator Udio synthesizes realistic music on demand

Battle of the AI bands —

But it still needs trial and error to generate high-quality results.

A screenshot of AI-generated songs listed on Udio on April 10, 2024.

Benj Edwards

Between 2002 and 2005, I ran a music website where visitors could submit song titles that I would write and record a silly song around. In the liner notes for my first CD release in 2003, I wrote about a day when computers would potentially put me out of business, churning out music automatically at a pace I could not match. While I don’t actively post music on that site anymore, that day is almost here.

On Wednesday, a group of ex-DeepMind employees launched Udio, a new AI music synthesis service that can create novel high-fidelity musical audio from written prompts, including user-provided lyrics. It’s similar to Suno, which we covered on Monday. With some key human input, Udio can create facsimiles of human-produced music in genres like country, barbershop quartet, German pop, classical, hard rock, hip hop, show tunes, and more. It’s currently free to use during a beta period.

Udio is also freaking out some musicians on Reddit. As we mentioned in our Suno piece, Udio is exactly the kind of AI-powered music generation service that over 200 musical artists were afraid of when they signed an open protest letter last week.

But as impressive as the Udio songs first seem from a technical AI-generation standpoint (not necessarily judging by musical merit), its generation capability isn’t perfect. We experimented with its creation tool and the results felt less impressive than those created by Suno. The high-quality musical samples showcased on Udio’s site likely resulted from a lot of creative human input (such as human-written lyrics) and cherry-picking the best compositional parts of songs out of many generations. In fact, Udio lays out a five-step workflow to build a 1.5-minute-long song in a FAQ.

For example, we created an Ars Technica “Moonshark” song on Udio using the same prompt as one we used previously with Suno. In its raw form, the results sound half-baked and almost nightmarish (here is the Suno version for comparison). It’s also a lot shorter by default at 32 seconds compared to Suno’s 1-minute and 32-second output. But Udio allows songs to be extended, and you can retry a poor result with a different prompt.

After registering a Udio account, anyone can create a track by entering a text prompt that can include lyrics, a story direction, and musical genre tags. Udio then tackles the task in two stages. First, it utilizes a large language model (LLM) similar to ChatGPT to generate lyrics (if necessary) based on the provided prompt. Next, it synthesizes music using a method that Udio does not disclose, but it’s likely a diffusion model, similar to Stability AI’s Stable Audio.

From the given prompt, Udio’s AI model generates two distinct song snippets for you to choose from. You can then publish the song for the Udio community, download the audio or video file to share on other platforms, or directly share it on social media. Other Udio users can also remix or build on existing songs. Udio’s terms of service say that the company claims no rights over the musical generations and that they can be used for commercial purposes.

Although the Udio team has not revealed the specific details of its model or training data (which is likely filled with copyrighted material), it told Tom’s Guide that the system has built-in measures to identify and block tracks that too closely resemble the work of specific artists, ensuring that the generated music remains original.

And that brings us back to humans, some of whom are not taking the onset of AI-generated music very well. “I gotta be honest, this is depressing as hell,” wrote one Reddit commenter in a thread about Udio. “I’m still broadly optimistic that music will be fine in the long run somehow. But like, why do this? Why automate art?”

We’ll hazard an answer by saying that replicating art is a key target for AI research because the results can be inaccurate and imprecise and still seem notable or gee-whiz amazing, which is a key characteristic of generative AI. It’s flashy and impressive-looking while allowing for a general lack of quantitative rigor. We’ve already seen AI come for still images, video, and text with varied results regarding representative accuracy. Fully composed musical recordings seem to be next on the list of AI hills to (approximately) conquer, and the competition is heating up.

Elon Musk: AI will be smarter than any human around the end of next year

smarter than the average bear —

While Musk says superintelligence is coming soon, one critic says prediction is “batsh*t crazy.”

Elon Musk, owner of Tesla and the X (formerly Twitter) platform, attends a symposium on fighting antisemitism titled 'Never Again : Lip Service or Deep Conversation' in Krakow, Poland on January 22nd, 2024. Musk, who was invited to Poland by the European Jewish Association (EJA) has visited the Auschwitz-Birkenau concentration camp earlier that day, ahead of International Holocaust Remembrance Day. (Photo by Beata Zawrzel/NurPhoto)

On Monday, Tesla CEO Elon Musk predicted the imminent rise of AI superintelligence during a live interview streamed on the social media platform X. “My guess is we’ll have AI smarter than any one human probably around the end of next year,” Musk said in his conversation with hedge fund manager Nicolai Tangen.

Just prior to that, Tangen had asked Musk, “What’s your take on where we are in the AI race just now?” Musk told Tangen that AI “is the fastest advancing technology I’ve seen of any kind, and I’ve seen a lot of technology.” He described computers dedicated to AI increasing in capability by “a factor of 10 every year, if not every six to nine months.”

Musk made the prediction with an asterisk, saying that shortages of AI chips and high AI power demands could limit AI’s capability until those issues are resolved. “Last year, it was chip-constrained,” Musk told Tangen. “People could not get enough Nvidia chips. This year, it’s transitioning to a voltage transformer supply. In a year or two, it’s just electricity supply.”

But not everyone is convinced that Musk’s crystal ball is free of cracks. Grady Booch, a frequent critic of AI hype on social media who is perhaps best known for his work in software architecture, told Ars in an interview, “Keep in mind that Mr. Musk has a profoundly bad record at predicting anything associated with AI; back in 2016, he promised his cars would ship with FSD safety level 5, and here we are, closing on a decade later, still waiting.”

Creating artificial intelligence at least as smart as a human (frequently called “AGI” for artificial general intelligence) is often seen as inevitable among AI proponents, but there’s no broad consensus on exactly when that milestone will be reached—or on the exact definition of AGI, for that matter.

“If you define AGI as smarter than the smartest human, I think it’s probably next year, within two years,” Musk added in the interview with Tangen while discussing AGI timelines.

Uncertainties about AGI haven’t kept companies from trying to build it. ChatGPT creator OpenAI, which launched with Musk as a co-founder in 2015, lists developing AGI as its main goal. Musk has not been directly associated with OpenAI for years (unless you count a recent lawsuit against the company), but last year, he took aim at the business of large language models by forming a new company called xAI. Its main product, Grok, functions similarly to ChatGPT and is integrated into the X social media platform.

Booch gives credit to Musk’s business successes but casts doubt on his forecasting ability. “Albeit a brilliant if not rapacious businessman, Mr. Musk vastly overestimates both the history as well as the present of AI while simultaneously diminishing the exquisite uniqueness of human intelligence,” says Booch. “So in short, his prediction is—to put it in scientific terms—batshit crazy.”

So when will we get AI that’s smarter than a human? Booch says there’s no real way to know at the moment. “I reject the framing of any question that asks when AI will surpass humans in intelligence because it is a question filled with ambiguous terms and considerable emotional and historic baggage,” he says. “We are a long, long way from understanding the design that would lead us there.”

We also asked Hugging Face AI researcher Dr. Margaret Mitchell to weigh in on Musk’s prediction. “Intelligence … is not a single value where you can make these direct comparisons and have them mean something,” she told us in an interview. “There will likely never be agreement on comparisons between human and machine intelligence.”

But even with that uncertainty, she feels there is one aspect of AI she can more reliably predict: “I do agree that neural network models will reach a point where men in positions of power and influence, particularly ones with investments in AI, will declare that AI is smarter than humans. By end of next year, sure. That doesn’t sound far off base to me.”

MIT License text becomes viral “sad girl” piano ballad generated by AI

WARRANTIES OF MERCHANTABILITY —

“Permission is hereby granted” comes from Suno AI engine that creates new songs on demand.

Illustration of a robot singing.

We’ve come a long way since primitive AI music generators in 2022. Today, AI tools like Suno.ai allow any series of words to become song lyrics, including inside jokes (as you’ll see below). On Wednesday, prompt engineer Riley Goodside tweeted an AI-generated song created with the prompt “sad girl with piano performs the text of the MIT License,” and it began to circulate widely in the AI community online.

The MIT License is a famous permissive software license created in the late 1980s, frequently used in open source projects. “My favorite part of this is ~1:25 it nails ‘WARRANTIES OF MERCHANTABILITY’ with a beautiful Imogen Heap-style glissando then immediately pronounces ‘FITNESS’ as ‘fistiff,’” Goodside wrote on X.

Suno (which means “listen” in Hindi) was formed in 2023 in Cambridge, Massachusetts. It’s the brainchild of Michael Shulman, Georg Kucsko, Martin Camacho, and Keenan Freyberg, who formerly worked at companies like Meta and TikTok. Suno has already attracted big-name partners, such as Microsoft, which announced the integration of an earlier version of the Suno engine into Bing Chat last December. Today, Suno is on v3 of its model, which can create temporally coherent two-minute songs in many different genres.

The company did not reply to our request for an interview by press time. In March, Brian Hiatt of Rolling Stone wrote a profile about Suno that describes the service as a collaboration between OpenAI’s ChatGPT (for lyric writing) and Suno’s music generation model, which some experts think has likely been trained on recordings of copyrighted music without license or artist permission.

It’s exactly this kind of service that upset over 200 musical artists enough last week that they signed an Artist Rights Alliance open letter asking tech companies to stop using AI tools to generate music that could replace human artists.

Considering the unknown provenance of the training data, ownership of the generated songs seems like a complicated question. Suno’s FAQ says that music generated using its free tier remains owned by Suno and can only be used for non-commercial purposes. Paying subscribers reportedly own generated songs “while subscribed to Pro or Premier,” subject to Suno’s terms of service. However, the US Copyright Office took a stance last year that purely AI-generated visual art cannot be copyrighted, and while that standard has not yet been resolved for AI-generated music, it might eventually become official legal policy as well.

The Moonshark song

A screenshot of the Suno.ai website showing lyrics of an AI-generated “Moonshark” song.

Benj Edwards

In use, Suno appears to have no trouble creating unique lyrics based on your prompt (unless you supply your own), and it sets those words to stylized genres of music it generates based on its training dataset. It dynamically generates vocals as well, although they include audible aberrations. Suno’s output is not indistinguishable from high-fidelity human-created music yet, but given the pace of progress we’ve seen, that bridge could be crossed within the next year.

To get a sense of what Suno can do, we created an account on the site and prompted the AI engine to create songs about our mascot, Moonshark, and about barbarians with CRTs, two inside jokes at Ars. What’s interesting is that although the AI model aced the task of creating an original song for each topic, both songs start with the same line, “In the depths of the digital domain.” That’s possibly an artifact of whatever hidden prompt Suno is using to instruct ChatGPT when writing the lyrics.

Suno is arguably a fun toy to experiment with and doubtless a milestone in generative AI music tools. But it’s also an achievement tainted by the unresolved ethical issues related to scraping musical work without the artist’s permission. Then there’s the issue of potentially replacing human musicians, which has not been far from the minds of people sharing their own Suno results online. On Monday, AI influencer Ethan Mollick wrote, “I’ve had a song from Suno AI stuck in my head all day. Grim milestone or good one?”

Billie Eilish, Pearl Jam, 200 artists say AI poses existential threat to their livelihoods

artificial music —

Artists say AI will “set in motion a race to the bottom that will degrade the value of our work.”

Billie Eilish attends the 2024 Vanity Fair Oscar Party hosted by Radhika Jones at the Wallis Annenberg Center for the Performing Arts on March 10, 2024 in Beverly Hills, California.

On Tuesday, the Artist Rights Alliance (ARA) announced an open letter critical of AI signed by over 200 musical artists, including Pearl Jam, Nicki Minaj, Billie Eilish, Stevie Wonder, Elvis Costello, and the estate of Frank Sinatra. In the letter, the artists call on AI developers, technology companies, platforms, and digital music services to stop using AI to “infringe upon and devalue the rights of human artists.” A tweet from the ARA added that AI poses an “existential threat” to their art.

Visual artists began protesting the advent of generative AI after the rise of the first mainstream AI image generators in 2022, and considering that generative AI research has since been undertaken for other forms of creative media, we have seen that protest extend to professionals in other creative domains, such as writers, actors, filmmakers—and now musicians.

“When used irresponsibly, AI poses enormous threats to our ability to protect our privacy, our identities, our music and our livelihoods,” the open letter states. It alleges that some of the “biggest and most powerful” companies (unnamed in the letter) are using the work of artists without permission to train AI models, with the aim of replacing human artists with AI-created content.

  • A list of musical artists that signed the ARA open letter against generative AI.

In January, Billboard reported that AI research taking place at Google DeepMind had trained an unnamed music-generating AI on a large dataset of copyrighted music without seeking artist permission. That report may have been referring to Google’s Lyria, an AI-generation model announced in November that the company positioned as a tool for enhancing human creativity. The tech has since powered musical experiments from YouTube.

We’ve previously covered AI music generators that seemed fairly primitive throughout 2022 and 2023, such as Riffusion, Google’s MusicLM, and Stability AI’s Stable Audio. We’ve also covered open source musical voice-cloning technology that is frequently used to make musical parodies online. While we have yet to see an AI model that can generate perfect, fully composed high-quality music on demand, the quality of outputs from music synthesis models has been steadily improving over time.

In considering AI’s potential impact on music, it’s instructive to remember historical instances where tech innovations initially sparked concern among artists. For instance, the introduction of synthesizers in the 1960s and 1970s and the advent of digital sampling in the 1980s both faced scrutiny and fear from parts of the music community, but the music industry eventually adjusted.

While we’ve seen fear of the unknown related to AI going around quite a bit for the past year, it’s possible that AI tools will be integrated into the music production process like any other music production tool or technique that came before. It’s also possible that even if that kind of integration comes to pass, some artists will still get hurt along the way—and the ARA wants to speak out about it before the technology progresses further.

“Race to the bottom”

The Artist Rights Alliance is a nonprofit advocacy group that describes itself as an “alliance of working musicians, performers, and songwriters fighting for a healthy creative economy and fair treatment for all creators in the digital world.”

The signers of the ARA’s open letter say they acknowledge the potential of AI to advance human creativity when used responsibly, but they also claim that replacing artists with generative AI would “substantially dilute the royalty pool” paid out to artists, which could be “catastrophic” for many working musicians, artists, and songwriters who are trying to make ends meet.

In the letter, the artists say that unchecked AI will set in motion a race to the bottom that will degrade the value of their work and prevent them from being fairly compensated. “This assault on human creativity must be stopped,” they write. “We must protect against the predatory use of AI to steal professional artists’ voices and likenesses, violate creators’ rights, and destroy the music ecosystem.”

The emphasis on the word “human” in the letter is notable (“human artist” is used twice, and “human creativity” and “human artistry” are each used once) because it suggests a clear distinction between the work of human artists and the output of AI systems. It implies recognition that we’ve entered a new era where not all creative output is made by people.

The letter concludes with a call to action, urging all AI developers, technology companies, platforms, and digital music services to pledge not to develop or deploy AI music-generation technology, content, or tools that undermine or replace the human artistry of songwriters and artists or deny them fair compensation for their work.

While it’s unclear whether companies will meet those demands, so far, protests from visual artists have not stopped development of ever-more advanced image-synthesis models. On Threads, frequent AI industry commentator Dare Obasanjo wrote, “Unfortunately this will be as effective as writing an open letter to stop the sun from rising tomorrow.”

OpenAI drops login requirements for ChatGPT’s free version

free as in beer? —

ChatGPT 3.5 still falls far short of GPT-4, and other models surpassed it long ago.

A glowing OpenAI logo on a blue background.

Benj Edwards

On Monday, OpenAI announced that visitors to the ChatGPT website in some regions can now use the AI assistant without signing in. Previously, the company required that users create an account to use it, even with the free version of ChatGPT that is currently powered by the GPT-3.5 AI language model. But as we have noted in the past, GPT-3.5 is widely known to provide less accurate information than GPT-4 Turbo, which is available in paid versions of ChatGPT.

Since its launch in November 2022, ChatGPT has transformed over time from a tech demo to a comprehensive AI assistant, and it’s always had a free version available. It’s free because “you’re the product,” as the old saying goes. Using ChatGPT helps OpenAI gather data that will help the company train future AI models, although free users and ChatGPT Plus subscription members can both opt out of allowing the data they input into ChatGPT to be used for AI training. (OpenAI says it never trains on inputs from ChatGPT Team and Enterprise members at all.)

Opening ChatGPT to everyone could provide a frictionless on-ramp for people who might use it as a substitute for Google Search. It could also help OpenAI gain new customers by giving people an easy way to try ChatGPT quickly, then offering an upsell to paid versions of the service.

“It’s core to our mission to make tools like ChatGPT broadly available so that people can experience the benefits of AI,” OpenAI says on its blog page. “For anyone that has been curious about AI’s potential but didn’t want to go through the steps to set up an account, start using ChatGPT today.”

When you visit the ChatGPT website, you're immediately presented with a chat box like this (in some regions). Screenshot captured April 1, 2024.

Benj Edwards

Since kids will also be able to use ChatGPT without an account—despite it being against the terms of service—OpenAI also says it’s introducing “additional content safeguards,” such as blocking more prompts and “generations in a wider range of categories.” What exactly that entails has not been elaborated upon by OpenAI, but we reached out to the company for comment.

There might be a few other downsides to the fully open approach. On X, AI researcher Simon Willison wrote about the potential for automated abuse as a way to get around paying for OpenAI’s services: “I wonder how their scraping prevention works? I imagine the temptation for people to abuse this as a free 3.5 API will be pretty strong.”

With fierce competition, more GPT-3.5 access may backfire

Willison also mentioned a common criticism of OpenAI (as voiced in this case by Wharton professor Ethan Mollick) that people’s ideas about what AI models can do have so far largely been influenced by GPT-3.5, which, as we mentioned, is far less capable and far more prone to making things up than the paid version of ChatGPT that uses GPT-4 Turbo.

“In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model,” wrote Mollick in a tweet from early March.

With models like Google Gemini Pro 1.5 and Anthropic Claude 3 potentially surpassing OpenAI’s best proprietary model at the moment (and open-weights AI models eclipsing the free version of ChatGPT), allowing people to use GPT-3.5 might not be putting OpenAI’s best foot forward. Microsoft Copilot, powered by OpenAI models, also supports a frictionless, no-login experience, but it allows access to a model based on GPT-4. Gemini, by contrast, currently requires a sign-in, and Anthropic sends a login code through email.

For now, OpenAI says the login-free version of ChatGPT is not yet available to everyone, but it will be coming soon: “We’re rolling this out gradually, with the aim to make AI accessible to anyone curious about its capabilities.”

Playboy image from 1972 gets ban from IEEE computer journals

image processing —

Use of “Lenna” image in computer image processing research stretches back to the 1970s.

Aurich Lawson | Getty Images

On Wednesday, the IEEE Computer Society announced to members that, after April 1, it would no longer accept papers that include a frequently used image of a 1972 Playboy model named Lena Forsén. The so-called “Lenna image” (Forsén added an extra “n” to her name in her Playboy appearance to aid pronunciation) has been used in image processing research since 1973 and has attracted criticism for making some women feel unwelcome in the field.

In an email from the IEEE Computer Society sent to members on Wednesday, Technical & Conference Activities Vice President Terry Benzel wrote, “IEEE’s diversity statement and supporting policies such as the IEEE Code of Ethics speak to IEEE’s commitment to promoting an including and equitable culture that welcomes all. In alignment with this culture and with respect to the wishes of the subject of the image, Lena Forsén, IEEE will no longer accept submitted papers which include the ‘Lena image.’”

An uncropped version of the 512×512-pixel test image originally appeared as the centerfold picture for the December 1972 issue of Playboy Magazine. Usage of the Lenna image in image processing began in June or July 1973 when an assistant professor named Alexander Sawchuk and a graduate student at the University of Southern California Signal and Image Processing Institute scanned a square portion of the centerfold image with a primitive drum scanner, omitting nudity present in the original image. They scanned it for a colleague’s conference paper, and after that, others began to use the image as well.

The original 512×512 “Lenna” test image, which is a cropped portion of a 1972 Playboy centerfold.

The image’s use spread in other papers throughout the 1970s, ’80s, and ’90s, and it caught Playboy’s attention, but the company decided to overlook the copyright violations. In 1997, Playboy helped track down Forsén, who appeared at the 50th Annual Conference of the Society for Imaging Science and Technology, signing autographs for fans. “They must be so tired of me … looking at the same picture for all these years!” she said at the time. VP of new media at Playboy Eileen Kent told Wired, “We decided we should exploit this, because it is a phenomenon.”

The image, which features Forsén’s face and bare shoulder as she wears a hat with a purple feather, was reportedly ideal for testing image processing systems in the early years of digital image technology due to its high contrast and varied detail. It is also a sexually suggestive photo of an attractive woman, and its use by men in the computer field has garnered criticism over the decades, especially from female scientists and engineers who felt that the image (especially related to its association with the Playboy brand) objectified women and created an academic climate where they did not feel entirely welcome.

Due to some of this criticism, which dates back to at least 1996, the journal Nature banned the use of the Lena image in paper submissions in 2018.

The comp.compression Usenet newsgroup FAQ document claims that in 1988, a Swedish publication asked Forsén if she minded her image being used in computer science, and she was reportedly pleasantly amused. In a 2019 Wired article, Linda Kinstler wrote that Forsén did not harbor resentment about the image, but she regretted that she wasn’t paid better for it originally. “I’m really proud of that picture,” she told Kinstler at the time.

Since then, Forsén has apparently changed her mind. In 2019, Creatable and Code Like a Girl created an advertising documentary titled Losing Lena, which was part of a promotional campaign aimed at removing the Lena image from use in tech and the image processing field. In a press release for the campaign and film, Forsén is quoted as saying, “I retired from modelling a long time ago. It’s time I retired from tech, too. We can make a simple change today that creates a lasting change for tomorrow. Let’s commit to losing me.”

It seems that commitment is now being honored. The ban in IEEE publications, which have been historically important journals for computer imaging development, will likely set a further precedent toward removing the Lenna image from common use. In his email, the IEEE’s Benzel recommended wider sensitivity about the issue, writing, “In order to raise awareness of and increase author compliance with this new policy, program committee members and reviewers should look for inclusion of this image, and if present, should ask authors to replace the Lena image with an alternative.”

OpenAI holds back wide release of voice-cloning tech due to misuse concerns

Voice synthesis has come a long way since 1978’s Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can create not only realistic-sounding voices, but also convincingly imitate existing voices using small samples of audio.

Along those lines, OpenAI just announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. It has provided audio samples of the Voice Engine in action on its website.

Once a voice is cloned, a user can input text into the Voice Engine and get an AI-generated voice result. But OpenAI is not ready to widely release its technology yet. The company initially planned to launch a pilot program for developers to sign up for the Voice Engine API earlier this month. But after more consideration about ethical implications, the company decided to scale back its ambitions for now.

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” the company writes. “We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models.”

Voice cloning tech in general is not particularly new. We’ve covered several AI voice synthesis models since 2022, and the tech is active in the open source community with packages like OpenVoice and XTTSv2. But the idea that OpenAI is inching toward letting anyone use its particular brand of voice tech is notable. And in some ways, the company’s reluctance to release it fully might be the bigger story.

OpenAI says that benefits of its voice technology include providing reading assistance through natural-sounding voices, enabling global reach for creators by translating content while preserving native accents, supporting non-verbal individuals with personalized speech options, and assisting patients in recovering their own voice after speech-impairing conditions.

But it also means that anyone with 15 seconds of someone’s recorded voice could effectively clone it, and that has obvious implications for potential misuse. Even if OpenAI never widely releases its Voice Engine, the ability to clone voices has already caused trouble in society through phone scams where someone imitates a loved one’s voice and election campaign robocalls featuring cloned voices from politicians like Joe Biden.

Also, researchers and reporters have shown that voice-cloning technology can be used to break into bank accounts that use voice authentication (such as Chase’s Voice ID), which prompted Sen. Sherrod Brown (D-Ohio), the chairman of the US Senate Committee on Banking, Housing, and Urban Affairs, to send a letter to the CEOs of several major banks in May 2023 to inquire about the security measures banks are taking to counteract AI-powered risks.

World’s first global AI resolution unanimously adopted by United Nations

We hold these seeds to be self-evident —

Nonbinding agreement seeks to protect personal data and safeguard human rights.

The United Nations building in New York.

On Thursday, the United Nations General Assembly unanimously consented to adopt what some call the first global resolution on AI, reports Reuters. The resolution aims to foster the protection of personal data, enhance privacy policies, ensure close monitoring of AI for potential risks, and uphold human rights. It emerged from a proposal by the United States and received backing from China and 121 other countries.

Being a nonbinding agreement and thus effectively toothless, the resolution seems broadly popular in the AI industry. On X, Microsoft Vice Chair and President Brad Smith wrote, “We fully support the @UN’s adoption of the comprehensive AI resolution. The consensus reached today marks a critical step towards establishing international guardrails for the ethical and sustainable development of AI, ensuring this technology serves the needs of everyone.”

The resolution, titled “Seizing the opportunities of safe, secure and trustworthy artificial intelligence systems for sustainable development,” resulted from three months of negotiation, and the stakeholders involved seem pleased at the level of international cooperation. “We’re sailing in choppy waters with the fast-changing technology, which means that it’s more important than ever to steer by the light of our values,” one senior US administration official told Reuters, highlighting the significance of this “first-ever truly global consensus document on AI.”

In the UN, adoption by consensus means that all members agree to adopt the resolution without a vote. “Consensus is reached when all Member States agree on a text, but it does not mean that they all agree on every element of a draft document,” writes the UN in a FAQ found online. “They can agree to adopt a draft resolution without a vote, but still have reservations about certain parts of the text.”

The initiative joins a series of efforts by governments worldwide to influence the trajectory of AI development following the launch of ChatGPT and GPT-4, and the enormous hype raised by certain members of the tech industry in a public worldwide campaign waged last year. Critics fear that AI may undermine democratic processes, amplify fraudulent activities, or contribute to significant job displacement, among other issues. The resolution seeks to address the dangers associated with the irresponsible or malicious application of AI systems, which the UN says could jeopardize human rights and fundamental freedoms.

Resistance from nations such as Russia and China was anticipated, and US officials acknowledged the presence of “lots of heated conversations” during the negotiation process, according to Reuters. However, they also emphasized successful engagement with these countries and others typically at odds with the US on various issues, agreeing on a draft resolution that sought to maintain a delicate balance between promoting development and safeguarding human rights.

The new UN agreement may be the first “global” agreement, in the sense of having the participation of every UN country, but it wasn’t the first multi-state international AI agreement. That honor seems to fall to the Bletchley Declaration signed in November by the 28 nations attending the UK’s first AI Summit.

Also in November, the US, Britain, and other nations unveiled an agreement focusing on the creation of AI systems that are “secure by design” to protect against misuse by rogue actors. Europe is slowly moving forward with provisional agreements to regulate AI and is close to implementing the world’s first comprehensive AI regulations. Meanwhile, the US government still lacks consensus on legislative action related to AI regulation, with the Biden administration advocating for measures to mitigate AI risks while enhancing national security.

Nvidia announces “moonshot” to create embodied human-level AI in robot form

Here come the robots —

As companies race to pair AI with general-purpose humanoid robots, Nvidia’s GR00T emerges.

An illustration of a humanoid robot created by Nvidia.

Nvidia

In sci-fi films, the rise of humanlike artificial intelligence often comes hand in hand with a physical platform, such as an android or robot. While the most advanced AI language models so far seem mostly like disembodied voices echoing from an anonymous data center, they might not remain that way for long. Companies such as Google, Figure, Microsoft, Tesla, and Boston Dynamics are working toward giving AI models a body. This is called “embodiment,” and AI chipmaker Nvidia wants to accelerate the process.

“Building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today,” said Nvidia CEO Jensen Huang in a statement. Huang spent a portion of Nvidia’s annual GTC conference keynote on Monday going over Nvidia’s robotics efforts. “The next generation of robotics will likely be humanoid robotics,” Huang said. “We now have the necessary technology to imagine generalized human robotics.”

To that end, Nvidia announced Project GR00T, a general-purpose foundation model for humanoid robots. Nvidia hopes that GR00T, which is itself a type of AI model, will serve as an AI mind for robots, enabling them to learn skills and solve various tasks on the fly. (The name stands for “Generalist Robot 00 Technology” but sounds a lot like a famous Marvel character.) In a tweet, Nvidia researcher Linxi “Jim” Fan called the project “our moonshot to solve embodied AGI in the physical world.”

AGI, or artificial general intelligence, is a poorly defined term that usually refers to hypothetical human-level AI (or beyond) that can learn any task a human could without specialized training. Given a capable enough humanoid body driven by AGI, one could imagine fully autonomous robotic assistants or workers. Of course, some experts think that true AGI is a long way off, so it’s possible that Nvidia’s goal is more aspirational than realistic. But that’s also what makes Nvidia’s plan a moonshot.

NVIDIA Robotics: A Journey From AVs to Humanoids.

“The GR00T model will enable a robot to understand multimodal instructions, such as language, video, and demonstration, and perform a variety of useful tasks,” wrote Fan on X. “We are collaborating with many leading humanoid companies around the world, so that GR00T may transfer across embodiments and help the ecosystem thrive.” We reached out to Nvidia researchers, including Fan, for comment but did not hear back by press time.

Nvidia is designing GR00T to understand natural language and emulate human movements, potentially allowing robots to learn coordination, dexterity, and other skills necessary for navigating and interacting with the real world like a person. And as it turns out, Nvidia says that making robots shaped like humans might be the key to creating functional robot assistants.

The humanoid key

Robotics startup Figure, an Nvidia partner, recently showed off its humanoid “Figure 01” robot.

Figure

So far, we’ve seen plenty of robotics platforms that aren’t human-shaped, including robot vacuum cleaners, autonomous weed pullers, industrial units used in automobile manufacturing, and even research arms that can fold laundry. So why focus on imitating the human form? “In a way, human robotics is likely easier,” said Huang in his GTC keynote. “And the reason for that is because we have a lot more imitation training data that we can provide robots, because we are constructed in a very similar way.”

That means that researchers can feed samples of training data captured from human movement into AI models that control robot movement, teaching them how to better move and balance themselves. Also, humanoid robots are particularly convenient because they can fit anywhere a person can, and we’ve designed a world of physical objects and interfaces (such as tools, furniture, stairs, and appliances) to be used or manipulated by the human form.

Along with GR00T, Nvidia also debuted a new computer platform called Jetson Thor, which it hopes will power this new generation of humanoid robots. Jetson Thor is based on Nvidia’s Thor system-on-a-chip (SoC), which is part of the new Blackwell GPU architecture. The SoC reportedly includes a transformer engine capable of 800 teraflops of 8-bit floating point AI computation for running models like GR00T.

Nvidia unveils Blackwell B200, the “world’s most powerful chip” designed for AI

There’s no knowing where we’re rowing —

208B transistor chip can reportedly reduce AI cost and energy consumption by up to 25x.

The GB200 “superchip” covered with a fanciful blue explosion.

Nvidia / Benj Edwards

On Monday, Nvidia unveiled the Blackwell B200 tensor core chip—the company’s most powerful single-chip GPU, with 208 billion transistors—which Nvidia claims can reduce AI inference operating costs (such as running ChatGPT) and energy consumption by up to 25 times compared to the H100. The company also unveiled the GB200, a “superchip” that combines two B200 chips and a Grace CPU for even more performance.

The news came as part of Nvidia’s annual GTC conference, which is taking place this week at the San Jose Convention Center. Nvidia CEO Jensen Huang delivered the keynote Monday afternoon. “We need bigger GPUs,” Huang said during his keynote. The Blackwell platform will allow the training of trillion-parameter AI models that will make today’s generative AI models look rudimentary in comparison, he said. For reference, OpenAI’s GPT-3, launched in 2020, included 175 billion parameters. Parameter count is a rough indicator of AI model complexity.
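To give a rough sense of what those parameter counts mean in practice, the sketch below estimates the memory needed just to store a model's weights. The bytes-per-parameter values (2 for 16-bit and 1 for 8-bit number formats) follow from the format sizes, but the function name and the one-trillion-parameter example are illustrative assumptions, not figures from Nvidia or OpenAI, and the estimate ignores everything else a running model needs.

```python
def weight_memory_gb(num_parameters: int, bytes_per_parameter: int) -> float:
    """Return the gigabytes (10**9 bytes) needed to store the weights alone."""
    return num_parameters * bytes_per_parameter / 1e9


GPT3_PARAMETERS = 175_000_000_000        # figure cited in the article
TRILLION_PARAMETERS = 1_000_000_000_000  # hypothetical trillion-parameter model

for label, count in [("GPT-3", GPT3_PARAMETERS), ("1T model", TRILLION_PARAMETERS)]:
    at_16_bit = weight_memory_gb(count, 2)  # 2 bytes per 16-bit parameter
    at_8_bit = weight_memory_gb(count, 1)   # 1 byte per 8-bit parameter
    print(f"{label}: {at_16_bit:,.0f} GB at 16-bit, {at_8_bit:,.0f} GB at 8-bit")
```

By this estimate, GPT-3's weights alone would take about 350 GB at 16-bit precision, which is part of why large models are spread across many GPUs.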

Nvidia named the Blackwell architecture after David Harold Blackwell, a mathematician who specialized in game theory and statistics and was the first Black scholar inducted into the National Academy of Sciences. The platform introduces six technologies for accelerated computing, including a second-generation Transformer Engine, fifth-generation NVLink, RAS Engine, secure AI capabilities, and a decompression engine for accelerated database queries.

Press photo of the Grace Blackwell GB200 chip, which combines two B200 GPUs with a Grace CPU into one chip.

Several major organizations, such as Amazon Web Services, Dell Technologies, Google, Meta, Microsoft, OpenAI, Oracle, Tesla, and xAI, are expected to adopt the Blackwell platform, and Nvidia’s press release is replete with canned quotes from tech CEOs (key Nvidia customers) like Mark Zuckerberg and Sam Altman praising the platform.

GPUs, once only designed for gaming acceleration, are especially well suited for AI tasks because their massively parallel architecture accelerates the immense number of matrix multiplication tasks necessary to run today’s neural networks. With the dawn of new deep learning architectures in the 2010s, Nvidia found itself in an ideal position to capitalize on the AI revolution and began designing specialized GPUs just for the task of accelerating AI models.
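As a minimal illustration of why this workload parallelizes so well, the sketch below writes out a matrix multiplication in plain Python. Each output cell depends only on one row of the first matrix and one column of the second, so every cell could in principle be computed at the same time, which is the property a massively parallel chip exploits. The function name and the tiny example matrices are made up for illustration, and real AI frameworks use heavily optimized GPU routines rather than loops like these.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # out[i][j] reads only row i of a and column j of b,
            # so no output cell has to wait for any other cell.
            out[i][j] = sum(a[i][p] * b[p][j] for p in range(k))
    return out


inputs = [[1.0, 2.0], [3.0, 4.0]]     # hypothetical activations: 2 samples, 2 features
weights = [[0.5, -1.0], [0.25, 2.0]]  # hypothetical layer weights: 2 inputs, 2 outputs
print(matmul(inputs, weights))  # [[1.0, 3.0], [2.5, 5.0]]
```

A neural network layer repeats this kind of operation on matrices with thousands of rows and columns, so the number of independent cells quickly runs into the millions.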

Nvidia’s data center focus has made the company wildly rich and valuable, and these new chips continue the trend. Nvidia’s gaming GPU revenue ($2.9 billion in the last quarter) is dwarfed by its data center revenue ($18.4 billion), and that trend shows no signs of stopping.

A beast within a beast

Press photo of the Nvidia GB200 NVL72 data center computer system.

The aforementioned Grace Blackwell GB200 chip arrives as a key part of the new NVIDIA GB200 NVL72, a multi-node, liquid-cooled data center computer system designed specifically for AI training and inference tasks. It combines 36 GB200s (that’s 72 B200 GPUs and 36 Grace CPUs total), interconnected by fifth-generation NVLink, which links chips together to multiply performance.
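The chip counts above follow directly from the makeup of each GB200. Here is a small sketch of that arithmetic, using the two-GPUs-plus-one-CPU composition described in the article; the class and function names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GB200:
    """One GB200 'superchip' as described in the article."""
    b200_gpus: int = 2   # two B200 GPUs per GB200
    grace_cpus: int = 1  # one Grace CPU per GB200


def rack_totals(num_superchips: int, chip: GB200 = GB200()) -> dict:
    """Total the GPUs and CPUs in a system built from identical superchips."""
    return {
        "b200_gpus": num_superchips * chip.b200_gpus,
        "grace_cpus": num_superchips * chip.grace_cpus,
    }


print(rack_totals(36))  # {'b200_gpus': 72, 'grace_cpus': 36}
```

The 72 in the NVL72 name lines up with that GPU total.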

A specification chart for the Nvidia GB200 NVL72 system.

“The GB200 NVL72 provides up to a 30x performance increase compared to the same number of NVIDIA H100 Tensor Core GPUs for LLM inference workloads and reduces cost and energy consumption by up to 25x,” Nvidia said.

That kind of speed-up could potentially save money and time while running today’s AI models, but it will also allow for more complex AI models to be built. Generative AI models—like the kind that power Google Gemini and AI image generators—are famously computationally hungry. Shortages of compute power have widely been cited as holding back progress and research in the AI field, and the search for more compute has led to figures like OpenAI CEO Sam Altman trying to broker deals to create new chip foundries.

While Nvidia’s claims about the Blackwell platform’s capabilities are significant, its real-world performance and adoption remain to be seen as organizations begin deploying the platform themselves. Competitors like Intel and AMD are also looking to grab a piece of Nvidia’s AI pie.

Nvidia says that Blackwell-based products will be available from various partners starting later this year.
