openai

llms-keep-leaping-with-llama-3,-meta’s-newest-open-weights-ai-model

LLMs keep leaping with Llama 3, Meta’s newest open-weights AI model

computer-powered word generator —

Zuckerberg says new AI model “was still learning” when Meta stopped training.

A group of pink llamas on a pixelated background.

On Thursday, Meta unveiled early versions of its Llama 3 open-weights AI model that can be used to power text composition, code generation, or chatbots. It also announced that its Meta AI Assistant is now available on a website and is going to be integrated into its major social media apps, intensifying the company’s efforts to position its products against other AI assistants like OpenAI’s ChatGPT, Microsoft’s Copilot, and Google’s Gemini.

Like its predecessor, Llama 2, Llama 3 is notable for being a freely available, open-weights large language model (LLM) provided by a major AI company. Llama 3 technically does not quality as “open source” because that term has a specific meaning in software (as we have mentioned in other coverage), and the industry has not yet settled on terminology for AI model releases that ship either code or weights with restrictions (you can read Llama 3’s license here) or that ship without providing training data. We typically call these releases “open weights” instead.

At the moment, Llama 3 is available in two parameter sizes: 8 billion (8B) and 70 billion (70B), both of which are available as free downloads through Meta’s website with a sign-up. Llama 3 comes in two versions: pre-trained (basically the raw, next-token-prediction model) and instruction-tuned (fine-tuned to follow user instructions). Each has a 8,192 token context limit.

A screenshot of the Meta AI Assistant website on April 18, 2024.

Enlarge / A screenshot of the Meta AI Assistant website on April 18, 2024.

Benj Edwards

Meta trained both models on two custom-built, 24,000-GPU clusters. In a podcast interview with Dwarkesh Patel, Meta CEO Mark Zuckerberg said that the company trained the 70B model with around 15 trillion tokens of data. Throughout the process, the model never reached “saturation” (that is, it never hit a wall in terms of capability increases). Eventually, Meta pulled the plug and moved on to training other models.

“I guess our prediction going in was that it was going to asymptote more, but even by the end it was still leaning. We probably could have fed it more tokens, and it would have gotten somewhat better,” Zuckerberg said on the podcast.

Meta also announced that it is currently training a 400B parameter version of Llama 3, which some experts like Nvidia’s Jim Fan think may perform in the same league as GPT-4 Turbo, Claude 3 Opus, and Gemini Ultra on benchmarks like MMLU, GPQA, HumanEval, and MATH.

Speaking of benchmarks, we have devoted many words in the past to explaining how frustratingly imprecise benchmarks can be when applied to large language models due to issues like training contamination (that is, including benchmark test questions in the training dataset), cherry-picking on the part of vendors, and an inability to capture AI’s general usefulness in an interactive session with chat-tuned models.

But, as expected, Meta provided some benchmarks for Llama 3 that list results from MMLU (undergraduate level knowledge), GSM-8K (grade-school math), HumanEval (coding), GPQA (graduate-level questions), and MATH (math word problems). These show the 8B model performing well compared to open-weights models like Google’s Gemma 7B and Mistral 7B Instruct, and the 70B model also held its own against Gemini Pro 1.5 and Claude 3 Sonnet.

A chart of instruction-tuned Llama 3 8B and 70B benchmarks provided by Meta.

Enlarge / A chart of instruction-tuned Llama 3 8B and 70B benchmarks provided by Meta.

Meta says that the Llama 3 model has been enhanced with capabilities to understand coding (like Llama 2) and, for the first time, has been trained with both images and text—though it currently outputs only text. According to Reuters, Meta Chief Product Officer Chris Cox noted in an interview that more complex processing abilities (like executing multi-step plans) are expected in future updates to Llama 3, which will also support multimodal outputs—that is, both text and images.

Meta plans to host the Llama 3 models on a range of cloud platforms, making them accessible through AWS, Databricks, Google Cloud, and other major providers.

Also on Thursday, Meta announced that Llama 3 will become the new basis of the Meta AI virtual assistant, which the company first announced in September. The assistant will appear prominently in search features for Facebook, Instagram, WhatsApp, Messenger, and the aforementioned dedicated website that features a design similar to ChatGPT, including the ability to generate images in the same interface. The company also announced a partnership with Google to integrate real-time search results into the Meta AI assistant, adding to an existing partnership with Microsoft’s Bing.

LLMs keep leaping with Llama 3, Meta’s newest open-weights AI model Read More »

words-are-flowing-out-like-endless-rain:-recapping-a-busy-week-of-llm-news

Words are flowing out like endless rain: Recapping a busy week of LLM news

many things frequently —

Gemini 1.5 Pro launch, new version of GPT-4 Turbo, new Mistral model, and more.

An image of a boy amazed by flying letters.

Enlarge / An image of a boy amazed by flying letters.

Some weeks in AI news are eerily quiet, but during others, getting a grip on the week’s events feels like trying to hold back the tide. This week has seen three notable large language model (LLM) releases: Google Gemini Pro 1.5 hit general availability with a free tier, OpenAI shipped a new version of GPT-4 Turbo, and Mistral released a new openly licensed LLM, Mixtral 8x22B. All three of those launches happened within 24 hours starting on Tuesday.

With the help of software engineer and independent AI researcher Simon Willison (who also wrote about this week’s hectic LLM launches on his own blog), we’ll briefly cover each of the three major events in roughly chronological order, then dig into some additional AI happenings this week.

Gemini Pro 1.5 general release

On Tuesday morning Pacific time, Google announced that its Gemini 1.5 Pro model (which we first covered in February) is now available in 180-plus countries, excluding Europe, via the Gemini API in a public preview. This is Google’s most powerful public LLM so far, and it’s available in a free tier that permits up to 50 requests a day.

It supports up to 1 million tokens of input context. As Willison notes in his blog, Gemini 1.5 Pro’s API price at $7/million input tokens and $21/million output tokens costs a little less than GPT-4 Turbo (priced at $10/million in and $30/million out) and more than Claude 3 Sonnet (Anthropic’s mid-tier LLM, priced at $3/million in and $15/million out).

Notably, Gemini 1.5 Pro includes native audio (speech) input processing that allows users to upload audio or video prompts, a new File API for handling files, the ability to add custom system instructions (system prompts) for guiding model responses, and a JSON mode for structured data extraction.

“Majorly Improved” GPT-4 Turbo launch

A GPT-4 Turbo performance chart provided by OpenAI.

Enlarge / A GPT-4 Turbo performance chart provided by OpenAI.

Just a bit later than Google’s 1.5 Pro launch on Tuesday, OpenAI announced that it was rolling out a “majorly improved” version of GPT-4 Turbo (a model family originally launched in November) called “gpt-4-turbo-2024-04-09.” It integrates multimodal GPT-4 Vision processing (recognizing the contents of images) directly into the model, and it initially launched through API access only.

Then on Thursday, OpenAI announced that the new GPT-4 Turbo model had just become available for paid ChatGPT users. OpenAI said that the new model improves “capabilities in writing, math, logical reasoning, and coding” and shared a chart that is not particularly useful in judging capabilities (that they later updated). The company also provided an example of an alleged improvement, saying that when writing with ChatGPT, the AI assistant will use “more direct, less verbose, and use more conversational language.”

The vague nature of OpenAI’s GPT-4 Turbo announcements attracted some confusion and criticism online. On X, Willison wrote, “Who will be the first LLM provider to publish genuinely useful release notes?” In some ways, this is a case of “AI vibes” again, as we discussed in our lament about the poor state of LLM benchmarks during the debut of Claude 3. “I’ve not actually spotted any definite differences in quality [related to GPT-4 Turbo],” Willison told us directly in an interview.

The update also expanded GPT-4’s knowledge cutoff to April 2024, although some people are reporting it achieves this through stealth web searches in the background, and others on social media have reported issues with date-related confabulations.

Mistral’s mysterious Mixtral 8x22B release

An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It's hard to draw a picture of an LLM, so a robot will have to do.

Enlarge / An illustration of a robot holding a French flag, figuratively reflecting the rise of AI in France due to Mistral. It’s hard to draw a picture of an LLM, so a robot will have to do.

Not to be outdone, on Tuesday night, French AI company Mistral launched its latest openly licensed model, Mixtral 8x22B, by tweeting a torrent link devoid of any documentation or commentary, much like it has done with previous releases.

The new mixture-of-experts (MoE) release weighs in with a larger parameter count than its previously most-capable open model, Mixtral 8x7B, which we covered in December. It’s rumored to potentially be as capable as GPT-4 (In what way, you ask? Vibes). But that has yet to be seen.

“The evals are still rolling in, but the biggest open question right now is how well Mixtral 8x22B shapes up,” Willison told Ars. “If it’s in the same quality class as GPT-4 and Claude 3 Opus, then we will finally have an openly licensed model that’s not significantly behind the best proprietary ones.”

This release has Willison most excited, saying, “If that thing really is GPT-4 class, it’s wild, because you can run that on a (very expensive) laptop. I think you need 128GB of MacBook RAM for it, twice what I have.”

The new Mixtral is not listed on Chatbot Arena yet, Willison noted, because Mistral has not released a fine-tuned model for chatting yet. It’s still a raw, predict-the-next token LLM. “There’s at least one community instruction tuned version floating around now though,” says Willison.

Chatbot Arena Leaderboard shake-ups

A Chatbot Arena Leaderboard screenshot taken on April 12, 2024.

Enlarge / A Chatbot Arena Leaderboard screenshot taken on April 12, 2024.

Benj Edwards

This week’s LLM news isn’t limited to just the big names in the field. There have also been rumblings on social media about the rising performance of open source models like Cohere’s Command R+, which reached position 6 on the LMSYS Chatbot Arena Leaderboard—the highest-ever ranking for an open-weights model.

And for even more Chatbot Arena action, apparently the new version of GPT-4 Turbo is proving competitive with Claude 3 Opus. The two are still in a statistical tie, but GPT-4 Turbo recently pulled ahead numerically. (In March, we reported when Claude 3 first numerically pulled ahead of GPT-4 Turbo, which was then the first time another AI model had surpassed a GPT-4 family model member on the leaderboard.)

Regarding this fierce competition among LLMs—of which most of the muggle world is unaware and will likely never be—Willison told Ars, “The past two months have been a whirlwind—we finally have not just one but several models that are competitive with GPT-4.” We’ll see if OpenAI’s rumored release of GPT-5 later this year will restore the company’s technological lead, we note, which once seemed insurmountable. But for now, Willison says, “OpenAI are no longer the undisputed leaders in LLMs.”

Words are flowing out like endless rain: Recapping a busy week of LLM news Read More »

intel’s-“gaudi-3”-ai-accelerator-chip-may-give-nvidia’s-h100-a-run-for-its-money

Intel’s “Gaudi 3” AI accelerator chip may give Nvidia’s H100 a run for its money

Adventures in Matrix Multiplication —

Intel claims 50% more speed when running AI language models vs. the market leader.

An Intel handout photo of the Gaudi 3 AI accelerator.

Enlarge / An Intel handout photo of the Gaudi 3 AI accelerator.

On Tuesday, Intel revealed a new AI accelerator chip called Gaudi 3 at its Vision 2024 event in Phoenix. With strong claimed performance while running large language models (like those that power ChatGPT), the company has positioned Gaudi 3 as an alternative to Nvidia’s H100, a popular data center GPU that has been subject to shortages, though apparently that is easing somewhat.

Compared to Nvidia’s H100 chip, Intel projects a 50 percent faster training time on Gaudi 3 for both OpenAI’s GPT-3 175B LLM and the 7-billion parameter version of Meta’s Llama 2. In terms of inference (running the trained model to get outputs), Intel claims that its new AI chip delivers 50 percent faster performance than H100 for Llama 2 and Falcon 180B, which are both relatively popular open-weights models.

Intel is targeting the H100 because of its high market share, but the chip isn’t Nvidia’s most powerful AI accelerator chip in the pipeline. Announcements of the H200 and the Blackwell B200 have since surpassed the H100 on paper, but neither of those chips is out yet (the H200 is expected in the second quarter of 2024—basically any day now).

Meanwhile, the aforementioned H100 supply issues have been a major headache for tech companies and AI researchers who have to fight for access to any chips that can train AI models. This has led several tech companies like Microsoft, Meta, and OpenAI (rumor has it) to seek their own AI-accelerator chip designs, although that custom silicon is typically manufactured by either Intel or TSMC. Google has its own line of tensor processing units (TPUs) that it has been using internally since 2015.

Given those issues, Intel’s Gaudi 3 may be a potentially attractive alternative to the H100 if Intel can hit an ideal price (which Intel has not provided, but an H100 reportedly costs around $30,000–$40,000) and maintain adequate production. AMD also manufactures a competitive range of AI chips, such as the AMD Instinct MI300 Series, that sell for around $10,000–$15,000.

Gaudi 3 performance

An Intel handout featuring specifications of the Gaudi 3 AI accelerator.

Enlarge / An Intel handout featuring specifications of the Gaudi 3 AI accelerator.

Intel says the new chip builds upon the architecture of its predecessor, Gaudi 2, by featuring two identical silicon dies connected by a high-bandwidth connection. Each die contains a central cache memory of 48 megabytes, surrounded by four matrix multiplication engines and 32 programmable tensor processor cores, bringing the total cores to 64.

The chipmaking giant claims that Gaudi 3 delivers double the AI compute performance of Gaudi 2 using 8-bit floating-point infrastructure, which has become crucial for training transformer models. The chip also offers a fourfold boost for computations using the BFloat 16-number format. Gaudi 3 also features 128GB of the less expensive HBMe2 memory capacity (which may contribute to price competitiveness) and features 3.7TB of memory bandwidth.

Since data centers are well-known to be power hungry, Intel emphasizes the power efficiency of Gaudi 3, claiming 40 percent greater inference power-efficiency across Llama 7B and 70B parameters, and Falcon 180B parameter models compared to Nvidia’s H100. Eitan Medina, chief operating officer of Intel’s Habana Labs, attributes this advantage to Gaudi’s large-matrix math engines, which he claims require significantly less memory bandwidth compared to other architectures.

Gaudi vs. Blackwell

An Intel handout photo of the Gaudi 3 AI accelerator.

Enlarge / An Intel handout photo of the Gaudi 3 AI accelerator.

Last month, we covered the splashy launch of Nvidia’s Blackwell architecture, including the B200 GPU, which Nvidia claims will be the world’s most powerful AI chip. It seems natural, then, to compare what we know about Nvidia’s highest-performing AI chip to the best of what Intel can currently produce.

For starters, Gaudi 3 is being manufactured using TSMC’s N5 process technology, according to IEEE Spectrum, narrowing the gap between Intel and Nvidia in terms of semiconductor fabrication technology. The upcoming Nvidia Blackwell chip will use a custom N4P process, which reportedly offers modest performance and efficiency improvements over N5.

Gaudi 3’s use of HBM2e memory (as we mentioned above) is notable compared to the more expensive HBM3 or HBM3e used in competing chips, offering a balance of performance and cost-efficiency. This choice seems to emphasize Intel’s strategy to compete not only on performance but also on price.

As far as raw performance comparisons between Gaudi 3 and the B200, that can’t be known until the chips have been released and benchmarked by a third party.

As the race to power the tech industry’s thirst for AI computation heats up, IEEE Spectrum notes that the next generation of Intel’s Gaudi chip, code-named Falcon Shores, remains a point of interest. It also remains to be seen whether Intel will continue to rely on TSMC’s technology or leverage its own foundry business and upcoming nanosheet transistor technology to gain a competitive edge in the AI accelerator market.

Intel’s “Gaudi 3” AI accelerator chip may give Nvidia’s H100 a run for its money Read More »

us-lawmaker-proposes-a-public-database-of-all-ai-training-material

US lawmaker proposes a public database of all AI training material

Who’s got the receipts? —

Proposed law would require more transparency from AI companies.

US lawmaker proposes a public database of all AI training material

Amid a flurry of lawsuits over AI models’ training data, US Representative Adam Schiff (D-Calif.) has introduced a bill that would require AI companies to disclose exactly which copyrighted works are included in datasets training AI systems.

The Generative AI Disclosure Act “would require a notice to be submitted to the Register of Copyrights prior to the release of a new generative AI system with regard to all copyrighted works used in building or altering the training dataset for that system,” Schiff said in a press release.

The bill is retroactive and would apply to all AI systems available today, as well as to all AI systems to come. It would take effect 180 days after it’s enacted, requiring anyone who creates or alters a training set not only to list works referenced by the dataset, but also to provide a URL to the dataset within 30 days before the AI system is released to the public. That URL would presumably give creators a way to double-check if their materials have been used and seek any credit or compensation available before the AI tools are in use.

All notices would be kept in a publicly available online database.

Schiff described the act as championing “innovation while safeguarding the rights and contributions of creators, ensuring they are aware when their work contributes to AI training datasets.”

“This is about respecting creativity in the age of AI and marrying technological progress with fairness,” Schiff said.

Currently, creators who don’t have access to training datasets rely on AI models’ outputs to figure out if their copyrighted works may have been included in training various AI systems. The New York Times, for example, prompted ChatGPT to spit out excerpts of its articles, relying on a tactic to identify training data by asking ChatGPT to produce lines from specific articles, which OpenAI has curiously described as “hacking.”

Under Schiff’s law, The New York Times would need to consult the database to ID all articles used to train ChatGPT or any other AI system.

Any AI maker who violates the act would risk a “civil penalty in an amount not less than $5,000,” the proposed bill said.

At a hearing on artificial intelligence and intellectual property, Rep. Darrell Issa (R-Calif.)—who chairs the House Judiciary Subcommittee on Courts, Intellectual Property, and the Internet—told Schiff that his subcommittee would consider the “thoughtful” bill.

Schiff told the subcommittee that the bill is “only a first step” toward “ensuring that at a minimum” creators are “aware of when their work contributes to AI training datasets,” saying that he would “welcome the opportunity to work with members of the subcommittee” on advancing the bill.

“The rapid development of generative AI technologies has outpaced existing copyright laws, which has led to widespread use of creative content to train generative AI models without consent or compensation,” Schiff warned at the hearing.

In Schiff’s press release, Meredith Stiehm, president of the Writers Guild of America West, joined leaders from other creative groups celebrating the bill as an “important first step” for rightsholders.

“Greater transparency and guardrails around AI are necessary to protect writers and other creators” and address “the unprecedented and unauthorized use of copyrighted materials to train generative AI systems,” Stiehm said.

Until the thorniest AI copyright questions are settled, Ken Doroshow, a chief legal officer for the Recording Industry Association of America, suggested that Schiff’s bill filled an important gap by introducing “comprehensive and transparent recordkeeping” that would provide “one of the most fundamental building blocks of effective enforcement of creators’ rights.”

A senior adviser for the Human Artistry Campaign, Moiya McTier, went further, celebrating the bill as stopping AI companies from “exploiting” artists and creators.

“AI companies should stop hiding the ball when they copy creative works into AI systems and embrace clear rules of the road for recordkeeping that create a level and transparent playing field for the development and licensing of genuinely innovative applications and tools,” McTier said.

AI copyright guidance coming soon

While courts weigh copyright questions raised by artists, book authors, and newspapers, the US Copyright Office announced in March that it would be issuing guidance later this year, but the office does not seem to be prioritizing questions on AI training.

Instead, the Copyright Office will focus first on issuing guidance on deepfakes and AI outputs. This spring, the office will release a report “analyzing the impact of AI on copyright” of “digital replicas, or the use of AI to digitally replicate individuals’ appearances, voices, or other aspects of their identities.” Over the summer, another report will focus on “the copyrightability of works incorporating AI-generated material.”

Regarding “the topic of training AI models on copyrighted works as well as any licensing considerations and liability issues,” the Copyright Office did not provide a timeline for releasing guidance, only confirming that their “goal is to finalize the entire report by the end of the fiscal year.”

Once guidance is available, it could sway court opinions, although courts do not necessarily have to apply Copyright Office guidance when weighing cases.

The Copyright Office’s aspirational timeline does seem to be ahead of when at least some courts can be expected to decide on some of the biggest copyright questions for some creators. The class-action lawsuit raised by book authors against OpenAI, for example, is not expected to be resolved until February 2025, and the New York Times’ lawsuit is likely on a similar timeline. However, artists suing Stability AI face a hearing on that AI company’s motion to dismiss this May.

US lawmaker proposes a public database of all AI training material Read More »

new-ai-music-generator-udio-synthesizes-realistic-music-on-demand

New AI music generator Udio synthesizes realistic music on demand

Battle of the AI bands —

But it still needs trial and error to generate high-quality results.

A screenshot of AI-generated songs listed on Udio on April 10, 2024.

Enlarge / A screenshot of AI-generated songs listed on Udio on April 10, 2024.

Benj Edwards

Between 2002 and 2005, I ran a music website where visitors could submit song titles that I would write and record a silly song around. In the liner notes for my first CD release in 2003, I wrote about a day when computers would potentially put me out of business, churning out music automatically at a pace I could not match. While I don’t actively post music on that site anymore, that day is almost here.

On Wednesday, a group of ex-DeepMind employees launched Udio, a new AI music synthesis service that can create novel high-fidelity musical audio from written prompts, including user-provided lyrics. It’s similar to Suno, which we covered on Monday. With some key human input, Udio can create facsimiles of human-produced music in genres like country, barbershop quartet, German pop, classical, hard rock, hip hop, show tunes, and more. It’s currently free to use during a beta period.

Udio is also freaking out some musicians on Reddit. As we mentioned in our Suno piece, Udio is exactly the kind of AI-powered music generation service that over 200 musical artists were afraid of when they signed an open protest letter last week.

But as impressive as the Udio songs first seem from a technical AI-generation standpoint (not necessarily judging by musical merit), its generation capability isn’t perfect. We experimented with its creation tool and the results felt less impressive than those created by Suno. The high-quality musical samples showcased on Udio’s site likely resulted from a lot of creative human input (such as human-written lyrics) and cherry-picking the best compositional parts of songs out of many generations. In fact, Udio lays out a five-step workflow to build a 1.5-minute-long song in a FAQ.

For example, we created an Ars Technica “Moonshark” song on Udio using the same prompt as one we used previously with Suno. In its raw form, the results sound half-baked and almost nightmarish (here is the Suno version for comparison). It’s also a lot shorter by default at 32 seconds compared to Suno’s 1-minute and 32-second output. But Udio allows songs to be extended, or you can try generating a poor result again with different prompts for different results.

After registering a Udio account, anyone can create a track by entering a text prompt that can include lyrics, a story direction, and musical genre tags. Udio then tackles the task in two stages. First, it utilizes a large language model (LLM) similar to ChatGPT to generate lyrics (if necessary) based on the provided prompt. Next, it synthesizes music using a method that Udio does not disclose, but it’s likely a diffusion model, similar to Stability AI’s Stable Audio.

From the given prompt, Udio’s AI model generates two distinct song snippets for you to choose from. You can then publish the song for the Udio community, download the audio or video file to share on other platforms, or directly share it on social media. Other Udio users can also remix or build on existing songs. Udio’s terms of service say that the company claims no rights over the musical generations and that they can be used for commercial purposes.

Although the Udio team has not revealed the specific details of its model or training data (which is likely filled with copyrighted material), it told Tom’s Guide that the system has built-in measures to identify and block tracks that too closely resemble the work of specific artists, ensuring that the generated music remains original.

And that brings us back to humans, some of whom are not taking the onset of AI-generated music very well. “I gotta be honest, this is depressing as hell,” wrote one Reddit commenter in a thread about Udio. “I’m still broadly optimistic that music will be fine in the long run somehow. But like, why do this? Why automate art?”

We’ll hazard an answer by saying that replicating art is a key target for AI research because the results can be inaccurate and imprecise and still seem notable or gee-whiz amazing, which is a key characteristic of generative AI. It’s flashy and impressive-looking while allowing for a general lack of quantitative rigor. We’ve already seen AI come for still images, video, and text with varied results regarding representative accuracy. Fully composed musical recordings seem to be next on the list of AI hills to (approximately) conquer, and the competition is heating up.

New AI music generator Udio synthesizes realistic music on demand Read More »

mit-license-text-becomes-viral-“sad-girl”-piano-ballad-generated-by-ai

MIT License text becomes viral “sad girl” piano ballad generated by AI

WARRANTIES OF MERCHANTABILITY —

“Permission is hereby granted” comes from Suno AI engine that creates new songs on demand.

Illustration of a robot singing.

We’ve come a long way since primitive AI music generators in 2022. Today, AI tools like Suno.ai allow any series of words to become song lyrics, including inside jokes (as you’ll see below). On Wednesday, prompt engineer Riley Goodside tweeted an AI-generated song created with the prompt “sad girl with piano performs the text of the MIT License,” and it began to circulate widely in the AI community online.

The MIT License is a famous permissive software license created in the late 1980s, frequently used in open source projects. “My favorite part of this is ~1: 25 it nails ‘WARRANTIES OF MERCHANTABILITY’ with a beautiful Imogen Heap-style glissando then immediately pronounces ‘FITNESS’ as ‘fistiff,'” Goodside wrote on X.

Suno (which means “listen” in Hindi) was formed in 2023 in Cambridge, Massachusetts. It’s the brainchild of Michael Shulman, Georg Kucsko, Martin Camacho, and Keenan Freyberg, who formerly worked at companies like Meta and TikTok. Suno has already attracted big-name partners, such as Microsoft, which announced the integration of an earlier version of the Suno engine into Bing Chat last December. Today, Suno is on v3 of its model, which can create temporally coherent two-minute songs in many different genres.

The company did not reply to our request for an interview by press time. In March, Brian Hiatt of Rolling Stone wrote a profile about Suno that describes the service as a collaboration between OpenAI’s ChatGPT (for lyric writing) and Suno’s music generation model, which some experts think has likely been trained on recordings of copyrighted music without license or artist permission.

It’s exactly this kind of service that upset over 200 musical artists enough last week that they signed an Artist Rights Alliance open letter asking tech companies to stop using AI tools to generate music that could replace human artists.

Considering the unknown provenance of the training data, ownership of the generated songs seems like a complicated question. Suno’s FAQ says that music generated using its free tier remains owned by Suno and can only be used for non-commercial purposes. Paying subscribers reportedly own generated songs “while subscribed to Pro or Premier,” subject to Suno’s terms of service. However, the US Copyright Office took a stance last year that purely AI-generated visual art cannot be copyrighted, and while that standard has not yet been resolved for AI-generated music, it might eventually become official legal policy as well.

The Moonshark song

A screenshot of the Suno.ai website showing lyrics of an AI-generated

Enlarge / A screenshot of the Suno.ai website showing lyrics of an AI-generated “Moonshark” song.

Benj Edwards

While using the service, Suno appears to have no trouble creating unique lyrics based on your prompt (unless you supply your own) and sets those words to stylized genres of music it generates based on its training dataset. It dynamically generates vocals as well, although they include audible aberrations. Suno’s output is not indistinguishable from high-fidelity human-created music yet, but given the pace of progress we’ve seen, that bridge could be crossed within the next year.

To get a sense of what Suno can do, we created an account on the site and prompted the AI engine to create songs about our mascot, Moonshark, and about barbarians with CRTs, two inside jokes at Ars. What’s interesting is that although the AI model aced the task of creating an original song for each topic, both songs start with the same line, “In the depths of the digital domain.” That’s possibly an artifact of whatever hidden prompt Suno is using to instruct ChatGPT when writing the lyrics.

Suno is arguably a fun toy to experiment with and doubtless a milestone in generative AI music tools. But it’s also an achievement tainted by the unresolved ethical issues related to scraping musical work without the artist’s permission. Then there’s the issue of potentially replacing human musicians, which has not been far from the minds of people sharing their own Suno results online. On Monday, AI influencer Ethan Mollick wrote, “I’ve had a song from Suno AI stuck in my head all day. Grim milestone or good one?”

MIT License text becomes viral “sad girl” piano ballad generated by AI Read More »

ai-hardware-company-from-jony-ive,-sam-altman-seeks-$1-billion-in-funding

AI hardware company from Jony Ive, Sam Altman seeks $1 billion in funding

AI Boom —

A venture fund founded by Laurene Powell Jobs could finance the company.

Jony Ive, the former Apple designer.

Jony Ive, the former Apple designer.

Former Apple design lead Jony Ive and current OpenAI CEO Sam Altman are seeking funding for a new company that will produce an “artificial intelligence-powered personal device,” according to The Information‘s sources, who are said to be familiar with the plans.

The exact nature of the device is unknown, but it will not look anything like a smartphone, according to the sources. We first heard tell of this venture in the fall of 2023, but The Information’s story reveals that talks are moving forward to get the company off the ground.

Ive and Altman hope to raise at least $1 billion for the new company. The complete list of potential funding sources they’ve spoken with is unknown, but The Information’s sources say they are in talks with frequent OpenAI investor Thrive Capital as well as Emerson Collective, a venture capital firm founded by Laurene Powell Jobs.

SoftBank CEO and super-investor Masayoshi Son is also said to have spoken with Altman and Ive about the venture. Financial Times previously reported that Son wanted Arm (another company he has backed) to be involved in the project.

Obviously, those are some of the well-established and famous names within today’s tech industry. Personal connections may play a role; for example, Jobs is said to have a friendship with both Ive and Altman. That might be critical because the pedigree involved could scare off smaller investors since the big names could drive up the initial cost of investment.

Although we don’t know anything about the device yet, it would likely put Ive in direct competition with his former employer, Apple. It has been reported elsewhere that Apple is working on bringing powerful new AI features to iOS 18 and later versions of the software for iPhones, iPads, and the company’s other devices.

Altman already has his hands in several other AI ventures besides OpenAI. The Information reports that there is no indication yet that OpenAI would be directly involved in the new hardware company.

AI hardware company from Jony Ive, Sam Altman seeks $1 billion in funding Read More »

publisher:-openai’s-gpt-store-bots-are-illegally-scraping-our-textbooks

Publisher: OpenAI’s GPT Store bots are illegally scraping our textbooks

OpenAI logo

For the past few months, Morten Blichfeldt Andersen has spent many hours scouring OpenAI’s GPT Store. Since it launched in January, the marketplace for bespoke bots has filled up with a deep bench of useful and sometimes quirky AI tools. Cartoon generators spin up New Yorker–style illustrations and vivid anime stills. Programming and writing assistants offer shortcuts for crafting code and prose. There’s also a color analysis bot, a spider identifier, and a dating coach called RizzGPT. Yet Blichfeldt Andersen is hunting only for one very specific type of bot: Those built on his employer’s copyright-protected textbooks without permission.

Blichfeldt Andersen is publishing director at Praxis, a Danish textbook purveyor. The company has been embracing AI and created its own custom chatbots. But it is currently engaged in a game of whack-a-mole in the GPT Store, and Blichfeldt Andersen is the man holding the mallet.

“I’ve been personally searching for infringements and reporting them,” Blichfeldt Andersen says. “They just keep coming up.” He suspects the culprits are primarily young people uploading material from textbooks to create custom bots to share with classmates—and that he has uncovered only a tiny fraction of the infringing bots in the GPT Store. “Tip of the iceberg,” Blichfeldt Andersen says.

It is easy to find bots in the GPT Store whose descriptions suggest they might be tapping copyrighted content in some way, as Techcrunch noted in a recent article claiming OpenAI’s store was overrun with “spam.” Using copyrighted material without permission is permissible in some contexts but in others rightsholders can take legal action. WIRED found a GPT called Westeros Writer that claims to “write like George R.R. Martin,” the creator of Game of Thrones. Another, Voice of Atwood, claims to imitate the writer Margaret Atwood. Yet another, Write Like Stephen, is intended to emulate Stephen King.

When WIRED tried to trick the King bot into revealing the “system prompt” that tunes its responses, the output suggested it had access to King’s memoir On Writing. Write Like Stephen was able to reproduce passages from the book verbatim on demand, even noting which page the material came from. (WIRED could not make contact with the bot’s developer, because it did not provide an email address, phone number, or external social profile.)

OpenAI spokesperson Kayla Wood says it responds to takedown requests against GPTs made with copyrighted content but declined to answer WIRED’s questions about how frequently it fulfills such requests. She also says the company proactively looks for problem GPTs. “We use a combination of automated systems, human review, and user reports to find and assess GPTs that potentially violate our policies, including the use of content from third parties without necessary permission,” Wood says.

New disputes

The GPT store’s copyright problem could add to OpenAI’s existing legal headaches. The company is facing a number of high-profile lawsuits alleging copyright infringement, including one brought by The New York Times and several brought by different groups of fiction and nonfiction authors, including big names like George R.R. Martin.

Chatbots offered in OpenAI’s GPT Store are based on the same technology as its own ChatGPT but are created by outside developers for specific functions. To tailor their bot, a developer can upload extra information that it can tap to augment the knowledge baked into OpenAI’s technology. The process of consulting this additional information to respond to a person’s queries is called retrieval-augmented generation, or RAG. Blichfeldt Andersen is convinced that the RAG files behind the bots in the GPT Store are a hotbed of copyrighted materials uploaded without permission.

Publisher: OpenAI’s GPT Store bots are illegally scraping our textbooks Read More »

billie-eilish,-pearl-jam,-200-artists-say-ai-poses-existential-threat-to-their-livelihoods

Billie Eilish, Pearl Jam, 200 artists say AI poses existential threat to their livelihoods

artificial music —

Artists say AI will “set in motion a race to the bottom that will degrade the value of our work.”

Billie Eilish attends the 2024 Vanity Fair Oscar Party hosted by Radhika Jones at the Wallis Annenberg Center for the Performing Arts on March 10, 2024 in Beverly Hills, California.

Enlarge / Billie Eilish attends the 2024 Vanity Fair Oscar Party hosted by Radhika Jones at the Wallis Annenberg Center for the Performing Arts on March 10, 2024, in Beverly Hills, California.

On Tuesday, the Artist Rights Alliance (ARA) announced an open letter critical of AI signed by over 200 musical artists, including Pearl Jam, Nicki Minaj, Billie Eilish, Stevie Wonder, Elvis Costello, and the estate of Frank Sinatra. In the letter, the artists call on AI developers, technology companies, platforms, and digital music services to stop using AI to “infringe upon and devalue the rights of human artists.” A tweet from the ARA added that AI poses an “existential threat” to their art.

Visual artists began protesting the advent of generative AI after the rise of the first mainstream AI image generators in 2022, and considering that generative AI research has since been undertaken for other forms of creative media, we have seen that protest extend to professionals in other creative domains, such as writers, actors, filmmakers—and now musicians.

“When used irresponsibly, AI poses enormous threats to our ability to protect our privacy, our identities, our music and our livelihoods,” the open letter states. It alleges that some of the “biggest and most powerful” companies (unnamed in the letter) are using the work of artists without permission to train AI models, with the aim of replacing human artists with AI-created content.

  • A list of musical artists that signed the ARA open letter against generative AI.

  • A list of musical artists that signed the ARA open letter against generative AI.

  • A list of musical artists that signed the ARA open letter against generative AI.

  • A list of musical artists that signed the ARA open letter against generative AI.

In January, Billboard reported that AI research taking place at Google DeepMind had trained an unnamed music-generating AI on a large dataset of copyrighted music without seeking artist permission. That report may have been referring to Google’s Lyria, an AI-generation model announced in November that the company positioned as a tool for enhancing human creativity. The tech has since powered musical experiments from YouTube.

We’ve previously covered AI music generators that seemed fairly primitive throughout 2022 and 2023, such as Riffusion, Google’s MusicLM, and Stability AI’s Stable Audio. We’ve also covered open source musical voice-cloning technology that is frequently used to make musical parodies online. While we have yet to see an AI model that can generate perfect, fully composed high-quality music on demand, the quality of outputs from music synthesis models has been steadily improving over time.

In considering AI’s potential impact on music, it’s instructive to remember historical instances where tech innovations initially sparked concern among artists. For instance, the introduction of synthesizers in the 1960s and 1970s and the advent of digital sampling in the 1980s both faced scrutiny and fear from parts of the music community, but the music industry eventually adjusted.

While we’ve seen fear of the unknown related to AI going around quite a bit for the past year, it’s possible that AI tools will be integrated into the music production process like any other music production tool or technique that came before. It’s also possible that even if that kind of integration comes to pass, some artists will still get hurt along the way—and the ARA wants to speak out about it before the technology progresses further.

“Race to the bottom”

The Artists Rights Alliance is a nonprofit advocacy group that describes itself as an “alliance of working musicians, performers, and songwriters fighting for a healthy creative economy and fair treatment for all creators in the digital world.”

The signers of the ARA’s open letter say they acknowledge the potential of AI to advance human creativity when used responsibly, but they also claim that replacing artists with generative AI would “substantially dilute the royalty pool” paid out to artists, which could be “catastrophic” for many working musicians, artists, and songwriters who are trying to make ends meet.

In the letter, the artists say that unchecked AI will set in motion a race to the bottom that will degrade the value of their work and prevent them from being fairly compensated. “This assault on human creativity must be stopped,” they write. “We must protect against the predatory use of AI to steal professional artist’ voices and likenesses, violate creators’ rights, and destroy the music ecosystem.”

The emphasis on the word “human” in the letter is notable (“human artist” was used twice and “human creativity” and “human artistry” are used once, each) because it suggests the clear distinction they are drawing between the work of human artists and the output of AI systems. It implies recognition that we’ve entered a new era where not all creative output is made by people.

The letter concludes with a call to action, urging all AI developers, technology companies, platforms, and digital music services to pledge not to develop or deploy AI music-generation technology, content, or tools that undermine or replace the human artistry of songwriters and artists or deny them fair compensation for their work.

While it’s unclear whether companies will meet those demands, so far, protests from visual artists have not stopped development of ever-more advanced image-synthesis models. On Threads, frequent AI industry commentator Dare Obasanjo wrote, “Unfortunately this will be as effective as writing an open letter to stop the sun from rising tomorrow.”

Billie Eilish, Pearl Jam, 200 artists say AI poses existential threat to their livelihoods Read More »

openai-drops-login-requirements-for-chatgpt’s-free-version

OpenAI drops login requirements for ChatGPT’s free version

free as in beer? —

ChatGPT 3.5 still falls far short of GPT-4, and other models surpassed it long ago.

A glowing OpenAI logo on a blue background.

Benj Edwards

On Monday, OpenAI announced that visitors to the ChatGPT website in some regions can now use the AI assistant without signing in. Previously, the company required that users create an account to use it, even with the free version of ChatGPT that is currently powered by the GPT-3.5 AI language model. But as we have noted in the past, GPT-3.5 is widely known to provide more inaccurate information compared to GPT-4 Turbo, available in paid versions of ChatGPT.

Since its launch in November 2022, ChatGPT has transformed over time from a tech demo to a comprehensive AI assistant, and it’s always had a free version available. The cost is free because “you’re the product,” as the old saying goes. Using ChatGPT helps OpenAI gather data that will help the company train future AI models, although free users and ChatGPT Plus subscription members can both opt out of allowing the data they input into ChatGPT to be used for AI training. (OpenAI says it never trains on inputs from ChatGPT Team and Enterprise members at all).

Opening ChatGPT to everyone could provide a frictionless on-ramp for people who might use it as a substitute for Google Search or potentially gain new customers by providing an easy way for people to use ChatGPT quickly, then offering an upsell to paid versions of the service.

“It’s core to our mission to make tools like ChatGPT broadly available so that people can experience the benefits of AI,” OpenAI says on its blog page. “For anyone that has been curious about AI’s potential but didn’t want to go through the steps to set up an account, start using ChatGPT today.”

When you visit the ChatGPT website, you're immediately presented with a chat box like this (in some regions). Screenshot captured April 1, 2024.

Enlarge / When you visit the ChatGPT website, you’re immediately presented with a chat box like this (in some regions). Screenshot captured April 1, 2024.

Benj Edwards

Since kids will also be able to use ChatGPT without an account—despite it being against the terms of service—OpenAI also says it’s introducing “additional content safeguards,” such as blocking more prompts and “generations in a wider range of categories.” What exactly that entails has not been elaborated upon by OpenAI, but we reached out to the company for comment.

There might be a few other downsides to the fully open approach. On X, AI researcher Simon Willison wrote about the potential for automated abuse as a way to get around paying for OpenAI’s services: “I wonder how their scraping prevention works? I imagine the temptation for people to abuse this as a free 3.5 API will be pretty strong.”

With fierce competition, more GPT-3.5 access may backfire

Willison also mentioned a common criticism of OpenAI (as voiced in this case by Wharton professor Ethan Mollick) that people’s ideas about what AI models can do have so far largely been influenced by GPT-3.5, which, as we mentioned, is far less capable and far more prone to making things up than the paid version of ChatGPT that uses GPT-4 Turbo.

“In every group I speak to, from business executives to scientists, including a group of very accomplished people in Silicon Valley last night, much less than 20% of the crowd has even tried a GPT-4 class model,” wrote Mollick in a tweet from early March.

With models like Google Gemini Pro 1.5 and Anthropic Claude 3 potentially surpassing OpenAI’s best proprietary model at the moment —and open weights AI models eclipsing the free version of ChatGPT—allowing people to use GPT-3.5 might not be putting OpenAI’s best foot forward. Microsoft Copilot, powered by OpenAI models, also supports a frictionless, no-login experience, but it allows access to a model based on GPT-4. But Gemini currently requires a sign-in, and Anthropic sends a login code through email.

For now, OpenAI says the login-free version of ChatGPT is not yet available to everyone, but it will be coming soon: “We’re rolling this out gradually, with the aim to make AI accessible to anyone curious about its capabilities.”

OpenAI drops login requirements for ChatGPT’s free version Read More »

openai-shows-off-sora-ai-video-generator-to-hollywood-execs

OpenAI shows off Sora AI video generator to Hollywood execs

No lights, no camera, action —

CEO Sam Altman met with Universal, Paramount, and Warner Bros Discovery.

a robotic intelligence works as a cameraman (3d rendering)

OpenAI has launched a charm offensive in Hollywood, holding meetings with major studios including Paramount, Universal, and Warner Bros Discovery to showcase its video generation technology Sora and allay fears the artificial intelligence model will harm the movie industry.

Chief Executive Sam Altman and Chief Operating Officer Brad Lightcap gave presentations to executives from the film industry giants, said multiple people with knowledge of the meetings, which took place in recent days.

Altman and Lightcap showed off Sora, a new generative AI model that can create detailed videos from simple written prompts.

The technology first gained Hollywood’s attention after OpenAI published a selection of videos produced by the model last month. The clips quickly went viral online and have led to debate over the model’s potential impact on the creative industries.

“Sora is causing enormous excitement,” said media analyst Claire Enders. “There is a sense it is going to revolutionize the making of movies and bring down the cost of production and reduce the demand for [computer-generated imagery] very strongly.”

AI-generated video of a cat and human, generated via video generation model Sora.

Those involved in the meetings said OpenAI was seeking input from the film bosses on how Sora should be rolled out. Some who watched the demonstrations said they could see how Sora or similar AI products could save time and money on production but added the technology needed further development.

OpenAI’s overtures to the studios come at a delicate moment in Hollywood. Last year’s monthslong strikes ended with the Writers Guild of America and the Screen Actors Guild securing groundbreaking protections from AI in their contracts. This year, contract negotiations are underway with the International Alliance of Theatrical Stage Employees—and AI is again expected to be a hot-button issue.

Earlier this week, OpenAI released new Sora videos generated by a number of visual artists and directors, including short films, as well as their impressions of the technology. The model will aim to compete with several available text-to-video services from start-ups, including Runway, Pika, and Stability AI. These other services already offer commercial uses for content.

An AI-generated video from Sora of a dog.

However, Sora has not been widely released. OpenAI has held off announcing a launch date or the circumstances under which it will be available. One person with knowledge of its strategy said the company was deciding how to commercialize the technology. Another person said there were safety steps still to take before the company considered putting Sora into a product.

OpenAI is also working to improve the system. Currently, Sora can only make videos under one minute in length, and its creations have limitations, such as glass bouncing off the floor instead of shattering or adding extra limbs to people and animals.

Some studios appeared open to using Sora in filmmaking or TV production in future, but licensing and partnerships have not yet been discussed, said people involved in the talks.

“There have been no meetings with OpenAI about partnerships,” one studio executive said. “They’ve done demos, just like Apple has been demo-ing the Vision Pro [mixed-reality headset]. They’re trying to get people excited.”

OpenAI has been previewing the model in a “very controlled manner” to “industries that are likely to be impacted first,” said one person close to OpenAI.

Media analyst Enders said the reception from the movie industry had been broadly optimistic on Sora as it is “seen completely as a cost-saving element, rather than impacting the creative ethos of storytelling.”

OpenAI declined to comment.

An AI-generated video from Sora of a woman walking down a Tokyo street.

© 2024 The Financial Times Ltd. All rights reserved Not to be redistributed, copied, or modified in any way.

OpenAI shows off Sora AI video generator to Hollywood execs Read More »

openai-holds-back-wide-release-of-voice-cloning-tech-due-to-misuse-concerns

OpenAI holds back wide release of voice-cloning tech due to misuse concerns

AI speaks letters, text-to-speech or TTS, text-to-voice, speech synthesis applications, generative Artificial Intelligence, futuristic technology in language and communication.

Voice synthesis has come a long way since 1978’s Speak & Spell toy, which once wowed people with its state-of-the-art ability to read words aloud using an electronic voice. Now, using deep-learning AI models, software can create not only realistic-sounding voices, but also convincingly imitate existing voices using small samples of audio.

Along those lines, OpenAI just announced Voice Engine, a text-to-speech AI model for creating synthetic voices based on a 15-second segment of recorded audio. It has provided audio samples of the Voice Engine in action on its website.

Once a voice is cloned, a user can input text into the Voice Engine and get an AI-generated voice result. But OpenAI is not ready to widely release its technology yet. The company initially planned to launch a pilot program for developers to sign up for the Voice Engine API earlier this month. But after more consideration about ethical implications, the company decided to scale back its ambitions for now.

“In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time,” the company writes. “We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models.”

Voice cloning tech in general is not particularly new—we’ve covered several AI voice synthesis models since 2022, and the tech is active in the open source community with packages like OpenVoice and XTTSv2. But the idea that OpenAI is inching toward letting anyone use their particular brand of voice tech is notable. And in some ways, the company’s reticence to release it fully might be the bigger story.

OpenAI says that benefits of its voice technology include providing reading assistance through natural-sounding voices, enabling global reach for creators by translating content while preserving native accents, supporting non-verbal individuals with personalized speech options, and assisting patients in recovering their own voice after speech-impairing conditions.

But it also means that anyone with 15 seconds of someone’s recorded voice could effectively clone it, and that has obvious implications for potential misuse. Even if OpenAI never widely releases its Voice Engine, the ability to clone voices has already caused trouble in society through phone scams where someone imitates a loved one’s voice and election campaign robocalls featuring cloned voices from politicians like Joe Biden.

Also, researchers and reporters have shown that voice-cloning technology can be used to break into bank accounts that use voice authentication (such as Chase’s Voice ID), which prompted Sen. Sherrod Brown (D-Ohio), the chairman of the US Senate Committee on Banking, Housing, and Urban Affairs, to send a letter to the CEOs of several major banks in May 2023 to inquire about the security measures banks are taking to counteract AI-powered risks.

OpenAI holds back wide release of voice-cloning tech due to misuse concerns Read More »