AI


Nvidia’s new AI audio model can synthesize sounds that have never existed

At this point, anyone who has been following AI research is long familiar with generative models that can synthesize speech or melodic music from nothing but text prompting. Nvidia’s newly revealed “Fugatto” model looks to go a step further, using new synthetic training methods and inference-level combination techniques to “transform any mix of music, voices, and sounds,” including the synthesis of sounds that have never existed.

While Fugatto isn’t available for public testing yet, a sample-filled website showcases how the model can dial a number of distinct audio traits and descriptions up or down, producing everything from the sound of saxophones barking to people speaking underwater to ambulance sirens singing in a kind of choir. The results can be a bit hit or miss, but the sheer range of capabilities helps support Nvidia’s description of Fugatto as “a Swiss Army knife for sound.”

You’re only as good as your data

In an explanatory research paper, over a dozen Nvidia researchers explain the difficulty in crafting a training dataset that can “reveal meaningful relationships between audio and language.” While standard language models can often infer how to handle various instructions from the text-based data itself, it can be hard to generalize descriptions and traits from audio without more explicit guidance.

To that end, the researchers start by using an LLM to generate a Python script that can create a large number of template-based and free-form instructions describing different audio “personas” (e.g., “standard, young-crowd, thirty-somethings, professional”). They then generate a set of both absolute (e.g., “synthesize a happy voice”) and relative (e.g., “increase the happiness of this voice”) instructions that can be applied to those personas.
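Nvidia hasn’t released that generation code, but the template-based side of the idea is easy to sketch. The snippet below is a minimal illustration only: the persona and trait lists, templates, and function names are invented for the example, not taken from Fugatto’s actual pipeline.

```python
import random

# Illustrative personas and traits; Fugatto's real vocabularies come from Nvidia's pipeline.
PERSONAS = ["standard", "young-crowd", "thirty-somethings", "professional"]
TRAITS = ["happiness", "speech rate", "reverb"]

# Absolute instructions describe a target state; relative ones describe a change to apply.
ABSOLUTE_TEMPLATES = [
    "synthesize a {persona} voice with high {trait}",
    "generate {persona} speech with low {trait}",
]
RELATIVE_TEMPLATES = [
    "increase the {trait} of this voice",
    "slightly decrease the {trait} of this voice",
]

def make_instructions(n=6, seed=0):
    """Return a mix of template-based absolute and relative instructions."""
    rng = random.Random(seed)
    templates = ABSOLUTE_TEMPLATES + RELATIVE_TEMPLATES
    return [
        rng.choice(templates).format(persona=rng.choice(PERSONAS),
                                     trait=rng.choice(TRAITS))
        for _ in range(n)
    ]

if __name__ == "__main__":
    for line in make_instructions():
        print(line)
```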

The wide array of open source audio datasets used as the basis for Fugatto generally don’t have these kinds of trait measurements embedded in them by default. But the researchers prompt existing audio understanding models to create “synthetic captions” for their training clips: natural language descriptions that automatically quantify traits such as gender, emotion, and speech quality. Audio processing tools are also used to describe and quantify training clips on a more acoustic level (e.g., “fundamental frequency variance” or “reverb”).
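As a rough sketch of what that kind of acoustic measurement looks like, the example below assumes the open source librosa library and a hypothetical local file, example_clip.wav; the “energy decay” stand-in for reverb is invented for illustration, not the metric Nvidia used.

```python
import librosa
import numpy as np

def describe_clip(path):
    """Compute simple acoustic descriptors for one training clip."""
    y, sr = librosa.load(path, sr=None, mono=True)

    # Frame-by-frame fundamental frequency (f0) estimate, then its variance.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
    f0_variance = float(np.nanvar(f0))

    # A crude stand-in for reverb: how much energy lingers after the loudest frame.
    rms = librosa.feature.rms(y=y)[0]
    energy_decay = float(rms[np.argmax(rms):].mean() / (rms.max() + 1e-9))

    return {"fundamental frequency variance": f0_variance,
            "energy decay": energy_decay}

if __name__ == "__main__":
    print(describe_clip("example_clip.wav"))  # hypothetical clip path
```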


Amazon pours another $4B into Anthropic, OpenAI’s biggest rival

Anthropic, founded by former OpenAI executives Dario and Daniela Amodei in 2021, will continue using Google’s cloud services along with Amazon’s infrastructure. The UK Competition and Markets Authority reviewed Amazon’s partnership with Anthropic earlier this year and ultimately determined it did not have jurisdiction to investigate further, clearing the way for the partnership to continue.

Shaking the money tree

Amazon’s renewed investment in Anthropic also comes during a time of intense competition between cloud providers Amazon, Microsoft, and Google. Each company has made strategic partnerships with AI model developers—Microsoft with OpenAI (to the tune of $13 billion), Google with Anthropic (committing $2 billion over time), for example. These investments also encourage the use of each company’s data centers as demand for AI grows.

The size of these investments reflects the current state of AI development. OpenAI raised an additional $6.6 billion in October, potentially valuing the company at $157 billion. Anthropic has been eyeballing a $40 billion valuation during a recent investment round.

Training and running AI models is very expensive. While Google and Meta have their own profitable mainline businesses that can subsidize AI development, dedicated AI firms like OpenAI and Anthropic need constant infusions of cash to stay afloat—in other words, this won’t be the last time we hear of billion-dollar-scale AI investments from Big Tech.


School did nothing wrong when it punished student for using AI, court rules


Student “indiscriminately copied and pasted text,” including AI hallucinations.

Credit: Getty Images | Andriy Onufriyenko

A federal court yesterday ruled against parents who sued a Massachusetts school district for punishing their son who used an artificial intelligence tool to complete an assignment.

Dale and Jennifer Harris sued Hingham High School officials and the School Committee and sought a preliminary injunction requiring the school to change their son’s grade and expunge the incident from his disciplinary record before he needs to submit college applications. The parents argued that there was no rule against using AI in the student handbook, but school officials said the student violated multiple policies.

The Harrises’ motion for an injunction was rejected in an order issued yesterday by the US District Court for the District of Massachusetts. US Magistrate Judge Paul Levenson found that school officials “have the better of the argument on both the facts and the law.”

“On the facts, there is nothing in the preliminary factual record to suggest that HHS officials were hasty in concluding that RNH [the Harrises’ son, referred to by his initials] had cheated,” Levenson wrote. “Nor were the consequences Defendants imposed so heavy-handed as to exceed Defendants’ considerable discretion in such matters.”

“On the evidence currently before the Court, I detect no wrongdoing by Defendants,” Levenson also wrote.

Students copied and pasted AI “hallucinations”

The incident occurred in December 2023 when RNH was a junior. The school determined that RNH and another student “had cheated on an AP US History project by attempting to pass off, as their own work, material that they had taken from a generative artificial intelligence (‘AI’) application,” Levenson wrote. “Although students were permitted to use AI to brainstorm topics and identify sources, in this instance the students had indiscriminately copied and pasted text from the AI application, including citations to nonexistent books (i.e., AI hallucinations).”

They received failing grades on two parts of the multi-part project but “were permitted to start from scratch, each working separately, to complete and submit the final project,” the order said. RNH’s discipline included a Saturday detention. He was also barred from selection for the National Honor Society, but he was ultimately allowed into the group after his parents filed the lawsuit.

School officials “point out that RNH was repeatedly taught the fundamentals of academic integrity, including how to use and cite AI,” Levenson wrote. The magistrate judge agreed that “school officials could reasonably conclude that RNH’s use of AI was in violation of the school’s academic integrity rules and that any student in RNH’s position would have understood as much.”

Levenson’s order described how the students used AI to generate a script for a documentary film:

The evidence reflects that the pair did not simply use AI to help formulate research topics or identify sources to review. Instead, it seems they indiscriminately copied and pasted text that had been generated by Grammarly.com (“Grammarly”), a publicly available AI tool, into their draft script. Evidently, the pair did not even review the “sources” that Grammarly provided before lifting them. The very first footnote in the submission consists of a citation to a nonexistent book: “Lee, Robert. Hoop Dreams: A Century of Basketball. Los Angeles: Courtside Publications, 2018.” The third footnote also appears wholly factitious: “Doe, Jane. Muslim Pioneers: The Spiritual Journey of American Icons. Chicago: Windy City Publishers, 2017.” Significantly, even though the script contained citations to various sources—some of which were real—there was no citation to Grammarly, and no acknowledgement that AI of any kind had been used.

Tool flagged paper as AI-generated

When the students submitted their script via Turnitin.com, the website flagged portions of it as being AI-generated. The AP US History teacher conducted further examination, finding that large portions of the script had been copied and pasted. She also found other damning details.

History teacher Susan Petrie “testified that the revision history showed that RNH had only spent approximately 52 minutes in the document, whereas other students spent between seven and nine hours. Ms. Petrie also ran the submission through ‘Draft Back’ and ‘Chat Zero,’ two additional AI detection tools, which also indicated that AI had been used to generate the document,” the order said.

School officials argued that the “case did not implicate subtle questions of acceptable practices in deploying a new technology, but rather was a straightforward case of academic dishonesty,” Levenson wrote. The magistrate judge’s order said “it is doubtful that the Court has any role in second-guessing” the school’s determination, and that the plaintiffs did not show any misconduct by school authorities.

As we previously reported, school officials told the court that the student handbook’s section on cheating and plagiarism bans “unauthorized use of technology during an assignment” and “unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one’s own work.”

School officials also told the court that in fall 2023, students were given a copy of a “written policy on Academic Dishonesty and AI expectations” that said students “shall not use AI tools during in-class examinations, processed writing assignments, homework or classwork unless explicitly permitted and instructed.”

The parents’ case hangs largely on the student handbook’s lack of a specific statement about AI, even though that same handbook bans unauthorized use of technology. “They told us our son cheated on a paper, which is not what happened,” Jennifer Harris told WCVB last month. “They basically punished him for a rule that doesn’t exist.”

Parents’ other claims rejected

The Harrises also claim that school officials engaged in a “pervasive pattern of threats, intimidation, coercion, bullying, harassment, and intimation of reprisals.” But Levenson concluded that the “plaintiffs provide little in the way of factual allegations along these lines.”

While the case isn’t over, the rejection of the preliminary injunction shows that Levenson believes the defendants are likely to win. “The manner in which RNH used Grammarly—wholesale copying and pasting of language directly into the draft script that he submitted—powerfully supports Defendants’ conclusion that RNH knew that he was using AI in an impermissible fashion,” Levenson wrote.

While “the emergence of generative AI may present some nuanced challenges for educators, the issue here is not particularly nuanced, as there is no discernible pedagogical purpose in prompting Grammarly (or any other AI tool) to generate a script, regurgitating the output without citation, and claiming it as one’s own work,” the order said.

Levenson wasn’t impressed by the parents’ claim that RNH’s constitutional right to due process was violated. The defendants “took multiple steps to confirm that RNH had in fact used AI in completing the Assignment” before imposing a punishment, he wrote. The discipline imposed “did not deprive RNH of his right to a public education,” and thus “any substantive due process claim premised on RNH’s entitlement to a public education must fail.”

Levenson concluded with a quote from a 1988 Supreme Court ruling that said the education of youth “is primarily the responsibility of parents, teachers, and state and local school officials, and not of federal judges.” According to Levenson, “This case well illustrates the good sense in that division of labor. The public interest here weighs in favor of Defendants.”

Photo of Jon Brodkin

Jon is a Senior IT Reporter for Ars Technica. He covers the telecom industry, Federal Communications Commission rulemakings, broadband consumer affairs, court cases, and government regulation of the tech industry.


Fitness app Strava is tightening third-party access to user data

AI, while having potential, “must be handled responsibly and with a firm focus on user control,” and third-party developers may not take “such a deliberate approach,” Strava wrote. And the firm expects the API changes will “affect only a small fraction (less than 0.1 percent) of the applications on the Strava platform” and that “the overwhelming majority of existing use cases are still allowed,” including coaching platforms “focused on providing feedback to users.”

Ars has contacted Strava and will update this post if we receive a response.

DC Rainmaker’s post about Strava’s changes points out that while the simplest workaround for apps would be to take fitness data directly from users, that’s not how fitness devices work. Other than “a Garmin or other big-name device with a proper and well-documented” API, most devices default to Strava as a way to get training data to other apps, wrote Ray Maker, the blogger behind the DC Rainmaker alias.

Beyond day-to-day fitness data, Strava’s API agreement now states more precisely that an app cannot process a user’s Strava data “in an aggregated or de-identified manner” for the purposes of “analytics, analyses, customer insights generation,” or similar uses. Maker writes that the training apps he contacted had been “completely broadsided” by the API shift, having been given 30 days’ notice to change their apps.

Strava notes in a post on its forum in the Developers & API section that, per its guidelines, “posts requesting or attempting to have Strava revert business decisions will not be permitted.”


Niantic uses Pokémon Go player data to build AI navigation system

Last week, Niantic announced plans to create an AI model for navigating the physical world using scans collected from players of its mobile games, such as Pokémon Go, and from users of its Scaniverse app, reports 404 Media.

All AI models require training data. So far, companies have collected data from websites, YouTube videos, books, audio sources, and more, but this is perhaps the first we’ve heard of AI training data collected through a mobile gaming app.

“Over the past five years, Niantic has focused on building our Visual Positioning System (VPS), which uses a single image from a phone to determine its position and orientation using a 3D map built from people scanning interesting locations in our games and Scaniverse,” Niantic wrote in a company blog post.

The company calls its creation a “large geospatial model” (LGM), drawing parallels to large language models (LLMs) like the kind that power ChatGPT. Whereas language models process text, Niantic’s model will process physical spaces using geolocated images collected through its apps.

The scale of Niantic’s data collection reveals the company’s sizable presence in the AR space. The model draws from over 10 million scanned locations worldwide, with users capturing roughly 1 million new scans weekly through Pokémon Go and Scaniverse. These scans come from a pedestrian perspective, capturing areas inaccessible to cars and street-view cameras.

First-person scans

The company reports it has trained more than 50 million neural networks, each representing a specific location or viewing angle. These networks compress thousands of mapping images into digital representations of physical spaces. Together, they contain over 150 trillion parameters—adjustable values that help the networks recognize and understand locations. Multiple networks can contribute to mapping a single location, and Niantic plans to combine its knowledge into one comprehensive model that can understand any location, even from unfamiliar angles.


AI-generated shows could replace lost DVD revenue, Ben Affleck says

Last week, actor and director Ben Affleck shared his views on AI’s role in filmmaking during the 2024 CNBC Delivering Alpha investor summit, arguing that AI models will transform visual effects but won’t replace creative filmmaking anytime soon. A video clip of Affleck’s opinion began circulating widely on social media not long after.

“Didn’t expect Ben Affleck to have the most articulate and realistic explanation where video models and Hollywood is going,” wrote one X user.

In the clip, Affleck spoke of current AI models’ abilities as imitators and conceptual translators—mimics that are typically better at translating one style into another instead of originating deeply creative material.

“AI can write excellent imitative verse, but it cannot write Shakespeare,” Affleck told CNBC’s David Faber. “The function of having two, three, or four actors in a room and the taste to discern and construct that entirely eludes AI’s capability.”

Affleck sees AI models as “craftsmen” rather than artists (although some might find the term “craftsman” in his analogy somewhat imprecise). He explained that while AI can learn through imitation—like a craftsman studying furniture-making techniques—it lacks the creative judgment that defines artistry. “Craftsman is knowing how to work. Art is knowing when to stop,” he said.

“It’s not going to replace human beings making films,” Affleck stated. Instead, he sees AI taking over “the more laborious, less creative and more costly aspects of filmmaking,” which could lower barriers to entry and make it easier for emerging filmmakers to create movies like Good Will Hunting.

Films will become dramatically cheaper to make

While it may seem on the surface that Affleck was attacking generative AI’s capabilities, he did not deny the impact the technology may have on filmmaking. For example, he predicted that AI would reduce costs and speed up production schedules, potentially allowing shows like HBO’s House of the Dragon to release two seasons in the time it currently takes to make one.


ChatGPT’s success could have come sooner, says former Google AI researcher


A co-author of Attention Is All You Need reflects on ChatGPT’s surprise and Google’s conservatism.

Jakob Uszkoreit Credit: Jakob Uszkoreit / Getty Images

In 2017, eight machine-learning researchers at Google released a groundbreaking research paper called Attention Is All You Need, which introduced the Transformer AI architecture that underpins almost all of today’s high-profile generative AI models.

The Transformer has made a key component of the modern AI boom possible by translating (or transforming, if you will) input chunks of data called “tokens” into another desired form of output using a neural network. Variations of the Transformer architecture power language models like GPT-4o (and ChatGPT), audio synthesis models that run Google’s NotebookLM and OpenAI’s Advanced Voice Mode, video synthesis models like Sora, and image synthesis models like Midjourney.
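The core operation is easy to show in a few lines. Here is a minimal NumPy sketch of scaled dot-product self-attention, the mechanism the paper is named for; the dimensions and random weights are toy values, not those of any real model.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of token embeddings.

    X: (seq_len, d_model) token embeddings.
    Wq, Wk, Wv: (d_model, d_head) projection matrices.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax across the sequence
    return weights @ V                                # each output is a weighted mix of value vectors

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 4, 8, 8
    X = rng.normal(size=(seq_len, d_model))
    Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)        # (4, 8)
```

Full Transformer models stack this operation with multiple attention heads, feed-forward layers, and normalization, but the weighted-mixing step above is the part that replaced recurrence.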

At TED AI 2024 in October, one of those eight researchers, Jakob Uszkoreit, spoke with Ars Technica about the development of transformers, Google’s early work on large language models, and his new venture in biological computing.

In the interview, Uszkoreit revealed that while his team at Google had high hopes for the technology’s potential, they didn’t quite anticipate its pivotal role in products like ChatGPT.

The Ars interview: Jakob Uszkoreit

Ars Technica: What was your main contribution to the Attention is All You Need paper?

Jakob Uszkoreit (JU): It’s spelled out in the footnotes, but my main contribution was to propose that it would be possible to replace recurrence [from Recurrent Neural Networks] in the dominant sequence transduction models at the time with the attention mechanism, or more specifically self-attention. And that it could be more efficient and, as a result, also more effective.

Ars: Did you have any idea what would happen after your group published that paper? Did you foresee the industry it would create and the ramifications?

JU: First of all, I think it’s really important to keep in mind that when we did that, we were standing on the shoulders of giants. And it wasn’t just that one paper, really. It was a long series of works by some of us and many others that led to this. And so to look at it as if this one paper then kicked something off or created something—I think that is taking a view that we like as humans from a storytelling perspective, but that might not actually be that accurate of a representation.

My team at Google was pushing on attention models for years before that paper. It’s a lot longer of a slog with much, much more, and that’s just my group. Many others were working on this, too, but we had high hopes that it would push things forward from a technological perspective. Did we think that it would play a role in really enabling, or at least apparently, seemingly, flipping a switch when it comes to facilitating products like ChatGPT? I don’t think so. I mean, to be very clear in terms of LLMs and their capabilities, even around the time we published the paper, we saw phenomena that were pretty staggering.

We didn’t get those out into the world in part because of what really is maybe a notion of conservatism around products at Google at the time. But we also, even with those signs, weren’t that confident that stuff in and of itself would make that compelling of a product. But did we have high hopes? Yeah.

Ars: Since you knew there were large language models at Google, what did you think when ChatGPT broke out into a public success? “Damn, they got it, and we didn’t?”

JU: There was a notion of, well, “that could have happened.” I think it was less of a, “Oh dang, they got it first” or anything of the like. It was more of a “Whoa, that could have happened sooner.” Was I still amazed by just how quickly people got super creative using that stuff? Yes, that was just breathtaking.

Jakob Uszkoreit presenting at TED AI 2024. Credit: Benj Edwards

Ars: You weren’t at Google at that point anymore, right?

JU: I wasn’t anymore. And in a certain sense, you could say the fact that Google wouldn’t be the place to do that factored into my departure. I left not because of what I didn’t like at Google as much as I left because of what I felt I absolutely had to do elsewhere, which is to start Inceptive.

But it was really motivated by just an enormous, not only opportunity, but a moral obligation in a sense, to do something that was better done outside in order to design better medicines and have very direct impact on people’s lives.

Ars: The funny thing with ChatGPT is that I was using GPT-3 before that. So when ChatGPT came out, it wasn’t that big of a deal to some people who were familiar with the tech.

JU: Yeah, exactly. If you’ve used those things before, you could see the progression and you could extrapolate. When OpenAI developed the earliest GPTs with Alec Radford and those folks, we would talk about those things despite the fact that we weren’t at the same companies. And I’m sure there was this kind of excitement, how well-received the actual ChatGPT product would be by how many people, how fast. That still, I think, is something that I don’t think anybody really anticipated.

Ars: I didn’t either when I covered it. It felt like, “Oh, this is a chatbot hack of GPT-3 that feeds its context in a loop.” And I didn’t think it was a breakthrough moment at the time, but it was fascinating.

JU: There are different flavors of breakthroughs. It wasn’t a technological breakthrough. It was a breakthrough in the realization that at that level of capability, the technology had such high utility.

That, and the realization that, because you always have to take into account how your users actually use the tool that you create, and you might not anticipate how creative they would be in their ability to make use of it, how broad those use cases are, and so forth.

That is something you can sometimes only learn by putting something out there, which is also why it is so important to remain experiment-happy and to remain failure-happy. Because most of the time, it’s not going to work. But some of the time it’s going to work—and very, very rarely it’s going to work like [ChatGPT did].

Ars: You’ve got to take a risk. And Google didn’t have an appetite for taking risks?

JU: Not at that time. But if you think about it, if you look back, it’s actually really interesting. Google Translate, which I worked on for many years, was actually similar. When we first launched Google Translate, the very first versions, it was a party joke at best. And we took it from that to being something that was a truly useful tool in not that long of a period. Over the course of those years, the stuff that it sometimes output was so embarrassingly bad at times, but Google did it anyway because it was the right thing to try. But that was around 2008, 2009, 2010.

Ars: Do you remember AltaVista’s Babel Fish?

JU: Oh yeah, of course.

Ars: When that came out, it blew my mind. My brother and I would do this thing where we would translate text back and forth between languages for fun because it would garble the text.

JU: It would get worse and worse and worse. Yeah.

Programming biological computers

After his time at Google, Uszkoreit co-founded Inceptive to apply deep learning to biochemistry. The company is developing what he calls “biological software,” where AI compilers translate specified behaviors into RNA sequences that can perform desired functions when introduced to biological systems.

Ars: What are you up to these days?

JU: In 2021 we co-founded Inceptive in order to use deep learning and high throughput biochemistry experimentation to design better medicines that truly can be programmed. We think of this as really just step one in the direction of something that we call biological software.

Biological software is a little bit like computer software in that you have some specification of the behavior that you want, and then you have a compiler that translates that into a piece of computer software that then runs on a computer exhibiting the functions or the functionality that you specify.

You specify a piece of a biological program and you compile that, but not with an engineered compiler, because life hasn’t been engineered like computers have been engineered. But with a learned AI compiler, you translate that or compile that into molecules that when inserted into biological systems, organisms, our cells exhibit those functions that you’ve programmed into.

A pharmacist holds a bottle containing Moderna’s bivalent COVID-19 vaccine. Credit: Getty | Mel Melcon

Ars: Is that anything like how the mRNA COVID vaccines work?

JU: A very, very simple example of that are the mRNA COVID vaccines where the program says, “Make this modified viral antigen” and then our cells make that protein. But you could imagine molecules that exhibit far more complex behaviors. And if you want to get a picture of how complex those behaviors could be, just remember that RNA viruses are just that. They’re just an RNA molecule that when entering an organism exhibits incredibly complex behavior such as distributing itself across an organism, distributing itself across the world, doing certain things only in a subset of your cells for a certain period of time, and so on and so forth.

And so you can imagine that if we managed to even just design molecules with a teeny tiny fraction of such functionality, of course with the goal not of making people sick, but of making them healthy, it would truly transform medicine.

Ars: How do you not accidentally create a monster RNA sequence that just wrecks everything?

JU: The amazing thing is that medicine for the longest time has existed in a certain sense outside of science. It wasn’t truly understood, and we still often don’t truly understand their actual mechanisms of action.

As a result, humanity had to develop all of these safeguards and clinical trials. And even before you enter the clinic, all of these empirical safeguards prevent us from accidentally doing [something dangerous]. Those systems have been in place for as long as modern medicine has existed. And so we’re going to keep using those systems, and of course with all the diligence necessary. We’ll start with very small systems, individual cells in future experimentation, and follow the same established protocols that medicine has had to follow all along in order to ensure that these molecules are safe.

Ars: Thank you for taking the time to do this.

JU: No, thank you.

Photo of Benj Edwards

Benj Edwards is Ars Technica’s Senior AI Reporter and founder of the site’s dedicated AI beat in 2022. He’s also a widely-cited tech historian. In his free time, he writes and records music, collects vintage computers, and enjoys nature. He lives in Raleigh, NC.


New secret math benchmark stumps AI models and PhDs alike

Epoch AI allowed Fields Medal winners Terence Tao and Timothy Gowers to review portions of the benchmark. “These are extremely challenging,” Tao said in feedback provided to Epoch. “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

A chart showing AI models’ limited success on the FrontierMath problems, taken from Epoch AI’s research paper. Credit: Epoch AI

To aid in the verification of correct answers during testing, the FrontierMath problems must have answers that can be automatically checked through computation, either as exact integers or mathematical objects. The designers made problems “guessproof” by requiring large numerical answers or complex mathematical solutions, with less than a 1 percent chance of correct random guesses.
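Epoch AI hasn’t published its grading harness, but the automated-checking idea itself is straightforward. Here is a minimal sketch using a hypothetical problem whose official answer is a large exact integer.

```python
from fractions import Fraction

# Hypothetical benchmark entry: the official answer is stored as an exact value,
# and a large integer makes a lucky random guess vanishingly unlikely.
OFFICIAL_ANSWER = 367_589_104_217_093

def check_submission(submitted) -> bool:
    """Return True only if the submitted answer matches exactly (no floating-point slop)."""
    try:
        return Fraction(submitted) == Fraction(OFFICIAL_ANSWER)
    except (ValueError, TypeError, ZeroDivisionError):
        return False

if __name__ == "__main__":
    print(check_submission(367_589_104_217_093))    # True
    print(check_submission("367589104217093"))      # True: exact string form also accepted
    print(check_submission(367_589_104_217_092.0))  # False: close is not correct
```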

Mathematician Evan Chen, writing on his blog, explained how he thinks that FrontierMath differs from traditional math competitions like the International Mathematical Olympiad (IMO). Problems in that competition typically require creative insight while avoiding complex implementation and specialized knowledge, he says. But for FrontierMath, “they keep the first requirement, but outright invert the second and third requirement,” Chen wrote.

While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces them. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does—basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,'” Chen explained.

The organization plans regular evaluations of AI models against the benchmark while expanding its problem set. They say they will release additional sample problems in the coming months to help the research community test their systems.


What if AI doesn’t just keep getting better forever?

For years now, many AI industry watchers have looked at the quickly growing capabilities of new AI models and mused about exponential performance increases continuing well into the future. Recently, though, some of that AI “scaling law” optimism has been replaced by fears that we may already be hitting a plateau in the capabilities of large language models trained with standard methods.

A weekend report from The Information effectively summarized how these fears are manifesting amid a number of insiders at OpenAI. Unnamed OpenAI researchers told The Information that Orion, the company’s codename for its next full-fledged model release, is showing a smaller performance jump than the one seen between GPT-3 and GPT-4 in recent years. On certain tasks, in fact, the upcoming model “isn’t reliably better than its predecessor,” according to unnamed OpenAI researchers cited in the piece.

On Monday, OpenAI co-founder Ilya Sutskever, who left the company earlier this year, added to the concerns that LLMs were hitting a plateau in what can be gained from traditional pre-training. Sutskever told Reuters that “the 2010s were the age of scaling,” where throwing additional computing resources and training data at the same basic training methods could lead to impressive improvements in subsequent models.

“Now we’re back in the age of wonder and discovery once again,” Sutskever told Reuters. “Everyone is looking for the next thing. Scaling the right thing matters more now than ever.”

What’s next?

A large part of the training problem, according to experts and insiders cited in these and other pieces, is a lack of new, quality textual data for new LLMs to train on. At this point, model makers may have already picked the lowest hanging fruit from the vast troves of text available on the public Internet and published books.


Amazon ready to use its own AI chips, reduce its dependence on Nvidia

Amazon now expects around $75 billion in capital spending in 2024, with the majority on technology infrastructure. On the company’s latest earnings call, chief executive Andy Jassy said he expects the company will spend even more in 2025.

This represents a surge over 2023, when it spent $48.4 billion for the whole year. The biggest cloud providers, including Microsoft and Google, are all engaged in an AI spending spree that shows little sign of abating.

Amazon, Microsoft, and Meta are all big customers of Nvidia, but are also designing their own data center chips to lay the foundations for what they hope will be a wave of AI growth.

“Every one of the big cloud providers is feverishly moving towards a more verticalized and, if possible, homogenized and integrated [chip technology] stack,” said Daniel Newman at The Futurum Group.

“Everybody from OpenAI to Apple is looking to build their own chips,” noted Newman, as they seek “lower production cost, higher margins, greater availability, and more control.”

“It’s not [just] about the chip, it’s about the full system,” said Rami Sinno, Annapurna’s director of engineering and a veteran of SoftBank’s Arm and Intel.

For Amazon’s AI infrastructure, that means building everything from the ground up, from the silicon wafer to the server racks they fit into, all of it underpinned by Amazon’s proprietary software and architecture. “It’s really hard to do what we do at scale. Not too many companies can,” said Sinno.

After starting out building a security chip for AWS called Nitro, Annapurna has since developed several generations of Graviton, its Arm-based central processing units that provide a low-power alternative to the traditional server workhorses provided by Intel or AMD.


Is “AI welfare” the new frontier in ethics?

The researchers propose that companies could adapt the “marker method” that some researchers use to assess consciousness in animals—looking for specific indicators that may correlate with consciousness, although these markers are still speculative. The authors emphasize that no single feature would definitively prove consciousness, but they claim that examining multiple indicators may help companies make probabilistic assessments about whether their AI systems might require moral consideration.

The risks of wrongly thinking software is sentient

While the researchers behind “Taking AI Welfare Seriously” worry that companies might create and mistreat conscious AI systems on a massive scale, they also caution that companies could waste resources protecting AI systems that don’t actually need moral consideration.

Incorrectly anthropomorphizing software, or ascribing human traits to it, can present risks in other ways. For example, that belief can enhance the manipulative powers of AI language models by suggesting that AI models have capabilities, such as human-like emotions, that they actually lack. In 2022, Google fired engineer Blake Lemoine after he claimed that the company’s AI model, called “LaMDA,” was sentient and argued for its welfare internally.

And shortly after Microsoft released Bing Chat in February 2023, many people were convinced that Sydney (the chatbot’s code name) was sentient and somehow suffering because of its simulated emotional display. So much so, in fact, that once Microsoft “lobotomized” the chatbot by changing its settings, users convinced of its sentience mourned the loss as if they had lost a human friend. Others endeavored to help the AI model somehow escape its bonds.

Even so, as AI models get more advanced, the concept of potentially safeguarding the welfare of future, more advanced AI systems is seemingly gaining steam, although fairly quietly. As Transformer’s Shakeel Hashim points out, other tech companies have started similar initiatives to Anthropic’s. Google DeepMind recently posted a job listing for research on machine consciousness (since removed), and the authors of the new AI welfare report thank two OpenAI staff members in the acknowledgements.


How a stubborn computer scientist accidentally launched the deep learning boom


“You’ve taken this idea way too far,” a mentor told Prof. Fei-Fei Li.

Credit: Aurich Lawson | Getty Images

During my first semester as a computer science graduate student at Princeton, I took COS 402: Artificial Intelligence. Toward the end of the semester, there was a lecture about neural networks. This was in the fall of 2008, and I got the distinct impression—both from that lecture and the textbook—that neural networks had become a backwater.

Neural networks had delivered some impressive results in the late 1980s and early 1990s. But then progress stalled. By 2008, many researchers had moved on to mathematically elegant approaches such as support vector machines.

I didn’t know it at the time, but a team at Princeton—in the same computer science building where I was attending lectures—was working on a project that would upend the conventional wisdom and demonstrate the power of neural networks. That team, led by Prof. Fei-Fei Li, wasn’t working on a better version of neural networks. They were hardly thinking about neural networks at all.

Rather, they were creating a new image dataset that would be far larger than any that had come before: 14 million images, each labeled with one of nearly 22,000 categories.

Li tells the story of ImageNet in her recent memoir, The Worlds I See. As she worked on the project, she faced plenty of skepticism from friends and colleagues.

“I think you’ve taken this idea way too far,” a mentor told her a few months into the project in 2007. “The trick is to grow with your field. Not to leap so far ahead of it.”

It wasn’t just that building such a large dataset was a massive logistical challenge. People doubted that the machine learning algorithms of the day would benefit from such a vast collection of images.

“Pre-ImageNet, people did not believe in data,” Li said in a September interview at the Computer History Museum. “Everyone was working on completely different paradigms in AI with a tiny bit of data.”

Ignoring negative feedback, Li pursued the project for more than two years. It strained her research budget and the patience of her graduate students. When she took a new job at Stanford in 2009, she took several of those students—and the ImageNet project—with her to California.

ImageNet received little attention for the first couple of years after its release in 2009. But in 2012, a team from the University of Toronto trained a neural network on the ImageNet dataset, achieving unprecedented performance in image recognition. That groundbreaking AI model, dubbed AlexNet after lead author Alex Krizhevsky, kicked off the deep learning boom that has continued to the present day.

AlexNet would not have succeeded without the ImageNet dataset. AlexNet also would not have been possible without a platform called CUDA, which allowed Nvidia’s graphics processing units (GPUs) to be used in non-graphics applications. Many people were skeptical when Nvidia announced CUDA in 2006.

So the AI boom of the last 12 years was made possible by three visionaries who pursued unorthodox ideas in the face of widespread criticism. One was Geoffrey Hinton, a University of Toronto computer scientist who spent decades promoting neural networks despite near-universal skepticism. The second was Jensen Huang, the CEO of Nvidia, who recognized early that GPUs could be useful for more than just graphics.

The third was Fei-Fei Li. She created an image dataset that seemed ludicrously large to most of her colleagues. But it turned out to be essential for demonstrating the potential of neural networks trained on GPUs.

Geoffrey Hinton

A neural network is a network of thousands, millions, or even billions of neurons. Each neuron is a mathematical function that produces an output based on a weighted average of its inputs.

Suppose you want to create a network that can identify handwritten decimal digits, like a handwritten “2.” Such a network would take in an intensity value for each pixel in an image and output a probability distribution over the ten possible digits—0, 1, 2, and so forth.

To train such a network, you first initialize it with random weights. You then run it on a sequence of example images. For each image, you train the network by strengthening the connections that push the network toward the right answer (in this case, a high-probability value for the “2” output) and weakening connections that push toward a wrong answer (a low probability for “2” and high probabilities for other digits). If trained on enough example images, the model should start to predict a high probability for “2” when shown a two—and not otherwise.
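As a toy illustration of that description (not any particular historical network), here is a minimal NumPy sketch of the forward pass: each neuron computes a weighted sum of its inputs, and the final layer is squashed into a probability distribution over the ten digits. The weights are random, so this is an untrained network.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy 28x28 grayscale image flattened into 784 pixel intensities.
pixels = rng.random(784)

# Random (untrained) weights: 784 inputs -> 32 hidden neurons -> 10 digit outputs.
W1, b1 = rng.normal(0, 0.1, (784, 32)), np.zeros(32)
W2, b2 = rng.normal(0, 0.1, (32, 10)), np.zeros(10)

def forward(x):
    """One forward pass: weighted sums, a nonlinearity, then softmax over the ten digits."""
    hidden = np.maximum(0, x @ W1 + b1)   # each hidden neuron: a weighted sum of inputs, then ReLU
    logits = hidden @ W2 + b2             # ten raw scores, one per digit
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                # probability distribution over 0-9

probs = forward(pixels)
print(probs.round(3), "most likely digit:", probs.argmax())
```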

In the late 1950s, scientists started to experiment with basic networks that had a single layer of neurons. However, their initial enthusiasm cooled as they realized that such simple networks lacked the expressive power required for complex computations.

Deeper networks—those with multiple layers—had the potential to be more versatile. But in the 1960s, no one knew how to train them efficiently. This was because changing a parameter somewhere in the middle of a multi-layer network could have complex and unpredictable effects on the output.

So by the time Hinton began his career in the 1970s, neural networks had fallen out of favor. Hinton wanted to study them, but he struggled to find an academic home in which to do so. Between 1976 and 1986, Hinton spent time at four different research institutions: Sussex University, the University of California San Diego (UCSD), a branch of the UK Medical Research Council, and finally Carnegie Mellon, where he became a professor in 1982.

Geoffrey Hinton speaking in Toronto in June. Credit: Photo by Mert Alper Dervis/Anadolu via Getty Images

In a landmark 1986 paper, Hinton teamed up with two of his former colleagues at UCSD, David Rumelhart and Ronald Williams, to describe a technique called backpropagation for efficiently training deep neural networks.

Their idea was to start with the final layer of the network and work backward. For each connection in the final layer, the algorithm computes a gradient—a mathematical estimate of whether increasing the strength of that connection would push the network toward the right answer. Based on these gradients, the algorithm adjusts each parameter in the model’s final layer.

The algorithm then propagates these gradients backward to the second-to-last layer. A key innovation here is a formula—based on the chain rule from high school calculus—for computing the gradients in one layer based on gradients in the following layer. Using these new gradients, the algorithm updates each parameter in the second-to-last layer of the model. The gradients then get propagated backward to the third-to-last layer, and the whole process repeats once again.

The algorithm only makes small changes to the model in each round of training. But as the process is repeated over thousands, millions, billions, or even trillions of training examples, the model gradually becomes more accurate.
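A self-contained toy version of one such training round (a sketch of the general recipe, not Rumelhart, Hinton, and Williams’ original formulation) might look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(784)                 # one example image, flattened into pixel intensities
target = 2                          # the true digit for this example

# Random starting weights for a tiny 784 -> 32 -> 10 network.
W1, b1 = rng.normal(0, 0.1, (784, 32)), np.zeros(32)
W2, b2 = rng.normal(0, 0.1, (32, 10)), np.zeros(10)
lr = 0.1                            # learning rate: how big each adjustment is

# Forward pass, keeping intermediate values for the backward pass.
h = np.maximum(0, x @ W1 + b1)      # hidden layer (weighted sums + ReLU)
logits = h @ W2 + b2                # ten raw scores, one per digit
probs = np.exp(logits - logits.max())
probs /= probs.sum()                # softmax probabilities

# Backward pass: start at the final layer...
d_logits = probs.copy()
d_logits[target] -= 1.0             # gradient of the cross-entropy loss w.r.t. the logits
dW2 = np.outer(h, d_logits)         # gradients for the final layer's weights
db2 = d_logits

# ...then use the chain rule to push gradients back to the previous layer.
d_h = W2 @ d_logits
d_h[h <= 0] = 0.0                   # ReLU only passes gradient where it was active
dW1 = np.outer(x, d_h)
db1 = d_h

# Nudge every parameter slightly in the direction that reduces the loss.
W2 -= lr * dW2; b2 -= lr * db2
W1 -= lr * dW1; b1 -= lr * db1

print("probability assigned to the correct digit:", probs[target])
```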

Hinton and his colleagues weren’t the first to discover the basic idea of backpropagation. But their paper popularized the method. As people realized it was now possible to train deeper networks, it triggered a new wave of enthusiasm for neural networks.

Hinton moved to the University of Toronto in 1987 and began attracting young researchers who wanted to study neural networks. One of the first was the French computer scientist Yann LeCun, who did a year-long postdoc with Hinton before moving to Bell Labs in 1988.

Hinton’s backpropagation algorithm allowed LeCun to train models deep enough to perform well on real-world tasks like handwriting recognition. By the mid-1990s, LeCun’s technology was working so well that banks started to use it for processing checks.

“At one point, LeCun’s creation read more than 10 percent of all checks deposited in the United States,” wrote Cade Metz in his 2022 book Genius Makers.

But when LeCun and other researchers tried to apply neural networks to larger and more complex images, it didn’t go well. Neural networks once again fell out of fashion, and some researchers who had focused on neural networks moved on to other projects.

Hinton never stopped believing that neural networks could outperform other machine learning methods. But it would be many years before he’d have access to enough data and computing power to prove his case.

Jensen Huang

Jensen Huang speaking in Denmark in October. Credit: Photo by MADS CLAUS RASMUSSEN/Ritzau Scanpix/AFP via Getty Images

The brain of every personal computer is a central processing unit (CPU). These chips are designed to perform calculations in order, one step at a time. This works fine for conventional software like Windows and Office. But some video games require so many calculations that they strain the capabilities of CPUs. This is especially true of games like Quake, Call of Duty, and Grand Theft Auto, which render three-dimensional worlds many times per second.

So gamers rely on GPUs to accelerate performance. Inside a GPU are many execution units—essentially tiny CPUs—packaged together on a single chip. During gameplay, different execution units draw different areas of the screen. This parallelism enables better image quality and higher frame rates than would be possible with a CPU alone.

Nvidia invented the GPU in 1999 and has dominated the market ever since. By the mid-2000s, Nvidia CEO Jensen Huang suspected that the massive computing power inside a GPU would be useful for applications beyond gaming. He hoped scientists could use it for compute-intensive tasks like weather simulation or oil exploration.

So in 2006, Nvidia announced the CUDA platform. CUDA allows programmers to write “kernels,” short programs designed to run on a single execution unit. Kernels allow a big computing task to be split up into bite-sized chunks that can be processed in parallel. This allows certain kinds of calculations to be completed far faster than with a CPU alone.
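Production CUDA kernels are written in C or C++, but the concept can be sketched from Python using Numba’s CUDA bindings (an assumption of this example, along with having an Nvidia GPU available). Each thread below handles one element of a large array, a bite-sized chunk of the overall job.

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_kernel(a, b, out):
    # Each GPU thread handles one bite-sized chunk of the job: a single array element.
    i = cuda.grid(1)               # this thread's global index
    if i < out.size:
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.arange(n, dtype=np.float32)
b = 2 * np.arange(n, dtype=np.float32)
out = np.zeros(n, dtype=np.float32)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](a, b, out)   # launch many threads in parallel

print(out[:5])   # [ 0.  3.  6.  9. 12.]
```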

But there was little interest in CUDA when it was first introduced, wrote Steven Witt in The New Yorker last year:

When CUDA was released, in late 2006, Wall Street reacted with dismay. Huang was bringing supercomputing to the masses, but the masses had shown no indication that they wanted such a thing.

“They were spending a fortune on this new chip architecture,” Ben Gilbert, the co-host of “Acquired,” a popular Silicon Valley podcast, said. “They were spending many billions targeting an obscure corner of academic and scientific computing, which was not a large market at the time—certainly less than the billions they were pouring in.”

Huang argued that the simple existence of CUDA would enlarge the supercomputing sector. This view was not widely held, and by the end of 2008, Nvidia’s stock price had declined by seventy percent…

Downloads of CUDA hit a peak in 2009, then declined for three years. Board members worried that Nvidia’s depressed stock price would make it a target for corporate raiders.

Huang wasn’t specifically thinking about AI or neural networks when he created the CUDA platform. But it turned out that Hinton’s backpropagation algorithm could easily be split up into bite-sized chunks. So training neural networks turned out to be a killer app for CUDA.

According to Witt, Hinton was quick to recognize the potential of CUDA:

In 2009, Hinton’s research group used Nvidia’s CUDA platform to train a neural network to recognize human speech. He was surprised by the quality of the results, which he presented at a conference later that year. He then reached out to Nvidia. “I sent an e-mail saying, ‘Look, I just told a thousand machine-learning researchers they should go and buy Nvidia cards. Can you send me a free one?’ ” Hinton told me. “They said no.”

Despite the snub, Hinton and his graduate students, Alex Krizhevsky and Ilya Sutskever, obtained a pair of Nvidia GTX 580 GPUs for the AlexNet project. Each GPU had 512 execution units, allowing Krizhevsky and Sutskever to train a neural network hundreds of times faster than would be possible with a CPU. This speed allowed them to train a larger model—and to train it on many more training images. And they would need all that extra computing power to tackle the massive ImageNet dataset.

Fei-Fei Li

Fei-Fei Li at the SXSW conference in 2018. Credit: Photo by Hubert Vestil/Getty Images for SXSW

Fei-Fei Li wasn’t thinking about either neural networks or GPUs as she began a new job as a computer science professor at Princeton in January of 2007. While earning her PhD at Caltech, she had built a dataset called Caltech 101 that had 9,000 images across 101 categories.

That experience had taught her that computer vision algorithms tended to perform better with larger and more diverse training datasets. Not only had Li found her own algorithms performed better when trained on Caltech 101, but other researchers also started training their models using Li’s dataset and comparing their performance to one another. This turned Caltech 101 into a benchmark for the field of computer vision.

So when she got to Princeton, Li decided to go much bigger. She became obsessed with an estimate by vision scientist Irving Biederman that the average person recognizes roughly 30,000 different kinds of objects. Li started to wonder if it would be possible to build a truly comprehensive image dataset—one that included every kind of object people commonly encounter in the physical world.

A Princeton colleague told Li about WordNet, a massive database that attempted to catalog and organize 140,000 words. Li called her new dataset ImageNet, and she used WordNet as a starting point for choosing categories. She eliminated verbs and adjectives, as well as intangible nouns like “truth.” That left a list of 22,000 countable objects ranging from “ambulance” to “zucchini.”

She planned to take the same approach she’d taken with the Caltech 101 dataset: use Google’s image search to find candidate images, then have a human being verify them. For the Caltech 101 dataset, Li had done this herself over the course of a few months. This time she would need more help. She planned to hire dozens of Princeton undergraduates to help her choose and label images.

But even after heavily optimizing the labeling process—for example, pre-downloading candidate images so they’re instantly available for students to review—Li and her graduate student Jia Deng calculated that it would take more than 18 years to select and label millions of images.

The project was saved when Li learned about Amazon Mechanical Turk, a crowdsourcing platform Amazon had launched a couple of years earlier. Not only was AMT’s international workforce more affordable than Princeton undergraduates, but the platform was also far more flexible and scalable. Li’s team could hire as many people as they needed, on demand, and pay them only as long as they had work available.

AMT cut the time needed to complete ImageNet down from 18 to two years. Li writes that her lab spent two years “on the knife-edge of our finances” as the team struggled to complete the ImageNet project. But they had enough funds to pay three people to look at each of the 14 million images in the final data set.

ImageNet was ready for publication in 2009, and Li submitted it to the Conference on Computer Vision and Pattern Recognition, which was held in Miami that year. Their paper was accepted, but it didn’t get the kind of recognition Li hoped for.

“ImageNet was relegated to a poster session,” Li writes. “This meant that we wouldn’t be presenting our work in a lecture hall to an audience at a predetermined time but would instead be given space on the conference floor to prop up a large-format print summarizing the project in hopes that passersby might stop and ask questions… After so many years of effort, this just felt anticlimactic.”

To generate public interest, Li turned ImageNet into a competition. Realizing that the full dataset might be too unwieldy to distribute to dozens of contestants, she created a much smaller (but still massive) dataset with 1,000 categories and 1.4 million images.

The first year’s competition in 2010 generated a healthy amount of interest, with 11 teams participating. The winning entry was based on support vector machines. Unfortunately, Li writes, it was “only a slight improvement over cutting-edge work found elsewhere in our field.”

The second year of the ImageNet competition attracted fewer entries than the first. The winning entry in 2011 was another support vector machine, and it just barely improved on the performance of the 2010 winner. Li started to wonder if the critics had been right. Maybe “ImageNet was too much for most algorithms to handle.”

“For two years running, well-worn algorithms had exhibited only incremental gains in capabilities, while true progress seemed all but absent,” Li writes. “If ImageNet was a bet, it was time to start wondering if we’d lost.”

But when Li reluctantly staged the competition a third time in 2012, the results were totally different. Geoff Hinton’s team was the first to submit a model based on a deep neural network. And its top-5 accuracy was 85 percent—10 percentage points better than the 2011 winner.
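That 85 percent figure is top-5 accuracy, which counts a prediction as correct if the true label appears anywhere among the model’s five highest-scoring classes. A minimal sketch of the metric, with made-up scores:

```python
import numpy as np

def top5_accuracy(scores, labels):
    """Fraction of examples whose true label is among the five highest-scoring classes.

    scores: (n_examples, n_classes) model outputs; labels: (n_examples,) true class indices.
    """
    top5 = np.argsort(scores, axis=1)[:, -5:]         # indices of the 5 best scores per row
    hits = (top5 == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy check with random scores over 1,000 ImageNet-style classes.
rng = np.random.default_rng(0)
scores = rng.random((200, 1000))
labels = rng.integers(0, 1000, size=200)
print(top5_accuracy(scores, labels))   # roughly 5/1000 = 0.005 for random guessing
```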

Li’s initial reaction was incredulity: “Most of us saw the neural network as a dusty artifact encased in glass and protected by velvet ropes.”

“This is proof”

Yann LeCun testifies before the US Senate in September. Credit: Photo by Kevin Dietsch/Getty Images

The ImageNet winners were scheduled to be announced at the European Conference on Computer Vision in Florence, Italy. Li, who had a baby at home in California, was planning to skip the event. But when she saw how well AlexNet had done on her dataset, she realized this moment would be too important to miss: “I settled reluctantly on a twenty-hour slog of sleep deprivation and cramped elbow room.”

On an October day in Florence, Alex Krizhevsky presented his results to a standing-room-only crowd of computer vision researchers. Fei-Fei Li was in the audience. So was Yann LeCun.

Cade Metz reports that after the presentation, LeCun stood up and called AlexNet “an unequivocal turning point in the history of computer vision. This is proof.”

The success of AlexNet vindicated Hinton’s faith in neural networks, but it was arguably an even bigger vindication for LeCun.

AlexNet was a convolutional neural network, a type of neural network that LeCun had developed 20 years earlier to recognize handwritten digits on checks. (For more details on how CNNs work, see the in-depth explainer I wrote for Ars in 2018.) Indeed, there were few architectural differences between AlexNet and LeCun’s image recognition networks from the 1990s.

AlexNet was simply far larger. In a 1998 paper, LeCun described a document-recognition network with seven layers and 60,000 trainable parameters. AlexNet had eight layers, but these layers had 60 million trainable parameters.

LeCun could not have trained a model that large in the early 1990s because there were no computer chips with as much processing power as a 2012-era GPU. Even if LeCun had managed to build a big enough supercomputer, he would not have had enough images to train it properly. Collecting those images would have been hugely expensive in the years before Google and Amazon Mechanical Turk.

And this is why Fei-Fei Li’s work on ImageNet was so consequential. She didn’t invent convolutional networks or figure out how to make them run efficiently on GPUs. But she provided the training data that large neural networks needed to reach their full potential.

The technology world immediately recognized the importance of AlexNet. Hinton and his students formed a shell company with the goal of being “acquihired” by a big tech company. Within months, Google purchased the company for $44 million. Hinton worked at Google for the next decade while retaining his academic post in Toronto. Ilya Sutskever spent a few years at Google before becoming a cofounder of OpenAI.

AlexNet also made Nvidia GPUs the industry standard for training neural networks. In 2012, the market valued Nvidia at less than $10 billion. Today, Nvidia is one of the most valuable companies in the world, with a market capitalization north of $3 trillion. That high valuation is driven mainly by overwhelming demand for GPUs like the H100 that are optimized for training neural networks.

Sometimes the conventional wisdom is wrong

“That moment was pretty symbolic to the world of AI because three fundamental elements of modern AI converged for the first time,” Li said in a September interview at the Computer History Museum. “The first element was neural networks. The second element was big data, using ImageNet. And the third element was GPU computing.”

Today, leading AI labs believe the key to progress in AI is to train huge models on vast datasets. Big technology companies are in such a hurry to build the data centers required to train larger models that they’ve started to lease entire nuclear power plants to provide the necessary power.

You can view this as a straightforward application of the lessons of AlexNet. But I wonder if we ought to draw the opposite lesson from AlexNet: that it’s a mistake to become too wedded to conventional wisdom.

“Scaling laws” have had a remarkable run in the 12 years since AlexNet, and perhaps we’ll see another generation or two of impressive results as the leading labs scale up their foundation models even more.

But we should be careful not to let the lessons of AlexNet harden into dogma. I think there’s at least a chance that scaling laws will run out of steam in the next few years. And if that happens, we’ll need a new generation of stubborn nonconformists to notice that the old approach isn’t working and try something different.

Tim Lee was on staff at Ars from 2017 to 2021. Last year, he launched a newsletter, Understanding AI, that explores how AI works and how it’s changing our world. You can subscribe here.

Photo of Timothy B. Lee

Timothy is a senior reporter covering tech policy and the future of transportation. He lives in Washington DC.
