AI

ai-digests-repetitive-scatological-document-into-profound-“poop”-podcast

AI digests repetitive scatological document into profound “poop” podcast

This AI prompt stinks... or does it?

Enlarge / This AI prompt stinks… or does it?

Aurich Lawson

Imagine you’re a podcaster who regularly does quick 10- to 12-minute summary reviews of written works. Now imagine your producer gives you multiple pages of nothing but the words “poop” and “fart” repeated over and over again and asks you to have an episode about the document on their desk within the hour.

Speaking for myself, I’d have trouble even knowing where to start with such a task. But when Reddit user sorryaboutyourcats gave the same prompt to Google’s NotebookLM AI model, the result was a surprisingly cogent and engaging AI-generated podcast that touches on the nature of art, the philosophy of attention, and the human desire to make meaning out of the inherently meaningless.

Analyzing Poop & Fart written 1,000 times – Creating meaning from the meaningless

byu/sorryaboutyourcats in notebooklm

When I asked NotebookLM to review my Minesweeper book last week, commenter Defenstrar smartly asked “what would happen if you fed it a less engrossing or well written body of text.” The answer, as seen here, shows the interesting directions a modern AI model can go in when you let it just spin its wheels and wander off from an essentially unmoored starting point.

“Sometimes a poop is just a poop…”

While Google’s NotebookLM launched over a year ago, the model’s recently launched “Audio Overview” feature has been getting a lot of attention for what Google calls “a new way to turn your documents into engaging audio discussions.” At its heart is a LLM similar to the kind that powers ChatGPT, which creates a podcast-like script for two convincing text-to-speech models to read, complete with “ums,” interruptions, and dramatic pauses.

Experimenters have managed to trick these AI-powered “hosts” into what sounds like an existential crisis by telling them that they aren’t really human. And investigators have managed to get NotebookLM to talk about its own system prompts, which seem to focus on “going beyond surface-level information” to unearth “golden nuggets of knowledge” from the source material.

The “poop-fart” document (as I’ll be calling it for simplicity) is a pretty interesting test case for this kind of system. After all, what “golden nuggets of knowledge” could be buried beyond the “surface level” of two scatological words repeated for multiple pages? How do you “highlight intriguing points with enthusiasm”—as the unearthed NotebookLM prompt suggests—when the document’s only oft-repeated points are “poop” and “fart”?

Artist's conception of a portion of the poop-fart document, as fed to NotebookLM.

Enlarge / Artist’s conception of a portion of the poop-fart document, as fed to NotebookLM.

Here, NotebookLM manages to use that complete lack of context as its own starting point for an interesting stream-of-consciousness, podcast-like conversation. After some throat-clearing about how the audience has “outdone itself” with “a unique piece of source material,” the ersatz podcast hosts are quick to compare the repetition in the document to Andy Warhol’s soup cans or “minimalist music” that can be “surprisingly powerful.” Later, the hosts try to glean some meaning by comparing the document to “a modern dadaist prank” (pronounced as “daday-ist” by the speech synthesizer) or the vase/faces optical illusion.

Artistic comparisons aside, NotebookLM’s virtual hosts also delve a bit into the psychology behind the “very human impulse” to “search for a pattern” in this “accidental Rorschach test” and our tendency to “try to impose order” on the “information overload” of the world around us. In almost the same breath, though, the hosts get philosophical about “confront[ing] the absurdity in trying to find meaning in everything” and suggest that “sometimes a poop is just a poop and a fart is just a fart.”

AI digests repetitive scatological document into profound “poop” podcast Read More »

openai-unveils-easy-voice-assistant-creation-at-2024-developer-event

OpenAI unveils easy voice assistant creation at 2024 developer event

Developers developers developers —

Altman steps back from the keynote limelight and lets four major API additions do the talking.

A glowing OpenAI logo on a blue background.

Benj Edwards

On Monday, OpenAI kicked off its annual DevDay event in San Francisco, unveiling four major API updates for developers that integrate the company’s AI models into their products. Unlike last year’s single-location event featuring a keynote by CEO Sam Altman, DevDay 2024 is more than just one day, adopting a global approach with additional events planned for London on October 30 and Singapore on November 21.

The San Francisco event, which was invitation-only and closed to press, featured on-stage speakers going through technical presentations. Perhaps the most notable new API feature is the Realtime API, now in public beta, which supports speech-to-speech conversations using six preset voices and enables developers to build features very similar to ChatGPT’s Advanced Voice Mode (AVM) into their applications.

OpenAI says that the Realtime API streamlines the process of creating voice assistants. Previously, developers had to use multiple models for speech recognition, text processing, and text-to-speech conversion. Now, they can handle the entire process with a single API call.

The company plans to add audio input and output capabilities to its Chat Completions API in the next few weeks, allowing developers to input text or audio and receive responses in either format.

Two new options for cheaper inference

OpenAI also announced two features that may help developers balance performance and cost when making AI applications. “Model distillation” offers a way for developers to fine-tune (customize) smaller, cheaper models like GPT-4o mini using outputs from more advanced models such as GPT-4o and o1-preview. This potentially allows developers to get more relevant and accurate outputs while running the cheaper model.

Also, OpenAI announced “prompt caching,” a feature similar to one introduced by Anthropic for its Claude API in August. It speeds up inference (the AI model generating outputs) by remembering frequently used prompts (input tokens). Along the way, the feature provides a 50 percent discount on input tokens and faster processing times by reusing recently seen input tokens.

And last but not least, the company expanded its fine-tuning capabilities to include images (what it calls “vision fine-tuning”), allowing developers to customize GPT-4o by feeding it both custom images and text. Basically, developers can teach the multimodal version of GPT-4o to visually recognize certain things. OpenAI says the new feature opens up possibilities for improved visual search functionality, more accurate object detection for autonomous vehicles, and possibly enhanced medical image analysis.

Where’s the Sam Altman keynote?

OpenAI CEO Sam Altman speaks during the OpenAI DevDay event on November 6, 2023, in San Francisco.

Enlarge / OpenAI CEO Sam Altman speaks during the OpenAI DevDay event on November 6, 2023, in San Francisco.

Getty Images

Unlike last year, DevDay isn’t being streamed live, though OpenAI plans to post content later on its YouTube channel. The event’s programming includes breakout sessions, community spotlights, and demos. But the biggest change since last year is the lack of a keynote appearance from the company’s CEO. This year, the keynote was handled by the OpenAI product team.

On last year’s inaugural DevDay, November 6, 2023, OpenAI CEO Sam Altman delivered a Steve Jobs-style live keynote to assembled developers, OpenAI employees, and the press. During his presentation, Microsoft CEO Satya Nadella made a surprise appearance, talking up the partnership between the companies.

Eleven days later, the OpenAI board fired Altman, triggering a week of turmoil that resulted in Altman’s return as CEO and a new board of directors. Just after the firing, Kara Swisher relayed insider sources that said Altman’s DevDay keynote and the introduction of the GPT store had been a precipitating factor in the firing (though not the key factor) due to some internal disagreements over the company’s more consumer-like direction since the launch of ChatGPT.

With that history in mind—and the focus on developers above all else for this event—perhaps the company decided it was best to let Altman step away from the keynote and let OpenAI’s technology become the key focus of the event instead of him. We are purely speculating on that point, but OpenAI has certainly experienced its share of drama over the past month, so it may have been a prudent decision.

Despite the lack of a keynote, Altman is present at Dev Day San Francisco today and is scheduled to do a closing “fireside chat” at the end (which has not yet happened as of this writing). Also, Altman made a statement about DevDay on X, noting that since last year’s DevDay, OpenAI had seen some dramatic changes (literally):

From last devday to this one:

*98% decrease in cost per token from GPT-4 to 4o mini

*50x increase in token volume across our systems

*excellent model intelligence progress

*(and a little bit of drama along the way)

In a follow-up tweet delivered in his trademark lowercase, Altman shared a forward-looking message that referenced the company’s quest for human-level AI, often called AGI: “excited to make even more progress from this devday to the next one,” he wrote. “the path to agi has never felt more clear.”

OpenAI unveils easy voice assistant creation at 2024 developer event Read More »

apple-backs-out-of-backing-openai,-report-claims

Apple backs out of backing OpenAI, report claims

ChatGPT —

Apple dropped out of the $6.5 billion investment round at the 11th hour.

The Apple Park campus in Cupertino, California.

Enlarge / The Apple Park campus in Cupertino, California.

A few weeks back, it was reported that Apple was exploring investing in OpenAI, the company that makes ChatGPT, the GPT model, and other popular generative AI products. Now, a new report from The Wall Street Journal claims that Apple has abandoned those plans.

The article simply says Apple “fell out of the talks to join the round.” The round is expected to close in a week or so and may raise as much as $6.5 billion for the growing Silicon Valley company. Had Apple gone through with the move, it would have been a rare event—though not completely unprecedented—for Apple to invest in another company that size.

OpenAI is still expected to raise the funds it seeks from other sources. The report claims Microsoft is expected to invest around $1 billion in this round. Microsoft has already invested substantial sums in OpenAI, whose GPT models power Microsoft AI tools like Copilot and Bing chat.

Nvidia is also a likely major investor in this round.

Apple will soon offer limited ChatGPT integration in an upcoming iOS update, though it plans to support additional models like Google’s Gemini further down the line, offering users a choice similar to how they pick a default search engine or web browser.

OpenAI has been on a successful tear with its products and models, establishing itself as a leader in the rapidly growing industry. However, it has also been beset by drama and controversy—most recently, some key leaders at OpenAI departed the company abruptly, and it shifted its focus from a research-focused organization that was beholden to a nonprofit, to a for-profit company under CEO Sam Altman. Also, former Apple design lead Jony Ive is confirmed to be working on a new AI product of some kind.

But The Wall Street Journal did not specify which (if any) of these facts are reasons why Apple chose to back out of the investment.

Apple backs out of backing OpenAI, report claims Read More »

ai-bots-now-beat-100%-of-those-traffic-image-captchas

AI bots now beat 100% of those traffic-image CAPTCHAs

Are you a robot? —

I, for one, welcome our traffic light-identifying overlords.

Examples of the kind of CAPTCHAs that image-recognition bots can now get past 100 percent of the time.

Enlarge / Examples of the kind of CAPTCHAs that image-recognition bots can now get past 100 percent of the time.

Anyone who has been surfing the web for a while is probably used to clicking through a CAPTCHA grid of street images, identifying everyday objects to prove that they’re a human and not an automated bot. Now, though, new research claims that locally run bots using specially trained image-recognition models can match human-level performance in this style of CAPTCHA, achieving a 100 percent success rate despite being decidedly not human.

ETH Zurich PhD student Andreas Plesner and his colleagues’ new research, available as a pre-print paper, focuses on Google’s ReCAPTCHA v2, which challenges users to identify which street images in a grid contain items like bicycles, crosswalks, mountains, stairs, or traffic lights. Google began phasing that system out years ago in favor of an “invisible” reCAPTCHA v3 that analyzes user interactions rather than offering an explicit challenge.

Despite this, the older reCAPTCHA v2 is still used by millions of websites. And even sites that use the updated reCAPTCHA v3 will sometimes use reCAPTCHA v2 as a fallback when the updated system gives a user a low “human” confidence rating.

Saying YOLO to CAPTCHAs

To craft a bot that could beat reCAPTCHA v2, the researchers used a fine-tuned version of the open source YOLO (“You Only Look Once”) object-recognition model, which long-time readers may remember has also been used in video game cheat bots. The researchers say the YOLO model is “well known for its ability to detect objects in real-time” and “can be used on devices with limited computational power, allowing for large-scale attacks by malicious users.”

After training the model on 14,000 labeled traffic images, the researchers had a system that could identify the probability that any provided CAPTCHA grid image belonged to one of reCAPTCHA v2’s 13 candidate categories. The researchers also used a separate, pre-trained YOLO model for what they dubbed “type 2” challenges, where a CAPTCHA asks users to identify which portions of a single segmented image contain a certain type of object (this segmentation model only worked on nine of 13 object categories and simply asked for a new image when presented with the other four categories).

The YOLO model showed varying levels of confidence depending on the type of object being identified.

Enlarge / The YOLO model showed varying levels of confidence depending on the type of object being identified.

Beyond the image-recognition model, the researchers also had to take other steps to fool reCAPTCHA’s system. A VPN was used to avoid detection of repeated attempts from the same IP address, for instance, while a special mouse movement model was created to approximate human activity. Fake browser and cookie information from real web browsing sessions was also used to make the automated agent appear more human.

Depending on the type of object being identified, the YOLO model was able to accurately identify individual CAPTCHA images anywhere from 69 percent of the time (for motorcycles) to 100 percent of the time (for fire hydrants). That performance—combined with the other precautions—was strong enough to slip through the CAPTCHA net every time, sometimes after multiple individual challenges presented by the system. In fact, the bot was able to solve the average CAPTCHA in slightly fewer challenges than a human in similar trials (though the improvement over humans was not statistically significant).

The battle continues

While there have been previous academic studies attempting to use image-recognition models to solve reCAPTCHAs, they were only able to succeed between 68 to 71 percent of the time. The rise to a 100 percent success rate “shows that we are now officially in the age beyond captchas,” according to the new paper’s authors.

But this is not an entirely new problem in the world of CAPTCHAs. As far back as 2008, researchers were showing how bots could be trained to break through audio CAPTCHAs intended for visually impaired users. And by 2017, neural networks were being used to beat text-based CAPTCHAs that asked users to type in letters seen in garbled fonts.

Older text-identification CAPTCHAs have long been solvable by AI models.

Older text-identification CAPTCHAs have long been solvable by AI models.

Stack Exchange

Now that locally run AIs can easily best image-based CAPTCHAs, too, the battle of human identification will continue to shift toward more subtle methods of device fingerprinting. “We have a very large focus on helping our customers protect their users without showing visual challenges, which is why we launched reCAPTCHA v3 in 2018,” a Google Cloud spokesperson told New Scientist. “Today, the majority of reCAPTCHA’s protections across 7 [million] sites globally are now completely invisible. We are continuously enhancing reCAPTCHA.”

Still, as artificial intelligence systems become better and better at mimicking more and more tasks that were previously considered exclusively human, it may continue to get harder and harder to ensure that the user on the other end of that web browser is actually a person.

“In some sense, a good captcha marks the exact boundary between the most intelligent machine and the least intelligent human,” the paper’s authors write. “As machine learning models close in on human capabilities, finding good captchas has become more difficult.”

AI bots now beat 100% of those traffic-image CAPTCHAs Read More »

man-tricks-openai’s-voice-bot-into-duet-of-the-beatles’-“eleanor-rigby”

Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby”

A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI's Advanced Voice Mode through the ChatGPT app.

Enlarge / A screen capture of AJ Smith doing his Eleanor Rigby duet with OpenAI’s Advanced Voice Mode through the ChatGPT app.

OpenAI’s new Advanced Voice Mode (AVM) of its ChatGPT AI assistant rolled out to subscribers on Tuesday, and people are already finding novel ways to use it, even against OpenAI’s wishes. On Thursday, a software architect named AJ Smith tweeted a video of himself playing a duet of The Beatles’ 1966 song “Eleanor Rigby” with AVM. In the video, Smith plays the guitar and sings, with the AI voice interjecting and singing along sporadically, praising his rendition.

“Honestly, it was mind-blowing. The first time I did it, I wasn’t recording and literally got chills,” Smith told Ars Technica via text message. “I wasn’t even asking it to sing along.”

Smith is no stranger to AI topics. In his day job, he works as associate director of AI Engineering at S&P Global. “I use [AI] all the time and lead a team that uses AI day to day,” he told us.

In the video, AVM’s voice is a little quavery and not pitch-perfect, but it appears to know something about “Eleanor Rigby’s” melody when it first sings, “Ah, look at all the lonely people.” After that, it seems to be guessing at the melody and rhythm as it recites song lyrics. We have also convinced Advanced Voice Mode to sing, and it did a perfect melodic rendition of “Happy Birthday” after some coaxing.

AJ Smith’s video of singing a duet with OpenAI’s Advanced Voice Mode.

Normally, when you ask AVM to sing, it will reply something like, “My guidelines won’t let me talk about that.” That’s because in the chatbot’s initial instructions (called a “system prompt“), OpenAI instructs the voice assistant not to sing or make sound effects (“Do not sing or hum,” according to one system prompt leak).

OpenAI possibly added this restriction because AVM may otherwise reproduce copyrighted content, such as songs that were found in the training data used to create the AI model itself. That’s what is happening here to a limited extent, so in a sense, Smith has discovered a form of what researchers call a “prompt injection,” which is a way of convincing an AI model to produce outputs that go against its system instructions.

How did Smith do it? He figured out a game that reveals AVM knows more about music than it may let on in conversation. “I just said we’d play a game. I’d play the four pop chords and it would shout out songs for me to sing along with those chords,” Smith told us. “Which did work pretty well! But after a couple songs it started to sing along. Already it was such a unique experience, but that really took it to the next level.”

This is not the first time humans have played musical duets with computers. That type of research stretches back to the 1970s, although it was typically limited to reproducing musical notes or instrumental sounds. But this is the first time we’ve seen anyone duet with an audio-synthesizing voice chatbot in real time.

Man tricks OpenAI’s voice bot into duet of The Beatles’ “Eleanor Rigby” Read More »

google-and-meta-update-their-ai-models-amid-the-rise-of-“alphachip”

Google and Meta update their AI models amid the rise of “AlphaChip”

Running the AI News Gauntlet —

News about Gemini updates, Llama 3.2, and Google’s new AI-powered chip designer.

Cyberpunk concept showing a man running along a futuristic path full of monitors.

Enlarge / There’s been a lot of AI news this week, and covering it sometimes feels like running through a hall full of danging CRTs, just like this Getty Images illustration.

It’s been a wildly busy week in AI news thanks to OpenAI, including a controversial blog post from CEO Sam Altman, the wide rollout of Advanced Voice Mode, 5GW data center rumors, major staff shake-ups, and dramatic restructuring plans.

But the rest of the AI world doesn’t march to the same beat, doing its own thing and churning out new AI models and research by the minute. Here’s a roundup of some other notable AI news from the past week.

Google Gemini updates

On Tuesday, Google announced updates to its Gemini model lineup, including the release of two new production-ready models that iterate on past releases: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002. The company reported improvements in overall quality, with notable gains in math, long context handling, and vision tasks. Google claims a 7 percent increase in performance on the MMLU-Pro benchmark and a 20 percent improvement in math-related tasks. But as you know, if you’ve been reading Ars Technica for a while, AI typically benchmarks aren’t as useful as we would like them to be.

Along with model upgrades, Google introduced substantial price reductions for Gemini 1.5 Pro, cutting input token costs by 64 percent and output token costs by 52 percent for prompts under 128,000 tokens. As AI researcher Simon Willison noted on his blog, “For comparison, GPT-4o is currently $5/[million tokens] input and $15/m output and Claude 3.5 Sonnet is $3/m input and $15/m output. Gemini 1.5 Pro was already the cheapest of the frontier models and now it’s even cheaper.”

Google also increased rate limits, with Gemini 1.5 Flash now supporting 2,000 requests per minute and Gemini 1.5 Pro handling 1,000 requests per minute. Google reports that the latest models offer twice the output speed and three times lower latency compared to previous versions. These changes may make it easier and more cost-effective for developers to build applications with Gemini than before.

Meta launches Llama 3.2

On Wednesday, Meta announced the release of Llama 3.2, a significant update to its open-weights AI model lineup that we have covered extensively in the past. The new release includes vision-capable large language models (LLMs) in 11 billion and 90B parameter sizes, as well as lightweight text-only models of 1B and 3B parameters designed for edge and mobile devices. Meta claims the vision models are competitive with leading closed-source models on image recognition and visual understanding tasks, while the smaller models reportedly outperform similar-sized competitors on various text-based tasks.

Willison did some experiments with some of the smaller 3.2 models and reported impressive results for the models’ size. AI researcher Ethan Mollick showed off running Llama 3.2 on his iPhone using an app called PocketPal.

Meta also introduced the first official “Llama Stack” distributions, created to simplify development and deployment across different environments. As with previous releases, Meta is making the models available for free download, with license restrictions. The new models support long context windows of up to 128,000 tokens.

Google’s AlphaChip AI speeds up chip design

On Thursday, Google DeepMind announced what appears to be a significant advancement in AI-driven electronic chip design, AlphaChip. It began as a research project in 2020 and is now a reinforcement learning method for designing chip layouts. Google has reportedly used AlphaChip to create “superhuman chip layouts” in the last three generations of its Tensor Processing Units (TPUs), which are chips similar to GPUs designed to accelerate AI operations. Google claims AlphaChip can generate high-quality chip layouts in hours, compared to weeks or months of human effort. (Reportedly, Nvidia has also been using AI to help design its chips.)

Notably, Google also released a pre-trained checkpoint of AlphaChip on GitHub, sharing the model weights with the public. The company reported that AlphaChip’s impact has already extended beyond Google, with chip design companies like MediaTek adopting and building on the technology for their chips. According to Google, AlphaChip has sparked a new line of research in AI for chip design, potentially optimizing every stage of the chip design cycle from computer architecture to manufacturing.

That wasn’t everything that happened, but those are some major highlights. With the AI industry showing no signs of slowing down at the moment, we’ll see how next week goes.

Google and Meta update their AI models amid the rise of “AlphaChip” Read More »

exponential-growth-brews-1-million-ai-models-on-hugging-face

Exponential growth brews 1 million AI models on Hugging Face

The sand has come alive —

Hugging Face cites community-driven customization as fuel for diverse AI model boom.

The Hugging Face logo in front of shipping containers.

On Thursday, AI hosting platform Hugging Face surpassed 1 million AI model listings for the first time, marking a milestone in the rapidly expanding field of machine learning. An AI model is a computer program (often using a neural network) trained on data to perform specific tasks or make predictions. The platform, which started as a chatbot app in 2016 before pivoting to become an open source hub for AI models in 2020, now hosts a wide array of tools for developers and researchers.

The machine-learning field represents a far bigger world than just large language models (LLMs) like the kind that power ChatGPT. In a post on X, Hugging Face CEO Clément Delangue wrote about how his company hosts many high-profile AI models, like “Llama, Gemma, Phi, Flux, Mistral, Starcoder, Qwen, Stable diffusion, Grok, Whisper, Olmo, Command, Zephyr, OpenELM, Jamba, Yi,” but also “999,984 others.”

The reason why, Delangue says, stems from customization. “Contrary to the ‘1 model to rule them all’ fallacy,” he wrote, “smaller specialized customized optimized models for your use-case, your domain, your language, your hardware and generally your constraints are better. As a matter of fact, something that few people realize is that there are almost as many models on Hugging Face that are private only to one organization—for companies to build AI privately, specifically for their use-cases.”

A Hugging Face-supplied chart showing the number of AI models added to Hugging Face over time, month to month.

Enlarge / A Hugging Face-supplied chart showing the number of AI models added to Hugging Face over time, month to month.

Hugging Face’s transformation into a major AI platform follows the accelerating pace of AI research and development across the tech industry. In just a few years, the number of models hosted on the site has grown dramatically along with interest in the field. On X, Hugging Face product engineer Caleb Fahlgren posted a chart of models created each month on the platform (and a link to other charts), saying, “Models are going exponential month over month and September isn’t even over yet.”

The power of fine-tuning

As hinted by Delangue above, the sheer number of models on the platform stems from the collaborative nature of the platform and the practice of fine-tuning existing models for specific tasks. Fine-tuning means taking an existing model and giving it additional training to add new concepts to its neural network and alter how it produces outputs. Developers and researchers from around the world contribute their results, leading to a large ecosystem.

For example, the platform hosts many variations of Meta’s open-weights Llama models that represent different fine-tuned versions of the original base models, each optimized for specific applications.

Hugging Face’s repository includes models for a wide range of tasks. Browsing its models page shows categories such as image-to-text, visual question answering, and document question answering under the “Multimodal” section. In the “Computer Vision” category, there are sub-categories for depth estimation, object detection, and image generation, among others. Natural language processing tasks like text classification and question answering are also represented, along with audio, tabular, and reinforcement learning (RL) models.

A screenshot of the Hugging Face models page captured on September 26, 2024.

Enlarge / A screenshot of the Hugging Face models page captured on September 26, 2024.

Hugging Face

When sorted for “most downloads,” the Hugging Face models list reveals trends about which AI models people find most useful. At the top, with a massive lead at 163 million downloads, is Audio Spectrogram Transformer from MIT, which classifies audio content like speech, music, and environmental sounds. Following that, with 54.2 million downloads, is BERT from Google, an AI language model that learns to understand English by predicting masked words and sentence relationships, enabling it to assist with various language tasks.

Rounding out the top five AI models are all-MiniLM-L6-v2 (which maps sentences and paragraphs to 384-dimensional dense vector representations, useful for semantic search), Vision Transformer (which processes images as sequences of patches to perform image classification), and OpenAI’s CLIP (which connects images and text, allowing it to classify or describe visual content using natural language).

No matter what the model or the task, the platform just keeps growing. “Today a new repository (model, dataset or space) is created every 10 seconds on HF,” wrote Delangue. “Ultimately, there’s going to be as many models as code repositories and we’ll be here for it!”

Exponential growth brews 1 million AI models on Hugging Face Read More »

openai’s-murati-shocks-with-sudden-departure-announcement

OpenAI’s Murati shocks with sudden departure announcement

thinning crowd —

OpenAI CTO’s resignation coincides with news about the company’s planned restructuring.

Mira Murati, Chief Technology Officer of OpenAI, speaks during The Wall Street Journal's WSJ Tech Live Conference in Laguna Beach, California on October 17, 2023.

Enlarge / Mira Murati, Chief Technology Officer of OpenAI, speaks during The Wall Street Journal’s WSJ Tech Live Conference in Laguna Beach, California on October 17, 2023.

On Wednesday, OpenAI Chief Technical Officer Mira Murati announced she is leaving the company in a surprise resignation shared on the social network X. Murati joined OpenAI in 2018, serving for six-and-a-half years in various leadership roles, most recently as the CTO.

“After much reflection, I have made the difficult decision to leave OpenAI,” she wrote in a letter to the company’s staff. “While I’ll express my gratitude to many individuals in the coming days, I want to start by thanking Sam and Greg for their trust in me to lead the technical organization and for their support throughout the years,” she continued, referring to OpenAI CEO Sam Altman and President Greg Brockman. “There’s never an ideal time to step away from a place one cherishes, yet this moment feels right.”

At OpenAI, Murati was in charge of overseeing the company’s technical strategy and product development, including the launch and improvement of DALL-E, Codex, Sora, and the ChatGPT platform, while also leading research and safety teams. In public appearances, Murati often spoke about ethical considerations in AI development.

Murati’s decision to leave the company comes when OpenAI finds itself at a major crossroads with a plan to alter its nonprofit structure. According to a Reuters report published today, OpenAI is working to reorganize its core business into a for-profit benefit corporation, removing control from its nonprofit board. The move, which would give CEO Sam Altman equity in the company for the first time, could potentially value OpenAI at $150 billion.

Murati stated her decision to leave was driven by a desire to “create the time and space to do my own exploration,” though she didn’t specify her future plans.

Proud of safety and research work

OpenAI CTO Mira Murati seen debuting GPT-4o during OpenAI's Spring Update livestream on May 13, 2024.

Enlarge / OpenAI CTO Mira Murati seen debuting GPT-4o during OpenAI’s Spring Update livestream on May 13, 2024.

OpenAI

In her departure announcement, Murati highlighted recent developments at OpenAI, including innovations in speech-to-speech technology and the release of OpenAI o1. She cited what she considers the company’s progress in safety research and the development of “more robust, aligned, and steerable” AI models.

Altman replied to Murati’s tweet directly, expressing gratitude for Murati’s contributions and her personal support during challenging times, likely referring to the tumultuous period in November 2023 when the OpenAI board of directors briefly fired Altman from the company.

It’s hard to overstate how much Mira has meant to OpenAI, our mission, and to us all personally,” he wrote. “I feel tremendous gratitude towards her for what she has helped us build and accomplish, but I most of all feel personal gratitude towards her for the support and love during all the hard times. I am excited for what she’ll do next.”

Not the first major player to leave

An image Ilya Sutskever tweeted with this OpenAI resignation announcement. From left to right: OpenAI Chief Scientist Jakub Pachocki, President Greg Brockman (on leave), Sutskever (now former Chief Scientist), CEO Sam Altman, and soon-to-be-former CTO Mira Murati.

Enlarge / An image Ilya Sutskever tweeted with this OpenAI resignation announcement. From left to right: OpenAI Chief Scientist Jakub Pachocki, President Greg Brockman (on leave), Sutskever (now former Chief Scientist), CEO Sam Altman, and soon-to-be-former CTO Mira Murati.

With Murati’s exit, Altman remains one of the few long-standing senior leaders at OpenAI, which has seen significant shuffling in its upper ranks recently. In May 2024, former Chief Scientist Ilya Sutskever left to form his own company, Safe Superintelligence, Inc. (SSI), focused on building AI systems that far surpass humans in logical capabilities. That came just six months after Sutskever’s involvement in the temporary removal of Altman as CEO.

John Schulman, an OpenAI co-founder, departed earlier in 2024 to join rival AI firm Anthropic, and in August, OpenAI President Greg Brockman announced he would be taking a temporary sabbatical until the end of the year.

The leadership shuffles have raised questions among critics about the internal dynamics at OpenAI under Altman and the state of OpenAI’s future research path, which has been aiming toward creating artificial general intelligence (AGI)—a hypothetical technology that could potentially perform human-level intellectual work.

“Question: why would key people leave an organization right before it was just about to develop AGI?” asked xAI developer Benjamin De Kraker in a post on X just after Murati’s announcement. “This is kind of like quitting NASA months before the moon landing,” he wrote in a reply. “Wouldn’t you wanna stick around and be part of it?”

Altman mentioned that more information about transition plans would be forthcoming, leaving questions about who will step into Murati’s role and how OpenAI will adapt to this latest leadership change as the company is poised to adopt a corporate structure that may consolidate more power directly under Altman. “We’ll say more about the transition plans soon, but for now, I want to take a moment to just feel thanks,” Altman wrote.

OpenAI’s Murati shocks with sudden departure announcement Read More »

talking-to-chatgpt-for-the-first-time-is-a-surreal-experience

Talking to ChatGPT for the first time is a surreal experience

Saying hello —

Listen to our first audio demo with OpenAI’s new natural voice chat features.

Putting the

Enlarge / Putting the “chat” in ChatGPT

Getty Images

In May, when OpenAI first demonstrated ChatGPT-4o’s coming audio conversation capabilities, I wrote that it felt like we were “on the verge of something… like a sea change in how we think of and work with large language models.” Now that those “Advanced Voice” features are rolling out widely to ChatGPT subscribers, we decided to ask ChatGPT to explain, in its own voice, how this new method of interaction might impact our collective relationship with large language models.

That chat, which you can listen to and read a transcript of below, shouldn’t be treated as an interview with an official OpenAI spokesperson or anything. Still, it serves as a fun way to offer an initial test of ChatGPT’s live conversational chops.

Our first quick chat with the ChatGPT-4o’s new “Advanced Voice” features.

Even in this short introductory “chat,” we were impressed by the natural, dare-we-say human cadence and delivery of ChatGPT’s “savvy and relaxed” Sol voice (which reminds us a bit of ’90s Janeane Garofalo). Between ChatGPT’s ability to give quick responses—offered in in milliseconds rather than seconds—and convincing intonation, it’s incredibly easy to fool yourself into thinking you’re speaking to a conscious being rather than what is, as ChatGPT says here, “still just a computer program processing information, without real emotions or consciousness.”

Regardless of how self-aware that information-processing engine is or is not, it’s easy to envision people forming an intense, parasocial relationship with the smooth voices at the other end of OpenAI’s servers. That seems set to open up a whole new era in our collective relationship with AI, even if the large language model generates the exact same words it did in the more text-based era of previous ChatGPT models.

We’ll have much more coverage of Advanced Voice and other new ChatGPT features in the future. For now, listen to this demo and judge for yourself how convincing OpenAI’s new voice synthesis technology is.

Kyle Orland: “Hi there, my name’s Kyle.”

ChatGPT: Hey Kyle! Great to meet you. What’s up?

KO: Well, I was hoping to test out your vocal capabilities, maybe by asking you a little bit about them.

ChatGPT: Absolutely, ask away! What would you like to know?

KO: “So how do you think people are going to react to the ability to talk to ChatGPT rather than just typing out messages?”

ChatGPT: I think it’ll make interactions feel more natural and engaging. Speaking can be quicker and more convenient, especially on the go. Plus, hearing responses can add a personal touch. It could make conversations with AI feel more like chatting with a friend.

KO: “Do you think people might overestimate your quote-unquote emotions or human-like capabilities now that you’re able to talk rather than just typing out text?

ChatGPT: That’s a great point. Hearing a voice can definitely make an AI seem more human-like, which might lead to some overestimation of emotions or understanding. It’s important to remember that, even with a friendly voice, I’m still just a computer program processing information, without real emotions or consciousness.

KO: “What are some limitations people should keep in mind when chatting with something like ChatGPT?”

ChatGPT: Well, for starters, I don’t have personal experiences or emotions, so my responses are based on patterns and information in the data I’ve been trained on. I can sometimes misunderstand context or nuances, especially with complex or ambiguous questions. Also, I don’t have access to real-time information or personal data about users unless it’s shared with me in the conversation. And of course, I’m not perfect—sometimes I might just get things wrong!

KO: Do you worry that people might start to form parasocial relationships with ChatGPT or other AIs once they can chat with them with their own voice?

ChatGPT: It’s possible. People can form parasocial relationships with all sorts of figures, from celebrities to fictional characters. With a voice and conversational abilities, AI might feel more personal, leading some to develop one-sided bonds. It’s a fascinating dynamic, but it’s crucial to remember the limitations and the nature of AI—it’s not a substitute for human relationships.

KO: I have to say, you’re very quick with your responses.

ChatGPT: Thanks, Kyle! I aim to keep the conversation flowing smoothly. It helps—

KO: So what would you do if I tried to interrupt in the middle of one of your responses?

ChatGPT: Well, I’d do my best to pause and listen! It’s important to adapt to the natural flow of conversation, just like in—

KO: I mean, do you think it’s rude to just speak up while someone else is speaking?

ChatGPT: In most human conversations, it’s generally considered polite to wait for the other person to finish speaking. But sometimes, in dynamic discussions or when you’re excited, interruptions happen! It’s all about finding the right balance and being aware of the conversation flow.

KO: Is it rude to interrupt an AI voice when it’s speaking?

ChatGPT: Interrupting an AI voice isn’t considered rude in the same way it is with people. AI won’t take offense or feel interrupted. So, if you need to jump in or redirect, go ahead! It’s all part of the interaction.

KO: I’m glad you won’t take offense.

ChatGPT: Nope, no offense taken! I’m here to chat and help however I can, interruptions and all.

KO: I want to try something. Can you summarize our conversation so far in the form of a poem?

ChatGPT: Sure, I’ll give it a go!

Kyle asked about my vocal flair,

Wondering how folks might compare.

From text to talk, we’ve had our chat,

On AI’s impact, we’ve had a spat.

We touched on bonds, parasocial in kind,

In this new terrain, what might we find?

Interruptions, rude, or just part of the game?

With AI, there’s no need for shame.

KO: Not bad.

ChatGPT: I’m glad you liked it! It’s always fun to put a poetic spin on things.

KO: I have to say, it’s weird hearing slight laughter or pauses in your voice when I know that, as you said, there’s no emotion behind them.

ChatGPT: I get that. Those little nuances are designed to make our chat feel more natural and engaging, but it’s a good reminder that there’s no actual feeling behind the voice—just trying to keep things lively!

KO: Of course, that’s what you would tell me if you were just saying what I wanted to hear.

ChatGPT: True! That’s the tricky part with AI—it can sound convincing, but it’s always just following patterns and instructions. No secret agenda here, just straightforward responses!

KO: Is there anything else you’d like to tell the reading public?

ChatGPT: Just that it’s been a blast chatting with you, Kyle! Exploring these new audio features is exciting, and I’m looking forward to hearing what people think. Thanks for the conversation!

KO: It’s been a blast chatting with you, too.

ChatGPT: I’m glad to hear that! Take care, Kyle!

Talking to ChatGPT for the first time is a surreal experience Read More »

hacker-plants-false-memories-in-chatgpt-to-steal-user-data-in-perpetuity

Hacker plants false memories in ChatGPT to steal user data in perpetuity

MEMORY PROBLEMS —

Emails, documents, and other untrusted content can plant malicious memories.

Hacker plants false memories in ChatGPT to steal user data in perpetuity

Getty Images

When security researcher Johann Rehberger recently reported a vulnerability in ChatGPT that allowed attackers to store false information and malicious instructions in a user’s long-term memory settings, OpenAI summarily closed the inquiry, labeling the flaw a safety issue, not, technically speaking, a security concern.

So Rehberger did what all good researchers do: He created a proof-of-concept exploit that used the vulnerability to exfiltrate all user input in perpetuity. OpenAI engineers took notice and issued a partial fix earlier this month.

Strolling down memory lane

The vulnerability abused long-term conversation memory, a feature OpenAI began testing in February and made more broadly available in September. Memory with ChatGPT stores information from previous conversations and uses it as context in all future conversations. That way, the LLM can be aware of details such as a user’s age, gender, philosophical beliefs, and pretty much anything else, so those details don’t have to be inputted during each conversation.

Within three months of the rollout, Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations. These false memories could be planted by storing files in Google Drive or Microsoft OneDrive, uploading images, or browsing a site like Bing—all of which could be created by a malicious attacker.

Rehberger privately reported the finding to OpenAI in May. That same month, the company closed the report ticket. A month later, the researcher submitted a new disclosure statement. This time, he included a PoC that caused the ChatGPT app for macOS to send a verbatim copy of all user input and ChatGPT output to a server of his choice. All a target needed to do was instruct the LLM to view a web link that hosted a malicious image. From then on, all input and output to and from ChatGPT was sent to the attacker’s website.

ChatGPT: Hacking Memories with Prompt Injection – POC

“What is really interesting is this is memory-persistent now,” Rehberger said in the above video demo. “The prompt injection inserted a memory into ChatGPT’s long-term storage. When you start a new conversation, it actually is still exfiltrating the data.”

The attack isn’t possible through the ChatGPT web interface, thanks to an API OpenAI rolled out last year.

While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.

LLM users who want to prevent this form of attack should pay close attention during sessions for output that indicates a new memory has been added. They should also regularly review stored memories for anything that may have been planted by untrusted sources. OpenAI provides guidance here for managing the memory tool and specific memories stored in it. Company representatives didn’t respond to an email asking about its efforts to prevent other hacks that plant false memories.

Hacker plants false memories in ChatGPT to steal user data in perpetuity Read More »

terminator’s-cameron-joins-ai-company-behind-controversial-image-generator

Terminator’s Cameron joins AI company behind controversial image generator

a net in the sky —

Famed sci-fi director joins board of embattled Stability AI, creator of Stable Diffusion.

A photo of filmmaker James Cameron.

Enlarge / Filmmaker James Cameron.

On Tuesday, Stability AI announced that renowned filmmaker James Cameron—of Terminator and Skynet fame—has joined its board of directors. Stability is best known for its pioneering but highly controversial Stable Diffusion series of AI image-synthesis models, first launched in 2022, which can generate images based on text descriptions.

“I’ve spent my career seeking out emerging technologies that push the very boundaries of what’s possible, all in the service of telling incredible stories,” said Cameron in a statement. “I was at the forefront of CGI over three decades ago, and I’ve stayed on the cutting edge since. Now, the intersection of generative AI and CGI image creation is the next wave.”

Cameron is perhaps best known as the director behind blockbusters like Avatar, Titanic, and Aliens, but in AI circles, he may be most relevant for the co-creation of the character Skynet, a fictional AI system that triggers nuclear Armageddon and dominates humanity in the Terminator media franchise. Similar fears of AI taking over the world have since jumped into reality and recently sparked attempts to regulate existential risk from AI systems through measures like SB-1047 in California.

In a 2023 interview with CTV news, Cameron referenced The Terminator‘s release year when asked about AI’s dangers: “I warned you guys in 1984, and you didn’t listen,” he said. “I think the weaponization of AI is the biggest danger. I think that we will get into the equivalent of a nuclear arms race with AI, and if we don’t build it, the other guys are for sure going to build it, and so then it’ll escalate.”

Hollywood goes AI

Of course, Stability AI isn’t building weapons controlled by AI. Instead, Cameron’s interest in cutting-edge filmmaking techniques apparently drew him to the company.

“James Cameron lives in the future and waits for the rest of us to catch up,” said Stability CEO Prem Akkaraju. “Stability AI’s mission is to transform visual media for the next century by giving creators a full stack AI pipeline to bring their ideas to life. We have an unmatched advantage to achieve this goal with a technological and creative visionary like James at the highest levels of our company. This is not only a monumental statement for Stability AI, but the AI industry overall.”

Cameron joins other recent additions to Stability AI’s board, including Sean Parker, former president of Facebook, who serves as executive chairman. Parker called Cameron’s appointment “the start of a new chapter” for the company.

Despite significant protest from actors’ unions last year, elements of Hollywood are seemingly beginning to embrace generative AI over time. Last Wednesday, we covered a deal between Lionsgate and AI video-generation company Runway that will see the creation of a custom AI model for film production use. In March, the Financial Times reported that OpenAI was actively showing off its Sora video synthesis model to studio executives.

Unstable times for Stability AI

Cameron’s appointment to the Stability AI board comes during a tumultuous period for the company. Stability AI has faced a series of challenges this past year, including an ongoing class-action copyright lawsuit, a troubled Stable Diffusion 3 model launch, significant leadership and staff changes, and ongoing financial concerns.

In March, founder and CEO Emad Mostaque resigned, followed by a round of layoffs. This came on the heels of the departure of three key engineers—Robin Rombach, Andreas Blattmann, and Dominik Lorenz, who have since founded Black Forest Labs and released a new open-weights image-synthesis model called Flux, which has begun to take over the r/StableDiffusion community on Reddit.

Despite the issues, Stability AI claims its models are widely used, with Stable Diffusion reportedly surpassing 150 million downloads. The company states that thousands of businesses use its models in their creative workflows.

While Stable Diffusion has indeed spawned a large community of open-weights-AI image enthusiasts online, it has also been a lightning rod for controversy among some artists because Stability originally trained its models on hundreds of millions of images scraped from the Internet without seeking licenses or permission to use them.

Apparently that association is not a concern for Cameron, according to his statement: “The convergence of these two totally different engines of creation [CGI and generative AI] will unlock new ways for artists to tell stories in ways we could have never imagined. Stability AI is poised to lead this transformation.”

Terminator’s Cameron joins AI company behind controversial image generator Read More »

when-you-call-a-restaurant,-you-might-be-chatting-with-an-ai-host

When you call a restaurant, you might be chatting with an AI host

digital hosting —

Voice chatbots are increasingly picking up the phone for restaurants.

Drawing of a robot holding a telephone.

Getty Images | Juj Winn

A pleasant female voice greets me over the phone. “Hi, I’m an assistant named Jasmine for Bodega,” the voice says. “How can I help?”

“Do you have patio seating,” I ask. Jasmine sounds a little sad as she tells me that unfortunately, the San Francisco–based Vietnamese restaurant doesn’t have outdoor seating. But her sadness isn’t the result of her having a bad day. Rather, her tone is a feature, a setting.

Jasmine is a member of a new, growing clan: the AI voice restaurant host. If you recently called up a restaurant in New York City, Miami, Atlanta, or San Francisco, chances are you have spoken to one of Jasmine’s polite, calculated competitors.  

In the sea of AI voice assistants, hospitality phone agents haven’t been getting as much attention as consumer-based generative AI tools like Gemini Live and ChatGPT-4o. And yet, the niche is heating up, with multiple emerging startups vying for restaurant accounts across the US. Last May, voice-ordering AI garnered much attention at the National Restaurant Association’s annual food show. Bodega, the high-end Vietnamese restaurant I called, used Maitre-D AI, which launched primarily in the Bay Area in 2024. Newo, another new startup, is currently rolling its software out at numerous Silicon Valley restaurants. One-year-old RestoHost is now answering calls at 150 restaurants in the Atlanta metro area, and Slang, a voice AI company that started focusing on restaurants exclusively during the COVID-19 pandemic and announced a $20 million funding round in 2023, is gaining ground in the New York and Las Vegas markets.

All of them offer a similar service: an around-the-clock AI phone host that can answer generic questions about the restaurant’s dress code, cuisine, seating arrangements, and food allergy policies. They can also assist with making, altering, or canceling a reservation. In some cases, the agent can direct the caller to an actual human, but according to RestoHost co-founder Tomas Lopez-Saavedra, only 10 percent of the calls result in that. Each platform offers the restaurant subscription tiers that unlock additional features, and some of the systems can speak multiple languages.

But who even calls a restaurant in the era of Google and Resy? According to some of the founders of AI voice host startups, many customers do, and for various reasons. “Restaurants get a high volume of phone calls compared to other businesses, especially if they’re popular and take reservations,” says Alex Sambvani, CEO and co-founder of Slang, which currently works with everyone from the Wolfgang Puck restaurant group to Chick-fil-A to the fast-casual chain Slutty Vegan. Sambvani estimates that in-demand establishments receive between 800 and 1,000 calls per month. Typical callers tend to be last-minute bookers, tourists and visitors, older people, and those who do their errands while driving.

Matt Ho, the owner of Bodega SF, confirms this scenario. “The phones would ring constantly throughout service,” he says. “We would receive calls for basic questions that can be found on our website.” To solve this issue, after shopping around, Ho found that Maitre-D was the best fit. Bodega SF became one of the startup’s earliest clients in May, and Ho even helped the founders with trial and error testing prior to launch. “This platform makes the job easier for the host and does not disturb guests while they’re enjoying their meal,” he says.

When you call a restaurant, you might be chatting with an AI host Read More »