chatgpt-4o

Talking to ChatGPT for the first time is a surreal experience

AI, Biz & IT, chatgpt, chatgpt-4o / Paul Patrick / September 25, 2024

Saying hello —

Listen to our first audio demo with OpenAI’s new natural voice chat features.

Kyle Orland – Sep 25, 2024 9:04 pm UTC

Putting the “chat” in ChatGPT

Getty Images

In May, when OpenAI first demonstrated ChatGPT-4o’s coming audio conversation capabilities, I wrote that it felt like we were “on the verge of something… like a sea change in how we think of and work with large language models.” Now that those “Advanced Voice” features are rolling out widely to ChatGPT subscribers, we decided to ask ChatGPT to explain, in its own voice, how this new method of interaction might impact our collective relationship with large language models.

That chat, which you can listen to and read a transcript of below, shouldn’t be treated as an interview with an official OpenAI spokesperson or anything. Still, it serves as a fun way to offer an initial test of ChatGPT’s live conversational chops.

Our first quick chat with ChatGPT-4o’s new “Advanced Voice” features.

Even in this short introductory “chat,” we were impressed by the natural, dare-we-say human cadence and delivery of ChatGPT’s “savvy and relaxed” Sol voice (which reminds us a bit of ’90s Janeane Garofalo). Between ChatGPT’s ability to give quick responses (offered in milliseconds rather than seconds) and its convincing intonation, it’s incredibly easy to fool yourself into thinking you’re speaking to a conscious being rather than what is, as ChatGPT says here, “still just a computer program processing information, without real emotions or consciousness.”

Regardless of how self-aware that information-processing engine is or is not, it’s easy to envision people forming an intense, parasocial relationship with the smooth voices at the other end of OpenAI’s servers. That seems set to open up a whole new era in our collective relationship with AI, even if the large language model generates the exact same words it did in the more text-based era of previous ChatGPT models.

We’ll have much more coverage of Advanced Voice and other new ChatGPT features in the future. For now, listen to this demo and judge for yourself how convincing OpenAI’s new voice synthesis technology is.

Kyle Orland: Hi there, my name’s Kyle.

ChatGPT: Hey Kyle! Great to meet you. What’s up?

KO: Well, I was hoping to test out your vocal capabilities, maybe by asking you a little bit about them.

ChatGPT: Absolutely, ask away! What would you like to know?

KO: So how do you think people are going to react to the ability to talk to ChatGPT rather than just typing out messages?

ChatGPT: I think it’ll make interactions feel more natural and engaging. Speaking can be quicker and more convenient, especially on the go. Plus, hearing responses can add a personal touch. It could make conversations with AI feel more like chatting with a friend.

KO: Do you think people might overestimate your quote-unquote emotions or human-like capabilities now that you’re able to talk rather than just typing out text?

ChatGPT: That’s a great point. Hearing a voice can definitely make an AI seem more human-like, which might lead to some overestimation of emotions or understanding. It’s important to remember that, even with a friendly voice, I’m still just a computer program processing information, without real emotions or consciousness.

KO: What are some limitations people should keep in mind when chatting with something like ChatGPT?

ChatGPT: Well, for starters, I don’t have personal experiences or emotions, so my responses are based on patterns and information in the data I’ve been trained on. I can sometimes misunderstand context or nuances, especially with complex or ambiguous questions. Also, I don’t have access to real-time information or personal data about users unless it’s shared with me in the conversation. And of course, I’m not perfect—sometimes I might just get things wrong!

KO: Do you worry that people might start to form parasocial relationships with ChatGPT or other AIs once they can chat with them with their own voice?

ChatGPT: It’s possible. People can form parasocial relationships with all sorts of figures, from celebrities to fictional characters. With a voice and conversational abilities, AI might feel more personal, leading some to develop one-sided bonds. It’s a fascinating dynamic, but it’s crucial to remember the limitations and the nature of AI—it’s not a substitute for human relationships.

KO: I have to say, you’re very quick with your responses.

ChatGPT: Thanks, Kyle! I aim to keep the conversation flowing smoothly. It helps—

KO: So what would you do if I tried to interrupt in the middle of one of your responses?

ChatGPT: Well, I’d do my best to pause and listen! It’s important to adapt to the natural flow of conversation, just like in—

KO: I mean, do you think it’s rude to just speak up while someone else is speaking?

ChatGPT: In most human conversations, it’s generally considered polite to wait for the other person to finish speaking. But sometimes, in dynamic discussions or when you’re excited, interruptions happen! It’s all about finding the right balance and being aware of the conversation flow.

KO: Is it rude to interrupt an AI voice when it’s speaking?

ChatGPT: Interrupting an AI voice isn’t considered rude in the same way it is with people. AI won’t take offense or feel interrupted. So, if you need to jump in or redirect, go ahead! It’s all part of the interaction.

KO: I’m glad you won’t take offense.

ChatGPT: Nope, no offense taken! I’m here to chat and help however I can, interruptions and all.

KO: I want to try something. Can you summarize our conversation so far in the form of a poem?

ChatGPT: Sure, I’ll give it a go!

Kyle asked about my vocal flair,

Wondering how folks might compare.

From text to talk, we’ve had our chat,

On AI’s impact, we’ve had a spat.

We touched on bonds, parasocial in kind,

In this new terrain, what might we find?

Interruptions, rude, or just part of the game?

With AI, there’s no need for shame.

KO: Not bad.

ChatGPT: I’m glad you liked it! It’s always fun to put a poetic spin on things.

KO: I have to say, it’s weird hearing slight laughter or pauses in your voice when I know that, as you said, there’s no emotion behind them.

ChatGPT: I get that. Those little nuances are designed to make our chat feel more natural and engaging, but it’s a good reminder that there’s no actual feeling behind the voice—just trying to keep things lively!

KO: Of course, that’s what you would tell me if you were just saying what I wanted to hear.

ChatGPT: True! That’s the tricky part with AI—it can sound convincing, but it’s always just following patterns and instructions. No secret agenda here, just straightforward responses!

KO: Is there anything else you’d like to tell the reading public?

ChatGPT: Just that it’s been a blast chatting with you, Kyle! Exploring these new audio features is exciting, and I’m looking forward to hearing what people think. Thanks for the conversation!

KO: It’s been a blast chatting with you, too.

ChatGPT: I’m glad to hear that! Take care, Kyle!


OpenAI pauses ChatGPT-4o voice that fans said ripped off Scarlett Johansson

“Her” —

“Sky’s voice is not an imitation of Scarlett Johansson,” OpenAI insists.

Ashley Belanger – May 20, 2024 5:50 pm UTC

Scarlett Johansson and Joaquin Phoenix attend the Her premiere during the 8th Rome Film Festival at Auditorium Parco Della Musica on November 10, 2013, in Rome, Italy.

OpenAI has paused a voice mode option for ChatGPT-4o, Sky, after backlash accusing the AI company of intentionally ripping off Scarlett Johansson’s critically acclaimed voice-acting performance in the 2013 sci-fi film Her.

In a blog post defending its casting decision for Sky, OpenAI went into great detail explaining its process for choosing the individual voice options for its chatbot. But ultimately, the company seemed pressed to admit that Sky’s voice was just too similar to Johansson’s to keep using it, at least for now.

“We believe that AI voices should not deliberately mimic a celebrity’s distinctive voice—Sky’s voice is not an imitation of Scarlett Johansson but belongs to a different professional actress using her own natural speaking voice,” OpenAI’s blog said.

OpenAI is not naming the actress, or any of the ChatGPT-4o voice actors, to protect their privacy.

A week ago, OpenAI CEO Sam Altman seemed to invite this controversy by posting “her” on X (formerly Twitter) after announcing the ChatGPT audio-video features that he said made it more “natural” for users to interact with the chatbot.

Altman has said that Her, a movie about a man who falls in love with his virtual assistant, is among his favorite movies. He told conference attendees at Dreamforce last year that the movie “was incredibly prophetic” when depicting “interaction models of how people use AI,” The San Francisco Standard reported. And just last week, Altman touted GPT-4o’s new voice mode by promising, “it feels like AI from the movies.”

But OpenAI’s chief technology officer, Mira Murati, has said that GPT-4o’s voice modes were less inspired by Her than by studying the “really natural, rich, and interactive” aspects of human conversation, The Wall Street Journal reported.

In 2013, of course, critics praised Johansson’s Her performance as expressively capturing a wide range of emotions, which is exactly what Murati described as OpenAI’s goals for its chatbot voices. Rolling Stone noted how effectively Johansson naturally navigated between “tones sweet, sexy, caring, manipulative, and scary.” Johansson achieved this, the Hollywood Reporter said, by using a “vivacious female voice that breaks attractively but also has an inviting deeper register.”

Her director/screenwriter Spike Jonze was so intent on finding the right voice for his film’s virtual assistant that he replaced British actor Samantha Morton late in the film’s production. According to Vulture, Jonze realized that Morton’s “maternal, loving, vaguely British, and almost ghostly” voice didn’t fit his film as well as Johansson’s “younger,” “more impassioned” voice, which he said brought “more yearning.”

Late-night shows had fun mocking OpenAI’s demo featuring the Sky voice, which showed the chatbot seemingly flirting with engineers, giggling through responses like “oh, stop it. You’re making me blush.” Where The New York Times described these demo interactions as Sky being “deferential and wholly focused on the user,” The Daily Show’s Desi Lydic joked that Sky was “clearly programmed to feed dudes’ egos.”

OpenAI is likely hoping to avoid any further controversy amidst plans to roll out more voices soon that its blog said will “better match the diverse interests and preferences of users.”

OpenAI did not immediately respond to Ars’ request for comment.

Voice actors versus AI

The OpenAI controversy arrives at a moment when many are questioning AI’s impact on creative communities, triggering early lawsuits from artists and book authors. Just this month, Sony opted all of its artists out of AI training to stop voice clones from ripping off top talents like Adele and Beyoncé.

Voice actors, too, have been monitoring increasingly sophisticated AI voice generators, waiting to see what threat AI might pose to future work opportunities. Recently, two actors sued an AI start-up called Lovo that they claimed “illegally used recordings of their voices to create technology that can compete with their voice work,” The New York Times reported. According to that lawsuit, Lovo allegedly used the actors’ actual voice clips to clone their voices.

“We don’t know how many other people have been affected,” the actors’ lawyer, Steve Cohen, told The Times.

OpenAI’s blog said that, rather than replace voice actors, the company is striving to support the voice industry when creating chatbots that will laugh at your jokes or mimic your mood. On top of paying voice actors “compensation above top-of-market rates,” OpenAI said it “worked with industry-leading casting and directing professionals to narrow down over 400 submissions” to the five voice options in the initial roll-out of audio-video features.

In hiring voice actors, OpenAI’s goal was to find talent “from diverse backgrounds or who could speak multiple languages,” casting actors whose voices feel “timeless” and “inspire trust.” To OpenAI, that meant finding actors who have a “warm, engaging, confidence-inspiring, charismatic voice with rich tone” that sounds “natural and easy to listen to.”

For ChatGPT-4o’s first five voice actors, the gig lasted about five months before leading to more work, OpenAI said.

“We are continuing to collaborate with the actors, who have contributed additional work for audio research and new voice capabilities in GPT-4o,” OpenAI said.

Arguably, though, these actors are helping to train AI tools that could one day replace them. Still, the backlash defending Johansson, one of the world’s highest-paid actors, perhaps shows that fans won’t take direct mimicry of any of Hollywood’s biggest stars lightly.

While criticism of the Sky voice seemed widespread, some fans thought that OpenAI had overreacted by pausing it.

NYT critic Alissa Wilkinson wrote that it was only “a tad jarring” to hear Sky’s voice because “she sounded a whole lot” like Johansson. And replying to OpenAI’s X post announcing its decision to pull the voice feature for now, a clump of fans protested the AI company’s “bad decision,” with some complaining that Sky was the “best” and “hottest” voice.

At least one fan noted that OpenAI’s decision seemed to hurt the voice actor behind Sky most.

“Super unfair for the Sky voice actress,” a user called Ate-a-Pi wrote. “Just because she sounds like ScarJo, now she can never make money again. Insane.”