OpenAI Launches Advanced Voice Mode for ChatGPT with Hyper-Realistic Audio

OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, July 30, 2024, giving a select group of ChatGPT Plus users a first look at GPT-4o’s hyper-realistic audio responses. The feature, which OpenAI plans to extend to all Plus users in fall 2024, is a substantial advance in AI voice technology.

The Debut of GPT-4o’s Voice
When OpenAI first demonstrated GPT-4o’s voice capabilities in May, the quick and eerily lifelike responses astounded the audience. One voice, named “Sky,” bore a striking resemblance to that of Scarlett Johansson, who voiced an AI assistant in the film “Her.” After the demonstration, Johansson retained legal counsel to protect her likeness. OpenAI subsequently withdrew the voice from further demos and delayed the feature’s launch to strengthen its safety measures.

Advanced Voice Mode: A Step Forward
Advanced Voice Mode does not include the video and screen-sharing features previewed during OpenAI’s Spring Update, but it is a significant upgrade over the existing Voice Mode. The previous system chained three separate models: one to transcribe speech to text, one to generate a text reply, and one to convert that reply back to speech. GPT-4o handles all three steps in a single multimodal model. This cuts latency significantly and lets the AI pick up on emotional intonation in a user’s voice, such as sadness or excitement, and recognize singing.
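The difference between the two designs can be sketched in a few lines of Python. This is a toy illustration only, not OpenAI’s implementation: the `Audio` type and the functions `transcribe`, `generate_reply`, `synthesize`, `cascaded_reply`, and `unified_reply` are all hypothetical names invented for this example. It shows why a chain of three models loses tone (only plain text passes between stages), whereas a single model that receives the audio directly can still react to it.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Toy stand-in for a voice clip (hypothetical, for illustration only)."""
    words: str  # what was said
    tone: str   # how it was said, e.g. "sad" or "excited"


# Cascaded design: three separate stages that hand plain text to each other.
def transcribe(audio: Audio) -> str:
    # Speech-to-text keeps the words and drops the tone.
    return audio.words


def generate_reply(text: str) -> str:
    # The text model only ever sees the transcript.
    return f"You said: {text}"


def synthesize(text: str) -> Audio:
    # Text-to-speech has no tone information left to react to.
    return Audio(words=text, tone="neutral")


def cascaded_reply(audio: Audio) -> Audio:
    return synthesize(generate_reply(transcribe(audio)))


# Unified design: one model receives the audio, so tone is still available.
def unified_reply(audio: Audio) -> Audio:
    reply_tone = "gentle" if audio.tone == "sad" else "neutral"
    return Audio(words=f"You said: {audio.words}", tone=reply_tone)


if __name__ == "__main__":
    clip = Audio(words="I failed my exam", tone="sad")
    print(cascaded_reply(clip))
    print(unified_reply(clip))
```

In the cascaded version each stage also adds its own processing delay before the next can start, which is one reason folding the steps into a single model can reduce latency.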

Limited Availability and Safety Measures
OpenAI is releasing the feature to a small alpha group first so that it can closely monitor how it is used. Users selected for early access will receive a notification and guidance through the ChatGPT app. Over the past several months, OpenAI has tested GPT-4o’s voice capabilities with more than 100 external testers who collectively speak 45 languages, and it expects to release a full safety report in early August.

Voice Customization and Ethical Considerations
Advanced Voice Mode will offer four preset voices, Juniper, Breeze, Cove, and Ember, developed in collaboration with professional voice actors. The controversial Sky voice has been permanently retired from the lineup. Lindsay McCallum, an OpenAI spokesperson, emphasized that ChatGPT is designed not to impersonate real individuals and will automatically block attempts to generate such outputs.

Navigating Legal Landscapes
Amid rising concern over deepfakes, OpenAI has introduced measures to prevent misuse. AI voice cloning has already been used to mislead: ahead of the New Hampshire primary in January 2024, voters received calls featuring a cloned voice that imitated President Biden. Separately, OpenAI has added filters that block requests to generate copyrighted audio content. These steps are part of a broader effort to navigate the legal complexities facing AI developers, particularly in the music industry, where copyright infringement litigation is increasingly common.

