It's not an echo – ChatGPT might suddenly mimic your voice when you speak to it

[Image: A close-up of ChatGPT on a phone, with the OpenAI logo in the background. Credit: Shutterstock/Daniel Chetroni]

ChatGPT might sometimes seem able to think like you, but wait until it suddenly sounds just like you, too. That's a possibility raised by ChatGPT's new Advanced Voice Mode, which runs on the GPT-4o model. OpenAI released the system card for GPT-4o last week, explaining what the model can and can't do, and it includes the very unlikely but still real possibility of Advanced Voice Mode imitating users' voices without their consent.

Advanced Voice Mode lets users hold spoken conversations with the AI chatbot, with the idea of making interactions more natural and accessible. Users choose from a handful of preset voices. However, the system card reports that the feature has exhibited unexpected behavior under certain conditions: during testing, a noisy input caused the AI to mimic the user's voice.

The GPT-4o model produces voices using a system prompt, a hidden set of instructions that guides the model's behavior during interactions. For voice synthesis, this prompt relies on an authorized voice sample. But the system prompt is not foolproof: because the model can synthesize a voice from a short audio clip, under certain conditions it could generate other voices, including your own. You can hear what happened in the clip below, when the AI jumps in with "No!" and suddenly sounds like the first speaker.

Voice Clone of Your Own

“Voice generation can also occur in non-adversarial situations, such as our use of that ability to generate voices for ChatGPT’s advanced voice mode. During testing, we also observed rare instances where the model would unintentionally generate an output emulating the user’s voice,” OpenAI explained in the system card. “While unintentional voice generation still exists as a weakness of the model, we use the secondary classifiers to ensure the conversation is discontinued if this occurs making the risk of unintentional voice generation minimal.”

As OpenAI said, it has since implemented safeguards to prevent such occurrences, chiefly an output classifier designed to detect deviations from the pre-selected authorized voices and shut down the conversation if the generated audio drifts away from them. Still, the fact that it happened at all reinforces how quickly this technology is evolving and how safeguards have to evolve to match what the AI can do. The model's outburst, suddenly exclaiming "No!" in a voice similar to the tester's, underscores the potential for AI to inadvertently blur the line between machine and human interaction.
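OpenAI hasn't published how that classifier works, but the general pattern is familiar from speaker verification: embed each chunk of generated audio, compare it against embeddings of the authorized preset voices, and stop the conversation if the similarity falls too low. Below is a minimal Python sketch of that idea; the embed_voice function, the preset samples, and the 0.85 threshold are all illustrative stand-ins, not OpenAI's implementation.

import numpy as np

def embed_voice(audio: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Stand-in speaker embedding: a unit-normalized log-magnitude spectrum.
    A real system would use a trained speaker-verification model instead."""
    spectrum = np.abs(np.fft.rfft(audio, n=sr))
    logspec = np.log1p(spectrum)
    return logspec / (np.linalg.norm(logspec) + 1e-9)

def is_authorized_voice(output_audio: np.ndarray,
                        authorized_embeddings: list[np.ndarray],
                        threshold: float = 0.85) -> bool:
    """Return True if the generated audio is close enough (cosine similarity)
    to at least one of the preset, authorized voices."""
    e = embed_voice(output_audio)
    similarities = [float(np.dot(e, ref)) for ref in authorized_embeddings]
    return max(similarities) >= threshold

# Usage sketch: gate each generated audio chunk before it reaches the user.
# Random noise stands in here for real preset samples and model output.
presets = [embed_voice(np.random.randn(16000)) for _ in range(3)]
chunk = np.random.randn(16000)
if is_authorized_voice(chunk, presets):
    print("Output matches an authorized preset voice; continue.")
else:
    print("Voice deviation detected; discontinue the conversation.")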


Eric Hal Schwartz
Contributor

Eric Hal Schwartz is a freelance writer for TechRadar with more than 15 years of experience covering the intersection of the world and technology. For the last five years, he served as head writer for Voicebot.ai and was on the leading edge of reporting on generative AI and large language models. He's since become an expert on generative AI products such as OpenAI's ChatGPT, Anthropic's Claude, Google Gemini, and other synthetic media tools. His experience runs the gamut of media, including print, digital, broadcast, and live events. Now, he's continuing to tell the stories people want and need to hear about the rapidly evolving AI space and its impact on their lives. Eric is based in New York City.