Microsoft has a new text-to-speech AI tool to wow and annoy us

It seems that 2023 is the year of artificial intelligence (AI), and Microsoft is the latest company keen to get in on the action. 

Researchers from the company have posted a paper detailing a new technology that could bring huge leaps forward for text-to-speech tools.

A summary of the paper explains how the technology, called VALL-E, “emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.”

What this means in simple terms is that the tool can break down what makes a person sound the way they do, working from phoneme and acoustic code prompts (the latter thanks to Meta’s EnCodec), and generate speech that more closely mimics how that person may sound beyond the three-second sample recording. The early stages of VALL-E have been made possible by analyzing over 60,000 hours’ worth of English-language voice recordings.

The project’s GitHub page showcases a number of examples of what the technology can do, including preserving a speaker’s emotional cues and even environmental effects, such as the tinny, distant sound that’s typical of a phone conversation.

The paper does mention, albeit briefly, the potential implications of such text-to-speech tools, which is increasingly important at a time when AI is raising ethical concerns that we’d previously only dreamt of (or had nightmares about).

In fact, any number of problems could arise, from faked recordings being used to grant permission for something (consider the number of banks that use telephone-based voice recognition for authentication) to a whole lot worse.

The conclusion states that VALL-E “may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker.” Benj Edwards of Ars Technica has also noted that Microsoft has yet to share the project’s code for anybody else to try out, suggesting that the potential risks are still being considered.

Craig Hale

With several years’ experience freelancing in tech and automotive circles, Craig’s specific interests lie in technology that is designed to better our lives, including AI and ML, productivity aids, and smart fitness. He is also passionate about cars and the decarbonisation of personal transportation. Craig is an avid bargain-hunter, so you can be sure that any deal he finds is top value!
