Mozilla’s new open source model aims to revolutionize voice recognition

You may have noticed the steady progress of voice recognition tech in recent times. All the big tech firms want to make strides in this arena, if only to improve their digital assistants, from Cortana to Siri. Mozilla, however, wants to push harder and more broadly on this front with the release of an open source speech recognition model.

The initial release of this Automatic Speech Recognition engine has just been unleashed, based on work carried out by the Machine Learning team at Mozilla. The engine is modelled on ‘Deep Speech’ papers published by Baidu, which detail a trainable multi-layered deep neural network.

Mozilla says that its project initially had a goal of hitting a ‘word error rate’ of less than 10%. However, the firm says the engine’s word error rate on LibriSpeech’s test-clean set is now 6.5%, comfortably beating this goal and coming close to the Holy Grail of human-level performance (which sits at around 5.8%, according to the Deep Speech 2 paper).
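
For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn the engine's transcript into the correct one, divided by the number of words in the correct transcript. The Python sketch below illustrates that standard definition only. The function name is made up for this example, and splitting on whitespace with no case or punctuation normalization is an assumption; the article does not describe how Mozilla scores its own results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not Mozilla's scoring code.
    Assumes words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word dropped
                curr[j - 1] + 1,                  # extra word inserted
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

On this definition, a 6.5% word error rate works out to roughly one wrong word in every fifteen.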

Mozilla has worked hard to train the speech recognition model using ‘supervised learning’ and a huge dataset of thousands of hours of labeled audio, drawn from all manner of sources including free (TED-LIUM and LibriSpeech) and paid (Fisher and Switchboard) speech corpora.

Further labeled speech data was pulled from the likes of language study departments in universities, and public TV and radio stations, all of which added more fuel for honing the speech recognition engine.

And of course the huge strength of this project, its open source nature, means that this honed technology is now open to anyone to use in their speech recognition projects.

Streamlined speech

Mozilla further notes that the plan for the future is to release a model that’s light and fast enough to run on a smartphone or single-board computer like the Raspberry Pi.

The company has also unleashed its Common Voice initiative, which is an open and publicly available voice dataset containing some 400,000 recordings from 20,000 different speakers – that represents around 500 hours of speech.
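
To put those Common Voice figures in perspective, a quick back-of-the-envelope calculation helps. The Python snippet below uses only the rounded numbers quoted above, so the results are approximate averages rather than official statistics, and the variable names are purely illustrative.

```python
# Rounded figures quoted in the article for the Common Voice dataset.
recordings = 400_000
speakers = 20_000
hours_of_speech = 500

# Average number of clips contributed per speaker.
recordings_per_speaker = recordings / speakers

# Average clip length in seconds (3600 seconds per hour).
seconds_per_recording = hours_of_speech * 3600 / recordings
```

That works out to about 20 recordings per speaker, with each clip averaging around 4.5 seconds.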

As Mozilla puts it, the idea here is to “build a speech corpus that's free, open source, and big enough to create meaningful products with”, running in parallel with the new speech recognition model.

Microsoft is also making big strides on the voice recognition front, having achieved a word error rate of 5.1% in the Switchboard speech recognition benchmark, as announced back in the summer.

Darren is a freelancer writing news and features for TechRadar (and occasionally T3) across a broad range of computing topics including CPUs, GPUs, various other hardware, VPNs, antivirus and more. He has written about tech for the best part of three decades, and writes books in his spare time (his debut novel - 'I Know What You Did Last Supper' - was published by Hachette UK in 2013).
