Machine language: Computers are on the brink of mastering speech recognition

Like the other open source deep learning toolkits from Facebook, Google and various universities, CNTK uses GPUs for speed. Not only is it as fast as or faster than the other toolkits when you run it on one PC with one GPU, it's nearly twice as fast when you run it on a PC with two GPUs. It's also the only toolkit that can run on multiple machines at once, and with eight GPUs across two PCs it's about three times as fast as the competition.

A speed comparison of CNTK versus other toolkits

CNTK is faster than other deep learning toolkits and it scales better because you can run it distributed across multiple machines (it should run well on the new Azure GPU service that's currently in private preview). That performance is important for dealing with the massive amounts of data you need for problems like speech recognition.

"If you want to really develop artificial intelligence, you have to process data at web scale," Huang says. "Google brags that they deal with a huge amount of data in a distributed way, but what they've open sourced is really a small toolset."

"Since we adopted CNTK for experimenting with Cortana's speech recognition, the productivity for the product team has increased by almost a factor of ten. It's given them a huge boost. Before, it took them weeks to finish one experiment. They said before they adopted it they felt like they were driving a Volkswagen; after they switched, it's like driving a Ferrari."

Nothing new

Speech recognition has been in Windows since Windows 95, Huang points out. "Thanks to Bill Gates' vision, as early as the 90s, we invested early in speech recognition. The progress year by year in driving speech recognition errors down has been foundational – if the error rate is too high [to be useful], then having vision doesn't help!

"But 20 years ago, Microsoft introduced the first speech API in Windows 95 and 20 years after that Microsoft added a range of AI tools going beyond speech into vision and understanding in Azure ML. With CNTK, it's the same desire to enable developers to take advantage of technology."

But the speech recognition it was designed to speed up isn't the only thing CNTK is good at. Microsoft has been trying it out for image recognition as well and, Huang claims, "CNTK is on a par with the best toolset out there for image processing."

Previously, the Microsoft researchers and developers working on image recognition were using the popular Caffe tool from the University of California, Berkeley. Now they're switching over to CNTK, and as the latest GPUs arrive, its performance keeps getting better.

All-rounder

Being good at more than one task is unusual for AI toolkits, which tend to be very specific. "Caffe is just beautiful for image processing," says Huang, "but it's almost impossible to adopt that for speech." Huang is cautious about claiming that CNTK can handle all deep learning tasks. Speech recognition, image recognition and natural language understanding are the three areas he's focusing on, but he's excited to see what people will do with it in other areas.

He concludes: "This tool is so powerful; it can absolutely deal with bigger challenges. The beauty of the tool is that when we get this into the hands of developers, something totally unexpected could happen that's just beyond our imagination. I believe they will find very creative ways of using it.

"The Microsoft internal workloads that we're building with CNTK are unbelievable. If you ask me what the next breakthrough will be, I'd say artificial intelligence – we'll create truly intelligent services that will help people to do more and reach a new level we've never experienced in the past."

Contributor

Mary started her career at Future Publishing, saw the AOL meltdown first hand the first time around when she ran the AOL UK computing channel, and has been a freelance tech writer for over a decade. She's used every version of Windows and Office released, and every smartphone too, but she's still looking for the perfect tablet. Yes, she really does have USB earrings.
