Microsoft, OpenAI may have solved a fundamental AI bottleneck

A profile of a human brain against a digital background.
(Image credit: Pixabay)

Microsoft and OpenAI have developed a new method for optimizing massive AI models that are too expensive to train multiple times, such as GPT-3.

A blog post published by Microsoft Research describes a technique called µ-Parametrization (or µP), which builds on the discovery that small- and large-scale AI models behave similarly in certain respects, and uses that insight to minimize the amount of compute required to tune them.

Although you’d need a doctorate to make sense of the specifics, the essential message is this: with µ-Parametrization, it will be cheaper and simpler to develop larger-scale AI models capable of yielding far superior performance to those available today.

Optimizing AI models

As explained in the blog post, one reason large AI models are difficult to train effectively is that we have little insight into how their behavior changes as they scale. As such, the larger the AI model, the less well-tuned researchers currently expect it to be.

However, µ-Parametrization offers a route to tuning large-scale models at much lower cost and with much greater efficiency, by capitalizing on the insight that neural networks of varying sizes share the same optimal hyperparameters (HPs) under certain conditions.

Essentially, this means a small-scale tuning process can be extrapolated outwards and mapped onto a much larger model, instead of tuning an entire multi-billion-parameter model directly.
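As a rough sketch of that workflow, the example below uses the `mup` PyTorch package Microsoft has released (covered further down); the toy MLP, the widths, the random data and the learning-rate grid are all invented for illustration. The idea is to sweep the learning rate on a narrow proxy network parameterized in µP, then reuse the winning value when training a much wider version of the same architecture.

```python
import torch
import torch.nn as nn
from mup import MuReadout, MuAdam, set_base_shapes

class MLP(nn.Module):
    """Width-parameterized toy model; the output layer is a MuReadout so µP scaling applies."""
    def __init__(self, width, d_in=32, d_out=10):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(d_in, width), nn.ReLU(),
                                  nn.Linear(width, width), nn.ReLU())
        self.readout = MuReadout(width, d_out)

    def forward(self, x):
        return self.readout(self.body(x))

def make_model(width):
    model = MLP(width)
    # Two reference instantiations tell mup which dimensions grow with width.
    set_base_shapes(model, MLP(width=64), delta=MLP(width=128))
    return model

def train(model, lr, steps=200):
    opt = MuAdam(model.parameters(), lr=lr)   # µP-aware optimizer
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))  # stand-in data
        loss = loss_fn(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# 1) Tune the learning rate on a cheap, narrow proxy model...
best_lr = min([3e-4, 1e-3, 3e-3, 1e-2],
              key=lambda lr: train(make_model(width=256), lr))
# 2) ...then reuse it directly on the expensive, wide model.
final_loss = train(make_model(width=4096), best_lr)
```

Without µP's parameterization, a learning rate found on the narrow proxy would generally not remain optimal as the width grows; that invariance is precisely what the technique is designed to provide.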

“µP’s principled way of parameterizing the model and selecting the learning rate make it easier for anybody to scale the training of deep neural networks. Such an elegant combination of beautiful theory and practical impact,” said Johannes Gehrke, Lab Director at Microsoft Research.

To put the theory into practice, Microsoft worked with OpenAI to unleash µ-Parametrization on GPT-3, a natural language model whose largest iteration is made up of 175 billion parameters.

“After parameterizing a version of GPT-3 with relative attention in µP, we tuned a small proxy model with 40 million parameters before copying the best hyperparameter combination to the 6.7-billion parameter variant of GPT-3,” Microsoft explained.

The results were quite startling: the collaborators managed to create a more performant version of GPT-3, with the tuning process consuming just 7% of the compute power used in pretraining the 6.7-billion parameter model.

To help other practitioners benefit from these findings, Microsoft has published a PyTorch package designed to make it easier to integrate µ-Parametrization into existing models, a process that can supposedly be finicky in practice.
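A minimal sketch of what that integration might look like, assuming the package's documented building blocks (`MuReadout`, `set_base_shapes` and `MuAdam`) and a throwaway model invented for the example: the changes largely amount to swapping the output layer, declaring how tensor shapes scale with width, and switching to a µP-aware optimizer.

```python
import torch.nn as nn
from mup import MuReadout, MuAdam, set_base_shapes

def build(width, d_in=64, n_classes=10):
    # An ordinary PyTorch model, except the final nn.Linear is replaced by MuReadout
    # so that the output layer's initialization and learning rate follow the µP rules.
    return nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        MuReadout(width, n_classes),
    )

model = build(width=1024)                        # the model you actually want to train
# Declare how each dimension scales with width by comparing two reference widths.
set_base_shapes(model, build(width=64), delta=build(width=128))
# MuAdam applies µP's per-parameter learning-rate scaling on top of one base value,
# which is what allows a rate tuned on a small proxy to carry over unchanged.
optimizer = MuAdam(model.parameters(), lr=1e-3)  # value copied from the proxy sweep
```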

The company concedes there is still plenty to be understood about the scaling of AI models, however, and has pledged to continue working to "derive more principled approaches to large-scale machine learning".

Joel Khalili
News and Features Editor

Joel Khalili is the News and Features Editor at TechRadar Pro, covering cybersecurity, data privacy, cloud, AI, blockchain, internet infrastructure, 5G, data storage and computing. He's responsible for curating our news content, as well as commissioning and producing features on the technologies that are transforming the way the world does business.
