Google's AI robots are learning from watching movies – just like the rest of us

Google DeepMind AI Robot
(Image credit: Google DeepMind)

Google DeepMind's robotics team is teaching robots to learn the way a new human intern would: by watching videos. The team has published a new paper demonstrating how Google's RT-2 robots, powered by the Gemini 1.5 Pro generative AI model, can absorb information from video to learn their way around a space and even carry out requests once they reach their destination.

That intern-style training is possible thanks to the Gemini 1.5 Pro model's long context window, which lets the AI process large amounts of information at once. The researchers film a video tour of a designated area, such as a home or office, and the robot then watches the video to learn the layout of the environment.

The details in the video tours let the robot complete tasks based on its learned knowledge, using both verbal and image outputs. It's an impressive way of showing how robots might interact with their environment in ways reminiscent of human behavior. You can see how it works in the video below, as well as examples of different tasks the robot might carry out.

Robot AI Expertise

Those demonstrations aren't rare flukes, either. In practical tests, Gemini-powered robots operated within a 9,000-square-foot area and successfully followed more than 50 different user instructions with a 90 percent success rate. That level of accuracy opens up many potential real-world uses for AI-powered robots, from helping with chores at home to taking on menial, and eventually more complex, tasks at work.

That's because one of the more notable aspects of the Gemini 1.5 Pro model is its ability to complete multi-step tasks. DeepMind's research found that the robots can work out whether a specific drink is available, for instance, by navigating to a refrigerator, visually processing what's inside, and then returning to report the answer.

Planning and carrying out that entire sequence of actions demonstrates a level of understanding and execution that goes beyond the single-step commands most robots can handle today.

Don't expect to see this robot for sale any time soon, though. For one thing, it currently takes up to 30 seconds to process each instruction, which in most cases is far slower than simply doing the task yourself. And the chaos of real-world homes and offices will be much harder for a robot to navigate than a controlled environment, no matter how advanced the AI model is.

Still, integrating AI models like Gemini 1.5 Pro into robotics is part of a larger leap forward in the field. Robots equipped with models like Gemini or its rivals could transform healthcare, shipping, and even janitorial duties.
