Forget Sora, this is the AI video that will blow your mind – and maybe scare you

Figure 01 robot
(Image credit: Figure AI)

Humanoid robot development has moved at a snail's pace for the better part of two decades, but a rapid acceleration is underway thanks to a collaboration between Figure AI and OpenAI. The result is the most stunning bit of real humanoid robot video I've ever seen.

On Wednesday, robotics startup Figure AI released a video update (see below) of its Figure 01 robot running a new Visual Language Model (VLM) that has somehow transformed it from a rather uninteresting automaton into a full-fledged sci-fi robot that approaches C-3PO-level capabilities.

In the video, Figure 01 stands behind a table set with a plate, an apple, and a cup. To the left is a drainer. A human stands in front of the robot and asks, "Figure 01, what do you see right now?"

After a few seconds, Figure 01 responds in a remarkably human-sounding voice (there is no face, just an animated light that moves in sync with the voice), listing everything on the table and describing the man standing before it.

"That's cool," I thought.

Then the man asks, "Hey, can I have something to eat?"

Figure 01 responds, "Sure thing," and then, with a dexterous flourish of fluid movement, picks up the apple and hands it to the guy.

"Whoa," I thought.

Next, the man empties some crumpled debris from a bin in front of Figure 01 while asking, "Can you explain why you did what you just did while you pick up this trash?"

Figure 01 wastes no time explaining its reasoning while placing the paper back into the bin. "So, I gave you the apple because it's the only edible item I could provide you with from the table."

I thought, "This can't be real."

It is, though, at least according to Figure AI.

Speech-to-speech

The company explained in a release that Figure 01 engages in "speech-to-speech" reasoning, using OpenAI's pre-trained multimodal model, VLM, to understand images and text, and draws on the entire voice conversation to craft its responses. This is different from, say, OpenAI's GPT-4, which focuses on written prompts.

It's also using what the company calls "learned low-level bimanual manipulation." The system matches precise image calibrations (down to a pixel level) with its neural network to control movement. "These networks take in onboard images at 10hz, and generate 24-DOF actions (wrist poses and finger joint angles) at 200hz," Figure AI wrote in a release.
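To get a feel for what those two rates imply, here is a minimal Python sketch of a loop that looks at a new image 10 times a second and emits a 24-value action 200 times a second. Only the three numbers come from Figure AI's quote. Everything else is an assumption for illustration: `fake_policy` is a hypothetical stand-in for the neural network, and simply holding the latest target between frames is a guess, since the release does not say how the two rates are bridged.

```python
# Rates and action size taken from Figure AI's quote; all else is illustrative.
IMAGE_HZ = 10    # onboard images consumed per second
ACTION_HZ = 200  # actions generated per second
DOF = 24         # wrist poses and finger joint angles


def fake_policy(frame_index):
    """Hypothetical stand-in for the network: one 24-value target per image."""
    return [float(frame_index)] * DOF


def run(duration_s):
    """Simulate the loop tick by tick and return (frames seen, actions emitted)."""
    steps_per_frame = ACTION_HZ // IMAGE_HZ  # 20 action ticks per image
    total_ticks = int(duration_s * ACTION_HZ)
    actions = []
    frames_seen = 0
    target = None
    for tick in range(total_ticks):
        # A new image arrives on every 20th tick; refresh the target then.
        if tick % steps_per_frame == 0:
            target = fake_policy(frames_seen)
            frames_seen += 1
        # Assumption: the latest target is held until the next image arrives.
        actions.append(list(target))
    return frames_seen, actions


frames, actions = run(1.0)
print(frames, len(actions), len(actions[0]))  # prints: 10 200 24
```

In other words, under these assumptions the robot issues 20 movement commands for every image it processes.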

The company claims that every behavior in the video is based on system learning and is not teleoperated, meaning there's no one behind the scenes puppeteering Figure 01.

Without seeing Figure 01 in person and asking my own questions, it's hard to verify these claims. It's possible that this is not the first time Figure 01 has run through this routine. It could've been the 100th time, which might account for its speed and fluidity.

Or maybe this is 100% real, and in that case, wow. Just wow.

Lance Ulanoff
Editor At Large

A 38-year industry veteran and award-winning journalist, Lance has covered technology since PCs were the size of suitcases and “on line” meant “waiting.” He’s a former Lifewire Editor-in-Chief, Mashable Editor-in-Chief, and, before that, Editor in Chief of PCMag.com and Senior Vice President of Content for Ziff Davis, Inc. He also wrote a popular, weekly tech column for Medium called The Upgrade.

Lance Ulanoff makes frequent appearances on national, international, and local news programs including Live with Kelly and Mark, the Today Show, Good Morning America, CNBC, CNN, and the BBC. 
