For a robot learning parkour, practice makes perfect.
Lifelike mechanisms of the sort made famous by viral Boston Dynamics videos are often trained through simulations that mimic a range of real-world environments and situations, according to researchers at MIT’s Computer Science and AI Lab (CSAIL). But available virtual training data is hard to gather in the sheer quantities needed for such a process.
A team of researchers at the lab is proposing a new system that taps generative AI to solve this problem. LucidSim combines a visual generator, a physics simulator, and text-generated prompts to scale up the supply of diverse, realistic simulations on hand to train robots.
“Roboticists are trying to replicate the success of text-to-image models or language models like ChatGPT, but ChatGPT is trained on tens of trillions of tokens,” Ge Yang, a postdoctoral fellow at CSAIL and an author on the paper, told Tech Brew. “We’re nowhere close to that amount of data, so that’s the problem—robotics has a data problem.”
Yang hopes that the method could boost performance and help humanoid robots catch up to the progress made with language and image models during the AI revolution. Startups that create such robots have been gaining traction during the generative AI boom as tech companies like OpenAI consider the best way to deploy powerful models in the real world.
To scale up data creation, the researchers wrote a program that varies thousands of ChatGPT prompts across factors like season, time of day, and specific geography. The text prompts are then fed to a text-to-image model, paired with a physics and depth rendering simulation, to bring them to life for the robots.
“I have this Asian or Chinese alley collection, a Pennsylvania collection—each one of them corresponds to a small number of four or five meta prompts. And then you feed the meta-prompts through this auto-prompting setup to generate hundreds or thousands of image-specific prompts,” Yang said. “So, this way, with very little work, I can generate thousands of prompts instead of having to try to come up with the prompts myself.”
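The meta-prompt expansion Yang describes can be illustrated with a short sketch. This is not the team's actual code; the prompt templates, variation axes, and function names below are all hypothetical, standing in for however LucidSim's auto-prompting setup crosses a few meta-prompts with environmental factors to yield thousands of image-specific prompts.

```python
import itertools

# Hypothetical meta-prompts for one scene "collection" (illustrative only).
META_PROMPTS = [
    "a narrow alley with concrete stairs, {season}, {time_of_day}, {region}",
    "a cobblestone street with a high curb, {season}, {time_of_day}, {region}",
]

# Illustrative variation factors, per the article: season, time of day, geography.
SEASONS = ["spring", "summer", "autumn", "winter"]
TIMES_OF_DAY = ["dawn", "midday", "dusk", "night"]
REGIONS = ["East Asia", "Pennsylvania", "the Mediterranean coast"]

def expand_prompts(meta_prompts, seasons, times_of_day, regions):
    """Cross every meta-prompt with every combination of variation factors,
    producing one image-specific prompt per combination."""
    return [
        meta.format(season=s, time_of_day=t, region=r)
        for meta, s, t, r in itertools.product(
            meta_prompts, seasons, times_of_day, regions
        )
    ]

prompts = expand_prompts(META_PROMPTS, SEASONS, TIMES_OF_DAY, REGIONS)
print(len(prompts))  # 2 meta-prompts x 4 seasons x 4 times x 3 regions = 96
```

With real collections of four or five meta-prompts each and more variation axes, the same cross-product quickly yields the hundreds or thousands of prompts Yang describes, each of which would then be handed to the text-to-image model.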
To test how well the system works, the researchers simulated robot tasks like climbing stairs, chasing a soccer ball, and climbing over hurdles. In many cases, the method was able to outperform a process called domain randomization, currently the go-to method for generating simulations.
The team also tested the system against the other primary way to train robots—human expert imitation—and found it fared better than that method in certain tasks as well.
“I don’t over-claim, but the results are extremely encouraging,” Yang said. “If you look at imitation learning-based methods…there’s always this performance ceiling, depending on how difficult you make the success criteria…The reason is because learning is a process where, as your robot becomes more capable, the world it sees also becomes larger. So, for a robot that’s improving, the data it needs to actually keep improving also changes.”
Yang said LucidSim is the first step in a bigger project to improve robot learning and general intelligence. He said scaling up robots hand in hand with AI models is essential for the field to be able to tackle certain problems in the real world.
“A lot of these things we assume AI should be able to handle are in fact domains where there’s a rich, interactive world behind each one of the two-dimensional images,” Yang said. “This is why unless we actually solve these embodied scenarios with robots, you’re going to have this big chunk of problems with AI that you’re not going to be able to solve.”