Transferring World Models from Simulation for Efficient Real-World Finetuning

Under submission

Robot learning requires a considerable amount of data to realize its promise of generalization. However, collecting the scale of data necessary for generalization entirely in the real world is challenging. Simulation can serve as a source of plentiful data with coverage over relevant states and actions, without the burden of human data collection. Yet even high-fidelity physics simulators are fundamentally misspecified approximations to reality, making direct zero-shot transfer difficult. This makes real-world finetuning of policies pretrained in simulation an attractive approach to robot learning. Current finetuning methods, however, use the simulator only to provide a reasonable initialization for real-world learning. We go beyond this paradigm by demonstrating how task structure extracted from simulation can effectively guide and accelerate learning in the real world. Specifically, we argue that dynamics models and value functions learned in simulation can provide gradient information for the real-world learning problem, substantially reducing the complexity of learning a finetuned policy on the real system. We demonstrate our approach across several tabletop manipulation tasks in simulation and the real world, learning successful policies for problems that are challenging to handle using purely real-world data.
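To make the core idea concrete, the sketch below illustrates how a dynamics model and value function pretrained in simulation can supply gradient information for improving real-world actions. Everything here is a hypothetical stand-in, not the paper's implementation: a linear dynamics model and quadratic value function are assumed so that the action gradient of the model-predicted value is available in closed form.

```python
import numpy as np

# Hypothetical stand-ins for models pretrained in simulation:
# a linear dynamics model f(s, a) = A s + B a and a quadratic
# value function V(s) = -s^T Q s (higher is better near the origin).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)

def dynamics(s, a):
    """Sim-learned one-step dynamics model f(s, a)."""
    return A @ s + B @ a

def value(s):
    """Sim-learned value function V(s)."""
    return -s @ Q @ s

def action_gradient(s, a):
    """Gradient of V(f(s, a)) with respect to the action a.

    For this linear-quadratic stand-in,
    d/da V(f(s, a)) = -2 f(s, a)^T Q B.
    """
    s_next = dynamics(s, a)
    return -2.0 * (Q @ s_next) @ B

# Refine a candidate action by ascending the simulation-derived
# value gradient, instead of searching from scratch in the real world.
s = np.array([1.0, 0.5])
a = np.zeros(1)
for _ in range(200):
    a = a + 0.5 * action_gradient(s, a)

# The refined action attains a higher model-predicted value
# than the initial zero action.
assert value(dynamics(s, a)) > value(dynamics(s, np.zeros(1)))
```

In practice the dynamics model and value function would be neural networks, with the action gradient obtained by backpropagating through them; the point of the sketch is only that simulation-derived gradients can steer real-world policy improvement without relying solely on real-world trial and error.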