Recent advances in robotics have enabled robots to navigate complex scenes and manipulate diverse objects independently. However, robots still struggle with many household tasks that require coordinated behaviors, such as opening doors. Factorizing navigation and manipulation, while effective for some tasks, fails in scenarios that demand coordinated actions.
In this work, we address diverse mobile manipulation tasks integral to everyday human life. Trained entirely in photo-realistic simulation, our controller accomplishes these tasks through harmonious mobile manipulation in a real-world apartment with a novel layout, without any fine-tuning or adaptation.
Our HarmonicMM controller takes robot proprioception and multi-view visual observations as input and outputs navigation and manipulation commands simultaneously. Each visual observation is preprocessed by a frozen DINOv2 encoder followed by a separate small CNN, then fed into the GRU and policy together with the proprioception reading.
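For readers who want a concrete picture, the sketch below shows one way this pipeline could be wired up in PyTorch. It is our illustrative reading of the architecture described above, not the released implementation: the DINOv2 variant (ViT-S/14 via torch.hub), layer widths, view count, and the navigation/manipulation command dimensions are all assumptions.

import torch
import torch.nn as nn

class HarmonicMMPolicySketch(nn.Module):
    # Hypothetical sketch: each camera view goes through a frozen DINOv2
    # backbone plus a small trainable per-view CNN; features are concatenated
    # with proprioception, passed through a GRU, and decoded into simultaneous
    # navigation and manipulation commands. Sizes are illustrative.
    def __init__(self, num_views=2, proprio_dim=10, hidden_dim=512,
                 nav_dim=2, manip_dim=4):
        super().__init__()
        # Frozen DINOv2 encoder shared across views (ViT-S/14 is an assumption).
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        self.backbone.eval()
        for p in self.backbone.parameters():
            p.requires_grad = False
        # One small trainable CNN per camera view, applied to the DINOv2
        # patch-token feature map (384 channels for ViT-S/14).
        self.view_cnns = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(384, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            ) for _ in range(num_views)
        ])
        self.gru = nn.GRU(64 * num_views + proprio_dim, hidden_dim, batch_first=True)
        # Single head emitting navigation and manipulation commands together.
        self.policy_head = nn.Linear(hidden_dim, nav_dim + manip_dim)
        self.nav_dim = nav_dim

    def encode_view(self, cnn, img):
        # img: (B, 3, 224, 224). Reshape DINOv2 patch tokens back to a grid
        # (16x16 patches for 224x224 input with patch size 14).
        with torch.no_grad():
            tokens = self.backbone.forward_features(img)["x_norm_patchtokens"]
        b, n, c = tokens.shape
        s = int(n ** 0.5)
        return cnn(tokens.transpose(1, 2).reshape(b, c, s, s))

    def forward(self, images, proprio, hidden=None):
        # images: list of num_views tensors (B, 3, 224, 224); proprio: (B, proprio_dim)
        feats = [self.encode_view(cnn, img) for cnn, img in zip(self.view_cnns, images)]
        x = torch.cat(feats + [proprio], dim=-1).unsqueeze(1)  # one timestep
        out, hidden = self.gru(x, hidden)
        cmd = self.policy_head(out.squeeze(1))
        # Navigation and manipulation commands are emitted at the same time.
        return cmd[:, :self.nav_dim], cmd[:, self.nav_dim:], hidden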
We evaluate the task-completion performance of different methods. Our method outperforms the baselines on all tasks, achieving higher Success Rate and higher Progress.
We evaluate the efficiency of different controllers. Our method outperforms the baselines on all tasks, achieving higher Progress Speed and shorter Episode Length (Eps-Length).
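For concreteness, here is a minimal sketch of how such episode-level metrics might be aggregated, assuming each episode log records a success flag, a progress fraction in [0, 1], and a step count. The exact metric definitions are given in the paper; the record fields and formulas below (e.g. Progress Speed as progress per control step) are hypothetical.

from dataclasses import dataclass

@dataclass
class EpisodeLog:
    # Hypothetical per-episode record; fields mirror the reported metrics
    # only loosely, and the paper's exact formulas may differ.
    success: bool     # task fully completed
    progress: float   # fraction of task sub-goals achieved, in [0, 1]
    num_steps: int    # episode length in control steps

def aggregate(logs: list[EpisodeLog]) -> dict[str, float]:
    n = len(logs)
    return {
        "success_rate": sum(e.success for e in logs) / n,
        "progress": sum(e.progress for e in logs) / n,
        # Assumed definition: average progress gained per control step.
        "progress_speed": sum(e.progress / e.num_steps for e in logs) / n,
        "episode_length": sum(e.num_steps for e in logs) / n,
    }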
@misc{yang2024harmonicmobilemanipulation,
      title={Harmonic Mobile Manipulation},
      author={Ruihan Yang and Yejin Kim and Rose Hendrix and Aniruddha Kembhavi and Xiaolong Wang and Kiana Ehsani},
      year={2024},
      eprint={2312.06639},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2312.06639},
}