Harmonic Mobile Manipulation

1 UC San Diego, 2 PRIOR @ Allen Institute for AI, 3 University of Washington, Seattle

IROS 2024 Best Paper on Mobile Manipulation / Oral

Abstract

Recent advancements in robotics have enabled robots to navigate complex scenes or manipulate diverse objects independently. However, robots still struggle with many household tasks that require coordinated behaviors, such as opening doors. Factorizing navigation and manipulation, while effective for some tasks, fails in scenarios that require coordinated actions. Our primary contributions are:

  • An end-to-end learning approach that jointly optimizes navigation and manipulation, achieving an absolute improvement of 17.6% in average success rate across tasks compared to previous methods.
  • Support in ProcTHOR for more complex tasks, such as door opening and table cleaning.
  • Successful transfer of agents trained in simulation to real-world settings using only RGB visual observations and proprioception.
  • A new benchmark for complex mobile manipulation tasks, including opening fridges, cleaning tables, and opening doors by pulling and pushing.


Video


Harmonic Mobile Manipulation (HarmonicMM)

In this work, we address diverse mobile manipulation tasks integral to daily human life. Trained in a photo-realistic simulation, our controller accomplishes these tasks through harmonious mobile manipulation in a real-world apartment with a novel layout, without any fine-tuning or adaptation.

Our HarmonicMM controller takes robot proprioception and multi-view visual observations as input and outputs navigation and manipulation commands at the same time. Each visual observation is processed by a frozen DINOv2 encoder followed by a separate small CNN, then fed into the GRU and policy along with the proprioception reading.
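The data flow described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Everything named here is a hypothetical stand-in: a fixed random projection replaces the frozen DINOv2 encoder, a linear layer with ReLU replaces each small CNN, the two linear output heads and all dimensions are assumptions, and the weights are random rather than trained. The sketch only shows how per-view features and proprioception could be fused in a GRU that emits navigation and manipulation commands in the same step.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes are hypothetical, chosen only to keep the sketch small.
NUM_VIEWS, IMG_SHAPE = 2, (8, 8, 3)
FEAT_DIM, VIEW_DIM = 32, 16
PROPRIO_DIM, HIDDEN_DIM = 6, 24
NAV_DIM, MANIP_DIM = 2, 4


def dense(n_in, n_out):
    """Return a random weight matrix and a zero bias for a linear layer."""
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Stand-in for the frozen visual encoder: a fixed random projection of the
# flattened image. A real system would call a pretrained DINOv2 model here.
frozen_proj = rng.normal(0.0, 0.1, (int(np.prod(IMG_SHAPE)), FEAT_DIM))


def frozen_encoder(image):
    return image.reshape(-1) @ frozen_proj


# One separate trainable head per camera view (stand-in for the small CNN).
view_heads = [dense(FEAT_DIM, VIEW_DIM) for _ in range(NUM_VIEWS)]

# GRU cell parameters; the cell input is all view features plus proprioception.
IN_DIM = NUM_VIEWS * VIEW_DIM + PROPRIO_DIM
W_z, b_z = dense(IN_DIM + HIDDEN_DIM, HIDDEN_DIM)  # update gate
W_r, b_r = dense(IN_DIM + HIDDEN_DIM, HIDDEN_DIM)  # reset gate
W_h, b_h = dense(IN_DIM + HIDDEN_DIM, HIDDEN_DIM)  # candidate state

# Assumed policy heads: both read the same recurrent state.
W_nav, b_nav = dense(HIDDEN_DIM, NAV_DIM)
W_man, b_man = dense(HIDDEN_DIM, MANIP_DIM)


def gru_step(x, h):
    """One GRU update of hidden state h given input x."""
    xh = np.concatenate([x, h])
    z = sigmoid(xh @ W_z + b_z)
    r = sigmoid(xh @ W_r + b_r)
    h_tilde = np.tanh(np.concatenate([x, r * h]) @ W_h + b_h)
    return (1.0 - z) * h + z * h_tilde


def policy_step(images, proprio, h):
    """Encode each view, fuse with proprioception, and emit both commands."""
    feats = [
        np.maximum(frozen_encoder(img) @ W + b, 0.0)
        for img, (W, b) in zip(images, view_heads)
    ]
    h = gru_step(np.concatenate(feats + [proprio]), h)
    nav = h @ W_nav + b_nav      # navigation command
    manip = h @ W_man + b_man    # manipulation command
    return nav, manip, h


# Usage: one control step on random inputs.
h = np.zeros(HIDDEN_DIM)
images = [rng.random(IMG_SHAPE) for _ in range(NUM_VIEWS)]
proprio = rng.random(PROPRIO_DIM)
nav, manip, h = policy_step(images, proprio, h)
print(nav.shape, manip.shape, h.shape)
```

Because both heads read the same recurrent state, the navigation and manipulation outputs are produced jointly rather than by two separate controllers, which is the point of the design described above.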


Real World Deployment (Without Any Fine-tuning or Adaptation)

Opening Door (Pull) (With Corresponding Visual Observation)

[Videos: three pairs of panels, each labeled "HarmonicMM" and "Manipulation Camera"]

Opening Door (Push)

Cleaning Table


Simulation Trajectories (With Corresponding Visual Observation)

Opening Door (Pull)

[Videos: two panels labeled "HarmonicMM"]

Opening Door (Push)

[Videos: two panels labeled "HarmonicMM"]

Cleaning Table

[Videos: two panels labeled "HarmonicMM"]

Opening Fridge

[Videos: two panels labeled "HarmonicMM"]


Quantitative Results

We evaluate the task completion performance of different methods. Our method outperforms the baselines on all tasks, with higher Success Rate and higher Progress.

We evaluate the efficiency of different controllers. Our method outperforms the baselines on all tasks, with higher Progress Speed and shorter Episode Length (Eps-Length).
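As a rough illustration of how the four metrics named above could be aggregated from episode logs, here is a small Python sketch. The metric definitions are assumptions, not taken from the paper: progress is treated as the fraction of the task completed in [0, 1], and progress speed as progress divided by the number of steps. The `Episode` record, the `summarize` helper, and the toy episodes are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Episode:
    success: bool     # whether the task was completed
    progress: float   # assumed: fraction of the task completed, in [0, 1]
    length: int       # number of control steps in the episode


def summarize(episodes):
    """Average the per-episode metrics over a list of episodes."""
    return {
        "success_rate": mean(1.0 if e.success else 0.0 for e in episodes),
        "progress": mean(e.progress for e in episodes),
        # Assumed definition: progress made per control step.
        "progress_speed": mean(e.progress / e.length for e in episodes),
        "eps_length": mean(e.length for e in episodes),
    }


# Hypothetical toy episodes, not results from the paper.
episodes = [
    Episode(True, 1.0, 100),
    Episode(False, 0.5, 200),
    Episode(True, 1.0, 50),
    Episode(False, 0.0, 250),
]
summary = summarize(episodes)
print(summary)
```

Under these assumed definitions, a controller can raise Progress Speed either by making more progress or by finishing in fewer steps, which is why it is reported alongside Episode Length.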


BibTeX


      @misc{yang2024harmonicmobilemanipulation,
        title={Harmonic Mobile Manipulation}, 
        author={Ruihan Yang and Yejin Kim and Rose Hendrix and Aniruddha Kembhavi and Xiaolong Wang and Kiana Ehsani},
        year={2024},
        eprint={2312.06639},
        archivePrefix={arXiv},
        primaryClass={cs.RO},
        url={https://arxiv.org/abs/2312.06639}, 
      }