VisualMimic

Visual Humanoid Loco-Manipulation via Motion Tracking and Generation

VisualMimic Team



VisualMimic enables *generalizable* visuomotor skills across time & space.

Morning
Dusk
Evening
Midnight
Hoover Tower
Engineering Building
Robotics Center
Memorial Church

VisualMimic enables humanoid loco-manipulation with *whole-body dexterity*.

Kick Box
Kick Ball
Lift Box
Push Box with Feet and Hands
Push Box with Shoulder
Push Box with Two Hands

VisualMimic Framework

Method

Unveiling Key Designs in VisualMimic

More Results

Diverse Simulation Tasks with Vision

Real-World Results

Sim2Sim Results


Related Work

Acknowledgements

We would like to thank all members of the CogAI group and The Movement Lab at Stanford University for their support. We also thank the Stanford Robotics Center for providing the experiment space. This work is in part supported by the Stanford Institute for Human-Centered AI (HAI), the Stanford Robotics Center (SRC), ONR MURI N00014-22-1-2740, ONR MURI N00014-24-1-2748, and NSF:FRR 215385.

BibTeX

@article{shao2025visualmimic,
  title={VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation},
  author={Shaofeng Yin and Yanjie Ze and Hong-Xing Yu and C. Karen Liu and Jiajun Wu},
  year={2025},
  journal={arXiv preprint arXiv:2509.20322}
}

Website modified from TWIST.
© 2025 Yanjie Ze