VisualMimic

Visual Humanoid Loco-Manipulation via Motion Tracking and Generation

VisualMimic Team



VisualMimic enables *generalizable* visuomotor skills across time & space.

Morning
Dusk
Evening
Midnight
Hoover Tower
Engineering Building
Robotics Center
Memorial Church

VisualMimic enables humanoid loco-manipulation with *whole-body dexterity*.

Kick Box
Kick Ball
Lift Box
Push Box with Feet and Hands
Push Box with Shoulder
Push Box with Two Hands

VisualMimic Framework

Method

Unveiling Key Designs in VisualMimic

More Results

Diverse Simulation Tasks with Vision

Real-World Results

Sim2Sim Results


Related Work

Acknowledgements

We would like to thank all members of the CogAI group and The Movement Lab at Stanford University for their support. We also thank the Stanford Robotics Center for providing the experiment space. This work is in part supported by the Stanford Institute for Human-Centered AI (HAI), the Stanford Robotics Center (SRC), ONR MURI N00014-22-1-2740, ONR MURI N00014-24-1-2748, and NSF:FRR 215385.

BibTeX

@article{shao2025visualmimic,
  title={VisualMimic: Visual Humanoid Loco-Manipulation via Motion Tracking and Generation},
  author={Shaofeng Yin and Yanjie Ze and Hong-Xing Yu and C. Karen Liu and Jiajun Wu},
  year={2025},
  journal={arXiv preprint arXiv:2509.20322}
}

Website modified from TWIST.
© 2025 Yanjie Ze