dataset June 29, 2026

HIW-500: 500 Hours of Humanoid Robot Learning in Real Homes

HIW-500 (Humanoids In-the-Wild) is a large‑scale dataset released by BitRobot that captures whole‑body teleoperation of the Unitree G1 humanoid robot in real household environments in Southeast Asia. It comprises over 500 hours of demonstrations, 23,000+ episodes, and roughly 10 TB of synchronized data, covering more than ten everyday tasks across twelve homes. Each episode includes multimodal streams (head‑mounted stereo RGB video, wrist‑mounted RGB and stereo IR video, 29‑degree‑of‑freedom joint states, end‑effector information, IMU, and odometry) together with metadata such as language annotations and sub‑task labels (161 distinct sub‑tasks with more than 148,000 annotations).
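The headline figures give a rough sense of scale per episode. The sketch below treats "over 500 hours", "23,000+ episodes", and "roughly 10 TB" as exact values and assumes decimal terabytes, so its outputs are back-of-envelope approximations, not published statistics. The function name dataset_averages is illustrative only.

```python
# Back-of-envelope averages from the headline HIW-500 figures.
# Assumption: the lower bounds / rough values in the report are treated as
# exact, and 1 TB = 10**12 bytes. Results are approximations only.

HOURS = 500
EPISODES = 23_000
TOTAL_BYTES = 10 * 10**12  # roughly 10 TB, decimal units (assumption)


def dataset_averages(hours: float, episodes: int, total_bytes: float) -> dict:
    """Return mean episode length (s), mean episode size (MB), mean data rate (MB/s)."""
    seconds = hours * 3600
    return {
        "mean_episode_s": seconds / episodes,
        "mean_episode_mb": total_bytes / episodes / 1e6,
        "mean_rate_mb_per_s": total_bytes / seconds / 1e6,
    }


if __name__ == "__main__":
    for name, value in dataset_averages(HOURS, EPISODES, TOTAL_BYTES).items():
        print(f"{name}: {value:.1f}")
```

With these inputs it prints a mean episode length of about 78 s, a mean episode size of about 435 MB, and an average recorded data rate of about 5.6 MB/s. Because the hour and episode counts are stated as lower bounds, the true averages may differ somewhat.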

The dataset is designed for research in mobile manipulation, bimanual interaction, long‑horizon household skill acquisition, and imitation learning from in‑the‑wild demonstrations. It is distributed in two formats: raw ROS bag/MCAP recordings for robotics pipelines, and the LeRobot format for direct use with existing learning frameworks. The data is released under the CC‑BY‑4.0 license, which permits reuse with attribution; BitRobot also offers commercial access for extended coverage or custom data collection.
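Raw ROS bag/MCAP recordings store each sensor topic as its own stream of timestamped messages at its own rate, while frame-based formats such as LeRobot store samples aligned to a fixed frame rate. The report does not describe how HIW-500's two formats are produced, so the following is a hypothetical sketch of one common alignment strategy (zero-order hold), using plain Python lists in place of real bag or MCAP readers. The function name align_streams and the stream names are illustrative only.

```python
# Hypothetical sketch: align independently timestamped sensor streams onto a
# fixed-rate frame grid. Each frame takes the latest sample at or before its
# time (zero-order hold). This is NOT BitRobot's conversion code.
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, object]  # (timestamp in seconds, payload)
_EPS = 1e-9  # tolerance so float rounding does not add a spurious frame


def align_streams(
    streams: Dict[str, Sequence[Sample]],
    fps: float,
    start: float,
    end: float,
) -> List[Dict[str, object]]:
    """Resample each stream onto frame times start + i / fps that fall before end.

    Each stream must already be sorted by timestamp. A frame gets the latest
    sample at or before its time, or None if that stream has not started yet.
    """
    # Pre-extract timestamps once per stream for binary search.
    times = {name: [t for t, _ in samples] for name, samples in streams.items()}
    n_frames = max(0, math.ceil((end - start) * fps - _EPS))
    frames: List[Dict[str, object]] = []
    for i in range(n_frames):
        t = start + i / fps
        frame: Dict[str, object] = {"timestamp": t}
        for name, samples in streams.items():
            # Index of the last sample whose timestamp is <= t.
            idx = bisect_right(times[name], t) - 1
            frame[name] = samples[idx][1] if idx >= 0 else None
        frames.append(frame)
    return frames
```

For example, with a 50 Hz joint stream [(0.00, "q0"), (0.02, "q1"), (0.04, "q2"), (0.06, "q3")], a camera stream [(0.00, "img0"), (0.033, "img1")], fps=20, start=0.0 and end=0.1, the function returns two frames: t = 0.0 holds q0/img0 and t = 0.05 holds q2/img1. Taking the latest earlier sample rather than interpolating never leaks future observations into a frame, at the cost of up to one sample period of staleness.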

HIW-500’s breadth, spanning diverse home layouts, lighting conditions, object states, and operator styles, makes it a valuable benchmark for general‑purpose robot learning algorithms, especially those that must cope with real‑world variability and long‑horizon planning in domestic settings.

Project Ideas

  1. Train an imitation‑learning policy that can replicate the "building children table" task using the raw ROS bag recordings.
  2. Develop a multimodal perception model that fuses head‑camera RGB stereo and wrist‑camera IR streams to improve object detection in cluttered home environments.
  3. Benchmark long‑horizon planning algorithms by evaluating how well they can sequence the 161 sub‑tasks to complete a multi‑step household chore.
  4. Create a language‑grounded command interpreter that maps the provided language annotations to robot joint trajectories for zero‑shot task execution.
  5. Build a simulation‑to‑real transfer pipeline that imports the LeRobot‑format demonstrations into a simulated environment for pre‑training, then fine‑tunes on the real‑world demonstrations.
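Several of these ideas, the imitation-learning policy and the long-horizon sequencing benchmark in particular, need sub-task annotations as time segments rather than raw labels. The report does not specify how labels are stored, so this sketch assumes, hypothetically, one sub-task label per frame (None where unannotated). It collapses those labels into contiguous segments and counts which sub-task follows which across episodes, giving a simple ordering prior against which a sequencing algorithm could be compared. The labels "grasp", "lift", and "place" are placeholders, not HIW-500 sub-task names.

```python
# Hypothetical sketch: turn per-frame sub-task labels into segments and
# count sub-task transitions. The per-frame label layout is an assumption.
from collections import Counter
from itertools import groupby
from typing import Iterable, List, Optional, Sequence, Tuple

Segment = Tuple[str, int, int]  # (sub-task label, first frame, last frame), inclusive


def segments(frame_labels: Sequence[Optional[str]]) -> List[Segment]:
    """Collapse per-frame sub-task labels into contiguous segments.

    Frames labelled None (no annotation) are dropped from the output.
    """
    out: List[Segment] = []
    idx = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label is not None:
            out.append((label, idx, idx + length - 1))
        idx += length
    return out


def transition_counts(episodes: Iterable[Sequence[Optional[str]]]) -> Counter:
    """Count how often sub-task A is immediately followed by sub-task B.

    Unlabelled gaps are skipped, so A, None, A yields an A -> A transition.
    """
    counts: Counter = Counter()
    for labels in episodes:
        order = [label for label, _, _ in segments(labels)]
        counts.update(zip(order, order[1:]))
    return counts
```

For the frame labels ["grasp", "grasp", "lift", None, "place", "place"], segments returns [("grasp", 0, 1), ("lift", 2, 2), ("place", 4, 5)], and the segment boundaries can serve as sub-task targets for a policy or as ground truth for a planner's predicted ordering.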