Kitchen Robotics: 600 Hours of Human Tele‑Operated Demonstrations
nvidia/PhysicalAI-Robotics-Kitchen-Sim-Demos
PhysicalAI‑Robotics‑Kitchen‑Sim‑Demos is a large‑scale dataset released by NVIDIA that captures 600 hours of human‑teleoperated manipulation in a simulated kitchen environment. The data spans 316 distinct tasks, ranging from atomic actions such as opening a fridge drawer to composite procedures such as preparing a multi‑course meal, and totals 55,000 recorded trajectories. Each trajectory includes low‑dimensional proprioceptive and action data stored as Parquet files, synchronized MP4 video from three camera viewpoints (left and right third‑person plus eye‑in‑hand), and MuJoCo‑specific metadata for simulation replay.
The dataset follows the LeRobot format, providing comprehensive metadata files (info.json, tasks.jsonl, episodes.jsonl, etc.) that describe the robot embodiment (Franka Panda on an Omron mobile base), observation modalities, and episode statistics. Extras contain compressed MJCF model definitions and raw MuJoCo state files, enabling researchers to recreate the exact simulation conditions. All content is released under a CC‑BY‑4.0 license, making it freely reusable for academic and commercial research.
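The metadata files are plain JSON and JSON Lines, so they can be inspected with the standard library alone. The sketch below parses small inline stand-ins for `info.json` and `episodes.jsonl`; the field names and values are assumptions modeled on the LeRobot format, not verified against this dataset.

```python
# Sketch: parsing LeRobot-style metadata. The inline strings below are
# fabricated examples; real field names may differ.
import json

info_json = '{"robot_type": "franka_panda_omron", "fps": 20, "total_episodes": 2}'
episodes_jsonl = "\n".join([
    '{"episode_index": 0, "tasks": ["open the fridge drawer"], "length": 410}',
    '{"episode_index": 1, "tasks": ["wipe the counter"], "length": 295}',
])

info = json.loads(info_json)
# episodes.jsonl holds one JSON object per line, one per recorded episode.
episodes = [json.loads(line) for line in episodes_jsonl.splitlines()]

total_frames = sum(ep["length"] for ep in episodes)
print(info["robot_type"], total_frames)  # → franka_panda_omron 705
```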
PhysicalAI‑Robotics‑Kitchen‑Sim‑Demos is positioned for imitation learning, behavior cloning, and hierarchical policy research in the robotics domain. By offering both sensor streams and visual recordings, it supports multimodal learning approaches and sim‑to‑real transfer experiments. The breadth of tasks—covering navigation, object manipulation, cooking, cleaning, and organization—provides a rich testbed for evaluating general‑purpose kitchen assistants and benchmarking new learning algorithms.
Project Ideas
- Train a behavior‑cloning policy that can open, close, and manipulate kitchen appliances using the low‑dimensional action and proprioception data.
- Develop a hierarchical task planner that composes atomic policies into composite meal‑preparation sequences using the provided task hierarchy metadata.
- Benchmark sim‑to‑real transfer by fine‑tuning policies trained on the MuJoCo states and then deploying them on a real Franka Panda robot.
- Create a multimodal perception model that fuses proprioceptive signals with the three‑view video streams to improve visual‑servoing performance.
- Use the trajectory videos to generate synthetic training data for vision‑only policies, evaluating how visual imitation compares to state‑based imitation.
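The first project idea above reduces to supervised regression from proprioceptive state to action. As a dependency-free illustration, the sketch below clones a hidden "expert" linear controller from synthetic state/action pairs via least squares; the synthetic arrays stand in for the dataset's Parquet columns, and a real policy would of course be a nonlinear network trained on the actual demonstrations.

```python
# Sketch: behavior cloning as least-squares regression on synthetic
# state/action data (a stand-in for the dataset's Parquet columns).
import numpy as np

rng = np.random.default_rng(0)

# Fabricated demonstrations: 500 frames of 7-D state, actions produced
# by a hidden expert linear controller plus small noise.
states = rng.normal(size=(500, 7))
expert_gain = rng.normal(size=(7, 7))
actions = states @ expert_gain + 0.01 * rng.normal(size=(500, 7))

# Fit the cloned policy W minimizing ||states @ W - actions||^2.
W, *_ = np.linalg.lstsq(states, actions, rcond=None)

# On held-out states, the cloned policy should closely track the expert.
test_states = rng.normal(size=(50, 7))
err = np.abs(test_states @ W - test_states @ expert_gain).max()
print(f"max action error: {err:.4f}")
```

With the real data, `states` and `actions` would come from the `observation.state` and `action` columns, and evaluation would replay the cloned policy in the released MuJoCo scenes rather than comparing against a known gain matrix.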