Format Specification

LeRobot Data is the EgoSuite export format intended for robot learning workflows. It organizes egocentric episodes into a training-friendly dataset layout for downstream policy learning, evaluation, and data loading.

Overview

This dataset contains egocentric data captured from a head-mounted device and optional wrist-mounted cameras. Each episode records a single continuous human activity, including synchronized RGB video streams, head pose, hand pose, optional body pose in world coordinates, and action-level semantic annotations. All camera videos are lens-distortion corrected (undistorted).

Folder Structure

Each episode is stored as a self-contained LeRobot v3.0 dataset:

{task_name}/{episode_uuid}/
├── data/
│   └── chunk-000/
│       └── file-000.parquet        # per-frame state data
├── videos/
│   └── observation.images.{cam}/
│       └── chunk-000/
│           └── file-000.mp4        # video per camera
├── meta/
│   ├── info.json                   # dataset-level metadata
│   ├── tasks.parquet               # task index mapping
│   ├── subtasks.parquet            # subtask index mapping
│   └── episodes/
│       └── chunk-000/
│           └── file-000.parquet    # episode-level metadata & stats
├── pointcloud/
│   └── frame_*.pcd                 # sparse point clouds (optional)
└── annotation.json                 # original action-level semantic annotation file
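
The layout above can be walked with the standard library alone. The following is a minimal sketch, not an official loader: the function name episode_files and the keys of the returned dictionary are hypothetical, and the glob patterns simply mirror the tree shown above.

```python
from pathlib import Path


def _sorted_glob(directory: Path, pattern: str) -> list:
    """Return sorted matches, or an empty list if the directory is absent."""
    return sorted(directory.glob(pattern)) if directory.is_dir() else []


def episode_files(episode_dir: Path) -> dict:
    """Group the files of one {task_name}/{episode_uuid}/ directory by role."""
    prefix = "observation.images."
    videos = {}
    for cam_dir in _sorted_glob(episode_dir / "videos", prefix + "*"):
        # The directory name is "observation.images.{cam}"; keep only {cam}.
        cam = cam_dir.name[len(prefix):]
        videos[cam] = _sorted_glob(cam_dir, "chunk-*/file-*.mp4")
    return {
        "data": _sorted_glob(episode_dir / "data", "chunk-*/file-*.parquet"),
        "videos": videos,
        "episodes_meta": _sorted_glob(
            episode_dir / "meta" / "episodes", "chunk-*/file-*.parquet"
        ),
        # The pointcloud directory is optional, so this list may be empty.
        "pointclouds": _sorted_glob(episode_dir / "pointcloud", "frame_*.pcd"),
        "annotation": episode_dir / "annotation.json",
    }
```

The keys of the videos mapping are the camera names found on disk, so the same call works for episodes with and without wrist cameras.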

Data Parquet Schema

data/chunk-*/file-*.parquet: one row per frame. All poses are expressed in world coordinates and stored as 32-bit floats (fp32).

| # | Column | Type | Note |
|---|---|---|---|
| 0 | index | int64 | Global frame index across the full shard |
| 1 | episode_index | int64 | Episode index within the shard |
| 2 | frame_index | int64 | Frame index within the episode |
| 3 | timestamp | float32 | Time in seconds since episode start |
| 4 | task_index | int64 | Maps to meta/tasks.parquet |
| 5 | subtask_index | int64 (nullable) | Maps to meta/subtasks.parquet; null if not annotated |
| 6 | observation.state.hand_left_world | float32 | Left hand joint positions in world space. Shape (21, 3). See Hand Joint Convention below |
| 7 | observation.state.hand_left_world_rotation | float32 | Left hand joint rotations in world space. Shape (21, 4), quaternion (qw, qx, qy, qz) |
| 8 | observation.state.hand_right_world | float32 | Right hand joint positions in world space. Shape (21, 3) |
| 9 | observation.state.hand_right_world_rotation | float32 | Right hand joint rotations in world space. Shape (21, 4), quaternion (qw, qx, qy, qz) |
| 10 | observation.state.body_world | float32 | Body joint positions in world space. Shape (22, 3) for full-body data or (14, 3) for upper-body data. Optional; present only when the episode includes body data |
| 11 | observation.state.body_world_rotation | float32 | Body joint rotations in world space. Shape (22, 4) for full-body data or (14, 4) for upper-body data, quaternion (qw, qx, qy, qz). Optional; present only when the episode includes body data |
| 12 | observation.state.head_world | float32 | Head position in world space. Shape (1, 3) |
| 13 | observation.state.head_world_rotation | float32 | Head rotation in world space. Shape (1, 4), quaternion (qw, qx, qy, qz) |
| 14 | observation.state.head_left_camera_position | float32 | Head-left camera position in world space. Shape (3,) |
| 15 | observation.state.head_left_camera_rotation | float32 | Head-left camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz) |
| 16 | observation.state.head_right_camera_position | float32 | Head-right camera position in world space. Shape (3,) |
| 17 | observation.state.head_right_camera_rotation | float32 | Head-right camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz) |
| 18 | observation.state.left_wrist_camera_position | float32 | Left wrist camera position in world space. Shape (3,). Optional; present only when the episode includes wrist camera data |
| 19 | observation.state.left_wrist_camera_rotation | float32 | Left wrist camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz). Optional; present only when the episode includes wrist camera data |
| 20 | observation.state.right_wrist_camera_position | float32 | Right wrist camera position in world space. Shape (3,). Optional; present only when the episode includes wrist camera data |
| 21 | observation.state.right_wrist_camera_rotation | float32 | Right wrist camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz). Optional; present only when the episode includes wrist camera data |

Notes:

  • All position values are in meters.
  • All quaternions use scalar-first order: (qw, qx, qy, qz).
  • Camera columns follow the pattern observation.state.{cam}_camera_position / {cam}_camera_rotation for each camera present in the episode. Wrist camera columns are optional.
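
Because the rotations are stored scalar-first, they usually need reordering or conversion before being passed to libraries that expect scalar-last (x, y, z, w) quaternions. The sketch below uses only the standard library and shows the standard unit-quaternion to rotation-matrix formula for the (qw, qx, qy, qz) order. The function names are hypothetical. Treating the result as a rotation applied to column vectors is an assumption, since this specification does not state the frame direction of the rotation.

```python
import math


def wxyz_to_xyzw(q):
    """Reorder a scalar-first quaternion (qw, qx, qy, qz) to (qx, qy, qz, qw)."""
    w, x, y, z = q
    return (x, y, z, w)


def quat_wxyz_to_matrix(q):
    """Convert a scalar-first quaternion to a 3x3 rotation matrix (nested lists)."""
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero-norm quaternion")
    # Normalize first so small fp32 rounding errors do not skew the matrix.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]
```

For example, the identity quaternion (1, 0, 0, 0) maps to the identity matrix, which is a quick sanity check that a column was not read in scalar-last order by mistake.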

Hand Joint Convention

LEFT_HAND_JOINTS = {
    0: 'leftWrist',
    1: 'leftThumbMCP',
    2: 'leftThumbPIP',
    3: 'leftThumbDIP',
    4: 'leftThumbTip',
    5: 'leftIndexMCP',
    6: 'leftIndexPIP',
    7: 'leftIndexDIP',
    8: 'leftIndexTip',
    9: 'leftMiddleMCP',
    10: 'leftMiddlePIP',
    11: 'leftMiddleDIP',
    12: 'leftMiddleTip',
    13: 'leftRingMCP',
    14: 'leftRingPIP',
    15: 'leftRingDIP',
    16: 'leftRingTip',
    17: 'leftLittleMCP',
    18: 'leftLittlePIP',
    19: 'leftLittleDIP',
    20: 'leftLittleTip',
}

RIGHT_HAND_JOINTS = {
    0: 'rightWrist',
    1: 'rightThumbMCP',
    2: 'rightThumbPIP',
    3: 'rightThumbDIP',
    4: 'rightThumbTip',
    5: 'rightIndexMCP',
    6: 'rightIndexPIP',
    7: 'rightIndexDIP',
    8: 'rightIndexTip',
    9: 'rightMiddleMCP',
    10: 'rightMiddlePIP',
    11: 'rightMiddleDIP',
    12: 'rightMiddleTip',
    13: 'rightRingMCP',
    14: 'rightRingPIP',
    15: 'rightRingDIP',
    16: 'rightRingTip',
    17: 'rightLittleMCP',
    18: 'rightLittlePIP',
    19: 'rightLittleDIP',
    20: 'rightLittleTip',
}
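
Both hands follow the same pattern: the wrist at index 0, then four joints (MCP, PIP, DIP, Tip) for each of five fingers. As an illustration, the sketch below regenerates the joint names from that pattern and picks out the fingertip rows. The helper names hand_joint_names and fingertip_indices are hypothetical and not part of the format.

```python
FINGERS = ("Thumb", "Index", "Middle", "Ring", "Little")
SEGMENTS = ("MCP", "PIP", "DIP", "Tip")


def hand_joint_names(side: str) -> list:
    """Return the 21 joint names for 'left' or 'right', ordered by joint index."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    names = [f"{side}Wrist"]
    for finger in FINGERS:
        for segment in SEGMENTS:
            names.append(f"{side}{finger}{segment}")
    return names


def fingertip_indices(side: str) -> list:
    """Return the row indices of the five fingertip joints."""
    return [i for i, name in enumerate(hand_joint_names(side)) if name.endswith("Tip")]
```

The fingertip indices can then be used to select rows of a (21, 3) position array, for instance when only contact points matter for a downstream task.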

Body Joint Convention

Full-body data uses the 22-joint convention:

BODY_JOINTS = {
    0: 'Pelvis',
    1: 'leftHip',
    2: 'rightHip',
    3: 'Spine1',
    4: 'leftKnee',
    5: 'rightKnee',
    6: 'Spine2',
    7: 'leftAnkle',
    8: 'rightAnkle',
    9: 'Spine3',
    10: 'leftFoot',
    11: 'rightFoot',
    12: 'Neck',
    13: 'leftCollar',
    14: 'rightCollar',
    15: 'Head',
    16: 'leftShoulder',
    17: 'rightShoulder',
    18: 'leftElbow',
    19: 'rightElbow',
    20: 'leftWrist',
    21: 'rightWrist',
}

Upper-body data uses 14 joints. It removes the lower-limb joints from the 22-joint layout and reindexes the remaining joints compactly:

UPPER_BODY_JOINTS = {
    0: 'Pelvis',
    1: 'Spine1',
    2: 'Spine2',
    3: 'Spine3',
    4: 'Neck',
    5: 'leftCollar',
    6: 'rightCollar',
    7: 'Head',
    8: 'leftShoulder',
    9: 'rightShoulder',
    10: 'leftElbow',
    11: 'rightElbow',
    12: 'leftWrist',
    13: 'rightWrist',
}
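
The relationship between the two layouts can be made explicit as an index mapping, which helps when mixing full-body and upper-body episodes in one training set. This is a small sketch: the names UPPER_FROM_FULL and full_to_upper are hypothetical, and the mapping is derived purely from the joint names listed above.

```python
BODY_JOINT_NAMES = [
    "Pelvis", "leftHip", "rightHip", "Spine1", "leftKnee", "rightKnee",
    "Spine2", "leftAnkle", "rightAnkle", "Spine3", "leftFoot", "rightFoot",
    "Neck", "leftCollar", "rightCollar", "Head", "leftShoulder",
    "rightShoulder", "leftElbow", "rightElbow", "leftWrist", "rightWrist",
]

# The eight lower-limb joints that the upper-body layout drops.
LOWER_LIMB = {
    "leftHip", "rightHip", "leftKnee", "rightKnee",
    "leftAnkle", "rightAnkle", "leftFoot", "rightFoot",
}

# Position k holds the 22-joint index that becomes upper-body joint k.
UPPER_FROM_FULL = [
    i for i, name in enumerate(BODY_JOINT_NAMES) if name not in LOWER_LIMB
]


def full_to_upper(joints):
    """Reduce a 22-entry per-joint sequence to the 14-joint upper-body order."""
    if len(joints) != len(BODY_JOINT_NAMES):
        raise ValueError("expected one entry per full-body joint (22)")
    return [joints[i] for i in UPPER_FROM_FULL]
```

Applying full_to_upper to the rows of a (22, 3) or (22, 4) array yields data in the same order as a native upper-body episode.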

Task and Subtask

| Location | Purpose |
|---|---|
| meta/tasks.parquet | Maps task_index (int) to the corresponding task name (string) |
| meta/subtasks.parquet | Maps subtask_index (int) to the corresponding subtask name (string) |
| data/chunk-*/file-*.parquet | Includes per-frame columns task_index and subtask_index |

meta/tasks.parquet

One row per unique task.

| Field | Storage | Type | Description |
|---|---|---|---|
| Task name | Pandas index | str | Human-readable task description |
| task_index | Column | int | Integer ID referenced by task_index in data parquets |

Example:

                                                     task_index
task
Coffee Table Snack Setup Preparation: The person...           0
  • Some parquet readers or viewers show the physical parquet columns instead of the pandas index view:
{"task_index": 0, "task": "Coffee Table Snack Setup Preparation: The person..."}
  • Index.name must be "task", Index.dtype must be str.
  • The DataFrame must contain exactly one column: task_index.

meta/subtasks.parquet

One row per unique subtask.

| Field | Storage | Type | Description |
|---|---|---|---|
| Subtask name | Pandas index | str | Human-readable subtask description |
| subtask_index | Column | int | Integer ID referenced by subtask_index in data parquets |

Example:

                        subtask_index
subtask
Walked to the cabinet.              0
Squat down.                         1
Open the drawer.                    2
  • Some parquet readers or viewers show the physical parquet columns instead of the pandas index view:
{"subtask_index": 0, "subtask": "Walked to the cabinet."}
{"subtask_index": 1, "subtask": "Squat down."}
{"subtask_index": 2, "subtask": "Open the drawer."}
  • Index.name must be "subtask", Index.dtype must be str.
  • The DataFrame must contain exactly one column: subtask_index.
  • Every subtask_index value that appears in data/chunk-*/file-*.parquet must have a corresponding row in this file.
  • Frames with no annotation have subtask_index = null.
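
The rules above amount to a lookup with two checks: every non-null subtask_index in the data must resolve, and null stays null. The sketch below works on the physical-column view (one dict per row) so it does not depend on any parquet reader. The function names are hypothetical, and rejecting duplicate indices is an extra assumption rather than a stated requirement.

```python
def build_subtask_lookup(records):
    """Build {subtask_index: subtask} from rows like
    {"subtask_index": 0, "subtask": "Walked to the cabinet."}."""
    lookup = {}
    for row in records:
        index = row["subtask_index"]
        if index in lookup:
            # Assumption: indices are unique, since there is one row per subtask.
            raise ValueError(f"duplicate subtask_index {index}")
        lookup[index] = row["subtask"]
    return lookup


def label_frames(frame_subtask_indices, lookup):
    """Map per-frame subtask_index values to names; unannotated frames stay None."""
    labels = []
    for index in frame_subtask_indices:
        if index is None:
            labels.append(None)
        elif index not in lookup:
            raise KeyError(f"subtask_index {index} missing from meta/subtasks.parquet")
        else:
            labels.append(lookup[index])
    return labels
```

Running label_frames over the subtask_index column of a data parquet is a cheap way to confirm that an export satisfies the coverage rule before training.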

meta/episodes/*.parquet

One row per episode.

| Field | Type | Description |
|---|---|---|
| episode_index | int64 | Zero-based episode index within the shard |
| tasks | list<string> | Task name(s) for this episode |
| length | int64 | Total number of frames in the episode |
| data/chunk_index | int64 | Chunk index of the data parquet file |
| data/file_index | int64 | File index within the data chunk |
| videos/observation.images.{cam}/chunk_index | int64 | Chunk index of the video file |
| videos/observation.images.{cam}/file_index | int64 | File index within the video chunk |
| videos/observation.images.{cam}/from_timestamp | double | Start timestamp (seconds) of the video segment |
| videos/observation.images.{cam}/to_timestamp | double | End timestamp (seconds) of the video segment |
| stats/{col}/min | list<double> | Per-column min statistic |
| stats/{col}/max | list<double> | Per-column max statistic |
| stats/{col}/mean | list<double> | Per-column mean statistic |
| stats/{col}/std | list<double> | Per-column std statistic |
| stats/{col}/count | list<int64> | Frame count used for stats |
| dataset_from_index | int64 | Global starting frame index of this episode within the shard |
| dataset_to_index | int64 | Global ending frame index (exclusive) of this episode |
| meta/episodes/chunk_index | int64 | Chunk index of the meta/episodes parquet file that holds this episode |
| meta/episodes/file_index | int64 | File index within the episodes chunk |
| camera_intrinsics/{cam} | list<float>[8] | Intrinsics of the undistorted camera as [fx, fy, cx, cy, k1, k2, k3, k4]. One column per camera |
| episode_uuid | string | Unique identifier for this episode |
| environment_id | string | Environment identifier |
| scene_id | string | Scene identifier |
| operator_id | string | Operator identifier |
| batch_version | string | Batch version identifier |
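
Two of these fields are typically unpacked right after loading an episode row: the global frame range and the camera intrinsics. The following is a minimal sketch with hypothetical helper names. It assumes the row has already been read into a plain dict, and it only splits the eight intrinsics values without interpreting k1..k4, because this specification does not say which distortion model they belong to or whether they are zero after undistortion.

```python
def episode_frame_range(row):
    """Return the global frame indices of an episode as a half-open range."""
    start = row["dataset_from_index"]
    stop = row["dataset_to_index"]  # exclusive, per the field description
    if stop < start:
        raise ValueError("dataset_to_index must not be smaller than dataset_from_index")
    return range(start, stop)


def split_intrinsics(values):
    """Split [fx, fy, cx, cy, k1, k2, k3, k4] into a 3x3 matrix K and [k1..k4]."""
    if len(values) != 8:
        raise ValueError("expected 8 intrinsics values")
    fx, fy, cx, cy = values[:4]
    k_matrix = [
        [fx, 0.0, cx],
        [0.0, fy, cy],
        [0.0, 0.0, 1.0],
    ]
    return k_matrix, list(values[4:])
```

Comparing len(episode_frame_range(row)) with the length field is a simple consistency check, on the assumption that both count the same frames.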

meta/info.json

Dataset-level metadata following the LeRobot v3.0 schema.

| Field | Type | Description |
|---|---|---|
| codebase_version | string | "v3.0" for LeRobot v3.0 |
| robot_type | null | Not applicable for egocentric data |
| total_episodes | int | Total number of episodes in the dataset |
| total_frames | int | Total number of frames across all episodes |
| total_tasks | int | Number of unique tasks |
| chunks_size | int | Max episodes per chunk (default 1000) |
| fps | float | Frame rate of the head camera |
| splits | object | Train/val split definitions |
| data_path | string | Path template for data parquet files |
| video_path | string | Path template for video files |
| features | object | Feature schema for all state and image columns |
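
The path templates can be combined with the chunk and file indices from meta/episodes to locate files. The sketch below is illustrative only: the SAMPLE_INFO values, the placeholder names chunk_index, file_index and video_key, and the helper names are all assumptions, chosen to match the chunk-000/file-000 layout shown under Folder Structure. Check the actual strings in your info.json before relying on them.

```python
import json

# Hypothetical excerpt of meta/info.json; the real template strings may differ.
SAMPLE_INFO = json.loads("""
{
  "codebase_version": "v3.0",
  "data_path": "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet",
  "video_path": "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"
}
""")


def resolve_data_path(info, chunk_index, file_index):
    """Fill the data_path template with a chunk and file index."""
    return info["data_path"].format(chunk_index=chunk_index, file_index=file_index)


def resolve_video_path(info, cam, chunk_index, file_index):
    """Fill the video_path template for one camera name, e.g. 'head_left'."""
    return info["video_path"].format(
        video_key=f"observation.images.{cam}",
        chunk_index=chunk_index,
        file_index=file_index,
    )
```

The resolved strings are relative to the episode directory, so they would be joined with {task_name}/{episode_uuid}/ to obtain a full path.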