Format Specification
LeRobot Data is the EgoSuite export format intended for robot learning workflows. It organizes egocentric episodes into a training-friendly dataset layout for downstream policy learning, evaluation, and data loading.
Overview
This dataset contains egocentric data captured from a head-mounted device and optional wrist-mounted cameras. Each episode records a single continuous human activity, including synchronized RGB video streams, head pose, hand pose, optional body pose (all poses in world coordinates), and action-level semantic annotations. All camera videos are lens-distortion corrected (undistorted).
Folder Structure
Each episode is stored as a self-contained LeRobot v3.0 dataset:
{task_name}/{episode_uuid}/
├── data/
│   └── chunk-000/
│       └── file-000.parquet          # per-frame state data
├── videos/
│   └── observation.images.{cam}/
│       └── chunk-000/
│           └── file-000.mp4          # one video per camera
├── meta/
│   ├── info.json                     # dataset-level metadata
│   ├── tasks.parquet                 # task index mapping
│   ├── subtasks.parquet              # subtask index mapping
│   └── episodes/
│       └── chunk-000/
│           └── file-000.parquet      # episode-level metadata & stats
├── pointcloud/
│   └── frame_*.pcd                   # sparse point clouds (optional)
└── annotation.json                   # original action-level semantic annotation file
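When checking an export, the layout above can be inspected with only the standard library. The sketch below is a hypothetical helper, describe_episode, not part of EgoSuite or LeRobot. It globs for chunk and file numbers instead of assuming chunk-000/file-000, and treats pointcloud/ as optional.

```python
from pathlib import Path

VIDEO_PREFIX = "observation.images."


def describe_episode(episode_dir):
    """Summarize one {task_name}/{episode_uuid}/ directory (hypothetical helper)."""
    root = Path(episode_dir)
    return {
        # Per-frame state parquet files, relative to the episode root.
        "data_files": sorted(
            p.relative_to(root).as_posix()
            for p in root.glob("data/chunk-*/file-*.parquet")
        ),
        # One videos/observation.images.{cam}/ directory per camera.
        "cameras": sorted(
            p.name[len(VIDEO_PREFIX):]
            for p in root.glob(f"videos/{VIDEO_PREFIX}*")
            if p.is_dir()
        ),
        "has_info": (root / "meta" / "info.json").is_file(),
        "has_annotation": (root / "annotation.json").is_file(),
        # pointcloud/ is optional; an empty list means no point clouds were exported.
        "pointcloud_frames": sorted(p.name for p in root.glob("pointcloud/frame_*.pcd")),
    }
```

Passing the path of one {task_name}/{episode_uuid}/ directory returns a dict whose "cameras" entry lists the {cam} names found under videos/.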
Data Parquet Schema
data/chunk-*/file-*.parquet: one row per frame. All poses are in world coordinates, and all floating-point values are stored as fp32 (float32).
| # | Column | Type | Note |
|---|---|---|---|
| 0 | index | int64 | Global frame index across the full shard |
| 1 | episode_index | int64 | Episode index within the shard |
| 2 | frame_index | int64 | Frame index within the episode |
| 3 | timestamp | float32 | Time in seconds since episode start |
| 4 | task_index | int64 | Maps to meta/tasks.parquet |
| 5 | subtask_index | int64 (nullable) | Maps to meta/subtasks.parquet; null if not annotated |
| 6 | observation.state.hand_left_world | float32 | Left hand joint positions in world space. Shape (21, 3). See Hand Joint Convention below |
| 7 | observation.state.hand_left_world_rotation | float32 | Left hand joint rotations in world space. Shape (21, 4), quaternion (qw, qx, qy, qz) |
| 8 | observation.state.hand_right_world | float32 | Right hand joint positions in world space. Shape (21, 3) |
| 9 | observation.state.hand_right_world_rotation | float32 | Right hand joint rotations in world space. Shape (21, 4), quaternion (qw, qx, qy, qz) |
| 10 | observation.state.body_world | float32 | Body joint positions in world space. Shape (22, 3) for full-body data or (14, 3) for upper-body data. See Body Joint Convention below. Optional; present only when the episode includes body data |
| 11 | observation.state.body_world_rotation | float32 | Body joint rotations in world space. Shape (22, 4) for full-body data or (14, 4) for upper-body data, quaternion (qw, qx, qy, qz). Optional; present only when the episode includes body data |
| 12 | observation.state.head_world | float32 | Head position in world space. Shape (1, 3) |
| 13 | observation.state.head_world_rotation | float32 | Head rotation in world space. Shape (1, 4), quaternion (qw, qx, qy, qz) |
| 14 | observation.state.head_left_camera_position | float32 | Head-left camera position in world space. Shape (3,) |
| 15 | observation.state.head_left_camera_rotation | float32 | Head-left camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz) |
| 16 | observation.state.head_right_camera_position | float32 | Head-right camera position in world space. Shape (3,) |
| 17 | observation.state.head_right_camera_rotation | float32 | Head-right camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz) |
| 18 | observation.state.left_wrist_camera_position | float32 | Left wrist camera position in world space. Shape (3,). Optional; present only when the episode includes wrist camera data |
| 19 | observation.state.left_wrist_camera_rotation | float32 | Left wrist camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz). Optional; present only when the episode includes wrist camera data |
| 20 | observation.state.right_wrist_camera_position | float32 | Right wrist camera position in world space. Shape (3,). Optional; present only when the episode includes wrist camera data |
| 21 | observation.state.right_wrist_camera_rotation | float32 | Right wrist camera rotation in world space. Shape (4,), quaternion (qw, qx, qy, qz). Optional; present only when the episode includes wrist camera data |
Notes:
- All position values are in meters.
- All quaternions use scalar-first order: (qw, qx, qy, qz).
- Camera columns follow the pattern observation.state.{cam}_camera_position / observation.state.{cam}_camera_rotation for each camera present in the episode, where {cam} is head_left, head_right, left_wrist, or right_wrist. Wrist camera columns are optional.
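Reading these columns usually means turning each cell back into an array of the documented shape and converting quaternions to rotation matrices. The sketch below assumes NumPy. cell_to_array, hand_state, and quat_to_matrix are hypothetical helpers. Exactly how a given parquet reader returns a (21, 3) cell (nested or flat) is an assumption here, so the reshape accepts both forms.

```python
import numpy as np


def cell_to_array(cell, shape):
    """Convert one parquet cell to a float32 array with the documented shape.

    Accepts nested lists, object arrays of per-joint arrays, or flat lists
    (e.g. 63 values for a (21, 3) column) and reshapes them to `shape`.
    """
    return np.array(list(cell), dtype=np.float32).reshape(shape)


def hand_state(row, side):
    """Joint positions (21, 3) and quaternions (21, 4) for side 'left' or 'right'."""
    pos = cell_to_array(row[f"observation.state.hand_{side}_world"], (21, 3))
    rot = cell_to_array(row[f"observation.state.hand_{side}_world_rotation"], (21, 4))
    return pos, rot


def quat_to_matrix(q):
    """3x3 rotation matrix from a scalar-first quaternion (qw, qx, qy, qz)."""
    q = np.asarray(q, dtype=np.float64)
    w, x, y, z = q / np.linalg.norm(q)  # renormalize to absorb fp32 rounding
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])
```

If you prefer SciPy, scipy.spatial.transform.Rotation.from_quat defaults to scalar-last (qx, qy, qz, qw) order. SciPy 1.14 and later accept scalar_first=True for these columns; passing them without that flag would silently mix up the axes.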
Hand Joint Convention
LEFT_HAND_JOINTS = {
0: 'leftWrist',
1: 'leftThumbMCP',
2: 'leftThumbPIP',
3: 'leftThumbDIP',
4: 'leftThumbTip',
5: 'leftIndexMCP',
6: 'leftIndexPIP',
7: 'leftIndexDIP',
8: 'leftIndexTip',
9: 'leftMiddleMCP',
10: 'leftMiddlePIP',
11: 'leftMiddleDIP',
12: 'leftMiddleTip',
13: 'leftRingMCP',
14: 'leftRingPIP',
15: 'leftRingDIP',
16: 'leftRingTip',
17: 'leftLittleMCP',
18: 'leftLittlePIP',
19: 'leftLittleDIP',
20: 'leftLittleTip',
}
RIGHT_HAND_JOINTS = {
0: 'rightWrist',
1: 'rightThumbMCP',
2: 'rightThumbPIP',
3: 'rightThumbDIP',
4: 'rightThumbTip',
5: 'rightIndexMCP',
6: 'rightIndexPIP',
7: 'rightIndexDIP',
8: 'rightIndexTip',
9: 'rightMiddleMCP',
10: 'rightMiddlePIP',
11: 'rightMiddleDIP',
12: 'rightMiddleTip',
13: 'rightRingMCP',
14: 'rightRingPIP',
15: 'rightRingDIP',
16: 'rightRingTip',
17: 'rightLittleMCP',
18: 'rightLittlePIP',
19: 'rightLittleDIP',
20: 'rightLittleTip',
}
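Both hand tables follow one pattern. Index 0 is the wrist. Finger f (thumb = 0 through little = 4) with segment s (MCP = 0, PIP = 1, DIP = 2, Tip = 3) sits at index 1 + 4f + s. The pure-Python sketch below rebuilds the names from that pattern. The helper names are hypothetical.

```python
FINGERS = ("Thumb", "Index", "Middle", "Ring", "Little")
SEGMENTS = ("MCP", "PIP", "DIP", "Tip")


def joint_index(finger, segment):
    """Index of a finger joint in the 21-joint layout (same for both hands)."""
    return 1 + 4 * FINGERS.index(finger) + SEGMENTS.index(segment)


def hand_joint_names(side):
    """Rebuild LEFT_HAND_JOINTS / RIGHT_HAND_JOINTS as a list; side is 'left' or 'right'."""
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    names = [f"{side}Wrist"]
    for finger in FINGERS:
        for segment in SEGMENTS:
            names.append(f"{side}{finger}{segment}")
    return names


# Fingertip rows, thumb first: [4, 8, 12, 16, 20].
TIP_INDICES = [joint_index(finger, "Tip") for finger in FINGERS]
```

With a (21, 3) NumPy array of hand joint positions, indexing it with TIP_INDICES selects the five fingertips in thumb-to-little order.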
Body Joint Convention
Full-body data uses the 22-joint convention:
BODY_JOINTS = {
0: 'Pelvis',
1: 'leftHip',
2: 'rightHip',
3: 'Spine1',
4: 'leftKnee',
5: 'rightKnee',
6: 'Spine2',
7: 'leftAnkle',
8: 'rightAnkle',
9: 'Spine3',
10: 'leftFoot',
11: 'rightFoot',
12: 'Neck',
13: 'leftCollar',
14: 'rightCollar',
15: 'Head',
16: 'leftShoulder',
17: 'rightShoulder',
18: 'leftElbow',
19: 'rightElbow',
20: 'leftWrist',
21: 'rightWrist',
}
Upper-body data uses 14 joints. It drops the eight lower-limb joints (hips, knees, ankles, and feet) from the 22-joint layout and reindexes the remaining joints compactly, keeping their original order:
UPPER_BODY_JOINTS = {
0: 'Pelvis',
1: 'Spine1',
2: 'Spine2',
3: 'Spine3',
4: 'Neck',
5: 'leftCollar',
6: 'rightCollar',
7: 'Head',
8: 'leftShoulder',
9: 'rightShoulder',
10: 'leftElbow',
11: 'rightElbow',
12: 'leftWrist',
13: 'rightWrist',
}
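Because the upper-body layout keeps the full-body joints in their original order, a full-body array can be reduced to the 14-joint layout by index selection. The sketch below assumes NumPy. FULL_TO_UPPER, full_to_upper, and body_joint_names are hypothetical names. The joint lists are copied from the two tables above.

```python
import numpy as np

BODY_JOINTS = [
    "Pelvis", "leftHip", "rightHip", "Spine1", "leftKnee", "rightKnee",
    "Spine2", "leftAnkle", "rightAnkle", "Spine3", "leftFoot", "rightFoot",
    "Neck", "leftCollar", "rightCollar", "Head", "leftShoulder",
    "rightShoulder", "leftElbow", "rightElbow", "leftWrist", "rightWrist",
]
UPPER_BODY_JOINTS = [
    "Pelvis", "Spine1", "Spine2", "Spine3", "Neck", "leftCollar",
    "rightCollar", "Head", "leftShoulder", "rightShoulder", "leftElbow",
    "rightElbow", "leftWrist", "rightWrist",
]

# Row of each upper-body joint inside the 22-joint layout.
FULL_TO_UPPER = [BODY_JOINTS.index(name) for name in UPPER_BODY_JOINTS]


def body_joint_names(num_joints):
    """Pick the joint names that match a body_world array with num_joints rows."""
    if num_joints == len(BODY_JOINTS):
        return BODY_JOINTS
    if num_joints == len(UPPER_BODY_JOINTS):
        return UPPER_BODY_JOINTS
    raise ValueError(f"expected 22 or 14 body joints, got {num_joints}")


def full_to_upper(full_body):
    """Reduce a (22, 3) position or (22, 4) rotation array to the 14-joint layout."""
    arr = np.asarray(full_body)
    if arr.shape[0] != len(BODY_JOINTS):
        raise ValueError(f"expected 22 rows, got {arr.shape[0]}")
    return arr[FULL_TO_UPPER]
```

This makes it possible to pool full-body and upper-body episodes into one 14-joint representation when only the upper body is needed.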
Task and Subtask
| Location | Purpose |
|---|---|
| meta/tasks.parquet | Maps task_index (int) to the corresponding task name (string) |
| meta/subtasks.parquet | Maps subtask_index (int) to the corresponding subtask name (string) |
| data/chunk-*/file-*.parquet | Includes the per-frame columns task_index and subtask_index |
meta/tasks.parquet
One row per unique task.
| Field | Storage | Type | Description |
|---|---|---|---|
| Task name | Pandas index | str | Human-readable task description |
| task_index | Column | int | Integer ID referenced by task_index in the data parquet files |
Example:
                                                     task_index
task
Coffee Table Snack Setup Preparation: The person...           0
- Some parquet readers or viewers show the physical parquet columns instead of the pandas index view:
  {"task_index": 0, "task": "Coffee Table Snack Setup Preparation: The person..."}
- Index.name must be "task", and Index.dtype must be str.
- The DataFrame must contain exactly one column: task_index.
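The following is a minimal sketch of building and checking a table in this shape with pandas. make_tasks_frame and check_tasks_frame are hypothetical helpers. The string check accepts both pandas' object dtype and the newer str dtype, because which one a string index gets depends on the pandas version.

```python
import pandas as pd


def make_tasks_frame(task_names):
    """Build a frame laid out like meta/tasks.parquet (hypothetical helper)."""
    return pd.DataFrame(
        {"task_index": list(range(len(task_names)))},
        index=pd.Index(list(task_names), name="task"),
    )


def check_tasks_frame(df):
    """Return a list of violations of the meta/tasks.parquet rules (empty if valid)."""
    problems = []
    if df.index.name != "task":
        problems.append("index name must be 'task'")
    # Accept object-dtype and str-dtype indexes as long as every label is a string.
    if not all(isinstance(label, str) for label in df.index):
        problems.append("index labels must be strings")
    if list(df.columns) != ["task_index"]:
        problems.append("expected exactly one column: task_index")
    elif not pd.api.types.is_integer_dtype(df["task_index"]):
        problems.append("task_index must be an integer column")
    if df.index.has_duplicates:
        problems.append("task names must be unique")
    return problems
```

Writing the frame with tasks.to_parquet("meta/tasks.parquet") (requires pyarrow or fastparquet) stores the named index as a physical column. That is why some readers show a "task" column next to task_index.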
meta/subtasks.parquet
One row per unique subtask.
| Field | Storage | Type | Description |
|---|---|---|---|
| Subtask name | Pandas index | str | Human-readable subtask description |
| subtask_index | Column | int | Integer ID referenced by subtask_index in the data parquet files |
Example:
                        subtask_index
subtask
Walked to the cabinet.              0
Squat down.                         1
Open the drawer.                    2
- Some parquet readers or viewers show the physical parquet columns instead of the pandas index view:
  {"subtask_index": 0, "subtask": "Walked to the cabinet."}
  {"subtask_index": 1, "subtask": "Squat down."}
  {"subtask_index": 2, "subtask": "Open the drawer."}
- Index.name must be "subtask", and Index.dtype must be str.
- The DataFrame must contain exactly one column: subtask_index.
- Every subtask_index value that appears in data/chunk-*/file-*.parquet must have a corresponding row in this file.
- Frames with no annotation have subtask_index = null.
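The two subtask rules above (every referenced index exists; null means unannotated) can be checked as sketched below. This assumes pandas and a per-frame subtask_index column read as either a nullable Int64 series or a float series with NaN. The helper names are hypothetical.

```python
import pandas as pd


def subtask_lookup(subtasks):
    """Map subtask_index -> subtask name from a meta/subtasks.parquet frame."""
    return {int(i): name for name, i in zip(subtasks.index, subtasks["subtask_index"])}


def missing_subtask_indices(frame_subtask_index, subtasks):
    """Indices used in the data parquet but absent from meta/subtasks.parquet.

    Null entries (frames without an annotation) are skipped.
    """
    known = set(subtask_lookup(subtasks))
    used = {int(v) for v in frame_subtask_index if not pd.isna(v)}
    return sorted(used - known)


def frame_subtask_names(frame_subtask_index, subtasks):
    """Per-frame subtask names, with None for unannotated frames."""
    lookup = subtask_lookup(subtasks)
    return [None if pd.isna(v) else lookup.get(int(v)) for v in frame_subtask_index]
```

An empty result from missing_subtask_indices means the data parquet satisfies the referential rule.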
meta/episodes/*.parquet
One row per episode.
| Field | Type | Description |
|---|---|---|
| episode_index | int64 | Zero-based episode index within the shard |
| tasks | list<string> | Task name(s) for this episode |
| length | int64 | Total number of frames in the episode |
| data/chunk_index | int64 | Chunk index of the data parquet file |
| data/file_index | int64 | File index within the data chunk |
| videos/observation.images.{cam}/chunk_index | int64 | Chunk index of the video file |
| videos/observation.images.{cam}/file_index | int64 | File index within the video chunk |
| videos/observation.images.{cam}/from_timestamp | double | Start timestamp (seconds) of this episode's segment in the video file |
| videos/observation.images.{cam}/to_timestamp | double | End timestamp (seconds) of this episode's segment in the video file |
| stats/{col}/min | list<double> | Per-column minimum |
| stats/{col}/max | list<double> | Per-column maximum |
| stats/{col}/mean | list<double> | Per-column mean |
| stats/{col}/std | list<double> | Per-column standard deviation |
| stats/{col}/count | list<int64> | Number of frames used to compute the statistics |
| dataset_from_index | int64 | Global index of this episode's first frame within the shard |
| dataset_to_index | int64 | Global index one past this episode's last frame within the shard (exclusive) |
| meta/episodes/chunk_index | int64 | Chunk index of the meta/episodes parquet file that holds this row |
| meta/episodes/file_index | int64 | File index within the meta/episodes chunk |
| camera_intrinsics/{cam} | list<float>[8] | Undistorted camera intrinsics [fx, fy, cx, cy, k1, k2, k3, k4]; one column per camera |
| episode_uuid | string | Unique identifier for this episode |
| environment_id | string | Environment identifier |
| scene_id | string | Scene identifier |
| operator_id | string | Operator identifier |
| batch_version | string | Batch version identifier |
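Two common uses of an episode row are slicing its frames and building a pinhole camera matrix. The sketch below assumes NumPy. episode_frame_range, intrinsics_matrix, and project are hypothetical helpers. Because the videos are already undistorted, it assumes k1..k4 are not needed to project into the exported frames. It also assumes points are given in a camera frame with +z pointing forward; this document does not state the camera axis convention.

```python
import numpy as np


def episode_frame_range(episode_row):
    """Global frame indices [dataset_from_index, dataset_to_index) of one episode."""
    return range(int(episode_row["dataset_from_index"]), int(episode_row["dataset_to_index"]))


def intrinsics_matrix(values):
    """3x3 pinhole matrix K from a camera_intrinsics/{cam} value.

    Only fx, fy, cx, cy are used; k1..k4 are ignored under the assumption
    that they are not needed for the already-undistorted videos.
    """
    if len(values) != 8:
        raise ValueError(f"expected 8 intrinsics values, got {len(values)}")
    fx, fy, cx, cy = (float(v) for v in values[:4])
    return np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])


def project(K, point_cam):
    """Project a 3D point in the (assumed z-forward) camera frame to pixel (u, v)."""
    u, v, w = K @ np.asarray(point_cam, dtype=float)
    if w <= 0:
        raise ValueError("point is not in front of the camera")
    return u / w, v / w
```

Because dataset_to_index is exclusive, len(episode_frame_range(row)) should equal the row's length field, which makes a cheap consistency check when loading metadata.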
meta/info.json
Dataset-level metadata following the LeRobot v3.0 schema.
| Field | Type | Description |
|---|---|---|
| codebase_version | string | "v3.0" for LeRobot v3.0 |
| robot_type | null | Not applicable for egocentric data |
| total_episodes | int | Total number of episodes in the dataset |
| total_frames | int | Total number of frames across all episodes |
| total_tasks | int | Number of unique tasks |
| chunks_size | int | Max episodes per chunk (default 1000) |
| fps | float | Frame rate of the head camera |
| splits | object | Train/val split definitions |
| data_path | string | Path template for data parquet files |
| video_path | string | Path template for video files |
| features | object | Feature schema for all state and image columns |
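Since data_path and video_path are templates, a loader can fill them from an episode row in meta/episodes. The sketch below assumes the templates use LeRobot v3.0's placeholders ({chunk_index}, {file_index}, and {video_key} for videos). check_info, load_info, and resolve_paths are hypothetical helpers, and the actual templates should be read from info.json rather than hard-coded.

```python
import json


def check_info(info):
    """Raise ValueError if info.json does not look like a LeRobot v3.0 export."""
    version = info.get("codebase_version")
    if version != "v3.0":
        raise ValueError(f"expected codebase_version 'v3.0', got {version!r}")
    for key in ("fps", "data_path", "video_path", "features"):
        if key not in info:
            raise ValueError(f"info.json is missing {key!r}")
    return info


def load_info(path):
    """Read and validate meta/info.json."""
    with open(path, encoding="utf-8") as f:
        return check_info(json.load(f))


def resolve_paths(info, episode_row, cameras):
    """Fill the path templates for one meta/episodes row and a list of {cam} names."""
    data_path = info["data_path"].format(
        chunk_index=int(episode_row["data/chunk_index"]),
        file_index=int(episode_row["data/file_index"]),
    )
    video_paths = {}
    for cam in cameras:
        key = f"observation.images.{cam}"
        video_paths[cam] = info["video_path"].format(
            video_key=key,
            chunk_index=int(episode_row[f"videos/{key}/chunk_index"]),
            file_index=int(episode_row[f"videos/{key}/file_index"]),
        )
    return data_path, video_paths
```

The returned paths are relative to the episode directory, matching the folder structure at the top of this specification.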