Head RGB Camera

The EgoSuite headset provides two head‑mounted RGB cameras. In the MCAP files, these cameras appear as separate topics for:

  • Compressed video streams
  • Raw compressed video streams when source raw MP4 files are included
  • Per‑camera intrinsics (calibration)
  • Per‑camera extrinsics (per‑frame pose transforms)

Coordinate Frames

Head cameras use consistent coordinate conventions across the dataset:

  • World Frame:

    • All EgoSuite pose data (body and hands) is expressed in a world frame.
    • The camera extrinsics relate this world frame to each camera frame.
  • Camera Frame (OpenCV convention):

    • Camera projections use the standard OpenCV camera coordinate system:
      • z axis points forward from the camera.
      • x axis points to the right in the image.
      • y axis points down in the image.
    • This convention is used for the intrinsic matrix K, the distortion model D, the rectification matrix R, and the projection matrix P, as well as for the extrinsic rotation R and translation t.

Camera coordinate frame (OpenCV convention): z-axis forward (blue), x-axis right (red), y-axis down (green).

Topic & Message Type

The following topics correspond to the head cameras. All channels use protobuf encoding:

  • Left RGB Camera:

    • Video topic: /sensor/camera/head_left/video
    • Message type: foxglove.CompressedVideo
    • Raw video topic: /sensor/camera/head_left/video_raw
    • Raw video type: foxglove.CompressedVideo
    • Intrinsic topic: /sensor/camera/head_left/intrinsic
    • Intrinsic type: foxglove.CameraCalibration
    • Extrinsic topic: /sensor/camera/head_left/extrinsic
    • Extrinsic type: foxglove.FrameTransforms
  • Right RGB Camera:

    • Video topic: /sensor/camera/head_right/video
    • Message type: foxglove.CompressedVideo
    • Raw video topic: /sensor/camera/head_right/video_raw
    • Raw video type: foxglove.CompressedVideo
    • Intrinsic topic: /sensor/camera/head_right/intrinsic
    • Intrinsic type: foxglove.CameraCalibration
    • Extrinsic topic: /sensor/camera/head_right/extrinsic
    • Extrinsic type: foxglove.FrameTransforms

Head camera video topics carry compressed video frames as foxglove.CompressedVideo messages. You can inspect and play back these streams with LW-VIZ, or export them with the LW-EgoSuite Devkit.
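As a rough illustration of reading these topics outside LW-VIZ, the sketch below builds the topic names listed above and counts messages per topic with the open-source `mcap` Python package. The helper names (`head_camera_topics`, `count_head_camera_messages`) are hypothetical, the `mcap` package must be installed separately, and this is not the LW-EgoSuite Devkit API.

```python
from collections import Counter

CAMERAS = ("head_left", "head_right")
STREAMS = ("video", "video_raw", "intrinsic", "extrinsic")


def head_camera_topics(camera):
    """Return a dict mapping stream name -> topic for one head camera."""
    if camera not in CAMERAS:
        raise ValueError(f"unknown head camera: {camera!r}")
    return {stream: f"/sensor/camera/{camera}/{stream}" for stream in STREAMS}


# All eight head camera topics (the video_raw topics may be absent in a file).
ALL_TOPICS = [t for cam in CAMERAS for t in head_camera_topics(cam).values()]


def count_head_camera_messages(mcap_path):
    """Count messages per head camera topic in an MCAP file."""
    # Imported lazily so the topic helper above works without the mcap package.
    from mcap.reader import make_reader

    counts = Counter()
    with open(mcap_path, "rb") as f:
        reader = make_reader(f)
        for _schema, channel, _message in reader.iter_messages(topics=ALL_TOPICS):
            counts[channel.topic] += 1
    return counts
```

Calling `count_head_camera_messages("recording.mcap")` (the file name is a placeholder) returns a `Counter` keyed by topic, which is a quick way to check whether a file includes the raw video topics.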

Camera Intrinsics

Each camera publishes its calibration on a dedicated foxglove.CameraCalibration topic. For field definitions (including width, height, intrinsic matrix K, distortion model and parameters D, rectification matrix R, projection matrix P, and frame_id), see the CameraCalibration documentation.
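To show how the intrinsic matrix K is typically used, here is a minimal pinhole projection sketch. It assumes K arrives as a 9-element row-major array, ignores lens distortion (D) entirely, and uses made-up intrinsic values; `project_points` is a hypothetical helper, not part of any EgoSuite tool.

```python
import numpy as np


def project_points(points_cam, K):
    """Project Nx3 camera-frame points to pixel coordinates (pinhole, no distortion).

    Returns (uv, in_front): uv is Nx2 pixel coordinates, in_front is a boolean
    mask that is True where the point lies in front of the camera (z > 0).
    """
    K = np.asarray(K, dtype=float).reshape(3, 3)
    pts = np.asarray(points_cam, dtype=float).reshape(-1, 3)
    in_front = pts[:, 2] > 0  # OpenCV convention: +z points forward
    uvw = pts @ K.T  # rows are [fx*x + cx*z, fy*y + cy*z, z] when skew is zero
    with np.errstate(divide="ignore", invalid="ignore"):
        uv = uvw[:, :2] / uvw[:, 2:3]
    return uv, in_front


# Hypothetical intrinsics, for illustration only.
K = [600.0, 0.0, 320.0,
     0.0, 600.0, 240.0,
     0.0, 0.0, 1.0]
points = [[0.0, 0.0, 2.0],     # on the optical axis, 2 m ahead
          [0.5, -0.25, 2.0]]   # right of and above the axis (y points down)
uv, in_front = project_points(points, K)
```

Points with `in_front` set to False should be discarded before drawing, since their projected coordinates are not meaningful.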

Camera Extrinsics

Each camera's extrinsics are published as a foxglove.FrameTransforms message. For field definitions (e.g. parent_frame_id, child_frame_id, translation, rotation), see the FrameTransform documentation.

In EgoSuite MCAP files, each extrinsic message represents the position and orientation of the camera in the world frame. The message uses parent_frame_id = "world" and child_frame_id set to the concrete camera frame (head_left_camera or head_right_camera), with translation giving the camera center position and rotation (as a quaternion) giving the camera's orientation.
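The fields described above can be pulled out of a decoded message roughly as follows. This sketch assumes the decoded message exposes a `transforms` list with attribute access, as a decoded protobuf object would; `camera_pose_from_message` is a hypothetical helper, and the `SimpleNamespace` object is only a stand-in with made-up values.

```python
from types import SimpleNamespace


def camera_pose_from_message(msg, camera_frame):
    """Return (translation_xyz, quaternion_xyzw) for the given camera frame.

    Picks the transform whose parent is "world" and whose child is camera_frame.
    """
    for tf in msg.transforms:
        if tf.parent_frame_id == "world" and tf.child_frame_id == camera_frame:
            t = (tf.translation.x, tf.translation.y, tf.translation.z)
            q = (tf.rotation.x, tf.rotation.y, tf.rotation.z, tf.rotation.w)
            return t, q
    raise KeyError(f"no world -> {camera_frame} transform in message")


# Stand-in for a decoded FrameTransforms message (values are illustrative).
msg = SimpleNamespace(transforms=[
    SimpleNamespace(
        parent_frame_id="world",
        child_frame_id="head_left_camera",
        translation=SimpleNamespace(x=0.1, y=1.6, z=0.0),
        rotation=SimpleNamespace(x=0.0, y=0.0, z=0.0, w=1.0),
    )
])
position, quat_xyzw = camera_pose_from_message(msg, "head_left_camera")
```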

Computing the W2C (world-to-camera) extrinsic matrix

The EgoSuite MCAP extrinsic camera message uses the C2W (camera-to-world) convention. The following code converts it to a W2C (world-to-camera) extrinsic matrix, which transforms a point from world coordinates to camera coordinates.

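import numpy as np
from scipy.spatial.transform import Rotation as R

# quat_* and pos_* are the rotation (x, y, z, w) and translation (x, y, z)
# fields of the camera's FrameTransform. SciPy expects scalar-last quaternions.
R_c2w = R.from_quat([quat_x, quat_y, quat_z, quat_w]).as_matrix()
t_c2w = np.array([pos_x, pos_y, pos_z])

# Invert the rigid transform: R_w2c = R_c2w^T, t_w2c = -R_w2c @ t_c2w
R_w2c = R_c2w.T
t_w2c = -R_w2c @ t_c2w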
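For completeness, the same inversion can be packed into a 4x4 homogeneous matrix and applied to world points. The sketch below is NumPy-only and converts the quaternion by hand, assuming the scalar-last (x, y, z, w) order. The function names and the example pose are illustrative assumptions.

```python
import numpy as np


def quat_xyzw_to_matrix(q):
    """Rotation matrix from a scalar-last quaternion (normalized first)."""
    x, y, z, w = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def w2c_matrix(position, quat_xyzw):
    """4x4 world-to-camera matrix from a camera-to-world pose."""
    R_w2c = quat_xyzw_to_matrix(quat_xyzw).T
    T = np.eye(4)
    T[:3, :3] = R_w2c
    T[:3, 3] = -R_w2c @ np.asarray(position, dtype=float)
    return T


def world_to_camera(points_world, T_w2c):
    """Transform Nx3 world points into the camera frame."""
    pts = np.asarray(points_world, dtype=float).reshape(-1, 3)
    return pts @ T_w2c[:3, :3].T + T_w2c[:3, 3]


# Illustrative pose: camera at (1, 2, 3), rotated 90 degrees about world z.
s = np.sqrt(0.5)
T_w2c = w2c_matrix([1.0, 2.0, 3.0], [0.0, 0.0, s, s])
pts_cam = world_to_camera([[1.0, 2.0, 3.0],   # the camera center itself
                           [1.0, 2.0, 4.0]],  # 1 m along the camera z axis
                          T_w2c)
```

A useful sanity check is that the camera center maps to the origin of the camera frame, which holds here by construction.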

Typical Usage

Common use cases for head camera data include:

  • Visualizing egocentric RGB streams aligned with human pose.
  • Projecting 3D body or hand keypoints into image space using CameraCalibration plus FrameTransforms.
  • Synchronizing multi‑camera data (head and wrist cameras) using the shared MCAP timeline and topic timestamps.

Example head camera image.
Left – view from the left head camera; right – view from the right head camera.
Body pose and hand pose are projected into each image.
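
The synchronization use case listed above can be approximated by nearest-timestamp matching. The sketch below assumes timestamps are integer nanoseconds collected per topic and sorted in ascending order. `match_nearest` and the tolerance value are hypothetical, and the example uses small integers instead of real timestamps for readability.

```python
import numpy as np


def match_nearest(ref_ts, other_ts, max_diff_ns):
    """For each reference timestamp, return the index of the nearest timestamp
    in other_ts, or -1 if the nearest one is more than max_diff_ns away.

    other_ts must be sorted in ascending order.
    """
    ref = np.asarray(ref_ts, dtype=np.int64)
    other = np.asarray(other_ts, dtype=np.int64)
    if other.size == 0:
        return np.full(ref.shape, -1, dtype=np.int64)
    right = np.searchsorted(other, ref)             # first index with other >= ref
    left = np.clip(right - 1, 0, other.size - 1)    # neighbor just before
    right = np.clip(right, 0, other.size - 1)       # neighbor at or after
    pick_right = np.abs(other[right] - ref) < np.abs(other[left] - ref)
    idx = np.where(pick_right, right, left).astype(np.int64)
    too_far = np.abs(other[idx] - ref) > max_diff_ns
    idx[too_far] = -1
    return idx


# Illustrative timestamps: left camera as reference, right camera as other.
left_ts = [0, 33, 66, 100]
right_ts = [1, 30, 70, 140]
matches = match_nearest(left_ts, right_ts, max_diff_ns=10)
```

Here the last left frame has no right frame within the tolerance, so its entry is -1 and it can be skipped or handled separately.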