
Audio

When present, an EgoSuite episode includes an audio stream that is recorded together with the videos and synchronized with them.

Topic & Message Type

  • Topic: /audio
  • Message type: foxglove.RawAudio
  • Encoding: protobuf

There is typically a single audio topic per episode. foxglove.RawAudio currently supports only the pcm-s16 format (signed 16-bit PCM samples).
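One way to turn a pcm-s16 payload into per-channel integer samples, using only the Python standard library, is sketched below. It assumes the payload follows the Foxglove RawAudio convention of little-endian, channel-interleaved samples. The `RawAudioMsg` dataclass and the `decode_pcm_s16` helper are hypothetical. They stand in for an already-decoded message, with field names modeled on the foxglove.RawAudio schema (`data`, `format`, `sample_rate`, `number_of_channels`) and the timestamp flattened to integer nanoseconds for simplicity.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RawAudioMsg:
    # Hypothetical stand-in for a decoded foxglove.RawAudio message.
    timestamp_ns: int         # message timestamp flattened to nanoseconds (assumption)
    data: bytes               # raw PCM payload
    format: str               # expected to be "pcm-s16"
    sample_rate: int          # frames per second
    number_of_channels: int   # e.g. 1 for mono, 2 for stereo


def decode_pcm_s16(msg: RawAudioMsg) -> List[List[int]]:
    """Split an interleaved, little-endian pcm-s16 payload into one list per channel."""
    if msg.format != "pcm-s16":
        raise ValueError(f"unsupported audio format: {msg.format!r}")
    frame_bytes = 2 * msg.number_of_channels  # 2 bytes per sample, one sample per channel
    if len(msg.data) % frame_bytes != 0:
        raise ValueError("payload length is not a whole number of frames")
    n_values = len(msg.data) // 2
    # "<" selects little-endian byte order, "h" a signed 16-bit integer.
    values = struct.unpack(f"<{n_values}h", msg.data)
    # De-interleave: channel c holds every number_of_channels-th value, starting at index c.
    return [list(values[c::msg.number_of_channels]) for c in range(msg.number_of_channels)]
```

For example, a stereo payload packed as `struct.pack("<4h", 1, -1, 2, -2)` (two frames, left then right) decodes to `[[1, 2], [-1, -2]]`. The number of frames in a message, which the timing sketches below rely on, is `len(data) // (2 * number_of_channels)`.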

Timing & Synchronization

Audio in EgoSuite MCAP files is designed to be time‑aligned with other modalities:

  • Each foxglove.RawAudio message carries a nanosecond‑precision timestamp.
  • These timestamps are on the same global timeline as:
    • Camera video topics (/sensor/camera/*/video)
    • Pose topics (/pose/body, /pose/left_hand, /pose/right_hand, /pose/head_pose, etc.)
    • Point clouds (/pointcloud)
  • This allows you to:
    • Play back audio and video in sync.
    • Analyze how sound evolves while the human performs actions captured by body/hand pose.
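Because all of these streams share one timeline, aligning a video frame or pose sample with audio reduces to timestamp arithmetic. The sketch below assumes, which this page does not state, that each audio message's timestamp marks its first frame and that frames are evenly spaced at `sample_rate`. `sample_time_ns` and `locate_audio` are hypothetical helpers. The message start times and frame counts would come from the decoded `/audio` messages, sorted by time.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

NS_PER_S = 1_000_000_000


def sample_time_ns(msg_timestamp_ns: int, frame_index: int, sample_rate: int) -> int:
    """Timestamp of a frame inside a message, assuming the message timestamp marks frame 0."""
    return msg_timestamp_ns + (frame_index * NS_PER_S) // sample_rate


def locate_audio(
    t_ns: int,
    msg_starts_ns: List[int],
    msg_frame_counts: List[int],
    sample_rate: int,
) -> Optional[Tuple[int, int]]:
    """Return (message index, frame offset) of the audio frame at t_ns, or None if there is none.

    msg_starts_ns must be sorted in ascending order.
    """
    # Last message that starts at or before t_ns.
    i = bisect_right(msg_starts_ns, t_ns) - 1
    if i < 0:
        return None  # t_ns is before the first audio message
    offset = ((t_ns - msg_starts_ns[i]) * sample_rate) // NS_PER_S
    if offset >= msg_frame_counts[i]:
        return None  # t_ns falls in a gap after this message, or past the end of the stream
    return i, offset
```

For example, with two 100 ms messages at 16 kHz starting at 1.0 s and 1.1 s, `locate_audio(1_150_000_000, [1_000_000_000, 1_100_000_000], [1600, 1600], 16000)` returns `(1, 800)`: frame 800 of the second message. Binary search keeps each lookup logarithmic in the number of audio messages, which matters when matching every video frame or pose sample of a long episode.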

Typical Usage

Common use cases for the audio stream include:

  • Synchronized playback with head and wrist camera video in tools like Foxglove Studio.
  • Audio‑conditioned models that leverage synchronized pose, video, and sound.
  • Detecting events (e.g., contact or object impact) in the waveform and relating them to 3D motion or hand–object interactions.
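The event-detection use case can be illustrated roughly by flagging onsets where short-time RMS energy rises above a fixed threshold. This is a generic energy-threshold detector chosen for illustration, not a method EgoSuite prescribes. `detect_onsets`, the default window of 160 samples (10 ms at 16 kHz), and the threshold value are hypothetical choices. The sketch also assumes that `start_ns` is the timestamp of the first sample. Because the returned timestamps lie on the episode's global timeline, they can be matched against pose or video messages near the same time.

```python
import math
from typing import List

NS_PER_S = 1_000_000_000


def detect_onsets(
    samples: List[int],
    sample_rate: int,
    start_ns: int,
    window: int = 160,
    threshold: float = 1000.0,
) -> List[int]:
    """Return start timestamps (ns) of windows where RMS energy rises above threshold.

    samples: mono pcm-s16 values; start_ns: timestamp of samples[0] (assumed).
    Trailing samples that do not fill a whole window are ignored.
    """
    onsets: List[int] = []
    above = False
    for begin in range(0, len(samples) - window + 1, window):
        chunk = samples[begin:begin + window]
        rms = math.sqrt(sum(s * s for s in chunk) / window)
        if rms > threshold and not above:
            # Rising edge: record the time of the first sample in this window.
            onsets.append(start_ns + (begin * NS_PER_S) // sample_rate)
        above = rms > threshold
    return onsets
```

Only rising edges are reported, so a sustained sound produces one onset rather than one per window. A fixed threshold is sensitive to recording gain and background noise; an adaptive threshold (for example, relative to a running noise floor) is a common refinement.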