Audio

When present, the audio stream in an EgoSuite episode is recorded together with the video and synchronized with it.

Topic & Message Type

Topic: /audio
Message type: foxglove.RawAudio
Encoding: protobuf

There is typically a single audio topic per episode. The foxglove.RawAudio message currently supports only the pcm-s16 format.
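As a minimal sketch of what handling a pcm-s16 payload can look like, the snippet below splits the raw bytes of one message into per-channel sample lists. It assumes the samples are little-endian signed 16-bit integers interleaved by channel, and that the payload and channel count come from the message's `data` and `number_of_channels` fields; check both assumptions against the Foxglove schema and your own files. The helper name `decode_pcm_s16` is hypothetical.

```python
import struct


def decode_pcm_s16(data: bytes, number_of_channels: int) -> list[list[int]]:
    """Split interleaved 16-bit PCM into one list of samples per channel.

    Assumption: little-endian, channel-interleaved samples.
    """
    if number_of_channels < 1:
        raise ValueError("number_of_channels must be >= 1")
    frame_bytes = 2 * number_of_channels  # 2 bytes per sample, one sample per channel
    if len(data) % frame_bytes != 0:
        raise ValueError("payload is not a whole number of audio frames")
    count = len(data) // 2
    samples = struct.unpack(f"<{count}h", data)
    # Sample i belongs to channel i % number_of_channels.
    return [list(samples[ch::number_of_channels]) for ch in range(number_of_channels)]


# Example: two stereo frames, (L=1, R=-1) followed by (L=32767, R=-32768).
payload = struct.pack("<4h", 1, -1, 32767, -32768)
channels = decode_pcm_s16(payload, number_of_channels=2)
print(channels)  # [[1, 32767], [-1, -32768]]
```

Dividing each sample by 32768.0 gives floats in the range [-1.0, 1.0), which is the form most audio libraries expect.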

Timing & Synchronization

Audio in EgoSuite MCAP files is designed to be time‑aligned with other modalities:

Each foxglove.RawAudio message carries a nanosecond‑precision timestamp.
These timestamps are on the same global timeline as:
- Camera video topics (/sensor/camera/*/video)
- Pose topics (/pose/body, /pose/left_hand, /pose/right_hand, /pose/head_pose, etc.)
- Point clouds (/pointcloud)
This allows you to:
- Play back audio and video in sync.
- Analyze how sound evolves while the human performs actions captured by body/hand pose.

Typical Usage

Common use cases for the audio stream include:

Synchronized playback with head and wrist camera video in tools like Foxglove Studio.
Audio‑conditioned models that leverage synchronized pose, video, and sound.
Detecting events (e.g. contact, object impact) in the waveform and relating them to 3D motion or hand–object interactions.
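As one illustration of the event-detection use case, the sketch below flags windows whose RMS energy jumps sharply relative to the previous window and converts each hit to a timestamp on the shared timeline. This is a deliberately simple baseline, not a method prescribed by EgoSuite; the window size, ratio, and noise floor are arbitrary assumptions that need tuning on real recordings.

```python
import math


def detect_onsets(samples, sample_rate, start_ns, window=256, ratio=4.0, floor=1.0):
    """Return timestamps (ns) of windows whose RMS exceeds `ratio` times the previous one.

    `floor` keeps near-silent windows from making any tiny noise look like a jump.
    """
    onsets = []
    prev_rms = None
    for begin in range(0, len(samples) - window + 1, window):
        frame = samples[begin:begin + window]
        rms = math.sqrt(sum(s * s for s in frame) / window)
        if prev_rms is not None and rms > ratio * max(prev_rms, floor):
            # Timestamp of the first sample in the window that triggered.
            onsets.append(start_ns + begin * 1_000_000_000 // sample_rate)
        prev_rms = rms
    return onsets


# Example: silence, then a loud burst starting at sample 512 of a 16 kHz mono signal.
signal = [0] * 512 + [10_000] * 256 + [0] * 256
events = detect_onsets(signal, sample_rate=16_000, start_ns=0)
print(events)  # [32000000]
```

The resulting timestamps can be compared directly with pose or point cloud timestamps, for example to find the hand pose nearest to each detected impact.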
