Audio
Some EgoSuite episodes include an audio stream, recorded together with the video and synchronized with it.
Topic & Message Type
- Topic: /audio
- Message type: foxglove.RawAudio
- Encoding: protobuf
There is typically a single audio topic per episode.
The foxglove.RawAudio message currently supports only the pcm-s16 format.
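As a rough illustration of reading this topic, the sketch below decodes pcm-s16 payloads into per-channel sample lists. It makes several assumptions that are not stated above. The samples are assumed to be interleaved, little-endian, signed 16-bit integers. The message field names (timestamp, data, format, sample_rate, number_of_channels) are assumed from the public Foxglove schema. The mcap and mcap-protobuf-support Python packages are assumed to be installed. The helper names decode_pcm_s16 and iter_audio_chunks are hypothetical.

```python
import struct
from typing import Iterator, List, Tuple


def decode_pcm_s16(data: bytes, number_of_channels: int) -> List[List[int]]:
    """Split interleaved 16-bit PCM into one list of samples per channel.

    Assumes little-endian signed samples; adjust the format string if the
    recording uses a different byte order.
    """
    if number_of_channels < 1:
        raise ValueError("number_of_channels must be >= 1")
    if len(data) % (2 * number_of_channels) != 0:
        raise ValueError("payload is not a whole number of audio frames")
    samples = struct.unpack(f"<{len(data) // 2}h", data)
    # Channel c owns every number_of_channels-th sample, starting at index c.
    return [list(samples[c::number_of_channels]) for c in range(number_of_channels)]


def iter_audio_chunks(
    path: str, topic: str = "/audio"
) -> Iterator[Tuple[int, int, List[List[int]]]]:
    """Yield (timestamp_ns, sample_rate, channels) for each audio message."""
    # Imported lazily so decode_pcm_s16 works without these packages.
    from mcap.reader import make_reader
    from mcap_protobuf.decoder import DecoderFactory

    with open(path, "rb") as f:
        reader = make_reader(f, decoder_factories=[DecoderFactory()])
        for _schema, _channel, _message, audio in reader.iter_decoded_messages(
            topics=[topic]
        ):
            if audio.format != "pcm-s16":
                continue  # skip formats this sketch does not handle
            # Field names below are assumed from the Foxglove RawAudio schema.
            t_ns = audio.timestamp.seconds * 1_000_000_000 + audio.timestamp.nanos
            yield t_ns, audio.sample_rate, decode_pcm_s16(
                audio.data, audio.number_of_channels
            )
```

For example, a stereo payload holding the frames (1, -1) and (2, -2) decodes to [[1, 2], [-1, -2]].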
Timing & Synchronization
Audio in EgoSuite MCAP files is designed to be time‑aligned with other modalities:
- Each foxglove.RawAudio message carries a nanosecond‑precision timestamp.
- These timestamps are on the same global timeline as:
  - Camera video topics (/sensor/camera/*/video)
  - Pose topics (/pose/body, /pose/left_hand, /pose/right_hand, /pose/head_pose, etc.)
  - Point clouds (/pointcloud)
- This allows you to:
  - Play back audio and video in sync.
  - Analyze how sound evolves while the human performs actions captured by body/hand pose.
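Because all modalities share one timeline, a timestamp taken from a video frame or pose message can be mapped to a position in the audio stream. The sketch below shows one way to do that, assuming each audio message's timestamp marks the start of its first sample and that the sample rate is constant across the episode. The function name locate_sample and its arguments are hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence, Tuple

NS_PER_S = 1_000_000_000


def locate_sample(
    chunk_starts_ns: Sequence[int], sample_rate: int, t_ns: int
) -> Optional[Tuple[int, int]]:
    """Map a timestamp to (chunk_index, sample_offset) in the audio stream.

    chunk_starts_ns must be sorted ascending. Returns None when t_ns is
    earlier than the first audio chunk.
    """
    # Last chunk whose start time is <= t_ns.
    i = bisect_right(chunk_starts_ns, t_ns) - 1
    if i < 0:
        return None
    # Integer arithmetic avoids float rounding at nanosecond scale.
    offset = (t_ns - chunk_starts_ns[i]) * sample_rate // NS_PER_S
    return i, offset


# Three 20 ms chunks at 48 kHz; a video frame stamped at 25 ms falls
# 5 ms into the second chunk, i.e. 240 samples in.
starts = [0, 20_000_000, 40_000_000]
position = locate_sample(starts, 48_000, 25_000_000)
```

If the stream has gaps, the returned offset can exceed the length of the chunk, so callers should compare it against the number of samples the chunk actually contains.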
Typical Usage
Common use cases for the audio stream include:
- Synchronized playback with head and wrist camera video in tools like Foxglove Studio.
- Training audio‑conditioned models on synchronized pose, video, and sound.
- Detecting events (e.g. contact, object impact) in the waveform and relating them to 3D motion or hand–object interactions.
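As a minimal sketch of the event-detection use case, the code below flags windows whose short-time RMS energy rises well above the median level and reports their start times on the shared timeline. This simple energy threshold is a stand-in chosen for illustration, not a method prescribed by EgoSuite. The function names, the 10 ms window, and the 4x ratio are all assumptions. The resulting timestamps can then be compared against pose or video timestamps.

```python
import math
from statistics import median
from typing import List, Sequence

NS_PER_S = 1_000_000_000


def window_rms(samples: Sequence[int], window: int) -> List[float]:
    """RMS level of each complete, non-overlapping window of samples."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]


def detect_onsets(
    samples: Sequence[int],
    sample_rate: int,
    start_ns: int,
    window_ms: float = 10.0,
    ratio: float = 4.0,
) -> List[int]:
    """Return timestamps (ns) where the signal first becomes loud.

    A window is "loud" when its RMS exceeds ratio times the median RMS.
    Only the first window of each loud run is reported.
    """
    window = max(1, int(sample_rate * window_ms / 1000))
    levels = window_rms(samples, window)
    if not levels:
        return []
    # Clamp the noise floor so digital silence does not make it zero.
    floor = max(median(levels), 1.0)
    onsets = []
    was_loud = False
    for k, level in enumerate(levels):
        loud = level > ratio * floor
        if loud and not was_loud:
            onsets.append(start_ns + k * window * NS_PER_S // sample_rate)
        was_loud = loud
    return onsets
```

Here samples is a single channel of decoded pcm-s16 values and start_ns is the timestamp of its first sample.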