Recording
An overview of Spoke Work's recording pipeline: audio capture, streaming transcription, and speaker identification.
Audio Capture
Spoke Work records audio in WAV format with these parameters:
| Parameter | Value |
|---|---|
| Sample rate | 16,000 Hz |
| Channels | Mono |
| Bit depth | 16-bit PCM |
| Replay buffer | 5 seconds |
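These parameters fix the data rate: 16,000 samples per second × 1 channel × 2 bytes per sample is 32,000 bytes per second, so 5 seconds of raw PCM at these settings is 160,000 bytes. The sketch below works through that arithmetic and wraps raw PCM in a WAV container using Python's standard `wave` module. The constant names and the `pcm_to_wav` helper are illustrative and not taken from the Spoke Work codebase. Whether the replay buffer actually stores raw PCM is an assumption.

```python
import io
import wave

# Values from the parameter table above.
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1                # mono
SAMPLE_WIDTH_BYTES = 2      # 16-bit PCM
REPLAY_BUFFER_SECONDS = 5

# Derived sizes (assumes uncompressed PCM throughout).
BYTES_PER_SECOND = SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES
REPLAY_BUFFER_BYTES = BYTES_PER_SECOND * REPLAY_BUFFER_SECONDS


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw 16-bit mono PCM bytes in a WAV container (hypothetical helper)."""
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return out.getvalue()
```

At this rate, one minute of audio is about 1.9 MB before upload.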
The screen stays awake during recording. Recording phases: idle → recording → paused → completed.
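The phases can be modeled as a small state machine. The sketch below is one possible reading of the phase list, not the app's actual implementation. It assumes that a paused recording can resume and that a recording can be completed directly without pausing first, neither of which is stated above. The names `Phase`, `ALLOWED`, and `transition` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PAUSED = "paused"
    COMPLETED = "completed"


# Assumed transition table: resume-from-pause and direct stop are guesses.
ALLOWED = {
    Phase.IDLE: {Phase.RECORDING},
    Phase.RECORDING: {Phase.PAUSED, Phase.COMPLETED},
    Phase.PAUSED: {Phase.RECORDING, Phase.COMPLETED},
    Phase.COMPLETED: set(),
}


def transition(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```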
Streaming Transcription
Audio is streamed to Deepgram over a WebSocket in real time:
- The connection reconnects automatically on failure (up to 3 attempts)
- Speaker diarization is enabled by default
- Transcripts arrive first as interim results, then as final results with punctuation
- Results include speaker labels and timestamps
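A client consuming this stream typically shows the latest interim result as provisional text and replaces it once the final result arrives. The sketch below illustrates that pattern. The result dictionaries use a simplified, hypothetical shape (`is_final`, `speaker`, `start`, `text`) rather than Deepgram's actual response schema, and the `Transcript` class is not part of Spoke Work.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    finalized: List[dict] = field(default_factory=list)
    interim: Optional[dict] = None

    def handle(self, result: dict) -> None:
        """Store a final result permanently; keep only the newest interim one."""
        if result["is_final"]:
            self.finalized.append(result)
            self.interim = None  # the final result supersedes the interim text
        else:
            self.interim = result

    def display_text(self) -> str:
        """Finalized text followed by the current provisional text, if any."""
        parts = [r["text"] for r in self.finalized]
        if self.interim is not None:
            parts.append(self.interim["text"])
        return " ".join(parts)
```

For example, an interim `"hello wor"` followed by a final `"Hello, world."` for the same utterance leaves only the punctuated final text in the display.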
Speaker Identification
Spoke Work uses a two-layer approach:
- Deepgram diarization — Assigns numeric speaker labels (Speaker 0, Speaker 1, etc.)
- AI speaker inference — Maps numeric labels to actual participant names using conversation context and participant descriptions from the channel
The inference runs in a Supabase Edge Function with a 30-second timeout. Supplying participant names and descriptions improves its accuracy.
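Once the inference returns, applying it amounts to a lookup from numeric label to name. The sketch below shows that second layer in isolation, under the assumption that the inference result is a plain mapping such as `{0: "Alice"}` and that unmapped labels fall back to the generic "Speaker N" form (for instance, if the inference times out). The function name and segment shape are hypothetical.

```python
from typing import Dict, List, Optional


def apply_speaker_names(
    segments: List[dict], names: Optional[Dict[int, str]]
) -> List[dict]:
    """Attach a display name to each segment without mutating the input.

    `names` maps diarization labels to participant names; pass None when
    inference produced no result, and every label keeps its generic form.
    """
    mapping = names or {}
    return [
        {**seg, "speaker_name": mapping.get(seg["speaker"], f"Speaker {seg['speaker']}")}
        for seg in segments
    ]
```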
Speaker Smoothing
A smoothing algorithm reduces label flickering during real-time diarization. When a segment is misattributed, it uses the surrounding context to correct the assignment.
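The exact algorithm is not described here. As an illustration of the general idea only, the sketch below uses a simple neighbor rule: a short segment whose speaker differs from both neighbors, when those neighbors agree with each other, is reassigned to the neighbors' speaker. The one-second threshold, the segment shape, and the function name are all assumptions.

```python
from typing import List


def smooth_segments(segments: List[dict], max_flicker_seconds: float = 1.0) -> List[dict]:
    """Reassign brief, isolated segments to the speaker on either side.

    Decisions are based on the original labels, so one correction never
    cascades into the next. The input list is not modified.
    """
    smoothed = [dict(s) for s in segments]
    for i in range(1, len(segments) - 1):
        before, current, after = segments[i - 1], segments[i], segments[i + 1]
        duration = current["end"] - current["start"]
        if (
            before["speaker"] == after["speaker"]
            and current["speaker"] != before["speaker"]
            and duration < max_flicker_seconds
        ):
            smoothed[i]["speaker"] = before["speaker"]
    return smoothed
```

The duration check keeps genuine short turns in a back-and-forth exchange from being erased as long as they exceed the threshold. A production algorithm would likely weigh more context than two neighbors.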
Upload and Recovery
After recording stops:
- The audio file is saved locally as WAV
- The upload to Supabase Storage begins
- If the upload fails, it is retried 3 times, with delays of 2s, 5s, and 10s
- The full transcript with speaker labels is saved to the database
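The retry schedule above can be sketched as a loop over fixed delays. This version reads "retries 3 times" as one initial attempt plus three retries (four attempts in total), which is an interpretation rather than something stated above. The `upload` callable stands in for the real Supabase Storage call, and `sleep` is injectable so the schedule can be exercised without waiting.

```python
import time
from typing import Callable

RETRY_DELAYS_SECONDS = (2, 5, 10)  # delays before the 1st, 2nd, and 3rd retry


def upload_with_retries(
    upload: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run `upload`, retrying after each listed delay; return True on success."""
    for attempt in range(len(RETRY_DELAYS_SECONDS) + 1):
        try:
            upload()
            return True
        except Exception:
            if attempt == len(RETRY_DELAYS_SECONDS):
                return False  # retries exhausted; the local WAV file remains
            sleep(RETRY_DELAYS_SECONDS[attempt])
    return False
```

Because the audio is saved locally before the upload starts, a `False` result here leaves the recording available for a later recovery attempt.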