Recording

How Spoke Work's recording pipeline works — audio capture, streaming transcription, and speaker identification.

Audio Capture

Spoke Work records audio in WAV format with these parameters:

Parameter	Value
Sample rate	16,000 Hz
Channels	Mono
Bit depth	16-bit PCM
Replay buffer	5 seconds
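
As a rough illustration of what these parameters imply, the sketch below derives the data rate and writes a WAV container with Python's standard `wave` module. The constant and function names are hypothetical, and sizing the replay buffer as raw PCM in the same format is an assumption, not a documented detail of Spoke Work.

```python
import io
import wave

SAMPLE_RATE_HZ = 16_000      # from the parameter table
CHANNELS = 1                 # mono
SAMPLE_WIDTH_BYTES = 2       # 16-bit PCM
REPLAY_BUFFER_SECONDS = 5


def bytes_per_second() -> int:
    """Raw PCM data rate implied by the parameters."""
    return SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES


def replay_buffer_bytes() -> int:
    """Size of the replay buffer, assuming it stores raw PCM (an assumption)."""
    return bytes_per_second() * REPLAY_BUFFER_SECONDS


def write_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()
```

At these settings one second of audio is 32,000 bytes of PCM, so a 5-second buffer would hold about 160,000 bytes.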

The screen stays awake while a recording is in progress. A recording moves through these phases: idle → recording → paused → completed.
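
The phases can be modeled as a small state machine. The sketch below is a guess at the allowed transitions: resuming from paused and completing directly from recording are assumptions, and the names `Phase`, `ALLOWED`, and `transition` are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PAUSED = "paused"
    COMPLETED = "completed"


# Assumed transition table; the source only lists the phase order.
ALLOWED = {
    Phase.IDLE: {Phase.RECORDING},
    Phase.RECORDING: {Phase.PAUSED, Phase.COMPLETED},
    Phase.PAUSED: {Phase.RECORDING, Phase.COMPLETED},
    Phase.COMPLETED: set(),
}


def transition(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```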

Streaming Transcription

Audio is streamed to Deepgram over a WebSocket in real time:

The connection reconnects automatically on failure (up to 3 attempts)
Speaker diarization is enabled by default
Transcription arrives as interim results, then final results with punctuation
Results include speaker labels and timestamps
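
The behavior listed above can be sketched in two parts: a bounded reconnect loop, and a buffer in which each interim result is replaced by the next one until a final result arrives. This is a minimal illustration, not Spoke Work's client code. The `connect` callable, the `Result` fields, and `TranscriptBuffer` are hypothetical names and do not mirror Deepgram's actual message schema.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")
MAX_RECONNECT_ATTEMPTS = 3  # from the list above


def connect_with_retry(connect: Callable[[], T],
                       max_attempts: int = MAX_RECONNECT_ATTEMPTS) -> T:
    """Call `connect` until it succeeds or the attempt budget is spent."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return connect()
        except ConnectionError as err:
            last_error = err
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error


@dataclass
class Result:
    text: str
    speaker: int      # numeric diarization label
    start: float      # seconds from the start of the recording
    is_final: bool


class TranscriptBuffer:
    """Keeps final results and the latest interim result only."""

    def __init__(self) -> None:
        self.finals: List[Result] = []
        self.interim: Optional[Result] = None

    def add(self, result: Result) -> None:
        if result.is_final:
            self.finals.append(result)
            self.interim = None   # the final result supersedes the interim one
        else:
            self.interim = result

    def text(self) -> str:
        parts = [r.text for r in self.finals]
        if self.interim is not None:
            parts.append(self.interim.text)
        return " ".join(parts)
```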

Speaker Identification

Spoke Work uses a two-layer approach:

Deepgram diarization — Assigns numeric speaker labels (Speaker 0, Speaker 1, etc.)
AI speaker inference — Maps numeric labels to actual participant names using conversation context and participant descriptions from the channel

The inference runs in a Supabase Edge Function with a 30-second timeout. Supplying participant names and descriptions improves accuracy.
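
To make the second layer concrete, the sketch below applies an inferred label-to-name mapping to diarized segments. Falling back to "Speaker N" when a label has no inferred name (for example, if the inference timed out) is an assumption, and the function name and segment keys are hypothetical.

```python
from typing import Dict, List


def apply_speaker_names(segments: List[dict], names: Dict[int, str]) -> List[dict]:
    """Attach a display name to each segment without mutating the input.

    `names` maps numeric diarization labels to participant names. Labels
    missing from the mapping keep a generic "Speaker N" name (assumed fallback).
    """
    named = []
    for seg in segments:
        label = seg["speaker"]
        display = names.get(label, f"Speaker {label}")
        named.append({**seg, "speaker_name": display})
    return named
```

For example, with the mapping `{0: "Alice"}`, a segment labeled 0 is shown as "Alice" and a segment labeled 1 stays "Speaker 1".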

Speaker Smoothing

A smoothing algorithm reduces label flickering during real-time diarization. When a segment appears to be misattributed, the algorithm uses the surrounding context to correct the assignment.
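
The source does not specify the smoothing algorithm. One common way to damp flicker is a sliding-window majority vote over neighboring segment labels, sketched below as an illustration only. The function name and window size are hypothetical.

```python
from collections import Counter
from typing import List


def smooth_labels(labels: List[int], window: int = 3) -> List[int]:
    """Relabel a segment when a different label holds a strict majority
    in the window centered on it. Votes use the original labels only."""
    half = window // 2
    smoothed = []
    for i, label in enumerate(labels):
        neighborhood = labels[max(0, i - half): i + half + 1]
        top, top_count = Counter(neighborhood).most_common(1)[0]
        if top != label and top_count > len(neighborhood) / 2:
            smoothed.append(top)      # isolated label is outvoted
        else:
            smoothed.append(label)    # keep the original on ties or agreement
    return smoothed
```

With a window of 3, a single stray label between two segments from the same speaker is corrected, while a genuine change of speaker that persists for two or more segments is left alone.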

Upload and Recovery

After recording stops:

Audio file is saved locally as WAV
Upload to Supabase Storage begins
On failure, the upload is retried 3 times, with delays of 2 s, 5 s, and 10 s
The full transcript with speaker labels is saved to the database
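
The retry schedule can be sketched as follows. This assumes one initial attempt followed by three retries, with each delay applied before its retry. The `upload` callable stands in for the real Supabase Storage call, and all names here are hypothetical.

```python
import time
from typing import Callable, Sequence

RETRY_DELAYS_S = (2, 5, 10)  # from the steps above


def upload_with_retry(upload: Callable[[], None],
                      delays: Sequence[float] = RETRY_DELAYS_S,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `upload`, retrying after each delay; return the attempt count.

    Re-raises the last error once every retry has been used.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            upload()
            return attempt
        except OSError:
            if attempt > len(delays):
                raise                     # initial attempt + all retries failed
            sleep(delays[attempt - 1])    # wait 2 s, then 5 s, then 10 s
```

Passing `sleep` as a parameter keeps the schedule testable without real waiting.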