Spoke Work

Recording

How Spoke Work's recording pipeline works — audio capture, streaming transcription, and speaker identification.

Audio Capture

Spoke Work records audio in WAV format with these parameters:

Parameter      Value
Sample rate    16,000 Hz
Channels       Mono
Bit depth      16-bit PCM
Replay buffer  5 seconds
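
As a rough illustration of what these parameters imply, the sketch below uses Python's standard `wave` module to write PCM data with the same settings and to work out how many bytes a 5-second buffer holds. The function names are hypothetical and this is not Spoke Work's own capture code, only a minimal sketch assuming raw little-endian PCM input.

```python
import io
import wave

SAMPLE_RATE_HZ = 16_000      # sample rate from the table above
CHANNELS = 1                 # mono
SAMPLE_WIDTH_BYTES = 2       # 16-bit PCM
REPLAY_BUFFER_SECONDS = 5    # replay buffer length from the table above


def bytes_per_second() -> int:
    """Raw PCM data rate implied by the parameters."""
    return SAMPLE_RATE_HZ * CHANNELS * SAMPLE_WIDTH_BYTES


def replay_buffer_bytes() -> int:
    """Size of a buffer holding REPLAY_BUFFER_SECONDS of raw PCM."""
    return bytes_per_second() * REPLAY_BUFFER_SECONDS


def write_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM bytes in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(pcm)
    return buf.getvalue()
```

At these settings the audio runs at 32,000 bytes per second, so a 5-second buffer holds 160,000 bytes of raw PCM.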

The screen stays awake during recording. Recording phases: idle, recording, paused, completed.
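
The four phases can be modeled as a small state machine. The phase names come from the text above; the allowed transitions in this sketch (for example, that a paused recording can resume or complete) are assumptions made for illustration and are not confirmed by the source.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PAUSED = "paused"
    COMPLETED = "completed"


# Assumed transition table; the real app may allow different moves.
ALLOWED_TRANSITIONS = {
    Phase.IDLE: {Phase.RECORDING},
    Phase.RECORDING: {Phase.PAUSED, Phase.COMPLETED},
    Phase.PAUSED: {Phase.RECORDING, Phase.COMPLETED},
    Phase.COMPLETED: set(),
}


def transition(current: Phase, target: Phase) -> Phase:
    """Return the new phase, or raise if the move is not in the assumed table."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```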

Streaming Transcription

Audio is streamed to Deepgram over a WebSocket in real time:

  • Connection automatically reconnects on failure (up to 3 attempts)
  • Speaker diarization is enabled by default
  • Transcription arrives as interim results, then final results with punctuation
  • Results include speaker labels and timestamps
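
One way a client might handle the interim-then-final pattern is to keep finalized text and let each new interim result overwrite the previous one. The sketch below is a minimal illustration; the class name is hypothetical, and the message fields it reads (`is_final`, `channel.alternatives[0].transcript`) are modeled on Deepgram's live-streaming results and should be verified against the current API reference.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TranscriptBuffer:
    """Keeps finalized text plus the latest interim guess (hypothetical helper)."""

    finals: List[str] = field(default_factory=list)
    interim: Optional[str] = None

    def handle(self, message: dict) -> None:
        # Assumed message shape: {"is_final": bool,
        #   "channel": {"alternatives": [{"transcript": str}]}}
        alternatives = message.get("channel", {}).get("alternatives", [])
        text = alternatives[0].get("transcript", "") if alternatives else ""
        if message.get("is_final"):
            if text:
                self.finals.append(text)
            # A final result supersedes whatever interim text was showing.
            self.interim = None
        else:
            self.interim = text or None

    def display_text(self) -> str:
        """Finalized text followed by the current interim text, if any."""
        parts = list(self.finals)
        if self.interim:
            parts.append(self.interim)
        return " ".join(parts)
```

With this design the interim text is only ever shown provisionally, so the punctuated final result replaces it without leaving duplicates in the transcript.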

Speaker Identification

Spoke Work uses a two-layer approach:

  1. Deepgram diarization — Assigns numeric speaker labels (Speaker 0, Speaker 1, etc.)
  2. AI speaker inference — Maps numeric labels to actual participant names using conversation context and participant descriptions from the channel

The inference runs in a Supabase Edge Function with a 30-second timeout. Supplying participant names and descriptions for the channel improves its accuracy.
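
The second layer can be pictured as applying a label-to-name mapping to diarized segments. In the sketch below the segment shape, the `speaker_name` key, and the function name are hypothetical; it also assumes that a label with no inferred name (for example, if the inference call timed out) simply keeps its numeric "Speaker N" form.

```python
from typing import Dict, List


def apply_speaker_names(
    segments: List[dict], names_by_label: Dict[int, str]
) -> List[dict]:
    """Attach a display name to each segment without mutating the input.

    Each segment is assumed to carry an integer "speaker" label from
    diarization. Labels missing from the mapping fall back to "Speaker N".
    """
    named = []
    for segment in segments:
        label = segment["speaker"]
        name = names_by_label.get(label, f"Speaker {label}")
        named.append({**segment, "speaker_name": name})
    return named
```

For example, a mapping of `{0: "Alice"}` would rename Speaker 0 to Alice and leave Speaker 1 labeled as "Speaker 1".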

Speaker Smoothing

A smoothing algorithm reduces label flickering during real-time diarization. When a segment is misattributed, the smoothing algorithm considers surrounding context to correct the assignment.
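
The source does not describe the smoothing algorithm itself. As one plausible illustration only, the sketch below relabels a short segment when the segments on both sides belong to the same other speaker. The heuristic, the 1-second threshold, and all names are assumptions, not Spoke Work's actual method.

```python
from typing import List, Tuple

# (speaker label, start seconds, end seconds)
Segment = Tuple[int, float, float]


def smooth_speakers(
    segments: List[Segment], max_island_seconds: float = 1.0
) -> List[Segment]:
    """Relabel short 'island' segments to match their neighbors.

    A segment is treated as an island when the previous and next segments
    share one speaker, the segment has a different speaker, and it lasts
    no longer than max_island_seconds (an assumed threshold).
    """
    smoothed = list(segments)
    for i in range(1, len(smoothed) - 1):
        speaker, start, end = smoothed[i]
        prev_speaker = smoothed[i - 1][0]
        next_speaker = smoothed[i + 1][0]
        is_island = prev_speaker == next_speaker and speaker != prev_speaker
        if is_island and (end - start) <= max_island_seconds:
            smoothed[i] = (prev_speaker, start, end)
    return smoothed
```

Longer segments are left alone on the assumption that a sustained change of label is more likely a genuine change of speaker than a flicker.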

Upload and Recovery

After recording stops:

  1. Audio file is saved locally as WAV
  2. Upload to Supabase Storage begins
  3. On failure, the upload is retried up to 3 times, with delays of 2s, 5s, and 10s
  4. The full transcript with speaker labels is saved to the database
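
The retry schedule in step 3 can be sketched as a loop over fixed delays. The function name and the `upload` callable are hypothetical stand-ins for the real Supabase Storage call, and the sketch assumes "retries 3 times" means one initial attempt followed by up to three retries.

```python
import time
from typing import Callable, Sequence

RETRY_DELAYS_SECONDS = (2, 5, 10)  # delays from step 3 above


def upload_with_retries(
    upload: Callable[[], None],
    delays: Sequence[float] = RETRY_DELAYS_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run upload(), retrying after each delay; return the attempts used.

    Re-raises the last error once every retry has failed. The sleep
    function is injectable so the schedule can be tested without waiting.
    """
    attempts = 0
    # One initial attempt, then one retry per delay; None marks the last try.
    for delay in (*delays, None):
        attempts += 1
        try:
            upload()
            return attempts
        except Exception:  # real code would catch a narrower error type
            if delay is None:
                raise
            sleep(delay)
    return attempts  # not reached; kept so every path returns an int
```

Because the audio is already saved locally in step 1, a final failure here does not lose the recording, and the upload can be attempted again later.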
