When diarization is enabled (`isDiarization: true`) on `/streams`, the server attributes speech to distinct speakers. Because segments for different speakers are finalized independently, `type: "transcript"` messages are not guaranteed to arrive in chronological order over the WebSocket.

This applies to multi-speaker `/streams` sessions only when diarization is enabled. The `/transcribe` endpoint is single-speaker only.

Diarized Transcript Segment Ordering

Each `transcript` message carries a `data` array, and multiple segments for different speakers can arrive in the same message. When ordering transcript text, account for both speaker and speech time: a later-spoken segment from one speaker can arrive before an earlier-spoken segment from another.

Only final segments are returned on `/streams` (`final: true`); there are no interim results to reconcile. The handling concern is purely ordering, not the interim-vs-final deduplication required when handling dictation transcripts.

Each entry in `data` includes a `time` object you can use to recover the true order:

| Field | Use |
|---|---|
| `time.start` | Primary ordering key: when the speech began (seconds) |
| `time.end` | Secondary key for tie-breaks and overlap checks (seconds) |
| `speakerId` | A distinct integer per detected speaker (up to four). Returns `-1` when diarization is disabled. |
| `participant.channel` | Audio channel the segment was attributed to |
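
The fields in the table can be pulled out of a message roughly as follows. This is a minimal sketch, not a reference client: the `Segment` class and `parse_transcript_message` function are hypothetical names, the `text` key is an assumption (this page does not name the field that holds the transcribed words), and the nesting of `time` and `participant` is inferred from the dotted field names above.

```python
import json
from dataclasses import dataclass
from typing import Any, List


@dataclass(frozen=True)
class Segment:
    start: float      # time.start, in seconds
    end: float        # time.end, in seconds
    speaker_id: int   # speakerId; -1 when diarization is disabled
    channel: Any      # participant.channel, kept as received
    text: str         # ASSUMPTION: the key name "text" is not confirmed by this page


def parse_transcript_message(raw: str) -> List[Segment]:
    """Return every segment in a transcript message, or [] for other message types."""
    message = json.loads(raw)
    if message.get("type") != "transcript":
        return []
    segments = []
    # A single message may carry several segments, so walk the whole array.
    for item in message.get("data", []):
        segments.append(
            Segment(
                start=float(item["time"]["start"]),
                end=float(item["time"]["end"]),
                speaker_id=int(item["speakerId"]),
                channel=item.get("participant", {}).get("channel"),
                text=item.get("text", ""),
            )
        )
    return segments
```

Keeping `speaker_id` and `channel` as separate attributes avoids baking in a mapping between the two.
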

`speakerId` and `participant.channel` are independent concepts. Diarization separates speakers within a transcribed stream; the channel reflects audio routing. Do not assume a fixed mapping between the two.

Recommendation

- Position each segment by `time.start`, not by arrival order. Insert into the running transcript at the position determined by `time.start` rather than appending to the end.
- Use `time.end` as a secondary key for tie-breaks.
- Iterate the full `data` array on every message and order before rendering; do not assume a single segment per message.

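
The three recommendations above could be combined along these lines. This is a sketch under stated assumptions rather than a definitive implementation: `OrderedTranscript` is a hypothetical helper, messages are assumed to be already decoded from JSON into dictionaries, and the `text` key used in the demo is an assumption not confirmed by this page. An arrival counter is appended to the sort key so that segments with identical times keep their arrival order.

```python
import bisect
import itertools
from typing import Any, Dict, List, Tuple

Segment = Dict[str, Any]


class OrderedTranscript:
    """Keeps segments sorted by (time.start, time.end), regardless of arrival order."""

    def __init__(self) -> None:
        self._entries: List[Tuple[float, float, int, Segment]] = []
        self._arrival = itertools.count()  # unique final tie-breaker

    def add_message(self, message: Dict[str, Any]) -> None:
        """Insert every segment of a transcript message at its sorted position."""
        if message.get("type") != "transcript":
            return
        # Iterate the full data array: one message may hold several segments.
        for segment in message.get("data", []):
            entry = (
                segment["time"]["start"],  # primary key
                segment["time"]["end"],    # secondary key for tie-breaks
                next(self._arrival),       # keeps tuple comparison off the dict
                segment,
            )
            bisect.insort(self._entries, entry)

    def segments(self) -> List[Segment]:
        """Return the segments in spoken order."""
        return [entry[3] for entry in self._entries]


if __name__ == "__main__":
    transcript = OrderedTranscript()
    # Later speech arrives first; "text" is an assumed field name.
    transcript.add_message({"type": "transcript", "data": [
        {"time": {"start": 4.0, "end": 6.0}, "speakerId": 1, "text": "Sounds good."},
    ]})
    transcript.add_message({"type": "transcript", "data": [
        {"time": {"start": 1.0, "end": 3.0}, "speakerId": 0, "text": "Shall we start?"},
    ]})
    for seg in transcript.segments():
        print(f'[{seg["time"]["start"]:.1f}s] speaker {seg["speakerId"]}: {seg["text"]}')
```

A sorted insert keeps each update cheap for a live view; re-sorting the whole list on every message would also work for short sessions.
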
Please contact us if you need further assistance working with diarized transcripts.