The /transcribe WebSocket emits transcript segments as plain text. Each segment is produced without full knowledge of your UI state (cursor position, selection, surrounding characters, field boundaries).
The backend service can’t reliably decide whether to prefix spaces, capitalize the first letter of a new sentence in your document, or avoid double-spaces because those depend on what’s already in the text field at the current cursor.
Understanding Transcript Fields
Use data.text for insertion into the document; it is already normalized, with punctuation applied and command phrases removed.
Use rawTranscriptText for debugging, analytics, or for workflows where you want to show what the user literally said (including spoken punctuation phrases).
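As a rough illustration of keeping the two fields apart, the sketch below splits one message into the text to insert and the text to log. The message layout is an assumption: the nesting under data is inferred from the name data.text, placing rawTranscriptText next to it is a guess, and the sample payload is made up for the example.

```python
import json


def pick_fields(message_json: str) -> dict:
    """Split one segment message into the text to insert and the text to log."""
    message = json.loads(message_json)
    data = message.get("data", {})  # assumed nesting, inferred from "data.text"
    return {
        "insert": data.get("text", ""),              # normalized text for the document
        "debug": data.get("rawTranscriptText", ""),  # assumed location of the raw text
    }


# Made-up payload for illustration only; not a captured server message.
sample = json.dumps({
    "data": {
        "text": "Patient reports mild chest pain.",
        "rawTranscriptText": "patient reports mild chest pain period",
    }
})
fields = pick_fields(sample)
print(fields["insert"])
```

Only the "insert" value should reach the editor; the "debug" value is for logs or a "what was said" view.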
Spacing Rules at the Insertion Boundary
The goal is to insert segments so that words are not stitched together incorrectly ("pain." + "Patient" → "pain. Patient"), while also avoiding leading spaces when the cursor is at the beginning of a field or after whitespace.
- Before inserting text:
  - Look at the character immediately before the insertion point (prevChar).
  - Look at the first character of text (nextChar).
- Then decide whether to prepend a space.
Recommendation: prepend exactly one space only if all of the following are true:
- You are not at the start of the field (cursor > 0).
- prevChar is not whitespace (' ', '\n', '\t') and is not an opening bracket/quote you want to stick to (e.g. (, [, {, “, ").
- nextChar is not punctuation that should attach to the left (e.g. , . : ; ! ? ) ] } %).
Examples:
- "pain." + "Patient reports..." → adds a space.
- "" + "Patient reports..." → no leading space.
- "(" + "mild chest pain" → no space after (.
- "pain" + "," → no space before comma.
Also:
- Collapse multiple spaces at the join (“pain.␠␠” + “Patient” → “pain.␠Patient”).
- Avoid adding a space if there’s already whitespace immediately before the cursor.
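The three conditions above can be sketched as a single predicate. This is a minimal illustration, not a reference implementation: the function name and the character sets are assumptions taken from the examples in this section, and you may need to extend them for your locale or editor.

```python
WHITESPACE = {" ", "\n", "\t"}
OPENERS = {"(", "[", "{", "\u201c", '"'}  # characters the next word should stick to
ATTACH_LEFT = {",", ".", ":", ";", "!", "?", ")", "]", "}", "%"}


def should_prepend_space(before_cursor: str, segment: str) -> bool:
    """Return True when exactly one space belongs between the text and the segment.

    before_cursor is everything in the field up to the insertion point;
    segment is the incoming transcript text.
    """
    if not before_cursor or not segment:
        return False  # start of field (cursor == 0) or nothing to insert
    prev_char = before_cursor[-1]
    next_char = segment[0]
    if prev_char in WHITESPACE or prev_char in OPENERS:
        return False  # already separated, or should hug the opening bracket/quote
    if next_char in ATTACH_LEFT:
        return False  # punctuation attaches to the word on its left
    return True
```

For example, should_prepend_space("pain.", "Patient reports...") is true, while should_prepend_space("pain", ",") and should_prepend_space("(", "mild chest pain") are both false.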
Handling Interim vs Final Segments
Because isFinal: false indicates an interim (preview) result, you’ll typically want to do the following:
- Display one “active interim span” in the editor (or in your UI layer) that you replace as new interim segments arrive.
- On isFinal: true, commit by turning the interim span into normal text and clearing the interim buffer.
- Ensure interim text from detected command phrases is also cleared.
Recommendation:
- Maintain:
  - committedText (what’s already committed in the field), and
  - interimText (what you’re showing but will replace).
- On each message:
  - If isFinal: false: replace interimText with the new segment (after applying boundary spacing vs. committed + cursor).
  - If isFinal: true: append/insert it into committedText, clear interimText.
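One way to picture the two buffers is a small state holder. The sketch below is an assumption-laden simplification: the class and method names are hypothetical, the cursor is assumed to sit at the end of the field, and the spacing check is a condensed form of the boundary rules described earlier.

```python
from dataclasses import dataclass


@dataclass
class DictationBuffer:
    """Hypothetical holder for committed text plus one replaceable interim span."""

    committed_text: str = ""
    interim_text: str = ""

    def on_segment(self, text: str, is_final: bool) -> None:
        spaced = self._with_boundary_space(text)
        if is_final:
            # Commit the segment and drop the preview it replaces.
            self.committed_text += spaced
            self.interim_text = ""
        else:
            # Each interim segment replaces the previous preview entirely.
            self.interim_text = spaced

    def _with_boundary_space(self, text: str) -> str:
        # Condensed boundary rule, assuming the cursor is at the end of the field.
        if not text or not self.committed_text:
            return text
        prev_char = self.committed_text[-1]
        if prev_char.isspace() or prev_char in "([{\u201c\"":
            return text
        if text[0] in ",.:;!?)]}%":
            return text
        return " " + text

    def render(self) -> str:
        """Text to display: committed text followed by the interim preview."""
        return self.committed_text + self.interim_text
```

A final segment whose text is empty (for instance, when a command phrase was removed) still clears the interim span, because on_segment resets interim_text on every final message.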
Cursor Movement and Selections
Transcript insertion is safest when the insertion point is stable, but handling cursor movement is recommended so that users can navigate during active dictation.
Recommendation:
- If the user moves the cursor or changes selection while dictating, treat it as a new insertion context:
  - Clear the interim span.
  - Use a flush message so remaining buffered audio returns transcripts promptly, then continue dictation in the new location. The API will respond with type: "flushed" when it is done returning text from audio received before the type: "flush" message was sent to the server.
- If there is an active selection, decide explicitly:
  - Replace the selection with the inserted transcript (common), or
  - Collapse selection to an insertion point before inserting.
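The cursor-move handling above might look roughly like the following. Everything beyond the flush and flushed message types is an assumption: the send callable stands in for whatever writes a text frame to your socket, the JSON encoding of the flush message is a guess at the wire format, and the class and function names are hypothetical.

```python
import json
from typing import Callable, Tuple


class DictationSession:
    """Hypothetical session wrapper; `send` writes one text frame to the socket."""

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send
        self.interim_text = ""
        self.awaiting_flush = False

    def on_cursor_moved(self) -> None:
        # New insertion context: drop the preview and ask for buffered audio to drain.
        self.interim_text = ""
        self.send(json.dumps({"type": "flush"}))  # assumed JSON shape
        self.awaiting_flush = True

    def on_message(self, raw: str) -> None:
        # The server signals that pre-flush audio has been fully transcribed.
        if json.loads(raw).get("type") == "flushed":
            self.awaiting_flush = False


def replace_selection(text: str, start: int, end: int, segment: str) -> Tuple[str, int]:
    """Replace text[start:end] with segment; return the new text and cursor."""
    new_text = text[:start] + segment + text[end:]
    return new_text, start + len(segment)
```

While awaiting_flush is set, your integration can decide where segments from the earlier audio should land; this sketch only tracks the flag.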
Examples
Scenarios
| # | Existing Text | Cursor Location | Incoming Text | Insert |
|---|---|---|---|---|
| 1 | "" | 0 | "Patient reports mild chest pain." | "Patient reports mild chest pain." (no leading space) |
| 2 | "Assessment:" | end | "patient denies fever." | " patient denies fever." (add one leading space) |
| 3 | "Pain (" | end | "mild" | "mild" (no space after "(") |
| 4 | "No" | end | "," | "," (no space before comma) |
Sample Code
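The following is a minimal sketch, not an official client. It inserts a segment at a cursor position, applies the spacing rules from this page, collapses a run of spaces at the join, and returns the updated text and cursor. The function name, the character sets, and the choice to collapse only plain spaces are assumptions made for illustration.

```python
from typing import Tuple

ATTACH_LEFT = ",.:;!?)]}%"   # punctuation that hugs the word on its left
OPENERS = "([{\u201c\""      # brackets/quotes the next word should stick to


def insert_segment(field_text: str, cursor: int, segment: str) -> Tuple[str, int]:
    """Insert segment at cursor and return (new_text, new_cursor)."""
    before, after = field_text[:cursor], field_text[cursor:]
    segment = segment.lstrip(" ")
    if not segment:
        return field_text, cursor

    # Collapse a run of two or more spaces before the cursor down to one.
    stripped = before.rstrip(" ")
    if len(before) - len(stripped) > 1:
        before = stripped + " "

    prev_char = before[-1:]  # empty string at the start of the field
    needs_space = (
        prev_char != ""
        and not prev_char.isspace()
        and prev_char not in OPENERS
        and segment[0] not in ATTACH_LEFT
    )
    inserted = (" " if needs_space else "") + segment
    return before + inserted + after, len(before) + len(inserted)


# The four scenarios from the table above, with the cursor at the end of the field.
for existing, incoming in [
    ("", "Patient reports mild chest pain."),
    ("Assessment:", "patient denies fever."),
    ("Pain (", "mild"),
    ("No", ","),
]:
    new_text, _ = insert_segment(existing, len(existing), incoming)
    print(repr(new_text))
```

Returning the new cursor alongside the text lets the caller keep inserting subsequent segments at the right position, including when text exists after the insertion point.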
Contact us if you need further assistance working with transcript segments.