The /transcribe WebSocket emits transcript segments as plain text. Each segment is produced without full knowledge of your UI state (cursor position, selection, surrounding characters, field boundaries). The backend service can’t reliably decide whether to prefix spaces, capitalize the first letter of a new sentence in your document, or avoid double-spaces because those depend on what’s already in the text field at the current cursor.
Treat each incoming transcript segment as a suggestion. The client is responsible for the following:
  • Whitespace normalization at the insertion boundary (before/after the inserted segment).
  • Interim rendering (show “in-progress” text that can be replaced).
  • Commit semantics when isFinal: true.
See the transcript response message reference for details on each field.

Understanding Transcript Fields

Use data.text for insertion into the document. It is already normalized, with punctuation applied and command phrases removed. Use rawTranscriptText for debugging, analytics, or workflows where you want to show what the user literally said (including spoken punctuation phrases).
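
The field choice above can be sketched as two small helpers. This is only an illustration: the TranscriptMessage interface is an assumed shape (this guide names data.text, rawTranscriptText, and isFinal, but their exact nesting here is a guess), and the helper names are hypothetical. Check the message reference for the real layout.

```typescript
// ASSUMPTION: a simplified message shape. Only the field names data.text,
// rawTranscriptText, and isFinal come from this guide; the nesting is a guess.
interface TranscriptMessage {
  type: string;
  data: {
    text: string;               // normalized text, intended for insertion
    rawTranscriptText?: string; // literal words spoken, including spoken punctuation
    isFinal: boolean;
  };
}

// Hypothetical helper: the string that should go into the document.
function textForInsertion(msg: TranscriptMessage): string {
  return msg.data.text;
}

// Hypothetical helper: the string to log or show in a "what was said" view.
// Falls back to the normalized text when no raw transcript is present.
function textForDebugLog(msg: TranscriptMessage): string {
  return msg.data.rawTranscriptText ?? msg.data.text;
}
```

Keeping the two accessors separate makes it harder to insert the raw transcript into the document by accident.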

Spacing Rules at the Insertion Boundary

The goal is to insert segments so that words are not stitched together incorrectly ("pain." + "Patient" → "pain. Patient"), while also avoiding leading spaces when the cursor is at the beginning of a field or after whitespace.
  1. Before inserting data.text:
    • Look at the character immediately before the insertion point (prevChar).
    • Look at the first character of data.text (nextChar).
  2. Then decide whether to prepend a space.

Recommendation

Prepend exactly one space only if all are true:
  • You are not at the start of the field (cursor > 0).
  • prevChar is not whitespace (' ', '\n', '\t') and is not an opening bracket/quote you want to stick to (e.g. ( [ { “ ").
  • nextChar is not punctuation that should attach to the left (e.g. , . : ; ! ? ) ] } %).
This handles the common cases:
  • "pain." + "Patient reports..." → adds a space.
  • "" + "Patient reports..." → no leading space.
  • "(" + "mild chest pain" → no space after (.
  • "pain" + "," → no space before comma.

Post-insert normalization (optional)

If your editor frequently contains trailing spaces before the cursor, you can also:
  • Collapse multiple spaces at the join (“pain.␠␠” + “Patient” → “pain.␠Patient”).
  • Avoid adding a space if there’s already whitespace immediately before the cursor.
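
The optional collapse step could look like the following minimal sketch. The function name collapseSpacesBeforeCursor is hypothetical, and the sketch deliberately touches only plain spaces (not newlines or tabs), on the assumption that line breaks in the field are intentional.

```typescript
interface FieldState {
  text: string;   // entire field value
  cursor: number; // insertion index
}

// Hypothetical helper: if two or more spaces sit directly before the cursor,
// reduce them to a single space and move the cursor accordingly.
function collapseSpacesBeforeCursor(text: string, cursor: number): FieldState {
  let start = cursor;
  while (start > 0 && text.charAt(start - 1) === " ") {
    start--;
  }
  const runLength = cursor - start;
  if (runLength < 2) {
    return { text, cursor }; // zero or one space: nothing to collapse
  }
  return {
    text: text.slice(0, start) + " " + text.slice(cursor),
    cursor: start + 1,
  };
}
```

Run this before computing prevChar. The remaining single space then suppresses the leading space, so "pain.␠␠" followed by "Patient" yields "pain.␠Patient".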

Handling Interim vs Final Segments

Because isFinal: false indicates an interim (preview) result, you’ll typically want to do the following:
  • Display one “active interim span” in the editor (or in your UI layer) that you replace as new interim segments arrive.
  • On isFinal: true, commit by turning the interim span into normal text and clearing the interim buffer.
  • Ensure interim text from detected command phrases is also cleared.

Recommendation

  • Maintain:
    • committedText (what’s already committed in the field), and
    • interimText (what you’re showing but will replace).
  • On each message:
    • If isFinal: false: replace interimText with the new segment (after applying boundary spacing vs. committed + cursor).
    • If isFinal: true: append/insert it into committedText, clear interimText.
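
One way to sketch the committedText / interimText bookkeeping above is as a pure state update. The DictationState and Segment types and the function names are hypothetical, Segment is a flattened view of the message, and the spacing rule is a condensed restatement of the boundary recommendation. The sketch also assumes that an empty segment simply clears the interim preview.

```typescript
interface DictationState {
  committedText: string; // text already committed in the field
  cursor: number;        // insertion index within committedText
  interimText: string;   // preview shown at `cursor`, replaced on every update
}

// ASSUMPTION: flattened view of a transcript message.
interface Segment {
  text: string;
  isFinal: boolean;
}

const ATTACHES_LEFT = ",.:;!?)]}%";
const OPENERS = "([{“\"'";

// Condensed boundary rule: returns " " or "".
function leadingSpace(prevChar: string | null, nextChar: string): string {
  if (prevChar === null) return "";            // start of field
  if (/\s/.test(prevChar)) return "";          // already separated
  if (OPENERS.includes(prevChar)) return "";   // e.g. "(" + "word"
  if (ATTACHES_LEFT.includes(nextChar)) return ""; // e.g. "word" + ","
  return " ";
}

function applySegment(state: DictationState, seg: Segment): DictationState {
  const prevChar = state.cursor > 0 ? state.committedText.charAt(state.cursor - 1) : null;
  const piece = seg.text ? leadingSpace(prevChar, seg.text.charAt(0)) + seg.text : "";
  if (!seg.isFinal) {
    // Interim: replace the preview, leave committed text untouched.
    return { ...state, interimText: piece };
  }
  // Final: splice into committed text, advance the cursor, clear the preview.
  const committedText =
    state.committedText.slice(0, state.cursor) + piece + state.committedText.slice(state.cursor);
  return { committedText, cursor: state.cursor + piece.length, interimText: "" };
}

// What the user sees: committed text with the preview spliced in at the cursor.
function render(state: DictationState): string {
  return state.committedText.slice(0, state.cursor) + state.interimText + state.committedText.slice(state.cursor);
}
```

Because the spacing is always computed against committedText, each interim update is independent of the previous preview, so replacing the preview never accumulates stray spaces.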

Cursor Movement and Selections

Transcript insertion is safest when the insertion point is stable, but handling for cursor movement is recommended to support navigation during active dictation.

Recommendation

  • If the user moves the cursor or changes selection while dictating, treat it as a new insertion context:
    • Clear the interim span.
    • Send a flush message so that any remaining buffered audio returns its transcripts promptly, then continue dictation at the new location. The API responds with type: "flushed" once it has returned all text for audio received before the flush message was sent.
  • If there is an active selection, decide explicitly:
    • Replace the selection with the inserted transcript (common), or
    • Collapse selection to an insertion point before inserting.
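
The recommendations above might be wired up roughly as follows. This is a sketch under several assumptions: the flush request is sent as the JSON string {"type":"flush"} (this guide names the message type but not its exact encoding), `send` stands in for your WebSocket's send method, and EditorState and the function names are hypothetical. It implements the "replace the selection" policy.

```typescript
interface EditorState {
  text: string;
  selStart: number;       // selection start (equals selEnd for a plain caret)
  selEnd: number;
  interimText: string;    // current preview, if any
  awaitingFlush: boolean; // true between sending flush and receiving "flushed"
}

type Send = (payload: string) => void;

// Call when the user moves the caret or changes the selection during dictation.
function onSelectionChanged(state: EditorState, selStart: number, selEnd: number, send: Send): EditorState {
  // ASSUMPTION: the flush message is JSON with a `type` field.
  send(JSON.stringify({ type: "flush" }));
  return { ...state, selStart, selEnd, interimText: "", awaitingFlush: true };
}

// Call for each server message type; clears the flag once the flush completes.
function onMessageType(state: EditorState, type: string): EditorState {
  return type === "flushed" ? { ...state, awaitingFlush: false } : state;
}

// "Replace the selection" policy: the insertion overwrites selStart..selEnd
// and the caret lands directly after the inserted text.
function insertAtSelection(state: EditorState, insertion: string): EditorState {
  const text = state.text.slice(0, state.selStart) + insertion + state.text.slice(state.selEnd);
  const caret = state.selStart + insertion.length;
  return { ...state, text, selStart: caret, selEnd: caret };
}
```

To use the "collapse selection" policy instead, set selEnd to selStart before calling insertAtSelection. The sketch does not decide where text that arrives before "flushed" is placed; choose a policy that suits your product.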

Examples

Scenarios

# | Existing text | Cursor location | Incoming text | Insert
1 | "" | 0 | "Patient reports mild chest pain." | "Patient reports mild chest pain." (no leading space)
2 | "Assessment:" | end | "patient denies fever." | " patient denies fever." (add one leading space)
3 | "Pain (" | end | "mild" | "mild" (no space after "(")
4 | "No" | end | "," | "," (no space before comma)

Sample Code

const LEFT_ATTACH = new Set([",", ".", ":", ";", "!", "?", ")", "]", "}", "%"]);
const NO_SPACE_AFTER = new Set(["(", "[", "{", "“", "\"", "'"]);

function isWhitespace(ch: string) {
  return ch === " " || ch === "\n" || ch === "\t";
}

function needsLeadingSpace(prevChar: string | null, nextChar: string) {
  if (!prevChar) return false;                   // start of field
  if (isWhitespace(prevChar)) return false;      // already separated
  if (NO_SPACE_AFTER.has(prevChar)) return false;// e.g. "(" + "word"
  if (LEFT_ATTACH.has(nextChar)) return false;   // e.g. "word" + ","
  return true;
}

/**
 * Decide what string to insert at the current cursor.
 * `text` is the entire field value; `cursor` is insertion index.
 * `segment` is message.data.text (recommended) 
 */
export function buildInsertion(text: string, cursor: number, segment: string) {
  if (!segment) return "";

  const prevChar = cursor > 0 ? text[cursor - 1] : null;
  const nextChar = segment[0];

  const lead = needsLeadingSpace(prevChar, nextChar) ? " " : "";

  // Note: no extra de-duplication is needed here. If whitespace already
  // precedes the cursor, needsLeadingSpace returns false, so `lead` is empty.
  // Collapsing longer runs of spaces would require editing `text` itself.

  return lead + segment;
}

Contact us if you need further assistance working with transcript segments.