The /transcribe WebSocket emits transcript segments as plain text. Each segment is produced without full knowledge of your UI state (cursor position, selection, surrounding characters, field boundaries). The backend service can’t reliably decide whether to prefix spaces, capitalize the first letter of a new sentence in your document, or avoid double-spaces because those depend on what’s already in the text field at the current cursor.
This guide assumes an established, configured connection. For establishing the WebSocket, sending configuration, and waiting for CONFIG_ACCEPTED before streaming audio, see WebSocket Connections.
Treat each incoming transcript segment as a suggestion. The client is responsible for the following:
  • Whitespace normalization at the insertion boundary (before/after the inserted segment).
  • Interim rendering (show “in-progress” text that can be replaced).
  • Commit semantics when isFinal: true.

Understanding Transcript Fields

Use data.text for insertion into the document: it is already normalized, with punctuation applied, locale-specific spacing preserved, and command phrases removed. Use data.rawTranscriptText for debugging, analytics, or workflows where you want to show what the user literally said (including spoken punctuation phrases).
/transcribe returns a single data object per transcript message (one segment). This differs from /streams, which returns a data array of speaker-attributed segments — see Diarized Transcript Handling.

Spacing Rules at the Insertion Boundary

The goal is to insert segments so that words are not stitched together incorrectly ("pain." + "Patient" → "pain. Patient"), while also avoiding leading spaces when the cursor is at the beginning of a field or after whitespace.
  1. Before inserting data.text:
    • Look at the character immediately before the insertion point (prevChar).
    • Look at the first character of data.text (nextChar).
  2. Then decide whether to prepend a space.

Recommendation

Prepend exactly one space only if all are true:
  • You are not at the start of the field (cursor > 0).
  • prevChar is not whitespace (' ', '\n', '\t') and is not an opening bracket/quote you want to stick to (e.g. (, [, {, “, ").
  • nextChar is not punctuation that should attach to the left in the current locale (for most locales, , . : ; ! ? ) ] } %).
This handles the common cases:
  • "pain." + "Patient reports..." → adds a space.
  • "" + "Patient reports..." → no leading space.
  • "(" + "mild chest pain" → no space after (.
  • "pain" + "," → no space before comma.
Post-insert normalization (optional)
If your editor frequently contains trailing spaces before the cursor, you can also:
  • Collapse multiple spaces at the join (“pain.␠␠” + “Patient” → “pain.␠Patient”).
  • Avoid adding a space if there’s already whitespace immediately before the cursor.
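The optional normalization above can be sketched as a small join helper. This is a minimal illustration, not part of the API: joinAtCursor is a hypothetical name, and the sketch only collapses plain spaces (U+0020) at the join, leaving newlines, tabs, and non-breaking spaces untouched. It does not decide whether a space is needed in the first place; that remains the job of the boundary rules.

```typescript
// Hypothetical helper (not part of the API): joins the text left of the
// cursor with an insertion, collapsing runs of plain spaces at the join.
function joinAtCursor(before: string, insertion: string): string {
  // Strip plain spaces on both sides of the join.
  const left = before.replace(/ +$/, "");
  const right = insertion.replace(/^ +/, "");
  const hadSpace =
    left.length < before.length || right.length < insertion.length;

  // Keep exactly one space only if there was one to begin with, there is
  // text on the left, and the left side does not already end in other
  // whitespace (newline, tab, non-breaking space).
  const keepSpace = hadSpace && left !== "" && !/\s$/.test(left);
  return left + (keepSpace ? " " : "") + right;
}
```

For example, joinAtCursor("pain.  ", "Patient") yields "pain. Patient", and a French segment such as "\u00A0:" passes through unchanged because non-breaking spaces are never collapsed.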
Casing at insertion boundaries is client-owned and should only be applied at true sentence or paragraph boundaries, not on every interim update.
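One possible way to apply client-owned casing is sketched here. The function names are hypothetical, and the boundary test is a deliberately simple assumption (start of field, a newline, or a preceding ., !, or ?); it will misfire after abbreviations such as "Dr.", so treat it as a starting point rather than a complete rule. Call it when committing a final segment, not on every interim update.

```typescript
// Hypothetical sketch: decide whether the text left of the cursor ends at
// a sentence or paragraph boundary.
function isSentenceBoundary(before: string): boolean {
  // Ignore trailing spaces, tabs, and non-breaking spaces.
  const trimmed = before.replace(/[ \t\u00A0]+$/, "");
  if (trimmed === "") return true; // start of field
  const last = trimmed.charAt(trimmed.length - 1);
  return last === "\n" || last === "." || last === "!" || last === "?";
}

// Capitalize the first character of the segment only at a boundary.
// The segment is otherwise returned unchanged (never lowercased).
function applyBoundaryCasing(before: string, segment: string): string {
  if (!segment || !isSentenceBoundary(before)) return segment;
  return segment.charAt(0).toUpperCase() + segment.slice(1);
}
```

Apply the casing to the segment before prepending any leading separator, so that the first character inspected is a letter rather than a space.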

Locale-specific spacing

Some punctuation spacing rules are language-specific. For primaryLanguage: "fr" (or "fr-FR"):
  • Use a non-breaking space (\u00A0) before standalone :, ;, !, and ? when those characters arrive as separate transcript segments.
  • If data.text already contains the spacing (for example, bonjour\u00A0:), insert it as-is and do not normalize it away.
Do not apply this French non-breaking-space rule to other locales such as fr-CH.

Handling Interim vs Final Segments

Because isFinal: false indicates an interim (preview) result, you’ll typically want to do the following:
  • Display one “active interim span” in the editor (or in your UI layer) that you replace as new interim segments arrive.
  • On isFinal: true, commit by turning the interim span into normal text and clearing the interim buffer.
  • Ensure that interim text from detected command phrases is also cleared.

Recommendation

  • Maintain:
    • committedText (what’s already committed in the field), and
    • interimText (what you’re showing but will replace).
  • On each message:
    • If isFinal: false: replace interimText with the new segment after applying boundary spacing against committedText at the cursor. Keep only the newest interim for the current unfinished span.
    • If isFinal: true: append/insert it into committedText, clear interimText.
In the editor, maintain a single active interim span and replace it as newer interim results arrive. Do not accumulate multiple interim results into committed text.
After stop/flush, transcript messages may arrive out of order. If a final result has already been committed for a span, ignore later interim results for that span. Also ignore late interim results that overlap an already-finalized part of the timeline. Use start as the primary identity key; end is the secondary stale-overlap check during stop/flush races.
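The interim/final bookkeeping described above might look like the following sketch. The Segment shape is an assumption for illustration: the field names text, isFinal, start, and end appear in this guide, but how they are nested in the actual message and what units start and end use are not confirmed here. DictationBuffer is a hypothetical name, and boundary spacing is omitted so that the state handling stays visible.

```typescript
// Assumed, flattened view of one transcript message (illustrative only).
interface Segment {
  text: string;
  isFinal: boolean;
  start: number;
  end: number;
}

// Hypothetical state holder for committed and interim text.
class DictationBuffer {
  committedText = "";
  interimText = "";
  private finalizedStarts = new Set<number>();
  private finalizedUntil = -Infinity;

  apply(seg: Segment): void {
    if (seg.isFinal) {
      // Commit the final text and clear the active interim span.
      this.committedText += seg.text;
      this.interimText = "";
      this.finalizedStarts.add(seg.start);
      this.finalizedUntil = Math.max(this.finalizedUntil, seg.end);
      return;
    }
    // Primary check: a final was already committed for this span.
    if (this.finalizedStarts.has(seg.start)) return;
    // Secondary check: the interim overlaps already-finalized timeline.
    if (seg.start < this.finalizedUntil) return;
    // Otherwise replace (never accumulate) the single interim span.
    this.interimText = seg.text;
  }

  get displayText(): string {
    return this.committedText + this.interimText;
  }
}
```

A real editor would run each segment through the boundary-spacing logic before storing it, and would insert at the cursor rather than append.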

Cursor Movement and Selections

Transcript insertion is safest when the insertion point is stable, but handling for cursor movement is recommended to support navigation during active dictation.

Recommendation

  • If the user moves the cursor or changes selection while dictating, treat it as a new insertion context:
    • Clear the interim span.
    • Send a flush message so that remaining buffered audio returns transcripts promptly, then continue dictation in the new location. The API responds with type: "flushed" once it has returned all text for audio received before the flush message was sent.
  • If there is an active selection, decide explicitly:
    • Replace the selection with the inserted transcript (common), or
    • Collapse selection to an insertion point before inserting.
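The recommendations above could be wired up roughly as follows. This is a sketch under assumptions: SocketLike, InsertionContext, and the function names are hypothetical, and the JSON encoding of the flush message ({ "type": "flush" }) is inferred from the message types named in this guide rather than confirmed. How transcripts that arrive before "flushed" are placed is left to the application.

```typescript
// Minimal socket surface so the sketch does not depend on a real WebSocket.
interface SocketLike {
  send(data: string): void;
}

// Hypothetical per-field dictation state.
interface InsertionContext {
  cursor: number;
  interimText: string;
  awaitingFlushed: boolean;
}

// Called when the user moves the cursor or changes the selection.
function onCursorMoved(
  ctx: InsertionContext,
  socket: SocketLike,
  newCursor: number,
): void {
  ctx.interimText = ""; // clear the interim span
  ctx.cursor = newCursor; // new insertion context
  ctx.awaitingFlushed = true;
  socket.send(JSON.stringify({ type: "flush" })); // assumed wire format
}

// Called for each raw server message; only tracks the flush handshake.
function onServerMessage(ctx: InsertionContext, raw: string): void {
  const msg = JSON.parse(raw) as { type?: string };
  if (msg.type === "flushed") ctx.awaitingFlushed = false;
}

// Replace the selected range [selStart, selEnd) with the insertion.
// Passing selStart === selEnd inserts at a collapsed cursor.
function replaceSelection(
  text: string,
  selStart: number,
  selEnd: number,
  insertion: string,
): { text: string; cursor: number } {
  return {
    text: text.slice(0, selStart) + insertion + text.slice(selEnd),
    cursor: selStart + insertion.length,
  };
}
```

For example, replaceSelection("mild chest pain", 0, 4, "severe") returns the text "severe chest pain" with the cursor at index 6.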
This page covers single-speaker /transcribe dictation. See Diarized Transcript Handling for multi-speaker, diarized output on /streams, where segments can arrive out of chronological order across speakers.

Examples

Scenarios

# | Existing text | Cursor location | Incoming text | Insert
1 | "" | 0 | "Patient reports mild chest pain." | "Patient reports mild chest pain." (no leading space)
2 | "Assessment:" | end | "patient denies fever." | " patient denies fever." (add one leading space)
3 | "Pain (" | end | "mild" | "mild" (no space after "(")
4 | "No" | end | "," | "," (no space before comma)
5 | "Bonjour" | end | ":" | "\u00A0:" when primaryLanguage is "fr" or "fr-FR"

Sample Code

TypeScript
const FRENCH_NBSP_BEFORE = new Set([":", ";", "!", "?"]);
const LEFT_ATTACH = new Set([",", ".", ":", ";", "!", "?", ")", "]", "}", "%"]);
const NO_SPACE_AFTER = new Set(["(", "[", "{", "“", "\"", "'"]);

function isWhitespace(ch: string) {
  return ch === " " || ch === "\n" || ch === "\t" || ch === "\u00A0";
}

function getLeadingSeparator(
  prevChar: string | null,
  nextChar: string,
  primaryLanguage?: string,
) {
  if (!prevChar) return "";
  if (isWhitespace(prevChar)) return "";
  if (isWhitespace(nextChar)) return "";
  if (NO_SPACE_AFTER.has(prevChar)) return "";

  const normalizedPrimaryLanguage = primaryLanguage?.toLowerCase();
  const isFrenchNbspLocale =
    normalizedPrimaryLanguage === "fr" || normalizedPrimaryLanguage === "fr-fr";

  if (isFrenchNbspLocale && FRENCH_NBSP_BEFORE.has(nextChar)) {
    return "\u00A0";
  }

  if (LEFT_ATTACH.has(nextChar)) return "";
  return " ";
}

/**
 * Decide what string to insert at the current cursor.
 * `text` is the entire field value, `cursor` is the insertion index,
 * and `segment` should be `message.data.text`.
 */
export function buildInsertion(
  text: string,
  cursor: number,
  segment: string,
  primaryLanguage?: string,
) {
  if (!segment) return "";

  const prevChar = cursor > 0 ? text[cursor - 1] : null;
  const nextChar = segment[0];
  const lead = getLeadingSeparator(prevChar, nextChar, primaryLanguage);

  // Preserve any service-provided spacing inside the segment itself.
  return lead + segment;
}

Contact us if you need further assistance working with transcript segments.