
Overview

The WebSocket Secure (WSS) /transcribe API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients send configuration, audio, and control messages, and receive structured data, including transcripts and detected commands. This documentation provides a comprehensive guide for integrating these capabilities.
This /transcribe endpoint supports real-time, stateless dictation.
  • If you are looking for real-time ambient documentation interactions, use the /streams WSS endpoint instead.
  • If you are looking for transcript generation from a pre-recorded audio file, refer to the /transcripts endpoint.

1. Establishing a Connection

Clients must initiate a WebSocket connection using the wss:// scheme.
In addition to the tenant-name parameter, authentication for the WSS connection requires a token parameter that carries the Bearer access token (URL-encoded, as in the cURL example below).

Query Parameters

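environment (enum, required): eu or us. Selects the regional host, as in wss://api.$environment.corti.app.
tenant-name (string, required): Specifies the tenant context.
token (string, required): Bearer $token, i.e. the access token prefixed with "Bearer " and URL-encoded.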
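// TypeScript SDK: create a client using an access token obtained by your backend
import { CortiClient } from "@corti/sdk";

const client = new CortiClient({
  auth: {
    accessToken: "<ACCESS_TOKEN_FROM_YOUR_BACKEND>",
  },
});

# cURL: the token query parameter is the URL-encoded string "Bearer <access token>"
curl --request GET \
  --url "wss://api.$environment.corti.app/audio-bridge/v2/transcribe?tenant-name=$tenant&token=Bearer%20$accessToken"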
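Because the token value must be URL-encoded (the space in "Bearer <token>" becomes %20), it can help to build the connection URL in one place when not using the SDK. The sketch below follows the URL pattern from the cURL example; buildTranscribeUrl is a hypothetical helper, not part of @corti/sdk.

```typescript
// Hypothetical helper (not part of @corti/sdk): builds the /transcribe WebSocket URL
// using the host and path pattern shown in the cURL example.
type CortiEnvironment = "eu" | "us";

function buildTranscribeUrl(
  environment: CortiEnvironment,
  tenantName: string,
  accessToken: string,
): string {
  // encodeURIComponent turns the space in "Bearer <token>" into %20.
  // (URLSearchParams would encode the space as "+", so it is not used here.)
  const query =
    "tenant-name=" + encodeURIComponent(tenantName) +
    "&token=" + encodeURIComponent("Bearer " + accessToken);
  return `wss://api.${environment}.corti.app/audio-bridge/v2/transcribe?${query}`;
}

// Usage in a runtime with a global WebSocket (e.g. a browser):
// const ws = new WebSocket(buildTranscribeUrl("eu", "my-tenant", accessToken));
```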

2. Handshake Response

101 Switching Protocols

Indicates a successful WebSocket connection. Once connected, send a config message that specifies the input language and the expected output preferences.
The config message must be sent within 10 seconds of the WebSocket being opened; otherwise the server returns CONFIG_TIMEOUT, and a new WSS connection must be established.

3. Sending Messages

Configuration

Declare your /transcribe configuration by sending a message with "type": "config" and a "configuration" object containing your configuration options. The type field and the primaryLanguage configuration parameter are required; the other parameters are optional, depending on your needs and workflow.
Configuration notes:
  • Clients must send a configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting any other data.
  • If the configuration is invalid, the server returns CONFIG_DENIED.
  • The configuration must be sent within 10 seconds of opening the WebSocket; otherwise the session times out with CONFIG_TIMEOUT.
type (string, required, default: "config")
configuration (object, required)
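When working with a raw WebSocket instead of the SDK, the rule above (send config, then wait for CONFIG_ACCEPTED before any audio) can be sketched as a promise. JsonSocket and sendConfig are hypothetical names; with the SDK, passing configuration to client.transcribe.connect performs this step for you. The response timeout is a client-side guard chosen for illustration, not a documented server limit.

```typescript
// Hypothetical minimal view of a WebSocket: send text, subscribe to text frames.
interface JsonSocket {
  send(data: string): void;
  addMessageListener(listener: (text: string) => void): () => void; // returns an unsubscribe function
}

interface ConfigResult {
  sessionId: string;
  configuration?: Record<string, unknown>; // resolved configuration, if returned
}

// Sends the config message and resolves on CONFIG_ACCEPTED.
// Rejects on any other CONFIG_* response, or if no response arrives within timeoutMs.
function sendConfig(
  socket: JsonSocket,
  configuration: Record<string, unknown>,
  timeoutMs = 10000,
): Promise<ConfigResult> {
  return new Promise<ConfigResult>((resolve, reject) => {
    const timer = setTimeout(() => {
      unsubscribe();
      reject(new Error("No configuration response received"));
    }, timeoutMs);

    const unsubscribe = socket.addMessageListener((text) => {
      const msg = JSON.parse(text) as {
        type?: string;
        sessionId?: string;
        reason?: string;
        configuration?: Record<string, unknown>;
      };
      if (msg.type === "CONFIG_ACCEPTED") {
        clearTimeout(timer);
        unsubscribe();
        resolve({ sessionId: msg.sessionId ?? "", configuration: msg.configuration });
      } else if (typeof msg.type === "string" && msg.type.indexOf("CONFIG_") === 0) {
        // CONFIG_DENIED, CONFIG_TIMEOUT, CONFIG_ALREADY_RECEIVED, CONFIG_MISSING
        clearTimeout(timer);
        unsubscribe();
        reject(new Error(`${msg.type}: ${msg.reason ?? "no reason given"}`));
      }
    });

    // Send only after the listener is registered so the response cannot be missed.
    socket.send(JSON.stringify({ type: "config", configuration }));
  });
}
```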

Example

Here is an example configuration for transcribing dictated audio in English with interim results and spoken punctuation enabled, two commands defined, the (default) formatting options set explicitly, and the audio format declared as audio/ogg:
{
  "type": "config",
  "configuration":{
    "primaryLanguage": "en",
    "interimResults": true,
    "spokenPunctuation": true, 
    "commands": [
      {
        "id": "next_section",
        "phrases": [
          "next section", "go to next section"
        ]
      },
      {
        "id": "insert_template",
        "phrases": [
          "insert my {template_name} template", "insert {template_name} template"
        ],
        "variables": [
          {
            "key": "template_name",
            "type": "enum",
            "enum": [
              "soap", "radiology", "referral"
            ]
          }
        ]
      }
    ],
    "formatting": {
      "dates": "locale:long",
      "times": "locale",
      "numbers": "numerals_above_nine",
      "measurements": "abbreviated",
      "numericRanges": "numerals",
      "ordinals": "numerals_above_nine"
    },
    "audioFormat": "audio/ogg"
  }
}
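Commands such as insert_template pair placeholders in their phrases (e.g. {template_name}) with entries in variables, and the two can drift apart as a configuration evolves. The sketch below is a hypothetical client-side check, not part of the API or SDK, that flags placeholders without a declared variable and variables no phrase uses. Whether the server rejects such mismatches with CONFIG_DENIED is not stated here, so treat this as a convenience check.

```typescript
// Hypothetical client-side types mirroring the command fields in the example above.
interface CommandVariable { key: string; type: string; enum?: string[] }
interface CommandDefinition { id: string; phrases: string[]; variables?: CommandVariable[] }

// Returns human-readable problems for one command definition (empty array if none).
function checkCommand(cmd: CommandDefinition): string[] {
  const declared = new Set((cmd.variables ?? []).map((v) => v.key));
  const used = new Set<string>();
  const problems: string[] = [];
  const placeholder = /\{([^{}]+)\}/g;

  for (const phrase of cmd.phrases) {
    placeholder.lastIndex = 0; // reset the global regex for each phrase
    let m: RegExpExecArray | null;
    while ((m = placeholder.exec(phrase)) !== null) {
      used.add(m[1]);
      if (!declared.has(m[1])) {
        problems.push(`${cmd.id}: "{${m[1]}}" in "${phrase}" has no matching variable`);
      }
    }
  }
  // Variables that are declared but never referenced by any phrase.
  declared.forEach((key) => {
    if (!used.has(key)) problems.push(`${cmd.id}: variable "${key}" is not used in any phrase`);
  });
  return problems;
}
```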

Sending Audio

Ensure that your configuration has been accepted before sending audio, and that the initial audio chunk is not too small: it must contain the headers needed to decode the audio. We recommend sending audio in chunks of 250-500 ms. Each chunk may be at most 64000 bytes. Send audio data as raw binary, without JSON wrapping.
A variety of common audio formats are supported: audio is passed through a transcoder before speech-to-text processing, so you do not currently need to specify sample rate, bit depth, or other audio settings. See the supported audio formats documentation for more details.
transcribeSocket.sendAudio(audioChunk); // sendAudio does not split audio; pass chunks of at most 64000 bytes
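Since sendAudio does not chunk for you, a small helper can enforce the 64000-byte limit and size chunks toward the recommended 250-500 ms. The sketch below assumes a continuous encoded stream sent in order, so the first chunk starts at the beginning of the stream and carries its headers; chunkAudio and bytesForDuration are hypothetical names, and the 32 kbit/s bitrate in the usage comment is only an illustrative assumption.

```typescript
// Hypothetical helpers; the SDK's sendAudio itself does not split audio.
const MAX_CHUNK_BYTES = 64000; // documented per-chunk limit

// Approximate byte count for a slice of audio at an encoded bitrate (bits per second).
function bytesForDuration(bitrateBps: number, durationMs: number): number {
  return Math.ceil((bitrateBps / 8) * (durationMs / 1000));
}

// Splits an encoded audio buffer, in order, into chunks no larger than maxBytes.
function chunkAudio(audio: Uint8Array, maxBytes: number = MAX_CHUNK_BYTES): Uint8Array[] {
  if (maxBytes <= 0 || maxBytes > MAX_CHUNK_BYTES) {
    throw new RangeError(`maxBytes must be between 1 and ${MAX_CHUNK_BYTES}`);
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += maxBytes) {
    // subarray clamps the end index, so the last chunk may be shorter.
    chunks.push(audio.subarray(offset, offset + maxBytes));
  }
  return chunks;
}

// Usage sketch (recording and transcribeSocket are hypothetical):
// at an assumed 32 kbit/s, 500 ms is about 2000 bytes, well under the limit.
// for (const chunk of chunkAudio(recording, bytesForDuration(32000, 500))) {
//   transcribeSocket.sendAudio(chunk);
// }
```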

Flush the Audio Buffer

To flush the audio buffer, forcing pending transcript segments and detected commands to be returned over the WebSocket (e.g., when turning off or muting the microphone in a "hold-to-talk" dictation workflow, or in applications that let the microphone "go to sleep"), send the following message:
{
  "type":"flush"
}
The server returns transcripts and commands for all audio sent before the flush message, then responds with the following messages:
{
  "type": "flushed"
}
{
  "type": "delta_usage",
  "credits": 0.00116
}
Delta usage represents incremental credit consumption between the start of recording and the flush event. It is approximate and may differ slightly from the final usage sent after the end message is processed (see below); the final, end-of-session usage is what is reflected in API billing. The WebSocket remains open after the flush is processed, so dictation can continue.
Client-side considerations:
  1. If you rely on a flush to separate data (e.g., for different sections in an EHR template), wait until you receive the flushed message before moving on to the next data field.
  2. When using the browser MediaRecorder API, audio is buffered and only emitted at the configured timeslice interval. Before sending a flush message, call MediaRecorder.requestData() to force any audio still buffered on the client to be emitted, and send the resulting chunk to the server. This ensures all audio reaches the server before the flush is processed.
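The first consideration (wait for flushed before moving to the next field) maps naturally onto a promise. The sketch below sends {"type":"flush"}, collects final transcript text received until flushed arrives, then resolves. JsonSocket and flushAndWait are hypothetical names, not SDK API; note that transcripts delivered before the call are not captured, and the delta_usage message that follows flushed is left to your general message handler.

```typescript
// Hypothetical minimal socket view (not the SDK's API).
interface JsonSocket {
  send(data: string): void;
  addMessageListener(listener: (text: string) => void): () => void; // returns unsubscribe
}

interface FlushResult {
  transcripts: string[]; // final transcript text received between the flush and "flushed"
}

// Sends a flush message and resolves once the server answers with "flushed".
function flushAndWait(socket: JsonSocket): Promise<FlushResult> {
  return new Promise<FlushResult>((resolve) => {
    const result: FlushResult = { transcripts: [] };
    const unsubscribe = socket.addMessageListener((text) => {
      const msg = JSON.parse(text);
      if (msg.type === "transcript" && msg.data?.isFinal) {
        result.transcripts.push(msg.data.text); // interim results are ignored here
      } else if (msg.type === "flushed") {
        unsubscribe(); // stop collecting; later messages belong to the next field
        resolve(result);
      }
    });
    socket.send(JSON.stringify({ type: "flush" }));
  });
}
```

In a browser, call MediaRecorder.requestData() and send the resulting chunk before calling flushAndWait, so the flush covers all recorded audio.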

Ending the Session

To end the /transcribe session, send the following message:
{
  "type": "end"
}
This signals the server to send any remaining transcript segments and detected commands. The server then sends two messages:
{
  "type": "usage",
  "credits": 0.1
}
{
  "type": "ended"
}
After sending the ended message, the server closes the WebSocket.
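The end sequence (remaining results, then usage, then ended, then socket close) can be wrapped the same way. In this sketch, JsonSocket and endSession are hypothetical names; it records final transcripts, command ids, and the final usage credits, and resolves on ended. A production version would also need to handle error messages and an unexpected socket close, which this sketch omits.

```typescript
// Hypothetical minimal socket view (not the SDK's API).
interface JsonSocket {
  send(data: string): void;
  addMessageListener(listener: (text: string) => void): () => void; // returns unsubscribe
}

interface EndResult {
  finalTranscripts: string[];
  commandIds: string[];
  credits?: number; // from the "usage" message, if received
}

// Sends {"type":"end"} and resolves on "ended"; the server then closes the socket.
function endSession(socket: JsonSocket): Promise<EndResult> {
  return new Promise<EndResult>((resolve) => {
    const result: EndResult = { finalTranscripts: [], commandIds: [] };
    const unsubscribe = socket.addMessageListener((text) => {
      const msg = JSON.parse(text);
      switch (msg.type) {
        case "transcript":
          if (msg.data?.isFinal) result.finalTranscripts.push(msg.data.text);
          break;
        case "command":
          result.commandIds.push(msg.data.id);
          break;
        case "usage":
          result.credits = msg.credits; // final usage for the session
          break;
        case "ended":
          unsubscribe();
          resolve(result);
          break;
      }
    });
    socket.send(JSON.stringify({ type: "end" }));
  });
}
```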

4. Responses

Configuration

type (string, required, default: "CONFIG_ACCEPTED"): Returned when a valid configuration is sent.
sessionId (uuid, required): The session ID, returned when a valid configuration is sent.
configuration (object): The resolved configuration, including accepted client-defined values and server-applied defaults for parameters not defined in the client configuration.

Transcripts

type (string, required, default: "transcript")
data (object, required): The transcript segment, with text, rawTranscriptText, start, end, and isFinal fields.
Transcript response:
{
    "type": "transcript",
    "data": {
        "text": "patient reports mild chest pain.",
        "rawTranscriptText": "patient reports mild chest pain period",
        "start": 0.0,
        "end": 3.2,
        "isFinal": true
    }
}
See the detailed guide on inserting transcript segments for proper handling of whitespace, interim vs. final results, and the text vs. rawTranscriptText fields.
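As a rough illustration of interim vs. final handling, the sketch below keeps committed final segments separate from a single pending interim segment. It is a simplified assumption, not the behavior described in the linked guide: it assumes each interim result replaces the previous one, a final result supersedes the pending interim, and segments are joined with a single space. TranscriptView is a hypothetical name; defer to the guide for the actual whitespace rules.

```typescript
// Shape of the data object in the transcript response above.
interface TranscriptData {
  text: string;
  rawTranscriptText: string;
  start: number;
  end: number;
  isFinal: boolean;
}

// Hypothetical display model: committed final segments plus at most one pending interim.
class TranscriptView {
  private finals: string[] = [];
  private interim = "";

  apply(data: TranscriptData): void {
    if (data.isFinal) {
      this.finals.push(data.text.trim());
      this.interim = ""; // assumption: a final result supersedes the pending interim
    } else {
      this.interim = data.text.trim(); // assumption: each interim replaces the previous one
    }
  }

  // Committed text only, e.g. for writing into a document field.
  committedText(): string {
    return this.finals.filter((t) => t.length > 0).join(" ");
  }

  // Committed text plus the pending interim, e.g. for live display.
  displayText(): string {
    return [this.committedText(), this.interim].filter((t) => t.length > 0).join(" ");
  }
}
```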

Commands

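type (string, required, default: "command")
data (object, required): The detected command, with its id, any captured variables, rawTranscriptText, start, and end.
Command response: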
{
  "type": "command",
  "data": {
    "id": "insert_template",
    "variables": {
      "template_name": "radiology"
    },
    "rawTranscriptText": "insert my radiology template",
    "start": 2.3,
    "end": 2.9
  }
}
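Because each command response carries the id declared in the config message, a client can route commands through a lookup table keyed by id. The sketch below is a hypothetical dispatcher (createCommandDispatcher is not an SDK function); it passes captured variables, or an empty object when none were sent, to the registered handler and reports whether a handler was found.

```typescript
// Shape of the data object in the command response above.
interface CommandData {
  id: string;
  variables?: Record<string, string>;
  rawTranscriptText: string;
  start: number;
  end: number;
}

type CommandHandler = (variables: Record<string, string>, data: CommandData) => void;

// Hypothetical registry keyed by the command ids declared in the config message.
function createCommandDispatcher(handlers: Record<string, CommandHandler>) {
  return (data: CommandData): boolean => {
    // hasOwnProperty avoids matching inherited keys such as "toString".
    const handler = Object.prototype.hasOwnProperty.call(handlers, data.id)
      ? handlers[data.id]
      : undefined;
    if (!handler) return false; // unknown id: ignore (or log while debugging)
    handler(data.variables ?? {}, data);
    return true;
  };
}
```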

Flushed

type (string, required, default: "flushed"): Returned by the server after it has processed a flush message from the client and returned the resulting transcript segments and detected commands.
{
  "type": "flushed"
}

Usage

type (string, default: "delta_usage"): Returned by the server after it processes a flush message from the client, to convey the number of credits consumed since recording started. Delta usage is approximate and may differ slightly from the final usage sent after the end message is processed.
{
  "type": "delta_usage",
  "credits": 0.00116
}
type (string, default: "usage"): Returned by the server after it processes the end message from the client, to convey the number of credits consumed.
{
  "type": "usage",
  "credits": 0.1
}

Ended

type (string, required, default: "ended"): Returned by the server after it processes the end message from the client, before it closes the WebSocket.
{
  "type": "ended"
}

Subscribe to messages in SDK

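socket.on("message", (msg) => {
    switch (msg.type) {
        case "transcript":
            console.log("Transcript:", msg.data.text, "final:", msg.data.isFinal);
            break;
        case "command":
            console.log("Command:", msg.data.id, msg.data.variables);
            break;
        case "flushed":
            console.log("Flush complete");
            break;
        case "delta_usage":
            // Approximate credits consumed so far; sent after a flush
            console.log("Credits used so far (approx.):", msg.credits);
            break;
        case "usage":
            // Final session usage; sent after the end message is processed
            console.log("Credits used:", msg.credits);
            break;
        case "ended":
            // The server closes the WebSocket after sending "ended"
            console.log("Session ended");
            socket.close();
            break;
        case "error":
            console.error("Server error:", msg.error);
            break;
    }
});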

5. Error Handling

type (string, required): Returned when the configuration step fails. Possible values: CONFIG_DENIED, CONFIG_TIMEOUT, CONFIG_ALREADY_RECEIVED, CONFIG_MISSING.
reason (string): The reason the configuration was rejected.
sessionId (uuid, required): The session ID.
Once configuration has been accepted and the session is running, you may encounter runtime or application-level errors. These are sent as JSON objects with the following structure:
{
  "type": "error",
  "error": {
    "id": "error id",
    "title": "error title",
    "status": 400,
    "details": "error details",
    "doc":"link to documentation"
  }
}
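The two error shapes above (configuration errors with a CONFIG_* type, and runtime errors wrapped in "type": "error") can be normalized into one discriminated union before deciding how to react. In this sketch, classifyError and its result types are hypothetical; the reconnect flag is set only for CONFIG_TIMEOUT, because the handshake section states that a timeout requires a new WSS connection. How to recover from the other CONFIG_* errors is not specified here.

```typescript
type ConfigErrorType =
  | "CONFIG_DENIED"
  | "CONFIG_TIMEOUT"
  | "CONFIG_ALREADY_RECEIVED"
  | "CONFIG_MISSING";

interface ConfigError {
  kind: "config";
  type: ConfigErrorType;
  reason?: string;
  sessionId?: string;
  reconnect: boolean; // true when a new WebSocket connection is required
}

interface RuntimeError {
  kind: "runtime";
  id: string;
  title: string;
  status: number;
  details?: string;
  doc?: string;
}

const CONFIG_ERRORS: ConfigErrorType[] = [
  "CONFIG_DENIED", "CONFIG_TIMEOUT", "CONFIG_ALREADY_RECEIVED", "CONFIG_MISSING",
];

// Hypothetical helper: returns a classified error, or null if the message is not an error.
function classifyError(text: string): ConfigError | RuntimeError | null {
  const msg = JSON.parse(text);
  if (CONFIG_ERRORS.indexOf(msg.type) !== -1) {
    return {
      kind: "config",
      type: msg.type,
      reason: msg.reason,
      sessionId: msg.sessionId,
      reconnect: msg.type === "CONFIG_TIMEOUT", // documented: timeout needs a new connection
    };
  }
  if (msg.type === "error" && msg.error) {
    const e = msg.error;
    return { kind: "runtime", id: e.id, title: e.title, status: e.status, details: e.details, doc: e.doc };
  }
  return null;
}
```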

Handle errors in SDK

With the recommended approach (passing configuration as part of connect), configuration errors are raised during connect (the call fails). Runtime errors are emitted via the error event; you can also inspect the original message in the message handler.
try {
    const socket = await client.transcribe.connect({
        configuration: {
            primaryLanguage: "en",
            automaticPunctuation: true,
        },
    });

    socket.on("error", (err) => {
        // Network errors and reconnect failures
        console.error("Socket error:", err.message);
    });

    socket.on("message", (msg) => {
        if (msg.type === "error") {
            // Server-sent runtime error
            console.error("Server error:", msg.error);
        }
    });
} catch (err) {
    // CONFIG_DENIED, CONFIG_TIMEOUT, or connection failure
    console.error("Connect failed:", err.message);
}