
Overview

The WebSocket Secure (WSS) /transcribe API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients send audio and configuration messages and receive structured data, including transcripts and detected commands. This documentation explains how to integrate these capabilities.
The /transcribe endpoint supports real-time, stateless dictation. For other workflows:
  • For real-time ambient documentation interactions, use the /stream WSS endpoint.
  • For transcript generation from a pre-recorded audio file, refer to the /transcripts endpoint.

1. Establishing a Connection

Clients must initiate a WebSocket connection using the wss:// scheme.
In addition to the tenant-name parameter, authenticating the WSS stream requires a token parameter that carries the Bearer access token.

Query Parameters

  • environment (enum, required): eu or us
  • tenant-name (string, required): Specifies the tenant context
  • token (string, required): Bearer $token
Example wss:/transcribe request
  curl --request GET \
    --url "wss://api.${environment}.corti.app/audio-bridge/v2/transcribe?tenant-name=${tenant}&token=Bearer%20${accessToken}"
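The connection URL above can also be assembled programmatically. The following is a minimal sketch, not part of the Corti SDK: the helper name buildTranscribeUrl and its parameter names are hypothetical, and the URL layout is taken only from the request example above.

```typescript
// Hypothetical helper (not part of the Corti SDK): builds the /transcribe
// WebSocket URL from the parts shown in the request example.
type CortiEnvironmentName = "eu" | "us";

function buildTranscribeUrl(
  environment: CortiEnvironmentName,
  tenantName: string,
  accessToken: string,
): string {
  const base = `wss://api.${environment}.corti.app/audio-bridge/v2/transcribe`;
  // encodeURIComponent turns the space in "Bearer <token>" into %20
  // and escapes any other reserved characters in the values.
  const query = [
    `tenant-name=${encodeURIComponent(tenantName)}`,
    `token=${encodeURIComponent(`Bearer ${accessToken}`)}`,
  ].join("&");
  return `${base}?${query}`;
}

const exampleUrl = buildTranscribeUrl("eu", "my-tenant", "abc123");
console.log(exampleUrl);
```

Encoding each value separately keeps the query string valid even when a tenant name or token contains characters such as / or &.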

Using SDK

You can use the Corti SDK (currently in “beta”) to connect to the /transcribe endpoint.
import { CortiClient, CortiEnvironment } from "@corti/sdk";

const cortiClient = new CortiClient({
    tenantName: "YOUR_TENANT_NAME",
    environment: CortiEnvironment.Eu,
    auth: {
        accessToken: "YOUR_ACCESS_TOKEN"
    },
});

const transcribeSocket = await cortiClient.transcribe.connect();

2. Handshake Response

101 Switching Protocols

Indicates a successful WebSocket connection. Once connected, send a config message that specifies the input language and the expected output preferences.
The config message must be sent within 10 seconds of the WebSocket being opened. Otherwise the server responds with CONFIG_TIMEOUT, and a new WSS connection must be established.

3. Sending Messages

Configuration

Declare your /transcribe configuration by sending a message with "type": "config" and a "configuration": {<config details, per the options below>} object. The type and the primaryLanguage configuration parameter are required. The other parameters are optional, depending on your needs and workflow.
Configuration notes:
  • Clients must send a stream configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
  • If the configuration is not valid, the server returns CONFIG_DENIED.
  • The configuration must be committed within 10 seconds of opening the WebSocket; otherwise the session times out with CONFIG_TIMEOUT.
  • primaryLanguage (string, required): The locale of the primary spoken language. See supported language codes and more information here.
  • spokenPunctuation (bool): When true, converts spoken punctuation such as period or slash into . or /. Read more about supported punctuation here.
  • automaticPunctuation (bool): When true, automatically punctuates and capitalizes in the final transcript.
    Spoken and automatic punctuation are mutually exclusive: only one should be set to true in a given configuration request. If both are included and set to true, spokenPunctuation takes precedence and overrides automaticPunctuation.
  • commands (command object[]): The commands that should be registered and detected. Read more about commands here.
  • formatting (object): Defines each type of formatting preference using the enum options described below. Read more about formatting here.
    Formatting is currently in beta. API details are subject to change ahead of general release. Defining a formatting configuration is optional. When these preferences are not configured, the default values listed below are applied automatically.
  • interimResults (bool): When true, returns interim (preview) results with reduced latency compared to the final results. Interim (preview) results are indicated by isFinal=false in the stream response.
    interimResults is currently in beta. API details are subject to change ahead of general release. Use of this feature is optional and has no bearing on the final transcription results.
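The parameters above can be captured in a small typed helper that serializes the config message. This is a sketch under assumptions: the interface and function names are hypothetical, the field shapes are inferred from the parameter descriptions in this section, and formatting is left out because its options are in beta. The helper applies the documented punctuation precedence on the client so that the payload is unambiguous.

```typescript
// Hypothetical types mirroring the documented configuration parameters.
interface TranscribeCommand {
  id: string;
  phrases: string[];
  variables?: { key: string; type: string; enum: string[] }[];
}

interface TranscribeConfiguration {
  primaryLanguage: string;
  spokenPunctuation?: boolean;
  automaticPunctuation?: boolean;
  commands?: TranscribeCommand[];
  interimResults?: boolean;
}

// Hypothetical helper: returns the JSON text of a config message.
function buildConfigMessage(configuration: TranscribeConfiguration): string {
  const resolved: TranscribeConfiguration = { ...configuration };
  // Documented rule: if both punctuation modes are true,
  // spokenPunctuation takes precedence over automaticPunctuation.
  if (resolved.spokenPunctuation && resolved.automaticPunctuation) {
    resolved.automaticPunctuation = false;
  }
  return JSON.stringify({ type: "config", configuration: resolved });
}

console.log(
  buildConfigMessage({
    primaryLanguage: "en",
    spokenPunctuation: true,
    automaticPunctuation: true,
  }),
);
```

Resolving the conflict before sending is a design choice, not a requirement: per the description above, the server applies the same precedence on its own.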

Example

Here is an example configuration for transcription of dictated audio in English with spoken punctuation enabled, two commands defined, and (default) formatting options defined:
wss:/transcribe configuration example
{
  "type": "config",
  "configuration": {
    "primaryLanguage": "en",
    "spokenPunctuation": true,
    "commands": [
      {
        "id": "next_section",
        "phrases": ["next section", "go to next section"]
      },
      {
        "id": "insert_template",
        "phrases": [
          "insert my {template_name} template",
          "insert {template_name} template"
        ],
        "variables": [
          {
            "key": "template_name",
            "type": "enum",
            "enum": ["radiology", "referral"]
          }
        ]
      }
    ],
    "formatting": {                        // default values:
      "dates": "d MM yyyy",                // long format ("3 February 2025")
      "times": "h24",                      // 24 hour format ("16:00")
      "numbers": "numerals_above_nine",    // single digit as words, multi-digit as number ("one, two, … nine, 10, 11, 12")
      "measurements": "abbreviated",       // abbreviated ("mm", "cm", "in", "BP 120/80 mmHg")
      "numericRanges": "numerals",         // numerals ("1-10")
      "ordinals": "numerals"               // numerals ("1st, 2nd")
    }
  }
}

Using SDK

You can use the Corti SDK (currently in “beta”) to send configuration.
You can provide the configuration either directly when connecting, or send it as a separate message after establishing the connection:
const configuration = {
  "type": "config",
  "configuration":{
    "primaryLanguage": "en",
    "spokenPunctuation": true,
    "commands": [
      {
        "id": "next_section",
        "phrases": ["next section", "go to next section"]
      },
    ]
  }
};

const transcribeSocket = await cortiClient.transcribe.connect(
  { configuration }
);

Sending audio

Send the raw audio data to be transcribed as binary messages. See more details on audio formats and best practices for recording here.

Using SDK

You can use the Corti SDK (currently in “beta”) to send audio data.
transcribeSocket.sendAudio(audioChunk); // sendAudio does not split the audio; chunk it before calling
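Because the SDK method does not chunk the audio, the caller has to split a buffer before sending it. The following is a minimal sketch of one way to do that. The function name chunkAudio is hypothetical, and the chunk size is left to the caller because this page does not specify one.

```typescript
// Hypothetical helper: splits a binary audio buffer into fixed-size chunks.
// The last chunk may be shorter than chunkSizeBytes.
function chunkAudio(audio: Uint8Array, chunkSizeBytes: number): Uint8Array[] {
  if (!Number.isInteger(chunkSizeBytes) || chunkSizeBytes <= 0) {
    throw new RangeError("chunkSizeBytes must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < audio.length; offset += chunkSizeBytes) {
    // subarray returns a view on the same underlying buffer, so no bytes are copied.
    chunks.push(audio.subarray(offset, offset + chunkSizeBytes));
  }
  return chunks;
}

const exampleChunks = chunkAudio(new Uint8Array(10), 4);
console.log(exampleChunks.map((chunk) => chunk.length));
```

Each returned chunk could then be passed to transcribeSocket.sendAudio, assuming the method accepts the binary type you use.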

Ending the session

To end the /transcribe session, send a message with "type": "end". This signals the server to send any remaining transcript segments and detected commands. The server then sends a usage message:
{
  "type":"usage",
  "credits":0.1
}
Finally, the server returns a message of type ended before it closes the WebSocket.
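The shutdown sequence described above (usage message, then ended message) can be tracked on the client with a small reducer. This is a sketch under assumptions: the names are hypothetical, and the type comparison is case-insensitive because this page writes the final message type both as ended and as ENDED.

```typescript
// Hypothetical summary of what the server reported while closing the session.
interface ShutdownSummary {
  credits: number | null; // value of the usage message, if one was seen
  ended: boolean;         // true once the ended message was seen
}

// Hypothetical helper: scans raw JSON text messages received after sending "end".
function summarizeShutdown(rawMessages: string[]): ShutdownSummary {
  const summary: ShutdownSummary = { credits: null, ended: false };
  for (const raw of rawMessages) {
    const message = JSON.parse(raw) as { type?: unknown; credits?: unknown };
    // Assumption: compare case-insensitively to cover both "ended" and "ENDED".
    const type = typeof message.type === "string" ? message.type.toLowerCase() : "";
    if (type === "usage" && typeof message.credits === "number") {
      summary.credits = message.credits;
    } else if (type === "ended") {
      summary.ended = true;
    }
  }
  return summary;
}

console.log(summarizeShutdown(['{"type":"usage","credits":0.1}', '{"type":"ended"}']));
```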

Using SDK

You can use the Corti SDK (currently in “beta”) to end the /transcribe session.
When using automatic configuration (passing configuration to connect), the socket will close itself without reconnecting when it receives an ended message. When using manual configuration, the socket will attempt to reconnect after the server closes the connection. To prevent this, you must subscribe to the ended message and manually close the connection.
const transcribeSocket = await cortiClient.transcribe.connect({
  configuration
});

transcribeSocket.sendEnd({ type: "end" });

4. Responses

Configuration

  • type (string, required, default: "CONFIG_ACCEPTED"): Returned when sending a valid configuration.
  • sessionId (uuid, required): Returned when sending a valid configuration.
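When working with a raw WebSocket rather than the SDK, the first server message after the config message decides whether audio may be sent. The following is a minimal sketch of that decision. The type and function names are hypothetical, and the field names (type, sessionId, reason) are the ones documented on this page.

```typescript
// Hypothetical result type for the configuration handshake.
type ConfigOutcome =
  | { status: "accepted"; sessionId: string }
  | { status: "rejected"; code: string; reason?: string }
  | { status: "other" };

// Hypothetical helper: classifies a raw JSON text message from the server.
function classifyConfigResponse(raw: string): ConfigOutcome {
  const message = JSON.parse(raw) as {
    type?: string;
    sessionId?: string;
    reason?: string;
  };
  switch (message.type) {
    case "CONFIG_ACCEPTED":
      // Audio may be sent only after this response has arrived.
      return { status: "accepted", sessionId: message.sessionId ?? "" };
    case "CONFIG_DENIED":
    case "CONFIG_TIMEOUT":
      return { status: "rejected", code: message.type, reason: message.reason };
    default:
      return { status: "other" };
  }
}

console.log(classifyConfigResponse('{"type":"CONFIG_DENIED","reason":"example reason"}'));
```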

Transcripts

  • type (string, required, default: "transcript")
  • data (object, required)
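When interimResults is enabled, a client typically shows the latest preview and replaces it once the final segment arrives. The following sketch illustrates that idea. It assumes that the transcript text is available as data.text (as in the SDK example on this page) and that the isFinal flag is carried in data as well; the exact location of isFinal is not specified here, and the class name is hypothetical.

```typescript
// Assumed message shape: text and isFinal both live under data.
interface TranscriptMessage {
  type: "transcript";
  data: { text: string; isFinal?: boolean };
}

// Hypothetical buffer: keeps final segments and only the latest interim preview.
class TranscriptBuffer {
  private finalSegments: string[] = [];
  private preview = "";

  add(message: TranscriptMessage): void {
    // Assumption: a missing isFinal flag means the segment is final.
    if (message.data.isFinal === false) {
      this.preview = message.data.text; // a newer preview replaces the older one
    } else {
      this.finalSegments.push(message.data.text);
      this.preview = ""; // the final segment supersedes the preview
    }
  }

  render(): string {
    return [...this.finalSegments, this.preview].filter(Boolean).join(" ");
  }
}

const exampleBuffer = new TranscriptBuffer();
exampleBuffer.add({ type: "transcript", data: { text: "patient pres", isFinal: false } });
exampleBuffer.add({ type: "transcript", data: { text: "Patient presents with cough.", isFinal: true } });
console.log(exampleBuffer.render());
```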

Commands

  • type (string, required, default: "command")
  • data (object, required)
Command response
{
  "type": "command",
  "data": {
    "id": "insert_template",
    "variables": {
      "template_name": "radiology"
    },
    "rawTranscriptText": "insert my radiology template",
    "start": 2.3,
    "end": 2.9
  }
}
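A command response like the one above can be routed to application logic by its id. The following is a minimal sketch: the message type mirrors the fields of the example response, while the dispatcher, its handler signature, and the returned strings are hypothetical.

```typescript
// Shape taken from the command response example above.
interface CommandMessage {
  type: "command";
  data: {
    id: string;
    variables?: Record<string, string>;
    rawTranscriptText: string;
    start: number;
    end: number;
  };
}

// Hypothetical handler signature: receives the detected variables.
type CommandHandler = (variables: Record<string, string>) => string;

// Hypothetical dispatcher: looks up the handler registered for the command id.
function dispatchCommand(
  message: CommandMessage,
  handlers: Map<string, CommandHandler>,
): string | null {
  const handler = handlers.get(message.data.id);
  // Unknown ids are ignored rather than treated as errors.
  return handler ? handler(message.data.variables ?? {}) : null;
}

const exampleHandlers = new Map<string, CommandHandler>([
  ["insert_template", (vars) => `insert template: ${vars.template_name}`],
  ["next_section", () => "move to next section"],
]);

console.log(
  dispatchCommand(
    {
      type: "command",
      data: {
        id: "insert_template",
        variables: { template_name: "radiology" },
        rawTranscriptText: "insert my radiology template",
        start: 2.3,
        end: 2.9,
      },
    },
    exampleHandlers,
  ),
);
```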

Using SDK

You can use the Corti SDK (currently in “beta”) to subscribe to responses from the /transcribe endpoint.
transcribeSocket.on("message", (message) => {
  switch (message.type) {
    case "transcript":
      console.log("Transcript:", message.data.text);
      break;
    case "command":
      console.log("Command detected:", message.data.id, message.data.variables);
      break;
    case "error":
      console.error("Error:", message.error);
      break;
    case "usage":
      console.log("Usage credits:", message.credits);
      break;
    default:
      // handle other messages
      break;
  }
});

5. Error Responses

  • type (string, required): Returned when sending an invalid configuration. Possible errors: CONFIG_DENIED, CONFIG_TIMEOUT.
  • reason (string): The reason the configuration is invalid.
  • sessionId (uuid, required): The session ID.
Once configuration has been accepted and the session is running, you may encounter runtime or application-level errors. These are sent as JSON objects with the following structure:
{
  "type": "error",
  "error": {
    "id": "error id",
    "title": "error title",
    "status": 400,
    "details": "error details",
    "doc":"link to documentation"
  }
}
In some cases, an error message causes the stream to end; the server then sends a message of type usage followed by a message of type ENDED.
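For clients that read raw messages, the runtime error structure above can be checked and turned into a log line as follows. This is a sketch: the interface mirrors the documented JSON fields, while the function names and the log format are hypothetical.

```typescript
// Shape taken from the runtime error example above.
interface RuntimeErrorMessage {
  type: "error";
  error: {
    id: string;
    title: string;
    status: number;
    details: string;
    doc: string;
  };
}

// Hypothetical parser: returns the message if it is a runtime error, else null.
function parseRuntimeError(raw: string): RuntimeErrorMessage | null {
  const message: unknown = JSON.parse(raw);
  if (typeof message !== "object" || message === null) return null;
  const candidate = message as { type?: unknown; error?: unknown };
  if (candidate.type !== "error") return null;
  if (typeof candidate.error !== "object" || candidate.error === null) return null;
  return candidate as RuntimeErrorMessage;
}

// Hypothetical formatter: builds a one-line summary for logging.
function describeRuntimeError(message: RuntimeErrorMessage): string {
  const { id, title, status, details } = message.error;
  return `[${status}] ${title} (${id}): ${details}`;
}

const exampleError = parseRuntimeError(
  '{"type":"error","error":{"id":"error id","title":"error title","status":400,"details":"error details","doc":"link to documentation"}}',
);
if (exampleError) {
  console.log(describeRuntimeError(exampleError));
}
```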

Using SDK

You can use the Corti SDK (currently in “beta”) to handle error messages.
With the recommended configuration (passing configuration to connect), configuration errors (e.g., CONFIG_DENIED) and runtime errors both trigger the error event and automatically close the socket. You can also inspect the original message in the message handler. With manual configuration, configuration errors are only received as messages (not as error events), and you must close the socket manually to avoid reconnection.
const transcribeSocket = await cortiClient.transcribe.connect({
  configuration
});

transcribeSocket.on("error", (error) => {
  // Emitted for both configuration and runtime errors
  console.error("Error event:", error);
  // The socket will close itself automatically
});

// The original messages can still be read through the regular "message" subscription
transcribeSocket.on("message", (message) => {
  if (
    message.type === "CONFIG_DENIED" ||
    message.type === "CONFIG_TIMEOUT"
  ) {
    console.log("Configuration error (message):", message);
  }

  if (message.type === "error") {
    console.log("Runtime error (message):", message);
  }
});