The WebSocket Secure (WSS) /transcribe API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients can send and receive structured data, including transcripts and detected commands. This documentation provides a comprehensive guide for integrating these capabilities.
This /transcribe endpoint supports real-time stateless dictation.
If you are looking for real-time ambient documentation interactions, you should use the /stream WSS endpoint.
If you are looking for transcript generation based on a pre-recorded audio file, please refer to the /transcripts endpoint.
Indicates a successful WebSocket connection. Upon successful connection, send a config message to define the configuration: specify the input language and expected output preferences.
The config message must be sent within 10 seconds of the WebSocket being opened to prevent CONFIG_TIMEOUT, which will require establishing a new WSS connection.
Declare your /transcribe configuration using the message "type": "config" followed by "configuration": {<config details, per the options below>}. Defining the type is required, along with the primaryLanguage configuration parameter. The other parameters are optional, depending on your needs and workflow.
Configuration notes:
Clients must send a configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
If the configuration is not valid, the server will return CONFIG_DENIED.
The configuration must be sent within 10 seconds of opening the WebSocket, or it will time out with CONFIG_TIMEOUT.
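A minimal sketch of this handshake over a raw browser WebSocket follows; the URL is a placeholder, and only the required primaryLanguage parameter is shown (see the full option list below):

```typescript
// Placeholder URL; substitute your actual /transcribe endpoint and auth details.
const socket = new WebSocket("wss://<your-corti-host>/transcribe");

socket.addEventListener("open", () => {
  // Send the config within 10 seconds to avoid CONFIG_TIMEOUT.
  socket.send(
    JSON.stringify({
      type: "config",
      configuration: { primaryLanguage: "en" },
    })
  );
});

socket.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return; // ignore any binary frames
  const msg = JSON.parse(event.data);
  if (msg.type === "CONFIG_ACCEPTED") {
    // Safe to start streaming audio.
  } else if (msg.type === "CONFIG_DENIED" || msg.type === "CONFIG_TIMEOUT") {
    socket.close(); // a new WSS connection is required
  }
});
```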
When true, the final transcript is automatically punctuated and capitalized.
Spoken and Automatic Punctuation are mutually exclusive - only one should be set to true in a given configuration request. If both are included and set to true, then spokenPunctuation will take precedence and override automaticPunctuation.
Unique value to identify the command. This, along with the command phrase, will be returned by the API when the command is recognized during dictation.
Define each formatting preference using the enum options described below. Formatting configuration is optional; when no properties are configured, the values listed as default will be applied automatically. Read more about formatting here.
Formatting is currently in beta testing. API details subject to change ahead of general release.
Here is an example configuration for transcription of dictated audio in English with spoken punctuation enabled and two commands defined. Formatting is omitted here so the default options apply; the command property names shown are illustrative, following the descriptions above:
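```json
{
  "type": "config",
  "configuration": {
    "primaryLanguage": "en",
    "spokenPunctuation": true,
    "commands": [
      { "id": "next-section", "phrase": "next section" },
      { "id": "new-paragraph", "phrase": "new paragraph" }
    ]
  }
}
```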
Ensure that your configuration was accepted before sending audio, and that the initial audio chunk is not too small, as it must contain the headers needed to properly decode the audio. We recommend sending audio in chunks of 250-500 ms. In terms of buffering, the limit is 64,000 bytes per chunk. Audio data should be sent as raw binary without JSON wrapping.
A variety of common audio formats are supported; audio is passed through a transcoder before speech-to-text processing. Similarly, specifying the sample rate, bit depth, or other audio settings is not required at this time. See more details on supported audio formats here.
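As a minimal browser sketch (assuming the `socket` from the handshake sketch above and that CONFIG_ACCEPTED has already been received):

```typescript
// Capture the microphone and stream it in ~250 ms chunks.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream); // default container/codec; the server transcodes

recorder.ondataavailable = async (event) => {
  if (event.data.size > 0) {
    // Raw binary, no JSON wrapping; keep each chunk under the 64,000-byte limit.
    socket.send(await event.data.arrayBuffer());
  }
};

recorder.start(250); // emit a chunk roughly every 250 ms
```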
To flush the audio buffer, forcing transcript segments and detected commands to be returned over the WebSocket (e.g., when turning off or muting the microphone in a "hold-to-talk" dictation workflow, or in applications that support a mic "go to sleep" state), send the message:
```json
{ "type": "flush" }
```
The server will return text/commands for audio sent before the flush message and then respond with the message:
```json
{ "type": "flushed" }
```
The WebSocket will remain open so dictation can continue.
Client-side considerations:

1. If you rely on a flush event to separate data (e.g., for different sections in an EHR template), be sure to receive the flushed event before moving on to the next data field.
2. When using the browser MediaRecorder API, audio is buffered and only emitted at the configured timeslice interval. Therefore, before sending a flush message, call MediaRecorder.requestData() to force any remaining buffered audio on the client to be transmitted to the server, as shown in the sketch below. This ensures all audio reaches the server before the flush is processed.
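A flush-aware variant of the earlier ondataavailable handler might look like this (a sketch reusing `recorder` and `socket` from above; the flag-based sequencing is illustrative, not a documented pattern):

```typescript
let flushPending = false;

recorder.ondataavailable = async (event) => {
  if (event.data.size > 0) {
    socket.send(await event.data.arrayBuffer()); // raw binary audio
  }
  if (flushPending) {
    flushPending = false;
    // WebSocket sends are ordered, so the flush follows the final audio chunk.
    socket.send(JSON.stringify({ type: "flush" }));
  }
};

function stopDictationSegment() {
  flushPending = true;
  recorder.requestData(); // force out any audio still buffered on the client
}

socket.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return;
  const msg = JSON.parse(event.data);
  if (msg.type === "flushed") {
    // All audio sent before the flush has been processed; safe to move on.
  }
});
```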
You can use the Corti SDK (currently in “beta”) to end the /transcribe session.
When using automatic configuration (passing configuration to connect), the socket will close itself without reconnecting when it receives an ended message. When using manual configuration, the socket will attempt to reconnect after the server closes the connection. To prevent this, you must subscribe to the ended message and manually close the connection.
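A sketch of the manual-configuration case (assuming a `transcribeSocket` obtained via cortiClient.transcribe.connect, as in the error-handling example below; the `close()` method name is an assumption):

```typescript
transcribeSocket.on("message", (message) => {
  if (message.type === "ended") {
    // Close manually so the SDK does not attempt to reconnect.
    transcribeSocket.close();
  }
});
```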
Once configuration has been accepted and the session is running, you may encounter runtime or application-level errors.
These are sent as JSON objects with the following structure:
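As a minimal sketch of that shape, only the type field is confirmed by the SDK example below; the remaining fields are assumptions, not the documented schema:

```typescript
// Hypothetical shape; only "type" is confirmed by the example below.
interface TranscribeErrorMessage {
  type: "error";
  code?: string;    // assumed: machine-readable error code
  message?: string; // assumed: human-readable description
}
```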
You can use the Corti SDK (currently in “beta”) to handle error messages.
With the recommended automatic configuration, configuration errors (e.g., CONFIG_DENIED) and runtime errors will both trigger the error event and automatically close the socket. You can also inspect the original message in the message handler. With manual configuration, configuration errors are only received as messages (not as error events), and you must close the socket manually to avoid reconnection.
```typescript
const transcribeSocket = await cortiClient.transcribe.connect({ configuration });

transcribeSocket.on("error", (error) => {
  // Emitted for both configuration and runtime errors
  console.error("Error event:", error);
  // The socket will close itself automatically
});

// Errors can still be accessed with a normal "message" subscription
transcribeSocket.on("message", (message) => {
  if (
    message.type === "CONFIG_DENIED" ||
    message.type === "CONFIG_TIMEOUT"
  ) {
    console.log("Configuration error (message):", message);
  }
  if (message.type === "error") {
    console.log("Runtime error (message):", message);
  }
});
```