Overview

The WebSocket Secure (WSS) API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients can send and receive structured data, including transcripts and detected commands.

This documentation provides a comprehensive guide for integrating these capabilities.

This /transcribe endpoint supports stateless front-end dictation.

  • For real-time ambient documentation interactions, use the /stream WSS endpoint instead.
  • For asynchronous transcript generation as part of an interaction, refer to the /transcripts endpoint.

Environment Options

Environment   Description
us            US-based instance
eu            EU-based instance
beta-eu       Beta EU instance (default)

Establishing a Connection

Clients must initiate a WebSocket connection using the wss:// scheme.

When you create an interaction, the 200 response provides a websocketUrl for that interaction, with the tenant-name already included as a URL parameter. To authenticate the WSS stream, add a token parameter that carries the Bearer access token alongside the tenant-name parameter.

Request

Query Parameters

tenant-name
string
required

Specifies the tenant context

token
string
required

Bearer $token

  curl --request GET \
    --url "wss://api.${environment}.corti.app/audio-bridge/v2/transcribe?tenant-name=${tenant}&token=Bearer%20${accessToken}"
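
The connection URL in the curl example can also be assembled programmatically. The following is a minimal sketch using only the Python standard library. The function name build_transcribe_url is hypothetical, and the host and path are copied from the curl example above rather than confirmed independently.

```python
from urllib.parse import quote, urlencode


def build_transcribe_url(environment: str, tenant: str, access_token: str) -> str:
    """Assemble the /transcribe WSS URL shown in the curl example."""
    # Host and path mirror the curl example; treat them as assumptions.
    base = f"wss://api.{environment}.corti.app/audio-bridge/v2/transcribe"
    # quote_via=quote encodes the space in "Bearer <token>" as %20, not "+".
    query = urlencode(
        {"tenant-name": tenant, "token": f"Bearer {access_token}"},
        quote_via=quote,
    )
    return f"{base}?{query}"
```

Encoding the query through urlencode avoids hand-escaping the space after Bearer and any reserved characters that a tenant name or token might contain.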

Handshake Response

101 Switching Protocols

Indicates a successful WebSocket connection. Upon successful connection, send a message including the configuration to specify the input and expected output formats.

Transcript response
{
  "type": "transcript",
  "data": {
    "text": "Hello.\nHow are you?",
    "rawTranscriptText": "hello new line how are you",
    "start": 0.124,
    "end": 6.421,
    "isFinal": true
  }
}

Sending Messages

Clients must send a stream configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting any other data. If the configuration is invalid, the server returns CONFIG_DENIED. The configuration must be committed within 10 seconds of opening the WebSocket; otherwise the session times out with CONFIG_TIMEOUT.
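
The handshake rule above (wait for CONFIG_ACCEPTED, treat CONFIG_DENIED and CONFIG_TIMEOUT as failures) can be sketched as a small check on the first server message. This is an illustrative sketch, not an official client: the names ConfigRejected and check_config_response are hypothetical, and the field names type, reason, and sessionId are taken from the Responses section of this page.

```python
import json


class ConfigRejected(Exception):
    """Raised when the server does not accept the stream configuration."""


def check_config_response(raw_message: str) -> str:
    """Return the sessionId if the configuration was accepted, else raise."""
    message = json.loads(raw_message)
    message_type = message.get("type")
    if message_type == "CONFIG_ACCEPTED":
        return message["sessionId"]
    if message_type in ("CONFIG_DENIED", "CONFIG_TIMEOUT"):
        # "reason" is optional in the error response, so fall back to a default.
        reason = message.get("reason", "no reason given")
        raise ConfigRejected(f"{message_type}: {reason}")
    raise ConfigRejected(f"unexpected message before CONFIG_ACCEPTED: {message_type}")
```

A client would call this on the first message received after sending the configuration and only start streaming audio once it returns.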

Basic Stream Configuration

Body

primaryLanguage
string
required

The locale of the primary spoken language. See https://docs.corti.ai/about/languages for the supported languages.

interimResults
bool

When true, returns interim results for reduced latency.

spokenPunctuation
bool

When true, converts spoken punctuation such as "period" or "slash" into . or /

automaticPunctuation
bool

When true, automatically punctuates and capitalizes in the final transcript.

Advanced Stream Configuration: Commands

The /transcribe endpoint supports registration and detection of commands, which are common in front-end dictation workflows. Extend the configuration with the following parameters to register the commands that should be detected.

Body

commands
command object[]

The commands that should be registered and detected.

Here is an example configuration for transcription of dictated audio in English, with interim results, spoken punctuation and automatic punctuation enabled, and example commands defined.

Configuration example
{
  "primaryLanguage": "en",
  "interimResults": true,
  "spokenPunctuation": true,
  "automaticPunctuation": true,
  "commands": [
    {
      "id": "next_section",
      "phrases": ["next section", "go to next section"]
    },
    {
      "id": "delete",
      "phrases": ["delete that"]
    },
    {
      "id": "insert_template",
      "phrases": [
        "insert my {template_name} template",
        "insert {template_name} template"
      ],
      "variables": [
        {
          "key": "template_name",
          "type": "enum",
          "enum": ["radiology", "referral"]
        }
      ]
    }
  ]
}
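
In the example above, each {placeholder} in a command phrase has a matching entry under variables. A client can check this locally before sending the configuration. The sketch below is a hypothetical client-side helper (undeclared_placeholders is not part of the API), and whether the server performs the same validation is not stated here.

```python
import re

# Matches {name} placeholders such as {template_name} in a phrase.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def undeclared_placeholders(command: dict) -> set:
    """Return placeholder names used in phrases but missing from variables."""
    declared = {variable["key"] for variable in command.get("variables", [])}
    used = set()
    for phrase in command["phrases"]:
        used.update(PLACEHOLDER.findall(phrase))
    return used - declared
```

An empty result means every placeholder is declared; a non-empty result names the keys that still need a variables entry.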

Sending audio

Once the configuration has been accepted, send the raw audio data to be transcribed.

Ending

To end the /transcribe session, send a message with type: end. This signals the server to send any remaining transcript segments and detected commands before ending the session.
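
The shutdown step can be sketched without tying it to a particular WebSocket library by passing in send and recv callables. Everything here is an assumption made for illustration: finish_session is a hypothetical helper, the end message is assumed to be the JSON object {"type": "end"}, and the transport is assumed to return None from recv once the server closes the connection.

```python
import json
from typing import Callable, List, Optional


def finish_session(
    send: Callable[[str], None],
    recv: Callable[[], Optional[str]],
) -> List[dict]:
    """Send the end message, then collect what the server still delivers."""
    # Assumed wire format for "type: end"; the text does not show the full message.
    send(json.dumps({"type": "end"}))
    remaining = []
    while True:
        raw = recv()
        if raw is None:  # hypothetical transport signals closure with None
            break
        remaining.append(json.loads(raw))
    return remaining
```

Reading until the connection closes, rather than stopping right after sending the end message, keeps the trailing transcript segments and commands from being dropped.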

Responses

Configuration

type
string
default:"CONFIG_ACCEPTED"
required

Returned when sending a valid configuration.

sessionId
uuid
required

The session ID, returned when sending a valid configuration.

Transcripts

type
string
default:"transcript"
required

data
object
required

Commands

type
string
default:"command"
required

data
object
required

Command response
{
  "type": "command",
  "data": {
    "id": "insert_template",
    "variables": {
      "template_name": "radiology"
    },
    "rawTranscriptText": "insert my radiology template",
    "start": 2.3,
    "end": 2.9
  }
}
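
Because transcripts and commands arrive on the same connection, a client typically branches on the type field of each message. The sketch below shows one way to do that, using only the fields that appear in the example messages on this page; handle_message is a hypothetical name, and a real client would likely update an editor or trigger an action instead of returning a string.

```python
import json


def handle_message(raw_message: str) -> str:
    """Summarize one server message as a short human-readable string."""
    message = json.loads(raw_message)
    kind = message.get("type")
    data = message.get("data", {})
    if kind == "transcript":
        # isFinal distinguishes final segments from interim results.
        marker = "final" if data.get("isFinal") else "interim"
        return f"[{marker}] {data.get('text', '')}"
    if kind == "command":
        variables = data.get("variables", {})
        return f"command {data.get('id')} with {variables}"
    return f"unhandled message type: {kind}"
```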

Error Responses

type
string
required

Returned when the configuration is invalid or is not committed in time.

Possible errors: CONFIG_DENIED, CONFIG_TIMEOUT

reason
string

The reason the configuration is invalid.

sessionId
uuid
required

The session ID.