The WebSocket Secure (WSS) /transcribe API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients can send and receive structured data, including transcripts and detected commands. This documentation provides a comprehensive guide for integrating these capabilities.
This /transcribe endpoint supports real-time stateless dictation.
If you are looking for real-time ambient documentation interactions, you should use the /stream WSS endpoint.
If you are looking for transcript generation based on a pre-recorded audio file, please refer to the /transcripts endpoint.
Indicates a successful WebSocket connection. Upon successful connection, send a config message to define the configuration: specify the input language and expected output preferences.
The config message must be sent within 10 seconds of the WebSocket being opened to prevent CONFIG_TIMEOUT, which will require establishing a new WSS connection.
Declare your /transcribe configuration using the message "type": "config" followed by "configuration": {<config details, per the options below>}. Defining the type is required, along with the primaryLanguage configuration parameter. The other parameters are optional, depending on your needs and workflow.
Configuration notes:
Clients must send a configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
If the configuration is not valid, the server will return CONFIG_DENIED.
The configuration must be sent within 10 seconds of opening the WebSocket, or it will time out with CONFIG_TIMEOUT.
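A minimal sketch of this handshake over a raw browser WebSocket follows; the URL is a placeholder, and only the required primaryLanguage parameter is shown (see the full option list below):

```typescript
// Placeholder URL; substitute your actual /transcribe endpoint and auth details.
const socket = new WebSocket("wss://<your-corti-host>/transcribe");

socket.addEventListener("open", () => {
  // Send the config within 10 seconds to avoid CONFIG_TIMEOUT.
  socket.send(
    JSON.stringify({
      type: "config",
      configuration: { primaryLanguage: "en" },
    })
  );
});

socket.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return; // ignore any binary frames
  const msg = JSON.parse(event.data);
  if (msg.type === "CONFIG_ACCEPTED") {
    // Safe to start streaming audio.
  } else if (msg.type === "CONFIG_DENIED" || msg.type === "CONFIG_TIMEOUT") {
    socket.close(); // a new WSS connection is required
  }
});
```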
When true, the final transcript is automatically punctuated and capitalized.
Spoken and Automatic Punctuation are mutually exclusive - only one should be set to true in a given configuration request. If both are included and set to true, then spokenPunctuation will take precedence and override automaticPunctuation.
Unique value to identify the command. This, along with the command phrase, will be returned by the API when the command is recognized during dictation.
Define each formatting preference using the enum options described below. Formatting configuration is optional; when no properties are configured, the values listed as default will be applied automatically. Read more about formatting here.
Formatting is currently in beta testing. API details subject to change ahead of general release.
Here is an example configuration for transcription of dictated audio in English with spoken punctuation enabled and two commands defined. Formatting is omitted here so the default options apply; the command property names shown are illustrative, following the descriptions above:
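```json
{
  "type": "config",
  "configuration": {
    "primaryLanguage": "en",
    "spokenPunctuation": true,
    "commands": [
      { "id": "next-section", "phrase": "next section" },
      { "id": "new-paragraph", "phrase": "new paragraph" }
    ]
  }
}
```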
Ensure that your configuration was accepted before sending audio, and that the initial audio chunk is not too small, as it must contain the headers needed to properly decode the audio. We recommend sending audio in chunks of 250-500 ms. In terms of buffering, the limit is 64,000 bytes per chunk. Audio data should be sent as raw binary without JSON wrapping.
A variety of common audio formats are supported; audio is passed through a transcoder before speech-to-text processing. Similarly, specifying the sample rate, bit depth, or other audio settings is not required at this time. See more details on supported audio formats here.
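As a minimal browser sketch (assuming the `socket` from the handshake sketch above and that CONFIG_ACCEPTED has already been received):

```typescript
// Capture the microphone and stream it in ~250 ms chunks.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream); // default container/codec; the server transcodes

recorder.ondataavailable = async (event) => {
  if (event.data.size > 0) {
    // Raw binary, no JSON wrapping; keep each chunk under the 64,000-byte limit.
    socket.send(await event.data.arrayBuffer());
  }
};

recorder.start(250); // emit a chunk roughly every 250 ms
```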
To flush the audio buffer, forcing transcript segments and detected commands to be returned over the WebSocket (e.g., when turning off or muting the microphone in a "hold-to-talk" dictation workflow, or in applications that support a mic "go to sleep" state), send the message:
```json
{ "type": "flush" }
```
The server will return text/commands for audio sent before the flush message and then respond with the message:
```json
{ "type": "flushed" }
```
The WebSocket will remain open so dictation can continue.
Client-side considerations:

1. If you rely on a flush event to separate data (e.g., for different sections in an EHR template), be sure to receive the flushed event before moving on to the next data field.
2. When using the browser MediaRecorder API, audio is buffered and only emitted at the configured timeslice interval. Therefore, before sending a flush message, call MediaRecorder.requestData() to force any remaining buffered audio on the client to be transmitted to the server, as shown in the sketch below. This ensures all audio reaches the server before the flush is processed.
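A flush-aware variant of the earlier ondataavailable handler might look like this (a sketch reusing `recorder` and `socket` from above; the flag-based sequencing is illustrative, not a documented pattern):

```typescript
let flushPending = false;

recorder.ondataavailable = async (event) => {
  if (event.data.size > 0) {
    socket.send(await event.data.arrayBuffer()); // raw binary audio
  }
  if (flushPending) {
    flushPending = false;
    // WebSocket sends are ordered, so the flush follows the final audio chunk.
    socket.send(JSON.stringify({ type: "flush" }));
  }
};

function stopDictationSegment() {
  flushPending = true;
  recorder.requestData(); // force out any audio still buffered on the client
}

socket.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return;
  const msg = JSON.parse(event.data);
  if (msg.type === "flushed") {
    // All audio sent before the flush has been processed; safe to move on.
  }
});
```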
You can use the Corti SDK (currently in “beta”) to end the /transcribe session.
When using automatic configuration (passing configuration to connect), the socket will close itself without reconnecting when it receives an ended message. When using manual configuration, the socket will attempt to reconnect after the server closes the connection. To prevent this, you must subscribe to the ended message and manually close the connection.
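A sketch of the manual-configuration case (assuming a `transcribeSocket` obtained via cortiClient.transcribe.connect, as in the error-handling example below; the `close()` method name is an assumption):

```typescript
transcribeSocket.on("message", (message) => {
  if (message.type === "ended") {
    // Close manually so the SDK does not attempt to reconnect.
    transcribeSocket.close();
  }
});
```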
Once configuration has been accepted and the session is running, you may encounter runtime or application-level errors.
These are sent as JSON objects with the following structure:
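As a minimal sketch of that shape, only the type field is confirmed by the SDK example below; the remaining fields are assumptions, not the documented schema:

```typescript
// Hypothetical shape; only "type" is confirmed by the example below.
interface TranscribeErrorMessage {
  type: "error";
  code?: string;    // assumed: machine-readable error code
  message?: string; // assumed: human-readable description
}
```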
You can use the Corti SDK (currently in “beta”) to handle error messages.
With the recommended automatic configuration, configuration errors (e.g., CONFIG_DENIED) and runtime errors will both trigger the error event and automatically close the socket. You can also inspect the original message in the message handler. With manual configuration, configuration errors are only received as messages (not as error events), and you must close the socket manually to avoid reconnection.
```typescript
const transcribeSocket = await cortiClient.transcribe.connect({ configuration });

transcribeSocket.on("error", (error) => {
  // Emitted for both configuration and runtime errors
  console.error("Error event:", error);
  // The socket will close itself automatically
});

// Errors can still be accessed with a normal "message" subscription
transcribeSocket.on("message", (message) => {
  if (
    message.type === "CONFIG_DENIED" ||
    message.type === "CONFIG_TIMEOUT"
  ) {
    console.log("Configuration error (message):", message);
  }
  if (message.type === "error") {
    console.log("Runtime error (message):", message);
  }
});
```