The WebSocket Secure (WSS) /transcribe API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients can send and receive structured data, including transcripts and detected commands. This documentation provides a comprehensive guide for integrating these capabilities.
This /transcribe endpoint supports real-time stateless dictation.
If you are looking for real-time ambient documentation interactions, use the /stream WSS endpoint instead.
If you are looking for asynchronous transcript generation as part of an interaction, refer to the /transcripts endpoint.
Clients must initiate a WebSocket connection using the wss:// scheme.
When creating an interaction, the 200 response provides a websocketUrl for that interaction, including the tenant-name as a URL parameter.
In addition to the tenant-name parameter, authenticating the WSS stream requires a token parameter that carries the Bearer access token.
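The two query parameters described above can be attached with the standard URL API. The following is a minimal sketch, not an official client: `buildTranscribeUrl` is a hypothetical helper, and the exact form of the token value (for example, whether it needs a prefix) is an assumption left to the caller.

```typescript
// Hypothetical helper: adds the tenant-name and token query parameters
// to a websocketUrl. Not part of any Corti SDK.
function buildTranscribeUrl(
  websocketUrl: string,
  tenantName: string,
  token: string,
): string {
  const url = new URL(websocketUrl);
  if (url.protocol !== "wss:") {
    throw new Error(`Expected a wss:// URL, got ${url.protocol}`);
  }
  // The URL may already carry tenant-name; set() overwrites instead of
  // appending a duplicate parameter.
  url.searchParams.set("tenant-name", tenantName);
  // Assumption: the caller passes the token in whatever form the API expects.
  // URLSearchParams takes care of encoding the value.
  url.searchParams.set("token", token);
  return url.toString();
}
```

The returned string can then be passed to a WebSocket constructor. Rejecting any scheme other than wss:// up front avoids sending the token over an unencrypted connection by mistake.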
Indicates a successful WebSocket connection.
Upon successful connection, send a message including the configuration to specify the input and expected output formats.
Clients must send a stream configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
If the configuration is not valid, the server returns CONFIG_DENIED.
The configuration must be committed within 10 seconds of opening the WebSocket; otherwise the connection times out with CONFIG_TIMEOUT.
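The handshake rules above amount to a small state machine: nothing but the configuration may be sent until CONFIG_ACCEPTED arrives, and CONFIG_DENIED or CONFIG_TIMEOUT end the attempt. A minimal sketch follows, assuming server messages are JSON text frames with a `type` field; `ConfigGate` is a hypothetical helper, not part of the API.

```typescript
type ConfigState = "pending" | "accepted" | "denied" | "timed_out";

// Hypothetical helper: tracks whether the client may start sending audio.
class ConfigGate {
  private state: ConfigState = "pending";

  // Feed every raw text frame from the server through this method.
  // Once the state leaves "pending" it no longer changes.
  onServerMessage(raw: string): ConfigState {
    if (this.state !== "pending") return this.state;
    let type: unknown;
    try {
      type = (JSON.parse(raw) as { type?: unknown }).type;
    } catch {
      return this.state; // not a usable JSON message; ignore it
    }
    if (type === "CONFIG_ACCEPTED") this.state = "accepted";
    else if (type === "CONFIG_DENIED") this.state = "denied";
    else if (type === "CONFIG_TIMEOUT") this.state = "timed_out";
    return this.state;
  }

  // Audio and other data should only be sent when this returns true.
  canSendAudio(): boolean {
    return this.state === "accepted";
  }
}
```

A client would call `onServerMessage` from its message handler and check `canSendAudio()` before transmitting each audio chunk, closing the socket when the state becomes "denied" or "timed_out".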
The transcribe endpoint supports registration and detection of commands, common in dictation workflows.
Extend the configuration with the following parameters to register commands that should be detected.
The spoken phrases that should trigger the command
Here is an example configuration for transcription of dictated audio in English, with interim results, spoken punctuation and automatic punctuation enabled, and example commands defined.
The transcribe endpoint provides the option to configure formatting preferences.
Extend the configuration with the following parameters to apply formatting that should be used when returning text output.
Formatting functionality is currently in beta status. API details are subject to change ahead of general release.
Defining formatting configuration is optional. When these preferences are not configured, the default values listed below will be applied automatically.
Here is an example configuration for transcription of dictated audio in English, with interim results, spoken punctuation enabled, and formatting options defined:
Configuration example with formatting
{
  primaryLanguage: "en",
  interimResults: true,
  spokenPunctuation: true,
  commands: [...],
  formatting: {
    // default values:
    dates: 1, // long format ("3 February 2025")
    times: 2, // 24 hour format ("16:00")
    numbers: 1, // single digit as words, multi-digit as number ("one, two, … nine, 10, 11")
    units: 1, // abbreviated ("mm", "cm", "in"…)
    abbreviations: 1, // abbreviated ("BP 120/80 mmHg")
    numericRanges: 1, // abbreviated ("1-10")
    ordinals: 1, // abbreviated ("1st, 2nd")
  },
}
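Because the server applies defaults for any omitted formatting keys, a client can send only the keys it wants to change. For code that prefers to see the full effective configuration, the documented defaults can be mirrored locally. This is a sketch under assumptions: `FormattingOptions` and `withDefaultFormatting` are hypothetical names, and only the default values are taken from this page; the meaning of other values is not covered here.

```typescript
// Hypothetical type mirroring the formatting keys listed above.
interface FormattingOptions {
  dates: number;
  times: number;
  numbers: number;
  units: number;
  abbreviations: number;
  numericRanges: number;
  ordinals: number;
}

// Default values as documented in the configuration example.
const DEFAULT_FORMATTING: FormattingOptions = {
  dates: 1,
  times: 2,
  numbers: 1,
  units: 1,
  abbreviations: 1,
  numericRanges: 1,
  ordinals: 1,
};

// Returns a complete formatting object: overrides win, and anything
// missing or undefined falls back to the documented default.
function withDefaultFormatting(
  overrides: Partial<FormattingOptions> = {},
): FormattingOptions {
  const result: FormattingOptions = { ...DEFAULT_FORMATTING };
  for (const key of Object.keys(DEFAULT_FORMATTING) as (keyof FormattingOptions)[]) {
    const value = overrides[key];
    if (value !== undefined) result[key] = value;
  }
  return result;
}
```

Since the feature is in beta, a local copy of the defaults can drift from the server's behavior, so it is best treated as a convenience for logging and tests.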
To end the /transcribe session, send a message with type: end.
This signals the server to send any remaining transcript segments and detected commands before it sends a usage message.
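The shutdown sequence can be sketched as follows: send the end message, keep collecting whatever the server still delivers, and treat the usage message as the signal that the trailing output is complete. This is an illustration only. `SessionDrain` is a hypothetical helper, and the usage message's type string is assumed to be "usage" (compared case-insensitively, since the exact casing is not stated here).

```typescript
// The end-of-session message: a JSON object whose type is "end".
const END_MESSAGE: string = JSON.stringify({ type: "end" });

interface ServerMessage {
  type: string;
  [key: string]: unknown;
}

// Hypothetical helper: collects what the server sends after "end".
class SessionDrain {
  readonly trailing: ServerMessage[] = [];
  usage: ServerMessage | undefined;

  // Returns true once the usage message has arrived, meaning the
  // remaining transcripts and commands have all been delivered.
  onServerMessage(message: ServerMessage): boolean {
    // Assumption: the usage message is identified by type "usage".
    if (message.type.toLowerCase() === "usage") {
      this.usage = message;
      return true;
    }
    this.trailing.push(message);
    return false;
  }
}
```

A client would send `END_MESSAGE` over the socket, stop sending audio, and route every later message through `onServerMessage` until it returns true.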
You can use the Corti SDK (currently in “alpha”, not for production use) to end the /transcribe session.
When using automatic configuration (passing configuration to connect), the socket will close itself without reconnecting when it receives an ENDED message.
When using manual configuration, the socket will attempt to reconnect after the server closes the connection. To prevent this, you must subscribe to the ended message and manually close the connection.
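For the manual-configuration case, the pattern is to watch for the ended message and close the socket yourself. The sketch below is written against a minimal socket shape rather than the SDK's real types: the `on("message", ...)` subscription matches the SDK example elsewhere in this document, while `close()` and detecting the ended message by its `type` field are assumptions. `closeWhenEnded` is a hypothetical helper.

```typescript
// Minimal socket shape this sketch relies on (an assumption, not the
// SDK's actual type definitions).
interface SocketLike {
  on(event: "message", handler: (message: { type: string }) => void): void;
  close(): void;
}

// Hypothetical helper: closes the socket as soon as the server reports
// that the session has ended, so no reconnect is attempted.
function closeWhenEnded(socket: SocketLike): void {
  socket.on("message", (message) => {
    // Case-insensitive comparison, since the casing of the type string
    // is not pinned down here.
    if (message.type.toLowerCase() === "ended") {
      socket.close();
    }
  });
}
```

Registering this handler right after connecting, before sending the configuration, ensures the ended message cannot arrive unobserved.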
Once configuration has been accepted and the session is running, you may encounter runtime or application-level errors.
These are sent as JSON objects with the following structure:
You can use the Corti SDK (currently in “alpha”, not for production use) to handle error messages.
With the recommended configuration, configuration errors (such as CONFIG_DENIED) and runtime errors both trigger the error event and automatically close the socket. You can also inspect the original message in the message handler.
With manual configuration, configuration errors are only received as messages (not as error events), and you must close the socket manually to avoid reconnection.
const transcribeSocket = await cortiClient.transcribe.connect({
  configuration
});

transcribeSocket.on("error", (error) => {
  // Emitted for both configuration and runtime errors
  console.error("Error event:", error);
  // The socket will close itself automatically
});

// still can be accessed with normal "message" subscription
transcribeSocket.on("message", (message) => {
  if (
    message.type === "CONFIG_DENIED" ||
    message.type === "CONFIG_TIMEOUT"
  ) {
    console.log("Configuration error (message):", message);
  }
  if (message.type === "error") {
    console.log("Runtime error (message):", message);
  }
});