Overview
The WebSocket Secure (WSS) /transcribe API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients can send and receive structured data, including transcripts and detected commands.
This documentation provides a comprehensive guide for integrating these capabilities.
This /transcribe endpoint supports real-time stateless dictation.
- If you are looking for real-time ambient documentation interactions, you should use the /stream WSS endpoint.
- If you are looking for transcript generation based on a pre-recorded audio file, then please refer to the /transcripts endpoint.
1. Establishing a Connection
Clients must initiate a WebSocket connection using the wss:// scheme.
Authentication for the WSS stream requires, in addition to the tenant-name parameter, a token parameter that passes in the Bearer access token.
Query Parameters
eu or us
tenant-name: Specifies the tenant context
token: Bearer $token
Example wss:/transcribe request
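As a rough illustration of the query parameters described above, the sketch below assembles a connection URL. The `build_transcribe_url` helper, the host value, and the bare `/transcribe` path are assumptions made for illustration; the real host (which may depend on the eu or us region) and full path should be taken from your Corti environment details.

```python
from urllib.parse import quote, urlencode


def build_transcribe_url(host: str, tenant_name: str, access_token: str) -> str:
    """Assemble a wss:// URL carrying the tenant-name and token parameters.

    `host` is a hypothetical placeholder; the exact host and path prefix
    are not specified in this section.
    """
    if not host or "/" in host:
        raise ValueError("host must be a bare hostname without a path")
    query = urlencode(
        {
            "tenant-name": tenant_name,
            # The token parameter carries the value "Bearer <access token>".
            "token": f"Bearer {access_token}",
        },
        quote_via=quote,  # encode the space as %20 rather than '+'
    )
    return f"wss://{host}/transcribe?{query}"
```

The resulting string can be handed to whichever WebSocket client you use.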
Using SDK
You can use the Corti SDK (currently in “beta”) to connect to the /transcribe endpoint.
2. Handshake Response
101 Switching Protocols
Indicates a successful WebSocket connection. Upon successful connection, send a config message to define the configuration: specify the input language and expected output preferences.
The config message must be sent within 10 seconds of the WebSocket being opened to prevent CONFIG_TIMEOUT, which will require establishing a new wss connection.
3. Sending Messages
Configuration
Declare your /transcribe configuration using the message "type": "config" followed by defining the "configuration": {"<config details, per options below>"}.
Defining the type is required, along with the primaryLanguage configuration parameter. The other parameters are optional, depending on your needs and workflow.
Configuration notes:
- Clients must send a stream configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
- If the configuration is not valid, the server returns CONFIG_DENIED.
- The configuration must be committed within 10 seconds of opening the WebSocket, or it will time out with CONFIG_TIMEOUT.
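The notes above amount to a small gate: nothing else should be sent until the server answers the configuration message. A minimal sketch of that check follows, assuming server messages are JSON objects with a "type" field; the "reason" field name, the `ConfigRejected` exception, and the `check_config_response` helper are hypothetical names used only for illustration.

```python
import json


class ConfigRejected(Exception):
    """Raised when the server does not accept the configuration."""

    def __init__(self, error_type: str, reason=None):
        super().__init__(f"{error_type}: {reason}")
        self.error_type = error_type
        self.reason = reason


def check_config_response(raw_message: str) -> bool:
    """Return True on CONFIG_ACCEPTED, raise ConfigRejected otherwise."""
    message = json.loads(raw_message)
    msg_type = message.get("type")
    if msg_type == "CONFIG_ACCEPTED":
        return True
    if msg_type in ("CONFIG_DENIED", "CONFIG_TIMEOUT"):
        # "reason" is an assumed field name for the explanation text.
        raise ConfigRejected(msg_type, message.get("reason"))
    # Any other message before acceptance is treated as unexpected.
    raise ConfigRejected("UNEXPECTED", f"got message type {msg_type!r}")
```

Raising on anything other than CONFIG_ACCEPTED keeps the calling code from sending audio on a connection the server is about to close.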
primaryLanguage: The locale of the primary spoken language. See supported language codes and more information here.
spokenPunctuation: When true, converts spoken punctuation such as period or slash into . or /. Read more about supported punctuation here.
automaticPunctuation: When true, automatically punctuates and capitalizes in the final transcript.
Spoken and Automatic Punctuation are mutually exclusive: only one should be set to true in a given configuration request. If both are included and set to true, then spokenPunctuation will take precedence and override automaticPunctuation.
Provide the commands that should be registered and detected. Read more about commands here.
Define each type of formatting preference using the enum options described below. Read more about formatting here.
Formatting is currently in beta status. API details are subject to change ahead of general release.
Defining formatting configuration is optional. When these preferences are not configured, the default values listed below will be applied automatically.
interimResults: When true, returns interim (preview) results with reduced latency as compared to the final results. Interim (preview) results are indicated by isFinal=false in the stream response.
interimResults is currently in beta status. API details are subject to change ahead of general release. Use of this feature is optional and has no bearing on the final transcription results.
Example
Here is an example configuration for transcription of dictated audio in English with spoken punctuation enabled, two commands defined, and (default) formatting options defined:
wss:/transcribe configuration example
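A minimal sketch of how a config message could be assembled in client code, using only the parameter names mentioned in this section (primaryLanguage, spokenPunctuation, automaticPunctuation, interimResults). The `build_config_message` helper and the "en" language code are assumptions; commands and formatting are left out because their schema is not given here. The server lets spokenPunctuation win when both punctuation flags are true, whereas this sketch rejects that combination up front.

```python
import json


def build_config_message(
    primary_language: str,
    *,
    spoken_punctuation: bool = False,
    automatic_punctuation: bool = False,
    interim_results: bool = False,
) -> str:
    """Serialize a {"type": "config", "configuration": {...}} message."""
    if not primary_language:
        raise ValueError("primaryLanguage is required")
    if spoken_punctuation and automatic_punctuation:
        # The two punctuation modes are mutually exclusive.
        raise ValueError("set only one of spokenPunctuation / automaticPunctuation")
    configuration = {"primaryLanguage": primary_language}
    if spoken_punctuation:
        configuration["spokenPunctuation"] = True
    if automatic_punctuation:
        configuration["automaticPunctuation"] = True
    if interim_results:
        configuration["interimResults"] = True
    return json.dumps({"type": "config", "configuration": configuration})
```

For example, `build_config_message("en", spoken_punctuation=True)` produces a text frame to send within the 10-second window after the socket opens.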
Using SDK
You can use the Corti SDK (currently in “beta”) to send configuration.
Sending audio
Raw audio data to be transcribed is sent as binary. See more details on audio formats and best practices for recording here.
Using SDK
You can use the Corti SDK (currently in “beta”) to send audio data.
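Without the SDK, audio is typically split into binary frames before being written to the socket. The sketch below shows one way to slice a buffer into chunks; the `iter_audio_chunks` helper and the 4096-byte default are assumptions for illustration, not values taken from the API, so consult the audio format guidance for suitable sizes.

```python
from typing import Iterator


def iter_audio_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield audio[start:start + chunk_size]
```

Each yielded chunk would be sent as one binary WebSocket message, after CONFIG_ACCEPTED has been received.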
Ending the session
To end the /transcribe session, send a type: end message.
This will signal the server to send any remaining transcript segments and detected commands. The server will then send a usage message, followed by an ended message, before it closes the WebSocket.
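The shutdown sequence described above can be sketched as: send the end message, then keep reading until the ended message arrives. This assumes server messages are JSON objects with a "type" field; the `drain_until_ended` helper is hypothetical, and the comparison is case-insensitive because the casing of the ended type may vary.

```python
import json
from typing import Iterable, List

# The text frame a client sends to end the session.
END_MESSAGE = json.dumps({"type": "end"})


def drain_until_ended(incoming: Iterable[str]) -> List[dict]:
    """Collect messages received after sending END_MESSAGE, up to and
    including the one whose type is "ended"."""
    received: List[dict] = []
    for raw in incoming:
        message = json.loads(raw)
        received.append(message)
        if str(message.get("type", "")).lower() == "ended":
            break  # the server closes the socket after this message
    return received
```

`incoming` can be any iterable of text frames, for example the receive loop of a WebSocket client; remaining transcripts, commands, and the usage message all appear in the returned list before the final ended entry.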
Using SDK
You can use the Corti SDK (currently in “beta”) to end the /transcribe session.
connect),
the socket will close itself without reconnecting when it receives an ended message.
When using manual configuration, the socket will attempt to reconnect after the server closes the connection. To prevent this,
you must subscribe to the ended message and manually close the connection.
4. Responses
Configuration
Returned when sending a valid configuration.
Transcripts
Commands
Command response
Using SDK
You can use the Corti SDK (currently in “beta”) to subscribe to responses from the /transcribe endpoint.
5. Error Responses
Returned when sending an invalid configuration.
Possible errors: CONFIG_DENIED, CONFIG_TIMEOUT
The reason the configuration is invalid.
The session ID.
usage and type ENDED.
Using SDK
You can use the Corti SDK (currently in “beta”) to handle error messages.
CONFIG_DENIED, etc.) and runtime errors will both trigger the error event and automatically close the socket. You can also inspect the original message in the message handler. With manual configuration, configuration errors are only received as messages (not as error events), and you must close the socket manually to avoid reconnection.