Stateless front-end dictation
WebSocket Secure (WSS) API documentation for stateless use of the /transcribe endpoint for real-time speech-to-text
Overview
The WebSocket Secure (WSS) API enables real-time, bidirectional communication with the Corti system for stateless speech-to-text. Clients can send and receive structured data, including transcripts and detected commands.
This documentation provides a comprehensive guide for integrating these capabilities.
This /transcribe endpoint supports stateless front-end dictation.
- For real-time ambient documentation interactions, use the /stream WSS endpoint instead.
- For asynchronous transcript generation as part of an interaction, refer to the /transcripts endpoint.
Environment Options
Environment | Description |
---|---|
us | US-based instance |
eu | EU-based instance |
beta-eu | Beta EU instance (default) |
Establishing a Connection
Clients must initiate a WebSocket connection using the wss:// scheme.
The websocketUrl for that interaction includes the tenant-name as a URL parameter.
In addition to the tenant-name parameter, authentication for the WSS stream requires a token parameter that carries the Bearer access token.
Request
Query Parameters
tenant-name: Specifies the tenant context
token: Carries the Bearer access token
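As a minimal sketch of the connection step, the following shows one way to attach the tenant-name and token query parameters to a wss:// URL. The helper name build_transcribe_url and the host and path in the usage line are hypothetical and used only for illustration. The exact form the token value must take is not specified here, so the sketch passes it through unchanged.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_transcribe_url(base_url: str, tenant_name: str, token: str) -> str:
    """Return base_url with tenant-name and token set as query parameters.

    Hypothetical helper: only the two parameter names come from the
    documentation; everything else is an assumption.
    """
    parts = urlsplit(base_url)
    if parts.scheme != "wss":
        raise ValueError("the connection must use the wss:// scheme")
    # Keep any query parameters already present, then set the two required ones.
    query = dict(parse_qsl(parts.query))
    query["tenant-name"] = tenant_name
    query["token"] = token
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical host and path, for illustration only.
url = build_transcribe_url("wss://api.example.invalid/transcribe", "my-tenant", "abc123")
```

The resulting string can be handed to any WebSocket client library; a successful handshake is indicated by the 101 Switching Protocols response.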
Handshake Response
101 Switching Protocols
Indicates a successful WebSocket connection. Upon successful connection, send a message including the configuration to specify the input and expected output formats.
Sending Messages
Clients must send a stream configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
If the configuration is not valid, the server returns CONFIG_DENIED.
The configuration must be committed within 10 seconds of opening the WebSocket; otherwise the connection times out with CONFIG_TIMEOUT.
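The gating rule above (no other data until CONFIG_ACCEPTED) can be sketched as a small check applied to each incoming message. This assumes server messages are JSON text with a top-level "type" field, which the page implies but does not show; the function name config_outcome is hypothetical.

```python
import json


def config_outcome(raw_message: str) -> bool:
    """Return True once audio may be sent; raise if the configuration failed.

    Assumes each server message is a JSON object with a "type" field.
    """
    message = json.loads(raw_message)
    kind = message.get("type")
    if kind == "CONFIG_ACCEPTED":
        return True
    if kind in ("CONFIG_DENIED", "CONFIG_TIMEOUT"):
        raise RuntimeError(f"configuration failed: {kind}")
    # Any other message before acceptance is unexpected; keep waiting.
    return False
```

A client would call this on every message received after sending the configuration and start streaming audio only after it returns True.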
Basic Stream Configuration
Body
The locale of the primary spoken language. See https://docs.corti.ai/about/languages for more.
When true, returns interim results for reduced latency.
When true, converts spoken punctuation such as "period" or "slash" into "." or "/".
When true, automatically punctuates and capitalizes in the final transcript.
Advanced Stream Configuration: Commands
The /transcribe endpoint supports registration and detection of commands, which are common in front-end dictation workflows. Extend the configuration with the following parameters to register the commands that should be detected.
Body
Provide the commands that should be registered and detected
Here is an example configuration for transcription of dictated audio in English, with interim results, spoken punctuation and automatic punctuation enabled, and example commands defined.
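The sketch below shows what such a configuration message might look like when built in Python. Every field name in it (type, configuration, primaryLanguage, interimResults, spokenPunctuation, automaticPunctuation, commands, id, phrases), the "en" locale value, and the command itself are assumptions made for illustration; this page does not list the actual keys, so check them against the API reference before use.

```python
import json

# All keys below are hypothetical placeholders, not confirmed field names.
config_message = {
    "type": "config",
    "configuration": {
        "primaryLanguage": "en",        # locale of the primary spoken language
        "interimResults": True,         # return interim results for lower latency
        "spokenPunctuation": True,      # "period" -> ".", "slash" -> "/"
        "automaticPunctuation": True,   # punctuate and capitalize final transcripts
        "commands": [
            # Hypothetical command definition with an identifier and trigger phrases.
            {"id": "next_section", "phrases": ["next section", "go to next section"]},
        ],
    },
}

# Serialized form, ready to send as a text frame over the WebSocket.
payload = json.dumps(config_message)
```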
Sending audio
Raw audio data to be transcribed.
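One common way to stream raw audio is to split it into fixed-size chunks and send each chunk as a binary frame. The sketch below shows only the chunking step; the helper name and the 4096-byte default are assumptions, since this page does not state a required chunk size or audio format.

```python
from typing import Iterator


def iter_audio_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield consecutive slices of raw audio, each at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

Each yielded chunk would then be passed to the WebSocket client's binary send method, after the configuration has been accepted.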
Ending
To end the /transcribe session, send a message with type: end. This signals the server to send any remaining transcript segments and detected commands before ending the session.
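A minimal sketch of the shutdown sequence: send the end message, then keep reading until the server has delivered everything that remains. The function finish_session is hypothetical; send stands in for a WebSocket client's text-send method and incoming for an iterable of the remaining text messages. It assumes the end message is a JSON object with only a "type" field and that server messages are JSON text.

```python
import json
from typing import Callable, Iterable, List


def finish_session(send: Callable[[str], None], incoming: Iterable[str]) -> List[dict]:
    """Send the end message, then collect whatever the server still delivers."""
    send(json.dumps({"type": "end"}))
    # Remaining transcript segments and detected commands arrive before
    # the server ends the session, so read until the iterable is exhausted.
    return [json.loads(raw) for raw in incoming]
```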
Responses
Configuration
Returned when sending a valid configuration.
Transcripts
Commands
Error Responses
Returned when sending an invalid configuration.
Possible errors: CONFIG_DENIED, CONFIG_TIMEOUT
The reason the configuration is invalid.
The session ID.