Overview
The WebSocket Secure (WSS) /streams API enables real-time, bidirectional communication with the Corti system for interaction streaming. Clients can send and receive structured data, including transcripts and fact updates. Learn more about FactsR™ here.
This documentation provides a structured guide for integrating the Corti WSS API for real-time interaction streaming.
This /streams endpoint supports real-time ambient documentation interactions and clinical decision support workflows.
- If you are looking for a stateless endpoint geared towards front-end dictation workflows, use the /transcribe WSS endpoint.
- If you are looking for asynchronous ambient documentation interactions, refer to the /documents endpoint.
1. Establishing a Connection
Clients must initiate a WebSocket connection using the wss:// scheme and provide a valid interaction ID in the URL.
When creating an interaction, the 200 response provides a websocketUrl for that interaction, including the tenant-name as a URL parameter.
In addition to the tenant-name parameter, authentication for the WSS streams requires a token parameter that carries the Bearer access token.
Path Parameters
Unique interaction identifier
Query Parameters
eu or us
tenant-name: Specifies the tenant context
token: Bearer $token
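As a minimal sketch of the authentication step, the helper below appends the token parameter to a websocketUrl that already carries tenant-name. The host, path, and token value are placeholders, and sending the value as "Bearer <token>" with the space percent-encoded is an assumption based on the parameter description above; confirm the exact encoding against the API reference.

```python
from urllib.parse import parse_qsl, quote, urlencode, urlsplit, urlunsplit


def add_token(websocket_url: str, access_token: str) -> str:
    """Return websocket_url with a token query parameter appended."""
    parts = urlsplit(websocket_url)
    if parts.scheme != "wss":
        raise ValueError("expected a wss:// URL")
    # Keep existing parameters (such as tenant-name) and drop any stale token.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "token"
    ]
    # Assumption: the token value is the string "Bearer <access token>".
    params.append(("token", f"Bearer {access_token}"))
    # quote (rather than the default quote_plus) encodes the space as %20.
    return urlunsplit(parts._replace(query=urlencode(params, quote_via=quote)))


# Hypothetical websocketUrl; a real one comes from the create-interaction response.
url = add_token("wss://api.example.invalid/streams/abc123?tenant-name=acme", "TOKEN")
print(url)
```

Starting from the returned websocketUrl, rather than assembling the URL by hand, keeps the tenant-name parameter exactly as the server issued it.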
2. Handshake Responses
101 Switching Protocols
Indicates a successful WebSocket connection. Once connected, send a config message that specifies the input language and the expected output preferences.
The config message must be sent within 10 seconds of the web socket being opened to prevent CONFIG-TIMEOUT, which requires establishing a new wss connection.
3. Sending Messages
Configuration
Declare your /streams configuration using a message with "type": "config", followed by "configuration": {<config details, per options below>}.
The type is required, along with the configuration parameters transcription: primaryLanguage and mode: type and outputLocale. The other parameters are optional, depending on your needs and workflow.
Configuration notes:
- Clients must send a streams configuration message and wait for a response of type CONFIG_ACCEPTED before transmitting other data.
- If the configuration is not valid, the server returns CONFIG_DENIED.
- The configuration must be committed within 10 seconds of opening the WebSocket; otherwise it times out with CONFIG_NOT_PROVIDED.
Example
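A minimal sketch of such a configuration message, built from the required fields named above. The nesting of primaryLanguage under transcription and of type and outputLocale under mode follows the wording of this section; the helper name and the values ("en", "facts") are illustrative assumptions, and the placement of optional flags such as isDiarization under transcription is also assumed.

```python
import json


def build_config_message(primary_language, mode_type, output_locale, **transcription_options):
    """Serialize a config message containing the required fields."""
    # Optional transcription settings are passed through unchanged (assumed nesting).
    transcription = {"primaryLanguage": primary_language, **transcription_options}
    message = {
        "type": "config",
        "configuration": {
            "transcription": transcription,
            "mode": {"type": mode_type, "outputLocale": output_locale},
        },
    }
    return json.dumps(message)


# Illustrative values only; check which languages and modes your tenant supports.
payload = build_config_message("en", "facts", "en", isDiarization=False)
print(payload)
```

The resulting string would be sent as a text frame within 10 seconds of the socket opening, before any audio.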
Sending Audio
Ensure that your configuration was accepted before sending audio, and that the initial audio chunk is not too small, as it needs to contain the headers required to decode the audio.
We recommend sending audio in chunks of 250-500 ms. In terms of buffering, the limit is 64000 bytes per chunk.
Audio data should be sent as raw binary without JSON wrapping.
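The size limit can be enforced on the client with a small guard. The sketch below splits a byte buffer into pieces no larger than 64000 bytes; the function name is hypothetical, and in a live recorder the chunks would normally be produced by time (250-500 ms each), with this cap acting only as a safety net.

```python
from typing import Iterator

MAX_CHUNK_BYTES = 64000  # per-chunk limit stated in this section


def iter_audio_chunks(audio: bytes, chunk_bytes: int = MAX_CHUNK_BYTES) -> Iterator[bytes]:
    """Yield consecutive slices of audio, each at most chunk_bytes long."""
    if not 0 < chunk_bytes <= MAX_CHUNK_BYTES:
        raise ValueError(f"chunk_bytes must be between 1 and {MAX_CHUNK_BYTES}")
    for start in range(0, len(audio), chunk_bytes):
        yield audio[start:start + chunk_bytes]


# 150000 zero bytes stand in for recorded audio.
chunks = list(iter_audio_chunks(bytes(150_000)))
print([len(chunk) for chunk in chunks])
```

Each chunk would be sent as a binary frame, not wrapped in JSON. Because the first chunk must carry the container headers, avoid choosing a very small chunk_bytes.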
Channels, participants, and speakers
In most workflows, especially in-person settings, mono-channel audio should be used. If the microphone is a stereo microphone, set isMultichannel: false so that the audio is converted to mono-channel, preventing duplicate transcripts from being returned.
In a telehealth workflow, or other virtual setting, the virtual audio may be on one channel (e.g., from webRTC) with audio from the microphone of the local client on a separate channel. In this scenario, define isMultichannel: true and assign each channel the relevant participant role (e.g., if the doctor is on the local client, then set that to channel 0 with participant defined as doctor, and the virtual audio for the patient on the other channel with participant defined as patient).
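The telehealth case can be sketched as a fragment of the transcription configuration. The key names participants, channel, and role are assumptions made for illustration (this section names only isMultichannel and the doctor and patient roles), so check them against the configuration reference before use.

```python
from typing import Dict


def multichannel_options(roles_by_channel: Dict[int, str]) -> dict:
    """Build multichannel transcription options from a channel-to-role mapping."""
    if len(roles_by_channel) < 2:
        raise ValueError("multichannel audio needs at least two channels")
    # Hypothetical field names: "participants", "channel", "role".
    participants = [
        {"channel": channel, "role": role}
        for channel, role in sorted(roles_by_channel.items())
    ]
    return {"isMultichannel": True, "participants": participants}


# Doctor on the local microphone (channel 0), patient on the virtual audio channel.
options = multichannel_options({0: "doctor", 1: "patient"})
print(options)
```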
Diarization is independent of audio channels and participant roles, as it enables speaker separation for mono audio.
With configuration isDiarization: true, transcript segments are assigned to speakers automatically, with the first speaker identified being channel 0, the second channel 1, etc. If isDiarization: false, all transcript segments are assigned speakerId: -1.
Read more here.
Flush the Audio Buffer
To flush the audio buffer, forcing transcript segments to be returned over the web socket (e.g., when turning off or muting the microphone so the patient can share something private that should not be recorded), send a flush message. The server then responds with usage and flushed messages.
Delta usage represents incremental credit consumption between recording initiation and flush events. Delta usage is approximate and may differ slightly from the final usage sent after the end message is processed (see below). Final, end-of-session usage will be reflected in API billing.
The web socket will remain open after flush processing so recording can continue.
FactsR generation (i.e., when working in configuration.mode: facts) is not impacted by the flush event and will continue to process as normal.
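The flush exchange can be sketched as a loop over incoming text frames. The type strings "flush", "transcript", "usage", and "flushed" are assumptions inferred from the response names in this document, and the incoming list below simulates server frames so the example runs without a connection.

```python
import json

# Assumed payload shape for the client-side flush message.
FLUSH_MESSAGE = json.dumps({"type": "flush"})


def collect_flush_result(incoming):
    """Read frames until a flushed message; return (transcripts, delta_usage)."""
    transcripts, delta_usage = [], None
    for raw in incoming:
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "transcript":
            transcripts.append(message)
        elif kind == "usage":
            delta_usage = message  # approximate, per the note on delta usage
        elif kind == "flushed":
            return transcripts, delta_usage
        # Other message types (for example facts) are ignored in this sketch.
    raise ConnectionError("stream ended before a flushed message arrived")


# Simulated server frames, standing in for messages read from the socket.
incoming = [
    json.dumps({"type": "transcript"}),
    json.dumps({"type": "usage"}),
    json.dumps({"type": "flushed"}),
]
segments, usage = collect_flush_result(incoming)
print(len(segments), usage is not None)
```

Since the socket stays open after a flush, the same connection can keep sending audio once this function returns.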
Ending the Session
To end the /streams session, send an end message. The server then returns any remaining results (depending on the mode configuration) and sends two messages, usage and ENDED. After ENDED, the server will close the web socket.
You can at any time open the WebSocket again by sending the configuration.
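The ordering rules in this section (configuration accepted before audio, then end, then ENDED) can be tracked with a small client-side state machine. This is a sketch under assumptions: the class, state names, and the {"type": "end"} payload are hypothetical, while CONFIG_ACCEPTED and ENDED are the message types named in this document.

```python
from enum import Enum, auto


class State(Enum):
    OPEN = auto()       # socket open, configuration not yet accepted
    STREAMING = auto()  # CONFIG_ACCEPTED received, audio may be sent
    ENDING = auto()     # end sent, waiting for usage and ENDED
    CLOSED = auto()     # ENDED received, server closes the socket


class StreamSession:
    def __init__(self):
        self.state = State.OPEN

    def on_server_message(self, message_type: str) -> None:
        """Advance the state for the two message types that change it."""
        if self.state is State.OPEN and message_type == "CONFIG_ACCEPTED":
            self.state = State.STREAMING
        elif self.state is State.ENDING and message_type == "ENDED":
            self.state = State.CLOSED

    def can_send_audio(self) -> bool:
        return self.state is State.STREAMING

    def end(self) -> dict:
        """Return the (assumed) end payload and stop further audio."""
        if self.state is not State.STREAMING:
            raise RuntimeError("end is only valid while streaming")
        self.state = State.ENDING
        return {"type": "end"}


session = StreamSession()
session.on_server_message("CONFIG_ACCEPTED")
end_payload = session.end()
session.on_server_message("usage")
session.on_server_message("ENDED")
print(session.state.name)
```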
4. Responses
Configuration
Returned when sending a valid configuration. Response body will include the full configuration object to confirm values applied in the configuration.
Transcripts
Transcript response
Transcript output will automatically apply server-side default values documented here. Configuration of output formatting is not supported on the streams endpoint at this time as it is with transcribe.
Facts
Fact response
By default, incoming audio and returned data streams are persisted on the server, associated with the interactionId. You may query the interaction to retrieve the stored recordings, transcripts, and facts via the relevant REST endpoints. Audio recordings are saved in .webm format; transcripts and facts as JSON objects.
Data persistence can be disabled by Corti upon request when needed to support compliance with your applicable regulations and data handling preferences.
Flushed
Returned by the server after processing a flush event from the client, to return transcript segments.
Usage
Returned by the server after processing a flush event from the client, to convey the amount of credits consumed since recording started. Delta usage is approximate and may differ slightly from the final usage sent after the end message is processed.
Returned by the server after processing an end event from the client, to convey the amount of credits consumed.
Ended
Returned by the server after processing an end event from the client, before closing the web socket.
5. Error Handling
In case of an invalid or missing interaction ID, the server will return an error before opening the WebSocket. In case of an invalid configuration, the server will return one of the following errors:
Returned when sending an invalid configuration.
Possible errors: CONFIG_DENIED, CONFIG_NOT_PROVIDED, CONFIG_ALREADY_RECEIVED, CONFIG_MISSING
The reason the configuration is invalid.
The interaction ID.
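A client might turn these responses into exceptions before sending any audio. In the sketch below, the error names are the ones listed above, while the "reason" key and the ConfigError class are assumptions made for illustration; the actual field names of the error payload should be taken from the API reference.

```python
import json

CONFIG_ERRORS = {
    "CONFIG_DENIED",
    "CONFIG_NOT_PROVIDED",
    "CONFIG_ALREADY_RECEIVED",
    "CONFIG_MISSING",
}


class ConfigError(Exception):
    """Raised when the server does not accept the configuration."""


def check_config_response(raw: str) -> dict:
    """Return the decoded message on CONFIG_ACCEPTED, otherwise raise ConfigError."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "CONFIG_ACCEPTED":
        return message
    if kind in CONFIG_ERRORS:
        # "reason" is a hypothetical key for the documented reason text.
        raise ConfigError(f"{kind}: {message.get('reason', 'no reason given')}")
    raise ConfigError(f"unexpected message type before acceptance: {kind!r}")


accepted = check_config_response(json.dumps({"type": "CONFIG_ACCEPTED"}))
print(accepted["type"])
```

After a ConfigError, the safest recovery is to open a new connection and send a corrected configuration.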