Real-time transcript and fact generation
WebSocket Secure (WSS) API documentation for the `/stream` endpoint, supporting real-time conversational transcript and fact generation.
Overview
The WebSocket Secure (WSS) API enables real-time, bidirectional communication with the Corti system for interaction streaming. Clients can send and receive structured data, including transcript and fact updates.
This documentation provides a structured guide for integrating the Corti WSS API for real-time interaction streaming.
The `/stream` endpoint supports real-time ambient documentation interactions.
- If you are looking for a stateless, atomic endpoint geared more towards front-end dictation workflows, use the `/transcribe` WSS endpoint.
- If you are looking for asynchronous ambient documentation interactions, refer to the `/documents` endpoint.
Environment Options
| Environment | Description |
| --- | --- |
| `us` | US-based instance |
| `eu` | EU-based instance |
| `beta-eu` | Beta EU instance (default) |
Establishing a Connection
Clients must initiate a WebSocket connection using the `wss://` scheme and provide a valid interaction ID in the URL. Use the `websocketUrl` provided for that interaction, which already includes the `tenant-name` as a URL parameter.
In addition to the `tenant-name` parameter, authentication for the WSS stream requires a `token` parameter used to pass the Bearer access token.
Request
Path Parameters
- Unique interaction identifier
Query Parameters
- `tenant-name`: Specifies the tenant context
- `token`: Bearer access token used for authentication
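As a rough sketch of how a client might open the stream: the URL below is a placeholder, and in practice you should use the `websocketUrl` provided for your interaction rather than constructing one yourself.

```typescript
// Minimal connection sketch. The websocketUrl below is a placeholder:
// use the one provided for your interaction (it already carries tenant-name),
// then append the token query parameter with your Bearer access token.
const websocketUrl = "wss://<host>/<interaction-stream-path>?tenant-name=<tenant>"; // placeholder
const accessToken = "<Bearer access token>";

const ws = new WebSocket(`${websocketUrl}&token=${encodeURIComponent(accessToken)}`);

ws.addEventListener("open", () => {
  // 101 Switching Protocols succeeded: send the stream configuration message
  // (see Sending Messages below) before transmitting anything else.
});
```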
Responses
101 Switching Protocols
Indicates a successful WebSocket connection. Once connected, the server streams data in the following formats.
By default, the data generated during the stream is stored, so transcripts, facts, and recording(s) can be retrieved via the relevant REST endpoints. Audio recordings are saved in .webm format.
This storage can be configured by Corti to be turned off to ensure you can comply with your applicable regulations and data handling preferences.
Data Streams
Transcript Stream
| Property | Type | Description |
| --- | --- | --- |
| `type` | string | Message type (`transcript`) |
| `data.id` | string | Unique identifier for the transcript |
| `data.transcript` | string | The transcribed text |
| `data.final` | boolean | Indicates whether the transcript is finalized or interim |
| `data.participant.channel` | string | The audio channel for the participant (e.g. `0` or `1`) |
| `data.participant.role` | string | Role of the participant (e.g., `doctor`) |
| `data.time.start` | number | Start time of the transcript segment |
| `data.time.end` | number | End time of the transcript segment |
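For reference while integrating, the fields above map onto a message shape roughly like the following TypeScript interface; the nesting and optionality are assumptions based only on this table, not a published schema.

```typescript
// Sketch of a transcript message derived from the table above.
// Exact nesting and optionality are assumptions, not a published schema.
interface TranscriptMessage {
  type: "transcript";
  data: {
    id: string;
    transcript: string;   // the transcribed text
    final: boolean;       // true once the segment is finalized, false for interim results
    participant: {
      channel: string;    // audio channel, e.g. "0" or "1"
      role: string;       // e.g. "doctor"
    };
    time: {
      start: number;      // start time of the segment
      end: number;        // end time of the segment
    };
  };
}
```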
Fact Stream
| Property | Type | Description |
| --- | --- | --- |
| `type` | string | Message type (`facts`) |
| `data.id` | string | Unique identifier for the fact |
| `data.text` | string | Text description of the fact |
| `data.group` | string | Categorization of the fact (e.g., `medical-history`) |
| `data.groupId` | string | Unique identifier for the group |
| `data.isDiscarded` | boolean | Indicates if the fact was discarded |
| `data.source` | string | Source of the fact (e.g., `core` when generated automatically) |
| `data.createdAt` | string | Timestamp when the fact was created |
| `data.updatedAt` | string | Timestamp when the fact was last updated |
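A matching sketch for fact messages, together with a minimal dispatcher for incoming messages. Whether `data` is a single object or an array of facts is an assumption to verify against real payloads.

```typescript
// Sketch of a fact message derived from the table above, plus a minimal
// dispatcher. The shape of `data` (single object vs. array) is an assumption.
interface FactMessage {
  type: "facts";
  data: {
    id: string;
    text: string;
    group: string;        // e.g. "medical-history"
    groupId: string;
    isDiscarded: boolean;
    source: string;       // e.g. "core" when generated automatically
    createdAt: string;
    updatedAt: string;
  };
}

function handleMessages(ws: WebSocket): void {
  ws.addEventListener("message", (event) => {
    if (typeof event.data !== "string") {
      return; // ignore anything that is not a JSON text frame
    }
    const message = JSON.parse(event.data);
    switch (message.type) {
      case "transcript":
        // handle interim or final transcript segments
        break;
      case "facts":
        // handle new or updated facts
        break;
      default:
        // control messages such as CONFIG_ACCEPTED or ENDED
        break;
    }
  });
}
```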
Sending Messages
Clients must send a stream configuration message and wait for a response of type `CONFIG_ACCEPTED` before transmitting other data.
Stream Configuration
| Property | Type | Required | Description |
| --- | --- | --- | --- |
| `type` | string | Yes | Message type (`config`) |
| `configuration` | object | Yes | Configuration settings |
| `transcription.primaryLanguage` | string | Yes | Primary spoken language for transcription |
| `transcription.isDiarization` | boolean | No (default: `false`) | Enable speaker diarization |
| `transcription.isMultichannel` | boolean | No (default: `false`) | Enable multi-channel audio processing |
| `transcription.participants` | array | Yes | List of participants with roles assigned to a channel |
| `mode.type` | string | Yes | Processing mode (`facts` or `transcription`) |
| `mode.outputLocale` | string | Yes | Output language locale (specific to `facts`) |
Example Configuration
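As an illustrative sketch assembled from the fields in the table above; the locale codes, the participant object shape, and the exact nesting are assumptions rather than the published schema.

```typescript
// Illustrative configuration message built from the documented fields.
// Locale codes, the participants shape, and exact nesting are assumptions.
const configMessage = {
  type: "config",
  configuration: {
    transcription: {
      primaryLanguage: "en",   // primary spoken language
      isDiarization: false,    // speaker diarization disabled
      isMultichannel: false,   // treat audio as mono-channel
      participants: [
        { channel: 0, role: "doctor" }, // assumed participant shape
      ],
    },
    mode: {
      type: "facts",           // or "transcription"
      outputLocale: "en",      // output language for facts
    },
  },
};

// ws is the open WebSocket from the connection sketch above.
function sendConfig(ws: WebSocket): void {
  ws.send(JSON.stringify(configMessage));
}
```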
Once the server responds with a message of type `CONFIG_ACCEPTED`, clients can proceed with sending audio or controlling the stream status.
Controlling Stream Status
To end the stream, send an end control message, as sketched below.
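A minimal sketch of such a message, assuming the payload consists of a single `type` field with the value `"end"`; the exact shape is an assumption consistent with the `ENDED` response described below.

```typescript
// Assumed end-of-stream control message; verify the exact payload shape
// for your integration.
function endStream(ws: WebSocket): void {
  ws.send(JSON.stringify({ type: "end" }));
}
```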
The connection remains open until all transcripts are complete. The server then sends a message of type `ENDED` and closes the connection.
Sending Audio Data
Ensure that your configuration was accepted before you start sending audio, and that your initial audio chunk is not too small, as it needs to contain the headers required to properly decode the audio. We recommend sending audio in chunks of 500 ms. In terms of buffering, the limit is 64000 bytes per chunk. Audio data should be sent as raw binary, without JSON wrapping.
While we recommend the webm/opus encoding for bandwidth and efficiency reasons, you can send a variety of common audio formats, as the audio you send first passes through a transcoder. Similarly, you do not need to specify a sample rate, bit depth, or other audio settings.
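As one possible way to produce suitable chunks in a browser, the sketch below uses MediaRecorder with webm/opus and a 500 ms timeslice; it is a sketch under those assumptions, not the only supported capture approach.

```typescript
// Browser sketch: capture microphone audio as webm/opus and forward ~500 ms
// chunks over the open WebSocket as raw binary (no JSON wrapping).
async function streamMicrophone(ws: WebSocket): Promise<MediaRecorder> {
  const media = await navigator.mediaDevices.getUserMedia({ audio: true });
  const recorder = new MediaRecorder(media, { mimeType: "audio/webm;codecs=opus" });

  recorder.ondataavailable = (event) => {
    // The first chunk contains the webm headers needed to decode the audio,
    // so chunks are forwarded exactly as emitted.
    if (event.data.size > 0) {
      ws.send(event.data);
    }
  };

  recorder.start(500); // emit a chunk roughly every 500 ms
  return recorder;     // call recorder.stop() when the session ends
}
```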
Channels, participants and speakers
In a typical on-site setting you will be sending mono-channel audio. If the microphone is a stereo microphone, set `isMultichannel: false` so that the audio is converted to mono-channel, ensuring no duplicate transcripts are returned.
In a virtual setting such as telehealth, you would typically have the virtual audio on one channel from WebRTC and mix in the local client's microphone on a separate channel. In this scenario, set `isMultichannel: true` and assign each channel the relevant participant role; for example, if the doctor is on the local client and channel 0, set the role for channel 0 to `doctor`.
Diarization is independent of audio channels and participant roles. If you want transcript segments to be assigned to automatically identified speakers, set `isDiarization: true`. If `false`, transcript segments will be returned with `speakerId: -1`. If set to `true`, diarization will try to identify speakers separately on each channel: the first identified speaker on each channel will have transcript segments with `speakerId: 0`, the second `speakerId: 1`, and so forth.
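A sketch of the transcription settings for such a two-channel telehealth setup, reusing the assumed participant shape from the configuration example above; the `patient` role name is likewise an assumption.

```typescript
// Assumed two-channel telehealth transcription settings: local clinician on
// channel 0, remote WebRTC audio on channel 1. Role names are illustrative.
const telehealthTranscription = {
  primaryLanguage: "en",
  isMultichannel: true,   // keep the two audio channels separate
  isDiarization: true,    // identify speakers within each channel
  participants: [
    { channel: 0, role: "doctor" },  // local microphone
    { channel: 1, role: "patient" }, // remote party mixed in from WebRTC
  ],
};
```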
Error Handling
In case of an invalid or missing interaction ID, the server will return an error before opening the WebSocket.
During a WebSocket session, the following configuration-related messages can be returned.
In addition, a reason will be supplied, e.g. `reason: language unavailable`.
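As a defensive sketch, a client might wait for `CONFIG_ACCEPTED` and treat any other configuration response carrying a `reason` as a rejection; the exact names of rejection message types are not assumed here.

```typescript
// Sketch: resolve once CONFIG_ACCEPTED arrives, reject if a configuration
// response carries a reason (e.g. "language unavailable").
function waitForConfigAccepted(ws: WebSocket): Promise<void> {
  return new Promise((resolve, reject) => {
    const onMessage = (event: MessageEvent) => {
      if (typeof event.data !== "string") return;
      const message = JSON.parse(event.data);
      if (message.type === "CONFIG_ACCEPTED") {
        ws.removeEventListener("message", onMessage);
        resolve();
      } else if (typeof message.reason === "string") {
        ws.removeEventListener("message", onMessage);
        reject(new Error(`Configuration rejected: ${message.reason}`));
      }
    };
    ws.addEventListener("message", onMessage);
  });
}
```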
Closing the Connection
To terminate the WebSocket session, send a standard WebSocket close frame, or use the end control message described under Controlling Stream Status.
The connection remains open until all transcripts are complete, at which point the server sends a message of type `ENDED` and then closes.