Interested in building a real-time intelligence or clinical decision support solution? Look no further than the Corti AI Stream API: Real-time, bidirectional WebSocket Secure (WSS) communication with the Corti AI platform for live transcript generation and clinical fact extraction (FactsR™).

Delivered as a modular API, /stream provides a stateful interaction framework for developers to embed clinical-grade intelligence directly into their healthcare applications, creating safer, leaner, and more trusted AI experiences at the point of care.
Depending on your use case, the stream endpoint can be used for real-time conversational transcripts, fact extraction, or both! See the full API specification here.

Using the API

1. Initiate a /stream by creating an Interaction, which returns a WebSocket URL along with the interactionId.

2. Connect to the WebSocket and set your configuration (see the sketch after these steps):

Parameter | Description
primaryLanguage | Spoken language to be transcribed
diarization | Enable speaker separation (most effective with multichannel audio)
multichannel | Enable multichannel audio (be sure to define channels)
participants | Assign speaker roles for audio channels and diarization
mode | Define facts or transcript depending on the desired real-time output
outputLocale | Output language for extracted facts (required with "mode": "facts")

3. Once the configuration is accepted, begin sending audio packets.

4. Receive transcripts roughly every 3 seconds and facts roughly every 60 seconds (the standard response intervals can be adapted to custom response times).

5. Send the end message to close the audio stream.
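To make the flow above concrete, here is a minimal Python sketch using the third-party websockets package. It assumes the WebSocket URL has already been returned from the Interaction creation call, and the envelope shapes of the config and end messages, as well as the participants values, are illustrative assumptions rather than the exact schema; consult the full API specification for the real message formats.

```python
# Minimal sketch of the /stream flow, assuming the WebSocket URL was
# obtained by creating an Interaction. Message envelopes ("type": "config",
# "type": "end") and participant values are placeholders, not the real schema.
import asyncio
import json

import websockets  # pip install websockets


async def stream_interaction(ws_url: str, audio_chunks) -> None:
    async with websockets.connect(ws_url) as ws:
        # 1. Send the configuration as the first message (field names taken
        #    from the parameter table above; envelope shape is assumed).
        await ws.send(json.dumps({
            "type": "config",
            "configuration": {
                "primaryLanguage": "en",
                "mode": "facts",          # or "transcript"
                "outputLocale": "en",     # required when mode is "facts"
                "diarization": True,
                "multichannel": False,
                "participants": [{"channel": 0, "role": "multiple"}],
            },
        }))

        # 2. Read server messages concurrently while audio is being sent
        #    (transcripts arrive roughly every 3 s, facts roughly every 60 s).
        async def receive() -> None:
            async for message in ws:
                event = json.loads(message)
                print("received:", event.get("type"), event)

        receiver = asyncio.create_task(receive())

        # 3. Send raw audio packets as binary frames.
        for chunk in audio_chunks:
            await ws.send(chunk)
            await asyncio.sleep(0.1)  # pace roughly like a live audio source

        # 4. Signal the end of the audio stream (message shape assumed),
        #    then keep reading until the server closes the connection.
        await ws.send(json.dumps({"type": "end"}))
        await receiver
```

In practice the audio_chunks iterable would be fed from a microphone or call-audio bridge, and the incoming transcript and fact events would be routed into the clinical application rather than printed.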

Features


Languages

Corti speech recognition is specifically designed for use in the healthcare domain. A tier system categorizes the functionality and performance available per language and endpoint. Languages in the Enhanced and Premier tiers offer the highest functionality and recognition accuracy and are the ones recommended for dictation use.

Vocabulary

Coming soon
Access to and control over the vocabulary used by the speech recognition models will give organizations fine-grained control over the dictation experience: gain visibility into the terminologies the models are trained on, and update the vocabulary as needed to optimize for localized or specialized needs, respond to reported issues, and stay ahead of changes to medical practice and communication.

Please contact us for more information or help.