This page explains key functionality provided by the Corti audio file processing endpoint, /transcripts. Read the API specification here.

Using the API
1. Review supported audio file requirements here. Corti ASR supports file transcoding; however, it is recommended to follow the outlined best practices for a consistent and reliable experience.
2. Create an Interaction: POST /interactions/. Note the interactionId included in the response; it will be used for aggregating the audio file and transcript assets.
3. Upload an audio file: POST /interactions/{id}/recordings/. Note the recordingId; it will be used for transcript creation.
4. Create the transcript: POST /interactions/{id}/transcripts/. Each interaction may have more than one audio file and transcript associated with it. Audio files up to 60 minutes in total duration, or 150 MB in total size, are supported.
5. Receive the transcript:
- First, the transcript will process synchronously for a maximum of 25 seconds.
- If the audio file transcription takes longer than the 25-second synchronous processing timeout, it will continue to process asynchronously. In this scenario, an empty transcript will be returned with a Location header that can be used to retrieve the final transcript via the transcriptId.
- The client can poll the Get Transcript Status endpoint (GET /interactions/{id}/transcripts/{transcriptId}/status) for the transcript status (processing, completed, failed).
Use the List Transcripts endpoint to view all transcripts associated with an interaction; completed transcripts can be retrieved via the Get Transcript endpoint.
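For orientation, here is a minimal sketch of the flow above in Python. The base URL, authentication scheme, and request/response fields beyond the documented interactionId, recordingId, and transcriptId are assumptions; consult the API specification for the exact schemas.

```python
# A minimal sketch of the file-transcription workflow, not a definitive
# client. Host, auth, and most field names below are assumptions.
import time

import requests

BASE_URL = "https://api.example-corti-host.com"  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}    # auth scheme is an assumption

# Step 2: create an Interaction and note its interactionId.
interaction = requests.post(
    f"{BASE_URL}/interactions/", headers=HEADERS, json={}
).json()
interaction_id = interaction["interactionId"]

# Step 3: upload the audio file and note the recordingId.
with open("encounter.wav", "rb") as audio:  # file name is illustrative
    recording = requests.post(
        f"{BASE_URL}/interactions/{interaction_id}/recordings/",
        headers=HEADERS,
        files={"file": audio},  # multipart upload is an assumption
    ).json()
recording_id = recording["recordingId"]

# Step 4: create the transcript for the uploaded recording.
response = requests.post(
    f"{BASE_URL}/interactions/{interaction_id}/transcripts/",
    headers=HEADERS,
    json={"recordingId": recording_id},  # body shape is an assumption
)

# Step 5: if the 25-second synchronous window is exceeded, the returned
# transcript is empty and the Location header references the final
# transcript; poll the status endpoint until it completes or fails.
transcript_id = response.json()["transcriptId"]  # assumed present either way
status_url = (
    f"{BASE_URL}/interactions/{interaction_id}"
    f"/transcripts/{transcript_id}/status"
)
while True:
    status = requests.get(status_url, headers=HEADERS).json()["status"]
    if status in ("completed", "failed"):
        break
    time.sleep(2)  # polling interval is a client choice
```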
Features
Languages
Corti speech recognition is specifically designed for use in the healthcare domain. A tier system categorizes the functionality and performance available per language and endpoint. Languages in the Enhanced and Premier tiers offer the greatest functionality and recognition accuracy; they are the ones recommended for dictation use.
Punctuation
Punctuation is essential for coherent documentation. Setting the isDictation parameter to true in /transcripts requests enables spoken punctuation functionality, as sketched below.
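As a hedged illustration, the flag rides on the transcript request body. Everything below except isDictation is assumed, following the same hypothetical host and auth as the workflow sketch above.

```python
import requests

BASE_URL = "https://api.example-corti-host.com"  # hypothetical host
HEADERS = {"Authorization": "Bearer <token>"}
interaction_id = "<interactionId>"  # from the Create Interaction step
recording_id = "<recordingId>"      # from the recording upload step

# Only isDictation is documented here; the other fields are illustrative.
requests.post(
    f"{BASE_URL}/interactions/{interaction_id}/transcripts/",
    headers=HEADERS,
    json={
        "recordingId": recording_id,
        "isDictation": True,  # spoken punctuation (e.g. "period") becomes symbols
    },
)
```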
Audio configuration
With support for mono and multi-channel audio, live transcoding, and a variety of file formats, don't let the complexities of audio capture and processing inhibit opportunities for real-time intelligence. Read more about our recommendations and best practices.
Diarization
Diarization is the process of segmenting an audio recording by speaker, assigning portions of speech to distinct identities (e.g., “Doctor,” “Patient”). This enables accurate transcription, attribution, and analysis of multi-speaker clinical conversations, but is not required for effective AI scribing or workflow speech-enablement.
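As a purely illustrative sketch (the shape below is invented, not the documented Corti response schema), a diarized transcript can be treated as a list of speaker-attributed segments:

```python
# Hypothetical diarized-transcript shape; field names are illustrative.
segments = [
    {"speaker": "Doctor", "start": 0.0, "end": 4.2, "text": "What brings you in today?"},
    {"speaker": "Patient", "start": 4.5, "end": 9.1, "text": "I've had a cough for two weeks."},
]
for seg in segments:
    print(f'[{seg["speaker"]} {seg["start"]:.1f}s-{seg["end"]:.1f}s] {seg["text"]}')
```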
Formatting
beta
Speech recognition can be used to create a verbatim transcript of the audio; however, some content is not documented in the same manner as it is verbalized. The formatting features provide control over how key information should be represented in the textual output. This feature is currently supported on the /transcribe endpoint, but coming soon to /transcripts.
Vocabulary
coming soon
Unparalleled access to and control over the vocabulary used by the speech recognition models will give organizations the utmost control over the dictation experience: gain visibility into the terminologies the models are trained on, and update the vocabulary as needed to optimize for localized or specialized needs, respond to reported issues, and stay ahead of the wave of changes to medical practices and communication.
Please contact us for more information or help.