Introduction

Corti speech recognition is designed specifically for the healthcare domain. The API endpoints detailed below provide access to different speech recognition functionality; select the endpoint that best matches your needs and use case.

Please review the languages page to learn more about the languages supported per endpoint, the functionality available per language tier, and the language codes to use in API requests.

Corti speech recognition endpoints

| Endpoint    | Description                                                                                                                    |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------ |
| Transcribe  | Stateless, real-time speech-to-text and commands used to power dictation workflows                                              |
| Stream      | Real-time transcript generation and fact extraction to power intelligent ambient documentation and decision support workflows   |
| Transcripts | Speech-to-text via batch audio file processing, supporting dictation or conversational transcripts                              |
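
For example, a batch request to the Transcripts endpoint might look like the sketch below. The base URL, path, and request fields shown are illustrative assumptions, not the documented contract; consult the API reference for the actual request format.

```python
# Hypothetical sketch only: the base URL, path, and field names are
# assumptions for illustration, not the documented API contract.
import requests

BASE_URL = "https://api.example.invalid"  # assumption: your environment's base URL
headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}

with open("encounter.wav", "rb") as audio:
    response = requests.post(
        f"{BASE_URL}/transcripts",        # assumed path for the Transcripts endpoint
        headers=headers,
        files={"file": audio},
        data={"languageCode": "en"},      # see the languages page for valid codes
    )

response.raise_for_status()
# Transcripts is asynchronous, so the response would reference a job to
# poll rather than the finished transcript itself.
print(response.json())
```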

Endpoint functionality

| Feature                   | Transcribe  | Stream                    | Transcripts            |
| ------------------------- | ----------- | ------------------------- | ---------------------- |
| Connection                | WSS         | WSS                       | REST                   |
| Data processing           | Synchronous | Synchronous               | Asynchronous           |
| Architecture              | Stateless   | Stateful                  | Stateful               |
| Speech-to-text            | Verbatim    | Conversational transcript | Verbatim or transcript |
| Diarization               | No          | Optional                  | Optional               |
| Multichannel              | No          | Optional                  | Optional               |
| Custom command definition | Yes         | No                        | No                     |
| Automatic punctuation     | Optional    | Yes                       | Optional               |
| Spoken punctuation        | Optional    | No                        | Optional               |
| Smart formatting          | Coming soon | Coming soon               | Coming soon            |
| Custom dictionary         | Coming soon | Coming soon               | Coming soon            |
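
As the table shows, Transcribe and Stream use WebSocket (WSS) connections. A minimal sketch of opening a Transcribe session follows; the URL, configuration message, and response shape are assumptions for illustration only.

```python
# Hypothetical sketch only: the URL, config message, and response shape
# are assumptions for illustration, not the documented API contract.
import asyncio
import json
import websockets  # pip install websockets (>= 13 for additional_headers)

async def dictate(chunks):
    url = "wss://api.example.invalid/transcribe"  # assumed URL
    headers = {"Authorization": "Bearer YOUR_ACCESS_TOKEN"}
    async with websockets.connect(url, additional_headers=headers) as ws:
        await ws.send(json.dumps({"languageCode": "en"}))  # assumed config message
        for chunk in chunks:
            await ws.send(chunk)  # raw audio bytes
        print(json.loads(await ws.recv()))  # first transcript message

# A production client would send audio and receive results concurrently.
asyncio.run(dictate([b"\x00" * 3200]))  # placeholder audio frame
```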

Please contact us if you are interested in features that are not listed here, need help determining the best speech recognition endpoint for your needs, or have questions about how to configure your API requests.

Model training and refinement

Architecture

The Corti speech recognition pipeline is based on a combination of model architectures. The main workhorse is an encoder-decoder architecture with byte pair encoding (BPE), similar to Whisper. There is also a connectionist temporal classification (CTC)-based architecture that works in tandem with the encoder model.

The combination of these two modeling paradigms provides the following benefits (one common way to combine them is sketched after this list):

  • Reduced potential for hallucinations
  • Faster predictions
  • Configurable latency to balance response time and accuracy
  • Simpler integration with existing systems
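
One widely used way to pair the two models, offered here as an illustration rather than a description of Corti's exact decoder, is joint CTC/attention scoring during beam search: each hypothesis is ranked by an interpolation of the two log-probabilities, so the alignment-constrained CTC score can veto hypotheses the attention decoder would otherwise hallucinate.

```python
# Illustrative sketch of joint CTC/attention scoring; the interpolation
# weight and probabilities are assumptions chosen for the example.
import math

def joint_score(log_p_ctc, log_p_att, ctc_weight=0.3):
    """Interpolated log-probability of a beam-search hypothesis."""
    return ctc_weight * log_p_ctc + (1.0 - ctc_weight) * log_p_att

# The CTC term penalizes hypotheses that align poorly with the audio:
hyp_a = joint_score(math.log(0.20), math.log(0.60))
hyp_b = joint_score(math.log(0.01), math.log(0.70))
print(hyp_a > hyp_b)  # True: hyp_a wins despite a lower attention score
```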

Fine-tune and evaluate

Model and architecture fine-tuning happens in two ways:

  1. Decoder training: Language modeling/terminology learning
  2. Encoder training: Auditory model learning
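
As a rough illustration of the first mode, the PyTorch sketch below freezes a toy encoder and updates only the decoder, so gradient steps adjust language and terminology behavior while the auditory model stays fixed. The model here is a stand-in, not Corti's architecture.

```python
# Toy sketch of decoder-only fine-tuning; the model is an illustrative
# stand-in, not the production architecture.
import torch
import torch.nn as nn

class TinySpeechModel(nn.Module):
    def __init__(self, feat_dim=80, hidden=256, vocab=1000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, vocab)  # stand-in for a real decoder

    def forward(self, feats):
        enc_out, _ = self.encoder(feats)
        return self.decoder(enc_out)

model = TinySpeechModel()

# Freeze the encoder: only decoder weights receive gradients, which is
# the language-modeling/terminology side of fine-tuning.
for p in model.encoder.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

feats = torch.randn(2, 50, 80)              # dummy batch of audio features
targets = torch.randint(0, 1000, (2, 50))   # dummy token targets
loss = nn.functional.cross_entropy(
    model(feats).reshape(-1, 1000), targets.reshape(-1)
)
loss.backward()
optimizer.step()
```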

As a starting point, open-source models supply the baseline parameters, which are then refined through a variety of training methodologies. The data sets used for fine-tuning and training are kept separate from those used for validation. Assessment methodologies include, but are not limited to, the following:

  • Word error rate (WER)
  • Character error rate (CER)
  • Medical term accuracy rate
  • Levenshtein distance
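
As a minimal sketch of two of these metrics: the word-level Levenshtein distance below also yields WER, and running the same computation over characters yields CER.

```python
# Minimal sketch: Levenshtein distance over words, and WER derived from it.
def levenshtein(ref, hyp):
    """Edit distance between two token sequences (dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

# One substitution out of five reference words -> WER of 0.2:
print(wer("the patient denies chest pain", "the patient denies chess pain"))
```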