Diarization is the process of segmenting an audio recording by speaker, assigning portions of speech to distinct participants. This results in a transcript where spoken content is grouped and labeled by who said what, improving clarity in multi-speaker conversations.

Feature availability:


Speaker-separated transcripts improve readability for humans. AI scribe and automation workflows do not rely on diarization labels, so speaker-assignment errors are not propagated into their output.

Purpose and Intended Use

Diarization is a user-facing readability feature. Its primary purpose is to make transcripts easier for humans to read, review, and understand by visually separating speakers in a conversation. This is especially useful in clinical settings where multiple participants are present and conversational turns matter for interpretation.
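
As an illustration of the readability benefit, the sketch below groups consecutive segments from the same speaker into labeled turns. The `Segment` shape, the `render_turns` helper, and the "Speaker 1" / "Speaker 2" labels are hypothetical and chosen for this example; they are not taken from Corti's API or response format.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    speaker: str  # hypothetical diarization label, e.g. "Speaker 1"
    text: str


def render_turns(segments: list[Segment]) -> str:
    """Merge consecutive segments with the same speaker label into one turn."""
    lines = []
    # groupby only merges adjacent items, which preserves conversational order.
    for speaker, group in groupby(segments, key=lambda s: s.speaker):
        lines.append(f"{speaker}: " + " ".join(s.text for s in group))
    return "\n".join(lines)


segments = [
    Segment("Speaker 1", "How are you feeling today?"),
    Segment("Speaker 2", "A bit better."),
    Segment("Speaker 2", "The cough is mostly gone."),
]
print(render_turns(segments))
```

The rendered text is intended for human review only; the labels are a presentation aid, not a verified record of who spoke.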

Important Limitations

Diarization is not a dependency for AI scribe or document generation. Corti’s downstream language models do not rely on diarization labels or speaker assignments to generate clinical notes, summaries, or structured documentation. As a result:
  • Imperfect speaker attribution does not reduce the quality of AI-generated documents.
  • Diarization should not be treated as a source of clinical truth or used for automated decision-making.

Bottom Line

  • Diarization improves how transcripts look and read, but it does not influence how Corti’s AI understands or processes the conversation.
  • Its value lies in human review, auditability, and transcript usability—not model reasoning.

Mono vs. Multichannel Audio

Mono audio contains all speakers mixed into a single channel. In this case, diarization attempts to separate speakers based on voice characteristics alone.
Risks: Overlapping speech, turn-taking ambiguity, and similar voices cannot always be reliably separated, which can impact speaker attribution.
Multichannel audio provides one channel per speaker (for example, a telehealth workflow in which the clinician and the patient use separate microphones). When multiple channels are available, transcripts can be attributed directly based on channel configuration, resulting in consistent and predictable speaker labeling.
Diarization is not required in a multichannel audio workflow as speakers are naturally separated by channel.
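
The channel-based attribution described above can be sketched as a simple lookup from channel index to speaker label. This is a minimal sketch under stated assumptions: the `ChannelSegment` shape, the `CHANNEL_LABELS` mapping, and the `label_by_channel` helper are hypothetical names for illustration, and the actual channel configuration depends on how the recording is set up.

```python
from dataclasses import dataclass

# Assumed mapping for a two-microphone telehealth setup (hypothetical).
CHANNEL_LABELS = {0: "Clinician", 1: "Patient"}


@dataclass
class ChannelSegment:
    channel: int   # index of the audio channel the speech came from
    start: float   # segment start time in seconds
    text: str


def label_by_channel(segments, labels=CHANNEL_LABELS):
    """Attribute each segment to a speaker using only its channel index."""
    ordered = sorted(segments, key=lambda s: s.start)
    # Unknown channels fall back to a generic label instead of failing.
    return [(labels.get(s.channel, f"Channel {s.channel}"), s.text) for s in ordered]


segments = [
    ChannelSegment(1, 2.4, "It started on Monday."),
    ChannelSegment(0, 0.0, "When did the pain start?"),
]
for speaker, text in label_by_channel(segments):
    print(f"{speaker}: {text}")
```

Because the label comes from the channel rather than from voice characteristics, overlapping speech or similar voices do not change the attribution in this sketch.
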
For details on how to configure and upload mono or multichannel recordings, see here.
Please contact us if you have further questions on diarized transcripts.