Introduction

The Transcription Workflow processes a complete audio file and returns a text document. When real-time speech-to-text is not required or not feasible, it offers a functional, cost-effective way to create verbatim, conversational, or dictation-style transcripts.

Endpoints and capabilities

Endpoint | Capability | Use
--- | --- | ---
Interactions | The foundational unit that ties together all related data and operations, enabling a cohesive workflow. | Required
Recordings | Upload audio file(s) that can be used for transcript generation. | Required
Transcripts | Generate transcripts for audio files that are associated with the interaction. | Required

Workflow

Step 1: Create interaction

  1. The workflow begins with the client initiating an interaction by sending a POST request to the /interactions endpoint.
  2. The API responds with a unique id for the interaction and a WebSocket URL (wssUrl). The id is used to manage the subsequent steps of the workflow; the wssUrl is not needed for this workflow.
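
The request in step 1 can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the base URL, the Bearer-token `Authorization` header, and the empty JSON request body are assumptions, so check the API reference for the actual host, authentication scheme, and request schema. The `send` parameter is a hypothetical hook for swapping in a fake transport during testing; only the `id` and `wssUrl` fields come from the description above.

```python
import json
import urllib.request
from typing import Callable, Optional, Tuple

# A transport sends a prepared request and returns (HTTP status, raw body).
Transport = Callable[[urllib.request.Request], Tuple[int, bytes]]


def urlopen_transport(req: urllib.request.Request) -> Tuple[int, bytes]:
    """Default transport: perform the request with the standard library."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status, resp.read()


def create_interaction(
    base_url: str,
    token: str,
    payload: Optional[dict] = None,  # assumed: request body schema not shown here
    send: Transport = urlopen_transport,
) -> dict:
    """POST /interactions and return the new interaction's id and wssUrl."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/interactions",
        data=json.dumps(payload or {}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
    status, raw = send(req)
    if not 200 <= status < 300:
        raise RuntimeError(f"create interaction failed: HTTP {status}")
    data = json.loads(raw)
    # Keep the id for later steps; wssUrl is not used in this workflow.
    return {"id": data["id"], "wssUrl": data.get("wssUrl")}
```

For example, `create_interaction("https://api.example.com", token)["id"]` would yield the id used in the next two steps; the host shown is a placeholder.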

Step 2: Upload audio recording

  1. Once the interaction is initialized, the client uploads an audio file associated with that interaction by sending a POST request to /interactions/:id/recording.
  2. The API responds with a 200 status and returns a recordingId, confirming that the audio file has been successfully uploaded and linked to the interaction.
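
Step 2 can be sketched in the same style. Sending the raw audio bytes as the request body with an `audio/wav` content type is an assumption; the API may expect a multipart upload or a different content type, so confirm the upload format in the API reference. The Bearer-token header is likewise assumed. The function checks for the 200 status described above and returns the `recordingId`; `send` is a hypothetical test hook.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# A transport sends a prepared request and returns (HTTP status, raw body).
Transport = Callable[[urllib.request.Request], Tuple[int, bytes]]


def urlopen_transport(req: urllib.request.Request) -> Tuple[int, bytes]:
    """Default transport: perform the request with the standard library."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return resp.status, resp.read()


def upload_recording(
    base_url: str,
    token: str,
    interaction_id: str,
    audio: bytes,
    content_type: str = "audio/wav",  # assumed; match your file's format
    send: Transport = urlopen_transport,
) -> str:
    """POST audio to /interactions/:id/recording and return the recordingId."""
    # Substitute the interaction id into the :id path segment.
    path_id = urllib.parse.quote(interaction_id, safe="")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/interactions/{path_id}/recording",
        data=audio,  # assumed: raw audio bytes as the request body
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": content_type,
        },
    )
    status, raw = send(req)
    # A successful upload is described as returning 200.
    if status != 200:
        raise RuntimeError(f"recording upload failed: HTTP {status}")
    return json.loads(raw)["recordingId"]
```

To upload a file from disk, read it in binary mode first, for example `with open("visit.wav", "rb") as f: recording_id = upload_recording(base_url, token, interaction_id, f.read())`; the filename and variables are placeholders.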

Step 3: Create transcript

  1. After the recording is uploaded, the client initiates the transcription process by sending a POST request to /interactions/:id/transcripts.
  2. The API processes the audio and returns a 200 status with the generated transcript. This transcript contains the text version of the recorded interaction, extracted and formatted for review.
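
Step 3 can be sketched the same way. Because the transcript arrives in the response to this POST, the default transport uses a long timeout; the 600-second value is an assumed figure, not a documented limit. The request body is whatever transcription configuration the caller passes in: its fields (including whether it must reference the uploaded recording) are defined in the configuration documentation and are not assumed here. The response is returned as parsed JSON because its schema is not described in this section. The Bearer-token header and the `send` hook are assumptions, as before.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# A transport sends a prepared request and returns (HTTP status, raw body).
Transport = Callable[[urllib.request.Request], Tuple[int, bytes]]


def urlopen_transport(req: urllib.request.Request) -> Tuple[int, bytes]:
    """Default transport with a generous, assumed timeout.

    The transcript comes back in the response, so the request stays open
    while the audio is processed.
    """
    with urllib.request.urlopen(req, timeout=600) as resp:
        return resp.status, resp.read()


def create_transcript(
    base_url: str,
    token: str,
    interaction_id: str,
    config: dict,
    send: Transport = urlopen_transport,
) -> dict:
    """POST /interactions/:id/transcripts and return the parsed response."""
    path_id = urllib.parse.quote(interaction_id, safe="")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/interactions/{path_id}/transcripts",
        # Configuration fields are defined by the API; none are assumed here.
        data=json.dumps(config).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
    )
    status, raw = send(req)
    if status != 200:
        raise RuntimeError(f"transcript request failed: HTTP {status}")
    # The response schema is not described here, so return it as-is.
    return json.loads(raw)
```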
See the transcription configuration documentation for details on the available options.
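
The three steps can be chained in one function. This compact, self-contained sketch assumes Bearer-token authentication, a raw audio upload body, a caller-supplied transcript configuration, and a single 600-second timeout for every call; `run_transcription_workflow` and its `send` hook are hypothetical names. What it illustrates is the ordering: the `id` from step 1 scopes both later calls, and the `recordingId` from step 2 is kept alongside the transcript.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# A transport sends a prepared request and returns (HTTP status, raw body).
Transport = Callable[[urllib.request.Request], Tuple[int, bytes]]


def urlopen_transport(req: urllib.request.Request) -> Tuple[int, bytes]:
    """Default transport; the long timeout (assumed) covers transcription time."""
    with urllib.request.urlopen(req, timeout=600) as resp:
        return resp.status, resp.read()


def run_transcription_workflow(
    base_url: str,
    token: str,
    audio: bytes,
    transcript_config: dict,
    audio_type: str = "audio/wav",  # assumed upload content type
    send: Transport = urlopen_transport,
) -> dict:
    """Create an interaction, upload one recording, and request its transcript."""
    root = base_url.rstrip("/")
    auth = {"Authorization": f"Bearer {token}"}  # assumed auth scheme

    def post(path: str, body: bytes, content_type: str) -> dict:
        req = urllib.request.Request(
            root + path,
            data=body,
            method="POST",
            headers={**auth, "Content-Type": content_type},
        )
        status, raw = send(req)
        if not 200 <= status < 300:
            raise RuntimeError(f"POST {path} failed: HTTP {status}")
        return json.loads(raw)

    # Step 1: the interaction id scopes every later call; wssUrl is ignored.
    interaction_id = post("/interactions", b"{}", "application/json")["id"]
    scoped = "/interactions/" + urllib.parse.quote(interaction_id, safe="")

    # Step 2: upload the complete audio file and keep the recordingId.
    recording_id = post(scoped + "/recording", audio, audio_type)["recordingId"]

    # Step 3: request the transcript, which is returned in the response body.
    transcript = post(
        scoped + "/transcripts",
        json.dumps(transcript_config).encode("utf-8"),
        "application/json",
    )
    return {
        "interactionId": interaction_id,
        "recordingId": recording_id,
        "transcript": transcript,
    }
```

Injecting `send` keeps the sequencing logic testable without network access; in production the default transport could be replaced with any HTTP client that returns a status code and body.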