Please ensure your audio files conform to the specifications listed below to prevent error responses. Let us know if you need help with audio formatting, request configuration, or errors.
Audio File Requirements
The following file types are supported by Corti speech recognition, for both streaming and asynchronous workflows:
WAV
Uncompressed PCM preserves raw audio data at 16‑bit/16 kHz, offering maximum fidelity and predictable preprocessing
FLAC
Provides lossless compression, roughly halving storage needs compared to WAV without sacrificing audio detail, which preserves the speech clarity needed for reliable recognition
MP3
Uses lossy compression to reduce file size while keeping speech intelligible; efficient for streaming, with some trade‑off in audio quality
M4A
Uses lossy compression to reduce file size while keeping speech intelligible; efficient for streaming, with some trade‑off in audio quality
WebM
Often used for web streaming, supporting efficient, low-latency delivery of speech audio within web-friendly containers
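As a rough illustration, a client could check a file's extension against the formats listed above before sending it. This is a minimal sketch: `SUPPORTED_EXTENSIONS` and `is_supported_audio` are hypothetical names, not part of any Corti SDK, and the mapping from each format to a file extension is an assumption. An extension check says nothing about the file's actual contents.

```python
from pathlib import Path

# Assumed mapping of the formats listed above to their common file extensions.
SUPPORTED_EXTENSIONS = {".wav", ".flac", ".mp3", ".m4a", ".webm"}


def is_supported_audio(filename: str) -> bool:
    """Return True if the file extension matches one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the helper.
    for name in ("visit.WAV", "dictation.flac", "capture.raw"):
        print(name, is_supported_audio(name))
```

The comparison is case-insensitive, so `visit.WAV` and `visit.wav` are treated the same way.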
Unsupported Audio
RAW audio is not supported at this time
Files without spoken word may produce a 400 error
Corti ASR supports file transcoding, so your audio files do not need to conform to a specific precision and sample rate. Use of 16‑bit depth and 16 kHz sample rate for audio files is recommended (but not required) to support consistent, accurate speech recognition.
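To see whether a WAV file already uses the recommended settings, its header can be inspected locally. The sketch below uses Python's standard-library `wave` module, which reads PCM WAV files only; `describe_wav` and `matches_recommendation` are hypothetical helper names introduced for illustration. Since the recommendation is optional, a mismatch here is informational rather than an error.

```python
import wave

RECOMMENDED_BIT_DEPTH = 16           # 16-bit depth
RECOMMENDED_SAMPLE_RATE_HZ = 16_000  # 16 kHz


def describe_wav(source):
    """Read bit depth, sample rate, and channel count from a PCM WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav_file:
        return {
            "bit_depth": wav_file.getsampwidth() * 8,  # bytes per sample -> bits
            "sample_rate_hz": wav_file.getframerate(),
            "channels": wav_file.getnchannels(),
        }


def matches_recommendation(info):
    """Return True when the file uses the recommended 16-bit / 16 kHz settings."""
    return (
        info["bit_depth"] == RECOMMENDED_BIT_DEPTH
        and info["sample_rate_hz"] == RECOMMENDED_SAMPLE_RATE_HZ
    )
```

For example, `matches_recommendation(describe_wav("recording.wav"))` returns `False` for a 44.1 kHz file; here `recording.wav` is a placeholder path.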
Recommended precision and sample rate
Quality | Rationale
---|---
16‑bit depth | Delivers sufficient dynamic range with low quantization noise, balancing quality with processing efficiency for speech recognition.
16 kHz sampling | Captures frequencies up to 8 kHz, which covers the range that matters most for speech intelligibility; higher rates offer negligible recognition benefit while increasing computational cost.
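The figures behind these recommendations can be worked out with some quick arithmetic. The sketch below assumes ideal linear PCM: the highest representable frequency is half the sample rate, and the theoretical dynamic range is 20·log10(2^bits). The function names are illustrative only, and the data-rate calculation assumes a single (mono) channel unless stated otherwise.

```python
import math

SAMPLE_RATE_HZ = 16_000
BIT_DEPTH = 16


def nyquist_frequency_hz(sample_rate_hz):
    """Highest frequency that a given sample rate can represent."""
    return sample_rate_hz / 2


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range of ideal linear PCM at this bit depth."""
    return 20 * math.log10(2 ** bit_depth)


def pcm_bytes_per_second(sample_rate_hz, bit_depth, channels=1):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate_hz * (bit_depth // 8) * channels


if __name__ == "__main__":
    print(nyquist_frequency_hz(SAMPLE_RATE_HZ))             # 8000.0
    print(round(dynamic_range_db(BIT_DEPTH), 2))            # 96.33
    print(pcm_bytes_per_second(SAMPLE_RATE_HZ, BIT_DEPTH))  # 32000
```

At these settings, one minute of mono audio is roughly 1.9 MB of uncompressed PCM. Doubling the sample rate doubles that size while only adding frequencies above 8 kHz.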
Please contact us if you need more information about supported audio formats or are having issues processing an audio file.
Additional references and resources: