Request Body
Audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
Model to use. Currently only whisper-1 is supported.
Language of the audio in ISO-639-1 format (e.g., en, zh, ja).
Optional text to guide the model’s style or continue a previous segment.
Output format: json, text, srt, verbose_json, vtt.
Sampling temperature (0 to 1).
Timestamp granularity: word and/or segment. Requires verbose_json.
Response
The transcribed text.
verbose_json:
Always transcribe.
Detected language.
Audio duration in seconds.
Transcription segments with timestamps.
Word-level timestamps (if requested).
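The constraints listed above can be checked on the client before a request is sent. The following is a minimal sketch, not the service's own client. This section gives the parameter descriptions without field names, so every field name used here (`model`, `language`, `prompt`, `response_format`, `temperature`, `timestamp_granularities`) and every response key (`text`, `language`, `duration`, `segments`, `words`) is an assumption, and `build_fields` and `summarize_verbose` are hypothetical helpers.

```python
import os

# Allowed values copied from the parameter descriptions in this section.
SUPPORTED_AUDIO = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
OUTPUT_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}
GRANULARITIES = {"word", "segment"}


def build_fields(filename, model="whisper-1", language=None, prompt=None,
                 response_format="json", temperature=None, granularities=()):
    """Validate arguments and return request form fields.

    The field names in the returned dict are assumed, not documented here.
    """
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in SUPPORTED_AUDIO:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if model != "whisper-1":
        raise ValueError("only whisper-1 is supported")
    if language is not None and not (len(language) == 2 and language.isalpha()):
        raise ValueError("language must be a two-letter ISO-639-1 code")
    if response_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown output format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")
    unknown = set(granularities) - GRANULARITIES
    if unknown:
        raise ValueError(f"unknown timestamp granularity: {sorted(unknown)}")
    if granularities and response_format != "verbose_json":
        raise ValueError("timestamp granularity requires verbose_json")

    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = temperature
    if granularities:
        fields["timestamp_granularities"] = list(granularities)
    return fields


def summarize_verbose(response):
    """Summarize a decoded verbose_json response (key names are assumed)."""
    return {
        "text": response["text"],
        "language": response.get("language"),
        "duration": response.get("duration"),
        "segment_count": len(response.get("segments", [])),
        "word_count": len(response.get("words", [])),
    }
```

For example, `build_fields("talk.mp3", response_format="verbose_json", granularities=["word"])` returns the fields to send alongside the audio file, while the same call without `response_format="verbose_json"` raises `ValueError`, because timestamp granularity requires verbose_json.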