Create Transcription
Use the CometAPI POST /v1/audio/transcriptions endpoint to transcribe audio into text in its original language. The endpoint supports the Whisper model and several output formats.
First request
Send a supported audio file with the model and file fields. Keep the first file short while you validate upload handling, authentication, and response parsing.
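A first request can be sketched with the Python standard library alone. This is a minimal sketch, not an official client. It assumes the host https://api.cometapi.com (only the /v1/audio/transcriptions path comes from this page), a multipart/form-data upload, the model ID whisper-1 (check the Models page for current IDs), and a default JSON response with a top-level text field. The request builder is separated from the network call so you can inspect the upload before sending it.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Assumption: CometAPI host; only the path is taken from this page.
API_URL = "https://api.cometapi.com/v1/audio/transcriptions"


def encode_multipart(fields, files):
    """Encode text fields and (filename, bytes) file parts as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    for name, (filename, data) in files.items():
        # Guess the part's MIME type from the file extension, e.g. .mp3 -> audio/mpeg.
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'.encode()
        )
        parts.append(data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, audio, filename, model="whisper-1"):
    """Build (but do not send) a POST request carrying the model and file fields.

    Assumption: "whisper-1" is a placeholder model ID; pick one from the Models page.
    """
    body, content_type = encode_multipart({"model": model}, {"file": (filename, audio)})
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


def transcribe(api_key, path, model="whisper-1"):
    """Upload a short audio file and return the text field of the default JSON response."""
    with open(path, "rb") as f:
        req = build_transcription_request(api_key, f.read(), os.path.basename(path), model)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

For example, `transcribe(os.environ["COMETAPI_KEY"], "clip.mp3")` sends one short file; the environment variable name is only an illustration.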
Read the response
The default JSON response returns the transcription in its text field. If you request another response format, make sure your client parses that format instead of assuming the default JSON shape.
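One way to avoid assuming the JSON shape is to branch on the response_format you requested. The sketch below treats json and verbose_json as JSON bodies and returns text, srt, and vtt bodies unchanged. It assumes, as in OpenAI-compatible APIs, that verbose_json also carries a top-level text field and that the other three formats come back as plain text; verify both against real responses.

```python
import json

JSON_FORMATS = {"json", "verbose_json"}
TEXT_FORMATS = {"text", "srt", "vtt"}


def parse_transcription(body: bytes, response_format: str = "json") -> str:
    """Return the transcript from a raw response body for the requested format."""
    if response_format in JSON_FORMATS:
        # Assumption: verbose_json also exposes a top-level "text" field.
        return json.loads(body)["text"]
    if response_format in TEXT_FORMATS:
        # Plain text or subtitle markup (SRT/VTT): hand back the body unchanged.
        return body.decode("utf-8")
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

Keeping subtitle formats as raw strings leaves cue parsing to a dedicated SRT or VTT library.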
Next steps
- Use Create Speech when you need text-to-speech output.
- Use Create Translation when the target output should be English.
Authorizations
Bearer token authentication. Send your CometAPI key in the Authorization header as Bearer <your key>.
Body
file
The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
model
The speech-to-text model to use. Choose a current speech model from the Models page.
language
The language of the input audio as an ISO-639-1 code (for example, en, zh, ja). Supplying the language improves accuracy and latency.
prompt
Optional text that guides the model's style or continues a previous audio segment. The prompt should be in the same language as the audio.
response_format
The output format for the transcription. One of: json, text, srt, verbose_json, vtt.
temperature
Sampling temperature between 0 and 1. Higher values produce more random output; lower values produce more focused output. When set to 0, the model uses log probability to adjust the temperature automatically.
Range: 0 <= x <= 1
Response
The transcription result.
text
The transcribed text.
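Because the optional body fields carry a range and an enum, it can help to check them client-side before uploading a file. This is a minimal sketch that assumes the form field names language, prompt, response_format, and temperature (the OpenAI-compatible names; confirm them against CometAPI). The model ID whisper-1 in the usage is a placeholder, and the helper name transcription_fields is hypothetical.

```python
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def transcription_fields(model, language=None, prompt=None,
                         response_format=None, temperature=None):
    """Return the non-file form fields, checking the documented constraints.

    Hypothetical helper; field names follow OpenAI-compatible naming (assumption).
    """
    fields = {"model": model}
    if language is not None:
        # ISO-639-1 codes are exactly two letters, e.g. "en", "zh", "ja".
        if len(language) != 2 or not (language.isascii() and language.isalpha()):
            raise ValueError(f"language must be an ISO-639-1 code, got {language!r}")
        fields["language"] = language.lower()
    if prompt is not None:
        fields["prompt"] = prompt
    if response_format is not None:
        if response_format not in RESPONSE_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        fields["response_format"] = response_format
    if temperature is not None:
        if not 0 <= temperature <= 1:
            raise ValueError(f"temperature must be in [0, 1], got {temperature}")
        # Multipart form values are sent as strings.
        fields["temperature"] = str(temperature)
    return fields
```

For example, `transcription_fields("whisper-1", language="en", response_format="srt", temperature=0)` returns the four text fields to send alongside the file part; omitted options are left out so the server applies its defaults.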