Audio Models
Create speech
Use CometAPI POST /v1/audio/speech to convert text into audio with a selected text-to-speech model and output format.
POST /v1/audio/speech
Use this endpoint to turn text into an audio file through the OpenAI-compatible audio API. It fits narration, short voice prompts, read-aloud features, and other workflows where your app already has text and needs speech output.
First request
Start with three fields: model, input, and voice. Keep the first request short so you can verify authentication, audio format, and file handling before you tune speed or output format.
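A minimal sketch of such a first request, using only the Python standard library so the HTTP details stay visible. The host in BASE_URL is an assumption (confirm the base URL for your account), and the helper names build_speech_request and create_speech are illustrative, not part of any SDK.

```python
import json
import urllib.request

# Assumption: the CometAPI host. Confirm the base URL for your account.
BASE_URL = "https://api.cometapi.com"


def build_speech_request(api_key: str, model: str, text: str, voice: str) -> urllib.request.Request:
    """Build a POST /v1/audio/speech request carrying only model, input, and voice."""
    body = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Bearer token authentication with your CometAPI key.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_speech(api_key: str, model: str, text: str, voice: str) -> bytes:
    """Send the request and return the raw audio bytes from the response body."""
    request = build_speech_request(api_key, model, text, voice)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()
```

To try it, call create_speech with your key, a speech model name from the Models page, a short sentence, and a voice such as alloy. A non-2xx status raises urllib.error.HTTPError, which makes authentication problems visible before any file handling is involved.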
Read the response
The response is binary audio, not JSON. In SDK examples, write the response to a file such as output.mp3. In direct HTTP clients, save the response body and set the file extension to match the requested response_format.
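For a direct HTTP client, the file handling could look like the sketch below. It assumes each response_format value also works as the file extension; save_audio is a hypothetical helper name, and the audio bytes are whatever your client read from the response body.

```python
from pathlib import Path

# Formats this reference lists for response_format.
# Assumption: each format name is also a reasonable file extension.
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def save_audio(audio: bytes, stem: str, response_format: str, directory: str = ".") -> Path:
    """Write raw audio bytes to <directory>/<stem>.<response_format> and return the path."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    path = Path(directory) / f"{stem}.{response_format}"
    # Binary write: the body is audio data, so it must not be decoded as text.
    path.write_bytes(audio)
    return path
```

Pass the same response_format value you sent in the request, so the extension on disk always matches the audio that was actually returned.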
Next steps
- Use Create Transcription when you need to turn speech back into text.
- Use Create Translation when you need English text from non-English audio.
Authorizations
Bearer token authentication. Use your CometAPI key.
Body
application/json
model
The TTS model to use. Choose a current speech model from the Models page.
input
The text to generate audio for. Maximum length is 4096 characters.
Maximum string length: 4096
voice
The voice to use for speech synthesis.
Available options: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer
response_format
The audio output format.
Available options: mp3, opus, aac, flac, wav, pcm
speed
The speed of the generated audio. Select a value between 0.25 and 4.0.
Required range: 0.25 <= x <= 4
Response
200 - audio/mpeg
The audio file content.
The response is of type file.
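The body constraints listed above (input length, voice and format options, speed range) can be checked client-side before sending a request. The sketch below is one way to do that. It treats model, input, and voice as required, which this page implies but does not mark explicitly, and validate_speech_body is a hypothetical helper name.

```python
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0
VOICES = {"alloy", "ash", "ballad", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}


def validate_speech_body(body: dict) -> list[str]:
    """Return a list of problems; an empty list means the body passes these checks."""
    problems = []
    # Assumption: these three fields are required.
    for field in ("model", "input", "voice"):
        if not body.get(field):
            problems.append(f"missing field: {field}")
    text = body.get("input") or ""
    if len(text) > MAX_INPUT_CHARS:
        problems.append(f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}")
    voice = body.get("voice")
    if voice and voice not in VOICES:
        problems.append(f"unknown voice: {voice}")
    response_format = body.get("response_format")
    if response_format is not None and response_format not in FORMATS:
        problems.append(f"unknown response_format: {response_format}")
    speed = body.get("speed")
    if speed is not None and not (MIN_SPEED <= speed <= MAX_SPEED):
        problems.append(f"speed {speed} is outside {MIN_SPEED} to {MAX_SPEED}")
    return problems
```

Returning a list instead of raising on the first failure lets a caller report every problem in one pass. The server remains the authority on what it accepts, so treat these checks as an early warning only.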