# Create speech

> Use CometAPI POST /v1/audio/speech to convert text into audio with a selected text-to-speech model and output format.

Use this endpoint to turn text into an audio file through the OpenAI-compatible audio API. It fits narration, short voice prompts, read-aloud features, and other workflows where your app already has text and needs speech output.

## First request

Start with three fields: `model`, `input`, and `voice`. Keep the first request short so you can verify authentication, audio format, and file handling before you tune speed or output format.
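As a rough sketch of such a first request, the snippet below builds the three-field payload using only the Python standard library, so the check does not depend on any SDK. The helper names `build_speech_request` and `save_speech` are hypothetical, and the 60-second timeout is an arbitrary assumption.

```python
import json
import urllib.request

API_URL = "https://api.cometapi.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying only the three required fields."""
    payload = {"model": "tts-1", "input": text, "voice": "alloy"}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str) -> int:
    """Send the request, write the binary body to `path`, return bytes written."""
    # Hypothetical helper: the timeout value is an assumption, not an API requirement.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()
    with open(path, "wb") as fh:
        return fh.write(audio)
```

Calling `save_speech(build_speech_request(api_key, "Hello from CometAPI."), "output.mp3")` with your CometAPI key performs a real API call. An authentication failure surfaces as `urllib.error.HTTPError` before any file is written.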

## Read the response

The response is binary audio, not JSON. In SDK examples, write the response to a file such as `output.mp3`. In direct HTTP clients, save the response body as raw bytes and set the file extension to match the requested `response_format`, which defaults to `mp3`.
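One way to keep the file extension in step with `response_format` is sketched below. The helper names `output_path` and `save_audio` are hypothetical, and using the format name directly as the extension is an assumption rather than something the API requires.

```python
from pathlib import Path

# Formats listed in the endpoint's `response_format` enum.
# Assumption: the format name doubles as the file extension. Raw `pcm` has
# no container header, so a player needs the sample format supplied separately.
SUPPORTED_FORMATS = ("mp3", "opus", "aac", "flac", "wav", "pcm")


def output_path(stem: str, response_format: str = "mp3") -> Path:
    """Return a file path whose extension matches the requested format."""
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return Path(f"{stem}.{response_format}")


def save_audio(body: bytes, stem: str, response_format: str = "mp3") -> Path:
    """Write the raw response body in binary mode and return the path used."""
    path = output_path(stem, response_format)
    path.write_bytes(body)  # binary write: never decode the audio as text
    return path
```

Passing the same `response_format` value to both the request payload and `save_audio` avoids a mismatch such as WAV data stored under an `.mp3` name.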

## Next steps

* Use [Create Transcription](/api/audio/create-transcription) when you need to turn speech back into text.
* Use [Create Translation](/api/audio/create-translation) when you need English text from non-English audio.


## OpenAPI

````yaml api/openapi/audio/post-create-speech.openapi.json POST /v1/audio/speech
openapi: 3.1.0
info:
  title: Create speech API
  version: 1.0.0
servers:
  - url: https://api.cometapi.com
security:
  - bearerAuth: []
paths:
  /v1/audio/speech:
    post:
      summary: Create speech
      operationId: create_speech
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - model
                - input
                - voice
              properties:
                model:
                  type: string
                  description: >-
                    The TTS model to use. Choose a current speech model from the
                    [Models page](/overview/models).
                  default: tts-1
                input:
                  type: string
                  description: >-
                    The text to generate audio for. Maximum length is 4096
                    characters.
                  maxLength: 4096
                voice:
                  type: string
                  description: The voice to use for speech synthesis.
                  enum:
                    - alloy
                    - ash
                    - ballad
                    - coral
                    - echo
                    - fable
                    - onyx
                    - nova
                    - sage
                    - shimmer
                  default: alloy
                response_format:
                  type: string
                  description: The audio output format.
                  enum:
                    - mp3
                    - opus
                    - aac
                    - flac
                    - wav
                    - pcm
                  default: mp3
                speed:
                  type: number
                  description: >-
                    The speed of the generated audio. Select a value between
                    0.25 and 4.0.
                  minimum: 0.25
                  maximum: 4
                  default: 1
            examples:
              Default:
                summary: Standard TTS (tts-1)
                value:
                  model: tts-1
                  input: The quick brown fox jumped over the lazy dog.
                  voice: alloy
              gpt_4o_mini_tts:
                summary: GPT-4o mini TTS (gpt-4o-mini-tts)
                value:
                  model: gpt-4o-mini-tts
                  input: The quick brown fox jumped over the lazy dog.
                  voice: alloy
      responses:
        '200':
          description: The audio file content.
          content:
            audio/mpeg:
              schema:
                type: string
                format: binary
      x-codeSamples:
        - lang: python
          label: Python (OpenAI SDK)
          source: |-
            import os
            from openai import OpenAI

            client = OpenAI(
                api_key=os.environ["COMETAPI_KEY"],
                base_url="https://api.cometapi.com/v1"
            )

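            # Stream the binary audio to disk instead of buffering it in memory.
            with client.audio.speech.with_streaming_response.create(
                model="tts-1",
                voice="alloy",
                input="The quick brown fox jumped over the lazy dog."
            ) as response:
                response.stream_to_file("output.mp3")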
        - lang: javascript
          label: JavaScript (OpenAI SDK)
          source: |-
            import OpenAI from "openai";
            import fs from "fs";

            const client = new OpenAI({
              apiKey: process.env.COMETAPI_KEY,
              baseURL: "https://api.cometapi.com/v1"
            });

            const response = await client.audio.speech.create({
              model: "tts-1",
              voice: "alloy",
              input: "The quick brown fox jumped over the lazy dog."
            });

            const buffer = Buffer.from(await response.arrayBuffer());
            fs.writeFileSync("output.mp3", buffer);
        - lang: shell
          label: cURL
          source: |-
            curl -X POST https://api.cometapi.com/v1/audio/speech \
              -H "Authorization: Bearer $COMETAPI_KEY" \
              -H "Content-Type: application/json" \
              -d '{
                "model": "tts-1",
                "input": "The quick brown fox jumped over the lazy dog.",
                "voice": "alloy"
              }' \
              --output output.mp3
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: Bearer token authentication. Use your CometAPI key.

````
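
The schema limits above (required fields, the 4096-character `input` cap, the `voice` enum, and the `speed` range) can be checked on the client before a request is sent. The following is a minimal sketch; the function name `validate_speech_payload` and its message wording are hypothetical, and the server remains the authority on what it accepts.

```python
# Limits copied from the request schema above.
VOICES = {
    "alloy", "ash", "ballad", "coral", "echo",
    "fable", "onyx", "nova", "sage", "shimmer",
}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def validate_speech_payload(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    for field in ("model", "input", "voice"):
        if field not in payload:
            problems.append(f"missing required field: {field}")
    text = payload.get("input", "")
    if len(text) > MAX_INPUT_CHARS:
        problems.append(
            f"input has {len(text)} characters; the limit is {MAX_INPUT_CHARS}"
        )
    if "voice" in payload and payload["voice"] not in VOICES:
        problems.append(f"unknown voice: {payload['voice']!r}")
    speed = payload.get("speed", 1)  # schema default is 1
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside {MIN_SPEED} to {MAX_SPEED}")
    return problems
```

Text longer than the cap would need to be split into several requests; how to split it (by sentence, paragraph, or otherwise) is left to the application.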