# Create transcription

> Use CometAPI POST /v1/audio/transcriptions to transcribe audio into text with a selected transcription model and response format.

Use this endpoint to transcribe audio into text in the source language. It fits meeting notes, voice messages, media indexing, captions, and support workflows that need searchable text.

## First request

Send a `multipart/form-data` request with the two required fields: `file` (a supported audio file) and `model`. Keep the first file short while you validate upload handling, authentication, and response parsing.
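
A client can catch obvious mistakes locally before it uploads anything. The following is a minimal sketch, not part of any SDK: `SUPPORTED_EXTENSIONS` and `build_form_fields` are hypothetical names, the extension list is copied from the `file` parameter description in the OpenAPI schema on this page, and checking the extension is only a heuristic because the server decides what it actually accepts.

```python
from pathlib import Path

# Assumption: extensions copied from the `file` parameter description
# in the OpenAPI schema on this page.
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def build_form_fields(path, model, response_format="json"):
    """Return the non-file form fields for a transcription request.

    Hypothetical helper: raises ValueError when the extension is not in
    the documented list or the model is empty, so the request fails
    before any upload happens.
    """
    extension = Path(path).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio extension: {extension!r}")
    if not model:
        raise ValueError("model is required")
    return {"model": model, "response_format": response_format}


fields = build_form_fields("audio.mp3", "whisper-1")
print(fields)
```

The returned dictionary holds only the text fields; the audio itself still has to be attached as the `file` part of the multipart body.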

## Read the response

The default response (`response_format: json`) is a JSON object whose `text` field holds the transcription. If you request another format (`text`, `srt`, `verbose_json`, or `vtt`), make sure your client parses that format instead of assuming the default JSON shape.
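
One way to keep parsing tied to the requested format is a small dispatch function. The sketch below is illustrative only: `extract_text` is a hypothetical helper, and it assumes that `verbose_json` also carries a top-level `text` field and that `text`, `srt`, and `vtt` arrive as plain text bodies, as they do in OpenAI-compatible APIs. Verify both assumptions against real responses before relying on it.

```python
import json


def extract_text(body, response_format="json"):
    """Return the plain transcript from a raw response body (hypothetical helper)."""
    if response_format in ("json", "verbose_json"):
        # Assumption: verbose_json exposes the same top-level `text` field.
        return json.loads(body)["text"]
    if response_format == "text":
        return body.strip()
    if response_format in ("srt", "vtt"):
        kept = []
        for index, raw in enumerate(body.splitlines()):
            line = raw.strip()
            if index == 0 and line.startswith("WEBVTT"):
                continue  # WebVTT file header
            # Skip blank lines, numeric cue counters, and cue timing lines.
            # Caveat: a caption consisting only of digits is dropped too.
            if not line or line.isdigit() or "-->" in line:
                continue
            kept.append(line)
        return " ".join(kept)
    raise ValueError(f"unknown response_format: {response_format!r}")


print(extract_text('{"text": "Hello, welcome to CometAPI."}'))
```

Flattening subtitles to a single string discards the timing information, so keep the raw `srt` or `vtt` body when you need captions rather than searchable text.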

## Next steps

* Use [Create Speech](/api/audio/create-speech) when you need text-to-speech output.
* Use [Create Translation](/api/audio/create-translation) when the target output should be English.


## OpenAPI

````yaml api/openapi/audio/post-create-transcription.openapi.json POST /v1/audio/transcriptions
openapi: 3.1.0
info:
  title: Create transcription API
  version: 1.0.0
servers:
  - url: https://api.cometapi.com
security:
  - bearerAuth: []
paths:
  /v1/audio/transcriptions:
    post:
      summary: Create transcription
      operationId: create_transcription
      requestBody:
        required: true
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                file:
                  format: binary
                  type: string
                  description: >-
                    The audio file to transcribe. Supported formats: flac, mp3,
                    mp4, mpeg, mpga, m4a, ogg, wav, webm.
                model:
                  type: string
                  description: >-
                    The speech-to-text model to use. Choose a current speech
                    model from the [Models page](/overview/models).
                  default: whisper-1
                language:
                  type: string
                  description: >-
                    The language of the input audio in ISO-639-1 format (e.g.,
                    `en`, `zh`, `ja`). Supplying the language improves accuracy
                    and latency.
                prompt:
                  type: string
                  description: >-
                    Optional text to guide the model's style or continue a
                    previous audio segment. The prompt should match the audio
                    language.
                response_format:
                  type: string
                  description: The output format for the transcription.
                  enum:
                    - json
                    - text
                    - srt
                    - verbose_json
                    - vtt
                  default: json
                temperature:
                  type: number
                  description: >-
                    Sampling temperature between 0 and 1. Higher values produce
                    more random output; lower values are more focused. When set
                    to 0, the model auto-adjusts temperature using log
                    probability.
                  minimum: 0
                  maximum: 1
                  default: 0
              required:
                - file
                - model
      responses:
        '200':
          description: The transcription result.
          content:
            application/json:
              schema:
                type: object
                required:
                  - text
                properties:
                  text:
                    type: string
                    description: The transcribed text.
              examples:
                Default:
                  summary: Transcription result
                  value:
                    text: Hello, welcome to CometAPI.
      x-codeSamples:
        - lang: python
          label: Python (OpenAI SDK)
          source: |-
            import os
            from openai import OpenAI

            client = OpenAI(
                api_key=os.environ["COMETAPI_KEY"],
                base_url="https://api.cometapi.com/v1"
            )

            # Open in binary mode; the context manager closes the file after upload.
            with open("audio.mp3", "rb") as audio_file:
                transcription = client.audio.transcriptions.create(
                    model="whisper-1",
                    file=audio_file
                )
            print(transcription.text)
        - lang: javascript
          label: JavaScript (OpenAI SDK)
          source: |-
            import OpenAI from "openai";
            import fs from "fs";

            const client = new OpenAI({
              apiKey: process.env.COMETAPI_KEY,
              baseURL: "https://api.cometapi.com/v1"
            });

            const transcription = await client.audio.transcriptions.create({
              model: "whisper-1",
              file: fs.createReadStream("audio.mp3")
            });
            console.log(transcription.text);
        - lang: shell
          label: cURL
          source: |-
            curl -X POST https://api.cometapi.com/v1/audio/transcriptions \
              -H "Authorization: Bearer $COMETAPI_KEY" \
              -F model="whisper-1" \
              -F file="@audio.mp3"
components:
  securitySchemes:
    bearerAuth:
      type: http
      scheme: bearer
      description: Bearer token authentication. Use your CometAPI key.

````