POST /v1/audio/transcriptions
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="<COMETAPI_KEY>",
    base_url="https://api.cometapi.com/v1"
)

# Open the file in binary mode; the context manager closes it after the upload.
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )

print(transcription.text)
{
  "text": "Hello, welcome to CometAPI."
}

Use this endpoint to transcribe audio into text in the source language. It fits meeting notes, voice messages, media indexing, captions, and support workflows that need searchable text.

First request

Send a multipart/form-data request that includes the model field and a supported audio file in the file field. Keep the first file short while you validate upload handling, authentication, and response parsing.

Read the response

The default response includes the transcribed text. If you request another response format, make sure your client parses that format instead of assuming the default JSON shape.
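
As a rough sketch of format-aware parsing, the helper below chooses how to read the response body from the response_format that was requested. The name extract_text is hypothetical, and the sketch assumes that verbose_json carries a top-level text field like the default JSON shape and that text, srt, and vtt arrive as plain-text bodies.

```python
import json

# Formats assumed to return a JSON body with a top-level "text" field.
JSON_FORMATS = {"json", "verbose_json"}
# Formats assumed to return the transcript (or subtitles) as plain text.
TEXT_FORMATS = {"text", "srt", "vtt"}


def extract_text(body: str, response_format: str = "json") -> str:
    """Return the transcript from a raw response body (hypothetical helper)."""
    if response_format in JSON_FORMATS:
        return json.loads(body)["text"]
    if response_format in TEXT_FORMATS:
        # Plain text and subtitle formats are returned unchanged.
        return body
    raise ValueError(f"unknown response_format: {response_format!r}")
```

Passing the same response_format value to both the request and the parser keeps the two from drifting apart.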

Authorizations

Authorization (string, header, required)

Bearer token authentication. Use your CometAPI key.
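
For clients that do not use an SDK, the header can be built by hand. This is a minimal sketch: the helper name auth_headers and the COMETAPI_KEY environment variable are assumptions made for illustration, not names defined by this page.

```python
import os
from typing import Optional


def auth_headers(api_key: Optional[str] = None) -> dict:
    """Build the Authorization header for a CometAPI request (hypothetical helper)."""
    # COMETAPI_KEY is an assumed environment variable name.
    key = api_key or os.environ.get("COMETAPI_KEY", "")
    if not key:
        raise ValueError("missing CometAPI key")
    return {"Authorization": f"Bearer {key}"}
```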

Body

multipart/form-data
file (file, required)

The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
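
A quick client-side check can reject obviously unsupported files before any upload happens. The sketch below only inspects the file extension against the list above, so it is a heuristic and not a check of the actual audio content; the helper name is_supported_audio is hypothetical.

```python
from pathlib import Path

# Extensions taken from the supported formats listed for the file field.
SUPPORTED_EXTENSIONS = {
    "flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm",
}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS
```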

model (string, required, default: whisper-1)

The speech-to-text model to use. Choose a current speech model from the Models page.

language (string)

The language of the input audio in ISO 639-1 format (e.g., en, zh, ja). Supplying the language improves accuracy and latency.

prompt (string)

Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.
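
One way to use prompt for continuation is to split long audio into segments and pass each segment's transcript as the prompt for the next. The sketch below assumes a client configured as in the Python example at the top of this page; the function name transcribe_segments is hypothetical, and splitting the audio is left to the caller.

```python
def transcribe_segments(client, segments, model="whisper-1"):
    """Transcribe opened binary audio files in order (hypothetical helper).

    Each transcript is passed as the prompt for the following segment so the
    model sees the preceding context.
    """
    texts = []
    previous = ""
    for audio_file in segments:
        kwargs = {"model": model, "file": audio_file}
        if previous:
            # Only send a prompt once there is earlier text to continue from.
            kwargs["prompt"] = previous
        result = client.audio.transcriptions.create(**kwargs)
        texts.append(result.text)
        previous = result.text
    return texts
```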

response_format (enum<string>, default: json)

The output format for the transcription.

Available options: json, text, srt, verbose_json, vtt

temperature (number, default: 0)

Sampling temperature between 0 and 1. Higher values produce more random output; lower values are more focused. When set to 0, the model auto-adjusts temperature using log probability.

Required range: 0 <= x <= 1
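
The constraints on the optional fields can be checked before sending the request. The following sketch collects the non-file form fields and validates response_format and temperature against the values documented above; build_form_fields is a hypothetical helper, and the file itself would be attached separately as the multipart file part.

```python
from typing import Optional

# Options listed for the response_format field.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_form_fields(
    model: str = "whisper-1",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: float = 0.0,
) -> dict:
    """Validate and collect the non-file form fields (hypothetical helper)."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")

    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Optional fields are omitted entirely when not provided.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    return fields
```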

Response

200 - application/json

The transcription result.

text (string, required)

The transcribed text.