POST /v1/audio/transcriptions
Python (OpenAI SDK)
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["COMETAPI_KEY"],
    base_url="https://api.cometapi.com/v1"
)

# Open the file in a context manager so the handle is closed after the upload.
with open("audio.mp3", "rb") as audio_file:
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file
    )
print(transcription.text)
{
  "text": "Hello, welcome to CometAPI."
}
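Use this endpoint to transcribe audio into text in the source language. It suits meeting notes, voice messages, media indexing, subtitles, and support workflows that need searchable text.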

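First request

Send a supported audio file in the file field and set model. For the first request, keep the file short while you verify upload handling, authentication, and response parsing.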
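For clients that do not use the OpenAI SDK, the same request can be sketched with the Python standard library. This is a minimal illustration, not the documented client: the helper names `build_multipart` and `build_request` are hypothetical, the full URL is assumed from the base URL and path shown on this page, and the file part is sent with a generic `application/octet-stream` content type.

```python
import urllib.request
import uuid

# Assumed from the base URL and path on this page.
URL = "https://api.cometapi.com/v1/audio/transcriptions"


def build_multipart(fields: dict, filename: str, file_bytes: bytes):
    """Encode text fields plus one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    # The audio goes in the "file" field, as the request body section requires.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key: str, filename: str, file_bytes: bytes,
                  model: str = "whisper-1") -> urllib.request.Request:
    """Return an unsent POST request carrying the Bearer token and the form body."""
    body, content_type = build_multipart({"model": model}, filename, file_bytes)
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
    )
```

To send it, pass the result to `urllib.request.urlopen(...)` and decode the body. Building the request separately from sending it makes the authentication header and the upload encoding easy to inspect before any network call.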

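Read the response

The default response contains the transcribed text. If you request a different response format, make sure your client parses that format instead of assuming a JSON structure.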
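One way to keep parsing tied to the requested format is a small dispatch on `response_format`. The sketch below is illustrative: `extract_text` is a hypothetical helper, and it assumes that `verbose_json` also carries a top-level `text` field, which this page does not confirm.

```python
import json


def extract_text(body: str, response_format: str = "json") -> str:
    """Return the transcript carried by a raw response body."""
    if response_format in ("json", "verbose_json"):
        # Assumption: verbose_json also exposes a top-level "text" field.
        return json.loads(body)["text"]
    if response_format == "text":
        # Plain text responses are not JSON; only trim surrounding whitespace.
        return body.strip()
    if response_format in ("srt", "vtt"):
        # Subtitle formats are returned unchanged; cue parsing is out of scope.
        return body
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

Raising on an unknown format surfaces a mismatch between the request and the parser early, instead of failing later inside `json.loads`.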

Next steps

  • Use Create speech when you need text-to-speech output
  • Use Create translation when the target output should be English

Authorizations

Authorization
string
header
required

Bearer token authentication. Use your CometAPI key.

Body

multipart/form-data
file
file
required

The audio file to transcribe. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.
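A client-side check against the format list above can reject an obviously unsupported file before uploading it. This is a rough sketch: `is_supported_audio` is a hypothetical helper, and it only inspects the file extension, so the server remains the authority on whether the content is accepted.

```python
from pathlib import Path

# Formats listed in the file parameter description.
SUPPORTED_FORMATS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}


def is_supported_audio(path: str) -> bool:
    """Return True when the file extension matches a listed format."""
    extension = Path(path).suffix.lower().lstrip(".")
    return extension in SUPPORTED_FORMATS
```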

model
string
default: whisper-1
required

The speech-to-text model to use. Choose a current speech model from the Models page.

language
string

The language of the input audio in ISO-639-1 format (e.g., en, zh, ja). Supplying the language improves accuracy and latency.

prompt
string

Optional text to guide the model's style or continue a previous audio segment. The prompt should match the audio language.

response_format
enum<string>
default: json

The output format for the transcription.

Available options: json, text, srt, verbose_json, vtt

temperature
number
default: 0

Sampling temperature between 0 and 1. Higher values produce more random output; lower values are more focused. When set to 0, the model auto-adjusts temperature using log probability.

Required range: 0 <= x <= 1
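The optional parameters above can be validated locally before the request is sent. The following is a minimal sketch under the constraints stated on this page (the `response_format` options and the 0 to 1 temperature range); `build_transcription_params` is a hypothetical helper, not part of any SDK.

```python
from typing import Optional

# Options listed for response_format on this page.
RESPONSE_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_params(
    model: str = "whisper-1",
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: float = 0.0,
) -> dict:
    """Collect request fields, rejecting values outside the documented limits."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    params = {
        "model": model,
        "response_format": response_format,
        "temperature": temperature,
    }
    # Optional fields are omitted entirely when they are not set.
    if language is not None:
        params["language"] = language
    if prompt is not None:
        params["prompt"] = prompt
    return params
```

With the OpenAI SDK client shown at the top of this page, the result can be passed as keyword arguments, for example `client.audio.transcriptions.create(file=audio_file, **params)`.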

Response

200 - application/json

The transcription result.

text
string
required

The transcribed text.