POST /v1/chat/completions
from openai import OpenAI
client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)
{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}

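Overview

The chat completions endpoint is the most widely used API for interacting with large language models. It accepts a conversation made up of multiple messages and returns the model's response. CometAPI routes this endpoint to multiple providers through a single unified interface, including OpenAI, Anthropic Claude (via a compatibility layer), and Google Gemini. To switch between models, change only the model parameter.
This endpoint follows the OpenAI chat completions format. Most OpenAI-compatible SDKs and tools work with CometAPI once base_url is set to https://api.cometapi.com/v1.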

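Important notes

Model-specific behavior: Different models may support different subsets of parameters and return slightly different response fields. For example, reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models may not support logprobs or n > 1.
Response pass-through: CometAPI passes model responses through unmodified, apart from format normalization when routing between providers, so the output you receive matches the original API.
OpenAI Pro models: For OpenAI Pro-series models (for example, o1-pro), use the responses endpoint instead.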

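Message roles

Role        Description
system      Sets the assistant's behavior and personality. Placed at the start of the conversation.
developer   Replaces system on newer models (o1+). Provides instructions the model should follow regardless of user input.
user        Messages from the end user.
assistant   Previous model responses, used to maintain conversation history.
tool        The result of a tool/function call. Must include the tool_call_id that matches the original tool call.
For newer models (GPT-4.1, the GPT-5 series, o-series), prefer developer over system for instruction messages. Both work, but developer gives stronger instruction-following behavior.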

Multimodal input

Many models accept images and audio in addition to text. Use the array form of content to send a multimodal message:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}
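The detail parameter controls how closely the image is analyzed:
  • low: faster and uses fewer tokens (fixed cost)
  • high: more detailed analysis, uses more tokens
  • auto: the model decides (default)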
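As a rough sketch, the message shown above can be assembled with a small helper. The names image_message and to_data_url are hypothetical, and sending a base64 data URL in place of an https URL is an assumption carried over from the OpenAI format; this page does not confirm it for every routed model.

```python
import base64


def image_message(prompt: str, image_url: str, detail: str = "auto") -> dict:
    """Build a user message that pairs a text prompt with one image."""
    if detail not in {"low", "high", "auto"}:
        raise ValueError(f"unsupported detail level: {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (assumed to be accepted)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


message = image_message(
    "Describe this image", "https://example.com/image.png", detail="high"
)
```

The resulting dict can be placed directly in the messages list of a request.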

Streaming

When stream is set to true, the response is delivered as Server-Sent Events (SSE). Each event carries a chat.completion.chunk object with incremental content:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
To include token usage statistics in a streaming response, set stream_options.include_usage to true. The usage data appears in the last chunk before [DONE].
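The event stream above can be reassembled into a full reply roughly as follows. This is a minimal sketch that works on raw SSE lines; collect_stream_text is a hypothetical helper, and it assumes a single choice (n = 1) and that a usage-bearing chunk, when requested, carries a top-level usage object.

```python
import json


def collect_stream_text(sse_lines):
    """Concatenate delta.content from chat.completion.chunk events.

    Returns (text, finish_reason, usage); usage stays None unless a chunk carries it.
    """
    parts, finish_reason, usage = [], None, None
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel, not JSON
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason, usage


sample = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text, reason, usage = collect_stream_text(sample)
```

When iterating over a stream through an SDK instead of raw lines, the same accumulation applies to each chunk's delta; a chunk that carries only usage may have an empty choices list, so it is safer not to index choices[0] unconditionally.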

Structured output

Use response_format to force the model to return valid JSON that conforms to a specific schema:
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}
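JSON Schema mode (json_schema) guarantees that the output strictly matches your schema. JSON Object mode (json_object) guarantees only that the output is valid JSON, not that it has any particular structure.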
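On the client side, the structured reply still arrives as a JSON string in message.content, so it has to be parsed before use. The sketch below assumes the `result` schema shown above; parse_result is a hypothetical helper, and the extra checks are a defensive habit rather than something the API requires.

```python
import json


def parse_result(content: str) -> dict:
    """Parse message.content produced under the `result` schema."""
    data = json.loads(content)
    if not isinstance(data, dict) or set(data) != {"answer", "confidence"}:
        raise ValueError("expected an object with exactly 'answer' and 'confidence'")
    if not isinstance(data["answer"], str):
        raise ValueError("answer must be a string")
    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    return data


parsed = parse_result('{"answer": "Four", "confidence": 0.98}')
```

With json_object mode the structural checks matter more, since only JSON validity is guaranteed in that mode.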

Tool / function calling

Provide tool definitions to let the model call external functions:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
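When the model decides to call a tool, finish_reason in the response is tool_calls, and the message.tool_calls array contains the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.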
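The round trip described above might look like the following sketch. The local get_weather implementation, the TOOLS registry, and the sample call id are all hypothetical; the tool_calls layout (id, function.name, function.arguments as a JSON string) is assumed from the OpenAI format this endpoint follows.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"location": location, "temperature_c": 21}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool and return the matching `tool` messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        # function.arguments arrives as a JSON-encoded string, not a dict
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\": \"Paris\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
```

The follow-up request would then send the original messages, the assistant message containing tool_calls, and these tool messages, in that order.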

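Response fields

Field                     Description
id                        Unique completion identifier (for example, chatcmpl-abc123).
object                    Always chat.completion.
model                     The model that generated the response (may include a version suffix).
choices                   Array of completion candidates (normally one, unless n > 1).
choices[].message         The assistant's response message, containing role, content, and optionally tool_calls.
choices[].finish_reason   Why the model stopped: stop, length, tool_calls, or content_filter.
usage                     Token consumption breakdown: prompt_tokens, completion_tokens, total_tokens, plus more detailed sub-counts.
system_fingerprint        Backend configuration fingerprint, for debugging reproducibility.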
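To make the field layout concrete, here is a small sketch that pulls the commonly used values out of a response that has already been decoded into a dict. The sample is a trimmed copy of the example response on this page; summarize is a hypothetical helper.

```python
response = {
    "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
    "object": "chat.completion",
    "model": "gpt-4.1-nano-2025-04-14",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Four", "refusal": None},
            "finish_reason": "stop",
        }
    ],
    "usage": {
        "prompt_tokens": 29,
        "completion_tokens": 2,
        "total_tokens": 31,
        "completion_tokens_details": {"reasoning_tokens": 0},
    },
}


def summarize(resp: dict) -> dict:
    """Collect the first choice's text, its stop reason, and token counts."""
    choice = resp["choices"][0]
    usage = resp["usage"]
    details = usage.get("completion_tokens_details") or {}
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": usage["total_tokens"],
        # sub-counts are optional, so fall back to 0 when a provider omits them
        "reasoning_tokens": details.get("reasoning_tokens", 0),
    }


summary = summarize(response)
```

Because different models may return slightly different fields, the optional sub-counts are read with defaults rather than direct indexing.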

Cross-provider notes

Parameter     OpenAI GPT   Claude (via compat)   Gemini (via compat)
temperature   0–2          0–1                   0–2
top_p         0–1          0–1                   0–1
n             1–128        1 only                1–8
stop          up to 4      up to 4               up to 5

tools
response_format: ✅ (json_schema)
logprobs
reasoning_effort: o-series, GPT-5.1+; ❌ (for native Gemini, use thinking)
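  • max_tokens: legacy parameter. Works with most models but is deprecated for newer OpenAI models.
  • max_completion_tokens: the recommended parameter for GPT-4.1, GPT-5 series, and o-series models. Reasoning models require it because it covers both output tokens and reasoning tokens.
CometAPI handles the parameter mapping automatically when routing to different providers.
  • system: the traditional instruction role. Works with all models.
  • developer: introduced with the o1 models. Gives newer models stronger instruction following. On older models it falls back to system behavior.
For new projects targeting GPT-4.1+ or o-series models, developer is recommended.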

FAQ

How do I handle rate limits?

When you receive 429 Too Many Requests, implement exponential backoff:
import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise

How do I maintain conversation context?

Include the full conversation history in the messages array:
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
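One way to keep that history growing across turns is sketched below. record_exchange is a hypothetical helper, and the assistant text is a placeholder standing in for the content of a real response.

```python
def record_exchange(messages: list, user_text: str, assistant_text: str) -> list:
    """Return a new history with one user turn and the model's reply appended."""
    return messages + [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": assistant_text},
    ]


history = [{"role": "developer", "content": "You are a helpful assistant."}]
history = record_exchange(
    history, "What is Python?", "Python is a high-level programming language..."
)
# The next request sends the whole list plus the new question.
next_request = history + [{"role": "user", "content": "What are its main advantages?"}]
```

Returning a new list instead of mutating the argument keeps earlier snapshots of the conversation intact, which can help when retrying a failed request.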

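What does finish_reason mean?

Value            Meaning
stop             Natural completion, or a stop sequence was hit.
length           The max_tokens or max_completion_tokens limit was reached.
tool_calls       The model called one or more tools/functions.
content_filter   Output was filtered because of content policy.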
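How do I control costs?

  1. Use max_completion_tokens to cap output length.
  2. Choose a more cost-effective model (for example, gpt-5.4-mini or gpt-5.4-nano for simple tasks).
  3. Keep prompts concise and avoid redundant context.
  4. Monitor token usage in the usage response field.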
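For the monitoring step, a running total of the usage field across responses is often enough to start with. This is a minimal sketch; add_usage is a hypothetical helper, the second usage record is made up for illustration, and no prices are assumed.

```python
from collections import Counter


def add_usage(totals: Counter, usage: dict) -> Counter:
    """Add one response's usage counts to a running total."""
    for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
        totals[key] += usage.get(key, 0)
    return totals


totals = Counter()
add_usage(totals, {"prompt_tokens": 29, "completion_tokens": 2, "total_tokens": 31})
add_usage(totals, {"prompt_tokens": 40, "completion_tokens": 12, "total_tokens": 52})
```

Multiplying these totals by the per-token rates of the chosen model gives a cost estimate; the rates themselves are not listed on this page.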

Authorization

Authorization (string, header, required)

Bearer token authentication. Use your CometAPI key.
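For callers not using an SDK, the header can be set by hand. The sketch below only prepares the request with the standard library and does not send it; build_request is a hypothetical helper, and the URL is the base_url from this page joined with the endpoint path.

```python
import json
import urllib.request


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the chat completions endpoint."""
    return urllib.request.Request(
        "https://api.cometapi.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "<COMETAPI_KEY>",
    {"model": "gpt-5.4", "messages": [{"role": "user", "content": "Hello!"}]},
)
# urllib.request.urlopen(request) would perform the call; it is left out here.
```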

Request body

application/json
model (string, required, default: gpt-5.4)

Model ID to use for this request. See the Models page for current options.

Example: "gpt-4.1"

messages (object[], required)

A list of messages forming the conversation. Each message has a role (system, developer, user, assistant, or tool) and content (text string or multimodal content array).

stream (boolean)

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature (number, default: 1)

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Adjust either this or top_p, but not both.

Range: 0 <= x <= 2

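To illustrate why lower values make output more deterministic, here is the textbook temperature-scaled softmax. It is a conceptual sketch, not a description of how any provider implements sampling; the logits are made up, and the function rejects 0 even though the API accepts it.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("this sketch needs temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
base = softmax_with_temperature(logits, 1.0)
hot = softmax_with_temperature(logits, 2.0)
```

At 0.2 nearly all probability sits on the top token, while at 2.0 the three candidates are much closer together.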
top_p (number, default: 1)

Nucleus sampling parameter. The model considers only the smallest set of most likely tokens whose cumulative probability reaches top_p. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered. Adjust either this or temperature, but not both.

Range: 0 <= x <= 1

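The cumulative-probability cutoff can be shown with a toy distribution. This is a conceptual sketch of nucleus filtering under made-up probabilities, not a statement about any provider's internals; nucleus_filter is a hypothetical name.

```python
def nucleus_filter(probs: dict, top_p: float) -> dict:
    """Keep the most likely tokens until their cumulative mass reaches top_p.

    The kept probabilities are renormalized so they sum to 1.
    """
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {token: p / total for token, p in kept.items()}


probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # hypothetical next-token distribution
narrow = nucleus_filter(probs, 0.1)
wide = nucleus_filter(probs, 0.9)
```

With top_p at 0.1 only the single most likely token survives; at 0.9 the three most likely tokens remain and the least likely one is dropped.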
n (integer, default: 1)

Number of completion choices to generate for each input message.

stop (string)

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens (integer)

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty (number, default: 0)

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Range: -2 <= x <= 2

frequency_penalty (number, default: 0)

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Range: -2 <= x <= 2

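The difference between the two penalties is easiest to see on numbers. The sketch below uses the adjustment OpenAI has documented for its own models (logit minus count times frequency_penalty, minus presence_penalty if the token has appeared at all); whether other routed providers apply the same formula is an assumption, and the logits are made up.

```python
from collections import Counter


def apply_penalties(logits: dict, generated: list,
                    presence_penalty: float = 0.0,
                    frequency_penalty: float = 0.0) -> dict:
    """Lower the logits of tokens that already appear in the generated text."""
    counts = Counter(generated)
    adjusted = {}
    for token, logit in logits.items():
        seen = 1.0 if counts[token] > 0 else 0.0  # presence is all-or-nothing
        adjusted[token] = (
            logit
            - counts[token] * frequency_penalty  # grows with each repetition
            - seen * presence_penalty
        )
    return adjusted


logits = {"the": 2.0, "cat": 1.0, "dog": 0.5}  # hypothetical scores
adjusted = apply_penalties(
    logits, ["the", "the", "cat"], presence_penalty=0.5, frequency_penalty=0.25
)
```

Here "the" (seen twice) drops from 2.0 to 1.0, "cat" (seen once) drops from 1.0 to 0.25, and "dog" (unseen) keeps its score of 0.5.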
logit_bias (object)

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.

user (string)

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens (integer)

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format (object)

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools (object[])

A list of tools the model may call. Currently supports function type tools.

tool_choice (default: auto)

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs (boolean, default: false)

Whether to return log probabilities of the output tokens.

top_logprobs (integer)

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Range: 0 <= x <= 20

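Log probabilities are natural logarithms, so exponentiating them recovers ordinary probabilities. The sketch below assumes the OpenAI response layout, where each entry under choices[].logprobs.content has token and logprob keys; that layout is not spelled out on this page, and the sample values are made up.

```python
import math


def to_probabilities(logprob_entries: list) -> list:
    """Convert natural-log probabilities into (token, probability) pairs."""
    return [(entry["token"], math.exp(entry["logprob"])) for entry in logprob_entries]


entries = [
    {"token": "Four", "logprob": -0.0001},  # made-up values
    {"token": ".", "logprob": -2.3026},
]
pairs = to_probabilities(entries)
```

A logprob near 0 means the model was almost certain of the token, while -2.3026 corresponds to a probability of roughly 0.1.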
reasoning_effort (enum<string>)

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options: low, medium, high

stream_options (object)

Options for streaming. Only valid when stream is true.

service_tier (enum<string>)

Specifies the processing tier.

Available options: auto, default, flex, priority

Response

200 - application/json

Successful chat completion response.

id (string)

Unique completion identifier.

Example: "chatcmpl-abc123"

object (enum<string>)

Available options: chat.completion

Example: "chat.completion"

created (integer)

Unix timestamp of creation.

Example: 1774412483

model (string)

The model used (may include version suffix).

Example: "gpt-5.4-2025-07-16"

choices (object[])

Array of completion choices.

usage (object)

service_tier (string)

Example: "default"

system_fingerprint (string | null)

Example: "fp_490a4ad033"