聊天补全
使用 CometAPI POST /v1/chat/completions 发送多消息对话,并通过流式输出、temperature 和 max_tokens 控制获取 LLM 回复。
CometAPI 通过单一的 OpenAI 兼容接口,将聊天补全路由到多个提供商——包括 OpenAI、Claude 和 Gemini。通过更改Documentation Index
Fetch the complete documentation index at: https://apidoc.cometapi.com/llms.txt
Use this file to discover all available pages before exploring further.
model 参数即可在模型之间切换;大多数 OpenAI 兼容 SDK 只需将 base_url 设置为 https://api.cometapi.com/v1 即可工作。
消息角色
| Role | Description |
|---|---|
system | 设置助手的行为和个性。放置在对话开头。 |
developer | 对于较新的模型(o1+),用于替代 system。无论用户输入什么,都提供模型应遵循的指令。 |
user | 来自终端用户的消息。 |
assistant | 先前的模型响应,用于维护对话历史。 |
tool | 工具/函数调用的结果。必须包含与原始工具调用匹配的 tool_call_id。 |
发送多模态(Multimodal)输入
许多模型支持图像和音频与文本一起输入。要发送多模态消息,请对content 使用数组格式:
detail 参数控制图像分析深度:
low— 更快,使用更少的 tokens(固定成本)high— 详细分析,消耗更多 tokensauto— 由模型决定(默认)
流式输出(Streaming)响应
要接收增量输出,请将stream 设置为 true。响应会以 Server-Sent Events (SSE) 的形式传递,其中每个事件都包含一个 chat.completion.chunk 对象:
请求结构化输出
要强制模型返回符合特定 schema 的有效 JSON,请使用response_format:
json_schema)可保证输出与您的 schema 完全匹配。JSON Object 模式(json_object)仅保证返回有效 JSON——不强制约束其结构。调用工具和函数
要让模型能够调用外部函数,请提供工具定义:finish_reason: "tool_calls",并且 message.tool_calls 数组会包含函数名称和参数。随后,您需要执行该函数,并将结果作为带有匹配 tool_call_id 的 tool 消息发送回去。
跨提供商说明
各提供商的参数支持
各提供商的参数支持
| 参数 | OpenAI GPT | Claude(通过 compat) | Gemini(通过 compat) |
|---|---|---|---|
temperature | 0–2 | 0–1 | 0–2 |
top_p | 0–1 | 0–1 | 0–1 |
n | 1–128 | 仅 1 | 1–8 |
stop | 最多 4 个 | 最多 4 个 | 最多 5 个 |
tools | ✅ | ✅ | ✅ |
response_format | ✅ | ✅ (json_schema) | ✅ |
logprobs | ✅ | ❌ | ❌ |
reasoning_effort | o-series、GPT-5.1+ | ❌ | ❌(Gemini 原生请使用 thinking) |
max_tokens 与 max_completion_tokens
max_tokens 与 max_completion_tokens
max_tokens— 旧版参数。适用于大多数模型,但对于较新的 OpenAI 模型已被弃用。max_completion_tokens— GPT-4.1、GPT-5 系列和 o-series 模型的推荐参数。对于推理模型是必需的,因为它同时包含输出 tokens 和推理 tokens。
system 与 developer role
system 与 developer role
system— 传统的指令 role。适用于所有模型。developer— 随 o1 模型引入。可为较新的模型提供更强的指令遵循能力。在较旧模型上会回退为system行为。
developer。常见问题
如何处理速率限制?
当遇到429 Too Many Requests 时,请实现指数退避:
如何维护对话上下文?
在messages 数组中包含完整的对话历史:
finish_reason 是什么意思?
| Value | Meaning |
|---|---|
stop | 自然完成或命中停止序列。 |
length | 达到 max_tokens 或 max_completion_tokens 限制。 |
tool_calls | 模型调用了一个或多个工具/函数调用。 |
content_filter | 由于内容策略,输出被过滤。 |
如何控制成本?
- 使用
max_completion_tokens限制输出长度。 - 选择性价比高的模型(例如,对于更简单的任务可使用
gpt-5.4-mini或gpt-5.4-nano)。 - 保持 Prompt 简洁——避免冗余上下文。
- 监控
usage响应字段中的 Token 使用情况。
授权
Bearer token authentication. Use your CometAPI key.
请求体
Model ID to use for this request. See the Models page for current options.
"gpt-4.1"
A list of messages forming the conversation. Each message has a role (system, user, assistant, or developer) and content (text string or multimodal content array).
If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.
Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both.
0 <= x <= 2Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both.
0 <= x <= 1Number of completion choices to generate for each input message. Defaults to 1.
Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.
Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.
-2 <= x <= 2Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.
-2 <= x <= 2A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.
A unique identifier for your end-user. Helps with abuse detection and monitoring.
An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.
Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.
A list of tools the model may call. Currently supports function type tools.
Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.
Whether to return log probabilities of the output tokens.
Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.
0 <= x <= 20Controls the reasoning effort for o-series and GPT-5.1+ models.
low, medium, high Options for streaming. Only valid when stream is true.
Specifies the processing tier.
auto, default, flex, priority 响应
Successful chat completion response.
Unique completion identifier.
"chatcmpl-abc123"
chat.completion "chat.completion"
Unix timestamp of creation.
1774412483
The model used (may include version suffix).
"gpt-5.4-2025-07-16"
Array of completion choices.
"default"
"fp_490a4ad033"