聊天補全
使用 CometAPI POST /v1/chat/completions 傳送多訊息對話,並取得可搭配串流、temperature 與 max_tokens 控制的 LLM 回覆。
CometAPI 會透過單一的 OpenAI 相容介面,將聊天補全路由到多個供應商,包括 OpenAI、Claude 和 Gemini。只要變更Documentation Index
Fetch the complete documentation index at: https://apidoc.cometapi.com/llms.txt
Use this file to discover all available pages before exploring further.
model 參數即可切換模型;大多數 OpenAI 相容 SDK 只需將 base_url 設為 https://api.cometapi.com/v1 即可運作。
訊息角色
| 角色 | 說明 |
|---|---|
system | 設定助理的行為與個性。放置於對話開頭。 |
developer | 在較新的模型(o1+)中取代 system。提供模型無論使用者輸入為何都應遵循的指示。 |
user | 來自終端使用者的訊息。 |
assistant | 先前的模型回應,用於維持對話歷史。 |
tool | 工具/函式呼叫的結果。必須包含與原始工具呼叫相符的 tool_call_id。 |
傳送多模態(Multimodal)輸入
許多模型支援文字以外的圖片與音訊。若要傳送多模態訊息,請對content 使用陣列格式:
detail 參數可控制圖片分析深度:
low— 較快,使用較少 token(固定成本)high— 詳細分析,消耗更多 tokenauto— 由模型決定(預設)
串流回應
若要接收逐步輸出,請將stream 設為 true。回應會以 Server-Sent Events (SSE) 傳送,其中每個事件都包含一個 chat.completion.chunk 物件:
請求結構化輸出
若要強制模型回傳符合特定 schema 的有效 JSON,請使用response_format:
json_schema)可保證輸出會精確符合你的 schema。JSON Object 模式(json_object)只保證會是有效的 JSON——不會強制結構。呼叫工具與函式
若要讓模型呼叫外部函式,請提供工具定義:finish_reason: "tool_calls",而 message.tool_calls 陣列會包含函式名稱與引數。接著你需要執行該函式,並將結果以具有對應 tool_call_id 的 tool 訊息傳回。
跨供應商注意事項
各供應商之間的參數支援
各供應商之間的參數支援
| Parameter | OpenAI GPT | Claude (via compat) | Gemini (via compat) |
|---|---|---|---|
temperature | 0–2 | 0–1 | 0–2 |
top_p | 0–1 | 0–1 | 0–1 |
n | 1–128 | 僅 1 | 1–8 |
stop | 最多 4 個 | 最多 4 個 | 最多 5 個 |
tools | ✅ | ✅ | ✅ |
response_format | ✅ | ✅ (json_schema) | ✅ |
logprobs | ✅ | ❌ | ❌ |
reasoning_effort | o-series, GPT-5.1+ | ❌ | ❌(Gemini 原生請使用 thinking) |
max_tokens 與 max_completion_tokens
max_tokens 與 max_completion_tokens
max_tokens— 舊版參數。適用於大多數模型,但對較新的 OpenAI 模型已被棄用。max_completion_tokens— GPT-4.1、GPT-5 系列與 o-series 模型的建議參數。推理模型必須使用此參數,因為它同時包含輸出 tokens 與 reasoning tokens。
system 與 developer role
system 與 developer role
system— 傳統的指令角色。適用於所有模型。developer— 隨 o1 模型引入。可為較新的模型提供更強的指令遵循能力。在較舊的模型上會回退為system行為。
developer。常見問題
如何處理速率限制?
當遇到429 Too Many Requests 時,請實作指數退避:
如何維持對話上下文?
在messages 陣列中包含完整的對話歷史:
finish_reason 是什麼意思?
| 值 | 含義 |
|---|---|
stop | 自然完成,或命中了停止序列。 |
length | 達到 max_tokens 或 max_completion_tokens 的限制。 |
tool_calls | 模型呼叫了一個或多個工具/函式呼叫。 |
content_filter | 由於內容政策,輸出已被過濾。 |
如何控制成本?
- 使用
max_completion_tokens限制輸出長度。 - 選擇具成本效益的模型(例如,較簡單的任務可使用
gpt-5.4-mini或gpt-5.4-nano)。 - 保持 Prompt 簡潔——避免冗餘的上下文。
- 在
usage回應欄位中監控 Token 使用量。
授權
Bearer token authentication. Use your CometAPI key.
主體
Model ID to use for this request. See the Models page for current options.
"gpt-4.1"
A list of messages forming the conversation. Each message has a role (system, user, assistant, or developer) and content (text string or multimodal content array).
If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.
Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both.
0 <= x <= 2Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both.
0 <= x <= 1Number of completion choices to generate for each input message. Defaults to 1.
Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.
Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.
-2 <= x <= 2Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.
-2 <= x <= 2A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.
A unique identifier for your end-user. Helps with abuse detection and monitoring.
An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.
Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.
A list of tools the model may call. Currently supports function type tools.
Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.
Whether to return log probabilities of the output tokens.
Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.
0 <= x <= 20Controls the reasoning effort for o-series and GPT-5.1+ models.
low, medium, high Options for streaming. Only valid when stream is true.
Specifies the processing tier.
auto, default, flex, priority 回應
Successful chat completion response.
Unique completion identifier.
"chatcmpl-abc123"
chat.completion "chat.completion"
Unix timestamp of creation.
1774412483
The model used (may include version suffix).
"gpt-5.4-2025-07-16"
Array of completion choices.
"default"
"fp_490a4ad033"