Chat Completions
Use CometAPI POST /v1/chat/completions to send multi-message conversations and get LLM replies with streaming, temperature, and max_tokens controls.
model parameter; most OpenAI-compatible SDKs work by setting base_url to https://api.cometapi.com/v1.
Message roles
| Role | Description |
|---|---|
system | Sets the assistant’s behavior and personality. Placed at the start of the conversation. |
developer | Replaces system for newer models (o1+). Provides instructions the model should follow regardless of user input. |
user | Messages from the end user. |
assistant | Previous model responses, used to maintain conversation history. |
tool | Results from tool/function calls. Must include tool_call_id matching the original tool call. |
Send multimodal input
Many models support images and audio alongside text. To send multimodal messages, use the array format forcontent:
detail parameter controls image analysis depth:
low— faster, uses fewer tokens (fixed cost)high— detailed analysis, more tokens consumedauto— the model decides (default)
Stream responses
To receive incremental output, setstream to true. The response is delivered as Server-Sent Events (SSE), where each event contains a chat.completion.chunk object:
Request structured output
To force the model to return valid JSON matching a specific schema, useresponse_format:
json_schema) guarantees the output matches your schema exactly. JSON Object mode (json_object) only guarantees valid JSON — the structure is not enforced.Call tools and functions
To enable the model to call external functions, provide tool definitions:finish_reason: "tool_calls" and the message.tool_calls array will contain the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.
Cross-provider notes
Parameter support across providers
Parameter support across providers
| Parameter | OpenAI GPT | Claude (via compat) | Gemini (via compat) |
|---|---|---|---|
temperature | 0–2 | 0–1 | 0–2 |
top_p | 0–1 | 0–1 | 0–1 |
n | 1–128 | 1 only | 1–8 |
stop | Up to 4 | Up to 4 | Up to 5 |
tools | ✅ | ✅ | ✅ |
response_format | ✅ | ✅ (json_schema) | ✅ |
logprobs | ✅ | ❌ | ❌ |
reasoning_effort | o-series, GPT-5.1+ | ❌ | ❌ (use thinking for Gemini native) |
max_tokens vs max_completion_tokens
max_tokens vs max_completion_tokens
max_tokens— The legacy parameter. Works with most models but is deprecated for newer OpenAI models.max_completion_tokens— The recommended parameter for GPT-4.1, GPT-5 series, and o-series models. Required for reasoning models as it includes both output tokens and reasoning tokens.
system vs developer role
system vs developer role
system— The traditional instruction role. Works with all models.developer— Introduced with o1 models. Provides stronger instruction-following for newer models. Falls back tosystembehavior on older models.
developer for new projects targeting GPT-4.1+ or o-series models.FAQ
How to handle rate limits?
When encountering429 Too Many Requests, implement exponential backoff:
How to maintain conversation context?
Include the full conversation history in themessages array:
What does finish_reason mean?
| Value | Meaning |
|---|---|
stop | Natural completion or hit a stop sequence. |
length | Reached max_tokens or max_completion_tokens limit. |
tool_calls | The model invoked one or more tool/function calls. |
content_filter | Output was filtered due to content policy. |
How to control costs?
- Use
max_completion_tokensto cap output length. - Choose cost-effective models (e.g.,
gpt-5.4-miniorgpt-5.4-nanofor simpler tasks). - Keep prompts concise — avoid redundant context.
- Monitor token usage in the
usageresponse field.
Authorizations
Bearer token authentication. Use your CometAPI key.
Body
Model ID to use for this request. See the Models page for current options.
"gpt-4.1"
A list of messages forming the conversation. Each message has a role (system, user, assistant, or developer) and content (text string or multimodal content array).
If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.
Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both.
0 <= x <= 2Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both.
0 <= x <= 1Number of completion choices to generate for each input message. Defaults to 1.
Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.
Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.
Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.
-2 <= x <= 2Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.
-2 <= x <= 2A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.
A unique identifier for your end-user. Helps with abuse detection and monitoring.
An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.
Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.
A list of tools the model may call. Currently supports function type tools.
Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.
Whether to return log probabilities of the output tokens.
Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.
0 <= x <= 20Controls the reasoning effort for o-series and GPT-5.1+ models.
low, medium, high Options for streaming. Only valid when stream is true.
Specifies the processing tier.
auto, default, flex, priority Response
Successful chat completion response.
Unique completion identifier.
"chatcmpl-abc123"
Object type. Non-streaming responses use chat.completion.
chat.completion "chat.completion"
Unix timestamp of creation.
1774412483
The model used (may include version suffix).
"gpt-5.4-2026-03-05"
Array of completion choices.
Token accounting for this request. Billing uses these counts.
Service tier that processed the request, when the provider reports one.
"default"
Provider backend configuration fingerprint, when the provider reports one.
"fp_490a4ad033"