POST /v1/chat/completions
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)
{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}


CometAPI routes Chat Completions to multiple providers — including OpenAI, Claude, and Gemini — through a single OpenAI-compatible interface. Switch between models by changing the model parameter; most OpenAI-compatible SDKs work by setting base_url to https://api.cometapi.com/v1.
Request parameters and response fields can vary significantly between model providers. Check the official documentation for the provider behind the model you use whenever you need the complete parameter list or provider-specific behavior. For example, reasoning_effort only applies to reasoning models (o-series, GPT-5.1+), and some models do not support logprobs or n > 1.
For OpenAI Pro models, o-series reasoning models, and Codex models, use the Responses endpoint instead. These model families have more complete support on the Responses API.

Message roles

  • system — Sets the assistant's behavior and personality. Placed at the start of the conversation.
  • developer — Replaces system for newer models (o1+). Provides instructions the model should follow regardless of user input.
  • user — Messages from the end user.
  • assistant — Previous model responses, used to maintain conversation history.
  • tool — Results from tool/function calls. Must include tool_call_id matching the original tool call.
For newer models (GPT-4.1, GPT-5 series, o-series), prefer developer over system for instruction messages. Both work, but developer provides stronger instruction-following behavior.

Send multimodal input

Many models support images and audio alongside text. To send multimodal messages, use the array format for content:
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}
The detail parameter controls image analysis depth:
  • low — faster, uses fewer tokens (fixed cost)
  • high — detailed analysis, more tokens consumed
  • auto — the model decides (default)
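As a sketch, a small helper (hypothetical, not part of any SDK) that builds a user message in this multimodal format and validates the detail value:

```python
def image_message(text, image_url, detail="auto"):
    """Build a user message pairing a text prompt with an image URL.

    Hypothetical helper; `detail` must be "low", "high", or "auto".
    """
    if detail not in ("low", "high", "auto"):
        raise ValueError("detail must be 'low', 'high', or 'auto'")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }

msg = image_message("Describe this image", "https://example.com/image.png", detail="high")
```

The returned dict can be placed directly in the messages array alongside plain text messages.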

Stream responses

To receive incremental output, set stream to true. The response is delivered as Server-Sent Events (SSE), where each event contains a chat.completion.chunk object:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
To include token usage statistics in streaming responses, set stream_options.include_usage to true. The usage data appears in the final chunk before [DONE].
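If you are not using an SDK, the data: lines above can be reassembled client-side. A minimal sketch (stdlib only, assuming well-formed SSE lines like those shown):

```python
import json

def assemble_sse(lines):
    """Concatenate delta.content fragments from `data:` SSE lines."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # the first chunk carries only the role; content may be absent
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)
```

Feeding the four example chunks above produces "Hello!".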

Request structured output

To force the model to return valid JSON matching a specific schema, use response_format:
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}
JSON Schema mode (json_schema) guarantees the output matches your schema exactly. JSON Object mode (json_object) only guarantees valid JSON — the structure is not enforced.
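Because strict json_schema mode enforces the structure, parsing the reply is safe. A sketch that checks the two fields the schema above requires (parse_result is a hypothetical helper, not part of the API):

```python
import json

def parse_result(content):
    """Parse a model reply produced under the schema above."""
    data = json.loads(content)
    # strict json_schema mode guarantees these keys exist with these types
    assert isinstance(data["answer"], str)
    assert isinstance(data["confidence"], (int, float))
    return data

result = parse_result('{"answer": "Four", "confidence": 0.97}')
```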

Call tools and functions

To enable the model to call external functions, provide tool definitions:
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
When the model decides to call a tool, the response will have finish_reason: "tool_calls" and the message.tool_calls array will contain the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.
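The round trip can be sketched against the raw JSON response shape (stdlib only; the registry mapping function names to Python callables is a hypothetical convention, not part of the API):

```python
import json

def tool_messages(response, registry):
    """Execute requested tools and build the follow-up `tool` messages."""
    choice = response["choices"][0]
    if choice["finish_reason"] != "tool_calls":
        return []
    messages = []
    for call in choice["message"]["tool_calls"]:
        fn = registry[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the original tool call
            "content": json.dumps(fn(**args)),
        })
    return messages
```

Append these messages after the assistant's tool-call message in the history, then send a second request so the model can read the results and produce its final answer.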

Cross-provider notes

  • temperature — OpenAI GPT: 0–2; Claude (via compat): 0–1; Gemini (via compat): 0–2
  • top_p — 0–1 for all three providers
  • n — OpenAI GPT: 1–128; Claude: 1 only; Gemini: 1–8
  • stop — OpenAI GPT and Claude: up to 4 sequences; Gemini: up to 5
  • tools — supported across providers; see each provider's docs for limits
  • response_format — OpenAI GPT supports strict json_schema; support elsewhere varies
  • logprobs — not supported by all models; check the provider's docs
  • reasoning_effort — o-series and GPT-5.1+ only; not applicable to Gemini (use Gemini's native thinking instead)
Token limit parameters:
  • max_tokens — The legacy parameter. Works with most models but is deprecated for newer OpenAI models.
  • max_completion_tokens — The recommended parameter for GPT-4.1, GPT-5 series, and o-series models. Required for reasoning models, as it covers both output tokens and reasoning tokens.
CometAPI automatically handles the mapping when routing to different providers.

Instruction roles:
  • system — The traditional instruction role. Works with all models.
  • developer — Introduced with the o1 models. Provides stronger instruction-following on newer models and falls back to system behavior on older ones.
Use developer for new projects targeting GPT-4.1+ or o-series models.
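Putting both recommendations together, a request body for a newer model might look like this (sketch; the parameter values are illustrative):

```python
request = {
    "model": "gpt-5.4",
    "max_completion_tokens": 256,  # covers visible output plus reasoning tokens
    "messages": [
        {"role": "developer", "content": "Answer in one short sentence."},
        {"role": "user", "content": "What is Python?"},
    ],
}
```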

FAQ

How to handle rate limits?

When encountering 429 Too Many Requests, implement exponential backoff:
import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise

How to maintain conversation context?

Include the full conversation history in the messages array:
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]

What does finish_reason mean?

  • stop — Natural completion or hit a stop sequence.
  • length — Reached the max_tokens or max_completion_tokens limit.
  • tool_calls — The model invoked one or more tool/function calls.
  • content_filter — Output was filtered due to content policy.
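A dispatch sketch over these values (hypothetical helper operating on the raw choice dict):

```python
def next_step(choice):
    """Suggest a follow-up action based on finish_reason."""
    actions = {
        "stop": "done",
        "length": "raise max_completion_tokens or shorten the prompt",
        "tool_calls": "execute the tools and send back tool messages",
        "content_filter": "revise the request; the output was filtered",
    }
    return actions.get(choice["finish_reason"], "unknown finish_reason")
```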

How to control costs?

  1. Use max_completion_tokens to cap output length.
  2. Choose cost-effective models (e.g., gpt-5.4-mini or gpt-5.4-nano for simpler tasks).
  3. Keep prompts concise — avoid redundant context.
  4. Monitor token usage in the usage response field.
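Point 4 can be automated by reading the usage field of each response. A sketch over the raw JSON shape shown earlier (summarize_usage is a hypothetical helper):

```python
def summarize_usage(usage):
    """Pull the cost-relevant counters out of a response's usage field."""
    details = usage.get("prompt_tokens_details", {})
    return {
        "prompt": usage["prompt_tokens"],
        "completion": usage["completion_tokens"],
        "total": usage["total_tokens"],
        # cached prompt tokens are typically billed at a discount
        "cached_prompt": details.get("cached_tokens", 0),
    }

stats = summarize_usage({
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {"cached_tokens": 0, "audio_tokens": 0},
})
```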

Authorizations

Authorization
string
header
required

Bearer token authentication. Use your CometAPI key.

Body

application/json
model
string
default:gpt-5.4
required

Model ID to use for this request. See the Models page for current options.

Example:

"gpt-4.1"

messages
object[]
required

A list of messages forming the conversation. Each message has a role (system, user, assistant, or developer) and content (text string or multimodal content array).

stream
boolean

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature
number
default:1

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both.

Required range: 0 <= x <= 2
top_p
number
default:1

Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both.

Required range: 0 <= x <= 1
n
integer
default:1

Number of completion choices to generate for each input message. Defaults to 1.

stop
string | string[]

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens
integer

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty
number
default:0

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Required range: -2 <= x <= 2
frequency_penalty
number
default:0

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Required range: -2 <= x <= 2
logit_bias
object

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.

user
string

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens
integer

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format
object

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools
object[]

A list of tools the model may call. Currently supports function type tools.

tool_choice
string | object
default:auto

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs
boolean
default:false

Whether to return log probabilities of the output tokens.

top_logprobs
integer

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Required range: 0 <= x <= 20
reasoning_effort
enum<string>

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options:
low,
medium,
high
stream_options
object

Options for streaming. Only valid when stream is true.

service_tier
enum<string>

Specifies the processing tier.

Available options:
auto,
default,
flex,
priority

Response

Successful chat completion response.

id
string

Unique completion identifier.

Example:

"chatcmpl-abc123"

object
enum<string>
Available options:
chat.completion
Example:

"chat.completion"

created
integer

Unix timestamp of creation.

Example:

1774412483

model
string

The model used (may include version suffix).

Example:

"gpt-5.4-2025-07-16"

choices
object[]

Array of completion choices.

usage
object
service_tier
string
Example:

"default"

system_fingerprint
string | null
Example:

"fp_490a4ad033"