Use the CometAPI POST /v1/chat/completions endpoint to send multi-message conversations and receive LLM responses, with controls for streaming, temperature, and max_tokens.
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)
```

Example response:

```json
{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}
```
Point base_url at https://api.cometapi.com/v1 and pass the desired model ID as model. Note that reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models may not support logprobs or n > 1. For reasoning-only models (e.g., o1-pro), use the responses endpoint instead of this one.

Message roles

| Role | Description |
|---|---|
| system | Defines the assistant's behavior and personality. Placed at the start of the conversation. |
| developer | Replaces system for newer models (o1+). Provides instructions the model must follow regardless of user input. |
| user | Messages from the end user. |
| assistant | The model's previous responses, used to maintain conversation history. |
| tool | Results of tool/function calls. Must include a tool_call_id matching the original tool call. |
Prefer developer over system for instruction messages. Both work, but developer gives stronger instruction-following behavior.

Multimodal content

Use a content array to send multimodal messages:
```json
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}
```
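Local images can be sent without a public URL by encoding them as base64 data URLs, which OpenAI-compatible APIs generally accept in image_url.url. A minimal sketch (the image bytes below are a placeholder):

```python
import base64

def image_to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data URL usable in image_url.url."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"

# Placeholder bytes for illustration; real use would read a file's contents.
url = image_to_data_url(b"\x89PNG...", "image/png")

message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": url, "detail": "auto"}},
    ],
}
```

The resulting message slots into the messages array like any text-only message.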
The detail field controls the depth of image analysis:

- low: faster, uses fewer tokens (fixed cost)
- high: detailed analysis, more tokens consumed
- auto: the model decides (default)

Streaming

When stream is set to true, the response is delivered as Server-Sent Events (SSE). Each event contains a chat.completion.chunk object with incremental content:
```text
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
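The chunks above are reassembled client-side by concatenating each delta's content. The OpenAI SDK's stream iterator does this for you; a minimal sketch of parsing the raw SSE lines directly:

```python
import json

def accumulate_sse(lines):
    """Concatenate delta content from raw 'data:' SSE lines."""
    text = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if not chunk["choices"]:
            continue  # a usage-only chunk has an empty choices array
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            text.append(delta["content"])
    return "".join(text)

stream = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]

print(accumulate_sse(stream))  # Hello!
```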
To receive token usage while streaming, set stream_options.include_usage to true. The usage data appears in the final chunk before [DONE].

Structured output

Request schema-constrained output with response_format:
```json
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}
```
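In Python the same configuration is an ordinary dict, and the returned content can be parsed directly with json.loads. A sketch, where the sample content string stands in for a real response:

```python
import json

# Same response_format as the JSON example, expressed as a Python dict.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "result",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "answer": {"type": "string"},
                "confidence": {"type": "number"},
            },
            "required": ["answer", "confidence"],
            "additionalProperties": False,
        },
    },
}

# With strict json_schema mode the content is guaranteed to match the
# schema, so parsing is safe. Sample content for illustration:
content = '{"answer": "Paris", "confidence": 0.98}'
result = json.loads(content)
print(result["answer"], result["confidence"])  # Paris 0.98
```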
Structured Outputs (json_schema) guarantees that the output matches your schema exactly. JSON Object mode (json_object) only guarantees valid JSON; the structure is not enforced.

Tool calling

Declare the functions the model may call in tools:

```json
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
```
When the model decides to call a tool, the response returns finish_reason: "tool_calls" and the message.tool_calls array contains the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.
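The round trip can be sketched as follows. The local get_weather implementation and the sample tool call are illustrative; in practice the tool_calls entry comes from the API response:

```python
import json

def get_weather(location: str) -> str:
    """Illustrative local implementation of the declared function."""
    return json.dumps({"location": location, "temp_c": 21})

# Shape of one entry in message.tool_calls (sample data for illustration).
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"location": "Lisbon"}'},
}

# Execute the function and build the tool message to send back.
args = json.loads(tool_call["function"]["arguments"])
result = get_weather(**args)
tool_message = {
    "role": "tool",
    "tool_call_id": tool_call["id"],  # must match the original tool call
    "content": result,
}

# Append the assistant's tool-call message plus this tool message to the
# conversation, then call the endpoint again to get the final answer.
```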
| Field | Description |
|---|---|
| id | Unique completion identifier (e.g., chatcmpl-abc123). |
| object | Always chat.completion. |
| model | The model that generated the response (may include a version suffix). |
| choices | Array of completion choices (usually 1, unless n > 1). |
| choices[].message | The assistant response message with role, content, and optionally tool_calls. |
| choices[].finish_reason | Why the model stopped: stop, length, tool_calls, or content_filter. |
| usage | Token-consumption breakdown: prompt_tokens, completion_tokens, total_tokens, and detailed sub-counts. |
| system_fingerprint | Backend configuration fingerprint for reproducibility debugging. |
Parameter support across providers

| Parameter | OpenAI GPT | Claude (via compat) | Gemini (via compat) |
|---|---|---|---|
| temperature | 0–2 | 0–1 | 0–2 |
| top_p | 0–1 | 0–1 | 0–1 |
| n | 1–128 | 1 only | 1–8 |
| stop | Up to 4 | Up to 4 | Up to 5 |
| tools | ✅ | ✅ | ✅ |
| response_format | ✅ | ✅ (json_schema) | ✅ |
| logprobs | ✅ | ❌ | ❌ |
| reasoning_effort | o-series, GPT-5.1+ | ❌ | ❌ (use thinking for native Gemini) |
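Because accepted ranges differ per provider, a request builder can clamp values before sending. A minimal sketch using the temperature row of the table above (the helper and its lookup table are illustrative, not part of the API):

```python
# Upper bound of temperature per provider family, from the table above.
TEMPERATURE_MAX = {"openai": 2.0, "claude": 1.0, "gemini": 2.0}

def clamp_temperature(provider: str, temperature: float) -> float:
    """Clamp a requested temperature into the target provider's range."""
    upper = TEMPERATURE_MAX.get(provider, 2.0)
    return max(0.0, min(temperature, upper))

print(clamp_temperature("claude", 1.5))  # 1.0
print(clamp_temperature("openai", 1.5))  # 1.5
```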
max_tokens vs max_completion_tokens
- max_tokens: the legacy parameter. Works with most models, but is deprecated for newer OpenAI models.
- max_completion_tokens: the recommended parameter for GPT-4.1, the GPT-5 series, and o-series models. Required for reasoning models, since it covers both output tokens and reasoning tokens.

system vs developer role

- system: the traditional instruction role. Works with all models.
- developer: introduced with the o1 models. Gives stronger instruction following on newer models. Falls back to system behavior on older models.

Prefer developer in new projects targeting GPT-4.1+ or o-series models.

Rate limits

On 429 Too Many Requests, implement exponential backoff:
```python
import time
import random

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise
```
Multi-turn conversations

Maintain context by sending the full conversation history in messages:
```python
messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
```
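Keeping that history current between requests is just appending to the list; a minimal sketch with illustrative conversation content:

```python
history = [{"role": "developer", "content": "You are a helpful assistant."}]

def add_turn(history, user_text, assistant_text):
    """Record one completed user/assistant exchange in the history."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

add_turn(history, "What is Python?",
         "Python is a high-level programming language...")

# The next request sends the full history plus the new user message.
history.append({"role": "user", "content": "What are its main advantages?"})
print(len(history))  # 4
```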
What does each finish_reason mean?

| Value | Meaning |
|---|---|
| stop | Natural completion, or a stop sequence was reached. |
| length | Hit the max_tokens or max_completion_tokens limit. |
| tool_calls | The model invoked one or more tool/function calls. |
| content_filter | Output was filtered by the content policy. |
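Response handlers typically branch on this value; a minimal illustrative sketch:

```python
def next_action(finish_reason: str) -> str:
    """Map a finish_reason to a follow-up action for the caller."""
    if finish_reason == "stop":
        return "done"
    if finish_reason == "length":
        return "truncated: raise max_completion_tokens or continue"
    if finish_reason == "tool_calls":
        return "execute tools and send results back"
    if finish_reason == "content_filter":
        return "output filtered: revise the prompt"
    return "unknown"

print(next_action("length"))
```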
Cost-control tips:

- Set max_completion_tokens to cap output length.
- Use smaller models for simpler tasks (gpt-5.4-mini or gpt-5.4-nano).
- Monitor the usage field of each response.

API reference

Authorization: Bearer token authentication. Use your CometAPI key.
Request body

- model: Model ID to use for this request. See the Models page for current options. Example: "gpt-4.1"
- messages: A list of messages forming the conversation. Each message has a role (system, user, assistant, or developer) and content (a text string or a multimodal content array).
- stream: If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.
- temperature: Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both. Range: 0 <= x <= 2.
- top_p: Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p; for example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both. Range: 0 <= x <= 1.
- n: Number of completion choices to generate for each input message. Defaults to 1.
- stop: Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.
- max_tokens: Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.
- presence_penalty: Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics. Range: -2 <= x <= 2.
- frequency_penalty: Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition. Range: -2 <= x <= 2.
- logit_bias: A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.
- user: A unique identifier for your end user. Helps with abuse detection and monitoring.
- max_completion_tokens: An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.
- response_format: Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.
- tools: A list of tools the model may call. Currently supports function type tools.
- tool_choice: Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.
- logprobs: Whether to return log probabilities of the output tokens.
- top_logprobs: Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.
- reasoning_effort: Controls the reasoning effort for o-series and GPT-5.1+ models. One of low, medium, high.
- stream_options: Options for streaming. Only valid when stream is true.
- service_tier: Specifies the processing tier. One of auto, default, flex, priority.

Response

Successful chat completion response.
- id: Unique completion identifier. Example: "chatcmpl-abc123"
- object: Always chat.completion. Example: "chat.completion"
- created: Unix timestamp of creation. Example: 1774412483
- model: The model used (may include a version suffix). Example: "gpt-5.4-2025-07-16"
- choices: Array of completion choices.
- usage: Token usage breakdown.
- service_tier: The processing tier used. Example: "default"
- system_fingerprint: Backend configuration fingerprint. Example: "fp_490a4ad033"