Chat Completions - CometAPI Documentation

POST /v1/chat/completions

from openai import OpenAI
client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)

{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}

CometAPI routes Chat Completions requests to multiple providers, including OpenAI, Claude, and Gemini, through a single OpenAI-compatible interface. Switch between models by changing the model parameter; most OpenAI-compatible SDKs work once base_url is set to https://api.cometapi.com/v1.
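As a minimal sketch of switching models, the helper below builds the same request for two model IDs using only the Python standard library. The `/chat/completions` path is what OpenAI-compatible SDKs append to `base_url`; the helper name `build_chat_request` is an illustrative assumption, and the request is constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.cometapi.com/v1"


def build_chat_request(model, messages, api_key="<COMETAPI_KEY>"):
    """Build (but do not send) a Chat Completions request for the given model."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


messages = [{"role": "user", "content": "Hello!"}]

# Only the model ID differs between the two requests.
request_a = build_chat_request("gpt-5.4", messages)
request_b = build_chat_request("gpt-5.4-mini", messages)

# To send one of them: urllib.request.urlopen(request_a)
```

Everything except the `model` field stays identical, which is the point of the single interface.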

Different models support different subsets of parameters and return slightly different response fields. For example, reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models do not support logprobs or n > 1.

For OpenAI Pro models, o-series reasoning models, and Codex models, use the Responses endpoint. These model families are more fully supported in the Responses API.

Message roles

Role	Description
`system`	Sets the assistant's behavior and personality. Placed at the start of the conversation.
`developer`	Replaces `system` in newer models (o1+). Provides instructions the model should follow regardless of user input.
`user`	Messages from the end user.
`assistant`	Previous model responses, used to maintain conversation history.
`tool`	Results of tool/function calls. Must include the `tool_call_id` that matches the original tool call.

For newer models (GPT-4.1, GPT-5 series, o-series), prefer developer over system for instruction messages. Both work, but developer gives stronger instruction-following behavior.

Send multimodal input

Many models support images and audio alongside text. To send multimodal messages, use the array format for content:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}

The detail parameter controls the depth of image analysis:

low: fastest, uses fewer tokens (fixed cost)
high: detailed analysis, consumes more tokens
auto: the model decides (default)
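A small sketch of how the array format and the detail setting might be wrapped in a helper. The function name `image_message` is hypothetical; the message shape and the three detail values come from the example and list above.

```python
VALID_DETAIL = {"low", "high", "auto"}


def image_message(text, image_url, detail="auto"):
    """Return a user message that pairs a text prompt with one image."""
    if detail not in VALID_DETAIL:
        raise ValueError(f"detail must be one of {sorted(VALID_DETAIL)}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


# "low" trades analysis depth for speed and a fixed token cost.
message = image_message(
    "Describe this image",
    "https://example.com/image.png",
    detail="low",
)
```

Validating detail locally catches a typo before the request is sent.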

Stream responses

To receive incremental output, set stream to true. The response is delivered as Server-Sent Events (SSE), where each event contains a chat.completion.chunk object:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

To include token usage statistics in streaming responses, set stream_options.include_usage to true. The usage data appears in the final chunk before [DONE].
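The event stream can be reassembled with a few lines of parsing. The sketch below is illustrative rather than a full SSE client: it assumes one choice per chunk (n = 1) and takes the lines as an iterable of strings, reusing the sample events shown earlier. The helper name `collect_stream_text` is hypothetical.

```python
import json


def collect_stream_text(lines):
    """Concatenate delta.content from SSE lines until the [DONE] sentinel."""
    parts = []
    finish_reason = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # A usage-only chunk has an empty choices list, so this loop is skipped.
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason


sample = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text, reason = collect_stream_text(sample)
print(text, reason)
```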

Request structured output

To force the model to return valid JSON that matches a specific schema, use response_format:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}

JSON Schema mode (json_schema) guarantees that the output matches your schema exactly. JSON Object mode (json_object) guarantees only valid JSON; the structure is not enforced.
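Whichever mode is used, the result arrives as a JSON string in message.content and still needs to be parsed. The sketch below parses such a string and re-checks the two fields from the example schema, which is mainly useful with json_object mode or with providers that do not enforce the schema. The helper name and the sample content string are illustrative assumptions.

```python
import json


def parse_result(content):
    """Parse message.content and check it against the example schema's fields."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != {"answer", "confidence"}:
        raise ValueError(f"unexpected keys: {sorted(data)}")
    if not isinstance(data["answer"], str):
        raise ValueError("answer must be a string")
    confidence = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    return data


# Hypothetical content string, shaped like the schema above.
result = parse_result('{"answer": "Four", "confidence": 0.98}')
```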

Call tools and functions

To let the model call external functions, provide tool definitions:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

When the model decides to call a tool, the response has finish_reason: "tool_calls" and the message.tool_calls array contains the function name and arguments. You then run the function and send the result back as a tool message with the matching tool_call_id.
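The round trip can be sketched as a small dispatcher. This is an illustration under assumptions: the assistant message below is hand-written in the OpenAI-compatible tool_calls shape, `get_weather` is a stand-in that returns canned data, and the `TOOLS` registry and `run_tool_calls` helper are hypothetical names.

```python
import json


def get_weather(location):
    """Stand-in implementation; a real version would query a weather service."""
    return {"location": location, "forecast": "sunny"}


# Maps the function names declared in `tools` to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return the matching `tool` messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return results


# Hypothetical assistant message from a response with finish_reason "tool_calls".
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
    }],
}

tool_messages = run_tool_calls(assistant_message)
```

The follow-up request then sends the earlier messages, the assistant message, and `tool_messages`, in that order.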

Cross-provider notes

Parameter support across providers

Parameter	OpenAI GPT	Claude (via compat)	Gemini (via compat)
`temperature`	0–2	0–1	0–2
`top_p`	0–1	0–1	0–1
`n`	1–128	1 only	1–8
`stop`	Up to 4	Up to 4	Up to 5
`tools`	✅	✅	✅
`response_format`	✅	✅ (json_schema)	✅
`logprobs`	✅	❌	❌
`reasoning_effort`	o-series, GPT-5.1+	❌	❌ (use `thinking` for native Gemini)
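One way to use the table is to normalize request parameters on the client before sending. The sketch below transcribes the temperature, n, and logprobs rows; the provider labels, the `LIMITS` structure, and the `adjust_params` helper are illustrative assumptions and not part of CometAPI.

```python
# Limits transcribed from the table above; keys are illustrative provider labels.
LIMITS = {
    "openai": {"temperature": (0.0, 2.0), "max_n": 128, "logprobs": True},
    "claude": {"temperature": (0.0, 1.0), "max_n": 1, "logprobs": False},
    "gemini": {"temperature": (0.0, 2.0), "max_n": 8, "logprobs": False},
}


def adjust_params(provider, params):
    """Return a copy of params clamped to what the provider column allows."""
    limits = LIMITS[provider]
    adjusted = dict(params)
    if "temperature" in adjusted:
        low, high = limits["temperature"]
        adjusted["temperature"] = max(low, min(high, adjusted["temperature"]))
    if "n" in adjusted:
        adjusted["n"] = max(1, min(limits["max_n"], adjusted["n"]))
    if not limits["logprobs"]:
        adjusted.pop("logprobs", None)  # unsupported, so drop it
    return adjusted


claude_params = adjust_params("claude", {"temperature": 1.5, "n": 3, "logprobs": True})
print(claude_params)
```

Clamping silently changes behavior, so raising an error instead may be the better choice when exact parameters matter.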

max_tokens vs max_completion_tokens

max_tokens: The legacy parameter. Works with most models but is deprecated for newer OpenAI models.
max_completion_tokens: The recommended parameter for GPT-4.1, GPT-5 series, and o-series models. Required for reasoning models, because it covers both output tokens and reasoning tokens.

CometAPI handles the mapping automatically when routing to different providers.
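If you prefer to choose the parameter explicitly on the client, a helper along these lines is one option. The prefix list is an assumption that only approximates "GPT-4.1, GPT-5 series, and o-series" by model ID; it is not an official rule, and the helper name is hypothetical.

```python
# Assumption: these ID prefixes approximate the newer model families named above.
NEWER_PREFIXES = ("gpt-4.1", "gpt-5", "o1", "o3", "o4")


def with_output_limit(payload, limit):
    """Return a copy of payload with the output cap under the appropriate key."""
    if payload["model"].startswith(NEWER_PREFIXES):
        key = "max_completion_tokens"
    else:
        key = "max_tokens"
    return {**payload, key: limit}


newer = with_output_limit({"model": "gpt-5.4", "messages": []}, 256)
older = with_output_limit({"model": "gpt-3.5-turbo", "messages": []}, 256)
```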

system vs developer role

system: The traditional instruction role. Works with all models.
developer: Introduced with the o1 models. Provides stronger instruction following for newer models. Falls back to system behavior on older models.

Use developer in new projects that target GPT-4.1+ or o-series models.
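When migrating existing prompts, the role can be renamed mechanically. A minimal sketch, with the helper name `use_developer_role` as an assumption:

```python
def use_developer_role(messages):
    """Return a copy of messages with every `system` role renamed to `developer`."""
    return [
        {**m, "role": "developer"} if m.get("role") == "system" else dict(m)
        for m in messages
    ]


legacy = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
migrated = use_developer_role(legacy)
```

The original list is left untouched, so the same prompt can still be sent with system where that is preferred.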

FAQ

How do I handle rate limits?

When you encounter 429 Too Many Requests, implement exponential backoff:

import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    """Call the API, retrying on 429 responses with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the caller
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            time.sleep((2 ** attempt) + random.random())

How do I maintain conversation context?

Include the full conversation history in the messages array:

messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
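Between requests, the history grows by one assistant reply and one new user message. The sketch below assumes the response has been parsed into a plain dict (for example from the raw JSON body); the helper name `add_turn` and the sample response are illustrative.

```python
def add_turn(messages, response, next_user_text):
    """Append the assistant reply from a response dict, then the next user message."""
    reply = response["choices"][0]["message"]
    return messages + [
        {"role": "assistant", "content": reply["content"]},
        {"role": "user", "content": next_user_text},
    ]


history = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
]

# Hypothetical response, trimmed to the fields this helper reads.
response = {
    "choices": [
        {"message": {"role": "assistant", "content": "Python is a high-level programming language..."}}
    ]
}

history = add_turn(history, response, "What are its main advantages?")
```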

What does `finish_reason` mean?

Value	Meaning
`stop`	Natural completion, or a stop sequence was reached.
`length`	The `max_tokens` or `max_completion_tokens` limit was reached.
`tool_calls`	The model invoked one or more tool/function calls.
`content_filter`	The output was filtered due to content policy.
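In client code, these values usually drive a branch. A minimal sketch, where the action labels on the right are hypothetical names chosen for illustration:

```python
# Maps each finish_reason from the table above to an illustrative client action.
ACTIONS = {
    "stop": "done",
    "length": "truncated",       # consider raising the token limit
    "tool_calls": "run_tools",   # execute the tools, then send the results back
    "content_filter": "filtered",
}


def next_step(choice):
    """Return the action label for a choice, or "unknown" for unlisted values."""
    return ACTIONS.get(choice.get("finish_reason"), "unknown")


step = next_step({"index": 0, "finish_reason": "length"})
```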

How do I control costs?

Use max_completion_tokens to cap output length.
Choose cost-effective models (for example, gpt-5.4-mini or gpt-5.4-nano for simpler tasks).
Keep prompts concise and avoid redundant context.
Monitor token usage in the usage response field.
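Monitoring can start as a simple running total over the usage field. The sketch below assumes responses parsed into plain dicts and reuses the token counts from the sample response on this page; the helper name `total_usage` is hypothetical, and no pricing is assumed.

```python
def total_usage(responses):
    """Sum prompt, completion, and total tokens across response dicts."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        usage = response.get("usage") or {}
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


sample = {"usage": {"prompt_tokens": 29, "completion_tokens": 2, "total_tokens": 31}}
totals = total_usage([sample, sample])
print(totals)
```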

Authorizations

Authorization

string

header

required

Bearer token authentication. Use your CometAPI key.

Body

application/json

model

string

default: gpt-5.4

required

Model ID to use for this request. See the Models page for current options.

Example:

"gpt-4.1"

messages

object[]

required

A list of messages forming the conversation. Each message has a role (system, developer, user, assistant, or tool) and content (text string or multimodal content array).

stream

boolean

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature

number

default: 1

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Adjust this or top_p, but not both.

Required range: 0 <= x <= 2

top_p

number

default: 1

Nucleus sampling parameter. The model considers only the most likely tokens whose cumulative probability reaches top_p. For example, 0.1 means only the tokens that make up the top 10% of probability mass are considered. Adjust this or temperature, but not both.

Required range: 0 <= x <= 1
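To make the top_p cutoff concrete, here is a toy sketch of nucleus filtering over a hand-made four-token distribution. It illustrates the general technique only; how each provider implements sampling is not specified here, and the token names and probabilities are invented for the example.

```python
def nucleus(probs, top_p):
    """Keep the most likely tokens until their cumulative probability reaches top_p."""
    kept = {}
    cumulative = 0.0
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    for token, p in ranked:
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / cumulative for token, p in kept.items()}


toy = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "emu": 0.05}

narrow = nucleus(toy, 0.1)   # the single most likely token is enough
wider = nucleus(toy, 0.75)   # needs "cat" and "dog" to reach 0.75
```

With a low top_p the model can only pick among the few most likely tokens; with top_p = 1 nothing is filtered out.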

n

integer

default: 1

Number of completion choices to generate for each input message. Defaults to 1.

stop

string

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens

integer

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty

number

default: 0

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Required range: -2 <= x <= 2

frequency_penalty

number

default: 0

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Required range: -2 <= x <= 2

logit_bias

object

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.

user

string

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens

integer

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format

object

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools

object[]

A list of tools the model may call. Currently supports function type tools.

tool_choice

default: auto

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs

boolean

default: false

Whether to return log probabilities of the output tokens.

top_logprobs

integer

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Required range: 0 <= x <= 20
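Log probabilities are easier to read once converted back to probabilities. The sketch below assumes the OpenAI-compatible response shape, in which each choice carries logprobs.content as a list of entries with token and logprob fields. The sample choice is hand-written for illustration, and models without logprobs support return null there.

```python
import math


def token_probabilities(choice):
    """Convert each returned token's log probability into a plain probability."""
    entries = (choice.get("logprobs") or {}).get("content") or []
    return [(entry["token"], math.exp(entry["logprob"])) for entry in entries]


# Hypothetical choice, shaped like a response requested with logprobs set to true.
choice = {
    "index": 0,
    "logprobs": {
        "content": [
            {"token": "Four", "logprob": math.log(0.5), "top_logprobs": []},
        ]
    },
}

pairs = token_probabilities(choice)
```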

reasoning_effort

enum<string>

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options: low, medium, high

stream_options

object

Options for streaming. Only valid when stream is true.

service_tier

enum<string>

Specifies the processing tier.

Available options: auto, default, flex, priority

Response

Successful chat completion response.

id

string

Unique completion identifier.

Example:

"chatcmpl-abc123"

object

enum<string>

Available options: chat.completion

Example:

"chat.completion"

created

integer

Unix timestamp of creation.

Example:

1774412483

model

string

The model used (may include version suffix).

Example:

"gpt-5.4-2025-07-16"

choices

object[]

Array of completion choices.

usage

object

service_tier

string

Example:

"default"

system_fingerprint

string | null

Example:

"fp_490a4ad033"
