Chat Completions - CometAPI Documentation

POST /v1/chat/completions

from openai import OpenAI
client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

completion = client.chat.completions.create(
    model="gpt-5.4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)

{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}

CometAPI routes Chat Completions requests to multiple providers, including OpenAI, Claude, and Gemini, through a single OpenAI-compatible interface. Switch between models by changing the model parameter; most OpenAI-compatible SDKs work once base_url is set to https://api.cometapi.com/v1.

Different models support different subsets of parameters and return slightly different response fields. For example, reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models do not support logprobs or n > 1.

For OpenAI Pro models, o-series reasoning models, and Codex models, use the Responses endpoint instead. These model families are more fully supported on the Responses API.

Message roles

Role	Description
`system`	Sets the assistant's behavior and personality. Placed at the start of the conversation.
`developer`	Replaces `system` for newer models (o1+). Provides instructions the model must follow regardless of user input.
`user`	Messages from the end user.
`assistant`	Previous model responses, used to maintain conversation history.
`tool`	Results from tool/function calls. Must include a `tool_call_id` matching the original tool call.

For newer models (GPT-4.1, GPT-5 series, o-series), prefer developer over system for instruction messages. Both work, but developer provides stronger instruction-following behavior.

Send multimodal input

Many models support images and audio alongside text. To send multimodal messages, use the array format for content:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}

The detail parameter controls the depth of image analysis:

low: faster, uses fewer tokens (fixed cost)
high: detailed analysis, consumes more tokens
auto: the model decides (default)
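As an illustration of the array format and the three detail values above, here is a minimal sketch of a helper that builds such a message. The helper name build_image_message is hypothetical and not part of any SDK; it only assembles the dictionary shown earlier and rejects detail values outside the documented set.

```python
from typing import Any

# The three values documented for the `detail` field.
ALLOWED_DETAIL = {"low", "high", "auto"}


def build_image_message(text: str, image_url: str, detail: str = "auto") -> dict[str, Any]:
    """Return a user message whose content is one text part plus one image part."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {sorted(ALLOWED_DETAIL)}, got {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


message = build_image_message(
    "Describe this image",
    "https://example.com/image.png",
    detail="high",
)
```

The resulting dictionary can be placed directly in the messages list of a request.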

Stream responses

To receive incremental output, set stream to true. The response is delivered as Server-Sent Events (SSE), where each event contains a chat.completion.chunk object:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

To include token usage statistics in streaming responses, set stream_options.include_usage to true. The usage data appears in the final chunk before [DONE].
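To make the chunk format concrete, the sketch below reassembles the text from raw SSE lines like the ones shown above. The function name collect_stream is hypothetical, and the code assumes a single choice (n = 1); in practice an SDK normally does this parsing for you.

```python
import json
from typing import Iterable, Optional


def collect_stream(lines: Iterable[str]) -> tuple[str, Optional[str], Optional[dict]]:
    """Accumulate delta content from SSE lines and return (text, finish_reason, usage)."""
    parts: list[str] = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # present only when include_usage is requested
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason, usage


sample_lines = [
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]

text, finish_reason, usage = collect_stream(sample_lines)
```

With the sample events above, text is "Hello!" and finish_reason is "stop"; usage stays None because none of the sample chunks carries a usage object.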

Request structured output

To force the model to return valid JSON matching a specific schema, use response_format:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}

JSON Schema mode (json_schema) guarantees that the output matches your schema exactly. JSON Object mode (json_object) guarantees only valid JSON; the structure is not enforced.
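Because json_object mode does not enforce structure, a client-side check can be useful. The following is a minimal sketch, using only the standard library, that parses the message content and checks it against the two fields of the example schema above. The function name parse_result and the sample string are illustrative assumptions, not real model output.

```python
import json

# Field names and accepted Python types, mirroring the example schema above.
EXPECTED_FIELDS = {"answer": str, "confidence": (int, float)}


def parse_result(content: str) -> dict:
    """Parse message content and verify it has exactly the expected fields and types."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        value = data[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"wrong type for field: {key}")
    extra = set(data) - set(EXPECTED_FIELDS)
    if extra:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return data


# Hypothetical content string, as it might appear in choices[0].message.content.
result = parse_result('{"answer": "Four", "confidence": 0.9}')
```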

Call tools and functions

To let the model call external functions, provide tool definitions:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

When the model decides to call a tool, the response has finish_reason: "tool_calls" and the message.tool_calls array contains the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.
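The round trip described above can be sketched as follows. This assumes the OpenAI-style shape for tool_calls entries (an id plus a function object whose arguments field is a JSON string). The get_weather implementation, the TOOL_REGISTRY mapping, the call ID, and the canned data are all hypothetical stand-ins.

```python
import json


def get_weather(location: str) -> dict:
    """Hypothetical local implementation; returns canned data for illustration."""
    return {"location": location, "temperature_c": 21}


# Maps tool names from the request's `tools` list to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and build the `tool` messages to send back."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        func = TOOL_REGISTRY.get(name)
        output = func(**arguments) if func else {"error": f"unknown tool: {name}"}
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)}
        )
    return results


# Made-up assistant message in the shape returned when finish_reason is "tool_calls".
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"location\": \"Hanoi\"}"},
        }
    ],
}

tool_messages = run_tool_calls(assistant_message)
```

For the follow-up request, append assistant_message and then tool_messages to the existing messages list so that each tool result sits after the call it answers.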

Cross-provider notes

Parameter support across providers

Parameter	OpenAI GPT	Claude (via compat)	Gemini (via compat)
`temperature`	0–2	0–1	0–2
`top_p`	0–1	0–1	0–1
`n`	1–128	1 only	1–8
`stop`	Up to 4	Up to 4	Up to 5
`tools`	✅	✅	✅
`response_format`	✅	✅ (json_schema)	✅
`logprobs`	✅	❌	❌
`reasoning_effort`	o-series, GPT-5.1+	❌	❌ (use `thinking` for native Gemini)
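One way to use the table above is a pre-flight check that flags parameters a provider is unlikely to accept. The sketch below hard-codes the ranges from the table; the provider labels ("openai", "claude", "gemini") and the function name check_params are illustrative assumptions, and the limits themselves may change, so treat this as a convenience rather than an authoritative validator.

```python
# Numeric limits copied from the table above; keys are illustrative provider labels.
LIMITS = {
    "openai": {"temperature": (0, 2), "top_p": (0, 1), "n": (1, 128), "max_stop": 4},
    "claude": {"temperature": (0, 1), "top_p": (0, 1), "n": (1, 1), "max_stop": 4},
    "gemini": {"temperature": (0, 2), "top_p": (0, 1), "n": (1, 8), "max_stop": 5},
}


def check_params(provider: str, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the parameters fit the table."""
    limits = LIMITS[provider]
    problems = []
    for name in ("temperature", "top_p", "n"):
        if name in params:
            low, high = limits[name]
            if not low <= params[name] <= high:
                problems.append(f"{name}={params[name]} is outside {low}-{high} for {provider}")
    stop = params.get("stop")
    if stop is not None:
        count = 1 if isinstance(stop, str) else len(stop)
        if count > limits["max_stop"]:
            problems.append(f"{count} stop sequences exceeds {limits['max_stop']} for {provider}")
    if provider != "openai":
        # The table marks these two as unsupported through the compat layer.
        for name in ("logprobs", "reasoning_effort"):
            if params.get(name):
                problems.append(f"{name} is not supported for {provider}")
    return problems


problems = check_params("claude", {"temperature": 1.5, "n": 2, "logprobs": True})
```

The same parameters pass for "openai", while the "claude" call above reports three problems (temperature, n, and logprobs).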

max_tokens vs. max_completion_tokens

max_tokens: the legacy parameter. Works with most models but is deprecated for newer OpenAI models.
max_completion_tokens: the recommended parameter for GPT-4.1, the GPT-5 series, and o-series models. Required for reasoning models because it covers both output tokens and reasoning tokens.

CometAPI handles the mapping automatically when routing to different providers.

system vs. developer role

system: the traditional instruction role. Works with all models.
developer: introduced with the o1 models. Provides stronger instruction following for newer models. Automatically falls back to system behavior on older models.

Use developer for new projects targeting GPT-4.1+ or o-series models.

FAQ

How do I handle rate limits?

When you receive a 429 Too Many Requests error, implement exponential backoff:

import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key="<COMETAPI_KEY>",
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.4",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise

How do I maintain conversation context?

Include the full conversation history in the messages array:

messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
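Since the full history is resent on every request, long conversations grow in token cost. One possible way to bound that, shown here as a rough sketch, is to keep the instruction messages and only the most recent turns. The function name trim_history and the sample conversation are hypothetical, and this simple version does not try to keep tool messages paired with the assistant message that requested them.

```python
INSTRUCTION_ROLES = ("system", "developer")


def trim_history(messages: list[dict], max_recent: int) -> list[dict]:
    """Keep all instruction messages plus the last `max_recent` other messages."""
    instructions = [m for m in messages if m["role"] in INSTRUCTION_ROLES]
    others = [m for m in messages if m["role"] not in INSTRUCTION_ROLES]
    recent = others[-max_recent:] if max_recent > 0 else []
    return instructions + recent


history = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
    {"role": "assistant", "content": "Readable syntax and a large ecosystem..."},
    {"role": "user", "content": "Which version should I install?"},
]

# Keeps the developer message and the three most recent messages.
trimmed = trim_history(history, max_recent=3)
```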

What does `finish_reason` mean?

Value	Meaning
`stop`	Natural completion, or a stop sequence was reached.
`length`	The `max_tokens` or `max_completion_tokens` limit was reached.
`tool_calls`	The model called one or more tools/functions.
`content_filter`	Output was filtered due to content policy.
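In client code, the four values above usually map to different follow-up steps. A minimal sketch of such a dispatch is shown below; the function name next_action and the action descriptions are illustrative suggestions, not behavior prescribed by the API.

```python
# Suggested follow-up for each documented finish_reason value (illustrative wording).
ACTIONS = {
    "stop": "use the message content as the final answer",
    "length": "output was cut off; raise the token limit or ask the model to continue",
    "tool_calls": "run the requested tools and send their results back",
    "content_filter": "output was filtered; show a fallback message",
}


def next_action(choice: dict) -> str:
    """Map a choice's finish_reason to a suggested follow-up step."""
    reason = choice.get("finish_reason")
    return ACTIONS.get(reason, f"unrecognized finish_reason: {reason!r}")


# A choice object with the same fields as the example response on this page.
choice = {
    "index": 0,
    "message": {"role": "assistant", "content": "Four"},
    "logprobs": None,
    "finish_reason": "stop",
}

action = next_action(choice)
```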

How do I control costs?

Use max_completion_tokens to limit output length.
Choose cost-efficient models (for example, gpt-5.4-mini or gpt-5.4-nano for simpler tasks).
Keep prompts concise and avoid redundant context.
Monitor token usage in the usage field of the response.
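To turn the usage field into a cost figure, multiply the token counts by your per-token prices. The sketch below does that arithmetic; the two prices are placeholders invented for illustration, not CometAPI rates, so substitute the values from your own pricing page.

```python
def estimate_cost(usage: dict, input_price_per_million: float, output_price_per_million: float) -> float:
    """Estimate request cost from a usage object, given prices per one million tokens."""
    input_cost = usage["prompt_tokens"] / 1_000_000 * input_price_per_million
    output_cost = usage["completion_tokens"] / 1_000_000 * output_price_per_million
    return input_cost + output_cost


# Token counts taken from the example response on this page.
usage = {"prompt_tokens": 29, "completion_tokens": 2, "total_tokens": 31}

# Placeholder prices (currency units per million tokens); these are NOT real rates.
cost = estimate_cost(usage, input_price_per_million=1.00, output_price_per_million=4.00)
```

Summing this value across requests gives a running total that can be compared against a budget.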

Authorization

Authorization

string

header

required

Bearer token authentication. Use your CometAPI key.
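For clients that do not use an SDK, the header can be set by hand. The sketch below builds (but does not send) a request with the standard library. The full URL is an assumption derived from the base_url used elsewhere on this page plus the chat/completions path, and build_request is a hypothetical helper name.

```python
import json
import urllib.request

# Assumed endpoint: the documented base_url followed by /chat/completions.
ENDPOINT = "https://api.cometapi.com/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the key as a Bearer token; nothing is sent here."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request(
    "<COMETAPI_KEY>",
    {"model": "gpt-5.4", "messages": [{"role": "user", "content": "Hello!"}]},
)
# To actually send it: urllib.request.urlopen(request)
```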

Body

application/json

model

string

default: gpt-5.4

required

Model ID to use for this request. See the Models page for current options.

Example:

"gpt-4.1"

messages

object[]

required

A list of messages forming the conversation. Each message has a role (system, developer, user, assistant, or tool) and content (text string or multimodal content array).

stream

boolean

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature

number

default: 1

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Recommended to adjust this or top_p, but not both.

Required range: 0 <= x <= 2

top_p

number

default: 1

Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the top 10% probability tokens are considered. Recommended to adjust this or temperature, but not both.

Required range: 0 <= x <= 1

n

integer

default: 1

Number of completion choices to generate for each input message. Defaults to 1.

stop

string

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens

integer

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty

number

default: 0

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Required range: -2 <= x <= 2

frequency_penalty

number

default: 0

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Required range: -2 <= x <= 2

logit_bias

object

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.
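As a small illustration, the sketch below builds a logit_bias object and rejects values outside the documented range. The helper name make_logit_bias is hypothetical, and the token IDs are made-up placeholders, since real IDs depend on the tokenizer of the model you call.

```python
def make_logit_bias(biases: dict[int, float]) -> dict[str, float]:
    """Convert {token_id: bias} into the JSON-ready form, enforcing the -100..100 range."""
    result = {}
    for token_id, bias in biases.items():
        if not -100 <= bias <= 100:
            raise ValueError(f"bias for token {token_id} must be between -100 and 100, got {bias}")
        result[str(token_id)] = bias  # JSON object keys must be strings
    return result


# Placeholder token IDs: -100 effectively bans a token, small values nudge likelihood.
logit_bias = make_logit_bias({1234: -100, 5678: 0.5})
```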

user

string

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens

integer

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format

object

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools

object[]

A list of tools the model may call. Currently supports function type tools.

tool_choice

default: auto

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs

boolean

default: false

Whether to return log probabilities of the output tokens.

top_logprobs

integer

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Required range: 0 <= x <= 20
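Log probabilities are natural logarithms, so exponentiating them gives ordinary probabilities. The sketch below assumes the OpenAI-style response shape, in which choices[0].logprobs.content is a list of entries with token and logprob fields; the sample entries are invented for illustration.

```python
import math


def token_probabilities(logprobs_content: list[dict]) -> list[tuple[str, float]]:
    """Convert each entry's natural-log probability into a probability between 0 and 1."""
    return [(entry["token"], math.exp(entry["logprob"])) for entry in logprobs_content]


# Invented sample in the assumed shape of choices[0].logprobs.content.
sample_content = [
    {"token": "Four", "logprob": 0.0},
    {"token": "!", "logprob": math.log(0.5)},
]

probabilities = token_probabilities(sample_content)
```

A logprob of 0.0 corresponds to a probability of 1, and more negative values correspond to less likely tokens.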

reasoning_effort

enum<string>

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options: low, medium, high

stream_options

object

Options for streaming. Only valid when stream is true.

service_tier

enum<string>

Specifies the processing tier.

Available options: auto, default, flex, priority

Response

Successful chat completion response.

id

string

Unique completion identifier.

Example:

"chatcmpl-abc123"

object

enum<string>

Available options: chat.completion

Example:

"chat.completion"

created

integer

Unix timestamp of creation.

Example:

1774412483

model

string

The model used (may include version suffix).

Example:

"gpt-5.4-2025-07-16"

choices

object[]

Array of completion choices.

usage

object

service_tier

string

Example:

"default"

system_fingerprint

string | null

Example:

"fp_490a4ad033"
