`chat/completions` is the most common API interface for LLMs. It takes a conversation, composed of a list of messages, as input and returns the model's reply. The interface follows the OpenAI `chat/completions` format; for full details, please refer to the OpenAI official documentation.

## Request Parameters

### model `string` required

The ID of the model to use.

```json
{
  "model": "gpt-4"
}
```

### messages `array` required

The list of messages that make up the conversation. Each message contains:

- `role` *string* - The role of the message. Possible values:
  - `system` - System message, used to set assistant behavior
  - `user` - User message
  - `assistant` - Assistant's historical replies
- `content` *string* - The specific content of the message

```json
{
  "messages": [
    {
      "role": "system",
      "content": "You are a professional AI assistant"
    },
    {
      "role": "user",
      "content": "What is machine learning?"
    }
  ]
}
```

### stream `boolean` optional

When set to `true`, responses are returned incrementally in the form of Server-Sent Events (SSE). Defaults to `false`.

```json
{
  "stream": true
}
```

### temperature `number` optional

Controls the randomness of the output; higher values produce more varied replies.

### max_tokens `integer` optional

The maximum number of tokens to generate in the reply.

### top_p `number` optional

Nucleus sampling parameter. It is generally recommended not to adjust `temperature` and `top_p` simultaneously.

If the API returns `429 Too Many Requests`, implement exponential backoff retry.

For multi-turn conversations, include the assistant's earlier replies in the `messages` array.

### finish_reason

Each returned choice carries a `finish_reason` indicating why generation stopped:

| Value | Meaning |
|---|---|
| `stop` | Natural completion |
| `length` | Reached `max_tokens` limit |
| `content_filter` | Triggered content filter |
| `function_call` | Model called a function |
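The exponential backoff retry recommended above for `429 Too Many Requests` can be sketched as follows. This is a minimal illustration using only the Python standard library; the retry count, base delay, and jitter are assumed defaults, not values prescribed by the API.

```python
import random
import time

def backoff_delays(max_retries=5, base=1.0, cap=60.0):
    """Yield sleep durations for exponential backoff with jitter.

    These parameter values are illustrative defaults, not
    requirements of the API.
    """
    for attempt in range(max_retries):
        # Delay doubles each attempt (base * 2^attempt), capped,
        # with a small random jitter to avoid synchronized retries.
        delay = min(cap, base * (2 ** attempt))
        yield delay + random.uniform(0, delay * 0.1)

def call_with_backoff(send_request, **backoff_kw):
    """Retry `send_request` (a callable returning (status, body))
    while the server answers 429 Too Many Requests."""
    for delay in backoff_delays(**backoff_kw):
        status, body = send_request()
        if status != 429:
            return status, body
        time.sleep(delay)
    raise RuntimeError("Gave up after repeated 429 responses")
```

A real client would wrap the actual HTTP call to the endpoint in `send_request`; the proportional jitter here is one of several common jitter strategies.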
Use `max_tokens` to limit output length; actual token consumption is reported in the response's `usage` field.

## Request Example

```shell
curl --location --request POST 'https://api.cometapi.com/v1/chat/completions' \
--header 'Authorization: Bearer {{api-key}}' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-5-mini",
    "messages": [
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    "stream": false
}'
```

## Response Example

```json
{
    "id": "chatcmpl-AreYSBEwmzB0kY3GxzBEhE1Olct83",
    "object": "chat.completion",
    "created": 1737350640,
    "model": "gpt-4o-2024-08-06",
    "system_fingerprint": "fp_f3927aa00d",
    "choices": [
        {
            "index": 0,
            "message": {
                "content": "Hello! How can I assist you today?",
                "role": "assistant"
            },
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "completion_tokens": 9,
        "completion_tokens_details": {},
        "prompt_tokens": 9,
        "prompt_tokens_details": {},
        "total_tokens": 18
    }
}
```
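When `stream` is set to `true`, the response instead arrives as SSE lines of the form `data: <json-chunk>`, terminated by `data: [DONE]`, with incremental text carried in each chunk's `choices[0].delta.content` (the standard OpenAI streaming shape, which this sketch assumes). A minimal parser, given an iterable of already-decoded response lines (e.g. from your HTTP client's `iter_lines()`):

```python
import json

def iter_stream_content(lines):
    """Extract incremental text from chat/completions SSE lines.

    Assumes the OpenAI streaming chunk shape: each `data:` payload
    is a JSON object whose `choices[0].delta` may carry a piece of
    `content`.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # server signals end of stream
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]
```

`"".join(iter_stream_content(lines))` reassembles the full reply from the streamed fragments.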