Create a Message - CometAPI Documentation

POST /v1/messages

import os
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key=os.environ["COMETAPI_KEY"],
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Hello, world"}
    ],
)

print(message.content[0].text)

{
  "id": "<string>",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "<string>",
      "text": "<string>",
      "thinking": "<string>",
      "signature": "<string>",
      "id": "<string>",
      "name": "<string>",
      "input": {}
    }
  ],
  "model": "<string>",
  "stop_reason": "<string>",
  "stop_sequence": "<string>",
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "cache_creation_input_tokens": 123,
    "cache_read_input_tokens": 123,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 123,
      "ephemeral_1h_input_tokens": 123
    }
  }
}

CometAPI supports the Anthropic Messages API natively, giving you direct access to Claude models with all Anthropic-specific features. Use this endpoint for Claude-exclusive capabilities like extended thinking, prompt caching, and effort control.

For Claude Code setup and a direct Messages API test, start with the Claude API quickstart.

Anthropic request parameters and response fields can change as Claude features evolve. Check the Anthropic Messages API documentation for the latest complete parameter list and provider-specific behavior.

Both x-api-key and Authorization: Bearer headers are supported for authentication. The official Anthropic SDKs use x-api-key by default.
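If you call the endpoint without an SDK, the request needs only a JSON body and the headers described on this page. The sketch below uses Python's standard-library urllib.request to build a request without sending it. The helper name build_messages_request and its parameters are hypothetical. The /v1/messages path is assumed from the base URL used in the SDK examples.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.cometapi.com"


def build_messages_request(api_key, body, use_bearer=False,
                           version="2023-06-01", betas=None):
    """Build (but do not send) a POST /v1/messages request.

    Hypothetical helper. CometAPI accepts either auth header style.
    """
    headers = {
        "content-type": "application/json",
        "anthropic-version": version,
    }
    if use_bearer:
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-api-key"] = api_key
    if betas:
        # anthropic-beta takes a comma-separated list of feature names
        headers["anthropic-beta"] = ",".join(betas)
    # urllib normalizes header names (e.g. "X-api-key"); HTTP treats them case-insensitively
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_messages_request(
    os.environ.get("COMETAPI_KEY", "sk-placeholder"),
    {"model": "claude-sonnet-4-6", "max_tokens": 1024,
     "messages": [{"role": "user", "content": "Hello!"}]},
)
# To send it: urllib.request.urlopen(req)
```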

Quick start

To use the official Anthropic SDK with CometAPI, set the base URL:

import os
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key=os.environ["COMETAPI_KEY"],
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

Enable extended thinking

Enable Claude’s step-by-step reasoning with the thinking parameter. The response includes thinking content blocks showing Claude’s internal reasoning before the final answer.

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes."}
    ],
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Thinking requires budget_tokens of at least 1,024, and budget_tokens must be less than max_tokens. Thinking tokens count toward your max_tokens limit, so set max_tokens high enough to accommodate both the thinking and the final response.
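A client-side check can catch an invalid thinking budget before the request is sent. This is a minimal sketch, and the helper name is hypothetical. The rule that budget_tokens must be smaller than max_tokens comes from Anthropic's documentation for standard extended thinking.

```python
MIN_THINKING_BUDGET = 1024  # documented minimum for budget_tokens


def thinking_params(max_tokens: int, budget_tokens: int) -> dict:
    """Return validated max_tokens/thinking kwargs for messages.create().

    Hypothetical helper. Thinking tokens count toward max_tokens, so the
    budget must leave room for the final answer.
    """
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be >= {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_params(max_tokens=16000, budget_tokens=10000)
# message = client.messages.create(model="claude-sonnet-4-6", messages=[...], **params)
```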

Cache prompts

To reduce latency and cost on subsequent requests, cache large system prompts or conversation prefixes. Add cache_control to content blocks that should be cached:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer. [Long detailed instructions...]",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Review this code..."}],
)

Cache usage is reported in the response usage field:

cache_creation_input_tokens — tokens written to cache (billed at a higher rate)
cache_read_input_tokens — tokens read from cache (billed at a reduced rate)

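Prompt caching requires the cached prefix to meet a model-specific minimum length. The prefix is all content up to and including the block marked with cache_control, and the minimum is 1,024 tokens for many Claude models. Shorter prefixes are processed normally but are not cached.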
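To check whether caching is taking effect, compare the three input counters in usage. This sketch assumes, as Anthropic's prompt-caching documentation describes, that input_tokens counts only uncached input. Under that assumption, the total prompt size is the sum of all three fields. The function name and the sample numbers are hypothetical.

```python
def cache_summary(usage: dict) -> dict:
    """Summarize prompt-cache activity from a response's usage object."""
    uncached = usage.get("input_tokens", 0)
    # SDK dumps may contain None for absent counters, so fall back to 0
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    total = uncached + written + read
    return {
        "total_input_tokens": total,
        "cache_hit_ratio": read / total if total else 0.0,
    }


# With the SDK, e.g.: cache_summary(message.usage.model_dump())
first = cache_summary({"input_tokens": 20, "cache_creation_input_tokens": 1800,
                       "cache_read_input_tokens": 0})
second = cache_summary({"input_tokens": 20, "cache_creation_input_tokens": 0,
                        "cache_read_input_tokens": 1800})
```

The first request writes the prefix to the cache. A repeat within the cache lifetime should show most input tokens as cache reads.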

Stream responses

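To stream responses using Server-Sent Events (SSE), set "stream": true in the request body. The Python SDK's client.messages.stream() helper used below handles this for you. Events arrive in this order: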

message_start: contains the message metadata and initial usage
content_block_start: marks the beginning of each content block
content_block_delta: incremental content for the current block (text_delta for text; thinking_delta and input_json_delta for thinking and tool-use blocks)
content_block_stop: marks the end of each content block
message_delta: final stop_reason and complete usage
message_stop: signals the end of the stream

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
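Without the SDK helper, the stream arrives as raw SSE lines. Each event has an event: line naming the event type, followed by a data: line carrying JSON. The following sketch accumulates the text from text_delta events. It assumes each event's JSON fits on a single data: line, which simplifies the full SSE format. The function name is hypothetical, and the sample events are abbreviated.

```python
import json
from typing import Iterable


def collect_text(sse_lines: Iterable[str]) -> str:
    """Concatenate text deltas from a Messages API SSE stream."""
    parts = []
    event = None
    for raw in sse_lines:
        line = raw.rstrip("\r\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "content_block_delta":
            payload = json.loads(line[len("data:"):])
            delta = payload.get("delta", {})
            # thinking_delta / input_json_delta are skipped; only text is kept
            if delta.get("type") == "text_delta":
                parts.append(delta["text"])
        elif line == "":
            event = None  # a blank line ends the current event
    return "".join(parts)


sample = [
    "event: message_start",
    'data: {"type": "message_start", "message": {"id": "msg_1"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
    "",
]
print(collect_text(sample))  # Hello
```

With a live HTTP response, pass the decoded response lines instead of the sample list.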

Control effort

To control how much effort Claude puts into generating a response, use output_config.effort:

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this briefly."}
    ],
    output_config={"effort": "low"},  # "low", "medium", or "high"
)
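In the raw JSON body, effort is nested inside output_config rather than passed at the top level. The sketch below builds that body and rejects unsupported values. The allowed set is taken from the comment in the example above, and the helper name is hypothetical. If your installed SDK version does not accept output_config as a keyword argument, its extra_body request option may be able to pass it through. Treat that as an assumption to verify against your SDK version.

```python
import json

EFFORT_LEVELS = {"low", "medium", "high"}  # values listed in the example above


def effort_body(model: str, prompt: str, effort: str, max_tokens: int = 4096) -> dict:
    """Build a Messages request body with output_config.effort set."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {sorted(EFFORT_LEVELS)}")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "output_config": {"effort": effort},
    }


body = effort_body("claude-opus-4-6", "Summarize this briefly.", "low")
print(json.dumps(body["output_config"]))  # {"effort": "low"}
```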

Use server tools

Claude supports server-side tools that run on Anthropic’s infrastructure: Web Fetch and Web Search.

Web Fetch fetches and analyzes content from URLs:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze the content at https://arxiv.org/abs/1512.03385"}
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5}
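    ],
    # Anthropic documents web fetch as a beta tool; this header may be required
    extra_headers={"anthropic-beta": "web-fetch-2025-09-10"},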
)

Web Search searches the web for real-time information:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the latest developments in AI?"}
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
    ],
)
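A response that used a server tool contains more than a single text block. Anthropic documents server_tool_use blocks (the tool call Claude made) and result blocks such as web_search_tool_result, which appear alongside the final text blocks. It also documents a usage.server_tool_use object with request counts. The sketch below works on the plain-dict form of a response, for example message.model_dump() with the Python SDK. The helper name and the sample response are hypothetical, and the block shapes are simplified.

```python
def summarize_tool_response(message: dict) -> dict:
    """Split a Messages response into final text and server-tool activity."""
    text_parts = []
    tool_calls = []
    for block in message.get("content", []):
        if block.get("type") == "text":
            text_parts.append(block["text"])
        elif block.get("type") == "server_tool_use":
            tool_calls.append(block.get("name"))
        # result blocks (e.g. web_search_tool_result) are skipped here
    server_usage = (message.get("usage") or {}).get("server_tool_use") or {}
    return {
        "text": "".join(text_parts),
        "tool_calls": tool_calls,
        "server_requests": server_usage,
    }


search_response = {
    "content": [
        {"type": "server_tool_use", "id": "srvtoolu_1", "name": "web_search",
         "input": {"query": "latest AI developments"}},
        {"type": "web_search_tool_result", "tool_use_id": "srvtoolu_1", "content": []},
        {"type": "text", "text": "Here is a summary."},
    ],
    "usage": {"input_tokens": 10, "output_tokens": 5,
              "server_tool_use": {"web_search_requests": 1}},
}
summary = summarize_tool_response(search_response)
```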

Response example

A typical response from CometAPI’s Anthropic endpoint:

{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 19,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 4
  }
}

Compare with OpenAI-compatible endpoint

Feature	Anthropic Messages (`/v1/messages`)	OpenAI-Compatible (`/v1/chat/completions`)
Extended thinking	`thinking` parameter with `budget_tokens`	Not available
Prompt caching	`cache_control` on content blocks	Not available
Effort control	`output_config.effort`	Not available
Web fetch/search	Server tools (`web_fetch`, `web_search`)	Not available
Auth header	`x-api-key` or `Bearer`	`Bearer` only
Response format	Anthropic format (`content` blocks)	OpenAI format (`choices`, `message`)
Models	Claude only	Multi-provider (GPT, Claude, Gemini, etc.)

Authorizations

x-api-key

string

header

required

Your CometAPI key passed via the x-api-key header. Authorization: Bearer $COMETAPI_KEY is also supported.

Headers

anthropic-version

string

default:2023-06-01

The Anthropic API version to use. Defaults to 2023-06-01.

Example:

"2023-06-01"

anthropic-beta

string

Comma-separated list of beta features to enable. Examples: max-tokens-3-5-sonnet-2024-07-15, pdfs-2024-09-25, output-128k-2025-02-19.

Body

application/json

model

string

required

The Claude model to use. See the Models page for current Claude model IDs.

Example:

"claude-sonnet-4-6"

messages

object[]

required

The conversation messages. Messages should alternate between user and assistant roles; consecutive turns with the same role are combined into a single turn. Each message's content can be a string or an array of content blocks (text, image, document, tool_use, tool_result). There is a limit of 100,000 messages per request.
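When you build a history programmatically, it is easy to produce consecutive turns from the same role by accident. This sketch merges such turns so the roles alternate before sending. It converts string content to text blocks so that both content forms can be combined. The helper name and the sample history are hypothetical.

```python
def merge_consecutive_roles(messages: list[dict]) -> list[dict]:
    """Combine adjacent messages that share a role into a single message."""
    merged: list[dict] = []
    for msg in messages:
        content = msg["content"]
        # Normalize to a fresh list of content blocks (input is not mutated)
        blocks = ([{"type": "text", "text": content}]
                  if isinstance(content, str) else list(content))
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(blocks)
        else:
            merged.append({"role": msg["role"], "content": blocks})
    return merged


history = [
    {"role": "user", "content": "Here is my code."},
    {"role": "user", "content": "Why does it crash?"},
    {"role": "assistant", "content": "Let me look."},
]
merged = merge_consecutive_roles(history)
```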


max_tokens

integer

required

The maximum number of tokens to generate. The model may stop before reaching this limit. When using thinking, the thinking tokens count towards this limit.

Required range: x >= 1

Example:

1024

system

System prompt providing context and instructions to Claude. Can be a plain string or an array of content blocks (useful for prompt caching).

temperature

number

default:1

Controls randomness in the response. Range: 0.0–1.0. Use lower values for analytical tasks and higher values for creative tasks. Defaults to 1.0.

Required range: 0 <= x <= 1

top_p

number

Nucleus sampling threshold. Only tokens with cumulative probability up to this value are considered. Range: 0.0–1.0. Use either temperature or top_p, not both.

Required range: 0 <= x <= 1

top_k

integer

Only sample from the top K most probable tokens. Recommended for advanced use cases only.

Required range: x >= 0
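You can check the documented ranges for the sampling parameters before a request is sent. This sketch uses a hypothetical helper. It treats passing both temperature and top_p as an error, following the "use either, not both" guidance above.

```python
def sampling_params(temperature=None, top_p=None, top_k=None) -> dict:
    """Validate sampling parameters against the documented ranges."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    params = {}
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("temperature must be in [0, 1]")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0.0 <= top_p <= 1.0:
            raise ValueError("top_p must be in [0, 1]")
        params["top_p"] = top_p
    if top_k is not None:
        if top_k < 0:
            raise ValueError("top_k must be >= 0")
        params["top_k"] = top_k
    return params  # pass as **kwargs to client.messages.create()
```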

stream

boolean

default:false

If true, stream the response incrementally using Server-Sent Events (SSE). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.

stop_sequences

string[]

Custom strings that cause the model to stop generating when encountered. The stop sequence is not included in the response.

thinking

object

Enables extended thinking, Claude's step-by-step reasoning process. When enabled, the response includes thinking content blocks before the answer. budget_tokens must be at least 1,024 and less than max_tokens.


tools

object[]

Tools the model may use. Supports client-defined functions, web search (web_search_20250305), web fetch (web_fetch_20250910), code execution (code_execution_20250522), and more.
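A client-defined tool is described by a name, a description, and a JSON Schema input_schema. When Claude calls the tool, the response contains a tool_use block and stop_reason is tool_use. Your code then replies with a user message containing a matching tool_result block. The sketch below shows that round trip on plain dicts. The get_weather tool and its stub implementation are hypothetical.

```python
TOOLS = [
    {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; replace with a real lookup


HANDLERS = {"get_weather": get_weather}


def tool_results(content: list[dict]) -> dict:
    """Run every tool_use block and build the follow-up user message."""
    results = []
    for block in content:
        if block.get("type") == "tool_use":
            output = HANDLERS[block["name"]](**block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must match the tool_use id
                "content": output,
            })
    return {"role": "user", "content": results}


reply = tool_results([
    {"type": "tool_use", "id": "toolu_1", "name": "get_weather",
     "input": {"city": "Paris"}},
])
# Send TOOLS as tools=..., then append the assistant content and `reply`
# to messages for the next client.messages.create() call.
```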


tool_choice

object

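Controls how the model uses tools. The type can be auto (Claude decides), any (Claude must use one of the provided tools), tool (Claude must use the tool named in the request), or none (Claude does not use tools).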


metadata

object

Request metadata for tracking and analytics.


output_config

object

Configuration for output behavior, such as the effort level (output_config.effort).


service_tier

enum<string>

The service tier to use. auto uses priority capacity when it is available and otherwise falls back to standard capacity. standard_only uses only standard capacity.

Available options:

auto,

standard_only

Response

200 - application/json

Successful response. When stream is true, the response is a stream of SSE events.

id

string

Unique identifier for this message (e.g., msg_01XFDUDYJgAACzvnptvVoYEL).

type

enum<string>

Always message.

Available options:

message

role

enum<string>

Always assistant.

Available options:

assistant

content

object[]

The response content blocks. May include text, thinking, tool_use, and other block types.


model

string

The specific model version that generated this response (e.g., claude-sonnet-4-6).

stop_reason

enum<string>

Why the model stopped generating.

Available options:

end_turn,

max_tokens,

stop_sequence,

tool_use,

pause_turn
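Client code usually branches on stop_reason. The mapping below is a sketch, not an exhaustive policy, and the action names are hypothetical. The pause_turn handling follows Anthropic's documentation for long-running server-tool turns: you send the paused assistant turn back so that Claude can continue.

```python
def next_step(stop_reason: str) -> str:
    """Suggest what the client should do after a response."""
    actions = {
        "end_turn": "done",
        "stop_sequence": "done",
        "tool_use": "run_tools",      # execute tool_use blocks, send tool_result
        "pause_turn": "continue",     # resend with the assistant turn appended
        "max_tokens": "raise_limit",  # output was truncated; raise max_tokens
    }
    return actions.get(stop_reason, "inspect")  # unknown or newer reasons
```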

stop_sequence

string | null

The stop sequence that caused the model to stop, if applicable.

usage

object

Token usage statistics.


