Anthropic Messages - CometAPI Documentation

POST /v1/messages

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key="<COMETAPI_KEY>",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Hello, world"}
    ],
)

print(message.content[0].text)

{
  "id": "<string>",
  "content": [
    {
      "text": "<string>",
      "thinking": "<string>",
      "signature": "<string>",
      "id": "<string>",
      "name": "<string>",
      "input": {}
    }
  ],
  "model": "<string>",
  "stop_sequence": "<string>",
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "cache_creation_input_tokens": 123,
    "cache_read_input_tokens": 123,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 123,
      "ephemeral_1h_input_tokens": 123
    }
  }
}

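CometAPI natively supports the Anthropic Messages API, giving you direct access to Claude models and all Anthropic-specific features. Use this endpoint when you need Claude-only capabilities such as extended thinking, prompt caching, and effort control.

Authentication works with either the x-api-key header or the Authorization: Bearer header. The official Anthropic SDKs use x-api-key by default.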
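To make the two header styles concrete, here is a minimal sketch that builds the request headers by hand instead of going through the SDK. The helper name build_headers and its scheme argument are hypothetical; the header names and the 2023-06-01 version string are the ones documented on this page.

```python
def build_headers(api_key: str, scheme: str = "x-api-key") -> dict:
    """Return HTTP headers for a Messages request (hypothetical helper)."""
    headers = {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",  # default version documented on this page
    }
    if scheme == "x-api-key":
        headers["x-api-key"] = api_key
    elif scheme == "bearer":
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        raise ValueError(f"unknown auth scheme: {scheme!r}")
    return headers


print(build_headers("<COMETAPI_KEY>"))
print(build_headers("<COMETAPI_KEY>", scheme="bearer"))
```

Either dictionary can be passed to any HTTP client that posts JSON to the endpoint.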

Quick start

To use the official Anthropic SDK with CometAPI, set the base URL:

import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key="<COMETAPI_KEY>",
)

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

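Enable extended thinking

Use the thinking parameter to enable Claude's step-by-step reasoning. The response then includes thinking content blocks that show Claude's internal reasoning before the final answer.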

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,
    },
    messages=[
        {"role": "user", "content": "Prove that there are infinitely many primes."}
    ],
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Thinking requires a budget_tokens value of at least 1,024. Thinking tokens count toward your max_tokens limit, so set max_tokens high enough to cover both the thinking and the response.
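The budget arithmetic can be sketched as a small helper. The function name thinking_params and the answer_tokens argument are hypothetical; the 1,024 minimum and the rule that thinking tokens count toward max_tokens are taken from the text above.

```python
MIN_THINKING_BUDGET = 1024  # minimum budget_tokens stated above


def thinking_params(budget_tokens: int, answer_tokens: int) -> dict:
    """Build max_tokens and thinking settings that leave room for the answer."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if answer_tokens < 1:
        raise ValueError("answer_tokens must be at least 1")
    return {
        # Thinking tokens count toward max_tokens, so add the two together.
        "max_tokens": budget_tokens + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_params(budget_tokens=10000, answer_tokens=6000)
print(params["max_tokens"])  # 16000
```

The resulting dictionary could be unpacked into the call, for example client.messages.create(model=..., messages=..., **params).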

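Cache prompts

To reduce latency and cost on subsequent requests, you can cache large system prompts or conversation prefixes. Add cache_control to each content block you want cached: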

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer. [Long detailed instructions...]",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Review this code..."}],
)

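Cache usage is reported in the usage field of the response:

cache_creation_input_tokens: tokens written to the cache (billed at a higher rate)
cache_read_input_tokens: tokens read from the cache (billed at a lower rate)

Prompt caching requires the cached content block to contain at least 1,024 tokens. Shorter content is not cached.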
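One way to check whether caching is taking effect is to summarize these usage fields after each request. The sketch below is illustrative only: cache_summary is a hypothetical helper, the sample numbers are made up, and it assumes input_tokens counts only the uncached part of the prompt, so that the three counters add up to the total input.

```python
def cache_summary(usage: dict) -> dict:
    """Summarize prompt-cache activity from a response's usage object."""
    written = usage.get("cache_creation_input_tokens") or 0
    read = usage.get("cache_read_input_tokens") or 0
    uncached = usage.get("input_tokens") or 0
    total = written + read + uncached  # assumption: the three counters are disjoint
    return {
        "cache_written": written,
        "cache_read": read,
        "total_input": total,
        "read_fraction": read / total if total else 0.0,
    }


# Made-up usage objects for a first (cache write) and second (cache read) request.
first = {"input_tokens": 12, "cache_creation_input_tokens": 2000, "cache_read_input_tokens": 0}
second = {"input_tokens": 12, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 2000}

print(cache_summary(first)["cache_written"])  # 2000
print(cache_summary(second)["cache_read"])    # 2000
```

A read_fraction that stays at zero across repeated requests suggests the cached block is below the minimum size or that the prefix changed between calls.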

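Stream responses

To stream the response using Server-Sent Events (SSE), set stream: true. Events arrive in the following order:

message_start: contains the message metadata and initial usage
content_block_start: marks the start of each content block
content_block_delta: incremental text chunks (text_delta)
content_block_stop: marks the end of each content block
message_delta: the final stop_reason and the complete usage
message_stop: signals the end of the stream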

with client.messages.stream(
    model="claude-sonnet-4-6",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
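If you read the raw SSE stream yourself instead of using the SDK helper, the event sequence listed above can be handled roughly as follows. This is a sketch under stated assumptions: the sample events are hand-written and trimmed, real streams carry more fields, and parse_sse and collect_text are hypothetical helper names.

```python
import json
from typing import Iterator, Optional, Tuple

# Hand-written sample events in the documented order; payloads are trimmed.
EVENTS = [
    ("message_start", {"type": "message_start", "message": {"role": "assistant"}}),
    ("content_block_start", {"type": "content_block_start", "index": 0,
                             "content_block": {"type": "text", "text": ""}}),
    ("content_block_delta", {"type": "content_block_delta", "index": 0,
                             "delta": {"type": "text_delta", "text": "Hel"}}),
    ("content_block_delta", {"type": "content_block_delta", "index": 0,
                             "delta": {"type": "text_delta", "text": "lo!"}}),
    ("content_block_stop", {"type": "content_block_stop", "index": 0}),
    ("message_delta", {"type": "message_delta", "delta": {"stop_reason": "end_turn"},
                       "usage": {"output_tokens": 4}}),
    ("message_stop", {"type": "message_stop"}),
]
# Serialize to the SSE wire format: "event:" line, "data:" line, blank line.
SAMPLE_STREAM = "".join(
    f"event: {name}\ndata: {json.dumps(payload)}\n\n" for name, payload in EVENTS
)


def parse_sse(raw: str) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from an SSE text stream."""
    for chunk in raw.strip().split("\n\n"):
        name, data = None, None
        for line in chunk.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = json.loads(line[len("data:"):].strip())
        if name is not None and data is not None:
            yield name, data


def collect_text(raw: str) -> Tuple[str, Optional[str]]:
    """Concatenate text_delta chunks and capture the final stop_reason."""
    parts, stop_reason = [], None
    for name, data in parse_sse(raw):
        if name == "content_block_delta" and data["delta"].get("type") == "text_delta":
            parts.append(data["delta"]["text"])
        elif name == "message_delta":
            stop_reason = data["delta"].get("stop_reason")
    return "".join(parts), stop_reason


print(collect_text(SAMPLE_STREAM))  # ('Hello!', 'end_turn')
```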

Control effort

To control how much effort Claude puts into generating a response, use output_config.effort:

message = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this briefly."}
    ],
    output_config={"effort": "low"},  # "low", "medium", or "high"
)

Use server tools

Claude supports server-side tools that run on Anthropic's infrastructure:

Web Fetch
Web Search

Fetch and analyze content from a URL:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze the content at https://arxiv.org/abs/1512.03385"}
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5}
    ],
)

Search the web for real-time information:

message = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the latest developments in AI?"}
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
    ],
)

Response example

A typical response from the CometAPI Anthropic endpoint:

{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-4-6",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 19,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 4
  }
}

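Comparison with the OpenAI-compatible endpoint

Feature	Anthropic Messages (`/v1/messages`)	OpenAI-Compatible (`/v1/chat/completions`)
Extended thinking	`thinking` parameter with `budget_tokens`	Not available
Prompt caching	`cache_control` on content blocks	Not available
Effort control	`output_config.effort`	Not available
Web fetch/search	Server tools (`web_fetch`, `web_search`)	Not available
Auth headers	`x-api-key` or `Bearer`	`Bearer` only
Response format	Anthropic format (`content` blocks)	OpenAI format (`choices`, `message`)
Models	Claude only	Multi-provider (GPT, Claude, Gemini, etc.)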
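The difference in response format matters most when the same client code talks to both endpoints. The sketch below shows one way to read the assistant text from either shape. extract_text is a hypothetical helper and both sample payloads are trimmed to the fields it reads.

```python
def extract_text(response: dict) -> str:
    """Return the assistant text from either response shape."""
    if "choices" in response:  # OpenAI format: choices[].message.content
        return response["choices"][0]["message"]["content"] or ""
    # Anthropic format: concatenate text blocks, skipping other block types.
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


anthropic_style = {
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "stop_reason": "end_turn",
}
openai_style = {
    "choices": [
        {"message": {"role": "assistant", "content": "Hello!"}, "finish_reason": "stop"}
    ]
}

print(extract_text(anthropic_style) == extract_text(openai_style))  # True
```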

Authorizations

x-api-key

string

header

required

Your CometAPI key passed via the x-api-key header. Authorization: Bearer <key> is also supported.

Headers

anthropic-version

string

default: 2023-06-01

The Anthropic API version to use. Defaults to 2023-06-01.

Example:

"2023-06-01"

anthropic-beta

string

Comma-separated list of beta features to enable. Examples: max-tokens-3-5-sonnet-2024-07-15, pdfs-2024-09-25, output-128k-2025-02-19.
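As a small illustration of the comma-separated format, the sketch below joins feature names into a single header value. beta_header is a hypothetical helper; the feature names are the examples listed above.

```python
def beta_header(features) -> dict:
    """Join beta feature names into one anthropic-beta header value."""
    # dict.fromkeys drops duplicates while keeping the original order.
    unique = list(dict.fromkeys(f.strip() for f in features if f.strip()))
    return {"anthropic-beta": ",".join(unique)} if unique else {}


print(beta_header(["pdfs-2024-09-25", "output-128k-2025-02-19"]))
# {'anthropic-beta': 'pdfs-2024-09-25,output-128k-2025-02-19'}
```

The returned dictionary can be merged into the request headers, for example through the SDK client's default_headers option.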

Body

application/json

model

string

required

The Claude model to use. See the Models page for current Claude model IDs.

Example:

"claude-sonnet-4-6"

messages

object[]

required

The conversation messages. Must alternate between user and assistant roles. Each message's content can be a string or an array of content blocks (text, image, document, tool_use, tool_result). There is a limit of 100,000 messages per request.
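A client-side check for the alternation rule as described here might look like the sketch below. check_alternating is a hypothetical helper; it only validates roles and ordering, not the content blocks.

```python
VALID_ROLES = {"user", "assistant"}


def check_alternating(messages: list) -> None:
    """Raise ValueError if a role is unknown or repeats back to back."""
    previous = None
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in VALID_ROLES:
            raise ValueError(f"message {index}: unexpected role {role!r}")
        if role == previous:
            raise ValueError(f"message {index}: two consecutive {role!r} messages")
        previous = role


conversation = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Summarize our chat."},
]
check_alternating(conversation)  # passes silently
```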

max_tokens

integer

required

The maximum number of tokens to generate. The model may stop before reaching this limit. When using thinking, the thinking tokens count towards this limit.

Required range: x >= 1

Example:

1024

system

System prompt providing context and instructions to Claude. Can be a plain string or an array of content blocks (useful for prompt caching).

temperature

number

default: 1

Controls randomness in the response. Range: 0.0–1.0. Use lower values for analytical tasks and higher values for creative tasks. Defaults to 1.0.

Required range: 0 <= x <= 1

top_p

number

Nucleus sampling threshold. Only tokens with cumulative probability up to this value are considered. Range: 0.0–1.0. Use either temperature or top_p, not both.

Required range: 0 <= x <= 1
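The range limits and the "either temperature or top_p" guidance can be enforced before a request is sent. The following is a minimal sketch; sampling_params is a hypothetical helper, and treating the guidance as a hard error is a design choice of this example rather than documented API behavior.

```python
def sampling_params(temperature=None, top_p=None) -> dict:
    """Return only the sampling settings that were set, after range checks."""
    if temperature is not None and top_p is not None:
        raise ValueError("set temperature or top_p, not both")
    params = {}
    for name, value in (("temperature", temperature), ("top_p", top_p)):
        if value is None:
            continue
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
        params[name] = value
    return params


print(sampling_params(temperature=0.2))  # {'temperature': 0.2}
```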

top_k

integer

Only sample from the top K most probable tokens. Recommended for advanced use cases only.

Required range: x >= 0

stream

boolean

default: false

If true, stream the response incrementally using Server-Sent Events (SSE). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.

stop_sequences

string[]

Custom strings that cause the model to stop generating when encountered. The matched sequence is not included in the generated text; it is reported in the response's stop_sequence field.

thinking

object

Enable extended thinking — Claude's step-by-step reasoning process. When enabled, the response includes thinking content blocks before the answer. Requires a minimum budget_tokens of 1,024.

tools

object[]

Tools the model may use. Supports client-defined functions, web search (web_search_20250305), web fetch (web_fetch_20250910), code execution (code_execution_20250522), and more.

tool_choice

object

Controls how the model uses tools.

metadata

object

Request metadata for tracking and analytics.

output_config

object

Configuration for output behavior.

service_tier

enum<string>

The service tier to use. auto tries priority capacity first, standard_only uses only standard capacity.

Available options:

auto,

standard_only

Response

200 - application/json

Successful response. When stream is true, the response is a stream of SSE events.

id

string

Unique identifier for this message (e.g., msg_01XFDUDYJgAACzvnptvVoYEL).

type

enum<string>

Always message.

Available options:

message

role

enum<string>

Always assistant.

Available options:

assistant

content

object[]

The response content blocks. May include text, thinking, tool_use, and other block types.

model

string

The specific model version that generated this response (e.g., claude-sonnet-4-6).

stop_reason

enum<string>

Why the model stopped generating.

Available options:

end_turn,

max_tokens,

stop_sequence,

tool_use,

pause_turn
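Client code usually branches on stop_reason to decide what to do next. The sketch below maps each value to a follow-up step. The action labels and the next_action helper are hypothetical, and treating pause_turn as a cue to send the conversation back so the model can continue is an assumption based on Anthropic's description of long-running turns, not something this page specifies.

```python
# Action labels are hypothetical; only the stop_reason values come from the API.
ACTIONS = {
    "end_turn": "done",           # the model finished its answer
    "stop_sequence": "done",      # a custom stop sequence was hit
    "max_tokens": "truncated",    # output was cut off; consider raising max_tokens
    "tool_use": "run_tools",      # execute the requested tool and return a tool_result
    "pause_turn": "resend",       # assumption: send the response back to continue
}


def next_action(stop_reason: str) -> str:
    """Map a stop_reason to a follow-up step for the calling code."""
    try:
        return ACTIONS[stop_reason]
    except KeyError:
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}") from None


print(next_action("max_tokens"))  # truncated
```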

stop_sequence

string | null

The stop sequence that caused the model to stop, if applicable.

usage

object

Token usage statistics.
