Create a message

POST /v1/messages

import os
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key=os.environ["COMETAPI_KEY"],
)

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    system="You are a helpful assistant.",
    messages=[
        {"role": "user", "content": "Hello, world"}
    ],
)

print(message.content[0].text)

{
  "id": "<string>",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "text": "<string>",
      "thinking": "<string>",
      "signature": "<string>",
      "id": "<string>",
      "name": "<string>",
      "input": {}
    }
  ],
  "model": "<string>",
  "stop_sequence": "<string>",
  "usage": {
    "input_tokens": 123,
    "output_tokens": 123,
    "cache_creation_input_tokens": 123,
    "cache_read_input_tokens": 123,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 123,
      "ephemeral_1h_input_tokens": 123
    },
    "output_tokens_details": {
      "thinking_tokens": 123
    }
  }
}

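CometAPI natively supports the Anthropic Messages API, giving you direct access to Claude models along with Anthropic-specific features. Use this endpoint for Claude features such as adaptive thinking, prompt caching, and effort control.

Treat the official Anthropic Messages API reference as the authoritative source for the full parameter list, response schema, and Claude-specific behavior. This CometAPI page explains how to send that request shape through CometAPI.

Anthropic request parameters and response fields may change as Claude features evolve. Check the Anthropic Messages API documentation for the latest full parameter list and provider-specific behavior.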

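Many newer Claude models reject non-default temperature, top_p, and top_k values on the Messages API. Omit these sampling fields unless support has been verified for the selected model. If the model returns an unsupported or deprecated-parameter error, remove the offending field from the request.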
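
One way to apply this advice is to strip the sampling fields from a request before sending it. The following is a minimal sketch, not part of CometAPI or the Anthropic SDK; the helper name strip_sampling_fields and the sample request are hypothetical.

```python
# Hypothetical helper: drops sampling fields that newer Claude models may reject.
SAMPLING_FIELDS = ("temperature", "top_p", "top_k")


def strip_sampling_fields(params: dict) -> dict:
    """Return a copy of the request parameters without sampling fields."""
    return {key: value for key, value in params.items() if key not in SAMPLING_FIELDS}


# Illustrative request; the temperature value is only an example.
request = {
    "model": "claude-sonnet-5",
    "max_tokens": 1024,
    "temperature": 0.2,
    "messages": [{"role": "user", "content": "Hello!"}],
}

cleaned = strip_sampling_fields(request)
print(sorted(cleaned))
```

The cleaned dictionary can then be passed to the SDK as client.messages.create(**cleaned). The original request is left unmodified, so it can still be logged or retried with the sampling fields on a model that accepts them.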

Both the x-api-key and Authorization: Bearer headers are supported for authentication. The official Anthropic SDK uses x-api-key by default.
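
For callers who do not use the SDK, the two header styles can be sketched with the Python standard library. This is an illustration under assumptions: the helper name build_request is hypothetical, the key "sk-test" is a placeholder, and the request is only constructed, not sent. The /v1/messages path and the anthropic-version value are the ones listed elsewhere on this page.

```python
import json
import urllib.request

BASE_URL = "https://api.cometapi.com"


def build_request(api_key: str, body: dict, use_bearer: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request to the Messages endpoint."""
    headers = {
        "content-type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if use_bearer:
        # Alternative style: Authorization: Bearer <key>
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Default style used by the official Anthropic SDK
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        f"{BASE_URL}/v1/messages",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


body = {
    "model": "claude-sonnet-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = build_request("sk-test", body)  # placeholder key
bearer_req = build_request("sk-test", body, use_bearer=True)
print(req.full_url, req.get_method())
```

Passing either request to urllib.request.urlopen would send it; that step is left out here so the sketch runs without a real key.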

Quick start

To use the official Anthropic SDK with CometAPI, set the base URL:

import os
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.cometapi.com",
    api_key=os.environ["COMETAPI_KEY"],
)

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(message.content[0].text)

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
    apiKey: process.env.COMETAPI_KEY,
    baseURL: "https://api.cometapi.com",
});

const message = await client.messages.create({
    model: "claude-sonnet-5",
    max_tokens: 1024,
    messages: [{ role: "user", content: "Hello!" }],
});
console.log(message.content[0].text);

Adaptive thinking control

Use adaptive thinking together with output_config.effort to control how much work Claude puts into a response. Newer Claude models reject the legacy manual thinking format, thinking={"type": "enabled", "budget_tokens": ...}.

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=4096,
    thinking={"type": "adaptive"},
    output_config={"effort": "xhigh"},
    messages=[
        {
            "role": "user",
            "content": "Analyze the trade-offs between a monolithic architecture and microservices for a small engineering team.",
        }
    ],
)

for block in message.content:
    if block.type == "text":
        print(block.text)

Thinking tokens count toward the max_tokens limit. When using higher effort levels, set max_tokens high enough to cover both the thinking and the final answer.
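
To check whether a response ran out of room, the stop_reason and usage fields can be inspected after the call. The sketch below works on a plain dictionary shaped like the response schema on this page. It assumes that output_tokens includes the thinking tokens reported under output_tokens_details, and that this detail field is present for the selected model. The helper name thinking_report and the sample numbers are hypothetical.

```python
def thinking_report(response: dict, max_tokens: int) -> dict:
    """Summarize how the output budget was spent (assumes output_tokens includes thinking)."""
    usage = response.get("usage", {})
    output_tokens = usage.get("output_tokens", 0)
    details = usage.get("output_tokens_details") or {}
    thinking_tokens = details.get("thinking_tokens", 0)
    return {
        # stop_reason "max_tokens" means generation hit the limit
        "truncated": response.get("stop_reason") == "max_tokens",
        "thinking_tokens": thinking_tokens,
        "answer_tokens": output_tokens - thinking_tokens,
        "headroom": max_tokens - output_tokens,
    }


# Made-up response fragment for illustration only.
sample = {
    "stop_reason": "max_tokens",
    "usage": {
        "output_tokens": 4096,
        "output_tokens_details": {"thinking_tokens": 3500},
    },
}
report = thinking_report(sample, max_tokens=4096)
print(report)
```

A report with truncated set to True and a small answer_tokens value suggests raising max_tokens or lowering the effort level.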

Prompt caching

To reduce latency and cost on subsequent requests, cache large system prompts or conversation prefixes. Add cache_control to the content blocks that should be cached:

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are an expert code reviewer. [Long detailed instructions...]",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Review this code..."}],
)

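Cache usage is reported in the usage field of the response:

cache_creation_input_tokens: number of tokens written to the cache (billed at a higher rate)
cache_read_input_tokens: number of tokens read from the cache (billed at a discounted rate)

Prompt caching requires at least 1,024 tokens in the cached content block. Shorter content is not cached.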
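
To see whether caching is taking effect, the usage fields can be turned into a simple hit ratio. This is a sketch under one assumption that should be checked against the Anthropic documentation: input_tokens is treated as the uncached portion, so the total prompt size is the sum of the three counters. The helper name cache_summary and the sample numbers are hypothetical.

```python
def cache_summary(usage: dict) -> dict:
    """Summarize cache writes and reads from a response's usage object."""
    written = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    uncached = usage.get("input_tokens", 0)  # assumed to exclude cached tokens
    total = written + read + uncached
    return {
        "written": written,
        "read": read,
        "uncached": uncached,
        "hit_ratio": read / total if total else 0.0,
    }


# Made-up usage values: the first call writes the cache, the second reads it.
first_call = cache_summary(
    {"input_tokens": 50, "cache_creation_input_tokens": 1950, "cache_read_input_tokens": 0}
)
second_call = cache_summary(
    {"input_tokens": 50, "cache_creation_input_tokens": 0, "cache_read_input_tokens": 1950}
)
print(first_call["hit_ratio"], second_call["hit_ratio"])
```

A hit ratio that stays at zero across repeated requests usually means the cached block is below the minimum size or the prefix changed between calls.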

Streaming responses

Set stream: true to stream the response using Server-Sent Events (SSE). Events arrive in the following order:

message_start: contains message metadata and initial usage
content_block_start: marks the start of each content block
content_block_delta: incrementally delivered text chunks (text_delta)
content_block_stop: marks the end of each content block
message_delta: final stop_reason and total usage
message_stop: signals the end of the stream

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
) as stream:
    for text in stream.text_stream:
        print(text, end="")
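
When the raw events are consumed without the SDK helper, the event order listed above can be handled with a small loop. The following sketch operates on already-parsed event dictionaries. The payload shapes are assumptions modeled on the Anthropic streaming format, and the helper name collect_stream and the sample events are hypothetical.

```python
def collect_stream(events):
    """Accumulate streamed text and the final stop_reason from parsed SSE events."""
    text_parts = []
    stop_reason = None
    for event in events:
        event_type = event.get("type")
        if event_type == "content_block_delta":
            delta = event.get("delta", {})
            # Only text_delta chunks carry answer text; other delta types are skipped.
            if delta.get("type") == "text_delta":
                text_parts.append(delta.get("text", ""))
        elif event_type == "message_delta":
            stop_reason = event.get("delta", {}).get("stop_reason", stop_reason)
        elif event_type == "message_stop":
            break
    return "".join(text_parts), stop_reason


# Hand-written events in the documented order, for illustration only.
sample_events = [
    {"type": "message_start", "message": {"role": "assistant"}},
    {"type": "content_block_start", "index": 0},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Hel"}},
    {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "lo!"}},
    {"type": "content_block_stop", "index": 0},
    {"type": "message_delta", "delta": {"stop_reason": "end_turn"}},
    {"type": "message_stop"},
]
text, stop_reason = collect_stream(sample_events)
print(text, stop_reason)
```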

Effort level control

Use output_config.effort to control how much effort Claude puts into generating a response:

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=4096,
    messages=[
        {"role": "user", "content": "Summarize this briefly."}
    ],
    output_config={"effort": "low"},  # "low", "medium", "high", "xhigh", or "max"
)

Using server tools

Claude supports server-side tools that run on Anthropic's infrastructure:

Web Fetch
Web Search

Fetch and analyze content from a URL:

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Analyze the content at https://arxiv.org/abs/1512.03385"}
    ],
    tools=[
        {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 5}
    ],
)

Search the web for real-time information:

message = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What are the latest developments in AI?"}
    ],
    tools=[
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 5}
    ],
)

Response example

A typical response returned by CometAPI's Anthropic endpoint:

{
  "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
  "type": "message",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": "Hello!"
    }
  ],
  "model": "claude-sonnet-5",
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 19,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "cache_creation": {
      "ephemeral_5m_input_tokens": 0,
      "ephemeral_1h_input_tokens": 0
    },
    "output_tokens": 4
  }
}
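
Because content is a list of typed blocks, reading content[0] is only safe when the first block is text. A more defensive approach is to join every text block, sketched below on a dictionary that mirrors the example response. The helper name extract_text is hypothetical.

```python
def extract_text(response: dict) -> str:
    """Join the text of all text blocks, skipping thinking, tool_use, and other types."""
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


# Trimmed copy of the example response above.
example = {
    "id": "msg_bdrk_01UjHdmSztrL7QYYm7CKBDFB",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Hello!"}],
    "model": "claude-sonnet-5",
    "stop_reason": "end_turn",
}
print(extract_text(example))
```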

Comparison with the OpenAI-compatible endpoint

Feature	Anthropic Messages (`/v1/messages`)	OpenAI-compatible (`/v1/chat/completions`)
Adaptive thinking	`thinking` with `type: "adaptive"` and `output_config.effort`	Not supported
Prompt caching	`cache_control` on content blocks	Not supported
Effort control	`output_config.effort`	Not supported
Web fetch/search	Server tools (`web_fetch`, `web_search`)	Not supported
Authentication header	`x-api-key` or `Bearer`	`Bearer` only
Response format	Anthropic format (`content` blocks)	OpenAI format (`choices`, `message`)
Models	Claude only	Multi-provider (GPT, Claude, Gemini, etc.)

Authentication

x-api-key

string

header

required

Your CometAPI key passed via the x-api-key header. Authorization: Bearer $COMETAPI_KEY is also supported.

Headers

anthropic-version

string

Default: 2023-06-01

The Anthropic API version to use. Defaults to 2023-06-01.

Example:

"2023-06-01"

anthropic-beta

string

Comma-separated list of beta features to enable. Examples: max-tokens-3-5-sonnet-2024-07-15, pdfs-2024-09-25, output-128k-2025-02-19.

Body

application/json

model

string

required

The Claude model to use. See the Models page for available Claude model IDs.

Example:

"claude-sonnet-5"

messages

object[]

required

The conversation messages. Must alternate between user and assistant roles. Each message's content can be a string or an array of content blocks (text, image, document, tool_use, tool_result). There is a limit of 100,000 messages per request.
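
The alternation rule described above can be checked locally before a request is sent. The sketch below is a hypothetical pre-flight check, not an official validator; the helper name check_alternation and the sample conversations are illustrative, and the API remains the final authority on what it accepts.

```python
def check_alternation(messages: list) -> list:
    """Return a list of problems found in the role sequence (empty if none)."""
    problems = []
    for index, message in enumerate(messages):
        role = message.get("role")
        if role not in ("user", "assistant"):
            problems.append(f"message {index}: unexpected role {role!r}")
        elif index > 0 and role == messages[index - 1].get("role"):
            problems.append(f"message {index}: repeats role {role!r}")
    return problems


good = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi, how can I help?"},
    {"role": "user", "content": "Summarize this briefly."},
]
bad = [
    {"role": "user", "content": "Hello"},
    {"role": "user", "content": "Are you there?"},
]
print(check_alternation(good), check_alternation(bad))
```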

max_tokens

integer

required

The maximum number of tokens to generate. The model may stop before reaching this limit. When using thinking, the thinking tokens count towards this limit.

Required range: x >= 1

Example:

1024

system

System prompt providing context and instructions to Claude. Can be a plain string or an array of content blocks (useful for prompt caching).

temperature

number

Default: 1

Model-dependent sampling control. Many newer Claude models reject non-default temperature values on the Messages API. Omit this field unless you have verified that the selected model accepts it; if the model returns an unsupported or deprecated-parameter error, remove the field instead of substituting another sampling value.

Required range: 0 <= x <= 1

Example:

1

top_p

number

Model-dependent nucleus sampling control. Many newer Claude models reject non-default top_p values on the Messages API. Omit this field unless you have verified support for the selected model. Do not set temperature and top_p together.

Required range: 0 <= x <= 1

Example:

1

top_k

integer

Model-dependent top-k sampling control. Many newer Claude models reject non-default top_k values on the Messages API. Omit this field unless you have verified support for the selected model.

Required range: x >= 0

Example:

0

stream

boolean

Default: false

If true, stream the response incrementally using Server-Sent Events (SSE). Events include message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop.

stop_sequences

string[]

Custom strings that cause the model to stop generating when encountered. The stop sequence is not included in the response.

thinking

object

Controls Claude thinking when the selected model supports a configurable thinking mode. For newer adaptive-thinking models, use {"type":"adaptive"} with output_config.effort, or omit thinking when adaptive thinking is already the model default. Manual {"type":"enabled","budget_tokens":...} is supported only by older models and is rejected by newer Claude models.

tools

object[]

Tools the model may use. Supports client-defined functions, web search (web_search_20250305), web fetch (web_fetch_20250910), code execution (code_execution_20250522), and more.

tool_choice

object

Controls how the model uses tools.

metadata

object

Request metadata for tracking and analytics.

output_config

object

Configuration for response effort and output format. Field support depends on the selected Claude model.

service_tier

enum<string>

The service tier to use. auto tries priority capacity first, standard_only uses only standard capacity.

Available options:

auto,

standard_only

Response

200 - application/json

Successful response. When stream is true, the response is a stream of SSE events.

id

string

Unique identifier for this message (e.g., msg_01XFDUDYJgAACzvnptvVoYEL).

type

enum<string>

Always message.

Available options:

message

role

enum<string>

Always assistant.

Available options:

assistant

content

object[]

The response content blocks. May include text, thinking, tool_use, and other block types.

model

string

The specific model version that generated this response, such as claude-sonnet-5.

stop_reason

enum<string>

Why the model stopped generating. refusal can be returned as a successful HTTP response when the model declines a request.

Available options:

end_turn,

max_tokens,

stop_sequence,

tool_use,

pause_turn,

refusal
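
Client code typically branches on stop_reason to decide what to do next. The mapping below is one possible sketch: the action labels are hypothetical, and the suggested handling for pause_turn and tool_use reflects a reading of the Anthropic documentation that should be verified there.

```python
# Hypothetical action labels keyed by the stop_reason values listed above.
NEXT_STEP = {
    "end_turn": "done",
    "stop_sequence": "done",
    "max_tokens": "retry_with_higher_max_tokens",
    "tool_use": "run_tool_and_send_tool_result",
    "pause_turn": "resend_response_to_continue",
    "refusal": "surface_refusal_to_user",
}


def next_step(response: dict) -> str:
    """Map a response's stop_reason to a follow-up action label."""
    return NEXT_STEP.get(response.get("stop_reason"), "unknown_stop_reason")


print(next_step({"stop_reason": "end_turn"}))
print(next_step({"stop_reason": "refusal"}))
```

Because a refusal arrives as a successful HTTP response, checking the status code alone is not enough; the stop_reason has to be inspected as well.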

stop_sequence

string | null

The stop sequence that caused the model to stop, if applicable.

usage

object

Token usage statistics.
