Create chat completion

POST /v1/chat/completions

import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key=os.environ["COMETAPI_KEY"],
)

completion = client.chat.completions.create(
    model="gpt-5.6-sol",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
)

print(completion.choices[0].message)

{
  "id": "chatcmpl-DNA27oKtBUL8TmbGpBM3B3zhWgYfZ",
  "object": "chat.completion",
  "created": 1774412483,
  "model": "gpt-4.1-nano-2025-04-14",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Four",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 29,
    "completion_tokens": 2,
    "total_tokens": 31,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_490a4ad033"
}

{
  "error": {
    "code": "",
    "message": "model name is required (request id: <request_id>)",
    "type": "comet_api_error"
  }
}

{
  "error": {
    "code": "",
    "message": "invalid token (request id: <request_id>)",
    "type": "comet_api_error"
  }
}

{
  "error": {
    "message": "field messages is required (request id: <request_id>)",
    "type": "comet_api_error",
    "param": "",
    "code": "invalid_request"
  }
}
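The sample error bodies above share one shape: an error object with type, code, and a message that ends in a "(request id: ...)" suffix. A minimal sketch of pulling those parts apart for logging follows. It assumes the suffix format seen in the samples is stable, which the source does not guarantee, and parse_error_body is a hypothetical helper name.

```python
import json
import re

# Matches the "(request id: ...)" suffix seen in the sample error messages.
# Assumption: the suffix always sits at the end of the message.
_REQUEST_ID = re.compile(r"\s*\(request id: (?P<id>[^)]*)\)\s*$")


def parse_error_body(body: str) -> dict:
    """Split an error response body into type, code, message, and request id."""
    error = json.loads(body).get("error", {})
    message = error.get("message", "")
    match = _REQUEST_ID.search(message)
    return {
        "type": error.get("type"),
        # The samples use an empty string when no code is set.
        "code": error.get("code") or None,
        "message": _REQUEST_ID.sub("", message),
        "request_id": match.group("id") if match else None,
    }


sample = (
    '{"error": {"code": "", '
    '"message": "invalid token (request id: abc123)", '
    '"type": "comet_api_error"}}'
)
print(parse_error_body(sample))
```

Keeping the request id separate from the message makes it easier to quote when contacting support.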

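CometAPI routes chat completions to multiple providers, including OpenAI, Claude, and Gemini, through a single OpenAI-compatible interface. Switch models by changing the model parameter; most OpenAI-compatible SDKs work once base_url is set to https://api.cometapi.com/v1.

Request parameters and response fields can differ significantly between model providers. When you need the full parameter list or provider-specific behavior, always check the official documentation of the provider behind the model you are using. For example, reasoning_effort applies only to reasoning models (o-series, GPT-5.1+), and some models do not support logprobs or n > 1.

For OpenAI Pro models, o-series reasoning models, and Codex models, use the Responses endpoint instead. These model families are more fully supported by the Responses API.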

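Message roles

Role	Description
`system`	Sets the assistant's behavior and personality. Placed at the start of the conversation.
`developer`	Replaces `system` on newer models (o1 and later). Provides instructions the model should follow regardless of user input.
`user`	A message from the end user.
`assistant`	A previous model response, used to maintain conversation history.
`tool`	The result of a tool/function call. Must include the `tool_call_id` that matches the original tool call.

For newer models (GPT-4.1, GPT-5 series, o-series), prefer developer over system for instruction messages. Both work, but developer gives stronger instruction-following behavior.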
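One way to apply this guidance is to pick the instruction role from the model name. The sketch below is only an illustration: the prefix list is an assumption based on the model families named above, and instruction_message and build_messages are hypothetical helpers, not part of any SDK.

```python
# Model-name prefixes treated as "newer" here. This list is an assumption
# for illustration; adjust it to the models you actually call.
DEVELOPER_ROLE_PREFIXES = ("gpt-4.1", "gpt-5", "o1", "o3", "o4")


def instruction_message(model: str, text: str) -> dict:
    """Return an instruction message using developer or system as appropriate."""
    role = "developer" if model.startswith(DEVELOPER_ROLE_PREFIXES) else "system"
    return {"role": role, "content": text}


def build_messages(model: str, instructions: str, user_text: str) -> list:
    """Build a two-message conversation: instructions first, then the user turn."""
    return [
        instruction_message(model, instructions),
        {"role": "user", "content": user_text},
    ]


print(build_messages("gpt-4.1", "You are a helpful assistant.", "Hello!"))
```

Because both roles are accepted, a wrong guess here degrades gracefully rather than failing the request.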

Sending multimodal input

Many models accept images and audio alongside text. To send a multimodal message, use the array format for content:

{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this image"},
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png",
        "detail": "high"
      }
    }
  ]
}

The detail parameter controls the depth of image analysis:

low: faster and uses fewer tokens (fixed cost)
high: detailed analysis, consumes more tokens
auto: the model decides (default)
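The content array and the detail setting can be wrapped in a small builder so that an invalid detail value is caught before the request is sent. This is a sketch; image_message is a hypothetical helper name, and the message shape simply mirrors the JSON example above.

```python
ALLOWED_DETAIL = ("low", "high", "auto")


def image_message(text: str, image_url: str, detail: str = "auto") -> dict:
    """Build a user message that pairs a text prompt with one image URL."""
    if detail not in ALLOWED_DETAIL:
        raise ValueError(f"detail must be one of {ALLOWED_DETAIL}, got {detail!r}")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url, "detail": detail}},
        ],
    }


message = image_message("Describe this image", "https://example.com/image.png", "low")
print(message)
```

The resulting dictionary can be placed directly in the messages list of a request.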

Streaming responses

Set stream to true to receive incremental output. The response is delivered as server-sent events (SSE), and each event contains a chat.completion.chunk object:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]

To include token usage statistics in a streaming response, set stream_options.include_usage to true. The usage data appears in the final chunk before [DONE].
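When consuming the raw event stream instead of an SDK, each data: line has to be decoded and the delta fragments joined. The following is a minimal sketch of that loop over already-received lines. collect_stream is a hypothetical helper, and it assumes a single choice (n = 1), since fragments from all choices are concatenated together.

```python
import json


def collect_stream(lines):
    """Accumulate text, finish_reason, and usage from an iterable of SSE lines."""
    text_parts = []
    finish_reason = None
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # present only when include_usage is set
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(text_parts), finish_reason, usage


sample_stream = [
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    "",
    'data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print(collect_stream(sample_stream))
```

With an SDK, the same accumulation is usually done by iterating over the stream object, so this sketch is mainly useful for plain HTTP clients.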

Requesting structured output

Use response_format to force the model to return valid JSON that matches a specific schema:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "result",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "answer": {"type": "string"},
          "confidence": {"type": "number"}
        },
        "required": ["answer", "confidence"],
        "additionalProperties": false
      }
    }
  }
}

{
  "response_format": {"type": "json_object"}
}

JSON Schema mode (json_schema) guarantees that the output matches the schema exactly. JSON Object mode (json_object) guarantees only that the output is valid JSON and does not enforce a structure.
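On the client side, the two modes differ in how much checking the returned content still needs. The sketch below builds a strict json_schema payload and then parses a reply defensively, which matters most in json_object mode. Both json_schema_format and parse_structured are hypothetical helpers, and the sample reply string is made up for illustration.

```python
import json


def json_schema_format(name: str, properties: dict) -> dict:
    """Build a strict response_format in which every property is required."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                "properties": properties,
                "required": list(properties),
                "additionalProperties": False,
            },
        },
    }


def parse_structured(content: str, required: list) -> dict:
    """Parse the message content and confirm the required keys are present."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


response_format = json_schema_format(
    "result",
    {"answer": {"type": "string"}, "confidence": {"type": "number"}},
)
# Illustrative reply content, not real model output.
parsed = parse_structured('{"answer": "Four", "confidence": 0.9}', ["answer", "confidence"])
print(response_format["json_schema"]["schema"]["required"], parsed)
```

The key check is cheap insurance even in strict mode, for example when the same code path also serves providers that only offer JSON Object mode.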

Tool and function calling

Provide tool definitions to let the model call external functions:

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

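When the model decides to call a tool, the response has finish_reason: "tool_calls" and the message.tool_calls array contains the function name and arguments. You then execute the function and send the result back as a tool message with the matching tool_call_id.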
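The execute-and-reply step can be sketched as a small dispatcher that turns each entry in message.tool_calls into a tool message. The get_weather body here is a hypothetical stand-in that returns fixed data, and run_tool_calls is an illustrative helper. The sketch relies on the OpenAI-style shape in which function.arguments arrives as a JSON string.

```python
import json


def get_weather(location: str) -> dict:
    # Hypothetical stand-in: a real implementation would query a weather service.
    return {"location": location, "forecast": "sunny"}


# Maps tool names from the request's tools list to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute each requested tool and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = TOOLS[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])  # JSON string -> dict
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the originating call
            "content": json.dumps(function(**arguments)),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Seoul"}'},
    }],
}
print(run_tool_calls(assistant_message))
```

For the follow-up request, append the assistant message and then the returned tool messages to the conversation so the model can produce its final answer.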

Cross-provider notes

Parameter support by provider

Parameter	OpenAI GPT	Claude (via compatibility layer)	Gemini (via compatibility layer)
`temperature`	0–2	0–1	0–2
`top_p`	0–1	0–1	0–1
`n`	1–128	1 only	1–8
`stop`	Up to 4	Up to 4	Up to 5
`tools`	✅	✅	✅
`response_format`	✅	✅ (json_schema)	✅
`logprobs`	✅	❌	❌
`reasoning_effort`	o-series, GPT-5.1+	❌	❌ (use `thinking` with native Gemini)
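The ranges in the table can be turned into a pre-flight check that flags parameters a given provider is likely to reject. This is a sketch only: the provider keys are illustrative labels rather than values the API accepts, the limits are copied from the table, and check_params is a hypothetical helper.

```python
# Limits transcribed from the parameter support table; "stop" is the maximum
# number of stop sequences and "logprobs" records whether it is supported.
PROVIDER_LIMITS = {
    "openai": {"temperature": (0, 2), "top_p": (0, 1), "n": (1, 128), "stop": 4, "logprobs": True},
    "claude": {"temperature": (0, 1), "top_p": (0, 1), "n": (1, 1), "stop": 4, "logprobs": False},
    "gemini": {"temperature": (0, 2), "top_p": (0, 1), "n": (1, 8), "stop": 5, "logprobs": False},
}


def check_params(provider: str, params: dict) -> list:
    """Return a list of problems; an empty list means nothing was flagged."""
    limits = PROVIDER_LIMITS[provider]
    problems = []
    for name in ("temperature", "top_p", "n"):
        if name in params:
            low, high = limits[name]
            if not low <= params[name] <= high:
                problems.append(f"{name}={params[name]} is outside {low}-{high}")
    stop = params.get("stop")
    if stop is not None:
        count = 1 if isinstance(stop, str) else len(stop)
        if count > limits["stop"]:
            problems.append(f"{count} stop sequences exceeds the limit of {limits['stop']}")
    if params.get("logprobs") and not limits["logprobs"]:
        problems.append("logprobs is not supported")
    return problems


print(check_params("claude", {"temperature": 1.5, "n": 2, "logprobs": True}))
```

A check like this is most useful when one code path switches between providers by changing only the model parameter.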

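max_tokens vs. max_completion_tokens

max_tokens: The legacy parameter. It works with most models but is deprecated for newer OpenAI models.
max_completion_tokens: The recommended parameter for GPT-4.1, GPT-5 series, and o-series models. It is required for reasoning models because it covers both output tokens and reasoning tokens.

CometAPI handles the mapping automatically when routing to different providers.

system role vs. developer role

system: The traditional instruction role. Works with all models.
developer: Introduced with the o1 models. Provides stronger instruction following on newer models and falls back to system behavior on older models.

Use developer for new projects that target GPT-4.1+ or o-series models.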

FAQ

How do I handle rate limits?

When you receive 429 Too Many Requests, implement exponential backoff:

import os
import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.cometapi.com/v1",
    api_key=os.environ["COMETAPI_KEY"],
)

def chat_with_retry(messages, max_retries=3):
    for i in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-5.6-sol",
                messages=messages,
            )
        except RateLimitError:
            if i < max_retries - 1:
                wait_time = (2 ** i) + random.random()
                time.sleep(wait_time)
            else:
                raise

How do I maintain conversation context?

Include the full conversation history in the messages array:

messages = [
    {"role": "developer", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a high-level programming language..."},
    {"role": "user", "content": "What are its main advantages?"},
]
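Since the API is stateless, the caller has to append each turn to the list before the next request. A minimal sketch of a history holder follows. The Conversation class and its max_messages cap are hypothetical: the cap is a simple trimming policy added here to keep prompts bounded, not an API limit.

```python
class Conversation:
    """Keeps the instruction message plus the most recent turns."""

    def __init__(self, instructions: str, max_messages: int = 20):
        if max_messages < 1:
            raise ValueError("max_messages must be at least 1")
        self.instruction = {"role": "developer", "content": instructions}
        self.history = []
        self.max_messages = max_messages  # hypothetical cap, not an API limit

    def add(self, role: str, content: str) -> None:
        """Append one turn and drop the oldest turns beyond the cap."""
        self.history.append({"role": role, "content": content})
        if len(self.history) > self.max_messages:
            self.history = self.history[-self.max_messages:]

    def messages(self) -> list:
        """Return the list to pass as the messages parameter."""
        return [self.instruction, *self.history]


conversation = Conversation("You are a helpful assistant.", max_messages=2)
conversation.add("user", "What is Python?")
conversation.add("assistant", "Python is a high-level programming language...")
conversation.add("user", "What are its main advantages?")
print(conversation.messages())
```

Trimming by message count is crude: it can separate a tool message from the call that produced it, so conversations that use tools need a policy that keeps those pairs together.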

What does `finish_reason` mean?

Value	Meaning
`stop`	The model finished naturally or reached a stop sequence.
`length`	The `max_tokens` or `max_completion_tokens` limit was reached.
`tool_calls`	The model issued one or more tool/function calls.
`content_filter`	The output was filtered because of content policy.
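In application code, these values usually drive a branch right after the response arrives. The sketch below maps each value to a short action label. The labels and the next_step helper are hypothetical, chosen only to show the branching.

```python
def next_step(choice: dict) -> str:
    """Map a choice's finish_reason to an action label for the caller."""
    reason = choice.get("finish_reason")
    if reason == "stop":
        return "done"        # the content is complete
    if reason == "length":
        return "truncated"   # raise the token limit or ask the model to continue
    if reason == "tool_calls":
        return "run_tools"   # execute the requested tools, then send the results
    if reason == "content_filter":
        return "filtered"    # the content may be missing or partial
    return "unknown"         # None (still streaming) or a provider-specific value


print(next_step({"index": 0, "finish_reason": "stop"}))
```

Treating unrecognized values as a separate case keeps the code safe when a provider reports a reason outside this table.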

How do I control costs?

Use max_completion_tokens to limit output length.
Use gpt-5.6-terra to balance intelligence and cost, or gpt-5.6-luna for efficient high-volume workloads.
Keep prompts concise and avoid redundant context.
Monitor token usage in the usage response field.
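Monitoring the usage field can be as simple as summing it across responses. The sketch below does that and derives a rough cost estimate. The UsageTracker class is hypothetical, and the per-million-token prices are placeholders, so substitute the real prices for your model.

```python
class UsageTracker:
    """Sums token counts from usage objects and estimates spend."""

    def __init__(self, input_price: float, output_price: float):
        # Prices are per one million tokens. The values passed in below are
        # placeholders for illustration, not real CometAPI pricing.
        self.input_price = input_price
        self.output_price = output_price
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, usage: dict) -> None:
        """Add one response's usage object to the running totals."""
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    def estimated_cost(self) -> float:
        """Return the estimated spend in the same currency unit as the prices."""
        total = (
            self.prompt_tokens * self.input_price
            + self.completion_tokens * self.output_price
        )
        return total / 1_000_000


tracker = UsageTracker(input_price=1.0, output_price=2.0)
tracker.record({"prompt_tokens": 29, "completion_tokens": 2, "total_tokens": 31})
print(tracker.prompt_tokens, tracker.completion_tokens, tracker.estimated_cost())
```

When using the SDK, the usage object on a response can be converted to a dictionary before recording it, or the two counters can be read from its attributes directly.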

Authentication

Authorization

string

header

Required

Bearer token authentication. Use your CometAPI key.
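For clients that do not use an SDK, the header can be set by hand. The following sketch builds, but does not send, a request with the standard library. build_request is a hypothetical helper, the key shown is a placeholder, and the URL assumes the /chat/completions path under the base URL used in the examples on this page.

```python
import json
import urllib.request


def build_request(api_key: str, payload: dict,
                  base_url: str = "https://api.cometapi.com/v1") -> urllib.request.Request:
    """Prepare a POST request carrying the Bearer token and a JSON body."""
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder key for illustration; read the real key from the environment.
request = build_request(
    "sk-example",
    {"model": "gpt-5.6-sol", "messages": [{"role": "user", "content": "Hello!"}]},
)
print(request.full_url, request.get_header("Authorization"))
```

Passing the prepared object to urllib.request.urlopen would send it; that step is left out here so the sketch runs without network access.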

Body

application/json

model

string

Default: gpt-5.6-sol

Required

Model ID to use for this request. See the Models page for current options.

Example:

"gpt-4.1"

messages

object[]

Required

A list of messages forming the conversation. Each message has a role (system, developer, user, assistant, or tool) and content (text string or multimodal content array).

stream

boolean

If true, partial response tokens are delivered incrementally via server-sent events (SSE). The stream ends with a data: [DONE] message.

temperature

number

Default: 1

Sampling temperature between 0 and 2. Higher values (e.g., 0.8) produce more random output; lower values (e.g., 0.2) make output more focused and deterministic. Adjust either this or top_p, but not both.

Required range: 0 <= x <= 2

top_p

number

Default: 1

Nucleus sampling parameter. The model considers only the tokens whose cumulative probability reaches top_p. For example, 0.1 means only the tokens in the top 10% of probability mass are considered. Adjust either this or temperature, but not both.

Required range: 0 <= x <= 1

n

integer

Default: 1

Number of completion choices to generate for each input message. Defaults to 1.

stop

string

Up to 4 sequences where the API will stop generating further tokens. Can be a string or an array of strings.

max_tokens

integer

Maximum number of tokens to generate in the completion. The total of input + output tokens is capped by the model's context length.

presence_penalty

number

Default: 0

Number between -2.0 and 2.0. Positive values penalize tokens based on whether they have already appeared, encouraging the model to explore new topics.

Required range: -2 <= x <= 2

frequency_penalty

number

Default: 0

Number between -2.0 and 2.0. Positive values penalize tokens proportionally to how often they have appeared, reducing verbatim repetition.

Required range: -2 <= x <= 2

logit_bias

object

A JSON object mapping token IDs to bias values from -100 to 100. The bias is added to the model's logits before sampling. Values between -1 and 1 subtly adjust likelihood; -100 or 100 effectively ban or force selection of a token.

user

string

A unique identifier for your end-user. Helps with abuse detection and monitoring.

max_completion_tokens

integer

An upper bound for the number of tokens to generate, including visible output tokens and reasoning tokens. Use this instead of max_tokens for GPT-4.1+, GPT-5 series, and o-series models.

response_format

object

Specifies the output format. Use {"type": "json_object"} for JSON mode, or {"type": "json_schema", "json_schema": {...}} for strict structured output.

tools

object[]

A list of tools the model may call. Currently supports function type tools.

tool_choice

Default: auto

Controls how the model selects tools. auto (default): model decides. none: no tools. required: must call a tool.

logprobs

boolean

Default: false

Whether to return log probabilities of the output tokens.

top_logprobs

integer

Number of most likely tokens to return at each position (0-20). Requires logprobs to be true.

Required range: 0 <= x <= 20

reasoning_effort

enum<string>

Controls the reasoning effort for o-series and GPT-5.1+ models.

Available options:

low,

medium,

high

stream_options

object

Options for streaming. Only valid when stream is true.

service_tier

enum<string>

Specifies the processing tier.

Available options:

auto,

default,

flex,

priority

Response

Successful chat completion response.

id

string

Unique completion identifier.

Example:

"chatcmpl-abc123"

object

enum<string>

Object type. Non-streaming responses use chat.completion.

Available options:

chat.completion

Example:

"chat.completion"

created

integer

Unix timestamp of creation.

Example:

1774412483

model

string

The model used (may include version suffix).

Example:

"gpt-5.4-2026-03-05"

choices

object[]

Array of completion choices.

usage

object

Token accounting for this request. Billing uses these counts.

service_tier

string

Service tier that processed the request, when the provider reports one.

Example:

"default"

system_fingerprint

string | null

Provider backend configuration fingerprint, when the provider reports one.

Example:

"fp_490a4ad033"
