Handle rate limits by controlling concurrency before requests leave your app. When CometAPI returns HTTP 429, retry with exponential backoff and jitter. If retries keep recurring, reduce burst traffic rather than retrying harder.

Limit concurrency

The following Python example caps concurrent chat requests at five with an async semaphore:

```python
import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.environ["COMETAPI_KEY"],
    base_url="https://api.cometapi.com/v1",
)

# At most 5 requests are in flight at any time; the rest wait here.
semaphore = asyncio.Semaphore(5)

async def ask(prompt):
    # The slot is held for the whole request and released when it finishes or fails.
    async with semaphore:
        completion = await client.chat.completions.create(
            model="your-model-id",
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message.content

async def main():
    prompts = ["Say hello.", "Write a title.", "Return one JSON key."]
    # gather() returns results in the same order as the prompts.
    results = await asyncio.gather(*(ask(prompt) for prompt in prompts))
    print(results)

asyncio.run(main())
```
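The result is a list of model outputs, one per prompt and in prompt order. The exact text depends on the model; it is shown here as JSON:

```json
[
  "Hello.",
  "A concise title",
  "{\"key\":\"value\"}"
]
```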
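To check that a cap actually holds before pointing it at the API, the semaphore pattern can be wrapped in a small helper and exercised with a stand-in coroutine. The following is a minimal sketch, not part of the CometAPI documentation: `run_with_limit`, `demo`, and `fake_request` are hypothetical names, and `fake_request` only sleeps instead of calling a model.

```python
import asyncio


async def run_with_limit(worker, items, limit=5):
    """Run `worker(item)` for every item with at most `limit` calls in flight."""
    # Creating the semaphore inside the coroutine ties it to the running loop.
    semaphore = asyncio.Semaphore(limit)

    async def guarded(item):
        async with semaphore:
            return await worker(item)

    # Results come back in the same order as `items`.
    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    """Measure the peak number of simultaneous calls using a fake request."""
    in_flight = 0
    peak = 0

    async def fake_request(item):
        # Stand-in for a real API call (assumption: it only sleeps briefly).
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return item * 2

    results = await run_with_limit(fake_request, range(20), limit=5)
    return results, peak
```

Calling `asyncio.run(demo())` returns the results together with the observed peak, which should never exceed the limit. Replacing `fake_request` with a coroutine such as `ask` applies the same cap to real requests.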

Retry rate limits

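The following JavaScript example retries 429 responses with exponential backoff and jitter, giving up after five attempts:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.COMETAPI_KEY,
  baseURL: "https://api.cometapi.com/v1",
  // Disable the SDK's built-in retries so the loop below is the only retry layer.
  maxRetries: 0,
});

const MAX_ATTEMPTS = 5;

// Resolve after the given number of milliseconds.
function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}

async function createCompletion() {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
    try {
      return await client.chat.completions.create({
        model: "your-model-id",
        messages: [{ role: "user", content: "Say hello." }],
      });
    } catch (error) {
      // Rethrow anything that is not a rate limit, and give up after the last attempt.
      if (error.status !== 429 || attempt === MAX_ATTEMPTS - 1) {
        throw error;
      }

      // Exponential backoff (1 s, 2 s, 4 s, 8 s), capped at 30 s, plus up to 1 s of jitter.
      const delay = Math.min(30000, 1000 * 2 ** attempt);
      await sleep(delay + Math.random() * 1000);
    }
  }
}

const completion = await createCompletion();
console.log(completion.choices[0].message.content);
```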
A successful response is a standard chat completion object, abridged here to the field the example reads:

```json
{
  "choices": [
    {
      "message": {
        "content": "Hello."
      }
    }
  ]
}
```
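For Python callers, the same retry policy could look roughly like the sketch below. It is an illustration under stated assumptions, not an official CometAPI helper: `backoff_delay` and `with_retries` are hypothetical names, and the code assumes the raised error exposes the HTTP status as `status_code`, as the OpenAI Python SDK's status errors do. The `sleep` parameter exists only so the waits can be replaced in tests.

```python
import asyncio
import random


def backoff_delay(attempt, base=1.0, cap=30.0, jitter=1.0):
    """Return the wait in seconds before the retry that follows `attempt` (0-based)."""
    # Exponential growth capped at `cap`, plus up to `jitter` seconds of randomness.
    return min(cap, base * 2 ** attempt) + random.uniform(0, jitter)


async def with_retries(call, max_attempts=5, sleep=asyncio.sleep):
    """Await `call()` and retry only when the error carries HTTP status 429."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception as error:
            # Assumption: the error exposes the HTTP status as `status_code`.
            status = getattr(error, "status_code", None)
            # Rethrow non-429 errors at once, and give up after the last attempt.
            if status != 429 or attempt == max_attempts - 1:
                raise
            await sleep(backoff_delay(attempt))
```

A call such as `await with_retries(lambda: ask("Say hello."))` would wrap a request coroutine. Note that the OpenAI Python SDK also retries some failures on its own by default, so passing `max_retries=0` to the client may be worth considering to avoid stacking two retry layers.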

Common errors

| Error | Fix |
| --- | --- |
| Unlimited parallel requests | Add a semaphore, queue, or worker pool. |
| Retrying all failures | Retry only 429 and temporary server failures. |
| No per-model metrics | Log route, model ID, status, and latency for each request. |
| Retry storm | Add jitter and cap the maximum retry delay. |
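Two of the fixes in the table can be sketched in a few lines of Python: deciding which statuses to retry, and logging route, model ID, status, and latency per request. Everything named here is an assumption for illustration. The table does not say which statuses count as temporary server failures, so the set below (500, 502, 503, 504) is a guess to adjust. The logger name and the `timed_call` helper are hypothetical, and `status_code` is assumed to be the attribute carrying the HTTP status on raised errors.

```python
import logging
import time

# Hypothetical logger name; use whatever your application already configures.
logger = logging.getLogger("cometapi.requests")

# Assumption: 429 plus common transient 5xx statuses are treated as retryable.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def is_retryable(status):
    """Return True for rate limits and (assumed) temporary server failures."""
    return status in RETRYABLE_STATUSES


async def timed_call(route, model, call):
    """Await `call()` and log route, model ID, status, and latency in all cases."""
    started = time.perf_counter()
    status = 200  # Assumed status when the call returns without raising.
    try:
        return await call()
    except Exception as error:
        # Assumption: the error exposes the HTTP status as `status_code`.
        status = getattr(error, "status_code", None)
        raise
    finally:
        latency_ms = (time.perf_counter() - started) * 1000
        logger.info(
            "route=%s model=%s status=%s latency_ms=%.0f",
            route, model, status, latency_ms,
        )
```

Logging in the `finally` block means failed requests are recorded too, which is what makes repeated 429s visible per model.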
Last updated: May 27, 2026