Handle rate limits and concurrency - CometAPI Documentation

Handle rate limits by capping concurrency before requests leave your app. When CometAPI returns 429, retry with exponential backoff and jitter. If requests keep needing retries, reduce burst traffic.

Limit concurrency

The following Python example caps concurrent chat requests with an async semaphore:

import asyncio
import os
from openai import AsyncOpenAI

client = AsyncOpenAI(
    api_key=os.environ["COMETAPI_KEY"],
    base_url="https://api.cometapi.com/v1",
)

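MAX_CONCURRENCY = 5  # at most five requests in flight at once

async def ask(semaphore, prompt):
    # Wait for a free slot before sending the request.
    async with semaphore:
        completion = await client.chat.completions.create(
            model="your-model-id",
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message.content

async def main():
    # Create the semaphore inside the running event loop. On Python versions
    # before 3.10, a module-level semaphore can bind to a different loop.
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    prompts = ["Say hello.", "Write a title.", "Return one JSON key."]
    # gather returns results in the same order as the prompts.
    results = await asyncio.gather(*(ask(semaphore, prompt) for prompt in prompts))
    print(results)

asyncio.run(main())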

The result is a list of model outputs in the same order as the prompts. The exact text varies by model:

[
  "Hello.",
  "A concise title",
  "{\"key\":\"value\"}"
]
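To see that a semaphore really bounds in-flight work without calling the API, the following minimal sketch replaces the chat request with a stand-in coroutine and records the peak number of overlapping calls. The names `fake_request` and `measure_peak` and the 10 ms simulated latency are assumptions made for illustration; they are not part of CometAPI or the OpenAI SDK.

```python
import asyncio

MAX_CONCURRENCY = 5  # same cap as the semaphore in the example above


async def fake_request(semaphore, state):
    """Hypothetical stand-in for an API call; tracks how many calls overlap."""
    async with semaphore:
        state["in_flight"] += 1
        state["peak"] = max(state["peak"], state["in_flight"])
        await asyncio.sleep(0.01)  # simulated network latency
        state["in_flight"] -= 1


async def measure_peak(total_requests):
    """Run total_requests fake calls and return the peak concurrency seen."""
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    state = {"in_flight": 0, "peak": 0}
    await asyncio.gather(
        *(fake_request(semaphore, state) for _ in range(total_requests))
    )
    return state["peak"]


peak = asyncio.run(measure_peak(20))
print(peak)  # 5: twenty calls were queued, but only five ran at once
```

Raising or lowering `MAX_CONCURRENCY` changes the peak accordingly, which makes this a quick way to sanity-check a concurrency cap before pointing it at real traffic.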

Retry rate limits

The following JavaScript example retries 429 responses with exponential backoff and jitter:

import OpenAI from "openai";

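const client = new OpenAI({
  apiKey: process.env.COMETAPI_KEY,
  baseURL: "https://api.cometapi.com/v1",
  // The SDK retries some failures on its own by default. Disable that here
  // so the loop below is the only retry layer.
  maxRetries: 0,
});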

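// Resolve after the given number of milliseconds.
function sleep(milliseconds) {
  return new Promise((resolve) => setTimeout(resolve, milliseconds));
}

const MAX_ATTEMPTS = 5;

async function createCompletion() {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt += 1) {
    try {
      return await client.chat.completions.create({
        model: "your-model-id",
        messages: [{ role: "user", content: "Say hello." }],
      });
    } catch (error) {
      // Rethrow anything that is not a rate limit, and give up after the last attempt.
      if (error.status !== 429 || attempt === MAX_ATTEMPTS - 1) {
        throw error;
      }

      // Exponential backoff (1 s, 2 s, 4 s, 8 s), capped at 30 s,
      // plus up to 1 s of random jitter.
      const delay = Math.min(30000, 1000 * 2 ** attempt);
      await sleep(delay + Math.random() * 1000);
    }
  }
}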

const completion = await createCompletion();
console.log(completion.choices[0].message.content);

A successful response is a normal chat completion, abbreviated here to the field the example reads:

{
  "choices": [
    {
      "message": {
        "content": "Hello."
      }
    }
  ]
}
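The delay arithmetic in the retry loop above can be checked in isolation. The following Python sketch reproduces the same formula (base 1000 ms, doubling per attempt, capped at 30000 ms, plus up to 1000 ms of jitter). The constant and function names are assumptions chosen for illustration.

```python
import random

BASE_DELAY_MS = 1000   # first backoff step, as in the example above
MAX_DELAY_MS = 30000   # cap on the exponential part
MAX_JITTER_MS = 1000   # upper bound of the random jitter
MAX_ATTEMPTS = 5       # total attempts, including the first request


def backoff_ms(attempt):
    """Exponential backoff without jitter for a zero-based attempt index."""
    return min(MAX_DELAY_MS, BASE_DELAY_MS * 2 ** attempt)


def delay_with_jitter_ms(attempt, rng=random.random):
    """Backoff plus uniform jitter in [0, MAX_JITTER_MS)."""
    return backoff_ms(attempt) + rng() * MAX_JITTER_MS


# The final attempt raises instead of sleeping, so only attempts 0..3 wait.
schedule = [backoff_ms(attempt) for attempt in range(MAX_ATTEMPTS - 1)]
print(schedule)  # [1000, 2000, 4000, 8000]
```

With five attempts the 30-second cap is never reached. It only matters if the attempt limit is raised: `backoff_ms(5)` would be 32000 ms uncapped and is clamped to 30000 ms.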

Common errors

Error	Fix
Unlimited parallel requests	Add a semaphore, queue, or worker pool.
Retrying all failures	Retry only `429` and temporary server failures.
No per-model metrics	Log route, model ID, status, and latency for each request.
Retry storm	Add jitter and cap the maximum retry delay.
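Two of the fixes in the table, retrying only selected statuses and logging per-request metrics, can be sketched as small helpers. This is a minimal illustration, not a CometAPI-defined interface: the set of statuses treated as temporary server failures, the logger name, and the function names are all assumptions to adjust for your application.

```python
import logging
import time

# Hypothetical logger name; use whatever your application already configures.
logger = logging.getLogger("cometapi.requests")

# Assumption: 429 plus common transient 5xx statuses are worth retrying.
RETRYABLE_STATUSES = {429, 500, 502, 503, 504}


def is_retryable(status):
    """Return True only for rate limits and temporary server failures."""
    return status in RETRYABLE_STATUSES


def log_request(route, model_id, status, started_at):
    """Log route, model ID, status, and latency; return the latency in ms."""
    latency_ms = (time.monotonic() - started_at) * 1000
    logger.info(
        "route=%s model=%s status=%s latency_ms=%.0f",
        route, model_id, status, latency_ms,
    )
    return latency_ms


# Usage: record the start time before the request and log after it completes.
started_at = time.monotonic()
latency = log_request("/v1/chat/completions", "your-model-id", 200, started_at)
```

Client errors such as 400 or 401 fall outside `RETRYABLE_STATUSES`, so a retry loop that consults `is_retryable` fails fast on them instead of repeating a request that cannot succeed.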
