LlamaIndex provides the CometLLM class as a first-class integration with CometAPI. Use it to power RAG pipelines, agents, and LLM chains with any model in CometAPI’s catalog.

Prerequisites

  • Python 3.8+
  • A CometAPI account with an active API key

1. Install the LlamaIndex CometAPI integration

pip install llama-index-llms-cometapi llama-index

2. Set your API key

import os

from llama_index.llms.cometapi import CometLLM

# Replace the placeholder below, or export COMETAPI_KEY in your shell instead
os.environ["COMETAPI_KEY"] = "<COMETAPI_KEY>"
api_key = os.getenv("COMETAPI_KEY")
Using environment variables is safer than hardcoding credentials in scripts.
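To make the environment-variable approach fail fast rather than passing a missing key downstream, you can wrap the lookup in a small helper (a sketch; the helper name and error message are illustrative, not part of the integration):

```python
import os

def load_cometapi_key(env_var: str = "COMETAPI_KEY") -> str:
    """Return the CometAPI key from the environment, failing with a clear error if unset."""
    key = os.getenv(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before running.")
    return key
```

Calling this at startup surfaces a missing key immediately instead of producing an authentication error deep inside an LLM call.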

3. Initialize the model and make completion calls

from llama_index.core.llms import ChatMessage

llm = CometLLM(
    api_key=api_key,
    max_tokens=256,
    context_window=4096,
    model="your-model-id",
)

# Chat call
messages = [
    ChatMessage(role="system", content="You are a helpful assistant"),
    ChatMessage(role="user", content="Say 'Hi' only!"),
]
resp = llm.chat(messages)
print(resp)

# Completion call
resp = llm.complete("Who is Kaiming He?")
print(resp)

4. Enable streaming

Use stream_chat or stream_complete for real-time chunked output:
# Streaming chat
message = ChatMessage(role="user", content="Tell me what ResNet is")
for chunk in llm.stream_chat([message]):
    print(chunk.delta, end="")

# Streaming completion
for chunk in llm.stream_complete("Tell me about Large Language Models"):
    print(chunk.delta, end="")
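If you also need the complete text after streaming finishes, the per-chunk deltas can simply be joined. This sketch assumes only that each chunk exposes a `.delta` string, as in the loops above:

```python
def collect_stream(chunks):
    """Concatenate the .delta of each streamed chunk into the full response text."""
    return "".join(chunk.delta for chunk in chunks)
```

For example, `full_text = collect_stream(llm.stream_complete("Tell me about Large Language Models"))` streams the response while still giving you the assembled string at the end.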
  • Models: See the CometAPI Models page for all available options.
  • Using other models: Initialize with a different model ID, e.g. CometLLM(api_key=api_key, model="your-model-id", max_tokens=1024).
  • Generation settings: Pass temperature and max_tokens directly to CometLLM(...).
  • Error handling: Wrap calls in try/except to catch key errors or network issues.
  • Security: Never commit API keys to version control. Use environment variables.
  • More docs: LlamaIndex documentation · CometAPI quick start · Colab example
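The error-handling advice above can be sketched as a generic retry wrapper around any call. The helper and its defaults are illustrative, not part of LlamaIndex or CometAPI:

```python
import time

def call_with_retries(fn, *args, retries=3, backoff=1.0, **kwargs):
    """Call fn, retrying with exponential backoff on transient failures
    (e.g. network errors or rate limits). Re-raises after the last attempt."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```

You would use it as, e.g., `resp = call_with_retries(llm.complete, "Who is Kaiming He?")`; in production you may want to narrow the `except Exception` clause to the specific error types you expect.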