API Doc-CometAPI

Release Notes

🌟 2025-08-27#

🔹 gemini-2.5-flash-image-preview,gemini-2.5-flash-image#

gemini-2.5-flash-image-preview, gemini-2.5-flash-image:
Gemini 2.5 Flash Image (also known as nano-banana) is Google's most advanced image generation and editing model. This update lets you blend multiple images into a single image, maintain character consistency to tell richer stories, perform targeted transformations using natural language, and draw on Gemini's world knowledge when generating and editing images.
Follows the OpenAI chat standard format. See details: CometAPI Chat Documentation https://apidoc.cometapi.com/gemini-generates-image-20873272e0
Guide: https://apidoc.cometapi.com/guide-to-calling-gemini-2-5-flash-image-1425263m0
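Since the model follows the OpenAI chat standard format, a multi-image blend request body could be sketched as follows. The `image_url` content parts follow the generic OpenAI vision message shape; that shape is an illustrative assumption here, not the documented schema for this model:

```python
# Illustrative only: build an OpenAI-chat-style request body asking
# gemini-2.5-flash-image to blend several reference images. No request is sent.

def gemini_blend_body(prompt: str, image_urls: list) -> dict:
    content = [{"type": "text", "text": prompt}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {
        "model": "gemini-2.5-flash-image",
        "messages": [{"role": "user", "content": content}],
    }
```

The body would then be POSTed to the chat completions endpoint like any other OpenAI-format request.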

🌟 2025-08-22#

🔹 deepseek-v3.1, deepseek-v3-1-250821#

deepseek-v3.1, deepseek-v3-1-250821: DeepSeek-V3.1 is DeepSeek's all-new hybrid inference model.
🧠 Hybrid inference: Think & Non-Think — one model, two modes
⚡️ Faster thinking: DeepSeek-V3.1 reaches answers in less time than DeepSeek-R1-0528
Follows the OpenAI chat standard format; see details: CometAPI Chat Documentation

🌟 2025-08-20#

🎉 CometAPI Update: Suno Adds Instrumental & Vocals, Plus Major Upgrades for Kling's Effects, Quality, and Models! 🎉#

🔹 Suno#

🎵 Introducing Two Major Music Creation Features: Easily add accompaniment to vocals and generate lyrics & vocals for instrumental tracks.
Add Instrumental: Upload an a cappella vocal track, and Suno will intelligently generate and add a matching accompaniment.
Add Vocals: Upload an instrumental track, and Suno will generate lyrics and a vocal performance to match.
Documentation: Suno Scenario Application Guide

🔹 Kling#

✨ Massive Video Effects Library Expansion: Added 63 new video effects (62 single-subject effects and 1 two-person interactive effect), bringing the total to 80 available effects for more creative choices.
🔊 Video-to-Audio Optimization: The video-to-audio generation feature now supports full-resolution video uploads for more precise sound effect matching.
📈 Multi-Image to Video Performance Skyrockets: Experience a 102% improvement over the previous version! See significant enhancements in subject consistency, dynamic quality, and interaction naturalness. This is a seamless upgrade with no code changes required.
🎬 Text-to-Video Quality Upgrade: Version 1.6 now supports the generation of higher-quality videos.
Parameter Example: "mode": "pro"
Documentation: Kling Video Generation
🎨 Image Generation Model Update: The new kling-v2-new model is now live, supporting nearly 300 image styles to maximize your creativity!
Documentation: Kling Image Generation
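The `"mode": "pro"` parameter example above can be sketched in a request body like this. Only the `mode` field comes from the note; the other fields are hypothetical placeholders, not a documented schema:

```python
# Sketch: request the higher-quality 1.6 text-to-video generation via
# "mode": "pro". Only the "mode" field is taken from the note above;
# the rest is illustrative. Nothing is sent to the API.

def kling_t2v_body(prompt: str) -> dict:
    return {"mode": "pro", "prompt": prompt}
```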

🌟 2025-08-18#

🚀 New and Updated Models: Runway, VEO3, hunyuan-3D, Midjourney Fully Updated!#

🔹 Runway#

The Runway model adds several core functions, expanding its video and image generation capabilities:
Video to Video: generate a new video from an existing video.
Text to Image: generate an image from a text prompt.
Video Upscale: super-resolution enhancement for video.
Control a Character: drive a character's expressions and movements.
Click the link to experience it now: https://apidoc.cometapi.com/generate-a-video-from-a-video-20308134e0

🔹 VEO3#

VEO3 now supports asynchronous interface for task processing, optimizing the calling efficiency of long-duration tasks and enhancing the overall experience.
Click the link to experience it now: https://apidoc.cometapi.com/submit-video-generation-task-18941528e0

🔹 Hunyuan3D#

Supports Hunyuan3D-2, providing powerful 3D content creation capabilities to assist in efficiently generating high-quality 3D models.
Click the link to experience it now: https://apidoc.cometapi.com/hunyuan3d-20073774e0

🌟 2025-08-08#

🔹 GPT-5 Series
gpt-5, gpt-5-2025-08-07: OpenAI's flagship model, widely recognized as the industry's most powerful for coding, reasoning, and agentic tasks. It is designed to handle the most complex cross-domain challenges and excels in code generation, advanced reasoning, and autonomous agents, making it the premier choice for users demanding peak performance.
gpt-5-chat-latest: The continuously updated version of GPT-5. It always incorporates the latest features and optimizations, recommended for applications that need to stay current with the latest model capabilities.
🔹 GPT-5 Mini Series
gpt-5-mini, gpt-5-mini-2025-08-07: The cost-effective version of GPT-5, specifically optimized for speed and cost. It strikes an excellent balance between performance and affordability, making it the ideal choice for everyday tasks like general chat, content creation, and routine Q&A.
🔹 GPT-5 Nano Series
gpt-5-nano, gpt-5-nano-2025-08-07: The fastest and most cost-effective lightweight version in the GPT-5 family. It is perfect for scenarios requiring high throughput and instant responses, such as text classification, sentiment analysis, summary extraction, and data formatting.
API Call Instructions: gpt-5-chat-latest should be called using the standard /v1/chat/completions format. For other models (gpt-5, gpt-5-mini, gpt-5-nano, and their dated versions), using the /v1/responses format is recommended. For details, please refer to: https://apidoc.cometapi.com/api-13851472

Note#

Important: top_p is not supported by this series of models.
Temperature settings:
gpt-5-chat-latest: supports custom temperature values between 0 and 1 (inclusive).
All other GPT-5 models: temperature is fixed at 1; set it to 1 or omit it (it defaults to 1).
When calling GPT-5 series models (excluding gpt-5-chat-latest), use max_completion_tokens in place of the max_tokens field.
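As a rough illustration of the rules above, here is a hypothetical helper that builds a request body respecting them. It constructs the payload shape only (nothing is sent), and the field names follow the note above rather than a verified SDK:

```python
# Hypothetical helper illustrating the GPT-5 parameter rules described above.
# It only builds a request body; no request is made.

def build_gpt5_body(model: str, prompt: str, max_out: int) -> dict:
    body = {"model": model}  # note: no top_p -- unsupported for this series
    if model == "gpt-5-chat-latest":
        # /v1/chat/completions style; custom temperature in [0, 1] is allowed.
        body["messages"] = [{"role": "user", "content": prompt}]
        body["temperature"] = 0.7
        body["max_tokens"] = max_out
    else:
        # /v1/responses style is recommended; temperature is fixed at 1, so it
        # is simply omitted, and max_tokens becomes max_completion_tokens.
        body["input"] = [{"role": "user", "content": prompt}]
        body["max_completion_tokens"] = max_out
    return body
```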

🌟 2025.08.06#

🔹 claude-opus-4-1-20250805
claude-opus-4-1-20250805: Anthropic's flagship Claude Opus 4.1 model, achieving major breakthroughs in programming, reasoning, and agentic tasks, with SWE-bench Verified reaching 74.5%.
Significantly enhanced multi-file code refactoring, debugging precision, and detail-oriented reasoning capabilities. This model is suitable for demanding programming and reasoning scenarios.
We have also added cometapi-opus-4-1-20250805 specifically for Cursor integration.
🔹 claude-opus-4-1-20250805-thinking
claude-opus-4-1-20250805-thinking: Claude Opus 4.1 version with extended thinking capabilities, providing up to 64K tokens of deep reasoning capacity.
Optimized for research, data analysis, and tool-assisted reasoning tasks, with powerful detail-oriented reasoning abilities.
We have also added cometapi-opus-4-1-20250805-thinking specifically for Cursor integration.
🔹 gpt-oss-120b
gpt-oss-120b: OpenAI's released 117B parameter Mixture of Experts (MoE) open-source model, designed for high-level reasoning, agentic, and general production use cases.
🔹 gpt-oss-20b
gpt-oss-20b: 21B parameter open-source MoE model with 3.6B active parameter architecture, optimized for low-latency inference and consumer-grade hardware deployment.
All above models follow the OpenAI chat standard format for API calls. For details, please refer to: https://apidoc.cometapi.com/api-13851472

🌟 2025.08.05#

🚀 Feature Updates: gemini-2.5-flash-lite, o3 & o4-mini Deep Research, Volcano Engine Generation Models
gemini-2.5-flash-lite - Google's most cost-effective model, built for large-scale tasks!
⚡️ High Efficiency: Designed for large-scale, low-latency applications.
🔧 Standard Format: Follows the OpenAI chat standard format, see details: CometAPI Chat Documentation
o3 & o4-mini Deep Research Agents - Get in-depth analysis reports with web-connected research agents!
🧠 Advanced Analysis: Supports multi-step reasoning and provides reports with citations.
🤖 Available Models: o3-deep-research, o3-deep-research-2025-06-26, o4-mini-deep-research, o4-mini-deep-research-2025-06-26
📚 How to Call: The four deep-research models above require a dedicated call format; see the CometAPI documentation for details.
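As a hedged sketch only: assuming the deep-research models use the same /v1/responses body shape shown elsewhere in these release notes (an assumption, not confirmed documentation for these models), a request body could be built like this:

```python
import json

# Assumed endpoint and body shape, mirroring the /v1/responses example
# elsewhere in these release notes; not verified for the deep-research models.
API_URL = "https://api.cometapi.com/v1/responses"

def deep_research_body(model: str, question: str) -> str:
    """Serialize a hypothetical deep-research request body (nothing is sent)."""
    return json.dumps({
        "model": model,  # e.g. "o3-deep-research" or "o4-mini-deep-research"
        "input": [{"role": "user", "content": question}],
    })
```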
Volcano Engine Video & Image Models - Experience powerful new video and image models!
🎬 Video Generation: Create videos from images (bytedance-seedance-1-0-pro, bytedance-seedance-1-0-lite-i2v-250428) or text (bytedance-seedance-1-0-lite-t2v-250428).
🎨 Image Generation & Editing: Generate images with bytedance-seedream-3.0-t2i or edit them using prompts with bytedance-seedEdit-3.0-i2i.
📚 How to Call: https://apidoc.cometapi.com/api-19771367

🌟 2025.07.31#

🚀 Feature Updates: MJ Video Generation, Flux-Kontext Multi-Image Reference, Kling-v1-6 Multi-Image Reference
MJ Video Generation - Transform static images into dynamic video effects!
🎬 New capability: MJ original image generation endpoint /mj/submit/imagine now supports video generation
🎨 Bring creativity to life: Perfect for creating animated effects, creative video generation, and various other applications
📚 Learn more: View Documentation
Flux-Kontext Series Multi-Image Reference - Enhanced AI creation with multiple references!
🖼️ Expanded support: Now supports uploading up to 4 reference images (previously only single image supported)
🎯 Precision boost: Multi-image reference makes AI creation more precise with richer inspiration
🔧 Compatible models: Only supported by black-forest-labs/flux-kontext-max and black-forest-labs/flux-kontext-pro models
📚 Learn more: View Documentation
Kling-v1-6 Multi-Image Reference - Elevate your video generation quality!
📸 Multi-image input: Supports up to 4 images as reference input
⚡ Quality enhancement: Significantly improves video generation quality
🎯 Model specific: Only available for the kling-v1-6 model. View Documentation

🌟 2025.07.29#

🔹 Latest Support: glm-4.5, glm-4.5-air, glm-4.5-x, glm-4.5-airx, glm-4.5-flash
glm-4.5: Flagship model with 355B total parameters and 32B active parameters, designed for agentic applications, supporting hybrid reasoning modes and excelling in complex reasoning, tool calling, and web browsing.
glm-4.5-air: Cost-effective model with 106B total parameters and 12B active parameters, maintaining strong performance while significantly reducing costs, ideal for resource-sensitive applications.
glm-4.5-x: High-performance model optimized for ultra-fast inference and powerful reasoning capabilities, delivering millisecond-level response times for scenarios requiring speed and logic.
glm-4.5-airx: Lightweight yet powerful model combining Air's cost advantages with X's speed benefits, offering the perfect balance between performance and efficiency.
glm-4.5-flash: Efficient multi-purpose model with high generation speed, specifically optimized for coding and reasoning tasks, suitable for developers getting started and rapid prototyping.
Follows the OpenAI chat standard format; see details: CometAPI Chat Documentation

🌟 2025.07.25#

🔹 New Model: gemini-2.5-pro-all, gemini-2.5-flash-all, gemini-2.5-pro-deepsearch, gemini-2.5-flash-deepsearch, deepseek-r1t2-chimera
gemini-2.5-pro-all, gemini-2.5-flash-all: Multimodal versions of the Gemini models, supporting analysis of files, videos, and images, as well as image generation and real-time web access.
gemini-2.5-pro-deepsearch: A deep search model with enhanced search and information-retrieval capabilities, ideal for complex knowledge integration and analysis.
gemini-2.5-flash-deepsearch: A deep search model combining the rapid performance of the Flash model with advanced deep search capabilities for fast, in-depth information discovery.
deepseek-r1t2-chimera: A 671B parameter Mixture-of-Experts (MoE) text generation model merged from DeepSeek-AI's R1-0528, R1, and V3-0324, supporting a context of up to 60k tokens.
Follows the OpenAI chat standard format; see details: CometAPI Chat Documentation

🌟 2025.07.24#

🔹 qwen3-coder-plus
qwen3-coder-plus: Focused on code generation, understanding, and optimization, excels in complex programming tasks.
🔹 qwen3-coder-plus-2025-07-22
qwen3-coder-plus-2025-07-22: Optimized version from 2025-07-22, stable and reliable, suitable for production.
🔹 qwen3-coder-480b-a35b-instruct
qwen3-coder-480b-a35b-instruct: Flagship model with 480 billion parameters, MoE architecture, capable of handling extremely complex programming.
These models follow the OpenAI chat standard format; see: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.07.18#

🔹 Suno v4.5+
Suno v4.5+: v4.5+ offers richer sounds, new creation methods, and a maximum track length of 8 minutes. CometAPI now supports Suno 4.5+; set the request parameter mv to chirp-bluejay.
The above model follows the suno format, please refer to: https://apidoc.cometapi.com/api-13851480
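A minimal sketch of the version switch described above. Only the `mv` field comes from the note; the `prompt` field is a hypothetical placeholder, not a documented schema:

```python
# Sketch: select Suno v4.5+ by setting mv to "chirp-bluejay" as noted above.
# Only "mv" comes from the note; other fields are illustrative placeholders.

def suno_body(prompt: str, mv: str = "chirp-bluejay") -> dict:
    return {"mv": mv, "prompt": prompt}
```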

🌟 2025.07.17#

CometAPI supports Midjourney uploading masked images for local modifications
Refer to: https://apidoc.cometapi.com/api-18989894

🌟 2025.07.16#

🔹 kimi-k2-0711-preview
kimi-k2-0711-preview: Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, with 1 trillion total parameters and 32 billion active parameters per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis.
Kimi K2 performs well across a variety of benchmarks, especially coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool use (Tau2, AceBench).
It supports long-context inference with up to 128K tokens and features a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
The model follows the OpenAI chat standard format; see: https://apidoc.cometapi.com/chat-api-13851472
🌟 Since Google officially retired the versioned gemini-2.5 preview models on 7-15, preview-model requests are now forwarded to the official releases; calling the gemini-2.5-pro model is recommended. See the official notice: https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash

🌟 2025.07.14#

CometAPI now supports direct calls to the OpenAI API to process PDFs without uploading files by providing the URL of the PDF file.
For details on how to call it, see: https://apidoc.cometapi.com/api-18535147
🌟 OpenAI officially retired the gpt-4.5 series models on 7-14; the gpt-4.1 models are recommended instead. See the official notice: https://platform.openai.com/docs/deprecations

🌟 2025.07.11#

🚀 CometAPI supports Claude Code!
• Add power to your development workflow: CometAPI now fully supports the powerful Claude Code.
• What does this mean for you?
• Top AI features: easily generate, debug, and optimize code using models built for developers.
• ⚙️ Flexible model selection: our comprehensive range of models lets you develop more seamlessly.
• Seamless integration: APIs are always available; integrate Claude Code into your existing workflow in minutes.
• Ready to build faster? Click the link below to make a call.
• Click: https://apidoc.cometapi.com/doc-1266358

🌟 2025.07.10#

🔹 grok-4
🔹 grok-4-0709
grok-4, grok-4-0709: Currently supports the text modality, with vision, image generation, and other features coming soon. Extremely strong technical parameters and ecosystem capabilities. Context window: supports up to 256,000 tokens of context, ahead of mainstream models.
The model follows the OpenAI chat standard format; see: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.07.04#

Suno now supports stem separation, creating a Persona, generating MP4 MV videos, getting WAV-format files, and Timing: lyrics & audio timeline.
Suno's full-track and single-track stem separation can split a song into up to 12 clean tracks, including vocals, drums, bass, etc., convenient for preview and download.
Create a new Persona to capture a singer's style and generate music in different formats.
Note: Full-track stem separation is priced at 5x the music-generation cost. Single-track billing is still being optimized and currently matches the full-track 5x pricing; it will be charged at 1x the base price in the future.
For specific model usage, please refer to: https://apidoc.cometapi.com/api-18657316
🔹 veo3
🔹 veo3-pro
🔹 veo3-fast
🔹 veo3-frames
🔹 veo3-fast-frames
🔹 veo3-pro-frames
veo3, veo3-pro, veo3-fast: Google's latest official video generation models; the generated videos include sound, making this the world's only video model with audio. veo3-frames, veo3-fast-frames, and veo3-pro-frames support first-frame mode.
These models follow the OpenAI chat standard format for calls; refer to: https://apidoc.cometapi.com/api-18582532

🌟 2025.07.01#

🔹 mj_fast_video
Midjourney video generation is now supported.
Synchronized support for the official site's low-motion, high-motion, auto-generation, and manual-generation options.
Please see: https://apidoc.cometapi.com/api-18581293

🌟 2025.07.01#

🔹 kling_image_expand
Now supports the Kling API for expanding (outpainting) images.
Please see: https://apidoc.cometapi.com/api-18584170

🌟 2025.06.25#

🔹 black-forest-labs/flux-kontext-pro
🔹 black-forest-labs/flux-kontext-max
🔹 flux-kontext-pro
🔹 flux-kontext-max
black-forest-labs/flux-kontext-pro, black-forest-labs/flux-kontext-max:
The above two models follow the replicate call format; see details: https://apidoc.cometapi.com/api-16455857
flux-kontext-pro, flux-kontext-max:
The above two models follow the OpenAI chat standard call format; see details: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.06.19#

🔹 gemini-2.5-flash-lite-preview-06-17
gemini-2.5-flash-lite-preview-06-17: Built for large-scale processing at lower cost.
This model follows the OpenAI chat standard format for API calls, please refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.06.11#

🔹 o3-pro
🔹 o3-pro-2025-06-10
o3-pro,o3-pro-2025-06-10: Supports web search, file analysis, visual input reasoning, Python programming, and personalized responses.
Compared to previous models, o3-pro shows significant improvements in clarity, completeness, instruction following, and accuracy.
This model adheres to the OpenAI v1/responses standard call format. For reference:
curl --location --request POST 'https://api.cometapi.com/v1/responses' \
--header 'Authorization: Bearer sk-xxxxxx' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "o3-pro",
    "input": [{"role": "user", "content": "What is the difference between inductive and deductive reasoning?"}]
}'

🌟 2025.06.06#

🔹 gemini-2.5-pro-preview-06-05
gemini-2.5-pro-preview-06-05: With native multimodal processing capabilities and a very long context window of up to 1 million tokens, it provides unprecedented power for complex, long-sequence tasks.
This model follows the OpenAI chat standard format for API calls, please refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.05.30#

🌟 Gemini 1.5 series retirement notice:
Because Google no longer offers the 1.5 series officially, it is being phased out today.
Updated models: gemini-2.5-flash-preview-05-20, gemini-2.5-flash-preview-04-17, gemini-2.5-pro-preview-05-06, gemini-2.5-pro-preview-03-25, gemini-2.5-pro-exp-03-25.
Please continue to call the 2.5 series; the above models follow the OpenAI chat standard format, refer to: https://apidoc.cometapi.com/api-276386060
🌟 Notes on using GPT to generate images:
Because the gpt-4o-image interface is implemented through technical means rather than as an asynchronous interface, complete stability cannot be guaranteed, so some instability will occur.
If you have high stability requirements, we recommend gpt-image-1; the official API call is more stable. Use the official images/generations calling format, with the URL https://api.cometapi.com/v1/images/generations.
Meanwhile, gpt-4o-image and gpt-image-1 also support the chat format, implemented through technical means; see:
https://api.cometapi.com/v1/chat/completions
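The endpoint choice described above can be summarized in a small sketch. The helper is hypothetical; the two URLs are the endpoints named in the note:

```python
# Sketch of the endpoint guidance above: gpt-image-1 via the official, more
# stable images/generations endpoint; gpt-4o-image (or either model when the
# chat format is desired) via chat/completions. Hypothetical helper only.

BASE = "https://api.cometapi.com/v1"

def image_endpoint(model: str, use_chat_format: bool = False) -> str:
    if use_chat_format or model == "gpt-4o-image":
        return f"{BASE}/chat/completions"
    return f"{BASE}/images/generations"
```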

🌟 2025.05.29#

🔹 deepseek-r1-0528
deepseek-r1-0528: Advanced reasoning capabilities, large parameter scale, powerful performance, suitable for complex tasks.
This model follows the OpenAI chat standard format for API calls, please refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.05.23#

🔹 claude-sonnet-4-20250514
claude-sonnet-4-20250514: An important model in the Claude 4 series developed by Anthropic, significantly improving coding and reasoning capabilities compared to its predecessor Claude Sonnet 3.7.
It can respond more precisely to user instructions and efficiently handle complex tasks. This model is suitable for applications requiring high performance and cost-effectiveness. We've also added cometapi-sonnet-4-20250514 specifically for use in Cursor.
🔹 claude-sonnet-4-20250514-thinking
claude-sonnet-4-20250514-thinking: An important model in the Claude 4 series developed by Anthropic, significantly improving coding and reasoning capabilities compared to its predecessor Claude Sonnet 3.7.
It can respond more precisely to user instructions and efficiently handle complex tasks. This model is suitable for applications requiring high performance and cost-effectiveness. We've also added cometapi-sonnet-4-20250514-thinking specifically for use in Cursor.
🔹 claude-opus-4-20250514
claude-opus-4-20250514: Opus 4 is Anthropic's most advanced model, acclaimed as the world's best coding model.
It excels in handling complex, long-running tasks and intelligent agent workflows, particularly suitable for applications requiring high autonomy and intelligence.
🔹 claude-opus-4-20250514-thinking
claude-opus-4-20250514-thinking: Opus 4 is Anthropic's most advanced model, acclaimed as the world's best coding model.
It excels in handling complex, long-running tasks and intelligent agent workflows, particularly suitable for applications requiring high autonomy and intelligence.
This model follows the OpenAI chat standard format for calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.05.07#

🔹 Suno v4.5
Suno v4.5: v4.5 offers more expressive music and richer vocals, designed to enhance the user's expression and intuition in music creation. CometAPI now supports Suno 4.5; change the request parameter mv to chirp-auk.
The above model follows the suno format, please refer to: https://apidoc.cometapi.com/api-13851480

🌟 2025.04.29#

🔹 qwen3-235b-a22b
qwen3-235b-a22b: This is the flagship model of the Qwen3 series, with 235 billion parameters, utilizing a Mixture of Experts (MoE) architecture.
Particularly suitable for complex tasks requiring high-performance inference, such as coding, mathematics, and multimodal applications.
🔹 qwen3-30b-a3b
qwen3-30b-a3b: With 30 billion parameters, it balances performance and resource requirements, suitable for enterprise-level applications.
This model may use MoE or other optimized architectures, applicable for scenarios requiring efficient processing of complex tasks, such as intelligent customer service and content generation.
🔹 qwen3-8b
qwen3-8b: A lightweight model with 8 billion parameters, designed specifically for resource-constrained environments (such as mobile devices or low-spec servers).
Its efficiency and fast response capability make it suitable for simple queries, real-time interaction, and lightweight applications.
These models follow the OpenAI Chat standard format for calls. For specific details, please refer to:
https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.04.27#

🔹 gpt-image-1
gpt-image-1 introduces native multimodal models to the API, built on GPT-4o's image generation capabilities, designed to provide developers with a powerful and flexible tool for generating high-quality, diverse images.
Features: High-fidelity images; diverse visual styles; rich world knowledge; consistent text rendering; unlocking practical applications across multiple domains.
This model follows the OpenAI v1/images/generations format for calls; see details at: https://apidoc.cometapi.com/images-api-13851474. Here's an example of input parameters:
{
    "model": "gpt-image-1",
    "prompt": "A cute baby sea otter",
    "n": 1,
    "size": "1024x1024"
}

🌟 2025.04.20#

🔹 gemini-2.5-flash-preview-04-17
gemini-2.5-flash-preview-04-17: Gemini 2.5 Flash is an AI model developed by Google, designed to provide developers with fast and cost-effective solutions, especially for applications requiring enhanced reasoning capabilities.
According to the Gemini 2.5 Flash preview announcement, the preview version was released on April 17, 2025, supports multimodal input, and has a context window of up to 1 million tokens.
This model follows the OpenAI chat standard format for calling; refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.04.17#

🔹 o4-mini
🔹 o4-mini-2025-04-16
o4-mini, o4-mini-2025-04-16: A smaller, faster, and more economical model, research shows it performs well in mathematics, coding, and visual tasks, designed to be efficient and responsive, suitable for developers. Released on April 16, 2025.
🔹 o3
🔹 o3-2025-04-16
o3, o3-2025-04-16: A reflective generative pre-trained transformer (GPT) model designed to handle problems requiring step-by-step logical reasoning.
Research shows it excels at mathematics, coding, and scientific tasks. It can also use tools such as web browsing and image generation, with a release date of April 16, 2025.
The above models follow the OpenAI chat standard format for calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.04.15#

🔹 gpt-4.1
gpt-4.1: Major advances in coding and instruction following; GPT-4.1 is now the leading model for coding.
Long context: On Video-MME, a benchmark for multimodal long-context understanding, GPT-4.1 sets a new state-of-the-art result.
The GPT-4.1 model series delivers superior performance at lower cost.
🔹 gpt-4.1-mini
gpt-4.1-mini: Represents a significant leap in small-model performance, even outperforming GPT-4o on many benchmarks.
It matches or exceeds GPT-4o on intelligence evaluations while cutting latency by nearly half and cost by 83%.
🔹 gpt-4.1-nano
gpt-4.1-nano: Supports a context window of up to 1 million tokens and makes better use of that context through improved long-context understanding. Knowledge cutoff updated to June 2024.
These models follow the standard OpenAI chat format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.04.14#

🔹 grok-3-deepersearch
grok-3-deepersearch: Offers highly current data, an excellent interactive experience, a thorough search-and-reasoning process, and comprehensive webpage aggregation.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.04.13#

🔹 gemini-2.0-flash-exp-image-generation
This model supports conversation while enabling image generation and editing capabilities, outputting high-definition images.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/api-15928299

🌟 2025.04.10#

🔹 grok-3-fast
🔹 grok-3-fast-latest
grok-3-fast, grok-3-fast-latest: grok-3 and grok-3-fast use exactly the same underlying model and provide the same response quality. However, grok-3-fast is served on faster infrastructure, delivering response times that are much quicker than the standard grok-3.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472
🔹 grok-3-mini
🔹 grok-3-mini-latest
grok-3-mini, grok-3-mini-latest: A lightweight model that thinks before responding. Fast, intelligent, and ideal for logic-based tasks that don't require deep domain knowledge. The original thought traces are accessible.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472
🔹 grok-3-mini-fast
🔹 grok-3-mini-fast-latest
grok-3-mini-fast, grok-3-mini-fast-latest: grok-3-mini and grok-3-mini-fast use exactly the same underlying model and provide the same response quality. However, grok-3-mini-fast is served on faster infrastructure, delivering response times that are much quicker than the standard grok-3-mini.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.04.07#

🔹 llama-4-maverick
llama-4-maverick, a high-capacity multimodal language model from Meta, accepts multilingual text and image inputs and generates text and code output in 12 supported languages.
Maverick is optimized for visual-language tasks and is instruction-tuned for assistant-like behavior, image reasoning, and general multimodal interaction.
Maverick features native multimodal early fusion and a context window of 1 million tokens.
Maverick was released on April 5, 2025 under the Llama 4 Community License for research and commercial applications requiring advanced multimodal understanding and high model throughput.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472
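Since Maverick accepts image inputs through the same chat format, a multimodal user message can combine text and image parts. The `image_url` content-part shape below follows the OpenAI chat convention; the example image URL is a placeholder:

```python
import json

# A chat request mixing a text part and an image part, OpenAI content-part style.
payload = {
    "model": "llama-4-maverick",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this picture?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
}

# Pretty-print the serialized body to inspect the structure.
print(json.dumps(payload, indent=2)[:80])
```

Text-only requests can keep using a plain string for `content`; the list-of-parts form is only needed when an image is attached.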
🔹 llama-4-scout
llama-4-scout is a mixture-of-experts (MoE) language model developed by Meta. It supports native multimodal input (text and images) and multilingual output (text and code) for 12 supported languages.
Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass, a context length of 10 million tokens, and a training corpus of about 40 trillion tokens.
Built for high efficiency and local or commercial deployment, llama-4-scout employs early-fusion technology for seamless modality integration.
It is instruction-tuned for multilingual chat, captioning, and image-understanding tasks.
It is released under the Llama 4 Community License, with a training-data cutoff of August 2024 and a public release on April 5, 2025.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.03.29#

🔹 gpt-4o-all
gpt-4o-all supports ChatGPT's latest image generation mode.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/api-15928299
🔹 gpt-4o-image
gpt-4o-image: Dedicated to image generation and editing; it enables image style conversion, preserves original image features with superb consistency, and outputs high-definition images.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/api-15928299

🌟 2025.03.27#

🔹 gemini-2.5-pro-exp-03-25
Features native multimodal processing and an extensive context window of up to 1 million tokens, providing powerful support for complex, long-sequence tasks.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472
🔹 gemini-2.5-pro-preview-03-25
According to Google's data, Gemini 2.5 Pro demonstrates particularly outstanding performance in handling complex tasks.
This model follows the OpenAI chat standard format for API calls, refer to: https://apidoc.cometapi.com/chat-api-13851472

🌟 2025.03.24#

🔹 gpt-4.5-preview-2025-02-27
Preview Version: Showcasing the latest features of GPT-4.5, providing enhanced understanding and generation capabilities, suitable for various tasks, improving user experience.
🔹 gpt-4.5-preview
Preview Version: Deeply optimized algorithms and performance, delivering ultra-fast responses and precise outputs, perfectly suited for efficient decision-making scenarios.
🔹 gpt-4.5
Professional Standard Version: Stable and reliable, combining rich expression and multi-task processing capabilities, suitable for wide applications including business, education, creative, and technical fields.

🌟 2025.02.20#

🔹 claude-3-7-sonnet-thinking
Advanced model designed for complex reasoning and creative thinking, unleashing unlimited possibilities, empowering breakthrough problem-solving and innovation.
🔹 claude-3-7-sonnet-20250219
High-end version integrating the latest technological breakthroughs, handling complex tasks with superior performance, providing intelligent innovative solutions for users.
🔹 cometapi-3-7-sonnet
Outstanding multi-domain processing expert, delivering precise and smooth output experience, easily tackling various professional challenges.
🔹 cometapi-3-7-sonnet-thinking
Equipped with revolutionary algorithm architecture, significantly enhancing deep analysis and complex task management capabilities, making thinking more thorough and comprehensive.

🔗 Usage Guide:#

✅ All models have been added to the default group, allowing you to flexibly call them according to different usage scenarios and requirements, easily integrate, and maximize their application value.

🛠 Quick Start:#

Simple integration into your system unlocks powerful capabilities. Fully utilize each model's unique advantages to meet professional needs across different domains.
🔥 Experience the revolutionary performance improvements these breakthrough models bring right now! 🔥

For professional support or detailed consultation, please contact our customer service team or visit our technical documentation center. We look forward to your valuable feedback!