🎵 Introducing Two Major Music Creation Features: Easily add accompaniment to vocals and generate lyrics & vocals for instrumental tracks.
Add Instrumental: Upload an a cappella vocal track, and Suno will intelligently generate and add a matching accompaniment.
Add Vocals: Upload an instrumental track, and Suno will generate lyrics and a vocal performance to match.
✨ Massive Video Effects Library Expansion: Added 63 new video effects (62 single-subject effects and 1 two-person interactive effect), bringing the total to 80 available effects for more creative choices.
🔊 Video-to-Audio Optimization: The video-to-audio generation feature now supports full-resolution video uploads for more precise sound effect matching.
📈 Multi-Image to Video Performance Skyrockets: Experience a 102% improvement over the previous version! See significant enhancements in subject consistency, dynamic quality, and interaction naturalness. This is a seamless upgrade with no code changes required.
🎬 Text-to-Video Quality Upgrade: Version 1.6 now supports the generation of higher-quality videos.
Parameter Example: "mode": "pro"
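Below is a hedged sketch of what a text-to-video request using the new mode might look like. The endpoint path, model identifier, and other field names are illustrative assumptions; only the "mode": "pro" parameter comes from the entry above.

```python
import requests

# Hypothetical endpoint path and payload shape, shown for illustration only;
# the "mode" parameter is the one documented in the changelog entry above.
resp = requests.post(
    "https://api.cometapi.com/kling/v1/videos/text2video",  # assumed path
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "model_name": "kling-v1-6",  # assumed identifier for version 1.6
        "prompt": "A red fox trotting through fresh snow at dawn",
        "mode": "pro",               # the new higher-quality generation mode
    },
    timeout=60,
)
print(resp.json())
```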
🎨 Image Generation Model Update: The new kling-v2-new model is now live, supporting nearly 300 image styles to maximize your creativity!
gpt-5, gpt-5-2025-08-07: OpenAI's flagship model, widely recognized as the industry's most powerful for coding, reasoning, and agentic tasks. It is designed to handle the most complex cross-domain challenges and excels in code generation, advanced reasoning, and autonomous agents, making it the premier choice for users demanding peak performance.
gpt-5-chat-latest: The continuously updated version of GPT-5. It always incorporates the latest features and optimizations and is recommended for applications that need to stay current with the latest model capabilities.
Important: top_p is not supported by this series of models.
gpt-5-chat-latest: Supports custom temperature values between 0 and 1 (inclusive).
All other GPT-5 models: The temperature is fixed at 1. You may set it to 1 or omit it (defaults to 1).
When calling the GPT-5 series models (excluding gpt-5-chat-latest), replace the max_tokens field with max_completion_tokens, as in the sketch below.
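A minimal sketch of these parameter rules, assuming CometAPI exposes an OpenAI-compatible endpoint (the base URL and key placeholder are illustrative):

```python
from openai import OpenAI

# The base URL is an assumption for illustration; substitute your CometAPI endpoint.
client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")

resp = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Summarize the CAP theorem in two sentences."}],
    max_completion_tokens=512,  # replaces max_tokens for GPT-5 series models
    # temperature is omitted (defaults to 1); top_p is not supported by this series
)
print(resp.choices[0].message.content)
```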
claude-opus-4-1-20250805: Anthropic's flagship Claude Opus 4.1 model, achieving major breakthroughs in programming, reasoning, and agentic tasks, with SWE-bench Verified reaching 74.5%. It significantly enhances multi-file code refactoring, debugging precision, and detail-oriented reasoning, making it suitable for demanding programming and reasoning scenarios.
We have also added cometapi-opus-4-1-20250805 specifically for Cursor integration.
claude-opus-4-1-20250805-thinking: The Claude Opus 4.1 variant with extended thinking capabilities, providing up to 64K tokens of deep reasoning capacity. Optimized for research, data analysis, and tool-assisted reasoning tasks, with powerful detail-oriented reasoning abilities.
We have also added cometapi-opus-4-1-20250805-thinking specifically for Cursor integration.
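Because extended-thinking runs can produce long outputs, streaming is a natural fit. A hedged sketch, assuming the model is reachable through an OpenAI-compatible chat endpoint (base URL and key placeholder are illustrative):

```python
from openai import OpenAI

client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")  # assumed base URL

# Stream the response so long extended-thinking outputs arrive incrementally.
stream = client.chat.completions.create(
    model="claude-opus-4-1-20250805-thinking",
    messages=[{"role": "user", "content": "Compare three strategies for deduplicating a 100 GB log dataset."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```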
glm-4.5: Flagship model with 355B total parameters and 32B active parameters, designed for agentic applications, supporting hybrid reasoning modes and excelling in complex reasoning, tool calling, and web browsing.
glm-4.5-air: Cost-effective model with 106B total parameters and 12B active parameters, maintaining strong performance while significantly reducing costs, ideal for resource-sensitive applications.
glm-4.5-x: High-performance model optimized for ultra-fast inference and powerful reasoning capabilities, delivering millisecond-level response times for scenarios requiring both speed and logic.
glm-4.5-airx: Lightweight yet powerful model combining Air's cost advantages with X's speed benefits, offering a balance between performance and efficiency.
glm-4.5-flash: Efficient multi-purpose model with high generation speed, specifically optimized for coding and reasoning tasks, suitable for developers getting started and for rapid prototyping.
- Follows the OpenAI chat standard format, see details: CometAPI Chat Documentation
gemini-2.5-pro-all: A multimodal version of the Gemini model, supporting analysis of files, videos, and images, as well as image generation and real-time web access.
gemini-2.5-flash-all: A multimodal version of the Gemini model, supporting analysis of files, videos, and images, as well as image generation and real-time web access.
gemini-2.5-pro-deepsearch: A deep search model with enhanced deep search and information retrieval capabilities, ideal for complex knowledge integration and analysis.
gemini-2.5-flash-deepsearch: A deep search model combining the rapid performance of the Flash model with advanced deep search capabilities for fast, in-depth information discovery.
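Since these Gemini variants accept multimodal input through the standard OpenAI chat format, here is a hedged sketch (base URL, key placeholder, and image URL are illustrative):

```python
from openai import OpenAI

client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")  # assumed base URL

# A multimodal request in the OpenAI chat format: text plus an image URL.
resp = client.chat.completions.create(
    model="gemini-2.5-flash-all",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe what is shown in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```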
deepseek-r1t2-chimera: A 671B-parameter Mixture-of-Experts (MoE) text generation model merged from DeepSeek-AI's R1-0528, R1, and V3-0324, supporting a context of up to 60k tokens.
- Follows the OpenAI chat standard format, see details: CometAPI Chat Documentation
kimi-k2-0711-preview: Kimi K2 is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, with 1 trillion total parameters and 32 billion active parameters per forward pass. It is optimized for agent capabilities, including advanced tool usage, inference, and code synthesis.
Kimi K2 performs well on a variety of benchmarks, especially coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool usage (Tau2, AceBench) tasks.
It supports long-context inference with up to 128K tokens and features a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.
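Given K2's emphasis on tool usage, here is a hedged tool-calling sketch in the OpenAI chat format (the base URL, key placeholder, and get_weather tool are all illustrative assumptions):

```python
from openai import OpenAI

client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")  # assumed base URL

# A single hypothetical tool definition in the standard OpenAI tools schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not a CometAPI built-in
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="kimi-k2-0711-preview",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
    tools=tools,
)
# If the model decides to call the tool, the structured call appears here.
print(resp.choices[0].message.tool_calls)
```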
• Power up your development workflow: we're excited to announce that CometAPI now fully supports Claude Code.
• What does this mean for you?
• Top-tier AI capabilities: Easily generate, debug, and optimize code using models built specifically for developers.
• ⚙️ Flexible Model Selection: Our comprehensive range of models allows you to develop more seamlessly.
• Seamless Integration: APIs are always available. Integrate Claude Code directly into your existing workflow in minutes.
• Ready to build faster? Click the link below to start making calls.
Now supports Suno's full-track and single-track stem separation, which can split a song into up to 12 clean stems (vocals, drums, bass, and more) for easy preview and download.
Create a new Persona to capture a singer's style, then use that Persona to generate music in different formats.
Note: Full-track stem separation is priced at 5x the music generation cost. Single-track billing is still being finalized; it currently matches the 5x full-track pricing but will be charged at 1x the base price in the future.
o3-pro, o3-pro-2025-06-10: Supports web search, file analysis, visual input reasoning, Python programming, and personalized responses.
Compared to previous models, o3-pro shows significant improvements in clarity, completeness, instruction following, and accuracy.
This model adheres to the OpenAI v1/responses standard call format, as in the sketch below.
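A minimal sketch of a v1/responses call, assuming an OpenAI-compatible CometAPI endpoint (base URL and key placeholder are illustrative):

```python
from openai import OpenAI

client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")  # assumed base URL

# o3-pro is called through the v1/responses format rather than chat completions.
resp = client.responses.create(
    model="o3-pro",
    input="Walk through the proof that the square root of 2 is irrational.",
)
print(resp.output_text)
```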
qwen3-235b-a22b: The flagship model of the Qwen3 series, with 235 billion parameters, using a Mixture of Experts (MoE) architecture.
Particularly suitable for complex tasks requiring high-performance inference, such as coding, mathematics, and multimodal applications.
qwen3-30b-a3b: A Mixture-of-Experts model with 30 billion total parameters (about 3 billion active per forward pass), balancing performance and resource requirements; suitable for enterprise-level applications. It is well suited to scenarios requiring efficient processing of complex tasks, such as intelligent customer service and content generation.
qwen3-8b: A lightweight model with 8 billion parameters, designed specifically for resource-constrained environments (such as mobile devices or low-configuration servers).
Its efficiency and fast response capability make it suitable for simple queries, real-time interaction, and lightweight applications.
gemini-2.5-flash-preview-04-17: Gemini 2.5 Flash is an AI model developed by Google, designed to provide developers with fast and cost-effective solutions, especially for applications requiring enhanced reasoning capabilities. According to the Gemini 2.5 Flash preview announcement, the preview version was released on April 17, 2025, supports multimodal input, and has a context window of up to 1 million tokens.
o4-mini, o4-mini-2025-04-16: A smaller, faster, and more economical model that performs well in mathematics, coding, and visual tasks; designed to be efficient and responsive, and well suited for developers. Released on April 16, 2025.
o3, o3-2025-04-16: A reflective generative pre-trained transformer (GPT) model designed to handle problems requiring step-by-step logical reasoning. It excels at mathematics, coding, and scientific tasks, and can use tools such as web browsing and image generation. Released on April 16, 2025.
gpt-4.1: Major advancements in coding and instruction following; GPT-4.1 has become the leading model for coding.
Long context: On Video-MME, a benchmark for multimodal long-context understanding, GPT-4.1 sets a new state-of-the-art result.
The GPT-4.1 model series delivers superior performance at lower cost.
llama-4-maverick: a high-capacity multimodal language model from Meta that accepts multilingual text and image inputs and generates multilingual text and code output in 12 supported languages.
Maverick is optimized for visual language tasks and is instruction-tuned for assistant-like behavior, image reasoning, and general multimodal interaction.
Maverick features native multimodal early fusion and a 1 million token context window.
Maverick was released on April 5, 2025 under the Llama 4 Community License for research and commercial applications requiring advanced multimodal understanding and high model throughput.
llama-4-scout: a Mixture-of-Experts (MoE) language model developed by Meta. It supports native multimodal input (text and images) and multilingual output (text and code) across 12 supported languages.
Designed for assistant-style interaction and visual reasoning, Scout uses 16 experts per forward pass, a context length of 10 million tokens, and a training corpus of about 40 trillion tokens.
Designed for high efficiency and local or commercial deployment, llama-4-scout employs early-fusion technology for seamless modal integration.
It is instruction-tuned for multilingual chat, captioning, and image comprehension tasks.
It is released under the Llama 4 Community License, with a training data cutoff of August 2024 and a public release on April 5, 2025.
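As an illustration of Scout's long context, here is a hedged sketch that passes an entire document in one request through an assumed OpenAI-compatible endpoint (base URL, key placeholder, and file path are illustrative):

```python
from openai import OpenAI

client = OpenAI(api_key="<YOUR_API_KEY>", base_url="https://api.cometapi.com/v1")  # assumed base URL

# Scout's long context makes whole-document prompts practical;
# the file path and its size are illustrative.
with open("meeting_transcripts.txt", encoding="utf-8") as f:
    document = f.read()

resp = client.chat.completions.create(
    model="llama-4-scout",
    messages=[
        {"role": "system", "content": "Summarize the document in five bullet points."},
        {"role": "user", "content": document},
    ],
)
print(resp.choices[0].message.content)
```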