
Guide to calling gemini-2.5-flash-image (Nano Banana)

This document demonstrates how to use Google Gemini's image model, gemini-2.5-flash-image-preview, through CometAPI for image generation. It covers two common methods:
  • Gemini's official generateContent API for text-to-image generation
  • Gemini's official generateContent API for image-to-image generation (both input and output are Base64)
Important Notes:
  • Replace sk-xxxx in the examples with your CometAPI key. For security, do not expose your key in client-side code or public repositories.
  • The Authorization header for CometAPI typically uses the key value directly, e.g., Authorization: sk-xxxx.
  • The returned image is usually provided as Base64-encoded inline_data; decode it on the client side and save it as a file.
Basic Information:
  • Base URL: https://api.cometapi.com
  • Model Names: gemini-2.5-flash-image-preview / gemini-2.5-flash-image

I. Gemini's Official generateContent for Text-to-Image#

Use Gemini's official generateContent endpoint for text-to-image generation. Place the text prompt in contents.parts[].text.
Example:
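The request can be issued from Python as well; a minimal sketch, assuming the endpoint follows Gemini's official REST path convention (`/v1beta/models/<model>:generateContent`) on the CometAPI base URL:

```python
import json
import urllib.request

API_KEY = "sk-xxxx"  # replace with your CometAPI key
# Assumption: CometAPI mirrors Gemini's official REST path convention.
URL = ("https://api.cometapi.com/v1beta/models/"
       "gemini-2.5-flash-image-preview:generateContent")


def build_text_to_image_payload(prompt: str) -> dict:
    # The text prompt goes into contents.parts[].text.
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str) -> dict:
    req = urllib.request.Request(
        URL,
        data=json.dumps(build_text_to_image_payload(prompt)).encode("utf-8"),
        headers={"Authorization": API_KEY, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Calling `generate("a watercolor fox in the snow")` returns the parsed JSON response; the image itself arrives Base64-encoded inside candidates[0].content.parts, as described below.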
Response Highlights:
The image data is typically found in response.candidates[0].content.parts, which can contain:
Text description:
{ "text": "..." }
Image data:
{ "inline_data": { "mime_type": "image/png", "data": "<base64>" } }
Decode the data field (the Base64 string) and save it as a file with the corresponding extension.

II. Gemini's Official generateContent for Image-to-Image (Base64 I/O)#

This endpoint supports "image-to-image" generation: upload an input image (as Base64) and receive a modified new image (also in Base64 format).
Example:
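A minimal Python sketch of the request-body construction (the file path, prompt, and MIME type are placeholders; note that the `data` field takes the raw Base64 string, with no `data:` prefix):

```python
import base64


def image_file_to_base64(path: str) -> str:
    # Raw Base64 only -- no "data:image/jpeg;base64," prefix.
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def build_image_edit_payload(prompt: str, image_b64: str,
                             mime_type: str = "image/jpeg") -> dict:
    # The input image rides alongside the text prompt as an inline_data part.
    return {
        "contents": [{
            "parts": [
                {"text": prompt},
                {"inline_data": {"mime_type": mime_type, "data": image_b64}},
            ]
        }]
    }
```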
Description:
First, convert your source image file into a Base64 string and place it in inline_data.data. Do not include prefixes like data:image/jpeg;base64,.
The output is also located in candidates[0].content.parts and includes:
An optional text part (description or prompt).
The image part as inline_data (where data is the Base64 of the output image).
For multiple images, you can append them directly, for example:
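Each extra image is simply one more inline_data part in the same parts array; a sketch (the helper name is illustrative):

```python
import base64


def build_multi_image_payload(prompt: str, images_b64: list,
                              mime_type: str = "image/jpeg") -> dict:
    parts = [{"text": prompt}]
    for b64 in images_b64:
        # One inline_data part per input image, appended after the prompt.
        parts.append({"inline_data": {"mime_type": mime_type, "data": b64}})
    return {"contents": [{"parts": parts}]}
```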

III. Official Gemini: Image Generation from Multiple Images (Base64 Input/Output)#

This endpoint supports "multi-image to image" generation: upload multiple input images (as Base64) and receive a new, modified image (also in Base64 format).

Method 1: Combine multiple images into a single collage, as shown in the example below#

(Input collage image)
Example input description:
A model is posing and leaning against a pink bmw. She is wearing the following items, the scene is against a light grey background. The green alien is a keychain and it's attached to the pink handbag. The model also has a pink parrot on her shoulder. There is a pug sitting next to her wearing a pink collar and gold headphones.
Returned Base64 converted back to an image:
(Output image)
Example:
Notes:
First, convert your source image file to a Base64 string and insert it into inline_data.data (do not include prefixes like data:image/jpeg;base64,).
The output is also located in candidates[0].content.parts and contains:
An optional text part (description or prompt)
An image part inline_data (where data is the Base64 of the output image)

Method 2: Pass multiple images via Base64 parameters (up to three images)#

Example:
API Input Parameters:
Example of returned image:
(Output image)

How to Extract and Save the Base64 Image from the Response#

Using the Gemini-style response as an example, the pseudo-structure is as follows (for illustration purposes):
{
  "candidates": [
    {
      "content": {
        "parts": [
          { "text": "..." },
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": "<base64-string>"
            }
          }
        ]
      }
    }
  ]
}
Extract the inlineData.data string and choose the file extension based on its mimeType (e.g., .png, .jpg, .webp).
In your client application, decode the Base64 string and save it as a binary file.
Python Example:
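A sketch that handles both the camelCase (`inlineData`/`mimeType`) and snake_case (`inline_data`/`mime_type`) field spellings seen in this document, and picks the file extension from the MIME type:

```python
import base64

# Map MIME type to a file extension for the saved image.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def save_image_from_response(response: dict, basename: str = "output"):
    for part in response["candidates"][0]["content"]["parts"]:
        # REST responses may use camelCase; some clients emit snake_case.
        blob = part.get("inlineData") or part.get("inline_data") or {}
        if "data" in blob:
            mime = blob.get("mimeType") or blob.get("mime_type") or "image/png"
            path = basename + EXTENSIONS.get(mime, ".png")
            with open(path, "wb") as f:
                f.write(base64.b64decode(blob["data"]))
            return path
    return None  # no image part found
```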

FAQ & Suggestions#

Authorization Header Format
Use Authorization: sk-xxxx. Refer to the official CometAPI documentation to confirm the exact format.
Prompt Optimization
Specifying style keywords (e.g., "cyberpunk, film grain, low contrast, high saturation"), aspect ratio (square/landscape/portrait), subject, background, lighting, and level of detail can help improve the results.
Base64 Note
Do not include a prefix like data:image/png;base64, in the data field; only include the pure Base64 data string.
Troubleshooting
4xx errors usually indicate issues with request parameters or authentication (check your key, model name, JSON format). 5xx errors are server-side problems (you can retry later or contact support).
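The 5xx case lends itself to a simple retry with backoff; a generic sketch, where `send` is a stand-in for whatever HTTP call your client makes:

```python
import time


def request_with_retry(send, max_attempts=3, backoff=1.0):
    """Retry only on 5xx; a 4xx means the request itself needs fixing."""
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 500:
            return status, body  # success or a client error -- do not retry
        if attempt < max_attempts:
            time.sleep(backoff * attempt)  # linear backoff between retries
    return status, body
```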
The above covers the methods and key points for using gemini-2.5-flash-image-preview via CometAPI. Choose the API style that suits your needs and implement Base64 image decoding and display on the client side.

🍌 Flash 2.5 Image Updates#

a. Flexible Aspect Ratios#

Now supports multiple aspect ratio settings for easy content creation across different devices. All resolutions consume 1,290 tokens by default.
Supported aspect ratios:
1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Reference examples:
https://ai.google.dev/gemini-api/docs/image-generation#aspect_ratios
https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/image-generation#googlegenaisdk_imggen_mmflash_with_txt-drest
https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2_5_image_gen.ipynb
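Per the Gemini documentation linked above, the ratio goes into the request's generationConfig.imageConfig.aspectRatio field; a small helper as a sketch, validating against the list of supported ratios:

```python
# The ten ratios listed above; anything else is rejected client-side.
SUPPORTED_RATIOS = {"1:1", "3:2", "2:3", "3:4", "4:3",
                    "4:5", "5:4", "9:16", "16:9", "21:9"}


def with_aspect_ratio(payload: dict, ratio: str) -> dict:
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError("unsupported aspect ratio: " + ratio)
    payload.setdefault("generationConfig", {})["imageConfig"] = {
        "aspectRatio": ratio}
    return payload
```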

b. Model Name Update#

All new features will be available on the new model ID: gemini-2.5-flash-image. The previous gemini-2.5-flash-image-preview will be deprecated.
⚠️ Migration required by October 31, 2025

c. Force Image Output#

To address the frequent issue of text-only outputs, you can now set "responseModalities" to ["IMAGE"] in API requests, which ensures the model returns an image rather than a text-only response.
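In the request body this is one extra generationConfig field; a sketch:

```python
def force_image_only(payload: dict) -> dict:
    # Restrict the response to image parts only (no text-only answers).
    payload.setdefault("generationConfig", {})["responseModalities"] = ["IMAGE"]
    return payload
```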
Official API documentation for reference:
Image Generation
Image Understanding (Multimodal)
Modified at 2025-10-22 09:07:38