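Guide to Calling Gemini Image Models

This guide shows how to work with Gemini image models through CometAPI using the Google Gen AI SDK. It covers:

Text-to-image generation
Image-to-image editing
Multi-image composition
Saving generated images

Base URL: https://api.cometapi.com
Install the SDK: pip install google-genai (Python) or npm install @google/genai (Node.js)

Setup

Initialize the client with the CometAPI base URL: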

from google import genai
from google.genai import types
import os

COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=COMETAPI_KEY,
)
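
The setup above falls back to a placeholder string when COMETAPI_KEY is unset, so a missing key only surfaces later, at request time. A minimal sketch of a fail-fast alternative follows; load_api_key is a hypothetical helper name used for illustration and is not part of the SDK.

```python
import os


def load_api_key(env=None, name="COMETAPI_KEY"):
    """Return the API key from the environment, or raise if it is missing.

    `load_api_key` is a hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key or key.startswith("<"):
        # Empty value, or an unreplaced "<YOUR_COMETAPI_KEY>" placeholder.
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

The returned value can then be passed as api_key when constructing genai.Client.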

Text-to-Image Generation

Generate an image from a text prompt and save it to a file.

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

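# Each part holds either text or inline image data.
for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        # Convert the inline bytes to an image object and write it to disk.
        image = part.as_image()
        image.save("generated_image.png")
        print("Image saved to generated_image.png")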

Response structure: The image data is in candidates[0].content.parts, which can contain text parts, image parts, or both:

{
  "candidates": [{
    "content": {
      "parts": [
        { "text": "Here is your image..." },
        {
          "inlineData": {
            "mimeType": "image/png",
            "data": "<base64-encoded-image>"
          }
        }
      ]
    }
  }]
}
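
When the response arrives as raw JSON rather than through the SDK, the image bytes have to be Base64-decoded by hand. The sketch below walks the structure shown above; extract_images is a hypothetical helper name, and it assumes the JSON has already been parsed into a Python dict.

```python
import base64


def extract_images(response_json):
    """Return (mime_type, raw_bytes) for every image part in a response dict."""
    images = []
    for candidate in response_json.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline is None:
                continue  # text-only part, nothing to decode
            images.append((inline["mimeType"], base64.b64decode(inline["data"])))
    return images
```

Each returned byte string can be written to disk with open(path, "wb").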

Image-to-Image Generation

Upload an input image and transform it with a text prompt.

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

# Load the source image
source_image = Image.open("source.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Transform this into a watercolor painting", source_image],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("watercolor_output.png")

The Python SDK accepts PIL.Image objects directly, so no manual Base64 encoding is needed.
When passing a raw Base64 string, do not include the data:image/jpeg;base64, prefix.
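
Base64 strings copied from browsers or front-end code often arrive as data URIs. A minimal sketch for removing the prefix before sending follows; strip_data_uri_prefix is a hypothetical helper name, not an SDK function.

```python
def strip_data_uri_prefix(value):
    """Return only the Base64 payload of a string that may be a data URI."""
    if value.startswith("data:") and "," in value:
        # A data URI looks like "data:<mime>;base64,<payload>".
        return value.split(",", 1)[1]
    return value
```

Strings that are already raw Base64 pass through unchanged.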

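Multi-Image Composition

Generate a new image from multiple input images. CometAPI supports two approaches:

Method 1: Single Collage Image

Combine multiple source images into one collage, then describe the desired output.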

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

collage = Image.open("collage.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=[
        "A model is posing and leaning against a pink BMW with a green alien keychain attached to a pink handbag, a pink parrot on her shoulder, and a pug wearing a pink collar and gold headphones",
        collage,
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("composition_output.png")
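
The example above assumes collage.jpg already exists. One way to produce it is sketched below, placing the source images side by side with Pillow. The function names, the horizontal layout, and the white background are illustrative choices, not something CometAPI requires.

```python
def collage_layout(sizes, gap=0):
    """Compute a side-by-side layout for images of the given (width, height) sizes.

    Returns the canvas size and the top-left paste offset of each image.
    """
    offsets = []
    x = 0
    for width, _height in sizes:
        offsets.append((x, 0))
        x += width + gap
    canvas_width = x - gap if sizes else 0
    canvas_height = max((h for _w, h in sizes), default=0)
    return (canvas_width, canvas_height), offsets


def build_collage(paths, out_path, gap=16):
    """Paste the images at `paths` onto one white canvas and save it."""
    from PIL import Image  # requires Pillow

    images = [Image.open(p).convert("RGB") for p in paths]
    canvas_size, offsets = collage_layout([im.size for im in images], gap)
    canvas = Image.new("RGB", canvas_size, "white")
    for im, offset in zip(images, offsets):
        canvas.paste(im, offset)
    canvas.save(out_path)
```

For example, build_collage(["car.jpg", "bag.jpg", "pug.jpg"], "collage.jpg") would write the file that Method 1 loads; the input file names here are placeholders.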

Method 2: Multiple Separate Images (up to 14)

Pass multiple images directly. Gemini 3 models support up to 14 reference images (objects + characters):

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

image1 = Image.open("image1.jpg")
image2 = Image.open("image2.jpg")
image3 = Image.open("image3.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Merge the three images", image1, image2, image3],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("merged_output.png")
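
Because the limit is 14 reference images, it can help to check the count before sending a request. The following is a minimal sketch; build_contents and MAX_REFERENCE_IMAGES are hypothetical names, and the check happens client-side only.

```python
MAX_REFERENCE_IMAGES = 14  # limit stated above for Gemini 3 models


def build_contents(prompt, images):
    """Return a `contents` list of the prompt followed by the reference images."""
    images = list(images)
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(
            f"Got {len(images)} images; at most {MAX_REFERENCE_IMAGES} are supported"
        )
    return [prompt, *images]
```

The result can be passed as the contents argument of client.models.generate_content.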

4K Image Generation

Specify an image_config with aspect_ratio and image_size to output high-resolution images:

from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Da Vinci style anatomical sketch of a Monarch butterfly on textured parchment",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="4K",
        ),
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("butterfly_4k.png")

Multi-Turn Image Editing (Chat)

Use the SDK's chat feature to refine an image step by step:

from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

chat = client.chats.create(
    model="gemini-3.1-flash-image-preview",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# First turn: generate
response = chat.send_message(
    "Create a vibrant infographic explaining photosynthesis as a recipe, styled like a colorful kids cookbook"
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis.png")

# Second turn: refine
response = chat.send_message("Update this infographic to be in Spanish. Do not change any other elements.")

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis_spanish.png")
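
The same part-handling loop repeats after every turn, so it can be factored into a helper. The sketch below relies only on the part.text and part.as_image() members used in the examples above; save_parts and its numbered file-naming scheme are hypothetical.

```python
def save_parts(parts, stem):
    """Print text parts and save image parts as <stem>_<n>.png.

    Returns the list of file paths that were written.
    """
    saved = []
    for part in parts:
        if part.text is not None:
            print(part.text)
            continue
        image = part.as_image()
        if image is not None:
            path = f"{stem}_{len(saved) + 1}.png"
            image.save(path)
            saved.append(path)
    return saved
```

With this helper, each turn reduces to a call such as save_parts(response.parts, "turn1").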

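Tips

Prompt optimization

Specify style keywords explicitly (for example, "cyberpunk, film grain, low contrast"), along with the aspect ratio, subject, background, lighting, and level of detail.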
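
One way to make sure none of these elements is forgotten is to assemble the prompt from named fields. This is a minimal sketch; build_prompt and its "label: value" layout are illustrative assumptions, not a format the model requires.

```python
def build_prompt(subject, background=None, lighting=None, style=(),
                 aspect_ratio=None, detail=None):
    """Join the prompt elements that were provided into one comma-separated string."""
    pieces = [subject]
    if background:
        pieces.append(f"background: {background}")
    if lighting:
        pieces.append(f"lighting: {lighting}")
    if style:
        pieces.append("style: " + ", ".join(style))
    if aspect_ratio:
        pieces.append(f"aspect ratio: {aspect_ratio}")
    if detail:
        pieces.append(f"detail: {detail}")
    return ", ".join(pieces)
```

The returned string can be used directly as the text prompt in any of the examples above.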

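Base64 format

When using raw HTTP, do not include the data:image/png;base64, prefix; send only the raw Base64 string. The Python SDK handles this automatically when given PIL.Image objects.

Forcing image output

Set "responseModalities" to only ["IMAGE"] to guarantee image output without text.

For more details, see the API reference. Official documentation: Gemini Image Generation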
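
For raw HTTP callers, the two tips above can be combined into one request body. The sketch below is an assumption-laden illustration: the part field names mirror the response structure shown earlier, and placing "responseModalities" under "generationConfig" follows the Gemini REST convention rather than anything stated in this guide. build_request_body is a hypothetical helper, and the endpoint path is deliberately left out.

```python
import base64


def build_request_body(prompt, image_bytes=None, mime_type="image/png"):
    """Build a JSON-serializable request body that asks for image-only output."""
    parts = [{"text": prompt}]
    if image_bytes is not None:
        parts.append({
            "inlineData": {
                "mimeType": mime_type,
                # Raw Base64 only, with no "data:image/png;base64," prefix.
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })
    return {
        "contents": [{"parts": parts}],
        # Assumed location of the modality setting in the request body.
        "generationConfig": {"responseModalities": ["IMAGE"]},
    }
```

The resulting dict can be serialized with json.dumps and sent as the body of a POST request.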
