跳轉到主要內容
本指南示範如何透過 CometAPI 使用 Google Gen AI SDK 來操作 Gemini 圖像模型。內容涵蓋:
  • 文字生圖
  • 圖生圖編輯
  • 多圖合成
  • 儲存生成的圖片
  • Base URL: https://api.cometapi.com
  • 安裝 SDK:pip install google-genai(Python)或 npm install @google/genai(Node.js)

設定

使用 CometAPI 的 base URL 初始化 client:
from google import genai
from google.genai import types
import os

COMETAPI_KEY = os.environ.get("COMETAPI_KEY") or "<YOUR_COMETAPI_KEY>"

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=COMETAPI_KEY,
)

文字轉圖片生成

根據文字 Prompt 生成圖片並將其儲存為檔案。
from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
        print("Image saved to generated_image.png")
回應結構: 圖片資料位於 candidates[0].content.parts 中,其中可以包含文字和/或圖片部分:
{
  "candidates": [{
    "content": {
      "parts": [
        { "text": "Here is your image..." },
        {
          "inlineData": {
            "mimeType": "image/png",
            "data": "<base64-encoded-image>"
          }
        }
      ]
    }
  }]
}

圖生圖生成

上傳一張輸入圖片,並透過文字 Prompt 進行轉換。
from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

# Load the source image
source_image = Image.open("source.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Transform this into a watercolor painting", source_image],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("watercolor_output.png")
  • Python SDK 可直接接受 PIL.Image 物件——不需要手動進行 Base64 編碼。
  • 傳入原始 Base64 字串時,不要包含 data:image/jpeg;base64, 前綴。

多圖像合成

從多張輸入圖片生成一張新圖片。CometAPI 支援兩種方式:

方法 1:單一拼貼圖片

將多張來源圖片合併成一張拼貼圖,然後描述想要的輸出結果。
輸入拼貼範例
生成輸出
from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

collage = Image.open("collage.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=[
        "A model is posing and leaning against a pink BMW with a green alien keychain attached to a pink handbag, a pink parrot on her shoulder, and a pug wearing a pink collar and gold headphones",
        collage,
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("composition_output.png")

方法 2:多張獨立圖片(最多 14 張)

直接傳入多張圖片。Gemini 3 模型最多支援 14 張參考圖片(物件 + 角色):
from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

image1 = Image.open("image1.jpg")
image2 = Image.open("image2.jpg")
image3 = Image.open("image3.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Merge the three images", image1, image2, image3],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("merged_output.png")
多圖像生成結果

4K 圖像生成

指定包含 aspect_ratioimage_sizeimage_config,以輸出高解析度影像:
from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Da Vinci style anatomical sketch of a Monarch butterfly on textured parchment",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="4K",
        ),
    ),
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("butterfly_4k.png")

多輪影像編輯(聊天)

使用 SDK 的聊天功能來逐步精修影像:
from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

chat = client.chats.create(
    model="gemini-3.1-flash-image-preview",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# First turn: generate
response = chat.send_message(
    "Create a vibrant infographic explaining photosynthesis as a recipe, styled like a colorful kids cookbook"
)

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis.png")

# Second turn: refine
response = chat.send_message("Update this infographic to be in Spanish. Do not change any other elements.")

for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        image.save("photosynthesis_spanish.png")

提示

請明確指定風格關鍵字(例如「cyberpunk、film grain、low contrast」)、長寬比、主體、背景、光線,以及細節層級。
使用原始 HTTP 時,請勿包含 data:image/png;base64, 前綴——只使用原始 Base64 字串即可。Python SDK 會透過 PIL.Image 物件自動處理這一點。
"responseModalities" 僅設為 ["IMAGE"],即可保證輸出影像而不含文字。
如需更多詳細資訊,請參閱 API 參考文件 官方文件: Gemini Image Generation