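Using Gemini Image Models

This guide shows how to call Gemini image models through CometAPI using the Google Gen AI SDK. It covers:

Text-to-image generation
Image-to-image editing
Multi-image composition
Saving generated images

Base URL: https://api.cometapi.com
Install the SDK: pip install google-genai (Python) or npm install @google/genai (Node.js)

Setup

Initialize the client with CometAPI's base URL: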

from google import genai
from google.genai import types
import os

COMETAPI_KEY = os.environ["COMETAPI_KEY"]

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=COMETAPI_KEY,
)

import { GoogleGenAI } from "@google/genai";

const COMETAPI_KEY = process.env.COMETAPI_KEY;

const ai = new GoogleGenAI({
  apiKey: COMETAPI_KEY,
  httpOptions: { apiVersion: "v1beta", baseUrl: "https://api.cometapi.com" },
});

package main

import (
	"context"
	"log"
	"os"

	"google.golang.org/genai"
)

func main() {
	ctx := context.Background()
	apiKey := os.Getenv("COMETAPI_KEY")

	// Point the Gemini API backend at CometAPI's base URL.
	client, err := genai.NewClient(ctx, &genai.ClientConfig{
		APIKey:  apiKey,
		Backend: genai.BackendGeminiAPI,
		HTTPOptions: genai.HTTPOptions{
			BaseURL: "https://api.cometapi.com",
		},
	})
	if err != nil {
		log.Fatalf("create client: %v", err)
	}
	_ = client // use client below...
}

Text-to-image generation

Generate an image from a text prompt and save it to a file.

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        final_image = part.as_image()

if final_image:
    final_image.save("generated_image.png")
    print("Image saved to generated_image.png")

import { GoogleGenAI } from "@google/genai";
import * as fs from "fs";

const ai = new GoogleGenAI({
  apiKey: process.env.COMETAPI_KEY,
  httpOptions: { apiVersion: "v1beta", baseUrl: "https://api.cometapi.com" },
});

const response = await ai.models.generateContent({
  model: "gemini-3.1-flash-image-preview",
  contents: "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
  config: { responseModalities: ["TEXT", "IMAGE"] },
});

let finalImagePart;
for (const part of response.candidates[0].content.parts) {
  if (part.thought === true) {
    continue;
  }
  if (part.text) {
    console.log(part.text);
  }
  if (part.inlineData) {
    finalImagePart = part;
  }
}

if (finalImagePart) {
  const buffer = Buffer.from(finalImagePart.inlineData.data, "base64");
  fs.writeFileSync("generated_image.png", buffer);
  console.log("Image saved to generated_image.png");
}

curl -s -X POST \
  "https://api.cometapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme"}]
    }],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"]
    }
  }'
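For Python integrations that do not use the SDK, the same request can be assembled with the standard library. The following is a minimal sketch, assuming the endpoint path and Bearer header shown in the curl example; build_generate_request is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.cometapi.com"


def build_generate_request(api_key, model, prompt, modalities=("TEXT", "IMAGE")):
    """Build (but do not send) a generateContent request mirroring the curl call."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"responseModalities": list(modalities)},
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1beta/models/{model}:generateContent",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generate_request(
    "sk-example",  # placeholder key for illustration, not a real credential
    "gemini-3.1-flash-image-preview",
    "Create a picture of a nano banana dish in a fancy restaurant with a Gemini theme",
)
print(request.get_method(), request.full_url)
```

To send it, pass the object to urllib.request.urlopen and parse the response body with json.load.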

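Saving the final image part: Image data is in candidates[0].content.parts, which may contain text parts, image parts, or both. Gemini image models may also return intermediate thought parts before the final image, especially when you request both text and images or explicitly enable thinking output. Do not blindly save the first inlineData; skip parts where thought is true, then save the last remaining image part. A typical response containing only the final image: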

{
  "candidates": [{
    "content": {
      "parts": [
        { "text": "Here is your image..." },
        {
          "inlineData": {
            "mimeType": "image/png",
            "data": "<base64-encoded-image>"
          }
        }
      ]
    }
  }]
}

A response containing a text part, an intermediate thought image, and the final image:

{
  "candidates": [{
    "content": {
      "role": "model",
      "parts": [
        { "text": "Here is your image..." },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "<base64-encoded-intermediate-image>"
          },
          "thought": true
        },
        {
          "inlineData": {
            "mimeType": "image/jpeg",
            "data": "<base64-encoded-final-image>"
          },
          "thought": false,
          "thoughtSignature": "<signature>"
        }
      ]
    },
    "finishReason": "STOP"
  }]
}

Apply this parsing rule to every Gemini image response:

const imageParts = response.candidates[0].content.parts.filter(
  (part) => part.inlineData && part.thought !== true,
);
const finalImagePart = imageParts.at(-1);
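The same rule can be applied to a raw JSON response in Python. This is a sketch under stated assumptions: final_image_from_response and the EXTENSIONS mapping are hypothetical names, and the sample dictionary is a stand-in shaped like the second example response above. Because that example returns image/jpeg, the sketch also derives the file extension from mimeType instead of assuming PNG.

```python
import base64

# Assumed mapping from MIME type to file extension; extend as needed.
EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def final_image_from_response(response):
    """Return (image_bytes, extension) for the last non-thought image part, or None."""
    parts = response["candidates"][0]["content"]["parts"]
    image_parts = [
        part for part in parts
        if "inlineData" in part and part.get("thought") is not True
    ]
    if not image_parts:
        return None
    inline = image_parts[-1]["inlineData"]
    extension = EXTENSIONS.get(inline.get("mimeType"), ".bin")
    return base64.b64decode(inline["data"]), extension


# Stand-in response: a text part, an intermediate thought image, and the final image.
sample = {
    "candidates": [{
        "content": {
            "parts": [
                {"text": "Here is your image..."},
                {"inlineData": {"mimeType": "image/jpeg",
                                "data": base64.b64encode(b"draft").decode()},
                 "thought": True},
                {"inlineData": {"mimeType": "image/jpeg",
                                "data": base64.b64encode(b"final").decode()},
                 "thought": False},
            ]
        }
    }]
}

image_bytes, extension = final_image_from_response(sample)
print(extension, image_bytes)
```

Writing image_bytes to a path built from extension keeps the file name consistent with the actual encoding.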

Image-to-image generation

Upload an input image and transform it with a text prompt.

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

# Load the source image
source_image = Image.open("source.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Transform this into a watercolor painting", source_image],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        final_image = part.as_image()

if final_image:
    final_image.save("watercolor_output.png")

import { GoogleGenAI } from "@google/genai";
import * as fs from "fs";

const ai = new GoogleGenAI({
  apiKey: process.env.COMETAPI_KEY,
  httpOptions: { apiVersion: "v1beta", baseUrl: "https://api.cometapi.com" },
});

const imageData = fs.readFileSync("source.jpg").toString("base64");

const response = await ai.models.generateContent({
  model: "gemini-3.1-flash-image-preview",
  contents: [
    { text: "Transform this into a watercolor painting" },
    { inlineData: { mimeType: "image/jpeg", data: imageData } },
  ],
  config: { responseModalities: ["TEXT", "IMAGE"] },
});

const imageParts = response.candidates[0].content.parts.filter(
  (part) => part.inlineData && part.thought !== true,
);
const finalImagePart = imageParts.at(-1);

if (finalImagePart) {
  fs.writeFileSync("watercolor_output.png", Buffer.from(finalImagePart.inlineData.data, "base64"));
}

curl -s -X POST \
  "https://api.cometapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "Transform this into a watercolor painting" },
        { "inline_data": { "mime_type": "image/jpeg", "data": "<base64-encoded-source-image>" } }
      ]
    }],
    "generationConfig": { "responseModalities": ["TEXT", "IMAGE"] }
  }'

The Python SDK accepts PIL.Image objects directly, so no manual Base64 encoding is needed.
When passing a raw Base64 string, do not include the data:image/jpeg;base64, prefix.
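When image data comes from mixed sources (file bytes, or strings that may already carry a data URI prefix), a small normalizing step helps avoid sending the prefix by mistake. A minimal sketch, with to_raw_base64 as a hypothetical helper name:

```python
import base64


def to_raw_base64(value):
    """Accept raw bytes or a Base64 string (optionally a data URI); return raw Base64."""
    if isinstance(value, bytes):
        return base64.b64encode(value).decode("ascii")
    if value.startswith("data:"):
        # Drop everything up to and including the first comma,
        # for example "data:image/jpeg;base64,".
        _, _, value = value.partition(",")
    return value.strip()


raw = to_raw_base64(b"\xff\xd8\xff")  # first bytes of a JPEG file, for illustration
prefixed = "data:image/jpeg;base64," + raw
print(to_raw_base64(prefixed) == raw)
```

The returned string can be placed directly in the data field of an inline_data part.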

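Multi-image composition

Generate a new image from multiple input images. CometAPI supports two approaches:

Method 1: Single collage image

Combine the source images into a single collage, then describe the desired output.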

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

collage = Image.open("collage.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=[
        "A model is posing and leaning against a pink BMW with a green alien keychain attached to a pink handbag, a pink parrot on her shoulder, and a pug wearing a pink collar and gold headphones",
        collage,
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.inline_data is not None:
        final_image = part.as_image()

if final_image:
    final_image.save("composition_output.png")

curl -s -X POST \
  "https://api.cometapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "A model is posing and leaning against a pink BMW with a green alien keychain attached to a pink handbag, a pink parrot on her shoulder, and a pug wearing a pink collar and gold headphones" },
        { "inline_data": { "mime_type": "image/jpeg", "data": "<base64-encoded-collage-image>" } }
      ]
    }],
    "generationConfig": { "responseModalities": ["TEXT", "IMAGE"] }
  }'
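Method 1 assumes collage.jpg already exists. One way to build it is a plain grid; the sketch below only computes the layout, so it runs without any imaging library. grid_layout is a hypothetical helper, and the uniform cell size is a simplifying assumption rather than a requirement of the API.

```python
import math


def grid_layout(sizes, columns):
    """Compute a canvas size and top-left paste offsets for a simple grid collage.

    sizes: list of (width, height) tuples, one per source image.
    Every cell uses the largest width and height so no image is cropped.
    """
    cell_w = max(w for w, _ in sizes)
    cell_h = max(h for _, h in sizes)
    rows = math.ceil(len(sizes) / columns)
    offsets = [
        ((index % columns) * cell_w, (index // columns) * cell_h)
        for index in range(len(sizes))
    ]
    return (columns * cell_w, rows * cell_h), offsets


canvas_size, offsets = grid_layout([(640, 480), (800, 600), (512, 512)], columns=2)
print(canvas_size, offsets)
```

With Pillow, the result maps onto Image.new("RGB", canvas_size, "white") for the canvas and canvas.paste(image, offset) for each source image, followed by canvas.save("collage.jpg").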

Method 2: Multiple separate images (up to 14)

Pass multiple images directly. Gemini 3 models support up to 14 reference images (objects + characters):

from google import genai
from google.genai import types
from PIL import Image
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

image1 = Image.open("image1.jpg")
image2 = Image.open("image2.jpg")
image3 = Image.open("image3.jpg")

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents=["Merge the three images", image1, image2, image3],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.inline_data is not None:
        final_image = part.as_image()

if final_image:
    final_image.save("merged_output.png")

curl -s -X POST \
  "https://api.cometapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "role": "user",
      "parts": [
        { "text": "Merge the three images" },
        { "inline_data": { "mime_type": "image/jpeg", "data": "<base64-image-1>" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "<base64-image-2>" } },
        { "inline_data": { "mime_type": "image/jpeg", "data": "<base64-image-3>" } }
      ]
    }],
    "generationConfig": { "responseModalities": ["TEXT", "IMAGE"] }
  }'
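For raw HTTP clients, the contents array can be generated from a list of images, with the 14-image limit checked before the request is sent. This is a sketch, assuming the inline_data field names used in the curl example; build_multi_image_contents is a hypothetical helper.

```python
import base64

MAX_REFERENCE_IMAGES = 14  # limit stated above for Gemini 3 models


def build_multi_image_contents(prompt, images):
    """Build the `contents` array for a request with several reference images.

    images: list of (mime_type, raw_bytes) tuples.
    """
    if not images:
        raise ValueError("at least one image is required")
    if len(images) > MAX_REFERENCE_IMAGES:
        raise ValueError(f"got {len(images)} images; the limit is {MAX_REFERENCE_IMAGES}")
    parts = [{"text": prompt}]
    for mime_type, raw_bytes in images:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(raw_bytes).decode("ascii"),
            }
        })
    return [{"role": "user", "parts": parts}]


# Placeholder byte strings stand in for real image files.
contents = build_multi_image_contents(
    "Merge the three images",
    [("image/jpeg", b"one"), ("image/jpeg", b"two"), ("image/jpeg", b"three")],
)
print(len(contents[0]["parts"]))
```

Failing early on the client side gives a clearer error than a rejected request.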

4K image generation

Specify an image_config with aspect_ratio and image_size to get high-resolution output:

from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

response = client.models.generate_content(
    model="gemini-3.1-flash-image-preview",
    contents="Da Vinci style anatomical sketch of a Monarch butterfly on textured parchment",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="4K",
        ),
    ),
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        final_image = image

if final_image:
    final_image.save("butterfly_4k.png")

import { GoogleGenAI } from "@google/genai";
import * as fs from "fs";

const ai = new GoogleGenAI({
  apiKey: process.env.COMETAPI_KEY,
  httpOptions: { apiVersion: "v1beta", baseUrl: "https://api.cometapi.com" },
});

const response = await ai.models.generateContent({
  model: "gemini-3.1-flash-image-preview",
  contents: "Da Vinci style anatomical sketch of a Monarch butterfly on textured parchment",
  config: {
    responseModalities: ["TEXT", "IMAGE"],
    imageConfig: { aspectRatio: "1:1", imageSize: "4K" },
  },
});

const imageParts = response.candidates[0].content.parts.filter(
  (part) => part.inlineData && part.thought !== true,
);
const finalImagePart = imageParts.at(-1);

if (finalImagePart) {
  fs.writeFileSync("butterfly_4k.png", Buffer.from(finalImagePart.inlineData.data, "base64"));
}

curl -s -X POST \
  "https://api.cometapi.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent" \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"parts": [{"text": "Da Vinci style anatomical sketch of a Monarch butterfly on textured parchment"}]}],
    "generationConfig": {
      "responseModalities": ["TEXT", "IMAGE"],
      "imageConfig": {"aspectRatio": "1:1", "imageSize": "4K"}
    }
  }'

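For high-resolution requests, treat the last non-thought image part as the output. If your integration saves the first inlineData part, it may save an intermediate thought image whose resolution is lower than the requested imageSize.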
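A quick way to confirm which image was saved is to read its pixel dimensions. The sketch below handles PNG data only, by reading the IHDR chunk with the standard library; png_dimensions is a hypothetical helper, and the header bytes are synthetic. For JPEG output, an imaging library such as Pillow (Image.open(path).size) is the simpler route.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) read from a PNG's IHDR chunk, or None if not a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    return struct.unpack(">II", data[16:24])


# Synthetic header for illustration: signature, IHDR length, chunk type, width, height.
header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1024, 768)
print(png_dimensions(header))
```

Comparing the dimensions of every image part in one response makes an intermediate image easy to spot.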

Multi-turn image editing (chat)

Use the SDK's chat feature to refine an image iteratively:

from google import genai
from google.genai import types
import os

client = genai.Client(
    http_options={"api_version": "v1beta", "base_url": "https://api.cometapi.com"},
    api_key=os.environ.get("COMETAPI_KEY"),
)

chat = client.chats.create(
    model="gemini-3.1-flash-image-preview",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# First turn: generate
response = chat.send_message(
    "Create a vibrant infographic explaining photosynthesis as a recipe, styled like a colorful kids cookbook"
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        final_image = image

if final_image:
    final_image.save("photosynthesis.png")

# Second turn: refine
response = chat.send_message(
    "Update this infographic to be in Spanish. Do not change any other elements."
)

final_image = None
for part in response.parts:
    if getattr(part, "thought", False):
        continue
    if part.text is not None:
        print(part.text)
    elif image := part.as_image():
        final_image = image

if final_image:
    final_image.save("photosynthesis_spanish.png")

import { GoogleGenAI } from "@google/genai";
import * as fs from "fs";

const ai = new GoogleGenAI({
  apiKey: process.env.COMETAPI_KEY,
  httpOptions: { apiVersion: "v1beta", baseUrl: "https://api.cometapi.com" },
});

const chat = ai.chats.create({
  model: "gemini-3.1-flash-image-preview",
  config: { responseModalities: ["TEXT", "IMAGE"] },
});

// First turn: generate
const response1 = await chat.sendMessage(
  "Create a vibrant infographic explaining photosynthesis as a recipe, styled like a colorful kids cookbook"
);
const imageParts1 = response1.candidates[0].content.parts.filter(
  (part) => part.inlineData && part.thought !== true,
);
const finalImagePart1 = imageParts1.at(-1);
if (finalImagePart1) {
  fs.writeFileSync("photosynthesis.png", Buffer.from(finalImagePart1.inlineData.data, "base64"));
}

// Second turn: refine
const response2 = await chat.sendMessage(
  "Update this infographic to be in Spanish. Do not change any other elements."
);
const imageParts2 = response2.candidates[0].content.parts.filter(
  (part) => part.inlineData && part.thought !== true,
);
const finalImagePart2 = imageParts2.at(-1);
if (finalImagePart2) {
  fs.writeFileSync("photosynthesis_spanish.png", Buffer.from(finalImagePart2.inlineData.data, "base64"));
}

Tips

Prompt optimization

Specify style keywords (for example "cyberpunk, film grain, low contrast"), aspect ratio, subject, background, lighting, and level of detail explicitly.
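One way to keep prompts consistent is to assemble them from named fields. This is only an illustrative sketch: build_prompt and its labels ("background:", "lighting:", "detail:") are hypothetical conventions, not something the model requires.

```python
def build_prompt(subject, background=None, lighting=None, style_keywords=(), detail=None):
    """Join the prompt elements listed above into one comma-separated prompt string."""
    pieces = [subject]
    if background:
        pieces.append(f"background: {background}")
    if lighting:
        pieces.append(f"lighting: {lighting}")
    if detail:
        pieces.append(f"detail: {detail}")
    pieces.extend(style_keywords)
    return ", ".join(pieces)


prompt = build_prompt(
    subject="a street food stall at night",
    background="rain-soaked alley with neon signs",
    lighting="soft rim light",
    style_keywords=("cyberpunk", "film grain", "low contrast"),
    detail="highly detailed",
)
print(prompt)
```

The resulting string can be passed as contents in any of the requests in this guide.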

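Base64 format

When using raw HTTP, do not include the data:image/png;base64, prefix; use only the raw Base64 string. The Python SDK handles this automatically through PIL.Image objects.

Forcing image output

Set "responseModalities" to ["IMAGE"] only, so that the output contains images and no text.

Why is my image blurry or low resolution?

Check whether your code is saving an intermediate thought image. Gemini image responses may contain image parts where thought is true; these are not the final output. Skip parts with thought: true and save the last image part that has inlineData and whose thought is not true. If you do not need text output, request "responseModalities": ["IMAGE"] to reduce the complexity of handling mixed text/image responses.

For more details, see the API reference. Official documentation: Nano Banana image generation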
