Gemini Generating Content
Use the Gemini native API format through CometAPI for text generation, multimodal input, thinking/reasoning, function calling, Google Search grounding, JSON mode, and streaming.
x-goog-api-key and Authorization: Bearer headers are supported for authentication.Quick start
To use any Gemini SDK or HTTP client with CometAPI, replace the base URL and API key:| Setting | Google Default | CometAPI |
|---|---|---|
| Base URL | generativelanguage.googleapis.com | api.cometapi.com |
| API key | $GEMINI_API_KEY | $COMETAPI_KEY |
Send video input
GeminigenerateContent accepts video as a content part. Choose the input shape based on where the video is stored:
| Video source | Request part | Use when |
|---|---|---|
| Local video file | inlineData | The video is small enough to send as base64 in the JSON request. |
| Public video URL | fileData.fileUri | The video is available through a public HTTPS URL that does not require authentication. |
inlineData.mimeType and fileData.fileUri. Do not send URL media as file_data.file_uri.fileData.fileUri:
generateContent request itself with inlineData or fileData.fileUri.
Configure thinking (reasoning)
Gemini models can perform internal reasoning before generating a response. The control method depends on the model generation.- Gemini 3 (thinkingLevel)
- Gemini 2.5 (thinkingBudget)
thinkingLevel to control reasoning depth. Available levels: MINIMAL, LOW, MEDIUM, HIGH.Use gemini-3-flash-preview as the default example model unless you specifically need a different Gemini 3 variant.Stream responses
To receive Server-Sent Events as the model generates content, usestreamGenerateContent?alt=sse as the operator. Each SSE event contains a data: line with a JSON GenerateContentResponse object.
Set system instructions
To guide the model’s behavior across the entire conversation, usesystemInstruction:
Request JSON output
To force structured JSON output, setresponseMimeType. Optionally provide a responseSchema for strict schema validation:
Ground with Google Search
To enable real-time web search, add agoogleSearch tool:
groundingMetadata with source URLs and confidence scores.
Response example
A typical response from CometAPI’s Gemini endpoint:thoughtsTokenCount field in usageMetadata shows how many tokens the model spent on internal reasoning, even when thinking output is not included in the response.Compare with OpenAI-compatible endpoint
| Feature | Gemini Native (/v1beta/models/...) | OpenAI-Compatible (/v1/chat/completions) |
|---|---|---|
| Thinking control | thinkingConfig with thinkingLevel / thinkingBudget | Not available |
| Google Search grounding | tools: [\{"google_search": \{\}\}] | Not available |
| Google Maps grounding | tools: [\{"googleMaps": \{\}\}] | Not available |
| Image generation modality | responseModalities: ["IMAGE"] | Not available |
| Auth header | x-goog-api-key or Bearer | Bearer only |
| Response format | Gemini native (candidates, parts) | OpenAI format (choices, message) |
Authorizations
Your CometAPI key passed via the x-goog-api-key header. Bearer token authentication (Authorization: Bearer $COMETAPI_KEY) is also supported.
Path Parameters
Gemini model ID. Example: gemini-3-flash-preview, gemini-2.5-pro. See the Models page for current options.
The operation to perform. Use generateContent for synchronous responses, or streamGenerateContent?alt=sse for Server-Sent Events streaming.
generateContent, streamGenerateContent?alt=sse Body
Conversation content. Each entry has an optional role (user or model) and a parts array.
System instructions that guide the model's behavior across the entire conversation. Text only.
Tools the model may use to generate responses. Supports function declarations, Google Search, Google Maps, and code execution.
Configuration for tool usage, such as function calling mode.
Safety filter settings. Override default thresholds for specific harm categories.
Configuration for model generation behavior including temperature, output length, and response format.
The name of cached content to use as context. Format: cachedContents/{id}. See the Gemini context caching documentation for details.
Response
Successful response. For streaming requests, the response is a stream of SSE events, each containing a GenerateContentResponse JSON object prefixed with data:.
The generated response candidates.
Feedback on the prompt, including safety blocking information.
Token usage statistics for the request.
The model version that generated this response.
The timestamp when this response was created (ISO 8601 format).
Unique identifier for this response.