POST /kling/v1/videos/avatar/image2video
Example request (cURL):
curl https://api.cometapi.com/kling/v1/videos/avatar/image2video \
  -H "Authorization: Bearer $COMETAPI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
      "image": "https://your-image-host/avatar.jpg",
      "prompt": "The speaker talks naturally to camera",
      "sound_file": "https://your-audio-host/speech.wav",
      "mode": "std"
    }'
Example response (placeholder values):
{
  "code": 123,
  "message": "<string>",
  "data": {
    "task_id": "<string>",
    "task_status": "<string>",
    "created_at": 123,
    "updated_at": 123,
    "task_info": {}
  }
}
Use this endpoint to create talking-avatar clips from one source image plus one audio source.

Before you call it

  • Provide one avatar image as a public URL or raw base64 string
  • Use an avatar image that meets Kling pixel requirements; tiny thumbnails are rejected by the generation task
  • Send exactly one of audio_id or sound_file
  • Keep the first request simple: one face image, one audio clip, and a short prompt
  • Include task_id when the referenced audio belongs to a prior task that must be linked
  • Start with mode: std unless you specifically need the higher-quality pro mode
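The checklist above can be enforced locally before any request is sent. This is a minimal bash sketch, not an official client: build_avatar_payload and json_escape are hypothetical helper names, and the escaping only handles backslashes and double quotes, so it assumes values without newlines or other control characters.

```bash
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Assumes the value contains no newlines or other control characters.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Hypothetical helper: print a request body for the avatar image2video endpoint.
# Usage: build_avatar_payload IMAGE PROMPT AUDIO_ID SOUND_FILE [MODE]
# Pass "" for the audio source you are not using; MODE defaults to std.
build_avatar_payload() {
  local image=$1 prompt=$2 audio_id=$3 sound_file=$4 mode=${5:-std}

  # Exactly one of audio_id or sound_file must be set.
  if [[ -n $audio_id && -n $sound_file ]] || [[ -z $audio_id && -z $sound_file ]]; then
    echo "error: send exactly one of audio_id or sound_file" >&2
    return 1
  fi
  # Only the two documented modes are accepted.
  if [[ $mode != std && $mode != pro ]]; then
    echo "error: mode must be std or pro" >&2
    return 1
  fi

  local audio_field
  if [[ -n $audio_id ]]; then
    audio_field="\"audio_id\": \"$(json_escape "$audio_id")\""
  else
    audio_field="\"sound_file\": \"$(json_escape "$sound_file")\""
  fi

  printf '{"image": "%s", "prompt": "%s", %s, "mode": "%s"}\n' \
    "$(json_escape "$image")" "$(json_escape "$prompt")" "$audio_field" "$mode"
}
```

To use it, pipe the output into the cURL request at the top of this page and replace the inline -d '{...}' body with --data-binary @- so curl reads the body from stdin.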

Audio source rules

  • audio_id is the easiest path when you already generated speech through the Kling TTS route
  • sound_file works when you already have your own MP3, WAV, M4A, or AAC asset
  • Kling documents avatar audio as 2 to 60 seconds long
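Both sound_file rules can be checked before upload. A minimal bash sketch follows; check_sound_file is a hypothetical name, and it assumes the URL path ends in the file's real extension (a heuristic only, since the service inspects the actual file) and that you already know the clip's duration in seconds.

```bash
# Hypothetical helper: check a sound_file URL's extension and the clip duration.
# Usage: check_sound_file URL DURATION_SECONDS
check_sound_file() {
  local url=$1 duration=$2
  local path=${url%%\?*}   # drop any query string before reading the extension
  local ext=${path##*.}
  ext=$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')

  case $ext in
    mp3|wav|m4a|aac) ;;
    *) echo "error: unsupported audio extension: $ext" >&2; return 1 ;;
  esac

  # awk handles fractional seconds; exit status 0 means 2 <= duration <= 60.
  if ! awk -v d="$duration" 'BEGIN { exit !(d >= 2 && d <= 60) }'; then
    echo "error: audio must be 2 to 60 seconds, got $duration" >&2
    return 1
  fi
}
```

For a local file, one way to read the duration is ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 speech.wav.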

Task flow

1. Create the avatar task: submit the image and one audio source, then save the returned task_id.
2. Poll the task: call Get a Kling task with that task_id until the task reaches a terminal state.
3. Store the finished result: copy the final asset into your own storage if you need to keep it longer than the provider's delivery URL remains available.
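Step 2 can be sketched as a bounded polling loop in bash. This is an illustration under stated assumptions, not the documented client: poll_task and fetch_task_status are hypothetical, KLING_TASK_URL is a placeholder for the URL given on the Get a Kling task page, jq is assumed to be installed, and treating succeed and failed as the terminal task_status values is an assumption based on Kling's status names.

```bash
# Hypothetical helper: poll until the task reaches a terminal state.
# Usage: poll_task FETCH_FUNC [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# FETCH_FUNC must print the current task_status. Treating "succeed" and
# "failed" as terminal is an assumption based on Kling's status names.
poll_task() {
  local fetch_func=$1 interval=${2:-10} max_attempts=${3:-60}
  local attempt status
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=$("$fetch_func") || return 1
    case $status in
      succeed|failed)
        printf '%s\n' "$status"
        return 0
        ;;
    esac
    sleep "$interval"
  done
  echo "error: task not finished after $max_attempts attempts" >&2
  return 1
}

# Hypothetical fetch function. KLING_TASK_URL is a placeholder: set it to the
# URL documented for Get a Kling task. Requires jq.
fetch_task_status() {
  curl -s "$KLING_TASK_URL" -H "Authorization: Bearer $COMETAPI_KEY" \
    | jq -r '.data.task_status'
}
```

A call such as poll_task fetch_task_status 10 60 waits up to about ten minutes. Bounding the attempts keeps a stuck task from blocking a script forever; when it prints succeed, read the result from the task response as documented on the Get a Kling task page.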
For the complete parameter reference, see the official Kling Avatar documentation.

Authorizations

Authorization (string, header, required)

Bearer token authentication. Use your CometAPI key.

Headers

Content-Type (string, optional)

Content type of the request body. Send application/json, as in the example request.

Body

application/json

image (string, required)

Avatar image URL or base64 image string. Use an image that meets Kling pixel requirements; very small thumbnails are rejected.

prompt (string, required)

Prompt describing the desired avatar performance.

audio_id (string, required unless sound_file is sent)

Audio id from a prior Kling audio task. Send exactly one of audio_id or sound_file.

sound_file (string, required unless audio_id is sent)

Public audio URL when you provide your own MP3, WAV, M4A, or AAC audio.

task_id (string, optional)

Prior task id associated with the referenced audio asset.

mode (enum<string>, optional)

Generation mode. Use std or pro; omitted requests use std.

Available options: std, pro
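The body fields above can also be combined differently from the example request: a local image sent as raw base64 plus an audio_id instead of sound_file. A minimal bash sketch; image_to_base64 and create_avatar_task_from_file are hypothetical names, and the JSON is assembled without escaping, so it assumes the prompt and audio_id contain no double quotes or backslashes.

```bash
# Print a file as one raw base64 line (no data: prefix, no line breaks).
# Piping through tr keeps this working with both GNU and BSD/macOS base64.
image_to_base64() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: create an avatar task from a local image and an audio_id
# returned by a prior Kling audio task. The body is streamed to curl on stdin
# (--data-binary @-) so a large base64 image is not passed as one argument.
# Assumes PROMPT and AUDIO_ID contain no double quotes or backslashes.
create_avatar_task_from_file() {
  local image_path=$1 prompt=$2 audio_id=$3
  [[ -r $image_path ]] || { echo "error: cannot read $image_path" >&2; return 1; }
  {
    printf '{"image": "'
    image_to_base64 "$image_path"
    printf '", "prompt": "%s", "audio_id": "%s", "mode": "std"}' "$prompt" "$audio_id"
  } | curl -s https://api.cometapi.com/kling/v1/videos/avatar/image2video \
        -H "Authorization: Bearer $COMETAPI_KEY" \
        -H "Content-Type: application/json" \
        --data-binary @-
}
```

Streaming the body matters because Linux limits the length of a single command-line argument, and a base64-encoded photo can easily exceed it when passed inline with -d.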

Response

200 - application/json

Task accepted.

code (integer, required)
message (string, required)
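data (object, required)
Task details: task_id, task_status, created_at, updated_at, and task_info, as shown in the example response.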
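To continue with step 1 of the task flow, the task_id has to be pulled out of this response. A minimal bash sketch using bash's built-in regex matching so it needs no extra tools; extract_task_id is a hypothetical name, and treating code 0 as success is an assumption based on Kling's own API convention, since the example above only shows placeholder values. Where jq is available, jq -r '.data.task_id' is the more robust choice.

```bash
# Hypothetical helper: print data.task_id from a creation response.
# Assumption: code 0 means the task was accepted (Kling's convention).
# Regex matching is a lightweight stand-in for a real JSON parser.
extract_task_id() {
  local json=$1
  local code_re='"code"[[:space:]]*:[[:space:]]*(-?[0-9]+)'
  local id_re='"task_id"[[:space:]]*:[[:space:]]*"([^"]+)"'

  if [[ ! $json =~ $code_re ]]; then
    echo "error: no code field in response" >&2
    return 1
  fi
  if [[ ${BASH_REMATCH[1]} != 0 ]]; then
    echo "error: request failed with code ${BASH_REMATCH[1]}" >&2
    return 1
  fi
  if [[ ! $json =~ $id_re ]]; then
    echo "error: no task_id in response" >&2
    return 1
  fi
  printf '%s\n' "${BASH_REMATCH[1]}"
}
```

For example, task_id=$(extract_task_id "$response") || exit 1 stops a script as soon as task creation is rejected.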