POST /kling/v1/videos/avatar/image2video
Create a Kling avatar task
curl --request POST \
  --url https://api.cometapi.com/kling/v1/videos/avatar/image2video \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "image": "example",
  "audio_id": "<string>"
}
'
{
  "code": 123,
  "message": "<string>",
  "data": {
    "task_id": "<string>",
    "task_status": "<string>",
    "created_at": 123,
    "updated_at": 123,
    "task_info": {}
  }
}
Use this endpoint to create talking-avatar clips from one source image plus one audio source.

Before you call it

  • Provide one avatar image as a public URL or raw base64 string
  • Send exactly one of audio_id or sound_file
  • Keep the first request simple: one face image, one audio clip, and a short optional prompt
  • Start with mode: std unless you specifically need the higher-quality path
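The checklist above can be sketched as a small payload builder. This is a minimal illustration, not an official client: `build_avatar_payload` is a hypothetical helper name, and the checks simply mirror the rules stated on this page (exactly one audio source, `std` or `pro` mode, prompt up to 2500 characters).

```python
from typing import Optional


def build_avatar_payload(
    image: str,
    audio_id: Optional[str] = None,
    sound_file: Optional[str] = None,
    prompt: Optional[str] = None,
    mode: str = "std",
) -> dict:
    """Build the JSON body for an avatar task (hypothetical helper)."""
    # Exactly one audio source is allowed, so reject both "none" and "both".
    if (audio_id is None) == (sound_file is None):
        raise ValueError("send exactly one of audio_id or sound_file")
    if mode not in ("std", "pro"):
        raise ValueError("mode must be 'std' or 'pro'")
    if prompt is not None and len(prompt) > 2500:
        raise ValueError("prompt is limited to 2500 characters")

    payload = {"image": image, "mode": mode}
    if audio_id is not None:
        payload["audio_id"] = audio_id
    else:
        payload["sound_file"] = sound_file
    if prompt:
        payload["prompt"] = prompt
    return payload
```

Validating locally before sending keeps the first request simple and avoids spending a round trip on a body the server would likely reject.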

Audio source rules

  • audio_id is the easiest path when you already generated speech through the Kling TTS route
  • sound_file works when you already have your own MP3, WAV, M4A, or AAC asset
  • Avatar audio is documented as 2 to 60 seconds long

Task flow

1. Create the avatar task
   Submit the image and one audio source, then save the returned task_id.

2. Poll the task
   Query the task through the Individual Queries endpoint until it reaches a terminal state.

3. Store the finished result
   Copy the final asset into your own storage if you need retention beyond the provider delivery URL.
For the complete parameter reference, see the official Kling Avatar documentation.
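The polling step of the task flow might look like the sketch below. Everything here is an assumption for illustration: `poll_until_done` and `fetch_task` are hypothetical names, the query call is injected as a callable because this page does not give the query endpoint's URL, and the terminal status names `succeed` and `failed` should be confirmed against the task query documentation.

```python
import time
from typing import Callable

# Assumed terminal status names; confirm them in the task query docs.
TERMINAL_STATES = {"succeed", "failed"}


def poll_until_done(
    fetch_task: Callable[[str], dict],
    task_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_task(task_id) until task_status is terminal or time runs out.

    fetch_task is expected to return the response's `data` object as a dict.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        data = fetch_task(task_id)
        if data.get("task_status") in TERMINAL_STATES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish in {timeout_s}s")
        sleep(interval_s)
```

Passing the query function in keeps the loop independent of any HTTP library and makes it easy to exercise with a stub. If you set callback_url, a webhook can replace most of this polling.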

Authorizations

Authorization
string
header
required

Bearer token authentication. Use your CometAPI key.

Headers

Content-Type
string

Optional. Set to application/json when sending a JSON body.
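As a rough Python equivalent of the curl sample, the request with both headers could be prepared as follows. `build_request` is a hypothetical helper; the URL and header names are taken from this page, and the request is only constructed here, not sent.

```python
import json
import urllib.request

ENDPOINT = "https://api.cometapi.com/kling/v1/videos/avatar/image2video"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request; pass the result to urllib.request.urlopen to send it."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # your CometAPI key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send it, something like `urllib.request.urlopen(build_request(key, payload), timeout=30)` should work, with error handling added to suit your application.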

Body

application/json
image
string
default:example
required

Avatar reference image. Accepts an image URL or raw Base64 string (no data: prefix). Supported formats: JPG, JPEG, PNG. Max file size 10 MB. Minimum dimension 300 px on each side; aspect ratio between 1:2.5 and 2.5:1.
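The image constraints above can be checked locally before upload. The sketch below uses hypothetical helper names and assumes that "10 MB" means 10 × 1024 × 1024 bytes measured on the file before encoding; the page does not say which definition applies. Reading the pixel dimensions is left to whichever image library you already use.

```python
import base64

# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes of the raw file.
MAX_IMAGE_BYTES = 10 * 1024 * 1024


def to_raw_base64(image_bytes: bytes) -> str:
    """Encode image bytes as a raw Base64 string (no data: prefix)."""
    if len(image_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds 10 MB")
    return base64.b64encode(image_bytes).decode("ascii")


def strip_data_prefix(value: str) -> str:
    """Drop a leading 'data:image/...;base64,' prefix if one is present."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


def dimensions_ok(width: int, height: int) -> bool:
    """Each side at least 300 px, aspect ratio between 1:2.5 and 2.5:1."""
    short, long = min(width, height), max(width, height)
    # long / short <= 2.5, written with integers to avoid float rounding.
    return short >= 300 and 2 * long <= 5 * short
```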

audio_id
string
required unless sound_file is provided

Audio ID returned by the Kling TTS API. Only audio clips between 2 and 60 seconds generated within the last 30 days are accepted. Mutually exclusive with sound_file — exactly one must be provided.

sound_file
string

Audio file as a URL or Base64 string. Accepted formats: MP3, WAV, M4A, AAC. Max 5 MB, duration 2–60 seconds. Mutually exclusive with audio_id — exactly one must be provided.
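If you send a local file as sound_file, a pre-flight check along these lines may save a failed request. The helper names are hypothetical, "5 MB" is assumed to mean 5 × 1024 × 1024 bytes, and the duration check only covers WAV because the standard library cannot read MP3, M4A, or AAC durations.

```python
import base64
import wave
from pathlib import Path

ALLOWED_AUDIO_SUFFIXES = {".mp3", ".wav", ".m4a", ".aac"}
# Assumption: 5 MB is interpreted as 5 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES = 5 * 1024 * 1024


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed from frame count and sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def encode_sound_file(path: str) -> str:
    """Validate a local audio file and return it as a Base64 string."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix not in ALLOWED_AUDIO_SUFFIXES:
        raise ValueError(f"unsupported audio format: {suffix or '(none)'}")
    data = p.read_bytes()
    if len(data) > MAX_AUDIO_BYTES:
        raise ValueError("audio exceeds 5 MB")
    # Only WAV duration is checked here; other formats need a third-party parser.
    if suffix == ".wav" and not 2 <= wav_duration_seconds(str(p)) <= 60:
        raise ValueError("audio must be 2 to 60 seconds long")
    return base64.b64encode(data).decode("ascii")
```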

prompt
string

Text prompt to guide avatar actions, emotions, and camera movements. Max 2500 characters.

mode
string

Generation mode. std (standard, faster and more cost-effective) or pro (professional, higher quality output).

callback_url
string

Webhook URL for task status notifications. The server sends a callback when the task status changes.

external_task_id
string

Optional user-defined task ID for your own tracking. Does not replace the system-generated task ID. Must be unique per account.

Response

200 - application/json

Task accepted.

code
integer
required
message
string
required
data
object
required
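The response shape shown in the sample at the top of this page could be read into a small typed record, as in the sketch below. `AvatarTask` and `parse_create_response` are hypothetical names, and the code deliberately does not interpret `code`, since this page does not state which value means success.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AvatarTask:
    task_id: str
    task_status: Optional[str] = None
    created_at: Optional[int] = None
    updated_at: Optional[int] = None
    task_info: dict = field(default_factory=dict)


def parse_create_response(body: str) -> AvatarTask:
    """Extract the task record from the JSON response body."""
    doc = json.loads(body)
    data = doc.get("data")
    if not isinstance(data, dict) or "task_id" not in data:
        raise ValueError(
            f"no task in response: code={doc.get('code')} message={doc.get('message')}"
        )
    return AvatarTask(
        task_id=data["task_id"],
        task_status=data.get("task_status"),
        created_at=data.get("created_at"),
        updated_at=data.get("updated_at"),
        task_info=data.get("task_info") or {},
    )
```

Save `task_id` from the returned record; it is the value you pass to the task query when polling.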