Avatar
Use the Kling Avatar API in CometAPI to generate avatar-driven videos from an image. Call POST /kling/v1/videos/avatar/image2video to create an image-to-video avatar.
Use this endpoint to create a talking avatar clip from one source image plus one audio source.
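Before you call
- Provide one avatar image, either a public URL or a raw base64 string
- Send only one of audio_id or sound_file
- Keep the first request simple: one face image, one audio clip, and a short prompt (the parameter reference below marks prompt as required)
- Start with mode: std unless you explicitly need the higher-quality path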
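The checklist above can be sketched as a small helper that assembles a minimal request body. This is an illustrative sketch, not an official client: the function name build_first_request and the placeholder URL and audio ID are assumptions, and the field names are the ones used on this page. A prompt is always included because the parameter reference marks it as required.

```python
import json
from typing import Optional


def build_first_request(image: str, prompt: str,
                        audio_id: Optional[str] = None,
                        sound_file: Optional[str] = None,
                        mode: str = "std") -> dict:
    """Build a minimal request body (hypothetical helper, not part of any SDK)."""
    # Exactly one audio source may be sent: audio_id or sound_file.
    if (audio_id is None) == (sound_file is None):
        raise ValueError("send exactly one of audio_id or sound_file")
    body = {"image": image, "prompt": prompt, "mode": mode}
    if audio_id is not None:
        body["audio_id"] = audio_id
    else:
        body["sound_file"] = sound_file
    return body


body = build_first_request(
    image="https://example.com/face.png",  # placeholder image URL
    prompt="Speak calmly to the camera.",
    audio_id="your-tts-audio-id",          # placeholder audio ID
)
print(json.dumps(body, indent=2))
```

Starting from a helper like this keeps the first request small, and the mutual-exclusion check fails locally before any call is made.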
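Audio source rules
- If you already generated speech through the Kling TTS path, audio_id is the simplest option
- If you already have your own MP3, WAV, M4A, or AAC asset, use sound_file
- The documentation states that avatar audio must be 2 to 60 seconds long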
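The rules above could be checked on the client side before sending a request. The sketch below is one possible pre-flight check: the function name pick_audio_field is hypothetical, it handles only the URL form of sound_file, and inferring the format from the URL's file extension is a heuristic assumed here, not something the API is documented to do.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

ALLOWED_AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac"}
MIN_SECONDS, MAX_SECONDS = 2.0, 60.0  # documented avatar audio duration range


def pick_audio_field(audio_id: Optional[str] = None,
                     sound_file_url: Optional[str] = None,
                     duration_seconds: Optional[float] = None) -> dict:
    """Return the single audio field to merge into the request body."""
    # audio_id and sound_file are mutually exclusive; exactly one is needed.
    if (audio_id is None) == (sound_file_url is None):
        raise ValueError("provide exactly one of audio_id or sound_file")
    # Only check duration when the caller knows it.
    if duration_seconds is not None and not (
            MIN_SECONDS <= duration_seconds <= MAX_SECONDS):
        raise ValueError("audio must be between 2 and 60 seconds long")
    if audio_id is not None:
        return {"audio_id": audio_id}
    # Heuristic (assumption): judge the format by the URL path's extension.
    extension = posixpath.splitext(urlparse(sound_file_url).path)[1].lower()
    if extension not in ALLOWED_AUDIO_EXTENSIONS:
        raise ValueError("sound_file should be MP3, WAV, M4A, or AAC")
    return {"sound_file": sound_file_url}


print(pick_audio_field(audio_id="your-tts-audio-id"))
print(pick_audio_field(sound_file_url="https://example.com/voice.mp3",
                       duration_seconds=12.5))
```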
Task flow
Poll the task
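This page does not show the polling endpoint or the status values, so the following is only a generic polling sketch. The status lookup is injected as a callable, and the status names in the demo ("pending", "running", "done", "error") are placeholders, not documented values; substitute whatever the task query response actually returns.

```python
import time
from typing import Callable, Collection


def poll_until_done(fetch_status: Callable[[], str],
                    terminal_states: Collection[str],
                    interval_seconds: float = 5.0,
                    max_attempts: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch_status until it returns a terminal state or attempts run out."""
    for attempt in range(max_attempts):
        status = fetch_status()
        if status in terminal_states:
            return status
        # Wait between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("task did not reach a terminal state in time")


# Demo with a fake status source; a real fetch_status would query the task.
fake_statuses = iter(["pending", "running", "done"])
result = poll_until_done(lambda: next(fake_statuses),
                         terminal_states={"done", "error"},
                         sleep=lambda seconds: None)  # skip waiting in the demo
print(result)
```

Injecting the status lookup and the sleep function keeps the loop testable without network access.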
Authorization
Bearer token authentication. Use your CometAPI key.
Headers
Optional content type header.
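As an illustration of the authorization and header settings, the sketch below builds (but does not send) a POST request using only the Python standard library. The base URL https://api.cometapi.com is an assumption that does not appear on this page, and the key, image URL, and audio ID are placeholders; the endpoint path is the one given above.

```python
import json
import urllib.request

BASE_URL = "https://api.cometapi.com"  # ASSUMPTION: not stated on this page
AVATAR_PATH = "/kling/v1/videos/avatar/image2video"


def build_avatar_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare an authenticated JSON POST request without sending it."""
    return urllib.request.Request(
        url=BASE_URL + AVATAR_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token authentication
            "Content-Type": "application/json",    # optional content type header
        },
        method="POST",
    )


request = build_avatar_request(
    "sk-placeholder",  # placeholder; use your CometAPI key
    {
        "image": "https://example.com/face.png",
        "audio_id": "your-tts-audio-id",
        "prompt": "Speak calmly to the camera.",
        "mode": "std",
    },
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```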
Body
- image: Avatar reference image. Accepts an image URL or a raw Base64 string (no data: prefix). Supported formats: JPG, JPEG, PNG. Max file size 10 MB. Minimum dimension 300 px on each side; aspect ratio between 1:2.5 and 2.5:1.
- audio_id: Audio ID returned by the Kling TTS API. Only audio clips between 2 and 60 seconds long that were generated within the last 30 days are accepted. Mutually exclusive with sound_file; exactly one of the two must be provided.
- prompt: Text prompt to guide avatar actions, emotions, and camera movements. Max 2500 characters. Required: the API rejects requests without this field.
- sound_file: Audio file as a URL or Base64 string. Accepted formats: MP3, WAV, M4A, AAC. Max 5 MB, duration 2 to 60 seconds. Mutually exclusive with audio_id; exactly one of the two must be provided.
- mode: Generation mode. Either std (standard, faster and more cost-effective) or pro (professional, higher-quality output).
- Webhook callback URL: URL for task status notifications. The server sends a callback when the task status changes.
- Custom task ID: Optional user-defined task ID for your own tracking. Does not replace the system-generated task ID. Must be unique per account.
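The documented limits on the body fields can be checked locally before a request is sent. The sketch below is a hypothetical pre-flight validator, not part of any SDK: the function names are made up for illustration, the caller is assumed to already know the image's format, size, and pixel dimensions, and "10 MB" is interpreted as 10 × 1024 × 1024 bytes, which is an assumption.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # ASSUMPTION: "10 MB" read as 10 MiB
MAX_PROMPT_CHARS = 2500


def strip_data_prefix(b64: str) -> str:
    """Drop a leading 'data:...;base64,' prefix, since raw Base64 is expected."""
    if b64.startswith("data:") and "," in b64:
        return b64.split(",", 1)[1]
    return b64


def image_problems(fmt: str, size_bytes: int, width: int, height: int) -> list:
    """Return a list of violations of the documented image limits."""
    problems = []
    if fmt.lower() not in {"jpg", "jpeg", "png"}:
        problems.append("format must be JPG, JPEG, or PNG")
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 10 MB")
    if min(width, height) < 300:
        problems.append("each side must be at least 300 px")
    # Aspect ratio within 1:2.5 .. 2.5:1, compared with integers only.
    if 2 * width > 5 * height or 2 * height > 5 * width:
        problems.append("aspect ratio must be between 1:2.5 and 2.5:1")
    return problems


def body_problems(body: dict) -> list:
    """Return a list of violations of the documented body rules."""
    problems = []
    prompt = body.get("prompt", "")
    if not prompt:
        problems.append("prompt is required")
    elif len(prompt) > MAX_PROMPT_CHARS:
        problems.append("prompt exceeds 2500 characters")
    if "mode" in body and body["mode"] not in {"std", "pro"}:
        problems.append("mode must be std or pro")
    if ("audio_id" in body) == ("sound_file" in body):
        problems.append("send exactly one of audio_id or sound_file")
    return problems


print(image_problems("png", 2_000_000, 720, 1280))
print(body_problems({"image": "...", "prompt": "Smile.", "audio_id": "id"}))
```

Returning a list of problems instead of raising on the first one lets a caller report every issue at once.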