Introduction
This workflow is built around the ComfyUI-LatentSyncWrapper 1.5 node, which provides advanced lip-sync capabilities in ComfyUI using ByteDance's LatentSync 1.5 model. It synchronizes a video's lip movements with an audio input, with improved temporal consistency and better performance across a wider range of languages. The comfyui-kokoro node handles text-to-speech generation. Combined, the two nodes form a complete AI digital human pipeline: kokoro converts your text into speech, and LatentSync syncs the video's lips to that speech.
https://github.com/ShmuelRonen/ComfyUI-LatentSyncWrapper?tab=readme-ov-file
https://github.com/stavsap/comfyui-kokoro
Workflow Overview
How to use this workflow
Step 1: Upload the video and set the content and language of the voice
1. Upload a video file.
2. Enter the voice-over text in the input box and select the language of the voice with the 'lang' parameter (the sketch after this list shows the equivalent call in the kokoro Python library).
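For reference, here is a minimal sketch of the same text-plus-language step using the kokoro pip package that the comfyui-kokoro node wraps. The one-letter language codes in the comments are assumptions based on the kokoro library's conventions; the node's 'lang' dropdown should be treated as the authoritative list.

```python
# Minimal sketch: text-to-speech with the kokoro library (pip package
# "kokoro"). The node's 'lang' parameter corresponds to a language code here.
import soundfile as sf
from kokoro import KPipeline

# Common kokoro language codes (assumed): 'a' American English,
# 'b' British English, 'j' Japanese, 'z' Mandarin Chinese.
pipeline = KPipeline(lang_code='a')

text = "Hello! This line will be lip-synced onto the uploaded video."

# kokoro yields audio chunks at a 24 kHz sample rate.
for i, (graphemes, phonemes, audio) in enumerate(pipeline(text, voice='af_heart')):
    sf.write(f'speech_{i}.wav', audio, 24000)
```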
Step 2: LatentSync parameter settings and voice timbre selection
1. lips_expression: Controls the expressiveness of lip movements (default: 1.5)
- Higher values (2.0-3.0): More pronounced lip movements, better for expressive speech.
- Lower values (1.0-1.5): Subtler lip movements, better for calm speech.
- This parameter affects the model's guidance scale, balancing between natural movement and lip sync accuracy.
inference_steps: Number of denoising steps during inference (default: 20)
- Higher values (30-50): Better quality results but slower processing.
- Lower values (10-15): Faster processing but potentially lower quality.
- The default of 20 usually provides a good balance between quality and speed; the sketch below shows how both parameters feed into the sampling call.
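To make the role of these two parameters concrete, here is a hypothetical sketch of how a diffusion-based lip-sync pipeline might consume them. The function and argument names below are illustrative, not the wrapper's actual internals; the grounding is the description above, where lips_expression maps to the guidance scale and inference_steps to the number of denoising steps.

```python
# Hypothetical sketch only: names below are illustrative, not the
# ComfyUI-LatentSyncWrapper API.
def run_lipsync(pipeline, video_frames, audio_features,
                lips_expression=1.5, inference_steps=20):
    # lips_expression acts as the guidance scale: higher values push the
    # model harder toward the audio condition, producing stronger lip motion.
    # inference_steps is the number of denoising iterations: more steps
    # refine the result further at the cost of proportionally longer runtime.
    return pipeline(
        frames=video_frames,
        audio=audio_features,
        guidance_scale=lips_expression,       # 1.0-1.5 subtle, 2.0-3.0 expressive
        num_inference_steps=inference_steps,  # 10-15 fast, 30-50 high quality
    )
```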
2. Choose the timbre. Voices are divided into male and female: names starting with 'af' are generally female voices, and names starting with 'am' are generally male voices (see the naming-convention sketch below).
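As an illustration, kokoro voice names follow a prefix convention (first letter roughly indicates language/accent, second letter indicates gender), so the male/female split can be derived mechanically. The voice names below are a sample I'm assuming from the Kokoro model's published voice list; the node's dropdown is the authoritative source.

```python
# Group voices by gender using the naming convention: the second letter of
# the prefix is 'f' (female) or 'm' (male). Sample Kokoro voice names only;
# check the node's dropdown for the full, current list.
VOICES = [
    "af_heart", "af_bella", "af_nicole", "af_sarah",  # 'a' + 'f': female
    "am_adam", "am_michael",                          # 'a' + 'm': male
    "bf_emma", "bf_isabella",                         # 'b' + 'f': female
    "bm_george", "bm_lewis",                          # 'b' + 'm': male
]

female = [v for v in VOICES if v.split("_")[0].endswith("f")]
male = [v for v in VOICES if v.split("_")[0].endswith("m")]

print("Female voices:", female)
print("Male voices:", male)
```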