Introduction
This workflow sits at the forefront of multimedia innovation, with Tencent's Sonic model at its core, advancing the digital human field. The user provides a portrait image and manually uploads an audio clip; the Sonic model then uses cutting-edge algorithms to analyze the image, capture and calibrate the lip shapes with millisecond precision, and align them with the audio's pacing and intonation so that the digital human "speaks" realistically. The result is an AI-generated video with sound, enabling creation in fields such as film and television, teaching, social media, and digital humans.
ComfyUI_Sonic
Sonic is an innovative model focused on global audio perception in portrait animation, aiming to deliver better results and a better experience for audio-driven face generation. The project's GitHub repository is: https://github.com/jixiaozhong/Sonic
Workflow Overview
How to use this workflow
Step 1: Load the audio and image
Step 2: Adjust the number of video frames
Usually, 150 frames produce a smooth 5-second video, which implies a frame rate of about 30 fps. Since duration scales linearly with frame count at a fixed frame rate, 300 frames yield roughly a 10-second video.
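The frame-count arithmetic above can be sketched as a small helper. This is a hypothetical utility, not part of the Sonic workflow itself; the 30 fps default is an assumption derived from the 150-frames-per-5-seconds figure, so adjust it to match your workflow's actual frame-rate setting.

```python
import math


def frames_for_duration(audio_seconds: float, fps: int = 30) -> int:
    """Estimate the frame count to request so the generated video
    covers the full audio clip.

    fps defaults to 30, the rate implied by 150 frames ~= 5 seconds;
    change it if your workflow uses a different frame rate.
    """
    # Round up so the video is never shorter than the audio.
    return math.ceil(audio_seconds * fps)


print(frames_for_duration(5))   # 150 frames for a 5-second clip
print(frames_for_duration(10))  # 300 frames for a 10-second clip
```

In practice you would read the uploaded audio's duration (for example from its file metadata), compute the frame count with a helper like this, and enter that number in the workflow's frame setting.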