Sonic: Lip Sync for Image-to-Speech Transformation

MimicPC

02/28/2025

ComfyUI

Video Generation

Audio Generation

Introduction

This workflow is built around Tencent's Sonic model and targets the digital human field. The user supplies a portrait image and manually uploads an audio clip. Sonic analyzes the image, captures the lip shape, and aligns the lip movements to the audio with millisecond-level precision, following the rhythm and intonation of the speech so that the digital human "speaks" realistically. The result is an AI video with sound that can be used in film and television, teaching, social networking, and other digital human applications.

ComfyUI_Sonic

Sonic is a portrait animation model that focuses on global audio perception, with the goal of improving the quality of audio-driven face generation. The project's GitHub repository is: https://github.com/jixiaozhong/Sonic

Workflow Overview

How to use this workflow

Step 1: Load the audio and the image

Step 2: Adjust the number of video frames

Usually, 150 frames produce a very smooth video of about 5 s. Clip length scales linearly with the frame count, so 300 frames produce a video of about 10 s.
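
The frame count needed for a given clip length can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes a constant frame rate of 30 fps, which is inferred from the 150-frame / 5 s figure above and is not a documented Sonic setting. The function names are hypothetical, and the actual rate depends on how the workflow's video output is configured.

```python
import math

# Assumed frame rate, inferred from "150 frames ~ 5 s"; adjust to match
# the frame rate actually configured in the workflow.
ASSUMED_FPS = 30.0


def frames_for_duration(seconds: float, fps: float = ASSUMED_FPS) -> int:
    """Return the number of frames needed to cover `seconds` of audio."""
    if seconds < 0 or fps <= 0:
        raise ValueError("seconds must be >= 0 and fps must be > 0")
    # Round up so the video is never shorter than the audio.
    return math.ceil(seconds * fps)


def duration_for_frames(frames: int, fps: float = ASSUMED_FPS) -> float:
    """Return the clip length in seconds for a given frame count."""
    if frames < 0 or fps <= 0:
        raise ValueError("frames must be >= 0 and fps must be > 0")
    return frames / fps


if __name__ == "__main__":
    print(frames_for_duration(5))    # frames for a 5 s clip at the assumed rate
    print(duration_for_frames(150))  # seconds covered by 150 frames
```

If the audio is longer than the clip covered by the chosen frame count, the video will likely end before the speech does, so it is safer to compute the frame count from the audio length and round up.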

Step 3: Get the final video

Details

APP	ComfyUI(v0.3.14)
Update Time	02/28/2025
File Space	10.8 GB
Models	1
Extensions	4