
$\text{Style}^2\text{Talker}$: High-Resolution Talking Head Generation with Emotion Style and Art Style

This repository provides the official PyTorch implementation of partial core components of the following paper:

Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, Bin Ji, Ye Pan
In AAAI, 2024.

(Figure: visualization of generated results)

Our approach takes an identity image and an audio clip as inputs and generates a talking head video with emotion style and art style, controlled respectively by an emotion source text and an art source picture. The pipeline of our $\text{Style}^2\text{Talker}$ is as follows:

(Figure: pipeline of $\text{Style}^2\text{Talker}$)

Requirements

We train and test with Python 3.7 and PyTorch. To install the dependencies, run:

conda create -n style2talker python=3.7
conda activate style2talker
  • Python packages
pip install -r requirements.txt

Inference

  • Run the demo:
    python inference.py --img_path path/to/image --wav_path path/to/audio --source_3DMM path/to/source_3DMM --style_e_source "a textual description for emotion style" --art_style_id num/for/art_style --save_path path/to/save
    The result will be stored in save_path.
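
    A purely hypothetical invocation is shown below; the paths, emotion description, and art style id are placeholders for illustration only and do not ship with the repository:
    python inference.py --img_path examples/identity.jpg --wav_path examples/speech.wav --source_3DMM examples/identity_3dmm.npy --style_e_source "talking with a happy expression" --art_style_id 0 --save_path results/demo.mp4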

Data Preprocessing

  • Crop videos in training datasets:
    python data_preprocess/crop_video.py
  • Extract 3DMM parameters from cropped videos using Deep3DFaceReconstruction:
    python data_preprocess/extract_3DMM.py
  • Extract landmarks from cropped videos:
    python data_preprocess/extract_lmdk.py
  • Extract mel features from audio (a minimal mel-extraction sketch follows this list):
    python data_preprocess/get_mel.py
  • Pack the video frames and 3DMM parameters into an lmdb file (a minimal packing sketch also follows this list):
    python data_preprocess/prepare_lmdb.py
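
A minimal sketch of the mel-feature step, assuming librosa and typical parameters (16 kHz sample rate, 80 mel bins); the actual settings used by data_preprocess/get_mel.py may differ:

    # Sketch of mel-spectrogram extraction; parameter values are assumptions,
    # not necessarily those used by data_preprocess/get_mel.py.
    import numpy as np
    import librosa

    def extract_mel(wav_path, sr=16000, n_fft=800, hop_length=200, n_mels=80):
        # Load and resample the audio, then compute a log-compressed mel
        # spectrogram of shape (n_mels, num_frames).
        audio, _ = librosa.load(wav_path, sr=sr)
        mel = librosa.feature.melspectrogram(
            y=audio, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
        )
        return np.log(mel + 1e-6)

    if __name__ == "__main__":
        mel = extract_mel("path/to/audio.wav")
        np.save("path/to/audio_mel.npy", mel)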
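
Likewise, a minimal sketch of packing frames and 3DMM parameters into an lmdb file with the lmdb package; the key layout and serialization here are illustrative assumptions, not necessarily the format expected by the training code:

    # Sketch of LMDB packing; keys and value encodings are assumptions for
    # illustration only.
    import io
    import lmdb
    import numpy as np
    from PIL import Image

    def write_lmdb(lmdb_path, frames, params_3dmm):
        # frames: list of HxWx3 uint8 arrays; params_3dmm: (N, D) float array.
        env = lmdb.open(lmdb_path, map_size=int(1e11))  # reserve a large address space
        with env.begin(write=True) as txn:              # commits on clean exit
            txn.put(b"length", str(len(frames)).encode())
            for i, (frame, param) in enumerate(zip(frames, params_3dmm)):
                buf = io.BytesIO()
                Image.fromarray(frame).save(buf, format="JPEG")  # compress each frame
                txn.put(f"frame-{i:06d}".encode(), buf.getvalue())
                txn.put(f"3dmm-{i:06d}".encode(), param.astype(np.float32).tobytes())
        env.close()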

Train

  • Following VToonify, each art style corresponds to a separate checkpoint; use the following script to train the model for the art style you want:
    # Train Style-A:
    python -m torch.distributed.launch --nproc_per_node=4 --master_port 12344 train_style_a.py

Dataset

  • We use the following dataset for Style-E training:
  1. MEAD. download link.
  • We use the following datasets for Style-A training:
  1. MEAD. download link.
  2. HDTF. download link.
  • Art reference picture datasets:
  1. Cartoon. download link.
  2. Illustration, Arcane, Comic, Pixar. download link.

Acknowledgement

Some code is borrowed from the following projects:

Thanks for their contributions!

Citation

If you find this codebase useful for your research, please cite it using the following entry.

@inproceedings{tan2024style2talker,
  title={Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style},
  author={Tan, Shuai and Ji, Bin and Pan, Ye},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={5},
  pages={5079--5087},
  year={2024}
}
