
VideoElevator

Official PyTorch implementation of "VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models"


Teaser video: project_teaser.mp4

VideoElevator aims to elevate the quality of generated videos with text-to-image diffusion models. It is training-free and plug-and-play, supporting the cooperation of various text-to-video and text-to-image diffusion models.

News

  • [04/07/2024] We release the code of VideoElevator, including three example scripts.

Method

**Top:** Taking text τ as input, conventional T2V performs temporal and spatial modeling jointly and accumulates low-quality content throughout the sampling chain.

**Bottom:** VideoElevator explicitly decomposes each sampling step into temporal motion refining and spatial quality elevating: the former encapsulates T2V to enhance temporal consistency, while the latter harnesses T2I to provide more faithful details, e.g., dressed in a suit. Empirically, applying T2V at only a few timesteps is enough to ensure temporal consistency.
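
This decomposition can be read as a sampling loop in which T2V is applied only at a sparse set of timesteps while T2I handles every step. The sketch below is a conceptual illustration only, assuming placeholder helpers (`t2v_refine`, `t2i_elevate`) and toy tensor shapes rather than the repository's actual API.

```python
# Conceptual sketch of VideoElevator's decomposed sampling loop (illustration only;
# the helper names and tensor shapes below are placeholders, not the actual code).
import torch

def t2v_refine(latents, t, prompt):
    """Placeholder for temporal motion refining with a T2V model."""
    return latents  # a real implementation would run a few T2V denoising steps here

def t2i_elevate(latents, t, prompt):
    """Placeholder for spatial quality elevating with a T2I model."""
    return latents  # a real implementation would denoise each frame with a T2I model

latents = torch.randn(1, 4, 16, 64, 64)   # (batch, channels, frames, height, width)
timesteps = list(range(999, -1, -20))     # toy denoising schedule
stable_steps = set(timesteps[::10])       # T2V only needs to be applied at a few timesteps

prompt = "a corgi dressed in a suit, walking on the beach"
for t in timesteps:
    if t in stable_steps:
        latents = t2v_refine(latents, t, prompt)   # temporal motion refining
    latents = t2i_elevate(latents, t, prompt)      # spatial quality elevating
```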

Setup

1. Download Weights

All pre-trained weights should be downloaded to the checkpoints/ directory, including the weights of both text-to-video and text-to-image diffusion models. Users can download the weights they need from the links below; a minimal download sketch follows the list.

  1. Text-to-video diffusion models: LaVie, ZeroScope, AnimateLCM.
  2. Text-to-image diffusion models: StableDiffusion v1.5, StableDiffusion v2.1-base.
  3. [Optional] LoRA from Civitai: RCNZ Cartoon, RealisticVision, Lyriel, ToonYou.
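
As a hedged example, the snippet below shows one way to fetch a text-to-image backbone into checkpoints/ with huggingface_hub; the repo_id and the sub-directory name are assumptions, so adjust them to match the layout expected by the example scripts.

```python
# Minimal sketch (not part of the official scripts): download a text-to-image
# backbone into checkpoints/ via huggingface_hub. Both the repo_id and the
# target sub-directory are assumptions -- adjust them to your setup.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="stable-diffusion-v1-5/stable-diffusion-v1-5",   # assumed Hub mirror of Stable Diffusion v1.5
    local_dir="checkpoints/stable-diffusion-v1-5",
)
```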

2. Requirements

```bash
conda create -n videoelevator python=3.10
conda activate videoelevator
pip install -r requirements.txt
```

Inference

We provide three example scripts of VideoElevator in the example_scripts/ directory and recommend starting with example_scripts/sd_animatelcm.py. To perform improved text-to-video generation, simply run `python example_scripts/sd_animatelcm.py`.

Notably, all scripts run with less than 11 GB of VRAM (e.g., on a 2080 Ti GPU).

[Optional] Hyper-parameters

You can adjust the following hyper-parameters and check their effects in the ablation studies on the project page; an illustrative snippet follows the list:

  • stable_steps: the timesteps at which temporal motion refining is applied.
  • stable_num: the number of steps used in T2V denoising.
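
For illustration only, the snippet below shows how these two hyper-parameters might be set; the exact value format expected by the scripts is an assumption, so check example_scripts/sd_animatelcm.py for the real definitions.

```python
# Illustration only: hypothetical values for the two hyper-parameters.
stable_steps = [0, 10, 20, 30, 40]   # sampling steps where temporal motion refining (T2V) is applied
stable_num = 3                       # number of steps used in T2V denoising

print(f"T2V is applied at steps {stable_steps}, with {stable_num} denoising steps")
```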

Citation

```bibtex
@article{zhang2024videoelevator,
  title={VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models},
  author={Zhang, Yabo and Wei, Yuxiang and Lin, Xianhui and Hui, Zheng and Ren, Peiran and Xie, Xuansong and Ji, Xiangyang and Zuo, Wangmeng},
  journal={arXiv preprint arXiv:2403.05438},
  year={2024}
}
```

Acknowledgement

This repository borrows code from Diffusers, LaVie, AnimateLCM, and FreeInit. Thanks for their contributions!
