A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

ziqihuangg/Awesome-Evaluation-of-Visual-Generation

Awesome Evaluation of Visual Generation

This repository collects methods for evaluating visual generation.

[Figure: overall structure]

Overview

What You'll Find Here

Within this repository, we collect works that aim to answer some critical questions in the field of evaluating visual generation, such as:

  • Model Evaluation: How does one determine the quality of a specific image or video generation model?
  • Sample/Content Evaluation: What methods can be used to evaluate the quality of a particular generated image or video?
  • User Control Consistency Evaluation: How well do the generated images and videos align with the user controls or inputs?

Updates

This repository is updated periodically. If you have suggestions for additional resources, updates on methodologies, or fixes for expiring links, please feel free to do any of the following:

  • raise an Issue,
  • nominate awesome related works with Pull Requests,
  • contact us via email (ZIQI002 at e dot ntu dot edu dot sg).

Table of Contents

1. Evaluation Metrics of Generative Models

1.1. Evaluation Metrics of Image Generation

Metric | Paper | Code
Inception Score (IS) | Improved Techniques for Training GANs (NeurIPS 2016) | -
Fréchet Inception Distance (FID) | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017) | Code, Code
Kernel Inception Distance (KID) | Demystifying MMD GANs (ICLR 2018) | Code, Code
CLIP-FID | The Role of ImageNet Classes in Fréchet Inception Distance (ICLR 2023) | Code, Code
Precision-and-Recall | Assessing Generative Models via Precision and Recall (2018-05-31, NeurIPS 2018); Improved Precision and Recall Metric for Assessing Generative Models (NeurIPS 2019) | Code, Code
Renyi Kernel Entropy (RKE) | An Information-Theoretic Evaluation of Generative Models in Learning Multi-modal Distributions (NeurIPS 2023) | Code
CLIP Maximum Mean Discrepancy (CMMD) | Rethinking FID: Towards a Better Evaluation Metric for Image Generation (CVPR 2024) | Code
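
As a rough illustration of what FID-style metrics compute: FID fits a Gaussian to the Inception features of real images and another to those of generated images, then reports the Fréchet distance between the two. The sketch below is a simplification, not the reference implementation: it assumes the feature vectors are already extracted and treats the covariance as diagonal, whereas the actual metric uses full covariance matrices and a matrix square root. The helper name `frechet_distance_diag` is hypothetical.

```python
from math import sqrt
from statistics import fmean, variance


def frechet_distance_diag(feats_a, feats_b):
    """Fréchet distance between two Gaussians with diagonal covariance.

    feats_a, feats_b: lists of equal-length feature vectors (at least two each).
    Assumption: features are precomputed; covariance is diagonal (a simplification).
    """
    dims = len(feats_a[0])
    total = 0.0
    for d in range(dims):
        col_a = [f[d] for f in feats_a]
        col_b = [f[d] for f in feats_b]
        mu_a, mu_b = fmean(col_a), fmean(col_b)
        # Sample standard deviation (n - 1 in the denominator) per dimension.
        sd_a, sd_b = sqrt(variance(col_a)), sqrt(variance(col_b))
        # With diagonal covariances the trace term reduces to (sd_a - sd_b)^2.
        total += (mu_a - mu_b) ** 2 + (sd_a - sd_b) ** 2
    return total


real = [[0.0, 0.0], [2.0, 2.0]]
fake = [[1.0, 1.0], [3.0, 3.0]]
print(frechet_distance_diag(real, fake))
```

Lower values mean the two feature distributions are closer; identical feature sets give a distance of zero.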

1.2. Evaluation Metrics of Video Generation

Metric | Paper | Code
FID-vid | GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium (NeurIPS 2017) | -
Fréchet Video Distance (FVD) | Towards Accurate Generative Models of Video: A New Metric & Challenges (arXiv 2018); FVD: A new Metric for Video Generation (2019-05-04) (ICLR 2019 Workshop DeepGenStruct) | Code

1.3. Evaluation Metrics for Latent Representation

2. Evaluation Metrics of Condition Consistency

2.1. Evaluation Metrics of Multi-Modal Condition Consistency

Metric | Condition | Pipeline | Code | References
CLIP Score (a.k.a. CLIPSIM) | Text | cosine similarity between the CLIP image and text embeddings | Code, PyTorch Lightning | CLIP Paper (ICML 2021). The metric was first used in the CLIPScore Paper (arXiv 2021), and the GODIVA Paper (arXiv 2021) applies it to video evaluation.
Mask Accuracy | Segmentation Mask | predict the segmentation mask, and compute pixel-wise accuracy against the ground-truth segmentation mask | any segmentation method for your setting | -
DINO Similarity | Image of a Subject (human / object etc.) | cosine similarity between the DINO embeddings of the generated image and the condition image | Code | DINO paper. The metric is proposed in DreamBooth.
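
The pipelines in this table reduce to two small computations: a cosine similarity between embeddings (CLIP Score, DINO Similarity) and a pixel-wise accuracy between masks (Mask Accuracy). The sketch below shows both, assuming the embeddings and masks have already been produced by whatever encoder or segmentation model fits your setting. The function names `cosine_similarity` and `pixel_accuracy` are illustrative and not taken from any of the linked codebases.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("embeddings must have non-zero norm")
    return dot / (norm_u * norm_v)


def pixel_accuracy(pred_mask, gt_mask):
    """Fraction of pixels whose predicted label equals the ground-truth label.

    Both masks are 2D lists of class labels with the same shape.
    """
    total = 0
    correct = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            total += 1
            correct += int(p == g)
    return correct / total


# Toy inputs standing in for real embeddings and masks (assumed values).
image_emb = [1.0, 0.0]
text_emb = [1.0, 0.0]
pred = [[0, 1], [1, 1]]
gt = [[0, 1], [0, 1]]
print(cosine_similarity(image_emb, text_emb), pixel_accuracy(pred, gt))
```

In practice the embeddings would come from the CLIP image and text encoders (or from DINO for subject similarity), and scores are typically averaged over all generated samples.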

2.2. Evaluation Metrics of Image Similarity

Metric | Paper | Code
Learned Perceptual Image Patch Similarity (LPIPS) | The Unreasonable Effectiveness of Deep Features as a Perceptual Metric (2018-01-11) (CVPR 2018) | Code, Website
Structural Similarity Index (SSIM) | Image quality assessment: from error visibility to structural similarity (TIP 2004) | Code, Code
Peak Signal-to-Noise Ratio (PSNR) | - | Code
Multi-Scale Structural Similarity Index (MS-SSIM) | Multiscale structural similarity for image quality assessment (SSC 2004) | PyTorch-Metrics
Feature Similarity Index (FSIM) | FSIM: A Feature Similarity Index for Image Quality Assessment (TIP 2011) | Code
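
PSNR is the one metric in this table listed without a paper, so a short sketch of its standard definition may help: PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error between the two images and MAX is the largest possible pixel value. The code below assumes single-channel images given as 2D lists of equal shape; the function name `psnr` and the default `max_value=255.0` (8-bit images) are choices made for this example.

```python
from math import inf, log10


def psnr(img_a, img_b, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two same-shaped 2D images."""
    squared_errors = [
        (a - b) ** 2
        for row_a, row_b in zip(img_a, img_b)
        for a, b in zip(row_a, row_b)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return inf
    return 10 * log10(max_value ** 2 / mse)


reference = [[0.0, 0.5], [1.0, 0.25]]
# Every pixel is off by 0.1, so the MSE is about 0.01.
distorted = [[0.1, 0.6], [0.9, 0.35]]
print(psnr(reference, distorted, max_value=1.0))
```

Higher PSNR means the two images are closer in a pixel-wise sense. Unlike LPIPS or SSIM, it does not account for perceptual or structural similarity.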

The community has also been using DINO or CLIP features to measure the semantic similarity of two images / frames.

There are also recent works on new methods to measure visual similarity (more will be added):

3. Evaluation Systems of Generative Models

3.1. Evaluation of Unconditional Image Generation

3.2. Evaluation of Text-to-Image Generation

NOTE: evaluates task of image and text generation

3.3. Evaluation of Text-Based Image Editing

3.4. Evaluation of Video Generation

3.4.1. Evaluation of Text-to-Video Generation

NOTE: hallucination detection

3.4.2. Evaluation of Image-to-Video Generation

3.4.3. Evaluation of Talking Face Generation

3.5. Evaluation of Text-to-Motion Generation

3.6. Evaluation of Model Trustworthiness

3.6.1. Evaluation of Visual-Generation-Model Trustworthiness

3.6.2. Evaluation of Non-Visual-Generation-Model Trustworthiness

These works do not target visual generation, but cover related evaluations of other models such as LLMs.

3.7. Evaluation of Entity Relation

4. Improving Visual Generation with Evaluation / Feedback / Reward

5. Quality Assessment for AIGC

5.1. Image Quality Assessment for AIGC

5.2. Aesthetic Predictors for Generated Images

6. Study and Rethinking

6.1. Evaluation of Evaluations

6.2. Survey

Note: Refer to table 2 for evaluation metrics for long video generation

6.3. Study

6.4. Competition

7. Other Useful Resources

  • Stanford Course: CS236 "Deep Generative Models" - Lecture 15 "Evaluation of Generative Models" [slides]
