Introduction

What are generative vision models and how do they differ from other models?

Mathematical models can generally be separated into two large families: generative models and discriminative models. The main difference between them is that discriminative models learn the boundaries that separate different classes (roughly speaking, they model the conditional distribution p(y|x)), while generative models learn the distribution of the data itself (p(x), or p(x, y) jointly with the labels), which is what lets them produce new samples.

Discriminative models can be applied to standard computer vision tasks such as classification and regression; these tasks can be extended into more complex ones such as semantic segmentation or object detection.

For the sake of brevity, in this chapter we will consider generative models that solve the following tasks (a minimal noise-to-image sketch follows the list):

  • noise to image (DCGAN)
  • text to image (diffusion models)
  • image to image (StyleGAN, cycleGAN, diffusion models)
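
To make the noise-to-image idea concrete, below is a minimal sketch of a DCGAN-style generator in PyTorch. The class name, layer sizes, and output resolution are illustrative assumptions, not the exact architecture from the DCGAN paper; the point is only that the model maps a random latent vector to an image tensor.

```python
# A toy DCGAN-style generator (hypothetical sizes), mapping noise -> 32x32 RGB images.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # project the latent vector to a 4x4 feature map
            nn.ConvTranspose2d(latent_dim, 256, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # upsample 4x4 -> 8x8
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            # upsample 8x8 -> 16x16
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            # upsample 16x16 -> 32x32, 3 output channels (RGB), values in [-1, 1]
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, latent_dim) -> reshape to a (batch, latent_dim, 1, 1) "feature map"
        return self.net(z.view(z.size(0), -1, 1, 1))

g = TinyGenerator()
z = torch.randn(8, 100)   # 8 random latent vectors ("noise")
fake_images = g(z)        # tensor of shape (8, 3, 32, 32): the generated images
```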

This section will cover two kinds of generative models: GAN-based models and diffusion-based models.

Before diving into generative models, let's quickly go over the metric most commonly used to evaluate them: FID. FID stands for Fréchet Inception Distance; it is an improvement on the Inception Score and was introduced by Heusel et al. (2017) in "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium". FID and IS (Inception Score) are very similar metrics, since they both use features from the Inception-v3 model.

FID is calculated by constructing two distributions from Inception-v3 features: one representing the real (training) data and the other representing the generated data. You then compute the Fréchet distance between these two distributions, and that is your FID score; the lower, the better.
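
To show what that looks like in practice, here is a minimal sketch, assuming you have already extracted Inception-v3 features (shape (N, 2048)) for the real and the generated images. The function name frechet_distance and the variable names are illustrative, not any particular FID library's API.

```python
# Fréchet distance between two Gaussians fitted to the feature sets:
# ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 * (sigma_r @ sigma_g)^(1/2))
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, fake_feats):
    # fit a Gaussian (mean + covariance) to each set of Inception-v3 features
    mu_r, sigma_r = real_feats.mean(axis=0), np.cov(real_feats, rowvar=False)
    mu_g, sigma_g = fake_feats.mean(axis=0), np.cov(fake_feats, rowvar=False)

    diff = mu_r - mu_g
    # matrix square root; drop tiny imaginary parts caused by numerical error
    covmean = linalg.sqrtm(sigma_r @ sigma_g).real
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```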

Two other metrics you might come across are PSNR and SSIM (a short sketch computing both follows the list below):

  • PSNR (peak signal-to-noise ratio) is essentially a logarithmic rescaling of the mean squared error: PSNR = 10 · log10(MAX² / MSE), where MAX is the maximum possible pixel value. Generally, values in the range [25, 35] dB are decent results, while 35 dB and above is very good.

  • SSIM (Structural Similarity Index) is a metric whose values typically fall in the range [0, 1] (it can technically go negative), where 1 is a perfect match. The final index is calculated from three components: luminance, contrast, and structure. The original SSIM paper and follow-up analyses break these components down in detail if you're really interested.
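
Here is a minimal sketch of computing both metrics with scikit-image's skimage.metrics module; the random arrays are just stand-ins for a real/generated image pair.

```python
# PSNR and SSIM between two same-shaped images with float values in [0, 1].
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

reference = np.random.rand(256, 256, 3)                                    # stand-in for a real image
distorted = np.clip(reference + 0.05 * np.random.randn(256, 256, 3), 0.0, 1.0)

psnr = peak_signal_noise_ratio(reference, distorted, data_range=1.0)
# channel_axis tells skimage which axis holds the RGB channels
ssim = structural_similarity(reference, distorted, data_range=1.0, channel_axis=2)

print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```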
