Introduction

What are generative vision models and how do they differ from other models?

Mathematical models can generally be separated into two large families: generative models and discriminative models. The main difference between them is that discriminative models learn the boundaries that separate different classes (roughly speaking, they model the conditional distribution p(y|x)), while generative models learn the distribution of the data itself (p(x), or p(x, y) jointly with the labels), which is what lets them produce new samples.

Discriminative models can be applied to standard computer vision tasks such as classification and regression; these tasks can be extended into more complex ones such as semantic segmentation or object detection.

For the sake of brevity, in this chapter we will consider generative models that solve the following tasks (a minimal noise-to-image sketch follows the list):

  • noise to image (DCGAN)
  • text to image (diffusion models)
  • image to image (StyleGAN, cycleGAN, diffusion models)
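
To make the noise-to-image idea concrete, below is a minimal sketch of a DCGAN-style generator in PyTorch. The class name, layer sizes, and output resolution are illustrative assumptions, not the exact architecture from the DCGAN paper; the point is only that the model maps a random latent vector to an image tensor.

```python
# A toy DCGAN-style generator (hypothetical sizes), mapping noise -> 32x32 RGB images.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            # project the latent vector to a 4x4 feature map
            nn.ConvTranspose2d(latent_dim, 256, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            # upsample 4x4 -> 8x8
            nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            # upsample 8x8 -> 16x16
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            # upsample 16x16 -> 32x32, 3 output channels (RGB), values in [-1, 1]
            nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        # z: (batch, latent_dim) -> reshape to a (batch, latent_dim, 1, 1) "feature map"
        return self.net(z.view(z.size(0), -1, 1, 1))

g = TinyGenerator()
z = torch.randn(8, 100)   # 8 random latent vectors ("noise")
fake_images = g(z)        # tensor of shape (8, 3, 32, 32): the generated images
```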

This section will cover two kinds of generative models: GAN-based models and diffusion-based models.

Before diving into generative models, let's quickly go over the metric most commonly used to evaluate them: FID. FID stands for Fréchet Inception Distance; it is an improvement on the Inception Score and was introduced by Heusel et al. (2017) in "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium". FID and IS (Inception Score) are very similar metrics, since they both use features from the Inception-v3 model.

FID is calculated by constructing two distributions from Inception-v3 features: one representing the real (training) data and the other representing the generated data. You then compute the Fréchet distance between these two distributions, and that is your FID score; the lower, the better.
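
To show what that looks like in practice, here is a minimal sketch, assuming you have already extracted Inception-v3 features (shape (N, 2048)) for the real and the generated images. The function name frechet_distance and the variable names are illustrative, not any particular FID library's API.

```python
# Fréchet distance between two Gaussians fitted to the feature sets:
# ||mu_r - mu_g||^2 + Tr(sigma_r + sigma_g - 2 * (sigma_r @ sigma_g)^(1/2))
import numpy as np
from scipy import linalg

def frechet_distance(real_feats, fake_feats):
    # fit a Gaussian (mean + covariance) to each set of Inception-v3 features
    mu_r, sigma_r = real_feats.mean(axis=0), np.cov(real_feats, rowvar=False)
    mu_g, sigma_g = fake_feats.mean(axis=0), np.cov(fake_feats, rowvar=False)

    diff = mu_r - mu_g
    # matrix square root; drop tiny imaginary parts caused by numerical error
    covmean = linalg.sqrtm(sigma_r @ sigma_g).real
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```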

Two other metrics you might come across are PSNR and SSIM (a short sketch computing both follows the list below):

  • PSNR (peak signal-to-noise ratio) is essentially a logarithmic rescaling of the mean squared error: PSNR = 10 · log10(MAX² / MSE), where MAX is the maximum possible pixel value. Generally, values in the range [25, 35] dB are decent results, while 35 dB and above is very good.

  • SSIM (Structural Similarity Index) is a metric whose values typically fall in the range [0, 1] (it can technically go negative), where 1 is a perfect match. The final index is calculated from three components: luminance, contrast, and structure. The original SSIM paper and follow-up analyses break these components down in detail if you're really interested.
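
Here is a minimal sketch of computing both metrics with scikit-image's skimage.metrics module; the random arrays are just stand-ins for a real/generated image pair.

```python
# PSNR and SSIM between two same-shaped images with float values in [0, 1].
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

reference = np.random.rand(256, 256, 3)                                    # stand-in for a real image
distorted = np.clip(reference + 0.05 * np.random.randn(256, 256, 3), 0.0, 1.0)

psnr = peak_signal_noise_ratio(reference, distorted, data_range=1.0)
# channel_axis tells skimage which axis holds the RGB channels
ssim = structural_similarity(reference, distorted, data_range=1.0, channel_axis=2)

print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.3f}")
```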
