The Mathematics of the Evidence Lower Bound (ELBO) and why it’s important.
This is the backbone of variational inference: the math behind VAEs (Variational Autoencoders), Bayesian neural networks, and even some large language model fine-tuning tricks.
In a probabilistic model, we often want to find the parameters θ that maximize the likelihood of the data:
log pθ(x) = log ∫ pθ(x, z) dz
That integral? Usually impossible to compute directly for deep models. So we cheat, but not the immoral kind; we do it mathematically :)
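Before the cheat, a quick sketch of why the direct route fails. On a toy Gaussian model (the model, dimension, and numbers below are assumptions for the demo, chosen so the integral actually has a closed form), estimating log pθ(x) by naive Monte Carlo sampling from the prior puts almost no samples where pθ(x | z) is large, so the estimate comes out far too low:

```python
import numpy as np

# Assumed toy model (not from the post): z ~ N(0, I), x|z ~ N(z, I),
# so p(x) = ∫ p(x, z) dz = N(x; 0, 2I) in closed form, and we can see how a
# naive Monte Carlo estimate log p(x) ≈ log mean_i p(x | z_i), z_i ~ p(z), does.
rng = np.random.default_rng(0)
dim = 50                            # latent dimension for the demo
x = np.ones(dim)                    # one "observation"

z = rng.standard_normal((10_000, dim))                     # samples from the prior
log_px_given_z = (-0.5 * np.sum((x - z) ** 2, axis=1)
                  - 0.5 * dim * np.log(2 * np.pi))          # log N(x; z, I)
log_px_mc = np.logaddexp.reduce(log_px_given_z) - np.log(len(z))

log_px_exact = -0.25 * np.sum(x ** 2) - 0.5 * dim * np.log(4 * np.pi)  # log N(x; 0, 2I)
print(log_px_mc, log_px_exact)      # the naive estimate falls well below the true value
```

And for a deep model with a neural decoder there's no closed form to compare against at all, which is why we need a smarter route.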
We invent a new distribution qφ(z | x), our best guess at the true posterior pθ(z | x).
Now rewrite:
log pθ(x) = L(θ, φ) + D_KL( qφ(z | x) ‖ pθ(z | x) )
And since KL divergence is never negative, this implies:
log pθ(x) ≥ L(θ, φ)
We can maximize this bound (the ELBO, our L(θ, φ)) instead of the intractable likelihood.
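As a sanity check, here's a sketch on a toy conjugate model where everything is tractable (the model and the numbers are assumptions picked so the posterior and marginal have closed forms), verifying both the decomposition and the bound:

```python
import numpy as np

# Assumed toy model: p(z) = N(0, 1), p(x|z) = N(z, 1), so the posterior is
# p(z|x) = N(x/2, 1/2) and the marginal is p(x) = N(0, 2), all in closed form.
# Pick a deliberately imperfect q(z) = N(m, s^2) and check that
#   log p(x) = L + KL(q || p(z|x))   and therefore   log p(x) >= L.
x = 1.7
m, s = 0.3, 0.9

# L = E_q[log p(x|z)] + E_q[log p(z)] + H[q], all Gaussian expectations
e_log_lik   = -0.5 * np.log(2 * np.pi) - 0.5 * ((x - m) ** 2 + s ** 2)
e_log_prior = -0.5 * np.log(2 * np.pi) - 0.5 * (m ** 2 + s ** 2)
entropy_q   = 0.5 * np.log(2 * np.pi * np.e * s ** 2)
elbo = e_log_lik + e_log_prior + entropy_q

log_px = -0.5 * np.log(2 * np.pi * 2) - x ** 2 / 4           # exact log marginal
mu_post, var_post = x / 2, 0.5
kl = np.log(np.sqrt(var_post) / s) + (s ** 2 + (m - mu_post) ** 2) / (2 * var_post) - 0.5

print(elbo, log_px, elbo + kl)   # elbo <= log_px, and elbo + kl recovers log_px
```

The gap between log pθ(x) and the bound is exactly that KL term, so the better qφ matches the true posterior, the tighter the bound.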
If you expand it, you get:
L(θ, φ) = E_qφ(z | x)[ log pθ(x | z) ] − D_KL( qφ(z | x) ‖ p(z) )
Two important terms:
Reconstruction term, E_qφ(z | x)[ log pθ(x | z) ]: encourages the model to explain the data well. (If this term is high, your decoder is good.)
Regularization term, D_KL( qφ(z | x) ‖ p(z) ): forces the latent space to stay close to a simple prior (like a standard Gaussian). (If this term is low, your latent space stays near the prior, which tends to make it smoother and, often, more disentangled.)
So the ELBO can be seen as a trade-off between accuracy and simplicity.
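In code, those two terms look roughly like this; a minimal sketch assuming a VAE-style setup with a Gaussian encoder (outputs mu and logvar), a standard normal prior, and a Bernoulli decoder whose output x_recon came from a single sampled z (all names here are illustrative, not from the post):

```python
import torch
import torch.nn.functional as F

def elbo_terms(x, x_recon, mu, logvar):
    """Return (reconstruction, regularization) for one batch.

    Assumes q_phi(z|x) = N(mu, diag(exp(logvar))), p(z) = N(0, I), and a
    Bernoulli decoder, so the reconstruction term is a negative binary
    cross-entropy and the KL term has the usual closed form.
    """
    # E_q[log p_theta(x|z)], approximated with the single z that produced x_recon
    reconstruction = -F.binary_cross_entropy(x_recon, x, reduction="sum")

    # D_KL( N(mu, sigma^2) || N(0, I) ), summed over latent dims and the batch
    regularization = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())

    # ELBO = reconstruction - regularization; training minimizes the negative ELBO
    return reconstruction, regularization
```

Note the sketch assumes a z has already been sampled from qφ; how to sample it without breaking gradients is the next piece.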
The clever step that makes it all differentiable is the reparameterization trick:
z = μφ(x) + σφ(x) * ε, where ε ~ N(0, I)
That’s the small piece of math that made VAEs trainable and so widespread. Hella cool ngl.
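Here's what that looks like as a minimal PyTorch sketch (mu and logvar are placeholder tensors standing in for encoder outputs):

```python
import torch

# Sampling z ~ N(mu, sigma^2) directly would block gradients at the sampling step.
# Reparameterizing as z = mu + sigma * eps, with eps ~ N(0, I) carrying all the
# randomness, keeps z a deterministic, differentiable function of mu and sigma.
mu = torch.zeros(4, requires_grad=True)       # placeholder for the encoder's mean
logvar = torch.zeros(4, requires_grad=True)   # placeholder for the encoder's log-variance

eps = torch.randn_like(mu)                    # noise, independent of the parameters
z = mu + torch.exp(0.5 * logvar) * eps        # z = mu + sigma * eps

z.sum().backward()
print(mu.grad, logvar.grad)                   # gradients flow through the sample
```

Because ε carries all the randomness, gradients of the ELBO can flow back through z into the encoder parameters φ.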
This one framework connects:
a) Deep generative models (VAEs, Diffusion models’ initial phases)
b) Bayesian deep learning (uncertainty estimation)
c) Self-supervised representation learning (via information bottlenecks)
d) Reinforcement learning (via variational world models)
…and many more.
Conceptually speaking, ELBO brings together probability, information theory, and optimization in one equation.