NLP Evaluation
Natural language generation evaluation
There is more than one effective way to say most things, which makes evaluating generated text harder than evaluating tasks with a single correct answer.
Perplexity of a probability distribution
Given a sequence \(\mathbf{x} = [x_1,\dots,x_N]\) of length N and a probability distribution \(p\):
\[PP(p, \mathbf{x}) = \prod_{i=1}^{N} \left( \frac{1}{p(x_i)} \right)^{\frac{1}{N}}\]i.e., the geometric mean of the inverse probabilities the model assigns to each token.
\[perplexity(X) = 2^{H(X)}\]We wish to minimize perplexity. It is equivalent to exponentiating the cross-entropy loss.
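A minimal sketch (not from the notes) of both forms of the computation in Python; the per-token probabilities are made up for illustration, and the two functions should agree.

```python
import math

def perplexity(token_probs):
    """Geometric mean of inverse probabilities: prod(1/p_i)^(1/N)."""
    n = len(token_probs)
    return math.prod(1.0 / p for p in token_probs) ** (1.0 / n)

def perplexity_via_cross_entropy(token_probs):
    """Equivalent form 2^H, where H is the average negative log2-probability."""
    n = len(token_probs)
    h = -sum(math.log2(p) for p in token_probs) / n
    return 2 ** h

probs = [0.25, 0.5, 0.1, 0.05]              # hypothetical per-token probabilities
print(perplexity(probs))                    # ~6.32
print(perplexity_via_cross_entropy(probs))  # same value, via 2^H
```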
Does the model assign high probability to the input sequence?
Weaknesses: perplexity depends heavily on the underlying vocabulary. It can be reduced simply by changing the vocabulary size, so scores cannot be compared across models with different vocabularies or across datasets tokenized with different vocabularies.
N-Gram Based Methods
Edit distance (a measure of the distance between two strings), BLEU, and ROUGE.
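As a concrete illustration (not from the notes), here is a minimal Python sketch of Levenshtein edit distance, i.e., the minimum number of insertions, deletions, and substitutions needed to turn one string into another, computed by dynamic programming; the example strings are hypothetical.

```python
def edit_distance(a, b):
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```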