The Kernel Trick
The Kernel Trick is a beautiful, though often poorly taught, piece of insight that makes SVMs work. Consider the simple quadratic kernel:
\[K(x,z) = (x^Tz)^2 = \left(\sum\limits_{i=1}^n x_iz_i\right)^2\]You might be tempted to simplify this immediately to \( \sum\limits_{i=1}^n (x_iz_i)^2 \), but that would be incorrect. The expression is a square of a sum, which expands into a full cross product (recall that expanding a binomial product like \( (a-b)(a-b) \) yields \(2^2 = 4\) terms).
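To make the distinction concrete, here is a minimal NumPy sketch (not part of the original derivation; the vectors are arbitrary illustrative values) comparing the square of the sum against the tempting but incorrect sum of squares:

```python
import numpy as np

# Illustrative vectors for n = 3 (arbitrary values).
x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

square_of_sum = (x @ z) ** 2           # K(x, z) = (x^T z)^2 = (4 + 10 + 18)^2 = 1024
sum_of_squares = np.sum((x * z) ** 2)  # the tempting "simplification" = 16 + 100 + 324 = 440

print(square_of_sum, sum_of_squares)   # 1024.0 440.0 -- clearly not the same quantity
```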
We can expand the square of sums, using different indices \(i,j\) to keep the terms from the two factors distinct:
\[K(x,z) = (\sum\limits_{i=1}^n x_iz_i) (\sum\limits_{j=1}^n x_jz_j)\] \[K(x,z) = \sum\limits_{i=1}^n \sum\limits_{j=1}^n (x_ix_j) (z_iz_j)\]This is much easier to understand with an example. Consider \(n=3\):
\[K(x,z) = \bigg(x_1z_1 + x_2z_2 + x_3z_3\bigg) \bigg(x_1z_1 + x_2z_2 + x_3z_3\bigg)\]Expanding this product, we get a sum with 9 total terms:
\[K(x,z) = \bigg(x_1z_1x_1z_1 + x_1z_1x_2z_2 + x_1z_1x_3z_3\bigg) + \bigg(x_2z_2 x_1z_1 + x_2z_2 x_2z_2 + x_2z_2x_3z_3\bigg) + \bigg( x_3z_3x_1z_1 + x_3z_3x_2z_2 + x_3z_3x_3z_3 \bigg)\]Surprisingly, this sum can be written as a simple inner product of two vectors:
\[K(x,z) = \phi(x)^T \phi(z) = \begin{bmatrix} x_1x_1 \\ x_1x_2 \\ x_1x_3 \\ x_2x_1 \\ x_2x_2 \\ x_2x_3 \\ x_3x_1 \\ x_3x_2 \\ x_3x_3 \end{bmatrix}^T \begin{bmatrix} z_1z_1 \\ z_1z_2 \\ z_1z_3 \\ z_2z_1 \\ z_2z_2 \\ z_2z_3 \\ z_3z_1 \\ z_3z_2 \\ z_3z_3 \end{bmatrix}\]Thus, we've shown that for \(n=3\), \(K(x,z) = \phi(x)^T \phi(z)\), where the feature map \(\phi(x)\) contains all of the degree-2 cross-product terms \(x_ix_j\).
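As a sanity check, here is a small Python sketch (not from the original text; the explicit `phi` helper and the example vectors are my own illustration) verifying that the 9-dimensional feature map reproduces the kernel value for \(n=3\):

```python
import numpy as np

def phi(v):
    # Explicit degree-2 feature map for n = 3: every cross-product term v_i * v_j.
    return np.array([v[i] * v[j] for i in range(3) for j in range(3)])

x = np.array([1.0, 2.0, 3.0])
z = np.array([4.0, 5.0, 6.0])

kernel_direct = (x @ z) ** 2       # evaluate K(x, z) directly in the original 3-dim space
kernel_mapped = phi(x) @ phi(z)    # evaluate the inner product in the 9-dim feature space

print(kernel_direct, kernel_mapped)  # both print 1024.0
```

This equality is the point of the trick: evaluating \(K(x,z)\) directly touches only \(n\) coordinates, while the explicit feature map lives in an \(n^2\)-dimensional space, so the kernel lets an SVM work with the richer feature space without ever constructing \(\phi(x)\).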