Nevertheless, like any learning algorithm, SVMs have a number of tunable parameters: the penalty parameter C, as well as any parameters the kernel depends on, e.g. the kernel width for radial basis function (RBF) kernels. We call these hyperparameters, to distinguish them from the lower-level parameters which the algorithm fits, i.e. the weight vector and offset.
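To make the distinction concrete, here is a minimal sketch of the standard RBF kernel with its single width hyperparameter (the function name and code are illustrative, not part of the software described below):

```python
import numpy as np

def rbf_kernel(x, z, width):
    """RBF (Gaussian) kernel with a single width hyperparameter:
    k(x, z) = exp(-||x - z||^2 / (2 * width^2)).
    The width, like the penalty C, is fixed before training rather
    than fitted by the SVM algorithm itself.
    """
    diff = np.asarray(x) - np.asarray(z)
    return np.exp(-np.dot(diff, diff) / (2.0 * width ** 2))

# The kernel value decays with distance; the width controls how fast.
k_near = rbf_kernel([0.0, 0.0], [0.1, 0.0], width=1.0)
k_far = rbf_kernel([0.0, 0.0], [3.0, 0.0], width=1.0)
```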

If there are only one or two hyperparameters, one can certainly try a direct minimization of test error (as measured, e.g., by cross-validation) over a grid of hyperparameter values. But this becomes impractical when many hyperparameters are involved. For example, in an RBF kernel one might want to allow a separate width parameter for each input dimension (or feature). Optimizing over these widths amounts to automatic relevance determination (ARD), since a large width parameter indicates that the feature concerned has little effect on the kernel and hence on prediction performance.
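The ARD form of the RBF kernel, with one width per dimension, can be sketched as follows (a generic illustration; function names are mine, not the software's):

```python
import numpy as np

def ard_rbf_kernel(x, z, widths):
    """RBF kernel with one width per input dimension (ARD form):
    k(x, z) = exp(-sum_d (x_d - z_d)^2 / (2 * widths_d^2)).
    A very large width in dimension d drives that term toward zero,
    so the feature effectively drops out of the kernel.
    """
    x, z, widths = map(np.asarray, (x, z, widths))
    return np.exp(-np.sum((x - z) ** 2 / (2.0 * widths ** 2)))

# With a huge width on the second feature, the kernel ignores it:
k = ard_rbf_kernel([1.0, 0.0], [1.0, 5.0], widths=[1.0, 1e6])
# k stays close to 1 even though the points differ in that feature.
```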

We interpret the SVM algorithm as the maximum a posteriori solution to a Bayesian inference problem. It is then natural to select hyperparameters to maximize the evidence, i.e. the overall likelihood of the observed data. The key advantage is that the evidence is a continuous function of the hyperparameters, and so can be optimized by e.g. gradient ascent. We have tested this method on a number of standard data sets and found very encouraging results. For details and background references, see the papers on SVMs in my publications list.
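Because the evidence is a smooth function of the hyperparameters, a generic gradient-ascent loop suffices in principle. The sketch below uses a toy objective and finite-difference gradients purely for illustration; the actual method uses gradient estimates of the true log evidence, not this stand-in:

```python
import numpy as np

def gradient_ascent(f, theta0, lr=0.1, steps=200, eps=1e-6):
    """Maximize f by gradient ascent, with gradients estimated by
    central finite differences. Here f stands in for the log evidence
    as a function of the hyperparameter vector theta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(theta.size):
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (f(theta + e) - f(theta - e)) / (2 * eps)
        theta += lr * grad  # ascent step on the surrogate evidence
    return theta

# Toy "evidence" peaked at hyperparameters (2.0, 1.0), illustrative only:
log_ev = lambda t: -((t[0] - 2.0) ** 2 + (t[1] - 1.0) ** 2)
opt = gradient_ascent(log_ev, [0.5, 0.5])
# opt converges to approximately [2.0, 1.0]
```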

We have written software to automate the tuning of all hyperparameters for SVM classifiers with the popular RBF kernels, extended to allow ARD. Evidence gradients are estimated by sampling from the Bayesian posterior, and this is sped up by a Nyström approximation which reduces the dimensionality of the space that needs to be sampled. You can download the complete software bundle free from here as long as it's for research and educational use; unpack it with gzip and tar on Unix, or WinZip or similar on Windows. If you want further information about the software before downloading, have a look at the user's guide.
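For readers unfamiliar with the Nyström approximation, here is a generic low-rank sketch of the idea (this is a standard textbook construction, not the software's actual implementation):

```python
import numpy as np

def nystrom_approx(K, m, rng=None):
    """Rank-m Nystrom approximation of an n x n kernel matrix K.
    Pick m landmark points and reconstruct K ~= C W^+ C^T, where
    C = K[:, idx] and W = K[idx][:, idx]. Sampling can then work in
    the m-dimensional landmark space instead of all n dimensions.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    C = K[:, idx]                      # n x m slice of the kernel matrix
    W = C[idx, :]                      # m x m landmark block
    return C @ np.linalg.pinv(W) @ C.T

# Example: approximate an RBF kernel matrix on 50 points with 20 landmarks.
x = np.linspace(0.0, 1.0, 50)
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1)
K_approx = nystrom_approx(K, m=20, rng=0)
```

Smooth kernels such as the RBF have rapidly decaying spectra, which is why a modest number of landmarks already gives an accurate reconstruction.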

The approach should also extend straightforwardly from classification to SVM regression. We may implement this in the future; if you're interested and would like to collaborate on this, get in touch.
