A well-known survival tip if you find yourself stranded in unfamiliar territory, especially if visibility is limited (maybe in a forest), is just to head downhill. At each step you head in the direction that slopes downwards most steeply and hope that you will eventually find a river that can provide you with sustenance and lead you back to civilisation.
What fewer people know is that the same principle applies to optimising Machine Learning models.
A Machine Learning model takes an input (e.g. a photo) and produces an output (e.g. “This is a photo of a cat”). Internally the model contains a number of parameters. These are numeric values that determine what prediction the model will make for any given input. Depending on the model, there may be just a few or there may be millions, but the challenge is the same: we must find the values of these parameters that yield the most accurate predictions. Once we have done this, we have optimised the model.
The set of parameters can be viewed as coordinates in the landscape that you wish to navigate. When you find the coordinates with the lowest height, you have optimised your model to make the best predictions.
The height of your model’s virtual landscape at a given set of coordinates is defined by the model’s so-called “loss function” and is calculated by means of your training data. The training data is a set of sample inputs to your model together with the corresponding correct outputs. For example, it could be a set of photos of cats labelled “cat” together with a set of photos of dogs labelled “dog”.
The loss function compares the model’s predictions with the correct values and calculates how far from perfect the predictions are, i.e. the height of the ground at the current point.
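To make this concrete, here is a deliberately tiny sketch in Python: a one-parameter model with a mean squared error loss. The model (`y = w * x`), the sample data, and the function names are all invented for this illustration rather than taken from any particular library.

```python
# A one-parameter model: its prediction is just w times the input.
def predict(w, x):
    return w * x

# The loss function: the average squared distance between the model's
# predictions and the correct outputs. This is the "height of the
# ground" at parameter value w.
def loss(w, inputs, targets):
    return sum((predict(w, x) - t) ** 2 for x, t in zip(inputs, targets)) / len(inputs)

inputs, targets = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # true relationship: y = 2x
print(loss(2.0, inputs, targets))  # 0.0 -- the lowest point of the landscape
print(loss(1.0, inputs, targets))  # larger -- we are standing on higher ground
```

With only one parameter the “landscape” is just a curve, but the idea is the same with millions of parameters: each choice of parameter values gives a height, and lower is better.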
Crucially, the loss function allows you not only to calculate the height at a given point but also to determine which way the ground is sloping.
Initialising the model is like being dropped into an unknown landscape. By putting training data through the model and applying the loss function, you can take a step in the direction in which the ground slopes downhill most steeply. You can then repeat the process until you reach the lowest point. At that point, congratulations! You have optimised your model, which is the Machine Learning equivalent of reaching civilisation at last!
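The whole procedure can be sketched in a few lines of Python. This is a minimal, self-contained illustration under assumed names (a one-parameter model `y = w * x`, a mean squared error loss, and a numerical estimate of the slope), not the exact method any particular framework uses:

```python
# The "height of the ground" at parameter value w, measured on the training data.
def loss(w, inputs, targets):
    return sum((w * x - t) ** 2 for x, t in zip(inputs, targets)) / len(inputs)

# Numerical estimate of which way the ground slopes at w:
# a positive slope means downhill lies toward smaller w.
def slope(w, inputs, targets, eps=1e-6):
    return (loss(w + eps, inputs, targets) - loss(w - eps, inputs, targets)) / (2 * eps)

inputs, targets = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # true relationship: y = 2x
w = 0.0              # "dropped into an unknown landscape"
learning_rate = 0.05  # how big a step to take each time
for _ in range(200):
    w -= learning_rate * slope(w, inputs, targets)  # step downhill
print(round(w, 3))   # converges toward 2.0, the bottom of the valley
```

The step size (the learning rate) matters: too small and you descend painfully slowly, too large and you overshoot the valley floor and bounce around instead of settling.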
Further reading: Machine Learning Mastery