Now that you have a high-level view, let’s try to see what is happening behind the scenes in simple terms.
Let’s continue with the previous supervised learning example of house prices. Each feature in our training set influences the final selling price; the strength of that influence is what we call the feature’s weight. Our machine learning system now has to figure out the correct weight for each feature.
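To make the idea concrete, here is a minimal sketch of a prediction as a weighted sum of features. The feature names and every number below are invented for illustration, not learned from real data.

```python
# Hypothetical features of one house and hypothetical learned weights.
features = {"square_metres": 120, "bedrooms": 3, "distance_to_city_km": 5}
weights = {"square_metres": 2500, "bedrooms": 15000, "distance_to_city_km": -4000}

# The predicted price is each feature value multiplied by its weight, summed up.
predicted_price = sum(features[name] * weights[name] for name in features)
print(predicted_price)  # 120*2500 + 3*15000 + 5*(-4000) = 325000
```

Note that a weight can be negative: here, every extra kilometre from the city pulls the predicted price down.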
Training and weights
In the background, the system tries to figure out a formula/model that predicts the prices in the training set by trial and error. Basically: “Let’s see what happens when I assign more weight to this feature.” Since the system already knows the real price of every house in the training set, after assigning the new weight it recalculates each house’s price to see how far off it is from the known real price; this difference is the error. The system then takes a holistic look at the errors across all the houses and aggregates them using some formula; this aggregate value is called the cost. If the cost starts to go down, the system knows it’s on the right track and keeps making iterative changes in that direction. If the cost starts going up, it moves the weights in the other direction instead, until it finds an optimal point where it can’t lower the cost any further.
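The trial-and-error loop above can be sketched in a few lines. This toy version uses a single feature (house size), invents the training data so that the true relationship is price = 3000 × size, uses mean squared error as the cost formula, and searches for the weight by stepping in whichever direction lowers the cost; real systems use cleverer update rules, but the idea is the same.

```python
# Invented training set: house sizes and their known real prices.
sizes = [50, 80, 120]
real_prices = [150000, 240000, 360000]  # exactly 3000 per square metre

def cost(weight):
    # Per-house error, squared and averaged across all houses (mean squared error).
    errors = [(weight * s - p) for s, p in zip(sizes, real_prices)]
    return sum(e ** 2 for e in errors) / len(errors)

weight, step = 0.0, 100.0
for _ in range(100):
    # Trial and error: try nudging the weight both ways, keep whichever
    # candidate gives the lowest cost (staying put counts as a candidate).
    weight = min([weight - step, weight, weight + step], key=cost)

print(weight)        # converges to 3000.0
print(cost(weight))  # 0.0 -- can't lower the cost any further
```

Once the cost stops improving, the loop simply keeps returning the same weight: that is the “optimal point” described above.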
Error and Cost
Just to clear that up, let’s go over it again. It’s possible that the error for some individual houses gets worse as the weights change, BUT if the overall picture (the cost) starts looking better for the model, the system will optimise in that direction. The system cares more about the cost than about any individual error.
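Here is a tiny illustration of that trade-off, again with invented numbers and a single size feature. Moving the weight from 3200 to 3000 makes the first house’s error worse, yet the overall cost still improves, so the system prefers 3000.

```python
# Two invented houses: their sizes and known real prices.
sizes = [50, 100]
real_prices = [160000, 290000]

def house_errors(weight):
    # Absolute error for each individual house.
    return [abs(weight * s - p) for s, p in zip(sizes, real_prices)]

def cost(weight):
    # Aggregate the per-house errors into one number (mean squared error).
    return sum(e ** 2 for e in house_errors(weight)) / len(sizes)

print(house_errors(3200))  # [0, 30000] -> house 1 is predicted perfectly
print(house_errors(3000))  # [10000, 10000] -> house 1's error got worse...
print(cost(3000) < cost(3200))  # True -> ...but the overall cost improved
```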
On a different note, let’s assume we include the owner’s name in the list of features provided to the system. This feature shouldn’t have any impact on the selling price, so the system should assign it no weight. If our system starts assigning weight to the owner’s name, then we have a biased/unfair system on our hands. A biased system would result in over/underfitting, and this is something we never want in our system.
Once training is complete, our system spits out the optimal weights for the features, i.e. the ones with the lowest cost. We can then save these weights so that we don’t have to train the system every time we want to make a prediction.
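Saving and reloading the weights can be as simple as writing them to a file. A minimal sketch, assuming the weights live in a plain dictionary and that a JSON file is an acceptable storage format (the file name and numbers are made up):

```python
import json

# Hypothetical weights produced by an already-finished training run.
learned_weights = {"square_metres": 2500.0, "bedrooms": 15000.0}

# Save once, after training.
with open("model_weights.json", "w") as f:
    json.dump(learned_weights, f)

# Later -- perhaps in a separate prediction service -- load instead of retraining.
with open("model_weights.json") as f:
    weights = json.load(f)

house = {"square_metres": 90, "bedrooms": 2}
prediction = sum(weights[name] * house[name] for name in house)
print(prediction)  # 90*2500 + 2*15000 = 255000.0
```

Real libraries ship their own serialisation helpers for this, but the principle is the same: training is expensive and happens once, while the saved weights make each prediction cheap.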
This is basically the gist of a supervised learning system. There are a variety of implementations that help us achieve the above, which I will continue to learn and share.