Earlier, when working with multiple linear regression, we encountered three different error metrics at the end of the script’s output. These metrics help us judge the accuracy of the predictions across the test set. I will try to explain them as simply as possible, to the best of my own understanding.
MAE – Mean Absolute Error
MAE is the most intuitive of them all. The name in itself is pretty good at telling us what’s going on.
- Mean: average
- Absolute: without direction, get rid of any negative signs
Simply put, MAE is the average absolute difference between the predicted and actual values across the whole test set.
In the background, the algorithm takes the absolute difference between each predicted and actual price, adds them all up, and then divides the sum by the number of observations. It doesn’t matter whether a prediction is higher or lower than the actual price; only the size of the difference counts. A lower value indicates better accuracy.
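To make the steps above concrete, here is a minimal sketch of the calculation in plain Python. The prices are made-up numbers for illustration, not the values from our dataset, and the function is my own hand-rolled version rather than the one used in the script (scikit-learn offers `mean_absolute_error` in `sklearn.metrics` for the same purpose).

```python
# Hypothetical actual and predicted house prices (made-up, not the article's data).
actual = [500_000, 450_000, 520_000, 480_000]
predicted = [510_000, 440_000, 500_000, 485_000]


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    absolute_errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute_errors) / len(absolute_errors)


mae = mean_absolute_error(actual, predicted)
print(f"MAE: ${mae:,.0f}")  # prints "MAE: $11,250"
```

The four absolute errors are 10,000, 10,000, 20,000 and 5,000, so the sum is 45,000 and the average is 11,250.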
In our case, the MAE tells us that, on average, our predictions are off by roughly $24,213. Is this good or bad? To compare, we can go back to the stats table printed earlier by Python and find the mean house price, which is roughly $493,091. A simple calculation tells us that the error is about 5% of the mean house price, and I think that’s pretty good. However, keep in mind that our training and test sets are pretty tiny and things might change significantly when a larger dataset is used.
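The "simple calculation" is just the MAE divided by the mean house price. A quick sketch using the two figures quoted above (the variable names are my own):

```python
# Figures quoted in the text above.
mae = 24_213          # mean absolute error, in dollars
mean_price = 493_091  # mean house price, in dollars

relative_error = mae / mean_price
print(f"MAE as a share of the mean price: {relative_error:.1%}")  # about 4.9%
```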
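As a general guide, I think we can use MAE when we aren’t too worried about the outliers, because every error counts in proportion to its size and a few large misses don’t dominate the score.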
MSE – Mean Squared Error
I personally don’t focus too much on MSE, as I see it as a stepping stone for calculating RMSE. However, let’s see what it’s about.
- Mean: average
- Squared: square each error, so a difference of 2 becomes 4 and a difference of 3 becomes 9
As you can see, squaring assigns more weight to the bigger errors. The algorithm then adds up the squared errors and averages them. If you are worried about the outliers, this is the number to look at. Keep in mind that it’s not in the same unit as our dependent variable: because the errors are squared, the result is in squared dollars. In our case, the value was roughly 823,755,495, and this is NOT a dollar value of the error the way MAE is. As before, the lower the number, the better.
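Here is a small sketch of how squaring shifts the weight towards the bigger errors. Again, the prices are made-up illustration values and the function is my own plain-Python version, not the one from the script.

```python
# Hypothetical actual and predicted house prices (made-up, not the article's data).
actual = [500_000, 450_000, 520_000, 480_000]
predicted = [510_000, 440_000, 500_000, 485_000]


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


mse = mean_squared_error(actual, predicted)
print(f"MSE: {mse:,.0f} (squared dollars)")

# How much of the total does the single largest error account for?
abs_errors = [abs(t - p) for t, p in zip(actual, predicted)]
largest_share_abs = max(abs_errors) / sum(abs_errors)
largest_share_sq = max(abs_errors) ** 2 / sum(e ** 2 for e in abs_errors)
print(f"Largest error's share of the absolute total: {largest_share_abs:.0%}")
print(f"Largest error's share of the squared total:  {largest_share_sq:.0%}")
```

In this toy example the biggest miss (20,000) makes up about 44% of the absolute total but 64% of the squared total, which is exactly the extra weight described above.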
RMSE – Root Mean Squared Error
RMSE is obtained simply by taking the square root of MSE. This number is in the same unit as the value we were trying to predict. In our case, the RMSE is roughly $28,701. As you can see, this value is higher than the MAE (RMSE can never be lower than MAE, and the gap widens when a few errors are much larger than the rest) and is about 6% of the mean house price. Is that acceptable? It’s entirely your call.
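A minimal sketch of the square-root step, using made-up prices rather than our dataset, followed by the percentage check with the RMSE and mean price quoted in the text. All variable names are my own.

```python
import math

# Hypothetical actual and predicted house prices (made-up, not the article's data).
actual = [500_000, 450_000, 520_000, 480_000]
predicted = [510_000, 440_000, 500_000, 485_000]

errors = [t - p for t, p in zip(actual, predicted)]

mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e ** 2 for e in errors) / len(errors)
rmse = math.sqrt(mse)  # back in dollars, the same unit as the prices

print(f"MAE:  ${mae:,.0f}")
print(f"RMSE: ${rmse:,.0f}")

# Figures quoted in the text: RMSE of $28,701 against a mean price of $493,091.
relative_rmse = 28_701 / 493_091
print(f"RMSE as a share of the mean price: {relative_rmse:.1%}")  # about 5.8%
```

Even on this toy data the RMSE comes out above the MAE, because the one larger miss is weighted more heavily.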
MSE & RMSE are really useful when you want to see if the outliers are messing with your predictions. You might decide to investigate those outliers and remove them from your dataset altogether. Even better, you might decide to dive deeper into understanding them and discover some features that make them special.
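One rough way to spot this in practice is to compare RMSE with MAE: the further RMSE pulls away from MAE, the more a handful of large errors is driving the score. The sketch below uses made-up prediction errors and my own helper functions, so treat it as an illustration rather than a recipe.

```python
import math


def mae(errors):
    """Mean absolute error from a list of prediction errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error from a list of prediction errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


# Hypothetical prediction errors in dollars (made-up numbers).
typical_errors = [10_000, -10_000, 5_000, -5_000]
with_outlier = typical_errors + [100_000]  # one badly mispredicted house

for label, errors in [("without outlier", typical_errors),
                      ("with outlier", with_outlier)]:
    ratio = rmse(errors) / mae(errors)
    print(f"{label}: MAE = {mae(errors):,.0f}, "
          f"RMSE = {rmse(errors):,.0f}, RMSE/MAE = {ratio:.2f}")
```

Adding the single large miss raises both numbers, but it raises RMSE far more than MAE, so the ratio between them jumps.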
I hope that after this brief introduction to these error metrics, you can make more informed decisions about the usefulness of your models and predictions.