Overfitting and Underfitting

- Overfitting manifests itself when a model does not generalize well. Say that you achieve a
classification accuracy of 95 percent on your training data, but when you test the model on
another set of data, the accuracy falls to 50 percent. This would be considered high variance.
If we instead had 60 percent accuracy on the training data and 59 percent accuracy on the test
data, we would have low variance but high bias. This bias-variance trade-off is fundamental to
machine learning and model complexity.
- A bias error is the difference between the value or class we predict and the actual value or
class in our training data.
- A variance error is the amount by which the predicted value or class changes when the model
is applied to a different dataset, such as the test set, rather than the training data it was
fit on.
- Our goal is to minimize the total error (bias + variance).
- Let's say that we are trying to predict a value and we build a simple linear model with our
training data. As this is a simple model, we can expect high bias, while on the other hand it
will have low variance between the train and test data. Now, let's try including polynomial
terms in the linear model or building decision trees. These models are more complex and should
reduce the bias. However, as the bias decreases, the variance at some point begins to increase
and generalizability is diminished. You can see this phenomenon in the following illustration,
and the code sketch after this list demonstrates it numerically. Any machine learning effort
should strive to achieve the optimal trade-off between bias and variance, which is easier said
than done.
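
To make the trade-off concrete, here is a minimal sketch in Python (assuming scikit-learn and NumPy are installed; the synthetic sine data and the chosen polynomial degrees are illustrative choices, not taken from the article). It fits a plain linear model, a moderate polynomial, and a high-degree polynomial to the same data and compares train and test error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative synthetic data: a noisy sine curve with only 30 samples
rng = np.random.RandomState(42)
X = np.sort(rng.uniform(-3, 3, size=30)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=42)

# Degree 1 = simple model (high bias); degree 15 = very flexible model (high variance)
for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

With data like this you should generally see the degree-1 model producing similar but high errors on both sets (high bias, low variance), while the degree-15 model fits the training set almost perfectly yet does much worse on the test set (low bias, high variance); the middle degree tends to land closest to the optimal trade-off.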