# Main take-away

## Wrap-up

- **Overfitting** is caused by the **limited size of the training set**, the **noise** in the data, and the **high flexibility** of common machine learning models.
- **Underfitting** happens when the learnt prediction functions suffer from **systematic errors**. This can be caused by a choice of model family and parameters, which leads to a **lack of flexibility** to capture the repeatable structure of the true data generating process.
- For a fixed training set, the objective is to **minimize the test error** by adjusting the model family and its parameters to find the **best trade-off between overfitting and underfitting**.
- For a given choice of model family and parameters, **increasing the training set size will decrease overfitting** but can also cause an increase of underfitting.
- The test error of a model that is neither overfitting nor underfitting can still be high if the variations of the target variable cannot be fully determined by the input features. This irreducible error is caused by what we sometimes call label noise. In practice, this often happens when we do not have access to important features for one reason or another.
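The points above can be illustrated with a small synthetic experiment. The following is a minimal sketch, not taken from the course material: the sine-shaped target, the noise level `noise_std`, the polynomial degrees and the sample sizes are all assumptions chosen for illustration, and plain NumPy polynomial fits stand in for a model family of adjustable flexibility.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_std = 0.3  # assumed label noise; its variance is the irreducible error


def make_data(n_samples):
    """Draw inputs in [0, 1] and noisy targets from an assumed sine process."""
    x = rng.uniform(0, 1, n_samples)
    y = np.sin(2 * np.pi * x) + rng.normal(0, noise_std, n_samples)
    return x, y


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


x_train, y_train = make_data(30)    # small training set
x_test, y_test = make_data(5000)    # large test set for a stable estimate

train_mse, test_mse = {}, {}
for degree in (1, 3, 15):
    # The polynomial degree plays the role of model flexibility.
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree, domain=[0, 1])
    train_mse[degree] = mse(model, x_train, y_train)
    test_mse[degree] = mse(model, x_test, y_test)
    print(f"degree={degree:2d}  train MSE={train_mse[degree]:.3f}  "
          f"test MSE={test_mse[degree]:.3f}")

# Same flexible model, but fitted on a much larger training set.
x_large, y_large = make_data(3000)
model_large = np.polynomial.Polynomial.fit(x_large, y_large, 15, domain=[0, 1])
test_mse_large = mse(model_large, x_test, y_test)
print(f"degree=15 with 3000 training samples: test MSE={test_mse_large:.3f}")
print(f"irreducible error (noise variance): {noise_std ** 2:.3f}")
```

In a run like this, the degree-1 model has a high error on both sets (underfitting), while the degree-15 model has the lowest training error and typically a noticeably larger test error (overfitting). Refitting the degree-15 model on more data brings its test error down towards the noise variance, which no model can go below on average.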

## To go further

It is possible to give a precise mathematical treatment of the bias and the
variance of a regression model. The Wikipedia article on the Bias-variance
tradeoff explains
how the **squared test error can be decomposed as the sum of the squared bias,
the variance and the irreducible error** for a given regression problem.
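As a sketch of what this decomposition looks like, assume the target is generated as $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and write $\hat{f}$ for the model fitted on a randomly drawn training set (this notation is an assumption made here, not taken from the course). At a fixed input $x$, with expectations taken over training sets and noise:

$$
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
= \big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2
+ \mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]
+ \sigma^2
$$

The three terms are the squared bias, the variance and the irreducible error. The first two can be estimated empirically by refitting a model on many simulated training sets. The example below does this for polynomial fits on an assumed sine-shaped process; the function names, degrees and sample sizes are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_std = 0.3            # assumed noise level (sigma)
n_repeats, n_train = 200, 30
x_eval = np.linspace(0.05, 0.95, 50)  # fixed inputs where predictions are compared


def true_function(x):
    """Assumed noise-free data generating function f."""
    return np.sin(2 * np.pi * x)


def bias_variance(degree):
    """Estimate squared bias and variance of a polynomial fit of given degree."""
    predictions = np.empty((n_repeats, x_eval.size))
    for i in range(n_repeats):
        # Draw a fresh training set and refit the model each time.
        x_train = rng.uniform(0, 1, n_train)
        y_train = true_function(x_train) + rng.normal(0, noise_std, n_train)
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree, domain=[0, 1])
        predictions[i] = model(x_eval)
    mean_prediction = predictions.mean(axis=0)
    bias_squared = float(np.mean((mean_prediction - true_function(x_eval)) ** 2))
    variance = float(np.mean(predictions.var(axis=0)))
    # Average squared distance to the noise-free target f(x).
    error_vs_truth = float(np.mean((predictions - true_function(x_eval)) ** 2))
    return bias_squared, variance, error_vs_truth


results = {degree: bias_variance(degree) for degree in (1, 3, 10)}
for degree, (bias_squared, variance, error_vs_truth) in results.items():
    print(f"degree={degree:2d}  bias^2={bias_squared:.4f}  variance={variance:.4f}  "
          f"bias^2+variance={bias_squared + variance:.4f}  "
          f"error vs f={error_vs_truth:.4f}")
```

The error measured against the noise-free function equals the squared bias plus the variance; adding `noise_std ** 2` gives the expected squared error against noisy targets. In this setup the least flexible model is dominated by bias and the most flexible one by variance.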

The next chapters on linear models, decision trees and ensembles will give concrete examples of how to diagnose and tackle overfitting and underfitting.

You can refer to the following scikit-learn examples which are related to the concepts approached during this module: