# Module overview

## What you will learn

This module gives an intuitive introduction to the **fundamental concepts**
of overfitting and underfitting in machine learning.

In practice, machine learning models do not make perfect predictions: the test
error is almost never exactly zero. This limitation comes from a **fundamental
trade-off** between **modeling flexibility** and the **limited size of the
training dataset**.

The first presentation will define those problems and characterize how and why they arise.

Then, we will present a methodology to quantify those problems by **contrasting
the train error with the test error** for various choices of model family and
model parameters. More importantly, we will emphasize the **impact of the size
of the training set on this trade-off**.
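To make the idea of contrasting train and test errors concrete, here is a minimal sketch using scikit-learn's `validation_curve`. The synthetic dataset, the decision tree model, the `max_depth` values, and the mean absolute error metric are all assumptions chosen for illustration; they are not prescribed by this module.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import ShuffleSplit, validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, used here only for illustration (an assumption).
X, y = make_regression(n_samples=500, n_features=5, noise=20.0, random_state=0)

# Repeated random train/test splits to average out the effect of one split.
cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)

# Increasing max_depth makes the tree more flexible.
max_depths = [1, 3, 5, 10, 20]

train_scores, test_scores = validation_curve(
    DecisionTreeRegressor(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=max_depths,
    cv=cv,
    scoring="neg_mean_absolute_error",
)

# The scorer returns negated errors, so flip the sign to recover errors.
train_errors = -train_scores.mean(axis=1)
test_errors = -test_scores.mean(axis=1)

for depth, train_err, test_err in zip(max_depths, train_errors, test_errors):
    print(f"max_depth={depth:>2}  train MAE={train_err:7.2f}  test MAE={test_err:7.2f}")
```

A train error and a test error that are both high suggest underfitting, while a train error far below the test error suggests overfitting. Swapping `validation_curve` for `learning_curve` (same module) would vary the number of training samples instead of a model parameter.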

Finally, we will relate overfitting and underfitting to the concepts of statistical variance and bias.

## Before getting started

The technical skills required to follow this module are:

- skills acquired in the "The Predictive Modeling Pipeline" module, including basic usage of scikit-learn.

## Objectives and time schedule

The objectives of this module are the following:

- understand the concepts of overfitting and underfitting;
- understand the concept of generalization;
- understand the general cross-validation framework used to evaluate a model.
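
As a rough preview of the cross-validation framework named in the last objective, the sketch below evaluates a model with scikit-learn's `cross_validate`. The synthetic data, the linear model, the 5-fold setting, and the error metric are assumptions made for illustration only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_validate

# Synthetic regression data, used here only for illustration (an assumption).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Fit and score the model on 5 different train/test partitions of the data.
cv_results = cross_validate(
    LinearRegression(),
    X,
    y,
    cv=5,
    scoring="neg_mean_absolute_error",
    return_train_score=True,  # also report the error on each training fold
)

# The scorer returns negated errors, so flip the sign to recover errors.
train_mae = -cv_results["train_score"]
test_mae = -cv_results["test_score"]

print(f"train MAE: {train_mae.mean():.2f} +/- {train_mae.std():.2f}")
print(f"test MAE:  {test_mae.mean():.2f} +/- {test_mae.std():.2f}")
```

Reporting the spread across folds alongside the mean gives a sense of how much the estimated generalization error depends on the particular split.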

The estimated time to go through this module is about 3 hours.