The goal of this course is to teach machine learning with scikit-learn to beginners, even those without a strong technical background.
Predictive modeling brings value to a wide variety of data, in business intelligence, health, industrial processes and scientific discoveries. It is a pillar of modern data science. In this field, scikit-learn is a central tool: it is easily accessible, yet powerful, and dovetails naturally with the wider ecosystem of data-science tools based on the Python programming language.
This course is an in-depth introduction to predictive modeling with scikit-learn. Step-by-step, didactic lessons introduce the fundamental methodological and software tools of machine learning, making the course a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.
The course is more than a cookbook: it will teach you to be critical about each step in the design of a predictive modeling pipeline, from choices in data preprocessing to choosing models, gaining insights into their failure modes, and interpreting their predictions.
Follow the MOOC
The course aims to be accessible without a strong technical background. The requirements for this course are:
basic knowledge of Python programming: defining variables, writing functions, importing modules
some prior experience with the NumPy, pandas and Matplotlib libraries (recommended but not required)
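As a rough gauge of the expected Python level, the short sketch below uses only the three skills listed above. The data and names (`ages`, `threshold`, `fraction_above`) are made up for illustration and are not taken from the course material.

```python
# Importing a module (here, one from the Python standard library)
import statistics

# Defining variables (illustrative values, not course data)
ages = [25, 32, 47, 51]
threshold = 40


# Writing a function
def fraction_above(values, limit):
    """Return the fraction of values strictly greater than limit."""
    return sum(value > limit for value in values) / len(values)


mean_age = statistics.mean(ages)
share_over_threshold = fraction_above(ages, threshold)
print(mean_age, share_over_threshold)
```

If every line of this snippet reads comfortably, the Python requirement is likely met.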
For a quick introduction to these requirements, you can go through these course materials or use the following resources:
The MOOC material is developed publicly under the CC-BY license, including the notebooks, the exercises and the solutions to the exercises (but not the solutions to the quizzes), in the following GitHub repository:
The material is also published as a static website at:
You can use the rocket icon at the top of each notebook page to execute the code cells interactively via the Binder service.
Note, however, that you must use the version hosted on the fun-mooc platform to complete the quizzes.