Table of contents

Introduction
  Machine Learning Concepts
    🎥 Introducing machine-learning concepts
    ✅ Quiz Intro.01

The predictive modeling pipeline
  Module overview
  Tabular data exploration
    First look at our dataset
    📝 Exercise M1.01
    📃 Solution for Exercise M1.01
    ✅ Quiz M1.01
  Fitting a scikit-learn model on numerical data
    First model with scikit-learn
    📝 Exercise M1.02
    📃 Solution for Exercise M1.02
    Working with numerical data
    📝 Exercise M1.03
    📃 Solution for Exercise M1.03
    Preprocessing for numerical features
    🎥 Validation of a model
    Model evaluation using cross-validation
    ✅ Quiz M1.02
  Handling categorical data
    Encoding of categorical variables
    📝 Exercise M1.04
    📃 Solution for Exercise M1.04
    Using numerical and categorical variables together
    📝 Exercise M1.05
    📃 Solution for Exercise M1.05
    🎥 Visualizing scikit-learn pipelines in Jupyter
    Visualizing scikit-learn pipelines in Jupyter
    ✅ Quiz M1.03
  🏁 Wrap-up quiz 1
  Main take-away

Selecting the best model
  Module overview
  Overfitting and underfitting
    🎥 Overfitting and Underfitting
    Cross-validation framework
    ✅ Quiz M2.01
  Validation and learning curves
    🎥 Comparing train and test errors
    Overfit-generalization-underfit
    Effect of the sample size in cross-validation
    📝 Exercise M2.01
    📃 Solution for Exercise M2.01
    ✅ Quiz M2.02
  Bias versus variance trade-off
    🎥 Bias versus Variance
    ✅ Quiz M2.03
  🏁 Wrap-up quiz 2
  Main take-away

Hyperparameter tuning
  Module overview
  Manual tuning
    Set and get hyperparameters in scikit-learn
    📝 Exercise M3.01
    📃 Solution for Exercise M3.01
    ✅ Quiz M3.01
  Automated tuning
    Hyperparameter tuning by grid-search
    Hyperparameter tuning by randomized-search
    🎥 Analysis of hyperparameter search results
    Analysis of hyperparameter search results
    Evaluation and hyperparameter tuning
    📝 Exercise M3.02
    📃 Solution for Exercise M3.02
    ✅ Quiz M3.02
  🏁 Wrap-up quiz 3
  Main take-away

Linear models
  Module overview
  Intuitions on linear models
    🎥 Intuitions on linear models
    Linear regression without scikit-learn
    📝 Exercise M4.01
    📃 Solution for Exercise M4.01
    Linear regression using scikit-learn
    Linear models for classification
    ✅ Quiz M4.01
  Non-linear feature engineering for linear models
    Non-linear feature engineering for Linear Regression
    📝 Exercise M4.02
    📃 Solution for Exercise M4.02
    Non-linear feature engineering for Logistic Regression
    📝 Exercise M4.03
    📃 Solution for Exercise M4.03
    ✅ Quiz M4.02
  Regularization in linear model
    🎥 Intuitions on regularized linear models
    Regularization of linear regression model
    📝 Exercise M4.04
    📃 Solution for Exercise M4.04
    ✅ Quiz M4.03
  🏁 Wrap-up quiz 4
  Main take-away

Decision tree models
  Module overview
  Intuitions on tree-based models
    🎥 Intuitions on tree-based models
    ✅ Quiz M5.01
  Decision tree in classification
    Build a classification decision tree
    📝 Exercise M5.01
    📃 Solution for Exercise M5.01
    ✅ Quiz M5.02
  Decision tree in regression
    Decision tree for regression
    📝 Exercise M5.02
    📃 Solution for Exercise M5.02
    ✅ Quiz M5.03
  Hyperparameters of decision tree
    Importance of decision tree hyperparameters on generalization
    ✅ Quiz M5.04
  🏁 Wrap-up quiz 5
  Main take-away

Ensemble of models
  Module overview
  Ensemble method using bootstrapping
    🎥 Intuitions on ensemble models: bagging
    Introductory example to ensemble models
    Bagging
    📝 Exercise M6.01
    📃 Solution for Exercise M6.01
    Random forests
    📝 Exercise M6.02
    📃 Solution for Exercise M6.02
    ✅ Quiz M6.01
  Ensemble based on boosting
    🎥 Intuitions on ensemble models: boosting
    Adaptive Boosting (AdaBoost)
    Gradient-boosting decision tree
    📝 Exercise M6.03
    📃 Solution for Exercise M6.03
    Speeding-up gradient-boosting
    ✅ Quiz M6.02
  Hyperparameter tuning with ensemble methods
    Hyperparameter tuning
    📝 Exercise M6.04
    📃 Solution for Exercise M6.04
    ✅ Quiz M6.03
  🏁 Wrap-up quiz 6
  Main take-away

Evaluating model performance
  Module overview
  Comparing a model with simple baselines
    Comparing model performance with a simple baseline
    📝 Exercise M7.01
    📃 Solution for Exercise M7.01
    ✅ Quiz M7.01
  Choice of cross-validation
    Stratification
    Sample grouping
    Non i.i.d. data
    ✅ Quiz M7.02
  Nested cross-validation
    Nested cross-validation
    ✅ Quiz M7.03
  Classification metrics
    Classification
    📝 Exercise M7.02
    📃 Solution for Exercise M7.02
    ✅ Quiz M7.04
  Regression metrics
    Regression
    📝 Exercise M7.03
    📃 Solution for Exercise M7.03
    ✅ Quiz M7.05
  🏁 Wrap-up quiz 7
  Main take-away

Concluding remarks
  🎥 Concluding remarks
  Concluding remarks

Appendix
  Glossary
  Datasets description
    The penguins datasets
    The adult census dataset
    The California housing dataset
    The Ames housing dataset
    The blood transfusion dataset
    The bike rides dataset
  Acknowledgement
  Notebook timings
  Table of contents

🚧 Feature selection
  Module overview
  Benefits of using feature selection
  Caveats of feature selection
  📝 Exercise 01
  📃 Solution for Exercise 01
  Limitation of selecting feature using a model
  Main take-away
  ✅ Quiz

🚧 Interpretation
  Feature importance
  Take Away
  ✅ Quiz