📝 Exercise M7.03#
As with the classification metrics exercise, we will evaluate the regression metrics within a cross-validation framework to get familiar with the syntax.
We will use the Ames house prices dataset.
import pandas as pd
import numpy as np
ames_housing = pd.read_csv("../datasets/house_prices.csv")
data = ames_housing.drop(columns="SalePrice")
target = ames_housing["SalePrice"]
data = data.select_dtypes(np.number)
target /= 1000
Note
If you want a deeper overview regarding this dataset, you can refer to the Appendix - Datasets description section at the end of this MOOC.
The first step will be to create a linear regression model.
# Write your code here.
Then, use the cross_val_score
to estimate the generalization performance of
the model. Use a KFold
cross-validation with 10 folds. Make the use of the
\(R^2\) score explicit by assigning the parameter scoring
(even though it is
the default score).
# Write your code here.
Then, instead of using the \(R^2\) score, use the mean absolute error. You need
to refer to the documentation for the scoring
parameter.
# Write your code here.
Finally, use the cross_validate
function and compute multiple scores/errors
at once by passing a list of scorers to the scoring
parameter. You can
compute the \(R^2\) score and the mean absolute error for instance.
# Write your code here.