Solution for Exercise M6.01
The aim of this notebook is to investigate whether we can tune the hyperparameters of a bagging regressor and evaluate the gain obtained.
We will load the California housing dataset and split it into a training and a testing set.
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
data, target = fetch_california_housing(as_frame=True, return_X_y=True)
target *= 100 # rescale the target in k$
data_train, data_test, target_train, target_test = train_test_split(
data, target, random_state=0, test_size=0.5
)
Note
If you want a deeper overview regarding this dataset, you can refer to the Appendix - Datasets description section at the end of this MOOC.
Create a BaggingRegressor and provide a DecisionTreeRegressor to its
parameter estimator. Train the regressor and evaluate its generalization
performance on the testing set using the mean absolute error.
# solution
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
tree = DecisionTreeRegressor()
bagging = BaggingRegressor(estimator=tree, n_jobs=2)
bagging.fit(data_train, target_train)
target_predicted = bagging.predict(data_test)
print(
"Basic mean absolute error of the bagging regressor:\n"
f"{mean_absolute_error(target_test, target_predicted):.2f} k$"
)
Basic mean absolute error of the bagging regressor:
36.65 k$
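As an aside (not part of the exercise statement), it can be instructive to compare this error with that of a single, un-bagged decision tree trained on the same split. A minimal sketch, reusing the data and imports defined above; the random_state=0 is an arbitrary choice for reproducibility:
# Illustrative aside: error of a single decision tree on the same split
single_tree = DecisionTreeRegressor(random_state=0)
single_tree.fit(data_train, target_train)
tree_error = mean_absolute_error(target_test, single_tree.predict(data_test))
print(f"Mean absolute error of a single decision tree: {tree_error:.2f} k$")
Bagging averages the predictions of many such trees fit on bootstrap samples, which typically reduces the variance of the single-tree predictor.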
Now, create a RandomizedSearchCV instance using the previous model and tune
the important parameters of the bagging regressor. Find the best parameters
and check whether they improve on the default regressor, still using the mean
absolute error as the metric.
Tip
You can list the bagging regressor's parameters using the get_params method.
# solution
for param in bagging.get_params().keys():
print(param)
bootstrap
bootstrap_features
estimator__ccp_alpha
estimator__criterion
estimator__max_depth
estimator__max_features
estimator__max_leaf_nodes
estimator__min_impurity_decrease
estimator__min_samples_leaf
estimator__min_samples_split
estimator__min_weight_fraction_leaf
estimator__monotonic_cst
estimator__random_state
estimator__splitter
estimator
max_features
max_samples
n_estimators
n_jobs
oob_score
random_state
verbose
warm_start
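The parameters prefixed with estimator__ belong to the inner DecisionTreeRegressor: scikit-learn exposes nested parameters with the <component>__<parameter> convention, which is why estimator__max_depth can appear in the search space below. A minimal sketch of setting such a nested parameter directly on the ensemble:
# Nested parameters follow the <component>__<parameter> convention, so the
# depth of the inner tree can be changed through the bagging regressor itself.
bagging.set_params(estimator__max_depth=5)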
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV
param_grid = {
"n_estimators": randint(10, 30),
"max_samples": [0.5, 0.8, 1.0],
"max_features": [0.5, 0.8, 1.0],
"estimator__max_depth": randint(3, 10),
}
search = RandomizedSearchCV(
bagging, param_grid, n_iter=20, scoring="neg_mean_absolute_error"
)
_ = search.fit(data_train, target_train)
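Once fitted, the best parameter combination and its cross-validated score can be read directly from the search object; a short sketch (the negation converts the neg_mean_absolute_error score back into an error in k$):
# Best hyperparameters found by the randomized search and the corresponding
# cross-validated mean absolute error (scores are negated errors).
print(search.best_params_)
print(f"Best cross-validated MAE: {-search.best_score_:.2f} k$")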
import pandas as pd
columns = [f"param_{name}" for name in param_grid.keys()]
columns += ["mean_test_error", "std_test_error"]
cv_results = pd.DataFrame(search.cv_results_)
cv_results["mean_test_error"] = -cv_results["mean_test_score"]
cv_results["std_test_error"] = cv_results["std_test_score"]
cv_results[columns].sort_values(by="mean_test_error")
| | param_n_estimators | param_max_samples | param_max_features | param_estimator__max_depth | mean_test_error | std_test_error |
|---|---|---|---|---|---|---|
| 9 | 14 | 0.8 | 0.8 | 9 | 39.225503 | 0.687964 |
| 11 | 11 | 0.8 | 0.8 | 8 | 40.454714 | 1.230036 |
| 1 | 21 | 1.0 | 1.0 | 8 | 40.916534 | 1.209031 |
| 16 | 16 | 1.0 | 1.0 | 8 | 40.934873 | 0.876322 |
| 5 | 19 | 0.8 | 1.0 | 8 | 41.353128 | 1.023113 |
| 3 | 14 | 0.5 | 1.0 | 8 | 41.381863 | 1.272000 |
| 13 | 28 | 0.5 | 0.8 | 7 | 42.599618 | 0.685139 |
| 19 | 13 | 1.0 | 0.8 | 6 | 45.460637 | 1.235512 |
| 14 | 13 | 0.8 | 1.0 | 6 | 45.612365 | 1.345202 |
| 10 | 19 | 0.8 | 0.5 | 8 | 47.523950 | 1.988204 |
| 7 | 23 | 1.0 | 1.0 | 5 | 48.052128 | 1.266011 |
| 15 | 21 | 0.8 | 0.8 | 5 | 48.486813 | 0.622832 |
| 12 | 28 | 1.0 | 0.5 | 6 | 50.222766 | 1.700856 |
| 4 | 12 | 0.8 | 0.5 | 6 | 50.418440 | 2.360962 |
| 18 | 29 | 0.5 | 0.5 | 5 | 51.237748 | 1.397550 |
| 8 | 15 | 0.5 | 1.0 | 4 | 51.528831 | 1.154186 |
| 17 | 13 | 0.5 | 1.0 | 4 | 51.656793 | 1.038916 |
| 6 | 23 | 1.0 | 0.5 | 5 | 51.881204 | 0.949487 |
| 2 | 12 | 0.8 | 0.8 | 4 | 52.978192 | 1.213968 |
| 0 | 22 | 0.5 | 0.8 | 3 | 57.140928 | 0.400483 |
target_predicted = search.predict(data_test)
print(
"Mean absolute error after tuning of the bagging regressor:\n"
f"{mean_absolute_error(target_test, target_predicted):.2f} k$"
)
Mean absolute error after tuning of the bagging regressor:
40.61 k$
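Note that search.predict uses search.best_estimator_, which with the default refit=True has been refit on the full training set using the best parameter combination. A small sketch making that equivalent call explicit:
# With refit=True (the default), best_estimator_ is the model refit on the
# whole training set; predicting with it matches search.predict.
best_model = search.best_estimator_
refit_error = mean_absolute_error(target_test, best_model.predict(data_test))
print(f"MAE using best_estimator_ directly: {refit_error:.2f} k$")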
We see that the bagging regressor performs well almost out of the box: it does not require much hyperparameter tuning, in contrast to a single decision tree.