Analysis of hyperparameter search results

In the previous notebook we showed how to implement a randomized search to tune the hyperparameters of a HistGradientBoostingClassifier on the adult_census dataset. In practice, a randomized hyperparameter search is usually run with a large number of iterations.

To avoid this computational cost while still performing a meaningful analysis, we load the results of a similar search run with 500 iterations.
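For reference, such a file could have been produced along the following lines. This is only a hedged sketch: model is assumed to be a pipeline whose final step is named "classifier" (the HistGradientBoostingClassifier), data_train and target_train are assumed to hold the training data, and the parameter distributions shown are illustrative rather than the exact ones used for the stored results.

import pandas as pd
from scipy.stats import loguniform, randint
from sklearn.model_selection import RandomizedSearchCV

# Illustrative sampling distributions (assumptions, not the original ones):
# continuous parameters drawn log-uniformly, integer ones uniformly.
param_distributions = {
    "classifier__l2_regularization": loguniform(1e-6, 1e3),
    "classifier__learning_rate": loguniform(0.001, 10),
    "classifier__max_leaf_nodes": randint(2, 256),
    "classifier__min_samples_leaf": randint(1, 100),
    "classifier__max_bins": randint(2, 256),
}
search = RandomizedSearchCV(
    model,  # assumed: a pipeline ending with ("classifier", HistGradientBoostingClassifier())
    param_distributions=param_distributions,
    n_iter=500,
    cv=5,
)
search.fit(data_train, target_train)  # assumed training data

# cv_results_ holds one entry per sampled candidate
pd.DataFrame(search.cv_results_).to_csv(
    "../figures/randomized_search_results.csv"
)

We now load these pre-computed results: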

import pandas as pd

# Load the pre-computed cross-validation results
cv_results = pd.read_csv(
    "../figures/randomized_search_results.csv", index_col=0
)
cv_results
| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | param_classifier__l2_regularization | param_classifier__learning_rate | param_classifier__max_bins | param_classifier__max_leaf_nodes | param_classifier__min_samples_leaf | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.540456 | 0.062725 | 0.052069 | 0.002661 | 2.467047 | 0.550075 | 86 | 22 | 6 | {'classifier__l2_regularization': 2.4670474863... | 0.856558 | 0.862271 | 0.857767 | 0.854491 | 0.856675 | 0.857552 | 0.002586 | 48 |
| 1 | 1.110536 | 0.033403 | 0.074142 | 0.002165 | 0.015449 | 0.001146 | 60 | 19 | 1 | {'classifier__l2_regularization': 0.0154488709... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| 2 | 1.137484 | 0.053150 | 0.092993 | 0.029005 | 1.095093 | 0.004274 | 151 | 17 | 10 | {'classifier__l2_regularization': 1.0950934559... | 0.783267 | 0.758941 | 0.776413 | 0.779143 | 0.758941 | 0.771341 | 0.010357 | 311 |
| 3 | 3.935108 | 0.202993 | 0.118105 | 0.023658 | 0.003621 | 0.001305 | 18 | 164 | 37 | {'classifier__l2_regularization': 0.0036210968... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| 4 | 0.255219 | 0.038301 | 0.056048 | 0.016736 | 0.000081 | 5.407382 | 97 | 8 | 3 | {'classifier__l2_regularization': 8.1060737427... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 0.452411 | 0.023006 | 0.055563 | 0.000846 | 0.000075 | 0.364373 | 92 | 17 | 4 | {'classifier__l2_regularization': 7.4813767874... | 0.858332 | 0.865001 | 0.862681 | 0.860360 | 0.860770 | 0.861429 | 0.002258 | 34 |
| 496 | 1.133042 | 0.014456 | 0.078186 | 0.002199 | 5.065946 | 0.001222 | 7 | 17 | 1 | {'classifier__l2_regularization': 5.0659455480... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| 497 | 0.911828 | 0.017167 | 0.076563 | 0.005130 | 2.460025 | 0.044408 | 16 | 7 | 7 | {'classifier__l2_regularization': 2.4600250010... | 0.839907 | 0.849713 | 0.846847 | 0.846028 | 0.844390 | 0.845377 | 0.003234 | 140 |
| 498 | 1.168120 | 0.121819 | 0.061283 | 0.000760 | 0.000068 | 0.287904 | 227 | 146 | 5 | {'classifier__l2_regularization': 6.7755366885... | 0.861881 | 0.865001 | 0.862408 | 0.859951 | 0.861862 | 0.862221 | 0.001623 | 33 |
| 499 | 0.823774 | 0.120686 | 0.060351 | 0.014958 | 0.445218 | 0.005112 | 19 | 8 | 19 | {'classifier__l2_regularization': 0.4452178932... | 0.764569 | 0.765902 | 0.765902 | 0.764947 | 0.765083 | 0.765281 | 0.000535 | 319 |

500 rows × 18 columns

We define a function to strip the prefixes from the hyperparameter column names, so that, for instance, param_classifier__learning_rate becomes learning_rate.

def shorten_param(param_name):
    """Remove everything up to and including the last double underscore."""
    if "__" in param_name:
        return param_name.rsplit("__", 1)[1]
    return param_name


cv_results = cv_results.rename(shorten_param, axis=1)
cv_results
| | mean_fit_time | std_fit_time | mean_score_time | std_score_time | l2_regularization | learning_rate | max_bins | max_leaf_nodes | min_samples_leaf | params | split0_test_score | split1_test_score | split2_test_score | split3_test_score | split4_test_score | mean_test_score | std_test_score | rank_test_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.540456 | 0.062725 | 0.052069 | 0.002661 | 2.467047 | 0.550075 | 86 | 22 | 6 | {'classifier__l2_regularization': 2.4670474863... | 0.856558 | 0.862271 | 0.857767 | 0.854491 | 0.856675 | 0.857552 | 0.002586 | 48 |
| 1 | 1.110536 | 0.033403 | 0.074142 | 0.002165 | 0.015449 | 0.001146 | 60 | 19 | 1 | {'classifier__l2_regularization': 0.0154488709... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| 2 | 1.137484 | 0.053150 | 0.092993 | 0.029005 | 1.095093 | 0.004274 | 151 | 17 | 10 | {'classifier__l2_regularization': 1.0950934559... | 0.783267 | 0.758941 | 0.776413 | 0.779143 | 0.758941 | 0.771341 | 0.010357 | 311 |
| 3 | 3.935108 | 0.202993 | 0.118105 | 0.023658 | 0.003621 | 0.001305 | 18 | 164 | 37 | {'classifier__l2_regularization': 0.0036210968... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| 4 | 0.255219 | 0.038301 | 0.056048 | 0.016736 | 0.000081 | 5.407382 | 97 | 8 | 3 | {'classifier__l2_regularization': 8.1060737427... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 495 | 0.452411 | 0.023006 | 0.055563 | 0.000846 | 0.000075 | 0.364373 | 92 | 17 | 4 | {'classifier__l2_regularization': 7.4813767874... | 0.858332 | 0.865001 | 0.862681 | 0.860360 | 0.860770 | 0.861429 | 0.002258 | 34 |
| 496 | 1.133042 | 0.014456 | 0.078186 | 0.002199 | 5.065946 | 0.001222 | 7 | 17 | 1 | {'classifier__l2_regularization': 5.0659455480... | 0.758974 | 0.758941 | 0.758941 | 0.758941 | 0.758941 | 0.758947 | 0.000013 | 323 |
| 497 | 0.911828 | 0.017167 | 0.076563 | 0.005130 | 2.460025 | 0.044408 | 16 | 7 | 7 | {'classifier__l2_regularization': 2.4600250010... | 0.839907 | 0.849713 | 0.846847 | 0.846028 | 0.844390 | 0.845377 | 0.003234 | 140 |
| 498 | 1.168120 | 0.121819 | 0.061283 | 0.000760 | 0.000068 | 0.287904 | 227 | 146 | 5 | {'classifier__l2_regularization': 6.7755366885... | 0.861881 | 0.865001 | 0.862408 | 0.859951 | 0.861862 | 0.862221 | 0.001623 | 33 |
| 499 | 0.823774 | 0.120686 | 0.060351 | 0.014958 | 0.445218 | 0.005112 | 19 | 8 | 19 | {'classifier__l2_regularization': 0.4452178932... | 0.764569 | 0.765902 | 0.765902 | 0.764947 | 0.765083 | 0.765281 | 0.000535 | 319 |

500 rows × 18 columns
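With the shortened column names, we can also rank the candidates numerically. As a quick sketch using only columns present in cv_results above, we sort by mean cross-validated test score and look at the best candidates:

# Show the best candidates by mean cross-validated test score
cv_results_sorted = cv_results.sort_values("mean_test_score", ascending=False)
cv_results_sorted[
    ["learning_rate", "max_leaf_nodes", "min_samples_leaf", "mean_test_score"]
].head()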

As our randomized search involves more than two hyperparameters, we cannot visualize the results with a single heatmap. We could still plot pairs of hyperparameters, but such two-dimensional projections of a multi-dimensional problem can lead to a wrong interpretation of the scores.

import numpy as np
import seaborn as sns

df = pd.DataFrame(
    {
        "max_leaf_nodes": cv_results["max_leaf_nodes"],
        "learning_rate": cv_results["learning_rate"],
        # Bin the mean test scores into 5 equal-width intervals for coloring
        "score_bin": pd.cut(
            cv_results["mean_test_score"], bins=np.linspace(0.5, 1.0, 6)
        ),
    }
)
sns.set_palette("YlGnBu_r")
ax = sns.scatterplot(
    data=df,
    x="max_leaf_nodes",
    y="learning_rate",
    hue="score_bin",
    s=50,
    color="k",
    edgecolor=None,
)
# Both hyperparameters were sampled on a log scale
ax.set_xscale("log")
ax.set_yscale("log")

_ = ax.legend(
    title="mean_test_score", loc="center left", bbox_to_anchor=(1, 0.5)
)
[Figure: scatter plot of learning_rate versus max_leaf_nodes on log-log axes, with points colored by binned mean_test_score]

In the previous plot we see that the top-performing models are located in a band of learning rate between 0.01 and 1.0, but we cannot see how the other hyperparameters interact with such values of the learning rate. Instead, we can visualize all the hyperparameters at the same time using a parallel coordinates plot.

import plotly.express as px

fig = px.parallel_coordinates(
    # The hyperparameters were sampled on a log scale, so we log-transform
    # the corresponding columns; the score itself is left unchanged.
    cv_results.apply(
        {
            "learning_rate": np.log10,
            "max_leaf_nodes": np.log2,
            "max_bins": np.log2,
            "min_samples_leaf": np.log10,
            "l2_regularization": np.log10,
            "mean_test_score": lambda x: x,
        }
    ),
    color="mean_test_score",
    color_continuous_scale=px.colors.sequential.Viridis,
)
fig.show()
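The parallel coordinates plot lets you select a range of values by dragging along each axis. We can do a similar selection numerically; the following sketch keeps the candidates whose mean test score is within 0.01 of the best one (an arbitrary margin chosen for illustration):

# Keep the candidates close to the best mean test score
threshold = cv_results["mean_test_score"].max() - 0.01
top_candidates = cv_results[cv_results["mean_test_score"] > threshold]
top_candidates[
    ["learning_rate", "max_leaf_nodes", "min_samples_leaf", "l2_regularization"]
].describe()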