# Solution for Exercise M3.02

The goal is to find the best set of hyperparameters, that is, the one that maximizes the generalization performance as estimated by cross-validation on the training set.

```
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
data, target = fetch_california_housing(return_X_y=True, as_frame=True)
target *= 100 # rescale the target in k$
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42
)
```

In this exercise, we will progressively define the regression pipeline and later tune its hyperparameters.

Start by defining a pipeline that:

- uses a `StandardScaler` to normalize the numerical data;
- uses a `sklearn.neighbors.KNeighborsRegressor` as a predictive model.

```
# solution
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsRegressor
scaler = StandardScaler()
model = make_pipeline(scaler, KNeighborsRegressor())
```
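`make_pipeline` names each step after its class name in lowercase. These names become the prefixes of the parameter names tuned in the search below, following scikit-learn's `<step>__<parameter>` convention. The short sketch below checks this on a fresh copy of the same pipeline and shows that `set_params` uses the same naming scheme to reach inside a step.

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

model = make_pipeline(StandardScaler(), KNeighborsRegressor())

# make_pipeline names each step after its class, lowercased.
step_names = list(model.named_steps)
print(step_names)  # ['standardscaler', 'kneighborsregressor']

# Parameters of a step are exposed as "<step name>__<parameter name>".
tunable = [name for name in model.get_params() if "__" in name]
print("kneighborsregressor__n_neighbors" in tunable)  # True

# set_params uses the same naming scheme to reach into a given step.
model.set_params(standardscaler__with_mean=False)
print(model.named_steps["standardscaler"].with_mean)  # False
```

The hyperparameter search does the same thing under the hood: for each candidate, it calls `set_params` with these prefixed names on a clone of the pipeline.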

Use `RandomizedSearchCV` with `n_iter=20` to find the best set of hyperparameters by tuning the following parameters of the `model`:

- the parameter `n_neighbors` of the `KNeighborsRegressor` with values `np.logspace(0, 3, num=10).astype(np.int32)`;
- the parameter `with_mean` of the `StandardScaler` with possible values `True` or `False`;
- the parameter `with_std` of the `StandardScaler` with possible values `True` or `False`.
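It helps to look at the actual candidate values. `np.logspace(0, 3, num=10)` produces 10 points spaced evenly on a log scale between 10**0 and 10**3. `astype(np.int32)` then truncates them to integers. The sketch below prints these values and counts the combinations on the full grid with `itertools.product`.

```python
import itertools

import numpy as np

# 10 log-spaced points between 1 and 1000, truncated (not rounded) to integers.
n_neighbors_values = np.logspace(0, 3, num=10).astype(np.int32)
print(n_neighbors_values.tolist())  # [1, 2, 4, 10, 21, 46, 100, 215, 464, 1000]

with_mean_values = [True, False]
with_std_values = [True, False]

# Every combination on the grid: 10 x 2 x 2 candidates.
grid = list(itertools.product(n_neighbors_values, with_mean_values, with_std_values))
print(len(grid))  # 40
```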

Notice that in the notebook “Hyperparameter tuning by randomized-search” we pass distributions to be sampled by the `RandomizedSearchCV`. In this case, we define a fixed grid of hyperparameters to be explored. Using a `GridSearchCV` instead would explore all the possible combinations on the grid, which can be costly to compute for large grids, whereas the parameter `n_iter` of the `RandomizedSearchCV` controls the number of different random combinations that are evaluated. Notice that setting `n_iter` larger than the number of possible combinations in the grid (in this case 10 x 2 x 2 = 40) would not explore anything new. When all the parameters are given as lists, as here, scikit-learn samples the candidates without replacement and caps the number of iterations at the grid size, with a warning.
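In recent scikit-learn versions, when every parameter is given as a list (no object with an `rvs` method), the candidates are drawn from the grid without replacement. You can check this with `ParameterSampler`, the helper that `RandomizedSearchCV` uses to draw its candidates. The sketch below requests 100 candidates from the 40-point grid. The exact behaviour (number of candidates, warning text) may depend on the installed scikit-learn version.

```python
import warnings

import numpy as np
from sklearn.model_selection import ParameterSampler

param_distributions = {
    "kneighborsregressor__n_neighbors": np.logspace(0, 3, num=10).astype(np.int32),
    "standardscaler__with_mean": [True, False],
    "standardscaler__with_std": [True, False],
}

# Ask for more candidates than there are points on the grid.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    candidates = list(ParameterSampler(param_distributions, n_iter=100, random_state=0))

# Each candidate is a dict; turn it into a hashable tuple to count distinct ones.
distinct = {tuple(sorted(candidate.items())) for candidate in candidates}
print(len(candidates), len(distinct))  # 40 40

# The sampler reports that it ran fewer iterations than requested.
for warning in caught:
    print(warning.category.__name__, warning.message)
```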

Once the computation has completed, print the best combination of parameters stored in the `best_params_` attribute.

```
# solution
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
param_distributions = {
    "kneighborsregressor__n_neighbors": np.logspace(0, 3, num=10).astype(
        np.int32
    ),
    "standardscaler__with_mean": [True, False],
    "standardscaler__with_std": [True, False],
}
model_random_search = RandomizedSearchCV(
    model,
    param_distributions=param_distributions,
    n_iter=20,
    n_jobs=2,
    verbose=1,
    random_state=1,
)
model_random_search.fit(data_train, target_train)
model_random_search.best_params_
```

```
Fitting 5 folds for each of 20 candidates, totalling 100 fits
```

```
{'standardscaler__with_std': True,
'standardscaler__with_mean': False,
'kneighborsregressor__n_neighbors': 10}
```

So the best hyperparameters give a model where the features are scaled but not centered.
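Besides `best_params_`, the search object stores the mean cross-validated score of the best candidate in `best_score_`. Because `refit=True` by default, the best pipeline is also refitted on the whole training set, so it can be checked on held-out test data. The sketch below is only an illustration. To keep it self-contained and offline, it assumes a synthetic dataset from `make_regression` instead of California housing. It also narrows the `n_neighbors` range, because this dataset is much smaller. Its scores are therefore not those of the exercise.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the California housing data (assumption for this sketch).
data, target = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=42
)

model = make_pipeline(StandardScaler(), KNeighborsRegressor())
param_distributions = {
    # Smaller range than in the exercise: each CV fold has only ~300 training samples.
    "kneighborsregressor__n_neighbors": np.logspace(0, 2, num=5).astype(np.int32),
    "standardscaler__with_mean": [True, False],
    "standardscaler__with_std": [True, False],
}
search = RandomizedSearchCV(
    model, param_distributions=param_distributions, n_iter=10, random_state=1
)
search.fit(data_train, target_train)

best_params = search.best_params_
cv_score = search.best_score_  # mean cross-validated R2 of the best candidate
test_score = search.score(data_test, target_test)  # R2 of the refitted best pipeline
print(best_params)
print(f"CV R2: {cv_score:.3f}, test R2: {test_score:.3f}")
```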

Getting the best parameter combination is the main outcome of the hyperparameter optimization procedure. However, it is also interesting to assess how sensitive the best models are to the choice of those parameters. The following code, which is not required to answer the quiz question, shows how to conduct such an interactive analysis for this pipeline with a parallel coordinate plot from the `plotly` library.

We could use `cv_results = model_random_search.cv_results_` to make a parallel coordinate plot as we did in the previous notebook (you are more than welcome to try!).

```
import pandas as pd
cv_results = pd.DataFrame(model_random_search.cv_results_)
```

To simplify the axes of the plot, we rename the columns of the dataframe and keep only the mean test score and the hyperparameter values.

```
column_name_mapping = {
    "param_kneighborsregressor__n_neighbors": "n_neighbors",
    "param_standardscaler__with_mean": "centering",
    "param_standardscaler__with_std": "scaling",
    "mean_test_score": "mean test score",
}
cv_results = cv_results.rename(columns=column_name_mapping)
cv_results = cv_results[column_name_mapping.values()].sort_values(
    "mean test score", ascending=False
)
```

In addition, the parallel coordinate plot from `plotly` expects all data to be numeric. Thus, we convert the boolean indicators of whether the data were centered or scaled into integers, where `True` is mapped to 1 and `False` to 0. As `n_neighbors` has `dtype=object`, we also convert it explicitly to an integer.
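To see what this conversion does, here is a toy sketch. The two-row dataframe is hypothetical: its columns are built with `dtype=object` to mimic the object-typed hyperparameter columns described above. `astype(np.int64)` maps the booleans to 0/1 and turns the object-typed integers into a proper integer dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt with object-dtype columns, like the "param_*" columns.
excerpt = pd.DataFrame(
    {
        "n_neighbors": pd.Series([10, 2], dtype=object),
        "centering": pd.Series([False, True], dtype=object),
        "scaling": pd.Series([True, True], dtype=object),
    }
)
print(excerpt.dtypes)  # every column is object

# Booleans become 0/1 and all columns get a numeric dtype usable by plotly.
numeric = excerpt.astype(np.int64)
print(numeric.dtypes)  # every column is int64
print(numeric["centering"].tolist())  # [0, 1]
```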

```
column_scaler = ["centering", "scaling"]
cv_results[column_scaler] = cv_results[column_scaler].astype(np.int64)
cv_results["n_neighbors"] = cv_results["n_neighbors"].astype(np.int64)
cv_results
```

| | n_neighbors | centering | scaling | mean test score |
|---|---|---|---|---|
| 17 | 10 | 0 | 1 | 0.687926 |
| 18 | 4 | 0 | 1 | 0.674812 |
| 6 | 46 | 0 | 1 | 0.668778 |
| 9 | 100 | 0 | 1 | 0.648317 |
| 16 | 2 | 1 | 1 | 0.629772 |
| 15 | 215 | 1 | 1 | 0.617295 |
| 12 | 215 | 0 | 1 | 0.617295 |
| 10 | 464 | 1 | 1 | 0.567164 |
| 0 | 1 | 0 | 1 | 0.508809 |
| 13 | 1000 | 1 | 1 | 0.486503 |
| 8 | 21 | 0 | 0 | 0.103390 |
| 11 | 21 | 1 | 0 | 0.103390 |
| 3 | 46 | 1 | 0 | 0.061394 |
| 4 | 100 | 0 | 0 | 0.033122 |
| 1 | 215 | 0 | 0 | 0.017583 |
| 5 | 215 | 1 | 0 | 0.017583 |
| 14 | 464 | 1 | 0 | 0.007987 |
| 19 | 464 | 0 | 0 | 0.007987 |
| 7 | 1000 | 0 | 0 | 0.002900 |
| 2 | 1 | 0 | 0 | -0.238830 |
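Before plotting, you can also ask the same sensitivity question numerically. In the sketch below, the 20 rows of the table above are re-entered by hand, so the snippet runs without re-fitting the search. It then compares the range of mean test scores with and without scaling. With the real `cv_results` dataframe, the same `groupby` call applies directly.

```python
import pandas as pd

# Rows copied from the results table above: (n_neighbors, centering, scaling, score).
rows = [
    (10, 0, 1, 0.687926), (4, 0, 1, 0.674812), (46, 0, 1, 0.668778),
    (100, 0, 1, 0.648317), (2, 1, 1, 0.629772), (215, 1, 1, 0.617295),
    (215, 0, 1, 0.617295), (464, 1, 1, 0.567164), (1, 0, 1, 0.508809),
    (1000, 1, 1, 0.486503), (21, 0, 0, 0.103390), (21, 1, 0, 0.103390),
    (46, 1, 0, 0.061394), (100, 0, 0, 0.033122), (215, 0, 0, 0.017583),
    (215, 1, 0, 0.017583), (464, 1, 0, 0.007987), (464, 0, 0, 0.007987),
    (1000, 0, 0, 0.002900), (1, 0, 0, -0.238830),
]
results = pd.DataFrame(
    rows, columns=["n_neighbors", "centering", "scaling", "mean test score"]
)

# Range of scores for each value of the scaling flag.
summary = results.groupby("scaling")["mean test score"].agg(["min", "max"])
print(summary)
```

On these rows, every scaled candidate scores at least 0.48, while no unscaled candidate exceeds about 0.10. This matches the observation that scaling is the hyperparameter that matters most here.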

```
import plotly.express as px
fig = px.parallel_coordinates(
    cv_results,
    color="mean test score",
    dimensions=["n_neighbors", "centering", "scaling", "mean test score"],
    color_continuous_scale=px.colors.diverging.Tealrose,
)
fig.show()
```