Exercise M4.04
In the previous module we tuned the hyperparameter C of the logistic
regression without mentioning that it controls the regularization strength.
Later, in the slides on Intuitions on regularized linear models, we
mentioned that a small C provides a more regularized model, whereas a
non-regularized model is obtained with an infinitely large value of C.
Indeed, C behaves as the inverse of the alpha coefficient in the Ridge
model.
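As a reminder of why this is the case (a sketch based on the objective functions documented for scikit-learn's l2-penalized LogisticRegression and for Ridge; exact constants can vary with the solver), C multiplies the data-fit term in logistic regression, whereas alpha multiplies the penalty term in Ridge:

$$
\min_{w, b} \; \tfrac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i (x_i^\top w + b)}\right)
\qquad \text{vs.} \qquad
\min_{w} \; \|y - Xw\|_2^2 + \alpha \|w\|_2^2
$$

Increasing C therefore gives more weight to fitting the training data (weaker regularization), exactly as decreasing alpha does in Ridge.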
In this exercise, we ask you to train a logistic regression classifier using
different values of the parameter C to assess its effects by yourself.
We start by loading the dataset. We only keep the Adelie and Chinstrap classes to keep the discussion simple.
Note
If you want a deeper overview regarding this dataset, you can refer to the Appendix - Datasets description section at the end of this MOOC.
import pandas as pd
penguins = pd.read_csv("../datasets/penguins_classification.csv")
penguins = (
    penguins.set_index("Species").loc[["Adelie", "Chinstrap"]].reset_index()
)
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"
from sklearn.model_selection import train_test_split
penguins_train, penguins_test = train_test_split(
    penguins, random_state=0, test_size=0.4
)
data_train = penguins_train[culmen_columns]
data_test = penguins_test[culmen_columns]
target_train = penguins_train[target_column]
target_test = penguins_test[target_column]
We define a function to help us fit a given model and plot its decision
boundary. We recall that, by using a DecisionBoundaryDisplay with a diverging
colormap, vmin=0 and vmax=1, we ensure that the 0.5 probability is mapped
to the white color. Equivalently, the darker the color, the closer the
predicted probability is to 0 or 1 and the more confident the classifier is in
its predictions.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.inspection import DecisionBoundaryDisplay
def plot_decision_boundary(model):
    model.fit(data_train, target_train)
    accuracy = model.score(data_test, target_test)
    C = model.get_params()["logisticregression__C"]

    disp = DecisionBoundaryDisplay.from_estimator(
        model,
        data_train,
        response_method="predict_proba",
        plot_method="pcolormesh",
        cmap="RdBu_r",
        alpha=0.8,
        vmin=0.0,
        vmax=1.0,
    )
    DecisionBoundaryDisplay.from_estimator(
        model,
        data_train,
        response_method="predict_proba",
        plot_method="contour",
        linestyles="--",
        linewidths=1,
        alpha=0.8,
        levels=[0.5],
        ax=disp.ax_,
    )
    sns.scatterplot(
        data=penguins_train,
        x=culmen_columns[0],
        y=culmen_columns[1],
        hue=target_column,
        palette=["tab:blue", "tab:red"],
        ax=disp.ax_,
    )
    plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")
    plt.title(f"C: {C} \n Accuracy on the test set: {accuracy:.2f}")
Let's now create our predictive model.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
logistic_regression = make_pipeline(StandardScaler(), LogisticRegression())
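Note that make_pipeline names each step after its lowercased class name, so the C parameter of the inner LogisticRegression is reachable as logisticregression__C (the name used by plot_decision_boundary above). For instance, one way to change it is the following (the value 0.01 is chosen arbitrarily for illustration):
# Example: update C on the nested LogisticRegression step of the pipeline.
logistic_regression.set_params(logisticregression__C=0.01)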
Influence of the parameter C on the decision boundary
Given the following candidates for the C parameter and the
plot_decision_boundary function, find out the impact of C on the
classifier's decision boundary.
- How does the value of C impact the confidence of the predictions?
- How does it impact the underfit/overfit trade-off?
- How does it impact the position and orientation of the decision boundary?

Try to give an interpretation of the reason for such behavior.
Cs = [1e-6, 0.01, 0.1, 1, 10, 100, 1e6]
# Write your code here.
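If you want a starting point, here is a minimal sketch of one possible approach (not the official solution): set each candidate value on the pipeline and reuse the plot_decision_boundary function defined above.
# Possible sketch: draw one decision boundary per candidate value of C.
for C in Cs:
    logistic_regression.set_params(logisticregression__C=C)
    plot_decision_boundary(logistic_regression)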
Impact of the regularization on the weights
Look at the impact of the C hyperparameter on the magnitude of the weights.
Hint: You can access pipeline steps by name or position. Then you can query
the attributes of that step, such as coef_.
# Write your code here.
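A possible sketch, assuming the Cs candidates defined above: fit the pipeline for each value of C, retrieve coef_ from the LogisticRegression step, and compare the magnitudes of the weights.
# Possible sketch: collect the fitted weights for each candidate value of C.
weights = {}
for C in Cs:
    logistic_regression.set_params(logisticregression__C=C)
    logistic_regression.fit(data_train, target_train)
    # The fitted LogisticRegression step can be accessed by position (or by name,
    # e.g. logistic_regression["logisticregression"]); coef_ holds its weights.
    weights[C] = logistic_regression[-1].coef_[0]
weights = pd.DataFrame(weights, index=culmen_columns).T
_ = weights.plot.barh(figsize=(8, 6))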
Impact of the regularization with non-linear feature engineering
Use the plot_decision_boundary function to repeat the experiment using a
non-linear feature engineering pipeline. For this purpose, insert
Nystroem(kernel="rbf", gamma=1, n_components=100) between the
StandardScaler and the LogisticRegression steps.
- Does the value of C still impact the position of the decision boundary and the confidence of the model?
- What can you say about the impact of C on the underfitting vs overfitting trade-off?
from sklearn.kernel_approximation import Nystroem
# Write your code here.
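A possible sketch, reusing the loop from before: build a pipeline with the extra Nystroem step (the nested parameter name for C is unchanged) and pass it to plot_decision_boundary.
# Possible sketch: non-linear feature engineering via a Nystroem kernel approximation.
nystroem_regression = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", gamma=1, n_components=100),
    LogisticRegression(),
)

for C in Cs:
    nystroem_regression.set_params(logisticregression__C=C)
    plot_decision_boundary(nystroem_regression)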