# 📝 Exercise M4.04
In the previous Module we tuned the hyperparameter `C` of the logistic regression without mentioning that it controls the regularization strength. Later, on the slides on 🎥 Intuitions on regularized linear models, we mentioned that a small `C` provides a more regularized model, whereas a non-regularized model is obtained with an infinitely large value of `C`. Indeed, `C` behaves as the inverse of the `alpha` coefficient in the `Ridge` model.
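To fix ideas on this correspondence, the short snippet below (an illustration only, not part of the exercise; the variable names are ours) shows two models that apply a comparably strong penalty: a small `C` in `LogisticRegression` plays the role of a large `alpha` in `Ridge`.

```python
from sklearn.linear_model import LogisticRegression, Ridge

# Illustration only: C roughly plays the role of 1 / alpha, so a small C for
# the classifier corresponds to a large alpha for the Ridge regressor.
strongly_regularized_classifier = LogisticRegression(C=0.01)  # small C -> strong penalty
strongly_regularized_regressor = Ridge(alpha=100)  # large alpha -> strong penalty
```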
In this exercise, we ask you to train a logistic regression classifier using different values of the parameter `C` to observe its effects for yourself.
We start by loading the dataset. We only keep the Adelie and Chinstrap classes to keep the discussion simple.
**Note**: If you want a deeper overview of this dataset, you can refer to the "Appendix - Datasets description" section at the end of this MOOC.
```python
import pandas as pd

penguins = pd.read_csv("../datasets/penguins_classification.csv")
penguins = (
    penguins.set_index("Species").loc[["Adelie", "Chinstrap"]].reset_index()
)

culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"
```
```python
from sklearn.model_selection import train_test_split

penguins_train, penguins_test = train_test_split(
    penguins, random_state=0, test_size=0.4
)

data_train = penguins_train[culmen_columns]
data_test = penguins_test[culmen_columns]
target_train = penguins_train[target_column]
target_test = penguins_test[target_column]
```
We define a function to help us fit a given model and plot its decision boundary. We recall that by using a `DecisionBoundaryDisplay` with a diverging colormap, `vmin=0` and `vmax=1`, we ensure that the 0.5 probability is mapped to the white color. Equivalently, the darker the color, the closer the predicted probability is to 0 or 1 and the more confident the classifier is in its predictions.
```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.inspection import DecisionBoundaryDisplay


def plot_decision_boundary(model):
    model.fit(data_train, target_train)
    accuracy = model.score(data_test, target_test)
    C = model.get_params()["logisticregression__C"]
    # Filled map of the predicted probability of the positive class.
    disp = DecisionBoundaryDisplay.from_estimator(
        model,
        data_train,
        response_method="predict_proba",
        plot_method="pcolormesh",
        cmap="RdBu_r",
        alpha=0.8,
        vmin=0.0,
        vmax=1.0,
    )
    # Dashed contour at probability 0.5, i.e. the decision boundary itself.
    DecisionBoundaryDisplay.from_estimator(
        model,
        data_train,
        response_method="predict_proba",
        plot_method="contour",
        linestyles="--",
        linewidths=1,
        alpha=0.8,
        levels=[0.5],
        ax=disp.ax_,
    )
    # Overlay the training points colored by class.
    sns.scatterplot(
        data=penguins_train,
        x=culmen_columns[0],
        y=culmen_columns[1],
        hue=target_column,
        palette=["tab:blue", "tab:red"],
        ax=disp.ax_,
    )
    plt.legend(bbox_to_anchor=(1.05, 0.8), loc="upper left")
    plt.title(f"C: {C} \n Accuracy on the test set: {accuracy:.2f}")
```
Let's now create our predictive model.
```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

logistic_regression = make_pipeline(StandardScaler(), LogisticRegression())
```
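As an optional quick check of the helper (not required by the exercise), you can already plot the decision boundary of this pipeline with its default regularization, `C=1.0`:

```python
# Optional sanity check: visualize the boundary of the pipeline with the default C=1.0.
plot_decision_boundary(logistic_regression)
```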
## Influence of the parameter `C` on the decision boundary
Given the following candidates for the `C` parameter and the `plot_decision_boundary` function, find out the impact of `C` on the classifier's decision boundary.

- How does the value of `C` impact the confidence of the predictions?
- How does it impact the underfit/overfit trade-off?
- How does it impact the position and orientation of the decision boundary?

Try to give an interpretation of the reason for such behavior.
```python
Cs = [1e-6, 0.01, 0.1, 1, 10, 100, 1e6]

# Write your code here.
```
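One possible sketch (only one way to approach it, assuming you reuse the `logistic_regression` pipeline defined above): loop over the candidate values, update the `C` parameter of the pipeline, and call `plot_decision_boundary` each time.

```python
# A possible sketch: one decision-boundary plot per candidate value of C.
# "logisticregression__C" is the parameter name generated by make_pipeline.
for C in Cs:
    logistic_regression.set_params(logisticregression__C=C)
    plot_decision_boundary(logistic_regression)
```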
## Impact of the regularization on the weights
Look at the impact of the `C` hyperparameter on the magnitude of the weights.

Hint: You can access pipeline steps by name or position. Then you can query the attributes of that step, such as `coef_`.
```python
# Write your code here.
```
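A possible sketch for this part (one way among others, reusing the pipeline and `Cs` defined above): refit the model for each value of `C`, extract the coefficients of the `LogisticRegression` step, and compare their magnitudes, for instance with a bar plot.

```python
import pandas as pd

# A possible sketch: collect the fitted weights for each value of C.
weights = {}
for C in Cs:
    logistic_regression.set_params(logisticregression__C=C)
    logistic_regression.fit(data_train, target_train)
    # The last step of the pipeline is the LogisticRegression estimator.
    weights[C] = pd.Series(logistic_regression[-1].coef_[0], index=culmen_columns)

ax = pd.DataFrame(weights).plot.barh()
ax.set_xlabel("Coefficient value")
ax.set_title("Logistic regression weights for different values of C")
```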
## Impact of the regularization with non-linear feature engineering
Use the `plot_decision_boundary` function to repeat the experiment using a non-linear feature engineering pipeline. For this purpose, insert `Nystroem(kernel="rbf", gamma=1, n_components=100)` between the `StandardScaler` and the `LogisticRegression` steps.
- Does the value of `C` still impact the position of the decision boundary and the confidence of the model?
- What can you say about the impact of `C` on the underfitting vs overfitting trade-off?
```python
from sklearn.kernel_approximation import Nystroem

# Write your code here.
```
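A possible sketch (again, only one way to do it): build the expanded pipeline and reuse the same loop as before. The `LogisticRegression` step is still named `logisticregression` by `make_pipeline`, so `plot_decision_boundary` keeps working; `max_iter` is raised here only as a precaution against convergence warnings with the larger feature space, which is an assumption on our part.

```python
# A possible sketch: insert a Nystroem kernel approximation between the scaler
# and the classifier, then repeat the experiment over the candidate values of C.
nystroem_regression = make_pipeline(
    StandardScaler(),
    Nystroem(kernel="rbf", gamma=1, n_components=100),
    LogisticRegression(max_iter=1000),
)

for C in Cs:
    nystroem_regression.set_params(logisticregression__C=C)
    plot_decision_boundary(nystroem_regression)
```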