📝 Exercise M4.01
The aim of this exercise is two-fold:
- understand the parametrization of a linear model;
- quantify the fitting accuracy of a set of such models.
We will reuse part of the course code to:
- load the data;
- create the function representing a linear model.
Prerequisites
Data loading
Note
If you want a deeper overview of this dataset, refer to the Appendix - Datasets description section at the end of this MOOC.
import pandas as pd

penguins = pd.read_csv("../datasets/penguins_regression.csv")
feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"
# Double brackets keep data as a one-column DataFrame; target is a Series.
data, target = penguins[[feature_name]], penguins[target_name]
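The split into data and target relies on a pandas detail that matters later in the exercise: indexing with a list of column names returns a two-dimensional DataFrame, while indexing with a single name returns a one-dimensional Series. The sketch below illustrates this on a few hand-typed values that merely stand in for the penguins CSV; they are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical toy values standing in for the penguins CSV (not the real data).
penguins = pd.DataFrame(
    {
        "Flipper Length (mm)": [180.0, 190.0, 200.0],
        "Body Mass (g)": [3700.0, 3900.0, 4100.0],
    }
)
feature_name = "Flipper Length (mm)"
target_name = "Body Mass (g)"

# A list of column names returns a DataFrame (2D); a single name returns a Series (1D).
data, target = penguins[[feature_name]], penguins[target_name]

print(type(data).__name__, data.shape)      # DataFrame (3, 1)
print(type(target).__name__, target.shape)  # Series (3,)
```

Keeping data two-dimensional is the usual convention for a feature matrix, even when there is a single feature.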
Model definition
def linear_model_flipper_mass(
    flipper_length, weight_flipper_length, intercept_body_mass
):
    """Linear model of the form y = a * x + b"""
    body_mass = weight_flipper_length * flipper_length + intercept_body_mass
    return body_mass
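To see how the two parameters act, here is a small sketch that evaluates the model by hand. weight_flipper_length is the slope (the extra body mass, in grams, predicted per additional millimetre of flipper), and intercept_body_mass shifts the whole line up or down. The values used below (45 g/mm and -5000 g) are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import numpy as np


def linear_model_flipper_mass(
    flipper_length, weight_flipper_length, intercept_body_mass
):
    """Linear model of the form y = a * x + b"""
    body_mass = weight_flipper_length * flipper_length + intercept_body_mass
    return body_mass


# Hypothetical parameters: 45 g per mm of flipper, intercept of -5000 g.
weight, intercept = 45.0, -5000.0

# A scalar input gives a scalar prediction: 45 * 200 - 5000 = 4000.
print(linear_model_flipper_mass(200.0, weight, intercept))  # 4000.0

# An array input is handled element-wise thanks to NumPy broadcasting.
flipper_lengths = np.array([180.0, 200.0, 220.0])
print(linear_model_flipper_mass(flipper_lengths, weight, intercept))
# [3100. 4000. 4900.]
```

Because the function only uses * and +, it works unchanged on scalars, NumPy arrays and pandas objects.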
Main exercise
Define a vector weights = [...] and a vector intercepts = [...] of the same length. Each pair of entries (weights[i], intercepts[i]) defines a different model. Use these vectors, together with the vector flipper_length_range, to plot several linear models that could plausibly fit our data. Use the linear_model_flipper_mass helper function defined above to compute each model's predictions, and plot them together with the real samples.
import numpy as np
flipper_length_range = np.linspace(data.min(), data.max(), num=300)
# Write your code here.
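One possible way to set up the vectors is sketched below. The weights and intercepts values are hypothetical, and flipper_length_range is replaced by a stand-in range from 170 mm to 230 mm so the snippet runs without the dataset. Each row of predictions is one model evaluated along the range; in the exercise you would then draw each row against flipper_length_range (for example with matplotlib's plt.plot) on top of a scatter plot of data versus target.

```python
import numpy as np


def linear_model_flipper_mass(
    flipper_length, weight_flipper_length, intercept_body_mass
):
    """Linear model of the form y = a * x + b"""
    return weight_flipper_length * flipper_length + intercept_body_mass


# Hypothetical parameter choices; each (weight, intercept) pair is one model.
weights = [-40.0, 45.0, 25.0]
intercepts = [12000.0, -5000.0, 0.0]

# Stand-in for the flipper length range (the exercise builds it from the data).
flipper_length_range = np.linspace(170, 230, num=300)

# One row of predicted body masses per model: shape (n_models, n_points).
predictions = np.array(
    [
        linear_model_flipper_mass(flipper_length_range, weight, intercept)
        for weight, intercept in zip(weights, intercepts)
    ]
)
print(predictions.shape)  # (3, 300)
```

Pairing the two lists with zip makes it impossible to mix up which intercept belongs to which weight, as long as both lists have the same length.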
In the previous question, you were asked to create several linear models. The visualization allowed you to assess qualitatively whether one model was better than another.
Now, come up with a quantitative measure that indicates the goodness of fit of each linear model and allows you to select the best model.
Define a function goodness_fit_measure(true_values, predictions) that takes the true target values and the predictions as inputs and returns a single scalar.
# Write your code here.
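A minimal sketch, assuming the mean absolute error (MAE) as the measure; the mean squared error, which squares each error instead of taking its absolute value, would be an equally valid choice. Taking the absolute value (or the square) matters because otherwise positive and negative errors would cancel out. The call to np.ravel is a design choice: the loop in this exercise passes data, a one-column DataFrame, to the model, so the predictions come back as a DataFrame while target is a Series, and subtracting one from the other directly would make pandas align on labels rather than subtract element-wise. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def goodness_fit_measure(true_values, predictions):
    """Mean absolute error between targets and predictions (lower is better)."""
    # np.ravel flattens both inputs to 1D arrays, so a (n, 1) DataFrame of
    # predictions lines up element-wise with a (n,) Series of targets.
    errors = np.ravel(true_values) - np.ravel(predictions)
    # Absolute values stop positive and negative errors from cancelling out.
    return np.mean(np.abs(errors))


# Hypothetical toy values, shaped like target (a Series) and a model's
# prediction on data (a one-column DataFrame).
true_values = pd.Series([3700.0, 3900.0, 4100.0])
predictions = pd.DataFrame({"Flipper Length (mm)": [3600.0, 4000.0, 4400.0]})
print(goodness_fit_measure(true_values, predictions))  # about 166.67
```

Here the absolute errors are 100, 100 and 300 g, so the MAE is 500 / 3 g.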
You can now copy and paste the code below to show the goodness of fit for each model.
for model_idx, (weight, intercept) in enumerate(zip(weights, intercepts)):
    target_predicted = linear_model_flipper_mass(data, weight, intercept)
    print(f"Model #{model_idx}:")
    print(f"{weight:.2f} (g / mm) * flipper length + {intercept:.2f} (g)")
    print(f"Error: {goodness_fit_measure(target, target_predicted):.3f}\n")
# Write your code here.
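Once every model has an error, selecting the best one amounts to taking the model with the smallest value, assuming a lower-is-better measure such as the MAE. The sketch below does this with np.argmin on toy data generated from a known line; the samples, weights and intercepts are all hypothetical, chosen so that the exact model should win.

```python
import numpy as np


def goodness_fit_measure(true_values, predictions):
    """Mean absolute error (lower is better)."""
    return np.mean(np.abs(np.ravel(true_values) - np.ravel(predictions)))


def linear_model_flipper_mass(
    flipper_length, weight_flipper_length, intercept_body_mass
):
    """Linear model of the form y = a * x + b"""
    return weight_flipper_length * flipper_length + intercept_body_mass


# Hypothetical toy samples generated from y = 50 * x - 6000 (not the real data).
data = np.array([180.0, 190.0, 200.0, 210.0])
target = 50.0 * data - 6000.0

# Hypothetical candidate models; the middle one matches the generating line.
weights = [40.0, 50.0, 60.0]
intercepts = [-4000.0, -6000.0, -8000.0]

# One error per model, then the index of the smallest error.
errors = [
    goodness_fit_measure(target, linear_model_flipper_mass(data, w, b))
    for w, b in zip(weights, intercepts)
]
best_idx = int(np.argmin(errors))
print(f"Best model: #{best_idx}, weight={weights[best_idx]}, intercept={intercepts[best_idx]}")
# Best model: #1, weight=50.0, intercept=-6000.0
```

On the real penguins data no line fits perfectly, so the best model will have a non-zero error; the selection logic stays the same.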