πŸ“ Exercise M5.01

πŸ“ Exercise M5.01#

In the previous notebook, we showed how a tree with 1 level depth works. The aim of this exercise is to repeat part of the previous experiment for a tree with 2 levels depth to show how such parameter affects the feature space partitioning.

We first load the penguins dataset and split it into a training and a testing sets:

import pandas as pd

penguins = pd.read_csv("../datasets/penguins_classification.csv")
culmen_columns = ["Culmen Length (mm)", "Culmen Depth (mm)"]
target_column = "Species"
/tmp/ipykernel_5693/3201094834.py:1: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd

Note

If you want a deeper overview regarding this dataset, you can refer to the Appendix - Datasets description section at the end of this MOOC.

from sklearn.model_selection import train_test_split

data, target = penguins[culmen_columns], penguins[target_column]
data_train, data_test, target_train, target_test = train_test_split(
    data, target, random_state=0
)

Create a decision tree classifier with a maximum depth of 2 levels and fit the training data.

# Write your code here.

Now plot the data and the decision boundary of the trained classifier to see the effect of increasing the depth of the tree.

Hint: Use the class DecisionBoundaryDisplay from the module sklearn.inspection as shown in previous course notebooks.

Warning

At this time, it is not possible to use response_method="predict_proba" for multiclass problems. This is a planned feature for a future version of scikit-learn. In the mean time, you can use response_method="predict" instead.

# Write your code here.

Did we make use of the feature β€œCulmen Length”? Plot the tree using the function sklearn.tree.plot_tree to find out!

# Write your code here.

Compute the accuracy of the decision tree on the testing data.

# Write your code here.