📝 Exercise M1.01

Imagine we are interested in predicting penguin species based on two of their body measurements: culmen length and culmen depth. First, we want to do some data exploration to get a feel for the data.

What are the features? What is the target?

The data is located in ../datasets/penguins_classification.csv. Load it with pandas into a DataFrame.
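As a reminder of how pandas reads CSV data, here is a minimal sketch on a made-up table rather than the penguins file. The column names (`length_mm`, `depth_mm`, `label`) and the values are hypothetical. The in-memory buffer only stands in for a file on disk, since `read_csv` accepts a path in the same way.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk: column names and values are
# made up for illustration and are not taken from the penguins dataset.
toy_csv = io.StringIO(
    "length_mm,depth_mm,label\n"
    "39.1,18.7,A\n"
    "46.5,17.9,B\n"
    "50.0,15.2,C\n"
)

# read_csv accepts a file path or a file-like object and returns a DataFrame.
toy_df = pd.read_csv(toy_csv)
print(type(toy_df).__name__, toy_df.shape)
```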

# Write your code here.

Show a few samples of the data.

How many features are numerical? How many features are categorical?
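One possible way to approach these two questions is sketched below on a hypothetical toy DataFrame (the column names and values are assumptions, not the penguins data). `head` displays the first rows, `dtypes` reports the type of each column, and `select_dtypes` splits columns by type.

```python
import pandas as pd

# Hypothetical toy table: two numerical columns and one categorical column.
toy_df = pd.DataFrame(
    {
        "length_mm": [39.1, 46.5, 50.0, 38.2],
        "depth_mm": [18.7, 17.9, 15.2, 18.1],
        "label": ["A", "B", "C", "A"],
    }
)

# Display the first three rows to get a feel for the content.
print(toy_df.head(3))

# Inspect the data type that pandas inferred for each column.
print(toy_df.dtypes)

# Separate numerical columns from all the others.
numeric_columns = toy_df.select_dtypes(include="number").columns.tolist()
other_columns = toy_df.select_dtypes(exclude="number").columns.tolist()
print(numeric_columns, other_columns)
```

Note that a column counted here may be the target rather than a feature, so the counts still need to be interpreted.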

# Write your code here.

What are the different penguin species available in the dataset, and how many samples of each species are there? Hint: select the right column and use the value_counts method.
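To illustrate what the hint refers to, the sketch below calls `value_counts` on a hypothetical Series of made-up labels. It returns one row per distinct value with its number of occurrences, sorted from most to least frequent.

```python
import pandas as pd

# Hypothetical labels, made up for illustration only.
toy_labels = pd.Series(["A", "B", "A", "C", "A", "B"], name="label")

# Count how many times each distinct value occurs.
counts = toy_labels.value_counts()
print(counts)
```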

# Write your code here.

Plot histograms for the numerical features.
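One option, among others, is the `DataFrame.hist` method, which draws one histogram per numerical column. The sketch below assumes matplotlib is installed and uses a hypothetical toy table, not the penguins data. The `bins` and `figsize` values are arbitrary choices.

```python
import pandas as pd

# Hypothetical toy table: column names and values are made up.
toy_df = pd.DataFrame(
    {
        "length_mm": [39.1, 46.5, 50.0, 38.2, 45.1, 49.3],
        "depth_mm": [18.7, 17.9, 15.2, 18.1, 17.0, 15.9],
        "label": ["A", "B", "C", "A", "B", "C"],
    }
)

# DataFrame.hist skips non-numerical columns and returns an array of Axes,
# one per histogram. In a notebook the figure is displayed automatically.
axes = toy_df.hist(bins=5, figsize=(8, 4))
```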

# Write your code here.

Show the feature distributions for each class. Hint: use seaborn.pairplot.

# Write your code here.

Looking at these distributions, how hard do you think it would be to classify the penguins using only "culmen depth" and "culmen length"?