📝 Exercise M1.02

The goal of this exercise is to fit a model similar to the one in the previous notebook, in order to get familiar with manipulating scikit-learn objects and, in particular, the .fit/.predict/.score API.
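
As a reminder of what the three methods do, here is a minimal sketch on a hypothetical toy dataset (the arrays X_toy and y_toy are made up for illustration and are unrelated to the census data). It assumes the default KNeighborsClassifier settings and is only meant to show the calling pattern, not a solution to the exercise.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: one numerical feature, two well-separated classes.
X_toy = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y_toy = np.array(["low", "low", "low", "high", "high", "high"])

toy_model = KNeighborsClassifier()

# fit: learn from the features and the matching labels.
toy_model.fit(X_toy, y_toy)

# predict: return one predicted label per row of the input.
toy_predictions = toy_model.predict(np.array([[1.5], [10.5]]))

# score: for a classifier, the mean accuracy on the given data and labels.
toy_accuracy = toy_model.score(X_toy, y_toy)

print(toy_predictions, toy_accuracy)
```

The same three calls apply unchanged to a pandas dataframe and series such as the data and target variables used in this exercise.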

Let’s load the adult census dataset, keeping only the numerical variables:

import pandas as pd

adult_census = pd.read_csv("../datasets/adult-census-numeric.csv")
data = adult_census.drop(columns="class")
target = adult_census["class"]

In the previous notebook we used model = KNeighborsClassifier(). All scikit-learn models can be created without arguments. This is convenient because it means that you don’t need to understand the full details of a model before starting to use it.

One of the KNeighborsClassifier parameters is n_neighbors. It controls the number of neighbors we are going to use to make a prediction for a new data point.

What is the default value of the n_neighbors parameter?

Hint: Look at the documentation on the scikit-learn website or directly access the description inside your notebook by running the following cell. This opens a pager pointing to the documentation.

from sklearn.neighbors import KNeighborsClassifier

KNeighborsClassifier?
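
The `?` syntax only works in IPython and Jupyter. As a possible alternative that also works in a plain Python script, the sketch below reads the default hyperparameters programmatically with get_params, a method available on every scikit-learn estimator. The variable name default_params is just illustrative.

```python
from sklearn.neighbors import KNeighborsClassifier

# An estimator created without arguments holds the default hyperparameters.
default_params = KNeighborsClassifier().get_params()

# get_params returns a dict mapping parameter names to their values.
print(default_params["n_neighbors"])
```

Calling help(KNeighborsClassifier) is another option; it prints the same documentation as the pager.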

Create a KNeighborsClassifier model with n_neighbors=50.

# Write your code here.

Fit this model on the data and target loaded above.

# Write your code here.

Use your model to make predictions on the first 10 data points in data. Do they match the actual target values?

# Write your code here.

Compute the accuracy on the training data.

# Write your code here.
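
For reference, accuracy is the fraction of predictions that equal the true labels. The sketch below illustrates this on hypothetical label arrays (y_true and y_pred are made up and do not come from the census data) and checks the manual computation against sklearn.metrics.accuracy_score.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions, for illustration only.
y_true = np.array(["a", "b", "a", "a"])
y_pred = np.array(["a", "a", "a", "b"])

# Manual computation: proportion of positions where the labels agree.
manual_accuracy = (y_true == y_pred).mean()

# Same quantity computed by scikit-learn.
library_accuracy = accuracy_score(y_true, y_pred)

print(manual_accuracy, library_accuracy)
```

For a classifier, the .score method returns this same quantity after calling .predict internally on the data it receives.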

Now load the test data from "../datasets/adult-census-numeric-test.csv" and compute the accuracy on the test data.

# Write your code here.