# Exercise M4.02

The goal of this exercise is to build an intuition of what the parameter values of a linear model will be when the link between the data and the target is non-linear.

First, we will generate such non-linear data.

Tip

`np.random.RandomState` creates a random number generator that can
later be used to get deterministic results.
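As a small illustration of the tip above, the sketch below creates generators from fixed seeds and compares their draws. The variable names (`rng_a`, `rng_b`, `rng_c`) are hypothetical and are not part of the exercise.

```python
import numpy as np

# Two generators created with the same seed produce identical draws.
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)
draws_a = rng_a.rand(3)
draws_b = rng_b.rand(3)
same_seed_matches = np.array_equal(draws_a, draws_b)

# A generator created with a different seed produces different draws.
rng_c = np.random.RandomState(1)
draws_c = rng_c.rand(3)
different_seed_matches = np.array_equal(draws_a, draws_c)

print(same_seed_matches, different_seed_matches)
```

Reusing the same seed is what makes the generated dataset identical from one run to the next.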

```
import numpy as np
# Set the seed for reproduction
rng = np.random.RandomState(0)
# Generate data
n_sample = 100
data_max, data_min = 1.4, -1.4
len_data = (data_max - data_min)
data = rng.rand(n_sample) * len_data - len_data / 2
noise = rng.randn(n_sample) * .3
target = data ** 3 - 0.5 * data ** 2 + noise
```

```
import pandas as pd
import seaborn as sns
full_data = pd.DataFrame({"data": data, "target": target})
_ = sns.scatterplot(data=full_data, x="data", y="target", color="black",
                    alpha=0.5)
```

We observe that the link between `data` and `target` is non-linear. For
instance, `data` could represent the years of experience (normalized) and
`target` the salary (normalized). The problem here would then be to infer the
salary given the years of experience.

Using the function `f` defined below, find both the `weight` and the
`intercept` that you think will lead to a good linear model. Plot both the
data and the predictions of this model. Compute the mean squared error as
well.

```
def f(data, weight=0, intercept=0):
    target_predict = weight * data + intercept
    return target_predict
```
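As a reminder of what the mean squared error measures, here is a minimal sketch on made-up values. The arrays `toy_data` and `toy_target` and the chosen `weight` are assumptions for illustration only, not the exercise's data or its solution. The function `f` is repeated so that the block runs on its own.

```python
import numpy as np


def f(data, weight=0, intercept=0):
    target_predict = weight * data + intercept
    return target_predict


# Hypothetical toy values, unrelated to the dataset generated above.
toy_data = np.array([0.0, 1.0, 2.0])
toy_target = np.array([1.0, 3.0, 5.0])

# Predictions of a linear model with an arbitrary weight and intercept.
toy_predictions = f(toy_data, weight=2.0, intercept=0.0)

# Mean squared error: the average of the squared residuals.
toy_mse = np.mean((toy_target - toy_predictions) ** 2)
print(toy_mse)
```

The function `mean_squared_error` from `sklearn.metrics` computes the same quantity when given the true targets and the predictions.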

```
# Write your code here: plot both the data and the model predictions
```

```
# Write your code here: compute the mean squared error
```

Train a linear regression model and plot both the data and the predictions of the model. Compute the mean squared error with this model.

Warning

In scikit-learn, by convention `data` (also called `X` in the scikit-learn
documentation) should be a 2D matrix of shape `(n_samples, n_features)`.
If `data` is a 1D vector, you need to reshape it into a matrix with a
single column if the vector represents a feature, or a single row if the
vector represents a sample.
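The two reshaping cases described in the warning can be sketched as follows. The array `vector` is a hypothetical example, not the exercise's `data`.

```python
import numpy as np

# A 1D vector with three values: shape (3,).
vector = np.array([1.0, 2.0, 3.0])

# The vector holds one feature measured on three samples:
# reshape it into a single column of shape (n_samples, 1).
as_feature = vector.reshape(-1, 1)

# The vector holds three features of one sample:
# reshape it into a single row of shape (1, n_features).
as_sample = vector.reshape(1, -1)

print(vector.shape, as_feature.shape, as_sample.shape)
```

In both calls, `-1` asks NumPy to infer that dimension from the length of the vector.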

```
from sklearn.linear_model import LinearRegression
# Write your code here: fit a linear regression
```

```
# Write your code here: plot the data and the predictions of the linear
# regression model
```

```
# Write your code here: compute the mean squared error
```