# Solution for Exercise M4.02

The goal of this exercise is to build an intuition about what the parameter values of a linear model will be when the link between the data and the target is non-linear.

First, we will generate such non-linear data.

Tip

`np.random.RandomState` creates a random number generator that can
later be used to get deterministic results.
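
As a minimal sketch of what "deterministic" means here, two generators created with the same seed produce the same sequence of draws. The names `rng_a`, `rng_b`, `draws_a` and `draws_b` are chosen only for this illustration, and the seed value is arbitrary.

```python
import numpy as np

# Two independent generators created with the same (arbitrary) seed
rng_a = np.random.RandomState(0)
rng_b = np.random.RandomState(0)

draws_a = rng_a.rand(3)
draws_b = rng_b.rand(3)

# The same seed leads to the same sequence of random numbers
same_sequence = np.array_equal(draws_a, draws_b)
print(same_sequence)
```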

```
import numpy as np
# Set the seed for reproduction
rng = np.random.RandomState(0)
# Generate data
n_sample = 100
data_max, data_min = 1.4, -1.4
len_data = (data_max - data_min)
data = rng.rand(n_sample) * len_data - len_data / 2
noise = rng.randn(n_sample) * .3
target = data ** 3 - 0.5 * data ** 2 + noise
```

Note

To ease the plotting, we will create a pandas dataframe containing the data and the target.

```
import pandas as pd
full_data = pd.DataFrame({"data": data, "target": target})
```

```
import seaborn as sns
_ = sns.scatterplot(data=full_data, x="data", y="target", color="black",
                    alpha=0.5)
```

We observe that the link between `data` and `target` is non-linear. For
instance, `data` could represent the years of experience (normalized) and
`target` the salary (normalized). Therefore, the problem here would be to
infer the salary given the years of experience.

Using the function `f` defined below, find both the `weight` and the
`intercept` that you think will lead to a good linear model. Plot both the
data and the predictions of this model.

```
def f(data, weight=0, intercept=0):
    # Linear model: one weight (slope) and one intercept
    target_predict = weight * data + intercept
    return target_predict
```

```
# solution
predictions = f(data, weight=1.2, intercept=-0.2)
```

```
ax = sns.scatterplot(data=full_data, x="data", y="target", color="black",
                     alpha=0.5)
_ = ax.plot(data, predictions)
```

Compute the mean squared error for this model.

```
# solution
from sklearn.metrics import mean_squared_error
error = mean_squared_error(target, f(data, weight=1.2, intercept=-0.2))
print(f"The MSE is {error}")
```

```
The MSE is 0.3811808390081437
```
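
As a reminder, the mean squared error is the average of the squared differences between the true targets and the predictions. A minimal sketch of the same computation done by hand with NumPy, on a toy example whose arrays `y_true` and `y_pred` are made up for illustration:

```python
import numpy as np

# Toy targets and predictions, invented for this illustration
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 5.0])

# Average of the squared residuals
residuals = y_true - y_pred
mse_manual = np.mean(residuals ** 2)
print(f"The MSE is {mse_manual}")
```

Only the last prediction is off, by 2, so the squared errors are 0, 0 and 4, and their mean is 4/3.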

Train a linear regression model on this dataset.

Warning

In scikit-learn, by convention `data` (also called `X` in the scikit-learn
documentation) should be a 2D matrix of shape `(n_samples, n_features)`.
If `data` is a 1D vector, you need to reshape it into a matrix with a
single column if the vector represents a feature or a single row if the
vector represents a sample.
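
The two reshaping options can be sketched as follows. The array `vector` and the names `as_feature` and `as_sample` are placeholders chosen for this illustration.

```python
import numpy as np

vector = np.arange(5)  # 1D vector of shape (5,)

# One feature observed on 5 samples: a single column
as_feature = vector.reshape(-1, 1)

# One sample described by 5 features: a single row
as_sample = vector.reshape(1, -1)

print(vector.shape, as_feature.shape, as_sample.shape)
```

Passing `-1` lets NumPy infer that dimension from the length of the vector.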

```
from sklearn.linear_model import LinearRegression
# solution
linear_regression = LinearRegression()
data_2d = data.reshape(-1, 1)
linear_regression.fit(data_2d, target)
```

```
LinearRegression()
```

Compute predictions from the linear regression model and plot both the data and the predictions.

```
# solution
predictions = linear_regression.predict(data_2d)
```

```
ax = sns.scatterplot(data=full_data, x="data", y="target", color="black",
                     alpha=0.5)
_ = ax.plot(data, predictions)
```

Compute the mean squared error.

```
# solution
error = mean_squared_error(target, predictions)
print(f"The MSE is {error}")
```

```
The MSE is 0.37117544002508424
```
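
The fitted model reaches a slightly lower error than the hand-picked one. This is expected: `LinearRegression` solves an ordinary least-squares problem, so no other weight and intercept can give a lower MSE on the training data. To connect this back to the goal of the exercise, the sketch below regenerates the same dataset and reads the learned parameters from the `coef_` and `intercept_` attributes. The names `fitted_weight`, `fitted_intercept`, `mse_manual` and `mse_fitted` are introduced only for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Regenerate the dataset exactly as at the start of this notebook
rng = np.random.RandomState(0)
n_sample = 100
data_max, data_min = 1.4, -1.4
len_data = data_max - data_min
data = rng.rand(n_sample) * len_data - len_data / 2
noise = rng.randn(n_sample) * 0.3
target = data ** 3 - 0.5 * data ** 2 + noise

# Fit the linear model on a single-column matrix
data_2d = data.reshape(-1, 1)
linear_regression = LinearRegression().fit(data_2d, target)

# Learned parameters: one weight per feature, plus the intercept
fitted_weight = linear_regression.coef_[0]
fitted_intercept = linear_regression.intercept_
print(f"weight: {fitted_weight:.2f}, intercept: {fitted_intercept:.2f}")

# Compare the hand-picked parameters with the fitted ones
mse_manual = mean_squared_error(target, 1.2 * data - 0.2)
mse_fitted = mean_squared_error(target, linear_regression.predict(data_2d))
print(f"manual guess MSE: {mse_manual:.3f}, fitted MSE: {mse_fitted:.3f}")
```

Both errors stay well above the variance of the added noise (0.3 squared, i.e. 0.09), which suggests that a single straight line cannot capture the cubic relationship used to generate the target.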