Veeg om het menu te tonen

Challenge: Classifying Inseparable Data

You will use the following dataset with two features:


              1234
            
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
print(df.head())

If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:


              123456
            
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
plt.scatter(df['X1'], df['X2'], c=df['y'])
plt.show()

Let's use cross-validation to evaluate a simple logistic regression on this data:


              123456789101112131415161718
            
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']

X = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(X, y)

y_pred = lr.predict(X)
plt.scatter(df['X1'], df['X2'], c=y_pred)
plt.show()

print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')

As you can see, regular Logistic Regression is not suited for this task. Using polynomial regression may help improve the model's performance. Additionally, employing GridSearchCV allows you to find the optimal C parameter for better accuracy.

This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method sequentially applies .fit_transform() to each step in the pipeline.

Taak

Swipe to start coding

You are given a dataset described as a DataFrame in the df variable.

Create a pipeline that will hold the polynomial features of degree 2 of X and be scaled and store the resulting pipeline in the pipe variable.
Create a param_grid dictionary to with values [0.01, 0.1, 1, 10, 100] of the C hyperparameter.
Initialize and train a GridSearchCV object and store the trained object in the grid_cv variable.

Oplossing

Schakel over naar desktop voor praktijkervaringGa verder vanaf waar je bent met een van de onderstaande opties

Was alles duidelijk?

Bedankt voor je feedback!

Sectie 2. Hoofdstuk 6

single

Vraag AI

Vraag wat u wilt of probeer een van de voorgestelde vragen om onze chat te starten.

Challenge: Classifying Inseparable Data

You will use the following dataset with two features:


              1234
            
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
print(df.head())

If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:


              123456
            
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
plt.scatter(df['X1'], df['X2'], c=df['y'])
plt.show()

Let's use cross-validation to evaluate a simple logistic regression on this data:


              123456789101112131415161718
            
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']

X = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(X, y)

y_pred = lr.predict(X)
plt.scatter(df['X1'], df['X2'], c=y_pred)
plt.show()

print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')

Taak

Swipe to start coding

You are given a dataset described as a DataFrame in the df variable.

Create a pipeline that will hold the polynomial features of degree 2 of X and be scaled and store the resulting pipeline in the pipe variable.
Create a param_grid dictionary to with values [0.01, 0.1, 1, 10, 100] of the C hyperparameter.
Initialize and train a GridSearchCV object and store the trained object in the grid_cv variable.

Oplossing

Schakel over naar desktop voor praktijkervaringGa verder vanaf waar je bent met een van de onderstaande opties

Was alles duidelijk?

Bedankt voor je feedback!

Veeg om het menu te tonen

Challenge: Classifying Inseparable Data

Oplossing

Awesome!

Challenge: Classifying Inseparable Data

Oplossing

Awesome!