Challenge: Classifying Inseparable Data | Logistic Regression
Classification with Python

Challenge: Classifying Inseparable Data

You will use the following dataset with two features:

    import pandas as pd

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
    print(df.head())

If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:

    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
    plt.scatter(df['X1'], df['X2'], c=df['y'])
    plt.show()

Let's use cross-validation to evaluate a simple logistic regression on this data:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
    X = df[['X1', 'X2']]
    y = df['y']

    # Standardize the two features before fitting
    X = StandardScaler().fit_transform(X)

    # Fit on all data and color each point by its predicted class
    lr = LogisticRegression().fit(X, y)
    y_pred = lr.predict(X)
    plt.scatter(df['X1'], df['X2'], c=y_pred)
    plt.show()

    # Mean accuracy over the default cross-validation folds
    print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')

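As you can see, plain Logistic Regression is not suited for this task: its decision boundary is linear, and this dataset is not linearly separable. Adding polynomial features may help improve the model's performance. Additionally, GridSearchCV lets you search for the value of the C parameter that gives the best cross-validation accuracy among the candidates you supply.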
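To illustrate why degree-2 features can help on ring-shaped data, here is a minimal sketch. It uses scikit-learn's make_circles as a synthetic stand-in for circles.csv (an assumption; the course file may differ in noise and scale), and include_bias=False is a choice made here, so the printed scores are illustrative only.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for circles.csv (assumption, not the course dataset)
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Baseline: a linear decision boundary on the two raw features
linear_model = make_pipeline(StandardScaler(), LogisticRegression())
linear_acc = cross_val_score(linear_model, X, y).mean()

# Degree-2 expansion adds X1^2, X1*X2 and X2^2, so a circular boundary
# becomes linear in the expanded feature space
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LogisticRegression(),
)
poly_acc = cross_val_score(poly_model, X, y).mean()

print(f'Linear features:   {linear_acc:.2f}')
print(f'Degree-2 features: {poly_acc:.2f}')
```

On data like this, the expanded model should score well above the linear baseline, because a circle of the form X1^2 + X2^2 = r^2 is a linear equation in the squared features.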

This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps. Its .fit_transform() method calls .fit_transform() on each step in turn, passing the output of one step as the input to the next.
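The chaining behaviour can be checked on a small example. The sketch below uses a tiny made-up matrix (illustrative values only) and the step names 'poly' and 'scaler', which are arbitrary labels chosen here, and compares the pipeline output with the same two transformers applied by hand.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Tiny made-up matrix with two features (illustrative values only)
X = np.array([[1.0, 2.0], [3.0, 0.5], [-1.0, 4.0], [0.0, -2.0]])

pipe = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
])
X_pipe = pipe.fit_transform(X)

# The same two steps applied by hand, one after the other
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
X_manual = StandardScaler().fit_transform(X_poly)

print(X_pipe.shape)                   # two inputs expand to five columns
print(np.allclose(X_pipe, X_manual))  # pipeline matches the manual chain
```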

Task

You are given a dataset stored as a DataFrame in the df variable.

  • Create a pipeline that computes the degree-2 polynomial features of X and then scales them, and store it in the pipe variable.
  • Create a param_grid dictionary with the values [0.01, 0.1, 1, 10, 100] for the C hyperparameter.
  • Initialize and train a GridSearchCV object, and store the trained object in the grid_cv variable.
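One possible shape for these three steps is sketched below. It is not the course's reference solution: make_circles stands in for the df DataFrame (an assumption), and the step names, include_bias=False and the X_prepared variable are choices made for this illustration.

```python
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in for the course DataFrame (assumption)
X, y = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)

# Preprocessing only: degree-2 polynomial features, then scaling
pipe = Pipeline([
    ('poly', PolynomialFeatures(degree=2, include_bias=False)),
    ('scaler', StandardScaler()),
])
X_prepared = pipe.fit_transform(X)

# Candidate values for the regularization parameter C
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# Cross-validated search over C; the best model is refit on all data
grid_cv = GridSearchCV(LogisticRegression(), param_grid)
grid_cv.fit(X_prepared, y)

print(grid_cv.best_params_)
print(f'Best cross-validation accuracy: {grid_cv.best_score_:.2f}')
```

Scaling the whole dataset before the search lets the scaler see the validation folds. A stricter variant would add the classifier as the last pipeline step and search over the step-prefixed parameter name, for example 'clf__C' for a step named 'clf'.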


Section 2. Chapter 6