Challenge: Classifying Inseparable Data
You will use the following dataset with two features:
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
print(df.head())
If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
plt.scatter(df['X1'], df['X2'], c=df['y'])
plt.show()
Let's use cross-validation to evaluate a simple logistic regression on this data:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']
X = StandardScaler().fit_transform(X)  # scale the features

# Train logistic regression and plot its predictions
lr = LogisticRegression().fit(X, y)
y_pred = lr.predict(X)
plt.scatter(df['X1'], df['X2'], c=y_pred)
plt.show()

print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')
As you can see, plain logistic regression is not suited for this task. Adding polynomial features may help improve the model's performance. Additionally, GridSearchCV lets you find the optimal value of the C parameter for better accuracy.
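For reference, here is a minimal sketch of how GridSearchCV searches over C for a logistic regression. The grid values here are illustrative only and are not the ones used in the task below:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']

# Illustrative grid of C values (not the grid used in the task below)
param_grid = {'C': [0.1, 1, 10]}
grid = GridSearchCV(LogisticRegression(), param_grid)
grid.fit(X, y)  # runs cross-validation for each C and refits on the best one
print(grid.best_params_, grid.best_score_)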
This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps: calling its .fit_transform() method applies each step's .fit_transform() in order.
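Here is a small, illustrative sketch of a two-step Pipeline (the step names are arbitrary labels chosen for this example):

import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]

# A two-step preprocessing pipeline: polynomial features first, then scaling
preprocessing = Pipeline([
    ('poly', PolynomialFeatures(2)),
    ('scaler', StandardScaler()),
])
X_transformed = preprocessing.fit_transform(X)  # applies each step's .fit_transform() in order
print(X_transformed.shape)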
Swipe to start coding
You are given a dataset stored as a DataFrame in the df variable.
- Create a pipeline that builds degree-2 polynomial features of X and scales them, and store the resulting pipeline in the pipe variable.
- Create a param_grid dictionary with the values [0.01, 0.1, 1, 10, 100] for the C hyperparameter.
- Initialize and train a GridSearchCV object, and store the trained object in the grid_cv variable.