Classification with Python
Challenge: Classifying Inseparable Data
You will use the following dataset with two features:
import pandas as pd

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
print(df.head())
If you run the code below and take a look at the resulting scatter plot, you'll see that the dataset is not linearly separable:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
plt.scatter(df['X1'], df['X2'], c=df['y'])
plt.show()
Let's use cross-validation to evaluate a simple logistic regression on this data:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')
X = df[['X1', 'X2']]
y = df['y']

# Scale the features and fit a plain logistic regression
X = StandardScaler().fit_transform(X)
lr = LogisticRegression().fit(X, y)

# Plot the model's predictions on the training data
y_pred = lr.predict(X)
plt.scatter(df['X1'], df['X2'], c=y_pred)
plt.show()

print(f'Cross-validation accuracy: {cross_val_score(lr, X, y).mean():.2f}')
As you can see, plain logistic regression is not suited for this task. Adding polynomial features may improve the model's performance, and GridSearchCV lets you find the optimal value of the C hyperparameter for better accuracy.
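To see why polynomial features help here, you can add the squared terms by hand and repeat the same cross-validation. The snippet below is only a minimal sketch of that idea; it reuses the column names X1, X2 and y from the dataset above, and the exact accuracy you get depends on the data:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')

# If the classes form concentric circles, membership depends on the distance
# from the center, so the squared terms X1**2 and X2**2 carry the useful signal.
X = df[['X1', 'X2']].copy()
X['X1_sq'] = X['X1'] ** 2
X['X2_sq'] = X['X2'] ** 2
X = StandardScaler().fit_transform(X)
y = df['y']

print(f'Cross-validation accuracy: {cross_val_score(LogisticRegression(), X, y).mean():.2f}')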
This task also uses the Pipeline class. You can think of it as a sequence of preprocessing steps chained together: its .fit_transform() method applies .fit_transform() of each step in order, feeding the output of one step into the next.
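As a small, self-contained illustration (not the challenge code itself), here is how a pipeline of PolynomialFeatures and StandardScaler, built with the make_pipeline helper, transforms a toy array:

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# fit_transform() first expands X with degree-2 polynomial features,
# then scales the expanded columns.
pipe = make_pipeline(PolynomialFeatures(2), StandardScaler())
print(pipe.fit_transform(X))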
Swipe to start coding
You are given a dataset stored as a DataFrame in the df variable.
- Create a pipeline that builds degree-2 polynomial features from X and then scales them, and store the resulting pipeline in the pipe variable.
- Create a param_grid dictionary with the values [0.01, 0.1, 1, 10, 100] for the C hyperparameter.
- Initialize and train a GridSearchCV object, and store the trained object in the grid_cv variable. One possible way to combine these steps is sketched below.
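The following is a minimal sketch of one way to approach these steps, not necessarily the exact solution the checker expects. It assumes the features and target live in the X1, X2 and y columns of df, and that GridSearchCV wraps a LogisticRegression trained on the pipeline's output:

import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# In the challenge df is already provided; loading it here keeps the sketch runnable.
df = pd.read_csv('https://codefinity-content-media.s3.eu-west-1.amazonaws.com/b71ff7ac-3932-41d2-a4d8-060e24b00129/circles.csv')

# Assumed column names, matching the earlier examples.
X = df[['X1', 'X2']]
y = df['y']

# Pipeline: degree-2 polynomial features followed by scaling.
pipe = make_pipeline(PolynomialFeatures(2), StandardScaler())
X_poly = pipe.fit_transform(X)

# Candidate values of the C hyperparameter.
param_grid = {'C': [0.01, 0.1, 1, 10, 100]}

# Cross-validated search over C, fit on the transformed features.
grid_cv = GridSearchCV(LogisticRegression(), param_grid)
grid_cv.fit(X_poly, y)

print(grid_cv.best_params_, grid_cv.best_score_)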