Building Pipelines with scikit-learn
When you build machine learning solutions, you often repeat the same steps: data preprocessing, feature engineering, model training, and evaluation. Writing these steps separately can lead to code duplication and make it hard to reproduce results. scikit-learn provides the Pipeline class, which lets you chain preprocessing and modeling steps together into a single, streamlined workflow. This approach makes your code cleaner, more maintainable, and easier to reproduce.
A pipeline standardizes the ML workflow and reduces code duplication.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Load sample data
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Create a pipeline with preprocessing and modeling steps
pipeline = Pipeline([
    ("scaler", StandardScaler()),           # Step 1: Standardize features
    ("classifier", LogisticRegression())    # Step 2: Train classifier
])

# Fit the pipeline on training data
pipeline.fit(X_train, y_train)

# Predict on test data
predictions = pipeline.predict(X_test)

print("Pipeline accuracy:", pipeline.score(X_test, y_test))
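Because a pipeline is simply an ordered list of named steps, you can insert additional transformers between the scaler and the classifier. The sketch below is an illustration rather than part of the lesson code: it assumes the same iris data and train/test split as above and adds a SelectKBest feature-selection step; the variable name extended_pipeline and the step labels are arbitrary choices.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Same data and split as the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Insert a feature-selection step between scaling and the classifier;
# the step names ("scaler", "select", "classifier") are just labels
extended_pipeline = Pipeline([
    ("scaler", StandardScaler()),                         # Step 1: Standardize features
    ("select", SelectKBest(score_func=f_classif, k=2)),   # Step 2: Keep the 2 most informative features
    ("classifier", LogisticRegression())                  # Step 3: Train classifier
])

extended_pipeline.fit(X_train, y_train)
print("Extended pipeline accuracy:", extended_pipeline.score(X_test, y_test))

Note that every intermediate step must be a transformer (implementing fit and transform), while only the final step needs to implement fit and predict.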