ML Introduction with scikit-learn

Scikit-learn, often abbreviated as sklearn, is an open-source machine learning library for the Python programming language.

It provides simple and efficient tools for data analysis and modeling, including various algorithms for classification, regression, clustering, dimensionality reduction, and more.
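To illustrate that range, the sketch below fits a regression model, a clustering algorithm and a dimensionality-reduction transformer on a tiny made-up array. The data is purely illustrative and does not come from any real dataset. Each estimator is configured in its constructor and then trained with .fit().

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Made-up toy data: two well-separated groups of rows; the target is exactly y = 2 * x0 + 1
X = np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.0], [10.0, 9.9], [11.0, 10.0], [12.0, 10.1]])
y = 2 * X[:, 0] + 1

# Regression: learn a mapping from features to a continuous target
reg = LinearRegression().fit(X, y)
pred = reg.predict([[3.0, 0.0]])
print(pred)  # approximately [7.]

# Clustering: group the rows into 2 clusters without looking at y
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)

# Dimensionality reduction: project the 2 features onto 1 principal component
pca = PCA(n_components=1).fit(X)
X_reduced = pca.transform(X)
print(X_reduced.shape)  # (6, 1)
```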

Commonly used sklearn modules

sklearn.preprocessing: This module prepares and transforms data before it is used for modeling. It includes tools for scaling, normalization and encoding categorical variables (missing values are usually handled with the closely related sklearn.impute module);
sklearn.model_selection: Model selection and evaluation are essential in machine learning. This module provides tools for splitting datasets into training and testing sets, running cross-validation to assess model performance, and tuning hyperparameters;
sklearn.pipeline: Pipelines simplify the workflow of building machine learning models. They let you chain preprocessing steps and a final model into a single object that is trained with one .fit() call and used with one .predict() call;
Model-specific modules: Scikit-learn groups its algorithms into modules by model family (for example, sklearn.cluster, sklearn.tree, sklearn.svm, sklearn.linear_model, and sklearn.neural_network, which the example below uses).
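Here is a minimal sketch of sklearn.preprocessing and sklearn.model_selection working together on a hypothetical six-row array with one missing value. The missing entry is filled with SimpleImputer (from sklearn.impute), the columns are standardized with StandardScaler, and the rows are split into training and testing sets. The stratify=y argument is an added choice that keeps both classes in each part. The split happens first so that the imputer and scaler learn their statistics from the training rows only.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 6 rows, 2 features, one missing value (np.nan)
X = np.array([[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0], [6.0, 60.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Split first (2 test rows, one per class) so test rows never influence the preprocessing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=2, stratify=y, random_state=0)

imputer = SimpleImputer(strategy="mean")  # replaces NaN with the training-column mean
scaler = StandardScaler()                 # rescales each column to mean 0, std 1

# Fit on the training rows only, then apply the learned statistics to both parts
X_train_prep = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_prep = scaler.transform(imputer.transform(X_test))

print(X_train_prep.shape, X_test_prep.shape)  # (4, 2) (2, 2)
```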

Example

sklearn lets us build and train complex machine learning models, together with all their preprocessing steps, through a fairly simple interface. As a result, we can focus on solving business problems rather than on implementing algorithms and their training procedures.
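One way to see this is sklearn.pipeline. The sketch below bundles scaling and a classifier into a single estimator and scores it with 5-fold cross-validation on the Iris data. It uses LogisticRegression as the model purely for illustration; this is not the model used in the example that follows.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Preprocessing and model combined into one estimator
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# cross_val_score refits the whole pipeline (scaler included) on each training fold,
# so the scaler never sees the fold it is evaluated on
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.2f}")
```

Because the pipeline behaves like any other estimator, model.fit(X, y) and model.predict(X) also work on it directly.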

For example, we can easily train a neural network to solve a classification problem: predicting the species of an iris flower from four measurements of its sepals and petals.


import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()

# Create a DataFrame from the Iris dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Define features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create a neural network classifier
classifier = MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=1000, random_state=42)

# Train the classifier on the training data
classifier.fit(X_train_scaled, y_train)

# Predict the target on the test data
y_pred = classifier.predict(X_test_scaled)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Notice that training and using the neural network took only two method calls: .fit() and .predict(). Because of this, we can solve applied problems quickly and correctly without knowing every detail of how the neural network works internally.
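Because sklearn estimators share the same .fit()/.predict() interface, the network can be swapped for other classifiers without changing the rest of the code. The sketch below repeats the split and scaling from the example, loading arrays directly with return_X_y=True instead of building a DataFrame. It then compares a few alternative classifiers chosen only for illustration, with untuned default hyperparameters.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same scaling step as in the example: fit on training data, reuse on test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Illustrative set of classifiers; only the constructor line differs between them
classifiers = {
    "MLP": MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=1000, random_state=42),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=42),
    "SVC": SVC(),
}

results = {}
for name, clf in classifiers.items():
    clf.fit(X_train_scaled, y_train)      # identical training call for every model
    y_pred = clf.predict(X_test_scaled)   # identical prediction call for every model
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: {results[name]:.2f}")
```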
