ML Introduction with scikit-learn | Description of Track Courses
Foundations of Machine Learning Track Overview

ML Introduction with scikit-learn

Scikit-learn, often abbreviated as sklearn, is an open-source machine learning library for the Python programming language.

It provides simple and efficient tools for data analysis and modeling, including various algorithms for classification, regression, clustering, dimensionality reduction, and more.

Commonly used sklearn modules

  • sklearn.preprocessing: This module prepares and transforms data before it is used for modeling. It includes tools for scaling, normalization, and encoding categorical variables (imputation of missing values is provided by the separate sklearn.impute module);
  • sklearn.model_selection: Model selection and evaluation are essential in machine learning. This module provides tools for splitting datasets into training and testing sets, performing cross-validation to assess model performance, and conducting hyperparameter tuning;
  • sklearn.pipeline: Pipelines simplify the workflow of building machine learning models. They let you chain preprocessing steps and a final model into a single estimator that can be trained and evaluated as one object;
  • Model-specific modules: scikit-learn groups its algorithms into modules by model type (for example, sklearn.cluster, sklearn.tree, sklearn.svm, sklearn.linear_model).
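
These modules are usually combined. The sketch below is one possible way to do it, assuming the Iris dataset and a logistic regression model; both are chosen only for illustration and are not prescribed by the course material.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data (an assumption): the Iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# sklearn.pipeline: chain a preprocessing step and a model into one estimator
pipe = Pipeline([
    ("scaler", StandardScaler()),                   # from sklearn.preprocessing
    ("model", LogisticRegression(max_iter=1000)),   # from sklearn.linear_model
])

# sklearn.model_selection: 5-fold cross-validation on the training data
cv_scores = cross_val_score(pipe, X_train, y_train, cv=5)
print(f"Mean CV accuracy: {cv_scores.mean():.2f}")

# sklearn.model_selection: small grid search over the regularization strength C
search = GridSearchCV(pipe, param_grid={"model__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print(f"Test accuracy: {search.score(X_test, y_test):.2f}")
```

Because the scaler is part of the pipeline, it is re-fitted on the training portion of every cross-validation fold, so the held-out fold does not influence the scaling statistics.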

Example

sklearn lets us build and train complex machine learning models, together with all their preprocessing steps, through a fairly simple interface. As a result, we can spend less time implementing and training algorithms by hand and more time solving the business problem.

For example, we can train a small neural network to classify iris flowers into three species:

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()

# Create a DataFrame from the Iris dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Define features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create a neural network classifier
classifier = MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=1000, random_state=42)

# Train the classifier on the training data
classifier.fit(X_train_scaled, y_train)

# Predict the target on the test data
y_pred = classifier.predict(X_test_scaled)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Note that training and using the neural network took only two simple methods, .fit() and .predict(). Because of this, we can solve applied problems quickly without knowing every internal detail of the model, although we still need to evaluate the results (here, with accuracy_score) before trusting them.
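
The same two methods work for other scikit-learn estimators as well. As a minimal sketch, assuming a decision tree and a support vector machine as stand-ins (neither is used in the example above), the models can be swapped without changing the surrounding code:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same illustrative dataset and split settings as in the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Two different model types (illustrative choices) behind the same interface
models = {
    "decision tree": DecisionTreeClassifier(random_state=42),
    "support vector machine": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)        # train on the training data
    y_pred = model.predict(X_test)     # predict on the held-out test data
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: accuracy = {results[name]:.2f}")
```

Only the line that constructs the model differs between the two runs; the training, prediction, and evaluation steps stay identical.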

