Foundations of Machine Learning Track Overview
ML Introduction with scikit-learn
Scikit-learn, often abbreviated as sklearn, is an open-source machine learning library for the Python programming language.
It provides simple and efficient tools for data analysis and modeling, including various algorithms for classification, regression, clustering, dimensionality reduction, and more.
Commonly used sklearn modules
- sklearn.preprocessing: This module focuses on preparing and transforming data before it is used for modeling. It includes tools for scaling, normalization, and encoding categorical variables. Imputation of missing values is handled by the closely related sklearn.impute module;
- sklearn.model_selection: Model selection and evaluation are essential in machine learning. This module provides tools for splitting datasets into training and testing sets, performing cross-validation to assess model performance, and conducting hyperparameter tuning;
- sklearn.pipeline: Pipelines simplify the workflow of building machine learning models. They let you chain data preprocessing steps and a final model into a single object that can be trained and evaluated as one unit;
- Model-specific modules: Scikit-learn provides separate modules for specific families of machine learning models (sklearn.cluster, sklearn.tree, sklearn.svm, sklearn.linear_model).
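To illustrate how these modules fit together, here is a minimal sketch that chains a scaler and a classifier in a pipeline and evaluates it with cross-validation. The Iris dataset, LogisticRegression, the step names, and the choice of 5 folds are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data: the Iris dataset bundled with scikit-learn
X, y = load_iris(return_X_y=True)

# sklearn.pipeline: chain preprocessing and the model into one estimator
# (the step names "scaler" and "model" are arbitrary labels)
pipeline = Pipeline([
    ("scaler", StandardScaler()),                  # from sklearn.preprocessing
    ("model", LogisticRegression(max_iter=1000)),  # from sklearn.linear_model
])

# sklearn.model_selection: 5-fold cross-validation.
# The whole pipeline, scaler included, is refit on each training fold.
scores = cross_val_score(pipeline, X, y, cv=5)
print(f"Mean cross-validated accuracy: {scores.mean():.2f}")
```

Because the scaler lives inside the pipeline, it never sees the held-out fold during fitting, which avoids leaking test data into preprocessing.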
Example
sklearn lets us build and train complex machine learning models, together with all of their preprocessing steps, through a fairly simple interface. As a result, we can focus on solving business problems rather than on implementing algorithms and their training procedures.
For example, we can easily train a small neural network to classify iris flowers by species:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris

# Load the Iris dataset
data = load_iris()

# Create a DataFrame from the Iris dataset
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target

# Define features (X) and target (y)
X = df.drop('target', axis=1)
y = df['target']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create a neural network classifier
classifier = MLPClassifier(hidden_layer_sizes=(50, 25), max_iter=1000, random_state=42)

# Train the classifier on the training data
classifier.fit(X_train_scaled, y_train)

# Predict the target on the test data
y_pred = classifier.predict(X_test_scaled)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')
Notice that we needed only two simple methods, .fit() and .predict(), to use a neural network model. Because of this, we can solve applied problems quickly without knowing every internal detail of how the network works.
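The same .fit() and .predict() calls work for other scikit-learn estimators as well. The sketch below swaps in two different models without changing the surrounding code. The particular models, the dictionary of names, and the split parameters are illustrative assumptions rather than part of the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same illustrative dataset and split as in the example above
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical selection of models, chosen only to show the shared interface
models = {
    "decision tree": DecisionTreeClassifier(random_state=42),
    "support vector machine": SVC(),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)      # identical training call for every estimator
    y_pred = model.predict(X_test)   # identical prediction call
    results[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: accuracy = {results[name]:.2f}")
```

Only the line that constructs the model changes between estimators, which is what makes it cheap to compare several candidates on the same problem.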