Foundations of Machine Learning Track Overview
Classification with Python
Classification is a supervised machine learning task that assigns data instances to predefined classes (labels) based on their features.
The goal is to train a model on labeled examples so that it accurately assigns new, unseen data points to the correct classes.
How can we use classification in real life?
- Email Spam Detection: Classification is used to determine whether an incoming email is spam or not spam (ham). Features extracted from the email's content and metadata are used to make this classification;
- Image Recognition: Classification algorithms can identify objects, people, animals, and scenes in images. This technology powers applications like self-driving cars, security cameras, and medical image analysis;
- Medical Diagnosis: Classification helps diagnose diseases based on medical test results, patient history, and symptoms;
- Text Categorization: News articles, legal documents, and social media posts can be classified into categories for information retrieval, content organization, and recommendation systems.
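As a rough illustration of the spam-detection case, the sketch below turns a few invented messages into word-count features and trains a classifier on them. The messages, the variable names, and the choice of CountVectorizer with MultinomialNB are assumptions made for this example; the text above does not prescribe a particular feature extractor or model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages, invented purely for illustration
messages = [
    "win free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project report attached",
]
# Labels: 1 for spam, 0 for not spam (ham)
labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer extracts word-count features from the text;
# MultinomialNB is one common choice of classifier for such counts
spam_model = make_pipeline(CountVectorizer(), MultinomialNB())
spam_model.fit(messages, labels)

# Classify two new, unseen messages
new_messages = ["free prize inside", "team meeting tomorrow"]
spam_predictions = spam_model.predict(new_messages)

for text, label in zip(new_messages, spam_predictions):
    print(f"{text!r} -> {'spam' if label == 1 else 'ham'}")
```

A real spam filter would need far more training data and would typically use metadata features as well; this only shows the shape of the workflow.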
Example
Let's consider a simple classification task: classifying whether a fruit is an apple or an orange based on weight and size.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Small hand-made dataset
# Features: weight in grams and size in centimeters
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5],
              [130, 6], [180, 8], [90, 4.5], [110, 5.5], [160, 7],
              [145, 6.5], [155, 7], [140, 6.5]])
# Labels: 0 for apple, 1 for orange
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier (random_state fixed for reproducibility)
classifier = DecisionTreeClassifier(random_state=0)

# Train the classifier on the training data
classifier.fit(X_train, y_train)

# Predict labels on the test data
y_pred = classifier.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualize the data: one scatter call per class so the legend matches the points
plt.figure(figsize=(8, 6))
plt.scatter(X[y == 0, 0], X[y == 0, 1], label='Apple')
plt.scatter(X[y == 1, 0], X[y == 1, 1], label='Orange')
plt.xlabel('Weight (grams)')
plt.ylabel('Size (cm)')
plt.title('Apple vs Orange Classification')
plt.xlim(80, 200)
plt.ylim(4, 9)
plt.legend()
plt.show()
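To see what a trained tree does with fruit it has never seen, the sketch below fits the same kind of classifier on all thirteen samples, prints the learned rules with scikit-learn's export_text, and classifies two new measurements. The two new fruits and the names feature_names, new_fruits, and class_names are made up for illustration; they are not part of the example above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same features and labels as in the example: weight (g), size (cm)
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5],
              [130, 6], [180, 8], [90, 4.5], [110, 5.5], [160, 7],
              [145, 6.5], [155, 7], [140, 6.5]])
# Labels: 0 for apple, 1 for orange
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Illustrative feature names used only when printing the rules
feature_names = ["weight_g", "size_cm"]

# Fit on every sample here, since the goal is to inspect the rules,
# not to estimate accuracy
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X, y)

# Print the learned if/else rules as plain text
print(export_text(classifier, feature_names=feature_names))

# Two hypothetical unseen fruits: a small light one and a large heavy one
new_fruits = np.array([[95, 4.8], [175, 7.8]])
predictions = classifier.predict(new_fruits)

# Map numeric labels back to readable class names
class_names = np.array(["apple", "orange"])
for (weight, size), name in zip(new_fruits, class_names[predictions]):
    print(f"weight={weight:.0f} g, size={size:.1f} cm -> {name}")
```

With so few samples the rules are tied closely to these particular points, so treat the output as an illustration rather than a reliable fruit classifier.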