Classification with Python

Classification is a supervised machine learning task in which data instances are assigned to predefined classes (labels) based on their features.

The goal is to learn a model from labeled examples that accurately assigns new, unseen data points to the correct classes.
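To make the idea concrete, here is a minimal sketch of a classifier written from scratch. It uses the 1-nearest-neighbor rule, chosen only for illustration: each new point receives the label of the closest labeled example. The function name `predict_nearest_neighbor` and the toy data are hypothetical.

```python
import numpy as np

def predict_nearest_neighbor(X_train, y_train, X_new):
    """Assign each new point the label of its closest training point (1-NN)."""
    # Pairwise Euclidean distances, shape (n_new, n_train)
    distances = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    # Index of the nearest training point for every new point
    nearest = distances.argmin(axis=1)
    return y_train[nearest]

# Hypothetical labeled examples: two features per instance, classes 0 and 1
X_train = np.array([[1.0, 1.0], [1.5, 2.0], [6.0, 6.5], [7.0, 6.0]])
y_train = np.array([0, 0, 1, 1])

# New, unseen points to classify
X_new = np.array([[1.2, 1.4], [6.5, 6.2]])
print(predict_nearest_neighbor(X_train, y_train, X_new))  # [0 1]
```

Libraries such as scikit-learn package many classifiers behind the same two steps: learn from labeled data, then predict labels for new data.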

How can we use classification in real life?

Email Spam Detection: Classification determines whether an incoming email is spam or legitimate (ham), using features extracted from the email's content and metadata;
Image Recognition: Classification algorithms can identify objects, people, animals, and scenes in images. This technology powers applications like self-driving cars, security cameras, and medical image analysis;
Medical Diagnosis: Classification helps diagnose diseases based on medical test results, patient history, and symptoms;
Text Categorization: News articles, legal documents, and social media posts can be classified into categories for information retrieval, content organization, and recommendation systems.
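Spam detection is a good illustration of how "features extracted from the email's content" can be used in practice. The sketch below makes two assumptions chosen only for illustration. It represents each email as word counts (bag-of-words) with scikit-learn's `CountVectorizer`, and it classifies with a multinomial Naive Bayes model. The four training emails are invented, and real filters train on far larger datasets.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training emails: 1 = spam, 0 = ham
emails = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch with the team on monday",
]
labels = [1, 1, 0, 0]

# Turn each email into a vector of word counts (bag-of-words features)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Fit a multinomial Naive Bayes classifier on the word counts
model = MultinomialNB()
model.fit(X, labels)

# Classify new emails; words not seen during training are ignored
new_emails = ["claim your free money", "team meeting on monday"]
print(model.predict(vectorizer.transform(new_emails)))  # [1 0]
```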

Example

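Let's consider a simple classification task: deciding whether a fruit is an apple or an orange based on its weight and size. The code trains a decision tree classifier with scikit-learn and measures its accuracy on a held-out test set.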


import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

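# Small hand-made dataset
# Features: weight in grams and size in centimeters
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5], [130, 6], [180, 8],
              [90, 4.5], [110, 5.5], [160, 7], [145, 6.5], [155, 7], [140, 6.5]])
# Labels: 0 for apple, 1 for orange
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Split the data into training and testing sets (20% of samples held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier (random_state makes the result reproducible)
classifier = DecisionTreeClassifier(random_state=0)

# Train the classifier on the training data
classifier.fit(X_train, y_train)

# Predict labels on the test data
y_pred = classifier.predict(X_test)

# Evaluate the classifier: fraction of test samples predicted correctly
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualize the decision boundary: predict a class for every point of a grid
xx, yy = np.meshgrid(np.linspace(80, 200, 300), np.linspace(4, 9, 300))
grid_pred = classifier.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

plt.figure(figsize=(8, 6))
# Shade the regions the tree assigns to each class
plt.contourf(xx, yy, grid_pred, levels=[-0.5, 0.5, 1.5], colors=['red', 'orange'], alpha=0.2)
# Plot each class separately so the legend labels match the points
plt.scatter(X[y == 0, 0], X[y == 0, 1], color='red', label='Apple')
plt.scatter(X[y == 1, 0], X[y == 1, 1], color='orange', label='Orange')
plt.xlabel('Weight (grams)')
plt.ylabel('Size (cm)')
plt.title('Apple vs Orange Classification')
plt.xlim(80, 200)
plt.ylim(4, 9)
plt.legend()
plt.show()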
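Once trained, the classifier can label fruits that are not in the dataset, which is the whole point of classification. The sketch below assumes you want predictions rather than an accuracy estimate. It therefore trains on all 13 labeled fruits. The two new measurements are hypothetical, and the `names` lookup array is an illustrative helper for turning 0/1 labels into readable strings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Same toy fruit data as above: [weight in grams, size in cm]; 0 = apple, 1 = orange
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5], [130, 6], [180, 8],
              [90, 4.5], [110, 5.5], [160, 7], [145, 6.5], [155, 7], [140, 6.5]])
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Train on every labeled example (no hold-out set, since we only want predictions)
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X, y)

# Two hypothetical fruits that are not in the dataset
new_fruits = np.array([[175, 7.8], [95, 4.8]])
predictions = classifier.predict(new_fruits)

# Map numeric labels back to class names
names = np.array(['apple', 'orange'])
print(names[predictions])  # ['orange' 'apple']
```

The heavy, large fruit falls in the region occupied by oranges, and the light, small one falls among the apples. Points near the boundary between the two classes would be much less certain, especially with so little training data.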
