Classification with Python | Description of Track Courses
Foundations of Machine Learning Track Overview

Classification with Python

Classification is a supervised machine learning task that assigns data instances to predefined classes (labels) based on their features.

The goal is to build a model that accurately assigns new, unseen data points to the correct classes.
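
To make the idea of "learn from labeled examples, then label unseen points" concrete, here is a minimal sketch of a nearest-centroid classifier in plain Python. It is only one simple way to classify, not the method used later in this chapter, and the toy data, class names "A" and "B", and helper names `fit_centroids` and `predict` are invented for illustration.

```python
from math import dist


def fit_centroids(features, labels):
    """Compute the mean feature vector (centroid) of each class."""
    grouped = {}
    for row, label in zip(features, labels):
        grouped.setdefault(label, []).append(row)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, row):
    """Assign the class whose centroid is closest to the given row."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Hypothetical labeled training data: two features per instance
train_features = [[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
                  [3.0, 3.1], [3.2, 2.9], [2.8, 3.3]]
train_labels = ["A", "A", "A", "B", "B", "B"]

centroids = fit_centroids(train_features, train_labels)

# Classify two points the model has never seen
print(predict(centroids, [1.1, 1.0]))
print(predict(centroids, [2.9, 3.0]))
```

The "model" here is just one average point per class; more capable classifiers follow the same fit-then-predict pattern with richer decision rules.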

How can we use classification in real life?

  • Email Spam Detection: Classification is used to determine whether an incoming email is spam or not spam (ham). Features extracted from the email's content and metadata are used to make this classification;
  • Image Recognition: Classification algorithms can identify objects, people, animals, and scenes in images. This technology powers applications like self-driving cars, security cameras, and medical image analysis;
  • Medical Diagnosis: Classification helps diagnose diseases based on medical test results, patient history, and symptoms;
  • Text Categorization: News articles, legal documents, and social media posts can be classified into categories for information retrieval, content organization, and recommendation systems.
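
As a rough illustration of the spam detection case, the sketch below turns email text into word-count features and fits a naive Bayes classifier with scikit-learn. The six training emails and two test emails are made up for this example, and only the message text is used (no metadata), so treat it as a toy under those assumptions rather than a realistic spam filter.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training emails, invented for this sketch
emails = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project report attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Step 1: CountVectorizer converts each email into word counts (the features)
# Step 2: MultinomialNB learns which words are typical of each class
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

# Classify emails that were not part of the training data
new_emails = ["free prize inside", "team meeting tomorrow"]
predictions = spam_filter.predict(new_emails)
for text, label in zip(new_emails, predictions):
    print(f"{text!r} -> {label}")
```

Words the model never saw during training (such as "inside") are simply ignored by the vectorizer, so the prediction rests on the remaining known words.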

Example

Let's consider a simple classification task: classifying whether a fruit is an apple or an orange based on weight and size.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Seed NumPy's global random state, which the tree uses when random_state is not set
np.random.seed(0)

# Toy data. Features: weight in grams and size in centimeters
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5], [130, 6], [180, 8],
              [90, 4.5], [110, 5.5], [160, 7], [145, 6.5], [155, 7], [140, 6.5]])
# Labels: 0 for apple, 1 for orange
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier
classifier = DecisionTreeClassifier()

# Train the classifier on the training data
classifier.fit(X_train, y_train)

# Predict labels on the test data
y_pred = classifier.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualize the data: one scatter call per class so the legend matches the points
plt.figure(figsize=(8, 6))
plt.scatter(X[y == 0, 0], X[y == 0, 1], label='Apple')
plt.scatter(X[y == 1, 0], X[y == 1, 1], label='Orange')
plt.xlabel('Weight (grams)')
plt.ylabel('Size (cm)')
plt.title('Apple vs Orange Classification')
plt.xlim(80, 200)
plt.ylim(4, 9)
plt.legend()
plt.show()
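
To see what a trained tree has actually learned, one option is to print its split rules and classify a couple of new fruits. The sketch below assumes the same thirteen samples as the example but, for simplicity, trains on all of them instead of a train/test split. The two new fruits and the feature names `weight_g` and `size_cm` are made-up values for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy data as in the example: [weight in grams, size in centimeters]
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5], [130, 6], [180, 8],
              [90, 4.5], [110, 5.5], [160, 7], [145, 6.5], [155, 7], [140, 6.5]])
# Labels: 0 for apple, 1 for orange
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Train on every sample (an assumption of this sketch, not the example's split)
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Print the learned if/else rules as text
print(export_text(tree, feature_names=["weight_g", "size_cm"]))

# Hypothetical new fruits: one small and light, one large and heavy
new_fruits = np.array([[95, 4.8], [175, 7.8]])
new_pred = tree.predict(new_fruits)

class_names = np.array(["apple", "orange"])
for (weight, size), name in zip(new_fruits, class_names[new_pred]):
    print(f"{weight:.0f} g, {size:.1f} cm -> {name}")
```

The printed rules are the thresholds on weight and size that separate the two classes, which is the decision boundary in text form.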

Section 1. Chapter 4