Classification with Python
In machine learning, classification is a supervised learning task: a model learns from labeled examples to assign data instances to predefined classes (labels) based on their features.
The goal is to build a model that accurately assigns new, unseen data points to the correct classes.
How can we use classification in real life?
- Email Spam Detection: Classification is used to determine whether an incoming email is spam or not spam (ham). Features extracted from the email's content and metadata are used to make this classification;
- Image Recognition: Classification algorithms can identify objects, people, animals, and scenes in images. This technology powers applications like self-driving cars, security cameras, and medical image analysis;
- Medical Diagnosis: Classification helps diagnose diseases based on medical test results, patient history, and symptoms;
- Text Categorization: News articles, legal documents, and social media posts can be classified into categories for information retrieval, content organization, and recommendation systems.
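As a rough illustration of the spam-detection case, the sketch below turns a handful of made-up messages into word counts and fits a naive Bayes classifier on them. The messages, the labels, and the choice of `CountVectorizer` with `MultinomialNB` are assumptions made for illustration; a real filter would need far more data and would typically use metadata as well.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy messages (invented for this sketch)
messages = [
    "win a free prize now",
    "free money click now",
    "claim your free prize",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
    "project report attached",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Feature extraction (word counts) followed by a naive Bayes classifier
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(messages, labels)

# Classify two unseen messages
new_messages = ["free prize inside", "team meeting tomorrow"]
predictions = spam_filter.predict(new_messages)

for text, label in zip(new_messages, predictions):
    print(f"{text!r} -> {label}")
```

The pipeline keeps feature extraction and the classifier together, so new messages are vectorized with the same vocabulary that was learned during training.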
Example
Let's consider a simple classification task: classifying whether a fruit is an apple or an orange based on weight and size.
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Small synthetic dataset
# Features: weight in grams and size in centimeters
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5],
              [130, 6], [180, 8], [90, 4.5], [110, 5.5], [160, 7],
              [145, 6.5], [155, 7], [140, 6.5]])

# Labels: 0 for apple, 1 for orange
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])

# Split the data into training and testing sets
# (with 13 samples, test_size=0.2 leaves 3 samples for testing)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier (fixed seed for reproducible results)
classifier = DecisionTreeClassifier(random_state=0)

# Train the classifier on the training data
classifier.fit(X_train, y_train)

# Predict labels on the test data
y_pred = classifier.predict(X_test)

# Evaluate the classifier
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualize the data, one scatter series per class
plt.figure(figsize=(8, 6))
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='tab:red', label='Apple')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='tab:orange', label='Orange')
plt.xlabel('Weight (grams)')
plt.ylabel('Size (cm)')
plt.title('Apple vs Orange Classification')
plt.xlim(80, 200)
plt.ylim(4, 9)
plt.legend()
plt.show()
```
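To see what a fitted tree has actually learned, its rules can be printed as text and then applied to fruits it has never seen. The sketch below is one way to do that with the same fruit data. It makes a few assumptions that are not part of the example above: the tree is trained on all 13 samples rather than on a train/test split, the feature names `weight_g` and `size_cm` are labels chosen here, and the two new fruits are invented measurements.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same fruit data as in the example: [weight in grams, size in centimeters]
X = np.array([[120, 6], [150, 7], [100, 5], [130, 6.5], [170, 7.5],
              [130, 6], [180, 8], [90, 4.5], [110, 5.5], [160, 7],
              [145, 6.5], [155, 7], [140, 6.5]])
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0])  # 0 = apple, 1 = orange

# Assumption: fit on every sample, purely to inspect the learned rules
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Print the if/else thresholds the tree uses (feature names are our own labels)
rules = export_text(tree, feature_names=["weight_g", "size_cm"])
print(rules)

# Hypothetical unseen fruits: one light and small, one heavy and large
new_fruits = np.array([[95, 5.0], [175, 7.8]])
predicted = tree.predict(new_fruits)

class_names = {0: "apple", 1: "orange"}
for fruit, label in zip(new_fruits, predicted):
    print(f"weight={fruit[0]} g, size={fruit[1]} cm -> {class_names[label]}")
```

Reading the printed rules alongside the scatter plot is a quick sanity check: each threshold corresponds to a horizontal or vertical cut through the weight/size plane.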