Challenge: Building a CNN
Convolutional Neural Networks (CNNs) are widely used in image classification due to their ability to extract hierarchical features. In this task, you will implement and train a VGG-like CNN using TensorFlow and Keras on the CIFAR-10 dataset. The dataset consists of 60,000 images (32×32×3) belonging to 10 different classes, including airplanes, cars, birds, cats, and more.
This project will guide you through loading the dataset, preprocessing the images, defining the CNN model, training it, and evaluating its performance.
1. Data Preprocessing for CNNs
Before training a CNN, preprocessing the data is a crucial step to ensure better performance and faster convergence. Common preprocessing methods include:
- Normalization: scaling the pixel values of images from the range [0, 255] down to [0, 1]. It is often implemented as `x_train / 255.0` and `x_test / 255.0`.
- One-Hot Encoding: labels are often converted into one-hot encoded vectors for classification tasks. This is typically done using the `keras.utils.to_categorical` function, which transforms integer labels (e.g., 0, 1, 2, etc.) into one-hot encoded vectors, such as `[1, 0, 0, 0]` for a 4-class classification problem.
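A minimal preprocessing sketch for CIFAR-10 that applies both steps (variable names follow the usual Keras convention):

```python
from tensorflow import keras

# Load CIFAR-10: 50,000 training and 10,000 test images of shape 32x32x3
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

# Normalization: scale pixel values from [0, 255] to [0, 1]
x_train = x_train / 255.0
x_test = x_test / 255.0

# One-hot encoding: integer labels (0..9) -> vectors of length 10
y_train = keras.utils.to_categorical(y_train, num_classes=10)
y_test = keras.utils.to_categorical(y_test, num_classes=10)
```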
2. Building the CNN Architecture
A CNN architecture is composed of several layers that perform different tasks to extract features and make predictions. The key layers are:
- Convolutional Layer (`Conv2D`): applies learnable filters to the input to extract local features. Note: the `input_shape` parameter needs to be specified only in the first layer.
- Pooling Layer (`MaxPooling2D`): downsamples the feature maps, reducing their spatial dimensions.
- Flatten Layer: converts the 2D feature maps into a 1D vector.
- Dense Layer: a fully connected layer used for classification. Note: the final dense layer typically has the number of units equal to the number of classes and uses a softmax activation function to output a probability distribution across the classes.
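One possible VGG-like arrangement of these layers (the filter counts and dense width below are illustrative choices, not requirements):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # input_shape is specified only in the first layer
    layers.Conv2D(32, (3, 3), activation='relu', padding='same',
                  input_shape=(32, 32, 3)),
    layers.Conv2D(32, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),   # 32x32 -> 16x16
    layers.Dropout(0.25),          # randomly disable 25% of activations

    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
    layers.MaxPooling2D((2, 2)),   # 16x16 -> 8x8
    layers.Dropout(0.25),

    layers.Flatten(),              # 2D feature maps -> 1D vector
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax'),  # one probability per class
])
```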
3. Model Compilation
After defining the architecture, the model needs to be compiled. This step involves specifying the loss function, optimizer, and metrics that will guide the model during training. The following methods are commonly used in CNNs:
Optimizer (Adam)
The optimizer adjusts the weights of the model to minimize the loss function. The Adam optimizer is popular due to its efficiency and ability to adapt the learning rate during training.
Loss Function (Categorical Crossentropy)
For multi-class classification, categorical crossentropy is typically used as the loss function.
Metrics
Model performance is monitored using classification metrics such as accuracy, precision, and recall.
Compile
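A minimal sketch tying these choices together (the metrics list can be extended):

```python
# Adam optimizer, categorical crossentropy loss, accuracy as the metric
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy'],
)
```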
4. Training the Model
Training a CNN involves feeding the input data to the network, computing the loss, and updating the weights using backpropagation. The training process is controlled by the following key methods:
- Fitting the Model: the `fit()` method is used to train the model. It takes the training data, the number of epochs, and the batch size, along with an optional validation split to evaluate the model's performance on unseen data during training; see the sketch after this list.
- Batch Size and Epochs: the batch size determines the number of samples processed before the model weights are updated, and the number of epochs is how many times the entire dataset is passed through the model.
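A training sketch, assuming the `model`, `x_train`, and `y_train` defined earlier (the epoch and batch values are illustrative):

```python
# Train for 20 epochs in mini-batches of 64 samples, holding out
# 10% of the training data for validation
history = model.fit(
    x_train, y_train,
    epochs=20,
    batch_size=64,
    validation_split=0.1,
)
```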
5. Evaluation
Classification Report
`sklearn.metrics.classification_report()` compares the true and predicted values from the test dataset and reports precision, recall, and F1 score for each class. The function expects plain class labels, so don't forget to convert the one-hot vectors back (e.g., `[0, 0, 1, 0]` -> `2`):
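A short sketch, assuming the `model`, `x_test`, and `y_test` from the previous steps:

```python
import numpy as np
from sklearn.metrics import classification_report

# argmax converts probabilities / one-hot vectors back to class labels
y_pred = np.argmax(model.predict(x_test), axis=1)
y_true = np.argmax(y_test, axis=1)

print(classification_report(y_true, y_pred))
```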
Evaluate
Once the model is trained, it is evaluated on the test dataset to assess its generalization ability. The evaluation reports the metrics that were specified in the `.compile()` method and is performed with `.evaluate()`:
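For example, with the accuracy metric compiled above:

```python
# Returns the loss plus every metric passed to .compile()
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```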
Confusion Matrix
To gain more insight into the model's performance, we can visualize the confusion matrix, which shows the true positive, false positive, true negative, and false negative predictions for each class. The matrix can be computed with TensorFlow and then visualized as a heatmap to observe how well the model performs on each class:
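A sketch using `tf.math.confusion_matrix`; drawing the heatmap with seaborn is an assumption here, as any heatmap tool works:

```python
import numpy as np
import tensorflow as tf
import seaborn as sns
import matplotlib.pyplot as plt

# Class labels for the test set and the model's predictions
y_true = np.argmax(y_test, axis=1)
y_pred = np.argmax(model.predict(x_test), axis=1)

# 10x10 matrix: rows are true classes, columns are predicted classes
cm = tf.math.confusion_matrix(y_true, y_pred).numpy()

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted class')
plt.ylabel('True class')
plt.show()
```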
Objectives
1. Load and Preprocess the Dataset
- Import the CIFAR-10 dataset from Keras.
- Normalize the pixel values to the range [0, 1] for better convergence.
- Convert the class labels into one-hot encoded format for categorical classification.
2. Define the CNN Model
Implement a VGG-like CNN architecture with the following key layers:
- Convolutional layers: kernel size 3×3, ReLU activation, `'same'` padding.
- Pooling layers: max pooling with a 2×2 pool size.
- Dropout layers (prevent overfitting by randomly disabling neurons): dropout rate of 25%.
- Flatten layer: converts the 2D feature maps into a 1D vector for classification.
- Fully connected layers: dense layers for final classification, with ReLU activations in the hidden layers and a softmax output layer.

Compile the model using:
- The Adam optimizer (for efficient learning).
- The categorical cross-entropy loss function (for multi-class classification).
- The accuracy metric to measure performance (the classes are balanced, and you can add other metrics on your own).
3. Train the Model
- Specify the `epochs` and `batch_size` parameters for training (e.g., `epochs=20, batch_size=64`).
- Specify the `validation_split` parameter to set aside a percentage of the training data as validation data and track model performance on unseen images.
- Save the training history to visualize accuracy and loss trends.
4. Evaluate and Visualize Results
- Test the model on CIFAR-10 test data and print the accuracy.
- Plot training loss vs. validation loss to check for overfitting.
- Plot training accuracy vs. validation accuracy to ensure learning progression.
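A plotting sketch, assuming the `history` object returned by `fit()` earlier:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))

# Training vs. validation loss: diverging curves suggest overfitting
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'], label='training')
plt.plot(history.history['val_loss'], label='validation')
plt.title('Loss')
plt.xlabel('Epoch')
plt.legend()

# Training vs. validation accuracy: both should rise as learning progresses
plt.subplot(1, 2, 2)
plt.plot(history.history['accuracy'], label='training')
plt.plot(history.history['val_accuracy'], label='validation')
plt.title('Accuracy')
plt.xlabel('Epoch')
plt.legend()

plt.show()
```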