Challenge: Building a CNN

Convolutional Neural Networks (CNNs) are widely used in image classification due to their ability to extract hierarchical features. In this task, you will implement and train a VGG-like CNN using TensorFlow and Keras on the CIFAR-10 dataset. The dataset consists of 60,000 images (32×32×3) belonging to 10 different classes, including airplanes, cars, birds, cats, and more.

This project will guide you through loading the dataset, preprocessing the images, defining the CNN model, training it, and evaluating its performance.

1. Data Preprocessing for CNNs

Before training a CNN, preprocessing the data is a crucial step to ensure better performance and faster convergence. Common preprocessing methods include:

  • Normalization: this method involves scaling the pixel values of images from a range between 0 and 255 to a range between 0 and 1. It is often implemented as x_train / 255.0, x_test / 255.0.

  • One-Hot Encoding: labels are often converted into one-hot encoded vectors for classification tasks. This is typically done using the keras.utils.to_categorical function, which transforms integer labels (e.g., 0, 1, 2) into one-hot vectors. For example, label 0 in a 4-class classification problem becomes [1, 0, 0, 0].
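
The two preprocessing steps above can be sketched as follows. This is a minimal illustration, not the course's own notebook code: it uses a small random array as a stand-in for CIFAR-10 (an assumption, so that nothing is downloaded), and the variable names are illustrative.

```python
import numpy as np
from tensorflow import keras

# Synthetic stand-in for CIFAR-10 (assumption). The real data would come from
# keras.datasets.cifar10.load_data(), which downloads the dataset.
rng = np.random.default_rng(0)
x_train = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
y_train = rng.integers(0, 10, size=(8, 1))

# Normalization: scale pixel values from [0, 255] to [0, 1].
x_train = x_train.astype("float32") / 255.0

# One-hot encoding: integer labels become vectors of length 10.
y_train_onehot = keras.utils.to_categorical(y_train, num_classes=10)

print(x_train.min(), x_train.max())
print(y_train_onehot.shape)
```

The same two operations would be applied to the test split so that training and test data share one scale and label format.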

2. Building the CNN Architecture

A CNN architecture is composed of several layers that perform different tasks to extract features and make predictions. The key Keras layers are listed below.

Convolutional Layer (Conv2D)
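
As a rough sketch of what a convolutional layer does to a batch of images, the snippet below applies a single Conv2D layer to random data. The filter count (32) and the batch of four random images are assumptions chosen for illustration.

```python
import numpy as np
from tensorflow import keras

# A batch of 4 random "images" with the CIFAR-10 shape (illustrative data).
images = np.random.rand(4, 32, 32, 3).astype("float32")

# 32 filters of size 3x3; padding="same" keeps the 32x32 spatial size.
conv = keras.layers.Conv2D(
    filters=32, kernel_size=(3, 3), activation="relu", padding="same"
)
feature_maps = conv(images)

print(feature_maps.shape)  # (4, 32, 32, 32)
```

Each filter produces one output channel, so the channel dimension changes from 3 to 32 while the height and width stay the same.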

Note

You need to specify the input_shape parameter only in the first layer of the model.

Pooling Layer (MaxPooling2D)
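
A small worked example may help here: max pooling with a 2×2 window keeps the largest value in each non-overlapping 2×2 block, halving the height and width. The 4×4 input below is made up for illustration.

```python
import numpy as np
from tensorflow import keras

# One 4x4 single-channel "image" holding the values 0..15.
x = np.arange(16, dtype="float32").reshape(1, 4, 4, 1)

pool = keras.layers.MaxPooling2D(pool_size=(2, 2))
pooled = np.asarray(pool(x))

print(pooled.shape)        # (1, 2, 2, 1)
print(pooled[0, :, :, 0])  # [[ 5.  7.] [13. 15.]]
```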

Flatten Layer
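
The flatten step can be illustrated with a dummy tensor. The feature-map shape (4×4×128) is an assumption chosen only to make the arithmetic visible: 4 · 4 · 128 = 2048.

```python
import numpy as np
from tensorflow import keras

# Two samples, each a stack of 128 feature maps of size 4x4 (illustrative).
feature_maps = np.zeros((2, 4, 4, 128), dtype="float32")

flat = keras.layers.Flatten()(feature_maps)

print(flat.shape)  # (2, 2048)
```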

Dense Layer
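
A possible sketch of dense layers on top of flattened features is shown below. The hidden size (256) and the random input are assumptions; the 10 output units match the 10 CIFAR-10 classes.

```python
import numpy as np
from tensorflow import keras

# Two flattened feature vectors of length 2048 (illustrative data).
features = np.random.rand(2, 2048).astype("float32")

hidden = keras.layers.Dense(256, activation="relu")(features)
probs = np.asarray(keras.layers.Dense(10, activation="softmax")(hidden))

print(probs.shape)        # (2, 10)
print(probs.sum(axis=1))  # each row sums to approximately 1
```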

Note

The final dense layer typically has the number of units equal to the number of classes and uses a softmax activation function to output a probability distribution across the classes.

3. Model Compilation

After defining the architecture, the model needs to be compiled. This step involves specifying the loss function, optimizer, and metrics that will guide the model during training. The following choices are common for CNN classifiers:

Optimizer (Adam)

The optimizer adjusts the weights of the model to minimize the loss function. The Adam optimizer is popular because it is efficient and adapts the effective step size of each parameter during training.

Loss Function (Categorical Crossentropy)

For multi-class classification with one-hot encoded labels, categorical crossentropy is typically used as the loss function. In Keras, it can be selected by passing loss='categorical_crossentropy' to compile().
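
To make the loss concrete, here is a small sketch that evaluates categorical crossentropy on one made-up prediction. For a one-hot target, the loss reduces to the negative log of the probability assigned to the true class.

```python
import numpy as np
from tensorflow import keras

# One sample, three classes; the true class is index 1 (illustrative values).
y_true = np.array([[0.0, 1.0, 0.0]], dtype="float32")
y_pred = np.array([[0.1, 0.8, 0.1]], dtype="float32")

loss_fn = keras.losses.CategoricalCrossentropy()
loss_value = float(loss_fn(y_true, y_pred))

# Manual check: -log(probability of the true class).
manual_value = float(-np.log(0.8))

print(loss_value, manual_value)
```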

Metrics

The model's performance is monitored using classification metrics such as accuracy, precision, and recall. In Keras, these are passed to the metrics argument of compile().
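
One way to define such metrics is sketched below. The list is what would be handed to compile(); the small update_state() call only demonstrates how a metric object accumulates results, using made-up labels and predictions.

```python
import numpy as np
from tensorflow import keras

# A metrics list that could be passed to model.compile(metrics=...).
metrics = [
    "accuracy",
    keras.metrics.Precision(name="precision"),
    keras.metrics.Recall(name="recall"),
]

# Standalone use of a metric object on illustrative data.
# Predictions are thresholded at 0.5 by default.
y_true = np.array([[0, 1], [1, 0]], dtype="float32")
y_pred = np.array([[0.2, 0.8], [0.4, 0.6]], dtype="float32")

precision = keras.metrics.Precision()
precision.update_state(y_true, y_pred)
precision_value = float(precision.result())

print(precision_value)  # 0.5: one true positive, one false positive
```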

Compile
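
Putting the three pieces together, a compile call might look like the sketch below. The tiny model is a placeholder (an assumption) so that the snippet runs on its own, and the learning rate shown is Adam's usual default rather than a value taken from the course.

```python
from tensorflow import keras

# Placeholder model: just enough layers to have something to compile.
model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, (3, 3), activation="relu", padding="same"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

model.summary()
```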

4. Training the Model

Training a CNN involves feeding the input data to the network, computing the loss, and updating the weights using backpropagation. The training process is controlled by the following key methods:

  • Fitting the Model: the fit() method is used to train the model. This method takes in the training data, the number of epochs, and the batch size. It also accepts an optional validation_split, which holds out a fraction of the training data to evaluate the model's performance on unseen data during training.
  • Batch Size and Epochs: the batch size determines the number of samples processed before updating the model weights, and the number of epochs refers to how many times the entire dataset is passed through the model.
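
A minimal training sketch follows. It uses random data and a deliberately tiny model (both assumptions) so that it finishes in seconds; the epochs, batch_size, and validation_split values are illustrative only.

```python
import numpy as np
from tensorflow import keras

# Random stand-in data with CIFAR-10 shapes (assumption, not real images).
rng = np.random.default_rng(0)
x = rng.random((64, 32, 32, 3), dtype=np.float32)
y = keras.utils.to_categorical(rng.integers(0, 10, size=64), num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Conv2D(8, (3, 3), activation="relu", padding="same"),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# 25% of the samples are held out for validation; weights are updated
# once per batch of 16 samples, and the data is passed through twice.
history = model.fit(x, y, epochs=2, batch_size=16,
                    validation_split=0.25, verbose=0)

print(history.history.keys())
```

The returned history object records one value per epoch for each loss and metric, which is what later plots of training versus validation curves are built from.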

5. Evaluation

Classification Report

sklearn.metrics.classification_report() compares the true and predicted labels for the test dataset and reports precision, recall, and F1 score for each class. The function expects class labels rather than one-hot or probability vectors, so convert them back first (for example, [0, 0, 1, 0] -> 2) by taking the argmax of each vector.
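
The conversion and the report can be sketched like this. The label and probability arrays are made up; in practice y_prob would be the output of model.predict() on the test images.

```python
import numpy as np
from sklearn.metrics import classification_report

# Illustrative one-hot test labels and predicted probabilities (3 classes).
y_test_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
])

# Convert vectors back to integer class labels.
y_true_labels = np.argmax(y_test_onehot, axis=1)
y_pred_labels = np.argmax(y_prob, axis=1)

report = classification_report(y_true_labels, y_pred_labels, zero_division=0)
print(report)
```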

Evaluate

Once the model is trained, it is evaluated on the test dataset to assess its generalization ability. The model's evaluate() method returns the loss followed by the metrics that were passed to compile().
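
A hedged sketch of the evaluation call is shown below. The model is untrained and the test data is random (both assumptions made so the snippet is self-contained), so the printed numbers carry no meaning beyond showing the return format.

```python
import numpy as np
from tensorflow import keras

# Random stand-in for the test split (assumption).
rng = np.random.default_rng(1)
x_test = rng.random((32, 32, 32, 3), dtype=np.float32)
y_test = keras.utils.to_categorical(rng.integers(0, 10, size=32),
                                    num_classes=10)

model = keras.Sequential([
    keras.Input(shape=(32, 32, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Returns [loss, accuracy] because accuracy was the only compiled metric.
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=0)
print(f"loss={test_loss:.4f}, accuracy={test_acc:.4f}")
```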

Confusion Matrix

To gain more insight into the model's performance, we can compute the confusion matrix, which counts, for each true class, how many samples were predicted as each class. Diagonal entries are correct predictions, and off-diagonal entries show which classes are confused with one another. The confusion matrix can be computed with TensorFlow's tf.math.confusion_matrix function.
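
As a small worked example, the snippet below builds a confusion matrix for five made-up predictions over three classes. Rows correspond to true labels and columns to predicted labels.

```python
import tensorflow as tf

# Illustrative integer class labels (not model output).
y_true = [0, 1, 2, 2, 1]
y_pred = [0, 2, 2, 2, 1]

cm = tf.math.confusion_matrix(y_true, y_pred, num_classes=3).numpy()
print(cm)
# [[1 0 0]
#  [0 1 1]
#  [0 0 2]]
```

The off-diagonal entry in row 1, column 2 records the single class-1 sample that was predicted as class 2.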

This matrix can then be visualized as a heatmap to observe how well the model performs on each class.
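
One possible way to draw such a heatmap uses plain matplotlib, as sketched below. The choice of matplotlib, the small 3×3 matrix, and the class names are all assumptions for illustration; other plotting libraries would work equally well.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Illustrative confusion matrix and hypothetical class names.
cm = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 2]])
class_names = ["class 0", "class 1", "class 2"]

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(cm, cmap="Blues")
fig.colorbar(im, ax=ax)

ax.set_xticks(range(len(class_names)))
ax.set_xticklabels(class_names, rotation=45)
ax.set_yticks(range(len(class_names)))
ax.set_yticklabels(class_names)
ax.set_xlabel("Predicted label")
ax.set_ylabel("True label")

# Write the count inside every cell.
for i in range(cm.shape[0]):
    for j in range(cm.shape[1]):
        ax.text(j, i, str(cm[i, j]), ha="center", va="center")

fig.tight_layout()
```

In a notebook, plt.show() would display the figure after the last line.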

1. Load and Preprocess the Dataset

  • Import the CIFAR-10 dataset from Keras.
  • Normalize the pixel values to the range [0,1] for better convergence.
  • Convert the class labels into one-hot encoded format for categorical classification.

2. Define the CNN Model

Implement a VGG-like CNN architecture with the following key layers:

Convolutional Layers:

  • Kernel size: 3×3
  • Activation function: ReLU
  • Padding: 'same'

Pooling Layers:

  • Pooling type: Max Pooling
  • Pooling size: 2×2

Dropout Layers (Prevent overfitting by randomly disabling neurons):

  • Dropout rate: 25%

Flatten Layer - converts the stacked feature maps into a 1D vector for classification.

Fully Connected Layers - dense layers for final classification, with ReLU activation in the hidden layers and a softmax output layer.

Compile the model using:

  • Adam optimizer (for efficient learning).
  • Categorical cross-entropy loss function (for multi-class classification).
  • Accuracy metric to measure performance (classes are balanced, and you can add other metrics on your own).
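
The specification above leaves several details open, so the following is only one possible sketch of a VGG-like model. The number of blocks, the filter counts (32 and 64), and the hidden dense size (256) are assumptions, and build_vgg_like is a hypothetical helper name rather than part of the course material.

```python
from tensorflow import keras

layers = keras.layers


def build_vgg_like(input_shape=(32, 32, 3), num_classes=10):
    """Build and compile a small VGG-style CNN (illustrative sizes)."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Block 1: two 3x3 convolutions, then pooling and dropout.
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Block 2: same pattern with more filters.
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        # Classifier head.
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.25),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


model = build_vgg_like()
model.summary()
```

Stacking two small 3×3 convolutions before each pooling step is the pattern that gives VGG-style networks their name; deeper variants simply repeat the block with more filters.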

3. Train the Model

  • Specify epochs and batch_size parameters for training (e.g. epochs=20, batch_size=64).
  • Specify the validation_split parameter to define the fraction of the training data that is held out as a validation set, so that model performance can be tracked on unseen images.
  • Save the training history to visualize accuracy and loss trends.

4. Evaluate and Visualize Results

  • Test the model on CIFAR-10 test data and print the accuracy.
  • Plot training loss vs. validation loss to check for overfitting.
  • Plot training accuracy vs. validation accuracy to ensure learning progression.
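
The two plots can be sketched as below. The history_dict values are made-up placeholders, not results from any training run; in practice the dictionary would be the history attribute of the object returned by fit().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# Placeholder numbers for illustration only (not real training results).
history_dict = {
    "loss": [1.8, 1.4, 1.1],
    "val_loss": [1.7, 1.5, 1.4],
    "accuracy": [0.35, 0.50, 0.61],
    "val_accuracy": [0.38, 0.47, 0.52],
}
epochs = range(1, len(history_dict["loss"]) + 1)

fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

ax_loss.plot(epochs, history_dict["loss"], label="training loss")
ax_loss.plot(epochs, history_dict["val_loss"], label="validation loss")
ax_loss.set_xlabel("Epoch")
ax_loss.set_ylabel("Loss")
ax_loss.legend()

ax_acc.plot(epochs, history_dict["accuracy"], label="training accuracy")
ax_acc.plot(epochs, history_dict["val_accuracy"], label="validation accuracy")
ax_acc.set_xlabel("Epoch")
ax_acc.set_ylabel("Accuracy")
ax_acc.legend()

fig.tight_layout()
```

A validation loss that stops falling or starts rising while the training loss keeps decreasing is the usual visual sign of overfitting.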
