Neural Networks with TensorFlow
Learning Rate Scheduling
Learning rate scheduling refers to varying the learning rate during training, rather than keeping it constant. This approach can lead to better performance and faster convergence by adapting the learning rate to the stage of training.
Types of Learning Rate Schedulers
- Time-Based Decay: Reduces the learning rate gradually over time.
- Exponential Decay: Decreases the learning rate exponentially, following a predefined exponential function.
- Custom Decay: Decreases the learning rate according to a custom, user-defined function.
- Learning Rate Warmup: Temporarily increases the learning rate at the beginning of training.
The first three methods are known as Learning Rate Decay. Learning rate decay is used to gradually reduce the learning rate during training, allowing for more precise weight updates and improved convergence as the model approaches the optimal solution.
Learning Rate Decay
- Works Best With: Traditional optimizers like Stochastic Gradient Descent (SGD) benefit most from learning rate scheduling. Momentum also sees significant improvements.
- Has Less Impact On: Adaptive optimizers like Adam, RMSprop, or Adagrad are less dependent on learning rate scheduling, as they adjust their learning rates automatically during training. However, they can still benefit from it in some cases.
Time-Based Decay
- lr=0.1: This sets the initial learning rate for the Stochastic Gradient Descent (SGD) optimizer to 0.1.
- decay=0.01: This sets the decay rate for the learning rate. In this context, the decay rate specifies how much the learning rate decreases after each training update.
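The original code for this example is not reproduced above, so here is a minimal sketch. Older Keras releases accepted these arguments directly as SGD(lr=0.1, decay=0.01); in current TensorFlow the same time-based decay is usually expressed through the schedules API, and the use of InverseTimeDecay below is an assumption about the intended behavior:

```python
import tensorflow as tf

# Time-based decay: learning_rate = 0.1 / (1 + 0.01 * step), which is what
# the older SGD(lr=0.1, decay=0.01) arguments computed after each update.
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=0.1,  # corresponds to lr=0.1
    decay_steps=1,              # apply the decay formula at every step
    decay_rate=0.01,            # corresponds to decay=0.01
)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
```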
Exponential Decay
- initial_learning_rate = 0.1: This sets the initial learning rate for the optimizer to 0.1.
- decay_steps = 10: This sets the number of steps after which the learning rate decay occurs. Here, the learning rate will decay every 10 steps.
- decay_rate = 0.96: This sets the rate at which the learning rate decays. Each time the decay occurs, the learning rate is multiplied by 0.96.
- staircase=True: This means the learning rate decays in a stepwise fashion rather than smoothly, making the decay happen at discrete intervals (every decay_steps steps).
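A minimal sketch of this schedule with the parameters above; attaching it to SGD is an illustrative choice, not something specified by the lesson:

```python
import tensorflow as tf

# Exponential decay: with staircase=True the learning rate becomes
# 0.1 * 0.96 ** floor(step / 10), i.e. it drops in discrete jumps.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10,
    decay_rate=0.96,
    staircase=True,
)

optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
```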
Custom Decay
- custom_lr_scheduler: Intended to reduce the learning rate by half every 10 epochs, but it should not change at epoch 0. The function takes two arguments: epoch (the current epoch number) and lr (the current learning rate).
- lr_scheduler = LearningRateScheduler(custom_lr_scheduler): This creates a learning rate scheduler callback using the custom function.
- model.fit(..., callbacks=[lr_scheduler]): The custom learning rate scheduler is passed as a callback to the model's fit method.
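A minimal sketch of the custom scheduler described above; the toy model and random data are placeholders added here only to show how the callback plugs into fit:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler

def custom_lr_scheduler(epoch, lr):
    # Halve the learning rate every 10 epochs; leave epoch 0 unchanged.
    if epoch > 0 and epoch % 10 == 0:
        return lr * 0.5
    return lr

lr_scheduler = LearningRateScheduler(custom_lr_scheduler)

# Toy model and random data (placeholders, not part of the lesson) just to
# show how the callback is passed to model.fit.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")
x_train = np.random.rand(32, 4)
y_train = np.random.rand(32, 1)

model.fit(x_train, y_train, epochs=25, callbacks=[lr_scheduler], verbose=0)
```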
Learning Rate Warmup
The final method, Learning Rate Warmup, contrasts with the others by initially increasing the learning rate rather than decreasing it. The idea behind this technique is to allow the model to start learning gradually, helping it to stabilize and adapt before adopting a higher learning rate.
- Purpose: The warmup phase gradually increases the learning rate from a small value to the intended initial learning rate. Warmup can prevent the model from diverging early in training due to large weight updates. This helps in stabilizing the training, especially when starting with a high learning rate or training large models from scratch.
- Process: The learning rate linearly increases with each epoch during the warmup period. After the warmup, it follows the predefined learning rate schedule (which could be constant, decaying, or any other form).
Keras does not have a built-in function for Learning Rate Warmup, but it can be implemented using a custom learning rate scheduler, as demonstrated in the following example:
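Below is a minimal sketch of one possible implementation; the warmup_epochs and target_lr values are illustrative assumptions:

```python
from tensorflow.keras.callbacks import LearningRateScheduler

# Illustrative values (assumptions, not specified by the lesson).
warmup_epochs = 5
target_lr = 0.1

def warmup_scheduler(epoch, lr):
    # Ramp the learning rate up linearly over the first warmup_epochs epochs,
    # then hold it at target_lr (a decay schedule could take over instead).
    if epoch < warmup_epochs:
        return target_lr * (epoch + 1) / warmup_epochs
    return target_lr

lr_warmup = LearningRateScheduler(warmup_scheduler)

# Pass the callback to training in the same way as in the Custom Decay example:
# model.fit(x_train, y_train, epochs=..., callbacks=[lr_warmup])
```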