Optimization Theory Behind PEFT
Parameter-efficient fine-tuning (PEFT) works by freezing most pretrained model parameters and updating only a small subset. This approach reduces catastrophic forgetting, as most of the model's knowledge remains intact. It addresses the stability-plasticity dilemma by maintaining stability (retaining old knowledge) while allowing enough plasticity (learning new tasks) through the limited trainable parameters.
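The core mechanic is easy to express in code. Below is a minimal PyTorch sketch, not any specific library's API: the backbone, head, and dimensions are illustrative stand-ins. The pretrained backbone is frozen, and only the small head is handed to the optimizer.

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a pretrained backbone and a small task head.
backbone = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 768))
head = nn.Linear(768, 2)  # the small subset of parameters that will be trained

# Freeze every pretrained parameter so gradients flow only into the head.
for param in backbone.parameters():
    param.requires_grad = False

# The optimizer only ever sees the trainable subset.
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

x = torch.randn(4, 768)             # dummy batch
labels = torch.randint(0, 2, (4,))  # dummy labels
logits = head(backbone(x))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()    # only the head accumulates gradients
optimizer.step()

trainable = sum(p.numel() for p in head.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable fraction: {trainable / total:.4%}")
```

Even in this toy setup, the trainable fraction is a tiny slice of the total parameter count, which is exactly what keeps the pretrained knowledge stable.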
The optimization landscape of PEFT can be clearly described using a constrained optimization formulation. Instead of updating all parameters, you solve:
minimize L(θ₀ + Δθ), subject to Δθ ∈ S,

where L is the loss function, θ₀ represents the original pretrained parameters, and Δθ denotes the parameter update, constrained to a specific subspace S. Typically, S is a low-dimensional subspace, such as the span of a few adapter weights or a set of low-rank matrices. This restriction fundamentally changes the way the optimizer explores the parameter space, focusing updates within a carefully chosen region rather than the entire high-dimensional space.
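One common way to realize the subspace S is a low-rank parameterization of the update, in the spirit of LoRA: the update to a weight matrix is factored as B·A with a small rank r, so the optimizer can only move within that rank-r subspace. The sketch below is an assumption for illustration (the class name LowRankAdaptedLinear and all dimensions are made up for this example), not a reference implementation.

```python
import torch
import torch.nn as nn

class LowRankAdaptedLinear(nn.Module):
    """Illustrative sketch: the pretrained weight stays frozen, and the update
    ΔW = B @ A is confined to a rank-r subspace S; only A and B are trained."""
    def __init__(self, in_features, out_features, r=8):
        super().__init__()
        # Frozen pretrained weight (θ₀); randn is a stand-in for real weights.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # Trainable low-rank factors; B starts at zero so ΔW = 0 initially.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x):
        delta_w = self.B @ self.A          # Δθ ∈ S = {B A : rank ≤ r}
        return x @ (self.weight + delta_w).T

layer = LowRankAdaptedLinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable, "trainable vs", layer.weight.numel(), "frozen")
```

With r = 8 on a 768×768 layer, the trainable factors hold roughly 12k parameters against about 590k frozen ones, which makes the constraint to S concrete.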
Restricting updates to a low-dimensional manifold in PEFT helps prevent overfitting and improves generalization by encouraging the reuse of pretrained knowledge. While this may limit model expressivity, carefully designed subspaces can capture essential adaptations for new tasks, balancing flexibility with the strengths of the original model.
Norm constraints and sparsity play an important role in further regularizing updates during PEFT. By imposing a constraint on the norm of the update (such as an L2 norm bound), you ensure that changes to the model are small and controlled, reducing the risk of destabilizing the pretrained knowledge. Sparsity constraints, where only a small number of parameters are allowed to change, can also help by focusing adaptation on the most relevant parts of the model. Both types of constraints act as regularizers, promoting solutions that are both effective for the new task and robust to overfitting.
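Both kinds of constraint can be sketched as simple post-processing steps on the update or its gradient. The helpers below use hypothetical names chosen for this example (project_update_to_l2_ball and apply_sparsity_mask), and the bound and keep fraction are illustrative values: one enforces an L2-norm bound on the accumulated update, the other keeps only the largest-magnitude gradient entries.

```python
import torch

def project_update_to_l2_ball(delta, max_norm=1.0):
    """Project the accumulated update Δθ back onto an L2 ball of radius
    max_norm, so the overall change to the model stays small."""
    norm = delta.norm()
    if norm > max_norm:
        delta.mul_(max_norm / norm)
    return delta

def apply_sparsity_mask(grad, keep_fraction=0.01):
    """Keep only the largest-magnitude fraction of gradient entries and
    zero out the rest (a simple sparsity constraint)."""
    k = max(1, int(keep_fraction * grad.numel()))
    threshold = grad.abs().flatten().topk(k).values.min()
    return grad * (grad.abs() >= threshold)

# Example on dummy tensors.
delta = torch.randn(1000)
delta = project_update_to_l2_ball(delta, max_norm=1.0)
grad = torch.randn(1000)
sparse_grad = apply_sparsity_mask(grad, keep_fraction=0.01)
print(delta.norm().item(), (sparse_grad != 0).sum().item())
```

In practice these constraints would be applied inside the training loop, after the gradient is computed or after each optimizer step, but the projection logic is the same.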
Optimization Benefits of PEFT
- Freezing most parameters preserves pretrained knowledge and reduces catastrophic forgetting;
- Constrained optimization updates only a small, well-defined subspace, improving stability;
- Limiting updates to a low-dimensional manifold balances generalization and adaptability;
- Norm and sparsity constraints regularize updates, making adaptation robust and stable;
- Choosing the right subspace and constraint structure is crucial for effective PEFT.