LoRA
The LoRA (Low-Rank Adaptation) method introduces a highly parameter-efficient way to fine-tune large models by injecting low-rank matrices into the frozen weight matrices of the model. Instead of updating the full weight matrix $W$ during training, LoRA adds a trainable update in the form of a low-rank matrix product: $\Delta W = BA$, where $B$ and $A$ are learnable matrices of shapes $(\text{out\_features}, r)$ and $(r, \text{in\_features})$ respectively, and $r$ is the chosen rank. This update is then added to the original weights as $W_{\text{new}} = W + \Delta W = W + BA$, allowing the model to adapt without modifying the vast majority of its parameters.
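To make the formulation concrete, below is a minimal sketch of a LoRA-wrapped linear layer in PyTorch. The class name `LoRALinear` and the initialization details are illustrative assumptions rather than a fixed implementation, and the $\alpha / r$ output scaling used in many LoRA implementations is omitted for brevity.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = x @ (W + B @ A)^T + bias."""

    def __init__(self, base: nn.Linear, r: int):
        super().__init__()
        self.base = base  # pretrained layer holding W (kept frozen during training)
        out_features, in_features = base.weight.shape
        # Low-rank factors: B starts at zero so delta_W = B @ A is zero at init,
        # meaning the wrapped layer initially behaves exactly like the base layer.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = self.B @ self.A            # (out_features, in_features), rank <= r
        return self.base(x) + x @ delta_w.T  # W x + (BA) x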
With LoRA, the training process focuses exclusively on the low-rank matrices $B$ and $A$, while the backbone weights $W$ are kept frozen. Because only the parameters of $B$ and $A$ receive gradient updates, training is more stable and the risk of catastrophic forgetting is greatly reduced: the original knowledge encoded in $W$ is preserved throughout fine-tuning.
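Continuing the `LoRALinear` sketch above, freezing the backbone and optimizing only $B$ and $A$ might look like the following. The dimensions, rank, and learning rate are placeholder values, and the loss is a dummy stand-in for a real training objective.

```python
# Using the LoRALinear sketch from above; 768, r=8, and lr=1e-4 are
# illustrative assumptions, not prescribed values.
layer = LoRALinear(nn.Linear(768, 768), r=8)
for p in layer.base.parameters():
    p.requires_grad = False  # freeze the pretrained backbone W (and bias)

trainable = [p for p in layer.parameters() if p.requires_grad]  # just B and A
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

x = torch.randn(4, 768)        # dummy batch
loss = layer(x).pow(2).mean()  # placeholder loss for demonstration
loss.backward()                # gradients flow only into B and A
optimizer.step()
```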
The expressivity of LoRA is governed by the rank $r$ of the low-rank update. A higher rank increases the capacity of the update and allows the model to represent more complex changes, but also increases the number of trainable parameters, reducing parameter efficiency. Conversely, a lower rank means fewer parameters and greater efficiency, but may limit the types of updates that can be represented, potentially hurting final performance if the target task requires more capacity. Choosing $r$ is therefore a trade-off between efficiency and adaptability.
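This trade-off is easy to quantify: $B$ has $\text{out\_features} \times r$ entries and $A$ has $r \times \text{in\_features}$, so the update adds $r \cdot (\text{out\_features} + \text{in\_features})$ trainable parameters. A quick back-of-the-envelope comparison (the $4096 \times 4096$ matrix size is an arbitrary example):

```python
def lora_trainable_params(out_features: int, in_features: int, r: int) -> int:
    # B has out_features * r entries, A has r * in_features entries
    return r * (out_features + in_features)

# Illustrative 4096 x 4096 weight matrix (the size is an assumption)
full = 4096 * 4096
for r in (4, 16, 64):
    lora = lora_trainable_params(4096, 4096, r)
    print(f"r={r}: {lora:,} trainable params ({lora / full:.2%} of full)")
```

Even at $r = 64$, the LoRA update trains only about 3% of the parameters of the full weight matrix, while $r = 4$ trains roughly 0.2%.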
Key Insights:
- LoRA enables efficient adaptation by injecting trainable low-rank matrices into frozen weights;
- Only a small number of parameters are updated, reducing memory and computation needs;
- Keeping the backbone frozen preserves pre-trained knowledge and increases training stability;
- The mathematical formulation ($\Delta W = BA$) ensures updates are low-rank by construction;
- Expressivity is limited by the chosen rank $r$: higher ranks increase capacity but reduce efficiency;
- LoRA may be less effective if the required adaptation cannot be captured by a low-rank update.