LoRA
The LoRA (Low-Rank Adaptation) method introduces a highly parameter-efficient way to fine-tune large models by injecting low-rank matrices into the frozen weight matrices of the model. Instead of updating the full weight matrix $W$ during training, LoRA adds a trainable update in the form of a low-rank matrix product: $\Delta W = BA$, where $B$ and $A$ are learnable matrices of shapes $(\text{out\_features}, r)$ and $(r, \text{in\_features})$ respectively, and $r$ is the chosen rank. This update is then added to the original weights as $W_{\text{new}} = W + \Delta W = W + BA$, allowing the model to adapt without modifying the vast majority of its parameters.
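To make the formulation concrete, below is a minimal sketch of a LoRA-wrapped linear layer in PyTorch. The class name `LoRALinear` and the initialization details are illustrative assumptions rather than a fixed implementation, and the $\alpha / r$ output scaling used in many LoRA implementations is omitted for brevity.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: y = x @ (W + B @ A)^T + bias."""

    def __init__(self, base: nn.Linear, r: int):
        super().__init__()
        self.base = base  # pretrained layer holding W (kept frozen during training)
        out_features, in_features = base.weight.shape
        # Low-rank factors: B starts at zero so delta_W = B @ A is zero at init,
        # meaning the wrapped layer initially behaves exactly like the base layer.
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta_w = self.B @ self.A            # (out_features, in_features), rank <= r
        return self.base(x) + x @ delta_w.T  # W x + (BA) x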
With LoRA, the training process focuses exclusively on the low-rank matrices $B$ and $A$, while the backbone weights $W$ are kept frozen. Because only the parameters of $B$ and $A$ receive gradient updates, training is more stable and the risk of catastrophic forgetting is greatly reduced: the original knowledge encoded in $W$ is preserved throughout fine-tuning.
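Continuing the `LoRALinear` sketch above, freezing the backbone and optimizing only $B$ and $A$ might look like the following. The dimensions, rank, and learning rate are placeholder values, and the loss is a dummy stand-in for a real training objective.

```python
# Using the LoRALinear sketch from above; 768, r=8, and lr=1e-4 are
# illustrative assumptions, not prescribed values.
layer = LoRALinear(nn.Linear(768, 768), r=8)
for p in layer.base.parameters():
    p.requires_grad = False  # freeze the pretrained backbone W (and bias)

trainable = [p for p in layer.parameters() if p.requires_grad]  # just B and A
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

x = torch.randn(4, 768)        # dummy batch
loss = layer(x).pow(2).mean()  # placeholder loss for demonstration
loss.backward()                # gradients flow only into B and A
optimizer.step()
```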
The expressivity of LoRA is governed by the rank $r$ of the low-rank update. A higher rank increases the capacity of the update and allows the model to represent more complex changes, but also increases the number of trainable parameters, reducing parameter efficiency. Conversely, a lower rank means fewer parameters and greater efficiency, but may limit the types of updates that can be represented, potentially hurting final performance if the target task requires more capacity. Choosing $r$ is therefore a trade-off between efficiency and adaptability.
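This trade-off is easy to quantify: $B$ has $\text{out\_features} \times r$ entries and $A$ has $r \times \text{in\_features}$, so the update adds $r \cdot (\text{out\_features} + \text{in\_features})$ trainable parameters. A quick back-of-the-envelope comparison (the $4096 \times 4096$ matrix size is an arbitrary example):

```python
def lora_trainable_params(out_features: int, in_features: int, r: int) -> int:
    # B has out_features * r entries, A has r * in_features entries
    return r * (out_features + in_features)

# Illustrative 4096 x 4096 weight matrix (the size is an assumption)
full = 4096 * 4096
for r in (4, 16, 64):
    lora = lora_trainable_params(4096, 4096, r)
    print(f"r={r}: {lora:,} trainable params ({lora / full:.2%} of full)")
```

Even at $r = 64$, the LoRA update trains only about 3% of the parameters of the full weight matrix, while $r = 4$ trains roughly 0.2%.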
Key Insights:
- LoRA enables efficient adaptation by injecting trainable low-rank matrices into frozen weights;
- Only a small number of parameters are updated, reducing memory and computation needs;
- Keeping the backbone frozen preserves pre-trained knowledge and increases training stability;
- The mathematical formulation ($\Delta W = BA$) ensures updates are low-rank by construction;
- Expressivity is limited by the chosen rank $r$: higher ranks increase capacity but reduce efficiency;
- LoRA may be less effective if the required adaptation cannot be captured by a low-rank update.