Learn Prompt-Based Methods | Core PEFT Methods
Parameter-Efficient Fine-Tuning

Prompt-Based Methods

Prompt-based parameter-efficient fine-tuning methods — such as prompt tuning and prefix tuning — offer a distinct way to adapt large language models.

  • Unlike approaches that modify internal weights, these methods rely on virtual tokens and continuous embeddings to guide model behavior;
  • Virtual tokens are learnable vectors that are not part of the model's original vocabulary;
  • During training, you prepend these virtual tokens to the input sequence;
  • The model treats virtual tokens as additional context, optimizing their embeddings to encode task-specific information;
  • The input to the model becomes a concatenation of learned virtual token embeddings and the embeddings of your actual text.

This setup allows the prompt to serve as a soft instruction or bias, steering the model without changing its internal parameters.
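The concatenation step above can be sketched in a few lines. This is a minimal NumPy illustration of the shapes involved, not a real training loop; all dimensions (`d_model`, `num_virtual`, `seq_len`) are made-up values chosen for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, for illustration only.
d_model = 16       # embedding dimension
num_virtual = 4    # number of learnable virtual tokens
seq_len = 10       # length of the tokenized input text

# Learnable virtual-token embeddings. In practice these are the only
# parameters updated by backpropagation; here they are just random.
soft_prompt = rng.normal(size=(num_virtual, d_model))

# Frozen embeddings of the actual input tokens.
input_embeds = rng.normal(size=(seq_len, d_model))

# The model consumes the concatenation: [virtual tokens ; real tokens].
model_input = np.concatenate([soft_prompt, input_embeds], axis=0)

print(model_input.shape)  # (14, 16): num_virtual + seq_len rows
```

During training, gradients flow only into `soft_prompt`; the base model's embedding table and weights stay frozen.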

Prefix tuning prepends learned embeddings to the key and value states in each attention layer, not just the input. In transformers, attention layers use key, value, and query matrices to relate tokens. By adding prefix embeddings to the key and value matrices, you bias the model's attention at every layer, influencing information flow throughout the network without changing core parameters.
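To make the key/value prepending concrete, here is a single-head scaled dot-product attention sketch in NumPy with illustrative (made-up) sizes. Note that only the keys and values are extended with learned prefixes; the queries, and therefore the output sequence length, are unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
d, seq_len, prefix_len = 8, 5, 3  # hypothetical sizes

# Frozen Q/K/V projections of the input sequence for one attention head.
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))
V = rng.normal(size=(seq_len, d))

# Learned prefix embeddings, prepended to keys and values only.
P_k = rng.normal(size=(prefix_len, d))
P_v = rng.normal(size=(prefix_len, d))
K_pref = np.concatenate([P_k, K], axis=0)  # (prefix_len + seq_len, d)
V_pref = np.concatenate([P_v, V], axis=0)

# Standard scaled dot-product attention over the extended keys/values.
scores = Q @ K_pref.T / np.sqrt(d)         # (seq_len, prefix_len + seq_len)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = weights @ V_pref                     # (seq_len, d): same shape as without a prefix

print(out.shape)  # (5, 8)
```

Because every query can now attend to the prefix positions, the learned embeddings bias attention at that layer without touching the frozen Q/K/V projection weights.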

Attention biasing acts as a soft constraint: instead of enforcing strict rules, you nudge the model's focus toward desired patterns using soft prompts. This approach is flexible, but it creates a capacity bottleneck — all task-specific information must fit into a limited set of learned embeddings. Complex tasks may exceed this capacity, limiting performance.

Prompt-based methods offer maximum flexibility and memory efficiency by updating only prompt embeddings, allowing easy task switching without changing base model parameters. However, they may slow inference, especially with longer prompts or prefix tuning. LoRA modifies model weights with low-rank adapters, which increases training memory use but usually delivers faster inference and leaves the model architecture unchanged.
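A rough back-of-the-envelope count shows why prompt tuning is so memory-light. All sizes below are illustrative assumptions (a hypothetical 4096-dimensional, 32-layer model), not figures from any specific model.

```python
# Trainable-parameter counts under illustrative, assumed sizes.
d_model = 4096
n_layers = 32

# Prompt tuning: only the virtual-token embeddings are trained.
num_virtual = 20
prompt_params = num_virtual * d_model  # 20 * 4096 = 81,920

# LoRA: two low-rank factors (d_model x r and r x d_model) per adapted
# matrix; assume 2 adapted matrices (e.g. q and v projections) per layer.
r = 8
lora_params = n_layers * 2 * (2 * d_model * r)  # 4,194,304

print(prompt_params, lora_params)
```

Even so, LoRA's few million extra parameters are still a tiny fraction of a multi-billion-parameter base model, and its larger adaptation capacity is one reason it tends to handle complex tasks better.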

In summary, prompt-based methods:

  • Enable task adaptation by learning only a small number of virtual token embeddings;
  • Offer high flexibility by allowing easy prompt swapping for different tasks;
  • Minimize memory usage during fine-tuning, as only prompt parameters are updated;
  • May introduce inference slowdowns, especially with long prompts or deep prefix tuning;
  • Suffer from capacity bottlenecks, as all task-specific knowledge must fit in the prompt;
  • Generally less effective for highly complex or nuanced tasks compared to LoRA;
  • Do not require modifying or storing copies of the base model weights.
Section 2. Chapter 3