When Agents Fail and How to Catch It | Trust, Control and Safety

Agents fail in predictable ways. Once you know the patterns, you will catch most problems before they cause any real damage. This chapter covers the most common failure modes, what causes them, and the practical signals that tell you something has gone wrong.

The Four Most Common Failure Modes

1. Hallucination – the agent states something incorrect with full confidence. This happens most often when it is working from general knowledge rather than a document you provided, or when it is asked about recent events, specific numbers, or niche topics outside its training data.

The signal: a specific fact, statistic or name that you cannot trace back to your source material.

2. Instruction drift – the agent follows the first part of your prompt well but gradually drifts away from your instructions as the output gets longer. You ask for five bullet points and get eight. You ask for a formal tone and the last paragraph becomes casual.

The signal: the end of a long output does not match the format or tone of the beginning.

Note

Instruction drift is more common with longer, more complex tasks. If you are asking the agent to produce something lengthy, break the task into smaller steps rather than asking for everything in one prompt.

3. Scope creep – the agent adds information, caveats or sections you did not ask for. This often happens when your prompt is slightly ambiguous and the agent tries to be helpful by filling in what it thinks you might need.

The signal: the output is longer than expected and contains sections that were not in your brief.

4. Overconfident gaps – the agent lacks the information it needs to complete the task but does not flag this clearly. Instead, it produces something that looks complete but glosses over the missing parts.

The signal: a section of the output is noticeably more vague or generic than the rest.

Definition

Instruction drift – when an agent follows the initial instructions in a prompt correctly but gradually deviates from them as it generates longer output. A common cause is that the agent's attention to the original instructions weakens as the response grows.

How to Catch Failures Before They Matter

Three habits will catch most agent failures before they cause problems.

Compare output to prompt. Before reading the output in full, check that it matches the structure and scope you asked for. If you asked for five points and got eight, read the extra three carefully – they may be useful or they may be noise.
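
The structure check described above can be partly automated. The following is a minimal sketch, assuming the output is plain text whose list items start with a dash, asterisk, bullet character or a number. The function names `count_bullets` and `check_bullet_count` are hypothetical and do not come from any particular tool.

```python
import re

# Assumption: a list item starts with "-", "*", "•" or a number
# followed by "." or ")". Adjust the pattern to your own format.
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def count_bullets(text: str) -> int:
    """Count lines that look like list items."""
    return sum(1 for line in text.splitlines() if BULLET.match(line))


def check_bullet_count(output: str, expected: int) -> str:
    """Compare the number of list items with the number requested."""
    found = count_bullets(output)
    if found == expected:
        return "ok"
    return f"expected {expected} bullets, found {found}"


if __name__ == "__main__":
    sample = "\n".join(f"- point {i}" for i in range(8))
    print(check_bullet_count(sample, expected=5))
```

A check like this only confirms the shape of the output. You still need to read any extra points to decide whether they are useful or noise.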

Flag the vague parts. When a section of the output feels generic or non-specific, that is a signal the agent was guessing. Push on those sections specifically – paste them back and ask the agent to be more precise or to indicate if it does not have enough information.
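
As a rough aid for spotting vague sections, you could scan each paragraph for filler phrases. This is only a sketch under strong assumptions: the phrase list `VAGUE_PHRASES` and the threshold are illustrative guesses, not a validated measure of vagueness, and a paragraph can be vague without using any of these phrases.

```python
# Hypothetical phrase list; extend it with wording you see in your own outputs.
VAGUE_PHRASES = (
    "various",
    "a number of",
    "in general",
    "and so on",
    "many factors",
    "it depends",
)


def vague_paragraphs(text: str, threshold: int = 2) -> list[int]:
    """Return the indexes of paragraphs with at least `threshold` filler phrases.

    Paragraphs are assumed to be separated by a blank line.
    Matching is by simple substring, so it is deliberately crude.
    """
    flagged = []
    for index, paragraph in enumerate(text.split("\n\n")):
        lowered = paragraph.lower()
        hits = sum(lowered.count(phrase) for phrase in VAGUE_PHRASES)
        if hits >= threshold:
            flagged.append(index)
    return flagged


if __name__ == "__main__":
    draft = (
        "Sales rose 12% in Q3 after the March launch.\n\n"
        "Various factors played a role, and in general many factors matter."
    )
    print(vague_paragraphs(draft))
```

Treat a flagged paragraph as a prompt to look closer, then paste it back to the agent and ask for specifics or an explicit statement that the information is missing.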

Test claims that matter. Any specific fact, figure or quote that you plan to use should be verified against the source. If there is no source to check against, treat it as a working assumption until you can confirm it.
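
When you do have a source document, one simple first pass is to list every figure in the output that does not appear anywhere in the source. The sketch below assumes both texts are plain strings and that figures are written the same way in each. The name `untraceable_figures` is hypothetical, and the check covers numbers only, not names or quotes.

```python
import re

# Matches integers, decimals and thousands-separated numbers,
# with an optional trailing percent sign, e.g. "37", "4.5", "1,200", "12%".
NUMBER = re.compile(r"\d+(?:[.,]\d+)*%?")


def untraceable_figures(output: str, source: str) -> list[str]:
    """Return figures found in `output` that never occur in `source`.

    Order of first appearance is kept and duplicates are dropped.
    """
    source_figures = set(NUMBER.findall(source))
    missing: list[str] = []
    for figure in NUMBER.findall(output):
        if figure not in source_figures and figure not in missing:
            missing.append(figure)
    return missing


if __name__ == "__main__":
    source = "Revenue grew 12% in 2023 to 4.5 million."
    output = "Revenue grew 12% in 2023, reaching 4.5 million across 37 markets."
    print(untraceable_figures(output, source))
```

A figure that shows up in this list is not necessarily wrong, but it is one you cannot trace to your material, so it should stay a working assumption until confirmed.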

[Screenshot: Claude – a follow-up prompt asking the agent to clarify or source a specific claim from its previous response]

What should I do when I catch an error?

The right response depends on how significant the error is and whether it keeps recurring.

Minor error

For a minor error – a wrong date, a slightly off figure, a tone mismatch – correct it yourself and move on. It is faster than asking the agent to fix it.

Structural error

For a structural error – the agent misunderstood the task, the output is missing a major section, or the approach is fundamentally wrong – stop and rewrite the prompt. Add the context or constraints that would have prevented the error, and run it again.

Pattern of errors across multiple outputs

For a pattern of errors across multiple outputs – the same type of mistake appearing repeatedly – update your standing instructions to address it directly. If the agent keeps adding unnecessary caveats, add a line to your instructions that says "do not add caveats unless I ask for them".
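
To notice a pattern, you need some record of what went wrong across outputs. A minimal sketch follows, assuming you keep a simple log with one short label per error you caught. The labels, the function name `recurring_errors` and the cut-off of three repeats are all illustrative assumptions.

```python
from collections import Counter


def recurring_errors(error_log: list[str], min_repeats: int = 3) -> list[str]:
    """Return error labels seen at least `min_repeats` times, most frequent first."""
    counts = Counter(error_log)
    return [label for label, n in counts.most_common() if n >= min_repeats]


if __name__ == "__main__":
    log = [
        "unneeded caveat",
        "wrong date",
        "unneeded caveat",
        "tone mismatch",
        "unneeded caveat",
    ]
    # Each label returned is a candidate for a new line in your standing instructions.
    print(recurring_errors(log))
```

Errors that appear only once or twice are usually better handled as minor or structural errors. The ones that cross the threshold are the ones worth addressing in your standing instructions.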
