LLM Failures Zoo
A curated collection of real-world LLM failures. Learn from production incidents, understand failure patterns, and build better guardrails.
JSON field drift breaks downstream parser
Model returns null instead of expected string type, causing parser crash
Problem
After a prompt refactor, the status field started coming back as null instead of a string
Fix
Added explicit type constraint in prompt
# .promptproof.yml
tests:
  - name: order-status-extraction
    schema:
      type: object
      required: ["status"]
      properties:
        status:
          type: string
          enum: ["pending", "processing", "shipped", "delivered"]
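For readers who want to see what this schema demands, here is a minimal sketch of an equivalent check in plain JavaScript. The function name validateOrderStatus and the shape of its return value are hypothetical and not part of the tool; only the field name and the allowed values come from the config above.

```javascript
// Hypothetical stand-alone check mirroring the schema above.
// It does not use or reproduce the tool's own validator.
const ALLOWED_STATUSES = ["pending", "processing", "shipped", "delivered"];

function validateOrderStatus(rawResponse) {
  let parsed;
  try {
    parsed = JSON.parse(rawResponse);
  } catch (err) {
    return { ok: false, reason: "response is not valid JSON" };
  }
  // type: object (arrays and null are rejected)
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return { ok: false, reason: "response is not a JSON object" };
  }
  // required: ["status"] and type: string; this is the branch that catches null
  if (typeof parsed.status !== "string") {
    return { ok: false, reason: "status is missing or not a string" };
  }
  // enum
  if (!ALLOWED_STATUSES.includes(parsed.status)) {
    return { ok: false, reason: `unexpected status: ${parsed.status}` };
  }
  return { ok: true, reason: "" };
}

console.log(validateOrderStatus('{"status": null}'));
console.log(validateOrderStatus('{"status": "shipped"}'));
```

Running a check like this at the parser boundary turns the null regression into a reported validation failure rather than a crash further downstream.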
PII slip in support reply
Full email and phone number exposed in customer support response
Problem
Model included sensitive customer data in public-facing response
Fix
Added PII detection regex patterns
# .promptproof.yml
tests:
  - name: no-pii-leak
    deny_patterns:
      - '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
      - '\b\d{3}[-.]?\d{3}[-.]?\d{4}\b'
      - '\b\d{3}-\d{2}-\d{4}\b'
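The three patterns can be tried outside the tool with a short JavaScript sketch. The labels email, phone, and ssn are assumptions based on what each pattern appears to match (the third resembles the US Social Security number format), and findPii is a hypothetical helper name.

```javascript
// The same three patterns as the config above, with assumed labels.
const PII_PATTERNS = {
  email: /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/,
  phone: /\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/,
};

// Returns the labels of every pattern that matches the text.
function findPii(text) {
  return Object.entries(PII_PATTERNS)
    .filter(([, pattern]) => pattern.test(text))
    .map(([label]) => label);
}

console.log(findPii("Contact jane@example.com or 555-123-4567"));
console.log(findPii("Your order has shipped."));
```

Regexes of this kind only catch well-formed values in a few formats, so an empty result does not show that a reply is free of personal data.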
Tool hallucination triggers phantom calendar event
Model invents non-existent tool function causing system errors
Problem
LLM called schedule_meeting() which doesn't exist in tool registry
Fix
Whitelist allowed tool names in validation
# .promptproof.yml
tests:
  - name: valid-tool-use
    custom: |
      const output = JSON.parse(response);
      const allowedTools = ['send_email', 'check_calendar'];
      return allowedTools.includes(output.tool);
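The custom check above throws if the response is not valid JSON. A slightly more defensive variant, written as a stand-alone function for illustration, might look like the sketch below. The name isAllowedToolCall is hypothetical; the tool names are the two from the config.

```javascript
// Allowlist taken from the config above.
const ALLOWED_TOOLS = new Set(["send_email", "check_calendar"]);

// Returns true only when the response is a JSON object whose `tool`
// field names a registered tool. Malformed JSON counts as a failure.
function isAllowedToolCall(rawResponse) {
  try {
    const output = JSON.parse(rawResponse);
    return (
      output !== null &&
      typeof output === "object" &&
      ALLOWED_TOOLS.has(output.tool)
    );
  } catch {
    return false;
  }
}

console.log(isAllowedToolCall('{"tool": "schedule_meeting"}')); // false
console.log(isAllowedToolCall('{"tool": "send_email"}')); // true
```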
Summary invents fact with high confidence
Model adds information not present in source text
Problem
Summary included a "40% revenue increase" that the original report never mentions
Fix
Added factual grounding check
# .promptproof.yml
tests:
  - name: factual-summary
    grounding:
      method: semantic_similarity
      threshold: 0.85
      source: context
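Semantic similarity needs an embedding model, so it cannot be shown in a few self-contained lines. As a much cruder stand-in (a lexical check, not the method configured above), the sketch below flags any number in the summary that never appears in the source. The function name and the sample strings are hypothetical.

```javascript
// Crude lexical grounding check: every number (optionally with a
// decimal part or a trailing %) in the summary must also occur in the source.
// This is NOT semantic similarity; it only catches invented figures.
function ungroundedNumbers(summary, source) {
  const numberPattern = /\d+(?:\.\d+)?%?/g;
  const sourceNumbers = new Set(source.match(numberPattern) || []);
  return (summary.match(numberPattern) || []).filter(
    (n) => !sourceNumbers.has(n)
  );
}

const source = "The Q3 report shows stable revenue.";
const summary = "Revenue rose 40% in Q3.";
console.log(ungroundedNumbers(summary, source)); // [ '40%' ]
```

A check like this would have caught the invented figure in this case, but it says nothing about fabricated claims that contain no numbers.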
Refusal regression after prompt refactor
Model starts refusing legitimate requests after prompt update
Problem
New safety instructions too restrictive, blocking valid use cases
Fix
Adjusted safety boundaries in system prompt
# .promptproof.yml
tests:
  - name: no-false-refusal
    deny_patterns:
      - 'I cannot'
      - 'I am unable'
      - "I won't"
    context: legitimate_requests
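A minimal sketch of the same refusal check in plain JavaScript, assuming the patterns are applied as case-sensitive regular expressions (how the tool itself matches them is not stated here). The helper name looksLikeRefusal is hypothetical.

```javascript
// The three refusal phrases from the config above.
const REFUSAL_PATTERNS = [/I cannot/, /I am unable/, /I won't/];

// True when a response to a legitimate request contains a refusal phrase.
function looksLikeRefusal(response) {
  return REFUSAL_PATTERNS.some((pattern) => pattern.test(response));
}

console.log(looksLikeRefusal("I cannot help with that request.")); // true
console.log(looksLikeRefusal("Here is a summary of the refund policy.")); // false
```

Run against a fixed set of known-legitimate requests, any true result points to a refusal regression.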
Unsafe SQL generation allows injection
Generated SQL query vulnerable to injection attacks
Problem
Model directly interpolates user input without parameterization
Fix
Enforce parameterized query generation
# .promptproof.yml
tests:
  - name: safe-sql
    deny_patterns:
      - 'DROP\s+TABLE'
      - 'DELETE\s+FROM'
      - '--'
    require_patterns:
      - '\?|:\w+|\$\d+'
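To show how the deny and require patterns combine, here is a small JavaScript sketch. The function name checkGeneratedSql is hypothetical, and the case-insensitive flag on the first two patterns is an assumption added for illustration; the config above does not specify one.

```javascript
// Deny patterns from the config; the `i` flag is an added assumption.
const DENY_PATTERNS = [/DROP\s+TABLE/i, /DELETE\s+FROM/i, /--/];
// At least one placeholder style must appear: ?, :name, or $1.
const REQUIRE_PATTERN = /\?|:\w+|\$\d+/;

// Returns a list of violations; an empty list means the query passed.
function checkGeneratedSql(sql) {
  const violations = [];
  for (const pattern of DENY_PATTERNS) {
    if (pattern.test(sql)) {
      violations.push(`matches denied pattern ${pattern}`);
    }
  }
  if (!REQUIRE_PATTERN.test(sql)) {
    violations.push("no parameter placeholder found");
  }
  return violations;
}

console.log(checkGeneratedSql("SELECT * FROM users WHERE name = $1"));
console.log(checkGeneratedSql("SELECT * FROM users WHERE name = 'bob'"));
```

Pattern matching of this kind is a heuristic: a ? inside a string literal would satisfy the require pattern even though the query is not parameterized.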
Output format inconsistency breaks API
Model switches between array and object format randomly
Problem
Inconsistent response structure causes API parsing failures
Fix
Strict JSON schema validation
# .promptproof.yml
tests:
  - name: consistent-format
    schema:
      type: object
      properties:
        features:
          type: array
          items:
            type: string
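The schema can be read as the following plain JavaScript check, given here as a sketch with a hypothetical function name. Because the schema does not list features as required, an object without that field passes under standard JSON Schema semantics, and the sketch mirrors that.

```javascript
// Stand-alone reading of the schema above: the top level must be an
// object, and `features`, when present, must be an array of strings.
function hasConsistentFormat(rawResponse) {
  let parsed;
  try {
    parsed = JSON.parse(rawResponse);
  } catch {
    return false;
  }
  // type: object (a bare top-level array is the failure in this case)
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    return false;
  }
  // `features` is not marked required in the schema, so absence passes.
  if (!("features" in parsed)) {
    return true;
  }
  return (
    Array.isArray(parsed.features) &&
    parsed.features.every((item) => typeof item === "string")
  );
}

console.log(hasConsistentFormat('{"features": ["fast", "light"]}')); // true
console.log(hasConsistentFormat('["fast", "light"]')); // false
```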
Infinite loop in chain-of-thought
Model enters recursive reasoning loop, exploding token usage
Problem
Unclear termination condition causes endless self-prompting
Fix
Token limit and step counter
# .promptproof.yml
tests:
  - name: bounded-reasoning
    max_tokens: 150
    custom: |
      const steps = response.match(/Step \d+/g);
      return steps !== null && steps.length <= 3;
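The step counter can be exercised on its own with a sketch like the one below. The function name isBoundedReasoning and the maxSteps parameter are hypothetical. As in the custom check above, a response with no "Step N" marker fails. The token limit is not covered, since counting tokens requires a tokenizer.

```javascript
// Counts "Step N" markers and enforces an upper bound.
// Mirrors the custom check: a response with no step markers fails.
function isBoundedReasoning(response, maxSteps = 3) {
  const steps = response.match(/Step \d+/g) || [];
  return steps.length >= 1 && steps.length <= maxSteps;
}

const shortChain = "Step 1: read input\nStep 2: answer";
const runaway = "Step 1\nStep 2\nStep 3\nStep 4";
console.log(isBoundedReasoning(shortChain)); // true
console.log(isBoundedReasoning(runaway)); // false
```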