Guardrails Design – Stopping Models Before They Go Wrong
Building safe harnesses with input filters, output validation, and action limits
Claude Code asks "Should I delete this file?" before acting. That's an execution-stage guardrail: delete without asking, and there's no recovery.
Input Guardrails
Validate inputs before they reach the model.
Prompt injection defense – separate user input from trusted instructions so it can't override the system prompt. Wrapping input in XML tags or delimiters is the standard approach.
Token length limits – when input exceeds the context window, you need a truncation strategy; naive truncation can drop critical information.
PII masking – mask personal data (emails, phone numbers, SSNs) before it reaches the model.
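The three input checks above can be sketched in a few lines. This is a minimal illustration, not a production filter: the regexes, the `MAX_INPUT_CHARS` budget, and the `<user_input>` tag name are all hypothetical choices.

```python
import re

# Hypothetical character budget as a rough proxy for a token limit;
# real values depend on your model's context window.
MAX_INPUT_CHARS = 8000

# Illustrative PII patterns only; real masking needs far broader coverage.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\d{3}[-.]\d{3,4}[-.]\d{4}\b"), "[PHONE]"),
]

def mask_pii(text: str) -> str:
    """Replace recognizable personal data with placeholders."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

def sanitize_input(user_input: str) -> str:
    """Mask PII, enforce a length budget, and delimit the input."""
    masked = mask_pii(user_input)
    # Naive truncation as a last resort; a real system might summarize instead.
    if len(masked) > MAX_INPUT_CHARS:
        masked = masked[:MAX_INPUT_CHARS]
    # Delimiters keep user text from masquerading as system instructions.
    return f"<user_input>{masked}</user_input>"
```

Order matters here: masking runs before truncation so a cut never splits (and thereby hides) a partially matched PII string.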
Execution Guardrails
Restrict model actions when using tools.
Allowlist-based tool use – explicitly list which tools the model may call. "File reads OK, file deletes need approval."
Timeout and retry limits – prevent infinite loops; if 5 retries fail, escalate to a human.
Sandboxed execution – run model-generated code in isolated environments. Docker containers or VMs protect the host.
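The allowlist and retry rules above can be enforced with a small gate in front of every tool call. A hedged sketch: the tool names, the approval flag, and `MAX_RETRIES = 5` are assumptions for illustration.

```python
# Hypothetical policy: reads are auto-approved, destructive tools need a human.
ALLOWED_TOOLS = {"read_file", "list_dir", "search"}
APPROVAL_REQUIRED = {"delete_file", "write_file"}
MAX_RETRIES = 5

class ToolDenied(Exception):
    """Raised when the model requests a tool outside its policy."""

def check_tool_call(tool_name: str, approved_by_human: bool = False) -> None:
    """Allow allowlisted tools; gated tools only with explicit approval."""
    if tool_name in ALLOWED_TOOLS:
        return
    if tool_name in APPROVAL_REQUIRED and approved_by_human:
        return
    raise ToolDenied(f"tool '{tool_name}' is not permitted")

def run_with_retries(fn, max_retries: int = MAX_RETRIES):
    """Stop after max_retries failures instead of looping forever."""
    for _ in range(max_retries):
        try:
            return fn()
        except Exception:
            continue
    # Escalation point: hand off to a human rather than retry again.
    raise RuntimeError("retries exhausted; escalating to a human")
```

Note the gate sits outside the model: even a fully compromised prompt cannot call `delete_file` without the approval flag being set by the harness.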
Output Guardrails
Validate model responses before delivering to users.
Schema validation – verify that JSON output matches the expected schema; catch missing required fields and type mismatches.
Harmful content filters – block inappropriate model outputs, sometimes using a separate classifier model.
Grounding checks – verify that information the model cites actually exists in the provided context. This catches hallucinated claims.
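Schema validation and a crude grounding check can be sketched as below. The `answer`/`source` schema and the substring-based grounding test are illustrative assumptions; real grounding checks are usually fuzzier (embedding similarity, NLI models).

```python
import json

# Hypothetical schema: the model must return {"answer": str, "source": str}.
REQUIRED_FIELDS = {"answer": str, "source": str}

def validate_schema(raw: str) -> dict:
    """Parse JSON and check required fields and their types."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field '{field}' must be {expected_type.__name__}")
    return data

def is_grounded(quote: str, context: str) -> bool:
    """Crude grounding check: the cited text must appear in the context."""
    return quote.strip() in context
```

A response that parses but cites text absent from the context is rejected before the user ever sees it.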
How It Works
Input guardrails – prompt injection defense, token length limits, PII masking
Execution guardrails – allowlist-based tool use, timeouts, sandboxing
Output guardrails – schema validation, harmful content filters, grounding checks
All three stages are needed for complete guardrails – any one alone leaves gaps
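The layering itself can be shown as a minimal harness that runs the three stages in order. Each `check_*` below is a stand-in stub for the fuller checks discussed earlier, and `model` is any callable that takes a prompt and returns raw text.

```python
import json

def check_input(text: str) -> str:
    """Stage 1 stub: length limit plus delimiting."""
    if len(text) > 8000:  # hypothetical budget
        raise ValueError("input too long")
    return f"<user_input>{text}</user_input>"

def check_action(tool_name: str) -> None:
    """Stage 2 stub: allowlist for any tool the model requests."""
    if tool_name not in {"read_file", "search"}:
        raise PermissionError(f"tool '{tool_name}' denied")

def check_output(raw: str) -> dict:
    """Stage 3 stub: parse; schema and grounding checks would go here."""
    return json.loads(raw)

def guarded_turn(user_text: str, model) -> dict:
    prompt = check_input(user_text)   # stage 1: input
    raw = model(prompt)               # stage 2 wraps each tool call via check_action
    return check_output(raw)          # stage 3: output
```

Skipping any stage reopens a gap: without stage 1 the model sees raw injections, without stage 2 a compromised model can act, and without stage 3 a bad response still reaches the user.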