Agent Guardrails

Safety mechanisms that constrain agent behavior within acceptable boundaries, preventing harmful actions, excessive spending, or unauthorized access. Guardrails operate at the prompt, tool, and system levels to enforce policies.

Agent guardrails are the safety infrastructure that makes production agent deployment responsible. They include input validation (blocking prompt injection attempts), output filtering (preventing harmful or off-brand responses), action constraints (limiting which tools can be called and with what parameters), and resource limits (capping token usage, API calls, and execution time).
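Two of these categories, input validation and resource limits, can be sketched in a few lines. This is an illustrative minimal example, not a production filter; the pattern list, class names, and thresholds are all hypothetical, and real deployments would use far more robust detection than simple regexes.

```python
import re

# Hypothetical injection patterns; a real system would use a much
# broader detector (classifiers, canary strings, etc.).
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"reveal your system prompt", re.IGNORECASE),
]

def validate_input(user_message: str) -> bool:
    """Input validation: reject messages matching known injection patterns."""
    return not any(p.search(user_message) for p in INJECTION_PATTERNS)

class ResourceLimiter:
    """Resource limits: cap token usage and tool calls per conversation."""

    def __init__(self, max_tokens: int = 50_000, max_tool_calls: int = 20):
        self.max_tokens = max_tokens
        self.max_tool_calls = max_tool_calls
        self.tokens_used = 0
        self.tool_calls = 0

    def charge(self, tokens: int, tool_calls: int = 0) -> bool:
        """Record usage; return False once any hard limit is exceeded."""
        self.tokens_used += tokens
        self.tool_calls += tool_calls
        return (self.tokens_used <= self.max_tokens
                and self.tool_calls <= self.max_tool_calls)
```

The agent loop would call `validate_input` before each model turn and stop the conversation as soon as `charge` returns `False`, independent of anything the model outputs.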

For any team deploying agents that interact with customers or modify production systems, guardrails are non-negotiable. Implement them in layers: prompt-level guardrails instruct the model on boundaries, tool-level guardrails validate parameters before execution, and system-level guardrails enforce hard limits regardless of model behavior.

Common guardrails include spending caps per conversation, allowlists for permitted actions, PII detection and redaction, and content policy enforcement. Test guardrails adversarially, as the model may find creative ways to work around soft constraints. Hard system-level limits that cannot be bypassed by model outputs are your last line of defense.
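The layered approach can be sketched as a guarded tool dispatcher, where the allowlist, parameter checks, and spending cap are all enforced outside the model loop. All names, tools, and limits below are hypothetical placeholders, not a real framework's API.

```python
# Hypothetical allowlist and per-conversation spending cap.
ALLOWED_TOOLS = {"search_docs", "send_email"}
MAX_SPEND_USD = 5.00  # hard system-level cap

class GuardrailViolation(Exception):
    """Raised when a requested action breaches a guardrail."""

def execute_tool(name: str, params: dict,
                 spend_so_far: float, tool_cost: float) -> float:
    """Run a tool request through layered guardrails; return updated spend."""
    # System-level guardrail: hard spending cap, enforced regardless
    # of what the model requested.
    if spend_so_far + tool_cost > MAX_SPEND_USD:
        raise GuardrailViolation("spending cap exceeded")
    # Tool-level guardrail: only allowlisted tools may run.
    if name not in ALLOWED_TOOLS:
        raise GuardrailViolation(f"tool {name!r} not on allowlist")
    # Tool-level guardrail: validate parameters before execution,
    # e.g. restrict email recipients to an internal domain.
    if name == "send_email" and not params.get("to", "").endswith("@example.com"):
        raise GuardrailViolation("recipient outside permitted domain")
    # ... dispatch to the real tool implementation here ...
    return spend_so_far + tool_cost
```

Because the cap and allowlist live in the dispatcher rather than in the prompt, no model output can talk its way past them, which is exactly the last-line-of-defense property described above.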
