Definition
Guardrails are programmable rules and constraints placed around a large language model at runtime to keep its behaviour safe, on-topic, and within policy. Rather than altering the model's training or weights, guardrails sit between the user and the model — and between the model and the application — inspecting, validating, and where necessary blocking or rewriting what passes through. They are an application-layer control, which means a self-hoster can add or change them without retraining anything.
Input rails and output rails
Guardrail frameworks typically distinguish several stages. Input rails screen the user's request — classifying it, and blocking or redirecting prompts that are off-topic, harmful, or adversarial (a defence against prompt injection). Output rails check the model's response before it reaches the user, validating it against content policy, format requirements, or factual-accuracy checks. Additional rails can govern retrieval results, dialogue flow, and tool or action calls made by an AI agent.
Why they matter for self-hosted AI
When you run a model locally for real tasks, guardrails are how you enforce your own boundaries: refusing certain topics, blocking sensitive-data leakage, and ensuring outputs match the structure your software expects. Open toolkits such as NVIDIA's NeMo Guardrails make these rules declarative and inspectable.
Guardrails complement careful prompt engineering and a well-designed system prompt, but they enforce constraints in code rather than relying on instructions alone.
In Simple Terms
Guardrails are programmable rules and constraints placed around a large language model at runtime to keep its behaviour safe, on-topic, and within policy. Rather than…
