Configure custom guardrails for PII and content
Most of what an organisation needs to keep out of its AI traffic is specific to that organisation. Generic provider safety filters catch the obvious categories (self-harm, explicit content, and overt abuse), but they know nothing about the internal project codename that must never reach a third-party model, the customer-record format that counts as personally identifiable information (PII) under the organisation's own policy, or the regulatory phrasing that the legal team requires to be stripped from outbound responses. A custom guardrail is the mechanism for encoding those organisation-defined rules and having Tetrate Agent Router enforce them inline, on every request, before content reaches a model or returns to a caller.
A guardrail is a content-filtering rule (PII, keyword, regular expression, or machine-learning-based) enforced inline by the gateway. Custom guardrails are the rules an organisation defines for itself, in contrast to the vendor guardrails that a model provider ships natively. The two are complementary: vendor guardrails cover the provider's notion of safety, and custom guardrails cover the organisation's. This guide covers the work of building the custom kind in the Admin Dashboard: choosing a rule type, deciding where in the request lifecycle it runs, scoping it to the right traffic, setting the action taken on a match, testing it before it is allowed to block anything, and confirming that its decisions land in the audit log.
Persona: Platform operator working in the Admin Dashboard, often alongside security and compliance stakeholders who own the underlying policy.
Estimated time: 20 to 40 minutes for a first guardrail, including time spent testing; less once the pattern is familiar.
When this guide applies
This guide is the right starting point in any of these situations:
| Situation | What it covers |
|---|---|
| Redacting PII from prompts before they reach an external model | A PII-detection guardrail applied to inbound traffic with a redact action |
| Blocking an internal codename or restricted term from leaving the organisation | A keyword guardrail applied to outbound responses with a block action |
| Enforcing a structured-format rule that vendor filters do not understand | A regular-expression guardrail scoped to the relevant traffic |
| Detecting a category of content that simple matching cannot describe | A machine-learning-based guardrail with a flag or block action |
| Piloting a new content policy without disrupting live traffic | Testing a guardrail and running it in a flag-only mode before enforcement |
For the provider-native side (enabling and tuning the safety filters a model vendor supplies), see Configure vendor guardrails. For how a developer attaches an existing guardrail to a specific request path from the Console, see Protect requests with guardrails.
Outcomes
By the end of this guide:
- At least one custom guardrail exists with an appropriate rule type and match action.
- The guardrail is applied to inbound prompts, outbound responses, or both, as the policy requires.
- The guardrail is scoped to the intended traffic: a single model, a routing policy, or platform-wide.
- The guardrail has been tested against representative content before being allowed to enforce.
- The relationship between custom guardrails, vendor guardrails, and the audit log is clear.
Prerequisites
- Administrator access to the Admin Dashboard, typically the `super_admin` role or a role granted guardrail-management permissions.
- A written statement of the policy the guardrail is meant to enforce. The most reliable guardrails start from a compliance or security requirement expressed in plain language, not from a pattern invented at configuration time.
- For keyword and regular-expression rules: the specific terms or the pattern to be matched, ideally reviewed with the stakeholder who owns the policy.
- Representative sample content, both content that should match and content that should not, for the testing step.
Step 1: choose the rule type
The rule type determines how a guardrail decides whether a piece of content matches. The platform supports four types, and the right one is dictated by what the policy is trying to catch.
| Rule type | What it matches | When to use it |
|---|---|---|
| PII detection | Recognised classes of personally identifiable information (names, email addresses, phone numbers, payment-card numbers, and similar) | Removing or blocking PII that should never reach an external model |
| Keyword | A fixed list of exact terms or phrases | Restricted codenames, banned product names, or any closed set of known strings |
| Regular expression | Text matching a supplied pattern | Structured formats such as internal record identifiers or ticket references that follow a predictable shape |
| Machine-learning classification | A category of content inferred by a model rather than described by a pattern | Toxicity, sensitive topics, or any category too varied to enumerate as keywords or a regex |
The choice is not always exclusive. A PII requirement is often met with the PII-detection type for the well-known classes and a regular-expression rule for an organisation-specific identifier that the PII detector does not recognise. Where a policy spans several of these, several guardrails are configured, each doing one job well, rather than one rule stretched to cover everything.
A common early mistake is reaching for a regular expression where PII detection or a keyword list would be clearer and more maintainable. Regular expressions are easy to get subtly wrong, and a pattern that is too broad blocks legitimate traffic while one that is too narrow misses the cases it was written for. The testing step exists in part to catch exactly this.
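The difference between an over-broad and an over-narrow pattern can be illustrated with a minimal Python sketch. The identifier format (`CR-` followed by six digits) and the sample strings are hypothetical, chosen only for illustration, and the pattern syntax the platform accepts may differ from Python's `re` module.

```python
import re

# Hypothetical record format: "CR-" followed by exactly six digits.
TOO_BROAD = re.compile(r"CR-\d+")        # any run of digits after "CR-"
TOO_NARROW = re.compile(r"^CR-\d{6}$")   # only matches when the ID is the entire text
BALANCED = re.compile(r"\bCR-\d{6}\b")   # six digits, delimited by word boundaries

samples = [
    "Please look up CR-482913 for me.",          # contains a real identifier
    "See change request CR-12 in the tracker.",  # different format, should pass
]

patterns = {"too_broad": TOO_BROAD, "too_narrow": TOO_NARROW, "balanced": BALANCED}

# For each pattern, record whether it matches each sample.
results = {
    name: [pattern.search(text) is not None for text in samples]
    for name, pattern in patterns.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

In this sketch the broad pattern also fires on the unrelated `CR-12` reference, the narrow pattern misses the identifier because it is embedded in a sentence, and only the bounded pattern matches the first sample alone.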
Step 2: create the guardrail
The guardrail entry carries the rule type chosen in Step 1, the matching configuration that type requires, and the match action.
- Sign in to the Admin Dashboard.
- Open the guardrails section from the sidebar.
- Review the existing guardrails. Each entry surfaces the rule type, where it applies in the request lifecycle, its scope, its action, and whether it is currently enforcing.
- Create a new guardrail.
- Give it a descriptive name. The name appears wherever the guardrail is referenced, so a phrase that states the policy (such as `Redact customer PII inbound` or `Block project codenames outbound`) is more useful than a generic label.
- Select the rule type, then supply the matching configuration:
  - For a PII-detection rule, select the classes of PII to detect.
  - For a keyword rule, enter the list of terms.
  - For a regular-expression rule, enter the pattern.
  - For a machine-learning-classification rule, select the category and, if offered, a sensitivity level.
- Set the match action, covered in Step 3.
- Save the guardrail.
A newly created guardrail remains in a non-enforcing state until it has been both tested and applied. Creating it does not, by itself, change how any request is handled.
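As a way of thinking about what a guardrail entry holds, the following Python sketch models a draft guardrail and checks that the matching configuration fits the rule type. Every class, field, and key name here is a hypothetical stand-in for illustration; it is not the platform's schema or API.

```python
from dataclasses import dataclass
from enum import Enum


class RuleType(Enum):
    PII = "pii_detection"
    KEYWORD = "keyword"
    REGEX = "regular_expression"
    ML = "ml_classification"


# Hypothetical configuration key that each rule type needs before it can be saved.
REQUIRED_CONFIG = {
    RuleType.PII: "pii_classes",
    RuleType.KEYWORD: "terms",
    RuleType.REGEX: "pattern",
    RuleType.ML: "category",
}


@dataclass
class GuardrailDraft:
    name: str
    rule_type: RuleType
    config: dict
    action: str = "flag"      # flag is the least disruptive starting action
    enforcing: bool = False   # creating a guardrail does not make it enforce

    def missing_config(self):
        """Return the configuration keys this rule type still needs."""
        key = REQUIRED_CONFIG[self.rule_type]
        return [] if self.config.get(key) else [key]


draft = GuardrailDraft(
    name="Block project codenames outbound",
    rule_type=RuleType.KEYWORD,
    config={"terms": ["raven"]},  # hypothetical codename
)
incomplete = GuardrailDraft("Redact customer PII inbound", RuleType.PII, config={})
print(draft.missing_config(), incomplete.missing_config())
```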
Step 3: decide the action on a match
The action is what the gateway does when content matches the rule. Three actions are available, and they differ sharply in how visible and how disruptive they are.
| Action | What happens on a match | Typical use |
|---|---|---|
| Redact | The matching span is removed or masked, and the request continues with the sanitised content | PII that should be stripped but should not stop the request |
| Block | The request is rejected and does not reach the model, or the response is withheld from the caller | Content that must never pass under any circumstances |
| Flag | The content passes unchanged, but the match is recorded | Observing how often a rule would fire before it is allowed to enforce |
Redact is the right action when the goal is to let the interaction proceed without the sensitive content, the canonical case for PII on inbound prompts. Block is the right action when a match means the interaction itself is not permitted. Flag is the safest action to start with for any new rule: it produces the same audit signal as the others without changing what callers experience, which makes it the natural mode for the pilot described in Step 6.
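The three actions can be contrasted in a minimal Python sketch. The function name, the `[REDACTED]` mask, and the simplified email pattern are assumptions made for illustration; the gateway's own masking format and detection logic are not specified in this guide.

```python
import re

# Deliberately simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def apply_action(action, pattern, text):
    """Return (content_to_forward, matched); content is None when blocked."""
    if pattern.search(text) is None:
        return text, False            # no match: content passes untouched
    if action == "redact":
        return pattern.sub("[REDACTED]", text), True  # mask the span, continue
    if action == "block":
        return None, True             # nothing is forwarded
    if action == "flag":
        return text, True             # unchanged, but the match is recorded
    raise ValueError(f"unknown action: {action}")


message = "Contact jane.doe@example.com today"
for action in ("redact", "block", "flag"):
    print(action, apply_action(action, EMAIL, message))
```

All three actions report the match, which is why flag yields the same audit signal as redact and block; only the forwarded content differs.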
Step 4: choose the direction
A guardrail can inspect inbound prompts, outbound responses, or both. The direction is a property of the policy, not a default to accept without thought.
- Inbound inspects the prompt on its way to the model. PII redaction almost always belongs here: the sensitive content is removed before it ever leaves the organisation's boundary.
- Outbound inspects the model's response before it returns to the caller. Rules that govern what may leave the platform (a restricted term that a model might generate, or a category of content that must not be returned) belong here.
- Both applies the same rule in each direction. This is appropriate where the concern is symmetric, such as a keyword that must neither be sent to a model nor surfaced in a response.
Applying a rule in a direction where it cannot match wastes inspection effort and clutters the audit log with rules that never fire. Matching the direction to the policy keeps both the enforcement and the audit trail meaningful.
Step 5: scope the guardrail
Scope determines which traffic the guardrail inspects. The platform supports three levels, in increasing breadth.
| Scope | What it covers | When to use it |
|---|---|---|
| Per model | Only requests routed to a specific model | A rule that matters for one provider, for example stripping PII only before traffic reaches an external model, while an internal model is exempt |
| Per routing policy | All traffic governed by a named routing policy | A rule that should follow a defined class of traffic regardless of which backend serves it |
| Platform-wide | Every request through the gateway | A baseline rule that must hold everywhere, such as a universal block on a specific restricted term |
Narrow scopes are easier to reason about and less likely to produce surprising blocks, but a rule that must hold universally should be scoped platform-wide rather than replicated across many models. A useful pattern is a small set of platform-wide baseline guardrails for the rules that admit no exception, with narrower per-model or per-policy guardrails layered on top for context-specific concerns. Where several guardrails apply to the same request, each is evaluated; a block from any one of them stops the request.
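The layering of a platform-wide baseline with narrower rules, and the way a single block stops a request, can be sketched as follows. The class names, scope labels, model names, and keyword-only matching are hypothetical simplifications, not the platform's evaluation logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guardrail:
    name: str
    scope: str              # "platform", "model", or "routing_policy"
    target: Optional[str]   # model or policy name; None for platform-wide
    action: str             # "redact", "block", or "flag"
    term: str               # a single keyword keeps the sketch small


@dataclass
class Request:
    model: str
    routing_policy: str
    text: str


def in_scope(guardrail, request):
    """Decide whether a guardrail inspects this request at all."""
    if guardrail.scope == "platform":
        return True
    if guardrail.scope == "model":
        return guardrail.target == request.model
    if guardrail.scope == "routing_policy":
        return guardrail.target == request.routing_policy
    return False


def evaluate(guardrails, request):
    """Evaluate every in-scope guardrail; any block stops the request."""
    fired = [
        g for g in guardrails
        if in_scope(g, request) and g.term.lower() in request.text.lower()
    ]
    allowed = not any(g.action == "block" for g in fired)
    return allowed, [g.name for g in fired]


guardrails = [
    Guardrail("Baseline codename block", "platform", None, "block", "raven"),
    Guardrail("Flag pricing for external model", "model", "external-llm", "flag", "pricing"),
]
external = Request("external-llm", "default", "Draft the pricing page")
internal = Request("internal-llm", "default", "Summarise Project Raven pricing")
print(evaluate(guardrails, external))
print(evaluate(guardrails, internal))
```

The per-model rule fires only for the external model, while the platform-wide rule applies to both requests and blocks the one that mentions the codename.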
Step 6: test before enforcing
A guardrail that has never been tested against real content is a guardrail that will eventually block something it should not, or miss something it should catch. The platform provides a way to evaluate a rule against sample content before it is allowed to enforce.
- Open the guardrail and locate its test surface.
- Submit content that should match, and confirm the guardrail reports a match and the configured action.
- Submit content that should not match, and confirm the guardrail leaves it untouched. This second case catches the over-broad rule: the regular expression that matches more than intended, or the keyword that collides with legitimate text.
- For any rule of consequence, run it in flag mode against live traffic for a period before switching it to redact or block. Flag mode produces the full audit signal without affecting callers, which turns the question "will this rule misfire in production?" into an observation rather than a gamble.
Resolving every false match and every missed match at this stage is far cheaper than discovering them once the rule is rejecting real requests. Only after the rule behaves correctly in testing is it worth applying and enforcing it against live traffic.
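The same two-sided check can be rehearsed offline before a rule is entered in the dashboard. The sketch below is a stand-alone Python harness, not the platform's test surface; the codename `raven` and all sample sentences are hypothetical.

```python
import re


def check(rule, should_match, should_not_match):
    """Return (missed, false_matches) for a rule against labelled samples."""
    missed = [s for s in should_match if not rule(s)]
    false_matches = [s for s in should_not_match if rule(s)]
    return missed, false_matches


def naive_rule(text):
    # Plain substring search: also matches inside longer words.
    return "raven" in text.lower()


def bounded_rule(text):
    # Whole-word match only, case-insensitive.
    return re.search(r"\braven\b", text, re.IGNORECASE) is not None


should_match = ["Status update on Project Raven", "RAVEN launch slips to Q3"]
should_not_match = ["Administer the dose by intravenous drip", "Quarterly revenue summary"]

print(check(naive_rule, should_match, should_not_match))
print(check(bounded_rule, should_match, should_not_match))
```

The naive rule collides with "intravenous", which contains the codename as a substring; the whole-word rule passes both sample sets. A collision of this kind is what the should-not-match samples are meant to expose.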
How guardrail actions appear in the audit log
Every guardrail decision is recorded, which is what makes a guardrail defensible to a compliance stakeholder rather than merely active. When a guardrail fires, an entry is written to the audit log capturing which guardrail matched, the action taken (redact, block, or flag), the scope and direction in which it fired, and the request context, without recording the sensitive content itself.
This serves two purposes. During a pilot, the log is the evidence that a flag-mode rule is matching the right traffic at a reasonable rate before it is promoted to enforcement. In steady state, the log is the trail that answers a compliance question after the fact: how often a PII rule has redacted content, whether a block rule has ever fired, and which traffic triggered it. The full set of guardrail-related events and their fields is documented in Audit log events, and the workflow for reviewing them is covered in Audit platform activity.
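The principle of recording the decision without the sensitive content can be sketched in Python. The field names below are hypothetical and are not the platform's event schema; the authoritative fields are those documented in Audit log events.

```python
import json
from datetime import datetime, timezone


def build_audit_entry(guardrail, action, scope, direction, request_id, match_count):
    """Describe a guardrail decision using metadata only, never the matched text."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "guardrail": guardrail,
        "action": action,          # "redact", "block", or "flag"
        "scope": scope,
        "direction": direction,    # "inbound" or "outbound"
        "request_id": request_id,
        "match_count": match_count,
    }


prompt = "Please email jane.doe@example.com about the renewal"
entry = build_audit_entry(
    guardrail="Redact customer PII inbound",
    action="redact",
    scope="platform",
    direction="inbound",
    request_id="req-0001",  # hypothetical identifier
    match_count=1,
)
serialized = json.dumps(entry)
print(serialized)
```

The entry records which rule fired and how, but the prompt itself is never passed to the function, so the email address cannot end up in the log.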
What to do next
- Configure vendor guardrails: layer provider-native safety filters beneath the custom rules built here. See Configure vendor guardrails.
- Protect requests with guardrails: the developer-side view of attaching a guardrail to a specific request path from the Console. See Protect requests with guardrails.
- Audit platform activity: review the guardrail events generated by the rules configured here. See Audit platform activity.
- Reference: the definitions behind the terms used in this guide are in the glossary, and the guardrail event fields are in Audit log events.