Skip to main content

Protect requests with guardrails

Sending an AI request across a network boundary means trusting that nothing sensitive leaks outward and that nothing unsafe comes back. A guardrail moves that work into the gateway: a content-filtering rule (PII, keyword, regex, or machine-learning-based) enforced inline as a request passes through, before it reaches a provider and again before the response returns to the application. Every integration that routes through a protected API key inherits the same protection without code changes. This is the developer-facing view of guardrails: knowing which traffic is protected, recognising when a guardrail has acted on a request, and requesting coverage where it is missing. The operator-side configuration is covered separately and linked at the end.


Persona: Developer working in the Agent Router Console.

Estimated time: 10--15 minutes to review how guardrails apply to existing traffic and to confirm their effect in Request Logs.

When this guide applies

Guardrails are relevant whenever request or response content carries risk that should be handled before it crosses an application or provider boundary. The guide is especially useful in these situations:

SituationWhat guardrails address
User-supplied prompts may contain PIIRedaction or blocking before content reaches an external provider
Responses are shown directly to end usersFiltering of unsafe or disallowed content before it returns to the application
A workload must meet a data-handling or compliance requirementA consistent, centrally enforced control rather than per-service logic
An integration must behave identically across multiple providersEnforcement on the gateway applies regardless of which backend serves the request

Where the concern is which backend serves a request rather than what the request contains, Route requests across providers is the relevant guide instead.

Outcomes

By the end of this guide:

  • The role of a guardrail as an inline, gateway-enforced control is understood, along with the boundary between developer and operator responsibilities.
  • The way guardrails attach to traffic on an API key or routing path is understood well enough to reason about which requests are protected.
  • A guardrail action can be recognised from the response signals and from the corresponding entry in Request Logs.
  • A request for guardrail coverage can be raised with the operator team with enough detail to act on.

Prerequisites

  • A working API key with recent traffic against it, as set up in Route requests across providers.
  • Familiarity with Request Logs, as covered in Monitor traffic and usage. Request Logs is the primary surface for confirming guardrail behaviour.
  • For requesting new coverage: a point of contact on the platform operator team, since guardrails are defined and attached in the Admin Dashboard rather than the Console.

Step 1: understand how a guardrail acts on a request

A guardrail is evaluated inline by the gateway, on the same request path that carries every routed call. Two points on that path can be inspected:

  • On the request: before the prompt is forwarded to a provider. A guardrail can redact matched content (for example, masking an email address or an account number) so that the sanitised prompt is what the provider receives, or it can block the request outright so that nothing is forwarded at all.
  • On the response: before the model output returns to the calling application. A guardrail can redact matched content from the response or block the response so that the disallowed content is never delivered.

The available rule types span the common cases:

Rule typeWhat it matches
PIIPersonal data such as names, email addresses, phone numbers, and account identifiers
KeywordA fixed list of terms
RegexA pattern, for content with a predictable structure
ML-basedCategories detected by a model rather than by a literal match

The action taken on a match (redact or block) and the thresholds and patterns that define a match are operator-configured. From a developer's perspective, the contract is straightforward: a protected request is inspected on the way in and on the way out, and the gateway either passes it through, passes through a redacted version, or refuses it.

Step 2: know which traffic is protected

Guardrails are attached to traffic by the operator rather than chosen per request by the developer. A request is protected because the API key it presents, or the routing path that key resolves to, has a guardrail applied to it, not because the application opted in on the call itself.

This has two practical consequences:

  • Protection is inherited from the key. Every integration that presents a protected API key is subject to the same guardrails, with no change to the request payload. The model field, the message content, and the headers are unaffected; the protection is invisible in the request shape.
  • Coverage can differ between keys. One API key may carry a strict PII guardrail while another, used for an internal experiment, carries none. The one-key-per-purpose pattern described in Monitor traffic and usage makes this distinction clean: a key's purpose and its guardrail coverage can be reasoned about together rather than guessed at.

Because the Console does not expose guardrail attachment to developers directly, the reliable way to confirm whether a given key is protected, and by what, is to check with the operator team or to observe guardrail actions in Request Logs, as covered in Step 3.

Step 3: tell when a guardrail acted on a request

A guardrail leaves two kinds of evidence: an immediate signal in the response, and a durable record in Request Logs.

Response signals

  • A redaction is visible in the content itself. Where a guardrail masked matched text, the prompt the provider received, or the response the application received, contains the masked form rather than the original. A request that succeeds with altered content is the normal signature of a redaction guardrail.
  • A block is surfaced as a refusal rather than a model completion. The request does not reach the provider, or the response is withheld, and an error indicating that the content was disallowed is returned in place of a normal completion. Application code that already handles non-success responses from the gateway will surface this in the same path it uses for other errors.

Request Logs

Request Logs is the authoritative place to confirm what happened, because it records the request as the gateway processed it. For a request suspected of triggering a guardrail:

  1. Open Monitoring → Request Logs in the Console.
  2. Filter by the API key the integration uses and a time range covering the request.
  3. Locate the request and open its detail panel.
  4. Compare the recorded request and response content against what the application sent and received. Where a guardrail redacted content, the masked form is what the log shows.
  5. Read the status and any error message. A blocked request is recorded as a non-success outcome with an indication that content was disallowed, distinct from a provider failure or a rate-limit response.

The distinction matters when debugging: a request that an application reports as "failing" may have been blocked by a guardrail rather than rejected by a provider, and the two have entirely different remedies. Request Logs is where that distinction becomes clear. The full Request Logs workflow (filters, the detail panel, and reading the status column) is covered in Monitor traffic and usage.

Step 4: request guardrail coverage for an application

Because guardrails are defined and attached in the Admin Dashboard, adding or changing coverage for an application is a request to the operator team rather than a self-service action in the Console. A request that the operator can act on without a round trip includes:

  • The API key or keys the application uses, identified by name (for example, checkout-service-prod). Naming the key ties the request to a specific traffic path.
  • The content of concern: the categories of data or content that must be controlled, such as customer PII in prompts or disallowed categories in responses.
  • The required action: whether matched content should be redacted so the request still completes, or blocked so it does not. Redaction preserves functionality at the cost of altered content; blocking prioritises safety at the cost of failed requests. The right choice depends on the workload.
  • The direction: whether the concern is on the request (outbound to the provider), the response (inbound to the application), or both.
  • The expected traffic shape: a representative example of a prompt and response helps the operator choose a rule type and tune it without guesswork.

The operator translates this into a concrete rule and attaches it to the relevant keys or routing paths. The mechanics of that work (choosing between vendor-provided and custom rules, setting thresholds, and validating behaviour) are covered in the operator guides linked under What to do next. After a guardrail is attached, its effect can be confirmed using the Request Logs workflow in Step 3.

Step 5: reason about guardrails alongside routing and fallbacks

Guardrails and routing operate on the same request but answer different questions. Routing decides which backend serves a request; a guardrail decides whether and in what form the content is allowed to pass. Because the guardrail is enforced on the gateway rather than per provider, a few properties hold regardless of routing configuration:

  • Guardrails apply across every backend. A request protected by a PII guardrail is inspected the same way whether routing sends it to the primary model or, after a failure, to a fallback. The protection does not have to be reconfigured per provider, and switching providers does not silently drop it. See Improve resilience with fallbacks for the failover behaviour itself.
  • A block is not a failure that fallback should retry. Walking a fallback chain is the gateway's response to a backend that failed to serve a request. A request blocked by a guardrail was refused on content grounds, not failed on availability grounds, so it is not a candidate for failover to another backend; another provider would refuse the same content for the same reason.
  • Redaction is consistent across a traffic split. Where traffic is distributed by weight across two backends, a redaction guardrail sanitises content identically on whichever backend a given request lands on, so the split does not produce two different levels of protection. See Reduce cost with traffic splitting.

The practical takeaway is that guardrails compose cleanly with routing: protection is a property of the key and the path, applied uniformly, rather than something that has to be re-established for each backend a request might reach.

What to do next