Detect and redact sensitive data

Sensitive data leaks at the gateway long before anyone notices. A learner's record pasted into a prompt, an API key embedded in a code snippet, a payment-card number copied into a support question: each of these can travel to a third-party model provider, and from there into a provider's own logs, the moment a request is forwarded unchecked. Data-loss prevention (DLP) is the discipline of stopping that traffic at the boundary: inspecting prompts on their way out and responses on their way back, recognising sensitive content, and removing, rejecting, or recording it before it crosses a line it should not cross.

Tetrate Agent Router enforces DLP through the same rule engine that backs its content guardrails. A guardrail is a content-filtering rule enforced inline by the gateway; DLP is the sensitive-data-focused application of that engine, configured in the Admin Dashboard and tuned for the specific categories a security or compliance team cares about: personally identifiable information (PII), secrets, and API keys. This guide covers the work of standing up a DLP policy on the platform: choosing what to detect, deciding what happens on a match, scoping the policy to the right traffic, applying it in both directions, testing it before it enforces, and confirming that every match lands in the audit trail.

This guide does not re-explain the rule engine itself. The mechanics of creating a guardrail, the four rule types, and the testing surface are covered once, in Configure custom guardrails. What follows is the DLP-specific reading of that material: which detectors to reach for, and how to assemble them into a policy that a compliance stakeholder will accept.

Persona: Platform operator working in the Admin Dashboard, typically alongside the security and compliance stakeholders who own the underlying data-handling policy.

Estimated time: 30 to 45 minutes for a first DLP policy, including time spent testing against seeded samples.

When this guide applies

This guide is the right starting point in any of these situations:

Situation	What it covers
Stripping PII from prompts before they reach an external provider	A PII detector applied to inbound traffic with a redact action
Preventing learner records or customer data from leaving the platform	A detector scoped to the apps or teams that handle that data
Stopping API keys and secrets from being sent to or returned by a model	A secret detector applied in both directions with a block action
Encoding an organisation-specific record format the built-in detectors do not know	A custom pattern added alongside the built-in detectors
Demonstrating sensitive-data controls for a compliance review	A scoped policy with redact and block actions and an audit trail of matches

For the developer-side view (how a request path opts into the guardrails an operator has configured), see Protect requests with guardrails.

Outcomes

By the end of this guide:

At least one DLP policy exists, built from one or more built-in detectors and any custom patterns the policy requires.
The policy applies the appropriate action (redact, block, or alert) to each category of sensitive data.
The policy inspects inbound prompts, outbound responses, or both, matching the direction in which the data can leak.
The policy is scoped to the intended traffic by app, team, or use-case rather than applied indiscriminately.
The policy has been tested against seeded sample content before being allowed to enforce.
The relationship between DLP, custom guardrails, and the audit trail is clear.

Prerequisites

Administrator access to the Admin Dashboard, typically the super_admin role or a role granted guardrail-management permissions.
A written data-handling policy that states which categories of data are sensitive and what must happen to each. The most defensible DLP policies start from a compliance requirement expressed in plain language, not from a pattern invented at configuration time.
For any organisation-specific format (a learner identifier or an internal record number), the pattern that describes it, ideally reviewed with the stakeholder who owns the data.
Seeded sample content for the testing step: text that should match each detector, and realistic text that should not.
Familiarity with Configure custom guardrails, which describes the rule engine this guide applies.

Step 1: decide what counts as sensitive

DLP begins with a list, not a configuration screen. Before any detector is enabled, the categories of data the policy must catch should be named and agreed with the stakeholder who owns them. The platform groups its built-in detectors into the categories that recur across most organisations.

Detector category	What it recognises	Why it matters at the gateway
Common PII	Names, email addresses, phone numbers, postal addresses, payment-card numbers, government identifiers	This data must not reach an external provider or its logs without cause
Learner and customer records	Structured records about an individual, often a composite of PII fields and an organisation-specific identifier	Education and customer data carries regulatory and contractual handling obligations
Secrets	Passwords, private keys, connection strings, and credential-bearing tokens	A leaked secret is an immediate security incident, not a privacy concern
API keys	Provider and platform API keys, frequently pasted into prompts inside code	An exposed key grants whoever holds it the access it was issued for

The built-in detectors cover the common shapes of each category out of the box (the well-known PII classes and the recognisable forms of secrets and API keys) without any pattern being written by hand. Where a category includes something organisation-specific that the built-in detectors do not recognise, such as a learner identifier that follows an internal format, a custom pattern fills the gap. Custom patterns are added as regular-expression or keyword rules through the same guardrail mechanism; see Configure custom guardrails for how each rule type is configured.

A policy is usually assembled from several detectors, each doing one job, rather than one rule stretched to cover every category. A built-in PII detector for the well-known classes, a built-in secret detector for credentials, and a custom pattern for the organisation's own record format is a typical starting shape.
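As an illustration of what a custom regular-expression pattern might look like, the sketch below matches a hypothetical learner identifier. The format (`LRN-` followed by exactly eight digits), the pattern, and the function name are all assumptions made for this example; the real pattern should come from the stakeholder who owns the data, and the regular-expression dialect the platform accepts should be checked in Configure custom guardrails.

```python
import re

# Hypothetical internal format: "LRN-" followed by exactly eight digits.
# This format is an assumption for illustration, not a platform default.
LEARNER_ID = re.compile(r"\bLRN-\d{8}\b")


def find_learner_ids(text: str) -> list[str]:
    """Return every substring of text that matches the learner-ID pattern."""
    return LEARNER_ID.findall(text)


# A well-formed identifier is found.
print(find_learner_ids("Please summarise the record for LRN-00412958."))
# A shorter look-alike is left alone.
print(find_learner_ids("Order LRN-123 shipped on 2024-01-05."))
```

The word boundaries and the fixed digit count keep the pattern from firing on shorter or longer look-alikes, which is the kind of over-matching the testing step is meant to catch.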

Step 2: choose the action for each category

The action is what the gateway does when content matches a detector. Three actions are available, and the right one differs by category. A single DLP policy commonly uses more than one: redact for data that should be stripped, block for data that must never pass, and alert where visibility is the goal.

Action	What happens on a match	Where it fits in a DLP policy
Redact	The matching span is masked or removed, and the request continues with the sanitised content	PII and learner records that should be stripped without stopping the interaction
Block	The request is rejected, or the response is withheld from the caller	Secrets and API keys, where a match means the interaction itself is not permitted
Alert	The content passes unchanged, but the match is recorded and a notification is raised	Observing how often a category would fire before enforcement, or flagging a category for review without disrupting traffic

Redact is the workhorse of a PII policy: the sensitive span is removed so the model never sees it, while the rest of the prompt proceeds and the interaction is not interrupted. Block is the right action where the presence of the data is itself the problem: a credential or API key has no legitimate reason to travel to a model, so a match should stop the request outright. Alert is the safest action to start any new detector with: it produces the same audit signal as the others without changing what callers experience, which makes it the natural mode for the pilot in Step 5. The three actions map onto the redact, block, and flag actions described in Configure custom guardrails; alert is the DLP framing of the flag action, paired with a notification.
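The difference between the three actions can be sketched in a few lines. This is a conceptual model only, not the gateway's implementation: the `Action` enum, the `Outcome` type, the `[REDACTED]` placeholder, and the deliberately simple email pattern are all assumptions introduced for the example.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    REDACT = "redact"
    BLOCK = "block"
    ALERT = "alert"


@dataclass
class Outcome:
    allowed: bool   # False only when the request or response is stopped
    content: str    # text that continues downstream; empty when blocked
    matched: bool   # True whenever the detector fired, whatever the action


def apply_action(pattern: re.Pattern, action: Action, text: str) -> Outcome:
    """Apply one detector's action to one piece of text."""
    if not pattern.search(text):
        return Outcome(allowed=True, content=text, matched=False)
    if action is Action.REDACT:
        # Mask the matching span and let the rest continue.
        return Outcome(True, pattern.sub("[REDACTED]", text), True)
    if action is Action.BLOCK:
        # Nothing continues downstream.
        return Outcome(False, "", True)
    # ALERT: content passes unchanged, but the match is still reported.
    return Outcome(True, text, True)


# Deliberately simple email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

prompt = "Contact ana@example.org about the invoice."
for action in Action:
    print(action.value, apply_action(EMAIL, action, prompt))
```

In all three cases `matched` is true, which mirrors the point that alert produces the same audit signal as redact and block while leaving the caller's experience unchanged.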

Step 3: apply detection in both directions

Sensitive data leaks in two directions, and a DLP policy has to account for both.

Inbound inspection examines the prompt on its way to the model. This is where most PII redaction belongs: the sensitive content is removed before it leaves the organisation's boundary and before any provider can log it.
Outbound inspection examines the model's response before it returns to the caller. A model can reproduce sensitive data it was given earlier in a conversation, or surface a secret it inferred from context; outbound inspection catches data on its way back out.
Inspection in both directions applies the same detector to prompts and to responses. Secrets and API keys are the clearest case for symmetric inspection: a credential must neither be sent to a model nor returned in a response.

Matching the direction to the way each category can actually leak keeps the policy meaningful. A detector applied where its category cannot appear inspects to no purpose and clutters the audit trail with a rule that never fires.
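One way to reason about direction is as a flag attached to each detector. The sketch below is a mental model under stated assumptions: the `Direction` flag, the `POLICY` mapping, and the detector names are hypothetical and do not reflect how the Admin Dashboard stores this setting.

```python
from enum import Flag, auto


class Direction(Flag):
    INBOUND = auto()              # prompt on its way to the model
    OUTBOUND = auto()             # response on its way back to the caller
    BOTH = INBOUND | OUTBOUND


# Hypothetical policy: PII is stripped from prompts only, while
# secrets are inspected symmetrically.
POLICY = {
    "pii": Direction.INBOUND,
    "secrets": Direction.BOTH,
}


def detectors_for(traffic: Direction) -> list[str]:
    """Names of the detectors that inspect traffic in the given direction."""
    return [name for name, configured in POLICY.items() if configured & traffic]


print(detectors_for(Direction.INBOUND))
print(detectors_for(Direction.OUTBOUND))
```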

Step 4: scope the policy by app, team, and use-case

A DLP policy rarely applies uniformly to all traffic. The same prompt that is unremarkable from an internal analytics tool may be a violation from a public-facing assistant. Scope is what lets one policy be strict where the data is sensitive and permissive where it is not.

Scope	What it covers	When to use it
By app	Traffic from a specific application or integration	A learner-facing app whose prompts routinely contain learner records, where an internal reporting tool would not
By team	Traffic owned by a named team or group	A team handling regulated data, held to a stricter standard than the rest of the organisation
By use-case	Traffic governed by a defined routing policy or class of use	A category of traffic (support, content generation, or evaluation) that carries a particular data-handling obligation regardless of which app originates it

Scoping in the Admin Dashboard reuses the guardrail scoping levels (per model, per routing policy, and platform-wide) described in Configure custom guardrails. The app, team, and use-case framing above is how those levels are read for DLP. A useful pattern is a small set of platform-wide baseline detectors for the categories that admit no exception (secrets and API keys typically belong here) with narrower app-, team-, or use-case-scoped detectors layered on top for the data that is sensitive only in certain contexts. Where several detectors apply to the same request, each is evaluated independently, and a block from any one of them stops the request.
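The layering described above, with a platform-wide baseline and narrower scoped detectors evaluated independently, can be sketched as follows. The `Detector` type, the app names, and the substring matchers are hypothetical stand-ins; only the evaluation rule (every applicable detector runs, and any block stops the request) is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Detector:
    name: str
    action: str                      # "redact", "block", or "alert"
    matches: Callable[[str], bool]   # stand-in for the real detection logic
    app: Optional[str] = None        # None means platform-wide


DETECTORS = [
    # Baseline: applies to all traffic and admits no exception.
    Detector("private-key", "block",
             lambda t: "-----BEGIN PRIVATE KEY-----" in t),
    # Scoped: applies only to the hypothetical learner-facing app.
    Detector("learner-id", "redact",
             lambda t: "LRN-" in t, app="learner-portal"),
]


def evaluate(detectors: list[Detector], app: str, text: str) -> tuple[bool, list[str]]:
    """Run every in-scope detector; report whether any block fired and who matched."""
    fired = [d for d in detectors if d.app in (None, app) and d.matches(text)]
    blocked = any(d.action == "block" for d in fired)
    return blocked, [d.name for d in fired]


print(evaluate(DETECTORS, "learner-portal", "Record LRN-00412958"))
print(evaluate(DETECTORS, "reporting-tool", "Record LRN-00412958"))
print(evaluate(DETECTORS, "reporting-tool", "-----BEGIN PRIVATE KEY-----"))
```

The same text is handled differently depending on the originating app, while the baseline detector fires regardless of scope.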

Step 5: test against a seeded sample

A DLP policy that has never been tested against representative content will eventually redact something it should leave alone, or pass something it should have caught. The platform's guardrail testing surface evaluates a detector against sample content before it is allowed to enforce.

Open the detector and locate its test surface, as described in Configure custom guardrails.
Submit seeded content that should match (a sample record, a fabricated key in the expected format, or a synthetic secret) and confirm the detector reports a match and the configured action. The samples should be fabricated test data, never real sensitive data drawn from production.
Submit realistic content that should not match, and confirm the detector leaves it untouched. This second case catches the over-broad pattern: the custom rule that flags more than intended, or a detector that collides with legitimate text.
Run each detector of consequence in alert mode against live traffic for a period before switching it to redact or block. Alert mode produces the full audit signal without affecting callers, which turns the question of whether the detector will misfire in production into an observation rather than a gamble.

Resolving every false match and every missed match at this stage costs far less than discovering them once the policy is rejecting real requests. Only after a detector behaves correctly against the seeded sample is it worth promoting to enforcement.
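Before a custom pattern is entered in the dashboard, the same two-sided check can be rehearsed offline. The sketch below is an assumption-laden example rather than the platform's test surface: the key format (`sk-test-` plus sixteen alphanumeric characters) is invented, and every sample is fabricated.

```python
import re

# Hypothetical key format, invented for this example.
API_KEY = re.compile(r"\bsk-test-[A-Za-z0-9]{16}\b")

# Fabricated content that should match.
SHOULD_MATCH = [
    "export KEY=sk-test-ABCD1234efgh5678",
    "The client was created with sk-test-0000aaaa1111bbbb yesterday.",
]

# Realistic content that should not match.
SHOULD_NOT_MATCH = [
    "The sk-test- prefix marks sandbox keys.",
    "Ticket sk-test-1234 is resolved.",
]


def run_seeded_test(pattern: re.Pattern, should_match: list[str],
                    should_not_match: list[str]) -> dict[str, list[str]]:
    """Return the samples the pattern missed and the samples it wrongly flagged."""
    missed = [s for s in should_match if not pattern.search(s)]
    false_matches = [s for s in should_not_match if pattern.search(s)]
    return {"missed": missed, "false_matches": false_matches}


print(run_seeded_test(API_KEY, SHOULD_MATCH, SHOULD_NOT_MATCH))
```

Both lists coming back empty is the condition for moving on; any entry in either list is a pattern to fix before the detector is promoted.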

How DLP relates to custom guardrails

DLP is not a separate subsystem. It is the sensitive-data-focused application of the same rule engine that backs every custom guardrail, configured through the same Admin Dashboard surface and enforced inline on the same request path. The distinction is one of intent rather than mechanism: a custom guardrail can encode any organisation-defined content rule, while a DLP policy is the subset of those rules aimed specifically at PII, learner and customer records, secrets, and API keys, assembled with redact, block, and alert actions and scoped to the traffic that handles such data.

The practical consequence is that everything in Configure custom guardrails applies here (the rule types, the action semantics, the scoping levels, and the testing surface) and this guide adds only the DLP reading of that material. An operator who has built a custom guardrail has already built the mechanism a DLP detector runs on.

How DLP matches appear in the audit trail

Every DLP match is recorded, which is what makes the policy defensible to a compliance stakeholder rather than merely active. When a detector fires, an entry is written to the audit log capturing which detector matched, the action taken (redact, block, or alert), the scope and direction in which it fired, and the request context, without recording the sensitive content itself. The audit trail proves the control is working without becoming a second copy of the data it was meant to protect.
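The shape of such an entry can be pictured as a record that carries metadata about the match but never the matched text. The field names below, including `match_count`, are hypothetical and are not the platform's audit schema; the sketch only illustrates the principle that the content is dropped before anything is written.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DlpAuditEntry:
    timestamp: str
    detector: str
    action: str        # "redact", "block", or "alert"
    scope: str
    direction: str     # "inbound" or "outbound"
    request_id: str
    match_count: int   # hypothetical field: how many spans matched


def audit_entry(detector: str, action: str, scope: str, direction: str,
                request_id: str, matches: list[str]) -> DlpAuditEntry:
    """Build an audit record from a match without keeping the matched text."""
    return DlpAuditEntry(
        timestamp=datetime.now(timezone.utc).isoformat(),
        detector=detector,
        action=action,
        scope=scope,
        direction=direction,
        request_id=request_id,
        match_count=len(matches),  # only the count survives, not the spans
    )


entry = audit_entry("pii-email", "redact", "app:learner-portal",
                    "inbound", "req-0001", ["ana@example.org"])
print(json.dumps(asdict(entry), indent=2))
```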

The audit trail serves two purposes. During a pilot, it is the evidence that an alert-mode detector is matching the right traffic at a reasonable rate before it is promoted to enforcement. In steady state, it is the record that answers a compliance question after the fact: how often a PII detector has redacted content, whether a secret block has ever fired, and which traffic triggered it. Reviewing these events is covered in Audit platform activity. Because the audit trail records that a match occurred but not the sensitive content, the retention and purge of these records are governed separately; see Manage log retention and purge. Where a security team consumes DLP decisions in a central system, the events can be streamed out; see Export audit and policy decisions to a SIEM.
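Questions such as how often a detector has fired reduce to a count over exported events. The sketch below assumes the events have already been exported as a list of dictionaries with `detector` and `action` keys; that shape and the sample data are invented for illustration and do not describe the platform's export format.

```python
from collections import Counter

# Fabricated sample of exported audit events.
EVENTS = [
    {"detector": "pii-email", "action": "redact"},
    {"detector": "pii-email", "action": "redact"},
    {"detector": "api-key", "action": "alert"},
    {"detector": "private-key", "action": "block"},
]


def summarise(events: list[dict[str, str]]) -> Counter:
    """Count events per (detector, action) pair."""
    return Counter((e["detector"], e["action"]) for e in events)


summary = summarise(EVENTS)
for (detector, action), count in sorted(summary.items()):
    print(f"{detector:12} {action:7} {count}")
```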

What to do next

Configure custom guardrails: the full mechanics of the rule engine this guide applies, including the rule types and testing surface. See Configure custom guardrails.
Protect requests with guardrails: the developer-side view of how a request path opts into the detectors configured here. See Protect requests with guardrails.
Audit platform activity: review the DLP matches generated by the policy built here. See Audit platform activity.
Manage log retention and purge: govern how long audit records of DLP matches are kept. See Manage log retention and purge.
Export audit and policy decisions to a SIEM: stream DLP decisions to a central security system. See Export audit and policy decisions to a SIEM.
Reference: the definitions behind the terms used in this guide are in the glossary.
