Key concepts

A handful of terms show up again and again across these docs. Some are common to any AI gateway; others are specific to how Tetrate Agent Router is built and run.

Read this page end to end the first time through. When you just need a quick definition later, the Glossary in the Reference section gives the one-line version.


Structural concepts

These are the building blocks: the components of the platform itself and how they're deployed.

Data plane and management plane

The platform splits into two halves that talk over a small, well-defined interface.

The data plane handles the request path. It's a customer-managed Kubernetes deployment containing the Controller and a proxy component, and every AI request and response flows through it. That's the key point: your traffic, payloads, prompts, completions, and credentials stay inside your network and never pass through Tetrate-hosted infrastructure; they leave only when the data plane calls the upstream AI provider. The data plane receives application requests, evaluates routing policy, attaches the right credentials, calls the upstream AI provider, and returns the response.

The management plane is the Tetrate-hosted control surface. It stores routing rules, policies, model and provider definitions, user records, API key metadata, audit history, and analytics rollups. The data plane pulls its configuration from here and pushes telemetry back, but no request payloads ever cross the boundary. You keep full data sovereignty without having to operate the control-plane stack yourself.

The Console and the Admin Dashboard

Two web applications front the platform, each scoped to a different audience.

The Agent Router Console is the developer-facing application. It's where you issue API keys, author routing policies, assemble MCP profiles, configure integrations, and test prompts in the Playground. Developers spend most of their time here.

The Admin Dashboard is the platform-operator application. It's where you provision models and providers, manage users, set budgets and rate limits, review audit logs, and configure SSO. Most of the platform's governance happens here.

The two applications share the same underlying system. They're two views of the same data, each scoped to the responsibilities of its audience.


Routing concepts

Routing is the platform's central job: for each incoming AI request, deciding which backend should serve it. A few concepts describe how that decision gets made.

Backends, models, and providers

A provider is an upstream AI service the platform can talk to: OpenAI, Anthropic, Google, Azure OpenAI, Mistral, and many others, plus self-hosted endpoints and customer-specific deployments. A model is a specific offering from a provider, such as gpt-4o or claude-sonnet-4-6. A backend, in routing terms, is a model-on-a-provider combination paired with the credentials needed to reach it. Every routing decision ultimately resolves to a choice of backend.
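The relationship between the three terms can be sketched as a small data model. This is an illustrative sketch only, not the platform's actual schema: the class names, the `credential_ref` field, and the `label` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str  # an upstream AI service, e.g. "openai"


@dataclass(frozen=True)
class Model:
    provider: Provider
    name: str  # a specific offering from that provider, e.g. "gpt-4o"


@dataclass(frozen=True)
class Backend:
    model: Model
    # Hypothetical field: a pointer to stored credentials, never the secret itself.
    credential_ref: str

    def label(self) -> str:
        """Human-readable provider/model name for this backend."""
        return f"{self.model.provider.name}/{self.model.name}"


openai = Provider("openai")
primary = Backend(Model(openai, "gpt-4o"), credential_ref="platform-managed/openai")
print(primary.label())
```

Under this framing, two backends can share a model and differ only in the credentials used to reach it, which is why the credentials are part of the backend rather than the model.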

Routing chains and policies

Each API key carries a routing chain: an ordered or weighted list of backends the gateway considers when a request arrives. The chain encodes the consumer's preferences, fallbacks, and constraints, and one or more routing policies attached to the key shape how it behaves.

Fallback policy

A fallback policy is an ordered list of backends. When a request arrives, the gateway tries the first backend in the chain; if that call fails with a recoverable error (a 5xx, a rate-limit response, or a connection timeout), it moves to the next backend and tries again, continuing until one succeeds or the chain runs out. The fallback decision happens inside the gateway, so your application sees a single request that either succeeds or fails. It never has to implement retry logic of its own.
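The fallback behaviour described above can be sketched in a few lines. This is a minimal illustration of the idea, not the gateway's implementation: `RecoverableError`, `ChainExhausted`, `call_with_fallback`, and `fake_call` are hypothetical names, and the sketch assumes that non-recoverable errors are returned to the caller immediately rather than triggering a fallback.

```python
from typing import Callable, Sequence


class RecoverableError(Exception):
    """Stands in for a 5xx, a rate-limit response, or a connection timeout."""


class ChainExhausted(Exception):
    """Raised when every backend in the chain failed with a recoverable error."""


def call_with_fallback(
    chain: Sequence[str],
    call_backend: Callable[[str, str], str],
    prompt: str,
) -> str:
    errors = []
    for backend in chain:
        try:
            # The first backend that succeeds ends the walk through the chain.
            return call_backend(backend, prompt)
        except RecoverableError as exc:
            # Recoverable failure: remember it and move to the next backend.
            errors.append((backend, exc))
    raise ChainExhausted(errors)


def fake_call(backend: str, prompt: str) -> str:
    """Toy upstream call: the backend named "primary" always fails."""
    if backend == "primary":
        raise RecoverableError("503 from upstream")
    return f"{backend} answered: {prompt}"


result = call_with_fallback(["primary", "secondary"], fake_call, "hello")
print(result)
```

From the application's point of view there is one call and one outcome; the walk through the chain stays hidden inside `call_with_fallback`, which plays the role of the gateway here.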

Fallback policies are how you manage provider risk: a primary commercial model with one or more secondary models behind it, or a commercial model with a self-hosted open-weights model as a last-resort backstop.

Traffic splitting

A traffic splitting policy distributes requests across two or more backends by percentage rather than by priority. A 70/30 split, for example, sends 70% of requests to the first backend and 30% to the second. Use it to evaluate one model against another under real production traffic, to manage cost by routing a portion of traffic to a cheaper backend, or to roll out a new provider gradually.
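One simple way to picture a percentage split is weighted random selection per request. The sketch below assumes that approach purely for illustration; the document does not say how the gateway actually picks, and `pick_backend` and the backend names are hypothetical.

```python
import random
from collections import Counter


def pick_backend(split: dict, rng: random.Random) -> str:
    """Choose one backend, with probability proportional to its weight."""
    backends = list(split)
    weights = list(split.values())
    return rng.choices(backends, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so the sketch is repeatable
split = {"backend-a": 70, "backend-b": 30}

counts = Counter(pick_backend(split, rng) for _ in range(10_000))
share_a = counts["backend-a"] / 10_000
print(f"backend-a received {share_a:.1%} of requests")
```

Over a large number of requests the observed share converges on the configured percentage, although any individual request can land on either backend.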

Splitting and fallback combine: a split picks the primary backend for a given request, and a fallback chain takes over if that backend fails.

Advanced routing rules

When neither weighted splitting nor priority-ordered fallback is enough, advanced routing rules dispatch requests based on attributes of the request itself: the model the caller asked for, custom headers, request metadata, or other signals. Use them to encode policies such as "route requests from this tenant to this provider" or "send anything tagged code-completion to a model tuned for that task".
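Attribute-based dispatch can be pictured as an ordered list of rules where the first match wins. The sketch below is an assumption-laden illustration: the `Request` and `Rule` shapes, the first-match semantics, the `x-tenant` header, and the `tag` metadata key are all hypothetical, not the platform's rule syntax.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Request:
    model: str
    headers: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)


@dataclass
class Rule:
    backend: str
    model: Optional[str] = None  # None means "any requested model"
    headers: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def matches(self, req: Request) -> bool:
        """A rule matches when every condition it specifies holds."""
        if self.model is not None and self.model != req.model:
            return False
        if any(req.headers.get(k) != v for k, v in self.headers.items()):
            return False
        return all(req.metadata.get(k) == v for k, v in self.metadata.items())


def route(rules: list, req: Request, default: str) -> str:
    """Return the backend of the first matching rule, else the default."""
    for rule in rules:
        if rule.matches(req):
            return rule.backend
    return default


rules = [
    Rule(backend="provider-x", headers={"x-tenant": "acme"}),
    Rule(backend="code-model", metadata={"tag": "code-completion"}),
]
tenant_req = Request(model="gpt-4o", headers={"x-tenant": "acme"})
tagged_req = Request(model="gpt-4o", metadata={"tag": "code-completion"})
plain_req = Request(model="gpt-4o")
print(route(rules, tenant_req, "default-backend"))
```

The two example rules correspond to the two policies quoted above: one keyed on the tenant, the other on a task tag.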


Credentials and identity

Identity in the platform is layered. Applications authenticate to the gateway with API keys; human operators authenticate to the Console and Admin Dashboard with single sign-on; and the platform authenticates to upstream providers using credentials that are either platform-managed or supplied by the consumer.

API keys

An API key is the credential an application presents when it calls the gateway. Each key ties to a consumer (a person or a service), a routing chain, and any policies attached to that chain. Keys are issued from the Admin Dashboard or claimed by developers in the Console, depending on your onboarding model. Revocation, rotation, and per-key budgets are all managed centrally.

Bring your own key (BYOK)

Bring Your Own Key, or BYOK, is the model under which a consumer's own upstream provider credentials are used in place of platform-managed ones. A team that holds its own OpenAI account, for example, can register those credentials with the platform and have its requests routed through them rather than through the platform's pooled credentials.

BYOK is a first-class part of the routing model: BYOK and platform-managed backends can sit in the same routing chain, and policy, not application code, decides between them. Reach for BYOK when billing, compliance, or contractual relationships require a specific team's traffic to land on a specific provider account.
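A mixed chain can be sketched as entries that each name a backend and where its credentials come from. Everything here is hypothetical and for illustration only: the `ChainEntry` shape, the `"byok"` and `"platform"` labels, the two credential stores, and the reference strings are assumptions, not the platform's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainEntry:
    backend: str            # "provider/model"
    credential_source: str  # "byok" or "platform" (hypothetical labels)


def resolve_credential(entry, consumer, byok_store, platform_store):
    """Pick the credential reference the policy asks for, per chain entry."""
    provider = entry.backend.split("/", 1)[0]
    if entry.credential_source == "byok":
        # Credentials the consumer registered for this provider.
        return byok_store[(consumer, provider)]
    # Otherwise use the platform's pooled credentials for the provider.
    return platform_store[provider]


byok_store = {("team-search", "openai"): "cred-ref:team-search-openai"}
platform_store = {
    "openai": "cred-ref:pooled-openai",
    "anthropic": "cred-ref:pooled-anthropic",
}

chain = [
    ChainEntry("openai/gpt-4o", "byok"),
    ChainEntry("anthropic/claude-sonnet-4-6", "platform"),
]
resolved = [
    resolve_credential(e, "team-search", byok_store, platform_store) for e in chain
]
print(resolved)
```

The application sends the same request either way; which credentials get attached is decided entirely by the chain's configuration.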

Single sign-on and role mapping

The platform integrates with corporate identity providers through single sign-on (SSO) over OIDC. Microsoft Entra ID, Okta, Auth0, Keycloak, and any other OIDC-compliant provider can handle authentication for the Console and the Admin Dashboard. Once SSO is configured, role assignment is driven from the identity provider itself: app role or group claims in the OIDC token map to Agent Router roles on every login, so directory changes propagate without manual work in the platform. The Configuring SSO: Mapping Roles and Groups guide covers the mapping mechanics in detail.
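Claim-to-role mapping can be pictured as a lookup applied to the token on each login. This sketch is illustrative only: the claim names `roles` and `groups` vary by identity provider, and the mapping table, the claim values, and the role names `admin` and `developer` are hypothetical rather than the platform's actual roles.

```python
# Hypothetical mapping from identity-provider claim values to platform roles.
ROLE_MAPPING = {
    "AgentRouter.Admins": "admin",
    "ai-platform-developers": "developer",
}


def roles_from_claims(claims: dict, mapping: dict) -> set:
    """Collect app-role and group claim values, keeping only mapped ones."""
    values = []
    for claim_name in ("roles", "groups"):
        values.extend(claims.get(claim_name, []))
    return {mapping[v] for v in values if v in mapping}


token_claims = {
    "sub": "user-123",
    "roles": ["AgentRouter.Admins"],
    "groups": ["ai-platform-developers", "unrelated-group"],
}
assigned = roles_from_claims(token_claims, ROLE_MAPPING)
print(sorted(assigned))
```

Because the roles are recomputed from the token at every login, removing a user from a group in the directory removes the corresponding role the next time they sign in.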


Agent infrastructure

Beyond model routing, the platform gives you a control surface for the agent ecosystem, particularly the Model Context Protocol servers that AI clients consume.

Model Context Protocol (MCP)

The Model Context Protocol (MCP) is an emerging standard for exposing tools, data sources, and context to AI clients. MCP servers act as adapters: one wraps a file system, another a ticketing API, another a documentation source, and so on. AI clients such as Claude Code, Cursor, and VS Code consume those servers to extend their own capabilities.

MCP profile

An MCP profile is a single endpoint that aggregates multiple MCP servers behind one interface. Instead of configuring each AI client with a separate connection to every server it needs, you assemble a profile centrally, govern it with the platform's identity and access model, and expose it as one URL that clients connect to once. Profiles get rid of the per-agent fan-out that becomes unmanageable as the MCP ecosystem grows, and they bring MCP traffic under the same audit, access-control, and observability regime as standard model traffic.
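The aggregation idea can be sketched as merging several servers' tool lists into one catalog. This is a conceptual sketch, not the platform's mechanism: the server names, the tool names, and the `server.tool` namespacing scheme are all assumptions made for the example.

```python
def aggregate_tools(profile: dict) -> dict:
    """Map each namespaced tool name to the MCP server that owns it."""
    catalog = {}
    for server, tools in profile.items():
        for tool in tools:
            # Prefixing with the server name keeps same-named tools distinct.
            catalog[f"{server}.{tool}"] = server
    return catalog


# A hypothetical profile: two MCP servers assembled behind one endpoint.
profile = {
    "filesystem": ["read_file", "list_dir"],
    "ticketing": ["create_ticket"],
}
catalog = aggregate_tools(profile)
print(sorted(catalog))
```

A client that connects to the profile sees one catalog, while the mapping back to the owning server stays on the platform side.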


Extensibility and observability

Three more concepts shape how the platform plugs into your surrounding software estate: the extension model inside the gateway, the telemetry interface that exports operational data to external systems, and the audit log of administrative actions.

Dynamic modules

The gateway is built on a proxy component, and platform-specific behaviour is delivered through dynamic modules: high-performance proxy extensions, written in Rust or Go, compiled to shared libraries that run inline in the filter chain. Dynamic modules implement the routing logic, credential handling, and provider-specific request translation that set Agent Router apart from a generic HTTP gateway. You don't configure them directly, but they're why the gateway behaves consistently across providers and why request-path overhead stays low.

OpenTelemetry export

The platform speaks OpenTelemetry (OTel), the open observability standard. Request-level metrics, traces, and logs are exported in OTel format to whatever observability stack you already run: Grafana, Datadog, New Relic, Honeycomb, or any other OTel-compatible backend. The platform operator owns the export configuration and applies it centrally, so individual applications don't need separate instrumentation to show up in your existing dashboards.

Audit logging

Alongside operational telemetry, the platform keeps an audit log of administrative actions: model and provider changes, user and role assignments, SSO configuration changes, API key issuance and revocation, and other state-modifying events in the Admin Dashboard. You can query the audit log from the Admin Dashboard, and it's the primary record reviewed during compliance investigations or post-incident analysis. The audit event schema is documented in the Reference section.