Installation guide for Azure
This guide installs the Agent Router data plane on Azure Kubernetes Service (AKS).
Architecture
The Agent Router platform uses a split-plane model. A Management Plane hosted by Tetrate holds configuration, license data, and the web dashboard. A Data Plane runs in a customer-managed Kubernetes cluster and handles all AI traffic. The two planes communicate over a single outbound HTTPS connection initiated by the data plane. No inbound connections from the internet are required.
The data plane needs a stable public hostname so applications can reach it; TLS certificates are issued for hostnames, not IP addresses. DNS and TLS are configured in Step 8 and Appendix B.
The procedure produces:
- An Azure Container Registry mirroring the Agent Router images
- An AKS cluster running the data plane in the tars-system and tars-dataplane namespaces
- An Azure Application Gateway fronting the data plane at a customer-owned DNS name
- A registered data plane URL on the management plane
Plan for 30–60 minutes of installation time, plus DNS propagation.
Table of contents
- Prepare for the installation: obtain the data plane credential and install the CLI
- Cluster setup: create or reuse a Kubernetes cluster
- Registry setup: mirror Agent Router images so the cluster can pull them
- Data Plane installation: deploy the data plane
- Ingress setup: expose the data plane externally
- DNS configuration: wire the hostname to the ingress and register the URL on the management plane
- Testing the installation: verify the install works end-to-end
- Appendices
Prerequisites
Dashboard and Router app access
Agent Router exposes two web surfaces. Both URLs are provided during onboarding:
- Dashboard (admin): https://dashboard.<your-tenant>.tetrate.ai. Used in Step 1 and Step 8.
- Router app (end-user): https://router.<your-tenant>.tetrate.ai. Used in Step 11 for creating API keys and MCP profiles.
Required tools
Install the following on the workstation used for this guide:
| Tool | Install |
|---|---|
| az CLI | https://learn.microsoft.com/cli/azure/install-azure-cli |
| kubectl | https://kubernetes.io/docs/tasks/tools |
| helm (3+) | https://helm.sh/docs/intro/install |
| docker | https://docs.docker.com/get-docker |
| curl | Preinstalled on macOS and most Linux distributions |
| jq | Optional; used in Step 11 to list model IDs |
| tare CLI | Covered in Step 2 |
A data-plane-credentials.json file is also required. See Step 1.
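A short preflight check can confirm that the required tools are on PATH before you start. The following is a minimal sketch. check_tools is a hypothetical helper and is not part of tare. It relies only on the POSIX command -v builtin. tare is left out because Step 2 installs it.

```shell
# Hypothetical preflight helper: report every tool that is not on PATH.
# Exit status is 0 only when all of the named tools were found.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Usage (tools from the table above; tare is installed in Step 2):
#   check_tools az kubectl helm docker curl && echo "all required tools present"
```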
Infrastructure
The installation requires a dedicated workload cluster with at least three nodes. Step 3 either creates one or checks that an existing cluster is compatible. See Cluster sizing for node recommendations.
Tetrate support does not cover client-side infrastructure provisioning or Kubernetes issues. The instructions for creating clusters and related infrastructure components are provided as a courtesy and should be carefully evaluated before executing them.
Azure permissions
The following role assignments are required on the subscription used for deployment:
| Role | Scope | Required for |
|---|---|---|
| Contributor | The resource group | AKS, ACR, App Gateway creation |
| Azure Kubernetes Service Contributor Role | The AKS cluster | Enabling the AGIC addon |
| AcrPush | The container registry | Pushing synced images |
| User Access Administrator | The resource group | Attaching ACR to AKS |
| Network Contributor | The AKS managed RG (MC_*) | Letting AGIC manage App Gateway state |
| Log Analytics Contributor | The linked Log Analytics RG | Required only when Container Insights is enabled |
Cluster sizing
The default chart installs multiple always-on components: egress envoy (minimum 2 replicas), AI gateway controller and ext_proc, controller and worker, Redis, and rate-limit services. A single-node footprint is not sufficient.
The egress envoy is the dominant resource consumer. Its CPU and memory usage scale with the configuration size held in memory: the count of AIGatewayRoute and AIServiceBackend resources, header-mutation rules, and per-route features. The AI gateway team's control-plane scaling benchmark shows roughly linear CPU and memory growth as routes are added. Plan capacity for route counts that grow as providers, models, and projects are added. General-purpose VM sizes from the Standard_D*s_v5 family provide a balanced default.
| Size | Use case | Recommended node pool | Approximate allocatable target |
|---|---|---|---|
| Small | Dev / test / low traffic | 3 × Standard_B2ms | ≥ 6 vCPU, ≥ 20 GiB RAM |
| Medium | Staging / light production | 3 × Standard_D4s_v5 | ≥ 12 vCPU, ≥ 40 GiB RAM |
| High | Production with burst headroom | 3 × Standard_D8s_v5 (or split into system + data plane pools) | ≥ 24 vCPU, ≥ 80 GiB RAM |
Maintain a minimum of three nodes to tolerate node upgrades and evictions. Demo installs may start at Small; production installations should start at Medium.
AKS Automatic is not supported.
Conventions
All commands assume a bash or zsh shell, because Step 3.1 uses array syntax. They also assume the variables defined in Step 3.1 and Step 4 are set in the current session. Set them again when opening a new terminal.
Replace every <placeholder> value with a site-specific one.
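A new terminal loses these values, so a guard at the start of each session can catch unset variables and placeholders that were never replaced. The sketch below uses a hypothetical helper, require_vars. It assumes that any value still containing <...> is an unreplaced placeholder, based on the convention above.

```shell
# Hypothetical guard: fail when a named variable is unset, empty, or still
# holds a <placeholder>. Pass variable names without the leading "$".
require_vars() {
  status=0
  for name in "$@"; do
    # Indirect lookup that works in bash, zsh, and POSIX sh.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "unset or empty: $name" >&2
      status=1
      continue
    fi
    case "$value" in
      *'<'*'>'*) echo "placeholder not replaced: $name=$value" >&2; status=1 ;;
    esac
  done
  return "$status"
}

# Usage, after Step 3.1 and Step 4:
#   require_vars RESOURCE_GROUP LOCATION AKS_CLUSTER_NAME K8S_VERSION ACR_NAME
```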
Step 1: obtain the data plane credential
In the dashboard, navigate to System → Settings → Data plane and click Generate Data plane credential.
Save the downloaded file as data-plane-credentials.json on the workstation used for installation. This file is the long-lived identity the data plane uses to authenticate to the management plane.
Some parts of the product still use the older "service account" naming for this file. The dashboard is standardizing on "data plane credential"; the file is the same.
Each data plane uses its own credential. Credentials can be revoked from the dashboard, and additional credentials can be generated (for example, one per environment) at any time.
Step 2: install the tare CLI
Run the installer script:
curl -sSL https://tare.tetrate.ai/tools/install.sh | bash
Output:
==> tare installer
==> channel: stable
==> Detected platform: darwin-arm64
==> Installing tare for darwin-arm64...
==> Downloading from: https://tare.tetrate.ai/tools/tags/v0.1.0-beta.4/tare-darwin-arm64.tar.gz
ok Installed tare to /Users/johndoe/.tare/bin/tare
==> tare version: tare version v0.1.0-beta.4
ok Installation directory is already in your PATH
==> Get started:
tare install identity.json --serve-url https://proxy.acme.com
tare install --help
The installer prints the install path (typically ~/.tare/bin/tare). If ~/.tare/bin is not already on PATH, add it, then verify the version:
export PATH="$PATH:$HOME/.tare/bin"
echo 'export PATH="$PATH:$HOME/.tare/bin"' >> ~/.zshrc # or ~/.bashrc
$ tare --version
tare version v0.1.0-beta.4
Step 3: provision the AKS cluster
Step 7 uses the AGIC addon (Application Gateway Ingress Controller). The az aks create flags below configure the cluster networking for AGIC compatibility.
To reuse an existing AKS cluster, check AGIC compatibility before continuing:
az aks show -n <cluster-name> -g <resource-group> \
--query 'networkProfile.{plugin: networkPlugin, mode: networkPluginMode, dataplane: networkDataplane}'
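- If the output is {plugin: "azure", mode: null, dataplane: "azure"}, AGIC is supported. Set the variables in Step 3.1 to the existing cluster's values, sign in (Step 3.2), and fetch the kubeconfig (Step 3.5). Then continue with Step 4.
- If mode is overlay or dataplane is cilium, AGIC is not supported. Use Appendix A: AGC for ingress.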
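The same decision can be scripted. The sketch below uses a hypothetical helper, agic_compatible, which is not part of az or tare. It applies only the two rules stated above and reports any other combination as not covered by this guide. It also assumes a null value may come back as an empty string, "null", or "None", and treats all three as null.

```shell
# Hypothetical helper. Arguments: networkPlugin, networkPluginMode, networkDataplane.
# Returns 0 = supported, 1 = not supported (use AGC), 2 = not covered by this guide.
agic_compatible() {
  plugin=$1 mode=$2 dataplane=$3
  # Normalise the different spellings of null to an empty string.
  case "$mode" in null|None) mode= ;; esac
  if [ "$mode" = overlay ] || [ "$dataplane" = cilium ]; then
    echo "AGIC not supported: use Appendix A (AGC)" >&2
    return 1
  fi
  if [ "$plugin" = azure ] && [ -z "$mode" ] && [ "$dataplane" = azure ]; then
    return 0
  fi
  echo "combination not covered by this guide: $plugin/$mode/$dataplane" >&2
  return 2
}

# Usage (one scalar per query; a null result prints nothing):
#   q() { az aks show -n <cluster-name> -g <resource-group> --query "networkProfile.$1" -o tsv; }
#   agic_compatible "$(q networkPlugin)" "$(q networkPluginMode)" "$(q networkDataplane)"
```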
Step 3.1: set environment variables
RESOURCE_GROUP=<resource-group>
LOCATION=<region>
AKS_CLUSTER_NAME=<cluster-name>
# Use the regional default Kubernetes version to avoid an unexpected end-of-support situation.
K8S_VERSION=$(az aks get-versions --location "${LOCATION}" --query "values[?isDefault].version | [0]" -o tsv)
# Optional resource tags; adapt or remove as needed.
TAGS=(
owner="<your-name>"
team="<your-team>"
purpose=development
)
Step 3.2: sign in to Azure
az login
az account set --subscription <subscription-id>
Step 3.3: create the resource group
az group create \
--name "${RESOURCE_GROUP}" \
--location "${LOCATION}" \
--tags "${TAGS[@]}"
Step 3.4: create the AKS cluster
az aks create \
--resource-group "${RESOURCE_GROUP}" \
--name "${AKS_CLUSTER_NAME}" \
--location "${LOCATION}" \
--kubernetes-version "${K8S_VERSION}" \
--node-count 3 \
--node-vm-size Standard_D4s_v5 \
--network-plugin azure \
--network-dataplane azure \
--enable-managed-identity \
--generate-ssh-keys \
--tags "${TAGS[@]}"
Provisioning takes approximately five minutes. AKS enables the Azure Disk CSI driver and a default StorageClass, so the persistent volumes used by the data plane's Redis state need no additional configuration.
The --node-count and --node-vm-size values above correspond to the Medium tier in Cluster sizing. Adjust as needed.
Step 3.5: fetch the kubeconfig
az aks get-credentials \
--resource-group "${RESOURCE_GROUP}" \
--name "${AKS_CLUSTER_NAME}" \
--file ~/kubeconfig-${AKS_CLUSTER_NAME}
export KUBECONFIG=~/kubeconfig-${AKS_CLUSTER_NAME}
kubectl get nodes
Expected output:
NAME STATUS ROLES AGE VERSION
aks-nodepool1-xxxxxxxx-vmss000000 Ready <none> 2m v1.34.x
aks-nodepool1-xxxxxxxx-vmss000001 Ready <none> 2m v1.34.x
aks-nodepool1-xxxxxxxx-vmss000002 Ready <none> 2m v1.34.x
Step 4: create an Azure Container Registry
Create the registry that the next step syncs Agent Router images into:
ACR_NAME=<globally-unique-ACR-name> # 5–50 lowercase alphanumeric chars
az acr create \
--resource-group "${RESOURCE_GROUP}" \
--name "${ACR_NAME}" \
--sku Standard \
--tags "${TAGS[@]}"
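The naming rule in the comment above can be checked locally before calling Azure. This is a minimal sketch with a hypothetical helper, valid_acr_name. It checks only length and character set. Global uniqueness still has to be confirmed by Azure, and az acr check-name reports whether a name is available.

```shell
# Hypothetical helper: 5-50 characters, lowercase ASCII letters and digits only.
valid_acr_name() {
  case "$1" in
    # Reject empty input and any character outside a-z / 0-9 (explicit list
    # avoids locale-dependent range matching).
    ''|*[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  [ "${#1}" -ge 5 ] && [ "${#1}" -le 50 ]
}

# Usage:
#   valid_acr_name "${ACR_NAME}" && az acr check-name --name "${ACR_NAME}"
```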
Step 5: sync Agent Router images to ACR
Step 5.1: authenticate Docker to the ACR
az acr login --name "${ACR_NAME}"
This stores Docker credentials for ${ACR_NAME}.azurecr.io that remain valid for roughly three hours. If the image sync fails with unauthorized: authentication required, re-run this command and retry.
Step 5.2: sync images
Copy the container images from Tetrate's registry into the registry created above. The tare CLI authenticates to the source registry automatically. Only the destination ACR requires a local login.
tare install /path/to/data-plane-credentials.json \
--image-sync ${ACR_NAME}.azurecr.io/tare \
--sync-only
The sync produces no progress output and takes several minutes. After it completes, verify the images:
az acr repository list --name "${ACR_NAME}" -o tsv
The output should list ten repositories under the tare/ prefix, including ai-gateway-controller, envoy-tars, gateway, liaison, ratelimit, redis, tare-doctor, and valet.
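The check above can be automated by piping the repository list into a small function. This sketch assumes the repositories sit directly under tare/ (for example tare/redis) and checks only the eight names listed above. check_synced_repos is a hypothetical helper. It warns rather than fails when the total differs from ten, because other releases may ship a different set.

```shell
# Hypothetical helper: reads `az acr repository list -o tsv` output on stdin.
# Fails if any of the named repositories is missing; warns if the count is not 10.
check_synced_repos() {
  repos=$(grep '^tare/' || true)
  count=$(printf '%s\n' "$repos" | grep -c '^tare/' || true)
  echo "repositories under tare/: $count"
  [ "$count" -eq 10 ] || echo "warning: this guide expects 10" >&2
  status=0
  for name in ai-gateway-controller envoy-tars gateway liaison ratelimit \
              redis tare-doctor valet; do
    if ! printf '%s\n' "$repos" | grep -qxF "tare/$name"; then
      echo "missing: tare/$name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage:
#   az acr repository list --name "${ACR_NAME}" -o tsv | check_synced_repos
```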
Step 5.3: grant AKS pull access to ACR
Attach the ACR to AKS so the cluster's managed identity can pull images:
az aks update \
--resource-group "${RESOURCE_GROUP}" \
--name "${AKS_CLUSTER_NAME}" \
--attach-acr "${ACR_NAME}"
No image-pull secret is required; AKS handles authentication via its managed identity.
If the account lacks User Access Administrator on the resource group, the command above fails with Could not create a role assignment for ACR. Fall back to the admin-user flow:
az acr update --name "${ACR_NAME}" --admin-enabled true
ACR_USERNAME=$(az acr credential show --name "${ACR_NAME}" --query "username" -o tsv)
ACR_PASSWORD=$(az acr credential show --name "${ACR_NAME}" --query "passwords[0].value" -o tsv)
Pipe these credentials to tare install in Step 6 using --image-pull-secret-stdin.
Step 6: install the Agent Router data plane
tare install /path/to/data-plane-credentials.json \
--image-sync ${ACR_NAME}.azurecr.io/tare
The tare install command performs the following actions:
- Creates the tars-system and tars-dataplane namespaces.
- Installs the Agent Router data plane via Helm.
If you used the admin-user fallback from Step 5.3, run the following command instead of the one above. It pipes the credentials so that tare install also creates the image-pull secret:
echo "${ACR_USERNAME}:${ACR_PASSWORD}" | \
tare install /path/to/data-plane-credentials.json \
--image-sync ${ACR_NAME}.azurecr.io/tare \
--image-pull-secret-stdin
Step 7: expose the data plane via AGIC
The data plane terminates external traffic on a single in-cluster service: egress in tars-dataplane on port 10080. It serves both LLM API requests (/v1/*) and MCP traffic (/mcp/*, /.well-known/*); no separate routes are needed.
Step 7.1: enable AGIC on the cluster
This provisions an Azure Application Gateway (Standard_v2) and wires it to AKS. Provisioning takes approximately five minutes.
az aks enable-addons \
--resource-group "${RESOURCE_GROUP}" \
--name "${AKS_CLUSTER_NAME}" \
--addons ingress-appgw \
--appgw-name "${AKS_CLUSTER_NAME}-appgw" \
--appgw-subnet-cidr 10.225.0.0/24
The --appgw-subnet-cidr must not overlap any existing subnet in the VNet (the default AKS subnet is 10.224.0.0/16). Keep the size at /24; this works for any cluster networking and is required for clusters that ever used Overlay.
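You can check whether the App Gateway subnet overlaps an existing subnet using plain shell arithmetic. The sketch below uses hypothetical helpers ip_to_int, cidr_range, and cidrs_overlap, and compares two IPv4 CIDR blocks at a time. It assumes well-formed input without leading zeros in the octets. It does not look up the VNet's subnets itself.

```shell
# Hypothetical helpers for IPv4 CIDR overlap checks (POSIX shell arithmetic).
ip_to_int() {
  # Split a dotted quad with parameter expansion (no word splitting needed).
  a=${1%%.*}; rest=${1#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}; d=${rest#*.}
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

cidr_range() {
  # Print the first and last address of a CIDR block as integers.
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base - base % size ))
  echo "$first $(( first + size - 1 ))"
}

cidrs_overlap() {
  r1=$(cidr_range "$1"); r2=$(cidr_range "$2")
  a_first=${r1% *}; a_last=${r1#* }
  b_first=${r2% *}; b_last=${r2#* }
  # Two ranges overlap when each one starts before the other ends.
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}

# Usage: confirm the App Gateway subnet is clear of the default AKS subnet.
#   cidrs_overlap 10.225.0.0/24 10.224.0.0/16 && echo "overlap: choose another range"
```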
Verify the controller is running:
kubectl get pods -n kube-system -l app=ingress-appgw
A single ingress-appgw-deployment-* pod should be Running. A small number of restarts during the first few minutes is normal while the controller reconciles against the in-progress ARM provisioning.
Step 7.2: create the ingress
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tars-ingress
  namespace: tars-dataplane
  annotations:
    # AGIC's health probe sends GET / to the backend, and App Gateway treats
    # only 200-399 as healthy by default. The egress envoy only serves /v1/*
    # and /mcp/* paths and returns 404 for /. Without these two annotations,
    # AGIC marks the backend unhealthy and every request returns
    # 502 Bad Gateway.
    appgw.ingress.kubernetes.io/health-probe-path: /
    appgw.ingress.kubernetes.io/health-probe-status-codes: "200-499"
spec:
  ingressClassName: azure-application-gateway
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: egress
                port:
                  number: 10080
EOF
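The status-codes annotation works by widening the range of probe responses that count as healthy. The sketch below mirrors that range check with a hypothetical helper, status_in_range. It handles a single low-high range only. App Gateway performs the real evaluation, and its default healthy range is 200-399.

```shell
# Hypothetical helper: is status code $2 inside the inclusive range $1 ("low-high")?
status_in_range() {
  low=${1%-*}
  high=${1#*-}
  [ "$2" -ge "$low" ] && [ "$2" -le "$high" ]
}

# The 404 that the egress envoy returns for / is unhealthy under the default
# range and healthy under the annotation's range:
#   status_in_range 200-399 404   # fails
#   status_in_range 200-499 404   # succeeds
```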
Step 7.3: retrieve the public IP
kubectl get ingress tars-ingress -n tars-dataplane
Expected output:
NAME CLASS HOSTS ADDRESS PORTS AGE
tars-ingress azure-application-gateway * 40.x.x.x 80 30s
Record the ADDRESS value; it is used in Step 8.
The Ingress resource persists across tare install re-runs and does not need to be re-applied.
TLS is required for production.
The Ingress above listens on HTTP/80 only, which is acceptable for local testing but not for any customer-facing deployment. Configure TLS on the Ingress before going to production. Appendix B describes two example mechanisms: bring-your-own certificate and cert-manager.
Step 8: wire DNS and register the URL
Step 8.1: add the DNS A record
Create an A record pointing the data plane hostname to the App Gateway address from Step 7.3:
<your-data-plane-hostname>. A <appgw-ip-from-step-7-3>
The App Gateway listens on port 80 by default; no port suffix is required.
DNS propagation typically takes one to two minutes. Verify with:
dig +short <your-data-plane-hostname>
# Should return the App Gateway IP
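Instead of re-running dig by hand, you can poll for propagation. This is a minimal sketch with hypothetical helpers resolves_to and wait_for_dns. It compares every line of dig +short output because the answer may start with a CNAME target. The default of 30 attempts at 10-second intervals is an arbitrary choice.

```shell
# Hypothetical helper: succeed when any line on stdin is exactly the expected IP.
resolves_to() {
  grep -qxF "$1"
}

# Hypothetical helper: poll DNS until the host resolves to the IP, or give up.
wait_for_dns() {
  host=$1 ip=$2 tries=${3:-30}
  while [ "$tries" -gt 0 ]; do
    dig +short "$host" | resolves_to "$ip" && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

# Usage:
#   wait_for_dns <your-data-plane-hostname> <appgw-ip-from-step-7-3> && echo "DNS ready"
```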
Step 8.2: register the URL on the management plane
In the dashboard, navigate to System → Settings → Profile → Proxy URL and set it to https://<your-data-plane-hostname> (or http://... if TLS was skipped for local testing).
Changes propagate to the data plane within approximately 30 seconds.
For demo installations without a domain, use <appgw-ip>.nip.io as the hostname. nip.io resolves it to the embedded IP address without any DNS configuration. Switch to a real hostname before going to production.
Step 9: verify provider routes
Providers (OpenAI, Anthropic, and others) and their upstream API keys are configured during Agent Router onboarding, not during this install. The data plane retrieves that configuration automatically once it is connected to the management plane.
Verify the data plane received the provider routes:
kubectl get aigatewayroutes -A
kubectl get aiservicebackends -A
Both should show Accepted resources within a minute of the data plane coming up. If the lists are empty, consult the Agent Router onboarding guide to confirm providers are configured.
Step 10: verify the install
Run tare doctor:
tare doctor /path/to/data-plane-credentials.json --verbose
Pass criteria: all in-cluster checks report Status: OK (or Healthy) with 0 errors, 0 warnings, and the final line confirms the health-report bundle was accepted (Sending health report ... OK (bundle <id>)).
Expected output (abridged):
CHECKS PERFORMED:
- Namespace existence (system, dataplane)
- CRD presence (Gateway API, AI Gateway, RouteDeployment)
- Controller deployments ready (TARS, AI Gateway, Envoy Gateway)
- Proxy deployment ready
- GatewayClass and Gateway accepted/programmed
- EnvoyPatchPolicy acceptance (per instance)
- EnvoyProxy acceptance (per instance)
- Egress EnvoyProxy image uses envoy-tars
- Identity Secret and ConfigMap present
- AIServiceBackend acceptance
- Envoy Gateway Backend acceptance
- BackendTrafficPolicy acceptance
- BackendSecurityPolicy acceptance
- BackendTLSPolicy acceptance
- ClientTrafficPolicy acceptance
- HTTPRouteFilter presence/acceptance
- ReferenceGrant presence
- RouteDeployment status conditions
- AIGatewayRoute acceptance/resolution
- HTTPRoute parent acceptance/resolution
- MCPRoute parent acceptance/resolution
- Proxy admin and forward endpoints
- Pod CrashLoopBackOff (excluding tars-config-monitor)
Sending health report to https://api.<your-tenant>.tetrate.ai/v1/dataplane-status... OK (bundle <bundle-id>)
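To check the pass criteria in a script, scan the tare doctor output for the two strings this guide names. The sketch below uses a hypothetical helper, doctor_passed. It assumes only two things: a failing check prints Status: Broken, and a successful run ends with the Sending health report ... OK (bundle <id>) line shown above. It does not parse the error and warning counts, because their exact format is not shown here.

```shell
# Hypothetical helper: reads `tare doctor` output on stdin.
doctor_passed() {
  out=$(cat)
  if printf '%s\n' "$out" | grep -q 'Status: Broken'; then
    echo "tare doctor reported Status: Broken" >&2
    return 1
  fi
  # The health-report bundle must have been accepted.
  printf '%s\n' "$out" | grep -q 'Sending health report .*OK (bundle [^)]*)'
}

# Usage (stderr is merged in case the tool writes diagnostics there):
#   tare doctor /path/to/data-plane-credentials.json --verbose 2>&1 | doctor_passed
```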
Send a request with an invalid token to confirm auth is enforced:
curl -sS -o /dev/null -w "HTTP %{http_code}\n" \
"http://<your-data-plane-hostname>/v1/chat/completions" \
-X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer NotREAL" \
-d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"hi"}]}'
# Expected: HTTP 401
Every install enforces auth automatically, so an HTTP 200 response indicates a problem; contact support. If tare doctor reports Status: Broken, see Troubleshooting.
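To use the invalid-token check in automation, capture the status code with curl -w and classify it. The sketch below uses a hypothetical helper, auth_check_result, that encodes the outcomes described above. A 401 passes, a 200 means auth is not enforced, and any other code needs investigation.

```shell
# Hypothetical helper: classify the status code from the invalid-token request.
# Returns 0 = auth enforced, 1 = auth not enforced, 2 = unexpected status.
auth_check_result() {
  case "$1" in
    401) echo "ok: auth enforced" ;;
    200) echo "auth NOT enforced: contact support" >&2; return 1 ;;
    *)   echo "unexpected HTTP $1: see Troubleshooting" >&2; return 2 ;;
  esac
}

# Usage:
#   code=$(curl -sS -o /dev/null -w '%{http_code}' \
#     "http://<your-data-plane-hostname>/v1/chat/completions" -X POST \
#     -H "Content-Type: application/json" -H "Authorization: Bearer NotREAL" \
#     -d '{"model":"gpt-5-mini","messages":[{"role":"user","content":"hi"}]}')
#   auth_check_result "$code"
```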
Step 11: smoke tests
In the router app (https://router.<your-tenant>.tetrate.ai), select API Keys in the sidebar and create a key. Applications (and the tests below) use this key as Authorization: Bearer ....
Set the host and key once:
export DP_HOST=<your-data-plane-hostname>
export TARS_API_KEY="<your-api-key-from-router-app>"
Chat Completions (OpenAI shape)
curl -s "http://${DP_HOST}/v1/chat/completions" \
-X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer ${TARS_API_KEY}" \
-d '{
"model": "gpt-5-mini",
"messages": [{"role": "user", "content": "hello, what are you?"}]
}'
Anthropic Messages (native shape)
curl -s "http://${DP_HOST}/v1/messages" \
-X POST -H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-H "Authorization: Bearer ${TARS_API_KEY}" \
-d '{
"model": "claude-haiku-4-5",
"max_tokens": 64,
"messages": [{"role": "user", "content": "hello"}]
}'
List available models
curl -s "http://${DP_HOST}/v1/models" \
-H "Authorization: Bearer ${TARS_API_KEY}" | jq '.data[].id' | sort -u
Streaming
curl -sN "http://${DP_HOST}/v1/chat/completions" \
-X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer ${TARS_API_KEY}" \
-d '{
"model": "gpt-5-mini",
"stream": true,
"messages": [{"role": "user", "content": "count from 1 to 5"}]
}'
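You can check a captured stream for completeness. This sketch assumes the data plane passes the OpenAI server-sent-events format through unchanged, with one data: line per chunk and a final data: [DONE]. stream_complete is a hypothetical helper. The -N flag in the usage comment is curl's --no-buffer option, which prints chunks as they arrive.

```shell
# Hypothetical helper: reads an SSE stream on stdin; succeeds when at least one
# data chunk arrived and the stream ended with "data: [DONE]".
stream_complete() {
  awk '
    /^data: \[DONE\]/ { done = 1; next }
    /^data: /         { chunks++ }
    END {
      printf "chunks: %d, terminated: %s\n", chunks, (done ? "yes" : "no")
      exit !(chunks > 0 && done)
    }'
}

# Usage:
#   curl -sN "http://${DP_HOST}/v1/chat/completions" -X POST \
#     -H "Content-Type: application/json" -H "Authorization: Bearer ${TARS_API_KEY}" \
#     -d '{"model":"gpt-5-mini","stream":true,"messages":[{"role":"user","content":"count from 1 to 5"}]}' \
#     | stream_complete
```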
MCP
Create an MCP profile in the router app: MCP Profiles (sidebar) → Create profile. The profile ID is used in the URL below.
curl -s "http://${DP_HOST}/mcp/<profile-id>" \
-X POST -H "Content-Type: application/json" \
-H "Authorization: Bearer ${TARS_API_KEY}" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
To register the MCP profile with Claude Code:
claude mcp add --transport http <profile-name> \
http://${DP_HOST}/mcp/<profile-id> \
--header "Authorization: Bearer ${TARS_API_KEY}"
Upgrading
To upgrade the data plane to a newer Agent Router release:
- Install the new tare CLI version (re-run Step 2).
- Re-run az acr login (Step 5.1), the image sync (Step 5.2), and tare install (Step 6).
Cluster state (namespaces, ingress, DNS, and dashboard configuration) persists across upgrades.
Cleanup
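To remove the deployment, delete resources in dependency order. The commands numbered 4 and 5 delete the AKS cluster and the entire resource group. Skip them if either was reused from an existing environment.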
# 1. Ingress + AGIC addon (also releases the App Gateway)
kubectl delete ingress tars-ingress -n tars-dataplane --ignore-not-found
az aks disable-addons \
--resource-group "${RESOURCE_GROUP}" \
--name "${AKS_CLUSTER_NAME}" \
--addons ingress-appgw 2>/dev/null || true
# 2. Helm release
helm uninstall tars -n tars-system 2>/dev/null || true
# 3. ACR
az acr delete --resource-group "${RESOURCE_GROUP}" --name "${ACR_NAME}" --yes
# 4. AKS cluster
az aks delete \
--resource-group "${RESOURCE_GROUP}" \
--name "${AKS_CLUSTER_NAME}" \
--yes --no-wait
# 5. Resource group (catches anything left behind)
az group delete --name "${RESOURCE_GROUP}" --yes --no-wait
# 6. Kubeconfig
rm -f ~/kubeconfig-${AKS_CLUSTER_NAME}
If helm uninstall tars hangs on finalizers (for example, GatewayClass//tars-egress still exists), see the finalizer cleanup procedure in Troubleshooting.
Troubleshooting
Issues are grouped by the step where they are most likely to occur.
Image synchronization issues
| Symptom | Cause | Fix |
|---|---|---|
| 401 Unauthorized on HEAD https://registry.tetrate.ai/v2/... | The credential is not authorized to pull from registry.tetrate.ai. | Regenerate the credential from Dashboard → System → Settings → Data plane → Credentials and retry. |
| unauthorized: authentication required from the destination registry | The az acr login token has expired (~3h default). | Re-run az acr login --name "${ACR_NAME}" and retry the sync. |
ACR pull access
| Symptom | Cause | Fix |
|---|---|---|
| Pods in ImagePullBackOff with 403 Forbidden from ACR | Admin-user fallback only: the image-pull secret was not created or is missing from the pod's namespace. | Confirm the secret exists in both namespaces: kubectl get secret tars-image-pull-secret -n tars-dataplane and kubectl get secret tars-image-pull-secret -n tars-system. If it is missing, re-run tare install with the --image-pull-secret-stdin flow from Step 6. |
AGIC
| Symptom | Cause | Fix |
|---|---|---|
| az aks enable-addons fails with AuthorizationFailed: ... managedClusters/write | Caller lacks AKS write permission. | Grant Azure Kubernetes Service Contributor Role on the AKS resource. |
| LinkedAuthorizationFailed: ... Microsoft.OperationalInsights/workspaces/sharedkeys/read | Container Insights is enabled and AGIC requires read access on the linked Log Analytics workspace. | Grant Log Analytics Contributor on the linked workspace RG, or disable Container Insights: az aks disable-addons -n <cluster> -g <rg> --addons monitoring. |
| Ingress has no ADDRESS after five minutes; AGIC log reports App Gateway in stopped state | AGIC reconciled too early. | Restart the AGIC controller: kubectl delete pod -n kube-system -l app=ingress-appgw. The replacement pod re-reads state and programs the gateway. |
| AGIC log loops on Waiting for overlay extension config to be ready | Cluster uses Cilium dataplane or Azure CNI Overlay; AGIC does not support either. | Switch to Appendix A: AGC, or recreate the cluster with traditional Azure CNI. |
| Ingress has an ADDRESS but curl returns 502 Bad Gateway | AGIC's default health probe is GET / and the egress envoy returns 404 there. | The Ingress YAML in Step 7.2 sets the health-probe-path and health-probe-status-codes: "200-499" annotations. Add them if missing; AGIC reconciles within ~30 seconds. |
Testing
| Symptom | Cause | Fix |
|---|---|---|
| HTTP 404 with body No matching route found. It is likely because the model specified in your request is not configured in the Gateway. | The requested model name is not configured, or no providers are configured. | Verify kubectl get aigatewayroutes -A shows Accepted rows. If empty, consult the Agent Router onboarding guide. Changes propagate to the data plane within ~30 seconds. |
| HTTP 404 with empty body | No AIGatewayRoute resources exist in the cluster. | Check the data plane is connected to the management plane: kubectl logs -n tars-system deployment/controller-worker --tail=50. The log entry No secret found for provider indicates the provider key did not reach the data plane; contact the Agent Router onboarding team. |
| HTTP 401 with a valid bearer | The API key was issued against a different management plane than this data plane is registered to. | Issue a new key from the router app for this tenant (https://router.<your-tenant>.tetrate.ai → API Keys → Create). |
| HTTP 502 from the dashboard playground (but not from direct curl) | The URL registered on the management plane does not match what the App Gateway serves. Most common cause: registered https://<host> but the App Gateway only listens on HTTP/80. | Either enable TLS on the App Gateway (see Appendix B) and keep the https:// URL, or set the registered URL in Dashboard → System → Settings → Data planes to http://<host> to match. |
Cleanup: Helm uninstall hangs
| Symptom | Cause | Fix |
|---|---|---|
| helm uninstall tars times out with resource GatewayClass//tars-egress still exists. status: Terminating | Custom resource finalizers block namespace deletion when controllers exit before the finalizer drains. | Force-clear finalizers in two passes. (1) Clear gateway-related CRs: kubectl patch gatewayclass tars-egress --type=merge -p '{"metadata":{"finalizers":[]}}' and repeat for aiservicebackends, backendsecuritypolicies, mcproutes, tarsroutedeployments. (2) Once namespaces start terminating, do the same for aigatewayroutes. Then kubectl delete ns tars-system tars-dataplane. |
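When a kind has many objects, it is easier to generate the patch commands from the fix above than to type them. The sketch below uses a hypothetical helper, finalizer_patches. It prints the commands instead of running them, one per object, so you can review the list before piping it to sh. It reads name namespace pairs and leaves the namespace empty for cluster-scoped kinds such as GatewayClass.

```shell
# Hypothetical helper: print one finalizer-clearing patch per "name [namespace]"
# line on stdin. Nothing is executed; review the output, then pipe it to sh.
finalizer_patches() {
  kind=$1
  while read -r name ns; do
    [ -n "$name" ] || continue
    if [ -n "$ns" ]; then nsflag=" -n $ns"; else nsflag=""; fi
    printf "kubectl patch %s %s%s --type=merge -p '{\"metadata\":{\"finalizers\":[]}}'\n" \
      "$kind" "$name" "$nsflag"
  done
}

# Usage, first pass (run the same loop with aigatewayroutes for the second pass):
#   for kind in aiservicebackends backendsecuritypolicies mcproutes tarsroutedeployments; do
#     kubectl get "$kind" -A \
#       -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.namespace}{"\n"}{end}' |
#       finalizer_patches "$kind"
#   done
```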
Appendix A: alternative ingress (AGC)
Use AGC when the AKS cluster runs the Cilium dataplane or Azure CNI Overlay, neither of which is supported by AGIC, and recreating the cluster is impractical. AGC works on any CNI.
AGC requires the AKS cluster to have the OIDC issuer and workload identity enabled (--enable-oidc-issuer and --enable-workload-identity). Set these flags at cluster creation, or enable them on an existing cluster with az aks update; no rebuild is required.
High-level steps:
- Register the resource provider: az provider register --namespace Microsoft.ServiceNetworking
- Create a user-assigned managed identity for the ALB controller.
- Grant the identity AppGw for Containers Configuration Manager on the cluster's node resource group and Network Contributor on the cluster's VNet.
- Federate the identity with the AKS OIDC issuer.
- Install the ALB controller via Helm (oci://mcr.microsoft.com/application-lb/charts/alb-controller).
- Create a delegated subnet for AGC and an ApplicationLoadBalancer CR.
- Create a Gateway (Gateway API) and HTTPRoute instead of an Ingress.
For the full walkthrough, see the Microsoft documentation.
Replace Step 7 with the AGC setup. Step 8 and all subsequent steps are unchanged; only the ingress provisioning differs.
Appendix B: enable TLS
The main flow uses HTTP-only on port 80 so the install can complete without a certificate. Production deployments require TLS on the App Gateway. AGIC supports any certificate delivery mechanism that produces a kubernetes.io/tls Secret in the cluster. Two example flows are described below. Existing TLS provisioning workflows (corporate CA, Azure Key Vault, internal PKI) can be used by delivering the resulting certificate and key as a tls Secret named in the Ingress.
B.1: bring your own certificate
Create the secret from an existing fullchain and key:
kubectl create secret tls tars-ingress-tls \
--cert=path/to/fullchain.pem \
--key=path/to/privkey.pem \
-n tars-dataplane
Update the Ingress to use the secret, with host scoping and HTTP-to-HTTPS redirect:
metadata:
  annotations:
    appgw.ingress.kubernetes.io/health-probe-path: /
    appgw.ingress.kubernetes.io/health-probe-status-codes: "200-499"
    appgw.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: azure-application-gateway
  tls:
    - hosts: [proxy.example.com]
      secretName: tars-ingress-tls
  rules:
    - host: proxy.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: egress
                port:
                  number: 10080
B.2: cert-manager and Let's Encrypt
cert-manager auto-issues and auto-renews certificates from Let's Encrypt. The HTTP-01 challenge runs through the same AGIC ingress configured in Step 7, so no additional infrastructure is required. This option suits sites without an existing certificate workflow.
Prerequisites:
- The DNS A record from Step 8.1 must be live (dig +short <your-host> returns the App Gateway IP). Let's Encrypt validates over DNS and HTTP.
- The App Gateway must listen on HTTP/80 (default from Step 7).
- An email address for Let's Encrypt expiry notices.
Step B.2.1: install cert-manager
One-time, cluster-wide:
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager --create-namespace \
--version v1.18.2 \
--set crds.enabled=true
Verify the install:
kubectl get pods -n cert-manager
# Expect 3 pods Running: cert-manager-*, cert-manager-cainjector-*, cert-manager-webhook-*
Step B.2.2: create the ClusterIssuer
cat <<'EOF' | kubectl apply -f -
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: <your-email-address> # Replace with a real address
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            ingressClassName: azure-application-gateway
EOF
Confirm the issuer reached the Ready state:
kubectl get clusterissuer letsencrypt-prod
# NAME READY AGE
# letsencrypt-prod True 20s
For testing, point server: at https://acme-staging-v02.api.letsencrypt.org/directory. The staging issuer has higher rate limits and a separate root, preserving the production quota. Switch to production once the certificate issues cleanly on staging.
Step B.2.3: update the Ingress with TLS and cert-manager annotations
Replace the Ingress from Step 7.2 with the version below. Two additions: a tls: block referencing a Secret cert-manager will create, and the cert-manager.io/cluster-issuer annotation that triggers issuance.
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tars-ingress
  namespace: tars-dataplane
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    appgw.ingress.kubernetes.io/ssl-redirect: "true"
    appgw.ingress.kubernetes.io/health-probe-path: /
    appgw.ingress.kubernetes.io/health-probe-status-codes: "200-499"
spec:
  ingressClassName: azure-application-gateway
  tls:
    - hosts: [proxy.example.com] # Replace with the data plane hostname
      secretName: tars-ingress-tls
  rules:
    - host: proxy.example.com # Must match tls.hosts
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: egress
                port:
                  number: 10080
EOF
The host: on the rule must match the hostname in tls.hosts and the hostname registered for the data plane on the management plane. A mismatch causes either AGIC to reject the rule or Let's Encrypt to fail the HTTP-01 challenge.
Step B.2.4: wait for cert-manager to issue the certificate
kubectl get certificate -n tars-dataplane -w
# Initial: READY=False (cert-manager solves the HTTP-01 challenge)
# After ~1m: READY=True
If the certificate remains READY=False for more than a couple of minutes, inspect the order and challenge status:
kubectl describe certificate tars-ingress-tls -n tars-dataplane
kubectl get challenge -n tars-dataplane
kubectl describe challenge -n tars-dataplane # Reports the precise solver error
Common challenge failures and fixes:
| Failure message | Fix |
|---|---|
| Self-check failed: ... acme: server returned a non-2xx HTTP status (404) | AGIC has not yet programmed the solver path. Wait ~30 seconds; cert-manager creates a solver Ingress and AGIC reconciles. |
| dns: NXDOMAIN or no IP for hostname | DNS A record has not propagated. Confirm with dig +short <host>. |
| urn:ietf:params:acme:error:rateLimited | Let's Encrypt quota exceeded. Switch to the staging issuer (see Step B.2.2 tip) and retry. |
Step B.2.5: verify HTTPS end-to-end
curl -v https://<your-data-plane-hostname>/v1/models \
-H "Authorization: Bearer ${TARS_API_KEY}" \
2>&1 | grep -E '^(< HTTP|\* +(expire date|issuer):)' | head -10
Expected: a clean TLS handshake (no certificate errors), the certificate's issuer and expiry date, and a 200 response status. Verify the certificate chain:
echo | openssl s_client -servername <host> -connect <host>:443 2>/dev/null \
| openssl x509 -noout -subject -issuer -dates
# subject= CN = proxy.example.com
# issuer= C = US, O = Let's Encrypt, CN = R10
# notAfter=... (~90 days from issue)
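For monitoring, you can check the certificate's remaining lifetime without parsing dates by using openssl x509 -checkend. It exits non-zero if the certificate expires within the given number of seconds. cert_valid_for_days is a hypothetical wrapper, and the 30-day threshold in the usage comment is only an example.

```shell
# Hypothetical wrapper: succeed when the PEM certificate on stdin stays valid
# for at least $1 more days (-checkend takes seconds).
cert_valid_for_days() {
  openssl x509 -noout -checkend $(( $1 * 86400 )) >/dev/null
}

# Usage against the live endpoint:
#   echo | openssl s_client -servername <host> -connect <host>:443 2>/dev/null \
#     | cert_valid_for_days 30 && echo "certificate valid for at least 30 more days"
```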
Step B.2.6: update the registered URL
If an http://... URL was registered in Step 8.2, update it to https://.... The data plane URL on the management plane must match the protocol the App Gateway serves.
Renewal: cert-manager renews automatically at two-thirds of the certificate's lifetime (approximately 60 days for Let's Encrypt's 90-day certificates). No manual action is required.
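The renewal point is simple arithmetic on the certificate's validity window. The sketch below assumes cert-manager's default policy, meaning no renewBefore override on the Certificate. renewal_epoch is a hypothetical helper that takes notBefore and notAfter as Unix epoch seconds.

```shell
# Hypothetical helper: print the epoch second at which a certificate valid
# from NOT_BEFORE to NOT_AFTER (both Unix epoch seconds) reaches 2/3 of its
# lifetime, the default cert-manager renewal point.
renewal_epoch() {
  not_before="$1"
  not_after="$2"
  lifetime=$((not_after - not_before))
  echo $((not_before + lifetime * 2 / 3))
}

# A 90-day certificate issued at epoch 0: 90 * 86400 = 7776000 seconds.
renewal_epoch 0 7776000   # prints 5184000, i.e. day 60
```

On Linux with GNU date, the notBefore and notAfter strings printed by openssl x509 -dates can be converted with date -d "<string>" +%s. BSD and macOS date use different flags.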
Appendix C: forward observability data to an OpenTelemetry Collector
The data plane can stream envoy HTTP access logs and router_* application metrics to a customer-managed OpenTelemetry Collector, which forwards to any backend (Azure Monitor, Datadog, Grafana Cloud, SigNoz).
The two streams are configured in separate fields on the EnvoyProxy resource:
| Stream | EnvoyProxy field | Contents |
|---|---|---|
| Access logs (per-request HTTP metadata) | accessLog.sinks[] | Method, status, path, latency, MCP headers, downstream/upstream addresses |
| Metrics (router_* and envoy native stats) | metrics.sinks[] | router_requests_total, router_model_requests_total, plus envoy cluster/listener counters |
Either stream can be configured independently; the instructions below cover both in order.
Available metrics
| Name | Type | Labels |
|---|---|---|
| router_requests_total | Counter | method, endpoint, status_code |
| router_request_duration_ms | Histogram | method, endpoint, status_code |
| router_errors_total | Counter | type, endpoint, status, model, provider |
| router_streaming_requests_total | Counter | model, provider, endpoint |
| router_model_requests_total | Counter | model, provider, endpoint, byok |
| router_auth_attempts_total | Counter | result, auth_mode |
| router_balance_checks_total | Counter | result |
| router_overrun_protections_total | Counter | reason |
Two equivalent scrape paths are available:
- Direct envoy admin (<egress-pod>:19001/stats/prometheus): metric names appear as listed above. This is the lightest setup for Prometheus-only consumers that do not need access logs.
- OpenTelemetry Collector (configured below): exposes the same router_* names on otel-collector.tars-dataplane.svc:9464/metrics, plus an OTLP-gRPC receiver for the access-log stream. Recommended when a single collection point fans out to multiple backends.
If router_* metrics do not appear after sending traffic, ask the management plane (MP) operator to verify that PROXY_CONFIG is set on the management plane; these metrics require it.
Filter probe and scanner noise before building dashboards
When the data plane is exposed on a public address, two sources contribute noise to router_errors_total and router_auth_attempts_total:
- Health probes (AGIC, AGC, or any load balancer): the probe pings the backend every few seconds. The egress envoy applies its auth filter before path matching, so unauthenticated probes register as authentication failures on the probe path (default /).
- Internet bot and scanner traffic: any public IP attracts opportunistic scans targeting paths such as /wiki, /favicon.ico, /SDK/webLanguage, and /invoker/EJBInvokerServlet. Each scan increments the auth-failure counter.
These counters reach the thousands within a few hours. Unfiltered charts make a healthy service appear to be failing.
Filter to the data plane's real endpoints (/v1/* and /mcp/*):
# Prometheus: keep only real customer traffic
router_requests_total{endpoint=~"^/v1/.*|^/mcp/.*"}
router_errors_total{endpoint=~"^/v1/.*|^/mcp/.*"}
// Azure Log Analytics / Application Insights equivalent
customMetrics
| extend endpoint = tostring(customDimensions.endpoint)
| where endpoint startswith "/v1/" or endpoint startswith "/mcp/"
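The same endpoint filter can be applied to raw scrape output when spot-checking from a shell. The block below is a sketch. filter_real_traffic is a hypothetical helper, and it assumes the endpoint label appears as endpoint="..." in Prometheus text format.

```shell
# Hypothetical helper: keep only router_* samples whose endpoint label
# starts with /v1/ or /mcp/, mirroring the PromQL and KQL filters above.
filter_real_traffic() {
  grep -E '^router_' | grep -E 'endpoint="/(v1|mcp)/'
}

# With port 9464 port-forwarded from the collector pod:
#   curl -s http://localhost:9464/metrics | filter_real_traffic
```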
Step C.1: deploy the OpenTelemetry Collector
Deploy into the tars-dataplane namespace; deploying into any other namespace fails with an "unknown namespace for the cache" error. The ConfigMap below wires a metrics pipeline that exposes router_* on a Prometheus scrape endpoint (:9464) and marks where a logs pipeline goes, so it is ready to fan out to additional backends. See Send to a real observability backend.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
name: otel-collector-config
namespace: tars-dataplane
data:
config.yaml: |
receivers:
otlp:
protocols:
http: { endpoint: 0.0.0.0:4318 }
grpc: { endpoint: 0.0.0.0:4317 }
processors:
batch: { timeout: 5s }
# Strip envoy's internal dynamic-modules scope prefix so router_* metrics
# ship with their canonical names (router_requests_total, etc.) instead
# of dynamicmodulescustom.router_requests_total.
transform/strip_scope:
metric_statements:
- context: metric
statements:
- replace_pattern(name, "^dynamicmodulescustom\\.", "")
exporters:
# In-cluster Prometheus scrape target. Names land clean as router_*.
prometheus:
endpoint: 0.0.0.0:9464
namespace: ""
send_timestamps: true
metric_expiration: 30m
resource_to_telemetry_conversion: { enabled: true }
service:
pipelines:
metrics:
receivers: [otlp]
processors: [transform/strip_scope, batch]
exporters: [prometheus]
# Add a 'logs' pipeline when forwarding envoy HTTP access logs
# (per-request method, status, path, MCP headers) to a backend
# such as Azure Monitor or Datadog. See "Send to a real
# observability backend" below for an example.
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: otel-collector
namespace: tars-dataplane
spec:
replicas: 1
selector: { matchLabels: { app: otel-collector } }
template:
metadata: { labels: { app: otel-collector } }
spec:
containers:
- name: collector
image: otel/opentelemetry-collector-contrib:0.98.0
ports:
- { containerPort: 4317, name: otlp-grpc }
- { containerPort: 4318, name: otlp-http }
volumeMounts:
- { name: config, mountPath: /etc/otelcol-contrib }
resources:
requests: { cpu: 100m, memory: 128Mi }
limits: { cpu: 250m, memory: 256Mi }
volumes:
- name: config
configMap: { name: otel-collector-config }
---
apiVersion: v1
kind: Service
metadata:
name: otel-collector
namespace: tars-dataplane
spec:
selector: { app: otel-collector }
ports:
- { name: otlp-grpc, port: 4317, targetPort: 4317 }
- { name: otlp-http, port: 4318, targetPort: 4318 }
- { name: prometheus, port: 9464, targetPort: 9464 }
EOF
Once the manifest is applied, an in-cluster Prometheus can scrape http://otel-collector.tars-dataplane.svc:9464/metrics and find clean router_requests_total, router_model_requests_total, and similar names. The transform/strip_scope processor removes envoy's internal scope prefix before export, so dashboards and alerts work without dealing with the OTel encoding.
Step C.2: add the metrics sink to the EnvoyProxy
Push router_* and envoy native stats from the egress envoy into the collector. The access-log sink is a separate, opt-in step described in Forwarding access logs.
kubectl patch envoyproxy tars-egress-proxy -n tars-system --type=merge -p '{
"spec":{"telemetry":{"metrics":{"sinks":[
{"type":"OpenTelemetry","openTelemetry":{"backendRefs":[
{"group":"","kind":"Service","name":"otel-collector","namespace":"tars-dataplane","port":4317,"weight":1}
]}}
]}}}
}'
Step C.3: restart the egress Envoy
kubectl rollout restart -n tars-dataplane deployment/egress
kubectl rollout status -n tars-dataplane deployment/egress --timeout=120s
Step C.4: verify
After running the smoke tests, scrape the collector's Prometheus endpoint to confirm router_* metrics are flowing with clean names:
POD=$(kubectl get pods -n tars-dataplane -l app=otel-collector -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n tars-dataplane pod/$POD 9464:9464 &
sleep 2
curl -s http://localhost:9464/metrics | grep '^router_' | head
Expected: router_requests_total, router_model_requests_total, router_auth_attempts_total, router_request_duration_ms_bucket, and similar names with non-zero counts matching the traffic sent.
If router_* metrics do not appear after traffic, ask the MP operator to check PROXY_CONFIG on the management plane.
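A small awk filter can turn this visual check into a pass/fail result, for example in a CI smoke test. The block below is a sketch. check_router_metrics is a hypothetical helper, and its list of expected names is an assumption taken from the Expected line above.

```shell
# Hypothetical helper: read Prometheus exposition text on stdin and fail if
# any expected router_* metric is absent or has no non-zero sample.
check_router_metrics() {
  awk -v want="router_requests_total router_model_requests_total router_auth_attempts_total" '
    BEGIN { n = split(want, names, " ") }
    /^router_/ {
      # Metric name: everything before the first "{" or space.
      name = $0
      sub(/[{ ].*/, "", name)
      # Sample value: first field after the label set (or after the name
      # when there are no labels); a trailing timestamp is ignored.
      rest = $0
      if (index(rest, "}") > 0) sub(/^.*[}]/, "", rest)
      else sub(/^[^ ]+/, "", rest)
      split(rest, fields, " ")
      if (fields[1] + 0 > 0) seen[name] = 1
    }
    END {
      status = 0
      for (i = 1; i <= n; i++)
        if (!(names[i] in seen)) { print "missing or zero: " names[i]; status = 1 }
      exit status
    }'
}

# Usage, with the port-forward from the commands above still running:
#   curl -s http://localhost:9464/metrics | check_router_metrics
```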
For deeper diagnostics, the collector's self-metrics report pipeline throughput:
kubectl port-forward -n tars-dataplane pod/$POD 8888:8888 &
sleep 2
curl -s http://localhost:8888/metrics | grep otelcol_exporter_sent
Non-zero otelcol_exporter_sent_metric_points confirms metrics are leaving the collector toward each configured exporter. The _log_records counter appears only after a logs pipeline is added (see Forwarding access logs).
The EnvoyProxy metrics-sink patch resets on every tare install re-run. Re-apply after each reinstall, or script it as a post-install hook.
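The re-apply can be scripted as a post-install step. This is a sketch, not a tare feature. reapply_metrics_sink is a hypothetical function that replays the Step C.2 patch and the Step C.3 restart verbatim. KUBECTL is an assumed override variable, so the commands can be previewed with KUBECTL=echo.

```shell
#!/bin/sh
# Hypothetical post-install hook: re-apply the metrics sink from Step C.2
# and restart egress. Set KUBECTL=echo to print the commands instead of
# running them.
KUBECTL="${KUBECTL:-kubectl}"

reapply_metrics_sink() {
  $KUBECTL patch envoyproxy tars-egress-proxy -n tars-system --type=merge -p '{
  "spec":{"telemetry":{"metrics":{"sinks":[
    {"type":"OpenTelemetry","openTelemetry":{"backendRefs":[
      {"group":"","kind":"Service","name":"otel-collector","namespace":"tars-dataplane","port":4317,"weight":1}
    ]}}
  ]}}}
}' || return 1
  $KUBECTL rollout restart -n tars-dataplane deployment/egress || return 1
  $KUBECTL rollout status -n tars-dataplane deployment/egress --timeout=120s
}

# Run after every tare install, for example:
#   tare install "${DP_CREDENTIAL}" ... && reapply_metrics_sink
```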
Send to a real observability backend
The starter configuration exports router_* metrics to an in-cluster Prometheus endpoint only. Two common extensions:
- Fan out metrics to a managed backend (Azure Monitor, Datadog, Grafana Cloud): add a backend exporter alongside prometheus.
- Forward HTTP per-request access logs (opt-in): requires both an EnvoyProxy patch (to make egress emit access logs) and a logs pipeline in the collector.
The Azure Monitor walkthrough below covers the exporter and the collector logs pipeline; the EnvoyProxy access-log patch is described in Forwarding access logs.
Example: Azure Monitor / Application Insights
Create the Application Insights resource (workspace-based, reusing the AKS Log Analytics workspace):
az monitor app-insights component create \
--app tars-dp-insights \
--location "${LOCATION}" \
--kind web \
--resource-group "${RESOURCE_GROUP}" \
--workspace "/subscriptions/<sub-id>/resourceGroups/<workspace-rg>/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
CONN_STR=$(az monitor app-insights component show \
--app tars-dp-insights -g "${RESOURCE_GROUP}" \
--query connectionString -o tsv)
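If CONN_STR comes back empty or malformed, for example because of a wrong resource group, it would be stored in the secret and only show up later as a silent exporter failure. The block below sketches a guard against that. check_conn_str is hypothetical. It only checks for the InstrumentationKey and IngestionEndpoint keys that Application Insights connection strings normally contain.

```shell
# Hypothetical guard: refuse an empty value or one missing the expected keys.
# The value itself is not printed, to keep it out of terminal logs.
check_conn_str() {
  case "$1" in
    *InstrumentationKey=*) ;;
    *) echo "connection string is empty or lacks InstrumentationKey" >&2; return 1 ;;
  esac
  case "$1" in
    *IngestionEndpoint=*) ;;
    *) echo "connection string lacks IngestionEndpoint" >&2; return 1 ;;
  esac
  return 0
}

# Usage, before creating the secret:
#   check_conn_str "${CONN_STR}" || exit 1
```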
Store the connection string in a secret and inject it as an environment variable into the collector pod:
kubectl create secret generic otel-azure-creds -n tars-dataplane \
--from-literal=APP_INSIGHTS_CONN_STR="${CONN_STR}"
kubectl set env deploy/otel-collector -n tars-dataplane --from secret/otel-azure-creds
Add the azuremonitor exporter (ships with otel/opentelemetry-collector-contrib) alongside the default prometheus. Keep the transform/strip_scope processor in the metrics pipeline so names appear in Azure Monitor as clean router_*. The logs pipeline carries envoy HTTP per-request access logs and does not include prometheus, since that exporter handles only metrics:
exporters:
prometheus:
endpoint: 0.0.0.0:9464
namespace: ""
send_timestamps: true
metric_expiration: 30m
resource_to_telemetry_conversion: { enabled: true }
azuremonitor:
connection_string: ${env:APP_INSIGHTS_CONN_STR}
service:
pipelines:
metrics:
receivers: [otlp]
processors: [transform/strip_scope, batch]
exporters: [prometheus, azuremonitor]
logs:
receivers: [otlp]
processors: [batch]
exporters: [azuremonitor]
Environment variable syntax. The collector requires ${env:VAR_NAME} (with the env: prefix). Plain ${VAR_NAME} silently fails to substitute and the exporter does not load. The collector log shows no error, so check the otelcol_exporter_sent_log_records and _metric_points self-metrics on :8888 to confirm.
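The wrong syntax produces no error, so a quick lint of the rendered config can catch it before a restart. The block below is a sketch. lint_env_refs is a hypothetical helper. It treats any ${...} reference without a colon, meaning without a provider prefix such as env:, as suspect.

```shell
# Hypothetical lint: list ${VAR} references that lack a provider prefix.
# Returns 1 if offending lines are found (they are printed with line
# numbers), 0 if the file is clean, 2 if grep itself failed.
lint_env_refs() {
  grep -nE '[$][{][^}:]+[}]' "$1"
  case $? in
    0) return 1 ;;   # offending references found
    1) return 0 ;;   # clean
    *) return 2 ;;   # grep error, e.g. unreadable file
  esac
}

# Usage: lint the live ConfigMap before restarting the collector.
#   kubectl get configmap otel-collector-config -n tars-dataplane \
#     -o jsonpath='{.data.config\.yaml}' > /tmp/otel-config.yaml
#   lint_env_refs /tmp/otel-config.yaml
```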
Restart and verify both exporters are shipping:
kubectl rollout restart deploy/otel-collector -n tars-dataplane
# After a curl to /v1/chat/completions:
POD=$(kubectl get pods -n tars-dataplane -l app=otel-collector -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward -n tars-dataplane pod/$POD 8888:8888 &
sleep 2
curl -s http://localhost:8888/metrics | grep otelcol_exporter_sent
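The prometheus exporter should show a non-zero otelcol_exporter_sent_metric_points. The azuremonitor exporter should show a non-zero _metric_points and, once the EnvoyProxy access-log sink from Forwarding access logs is in place, a non-zero otelcol_exporter_sent_log_records.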
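With several exporters, the raw self-metrics get noisy, and a per-exporter summary makes the comparison easier. The block below is a sketch. summarize_exporter_sent is a hypothetical helper that assumes each self-metric line carries an exporter="..." label.

```shell
# Hypothetical helper: sum the collector's otelcol_exporter_sent_* self-metrics
# per exporter and signal, reading Prometheus text on stdin.
summarize_exporter_sent() {
  awk '
    /^otelcol_exporter_sent_/ {
      # Signal = metric name minus the common prefix (e.g. metric_points).
      signal = $0
      sub(/[{ ].*/, "", signal)
      sub(/^otelcol_exporter_sent_/, "", signal)
      # Exporter = value of the exporter="..." label (10 chars before it).
      exporter = "unknown"
      if (match($0, /exporter="[^"]*"/))
        exporter = substr($0, RSTART + 10, RLENGTH - 11)
      # Sample value: first field after the label set.
      rest = $0
      if (index(rest, "}") > 0) sub(/^.*[}]/, "", rest)
      else sub(/^[^ ]+/, "", rest)
      split(rest, f, " ")
      total[exporter " " signal] += f[1]
    }
    END { for (k in total) print k, total[k] }' | sort
}

# Usage, with the 8888 port-forward from above:
#   curl -s http://localhost:8888/metrics | summarize_exporter_sent
```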
Where to view data: Azure Portal → Application Insights tars-dp-insights → Logs (for KQL) or Workbooks (for custom dashboards).
Default Application Insights panes will not populate. The Overview, Performance, Failures, and Application Map panes require Application Insights-native event types (requests, dependencies, exceptions). The azuremonitor OTel exporter does not translate envoy access logs into those types; the data resides in customMetrics (envoy and router_*) and traces (envoy access logs). Build a Workbook with the queries below for a usable dashboard.
Useful queries (paste into the Logs pane of tars-dp-insights):
// Recent envoy access logs
traces
| where timestamp > ago(15m)
| extend method = tostring(customDimensions["method"]),
status = tostring(customDimensions["response_code"]),
route = tostring(customDimensions["route_name"])
| project timestamp, method, status, route, duration=customDimensions["duration"]
| order by timestamp desc
// router_* metrics: latest cumulative value per series, summed per metric
customMetrics
| where timestamp > ago(1h)
| where name startswith "router_"
| extend endpoint = tostring(customDimensions.endpoint)
| where endpoint startswith "/v1/" or endpoint startswith "/mcp/" // Exclude probe and scanner noise
| summarize latest = max(valueSum) by name, series = tostring(customDimensions)
| summarize total = sum(latest) by name
| order by name asc
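// Per-minute request rate by status code
customMetrics
| where timestamp > ago(1h)
| where name == "router_requests_total"
| extend endpoint = tostring(customDimensions.endpoint)
| where endpoint startswith "/v1/" or endpoint startswith "/mcp/" // Exclude probe and scanner noise
| extend status = tostring(customDimensions.status_code),
    series = tostring(customDimensions)
| summarize cum = max(valueSum) by bin(timestamp, 1m), status, series
| order by series asc, timestamp asc // Serializes rows so prev() works within each series
| extend per_min = iff(series == prev(series), cum - prev(cum), real(null))
| where per_min >= 0
| summarize per_min = sum(per_min) by timestamp, status
| order by timestamp asc
| render timechart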
Counter aggregation: use max, not sum. Every router_*_total is a cumulative counter. OTel re-ships the current value on every flush (~5s default), so customMetrics gains hundreds of rows per series per hour. sum(valueSum) inflates the result by orders of magnitude (for example, 1.8M when the real cumulative count is ~2,300).
- For cumulative totals: max(valueSum) per series (the latest snapshot), then sum across series when a single total is needed.
- For rates over time: compute deltas within each series by ordering rows by series and timestamp, then taking cum - prev(cum).
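The difference between max, sum, and deltas is easiest to see on a handful of made-up samples. The awk sketch below illustrates the same arithmetic locally; it is not an Application Insights query. counter_report is a hypothetical helper that treats a drop in value as a counter reset.

```shell
# Illustration: read one series' cumulative counter samples (one per line,
# oldest first) and print the max, the sum of all samples, and the
# per-interval deltas. A drop in value is treated as a counter reset,
# so the new value itself is that interval's delta.
counter_report() {
  awk '
    NR == 1 { prev = $1 + 0; max = prev; sum = prev; next }
    {
      v = $1 + 0
      delta = (v >= prev) ? v - prev : v
      sep = (deltas == "") ? "" : " "
      deltas = deltas sep delta
      if (v > max) max = v
      sum += v
      prev = v
    }
    END { print "max=" max, "sum=" sum, "deltas=" deltas }'
}

printf '%s\n' 100 100 105 110 110 | counter_report
# prints: max=110 sum=525 deltas=0 5 5 0
```

Five flushes of a counter that only reached 110 already sum to 525; across hundreds of flushes per hour, the same effect produces the inflation described above.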
Forwarding access logs (optional)
Adds envoy HTTP per-request access logs (method, status, path, latency, MCP headers, downstream and upstream addresses) on top of the metrics. Two changes are required; both can be applied incrementally without re-running tare install.
1. Patch the EnvoyProxy to emit access logs to the collector. The default accessLog block varies between tare builds; check the existing shape before patching:
kubectl get envoyproxy tars-egress-proxy -n tars-system \
-o jsonpath='{.spec.telemetry.accessLog}'
# Non-empty: use Path A. Empty: use Path B.
Path A: JSON-patch (default accessLog already present, append a sink):
kubectl patch envoyproxy tars-egress-proxy -n tars-system --type=json -p '[
{
"op": "add",
"path": "/spec/telemetry/accessLog/settings/0/sinks/-",
"value": {
"type": "OpenTelemetry",
"openTelemetry": {
"backendRefs": [
{"group":"","kind":"Service","name":"otel-collector","namespace":"tars-dataplane","port":4317,"weight":1}
]
}
}
}
]'
Path B: merge-patch (no default accessLog, create the entire block):
kubectl patch envoyproxy tars-egress-proxy -n tars-system --type=merge -p '{
"spec":{"telemetry":{"accessLog":{"settings":[
{"sinks":[
{"type":"OpenTelemetry","openTelemetry":{"backendRefs":[
{"group":"","kind":"Service","name":"otel-collector","namespace":"tars-dataplane","port":4317,"weight":1}
]}}
]}
]}}}
}'
2. Add a logs pipeline to the collector ConfigMap, pointing at the chosen backend exporter (for example, azuremonitor). The Azure Monitor walkthrough above shows the full ConfigMap diff; the relevant addition is:
service:
pipelines:
logs:
receivers: [otlp]
processors: [batch]
exporters: [azuremonitor] # Or datadog, otlphttp, and similar
3. Restart egress so the new accessLog sink loads:
kubectl rollout restart -n tars-dataplane deployment/egress
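Steps 1 and 3 can be combined into one script that picks Path A or Path B automatically. This also makes re-applying after a reinstall easier. The block below is a sketch, not a tare feature. apply_access_log_sink and the KUBECTL override are hypothetical. The patch bodies are the Path A and Path B patches above, collapsed onto one line. Step 2, the ConfigMap edit, stays manual.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 1 and 3: detect whether an accessLog block
# already exists, apply Path A or Path B accordingly, then restart egress.
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# The OpenTelemetry sink shared by both paths.
SINK='{"type":"OpenTelemetry","openTelemetry":{"backendRefs":[{"group":"","kind":"Service","name":"otel-collector","namespace":"tars-dataplane","port":4317,"weight":1}]}}'

apply_access_log_sink() {
  current=$($KUBECTL get envoyproxy tars-egress-proxy -n tars-system \
    -o jsonpath='{.spec.telemetry.accessLog}') || return 1

  if [ -n "$current" ]; then
    echo "Using Path A (append a sink)"
    $KUBECTL patch envoyproxy tars-egress-proxy -n tars-system --type=json \
      -p "[{\"op\":\"add\",\"path\":\"/spec/telemetry/accessLog/settings/0/sinks/-\",\"value\":${SINK}}]" \
      || return 1
  else
    echo "Using Path B (create the accessLog block)"
    $KUBECTL patch envoyproxy tars-egress-proxy -n tars-system --type=merge \
      -p "{\"spec\":{\"telemetry\":{\"accessLog\":{\"settings\":[{\"sinks\":[${SINK}]}]}}}}" \
      || return 1
  fi

  $KUBECTL rollout restart -n tars-dataplane deployment/egress
}
```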
Verify access logs are flowing:
# Azure Monitor: open Application Insights → Logs and run
# traces | where timestamp > ago(15m) | take 10
#
# Datadog / Grafana Cloud / others: check the corresponding Logs explorer.
#
# Real-time verification on the collector side: temporarily add 'debug' to
# the logs pipeline exporters list and grep:
# kubectl logs -n tars-dataplane deployment/otel-collector --tail=100 \
# | grep otel_envoy_accesslog
The EnvoyProxy accessLog patch resets on every tare install re-run. Re-apply after each reinstall.
Other backends
| Platform | Exporter | Reference |
|---|---|---|
| Datadog | datadog | https://docs.datadoghq.com/opentelemetry/otel_collector_datadog_exporter/ |
| Grafana Cloud | otlphttp to a grafana.net endpoint | https://grafana.com/docs/grafana-cloud/send-data/otlp/ |
| SigNoz Cloud | otlphttp with the signoz-access-token header | https://signoz.io/docs/instrumentation/opentelemetry-collector/ |
| Splunk Observability | signalfx | https://docs.splunk.com/observability/en/gdi/opentelemetry/exporters/signalfx-exporter.html |
| In-cluster SigNoz, Jaeger, or Grafana | otlp or otlphttp to the local service | Platform-specific |
The pattern is consistent across backends: define the exporter in the collector's ConfigMap, add it to the relevant pipelines, then restart the collector. The EnvoyProxy patches applied above remain unchanged.
Both EnvoyProxy patches (metrics and accessLog) reset on every tare install re-run. Re-apply after each reinstall.
Appendix D: use an existing private registry
Use this path when an organization-wide private container registry (Nexus, Harbor, JFrog Artifactory, or another enterprise registry) is already in place. This appendix replaces Step 4 and Step 5 in the main flow.
Two separate registry credentials are involved:
- Operator credentials: used by the workstation running tare install --image-sync to push images into the private registry.
- Kubernetes pull credentials: stored as an image-pull secret so AKS nodes can pull images from the private registry.
The data plane credential is still required. tare uses it to authenticate to Tetrate's source registry while syncing images. The private registry username and password are used only for the destination registry and Kubernetes image pulls.
Step D.1: set variables for the private registry
DP_CREDENTIAL=./data-plane-credentials.json
PRIVATE_REGISTRY_HOST=registry.acme.example.com
PRIVATE_IMAGE_REGISTRY=registry.acme.example.com/tare
PULL_SECRET=acme-registry-pull
REGISTRY_USERNAME=<registry-user>
REGISTRY_PASSWORD=<registry-password-or-token>
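The host is the first path segment of the image registry, so it can be derived instead of typed twice. The block below is a sketch using POSIX parameter expansion, with the example values above.

```shell
# Example value from the variables above.
PRIVATE_IMAGE_REGISTRY=registry.acme.example.com/tare
# Everything before the first "/" is the registry host (including any
# :port, which docker login and --docker-server both expect).
PRIVATE_REGISTRY_HOST="${PRIVATE_IMAGE_REGISTRY%%/*}"
echo "${PRIVATE_REGISTRY_HOST}"   # prints registry.acme.example.com

# Optional (bash): prompt for the password instead of putting it in shell history.
#   read -rsp 'Registry password: ' REGISTRY_PASSWORD; echo
```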
Step D.2: sync images into the private registry
Log in locally with credentials that can push to the private registry:
printf '%s' "${REGISTRY_PASSWORD}" | \
docker login "${PRIVATE_REGISTRY_HOST}" \
--username "${REGISTRY_USERNAME}" \
--password-stdin
Copy the pinned Agent Router images into the registry:
tare install "${DP_CREDENTIAL}" \
--image-sync "${PRIVATE_IMAGE_REGISTRY}" \
--sync-only
tare authenticates to the Tetrate source registry using the data plane credential. The local Docker login authenticates to the destination registry.
Step D.3: create Kubernetes pull secrets
Create the same pull secret in both namespaces:
kubectl create namespace tars-system --dry-run=client -o yaml | kubectl apply -f -
kubectl create namespace tars-dataplane --dry-run=client -o yaml | kubectl apply -f -
kubectl create secret docker-registry "${PULL_SECRET}" \
--docker-server="${PRIVATE_REGISTRY_HOST}" \
--docker-username="${REGISTRY_USERNAME}" \
--docker-password="${REGISTRY_PASSWORD}" \
--namespace tars-system \
--dry-run=client -o yaml | kubectl apply -f -
kubectl create secret docker-registry "${PULL_SECRET}" \
--docker-server="${PRIVATE_REGISTRY_HOST}" \
--docker-username="${REGISTRY_USERNAME}" \
--docker-password="${REGISTRY_PASSWORD}" \
--namespace tars-dataplane \
--dry-run=client -o yaml | kubectl apply -f -
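The two commands differ only in --namespace, so a loop keeps them in sync. The block below is a sketch. create_pull_secrets is a hypothetical function, and KUBECTL is an assumed override for previewing or stubbing the commands.

```shell
# Hypothetical wrapper: create (or update) the same pull secret in every
# namespace passed as an argument, using the variables from Step D.1.
KUBECTL="${KUBECTL:-kubectl}"

create_pull_secrets() {
  for ns in "$@"; do
    $KUBECTL create secret docker-registry "${PULL_SECRET}" \
      --docker-server="${PRIVATE_REGISTRY_HOST}" \
      --docker-username="${REGISTRY_USERNAME}" \
      --docker-password="${REGISTRY_PASSWORD}" \
      --namespace "${ns}" \
      --dry-run=client -o yaml | $KUBECTL apply -f - || return 1
  done
}

# Usage:
#   create_pull_secrets tars-system tars-dataplane
```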
Step D.4: install from the private registry
Install Agent Router with --image-registry pointing at the private registry and --image-pull-secret-name referencing the existing secret:
tare install "${DP_CREDENTIAL}" \
--image-registry "${PRIVATE_IMAGE_REGISTRY}" \
--image-pull-secret-name "${PULL_SECRET}" \
--wait
This does not create the secret. It tells the Helm install to use the secret that already exists in tars-system and tars-dataplane.
If a platform team mirrors Agent Router images into the private registry before the install, skip the --image-sync step from Step D.2 and run only the tare install command above.
After the install, continue with Step 7 or the organization's preferred ingress path.
Where to go next