Agent Router data plane installation for AWS
This guide installs the Agent Router data plane on AWS Elastic Kubernetes Service (EKS).
Overview
The platform uses a split-plane model. A Management Plane hosted by Tetrate holds configuration and exposes the web applications. A Data Plane runs in your AWS account and handles all AI traffic. The two planes communicate over one outbound HTTPS connection initiated by the data plane; no inbound internet connections are required.
By the end of this guide you will have:
- An EKS cluster with the EBS CSI driver installed
- An ECR registry mirroring the Agent Router images
- The Agent Router data plane running in the tars-system and tars-dataplane namespaces
- An Application Load Balancer fronting the data plane at a DNS name you own
- An OpenTelemetry pipeline forwarding metrics to your observability stack
Plan for 30–45 minutes of installation time, plus DNS propagation.
Table of contents
- Prepare for the installation: obtain the data plane credential and install the CLI
- Cluster setup: create or reuse a Kubernetes cluster
- Registry setup: mirror Agent Router images so the cluster can pull them
- Data Plane installation: deploy the data plane
- Ingress setup: expose the data plane externally
- DNS configuration: wire the hostname to the ingress and register the URL on the management plane
- Testing the installation: verify the install works end-to-end
- Operations
- Appendices
Prerequisites
Dashboard and Router app access
Agent Router has two web surfaces. Both URLs are provided during onboarding:
- Dashboard (admin): https://dashboard.<your-tenant>.tetrate.ai. Used in Step 1 and Step 12.
- Router app (end-user): https://router.<your-tenant>.tetrate.ai. Used in Step 14 to create API keys for callers.
Tools
Install the following on the host that will run the guide:
| Tool | Install |
|---|---|
| aws CLI v2 | https://docs.aws.amazon.com/cli/latest/userguide/install.html |
| eksctl ≥ 0.220 | https://eksctl.io/installation/ (or brew install eksctl). Older versions do not recognise current EKS K8s versions. |
| kubectl | https://kubernetes.io/docs/tasks/tools/ |
| helm (3+) | https://helm.sh/docs/intro/install/ |
| docker | https://docs.docker.com/get-docker/ |
| curl | preinstalled on macOS and most Linux distros |
| tare CLI | covered in Step 2 |
A data-plane-credentials.json file is also required; see Step 1.
Infrastructure
A dedicated workload cluster must be provisioned before starting the installation. The cluster should consist of at least three (3) nodes. See Cluster sizing for more details.
Tetrate support does not cover client-side infrastructure provisioning or Kubernetes issues. The instructions for creating clusters and related infrastructure components are provided as a courtesy and should be carefully evaluated before executing them.
AWS permissions
The operator needs permission to manage these AWS services on the target account:
| Service | Used for |
|---|---|
| EKS (AmazonEKSClusterPolicy, AmazonEKSWorkerNodePolicy) | Cluster creation and node groups |
| EC2 + VPC | Cluster networking, EBS volumes, ELB provisioning |
| IAM | OIDC provider, IRSA roles for EBS CSI driver and AWS Load Balancer Controller |
| CloudFormation | eksctl deploys everything via CloudFormation stacks |
| ECR | Create repositories, push/pull images |
| ELB (Elastic Load Balancing) | ALB created by the AWS Load Balancer Controller |
The simplest setup is an admin role on a sandbox account. For a constrained role, AWS publishes the minimum policy eksctl needs at https://eksctl.io/usage/minimum-iam-policies/.
Cluster sizing
The chart's default install runs multiple always-on components (egress envoy with min 2 replicas, AI gateway controller and ext_proc, controller / worker, Redis / rate-limit). This is not a single-node footprint.
The egress envoy is the dominant resource consumer. Both CPU and memory usage grow with the configuration size the proxy holds in memory: the number of AIGatewayRoute and AIServiceBackend resources, header-mutation rules, and other per-route features. The AI gateway team's control-plane scaling benchmark shows roughly linear CPU and memory growth from adding routes, with memory staying elevated to keep the xDS state available to serve traffic. Plan for routes to scale up as the data plane adds providers, models, and projects. General-purpose EC2 families (m5.* or m6i.*, balanced CPU/RAM) are the right default.
| Size | Use case | Recommended node pool | Approx allocatable target |
|---|---|---|---|
| Small | dev / test / low traffic | 3 × m5.large | ≥ 6 vCPU, ≥ 20 GiB RAM |
| Medium | staging / light production | 3 × m5.xlarge | ≥ 12 vCPU, ≥ 40 GiB RAM |
| High | production with burst headroom | 3 × m5.2xlarge (or split into system + data plane node groups) | ≥ 24 vCPU, ≥ 80 GiB RAM |
Practical floor: 3 nodes minimum to survive a node drain or upgrade. Demo installs can start at Small; production should start at Medium.
Conventions
The steps below export environment variables (AWS_REGION, EKS_CLUSTER_NAME, AWS_ACCOUNT, ECR_HOST, CREDENTIAL_FILE, SERVE_URL, and others) as they become needed. Each later step assumes the variables defined earlier are still exported in the current shell. If you start a new shell mid-install, re-export them before continuing.
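Because later steps and the helper scripts rely on these variables, a quick check at the start of a step can catch a fresh shell early. The sketch below is a hypothetical helper (require_vars is not part of tare or of this guide's scripts). It uses printenv, so a variable that is set but not exported is reported as well.

```shell
# Hypothetical helper, not part of the tare tooling: report any variables that
# are missing from the exported environment (unset, empty, or not exported).
require_vars() {
  missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing or not exported: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: run before Step 9.
# require_vars AWS_REGION EKS_CLUSTER_NAME AWS_ACCOUNT ECR_HOST CREDENTIAL_FILE SERVE_URL
```

The function returns non-zero when anything is missing, so it can gate a script with require_vars ... || exit 1.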
Step 1: obtain your data plane credential
In the dashboard, navigate to System → Settings → Data plane credentials → + Generate Data plane credential.
Save the downloaded file as data-plane-credentials.json on the host where the install runs. This file is the long-lived identity the data plane uses to authenticate to the management plane.
Some parts of the product still use older "service account" naming for this file. CLI output and in-cluster paths may reference it as a service account; it is the same file. The dashboard is in the process of standardizing on "data plane credential".
Each data plane uses its own credential. Revoke a credential from the dashboard or generate additional ones (for example, one per environment) at any time.
Step 2: install the tare CLI
curl -sSL https://tare.tetrate.ai/tools/install.sh | bash
Output:
==> tare installer
==> channel: stable
==> Detected platform: darwin-arm64
==> Installing tare for darwin-arm64...
==> Downloading from: https://tare.tetrate.ai/tools/tags/v0.1.0-beta.2/tare-darwin-arm64.tar.gz
ok Installed tare to /Users/johndoe/.tare/bin/tare
==> tare version: tare version v0.1.0-beta.2
ok Installation directory is already in your PATH
==> Get started:
tare install identity.json --serve-url https://proxy.acme.com
tare install --help
The installer prints the install path (typically ~/.tare/bin/tare). If that directory is not already in PATH, add it, then verify the version:
export PATH="$PATH:$HOME/.tare/bin"
echo 'export PATH="$PATH:$HOME/.tare/bin"' >> ~/.zshrc # or ~/.bashrc
$ tare --version
tare version v0.1.0-beta.2
Step 3: provision the EKS cluster
Step 3.1: set environment variables
export AWS_REGION=<region> # e.g. us-east-1
export EKS_CLUSTER_NAME=<cluster-name>
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
# Pick the regional default Kubernetes version so the version doesn't
# go out of support unexpectedly.
export K8S_VERSION=$(aws eks describe-cluster-versions \
  --region "${AWS_REGION}" \
  --query 'clusterVersions[?defaultVersion==`true`].clusterVersion' \
  --output text)
# Optional resource tags; adapt or drop these for your organization.
export TAGS="Owner=<your-name>,Team=<your-team>,Purpose=development"
Confirm that eksctl version reports ≥ 0.220 before continuing. Older binaries reject the Kubernetes version that EKS now defaults to, and the cluster create call fails with invalid version, supported values: 1.23, …, 1.31.
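The version check can be scripted. The following is a minimal sketch, assuming that eksctl version prints a bare version string such as 0.220.0; version_ge is a hypothetical helper name, and only the first three numeric fields are compared.

```shell
# Hypothetical helper: succeed when version $1 is at least version $2.
# Compares up to three dot-separated numeric fields.
version_ge() {
  # Drop a leading "v" and any pre-release or build suffix, then pad with zeros.
  have=${1#v}; have="${have%%[-+]*}.0.0"
  want=${2#v}; want="${want%%[-+]*}.0.0"
  for field in 1 2 3; do
    h=$(printf '%s\n' "$have" | cut -d. -f"$field")
    w=$(printf '%s\n' "$want" | cut -d. -f"$field")
    if [ "$h" -gt "$w" ]; then return 0; fi
    if [ "$h" -lt "$w" ]; then return 1; fi
  done
  return 0
}

# Example (assumes `eksctl version` prints a bare version such as 0.220.0):
# version_ge "$(eksctl version)" 0.220.0 || echo "upgrade eksctl first" >&2
```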
Step 3.2: log in to AWS
aws sts get-caller-identity
Step 3.3: create the EKS cluster
eksctl create cluster \
--name "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" \
--version "${K8S_VERSION}" \
--nodes 3 \
--nodes-min 3 \
--nodes-max 3 \
--node-type m5.xlarge \
--with-oidc \
--tags "${TAGS}"
Provisioning takes ~15 minutes. --with-oidc enables IAM Roles for Service Accounts (IRSA), which the EBS CSI driver and AWS Load Balancer Controller depend on.
The --nodes 3 and --node-type m5.xlarge values above match the Medium tier in Cluster sizing. Adjust for your environment.
Step 3.4: verify
kubectl get nodes
Expected:
NAME STATUS ROLES AGE VERSION
ip-192-168-14-164.ec2.internal Ready <none> 2m v1.35.x-eks-...
ip-192-168-25-245.ec2.internal Ready <none> 2m v1.35.x-eks-...
ip-192-168-40-57.ec2.internal Ready <none> 2m v1.35.x-eks-...
Already have an EKS cluster? Reuse it after confirming that:
- OIDC provider is associated: aws eks describe-cluster --name <name> --region <region> --query "cluster.identity.oidc.issuer"
- EBS CSI driver is installed: aws eks list-addons --cluster-name <name> --region <region> --query 'addons' | grep aws-ebs-csi-driver
- K8s version is supported (aws eks describe-cluster-versions --region <region> shows the cluster's version is not deprecated)
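One caveat on the OIDC check: EKS assigns every cluster an issuer URL, so a non-empty issuer does not by itself show that an IAM OIDC provider has been created for it. The sketch below, modeled on the check in the AWS documentation, compares the issuer ID against the providers registered in IAM. The function name oidc_provider_registered is hypothetical.

```shell
# Sketch: succeed only when an IAM OIDC provider exists for the cluster's issuer.
# Usage: oidc_provider_registered <cluster-name> <region>
oidc_provider_registered() {
  issuer=$(aws eks describe-cluster --name "$1" --region "$2" \
    --query "cluster.identity.oidc.issuer" --output text) || return 2
  # The issuer URL ends in /id/<ID>; the provider ARN ends with the same ID.
  oidc_id=${issuer##*/}
  [ -n "$oidc_id" ] || return 2
  aws iam list-open-id-connect-providers \
    --query "OpenIDConnectProviderList[].Arn" --output text | grep -q "$oidc_id"
}
```

If the function fails on a reused cluster, eksctl utils associate-iam-oidc-provider --cluster <name> --region <region> --approve creates the provider.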
Step 4: install the EBS CSI driver addon
The egress envoy and Redis use persistent volumes, and EKS does not install the EBS CSI driver by default.
# IAM role for the addon (one-shot — keeps the role even after cluster recreation)
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve --role-only \
--role-name "${EKS_CLUSTER_NAME}-AmazonEKS_EBS_CSI_DriverRole"
# Install the addon itself
eksctl create addon \
--cluster "${EKS_CLUSTER_NAME}" \
--name aws-ebs-csi-driver \
--region "${AWS_REGION}" \
--service-account-role-arn "arn:aws:iam::${AWS_ACCOUNT}:role/${EKS_CLUSTER_NAME}-AmazonEKS_EBS_CSI_DriverRole" \
--force
Wait for it to become ACTIVE (~1 min):
aws eks describe-addon --cluster-name "${EKS_CLUSTER_NAME}" \
--addon-name aws-ebs-csi-driver --region "${AWS_REGION}" \
--query 'addon.status' --output text
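Rather than re-running the status command by hand, the wait can be scripted. This is a minimal sketch of a generic polling loop; poll_until is a hypothetical helper, and the attempt count and delay in the example are arbitrary choices.

```shell
# Hypothetical helper: re-run a command until its output equals the expected
# value. Usage: poll_until <expected> <max-attempts> <delay-seconds> <command...>
poll_until() {
  expected=$1; attempts=$2; delay=$3
  shift 3
  attempt=0
  while [ "$attempt" -lt "$attempts" ]; do
    actual=$("$@" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then return 0; fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 1
}

# Example: wait up to 5 minutes for the addon.
# poll_until ACTIVE 30 10 aws eks describe-addon \
#   --cluster-name "${EKS_CLUSTER_NAME}" --addon-name aws-ebs-csi-driver \
#   --region "${AWS_REGION}" --query 'addon.status' --output text
```

The AWS CLI also provides a built-in waiter, aws eks wait addon-active, which may be simpler if its default timing is acceptable.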
Step 5: create ECR repositories
Agent Router images must live in your registry. Unlike most cloud registries, ECR requires each repository to be pre-created; tare install --image-sync does not auto-create them.
The set of repos changes across tare releases. Rather than hand-maintaining a list, this guide uses a helper that loops tare install --sync-only, parses any NAME_UNKNOWN errors, creates the missing repo, and retries.
export ECR_HOST="${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com"
export ECR_PREFIX="${EKS_CLUSTER_NAME}" # repos will be created as ${EKS_CLUSTER_NAME}/<image>
The helper is in this repo at tests/aws-dp-install/sync-images.sh. Run it after the next step.
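To illustrate the parse-and-retry idea, here is a hedged sketch of the parsing step only. It is not the contents of sync-images.sh, which may work differently. It assumes the error text has the shape quoted in the Troubleshooting table; missing_repo_from_error and sync.log are hypothetical names.

```shell
# Hypothetical sketch of the parsing step; the real sync-images.sh may differ.
# Reads tool output on stdin and prints the first missing repository name.
missing_repo_from_error() {
  sed -n "s/.*NAME_UNKNOWN: The repository with name '\([^']*\)' does not exist.*/\1/p" \
    | head -n 1
}

# Example retry step (aws ecr create-repository is the standard AWS CLI call):
# repo=$(missing_repo_from_error < sync.log)
# [ -n "$repo" ] && aws ecr create-repository \
#   --repository-name "$repo" --region "${AWS_REGION}"
```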
Step 6: authenticate Docker to ECR
aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${ECR_HOST}"
ECR tokens expire after 12 hours. Re-run this command if subsequent steps fail with unauthorized: authentication required.
Step 7: sync Agent Router images to your ECR
Save data-plane-credentials.json to a known path, then run:
export CREDENTIAL_FILE=/path/to/data-plane-credentials.json
export SERVE_URL="http://<your-data-plane-hostname>" # e.g. http://proxy.example.com
bash tests/aws-dp-install/sync-images.sh
The helper creates any missing ECR repos as tare reports them. Expect ~3–5 minutes on the first run. On success it prints:
✓ Image sync done in 2m22s
=== sync complete ===
--sync-only still requires a --serve-url value (a chicken-and-egg dependency on the management plane URL registration). Use the hostname already pre-cleared with the DNS owner; the URL itself is registered when tare install runs in Step 9.
Step 8: create the image-pull secret
kubectl create ns tars-system --dry-run=client -o yaml | kubectl apply -f -
kubectl create ns tars-dataplane --dry-run=client -o yaml | kubectl apply -f -
ECR_TOKEN=$(aws ecr get-login-password --region "${AWS_REGION}")
for NS in tars-system tars-dataplane; do
kubectl create secret docker-registry registry-secret \
--docker-server="${ECR_HOST}" \
--docker-username=AWS \
--docker-password="${ECR_TOKEN}" \
-n "${NS}" \
--dry-run=client -o yaml | kubectl apply -f -
done
ECR tokens expire every 12 hours, so this secret stops working after that window. For production deployments, use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity to mint tokens on demand instead of a static secret; see AWS docs.
Step 9: install the data plane
tare install "${CREDENTIAL_FILE}" \
--image-sync "${ECR_HOST}/${ECR_PREFIX}" \
--image-pull-secret-name registry-secret \
--serve-url "${SERVE_URL}"
The command:
- Re-syncs any images (idempotent after the previous step).
- Installs the helm chart into tars-system and tars-dataplane.
- Registers the --serve-url value with the management plane.
Watch the pods come up:
kubectl get pods -n tars-system
kubectl get pods -n tars-dataplane
Expected output (all Running, 1/1 or 3/3):
NAME READY STATUS RESTARTS AGE
ai-gateway-controller-... 1/1 Running 0 2m
controller-... 1/1 Running 0 2m
controller-worker-... 1/1 Running 0 2m
envoy-gateway-... 1/1 Running 0 2m
envoy-ratelimit-... 1/1 Running 0 1m
tars-redis-master-... 1/1 Running 0 2m
tars-tare-doctor-... 1/1 Running 0 1m
NAME READY STATUS RESTARTS AGE
egress-... 3/3 Running 0 1m
The install command also prints a Dataplane unreachable warning at this point because DNS does not resolve the data-plane hostname yet. This is expected; DNS is wired in Step 12 below.
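Instead of watching the pod list, the wait can be scripted with kubectl wait. This is a minimal sketch; wait_for_data_plane is a hypothetical name and the 300s default is an arbitrary choice. One caveat: pods that belong to completed Jobs never report Ready, so if the chart creates any, this command would time out and a label selector would be needed.

```shell
# Sketch: block until every pod in both namespaces reports Ready.
# Usage: wait_for_data_plane [timeout]   (timeout defaults to 300s)
wait_for_data_plane() {
  timeout=${1:-300s}
  for ns in tars-system tars-dataplane; do
    kubectl wait --for=condition=Ready pods --all \
      -n "$ns" --timeout="$timeout" || return 1
  done
}
```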
Step 10: install the AWS Load Balancer Controller
The data plane needs an L7 load balancer to terminate inbound HTTP and forward to the egress Service in tars-dataplane. The AWS Load Balancer Controller provisions an ALB from Kubernetes Ingress resources.
Step 10.1: IAM policy
The latest controller version requires permissions beyond what the v2.7.1 reference policy covered. Fetch the current policy from main:
curl -sSL -o /tmp/iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json
aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file:///tmp/iam-policy.json
If the policy already exists from a prior run, update it instead:
POLICY_ARN=arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy
aws iam create-policy-version \
--policy-arn "${POLICY_ARN}" \
--policy-document file:///tmp/iam-policy.json \
--set-as-default
Step 10.2: IAM service account
eksctl create iamserviceaccount \
--cluster="${EKS_CLUSTER_NAME}" \
--region="${AWS_REGION}" \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--attach-policy-arn="arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy" \
--approve --override-existing-serviceaccounts
Step 10.3: Helm install
helm repo add eks https://aws.github.io/eks-charts
helm repo update
VPC_ID=$(aws eks describe-cluster --name "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" \
--query "cluster.resourcesVpcConfig.vpcId" --output text)
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName="${EKS_CLUSTER_NAME}" \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller \
--set region="${AWS_REGION}" \
--set vpcId="${VPC_ID}"
Wait for the controller pods to roll out:
kubectl rollout status -n kube-system deployment/aws-load-balancer-controller --timeout=180s
Step 11: create the ALB Ingress
All data-plane traffic (/v1/*, /mcp/*, /.well-known/*) is served by the egress Service in tars-dataplane on port 10080. A single Ingress is enough.
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tars-ingress
  namespace: tars-dataplane
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: egress
                port:
                  number: 10080
EOF
Wait for the ALB to provision (~1–2 min) and capture its hostname:
kubectl get ingress tars-ingress -n tars-dataplane
The ADDRESS column populates with something like k8s-tarsdata-tarsingr-xxxxxxxxxx-yyyyyyyyy.us-east-1.elb.amazonaws.com.
If ADDRESS stays empty for more than a few minutes, check the controller for FailedDeployModel events:
kubectl get events -n tars-dataplane --sort-by=.lastTimestamp | tail -5
kubectl logs -n kube-system deployment/aws-load-balancer-controller --tail=30 | grep -iE "error|fail"
The most common cause is an outdated IAM policy missing a permission such as elasticloadbalancing:DescribeListenerAttributes. Re-run the IAM policy update above and restart the controller.
The Ingress persists across tare install reinstalls; re-applying it during upgrades is not required.
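Two of the actions in this step can be scripted: capturing the ALB hostname for the DNS step, and restarting the controller after an IAM policy update. The function names and the ALB_HOST variable below are hypothetical; the kubectl commands are standard.

```shell
# Sketch: print the ALB hostname once the controller has populated the status.
alb_hostname() {
  kubectl get ingress tars-ingress -n tars-dataplane \
    -o jsonpath='{.status.loadBalancer.ingress[0].hostname}'
}

# Sketch: restart the controller after updating its IAM policy.
restart_lb_controller() {
  kubectl rollout restart deployment/aws-load-balancer-controller -n kube-system &&
    kubectl rollout status deployment/aws-load-balancer-controller \
      -n kube-system --timeout=180s
}

# Example: ALB_HOST is a hypothetical variable name for use in Step 12.
# ALB_HOST=$(alb_hostname)
```

alb_hostname prints nothing while the ADDRESS column is still empty, so an empty result means the ALB has not been provisioned yet.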
Step 12: wire DNS and register the URL
Step 12.1: DNS
Point the data-plane hostname at the ALB via a CNAME record at your DNS provider:
proxy.example.com CNAME k8s-tarsdata-tarsingr-xxxxxxxxxx-yyyyyyyyy.us-east-1.elb.amazonaws.com
Verify it resolves (propagation may take a couple of minutes):
dig +short proxy.example.com
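To confirm that the record points at the expected ALB, and not merely that it resolves, the CNAME answer can be compared against the Ingress address. This is a minimal sketch with a hypothetical function name. It assumes a plain CNAME record; a Route 53 alias record returns no CNAME answer and would fail this check.

```shell
# Sketch: succeed when <hostname> has a CNAME record pointing at <target>.
# Usage: cname_points_to <hostname> <target>
cname_points_to() {
  host=$1; target=$2
  answer=$(dig +short "$host" CNAME | head -n 1)
  # dig prints fully qualified names with a trailing dot, and DNS names are
  # case-insensitive, so normalize both sides before comparing.
  answer=$(printf '%s' "${answer%.}" | tr '[:upper:]' '[:lower:]')
  target=$(printf '%s' "${target%.}" | tr '[:upper:]' '[:lower:]')
  [ -n "$answer" ] && [ "$answer" = "$target" ]
}

# Example:
# cname_points_to proxy.example.com \
#   k8s-tarsdata-tarsingr-xxxxxxxxxx-yyyyyyyyy.us-east-1.elb.amazonaws.com
```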
Step 12.2: register the URL on the management plane
If the URL passed to tare install --serve-url matches the hostname just wired in DNS, registration is already done; tare install registered the data-plane URL when it ran. Otherwise, in the dashboard: System → Settings → Data planes → edit your data plane → URL.
Step 13: verify the install
Run tare doctor to check pod, CRD, and policy state:
tare doctor "${CREDENTIAL_FILE}"
A healthy install reports all components in Accepted: True state. If tare doctor flags any errors, contact Tetrate Support.
Step 14: smoke tests
Once tare doctor reports clean, validate end-to-end traffic.
Create an API key in the router app (Router → API Keys → + Create), then export it together with the data-plane hostname:
export DP_HOST=proxy.example.com
export TARS_API_KEY=<your-api-key>
Run the suite (calls each endpoint with and without auth; expects 200 and 401 respectively):
bash tests/aws-dp-install/smoke.sh all
Expected output:
==== list models (GET /v1/models) ====
ok models auth http=200 expected=200
ok models NO-auth http=401 expected=401
==== chat completions (POST /v1/chat/completions) ====
ok openai gpt-5-mini auth http=200 expected=200
ok openai gpt-5-mini NO-auth http=401 expected=401
==== responses API (POST /v1/responses) ====
ok responses auth http=200 expected=200
ok responses NO-auth http=401 expected=401
==== anthropic messages (POST /v1/messages) ====
ok anthropic auth http=200 expected=200
ok anthropic NO-auth http=401 expected=401
==== embeddings (POST /v1/embeddings) ====
ok embeddings auth http=200 expected=200
ok embeddings NO-auth http=401 expected=401
==== image generation (POST /v1/images/generations) ====
ok images auth http=200 expected=200
ok images NO-auth http=401 expected=401
-
pass: 12 fail: 0 auth-bypass: 0
If any row fails (in particular, if a NO-auth row returns 200 instead of 401), contact Tetrate Support.
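For a quick manual check of a single endpoint, the auth and no-auth pair can be reproduced with curl. This is a hedged sketch, not the implementation of smoke.sh. It assumes the default HTTP listener and the Bearer header format used elsewhere in this guide; check_models_auth is a hypothetical name.

```shell
# Sketch: call GET /v1/models with and without the API key and compare the
# HTTP status codes against the expected 200 and 401.
check_models_auth() {
  base="http://${DP_HOST}"
  with_auth=$(curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer ${TARS_API_KEY}" "${base}/v1/models")
  without_auth=$(curl -s -o /dev/null -w '%{http_code}' "${base}/v1/models")
  echo "models auth    http=${with_auth} expected=200"
  echo "models NO-auth http=${without_auth} expected=401"
  [ "$with_auth" = "200" ] && [ "$without_auth" = "401" ]
}
```

The function returns non-zero if either status differs, including the case where the unauthenticated call returns 200.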
Upgrading
An upgrade re-runs the install command. The helm chart, CRDs, and config in tars-config are upgraded; persistent state (Redis-backed rate-limit counters, accumulated audit data) survives. The ALB Ingress persists, the IAM policy does not change, and the EBS CSI addon remains installed.
tare install "${CREDENTIAL_FILE}" \
--image-sync "${ECR_HOST}/${ECR_PREFIX}" \
--image-pull-secret-name registry-secret \
--serve-url "${SERVE_URL}"
Some runtime patches (notably observability env vars on the egress deployment) are reset on every install. Re-apply the patches from Appendix B after each upgrade.
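Because the patch has to be re-applied every time, it can help to wrap the upgrade and the patch in one function. This is a minimal sketch that reuses the commands from this guide. upgrade_data_plane is a hypothetical name, and the collector endpoint assumes the Appendix B deployment.

```shell
# Sketch: upgrade, then re-apply the observability env vars from Appendix B.
# The patch is skipped if the install fails.
upgrade_data_plane() {
  tare install "${CREDENTIAL_FILE}" \
    --image-sync "${ECR_HOST}/${ECR_PREFIX}" \
    --image-pull-secret-name registry-secret \
    --serve-url "${SERVE_URL}" || return 1
  kubectl set env deploy/egress -n tars-dataplane \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.otel-system.svc.cluster.local:4318 \
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
}
```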
Cleanup
To remove everything created by this guide:
# 1. Ingress (releases the ALB)
kubectl delete ingress tars-ingress -n tars-dataplane --ignore-not-found
# 2. AWS Load Balancer Controller
helm uninstall aws-load-balancer-controller -n kube-system 2>/dev/null || true
eksctl delete iamserviceaccount \
--name aws-load-balancer-controller \
--namespace kube-system \
--cluster "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" 2>/dev/null || true
ALB_POLICY_ARN=$(aws iam list-policies \
--query "Policies[?PolicyName=='AWSLoadBalancerControllerIAMPolicy'].Arn" \
--output text 2>/dev/null)
if [ -n "${ALB_POLICY_ARN}" ] && [ "${ALB_POLICY_ARN}" != "None" ]; then
# Delete non-default versions first (IAM rule)
for V in $(aws iam list-policy-versions --policy-arn "${ALB_POLICY_ARN}" \
--query 'Versions[?IsDefaultVersion==`false`].VersionId' --output text); do
aws iam delete-policy-version --policy-arn "${ALB_POLICY_ARN}" --version-id "${V}"
done
aws iam delete-policy --policy-arn "${ALB_POLICY_ARN}"
fi
# 3. ECR repositories (the cluster prefix matches what was used at create time)
for REPO in $(aws ecr describe-repositories --region "${AWS_REGION}" \
--query "repositories[?starts_with(repositoryName, '${EKS_CLUSTER_NAME}/')].repositoryName" \
--output text); do
aws ecr delete-repository --repository-name "${REPO}" \
--region "${AWS_REGION}" --force
done
# 4. EKS cluster (also tears down the EBS CSI addon and its IAM role via
# CloudFormation — takes ~10 min)
eksctl delete cluster --name "${EKS_CLUSTER_NAME}" --region "${AWS_REGION}"
# 5. Local kubeconfig file
rm -f "${HOME}/kubeconfig-${EKS_CLUSTER_NAME}"
DNS records (CNAMEs at your DNS provider) must be removed manually.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| eksctl create cluster errors with invalid version, supported values: 1.23, …, 1.31 | eksctl is older than the K8s version EKS currently defaults to | brew upgrade eksctl (or re-download), then retry |
| tare install --sync-only errors with NAME_UNKNOWN: The repository with name '...' does not exist | ECR repo for that image was never created | The sync-images.sh helper handles this; re-run it |
| Ingress ADDRESS stays empty for more than 2 min | LB Controller IAM policy is missing newer permissions | Update the IAM policy from main, restart the controller |
| Pods stuck ImagePullBackOff with no basic auth credentials | ECR token in registry-secret expired (12h lifetime) | Re-run the image-pull-secret creation step (Step 8) |
| egress pods restart loop with connection refused to redis | EBS CSI driver missing, so Redis PV never binds | Install the EBS CSI addon (Step 4) |
Appendix A: TLS via ACM + HTTPS listener
The default flow uses HTTP on port 80. For production, terminate TLS at the ALB using an AWS Certificate Manager (ACM) certificate.
- Request the certificate in ACM for the data-plane hostname. Validate via DNS (CNAME) or email per the domain control method.
- Get the certificate ARN:
CERT_ARN=$(aws acm list-certificates --region "${AWS_REGION}" \
--query "CertificateSummaryList[?DomainName=='proxy.example.com'].CertificateArn" \
--output text)
- Update the Ingress to listen on HTTPS:443 and redirect HTTP → HTTPS:
metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: <CERT_ARN>
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06
- Update the data-plane URL on the management plane to https://.... Re-run tare install to re-register the --serve-url value automatically.
The ALB picks up the annotation change without redeploying. Verify with:
curl -I https://proxy.example.com/v1/models -H "Authorization: Bearer ${TARS_API_KEY}"
Appendix B: observability with OpenTelemetry
The data plane emits OTLP metrics (router_* family). To forward them to your observability backend, deploy an OpenTelemetry Collector in-cluster.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: otel-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: otel-system
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch:
        timeout: 5s
      transform/strip_scope:
        metric_statements:
          - context: metric
            statements:
              - replace_pattern(name, "^dynamicmodulescustom\\.", "router_")
    exporters:
      debug:
        verbosity: detailed
      # Replace this with your real backend exporter (Datadog, Grafana Cloud,
      # SigNoz, CloudWatch via the awsemf exporter, etc.).
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch, transform/strip_scope]
          exporters: [debug]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: otel-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.98.0
          ports:
            - containerPort: 4317
            - containerPort: 4318
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol-contrib
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: otel-system
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-http
      port: 4318
      targetPort: 4318
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
EOF
Point the egress envoy at the collector:
kubectl set env deploy/egress -n tars-dataplane \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.otel-system.svc.cluster.local:4318 \
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
Generate traffic via the smoke tests, then verify metrics are flowing:
kubectl logs -n otel-system deployment/otel-collector --tail=50 | grep "Name:"
Expected metrics include router_requests_total, router_auth_attempts_total, and similar.
The kubectl set env patch is reset by tare install re-runs. Re-apply after every upgrade until the chart accepts these settings via Helm values.
Where to go next