Agent Router data plane installation for AWS

This guide installs the Agent Router data plane on AWS Elastic Kubernetes Service (EKS).


Overview

The platform uses a split-plane model. A Management Plane hosted by Tetrate holds configuration and exposes the web applications. A Data Plane runs in your AWS account and handles all AI traffic. The two planes communicate over one outbound HTTPS connection initiated by the data plane; no inbound internet connections are required.

By the end of this guide you will have:

  • An EKS cluster with the EBS CSI driver installed
  • An ECR registry mirroring the Agent Router images
  • The Agent Router data plane running in the tars-system and tars-dataplane namespaces
  • An Application Load Balancer fronting the data plane at a DNS name you own
  • An OpenTelemetry pipeline forwarding metrics to your observability stack

Plan for 30–45 minutes of installation time, plus DNS propagation.

Prerequisites

Dashboard and Router app access

Agent Router has two web surfaces. Both URLs are provided during onboarding:

  • Dashboard (admin): https://dashboard.<your-tenant>.tetrate.ai. Used in Step 1 and Step 12.
  • Router app (end-user): https://router.<your-tenant>.tetrate.ai. Used in Step 14 to create API keys for callers.

Tools

Install the following on the host that will run the guide:

Tool | Install
aws CLI v2 | https://docs.aws.amazon.com/cli/latest/userguide/install.html
eksctl ≥ 0.220 | https://eksctl.io/installation/ (or brew install eksctl). Older versions do not recognise current EKS K8s versions.
kubectl | https://kubernetes.io/docs/tasks/tools/
helm (3+) | https://helm.sh/docs/intro/install/
docker | https://docs.docker.com/get-docker/
curl | preinstalled on macOS and most Linux distros
tare CLI | covered in Step 2

A data-plane-credentials.json file is also required; see Step 1.

Infrastructure

A dedicated workload cluster must be provisioned before starting the installation. The cluster should consist of at least three (3) nodes. See Cluster sizing for more details.

warning

Tetrate support does not cover client-side infrastructure provisioning or Kubernetes issues. The instructions for creating clusters and related infrastructure components are provided as a courtesy and should be carefully evaluated before executing them.

AWS permissions

The operator needs permission to manage these AWS services on the target account:

Service | Used for
EKS (AmazonEKSClusterPolicy, AmazonEKSWorkerNodePolicy) | Cluster creation and node groups
EC2 + VPC | Cluster networking, EBS volumes, ELB provisioning
IAM | OIDC provider, IRSA roles for EBS CSI driver and AWS Load Balancer Controller
CloudFormation | eksctl deploys everything via CloudFormation stacks
ECR | Create repositories, push/pull images
ELB (Elastic Load Balancing) | ALB created by the AWS Load Balancer Controller

The simplest setup is an admin role on a sandbox account. For a constrained role, AWS publishes the minimum policy eksctl needs at https://eksctl.io/usage/minimum-iam-policies/.

Cluster sizing

The chart's default install runs multiple always-on components (egress envoy with min 2 replicas, AI gateway controller and ext_proc, controller / worker, Redis / rate-limit). This is not a single-node footprint.

The egress envoy is the dominant resource consumer. Both CPU and memory usage grow with the configuration size the proxy holds in memory: the number of AIGatewayRoute and AIServiceBackend resources, header-mutation rules, and other per-route features. The AI gateway team's control-plane scaling benchmark shows roughly linear CPU and memory growth from adding routes, with memory staying elevated to keep the xDS state available to serve traffic. Plan for routes to scale up as the data plane adds providers, models, and projects. General-purpose EC2 families (m5.* or m6i.*, balanced CPU/RAM) are the right default.

Size | Use case | Recommended node pool | Approx allocatable target
Small | dev / test / low traffic | 3 × m5.large | ≥ 6 vCPU, ≥ 20 GiB RAM
Medium | staging / light production | 3 × m5.xlarge | ≥ 12 vCPU, ≥ 40 GiB RAM
High | production with burst headroom | 3 × m5.2xlarge (or split into system + data plane node groups) | ≥ 24 vCPU, ≥ 80 GiB RAM

Practical floor: 3 nodes minimum to survive a node drain or upgrade. Demo installs can start at Small; production should start at Medium.
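
As a quick sanity check on the table above, the raw capacity of a node pool can be computed from the published m5 instance sizes (2 vCPU / 8 GiB for m5.large, doubling at each step). This is a minimal sketch; pool_capacity is a hypothetical helper, and allocatable capacity will be somewhat lower than these raw figures once kubelet and system reservations are subtracted.

```shell
# Raw (pre-reservation) capacity of a node pool.
# pool_capacity is a hypothetical helper, not part of eksctl or tare.
pool_capacity() {
  count="$1"; type="$2"
  case "${type}" in
    m5.large)   vcpu=2; mem=8 ;;
    m5.xlarge)  vcpu=4; mem=16 ;;
    m5.2xlarge) vcpu=8; mem=32 ;;
    *) echo "unknown instance type: ${type}" >&2; return 1 ;;
  esac
  echo "$((count * vcpu)) vCPU, $((count * mem)) GiB"
}

pool_capacity 3 m5.large    # Small tier
pool_capacity 3 m5.xlarge   # Medium tier
pool_capacity 3 m5.2xlarge  # High tier
```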

Conventions

The steps below export environment variables (AWS_REGION, EKS_CLUSTER_NAME, AWS_ACCOUNT, ECR_HOST, CREDENTIAL_FILE, SERVE_URL, and others) as they become needed. Each later step assumes the variables defined earlier are still exported in the current shell. If you start a new shell mid-install, re-export them before continuing.
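
When resuming in a new shell, a small guard can report which variables still need to be set. The sketch below is an illustration only: require_vars is a hypothetical helper (not part of tare or the AWS CLI), and it checks that a variable is non-empty in the current shell, not that it has been exported.

```shell
# Report which of the guide's variables are unset or empty.
# require_vars is a hypothetical helper name.
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup; the names are fixed identifiers supplied by the caller.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      missing="${missing} ${name}"
    fi
  done
  if [ -n "${missing}" ]; then
    echo "missing:${missing}"
    return 1
  fi
  echo "all set"
}

# Example, before continuing with a later step:
# require_vars AWS_REGION EKS_CLUSTER_NAME AWS_ACCOUNT ECR_HOST CREDENTIAL_FILE SERVE_URL
```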

Step 1: obtain your data plane credential

In the dashboard, navigate to System → Settings → Data plane credentials → + Generate Data plane credential.

Save the downloaded file as data-plane-credentials.json on the host where the install runs. This file is the long-lived identity the data plane uses to authenticate to the management plane.

note

Some parts of the product still use older "service account" naming for this file. CLI output and in-cluster paths may reference it as a service account; it is the same file. The dashboard is in the process of standardizing on "data plane credential".

Each data plane uses its own credential. Revoke a credential from the dashboard or generate additional ones (for example, one per environment) at any time.

Step 2: install the tare CLI

curl -sSL https://tare.tetrate.ai/tools/install.sh | bash

Output:

==> tare installer
==> channel: stable

==> Detected platform: darwin-arm64

==> Installing tare for darwin-arm64...
==> Downloading from: https://tare.tetrate.ai/tools/tags/v0.1.0-beta.2/tare-darwin-arm64.tar.gz
ok Installed tare to /Users/johndoe/.tare/bin/tare

==> tare version: tare version v0.1.0-beta.2

ok Installation directory is already in your PATH

==> Get started:
tare install identity.json --serve-url https://proxy.acme.com
tare install --help

The installer prints the install path (typically ~/.tare/bin/tare). Add it to PATH and verify the version:

export PATH="$PATH:$HOME/.tare/bin"
echo 'export PATH="$PATH:$HOME/.tare/bin"' >> ~/.zshrc # or ~/.bashrc
tare --version

Output:

tare version v0.1.0-beta.2

Step 3: provision the EKS cluster

Step 3.1: set environment variables

export AWS_REGION="<region>" # e.g. us-east-1
export EKS_CLUSTER_NAME="<cluster-name>"
export AWS_ACCOUNT="$(aws sts get-caller-identity --query Account --output text)"

# Pick the regional default Kubernetes version so the version doesn't
# go out of support unexpectedly.
K8S_VERSION=$(aws eks describe-cluster-versions \
--region "${AWS_REGION}" \
--query 'clusterVersions[?defaultVersion==`true`].clusterVersion' \
--output text)

# Optional resource tags — adapt or drop these for your organization.
TAGS="Owner=<your-name>,Team=<your-team>,Purpose=development"

tip

Confirm eksctl version reports ≥ 0.220 before continuing. Older binaries reject the K8s version EKS now defaults to and the cluster create call fails with invalid version, supported values: 1.23, …, 1.31.
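
The version check in the tip above can be scripted. This is a sketch under one assumption: that eksctl version prints a bare version string such as 0.220.0. version_ge is a hypothetical helper that compares dotted numeric versions field by field.

```shell
# version_ge A B: succeed when version A >= version B (numeric, dot-separated).
# version_ge is a hypothetical helper name.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, "."); nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Example (assumes the output format described above):
# version_ge "$(eksctl version)" 0.220.0 || echo "eksctl is too old; upgrade before continuing"
```

Comparing fields numerically matters here: a plain string comparison would rank 0.22.0 above 0.220.0.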

Step 3.2: log in to AWS

aws sts get-caller-identity

Step 3.3: create the EKS cluster

eksctl create cluster \
--name "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" \
--version "${K8S_VERSION}" \
--nodes 3 \
--nodes-min 3 \
--nodes-max 3 \
--node-type m5.xlarge \
--with-oidc \
--tags "${TAGS}"

Provisioning takes ~15 minutes. --with-oidc enables IAM Roles for Service Accounts (IRSA), which the EBS CSI driver and AWS Load Balancer Controller depend on.

The --nodes 3 and --node-type m5.xlarge values above match the Medium tier in Cluster sizing. Adjust for your environment.

Step 3.4: verify

kubectl get nodes

Expected:

NAME STATUS ROLES AGE VERSION
ip-192-168-14-164.ec2.internal Ready <none> 2m v1.35.x-eks-...
ip-192-168-25-245.ec2.internal Ready <none> 2m v1.35.x-eks-...
ip-192-168-40-57.ec2.internal Ready <none> 2m v1.35.x-eks-...

tip

Already have an EKS cluster? Reuse it after confirming that:

  • OIDC provider is associated: aws eks describe-cluster --name <name> --region <region> --query "cluster.identity.oidc.issuer"
  • EBS CSI driver is installed: aws eks list-addons --cluster-name <name> --region <region> --query 'addons' | grep aws-ebs-csi-driver
  • K8s version is supported (aws eks describe-cluster-versions --region <region> shows the cluster's version is not deprecated)

Step 4: install the EBS CSI driver addon

The egress envoy and Redis use persistent volumes, and EKS does not install a CSI driver by default.

# IAM role for the addon (one-shot — keeps the role even after cluster recreation)
eksctl create iamserviceaccount \
--name ebs-csi-controller-sa \
--namespace kube-system \
--cluster "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" \
--attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
--approve --role-only \
--role-name "${EKS_CLUSTER_NAME}-AmazonEKS_EBS_CSI_DriverRole"
# Install the addon itself
eksctl create addon \
--cluster "${EKS_CLUSTER_NAME}" \
--name aws-ebs-csi-driver \
--region "${AWS_REGION}" \
--service-account-role-arn "arn:aws:iam::${AWS_ACCOUNT}:role/${EKS_CLUSTER_NAME}-AmazonEKS_EBS_CSI_DriverRole" \
--force

Wait for it to become ACTIVE (~1 min):

aws eks describe-addon --cluster-name "${EKS_CLUSTER_NAME}" \
--addon-name aws-ebs-csi-driver --region "${AWS_REGION}" \
--query 'addon.status' --output text
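
Rather than re-running the status query by hand, the wait can be wrapped in a generic polling loop. The following is a sketch, not part of the guide's tooling: poll_until is a hypothetical helper that runs any command repeatedly until its output equals an expected value.

```shell
# poll_until EXPECTED ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until its stdout equals EXPECTED or ATTEMPTS is exhausted.
# poll_until is a hypothetical helper name.
poll_until() {
  expected="$1"; attempts="$2"; delay="$3"
  shift 3
  i=1
  while [ "${i}" -le "${attempts}" ]; do
    actual="$("$@" 2>/dev/null || true)"
    if [ "${actual}" = "${expected}" ]; then
      echo "reached ${expected} after ${i} attempt(s)"
      return 0
    fi
    sleep "${delay}"
    i=$((i + 1))
  done
  echo "timed out waiting for ${expected} (last: ${actual:-none})" >&2
  return 1
}

# Example with the addon status query from this step (30 tries, 10 s apart):
# poll_until ACTIVE 30 10 aws eks describe-addon \
#   --cluster-name "${EKS_CLUSTER_NAME}" --addon-name aws-ebs-csi-driver \
#   --region "${AWS_REGION}" --query 'addon.status' --output text
```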

Step 5: create ECR repositories

Agent Router images must live in your registry. Unlike most cloud registries, ECR requires each repository to be pre-created; tare install --image-sync does not auto-create them.

The set of repos changes across tare releases. Rather than hand-maintaining a list, this guide uses a helper that loops tare install --sync-only, parses any NAME_UNKNOWN errors, creates the missing repo, and retries.

export ECR_HOST="${AWS_ACCOUNT}.dkr.ecr.${AWS_REGION}.amazonaws.com"
export ECR_PREFIX="${EKS_CLUSTER_NAME}" # repos will be created as ${EKS_CLUSTER_NAME}/<image>

The helper is in this repo at tests/aws-dp-install/sync-images.sh. Run it after the next step.
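
To make the parse-and-retry idea concrete, here is a minimal sketch of the parsing half. It is not the shipped sync-images.sh, which may work differently. It assumes the error text matches the NAME_UNKNOWN message quoted in Troubleshooting, and missing_repo_from_error is a hypothetical helper name.

```shell
# Read tool output on stdin and print the first repository name reported
# as missing in a NAME_UNKNOWN error. Prints nothing when no error is found.
missing_repo_from_error() {
  sed -n "s/.*NAME_UNKNOWN: The repository with name '\([^']*\)' does not exist.*/\1/p" | head -n 1
}

# Example: given captured sync output in ${out}, create the missing repo
# with the standard ECR CLI call and then retry the sync.
# repo="$(printf '%s\n' "${out}" | missing_repo_from_error)"
# [ -n "${repo}" ] && aws ecr create-repository \
#   --repository-name "${repo}" --region "${AWS_REGION}"
```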

Step 6: authenticate Docker to ECR

aws ecr get-login-password --region "${AWS_REGION}" | docker login --username AWS --password-stdin "${ECR_HOST}"

ECR tokens expire after 12 hours. Re-run this command if subsequent steps fail with unauthorized: authentication required.

Step 7: sync Agent Router images to your ECR registry

Save data-plane-credentials.json to a known path, then run:

export CREDENTIAL_FILE=/path/to/data-plane-credentials.json
export SERVE_URL="http://<your-data-plane-hostname>" # e.g. http://proxy.example.com

bash tests/aws-dp-install/sync-images.sh

The helper creates any missing ECR repos as tare reports them. Expect ~3–5 minutes on the first run. On success it prints:

✓ Image sync done in 2m22s
=== sync complete ===

note

--sync-only still requires a --serve-url value (chicken-and-egg with the management plane URL registration). Use the hostname already pre-cleared with the DNS owner; the URL is registered in the next install step.

Step 8: create the image-pull secret

kubectl create ns tars-system --dry-run=client -o yaml | kubectl apply -f -
kubectl create ns tars-dataplane --dry-run=client -o yaml | kubectl apply -f -

ECR_TOKEN=$(aws ecr get-login-password --region "${AWS_REGION}")

for NS in tars-system tars-dataplane; do
kubectl create secret docker-registry registry-secret \
--docker-server="${ECR_HOST}" \
--docker-username=AWS \
--docker-password="${ECR_TOKEN}" \
--docker-email="<your-email>" \
-n "${NS}" \
--dry-run=client -o yaml | kubectl apply -f -
done

ECR tokens expire every 12 hours, so this secret stops working after that window. For production deployments, use IAM Roles for Service Accounts (IRSA) or EKS Pod Identity to mint tokens on demand instead of a static secret; see AWS docs.

Step 9: install the data plane

tare install "${CREDENTIAL_FILE}" \
--image-sync "${ECR_HOST}/${ECR_PREFIX}" \
--image-pull-secret-name registry-secret \
--serve-url "${SERVE_URL}"

The command:

  • Re-syncs any images (idempotent after the previous step).
  • Installs the helm chart into tars-system and tars-dataplane.
  • Registers the --serve-url value with the management plane.

Watch the pods come up:

kubectl get pods -n tars-system
kubectl get pods -n tars-dataplane

Expected output (all Running, 1/1 or 3/3):

NAME READY STATUS RESTARTS AGE
ai-gateway-controller-... 1/1 Running 0 2m
controller-... 1/1 Running 0 2m
controller-worker-... 1/1 Running 0 2m
envoy-gateway-... 1/1 Running 0 2m
envoy-ratelimit-... 1/1 Running 0 1m
tars-redis-master-... 1/1 Running 0 2m
tars-tare-doctor-... 1/1 Running 0 1m

NAME READY STATUS RESTARTS AGE
egress-... 3/3 Running 0 1m

The install command also prints a Dataplane unreachable warning at this point because DNS does not resolve the data-plane hostname yet. This is expected; DNS is wired in Step 12 below.
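
Scanning the pod tables by eye gets tedious on repeated runs. As an illustration, the filter below reads kubectl get pods output and prints only the pods that are not fully ready. unready_pods is a hypothetical helper; it assumes the default five-column table layout shown above.

```shell
# Print pods whose READY count is incomplete or whose STATUS is not Running.
# Reads `kubectl get pods` table output on stdin; header rows are skipped.
unready_pods() {
  awk '$1 != "NAME" && NF >= 3 {
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1
  }'
}

# Example:
# kubectl get pods -n tars-system | unready_pods
# kubectl get pods -n tars-dataplane | unready_pods
#
# kubectl can also block until pods are ready:
# kubectl wait --for=condition=Ready pods --all -n tars-system --timeout=300s
```

An empty result means every listed pod is Running with all containers ready.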

Step 10: install the AWS Load Balancer Controller

The data plane needs an L7 load balancer to terminate inbound HTTP and forward to the egress Service in tars-dataplane. The AWS Load Balancer Controller provisions an ALB from Kubernetes Ingress resources.

Step 10.1: IAM policy

The latest controller version requires permissions beyond what the v2.7.1 reference policy covered. Fetch the current policy from main:

curl -sSL -o /tmp/iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json

aws iam create-policy \
--policy-name AWSLoadBalancerControllerIAMPolicy \
--policy-document file:///tmp/iam-policy.json

If the policy already exists from a prior run, update it instead:

POLICY_ARN=arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy
aws iam create-policy-version \
--policy-arn "${POLICY_ARN}" \
--policy-document file:///tmp/iam-policy.json \
--set-as-default
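
The create-or-update choice can be folded into one function so the step is safe to re-run. This is a sketch only: ensure_alb_policy is a hypothetical helper that uses aws iam get-policy to detect an existing policy and otherwise issues the same commands shown above. Note that IAM keeps at most five versions of a managed policy, so the update path can fail until an older version is deleted.

```shell
# Create the controller policy if absent, otherwise publish a new default version.
# ensure_alb_policy is a hypothetical helper; it expects AWS_ACCOUNT to be set
# and /tmp/iam-policy.json to have been downloaded as shown above.
ensure_alb_policy() {
  policy_arn="arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy"
  if aws iam get-policy --policy-arn "${policy_arn}" >/dev/null 2>&1; then
    aws iam create-policy-version \
      --policy-arn "${policy_arn}" \
      --policy-document file:///tmp/iam-policy.json \
      --set-as-default
  else
    aws iam create-policy \
      --policy-name AWSLoadBalancerControllerIAMPolicy \
      --policy-document file:///tmp/iam-policy.json
  fi
}

# Example:
# ensure_alb_policy
```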

Step 10.2: IAM service account

eksctl create iamserviceaccount \
--cluster="${EKS_CLUSTER_NAME}" \
--region="${AWS_REGION}" \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--attach-policy-arn="arn:aws:iam::${AWS_ACCOUNT}:policy/AWSLoadBalancerControllerIAMPolicy" \
--approve --override-existing-serviceaccounts

Step 10.3: Helm install

helm repo add eks https://aws.github.io/eks-charts
helm repo update

VPC_ID=$(aws eks describe-cluster --name "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" \
--query "cluster.resourcesVpcConfig.vpcId" --output text)

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
-n kube-system \
--set clusterName="${EKS_CLUSTER_NAME}" \
--set serviceAccount.create=false \
--set serviceAccount.name=aws-load-balancer-controller \
--set region="${AWS_REGION}" \
--set vpcId="${VPC_ID}"

Wait for the controller pods to roll out:

kubectl rollout status -n kube-system deployment/aws-load-balancer-controller --timeout=180s

Step 11: create the ALB Ingress

All data-plane traffic (/v1/*, /mcp/*, /.well-known/*) is served by the egress Service in tars-dataplane on port 10080. A single Ingress is enough.

cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: tars-ingress
  namespace: tars-dataplane
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}]'
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: egress
                port:
                  number: 10080
EOF

Wait for the ALB to provision (~1–2 min) and capture its hostname:

kubectl get ingress tars-ingress -n tars-dataplane

The ADDRESS column populates with something like k8s-tarsdata-tarsingr-xxxxxxxxxx-yyyyyyyyy.us-east-1.elb.amazonaws.com.

If ADDRESS stays empty for more than a few minutes, check the controller for FailedDeployModel events:

kubectl get events -n tars-dataplane --sort-by=.lastTimestamp | tail -5
kubectl logs -n kube-system deployment/aws-load-balancer-controller --tail=30 | grep -iE "error|fail"

The most common cause is an outdated IAM policy missing a permission such as elasticloadbalancing:DescribeListenerAttributes. Re-run the IAM policy update above and restart the controller.

The Ingress persists across tare install reinstalls; re-applying it during upgrades is not required.

Step 12: wire DNS and register the URL

Step 12.1: DNS

Point the data-plane hostname at the ALB via a CNAME record at your DNS provider:

proxy.example.com CNAME k8s-tarsdata-tarsingr-xxxxxxxxxx-yyyyyyyyy.us-east-1.elb.amazonaws.com

Verify it resolves (propagation may take a couple of minutes):

dig +short proxy.example.com
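
To confirm the record points at the right load balancer rather than merely resolving, the CNAME target can be compared with the ALB hostname. The sketch below is illustrative: cname_matches is a hypothetical helper that ignores case and the trailing dot dig appends to names.

```shell
# cname_matches RESOLVED EXPECTED: succeed when both names are equal after
# lowercasing and stripping a trailing dot. An empty RESOLVED never matches.
cname_matches() {
  got="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//')"
  want="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | sed 's/\.$//')"
  [ -n "${got}" ] && [ "${got}" = "${want}" ]
}

# Example: read the ALB hostname from the Ingress status, then compare.
# ALB_HOST="$(kubectl get ingress tars-ingress -n tars-dataplane \
#   -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')"
# cname_matches "$(dig +short CNAME proxy.example.com)" "${ALB_HOST}" && echo "DNS ready"
```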

Step 12.2: register the URL on the management plane

If the URL passed to tare install --serve-url matches the hostname just wired in DNS, registration is already done; tare install registered the data-plane URL when it ran. Otherwise, in the dashboard: System → Settings → Data planes → edit your data plane → URL.

Step 13: verify the install

Run tare doctor to check pod, CRD, and policy state:

tare doctor "${CREDENTIAL_FILE}"

A healthy install reports all components in Accepted: True state. If tare doctor flags any errors, contact Tetrate Support.

Step 14: smoke tests

Once tare doctor reports clean, validate end-to-end traffic.

Create an API key from the router app (Router → API Keys → + Create):

export DP_HOST=proxy.example.com
export TARS_API_KEY=<your-api-key>

Run the suite (sends each endpoint with and without auth; expects 200 vs 401):

bash tests/aws-dp-install/smoke.sh all

Expected output:

==== list models (GET /v1/models) ====
ok models auth http=200 expected=200
ok models NO-auth http=401 expected=401

==== chat completions (POST /v1/chat/completions) ====
ok openai gpt-5-mini auth http=200 expected=200
ok openai gpt-5-mini NO-auth http=401 expected=401

==== responses API (POST /v1/responses) ====
ok responses auth http=200 expected=200
ok responses NO-auth http=401 expected=401

==== anthropic messages (POST /v1/messages) ====
ok anthropic auth http=200 expected=200
ok anthropic NO-auth http=401 expected=401

==== embeddings (POST /v1/embeddings) ====
ok embeddings auth http=200 expected=200
ok embeddings NO-auth http=401 expected=401

==== image generation (POST /v1/images/generations) ====
ok images auth http=200 expected=200
ok images NO-auth http=401 expected=401

-
pass: 12 fail: 0 auth-bypass: 0

If any row fails (in particular, if a NO-auth row returns 200 instead of 401), contact Tetrate Support.
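
For readers who want to spot-check a single endpoint without the full suite, the with-auth and without-auth comparison can be sketched as follows. This is not the shipped smoke.sh: record and http_code are hypothetical helpers, and the Bearer header format is taken from the curl example in Appendix A.

```shell
pass=0; fail=0; bypass=0

# record NAME MODE GOT EXPECTED: print one result row and update the counters.
record() {
  name="$1"; mode="$2"; got="$3"; expected="$4"
  if [ "${got}" = "${expected}" ]; then
    pass=$((pass + 1))
    echo "ok ${name} ${mode} http=${got} expected=${expected}"
  else
    fail=$((fail + 1))
    echo "FAIL ${name} ${mode} http=${got} expected=${expected}"
    # An unauthenticated 200 means the auth layer was bypassed.
    if [ "${mode}" = "NO-auth" ] && [ "${got}" = "200" ]; then
      bypass=$((bypass + 1))
    fi
  fi
}

# http_code URL [curl args...]: print only the HTTP status code of a request.
http_code() {
  url="$1"; shift
  curl -s -o /dev/null -w '%{http_code}' "$@" "${url}"
}

# Example against GET /v1/models:
# record models auth    "$(http_code "http://${DP_HOST}/v1/models" -H "Authorization: Bearer ${TARS_API_KEY}")" 200
# record models NO-auth "$(http_code "http://${DP_HOST}/v1/models")" 401
# echo "pass: ${pass} fail: ${fail} auth-bypass: ${bypass}"
```

Counting auth bypasses separately from ordinary failures keeps the most serious outcome, an unauthenticated 200, visible in the summary line.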

Upgrading

An upgrade re-runs the install command. The helm chart, CRDs, and config in tars-config are upgraded; persistent state (Redis-backed rate-limit counters, accumulated audit data) survives. The ALB Ingress persists, the IAM policy does not change, and the EBS CSI addon remains installed.

tare install "${CREDENTIAL_FILE}" \
--image-sync "${ECR_HOST}/${ECR_PREFIX}" \
--image-pull-secret-name registry-secret \
--serve-url "${SERVE_URL}"

note

Some runtime patches (notably observability env vars on the egress deployment) are reset on every install. Re-apply the patches from Appendix B after each upgrade.

Cleanup

To remove everything created by this guide:

# 1. Ingress (releases the ALB)
kubectl delete ingress tars-ingress -n tars-dataplane --ignore-not-found
# 2. AWS Load Balancer Controller
helm uninstall aws-load-balancer-controller -n kube-system 2>/dev/null || true
eksctl delete iamserviceaccount \
--name aws-load-balancer-controller \
--namespace kube-system \
--cluster "${EKS_CLUSTER_NAME}" \
--region "${AWS_REGION}" 2>/dev/null || true
ALB_POLICY_ARN=$(aws iam list-policies \
--query "Policies[?PolicyName=='AWSLoadBalancerControllerIAMPolicy'].Arn" \
--output text 2>/dev/null)
if [ -n "${ALB_POLICY_ARN}" ] && [ "${ALB_POLICY_ARN}" != "None" ]; then
# Delete non-default versions first (IAM rule)
for V in $(aws iam list-policy-versions --policy-arn "${ALB_POLICY_ARN}" \
--query 'Versions[?IsDefaultVersion==`false`].VersionId' --output text); do
aws iam delete-policy-version --policy-arn "${ALB_POLICY_ARN}" --version-id "${V}"
done
aws iam delete-policy --policy-arn "${ALB_POLICY_ARN}"
fi
# 3. ECR repositories (the cluster prefix matches what was used at create time)
for REPO in $(aws ecr describe-repositories --region "${AWS_REGION}" \
--query "repositories[?starts_with(repositoryName, '${EKS_CLUSTER_NAME}/')].repositoryName" \
--output text); do
aws ecr delete-repository --repository-name "${REPO}" \
--region "${AWS_REGION}" --force
done
# 4. EKS cluster (also tears down the EBS CSI addon and its IAM role via
# CloudFormation — takes ~10 min)
eksctl delete cluster --name "${EKS_CLUSTER_NAME}" --region "${AWS_REGION}"
# 5. Local kubeconfig file
rm -f "${HOME}/kubeconfig-${EKS_CLUSTER_NAME}"

note

DNS records (CNAMEs at your DNS provider) must be removed manually.

Troubleshooting

Symptom | Likely cause | Fix
eksctl create cluster errors with invalid version, supported values: 1.23, …, 1.31 | eksctl is older than the K8s version EKS currently defaults to | brew upgrade eksctl (or re-download), then retry
tare install --sync-only errors with NAME_UNKNOWN: The repository with name '...' does not exist | ECR repo for that image was never created | The sync-images.sh helper handles this; re-run it
Ingress ADDRESS stays empty for more than 2 min | LB Controller IAM policy is missing newer permissions | Update the IAM policy from main, restart the controller
Pods stuck ImagePullBackOff with no basic auth credentials | ECR token in registry-secret expired (12h lifetime) | Re-run the image-pull-secret creation step (Step 8)
egress pods restart loop with connection refused to redis | EBS CSI driver missing, so Redis PV never binds | Install the EBS CSI addon (Step 4)

Appendix A: TLS via ACM + HTTPS listener

The default flow uses HTTP on port 80. For production, terminate TLS at the ALB using an AWS Certificate Manager (ACM) certificate.

  1. Request the certificate in ACM for the data-plane hostname. Validate via DNS (CNAME) or email per the domain control method.
  2. Get the certificate ARN:

CERT_ARN=$(aws acm list-certificates --region "${AWS_REGION}" \
  --query "CertificateSummaryList[?DomainName=='proxy.example.com'].CertificateArn" \
  --output text)

  3. Update the Ingress to listen on HTTPS:443 and redirect HTTP → HTTPS:

metadata:
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/ssl-redirect: '443'
    alb.ingress.kubernetes.io/certificate-arn: <CERT_ARN>
    alb.ingress.kubernetes.io/ssl-policy: ELBSecurityPolicy-TLS13-1-2-2021-06

  4. Update the data-plane URL on the management plane to https://.... Re-run tare install to re-register the --serve-url value automatically.

The ALB picks up the annotation change without redeploying. Verify with:

curl -I https://proxy.example.com/v1/models -H "Authorization: Bearer ${TARS_API_KEY}"

Appendix B: observability with OpenTelemetry

The data plane emits OTLP metrics (router_* family). To forward them to your observability backend, deploy an OpenTelemetry Collector in-cluster.

cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Namespace
metadata:
  name: otel-system
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: otel-system
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          http:
            endpoint: 0.0.0.0:4318
          grpc:
            endpoint: 0.0.0.0:4317
    processors:
      batch:
        timeout: 5s
      transform/strip_scope:
        metric_statements:
          - context: metric
            statements:
              - replace_pattern(name, "^dynamicmodulescustom\\.", "router_")
    exporters:
      debug:
        verbosity: detailed
      # Replace this with your real backend exporter (Datadog, Grafana Cloud,
      # SigNoz, CloudWatch via the awsemf exporter, etc.).
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          processors: [batch, transform/strip_scope]
          exporters: [debug]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
  namespace: otel-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: collector
          image: otel/opentelemetry-collector-contrib:0.98.0
          ports:
            - containerPort: 4317
            - containerPort: 4318
          volumeMounts:
            - name: config
              mountPath: /etc/otelcol-contrib
      volumes:
        - name: config
          configMap:
            name: otel-collector-config
---
apiVersion: v1
kind: Service
metadata:
  name: otel-collector
  namespace: otel-system
spec:
  selector:
    app: otel-collector
  ports:
    - name: otlp-http
      port: 4318
      targetPort: 4318
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
EOF

Point the egress envoy at the collector:

kubectl set env deploy/egress -n tars-dataplane \
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector.otel-system.svc.cluster.local:4318 \
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf

Generate traffic via the smoke tests, then verify metrics are flowing:

kubectl logs -n otel-system deployment/otel-collector --tail=50 | grep "Name:"

Expected metrics include router_requests_total, router_auth_attempts_total, and similar.
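
The router_ prefix on these names comes from the collector's replace_pattern statement, which rewrites a leading dynamicmodulescustom. scope prefix. The effect can be previewed locally with an equivalent sed substitution. This is a sketch: rename_metric is a hypothetical helper, and the input name dynamicmodulescustom.requests_total is an assumed pre-rename form inferred from the expected output, not a name confirmed by the product.

```shell
# Apply the same rewrite as the collector's replace_pattern statement:
# a leading "dynamicmodulescustom." becomes "router_"; other names pass through.
rename_metric() {
  printf '%s\n' "$1" | sed -E 's/^dynamicmodulescustom\./router_/'
}

rename_metric "dynamicmodulescustom.requests_total"   # prints router_requests_total
rename_metric "http_server_duration"                  # unchanged: no matching prefix
```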

note

The kubectl set env patch is reset by tare install re-runs. Re-apply after every upgrade until the chart accepts these settings via Helm values.