Gateway installation guide for AKS
This guide installs the Agent Router gateway in an Azure Kubernetes Service (AKS) environment using the tare command-line utility. The resulting ingress terminates TLS for the data plane and routes traffic to it from a customer-facing hostname.
By the end of this guide the cluster has:
- AGIC (Application Gateway Ingress Controller) enabled on the AKS cluster.
- A new Application Gateway (Standard_v2) provisioned in the MC_ resource group.
- An Ingress resource in the tars-dataplane namespace, annotated for AGIC health probes.
- A customer-facing DNS A record pointing at the App Gateway frontend.
Plan for 45 to 90 minutes end-to-end:
- 10 to 15 minutes of CLI work.
- 15 to 60 minutes of background certificate provisioning.
Architecture
┌───────────────────────────────────────────┐
│  proxy.<your-domain>   ← customer DNS     │
│           │                               │
│           ▼                               │
│  ┌────────────────┐                       │
│  │ Static IP      │ ← Global address      │
│  └────────┬───────┘                       │
│           │                               │
│  ┌────────▼───────┐                       │
│  │ Cert Map +     │ ← TLS termination     │
│  │ Managed Cert   │                       │
│  └────────┬───────┘                       │
│           │                               │
│  ┌────────▼───────┐                       │
│  │ Agent Router   │ ← installed in this   │
│  │ Gateway (k8s)  │   guide               │
│  └────────┬───────┘                       │
│           │                               │
│  ┌────────▼───────┐                       │
│  │ Agent Router   │                       │
│  │ Dataplane      │ ← installed earlier   │
│  └────────────────┘   via tare CLI        │
└───────────────────────────────────────────┘
The tare utility provisions everything from the static IP down to the gateway routes. The operator provides one customer-facing DNS record at the top.
This guide replaces the manual AGIC enable plus Ingress YAML in the main Azure runbook (Step 9a) with a single command. The command runs the same pre-flights as tare doctor, enables the AGIC addon, applies the dataplane Ingress with the correct health-probe annotations, and waits for the App Gateway to come up.
Use this guide when:
- An AKS cluster already exists and the dataplane is installed (tare install ... has already run).
- The cluster is AGIC-compatible: networkProfile.networkPlugin=azure, networkPluginMode=null, networkDataplane=azure. Cilium and Overlay clusters must use AGC (a separate path, tracked as a follow-up to fraser#3687).
- A reproducible, scripted install is preferred over running az aks enable-addons plus kubectl apply -f ingress.yaml by hand.
Do not use this guide when:
- The cluster runs Cilium or Overlay. The pre-flight refuses; use the AGC path in the main runbook.
- The dataplane is not yet installed. Gateway install needs the tars-dataplane Service to point its Ingress at.
- Only an inspection of intended changes is required. Use --plan-only or --dry-run-prereqs first.
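The compatibility rule above can be checked by hand before running any tare command. The following is a minimal sketch, not part of tare: the function name is_agic_compatible is hypothetical, and it assumes a null networkPluginMode may be rendered as an empty string, null, or None depending on the az output format.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of tare): returns 0 when the three
# networkProfile values match the AGIC-compatible combination in this guide.
is_agic_compatible() {
  local plugin="$1" mode="$2" dataplane="$3"
  [ "$plugin" = "azure" ] || return 1
  # Assumption: an unset networkPluginMode may appear as "", "null", or "None".
  case "$mode" in
    ""|null|None) ;;
    *) return 1 ;;
  esac
  [ "$dataplane" = "azure" ]
}

# Example usage (requires az; commented out so the sketch has no side effects):
# plugin=$(az aks show -g "$AKS_RG" -n "$AKS_CLUSTER_NAME" --query networkProfile.networkPlugin -o tsv)
# mode=$(az aks show -g "$AKS_RG" -n "$AKS_CLUSTER_NAME" --query networkProfile.networkPluginMode -o tsv)
# dataplane=$(az aks show -g "$AKS_RG" -n "$AKS_CLUSTER_NAME" --query networkProfile.networkDataplane -o tsv)
# is_agic_compatible "$plugin" "$mode" "$dataplane" || echo "use the AGC path instead"
```

The tare pre-flight remains the authoritative check; this sketch only mirrors the rule as stated in this guide.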
Prerequisites
- tare CLI v0.1.0-beta.2 or later. Install or upgrade with curl -fsSL https://tare.tetrate.ai/tools/install.sh | bash. The binary lands at ~/.tare/bin/tare; add it to PATH.
- az CLI signed in to the subscription that owns the AKS cluster.
- kubectl pointed at the AKS cluster (az aks get-credentials -g $RG -n $CLUSTER); confirm with kubectl config current-context.
- Dataplane already installed; kubectl get deploy -n tars-dataplane tars-dataplane returns a deployment.
- identity.json (the dataplane credential and service-account file). The same file used for tare install.
- Azure RBAC on the signed-in identity:
  - Azure Kubernetes Service Contributor Role on the AKS resource group (for az aks enable-addons).
  - Network Contributor on the MC_ resource group (MC_<rg>_<cluster>_<region>). AGIC requires this to manage the App Gateway lifecycle. The grant must be scoped to MC_RG; the parent-RG scope does not propagate.
  - The pre-flight prints the exact az role assignment create commands if either role is missing. It degrades gracefully when role-list read is denied, printing a warning rather than failing fast.
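As an illustration of the MC_ scoping requirement, the sketch below builds the default MC_<rg>_<cluster>_<region> name and prints (without running) a Network Contributor grant scoped to it. Both function names are hypothetical, the assignee value is a placeholder for the signed-in identity, and the exact commands printed by the tare pre-flight may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of tare).

# Builds the default node resource group name, MC_<rg>_<cluster>_<region>.
mc_resource_group() {
  printf 'MC_%s_%s_%s\n' "$1" "$2" "$3"
}

# Prints, but does not execute, a Network Contributor grant scoped to the
# MC_ resource group. The third argument is a placeholder for the object ID
# (or sign-in name) of the identity that needs the role.
print_network_contributor_grant() {
  local subscription="$1" mc_rg="$2" assignee="$3"
  printf 'az role assignment create --assignee %s --role "Network Contributor" --scope /subscriptions/%s/resourceGroups/%s\n' \
    "$assignee" "$subscription" "$mc_rg"
}

# Example usage:
# print_network_contributor_grant "$SUBSCRIPTION_ID" \
#   "$(mc_resource_group "$AKS_RG" "$AKS_CLUSTER_NAME" "$REGION")" "<assignee-object-id>"
```

If the cluster was created with a custom node resource group, the name does not follow the MC_ pattern; az aks show -g "$AKS_RG" -n "$AKS_CLUSTER_NAME" --query nodeResourceGroup -o tsv returns the actual value.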
Conventions
The commands below assume the following environment variables are exported in the current shell:
export CUSTOMER="acme"
export SERVE_DOMAIN="proxy.acme.example.com"
export SUBSCRIPTION_ID="<azure-subscription-id>"
export AKS_RG="<aks-resource-group>"
export AKS_CLUSTER_NAME="<aks-cluster-name>"
export REGION="<aks-region>"
Each step assumes these variables remain set; export them again after opening a new shell.
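Because every later step depends on these variables, a quick guard at the top of a script can catch a fresh shell early. This is a minimal sketch; check_required_vars is a hypothetical helper, not a tare command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports each variable from the list above that is
# unset or empty, and returns non-zero if any are missing.
check_required_vars() {
  local missing=0 name value
  for name in CUSTOMER SERVE_DOMAIN SUBSCRIPTION_ID AKS_RG AKS_CLUSTER_NAME REGION; do
    # Indirect lookup via eval; safe here because the names are a fixed list.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example usage:
# check_required_vars || exit 1
```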
Step 1: generate the config
tare gateway config init --type azure \
--customer "${CUSTOMER}" \
--serve-domain "${SERVE_DOMAIN}" \
--azure-subscription-id "${SUBSCRIPTION_ID}" \
--azure-resource-group "${AKS_RG}" \
--aks-cluster-name "${AKS_CLUSTER_NAME}"
The command writes azure-gateway.json to the current directory. The App Gateway name (<aks-cluster-name>-appgw) and subnet CIDR (10.225.0.0/24) are filled in by the resolver at install time using documented defaults. Edit azure-gateway.json to override either.
The interactive wizard runs when the flags are omitted; it prompts in order for customer ID, serve domain, subscription, resource group, and cluster name.
Step 2: lint the config
tare gateway config lint --config azure-gateway.json --type azure
The lint command is hermetic and makes no cloud calls. GCP-only rules self-gate, so they do not produce phantom warnings against an Azure config. Exit code 0 with warnings is acceptable for a fresh wizard output; exit code 1 means a required field is missing or malformed.
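In a script, the exit-code contract described above can be made explicit. The wrapper below is a sketch with a hypothetical name (lint_gate); it runs whatever command it is given and only interprets the exit code as this guide describes it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: runs the given command and maps its exit code to the
# meanings described in this guide (0 = pass, possibly with warnings;
# 1 = a required field is missing or malformed).
lint_gate() {
  local rc=0
  "$@" || rc=$?
  case "$rc" in
    0) echo "lint passed (warnings, if any, are acceptable)" ;;
    1) echo "lint failed: a required field is missing or malformed" >&2 ;;
    *) echo "lint exited with unexpected code ${rc}" >&2 ;;
  esac
  return "$rc"
}

# Example usage:
# lint_gate tare gateway config lint --config azure-gateway.json --type azure || exit 1
```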
Step 3: preview the install
tare gateway install identity.json --type azure --config azure-gateway.json --plan-only
The Plan view shows:
- The pre-flights that will run and the checks they perform.
- The az aks enable-addons command, or a skip notice if AGIC is already enabled.
- The Ingress manifest that will be applied to tars-dataplane.
Review the Plan before applying. To change any value, edit azure-gateway.json and rerun.
Step 4: apply
Step 4.1: dry-run the cloud commands (optional)
tare gateway install identity.json --type azure --config azure-gateway.json --dry-run-prereqs
The dry-run prints the az commands without executing them. The output is suitable for capture into a change ticket or for execution from a workstation with different credentials.
Step 4.2: run the install
tare gateway install identity.json --type azure --config azure-gateway.json --apply-prereqs --wait --yes
--yes skips the Proceed? prompt and is required in non-TTY contexts such as CI and scripts. Omit it in an interactive terminal to keep a final confirmation prompt.
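A script that runs both interactively and in CI can decide on --yes by testing whether stdin is a terminal. The helper name confirm_flag is hypothetical; the sketch only assumes the --yes behaviour described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints --yes only when stdin is not a terminal, so
# interactive runs keep the final confirmation prompt.
confirm_flag() {
  if [ ! -t 0 ]; then
    echo "--yes"
  fi
}

# Example usage (substitution left unquoted on purpose, so an empty result
# adds no argument):
# tare gateway install identity.json --type azure --config azure-gateway.json \
#   --apply-prereqs --wait $(confirm_flag)
```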
Step 4.3: install sequence
The install proceeds through the following stages:
- Pre-flight networkProfile - fails fast if the cluster is AGIC-incompatible (Cilium or Overlay).
- Pre-flight Azure RBAC - checks Azure Kubernetes Service Contributor Role on the AKS RG and Network Contributor on MC_RG against the signed-in identity. Prints exact az role assignment create commands and exits when a role is missing. When role-list read is denied, the step warns and continues.
- Pre-flight kubectl auth can-i - verifies the current kubeconfig can create ingress in tars-dataplane.
- az aks enable-addons --addons ingress-appgw - provisions a new App Gateway Standard_v2 (approximately 5 minutes). Idempotent: when the addon is already enabled, the step is skipped.
- Wait for operationalState=Running - polls the App Gateway resource. A known race exists where AGIC enable returns success while the App Gateway is briefly Stopped; the wait loop auto-recovers by issuing a start on the gateway.
- Apply Ingress to tars-dataplane with the two AGIC health-probe annotations (appgw.ingress.kubernetes.io/health-probe-path: / and appgw.ingress.kubernetes.io/health-probe-status-codes: "200-499"). Without these annotations, AGIC marks the envoy backend unhealthy and every request returns 502.
- --wait - polls until the Ingress reports an address. If the address stays empty for 60 seconds or more after the App Gateway is Running, the install issues a one-shot AGIC re-nudge (cycling the addon). The AGIC controller sometimes misses the first reconcile.
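The last two wait stages share a pattern: poll until ready, and apply a recovery action once if readiness takes too long. The sketch below illustrates that pattern only; it is not the tare implementation, and wait_with_nudge and the example probe are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a poll loop with a one-shot recovery action.
# Arguments: timeout (s), nudge-after (s), interval (s, must be >= 1),
# probe function name, nudge function name.
wait_with_nudge() {
  local timeout="$1" nudge_after="$2" interval="$3" probe="$4" nudge="$5"
  local elapsed=0 nudged=0
  while [ "$elapsed" -le "$timeout" ]; do
    # The probe returns 0 once the resource is ready.
    if "$probe"; then
      return 0
    fi
    # Run the recovery action at most once, after the grace period.
    if [ "$nudged" -eq 0 ] && [ "$elapsed" -ge "$nudge_after" ]; then
      "$nudge"
      nudged=1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}

# Example probe (requires kubectl; commented out to avoid side effects):
# ingress_has_address() {
#   [ -n "$(kubectl get ingress -n tars-dataplane tars-dataplane \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}')" ]
# }
```

Running the nudge only once keeps the loop from repeatedly cycling the addon, which matches the one-shot behaviour described for --wait.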
Step 5: wire DNS and verify
kubectl get ingress -n tars-dataplane
# NAME CLASS HOSTS ADDRESS PORTS AGE
# tars-dataplane azure-application-gateway proxy.acme.example.com 20.62.x.y 80 2m
Point the wildcard or serve-domain A record at the ADDRESS value. The App Gateway listens on port 80.
Smoke-test the endpoint:
curl -sS -o /dev/null -w '%{http_code}\n' "http://${SERVE_DOMAIN}/healthz"
# expected: 200 (after DNS propagates; the listener shown above is on port 80)
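Before blaming the gateway for a failed smoke test, it helps to confirm that the DNS record actually resolves to the Ingress address. This is a small sketch with a hypothetical helper name (dns_points_at); the usage example assumes dig is available on the workstation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the expected address appears as a whole
# line in a newline-separated list of resolved addresses.
dns_points_at() {
  local expected="$1" resolved="$2"
  # An empty expected address can never match.
  [ -n "$expected" ] || return 1
  printf '%s\n' "$resolved" | grep -Fxq -- "$expected"
}

# Example usage (requires kubectl and dig; commented out to avoid side effects):
# addr=$(kubectl get ingress -n tars-dataplane tars-dataplane \
#   -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
# dns_points_at "$addr" "$(dig +short "$SERVE_DOMAIN" A)" || echo "DNS not pointing at ${addr} yet"
```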
Step 6: confirm via tare doctor
tare doctor identity.json
The doctor should report Layer 9 (cloud ingress / Application Gateway) as healthy. If the smoke test fails, run /agentrouter dp diagnose for the layer-by-layer walk.
Troubleshooting
App Gateway operationalState=Stopped after enable
Known race: AGIC sometimes returns addon enabled before the gateway finishes starting. The wait loop in tare gateway install auto-recovers by issuing az network application-gateway start. For manual recovery:
az network application-gateway start \
--resource-group "MC_${AKS_RG}_${AKS_CLUSTER_NAME}_${REGION}" \
--name "${AKS_CLUSTER_NAME}-appgw"
Ingress address stays empty after the App Gateway is running
AGIC occasionally misses the first reconcile. The --wait flow handles this by cycling the addon after 60 seconds. For manual recovery:
az aks disable-addons --addons ingress-appgw -g "${AKS_RG}" -n "${AKS_CLUSTER_NAME}"
az aks enable-addons --addons ingress-appgw -g "${AKS_RG}" -n "${AKS_CLUSTER_NAME}" \
--appgw-name "${AKS_CLUSTER_NAME}-appgw"
The recovery re-uses the existing App Gateway and is safe to repeat.
AuthorizationFailed despite freshly granted RBAC
The Azure CLI caches tokens for approximately one hour. Mint a fresh token:
az logout
az login
Then rerun tare gateway install.
Pre-flight fails with networkProfile incompatible
The cluster runs Cilium or Overlay. AGIC will never converge - the App Gateway provisions but stays Stopped, and AGIC logs failed to reconcile overlay CNI. Two options apply:
- Switch to AGC (Application Gateway for Containers). See Step 9b in the main Azure runbook. AGC works on any AKS dataplane and is Microsoft's recommended replacement.
- Rebuild the cluster with traditional Azure CNI: az aks create ... --network-plugin azure --network-dataplane azure (omit --network-plugin-mode overlay). az CLI 2.60 and later default silently to Cilium and Overlay; pass --network-dataplane azure explicitly.
Backend health unhealthy, every request returns 502
When the Ingress is applied manually without the AGIC health-probe annotations, the default probe is GET / and treats only 200-399 responses as healthy, but the egress envoy returns 404 on /. Re-apply the Ingress through tare gateway install (the install always writes the two annotations), or patch the Ingress directly:
kubectl annotate ingress -n tars-dataplane tars-dataplane \
appgw.ingress.kubernetes.io/health-probe-path=/ \
appgw.ingress.kubernetes.io/health-probe-status-codes='200-499' --overwrite
Notes
- AGC (the Cilium and Overlay path) is not implemented in tare gateway install --type azure yet; it is tracked as a follow-up to fraser#3687. Until then, AGC clusters follow Step 9b of the main Azure runbook manually.
- AWS (--type aws) is not implemented.
- The install is idempotent at the addon level (re-enabling is a no-op). The Ingress apply uses kubectl apply, so re-running picks up annotation changes made by hand in azure-gateway.json.
Where to go next