Skip to main content

Gateway installation guide for AKS

This guide installs the Agent Router gateway in an Azure Kubernetes Service (AKS) environment using the tare command-line utility. The resulting ingress terminates TLS for the data plane and routes traffic to it from a customer-facing hostname.


By the end of this guide the cluster has:

  • AGIC (Application Gateway Ingress Controller) enabled on the AKS cluster.
  • A new Application Gateway (Standard_v2) provisioned in the MC_ resource group.
  • An Ingress resource in the tars-dataplane namespace, annotated for AGIC health probes.
  • A customer-facing DNS A record pointing at the App Gateway frontend.

Plan for 45--90 minutes end-to-end:

  • 10--15 minutes of CLI work.
  • 15--60 minutes of background certificate provisioning.

Architecture

┌───────────────────────────────────────────┐
│ proxy.<your-domain> ← customer DNS │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ Static IP │ ← Global address │
│ └────────┬───────┘ │
│ │ │
│ ┌────────▼───────┐ │
│ │ Cert Map + │ ← TLS termination │
│ │ Managed Cert │ │
│ └────────┬───────┘ │
│ │ │
│ ┌────────▼───────┐ │
│ │ Agent Router | ← installed in this │
│ │ Gateway(k8s) │ guide │
│ └────────┬───────┘ │
│ │ │
│ ┌────────▼───────┐ │
│ │ Agent Router | |
| | Dataplane │ ← installed earlier │
│ └────────────────┘ via tare CLI │
└───────────────────────────────────────────┘

The tare utility provisions everything from the static IP down to the gateway routes. The operator provides one customer-facing DNS record at the top.

info

This guide replaces the manual AGIC enable plus Ingress YAML in the main Azure runbook (Step 9a) with a single command. The command runs the same pre-flights as tare doctor, enables the AGIC addon, applies the dataplane Ingress with the correct health-probe annotations, and waits for the App Gateway to come up.

Use this guide when:

  • An AKS cluster already exists and the dataplane is installed (tare install ... has already run).
  • The cluster is AGIC-compatible: networkProfile.networkPlugin=azure, networkPluginMode=null, networkDataplane=azure. Cilium and Overlay clusters must use AGC (a separate path, tracked as a follow-up to fraser#3687).
  • A reproducible, scripted install is preferred over running az aks enable-addons plus kubectl apply -f ingress.yaml by hand.

Do not use this guide when:

  • The cluster runs Cilium or Overlay. The pre-flight refuses; use the AGC path in the main runbook.
  • The dataplane is not yet installed. Gateway install needs the tars-dataplane Service to point its Ingress at.
  • Only an inspection of intended changes is required. Use --plan-only or --dry-run-prereqs first.

Prerequisites

  • tare CLI v0.1.0-beta.2 or later. Install or upgrade with curl -fsSL https://tare.tetrate.ai/tools/install.sh | bash. The binary lands at ~/.tare/bin/tare; add it to PATH.
  • az CLI signed in to the subscription that owns the AKS cluster.
  • kubectl pointed at the AKS cluster (az aks get-credentials -g $RG -n $CLUSTER); confirm with kubectl config current-context.
  • Dataplane already installed; kubectl get deploy -n tars-dataplane tars-dataplane returns a deployment.
  • identity.json (the dataplane credential and service-account file). The same file used for tare install.
  • Azure RBAC on the signed-in identity:
    • Azure Kubernetes Service Contributor Role on the AKS resource group (for az aks enable-addons).
    • Network Contributor on the MC_ resource group (MC_<rg>_<cluster>_<region>). AGIC requires this to manage the App Gateway lifecycle. The grant must be scoped to MC_RG; the parent-RG scope does not propagate.
    • The pre-flight prints the exact az role assignment create commands if either role is missing. It degrades gracefully when role-list read is denied, printing a warning rather than failing fast.

Conventions

The commands below assume the following environment variables are exported in the current shell:

export CUSTOMER="acme"
export SERVE_DOMAIN="proxy.acme.example.com"
export SUBSCRIPTION_ID="<azure-subscription-id>"
export AKS_RG="<aks-resource-group>"
export AKS_CLUSTER_NAME="<aks-cluster-name>"
export REGION="<aks-region>"

Each step assumes these variables remain set; export them again after opening a new shell.

Step 1: generate the config

tare gateway config init --type azure \
--customer "${CUSTOMER}" \
--serve-domain "${SERVE_DOMAIN}" \
--azure-subscription-id "${SUBSCRIPTION_ID}" \
--azure-resource-group "${AKS_RG}" \
--aks-cluster-name "${AKS_CLUSTER_NAME}"

The command writes azure-gateway.json to the current directory. The App Gateway name (<aks-cluster-name>-appgw) and subnet CIDR (10.225.0.0/24) are filled in by the resolver at install time using documented defaults. Edit azure-gateway.json to override either.

The interactive wizard runs when the flags are omitted; it prompts in order for customer ID, serve domain, subscription, resource group, and cluster name.

Step 2: lint the config

tare gateway config lint --config azure-gateway.json --type azure

The lint command is hermetic and makes no cloud calls. GCP-only rules self-gate, so they do not produce phantom warnings against an Azure config. Exit code 0 with warnings is acceptable for a fresh wizard output; exit code 1 means a required field is missing or malformed.

Step 3: preview the install

tare gateway install identity.json --type azure --config azure-gateway.json --plan-only

The Plan view shows:

  • The pre-flights that will run and the checks they perform.
  • The az aks enable-addons command, or a skip notice if AGIC is already enabled.
  • The Ingress manifest that will be applied to tars-dataplane.

Review the Plan before applying. To change any value, edit azure-gateway.json and rerun.

Step 4: apply

Step 4.1: dry-run the cloud commands (optional)

tare gateway install identity.json --type azure --config azure-gateway.json --dry-run-prereqs

The dry-run prints the az commands without executing them. The output is suitable for capture into a change ticket or for execution from a workstation with different credentials.

Step 4.2: run the install

tare gateway install identity.json --type azure --config azure-gateway.json --apply-prereqs --wait --yes

--yes skips the Proceed? prompt and is required in non-TTY contexts such as CI and scripts. Omit it in an interactive terminal to keep a final confirmation prompt.

Step 4.3: install sequence

The install proceeds through the following stages:

  1. Pre-flight networkProfile - fails fast if the cluster is AGIC-incompatible (Cilium or Overlay).
  2. Pre-flight Azure RBAC - checks AKS Service Contributor on the AKS RG and Network Contributor on MC_RG against the signed-in identity. Prints exact az role assignment create commands and exits when a role is missing. When role-list read is denied, the step warns and continues.
  3. Pre-flight kubectl auth can-i - verifies the current kubeconfig can create ingress in tars-dataplane.
  4. az aks enable-addons ingress-appgw - provisions a new App Gateway Standard_v2 (approximately 5 minutes). Idempotent: when the addon is already enabled, the step is skipped.
  5. Wait for operationalState=Running - polls the App Gateway resource. A known race exists where AGIC enable returns success while the App Gateway is briefly Stopped; the wait loop auto-recovers by issuing a start on the gateway.
  6. Apply Ingress to tars-dataplane with the two AGIC health-probe annotations (appgw.ingress.kubernetes.io/health-probe-path: / and health-probe-status-codes: "200-499"). Without these annotations, AGIC marks the envoy backend unhealthy and every request returns 502.
  7. --wait - polls until the Ingress reports an address. If the address stays empty for 60 seconds or more after the App Gateway is Running, the install issues a one-shot AGIC re-nudge (cycling the addon). The AGIC controller sometimes misses the first reconcile.

Step 5: wire DNS and verify

kubectl get ingress -n tars-dataplane
# NAME CLASS HOSTS ADDRESS PORTS AGE
# tars-dataplane azure-application-gateway proxy.acme.example.com 20.62.x.y 80 2m

Point the wildcard or serve-domain A record at the ADDRESS value. The App Gateway listens on port 80.

Smoke-test the endpoint:

curl -sS -o /dev/null -w '%{http_code}\n' "https://${SERVE_DOMAIN}/healthz"
# expected: 200 (after DNS propagates)

Step 6: confirm via tare doctor

tare doctor identity.json

The doctor should report Layer 9 (cloud ingress / Application Gateway) as healthy. If the smoke test fails, run /agentrouter dp diagnose for the layer-by-layer walk.

Troubleshooting

Appgw operationalState=Stopped after enable

Known race: AGIC sometimes returns addon enabled before the gateway finishes starting. The wait loop in tare gateway install auto-recovers by issuing az network application-gateway start. For manual recovery:

az network application-gateway start \
--resource-group "MC_${AKS_RG}_${AKS_CLUSTER_NAME}_${REGION}" \
--name "${AKS_CLUSTER_NAME}-appgw"

Ingress address stays empty after appgw is running

AGIC occasionally misses the first reconcile. The --wait flow handles this by cycling the addon after 60 seconds. For manual recovery:

az aks disable-addons --addons ingress-appgw -g "${AKS_RG}" -n "${AKS_CLUSTER_NAME}"
az aks enable-addons --addons ingress-appgw -g "${AKS_RG}" -n "${AKS_CLUSTER_NAME}" \
--appgw-name "${AKS_CLUSTER_NAME}-appgw"

The recovery re-uses the existing App Gateway and is safe to repeat.

AuthorizationFailed despite freshly granted rbac

The Azure CLI caches tokens for approximately one hour. Mint a fresh token:

az logout
az login

Then rerun tare gateway install.

Pre-flight fails with networkProfile incompatible

The cluster runs Cilium or Overlay. AGIC will never converge - the App Gateway provisions but stays Stopped, and AGIC logs failed to reconcile overlay CNI. Two options apply:

  1. Switch to AGC (Application Gateway for Containers). See Path 9b in the main Azure runbook. AGC works on any AKS dataplane and is Microsoft's recommended replacement.
  2. Rebuild the cluster with traditional Azure CNI: az aks create ... --network-plugin azure --network-dataplane azure (omit --network-plugin-mode overlay). The az CLI 2.60 and later defaults silently to Cilium and Overlay; pass --network-dataplane azure explicitly.

Backend health unhealthy, every request returns 502

When the Ingress is applied manually without the AGIC health-probe annotations, AGIC defaults to GET / expecting 200, but the egress envoy returns 404 on /. Re-apply the Ingress through tare gateway install (the install always writes the two annotations), or patch the Ingress directly:

kubectl annotate ingress -n tars-dataplane tars-dataplane \
appgw.ingress.kubernetes.io/health-probe-path=/ \
appgw.ingress.kubernetes.io/health-probe-status-codes='200-499' --overwrite

Notes

  • AGC (the Cilium and Overlay path) is not implemented in tare gateway install --type azure yet - tracked as a follow-up to fraser#3687. Until then, AGC clusters follow Step 9b of the main Azure runbook manually.
  • AWS (--type aws) is not implemented.
  • The install is idempotent at the addon level (re-enabling is a no-op). The Ingress apply uses kubectl apply, so re-running picks up annotation changes made by hand in azure-gateway.json.