tare upgrade
Upgrade an installed TARS dataplane to the version embedded in the current
tare binary, with zero downtime when the data plane is HA.
Synopsis
tare upgrade <identity-file> [flags]
Description
tare upgrade performs the operator workflow for moving an existing TARS
dataplane release to a newer version. Unlike tare install, it does not
fresh-install and does not re-prompt for configuration the operator already
supplied at install time. Instead, it reads the live release's Helm values via
helm get values and carries forward the operator-supplied config.
Phases:
- Preflight (Helm 3+, kubectl, cluster connectivity; warns when metrics-server is absent and HA is on)
- Discover existing release in the target namespace (refuses to run if none is found: that path is tare install)
- Read existing release values and carry forward registry, customer, environment, deployment mode, serve URL, image pull secret name, and the OTel collector enable + endpoint + auth-header config
- HA preflight: read the EnvoyProxy CRD's envoyHpa.minReplicas; refuse to upgrade if it is < 2 unless --allow-downtime is set
- Apply CRDs from the embedded chart (Helm's crds/ directory is install-only, so an upgrade has to upsert CRDs explicitly)
- helm upgrade --install --atomic --wait against the embedded chart
helm and kubectl are downloaded automatically on first run into
~/.tare/tools/ and reused on subsequent runs.
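For orientation, the discover, carry-forward, CRD, and Helm phases can be approximated by hand with plain helm and kubectl. This is a rough sketch, not what tare actually runs: it skips both preflights, it passes the carried values through verbatim instead of regenerating them, and the chart directory argument is a hypothetical path to an unpacked copy of the chart. It also assumes helm and kubectl are on PATH rather than in ~/.tare/tools/.

```shell
# manual_upgrade_sketch [RELEASE] [NAMESPACE] [CHART_DIR]
# Hypothetical helper; CHART_DIR is an assumed path to an unpacked chart.
manual_upgrade_sketch() {
  local release="${1:-tars}" namespace="${2:-tars-system}" chart_dir="${3:-./chart}"

  # Discover: stop if there is no release to upgrade.
  helm status "$release" -n "$namespace" >/dev/null || {
    echo "no release found; use tare install instead" >&2
    return 1
  }

  local values_file
  values_file="$(mktemp)" || return 1

  # Carry forward user-supplied values, upsert CRDs explicitly (Helm does not
  # upgrade the contents of crds/), then upgrade with rollback on failure.
  helm get values "$release" -n "$namespace" -o yaml >"$values_file" &&
    kubectl apply --server-side -f "$chart_dir/crds/" &&
    helm upgrade --install "$release" "$chart_dir" -n "$namespace" \
      -f "$values_file" --atomic --wait --timeout 10m
  local rc=$?

  rm -f "$values_file"
  return "$rc"
}
```

Calling manual_upgrade_sketch with no arguments targets the documented default release name and system namespace.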
The new tare binary's embedded chart supplies the new chart defaults:
notably the HA-by-default flip and the EnvoyProxy.spec.shutdown block.
Existing operator-supplied values override chart defaults where they
overlap.
Image sync is out of scope.
tare upgrade does not pull new images into your registry. Sync first, then upgrade:
tare install identity.json --image-sync <REG> --sync-only
tare upgrade identity.json
Usage
Standard upgrade
After downloading a newer tare binary and placing it in ~/.tare/bin/:
tare upgrade identity.json
tare upgrade reads the existing release values, carries them forward,
applies CRD changes from the new chart, and runs helm upgrade --install --atomic --wait. A failed rollout auto-rolls back to the previous
revision.
Single-replica install (lab / non-production)
If the existing install is single-replica (the pre-ADR-041 default, or
tare install --ha=false), tare upgrade refuses to proceed because
rolling the single data-plane Envoy pod would drop in-flight LLM streams:
data-plane Envoy is single-replica (envoyHpa.minReplicas=1).
Upgrading now will RST in-flight LLM streams.
Fix HA first (idempotent, takes ~1m):
tare install --ha identity.json
Or force this upgrade with downtime:
tare upgrade identity.json --allow-downtime
The recommended path is to migrate to HA first via tare install --ha
(which is idempotent: it calls helm upgrade --install and bumps the
HPA minReplicas from 1 to 2). That migration step itself causes a
brief one-time disruption while the second pod comes up, but every
subsequent tare upgrade is zero-downtime.
Override carried-forward values
CLI flags take precedence over carryover. Override only what you need to change, e.g., to move to a new image registry as part of the upgrade:
# 1. Sync to the new registry
tare install identity.json --image-sync newregistry.example.com --sync-only
# 2. Upgrade using the new registry
tare upgrade identity.json --image-registry newregistry.example.com
Disable atomic rollback
When debugging a stuck release, run without --atomic to leave the
half-deployed release in place:
tare upgrade identity.json --no-atomic
The default --atomic is the safer behavior for production. Use this
flag only when you need to inspect a failed rollout.
Flags
Main
| Flag | Default | Description |
|---|---|---|
| --allow-downtime | false | Proceed even when the data plane is single-replica. In-flight LLM streams will be dropped. |
| --ha | true | HA-safe defaults for the data-plane Envoy proxy (HPA min 2, PDB min 1). Pass --ha=false to keep single-replica. |
| --drain-timeout-seconds | 300 | EnvoyProxy.spec.shutdown.drainTimeout (seconds). Maximum time Envoy waits for in-flight requests (long LLM streams) to finish before SIGKILL. |
| --no-atomic | false | Disable helm --atomic (no auto-rollback on failure). Leaves stuck releases in place for debugging. |
| --timeout | 10m | helm upgrade --wait timeout. Should exceed drainTimeout × replicas to allow serial drain. |
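The --timeout guidance is simple arithmetic, and a small helper makes it hard to get wrong. The sketch below is hypothetical (suggest_timeout is not part of tare) and assumes --timeout accepts a seconds-suffixed duration such as 660s, which the documented 10m default suggests but does not confirm. The 60-second headroom is an arbitrary illustrative choice.

```shell
# suggest_timeout DRAIN_SECONDS REPLICAS [HEADROOM_SECONDS]
# Prints a timeout covering a serial drain of every replica plus headroom.
suggest_timeout() {
  local drain="$1" replicas="$2" headroom="${3:-60}"
  echo "$(( drain * replicas + headroom ))s"
}

# Example (not executed here):
#   tare upgrade identity.json --drain-timeout-seconds 300 \
#     --timeout "$(suggest_timeout 300 2)"
```

With the defaults in the table, 300 s × 2 replicas is 600 s, which equals the 10m default instead of exceeding it, so raising --timeout may be worth considering when streams routinely use the full drain window.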
Telemetry
| Flag | Default | Description |
|---|---|---|
| --enable-otel-collector | carry-forward | Override the OTel collector enable state. Default carries the existing release's value. |
| --otel-collector-endpoint <URL> | carry-forward | Override OTel OTLP endpoint. |
| --otel-exporter-auth-headers <H> | carry-forward | Override OTel Authorization header value. |
Advanced / hidden
Hidden from --help but accepted for advanced use. Defaults marked
carry-forward come from the existing release unless overridden.
| Flag | Default | Description |
|---|---|---|
| --release-name | tars | Helm release name. |
| --system-namespace | tars-system | Kubernetes system namespace. |
| --dataplane-namespace | tars-dataplane | Kubernetes dataplane namespace. |
| --chart-path | embedded | Path or OCI/HTTP reference to chart. |
| --chart-version | none | Chart version (for remote/OCI charts). |
| --image-registry | carry-forward | Image registry override. |
| --image-tag | embedded manifest | Image tag override. |
| --deployment-mode | carry-forward | enterprise or saas. |
| --customer | carry-forward | Customer label override. |
| --environment | carry-forward | Environment label override. |
| --serve-url | carry-forward | Serve URL override. |
| --redis-type | cluster | cluster or hosted. |
| --no-rate-limiting | false | Disable rate limiting. |
| --skip-preflight | false | Skip Helm/kubectl/cluster preflight (also TARE_SKIP_PREFLIGHT=1). |
Carryover behavior
tare upgrade reads the existing release's user-supplied values (helm get values returns only what was passed via -f or --set at install
time, not chart defaults). The following fields are carried forward
into the regenerated values:
| Helm values path | CLI flag override |
|---|---|
| global.imageRegistry | --image-registry |
| global.customer | --customer |
| global.environment | --environment |
| global.deploymentMode | --deployment-mode |
| global.serveUrl | --serve-url |
| global.imagePullSecretName | (no override; lives in operator-supplied pull secret) |
| otelCollector.enabled | --enable-otel-collector |
| otelCollector.exporters.otlp.endpoint | --otel-collector-endpoint |
| otelCollector.exporters.otlp.headers.Authorization | --otel-exporter-auth-headers |
Fields not in the carryover list fall back to embedded chart defaults (from the new tare binary). Pass the corresponding flag to override.
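The resulting precedence for a carried field is: CLI flag, then carried-forward value, then chart default. A minimal sketch of that rule follows; resolve_value is a hypothetical helper, and treating an empty string as "not set" is an assumption made for illustration, not documented tare behavior.

```shell
# resolve_value FLAG_VALUE CARRIED_VALUE CHART_DEFAULT
# Prints the first non-empty argument, in precedence order.
resolve_value() {
  local flag_value="$1" carried_value="$2" chart_default="$3"
  if [ -n "$flag_value" ]; then
    echo "$flag_value"
  elif [ -n "$carried_value" ]; then
    echo "$carried_value"
  else
    echo "$chart_default"
  fi
}

# To see what would be carried forward, compare user-supplied values with the
# full computed set (not executed here):
#   helm get values tars -n tars-system -o yaml
#   helm get values tars -n tars-system -o yaml --all
```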
Single-replica preflight
tare upgrade reads
spec.provider.kubernetes.envoyHpa.minReplicas from the
EnvoyProxy CRD in the system namespace. The check has three outcomes:
- minReplicas ≥ 2: proceed.
- minReplicas < 2 and --allow-downtime not set: abort with the migration message.
- EnvoyProxy resource not found, or kubectl call fails: print a warning and skip the preflight (operator's responsibility to verify HA).
The preflight does not currently inspect live Deployment.spec.replicas
because the EnvoyProxy CRD HPA configuration is the source-of-truth for
how the data plane scales over time. Inspecting both would tighten the
check; that is a future enhancement.
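The decision logic can be sketched as a small function, which is also handy for checking a cluster before running the upgrade. Both helpers are hypothetical, and the outcome labels are invented for illustration. The sketch reads the first EnvoyProxy resource in the namespace, which assumes there is only one, and it adds the --allow-downtime case that the list above leaves implicit.

```shell
# read_min_replicas [NAMESPACE]
# Prints envoyHpa.minReplicas of the first EnvoyProxy, or nothing on failure.
read_min_replicas() {
  kubectl get envoyproxy -n "${1:-tars-system}" \
    -o jsonpath='{.items[0].spec.provider.kubernetes.envoyHpa.minReplicas}' \
    2>/dev/null
}

# ha_preflight_decision MIN_REPLICAS [ALLOW_DOWNTIME]
# MIN_REPLICAS is empty or non-numeric when the read failed.
ha_preflight_decision() {
  local min_replicas="$1" allow_downtime="${2:-false}"
  case "$min_replicas" in
    '' | *[!0-9]*) echo "warn-skip"; return 0 ;;
  esac
  if [ "$min_replicas" -ge 2 ]; then
    echo "proceed"
  elif [ "$allow_downtime" = "true" ]; then
    echo "proceed-with-downtime"
  else
    echo "abort"
  fi
}

# Example (not executed here):
#   ha_preflight_decision "$(read_min_replicas)" false
```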
Atomic rollback
The default helm upgrade --atomic makes the upgrade transactional: if
the upgrade fails, for example because a pod does not become ready within
--timeout, Helm rolls the release back to the previous revision and exits
non-zero. The data plane returns to its previous state without operator
intervention.
--no-atomic disables this: useful for inspecting a stuck release
mid-rollout, at the cost of leaving traffic potentially affected.
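In automation it can help to surface the revision history whenever the upgrade exits non-zero, so the rollback is visible in the job log. The wrapper below is a hypothetical sketch; it assumes tare upgrade propagates a non-zero exit code on failure, and it uses the default release name and system namespace unless RELEASE or NAMESPACE are set.

```shell
# upgrade_with_report COMMAND [ARGS...]
# Runs the given command; on failure prints recent Helm revisions to stderr
# and returns the command's exit code.
upgrade_with_report() {
  local release="${RELEASE:-tars}" namespace="${NAMESPACE:-tars-system}"
  if "$@"; then
    echo "upgrade succeeded"
  else
    local rc=$?
    echo "upgrade failed (exit $rc); recent revisions:" >&2
    helm history "$release" -n "$namespace" --max 5 >&2
    return "$rc"
  fi
}

# Example (not executed here):
#   upgrade_with_report tare upgrade identity.json
```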
Output
tare upgrade writes phase-prefixed status lines ([i/N] <phase>) to
stderr. Final lines include the rollback hint and inspection commands:
[1/5] Preflight (Helm / kubectl / cluster)
[2/5] Discover existing release
release tars found in tars-system
carried forward: registry="gcr.io/acme/tars" customer="acme" env="" otel=true
[3/5] HA preflight
EnvoyProxy minReplicas=2 (HA)
Using embedded chart
[4/5] Apply / update CRDs
3 created, 12 updated, 4 skipped
[5/5] Helm upgrade
...
Upgrade completed.
Rollback (if needed):
helm rollback tars -n tars-system
Inspect:
kubectl get pods -n tars-system
kubectl get pods -n tars-dataplane
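Because the status lines go to stderr, they can be captured separately and filtered down to the phase headers. The helpers below are a hypothetical sketch; they assume phase headers start at the beginning of a line exactly as in the sample above.

```shell
# phase_lines: reads a captured log on stdin, prints only the [i/N] headers.
phase_lines() {
  grep -E '^\[[0-9]+/[0-9]+\] '
}

# last_phase: prints the last phase header seen, i.e. how far the run got.
last_phase() {
  phase_lines | tail -n 1
}

# Example (not executed here):
#   tare upgrade identity.json 2>upgrade.log
#   last_phase <upgrade.log
```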
Verification
# Pods in both namespaces should be Running with the new image tag
kubectl get pods -n tars-system
kubectl get pods -n tars-dataplane
# EnvoyProxy HA configuration
kubectl get envoyproxy -n tars-system -o yaml | yq .spec.provider.kubernetes.envoyHpa
# Active rollout status
kubectl get deploy -n tars-dataplane -o wide
Management-plane event reporting
tare upgrade reports the upgrade as an event to the management plane via
the same in-cluster ConfigMap outbox that tare install uses. After a
successful helm upgrade, the CLI writes a ConfigMap labelled
tars.io/component=install-event to tars-system carrying operation=upgrade
and the new Helm revision. The tare-doctor CronJob forwards it on its
next tick (default every 5 minutes); the dashboard's deployment-history
timeline shows the row at helm_revision = N+1 with from_version and
to_version populated when chart versions differ.
The event POST never blocks the upgrade and never changes the exit code.
Failed atomic-rollback upgrades (helm upgrade --atomic that triggers a
rollback) produce two dashboard rows:
- One for the failed upgrade attempt: operation: upgrade, status: failed, error_summary populated.
- One for the auto-rollback: operation: rollback, status: success, on the new helm revision the rollback created. This is the tare-doctor reconciler recognising helm's "Rollback to N" description and labelling the artefact correctly, instead of attributing the system reversion to the operator who triggered the upgrade.
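To check whether an event is waiting in the outbox, the ConfigMaps can be listed by the label named above. This is a sketch under assumptions: list_install_events is a hypothetical helper, and this page does not say whether tare-doctor deletes a ConfigMap after forwarding it, so an empty result does not necessarily mean the event was lost.

```shell
# list_install_events [NAMESPACE]
# Lists outbox ConfigMaps for install/upgrade events, oldest first.
list_install_events() {
  kubectl get configmap -n "${1:-tars-system}" \
    -l tars.io/component=install-event \
    --sort-by=.metadata.creationTimestamp
}

# Example (not executed here):
#   list_install_events
```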
Rollback
tare upgrade defaults to --atomic, so failed upgrades roll back
automatically. To roll back a successful but problematic upgrade:
# View revision history
helm history tars -n tars-system
# Roll back to the previous revision
helm rollback tars -n tars-system
# Or to a specific revision number
helm rollback tars 5 -n tars-system
Note: helm rollback does not roll back CRDs. Any CRD schema changes
applied during the upgrade remain in place. This is rarely an issue in
practice because CRD changes are additive (new fields, no removed
fields) in supported upgrade paths.
Migration: single-replica → HA
Customers whose tare install ran before the ADR-041 default flip are on
single-replica installs. The migration path is one idempotent install
call that bumps the EnvoyProxy CRD's HPA minReplicas from 1 to 2:
tare install --ha identity.json
This calls helm upgrade --install, so it does not require uninstall
and does not change any other configuration. The Envoy Gateway
controller scales the data-plane Deployment up to 2 pods (a one-time
brief disruption while the second pod comes up).
After this migration, tare upgrade is zero-downtime.
Where to go next