Performance and regression guards¶
AgentTier commits to a small set of performance budgets. This page states them, shows how to measure them, and describes the guard that keeps the Web UI bundle from regressing silently.
Budgets¶
| Metric | Budget | Why |
|---|---|---|
| Cold sandbox start (no warm pool) | ≤ 10 s | A kubectl apply / API create to a usable Running sandbox. |
| Warm sandbox start (warm pool hit) | ≤ 1 s | A pre-provisioned pool pod is claimed in place. |
| Web UI JS bundle | ≤ 750 KB minified | Keeps first paint fast; enforced in CI. |
| Reconciler queue depth (steady state) | ~0 | A queue sitting > 0 for more than a few seconds is a controller bug, not load. |
Bundle-size gate (enforced in CI)¶
The CI build job runs hack/check-bundle-size.sh after npm run build and
fails the build if the emitted Vite JS exceeds 750 KB. Run it locally the
same way:
(cd web-ui && npm ci && npm run build) && hack/check-bundle-size.sh
Override the limit deliberately with BUNDLE_LIMIT_KB=… (and write down why).
If you blow the budget, split the heavy feature behind a dynamic import()
rather than raising the ceiling.
Cold vs. warm start (hack/perf-smoke.sh)¶
Measures p50/p99 time-to-Running against a live cluster (kind or the e2e
cluster). Run it twice to compare:
# Cold: ensure the template's warm pool is at 0, then:
COUNT=10 NS=agenttier TEMPLATE=general-coding hack/perf-smoke.sh
# Warm: pre-warm the pool (Web UI Settings → warm pools, or the warmpool API),
# wait for the pool to report Ready, then run the same command.
It prints p50/p99/max and cleans up the sandboxes it created. The warm number should land sub-second when a pool pod is claimed; the cold number is dominated by image pull + pod scheduling and should stay within the 10 s budget on a warm-image node.
Load / saturation (hack/load-test.sh)¶
Drives the Router API with hey to exercise
the opt-in rate limiter and find where a single Router replica saturates:
kubectl -n agenttier port-forward svc/agenttier-router 8080:8080 &
BASE=http://localhost:8080 TOKEN=<api-key> N=1000 C=50 hack/load-test.sh
A burst of 429s confirms the rate limiter engaging (when enabled); p99 latency
climbing sharply as concurrency rises is the signal that the single Router
replica is the bottleneck — the data point that justifies and sizes a
multi-replica / HPA rollout.
Reference numbers¶
These are indicative measurements on the agentloft-e2e cluster (2× t3.large,
EKS 1.30); reproduce with the scripts above on your own cluster.
| Scenario | p50 | p99 |
|---|---|---|
Warm-pool claim (general-coding) |
~0.8 s | ~1.0 s |
| Cold start, image already on node | ~6 s | ~9 s |
| Cold start, image pull required | dominated by pull | dominated by pull |
Numbers are tracked over time so a regression (a heavy init step, a bundle blow-up) shows up against this baseline rather than going unnoticed.