# Installation
AgentTier installs as a single Helm chart. CRDs, RBAC, and reference templates are bundled.
## Requirements
- Kubernetes 1.27+
- CNI that supports NetworkPolicy (Calico, Cilium, AWS VPC CNI with NetworkPolicy enabled)
- A CSI storage driver (EBS CSI, PD CSI, Azure Disk CSI, or any RWO-capable CSI)
- Helm 3.x
Optional but recommended:
- An ingress controller (ingress-nginx, AWS ALB Controller, Traefik) for the Web UI and port-forward preview URLs
- An OIDC identity provider (Cognito, Okta, Azure AD, Auth0) for multi-user auth
- A gVisor RuntimeClass (for running untrusted agent workloads with kernel-level isolation)
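Before installing, a quick preflight can catch an out-of-date cluster. This is a minimal sketch in which the server version is hardcoded as a sample value; on a real cluster, populate it from `kubectl version` instead:

```shell
#!/usr/bin/env bash
# Minimal preflight sketch: check a Kubernetes version string against the
# 1.27 minimum. The sample value below stands in for a real lookup, e.g.:
#   server=$(kubectl version -o json | jq -r '.serverVersion.gitVersion')
version_ge() {
  # True when $1 >= $2, using sort -V for version-aware ordering.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

server="v1.29.3"  # sample; replace with the kubectl lookup above
if version_ge "${server#v}" "1.27"; then
  echo "Kubernetes ${server}: meets the 1.27+ requirement"
else
  echo "Kubernetes ${server}: too old, need 1.27+" >&2
fi
```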
## Quick install

```bash
helm repo add agenttier https://agenttier.github.io/agenttier/charts
helm repo update
helm install agenttier agenttier/agenttier \
  --namespace agenttier --create-namespace
```
Images are pulled anonymously from `ghcr.io/agenttier/*`. Every released image is keyless-signed with cosign; see Verifying images before deploying to security-sensitive production clusters.
## Production install

A realistic values file for an EKS cluster with Cognito OIDC, a warm pool, and ALB ingress:

```yaml
# values.prod.yaml
auth:
  oidc:
    issuerUrl: "https://cognito-idp.us-east-1.amazonaws.com/us-east-1_XXXXXXXXX"
    clientId: "your-client-id"
    adminGroup: "agenttier-admins"
    groupClaim: "cognito:groups"

networking:
  defaultPolicy: deny-all
  previewDomain: "preview.agenttier.example.com"
  portForwardIngressClass: "alb"

security:
  gvisor:
    enabled: true

defaults:
  sandbox:
    image: "ghcr.io/agenttier/sandbox-general:v0.3.0"
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "4Gi"

warmPool:
  enabled: true
  desiredCount: 2
  template: "general-coding"

controller:
  replicas: 2
  resources:
    requests: { cpu: "100m", memory: "128Mi" }
    limits: { cpu: "500m", memory: "512Mi" }

router:
  replicas: 2
  service:
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-connection-idle-timeout: "3600"

optional:
  imagePrepull:
    enabled: true
  ingress:
    enabled: true
    className: alb
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP":80},{"HTTPS":443}]'
      alb.ingress.kubernetes.io/ssl-redirect: "443"
      alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:111122223333:certificate/xxxx
      alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000
      alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.type=lb_cookie,stickiness.lb_cookie.duration_seconds=3600
    hosts:
      - host: agenttier.example.com
        paths:
          - path: /
            pathType: Prefix
    tls:
      - hosts: [agenttier.example.com]
  serviceMonitor:
    enabled: true  # requires Prometheus Operator
  podDisruptionBudget:
    enabled: true

observability:
  otlp:
    endpoint: "otel-collector.observability.svc.cluster.local:4317"
```

Install with this values file:

```bash
helm install agenttier agenttier/agenttier \
  --namespace agenttier --create-namespace \
  -f values.prod.yaml
```
## Helm values reference

All values are documented inline in `helm/agenttier/values.yaml`. The knobs you will most often change:
### Auth

| Value | Purpose |
|---|---|
| `auth.oidc.issuerUrl` | OIDC issuer URL. Empty = dev mode (every request is anonymous admin). |
| `auth.oidc.clientId` | OIDC client ID. |
| `auth.oidc.adminGroup` | Group name that receives the `isAdmin` claim. |
| `auth.oidc.groupClaim` | JWT claim that carries the user's groups (default `groups`). |
| `auth.apiKeys` | List of accepted API keys (SHA-256 hashed on disk). |
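Because `auth.apiKeys` stores SHA-256 digests rather than plaintext, you can generate an entry with standard tooling. A sketch, assuming the chart expects a hex digest (worth confirming against `values.yaml`); the key string below is a placeholder:

```shell
# Hash a raw API key for auth.apiKeys. Clients send the raw key; only the
# digest lands on disk. "example-api-key" is a placeholder -- mint a real
# one with something like `openssl rand -hex 32`.
raw_key="example-api-key"
hashed=$(printf '%s' "$raw_key" | sha256sum | awk '{print $1}')
echo "$hashed"
```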
### Networking

| Value | Purpose |
|---|---|
| `networking.defaultPolicy` | `deny-all` (default) or `allow-internet`. |
| `networking.previewDomain` | Wildcard domain for port-forward preview URLs. Leave empty to use only the Router-proxied preview. |
| `networking.portForwardIngressClass` | Ingress class name (`alb`, `nginx`, `traefik`). |
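For context, `deny-all` corresponds to the standard Kubernetes default-deny NetworkPolicy pattern. A policy of roughly this shape selects every Pod in the namespace and allows no traffic; the name and namespace here are illustrative, not the chart's actual manifests:

```yaml
# Illustrative only -- the chart's generated policy may differ in names/labels.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: agenttier
spec:
  podSelector: {}   # empty selector matches all Pods in the namespace
  policyTypes:
    - Ingress
    - Egress        # no ingress/egress rules listed => all traffic denied
```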
### Sandbox defaults

| Value | Purpose |
|---|---|
| `defaults.sandbox.image` | Default sandbox image for templates that don't override it. |
| `defaults.sandbox.resources` | Default CPU/memory requests and limits. |
| `defaults.sandbox.storage.size` | Default PVC size. |
| `defaults.sandbox.timeout` | Default maximum runtime. |
| `defaults.sandbox.idleTimeout` | Default idle auto-stop. |
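Taken together, these defaults occupy one block of the values file. A sketch with illustrative values — `10Gi`, `2h`, and `30m` are placeholders and the duration format is an assumption; the shipped defaults live in `values.yaml`:

```yaml
defaults:
  sandbox:
    image: "ghcr.io/agenttier/sandbox-general:v0.3.0"
    storage:
      size: "10Gi"       # default PVC size (illustrative)
    timeout: "2h"        # max runtime (duration format assumed)
    idleTimeout: "30m"   # idle auto-stop (illustrative)
```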
### Security

| Value | Purpose |
|---|---|
| `security.gvisor.enabled` | Create a `gvisor` RuntimeClass and mark it available to templates. |
| `security.podSecurityContext` | Overrides the restrictive default (non-root, read-only rootfs, drop ALL capabilities). |
### Warm pool

| Value | Purpose |
|---|---|
| `warmPool.enabled` | Enables the leader-elected reconciler that pre-creates idle Pods. |
| `warmPool.desiredCount` | Number of hot spares to keep. |
| `warmPool.template` | Template the warm Pods use. |
The Settings page in the Web UI mutates the same values via the `agenttier-warmpool-config` ConfigMap, so admins can retune without redeploying the chart.
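If you'd rather use kubectl than the Settings page, the same ConfigMap can be edited directly. The shape below is a guess at the schema — only the ConfigMap's name comes from this page — so inspect the live object (`kubectl -n agenttier get cm agenttier-warmpool-config -o yaml`) before changing anything:

```yaml
# Illustrative shape -- the data keys are assumptions, not the documented schema.
apiVersion: v1
kind: ConfigMap
metadata:
  name: agenttier-warmpool-config
  namespace: agenttier
data:
  enabled: "true"
  desiredCount: "3"
  template: "general-coding"
```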
### Optional add-ons

| Value | Purpose |
|---|---|
| `optional.imagePrepull.enabled` | DaemonSet that pre-caches sandbox images on every node. |
| `optional.serviceMonitor.enabled` | Prometheus Operator ServiceMonitor (requires the Operator). |
| `optional.podDisruptionBudget.enabled` | PDB for the controller and router. |
| `optional.otelCollector.enabled` | Sidecar OTel Collector. |
### Observability

| Value | Purpose |
|---|---|
| `observability.otlp.endpoint` | OTLP endpoint for traces, metrics, and logs. |
| `observability.logLevel` | Controller and Router log verbosity (`info`, `debug`). |
## Upgrading

Helm upgrades are in-place and CRD-aware. Chart versions track the app version.

```bash
helm repo update
helm upgrade agenttier agenttier/agenttier \
  --namespace agenttier -f values.prod.yaml
```
See the CHANGELOG for per-version upgrade notes.
## Uninstall

```bash
helm uninstall agenttier --namespace agenttier
kubectl delete namespace agenttier

# CRDs are kept by default so your sandboxes survive a re-install.
# Remove them explicitly if you want a clean slate:
kubectl delete crd \
  sandboxes.agenttier.io \
  sandboxtemplates.agenttier.io \
  clustersandboxtemplates.agenttier.io

# If you're upgrading from the pre-rename `agentloft.io` CRDs (rare), also
# remove those -- Helm won't touch them:
kubectl delete crd \
  sandboxes.agentloft.io \
  sandboxtemplates.agentloft.io \
  clustersandboxtemplates.agentloft.io 2>/dev/null || true
```
## Exposing the Web UI on AWS with ALB
For production on EKS, use the AWS Load Balancer Controller and enable the chart's Ingress. ALB has native WebSocket support, better idle timeout controls, TLS termination at the edge, and cleaner integration with WAF, ACM, and Route 53 than the legacy Classic ELB.
Prerequisites (one-time per cluster):
```bash
# 1. Download the latest IAM policy from upstream. The policy below works
#    with AWS Load Balancer Controller v2.13+ (it includes the
#    `elasticloadbalancing:DescribeListenerAttributes` permission that newer
#    controllers require; older policy snapshots lack it and cause the
#    controller to fail with "AccessDenied" when creating listener rules).
curl -sSL -o alb-iam-policy.json \
  https://raw.githubusercontent.com/kubernetes-sigs/aws-load-balancer-controller/main/docs/install/iam_policy.json
aws iam create-policy --policy-name AWSLoadBalancerControllerIAMPolicy \
  --policy-document file://alb-iam-policy.json

# 2. Associate the cluster's OIDC provider with IAM (safe to re-run; the
#    describe call just prints the issuer URL for reference).
aws eks describe-cluster --name <cluster> \
  --query 'cluster.identity.oidc.issuer' --output text
eksctl utils associate-iam-oidc-provider --cluster <cluster> --approve

# 3. Create an IRSA role for the controller's ServiceAccount.
eksctl create iamserviceaccount \
  --cluster <cluster> --namespace kube-system \
  --name aws-load-balancer-controller \
  --role-name AmazonEKSLoadBalancerControllerRole \
  --attach-policy-arn arn:aws:iam::<account>:policy/AWSLoadBalancerControllerIAMPolicy \
  --override-existing-serviceaccounts --approve

# 4. Install the controller.
helm repo add eks https://aws.github.io/eks-charts
helm repo update
helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system --set clusterName=<cluster> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller
```
If you don't use eksctl, do step 3 manually: create an IAM role whose trust policy federates to the cluster's OIDC provider with `sub = system:serviceaccount:kube-system:aws-load-balancer-controller`, attach the policy, then annotate the ServiceAccount with `eks.amazonaws.com/role-arn=<role-arn>`.
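That trust policy would look roughly like the following, with `<account>`, `<region>`, and `<oidc-id>` as placeholders you fill in from the `describe-cluster` output in step 2:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Federated": "arn:aws:iam::<account>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
    },
    "Action": "sts:AssumeRoleWithWebIdentity",
    "Condition": {
      "StringEquals": {
        "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:kube-system:aws-load-balancer-controller",
        "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com"
      }
    }
  }]
}
```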
Then enable the chart's Ingress. The chart ships sensible defaults under `optional.ingress.annotations` for `idle_timeout.timeout_seconds=4000` and sticky sessions, so long-running terminal sessions stay alive without disconnects. Override the `host` and optionally point `certificate-arn` at an ACM certificate to terminate TLS at the ALB:

```bash
helm upgrade --install agenttier agenttier/agenttier \
  --namespace agenttier --create-namespace \
  --set optional.ingress.enabled=true \
  --set optional.ingress.hosts[0].host=agenttier.example.com \
  --set optional.ingress.hosts[0].paths[0].path=/ \
  --set optional.ingress.hosts[0].paths[0].pathType=Prefix
```
The Router additionally sends WebSocket control-frame pings and application-level heartbeats every 30 seconds, so the browser terminal survives long idle periods even with ALB's default 60-second idle timeout.
## Verifying released images
Every image published on a v* tag is keyless-signed and ships with SPDX + CycloneDX SBOMs. See Verifying images for cosign verify and cosign verify-attestation flows. For hardened clusters, enforce with Kyverno / sigstore policy-controller rather than relying on manual verification.