Features
What AgentTier ships today, grouped by what you probably need first.
Declarative sandboxes
- Kubernetes CRDs — `Sandbox` (namespace-scoped), `SandboxTemplate` (namespace-scoped), and `ClusterSandboxTemplate` (cluster-scoped). Manage sandboxes with `kubectl`, GitOps (Argo CD / Flux), or through the REST API, SDK, or Web UI.
- State machine — Creating → Running → Stopped → Running → Deleting, with an Error sink and a Kubernetes Event at every transition, so `kubectl describe sandbox` tells the full story.
- Stop and resume — Stop deletes the Pod but preserves the PVC. Resume re-attaches the same volume in about two seconds, with workspace contents, installed packages, and git state exactly as you left them.
- Idle and max-runtime timeouts — set per sandbox via `spec.idleTimeout` / `spec.timeout`, or per namespace via governance caps. A configurable grace window notifies connected terminal sessions before the auto-stop.
- Self-healing — restarts after transient pod failures (OOM, preemption) with exponential backoff of 10s / 20s / 40s / 80s / 160s. Permanent failure modes (an image that can never be pulled, a configuration error) are surfaced on the sandbox's `status.conditions`.
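The self-healing schedule doubles the delay from a 10-second base. Below is a minimal sketch of that policy, not AgentTier's controller code. The failure labels `oom` and `preemption` are hypothetical stand-ins for whatever the controller reads from Pod status, and stopping after the fifth delay is an assumption, since the list above does not say what happens after 160s.

```python
from typing import Optional

# Documented schedule: 10s, 20s, 40s, 80s, 160s.
BACKOFF_BASE_S = 10
BACKOFF_STEPS = 5

# Hypothetical failure labels; the real controller inspects Pod status.
TRANSIENT_FAILURES = {"oom", "preemption"}


def next_restart_delay(failure: str, attempt: int) -> Optional[int]:
    """Seconds to wait before restart `attempt` (0-based), or None to stop.

    None means: do not restart; surface the failure on status.conditions.
    """
    if failure not in TRANSIENT_FAILURES:
        # Permanent failures (e.g. an unpullable image) are never retried.
        return None
    if attempt >= BACKOFF_STEPS:
        # Assumption: give up once the documented schedule is exhausted.
        return None
    return BACKOFF_BASE_S * 2 ** attempt


if __name__ == "__main__":
    print([next_restart_delay("oom", n) for n in range(6)])
    # [10, 20, 40, 80, 160, None]
```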
Warm pod pool
- Sub-second startup — a leader-elected controller keeps N pre-provisioned Pods hot. When a user creates a sandbox, AgentTier claims one from the pool (measured 791 ms vs ~10 s cold).
- Immediate PVC binding — the warm pool uses a `gp3-immediate` StorageClass so the EBS volume is provisioned up front; pod scheduling no longer waits on `WaitForFirstConsumer`.
- Runtime reconfiguration — change the pool size or template on the Settings page; the controller picks up the change from the `agenttier-warmpool-config` ConfigMap without a redeploy.
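To illustrate the claim-or-cold-start flow, here is a toy in-memory model. `WarmPool`, its methods, and the `warm-pod-N` names are hypothetical; the real controller creates Pods and PVCs through the Kubernetes API, runs under leader election, and reads its target size from the `agenttier-warmpool-config` ConfigMap.

```python
from collections import deque
from typing import Deque, Optional


class WarmPool:
    """Toy model: keep `target_size` pre-provisioned pods ready to claim."""

    def __init__(self, target_size: int) -> None:
        self.target_size = target_size
        self._ready: Deque[str] = deque()
        self._counter = 0

    def _provision(self) -> str:
        # Stand-in for creating a Pod + PVC ahead of time.
        self._counter += 1
        return f"warm-pod-{self._counter}"

    def refill(self) -> None:
        # A reconcile pass tops the pool back up to the target size.
        while len(self._ready) < self.target_size:
            self._ready.append(self._provision())

    def resize(self, target_size: int) -> None:
        # Mirrors a Settings / ConfigMap change; surplus pods are dropped.
        self.target_size = target_size
        while len(self._ready) > target_size:
            self._ready.pop()
        self.refill()

    def claim(self) -> Optional[str]:
        # Fast path: hand out a hot pod. None means fall back to a cold start.
        return self._ready.popleft() if self._ready else None

    def __len__(self) -> int:
        return len(self._ready)
```

A claim removes one pod from the ready queue, and the next reconcile pass replaces it, which is why the pool size stays at N between bursts of sandbox creation.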
Templates and agent harnesses
- Field-level merge with inheritance — `spec.inheritsFrom` chains templates up to depth 10. The sandbox spec overrides the template spec, which overrides its parent templates, which override cluster defaults, one field at a time.
- Harness config — tell AgentTier which shell, tools, system prompt, and hooks to run. Hooks fire on start, idle, stop, and resume.
- Init scripts — run cluster-approved setup commands before the container becomes Running (install extra tooling, clone a repo, wait for a service).
- Embedded files — templates can seed files into the workspace (e.g. a default `.tmux.conf`, a README, a code of conduct).
- Reference images — `general-coding` (Ubuntu + Node + Python + Go), `claude-code-bedrock` (Claude Code CLI wired to AWS Bedrock via IRSA), and `minimal-shell` (Alpine + bash + git + curl). All are published as `ghcr.io/agenttier/sandbox-*`.
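The inheritance and precedence rules can be sketched as a recursive field-level merge. This is an illustration under several assumptions, not AgentTier's resolver: templates are plain dicts keyed by name with `inheritsFrom` at the top level (rather than under `spec`), nested maps merge key by key while lists are replaced wholesale, and "depth 10" is read as at most ten templates in one chain.

```python
from typing import Any, Dict, List

MAX_INHERIT_DEPTH = 10  # documented limit for spec.inheritsFrom chains

Spec = Dict[str, Any]


def merge_fields(base: Spec, override: Spec) -> Spec:
    """Field-level merge: override wins per field; nested maps merge recursively."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge_fields(out[key], value)
        else:
            out[key] = value
    return out


def template_chain(name: str, templates: Dict[str, Spec]) -> List[Spec]:
    """Follow inheritsFrom links; return specs ordered root ancestor -> `name`."""
    chain: List[Spec] = []
    seen = set()
    current = name
    while current is not None:
        if current in seen:
            raise ValueError(f"inheritance cycle at {current!r}")
        if len(chain) >= MAX_INHERIT_DEPTH:
            raise ValueError(f"chain from {name!r} exceeds depth {MAX_INHERIT_DEPTH}")
        seen.add(current)
        spec = templates[current]
        chain.append(spec)
        current = spec.get("inheritsFrom")
    chain.reverse()
    return chain


def effective_spec(defaults: Spec, templates: Dict[str, Spec],
                   template: str, sandbox: Spec) -> Spec:
    """Precedence: cluster defaults < ancestors < template < sandbox spec."""
    spec = dict(defaults)
    for tmpl in template_chain(template, templates):
        fields = {k: v for k, v in tmpl.items() if k != "inheritsFrom"}
        spec = merge_fields(spec, fields)
    return merge_fields(spec, sandbox)
```

For example, a sandbox that only sets `resources.cpu` still inherits `resources.memory` from its template, because the merge descends into nested maps instead of replacing them.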
Security and isolation
- NetworkPolicy by default — deny-all egress, allow DNS. Opt-in egress rules per template (e.g. "allow github.com and pypi.org"). Inter-sandbox peering is opt-in via label selectors.
- Hardened pod defaults — non-root user, read-only root filesystem, all capabilities dropped, seccomp profile `RuntimeDefault`, and per-sandbox ServiceAccounts with zero cluster permissions.
- Kernel isolation — optional gVisor RuntimeClass for untrusted workloads.
- Per-session credentials — STS AssumeRole or Kubernetes Secrets projected into the exec session at terminal open time (not baked into the image).
- IRSA / Workload Identity — zero long-lived cloud keys. IAM roles attach to the sandbox's ServiceAccount on EKS; Workload Identity does the same on GKE.
- Signed container images — every released image is cosign-signed with keyless OIDC (GitHub Actions identity). SPDX + CycloneDX SBOMs attached as OCI attestations. See Verifying images.
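The hardened defaults map onto standard fields of a Kubernetes container `securityContext` (`runAsNonRoot`, `readOnlyRootFilesystem`, `capabilities.drop`, `seccompProfile.type`). The hypothetical audit helper below, which is not part of AgentTier, checks a container's `securityContext` dict for those four settings. It only looks at the container level; seccomp can also be set on the pod, and AgentTier's generated pods may set additional fields.

```python
from typing import Any, Dict, List


def hardening_gaps(security_context: Dict[str, Any]) -> List[str]:
    """Return which of the listed hardened defaults a container securityContext lacks."""
    gaps: List[str] = []
    if security_context.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    if security_context.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem")
    drop = (security_context.get("capabilities") or {}).get("drop") or []
    if "ALL" not in drop:
        gaps.append("capabilities.drop=ALL")
    seccomp = security_context.get("seccompProfile") or {}
    if seccomp.get("type") != "RuntimeDefault":
        gaps.append("seccompProfile=RuntimeDefault")
    return gaps


# The four defaults from the list above, expressed as a securityContext.
HARDENED = {
    "runAsNonRoot": True,
    "readOnlyRootFilesystem": True,
    "capabilities": {"drop": ["ALL"]},
    "seccompProfile": {"type": "RuntimeDefault"},
}
```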
Interactive access
- Browser terminal — full PTY over WebSocket with xterm.js. Resize, ANSI colors, paste, copy, and a 30-second reconnection window for network blips.
- Non-interactive exec — `POST /api/v1/sandboxes/{id}/exec` returns `stdout`, `stderr`, and `exitCode`. The SDK's `sandbox.exec()` is wired to the same endpoint.
- Port forwarding — expose any container port with one click (Web UI) or one API call. AgentTier creates a ClusterIP Service, adds an Ingress when a preview domain is configured, and also offers an authenticated reverse proxy inside the Router so users can reach ports even without DNS. See Port forwarding.
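A stdlib-only sketch of calling the exec endpoint directly. The path and the `stdout` / `stderr` / `exitCode` response fields come from the list above. The `command` request field, the Bearer-token header, and the base URL are assumptions, so check the handler in `pkg/router/server.go` (or use the SDK) before relying on them.

```python
import json
import urllib.request
from typing import List, Tuple
from urllib.parse import quote


def build_exec_request(base_url: str, sandbox_id: str,
                       command: List[str], token: str) -> urllib.request.Request:
    """Build a POST to /api/v1/sandboxes/{id}/exec (body shape is hypothetical)."""
    url = f"{base_url.rstrip('/')}/api/v1/sandboxes/{quote(sandbox_id, safe='')}/exec"
    body = json.dumps({"command": command}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def parse_exec_response(raw: bytes) -> Tuple[str, str, int]:
    """Extract the documented stdout / stderr / exitCode fields."""
    payload = json.loads(raw)
    return payload["stdout"], payload["stderr"], int(payload["exitCode"])
```

To send the request, pass it to `urllib.request.urlopen(req, timeout=30)` and feed `resp.read()` to `parse_exec_response`. Keeping request construction separate from I/O makes the helper easy to test without a running Router.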
Multi-tenancy and governance
- OIDC + API keys — works with any identity provider that exposes a JWKS endpoint (Cognito, Okta, Azure AD, Auth0, Google). API keys are stored as SHA-256 hashes, with an LRU cache in front of key lookups. Dev mode (no OIDC configured) grants anonymous admin access for local development.
- Governance policies — a cluster-wide default plus per-namespace overrides, combined with field-level merge. Policies are enforced synchronously at sandbox creation; violations return a structured `policy_violation` body with stable machine codes so UIs can pinpoint the failing field. See Governance for the full rule list.
- Admin-gated editor — `Settings → Governance` in the Web UI renders the active policies; only users with the admin claim can edit them.
- Audit trail — lifecycle, terminal, credential, share, clone, and port-forward events are recorded as Kubernetes Events. The Activity Log page filters by action, user, and time range. An optional SQL backend (phase 7.13) is planned for long-term retention.
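The hashed-key pattern can be sketched with `hashlib` and `functools.lru_cache`. The in-memory `KEY_STORE`, the helper names, and the cache size are hypothetical; only the "store SHA-256 hashes, cache lookups" idea comes from the list above. Because a cached lookup can go stale, the sketch clears the cache whenever a key is added or revoked. How AgentTier itself invalidates its cache is not described here.

```python
import hashlib
from functools import lru_cache
from typing import Dict, Optional

# Hypothetical store: SHA-256 hex digest -> user id. Raw keys are never kept.
KEY_STORE: Dict[str, str] = {}


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def register_key(raw_key: str, user: str) -> None:
    KEY_STORE[hash_key(raw_key)] = user
    lookup_digest.cache_clear()  # drop cached misses for this digest


def revoke_key(raw_key: str) -> None:
    KEY_STORE.pop(hash_key(raw_key), None)
    lookup_digest.cache_clear()  # a cached hit must not outlive revocation


@lru_cache(maxsize=1024)
def lookup_digest(digest: str) -> Optional[str]:
    # In a real system this is the slow path (e.g. a Secret or DB read).
    return KEY_STORE.get(digest)


def authenticate(raw_key: str) -> Optional[str]:
    """Return the user id for a presented API key, or None if unknown."""
    return lookup_digest(hash_key(raw_key))
```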
Web UI
- Dashboard — sandbox cards with status, template, age, one-click Stop / Resume / Delete / Open Terminal. Running cards also show an inline Port Forwards panel.
- Templates editor — in-browser YAML editor with syntax highlighting, create / save / delete, field validation.
- Activity Log — time-ordered events with filters.
- Metrics — live sandbox counts, average startup time, reconciliation queue depth.
- Cost Estimator — estimated monthly cost based on currently running resources.
- Settings — governance policies, warm pool sizing and template, operational defaults. Admin-gated.
Client tooling
- Python SDK — `pip install agenttier`. Sync and async clients, typed Pydantic models, auto-detected auth, and a structured exception hierarchy. See SDK.
- CLI — `agenttier` Go binary for Linux, macOS, and Windows on amd64 and arm64. See CLI.
- REST API — sandboxes, templates, governance, port forwarding, audit, analytics, warm pool, and identity. Documented inline in `pkg/router/server.go` and exercised by the SDK.
Observability
- OpenTelemetry — distributed traces across controller + router with trace context in structured JSON logs. OTLP exporter wires to any collector; the Helm chart can optionally deploy one as a sidecar.
- Prometheus — `/metrics` exposes sandbox counts by status and template, startup-duration histograms, reconciliation queue depth, error counters, and terminal session stats. An optional `ServiceMonitor` is available for the Prometheus Operator.
- Kubernetes Events — every lifecycle transition emits a typed Event on the Sandbox resource, so `kubectl describe sandbox` is a first-class debugging surface.
- Startup logging — `startupDurationMs` is logged for every creation and recorded on an Event for regression tracking.
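Because `startupDurationMs` lands in the structured JSON logs, a simple regression check can run over exported log lines. This sketch assumes one JSON object per line with `startupDurationMs` at the top level; the other field names in the sample, the baseline, and the 1.5x tolerance are illustration values, not AgentTier defaults.

```python
import json
import statistics
from typing import Iterable, List


def startup_durations(log_lines: Iterable[str]) -> List[int]:
    """Collect startupDurationMs values, skipping non-JSON and unrelated lines."""
    durations: List[int] = []
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "startupDurationMs" in record:
            durations.append(int(record["startupDurationMs"]))
    return durations


def is_regression(durations: List[int], baseline_ms: float, tolerance: float = 1.5) -> bool:
    """Flag a regression when the median startup exceeds baseline * tolerance."""
    if not durations:
        return False
    return statistics.median(durations) > baseline_ms * tolerance
```

The median is used instead of the mean so that a single cold start (about 10 s) among warm-pool claims does not trip the check on its own.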
Deployment and operations
- Single Helm chart — one `helm install agenttier agenttier/agenttier` deploys the controller, router, web UI, CRDs, RBAC, and all opt-in components.
- Multi-cluster — works on EKS, GKE, AKS, kind, and any self-managed Kubernetes 1.27+ cluster with a NetworkPolicy-capable CNI.
- Leader-elected HA — multi-replica controller with Lease-based election. Non-critical dependency failures degrade gracefully (e.g. when the OTel collector is unreachable).
- Kubernetes-native state — defaults to Kubernetes etcd + Events + ConfigMaps for all state. An optional SQL backend (Postgres / MySQL / SQLite) is on the roadmap for compliance-driven long-term retention.
- Terraform — EKS, GKE, and AKS modules under `terraform/` for fully provisioned reference deployments.
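Lease-based election boils down to one rule: a replica may take or renew the lease if nobody holds it, it already holds it, or the holder has stopped renewing for longer than the lease duration. The in-memory model below shows only that rule. The real `coordination.k8s.io` Lease stores `holderIdentity`, `leaseDurationSeconds`, and `renewTime` in its spec and relies on the API server's optimistic concurrency (resourceVersion conflicts) to stop two replicas winning at once, which this sketch does not model. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a coordination.k8s.io Lease."""
    duration_s: float
    holder: Optional[str] = None
    renew_time: float = 0.0

    def try_acquire_or_renew(self, candidate: str, now: float) -> bool:
        """Take or renew the lease; return True if `candidate` is now leader."""
        expired = now - self.renew_time >= self.duration_s
        if self.holder is None or self.holder == candidate or expired:
            self.holder = candidate
            self.renew_time = now
            return True
        return False
```

A standby replica keeps calling `try_acquire_or_renew` on a timer; it only becomes leader after the current holder misses renewals for a full lease duration, which bounds how long the controller can be leaderless.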
What is not here yet
Roadmap items not shipped in v0.3.0. Relying on them will produce errors or missing functionality:
- Sharing and collaboration (viewer/collaborator roles, expiring share links) — planned for 0.2.x.
- File transfer API — planned for 0.2.x.
- Sandbox cloning via `VolumeSnapshot` — planned for 0.2.x.
- Notifications (webhook / email / Slack) — planned for 0.2.x.
- WebSocket ping frames + ALB migration — planned for 0.2.x. Until then, sessions through AWS Classic ELBs may still need manual reconnection every 60 minutes unless the `connection-idle-timeout` annotation is tweaked.
- Optional SQL backend for long-term retention of audit and analytics data — planned for 0.3.x.
Track progress in the GitHub issues or, if you are contributing, in the `todo.md` file in the repo.