Tutorial: Agent mode in depth¶
Agent mode turns AgentTier into a runtime for AI agents — both turnkey harnesses and code that uses agent frameworks. You configure a sandbox once with your agent code, then call /invoke to run it on demand. Output streams back as Server-Sent Events.
By the end of this tutorial you will have a working LangGraph agent running on the live cluster, invoked from the SDK, the CLI, and the Web UI.
Time: ~30 minutes
Prerequisites: AgentTier installed; the `langgraph-agent` template visible in `kubectl get clustersandboxtemplates`.
1. The mental model¶
Agent mode adds two endpoints to a sandbox:
- `POST /api/v1/sandboxes/{id}/configure` — uploads files into the PVC and runs a one-shot install. Idempotent.
- `POST /api/v1/sandboxes/{id}/invoke` — runs the configured entrypoint, pipes the request body to stdin, streams stdout / stderr / exit as SSE events.
AgentTier does not run the agent loop, manage models, or own tool dispatch. Your code (or the framework you pick) does. AgentTier owns lifecycle, auth, secrets injection, network policy, streaming transport, audit, OTel, and governance.
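On the wire, the /invoke response is an ordinary SSE body whose events carry the stdout, stderr, and exit payloads. As a rough illustration of that framing, here is a minimal parser; the event names and JSON payload shapes are assumptions for the sketch, not AgentTier's documented schema.

```python
import json

def parse_sse(raw: str):
    """Split a raw SSE body into (event, data) pairs.

    Uses the standard SSE field framing; the event names and JSON
    payloads below are illustrative, not AgentTier's exact schema.
    """
    events = []
    for block in raw.strip().split("\n\n"):
        name, data = None, None
        for line in block.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data = line[len("data:"):].strip()
        events.append((name, json.loads(data)))
    return events

raw = 'event: stdout\ndata: {"text": "hello"}\n\nevent: exit\ndata: {"exit_code": 0}\n'
for name, data in parse_sse(raw):
    print(name, data)
```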
A typical agent sandbox config is:
- `entrypoint: ["python", "/workspace/agent.py"]` — runs on every invoke.
- `installCommand: ["pip", "install", "-r", "/workspace/requirements.txt"]` — runs once at configure time.
- `workingDir: /workspace`.
- `env`: API keys, model endpoints, `MEM0_BASE_URL`, etc.
- `maxConcurrentInvokes`: optional cap on parallel invokes per sandbox.
- `defaultInvokeTimeout`: per-invoke wall-clock cap.
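Put together, the agent block might look like the following sketch. The `harness.agent` nesting matches the patch command used later in this tutorial, but the `env` shape and the timeout format are assumptions.

```yaml
# Illustrative sketch -- field names from the list above; env shape
# and timeout format are assumptions, not the documented schema.
harness:
  agent:
    entrypoint: ["python", "/workspace/agent.py"]
    installCommand: ["pip", "install", "-r", "/workspace/requirements.txt"]
    workingDir: /workspace
    env:
      - name: MEM0_BASE_URL
        value: http://127.0.0.1:11434
    maxConcurrentInvokes: 2
    defaultInvokeTimeout: 300s
```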
2. Create an agent-mode sandbox¶
kubectl apply -f - <<'EOF'
apiVersion: agenttier.io/v1alpha1
kind: Sandbox
metadata:
  name: agent-tutorial
  namespace: default
spec:
  templateRef:
    name: langgraph-agent
    kind: ClusterSandboxTemplate
EOF
kubectl wait --for=jsonpath='{.status.phase}'=Running \
  sandbox/agent-tutorial --timeout=180s
The langgraph-agent template sets `mode: agent` and points at `ghcr.io/agenttier/sandbox-langgraph:v0.3.0`, which preinstalls Python 3.11, LangGraph, LangChain, httpx, and the mem0 client.
3. Configure with your agent code¶
The simplest agent code, copied from the bundled example:
# /tmp/agent.py
import json
import sys

def main():
    payload = sys.stdin.read()
    try:
        data = json.loads(payload) if payload.strip() else {}
    except json.JSONDecodeError:
        data = {"prompt": payload}
    prompt = data.get("prompt", "hello")
    # In a real agent, this is where LangGraph / LangChain takes over.
    # For the tutorial we just echo back deterministically.
    print(json.dumps({"reply": f"You said: {prompt}", "echo": data}))

if __name__ == "__main__":
    main()
Write the file:
cat > /tmp/agent.py <<'EOF'
import json
import sys

def main():
    payload = sys.stdin.read()
    try:
        data = json.loads(payload) if payload.strip() else {}
    except json.JSONDecodeError:
        data = {"prompt": payload}
    prompt = data.get("prompt", "hello")
    print(json.dumps({"reply": f"You said: {prompt}", "echo": data}))

if __name__ == "__main__":
    main()
EOF
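The script is plain stdin-to-stdout, so you can sanity-check it locally before configuring the sandbox:

```shell
echo '{"prompt": "hi"}' | python3 /tmp/agent.py
# {"reply": "You said: hi", "echo": {"prompt": "hi"}}
```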
Now run configure from the CLI:
agenttier configure agent-tutorial \
  --file agent.py=/tmp/agent.py \
  --entrypoint "python /workspace/agent.py"
(No --install needed — the langgraph image already has Python and stdlib.)
Status afterward:
kubectl get sandbox agent-tutorial -o jsonpath='{.status.agentConfigure}'
# {"lastConfiguredAt":"...", "installCommandHash":"...", "entrypoint":[...], "installExitCode":0}
The installCommandHash makes configure idempotent: re-running with the same install command is a no-op.
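The exact hash scheme is not part of the public contract; conceptually it is a content hash of the install command, along the lines of this sketch (sha256 over canonical JSON is an assumption):

```python
import hashlib
import json

def install_command_hash(cmd: list[str]) -> str:
    # Content-address the argv; identical commands hash identically,
    # so configure can skip the install step. Scheme is illustrative.
    return hashlib.sha256(json.dumps(cmd).encode("utf-8")).hexdigest()

same = install_command_hash(["pip", "install", "-r", "/workspace/requirements.txt"])
again = install_command_hash(["pip", "install", "-r", "/workspace/requirements.txt"])
changed = install_command_hash(["pip", "install", "-r", "/workspace/dev.txt"])
print(same == again, same == changed)  # True False
```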
4. Invoke from the CLI¶
agenttier invoke agent-tutorial --prompt "hello world"
# {"reply": "You said: hello world", "echo": {"prompt": "hello world"}}
# (exit 0)
The CLI streams stdout / stderr to your terminal in real time and exits with the entrypoint's exit code. Pipe-friendly:
agenttier invoke agent-tutorial --input @/path/to/body.json | jq .reply
5. Invoke from the SDK¶
from agenttier import AgentTierClient

with AgentTierClient(api_url="http://localhost:8081") as client:
    sbx = client.get_sandbox("agent-tutorial")

    # one-shot
    result = sbx.agent.invoke({"prompt": "hello from python"})
    print(result.exit_code)    # 0
    print(result.stdout)       # JSON
    print(result.duration_ms)  # 200ish

    # streaming
    for event in sbx.agent.invoke_stream({"prompt": "stream please"}):
        if event.stream == "stdout":
            print(event.data, end="")
        elif event.stream == "stderr":
            print(f"[STDERR] {event.data}", end="")
        elif event.stream == "exit":
            print(f"\nexit={event.data['exit_code']}")
For long-running invokes, invoke_stream lets you render output incrementally — exactly what the Web UI does.
6. Invoke from the Web UI¶
Open the Web UI, expand Advanced on the agent-tutorial card, click the Agent tab. Three panels:
- Configure — re-run installs, edit the entrypoint, see `lastConfiguredAt` and the install log.
- Invoke — paste a prompt or JSON body, click Invoke, watch stdout/stderr stream live. The Cancel button SIGTERMs the entrypoint mid-flight.
- Recent invokes — last few invokes from this session with status, duration, and (if OTel is wired) a trace link.
Try canceling: invoke a long-running script and click Cancel before it finishes. The exit event reports exit_code: -1 (SIGTERM) and the recent-invokes list shows it as cancelled.
7. Cancel an in-flight invoke programmatically¶
The first SSE event always carries an invoke_id. Save it and POST to /invoke/cancel:
import threading
import time

invoke_id_holder = {}

def consume():
    for event in sbx.agent.invoke_stream({"prompt": "..."}):
        if event.stream == "start":
            invoke_id_holder["id"] = event.data["invoke_id"]
        # ...

t = threading.Thread(target=consume)
t.start()
time.sleep(2)
sbx.agent.invoke_cancel(invoke_id_holder["id"])
t.join()
For most cases, prefer simply closing the SSE stream — the Router treats client disconnect as cancel and SIGTERMs the entrypoint automatically.
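In Python terms, closing the stream is just abandoning the event iterator. The stand-in below sketches the shape of that behaviour with a plain generator (no AgentTier internals): cleanup runs when the consumer stops early, which is where a real client would tear down the SSE connection.

```python
closed = {"flag": False}

def event_stream():
    # Stand-in for invoke_stream(): yields events until the consumer
    # stops, then runs cleanup -- where a real client would close the
    # SSE connection and the Router would SIGTERM the entrypoint.
    try:
        for i in range(1000):
            yield {"stream": "stdout", "data": f"chunk {i}"}
    finally:
        closed["flag"] = True

stream = event_stream()
for event in stream:
    if event["data"] == "chunk 2":
        break
stream.close()  # explicit close, like dropping the HTTP connection
print(closed["flag"])  # True
```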
8. Concurrency caps¶
Set maxConcurrentInvokes to throttle. From the SDK's perspective, over-cap requests get HTTP 429 with Retry-After: 5 and a structured body. Patch the spec:
kubectl patch sandbox agent-tutorial --type=merge -p \
  '{"spec":{"harness":{"agent":{"maxConcurrentInvokes":2}}}}'
Now any third concurrent invoke is rejected:
from agenttier.exceptions import HTTPError

try:
    sbx.agent.invoke({"prompt": "..."})
except HTTPError as e:
    if e.status_code == 429:
        print(f"throttled — try again in {e.retry_after}s")
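A small backoff wrapper that honours the advertised retry delay might look like the following sketch; `Throttled` here is a stand-in for whatever 429 exception your client surfaces, not an AgentTier class.

```python
import time

class Throttled(Exception):
    # Stand-in for an HTTP 429 error carrying Retry-After.
    def __init__(self, retry_after: float):
        self.retry_after = retry_after

def invoke_with_retry(invoke, body, max_attempts=3):
    """Call invoke(body), sleeping Retry-After seconds on throttle."""
    for attempt in range(max_attempts):
        try:
            return invoke(body)
        except Throttled as e:
            if attempt == max_attempts - 1:
                raise
            time.sleep(e.retry_after)

# Demo: fail twice with a tiny delay, then succeed.
calls = {"n": 0}
def fake_invoke(body):
    calls["n"] += 1
    if calls["n"] < 3:
        raise Throttled(retry_after=0.01)
    return {"reply": "ok"}

print(invoke_with_retry(fake_invoke, {"prompt": "hi"}))  # {'reply': 'ok'}
```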
Cluster-wide ceilings are enforced by governance — see Governance.
9. Optional mem0 memory sidecar¶
For a quick local memory backend, enable the sidecar at install time:
helm upgrade --install agenttier agenttier/agenttier \
  -n agenttier --reuse-values \
  --set optional.agentMemorySidecar.enabled=true
Once enabled, AgentTier injects a mem0 container into every `mode: agent` Pod; the sidecar listens on 127.0.0.1:11434, and `MEM0_BASE_URL` is set automatically. Your agent code can:
from mem0 import Memory
m = Memory()
m.add("user prefers dark mode", user_id="alice")
# ...later invoke...
print(m.search("preferences", user_id="alice"))
Memory persists in /workspace/.agenttier/memory/ so it survives stop / resume.
For production, you typically prefer an external service (AgentCore Memory, Pinecone, Postgres + pgvector, OpenSearch). See Agent memory for those patterns.
The mem0 sidecar is opt-in and disabled by default because the upstream image is currently arm64-only.
10. Audit and observability¶
Every invoke emits:
- An OTel span `agenttier.invoke` with sandbox / template / actor / invoke_id / duration / exit_code attributes.
- A line in the audit log, queryable via `GET /api/v1/audit/events`.
- Prometheus metrics: `agenttier_invoke_requests_total`, `agenttier_invoke_duration_seconds`, `agenttier_invoke_throttled_total`.
/configure emits parallel agenttier.configure spans and metrics.
By default argv and stdin are not recorded — they may contain secrets. Operators can flip audit.includeInvokePayloads=true for development clusters.
11. Bring your own framework¶
The langgraph-agent template is one example. The contract is:
- The image must be `mode: agent`-compatible: it should accept the entrypoint command and read stdin if applicable.
- `installCommand` runs once and produces a usable `/workspace`.
- `entrypoint` is invoked on every `/invoke`.
Any framework that fits — Strands Agents, AutoGen, CrewAI, Pydantic AI, OpenAI Agents SDK — drops in. Build your own image (or extend sandbox-langgraph), reference it in a SandboxTemplate, configure, invoke.
For fully turnkey harnesses (OpenHands, OpenClaw), the same flow applies: the harness owns the loop, AgentTier owns the runtime.
12. Clean up¶
agenttier sandbox delete agent-tutorial
What you just learned¶
- Agent mode is `mode: agent` plus a `HarnessSpec.agent` block — entrypoint, install, env, concurrency cap.
- Configure once, invoke many. Both endpoints stream SSE.
- Cancel via `/invoke/cancel` or by closing the SSE stream.
- Concurrency is capped per sandbox by `maxConcurrentInvokes` and cluster-wide by governance.
- Optional `mem0` sidecar for local memory; external services are preferred for production.
- AgentTier does not own the agent loop or model providers — your code or framework does.
What to read next¶
- Agent mode reference — every endpoint and field.
- Agent memory patterns — PVC-local, sidecar, external services, NetworkPolicy egress snippets.
- Governance — cluster ceilings on agent caps and image registries.
- SDK reference — full `sandbox.agent` API surface.