AgentCgroup: What Happens When AI Coding Agents Meet OS Resources?
AI coding agents such as Claude Code, OpenHands, and SWE-agent are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls inside sandboxed containers. Despite growing adoption, the OS-level resource dynamics of these workloads remain poorly understood. We present the first systematic characterization of these workloads, analyzing 144 software engineering tasks from the SWE-rebench benchmark across two LLM backends.

Our measurements reveal that OS-level overhead, including container initialization and tool execution, accounts for 56–74% of end-to-end latency, while LLM reasoning contributes only 26–44%. Memory usage exhibits a 15.4x peak-to-average ratio (compared to ~1.5x for serverless and 2–3x for microservices), with change rates reaching 3 GB/s in sub-second bursts. The same tool type (Bash) varies 13.7x in memory consumption depending on command semantics, and repeated runs of the same task show 1.8x variation in execution time, with near-zero correlation (r = −0.14) between token output and peak memory.
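To make the peak-to-average and change-rate metrics concrete, here is a minimal sketch of how they can be computed from a periodically sampled memory trace. The trace values and sampling interval below are hypothetical illustrations, not measurements from the paper.

```python
# Sketch: peak-to-average memory ratio and max change rate from a
# sampled RSS trace. The numbers below are illustrative, not paper data.

def peak_to_average(samples):
    """Peak memory divided by mean memory over the trace."""
    return max(samples) / (sum(samples) / len(samples))

def max_change_rate(samples, interval_s):
    """Largest absolute change between consecutive samples, per second."""
    deltas = (abs(b - a) for a, b in zip(samples, samples[1:]))
    return max(deltas) / interval_s

# Hypothetical 100 ms samples of an agent's RSS in MB: mostly modest,
# with one short allocation burst.
trace_mb = [200, 210, 500, 800, 1100, 900, 700, 500]
print(f"peak/avg: {peak_to_average(trace_mb):.2f}x")
print(f"max rate: {max_change_rate(trace_mb, 0.1):.0f} MB/s")
```

A bursty trace like this yields a peak-to-average ratio well above 1 and a per-second change rate far larger than the trace's typical level, which is exactly the shape that defeats static provisioning.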
These characteristics expose mismatches with existing resource management mechanisms, from kernel cgroup limits and systemd-oomd to the Kubernetes Vertical Pod Autoscaler (VPA), where static allocation either wastes 93% of provisioned capacity or triggers OOM kills that destroy minutes of accumulated, non-reproducible agent state. In this post, we summarize the characterization findings from our AgentCgroup paper and describe how eBPF-based in-kernel enforcement can bridge the gap between agent workload dynamics and OS-level resource control.
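The waste figure follows directly from the peak-to-average ratio: a static limit sized for peak demand leaves the gap between peak and mean unused on average. A back-of-the-envelope check using the 15.4x ratio reported above:

```python
# If peak usage is 15.4x the mean, a static allocation sized for the
# peak is, on average, only 1/15.4 utilized; the rest sits idle.
peak_to_avg = 15.4
avg_utilization = 1 / peak_to_avg        # ~6.5% of provisioned capacity
wasted_fraction = 1 - avg_utilization    # ~93.5%, matching the ~93% figure
print(f"wasted: {wasted_fraction:.1%}")  # wasted: 93.5%
```

Sizing for the mean instead inverts the problem: any sub-second burst toward the 15.4x peak crosses the limit and invites exactly the OOM kills described above.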