AgentCgroup: What Happens When AI Coding Agents Meet OS Resources?
AI coding agents such as Claude Code, OpenHands, and SWE-agent are increasingly deployed in multi-tenant cloud environments, where they execute diverse tool calls inside sandboxed containers. Despite growing adoption, the OS-level resource dynamics of these workloads remain poorly understood. We present the first systematic characterization of these workloads, analyzing 144 software engineering tasks from the SWE-rebench benchmark across two LLM backends.

Our measurements reveal that OS-level overhead, including container initialization and tool execution, accounts for 56–74% of end-to-end latency, while LLM reasoning contributes only 26–44%. Memory usage exhibits a 15.4x peak-to-average ratio (compared to ~1.5x for serverless and 2–3x for microservices), with change rates reaching 3 GB/s in sub-second bursts. The same tool type (Bash) varies 13.7x in memory consumption depending on command semantics, and repeated runs of the same task show 1.8x variation in execution time, with near-zero correlation (r = −0.14) between token output and peak memory.
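To make the peak-to-average and change-rate metrics concrete, here is a minimal sketch of how they can be computed from a periodically sampled memory trace. The trace values and sampling interval below are hypothetical illustrations, not measurements from the paper.

```python
# Sketch: peak-to-average memory ratio and max change rate from a
# sampled RSS trace. The numbers below are illustrative, not paper data.

def peak_to_average(samples):
    """Peak memory divided by mean memory over the trace."""
    return max(samples) / (sum(samples) / len(samples))

def max_change_rate(samples, interval_s):
    """Largest absolute change between consecutive samples, per second."""
    deltas = (abs(b - a) for a, b in zip(samples, samples[1:]))
    return max(deltas) / interval_s

# Hypothetical 100 ms samples of an agent's RSS in MB: mostly modest,
# with one short allocation burst.
trace_mb = [200, 210, 500, 800, 1100, 900, 700, 500]
print(f"peak/avg: {peak_to_average(trace_mb):.2f}x")
print(f"max rate: {max_change_rate(trace_mb, 0.1):.0f} MB/s")
```

A bursty trace like this yields a peak-to-average ratio well above 1 and a per-second change rate far larger than the trace's typical level, which is exactly the shape that defeats static provisioning.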
These characteristics expose mismatches with existing resource management mechanisms, from kernel cgroup limits and systemd-oomd to the Kubernetes Vertical Pod Autoscaler (VPA), where static allocation either wastes 93% of provisioned capacity or triggers OOM kills that destroy minutes of accumulated, non-reproducible agent state. In this post, we summarize the characterization findings from our AgentCgroup paper and describe how eBPF-based in-kernel enforcement can bridge the gap between agent workload dynamics and OS-level resource control.
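The waste figure follows directly from the peak-to-average ratio: a static limit sized for peak demand leaves the gap between peak and mean unused on average. A back-of-the-envelope check using the 15.4x ratio reported above:

```python
# If peak usage is 15.4x the mean, a static allocation sized for the
# peak is, on average, only 1/15.4 utilized; the rest sits idle.
peak_to_avg = 15.4
avg_utilization = 1 / peak_to_avg        # ~6.5% of provisioned capacity
wasted_fraction = 1 - avg_utilization    # ~93.5%, matching the ~93% figure
print(f"wasted: {wasted_fraction:.1%}")  # wasted: 93.5%
```

Sizing for the mean instead inverts the problem: any sub-second burst toward the 15.4x peak crosses the limit and invites exactly the OOM kills described above.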