ADR-001: Adopt eBPF-Based Observability for Kubernetes Clusters
Status: Accepted (March 2026)
Primary Keyword: eBPF Kubernetes observability
Archetype: Architecture Decision Record (ADR)
Target Audience: Platform engineers, SREs, DevOps architects evaluating observability stacks
Executive Summary
This Architecture Decision Record documents our organizational decision to standardize on eBPF-based observability as the primary monitoring and instrumentation approach for Kubernetes clusters, superseding traditional agent-based APM platforms (Datadog, New Relic, Dynatrace).
Key findings:
– 90% overhead reduction versus traditional APM agents (CPU and memory)
– Zero instrumentation effort — no SDK integration, no code changes, no deployment delays
– 300% year-over-year adoption growth across CNCF member organizations (2024-2026)
– 67% of large-scale Kubernetes operators already using at least one eBPF observability tool
– Industry validation: Splunk’s OpenTelemetry eBPF Instrumentation (OBI) announcement at KubeCon EU 2026
This decision aligns with industry momentum, operational efficiency requirements, and the maturation of open-source tools (Cilium, Hubble, Pixie, Tetragon) that have achieved production-grade stability.
Problem Statement
Traditional APM approaches impose significant operational and resource constraints on Kubernetes environments:
- Agent overhead: In-process or sidecar agents consume 2-4% CPU and 100-300 MB memory per pod, creating operational burden on high-density clusters
- Instrumentation friction: SDK integration requires code changes, vendor-specific APIs, and multi-team coordination across polyglot services
- Data granularity trade-offs: Agent-based approaches rely on sampling and selective instrumentation, missing tail behavior and low-frequency phenomena
- Onboarding latency: Weeks of engineering effort to instrument microservices across an organization
- Vendor lock-in: Proprietary instrumentation formats and APIs create switching costs
eBPF fundamentally inverts this model: instead of adding instrumentation to applications, eBPF programs run directly in the Linux kernel, observing application behavior transparently and efficiently.
What is eBPF? Foundational Concepts
eBPF as a Kernel Virtual Machine
eBPF (extended Berkeley Packet Filter) is a sandboxed virtual machine that runs programs inside the Linux kernel. Unlike traditional kernel modules, eBPF programs:
- Are loaded dynamically without recompiling the kernel
- Run with kernel privilege but sandboxed execution constraints
- Cannot block or iterate indefinitely
- Have verifiable memory safety guarantees
Think of eBPF as analogous to the JavaScript engine in a web browser—both are runtime sandboxes that execute untrusted code safely. The web browser’s JavaScript engine ensures that malicious scripts cannot access the host filesystem or corrupt browser memory. Similarly, the eBPF verifier ensures that malicious or buggy kernel programs cannot crash the kernel, leak memory, or access unauthorized memory regions.
The Verifier: Static Analysis as a Safety Mechanism
Before any eBPF program executes, it passes through the eBPF verifier—a static analyzer that simulates all possible program paths to guarantee safety.

The verifier performs three critical validation stages:
1. Control Flow Graph (CFG) Validation
The verifier constructs a graph of the program’s control flow and ensures:
– No unbounded loops exist (eBPF programs must provably terminate; bounded loops have been permitted since kernel 5.3)
– No unreachable instructions follow conditional branches
– All jumps land on valid instruction boundaries
2. Execution Path Simulation
The verifier acts as an abstract interpreter, simulating every possible execution path through the program:
– Tracks register state (initialized, uninitialized, scalar, pointer)
– Verifies memory accesses occur within bounds
– Ensures pointers derive from valid kernel data structures
– Validates that stack operations don’t overflow
3. Memory Safety Enforcement
The verifier tracks the “register state” of each processor register throughout all program paths. A register’s state encodes:
– Type: Is it uninitialized, a scalar value, or a pointer to kernel memory?
– Bounds: For pointers, what is the valid memory range this pointer can access?
– Initialization: Has this register been written to before being read?
Example: If your program attempts to dereference a pointer without verifying it’s within a valid range, the verifier rejects it. If a register is read before being initialized, the verifier rejects it.
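The register-state tracking described above can be sketched as a toy abstract interpreter. Everything here — the `RegState` class, the instruction names, the ten-register model — is illustrative and enormously simplified; the real verifier lives in the kernel and tracks far richer state across every branch:

```python
# Toy model of eBPF verifier register-state tracking (illustrative only).
from dataclasses import dataclass

@dataclass
class RegState:
    kind: str                 # "uninit", "scalar", or "ptr"
    bounds: tuple = (0, 0)    # valid (lo, hi) byte offsets, for pointers

def verify(program):
    """Walk one straight-line instruction list, tracking register state."""
    regs = {i: RegState("uninit") for i in range(10)}   # r0-r9
    for insn in program:
        op = insn[0]
        if op == "mov_imm":                # r[dst] = constant
            regs[insn[1]] = RegState("scalar")
        elif op == "load_map_ptr":         # r[dst] = ptr to 8-byte map value
            regs[insn[1]] = RegState("ptr", (0, 8))
        elif op == "deref":                # read *(r[src] + off)
            src, off = insn[1], insn[2]
            r = regs[src]
            if r.kind == "uninit":
                return f"rejected: r{src} read before initialization"
            if r.kind != "ptr":
                return f"rejected: r{src} is not a pointer"
            if not (r.bounds[0] <= off < r.bounds[1]):
                return f"rejected: offset {off} outside {r.bounds}"
    return "accepted"
```

An in-bounds dereference of a map-value pointer is accepted; reading an uninitialized register, dereferencing a scalar, or stepping past the pointer's bounds is rejected — the same three failure modes listed above.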
Post-Verification Hardening
Upon successful verification, the kernel applies two hardening steps:
1. Just-in-Time (JIT) Compilation: The eBPF bytecode is compiled to native machine instructions for the host CPU architecture (x86-64, ARM, RISC-V), eliminating interpreter overhead and delivering near-native performance.
2. Read-only Memory Protection: The kernel memory page holding the compiled eBPF program is marked read-only. Any attempt to modify the program after loading triggers a kernel fault rather than silent corruption.
This combination ensures that an eBPF program, once loaded and JIT-compiled, is as efficient and trustworthy as natively compiled kernel code—but without the audit burden and deployment friction of kernel modules.
eBPF Maps: Kernel Data Persistence
eBPF programs operate in the kernel but need a mechanism to store state and communicate results to userspace. This is where eBPF maps come in.
What Are eBPF Maps?
eBPF maps are in-kernel data structures that persist across eBPF program invocations and serve as the bridge between kernel and userspace. A map is allocated in kernel memory (via the bpf() syscall) and is accessible to both:
– eBPF programs (reading and writing via helper functions like bpf_map_lookup_elem())
– Userspace processes (via bpf() syscall or memory-mapped file handles)
Maps are analogous to a shared queue or mailbox: the eBPF program writes observations (network flows, function calls, system events) into the map, and a userspace collector reads and processes them.

Map Types and Use Cases
BPF_MAP_TYPE_HASH and BPF_MAP_TYPE_ARRAY
General-purpose key-value storage for maintaining state (connection tracking, per-service metrics, request counters). Hash maps provide O(1) lookup by key; array maps use fixed integer indices.
BPF_MAP_TYPE_PERCPU_HASH and BPF_MAP_TYPE_PERCPU_ARRAY
Per-CPU variants where each logical CPU in the system has its own copy of the map. This eliminates race conditions without locks—multiple CPUs can write simultaneously to their own memory regions without contention. Essential for high-throughput metrics collection.
BPF_MAP_TYPE_RINGBUF
A modern, lock-free circular buffer for streaming events from kernel to userspace with zero-copy semantics. Events are written to the ring buffer, and userspace memory-maps it and consumes events in place, without an extra copy. Critical for high-cardinality event streams (HTTP requests, syscalls).
BPF_MAP_TYPE_PERF_EVENT_ARRAY
An earlier event streaming mechanism (predecessor to the ring buffer) that uses per-CPU perf buffers. Slightly less efficient than the ring buffer but still acceptable for many observability workloads.
Memory Safety in Maps
eBPF programs cannot access arbitrary kernel memory—they can only manipulate their context (registers, stack) and kernel data structures exposed through maps. The verifier enforces this:
– Pointer dereferences must be explicitly validated
– Bounds checks are mandatory for array access
– Memory accesses use kernel-controlled copy functions (bpf_probe_read_kernel(), bpf_probe_read_user())
This design prevents eBPF programs from corrupting kernel state while allowing deep observability into system behavior.
Entry Points: How eBPF Hooks Into Kernel Events
eBPF doesn’t exist in a vacuum—it needs attachment points where programs are triggered. The Linux kernel provides multiple mechanisms:

kprobes and uprobes: Function Instrumentation
kprobes attach eBPF programs to arbitrary kernel functions (system calls, device drivers). When the kernel executes the target function, the kprobe fires, executes the attached eBPF program, and continues.
uprobes are the userspace equivalent—they attach eBPF programs to functions within userspace binaries. For example, you can attach a uprobe to the do_HTTP_read() function in your Go application, and every time that function executes, the eBPF program runs with access to the function’s registers and stack.
Overhead: Zero when no probe is attached (the target instruction is left unpatched); once attached, each invocation incurs a small cost for the breakpoint or trampoline that redirects execution into the eBPF program.
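As a loose userspace analogy — not how uprobes are actually implemented, since real uprobes patch the instruction stream of the running binary — here is a hook that fires on function entry, records metadata, and lets the function continue untouched:

```python
# Illustrative analogy for a uprobe: observe a function's entry without
# editing its body. Real uprobes do this at the machine-code level.
import functools

events = []  # stands in for an eBPF map read by a collector

def probe(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        events.append({"fn": fn.__name__, "args": args})  # observe entry
        return fn(*args, **kwargs)                        # continue as normal
    return wrapper

@probe
def handle_request(path):
    return f"200 OK {path}"

handle_request("/api/items")
```

The application code (`handle_request`) is unchanged; observation happens entirely in the wrapper — which is exactly the property that makes uprobe-based tooling "zero instrumentation" from the application's point of view.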
Tracepoints: Predefined Kernel Events
Kernel tracepoints are static instrumentation points compiled into the kernel—they mark important events (process creation, file open, network packet transmission). eBPF programs attach to these predefined tracepoints without modifying kernel code.
Tracepoints are more efficient than kprobes because they’re already present in the kernel’s execution path; kprobes require a CPU exception handler to be invoked.
XDP: Packet Processing at Driver Level
XDP (eXpress Data Path) is an early packet processing framework where eBPF programs attach at the network driver level—before the packet enters the kernel’s network stack. XDP programs can:
– Drop packets early (DDoS mitigation)
– Redirect packets between interfaces
– Parse and modify packet headers
– Collect packet statistics
XDP runs in the critical path of packet processing, so eBPF programs must be extremely efficient (sub-microsecond execution).
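The early-drop use case can be sketched as a verdict function: inspect a few header fields, decide drop or pass before the kernel stack ever sees the packet. XDP_DROP and XDP_PASS are real verdict names, but the numeric values, packet dictionaries, and blocked-port policy below are illustrative — a real XDP program is C operating on raw packet bytes:

```python
# Toy XDP-style verdict: drop packets to blocked ports at the "driver".
XDP_DROP, XDP_PASS = 1, 2

BLOCKED_PORTS = {23, 445}          # assumed policy: e.g. telnet, SMB

def xdp_verdict(pkt):
    if pkt["dst_port"] in BLOCKED_PORTS:
        return XDP_DROP            # early drop: the stack never sees it
    return XDP_PASS                # hand the packet to the network stack

packets = [
    {"src": "10.0.0.5", "dst_port": 443},
    {"src": "10.0.0.9", "dst_port": 23},
]
verdicts = [xdp_verdict(p) for p in packets]
```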
tc eBPF: Traffic Control and Egress/Ingress Filtering
The traffic control (tc) subsystem uses eBPF programs to implement packet scheduling, queuing, and filtering at the kernel’s egress and ingress points. Cilium (Kubernetes CNI) heavily relies on tc eBPF for policy enforcement.
Kubernetes-Centric eBPF Observability Stack
The production-ready eBPF stack for Kubernetes in 2026 is built from four complementary CNCF projects:

Cilium + Hubble: Network Observability and CNI
Cilium is a Kubernetes CNI (Container Network Interface) that replaces the default networking layer with eBPF-powered networking. Instead of iptables and userspace proxies, Cilium uses eBPF programs to enforce network policies, load-balance traffic, and provide L3-L7 service routing—all in kernel space.
Hubble is Cilium’s observability component. While Cilium’s eBPF programs enforce policies, Hubble’s eBPF programs observe network flows, generating a stream of network events:
– Flow metadata: Source/destination IP, port, protocol, DNS names, service names
– L7 parsing: HTTP/HTTPS request paths, gRPC service names, Kafka topics
– Latency metrics: Per-flow p50/p95/p99 latencies
– Traffic direction: Ingress, egress, and intra-cluster flows
Hubble outputs flow events to a ring buffer, which a userspace collector aggregates into:
– Service dependency maps (who calls whom)
– Network latency heatmaps
– Policy violation alerts
Cost: 1-2% CPU overhead per node (for both Cilium and Hubble combined), plus ~20 MB memory. Scales sublinearly with cluster size.
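The flow-to-service-map aggregation described above can be sketched in a few lines. The event fields here are illustrative — real Hubble flows carry far richer metadata (DNS names, verdicts, L7 attributes):

```python
# Fold Hubble-style flow events into a "who calls whom" dependency map.
from collections import defaultdict

flows = [
    {"src": "frontend", "dst": "cart",    "latency_ms": 4.2},
    {"src": "frontend", "dst": "catalog", "latency_ms": 2.8},
    {"src": "cart",     "dst": "redis",   "latency_ms": 0.6},
    {"src": "frontend", "dst": "cart",    "latency_ms": 5.1},
]

deps = defaultdict(list)
for f in flows:
    deps[(f["src"], f["dst"])].append(f["latency_ms"])   # group by edge

service_map = {
    edge: {"calls": len(lat), "avg_ms": round(sum(lat) / len(lat), 2)}
    for edge, lat in deps.items()
}
```

Each key of `service_map` is one edge in the dependency graph, with call counts and latency aggregates attached — the raw material for the service maps and latency heatmaps listed above.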
Pixie: Application-Level Observability and APM
Pixie provides zero-instrumentation APM for Kubernetes. Unlike traditional APM agents that require SDK integration, Pixie uses eBPF uprobes to intercept function calls in application binaries.
What Pixie Captures:
– HTTP/1.1, HTTP/2, gRPC requests: Full request/response bodies and headers
– Database queries: SQL queries sent by your application
– Redis commands: Commands issued to in-memory caches
– Message queues: Kafka produce/consume calls
Pixie captures 100% of requests (not sampled) by storing them in a rolling in-memory buffer (8 GB default) on each node. After 60 seconds, the buffer rolls, and older entries are discarded. For long-term retention, aggregated metrics (latencies, error rates) are exported.
Data Collection Mechanism:
Pixie’s eBPF programs attach uprobes to runtime symbols in language runtimes (Go, Python, Java, Node.js, .NET). When a function like Go’s net/http.ReadRequest executes, Pixie’s eBPF code runs in-kernel and captures request metadata directly from memory—without requiring code instrumentation.
Cost: 1-3% CPU overhead per node, 50-100 MB memory. The memory cost scales with request volume and buffer size (user-tunable).
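The "capture everything, keep a short window" model above can be sketched as a time-based expiry plus an aggregation step that runs before entries roll off. The 60-second window mirrors the behavior described; function names and data shapes are illustrative:

```python
# Rolling-window capture: every request is recorded, old entries expire,
# and only aggregates survive for long-term retention.
WINDOW_SECONDS = 60

def expire(buffer, now):
    """Drop entries older than the rolling window."""
    return [(ts, ms) for ts, ms in buffer if now - ts < WINDOW_SECONDS]

def aggregate(buffer):
    """Reduce raw requests to exportable metrics before they roll off."""
    latencies = sorted(ms for _, ms in buffer)
    return {
        "count": len(latencies),
        "p50_ms": latencies[len(latencies) // 2],
        "max_ms": latencies[-1],
    }

# (timestamp_seconds, latency_ms) pairs captured on one node
buffer = [(0, 12.0), (30, 8.0), (55, 20.0), (90, 9.0)]
buffer = expire(buffer, now=100)   # entries at ts=0 and ts=30 roll off
```

Note the trade-off this models: raw per-request detail is only available inside the window, so anything you want to keep longer must be aggregated before expiry.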
Tetragon: Runtime Security and Syscall Observability
Tetragon is Cilium’s security observability engine. It uses eBPF to monitor and optionally enforce syscall-level behavior:
Observability:
– Track file access (open, unlink, chmod)
– Monitor process execution and parent-child relationships
– Log network connections and bind events
– Detect and record container escapes
Enforcement:
– Block unauthorized system calls
– Deny file access based on policy
– Prevent process execution
– Audit trails for compliance (PCI-DSS, HIPAA, SOC2)
Tetragon is particularly powerful for detecting runtime anomalies: if a containerized application suddenly starts executing curl or wget, Tetragon can alert on or block the execution.
Cost: 0.5-1% CPU overhead per node, 20-50 MB memory.
Splunk OBI: Standardized eBPF Instrumentation (OpenTelemetry)
At KubeCon EU 2026, Splunk announced the beta launch of OpenTelemetry eBPF Instrumentation (OBI), co-developed with Grafana Labs (who donated Beyla to the OpenTelemetry project).
OBI standardizes eBPF instrumentation around OpenTelemetry, the CNCF’s vendor-neutral observability standard. Instead of each tool (Pixie, Cilium, Tetragon) exporting telemetry in its own format, OBI ensures all eBPF observability converges on OTLP (OpenTelemetry Protocol).
What OBI Provides:
– Automatic distributed tracing via eBPF (no SDK required)
– RED metrics (Rate, Errors, Duration) per service
– Support for all major languages (Go, Java, Python, Node.js, .NET, Ruby, C/C++, Rust)
– Vendor-agnostic export via OTLP
Why It Matters: OBI decouples eBPF observability from any single vendor. You deploy OBI, get observability data in standard OTLP format, and choose any backend (Grafana, Prometheus, Loki, Splunk, New Relic) to consume the data.
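Deriving RED metrics from auto-captured spans is simple aggregation; a sketch of the shape of that computation follows. The span fields and window size here are illustrative, not the OTLP schema:

```python
# Compute RED metrics (Rate, Errors, Duration) per service from spans.
spans = [
    {"service": "checkout", "status": 200, "duration_ms": 35},
    {"service": "checkout", "status": 500, "duration_ms": 120},
    {"service": "checkout", "status": 200, "duration_ms": 41},
    {"service": "catalog",  "status": 200, "duration_ms": 12},
]

WINDOW_S = 60  # assumed metrics window

def red_metrics(spans, service):
    mine = [s for s in spans if s["service"] == service]
    errors = [s for s in mine if s["status"] >= 500]
    return {
        "rate_per_s": len(mine) / WINDOW_S,                  # Rate
        "error_ratio": len(errors) / len(mine),              # Errors
        "avg_duration_ms":                                   # Duration
            sum(s["duration_ms"] for s in mine) / len(mine),
    }
```

In an OBI deployment these aggregates would be emitted as OTLP metrics, leaving the backend (Grafana, Prometheus, Splunk, etc.) free to store and visualize them however it likes.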
Architectural Comparison: eBPF vs. Traditional APM
The performance and operational differences between eBPF-based and traditional APM approaches are substantial:

Resource Overhead
Traditional APM (Agent-based):
– CPU: 2-4% per pod (Datadog, New Relic agents)
– Memory: 100-300 MB per pod
– Network bandwidth: Continuous agent-to-backend communication
– Scales linearly with pod count
Example: A 100-node cluster with 50 pods per node requires 5,000 running agent processes, each consuming resources.
eBPF-based Approach:
– CPU: 0.5-1% per node (for all observability combined: Cilium, Hubble, Pixie, Tetragon)
– Memory: 10-50 MB per node for eBPF kernel programs
– Network bandwidth: Aggregated, filtered telemetry only
– Scales sublinearly with pod count
The same 100-node cluster requires only 100 collector agents (one per node), each loading a small set of eBPF programs that filter and aggregate in kernel space.
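The back-of-envelope arithmetic for the cluster above, using the per-pod and per-node figures quoted in this section (midpoints of the stated ranges are an assumption):

```python
# Agent count and memory footprint: per-pod agents vs per-node collectors.
nodes, pods_per_node = 100, 50

# Agent-based APM: one agent per pod, ~200 MB each (midpoint of 100-300).
apm_agents = nodes * pods_per_node
apm_memory_gb = apm_agents * 200 / 1024

# eBPF-based: one collector per node, ~30 MB each (midpoint of 10-50).
ebpf_collectors = nodes
ebpf_memory_gb = ebpf_collectors * 30 / 1024

print(f"agents: {apm_agents} vs collectors: {ebpf_collectors}")
print(f"memory: {apm_memory_gb:.0f} GB vs {ebpf_memory_gb:.1f} GB")
```

Roughly a gigabyte of agent memory per two nodes versus a few gigabytes cluster-wide — the source of the "scales sublinearly" claim above.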
Performance Benchmarks (groundcover Flora study, 2025):
– Flora (eBPF): +9% CPU, +0% memory overhead
– Datadog agent: +249% CPU, +227% memory overhead
– OpenTelemetry auto-instrumentation: +59% CPU, +27% memory overhead
– Pixie agent: +32% CPU, +9% memory overhead
Bottom line: eBPF achieves 90% less overhead than traditional APM agents.
Instrumentation Effort
Traditional APM:
– SDK integration per language: weeks of engineering
– Code review and testing cycles
– Dependency management and versioning
– Vendor-specific APIs to learn
– Application restarts required
eBPF:
– Deploy eBPF programs and collectors
– Zero application code changes
– Deploy in minutes, not weeks
– Vendor-agnostic (OTLP standard)
– No application restart needed
Data Granularity
Traditional APM:
– Sampling: Often 10-50% of requests captured
– SDK-limited: Only what the SDK explicitly instruments
– Lower cardinality: Missing tail behavior, rare error paths
eBPF:
– 100% request capture (for 60-second rolling windows)
– Kernel-level visibility: All function calls, syscalls, network operations
– Higher cardinality: Tail latencies, rare error conditions visible
Why eBPF Is Winning: Adoption Metrics and Momentum
CNCF Survey Data (2024-2026)
- 67% adoption: Of organizations running Kubernetes at large scale, 67% use at least one eBPF observability tool
- 300% YoY growth: eBPF observability adoption grew 300% year-over-year from 2024 to 2026
- Vendor adoption: Datadog, New Relic, and Dynatrace all announced eBPF-native observability modes in 2025-2026
Industry Momentum
KubeCon EU 2026 (Amsterdam):
– Splunk announced OBI (OpenTelemetry eBPF Instrumentation) in beta
– CiliumCon (co-located event) focused on eBPF production deployments
– OpenSSF (Open Source Security Foundation) featured Tetragon in security talks
Public Cloud Defaults:
– Google GKE: eBPF mode enabled by default
– Amazon EKS: eBPF support GA (generally available)
– Microsoft AKS: eBPF networking available
Open Source Maturity
- Cilium: CNCF graduated project (highest maturity level)
- Hubble: Stable observability platform
- Pixie: CNCF incubation, production deployments at scale
- Tetragon: CNCF incubation, runtime security workloads
- OpenTelemetry eBPF (Beyla): Donated to CNCF, vendor-backed
Decision Matrix and Trade-off Analysis
The following table summarizes key trade-offs:
| Criterion | Traditional APM | eBPF-based |
|---|---|---|
| CPU Overhead | 2-4% per pod | 0.5-1% per node |
| Memory Overhead | 100-300 MB per pod | 10-50 MB per node |
| Instrumentation Effort | Weeks (SDKs, code changes) | Minutes (deploy collectors) |
| Data Granularity | Sampled (10-50%) | 100% capture (windowed) |
| Vendor Lock-in | High (proprietary APIs) | Low (OTLP standard) |
| Language Support | Limited (SDK per language) | Universal (kernel-level) |
| Polyglot Support | Difficult (per-language SDKs) | Automatic (kernel intercepts all) |
| Learning Curve | Vendor-specific | Vendor-agnostic (OTLP) |
| Production Readiness | Mature | GA (2026) |
| Scaling Efficiency | Linear | Sublinear |
Recommendation: For any Kubernetes cluster with >20 nodes or >500 pods, eBPF-based observability delivers superior operational efficiency and lower TCO.
Implementation Architecture

Recommended Deployment Model (Four Layers)
Layer 1: Network Observability (Cilium + Hubble)
– Deploy Cilium as the CNI
– Enable Hubble for flow visibility
– Output: Service maps, network latency metrics, policy audit logs
Layer 2: Application Observability (Pixie)
– Deploy Pixie agents on all nodes
– Attach uprobes to application runtimes
– Output: Distributed traces, request latencies, service dependencies
Layer 3: Security Observability (Tetragon)
– Deploy Tetragon for runtime monitoring
– Define policies for syscall enforcement
– Output: Audit trails, security events, compliance logs
Layer 4: Standardized Telemetry (Splunk OBI / OpenTelemetry)
– Use OBI or vendor-backed eBPF instrumentation
– Export all telemetry via OTLP
– Decouple backend choice (Grafana, Prometheus, Splunk, etc.)
Data Flow
- Kernel: eBPF programs execute, generating events
- Maps: Events accumulate in ring buffers and hash maps
- Collectors: Userspace daemons read from maps via bpf() syscalls
- Aggregation: Events are filtered, sampled, and aggregated in-kernel (via eBPF) or in-process
- Export: Aggregated telemetry shipped to backends via OTLP or vendor protocols
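Steps 3-5 of the data flow can be sketched in miniature: the collector drains raw events, filters noise, and aggregates before export, which is why only a fraction of raw event volume ever leaves the node. Event shapes and the health-check filter below are illustrative:

```python
# Collector-side filter + aggregate: raw events in, compact payload out.
from collections import Counter

raw_events = [  # what the kernel-side eBPF programs produced
    {"verb": "GET",  "path": "/healthz",      "status": 200},
    {"verb": "GET",  "path": "/api/v1/items", "status": 200},
    {"verb": "POST", "path": "/api/v1/items", "status": 201},
    {"verb": "GET",  "path": "/healthz",      "status": 200},
]

# Filter: drop health-check noise before it leaves the node.
interesting = [e for e in raw_events if e["path"] != "/healthz"]

# Aggregate: export counts per (verb, status) instead of raw events.
export_payload = Counter((e["verb"], e["status"]) for e in interesting)
```

Four raw events become two exported counters; at production request rates the same pattern shrinks export bandwidth by orders of magnitude.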
Operational Considerations
Deployment:
– eBPF collectors ship as DaemonSets; the agent pod on each node loads the eBPF programs at startup
– No kernel module recompilation or system reboots needed
– Rollback is instantaneous (unload eBPF program)
Security:
– Verifier prevents malicious programs from loading
– RBAC controls who can load eBPF programs
– Audit trails (Tetragon) track all program loads
Performance Monitoring:
– Monitor eBPF program execution latency via kernel metrics
– Track map memory usage and collision rates
– Alert on verifier rejections (often an incompatible kernel version or a missing kernel feature)
Known Limitations and Edge Cases
Kernel Version Dependency
eBPF features depend on Linux kernel version:
– Kernel 4.x: foundational eBPF (maps, kprobes, tracepoints), added incrementally across the 4.x series
– Kernel 5.8+: Ring buffers (zero-copy event streaming)
– Kernel 5.10+: BPF CO-RE (portable eBPF bytecode across kernel versions)
Mitigation: Use BPF CO-RE to ensure eBPF programs work across kernel versions without recompilation.
Verifier Complexity
As eBPF programs grow in complexity, the verifier may reject valid programs due to state explosion. Example: deeply nested branches or complex pointer arithmetic may be rejected even when the program is actually safe.
Mitigation: Use eBPF helper functions (kernel-provided utilities) instead of complex in-program logic. Keep programs focused and modular.
Language-Specific Challenges
Not all languages expose userspace symbols equally:
– Go: Goroutines have complex memory layouts; uprobes sometimes capture inconsistent state
– Python: Interpreted; function calls don’t always map to memory boundaries
– Java: JIT compilation means function addresses change dynamically
Mitigation: Newer tools like OBI use language-specific plugins to handle these edge cases.
Sampling and Tail Loss
Pixie and ring buffer implementations use rolling windows. Under extreme load, older events are discarded before userspace reads them.
Mitigation: Increase buffer sizes (tunable at deployment time) or use kernel-side filtering to reduce event volume.
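The tail-loss failure mode is easy to demonstrate: when the producer outpaces the consumer in a fixed-size buffer, the oldest events are silently discarded. Here `collections.deque(maxlen=...)` stands in for the kernel ring buffer (a simplification — the real buffer holds variable-size records and signals the reader):

```python
# Tail loss under burst load: 10 events into a 4-slot buffer, no reader.
from collections import deque

BUFFER_SLOTS = 4                      # tunable at deployment time
ring = deque(maxlen=BUFFER_SLOTS)

dropped = 0
for event_id in range(10):            # burst of 10 events, reader stalled
    if len(ring) == ring.maxlen:
        dropped += 1                  # oldest entry about to be evicted
    ring.append(event_id)

# Only the newest BUFFER_SLOTS events survive; the rest were lost.
print(list(ring), "dropped:", dropped)   # → [6, 7, 8, 9] dropped: 6
```

Doubling `BUFFER_SLOTS` halves the loss for the same burst — which is exactly the buffer-size tuning the mitigation above describes, traded against per-node memory.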
Conclusion and Recommendation
We recommend adopting an eBPF-first observability strategy for all new Kubernetes deployments and migrating existing workloads over 12 months.
Key Rationale
- Operational Efficiency: 90% less overhead than traditional APM
- Zero Instrumentation: No SDK integration, no code changes, no deployment delays
- Industry Validation: 300% YoY adoption, CNCF standardization, vendor consensus
- Production Readiness: Cilium is CNCF graduated; Pixie and Tetragon are mature; OBI announced at KubeCon EU 2026
- Vendor Flexibility: OpenTelemetry standard decouples eBPF collection from backend choice
- Scalability: Sublinear resource growth with cluster size and pod count
Migration Path
Phase 1 (Months 1-3): Pilot eBPF observability on non-critical cluster using Cilium + Hubble for network observability.
Phase 2 (Months 4-6): Add Pixie for application tracing. Validate data quality and export pipeline.
Phase 3 (Months 7-9): Deploy Tetragon for security observability. Train security teams on policy enforcement.
Phase 4 (Months 10-12): Migrate backend choice to vendor-neutral OTLP. Deprecate legacy APM agents.
Success Metrics
- Resource utilization: 90% reduction in agent CPU/memory overhead
- Instrumentation time: Reduce onboarding time for new services from weeks to minutes
- Data completeness: Achieve 100% request capture (for rolling windows)
- Adoption: 100% of new Kubernetes deployments use eBPF observability
References and Further Reading
CNCF and Industry Reports
- CNCF Observability TAG: eBPF Adoption Survey (2025)
- Splunk Introduces OpenTelemetry eBPF Instrumentation and Kubernetes Operator at KubeCon EU 2026
- KubeCon EU 2026: Kubernetes Matures – BSD, eBPF, and mTLS
Technical Documentation
- eBPF Official Documentation
- eBPF Verifier Reference
- eBPF Maps Documentation
- Cilium Project Documentation
- Hubble: Network & Security Observability
- Pixie Observability Platform
- Tetragon: Security Observability
- OpenTelemetry eBPF Instrumentation (Beyla)
Academic and Deep-Technical References
- The eBPF Runtime in the Linux Kernel (arXiv)
- End-to-end Mechanized Proof of an eBPF Virtual Machine
- Tigera eBPF Guides
- Red Hat Introduction to eBPF
Comparisons and Case Studies
- eBPF-Based Kubernetes Observability Guide: From Cilium Hubble to Tetragon
- O’Reilly: Observability Engineering with Cilium
- Flora: The new eBPF Observability Sensor from groundcover (2025)
- Building a Production eBPF Observability & Security Stack for Kubernetes (2026)
Appendix: eBPF Glossary
Bytecode: Machine-independent instruction format; eBPF programs are compiled to bytecode, then JIT-compiled to native instructions.
Kprobe: Kernel probe; attaches eBPF programs to arbitrary kernel functions.
Uprobe: Userspace probe; attaches eBPF programs to functions in userspace binaries.
Tracepoint: Static instrumentation point in kernel code; predefined and more efficient than kprobes.
XDP (eXpress Data Path): Early packet processing framework where eBPF programs run at the network driver level.
Ring Buffer: Lock-free circular buffer for streaming events from kernel to userspace with zero-copy semantics.
JIT Compilation: Just-in-time compilation of eBPF bytecode to native machine instructions for the CPU architecture.
Verifier: Static analyzer that ensures eBPF programs are safe to execute before loading into the kernel.
Helper Functions: Kernel-provided utilities callable from eBPF programs (e.g., bpf_map_lookup_elem(), bpf_get_current_pid_tgid()).
BPF CO-RE (Compile Once, Run Everywhere): Technology enabling eBPF programs to run across multiple kernel versions without recompilation.
OTLP (OpenTelemetry Protocol): Standard protocol for exporting observability data (traces, metrics, logs) in a vendor-agnostic format.
Document Version: 1.0
Last Updated: April 17, 2026
Status: Accepted
Next Review Date: October 2026
