Cilium eBPF Service Mesh: L3-L7 Networking, Observability, Security 2026

Cilium eBPF service mesh replaces the Kubernetes networking bottleneck—kube-proxy—with kernel-space packet processing, cutting latency and CPU overhead by 60–80% compared to iptables-based networking. Unlike sidecar-heavy competitors like Istio, Cilium’s sidecarless model deploys a single per-node Envoy proxy and eBPF hooks in the kernel, delivering layer 7 policy, identity-based security, and fine-grained observability without per-pod resource overhead. This post explores Cilium’s eBPF datapath, ClusterMesh multi-cluster topology, Hubble observability, runtime security via Tetragon, and production trade-offs. Learn why Cilium scales where Istio struggles and when the kernel’s policy engine beats application proxies.

Why Cilium eBPF service mesh matters in 2026

Cilium eBPF service mesh addresses three critical gaps in Kubernetes networking that compound at scale. Traditional CNI and kube-proxy are iptables-based, meaning every packet triggers a userspace lookup and chain traversal—a 40–70 microsecond tax per flow. Istio’s sidecar-per-pod model adds 50–300MB per application pod and requires managing service mesh control plane scale for thousands of proxies. Cilium’s kernel-first architecture moves policy, load-balancing, and observability into eBPF programs, which run directly on the Linux kernel without context-switching to userspace. By 2026, adoption has grown 3x year-over-year among enterprises running >500 pods per cluster, with Isovalent’s Cilium Service Mesh now competing directly with Linkerd and Istio on feature parity. This post covers the eBPF internals, identity-based security model, multi-cluster federation, and honest failure modes so you can evaluate whether Cilium fits your infrastructure.

[Figure: Cilium eBPF service mesh architecture with kernel datapath, Envoy proxy, and Hubble observability flows]

Cilium architecture: eBPF kernel datapath + user-space agent

Cilium’s architecture decouples kernel-space networking from user-space policy and observability. The eBPF datapath in the kernel handles packet forwarding, NAT, load-balancing, and connection tracking at the XDP or TC (traffic control) layer—before userspace daemons ever see the frame. The Cilium agent (running on each node) watches the Kubernetes API, constructs policy rules, and loads eBPF programs into the kernel. Hubble, the observability layer, reads kernel events and exports flow telemetry. This design avoids the per-pod sidecar problem: a 500-pod cluster with Istio needs 500 Envoy proxies consuming up to 150GB of memory; Cilium needs one Envoy per node, using ~2GB total.

eBPF datapath: replacing kube-proxy and IPtables

Cilium’s eBPF datapath, loaded into the kernel by the cilium-agent, replaces kube-proxy entirely. Every packet entering a pod traverses eBPF hooks at XDP (eXpress Data Path) ingress or TC egress. The eBPF program consults a kernel map (a fast hash table) for the destination service and applies connection tracking, DNS policy, and load-balancing before the packet ever reaches the container network namespace. For a pod-to-pod connection on the same node, the eBPF program rewrites the destination MAC and IP, then redirects the packet back to the kernel stack—zero syscalls, zero context switches, sub-microsecond latency. For cross-node traffic, Cilium uses a VXLAN (or Geneve) overlay or native BGP routing depending on your cluster mode.
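Kube-proxy replacement is configured at install time. A minimal Helm values sketch follows — key names are current as of Cilium 1.14+ and the API server address is an assumed example, so verify both against your version and cluster:

```yaml
# values.yaml — kube-proxy-free datapath (sketch; verify keys for your Cilium release)
kubeProxyReplacement: true   # eBPF handles all Service load-balancing
k8sServiceHost: 10.0.0.5     # API server address (example); required once kube-proxy is gone
k8sServicePort: 6443
routingMode: tunnel          # VXLAN overlay; use "native" with BGP on routed fabrics
tunnelProtocol: vxlan
```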

eBPF map types used:
BPF_MAP_TYPE_HASH — connection state and service endpoints (CT map)
BPF_MAP_TYPE_ARRAY — fast config lookups (policy rules)
BPF_MAP_TYPE_LRU_HASH — LRU eviction for large conntrack tables (>1M flows)
BPF_MAP_TYPE_RING_BUFFER — lock-free ring for event export to userspace (Hubble)

One eBPF program per kernel hook (XDP ingress, TC egress, socket operations) keeps each program under the verifier’s 1M-instruction limit.

Identity-based networking: CiliumNetworkPolicy and network endpoints

Cilium moves beyond IP/port-based policies (Kubernetes NetworkPolicy) to identity-based rules. Every pod gets a numeric identity (e.g., 1234) derived from its labels and namespace. A policy rule then says “allow traffic from pods with identity=1234 to pods with identity=5678 on port 8080”, independent of the actual IP address. When a pod restarts or migrates, the IP changes but the identity persists, and policy applies without rule update. This abstraction is the foundation for zero-trust networking.

CiliumNetworkPolicy extends Kubernetes NetworkPolicy with fromEndpoints selectors:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-frontend-to-api
spec:
  endpointSelector:
    matchLabels:
      tier: api
  ingress:
    - fromEndpoints:
      - matchLabels:
          tier: frontend
      toPorts:
      - ports:
        - port: "8080"
          protocol: TCP

At policy evaluation time (in eBPF), the kernel consults the identity map to determine whether the source pod’s identity matches the policy’s fromEndpoints selector. This happens before the packet ever reaches the application, so policy is enforced at line rate.

L7 policy via per-node Envoy and socket LB

While eBPF excels at L3/L4 (IP, port, protocol), layer 7 policies (HTTP host matching, gRPC method names, path-based routing) still need a proxy. Cilium Service Mesh deploys a single Envoy proxy per node, configured to intercept traffic destined for L7-policy-governed services. Instead of injecting sidecars, Cilium uses socket-level load-balancing (eBPF programs attached at the cgroup socket hooks) to redirect traffic bound for a service to the per-node Envoy. Envoy then applies L7 rules and forwards to the backend pod.

The socket LB program:
1. Intercepts connect() (TCP) and sendmsg()/recvmsg() (UDP) calls targeting service IPs.
2. Rewrites the destination to the per-node Envoy (127.0.0.1:xxxxx).
3. Envoy processes the request, applies L7 policy (e.g., “allow GET /api/* only”).
4. Envoy forwards to the real backend pod IP.

This design avoids the per-pod memory overhead of sidecar proxies while still enabling HTTP/gRPC policies.
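An L7 rule reuses the CiliumNetworkPolicy shape shown earlier: a toPorts entry gains an HTTP rules section, whose presence routes the matching traffic through the per-node Envoy. A sketch with illustrative names:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-api-only
spec:
  endpointSelector:
    matchLabels:
      tier: api
  ingress:
    - fromEndpoints:
        - matchLabels:
            tier: frontend
      toPorts:
        - ports:
            - port: "8080"
              protocol: TCP
          rules:
            http:                 # L7 rules => traffic transits the per-node Envoy
              - method: "GET"
                path: "/api/.*"   # regex match on the request path
```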

[Figure: Pod-to-pod traffic with eBPF datapath vs traditional kube-proxy CNI]

Conntrack and socket LB: the eBPF connection state machine

Cilium maintains a distributed conntrack table—a kernel eBPF map tracking TCP/UDP flows across all pods on a node. Each entry holds the original source/dest IP and port (4-tuple), plus a state flag (SYN_SENT, ESTABLISHED, TIME_WAIT) and timestamp. When a packet arrives, the eBPF program looks up the 4-tuple in conntrack. If it’s a new flow, the program checks policy; if allowed, it allocates a conntrack entry and forwards the packet. If it’s an existing flow, the program re-applies the NAT rewrite and forwards.

Connection establishment for service-to-service:
– Pod A sends SYN to Service IP 10.0.1.1:8080 (virtual IP).
– Kernel intercepts at XDP; eBPF looks up Service 10.0.1.1 in the service map and finds endpoints {Pod B IP, Pod C IP}.
– eBPF selects Pod B (random selection by default; Maglev consistent hashing is optional) and rewrites the dest IP to Pod B’s IP (e.g., 10.0.2.50:8080).
– eBPF allocates a conntrack entry {Pod A 10.0.1.10:54321, Pod B 10.0.2.50:8080, ESTABLISHED}.
– Response from Pod B is also rewritten (source IP back to 10.0.1.1:8080) via the same conntrack entry.
– From Pod A’s perspective, it always sees traffic from 10.0.1.1:8080 (the Service IP), never the real backend.

This happens in sub-microsecond time per packet, with no syscalls or context switches. Even at 100k concurrent connections, the conntrack map stays in the tens of megabytes (entries are on the order of 100 bytes each), with far better cache locality than iptables conntrack.

[Figure: Sidecarless L7 — per-node Envoy vs per-pod Istio sidecar comparison]

Identity-based security, policy evaluation, and multi-cluster federation

Cilium’s security model rests on identities and explicit allow-lists. Every pod’s identity is computed from its Kubernetes labels and namespace using a hash function, resulting in a stable numeric ID across restarts. When a policy rule references matchLabels: tier=frontend, the Cilium agent looks up all pods with that label, resolves their identities, and installs a policy rule in eBPF: “allow traffic from identities {1234, 5678, 9999}”. This abstraction decouples policy from ephemeral IP addresses.

Identity allocation and distributed enforcement

Cilium’s identity allocator is a distributed system. Each cluster node runs a Cilium agent; identities are allocated through a shared key-value store (etcd) or, in CRD mode, as CiliumIdentity objects coordinated by the Cilium operator. When a new pod is created with labels tier=frontend, the agent on that node looks up the label set and allocates (or reuses) a unique numeric ID (e.g., 1234), which propagates to every other agent. Each agent loads the identity mapping into a kernel eBPF map, so local policy enforcement can use the numeric ID without further lookups.

Identity caching and rollover:
– The identity map stays small: all pods sharing a label set share one identity, so 10k pods typically resolve to a few hundred identities.
– Cache lookups are O(1) hash-table hits.
– Identity allocation is eventually consistent; new pods wait ~100ms to get an ID.

Policy evaluation: the decision tree

At packet arrival time, the eBPF program executes a decision tree:

  1. Lookup source identity: use the source IP to find the pod’s identity from the pod map.
  2. Lookup destination service: use the destination IP to find the service definition.
  3. Select endpoint: use consistent hashing to pick a backend pod.
  4. Evaluate policy: consult the policy map for a rule matching {source identity, destination identity, port}.
  5. Action: if rule allows, rewrite addresses and forward; if deny, drop packet.

This tree runs entirely in eBPF, with all maps resident in kernel memory. Latency is sub-microsecond for cache-resident accesses (~200 nanoseconds).

[Figure: Cilium policy evaluation decision tree with identity lookups and endpoint selection]

ClusterMesh: multi-cluster federation and Hubble observability

ClusterMesh extends Cilium’s identity-based security to multiple Kubernetes clusters. Each cluster runs a Cilium control plane; clusters connect via a shared etcd cluster or via clustermesh-apiserver, a gateway that syncs identity and endpoint information across cluster boundaries. A Service annotated as global (service.cilium.io/global: "true") is backed by endpoints from every connected cluster: pods in Cluster A address it by its ordinary service name, and the Cilium agent resolves it to pod IPs in Cluster B and installs cross-cluster routes.
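Each member cluster needs a unique name and numeric ID, and a Service is shared mesh-wide via the global-service annotation. A sketch (cluster names are illustrative; verify keys against your ClusterMesh version):

```yaml
# Helm values for each member cluster
cluster:
  name: cluster-b   # must be unique across the mesh
  id: 2             # 1–255; encoded into security identities
---
# Service backed by endpoints from all connected clusters
apiVersion: v1
kind: Service
metadata:
  name: api
  annotations:
    service.cilium.io/global: "true"
spec:
  selector:
    tier: api
  ports:
    - port: 8080
```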

Security model across clusters:
– Identities are globally unique (prefixed with cluster ID).
– Policy rules apply uniformly: “allow frontend-identity 1234 to api-identity 5678” works whether they’re in the same cluster or different clusters.
– Encryption can be enabled mesh-wide: cross-cluster traffic is then wrapped in IPSec or WireGuard tunnels.

Hubble, Cilium’s observability platform, exports flow-level telemetry from both clusters:
– Source pod identity, destination pod identity, service IP, endpoint IP.
– Layer 7 metadata (HTTP method, path, status code).
– Policy decision (allowed, denied, dropped).
– Encryption status (encrypted, unencrypted, encrypted in-cluster only).

Hubble UI renders a live graph of all pod-to-pod communication across clusters, color-coded by policy decision.
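Hubble is enabled through Helm values; the relay aggregates flows cluster-wide for the CLI and UI, and flow metrics can be exported to Prometheus. A sketch (verify the metric names against your Hubble release):

```yaml
hubble:
  enabled: true
  relay:
    enabled: true    # cluster-wide flow aggregation for CLI and UI
  ui:
    enabled: true    # live service-map graph
  metrics:
    enabled:         # Prometheus flow metrics to export
      - dns
      - drop
      - tcp
      - flow
      - http
```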

[Figure: ClusterMesh multi-cluster topology with Hubble observability and cross-cluster identity federation]

Deeper walkthrough: Tetragon runtime security, WireGuard/IPSec encryption, and BGP integration

Tetragon: kernel-space runtime security enforcement

Tetragon, Cilium’s sister project for runtime security, uses eBPF tracepoints to monitor system calls and enforce process-level policies. While Cilium controls network-layer traffic, Tetragon controls system calls: file access, process spawning, network socket creation, and more. A Tetragon policy can say “container image nginx:1.21 is only allowed to make DNS lookups and HTTP requests to whitelisted domains”, and enforce that in the kernel before the process ever executes the syscall.

Tetragon enforcement:
– Monitor execve() syscalls and match against image/container metadata.
– Monitor open() syscalls and enforce file-access policies (read-only, no-exec).
– Monitor socket() syscalls and enforce network-access policies (allow 53 for DNS, deny 22 for SSH).

Combined with Cilium’s network policies, Tetragon provides zero-trust enforcement: network microsegmentation at the pod level (Cilium) + process-level syscall auditing (Tetragon).
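A TracingPolicy sketch that kills any process writing to /etc/passwd illustrates the shape of such rules. The kprobe target, argument matching, and the Sigkill action follow examples in the Tetragon docs, but field names evolve — treat this as a sketch and verify against your Tetragon version:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: block-passwd-writes
spec:
  kprobes:
    - call: "security_file_permission"   # LSM hook: fires on file access checks
      syscall: false
      args:
        - index: 0
          type: "file"                   # the file being accessed
        - index: 1
          type: "int"                    # requested access mask
      selectors:
        - matchArgs:
            - index: 0
              operator: "Equal"
              values:
                - "/etc/passwd"
          matchActions:
            - action: Sigkill            # kill the offending process in-kernel
```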

Encryption: WireGuard and IPSec tunneling

Cilium encrypts cross-node and cross-cluster traffic using two backends:

WireGuard (recommended for 2026):
– Modern, high-performance UDP-based tunnel.
– Kernel module (since Linux 5.6), <4000 lines of code.
– Overhead: ~5–10% throughput loss, <1ms latency added.
– Configuration: Helm values encryption.enabled=true and encryption.type=wireguard.

IPSec (legacy fallback):
– ESP (Encapsulating Security Payload) tunneling, RFC 4303.
– Higher CPU overhead (25–40% throughput loss).
– Better ecosystem support in legacy enterprises.

Key management differs per backend: IPSec uses keys stored in a Kubernetes secret (optionally sourced from external key management such as Vault), while WireGuard generates per-node keypairs automatically and distributes public keys via the Kubernetes API. Enable transparent encryption mesh-wide so cross-cluster traffic between ClusterMesh members is encrypted as well.
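The corresponding Helm values fragment is small; nodeEncryption extends coverage to node-originated traffic (key names as of recent Cilium releases — verify before use):

```yaml
encryption:
  enabled: true
  type: wireguard      # or "ipsec" for the ESP-based backend
  nodeEncryption: true # also encrypt node-to-node (not just pod-to-pod) traffic
```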

BGP integration: native routing and load-balancing

Cilium can announce Kubernetes Service IPs directly to the underlay network using BGP. This allows external load-balancers, routers, and on-premises systems to reach Services without kube-proxy or cloud-provider load-balancers. A router running BGP learns that Service IP 10.0.1.1 is reachable via node 192.168.1.10, and routes external traffic directly to that node. Cilium’s XDP program then applies service-to-endpoint load-balancing on the node.

Use cases:
– Bare-metal clusters: advertise Service IPs directly to top-of-rack switches.
– Hybrid cloud: advertise Service IPs to on-premises routers.
– Zero-trust egress: combine with identity-based policies to control which external systems reach which services.

BGP configuration via CiliumBGPPeeringPolicy:

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: bgp-peering
spec:
  virtualRouters:
    - localASN: 64512
      exportPodCIDR: true
      neighbors:
        - peerAddress: 192.168.1.1/32  # peer address in CIDR notation
          peerASN: 64513
      serviceSelector:                 # advertise IPs of matching Services
        matchExpressions:
          - {key: somekey, operator: NotIn, values: ["never-used-value"]}  # match all

The top-of-rack router learns routes to the advertised pod and Service IPs via BGP and forwards external traffic toward nodes running Cilium.

Bandwidth manager and QoS scheduling

Cilium’s bandwidth manager uses eBPF to enforce per-pod rate-limiting. Pods declare limits via the standard kubernetes.io/egress-bandwidth annotation (e.g., "100M"). The eBPF program at the TC egress layer paces packets using earliest-departure-time (EDT) timestamps to enforce the limit, without requiring userspace tools like tc-qdisc.
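Concretely, once the feature is on (Helm value bandwidthManager.enabled=true), the limit rides on the standard Kubernetes egress-bandwidth annotation. A sketch with an illustrative pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rate-limited-app
  annotations:
    kubernetes.io/egress-bandwidth: "100M"  # enforced by eBPF pacing at TC egress
spec:
  containers:
    - name: app
      image: nginx:1.25   # illustrative image
```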

IPv6 NAT46/NAT64 and dual-stack networking

Cilium supports dual-stack (IPv4 + IPv6) clusters and can translate between IPv4 and IPv6 address spaces. A pod in an IPv6-only cluster can reach an IPv4-only service via IPv6-to-IPv4 NAT translation in the eBPF datapath. This is critical for heterogeneous clusters that span cloud (IPv6) and on-premises (IPv4) environments.
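Dual-stack is a Helm toggle, and the NAT46/64 gateway is a separate (beta) flag. A sketch — key names may differ between releases, so verify against your Cilium version:

```yaml
ipv4:
  enabled: true
ipv6:
  enabled: true     # pods get both address families
nat46x64Gateway:
  enabled: true     # beta: translate between v4 and v6 in the eBPF datapath
```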

Trade-offs, gotchas, and what goes wrong

Cilium’s eBPF approach is powerful but comes with operational and architectural trade-offs that can bite in production. eBPF programs are difficult to debug—kernel panics are silent, memory corruption is hard to trace, and a buggy program can crash the entire node. Cilium has had stability issues in past releases (pre-1.13) where certain policy configurations would cause eBPF verifier errors or memory leaks in large clusters (>5000 pods).

Kernel version dependency: Cilium requires Linux 4.9+ and newer kernel features (eBPF for load-balancing, XDP drivers) are available only in 5.4+. Some cloud providers (EKS on older AMIs, AKS on certain VM SKUs) ship kernels missing required features. Diagnosis requires checking /boot/config-$(uname -r) for CONFIG_BPF flags and running the Cilium connectivity check (cilium connectivity test).

Socket LB complexity: The per-node Envoy proxy and socket-level load-balancing add operational overhead. Envoy memory usage is unpredictable with large numbers of services (each service endpoint requires Envoy listener/cluster config). Tuning Envoy resource requests requires monitoring and is non-obvious.

Policy evaluation latency at tail: While eBPF is fast on cache hits, a cache miss (e.g., first packet of a new flow with a policy lookup) can add 5–10 microseconds. For latency-sensitive workloads (trading, low-latency analytics), this may not be negligible.

Cross-node latency with VXLAN overlay: If your cluster uses VXLAN for inter-node communication (cloud deployments often do), you’re adding ~100–500 microseconds of overlay encapsulation/decapsulation. Native BGP routing avoids this but requires bare-metal or cloud support.

Debugging observability: Unlike Istio, which logs all proxy decisions in userspace, Cilium’s eBPF program decisions (allow/deny) are only visible via Hubble. Hubble has sampling overhead; high-traffic clusters may drop events if observability is not tuned.

Anti-patterns to avoid:
– Deploying without network policy — eBPF’s identity-based security only helps if you write policies.
– Relying on socket LB L7 policies for >100 distinct services — Envoy becomes the bottleneck.
– Mixing overlay (VXLAN) and native routing — performance is unpredictable.
– Running Cilium + Istio sidecars — double NAT, triple encryption, observable slowdown.

Practical recommendations

Cilium is a mature choice for Kubernetes clusters >200 pods seeking to reduce kube-proxy overhead and adopt identity-based networking. Start with Cilium’s eBPF datapath for L3/L4 networking, then gradually enable L7 policies via Cilium Service Mesh. Use Hubble to visualize traffic and validate policies before strict enforcement. Monitor kernel eBPF statistics via Prometheus metrics (Cilium exposes cilium_policy_decisions and cilium_socket_lb_connections) to catch pathological cases early.

Deployment checklist:
1. Verify kernel support: uname -r ≥ 5.4; check for CONFIG_BPF=y, CONFIG_BPF_SYSCALL=y, CONFIG_XDP_SOCKETS=y.
2. Run Cilium’s pre-flight checks: cilium status, cilium connectivity test.
3. Install network policies incrementally—start with allow-all, then tighten.
4. Monitor Hubble for policy violations and unexpected flows.
5. Set up Tetragon for runtime security on sensitive workloads (databases, secrets stores).
6. Benchmark before/after: measure tail latency, CPU usage, and memory footprint.

Frequently asked questions

How does Cilium compare to Istio?

Cilium is sidecarless and kernel-native, using eBPF to enforce policies at the Linux kernel level without per-pod sidecars. Istio injects Envoy sidecars in every pod, consuming 50–300MB each. Cilium has lower resource overhead but steeper learning curve (eBPF requires kernel knowledge). Istio has wider ecosystem support (Jaeger, Kiali integration) and more mature debugging tooling. For large clusters (>500 pods), Cilium is preferred; for smaller clusters with polyglot deployment, Istio is more approachable.

Does Cilium replace Istio entirely?

Not quite. Cilium excels at L3/L4 networking and identity-based security. Istio provides richer service mesh observability (distributed tracing, service graphs), traffic management (canary deployments, circuit breakers), and multi-cluster federation with easier operational workflows. Many enterprises run Cilium for networking + Kyverno for policy + Prometheus + Jaeger for observability, which replaces Istio piecemeal. As Cilium Service Mesh matures (2025–2026), the gap narrows.

What’s the memory footprint of Cilium per node?

The Cilium DaemonSet pod typically consumes 200–500MB of memory (depending on cluster size). The per-node Envoy proxy (if L7 policies are enabled) adds another 200–500MB. Compare this to Istio: 500+ pods × 50MB Envoy per pod = 25GB for the same cluster. Cilium is 50x more efficient on memory.

Can Cilium handle 10,000 pods in a single cluster?

Yes. Cilium has been tested up to 10,000 pods per cluster. The main bottleneck is eBPF map sizing: conntrack and NAT maps are sized when the agent starts, so large clusters must raise the defaults via Helm. Monitor conntrack-full and drop metrics and increase map sizes preemptively. Identity allocation scales linearly; policy evaluation is O(1) per packet.
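Map sizing is tunable through Helm; the values below are illustrative starting points, not recommendations — check the sizing guidance for your Cilium release:

```yaml
bpf:
  mapDynamicSizeRatio: 0.005  # size CT/NAT maps as a fraction of system memory
  ctTcpMax: 1048576           # explicit cap on TCP conntrack entries
```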

What observability does Hubble provide?

Hubble exports per-packet metadata (source/dest identity, service IP, endpoint IP, port, protocol, policy decision, encryption status, HTTP method/path/status for L7). Hubble UI provides a live service graph. Hubble export to Splunk or Elasticsearch enables long-term retention and forensic analysis. Sampling is configurable; default is 1-in-100 for high-traffic clusters.

Further reading

  1. Cilium Official Documentation — authoritative guide to Cilium architecture, eBPF datapath, and policies.
  2. Isovalent Cilium Service Mesh Whitepaper — in-depth coverage of sidecarless architecture and comparison with Istio/Linkerd.
  3. Linux Kernel eBPF Documentation — kernel eBPF programming, BPF map types, and verifier constraints.
  4. Envoy Proxy Documentation — reference for L7 policy and socket-level load-balancing integration.
  5. Gartner Service Mesh Market Guide 2026 — market positioning and adoption trends.

Last updated: April 22, 2026. Author: Riju.
