Kubernetes CNI Compared: Calico vs Cilium vs Flannel
Choosing the wrong CNI is one of the most expensive networking decisions a Kubernetes team makes, and almost nobody revisits it until it hurts. A thorough Kubernetes CNI comparison is not academic: the plugin you pick decides whether network policy is enforceable, whether you can replace kube-proxy, how packets are encrypted on the wire, and how much CPU your nodes burn on connection tracking at scale. The default that ships with your distribution is rarely the one you want for production. By 2026 the field has consolidated around three serious answers (Calico, Cilium, and Flannel), plus Multus when a single interface per pod is not enough. Together they span a spectrum from “just give me pod connectivity” to “give me an eBPF service mesh.” This article dissects each against the Container Network Interface (CNI) specification, the dataplane that actually moves packets, and the operational reality of running them.
What this covers: the CNI spec, iptables vs eBPF vs overlay dataplanes, network policy depth including Cilium L7, performance and encryption, observability with Hubble, Flannel’s niche, Multus for telco, and a decision matrix.
Context and Background
Kubernetes deliberately ships without a built-in pod network. Its network model requires that every pod gets its own IP address and can reach every other pod without Network Address Translation (NAT), but Kubernetes delegates the implementation of that promise to a plugin. That contract is the Container Network Interface, a CNCF specification that defines a tiny protocol: the container runtime invokes a plugin binary with an ADD, DEL, or CHECK verb and a JSON network configuration, and the plugin wires up the pod’s network namespace and returns the assigned addresses. Everything else (routing, policy, encryption, load balancing) is the plugin’s business.
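To make the contract concrete, here is a minimal sketch of a plugin’s entry point in Python, assuming a CNI 1.0.0 configuration. The spec passes parameters in environment variables (CNI_COMMAND, CNI_NETNS, CNI_IFNAME and others) and the network configuration on stdin, and expects a JSON result on stdout. The fixed DEMO_ADDRESS and the handle() function are hypothetical; the sketch creates no interfaces and omits the spec’s error-result format, so treat it as an illustration of the invocation shape rather than a working plugin.

```python
import json
import os
import sys
from typing import Optional

# Hypothetical fixed address; a real plugin delegates allocation to an IPAM plugin.
DEMO_ADDRESS = "10.244.1.17/24"


def handle(command: str, netns: str, ifname: str, config: dict) -> Optional[dict]:
    """Dispatch one CNI verb and return the JSON-able result, if the verb has one."""
    if command == "ADD":
        # A real plugin would create a veth pair inside `netns` and program routes here.
        return {
            "cniVersion": config["cniVersion"],
            "interfaces": [{"name": ifname, "sandbox": netns}],
            "ips": [{"address": DEMO_ADDRESS, "interface": 0}],  # index into interfaces
        }
    if command in ("DEL", "CHECK"):
        # DEL must succeed even if the pod is already gone; a real CHECK verifies state.
        return None
    if command == "VERSION":
        return {"cniVersion": "1.0.0", "supportedVersions": ["0.4.0", "1.0.0"]}
    raise ValueError(f"unsupported CNI_COMMAND: {command}")


def main() -> None:
    config = json.load(sys.stdin)  # the network configuration arrives on stdin
    result = handle(
        os.environ["CNI_COMMAND"],
        os.environ.get("CNI_NETNS", ""),
        os.environ.get("CNI_IFNAME", "eth0"),
        config,
    )
    if result is not None:
        json.dump(result, sys.stdout)  # the runtime parses the result from stdout


if __name__ == "__main__" and "CNI_COMMAND" in os.environ:
    main()
```

Invoked by hand, this would look like `echo '{"cniVersion": "1.0.0", "name": "demo", "type": "demo"}' | CNI_COMMAND=ADD CNI_NETNS=/var/run/netns/demo python plugin.py` (the file name is illustrative); in a cluster the container runtime makes this call for you.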
That minimalism is why the CNI ecosystem is so diverse. Flannel arrived early with the simplest possible answer: an overlay network that tunnels pod traffic between nodes and does nothing else. Calico, from Tigera, took the opposite tack with real Layer 3 routing and a full network policy engine built on iptables, later adding an eBPF dataplane. Cilium, now a CNCF graduated project, built its entire dataplane on eBPF (extended Berkeley Packet Filter), the Linux kernel technology that runs sandboxed programs at packet-processing hooks without a kernel module. By 2026 Cilium has become the default or recommended CNI across the major managed offerings: it powers Google Kubernetes Engine (GKE) Dataplane V2, is the built-in option on Azure AKS, and is widely deployed on Amazon EKS. For background on why eBPF reshaped this layer, see our deep dive on eBPF Kubernetes observability replacing APM. The Tigera and Cilium project documentation remain the authoritative references for current dataplane behaviour (Tigera/Calico docs, Cilium docs).
It helps to be precise about what the CNI specification does and does not mandate. The spec defines the plugin invocation contract — the verbs, the environment variables such as CNI_COMMAND and CNI_NETNS, and the JSON result structure — plus a delegation model in which one plugin can call another (this is exactly the hook Multus exploits). It says nothing about how addresses are allocated, how routes are programmed, or how policy is enforced. IPAM is itself a pluggable sub-component: the reference host-local plugin hands out addresses from a per-node range, while Calico and Cilium ship their own IPAM that allocates from cluster-wide pools and coordinates through the Kubernetes API or a key-value store. That separation is why two CNIs can implement the identical spec yet behave nothing alike at scale — the interesting engineering all lives below the contract.
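The IPAM split is easiest to see in code. The class below is a toy per-node allocator in the spirit of the reference host-local plugin, not its implementation: the real plugin persists leases on disk and has its own allocation order, and the reserved gateway address here is an assumption. What it does show is the shape of the job: hand out addresses from one node’s range, stay idempotent when an ADD is retried, and tolerate a DEL for an unknown container.

```python
import ipaddress


class NodeRangeAllocator:
    """Toy per-node IPAM in the spirit of host-local (not its actual code)."""

    def __init__(self, node_cidr: str):
        self.network = ipaddress.ip_network(node_cidr)
        # Assumption: the first usable address is reserved as the node's gateway.
        self.gateway = self.network.network_address + 1
        self.leases = {}  # container ID -> allocated address

    def allocate(self, container_id: str):
        if container_id in self.leases:  # a retried ADD returns the same address
            return self.leases[container_id]
        taken = set(self.leases.values())
        for addr in self.network.hosts():  # hosts() skips network and broadcast
            if addr != self.gateway and addr not in taken:
                self.leases[container_id] = addr
                return addr
        raise RuntimeError(f"pod range {self.network} is exhausted")

    def release(self, container_id: str) -> None:
        self.leases.pop(container_id, None)  # DEL tolerates unknown IDs


alloc = NodeRangeAllocator("10.244.1.0/24")
print(alloc.allocate("pod-a"), alloc.allocate("pod-b"))  # 10.244.1.2 10.244.1.3
```

The exhaustion error is the per-node version of the IP-planning problem discussed later: once a node’s range is used up, pods fail at ADD even if the node has spare CPU and memory.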
A consequence of that design that surprises newcomers is that the CNI plugin runs per pod creation, not as a long-lived router. When the kubelet starts a pod it triggers a single ADD invocation that wires the namespace and exits; from then on, packet forwarding is handled by whatever the plugin programmed into the kernel — routes, iptables chains, or eBPF programs — with no plugin process in the data path. This is why CNI performance is really kernel-dataplane performance: the plugin binary’s own speed is almost irrelevant to throughput, because it does its work once at pod setup and then gets out of the way. Understanding this collapses a lot of confusion about “which CNI is faster” — the answer is always “which dataplane the CNI programmed,” not “which plugin binary ran.”
Calico vs Cilium vs Flannel: the core comparison
For a typical production cluster in 2026, Cilium is the strongest default: eBPF dataplane, kube-proxy replacement, Layer 3 to Layer 7 policy, transparent encryption, and Hubble observability. Calico is the mature, flexible choice with both iptables and eBPF modes and the richest declarative policy model. Flannel is the right answer only when you want pure connectivity with zero policy and minimal moving parts.

Figure 1: The CNI ADD path from kubelet to a connected pod, branching into the three dataplane families.
Figure 1 traces what actually happens when a pod starts. The kubelet creates the pod sandbox and asks the container runtime to set up networking; the runtime invokes the configured CNI plugin with ADD; the plugin’s IP Address Management (IPAM) component allocates an address, creates the virtual ethernet (veth) pair that bridges the pod namespace to the host, and then programs the dataplane. The branch at the bottom is the whole story of this comparison: the same CNI contract terminates in three radically different packet-forwarding strategies.
The CNI contract is small; the dataplane is everything
It is tempting to compare these tools on features, but the durable distinction is the dataplane: the mechanism that forwards a packet from one pod to another. Flannel’s default VXLAN backend builds an overlay, encapsulating pod packets inside VXLAN (Virtual Extensible LAN) frames and tunnelling them between nodes over UDP port 8472. Calico’s standard mode uses native Layer 3 routing with iptables and conntrack for policy and connection tracking, optionally distributing routes with BGP (Border Gateway Protocol). Cilium uses eBPF programs attached to kernel hooks to forward, load-balance, and filter packets without iptables at all. Every performance, policy, and observability difference downstream flows from this one choice.
Why eBPF changed the calculus
The shift to eBPF is the defining trend of the last several years. Traditional kube-proxy programs iptables rules whose evaluation cost grows roughly linearly with the number of services, so large clusters spend measurable CPU walking rule chains for every new connection. An eBPF dataplane replaces those chains with hash-table lookups in kernel maps, giving near-constant-time service resolution regardless of cluster size. Both Cilium and Calico can run this way and can fully replace kube-proxy. This is the single biggest reason teams migrate off Flannel and iptables-mode Calico as their clusters and service counts grow.
Mechanically, eBPF works by letting userspace load small, verified programs that attach to kernel hooks — the traffic-control (tc) ingress and egress points, the socket layer, and XDP (eXpress Data Path) at the driver. The kernel’s verifier statically proves each program terminates and touches only permitted memory before it runs, so the safety story is stronger than a loadable kernel module. Cilium compiles its forwarding, load-balancing, and policy logic into these programs and stores state — service backends, connection tracking, identity mappings — in eBPF maps that both kernel and userspace can read. Because a service lookup is a single hash-map read rather than a linear chain walk, the cost of resolving a ClusterIP is effectively independent of how many Services exist. On a cluster with thousands of Services this is the difference between kube-proxy consuming several percent of every node’s CPU on rule synchronisation and that cost largely vanishing. NFTables, which Calico now supports, narrows the gap somewhat by giving iptables-style policy a more efficient backend, but it is still a rule-evaluation model rather than a compiled-program model.
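The map-based lookup can be modelled with plain dictionaries standing in for eBPF maps. This is a conceptual sketch only: the VIPs, backends, and the hash-based backend choice are illustrative assumptions, and real dataplanes use their own map layouts and selection algorithms. The point is the cost shape: one lookup in the connection-tracking map for an existing flow, one lookup in the service map for a new one, however many Services exist.

```python
import hashlib

# Dicts standing in for eBPF maps; VIPs and backends are illustrative.
SERVICE_MAP = {
    ("10.96.0.10", 53, "UDP"): ["10.244.1.5", "10.244.2.7"],
    ("10.96.12.4", 80, "TCP"): ["10.244.1.9", "10.244.3.2", "10.244.3.8"],
}
CONNTRACK = {}  # 5-tuple -> backend chosen when the flow was first seen


def pick_backend(src_ip, src_port, dst_ip, dst_port, proto):
    flow = (src_ip, src_port, dst_ip, dst_port, proto)
    if flow in CONNTRACK:  # existing flow: keep the backend it already has
        return CONNTRACK[flow]
    backends = SERVICE_MAP.get((dst_ip, dst_port, proto))  # one hash lookup
    if not backends:
        return None  # not a Service VIP: forward unchanged
    # Deterministic per-flow choice (an assumption; real dataplanes differ).
    digest = hashlib.sha256(repr(flow).encode()).digest()
    backend = backends[digest[0] % len(backends)]
    CONNTRACK[flow] = backend
    return backend


print(pick_backend("10.244.5.5", 40000, "10.96.12.4", 80, "TCP"))
```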
It is worth being concrete about why the iptables model degrades, because “iptables is slow” is a lazy summary. Classic kube-proxy in iptables mode expresses every Service and every endpoint as rules in sequential chains, and a new connection traverses those chains until it matches. As Services and endpoints multiply, two costs grow: the per-connection traversal lengthens, and — more painfully — every change to the Service set forces kube-proxy to recompute and reinstall large swaths of the ruleset, an O(rules) operation that can take seconds on a busy cluster and briefly spike CPU cluster-wide. The IPVS mode that kube-proxy added years ago improved the data-path lookup by using hash tables, but it still lived inside the kube-proxy reconciliation model. The eBPF approach removes both costs at once: lookups are hash-map reads, and updates touch only the changed map entries rather than rebuilding a chain. That update cost, not just the per-packet cost, is a major reason large clusters feel the difference.
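A rough cost model puts the two costs side by side. This is a simplified sketch of the classic full-resync behaviour described above, not a measurement, and the Service names and endpoint counts are made up. It counts rules walked and rules rewritten for the iptables model against lookups and entries rewritten for a map-based model, treating a changed Service as one entry plus one entry per backend.

```python
def iptables_costs(services, target):
    """Classic full-resync model: (rules walked by a new connection, rules rewritten per change)."""
    chain = sorted(services)  # one jump rule per Service, matched in order
    walked = chain.index(target) + 1  # walk until the target Service's rule matches
    rewritten = len(chain) + sum(len(eps) for eps in services.values())  # whole ruleset
    return walked, rewritten


def ebpf_costs(services, target):
    """Map model: (hash lookups by a new connection, map entries rewritten per change)."""
    backends = services[target]  # a single hash-map read
    return 1, 1 + len(backends)  # only the changed Service and its backends


services = {f"svc-{i:04d}": ["ep-a", "ep-b", "ep-c"] for i in range(5000)}
print(iptables_costs(services, "svc-4999"), ebpf_costs(services, "svc-4999"))
# (5000, 20000) (1, 4)
```

Doubling the Service count doubles both iptables numbers in this model while the map-based numbers stay flat, which is the scaling argument in miniature.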
Maturity and governance matter for a network you cannot easily replace
A CNI is hard to swap once it carries production traffic, so project health is a first-class selection criterion. Cilium reached CNCF graduated status, signalling broad adoption and governance maturity, and is now the dataplane behind multiple managed Kubernetes offerings. Calico has a decade of production history and a commercial backer in Tigera, with steady releases — v3.31 added NFTables support alongside the eBPF and iptables dataplanes. Flannel is stable and widely used but deliberately narrow in scope; it has no policy engine of its own and depends on a companion like Calico for policy.
There is a fourth category worth naming so you do not over-think the choice: the cloud-provider CNIs. Amazon’s VPC CNI assigns pods real VPC IP addresses from elastic network interfaces, Azure has its own CNI variants, and GKE wraps Cilium in Dataplane V2. These integrate tightly with the cloud’s own network and security groups but trade away portability and, historically, some policy and observability depth — which is exactly why Cilium has been adopted underneath several of them. The practical reading is that on managed clusters you are often already running an eBPF dataplane whether you chose it or not, and the live decision is how much of its policy and observability surface you turn on.
Deeper analysis: dataplanes, policy, and observability
This is where the three diverge sharply. The dataplane determines raw forwarding behaviour, the policy engine determines what you can express and enforce, and the observability layer determines whether you can see what the network is doing. Figure 2 shows the forwarding paths side by side.

Figure 2: Three packet-forwarding paths — Flannel encapsulation, Calico iptables, and the Cilium eBPF dataplane.
In the Flannel path a packet leaving a pod is encapsulated into a VXLAN frame and sent as UDP to the destination node, which decapsulates it and delivers it locally. The encapsulation adds header overhead and a per-packet processing cost, and because Flannel has no policy engine, nothing inspects the packet for allow/deny decisions. In Calico’s iptables mode the packet traverses conntrack and iptables chains where policy verdicts are applied before the routing decision sends it toward the destination node, typically over native routing rather than an overlay. In the Cilium eBPF dataplane the packet hits eBPF programs at the traffic-control and socket layers; service load balancing, policy enforcement, and routing all happen in those programs, and kube-proxy is entirely absent. Fewer hops, no iptables traversal, and constant-time lookups are the payoff.
Network policy: from L3/L4 to Cilium’s L7
The three diverge just as sharply on policy. Flannel enforces nothing; you must layer Calico policy on top (the “Canal” pattern) to get any filtering. Calico implements the standard Kubernetes NetworkPolicy plus its own richer GlobalNetworkPolicy and NetworkPolicy custom resources, supporting ordered policy tiers, explicit deny rules, CIDR and namespace selectors, and DNS-based egress policy. That declarative model is the most expressive Layer 3/Layer 4 policy system among the three.
Cilium matches the Layer 3/Layer 4 model and goes a layer higher: its CiliumNetworkPolicy can express Layer 7 rules. Because the eBPF dataplane understands application protocols, you can write a policy that allows HTTP GET on /public but denies POST to /admin, or restricts Kafka topics, or limits gRPC methods — enforcement that an iptables dataplane fundamentally cannot do because it never parses the application protocol. This identity-aware, protocol-aware enforcement is Cilium’s signature capability and the main reason security-conscious teams choose it.
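The GET-on-/public rule is written below as a Python dict mirroring the structure of a CiliumNetworkPolicy manifest; the labels and port are illustrative. Serialised with json.dumps, a manifest like this can be applied with kubectl, which accepts JSON as well as YAML. The http_allowed() evaluator is a toy that models only the allow-list semantics (a request is admitted if some HTTP rule matches, and everything else, such as POST /admin, is denied); it assumes full-match regular expressions and ignores source identity and port.

```python
import json
import re

# Shape of a CiliumNetworkPolicy with an L7 HTTP rule; labels and port are illustrative.
policy = {
    "apiVersion": "cilium.io/v2",
    "kind": "CiliumNetworkPolicy",
    "metadata": {"name": "allow-public-get"},
    "spec": {
        "endpointSelector": {"matchLabels": {"app": "service"}},
        "ingress": [{
            "fromEndpoints": [{"matchLabels": {"env": "prod"}}],
            "toPorts": [{
                "ports": [{"port": "80", "protocol": "TCP"}],
                "rules": {"http": [{"method": "GET", "path": "/public"}]},
            }],
        }],
    },
}


def http_allowed(policy, method, path):
    """Toy allow-list check: admitted only if some HTTP rule matches, otherwise denied."""
    for ingress in policy["spec"].get("ingress", []):
        for to_port in ingress.get("toPorts", []):
            for rule in to_port.get("rules", {}).get("http", []):
                # Assumption: an omitted field matches anything; patterns must match in full.
                if re.fullmatch(rule.get("method", ".*"), method) and re.fullmatch(
                    rule.get("path", ".*"), path
                ):
                    return True
    return False


print(http_allowed(policy, "GET", "/public"))  # True
print(http_allowed(policy, "POST", "/admin"))  # False
print(json.dumps(policy, indent=2))  # manifest text for kubectl
```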
A subtle but important detail is how Cilium decides who a packet is from. Rather than keying policy on IP addresses, which churn constantly as pods come and go, Cilium assigns each distinct set of pod labels a numeric security identity and associates every packet with one, carrying it in the encapsulation header when tunnelling or resolving it from the source address otherwise. Policy is then evaluated against identities, so a rule like “frontend may talk to backend” stays valid no matter how many times either pod is rescheduled or re-addressed. The same identity model makes L7 enforcement tractable: when a policy needs to parse HTTP, Cilium transparently redirects the connection to a node-local Envoy proxy that applies the L7 rules, while traffic that needs no L7 inspection stays on the eBPF fast path. Calico’s model, by contrast, is firmly L3/L4 but compensates with operational richness: ordered policy tiers let platform teams set guardrails that application teams cannot override, and DNS-based egress rules let you allow traffic to api.stripe.com by name rather than by a brittle list of IPs. Neither model is strictly superior: Cilium reaches higher up the stack, while Calico expresses richer governance at L3/L4.
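The identity idea fits in a few lines. In this toy model (not Cilium’s allocator, and with a hypothetical starting number), each distinct label set gets a number, policy is stored as pairs of identities, and the dataplane needs only an IP-to-identity cache that is updated as pods start and stop. Rescheduling the frontend onto a new address leaves the policy untouched.

```python
class IdentityAllocator:
    """Toy model: one numeric identity per distinct label set (not Cilium's allocator)."""

    def __init__(self, first_id=1000):  # hypothetical starting number
        self.next_id = first_id
        self.by_labels = {}

    def identity_for(self, labels):
        key = frozenset(labels.items())  # label order must not matter
        if key not in self.by_labels:
            self.by_labels[key] = self.next_id
            self.next_id += 1
        return self.by_labels[key]


ids = IdentityAllocator()
ip_to_identity = {}  # stand-in for the IP -> identity cache the dataplane consults

FRONTEND, BACKEND = {"app": "frontend"}, {"app": "backend"}
# "frontend may talk to backend", written once in terms of identities, never IPs.
allowed = {(ids.identity_for(FRONTEND), ids.identity_for(BACKEND))}


def pod_started(ip, labels):
    ip_to_identity[ip] = ids.identity_for(labels)


def pod_stopped(ip):
    ip_to_identity.pop(ip, None)


def permitted(src_ip, dst_ip):
    return (ip_to_identity.get(src_ip), ip_to_identity.get(dst_ip)) in allowed


pod_started("10.244.1.5", FRONTEND)
pod_started("10.244.2.9", BACKEND)
pod_stopped("10.244.1.5")  # the frontend is rescheduled...
pod_started("10.244.3.7", FRONTEND)  # ...with a new IP but the same identity
print(permitted("10.244.3.7", "10.244.2.9"))  # True, with no policy change
```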
A practical warning that applies to every policy engine here: Kubernetes network policy is default-allow until a pod is selected by at least one policy, at which point it becomes default-deny for the direction that policy covers. This trips up teams constantly. Applying a single ingress policy to a pod silently drops all other ingress to that pod, which is intended but surprising; conversely, a pod that no policy selects accepts everything, so a cluster can look “secured” while large swaths of it are wide open. The correct posture in a regulated environment is to apply an explicit default-deny baseline per namespace and then open specific flows, rather than relying on per-app policies whose coverage is hard to audit. This is policy semantics, not a CNI feature, but it is where most real-world policy incidents originate.
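A namespace-wide baseline is a short manifest. The generator below emits the standard default-deny NetworkPolicy (an empty podSelector selects every pod, and listing both policyTypes with no rules denies both directions) wrapped in a v1 List; the namespace names are hypothetical. Piping the output to `kubectl apply -f -` would install it.

```python
import json


def default_deny(namespace):
    """Deny-all baseline for one namespace; specific flows are opened by further policies."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector = every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules follow, so both directions deny
        },
    }


manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [default_deny(ns) for ns in ("payments", "ledger")],  # hypothetical namespaces
}
print(json.dumps(manifest, indent=2))
```

Denying egress also blocks DNS lookups, so the first flow most teams open after applying a baseline like this is egress to the cluster DNS service.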
Observability: Hubble turns the dataplane into a flow graph

Figure 3: Cilium evaluating an L7 HTTP policy and emitting the verdict to Hubble for the service map.
Figure 3 shows the observability dividend of an eBPF dataplane. Because every packet already passes through Cilium’s eBPF programs, Hubble — Cilium’s observability layer — can record a structured flow for each connection: source and destination identity (Kubernetes labels, not just IPs), protocol, the policy verdict (forwarded or dropped), HTTP status codes, and DNS queries. The result is a live service map and per-flow metrics with no sidecars and no application instrumentation. Calico offers flow logs and integrates with Prometheus, but the label-aware, L7-aware flow visibility that Hubble provides is a class apart. Flannel provides essentially no observability of its own. If you care about understanding traffic, this is a decisive axis; for a broader treatment see our piece on continuous profiling with eBPF.
The operational payoff of Hubble shows up most clearly when debugging a dropped connection — historically one of the most painful tasks in Kubernetes networking. Without flow-level observability, a connection that a policy silently denies looks identical to a connection lost to a misconfigured route or a crashed pod: the client just sees a timeout. Hubble records the drop with its verdict and the policy that caused it, attributing the denial to a named policy and a source-and-destination identity, which turns a multi-hour packet-capture exercise into a single query. This is why teams that adopt Cilium for performance often end up valuing the observability more — it changes network debugging from inference to direct observation, and that change compounds over every incident.
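A service map and a drop query are both aggregations over flow records. The sketch below uses hypothetical, heavily simplified records carrying only source, destination, verdict, and reason; real Hubble flows carry many more fields and are queried through the Hubble CLI or API rather than a Python list. FORWARDED and DROPPED mirror Hubble’s verdict names.

```python
from collections import Counter

# Hypothetical, heavily simplified flow records.
flows = [
    {"src": "frontend", "dst": "backend", "verdict": "FORWARDED"},
    {"src": "frontend", "dst": "backend", "verdict": "FORWARDED"},
    {"src": "batch-job", "dst": "backend", "verdict": "DROPPED", "reason": "policy denied"},
    {"src": "frontend", "dst": "payments", "verdict": "DROPPED", "reason": "policy denied"},
]


def service_map(records):
    """Collapse individual flows into edges: (src, dst, verdict) -> count."""
    return Counter((r["src"], r["dst"], r["verdict"]) for r in records)


def drops_to(records, dst):
    """The 'why does my client time out?' query: who was dropped on the way to dst, and why."""
    return [(r["src"], r.get("reason")) for r in records
            if r["dst"] == dst and r["verdict"] == "DROPPED"]


for (src, dst, verdict), count in sorted(service_map(flows).items()):
    print(f"{src} -> {dst}: {verdict} x{count}")
print(drops_to(flows, "backend"))  # [('batch-job', 'policy denied')]
```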
Routing modes: overlay versus native
A dimension that quietly dominates real-world performance is whether the CNI tunnels traffic or routes it natively. Overlay (encapsulation) modes — Flannel’s VXLAN, or Cilium and Calico when configured with tunnelling — wrap each pod packet inside an outer packet so it can traverse a node network that knows nothing about pod IPs. This is portable and works on almost any underlay, which is why it is the safe default, but it costs header bytes and per-packet encapsulation work.
Native routing skips the tunnel: the underlying network is made aware of pod IP ranges, usually by Calico or Cilium advertising routes over BGP to the physical fabric or by the cloud’s route tables carrying pod CIDRs directly. Packets then travel as ordinary routed IP with no encapsulation, which is faster and easier to debug with standard tools. The trade-off is that native routing requires cooperation from the underlay — your top-of-rack switches must accept BGP peering, or your cloud must allow pod CIDRs in its route tables. On-prem teams with control of the fabric usually prefer native routing for the performance; teams on locked-down cloud networks often have no choice but to encapsulate. This is frequently a bigger lever on throughput than the Flannel-versus-Cilium choice itself, and it is independent of policy and observability — you can run native routing with full Cilium policy, or overlay with none.
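Native routing is ordinary longest-prefix-match forwarding once the routes exist. The table below is a hypothetical example of what a BGP speaker or cloud route table might install (per-node pod CIDRs pointing at node addresses, plus an aggregate); the lookup is the same decision the kernel or a switch makes for plain routed IP.

```python
import ipaddress

# Hypothetical routes: per-node pod CIDRs learned over BGP, plus an aggregate via a gateway.
ROUTES = {
    ipaddress.ip_network("10.244.1.0/24"): "192.168.10.11",
    ipaddress.ip_network("10.244.2.0/24"): "192.168.10.12",
    ipaddress.ip_network("10.244.0.0/16"): "192.168.10.1",
}


def next_hop(dst):
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    candidates = [net for net in ROUTES if addr in net]
    if not candidates:
        return None  # no route in this table (a real host might use a default route)
    return ROUTES[max(candidates, key=lambda net: net.prefixlen)]


print(next_hop("10.244.2.40"))  # 192.168.10.12, the node hosting that pod
```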
There is a newer middle path worth knowing: direct-routing and “encapsulation only when needed” modes that route natively within a subnet or availability zone and fall back to encapsulation only across boundaries the underlay cannot route. Cilium’s options here let a cluster keep the performance of native routing for the common intra-zone case while preserving the portability of an overlay for the cross-boundary minority of traffic. The practical lesson is that overlay-versus-native is not strictly binary; the most performant production configurations often route natively where they can and encapsulate only where they must, and getting this right requires knowing your underlay’s actual routing capabilities rather than defaulting to a full overlay out of caution.
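The encapsulate-only-across-boundaries decision reduces to a subnet membership test. Calico exposes this choice as its CrossSubnet mode for IP-in-IP and VXLAN; the function below is a simplified sketch of the logic, assuming you know which underlay subnets route natively, and the addresses are illustrative.

```python
import ipaddress

# Illustrative underlay subnets that route natively (e.g. one per availability zone).
NODE_SUBNETS = [ipaddress.ip_network(s) for s in ("10.0.1.0/24", "10.0.2.0/24")]


def forwarding_mode(src_node, dst_node):
    """Route natively inside a subnet; encapsulate when the path crosses a boundary."""
    src, dst = ipaddress.ip_address(src_node), ipaddress.ip_address(dst_node)
    for subnet in NODE_SUBNETS:
        if src in subnet:
            return "native" if dst in subnet else "encapsulate"
    return "encapsulate"  # unknown source subnet: fall back to the portable option


print(forwarding_mode("10.0.1.5", "10.0.1.9"))  # native
print(forwarding_mode("10.0.1.5", "10.0.2.9"))  # encapsulate
```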
Encryption on the wire
Transparent in-cluster encryption is increasingly a compliance requirement. Both Cilium and Calico support WireGuard, the modern, fast, kernel-integrated VPN protocol, to encrypt node-to-node pod traffic with minimal configuration. Cilium also supports IPsec for environments that mandate it. Flannel has no native pod-traffic encryption; you would tunnel it externally. WireGuard’s lower overhead and simpler key management make it the default choice in both Cilium and Calico, though you must size MTU carefully — encapsulation overhead means the pod MTU has to drop below the node MTU or you will see fragmentation and throughput loss.
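MTU sizing is simple arithmetic once you know each layer’s overhead. The sketch below uses the IPv4 per-packet overheads listed in Calico’s MTU guidance (20 bytes for IP-in-IP, 50 for VXLAN, 60 for WireGuard); IPv6 overheads are larger, and stacking layers by simple addition is an approximation you should confirm against your CNI’s documentation.

```python
# IPv4 per-packet overheads in bytes, as listed in Calico's MTU guidance.
OVERHEAD = {"ipip": 20, "vxlan": 50, "wireguard": 60}


def pod_mtu(node_mtu, *layers):
    """Pod MTU = node MTU minus the overhead of each encapsulation layer on the path."""
    return node_mtu - sum(OVERHEAD[layer] for layer in layers)


print(pod_mtu(1500, "vxlan"))  # 1450
print(pod_mtu(9001, "wireguard"))  # 8941, e.g. an AWS VPC with jumbo frames
```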
The choice between WireGuard and IPsec is not purely a performance question; it is often a compliance one. WireGuard uses a fixed, modern cryptographic suite and is dramatically simpler to operate, which is why it has become the default. IPsec, by contrast, supports the specific cipher suites and key-management regimes that some regulated environments, particularly in government and finance, mandate by policy, and it has been established in those compliance frameworks far longer than WireGuard. So a team may run the slower, more complex IPsec path not because it is technically preferable but because an auditor requires a FIPS-validated cipher that WireGuard’s fixed suite does not offer. Knowing which constraint you are actually under, performance or compliance, tells you which to pick before you measure anything.
Performance: where the differences actually show up
Raw throughput between modern CNIs on a fast network is closer than vendor marketing suggests; the meaningful gaps appear in CPU overhead and tail latency under high connection churn and high Service counts. The table below is an illustrative sketch of the directional behaviour practitioners report. It is not a benchmark of your hardware, and you should run your own benchmarks on representative nodes before committing, covering connection rate and Service count rather than only bulk throughput. Treat the entries as relative shapes, not absolutes.
| Scenario | Flannel VXLAN | Calico iptables | eBPF dataplane |
|---|---|---|---|
| Pod-to-pod throughput (same network) | Lower (overlay tax) | High (native routing) | High (native routing) |
| Service resolution cost at 5,000 Services | Grows with rule count (relies on kube-proxy) | Grows with rule count | Near-constant |
| Per-node CPU for networking under churn | Moderate | Higher | Lowest |
| Tail latency for new connections | Moderate | Higher under load | Lowest |
The methodology caveat matters more than the table. Synthetic iperf runs between two idle pods on a 25/100 GbE network often show all three within a few percent of each other, which misleads teams into thinking the choice is irrelevant. The real cost surfaces under production conditions: thousands of Services, high pod churn from autoscaling, and short-lived connections. That is precisely the regime where iptables rule synchronisation and chain traversal become visible in node CPU graphs and where eBPF’s constant-time maps pull ahead. So benchmark the workload you actually run, not a two-pod best case.
There is one more performance dimension that single-flow benchmarks miss entirely: connection-establishment rate. Many real workloads — service meshes, serverless-style short requests, scrapers, health checkers — open and close enormous numbers of short-lived connections, and the cost there is dominated by per-connection setup work: conntrack insertion, policy evaluation, and, in iptables mode, chain traversal on the SYN. An eBPF dataplane that resolves a service and inserts connection state with hash-map operations handles a high connection-establishment rate far more gracefully than a model that walks rule chains per new flow. If your workload is “few long-lived high-bandwidth connections,” CNI choice barely matters; if it is “millions of short-lived connections,” the dataplane is one of the largest levers you have. Characterising your connection lifetime distribution is therefore a prerequisite to any meaningful benchmark.
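Characterising connection lifetimes does not need special tooling to start. Given (start, end) timestamps in milliseconds for connections observed over a window, from whatever source you have (flow logs, proxy access logs), the sketch below computes the new-connection rate, the fraction of short-lived connections, and the median duration. The one-second threshold for “short-lived” is an assumption; pick one that fits your workload.

```python
import statistics


def connection_profile(conns, window_s, short_ms=1000):
    """conns: (start_ms, end_ms) pairs for connections opened during a window of window_s seconds."""
    durations = [end - start for start, end in conns]
    return {
        "new_conns_per_s": len(conns) / window_s,
        "short_lived_fraction": sum(d < short_ms for d in durations) / len(durations),
        "median_ms": statistics.median(durations),
    }


sample = [(0, 50), (100, 200), (1000, 1300), (2000, 62000)]  # illustrative data
print(connection_profile(sample, window_s=10))
# {'new_conns_per_s': 0.4, 'short_lived_fraction': 0.75, 'median_ms': 200.0}
```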
Decision matrix
The matrix below scores Flannel, Calico in both its iptables and eBPF modes, Cilium, and Multus across the dimensions that drive most real selections. “L7” means application-layer (HTTP, gRPC, Kafka) awareness. Ratings are a practitioner’s qualitative read of 2026 capabilities, not a benchmark.
| Dimension | Flannel | Calico (iptables) | Calico (eBPF) | Cilium (eBPF) | Multus |
|---|---|---|---|---|---|
| Dataplane | VXLAN overlay | iptables + routing | eBPF | eBPF | meta (delegates) |
| Network policy (L3/L4) | None | Excellent | Excellent | Excellent | Inherits primary |
| L7 policy | No | No | No | Yes | N/A |
| Kube-proxy replacement | No | No | Yes | Yes | N/A |
| Encryption | None native | WireGuard | WireGuard | WireGuard / IPsec | Inherits |
| Observability | Minimal | Flow logs | Flow logs | Hubble (L7) | N/A |
| Multiple pod interfaces | No | No | No | No | Yes |
| Operational complexity | Lowest | Moderate | Moderate–high | Moderate–high | Adds a layer |
| Best fit | Dev / simple clusters | Policy-heavy on-prem | High-scale policy | Security + observability | Telco / NFV |
Multus: when one interface is not enough

Figure 4: Multus attaching a primary cluster interface plus SR-IOV and Macvlan secondary interfaces to a pod.
Multus is not a competitor to the other three — it is a meta-plugin that sits in front of them. Standard Kubernetes gives every pod exactly one network interface. Telco, network function virtualization (NFV), and high-performance workloads frequently need more: a control-plane interface on the cluster network plus dedicated high-throughput data-plane interfaces. Multus solves this by delegating: the primary interface (eth0) is set up by a normal CNI such as Calico, Cilium, or Flannel, and Multus attaches additional interfaces (net1, net2) using other plugins — SR-IOV (Single Root I/O Virtualization) for line-rate hardware-backed networking, or Macvlan for a separate Layer 2 segment. This is why Multus is effectively mandatory in 5G core and NFV deployments, where signalling, management, and user-plane traffic must ride separate interfaces with distinct QoS.
Operationally, Multus is driven by a NetworkAttachmentDefinition custom resource that names a secondary CNI configuration, and pods request extra interfaces through a k8s.v1.cni.cncf.io/networks annotation listing the attachments they want. Flannel is a popular primary under Multus precisely because it is simple and gets out of the way, leaving the specialised plugins to handle the data plane. The pattern shows up far beyond telco: machine-learning clusters use it to give pods a dedicated RDMA (Remote Direct Memory Access) interface for GPU-to-GPU traffic while keeping ordinary control traffic on the cluster network. The cost is that you now operate two or more CNIs at once, and IP address management, MTU, and policy must be reasoned about per interface rather than per pod.
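The two objects look like this as Python dicts. The macvlan attachment follows the general shape of Multus’s quick-start example, but the host interface (eth1), subnet, and names are assumptions about your environment. spec.config holds the secondary CNI configuration as a JSON string, and the pod opts in by naming the attachment in the annotation; Multus then adds it as net1 alongside the primary eth0.

```python
import json

# Illustrative secondary network; "eth1" and the subnet are assumptions about your hosts.
macvlan_config = {
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth1",
    "mode": "bridge",
    "ipam": {"type": "host-local", "subnet": "192.168.50.0/24"},
}

attachment = {
    "apiVersion": "k8s.cni.cncf.io/v1",
    "kind": "NetworkAttachmentDefinition",
    "metadata": {"name": "macvlan-data"},
    "spec": {"config": json.dumps(macvlan_config)},  # the CNI config travels as a string
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "dataplane-worker",
        # Requests the extra interface; Multus attaches it as net1 next to eth0.
        "annotations": {"k8s.v1.cni.cncf.io/networks": "macvlan-data"},
    },
    "spec": {"containers": [{"name": "app", "image": "busybox", "command": ["sleep", "3600"]}]},
}

print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [attachment, pod]}, indent=2))
```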
A frequently missed consequence of Multus is that network policy generally applies only to the primary interface. The standard policy engines understand the cluster network the primary CNI manages; the secondary SR-IOV or Macvlan interfaces typically sit outside that policy enforcement entirely, because they bypass the primary dataplane by design — that is the whole point of attaching them. This means a pod with a high-throughput secondary interface may have a completely unpoliced second path to the network, which is acceptable in a tightly controlled telco data plane but is a serious gap if someone attaches a secondary interface casually. The rule of thumb is to treat every secondary interface as an explicit, audited exception to your policy posture, not as just “another NIC,” because your policy engine almost certainly is not watching it.
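Auditing for such exceptions can start with the annotation itself. The sketch below takes pod objects, for example the items from `kubectl get pods -A -o json`, and lists every pod that requests secondary networks. It handles both annotation forms (comma-separated names, or a JSON list of objects with a name field), but it does not resolve namespaced references or check what the attachments actually are.

```python
import json

ANNOTATION = "k8s.v1.cni.cncf.io/networks"


def requested_networks(value):
    """Parse the annotation: 'a,b' shorthand or a JSON list of objects with a "name" key."""
    value = value.strip()
    if not value:
        return []
    if value.startswith("["):
        return [entry["name"] for entry in json.loads(value)]
    return [name.strip() for name in value.split(",") if name.strip()]


def policy_exceptions(pods):
    """(namespace, pod, networks) for every pod with a secondary interface to review."""
    findings = []
    for pod in pods:
        meta = pod.get("metadata", {})
        networks = requested_networks((meta.get("annotations") or {}).get(ANNOTATION, ""))
        if networks:
            findings.append((meta.get("namespace", "default"), meta.get("name"), networks))
    return findings
```

Fed with `json.load(f)["items"]` from a file holding that kubectl output, the result is a reviewable list rather than an assumption that the policy engine covers every path.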
Trade-offs, Gotchas, and What Goes Wrong
The biggest mistake teams make is choosing on raw benchmark numbers. Throughput differences between a well-tuned overlay and an eBPF dataplane are often dwarfed by application behaviour, and the cost that actually bites in production is CPU spent on connection tracking and iptables rule evaluation as service counts climb — which is exactly where eBPF wins, not in single-flow line-rate tests.
Overlay overhead is real but situational. Flannel’s VXLAN encapsulation adds latency and CPU, and the reduced effective MTU silently degrades large transfers if you forget to lower the pod MTU. On a flat datacentre network, native routing (Calico with BGP, or Cilium in routing mode) avoids the tunnel entirely and outperforms an overlay; in a cloud with restrictive networking you may be forced into encapsulation regardless of CNI.
eBPF is not free either. It demands a reasonably recent kernel, and debugging eBPF-mode forwarding requires a different mental model and tooling than iptables -L. Calico’s eBPF mode and Cilium both move logic out of familiar iptables, so on-call engineers must learn cilium monitor, Hubble, and bpftool. Migrating an existing cluster from kube-proxy to an eBPF replacement is a dataplane change that needs careful rollout and rollback planning. There is also a subtler operational tax: when something goes wrong at 3 a.m., a tenured engineer can reason about iptables chains from memory, whereas eBPF program state lives in kernel maps you must know how to dump. That gap is closing as tooling matures, but it is real, and it is why some teams keep Calico in iptables mode despite the performance on the table — they are buying debuggability, and that is a legitimate trade.
WireGuard MTU misconfiguration is a recurring cause of incidents. In encapsulating environments (for example WireGuard over a cloud overlay), the per-packet overheads stack; vendor guidance such as Calico’s recommended 8,941-byte WireGuard MTU for AWS VPCs with 9,001-byte jumbo frames exists precisely because operators get this wrong and see mysterious throughput cliffs. Finally, Multus multiplies the failure surface: a misconfigured NetworkAttachmentDefinition or an exhausted SR-IOV virtual function pool can leave pods stuck in ContainerCreating with errors that are far less obvious than a single-CNI failure.
Two more traps deserve naming. The first is policy drift on Flannel: because Flannel enforces nothing, a cluster that “has network policy” because someone applied NetworkPolicy objects may in fact be enforcing none of them if no policy-capable CNI is installed — the objects are accepted by the API and silently ignored. Teams discover this during a security audit, not before. The second is the migration cliff between dataplanes. Switching a live cluster from kube-proxy to an eBPF kube-proxy replacement, or from Calico iptables mode to eBPF mode, is not a config toggle you flip casually: existing connections, NodePort behaviour, and source-IP preservation can all shift, so it needs a staged rollout, a tested rollback, and a maintenance window. Underestimating this is how a “simple CNI upgrade” turns into an incident.
A third trap is IP address exhaustion, which is invisible until it is catastrophic. Every CNI allocates pod IPs from a CIDR, and the way it carves that CIDR into per-node blocks determines how many pods a cluster can ever hold. A cluster sized with a /24 per node and a modest cluster CIDR can hit a wall where new pods cannot schedule because no addresses remain, even though CPU and memory are plentiful; on cloud CNIs that assign real VPC IPs, the limit can be the VPC subnet or the per-node ENI capacity rather than the cluster CIDR at all. This is a planning decision made at cluster creation that is painful to change later, so make it early and generously: estimate peak pod density per node, multiply by node count with headroom, and verify the CIDR and per-node block size support it before the cluster carries traffic.
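A back-of-envelope check fits in a function. This sketch assumes the common model of one fixed-size pod block per node carved from the cluster CIDR (as with per-node podCIDRs). Calico’s IPAM works differently, handing out smaller blocks (/26 by default) of which a node can hold several, and VPC-native CNIs are bounded by subnets and ENIs, so adapt it accordingly. The numbers in the example are illustrative.

```python
import ipaddress
import math


def cidr_capacity(cluster_cidr, node_mask, peak_pods_per_node, nodes, headroom_pct=30):
    """Check a cluster CIDR carved into one /node_mask block per node."""
    cluster = ipaddress.ip_network(cluster_cidr)
    max_nodes = 2 ** (node_mask - cluster.prefixlen)  # blocks that fit in the cluster CIDR
    per_node = 2 ** (cluster.max_prefixlen - node_mask)  # raw addresses per block (some are reserved)
    needed_nodes = math.ceil(nodes * (100 + headroom_pct) / 100)
    needed_per_node = math.ceil(peak_pods_per_node * (100 + headroom_pct) / 100)
    return {
        "max_nodes": max_nodes,
        "addresses_per_node": per_node,
        "nodes_ok": needed_nodes <= max_nodes,
        "density_ok": needed_per_node <= per_node,
    }


print(cidr_capacity("10.244.0.0/16", 24, peak_pods_per_node=110, nodes=200))
# {'max_nodes': 256, 'addresses_per_node': 256, 'nodes_ok': False, 'density_ok': True}
```

Here 200 nodes plus 30% headroom needs 260 blocks but a /16 split into /24s provides only 256, which is exactly the kind of wall that is cheap to find before launch and expensive after.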
Practical Recommendations
Start from the workload, not the brand. If you are running a development cluster, an edge node, or anything where you genuinely do not need policy, Flannel’s simplicity is a feature — fewer components means fewer ways to fail. The moment you need to enforce network segmentation, you need a real policy engine, which means Calico or Cilium (or Flannel plus Calico policy).
Between Calico and Cilium, let your priorities decide. Choose Cilium when you want L7 policy, the best-in-class observability of Hubble, kube-proxy replacement, and you are comfortable operating eBPF — it is the strongest general-purpose default in 2026 and the path of least resistance on managed clusters that already default to it. Choose Calico when you want the most expressive declarative policy model, value the option to run iptables, eBPF, or NFTables dataplanes, or run on-prem with BGP integration into your physical fabric. Add Multus only when a pod provably needs more than one interface; do not adopt it speculatively. For clusters where networking cost is a line item, pair your CNI choice with the practices in our guide to Kubernetes cost optimization.
Whatever you pick, treat the dataplane as a deliberate, tested choice rather than a default you inherited. If you are on a managed cluster, find out which dataplane it actually ships and whether you are permitted to bring your own — that single fact determines what policy and observability you can ever turn on. If you self-manage, prototype the chosen CNI in its production dataplane mode (eBPF if that is the plan) on a staging cluster that mirrors your Service count and pod churn, not a toy cluster, because the regime where the differences appear is precisely the busy one. And budget for the operational learning curve: an eBPF dataplane pays back in performance and observability but asks your on-call engineers to learn new tooling. The worst outcome is choosing a powerful CNI, never enabling its policy or encryption, and carrying the complexity for none of the benefit.
Selection checklist:
- Do you need network policy at all? No → Flannel. Yes → continue.
- Do you need L7 (HTTP/gRPC/Kafka) policy or rich flow observability? Yes → Cilium.
- Do you need maximum declarative policy flexibility, BGP, or an iptables option? → Calico.
- Will you replace kube-proxy? → Cilium or Calico in eBPF mode.
- Do any pods need multiple interfaces (telco/NFV/HPC)? → add Multus over your chosen primary CNI.
- Is in-cluster encryption required? → enable WireGuard on Cilium or Calico and pre-test MTU.
- Have you sized the pod CIDR and per-node IP blocks for peak pod density? → verify before launch.
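As a rough illustration, the checklist can be encoded as a small decision function, along with the arithmetic behind its last question. This is a minimal sketch, not a tool from any of the projects. The `ClusterNeeds` and `Recommendation` types and the `recommend_cni` and `pod_cidr_capacity` functions are hypothetical. The sketch reads the first question as "Flannel only when nothing further down demands a policy-capable CNI". The capacity helper also assumes every node receives exactly one fixed-size block carved from the pod CIDR; real IPAM, such as Calico's, may hand a node several smaller blocks.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ClusterNeeds:
    """Hypothetical input model: one flag per checklist question."""
    network_policy: bool
    l7_policy_or_flow_observability: bool = False
    calico_features: bool = False  # expressive declarative policy, BGP, or an iptables option
    replace_kube_proxy: bool = False
    multiple_interfaces: bool = False
    encryption: bool = False


@dataclass
class Recommendation:
    primary: str
    add_ons: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)


def recommend_cni(needs: ClusterNeeds) -> Recommendation:
    """Walk the selection checklist in order and return a primary CNI plus extras."""
    needs_capable_cni = (
        needs.network_policy
        or needs.l7_policy_or_flow_observability
        or needs.calico_features
        or needs.replace_kube_proxy
        or needs.encryption
    )
    if not needs_capable_cni:
        rec = Recommendation(primary="Flannel")
    elif needs.l7_policy_or_flow_observability:
        rec = Recommendation(primary="Cilium")
    elif needs.calico_features:
        rec = Recommendation(primary="Calico")
    else:
        # No tie-breaker left: fall back to the strongest general-purpose default.
        rec = Recommendation(primary="Cilium")

    if needs.replace_kube_proxy:
        mode = " in eBPF mode" if rec.primary == "Calico" else ""
        rec.notes.append(f"replace kube-proxy with {rec.primary}{mode}")
    if needs.encryption:
        rec.notes.append(f"enable WireGuard on {rec.primary} and pre-test MTU")
    if needs.multiple_interfaces:
        # Multus layers on top of the primary CNI; it never replaces it.
        rec.add_ons.append("Multus")
    return rec


def pod_cidr_capacity(pod_cidr: str, per_node_prefix: int) -> tuple[int, int]:
    """(max nodes, raw addresses per node), assuming one fixed-size block per node."""
    net = ipaddress.ip_network(pod_cidr)
    if not net.prefixlen <= per_node_prefix <= net.max_prefixlen:
        raise ValueError("per-node prefix must be at least as long as the pod CIDR prefix")
    max_nodes = 2 ** (per_node_prefix - net.prefixlen)
    addrs_per_node = 2 ** (net.max_prefixlen - per_node_prefix)
    return max_nodes, addrs_per_node


rec = recommend_cni(ClusterNeeds(network_policy=True, calico_features=True, replace_kube_proxy=True))
print(rec.primary, rec.notes)  # Calico ['replace kube-proxy with Calico in eBPF mode']
print(pod_cidr_capacity("10.244.0.0/16", 24))  # (256, 256)
```

Changing the per-node prefix trades node count for pod density. In this model, a /16 split into /26 blocks yields 1,024 blocks of 64 raw addresses each, before any addresses the IPAM reserves.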
Frequently Asked Questions
Is Cilium always better than Calico?
No. Cilium leads on L7 policy and Hubble observability, and it is the default on several managed platforms, which makes it the strongest general default. But Calico offers a more expressive declarative policy model, a choice of iptables, eBPF, or nftables dataplanes, and mature BGP integration for on-prem fabrics. For policy-heavy on-prem clusters, or for teams that prefer to debug with familiar iptables tooling, Calico is often the better fit. Match the tool to the workload rather than assuming one universally wins.
Can I use Flannel in production?
Yes, but only where you do not need network policy or advanced features. Flannel is stable and widely deployed for straightforward connectivity, and its simplicity reduces operational risk. The catch is that it enforces no policy and offers minimal observability, so regulated or multi-tenant environments will outgrow it quickly. A common pattern is Flannel for connectivity plus Calico for policy (Canal), though running a single policy-capable CNI is usually cleaner.
What does it mean to replace kube-proxy with eBPF?
In its default iptables mode, kube-proxy implements Kubernetes Service load balancing with iptables rules that are evaluated in order. As a result, the cost of matching a new connection grows with the number of Services, and rewriting the rule set on every change also slows down as it grows. Both Cilium and Calico can run an eBPF dataplane that performs the same Service load balancing with kernel hash-map lookups instead. This gives near-constant-time lookups regardless of cluster size and removes kube-proxy entirely. The result is lower CPU overhead and latency on large clusters, which is one of the main reasons teams adopt eBPF-based CNIs as Service counts grow into the thousands.
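The scaling difference can be shown with a toy model. A list walked top to bottom stands in for the iptables chain, and a Python dict stands in for an eBPF hash map. This is a sketch under simplifying assumptions: the Service VIPs are made up, real kube-proxy chains contain more than one rule per Service, and conntrack means only the first packet of a connection walks the chain. The function names are hypothetical.

```python
def chain_comparisons(chain: list[str], dst: str) -> int:
    """Rules checked before a match when rules are tried top to bottom (iptables-style)."""
    return chain.index(dst) + 1  # list.index scans from the front, so cost grows with position


def map_probes(table: dict[str, str], dst: str) -> int:
    """One keyed lookup (eBPF-hash-map-style): the cost does not depend on table size."""
    _ = table[dst]  # a KeyError here plays the role of a map miss
    return 1


def mean_cost(n_services: int) -> tuple[float, float]:
    """Mean lookup cost per packet when every Service receives exactly one packet."""
    # Hypothetical ClusterIP:port VIPs, one per Service.
    vips = [f"10.96.{i // 256}.{i % 256}:80" for i in range(n_services)]
    chain = list(vips)  # one match rule per Service (real kube-proxy chains have more)
    table = {vip: f"backends-{i}" for i, vip in enumerate(vips)}
    chain_total = sum(chain_comparisons(chain, vip) for vip in vips)
    map_total = sum(map_probes(table, vip) for vip in vips)
    return chain_total / n_services, map_total / n_services


for n in (100, 1_000, 5_000):
    chain_cost, map_cost = mean_cost(n)
    print(f"{n:>5} Services: {chain_cost:8.1f} rule checks vs {map_cost:.0f} map probe")
```

When every Service receives one packet, the mean number of rule checks is (n + 1) / 2: 50.5, 500.5, and 2,500.5 for the three sizes. The map stays at one probe throughout. The absolute numbers mean little; the shape of the curve is the point.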
Do I need Multus, or is it overkill?
Multus is overkill for most clusters and essential for a specific class of workload. If your pods are fine with a single network interface, which is true of nearly all web and microservice workloads, you do not need it. You need Multus when a pod must attach multiple interfaces. This typically arises in telco, NFV, or 5G core deployments that separate signalling, management, and user-plane traffic, or in HPC workloads that need SR-IOV. It runs alongside a normal primary CNI rather than replacing it.
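To make "runs alongside a primary CNI" concrete, the sketch below builds the two objects a Multus setup typically involves. The first is a `NetworkAttachmentDefinition` whose `spec.config` carries an ordinary CNI JSON config for a delegate plugin. The second is a Pod annotated to request that network. The sketch assumes Multus is already installed. The `macvlan` delegate, the `eth1` host interface, the subnet, the image, and all object names are hypothetical placeholders, and an SR-IOV setup would use a different delegate plugin.

```python
import json


def network_attachment(name: str, host_if: str, subnet: str) -> dict:
    """NetworkAttachmentDefinition for a macvlan secondary network (all values hypothetical)."""
    cni_config = {
        "cniVersion": "0.3.1",
        "type": "macvlan",   # delegate CNI plugin Multus invokes for this attachment
        "master": host_if,   # host NIC the secondary interface is carved from
        "mode": "bridge",
        "ipam": {"type": "host-local", "subnet": subnet},
    }
    return {
        "apiVersion": "k8s.cni.cncf.io/v1",
        "kind": "NetworkAttachmentDefinition",
        "metadata": {"name": name},
        # spec.config is a JSON *string*: the network config the delegate plugin receives.
        "spec": {"config": json.dumps(cni_config)},
    }


def pod_requesting(pod_name: str, image: str, networks: list[str]) -> dict:
    """Pod that asks Multus for extra interfaces; the primary CNI still sets up eth0."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name,
            "annotations": {"k8s.v1.cni.cncf.io/networks": ",".join(networks)},
        },
        "spec": {"containers": [{"name": "app", "image": image}]},
    }


nad = network_attachment("user-plane", "eth1", "192.168.50.0/24")
pod = pod_requesting("upf-0", "registry.example.com/upf:latest", ["user-plane"])
for manifest in (nad, pod):
    print(json.dumps(manifest, indent=2))
```

The pod keeps the interface its primary CNI provides and gains one more per listed network. That division of labour is why Multus complements, rather than competes with, Calico, Cilium, or Flannel.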
How do Calico and Cilium handle encryption?
Both support transparent node-to-node encryption of pod traffic. WireGuard is the usual choice in each because it is fast, kernel-integrated, and simple to enable, and Cilium additionally supports IPsec for environments that mandate it. The key operational caveat is MTU. Encapsulation and encryption add per-packet overhead, so the pod MTU must sit below the node MTU by at least that overhead to avoid fragmentation and throughput loss. Both projects try to detect the right value automatically, but verify it on your real network. Flannel's encryption story is thinner: its WireGuard backend can encrypt node-to-node traffic in place of VXLAN, but it comes with none of the policy controls of the other two.
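The MTU budget is simple arithmetic. The sketch below subtracts the per-packet overhead of each layer from the node MTU, using the header sizes spelled out in the comments. Whether tunnel and encryption overhead actually stack depends on the CNI and its mode, so summing them is a deliberately conservative assumption. `pod_mtu` is a hypothetical helper, not either project's detection logic.

```python
# Per-packet overhead in bytes, keyed by (layer, outer IP version).
OVERHEAD_BYTES = {
    ("ipip", 4): 20,       # outer IPv4 header only (IP-in-IP is IPv4-only here)
    ("vxlan", 4): 50,      # outer IPv4 20 + UDP 8 + VXLAN 8 + inner Ethernet 14
    ("vxlan", 6): 70,      # as above with a 40-byte outer IPv6 header
    ("wireguard", 4): 60,  # outer IPv4 20 + UDP 8 + WireGuard data header 16 + auth tag 16
    ("wireguard", 6): 80,  # as above with a 40-byte outer IPv6 header
}


def pod_mtu(node_mtu: int, layers: list[str], ip_version: int = 4) -> int:
    """Conservative pod MTU: node MTU minus the summed overhead of every layer."""
    overhead = sum(OVERHEAD_BYTES[(layer, ip_version)] for layer in layers)
    if overhead >= node_mtu:
        raise ValueError(f"{overhead} bytes of overhead leaves no room in a {node_mtu}-byte MTU")
    return node_mtu - overhead


print(pod_mtu(1500, ["vxlan"]))                    # 1450
print(pod_mtu(1500, ["wireguard"]))                # 1440
print(pod_mtu(1500, ["wireguard"], ip_version=6))  # 1420
print(pod_mtu(9001, ["vxlan", "wireguard"]))       # 8891
```

For a 1500-byte node MTU, this gives 1450 for VXLAN and 1440 for WireGuard over IPv4. On a 9001-byte jumbo-frame network (common on AWS), stacking both layers still leaves 8891 bytes, which is why jumbo frames make encryption's MTU cost largely disappear.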
Which CNI is the default in managed Kubernetes?
By 2026 the managed landscape has converged toward eBPF dataplanes. Google Kubernetes Engine uses the Cilium-based Dataplane V2, and Azure AKS offers Cilium as a built-in option. On Amazon EKS, whose default is the Amazon VPC CNI, Cilium is a widely deployed alternative. Managed clusters often abstract the CNI choice, but understanding which dataplane sits underneath matters for policy capabilities, observability, and kube-proxy behaviour. If you self-manage, you choose freely; if you use a managed offering, check which dataplane it ships and whether it permits a custom CNI.
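One quick way to see what a cluster actually ships is to look at its DaemonSets, since CNI agents and kube-proxy run one pod per node. The sketch below is a heuristic under loud assumptions. The name table lists the default DaemonSet names of stock installs as we understand them, and the GKE `anetd` entry in particular is an assumption. Operators, Helm values, and managed platforms can rename any of them. `inspect_daemonsets` is a hypothetical helper, not part of any tool.

```python
# Default DaemonSet names of stock installs (assumed; operators and Helm values may rename them).
CNI_DAEMONSETS = {
    "cilium": "Cilium",
    "anetd": "Cilium-based GKE Dataplane V2 (assumed name)",
    "calico-node": "Calico",
    "canal": "Canal (Flannel + Calico policy)",
    "kube-flannel-ds": "Flannel",
    "kube-multus-ds": "Multus",
    "aws-node": "Amazon VPC CNI",
}


def inspect_daemonsets(names: list[str]) -> dict:
    """Heuristically guess the CNI stack and kube-proxy status from DaemonSet names."""
    # Accept both 'daemonset.apps/cilium' (kubectl -o name style) and bare 'cilium'.
    bare = {name.strip().rsplit("/", 1)[-1] for name in names if name.strip()}
    return {
        "cni": sorted(label for ds, label in CNI_DAEMONSETS.items() if ds in bare),
        # A kube-proxy DaemonSet usually means no full eBPF kube-proxy replacement.
        "kube_proxy_daemonset": "kube-proxy" in bare,
    }


print(inspect_daemonsets(["daemonset.apps/cilium", "daemonset.apps/kube-proxy"]))
# {'cni': ['Cilium'], 'kube_proxy_daemonset': True}
```

You can feed it the lines printed by `kubectl get daemonsets -A -o name`. An empty `cni` list means none of the assumed names matched, not that the cluster has no CNI.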
Further Reading
- eBPF Kubernetes observability: replacing APM — how the same eBPF dataplane that powers Cilium reshapes monitoring.
- Continuous profiling with eBPF — going deeper on what eBPF unlocks beyond networking.
- Kubernetes cost optimization and GPU right-sizing — pairing networking decisions with cluster economics.
- Calico documentation and Cilium documentation — the authoritative, version-current references for each dataplane.
By Riju
