Cilium vs Calico for Industrial Kubernetes Networking — ADR (2026)

Choosing a CNI for a single SaaS cluster is mostly a tooling preference. Choosing one for industrial Kubernetes networking is an architecture decision whose consequences last for years. In that setting the same control plane has to reach from a cell controller on a noisy plant floor, up through a regional cluster, and into a cloud aggregator. Get it wrong and you spend the next eighteen months fighting MTU on overlays, fighting BGP with the network team, or fighting an open-source license change. Get it right and the CNI fades into the background and lets the rest of the platform (service mesh, observability, GitOps) do its job.

This post is an ADR (architecture decision record) in blog form. It walks through the context, the candidates, the weighted scoring, the call, and the consequences. It’s deliberately balanced: both projects are good, and the point is to make the trade-offs legible so your team can replay the same reasoning in your own environment. What this post covers: drivers, options, weighting, detailed comparison, decision, consequences, implementation notes, gotchas, and an FAQ.

Context and decision drivers

Answer up front: “industrial Kubernetes” is not the same workload as a cloud SaaS cluster, and three things change the CNI calculus — the OT/IT boundary, the edge footprint, and the network team owning the underlay. Anything that ignores those three will steer you toward the wrong default.

In a typical industrial deployment you have at least three tiers of Kubernetes. At the cell or machine level you have something like K3s or MicroK8s running on a single node or two-node HA, talking to PLCs over Modbus, OPC UA, or EtherNet/IP. At the plant level you have a small multi-node cluster that aggregates lines, runs vision and ML inference, and brokers traffic to and from the historian. Above that sits a regional or cloud cluster that hosts the digital twin, the MES connector, and the data lake ingestion path. The CNI has to behave acceptably at all three layers and ideally support a single observability and policy story across them.

The OT/IT boundary matters because the people who own the plant network are not the same people who own the AWS VPC. They speak BGP, MPLS, and VLAN, and they are unlikely to be impressed by a Kubernetes admin who wants to tunnel everything in VXLAN over their carefully engineered fabric. A CNI that can run with native routing, advertising pod CIDRs over BGP into the top-of-rack switches, is a far easier conversation than one that demands an overlay everywhere. The cloud tier is the opposite case: there’s no BGP fabric, just a software-defined network, and the value tilts toward identity-aware policy and observability.

The edge footprint matters because cell controllers are not data-center hardware. They are often four-core, eight-gigabyte boxes that already run a vision pipeline. A CNI that consumes a gigabyte of RAM per node is unacceptable. So is anything that requires a recent kernel the OT operating system vendor will not ship. Most industrial edge OSes today ship a 5.10 or 5.15 LTS kernel with selective backports, and eBPF features the upstream project added “yesterday” may not be available for another year.

Finally, regulatory and safety constraints push you toward strong identity, immutable audit trails, and explicit egress controls. NIS2 in Europe, the IEC 62443 family across most industrial markets, and customer-mandated zero-trust postures all expect that you can answer “who talked to whom, when, and over what protocol” with high resolution. That requirement disproportionately rewards CNIs with first-class flow observability.

The drivers we will weight, then, are: identity-aware policy, L7 and CRD-driven policy, observability, multi-cluster and edge mesh capability, encryption, performance and footprint, BGP and underlay integration, operational simplicity, and project governance. Read the CNI comparison across Calico, Cilium, Flannel, and Multus for the broader landscape; this post focuses on the head-to-head.

Options considered

Answer up front: the field narrows to Cilium and Calico for any serious industrial deployment. Antrea, Flannel, and the cloud-provider CNIs are credible niche picks but lose on one or more of the criteria above. We considered four contenders before scoring.

Cilium is an eBPF-native CNI. The data plane runs as eBPF programs attached to socket, tc, and XDP hooks, replacing most of the work iptables and kube-proxy used to do. The control plane is the cilium-agent on every node plus an operator. Identity is allocated per pod label set and propagated globally, including across clusters via ClusterMesh. The Hubble subsystem exports flow telemetry, and Tetragon (a sister project) adds eBPF-based runtime security. Cilium reached CNCF Graduated status in October 2023 and is by a wide margin the most active CNI project on GitHub. Isovalent, the company behind Cilium, was acquired by Cisco in 2024; we return to what that means under governance below.

Calico is the elder statesman. Its default data plane has long been iptables on Linux netfilter, with Felix as the per-node agent and BIRD as the BGP speaker. Calico can run pure L3 with no overlay (the deployment most network engineers love). It can also run IPIP or VXLAN where the underlay does not cooperate, and it offers an eBPF data plane as an opt-in. Calico Open Source is maintained by Tigera, which sells commercial editions built on top of it. It has a long, boring track record of being the safe pick in regulated industries. The eBPF mode is real and matured significantly in 2024 and 2025; it is no longer fair to call it experimental.

Antrea is the VMware-led OVS-based CNI. It is excellent in vSphere-heavy environments and integrates cleanly with NSX. For pure-Kubernetes industrial customers without a VMware footprint, the community gravity is smaller and there’s less Stack Overflow history to lean on when an edge node misbehaves at 3 a.m.

Flannel is simple and stable but offers no network policy of its own. In an industrial context that’s a non-starter. People still pair it with Calico-for-policy on small edge clusters, but at that point you might as well run Calico end-to-end.

We dismiss Antrea and Flannel for the head-to-head but acknowledge both. The scoring below is Cilium versus Calico.

Decision criteria and weighting

Answer up front: we score on nine weighted criteria summing to one hundred percent. Weights reflect industrial-context priorities: identity and observability matter more than they would in a pure SaaS cluster, while raw throughput matters less than predictability under load.

#	Criterion	Weight	Why it matters in industrial K8s
1	Identity-aware policy	15%	Zero-trust between OT and IT planes demands stronger identity than IP
2	L7 and CRD policy	10%	Layer-7 rules on OPC UA, HTTP, gRPC, Kafka, DNS reduce blast radius
3	Observability and flow logs	15%	IEC 62443 audit, incident response, capacity planning all depend on flow visibility
4	Multi-cluster and edge mesh	10%	Federated services across plant and cloud clusters
5	Wire encryption (WireGuard / IPsec)	10%	WAN segments between sites usually demand encryption
6	Performance and footprint	10%	Cell controllers are small, plant clusters carry sustained streaming traffic
7	BGP and underlay integration	10%	Plant network teams own the fabric and speak BGP
8	Operational simplicity	10%	Site engineers may not be Kubernetes specialists
9	Project governance and OSS risk	10%	Eighteen-month-plus deployments need a stable upstream story

The weights are not universal. If your industrial deployment is greenfield, identical SDN at every site, and run by a small central platform team, BGP and operational simplicity matter less and observability matters more. If your deployment is a brownfield retrofit across forty plants with established fabric and few platform engineers, the opposite is true. The framework still applies; only the weights change.

Detailed comparison

Answer up front: Cilium leads on identity, L7 policy, observability, and multi-cluster; Calico leads on BGP integration, operational simplicity, and governance; the two are close on encryption and performance. Below is the per-criterion breakdown that drives the scores.

Data plane

Cilium’s eBPF data plane bypasses iptables for the hot path. Pod-to-pod traffic is steered by programs attached to socket, tc, and XDP hooks, and the kube-proxy replacement removes a long-standing scaling pain. The published throughput numbers are favorable, but the more relevant industrial property is predictability. Heavy churn means deployments rolling, services scaling, and NetworkPolicies updating. Under that churn, eBPF hash-map lookups stay roughly constant-time. iptables, by contrast, walks rule chains sequentially, and reloading a large rule set gets slower as it grows. Both lookup and update cost therefore climb with the number of services and policies.
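A toy model makes the scaling argument concrete. The sketch below is not kube-proxy or Cilium code. It assumes a hypothetical table of services keyed by (cluster IP, port) and contrasts two lookups. The first is a sequential scan, the way iptables walks a chain such as the one kube-proxy builds for services. The second is a single keyed lookup, the way an eBPF hash map resolves a service.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical service table entry: (cluster IP, port) -> backend name.
Key = Tuple[str, int]


def build_services(n: int) -> List[Tuple[Key, str]]:
    """Create n fake services with distinct cluster IPs on port 443."""
    return [((f"10.96.{i // 256}.{i % 256}", 443), f"backend-{i}") for i in range(n)]


def linear_chain_lookup(rules: List[Tuple[Key, str]], key: Key) -> Tuple[Optional[str], int]:
    """Scan rules in order, like an iptables chain; return (match, rules examined)."""
    examined = 0
    for rule_key, backend in rules:
        examined += 1
        if rule_key == key:
            return backend, examined
    return None, examined


def hash_map_lookup(table: Dict[Key, str], key: Key) -> Tuple[Optional[str], int]:
    """One keyed lookup, like an eBPF hash map; counted as a single probe in this model."""
    return table.get(key), 1


if __name__ == "__main__":
    for n in (100, 1_000, 5_000):
        rules = build_services(n)
        table = dict(rules)
        worst = rules[-1][0]  # the last rule is the worst case for a linear scan
        _, chain_cost = linear_chain_lookup(rules, worst)
        _, map_cost = hash_map_lookup(table, worst)
        print(f"{n:>5} services: chain examined {chain_cost}, map probes {map_cost}")
```

The model deliberately ignores rule-update cost, which in practice is the other half of the iptables churn problem.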

Calico’s default iptables data plane is mature, predictable, and well understood by network engineers. It scales well into the low thousands of services per cluster. The eBPF data plane in Calico is genuinely solid in 2026: it works, it’s documented, and it is a credible alternative to Cilium for shops that want eBPF benefits without leaving the Calico ecosystem. The catch is that fewer people run it in production, and the tooling around it is thinner than Cilium’s.

Identity and policy

Cilium assigns numeric identities to pods based on their label sets. In tunnel mode the identity travels in the encapsulation header. In native-routing mode, the receiving node resolves it from the source IP through its ipcache. Policy is enforced in eBPF against identity rather than IP, which is profoundly useful when pods churn and IP addresses are recycled. CiliumNetworkPolicy supports L7 rules for HTTP, Kafka, DNS, gRPC, and several other protocols out of the box, and policy can target identities across ClusterMesh peers.
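To make identity-based enforcement concrete, here is a toy model in Python. It is not Cilium’s allocator. It assumes that each distinct label set (order-independent) maps to one numeric identity, numbered from 256 upward. That starting point is illustrative; Cilium reserves small numbers for special identities such as host and world. It also assumes policy is an allow-list of (source identity, destination identity) pairs. The point is that a rescheduled pod on a new IP keeps its verdict, because the verdict never looked at the IP.

```python
from typing import Dict, FrozenSet, Set, Tuple

Labels = FrozenSet[Tuple[str, str]]


class IdentityAllocator:
    """Toy allocator (hypothetical): one numeric identity per distinct label set."""

    def __init__(self, first_id: int = 256) -> None:
        self._next = first_id
        self._by_labels: Dict[Labels, int] = {}

    def identity_for(self, labels: Dict[str, str]) -> int:
        key: Labels = frozenset(labels.items())  # label order does not matter
        if key not in self._by_labels:
            self._by_labels[key] = self._next
            self._next += 1
        return self._by_labels[key]


class IdentityPolicy:
    """Allow-list of (source identity, destination identity) pairs; default deny."""

    def __init__(self) -> None:
        self._allowed: Set[Tuple[int, int]] = set()

    def allow(self, src: int, dst: int) -> None:
        self._allowed.add((src, dst))

    def verdict(self, src: int, dst: int) -> str:
        return "FORWARDED" if (src, dst) in self._allowed else "DROPPED"


if __name__ == "__main__":
    alloc = IdentityAllocator()
    mes = alloc.identity_for({"app": "mes-connector", "tier": "plant"})
    historian = alloc.identity_for({"app": "historian", "tier": "plant"})
    policy = IdentityPolicy()
    policy.allow(mes, historian)

    # Reschedule: the pod comes back on a new IP with the same labels.
    before = {"10.0.1.17": mes}
    after = {"10.0.3.42": alloc.identity_for({"tier": "plant", "app": "mes-connector"})}
    print(policy.verdict(before["10.0.1.17"], historian))  # FORWARDED
    print(policy.verdict(after["10.0.3.42"], historian))   # FORWARDED
```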

Calico’s NetworkPolicy is rock-solid at L3 and L4, with GlobalNetworkPolicy and tiered policy via CRDs. L7 enforcement is supported through a Tigera commercial component or via integration with Envoy; in pure open source, you get L3/L4 with selectors. For the policy primitives most industrial teams actually need — segmenting OT-facing namespaces, blocking unsolicited egress, restricting which namespaces can reach the historian — Calico is fully adequate.
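For the L3/L4 case described here, a plain Kubernetes NetworkPolicy is often enough, and Calico enforces the standard networking.k8s.io/v1 API. The sketch below builds one as a Python dict and prints it as JSON, which kubectl apply -f accepts. Several names are hypothetical: the historian and mes-connector namespaces, the app=historian label, and TCP port 8443. The sketch relies on the kubernetes.io/metadata.name label that current Kubernetes versions set on every namespace.

```python
import json
from typing import Any, Dict, List


def historian_ingress_policy(allowed_namespaces: List[str], port: int) -> Dict[str, Any]:
    """Allow ingress to historian pods only from the listed namespaces on one TCP port.

    Selecting the pods and listing Ingress in policyTypes means any ingress not
    matched by a rule below is denied.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "historian-ingress", "namespace": "historian"},
        "spec": {
            "podSelector": {"matchLabels": {"app": "historian"}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Entries in "from" are ORed: any listed namespace may connect.
                    "from": [
                        {"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": ns}}}
                        for ns in allowed_namespaces
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(historian_ingress_policy(["mes-connector"], 8443), indent=2))
```

Because the peers in the from list are ORed, each extra namespace widens access on its own, which keeps the allow-list easy to audit.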

Encryption

Both projects support WireGuard for pod-to-pod and node-to-node encryption. WireGuard is the modern default; IPsec is supported in both but heavier to operate. The configuration models are similar. For an industrial deployment where the WAN segment between a plant and the cloud is the part you want encrypted, either CNI gets you there, and you should consider scoping encryption to the WAN-facing nodes rather than blanket pod-to-pod inside a plant.
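The two configuration models look like this side by side. The sketch assumes Cilium is installed with Helm, where encryption.enabled and encryption.type are the chart’s WireGuard switches. It assumes Calico’s WireGuard is turned on through the wireguardEnabled field of the default FelixConfiguration, applied with calicoctl or with kubectl where the Calico API server is installed. Both fragments enable encryption cluster-wide. Scoping it to WAN-facing nodes needs additional per-node configuration that this sketch does not attempt.

```python
import json
from typing import Any, Dict


def cilium_wireguard_values() -> Dict[str, Any]:
    """Helm values fragment that turns on Cilium's WireGuard transparent encryption."""
    return {"encryption": {"enabled": True, "type": "wireguard"}}


def calico_wireguard_felix() -> Dict[str, Any]:
    """FelixConfiguration that turns on Calico's WireGuard encryption cluster-wide."""
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "FelixConfiguration",
        "metadata": {"name": "default"},
        "spec": {"wireguardEnabled": True},
    }


if __name__ == "__main__":
    # JSON is valid YAML, so the first document can be saved as a Helm values file.
    print(json.dumps(cilium_wireguard_values(), indent=2))
    print(json.dumps(calico_wireguard_felix(), indent=2))
```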

Observability

This is where Cilium pulls ahead clearly. Hubble exports per-flow metadata — source identity, destination identity, verdict, latency, L7 attributes — and the Hubble UI gives you a service map that updates in near real time. For incident response in an industrial plant, where the question “what talked to the historian in the last ten minutes” needs a fast answer, Hubble’s flow visibility is a step change.
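As an example of turning that question into a query, the sketch below summarizes who sent traffic to pods carrying a given label. It assumes the flows were exported as one JSON object per line, for instance with hubble observe --since 10m -o json. It also assumes each object nests the flow under a flow key, with source and destination endpoints carrying namespace, pod_name, and labels fields. Field names can shift between Hubble versions, so check them against your own output. The k8s:app=historian label is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def talkers_to(lines: Iterable[str], dest_label: str) -> Counter:
    """Count flows per (source namespace, source pod, verdict) whose destination has dest_label."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        flow = json.loads(line).get("flow", {})
        dst = flow.get("destination", {})
        if dest_label not in dst.get("labels", []):
            continue
        src = flow.get("source", {})
        key: Tuple[str, str, str] = (
            src.get("namespace", "<none>"),
            src.get("pod_name", "<none>"),
            flow.get("verdict", "UNKNOWN"),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hand-written sample lines in the assumed shape; pass sys.stdin for real output.
    sample = [
        json.dumps({"flow": {"verdict": "FORWARDED",
                             "source": {"namespace": "mes", "pod_name": "mes-connector-0"},
                             "destination": {"labels": ["k8s:app=historian"]}}}),
        json.dumps({"flow": {"verdict": "DROPPED",
                             "source": {"namespace": "vision", "pod_name": "infer-7"},
                             "destination": {"labels": ["k8s:app=historian"]}}}),
    ]
    for (ns, pod, verdict), n in talkers_to(sample, "k8s:app=historian").most_common():
        print(f"{n:6d}  {verdict:10s}  {ns}/{pod}")
```

Keeping the verdict in the key matters for incident response: a burst of DROPPED flows from an unexpected namespace is often the first signal worth chasing.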

Calico has logs, metrics, and flow visibility via the commercial Tigera offering, but open-source Calico does not match Hubble’s depth. You can wire Calico to OpenTelemetry collectors and to Pixie or other observability stacks — see the eBPF observability tutorial — but it’s more assembly than out-of-box. For pure open-source posture, Cilium wins this row by a wide margin.

Multi-cluster and edge mesh

Cilium’s ClusterMesh peers two or more clusters and lets services discover each other by name across cluster boundaries. Combined with identity, this gives you cross-cluster policy without bolting on a service mesh. For an industrial pattern with one regional cluster and ten plant clusters, ClusterMesh is the simplest path to “the digital twin in the regional cluster can query the cell controller in the plant cluster, subject to a single policy.”
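ClusterMesh exposes a service across clusters when the same Service is declared in each cluster and annotated as global. The sketch below assumes the service.cilium.io/global annotation used by recent Cilium releases; older releases used io.cilium/global-service. The historian names are hypothetical. It also includes a small check that every global service appears under the same namespace and name in every cluster, since that is how ClusterMesh pairs them.

```python
from typing import Any, Dict, List, Set, Tuple

GLOBAL_ANNOTATION = "service.cilium.io/global"  # assumption: recent Cilium releases


def global_service(name: str, namespace: str, port: int, app: str) -> Dict[str, Any]:
    """A ClusterIP Service marked global so ClusterMesh merges backends across clusters."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {GLOBAL_ANNOTATION: "true"},
        },
        "spec": {"selector": {"app": app}, "ports": [{"port": port, "protocol": "TCP"}]},
    }


def unmatched_globals(clusters: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Set[Tuple[str, str]]]:
    """Per cluster, the global (namespace, name) pairs missing from at least one other cluster."""
    keys: Dict[str, Set[Tuple[str, str]]] = {
        cluster: {
            (s["metadata"]["namespace"], s["metadata"]["name"])
            for s in services
            if s["metadata"].get("annotations", {}).get(GLOBAL_ANNOTATION) == "true"
        }
        for cluster, services in clusters.items()
    }
    common = set.intersection(*keys.values()) if keys else set()
    return {cluster: ks - common for cluster, ks in keys.items() if ks - common}
```

Running the check over the manifests rendered for each cluster flags a renamed service. Without it, cross-cluster lookups quietly fall back to local backends only.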

Calico has federation through Tigera commercial offerings and various community patterns, but multi-cluster identity is not as integrated. For multi-cluster industrial fleets, Cilium has the architectural advantage. The eBPF vs sidecar service mesh ADR walks through how Cilium’s mesh-lite story compares with Istio and Linkerd.

BGP and underlay integration

Calico wins here, full stop. Calico has been BGP-first since day one. Plant network teams who already run a BGP fabric with ToR switches can peer Calico nodes directly, advertise pod CIDRs, and run a pure-L3 cluster with no overlay. It’s the kind of setup where the network team nods politely and then leaves you alone for six months.

Cilium added BGP support and has matured it substantially, but it is still catching up. For greenfield clouds it doesn’t matter; for brownfield plant deployments where the fabric is a given, Calico is simply easier.
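On the Calico side, the plant-fabric setup usually comes down to two kinds of resource. A BGPConfiguration disables the full node-to-node mesh and sets the cluster’s AS number. A BGPPeer is declared per top-of-rack switch. The sketch below generates both. The ASNs come from the 64512 to 65534 private range, the peer address from the 192.0.2.0/24 documentation range, and the rack label is a placeholder. All of these values should be agreed with the plant network team.

```python
import ipaddress
import json
from typing import Any, Dict, List

PRIVATE_ASN = range(64512, 65535)  # 16-bit private-use ASNs 64512..65534


def calico_bgp(cluster_asn: int, peers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a BGPConfiguration plus one BGPPeer per ToR switch."""
    if cluster_asn not in PRIVATE_ASN:
        raise ValueError(f"cluster ASN {cluster_asn} is outside the private range")
    docs: List[Dict[str, Any]] = [{
        "apiVersion": "projectcalico.org/v3",
        "kind": "BGPConfiguration",
        "metadata": {"name": "default"},
        # Peer with the fabric instead of building a full mesh between nodes.
        "spec": {"asNumber": cluster_asn, "nodeToNodeMeshEnabled": False},
    }]
    for peer in peers:
        ipaddress.ip_address(peer["ip"])  # raises ValueError on a malformed address
        docs.append({
            "apiVersion": "projectcalico.org/v3",
            "kind": "BGPPeer",
            "metadata": {"name": peer["name"]},
            "spec": {
                "peerIP": peer["ip"],
                "asNumber": peer["asn"],
                # Calico selector syntax: only nodes in this rack peer with this switch.
                "nodeSelector": f"rack == '{peer['rack']}'",
            },
        })
    return docs


if __name__ == "__main__":
    tors = [{"name": "tor-line-a", "ip": "192.0.2.1", "asn": 64600, "rack": "line-a"}]
    for doc in calico_bgp(64513, tors):
        print(json.dumps(doc, indent=2))
```

Disabling the node-to-node mesh is only safe once every node has a working fabric peer, so roll the BGPConfiguration change out together with the BGPPeer resources.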

Performance and footprint

For the cell-level footprint — a single node with four cores and eight gigabytes of RAM, running K3s — Calico in pure-iptables mode with a minimal Felix configuration is the leanest option we tested in 2025. Cilium’s footprint has improved markedly with the minimal-build options, but iptables-Calico remains a tighter fit on the smallest boxes.

At plant and regional scale, Cilium’s data plane is at parity with Calico-iptables and often ahead, particularly under heavy churn. Numbers depend so heavily on workload and kernel that we won’t publish a single benchmark here; run your own with your traffic mix.

Governance and OSS risk

Calico Open Source is maintained by Tigera, which sells commercial editions built on top of it. There has been no relicensing pressure on the open-source core in the years we’ve tracked it. Cilium is a CNCF Graduated project governed under CNCF, but Isovalent, the original sponsor, is now part of Cisco. Cisco has so far been a constructive steward, but any acquisition introduces a non-zero risk that future commercial features will erode the open-source offering. We’re not predicting that; we are noting the asymmetric risk.

The decision

Answer up front: Cilium as the default for greenfield clusters; Calico for brownfield plant clusters with established BGP fabric or extreme footprint constraints. Total weighted scores are close, and the right call is per-cluster rather than per-organization.

Running the criteria through the weights gives roughly:

Criterion	Weight	Cilium score (0–5)	Calico score (0–5)	Weighted Cilium	Weighted Calico	Notes
Identity-aware policy	15%	5	3	0.75	0.45	Cilium identity is first-class
L7 and CRD policy	10%	5	3	0.50	0.30	Calico L7 needs Tigera
Observability	15%	5	3	0.75	0.45	Hubble vs assembly
Multi-cluster / edge mesh	10%	5	3	0.50	0.30	ClusterMesh advantage
Wire encryption	10%	4	4	0.40	0.40	Both ship WireGuard
Performance / footprint	10%	4	4	0.40	0.40	Cilium plant, Calico cell
BGP / underlay	10%	3	5	0.30	0.50	Calico is BGP-native
Operational simplicity	10%	3	4	0.30	0.40	Calico fewer moving parts
Governance / OSS risk	10%	3	4	0.30	0.40	Cisco overhang on Cilium
Total	100%	—	—	4.20	3.60	Cilium leads, not by a landslide

Cilium wins the aggregated score, but the margin is small enough that the right answer is a per-tier default, not a one-size-fits-all decision.
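The arithmetic is easy to replay and, more usefully, to re-run with your own weights. The sketch below encodes the scores from the table with integer percentage weights. The brownfield weighting in the second profile is hypothetical. It was chosen only to show how shifting weight toward BGP and operational simplicity can flip the result, as the weighting section warns.

```python
from typing import Dict

CRITERIA = [
    "identity", "l7_policy", "observability", "multi_cluster", "encryption",
    "performance", "bgp", "ops_simplicity", "governance",
]

# Scores (0-5) from the decision table.
CILIUM = dict(zip(CRITERIA, [5, 5, 5, 5, 4, 4, 3, 3, 3]))
CALICO = dict(zip(CRITERIA, [3, 3, 3, 3, 4, 4, 5, 4, 4]))

# Weights in whole percent; each profile must sum to 100.
DEFAULT_WEIGHTS = dict(zip(CRITERIA, [15, 10, 15, 10, 10, 10, 10, 10, 10]))
BROWNFIELD_WEIGHTS = dict(zip(CRITERIA, [10, 5, 10, 5, 10, 10, 25, 15, 10]))  # hypothetical


def weighted_total(weights: Dict[str, int], scores: Dict[str, int]) -> float:
    """Sum of weight x score, on the same 0-5 scale as the individual scores."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(weights[c] * scores[c] for c in CRITERIA) / 100


if __name__ == "__main__":
    for label, weights in (("default", DEFAULT_WEIGHTS), ("brownfield", BROWNFIELD_WEIGHTS)):
        print(f"{label:10s} Cilium {weighted_total(weights, CILIUM):.2f}  "
              f"Calico {weighted_total(weights, CALICO):.2f}")
```

With the default weights this prints 4.20 for Cilium and 3.60 for Calico, matching the table. The hypothetical brownfield profile gives 3.80 versus 3.95, which is the per-tier split expressed as numbers.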

Cloud and regional tier — Cilium. ClusterMesh, Hubble, Tetragon, identity. This is where observability and multi-cluster pay off most.
Plant tier greenfield — Cilium. Same reasoning, with WireGuard scoped to WAN-facing nodes.
Plant tier brownfield — Calico. If the plant network team already speaks BGP and the fabric is a given, Calico saves you months of cross-team negotiation.
Cell controller tier — Calico (default) with Cilium minimal as an option. Footprint and operational simplicity dominate at this scale.

This is the actual ADR call our team carries forward. It maps cleanly to how clusters get provisioned: the GitOps pipelines lay down the CNI based on a tier and brownfield label on each cluster registration, with no further argument needed. See the GitOps for industrial fleets tutorial for how to encode that in Argo CD.
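The tier rules above reduce to a small lookup that a GitOps generator can evaluate per cluster. The sketch assumes three hypothetical registration labels. The first is tier (cloud, regional, plant, or cell), the second is brownfield ("true" or "false"), and the third is an optional cell-cni override for cells that have standardized on Cilium. In Argo CD the same logic would more typically live in ApplicationSet cluster-generator selectors than in Python.

```python
from typing import Dict


def choose_cni(labels: Dict[str, str]) -> str:
    """Map cluster-registration labels to a CNI, following the per-tier ADR call."""
    tier = labels.get("tier")
    brownfield = labels.get("brownfield", "false") == "true"
    if tier in ("cloud", "regional"):
        return "cilium"
    if tier == "plant":
        # Brownfield plants with an established BGP fabric stay on Calico.
        return "calico" if brownfield else "cilium"
    if tier == "cell":
        # Calico by default; a minimal Cilium build only by explicit opt-in.
        cni = labels.get("cell-cni", "calico")
        if cni not in ("calico", "cilium"):
            raise ValueError(f"unsupported cell-cni override: {cni!r}")
        return cni
    raise ValueError(f"unknown or missing tier label: {tier!r}")


if __name__ == "__main__":
    fleet = {
        "eu-regional": {"tier": "regional"},
        "plant-berlin": {"tier": "plant", "brownfield": "true"},
        "plant-austin": {"tier": "plant", "brownfield": "false"},
        "cell-07": {"tier": "cell"},
    }
    for cluster, labels in fleet.items():
        print(cluster, choose_cni(labels))
```

A missing brownfield label is treated as greenfield here. Failing closed and refusing to render until the label is set is an equally reasonable choice for a fleet with many retrofit sites.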

Consequences

Answer up front: the decision buys identity, observability, and multi-cluster reach at the cost of higher kernel-version sensitivity and a learning curve for engineers used to iptables-shaped mental models. Both are worth it; both are real.

Positive consequences

Unified identity across clusters simplifies zero-trust enforcement. A pod label like tier=historian carries the same meaning from cell to cloud.
Flow visibility out of the box via Hubble. Incident response time drops because “who talked to what” is a query, not a forensic exercise.
Multi-cluster services without a sidecar mesh. ClusterMesh covers the cross-cluster service discovery and identity-aware policy that teams often adopt Istio to get, at a fraction of the operational cost.
eBPF as a substrate for other capabilities: runtime security via Tetragon, observability via Pixie or Cilium-native exporters, traffic engineering down the road.

Negative consequences

Kernel-version sensitivity. Some Cilium features need 5.15 or 6.x kernels. Confirm that your edge OS vendor supports the version you depend on before committing.
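Learning curve. Engineers who debug networking with iptables -L -n -t nat | grep ... need to learn cilium monitor, cilium bpf, and hubble observe. Inside agent pods, recent Cilium releases ship the first two as cilium-dbg monitor and cilium-dbg bpf. Allow time.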
Cisco overhang. If Cilium’s commercial direction changes meaningfully, you need a documented exit. Calico is that exit path.
Brownfield friction. Cilium’s BGP is competent but not yet on par with Calico’s. If your network team is BGP-first, that friction is real and Calico is the kinder choice for those clusters.

Implementation notes

Answer up front: install via Helm with explicit values files per tier, encode the decision in GitOps, and start with Hubble in observation-only mode before you write a single policy.

A few concrete patterns that have worked for us and clients:

Helm values per tier. Maintain values-cloud.yaml, values-plant.yaml, values-cell-cilium.yaml, and values-cell-calico.yaml in the platform repo. Differ only by what actually differs — encryption, BGP, Hubble enablement — not by stylistic preference.
kube-proxy replacement. If you’re going Cilium, run it. kubeProxyReplacement: true removes a class of conntrack issues and is usually a clear win in 2026.
WireGuard scoped to WAN nodes. Don’t pay the encryption tax on intra-plant pod-to-pod traffic. Use node labels and Cilium’s encryption configuration to scope WireGuard to nodes that hold the WAN uplink.
BGP via the Cilium BGP Control Plane (for Cilium) or via BGPConfiguration and BGPPeer CRs (for Calico). Coordinate ASNs and route filtering with the plant network team before any deployment hits production.
Hubble in observe-only first. Turn on Hubble Relay, point a dashboard at it, watch traffic for a week. Then write CiliumNetworkPolicy rules based on observed flows, not on speculation. This is the cheapest way to avoid breaking factory floor comms.
Image registries on-prem. Cell-level clusters often have intermittent WAN. Mirror Cilium and Calico images to a local registry; assume the WAN will be down when you need to pull.
Upgrade discipline. Treat CNI version like kernel version. Pin it, test it on a non-prod plant cluster first, never auto-upgrade.
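The per-tier values files can be kept honest by generating them from one shared base plus small overlays, so the only differences are the ones that matter. The sketch below assumes the Cilium Helm chart keys kubeProxyReplacement, hubble.relay.enabled, hubble.ui.enabled, encryption.enabled, encryption.type, and bgpControlPlane.enabled. Which tier gets which overlay is an illustrative assumption. The output is JSON, which Helm reads as YAML.

```python
import json
from copy import deepcopy
from typing import Any, Dict

BASE: Dict[str, Any] = {
    # Without kube-proxy, the chart also needs k8sServiceHost/k8sServicePort for the API server.
    "kubeProxyReplacement": True,
    "hubble": {"enabled": True, "relay": {"enabled": True}, "ui": {"enabled": False}},
}

# Illustrative overlays: each tier states only what differs from BASE.
OVERLAYS: Dict[str, Dict[str, Any]] = {
    "cloud": {"hubble": {"ui": {"enabled": True}},
              "encryption": {"enabled": True, "type": "wireguard"}},
    "plant": {"bgpControlPlane": {"enabled": True}},
    "cell-cilium": {"hubble": {"relay": {"enabled": False}}},
}


def merge(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge overlay into a copy of base; overlay wins on conflicts."""
    out = deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = deepcopy(value)
    return out


if __name__ == "__main__":
    for tier, overlay in OVERLAYS.items():
        print(f"# values-{tier}.yaml")
        print(json.dumps(merge(BASE, overlay), indent=2))
```

Reviewing a diff of the small OVERLAYS table is much easier than diffing four full values files, which is the point of differing only by what actually differs. The Calico values file can follow the same pattern.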

For data-side decisions that depend on the network, see the industrial lakehouse ADR comparing Iceberg, Delta, and Hudi. The CNI choice shapes how cleanly that pipeline lands in your edge-to-cloud topology.

Trade-offs and gotchas

Answer up front: the most painful failure modes are MTU mismatches, NodePort exposure, DNS under churn, IP exhaustion, and version or feature drift. Most are not CNI-specific, but each shows up earlier and harder in industrial deployments.

MTU. Overlays steal bytes. If your fabric MTU is 1500, VXLAN leaves a pod MTU of 1450 and IPIP leaves 1480, while WireGuard on IPv4 needs 60 bytes of headroom of its own. PLC traffic is usually small, but ML inference with images is not. Set MTU explicitly in the CNI config and validate with ping -M do -s ....
NodePort exposure. In OT contexts, NodePort is often the only path from a non-Kubernetes appliance into the cluster. Both CNIs support it, but make sure your network policy doesn’t accidentally block the host network namespace.
DNS performance. CoreDNS under heavy churn is a recurring sore point. Both CNIs can run NodeLocal DNSCache; with Cilium’s kube-proxy replacement it is typically wired in through a Local Redirect Policy. Either way, it has to be enabled deliberately.
IP exhaustion. Cluster CIDRs that looked generous in 2023 are tight in 2026 once you run two replicas per workload across dozens of namespaces. Plan for IPv6 dual-stack or at minimum reserve a larger pod CIDR than you think you need.
eBPF feature drift. A feature on Cilium 1.14 may not be present on the 1.12 you actually shipped a year ago to edge nodes that nobody has touched since. Audit installed versions before you assume parity.
Calico eBPF DSR. Direct server return in Calico’s eBPF mode is great when the underlay is symmetric. Asymmetric underlays — common in older plant networks — can break it.
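Cilium ClusterMesh service matching. ClusterMesh pairs global services by namespace and name, so cross-cluster load balancing only works if the service exists under the same namespace and name in every participating cluster. If you renamed services between clusters, you will find the difference at the worst time.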
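Two of these gotchas are plain arithmetic worth scripting into a preflight check. The sketch below uses the IPv4 per-packet overheads that Calico’s MTU documentation lists: 20 bytes for IPIP, 50 for VXLAN, and 60 for WireGuard. It also uses a simple capacity model in which every node receives one fixed-size block from the cluster CIDR. Real IPAM can hand a node several blocks (Calico does) or size them differently. The 10.244.0.0/16 range is only an example.

```python
import ipaddress
from typing import Tuple

# IPv4 per-packet overhead in bytes for common encapsulations.
OVERHEAD_V4 = {"none": 0, "ipip": 20, "vxlan": 50, "wireguard": 60}


def pod_mtu(fabric_mtu: int, encap: str) -> int:
    """Largest pod-interface MTU that avoids fragmentation on the underlay."""
    return fabric_mtu - OVERHEAD_V4[encap]


def ping_payload(mtu: int) -> int:
    """ICMP payload size for `ping -M do -s` that exactly fills an IPv4 packet of size mtu."""
    return mtu - 20 - 8  # 20-byte IPv4 header + 8-byte ICMP header


def pod_capacity(cluster_cidr: str, per_node_prefix: int) -> Tuple[int, int]:
    """(max nodes, addresses per node) if each node gets one block of /per_node_prefix."""
    net = ipaddress.ip_network(cluster_cidr)
    if not net.prefixlen <= per_node_prefix <= net.max_prefixlen:
        raise ValueError("per-node prefix must sit inside the cluster CIDR")
    nodes = 2 ** (per_node_prefix - net.prefixlen)
    per_node = 2 ** (net.max_prefixlen - per_node_prefix)
    return nodes, per_node


if __name__ == "__main__":
    for encap in ("ipip", "vxlan", "wireguard"):
        mtu = pod_mtu(1500, encap)
        print(f"{encap:9s} pod MTU {mtu}, probe with ping -M do -s {ping_payload(mtu)}")
    print(pod_capacity("10.244.0.0/16", 24))  # (256, 256)
```

Run the ping probe pod to pod across two nodes, so the packet actually crosses the overlay you are sizing.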

Practical recommendations

Answer up front: decide per-tier, codify the decision in GitOps, instrument first, encrypt at the WAN, and stage every upgrade through a plant pilot cluster.

A short checklist to carry into your own ADR:

Draw the topology before you pick the CNI. Tiers, brownfield vs greenfield, network-team relationship.
Weight the nine criteria for your context. Don’t copy our weights blindly.
Default to Cilium for cloud and greenfield; default to Calico for brownfield plant; pick by footprint at the cell.
Encode the choice as a label on every cluster registration; let GitOps pick the values file.
Turn on observability before policy. Hubble in observe-only for a week per tier minimum.
Scope encryption. WireGuard at the WAN, not the whole plant LAN.
Coordinate BGP with the network team in writing. ASNs, prefixes, route filters, failure modes.
Pin CNI versions, mirror images on-prem, stage upgrades through a pilot plant cluster.
Document the exit path. If Cilium changes direction, your Calico playbook is your fallback.
Revisit the ADR every twelve months. CNI maturity moves fast.

FAQ

Is Calico’s eBPF mode a real alternative to Cilium in 2026?

Yes. Calico’s eBPF data plane shipped years ago and has matured significantly through 2024 and 2025. It supports kube-proxy replacement and DSR and delivers competitive performance, so calling it “experimental” is out of date. That said, the surrounding eBPF tooling (flow visibility, identity propagation, multi-cluster) is thinner than Cilium’s. If you want eBPF specifically for performance, Calico-eBPF is credible. If you want eBPF for the full identity-plus-observability story, Cilium remains ahead.

Does the Isovalent acquisition by Cisco change the Cilium recommendation?

Not yet. Cilium is a CNCF Graduated project and the open-source roadmap has continued. Cisco has been a constructive steward through 2024 and 2025. We are not predicting harm, but we are noting that any acquisition introduces tail risk. The mitigation is to keep the architecture portable: design policies and identity models that map to Calico, document the migration path, and avoid leaning on closed-source Isovalent commercial features unless you’ve contracted for them explicitly.

Can we run Cilium on cell controllers with a 5.10 kernel?

Partly. The core Cilium data plane runs on 5.10, but some features (BPF LSM, certain bandwidth-manager paths, newer Tetragon hooks) need 5.15 or later. For a four-core, eight-gigabyte cell controller, we recommend Calico in pure-iptables mode for the footprint, or a deliberately minimized Cilium build if you’ve already standardized on Cilium elsewhere and accept the trade-off in resource use.

What about Cilium Service Mesh versus running Istio on top of Calico?

For industrial workloads where the value of a service mesh is mTLS plus identity-aware policy plus L7 visibility, Cilium Service Mesh covers most of that without sidecars. Istio on top of Calico is still a perfectly valid choice if you need Istio-specific features (Envoy filters, deep traffic-shifting, ecosystem integrations). The eBPF-vs-sidecar trade-off is its own ADR; see our eBPF vs sidecar service mesh post for the long form.

How do we migrate from Calico to Cilium without downtime?

You generally don’t, in place. Both CNIs assume they own the data plane. The reliable pattern is to provision a new cluster with Cilium, replicate workloads via GitOps, shift traffic via a higher-level load balancer or service mesh, and decommission the old cluster. For industrial clusters with stateful edge workloads, schedule the cutover in a maintenance window aligned with planned production downtime. Trying to swap CNI under live PLC traffic is not a path to a happy plant manager.

Is there a one-CNI answer for the whole fleet?

If you’re prepared to invest in operational maturity, yes — Cilium can run at every tier in 2026, including cell controllers with care. But the cost-benefit usually favors a mixed deployment: Cilium where its strengths compound (cloud, regional, greenfield plant), Calico where the brownfield reality demands it. The mixed model is not a compromise; it’s the honest answer to a mixed environment.
