Kubernetes Network Policy Egress: RDS & External Services

Kubernetes Network Policy Egress: Locking Down RDS and External Services

A Kubernetes network policy egress rule controls which destinations a pod is allowed to reach. By default, every pod in a cluster can talk to anything — your production database, the public internet, even the cloud metadata endpoint that hands out IAM credentials. Egress policies flip that wide-open posture to deny-by-default, then explicitly allow only the destinations a workload genuinely needs, such as an RDS instance, CoreDNS, and a short list of external APIs.

Last updated: June 2026.

This guide walks through the egress model from the ground up. We start with how the native NetworkPolicy object works, then explain why static CIDR rules and DNS-aware (FQDN) rules solve two different problems. We trace how those rules become real datapath enforcement in the kernel, work through RDS-specific YAML you can adapt, catalogue the pitfalls that silently break things, compare Calico and Cilium under production load, and finish with a hardening checklist and a 2026 FAQ. Every YAML block below is illustrative and must be adapted to your cluster — labels, CIDRs, ports, and namespaces will differ — before you apply it.


How Kubernetes NetworkPolicy Egress Works

A NetworkPolicy is a namespaced object that selects pods with a label selector and lists ingress and egress rules for them. The moment a policy with policyTypes: [Egress] selects a pod, that pod switches from “allow all outbound” to “deny all outbound except what’s listed.” Each egress rule combines a destination (to) with allowed ports, and a pod is allowed out only if at least one rule matches both.

The native API is deliberately Layer 3/4 only — it reasons about IP addresses and ports, never application payloads. You can express a destination three ways inside a to block: ipBlock (a raw CIDR range, optionally with except sub-ranges), namespaceSelector (pods in matching namespaces), and podSelector (pods carrying matching labels). There is no native field for matching a DNS name. That single gap is the reason every DNS-aware extension in this guide exists, and it is the most important fact to internalize before you design an egress strategy.
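As a minimal sketch of those three destination forms, the policy below lists one of each inside a single to block. The namespace, labels, and CIDRs are hypothetical placeholders, and the DNS allow rule is left out for brevity, so this is not a complete working policy on its own.

```yaml
# ILLUSTRATIVE: hypothetical names and ranges; DNS rule omitted for brevity
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-destination-forms
  namespace: demo
spec:
  podSelector:
    matchLabels:
      app: demo-client
  policyTypes:
    - Egress
  egress:
    - to:
        # 1. Raw CIDR, minus a carved-out sub-range
        - ipBlock:
            cidr: 10.0.0.0/16
            except:
              - 10.0.5.0/24
        # 2. Every pod in namespaces carrying this label
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
        # 3. Pods with this label in the policy's own namespace
        - podSelector:
            matchLabels:
              app: cache
      ports:
        - protocol: TCP
          port: 443
```

The three list items are alternatives: traffic is allowed if it matches any one of them on the listed port. Putting namespaceSelector and podSelector in the same list item instead requires both to match.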

Two behaviors surprise newcomers. First, policies are additive and there is no deny rule — you never write “block X.” You instead select a pod (which makes it default-deny) and then allow the narrow set of destinations it needs; anything unlisted is dropped. Second, network policies are only enforced if your CNI plugin implements them. Several managed-cluster defaults historically ignored egress entirely, so applying a policy there gave teams a dangerous false sense of security. Always confirm your CNI enforces egress before you trust a rule in an audit.

There is also a foundational gotcha that deserves its own line: as soon as you add any egress rule, you must explicitly allow DNS. Pods resolve names through CoreDNS in kube-system, and if your egress policy doesn’t permit UDP and TCP traffic to port 53, name resolution dies. The workload then appears to hang on every outbound call, even though the “real” destination rule looks perfectly correct.
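A sketch of a namespace-wide DNS allow rule follows. It assumes the cluster's DNS pods carry the conventional k8s-app: kube-dns label and that the namespace is called payments; both are assumptions to verify in your cluster. Clusters running a node-local DNS cache on a link-local address would likely need an ipBlock rule instead.

```yaml
# ILLUSTRATIVE: assumes DNS pods are labelled k8s-app: kube-dns in kube-system
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: payments
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        # One list item with both selectors: namespace AND pod label must match
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Because this policy selects every pod in the namespace, it also puts all of them into deny-by-default egress with DNS as the only allowed destination, so apply it together with the per-workload allow rules.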


Static CIDR vs DNS-Aware Egress

The core egress decision is whether to allow destinations by IP range (static CIDR) or by domain name (DNS-aware / FQDN). Static CIDR rules are simple, portable, and CNI-agnostic — ideal when the destination is a stable internal address like a VPC-peered RDS subnet. DNS-aware rules trade that portability for the ability to follow endpoints whose IPs rotate, like SaaS APIs and CDN-fronted services.

[Figure: Egress policy decision flow comparing static CIDR rules against DNS-aware FQDN rules]

Static ipBlock rules are the native, vendor-neutral path. They work on any conformant CNI, they’re trivial to audit (a reviewer reads a CIDR and a port), and they add zero runtime machinery. Their weakness is drift. Cloud providers rotate public IPs, CDNs front a single hostname with hundreds of addresses, and SaaS vendors change ranges without notice. As the Cilium community bluntly puts it, “maintaining CIDR lists for egress rules is a constant operational burden that almost always drifts out of date.” For anything internal and stable — your RDS subnet, an internal service mesh range, a peered VPC — that drift never happens and CIDR is the right call. For anything on the public internet, CIDR lists rot.

DNS-aware egress fixes drift by letting you write the rule against api.stripe.com or *.amazonaws.com instead of an IP list. The CNI watches DNS responses, learns which IPs a name currently resolves to, and programs those IPs into the allow-list automatically, expiring them as the DNS TTL elapses. Both Cilium (via toFQDNs in a CiliumNetworkPolicy) and Calico Enterprise (via DNS domain matching) support this, through different mechanisms covered below. The trade-off is lock-in: FQDN matching is a CNI-specific extension, so a policy that uses it is no longer portable plain NetworkPolicy. For destinations that genuinely rotate, that coupling is usually worth it; for stable internal targets, it’s needless complexity.

A useful rule of thumb: reach for CIDR when you can name the IP range and trust it to stay put, and reach for FQDN when the only stable identifier you have is a hostname. Many production policies use both — a CIDR rule for the database, an FQDN rule for a third-party API, and a DNS rule so both can resolve names. Mixing them in one policy is normal and encouraged.
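The combination described above might look like the following CiliumNetworkPolicy sketch, which pairs a CIDR rule for the database, an FQDN rule for one third-party API, and a DNS rule. The labels, subnet, and port choices are assumptions for illustration, and the policy requires the Cilium CNI.

```yaml
# ILLUSTRATIVE: mixed CIDR + FQDN egress; requires Cilium, values are placeholders
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: app-egress-mixed
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  egress:
    # DNS rule so both destinations can be resolved and FQDN IPs learned
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Stable internal target: allow by CIDR
    - toCIDR:
        - 10.20.30.0/24
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP
    # Rotating external target: allow by hostname
    - toFQDNs:
        - matchName: "api.stripe.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```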


How NetworkPolicy Becomes Datapath Rules

A NetworkPolicy is just a desired-state object until your CNI translates it into kernel enforcement. The API server stores the policy; the CNI agent on each node watches for policy and pod changes, then renders the rules into the datapath — either iptables/ipset chains (Calico’s default mode) or compiled eBPF programs and maps (Cilium). The kernel then permits or drops packets on the wire, with no userspace round-trip for a simple L3/L4 decision.

[Figure: How a NetworkPolicy is translated by the CNI into iptables or eBPF datapath rules]

The two datapaths scale very differently, and the difference is not academic once a cluster grows. iptables evaluates packets against linear rule chains, so per-packet cost rises with the number of rules — a thousand policies means longer chains to walk. eBPF instead compiles policies into hash-map lookups attached to well-defined kernel hook points, giving near-constant lookup time regardless of policy count. Per Cilium’s documentation, packets are “classified in the kernel and either permitted or dropped; no packet need reach userspace for a simple L3/L4 policy.” Benchmarking reported in 2026 found that applying 200 policies added a maximum delay of roughly 0.2 ms even for large responses — a profile iptables struggles to match as rules multiply.

This datapath distinction matters for egress specifically because DNS-aware rules are inherently dynamic: resolved IPs churn as TTLs expire and load balancers shuffle backends. eBPF maps can be updated in place the instant a DNS response arrives, which is why FQDN egress feels seamless on Cilium — the proxy learns an IP and writes it straight into a map the datapath already consults. Calico’s iptables mode achieves the equivalent by maintaining ipsets that it updates as DNS is snooped, avoiding a full chain rebuild on every change.

Understanding that your YAML becomes a kernel program (or an iptables chain plus ipsets) also demystifies debugging. When a policy “doesn’t work,” the real questions are whether the agent rendered it correctly into the datapath, whether the pod’s labels actually matched the selector, and whether some other policy also selects the pod. Because native policies are additive, a second policy can only widen the allow-list; an unexpected drop points instead at a selector mismatch or a CNI-specific deny rule. Tools like Hubble and Calico flow logs exist precisely to make that rendered, in-kernel verdict observable instead of guessed at.


RDS-Specific Egress YAML Patterns

For most teams the highest-value egress policy is the one that restricts an app to its database and nothing else. The pattern is identical regardless of CNI: deny all egress, then allow DNS to CoreDNS plus the RDS endpoint on the database port. Because an RDS instance usually sits at a stable private IP inside a VPC-peered subnet, a static ipBlock covering that subnet is often the cleanest, most portable choice — no extra CNI features required.

Here is a working native NetworkPolicy that allows a payments-api pod to reach DNS and a PostgreSQL RDS subnet on port 5432, and nothing else:

# ILLUSTRATIVE — adapt CIDRs, ports, labels, and namespace to your cluster
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-egress-rds-only
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Egress
  egress:
    # 1. Allow DNS resolution to CoreDNS (required for ANY name lookup)
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # 2. Allow PostgreSQL egress to the RDS private subnet only
    - to:
        - ipBlock:
            cidr: 10.20.30.0/24   # RDS subnet CIDR
      ports:
        - protocol: TCP
          port: 5432

Two details make or break this policy. The DNS rule must be present: rule order has no effect, but without it the pod cannot resolve the RDS endpoint name even though the CIDR rule is correct. And the ipBlock must scope the subnet tightly: a /24 for the database subnet is far safer than a /16 that accidentally re-opens the whole VPC. For a Multi-AZ deployment, list each subnet in the DB subnet group, because a failover can move the endpoint to a different subnet. If you run multiple database engines, add one to/ports pair per engine (5432 for PostgreSQL, 3306 for MySQL, 6379 for Redis) rather than widening the port range.
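As a sketch of the one-pair-per-engine idea, the egress fragment below keeps PostgreSQL and MySQL in separate rules so each port is only open toward its own subnet. The second subnet (10.20.40.0/24) is a hypothetical value, and the fragment is meant to replace rule 2 in a policy like the one above, alongside the DNS rule.

```yaml
# ILLUSTRATIVE fragment: goes under spec.egress next to the DNS rule
egress:
  # PostgreSQL: only this subnet, only 5432
  - to:
      - ipBlock:
          cidr: 10.20.30.0/24   # PostgreSQL RDS subnet
    ports:
      - protocol: TCP
        port: 5432
  # MySQL: a different subnet, only 3306
  - to:
      - ipBlock:
          cidr: 10.20.40.0/24   # hypothetical MySQL RDS subnet
    ports:
      - protocol: TCP
        port: 3306
```

Listing both CIDRs and both ports in a single rule would instead allow every port toward every subnet, which is broader than either workload needs.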

When the RDS endpoint is fronted by a name whose IP can change — common with Aurora cluster endpoints, cross-region read replicas, or failover routing — reach for a DNS-aware rule instead. The Cilium equivalent below matches the RDS hostname directly with toFQDNs, and still includes the mandatory DNS allow rule. Without that DNS rule, Cilium’s proxy never sees the lookup and has nothing to learn the IP from, so the FQDN rule matches nothing:

# ILLUSTRATIVE — Cilium FQDN egress; requires the Cilium CNI
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: app-egress-rds-fqdn
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payments-api
  egress:
    # DNS rule lets Cilium's proxy observe lookups and learn resolved IPs
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    # Allow egress only to the resolved RDS hostname on the DB port
    - toFQDNs:
        - matchName: "payments-db.cluster-abc123.us-east-1.rds.amazonaws.com"
      toPorts:
        - ports:
            - port: "5432"
              protocol: TCP

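You can tighten the DNS rule further by replacing matchPattern: "*" with a pattern for the specific RDS domain, so the pod can only resolve database names and nothing else. In Cilium’s pattern syntax a * inside a name matches within a single DNS label and does not span dots, so the pattern needs one wildcard per variable label (for example matchPattern: "*.*.us-east-1.rds.amazonaws.com" for the hostname above); check the matching rules for your Cilium version. That turns DNS itself into part of the allow-list, a subtle but powerful hardening step that native policies cannot express.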
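A sketch of that tightened DNS rule is shown below as a fragment of the Cilium policy's egress list. It assumes the hostname from the example above and assumes that a * wildcard in matchPattern does not cross dot separators, which is worth confirming against the documentation for your Cilium version.

```yaml
# ILLUSTRATIVE fragment: replaces the DNS rule in the CiliumNetworkPolicy above
- toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
  toPorts:
    - ports:
        - port: "53"
          protocol: ANY
      rules:
        dns:
          # Option A: permit lookups for exactly one database hostname
          - matchName: "payments-db.cluster-abc123.us-east-1.rds.amazonaws.com"
          # Option B: permit any two-label RDS name in this region
          - matchPattern: "*.*.us-east-1.rds.amazonaws.com"
```

Either entry alone is enough; matchName is the narrower choice when the workload talks to a single endpoint.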


Common Pitfalls and Debugging

The single most common egress failure is forgetting the DNS allow rule. Once any egress policy selects a pod, all outbound traffic — including the port 53 lookup to CoreDNS — is denied unless you list it explicitly. The symptom looks like a frozen application or vague connection timeouts, but the real cause is that name resolution never completes, so the connection never even gets an IP to dial. Add the DNS rule first, every time, before you debug anything else.

Other recurring traps to check before you escalate:

  • Short DNS TTLs racing the proxy. With FQDN policies, if a record’s TTL is very small, a resolved IP can expire from the allow-list before the connection finishes opening. Cilium recommends setting dnsProxy.minTtl to keep learned entries valid long enough to close that timing gap.
  • ipBlock does not select pods. ipBlock matches raw CIDRs and intentionally ignores pod identity. Use podSelector or namespaceSelector for in-cluster destinations and reserve ipBlock for external IPs and on-prem ranges.
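  • A second policy is also selecting the pod. Native policies are additive: the effective allow-list is the union of every policy that selects the pod, so a second NetworkPolicy can only widen egress, never narrow it. An unexpected allow is often a broader policy you forgot about; an unexpected drop usually means the policy you are editing does not actually select the pod, or a CNI-specific deny rule (Calico’s Deny action, Cilium’s egressDeny) is taking precedence over an allow.
  • Cloud metadata endpoint left open. A deny-by-default egress policy must not re-allow 169.254.169.254, for example through a broad ipBlock such as 0.0.0.0/0 with no except entry. Locking down the metadata endpoint to stop credential exfiltration is one of the biggest reasons to adopt egress policies at all.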
  • Policy applied on a CNI that ignores egress. Confirm your plugin actually enforces egress; a policy on a non-enforcing CNI is documentation, not security.
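For the short-TTL pitfall in the list above, a minimal sketch of the dnsProxy.minTtl setting as a Helm values fragment for the Cilium chart could look like this. The value of 300 seconds is an arbitrary assumption, not a recommendation; pick it based on how quickly your endpoints actually rotate.

```yaml
# ILLUSTRATIVE Helm values fragment for the Cilium chart; 300 is a placeholder
dnsProxy:
  # Minimum time, in seconds, that learned FQDN-to-IP entries stay valid,
  # even if the upstream DNS record carries a shorter TTL
  minTtl: 300
```

A larger floor keeps allow-list entries alive longer, which closes the race but also means stale IPs stay permitted for that long after a record changes.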

To debug, watch the datapath, not just the YAML. On Cilium, Hubble makes verdicts observable — hubble observe --verdict DROPPED shows exactly which flow was denied and which policy denied it, turning a guessing game into a lookup. On Calico, flow logs and calicoctl surface the same drop-with-reason data. Per the CNI debugging community, dropped-flow visibility is the fastest way to distinguish a misconfigured policy from a missing one — and that distinction usually points straight at either the DNS rule or a CIDR that’s scoped wrong.


Calico vs Cilium for Egress

For egress, Calico vs Cilium comes down to datapath and DNS approach. Calico defaults to an iptables/ipset datapath (with an eBPF mode available) and snoops DNS traffic without a dedicated proxy, integrating cleanly with the cluster’s existing CoreDNS. Cilium is eBPF-native and uses a DNS proxy to intercept lookups, then programs the resolved IPs into eBPF maps. Both deliver FQDN egress; they differ chiefly in performance profile and operational complexity.

Tigera, Calico’s creator, argues that snooping DNS “without requiring any modifications to how DNS queries are handled” scales more predictably, because each component scales independently rather than funneling every cluster lookup through a single proxy that can become a bottleneck under heavy DNS load. In their model there is one less moving part to size, monitor, and fail. For teams that want FQDN egress with minimal new infrastructure, that low-friction story is genuinely appealing — especially in clusters already standardized on Calico.
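As a rough sketch of what domain-based egress can look like on the Calico side, the policy below uses the projectcalico.org/v3 NetworkPolicy resource with a domains list in the destination. It assumes a Calico edition that supports DNS policy, the conventional k8s-app: kube-dns label, and the same hypothetical hostname used earlier; some installations also require a tier prefix in the policy name, so treat the metadata as a placeholder.

```yaml
# ILLUSTRATIVE: Calico domain-based egress; requires DNS policy support
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: app-egress-rds-domain
  namespace: payments
spec:
  selector: app == 'payments-api'
  types:
    - Egress
  egress:
    # Allow DNS lookups to the cluster DNS pods (UDP and TCP)
    - action: Allow
      protocol: UDP
      destination:
        namespaceSelector: projectcalico.org/name == 'kube-system'
        selector: k8s-app == 'kube-dns'
        ports:
          - 53
    - action: Allow
      protocol: TCP
      destination:
        namespaceSelector: projectcalico.org/name == 'kube-system'
        selector: k8s-app == 'kube-dns'
        ports:
          - 53
    # Allow PostgreSQL only to the named RDS endpoint
    - action: Allow
      protocol: TCP
      destination:
        domains:
          - payments-db.cluster-abc123.us-east-1.rds.amazonaws.com
        ports:
          - 5432
```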

The Cilium camp counters that an in-kernel eBPF datapath delivers O(1) policy lookups instead of growing iptables chains, plus first-class L7 awareness (HTTP, gRPC, and DNS-level rules) and built-in Hubble observability that makes every allow and deny inspectable. The DNS proxy is the price of that depth: it is the mechanism that lets Cilium both enforce DNS-level rules and learn FQDN-to-IP mappings in real time, writing them into maps the datapath already reads.

Practical guidance: if you already run Calico and need straightforward FQDN egress with the fewest moving parts, its DNS-snooping model is low-friction and battle-tested. If you want L7-aware policy, rich flow visibility, predictable performance as policy counts climb, and a path toward a sidecarless service mesh, Cilium’s eBPF datapath is the stronger long-term foundation. Many platform teams in 2026 standardize on Cilium precisely because the same eBPF layer powers networking, egress security, and mesh together. For a deeper look at Cilium’s eBPF internals and how that datapath underpins L7 policy, see our sidecarless eBPF service mesh deep dive.


Production Hardening for Egress

Production-grade egress means deny-by-default everywhere, then a tight allow-list per workload. Start by applying a namespace-wide default-deny egress policy, add explicit DNS and destination rules for each app, and treat the cloud metadata endpoint and the open internet as forbidden unless a workload has a documented reason to reach them. Then layer observability on top so every denied flow is visible rather than silent.

A pragmatic hardening checklist that holds up under audit:

  • Default-deny first. Apply an egress NetworkPolicy that selects all pods in a namespace with empty egress rules, then add specific allows. This deny-by-default posture is the backbone of a zero-trust network architecture — nothing is trusted implicitly, every path is named.
  • Pin DNS explicitly in every egress-restricted namespace so ordinary lookups and FQDN-based rules both keep working. Where your CNI supports it, scope the DNS rule to the domains a workload may resolve.
  • Prefer CIDR for stable internal targets (RDS subnets, internal services, peered VPCs) and FQDN for rotating external endpoints (SaaS APIs, CDNs). Don’t force one mechanism to do both jobs.
  • Block the metadata endpoint (169.254.169.254/32) so a compromised pod can’t trade its node role for cloud credentials — a common lateral-movement and exfiltration path.
  • Roll out in audit mode where the CNI supports it, observing what real traffic needs before you flip to enforce. This surfaces forgotten dependencies without an outage.
  • Wire up Hubble or flow logs and alert on unexpected denies. An egress policy is only as good as your ability to see what it drops; silent drops become 2 a.m. incidents.
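The first item in the checklist above can be sketched as a single native policy. The namespace name is a placeholder; everything else is the standard default-deny shape: select all pods, declare the Egress policy type, and list no egress rules.

```yaml
# ILLUSTRATIVE: namespace-wide default-deny egress; adapt the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: payments
spec:
  podSelector: {}     # empty selector matches every pod in the namespace
  policyTypes:
    - Egress
  # No egress rules listed, so nothing outbound is allowed by this policy
```

On its own this also blocks DNS, so it is normally applied together with a DNS allow policy and the per-workload allow rules.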

Egress policy is iterative, not a one-shot config. Ship default-deny, observe the drops, widen the allow-list to exactly what the workload proves it needs, and repeat per service. Done well, a compromised pod can reach its database and the two APIs it legitimately calls — and absolutely nothing else, including the metadata endpoint that would otherwise hand an attacker the keys to the account.
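For the rare workload with a documented need for broad internet access, one way to keep the metadata endpoint closed is an ipBlock with except entries. The sketch below is an assumption-laden example: the workload label is hypothetical, HTTPS on port 443 is assumed to be the only protocol needed, and the private ranges are excluded so the rule does not re-open the VPC.

```yaml
# ILLUSTRATIVE: internet egress that still excludes metadata and private ranges
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-fetcher-internet-egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: web-fetcher        # hypothetical workload label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 169.254.169.254/32   # cloud metadata endpoint
              - 10.0.0.0/8           # private ranges stay closed
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - protocol: TCP
          port: 443
```

Because native policies are additive, the except list only limits this rule. Any other policy selecting the same pod that allows 169.254.169.254 would still open it, so audit every policy that matches the workload.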


Frequently Asked Questions


Does Kubernetes block egress traffic by default?
No. By default every pod can send traffic to any destination — other pods, your database, SaaS APIs, and the public internet. Egress is only restricted once a NetworkPolicy with policyTypes: [Egress] selects the pod. At that point the pod becomes deny-by-default outbound, and only the destinations you explicitly list are allowed; everything else is dropped.

Why does my pod lose DNS when I add an egress policy?
Because adding any egress rule switches the pod to deny-by-default for outbound traffic, including the port 53 lookup to CoreDNS. You must add an explicit egress rule allowing UDP and TCP to port 53 toward the kube-system DNS pods. Without it, name resolution silently fails and every outbound connection appears to hang, even when your “real” destination rule is correct.

Can a native NetworkPolicy restrict egress to a domain name?
Not directly. The native NetworkPolicy API is Layer 3/4 only and matches destinations by ipBlock CIDR, podSelector, or namespaceSelector — there is no DNS or FQDN field. To allow egress by hostname you need a CNI extension such as Cilium’s toFQDNs in a CiliumNetworkPolicy or Calico’s DNS domain matching.

How do I restrict pod egress to an AWS RDS instance?
If the RDS endpoint sits at a stable private IP, use a static ipBlock egress rule for the RDS subnet CIDR on the database port (for example 5432 for PostgreSQL), plus a DNS allow rule. If the endpoint IP rotates — common with Aurora cluster endpoints or cross-region replicas — use a DNS-aware FQDN rule that matches the RDS hostname instead, and keep the DNS rule so the name can resolve.

Should I use Calico or Cilium for DNS-aware egress?
Both support FQDN egress. Calico snoops DNS traffic without a dedicated proxy and lets each component scale independently, which suits low-friction setups already on Calico. Cilium uses an eBPF datapath plus a DNS proxy, offering O(1) policy lookups, L7-aware rules, and Hubble observability. Choose based on whether you prioritize operational simplicity (Calico) or eBPF-native depth and L7 visibility (Cilium).

How do I debug a blocked egress connection?
Inspect the datapath, not just the YAML. On Cilium run hubble observe --verdict DROPPED to see which flow was denied and by which policy; on Calico use flow logs and calicoctl. The most common root cause is a missing DNS allow rule, followed by short DNS TTLs on FQDN policies, ipBlock rules used where a pod selector was needed, and a policy whose selector does not actually match the pod.


Written by Riju, a cloud and DevOps engineer focused on Kubernetes security, networking, and platform reliability. Read more about the author and this site on the about page.
