Introduction: The Egress Problem in Kubernetes
When deploying stateful applications to Kubernetes—particularly those requiring database access—network security becomes a critical concern. A pod needs to reach AWS RDS, but you want to prevent any pod from accessing arbitrary external services. Kubernetes NetworkPolicies exist to solve this, but they operate at a layer that introduces subtle challenges: DNS resolution timing, IP address rotation, CIDR block allocation, and the distinction between different network policy implementations.
This guide deconstructs Kubernetes egress policies from first principles. We’ll examine how traffic flows through the OSI stack, why naive CIDR selectors fail in production, how different CNI plugins (Calico, Cilium, Flannel) implement policies at different layers, and how to write YAML that actually works—not just for RDS, but for any external service with dynamic DNS.
By the end, you’ll understand the iptables rules being generated beneath your NetworkPolicy objects, be able to diagnose DNS-induced connectivity failures, and know when to escalate from NetworkPolicy to egress-aware CNI solutions.
Section 1: Fundamental Concepts—What NetworkPolicy Actually Does
1.1 The OSI Layer Problem
Kubernetes NetworkPolicies are a layer-3/4 construct. They operate at the IP/CIDR and port level, making decisions based on source and destination IP addresses, protocols, and ports. This is critical to understand: NetworkPolicies do not see DNS names natively.

When a pod makes a request to mydb.xxx.rds.amazonaws.com, the following happens:
- Application layer (Layer 7): Pod issues HTTP/SQL request to a hostname.
- Transport layer (Layer 4): OS opens a socket to an IP:port.
- Name resolution: the resolver library sends a DNS query (itself a small UDP exchange, typically to kube-dns/CoreDNS) and receives back an IP address.
- Network/transport layers (Layers 3–4): the kernel routes packets to that IP; iptables and netfilter (the kernel subsystem most CNIs use to implement NetworkPolicies) inspect each packet before it leaves the pod's network namespace.
NetworkPolicies only see and control Layers 3 and 4. By the time a packet reaches netfilter, DNS resolution has already occurred. The hostname is gone; only the resolved IP remains. This creates the first production pitfall: the standard policy API has no field for hostnames, and even if it did, the packets being filtered no longer carry one.
1.2 The Default-Deny Model
Kubernetes NetworkPolicies operate on a default-allow model: if no policy exists, all traffic is permitted. The moment you create a NetworkPolicy in a namespace targeting a pod, that pod enters a default-deny state for the direction(s) you’ve defined (ingress or egress).
Consider this simple example:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-ingress
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
This policy says: “For pods labeled app: myapp, only allow ingress from pods labeled app: frontend.” Crucially, this policy does not mention egress, so egress remains unrestricted. The moment you add an egress rule, egress becomes default-deny except for what you explicitly allow.
This is the mental model: policies are cumulative. A pod can be selected by multiple policies, and all their rules are OR'd together; if any policy allows the traffic, it is allowed. Kubernetes has no native deny rules; explicit denies are left to CNI extensions such as Calico's GlobalNetworkPolicy.
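To make the additive model concrete, here is a sketch of a second policy that could coexist with app-ingress above. Pods labeled app: myapp would then accept ingress from frontend pods or from any pod in a namespace labeled team: ops (the team: ops label is illustrative, not from the original example):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-ingress-ops   # coexists with app-ingress; their rules are OR'd
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          team: ops       # hypothetical namespace label for illustration
```

Deleting either policy narrows what is allowed; there is no way for one policy to subtract from another.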
1.3 Ingress vs. Egress Policy Asymmetry
Ingress and egress policies are enforced at different points in the network stack:
- Ingress: Enforced on the target pod. When a packet arrives at pod A, the policy on pod A determines whether to accept it. The source pod is unaware.
- Egress: Enforced on the source pod. When pod B sends a packet, the policy on pod B determines whether to allow it to leave the pod’s network namespace.
This asymmetry has a consequence: ingress policies are simpler because they operate closer to the destination; egress policies must predict where the traffic is going, which requires either static CIDR blocks or DNS-aware mechanisms.
Section 2: Egress Rules Architecture—Static CIDR vs. DNS-Aware Approaches
2.1 Static CIDR Selectors (The Traditional Approach)
The simplest egress policy uses CIDR notation to specify allowed destination IPs. If you know that RDS lives in a specific CIDR block, you can write:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-to-rds
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
  - Egress
  egress:
  # Allow DNS so the pod can resolve hostnames
  - to:
    - namespaceSelector: {}
    ports:
    - protocol: UDP
      port: 53
  # Allow traffic to the RDS CIDR
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16  # VPC CIDR where RDS lives
    ports:
    - protocol: TCP
      port: 3306  # MySQL
Why this works: RDS instances live within a VPC. AWS provides you with the VPC CIDR block (e.g., 10.0.0.0/16). When you deploy your cluster within that VPC, pods’ traffic destined for RDS stays within the VPC and matches the CIDR rule.
Why this is fragile in production:
- CIDR blocks are overly broad: allowing 10.0.0.0/16 permits traffic to all resources in the VPC, not just RDS. If an attacker compromises your pod, they can reach other databases, file shares, and services.
- IP changes: if RDS fails over, it may momentarily get a different IP within the same CIDR. Some DNS-based failover architectures (Route53 weighted records) can even flip between different CIDR blocks in multi-region setups.
- Multi-CIDR scenarios: if your RDS is in a peered VPC or behind a transit gateway, you might need multiple CIDR blocks, and the policy becomes hard to reason about.
- No per-RDS-instance granularity: you cannot write a policy that says "allow only this specific RDS instance"; you can only allow the CIDR.
2.2 DNS-Aware Egress (Cilium’s Approach)
Cilium, an eBPF-based CNI, allows policies to reference DNS names directly. When a pod resolves a DNS name via the cluster’s resolver (CoreDNS), Cilium intercepts the DNS response and learns the IP → DNS name mapping. Subsequent packets matching that IP are allowed if there’s a policy rule for that DNS name.

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: app-to-rds-dns
spec:
  endpointSelector:
    matchLabels:
      app: myapp
  egress:
  - toFQDNs:
    - matchName: "mydb.xxx.rds.amazonaws.com"
    toPorts:
    - ports:
      - port: "3306"
        protocol: TCP
Why this is better:
- Granular control: You specify the exact hostname, not a CIDR block.
- Automatic IP tracking: Cilium watches DNS responses and updates its internal state when IPs change.
- TTL-aware: Cilium respects DNS TTLs, expiring stale IP mappings so the policy does not keep allowing addresses the name no longer resolves to.
The catch: This requires Cilium to be your CNI. It’s not a standard Kubernetes feature. Additionally, it only works if DNS queries flow through the cluster’s resolver; direct IP access or queries to external resolvers bypass the mechanism.
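In practice, a toFQDNs rule is paired with a rule that routes DNS through Cilium's DNS proxy so the agent can observe the responses. A sketch of that extra egress rule, assuming the stock kube-dns/CoreDNS deployment labels:

```yaml
  # Additional egress rule in the same CiliumNetworkPolicy:
  # send DNS to kube-dns via Cilium's DNS proxy so IP mappings are learned
  - toEndpoints:
    - matchLabels:
        k8s:io.kubernetes.pod.namespace: kube-system
        k8s-app: kube-dns
    toPorts:
    - ports:
      - port: "53"
        protocol: UDP
      rules:
        dns:
        - matchPattern: "*"
```

Without a dns rule like this, the agent never sees the resolution and the toFQDNs rule has no IPs to match against.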
2.3 Hybrid: Static IP with Continuous Validation
A production pattern combines static CIDR rules with external DNS monitoring. A controller watches RDS endpoints and updates the NetworkPolicy or a custom resource when IPs change. This requires operational overhead but provides both clarity (you see the exact CIDR being allowed) and resilience (the policy self-heals when RDS IPs change).
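A minimal sketch of such a controller's core loop, with hypothetical names throughout. The resolved IPs here are hard-coded stand-ins; a real controller would obtain them from DNS and then apply the patch with kubectl:

```shell
# Stand-in for: ips=$(dig +short mydb.xxx.rds.amazonaws.com)
ips="10.0.2.15 10.0.3.22"

# Build a merge patch that pins each currently resolved IP as a /32 ipBlock.
blocks=""
for ip in $ips; do
  blocks="${blocks}{\"ipBlock\":{\"cidr\":\"${ip}/32\"}},"
done
patch="{\"spec\":{\"egress\":[{\"to\":[${blocks%,}],\"ports\":[{\"protocol\":\"TCP\",\"port\":3306}]}]}}"
echo "$patch"

# A real controller would then run (note: a merge patch replaces the whole
# egress list, so the DNS rule must be included in the patch as well):
#   kubectl -n production patch networkpolicy app-to-rds --type=merge -p "$patch"
```

The /32 blocks keep the allowed surface minimal; the trade-off is that the controller must re-run before the application re-resolves to a new IP.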
Section 3: Implementation Details—How NetworkPolicies Become Iptables Rules
3.1 The CNI Plugin Layer
NetworkPolicies are a Kubernetes API construct, but they’re not enforced by kube-apiserver. Instead, the network plugin (CNI) reads NetworkPolicy objects and implements them in the kernel. Different CNIs use different mechanisms:
- Calico: Uses iptables (and optionally eBPF) to implement policies. Default behavior is layer-3 CIDR-based.
- Cilium: Uses eBPF hooks at the kernel level for per-packet decision-making, including DNS-aware rules.
- Flannel: Does not implement NetworkPolicies natively; you need an additional policy controller.
- Weave: Implements policies via iptables, similar to Calico.
For standard Kubernetes NetworkPolicy (the networking.k8s.io/v1 API), Calico and Weave are the most common choices. For DNS-aware and L7 features, Cilium is the usual pick.
3.2 Iptables Chain Structure for Egress
When you apply a NetworkPolicy with egress rules to a pod, Calico creates an iptables chain that looks conceptually like this:
Pod Namespace (veth interface)
↓
FORWARD chain checks (nat, filter)
↓
Calico EGRESS chain (cali_EGRESS-MYPOD)
↓
Match destination IP and port
↓
ACCEPT → traffic exits pod namespace
↓
REJECT → traffic dropped, ICMP error returned
For a policy allowing egress to 10.0.0.0/16:3306, Calico generates a rule like:
-A cali_EGRESS-MYPOD -d 10.0.0.0/16 -p tcp -m tcp --dport 3306 -j ACCEPT
If no rule matches, the packet falls through to the chain's default action, which for Calico's policy denies is DROP (an explicit REJECT can be configured via CNI-specific policy).
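For completeness, a conceptual chain for a policy that also allows DNS, ending in the default drop. These are illustrative iptables-save lines; Calico's real generated chain names differ:

```
-A cali_EGRESS-MYPOD -d 10.96.0.10/32 -p udp -m udp --dport 53 -j ACCEPT
-A cali_EGRESS-MYPOD -d 10.0.0.0/16 -p tcp -m tcp --dport 3306 -j ACCEPT
-A cali_EGRESS-MYPOD -j DROP
```

Rule order matters only for overlapping matches; here each rule is disjoint, and everything unmatched hits the final DROP.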
3.3 DNS Traffic and Recursive Lookups
A crucial detail: your policy must explicitly allow DNS.
When a pod resolves mydb.xxx.rds.amazonaws.com, it queries kube-dns (typically 10.96.0.10:53 in the kube-system namespace). This query requires:
- Egress from pod to the DNS service IP on UDP port 53.
- Ingress to the DNS service (but this is handled by the service’s policy, not the pod’s).
- Egress from pod to the destination IP once resolved.
If your egress policy allows only traffic to 10.0.0.0/16 on port 3306, but doesn’t allow UDP port 53 to 10.96.0.10, DNS resolution fails, and the application hangs waiting for DNS.
A typical egress rule must include:
egress:
# Allow DNS queries
- to:
  - namespaceSelector:
      matchLabels:
        name: kube-system  # or use the automatic kubernetes.io/metadata.name label (k8s 1.21+)
  ports:
  - protocol: UDP
    port: 53
# Allow RDS
- to:
  - ipBlock:
      cidr: 10.0.0.0/16
  ports:
  - protocol: TCP
    port: 3306
3.4 Default Egress Rule (Allow External Traffic)
When your egress rules only target peers inside the cluster (via podSelector or namespaceSelector), external destinations are not covered and will be denied. Per the API specification, once Egress appears in policyTypes, anything not explicitly allowed is dropped; in practice, CNI implementations have occasionally differed in how they treat cluster-external traffic, so verify external reachability after applying a policy.
To be safe, include an explicit rule:
egress:
# ... other rules ...
# Allow external traffic (if not already permitted)
- to:
  - ipBlock:
      cidr: 0.0.0.0/0
      except:
      - 169.254.169.254/32  # Block AWS metadata service
Section 4: RDS-Specific Egress Patterns—Production YAML
4.1 Single RDS Instance in Private Subnet
This is the most common scenario: your RDS instance lives in a private subnet, and you know its IP or CIDR block. Your application pod needs to reach it on the standard MySQL/PostgreSQL port.

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-to-rds
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
      tier: backend
  policyTypes:
  - Egress
  egress:
  # Rule 1: Allow DNS for resolution
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53  # TCP DNS for large responses
  # Rule 2: Allow RDS via CIDR (intra-VPC)
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16
    ports:
    - protocol: TCP
      port: 3306
    - protocol: TCP
      port: 5432  # PostgreSQL
  # Rule 3: Allow HTTPS to external services (optional)
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32  # AWS metadata service
        - 10.0.0.0/8          # Private networks
    ports:
    - protocol: TCP
      port: 443
Explanation:
– Rule 1 allows the pod to query DNS within the cluster.
– Rule 2 allows traffic to the entire VPC on MySQL/PostgreSQL ports. In a hardened setup, you’d narrow this further (e.g., 10.0.2.0/24 for the RDS subnet).
– Rule 3 allows HTTPS to the public internet (useful for calling external APIs), but explicitly denies the AWS metadata service and private IP ranges.
Testing (give the test pod the labels the policy selects, or the policy will not apply to it):
kubectl apply -f netpol.yaml
kubectl run -it test-pod --rm --labels="app=api-server,tier=backend" --image=nicolaka/netshoot -- bash
# Inside the pod:
nslookup mydb.xxx.rds.amazonaws.com              # Should resolve
nc -zv -w3 mydb.xxx.rds.amazonaws.com 3306       # TCP connection should establish
mysql -h mydb.xxx.rds.amazonaws.com -u admin -p  # Should connect
4.2 RDS with Aurora Cluster Endpoint
AWS Aurora uses cluster endpoints (e.g., mydb-cluster.cluster-xxx.us-east-1.rds.amazonaws.com) that are DNS-based and may resolve to different IPs over time. Additionally, Aurora auto-scaling can change the IPs of individual instances.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-to-aurora-cluster
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Egress
  egress:
  # DNS resolution
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
  # Aurora endpoints may be in multiple AZs, so we need broader CIDR
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16  # Entire VPC
    ports:
    - protocol: TCP
      port: 3306
Why broader CIDR: Aurora can distribute read replicas across multiple availability zones, each in different subnets. Rather than hardcoding each subnet, the CIDR covers all potential instance placements.
Better approach with Cilium:
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: app-to-aurora
  namespace: production
spec:
  endpointSelector:
    matchLabels:
      app: api-server
  egress:
  - toFQDNs:
    - matchName: "mydb-cluster.cluster-xxx.us-east-1.rds.amazonaws.com"
    toPorts:
    - ports:
      - port: "3306"
        protocol: TCP
With Cilium, the policy explicitly names the Aurora cluster endpoint, and Cilium automatically maintains the IP mappings as instances scale.
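Aurora also exposes a reader endpoint and per-instance endpoints. A sketch of a toFQDNs block covering all of them; the endpoint shapes below are illustrative, so verify the exact hostnames in your account before relying on the pattern:

```yaml
  egress:
  - toFQDNs:
    - matchName: "mydb-cluster.cluster-xxx.us-east-1.rds.amazonaws.com"     # writer endpoint
    - matchName: "mydb-cluster.cluster-ro-xxx.us-east-1.rds.amazonaws.com"  # reader endpoint
    - matchPattern: "*.xxx.us-east-1.rds.amazonaws.com"                     # per-instance endpoints
    toPorts:
    - ports:
      - port: "3306"
        protocol: TCP
```

matchName entries are exact; matchPattern's wildcard matches a single DNS label, which is why the cluster endpoints are listed explicitly.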
4.3 Multi-Region or Cross-VPC RDS
If your RDS is in a different VPC (via peering, transit gateway, or a managed bastion), you need separate CIDR rules for each path.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: app-to-multiregion-rds
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: api-server
  policyTypes:
  - Egress
  egress:
  # DNS
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
  # Primary RDS in VPC A
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16
    ports:
    - protocol: TCP
      port: 3306
  # Secondary RDS in VPC B (via peering)
  - to:
    - ipBlock:
        cidr: 172.16.0.0/16
    ports:
    - protocol: TCP
      port: 3306
  # RDS behind bastion in public subnet (via transit gateway)
  - to:
    - ipBlock:
        cidr: 192.168.1.0/24
    ports:
    - protocol: TCP
      port: 3306
Section 5: Common Pitfalls and Debugging Strategies
5.1 The DNS Timing Trap
Symptom: Pod can reach the destination IP directly but cannot resolve the hostname.
Root cause: The egress policy allows traffic to the destination IP but not to DNS (port 53).
Diagnosis:
kubectl exec -it <pod> -- bash
nslookup mydb.xxx.rds.amazonaws.com # Hangs or times out
Fix: Add a DNS rule to the policy.
5.2 The ICMP Exception
Symptom: TCP connections establish and small queries succeed, but large result sets hang; traceroute shows timeouts.
Root cause: Your policy allows TCP port 3306, but ICMP error packets needed for Path MTU Discovery are being dropped somewhere on the path.
Fix: The standard networking.k8s.io/v1 API only accepts TCP, UDP, and SCTP in ports, so ICMP cannot be expressed in a vanilla NetworkPolicy. In practice, conntrack admits ICMP errors RELATED to an established connection, and Calico's generated rules accept RELATED traffic automatically. If you need an explicit ICMP allowance, use a CNI-specific policy; for example, Calico's v3 API supports ICMP directly:
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-icmp-to-vpc
  namespace: production
spec:
  selector: app == 'api-server'
  egress:
  - action: Allow
    protocol: ICMP
    destination:
      nets: [10.0.0.0/16]
5.3 Metadata Service Blocking
Symptom: Pod works fine with direct RDS connection but fails when using AWS SDK (e.g., boto3) to assume IAM roles.
Root cause: The policy blocks 169.254.169.254 (AWS metadata service), which the SDK queries for temporary credentials.
Fix: Allow the metadata service:
egress:
- to:
  - ipBlock:
      cidr: 169.254.169.254/32
  ports:
  - protocol: TCP
    port: 80
Or avoid the metadata service entirely with EKS Pod Identity or IRSA (IAM Roles for Service Accounts), which deliver credentials to the pod without access to 169.254.169.254.
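One such alternative is IRSA on EKS: the pod's service account is annotated with the role to assume, and the AWS SDK exchanges a projected web identity token for credentials instead of querying IMDS. A sketch (the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api-server
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/rds-access  # placeholder ARN
```

Note that the SDK still needs egress on TCP 443 to reach STS, so an HTTPS rule like Rule 3 in Section 4.1 remains necessary.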
5.4 RDS IP Changes During Failover
Symptom: Pod connects fine, but after RDS failover, new connections timeout.
Root cause: RDS failover changes the IP of the endpoint. Your CIDR rule covers it, but DNS resolution returns a new IP, and the connection string is stale in application code.
Debugging:
# Check if IP changes:
while true; do
echo "$(date): $(nslookup mydb.xxx.rds.amazonaws.com | grep Address | tail -1)"
sleep 10
done
Fix: Always use DNS names in connection strings, not IPs. Application libraries (JDBC, SQLAlchemy) should be configured with DNS names so they re-resolve on connection failures.
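When an IP does change, it is also worth confirming the new address still falls inside the policy's allowed ipBlock. A small shell helper for that check (hypothetical, for debugging only; feed it the output of dig +short):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  oldIFS=$IFS; IFS=.
  set -- $1
  IFS=$oldIFS
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if the IP is inside the CIDR.  usage: in_cidr <ip> <cidr>
in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# e.g. check a freshly resolved RDS IP against the policy's ipBlock:
in_cidr 10.0.5.9 10.0.0.0/16 && echo "still covered" || echo "OUTSIDE policy CIDR"
```

If the check fails after a failover, the policy (not the application) is what needs updating.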
5.5 Diagnosing Policies with netshoot
Deploy a debug pod on the node and examine its iptables rules (a regular pod only sees its own network namespace, not the node's Calico chains):
kubectl debug node/<node-name> -it --image=nicolaka/netshoot
# List iptables chains for Calico (may require a privileged debug profile)
iptables -L -n | grep -i cali
# Check a specific chain (replace with an actual chain name from the list)
iptables -L cali_EGRESS-<pod-name> -n -v
# Monitor connections
tcpdump -i eth0 'tcp port 3306' -A
# Test DNS
dig mydb.xxx.rds.amazonaws.com @10.96.0.10
Section 6: Calico vs Cilium—Choosing Your CNI
6.1 Calico’s Approach

Strengths:
– Mature, widely adopted, excellent documentation.
– Works with standard Kubernetes NetworkPolicy API.
– Low overhead (iptables-based).
– Integrates well with AWS (no special IAM roles needed).
Limitations:
– Layer 3 only; no DNS-aware rules.
– CIDR selectors are inflexible.
– No built-in egress endpoint isolation (you can’t say “allow only this RDS instance”).
Best for: Traditional on-prem or cloud deployments where you can predict and hardcode CIDR blocks.
6.2 Cilium’s Approach
Strengths:
– eBPF-based, with policy enforcement from layers 3–4 up to layer 7.
– DNS-aware egress rules (game-changer for dynamic endpoints).
– Advanced features like L7 policy (allow only specific HTTP paths) and sidecar-free service-mesh-style traffic management.
– Dramatically better observability and debugging.
Limitations:
– Requires eBPF-capable kernel (Linux 4.18+); not available on older systems.
– Higher operational complexity (requires understanding eBPF concepts).
– Larger resource footprint (especially with advanced features enabled).
– AWS integration requires specific IAM configurations.
Best for: Modern cloud environments with dynamic services and applications that need granular control (e.g., zero-trust security model).
6.3 Decision Tree

The decision tree below summarizes the choice:
Does your RDS IP change?
├─ No, static IP or narrow CIDR → Calico + NetworkPolicy
└─ Yes, dynamic endpoints
├─ Can you run eBPF? (kernel 4.18+) → Cilium
└─ No → Calico + external DNS monitor + policy controller
Section 7: Production Patterns and Hardening
7.1 The Four-Layer Isolation Pattern
In a production system, NetworkPolicy is one layer of defense:
Layer 1: AWS Security Groups (VM-level, outside k8s)
↓ (RDS requires inbound on 3306 from your cluster security group)
Layer 2: Kubernetes NetworkPolicy (pod-level CIDR/DNS rules)
↓ (controls which pods can egress to which IPs)
Layer 3: Application-level authentication (database user/password)
↓ (RDS enforces user privileges, query limits, etc.)
Layer 4: Observability and audit (CloudTrail, RDS logs)
↓ (detects anomalous queries)
A compromised pod hitting RDS must pass all four layers. If the pod's policy allows RDS access, the attempt moves to layer 3, where the database enforces authentication and privileges. If an unauthorized query gets through, layer 4 logs it.
7.2 Namespace-Level Policies
Apply a default-deny policy to all namespaces, then add specific allowances:
---
# Default deny all egress in the production namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress
  namespace: production
spec:
  podSelector: {}  # Apply to all pods
  policyTypes:
  - Egress
  egress: []  # No rules = deny all
---
# Allow only essential services (DNS, RDS, etc.)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-and-rds
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
    ports:
    - protocol: UDP
      port: 53
  - to:
    - ipBlock:
        cidr: 10.0.0.0/16
    ports:
    - protocol: TCP
      port: 3306
7.3 Monitoring and Alerting on Policy Violations
Calico and Cilium both provide metrics on policy hits and denies. Integrate these into your monitoring:
# Calico: Felix (the node agent) exports Prometheus metrics
kubectl get po -n calico-system   # or kube-system, depending on install method
# Felix exposes metrics on port 9091 when prometheusMetricsEnabled is set
# Cilium: Hubble provides metrics and flow visualization
cilium hubble ui
Set up alerts for:
– High rates of policy denies (indicates misconfiguration or attack)
– Policy violations from unexpected pods
– DNS failures (correlated with connectivity failures)
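As a sketch, the first alert could be expressed as a PrometheusRule against Cilium's drop counter. The namespace, threshold, and metric labels below are assumptions to verify against your Prometheus Operator and Cilium versions:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: network-policy-denies
  namespace: monitoring
spec:
  groups:
  - name: network-policy
    rules:
    - alert: HighEgressDenyRate
      expr: sum(rate(cilium_drop_count_total{reason="Policy denied"}[5m])) > 10
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Sustained NetworkPolicy denies; possible misconfiguration or probe"
```

A Calico-based equivalent would key off Felix's denied-packet metrics instead.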
Conclusion: From Policy Definition to Runtime Enforcement
Kubernetes NetworkPolicies for RDS egress seem simple on the surface—allow traffic to a CIDR block on a port—but their correctness depends on understanding several layers:
- OSI layer alignment: Policies operate at layer 3, but DNS is layer 7. You must bridge this gap explicitly.
- CNI implementation details: Calico uses iptables chains; Cilium uses eBPF. Each has different capabilities and limitations.
- AWS-specific considerations: RDS lives in VPCs; your policy must account for CIDR blocks, failovers, and Aurora’s distributed nature.
- Operational reality: IPs change, networks are misconfigured, and debugging requires netshoot and iptables inspection.
The most robust production approach combines:
– Static CIDR rules for predictability and visibility.
– DNS-aware rules (via Cilium or external monitoring) for resilience.
– Default-deny policies to make the security posture explicit.
– Continuous observability to catch policy violations and misconfigurations early.
Start with Calico and hardcoded CIDR blocks. As your environment scales and endpoints become dynamic, graduate to Cilium’s DNS-aware policies. Monitor aggressively, and always test policies in staging before production.
