Calico on EC2 caveat
Users running Calico on AWS EC2 with multi-subnet VXLAN reported pod-to-pod DNS failures that were resolved only by disabling the EC2 source/destination check, a networking fix specific to Kubernetes overlays on EC2. (x.com)
Tigera's Calico documentation explicitly requires disabling AWS source/destination checks when Calico assigns pod IPs that are not native EC2 addresses. The check drops any packet whose source or destination is not the instance's own address, which blocks routed pod traffic between instances. (docs.tigera.io) Felix exposes the awsSrcDstCheck setting (also available as the FELIX_AWSSRCDSTCHECK environment variable). Setting it to Disable makes Calico turn off the EC2 src/dst check automatically, but the node IAM role must include permissions such as ec2:ModifyNetworkInterfaceAttribute and ec2:DescribeInstances for that to work. (docs.tigera.io)

Multiple Project Calico issue threads recorded a VXLAN regression in v3.28/v3.29 in which DNS resolution failed across nodes and succeeded only when CoreDNS was co-located on the same host, indicating an encapsulation or path problem rather than a CoreDNS process crash. (github.com) A separate Calico GitHub issue documents kube-dns connectivity breaking after Calico was installed, with pods unable to reach the cluster DNS IP. Troubleshooting guidance in those threads points to underlying EC2 networking (src/dst checks, routes, security groups) as the common root cause. (github.com)

AWS provides a CLI/API call to toggle the check per ENI, ModifyNetworkInterfaceAttribute (the SourceDestCheck attribute, exposed in the CLI as --source-dest-check / --no-source-dest-check), and community troubleshooting posts note that disabling the EC2 source/destination check takes effect immediately without a Calico restart. (docs.aws.amazon.com) Installation and ops tooling (kOps, EKS community guides) recommends setting FELIX_AWSSRCDSTCHECK=Disable, or the equivalent FelixConfiguration setting awsSrcDstCheck: Disable, and attaching an IAM policy to the nodes so that Calico can call ec2:ModifyNetworkInterfaceAttribute on node ENIs automatically. (github.com) Operational configs for Calico on AWS using VXLAN also call out opening UDP 4789 for VXLAN and related control ports (for example TCP 5473, used by Typha) in node security groups, because blocked encapsulation traffic can manifest as cross-node DNS and pod connectivity failures. (sbulav.github.io)
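
The settings described above can be collected in one place. The following is a minimal sketch rather than a vetted configuration: it only builds and prints the documents, and does not call AWS or Kubernetes. The function names are hypothetical. The wildcard IAM Resource, the FelixConfiguration name "default", and the shape of the port list are assumptions for illustration. The action names, the awsSrcDstCheck value, and the ports are the ones named in the text.

```python
import json


def node_iam_policy():
    """IAM policy document carrying the two EC2 actions named in the text."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "ec2:DescribeInstances",
                    "ec2:ModifyNetworkInterfaceAttribute",
                ],
                # Assumption: wildcard resource for brevity; scope it down
                # in a real deployment where possible.
                "Resource": "*",
            }
        ],
    }


def felix_configuration():
    """FelixConfiguration manifest that asks Calico to disable the check."""
    return {
        "apiVersion": "projectcalico.org/v3",
        "kind": "FelixConfiguration",
        # Assumption: the cluster-wide resource named "default" is the target.
        "metadata": {"name": "default"},
        "spec": {"awsSrcDstCheck": "Disable"},
    }


def node_security_group_ports():
    """Ports the text says to open between nodes (illustrative shape only)."""
    return [
        {"protocol": "udp", "port": 4789, "purpose": "VXLAN encapsulation"},
        {"protocol": "tcp", "port": 5473, "purpose": "Typha"},
    ]


if __name__ == "__main__":
    # JSON is also valid YAML, so the manifest can be saved to a file as-is.
    print(json.dumps(node_iam_policy(), indent=2))
    print(json.dumps(felix_configuration(), indent=2))
    for rule in node_security_group_ports():
        print(f'{rule["protocol"]}/{rule["port"]}: {rule["purpose"]}')
```

The printed FelixConfiguration could be saved to a file and applied with calicoctl (or with kubectl where the Calico API server is installed). To flip the check by hand on a single ENI instead, the AWS CLI equivalent is aws ec2 modify-network-interface-attribute with --network-interface-id and --no-source-dest-check.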