Deep Calico troubleshooting on AWS EC2
A practical thread walks through Calico networking failures on Kubernetes clusters running on AWS EC2, diagnosing packet drops tied to VXLAN CrossSubnet mode and the EC2 Source/Destination Check, and showing fixes such as disabling the check. It is a concise, hands-on reference for infrastructure engineers debugging Pod-to-Pod networking on cloud VMs. (x.com)
Calico’s FelixConfiguration supports setting awsSrcDstCheck: Disable so that Felix turns off the AWS source/destination check on nodes automatically. (github.com) The check can also be disabled per instance with the AWS CLI: aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check. (awscli.amazonaws.com)

When an IPPool is configured with vxlanMode or ipipMode set to CrossSubnet, Calico leaves intra-subnet traffic unencapsulated and applies VXLAN or IP-in-IP only across VPC subnet boundaries. (bookstack.cn) VXLAN encapsulation in Calico uses UDP port 4789 and adds 50 bytes of overhead per packet (outer Ethernet, IP, UDP, and VXLAN headers). Node security groups must allow UDP/4789 in both directions, or encapsulated packets are dropped. (oneuptime.com)

Calico’s AWS guidance explicitly warns that AWS src/dst checks must be disabled when Calico assigns pod IPs outside the EC2 instance IP range; otherwise native (unencapsulated) routing within a VPC subnet does not work. (docs.tigera.io)

The Calico troubleshooting guide documents diagnostic commands such as sudo calicoctl node diags and kubectl logs -n calico-system <pod_name> for collecting node and component logs during pod-to-pod connectivity failures. (docs.tigera.io)

Published incident writeups of Calico on self-managed AWS highlight three recurring root causes: security groups blocking VXLAN or BGP traffic, VPC src/dst checks left enabled, and MTU mismatches. Each needs its own targeted fix: open UDP/4789, disable src/dst checks, or align the MTU. (github.com)
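
The CrossSubnet behaviour described above can be illustrated with a small Python sketch. This is not Calico's implementation: the function name needs_encapsulation and the idea of passing in a list of VPC subnet CIDRs are assumptions made for illustration. It only shows the decision rule, where same-subnet traffic is routed natively (which is why the src/dst check matters there) and cross-subnet traffic is encapsulated (which is why UDP/4789 matters there).

```python
import ipaddress


def needs_encapsulation(src_node_ip, dst_node_ip, subnet_cidrs):
    """Hypothetical helper: True if traffic between two nodes would be
    encapsulated under a CrossSubnet-style rule.

    subnet_cidrs is an assumed input listing the VPC subnets in use.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]

    def subnet_of(ip):
        # Find the first listed subnet that contains this node IP.
        addr = ipaddress.ip_address(ip)
        for net in networks:
            if addr in net:
                return net
        return None

    src_net = subnet_of(src_node_ip)
    dst_net = subnet_of(dst_node_ip)

    # Assumption of this sketch: if a node is in no known subnet,
    # treat the path as cross-subnet and encapsulate.
    if src_net is None or dst_net is None:
        return True

    # Same subnet: native routing. Different subnets: VXLAN or IP-in-IP.
    return src_net != dst_net


subnets = ["10.0.1.0/24", "10.0.2.0/24"]
print(needs_encapsulation("10.0.1.10", "10.0.1.20", subnets))  # False
print(needs_encapsulation("10.0.1.10", "10.0.2.20", subnets))  # True
```

When debugging, a rule of this shape helps narrow the search: failures only between nodes in the same subnet point toward the src/dst check, while failures only across subnets point toward security groups or MTU.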
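
The MTU alignment fix comes down to simple arithmetic, sketched below in Python. The 50-byte VXLAN figure is the one quoted above; the 20-byte IP-in-IP figure (one extra IPv4 header) and the example NIC MTUs of 1500 and 9001 are added here as assumptions for illustration, and pod_mtu is a hypothetical helper name rather than anything Calico provides.

```python
# Per-packet encapsulation overhead in bytes.
# vxlan: 50 bytes as stated above; ipip: one outer IPv4 header (assumed here).
ENCAP_OVERHEAD = {
    "none": 0,
    "ipip": 20,
    "vxlan": 50,
}


def pod_mtu(nic_mtu, encapsulation):
    """Hypothetical helper: largest pod-interface MTU that still fits
    inside the node NIC MTU once encapsulation headers are added."""
    if encapsulation not in ENCAP_OVERHEAD:
        raise ValueError(f"unknown encapsulation: {encapsulation}")
    return nic_mtu - ENCAP_OVERHEAD[encapsulation]


print(pod_mtu(1500, "vxlan"))  # 1450
print(pod_mtu(9001, "vxlan"))  # 8951
print(pod_mtu(1500, "ipip"))   # 1480
```

With CrossSubnet mode only some traffic is encapsulated, but the pod MTU presumably still has to be sized for the encapsulated path, since any given pod may talk to peers in another subnet.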