AWS Networking: VPC and Transit Gateway

Notes from running multi-VPC AWS networking at scale. NAT Gateway cost traps, when peering breaks down, why Transit Gateway and PrivateLink are worth the hourly fee.

Friday afternoon, I was digging through our AWS bill ahead of a hackathon pitch at the creator economy platform I worked at. Compute was fine. RDS was fine, mostly. That was a different fight I was already winning. The line that wouldn’t stop growing was networking. Specifically, NAT Gateway data processing. We were paying per gigabyte for traffic that mostly didn’t need to leave the VPC in the first place. I’d been on AWS networking for years and still managed to sleep on it. Most teams do.

This post is the thing I wish a previous me had read. VPC peering vs Transit Gateway vs PrivateLink, where each one actually fits, and which AWS networking choices quietly cost the most.

The cost trap nobody talks about

The default EKS setup has private subnets, a NAT Gateway per AZ, and a chunky security group story. Pods need to reach S3, ECR, Secrets Manager, your CI registry, maybe a vendor API. By default, every byte of that egress goes out through NAT. NAT Gateway charges a per-hour fee plus a per-GB data processing fee. The hourly is small. The per-GB is what kills you.

The other quiet leak is cross-AZ. EC2 to EC2 across availability zones in the same region is not free. Pods scheduled randomly across three AZs that chat heavily? That bill shows up on the data transfer line, not on EC2.

I’m not saying don’t use multi-AZ. I’m saying notice when chatty pairs are getting scheduled across AZs and pin them when it matters.

VPC peering when it actually fits

VPC peering is a 1:1 handshake between two VPCs. No transitive routing. If A peers with B and B peers with C, A still can’t talk to C unless A also peers with C. The route tables don’t get smarter. You do.

It’s the right call when you have a small, fixed topology. Two VPCs. Maybe three. Both in the same account, or split across two accounts you control. Latency is essentially native. Within a region, peered traffic is priced like traffic inside one VPC: free when it stays in the same AZ, the usual cross-AZ rate when it doesn’t. That’s fine for a low-traffic link.

resource "aws_vpc_peering_connection" "platform_to_data" {
  vpc_id        = aws_vpc.platform.id
  peer_vpc_id   = aws_vpc.data.id
  peer_owner_id = var.data_account_id
  auto_accept   = false

  tags = {
    Name = "platform-to-data"
  }
}

resource "aws_vpc_peering_connection_accepter" "data_side" {
  provider                  = aws.data_account
  vpc_peering_connection_id = aws_vpc_peering_connection.platform_to_data.id
  auto_accept               = true
}

# this is the part teams forget. peering does nothing without routes.
resource "aws_route" "platform_to_data" {
  route_table_id            = aws_route_table.platform_private.id
  destination_cidr_block    = aws_vpc.data.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.platform_to_data.id
}

resource "aws_route" "data_to_platform" {
  provider                  = aws.data_account
  route_table_id            = aws_route_table.data_private.id
  destination_cidr_block    = aws_vpc.platform.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.platform_to_data.id
}

The trap: someone keeps adding peerings. A full mesh of N VPCs needs N(N-1)/2 peerings, so by the time you’re at five VPCs, that’s ten. Every new VPC means one more peering with each existing VPC, plus new routes in the route tables on both sides of each one. CIDR overlap surprises start showing up. Route table sprawl becomes a quiet operational tax. That’s when you should have stopped.

Transit Gateway when the mesh grows

Past four VPCs, Transit Gateway is the right call. Hub and spoke. Each VPC attaches to the TGW once. Routing is governed by TGW route tables, which you control per attachment. Multi-account works cleanly via AWS Resource Access Manager.

Yes, it’s pricier per gigabyte than peering. Yes, there’s a per-attachment hourly fee. But you stop maintaining N-squared peering connections and route entries by hand. You get one place to reason about routing. And you can segment by giving prod attachments a different TGW route table than sandbox attachments.

resource "aws_ec2_transit_gateway" "main" {
  description                     = "shared TGW"
  amazon_side_asn                 = 64512
  auto_accept_shared_attachments  = "enable"
  default_route_table_association = "disable"
  default_route_table_propagation = "disable"
}

resource "aws_ec2_transit_gateway_route_table" "prod" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  tags = { Name = "tgw-rt-prod" }
}

resource "aws_ec2_transit_gateway_vpc_attachment" "platform" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = aws_vpc.platform.id
  subnet_ids         = aws_subnet.platform_private[*].id

  tags = { Name = "tgw-attach-platform" }
}

resource "aws_ec2_transit_gateway_route_table_association" "platform_prod" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.platform.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}

resource "aws_ec2_transit_gateway_route_table_propagation" "platform_prod" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.platform.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id
}
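
For the multi-account case, the TGW has to be shared through RAM before another account can attach to it. A minimal sketch, assuming the spoke account ID is the same var.data_account_id used in the peering example, that both accounts sit in one AWS Organization with RAM sharing enabled, and that the share name is a placeholder:

```hcl
# Share the TGW from the network account. The name is illustrative.
resource "aws_ram_resource_share" "tgw" {
  name                      = "shared-tgw"
  allow_external_principals = false # assumes principals are inside the same Organization
}

# Put the TGW into the share.
resource "aws_ram_resource_association" "tgw" {
  resource_arn       = aws_ec2_transit_gateway.main.arn
  resource_share_arn = aws_ram_resource_share.tgw.arn
}

# Let the spoke account see the shared TGW and create attachments against it.
resource "aws_ram_principal_association" "data_account" {
  principal          = var.data_account_id
  resource_share_arn = aws_ram_resource_share.tgw.arn
}
```

With auto_accept_shared_attachments set to "enable" on the TGW, attachments created from the spoke account are accepted without a manual step. They still do nothing until you associate and propagate them explicitly.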

Defaulting the TGW to disable auto-association and auto-propagation is the one thing I push on every review. The default-on behavior is convenient until it isn’t, and then a sandbox VPC ends up reachable from prod and you find out via a Datadog query, which is not the kind of discovery anyone wants.
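
Here is roughly what the sandbox side of that segmentation could look like. This is a sketch, not our exact config: aws_vpc.sandbox, aws_subnet.sandbox_private, aws_route_table.sandbox_private, and var.sandbox_supernet_cidr are hypothetical names that don’t appear earlier in this post.

```hcl
# A separate TGW route table for sandbox attachments.
resource "aws_ec2_transit_gateway_route_table" "sandbox" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  tags               = { Name = "tgw-rt-sandbox" }
}

resource "aws_ec2_transit_gateway_vpc_attachment" "sandbox" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = aws_vpc.sandbox.id
  subnet_ids         = aws_subnet.sandbox_private[*].id

  tags = { Name = "tgw-attach-sandbox" }
}

# Traffic arriving from the sandbox attachment is routed using the sandbox table.
resource "aws_ec2_transit_gateway_route_table_association" "sandbox" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.sandbox.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.sandbox.id
}

# The sandbox CIDR is advertised into the sandbox table only, never into prod.
resource "aws_ec2_transit_gateway_route_table_propagation" "sandbox" {
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_vpc_attachment.sandbox.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.sandbox.id
}

# VPC side: same rule as peering, nothing flows without a route to the TGW.
resource "aws_route" "sandbox_to_tgw" {
  route_table_id         = aws_route_table.sandbox_private.id
  destination_cidr_block = var.sandbox_supernet_cidr
  transit_gateway_id     = aws_ec2_transit_gateway.main.id

  depends_on = [aws_ec2_transit_gateway_vpc_attachment.sandbox]
}
```

Because the prod table never receives a sandbox propagation and the sandbox table never receives a prod one, neither side has a TGW route to the other. Reachability between them becomes something you add on purpose.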

PrivateLink and endpoint routing

This is the one most teams sleep on. Every byte your pods send to S3, ECR, Secrets Manager, STS, or SSM via the public AWS endpoint goes out through the NAT Gateway by default. That’s processing fees on traffic that never needed to touch the NAT Gateway at all.

S3 and DynamoDB have a free Gateway endpoint. It’s a no-brainer. Add it to every VPC, today.

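resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.platform.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.platform_private[*].id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect    = "Allow"
        Principal = "*"
        Action    = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
        Resource = [
          "arn:aws:s3:::platform-artifacts",
          "arn:aws:s3:::platform-artifacts/*",
        ]
      },
      {
        # ECR serves image layers from this AWS-owned bucket. Once S3 traffic
        # goes through the endpoint, pulls fail unless the policy allows it.
        Effect    = "Allow"
        Principal = "*"
        Action    = ["s3:GetObject"]
        Resource  = ["arn:aws:s3:::prod-${var.region}-starport-layer-bucket/*"]
      },
    ]
  })
}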

resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.platform.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.platform_private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = aws_vpc.platform.id
  service_name        = "com.amazonaws.${var.region}.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.platform_private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
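
The DynamoDB gateway endpoint follows the same shape as the S3 one. A minimal sketch reusing the same VPC and route tables, with no endpoint policy, which means the default full-access policy applies:

```hcl
# Gateway endpoint for DynamoDB: no hourly fee, no per-GB fee.
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = aws_vpc.platform.id
  service_name      = "com.amazonaws.${var.region}.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = aws_route_table.platform_private[*].id
}
```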

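Interface endpoints aren’t free: each one bills an hourly fee per AZ plus a per-GB charge. But the per-GB is meaningfully lower than NAT’s and the traffic skips the NAT processing fee entirely. ECR pulls were the single biggest reason our NAT bill was big before we added the dkr and api endpoints. ECR needs all three pieces, the api and dkr interface endpoints plus the S3 gateway endpoint, because the image layers themselves are served from S3. Once they were in, the pull bandwidth disappeared from the NAT graph the next day.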
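
The ECR endpoints above reference aws_security_group.vpc_endpoints without showing it, and the other services from the list (Secrets Manager, STS, SSM) need interface endpoints of their own. One way this could look, assuming HTTPS from anywhere inside the platform VPC is an acceptable ingress rule for you:

```hcl
# Assumed definition of the security group the endpoints reference:
# allow HTTPS in from anything inside the platform VPC.
resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = aws_vpc.platform.id

  ingress {
    description = "HTTPS from inside the VPC"
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_vpc.platform.cidr_block]
  }
}

# One interface endpoint per remaining service. Trim the set to what you
# actually use, since every endpoint carries its own hourly charge.
resource "aws_vpc_endpoint" "interface" {
  for_each = toset(["secretsmanager", "sts", "ssm"])

  vpc_id              = aws_vpc.platform.id
  service_name        = "com.amazonaws.${var.region}.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.platform_private[*].id
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
```

With private_dns_enabled, the regular service hostnames resolve to the endpoint’s private IPs inside the VPC, so pods and SDKs need no config change.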

Before you decide what to add, look at the bill. CloudWatch has the bytes per NAT Gateway:

# date -d is GNU syntax. On macOS/BSD, use: date -u -v-14d +%FT%TZ
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --dimensions Name=NatGatewayId,Value=nat-0abc123def \
  --statistics Sum \
  --period 86400 \
  --start-time "$(date -u -d '14 days ago' +%FT%TZ)" \
  --end-time "$(date -u +%FT%TZ)"

Sort by volume across NAT Gateways for the last 14 days. The top destinations from VPC flow logs will tell you what to point at endpoints first. Usually S3, then ECR, then your CI provider’s IP range.
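
If flow logs aren’t on yet, something like this would get you the destination data. It’s a sketch: the bucket is hypothetical, and the custom format is just one reasonable field selection, not the only one.

```hcl
# Hypothetical bucket for flow log delivery; not part of the setup shown earlier.
resource "aws_s3_bucket" "flow_logs" {
  bucket_prefix = "platform-vpc-flow-logs-"
}

resource "aws_flow_log" "platform" {
  vpc_id                   = aws_vpc.platform.id
  traffic_type             = "ALL"
  log_destination_type     = "s3"
  log_destination          = aws_s3_bucket.flow_logs.arn
  max_aggregation_interval = 600

  # The doubled dollar sign escapes Terraform interpolation so AWS receives
  # the literal field placeholders.
  # pkt-srcaddr and pkt-dstaddr carry the original packet addresses when
  # traffic is relayed through an intermediate interface such as a NAT Gateway.
  # pkt-dst-aws-service labels destinations that belong to an AWS service.
  log_format = "$${interface-id} $${srcaddr} $${dstaddr} $${pkt-srcaddr} $${pkt-dstaddr} $${pkt-dst-aws-service} $${bytes} $${start} $${end}"
}
```

Filter the records to the NAT Gateway’s network interface, sum bytes by pkt-dstaddr or pkt-dst-aws-service, and the endpoint shortlist mostly writes itself.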

Cross-region the boring way

If you’re going cross-region, Transit Gateway peering between two TGWs is the cleaner version of inter-region VPC peering. It composes with your existing TGW route tables, no one-off peerings sprinkled around.
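
A rough sketch of the peering itself, assuming the existing TGW lives in us-east-1 and a second one in eu-central-1. The aws.eu_central_1 provider alias, aws_ec2_transit_gateway.eu, and var.eu_vpc_cidr are hypothetical names for illustration. One detail that catches people: TGW peering attachments don’t support route propagation, so the routes across the peering are static.

```hcl
# Request the peering from the us-east-1 side.
resource "aws_ec2_transit_gateway_peering_attachment" "use1_to_euc1" {
  transit_gateway_id      = aws_ec2_transit_gateway.main.id
  peer_transit_gateway_id = aws_ec2_transit_gateway.eu.id
  peer_region             = "eu-central-1"

  tags = { Name = "tgw-peer-use1-euc1" }
}

# Accept it on the eu-central-1 side.
resource "aws_ec2_transit_gateway_peering_attachment_accepter" "eu" {
  provider                      = aws.eu_central_1
  transit_gateway_attachment_id = aws_ec2_transit_gateway_peering_attachment.use1_to_euc1.id
}

# Static route in the prod table: EU-bound traffic goes over the peering.
resource "aws_ec2_transit_gateway_route" "prod_to_eu" {
  destination_cidr_block         = var.eu_vpc_cidr
  transit_gateway_attachment_id  = aws_ec2_transit_gateway_peering_attachment.use1_to_euc1.id
  transit_gateway_route_table_id = aws_ec2_transit_gateway_route_table.prod.id

  # The route can only be created once the peering has been accepted.
  depends_on = [aws_ec2_transit_gateway_peering_attachment_accepter.eu]
}
```

The EU side needs the mirror image: a static route for the us-east-1 CIDRs in its own TGW route table, pointing at the same peering attachment.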

The thing nobody tells you up front is the latency budget. Frankfurt to Virginia is roughly 85 ms round-trip on a good day. If your read path crosses regions synchronously, your p99 just inherited that floor. Replication is the boring answer.

Takeaways

NAT Gateway per-GB is the bill you don’t see coming. Pull the bytes-out metric, point the top destinations at endpoints.
Peering for small fixed meshes. Transit Gateway past four VPCs. Don’t try to “just one more peering” your way out of that.
S3 and DynamoDB gateway endpoints are free. There is no excuse to not have them.
Disable TGW default association and propagation. Be explicit per attachment.
Cross-region isn’t a routing problem, it’s a latency budget. Replicate, don’t read across.

Thanks for reading. If you’ve got thoughts, send them my way.