AWS VPC Deep Dive: Complete Networking Guide for Cloud Engineers (2026)
AWS networking is the foundation that everything else sits on. Whether you're deploying a web application, a database cluster, or a microservices platform, it all lives inside a VPC (Virtual Private Cloud). Get the networking wrong and you get security vulnerabilities, high costs, and connectivity issues that are painful to debug.
This guide teaches AWS VPC from scratch, then builds up to the production patterns used by teams running large-scale AWS workloads. The material is core knowledge for the AWS Solutions Architect Associate (SAA-C03) exam, and the examples are hands-on Terraform configuration.
What is a VPC?
A Virtual Private Cloud is your own isolated network within AWS. Think of it as your own private data center network, but running on AWS infrastructure.
Every AWS account gets a default VPC in each region, and every subnet in it is public. For production workloads, create custom VPCs with proper CIDR planning, subnet segmentation, and access controls instead.
Key VPC properties:
- A VPC spans all Availability Zones in a region
- VPCs are isolated — resources in different VPCs can't communicate by default
- Each VPC has a CIDR block (IP address range) you define
- The default quota is 5 VPCs per region (a soft limit; you can request an increase)
CIDR Notation
Every VPC has a CIDR block. Understanding CIDR is essential.
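10.0.0.0/16    the VPC's total IP range
         ^^    prefix length: the first 16 bits are fixed, leaving 16 host bits (2^16 = 65,536 addresses)
Breaking this down (AWS reserves 5 addresses in every subnet, the first four and the last one, so usable = total - 5):
- /16 = 65,536 addresses (a large VPC, and the largest block AWS allows)
- /24 = 256 addresses, 251 usable (a typical subnet)
- /28 = 16 addresses, 11 usable (the smallest subnet AWS allows)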
Production VPC sizing (total addresses, before per-subnet reservations):
- Small: 10.0.0.0/24 (256 addresses)
- Medium: 10.0.0.0/20 (4,096 addresses)
- Large: 10.0.0.0/16 (65,536 addresses), the recommended default
Multi-VPC strategy: allocate /16 blocks per environment
- 10.0.0.0/16 — production
- 10.1.0.0/16 — staging
- 10.2.0.0/16 — development
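The per-environment allocation above can be derived instead of typed by hand. The following is a minimal Terraform sketch, assuming the whole 10.0.0.0/8 range is reserved for these VPCs. The local names `supernet`, `environments`, and `vpc_cidrs` are illustrative and not part of any existing configuration. It relies on Terraform's built-in `cidrsubnet` function.

```hcl
locals {
  # Assumption: 10.0.0.0/8 is reserved for all environments
  supernet     = "10.0.0.0/8"
  environments = ["production", "staging", "development"]

  # cidrsubnet(prefix, newbits, netnum): /8 + 8 new bits = one /16 per environment
  vpc_cidrs = {
    for i, name in local.environments :
    name => cidrsubnet(local.supernet, 8, i)
  }
}

output "vpc_cidrs" {
  value = local.vpc_cidrs
}
```

With the list order shown, this evaluates to 10.0.0.0/16 for production, 10.1.0.0/16 for staging, and 10.2.0.0/16 for development. Append new environments to the end of the list; reordering it would reassign blocks.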
Subnets
A subnet is a range of IP addresses within your VPC, tied to a single Availability Zone.
Public subnet: Its route table has a route to an Internet Gateway. Resources in a public subnet can receive inbound internet connections if they have a public or Elastic IP address and their security groups allow the traffic.
Private subnet: No direct route to the internet. Resources can still reach the internet via a NAT Gateway (for software updates, external APIs) but are not reachable from the internet.
Production Subnet Design
For a three-tier architecture, use this pattern:
VPC: 10.0.0.0/16
  AZ us-east-1a:
    Public:   10.0.1.0/24   (ALB, NAT Gateway, bastion host)
    Private:  10.0.10.0/24  (ECS tasks, EC2, Lambda)
    Database: 10.0.20.0/24  (RDS, ElastiCache; never public)
  AZ us-east-1b:
    Public:   10.0.2.0/24
    Private:  10.0.11.0/24
    Database: 10.0.21.0/24
  AZ us-east-1c:
    Public:   10.0.3.0/24
    Private:  10.0.12.0/24
    Database: 10.0.22.0/24
Why three tiers?
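- Public: Internet-facing entry points only (load balancers, NAT Gateways, a bastion host). These subnets need few IPs.
- Private: Your application logic. This tier usually consumes the most IPs, so give it the most address space as the workload grows. No inbound internet.
- Database: Isolated. Only accepts connections from the private (application) subnets.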
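The subnet plan above follows a regular pattern, so it can be computed. This is a sketch rather than a required approach: it assumes a VPC resource named `aws_vpc.main` already exists, and the local names and the `Tier` tag are illustrative. Only the private tier is shown as a resource; the other tiers would follow the same shape.

```hcl
locals {
  vpc_cidr = "10.0.0.0/16"
  azs      = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # /16 + 8 new bits = /24; the offset selects the tier
  public_cidrs   = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i + 1)]  # 10.0.1.0/24 to 10.0.3.0/24
  private_cidrs  = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i + 10)] # 10.0.10.0/24 to 10.0.12.0/24
  database_cidrs = [for i, az in local.azs : cidrsubnet(local.vpc_cidr, 8, i + 20)] # 10.0.20.0/24 to 10.0.22.0/24
}

# One private subnet per Availability Zone
resource "aws_subnet" "private" {
  count             = length(local.azs)
  vpc_id            = aws_vpc.main.id # assumed to be defined elsewhere
  cidr_block        = local.private_cidrs[count.index]
  availability_zone = local.azs[count.index]

  tags = {
    Name = "private-${local.azs[count.index]}"
    Tier = "private"
  }
}
```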
Route Tables
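Every subnet is associated with exactly one route table, which determines where traffic leaving the subnet goes. When several routes match a destination, the most specific one wins.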
Public subnet route table:
Destination    Target
10.0.0.0/16    local      ← VPC-internal traffic
0.0.0.0/0      igw-xxx    ← Everything else goes to the internet
Private subnet route table:
Destination    Target
10.0.0.0/16    local
0.0.0.0/0      nat-xxx    ← Outbound only via NAT Gateway
Database subnet route table:
Destination    Target
10.0.0.0/16    local      ← Only internal traffic, no internet
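For readers building the VPC from raw resources, the public and database tables above might look like the following sketch. It assumes `aws_vpc.main`, `aws_internet_gateway.main`, and a counted `aws_subnet.public` resource exist elsewhere; those names are placeholders.

```hcl
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  # The "local" route for the VPC CIDR is created automatically by AWS.
  tags = { Name = "public" }
}

# Default route: everything that is not VPC-internal goes to the Internet Gateway
resource "aws_route" "public_internet" {
  route_table_id         = aws_route_table.public.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.main.id
}

# Associate every public subnet with the public route table
resource "aws_route_table_association" "public" {
  count          = length(aws_subnet.public)
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# Database route table: no extra routes, so only the implicit local route applies
resource "aws_route_table" "database" {
  vpc_id = aws_vpc.main.id
  tags   = { Name = "database" }
}
```

A subnet with no explicit association falls back to the VPC's main route table, so it is worth associating every subnet explicitly.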
Internet Gateway vs NAT Gateway
Internet Gateway (IGW):
- Attached to the VPC (one per VPC)
- Enables two-way internet communication for resources with public IPs
- Free — no per-hour charge, only data transfer costs
NAT Gateway:
- Placed in a public subnet
- Allows private subnet resources to initiate outbound internet connections
- Blocks all inbound connections from the internet
- Costs about $0.045/hour per NAT Gateway plus $0.045/GB of data processed (us-east-1 list price; other regions differ)
- Deploy one per AZ for high availability
Public subnet:  instance ↔ [Internet Gateway] ↔ Internet (two-way)
Private subnet: instance → [NAT Gateway in a public subnet] → [Internet Gateway] → Internet (outbound only)
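The outbound path for private subnets can be sketched in Terraform as follows. This is one possible layout, not the only one: it assumes `aws_vpc.main`, a counted `aws_subnet.public`, and a counted `aws_route_table.private` (one per AZ) are defined elsewhere, and it targets AWS provider v5 or later.

```hcl
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id # one Internet Gateway per VPC
}

# Each NAT Gateway needs its own Elastic IP
resource "aws_eip" "nat" {
  count  = length(aws_subnet.public)
  domain = "vpc" # AWS provider v5+; older versions used `vpc = true`
}

# One NAT Gateway per AZ, placed in that AZ's PUBLIC subnet
resource "aws_nat_gateway" "main" {
  count         = length(aws_subnet.public)
  allocation_id = aws_eip.nat[count.index].id
  subnet_id     = aws_subnet.public[count.index].id

  # The NAT Gateway cannot reach the internet until the IGW exists
  depends_on = [aws_internet_gateway.main]
}

# Each private route table sends internet-bound traffic to the NAT Gateway in its own AZ
resource "aws_route" "private_nat" {
  count                  = length(aws_route_table.private)
  route_table_id         = aws_route_table.private[count.index].id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.main[count.index].id
}
```

Keeping each private subnet on the NAT Gateway in its own AZ avoids cross-AZ data charges and means one AZ failing does not cut off outbound access for the others.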
Security Groups vs Network ACLs
Both filter traffic, but at different levels and with different behavior.
Security Groups (instance-level firewall)
- Stateful: If you allow inbound port 443, the response traffic is automatically allowed outbound (no explicit outbound rule needed)
- Allow rules only: You can't explicitly deny traffic (only allow)
- Attached to: EC2 instances, ECS tasks, RDS, Lambda (in VPC), Load Balancers
- Evaluation: All rules are evaluated together; traffic is allowed if any rule matches it
# ALB security group: allow inbound 80/443 from the internet
resource "aws_security_group" "alb" {
  name_prefix = "alb-"
  vpc_id      = aws_vpc.main.id

  # HTTP
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # HTTPS (from_port must also be 443; 80-443 would open every port in between)
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# App security group: only accept traffic from the ALB
resource "aws_security_group" "app" {
  name_prefix = "app-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 3000
    to_port         = 3000
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id] # Only from ALB!
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS security group: only accept traffic from the app
resource "aws_security_group" "database" {
  name_prefix = "db-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.app.id] # Only from app!
  }

  # No egress block: Terraform removes the default allow-all egress rule,
  # so the database cannot initiate outbound connections. Replies to the
  # app still work because security groups are stateful.
}
Network ACLs (subnet-level firewall)
- Stateless: You must explicitly allow both inbound AND outbound traffic
- Allow AND deny rules: Can block specific IPs
- Attached to: Subnets (applies to all resources in the subnet)
- Evaluation: Rules processed in numerical order, first match wins
NACLs are a second layer of defense. Best practice: use security groups as your primary protection, NACLs for broad subnet-level blocks (e.g., blocking known malicious IP ranges).
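As an illustration of that pattern, the sketch below blocks one CIDR range at the subnet boundary and allows everything else, leaving fine-grained filtering to security groups. The blocked range 203.0.113.0/24 is a documentation placeholder, and `aws_vpc.main` and a counted `aws_subnet.public` are assumed to exist.

```hcl
resource "aws_network_acl" "public" {
  vpc_id     = aws_vpc.main.id
  subnet_ids = aws_subnet.public[*].id

  # Lowest rule number is evaluated first: drop the blocked range
  ingress {
    rule_no    = 50
    action     = "deny"
    protocol   = "-1"
    cidr_block = "203.0.113.0/24" # placeholder range, replace with real data
    from_port  = 0
    to_port    = 0
  }

  # Everything else inbound is allowed; security groups filter further
  ingress {
    rule_no    = 100
    action     = "allow"
    protocol   = "-1"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  # NACLs are stateless, so return traffic needs its own outbound rule
  egress {
    rule_no    = 100
    action     = "allow"
    protocol   = "-1"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 0
  }

  tags = { Name = "public" }
}
```

Because the first matching rule wins, the deny at rule 50 takes effect before the broad allow at rule 100. Leaving gaps between rule numbers makes it easier to slot in new rules later.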
Complete Production VPC in Terraform
# terraform/vpc/main.tf
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "production"
  cidr = "10.0.0.0/16"
  azs  = ["us-east-1a", "us-east-1b", "us-east-1c"]

  # Public subnets (ALB, NAT gateways)
  public_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]

  # Private subnets (application tier)
  private_subnets = ["10.0.10.0/24", "10.0.11.0/24", "10.0.12.0/24"]

  # Database subnets (RDS, ElastiCache)
  database_subnets                   = ["10.0.20.0/24", "10.0.21.0/24", "10.0.22.0/24"]
  create_database_subnet_group       = true
  create_database_subnet_route_table = true # No internet route for DB subnets

  # One NAT Gateway per AZ for high availability
  enable_nat_gateway     = true
  single_nat_gateway     = false
  one_nat_gateway_per_az = true

  enable_dns_hostnames = true
  enable_dns_support   = true

  # Required tags for EKS (if you'll run Kubernetes)
  public_subnet_tags = {
    "kubernetes.io/role/elb"           = 1
    "kubernetes.io/cluster/production" = "shared"
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb"  = 1
    "kubernetes.io/cluster/production" = "shared"
  }

  tags = {
    Environment = "production"
    ManagedBy   = "terraform"
  }
}
VPC Endpoints
VPC endpoints keep traffic between your VPC and AWS services on the AWS network, instead of routing it through a NAT Gateway or over the internet.
# S3 gateway endpoint (FREE; always use this)
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = module.vpc.private_route_table_ids

  # Allows all S3 actions through the endpoint; tighten this to specific
  # buckets or actions if you want the endpoint to act as a guardrail.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = "*"
      Action    = ["s3:*"]
      Resource  = "*"
    }]
  })
}

# DynamoDB gateway endpoint (FREE)
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id            = module.vpc.vpc_id
  service_name      = "com.amazonaws.us-east-1.dynamodb"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = module.vpc.private_route_table_ids
}

# ECR interface endpoints (PAID; needed for ECS Fargate without internet access).
# Image layers are stored in S3, so the S3 gateway endpoint above is also required.
# aws_security_group.vpc_endpoints must allow inbound 443 from the VPC CIDR.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.us-east-1.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "ecr_dkr" {
  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.us-east-1.ecr.dkr"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  security_group_ids  = [aws_security_group.vpc_endpoints.id]
  private_dns_enabled = true
}
Why endpoints matter:
- Security: Traffic never leaves the AWS network
- Cost: S3 and DynamoDB gateway endpoints are free, and traffic through them avoids NAT Gateway data processing charges
- Performance: Lower latency, with no internet round-trip
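The interface endpoints above reference `aws_security_group.vpc_endpoints`, which is not defined in this guide. A minimal sketch of what it could look like follows; the name prefix is arbitrary, and allowing the whole VPC CIDR is an assumption you may want to narrow to the application security group.

```hcl
resource "aws_security_group" "vpc_endpoints" {
  name_prefix = "vpc-endpoints-"
  vpc_id      = module.vpc.vpc_id

  # Interface endpoints are reached over HTTPS from inside the VPC
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [module.vpc.vpc_cidr_block]
  }

  # No egress rules: endpoints only answer requests, and security groups are stateful
}
```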
VPC Peering
Connect two VPCs so resources can communicate using private IPs.
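# Peer production VPC with data VPC (same account, same region)
resource "aws_vpc_peering_connection" "prod_to_data" {
  vpc_id      = module.vpc_production.vpc_id # Requester
  peer_vpc_id = module.vpc_data.vpc_id       # Accepter

  # auto_accept only works for same-account, same-region peering and cannot be
  # combined with peer_region. For cross-region peering, set peer_region and
  # accept on the other side with aws_vpc_peering_connection_accepter.
  auto_accept = true

  tags = {
    Name = "production-to-data"
  }
}

# Add routes in both directions, one per private route table
# (with one NAT Gateway per AZ, each AZ has its own private route table)
resource "aws_route" "prod_to_data" {
  count                     = length(module.vpc_production.private_route_table_ids)
  route_table_id            = module.vpc_production.private_route_table_ids[count.index]
  destination_cidr_block    = module.vpc_data.vpc_cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_data.id
}

resource "aws_route" "data_to_prod" {
  count                     = length(module.vpc_data.private_route_table_ids)
  route_table_id            = module.vpc_data.private_route_table_ids[count.index]
  destination_cidr_block    = module.vpc_production.vpc_cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection.prod_to_data.id
}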
VPC Peering limitations:
- No transitive routing (A peers with B and C, but B and C can't reach each other through A)
- Routes must be added manually to every relevant route table on both sides
- CIDR blocks cannot overlap
- Better alternative for many VPCs: AWS Transit Gateway
Transit Gateway
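As a rule of thumb, once you connect five or more VPCs, Transit Gateway is far more manageable than VPC Peering. A full mesh of n VPCs needs n(n-1)/2 peering connections (10 for 5 VPCs), while Transit Gateway needs one attachment per VPC.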
resource "aws_ec2_transit_gateway" "main" {
  description                     = "Central routing hub"
  default_route_table_association = "enable"
  default_route_table_propagation = "enable"
  amazon_side_asn                 = 64512

  tags = { Name = "central-transit-gateway" }
}

# Attach each VPC to the Transit Gateway
resource "aws_ec2_transit_gateway_vpc_attachment" "production" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = module.vpc_production.vpc_id
  subnet_ids         = module.vpc_production.private_subnets

  tags = { Name = "production" }
}

resource "aws_ec2_transit_gateway_vpc_attachment" "staging" {
  transit_gateway_id = aws_ec2_transit_gateway.main.id
  vpc_id             = module.vpc_staging.vpc_id
  subnet_ids         = module.vpc_staging.private_subnets

  tags = { Name = "staging" }
}
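Attachments alone do not move traffic: each VPC's own route tables still need a route that points at the Transit Gateway. The sketch below shows one way to add those routes, assuming every attached VPC lives inside 10.0.0.0/8 (as in the multi-VPC plan earlier) and reusing the module names from the attachments above. The resource names are illustrative.

```hcl
# Send traffic for other VPCs (anything in 10.0.0.0/8) to the Transit Gateway.
# Intra-VPC traffic still uses the more specific local /16 route.
resource "aws_route" "production_to_tgw" {
  count                  = length(module.vpc_production.private_route_table_ids)
  route_table_id         = module.vpc_production.private_route_table_ids[count.index]
  destination_cidr_block = "10.0.0.0/8" # assumption: all VPC CIDRs fall in this range
  transit_gateway_id     = aws_ec2_transit_gateway.main.id

  # The attachment must exist before a route can target the gateway
  depends_on = [aws_ec2_transit_gateway_vpc_attachment.production]
}

resource "aws_route" "staging_to_tgw" {
  count                  = length(module.vpc_staging.private_route_table_ids)
  route_table_id         = module.vpc_staging.private_route_table_ids[count.index]
  destination_cidr_block = "10.0.0.0/8"
  transit_gateway_id     = aws_ec2_transit_gateway.main.id

  depends_on = [aws_ec2_transit_gateway_vpc_attachment.staging]
}
```

With default association and propagation enabled on the gateway, the Transit Gateway route table learns each VPC's CIDR automatically, so only the VPC-side routes need to be written by hand.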
VPC Flow Logs
Enable VPC flow logs for security auditing and network debugging:
resource "aws_cloudwatch_log_group" "vpc_flow_logs" {
  name              = "/aws/vpc/flow-logs"
  retention_in_days = 30
}

resource "aws_flow_log" "main" {
  # Role that lets the flow logs service write to CloudWatch Logs
  iam_role_arn    = aws_iam_role.vpc_flow_logs.arn
  log_destination = aws_cloudwatch_log_group.vpc_flow_logs.arn
  traffic_type    = "ALL" # ACCEPT, REJECT, or ALL
  vpc_id          = module.vpc.vpc_id

  tags = { Name = "production-flow-logs" }
}
Flow logs record metadata about accepted and rejected traffic, not packet contents: source and destination IP, ports, protocol, packet and byte counts, and the ACCEPT or REJECT action. Invaluable for:
- Detecting port scans and unauthorized access attempts
- Debugging connectivity issues ("why can't pod A reach pod B?")
- Security compliance requirements
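The flow log resource above references `aws_iam_role.vpc_flow_logs`, which this guide does not define. A minimal sketch of such a role follows, assuming CloudWatch Logs as the destination. The name prefixes and data source names are illustrative, and the wildcard resource is a simplification you may want to scope to the log group ARN.

```hcl
# Trust policy: only the VPC Flow Logs service may assume this role
data "aws_iam_policy_document" "flow_logs_assume" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["vpc-flow-logs.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "vpc_flow_logs" {
  name_prefix        = "vpc-flow-logs-"
  assume_role_policy = data.aws_iam_policy_document.flow_logs_assume.json
}

# Permissions needed to publish flow log records to CloudWatch Logs
data "aws_iam_policy_document" "flow_logs_write" {
  statement {
    actions = [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents",
      "logs:DescribeLogGroups",
      "logs:DescribeLogStreams",
    ]
    resources = ["*"] # consider scoping to the flow log group ARN
  }
}

resource "aws_iam_role_policy" "vpc_flow_logs" {
  name_prefix = "vpc-flow-logs-"
  role        = aws_iam_role.vpc_flow_logs.id
  policy      = data.aws_iam_policy_document.flow_logs_write.json
}
```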
SAA-C03 VPC Exam Tips
The AWS Solutions Architect Associate exam loves VPC questions. Know these cold:
- Public subnet = has a route to an Internet Gateway in its route table
- Private subnet = no direct internet route; uses NAT Gateway for outbound
- NAT Gateway lives in a PUBLIC subnet, not private
- Internet Gateway is attached to the VPC, not subnets
- Security Groups: stateful, instance-level, allow only
- NACLs: stateless, subnet-level, allow and deny
- VPC Endpoints: keep traffic inside AWS network, no internet required
- VPC Peering: no transitive routing, can't overlap CIDRs
- Transit Gateway: hub-and-spoke, handles transitive routing