How Aurora Serverless v2 Scaling Works
Aurora Serverless v2 runs on dedicated infrastructure within AWS but scales the compute allocation (measured in Aurora Capacity Units, or ACUs) continuously in response to observed load. Each ACU represents approximately 2 GiB of memory and proportional CPU. A cluster configured with a minimum of 0.5 ACU and maximum of 128 ACU can in principle scale from the equivalent of a nano instance to roughly a 256 GiB memory instance.
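The ACU arithmetic above can be sketched as a quick helper. The 2 GiB-per-ACU figure is AWS's stated approximation, and the helper name is ours:

```python
# Rough ACU-to-memory arithmetic for Aurora Serverless v2.
# AWS states each ACU is approximately 2 GiB of memory with
# proportional CPU; treat these as estimates, not guarantees.

def acu_to_memory_gib(acu: float) -> float:
    """Approximate memory backing a given ACU allocation."""
    return acu * 2.0

# The configured range from the text: 0.5 ACU minimum, 128 ACU maximum.
print(acu_to_memory_gib(0.5))   # 1.0 GiB: nano-instance territory
print(acu_to_memory_gib(128))   # 256.0 GiB: large provisioned instance
```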
The scaling mechanism is reactive: Aurora monitors CPU utilization, connection count, and memory pressure, then adjusts ACU allocation up or down. Scale-up happens in ACU increments and takes time — typically 15–90 seconds per scaling step depending on the magnitude of the increase. Scale-down is more gradual to avoid oscillation.
This is the fundamental constraint. Aurora Serverless v2 is not "instant scale." It is "continuous scale with a reactive loop." The reactive loop has latency, and that latency is the source of cold-start-adjacent behavior.
The Minimum ACU Setting Is the Most Important Parameter
The single configuration decision with the largest impact on burst behavior is the minimum ACU setting. A cluster configured with min_capacity = 0.5 ACU idles at 0.5 ACU and must scale up when traffic arrives. A cluster configured with min_capacity = 4 ACU is always running at the equivalent of a small-to-medium RDS instance and has both spare buffer pool capacity and spare connection headroom to absorb moderate bursts without scaling.
# View current serverless v2 scaling configuration
aws rds describe-db-clusters \
--db-cluster-identifier your-cluster-id \
--query 'DBClusters[0].ServerlessV2ScalingConfiguration'
# Output:
# {
# "MinCapacity": 0.5,
# "MaxCapacity": 16.0
# }
# Modify minimum ACU to reduce burst sensitivity
aws rds modify-db-cluster \
--db-cluster-identifier your-cluster-id \
--serverless-v2-scaling-configuration MinCapacity=2,MaxCapacity=16 \
--apply-immediately
The cost implication is real: a minimum of 0.5 ACU costs roughly $0.06/hour in us-east-1 (2025 pricing); a minimum of 4 ACU costs roughly $0.48/hour. For a cluster that genuinely idles to near-zero traffic for extended periods (overnight, weekends), the 0.5 ACU minimum makes sense. For a cluster that has a predictable business-hours traffic pattern with near-zero off-hours load, a scheduled scaling approach is better than a permanent minimum.
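The arithmetic behind those figures, assuming the us-east-1 list price of $0.12 per ACU-hour (verify current pricing; this helper is illustrative):

```python
# Idle-floor cost of a minimum-ACU setting, assuming $0.12 per
# ACU-hour (us-east-1 list price at the time of writing; verify).
ACU_HOUR_USD = 0.12
HOURS_PER_MONTH = 730  # average month

def idle_floor_monthly_cost(min_acu: float) -> float:
    """Monthly cost of the capacity floor if the cluster never
    scales above its minimum."""
    return min_acu * ACU_HOUR_USD * HOURS_PER_MONTH

for min_acu in (0.5, 2, 4):
    print(f"min {min_acu} ACU: ${idle_floor_monthly_cost(min_acu):.2f}/month")
```

The gap between a 0.5 and a 4 ACU floor is a few hundred dollars a month, which is why the scheduled approach below is worth the extra moving parts for predictable traffic.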
Diagnosing Burst-Related Scaling Lag
The CloudWatch metrics that expose scaling lag for Aurora Serverless v2:
# ServerlessDatabaseCapacity: the cluster's current ACU allocation over time
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name ServerlessDatabaseCapacity \
--dimensions Name=DBClusterIdentifier,Value=your-cluster-id \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Average Maximum \
--output table
# DatabaseConnections — spike here before ACU spike indicates connection pressure
aws cloudwatch get-metric-statistics \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBClusterIdentifier,Value=your-cluster-id \
--start-time $(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ) \
--end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
--period 60 \
--statistics Average Maximum \
--output table
The pattern to look for: a spike in DatabaseConnections followed 30–90 seconds later by a spike in ServerlessDatabaseCapacity. The gap between these two events is the scaling lag window during which the cluster is under-provisioned for the incoming load. Connection timeouts and elevated latency concentrate in this window.
If you see connection spikes that do not produce a corresponding ACU increase, the cluster may be hitting the maximum ACU cap — scale-up is requested but capped at your configured maximum.
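The connection-spike-then-ACU-spike pattern can be checked programmatically. This sketch works on datapoint lists already fetched from CloudWatch rather than calling the API, and the threshold logic and function names are our own illustration:

```python
# Estimate scaling lag: seconds between a DatabaseConnections spike
# and the subsequent ServerlessDatabaseCapacity increase.
# Input: lists of (epoch_seconds, value) tuples, e.g. parsed from
# get-metric-statistics output. Thresholds are workload-specific.

def spike_onset(series, threshold):
    """First timestamp at which the value crosses the threshold."""
    for ts, value in sorted(series):
        if value >= threshold:
            return ts
    return None

def scaling_lag_seconds(connections, capacity, conn_threshold, acu_threshold):
    conn_spike = spike_onset(connections, conn_threshold)
    acu_spike = spike_onset(capacity, acu_threshold)
    if conn_spike is None or acu_spike is None:
        return None
    return max(0, acu_spike - conn_spike)

# Synthetic example: connections jump at t=120, ACU catches up at t=180.
conns = [(60, 15), (120, 240), (180, 250)]
acus = [(60, 0.5), (120, 0.5), (180, 4.0)]
print(scaling_lag_seconds(conns, acus, conn_threshold=100, acu_threshold=2))  # 60
```

A lag estimate of zero across bursts means the minimum ACU is already absorbing your traffic; a consistent 30+ second lag is the signal to raise it.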
The Connection Limit Problem at Low ACU
Aurora Serverless v2 derives the default max_connections value from the memory of the maximum configured ACU and holds it constant while the instance scales, so the nominal ceiling does not move with current capacity. The practical ceiling does. Each PostgreSQL connection consumes working memory, and an instance idling at 0.5 ACU has only about 1 GiB to split between the buffer pool and per-connection overhead.
This means a cluster sitting at minimum ACU after an idle period cannot comfortably absorb a burst of 200 connections: those connections stall or time out under memory pressure while Aurora scales. Serving capacity scales with ACU; ACU scales with load; but load arrives before ACU can catch up. This is the circular dependency that causes cold-start symptoms.
-- Check the current connection limit on Aurora PostgreSQL
SHOW max_connections;
-- For Aurora Serverless v2, the default is computed from the
-- maximum configured ACU and held constant as the instance scales
SELECT
    setting AS max_connections_current,
    unit
FROM pg_settings
WHERE name = 'max_connections';
RDS Proxy is the standard mitigation for the connection limit problem. RDS Proxy maintains a persistent connection pool to the Aurora cluster and multiplexes application connections onto a smaller number of database connections. Under a burst, the proxy handles hundreds of application connections while the cluster scales, queuing excess connections in the proxy pool rather than failing them outright.
# Create an RDS Proxy for an Aurora Serverless v2 cluster
aws rds create-db-proxy \
--db-proxy-name your-proxy-name \
--engine-family POSTGRESQL \
--auth '[{"AuthScheme":"SECRETS","SecretArn":"arn:aws:secretsmanager:...","IAMAuth":"DISABLED"}]' \
--role-arn arn:aws:iam::account-id:role/rds-proxy-role \
--vpc-subnet-ids subnet-xxx subnet-yyy \
--vpc-security-group-ids sg-xxx
# Register the Aurora cluster with the proxy
aws rds register-db-proxy-targets \
--db-proxy-name your-proxy-name \
--db-cluster-identifiers your-cluster-id
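Independently of RDS Proxy, application code can cushion the lag window by retrying connections with backoff. A minimal sketch of the pattern; the connect callable is injected rather than tied to a specific driver, and all names here are illustrative:

```python
import random
import time

# Retry a database connection with exponential backoff and jitter:
# a client-side cushion for the scaling lag window. `connect` is any
# zero-argument callable that raises on failure (e.g. a wrapper
# around psycopg2.connect); it is injected here for testability.

def connect_with_backoff(connect, max_attempts=5, base_delay=0.5,
                         sleep=time.sleep):
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff with full jitter, capped at 30 s.
            delay = min(30, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

With the defaults above the worst-case total wait is roughly 7.5 seconds; tune max_attempts and base_delay so the total covers the lag window you actually observe in CloudWatch.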
Scheduled Scaling for Predictable Traffic Patterns
For workloads with predictable burst windows — business-hours OLTP, daily batch jobs, weekly reporting runs — scheduled minimum ACU changes eliminate the reactive lag for known events:
import boto3
from datetime import datetime

rds = boto3.client('rds', region_name='us-east-1')

def set_min_capacity(cluster_id: str, min_acu: float, max_acu: float):
    rds.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ServerlessV2ScalingConfiguration={
            'MinCapacity': min_acu,
            'MaxCapacity': max_acu
        },
        ApplyImmediately=True
    )
    print(f"{datetime.now()}: Set {cluster_id} to min={min_acu} max={max_acu} ACU")

# Called by EventBridge Scheduler at business hours start (e.g., 7:55 AM ET Mon-Fri)
def pre_warm_for_business_hours(event, context):
    set_min_capacity('your-cluster-id', min_acu=4.0, max_acu=32.0)

# Called by EventBridge Scheduler at business hours end (e.g., 7:00 PM ET Mon-Fri)
def scale_down_after_hours(event, context):
    set_min_capacity('your-cluster-id', min_acu=0.5, max_acu=32.0)
The pre-warm Lambda runs 5 minutes before expected traffic to give Aurora time to reach the new minimum before requests arrive. This eliminates scaling lag for the morning traffic burst that catches most Serverless v2 clusters unprepared.
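Wiring those handlers to EventBridge Scheduler might look like the following sketch. The schedule names are illustrative, and the Lambda and IAM role ARNs are placeholders supplied by the caller:

```python
# Sketch: create the two weekday schedules with EventBridge Scheduler.
# Schedule names are illustrative; ARNs are caller-supplied placeholders.

def weekday_cron(hour: int, minute: int) -> str:
    """EventBridge cron expression for Mon-Fri at the given local time."""
    return f"cron({minute} {hour} ? * MON-FRI *)"

def create_prewarm_schedules(prewarm_lambda_arn: str,
                             scaledown_lambda_arn: str,
                             scheduler_role_arn: str):
    import boto3  # imported lazily so weekday_cron is testable offline
    scheduler = boto3.client('scheduler')
    for name, arn, (hour, minute) in [
        ('aurora-prewarm-business-hours', prewarm_lambda_arn, (7, 55)),
        ('aurora-scaledown-after-hours', scaledown_lambda_arn, (19, 0)),
    ]:
        scheduler.create_schedule(
            Name=name,
            ScheduleExpression=weekday_cron(hour, minute),
            ScheduleExpressionTimezone='America/New_York',
            FlexibleTimeWindow={'Mode': 'OFF'},
            Target={'Arn': arn, 'RoleArn': scheduler_role_arn},
        )
```

Setting ScheduleExpressionTimezone keeps the 7:55 AM trigger aligned with Eastern Time across daylight-saving transitions, which a raw UTC cron would not.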
Workloads That Are Poor Fits for Serverless v2
Aurora Serverless v2 is a good fit for variable workloads with at least occasional quiet periods. It is a poor fit for:
- Sustained high-throughput OLTP with consistent load: A cluster that never scales below 8 ACU because it's always busy should just be a provisioned r6g.2xlarge — the Serverless v2 pricing model provides no savings at sustained high utilization and the reactive scaling adds overhead.
- Latency-sensitive applications with unpredictable spike patterns: If your traffic can spike 10x with no advance warning and p99 latency must stay under 50ms, the scaling lag during the first 30–90 seconds of a spike is a problem that minimum ACU configuration helps but does not fully solve.
- Applications that cannot use RDS Proxy: Certain PostgreSQL features break under RDS Proxy's connection multiplexing — notably prepared statements with session-scoped state and SET commands for session configuration. If your application relies heavily on these, RDS Proxy compatibility needs verification before deploying it as the burst buffer.
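The cost crossover behind the first bullet is simple arithmetic. The $0.12 per ACU-hour rate matches the pricing discussed earlier; the provisioned rate below is a hypothetical r6g.2xlarge-class figure, and both should be checked against current AWS pricing:

```python
# Break-even analysis: at what sustained ACU level does Serverless v2
# cost more than a comparable provisioned instance?
# Both prices are assumptions; verify against current AWS pricing.
ACU_HOUR_USD = 0.12          # Serverless v2, us-east-1 list price
PROVISIONED_HOUR_USD = 1.04  # hypothetical on-demand r6g.2xlarge-class rate

break_even_acu = PROVISIONED_HOUR_USD / ACU_HOUR_USD
print(f"Serverless v2 costs more above a sustained {break_even_acu:.1f} ACU")
```

Under these assumed prices the break-even lands just below 9 ACU, which is why a cluster that never drops below 8 ACU rarely justifies the serverless pricing model.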
Monitoring Setup
A CloudWatch alarm that catches scaling pressure before it becomes a user-visible problem:
# Alarm: ACU approaching maximum (threshold 12 is 75% of MaxCapacity=16); risk of cap-induced lag
aws cloudwatch put-metric-alarm \
--alarm-name "aurora-serverless-v2-acu-ceiling" \
--alarm-description "ACU approaching max capacity — scale-up may be capped" \
--namespace AWS/RDS \
--metric-name ServerlessDatabaseCapacity \
--dimensions Name=DBClusterIdentifier,Value=your-cluster-id \
--statistic Maximum \
--period 300 \
--threshold 12 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 2 \
--alarm-actions arn:aws:sns:us-east-1:account-id:your-alert-topic
# Alarm: connection count high for the capacity the cluster typically runs at (threshold 200 is a starting point; tune to your traffic)
aws cloudwatch put-metric-alarm \
--alarm-name "aurora-serverless-v2-connections-high" \
--alarm-description "Connection count high relative to current ACU allocation" \
--namespace AWS/RDS \
--metric-name DatabaseConnections \
--dimensions Name=DBClusterIdentifier,Value=your-cluster-id \
--statistic Maximum \
--period 60 \
--threshold 200 \
--comparison-operator GreaterThanOrEqualToThreshold \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:account-id:your-alert-topic
Pair these alarms with a dashboard that plots ServerlessDatabaseCapacity, DatabaseConnections, and CPUUtilization on the same timeline. The visual correlation between connection spikes and ACU scaling events makes scaling lag immediately obvious and gives you the data to tune the minimum ACU setting based on actual traffic patterns rather than guessing.
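Building that dashboard can be scripted. A sketch using `put_dashboard`; the dashboard name is illustrative and the body follows CloudWatch's dashboard JSON schema:

```python
import json

# Sketch: one dashboard widget plotting the three scaling-lag metrics
# on a shared timeline. Dashboard and cluster names are placeholders.

def dashboard_body(cluster_id: str) -> str:
    """CloudWatch dashboard body with the three metrics on one widget."""
    metrics = [
        ["AWS/RDS", "ServerlessDatabaseCapacity", "DBClusterIdentifier", cluster_id],
        ["AWS/RDS", "DatabaseConnections", "DBClusterIdentifier", cluster_id],
        ["AWS/RDS", "CPUUtilization", "DBClusterIdentifier", cluster_id],
    ]
    return json.dumps({
        "widgets": [{
            "type": "metric",
            "properties": {
                "metrics": metrics,
                "period": 60,
                "stat": "Maximum",
                "title": "Aurora Serverless v2 scaling lag",
            },
        }]
    })

def publish_dashboard(cluster_id: str):
    import boto3  # imported lazily so dashboard_body is testable offline
    boto3.client('cloudwatch').put_dashboard(
        DashboardName='aurora-serverless-v2-scaling',
        DashboardBody=dashboard_body(cluster_id),
    )
```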
Running Aurora Serverless v2 and seeing intermittent connection timeouts under traffic bursts?
We analyze your CloudWatch scaling metrics, configure minimum ACU and RDS Proxy settings for your actual traffic pattern, and eliminate the scaling lag window. Free assessment, no obligation.