The Default Parameter Group Problem
When you provision an Aurora PostgreSQL cluster, AWS applies a default parameter group. That group is designed to work safely across the widest possible range of workloads — which means it's tuned conservatively, not optimally.
For many clusters running in production at real scale, the defaults are meaningfully wrong: not dangerous, but leaving performance on the table. Connection pools saturate at the default max_connections. Query planning goes sideways on tables with non-uniform data distribution. Checkpoint behavior creates I/O spikes at predictable intervals.
Here are the 12 parameters that matter most, what the defaults are, and what to set them to.
Memory Parameters
1. shared_buffers
Default in Aurora: Set automatically based on instance class (typically 25% of RAM).
What it does: The size of PostgreSQL's shared memory buffer pool — data pages cached in memory.
Production guidance: Aurora manages its buffer pool differently from standard PostgreSQL because of the shared storage architecture. The shared_buffers value still matters for the local cache layer. In most Aurora deployments, the default is appropriate. However, if you're running a write-heavy OLTP workload on an instance with >16 GB RAM, bumping to 30–35% of RAM can reduce I/O wait.
The Aurora nuance: Aurora's storage layer has its own cache (the buffer cache in the Aurora storage node). Some reads that would hit disk in standard PostgreSQL are served from the Aurora storage cache instead. Don't over-allocate shared_buffers thinking you're compensating for slow storage — you're not; the storage layer is already fast.
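Before touching shared_buffers, it's worth sanity-checking the local cache with the engine's own hit counters — a rough sketch using the standard pg_stat_database view:

```sql
-- Rough buffer-cache hit ratio since the last stats reset; persistently
-- well below ~0.99 under steady OLTP load suggests the local cache is
-- undersized for the working set
SELECT sum(blks_hit)::float8
       / nullif(sum(blks_hit) + sum(blks_read), 0) AS cache_hit_ratio
FROM pg_stat_database;
```

On Aurora, blks_read counts reads served by the storage layer rather than a slow disk, so treat the ratio as a trend indicator, not a hard threshold.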
2. work_mem
Default: 4 MB
What it does: Memory available to each sort or hash operation per query. A complex query with multiple sorts and hash joins can use work_mem multiple times simultaneously.
Production issue: At 4 MB, sort operations on large result sets spill to disk. This is extremely common in reporting workloads.
Production guidance: 64–256 MB for OLTP clusters with occasional reporting. For dedicated analytics/reporting clusters, 512 MB–2 GB. Be conservative: work_mem multiplies by concurrent operations, and setting it too high causes OOM in high-concurrency scenarios.
-- Check how often queries are spilling sorts/hashes to temp files on disk
SELECT query, calls, temp_blks_written
FROM pg_stat_statements
WHERE temp_blks_written > 0
ORDER BY temp_blks_written DESC
LIMIT 20;
3. effective_cache_size
Default: 4 GB (the stock PostgreSQL default — check your cluster's parameter group, since some RDS/Aurora parameter groups derive this value from instance memory instead)
What it does: Hints to the query planner about the total memory available for caching. Does NOT allocate memory — it's just a planner hint.
Production issue: With a default of 4 GB on an r6g.4xlarge with 128 GB RAM, the planner significantly underestimates available cache and may choose sequential scans over index scans on large tables.
Production guidance: Set to 75% of total instance RAM. On a 128 GB instance: 96 GB. On a 32 GB instance: 24 GB.
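Because this is only a planner hint, it's easy to verify in place. Note that on Aurora the change itself has to go through the cluster parameter group — ALTER SYSTEM isn't available on RDS:

```sql
-- Confirm the value the planner is actually working with
SHOW effective_cache_size;
```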
Connection Parameters
4. max_connections
Default: Auto-set by Aurora based on instance class (typically LEAST({DBInstanceClassMemory/9531392}, 5000)).
The math: On a db.r6g.large (16 GB RAM), this works out to roughly 1,800 connections. That sounds like a lot.
Production issue: Each PostgreSQL connection consumes ~5–10 MB of memory in backend process overhead. At 1,800 connections on a 16 GB instance, you'd burn 9–18 GB in connection overhead alone before any query runs.
Production guidance: Use a connection pooler (PgBouncer via RDS Proxy or self-managed). Set max_connections based on actual connection pooler output, not anticipated application connections. Most production clusters need 200–400 direct database connections from the pooler.
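To size from measurement rather than guesswork, snapshot what you actually use — a simple breakdown by connection state:

```sql
-- Actual client connections by state; size max_connections from this
-- (plus headroom), not from the theoretical application maximum
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;
```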
5. idle_in_transaction_session_timeout
Default: 0 (disabled)
What it does: Terminates sessions that have been idle-in-transaction beyond this duration.
Production issue: Long-running idle transactions hold locks, block autovacuum, and bloat the transaction ID space. Applications with connection leak bugs or ORMs that forget to close transactions will accumulate these.
Production guidance: 60000 (60 seconds) for OLTP workloads. Long-running analytical transactions need a higher value or should run in dedicated sessions.
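You can check whether the timeout is needed — and later, whether it's working — by listing offenders directly:

```sql
-- Sessions currently sitting idle inside an open transaction, longest first
SELECT pid, usename, now() - state_change AS idle_for,
       left(query, 60) AS last_query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY idle_for DESC;
```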
6. statement_timeout
Default: 0 (disabled)
What it does: Cancels queries running longer than the specified duration.
Production guidance: Set at the application connection level rather than instance-wide, because different workloads have different tolerance. For OLTP: 30000 (30 seconds) at connection string. For batch jobs and analytics: no limit or application-controlled. Set instance-wide to 300000 (5 minutes) as a last-resort safety net.
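The layered approach can be expressed directly in SQL — a sketch, where reporting_user stands in for whatever role your analytics traffic actually uses:

```sql
-- Per-role override: the reporting role gets a larger budget than OLTP traffic
ALTER ROLE reporting_user SET statement_timeout = '10min';

-- Per-session override: lift the limit entirely for a one-off maintenance job
SET statement_timeout = 0;
```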
Write Performance Parameters
7. checkpoint_completion_target
Default: 0.9
What it does: Controls how aggressively PostgreSQL spreads checkpoint writes over the interval between checkpoints. Higher = more spread out = smoother I/O.
Production guidance: The default of 0.9 is actually good for most Aurora workloads. If you're seeing I/O spikes at regular intervals, confirm checkpoint_timeout is also appropriate.
8. checkpoint_timeout
Default: 5 minutes
What it does: Maximum time between automatic checkpoints.
Production issue: On high-write workloads, checkpoints happen more frequently than the timeout because of the max_wal_size limit. On low-write workloads with large transactions, infrequent checkpoints mean more recovery work after a crash.
Production guidance: For heavy write workloads: 15–30 minutes. Combined with a larger max_wal_size, this spreads I/O more evenly.
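To see which limit is actually triggering your checkpoints, compare timed versus requested checkpoints (on PostgreSQL 16 and earlier; version 17 moved these counters to pg_stat_checkpointer):

```sql
-- If checkpoints_req dominates, max_wal_size is forcing checkpoints
-- before checkpoint_timeout ever fires
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
```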
9. synchronous_commit
Default: on
What it does: Whether COMMIT waits for WAL to be written to disk before returning to the application.
Production guidance for Aurora: Aurora's replication model means the semantics here differ from standard PostgreSQL. Aurora replaces the physical WAL write with its own storage protocol. For most Aurora workloads, the default on is correct and the performance difference is minimal because Aurora's storage write is already fast. Only disable if you've confirmed a specific write-latency bottleneck and understand the durability tradeoff.
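If you do confirm such a bottleneck, the tradeoff can be scoped to individual transactions instead of the whole cluster — a sketch, with audit_log as a hypothetical low-value table:

```sql
BEGIN;
-- Only this transaction's commit skips the durability wait; a crash at the
-- wrong moment can lose it, which may be acceptable for audit noise
SET LOCAL synchronous_commit = off;
INSERT INTO audit_log (event_type) VALUES ('page_view');
COMMIT;
```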
Query Planning Parameters
10. random_page_cost
Default: 4.0
What it does: Planner's estimate of the cost of a random disk page fetch relative to a sequential read. Higher values bias the planner toward sequential scans.
Production issue: A value of 4.0 was calibrated for spinning disk. Aurora's storage is SSD-backed and has a local buffer cache. Random page access is much cheaper relative to sequential reads.
Production guidance: Set to 1.1 or 1.0 for Aurora. This tells the planner that random access is cheap, encouraging index usage over sequential scans on large tables where indexes are actually selective.
-- Check if the planner is choosing sequential scans where you'd expect index scans
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 12345;
-- If you see "Seq Scan" on a large table with a selective filter,
-- random_page_cost may be the culprit
11. default_statistics_target
Default: 100
What it does: How many distinct values the planner tracks per column for statistics. Higher = more accurate cardinality estimates = better query plans.
Production issue: For columns with many distinct values (customer IDs, transaction IDs, timestamps with high cardinality), 100 buckets can produce poor estimates, leading to plan instability when data distribution is non-uniform.
Production guidance: Set to 500 globally. For specific high-cardinality columns that participate in join conditions: ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000.
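Note that a per-column statistics target does nothing until fresh statistics are gathered — reusing the orders example from above:

```sql
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
-- The new target takes effect only after the next analyze
ANALYZE orders;
```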
12. enable_partitionwise_join and enable_partitionwise_aggregate
Default: both off
What they do: Allow the planner to push joins and aggregations down into individual partitions rather than materializing the entire partitioned table first.
Production guidance: Enable both if you use table partitioning. The planning time increases slightly but execution time for partition-pruning queries improves significantly.
-- Session-level; set these in the cluster parameter group to apply cluster-wide
SET enable_partitionwise_join = on;
SET enable_partitionwise_aggregate = on;
A Practical Parameter Group Template
# Settings to apply in your Aurora PostgreSQL parameter group
# Adjust for your specific instance size and workload
# (postgresql.conf syntax shown; the RDS console/API expects raw values,
# e.g. work_mem in kB)

# Memory
work_mem = '128MB'                           # adjust for concurrency; multiplies per sort/hash
effective_cache_size = '24GB'                # 75% of RAM (example assumes a 32 GB instance)

# Connections
idle_in_transaction_session_timeout = 60000  # milliseconds
statement_timeout = 300000                   # 5 min safety net; override per workload

# Checkpoints
checkpoint_timeout = 900                     # seconds (15 minutes) for write-heavy workloads

# Query planning
random_page_cost = 1.1                       # Aurora is SSD-backed
default_statistics_target = 500              # better cardinality estimates
enable_partitionwise_join = on               # if you use partitioned tables
enable_partitionwise_aggregate = on          # if you use partitioned tables
Monitoring Your Parameter Tuning
After applying parameter changes, watch these metrics for 48 hours:
- CloudWatch FreeableMemory — confirm you haven't over-allocated memory
- CloudWatch DiskQueueDepth — checkpoint I/O spikes should smooth out
- pg_stat_statements — query plan stability (watch for regressions in previously fast queries)
- pg_stat_activity with an idle-in-transaction filter — confirm idle_in_transaction_session_timeout is catching stragglers
Need help tuning Aurora PostgreSQL?
We specialize in Aurora PostgreSQL performance tuning and DBA consulting. If your cluster's query performance has plateaued or you're seeing unpredictable slowdowns, we can help.