The Symptom
Aurora PostgreSQL query times creeping up over weeks with no schema change, no traffic change, and no obvious cause. Queries that used to return in 200 ms now take 4–8 seconds. The team suspects Aurora capacity; they scale up the instance. The problem persists. They add indexes. No change. Eventually someone runs VACUUM ANALYZE manually on a table and query times snap back.
The table is partitioned. Every partition is large. Autovacuum is running with its default configuration. This is the single most common performance failure mode we see on Aurora PostgreSQL with partitioned workloads, and it develops slowly enough that nobody notices until it's serious.
Why Defaults Fail on Partitioned Tables
PostgreSQL autovacuum runs as a set of worker processes that scan tables in turn, vacuuming those whose dead-tuple count exceeds a threshold. That threshold defaults to a fixed 50 tuples (autovacuum_vacuum_threshold) plus 20% of the table's live tuples (autovacuum_vacuum_scale_factor = 0.2).
On an unpartitioned table with a uniform write pattern, this works fine. Dead tuples distribute evenly, the worker gets to the table regularly, cleanup happens. On a partitioned table, the writes are not uniform. They concentrate on the active partition (usually the current month, quarter, or region).
Consider a transactions table partitioned by month, with 120 monthly partitions. The oldest partitions receive zero writes; the current-month partition receives 100% of them. Each autovacuum pass checks all 120 partitions and finds nothing over threshold on 119 of them, while on the active one the default trigger point is enormous in absolute terms: 20% of a 50-million-row partition is 10 million dead tuples. Vacuum starts late, and once it starts it crawls under the default cost limit.
Meanwhile, the active partition is accumulating dead tuples faster than vacuum can clear them. Table bloat grows. Statistics go stale. Query plans that depended on accurate row-count estimates pick wrong paths. The performance cliff is gradual but inevitable.
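You can see how far each partition sits from that default trigger point with a quick check against pg_stat_user_tables; a sketch, assuming the transactions_% naming used in this example:

SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  50 + 0.2 * n_live_tup AS default_vacuum_trigger  -- threshold + scale_factor * live tuples
FROM pg_stat_user_tables
WHERE relname LIKE 'transactions_%'
ORDER BY n_live_tup DESC;

On a healthy table, n_dead_tup stays well below the trigger column. On a neglected active partition, it hovers just under it for days at a time.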
The Settings That Fix It
Cluster-Level Settings (Aurora parameter group)
- autovacuum_vacuum_scale_factor = 0.05: trigger vacuum at 5% dead tuples instead of 20%
- autovacuum_analyze_scale_factor = 0.02: trigger analyze at 2% changes instead of 10%
- autovacuum_naptime = 15s: check for work every 15 seconds instead of every minute
- autovacuum_max_workers = 6: increase from the default 3 to handle more partitions concurrently
- autovacuum_vacuum_cost_limit = 2000: increase from the default 200 so vacuum runs faster when it runs
Aurora's I/O characteristics (separation of compute from storage) mean you can afford to be more aggressive than self-managed PostgreSQL on the cost limit. Vacuum I/O is already running against the distributed storage layer; the impact on OLTP is smaller than it would be on a co-located disk.
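After changing the parameter group, it is worth confirming the values the engine actually sees (autovacuum_max_workers is static and only takes effect after an instance restart); a minimal check:

SELECT name, setting, unit
FROM pg_settings
WHERE name IN (
  'autovacuum_vacuum_scale_factor',
  'autovacuum_analyze_scale_factor',
  'autovacuum_naptime',
  'autovacuum_max_workers',
  'autovacuum_vacuum_cost_limit'
);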
Per-Partition Overrides on the Active Partition
Cluster-level tuning is not enough. The active partition needs more aggressive settings than the older read-only partitions. Apply overrides via ALTER TABLE:
ALTER TABLE transactions_2026_05 SET (
  autovacuum_vacuum_scale_factor = 0.02,   -- vacuum at 2% dead tuples
  autovacuum_analyze_scale_factor = 0.01,  -- analyze at 1% changed rows
  autovacuum_vacuum_cost_delay = 0         -- no vacuum throttling on the hot partition
);
When the active partition rotates (new month), apply these settings to the new partition and revert the old one to normal. This step belongs in your partition creation automation — not a manual task you remember to do.
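The revert side is an ALTER TABLE ... RESET, which drops the per-table overrides so the cluster-level settings apply again; a sketch, with the previous month's partition name as an illustrative stand-in:

ALTER TABLE transactions_2026_04 RESET (
  autovacuum_vacuum_scale_factor,
  autovacuum_analyze_scale_factor,
  autovacuum_vacuum_cost_delay
);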
Separately Scheduled ANALYZE
Autovacuum triggers ANALYZE on the same partition it vacuums, but only when the analyze threshold is hit. On very high-write partitions, the analyze lags the write rate — statistics drift even though vacuum is running. Query plans suffer.
Schedule an explicit ANALYZE against the active partition every 2–4 hours, independent of autovacuum. On Aurora, pg_cron is available as an extension and is the clean place for this:
SELECT cron.schedule(
  'analyze_active_partition',
  '0 */4 * * *',   -- every 4 hours, on the hour
  $$ANALYZE transactions_2026_05;$$
);
This is a small operation, typically 30–90 seconds on a hot partition, with little user-visible impact on the workload.
Partition Rotation Automation
When a new month arrives, the following should happen automatically (via pg_cron, AWS Lambda, or your scheduler of choice):
- Create the new partition (native partitioning does not do this automatically; pg_partman or scheduled DDL handles it, with a default partition as a safety net for late-arriving rows)
- Apply aggressive autovacuum settings to the new active partition
- Revert the previous active partition's autovacuum settings to cluster defaults
- Update the pg_cron ANALYZE job to point at the new partition
- Run an initial ANALYZE on the new partition after the first hour of traffic
Leaving this as a manual process means the day it's forgotten is the day performance drifts for another month.
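A minimal pg_cron sketch of the settings-and-ANALYZE steps above, assuming a transactions_YYYY_MM naming convention; partition creation is left to pg_partman or your own DDL, and the job names and schedule are illustrative:

SELECT cron.schedule(
  'rotate_transactions_autovacuum',
  '5 0 1 * *',   -- shortly after midnight on the 1st of each month
  $$
  DO $body$
  DECLARE
    cur  text := 'transactions_' || to_char(now(), 'YYYY_MM');
    prev text := 'transactions_' || to_char(now() - interval '1 month', 'YYYY_MM');
  BEGIN
    -- Aggressive settings follow the active partition.
    EXECUTE format(
      'ALTER TABLE %I SET (autovacuum_vacuum_scale_factor = 0.02, '
      'autovacuum_analyze_scale_factor = 0.01, autovacuum_vacuum_cost_delay = 0)',
      cur);
    -- The previous month goes back to cluster defaults.
    EXECUTE format(
      'ALTER TABLE %I RESET (autovacuum_vacuum_scale_factor, '
      'autovacuum_analyze_scale_factor, autovacuum_vacuum_cost_delay)',
      prev);
    -- cron.schedule with an existing job name replaces that job,
    -- so this re-points the recurring ANALYZE at the new partition.
    PERFORM cron.schedule('analyze_active_partition', '0 */4 * * *',
                          format('ANALYZE %I;', cur));
  END
  $body$;
  $$
);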
Monitoring That Catches Drift Early
Three queries worth running from a monitoring system:
Per-Partition Dead Tuple Ratio
SELECT
  relname,
  n_live_tup,
  n_dead_tup,
  n_dead_tup::float / NULLIF(n_live_tup, 0) AS dead_ratio,
  last_autovacuum,
  last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'transactions_%'
ORDER BY dead_ratio DESC NULLS LAST
LIMIT 10;
Alert if dead_ratio > 0.15 on any partition for more than an hour.
Autovacuum Activity
SELECT
  schemaname,
  relname,
  autovacuum_count,
  autoanalyze_count,
  last_autovacuum,
  AGE(NOW(), last_autovacuum) AS time_since_vacuum
FROM pg_stat_user_tables
WHERE relname LIKE 'transactions_%';
Alert if an active partition has gone more than 6 hours without an autovacuum run.
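As a concrete predicate for that alert, with the active partition name hard-coded for illustration (a monitoring system would substitute it):

SELECT relname
FROM pg_stat_user_tables
WHERE relname = 'transactions_2026_05'   -- current active partition
  AND (last_autovacuum IS NULL
       OR NOW() - last_autovacuum > INTERVAL '6 hours');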
Table Bloat Estimate
Use the pgstattuple extension (available on Aurora) for accurate bloat measurement. It is a more expensive check, so run it monthly rather than continuously; it gives you a ground-truth picture of how much space is wasted. Bloat above 30% on any partition means autovacuum is not keeping up.
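A sketch using pgstattuple_approx, which samples pages rather than scanning all of them and is usually sufficient for a monthly check:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- dead_tuple_percent plus approx_free_percent is a reasonable bloat proxy.
SELECT approx_free_percent, dead_tuple_percent
FROM pgstattuple_approx('transactions_2026_05');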
A note on VACUUM FULL: avoid it on Aurora partitions. It takes an ACCESS EXCLUSIVE lock and rewrites the entire table; for a 50 GB partition that's a long outage. Use pg_repack (supported as an extension on Aurora PostgreSQL) for online table reorganization when bloat has already accumulated. Prevention via correct autovacuum tuning is always cheaper than cure.
The Shared-Buffers Interaction
One additional consideration specific to Aurora: shared_buffers is already tuned aggressively by AWS, typically at 75% of instance memory. Autovacuum reads pages into shared_buffers; on memory-constrained instances, aggressive autovacuum can cause cache churn.
On Aurora, this is less of a problem than on self-managed PostgreSQL because the storage layer handles much of the I/O pressure. But if you see increased buffer cache misses after tuning autovacuum aggressively, consider raising autovacuum_vacuum_cost_delay slightly (from 0 to 2–5ms) to let vacuum yield more.
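The throttle can be re-introduced on the hot partition alone rather than cluster-wide; a sketch:

-- Value is in milliseconds; 2 ms restores the PostgreSQL default pacing.
ALTER TABLE transactions_2026_05 SET (autovacuum_vacuum_cost_delay = 2);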
The Bottom Line
Aurora PostgreSQL autovacuum defaults are sized for simple unpartitioned workloads. Partitioned tables — the norm at any meaningful scale on Aurora — require cluster-level tuning, per-partition overrides, explicit ANALYZE scheduling, and automation that follows the active partition as it rotates.
The tuning is not complex. The operational discipline of keeping it current as partitions rotate is what separates clusters that stay healthy for years from ones that quietly degrade one quarter at a time.
Audit your Aurora autovacuum config?
We tune PostgreSQL autovacuum for partitioned workloads as a standing part of our DBA practice. 30-min scoping call, written recommendation in 5–7 business days.