Postgres Transaction ID Wraparound: The 2-Billion Row Counter That Will Corrupt Your Database

How Transaction ID Wraparound Works

PostgreSQL's MVCC (Multi-Version Concurrency Control) uses transaction IDs (XIDs) to determine row visibility. Every INSERT and UPDATE stamps the row with the XID of the transaction that created it. When a transaction queries the database, PostgreSQL compares the row's XID against the transaction's snapshot to determine whether the row is visible.

XIDs are 32-bit unsigned integers. The maximum value is approximately 4.29 billion, but PostgreSQL uses modular arithmetic treating the XID space as circular. At any point in time, roughly 2 billion XIDs into the past are considered "in the past" (visible to current transactions) and 2 billion XIDs into the future are considered "in the future" (not yet visible).

When the live XID counter approaches a row's stamped XID plus 2 billion, that row's XID is about to flip from "in the past" to "in the future." A row that flips to "in the future" becomes invisible to all transactions — as if it doesn't exist. This is XID wraparound, and it is a form of data corruption.

The solution is freezing: VACUUM rewrites old rows with a special frozen XID marker (FrozenTransactionId, value 2) that means "this row is permanently visible to all transactions, regardless of the current XID counter." Once a row is frozen, it is immune to wraparound. The entire operational challenge of XID wraparound management is: ensuring VACUUM freezes rows before their XID gets too old.

The Warning Thresholds

PostgreSQL has two parameters governing freeze behavior:

vacuum_freeze_min_age (default: 50,000,000) — VACUUM will not freeze a row whose XID is newer than this many transactions ago. Prevents freezing rows that might still need to be seen as "in the future" by concurrent long transactions.
vacuum_freeze_table_age (default: 150,000,000) — When a table's oldest unfrozen XID is more than this many transactions old, VACUUM performs an aggressive full-table scan to freeze old tuples (rather than only scanning pages with dead tuples).
autovacuum_freeze_max_age (default: 200,000,000) — Forces autovacuum to run on any table whose oldest unfrozen XID exceeds this age. This is the last automatic defense before manual intervention is required.

The danger zone is at approximately 1.6 billion transactions of age (the default autovacuum_freeze_max_age of 200M is the last trigger; past that, the system is approaching the 2B limit). At 40 million transactions remaining, PostgreSQL begins logging warnings. At 3 million remaining, PostgreSQL refuses all new write transactions and shuts down to protect data integrity.

Monitoring XID Age

This query is the most important health check for any production PostgreSQL database. Run it daily or add it to your monitoring stack:

-- Per-database XID age — the primary wraparound health metric
SELECT
  datname,
  age(datfrozenxid) AS db_xid_age,
  pg_size_pretty(pg_database_size(datname)) AS db_size,
  CASE
    WHEN age(datfrozenxid) > 1500000000 THEN 'CRITICAL — manual VACUUM FREEZE required immediately'
    WHEN age(datfrozenxid) > 1000000000 THEN 'WARNING — increase autovacuum aggressiveness'
    WHEN age(datfrozenxid) > 500000000  THEN 'MONITOR — autovacuum should be keeping up'
    ELSE 'OK'
  END AS status
FROM pg_database
WHERE datallowconn
ORDER BY age(datfrozenxid) DESC;

-- Per-table XID age — find the tables blocking database-level freeze progress
SELECT
  n.nspname AS schema,
  c.relname AS table_name,
  age(c.relfrozenxid) AS table_xid_age,
  c.reltuples::BIGINT AS est_row_count,
  pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
  last_autovacuum,
  last_vacuum
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
LEFT JOIN pg_stat_user_tables t ON t.relid = c.oid
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY age(c.relfrozenxid) DESC
LIMIT 20;

The database-level datfrozenxid is the minimum of all its tables' relfrozenxid values. The table in your ORDER BY ... DESC LIMIT 20 output that has the oldest XID age is the table controlling your wraparound exposure. It is not always the largest table — it is often a small, rarely-written table that autovacuum skips because it has few dead tuples.

Why Autovacuum Misses Freeze Targets

Autovacuum's default trigger is dead tuple count — it runs on a table when dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. A table with very low write activity accumulates few dead tuples and is rarely vacuumed. If that table also has old unfrozen XIDs, it will age toward wraparound without autovacuum noticing.

The autovacuum_freeze_max_age parameter is supposed to be the backstop — autovacuum is forced to run on any table whose XID age exceeds this value regardless of dead tuple count. But autovacuum_freeze_max_age defaults to 200 million, meaning a table can sit untouched for 200 million transactions before this override kicks in. On a busy OLTP system processing 100,000 transactions per second, 200 million transactions elapse in about 33 minutes. On a system with mixed workloads, elapsed time varies enormously.

Three situations that cause autovacuum to fall behind on freezing despite correct configuration:

Long-running transactions: An open transaction pins the XID horizon — autovacuum cannot freeze rows newer than the oldest active transaction. A transaction running for hours or days prevents all rows created during that window from being frozen.
Autovacuum cost throttling: Default autovacuum has aggressive I/O cost delay settings (autovacuum_vacuum_cost_delay = 2ms, autovacuum_vacuum_cost_limit = 200) that deliberately slow it down to minimize impact on production workloads. On a large table requiring aggressive freeze work, this throttling can make the autovacuum run take longer than the transaction rate advances the XID counter.
Autovacuum disabled on specific tables: Tables with autovacuum_enabled = false are entirely exempt from autovacuum. This is sometimes set on large archive tables where the DBA decided autovacuum overhead wasn't worth it — but those tables still age toward wraparound.

Finding the XID Horizon Blockers

-- Find long-running transactions blocking the XID horizon
SELECT
  pid,
  usename,
  application_name,
  state,
  xact_start,
  NOW() - xact_start AS duration,
  backend_xid,
  backend_xmin,
  LEFT(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE state != 'idle'
  AND xact_start IS NOT NULL
  AND NOW() - xact_start > INTERVAL '1 hour'
ORDER BY xact_start;

-- Check for replication slots pinning the XID horizon
-- (common on Aurora with logical replication enabled)
SELECT
  slot_name,
  slot_type,
  active,
  xmin,
  catalog_xmin,
  restart_lsn,
  age(xmin) AS xmin_age
FROM pg_replication_slots
ORDER BY age(xmin) DESC NULLS LAST;

Inactive logical replication slots are one of the most dangerous XID horizon pinners. An unused slot from a decommissioned CDC consumer can hold the XID horizon in place indefinitely. The age(xmin) on a stale replication slot can exceed autovacuum_freeze_max_age, preventing any table from being frozen past that point.

-- Drop an inactive replication slot that's blocking freeze
SELECT pg_drop_replication_slot('stale_debezium_slot');

-- VERIFY the slot is stale before dropping — check with the CDC team first
-- Dropping an active slot interrupts the consumer pipeline

Autovacuum Configuration for Freeze Safety

The default autovacuum settings are conservative about resource usage. On a production system where wraparound risk is a concern, these per-table overrides on high-age tables reduce the freeze lag:

-- For a large table with high XID age, increase freeze aggressiveness
ALTER TABLE your_large_table SET (
  autovacuum_freeze_max_age = 100000000,   -- force freeze check earlier (100M vs 200M default)
  autovacuum_vacuum_cost_delay = 0,         -- remove I/O throttling for this table
  autovacuum_vacuum_cost_limit = 800        -- higher cost limit = faster progress
);

-- For the entire database, reduce the global freeze threshold
-- (in postgresql.conf or via ALTER SYSTEM)
ALTER SYSTEM SET autovacuum_freeze_max_age = 150000000;  -- 150M vs 200M default
ALTER SYSTEM SET vacuum_freeze_table_age   = 100000000;  -- aggressive table-level freeze
SELECT pg_reload_conf();

Emergency Manual Freeze

When the database XID age crosses 1.5 billion and autovacuum is not keeping up, manual VACUUM FREEZE is required. This is a heavyweight operation that scans the entire table:

-- Manual freeze on the highest-age table
-- Run during low-traffic window; this is I/O intensive
VACUUM (FREEZE, ANALYZE, VERBOSE) schema_name.table_name;

-- Progress monitoring (PostgreSQL 9.6+)
SELECT
  p.pid,
  p.phase,
  p.heap_blks_total,
  p.heap_blks_scanned,
  ROUND(100.0 * p.heap_blks_scanned / NULLIF(p.heap_blks_total, 0), 1) AS pct_complete,
  p.heap_blks_vacuumed,
  p.index_vacuum_count
FROM pg_stat_progress_vacuum p
JOIN pg_stat_activity a ON p.pid = a.pid;

On Aurora PostgreSQL and RDS PostgreSQL, VACUUM FREEZE on a large table during business hours will cause I/O contention and elevated latency. Schedule it during the lowest-traffic window you have. If the XID age is above 1.8 billion, there is no low-traffic window — run it immediately regardless of business hours, because the alternative (write shutdown or data corruption) is worse than the VACUUM I/O impact.

The CloudWatch Alert You Should Have Running

# For Aurora PostgreSQL and RDS PostgreSQL,
# AWS publishes MaximumUsedTransactionIDs as a CloudWatch metric

aws cloudwatch put-metric-alarm \
  --alarm-name "postgres-xid-wraparound-warning" \
  --alarm-description "XID age approaching freeze threshold" \
  --namespace AWS/RDS \
  --metric-name MaximumUsedTransactionIDs \
  --dimensions Name=DBInstanceIdentifier,Value=your-instance-id \
  --statistic Maximum \
  --period 3600 \
  --threshold 1000000000 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:account-id:your-pager-topic

# Set a second alarm at the critical threshold
aws cloudwatch put-metric-alarm \
  --alarm-name "postgres-xid-wraparound-critical" \
  --alarm-description "XID age critical — manual VACUUM FREEZE required now" \
  --namespace AWS/RDS \
  --metric-name MaximumUsedTransactionIDs \
  --dimensions Name=DBInstanceIdentifier,Value=your-instance-id \
  --statistic Maximum \
  --period 900 \
  --threshold 1500000000 \
  --comparison-operator GreaterThanOrEqualToThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:account-id:your-pager-topic \
  --ok-actions arn:aws:sns:us-east-1:account-id:your-pager-topic

For self-managed PostgreSQL on EC2, the MaximumUsedTransactionIDs CloudWatch metric is not available. Export the age(datfrozenxid) value from the monitoring query above into your observability stack (Prometheus, Datadog, CloudWatch custom metrics) and alert at 1 billion and 1.5 billion.

If your PostgreSQL monitoring setup has no XID age alert, that is a gap with a potentially catastrophic failure mode. Add it before you do anything else today.

No XID wraparound monitoring on your Postgres cluster?

Start with a free 20-minute assessment call to scope the problem and decide whether a paid diagnostic or implementation step is worthwhile. Written findings are not included in the free call.

Get a Free Assessment → More Articles