Measuring and Reclaiming Postgres Index Bloat Without Downtime

Why Indexes Bloat When Tables Don't

Postgres MVCC marks deleted and updated rows as dead tuples rather than immediately reclaiming their storage. Autovacuum eventually recycles those dead tuple slots in the heap (table) by marking them as reusable. But indexes are different: every index entry pointing to a dead tuple remains in the index structure until the index is rebuilt. Autovacuum removes dead index entries during a vacuum pass, but only when the index page has enough dead entries to justify a full page scan.

On high-churn tables — order tables, event tables, session tables, anything with frequent UPDATE or DELETE — index pages accumulate dead entries faster than autovacuum's default thresholds trigger cleanup. The result: an index that physically occupies 4 GB but contains only 1.2 GB of live data. Every index scan reads the bloated physical structure, pulling more pages into the buffer cache and performing more I/O than necessary.

The standard tools for checking index size do not reveal this:

-- This shows physical size, not live data size
SELECT
  indexname,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders'
ORDER BY pg_relation_size(indexrelid) DESC;

A 4 GB index showing here could be 30% live or 90% live — you cannot tell from pg_relation_size alone.

Installing and Using pgstattuple

pgstattuple is a contrib extension included with Postgres. It reads the physical structure of heap and index pages and reports live vs. dead content ratios. Unlike the catalog-based size estimates that many bloat queries use, pgstattuple performs an actual page-by-page scan and reports real numbers.

-- Install the extension (requires superuser)
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Measure bloat in a specific index
SELECT * FROM pgstatindex('orders_customer_id_idx');

The output columns that matter:

-- Example pgstatindex output:
--  version          | 4
--  tree_level       | 3
--  index_size       | 4362076160   -- 4.1 GB physical size
--  root_block_no    | 412
--  internal_pages   | 2847
--  leaf_pages       | 531891
--  empty_pages      | 0
--  deleted_pages    | 18432        -- pages marked deleted, fully recyclable
--  avg_leaf_density | 41.23        -- 41% of leaf page space is live data
--  leaf_fragmentation | 62.7       -- % of leaf pages out of logical order

-- avg_leaf_density below 50% is significant bloat
-- leaf_fragmentation above 30% degrades sequential index scans

The avg_leaf_density value is the key metric. A healthy B-tree index built on sequentially inserted data has leaf density around 90%. An index on a high-churn table can drop to 30–40%, meaning you're reading 2–3x more pages than the live data requires.

Database-Wide Bloat Scan

Checking indexes one by one is impractical. Use pgstatindex across all large indexes to identify the worst offenders:

-- Scan all indexes over 100MB and report bloat
SELECT
  schemaname,
  tablename,
  indexname,
  pg_size_pretty(index_size) AS physical_size,
  round(avg_leaf_density::numeric, 1) AS leaf_density_pct,
  round(leaf_fragmentation::numeric, 1) AS fragmentation_pct,
  deleted_pages,
  pg_size_pretty(
    (index_size * (1 - avg_leaf_density / 100))::bigint
  ) AS estimated_bloat
FROM (
  SELECT
    n.nspname AS schemaname,
    c.relname AS tablename,
    i.relname AS indexname,
    (pgstatindex(i.relname::text)).*
  FROM pg_index ix
  JOIN pg_class c ON c.oid = ix.indrelid
  JOIN pg_class i ON i.oid = ix.indexrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
    AND pg_relation_size(ix.indexrelid) > 100 * 1024 * 1024  -- > 100MB
) bloat_data
WHERE avg_leaf_density < 70  -- only report significantly bloated indexes
ORDER BY estimated_bloat DESC;

This query is a full physical scan of every qualifying index — it will take minutes on a large database and generate significant I/O. Run it during a low-traffic window, not during peak hours.

REINDEX CONCURRENTLY: The Non-Blocking Rebuild

Before Postgres 12, REINDEX took an exclusive lock on the table for the duration of the rebuild — unusable in production. REINDEX CONCURRENTLY, introduced in Postgres 12, builds a new index alongside the old one using the same mechanism as CREATE INDEX CONCURRENTLY, then atomically swaps them.

-- Rebuild a single index without locking
REINDEX INDEX CONCURRENTLY orders_customer_id_idx;

-- Rebuild all indexes on a table without locking
REINDEX TABLE CONCURRENTLY orders;

-- Rebuild all indexes in a schema
REINDEX SCHEMA CONCURRENTLY public;

Operational notes for REINDEX CONCURRENTLY:

It takes longer than a regular REINDEX. The concurrent build has to scan the table multiple times and handle writes arriving during the build. Expect 2–3x the time of a regular REINDEX for a large index.
It consumes additional disk space during the build. While the new index is being built, both the old and new index exist on disk simultaneously. For a 4 GB index, you need 4+ GB of free disk space before starting.
It cannot run inside a transaction block. REINDEX CONCURRENTLY must be run outside any explicit transaction. Wrapping it in BEGIN ... COMMIT will error.
If it fails midway, it leaves an invalid index. The invalid index is named with a ccnew suffix. Clean it up with DROP INDEX CONCURRENTLY before retrying.

-- Check for invalid indexes left by a failed REINDEX CONCURRENTLY
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
JOIN pg_class ON pg_class.relname = indexname
JOIN pg_index ON pg_index.indexrelid = pg_class.oid
WHERE NOT pg_index.indisvalid
ORDER BY schemaname, tablename;

-- Drop an invalid index from a failed concurrent reindex
DROP INDEX CONCURRENTLY IF EXISTS orders_customer_id_idx_ccnew;

Automating Bloat Remediation

For databases with ongoing high churn, periodic index rebuilds should be part of the maintenance schedule. A simple approach: a scheduled query that identifies indexes with leaf density below a threshold and generates REINDEX commands.

-- Generate REINDEX CONCURRENTLY commands for bloated indexes
SELECT
  'REINDEX INDEX CONCURRENTLY ' || quote_ident(schemaname)
    || '.' || quote_ident(indexname) || ';' AS reindex_command,
  pg_size_pretty(pg_relation_size(
    (schemaname || '.' || indexname)::regclass
  )) AS current_size,
  round(avg_leaf_density::numeric, 1) AS leaf_density
FROM (
  SELECT
    n.nspname AS schemaname,
    i.relname AS indexname,
    (pgstatindex(i.relname::text)).avg_leaf_density
  FROM pg_index ix
  JOIN pg_class c ON c.oid = ix.indrelid
  JOIN pg_class i ON i.oid = ix.indexrelid
  JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE n.nspname = 'public'
    AND pg_relation_size(ix.indexrelid) > 50 * 1024 * 1024
) idx
WHERE avg_leaf_density < 60
ORDER BY avg_leaf_density ASC;

Run this query weekly, review the output, and execute the REINDEX commands during your maintenance window. On RDS and Aurora, you can run REINDEX CONCURRENTLY directly — it's not a restricted operation.

fillfactor: Preventing Bloat on Update-Heavy Tables

For tables and indexes with frequent in-place updates on indexed columns, setting a lower fillfactor on the index reserves space on each leaf page for future entries. This reduces the fragmentation that updates cause:

-- Create an index with 70% fill factor (reserve 30% for updates)
CREATE INDEX orders_status_idx ON orders (status)
WITH (fillfactor = 70);

-- Change fillfactor on an existing index (takes effect after next REINDEX)
ALTER INDEX orders_status_idx SET (fillfactor = 70);
REINDEX INDEX CONCURRENTLY orders_status_idx;

The trade-off: lower fillfactor means the index uses more disk space initially in exchange for staying denser longer. For indexes on columns that are frequently updated (status columns, state machine fields, running totals), 70–80% fillfactor is a reasonable starting point. For append-only indexes on insert-only tables, 90–100% is fine.

pgstattuple vs. Estimation-Based Bloat Queries

Many bloat detection scripts used in production (including the popular ones from pgExperts and various monitoring tools) estimate bloat from catalog statistics rather than performing physical page reads. These estimates are fast but can be significantly wrong — either under-reporting bloat on indexes with unusual fill patterns or over-reporting it on recently vacuumed indexes.

Use estimation queries for routine monitoring at low overhead. Use pgstattuple for precision before committing maintenance window time to a large REINDEX operation. The physical scan takes minutes but eliminates the false positive/negative problem from estimation-based approaches.

Index bloat quietly degrading query performance on your Postgres cluster?

Start with a free 20-minute assessment call to scope the problem and decide whether a paid diagnostic or implementation step is worthwhile. Written findings are not included in the free call.

Get a Free Assessment → More Articles