// DATABASE PERFORMANCE · 12 MIN READ

Postgres Table Partitioning Breaks More Query Plans Than You Expect

Postgres declarative partitioning delivers real benefits for time-series data management and partition pruning. It also silently degrades execution plans for cross-partition joins, disables certain index types, and forces plan re-evaluation overhead at runtime for queries with non-literal partition keys. The teams that get burned are the ones who benchmark the happy path and miss the edge cases.

// PUBLISHED 2023-05-16 · LANIAKEA TEAM

The Partitioning Promise vs. the Reality

The case for Postgres table partitioning is straightforward: partition a large table by date, and queries that filter on that date column only scan the relevant partitions. A query against the last 30 days of a 5-year event log touches 1/60th of the data instead of all of it. Maintenance operations — archiving, dropping old data — operate on whole partitions instead of expensive DELETE operations. Statistics are maintained per-partition, so the planner has more accurate cardinality estimates for time-bounded queries.
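That maintenance win is concrete. Retiring a month of data becomes a metadata operation instead of a million-row DELETE (partition and table names here are illustrative; DETACH ... CONCURRENTLY requires Postgres 14+):

-- Drop a month of data without a DELETE and without bloating the table
ALTER TABLE events DETACH PARTITION events_2018_01 CONCURRENTLY;
DROP TABLE events_2018_01;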

All of this is true, and for the right workload, partitioning delivers exactly those benefits. The problem is that partitioning changes the physical structure of the table in ways that affect the planner's cost model for every query touching that table, not just the ones that benefit from partition pruning. Several query patterns that performed well on an unpartitioned table perform significantly worse after partitioning — and the degradation isn't always visible until you're in production with real data distribution across real partitions.
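For concreteness, the examples below assume a monthly-partitioned events table along these lines (schema and names are illustrative, not from any particular production system):

-- Illustrative schema assumed throughout this article
CREATE TABLE events (
  id         BIGINT GENERATED ALWAYS AS IDENTITY,
  user_id    BIGINT NOT NULL,
  payload    JSONB,
  created_at TIMESTAMPTZ NOT NULL,
  PRIMARY KEY (id, created_at)  -- partition key must be part of the PK
) PARTITION BY RANGE (created_at);

-- One partition per month; in practice created by a cron job or pg_partman
CREATE TABLE events_2023_04 PARTITION OF events
  FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');
CREATE TABLE events_2023_05 PARTITION OF events
  FOR VALUES FROM ('2023-05-01') TO ('2023-06-01');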

Problem 1: Cross-Partition Joins and Partition-Wise Join Conditions

When you join a partitioned table against another table, the best case is a partition-wise join: if both tables are partitioned identically on the join key, and enable_partitionwise_join is on (it defaults to off because of its planning cost), Postgres executes one small join per partition pair. In the common case, joining against an unpartitioned lookup table or on a column other than the partition key, Postgres must generate a plan that accesses all partitions and merges the results.
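A minimal sketch of the aligned case, assuming a hypothetical user_daily_stats table range-partitioned on created_at with the same monthly bounds as events:

-- Partition-wise join requires both sides partitioned identically;
-- the setting is off by default because it increases planning cost
SET enable_partitionwise_join = on;

EXPLAIN
SELECT e.user_id, COUNT(*)
FROM events e
JOIN user_daily_stats s
  ON s.user_id = e.user_id
 AND s.created_at = e.created_at
GROUP BY e.user_id;
-- With aligned bounds, the plan joins partition pairs individually
-- instead of appending all partitions on each side first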

The issue: the planner's cost estimates for cross-partition joins are frequently wrong. It underestimates the overhead of opening and scanning multiple partition files, particularly when the number of partitions is large (50+). A query that joins a table with 100 monthly partitions against a lookup table will generate 100 separate scan nodes in the plan, each with its own index lookup or sequential scan. The planning time itself grows with partition count, and the execution overhead of managing 100 scan nodes is not negligible.

-- Check partition count and sizes for a partitioned table
SELECT
  nmsp_parent.nspname AS parent_schema,
  parent.relname AS parent_table,
  nmsp_child.nspname AS child_schema,
  child.relname AS partition_name,
  pg_size_pretty(pg_relation_size(child.oid)) AS partition_size,
  pg_stat_user_tables.n_live_tup AS live_rows
FROM pg_inherits
JOIN pg_class parent ON pg_inherits.inhparent = parent.oid
JOIN pg_class child ON pg_inherits.inhrelid = child.oid
JOIN pg_namespace nmsp_parent ON nmsp_parent.oid = parent.relnamespace
JOIN pg_namespace nmsp_child ON nmsp_child.oid = child.relnamespace
LEFT JOIN pg_stat_user_tables ON pg_stat_user_tables.relid = child.oid
WHERE parent.relname = 'events'
ORDER BY child.relname;

-- EXPLAIN output for cross-partition join
-- Watch for "Append" nodes with many children -- each is a partition scan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT e.user_id, u.email, COUNT(*) AS event_count
FROM events e
JOIN users u ON e.user_id = u.id
WHERE u.account_type = 'premium'
  AND e.created_at > NOW() - INTERVAL '90 days'
GROUP BY e.user_id, u.email;

If EXPLAIN shows an Append node with 60+ children for a 90-day query on a table with 5 years of monthly partitions, partition pruning is not working as expected. Common causes: the partition key column is being cast implicitly (date vs. timestamptz mismatch), or the query uses a function on the partition key column that prevents pruning.
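The function-on-partition-key cause is easy to reproduce. Wrapping the partition key in a function defeats pruning; comparing the bare column against constants restores it (table name as above, illustrative):

-- Pruning blocker: a function applied to the partition key column
EXPLAIN SELECT * FROM events
WHERE date_trunc('month', created_at) = '2023-04-01'::timestamptz;
-- Append scans every partition: the planner cannot map
-- date_trunc(created_at) onto the partition bounds

-- Rewritten against the bare column: plan-time pruning works
EXPLAIN SELECT * FROM events
WHERE created_at >= '2023-04-01'::timestamptz
  AND created_at <  '2023-05-01'::timestamptz;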

Problem 2: Runtime Partition Pruning vs. Plan-Time Pruning

Postgres can prune partitions at three points: at plan time (for literal values known when the query is planned), at executor startup (for bind parameters of a generic plan), and during execution (for values only available at run time, such as subquery results). Plan-time pruning is effectively free, because excluded partitions never enter the plan at all. Runtime pruning still builds a plan containing every partition and discards the irrelevant subplans on each execution.

The critical distinction: a generic plan for a prepared statement or parameterized query cannot prune at plan time. Every execution re-evaluates which partitions to touch based on the bound parameter values (Postgres plans the first few executions as custom plans, then may switch to a generic one). For high-frequency OLTP queries, this per-execution overhead accumulates.

-- This query benefits from plan-time pruning (literal value)
-- Planner knows at plan time which partitions to include
EXPLAIN SELECT * FROM events
WHERE created_at > '2023-04-01'::timestamptz
  AND created_at < '2023-05-01'::timestamptz;
-- Result: Append with only April 2023 partition in plan

-- This query uses runtime pruning (parameterized)
-- A generic plan cannot exclude partitions at plan time, so the Append
-- node lists every partition and relies on executor-startup pruning
PREPARE recent_range (timestamptz, timestamptz) AS
  SELECT * FROM events WHERE created_at > $1 AND created_at < $2;
SET plan_cache_mode = force_generic_plan;  -- force the generic plan for the demo
EXPLAIN EXECUTE recent_range ('2023-04-01', '2023-05-01');
-- Result: Append with ALL partitions; "Subplans Removed" shows how many
-- were discarded at executor startup

-- Check runtime pruning overhead for a high-frequency query
-- Look for "Subplans Removed" in EXPLAIN ANALYZE output
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM events
WHERE created_at > NOW() - INTERVAL '7 days'
  AND user_id = 12345;

For OLTP queries that run thousands of times per second with parameterized partition keys, the runtime pruning overhead is measurable. Mitigations: keep enable_partition_pruning on (the default), make sure the partition key is indexed in every partition, and for the hottest prepared statements consider plan_cache_mode = force_custom_plan (Postgres 12+), which pays a re-planning cost per execution in exchange for plan-time pruning. Also consider whether the query's partition key filter is truly selective enough to justify the runtime pruning cost for that access pattern.
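If per-execution pruning shows up in profiles, one option worth benchmarking is forcing custom plans for the hottest statements (statement name illustrative; plan_cache_mode requires Postgres 12+):

SET plan_cache_mode = force_custom_plan;

PREPARE recent_user_events (timestamptz, bigint) AS
  SELECT * FROM events
  WHERE created_at > $1 AND user_id = $2;

-- Each EXECUTE is re-planned with the concrete values, so the Append node
-- contains only the matching partitions (at the cost of planning time)
EXPLAIN EXECUTE recent_user_events (NOW() - INTERVAL '7 days', 12345);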

Problem 3: Global Unique Indexes Don't Exist

Postgres does not support global unique indexes on partitioned tables. A UNIQUE constraint on a partitioned table enforces uniqueness only within each partition, not across all partitions. If you need global uniqueness — a globally unique order ID, a globally unique email address — you cannot enforce it with a Postgres partitioned table index alone.

-- This silently creates per-partition unique indexes, NOT a global unique constraint
CREATE TABLE orders (
  id BIGSERIAL,
  customer_id BIGINT,
  created_at TIMESTAMPTZ NOT NULL,
  status TEXT,
  PRIMARY KEY (id, created_at)  -- partition key must be in PK
) PARTITION BY RANGE (created_at);

-- The primary key above enforces uniqueness of the PAIR (id, created_at);
-- since created_at routes each row to exactly one partition, the pair is
-- globally unique -- but id alone is not. Two rows in different
-- partitions CAN have the same id value.

-- Demonstrate the gap (partitions must exist before rows can be routed):
CREATE TABLE orders_2023_01 PARTITION OF orders
  FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
CREATE TABLE orders_2023_06 PARTITION OF orders
  FOR VALUES FROM ('2023-06-01') TO ('2023-07-01');

INSERT INTO orders (id, customer_id, created_at, status)
VALUES (1, 100, '2023-01-15', 'completed');

INSERT INTO orders (id, customer_id, created_at, status)
VALUES (1, 200, '2023-06-15', 'pending');
-- Both inserts succeed -- id=1 exists in January and June partitions

The workarounds for global uniqueness on partitioned tables all carry a cost. You can generate values that cannot collide in the first place (a single sequence or UUIDs), accepting that uniqueness is guaranteed by generation rather than enforced by a constraint. You can maintain a small unpartitioned side table holding only the unique column with a real UNIQUE index, kept in sync by a trigger, which reintroduces a serialization point on every insert. Or you can redefine the requirement so the partition key is part of the unique key (unique per month rather than globally), which Postgres can enforce natively.
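A sketch of the side-table approach, assuming the orders schema above (table, function, and trigger names are illustrative; row-level BEFORE triggers on partitioned tables require Postgres 13+):

-- Unpartitioned side table carries the real global UNIQUE constraint
CREATE TABLE order_ids (
  id BIGINT PRIMARY KEY
);

CREATE FUNCTION enforce_global_order_id() RETURNS trigger AS $$
BEGIN
  -- raises unique_violation if this id already exists in ANY partition
  INSERT INTO order_ids (id) VALUES (NEW.id);
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_global_id
  BEFORE INSERT ON orders
  FOR EACH ROW EXECUTE FUNCTION enforce_global_order_id();

Every insert now touches the side table, so this trades insert throughput for enforced uniqueness; benchmark before adopting it.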

Problem 4: Index Bloat Multiplied by Partition Count

Every partition has its own copy of every index defined on the partitioned table. A table with 10 indexes and 60 monthly partitions has 600 index files on disk. Autovacuum processes each partition independently, each partition's statistics are maintained separately, and each index accumulates bloat on its own schedule. Operations that were manageable on a single large table — REINDEX, ANALYZE, VACUUM — become significantly more complex when you must coordinate them across 60 partitions.

-- Count total indexes across all partitions of a partitioned table
SELECT
  COUNT(*) AS total_partition_indexes,
  SUM(pg_relation_size(indexrelid)) AS total_index_bytes,
  pg_size_pretty(SUM(pg_relation_size(indexrelid))) AS total_index_size
FROM pg_index pi
JOIN pg_class pc ON pi.indrelid = pc.oid
JOIN pg_inherits inh ON inh.inhrelid = pc.oid
WHERE inh.inhparent = 'events'::regclass;

-- Surface the largest and never-used indexes across partitions
-- (candidates for bloat inspection or removal)
SELECT
  schemaname,
  relname AS partition_name,
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan,
  CASE WHEN idx_scan = 0 THEN 'UNUSED' ELSE 'USED' END AS usage_status
FROM pg_stat_user_indexes
WHERE relname LIKE 'events_%'  -- partition naming pattern
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 30;
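Coordinating maintenance across that many indexes usually means generating per-partition statements. A sketch (REINDEX CONCURRENTLY requires Postgres 12+ and cannot run inside a transaction block, so execute the generated statements one at a time):

-- Generate one REINDEX statement per index on each partition of events
SELECT 'REINDEX INDEX CONCURRENTLY ' || quote_ident(n.nspname)
    || '.' || quote_ident(ic.relname) || ';'
FROM pg_inherits inh
JOIN pg_index i ON i.indrelid = inh.inhrelid
JOIN pg_class ic ON ic.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = ic.relnamespace
WHERE inh.inhparent = 'events'::regclass
ORDER BY 1;

Feed the output to psql statement by statement, ideally off-peak.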

Problem 5: Statistics Sampling Misses Cross-Partition Correlation

Postgres maintains per-partition statistics, which is good for partition-local queries. But extended statistics — column correlation, multi-column dependencies — must be defined explicitly on each partition and are not automatically inherited from the parent. Cross-partition queries that depend on multi-column correlations for accurate cardinality estimates will get the same inaccurate estimates that prompted you to use extended statistics in the first place.

-- Create extended statistics on the partitioned parent; on recent
-- releases, ANALYZE on the parent collects them across the whole
-- hierarchy. Unlike indexes, the statistics object is NOT cloned
-- onto individual partitions.
CREATE STATISTICS events_user_date_stats (dependencies)
ON user_id, created_at FROM events;

-- Per-partition statistics objects must be created manually.
-- Generate a statement for each existing partition:
SELECT 'CREATE STATISTICS ' || relname || '_dep_stats (dependencies) '
    || 'ON user_id, created_at FROM ' || relname || ';'
FROM pg_class
WHERE relname LIKE 'events_2%'
  AND relkind = 'r'
ORDER BY relname;

ANALYZE events;

The Decision Framework: When to Partition

Partition if: the table exceeds 50-100GB, the primary access pattern is time-bounded (last N days/months), you need to drop old data regularly via partition detach/drop (not DELETE), and you have the operational capacity to manage partition-specific maintenance.

Don't partition if: the primary query pattern is point lookups by non-partition-key columns, you need global unique enforcement on columns other than the partition key, or your partition count will exceed 100 (planning overhead becomes measurable). For OLTP tables under 50GB with well-tuned indexes, partitioning adds complexity without proportional benefit.

The teams that regret partitioning are the ones who added it to solve a data management problem (dropping old data is hard) without benchmarking its effect on the OLTP query mix that runs against the same table thousands of times per second.

Need a second opinion on your stack?

We'll review your environment and share findings in 5–7 business days. No sales pitch, no obligation.