How Parallel Query Works
When PostgreSQL executes a parallel query, it spawns one or more background worker processes to scan portions of a table or index simultaneously. The leader process coordinates the workers, gathers their partial results, and returns the final result set to the client. Each worker is a full OS process with its own memory allocation — including its own work_mem grant.
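All of the knobs that govern this behavior are ordinary GUCs, so you can inspect their current values on a running instance:

```sql
-- Current values of the settings that govern parallel query
SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather',
               'min_parallel_table_scan_size',
               'work_mem')
ORDER BY name;
```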
The performance benefit is clear for the right workload: a sequential scan of a 500GB table that takes 180 seconds with one process takes 45 seconds with four parallel workers. For analytical queries, ETL loads, and large aggregations running in off-peak windows, parallel query is genuinely valuable.
The problem is that PostgreSQL's default configuration enables parallel query for any query the planner estimates will scan more than min_parallel_table_scan_size (default: 8MB) of data. On an OLTP database with dozens of concurrent short-lived queries, this default is wrong.
The OLTP Scenario Where It Hurts
Consider a SaaS application database: 200 concurrent connections, query mix of point lookups and small range scans, p50 latency of 8ms, p99 of 45ms. The database is on an 8-vCPU RDS instance with max_worker_processes = 8 and max_parallel_workers_per_gather = 2 (the PostgreSQL defaults).
At peak load, a handful of slightly-larger queries — account summary reports, dashboard aggregations — each spawn 2 parallel workers. With 30 such queries running simultaneously, the planner asks for 60 parallel workers on top of 30 leader processes, all competing for 8 vCPUs. (The default max_parallel_workers = 8 caps how many workers actually launch, but generic tuning guides routinely raise that cap; even the capped pool plus 30 leaders oversubscribes the CPUs.) The parallel workers don't make those queries faster in this environment — they finish in roughly the same wall-clock time because they're all fighting for the same CPU cores. But they do starve the 170 other concurrent OLTP queries of CPU time. p99 latency climbs to 200ms. Users notice.
The other problem: memory. Each parallel worker gets its own work_mem allocation for each sort and hash operation in its part of the plan. If work_mem = 64MB and a query runs with 3 workers, the three workers plus the leader can consume up to 256MB for a single hash join. With 30 such parallel queries, you're potentially allocating 7.5GB just for work_mem — before the shared buffer pool, connection overhead, and OS memory.
Diagnosing Whether Parallel Query Is Hurting You
-- Check how many parallel queries are currently running.
-- backend_type identifies worker processes; leader_pid (PostgreSQL 13+)
-- ties each worker back to its leader process. Note: matching 'Parallel'
-- against the query text does not work -- plan node names never appear there.
SELECT
    pid,
    leader_pid,
    backend_type,
    query_start,
    state,
    wait_event_type,
    wait_event,
    LEFT(query, 80) AS query
FROM pg_stat_activity
WHERE backend_type = 'parallel worker'
   OR pid IN (SELECT leader_pid FROM pg_stat_activity WHERE leader_pid IS NOT NULL)
ORDER BY query_start;
-- Count backends by type to see how many parallel workers are active
SELECT
    backend_type,
    COUNT(*) AS count
FROM pg_stat_activity
GROUP BY backend_type
ORDER BY count DESC;
-- EXPLAIN a suspicious query to see if parallel is being used
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT account_id, SUM(amount)
FROM transactions
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY account_id;
-- Look for "Gather" or "Gather Merge" nodes in the plan output:
-- these indicate parallel execution
If you see Gather or Gather Merge nodes in plans for queries that run in under 100ms, the planner is parallelizing queries too small to benefit. The overhead of spawning workers and gathering their results adds latency compared with simply running those queries serially.
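For orientation, a parallelized aggregate plan looks roughly like this — the node names (Finalize GroupAggregate, Gather Merge, Partial GroupAggregate, Parallel Seq Scan) are real PostgreSQL plan nodes, but the timings and row counts here are illustrative, not from an actual run:

```text
Finalize GroupAggregate  (actual time=412.0..450.3 rows=9800 loops=1)
  Group Key: account_id
  ->  Gather Merge  (actual time=411.8..441.2 rows=19600 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Partial GroupAggregate  (actual time=405.1..420.7 loops=3)
              ->  Sort  (actual time=398.9..404.2 loops=3)
                    Sort Key: account_id
                    ->  Parallel Seq Scan on transactions
                          Filter: (created_at > (now() - '30 days'::interval))
```

The loops=3 on the inner nodes means the leader plus two workers each executed that subtree.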
The Right Configuration for OLTP Databases
For pure OLTP workloads — sub-100ms queries, high concurrency, index-driven access patterns — the correct approach is to disable parallel query globally and re-enable it selectively for the specific queries or sessions that benefit.
-- Disable parallel query globally for an OLTP database
-- In postgresql.conf or via ALTER SYSTEM:
max_parallel_workers_per_gather = 0   -- disables parallel query for all sessions
max_parallel_workers = 0              -- optional belt-and-suspenders: no parallel
                                      -- workers at all (note this also disables
                                      -- parallel CREATE INDEX and parallel VACUUM)
-- More surgical: keep workers available but raise the cost threshold
-- so only genuinely large scans trigger parallelism
min_parallel_table_scan_size = '1GB' -- default 8MB; only parallelize huge scans
min_parallel_index_scan_size = '512MB' -- default 512kB
parallel_setup_cost = 10000 -- default 1000; raise spawn cost estimate
parallel_tuple_cost = 0.1 -- default 0.1; keep but let setup_cost dominate
The surgical approach (raising thresholds rather than disabling entirely) is preferable when you have a mixed workload — mostly OLTP but with occasional large analytical queries that genuinely benefit. By setting min_parallel_table_scan_size = 1GB, you ensure that only full scans of tables larger than 1GB trigger parallel workers, which eliminates spurious parallelism on the medium-sized tables that OLTP queries typically touch.
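These thresholds are all reloadable, so the surgical approach can be applied without a restart. A sketch using ALTER SYSTEM — note that on RDS and Aurora you would set the same parameters in the DB parameter group instead, since ALTER SYSTEM is restricted there:

```sql
-- Raise the parallelism thresholds cluster-wide, then reload
ALTER SYSTEM SET min_parallel_table_scan_size = '1GB';
ALTER SYSTEM SET min_parallel_index_scan_size = '512MB';
ALTER SYSTEM SET parallel_setup_cost = 10000;
SELECT pg_reload_conf();  -- picks up the new values without a restart
```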
Enabling Parallel Query for Specific Sessions
If you have ETL jobs, reporting queries, or analytical workloads that should use parallel query while OLTP sessions should not, configure parallel query at the session level or role level rather than globally.
-- Create a reporting role with parallel query enabled
-- while the default (application) role has it disabled
-- In postgresql.conf: disable globally
max_parallel_workers_per_gather = 0
-- Override per role for the reporting user
ALTER ROLE reporting_user SET max_parallel_workers_per_gather = 4;
ALTER ROLE reporting_user SET work_mem = '256MB';
-- Or set at session level in the ETL job connection string
-- before running large queries:
SET max_parallel_workers_per_gather = 4;
SET work_mem = '512MB';
SELECT
    customer_segment,
    product_category,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM orders o
JOIN order_items i ON i.order_id = o.order_id
JOIN customers c ON c.id = o.customer_id
WHERE o.created_at BETWEEN '2022-01-01' AND '2022-06-30'
GROUP BY customer_segment, product_category
ORDER BY total_revenue DESC;
-- Reset after the analytical session
RESET max_parallel_workers_per_gather;
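Role-level overrides are recorded in the pg_db_role_setting catalog, so you can confirm what is actually configured:

```sql
-- List per-role setting overrides (ALTER ROLE ... SET)
SELECT r.rolname, s.setconfig
FROM pg_db_role_setting s
JOIN pg_roles r ON r.oid = s.setrole;
```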
The work_mem Multiplication Problem
This is the least-discussed parallel query issue. PostgreSQL's work_mem parameter controls the memory available for a single sort or hash operation within a query. In a parallel query, each worker process gets its own work_mem allocation, independently.
-- Memory consumption calculation for a parallel hash join
-- work_mem = 64MB
-- max_parallel_workers_per_gather = 4
-- Query uses hash join with 4 workers
-- Each worker can use 64MB for its hash table
-- Leader can use 64MB for its portion
-- Total potential work_mem for this one query: 5 * 64MB = 320MB
-- With 50 concurrent parallel queries:
-- 50 * 320MB = 16GB just for work_mem
-- This will cause swapping on any instance with less than ~24GB RAM
-- Safer formula for work_mem with parallel query enabled:
-- work_mem = (available_memory * 0.25) / (max_connections * max_parallel_workers_per_gather)
-- For 16GB RAM, 200 connections, 2 workers:
-- work_mem = (4096MB) / (200 * 2) = ~10MB per worker
Most teams set work_mem to a comfortable value (64MB, 128MB) without adjusting for parallel query's multiplication effect. The result: occasional OOM events during peak query periods when multiple parallel queries simultaneously allocate their maximum work_mem. On RDS and Aurora, OOM events cause instance restarts. On self-managed PostgreSQL, they cause query cancellations or OS-level OOM kills.
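To see how exposed a given instance is, the worst case can be computed from the live settings. This is deliberately pessimistic — it assumes every connection runs one hash join at full parallel fan-out — and it ignores hash_mem_multiplier (PostgreSQL 13+), which lets hash operations exceed work_mem:

```sql
-- Worst-case work_mem demand in MB (pg_settings reports work_mem in kB)
SELECT
    (SELECT setting::bigint FROM pg_settings WHERE name = 'max_connections')
  * ((SELECT setting::bigint FROM pg_settings WHERE name = 'max_parallel_workers_per_gather') + 1)
  * (SELECT setting::bigint FROM pg_settings WHERE name = 'work_mem') / 1024
  AS worst_case_work_mem_mb;
```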
When Parallel Query Genuinely Helps
To be clear about the cases where parallel query should be enabled and sized appropriately:
- Dedicated analytical or reporting databases where queries routinely scan hundreds of millions of rows. These databases have different concurrency profiles — fewer simultaneous connections, longer-running queries — where parallel workers add real throughput without starving each other.
- ETL and batch processing windows where you want to maximize throughput during off-peak hours. Temporarily enable parallel query for the ETL session, run the large transformations, then reset.
- Aurora PostgreSQL with large instance classes where the instance has 32+ vCPUs and the workload includes a mix of OLTP and moderate analytical queries. The larger CPU pool means parallel workers have headroom without starving OLTP queries.
The pattern is consistent: parallel query belongs on databases purpose-built for analytical work or on large instances where CPU headroom makes the trade-off favorable. On a shared OLTP database running on a db.r6g.2xlarge or db.r6g.4xlarge, the default settings create more problems than they solve.
PostgreSQL performance degrading under load?
We review production PostgreSQL and Aurora configurations and identify the specific settings causing latency issues. Free assessment, findings in 5-7 days.