Oracle DBMS_STATS vs. Postgres ANALYZE: Why Your Migrated Queries Choose the Wrong Plan

What Statistics Each Optimizer Uses

Both Oracle's Cost-Based Optimizer (CBO) and Postgres's planner use statistics to estimate how many rows a filter will return, which join order minimizes intermediate result sizes, and whether an index scan or a sequential scan is cheaper. The statistics they collect differ in important ways.

Oracle collects: row counts, block counts, average row length, column-level NDV (number of distinct values), null count, min/max values, histograms (frequency histograms for low-NDV columns, height-balanced histograms for high-NDV skewed columns, hybrid histograms in 12c+), column group statistics for correlated columns, and expression statistics for function-based indexes.

Postgres collects: row counts, page counts, average row width, column-level NDV estimates, null fraction, most common values with frequencies (MCV list), a histogram of value distribution, and correlation (physical vs. logical sort order). Extended statistics (Postgres 10+) cover column correlation and multi-column NDV.

The key differences that cause plan regressions:

Oracle histograms are richer by default for skewed columns; Postgres requires explicit extended statistics for multi-column correlation
Oracle's default statistics target (sample rate) is AUTO, which Oracle tunes per table; Postgres's default_statistics_target is 100 (a fixed sample granularity) with per-column overrides
Oracle maintains statistics on partitions and the global table separately; Postgres autovacuum analyzes partitions independently and there are no global partition statistics without manual aggregation
Oracle's CBO uses system statistics (CPU speed, I/O throughput) for cost modeling; Postgres uses fixed cost constants (seq_page_cost, random_page_cost) that must be tuned per storage type

The Statistics Gap After Data Load

When data is migrated from Oracle to Postgres using AWS DMS, ora2pg, or a bulk copy approach, the Postgres tables have data but no statistics. DMS does not run ANALYZE after loading. This means the Postgres planner is operating on the default estimates: roughly 1000 rows per table and uniform distribution across all columns — wildly wrong for any table of meaningful size.

-- Check which tables have stale or missing statistics
SELECT
  schemaname,
  tablename,
  n_live_tup,
  last_analyze,
  last_autoanalyze,
  CASE
    WHEN last_analyze IS NULL AND last_autoanalyze IS NULL THEN 'NEVER ANALYZED'
    WHEN last_analyze < NOW() - INTERVAL '7 days'
     AND last_autoanalyze < NOW() - INTERVAL '1 day' THEN 'STALE'
    ELSE 'OK'
  END AS stats_status
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_live_tup DESC;

After any bulk data load — migration, initial load, large batch insert — run ANALYZE manually before allowing application queries:

-- Analyze all tables in the schema after data load
ANALYZE VERBOSE;

-- Or target specific high-volume tables first
ANALYZE orders;
ANALYZE order_items;
ANALYZE customers;

-- For partitioned tables, analyze each partition and the parent
ANALYZE orders_2024_q1;
ANALYZE orders_2024_q2;
ANALYZE orders;  -- updates parent table stats from partition stats

Oracle DBMS_STATS Equivalents in Postgres

DBMS_STATS.GATHER_TABLE_STATS

-- Oracle
EXEC DBMS_STATS.GATHER_TABLE_STATS(
  ownname    => 'APPSCHEMA',
  tabname    => 'ORDERS',
  estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
  method_opt => 'FOR ALL COLUMNS SIZE AUTO',
  cascade    => TRUE  -- also gather index stats
);

-- Postgres equivalent
ANALYZE orders;
-- CASCADE equivalent: ANALYZE also updates pg_class statistics for indexes automatically
-- For explicit index stats update: VACUUM ANALYZE orders;

DBMS_STATS.GATHER_SCHEMA_STATS

-- Oracle
EXEC DBMS_STATS.GATHER_SCHEMA_STATS(
  ownname => 'APPSCHEMA',
  estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
);

-- Postgres equivalent: no single command, use a loop or just ANALYZE
-- ANALYZE with no table argument covers all tables in the database
ANALYZE;

-- Or target a specific schema via a DO block
DO $$
DECLARE
  tbl TEXT;
BEGIN
  FOR tbl IN
    SELECT schemaname || '.' || tablename
    FROM pg_tables
    WHERE schemaname = 'public'
  LOOP
    EXECUTE 'ANALYZE ' || tbl;
    RAISE NOTICE 'Analyzed: %', tbl;
  END LOOP;
END;
$$;

DBMS_STATS.SET_TABLE_STATS (Locking Stats)

Oracle allows locking statistics on a table to prevent autostats from overwriting manually tuned values. Postgres does not have a direct equivalent — autovacuum will re-analyze tables on its schedule. The workaround is using per-table autovacuum cost delay settings to effectively prevent autovacuum from analyzing specific tables:

-- Prevent autovacuum from re-analyzing a specific table
-- (use when statistics are manually tuned and you don't want them overwritten)
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 1.0);
-- Setting scale factor to 1.0 means autovacuum only triggers when
-- 100% of the table's rows change — effectively never for large stable tables

-- Reset to default behavior
ALTER TABLE orders RESET (autovacuum_analyze_scale_factor);

The statistics_target: The Most Important Tuning Lever

Oracle's AUTO_SAMPLE_SIZE uses an adaptive algorithm to determine how much of the table to sample for accurate NDV estimates. Postgres uses a fixed default_statistics_target of 100 — this controls how many "buckets" the histogram and MCV list use, not directly the sample percentage. Higher values mean more granular statistics, better cardinality estimates, and better plans — at the cost of longer ANALYZE time and larger catalog storage.

-- Check current default statistics target
SHOW default_statistics_target;  -- default is 100

-- Increase globally for better plan quality (typical range: 100–500)
-- Set in postgresql.conf or via ALTER SYSTEM:
ALTER SYSTEM SET default_statistics_target = 300;
SELECT pg_reload_conf();

-- Override per-column for skewed columns that the optimizer gets wrong
-- Example: order_status has very skewed distribution (95% 'COMPLETED')
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
ANALYZE orders;

-- Check per-column statistics targets
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'orders'::regclass
  AND attstattarget != -1  -- -1 means use default
ORDER BY attname;

For columns that are frequently used in WHERE clauses and have skewed distributions — status codes, region codes, product categories — increasing statistics_target to 300–500 for those specific columns often resolves cardinality estimation errors that cause bad plans.

Extended Statistics for Correlated Columns

Oracle's column group statistics capture correlations between columns. Without them, Oracle's CBO assumes column independence — it multiplies individual selectivities, underestimating result sizes for correlated filters. Postgres has the same problem by default, but extended statistics (introduced in Postgres 10) address it:

-- Example: queries frequently filter on (country, city) together
-- These columns are correlated: city uniquely determines country
-- Without extended stats, Postgres assumes independence and misestimates cardinality

-- Create extended statistics for the correlated columns
CREATE STATISTICS orders_country_city_stats (ndistinct, dependencies)
  ON country, city FROM customers;

-- Run ANALYZE to populate the extended statistics
ANALYZE customers;

-- Verify extended statistics were created and populated
SELECT stxname, stxkeys, stxkind, stxdinherit
FROM pg_statistic_ext
WHERE stxrelid = 'customers'::regclass;

-- Check that statistics have been computed
SELECT stxname, stxdinherit,
       (stxddependencies IS NOT NULL) AS has_dependencies,
       (stxdndistinct IS NOT NULL) AS has_ndistinct
FROM pg_statistic_ext_data sed
JOIN pg_statistic_ext se ON se.oid = sed.stxoid
WHERE se.stxrelid = 'customers'::regclass;

Diagnosing Cardinality Estimation Errors

When a Postgres query chooses a bad plan after migration, the root cause is almost always a cardinality estimation error. EXPLAIN ANALYZE shows both the estimated and actual row counts — the gap reveals where the optimizer went wrong:

-- Run with ANALYZE to see estimated vs. actual rows
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.order_id, c.customer_name, oi.quantity
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id
WHERE o.status = 'PROCESSING'
  AND o.region = 'NORTHEAST'
  AND o.order_date > '2025-01-01';

In the output, look for nodes where rows=X (estimated) differs significantly from actual rows=Y. A row estimate of 1 when actual rows are 50,000 means the optimizer chose an index scan path that is catastrophically wrong for that cardinality. The fix is: increase statistics_target for the filtered columns, add extended statistics if multiple columns are correlated, or — for cases where statistics cannot be made accurate — use query-level hints via pg_hint_plan to force the correct plan.

Replacing Oracle's SQL Plan Management

Oracle's SQL Plan Management (SPM) captures and pins execution plans, preventing them from changing when statistics change. Postgres has no equivalent built-in feature. The closest options are:

pg_hint_plan extension: embed optimizer hints in SQL comments to force specific join orders, index choices, or scan types
Query rewrite: restructure the SQL to produce a plan that doesn't depend on fragile statistics
plan_cache_mode = force_generic_plan: disables per-parameter plan caching for prepared statements, which sometimes produces more stable plans at the cost of per-call planning overhead

-- pg_hint_plan: force an index scan on a specific table
/*+ IndexScan(orders orders_status_date_idx) */
SELECT * FROM orders WHERE status = 'PROCESSING' AND order_date > '2025-01-01';

-- Force a specific join order
/*+ Leading(orders customers order_items) HashJoin(orders customers) */
SELECT o.order_id, c.customer_name
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN order_items oi ON oi.order_id = o.order_id;

The goal after migration is to get statistics accurate enough that hints are not needed. Use hints as a tactical fix while you tune statistics — not as a permanent substitute for correct statistics.

Post-migration query performance not matching Oracle baselines?

We diagnose cardinality estimation errors, tune statistics targets and extended statistics, and resolve plan regressions after Oracle-to-Postgres migrations. Free assessment, no obligation.

Get a Free Assessment → More Articles