Oracle Redo vs PostgreSQL WAL: The Architectural Difference
Oracle's write path uses a fixed-size circular redo log. LGWR (Log Writer) flushes redo buffers to the online redo log files synchronously on COMMIT. The redo log files are pre-allocated, fixed in size, and recycled in sequence — there is no dynamic growth. The CKPT (Checkpoint) process advances the checkpoint position and coordinates DBWn (Database Writer) in flushing dirty blocks from the buffer cache to data files. The redo log and the data file writes are decoupled: COMMIT waits only for LGWR, not for DBWn.
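You can see this fixed-size, recycled structure directly on the Oracle side. A quick sanity query (a sketch; V$LOG is the standard view, column list trimmed for readability):

-- Oracle: online redo log groups are pre-allocated and cycle through
-- CURRENT -> ACTIVE -> INACTIVE as LGWR advances
SELECT group#, thread#, bytes/1024/1024 AS size_mb, status, sequence#
FROM v$log
ORDER BY group#;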
PostgreSQL's WAL works similarly in principle but differs in important operational details:
- WAL segments are 16 MB by default (configurable at initdb time via --wal-segsize, or at build time). Individual segments are fixed in size; what grows dynamically is the total WAL on disk, and when it exceeds max_wal_size a checkpoint is forced so that old segments can be recycled.
- COMMIT flushes WAL to disk synchronously (when synchronous_commit = on); the flush is performed by the committing backend itself, while the walwriter background process handles the periodic flushes behind asynchronous commits.
- The bgwriter and checkpointer processes write dirty shared_buffers pages to data files. Checkpoints occur either on a time schedule (checkpoint_timeout) or when WAL usage exceeds max_wal_size.
- Full-page writes: after a checkpoint, the first modification to any page writes the entire page to WAL (not just the change). This protects against partial writes but doubles or triples WAL volume for workloads with wide write patterns immediately after a checkpoint.
The full-page write behavior is the single most significant WAL volume amplifier for write-heavy OLTP workloads migrated from Oracle. Oracle has no equivalent — it writes only the change vectors to redo. On a database with thousands of different pages being modified per checkpoint cycle, full-page writes after each checkpoint can generate 2-4x the WAL volume of the actual data changes.
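These mechanics correspond to settings you can confirm directly on the target instance before doing anything else; a minimal check (the values in the comments are stock PostgreSQL defaults, not recommendations):

-- Inspect the WAL-related settings discussed above
SHOW wal_segment_size;    -- typically 16MB
SHOW full_page_writes;    -- on by default
SHOW max_wal_size;        -- 1GB by default
SHOW checkpoint_timeout;  -- 5min by default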
Diagnosing Your WAL Generation Rate Before Cutover
Before tuning, measure your expected WAL generation rate by load-testing the migrated application against Aurora PostgreSQL with a representative transaction mix. The key metrics:
-- WAL generation statistics (PostgreSQL 14+)
SELECT
wal_records,
wal_fpi, -- full page images (the expensive ones)
wal_bytes,
wal_buffers_full, -- how often WAL buffers forced flush
wal_write,
wal_sync,
wal_write_time,
wal_sync_time,
stats_reset
FROM pg_stat_wal;
-- WAL bytes per second (run twice, compute delta)
SELECT pg_current_wal_lsn() AS lsn, now() AS measured_at;
-- Wait 60 seconds
SELECT
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), '0/BASELINE_LSN_HERE')
) AS wal_in_60s;
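If you are running the measurement from a single psql session, a variant using psql's \gset avoids hand-copying the baseline LSN (a sketch; requires psql and a session you can block for 60 seconds):

-- psql-only variant: capture the baseline LSN into a psql variable
SELECT pg_current_wal_lsn() AS start_lsn \gset
SELECT pg_sleep(60);
SELECT pg_size_pretty(
  pg_wal_lsn_diff(pg_current_wal_lsn(), :'start_lsn')
) AS wal_in_60s;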
-- Full page image ratio — high ratio means checkpoint frequency is too high
SELECT
wal_fpi::float / NULLIF(wal_records, 0) AS fpi_ratio,
-- Values above 0.3 indicate excessive checkpoints or very wide write patterns
pg_size_pretty(wal_bytes) AS total_wal_bytes
FROM pg_stat_wal;
Compare WAL bytes per second to your Oracle redo generation rate (from V$LOG_HISTORY or AWR). A ratio of WAL:redo above 2x is common and expected due to full-page writes. A ratio above 4x suggests checkpoint frequency is too high and full-page write overhead is excessive.
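On the Oracle side, a rough redo rate can be derived from log switch frequency (a sketch; it assumes your redo logs are uniformly sized, so redo bytes ≈ switches × log file size):

-- Oracle: log switches per hour over the last day
SELECT TRUNC(first_time, 'HH24') AS hour,
       COUNT(*) AS log_switches
FROM v$log_history
WHERE first_time > SYSDATE - 1
GROUP BY TRUNC(first_time, 'HH24')
ORDER BY hour;
-- Multiply log_switches by the redo log size (V$LOG.BYTES)
-- to estimate redo bytes per hour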
max_wal_size: The Most Important Parameter for Write-Heavy Workloads
The default max_wal_size is 1 GB. On a write-heavy OLTP system generating 500 MB/minute of WAL, PostgreSQL will trigger a checkpoint roughly every 2 minutes. Each checkpoint restarts the full-page write cycle — every page touched after the checkpoint generates a full-page WAL record on first modification. At 2-minute checkpoint intervals, a large fraction of your working set pages are getting written in full on every checkpoint cycle.
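To see where your own configuration lands, the expected interval between WAL-triggered checkpoints is roughly max_wal_size divided by the WAL generation rate (a sketch; the 500 MB/min figure is the hypothetical rate from above, substitute your measured value):

-- Estimated minutes between WAL-triggered checkpoints
-- pg_settings reports max_wal_size in MB
SELECT setting::numeric / 500.0 AS est_minutes_between_checkpoints
FROM pg_settings
WHERE name = 'max_wal_size';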
Increasing max_wal_size extends the time between WAL-triggered checkpoints, reducing full-page write frequency and WAL volume:
# postgresql.conf tuning for write-heavy OLTP
# (Aurora: set via custom parameter group)

# Allow more WAL to accumulate before forcing a checkpoint
max_wal_size = 8GB                 # default 1GB; increase to 4-16GB for write-heavy

# Allow more time between time-triggered checkpoints
checkpoint_timeout = 15min         # default 5min; extend to 10-30min

# Spread checkpoint I/O across the checkpoint interval
# 0.9 means use 90% of the interval for checkpoint writes
checkpoint_completion_target = 0.9 # default 0.9, keep this

# WAL buffers: increase for high-concurrency write workloads
wal_buffers = 64MB                 # default -1 (auto, ~3% of shared_buffers,
                                   # capped at one 16MB WAL segment);
                                   # explicit 64MB for write-heavy loads
The tradeoff: larger max_wal_size means longer crash recovery time (PostgreSQL must replay more WAL after an unclean shutdown). For most managed Aurora deployments where crash recovery is handled by Aurora's storage layer and recovery time is inherently short, this tradeoff strongly favors larger max_wal_size.
checkpoint_completion_target and I/O Smoothing
Oracle's DBWn writes dirty blocks continuously in the background, smoothing I/O over time. PostgreSQL's checkpoint process writes all dirty pages within a checkpoint interval — but checkpoint_completion_target controls how much of that interval it uses.
At the default of 0.9, PostgreSQL targets completing checkpoint writes within 90% of the checkpoint interval. For a checkpoint_timeout of 15 minutes, that means checkpoint writes spread over ~13.5 minutes, with the remaining 1.5 minutes as headroom. This is the correct setting — do not reduce it. Reducing checkpoint_completion_target concentrates writes into a shorter window and causes I/O spikes that show up as write latency.
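Whether the smoothing is actually working can be read off the cumulative checkpoint timings (a sketch against pg_stat_bgwriter as it exists through PostgreSQL 16; in 17+ these columns moved to pg_stat_checkpointer):

-- Average checkpoint write-phase duration per checkpoint
SELECT checkpoints_timed + checkpoints_req AS total_checkpoints,
       round((checkpoint_write_time
              / NULLIF(checkpoints_timed + checkpoints_req, 0)
              / 1000.0)::numeric, 1) AS avg_write_phase_seconds,
       round((checkpoint_sync_time
              / NULLIF(checkpoints_timed + checkpoints_req, 0)
              / 1000.0)::numeric, 1) AS avg_sync_phase_seconds
FROM pg_stat_bgwriter;

Under steady load with checkpoint_timeout = 15min and a target of 0.9, avg_write_phase_seconds should approach ~810 seconds (13.5 minutes); a much smaller number alongside heavy write I/O suggests checkpoints are being requested early.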
On Aurora, the storage layer buffers write I/O at the storage node level, so the impact of checkpoint burst I/O is more muted than on self-managed PostgreSQL. But the WAL volume reduction from longer checkpoint intervals still matters for Aurora because WAL generation rate affects replication lag to Aurora readers.
synchronous_commit: The Latency vs Durability Trade
Oracle's COMMIT always waits for LGWR to confirm the redo record is written to disk before returning to the application. This is synchronous commit — no committed data is ever lost even if the instance crashes immediately after COMMIT returns.
PostgreSQL's synchronous_commit parameter offers five modes:
# synchronous_commit options and their durability/latency tradeoffs:

# on (default): COMMIT waits for WAL flush to disk.
# Equivalent to Oracle's behavior. No data loss on crash.
synchronous_commit = on

# remote_write: COMMIT waits for WAL to reach the standby's OS buffers
# (requires synchronous_standby_names to be set).
synchronous_commit = remote_write

# local: COMMIT waits for local WAL flush but not for the standby.
synchronous_commit = local

# off: COMMIT returns once WAL is buffered, before it is flushed.
# Committed transactions from roughly the last 3 x wal_writer_delay
# (200ms default, so ~600ms) can be lost if the server crashes,
# whether PostgreSQL or the OS. The database itself stays consistent;
# only recent commits are lost.
synchronous_commit = off

# remote_apply: COMMIT waits for the standby to apply (strongest).
synchronous_commit = remote_apply
For write-heavy OLTP workloads migrated from Oracle, the default synchronous_commit = on is the safe starting point; it matches Oracle's commit behavior. If write latency benchmarks show COMMIT time as the primary bottleneck, synchronous_commit = off can reduce commit latency by 1-5ms (the WAL flush round-trip). The risk of losing the last few hundred milliseconds of committed transactions (up to roughly 3 x wal_writer_delay) on a crash is acceptable for some workloads and not for others. Financial transaction records typically require on. User event tracking or analytics writes can often tolerate off.
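Note also that synchronous_commit can be changed per session or per transaction, so a mixed workload can relax durability only for its low-value writes (a sketch; the table and columns are hypothetical):

-- Relax durability for a single low-value transaction
BEGIN;
SET LOCAL synchronous_commit = off;  -- scoped to this transaction only
INSERT INTO user_click_events (user_id, event_type, occurred_at)
VALUES (42, 'page_view', now());
COMMIT;  -- returns without waiting for the WAL flush
-- Subsequent transactions in the session revert to the server default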
bgwriter Tuning for Sustained Write Loads
PostgreSQL's bgwriter process proactively writes dirty shared_buffers pages to disk before the checkpoint process needs them. Proper bgwriter tuning prevents backend processes from having to write pages themselves (which adds latency to query execution):
# bgwriter parameters for write-heavy workloads
bgwriter_lru_maxpages = 200    # default 100; pages written per bgwriter round
bgwriter_lru_multiplier = 4.0  # default 2.0; anticipation factor for dirty pages
bgwriter_delay = 50ms          # default 200ms; reduce for faster dirty-page flushing
-- Monitor bgwriter effectiveness
SELECT
checkpoints_timed,
checkpoints_req, -- checkpoint triggered by WAL size, not time
-- High checkpoints_req means max_wal_size is too small
buffers_checkpoint,
buffers_clean, -- written by bgwriter
maxwritten_clean, -- bgwriter hit its per-round limit (increase bgwriter_lru_maxpages)
buffers_backend, -- written by backend processes (bad — means bgwriter can't keep up)
buffers_backend_fsync, -- fsync calls by backends (very bad)
buffers_alloc
FROM pg_stat_bgwriter;
The metric to watch is buffers_backend. If backends are writing their own dirty pages, bgwriter is not keeping up with the write rate. Increase bgwriter_lru_maxpages and reduce bgwriter_delay until buffers_backend approaches zero.
checkpoints_req being high relative to checkpoints_timed is the clearest signal that max_wal_size is too small — WAL is filling up and forcing checkpoints ahead of the time-based schedule.
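Both signals can be read in one pass as percentages rather than raw counters (a sketch against pg_stat_bgwriter as it exists through PostgreSQL 16):

-- Who is writing dirty pages, and how many checkpoints were forced by WAL size?
SELECT round(100.0 * buffers_backend /
             NULLIF(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
         AS backend_write_pct,    -- want this near 0
       round(100.0 * checkpoints_req /
             NULLIF(checkpoints_timed + checkpoints_req, 0), 1)
         AS forced_checkpoint_pct -- want this near 0
FROM pg_stat_bgwriter;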
WAL Compression
PostgreSQL has supported WAL compression via the wal_compression parameter since 9.5 (pglz only); PostgreSQL 15 added the LZ4 and Zstandard algorithms, which offer better ratios at lower CPU cost. For write-heavy workloads where WAL volume is the constraint, whether for storage cost, replication bandwidth, or to extend effective max_wal_size, WAL compression can reduce WAL volume by 40-70% on typical OLTP data:
# Enable WAL compression (lz4/zstd require PostgreSQL 15+)
wal_compression = lz4   # or 'zstd' for a better ratio at slightly higher CPU,
                        # or 'pglz' for the pre-15 built-in option (lower ratio)
-- Monitor compression effectiveness
SELECT
wal_bytes,
-- Compare to wal_bytes without compression from a baseline period
-- Ratio of wal_bytes before vs after enabling compression
wal_fpi,
wal_fpi * 8192 AS uncompressed_fpi_bytes -- rough uncompressed FPI size
FROM pg_stat_wal;
WAL compression is particularly effective for full-page images — the exact records that dominate WAL volume for write-heavy workloads. Page images are highly compressible because they contain significant amounts of free space and repeated patterns. LZ4 compression of FPI records typically achieves 50-60% size reduction at negligible CPU cost.
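Since pg_stat_wal is cumulative, measuring the effect of enabling compression needs a reset-and-replay cycle (a sketch; it assumes you can rerun a comparable workload between the two readings):

-- Baseline: reset WAL statistics, run the representative workload, then read
SELECT pg_stat_reset_shared('wal');
-- ... run representative write workload ...
SELECT wal_records, wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_volume,
       round(wal_bytes / NULLIF(wal_records, 0), 0) AS avg_bytes_per_record
FROM pg_stat_wal;
-- Repeat with wal_compression = lz4 and compare wal_volume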
On Aurora PostgreSQL, WAL compression availability depends on the Aurora PostgreSQL version. Check Aurora's parameter group for wal_compression support before planning around it.
The Pre-Cutover Configuration Checklist
For an Oracle write-heavy OLTP migration to Aurora PostgreSQL, finalize these parameters before go-live:
# Recommended starting configuration for write-heavy OLTP
# Apply via Aurora custom parameter group
max_wal_size = 8GB
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
wal_buffers = 64MB
bgwriter_lru_maxpages = 200
bgwriter_lru_multiplier = 4.0
bgwriter_delay = 50ms
synchronous_commit = on     # match Oracle behavior; evaluate 'off' after validation
wal_compression = lz4       # if the Aurora PostgreSQL version supports it
shared_buffers = 8GB        # 25% of instance RAM as a starting point
effective_cache_size = 24GB # 75% of instance RAM
Load test with production-representative traffic against these parameters before cutover. Measure checkpoints_req (should be near zero), buffers_backend (should be near zero), WAL generation rate, and p99 COMMIT latency. Adjust max_wal_size upward if checkpoints_req remains elevated under sustained write load.
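A single snapshot query covers the go/no-go counters (a sketch; p99 COMMIT latency still has to come from your load-testing tool or pg_stat_statements):

-- Go/no-go snapshot after the load test
SELECT
  (SELECT checkpoints_req FROM pg_stat_bgwriter) AS wal_forced_checkpoints, -- want ~0
  (SELECT buffers_backend FROM pg_stat_bgwriter) AS backend_page_writes,    -- want ~0
  (SELECT pg_size_pretty(wal_bytes) FROM pg_stat_wal) AS total_wal_generated;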
The goal is a PostgreSQL configuration that matches the write throughput characteristics of the Oracle system it is replacing — not the PostgreSQL defaults, which are sized for much more modest write loads.
Migrating a write-heavy Oracle OLTP system to Aurora?
We size WAL configuration, checkpoint tuning, and instance architecture for your specific write volume before you commit to a cutover date. Free assessment, no obligation.