Postgres Logical Decoding for Change Data Capture: pgoutput vs. wal2json

How Logical Decoding Works

Postgres writes every change to the Write-Ahead Log (WAL) before applying it to the heap. WAL is normally binary and tightly coupled to the physical storage format — not suitable for consumption by external systems. Logical decoding sits on top of WAL and transforms the physical change records into logical row-level events: INSERT, UPDATE, DELETE for specific tables, in transaction order, with before and after images depending on the replica identity setting.

The transformation is done by an output plugin — a shared library that receives raw decoded change records and serializes them into whatever format the consumer expects. Two output plugins cover the majority of production use cases: pgoutput (bundled with Postgres, used for native logical replication) and wal2json (third-party, produces JSON for CDC pipelines).

Logical decoding requires two prerequisites: wal_level = logical (not the default replica) and a replication slot to track consumer position.

-- Check current WAL level
SHOW wal_level;

-- wal_level must be 'logical' for logical decoding
-- On RDS/Aurora, set via parameter group and reboot:
-- rds.logical_replication = 1  (this also sets wal_level = logical)

-- Check existing replication slots
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

The retained_wal column in that query is the most important operational metric for logical replication slots. An inactive slot that is not being consumed retains all WAL since restart_lsn — indefinitely, until the slot is dropped or the consumer catches up. This is the disk bloat scenario described in detail in our earlier article on replication slot WAL retention.

pgoutput: The Built-In Plugin

pgoutput is the output plugin used by Postgres native logical replication. It is compiled into Postgres and requires no separate installation. It produces a binary protocol consumed by the pg_logical streaming replication protocol — not human-readable, but efficient.

You interact with pgoutput through publications and subscriptions:

-- Source database: create a publication for CDC
-- Publish all changes to specific tables
CREATE PUBLICATION cdc_pub
  FOR TABLE orders, order_items, customers
  WITH (publish = 'insert, update, delete');

-- Or publish all tables in the database
CREATE PUBLICATION cdc_pub_all FOR ALL TABLES;

-- Check what's published
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete, pubtruncate
FROM pg_publication;

SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'cdc_pub'
ORDER BY schemaname, tablename;

For consuming pgoutput directly (not through a subscription), you create a replication slot and stream from it using the logical replication protocol:

-- Create a logical replication slot using pgoutput
SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');

-- Peek at pending changes without consuming them (useful for testing)
SELECT lsn, xid, data
FROM pg_logical_slot_peek_binary_changes(
  'cdc_slot',
  NULL,  -- up to LSN (NULL = all available)
  NULL,  -- max changes (NULL = all)
  'proto_version', '1',
  'publication_names', 'cdc_pub'
);

pgoutput is the right choice when: you're setting up native Postgres logical replication to a subscriber, you're building a CDC consumer in a language with a Postgres replication protocol library, or you're using Debezium with its Postgres connector (which uses pgoutput by default in recent versions).

wal2json: JSON Output for Pipeline Integration

wal2json is a third-party output plugin that serializes WAL changes as JSON. It is widely used with Debezium (older configurations), AWS DMS, and custom CDC consumers that want human-readable change events without implementing the binary replication protocol.

-- wal2json must be installed separately
-- On Amazon RDS/Aurora: pre-installed, available by default
-- On self-managed Postgres: install via package manager or compile from source

-- Create a replication slot using wal2json
SELECT pg_create_logical_replication_slot('wal2json_slot', 'wal2json');

-- Read changes as JSON
SELECT lsn, xid, data::json
FROM pg_logical_slot_get_changes(
  'wal2json_slot',
  NULL,
  NULL,
  'pretty-print', '1',
  'include-timestamp', '1',
  'include-transaction-id', '1'
);

Example wal2json output for an INSERT:

{
  "change": [
    {
      "kind": "insert",
      "schema": "public",
      "table": "orders",
      "columnnames": ["order_id", "customer_id", "amount", "created_at"],
      "columntypes": ["integer", "integer", "numeric", "timestamp"],
      "columnvalues": [10042, 501, 149.99, "2025-06-03 14:32:10.123456"]
    }
  ]
}

wal2json v2 (the current version) added support for before-images on UPDATE and DELETE, which requires the table's replica identity to include the full row:

-- Enable full replica identity for before-images on UPDATE/DELETE
ALTER TABLE orders REPLICA IDENTITY FULL;

-- Default is REPLICA IDENTITY DEFAULT (primary key columns only)
-- FULL captures all columns in the before-image — higher WAL volume
-- INDEX uses a specific unique index instead of the full row

-- Check replica identity settings
SELECT relname, CASE relreplident
  WHEN 'd' THEN 'DEFAULT (primary key)'
  WHEN 'n' THEN 'NOTHING'
  WHEN 'f' THEN 'FULL'
  WHEN 'i' THEN 'INDEX'
END AS replica_identity
FROM pg_class
WHERE relkind = 'r'
  AND relnamespace = 'public'::regnamespace
ORDER BY relname;

pgoutput vs. wal2json: Choosing the Right Plugin

The choice is primarily driven by your consumer architecture:

Use pgoutput when:

Setting up Postgres-to-Postgres logical replication (subscriptions require pgoutput)
Using Debezium 1.6+ with the Postgres connector (pgoutput is the default and preferred plugin)
Your consumer speaks the Postgres replication protocol natively
You need efficient binary transfer with low overhead on high-volume tables

Use wal2json when:

Building a custom CDC consumer in a language without replication protocol support
Integrating with AWS DMS (which uses wal2json for Postgres sources)
Debugging change streams — JSON output is readable without a protocol decoder
Using older Debezium versions (pre-1.6) that defaulted to wal2json

Replication Slot Monitoring: The Operational Non-Negotiable

Replication slots pin WAL on disk until the consumer confirms it has processed up to a given LSN. An inactive slot — one whose consumer has stopped, crashed, or been decommissioned without dropping the slot — retains all WAL indefinitely. On a high-write database, this fills the WAL partition in hours to days.

-- Monitor all replication slots: active status and WAL retention
SELECT
  slot_name,
  plugin,
  active,
  active_pid,
  restart_lsn,
  confirmed_flush_lsn,
  pg_size_pretty(
    pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
  ) AS wal_retained,
  pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS wal_retained_bytes
FROM pg_replication_slots
ORDER BY wal_retained_bytes DESC NULLS LAST;

Alert on any slot where wal_retained_bytes exceeds a threshold — typically 10–20 GB depending on available disk. On RDS and Aurora, the managed WAL storage has soft limits; a slot retaining hundreds of GB triggers storage alerts and can exhaust storage capacity.

-- Drop a stale slot that is no longer being consumed
-- WARNING: this cannot be undone; the consumer will need to re-snapshot
SELECT pg_drop_replication_slot('stale_cdc_slot');

-- On RDS Aurora, you can also use:
-- SELECT aurora_drop_replication_slot('stale_cdc_slot');

Setting max_slot_wal_keep_size

Postgres 13 introduced max_slot_wal_keep_size, which sets a hard limit on how much WAL a single slot can retain. When a slot's retained WAL exceeds this threshold, Postgres invalidates the slot rather than allowing it to fill disk:

# postgresql.conf
# Allow slots to retain up to 10GB before invalidating
max_slot_wal_keep_size = 10GB

-- Check if any slots have been invalidated
SELECT slot_name, invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;

An invalidated slot's consumer must perform a full resnapshot of the source tables before resuming CDC — which is expensive for large tables. Set max_slot_wal_keep_size high enough that brief consumer outages (hours) don't invalidate the slot, but low enough that extended outages don't fill disk. For most production setups, 10–50 GB is a reasonable range. On RDS, this parameter is available in the parameter group as of RDS for PostgreSQL 13.

Building a CDC pipeline on Postgres or debugging slot WAL retention?

We design and operationalize logical decoding architectures — plugin selection, replica identity strategy, slot monitoring, and WAL retention guardrails. Free assessment, no obligation.

Get a Free Assessment → More Articles