How Logical Decoding Works
Postgres writes every change to the Write-Ahead Log (WAL) before applying it to the heap. WAL is normally binary and tightly coupled to the physical storage format — not suitable for consumption by external systems. Logical decoding sits on top of WAL and transforms the physical change records into logical row-level events: INSERT, UPDATE, DELETE for specific tables, in transaction order, with before and after images depending on the replica identity setting.
The transformation is done by an output plugin — a shared library that receives raw decoded change records and serializes them into whatever format the consumer expects. Two output plugins cover the majority of production use cases: pgoutput (bundled with Postgres, used for native logical replication) and wal2json (third-party, produces JSON for CDC pipelines).
Logical decoding requires two prerequisites: wal_level = logical (not the default replica) and a replication slot to track consumer position.
-- Check current WAL level
SHOW wal_level;
-- wal_level must be 'logical' for logical decoding
-- On RDS/Aurora, set via parameter group and reboot:
-- rds.logical_replication = 1 (this also sets wal_level = logical)
-- Check existing replication slots
SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn,
pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
The retained_wal column in that query is the most important operational metric for logical replication slots. An inactive slot that is not being consumed retains all WAL since restart_lsn — indefinitely, until the slot is dropped or the consumer catches up. This is the disk bloat scenario described in detail in our earlier article on replication slot WAL retention.
pgoutput: The Built-In Plugin
pgoutput is the output plugin used by Postgres native logical replication. It is compiled into Postgres and requires no separate installation. It produces a binary protocol consumed by the pg_logical streaming replication protocol — not human-readable, but efficient.
You interact with pgoutput through publications and subscriptions:
-- Source database: create a publication for CDC
-- Publish all changes to specific tables
CREATE PUBLICATION cdc_pub
FOR TABLE orders, order_items, customers
WITH (publish = 'insert, update, delete');
-- Or publish all tables in the database
CREATE PUBLICATION cdc_pub_all FOR ALL TABLES;
-- Check what's published
SELECT pubname, puballtables, pubinsert, pubupdate, pubdelete, pubtruncate
FROM pg_publication;
SELECT pubname, schemaname, tablename
FROM pg_publication_tables
WHERE pubname = 'cdc_pub'
ORDER BY schemaname, tablename;
For consuming pgoutput directly (not through a subscription), you create a replication slot and stream from it using the logical replication protocol:
-- Create a logical replication slot using pgoutput
SELECT pg_create_logical_replication_slot('cdc_slot', 'pgoutput');
-- Peek at pending changes without consuming them (useful for testing)
SELECT lsn, xid, data
FROM pg_logical_slot_peek_binary_changes(
'cdc_slot',
NULL, -- up to LSN (NULL = all available)
NULL, -- max changes (NULL = all)
'proto_version', '1',
'publication_names', 'cdc_pub'
);
pgoutput is the right choice when: you're setting up native Postgres logical replication to a subscriber, you're building a CDC consumer in a language with a Postgres replication protocol library, or you're using Debezium with its Postgres connector (which uses pgoutput by default in recent versions).
wal2json: JSON Output for Pipeline Integration
wal2json is a third-party output plugin that serializes WAL changes as JSON. It is widely used with Debezium (older configurations), AWS DMS, and custom CDC consumers that want human-readable change events without implementing the binary replication protocol.
-- wal2json must be installed separately
-- On Amazon RDS/Aurora: pre-installed, available by default
-- On self-managed Postgres: install via package manager or compile from source
-- Create a replication slot using wal2json
SELECT pg_create_logical_replication_slot('wal2json_slot', 'wal2json');
-- Read changes as JSON
SELECT lsn, xid, data::json
FROM pg_logical_slot_get_changes(
'wal2json_slot',
NULL,
NULL,
'pretty-print', '1',
'include-timestamp', '1',
'include-transaction-id', '1'
);
Example wal2json output for an INSERT:
{
"change": [
{
"kind": "insert",
"schema": "public",
"table": "orders",
"columnnames": ["order_id", "customer_id", "amount", "created_at"],
"columntypes": ["integer", "integer", "numeric", "timestamp"],
"columnvalues": [10042, 501, 149.99, "2025-06-03 14:32:10.123456"]
}
]
}
wal2json v2 (the current version) added support for before-images on UPDATE and DELETE, which requires the table's replica identity to include the full row:
-- Enable full replica identity for before-images on UPDATE/DELETE
ALTER TABLE orders REPLICA IDENTITY FULL;
-- Default is REPLICA IDENTITY DEFAULT (primary key columns only)
-- FULL captures all columns in the before-image — higher WAL volume
-- INDEX uses a specific unique index instead of the full row
-- Check replica identity settings
SELECT relname, CASE relreplident
WHEN 'd' THEN 'DEFAULT (primary key)'
WHEN 'n' THEN 'NOTHING'
WHEN 'f' THEN 'FULL'
WHEN 'i' THEN 'INDEX'
END AS replica_identity
FROM pg_class
WHERE relkind = 'r'
AND relnamespace = 'public'::regnamespace
ORDER BY relname;
pgoutput vs. wal2json: Choosing the Right Plugin
The choice is primarily driven by your consumer architecture:
Use pgoutput when:
- Setting up Postgres-to-Postgres logical replication (subscriptions require pgoutput)
- Using Debezium 1.6+ with the Postgres connector (pgoutput is the default and preferred plugin)
- Your consumer speaks the Postgres replication protocol natively
- You need efficient binary transfer with low overhead on high-volume tables
Use wal2json when:
- Building a custom CDC consumer in a language without replication protocol support
- Integrating with AWS DMS (which uses wal2json for Postgres sources)
- Debugging change streams — JSON output is readable without a protocol decoder
- Using older Debezium versions (pre-1.6) that defaulted to wal2json
Replication Slot Monitoring: The Operational Non-Negotiable
Replication slots pin WAL on disk until the consumer confirms it has processed up to a given LSN. An inactive slot — one whose consumer has stopped, crashed, or been decommissioned without dropping the slot — retains all WAL indefinitely. On a high-write database, this fills the WAL partition in hours to days.
-- Monitor all replication slots: active status and WAL retention
SELECT
slot_name,
plugin,
active,
active_pid,
restart_lsn,
confirmed_flush_lsn,
pg_size_pretty(
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
) AS wal_retained,
pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS wal_retained_bytes
FROM pg_replication_slots
ORDER BY wal_retained_bytes DESC NULLS LAST;
Alert on any slot where wal_retained_bytes exceeds a threshold — typically 10–20 GB depending on available disk. On RDS and Aurora, the managed WAL storage has soft limits; a slot retaining hundreds of GB triggers storage alerts and can exhaust storage capacity.
-- Drop a stale slot that is no longer being consumed
-- WARNING: this cannot be undone; the consumer will need to re-snapshot
SELECT pg_drop_replication_slot('stale_cdc_slot');
-- On RDS Aurora, you can also use:
-- SELECT aurora_drop_replication_slot('stale_cdc_slot');
Setting max_slot_wal_keep_size
Postgres 13 introduced max_slot_wal_keep_size, which sets a hard limit on how much WAL a single slot can retain. When a slot's retained WAL exceeds this threshold, Postgres invalidates the slot rather than allowing it to fill disk:
# postgresql.conf
# Allow slots to retain up to 10GB before invalidating
max_slot_wal_keep_size = 10GB
-- Check if any slots have been invalidated
SELECT slot_name, invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;
An invalidated slot's consumer must perform a full resnapshot of the source tables before resuming CDC — which is expensive for large tables. Set max_slot_wal_keep_size high enough that brief consumer outages (hours) don't invalidate the slot, but low enough that extended outages don't fill disk. For most production setups, 10–50 GB is a reasonable range. On RDS, this parameter is available in the parameter group as of RDS for PostgreSQL 13.
Building a CDC pipeline on Postgres or debugging slot WAL retention?
We design and operationalize logical decoding architectures — plugin selection, replica identity strategy, slot monitoring, and WAL retention guardrails. Free assessment, no obligation.