// CLOUD MIGRATION · 14 MIN READ

Oracle to AWS DMS Cutover: Closing the Gap Without Extended Downtime

The most dangerous phase of an Oracle DMS migration isn't the full-load — it's the hours before cutover when CDC lag grows faster than it drains. Most extended maintenance windows come from not understanding this phase in advance.

// PUBLISHED 2024-08-13 · LANIAKEA TEAM

Why DMS Cutover Windows Grow

Oracle to Aurora PostgreSQL (or RDS PostgreSQL) migrations via AWS DMS follow a predictable two-phase structure: full load followed by CDC (Change Data Capture). The full-load phase copies the historical data. The CDC phase applies ongoing changes read from Oracle's redo logs (via LogMiner or Binary Reader) to keep the target in sync. Cutover means stopping the source application, waiting for CDC lag to reach zero, then pointing the application at the target.

The problem is that CDC lag is not a fixed number you drain down to zero. It's a rate — changes are arriving from the source at one speed and DMS is applying them at another. If the source write rate exceeds DMS's apply throughput for any sustained period, lag grows. And during business hours on the day of cutover, that's exactly what happens to most Oracle OLTP systems.
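The arithmetic is worth making explicit. With hypothetical rates, say changes arriving at 800/s during peak while DMS applies 500/s, the backlog compounds quickly:

```shell
# Hypothetical rates: source generates 800 changes/s at peak,
# DMS applies 500 changes/s. Lag grows by the difference.
arrival=800
apply=500
growth_per_hour=$(( (arrival - apply) * 3600 ))
echo "lag grows by ${growth_per_hour} changes per hour"
```

At a 500 changes/s apply rate, that single hour of peak traffic adds roughly 36 minutes of drain time on its own.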

Teams that run dry-run cutovers at 2am on a quiet system and then attempt the real cutover at 7pm on a Tuesday get surprised. The lag that was 30 seconds during the dry run is now 45 minutes, and the maintenance window was sized for 2 hours total.

Understanding DMS CDC Throughput Limits

AWS DMS CDC throughput for Oracle sources is bounded by three things: the DMS replication instance size, the number of LOB columns and their handling mode, and the LogMiner extraction rate from the Oracle source.

The DMS replication instance is the most commonly under-provisioned component. The default instance for migration projects is often a dms.r5.large or dms.r5.xlarge, which provides 2–4 vCPUs for the DMS engine and approximately 500–800 MB/s of network throughput. For a high-volume Oracle source generating 50–200 MB/s of redo during peak hours, this becomes a bottleneck.

Monitor DMS replication instance CPU and memory in CloudWatch before cutover:

aws cloudwatch get-metric-statistics \
  --namespace AWS/DMS \
  --metric-name CPUUtilization \
  --dimensions Name=ReplicationInstanceIdentifier,Value=your-replication-instance \
  --start-time $(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ) \
  --end-time $(date -u +%Y-%m-%dT%H:%M:%SZ) \
  --period 300 \
  --statistics Maximum \
  --output table

If CPU on the replication instance exceeds 70% during normal business hours (not just peak), you are already at risk of falling behind during the cutover window. Upsize the replication instance before the cutover date, not during the window.
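If the instance needs more headroom, it can be resized in place ahead of the window. A sketch (the ARN is a placeholder; expect replication to pause while the modification applies):

```shell
# Resize the DMS replication instance ahead of cutover.
# The ARN below is a placeholder for your instance's ARN.
aws dms modify-replication-instance \
  --replication-instance-arn arn:aws:dms:eu-west-1:123456789012:rep:EXAMPLE \
  --replication-instance-class dms.r5.2xlarge \
  --apply-immediately
```

Schedule this days before cutover so the task has time to catch back up after the pause.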

The LogMiner Extraction Problem

DMS uses Oracle LogMiner by default to extract CDC changes from Oracle redo logs. LogMiner has specific performance characteristics that vary by Oracle version and configuration: extraction runs in a single reader session per task, so it does not parallelize, and throughput degrades as redo volume grows. For sources generating tens of GB of redo per hour, AWS generally recommends switching the source endpoint to Binary Reader. Either way, redo volume is the lever you control directly, and supplemental logging scope is the biggest contributor to it.

Verify supplemental logging is correctly scoped before cutover:

-- Check supplemental logging status on source Oracle
SELECT supplemental_log_data_min,
       supplemental_log_data_pk,
       supplemental_log_data_ui,
       supplemental_log_data_fk,
       supplemental_log_data_all
FROM v$database;

-- Check table-level supplemental logging
SELECT table_name, log_group_name, log_group_type, always
FROM dba_log_groups
WHERE owner = 'SCHEMA_NAME'
ORDER BY table_name, log_group_name;

If you enabled ALL COLUMNS supplemental logging instead of PRIMARY KEY supplemental logging, you're generating 3–5x more redo data than necessary. DMS only needs primary key supplemental logging plus any columns you're updating without a full row image. Switching from ALL COLUMNS to PRIMARY KEY supplemental logging on large tables can reduce redo volume by 60% and directly reduce DMS CDC lag.
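A sketch of the switch (the table name is an example; new logging levels only affect redo generated after the change, so make it well before cutover, and confirm your DMS task's column requirements first):

```sql
-- Replace database-level ALL COLUMNS logging with PRIMARY KEY logging
ALTER DATABASE DROP SUPPLEMENTAL LOG DATA (ALL) COLUMNS;
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;

-- Table-level equivalent for individual large tables (name is a placeholder)
ALTER TABLE schema_name.big_table ADD SUPPLEMENTAL LOG DATA (PRIMARY KEY) COLUMNS;
```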

Gap Closure: The Pre-Cutover Window

The goal of the pre-cutover window is not to eliminate lag — it's to establish a known, stable lag value that you can calculate a drain time from. The cutover window starts when the source application is stopped, and ends when lag reaches zero. Your job is to know, within 15 minutes, how long that drain will take.

Two weeks before cutover, run this measurement exercise:

  1. Identify the 5-minute window with the highest write load on your Oracle source (typically during batch processing or business peak)
  2. Record the DMS CDC lag at the start and end of that window
  3. Calculate the lag accumulation rate: (end_lag - start_lag) / 5 minutes
  4. Record the DMS apply throughput during the same window from CloudWatch CDCIncomingChanges and CDCChangesApplied metrics
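Step 3's calculation, sketched with hypothetical lag readings:

```shell
# Hypothetical readings from the peak 5-minute window
start_lag=30    # seconds of CDC lag at window start
end_lag=240     # seconds of CDC lag at window end
rate=$(( (end_lag - start_lag) / 5 ))
echo "lag accumulates at ${rate} seconds per minute under peak load"
```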

This gives you two numbers: peak lag accumulation rate and steady-state apply throughput. The drain time after application stop is:

drain_time_seconds = queued_changes / (apply_rate - arrival_rate)

where queued_changes is the backlog DMS has read but not yet applied, and both rates are in changes per second. After the application stops, the arrival rate drops to near zero (only background Oracle processes), so this simplifies to:
drain_time_seconds ≈ queued_changes_at_app_stop / apply_rate

If DMS is applying 500 changes/second and you have 90,000 changes in the lag queue when the application stops, drain time is approximately 3 minutes. If you have 2,000,000 changes queued, drain time is approximately 67 minutes.
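The two worked examples, as a quick check:

```shell
apply_rate=500                        # changes applied per second
small=$(( 90000 / apply_rate ))       # small backlog at app stop
large=$(( 2000000 / apply_rate ))     # large backlog at app stop
echo "small backlog drains in ${small}s (~$(( small / 60 )) min)"
echo "large backlog drains in ${large}s (~$(( large / 60 )) min)"
```

(Integer division rounds down: 4000 seconds is 66.7 minutes.)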

DMS Task Configuration for Cutover Performance

Several DMS task settings directly affect how quickly lag drains during the cutover window.

Batch Apply Settings

{
  "TargetMetadata": {
    "TargetSchema": "",
    "SupportLobs": true,
    "FullLobMode": false,
    "LobChunkSize": 64,
    "LimitedSizeLobMode": true,
    "LobMaxSize": 32768,
    "BatchApplyEnabled": true
  },
  "FullLoadSettings": {
    "TargetTablePrepMode": "DO_NOTHING",
    "CreatePkAfterFullLoad": false,
    "StopTaskCachedChangesApplied": false,
    "StopTaskCachedChangesNotApplied": false
  },
  "ChangeProcessingTuning": {
    "BatchApplyPreserveTransaction": true,
    "BatchApplyTimeoutMin": 1,
    "BatchApplyTimeoutMax": 30,
    "BatchApplyMemoryLimit": 500,
    "BatchSplitSize": 0,
    "MinTransactionSize": 1000,
    "CommitTimeout": 1,
    "MemoryLimitTotal": 1024,
    "MemoryKeepTime": 60,
    "StatementCacheSize": 50
  }
}

The BatchApplyTimeoutMax setting is critical. The default is 30 seconds, meaning DMS waits up to 30 seconds before flushing a batch to the target. During gap closure you want this lower: set it to 5–10 seconds to increase apply frequency at the cost of slightly larger commit overhead per batch. Note that the batch tuning settings only take effect when BatchApplyEnabled is true in TargetMetadata; without it, DMS applies changes one transaction at a time.

MinTransactionSize controls how many changes DMS accumulates before writing a batch. During normal CDC with ongoing source writes, a higher value (1000–5000) gives better batch efficiency. During gap closure when the source has stopped and the queue is draining, lower values (100–500) reduce per-transaction overhead and speed drain time.
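Settings changes like these can be pushed with the CLI. A sketch (the ARN and file name are placeholders; DMS requires the task to be stopped before its settings can be modified):

```shell
# Stop the task, apply the tuned settings JSON, then resume CDC.
# The ARN and settings file name are placeholders.
aws dms stop-replication-task \
  --replication-task-arn arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE

aws dms modify-replication-task \
  --replication-task-arn arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE \
  --replication-task-settings file://gap-closure-settings.json

aws dms start-replication-task \
  --replication-task-arn arn:aws:dms:eu-west-1:123456789012:task:EXAMPLE \
  --start-replication-task-type resume-processing
```

Resuming with resume-processing continues CDC from the last checkpoint rather than restarting the full load.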

LOB Column Handling

LOB columns are the single largest contributor to DMS CDC slowness on Oracle sources. In limited LOB mode (LimitedSizeLobMode: true), DMS handles LOBs inline up to the LobMaxSize limit. Data exceeding that limit is silently truncated. In full LOB mode, DMS issues a SELECT back to the source Oracle for every LOB value, creating a read load on the source and slowing apply throughput by 3–8x.

Audit your LOB column sizes before the migration:

-- Step 1: find all LOB columns in the schema
SELECT table_name, column_name, data_type
FROM dba_tab_columns
WHERE data_type IN ('CLOB', 'BLOB', 'NCLOB')
  AND owner = 'SCHEMA_NAME'
ORDER BY table_name, column_name;

-- Step 2: measure actual LOB sizes, one query per table found above
-- (schema, table, and column names below are placeholders)
SELECT MAX(DBMS_LOB.GETLENGTH(lob_column)) AS max_lob_bytes,
       COUNT(*) AS total_rows
FROM schema_name.table_with_lob;

Set LobMaxSize to the 99th percentile LOB size in your data, not the maximum. If 99% of your CLOBs are under 64KB but one legacy record has a 10MB CLOB, sizing to 10MB wastes memory on every row. Identify and handle outlier LOB records separately.
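To find the percentile rather than the max, export per-row LOB lengths (DBMS_LOB.GETLENGTH spooled to a file, one value per line) and take the 99th-percentile entry. A sketch with synthetic data standing in for the export:

```shell
# Synthetic stand-in for a file of per-row LOB byte lengths;
# with real data, replace the seq with: cat lob_sizes.txt
seq 1000 1000 100000 | sort -n \
  | awk '{a[NR]=$1} END {print "p99 LOB size:", a[int(NR*0.99)], "bytes"}'
```

Oracle can also compute this server-side with PERCENTILE_CONT(0.99) WITHIN GROUP (ORDER BY DBMS_LOB.GETLENGTH(lob_column)).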

The Dry-Run Cutover Protocol

Run at least two dry-run cutovers before the production window. The first dry run reveals configuration problems. The second validates the gap-closure timing under realistic load. A dry-run checklist:

PRE-CUTOVER DRY RUN CHECKLIST
==============================
[ ] DMS task is in CDC_RUNNING state with stable lag (< 60s for 1 hour)
[ ] CloudWatch CDCLatencySource < 60s for past 4 hours
[ ] Replication instance CPU < 60% under peak source load
[ ] LOB handling mode validated against actual data sizes
[ ] Target sequence values set to max(source) + buffer
[ ] Constraint validation: all FK references satisfied on target
[ ] Row count comparison: source vs target for top 20 tables by volume
[ ] Index rebuild on target completed (if deferred post-full-load)
[ ] Application stop procedure rehearsed (connections drained, services stopped)
[ ] Lag-zero confirmation query ready to execute on DMS metrics
[ ] Rollback trigger defined: if gap > X minutes at cutover start, abort
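The sequence item in the checklist, sketched for a PostgreSQL target (sequence, table, and column names are examples):

```sql
-- Advance the target sequence past the source's current max, with buffer
SELECT setval('orders_order_id_seq',
              (SELECT MAX(order_id) FROM orders) + 1000);
```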

The rollback trigger is non-negotiable. Define a maximum acceptable drain time (typically 2x the expected drain time from your measurements) and commit to aborting the cutover if lag has not closed within that window. Trying to push through a 4-hour drain when the window was planned for 1 hour causes more damage than rescheduling.

Post-Cutover Validation Before DNS Flip

When DMS shows lag = 0, you have 5–10 minutes of confirmation work before flipping DNS or updating connection strings:

-- On target: verify row counts for critical tables
SELECT 'orders' AS tbl, COUNT(*) FROM orders
UNION ALL
SELECT 'order_items', COUNT(*) FROM order_items
UNION ALL
SELECT 'customers', COUNT(*) FROM customers;

-- Compare against source Oracle (run simultaneously)
SELECT 'orders' AS tbl, COUNT(*) FROM orders
UNION ALL
SELECT 'order_items', COUNT(*) FROM order_items
UNION ALL
SELECT 'customers', COUNT(*) FROM customers;

Row count parity is not sufficient validation — you also need to verify the most recent records. Check the maximum value of your primary timestamp or auto-increment column on both source and target for the 5 highest-volume tables. If the source shows order_id max = 8,847,332 and the target shows 8,847,315, seventeen orders went missing and you cannot cut over.
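A minimal recent-record check, run simultaneously on source and target (column names are examples):

```sql
-- Values must match exactly on source Oracle and target PostgreSQL
SELECT MAX(order_id)   AS max_order_id,
       MAX(updated_at) AS max_updated_at
FROM orders;
```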

Only after row counts and recent-record checks pass should you execute the DNS flip or connection string update. The DMS task can remain running in CDC mode as a safety net for 30–60 minutes post-cutover before you stop it.

Planning an Oracle DMS cutover in the next 90 days?

We run the pre-cutover measurement exercises, size the DMS infrastructure, and own the gap-closure process so your maintenance window stays inside its planned boundary. Free assessment, no obligation.