// CLOUD INFRASTRUCTURE · 12 MIN READ

Oracle on EC2: EBS Striping for Redo Logs on NVMe Instances

NVMe instance store delivers sub-100 microsecond write latency — far better than any EBS volume. The problem is it vanishes when the instance stops. The right redo log architecture on NVMe EC2 instances captures the latency benefit of local NVMe while keeping redo data on durable EBS, using OS-level striping to push both throughput and IOPS beyond what a single EBS volume can deliver.

// PUBLISHED 2023-04-11 · LANIAKEA TEAM

The Redo Log I/O Problem on EC2

Every Oracle COMMIT waits for LGWR to flush the redo buffer to disk. The latency of that write is the floor for your transaction commit time. On a SAN with a write cache, that flush completes in under a millisecond. On EBS gp3, it typically completes in 1-3ms depending on queue depth and volume configuration. On EBS io2, 0.5-1ms. On local NVMe instance store, under 0.1ms.

For an OLTP workload committing 2,000 transactions per second, the difference between 3ms and 0.5ms redo write latency is the difference between needing 6 LGWR processes and needing 1. The redo log I/O path is often the single most latency-sensitive write in an Oracle database, and it's the one that most EC2 configurations handle worst.
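The arithmetic above can be sketched quickly. A minimal back-of-envelope calculation (illustrative only; it ignores group commit, which batches many transactions into a single flush):

```shell
# Max serial redo flushes per second for a single log writer at a given
# per-write latency. Real LGWR batches commits, so effective TPS is higher,
# but the flush-rate ceiling scales the same way.
for lat_ms in 3 1 0.5 0.1; do
  awk -v l="$lat_ms" 'BEGIN { printf "%.1f ms/write -> ~%d serial flushes/sec\n", l, 1000 / l }'
done
```

At 3ms per write a single serial writer tops out around 333 flushes per second; at 0.5ms, around 2,000, which is where the "6 LGWR processes vs 1" comparison comes from.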

The typical mistake: placing Oracle redo logs on a single gp3 volume because "it's fast enough." It may be fast enough at current load. It won't be when load doubles, and the redo log becomes the bottleneck long before anything else in the storage stack does.

Understanding the NVMe Instance Store Constraint

Instance store (local NVMe) on EC2 is attached directly to the physical host. It's not network-attached like EBS — there's no storage fabric between the CPU and the flash. This gives it dramatically lower latency and higher IOPS than any EBS volume type at any price tier.

The constraint: instance store is ephemeral. If the instance is stopped (not just rebooted), the instance store data is gone. If the underlying host fails and the instance is recovered on new hardware, the instance store data is gone. EC2 instance termination destroys it. Instance store is not suitable as the sole location for Oracle redo logs on a production database.

NVMe instance store is available on the following EC2 instance families commonly used for Oracle: i3, i3en, i4i, r5d, r6id, m5d, m6id, x2idn, and x2iedn. The i-family instances are storage-optimized with multiple NVMe devices. The d-suffix instances (r5d, m5d) have 1-4 NVMe devices in addition to standard EBS.
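On these instances both EBS and instance store appear as /dev/nvme* devices, which makes it easy to stripe the wrong disks. One way to tell them apart is the device model string (device names vary by instance; lsblk ships with util-linux on any mainstream distribution):

```shell
# List block devices with their model strings. On EC2:
#   EBS volumes report the model   "Amazon Elastic Block Store"
#   instance store reports         "Amazon EC2 NVMe Instance Storage"
lsblk -d -o NAME,SIZE,MODEL
```

Verify the model string for every device you pass to pvcreate; striping an ephemeral device into the redo volume group silently defeats the durability goal.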

The Architecture: Striped EBS as the Durable Layer

The production-grade Oracle redo log architecture on NVMe instances uses EBS as the durable storage layer, with striping to maximize throughput and IOPS, and places Oracle redo log members on the striped EBS volumes. The NVMe instance store, if present, is better used for data that can tolerate loss, such as Database Smart Flash Cache (which extends the buffer cache onto local flash) or temporary tablespaces, never for redo logs.

Why stripe EBS for redo logs specifically? A single io2 volume tops out at 64,000 IOPS and 1,000 MB/s. A busy Oracle OLTP database can generate 500+ MB/s of redo during peak batch processing. Two striped io2 volumes give you 2,000 MB/s aggregate throughput at the OS level — well above what LGWR can generate even on the largest EC2 instances. The IOPS cap is also doubled, which matters for redo log writes that arrive as thousands of small sequential writes per second.

Building the Striped EBS Volume with LVM

The configuration below uses Linux LVM striping across two io2 EBS volumes. This is the approach we use for Oracle redo logs on r6id and i4i instances where we want durable, high-throughput redo storage without depending on instance store.

#!/bin/bash
# Oracle redo log EBS striping setup
# Assumes two io2 EBS volumes: /dev/nvme1n1 and /dev/nvme2n1
# Each 500GB, 32000 IOPS provisioned

# Install LVM2 if not present
yum install -y lvm2

# Create physical volumes
pvcreate /dev/nvme1n1
pvcreate /dev/nvme2n1

# Create volume group
vgcreate vg_redo /dev/nvme1n1 /dev/nvme2n1

# Create striped logical volume
# stripes=2 across both PVs
# stripesize=64 (KiB): large enough to keep each of LGWR's sequential
# bursts on one device per chunk (the Oracle redo block itself is 512 bytes)
lvcreate -n lv_redo \
  -L 900G \
  --stripes 2 \
  --stripesize 64 \
  vg_redo

# Verify striping configuration
lvdisplay -m /dev/vg_redo/lv_redo

# Format with XFS (recommended for Oracle on Linux)
# su=65536 (64KiB) matches the LVM stripe chunk; sw=2 for the two stripes
mkfs.xfs -f \
  -d su=65536,sw=2 \
  /dev/vg_redo/lv_redo

# Mount options for Oracle redo sequential writes
# (the legacy "nobarrier" option is deprecated and has been removed from
# XFS in modern kernels; barriers are cheap on EBS, so leave them enabled)
mkdir -p /u03/redo
echo "/dev/vg_redo/lv_redo /u03/redo xfs defaults,noatime,nodiratime 0 0" \
  >> /etc/fstab
mount /u03/redo

# Set permissions for Oracle
chown oracle:oinstall /u03/redo
chmod 775 /u03/redo
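After the script completes, it's worth confirming that the stripe geometry actually landed as intended before handing the mount to Oracle. Paths match the script above; note that xfs_info reports sunit/swidth in filesystem blocks (4096 bytes here):

```shell
# LVM view: expect 2 stripes at 64.00k
lvs -o lv_name,stripes,stripe_size vg_redo

# XFS view: expect sunit=16 swidth=32 blks
# (16 blocks x 4096 bytes = 64 KiB chunk; 32 x 4096 = 128 KiB full stripe)
xfs_info /u03/redo | grep -E 'sunit|swidth'
```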

Verifying Stripe Performance Before Configuring Oracle

Before placing Oracle redo logs on the striped volume, verify that the stripe is delivering the expected throughput. Use fio — the standard tool for storage performance testing on Linux:

# Sequential write test matching Oracle redo write pattern
# 512KB writes (Oracle typically writes redo in large sequential bursts)
# queue depth 1 (LGWR is typically single-threaded for log writes)
fio --name=redo_write_test \
  --filename=/u03/redo/fio_test \
  --rw=write \
  --bs=512k \
  --direct=1 \
  --numjobs=1 \
  --iodepth=1 \
  --size=10G \
  --runtime=60 \
  --time_based \
  --group_reporting

# Expected result on a 2x io2 stripe at queue depth 1: roughly 800-1200 MB/s,
# i.e. about 0.4-0.6ms per 512KB write; at QD1, throughput is latency-bound

# Also test small random writes (simulates high-frequency small commits)
fio --name=redo_iops_test \
  --filename=/u03/redo/fio_test \
  --rw=randwrite \
  --bs=8k \
  --direct=1 \
  --numjobs=4 \
  --iodepth=8 \
  --size=10G \
  --runtime=60 \
  --time_based \
  --group_reporting

# Clean up test file
rm /u03/redo/fio_test

If write throughput is significantly below expected on the stripe, check that the EBS volumes are actually provisioned at the IOPS you specified (visible in the AWS Console or via aws ec2 describe-volumes) and that the EC2 instance's EBS bandwidth ceiling isn't the bottleneck. An r6id.xlarge has a maximum EBS bandwidth of 4.75 Gbps (~594 MB/s) — less than what two io2 volumes can deliver combined. Size your instance for EBS bandwidth, not just the volume specs.
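Both checks can be scripted. The volume IDs below are placeholders, and the Gbps figure is whatever AWS advertises for your instance size:

```shell
# Confirm provisioned IOPS on each volume in the stripe
# (Throughput is a gp3-only attribute; io2 derives throughput from IOPS)
aws ec2 describe-volumes \
  --volume-ids vol-0123456789abcdef0 vol-0abcdef0123456789 \
  --query 'Volumes[].{Id:VolumeId,Type:VolumeType,Iops:Iops}' \
  --output table

# Convert the instance's advertised EBS bandwidth (Gbps) to MB/s
awk -v gbps=4.75 'BEGIN { printf "EBS ceiling: ~%.0f MB/s\n", gbps * 1000 / 8 }'
```

If the fio result sits suspiciously close to the converted MB/s figure rather than the stripe's aggregate capability, the instance, not the volumes, is the limit.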

Oracle Redo Log Configuration on Striped EBS

With the striped volume mounted at /u03/redo, configure Oracle redo log groups to use multiplexed members across the striped path and a secondary path on a separate EBS volume:

-- Check current redo log configuration
SELECT group#, members, bytes/1048576 AS size_mb, status, archived
FROM v$log
ORDER BY group#;

SELECT group#, member, status
FROM v$logfile
ORDER BY group#, member;

-- Add new larger redo log groups on striped EBS
-- 2GB groups for a high-throughput OLTP database
-- Target: log switch no more than once every 20 minutes at peak load
ALTER DATABASE ADD LOGFILE GROUP 10
  ('/u03/redo/redo10a.log',
   '/u04/redo_mirror/redo10b.log')
  SIZE 2048M;

ALTER DATABASE ADD LOGFILE GROUP 11
  ('/u03/redo/redo11a.log',
   '/u04/redo_mirror/redo11b.log')
  SIZE 2048M;

ALTER DATABASE ADD LOGFILE GROUP 12
  ('/u03/redo/redo12a.log',
   '/u04/redo_mirror/redo12b.log')
  SIZE 2048M;

ALTER DATABASE ADD LOGFILE GROUP 13
  ('/u03/redo/redo13a.log',
   '/u04/redo_mirror/redo13b.log')
  SIZE 2048M;

-- Drop old undersized groups (must not be CURRENT or ACTIVE)
-- Force a log switch first if needed
ALTER SYSTEM SWITCH LOGFILE;

-- Wait for groups to become INACTIVE, then drop
ALTER DATABASE DROP LOGFILE GROUP 1;
ALTER DATABASE DROP LOGFILE GROUP 2;
ALTER DATABASE DROP LOGFILE GROUP 3;

The /u04/redo_mirror path should be on a separate EBS volume from the striped set. The primary write path is /u03/redo on the stripe, but don't treat the mirror as cold storage: Oracle LGWR writes to all members of a log group in parallel and waits for every member to complete before acknowledging the commit, so if you multiplex redo across different-speed storage, commit latency is determined by the slowest member write. For this reason, keep the mirror on a volume that's reasonably fast (gp3 at baseline is fine; redo writes are sequential, and gp3 delivers 125-250 MB/s sequential write easily). Don't put the mirror on a volume that's also handling heavy datafile I/O.

Monitoring Redo Write Latency After Configuration

-- Measure LGWR write latency after new configuration
-- Compare against pre-change baseline from AWR
SELECT
  event,
  total_waits,
  ROUND(time_waited_micro / 1000 / NULLIF(total_waits, 0), 3) AS avg_wait_ms
FROM v$system_event
WHERE event IN (
  'log file sync',
  'log file parallel write',
  'log buffer space'
)
ORDER BY event;

-- log file parallel write = LGWR writing redo buffers to disk
-- Target: avg < 1ms on striped io2
-- log file sync = user session waiting for LGWR to complete
-- Target: avg < 2ms

-- Check log switch frequency (target: a switch no more often than every
-- 15-20 minutes, i.e. 3-4 per hour, at peak load)
SELECT
  TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
  COUNT(*) AS switches_per_hour
FROM v$log_history
WHERE first_time > SYSDATE - 3
GROUP BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER BY hour DESC;

The target metrics after implementing striped EBS redo: log file parallel write average wait under 1ms, log file sync average wait under 2ms, and log switches no more than 3-4 per hour at peak load. If you're still seeing log file sync above 5ms after implementing the stripe, investigate whether the EC2 instance EBS bandwidth ceiling is the constraint, not the volume configuration.
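The switch-frequency target also yields a sizing rule of thumb: size each redo group to hold roughly 20 minutes of peak redo generation. As an illustration (the rate below is an assumed example; read yours from the AWR "redo size" per-second statistic):

```shell
# Redo log sizing: group size (MB) = peak redo rate (MB/s) x 1200 seconds.
# At ~1.7 MB/s sustained redo, a 2048M group switches about every
# 20 minutes, consistent with the 2GB groups configured above.
redo_mb_s=1.7
awk -v r="$redo_mb_s" 'BEGIN { printf "target group size: ~%d MB\n", r * 1200 }'
```

If your measured peak rate is higher, scale the group size up rather than accepting more frequent switches; each switch triggers a checkpoint and archiver work.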

Need a second opinion on your stack?

We'll review your environment and share findings in 5–7 business days. No sales pitch, no obligation.