DB2 LUW MQT Performance: From 2 Hours to 8 Minutes
The Problem Everyone Misdiagnosed
A reporting job running every morning had crept from 45 minutes to 2 hours over 18 months. The team opened a hardware ticket. They were looking at CPU graphs and buffer pool hit ratios. Nobody was looking at the MQT.
The job was refreshing a DB2 LUW Materialized Query Table that fed four downstream batch processes. Every downstream process was delayed. The business impact was accumulating daily — reports that were supposed to land at 6 AM weren't landing until 8, and the SLA for one of the downstream consumers was 7:30.
The hardware wasn't the problem. The underlying query architecture was.
What MQTs Are and Why They Degrade
A DB2 LUW Materialized Query Table is a table whose contents are defined by a query and maintained by DB2 automatically. When you create an MQT with REFRESH DEFERRED, DB2 stores the query definition and refreshes the table's content when you issue a REFRESH TABLE command — typically in a scheduled batch job. The optimizer can also route queries against the base tables through the MQT automatically if it determines the MQT satisfies the query, giving transparent query acceleration.
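As a sketch, a deferred-refresh MQT looks like this (schema, table, and column names here are illustrative, not from the actual system):

```sql
-- Illustrative deferred-refresh MQT: populated only when REFRESH TABLE runs.
CREATE TABLE RPT.DAILY_BALANCES AS (
  SELECT ACCT_ID, PERIOD_ID, SUM(AMOUNT) AS TOTAL_AMOUNT
  FROM   FIN.TRANSACTIONS
  GROUP BY ACCT_ID, PERIOD_ID
)
DATA INITIALLY DEFERRED
REFRESH DEFERRED;

-- Scheduled batch step: re-run the defining query and repopulate the table.
REFRESH TABLE RPT.DAILY_BALANCES;
```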
The performance of an MQT refresh is entirely determined by the performance of the underlying query. This sounds obvious. It's ignored constantly.
When someone builds an MQT, they write a query, verify it returns the right data, and move on. Months later, the base tables have grown. The query that ran in 4 minutes against 10 million rows now runs in 45 minutes against 80 million rows. The growth didn't change the correctness of the query — it exposed the algorithmic complexity that was always there.
The Correlated Subquery Problem
Correlated subqueries are the most common cause of MQT performance collapse at scale. They look innocent. They return correct results. They scale catastrophically.
A correlated subquery is one that references columns from the outer query — meaning it re-executes for every row the outer query processes. In a small table, this is unnoticeable. Against 80 million rows, you're executing the subquery 80 million times.
Here's the pattern we found, simplified:
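A reconstruction of that shape, with illustrative column names (the real definitions differ):

```sql
-- Simplified reconstruction: two scalar subqueries correlated on the outer row.
SELECT A.ACCT_ID,
       A.PERIOD_ID,
       (SELECT SUM(D.AMOUNT)
          FROM DETAIL_TBL D
         WHERE D.ACCT_ID   = A.ACCT_ID      -- correlated: re-executes per outer row
           AND D.PERIOD_ID = A.PERIOD_ID) AS DETAIL_TOTAL,
       (SELECT COUNT(*)
          FROM TRANSACTIONS T
         WHERE T.ACCT_ID   = A.ACCT_ID      -- second correlated subquery
           AND T.PERIOD_ID = A.PERIOD_ID) AS TXN_COUNT
FROM ACCOUNTS_TBL A;
```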
Two correlated subqueries. Each one re-executes against its respective table for every row in ACCOUNTS_TBL. At 80 million account-period rows, that's 160 million subquery executions per refresh cycle. DB2's optimizer cannot flatten correlated subqueries into a join in all cases — when it can't, you get the row-by-row loop.
The Fix: Pre-Aggregated Derived Tables
The rewrite replaces correlated subqueries with derived tables (inline views) that pre-aggregate their results before joining. The outer query then joins to pre-aggregated results — one pass through DETAIL_TBL, one pass through TRANSACTIONS, then a join. No loops.
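A sketch of the rewritten shape, again with illustrative names (COALESCE preserves the zero-count behavior of the original COUNT(*) subquery when an account has no transactions):

```sql
-- Pre-aggregate each detail table once, then join: no per-row re-execution.
SELECT A.ACCT_ID,
       A.PERIOD_ID,
       D.DETAIL_TOTAL,
       COALESCE(T.TXN_COUNT, 0) AS TXN_COUNT
FROM ACCOUNTS_TBL A
LEFT JOIN (SELECT ACCT_ID, PERIOD_ID, SUM(AMOUNT) AS DETAIL_TOTAL
             FROM DETAIL_TBL
            GROUP BY ACCT_ID, PERIOD_ID) D
       ON D.ACCT_ID = A.ACCT_ID AND D.PERIOD_ID = A.PERIOD_ID
LEFT JOIN (SELECT ACCT_ID, PERIOD_ID, COUNT(*) AS TXN_COUNT
             FROM TRANSACTIONS
            GROUP BY ACCT_ID, PERIOD_ID) T
       ON T.ACCT_ID = A.ACCT_ID AND T.PERIOD_ID = A.PERIOD_ID;
```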
Same result set. Same data freshness. The execution plan went from a nested loop with 160 million subquery evaluations to three sequential scans and two hash joins. DB2's optimizer knows how to optimize hash joins against large tables. It does not know how to optimize away a semantic requirement to evaluate a subquery 80 million times.
Three Things to Check on Every Slow DB2 MQT
1. Correlated Subqueries in the Definition
Open the MQT definition — either from the DDL you have or from SYSCAT.TABLES and SYSCAT.VIEWS. Look for subqueries in the SELECT list or WHERE clause that reference columns from the outer query. If you find any, they are almost certainly the cause of degraded performance at scale. Rewrite as derived tables or CTEs with GROUP BY.
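One way to pull the definitions, assuming the standard catalog views (MQTs carry TYPE = 'S' in SYSCAT.TABLES, and their defining text surfaces through SYSCAT.VIEWS):

```sql
-- List every MQT in the database along with its defining query text.
SELECT T.TABSCHEMA, T.TABNAME, V.TEXT
FROM   SYSCAT.TABLES T
JOIN   SYSCAT.VIEWS  V
  ON   V.VIEWSCHEMA = T.TABSCHEMA
 AND   V.VIEWNAME   = T.TABNAME
WHERE  T.TYPE = 'S';   -- 'S' = materialized query (summary) table
```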
2. REFRESH DEFERRED vs REFRESH IMMEDIATE
REFRESH DEFERRED means the MQT is only refreshed when you explicitly run REFRESH TABLE. The data can be stale between refreshes — acceptable for daily batch reporting, not acceptable for real-time dashboards. REFRESH IMMEDIATE means DB2 maintains the MQT synchronously with every DML operation on the base tables — accurate but adds overhead to every INSERT/UPDATE/DELETE. These are the two refresh options in DB2 LUW; the separate MAINTAINED BY USER clause shifts population of the table to your own code entirely.
Most batch reporting MQTs should be REFRESH DEFERRED. If yours is REFRESH IMMEDIATE and the base tables have heavy write load, you may be paying the maintenance cost continuously rather than once per refresh cycle. That's a different problem with a different fix.
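To check which mode an existing MQT uses, the REFRESH and REFRESH_TIME columns of SYSCAT.TABLES are the quickest route (assuming the standard catalog; 'D' = deferred, 'I' = immediate):

```sql
-- Show refresh mode and the timestamp of the last refresh for each MQT.
SELECT TABSCHEMA, TABNAME, REFRESH, REFRESH_TIME
FROM   SYSCAT.TABLES
WHERE  TYPE = 'S';
```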
3. Stale Statistics on the Base Tables
DB2's optimizer uses table statistics — row counts, column cardinality, distribution statistics — to build execution plans. If statistics are stale, the optimizer may generate a plan that was reasonable 18 months ago but is wrong for the current data volume. Run RUNSTATS on every table referenced by the MQT before concluding the query is the problem.
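A typical invocation from the DB2 command line, with an illustrative schema-qualified name:

```sql
-- Collect full statistics, including distribution statistics and detailed
-- index statistics, on a base table the MQT reads.
RUNSTATS ON TABLE FIN.ACCOUNTS_TBL
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```

Repeat for every base table in the MQT's definition before re-running EXPLAIN.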
If RUNSTATS alone drops your MQT refresh from 2 hours to 45 minutes, the query architecture may still have problems — you just unmasked them. If RUNSTATS drops it to 12 minutes and you're at acceptable SLA, you may be done. If the optimizer was so wrong it was choosing a nested loop where a hash join makes sense, updated statistics may fix the plan without a query rewrite.
Using EXPLAIN to Verify the Fix
Before and after a query rewrite, run EXPLAIN to capture the access plan. DB2 writes explain output to the EXPLAIN_STATEMENT and EXPLAIN_OPERATOR tables (or you can use Visual Explain in Data Studio). Look for:
- Nested Loop Joins against large tables — these are the smoking gun for correlated subquery problems
- TQ (table queue) operators — these indicate parallelism, which can mask bad query architecture
- SORT operators on large row counts — often avoidable with the right index
- Estimated vs actual row counts — large divergence means stale statistics
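A minimal capture-and-inspect sequence, assuming the explain tables exist in your schema (SYSINSTALLOBJECTS can create them if not); the SELECT inside EXPLAIN PLAN FOR stands in for the MQT's defining query:

```sql
-- One-time setup: create the explain tables if they do not already exist.
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', NULL, NULL);

-- Capture the access plan; substitute the MQT's defining query here.
EXPLAIN PLAN FOR
SELECT ACCT_ID, PERIOD_ID FROM ACCOUNTS_TBL;

-- Inspect the captured operators: NLJOIN vs HSJOIN is what you care about.
SELECT OPERATOR_ID, OPERATOR_TYPE, TOTAL_COST
FROM   EXPLAIN_OPERATOR
ORDER BY OPERATOR_ID;
```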
After the rewrite, the EXPLAIN output showed Hash Joins replacing the Nested Loop operators. The estimated cost dropped by two orders of magnitude. When we ran the refresh, the execution time matched the estimate — 8 minutes.
The Broader Lesson
Most DB2 performance problems aren't hardware problems. They're query architecture problems wearing a hardware mask. The symptoms — high CPU, long elapsed time, buffer pool pressure — can all result from a query that is algorithmically wrong for its data volume. Adding CPU or memory to a server running a correlated subquery loop against 80 million rows makes the loop run slightly faster. Rewriting the loop removes it entirely.
Before you open a hardware ticket on a slow DB2 job: pull the query text, run EXPLAIN, and check SYSCAT for table row counts and statistics age. Five minutes of investigation here will tell you whether you have a hardware problem or a query problem. In our experience, it's the query 80% of the time.
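That five-minute check can be as small as this (the schema name is illustrative): CARD is the optimizer's row-count estimate, STATS_TIME the last RUNSTATS timestamp, and a months-old STATS_TIME on a fast-growing table is the red flag to look for.

```sql
-- Row-count estimates and statistics age for every table in one schema.
SELECT TABSCHEMA, TABNAME, CARD, STATS_TIME
FROM   SYSCAT.TABLES
WHERE  TABSCHEMA = 'FIN'
ORDER BY STATS_TIME;
```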
DB2 Performance Problem? Let's Look at It.
We specialize in DB2 LUW performance — MQT tuning, HADR administration, REORG strategy, and batch job optimization. If you have a job that's slower than it should be, we can find out why.
Talk to Our Team