Problem
PostgreSQL’s query optimizer sometimes chooses sequential scans when index scans would be much faster, especially after schema changes or with large tables.
Symptom: A query that should use an index instead scans millions of rows.
Real Example
Before (Slow - 614ms)
SELECT ...
FROM large_table
WHERE id IN ($1, $2, $3, ..., $50)
  AND status IS NOT NULL
Query Plan:
  Parallel Seq Scan on large_table
    Rows Removed by Filter: 1,574,856
    Buffers: 14,607 pages
    Time: 614ms
PostgreSQL scanned 3.15M rows to find 50 results.
After (Fast - 45ms)
  WITH target_ids AS MATERIALIZED (
    SELECT unnest(array[$1, $2, $3, ..., $50]) AS id
  )
  SELECT ...
  FROM target_ids
  INNER JOIN large_table USING (id)
  WHERE large_table.status IS NOT NULL
Query Plan:
  Index Scan on large_table
    Buffers: 325 pages
    Time: 45ms
PostgreSQL did 50 index lookups directly.
Result: 14× faster, 98% less I/O
Why This Works
The Optimization Barrier
MATERIALIZED forces PostgreSQL to:
- Build a temporary table with your target IDs (50 rows)
- See the concrete row count (not an estimate)
- Drive the query from the small table into the large table
- Choose nested loop + index scan (correct strategy)
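In application code, the 50 separate placeholders can usually be collapsed into a single array parameter. A minimal sketch of the generic template, assuming the ids are bigint and arrive as one array bind parameter ($1):
  -- Hypothetical generic form: $1 is one array parameter holding all target ids
  WITH target_ids AS MATERIALIZED (
    SELECT unnest($1::bigint[]) AS id
  )
  SELECT lt.*
  FROM target_ids
  INNER JOIN large_table lt USING (id)
  WHERE lt.status IS NOT NULL;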
Without MATERIALIZED (Inlined)
- PostgreSQL sees: WHERE id IN (…) AND status IS NOT NULL
- Estimates: “status IS NOT NULL probably matches lots of rows”
- Decision: “Sequential scan is cheaper”
- Result: ❌ Scans millions of rows
With MATERIALIZED (Barrier)
- PostgreSQL sees: “Small table (50 rows) → JOIN → Large table”
- Decision: “Nested loop + index scan is optimal”
- Result: ✅ 50 index lookups only
Key Components
CTE (Common Table Expression)
A temporary named result set:
  WITH temp_table AS (
    SELECT ...
  )
  SELECT ... FROM temp_table
MATERIALIZED Keyword
Forces physical temporary table creation:
  WITH temp_table AS MATERIALIZED (...)
Without it, PostgreSQL (version 12 and later) may inline the CTE into the outer query and lose the optimization; before version 12, CTEs were always materialized.
unnest() Function
Converts array to rows:
  SELECT unnest(array['a', 'b', 'c']) AS value;
  -- Results in:
  -- value
  -- -----
  -- a
  -- b
  -- c
INNER JOIN Strategy
Start from small table, lookup in large table:
  FROM small_table
  INNER JOIN large_table USING (id)  -- Uses index on large_table.id
This forces PostgreSQL to drive from the small side.
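The index lookup side of this only works if large_table.id is indexed (typically it is the primary key). If it isn't, this is a sketch of the kind of index that is assumed, with a hypothetical name:
  -- Assumption: an index (or primary key) on large_table.id; hypothetical definition
  CREATE INDEX IF NOT EXISTS large_table_id_idx ON large_table (id);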
Performance Comparison
| # Items | Before (ms) | After (ms) | Speedup | Buffers Before | Buffers After | 
|---|---|---|---|---|---|
| 1 | 8.4 | 0.19 | 44× | ~14,600 | ~65 | 
| 5 | 29.3 | 0.32 | 91× | ~14,600 | ~175 | 
| 10 | 536.1 | 6.6 | 81× | ~14,600 | ~325 | 
| 50 | 614.3 | 44.9 | 14× | 14,607 | 325 | 
When to Use This Pattern
✅ Good Use Cases
- Small list joining large table:
  WITH targets AS MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT * FROM targets
  JOIN huge_table USING (id)
- Query planner choosing sequential scan incorrectly:
  - Check with EXPLAIN ANALYZE
  - If you see Seq Scan instead of Index Scan, try a materialized CTE
- Known small result set:
  WITH small_result AS MATERIALIZED (
    SELECT id FROM complex_query  -- Returns few rows
  )
  SELECT * FROM small_result
  JOIN large_table USING (id)
❌ When NOT to Use
- Large intermediate results: Wastes memory
- Sequential scan is actually correct: Don’t fight the optimizer when it’s right
- Need full query optimization: Use NOT MATERIALIZED or omit the keyword (see the sketch below)
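For that last case, a minimal sketch of the opt-out (NOT MATERIALIZED requires PostgreSQL 12 or later; the created_at column is illustrative):
  -- NOT MATERIALIZED lets the planner inline the CTE and optimize across it
  WITH recent AS NOT MATERIALIZED (
    SELECT id, status
    FROM large_table
    WHERE created_at > now() - interval '1 day'  -- created_at is illustrative
  )
  SELECT *
  FROM recent
  WHERE status IS NOT NULL;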
Why Does PostgreSQL Choose Wrong?
PostgreSQL’s cost-based optimizer relies on statistics and heuristics:
- Bad estimates: “status IS NOT NULL matches 95% of rows!”
- Cost model limitations: Overestimates random I/O penalty
- Complex filter interactions: Hard to predict combined selectivity
In theory, the optimizer should make good choices. In practice, it sometimes needs help.
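You can see the numbers the planner is working from by querying pg_stats; a sketch using the table and column from the earlier example:
  -- null_frac = estimated fraction of NULLs, n_distinct = estimated distinct values
  SELECT null_frac, n_distinct, most_common_vals
  FROM pg_stats
  WHERE tablename = 'large_table'
    AND attname = 'status';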
Debugging Tools
Check Query Plan
  EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
Look For
- Seq Scan where you expect Index Scan
- High buffer counts (pages read)
- Large “Rows Removed by Filter” (wasted work)
- Big difference between “estimated rows” vs “actual rows”
Update Statistics
  ANALYZE table_name;
Stale statistics → bad query plans.
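Two related checks, as a sketch: when statistics were last refreshed, and (optionally) raising the per-column sampling target for a skewed column before re-analyzing.
  -- When were statistics last refreshed for this table?
  SELECT relname, last_analyze, last_autoanalyze
  FROM pg_stat_user_tables
  WHERE relname = 'large_table';

  -- Optionally collect finer-grained statistics for a skewed column, then re-analyze
  ALTER TABLE large_table ALTER COLUMN status SET STATISTICS 1000;
  ANALYZE large_table;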
Key Takeaway
The MATERIALIZED CTE trick:
- Gives PostgreSQL exact row counts (not estimates)
- Forces a specific execution order
- Creates an optimization barrier
- Results in correct query plan selection
Sometimes you need to “help” the optimizer by making the small dataset explicit and concrete.