Problem

PostgreSQL’s query optimizer sometimes chooses sequential scans when index scans would be much faster, especially after schema changes or with large tables.

Symptom: A query that should use an index instead scans millions of rows.

Real Example

Before (Slow - 614ms)

  SELECT ...
  FROM large_table
  WHERE id IN ($1, $2, $3, ..., $50)
    AND status IS NOT NULL

Query Plan:

  Parallel Seq Scan on large_table
    Rows Removed by Filter: 1,574,856
    Buffers: 14,607 pages
    Time: 614ms

PostgreSQL scanned roughly 3.15M rows to find 50 results (in a parallel plan, EXPLAIN reports “Rows Removed by Filter” per process, so the 1,574,856 shown is only one process's share).

After (Fast - 45ms)

  WITH target_ids AS MATERIALIZED (
    SELECT unnest(array[$1, $2, $3, ..., $50]) AS id
  )
  SELECT ...
  FROM target_ids
  INNER JOIN large_table USING (id)
  WHERE large_table.status IS NOT NULL

Query Plan:

  Index Scan on large_table
    Buffers: 325 pages
    Time: 45ms

PostgreSQL did 50 index lookups directly.

Result: 14× faster, 98% less I/O

Why This Works

The Optimization Barrier

MATERIALIZED forces PostgreSQL to:

  1. Materialize your target IDs (50 rows) into a small, concrete result set
  2. See the concrete row count (not an estimate)
  3. Drive the query from the small table into the large table
  4. Choose nested loop + index scan (correct strategy)
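
A quick way to confirm the barrier is doing its job is to EXPLAIN both forms side by side (a sketch reusing the example query above; exact plan shapes depend on your data and settings):

  -- Inlined form: prone to a Seq Scan when estimates are off.
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT *
  FROM large_table
  WHERE id IN (1, 2, 3)
    AND status IS NOT NULL;

  -- Materialized form: should show Nested Loop + Index Scan,
  -- driven by a CTE Scan that returns exactly 3 rows.
  EXPLAIN (ANALYZE, BUFFERS)
  WITH target_ids AS MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT *
  FROM target_ids
  JOIN large_table USING (id)
  WHERE large_table.status IS NOT NULL;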

Without MATERIALIZED (Inlined)

  • PostgreSQL sees: WHERE id IN (…) AND status IS NOT NULL
  • Estimates: “status IS NOT NULL probably matches lots of rows”
  • Decision: “Sequential scan is cheaper”
  • Result: ❌ Scans millions of rows

With MATERIALIZED (Barrier)

  • PostgreSQL sees: “Small table (50 rows) → JOIN → Large table”
  • Decision: “Nested loop + index scan is optimal”
  • Result: ✅ 50 index lookups only

Key Components

CTE (Common Table Expression)

A temporary named result set:

  WITH temp_table AS (
    SELECT ...
  )
  SELECT ... FROM temp_table
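
A self-contained example you can run as-is (generate_series stands in for a real table):

  WITH squares AS (
    SELECT n, n * n AS sq
    FROM generate_series(1, 3) AS n
  )
  SELECT * FROM squares;
  -- n | sq
  -- 1 |  1
  -- 2 |  4
  -- 3 |  9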

MATERIALIZED Keyword

Forces PostgreSQL to compute the CTE once and keep its result separate, instead of inlining it into the outer query:

  WITH temp_table AS MATERIALIZED (...)

Without it, PostgreSQL 12 and later will usually inline a single-use CTE into the outer query and lose the optimization (before version 12, CTEs were always materialized and the keyword didn’t exist).

unnest() Function

Converts array to rows:

  SELECT unnest(array['a', 'b', 'c']) AS value;
  -- Results in:
  -- value
  -- -----
  -- a
  -- b
  -- c
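
If element order matters (e.g. you want results back in the same order as the input list), unnest also supports WITH ORDINALITY to number the rows:

  SELECT value, ord
  FROM unnest(array['a', 'b', 'c']) WITH ORDINALITY AS t(value, ord);
  -- value | ord
  -- a     |   1
  -- b     |   2
  -- c     |   3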

INNER JOIN Strategy

Start from small table, lookup in large table:

  FROM small_table
  INNER JOIN large_table USING (id)  -- Uses index on large_table.id

Combined with the CTE’s known row count, this leads PostgreSQL to drive the nested loop from the small side.
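
The nested loop only pays off if the join column is actually indexed. A sketch (in the running example, large_table.id would typically be covered by its primary key already):

  -- Skip this if id is already the primary key; the PK index is enough.
  CREATE INDEX IF NOT EXISTS large_table_id_idx ON large_table (id);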

Performance Comparison

  # Items   Before (ms)   After (ms)   Speedup   Buffers Before   Buffers After
  1         8.4           0.19         44×       ~14,600          ~65
  5         29.3          0.32         91×       ~14,600          ~175
  10        536.1         6.6          81×       ~14,600          ~325
  50        614.3         44.9         14×       14,607           325

When to Use This Pattern

✅ Good Use Cases

  • Small list joining large table:
  WITH targets AS MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT * FROM targets
  JOIN huge_table USING (id)

  • Query planner choosing sequential scan incorrectly:

  • Check with EXPLAIN ANALYZE
  • If you see Seq Scan instead of Index Scan, try materialized CTE

  • Known small result set:

  WITH small_result AS MATERIALIZED (
    SELECT id FROM complex_query  -- Returns few rows
  )
  SELECT * FROM small_result
  JOIN large_table USING (id)

❌ When NOT to Use

  • Large intermediate results: Wastes memory
  • Sequential scan is actually correct: Don’t fight the optimizer when it’s right
  • Need full query optimization: Use NOT MATERIALIZED or omit the keyword (see the sketch below)
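
For that last case, the explicit opt-out looks like this (same shape as the earlier examples; requires PostgreSQL 12+):

  -- Explicitly allow inlining so the planner can optimize
  -- across the CTE boundary.
  WITH targets AS NOT MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT *
  FROM targets
  JOIN large_table USING (id);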

Why Does PostgreSQL Choose Wrong?

PostgreSQL’s cost-based optimizer relies on statistics and heuristics:

  1. Bad estimates: “status IS NOT NULL matches 95% of rows!” (see the pg_stats query below)
  2. Cost model limitations: Overestimates random I/O penalty
  3. Complex filter interactions: Hard to predict combined selectivity

In theory, the optimizer should make good choices. In practice, it sometimes needs help.
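
The “bad estimates” case is visible directly in the planner’s statistics. A sketch using the pg_stats view (table and column names from the example above):

  -- null_frac near 0 means "status IS NOT NULL" is estimated to match
  -- nearly every row, which makes a sequential scan look cheap.
  SELECT attname, null_frac, n_distinct
  FROM pg_stats
  WHERE tablename = 'large_table'
    AND attname = 'status';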

Debugging Tools

Check Query Plan

  EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Look For

  • Seq Scan where you expect Index Scan
  • High buffer counts (pages read)
  • Large “Rows Removed by Filter” (wasted work)
  • Big difference between “estimated rows” vs “actual rows”

Update Statistics

  ANALYZE table_name;

Stale statistics → bad query plans.
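
If estimates stay bad after ANALYZE, one knob worth knowing is the per-column statistics target (a sketch; 1000 is an illustrative value, the default is 100):

  -- Sample this column in more detail, then refresh the stats.
  ALTER TABLE large_table ALTER COLUMN status SET STATISTICS 1000;
  ANALYZE large_table;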

Key Takeaway

The MATERIALIZED CTE trick:

  • Gives PostgreSQL exact row counts (not estimates)
  • Forces a specific execution order
  • Creates an optimization barrier
  • Results in correct query plan selection

Sometimes you need to “help” the optimizer by making the small dataset explicit and concrete.