Problem

PostgreSQL’s query optimizer sometimes chooses sequential scans when index scans would be much faster, especially after schema changes or with large tables.

Symptom: A query that should use an index instead scans millions of rows.

Real Example

Before (Slow - 614ms)

  SELECT ...
  FROM large_table
  WHERE id IN ($1, $2, $3, ..., $50)
    AND status IS NOT NULL

Query Plan:

  Parallel Seq Scan on large_table
    Rows Removed by Filter: 1,574,856
    Buffers: 14,607 pages
    Time: 614ms

PostgreSQL scanned roughly 3.15M rows to find 50 results (in a parallel plan, EXPLAIN reports “Rows Removed by Filter” per process, so the 1,574,856 shown is only one process's share).

After (Fast - 45ms)

  WITH target_ids AS MATERIALIZED (
    SELECT unnest(array[$1, $2, $3, ..., $50]) AS id
  )
  SELECT ...
  FROM target_ids
  INNER JOIN large_table USING (id)
  WHERE large_table.status IS NOT NULL

Query Plan:

  Index Scan on large_table
    Buffers: 325 pages
    Time: 45ms

PostgreSQL did 50 index lookups directly.

Result: 14× faster, 98% less I/O

Why This Works

The Optimization Barrier

MATERIALIZED forces PostgreSQL to:

  1. Materialize your target IDs (50 rows) into a small, concrete result set
  2. See the concrete row count (not an estimate)
  3. Drive the query from the small table into the large table
  4. Choose nested loop + index scan (correct strategy)
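
A quick way to confirm the barrier is doing its job is to EXPLAIN both forms side by side (a sketch reusing the example query above; exact plan shapes depend on your data and settings):

  -- Inlined form: prone to a Seq Scan when estimates are off.
  EXPLAIN (ANALYZE, BUFFERS)
  SELECT *
  FROM large_table
  WHERE id IN (1, 2, 3)
    AND status IS NOT NULL;

  -- Materialized form: should show Nested Loop + Index Scan,
  -- driven by a CTE Scan that returns exactly 3 rows.
  EXPLAIN (ANALYZE, BUFFERS)
  WITH target_ids AS MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT *
  FROM target_ids
  JOIN large_table USING (id)
  WHERE large_table.status IS NOT NULL;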

Without MATERIALIZED (Inlined)

  • PostgreSQL sees: WHERE id IN (…) AND status IS NOT NULL
  • Estimates: “status IS NOT NULL probably matches lots of rows”
  • Decision: “Sequential scan is cheaper”
  • Result: ❌ Scans millions of rows

With MATERIALIZED (Barrier)

  • PostgreSQL sees: “Small table (50 rows) → JOIN → Large table”
  • Decision: “Nested loop + index scan is optimal”
  • Result: ✅ 50 index lookups only

Key Components

CTE (Common Table Expression)

A temporary named result set:

  WITH temp_table AS (
    SELECT ...
  )
  SELECT ... FROM temp_table
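
A self-contained example you can run as-is (generate_series stands in for a real table):

  WITH squares AS (
    SELECT n, n * n AS sq
    FROM generate_series(1, 3) AS n
  )
  SELECT * FROM squares;
  -- n | sq
  -- 1 |  1
  -- 2 |  4
  -- 3 |  9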

MATERIALIZED Keyword

Forces PostgreSQL to compute the CTE once and keep its result separate, instead of inlining it into the outer query:

  WITH temp_table AS MATERIALIZED (...)

Without it, PostgreSQL 12 and later will usually inline a single-use CTE into the outer query and lose the optimization (before version 12, CTEs were always materialized and the keyword didn’t exist).

unnest() Function

Converts array to rows:

  SELECT unnest(array['a', 'b', 'c']) AS value;
  -- Results in:
  -- value
  -- -----
  -- a
  -- b
  -- c
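
If element order matters (e.g. you want results back in the same order as the input list), unnest also supports WITH ORDINALITY to number the rows:

  SELECT value, ord
  FROM unnest(array['a', 'b', 'c']) WITH ORDINALITY AS t(value, ord);
  -- value | ord
  -- a     |   1
  -- b     |   2
  -- c     |   3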

INNER JOIN Strategy

Start from small table, lookup in large table:

  FROM small_table
  INNER JOIN large_table USING (id)  -- Uses index on large_table.id

Combined with the CTE’s known row count, this leads PostgreSQL to drive the nested loop from the small side.
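
The nested loop only pays off if the join column is actually indexed. A sketch (in the running example, large_table.id would typically be covered by its primary key already):

  -- Skip this if id is already the primary key; the PK index is enough.
  CREATE INDEX IF NOT EXISTS large_table_id_idx ON large_table (id);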

Performance Comparison

  # Items   Before (ms)   After (ms)   Speedup   Buffers Before   Buffers After
  1         8.4           0.19         44×       ~14,600          ~65
  5         29.3          0.32         91×       ~14,600          ~175
  10        536.1         6.6          81×       ~14,600          ~325
  50        614.3         44.9         14×       14,607           325

When to Use This Pattern

✅ Good Use Cases

  • Small list joining large table:
  WITH targets AS MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT * FROM targets
  JOIN huge_table USING (id)

  • Query planner choosing sequential scan incorrectly:

  • Check with EXPLAIN ANALYZE
  • If you see Seq Scan instead of Index Scan, try materialized CTE

  • Known small result set:

  WITH small_result AS MATERIALIZED (
    SELECT id FROM complex_query  -- Returns few rows
  )
  SELECT * FROM small_result
  JOIN large_table USING (id)

❌ When NOT to Use

  • Large intermediate results: Wastes memory
  • Sequential scan is actually correct: Don’t fight the optimizer when it’s right
  • Need full query optimization: Use NOT MATERIALIZED or omit the keyword (see the sketch below)
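
For that last case, the explicit opt-out looks like this (same shape as the earlier examples; requires PostgreSQL 12+):

  -- Explicitly allow inlining so the planner can optimize
  -- across the CTE boundary.
  WITH targets AS NOT MATERIALIZED (
    SELECT unnest(array[1, 2, 3]) AS id
  )
  SELECT *
  FROM targets
  JOIN large_table USING (id);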

Why Does PostgreSQL Choose Wrong?

PostgreSQL’s cost-based optimizer relies on statistics and heuristics:

  1. Bad estimates: “status IS NOT NULL matches 95% of rows!” (see the pg_stats query below)
  2. Cost model limitations: Overestimates random I/O penalty
  3. Complex filter interactions: Hard to predict combined selectivity

In theory, the optimizer should make good choices. In practice, it sometimes needs help.
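
The “bad estimates” case is visible directly in the planner’s statistics. A sketch using the pg_stats view (table and column names from the example above):

  -- null_frac near 0 means "status IS NOT NULL" is estimated to match
  -- nearly every row, which makes a sequential scan look cheap.
  SELECT attname, null_frac, n_distinct
  FROM pg_stats
  WHERE tablename = 'large_table'
    AND attname = 'status';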

Debugging Tools

Check Query Plan

  EXPLAIN (ANALYZE, BUFFERS) SELECT ...;

Look For

  • Seq Scan where you expect Index Scan
  • High buffer counts (pages read)
  • Large “Rows Removed by Filter” (wasted work)
  • Big difference between “estimated rows” vs “actual rows”

Update Statistics

  ANALYZE table_name;

Stale statistics → bad query plans.
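
If estimates stay bad after ANALYZE, one knob worth knowing is the per-column statistics target (a sketch; 1000 is an illustrative value, the default is 100):

  -- Sample this column in more detail, then refresh the stats.
  ALTER TABLE large_table ALTER COLUMN status SET STATISTICS 1000;
  ANALYZE large_table;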

Key Takeaway

The MATERIALIZED CTE trick:

  • Gives PostgreSQL exact row counts (not estimates)
  • Forces a specific execution order
  • Creates an optimization barrier
  • Results in correct query plan selection

Sometimes you need to “help” the optimizer by making the small dataset explicit and concrete.