
Execution Plans

Understanding and analyzing query execution plans for optimization

Execution plans show how the database engine will execute a query. Understanding them is essential for query optimization.

What is an Execution Plan?

Query execution flow:
┌─────────────────────────────────────────────────────────┐
│ SQL Query                                               │
│     │                                                   │
│     ▼                                                   │
│ ┌─────────────┐                                         │
│ │   Parser    │  Syntax checking                        │
│ └──────┬──────┘                                         │
│        │                                                │
│        ▼                                                │
│ ┌─────────────┐                                         │
│ │  Optimizer  │  Generate execution plan                │
│ └──────┬──────┘                                         │
│        │                                                │
│        ▼                                                │
│ ┌─────────────┐                                         │
│ │  Executor   │  Execute the plan                       │
│ └─────────────┘                                         │
└─────────────────────────────────────────────────────────┘

The optimizer chooses:
- Which indexes to use
- Join order and method
- Scan method (sequential vs index)
- Aggregation strategy
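
For example, the same predicate can lead to different plans depending on
estimated selectivity (illustrative; assumes an index on users.status):

EXPLAIN SELECT * FROM users WHERE status = 'suspended';  -- few rows: likely Index Scan
EXPLAIN SELECT * FROM users WHERE status = 'active';     -- most rows: likely Seq Scan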

PostgreSQL EXPLAIN

Basic Usage

-- Show execution plan
EXPLAIN SELECT * FROM users WHERE id = 1;

-- Show plan with actual execution statistics
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1;

-- Show detailed output including buffers
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE id = 1;

-- JSON format for tooling
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM users WHERE id = 1;
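
-- Caution: EXPLAIN ANALYZE actually executes the statement. For writes,
-- wrap it in a transaction and roll back:
BEGIN;
EXPLAIN ANALYZE DELETE FROM users WHERE id = 1;
ROLLBACK;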

Reading the Output

Plans are trees: the most indented node executes first and feeds rows to
its parent, so read from the innermost nodes outward.

EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;

-- Output:
HashAggregate  (cost=245.00..255.00 rows=1000 width=40)
               (actual time=15.2..18.4 rows=950 loops=1)
  Group Key: u.id
  ->  Hash Left Join  (cost=30.00..200.00 rows=3000 width=36)
                      (actual time=2.1..12.3 rows=2800 loops=1)
        Hash Cond: (u.id = o.user_id)
        ->  Index Scan using idx_users_created on users u
                      (cost=0.29..50.00 rows=1000 width=36)
                      (actual time=0.02..1.2 rows=950 loops=1)
              Index Cond: (created_at > '2024-01-01')
        ->  Hash  (cost=20.00..20.00 rows=800 width=8)
                  (actual time=1.8..1.8 rows=2000 loops=1)
              ->  Seq Scan on orders o
                        (cost=0.00..20.00 rows=800 width=8)
                        (actual time=0.01..0.9 rows=2000 loops=1)
Planning Time: 0.5 ms
Execution Time: 19.1 ms

Key Metrics

┌─────────────────────────────────────────────────────────┐
│ cost=startup..total                                     │
│   - Estimated cost units (not time!)                    │
│   - startup: cost before first row returned             │
│   - total: cost to return all rows                      │
│                                                         │
│ rows=N                                                  │
│   - Estimated number of rows                            │
│                                                         │
│ actual time=startup..total                              │
│   - Real time in ms: first row .. last row              │
│   - Averaged per loop when loops > 1                    │
│                                                         │
│ rows=N loops=M                                          │
│   - Actual rows returned per loop, iteration count      │
│   - Multiply rows by loops for the node total           │
│                                                         │
│ Buffers: shared hit=N read=M                            │
│   - hit: pages from cache                               │
│   - read: pages from disk                               │
└─────────────────────────────────────────────────────────┘

Common Scan Types

Sequential Scan

-- Full table scan, reads every row
Seq Scan on users  (cost=0.00..1000.00 rows=50000 width=100)
  Filter: (status = 'active')
  Rows Removed by Filter: 30000

-- When it's used:
-- ✓ No suitable index
-- ✓ Querying large portion of table
-- ✓ Small table

-- When to optimize:
-- ✗ Querying small subset of large table
-- Solution: Add index
CREATE INDEX idx_users_status ON users(status);
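
-- Alternatively, if queries only ever touch one slice of the table, a
-- partial index is smaller and cheaper (PostgreSQL; illustrative sketch):
CREATE INDEX idx_users_active ON users(id) WHERE status = 'active';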

Index Scan

-- Uses index, then fetches rows from table
Index Scan using idx_users_email on users
  (cost=0.29..8.31 rows=1 width=100)
  Index Cond: (email = 'user@example.com')

-- Best for:
-- ✓ Selective queries (few rows)
-- ✓ Columns with unique/high cardinality

Index Only Scan

-- All data from index, no table access
Index Only Scan using idx_users_email_name on users
  (cost=0.29..4.31 rows=1 width=50)
  Index Cond: (email = 'user@example.com')
  Heap Fetches: 0

-- Requires: Covering index with all needed columns
CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name);
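
-- Note: Heap Fetches stays at 0 only while the visibility map is current;
-- regular vacuuming keeps index-only scans from touching the heap:
VACUUM users;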

Bitmap Index Scan

-- Two-phase: build bitmap, then fetch
Bitmap Heap Scan on orders  (cost=10.00..500.00 rows=5000 width=100)
  Recheck Cond: (status = 'pending')
  ->  Bitmap Index Scan on idx_orders_status
        (cost=0.00..9.00 rows=5000 width=0)
        Index Cond: (status = 'pending')

-- Used for:
-- ✓ Medium selectivity queries
-- ✓ OR conditions with multiple indexes
-- ✓ When too many rows for index scan, too few for seq scan
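
-- Example: an OR across two indexed columns can combine bitmaps via
-- BitmapOr (assumes indexes on both status and user_id):
EXPLAIN SELECT * FROM orders WHERE status = 'pending' OR user_id = 5;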

Join Methods

Nested Loop Join

Nested Loop  (cost=0.29..100.00 rows=10 width=200)
  ->  Index Scan on users u  (rows=10)
  ->  Index Scan on orders o  (rows=1 loops=10)
        Index Cond: (o.user_id = u.id)

-- Characteristics:
-- ✓ Fast for small outer table with indexed inner
-- ✓ Good when inner loop uses index
-- ✗ Slow for large tables without index

Hash Join

Hash Join  (cost=30.00..200.00 rows=3000 width=200)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (rows=10000)
  ->  Hash  (cost=20.00..20.00 rows=1000)
        ->  Seq Scan on users u  (rows=1000)

-- Characteristics:
-- ✓ Builds hash table from smaller table
-- ✓ Good for larger joins without indexes
-- ✗ Memory-intensive; may spill to disk if work_mem is too small

Merge Join

Merge Join  (cost=500.00..700.00 rows=5000 width=200)
  Merge Cond: (u.id = o.user_id)
  ->  Sort  (cost=200.00..210.00 rows=1000)
        Sort Key: u.id
        ->  Seq Scan on users u
  ->  Sort  (cost=300.00..320.00 rows=10000)
        Sort Key: o.user_id
        ->  Seq Scan on orders o

-- Characteristics:
-- ✓ Requires sorted input (or sorts first)
-- ✓ Good for large sorted datasets
-- ✓ Can use index for sort
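
To compare join methods for a specific query, you can disable the planner's
first choice in a session (PostgreSQL; for experimentation only, not
production):

SET enable_hashjoin = off;  -- also: enable_nestloop, enable_mergejoin
EXPLAIN ANALYZE
SELECT u.name, o.id FROM users u JOIN orders o ON u.id = o.user_id;
RESET enable_hashjoin;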

MySQL EXPLAIN

Basic Usage

-- Basic explain
EXPLAIN SELECT * FROM users WHERE id = 1;

-- Detailed format (MySQL 8.0+)
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE id = 1;

-- Analyze actual execution (MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1;
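
-- Tree format, closer to PostgreSQL-style plans (MySQL 8.0.16+)
EXPLAIN FORMAT=TREE SELECT * FROM users WHERE id = 1;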

Reading MySQL Output

EXPLAIN SELECT * FROM orders WHERE user_id = 5 AND status = 'pending';

-- Output columns:
+----+-------------+--------+------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table  | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------+---------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | orders | ref  | idx_user_id   | idx_user_id | 4       | const | 100  | 10.00    | Using where |
+----+-------------+--------+------+---------------+-------------+---------+-------+------+----------+-------------+

-- type column (best to worst):
-- system > const > eq_ref > ref > range > index > ALL

Access Types

┌─────────────────────────────────────────────────────────┐
│ const     - Single row using PRIMARY KEY or UNIQUE      │
│ eq_ref    - One row per combination (JOIN on PK)        │
│ ref       - Multiple rows using index prefix            │
│ range     - Index range scan (BETWEEN, <, >)            │
│ index     - Full index scan                             │
│ ALL       - Full table scan (usually bad!)              │
└─────────────────────────────────────────────────────────┘
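
Illustrative queries for each access type (assumes id is the PRIMARY KEY,
idx_status indexes status, and name is unindexed; actual plans depend on
the data):

EXPLAIN SELECT * FROM users WHERE id = 1;                   -- const
EXPLAIN SELECT o.*, u.name
FROM orders o JOIN users u ON u.id = o.user_id;             -- eq_ref (users)
EXPLAIN SELECT * FROM users WHERE status = 'active';        -- ref
EXPLAIN SELECT * FROM users WHERE id BETWEEN 1 AND 100;     -- range
EXPLAIN SELECT id FROM users;                               -- index
EXPLAIN SELECT * FROM users WHERE name LIKE '%smith%';      -- ALL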

Identifying Problems

Missing Index

-- PostgreSQL: Sequential scan on large table
Seq Scan on orders  (cost=0.00..50000.00 rows=1000000)
  Filter: (user_id = 5)
  Rows Removed by Filter: 999000

-- Solution
CREATE INDEX idx_orders_user_id ON orders(user_id);

Poor Estimate vs Actual

-- Estimated 10 rows, actually 10000
Index Scan on users (cost=0.29..100.00 rows=10)
                    (actual time=0.5..500.0 rows=10000 loops=1)

-- Causes: Outdated statistics
-- Solution
ANALYZE users;  -- PostgreSQL
ANALYZE TABLE users;  -- MySQL

Inefficient Join Order

-- Large table scanned first
Nested Loop  (actual time=5000.0..6000.0 rows=10)
  ->  Seq Scan on large_table  (rows=1000000)
  ->  Index Scan on small_table  (rows=1 loops=1000000)

-- Should start with smaller/more filtered table
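
-- Usually fixed by fresh statistics or a missing index. As a diagnostic,
-- PostgreSQL can be told to join tables in the order written (session only):
SET join_collapse_limit = 1;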

Sort Spilling to Disk

Sort  (cost=500.00..550.00 rows=10000)
  Sort Key: created_at
  Sort Method: external merge  Disk: 5000kB  -- Bad!

-- Solution: Increase work_mem or add index
SET work_mem = '256MB';
-- Or
CREATE INDEX idx_orders_created ON orders(created_at);

Optimization Tips

Force Index Usage (When Needed)

-- PostgreSQL has no index hints; toggle planner settings to test plans
SET enable_seqscan = off;
EXPLAIN SELECT * FROM users WHERE status = 'active';
SET enable_seqscan = on;

-- MySQL: Index hints
SELECT * FROM users USE INDEX (idx_status) WHERE status = 'active';
SELECT * FROM users FORCE INDEX (idx_status) WHERE status = 'active';

Update Statistics

-- PostgreSQL
ANALYZE;  -- All tables
ANALYZE users;  -- Specific table
ANALYZE users (email, status);  -- Specific columns

-- MySQL
ANALYZE TABLE users;
OPTIMIZE TABLE users;  -- Also defragments

Planner Configuration

-- PostgreSQL: Adjust cost parameters
SET random_page_cost = 1.1;  -- lower for SSDs (default 4.0 assumes HDDs)
SET effective_cache_size = '4GB';
SET work_mem = '256MB';

-- MySQL: Optimizer switches
SET optimizer_switch = 'index_merge=on';
SET optimizer_switch = 'mrr=on,mrr_cost_based=off';

Tools for Analysis

PostgreSQL:
- pgAdmin: Visual explain plans
- auto_explain: Log slow query plans automatically
- pg_stat_statements: Query statistics

MySQL:
- MySQL Workbench: Visual explain
- Performance Schema: Detailed metrics
- SHOW PROFILE: Query profiling (deprecated; prefer Performance Schema)

Third-party:
- explain.dalibo.com: PostgreSQL plan visualization
- PEV2: the explain visualizer that powers explain.dalibo.com (embeddable)
- Percona Toolkit: MySQL analysis tools
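
A minimal auto_explain setup sketch (session-level; the threshold is
illustrative):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';  -- log plans slower than this
SET auto_explain.log_analyze = on;            -- include actual times and rows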

Best Practices

  1. Always use EXPLAIN ANALYZE: Estimates can be wrong (but remember it executes the query)
  2. Check actual vs estimated rows: Large differences indicate stale statistics
  3. Look for sequential scans on large tables: Usually need indexes
  4. Monitor buffer hits vs reads: Reads indicate cache misses
  5. Watch for sort/hash spills to disk: Increase work_mem if needed
  6. Test with production-like data: Plans change with data volume
  7. Re-analyze after major data changes: Keep statistics current
