Optimization
Execution Plans
Understanding and analyzing query execution plans for optimization
Execution plans show how the database engine will execute a query. Understanding them is essential for query optimization.
What is an Execution Plan?
Query execution flow:
SQL Query
    │
    ▼
┌─────────────┐
│   Parser    │  Syntax checking
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Optimizer  │  Generate execution plan
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Executor   │  Execute the plan
└─────────────┘
The optimizer chooses:
- Which indexes to use
- Join order and method
- Scan method (sequential vs index)
- Aggregation strategy
PostgreSQL EXPLAIN
Basic Usage
-- Show execution plan
EXPLAIN SELECT * FROM users WHERE id = 1;
-- Show plan with actual execution statistics
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1;
-- Show detailed output including buffers
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM users WHERE id = 1;
-- JSON format for tooling
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM users WHERE id = 1;
Reading the Output
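With FORMAT JSON the plan becomes machine-readable, which is useful for tooling before digging into the text form. A minimal sketch of walking such a plan tree in Python (the sample plan below is hand-written to mirror PostgreSQL's JSON structure, not captured output):

```python
import json

# Hand-written sample resembling EXPLAIN (FORMAT JSON) output.
sample = json.loads("""
[{"Plan": {
    "Node Type": "Hash Join",
    "Total Cost": 200.0,
    "Plan Rows": 3000,
    "Plans": [
        {"Node Type": "Index Scan", "Total Cost": 50.0, "Plan Rows": 1000},
        {"Node Type": "Seq Scan", "Total Cost": 20.0, "Plan Rows": 800}
    ]}}]
""")

def walk(node, depth=0):
    """Yield (depth, node type, total cost, estimated rows) per plan node."""
    yield depth, node["Node Type"], node["Total Cost"], node["Plan Rows"]
    for child in node.get("Plans", []):
        yield from walk(child, depth + 1)

for depth, ntype, cost, rows in walk(sample[0]["Plan"]):
    print("  " * depth + f"{ntype}: cost={cost}, rows={rows}")
```

Each node carries the same cost and row fields as the text output; visualizers such as PEV2 consume this same JSON structure.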
EXPLAIN ANALYZE
SELECT u.name, COUNT(o.id)
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id;
-- Output:
HashAggregate (cost=245.00..255.00 rows=1000 width=40)
(actual time=15.2..18.4 rows=950 loops=1)
Group Key: u.id
-> Hash Left Join (cost=30.00..200.00 rows=3000 width=36)
(actual time=2.1..12.3 rows=2800 loops=1)
Hash Cond: (u.id = o.user_id)
-> Index Scan using idx_users_created on users u
(cost=0.29..50.00 rows=1000 width=36)
(actual time=0.02..1.2 rows=950 loops=1)
Index Cond: (created_at > '2024-01-01')
-> Hash (cost=20.00..20.00 rows=800 width=8)
(actual time=1.8..1.8 rows=2000 loops=1)
-> Seq Scan on orders o
(cost=0.00..20.00 rows=800 width=8)
(actual time=0.01..0.9 rows=2000 loops=1)
Planning Time: 0.5 ms
Execution Time: 19.1 ms
Key Metrics
┌─────────────────────────────────────────────────────────┐
│ cost=startup..total │
│ - Estimated cost units (not time!) │
│ - startup: cost before first row returned │
│ - total: cost to return all rows │
│ │
│ rows=N │
│ - Estimated number of rows │
│ │
│ actual time=startup..total │
│ - Real execution time in milliseconds │
│ │
│ rows=N loops=M │
│ - Actual rows returned, number of iterations │
│ │
│ Buffers: shared hit=N read=M │
│ - hit: pages from cache │
│ - read: pages from disk │
└─────────────────────────────────────────────────────────┘
Common Scan Types
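The scan types below are PostgreSQL plan nodes, but the underlying decision (read the whole table vs. use an index) is universal and easy to watch live. A portable sketch using Python's built-in sqlite3 as a stand-in, whose EXPLAIN QUERY PLAN output is much simpler than PostgreSQL's:

```python
import sqlite3

# sqlite3 in-memory database as a stand-in for PostgreSQL/MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, status TEXT)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable
    # description, e.g. a full scan or a search using some index.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM users WHERE status = 'active'"
before = plan(query)          # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_users_status ON users(status)")
after = plan(query)           # now an index search
print(before)
print(after)
```

The same query flips from a scan to an index search the moment a suitable index exists, exactly the before/after shown for PostgreSQL in the sections below.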
Sequential Scan
-- Full table scan, reads every row
Seq Scan on users (cost=0.00..1000.00 rows=50000 width=100)
Filter: (status = 'active')
Rows Removed by Filter: 30000
-- When it's used:
-- ✓ No suitable index
-- ✓ Querying large portion of table
-- ✓ Small table
-- When to optimize:
-- ✗ Querying small subset of large table
-- Solution: Add index
CREATE INDEX idx_users_status ON users(status);
Index Scan
-- Uses index, then fetches rows from table
Index Scan using idx_users_email on users
(cost=0.29..8.31 rows=1 width=100)
Index Cond: (email = 'user@example.com')
-- Best for:
-- ✓ Selective queries (few rows)
-- ✓ Columns with unique/high cardinality
Index Only Scan
-- All data from index, no table access
Index Only Scan using idx_users_email_name on users
(cost=0.29..4.31 rows=1 width=50)
Index Cond: (email = 'user@example.com')
Heap Fetches: 0
-- Requires: Covering index with all needed columns
CREATE INDEX idx_users_email_name ON users(email) INCLUDE (name);
Bitmap Index Scan
-- Two-phase: build bitmap, then fetch
Bitmap Heap Scan on orders (cost=10.00..500.00 rows=5000 width=100)
Recheck Cond: (status = 'pending')
-> Bitmap Index Scan on idx_orders_status
(cost=0.00..9.00 rows=5000 width=0)
Index Cond: (status = 'pending')
-- Used for:
-- ✓ Medium selectivity queries
-- ✓ OR conditions with multiple indexes
-- ✓ When too many rows for index scan, too few for seq scan
Join Methods
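The plan shapes below are easier to read once the algorithms themselves are clear. A toy sketch in Python of the first two (illustrative only; the sample `users`/`orders` rows are made up, and real executors operate on pages and hash buckets, not Python lists):

```python
users = [(1, "Ann"), (2, "Bob"), (3, "Cho")]   # (id, name)
orders = [(10, 1), (11, 1), (12, 3)]           # (order_id, user_id)

def nested_loop_join(outer, inner):
    # For each outer row, scan the entire inner side:
    # O(N*M) unless the inner lookup is backed by an index.
    return [(u, o) for u in outer for o in inner if o[1] == u[0]]

def hash_join(build, probe):
    # Build a hash table from the (smaller) build side once,
    # then probe it with each row of the other side.
    table = {}
    for u in build:
        table.setdefault(u[0], []).append(u)
    return [(u, o) for o in probe for u in table.get(o[1], [])]

# Both strategies produce the same join result.
assert sorted(nested_loop_join(users, orders)) == sorted(hash_join(users, orders))
```

The trade-off the optimizer weighs is visible here: the nested loop does repeated inner lookups (cheap only with an index), while the hash join pays once to build the table in memory.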
Nested Loop Join
Nested Loop (cost=0.29..100.00 rows=10 width=200)
-> Index Scan on users u (rows=10)
-> Index Scan on orders o (rows=1 loops=10)
Index Cond: (o.user_id = u.id)
-- Characteristics:
-- ✓ Fast for small outer table with indexed inner
-- ✓ Good when inner loop uses index
-- ✗ Slow for large tables without index
Hash Join
Hash Join (cost=30.00..200.00 rows=3000 width=200)
Hash Cond: (o.user_id = u.id)
-> Seq Scan on orders o (rows=10000)
-> Hash (cost=20.00..20.00 rows=1000)
-> Seq Scan on users u (rows=1000)
-- Characteristics:
-- ✓ Builds hash table from smaller table
-- ✓ Good for larger joins without indexes
-- ✓ Memory-intensive
Merge Join
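The merge phase itself can be sketched as two cursors advancing over inputs already sorted on the join key (a toy sketch with made-up rows; the inner loop replays a run of equal keys so duplicates on either side are handled):

```python
def merge_join(left, right, key_l, key_r):
    """Join two lists already sorted on their join keys (toy sketch)."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key_l(left[i]), key_r(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right-side rows sharing this key.
            j_end = j
            while j_end < len(right) and key_r(right[j_end]) == kl:
                j_end += 1
            out.extend((left[i], r) for r in right[j:j_end])
            i += 1
            # If the next left row has the same key, replay the run;
            # otherwise skip past it.
            if not (i < len(left) and key_l(left[i]) == kl):
                j = j_end
    return out

users = [(1, "Ann"), (2, "Bob"), (3, "Cho")]   # sorted by id
orders = [(10, 1), (11, 1), (12, 3)]           # sorted by user_id
print(merge_join(users, orders, lambda u: u[0], lambda o: o[1]))
```

Because each input is read once in order, merge join scales well to large datasets, provided the sort (or a sorted index) comes cheap.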
Merge Join (cost=500.00..700.00 rows=5000 width=200)
Merge Cond: (u.id = o.user_id)
-> Sort (cost=200.00..210.00 rows=1000)
Sort Key: u.id
-> Seq Scan on users u
-> Sort (cost=300.00..320.00 rows=10000)
Sort Key: o.user_id
-> Seq Scan on orders o
-- Characteristics:
-- ✓ Requires sorted input (or sorts first)
-- ✓ Good for large sorted datasets
-- ✓ Can use index for sort
MySQL EXPLAIN
Basic Usage
-- Basic explain
EXPLAIN SELECT * FROM users WHERE id = 1;
-- Detailed format (MySQL 8.0+)
EXPLAIN FORMAT=JSON SELECT * FROM users WHERE id = 1;
-- Analyze actual execution (MySQL 8.0.18+)
EXPLAIN ANALYZE SELECT * FROM users WHERE id = 1;
Reading MySQL Output
EXPLAIN SELECT * FROM orders WHERE user_id = 5 AND status = 'pending';
-- Output columns:
+----+-------------+--------+------+---------------+-------------+---------+-------+------+----------+-------------+
| id | select_type | table  | type | possible_keys | key         | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------+---------------+-------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | orders | ref  | idx_user_id   | idx_user_id | 4       | const |  100 |    10.00 | Using where |
+----+-------------+--------+------+---------------+-------------+---------+-------+------+----------+-------------+
-- type column (best to worst):
-- system > const > eq_ref > ref > range > index > ALL
Access Types
┌─────────────────────────────────────────────────────────┐
│ const - Single row using PRIMARY KEY or UNIQUE │
│ eq_ref - One row per combination (JOIN on PK) │
│ ref - Multiple rows using index prefix │
│ range - Index range scan (BETWEEN, <, >) │
│ index - Full index scan │
│ ALL - Full table scan (usually bad!) │
└─────────────────────────────────────────────────────────┘
Identifying Problems
Missing Index
-- PostgreSQL: Sequential scan on large table
Seq Scan on orders (cost=0.00..50000.00 rows=1000000)
Filter: (user_id = 5)
Rows Removed by Filter: 999000
-- Solution
CREATE INDEX idx_orders_user_id ON orders(user_id);
Poor Estimate vs Actual
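A recurring check when reading ANALYZE output is the ratio between estimated and actual row counts. A small helper sketch (the node names, sample numbers, and 10x threshold here are arbitrary choices for illustration):

```python
def misestimated(nodes, factor=10):
    """Flag (name, estimated, actual) plan nodes whose actual row count
    is off from the estimate by more than `factor` in either direction."""
    flagged = []
    for name, estimated, actual in nodes:
        # max(..., 1) avoids division by zero on empty results.
        ratio = max(estimated, actual, 1) / max(min(estimated, actual), 1)
        if ratio > factor:
            flagged.append((name, estimated, actual))
    return flagged

nodes = [
    ("Index Scan on users", 10, 10000),   # badly underestimated
    ("Seq Scan on orders", 800, 950),     # close enough
]
print(misestimated(nodes))
```

A node flagged this way is usually the first place to run ANALYZE, as in the example below.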
-- Estimated 10 rows, actually 10000
Index Scan on users (cost=0.29..100.00 rows=10)
(actual time=0.5..500.0 rows=10000 loops=1)
-- Causes: Outdated statistics
-- Solution
ANALYZE users; -- PostgreSQL
ANALYZE TABLE users; -- MySQL
Inefficient Join Order
-- Large table scanned first
Nested Loop (actual time=5000.0..6000.0 rows=10)
-> Seq Scan on large_table (rows=1000000)
-> Index Scan on small_table (rows=1 loops=1000000)
-- Should start with smaller/more filtered table
Sort Spilling to Disk
Sort (cost=500.00..550.00 rows=10000)
Sort Key: created_at
Sort Method: external merge Disk: 5000kB -- Bad!
-- Solution: Increase work_mem or add index
SET work_mem = '256MB';
-- Or
CREATE INDEX idx_orders_created ON orders(created_at);
Optimization Tips
Force Index Usage (When Needed)
-- PostgreSQL: Hints via enable_* settings
SET enable_seqscan = off;
EXPLAIN SELECT * FROM users WHERE status = 'active';
SET enable_seqscan = on;
-- MySQL: Index hints
SELECT * FROM users USE INDEX (idx_status) WHERE status = 'active';
SELECT * FROM users FORCE INDEX (idx_status) WHERE status = 'active';
Update Statistics
-- PostgreSQL
ANALYZE; -- All tables
ANALYZE users; -- Specific table
ANALYZE users (email, status); -- Specific columns
-- MySQL
ANALYZE TABLE users;
OPTIMIZE TABLE users; -- Also defragments
Planner Configuration
-- PostgreSQL: Adjust cost parameters
SET random_page_cost = 1.1; -- SSD vs HDD
SET effective_cache_size = '4GB';
SET work_mem = '256MB';
-- MySQL: Optimizer switches
SET optimizer_switch = 'index_merge=on';
SET optimizer_switch = 'mrr=on,mrr_cost_based=off';
Tools for Analysis
PostgreSQL:
- pgAdmin: Visual explain plans
- auto_explain: Log slow query plans automatically
- pg_stat_statements: Query statistics
MySQL:
- MySQL Workbench: Visual explain
- Performance Schema: Detailed metrics
- SHOW PROFILE: Query profiling
Third-party:
- explain.dalibo.com: PostgreSQL plan visualization
- PEV2: Another PostgreSQL explain visualizer
- Percona Toolkit: MySQL analysis tools
Best Practices
- Always use EXPLAIN ANALYZE: Estimates can be wrong, and only ANALYZE shows actual times (it really executes the query, so wrap data-modifying statements in a transaction you roll back)
- Check actual vs estimated rows: Large differences indicate stale statistics
- Look for sequential scans on large tables: Usually need indexes
- Monitor buffer hits vs reads: Reads indicate cache misses
- Watch for sort/hash spills to disk: Increase work_mem if needed
- Test with production-like data: Plans change with data volume
- Re-analyze after major data changes: Keep statistics current