Docs For AI
Indexing

Index Strategies

Advanced index strategies including composite indexes, covering indexes, and partial indexes

Index Strategies

Mastering advanced index strategies can significantly improve the performance of complex queries.

Composite Index

Composite indexes are created on multiple columns. Column order is crucial.

Leftmost Prefix Rule

CREATE INDEX idx_orders ON orders(user_id, status, created_at);

-- Can use index:
WHERE user_id = 1
WHERE user_id = 1 AND status = 'active'
WHERE user_id = 1 AND status = 'active' AND created_at > '2024-01-01'

-- Cannot use index (skipped user_id):
WHERE status = 'active'
WHERE status = 'active' AND created_at > '2024-01-01'

-- Partially uses index:
WHERE user_id = 1 AND created_at > '2024-01-01'  -- Only uses user_id part

Column Order Selection Strategy

-- Rule 1: Put equality query columns first
WHERE user_id = 1 AND created_at > '2024-01-01'
-- Recommended: INDEX(user_id, created_at)

-- Rule 2: Put high selectivity columns first
-- user_id has high selectivity (many different values), status has low (few values)
-- Recommended: INDEX(user_id, status) not INDEX(status, user_id)

-- Rule 3: Consider sorting requirements
SELECT * FROM orders WHERE user_id = 1 ORDER BY created_at DESC
-- Recommended: INDEX(user_id, created_at DESC)

Composite Index Example

-- E-commerce order query scenario
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at DESC);

-- Efficient queries supported:
-- All orders for a user
SELECT * FROM orders WHERE user_id = 123;

-- Pending orders for a user
SELECT * FROM orders WHERE user_id = 123 AND status = 'pending';

-- Recent orders with specific status for a user
SELECT * FROM orders
WHERE user_id = 123 AND status = 'completed'
ORDER BY created_at DESC LIMIT 10;

Covering Index

Covering indexes contain all columns needed by the query, avoiding table lookups.

Principle Comparison

Regular index query (requires table lookup):
1. Search index → Get row_id
2. Access table via row_id → Get other columns
3. Return result

Covering index query (Index-Only Scan):
1. Search index → Directly get all needed columns
2. Return result (no table access needed)

PostgreSQL INCLUDE Syntax

-- Create covering index
CREATE INDEX idx_orders_covering ON orders(user_id, status)
INCLUDE (total, created_at);

-- This query can be fully satisfied by the index
SELECT user_id, status, total, created_at
FROM orders
WHERE user_id = 123 AND status = 'completed';

-- EXPLAIN will show "Index Only Scan"

MySQL Covering Index

-- MySQL achieves covering by including all query columns in the index
CREATE INDEX idx_orders_covering ON orders(user_id, status, total, created_at);

-- Query reads only from index, no table data access
EXPLAIN SELECT user_id, status, total, created_at
FROM orders WHERE user_id = 123;
-- Extra: Using index

Partial Index

Partial indexes only index rows that meet certain conditions, reducing index size.

PostgreSQL Partial Index

-- Index only active users
CREATE INDEX idx_active_users ON users(email)
WHERE status = 'active';

-- Index only pending orders
CREATE INDEX idx_pending_orders ON orders(created_at)
WHERE status = 'pending';

-- Index only important logs
CREATE INDEX idx_error_logs ON logs(created_at, message)
WHERE level IN ('ERROR', 'CRITICAL');

Benefit Analysis

-- Scenario: 10 million users, only 100,000 active users

-- Full index
CREATE INDEX idx_users_email ON users(email);
-- Index size: ~400MB
-- Querying active users: Scans entire index

-- Partial index
CREATE INDEX idx_active_users_email ON users(email)
WHERE status = 'active';
-- Index size: ~4MB
-- Querying active users: Scans only partial index

Expression Index

Create indexes on the results of expressions or functions.

-- Case-insensitive search
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';

-- Date part index
CREATE INDEX idx_orders_year_month ON orders(
    EXTRACT(YEAR FROM created_at),
    EXTRACT(MONTH FROM created_at)
);
SELECT * FROM orders
WHERE EXTRACT(YEAR FROM created_at) = 2024
  AND EXTRACT(MONTH FROM created_at) = 6;

-- JSON field index
CREATE INDEX idx_users_country ON users((metadata->>'country'));
SELECT * FROM users WHERE metadata->>'country' = 'USA';

Index for Sorting

Avoiding filesort

-- Create index that supports sorting
CREATE INDEX idx_products_price ON products(category_id, price DESC);

-- Efficient query (uses index for sorting)
SELECT * FROM products
WHERE category_id = 5
ORDER BY price DESC
LIMIT 20;

-- Note sort direction
CREATE INDEX idx_orders_date_desc ON orders(user_id, created_at DESC);
-- ORDER BY created_at DESC -- Uses index order
-- ORDER BY created_at ASC  -- Requires reverse scan

Multi-column Sorting

-- PostgreSQL supports mixed sort directions
CREATE INDEX idx_products_sort ON products(
    category_id ASC,
    rating DESC,
    price ASC
);

SELECT * FROM products
WHERE category_id = 5
ORDER BY rating DESC, price ASC;

Index Maintenance

Monitoring Index Usage

-- PostgreSQL: View index usage statistics
SELECT
    schemaname,
    tablename,
    indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes
SELECT
    schemaname || '.' || tablename as table,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%_pkey';

Index Bloat and Rebuilding

-- PostgreSQL: Check index bloat
SELECT
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) as current_size,
    pg_size_pretty(pg_relation_size(indexrelid) * (1 - (n_dead_tup::float / GREATEST(n_live_tup, 1)))) as estimated_optimal
FROM pg_stat_user_tables t
JOIN pg_stat_user_indexes i USING (relname);

-- Concurrent index rebuild (doesn't block writes)
REINDEX INDEX CONCURRENTLY idx_orders_user_id;

-- Or create new index then drop old one
CREATE INDEX CONCURRENTLY idx_orders_user_id_new ON orders(user_id);
DROP INDEX idx_orders_user_id;
ALTER INDEX idx_orders_user_id_new RENAME TO idx_orders_user_id;

Common Anti-Patterns

1. Over-Indexing

-- Bad: Separate index for each column
CREATE INDEX idx1 ON orders(user_id);
CREATE INDEX idx2 ON orders(status);
CREATE INDEX idx3 ON orders(created_at);
CREATE INDEX idx4 ON orders(total);

-- Better: Create composite index based on query patterns
CREATE INDEX idx_orders_main ON orders(user_id, status, created_at);

2. Indexing Low-Selectivity Columns

-- Bad: Index on boolean or enum columns alone
CREATE INDEX idx_users_active ON users(is_active);  -- Only true/false

-- Better: Combine with high-selectivity columns, or use partial index
CREATE INDEX idx_active_users ON users(created_at) WHERE is_active = true;

3. Functions Causing Index Invalidation

-- Bad: Function applied to column
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
-- Index idx_users_email cannot be used

-- Better: Create expression index or adjust query
CREATE INDEX idx_users_email_upper ON users(UPPER(email));
-- Or store normalized data

4. Implicit Type Conversion

-- Bad: Type mismatch causes index invalidation
-- user_id is INT, but passing string
SELECT * FROM orders WHERE user_id = '123';

-- Better: Ensure type matching
SELECT * FROM orders WHERE user_id = 123;

Best Practices Summary

StrategyUse Case
Composite indexMulti-condition combined queries
Covering indexAvoid table lookup, improve SELECT performance
Partial indexOnly need to index partial data
Expression indexFunction or computed field queries
Descending indexORDER BY DESC optimization

Golden Rules:

  1. Analyze query patterns first, then design indexes
  2. Use EXPLAIN to verify index usage
  3. Regularly monitor index usage
  4. Remove unused indexes
  5. Regularly maintain indexes (rebuild bloated indexes)

On this page