Index Types
Detailed explanation of B-Tree, Hash, GIN, GiST and other index types
Index Types
Different index types are suitable for different query patterns and data types. Understanding the principles of each index type helps in choosing the best solution.
B-Tree Index
B-Tree (Balanced Tree) is the most commonly used index type, supporting equality and range queries.
Structure
┌─────────────────────┐
│ [50] │ Root
└─────────┬───────────┘
┌─────────────┴─────────────┐
▼ ▼
┌─────────────┐ ┌─────────────┐
│ [20, 35] │ │ [70, 85] │ Internal
└──┬────┬────┬┘ └──┬────┬────┬┘
│ │ │ │ │ │
┌──────┘ │ └────┐ ┌─────┘ │ └─────┐
▼ ▼ ▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│5,10,15 │→│22,25,30 │→│38,42,48 │→│55,60,65 │→│72,78,82 │→│88,90,95 │ Leaf
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
↓ ↓ ↓ ↓ ↓ ↓
Row Pointers (pointing to actual data rows)Supported Operations
-- B-Tree can accelerate the following queries:
WHERE column = value -- Equality query
WHERE column > value -- Greater than
WHERE column < value -- Less than
WHERE column >= value -- Greater than or equal
WHERE column <= value -- Less than or equal
WHERE column BETWEEN a AND b -- Range query
WHERE column LIKE 'prefix%' -- Prefix matching
ORDER BY column -- SortingCreation Examples
-- PostgreSQL / MySQL
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_date ON orders(created_at DESC);
-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);Hash Index
Hash indexes use a hash function to map keys to buckets, only supporting equality queries.
Structure
Hash Function: hash(key) → bucket_number
Bucket 0: [key1 → row_ptr] → [key5 → row_ptr]
Bucket 1: [key2 → row_ptr]
Bucket 2: [key3 → row_ptr] → [key7 → row_ptr] → [key9 → row_ptr]
Bucket 3: [key4 → row_ptr] → [key6 → row_ptr]
...Characteristics
| Feature | B-Tree | Hash |
|---|---|---|
| Equality query | O(log n) | O(1) |
| Range query | Supported | Not supported |
| Sorting | Supported | Not supported |
| Prefix matching | Supported | Not supported |
| Storage overhead | Larger | Smaller |
Use Cases
-- PostgreSQL Hash index (suitable for high-cardinality equality queries)
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);
-- Good for: WHERE token = 'abc123'
-- Not for: WHERE token > 'abc' or ORDER BY tokenNote: MySQL InnoDB does not support Hash indexes (Memory engine does).
B+ Tree vs B-Tree
Most databases actually use B+ Tree:
B+ Tree Characteristics:
┌──────────────────────────────────────────────────────────┐
│ 1. All data stored in leaf nodes │
│ 2. Leaf nodes form doubly linked list for range queries │
│ 3. Non-leaf nodes only store keys, allowing more entries │
└──────────────────────────────────────────────────────────┘
[30, 60] ← Internal nodes only store keys
/ | \
[10,20] [40,50] [70,80] ← Internal nodes
│ │ │
┌─────┴───────┴───────┴─────┐
▼ ▼
[10→data]↔[20→data]↔[30→data]↔[40→data]↔... ← Leaf nodes form linked listFull-Text Index
Full-text indexes are used for efficient text content searching.
PostgreSQL Full-Text Search
-- Create GIN full-text index
CREATE INDEX idx_articles_content ON articles
USING GIN (to_tsvector('english', title || ' ' || content));
-- Query
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('database & optimization');
-- Query with ranking
SELECT
title,
ts_rank(to_tsvector('english', content), query) as rank
FROM articles, to_tsquery('database | postgresql') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;MySQL Full-Text Search
-- Create full-text index
CREATE FULLTEXT INDEX idx_articles_fulltext ON articles(title, content);
-- Natural language search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('database optimization' IN NATURAL LANGUAGE MODE);
-- Boolean mode search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('+database -mysql' IN BOOLEAN MODE);GIN Index (PostgreSQL)
GIN (Generalized Inverted Index) is suitable for columns containing multiple values.
Supported Data Types
-- Arrays
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['database', 'postgresql'];
-- JSONB
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
SELECT * FROM users WHERE metadata @> '{"role": "admin"}';
-- Full-text search
CREATE INDEX idx_docs_search ON documents USING GIN (to_tsvector('english', content));GIN vs GiST
| Feature | GIN | GiST |
|---|---|---|
| Build speed | Slow | Fast |
| Query speed | Fast | Slower |
| Update speed | Slow | Fast |
| Index size | Large | Small |
| Best for | Read-heavy | Frequent updates |
GiST Index (PostgreSQL)
GiST (Generalized Search Tree) supports various advanced data types.
-- Geospatial data (PostGIS)
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);
SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-73.9857, 40.7484)::geography, 1000);
-- Range types
CREATE INDEX idx_events_duration ON events USING GIST (duration);
SELECT * FROM events WHERE duration && '[2024-01-01, 2024-01-31]'::daterange;
-- Exclusion constraint (prevent overlapping)
ALTER TABLE reservations
ADD CONSTRAINT no_overlapping_reservations
EXCLUDE USING GIST (room_id WITH =, time_range WITH &&);BRIN Index (PostgreSQL)
BRIN (Block Range Index) is suitable for large tables with naturally ordered data.
-- Time series data (log tables)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);
-- Characteristics:
-- - Very small index (possibly only 1% of B-Tree size)
-- - Suitable for data stored in physical order
-- - May scan more data blocks during queriesBRIN vs B-Tree Comparison
Table: logs (100 million rows, ordered by created_at)
B-Tree Index:
├── Size: ~2GB
├── Query time: ~1ms
└── Best for: Small range precise queries
BRIN Index:
├── Size: ~20MB
├── Query time: ~50ms
└── Best for: Large range scans, storage-constrained scenariosIndex Selection Guide
| Query Type | Recommended Index Type |
|---|---|
| Equality query (=) | B-Tree, Hash |
| Range query (BETWEEN, greater/less than) | B-Tree |
| Prefix matching (LIKE 'abc%') | B-Tree |
| Full-text search | GIN (Full-Text) |
| Array containment | GIN |
| JSONB queries | GIN |
| Geospatial | GiST |
| Range overlap | GiST |
| Large tables with ordered data | BRIN |