Docs For AI
Indexing

Index Types

Detailed explanation of B-Tree, Hash, GIN, GiST and other index types

Index Types

Different index types are suitable for different query patterns and data types. Understanding the principles of each index type helps in choosing the best solution.

B-Tree Index

B-Tree (Balanced Tree) is the most commonly used index type, supporting equality and range queries.

Structure

                        ┌─────────────────────┐
                        │    [50]             │  Root
                        └─────────┬───────────┘
                    ┌─────────────┴─────────────┐
                    ▼                           ▼
          ┌─────────────┐             ┌─────────────┐
          │  [20, 35]   │             │  [70, 85]   │  Internal
          └──┬────┬────┬┘             └──┬────┬────┬┘
             │    │    │                 │    │    │
      ┌──────┘    │    └────┐     ┌─────┘    │    └─────┐
      ▼           ▼         ▼     ▼          ▼          ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│5,10,15  │→│22,25,30 │→│38,42,48 │→│55,60,65 │→│72,78,82 │→│88,90,95 │  Leaf
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
     ↓           ↓           ↓           ↓           ↓           ↓
 Row Pointers (pointing to actual data rows)

Supported Operations

-- B-Tree can accelerate the following queries:
WHERE column = value           -- Equality query
WHERE column > value           -- Greater than
WHERE column < value           -- Less than
WHERE column >= value          -- Greater than or equal
WHERE column <= value          -- Less than or equal
WHERE column BETWEEN a AND b   -- Range query
WHERE column LIKE 'prefix%'    -- Prefix matching
ORDER BY column                -- Sorting

Creation Examples

-- PostgreSQL / MySQL
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_date ON orders(created_at DESC);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

Hash Index

Hash indexes use a hash function to map keys to buckets, only supporting equality queries.

Structure

Hash Function: hash(key) → bucket_number

Bucket 0: [key1 → row_ptr] → [key5 → row_ptr]
Bucket 1: [key2 → row_ptr]
Bucket 2: [key3 → row_ptr] → [key7 → row_ptr] → [key9 → row_ptr]
Bucket 3: [key4 → row_ptr] → [key6 → row_ptr]
...

Characteristics

FeatureB-TreeHash
Equality queryO(log n)O(1)
Range querySupportedNot supported
SortingSupportedNot supported
Prefix matchingSupportedNot supported
Storage overheadLargerSmaller

Use Cases

-- PostgreSQL Hash index (suitable for high-cardinality equality queries)
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);

-- Good for: WHERE token = 'abc123'
-- Not for: WHERE token > 'abc' or ORDER BY token

Note: MySQL InnoDB does not support Hash indexes (Memory engine does).

B+ Tree vs B-Tree

Most databases actually use B+ Tree:

B+ Tree Characteristics:
┌──────────────────────────────────────────────────────────┐
│ 1. All data stored in leaf nodes                         │
│ 2. Leaf nodes form doubly linked list for range queries  │
│ 3. Non-leaf nodes only store keys, allowing more entries │
└──────────────────────────────────────────────────────────┘

             [30, 60]            ← Internal nodes only store keys
            /    |    \
       [10,20] [40,50] [70,80]   ← Internal nodes
          │       │       │
    ┌─────┴───────┴───────┴─────┐
    ▼                           ▼
[10→data]↔[20→data]↔[30→data]↔[40→data]↔...  ← Leaf nodes form linked list

Full-Text Index

Full-text indexes are used for efficient text content searching.

-- Create GIN full-text index
CREATE INDEX idx_articles_content ON articles
USING GIN (to_tsvector('english', title || ' ' || content));

-- Query
SELECT * FROM articles
WHERE to_tsvector('english', title || ' ' || content) @@ to_tsquery('database & optimization');

-- Query with ranking
SELECT
    title,
    ts_rank(to_tsvector('english', content), query) as rank
FROM articles, to_tsquery('database | postgresql') query
WHERE to_tsvector('english', content) @@ query
ORDER BY rank DESC;
-- Create full-text index
CREATE FULLTEXT INDEX idx_articles_fulltext ON articles(title, content);

-- Natural language search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('database optimization' IN NATURAL LANGUAGE MODE);

-- Boolean mode search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('+database -mysql' IN BOOLEAN MODE);

GIN Index (PostgreSQL)

GIN (Generalized Inverted Index) is suitable for columns containing multiple values.

Supported Data Types

-- Arrays
CREATE INDEX idx_posts_tags ON posts USING GIN (tags);
SELECT * FROM posts WHERE tags @> ARRAY['database', 'postgresql'];

-- JSONB
CREATE INDEX idx_users_metadata ON users USING GIN (metadata);
SELECT * FROM users WHERE metadata @> '{"role": "admin"}';

-- Full-text search
CREATE INDEX idx_docs_search ON documents USING GIN (to_tsvector('english', content));

GIN vs GiST

FeatureGINGiST
Build speedSlowFast
Query speedFastSlower
Update speedSlowFast
Index sizeLargeSmall
Best forRead-heavyFrequent updates

GiST Index (PostgreSQL)

GiST (Generalized Search Tree) supports various advanced data types.

-- Geospatial data (PostGIS)
CREATE INDEX idx_locations_geom ON locations USING GIST (geom);
SELECT * FROM locations
WHERE ST_DWithin(geom, ST_MakePoint(-73.9857, 40.7484)::geography, 1000);

-- Range types
CREATE INDEX idx_events_duration ON events USING GIST (duration);
SELECT * FROM events WHERE duration && '[2024-01-01, 2024-01-31]'::daterange;

-- Exclusion constraint (prevent overlapping)
ALTER TABLE reservations
ADD CONSTRAINT no_overlapping_reservations
EXCLUDE USING GIST (room_id WITH =, time_range WITH &&);

BRIN Index (PostgreSQL)

BRIN (Block Range Index) is suitable for large tables with naturally ordered data.

-- Time series data (log tables)
CREATE INDEX idx_logs_created ON logs USING BRIN (created_at);

-- Characteristics:
-- - Very small index (possibly only 1% of B-Tree size)
-- - Suitable for data stored in physical order
-- - May scan more data blocks during queries

BRIN vs B-Tree Comparison

Table: logs (100 million rows, ordered by created_at)

B-Tree Index:
├── Size: ~2GB
├── Query time: ~1ms
└── Best for: Small range precise queries

BRIN Index:
├── Size: ~20MB
├── Query time: ~50ms
└── Best for: Large range scans, storage-constrained scenarios

Index Selection Guide

Query TypeRecommended Index Type
Equality query (=)B-Tree, Hash
Range query (BETWEEN, greater/less than)B-Tree
Prefix matching (LIKE 'abc%')B-Tree
Full-text searchGIN (Full-Text)
Array containmentGIN
JSONB queriesGIN
GeospatialGiST
Range overlapGiST
Large tables with ordered dataBRIN

On this page