Advanced
Database Replication
Data replication - single-leader, multi-leader, and consensus algorithms
Replication is the process of copying data to multiple nodes to improve availability, fault tolerance, and read performance.
Why Replication?
┌─────────────────────────────────────────────────────────┐
│ Purpose of replication: │
│ │
│ 1. High availability: Auto-failover when nodes fail │
│ 2. Read scaling: Distribute read load across replicas │
│ 3. Geographic distribution: Place data close to users │
│ 4. Disaster recovery: Prevent data loss │
└─────────────────────────────────────────────────────────┘
Single-Leader Replication
The most common replication mode: a single primary node (the leader) handles all writes, and multiple followers replicate its data and can serve reads.
Architecture
┌───────────────┐
│ Primary │
Writes ─────────▶│ (Leader) │
└───────┬───────┘
│
┌─────────────┼─────────────┐
│ │ │
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
Reads◀─│ Replica1 │ │ Replica2 │ │ Replica3 │
│(Follower)│ │(Follower)│ │(Follower)│
└──────────┘ └──────────┘ └──────────┘
Synchronous vs Asynchronous Replication
Synchronous Replication:
┌─────────────────────────────────────────────────────────┐
│ Primary Replica │
│ │ │ │
│ │──Write──▶│ │
│ │ │ Apply │
│ │◀───ACK───│ │
│ │ │ │
│ Commit │ │
│ │◀─Client──│ │
│ │
│ Pros: No data loss, strong consistency │
│ Cons: Higher latency, replica failure affects writes │
└─────────────────────────────────────────────────────────┘
Asynchronous Replication:
┌─────────────────────────────────────────────────────────┐
│ Primary Replica │
│ │ │ │
│ Commit │ │
│ │◀─Client──│ │
│ │ │ │
│ │──Write──▶│ │
│ │ │ Apply (background) │
│ │
│ Pros: Low latency, high availability │
│ Cons: Primary failure may lose data │
└─────────────────────────────────────────────────────────┘
PostgreSQL Streaming Replication
# Primary node config (postgresql.conf)
wal_level = replica
max_wal_senders = 10
synchronous_commit = on                  # wait for the standby before reporting commit
synchronous_standby_names = 'replica1'   # a non-empty list enables synchronous replication
# Replica node config (postgresql.conf; PostgreSQL 12+ also needs an empty standby.signal file)
primary_conninfo = 'host=primary port=5432 user=replicator'
primary_slot_name = 'replica1_slot'
-- Check replication status (run on the primary)
SELECT client_addr, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
MySQL Primary-Replica Replication
# Primary node config (my.cnf)
server-id = 1
log_bin = mysql-bin
binlog_format = ROW
# Replica node config (my.cnf)
server-id = 2
relay_log = relay-bin
-- Configure replication on the replica
-- (MySQL 8.0.22+ prefers CHANGE REPLICATION SOURCE TO / START REPLICA / SHOW REPLICA STATUS)
CHANGE MASTER TO
  MASTER_HOST='primary',
  MASTER_USER='replicator',
  MASTER_PASSWORD='password',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=0;  -- use the file and position reported by SHOW MASTER STATUS on the primary
START SLAVE;
SHOW SLAVE STATUS\G
Multi-Leader Replication
Multiple nodes can accept writes, suitable for multi-datacenter deployments.
Architecture
Datacenter A Datacenter B
┌─────────────┐ ┌─────────────┐
│ Leader A │◀─────────▶│ Leader B │
│ (Write) │ Conflict │ (Write) │
└──────┬──────┘ Resolve └──────┬──────┘
│ │
┌──────▼──────┐ ┌──────▼──────┐
│ Followers │ │ Followers │
└─────────────┘ └─────────────┘
Conflict Handling
Write conflict scenario:
┌─────────────────────────────────────────────────────────┐
│ Leader A: UPDATE users SET name = 'Alice' WHERE id = 1; │
│ Leader B: UPDATE users SET name = 'Bob' WHERE id = 1; │
│ Happens simultaneously, how to resolve? │
└─────────────────────────────────────────────────────────┘
Conflict resolution strategies:
1. Last Write Wins (LWW)
- Based on timestamp, newer overwrites older
- Simple but may lose data
2. Merge Values
- Combine conflicting values: name = 'Alice,Bob'
- Requires application-level handling
3. Custom Logic
- Decide based on business rules
- e.g., take higher version number
4. Keep All Versions
- Let user or application decide
- Like CouchDB conflict documents
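A minimal Python sketch of strategies 1 and 3 (the Write record and the resolver names are illustrative, not taken from any particular database): each write carries metadata, and the resolver picks a winner deterministically so that every replica converges on the same value.
from dataclasses import dataclass

@dataclass
class Write:
    value: str
    timestamp: int   # wall-clock or logical timestamp assigned by the accepting leader
    version: int     # application-level version number
    node_id: str     # used to break exact timestamp ties deterministically

def resolve_lww(a: Write, b: Write) -> Write:
    # Strategy 1: Last Write Wins - the newer timestamp overwrites the older one.
    # Ties are broken by node_id so all replicas pick the same winner.
    return max(a, b, key=lambda w: (w.timestamp, w.node_id))

def resolve_by_version(a: Write, b: Write) -> Write:
    # Strategy 3: custom business rule - keep the higher version number.
    return a if a.version >= b.version else b

# Leader A and Leader B accepted conflicting writes for the same row:
wa = Write("Alice", timestamp=1000, version=7, node_id="A")
wb = Write("Bob",   timestamp=1001, version=6, node_id="B")
print(resolve_lww(wa, wb).value)         # Bob   (newer timestamp)
print(resolve_by_version(wa, wb).value)  # Alice (higher version)

CRDTs (Conflict-free Replicated Data Types)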
Data structures whose merge operation is commutative, associative, and idempotent, so concurrent updates converge without coordination:
G-Counter (Grow-only counter):
┌─────────────────────────────────────────────────────────┐
│ Node A: {A: 5, B: 0} │
│ Node B: {A: 0, B: 3} │
│ Merge: {A: 5, B: 3} → Total = 8 │
│ Same result regardless of merge order │
└─────────────────────────────────────────────────────────┘
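A minimal Python sketch of a G-Counter (illustrative, not tied to any specific database): each node only increments its own slot, and merge takes the element-wise maximum, which makes merging commutative, associative, and idempotent.
class GCounter:
    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts = {}            # node_id -> count observed so far

    def increment(self, n: int = 1):
        # A node may only increment its own entry.
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter"):
        # Element-wise max: the order of merges does not matter.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())

a, b = GCounter("A"), GCounter("B")
for _ in range(5): a.increment()    # Node A: {A: 5}
for _ in range(3): b.increment()    # Node B: {B: 3}
a.merge(b)
print(a.value())                    # 8, regardless of merge order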
LWW-Register (Last-Write-Wins Register):
┌─────────────────────────────────────────────────────────┐
│ Each write has timestamp │
│ {value: 'Alice', timestamp: 1000} │
│ {value: 'Bob', timestamp: 1001} │
│ Merge takes larger timestamp: 'Bob' │
└─────────────────────────────────────────────────────────┘
Leaderless Replication
All nodes can accept reads and writes, using quorum for consistency.
Architecture
Client
│
┌─────┼─────┐
│ │ │
▼ ▼ ▼
┌─────┐┌─────┐┌─────┐
│Node1││Node2││Node3│
│ R/W ││ R/W ││ R/W │
└─────┘└─────┘└─────┘
Write: Send to all nodes, wait for W acknowledgments
Read: Query all nodes, wait for R responses
Quorum Mechanism
Quorum formula: R + W > N
N = Total number of replicas
W = Number of nodes that must acknowledge writes
R = Number of nodes that must respond to reads
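A toy Python sketch of a quorum coordinator, assuming in-memory "replicas" and ignoring timeouts and retries: a write succeeds once W replicas acknowledge it, a read consults R replicas and returns the freshest version, and R + W > N guarantees the two sets overlap.
N, W, R = 3, 2, 2
assert R + W > N, "read and write quorums must overlap for strong consistency"

# Hypothetical in-memory replicas: each maps key -> (value, version)
replicas = [dict() for _ in range(N)]

def write(key, value, version):
    acks = 0
    for replica in replicas:              # send the write to every replica
        replica[key] = (value, version)
        acks += 1                         # in practice some replicas may not respond
    return acks >= W                      # succeed once at least W acknowledged

def read(key):
    responses = [r[key] for r in replicas[:R] if key in r]   # wait for R responses
    # The highest version wins; R + W > N guarantees at least one fresh copy is seen.
    return max(responses, key=lambda vv: vv[1])[0] if responses else None

write("user:1", "John", version=1)
print(read("user:1"))                     # John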
Example (N=3):
┌─────────────────────────────────────────────────────────┐
│ W=2, R=2 → Strong consistency (2+2>3) │
│ W=3, R=1 → Strong consistency, slower writes │
│ W=1, R=3 → Strong consistency, slower reads │
│ W=1, R=1 → Eventual consistency (1+1≤3) │
└─────────────────────────────────────────────────────────┘
Cassandra Implementation
-- Create Keyspace with replication factor 3
CREATE KEYSPACE myapp WITH REPLICATION = {
'class': 'SimpleStrategy',
'replication_factor': 3
};
-- CQL 3 has no per-statement USING CONSISTENCY clause; in cqlsh, set the
-- consistency level for the session before running statements
CONSISTENCY QUORUM;
-- QUORUM = floor(RF/2) + 1 = 2 replicas when RF = 3
-- Write at QUORUM
INSERT INTO users (id, name) VALUES (1, 'John');
-- Read at QUORUM
SELECT * FROM users WHERE id = 1;
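Applications usually set the consistency level through the client driver rather than cqlsh; a short sketch using the DataStax Python driver (host and keyspace names are placeholders):
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["cassandra-node1"]).connect("myapp")

# Write at QUORUM: at least 2 of the 3 replicas must acknowledge.
insert = SimpleStatement(
    "INSERT INTO users (id, name) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(insert, (1, "John"))

# Read at QUORUM: at least 2 replicas must respond.
select = SimpleStatement(
    "SELECT * FROM users WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(select, (1,)).one()

Failover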
How the system detects and handles a primary node failure.
Automatic Failover
Failure detection and switchover:
┌─────────────────────────────────────────────────────────┐
│ 1. Heartbeat detection: Periodically check primary │
│ 2. Failure confirmation: Confirm after multiple misses │
│ 3. Election: Select new primary node │
│ 4. Switchover: Update config, redirect traffic │
│ 5. Recovery: Old primary becomes follower when back │
└─────────────────────────────────────────────────────────┘
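A simplified Python sketch of steps 1-3; ping_primary and elect_new_primary are placeholders, and the thresholds are arbitrary. Requiring several consecutive missed heartbeats avoids failing over on a single dropped packet.
import time

HEARTBEAT_INTERVAL = 1.0    # seconds between checks
MISS_THRESHOLD = 3          # consecutive misses before declaring failure

def ping_primary() -> bool:
    """Placeholder: send a heartbeat to the primary and return True if it answered."""
    ...

def elect_new_primary():
    """Placeholder: run leader election among the surviving followers."""
    ...

def monitor_primary():
    misses = 0
    while True:
        if ping_primary():
            misses = 0                       # healthy: reset the counter
        else:
            misses += 1                      # missed heartbeat
            if misses >= MISS_THRESHOLD:
                return elect_new_primary()   # step 3: trigger an election
        time.sleep(HEARTBEAT_INTERVAL)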
Timeline:
T0: Primary running normally
Primary ◀──── Heartbeat ────▶ Follower
T1: Primary fails
Primary (❌) ◀──── Heartbeat (timeout) ────▶ Follower
T2: Failure detection
Threshold exceeded, failure confirmed
T3: Elect new Primary
Follower1 elected as new Primary
T4: Traffic switchover
Client ────▶ New Primary (formerly Follower1)
Split Brain Problem
Split Brain:
┌─────────────────────────────────────────────────────────┐
│ Network partition causes both nodes to think they are │
│ the primary │
│ │
│ ┌──────────┐ ╳╳╳╳╳╳ ┌──────────┐ │
│ │ Primary │ Network │ Primary? │ │
│ │ (old) │ split │ (newly │ │
│ └──────────┘ │ elected) │ │
│ │ └──────────┘ │
│ Writes │ │
│ (conflicting!) Writes │
│ (conflicting!) │
└─────────────────────────────────────────────────────────┘
Solutions:
1. Fencing: Ensure old primary cannot write (STONITH)
2. Quorum: Minority partition auto-demotes
3. Lease: Primary holds a time-limited lease
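A minimal sketch of the lease approach (solution 3), assuming leases are granted by a quorum-backed coordinator such as ZooKeeper or etcd; the names here are illustrative. An old primary whose lease has expired refuses writes, fencing itself off instead of contributing to a split brain.
import time

class LeasedPrimary:
    def __init__(self, lease_seconds: float = 10.0):
        self.lease_seconds = lease_seconds
        self.lease_expires_at = 0.0

    def renew_lease(self):
        # In practice the lease is granted by a quorum-backed coordinator;
        # here we simply extend a local deadline.
        self.lease_expires_at = time.monotonic() + self.lease_seconds

    def handle_write(self, write):
        if time.monotonic() >= self.lease_expires_at:
            # Lease expired (possibly due to a partition): step down instead
            # of risking conflicting writes.
            raise RuntimeError("not the primary: lease expired")
        apply_to_storage(write)   # hypothetical storage call

def apply_to_storage(write):
    ...

Consensus Algorithms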
Algorithms for reaching agreement in distributed systems.
Raft Algorithm
Raft state machine:
┌─────────────────────────────────────────────────────────┐
│ │
│ ┌──────────┐ Election timeout ┌──────────┐ │
│ │ Follower │───────────────────▶│ Candidate│ │
│ └──────────┘ └──────────┘ │
│ ▲ │ │
│ │ Discovers higher term │ Wins majority │
│ │ ▼ │
│ │ ┌──────────┐ │
│ └─────────────────────────│ Leader │ │
│ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────┘
Leader election:
1. Follower times out without heartbeat
2. Becomes Candidate, increments term
3. Requests votes from other nodes
4. Wins majority to become Leader
5. Leader sends periodic heartbeats
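A condensed Python sketch of the election rules above, with message transport reduced to a placeholder RPC: a follower whose election timer fires becomes a candidate, increments its term, votes for itself, and becomes leader once it holds a majority.
import random

def request_vote_from(peer, term, candidate_id) -> bool:
    """Placeholder RPC: a real peer grants its vote at most once per term."""
    return False

class RaftNode:
    def __init__(self, node_id, peers):
        self.node_id = node_id
        self.peers = peers               # the other nodes in the cluster
        self.state = "follower"
        self.current_term = 0
        self.voted_for = None
        # Randomized timeout so nodes rarely start elections simultaneously.
        self.election_timeout = random.uniform(150, 300)   # milliseconds

    def on_election_timeout(self):
        # Steps 1-3: become candidate, increment term, vote for self, request votes.
        self.state = "candidate"
        self.current_term += 1
        self.voted_for = self.node_id
        votes = 1 + sum(request_vote_from(p, self.current_term, self.node_id)
                        for p in self.peers)
        # Step 4: a majority of the full cluster (peers + self) makes us leader.
        if votes > (len(self.peers) + 1) // 2:
            self.state = "leader"        # step 5: start sending heartbeats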
Log replication:
1. Client sends write request to Leader
2. Leader appends to local log
3. Sends in parallel to all Followers
4. Commits after majority confirm
5. Responds to Client
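A sketch of the commit rule in step 4, with Raft's bookkeeping reduced to a match_index list: the leader counts itself plus every follower that has stored an entry, and commits once that count is a majority.
def try_commit(leader_log, match_index, cluster_size, commit_index):
    """match_index[i] = highest log index known to be replicated on follower i."""
    # Walk backwards from the newest entry looking for one stored on a majority.
    # (Full Raft also requires that entry to be from the leader's current term.)
    for index in range(len(leader_log), commit_index, -1):
        replicas = 1 + sum(1 for m in match_index if m >= index)  # leader + followers
        if replicas > cluster_size // 2:
            return index          # committed: safe to apply and reply to the client
    return commit_index           # nothing newly committed

# Leader has 3 entries; follower 1 replicated up to index 3, follower 2 up to 1.
print(try_commit(["set x=1", "set y=2", "set z=3"],
                 match_index=[3, 1], cluster_size=3, commit_index=0))   # -> 3

Paxos Algorithm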
Paxos roles:
┌─────────────────────────────────────────────────────────┐
│ Proposer: Initiates proposals │
│ Acceptor: Votes on decisions │
│ Learner: Learns the final decision │
└─────────────────────────────────────────────────────────┘
Two phases (not to be confused with two-phase commit):
Phase 1 (Prepare):
Proposer → Acceptors: "Prepare proposal N"
Acceptors → Proposer: "Promise not to accept < N"
Phase 2 (Accept):
Proposer → Acceptors: "Accept proposal N, value V"
Acceptors → Proposer: "Accepted"
A majority of Acceptors accept → the value is chosen
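A single-decree Paxos acceptor sketch in Python (networking and the full proposer logic are omitted): in Phase 1 the acceptor promises to ignore lower-numbered proposals and reports any value it already accepted, which the proposer must then re-propose in Phase 2.
class Acceptor:
    def __init__(self):
        self.promised_n = -1       # highest proposal number promised so far
        self.accepted_n = -1       # number of the last accepted proposal
        self.accepted_value = None

    def prepare(self, n):
        """Phase 1: promise not to accept any proposal numbered below n."""
        if n > self.promised_n:
            self.promised_n = n
            # Report any previously accepted value; the proposer must reuse it.
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", None, None)

    def accept(self, n, value):
        """Phase 2: accept proposal n unless a higher-numbered prepare was promised."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_value = value
            return "accepted"
        return "rejected"

acceptors = [Acceptor() for _ in range(3)]
promises = [a.prepare(1) for a in acceptors]
if sum(p[0] == "promise" for p in promises) > len(acceptors) // 2:
    accepted = sum(a.accept(1, "V") == "accepted" for a in acceptors)
    print("chosen" if accepted > len(acceptors) // 2 else "not chosen")

Best Practices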
- Choose an appropriate replication mode:
  - Single-leader: most scenarios
  - Multi-leader: multi-datacenter deployments
  - Leaderless: high-availability requirements
- Configure an appropriate sync level:
  - Critical data: synchronous replication
  - General data: asynchronous replication
- Monitor replication lag:
  -- PostgreSQL (run on the replica)
  SELECT NOW() - pg_last_xact_replay_timestamp() AS replication_lag;
- Test failover:
  - Regularly practice failover
  - Verify data integrity
- Handle consistency issues:
  - Critical reads go to the primary
  - Use a read-your-writes guarantee (see the sketch below)
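A small sketch of a read-your-writes routing policy; the router and the primary/replica connection objects are hypothetical. After a user writes, that user's reads are pinned to the primary for a short window so they never observe a stale replica.
import time

class ReadYourWritesRouter:
    def __init__(self, primary, replicas, stickiness_seconds: float = 5.0):
        self.primary = primary
        self.replicas = replicas
        self.stickiness_seconds = stickiness_seconds
        self.last_write_at = {}      # user_id -> time of that user's last write

    def write(self, user_id, query):
        self.last_write_at[user_id] = time.monotonic()
        return self.primary.execute(query)

    def read(self, user_id, query):
        recently_wrote = (time.monotonic() - self.last_write_at.get(user_id, 0.0)
                          < self.stickiness_seconds)
        # Users who just wrote read from the primary; everyone else may use replicas.
        target = self.primary if recently_wrote else self.replicas[0]
        return target.execute(query)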