Advanced Database Concepts

Advanced distributed database concepts - CAP theorem, replication, and sharding

As data volume and traffic grow, a single-node database eventually cannot keep up with demand. At that point, understanding the core concepts of distributed databases becomes essential.

Scaling Databases

Vertical vs Horizontal Scaling

Vertical Scaling (Scale Up):
┌─────────────────────────────────────────────────────────┐
│ Add more resources to a single machine: CPU, RAM, SSD   │
│ Pros: Simple, no application changes needed             │
│ Cons: Cost grows superlinearly, hard hardware ceiling   │
└─────────────────────────────────────────────────────────┘

   ┌────────┐        ┌────────────────┐
   │ 4 CPU  │   →    │    32 CPU      │
   │ 16GB   │        │    256GB       │
   │ 500GB  │        │    4TB SSD     │
   └────────┘        └────────────────┘

Horizontal Scaling (Scale Out):
┌─────────────────────────────────────────────────────────┐
│ Add more machines, distribute load                      │
│ Pros: Theoretically unlimited, linear cost growth       │
│ Cons: Increased complexity, distributed system issues   │
└─────────────────────────────────────────────────────────┘

   ┌────────┐        ┌────────┐ ┌────────┐ ┌────────┐
   │ Server │   →    │ Node 1 │ │ Node 2 │ │ Node 3 │
   └────────┘        └────────┘ └────────┘ └────────┘

Distributed Database Architecture

                    ┌─────────────────┐
                    │  Load Balancer  │
                    └────────┬────────┘
              ┌──────────────┼──────────────┐
              │              │              │
       ┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼─────┐
       │   App 1     │ │   App 2   │ │   App 3   │
       └──────┬──────┘ └─────┬─────┘ └─────┬─────┘
              │              │              │
              └──────────────┼──────────────┘
                    ┌────────▼────────┐
                    │  Query Router   │
                    └────────┬────────┘
         ┌───────────────────┼───────────────────┐
         │                   │                   │
   ┌─────▼─────┐       ┌─────▼─────┐       ┌─────▼─────┐
   │  Shard 1  │       │  Shard 2  │       │  Shard 3  │
   │ (A-H)     │       │ (I-P)     │       │ (Q-Z)     │
   └─────┬─────┘       └─────┬─────┘       └─────┬─────┘
         │                   │                   │
   ┌─────▼─────┐       ┌─────▼─────┐       ┌─────▼─────┐
   │  Replica  │       │  Replica  │       │  Replica  │
   └───────────┘       └───────────┘       └───────────┘
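
The query router's job in the diagram can be sketched in a few lines. This is a minimal illustration, assuming keys are routed by their first letter into the three ranges shown (A-H, I-P, Q-Z); the names `SHARD_UPPER_BOUNDS`, `SHARD_NAMES`, and `route` are hypothetical and not part of any particular database.

```python
import bisect

# Hypothetical shard map mirroring the diagram: each entry is the last
# letter its shard covers, in ascending order.
SHARD_UPPER_BOUNDS = ["H", "P", "Z"]
SHARD_NAMES = ["shard-1", "shard-2", "shard-3"]


def route(key: str) -> str:
    """Return the name of the shard responsible for a key, by first letter."""
    if not key or not key[0].isascii() or not key[0].isalpha():
        raise ValueError(f"key must start with a letter A-Z: {key!r}")
    first = key[0].upper()
    # bisect_left finds the first upper bound that is >= the first letter.
    index = bisect.bisect_left(SHARD_UPPER_BOUNDS, first)
    return SHARD_NAMES[index]


if __name__ == "__main__":
    for name in ["alice", "ivan", "quinn"]:
        print(name, "->", route(name))
```

Keeping the boundaries in a sorted list means a lookup is a binary search, and adding a shard only requires inserting a new boundary. A real router would also need to handle keys outside A-Z and replica selection, which this sketch leaves out.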

Distributed Database Challenges

Challenge                 Description
Data Consistency          How to keep data in sync across nodes
Network Partition         How to handle network disconnection between nodes
Failure Recovery          How to ensure availability when nodes fail
Distributed Transactions  How to guarantee ACID across nodes
Data Distribution         How to distribute data evenly to avoid hotspots
Query Routing             How to efficiently locate which node holds the data
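
To make the data distribution challenge concrete, the sketch below compares two placement schemes on a deliberately skewed workload. It assumes three shards, the A-H / I-P / Q-Z ranges from the architecture diagram, and a made-up set of keys that all begin with the same letter; the function names are hypothetical.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3  # assumed, matching the three shards in the diagram


def range_shard(key: str) -> int:
    """Range scheme from the diagram: A-H -> 0, I-P -> 1, Q-Z -> 2."""
    first = key[0].upper()
    if first <= "H":
        return 0
    if first <= "P":
        return 1
    return 2


def hash_shard(key: str) -> int:
    """Hash scheme: a stable digest of the key, modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def distribution(keys, shard_fn) -> Counter:
    """Count how many keys each shard would receive."""
    return Counter(shard_fn(k) for k in keys)


# Hypothetical skewed workload: every key starts with "a".
keys = [f"account-{i}" for i in range(3000)]
range_counts = distribution(keys, range_shard)
hash_counts = distribution(keys, hash_shard)

if __name__ == "__main__":
    print("range:", dict(range_counts))
    print("hash: ", dict(hash_counts))
```

Under the range scheme all 3000 keys land on shard 0, which is exactly the hotspot the table warns about, while hashing spreads them roughly evenly across the three shards. The trade-off is that hash placement scatters adjacent keys, so range scans have to touch every shard.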

Topics

  • CAP Theorem - CAP theorem and consistency models
  • Replication - Data replication: primary-replica, multi-leader, consensus algorithms
  • Sharding - Data sharding: sharding strategies, cross-shard queries
