Advanced Database Concepts
Advanced distributed database concepts - CAP theorem, replication, and sharding
As data volume and traffic grow, a single-node database eventually cannot keep up: storage, memory, and throughput are all bounded by one machine. Understanding the core concepts of distributed databases then becomes essential.
Scaling Databases
Vertical vs Horizontal Scaling
Vertical Scaling (Scale Up):
┌─────────────────────────────────────────────────────────┐
│ Add more resources to a single machine: CPU, RAM, SSD   │
│ Pros: Simple, no application changes needed             │
│ Cons: Cost rises steeply at the high end, hard ceiling  │
└─────────────────────────────────────────────────────────┘
┌────────┐      ┌────────────────┐
│ 4 CPU  │  →   │ 32 CPU         │
│ 16GB   │      │ 256GB          │
│ 500GB  │      │ 4TB SSD        │
└────────┘      └────────────────┘
Horizontal Scaling (Scale Out):
┌─────────────────────────────────────────────────────────┐
│ Add more machines, distribute load                      │
│ Pros: No fixed ceiling, roughly linear cost growth      │
│ Cons: Increased complexity, distributed system issues   │
└─────────────────────────────────────────────────────────┘
┌────────┐     ┌────────┐ ┌────────┐ ┌────────┐
│ Server │  →  │ Node 1 │ │ Node 2 │ │ Node 3 │
└────────┘     └────────┘ └────────┘ └────────┘
Distributed Database Architecture
                 ┌─────────────────┐
                 │  Load Balancer  │
                 └────────┬────────┘
                          │
           ┌──────────────┼──────────────┐
           │              │              │
     ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
     │   App 1   │  │   App 2   │  │   App 3   │
     └─────┬─────┘  └─────┬─────┘  └─────┬─────┘
           │              │              │
           └──────────────┼──────────────┘
                          │
                 ┌────────▼────────┐
                 │  Query Router   │
                 └────────┬────────┘
                          │
      ┌───────────────────┼───────────────────┐
      │                   │                   │
┌─────▼─────┐       ┌─────▼─────┐       ┌─────▼─────┐
│  Shard 1  │       │  Shard 2  │       │  Shard 3  │
│   (A-H)   │       │   (I-P)   │       │   (Q-Z)   │
└─────┬─────┘       └─────┬─────┘       └─────┬─────┘
      │                   │                   │
┌─────▼─────┐       ┌─────▼─────┐       ┌─────▼─────┐
│  Replica  │       │  Replica  │       │  Replica  │
└───────────┘       └───────────┘       └───────────┘
Distributed Database Challenges
| Challenge | Description |
|---|---|
| Data Consistency | How to keep data in sync across nodes |
| Network Partition | How to keep operating when nodes cannot reach each other over the network |
| Failure Recovery | How to ensure availability when nodes fail |
| Distributed Transactions | How to preserve ACID guarantees when a transaction spans multiple nodes |
| Data Distribution | How to distribute data evenly to avoid hotspots |
| Query Routing | How to efficiently locate which node holds the data |
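To make the Query Routing row concrete, here is a minimal sketch of a router for the range layout in the architecture diagram (A-H, I-P, Q-Z, one replica per shard). The `Shard` and `QueryRouter` classes and the node names such as `shard-1` and `replica-1` are hypothetical and chosen only for illustration. The sketch also assumes that writes go to a shard's primary and reads go to its replica, which is one common arrangement rather than something the diagram prescribes.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    primary: str       # node that accepts writes (hypothetical name)
    replica: str       # read-only copy of the primary (hypothetical name)
    last_letter: str   # inclusive upper bound of this shard's key range


class QueryRouter:
    """Maps a key to a shard by the first letter of the key."""

    def __init__(self, shards):
        # Sort by upper bound so a binary search can find the owning range.
        self.shards = sorted(shards, key=lambda s: s.last_letter)
        self.bounds = [s.last_letter for s in self.shards]

    def shard_for(self, key: str) -> Shard:
        first = key[:1].upper()
        if not "A" <= first <= "Z":
            raise ValueError(f"key must start with a letter A-Z: {key!r}")
        # Pick the first shard whose upper bound is >= the key's first letter.
        return self.shards[bisect_left(self.bounds, first)]

    def route(self, key: str, write: bool = False) -> str:
        shard = self.shard_for(key)
        # Assumption: writes go to the primary, reads are served by the replica,
        # which may lag behind the primary if replication is asynchronous.
        return shard.primary if write else shard.replica


router = QueryRouter([
    Shard("shard-1", "replica-1", "H"),
    Shard("shard-2", "replica-2", "P"),
    Shard("shard-3", "replica-3", "Z"),
])

print(router.route("alice", write=True))   # shard-1
print(router.route("Mallory"))             # replica-2
print(router.route("zoe", write=True))     # shard-3
```

Because the router only needs the sorted list of range boundaries, a lookup is a binary search in memory and never has to contact the shards themselves. The trade-off is that the boundaries must be kept up to date whenever ranges are split or moved.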
Topics
- CAP Theorem - CAP theorem and consistency models
- Replication - Data replication: primary-replica, multi-leader, consensus algorithms
- Sharding - Data sharding: sharding strategies, cross-shard queries
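As a small preview of why the choice of sharding strategy matters for the Data Distribution challenge, the sketch below compares the first-letter range layout from the architecture diagram with a simple hash-modulo layout. The function names, the shard numbering (0, 1, 2), and the `user<N>` key workload are all made up for illustration. SHA-256 is used instead of Python's built-in `hash()` because the built-in is randomized per process for strings and would not give stable placement.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 3


def range_shard(key: str) -> int:
    """Range strategy from the diagram: A-H -> 0, I-P -> 1, Q-Z -> 2."""
    first = key[:1].upper()
    if first <= "H":
        return 0
    if first <= "P":
        return 1
    return 2


def hash_shard(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Hash strategy: a stable digest of the key, reduced modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Made-up workload: every key shares the prefix "user", so each starts with "U".
keys = [f"user{i}" for i in range(3000)]

range_counts = Counter(range_shard(k) for k in keys)
hash_counts = Counter(hash_shard(k) for k in keys)

print("range:", dict(range_counts))   # range: {2: 3000}, a hotspot on Q-Z
print("hash: ", dict(hash_counts))    # keys spread across shards 0, 1, and 2
```

Hashing evens out placement here, but it also scatters neighbouring keys, so a query over a key range has to visit every shard. Plain modulo hashing has a second cost: changing the number of shards remaps most keys.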