System design is the art of building scalable, reliable, and efficient distributed systems. This guide walks through the essential components and patterns that power modern applications, from simple client-server interactions to complex microservices architectures.
1. Client-Server Architecture: The Foundation
Basic Interaction Model
Every web application starts with the fundamental client-server model where clients (browsers, mobile apps) request services from servers.
[Client] ----HTTP Request----> [Server]
[Client] <---HTTP Response---- [Server]
How It Works
- Client initiates: User opens a web page or app
- Request formation: Client creates HTTP request with method, headers, and data
- Server processing: Server receives, processes, and generates response
- Response delivery: Server sends back data (HTML, JSON, images)
- Client rendering: Client displays or processes the received data
Real-World Example
When you search on an e-commerce site:
- Client sends: GET /search?query=laptop&category=electronics
- Server processes the search logic
- Server responds with product listings in JSON format
- Client renders the product cards on the webpage
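As a concrete sketch of this exchange, the snippet below issues the search request using the Fetch API (available in browsers and Node 18+). The endpoint URL and response shape are illustrative assumptions, not a real API:

// Hypothetical search endpoint; the response shape is assumed
interface Product {
  id: string;
  name: string;
  price: number;
}

async function searchProducts(query: string, category: string): Promise<Product[]> {
  // Client forms the HTTP request: method, URL, and headers
  const url = `https://shop.example.com/search?query=${encodeURIComponent(query)}&category=${encodeURIComponent(category)}`;
  const response = await fetch(url, { headers: { Accept: "application/json" } });
  if (!response.ok) throw new Error(`Search failed: ${response.status}`);
  // Server responds with product listings as JSON; client parses and renders them
  return response.json() as Promise<Product[]>;
}

searchProducts("laptop", "electronics").then((products) => console.log(products));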
2. DNS Lookup: The Internet's Address Book
What DNS Does
Domain Name System translates human-readable domain names into IP addresses that computers use to locate servers.
DNS Resolution Flow
User types: www.example.com
↓
Browser Cache Check
↓
OS Cache Check
↓
Router Cache Check
↓
ISP DNS Server
↓
Root DNS Server
↓
TLD DNS Server (.com)
↓
Authoritative DNS Server
↓
Returns IP: 203.0.113.10
DNS Record Types
- A Record: Maps domain to IPv4 address
- AAAA Record: Maps domain to IPv6 address
- CNAME: Creates domain aliases
- MX: Mail server routing
- TXT: Text-based verification and configuration
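To see these record types in practice, Node's built-in dns module can query them directly. A minimal sketch (the domain is a placeholder):

import { resolve4, resolveMx, resolveTxt } from "node:dns/promises";

async function inspectDomain(domain: string) {
  // A records: IPv4 addresses for the domain
  const addresses = await resolve4(domain);
  // MX records: mail servers with routing priority
  const mailServers = await resolveMx(domain);
  // TXT records: verification strings and configuration
  const txtRecords = await resolveTxt(domain);
  console.log({ addresses, mailServers, txtRecords });
}

inspectDomain("example.com");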
Performance Impact
DNS resolution typically takes 20-120ms. This is why DNS caching at multiple levels is crucial for performance.
3. Vertical Scaling: Growing Upward
Concept
Vertical scaling means increasing the power of your existing server by adding more CPU, RAM, or storage to handle increased load.
Implementation Approach
Before: 4 CPU cores, 8GB RAM, 100GB storage
After: 16 CPU cores, 64GB RAM, 1TB storage
When to Use Vertical Scaling
- Simple applications: Single-server setups
- Database servers: When ACID properties are critical
- Legacy systems: Applications not designed for distribution
- Quick fixes: Immediate performance boost needed
Scaling Limits
Most cloud providers offer instances up to:
- 128+ CPU cores
- 2-4 TB of RAM
- Multiple TB of storage
Real-World Example
A growing startup's database server handling 1,000 concurrent users starts experiencing slow queries. Instead of redesigning the architecture, they upgrade from an 8-core/16GB instance to a 32-core/128GB instance, immediately improving performance.
4. Horizontal Scaling: Growing Outward
Concept
Horizontal scaling involves adding more servers to distribute the load across multiple machines instead of making one machine more powerful.
Architecture Pattern
Single Server:
[All Traffic] → [Single Server]
Horizontal Scaling:
[Traffic] → [Load Balancer] → [Server 1]
                            → [Server 2]
                            → [Server 3]
Implementation Considerations
- Stateless design: Servers shouldn't store user session data locally
- Data synchronization: Shared databases or distributed data stores
- Load distribution: Traffic routing across multiple instances
When to Choose Horizontal Scaling
- High availability requirements: No single point of failure
- Cost efficiency: Commodity hardware is cheaper than high-end servers
- Elastic scaling potential: Add servers incrementally as demand grows
- Geographic distribution: Servers in multiple regions
Example Scenario
A video streaming service starts with one server handling 10,000 users. As they grow to 100,000 users, they deploy 10 identical servers behind a load balancer, each handling 10,000 users.
5. Load Balancers: Traffic Distribution Intelligence
Purpose
Load balancers distribute incoming requests across multiple backend servers to ensure no single server becomes overwhelmed.
Load Balancing Algorithms
Round Robin
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Weighted Round Robin
Server A (weight 3): receives 3 of every 6 requests
Server B (weight 2): receives 2 of every 6 requests
Server C (weight 1): receives 1 of every 6 requests
Least Connections
Routes traffic to the server with the fewest active connections.
IP Hash
Uses client IP to consistently route to the same server (useful for session affinity).
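A compact sketch of two of these algorithms, round robin and IP hash, assuming an in-memory list of backend addresses:

class LoadBalancer {
  private nextIndex = 0;
  constructor(private servers: string[]) {}

  // Round robin: cycle through servers in order
  roundRobin(): string {
    const server = this.servers[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.servers.length;
    return server;
  }

  // IP hash: the same client IP always maps to the same server
  ipHash(clientIp: string): string {
    let hash = 0;
    for (const char of clientIp) {
      hash = (hash * 31 + char.charCodeAt(0)) >>> 0; // simple string hash
    }
    return this.servers[hash % this.servers.length];
  }
}

const lb = new LoadBalancer(["serverA", "serverB", "serverC"]);
console.log(lb.roundRobin()); // serverA
console.log(lb.roundRobin()); // serverB
console.log(lb.ipHash("203.0.113.7")); // stable choice for this client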
Types of Load Balancers
Layer 4 (Transport Layer)
- Routes based on IP and port
- Fast, low latency
- Protocol agnostic
Layer 7 (Application Layer)
- Routes based on HTTP headers, URLs, cookies
- Content-based routing
- SSL termination capabilities
Health Checks
Load balancers continuously monitor server health:
Every 30 seconds:
  Send GET /health to each server
  If response != 200 OK:
    Remove server from rotation
  If server recovers:
    Add back to rotation
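In sketch form, assuming each backend exposes a /health endpoint (addresses are illustrative):

const servers = ["http://192.168.1.10:8080", "http://192.168.1.11:8080"];
const healthy = new Set(servers); // start by assuming all are healthy

async function checkHealth() {
  for (const server of servers) {
    try {
      const res = await fetch(`${server}/health`);
      // 200 OK: (re)add to rotation; anything else: remove
      res.ok ? healthy.add(server) : healthy.delete(server);
    } catch {
      healthy.delete(server); // unreachable servers leave the rotation too
    }
  }
}

setInterval(checkHealth, 30_000); // probe every 30 seconds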
Real-World Example
An e-commerce site uses a load balancer to route:
- /api/products/* → Product service servers
- /api/orders/* → Order service servers
- /api/users/* → User service servers
- /static/* → CDN or static file servers
6. Microservices: Divide and Conquer
Architecture Philosophy
Microservices break down large applications into small, independent services that communicate over well-defined APIs.
Monolith vs Microservices
Monolithic:
[Web UI + Business Logic + Database] - Single Deployment
Microservices:
[User Service]   [Product Service]   [Order Service]   [Payment Service]
      ↓                 ↓                   ↓                  ↓
  [User DB]       [Product DB]         [Order DB]        [Payment DB]
Service Communication Patterns
Synchronous Communication
Order Service → HTTP Call → Payment Service
Order Service ← Response ← Payment Service
Asynchronous Communication
Order Service → Message Queue → Payment Service
Order Service continues processing...
Payment Service processes when ready
Service Discovery
Services need to find and communicate with each other dynamically:
Service Registry:
- user-service: 192.168.1.10:8080
- product-service: 192.168.1.11:8080
- order-service: 192.168.1.12:8080
Order Service needs User Service:
1. Query service registry for "user-service"
2. Get IP and port
3. Make HTTP call
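A toy registry illustrating this lookup flow (real systems use Consul, etcd, or Eureka rather than a plain map):

// In-memory stand-in for a service registry
const registry = new Map<string, string>([
  ["user-service", "192.168.1.10:8080"],
  ["product-service", "192.168.1.11:8080"],
  ["order-service", "192.168.1.12:8080"],
]);

async function callService(name: string, path: string): Promise<unknown> {
  // 1. Query the registry for the service's address
  const address = registry.get(name);
  if (!address) throw new Error(`Service not registered: ${name}`);
  // 2.-3. Use the IP and port to make the HTTP call
  const response = await fetch(`http://${address}${path}`);
  return response.json();
}

// Order Service looking up the User Service
callService("user-service", "/users/42").then(console.log);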
Data Management Patterns
Database per Service
Each microservice owns its data and database schema.
Shared Database Anti-pattern
Multiple services sharing the same database creates tight coupling.
Event Sourcing
Services publish events when data changes, other services react to these events.
Real-World Example
Netflix's architecture includes hundreds of microservices:
- User Profile Service: Manages user accounts and preferences
- Recommendation Service: Generates personalized content suggestions
- Video Encoding Service: Processes and converts video files
- Billing Service: Handles subscriptions and payments
- Playback Service: Streams video content to devices
7. Content Delivery Networks (CDN): Global Performance
Purpose
CDNs cache and serve static content from geographically distributed servers to reduce latency and server load.
How CDNs Work
User in Tokyo requests image from US server:
Without CDN: Tokyo → US Server (200ms latency)
With CDN: Tokyo → Tokyo CDN Edge (20ms latency)
If not cached: Tokyo CDN → US Server → Cache → User
Content Types for CDN
- Static assets: Images, CSS, JavaScript files
- Video content: Streaming media files
- API responses: Cacheable GET requests
- Dynamic content: With proper cache headers (see the sketch below)
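That last point is worth an example: a CDN only caches dynamic responses that the origin explicitly marks cacheable. A minimal sketch using Node's built-in http module, with illustrative paths and durations:

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url?.startsWith("/static/")) {
    // Static assets: edge servers may cache for 24 hours
    res.setHeader("Cache-Control", "public, max-age=86400");
  } else if (req.url?.startsWith("/api/news")) {
    // Cacheable API response: a short TTL keeps content fresh
    res.setHeader("Cache-Control", "public, max-age=300");
  } else {
    // User-specific content: never cached by shared caches
    res.setHeader("Cache-Control", "private, no-store");
  }
  res.end("ok");
}).listen(8080);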
CDN Cache Strategies
Geographic Distribution
Origin Server (US East)
↓
CDN Edge Locations:
- US West: Serves West Coast users
- Europe: Serves European users
- Asia: Serves Asian users
- Australia: Serves Australian users
Real-World Example
A news website serves users globally:
- Images and CSS: Cached at CDN edges for 24 hours
- Breaking news API: Cached for 5 minutes
- User-specific content: Not cached
- Video content: Cached indefinitely until updated
8. Rate Limiting: Protecting System Resources
Purpose
Rate limiting controls the number of requests a client can make within a specific time window to prevent abuse and ensure fair resource allocation.
Common Rate Limiting Algorithms
Token Bucket
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request arrives:
- If tokens available: Process request, remove token
- If no tokens: Reject request (429 status)
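A minimal in-memory token bucket along these lines (per-client state and distributed coordination are omitted for brevity):

class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity; // start full
  }

  tryConsume(): boolean {
    // Refill based on elapsed time, capped at bucket capacity
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    // If a token is available, consume it; otherwise reject (429)
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(100, 10); // capacity 100, refill 10/sec
console.log(bucket.tryConsume() ? "process request" : "reject with 429");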
Fixed Window
Window: 1 minute
Limit: 1000 requests
9:00:00 - 9:00:59: Count requests
If count > 1000: Reject subsequent requests
9:01:00: Reset counter
Implementation Levels
Application Level
@RateLimit(requests=100, window="1h", key="user_id")
function getUserProfile(userId) {
  // API logic
}
9. Request Routing in Microservices
Service Mesh Architecture
When a microservice has multiple instances, requests need intelligent routing to healthy instances.
Service Instance Discovery
Product Service Instances:
- product-service-1: 192.168.1.10:8080 (healthy)
- product-service-2: 192.168.1.11:8080 (healthy)
- product-service-3: 192.168.1.12:8080 (unhealthy)
Load Balancer routes only to healthy instances
Circuit Breaker Pattern
Prevents cascading failures when downstream services fail:
Order Service → Payment Service
Circuit States:
- Closed: Normal operation
- Open: Stop calling failing service
- Half-Open: Test if service recovered
Failure threshold: 5 failures in 1 minute
Recovery timeout: 30 seconds
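A simplified sketch of that three-state machine; the threshold and timeout mirror the figures above, though failure counting here is consecutive rather than windowed, a common simplification:

type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(private failureThreshold = 5, private recoveryTimeoutMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      // After the recovery timeout, allow one trial call (half-open)
      if (Date.now() - this.openedAt < this.recoveryTimeoutMs) {
        throw new Error("Circuit open: failing fast");
      }
      this.state = "half-open";
    }
    try {
      const result = await fn();
      this.state = "closed"; // success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"; // stop calling the failing service
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

const breaker = new CircuitBreaker();
// breaker.call(() => fetch("http://payment-service/charge").then((r) => r.json()));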
10. Queuing Systems: Asynchronous Processing
Purpose
Message queues enable asynchronous communication between services, improving system resilience and performance.
Queue Types
Point-to-Point (Queue)
Producer → [Message Queue] → Single Consumer
Example: Order processing system
Publish-Subscribe (Topic)
Publisher → [Topic] → Multiple Subscribers
Example: User registration event notification
Message Queue Patterns
Work Queue
Multiple workers pull tasks from a shared queue, each task going to exactly one worker:
[Image Upload] → Queue → [Worker 1: Resize images]
                       → [Worker 2: Generate thumbnails]
                       → [Worker 3: Extract metadata]
Fan-out
Single message triggers multiple parallel processes:
[Order Created] → Topic → [Inventory Service: Update stock]
                        → [Email Service: Send confirmation]
                        → [Analytics Service: Track metrics]
Queue Reliability Features
Message Persistence
Messages survive server restarts and failures.
Acknowledgments
1. Consumer receives message
2. Consumer processes message
3. Consumer sends ACK
4. Queue removes message
If no ACK received: Message returned to queue
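A toy in-memory queue showing this acknowledgment cycle (real brokers add persistence, redelivery delays, and dead-letter handling):

class AckQueue<T> {
  private messages: T[] = [];
  private inFlight = new Map<number, T>();
  private nextId = 0;

  publish(message: T) {
    this.messages.push(message);
  }

  // Consumer receives a message; it stays "in flight" until acked
  receive(): { id: number; message: T } | undefined {
    const message = this.messages.shift();
    if (message === undefined) return undefined;
    const id = this.nextId++;
    this.inFlight.set(id, message);
    return { id, message };
  }

  ack(id: number) {
    this.inFlight.delete(id); // queue removes the message for good
  }

  // No ACK received in time: return the message to the queue
  requeue(id: number) {
    const message = this.inFlight.get(id);
    if (message !== undefined) {
      this.inFlight.delete(id);
      this.messages.push(message);
    }
  }
}

const q = new AckQueue<string>();
q.publish("resize image 42");
const delivery = q.receive();
if (delivery) q.ack(delivery.id); // processing succeeded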
Queue Technologies
Message Brokers
- RabbitMQ: Feature-rich, supports multiple protocols
- Apache Kafka: High-throughput, distributed streaming
- Amazon SQS: Managed cloud queuing service
- Redis Pub/Sub: In-memory messaging
Real-World Example
Social media platform message flow:
1. User posts content → Main database
2. Post event → Message queue
3. Queue consumers:
- Content moderation service checks for policy violations
- Notification service alerts followers
- Analytics service tracks engagement metrics
- Search indexing service updates search database
11. API Gateways: Single Entry Point
Purpose
API Gateway acts as a single entry point for all client requests, providing cross-cutting concerns like authentication, rate limiting, and request routing.
Core Functions
Request Routing
Client Request → API Gateway → Route to appropriate microservice
/api/users/* → User Service
/api/products/* → Product Service
/api/orders/* → Order Service
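A bare-bones sketch of this prefix routing, assuming the gateway knows the internal service addresses (the hostnames are illustrative):

// Prefix routing table; addresses are assumptions
const routes: [prefix: string, upstream: string][] = [
  ["/api/users", "http://user-service:8080"],
  ["/api/products", "http://product-service:8080"],
  ["/api/orders", "http://order-service:8080"],
];

function resolveUpstream(path: string): string | undefined {
  const match = routes.find(([prefix]) => path.startsWith(prefix));
  return match && `${match[1]}${path}`;
}

console.log(resolveUpstream("/api/orders/123"));
// → http://order-service:8080/api/orders/123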
Protocol Translation
External: HTTPS/REST → API Gateway → Internal: HTTP/gRPC
External: WebSocket → API Gateway → Internal: Message Queue
Cross-Cutting Concerns
Authentication & Authorization
1. Client sends request with JWT token
2. API Gateway validates token
3. If valid: Forward to service with user context
4. If invalid: Return 401 Unauthorized
Aggregation
Combine multiple service calls into single response:
Client requests user dashboard:
API Gateway calls:
- User Service: Get profile
- Order Service: Get recent orders
- Notification Service: Get unread messages
Returns combined response
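Aggregation in sketch form: the gateway fans out three internal calls concurrently and merges the results into one payload (service URLs are assumptions):

async function getDashboard(userId: string) {
  // Fan out to three internal services concurrently
  const [profile, orders, notifications] = await Promise.all([
    fetch(`http://user-service:8080/users/${userId}`).then((r) => r.json()),
    fetch(`http://order-service:8080/orders?user=${userId}&recent=true`).then((r) => r.json()),
    fetch(`http://notification-service:8080/unread?user=${userId}`).then((r) => r.json()),
  ]);
  // One combined response replaces three client round trips
  return { profile, orders, notifications };
}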
Advanced Features
Caching
Cache Strategy:
- User profiles: 15 minutes
- Product catalogs: 1 hour
- Static content: 24 hours
Analytics & Monitoring
Track API usage, performance metrics, and error rates.
Version Management
/v1/api/users → User Service v1.0
/v2/api/users → User Service v2.0
Real-World Example
Mobile banking app API Gateway:
- Authentication: Validates user credentials and device tokens
- Rate Limiting: Prevents brute-force attacks (10 login attempts/hour)
- Request Routing: Routes account requests to the Banking Service, payment requests to the Payment Service
- Response Caching: Caches account balance for 30 seconds
- Logging: Records all transactions for audit purposes
12. Caching with Redis: Speed Through Memory
Caching Strategy Levels
Browser Cache
Client-side caching of static resources and API responses.
CDN Cache
Geographic distribution of static content.
Application Cache
In-memory caching within application servers.
Database Cache
Redis/Memcached for frequently accessed data.
Redis as Distributed Cache
Cache-Aside Pattern
function getUserProfile(userId) {
  // Check cache first
  cached = redis.get("user:" + userId)
  if (cached) return cached
  // Cache miss - get from database
  user = database.getUser(userId)
  // Store in cache for future requests
  redis.set("user:" + userId, user, expiry=3600)
  return user
}
Write-Through Pattern
function updateUserProfile(userId, data) {
  // Update database
  database.updateUser(userId, data)
  // Update cache immediately
  redis.set("user:" + userId, data, expiry=3600)
}
Write-Behind Pattern
function updateUserProfile(userId, data) {
  // Update cache immediately
  redis.set("user:" + userId, data, expiry=3600)
  // Queue database update for later
  queue.add("update_user", {userId, data})
}
Real-World Example
Global streaming service architecture:
- Auto-scaling: Video encoding services scale based on upload queue length
- Multi-region: Content cached in 50+ regions worldwide
- Service mesh: 300+ microservices with automatic discovery
- Health monitoring: Services automatically restart on failures
- Database replication: User data replicated across 3 regions
This guide provides a foundation for understanding modern system design. Each component serves specific purposes, and the art lies in combining them effectively based on your unique requirements, constraints, and growth trajectory.