System design is the art of building scalable, reliable, and efficient distributed systems. This guide walks through the essential components and patterns that power modern applications, from simple client-server interactions to complex microservices architectures.
1. Client-Server Architecture: The Foundation
Basic Interaction Model
Every web application starts with the fundamental client-server model where clients (browsers, mobile apps) request services from servers.
[Client] ----HTTP Request----> [Server]
[Client] <---HTTP Response---- [Server]
How It Works
- Client initiates: User opens a web page or app
- Request formation: Client creates HTTP request with method, headers, and data
- Server processing: Server receives, processes, and generates response
- Response delivery: Server sends back data (HTML, JSON, images)
- Client rendering: Client displays or processes the received data
Real-World Example
When you search on an e-commerce site:
- Client sends: GET /search?query=laptop&category=electronics
- Server processes the search logic
- Server responds with product listings in JSON format
- Client renders the product cards on the webpage
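As a concrete sketch of this exchange, the snippet below issues the search request using the Fetch API (available in browsers and Node 18+). The endpoint URL and response shape are illustrative assumptions, not a real API:

// Hypothetical search endpoint; the response shape is assumed
interface Product {
  id: string;
  name: string;
  price: number;
}

async function searchProducts(query: string, category: string): Promise<Product[]> {
  // Client forms the HTTP request: method, URL, and headers
  const url = `https://shop.example.com/search?query=${encodeURIComponent(query)}&category=${encodeURIComponent(category)}`;
  const response = await fetch(url, { headers: { Accept: "application/json" } });
  if (!response.ok) throw new Error(`Search failed: ${response.status}`);
  // Server responds with product listings as JSON; client parses and renders them
  return response.json() as Promise<Product[]>;
}

searchProducts("laptop", "electronics").then((products) => console.log(products));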
2. DNS Lookup: The Internet's Address Book
What DNS Does
Domain Name System translates human-readable domain names into IP addresses that computers use to locate servers.
DNS Resolution Flow
User types: www.example.com
↓
Browser Cache Check
↓
OS Cache Check
↓
Router Cache Check
↓
ISP DNS Server
↓
Root DNS Server
↓
TLD DNS Server (.com)
↓
Authoritative DNS Server
↓
Returns IP: 203.0.113.10
DNS Record Types
- A Record: Maps domain to IPv4 address
- AAAA Record: Maps domain to IPv6 address
- CNAME: Creates domain aliases
- MX: Mail server routing
- TXT: Text-based verification and configuration
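To see these record types in practice, Node's built-in dns module can query them directly. A minimal sketch (the domain is a placeholder):

import { resolve4, resolveMx, resolveTxt } from "node:dns/promises";

async function inspectDomain(domain: string) {
  // A records: IPv4 addresses for the domain
  const addresses = await resolve4(domain);
  // MX records: mail servers with routing priority
  const mailServers = await resolveMx(domain);
  // TXT records: verification strings and configuration
  const txtRecords = await resolveTxt(domain);
  console.log({ addresses, mailServers, txtRecords });
}

inspectDomain("example.com");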
Performance Impact
DNS resolution typically takes 20-120ms. This is why DNS caching at multiple levels is crucial for performance.
3. Vertical Scaling: Growing Upward
Concept
Vertical scaling means increasing the power of your existing server by adding more CPU, RAM, or storage to handle increased load.
Implementation Approach
Before: 4 CPU cores, 8GB RAM, 100GB storage
After: 16 CPU cores, 64GB RAM, 1TB storage
When to Use Vertical Scaling
- Simple applications: Single-server setups
- Database servers: When ACID properties are critical
- Legacy systems: Applications not designed for distribution
- Quick fixes: Immediate performance boost needed
Scaling Limits
Most cloud providers offer instances up to:
- 128+ CPU cores
- 2-4 TB of RAM
- Multiple TB of storage
Real-World Example
A growing startup's database server handling 1,000 concurrent users starts experiencing slow queries. Instead of redesigning the architecture, they upgrade from an 8-core/16GB instance to a 32-core/128GB instance, immediately improving performance.
4. Horizontal Scaling: Growing Outward
Concept
Horizontal scaling involves adding more servers to distribute the load across multiple machines instead of making one machine more powerful.
Architecture Pattern
Single Server:
[All Traffic] → [Single Server]
Horizontal Scaling:
[Traffic] → [Load Balancer] → [Server 1]
                            → [Server 2]
                            → [Server 3]
Implementation Considerations
- Stateless design: Servers shouldn't store user session data locally
- Data synchronization: Shared databases or distributed data stores
- Load distribution: Traffic routing across multiple instances
When to Choose Horizontal Scaling
- High availability requirements: No single point of failure
- Cost efficiency: Commodity hardware is cheaper than high-end servers
- Elastic scaling potential: Add servers incrementally as demand grows
- Geographic distribution: Servers in multiple regions
Example Scenario
A video streaming service starts with one server handling 10,000 users. As they grow to 100,000 users, they deploy 10 identical servers behind a load balancer, each handling 10,000 users.
5. Load Balancers: Traffic Distribution Intelligence
Purpose
Load balancers distribute incoming requests across multiple backend servers to ensure no single server becomes overwhelmed.
Load Balancing Algorithms
Round Robin
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (cycle repeats)
Weighted Round Robin
Server A (weight 3): receives 3 of every 6 requests
Server B (weight 2): receives 2 of every 6 requests
Server C (weight 1): receives 1 of every 6 requests
Least Connections
Routes traffic to the server with the fewest active connections.
IP Hash
Uses client IP to consistently route to the same server (useful for session affinity).
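A compact sketch of two of these algorithms, round robin and IP hash, assuming an in-memory list of backend addresses:

class LoadBalancer {
  private nextIndex = 0;
  constructor(private servers: string[]) {}

  // Round robin: cycle through servers in order
  roundRobin(): string {
    const server = this.servers[this.nextIndex];
    this.nextIndex = (this.nextIndex + 1) % this.servers.length;
    return server;
  }

  // IP hash: the same client IP always maps to the same server
  ipHash(clientIp: string): string {
    let hash = 0;
    for (const char of clientIp) {
      hash = (hash * 31 + char.charCodeAt(0)) >>> 0; // simple string hash
    }
    return this.servers[hash % this.servers.length];
  }
}

const lb = new LoadBalancer(["serverA", "serverB", "serverC"]);
console.log(lb.roundRobin()); // serverA
console.log(lb.roundRobin()); // serverB
console.log(lb.ipHash("203.0.113.7")); // stable choice for this client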
Types of Load Balancers
Layer 4 (Transport Layer)
- Routes based on IP and port
- Fast, low latency
- Protocol agnostic
Layer 7 (Application Layer)
- Routes based on HTTP headers, URLs, cookies
- Content-based routing
- SSL termination capabilities
Health Checks
Load balancers continuously monitor server health:
Every 30 seconds:
  Send GET /health to each server
  If response != 200 OK:
    Remove server from rotation
  If server recovers:
    Add back to rotation
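In sketch form, assuming each backend exposes a /health endpoint (addresses are illustrative):

const servers = ["http://192.168.1.10:8080", "http://192.168.1.11:8080"];
const healthy = new Set(servers); // start by assuming all are healthy

async function checkHealth() {
  for (const server of servers) {
    try {
      const res = await fetch(`${server}/health`);
      // 200 OK: (re)add to rotation; anything else: remove
      res.ok ? healthy.add(server) : healthy.delete(server);
    } catch {
      healthy.delete(server); // unreachable servers leave the rotation too
    }
  }
}

setInterval(checkHealth, 30_000); // probe every 30 seconds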
Real-World Example
An e-commerce site uses a load balancer to route:
- /api/products/* → Product service servers
- /api/orders/* → Order service servers
- /api/users/* → User service servers
- /static/* → CDN or static file servers
6. Microservices: Divide and Conquer
Architecture Philosophy
Microservices break down large applications into small, independent services that communicate over well-defined APIs.
Monolith vs Microservices
Monolithic:
[Web UI + Business Logic + Database] - Single Deployment
Microservices:
[User Service]   [Product Service]   [Order Service]   [Payment Service]
      ↓                 ↓                   ↓                  ↓
  [User DB]       [Product DB]         [Order DB]        [Payment DB]
Service Communication Patterns
Synchronous Communication
Order Service → HTTP Call → Payment Service
Order Service ← Response ← Payment Service
Asynchronous Communication
Order Service → Message Queue → Payment Service
Order Service continues processing...
Payment Service processes when ready
Service Discovery
Services need to find and communicate with each other dynamically:
Service Registry:
- user-service: 192.168.1.10:8080
- product-service: 192.168.1.11:8080
- order-service: 192.168.1.12:8080
Order Service needs User Service:
1. Query service registry for "user-service"
2. Get IP and port
3. Make HTTP call
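A toy registry illustrating this lookup flow (real systems use Consul, etcd, or Eureka rather than a plain map):

// In-memory stand-in for a service registry
const registry = new Map<string, string>([
  ["user-service", "192.168.1.10:8080"],
  ["product-service", "192.168.1.11:8080"],
  ["order-service", "192.168.1.12:8080"],
]);

async function callService(name: string, path: string): Promise<unknown> {
  // 1. Query the registry for the service's address
  const address = registry.get(name);
  if (!address) throw new Error(`Service not registered: ${name}`);
  // 2.-3. Use the IP and port to make the HTTP call
  const response = await fetch(`http://${address}${path}`);
  return response.json();
}

// Order Service looking up the User Service
callService("user-service", "/users/42").then(console.log);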
Data Management Patterns
Database per Service
Each microservice owns its data and database schema.
Shared Database Anti-pattern
Multiple services sharing the same database creates tight coupling.
Event Sourcing
Services publish events when data changes, other services react to these events.
Real-World Example
Netflix's architecture includes hundreds of microservices:
- User Profile Service: Manages user accounts and preferences
- Recommendation Service: Generates personalized content suggestions
- Video Encoding Service: Processes and converts video files
- Billing Service: Handles subscriptions and payments
- Playback Service: Streams video content to devices
7. Content Delivery Networks (CDN): Global Performance
Purpose
CDNs cache and serve static content from geographically distributed servers to reduce latency and server load.
How CDNs Work
User in Tokyo requests image from US server:
Without CDN: Tokyo → US Server (200ms latency)
With CDN: Tokyo → Tokyo CDN Edge (20ms latency)
If not cached: Tokyo CDN → US Server → Cache → User
Content Types for CDN
- Static assets: Images, CSS, JavaScript files
- Video content: Streaming media files
- API responses: Cacheable GET requests
- Dynamic content: With proper cache headers (see the sketch below)
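That last point is worth an example: a CDN only caches dynamic responses that the origin explicitly marks cacheable. A minimal sketch using Node's built-in http module, with illustrative paths and durations:

import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url?.startsWith("/static/")) {
    // Static assets: edge servers may cache for 24 hours
    res.setHeader("Cache-Control", "public, max-age=86400");
  } else if (req.url?.startsWith("/api/news")) {
    // Cacheable API response: a short TTL keeps content fresh
    res.setHeader("Cache-Control", "public, max-age=300");
  } else {
    // User-specific content: never cached by shared caches
    res.setHeader("Cache-Control", "private, no-store");
  }
  res.end("ok");
}).listen(8080);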
CDN Cache Strategies
Geographic Distribution
Origin Server (US East)
↓
CDN Edge Locations:
- US West: Serves West Coast users
- Europe: Serves European users
- Asia: Serves Asian users
- Australia: Serves Australian users
Real-World Example
A news website serves users globally:
- Images and CSS: Cached at CDN edges for 24 hours
- Breaking news API: Cached for 5 minutes
- User-specific content: Not cached
- Video content: Cached indefinitely until updated
8. Rate Limiting: Protecting System Resources
Purpose
Rate limiting controls the number of requests a client can make within a specific time window to prevent abuse and ensure fair resource allocation.
Common Rate Limiting Algorithms
Token Bucket
Bucket capacity: 100 tokens
Refill rate: 10 tokens/second
Request arrives:
- If tokens available: Process request, remove token
- If no tokens: Reject request (429 status)
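A minimal in-memory token bucket along these lines (per-client state and distributed coordination are omitted for brevity):

class TokenBucket {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(private capacity: number, private refillPerSecond: number) {
    this.tokens = capacity; // start full
  }

  tryConsume(): boolean {
    // Refill based on elapsed time, capped at bucket capacity
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerSecond);
    this.lastRefill = now;
    // If a token is available, consume it; otherwise reject (429)
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

const bucket = new TokenBucket(100, 10); // capacity 100, refill 10/sec
console.log(bucket.tryConsume() ? "process request" : "reject with 429");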
Fixed Window
Window: 1 minute
Limit: 1000 requests
9:00:00 - 9:00:59: Count requests
If count > 1000: Reject subsequent requests
9:01:00: Reset counter
Implementation Levels
Application Level
@RateLimit(requests=100, window="1h", key="user_id")
function getUserProfile(userId) {
  // API logic
}
9. Request Routing in Microservices
Service Mesh Architecture
When a microservice has multiple instances, requests need intelligent routing to healthy instances.
Service Instance Discovery
Product Service Instances:
- product-service-1: 192.168.1.10:8080 (healthy)
- product-service-2: 192.168.1.11:8080 (healthy)
- product-service-3: 192.168.1.12:8080 (unhealthy)
Load Balancer routes only to healthy instances
Circuit Breaker Pattern
Prevents cascading failures when downstream services fail:
Order Service → Payment Service
Circuit States:
- Closed: Normal operation
- Open: Stop calling failing service
- Half-Open: Test if service recovered
Failure threshold: 5 failures in 1 minute
Recovery timeout: 30 seconds
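A simplified sketch of that three-state machine; the threshold and timeout mirror the figures above, though failure counting here is consecutive rather than windowed, a common simplification:

type State = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: State = "closed";
  private failures = 0;
  private openedAt = 0;

  constructor(private failureThreshold = 5, private recoveryTimeoutMs = 30_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      // After the recovery timeout, allow one trial call (half-open)
      if (Date.now() - this.openedAt < this.recoveryTimeoutMs) {
        throw new Error("Circuit open: failing fast");
      }
      this.state = "half-open";
    }
    try {
      const result = await fn();
      this.state = "closed"; // success closes the circuit
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures++;
      if (this.state === "half-open" || this.failures >= this.failureThreshold) {
        this.state = "open"; // stop calling the failing service
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

const breaker = new CircuitBreaker();
// breaker.call(() => fetch("http://payment-service/charge").then((r) => r.json()));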
10. Queuing Systems: Asynchronous Processing
Purpose
Message queues enable asynchronous communication between services, improving system resilience and performance.
Queue Types
Point-to-Point (Queue)
Producer → [Message Queue] → Single Consumer
Example: Order processing system
Publish-Subscribe (Topic)
Publisher → [Topic] → Multiple Subscribers
Example: User registration event notification
Message Queue Patterns
Work Queue
Multiple workers pull tasks from a shared queue, each task going to exactly one worker:
[Image Upload] → Queue → [Worker 1: Resize images]
                       → [Worker 2: Generate thumbnails]
                       → [Worker 3: Extract metadata]
Fan-out
Single message triggers multiple parallel processes:
[Order Created] → Topic → [Inventory Service: Update stock]
                        → [Email Service: Send confirmation]
                        → [Analytics Service: Track metrics]
Queue Reliability Features
Message Persistence
Messages survive server restarts and failures.
Acknowledgments
1. Consumer receives message
2. Consumer processes message
3. Consumer sends ACK
4. Queue removes message
If no ACK received: Message returned to queue
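A toy in-memory queue showing this acknowledgment cycle (real brokers add persistence, redelivery delays, and dead-letter handling):

class AckQueue<T> {
  private messages: T[] = [];
  private inFlight = new Map<number, T>();
  private nextId = 0;

  publish(message: T) {
    this.messages.push(message);
  }

  // Consumer receives a message; it stays "in flight" until acked
  receive(): { id: number; message: T } | undefined {
    const message = this.messages.shift();
    if (message === undefined) return undefined;
    const id = this.nextId++;
    this.inFlight.set(id, message);
    return { id, message };
  }

  ack(id: number) {
    this.inFlight.delete(id); // queue removes the message for good
  }

  // No ACK received in time: return the message to the queue
  requeue(id: number) {
    const message = this.inFlight.get(id);
    if (message !== undefined) {
      this.inFlight.delete(id);
      this.messages.push(message);
    }
  }
}

const q = new AckQueue<string>();
q.publish("resize image 42");
const delivery = q.receive();
if (delivery) q.ack(delivery.id); // processing succeeded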
Queue Technologies
Message Brokers
- RabbitMQ: Feature-rich, supports multiple protocols
- Apache Kafka: High-throughput, distributed streaming
- Amazon SQS: Managed cloud queuing service
- Redis Pub/Sub: In-memory messaging
Real-World Example
Social media platform message flow:
1. User posts content → Main database
2. Post event → Message queue
3. Queue consumers:
- Content moderation service checks for policy violations
- Notification service alerts followers
- Analytics service tracks engagement metrics
- Search indexing service updates search database
11. API Gateways: Single Entry Point
Purpose
API Gateway acts as a single entry point for all client requests, providing cross-cutting concerns like authentication, rate limiting, and request routing.
Core Functions
Request Routing
Client Request → API Gateway → Route to appropriate microservice
/api/users/* → User Service
/api/products/* → Product Service
/api/orders/* → Order Service
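A bare-bones sketch of this prefix routing, assuming the gateway knows the internal service addresses (the hostnames are illustrative):

// Prefix routing table; addresses are assumptions
const routes: [prefix: string, upstream: string][] = [
  ["/api/users", "http://user-service:8080"],
  ["/api/products", "http://product-service:8080"],
  ["/api/orders", "http://order-service:8080"],
];

function resolveUpstream(path: string): string | undefined {
  const match = routes.find(([prefix]) => path.startsWith(prefix));
  return match && `${match[1]}${path}`;
}

console.log(resolveUpstream("/api/orders/123"));
// → http://order-service:8080/api/orders/123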
Protocol Translation
External: HTTPS/REST → API Gateway → Internal: HTTP/gRPC
External: WebSocket → API Gateway → Internal: Message Queue
Cross-Cutting Concerns
Authentication & Authorization
1. Client sends request with JWT token
2. API Gateway validates token
3. If valid: Forward to service with user context
4. If invalid: Return 401 Unauthorized
Aggregation
Combine multiple service calls into single response:
Client requests user dashboard:
API Gateway calls:
- User Service: Get profile
- Order Service: Get recent orders
- Notification Service: Get unread messages
Returns combined response
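Aggregation in sketch form: the gateway fans out three internal calls concurrently and merges the results into one payload (service URLs are assumptions):

async function getDashboard(userId: string) {
  // Fan out to three internal services concurrently
  const [profile, orders, notifications] = await Promise.all([
    fetch(`http://user-service:8080/users/${userId}`).then((r) => r.json()),
    fetch(`http://order-service:8080/orders?user=${userId}&recent=true`).then((r) => r.json()),
    fetch(`http://notification-service:8080/unread?user=${userId}`).then((r) => r.json()),
  ]);
  // One combined response replaces three client round trips
  return { profile, orders, notifications };
}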
Advanced Features
Caching
Cache Strategy:
- User profiles: 15 minutes
- Product catalogs: 1 hour
- Static content: 24 hours
Analytics & Monitoring
Track API usage, performance metrics, and error rates.
Version Management
/v1/api/users → User Service v1.0
/v2/api/users → User Service v2.0
Real-World Example
Mobile banking app API Gateway:
- Authentication: Validates user credentials and device tokens
- Rate Limiting: Prevents brute-force attacks (10 login attempts/hour)
- Request Routing: Routes account requests to the Banking Service, payment requests to the Payment Service
- Response Caching: Caches account balance for 30 seconds
- Logging: Records all transactions for audit purposes
12. Caching with Redis: Speed Through Memory
Caching Strategy Levels
Browser Cache
Client-side caching of static resources and API responses.
CDN Cache
Geographic distribution of static content.
Application Cache
In-memory caching within application servers.
Database Cache
Redis/Memcached for frequently accessed data.
Redis as Distributed Cache
Cache-Aside Pattern
function getUserProfile(userId) {
  // Check cache first
  cached = redis.get("user:" + userId)
  if (cached) return cached
  // Cache miss - get from database
  user = database.getUser(userId)
  // Store in cache for future requests
  redis.set("user:" + userId, user, expiry=3600)
  return user
}
Write-Through Pattern
function updateUserProfile(userId, data) {
  // Update database
  database.updateUser(userId, data)
  // Update cache immediately
  redis.set("user:" + userId, data, expiry=3600)
}
Write-Behind Pattern
function updateUserProfile(userId, data) {
  // Update cache immediately
  redis.set("user:" + userId, data, expiry=3600)
  // Queue database update for later
  queue.add("update_user", {userId, data})
}
Real-World Example
Global streaming service architecture:
- Auto-scaling: Video encoding services scale based on upload queue length
- Multi-region: Content cached in 50+ regions worldwide
- Service mesh: 300+ microservices with automatic discovery
- Health monitoring: Services automatically restart on failures
- Database replication: User data replicated across 3 regions
This guide provides a foundation for understanding modern system design. Each component serves specific purposes, and the art lies in combining them effectively based on your unique requirements, constraints, and growth trajectory.