System Design Basics
System design questions test your ability to architect scalable, reliable systems. This guide covers the fundamental concepts that come up most often in Java backend interviews.
Scalability Concepts
Q: Vertical vs horizontal scaling?
| | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Method | Bigger machine | More machines |
| Limit | Hardware ceiling | Near-unlimited |
| Downtime | Often requires restart | Add nodes live |
| Cost | Expensive at high end | Commodity hardware |
| Complexity | Low | Higher (distributed systems) |
Java microservices typically scale horizontally.
Q: Stateless vs stateful services?
Stateless: any server can handle any request
→ session in Redis/JWT, not server memory
→ easy to scale, load balance
Stateful: server holds session/state
→ sticky sessions or session replication
→ harder to scale
Design REST APIs to be stateless for scalability.
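To make the "any server can handle any request" property concrete, here is a minimal sketch of a signed token that two servers sharing a secret can both verify without any session in server memory. `SignedToken` and its token layout are hypothetical and only a simplified stand-in for a real JWT library; the sketch assumes user IDs contain no `.` characters.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.Optional;

/** Hypothetical signed-token helper (not a real JWT implementation). */
class SignedToken {
    private final byte[] secret;

    SignedToken(String sharedSecret) {
        this.secret = sharedSecret.getBytes(StandardCharsets.UTF_8);
    }

    /** Issues a token of the form "userId.expiresAtEpochSeconds.signature". */
    String issue(String userId, long expiresAtEpochSeconds) {
        String payload = userId + "." + expiresAtEpochSeconds;
        return payload + "." + sign(payload);
    }

    /** Returns the user id if the signature matches and the token has not expired. */
    Optional<String> verify(String token, long nowEpochSeconds) {
        String[] parts = token.split("\\.");
        if (parts.length != 3) return Optional.empty();
        String payload = parts[0] + "." + parts[1];
        byte[] expected = sign(payload).getBytes(StandardCharsets.UTF_8);
        byte[] actual = parts[2].getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison avoids leaking signature bytes through timing
        if (!MessageDigest.isEqual(expected, actual)) return Optional.empty();
        long expiresAt = Long.parseLong(parts[1]); // safe: payload was signed by us
        return nowEpochSeconds < expiresAt ? Optional.of(parts[0]) : Optional.empty();
    }

    private String sign(String payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] sig = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(sig);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

Because verification needs only the shared secret, a load balancer can send each request to any instance. In production code a maintained JWT library would normally replace this helper.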
Load Balancing
Client → Load Balancer → [Server 1, Server 2, Server 3]
| Algorithm | Description |
|---|---|
| Round Robin | Rotate through servers |
| Least Connections | Send to least busy server |
| Weighted | More traffic to powerful servers |
| IP Hash | Same client → same server (sticky) |
Common tools in Java deployments: Nginx, HAProxy, Spring Cloud LoadBalancer, Kubernetes Service.
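The selection algorithms in the table are small enough to sketch directly. The following is an illustrative in-memory version, not how Nginx or Spring Cloud LoadBalancer implement them; `Backend` and the three balancer classes are hypothetical names, and the sketch assumes Java 11 or later.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical backend handle; a real proxy would update activeConnections itself. */
class Backend {
    final String name;
    final AtomicInteger activeConnections = new AtomicInteger();
    Backend(String name) { this.name = name; }
}

class RoundRobinBalancer {
    private final List<Backend> backends;
    private final AtomicInteger next = new AtomicInteger();
    RoundRobinBalancer(List<Backend> backends) { this.backends = List.copyOf(backends); }

    Backend choose() {
        // floorMod keeps the index non-negative after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(i);
    }
}

class LeastConnectionsBalancer {
    private final List<Backend> backends;
    LeastConnectionsBalancer(List<Backend> backends) { this.backends = List.copyOf(backends); }

    Backend choose() {
        // Pick the backend currently serving the fewest connections
        return backends.stream()
                .min(Comparator.comparingInt((Backend b) -> b.activeConnections.get()))
                .orElseThrow();
    }
}

class IpHashBalancer {
    private final List<Backend> backends;
    IpHashBalancer(List<Backend> backends) { this.backends = List.copyOf(backends); }

    Backend choose(String clientIp) {
        // Same IP always hashes to the same backend while the backend list is unchanged
        return backends.get(Math.floorMod(clientIp.hashCode(), backends.size()));
    }
}
```

Note that IP hash only stays sticky while the server list is stable; adding or removing a backend remaps most clients.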
Caching Strategies
Q: Where to cache?
Browser → CDN → API Gateway → Application Cache (Redis / Caffeine) → Database
| Strategy | Description | Use case |
|---|---|---|
| Cache-aside | App checks cache, loads from DB on miss | General purpose |
| Read-through | Cache loads from DB automatically | Simplified app code |
| Write-through | Write to cache and DB synchronously | Strong consistency |
| Write-behind | Write to cache, async to DB | Write-heavy workloads |
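As an illustration of the cache-aside row, the sketch below checks a cache first, loads from the database on a miss, and invalidates on update. It is a simplified model: the `ConcurrentHashMap` stands in for Redis or Caffeine, and `CacheAsideRepository`, `dbLoader`, and `dbWriter` are hypothetical names for whatever data-access code the application already has.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Cache-aside sketch: the application, not the cache, talks to the database. */
class CacheAsideRepository<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>(); // stand-in for Redis/Caffeine
    private final Function<K, Optional<V>> dbLoader;           // stand-in for a SELECT
    private final BiConsumer<K, V> dbWriter;                   // stand-in for an UPDATE

    CacheAsideRepository(Function<K, Optional<V>> dbLoader, BiConsumer<K, V> dbWriter) {
        this.dbLoader = dbLoader;
        this.dbWriter = dbWriter;
    }

    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) return Optional.of(cached);   // cache hit
        Optional<V> loaded = dbLoader.apply(key);          // cache miss: read the DB
        loaded.ifPresent(v -> cache.put(key, v));          // populate for the next reader
        return loaded;
    }

    void update(K key, V value) {
        dbWriter.accept(key, value);  // write the source of truth first
        cache.remove(key);            // then invalidate; the next read reloads
    }
}
```

Invalidating instead of overwriting the cached value on update is a common choice because it avoids caching a value that a concurrent writer has already replaced, at the cost of one extra database read.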
Q: Cache invalidation strategies?
- TTL: expire after a fixed time (simple, but stale data is possible)
- Event-driven: invalidate on update (Kafka, Spring `@CacheEvict`)
- Version-based: include a version in the cache key
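The three invalidation strategies can be shown on one small in-memory cache. This is a sketch under assumptions: `TtlCache` and `versionedKey` are hypothetical names, the clock is injected so expiry can be tested, and `evict` plays the role that an update event or `@CacheEvict` would play in a real application.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

class TtlCache<V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAtMillis;
        Entry(T value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(String key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** TTL: an entry is treated as absent once its expiry time has passed. */
    Optional<V> get(String key) {
        Entry<V> e = entries.get(key);
        if (e == null) return Optional.empty();
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key, e); // lazily drop the expired entry
            return Optional.empty();
        }
        return Optional.of(e.value);
    }

    /** Event-driven: call this when an update event for the key arrives. */
    void evict(String key) {
        entries.remove(key);
    }

    /** Version-based: a new version yields a new key, so old entries are never read. */
    static String versionedKey(String entity, long id, long version) {
        return entity + ":" + id + ":v" + version;
    }
}
```

With version-based keys the old entries are not deleted explicitly; they simply stop being requested and age out through the TTL.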
Database Design
Q: SQL vs NoSQL — when to use each?
| | SQL (PostgreSQL) | NoSQL (MongoDB, Redis) |
|---|---|---|
| Schema | Fixed, relational | Flexible, document/key-value |
| Transactions | ACID | Eventual consistency (varies) |
| Scaling | Vertical + read replicas | Horizontal sharding |
| Best for | Complex queries, joins | High write throughput, flexible schema |
Q: Database sharding?
Split data across multiple databases by shard key:
user_id % 4 = 0 → Shard 0
user_id % 4 = 1 → Shard 1
user_id % 4 = 2 → Shard 2
user_id % 4 = 3 → Shard 3
Challenges: cross-shard queries, rebalancing, hot shards.
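The modulo routing above, and the rebalancing challenge it creates, can be checked with a few lines of code. `ShardRouter` and `movedKeys` are hypothetical names; the sketch assumes sequential numeric user IDs.

```java
import java.util.stream.LongStream;

class ShardRouter {
    private final int shardCount;

    ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    /** user_id % shardCount, using floorMod so negative IDs still map to a valid shard. */
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    /** Counts how many of the IDs 0..keyCount-1 change shard when the shard count changes. */
    static long movedKeys(long keyCount, int oldShards, int newShards) {
        ShardRouter before = new ShardRouter(oldShards);
        ShardRouter after = new ShardRouter(newShards);
        return LongStream.range(0, keyCount)
                .filter(id -> before.shardFor(id) != after.shardFor(id))
                .count();
    }
}
```

For example, `ShardRouter.movedKeys(1000, 4, 5)` reports that 800 of 1000 keys land on a different shard when a fifth shard is added, which is why plain modulo sharding makes rebalancing expensive.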
Message Queues
Q: When to use async messaging?
- Decouple services (order service → notification service)
- Absorb traffic spikes (buffer requests)
- Event-driven architecture (order created → inventory, shipping, analytics)
Order Service (sync, fast) → Kafka → [Inventory, Shipping, Analytics] (async, independent)
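The decoupling can be illustrated without a broker. The sketch below is an in-memory stand-in, not the Kafka client API: `InMemoryTopic` gives each subscriber its own queue (loosely analogous to one consumer group per service), and `OrderService.placeOrder` is a hypothetical method that returns as soon as the event is enqueued.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** In-memory stand-in for a topic; every subscriber sees every event. */
class InMemoryTopic<E> {
    private final Map<String, BlockingQueue<E>> queues = new ConcurrentHashMap<>();

    BlockingQueue<E> subscribe(String consumerName) {
        return queues.computeIfAbsent(consumerName, n -> new LinkedBlockingQueue<>());
    }

    /** Enqueues the event for each subscriber and returns without waiting for them. */
    void publish(E event) {
        queues.values().forEach(q -> q.add(event));
    }
}

class OrderService {
    private final InMemoryTopic<String> orderCreated;

    OrderService(InMemoryTopic<String> orderCreated) {
        this.orderCreated = orderCreated;
    }

    String placeOrder(String orderId) {
        // ... persist the order synchronously here ...
        orderCreated.publish(orderId); // downstream services consume at their own pace
        return "ACCEPTED:" + orderId;
    }
}
```

A slow analytics consumer only lets its own queue grow; it does not delay the order response or the other consumers, which is the buffering effect described in the list above.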
Common Design Questions
Q: Design a URL shortener.
Key components:
- API: `POST /shorten`, `GET /{code}` redirect
- Encoding: base62 of auto-increment ID or hash
- Storage: Redis (hot) + PostgreSQL (persistent)
- Cache: cache popular URLs in Redis
- Scale: stateless API servers behind a load balancer
Capacity estimate: 100M URLs × 500 bytes ≈ 50 GB of raw storage, before indexes and replication.
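The base62 encoding of an auto-increment ID could be sketched as follows. The class name `Base62` and the digit ordering (digits, then uppercase, then lowercase) are assumptions; any fixed 62-character alphabet works as long as encode and decode agree.

```java
class Base62 {
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    /** Converts a non-negative ID to its base62 short code. */
    static String encode(long id) {
        if (id < 0) throw new IllegalArgumentException("id must be non-negative");
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62))); // least significant digit first
            id /= 62;
        }
        return sb.reverse().toString();
    }

    /** Converts a short code back to the numeric ID used as the database key. */
    static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("invalid character: " + c);
            id = id * 62 + digit;
        }
        return id;
    }
}
```

Since 62^5 = 916,132,832, every ID up to the 100M URLs in the estimate fits in five characters. Sequential IDs do produce guessable codes, which is one reason the hash-based alternative is sometimes preferred.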
Q: Design a rate limiter.
Approaches:
- Token bucket — refill tokens at fixed rate
- Sliding window — count requests in time window
- Fixed window — count per time interval
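// Redis-based fixed-window counter: the key changes once per window,
// so this is the fixed-window approach rather than a true sliding window
String key = "rate:" + userId + ":" + (now / windowSize);
Long count = redis.incr(key);
// Set the TTL on the first hit so old window keys expire on their own
if (count == 1) redis.expire(key, windowSize);
if (count > maxRequests) throw new RateLimitExceededException();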
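For comparison with the Redis counter above, the token bucket and sliding window approaches could look roughly like this in plain, single-process Java. `TokenBucket` and `SlidingWindowLimiter` are hypothetical classes; the current time is passed in explicitly so the behavior is easy to test, and a distributed limiter would need shared state such as Redis instead of instance fields.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Token bucket: allows bursts up to capacity, then a steady refill rate. */
class TokenBucket {
    private final long capacity;
    private final double refillPerMilli;
    private double tokens;
    private long lastRefillMillis;

    TokenBucket(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerMilli = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full
        this.lastRefillMillis = nowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Credit tokens for the elapsed time, capped at capacity
        tokens = Math.min(capacity, tokens + (nowMillis - lastRefillMillis) * refillPerMilli);
        lastRefillMillis = nowMillis;
        if (tokens < 1.0) return false;
        tokens -= 1.0;
        return true;
    }
}

/** Sliding window log: keeps one timestamp per accepted request. */
class SlidingWindowLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowLimiter(int maxRequests, long windowMillis) {
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    synchronized boolean tryAcquire(long nowMillis) {
        // Drop timestamps that have slid out of the window (now - window, now]
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= maxRequests) return false;
        timestamps.addLast(nowMillis);
        return true;
    }
}
```

The sliding window log is exact but stores one entry per request, whereas the token bucket needs only two numbers per client; that memory difference is a typical trade-off to mention in an interview.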
Q: Design a notification system.
Event → Kafka → Notification Service → [Email, SMS, Push]
Inside the Notification Service: Template Engine → Delivery Queue (with retry) → Provider APIs (SendGrid, Twilio, FCM)
Key concerns: idempotency, retry with backoff, user preferences, delivery tracking.
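Two of these concerns, idempotency and retry with backoff, fit in a short sketch. Everything here is illustrative: `NotificationSender`, the idempotency key format, and the injected `providerCall` and `sleeper` are hypothetical, and a real system would keep the delivered-key set in a shared store rather than in memory.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BooleanSupplier;
import java.util.function.LongConsumer;

class NotificationSender {
    private final Set<String> delivered = ConcurrentHashMap.newKeySet();
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final long maxDelayMillis;

    NotificationSender(int maxAttempts, long baseDelayMillis, long maxDelayMillis) {
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMillis. */
    static long backoffMillis(long baseDelayMillis, int attempt, long maxDelayMillis) {
        long delay = baseDelayMillis << Math.min(attempt, 20); // bound the shift
        return Math.min(delay, maxDelayMillis);
    }

    /**
     * providerCall returns true when the provider accepted the message.
     * sleeper receives the delay in milliseconds (e.g. a Thread.sleep wrapper).
     */
    boolean send(String idempotencyKey, BooleanSupplier providerCall, LongConsumer sleeper) {
        if (delivered.contains(idempotencyKey)) return true; // duplicate event: skip
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (providerCall.getAsBoolean()) {
                delivered.add(idempotencyKey);
                return true;
            }
            if (attempt < maxAttempts - 1) {
                sleeper.accept(backoffMillis(baseDelayMillis, attempt, maxDelayMillis));
            }
        }
        return false; // caller can route the message to a dead-letter queue
    }
}
```

Production implementations usually add random jitter to the delay so that many failed deliveries do not retry at the same instant.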
CAP Theorem
During a network partition, a distributed system must choose between:
- Consistency — all nodes see same data
- Availability — every request gets a response
Examples common in Java backends:
- CP — ZooKeeper, etcd (consistent but may reject requests)
- AP — Cassandra, DynamoDB (available but may return stale data)
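Many web applications choose AP with eventual consistency, and reserve CP behavior for data that cannot tolerate stale reads, such as payments or inventory counts.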
Interview Framework
Structure your answer:
- Requirements — functional + non-functional (QPS, latency, storage)
- Estimation — back-of-envelope calculations
- High-level design — boxes and arrows
- Deep dive — database schema, API design, key algorithms
- Bottlenecks — identify and address scaling limits
- Trade-offs — explain choices and alternatives
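The estimation step is mostly arithmetic, and a small helper makes the formulas explicit. `Estimate` and the peak factor are hypothetical; the storage figures reuse the 100M URLs × 500 bytes example from the URL shortener question, and the request volumes in the usage note are made-up inputs.

```java
class Estimate {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Average requests per second for a given daily volume. */
    static double averageQps(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    /** Peak QPS, assuming peak traffic is peakFactor times the average. */
    static double peakQps(long requestsPerDay, double peakFactor) {
        return averageQps(requestsPerDay) * peakFactor;
    }

    /** Raw storage in decimal gigabytes (1 GB = 10^9 bytes). */
    static double storageGb(long records, long bytesPerRecord) {
        return records * (double) bytesPerRecord / 1e9;
    }
}
```

For instance, `Estimate.storageGb(100_000_000L, 500L)` gives 50.0, and an assumed 86.4M requests per day works out to an average of 1,000 QPS, or 3,000 QPS at an assumed peak factor of 3.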