System design questions test your ability to architect scalable, reliable systems. This guide covers fundamental concepts frequently asked in Java backend interviews.

Scalability Concepts

Q: Vertical vs horizontal scaling?

| Aspect | Vertical (Scale Up) | Horizontal (Scale Out) |
|---|---|---|
| Method | Bigger machine | More machines |
| Limit | Hardware ceiling | Near-unlimited |
| Downtime | Often requires restart | Add nodes live |
| Cost | Expensive at high end | Commodity hardware |
| Complexity | Low | Higher (distributed systems) |

Java microservices typically scale horizontally.

Q: What is the difference between stateless and stateful services?

Stateless: any server can handle any request
  → session state lives in Redis or a JWT, not in server memory
  → easy to scale and load balance

Stateful: the server holds session state
  → requires sticky sessions or session replication
  → harder to scale
  

Design REST APIs to be stateless for scalability.

Load Balancing

  Client → Load Balancer → [Server 1, Server 2, Server 3]
  
| Algorithm | Description |
|---|---|
| Round Robin | Rotate through servers |
| Least Connections | Send to least busy server |
| Weighted | More traffic to powerful servers |
| IP Hash | Same client → same server (sticky) |

Common tools in Java deployments: Nginx and HAProxy (external proxies), Spring Cloud LoadBalancer (client-side, in-process), Kubernetes Service.
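To make two of the algorithms above concrete, here is a minimal in-process sketch of round-robin and IP-hash selection. The class names `RoundRobinBalancer` and `IpHashBalancer` are illustrative assumptions, not part of any library, and the sketch ignores health checks and servers joining or leaving.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin selection over a fixed list of backend addresses (hypothetical helper). */
final class RoundRobinBalancer {
    private final List<String> servers;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    /** Returns the next server in rotation; safe to call from multiple threads. */
    String next() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), servers.size());
        return servers.get(index);
    }
}

/** IP-hash selection: the same client address always maps to the same server. */
final class IpHashBalancer {
    private final List<String> servers;

    IpHashBalancer(List<String> servers) {
        if (servers.isEmpty()) throw new IllegalArgumentException("no servers");
        this.servers = List.copyOf(servers);
    }

    String pick(String clientIp) {
        // String.hashCode can be negative, so floorMod is used instead of %.
        return servers.get(Math.floorMod(clientIp.hashCode(), servers.size()));
    }
}
```

With three servers, `next()` cycles s1, s2, s3, s1, while `pick` returns the same server for a given address as long as the server list does not change.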

Caching Strategies

Q: Where to cache?

  Browser → CDN → API Gateway → Application Cache → Database
                                    ↑
                              Redis / Caffeine
  
| Strategy | Description | Use case |
|---|---|---|
| Cache-aside | App checks cache, loads from DB on miss | General purpose |
| Read-through | Cache loads from DB automatically | Simplified app code |
| Write-through | Write to cache and DB synchronously | Strong consistency |
| Write-behind | Write to cache, async to DB | Write-heavy workloads |
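The cache-aside strategy is the one most often asked about, so a small sketch may help. It assumes an in-memory `ConcurrentHashMap` in place of Redis or Caffeine, and a `loader` function standing in for the database query; `CacheAsideRepository` is a hypothetical name.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Cache-aside: the application checks the cache first and loads from the source on a miss. */
final class CacheAsideRepository<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, Optional<V>> loader; // stands in for a database lookup

    CacheAsideRepository(Function<K, Optional<V>> loader) {
        this.loader = loader;
    }

    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            return Optional.of(cached);               // cache hit
        }
        Optional<V> loaded = loader.apply(key);       // cache miss: go to the source
        loaded.ifPresent(value -> cache.put(key, value));
        return loaded;
    }

    /** Call after the underlying record changes so the next read reloads it. */
    void evict(K key) {
        cache.remove(key);
    }
}
```

Two concurrent misses for the same key may both hit the loader here; that is usually acceptable for cache-aside, but it is a simplification worth mentioning in an interview.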

Q: Cache invalidation strategies?

  1. TTL — expire after time (simple, stale data possible)
  2. Event-driven — invalidate on update (Kafka, Spring @CacheEvict)
  3. Version-based — include version in cache key
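As a rough illustration of strategies 1 and 3, the sketch below shows a TTL cache with lazy expiry on read and a helper that builds version-based keys. `TtlCache`, `versionedKey`, and the injected clock are assumptions made for the example; production code would more likely rely on Redis key expiry or a Caffeine expiry policy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** TTL cache sketch: entries expire a fixed time after they are written. */
final class TtlCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // e.g. System::currentTimeMillis; injected so tests can control time

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or null if the key is absent or expired. */
    V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAtMillis) {
            entries.remove(key, entry); // lazy eviction on read
            return null;
        }
        return entry.value;
    }

    /** Version-based key: bumping the version makes old cache entries unreachable. */
    static String versionedKey(String entity, String id, long version) {
        return entity + ":" + id + ":v" + version;
    }
}
```

With version-based keys nothing is deleted explicitly; stale entries simply stop being read and age out through TTL or eviction.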

Database Design

Q: SQL vs NoSQL — when to use each?

| Aspect | SQL (PostgreSQL) | NoSQL (MongoDB, Redis) |
|---|---|---|
| Schema | Fixed, relational | Flexible, document/key-value |
| Transactions | ACID | Eventual consistency (varies) |
| Scaling | Vertical + read replicas | Horizontal sharding |
| Best for | Complex queries, joins | High write throughput, flexible schema |

Q: Database sharding?

Split data across multiple databases by shard key:

  user_id % 4 = 0 → Shard 0
  user_id % 4 = 1 → Shard 1
  user_id % 4 = 2 → Shard 2
  user_id % 4 = 3 → Shard 3
  

Challenges: cross-shard queries, rebalancing, hot shards.
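The modulo scheme above, and the rebalancing challenge it creates, can be sketched as follows. `ShardRouter` and `movedKeys` are hypothetical names; the second method simply counts how many IDs would change shard if the shard count changed.

```java
/** Modulo-based shard routing (illustrative sketch). */
final class ShardRouter {
    private ShardRouter() {}

    /** shard = user_id % shardCount; floorMod keeps the result non-negative. */
    static int shardFor(long userId, int shardCount) {
        return (int) Math.floorMod(userId, (long) shardCount);
    }

    /** Counts IDs in [0, idCount) that land on a different shard after resizing. */
    static long movedKeys(long idCount, int oldShards, int newShards) {
        long moved = 0;
        for (long id = 0; id < idCount; id++) {
            if (shardFor(id, oldShards) != shardFor(id, newShards)) {
                moved++;
            }
        }
        return moved;
    }
}
```

Going from 4 to 5 shards, `movedKeys(1000, 4, 5)` reports 800, meaning 80% of the rows would have to move. That cost is why schemes such as consistent hashing are often raised as an alternative to plain modulo.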

Message Queues

Q: When to use async messaging?

  • Decouple services (order service → notification service)
  • Absorb traffic spikes (buffer requests)
  • Event-driven architecture (order created → inventory, shipping, analytics)

  Order Service → Kafka → [Inventory, Shipping, Analytics]
  (sync, fast)            (async, independent)
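The fan-out in the diagram can be imitated in memory to show the decoupling: the publisher returns as soon as the event is queued, and each subscriber processes it on its own thread. This is only a teaching sketch with a hypothetical `InMemoryTopic` class; it is not the Kafka client API and offers no durability, partitioning, or replay.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** In-memory stand-in for a topic: every subscriber receives every event. */
final class InMemoryTopic<E> {
    private static final class Subscription<E> {
        // One thread per subscriber, so each one sees events in publish order.
        final ExecutorService worker = Executors.newSingleThreadExecutor();
        final Consumer<E> handler;

        Subscription(Consumer<E> handler) {
            this.handler = handler;
        }
    }

    private final List<Subscription<E>> subscriptions = new CopyOnWriteArrayList<>();

    void subscribe(Consumer<E> handler) {
        subscriptions.add(new Subscription<>(handler));
    }

    /** Queues the event for every subscriber and returns without waiting for them. */
    void publish(E event) {
        for (Subscription<E> s : subscriptions) {
            s.worker.execute(() -> s.handler.accept(event));
        }
    }

    /** Stops accepting events and waits for queued ones; returns false on timeout or interrupt. */
    boolean drain(long timeout, TimeUnit unit) {
        boolean allDone = true;
        try {
            for (Subscription<E> s : subscriptions) {
                s.worker.shutdown();
                allDone &= s.worker.awaitTermination(timeout, unit);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return allDone;
    }
}
```

A slow subscriber only delays its own queue, which is the property the "async, independent" label in the diagram refers to.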
  

Common Design Questions

Q: Design a URL shortener.

Key components:

  1. API: POST /shorten to create a code, GET /{code} to redirect
  2. Encoding: base62 of an auto-increment ID, or a hash of the URL
  3. Storage: PostgreSQL as the persistent store, Redis for hot entries
  4. Cache: serve popular URLs from Redis
  5. Scale: stateless API servers behind a load balancer

Capacity estimate: 100M URLs × 500 bytes per record = 50 GB of storage.
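The base62 encoding of an auto-increment ID mentioned above can be sketched as below. The alphabet order (digits, then uppercase, then lowercase) is an assumption; any fixed ordering of the 62 characters works as long as encode and decode agree.

```java
/** Base62 encoding of non-negative numeric IDs (illustrative sketch). */
final class Base62 {
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    private Base62() {}

    static String encode(long id) {
        if (id < 0) throw new IllegalArgumentException("id must be non-negative");
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62))); // least significant digit first
            id /= 62;
        }
        return sb.reverse().toString();
    }

    static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) throw new IllegalArgumentException("invalid character: " + c);
            id = id * 62 + digit;
        }
        return id;
    }
}
```

Since 62^5 is about 916 million, five characters are enough for the 100M URLs in the estimate, and seven characters (62^7, roughly 3.5 trillion) leave ample headroom.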

Q: Design a rate limiter.

Approaches:

  1. Token bucket — refill tokens at fixed rate
  2. Sliding window — count requests in time window
  3. Fixed window — count per time interval
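
  // Redis-based fixed-window counter (approach 3): the key embeds the window index,
  // so counts reset at each window boundary rather than sliding.
  // Assumes a Redis client exposing incr/expire (for example Jedis); now and windowSize are in seconds.
  String key = "rate:" + userId + ":" + (now / windowSize);  // one counter per user per window
  long count = redis.incr(key);                              // atomic; a missing key starts at 1
  if (count == 1) redis.expire(key, windowSize);             // first request in the window sets the TTL
  if (count > maxRequests) throw new RateLimitExceededException();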
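For comparison, approach 1 (token bucket) can be sketched in memory for a single JVM. `TokenBucket` and its constructor parameters are assumptions for the example; a distributed limiter would keep this state in a shared store such as Redis instead.

```java
import java.util.function.LongSupplier;

/** Single-process token bucket: tokens refill continuously up to a fixed capacity. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerSecond;
    private final LongSupplier nanoClock; // e.g. System::nanoTime; injected so tests can control time
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full, which allows an initial burst
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Takes one token if available; returns false when the caller should be rejected. */
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill);
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

A typical construction would be `new TokenBucket(10, 5.0, System::nanoTime)`, which permits bursts of 10 requests and a sustained rate of 5 per second. Unlike the fixed window, this does not allow a double burst at a window boundary.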
  

Q: Design a notification system.

  Event → Kafka → Notification Service → [Email, SMS, Push]
                      ↓
                 Template Engine
                      ↓
                 Delivery Queue (with retry)
                      ↓
                 Provider APIs (SendGrid, Twilio, FCM)
  

Key concerns: idempotency, retry with backoff, user preferences, delivery tracking.
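Two of these concerns, idempotency and retry with backoff, can be sketched together. Everything named here (`NotificationSender`, the `provider` predicate standing in for a provider API call, the injected `sleeper`) is a hypothetical illustration, and the sketch assumes a single consumer per notification ID; a real system would keep the delivered-ID set in a shared store.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;
import java.util.function.Predicate;

/** Sends a notification at most once per ID, retrying failures with exponential backoff. */
final class NotificationSender {
    private final Set<String> delivered = ConcurrentHashMap.newKeySet();
    private final Predicate<String> provider; // returns true when the provider accepted the payload
    private final int maxAttempts;
    private final long baseDelayMillis;
    private final long capMillis;
    private final LongConsumer sleeper;       // waits the given millis; injected so tests need not sleep

    NotificationSender(Predicate<String> provider, int maxAttempts,
                       long baseDelayMillis, long capMillis, LongConsumer sleeper) {
        this.provider = provider;
        this.maxAttempts = maxAttempts;
        this.baseDelayMillis = baseDelayMillis;
        this.capMillis = capMillis;
        this.sleeper = sleeper;
    }

    /** Delay after failed attempt number `attempt` (1-based): base, 2x, 4x, ... up to the cap. */
    static long backoffMillis(long baseDelayMillis, int attempt, long capMillis) {
        long delay = baseDelayMillis << Math.min(attempt - 1, 20); // bounded shift avoids overflow
        return Math.min(delay, capMillis);
    }

    boolean send(String notificationId, String payload) {
        if (delivered.contains(notificationId)) {
            return true; // duplicate event: already delivered, do nothing
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (provider.test(payload)) {
                delivered.add(notificationId);
                return true;
            }
            if (attempt < maxAttempts) {
                sleeper.accept(backoffMillis(baseDelayMillis, attempt, capMillis));
            }
        }
        return false; // retries exhausted; the caller might route this to a dead-letter queue
    }
}
```

Adding random jitter to each delay is a common refinement so that many failed deliveries do not retry at the same instant.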

CAP Theorem

During a network partition, a distributed system must choose between:

  • Consistency — all nodes see same data
  • Availability — every request gets a response

Examples commonly seen in Java stacks:

  • CP — ZooKeeper, etcd (consistent but may reject requests)
  • AP — Cassandra, DynamoDB (available but may return stale data)

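Many web applications choose AP with eventual consistency for user-facing data, and reserve CP stores for coordination tasks such as leader election and configuration.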

Interview Framework

Structure your answer:

  1. Requirements — functional + non-functional (QPS, latency, storage)
  2. Estimation — back-of-envelope calculations
  3. High-level design — boxes and arrows
  4. Deep dive — database schema, API design, key algorithms
  5. Bottlenecks — identify and address scaling limits
  6. Trade-offs — explain choices and alternatives