Good System Design — The Best Design Is Invisible

State management, DB design, caching, events, and failure recovery — the power of proven simplicity

System design principles based on Sean Goedecke's article.

Good Design Is Invisible

If software design is assembling code, system design is combining services — app servers, databases, caches, queues, event buses, proxies.

Good design gets reactions like "nothing special happened" and "that was easier than expected." Complex, eye-catching designs often hide fundamental problems or signal over-engineering. Start with the simplest possible structure and evolve incrementally.

State Is the Hardest Part

Stateless services (like PDF rendering) return results without storing anything. Services that write to a database manage state.

Minimize stateful components. Complexity and failure probability drop together. Let one service own state; others focus on stateless roles like API calls or event emission.

Database — Schema and Indexes

Human-readable schemas win. Overly flexible schemas (everything in JSON columns) burden application code. Index only columns hit by frequent queries — indexing everything adds write overhead.
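As a rough illustration of indexing only what frequent queries hit, the sketch below uses Python's built-in sqlite3 module to compare the query plan before and after adding one index. The `orders` table, its columns, and the index name are hypothetical, chosen only for this example.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical schema: plain, typed columns instead of one JSON blob.
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    )
""")

# Assume this is the query the application runs most often.
query = "SELECT id, total_cents FROM orders WHERE customer_id = ?"

# Without an index, SQLite has to scan the whole table.
plan_before = plan_for(conn, query, (42,))

# Index only the column the frequent query filters on.
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
plan_after = plan_for(conn, query, (42,))

# Exact wording varies by SQLite version: a SCAN before, an index SEARCH after.
print(plan_before)
print(plan_after)
```

Checking the plan before adding an index is a cheap way to confirm that the index is actually used, since every extra index is paid for on each write.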

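When the DB becomes a bottleneck, let the database do the work and use JOINs aggressively. A single JOIN is almost always faster than multiple queries assembled in the application. Watch for N+1 queries with ORMs, where loading a list quietly triggers one extra query per row.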
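The difference between the N+1 pattern and a JOIN can be sketched with sqlite3. The `authors` and `posts` tables and both function names are made up for this example; a real ORM would issue the per-row queries implicitly rather than in a visible loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO posts (id, author_id, title) VALUES
        (1, 1, 'Engines'), (2, 2, 'Compilers'), (3, 1, 'Notes');
""")


def titles_with_authors_n_plus_one(conn):
    """One query for the posts, then one more query per post (N+1)."""
    queries = 1
    result = []
    posts = conn.execute(
        "SELECT author_id, title FROM posts ORDER BY id"
    ).fetchall()
    for author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)
        ).fetchone()[0]
        queries += 1
        result.append((title, name))
    return result, queries


def titles_with_authors_join(conn):
    """The same data in a single round trip, assembled by the database."""
    rows = conn.execute("""
        SELECT posts.title, authors.name
        FROM posts
        JOIN authors ON authors.id = posts.author_id
        ORDER BY posts.id
    """).fetchall()
    return rows, 1


slow_rows, slow_queries = titles_with_authors_n_plus_one(conn)
fast_rows, fast_queries = titles_with_authors_join(conn)
```

Both functions return the same rows; the first grows by one query per post, while the second stays at one query regardless of table size.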

Distribute reads to replicas, keeping in mind that a replica can lag slightly behind the primary. Consider throttling writes during traffic spikes.

Separating Fast and Slow Work

User interactions need sub-second responses. For long tasks (PDF conversion, etc.), return a minimal response immediately and move the rest to background processing. A queue (such as Redis) plus a job runner is the standard setup.
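A minimal in-process sketch of the fast/slow split follows. It uses Python's `queue.Queue` and a thread as stand-ins for Redis and a separate job runner, which is an assumption made to keep the example self-contained; `render_pdf`, `handle_request`, and the `results` dictionary are hypothetical names.

```python
import queue
import threading
import uuid

jobs = queue.Queue()   # stand-in for a Redis-backed queue
results = {}           # stand-in for a job-status table: job_id -> record


def render_pdf(document):
    """Placeholder for the slow work."""
    return f"PDF({document})"


def handle_request(document):
    """Fast path: record the job, enqueue it, and respond immediately."""
    job_id = str(uuid.uuid4())
    results[job_id] = {"status": "queued", "output": None}
    jobs.put((job_id, document))
    return {"job_id": job_id, "status": "queued"}


def worker():
    """Slow path: pull jobs until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        job_id, document = item
        try:
            output = render_pdf(document)
            results[job_id] = {"status": "done", "output": output}
        except Exception as exc:
            # Record the failure so the client can see it when polling.
            results[job_id] = {"status": "failed", "output": str(exc)}
        finally:
            jobs.task_done()
```

A caller would start the worker with `threading.Thread(target=worker, daemon=True).start()`, return the `handle_request` response right away, and let the client look up the job status later by `job_id`.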

Far-future scheduled tasks work better in a DB table with a scheduler than in Redis.
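One way to picture the DB-table approach is shown below, again with sqlite3. The `scheduled_tasks` table, its columns, and the helper functions are assumptions for illustration; a production scheduler would also need locking so that two runners do not pick up the same row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE scheduled_tasks (
        id INTEGER PRIMARY KEY,
        run_at INTEGER NOT NULL,          -- Unix timestamp in seconds
        payload TEXT NOT NULL,
        done INTEGER NOT NULL DEFAULT 0
    )
""")
# The scheduler's polling query filters on (done, run_at), so index exactly that.
conn.execute("CREATE INDEX idx_tasks_due ON scheduled_tasks (done, run_at)")


def schedule(conn, run_at, payload):
    """Persist a task; it survives restarts because it lives in the DB."""
    conn.execute(
        "INSERT INTO scheduled_tasks (run_at, payload) VALUES (?, ?)",
        (run_at, payload),
    )
    conn.commit()


def run_due_tasks(conn, now, handler):
    """Run every pending task whose time has come; return handled payloads."""
    due = conn.execute(
        "SELECT id, payload FROM scheduled_tasks "
        "WHERE done = 0 AND run_at <= ? ORDER BY run_at",
        (now,),
    ).fetchall()
    handled = []
    for task_id, payload in due:
        handler(payload)
        conn.execute(
            "UPDATE scheduled_tasks SET done = 1 WHERE id = ?", (task_id,)
        )
        conn.commit()
        handled.append(payload)
    return handled
```

A scheduler process would call `run_due_tasks(conn, int(time.time()), handler)` on a fixed interval. Tasks due months from now simply sit in the table until then, and can be inspected or cancelled with ordinary SQL.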

Cache Cautiously

Caching reduces expensive repeated computations. But juniors want to cache everything, while experienced engineers grow increasingly cautious.

Caches introduce new state. Sync issues, stale data, invalidation bugs — all born from caches. Try query optimization (add indexes) first. Cache only when that's not enough.
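To make the "new state" concrete, here is a small time-to-live cache sketch. `TTLCache` and its methods are hypothetical names, not a library API. Note how much of the code exists only to decide when a stored value may no longer be trusted.

```python
import time


class TTLCache:
    """In-memory cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return a fresh cached value, or call compute() and store it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Must be called on every write path that changes the source data."""
        self._entries.pop(key, None)
```

Every code path that updates the underlying data has to remember to call `invalidate`, and until the TTL expires a missed call serves stale data. That bookkeeping is the cost to weigh against simply adding an index.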

Event Processing

Most companies have event hubs like Kafka. But event overuse makes tracing hard. Simple request-response APIs are better for logging and debugging.

Event-driven processing fits when senders don't care about receiver behavior, or in high-volume, latency-tolerant scenarios.

Push vs Pull

Pull is simpler but creates polling overhead. Push delivers changes immediately, which is efficient and keeps data fresh. Scaling to many clients requires infrastructure investment either way.

Focus on Hot Paths

Hot paths carry the most traffic. Design failures here impact the entire service. Invest design and testing resources in hot paths over minor features.

Logging and Observability

Log unhappy paths aggressively. Don't just watch averages — observe p95 and p99 latency. The slowest requests may represent your most important users.
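A short sketch of why averages hide the tail, using the standard library's `statistics.quantiles`. The `latency_summary` function and the sample data are invented for illustration.

```python
import statistics


def latency_summary(latencies_ms):
    """Summarize latencies with the mean plus p50, p95, and p99."""
    # n=100 yields 99 cut points; cuts[i] is the (i + 1)th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Hypothetical sample: 90 fast requests and 10 very slow ones.
sample = [20] * 90 + [2000] * 10
summary = latency_summary(sample)
print(summary)
```

In this sample the median stays at the fast value and the mean lands in between, while p95 and p99 sit at the slow value, which is the signal the average smooths away.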

Kill Switches and Failure Recovery

Blind retries just burden other services. Use circuit breakers and idempotency keys.
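The two techniques can be sketched roughly as follows. `CircuitBreaker`, `CircuitOpenError`, and `PaymentService` are hypothetical names, and the thresholds are arbitrary. Real implementations usually add per-endpoint state, metrics, and persistent storage for idempotency keys.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down has passed."""

    def __init__(self, failure_threshold=3, reset_after=30.0,
                 clock=time.monotonic):
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result


class PaymentService:
    """Receiver side: the same idempotency key never charges twice."""

    def __init__(self):
        self._responses = {}   # idempotency key -> stored response
        self.charges = []

    def charge(self, idempotency_key, amount_cents):
        if idempotency_key in self._responses:
            # A retry of a request already handled: replay the response.
            return self._responses[idempotency_key]
        self.charges.append(amount_cents)
        response = {"charge_id": len(self.charges),
                    "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response
```

The breaker protects the downstream service from a retry storm, and the idempotency key makes the retries that do get through safe to repeat.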

Choose between fail open (allow) and fail closed (block) per scenario. Rate limiting should fail open. Authentication must fail closed.
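The per-scenario choice can be written down as an explicit policy flag. The sketch below is illustrative only: `check_with_policy` and `allow_request` are hypothetical helpers, and each check is assumed to be a callable that returns a boolean or raises when its backing service is unavailable.

```python
def check_with_policy(check, *, fail_open):
    """Run check(); if the check itself errors, fall back to the policy."""
    try:
        return bool(check())
    except Exception:
        # The dependency is down: allow if fail-open, block if fail-closed.
        return fail_open


def allow_request(rate_limiter_check, auth_check):
    """Authentication fails closed; rate limiting fails open."""
    authenticated = check_with_policy(auth_check, fail_open=False)
    within_limit = check_with_policy(rate_limiter_check, fail_open=True)
    return authenticated and within_limit
```

With this shape, an outage of the rate limiter lets authenticated traffic through, while an outage of the auth service blocks every request, matching the rule stated above.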

Boring Design Survives in Production

Truly novel system designs are rare. Placing proven components in the right spots is the most reliable long-term strategy. Invisible design, safely combining proven methods — that's good system design.

How It Works

1. Minimize the number of stateful components
2. Keep DB schema readable, index only where needed
3. Move slow work to background processing
4. Cache as a last resort after query optimization
5. Focus design and testing resources on hot paths
6. Prepare for failures with circuit breakers + idempotency keys

Use Cases

Web service architecture: basic design combining app server + DB + cache + queue

Inter-service communication: choosing event-driven vs request-response

Failure resilience: fail open/closed, circuit breakers, idempotency keys