March 12, 2025
How I Would Design a Scalable System Today (vs 5 Years Ago)
Five years ago, if you'd asked me to design a URL shortener, I'd have had a working prototype in an
afternoon and considered the job done. A Flask app, a Postgres database, maybe a random.choices call to generate short
codes. Deploy it on a single server, call it shipped.
That confidence wasn't arrogance — it was inexperience wearing a convincing disguise.
The gap between "a system that works" and "a system that works under pressure" is where most of my real education happened. Not in documentation or design pattern books, but in watching a single database melt under load, debugging a race condition at 11 PM, and explaining to a product manager why a "simple" change caused a 40-minute outage. This post is about that gap — specifically, how I'd design a URL shortener today versus how I would have approached it five years ago, and why every decision change was paid for with a real lesson.
How I Would Have Designed It 5 Years Ago
The requirements seemed simple enough: take a long URL, return a short code, redirect users when they visit that short code. Maybe add click analytics later.
My early design would have looked roughly like this:
[Client] → [Flask/Django App] → [PostgreSQL DB]
One application server. One database with two tables: urls (id, short_code, original_url, created_at) and clicks (id, short_code, timestamp, ip_address). Short code generation was straightforward: pick 6 random alphanumeric characters, check whether the code already exists in the DB, and retry on collision. Expose two endpoints: POST /shorten and GET /{code}.
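As a minimal sketch of that generate-check-retry loop: the in-memory set below stands in for the urls table, and the function name, retry limit, and alphabet are assumptions made for illustration rather than details from the original prototype.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # 62 alphanumeric characters


def generate_short_code(existing_codes: set, length: int = 6, max_attempts: int = 10) -> str:
    """Pick random codes until one is not already taken."""
    for _ in range(max_attempts):
        code = "".join(random.choices(ALPHABET, k=length))
        if code not in existing_codes:  # stands in for a SELECT against the urls table
            existing_codes.add(code)    # stands in for the INSERT
            return code
    raise RuntimeError("could not find a free short code")


taken = set()
first = generate_short_code(taken)
second = generate_short_code(taken)
```

In a real database the check and the insert are two separate steps, so two concurrent requests can both pass the check with the same code. A unique constraint on short_code is what actually prevents duplicates there.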
And honestly? This design is fine. For a side project, a hackathon, an internal tool with 50 users —
it's completely appropriate. You can build it in a day, reason about it in five minutes, and debug
it with a single tail -f on the logs. There's real value in
that simplicity, and I'm not going to pretend otherwise.
The mistake wasn't building it this way. The mistake was not knowing when this design would stop working — and more importantly, not building in the observability to even know when that moment arrived.
Where That Design Breaks
Imagine the service gets featured on a popular tech newsletter. Within an hour, 50,000 users hit the redirect endpoint. Here's what happens to that simple design:
The redirect endpoint becomes a bottleneck. Every GET /{code} goes to Postgres. A single Postgres instance can handle maybe a few thousand reads per second under ideal conditions. With 50,000 users arriving in bursts rather than spread evenly across the hour, you're not in ideal conditions. Query latency climbs. The connection pool is exhausted. The app starts timing out. Users see errors.
The click tracking makes it worse. Every redirect also writes a row to the clicks table, so each user action is both a read and a write hitting the same database. In Postgres, plain inserts don't block reads outright, but they compete with them for the same connections, disk I/O, and WAL writes. Read queries queue up. The whole thing grinds.
There's no failure isolation. If the database goes down, the entire service goes down. If slow queries tie up the connection pool, every other request queues behind them. One abusive client can degrade the service for every user.
You have no idea any of this is happening. Your logs say "Redirected abc123" with no latency, no error rates, no queue depths. You find out the service is degraded when users tweet about it.
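The connection pool exhaustion described above can be estimated with Little's law: the average number of requests in flight equals the arrival rate times the time each request takes. The sketch below uses made-up traffic numbers and a hypothetical pool size of 20. None of these figures are measurements.

```python
def connections_in_use(requests_per_second: int, latency_ms: int) -> float:
    """Little's law: average requests in flight = arrival rate x time per request."""
    return requests_per_second * latency_ms / 1000


POOL_SIZE = 20  # hypothetical connection pool size

healthy = connections_in_use(500, 10)    # 500 req/s at 10 ms per request
degraded = connections_in_use(500, 200)  # same traffic, queries slowed to 200 ms

pool_ok_when_healthy = healthy <= POOL_SIZE
pool_ok_when_degraded = degraded <= POOL_SIZE
```

The traffic is identical in both cases. Only the per-request latency changed, which is why a slowdown in the database tends to turn into pool exhaustion rather than just slower responses.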
How I Would Design It Today
The core insight that changed everything: design for the read path and write path separately, because they have completely different scaling characteristics.
Here's the architecture I'd use today:
          ┌─────────────────┐
          │   CDN / Edge    │  ← Cache popular redirects at edge
          └────────┬────────┘
                   │
          ┌────────▼────────┐
          │   API Gateway   │  ← Rate limiting, auth, routing
          └──┬──────────┬───┘
             │          │
┌────────────▼──┐   ┌───▼────────────┐
│ Redirect Svc  │   │  Shorten Svc   │
└────────┬──────┘   └───┬────────────┘
         │              │
┌────────▼──┐    ┌──────▼──────┐
│   Redis   │    │ PostgreSQL  │
│  (cache)  │    │  (primary)  │
└───────────┘    └─────────────┘
         │
┌────────▼──────────┐
│   Message Queue   │  ← Async click tracking
│  (Redis Streams   │
│    or Kafka)      │
└────────┬──────────┘
         │
┌────────▼──────────┐
│ Analytics Worker  │  → ClickHouse / TimescaleDB
└───────────────────┘
Separate the Redirect Service from the Shorten Service
These have completely different traffic patterns. Redirects are read-heavy, latency-sensitive, and happen at massive scale. Shortening is write-heavy, less frequent, and can tolerate slightly higher latency. Separating them means I can scale the redirect service horizontally without deploying new shorten logic, and I can rate-limit shortening independently to prevent abuse. This isn't microservices for its own sake — it's recognizing that treating two fundamentally different workloads as one service is a scalability trap.
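Rate-limiting the shorten path independently could be done in several ways. One common technique is a token bucket, sketched here under assumptions: the TokenBucket class, its parameters, and the explicit now argument (used so the behavior is deterministic) are illustrative names, not part of any particular gateway product.

```python
import time
from typing import Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True and spend one token if the request is within the limit."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.updated_at
        # Refill for the time that has passed, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# One bucket per client for POST /shorten; redirects would use separate limits.
bucket = TokenBucket(rate=1, capacity=2, now=0.0)
results = [bucket.allow(0.0), bucket.allow(0.0), bucket.allow(0.0), bucket.allow(1.0)]
```

Because the limiter sits in front of the shorten service only, tightening it has no effect on redirect traffic.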
Redis Caching for the Redirect Path
The redirect path is a textbook cache use case: the data rarely changes, it's read far more than it's written, and latency matters enormously.
def redirect(short_code: str) -> str:
    # Assumes `redis` and `db` are already-configured clients and
    # NotFoundError is an application-level exception.
    # Check cache first
    cached_url = redis.get(f"url:{short_code}")
    if cached_url:
        return cached_url.decode()
    # Fall back to DB
    url = db.query("SELECT original_url FROM urls WHERE short_code = %s", short_code)
    if url:
        redis.setex(f"url:{short_code}", 86400, url)  # Cache for 24 hours
        return url
    raise NotFoundError(short_code)
With this, the vast majority of redirects — especially for popular links — never touch Postgres at all. Redis can handle hundreds of thousands of reads per second. The database becomes a durable source of truth, not a query engine under constant siege.
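The effect on the database is simple arithmetic: only cache misses fall through to Postgres. A small sketch with illustrative numbers (the traffic level and hit rates are assumptions, not measurements):

```python
def db_reads_per_second(redirects_per_second: int, cache_hit_percent: int) -> float:
    """Only cache misses fall through to the database."""
    return redirects_per_second * (100 - cache_hit_percent) / 100


at_90 = db_reads_per_second(10_000, 90)  # 10% of redirects miss the cache
at_99 = db_reads_per_second(10_000, 99)  # 1% of redirects miss the cache
```

Going from a 90% to a 99% hit rate cuts database reads by a factor of ten, which is why cache hit rate is worth tracking as a metric in its own right.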
Async Click Tracking
Instead of writing a click row synchronously during the redirect, publish an event to a queue and let a background worker process it.
import time
def redirect(short_code: str) -> str:
    url = get_url(short_code)  # From cache or DB
    # Non-blocking: publish to queue, don't wait
    # (`queue` is the message-queue client; `request` is the web framework's request object)
    queue.publish("clicks", {
        "short_code": short_code,
        "timestamp": time.time(),
        "ip": request.remote_addr,
        "user_agent": request.headers.get("User-Agent")
    })
    return url  # Respond immediately
The redirect now completes in single-digit milliseconds. The analytics worker processes the queue at its own pace — if it falls behind under a traffic spike, it simply catches up without blocking users.
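The consuming side might look like the sketch below. It uses Python's standard-library queue.Queue as a stand-in for Redis Streams or Kafka, and insert_rows is a hypothetical callback representing a bulk insert into the analytics store. The batch size is an arbitrary choice.

```python
import queue


def drain_batch(events: queue.Queue, max_batch: int = 500) -> list:
    """Pull up to max_batch events without blocking; returns [] if the queue is empty."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            break
    return batch


def run_worker_once(events: queue.Queue, insert_rows, max_batch: int = 500) -> int:
    """Process one batch and return how many events were written."""
    batch = drain_batch(events, max_batch)
    if batch:
        insert_rows(batch)  # one bulk write instead of one write per click
    return len(batch)


clicks = queue.Queue()
for i in range(5):
    clicks.put({"short_code": "a3kX9mZ", "seq": i})

stored = []
written = [run_worker_once(clicks, stored.extend, max_batch=2) for _ in range(4)]
```

If the worker falls behind, the backlog shows up as queue depth rather than as redirect latency.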
Short Code Generation Without DB Checks
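Early me generated a code and then checked the database for collisions, which cost a database round-trip on every shorten request. Today I'd use a distributed ID approach: no collision check, no round-trip, no lock contention. One caveat: the encoded ID has to be kept whole. A 64-bit Snowflake-style ID comes out at roughly 11 base62 characters, and slicing it down to 7 would throw away the low-order digits that make each ID unique.
import base62
import snowflake_id  # Placeholder name for whatever distributed ID generator you use
def generate_short_code() -> str:
    unique_id = snowflake_id.generate()  # Globally unique, time-ordered
    return base62.encode(unique_id)  # Keep the full encoding; truncating reintroduces collisions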
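For reference, base62 encoding itself needs no third-party package. The sketch below is one possible implementation (the alphabet order is an arbitrary choice). It also shows why a large numeric ID should not be cut to its first 7 characters: two adjacent IDs share the same leading digits and differ only at the end.

```python
import string

# Alphabet order is a choice: digits, then lowercase, then uppercase.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def base62_encode(number: int) -> str:
    """Encode a non-negative integer using the 62-character alphabet."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    number = 0
    for char in code:
        number = number * 62 + ALPHABET.index(char)
    return number


id_a = 2**62       # two adjacent 63-bit IDs
id_b = 2**62 + 1
code_a = base62_encode(id_a)
code_b = base62_encode(id_b)
truncation_collides = code_a[:7] == code_b[:7]
```

If 7-character codes are a hard requirement, the ID space has to fit in 62^7 (about 3.5 trillion) values, for example by encoding a sequence number instead of a full 64-bit ID.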
Observability Built In
Every service emits structured logs with request IDs, latency, and cache hit/miss status. I instrument three key metrics from day one: redirect latency (p50, p95, p99), cache hit rate, and queue depth. Alerts fire when p95 latency exceeds 200ms or queue depth grows beyond 10,000 unprocessed events. I know about problems before users do.
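A rough sketch of what that instrumentation could look like, using only the standard library. The field names, the nearest-rank percentile method, and the function names are assumptions made for illustration. The 200ms and 10,000-event thresholds come from the paragraph above.

```python
import json
import math


def log_line(request_id: str, short_code: str, latency_ms: float, cache_hit: bool) -> str:
    """One structured log entry per redirect, serialized as JSON."""
    return json.dumps({
        "event": "redirect",
        "request_id": request_id,
        "short_code": short_code,
        "latency_ms": latency_ms,
        "cache_hit": cache_hit,
    }, sort_keys=True)


def percentile(samples: list, pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def check_alerts(latencies_ms: list, queue_depth: int) -> list:
    """Return the names of the alerts that should fire."""
    alerts = []
    if percentile(latencies_ms, 95) > 200:
        alerts.append("p95_latency")
    if queue_depth > 10_000:
        alerts.append("queue_depth")
    return alerts


quiet = check_alerts([10] * 100, queue_depth=50)
noisy = check_alerts([10] * 90 + [500] * 10, queue_depth=20_000)
```

In practice a metrics system would compute the percentiles from histograms rather than raw samples, but the alerting logic has the same shape.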
Real-Life Scenario Walkthrough
Let's trace what happens when a user clicks a shared link — https://short.ly/a3kX9mZ.
Five years ago:
- Request hits the app server
- App queries Postgres: SELECT original_url FROM urls WHERE short_code = 'a3kX9mZ'
- App inserts a row into the clicks table
- App returns a 301 redirect
- Total time: 80–300ms. Under traffic spike: 2–10 seconds or timeout.
Today:
- Request hits the CDN edge — if recently popular, redirect served directly from CDN. ~5ms.
- If not cached at edge, request reaches the Redirect Service.
- Redis lookup returns in ~1ms.
- If Redis misses, fall back to Postgres read replica. Cache result in Redis.
- Publish click event to queue — non-blocking, ~0.5ms.
- Return 301 redirect. Total time: 3–15ms under any reasonable load.
- Background worker batch-inserts into ClickHouse asynchronously, invisible to the user.
Key Lessons From This Evolution
Under-engineering is a choice; so is over-engineering. The simple design was right for the early stage. The lesson isn't "always build for scale" — it's "know your current scale, and design the migration path before you need it."
The read path and write path are different systems sharing a database. Treating them the same is the root cause of most performance bottlenecks I've seen. Separate them conceptually first, then decide whether they need to be separate services.
Async everything that doesn't need to be sync. If a user action triggers secondary effects — analytics, notifications, audit logs — those effects should almost never block the primary response. The cost of eventual consistency in analytics is almost always worth the latency improvement.
Observability is not a feature; it's a prerequisite. You cannot improve what you cannot measure. Structured logs, latency histograms, and queue depth metrics are non-negotiable from day one.
Failures are first-class citizens. Every component should have a defined behavior when something it depends on is unavailable. Designing for failure paths is just as important as designing for success paths.
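For the URL shortener, "defined behavior when a dependency is unavailable" might mean: a cache outage degrades to database reads, and a queue outage drops the click event instead of failing the redirect. The sketch below assumes the dependencies are passed in as plain callables and that they signal outages with ConnectionError. Both are simplifications for illustration.

```python
def resolve_url(short_code: str, cache_get, db_get):
    """Try the cache, but treat a cache failure as a miss rather than an error."""
    try:
        cached = cache_get(short_code)
    except ConnectionError:
        cached = None  # cache is down: degrade to the database
    if cached is not None:
        return cached
    return db_get(short_code)


def record_click(event: dict, publish) -> bool:
    """Best-effort analytics: a lost click event must never fail the redirect."""
    try:
        publish(event)
        return True
    except ConnectionError:
        return False  # dropped; a real system would at least count these


def broken_dependency(*_args):
    raise ConnectionError("dependency unavailable")


urls = {"a3kX9mZ": "https://example.com/long"}
url_during_cache_outage = resolve_url("a3kX9mZ", broken_dependency, urls.get)
click_recorded = record_click({"short_code": "a3kX9mZ"}, broken_dependency)
```

The point is that each fallback is a decision made in advance, not something discovered during an outage.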
Closing Thoughts
The most honest thing I can say about this evolution is that none of it came from studying system design in the abstract. It came from watching real systems fail in specific ways, then asking "what would have prevented this?" repeatedly over five years.
The junior version of me thought good system design meant knowing the right patterns. The version of me writing this understands that good system design means knowing which trade-offs you're making, why you're making them, and what you'll do when the assumptions behind them turn out to be wrong.
The goal was never to build a complex system. The goal was always to build a system that earns the right to stay simple — and knows when it needs to grow up.