MEDIUM · Distributed Systems

Design a URL Shortening Service

Design a system like bit.ly that converts long URLs into short, memorable links. The system should handle billions of URLs with high availability and low latency.

Estimated Time: 45 minutes
#url-shortener #distributed-systems #caching #scalability
Solution Overview

A URL shortener requires: (1) Unique ID generation using base62 encoding, (2) Key-value store for URL mapping, (3) Caching layer for popular URLs, (4) Analytics tracking, (5) Rate limiting to prevent abuse.

Hints to Get Started

  1. Start with a simple hash function, then consider collisions
  2. Think about the read vs. write ratio - this system is read-heavy
  3. Consider what happens when a short URL is clicked millions of times
  4. How would you prevent malicious users from exhausting your ID space?

Requirements

functional

  • Generate unique short URL for given long URL
  • Redirect users from short URL to original URL
  • Custom short URLs (optional)
  • Expiration time for links
  • Analytics (click tracking, referrer data)

non functional

  • High availability (99.99%)
  • Low latency (<100ms redirect)
  • Scalable to billions of URLs
  • Handle ~4,000 redirects/second on average (10B redirects/month)
Capacity Estimation

  • traffic: 100M new URLs/month, 10B redirects/month
  • bandwidth: Read-heavy - 100:1 read-to-write ratio
  • storage: 500 bytes/URL × 100M = 50GB/month, 600GB/year
  • cache: 20% of URLs generate 80% of traffic, so cache the top 20%
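The estimates above can be sanity-checked with quick back-of-the-envelope arithmetic (a sketch assuming a 30-day month; the 100:1 ratio and 500-byte record size come from the figures above):

```python
# Back-of-the-envelope check of the capacity numbers above.
SECONDS_PER_MONTH = 30 * 24 * 3600                    # ~2.6M seconds

new_urls_per_month = 100_000_000                      # 100M writes/month
redirects_per_month = 100 * new_urls_per_month        # 100:1 ratio -> 10B reads

write_qps = new_urls_per_month / SECONDS_PER_MONTH    # ~39 writes/sec
read_qps = redirects_per_month / SECONDS_PER_MONTH    # ~3,900 redirects/sec

bytes_per_record = 500
storage_gb_per_month = new_urls_per_month * bytes_per_record / 10**9   # 50 GB
storage_gb_per_year = 12 * storage_gb_per_month                        # 600 GB
```

The ~3,900 reads/sec average also explains why the cache layer matters more than write throughput here.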

High-Level Architecture

data flow

  1. User submits long URL
  2. Generate unique short code (base62)
  3. Store mapping in database
  4. Return short URL to user
  5. On redirect: check cache → DB → return original URL
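The five steps above can be walked through with a minimal in-memory sketch; the dicts stand in for the database cluster and Redis, and all names (`shorten`, `redirect`, the `sho.rt` domain) are invented for illustration:

```python
# Minimal in-memory walk-through of the create/redirect flow.
db: dict[str, str] = {}      # database cluster stand-in
cache: dict[str, str] = {}   # Redis stand-in
counter = 0                  # distributed ID generator stand-in

def shorten(long_url: str) -> str:
    global counter
    counter += 1                       # step 2: unique ID (base62-encoded in practice)
    code = str(counter)
    db[code] = long_url                # step 3: store mapping
    return "https://sho.rt/" + code    # step 4: return short URL

def redirect(code: str) -> str:
    if code in cache:                  # step 5: check cache first
        return cache[code]
    long_url = db[code]                # fall back to the database
    cache[code] = long_url             # warm the cache for the next hit
    return long_url
```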

components

  • Load Balancer - distributes requests
  • Application Servers - handle business logic
  • Database Cluster - stores URL mappings
  • Cache Layer (Redis) - fast lookups
  • Analytics Service - track clicks
Deep Dive

analytics

  • storage: Time-series database (InfluxDB) or data warehouse
  • tracking: Click events, referrer, geo-location, device
  • real time: Kafka stream processing for live stats
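In production the click events would flow through Kafka; as a stand-in, a pure-Python aggregation over a batch of events sketches the kind of live stats the stream job would compute. All event field names here are illustrative, not a fixed schema:

```python
from collections import Counter

# Stand-in for the stream-processing step: a list of dicts plays the
# role of the Kafka topic, and the aggregation is done in one pass.
def live_stats(events: list[dict]) -> dict:
    clicks = Counter(e["code"] for e in events)
    referrers = Counter(e["referrer"] for e in events)
    return {"clicks_per_code": clicks, "top_referrers": referrers.most_common(3)}

events = [
    {"code": "abc123", "referrer": "twitter.com", "country": "US"},
    {"code": "abc123", "referrer": "news.ycombinator.com", "country": "DE"},
    {"code": "xyz789", "referrer": "twitter.com", "country": "US"},
]
stats = live_stats(events)
```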

rate limiting

  • limits: 10 URL creations per hour per user
  • algorithm: Token bucket per user/IP
  • implementation: Redis with INCR and EXPIRE commands

database schema

  • indexes: INDEX on created_at, INDEX on user_id
  • partitioning: Shard by hash(short_code) for horizontal scaling
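Shard routing by hash of the short code can be sketched as below; the shard count and the use of MD5 are illustrative choices (any stable hash works, but Python's built-in `hash()` is salted per process, so it is unsuitable for routing):

```python
import hashlib

NUM_SHARDS = 16   # illustrative; real systems pick this for growth headroom

def shard_for(short_code: str) -> int:
    # Stable hash of the short code, reduced to a shard index.
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS
```

Because every redirect lookup carries the short code, the router can always pick the right shard without a scatter-gather query.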

url mappings

  • short_code VARCHAR(10) PRIMARY KEY
  • long_url TEXT NOT NULL
  • user_id INTEGER
  • created_at TIMESTAMP
  • expires_at TIMESTAMP
  • click_count INTEGER DEFAULT 0

caching strategy

  • cache layer: Redis with LRU eviction
  • what to cache: Top 20% most accessed URLs
  • ttl: 24 hours for popular URLs
  • cache warming: Preload trending URLs
  • cache invalidation: On URL deletion or expiration
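The cache-aside lookup with LRU eviction and a TTL can be sketched in-process (Redis provides the same behavior via `maxmemory-policy allkeys-lru` and per-key TTLs; capacity and TTL values here are illustrative):

```python
import time
from collections import OrderedDict

# In-process stand-in for the Redis layer: LRU eviction plus per-entry TTL.
class LRUCache:
    def __init__(self, capacity: int = 1000, ttl: float = 24 * 3600):
        self.capacity, self.ttl = capacity, ttl
        self._data: OrderedDict[str, tuple[str, float]] = OrderedDict()

    def get(self, key: str):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if time.time() > expires:            # TTL elapsed: treat as a miss
            del self._data[key]
            return None
        self._data.move_to_end(key)          # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (value, time.time() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:  # evict least recently used
            self._data.popitem(last=False)

def resolve(code: str, cache: "LRUCache", db: dict) -> str:
    url = cache.get(code)
    if url is None:          # cache miss: fall through to the database
        url = db[code]
        cache.put(code, url) # cache-aside: populate on the way back
    return url
```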

unique id generation

  • approach: Base62-encode an auto-incrementing counter or a hash of the long URL
  • base62: Uses [a-zA-Z0-9] = 62 characters
  • length: 7 characters → 62^7 ≈ 3.5 trillion URLs
  • collision handling: Use a distributed ID generator (e.g. Snowflake) or check-and-increment
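Base62 encoding of a numeric ID is a short exercise in repeated division; the alphabet ordering below is a convention, the only requirement is that encoder and decoder agree on it:

```python
import string

# digits + lowercase + uppercase = 62 symbols
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)       # peel off the least-significant digit
        out.append(ALPHABET[rem])
    return "".join(reversed(out))

def decode62(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven characters give 62^7 = 3,521,614,606,208 (~3.5 trillion) distinct codes, which is why the schema above caps short_code at VARCHAR(10) with room to spare.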

Trade-offs & Considerations
  • MD5 hash vs auto-increment: Hash may collide, increment needs coordination
  • SQL vs NoSQL: NoSQL better for key-value lookups at scale
  • Cache size: Balance memory cost vs cache hit rate
Scalability Strategies
  • Horizontal scaling: Add more app servers behind load balancer
  • Database sharding: Partition by hash of short code
  • Read replicas: For read-heavy workload
  • CDN: Cache static content and popular redirects