Design a URL Shortening Service
Design a system like bit.ly that converts long URLs into short, memorable links. The system should handle billions of URLs with high availability and low latency.
A URL shortener requires: (1) Unique ID generation using base62 encoding, (2) Key-value store for URL mapping, (3) Caching layer for popular URLs, (4) Analytics tracking, (5) Rate limiting to prevent abuse.
Start with a simple hash function, then consider collisions
Think about read vs write ratio - this is read-heavy
Consider what happens when a short URL is clicked millions of times
How would you prevent malicious users from exhausting your ID space?
functional requirements
- Generate unique short URL for a given long URL
- Redirect users from short URL to original URL
- Custom short URLs (optional)
- Expiration time for links
- Analytics (click tracking, referrer data)
non-functional requirements
- High availability (99.99%)
- Low latency (<100 ms redirects)
- Scalable to billions of URLs
- Handle 1000 requests/second
cache
80/20 rule: 20% of URLs generate 80% of traffic, so cache that hot 20%
storage
500 bytes/URL × 100M = 50GB/month, 600GB/year
traffic
100M new URLs/month, 10B redirects/month
bandwidth
Read-heavy: roughly 100:1 read-to-write ratio, so egress bandwidth dominates ingress by the same factor
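These figures follow from simple arithmetic; a quick back-of-the-envelope check (a sketch assuming 30-day months and the 500 bytes/URL record size above):

```python
# Back-of-the-envelope numbers for the estimates above.
SECONDS_PER_MONTH = 30 * 24 * 3600            # ~2.6 million

new_urls_per_month = 100_000_000              # 100M writes/month
redirects_per_month = 10_000_000_000          # 10B reads/month
bytes_per_record = 500                        # assumed average mapping size

write_qps = new_urls_per_month / SECONDS_PER_MONTH    # ~40 writes/sec
read_qps = redirects_per_month / SECONDS_PER_MONTH    # ~4,000 reads/sec

storage_gb_per_month = new_urls_per_month * bytes_per_record / 1e9   # ~50 GB
storage_gb_per_year = storage_gb_per_month * 12                      # ~600 GB
cache_gb = 0.2 * storage_gb_per_month                                # hot 20% -> ~10 GB

write_bw_kb_s = write_qps * bytes_per_record / 1e3    # ~20 KB/s ingress
read_bw_mb_s = read_qps * bytes_per_record / 1e6      # ~2 MB/s egress

print(f"{write_qps:.0f} writes/s, {read_qps:.0f} reads/s")
print(f"{storage_gb_per_month:.0f} GB/month, {storage_gb_per_year:.0f} GB/year, {cache_gb:.0f} GB cache")
print(f"{write_bw_kb_s:.0f} KB/s in, {read_bw_mb_s:.1f} MB/s out")
```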
data flow
1. User submits long URL
2. Generate unique short code (base62)
3. Store mapping in database
4. Return short URL to user
5. On redirect: check cache → DB → return original URL
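A minimal sketch of this flow; the `db` and `cache` clients and the `next_code` generator are hypothetical stand-ins (a base62 sketch appears later in this section), and `sho.rt` is a placeholder domain:

```python
def shorten(long_url: str, db, next_code) -> str:
    """Steps 1-4: generate a unique short code, store the mapping, return the short URL."""
    short_code = next_code()                   # e.g. base62 of a unique counter (sketched later)
    db.insert(short_code=short_code, long_url=long_url)
    return f"https://sho.rt/{short_code}"      # hypothetical short domain

def redirect(short_code: str, cache, db) -> str:
    """Step 5: check cache, fall back to the database, return the original URL."""
    long_url = cache.get(short_code) or db.lookup(short_code)
    if long_url is None:
        raise KeyError(f"unknown short code: {short_code}")
    return long_url                            # caller responds with HTTP 301/302
```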
components
- Load Balancer - distributes requests
- Application Servers - handle business logic
- Database Cluster - stores URL mappings
- Cache Layer (Redis) - fast lookups
- Analytics Service - tracks clicks
analytics
storage
Time-series database (InfluxDB) or data warehouse
tracking
Click events, referrer, geo-location, device
real time
Kafka stream processing for live stats
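One way to feed the real-time pipeline is to publish one event per redirect; a sketch assuming the kafka-python client, a broker at `kafka:9092`, and a hypothetical `clicks` topic:

```python
import json
import time

from kafka import KafkaProducer  # kafka-python client (assumed dependency)

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",                     # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode(),  # JSON-encode events
)

def record_click(short_code: str, referrer: str, geo: str, device: str) -> None:
    """Publish one click event per redirect; a stream job aggregates them for live stats."""
    event = {
        "short_code": short_code,
        "referrer": referrer,
        "geo": geo,
        "device": device,
        "ts_ms": int(time.time() * 1000),
    }
    producer.send("clicks", value=event)  # async send, batched by the client
```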
rate limiting
limits
10 URL creations per hour per user
algorithm
Token bucket per user/IP
implementation
Redis with INCR and EXPIRE commands
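A sketch of the Redis INCR + EXPIRE approach (strictly a fixed-window counter rather than a full token bucket, which would typically need a small Lua script); assumes the redis-py client and a local Redis instance:

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed Redis endpoint

LIMIT = 10        # 10 URL creations...
WINDOW = 3600     # ...per hour, per user

def allow_creation(user_id: str) -> bool:
    """Fixed-window counter via INCR + EXPIRE; returns False once the hourly limit is hit."""
    key = f"rl:create:{user_id}"
    count = r.incr(key)          # atomic increment, creates the key at 1 if absent
    if count == 1:
        r.expire(key, WINDOW)    # start the one-hour window on the first request
    return count <= LIMIT
```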
database schema
indexes
- INDEX on created_at
- INDEX on user_id
partitioning
Shard by hash(short_code) for horizontal scaling
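A minimal sketch of the shard-routing function this implies; the shard count is an assumption:

```python
from zlib import crc32

NUM_SHARDS = 16  # assumed shard count

def shard_for(short_code: str) -> int:
    """Map a short code to a database shard by hashing it."""
    return crc32(short_code.encode()) % NUM_SHARDS
```

Plain modulo hashing keeps routing trivial; consistent hashing is the usual upgrade when shards must be added without re-mapping most keys.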
url_mappings
- short_code: VARCHAR(10) PRIMARY KEY
- long_url: TEXT NOT NULL
- user_id: INTEGER
- created_at: TIMESTAMP
- expires_at: TIMESTAMP
- click_count: INTEGER DEFAULT 0
caching strategy
ttl
24 hours for popular URLs
cache layer
Redis with LRU eviction
cache warming
Preload trending URLs
what to cache
Top 20% most accessed URLs
cache invalidation
On URL deletion or expiration
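A cache-aside read path matching the strategy above, assuming the redis-py client, a local Redis instance, and a hypothetical `db_lookup` accessor:

```python
import redis

cache = redis.Redis(host="localhost", port=6379)  # assumed Redis endpoint
TTL_SECONDS = 24 * 3600                           # 24-hour TTL, per above

def resolve(short_code: str, db_lookup) -> str | None:
    """Cache-aside lookup: try Redis, fall back to the DB, then populate the cache."""
    key = f"url:{short_code}"
    cached = cache.get(key)
    if cached is not None:
        return cached.decode()
    long_url = db_lookup(short_code)              # hypothetical database accessor
    if long_url is not None:
        cache.setex(key, TTL_SECONDS, long_url)   # write back with expiry
    return long_url
```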
unique id generation
base62
Uses [a-zA-Z0-9] = 62 characters
length
7 characters → 62^7 ≈ 3.5 trillion possible URLs
approach
Base62 encoding of auto-incrementing counter or hash
collision handling
Use distributed ID generator (Snowflake) or check-and-increment
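A minimal base62 encoder/decoder over a unique integer ID (from an auto-increment counter or a Snowflake-style generator), matching the approach described above:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"  # 62 chars

def base62_encode(n: int, length: int = 7) -> str:
    """Encode a non-negative integer ID as a fixed-length base62 short code."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    code = "".join(reversed(digits)) or ALPHABET[0]
    return code.rjust(length, ALPHABET[0])        # zero-pad to 7 characters

def base62_decode(code: str) -> int:
    """Inverse mapping, e.g. to look a row up by its numeric ID."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

For example, `base62_encode(125)` gives `"0000021"`, and `base62_decode("0000021")` returns 125.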
trade-offs
- MD5 hash vs auto-increment: Hash may collide, auto-increment needs coordination
- SQL vs NoSQL: NoSQL better for key-value lookups at scale
- Cache size: Balance memory cost vs cache hit rate
scaling
- Horizontal scaling: Add more app servers behind the load balancer
- Database sharding: Partition by hash of short code
- Read replicas: For the read-heavy workload
- CDN: Cache static content and popular redirects