HARD · Social & Media

Design Instagram

Design a photo-sharing social media platform like Instagram. Users can upload photos, follow others, like posts, and view a personalized feed.

Estimated Time: 60 minutes
#social-media #newsfeed #image-storage #graphs #caching

Solution Overview

Instagram requires: (1) object storage for images with a CDN in front for delivery, (2) a news feed generation algorithm, (3) follower/following graph storage, (4) real-time notifications, and (5) a scalable image processing pipeline.

Hints to Get Started

  1. Consider read vs. write patterns: which is more frequent?
  2. How do you handle celebrity users with millions of followers?
  3. Think about image sizes: original, thumbnails, different resolutions.
  4. How would you ensure feed generation is fast for all users?

Requirements

functional

  • Upload photos/videos
  • Follow/unfollow users
  • Like and comment on posts
  • Generate personalized news feed
  • Search users and hashtags
  • Real-time notifications
  • Stories feature (24hr expiry)

non-functional

  • Handle 500M daily active users
  • Low latency feed generation (<300ms)
  • High availability for uploads
  • Consistent user experience globally

Capacity Estimation

posts

100M photos uploaded per day

users

1B total users, 500M DAU

storage

2MB avg photo × 100M = 200TB/day

bandwidth

Upload: ~2.3GB/s average (200TB ÷ 86,400s), View: ~23GB/s average (10:1 ratio)

feed reads

500M users × 20 feed loads = 10B requests/day
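
The daily figures above can be turned into per-second averages with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and traffic spread evenly over the day; peak load would be higher, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the capacity estimates (daily averages only).
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

photos_per_day = 100_000_000            # 100M photos uploaded per day
avg_photo_bytes = 2_000_000             # 2MB average photo
dau = 500_000_000                       # 500M daily active users
feed_loads_per_user = 20                # feed loads per user per day
read_write_ratio = 10                   # 10:1 view-to-upload ratio

bytes_per_day = photos_per_day * avg_photo_bytes
storage_per_day_tb = bytes_per_day / 1e12
upload_gb_per_s = bytes_per_day / SECONDS_PER_DAY / 1e9
view_gb_per_s = upload_gb_per_s * read_write_ratio
feed_requests_per_day = dau * feed_loads_per_user
feed_requests_per_s = feed_requests_per_day / SECONDS_PER_DAY

print(f"storage per day : {storage_per_day_tb:.0f} TB")
print(f"upload bandwidth: {upload_gb_per_s:.1f} GB/s")
print(f"view bandwidth  : {view_gb_per_s:.1f} GB/s")
print(f"feed requests   : {feed_requests_per_day:,} per day, "
      f"{feed_requests_per_s:,.0f} per second")
```

Renditions (thumbnails and multiple resolutions) and replication would add to the storage figure; they are left out here because the estimate above does not quantify them.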

High-Level Architecture

components

  • API Gateway - authentication, rate limiting
  • Upload Service - image processing, CDN upload
  • Feed Generation Service - fanout-on-write or fanout-on-read
  • Graph Service - follower relationships
  • Notification Service - push notifications
  • Search Service - Elasticsearch
  • Object Storage - S3/CDN for images

Deep Dive

database schema

feed cache

user_id → post_ids[] (Redis sorted set, scored by timestamp)

likes table

user_id, post_id, created_at

posts table

post_id, user_id, image_url, caption, created_at

users table

user_id, username, profile_pic, bio, created_at

follows table

follower_id, followee_id, created_at
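
One way to make the relational part of this schema concrete is shown below, using SQLite in memory purely for illustration. The column types, primary keys, and indexes are assumptions (the listing above names only the columns), and the URL is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    username    TEXT NOT NULL UNIQUE,
    profile_pic TEXT,
    bio         TEXT,
    created_at  INTEGER NOT NULL
);
CREATE TABLE posts (
    post_id    INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    image_url  TEXT NOT NULL,
    caption    TEXT,
    created_at INTEGER NOT NULL
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    created_at  INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
CREATE TABLE likes (
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    post_id    INTEGER NOT NULL REFERENCES posts(post_id),
    created_at INTEGER NOT NULL,
    PRIMARY KEY (user_id, post_id)
);
-- Assumed indexes: "who follows X?" and "X's recent posts".
CREATE INDEX idx_follows_followee ON follows (followee_id);
CREATE INDEX idx_posts_user_time ON posts (user_id, created_at);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice', NULL, NULL, 1000)")
conn.execute("INSERT INTO users VALUES (2, 'bob', NULL, NULL, 1001)")
conn.execute("INSERT INTO follows VALUES (2, 1, 1002)")  # bob follows alice
conn.execute(
    "INSERT INTO posts VALUES (10, 1, 'https://cdn.example.com/10.jpg', 'hello', 1003)"
)
# The composite primary key turns a repeated like into a no-op.
conn.execute("INSERT OR IGNORE INTO likes VALUES (2, 10, 1004)")
conn.execute("INSERT OR IGNORE INTO likes VALUES (2, 10, 1005)")

like_count = conn.execute(
    "SELECT COUNT(*) FROM likes WHERE post_id = 10"
).fetchone()[0]
follower_ids = [
    row[0]
    for row in conn.execute("SELECT follower_id FROM follows WHERE followee_id = 1")
]
print(like_count, follower_ids)
```

The composite keys on follows and likes are a design choice, not something the schema above specifies: they keep follow and like operations idempotent without a separate uniqueness check.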

feed generation

fanout on read

description

Compute feed when user requests it

implementation

Query recent posts from followed users, rank

pros

Fast write, saves storage

cons

Slow read (must query and merge)
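
A minimal sketch of fanout-on-read, assuming each followed user's posts are already available newest-first (for example from an index on user_id and created_at). The `Post` type, `feed_on_read`, and the in-memory dictionaries are hypothetical stand-ins, and ranking is reduced to reverse-chronological order.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Post:
    post_id: int
    user_id: int
    created_at: int  # Unix timestamp


def feed_on_read(user_id, follows, posts_by_user, limit=20):
    """Merge the followed users' timelines at request time, newest first.

    follows:       user_id -> iterable of followee ids
    posts_by_user: user_id -> list of Post, already sorted newest first
    """
    timelines = [posts_by_user.get(f, []) for f in follows.get(user_id, ())]
    # heapq.merge streams a k-way merge; reverse=True expects descending inputs.
    merged = heapq.merge(*timelines, key=lambda p: p.created_at, reverse=True)
    return list(islice(merged, limit))


follows = {1: [2, 3]}
posts_by_user = {
    2: [Post(12, 2, 300), Post(10, 2, 100)],
    3: [Post(11, 3, 200)],
}
feed = feed_on_read(1, follows, posts_by_user)
print([p.post_id for p in feed])
```

The cost is visible in the structure: every request touches one timeline per followed user, which is why reads get slow as the follow list grows.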

fanout on write

description

Pre-compute feed when post is created

implementation

Push post to all followers' feed cache

pros

Fast read (already computed), simple

cons

Slow write for users with many followers (celebrities)
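
A minimal sketch of fanout-on-write, using an in-memory dictionary as a stand-in for the per-user Redis sorted set. In Redis the push and trim would likely map to `ZADD` and `ZREMRANGEBYRANK`; the cap `MAX_FEED_LEN` and the function names are assumptions.

```python
import bisect
from collections import defaultdict

MAX_FEED_LEN = 500  # assumed cap on cached entries per user

# user_id -> list of (created_at, post_id), kept in ascending time order
feed_cache = defaultdict(list)


def fanout_on_write(author_id, post_id, created_at, followers):
    """Push a new post into the cached feed of every follower of the author."""
    for follower_id in followers.get(author_id, ()):
        feed = feed_cache[follower_id]
        bisect.insort(feed, (created_at, post_id))
        if len(feed) > MAX_FEED_LEN:
            del feed[0]  # evict the oldest entry


def read_feed(user_id, limit=20):
    """Reading is a single cache lookup; entries come back newest first."""
    feed = feed_cache.get(user_id, [])
    return [post_id for _, post_id in feed[::-1][:limit]]


followers = {1: [2, 3]}  # author 1 is followed by users 2 and 3
fanout_on_write(1, 10, 100, followers)
fanout_on_write(1, 11, 200, followers)
feed_for_2 = read_feed(2)
print(feed_for_2)
```

The write loop runs once per follower, so an author with millions of followers triggers millions of cache writes for a single post. That is the slow-write cost noted above.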

hybrid approach

Use fanout-on-write for regular users and fanout-on-read for celebrities, merging celebrity posts into the precomputed feed at read time
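
The hybrid approach could be sketched as follows. The class `HybridFeed`, the follower-count threshold, and the in-memory stores are all hypothetical; a real system would also need to handle accounts that cross the threshold over time.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 10_000  # assumed follower-count cutoff


class HybridFeed:
    def __init__(self, followers, threshold=CELEBRITY_THRESHOLD):
        self.followers = followers          # author_id -> set of follower ids
        self.threshold = threshold
        self.following = defaultdict(set)   # user_id -> set of followed authors
        for author_id, follower_ids in followers.items():
            for follower_id in follower_ids:
                self.following[follower_id].add(author_id)
        self.pushed = defaultdict(list)           # user_id -> [(created_at, post_id)]
        self.celebrity_posts = defaultdict(list)  # author_id -> [(created_at, post_id)]

    def is_celebrity(self, author_id):
        return len(self.followers.get(author_id, ())) >= self.threshold

    def publish(self, author_id, post_id, created_at):
        if self.is_celebrity(author_id):
            # Fanout-on-read path: store once, let readers pull it.
            self.celebrity_posts[author_id].append((created_at, post_id))
        else:
            # Fanout-on-write path: push into each follower's cached feed.
            for follower_id in self.followers.get(author_id, ()):
                self.pushed[follower_id].append((created_at, post_id))

    def read(self, user_id, limit=20):
        entries = list(self.pushed.get(user_id, []))
        for author_id in self.following.get(user_id, ()):
            if self.is_celebrity(author_id):
                entries.extend(self.celebrity_posts.get(author_id, []))
        entries.sort(reverse=True)  # newest first
        return [post_id for _, post_id in entries[:limit]]


# Tiny threshold so the example stays readable: author 1 counts as a celebrity.
hybrid = HybridFeed({1: {10, 11, 12}, 2: {10}}, threshold=3)
hybrid.publish(2, 100, created_at=1)   # regular user: pushed to follower 10
hybrid.publish(1, 101, created_at=2)   # celebrity: stored once, pulled on read
feed_for_10 = hybrid.read(10)
feed_for_11 = hybrid.read(11)
print(feed_for_10, feed_for_11)
```

Publishing the celebrity post wrote a single entry regardless of follower count, while readers pay a small extra merge only for the celebrities they follow.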

image upload flow

  1. Client uploads image to upload service
  2. Service validates, generates unique ID
  3. Store original in object storage (S3)
  4. Async: Create thumbnails (multiple sizes)
  5. Upload processed images to CDN
  6. Create post record in database
  7. Trigger feed fanout or mark for on-read
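
The steps above can be sketched as a synchronous request handler plus an asynchronous worker. Everything here is a stand-in: dictionaries replace object storage and the posts table, deques replace the message queues, and the limits, sizes, and function names are assumptions. No real resizing happens; a worker would call an image library at the marked line.

```python
import uuid
from collections import deque

ALLOWED_TYPES = {"image/jpeg", "image/png"}  # assumed whitelist
MAX_BYTES = 20 * 1024 * 1024                 # assumed upload size limit
THUMBNAIL_WIDTHS = (150, 320, 1080)          # assumed rendition widths

object_store = {}           # stand-in for S3: key -> bytes
processing_queue = deque()  # stand-in for the async job queue
fanout_queue = deque()      # stand-in for the feed fanout trigger
posts_table = {}            # stand-in for the posts table


def handle_upload(user_id, data, content_type, caption=""):
    """Steps 1-3: validate, assign an ID, store the original, enqueue processing."""
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported content type: {content_type}")
    if not data or len(data) > MAX_BYTES:
        raise ValueError("empty or oversized upload")
    post_id = uuid.uuid4().hex
    object_store[f"originals/{post_id}"] = data
    processing_queue.append(
        {"post_id": post_id, "user_id": user_id, "caption": caption}
    )
    return post_id


def process_next():
    """Steps 4-7: build renditions, record the post, trigger fanout."""
    job = processing_queue.popleft()
    post_id = job["post_id"]
    original = object_store[f"originals/{post_id}"]
    renditions = {}
    for width in THUMBNAIL_WIDTHS:
        key = f"thumbs/{post_id}/{width}"
        object_store[key] = original  # placeholder: resize to `width` here
        renditions[width] = key
    posts_table[post_id] = {
        "user_id": job["user_id"],
        "caption": job["caption"],
        "renditions": renditions,
    }
    fanout_queue.append(post_id)
    return post_id


new_post_id = handle_upload(1, b"\xff\xd8 fake jpeg bytes", "image/jpeg", "hi")
process_next()
print(posts_table[new_post_id]["renditions"])
```

Keeping thumbnail generation off the request path means the client gets an ID back as soon as the original is stored, at the price of a short delay before the post becomes visible.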

ranking algorithm

  • Factors: Recency, like count, viewer's relationship with the author and commenters
  • Machine learning: Personalized based on user interests
  • Real-time signals: Recent interactions boost score
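
As a rough illustration of how these signals might combine, the sketch below scores a post with an exponential recency decay multiplied by an engagement term. The half-life, weights, and the `affinity` input are invented for the example; hand-tuned constants stand in for what a learned model would provide.

```python
import math

HALF_LIFE_HOURS = 6.0  # assumed: a post's recency weight halves every 6 hours
W_LIKES = 1.0          # assumed weight on log-scaled like count
W_AFFINITY = 2.0       # assumed weight on viewer/author relationship (0..1)
W_RECENT = 1.5         # assumed boost for a recent interaction with the author


def score(age_hours, likes, affinity, recent_interaction):
    """Higher is better. Recency decays exponentially; likes are log-damped."""
    recency = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    engagement = W_LIKES * math.log1p(likes) + W_AFFINITY * affinity
    if recent_interaction:
        engagement += W_RECENT
    return recency * (1.0 + engagement)


candidates = {
    "close_friend_fresh": dict(age_hours=1, likes=10, affinity=0.9, recent_interaction=True),
    "close_friend_no_signal": dict(age_hours=1, likes=10, affinity=0.9, recent_interaction=False),
    "viral_but_old": dict(age_hours=24, likes=100_000, affinity=0.1, recent_interaction=False),
}
ranked = sorted(candidates, key=lambda name: score(**candidates[name]), reverse=True)
print(ranked)
```

With these particular constants a fresh post from a close connection outranks a day-old viral post; different weights would shift that balance.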

Scalability Strategies

  • Shard users table by user_id
  • Shard posts table by post_id
  • Replicate graph database for read scaling
  • Use CDN for global image delivery
  • Cache hot data in Redis (trending posts, user profiles)
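
Sharding by user_id or post_id needs a routing function that maps a key to a shard. A minimal sketch using a stable hash and a modulo follows; the shard count and the function name `shard_for` are assumptions.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # assumed shard count


def shard_for(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user_id or post_id to a shard index using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash() so the
    mapping is identical across processes and restarts.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Rough check that sequential IDs spread across all shards.
counts = Counter(shard_for(user_id) for user_id in range(10_000))
print(sorted(counts.items()))
```

Plain modulo routing remaps most keys when the shard count changes; consistent hashing or a lookup-based directory are common ways to reduce that movement, at the cost of extra machinery.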