System Design – Polling System


Designing a system to handle a real‐time vote during a Super Bowl commercial—30 million votes cast in maybe 30 seconds (≈1 million votes/sec!)—requires careful attention to scalability, low latency, and resiliency. Here’s one possible end-to-end architecture:

1. Traffic Profile & SLAs

  • Peak write load: 30 M votes over 30 s → ~1 M requests/sec.
  • Read load: viewers want instant percentage updates (e.g. every 1–2 s).
  • Latency target: end‐to‐end vote confirmation <100 ms; dashboard update <1 s.
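
As a quick sanity check on those numbers, here is the back-of-envelope math (the ~200-byte request size is an assumption, not part of the stated requirements):

```python
# Back-of-envelope check of the traffic profile.
TOTAL_VOTES = 30_000_000      # expected votes during the commercial
WINDOW_SECONDS = 30           # voting window
PAYLOAD_BYTES = 200           # assumed size of one vote request (body + headers)

peak_rps = TOTAL_VOTES / WINDOW_SECONDS             # ~1,000,000 requests/sec
ingress_mbps = peak_rps * PAYLOAD_BYTES * 8 / 1e6   # ~1,600 Mbit/s of raw vote traffic

print(f"peak write load:   {peak_rps:,.0f} req/s")
print(f"ingress bandwidth: {ingress_mbps:,.0f} Mbit/s at {PAYLOAD_BYTES} B/vote")
```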

2. Ingestion Layer

  1. Global DNS + CDN (e.g. CloudFront, Fastly)
    • Direct users to the nearest edge PoP.
  2. API Gateway / Load Balancer
    • Autoscaling fleet of stateless vote‐receiving endpoints (e.g. Kubernetes pods, AWS Lambda).
  3. Client SDK with Retry / Idempotency Key
    • Attach a unique client vote-ID to avoid duplicate counting on retry.
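
A minimal sketch of one of these stateless endpoints, assuming FastAPI for the handler and a Redis SET NX key as the idempotency check; the host names, the 10-minute dedup TTL, and the `enqueue` helper are illustrative assumptions, not a prescribed implementation:

```python
# Sketch of a stateless vote-receiving endpoint (FastAPI + Redis assumed).
import redis
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
dedup = redis.Redis(host="redis-dedup", port=6379)   # hypothetical dedup store

class Vote(BaseModel):
    vote_id: str      # unique idempotency key generated by the client SDK
    option: str       # "green" or "blue"
    region: str       # later used as a partition hint

@app.post("/vote")
def receive_vote(vote: Vote):
    # SET NX returns None if the key already exists -> duplicate retry, count once.
    is_new = dedup.set(f"vote:{vote.vote_id}", 1, nx=True, ex=600)
    if not is_new:
        return {"status": "duplicate_ignored"}
    enqueue(vote)                     # hand off to the stream buffer (next section)
    return {"status": "accepted"}

def enqueue(vote: Vote) -> None:
    """Placeholder: publish the vote onto Kafka/Kinesis (see the stream-buffering sketch)."""
    ...
```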

3. Stream Buffering

  • Managed streaming system (e.g. Kafka, AWS Kinesis, Google Pub/Sub) sits behind the API layer.
    • Vote‐POST → enqueue onto partitioned stream (partition by region or by vote option).
    • Decouples the write path from downstream processing, absorbs the burst, and gives consumers natural back-pressure.
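
Continuing the sketch, the `enqueue` helper above might publish to Kafka with kafka-python roughly like this (topic name, broker addresses, and producer settings are assumptions):

```python
# Sketch: publish each accepted vote onto a partitioned Kafka topic.
import json
from kafka import KafkaProducer   # kafka-python

producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092", "kafka-2:9092"],   # assumed brokers
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    key_serializer=lambda k: k.encode("utf-8"),
    linger_ms=5,          # small batching window to smooth the burst
    acks=1,               # trade a little durability for lower latency
)

def enqueue(vote) -> None:
    # Keying by region spreads load across partitions while keeping each key's
    # votes ordered; per-option totals are computed downstream.
    producer.send("votes", key=vote.region, value=vote.dict())
```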

4. Real-Time Aggregation

  1. Stream Processing (e.g. Kafka Streams, Flink, Kinesis Data Analytics)
    • Windowed aggregations over 1–2 s hops.
    • Maintain running counters for “green” vs. “blue”.
  2. In-Memory Cache (e.g. Redis, ElastiCache)
    • Write aggregates every second into a highly available, in-memory store keyed by time‐window & option.
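
As a simplified stand-in for the stream-processing job, the sketch below uses a plain Kafka consumer that keeps per-option counters in one-second windows and flushes them to Redis. A production deployment would rely on Kafka Streams or Flink for fault-tolerant windowed state; the topic, key names, and hosts here are assumptions:

```python
# Sketch: consume votes, aggregate per 1-second window, write counters to Redis.
import json
import time
from collections import Counter

import redis
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "votes",
    bootstrap_servers=["kafka-1:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="latest",
)
cache = redis.Redis(host="redis-agg", port=6379)

window_start = int(time.time())
counts = Counter()

for msg in consumer:
    counts[msg.value["option"]] += 1

    now = int(time.time())
    if now > window_start:                     # window closed: flush to Redis
        pipe = cache.pipeline()
        for option, n in counts.items():
            pipe.incrby(f"votes:total:{option}", n)          # running totals
            pipe.incrby(f"votes:{window_start}:{option}", n) # per-window breakdown
        pipe.execute()
        counts.clear()
        window_start = now
```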

5. Persistent Storage

  • Time-series database (e.g. DynamoDB with TTLs, InfluxDB, Bigtable)
    • Write final aggregate for each interval (e.g. every 5 s) for historical playback and post-game analysis.
    • Enable ad-hoc queries: “What was the vote split at second 15?”
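
The 5-second roll-up could be persisted with a single write per interval. Here is a sketch using boto3 and DynamoDB, where the table name, key schema, and TTL attribute are assumed rather than given:

```python
# Sketch: persist a final aggregate for each 5-second interval to DynamoDB.
import boto3

table = boto3.resource("dynamodb", region_name="us-east-1").Table("vote_history")

def persist_interval(poll_id: str, interval_start: int, green: int, blue: int) -> None:
    table.put_item(
        Item={
            "poll_id": poll_id,                # partition key (assumed schema)
            "interval_start": interval_start,  # sort key: epoch seconds of the 5 s bucket
            "green": green,
            "blue": blue,
            "ttl": interval_start + 30 * 24 * 3600,  # expire after ~30 days via DynamoDB TTL
        }
    )

# Hypothetical usage from the aggregation job:
#   persist_interval("superbowl-poll", window_start // 5 * 5, green_count, blue_count)
```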

6. Front-End Dashboard / Mobile App

  • WebSocket or Server-Sent Events
    • Clients subscribe to a “vote stream” channel.
    • Push updated percentages every 1–2 s.
  • Graceful degradation
    • If the WebSocket connection fails, fall back to polling a REST endpoint that returns the latest vote percentages.
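
A minimal FastAPI sketch of the push path: a WebSocket endpoint that sends the latest percentages from Redis every second, plus the REST fallback for polling clients. Route names and Redis keys follow the earlier sketches and are assumptions:

```python
# Sketch: push vote percentages over WebSocket, with a REST fallback for polling clients.
import asyncio
import redis
from fastapi import FastAPI, WebSocket

app = FastAPI()
cache = redis.Redis(host="redis-agg", port=6379)

def current_split() -> dict:
    green = int(cache.get("votes:total:green") or 0)
    blue = int(cache.get("votes:total:blue") or 0)
    total = (green + blue) or 1                    # avoid division by zero before any votes
    return {"green_pct": round(100 * green / total, 1),
            "blue_pct": round(100 * blue / total, 1)}

@app.websocket("/vote-stream")
async def vote_stream(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            await ws.send_json(current_split())    # push an update...
            await asyncio.sleep(1)                 # ...roughly every second
    except Exception:
        pass                                       # client disconnected or send failed

@app.get("/results")                               # polling fallback if WebSockets fail
def results():
    return current_split()
```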

7. Autoscaling & Resiliency

  • Autoscale ingestion pods or serverless functions to ≥2× expected peak.
  • Multi-AZ / multi-region deployment so traffic can fail over when a zone or region degrades.
  • Chaos testing before the event (e.g. inject latency, drop pods).
  • Circuit breakers / bulkheads to isolate failures (e.g. if Redis is slow, degrade update frequency).
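
The "degrade update frequency when Redis is slow" idea can be captured in a tiny circuit-breaker-style helper; the thresholds below are illustrative, not tuned values:

```python
# Sketch: back off the dashboard update rate when the cache gets slow.
NORMAL_INTERVAL = 1.0     # seconds between pushes under healthy conditions
DEGRADED_INTERVAL = 5.0   # slower cadence while the cache is struggling
SLOW_THRESHOLD = 0.050    # treat >50 ms cache reads as "slow" (assumed budget)

def next_push_interval(cache_read_seconds: float, consecutive_slow: int) -> tuple[float, int]:
    """Return (sleep interval, updated slow-read streak) for the broadcast loop."""
    if cache_read_seconds > SLOW_THRESHOLD:
        consecutive_slow += 1
    else:
        consecutive_slow = 0
    # After 3 slow reads in a row, "open the breaker": drop to the degraded cadence.
    interval = DEGRADED_INTERVAL if consecutive_slow >= 3 else NORMAL_INTERVAL
    return interval, consecutive_slow

# In the WebSocket loop: time the Redis read, then
#   interval, streak = next_push_interval(elapsed, streak)
#   await asyncio.sleep(interval)
```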

8. Observability & Monitoring

  • Metrics: QPS, error rates, processing latency in stream jobs, Redis latency.
  • Dashboards: Grafana / CloudWatch showing real-time vote‐rate vs. capacity.
  • Alerts on anomalies (e.g. queue backlog >10 s).
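
Instrumenting the ingestion path can be as simple as a few Prometheus metrics scraped into Grafana; the metric names and port below are assumptions:

```python
# Sketch: expose vote-ingestion metrics for Prometheus/Grafana scraping.
from prometheus_client import Counter, Histogram, start_http_server

VOTES_ACCEPTED = Counter("votes_accepted_total", "Votes accepted by the ingestion layer")
VOTES_DUPLICATE = Counter("votes_duplicate_total", "Retries dropped by the idempotency check")
ENQUEUE_LATENCY = Histogram("vote_enqueue_seconds", "Time spent publishing a vote to the stream")

start_http_server(9100)   # /metrics endpoint for the Prometheus scraper (port assumed)

# Inside the vote handler:
#   with ENQUEUE_LATENCY.time():
#       enqueue(vote)
#   VOTES_ACCEPTED.inc()
```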

Vote-Counting Data Flow (Simplified)

  1. User taps “Green” → HTTPS POST /vote →
  2. API Gateway → Ingress pods → Push to Kafka/Kinesis
  3. Stream Processor aggregates → writes to Redis
  4. WebSocket server reads from Redis → broadcasts to clients

Key Design Considerations

  • Idempotency: ensure retries don’t double-count
  • Partitioning: evenly distribute load across shards
  • Back-pressure: stream buffer absorbs bursts; avoid overwhelming DB
  • Latency vs. consistency: eventual consistency is fine for 1–2 s update windows

With this setup you can support 30 M near‐simultaneous votes, provide sub-second feedback to users, and keep the system robust even under the legendary Super Bowl traffic spike.

