Design a Fitness Tracking App

System requirements

Functional

  • Users can track running, cycling, and weightlifting activities.
  • Activities can be logged manually or captured automatically via wearable devices and GPS.
  • The system records detailed workout data such as GPS coordinates and heart rate.
  • The system tracks user consistency (e.g., weekly streaks, frequency).
  • Users receive feedback on performance and progress.
  • Workout data syncs across devices.
  • Data can be imported from or exported to third-party platforms (e.g., Apple Health, Garmin).
  • Notifications are sent for goal milestones, reminders, or inactivity.

Non-Functional

  • High availability: the app must be accessible at all times, especially during peak hours (mornings/evenings).
  • Real-time ingestion: GPS and biometric data should be uploaded and processed with minimal delay.
  • Horizontal scalability to handle millions of users and hundreds of billions of data points per year.
  • Support for intermittent connectivity: app should buffer data offline and sync later.
  • Strong data integrity: no loss or duplication of recorded activities.
  • Low-latency responses for key APIs (<1s for most user interactions).

Capacity estimation

Let’s assume a moderate success scenario for the app: on the order of 10 million registered users within a few years.

Suppose about 1 million users are active daily (doing at least one tracked activity per day).

For activity tracking, the volume of data points can be significant. A running or cycling workout can easily last 30–60 minutes. If the app samples GPS and sensor data every second, a one-hour run produces ~3600 data points (GPS coordinates, speed, heart rate, etc.).

To be conservative, assume an average of 1000 data points per workout (some shorter or indoor workouts produce fewer).

If 1 million daily active users each log one activity, that’s roughly 1,000,000 activities per day, resulting in about 1 billion sensor readings per day generated by all users in total.

However, it’s likely not everyone exercises daily; we might have around 500k activities/day on weekdays and more on weekends. Even so, the system should handle peak loads of perhaps 50,000–100,000 simultaneous active workouts during popular times (mornings, evenings).

In terms of throughput, if 500k activities are finished per day, and each results in an upload to the server, that is about 5–6 uploads per second on average. Peaks could be higher (if many people finish runs around the hour). Additionally, if real-time syncing is used, continuous data streams from tens of thousands of active users could mean tens of thousands of small updates per second. We must design for high write throughput to ingest this data.
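The traffic figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the function names are illustrative, and the inputs are the assumptions stated in this section (1,000 points per workout, 500k uploads per day, one sample per second during live tracking).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_data_points(activities_per_day: int, points_per_activity: int) -> int:
    """Total sensor readings generated per day."""
    return activities_per_day * points_per_activity


def average_uploads_per_second(activities_per_day: int) -> float:
    """Average upload rate if every finished activity is uploaded once."""
    return activities_per_day / SECONDS_PER_DAY


def live_updates_per_second(concurrent_workouts: int,
                            sample_interval_s: float = 1.0) -> float:
    """Write rate if every active workout streams one update per sample."""
    return concurrent_workouts / sample_interval_s


if __name__ == "__main__":
    print(daily_data_points(1_000_000, 1000))             # 1 billion readings/day
    print(round(average_uploads_per_second(500_000), 1))  # between 5 and 6 per second
    print(live_updates_per_second(50_000))                # tens of thousands per second
```

The gap between the average upload rate and the live-streaming rate is the reason the write path, not the read path, drives the design.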


Storage

For storage capacity: each activity record might include a summary (a few hundred bytes: user ID, timestamps, totals, etc.) plus the detailed path or sensor stream.

If stored in compressed form (e.g. an encoded route polyline or compressed JSON), a typical run’s GPS track might be 10–50 KB. Weightlifting logs (sets/reps) are smaller, maybe a few KB.

Assuming an average of 20 KB of detailed data per activity, 500k activities/day would yield ~10 GB/day of new data, or about 3.65 TB per year just for the detailed sensor logs. Over five years, this would accumulate to roughly 18 TB of historical data, before replication and indexes.
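The storage estimate follows directly from the assumed averages. The sketch below uses decimal units (1 GB = 1,000,000 KB), and the function names are illustrative.

```python
def storage_per_day_gb(activities_per_day: int, kb_per_activity: float) -> float:
    """New detailed data per day, in decimal gigabytes."""
    return activities_per_day * kb_per_activity / 1_000_000


def storage_per_year_tb(activities_per_day: int, kb_per_activity: float,
                        days: int = 365) -> float:
    """New detailed data per year, in decimal terabytes."""
    return storage_per_day_gb(activities_per_day, kb_per_activity) * days / 1000


if __name__ == "__main__":
    per_day = storage_per_day_gb(500_000, 20)
    per_year = storage_per_year_tb(500_000, 20)
    print(f"{per_day:.1f} GB/day, {per_year:.2f} TB/year, "
          f"{5 * per_year:.2f} TB over five years")
```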


API design

We will expose a set of well-defined RESTful API endpoints (and possibly accompanying WebSocket or gRPC streams for live data) to allow the mobile apps (or any client, including third-party integrations) to interact with the system. All API calls will require authentication (e.g. OAuth 2.0 bearer tokens or an equivalent, since data is personal). Here we describe key endpoints and their request/response structures:

User Profile & Authentication

For account management, endpoints like POST /users (to register a new user) and POST /auth/login (to obtain a token) are needed. Once logged in, GET /users/{userId} returns profile information (name, age, etc.) and user settings. We might integrate with OAuth providers (Google, Apple) for login to simplify authentication. The profile data is small but important to get right (including linking to wearable accounts if needed).


Activity Submission

The core endpoint is POST /activities for uploading a completed workout. The request body contains the activity data – e.g.:

{
  "type": "running",
  "start_time": "2025-04-12T14:30:00Z",
  "duration": 3600,                     // seconds
  "distance": 10000,                    // in meters
  "calories": 600,
  "data_points": [
     // optional array of detailed track points or sensor readings 
     // (could be simplified or omitted if data is uploaded as a file or separate stream)
     {"time":0, "lat":43.7000, "lng":-79.4000, "heart_rate":90},
     {"time":1, "lat":43.7005, "lng":-79.3995, "heart_rate":91},
     /* ... */
     {"time":3600, "lat":43.7500, "lng":-79.3500, "heart_rate":150}
  ]
}

The server responds with a confirmation and the new activity’s ID, e.g. 201 Created with body {"activity_id": "abcd1234", "status": "saved"}. In practice, for efficiency, a very large data_points array might be compressed, sent in a separate request, or uploaded as a file to cloud storage. An alternative design is a multi-step process: first call POST /activities/start when a user begins an activity (to create a placeholder and perhaps enable live tracking), then stream data, and finally call POST /activities/{id}/finish to finalize with summary stats. However, to keep the initial design simpler, we assume the app uploads each activity in one go at the end of the workout (this is common for many fitness apps).

Activity Retrieval

Users will want to view past workouts.
GET /activities/{id} returns detailed information for a specific activity (including all metrics, and perhaps an encoded route map polyline or a link to download the GPS track).
For example:

{
  "activity_id": "abcd1234",
  "user_id": "u123",
  "type": "running",
  "start_time": "2025-04-12T14:30:00Z",
  "duration": 3600,
  "distance": 10000,
  "calories": 600,
  "route_polyline": "mjifF|`miObEe@...",
  "average_pace": 360,
  "heart_rate_avg": 130,
  "heart_rate_max": 155,
  "points_count": 3610,
  "created_at": "...",
  "updated_at": "..."
}

The polyline is an encoded string representing the GPS path to reduce size.
If the user took photos or notes, URLs or text can be included.

To list recent activities:

GET /users/{userId}/activities?limit=50&offset=0

This returns an array of summary objects.


Goal Management

The app allows setting and monitoring goals.

POST /goals example:

{
  "user_id": "u123",
  "goal_type": "weekly_distance",
  "target_value": 20000,
  "start_date": "2025-04-01",
  "end_date": "2025-04-30"
}

This defines a 20 km weekly running goal for April 2025.

To retrieve goals:

GET /users/{userId}/goals

Example goal response:

{
  "goal_id": "g789",
  "goal_type": "weekly_distance",
  "target_value": 20000,
  "period": "weekly",
  "start_date": "2025-04-01",
  "end_date": "2025-04-30",
  "current_period_progress": 12000,
  "current_period_start": "2025-04-06",
  "current_period_end": "2025-04-12",
  "achieved": false
}
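The current_period_* fields in the response can be derived from the date alone. The sketch below assumes weeks run Sunday to Saturday, which is what the example dates (2025-04-06 to 2025-04-12) suggest; the function names and the (date, distance) tuple layout are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple


def week_window(day: date) -> Tuple[date, date]:
    """Return the Sunday-to-Saturday week containing `day` (assumed convention)."""
    days_since_sunday = (day.weekday() + 1) % 7  # weekday(): Monday=0 ... Sunday=6
    start = day - timedelta(days=days_since_sunday)
    return start, start + timedelta(days=6)


def current_period_progress(activities: Iterable[Tuple[date, float]],
                            today: date) -> float:
    """Sum distance (meters) of activities that fall inside the current week."""
    start, end = week_window(today)
    return sum(distance for day, distance in activities if start <= day <= end)


if __name__ == "__main__":
    runs = [(date(2025, 4, 5), 5000), (date(2025, 4, 7), 7000), (date(2025, 4, 10), 5000)]
    today = date(2025, 4, 10)
    progress = current_period_progress(runs, today)
    print(week_window(today), progress, progress >= 20000)
```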

Analytics and Stats

To support “progress over time”, we may offer:

GET /users/{userId}/stats

This returns aggregated statistics such as:

  • year-to-date distance
  • lifetime workouts
  • streaks
  • personal bests

Or time-series queries like:

GET /users/{userId}/progress?metric=distance&period=weekly

All APIs:

  • use HTTP status codes (200, 201, 400, 401, etc.)
  • return JSON
  • may later support API versioning or GraphQL

Security includes token verification and rate limiting.


Database design

The data model includes: user profiles, activity records, detailed sensor logs, goals, and analytics.


User Profile Store

A Users table stores:

  • user_id (PK)
  • name
  • email
  • password_hash
  • preferences
  • linked device info

Indexed by email for login.
Other entities reference user_id.


Activity Data Store

Each workout session is represented as an Activity record.

Attributes:

  • activity_id
  • user_id
  • type
  • start_time
  • duration, distance, calories
  • avg/max metrics
  • path_storage_key → where detailed data lives

Storage options:

1. Embedded blob in SQL
Simple but row becomes large.

2. Separate time-series table
One row per point → billions of rows → high write load.

3. NoSQL / time-series DB (recommended)
E.g., Cassandra or ScyllaDB.
Partition by activity_id or user_id for efficient retrieval.

Hybrid approach (best practice):

  • SQL DB → activity summaries
  • NoSQL or object storage → detailed GPS/sensor logs

Example schema:

Users(user_id PK, name, email, ...)
Activities(activity_id PK, user_id FK, type, start_time, duration, ..., path_storage_key)
Goals(goal_id PK, user_id FK, goal_type, target_value, period, start_date, end_date)
ActivityTotals(user_id, week_start_date, total_distance, ...)

Large GPS logs stored in S3 or NoSQL.
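To make the relational part of the schema concrete, here is a minimal sketch using SQLite from the Python standard library. The column types, the unit suffixes (`_s`, `_m`), the index, and the `recent_activities` helper are assumptions added for illustration; a production deployment would use a server-grade relational database.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id TEXT PRIMARY KEY,
    name    TEXT NOT NULL,
    email   TEXT NOT NULL UNIQUE          -- unique index doubles as the login lookup
);
CREATE TABLE activities (
    activity_id      TEXT PRIMARY KEY,
    user_id          TEXT NOT NULL REFERENCES users(user_id),
    type             TEXT NOT NULL,
    start_time       TEXT NOT NULL,       -- ISO 8601, UTC
    duration_s       INTEGER NOT NULL,
    distance_m       REAL,
    calories         REAL,
    path_storage_key TEXT                 -- pointer to detailed data in NoSQL / object storage
);
CREATE INDEX idx_activities_user_time ON activities(user_id, start_time DESC);
CREATE TABLE goals (
    goal_id      TEXT PRIMARY KEY,
    user_id      TEXT NOT NULL REFERENCES users(user_id),
    goal_type    TEXT NOT NULL,
    target_value REAL NOT NULL,
    period       TEXT,
    start_date   TEXT,
    end_date     TEXT
);
CREATE TABLE activity_totals (
    user_id          TEXT NOT NULL REFERENCES users(user_id),
    week_start_date  TEXT NOT NULL,
    total_distance_m REAL NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, week_start_date)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)


def recent_activities(conn: sqlite3.Connection, user_id: str,
                      limit: int = 50, offset: int = 0) -> list:
    """Summary rows for one user, newest first (mirrors the list endpoint)."""
    return conn.execute(
        "SELECT activity_id, type, start_time, distance_m FROM activities "
        "WHERE user_id = ? ORDER BY start_time DESC LIMIT ? OFFSET ?",
        (user_id, limit, offset),
    ).fetchall()
```

The (user_id, start_time) index is what keeps the "list recent activities" query cheap as the activities table grows.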


Analytics Data

We may maintain:

  • cumulative totals
  • streaks
  • weekly progress
  • personal records

Stored in a UserStats table or Redis for fast access.

Old detailed logs may be archived to cheaper storage.

The database design is polyglot:

  • Relational DB for core entities
  • NoSQL/time-series for massive telemetry
  • Cache for performance
  • Object storage for files

High-level design

Architecture follows distributed microservices, including:

  • Mobile App (sensor collection, local buffering)
  • API Gateway / Load Balancer
  • User Service
  • Activity Service
  • Goal Service
  • Analytics Service
  • Notification Service
  • Integration Services (Apple Health, Garmin)
  • Relational DB + NoSQL + Redis
  • Message Queue / Event Bus
  • Object Storage

All backend services are stateless and horizontally scalable.


Storage Layer

  • SQL DB → summaries, users, goals
  • NoSQL → GPS/sensor time-series
  • S3 → GPX files, route maps
  • Redis → caching

Message Queue

Used for:

  • activity_created events
  • goal updates
  • analytics updates
  • notifications
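The fan-out pattern behind these events can be sketched with a small in-process publish/subscribe class. This is a stand-in for a real message broker, not a replacement for one: `EventBus`, the handler names, and the event payload shape are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """In-memory stand-in for a message queue: one topic, many subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Dict) -> int:
        """Deliver the event to every subscriber; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    bus.subscribe("activity_created", lambda e: print("goal service:", e["activity_id"]))
    bus.subscribe("activity_created", lambda e: print("analytics service:", e["activity_id"]))
    bus.subscribe("activity_created", lambda e: print("notification service:", e["activity_id"]))
    bus.publish("activity_created", {"activity_id": "abcd1234", "user_id": "u123"})
```

With a real broker the delivery would be asynchronous and durable, so the Activity Service can respond to the client without waiting for any of the consumers.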

Third-Party Integrations

Used for:

  • maps
  • elevation profiles
  • wearable sync
  • exporting data

Reference Architecture Diagram (text)

Mobile App
Wearables / GPS
      ↓
API Gateway
      ↓
------------------------------------------------------
| Auth | Activity | Goals | Analytics | Notification |
------------------------------------------------------
      ↓
---------------------- Storage Layer -----------------------
| Relational DB | NoSQL Time-Series | Redis | Object Store |
------------------------------------------------------------
      ↓
             Message Queue / Event Bus

Request flows

1. Activity Tracking Flow (End-to-End)

On device

  • User taps “Start Run”
  • Phone collects GPS + heart rate
  • Stores locally until upload

Upload

User taps “Save Workout” →
App sends POST /activities with summary + data.

Backend flow

  1. API Gateway forwards request
  2. Activity Service:
    • validates
    • processes GPS/heart-rate data
    • computes metrics
    • encodes polyline
  3. Stores in SQL (summary)
  4. Stores detailed data in NoSQL / S3
  5. Emits ActivityCreated event
  6. Responds 201 to client

Async processing

  • Goal Service updates progress
  • Analytics Service updates stats, streaks
  • Notification Service may trigger a push

The client may then call GET /activities/{id} or GET /users/{userId}/stats.


Live tracking variant

A streaming channel (WebSocket / MQTT) can be added for live tracking, but it is not required for the initial design.


Error handling

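If the server returns an error:

  • The app retries the upload
  • Retries must not create duplicate activities → send a client-generated activity ID as an idempotency key
  • Partially stored activities (e.g. summary written but detailed data missing) must be repaired asynchronously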
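The duplicate-avoidance rule can be sketched as follows. The example assumes the client sends its own ID with each upload (called `client_activity_id` here, a hypothetical field name) and uses an in-memory dict in place of the database; in a real store a unique constraint on (user_id, client_activity_id) would play the same role.

```python
import uuid
from typing import Dict, Tuple


class ActivityStore:
    """Stores activities keyed by (user_id, client_activity_id) so retries are harmless."""

    def __init__(self) -> None:
        self._by_client_id: Dict[Tuple[str, str], dict] = {}

    def create(self, user_id: str, client_activity_id: str,
               payload: dict) -> Tuple[dict, bool]:
        """Return (record, created). A repeated upload returns the original record."""
        key = (user_id, client_activity_id)
        existing = self._by_client_id.get(key)
        if existing is not None:
            return existing, False  # retry of an upload that already succeeded
        record = {**payload, "activity_id": uuid.uuid4().hex, "status": "saved"}
        self._by_client_id[key] = record
        return record, True

    def __len__(self) -> int:
        return len(self._by_client_id)


if __name__ == "__main__":
    store = ActivityStore()
    first, created = store.create("u123", "client-42", {"type": "running"})
    again, created_again = store.create("u123", "client-42", {"type": "running"})
    print(created, created_again, first["activity_id"] == again["activity_id"])
```

A handler built on this could answer 201 Created for a new record and 200 OK for a repeat, so the app can treat both responses as success.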

Goal Setting and Progress Flow

POST /goals → Goal Service writes record.

When ActivityCreated event arrives:

  1. Goal Service checks if activity falls within goal window
  2. Updates progress for current period
  3. Emits GoalAchieved event if target met

Users fetch progress via GET /users/{userId}/goals.

Streaks handled in Analytics Service.


Detailed component design

Activity Ingestion Service

Components:

  • API Handler
  • Data Processing (cleanup, metrics, polyline)
  • Storage Manager
  • Event Emitter
  • Response Generator

Data processing examples:

  • GPS smoothing
  • distance computed via haversine formula
  • optional elevation lookup
  • heart rate smoothing
  • lap detection
  • polyline encoding
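As an example of one of these steps, distance can be computed by summing haversine distances between consecutive GPS points. The sketch below assumes a spherical Earth with a mean radius of 6,371 km and ignores elevation; the function names are illustrative.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def track_distance_m(points: Sequence[Tuple[float, float]]) -> float:
    """Total length of a track given as (lat, lng) pairs."""
    return sum(haversine_m(*a, *b) for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    track = [(43.7000, -79.4000), (43.7005, -79.3995), (43.7010, -79.3990)]
    print(f"{track_distance_m(track):.1f} m")
```

Raw GPS jitter inflates a sum like this, which is why smoothing is listed before distance in the processing steps.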

Storage

  • SQL insert summary
  • NoSQL/S3 insert data points
  • Retry logic + compensation if partial failure

Events

  • activity_created published
  • consumed by the Goal, Analytics, and Notification services

GPS Mapping & Route Handling

  • server computes distance & elevation
  • polyline encoding for compact storage
  • optional static map generation
  • supports future geo features (segments, heatmaps)

Analytics Computation

Hybrid approach:

  • on-demand for heavy/rare queries
  • precomputed for weekly totals, streaks, PRs

Stats stored in:

  • relational UserStats
  • or Redis for fast read

The service must also handle late uploads by recalculating the affected periods, so that precomputed stats stay consistent with the stored activities.
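A weekly streak is one of the stats that is cheap to precompute and easy to recompute when a late upload arrives. The sketch below makes two assumptions not stated in the design: weeks start on Sunday, and a streak counts only if the current week already has an activity.

```python
from datetime import date, timedelta
from typing import Iterable


def week_start(day: date) -> date:
    """Sunday that begins the week containing `day` (assumed convention)."""
    return day - timedelta(days=(day.weekday() + 1) % 7)


def weekly_streak(activity_dates: Iterable[date], today: date) -> int:
    """Count consecutive active weeks, walking backwards from the current week."""
    active_weeks = {week_start(d) for d in activity_dates}
    streak = 0
    cursor = week_start(today)
    while cursor in active_weeks:
        streak += 1
        cursor -= timedelta(weeks=1)
    return streak


if __name__ == "__main__":
    dates = [date(2025, 3, 18), date(2025, 4, 1), date(2025, 4, 8), date(2025, 4, 15)]
    today = date(2025, 4, 16)
    print(weekly_streak(dates, today))
    # A late upload for the missing week closes the gap when the streak is recomputed.
    print(weekly_streak(dates + [date(2025, 3, 25)], today))
```

Because the streak is derived from the full set of activity dates rather than incremented per event, a late or out-of-order upload only requires rerunning the function for that user.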


Trade offs / Tech choices

Real-time Data Streaming

  • Batch upload simplifies design
  • WebSocket/MQTT optional for live tracking

Caching

  • Redis used for stats, profiles, session data
  • Short TTLs to bound how stale cached values can get
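The read path for cached stats typically follows a cache-aside pattern. The sketch below uses an in-process dictionary with expiry as a stand-in for Redis; `TTLCache`, `get_user_stats`, and the `load_from_db` callback are hypothetical names, and the clock is injectable only to make the behaviour easy to test.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after a fixed time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._entries[key] = (value, self._clock() + self._ttl)


def get_user_stats(cache: TTLCache, user_id: str,
                   load_from_db: Callable[[str], dict]) -> dict:
    """Cache-aside read: serve from cache, fall back to the database on a miss."""
    cached = cache.get(user_id)
    if cached is not None:
        return cached
    stats = load_from_db(user_id)
    cache.set(user_id, stats)
    return stats
```

With a short TTL, a stats update that misses the cache invalidation path is still corrected within one expiry interval, at the cost of more frequent database reads.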

Failure scenarios / bottlenecks

Server outages

  • LB removes bad node
  • retry logic
  • idempotent uploads

Database issues

  • replicas & failover
  • partitioning/sharding
  • write hotspots

Traffic spikes

  • autoscaling
  • queue buffering
  • batching writes

Offline clients

  • local buffering on device
  • late uploads handled gracefully

External API failures

  • async processing
  • fallback logic

Future improvements

  • performance tuning
  • adding warehouse for analytics
  • ML-based recommendations
  • advanced map/segment features
  • serverless event processors