- Users can track running, cycling, and weightlifting activities.
- Activities can be logged manually or captured automatically via wearable devices and GPS.
- The system records detailed workout data such as GPS coordinates and heart rate.
- The system tracks user consistency (e.g., weekly streaks, frequency).
- Users receive feedback on performance and progress.
- Workout data syncs across devices.
- Data can be imported from or exported to third-party platforms (e.g., Apple Health, Garmin).
- Notifications are sent for goal milestones, reminders, or inactivity.
- High availability: the app must be accessible at all times, especially during peak hours (mornings/evenings).
- Real-time ingestion: GPS and biometric data should be uploaded and processed with minimal delay.
- Horizontal scalability to handle millions of users and billions of data points per year.
- Support for intermittent connectivity: app should buffer data offline and sync later.
- Strong data integrity: no loss or duplication of recorded activities.
- Low-latency response for key APIs (<1s for most user interactions).
Let’s assume a moderate success scenario for the app: on the order of 10 million registered users within a few years.
Suppose about 1 million users are active daily (doing at least one tracked activity per day).
For activity tracking, the volume of data points can be significant. A running or cycling workout can easily last 30–60 minutes. If the app samples GPS and sensor data every second, a one-hour run produces ~3600 data points (GPS coordinates, speed, heart rate, etc.).
As a rough average, assume 1000 data points per workout (some shorter or indoor workouts produce far fewer).
If 1 million daily active users each log one activity, that’s roughly 1,000,000 activities per day, resulting in about 1 billion sensor readings per day generated by all users in total.
However, it’s likely not everyone exercises daily; we might have around 500k activities/day on weekdays and more on weekends. Even so, the system should handle peak loads of perhaps 50,000–100,000 simultaneous active workouts during popular times (mornings, evenings).
In terms of throughput, if 500k activities are finished per day, and each results in an upload to the server, that is about 5–6 uploads per second on average. Peaks could be higher (if many people finish runs around the hour). Additionally, if real-time syncing is used, continuous data streams from tens of thousands of active users could mean tens of thousands of small updates per second. We must design for high write throughput to ingest this data.
For storage capacity: each activity record might include a summary (a few hundred bytes: user ID, timestamps, totals, etc.) plus the detailed path or sensor stream.
If stored in compressed form (e.g. an encoded route polyline or compressed JSON), a typical run’s GPS track might be 10–50 KB. Weightlifting logs (sets/reps) are smaller, maybe a few KB.
Assuming an average of 20 KB of detailed data per activity, 500k activities/day would yield ~10 GB/day of new data, or about 3.6 TB per year just for the detailed sensor logs. Over five years, this could accumulate to tens of terabytes of historical data.
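These figures can be sanity-checked with simple arithmetic; the inputs below are the assumptions stated above, not measurements:

```python
# Back-of-envelope estimate using the assumptions stated above.
DAILY_ACTIVITIES = 500_000            # typical weekday activity volume
POINTS_PER_ACTIVITY = 1_000           # average sensor readings per workout
DETAIL_BYTES_PER_ACTIVITY = 20_000    # ~20 KB of compressed detail per activity

uploads_per_second = DAILY_ACTIVITIES / 86_400
points_per_day = DAILY_ACTIVITIES * POINTS_PER_ACTIVITY
detail_gb_per_day = DAILY_ACTIVITIES * DETAIL_BYTES_PER_ACTIVITY / 1e9
detail_tb_per_year = detail_gb_per_day * 365 / 1_000

print(f"{uploads_per_second:.1f} uploads/s on average")    # ~5.8
print(f"{points_per_day:,} sensor points/day")             # 500,000,000
print(f"{detail_gb_per_day:.0f} GB/day of detailed logs")  # ~10
print(f"{detail_tb_per_year:.2f} TB/year")                 # ~3.65
```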
We will expose a set of well-defined RESTful API endpoints (and possibly accompanying WebSocket or gRPC streams for live data) to allow the mobile apps (or any client, including third-party integrations) to interact with the system. All API calls will require authentication (e.g. OAuth 2.0 bearer tokens or an equivalent, since data is personal). Here we describe key endpoints and their request/response structures:
For account management, endpoints like POST /users (to register a new user) and POST /auth/login (to obtain a token) are needed. Once logged in, GET /users/{userId} returns profile information (name, age, etc.) and user settings. We might integrate with OAuth providers (Google, Apple) for login to simplify authentication. The profile data is small but important to get right (including linking to wearable accounts if needed).
The core endpoint is POST /activities for uploading a completed workout. The request body contains the activity data – e.g.:
{
"type": "running",
"start_time": "2025-04-12T14:30:00Z",
"duration": 3600, // seconds
"distance": 10000, // in meters
"calories": 600,
"data_points": [
// optional array of detailed track points or sensor readings
// (could be simplified or omitted if data is uploaded as a file or separate stream)
{"time":0, "lat":43.7000, "lng":-79.4000, "heart_rate":90},
{"time":1, "lat":43.7005, "lng":-79.3995, "heart_rate":91},
/* ... */
{"time":3600, "lat":43.7500, "lng":-79.3500, "heart_rate":150}
]
}

The server will respond with a confirmation and the new activity’s ID, e.g. 201 Created with body {"activity_id": "abcd1234", "status": "saved"}. In practice, for efficiency, the data_points array might be compressed or sent in a separate request (or via a file upload to cloud storage) if it’s very large. An alternative design is a two-step process: first call POST /activities/start when a user begins an activity (to create a placeholder and perhaps enable live tracking), then stream data, and finally call /activities/{id}/finish to finalize with summary stats. However, to keep the initial design simpler, we assume the app will upload activities in one go at the end of the workout (this is common for many fitness apps).
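As a rough illustration of this single-request upload path, a client might send the workout as follows. This is only a sketch: the Idempotency-Key header and the retry policy are assumptions (revisited under error handling below), not part of the API contract defined here.

```python
import time
import uuid

import requests  # assumed HTTP client; any client library would do

API_BASE = "https://api.example.com"  # placeholder base URL


def upload_activity(token: str, summary: dict, data_points: list) -> dict:
    """Upload a finished workout in one request, retrying safely on server errors."""
    # Hypothetical idempotency key: generated once per workout so that a retry
    # after a timeout or 5xx cannot create a duplicate activity on the server.
    idempotency_key = str(uuid.uuid4())
    body = {**summary, "data_points": data_points}
    headers = {
        "Authorization": f"Bearer {token}",
        "Idempotency-Key": idempotency_key,  # assumed header name
    }

    for attempt in range(3):
        resp = requests.post(f"{API_BASE}/activities", json=body,
                             headers=headers, timeout=30)
        if resp.status_code == 201:
            return resp.json()  # e.g. {"activity_id": "abcd1234", "status": "saved"}
        if resp.status_code < 500:
            resp.raise_for_status()  # client error (4xx): do not retry
        time.sleep(2 ** attempt)     # transient server error: back off and retry
    resp.raise_for_status()
```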
Users will want to view past workouts.
GET /activities/{id} returns detailed information for a specific activity (including all metrics, and perhaps an encoded route map polyline or a link to download the GPS track).
For example:
{
"activity_id": "abcd1234",
"user_id": "u123",
"type": "running",
"start_time": "2025-04-12T14:30:00Z",
"duration": 3600,
"distance": 10000,
"calories": 600,
"route_polyline": "mjifF|`miObEe@...",
"average_pace": 360,
"heart_rate_avg": 130,
"heart_rate_max": 155,
"points_count": 3610,
"created_at": "...",
"updated_at": "..."
}

The polyline is an encoded string representing the GPS path to reduce size.
If the user took photos or notes, URLs or text can be included.
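For reference, the route_polyline value can be produced with the standard encoded-polyline algorithm: store deltas between consecutive fixes, scale them to five decimal places, and pack them into printable ASCII. A minimal encoder sketch:

```python
def _encode_value(value: int) -> str:
    """Encode one signed integer using the standard polyline algorithm."""
    value = ~(value << 1) if value < 0 else (value << 1)
    chunks = []
    while value >= 0x20:
        chunks.append((0x20 | (value & 0x1F)) + 63)
        value >>= 5
    chunks.append(value + 63)
    return "".join(chr(c) for c in chunks)


def encode_polyline(points: list[tuple[float, float]]) -> str:
    """Encode (lat, lng) pairs as deltas scaled by 1e5 (~1 m precision)."""
    encoded, prev_lat, prev_lng = [], 0, 0
    for lat, lng in points:
        lat_i, lng_i = round(lat * 1e5), round(lng * 1e5)
        encoded.append(_encode_value(lat_i - prev_lat))
        encoded.append(_encode_value(lng_i - prev_lng))
        prev_lat, prev_lng = lat_i, lng_i
    return "".join(encoded)


print(encode_polyline([(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]))
# -> "_p~iF~ps|U_ulLnnqC_mqNvxq`@"  (the canonical example from the polyline spec)
```

At this precision a one-hour track encodes to a few tens of kilobytes, roughly in line with the 10–50 KB per track estimated earlier.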
To list recent activities:
GET /users/{userId}/activities?limit=50&offset=0
This returns an array of summary objects.
The app allows setting and monitoring goals.
POST /goals example:
{
"user_id": "u123",
"goal_type": "weekly_distance",
"target_value": 20000,
"start_date": "2025-04-01",
"end_date": "2025-04-30"
}

This defines a 20 km weekly running goal for April 2025.
To retrieve goals:
GET /users/{userId}/goals
Example goal response:
{
"goal_id": "g789",
"goal_type": "weekly_distance",
"target_value": 20000,
"period": "weekly",
"start_date": "2025-04-01",
"end_date": "2025-04-30",
"current_period_progress": 12000,
"current_period_start": "2025-04-06",
"current_period_end": "2025-04-12",
"achieved": false
}

To support “progress over time”, we may offer:
GET /users/{userId}/stats
This returns aggregated statistics such as:
- year-to-date distance
- lifetime workouts
- streaks
- personal bests
Or time-series queries like:
GET /users/{id}/progress?metric=distance&period=weekly
All APIs:
- use HTTP status codes (200, 201, 400, 401, etc.)
- return JSON
- may later support API versioning or GraphQL
Security includes token verification and rate limiting.
The data model includes: user profiles, activity records, detailed sensor logs, goals, and analytics.
A Users table stores:
- user_id (PK)
- name
- email
- password_hash
- preferences
- linked device info
Indexed by email for login.
Other entities reference user_id.
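Because the table stores password_hash rather than a raw password, registration would run the password through a slow, salted key-derivation function. A minimal sketch using only the standard library (the storage format string is an assumption):

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 200_000) -> str:
    """Derive a salted PBKDF2 hash suitable for the password_hash column."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```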
Each workout session is represented as an Activity record.
Attributes:
- activity_id
- user_id
- type
- start_time
- duration, distance, calories
- avg/max metrics
- path_storage_key → pointer to where the detailed data lives

There are several options for storing the detailed sensor data:
1. Embedded blob in SQL
Simple but row becomes large.
2. Separate time-series table
One row per point → billions of rows → high write load.
3. NoSQL / time-series DB (recommended)
E.g., Cassandra or ScyllaDB.
Partition by activity_id or user_id for efficient retrieval.
Hybrid approach (best practice):
- SQL DB → activity summaries
- NoSQL or object storage → detailed GPS/sensor logs
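A sketch of the detail-data half of this hybrid write: the raw points are compressed and written to object storage, and the resulting key is what the activity row records as path_storage_key. The bucket/key layout and the storage client interface are assumptions:

```python
import gzip
import json


def store_detail_points(storage, user_id: str, activity_id: str,
                        data_points: list[dict]) -> str:
    """Compress the raw sensor stream and write it to object storage.

    `storage` is any object-store client exposing put(key, bytes); the
    returned key is what the Activities row saves as path_storage_key.
    """
    key = f"activity-detail/{user_id}/{activity_id}.json.gz"  # assumed layout
    payload = gzip.compress(json.dumps(data_points).encode("utf-8"))
    storage.put(key, payload)
    return key
```

If a later step fails after this write, the orphaned object can be cleaned up (or the summary row repaired) by the asynchronous compensation described in the error-handling flow below.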
Example schema:
Users(user_id PK, name, email, ...)
Activities(activity_id PK, user_id FK, type, start_time, duration, ..., path_storage_key)
Goals(goal_id PK, user_id FK, goal_type, target_value, period, start_date, end_date)
ActivityTotals(user_id, week_start_date, total_distance, ...)
Large GPS logs stored in S3 or NoSQL.
We may maintain:
- cumulative totals
- streaks
- weekly progress
- personal records
Stored in a UserStats table or Redis for fast access.
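A sketch of how these running totals and streaks could be kept hot in Redis whenever an activity event is processed. The key scheme and field names are assumptions; redis-py is used only for illustration:

```python
from datetime import date, timedelta

import redis  # redis-py, for illustration

r = redis.Redis()


def apply_activity(user_id: str, activity_day: date, distance_m: float) -> None:
    """Update cached weekly totals and the daily streak for one new activity."""
    week_start = activity_day - timedelta(days=activity_day.weekday())
    week_key = f"stats:{user_id}:week:{week_start.isoformat()}"  # assumed key scheme
    r.hincrbyfloat(week_key, "total_distance", distance_m)
    r.hincrby(week_key, "workouts", 1)
    r.expire(week_key, 60 * 60 * 24 * 14)  # keep two weeks of hot data

    streak_key = f"stats:{user_id}:streak"
    last = r.hget(streak_key, "last_day")
    if last is not None and last.decode() == (activity_day - timedelta(days=1)).isoformat():
        r.hincrby(streak_key, "days", 1)   # extends yesterday's streak
    elif last is None or last.decode() != activity_day.isoformat():
        r.hset(streak_key, "days", 1)      # first activity, or the streak restarts
    r.hset(streak_key, "last_day", activity_day.isoformat())
```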
Old detailed logs may be archived to cheaper storage.
The database design is polyglot:
- Relational DB for core entities
- NoSQL/time-series for massive telemetry
- Cache for performance
- Object storage for files
Architecture follows distributed microservices, including:
- Mobile App (sensor collection, local buffering)
- API Gateway / Load Balancer
- User Service
- Activity Service
- Goal Service
- Analytics Service
- Notification Service
- Integration Services (Apple Health, Garmin)
- Relational DB + NoSQL + Redis
- Message Queue / Event Bus
- Object Storage
All backend services are stateless and horizontally scalable.
- SQL DB → summaries, users, goals
- NoSQL → GPS/sensor time-series
- S3 → GPX files, route maps
- Redis → caching
The message queue / event bus is used for:
- activity_created events
- goal updates
- analytics updates
- notifications
Integration services and external APIs are used for:
- maps
- elevation profiles
- wearable sync
- exporting data
Mobile App
Wearables / GPS
↓
API Gateway
↓
---------------------------------------------------
| Auth | Activity | Goals | Analytics | Notification |
---------------------------------------------------
↓
---------------------- Storage Layer ----------------------
| Relational DB | NoSQL Time-Series | Redis | Object Store |
------------------------------------------------------------
↓
Message Queue / Event Bus
- User taps “Start Run”
- Phone collects GPS + heart rate
- Stores locally until upload
User taps “Save Workout” →
App sends POST /activities with summary + data.
- API Gateway forwards request
- Activity Service:
- validates
- processes GPS/heart-rate data
- computes metrics
- encodes polyline
- Stores in SQL (summary)
- Stores detailed data in NoSQL / S3
- Emits ActivityCreated event
- Responds 201 to client
- Goal Service updates progress
- Analytics Service updates stats, streaks
- Notification Service may trigger a push
Client may then call GET /activities/{id} or GET /users/{id}/stats.
A streaming channel (WebSocket / MQTT) can be added later but is not required for this flow.
If server errors:
- App retries
- Must avoid duplicate activities → use idempotent client-generated IDs (see the sketch below)
- Incomplete storage must be repaired asynchronously
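On the server side, duplicate suppression can hinge on that client-generated ID: the Activity Service looks it up first, and a replayed upload returns the already-stored record instead of inserting a second one. A sketch with a hypothetical store interface:

```python
def create_activity(store, user_id: str, idempotency_key: str, payload: dict) -> dict:
    """Insert an activity at most once per (user_id, idempotency_key) pair.

    `store` is a hypothetical persistence interface; in SQL this would be backed
    by a unique index on (user_id, idempotency_key), which also catches the race
    where two retries of the same upload arrive concurrently.
    """
    existing = store.find_by_key(user_id, idempotency_key)
    if existing is not None:
        return existing  # a retry of an upload we already processed

    return store.insert(user_id=user_id,
                        idempotency_key=idempotency_key,
                        **payload)
```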
POST /goals → Goal Service writes record.
When ActivityCreated event arrives:
- Goal Service checks if activity falls within goal window
- Updates progress for current period
- Emits GoalAchieved event if target met
Users fetch progress via GET /users/{id}/goals.
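A sketch of that event handler, assuming goal records carry the fields shown in the goals API example and that goals_repo / event_bus are simple hypothetical interfaces:

```python
from datetime import date


def on_activity_created(goals_repo, event_bus, event: dict) -> None:
    """Fold one finished activity into any goals whose window covers it."""
    user_id = event["user_id"]
    activity_day = date.fromisoformat(event["start_time"][:10])
    distance = event.get("distance", 0)

    for goal in goals_repo.active_goals(user_id):
        # start_date / end_date are assumed to be date objects here.
        if not (goal["start_date"] <= activity_day <= goal["end_date"]):
            continue
        goal["current_period_progress"] += distance
        if not goal["achieved"] and goal["current_period_progress"] >= goal["target_value"]:
            goal["achieved"] = True
            event_bus.publish("GoalAchieved",
                              {"user_id": user_id, "goal_id": goal["goal_id"]})
        goals_repo.save(goal)
```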
Streaks handled in Analytics Service.
Components:
- API Handler
- Data Processing (cleanup, metrics, polyline)
- Storage Manager
- Event Emitter
- Response Generator
Data processing:
- GPS smoothing
- distance computed via the haversine formula (see the sketch after this list)
- optional elevation lookup
- heart rate smoothing
- lap detection
- polyline encoding
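The distance computation referenced above is the haversine great-circle formula applied to consecutive GPS fixes; a minimal version:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two GPS fixes."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def track_distance_m(points: list[dict]) -> float:
    """Total track distance by summing consecutive point-to-point segments."""
    return sum(
        haversine_m(a["lat"], a["lng"], b["lat"], b["lng"])
        for a, b in zip(points, points[1:])
    )
```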
Storage manager:
- SQL insert summary
- NoSQL/S3 insert data points
- Retry logic + compensation if partial failure
Event emitter:
- activity_created published
- reachable by Goal / Analytics / Notification services
- server computes distance & elevation
- polyline encoding for compact storage
- optional static map generation
- supports future geo features (segments, heatmaps)
Hybrid approach:
- on-demand for heavy/rare queries
- precomputed for weekly totals, streaks, PRs
Stats stored in:
- a relational UserStats table
- or Redis for fast reads
The Analytics Service also handles late uploads, recalculation, and consistency.
- Batch upload simplifies design
- WebSocket/MQTT optional for live tracking
- Redis used for stats, profiles, session data
- Short TTLs to keep consistency
- LB removes bad node
- retry logic
- idempotent uploads
- replicas & failover
- partitioning/sharding
- write hotspots
- autoscaling
- queue buffering
- batching writes
- local buffering on device
- late uploads handled gracefully
- async processing
- fallback logic
- performance tuning
- adding warehouse for analytics
- ML-based recommendations
- advanced map/segment features
- serverless event processors