- Users can track running, cycling, and weightlifting activities.
- Activities can be logged manually or captured automatically via wearable devices and GPS.
- The system records detailed workout data such as GPS coordinates and heart rate.
- The system tracks user consistency (e.g., weekly streaks, frequency).
- Users receive feedback on performance and progress.
- Workout data syncs across devices.
- Data can be imported from or exported to third-party platforms (e.g., Apple Health, Garmin).
- Notifications are sent for goal milestones, reminders, or inactivity.
- High availability: the app must be accessible at all times, especially during peak hours (mornings/evenings).
- Real-time ingestion: GPS and biometric data should be uploaded and processed with minimal delay.
- Horizontal scalability to handle millions of users and billions of data points per year.
- Support for intermittent connectivity: app should buffer data offline and sync later.
- Strong data integrity: no loss or duplication of recorded activities.
- Low-latency response for key APIs (<1s for most user interactions).
Let’s assume a moderate success scenario for the app: on the order of 10 million registered users within a few years.
Suppose about 1 million users are active daily (doing at least one tracked activity per day).
For activity tracking, the volume of data points can be significant. A running or cycling workout can easily last 30–60 minutes. If the app samples GPS and sensor data every second, a one-hour run produces ~3600 data points (GPS coordinates, speed, heart rate, etc.).
As a rough average, assume 1000 data points per workout (some shorter or indoor workouts produce far fewer).
If 1 million daily active users each log one activity, that’s roughly 1,000,000 activities per day, resulting in about 1 billion sensor readings per day generated by all users in total.
However, it’s likely not everyone exercises daily; we might have around 500k activities/day on weekdays and more on weekends. Even so, the system should handle peak loads of perhaps 50,000–100,000 simultaneous active workouts during popular times (mornings, evenings).
In terms of throughput, if 500k activities are finished per day, and each results in an upload to the server, that is about 5–6 uploads per second on average. Peaks could be higher (if many people finish runs around the hour). Additionally, if real-time syncing is used, continuous data streams from tens of thousands of active users could mean tens of thousands of small updates per second. We must design for high write throughput to ingest this data.
For storage capacity: each activity record might include a summary (a few hundred bytes: user ID, timestamps, totals, etc.) plus the detailed path or sensor stream.
If stored in compressed form (e.g. an encoded route polyline or compressed JSON), a typical run’s GPS track might be 10–50 KB. Weightlifting logs (sets/reps) are smaller, maybe a few KB.
Assuming an average of 20 KB of detailed data per activity, 500k activities/day would yield ~10 GB/day of new data, or about 3.6 TB per year just for the detailed sensor logs. Over five years, this could accumulate to tens of terabytes of historical data.
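These figures can be sanity-checked with simple arithmetic; the inputs below are the assumptions stated above, not measurements:

```python
# Back-of-envelope estimate using the assumptions stated above.
DAILY_ACTIVITIES = 500_000            # typical weekday activity volume
POINTS_PER_ACTIVITY = 1_000           # average sensor readings per workout
DETAIL_BYTES_PER_ACTIVITY = 20_000    # ~20 KB of compressed detail per activity

uploads_per_second = DAILY_ACTIVITIES / 86_400
points_per_day = DAILY_ACTIVITIES * POINTS_PER_ACTIVITY
detail_gb_per_day = DAILY_ACTIVITIES * DETAIL_BYTES_PER_ACTIVITY / 1e9
detail_tb_per_year = detail_gb_per_day * 365 / 1_000

print(f"{uploads_per_second:.1f} uploads/s on average")    # ~5.8
print(f"{points_per_day:,} sensor points/day")             # 500,000,000
print(f"{detail_gb_per_day:.0f} GB/day of detailed logs")  # ~10
print(f"{detail_tb_per_year:.2f} TB/year")                 # ~3.65
```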
We will expose a set of well-defined RESTful API endpoints (and possibly accompanying WebSocket or gRPC streams for live data) to allow the mobile apps (or any client, including third-party integrations) to interact with the system. All API calls will require authentication (e.g. OAuth 2.0 bearer tokens or an equivalent, since data is personal). Here we describe key endpoints and their request/response structures:
For account management, endpoints like POST /users (to register a new user) and POST /auth/login (to obtain a token) are needed. Once logged in, GET /users/{userId} returns profile information (name, age, etc.) and user settings. We might integrate with OAuth providers (Google, Apple) for login to simplify authentication. The profile data is small but important to get right (including linking to wearable accounts if needed).
The core endpoint is POST /activities for uploading a completed workout. The request body contains the activity data – e.g.:
{
"type": "running",
"start_time": "2025-04-12T14:30:00Z",
"duration": 3600, // seconds
"distance": 10000, // in meters
"calories": 600,
"data_points": [
// optional array of detailed track points or sensor readings
// (could be simplified or omitted if data is uploaded as a file or separate stream)
{"time":0, "lat":43.7000, "lng":-79.4000, "heart_rate":90},
{"time":1, "lat":43.7005, "lng":-79.3995, "heart_rate":91},
/* ... */
{"time":3600, "lat":43.7500, "lng":-79.3500, "heart_rate":150}
]
}

The server will respond with a confirmation and the new activity’s ID, e.g. 201 Created with body {"activity_id": "abcd1234", "status": "saved"}. In practice, for efficiency, the data_points array might be compressed or sent in a separate request (or via a file upload to cloud storage) if it’s very large. An alternative design is a two-step process: first call POST /activities/start when a user begins an activity (to create a placeholder and perhaps enable live tracking), then stream data, and finally call /activities/{id}/finish to finalize with summary stats. However, to keep the initial design simpler, we assume the app will upload activities in one go at the end of the workout (this is common for many fitness apps).
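As a rough illustration of this single-request upload path, a client might send the workout as follows. This is only a sketch: the Idempotency-Key header and the retry policy are assumptions (revisited under error handling below), not part of the API contract defined here.

```python
import time
import uuid

import requests  # assumed HTTP client; any client library would do

API_BASE = "https://api.example.com"  # placeholder base URL


def upload_activity(token: str, summary: dict, data_points: list) -> dict:
    """Upload a finished workout in one request, retrying safely on server errors."""
    # Hypothetical idempotency key: generated once per workout so that a retry
    # after a timeout or 5xx cannot create a duplicate activity on the server.
    idempotency_key = str(uuid.uuid4())
    body = {**summary, "data_points": data_points}
    headers = {
        "Authorization": f"Bearer {token}",
        "Idempotency-Key": idempotency_key,  # assumed header name
    }

    for attempt in range(3):
        resp = requests.post(f"{API_BASE}/activities", json=body,
                             headers=headers, timeout=30)
        if resp.status_code == 201:
            return resp.json()  # e.g. {"activity_id": "abcd1234", "status": "saved"}
        if resp.status_code < 500:
            resp.raise_for_status()  # client error (4xx): do not retry
        time.sleep(2 ** attempt)     # transient server error: back off and retry
    resp.raise_for_status()
```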
Users will want to view past workouts.
GET /activities/{id} returns detailed information for a specific activity (including all metrics, and perhaps an encoded route map polyline or a link to download the GPS track).
For example:
{
"activity_id": "abcd1234",
"user_id": "u123",
"type": "running",
"start_time": "2025-04-12T14:30:00Z",
"duration": 3600,
"distance": 10000,
"calories": 600,
"route_polyline": "mjifF|`miObEe@...",
"average_pace": 360,
"heart_rate_avg": 130,
"heart_rate_max": 155,
"points_count": 3610,
"created_at": "...",
"updated_at": "..."
}

The polyline is an encoded string representing the GPS path to reduce size.
If the user took photos or notes, URLs or text can be included.
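For reference, the route_polyline value can be produced with the standard encoded-polyline algorithm: store deltas between consecutive fixes, scale them to five decimal places, and pack them into printable ASCII. A minimal encoder sketch:

```python
def _encode_value(value: int) -> str:
    """Encode one signed integer using the standard polyline algorithm."""
    value = ~(value << 1) if value < 0 else (value << 1)
    chunks = []
    while value >= 0x20:
        chunks.append((0x20 | (value & 0x1F)) + 63)
        value >>= 5
    chunks.append(value + 63)
    return "".join(chr(c) for c in chunks)


def encode_polyline(points: list[tuple[float, float]]) -> str:
    """Encode (lat, lng) pairs as deltas scaled by 1e5 (~1 m precision)."""
    encoded, prev_lat, prev_lng = [], 0, 0
    for lat, lng in points:
        lat_i, lng_i = round(lat * 1e5), round(lng * 1e5)
        encoded.append(_encode_value(lat_i - prev_lat))
        encoded.append(_encode_value(lng_i - prev_lng))
        prev_lat, prev_lng = lat_i, lng_i
    return "".join(encoded)


print(encode_polyline([(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]))
# -> "_p~iF~ps|U_ulLnnqC_mqNvxq`@"  (the canonical example from the polyline spec)
```

At this precision a one-hour track encodes to a few tens of kilobytes, roughly in line with the 10–50 KB per track estimated earlier.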
To list recent activities:
GET /users/{userId}/activities?limit=50&offset=0
This returns an array of summary objects.
The app allows setting and monitoring goals.
POST /goals example:
{
"user_id": "u123",
"goal_type": "weekly_distance",
"target_value": 20000,
"start_date": "2025-04-01",
"end_date": "2025-04-30"
}

This defines a 20 km weekly running goal for April 2025.
To retrieve goals:
GET /users/{userId}/goals
Example goal response:
{
"goal_id": "g789",
"goal_type": "weekly_distance",
"target_value": 20000,
"period": "weekly",
"start_date": "2025-04-01",
"end_date": "2025-04-30",
"current_period_progress": 12000,
"current_period_start": "2025-04-06",
"current_period_end": "2025-04-12",
"achieved": false
}

To support “progress over time”, we may offer:
GET /users/{userId}/stats
This returns aggregated statistics such as:
- year-to-date distance
- lifetime workouts
- streaks
- personal bests
Or time-series queries like:
GET /users/{id}/progress?metric=distance&period=weekly
All APIs:
- use HTTP status codes (200, 201, 400, 401, etc.)
- return JSON
- may later support API versioning or GraphQL
Security includes token verification and rate limiting.
The data model includes: user profiles, activity records, detailed sensor logs, goals, and analytics.
A Users table stores:
- user_id (PK)
- name
- email
- password_hash
- preferences
- linked device info
Indexed by email for login.
Other entities reference user_id.
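Because the table stores password_hash rather than a raw password, registration would run the password through a slow, salted key-derivation function. A minimal sketch using only the standard library (the storage format string is an assumption):

```python
import hashlib
import hmac
import os


def hash_password(password: str, iterations: int = 200_000) -> str:
    """Derive a salted PBKDF2 hash suitable for the password_hash column."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    _, iterations, salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(digest.hex(), digest_hex)
```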
Each workout session is represented as an Activity record.
Attributes:
- activity_id
- user_id
- type
- start_time
- duration, distance, calories
- avg/max metrics
- path_storage_key → pointer to where the detailed data lives

There are several options for storing the detailed sensor data:
1. Embedded blob in SQL
Simple but row becomes large.
2. Separate time-series table
One row per point → billions of rows → high write load.
3. NoSQL / time-series DB (recommended)
E.g., Cassandra or ScyllaDB.
Partition by activity_id or user_id for efficient retrieval.
Hybrid approach (best practice):
- SQL DB → activity summaries
- NoSQL or object storage → detailed GPS/sensor logs
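A sketch of the detail-data half of this hybrid write: the raw points are compressed and written to object storage, and the resulting key is what the activity row records as path_storage_key. The bucket/key layout and the storage client interface are assumptions:

```python
import gzip
import json


def store_detail_points(storage, user_id: str, activity_id: str,
                        data_points: list[dict]) -> str:
    """Compress the raw sensor stream and write it to object storage.

    `storage` is any object-store client exposing put(key, bytes); the
    returned key is what the Activities row saves as path_storage_key.
    """
    key = f"activity-detail/{user_id}/{activity_id}.json.gz"  # assumed layout
    payload = gzip.compress(json.dumps(data_points).encode("utf-8"))
    storage.put(key, payload)
    return key
```

If a later step fails after this write, the orphaned object can be cleaned up (or the summary row repaired) by the asynchronous compensation described in the error-handling flow below.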
Example schema:
Users(user_id PK, name, email, ...)
Activities(activity_id PK, user_id FK, type, start_time, duration, ..., path_storage_key)
Goals(goal_id PK, user_id FK, goal_type, target_value, period, start_date, end_date)
ActivityTotals(user_id, week_start_date, total_distance, ...)
Large GPS logs stored in S3 or NoSQL.
We may maintain:
- cumulative totals
- streaks
- weekly progress
- personal records
Stored in a UserStats table or Redis for fast access.
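A sketch of how these running totals and streaks could be kept hot in Redis whenever an activity event is processed. The key scheme and field names are assumptions; redis-py is used only for illustration:

```python
from datetime import date, timedelta

import redis  # redis-py, for illustration

r = redis.Redis()


def apply_activity(user_id: str, activity_day: date, distance_m: float) -> None:
    """Update cached weekly totals and the daily streak for one new activity."""
    week_start = activity_day - timedelta(days=activity_day.weekday())
    week_key = f"stats:{user_id}:week:{week_start.isoformat()}"  # assumed key scheme
    r.hincrbyfloat(week_key, "total_distance", distance_m)
    r.hincrby(week_key, "workouts", 1)
    r.expire(week_key, 60 * 60 * 24 * 14)  # keep two weeks of hot data

    streak_key = f"stats:{user_id}:streak"
    last = r.hget(streak_key, "last_day")
    if last is not None and last.decode() == (activity_day - timedelta(days=1)).isoformat():
        r.hincrby(streak_key, "days", 1)   # extends yesterday's streak
    elif last is None or last.decode() != activity_day.isoformat():
        r.hset(streak_key, "days", 1)      # first activity, or the streak restarts
    r.hset(streak_key, "last_day", activity_day.isoformat())
```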
Old detailed logs may be archived to cheaper storage.
The database design is polyglot:
- Relational DB for core entities
- NoSQL/time-series for massive telemetry
- Cache for performance
- Object storage for files
Architecture follows distributed microservices, including:
- Mobile App (sensor collection, local buffering)
- API Gateway / Load Balancer
- User Service
- Activity Service
- Goal Service
- Analytics Service
- Notification Service
- Integration Services (Apple Health, Garmin)
- Relational DB + NoSQL + Redis
- Message Queue / Event Bus
- Object Storage
All backend services are stateless and horizontally scalable.
- SQL DB → summaries, users, goals
- NoSQL → GPS/sensor time-series
- S3 → GPX files, route maps
- Redis → caching
The message queue / event bus is used for:
- activity_created events
- goal updates
- analytics updates
- notifications
Integration services and external APIs are used for:
- maps
- elevation profiles
- wearable sync
- exporting data
Mobile App
Wearables / GPS
↓
API Gateway
↓
---------------------------------------------------
| Auth | Activity | Goals | Analytics | Notification |
---------------------------------------------------
↓
---------------------- Storage Layer ----------------------
| Relational DB | NoSQL Time-Series | Redis | Object Store |
------------------------------------------------------------
↓
Message Queue / Event Bus
- User taps “Start Run”
- Phone collects GPS + heart rate
- Stores locally until upload
User taps “Save Workout” →
App sends POST /activities with summary + data.
- API Gateway forwards request
- Activity Service:
- validates
- processes GPS/heart-rate data
- computes metrics
- encodes polyline
- Stores in SQL (summary)
- Stores detailed data in NoSQL / S3
- Emits ActivityCreated event
- Responds 201 to client
- Goal Service updates progress
- Analytics Service updates stats, streaks
- Notification Service may trigger a push
Client may then call GET /activities/{id} or GET /users/{id}/stats.
A streaming channel (WebSocket / MQTT) can be added later but is not required for this flow.
If server errors:
- App retries
- Must avoid duplicate activities → use idempotent client-generated IDs (see the sketch below)
- Incomplete storage must be repaired asynchronously
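On the server side, duplicate suppression can hinge on that client-generated ID: the Activity Service looks it up first, and a replayed upload returns the already-stored record instead of inserting a second one. A sketch with a hypothetical store interface:

```python
def create_activity(store, user_id: str, idempotency_key: str, payload: dict) -> dict:
    """Insert an activity at most once per (user_id, idempotency_key) pair.

    `store` is a hypothetical persistence interface; in SQL this would be backed
    by a unique index on (user_id, idempotency_key), which also catches the race
    where two retries of the same upload arrive concurrently.
    """
    existing = store.find_by_key(user_id, idempotency_key)
    if existing is not None:
        return existing  # a retry of an upload we already processed

    return store.insert(user_id=user_id,
                        idempotency_key=idempotency_key,
                        **payload)
```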
POST /goals → Goal Service writes record.
When ActivityCreated event arrives:
- Goal Service checks if activity falls within goal window
- Updates progress for current period
- Emits GoalAchieved event if target met
Users fetch progress via GET /users/{id}/goals.
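A sketch of that event handler, assuming goal records carry the fields shown in the goals API example and that goals_repo / event_bus are simple hypothetical interfaces:

```python
from datetime import date


def on_activity_created(goals_repo, event_bus, event: dict) -> None:
    """Fold one finished activity into any goals whose window covers it."""
    user_id = event["user_id"]
    activity_day = date.fromisoformat(event["start_time"][:10])
    distance = event.get("distance", 0)

    for goal in goals_repo.active_goals(user_id):
        # start_date / end_date are assumed to be date objects here.
        if not (goal["start_date"] <= activity_day <= goal["end_date"]):
            continue
        goal["current_period_progress"] += distance
        if not goal["achieved"] and goal["current_period_progress"] >= goal["target_value"]:
            goal["achieved"] = True
            event_bus.publish("GoalAchieved",
                              {"user_id": user_id, "goal_id": goal["goal_id"]})
        goals_repo.save(goal)
```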
Streaks handled in Analytics Service.
Components:
- API Handler
- Data Processing (cleanup, metrics, polyline)
- Storage Manager
- Event Emitter
- Response Generator
Data processing:
- GPS smoothing
- distance computed via the haversine formula (see the sketch after this list)
- optional elevation lookup
- heart rate smoothing
- lap detection
- polyline encoding
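The distance computation referenced above is the haversine great-circle formula applied to consecutive GPS fixes; a minimal version:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two GPS fixes."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def track_distance_m(points: list[dict]) -> float:
    """Total track distance by summing consecutive point-to-point segments."""
    return sum(
        haversine_m(a["lat"], a["lng"], b["lat"], b["lng"])
        for a, b in zip(points, points[1:])
    )
```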
Storage manager:
- SQL insert summary
- NoSQL/S3 insert data points
- Retry logic + compensation if partial failure
Event emitter:
- activity_created published
- reachable by Goal / Analytics / Notification services
- server computes distance & elevation
- polyline encoding for compact storage
- optional static map generation
- supports future geo features (segments, heatmaps)
Hybrid approach:
- on-demand for heavy/rare queries
- precomputed for weekly totals, streaks, PRs
Stats stored in:
- a relational UserStats table
- or Redis for fast reads
The Analytics Service also handles late uploads, recalculation, and consistency.
- Batch upload simplifies design
- WebSocket/MQTT optional for live tracking
- Redis used for stats, profiles, session data
- Short TTLs to keep consistency
- LB removes bad node
- retry logic
- idempotent uploads
- replicas & failover
- partitioning/sharding
- write hotspots
- autoscaling
- queue buffering
- batching writes
- local buffering on device
- late uploads handled gracefully
- async processing
- fallback logic
- performance tuning
- adding warehouse for analytics
- ML-based recommendations
- advanced map/segment features
- serverless event processors