Madrid Metro Pulse is an advanced data engineering and machine learning project designed to analyze, visualize, and forecast urban mobility patterns in Madrid. By integrating real-time data from the Empresa Municipal de Transportes de Madrid (EMT) with historical pedestrian traffic data, this project provides a dynamic and insightful view into the city's pulse.
At its core, the project leverages a suite of hyper-local time-series models, trained using Meta's Prophet library, to deliver 48-hour pedestrian demand forecasts for hundreds of specific locations. Fused with live bus transit information and rendered on an interactive Mapbox interface, the dashboard offers a unique tool for urban planning, operational logistics, and public transit optimization. This project serves as a comprehensive, end-to-end demonstration of a production-ready data science application, from automated data ingestion and model training to a sophisticated, interactive user dashboard.
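As a rough illustration of the per-location forecasting idea, the sketch below fits one Prophet model to a single sensor's hourly counts and requests the next 48 hours. It is not the project's `train_local_models.py`: the function names, the `(timestamp, count)` input shape, and the seasonality settings are assumptions made for the example. pandas and Prophet are imported inside `forecast_sensor` so the data-shaping helper can be used without them.

```python
def to_prophet_rows(readings):
    """Convert (timestamp, count) pairs into Prophet's ds/y row format.

    Prophet expects a `ds` (datestamp) column and a `y` (value) column.
    Missing counts are dropped and rows are sorted by time.
    """
    rows = [
        {"ds": timestamp, "y": float(count)}
        for timestamp, count in readings
        if count is not None
    ]
    return sorted(rows, key=lambda row: row["ds"])


def forecast_sensor(readings, horizon_hours=48):
    """Fit one Prophet model for one sensor and forecast `horizon_hours` ahead."""
    # Imported here so the helper above works without these packages installed.
    import pandas as pd
    from prophet import Prophet

    history = pd.DataFrame(to_prophet_rows(readings))
    model = Prophet(daily_seasonality=True, weekly_seasonality=True)
    model.fit(history)

    # Extend the timeline by the forecast horizon at hourly resolution.
    future = model.make_future_dataframe(periods=horizon_hours, freq="h")
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail(horizon_hours)
```

Repeating `forecast_sensor` once per sensor gives one independent model per location, which is the "hyper-local" part of the design: each sensor learns its own daily and weekly rhythm instead of sharing a city-wide average.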
- Automated Data Ingestion: Includes a standalone script to perform a one-time, robust fetch of all static bus and line data from the official EMT MobilityLabs API, creating a reliable local database.
- Hyper-Local Time-Series Forecasting: A dedicated script trains hundreds of individual Prophet models, one for each pedestrian sensor, providing granular 48-hour demand forecasts for specific city locations.
- Hybrid Analysis Dashboard: The Streamlit application presents a dual view for any selected location: a historical analysis of typical 24-hour demand patterns and a live 48-hour predictive forecast.
- Geospatial Visualization: Integrates with the Mapbox API to render an interactive map of bus routes, dynamically highlighting selected stops and visualizing the predicted demand intensity with a color-coded halo.
- Dockerized for Portability: The entire application is containerized with Docker, ensuring a consistent and reproducible environment for setup and deployment.
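The "typical 24-hour demand pattern" in the historical view can be thought of as an average count per hour of day. A minimal sketch, assuming readings for a single sensor arrive as `(timestamp, count)` pairs (the function name and input shape are illustrative and not taken from `app.py`):

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def typical_daily_profile(readings):
    """Return the mean pedestrian count for each hour of day (0-23).

    Hours with no observations map to None rather than 0, so a gap in
    the data is not mistaken for an empty street.
    """
    by_hour = defaultdict(list)
    for timestamp, count in readings:
        by_hour[timestamp.hour].append(count)
    return {
        hour: (mean(by_hour[hour]) if hour in by_hour else None)
        for hour in range(24)
    }


if __name__ == "__main__":
    sample = [
        (datetime(2024, 3, 4, 8, 0), 120),
        (datetime(2024, 3, 5, 8, 0), 100),
        (datetime(2024, 3, 4, 14, 0), 60),
    ]
    profile = typical_daily_profile(sample)
    print(profile[8], profile[14], profile[3])
```

Plotting this profile next to the 48-hour forecast is one way to produce the dual view: the profile shows what is normal for the location, and the forecast shows what is expected next.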
Before you begin, create free accounts with the following two services to obtain the necessary API credentials:
- EMT Madrid: An account with access to the MobilityLabs API. You can register at https://mobilitylabs.emtmadrid.es/. This will provide you with a Client ID (your email) and a Passkey.
- Mapbox: A Mapbox account to generate a public access token for rendering maps. You can sign up and find your token in your account console at https://www.mapbox.com/.
Running with Docker is the simplest way to get the application running in a consistent environment.
1. Clone the Repository:

       git clone https://github.com/eduardoruiz1990/madrid-metro-pulse.git
       cd madrid-metro-pulse

2. Create Your Secrets File:
   Create a file named `secrets.toml` inside the `.streamlit/` directory at the project root and add your credentials (a template file is provided in the repository; remember to rename it):

       # .streamlit/secrets.toml

       # EMT Madrid MobilityLabs Credentials
       EMT_CLIENT_ID = "your_emt_email_here"
       EMT_PASSKEY = "your_emt_passkey_here"

       # Mapbox Public Access Token
       MAPBOX_API_KEY = "your_mapbox_api_key_here"

3. Build and Run the Docker Container:
   Make sure you have Docker Desktop running on your machine.

       docker-compose up --build

   The application will be available at http://localhost:8501.
Alternatively, to run the application without Docker, set up a local Python environment:

1. Clone the Repository and set up the environment:

       git clone https://github.com/eduardoruiz1990/madrid-metro-pulse.git
       cd madrid-metro-pulse
       python3 -m venv venv
       source venv/bin/activate  # On Windows, use `venv\Scripts\activate`

2. Install Dependencies:

       pip install -r requirements.txt

3. Create Your Secrets File:
   Follow step 2 of the Docker setup to create your `.streamlit/secrets.toml` file.
The project follows a three-step workflow:
1. Fetch Static Data (One-Time Setup):
   Before running the app for the first time, you must populate your local database. Run the fetcher script from your terminal:

       python fetch_api_data.py

   This script will perform the heavy API calls and save the results to the /data folder.

2. Train Predictive Models (One-Time Setup):
   Next, train the hyper-local forecast models using your historical pedestrian data:

       python train_local_models.py

   This will create hundreds of model files in the /models folder.

3. Run the Streamlit Dashboard:
   You can now launch the interactive application:

       streamlit run app.py
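Because each step depends on the output of the previous one, the three commands can also be chained so that a failed fetch or training run stops the sequence. The following is a minimal sketch of that idea, not a script shipped with the project. `run_workflow` and its `runner` parameter are hypothetical names, and the commands are the ones listed in the workflow steps.

```python
import subprocess

# The three workflow commands, in the order they must run.
WORKFLOW = (
    ("python", "fetch_api_data.py"),
    ("python", "train_local_models.py"),
    ("streamlit", "run", "app.py"),
)


def run_workflow(commands=WORKFLOW, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns the commands that finished with exit code 0. `runner` is
    injectable so the sequencing logic can be exercised without
    actually launching the scripts.
    """
    completed = []
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            break  # later steps depend on this one, so do not continue
        completed.append(command)
    return completed
```

Note that `streamlit run app.py` keeps running until the server is stopped, so `run_workflow()` only returns after the dashboard has been shut down.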
madrid-metro-pulse/
├── .streamlit/
│ └── secrets.toml # Your private credentials file (ignored by Git)
├── data/ # Raw pedestrian data and fetched API data (ignored by Git)
├── models/ # Trained Prophet models (ignored by Git)
├── .gitignore # Specifies files to ignore
├── README.md # This project overview
├── requirements.txt # Project dependencies
├── fetch_api_data.py # One-time script to fetch static API data
├── train_local_models.py # One-time script to train all forecast models
└── app.py # The main Streamlit dashboard application
This project utilizes data and services provided by the following organizations. Use of these services is subject to their respective terms and conditions.
- Data Provider: Empresa Municipal de Transportes de Madrid (EMT). All transit and pedestrian data is sourced via the MobilityLabs API.
- Mapping & Geocoding: Mapbox. Used for rendering interactive maps and calculating walking routes via the Directions API.
Please review their terms of use before deploying this application for any public or commercial purpose.