> [!NOTE]
> This project is a Rust CLI tool developed as a take-home assessment for the Data Pipeline / DevOps Engineer position at Fizyr. It fetches air quality data from the OpenAQ API, stores it in a PostgreSQL database, and provides query capabilities.
> [!IMPORTANT]
> This repository contains the full implementation of the Air Quality Analysis CLI tool, designed to fetch, store, and analyze air quality data.
Key features:
- Fetches air quality data from OpenAQ API v3 (locations, sensors, measurements) for specified countries (NL, DE, FR, GR, ES, PK).
- Stores normalized data in a PostgreSQL database.
- Interactive CLI menu for user operations (schema init, data import, queries).
- Query: Find the most polluted country based on recent PM2.5/PM10.
- Query: Calculate 5-day average air quality for a specified country.
- Query: Retrieve latest measurements grouped by locality for a specified country.
- Docker integration (`Dockerfile`, `docker-compose.yml`) for easy setup, including a custom network.
- GitHub Actions workflow for CI checks (`cargo check`, `cargo fmt -- --check`).
- Unit tests for CLI logic and integration tests for database operations.
- Logging to `logs/app.log`.
- `src/` – Rust source code for the CLI application.
  - `main.rs` – Application entry point, logging setup, interactive loop.
  - `api/` – Modules for interacting with external APIs (OpenAQ).
    - `openaq.rs` – Client for the OpenAQ API.
  - `cli/` – Command-line interface logic.
    - `commands.rs` – Command definitions, state management, user prompts.
  - `db/` – Database interaction logic.
    - `postgres.rs` – PostgreSQL connection, schema, queries, insertion.
  - `models/` – Data structures (API responses, DB records, output structs).
    - `openaq.rs` – Defines `DailyMeasurement`, `DbMeasurement`, etc.
  - `error.rs` – Custom application error types (`AppError`).
- `logs/` – Directory for application logs (created automatically).
- `Dockerfile` – Defines the container image build process.
- `docker-compose.yml` – Orchestrates the `app` and `database` services using a custom network.
- `.github/workflows/` – GitHub Actions CI pipeline (`ci-rust.yml`).
- `Cargo.toml` & `Cargo.lock` – Rust project dependencies.
- `rustfmt.toml` – Code formatting configuration.
- `LICENSE` – Project license (MIT).
- Docker & Docker Compose: Essential for the containerized setup.
- OpenAQ API Key: Required for fetching real-time data from OpenAQ.
- Sign up at openaq.org to get your key.
- `gh` CLI (Optional): For cloning using the `gh repo clone` command. Install gh.
- Rust Toolchain (Optional): Only needed if you plan to run or develop locally outside Docker. Install Rust.
This method ensures the application runs in a consistent environment with its database dependency on a dedicated network.
- Clone the Repository:

  ```sh
  # Using gh CLI
  gh repo clone mohammadzainabbas/fizyr-assessment

  # Or using standard git
  git clone https://github.com/mohammadzainabbas/fizyr-assessment.git

  cd fizyr-assessment
  ```

- Configure API Key (Recommended: use a `.env` file):
> [!IMPORTANT]
> The application requires your OpenAQ API key. The recommended way to provide it is via a `.env` file in the project root.

Create a file named `.env`:

```sh
# .env
OPENAQ_KEY=your_actual_api_key_here
```

The `docker-compose.yml` file is configured to read this file and pass the `OPENAQ_KEY` variable to the `app` container. Alternatively, you can export `OPENAQ_KEY` in your shell environment before running Docker Compose.
- Start Database Service:

  Run the PostgreSQL database container in detached (background) mode. Data persists in the `postgres_data` named volume.

  ```sh
  docker-compose up -d database
  ```

> [!TIP]
> Allow a few seconds for the database container to initialize fully before proceeding. You can check logs with `docker-compose logs -f database`.
- Run Application Interactively:

  Use `docker-compose run` to build (if needed) and start the application interactively. This command creates a temporary container for the `app` service, connects it to the running `database` service on the shared network, and attaches your terminal.

  ```sh
  docker-compose run --rm --build app
  ```

  - `--rm`: Automatically removes the container when the application exits.
  - `--build`: Rebuilds the application image if the source code or `Dockerfile` changes.
> [!NOTE]
> You should see the welcome message and the interactive menu. Use your keyboard (arrow keys, Enter) to navigate. `docker-compose run` is generally preferred over `docker-compose up app` for interactive CLI applications like this one, since it handles terminal interaction (TTY) more reliably.
- Using the CLI:
Once the application starts, follow the menu prompts:
- Initialize Database Schema: Run this first! Creates the `locations`, `sensors`, and `measurements` tables.
- Import Data: Fetches the top 10 locations per country, saves locations and sensors, then fetches daily measurements for those sensors over the specified number of days (7-365). Includes retries for measurement fetching.
- Query Options: Perform analysis such as finding the most polluted country, calculating averages, or viewing city-specific data.
- Stopping Services:
  - App Container: Exit the application using the "Exit" menu option or press `Ctrl+C` in the terminal where `docker-compose run` is active. The container is removed automatically thanks to `--rm`.
  - Database Container: Stop the background database service:

    ```sh
    docker-compose down
    ```

> [!CAUTION]
> To stop the database AND delete all stored air quality data, use:
>
> ```sh
> docker-compose down -v
> ```

If you prefer to run outside Docker:
- Setup Environment:
  - Install and run PostgreSQL locally.
  - Create a database (e.g., `createdb air_quality`).
  - Set environment variables (export them in your shell, or use a `.env` file with a tool like `dotenv-cli`):

    ```sh
    # .env (Example - Adjust DATABASE_URL for your local setup)
    DATABASE_URL=postgres://your_user:your_password@localhost:5432/air_quality
    OPENAQ_KEY=your_actual_api_key_here
    RUST_LOG=info # Optional: Set log level (e.g., debug, trace)
    ```

- Build & Run:
```sh
# Build the project
cargo build

# Run the interactive application (make sure the database is running first)
# If using .env, you might need a tool like dotenv-cli:
# dotenv cargo run
cargo run
```

> [!CAUTION]
> Ensure your PostgreSQL server is running and reachable via the `DATABASE_URL` environment variable before running `cargo run`. The application will not start if it cannot connect to the database.
Follow the CLI menu prompts as described in the Docker section.
- Run Tests:
  - Unit Tests (located in `src/cli/commands.rs`):

    ```sh
    cargo test
    ```

  - Database Integration Tests (located in `src/db/postgres.rs`):

> [!IMPORTANT]
> These tests require a running PostgreSQL database accessible via the `DATABASE_URL` environment variable. This can be your local instance or the Dockerized one.

```sh
# 1. Ensure a PostgreSQL database is running and accessible via DATABASE_URL.
#    Example using Docker Compose:
#    docker-compose up -d database
#    export DATABASE_URL="postgres://postgres:postgres@localhost:5432/air_quality" # Set for local shell

# 2. Run only the integration tests using the feature flag:
cargo test --features integration-tests

# 3. Stop the Dockerized database if you started it just for the tests:
#    docker-compose down
```

The application fetches air quality data for a predefined list of countries (NL, DE, FR, GR, ES, PK) using the OpenAQ API v3. The import process involves:
- Fetching the top 10 locations for each country.
- Saving these locations and their associated sensor details into dedicated database tables (`locations`, `sensors`).
- Fetching daily aggregated measurements for each saved sensor within the user-specified date range, with retry logic for API errors.
- Saving the fetched measurements into the `measurements` table.
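The retry step above can be sketched with a small std-only helper. This is a hypothetical illustration of the pattern, not the project's actual implementation; the helper name `with_retries` and the fixed delay between attempts are assumptions.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical retry helper: runs a fallible operation up to `max_attempts`
/// times, sleeping a fixed delay between failed attempts. The real import
/// wraps the daily-measurements fetch in a loop with similar intent.
fn with_retries<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            // Out of attempts: surface the last error to the caller.
            Err(e) if attempt >= max_attempts => return Err(e),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}

fn main() {
    // Simulate a flaky API call that succeeds on the third attempt.
    let mut calls = 0;
    let result = with_retries(5, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("transient error") } else { Ok(calls) }
    });
    assert_eq!(result, Ok(3));
    println!("succeeded after {calls} attempts");
}
```

In the real tool the operation would be the async `reqwest` call, so the sleep would be `tokio::time::sleep` rather than a blocking one.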
The core functionality is exposed through an interactive Command Line Interface (CLI) built using `dialoguer`, allowing users to:
- Initialize the database schema.
- Import historical data.
- Perform analysis queries on the stored data.
The database uses three main tables:

- `locations`: Stores information about each fetched location (ID, name, coordinates, country, etc.). `id` is the primary key.
- `sensors`: Stores details about each sensor (ID, name, parameter info) and includes a foreign key (`location_id`) linking back to the `locations` table. `id` is the primary key.
- `measurements`: Stores the daily aggregated air quality measurements.
  - Columns: Include `id`, `location_id` (denormalized), `sensor_id` (denormalized, corresponds to `sensors.id`), `location_name` (denormalized), `parameter_id` (denormalized), `parameter_name` (denormalized), `value_avg` (NUMERIC, nullable), `value_min` (NUMERIC, nullable), `value_max` (NUMERIC, nullable), `measurement_count` (INT, nullable), `unit` (denormalized), `date_utc` (TIMESTAMPTZ), `date_local` (TEXT), `country` (denormalized), `city` (denormalized locality), `latitude` (denormalized), `longitude` (denormalized), `is_mobile` (denormalized), `is_monitor` (denormalized), `owner_name` (denormalized), `provider_name` (denormalized), and `created_at`.
  - Constraint: A `UNIQUE` constraint on `(sensor_id, date_utc)` prevents duplicate daily entries for the same sensor.
- Initialization: All tables are created idempotently (`CREATE TABLE IF NOT EXISTS`) by the `init_schema` function in `src/db/postgres.rs`, triggered via the CLI.
- Indexes: Created on relevant columns in `measurements` (e.g., `country`, `parameter_name`, `date_utc`, `sensor_id`) to optimize query performance.
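For orientation, the `measurements` DDL implied by the description above might look roughly like the following. The column list is abbreviated and illustrative; the authoritative schema is the one created by `init_schema` in `src/db/postgres.rs`.

```sql
-- Illustrative sketch, not the exact project schema.
CREATE TABLE IF NOT EXISTS measurements (
    id                SERIAL PRIMARY KEY,
    sensor_id         BIGINT NOT NULL,
    location_id       BIGINT NOT NULL,
    parameter_name    TEXT NOT NULL,
    value_avg         NUMERIC,
    value_min         NUMERIC,
    value_max         NUMERIC,
    measurement_count INT,
    unit              TEXT,
    date_utc          TIMESTAMPTZ NOT NULL,
    country           TEXT,
    city              TEXT,
    created_at        TIMESTAMPTZ DEFAULT now(),
    -- One aggregated row per sensor per day.
    UNIQUE (sensor_id, date_utc)
);

CREATE INDEX IF NOT EXISTS idx_measurements_country  ON measurements (country);
CREATE INDEX IF NOT EXISTS idx_measurements_date_utc ON measurements (date_utc);
```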
- Client: `OpenAQClient` in `openaq.rs` uses `reqwest` to make asynchronous GET requests to the relevant OpenAQ v3 endpoints (e.g., `/v3/locations`, `/v3/sensors/{id}/measurements/daily`).
- Authentication: Uses the `X-API-Key` header, as required by OpenAQ API v3.
- Error Handling: Includes checks for network errors and non-success HTTP status codes (4xx, 5xx), logging relevant details. Pagination is handled within the client methods.
- Fallback: The mock data provider is no longer used as an import fallback. API errors during import are logged, and processing may skip the affected countries/sensors.
- Interaction: `dialoguer` provides interactive prompts (text input, selection menus).
- Commands: Defined in the `Commands` enum. `clap` is used implicitly via the enum structure, but full argument parsing is not implemented.
- State Management: The `AppState` enum tracks whether the database is initialized and whether data has been imported, dynamically adjusting the menu options presented to the user in `main.rs`.
- Output: `comfy-table` displays query results in formatted tables, `colored` enhances terminal output, and `indicatif` provides spinners and progress bars for long-running operations.
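The state-gated menu could be sketched like this. The variant and option names below are illustrative assumptions based on the description, not the project's exact identifiers.

```rust
/// Hypothetical sketch of an `AppState`-style enum gating menu options:
/// queries only appear once data has been imported.
#[derive(Clone, Copy, PartialEq)]
enum AppState {
    Uninitialized,
    DbInitialized,
    DataImported,
}

/// Return the menu entries available in a given state.
fn menu_options(state: AppState) -> Vec<&'static str> {
    match state {
        AppState::Uninitialized => vec!["Initialize Database Schema", "Exit"],
        AppState::DbInitialized => vec!["Import Data", "Initialize Database Schema", "Exit"],
        AppState::DataImported => vec![
            "Query: Most Polluted Country",
            "Query: 5-Day Average",
            "Query: Latest by Locality",
            "Import Data",
            "Exit",
        ],
    }
}

fn main() {
    // Before the schema exists, importing is not offered.
    assert!(!menu_options(AppState::Uninitialized).contains(&"Import Data"));
    // After an import, the analysis queries become available.
    assert!(menu_options(AppState::DataImported).contains(&"Query: 5-Day Average"));
    println!("menu gating ok");
}
```

In the real application, the returned list would feed a `dialoguer` `Select` prompt inside the interactive loop.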
- Language (Rust): Chosen for performance, memory safety, strong typing, and its excellent ecosystem for CLI tools (`clap`, `dialoguer`, `indicatif`) and asynchronous operations (`tokio`, `reqwest`, `sqlx`).
- Database (PostgreSQL): A robust, open-source relational database well suited to structured time-series data and analytical queries. `sqlx` provides compile-time checked SQL queries.
- Containerization (Docker): Simplifies setup and ensures environment consistency using a `Dockerfile` (multi-stage build for a smaller image) and `docker-compose.yml`. A custom network (`air_quality_net`) isolates the application and database services.
- API Client (`reqwest`): A mature and widely used asynchronous HTTP client in the Rust ecosystem.
- Error Handling (`thiserror`, custom enum): Centralized error handling via the `AppError` enum and `thiserror` provides clear, context-specific error types, improving debugging and robustness. `Arc` is used to wrap non-`Clone` errors.
- Modularity: The codebase is organized into logical modules (`api`, `cli`, `db`, `models`, `error`), promoting separation of concerns and maintainability.
- Testing:
  - Unit tests (`src/cli/commands.rs`) use mocking (`MockDatabase`) to test CLI command logic in isolation.
  - Integration tests (`src/db/postgres.rs`) use the `sqlx::test` macro for transactional tests against a real database instance, gated by the `integration-tests` feature flag.
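A centralized error enum of the kind described above might look like this. This std-only sketch hand-writes `Display` for brevity; the actual `AppError` uses `thiserror` derives and wraps non-`Clone` sources in `Arc`, and its variant names may differ.

```rust
use std::fmt;

/// Simplified, illustrative error enum (the real one derives via `thiserror`).
#[derive(Debug)]
enum AppError {
    Api(String),
    Db(String),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::Api(msg) => write!(f, "API error: {msg}"),
            AppError::Db(msg) => write!(f, "database error: {msg}"),
        }
    }
}

impl std::error::Error for AppError {}

fn main() {
    // Each variant carries enough context to identify the failing subsystem.
    let e = AppError::Api("rate limited".into());
    assert_eq!(e.to_string(), "API error: rate limited");
    println!("{e}");
}
```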
- Data Import: Fetches the top 10 locations per country, saves locations and sensors to dedicated tables, then fetches daily measurements for each sensor (with retries) and saves them. Uses `ON CONFLICT (id) DO NOTHING` for locations/sensors and `ON CONFLICT (sensor_id, date_utc) DO NOTHING` for measurements to handle duplicates.
- Pollution Index: Implements a simple weighted index (`pm2.5 * 1.5 + pm10`) for the "most polluted" feature, prioritizing PM2.5.
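The weighted index above is straightforward to express in Rust. Treating a missing reading as zero when the other is present is an assumption here; the actual query may handle NULLs differently.

```rust
/// Sketch of the weighted pollution index: pm2.5 * 1.5 + pm10.
/// Returns `None` when neither pollutant has a reading.
fn pollution_index(pm25: Option<f64>, pm10: Option<f64>) -> Option<f64> {
    match (pm25, pm10) {
        (None, None) => None,
        // Assumption: a missing reading contributes 0 to the index.
        (p25, p10) => Some(p25.unwrap_or(0.0) * 1.5 + p10.unwrap_or(0.0)),
    }
}

fn main() {
    assert_eq!(pollution_index(Some(10.0), Some(20.0)), Some(35.0));
    assert_eq!(pollution_index(Some(10.0), None), Some(15.0));
    assert_eq!(pollution_index(None, None), None);
    println!("index ok");
}
```

The 1.5 weight reflects that PM2.5 penetrates deeper into the lungs and is generally considered more harmful than PM10 at the same concentration.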
- Robust Error Handling: Add more sophisticated retry logic (e.g., exponential backoff) for transient network or API errors during import.
- Configuration File: Move settings (country list, API URL, DB connection details) to a configuration file (e.g., `config.toml`) instead of environment variables or hardcoding.
- Database Migrations: Use a dedicated migration tool (such as `sqlx-cli` or `refinery`) for more robust schema management instead of `CREATE TABLE IF NOT EXISTS`.
- Non-Interactive Mode: Add command-line flags (using `clap` more extensively) to run commands non-interactively (e.g., `air-quality-cli import --days 30`).
- Query Filtering: Allow users to specify date ranges or parameters for queries via CLI options.
- Enhanced Testing: Increase unit test coverage, particularly for edge cases. Add end-to-end tests simulating full CLI interaction.
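The exponential backoff mentioned in the first improvement could be as simple as doubling a base delay per attempt with a cap. The function name and parameters below are illustrative, and jitter is omitted for brevity.

```rust
use std::time::Duration;

/// Hypothetical backoff schedule: base delay doubled per attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // saturating_* avoids overflow on large attempt counts.
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}

fn main() {
    let base = Duration::from_millis(500);
    let max = Duration::from_secs(8);
    let delays: Vec<_> = (0..6).map(|a| backoff_delay(a, base, max)).collect();
    assert_eq!(delays[0], Duration::from_millis(500));
    assert_eq!(delays[1], Duration::from_secs(1));
    assert_eq!(delays[4], Duration::from_secs(8)); // hit the cap
    assert_eq!(delays[5], Duration::from_secs(8)); // stays capped
    println!("{delays:?}");
}
```

In production, adding random jitter to each delay helps avoid synchronized retry storms against the API.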
This project is licensed under the MIT License - see the LICENSE file for details.