MuhammadAnas4774/Urban-Environmental-Intelligence-Engine-UEIE-2025-

Urban Environmental Intelligence Engine (UEIE-2025)

The Urban Environmental Intelligence Engine (UEIE-2025) is a scalable smart-city analytics system designed to detect environmental anomalies using hourly air-quality data from 100 global sensor stations collected via the OpenAQ Global Air Quality API for the year 2025.

Project Overview

This project implements a diagnostic engine for identifying environmental anomalies in air quality data from 100 global sensor nodes. The system analyzes hourly values for PM2.5, PM10, NO2, Ozone, Temperature, and Humidity throughout the year 2025.

Project Structure

Data Scienece Assignment 2/
├── data_fetcher.py          # OpenAQ API data fetching module
├── data_processor.py         # Big data processing module
├── task1_dimensionality.py  # Task 1: PCA dimensionality reduction
├── task2_temporal.py         # Task 2: High-density temporal analysis
├── task3_distribution.py     # Task 3: Distribution modeling
├── task4_visual_integrity.py # Task 4: Visual integrity audit
├── main.py                   # Main pipeline script
├── dashboard.py              # Streamlit interactive dashboard
├── requirements.txt          # Python dependencies
└── README.md                 # This file

Installation

  1. Install Python 3.8 or higher
  2. Install dependencies:
pip install -r requirements.txt

Usage

1. Fetch and Process Data

Run the main pipeline to fetch data and execute all tasks:

python main.py

This will:

  • Fetch data from the OpenAQ API (or create synthetic data if the API is unavailable)
  • Process and cache the data
  • Execute all four tasks
  • Generate visualizations in the outputs/ directory
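
The fetch-or-fall-back step can be sketched as follows. This is a minimal illustration and not the code in data_fetcher.py: the names load_data, synthetic_readings, and failing_fetch are hypothetical, the parameter keys are assumed, and the synthetic values are a simple sine cycle plus noise chosen only for demonstration.

```python
import math
import random

# Assumed column names for the six parameters listed in this README.
PARAMETERS = ["pm25", "pm10", "no2", "o3", "temperature", "humidity"]

def synthetic_readings(n_stations=3, n_hours=24, seed=0):
    """Generate reproducible synthetic hourly readings (illustrative values only)."""
    rng = random.Random(seed)
    rows = []
    for station in range(n_stations):
        for hour in range(n_hours):
            # A daily sine cycle plus Gaussian noise stands in for sensor behaviour.
            cycle = math.sin(2 * math.pi * hour / 24)
            row = {"station": station, "hour": hour}
            for name in PARAMETERS:
                row[name] = round(20 + 10 * cycle + rng.gauss(0, 2), 2)
            rows.append(row)
    return rows

def load_data(fetch, **synthetic_kwargs):
    """Call the fetcher; fall back to synthetic data if it raises."""
    try:
        return fetch(), "api"
    except Exception:  # any fetch failure triggers the fallback
        return synthetic_readings(**synthetic_kwargs), "synthetic"

def failing_fetch():
    """Hypothetical fetcher that simulates an unreachable API."""
    raise ConnectionError("API unavailable")

rows, source = load_data(failing_fetch, n_stations=2, n_hours=24)
print(source, len(rows))
```

Returning the data source alongside the rows lets downstream steps record whether a run used real or synthetic data.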

2. Run Individual Tasks

Each task can be run independently:

python task1_dimensionality.py
python task2_temporal.py
python task3_distribution.py
python task4_visual_integrity.py

3. Launch Interactive Dashboard

streamlit run dashboard.py

The dashboard will open in your browser at http://localhost:8501

Task Descriptions

Task 1: The Dimensionality Challenge (25%)

  • Applies PCA to project 6-dimensional environmental data into 2D
  • Visualizes Industrial vs Residential zone clustering
  • Analyzes PCA loadings to identify main pollution drivers

Outputs:

  • outputs/task1_clusters.png - Zone clustering visualization
  • outputs/task1_loadings.png - PCA loadings visualization
  • outputs/task1_loadings.csv - Loadings data
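
As a rough sketch of the projection described above, the following uses NumPy's SVD on standardized data. It is one common way to compute PCA and is not necessarily how task1_dimensionality.py does it; the function name pca_2d and the random input are assumptions for illustration.

```python
import numpy as np

def pca_2d(X):
    """Project rows of X onto the first two principal components."""
    # Standardize each column, since the six variables use different units.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:2]                  # shape (2, n_features): unscaled loadings
    scores = Z @ components.T            # shape (n_samples, 2): the 2D projection
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained[:2]

# Random stand-in data: 200 observations of 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
# Make the second column track the first, mimicking correlated pollutants.
X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.1, size=200)

scores, loadings, explained = pca_2d(X)
print(scores.shape, loadings.shape)
```

Each row of loadings gives the weight of every input variable on that component; large absolute weights point to the variables that drive it.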

Task 2: High-Density Temporal Analysis (25%)

  • Identifies PM2.5 > 35 violations across 100 sensors
  • Uses heatmap visualization to avoid overplotting
  • Identifies periodic signatures (daily vs monthly patterns)

Outputs:

  • outputs/task2_heatmap.png - High-density heatmap
  • outputs/task2_small_multiples.png - Temporal pattern analysis
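
One way to prepare data for such a heatmap is to count threshold violations per sensor and hour of day. The sketch below assumes readings arrive as (sensor, hour_index, pm25) tuples, which is a hypothetical layout; only the threshold of 35 comes from the task description.

```python
THRESHOLD = 35.0  # PM2.5 limit from the task description

def violation_grid(readings, n_sensors):
    """Count PM2.5 > THRESHOLD readings per (sensor, hour-of-day) cell."""
    grid = [[0] * 24 for _ in range(n_sensors)]
    for sensor, hour_index, pm25 in readings:
        if pm25 > THRESHOLD:
            # hour_index counts hours from the start of the year.
            grid[sensor][hour_index % 24] += 1
    return grid

readings = [
    (0, 8, 40.0),
    (0, 32, 50.0),   # hour 32 is 08:00 on the second day
    (0, 9, 35.0),    # exactly at the threshold, so not a violation
    (1, 20, 36.5),
    (1, 21, 12.0),
]
grid = violation_grid(readings, n_sensors=2)
print(grid[0][8], grid[1][20])
```

A sensor-by-hour grid has a fixed size regardless of how many readings exist, which is why a heatmap of it does not suffer from overplotting.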

Task 3: Distribution Modeling & Tail Integrity (25%)

  • Creates peak-optimized and tail-optimized distribution plots
  • Determines 99th percentile of pollution levels
  • Provides technical justification for plot selection

Outputs:

  • outputs/task3_peak_optimized.png - Peak-optimized plot
  • outputs/task3_tail_optimized.png - Tail-optimized plot
  • outputs/task3_comparison.png - Side-by-side comparison
  • outputs/task3_percentiles.csv - Percentile statistics
  • outputs/task3_justification.txt - Technical justification
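
The percentile step can be illustrated with the standard library alone. The sample below is drawn from a log-normal distribution purely as a stand-in for right-skewed pollution data; the distribution and its parameters are assumptions, not results from this project.

```python
import random
import statistics

rng = random.Random(42)
# Right-skewed stand-in sample; real pollution data would replace this.
values = [rng.lognormvariate(3.0, 0.6) for _ in range(10_000)]

# quantiles(n=100) returns the 99 cut points between percentiles.
cuts = statistics.quantiles(values, n=100)
p50, p99 = cuts[49], cuts[98]

# The tail is everything above the 99th percentile.
tail = [v for v in values if v > p99]
print(len(cuts), len(tail))
```

In a skewed sample the mean sits above the median, and the top 1% of values can span a wide range, which is why a tail-optimized view is examined separately from the peak.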

Task 4: Visual Integrity Audit (25%)

  • Evaluates the 3D bar chart proposal (rejected)
  • Implements Bivariate Mapping and Small Multiples alternatives
  • Justifies Sequential color scale choice

Outputs:

  • outputs/task4_bivariate_mapping.png - Bivariate mapping
  • outputs/task4_small_multiples.png - Small multiples
  • outputs/task4_evaluation.txt - 3D proposal evaluation
  • outputs/task4_color_justification.txt - Color scale justification

Technical Constraints

  1. Big Data Handling: Uses Parquet format and chunked processing for efficiency
  2. No Graphical Ducks: All visualizations avoid 3D effects, shadows, and unnecessary grids
  3. Reproducibility: Modular Python pipeline (not Jupyter notebooks)
  4. Data Location: All data is stored in D:/Data Scienece Assignment 2/data/
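
Chunked processing, mentioned in the first constraint, can be sketched without any file format dependency. The helper names chunks and streaming_mean are hypothetical, and the example streams plain numbers rather than Parquet data, so it shows only the general idea of aggregating one block at a time.

```python
from itertools import islice

def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

def streaming_mean(values, chunk_size=1000):
    """Compute a mean while holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for block in chunks(values, chunk_size):
        total += sum(block)
        count += len(block)
    return total / count if count else float("nan")

# A generator stands in for a dataset too large to load at once.
result = streaming_mean((float(i) for i in range(10_001)), chunk_size=256)
print(result)
```

Keeping only running totals means memory use depends on the chunk size, not on the size of the full dataset.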

Data Source

  • API: OpenAQ Global Air Quality API
  • Parameters: PM2.5, PM10, NO2, Ozone, Temperature, Humidity
  • Stations: 100 global sensor nodes
  • Period: Entire year 2025 (hourly values)

Key Features

  • Modular Design: Each task is a separate module
  • Efficient Processing: Handles multi-gigabyte datasets
  • Interactive Dashboard: Streamlit-based visualization interface
  • Comprehensive Analysis: Covers dimensionality reduction, temporal analysis, distribution modeling, and visual integrity

Output Directory Structure

outputs/
├── task1_clusters.png
├── task1_loadings.png
├── task1_loadings.csv
├── task2_heatmap.png
├── task2_small_multiples.png
├── task3_peak_optimized.png
├── task3_tail_optimized.png
├── task3_comparison.png
├── task3_percentiles.csv
├── task3_justification.txt
├── task4_bivariate_mapping.png
├── task4_small_multiples.png
├── task4_evaluation.txt
└── task4_color_justification.txt

Notes

  • If the OpenAQ API is unavailable, the system automatically generates synthetic data for demonstration
  • All visualizations follow Tufte's principles: maximize data-ink ratio, minimize chartjunk
  • Sequential color scales (YlOrRd) are used for quantitative data visualization

Author

Muhammad Anas


Data Science Assignment 2 - Urban Environmental Intelligence Challenge
