


# 🌪️ disaster-data-replicator


> *Bridging the gap between climate events and actionable insights through automated, high-fidelity data engineering.*

**disaster-data-replicator** is a production-grade ETL pipeline designed to synchronize and normalize "Billion-Dollar Disaster" data. By orchestrating data from the National Weather Service (NWS) and FEMA APIs, this tool provides a unified view of climate-driven economic impacts, enabling researchers and developers to analyze disaster trends with precision.


## 💡 Key Features

- **🔄 Multi-Source Synchronization:** Seamlessly merges NOAA Billion-Dollar Disaster datasets with real-time FEMA disaster declarations.
- **🛠 Automated ETL Pipeline:** Handles data extraction, schema normalization, and validation without manual intervention.
- **📉 Intelligent Rate Limiting:** Built-in backoff algorithms and request throttling respect NWS and FEMA API constraints.
- **🧪 Data Integrity:** Uses Pydantic for strict schema enforcement, ensuring that inconsistent API responses don't break downstream analytics.
- **📂 Flexible Export:** Supports high-performance output formats including Parquet (for big data workloads), CSV, and JSON.
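The validation idea can be sketched as follows. Note that `DisasterRecord` and its fields are illustrative assumptions, not the project's actual schema; the point is that Pydantic coerces well-formed values and rejects malformed ones before they reach analytics:

```python
from pydantic import BaseModel, ValidationError

class DisasterRecord(BaseModel):
    """Hypothetical schema for one normalized disaster event."""
    event_id: str
    disaster_type: str
    year: int
    estimated_damage_usd: float

# A raw API response with numeric fields delivered as strings
# is coerced into the declared types.
raw = {
    "event_id": "DR-4673",
    "disaster_type": "Hurricane",
    "year": "2022",
    "estimated_damage_usd": "112900000000",
}
record = DisasterRecord(**raw)
print(record.year)  # 2022, as an int

# A record with an unparseable field is rejected outright.
try:
    DisasterRecord(event_id="DR-0001", disaster_type="Flood",
                   year="not-a-year", estimated_damage_usd=1e6)
except ValidationError:
    print("rejected malformed record")
```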

## 🛠 Tech Stack

| Category      | Tools                                                  |
|---------------|--------------------------------------------------------|
| Language      | Python                                                 |
| Data Handling | Pandas, Pydantic                                       |
| APIs          | `noaa-api`, `fema-data-api`, `requests`                |
| DevOps        | GitHub Actions, Docker                                 |
| Domain        | Climate Change, Disaster Management, Data Engineering  |

## 🏁 Quick Start

### Prerequisites

- Python 3.9+
- API keys for NOAA/FEMA (if required for higher rate limits)

### Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/disaster-data-replicator.git
   cd disaster-data-replicator
   ```

2. Set up a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

### Usage

Run the full replication pipeline:

```bash
python main.py --start-year 2000 --output-format parquet
```

Sync specific agency data:

```bash
# Sync only FEMA disaster declarations
python scripts/sync_fema.py --region "FL"
```
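The `--output-format` flag above can be mirrored programmatically. The helper below is a minimal sketch, not the project's actual API: it assumes a hypothetical `export_frame` function and uses pandas writers (Parquet output additionally requires `pyarrow` or `fastparquet` to be installed):

```python
import os
import tempfile

import pandas as pd

def export_frame(df: pd.DataFrame, path: str, fmt: str = "parquet") -> None:
    """Hypothetical export helper mirroring the --output-format flag."""
    if fmt == "parquet":
        df.to_parquet(path)  # needs pyarrow or fastparquet installed
    elif fmt == "csv":
        df.to_csv(path, index=False)
    elif fmt == "json":
        df.to_json(path, orient="records")
    else:
        raise ValueError(f"unsupported format: {fmt}")

# Example: export a tiny frame of disaster events as CSV.
events = pd.DataFrame({"year": [2005, 2012], "disaster": ["Katrina", "Sandy"]})
out_path = os.path.join(tempfile.gettempdir(), "events.csv")
export_frame(events, out_path, fmt="csv")
```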

## 📊 Project Architecture

```mermaid
graph LR
    A[NOAA NWS API] --> E[Ingestion Engine]
    B[FEMA API] --> E
    E --> F{Normalization}
    F --> G[Validation/Pydantic]
    G --> H[(Local Storage / S3)]
    H --> I[Analytics & Visualization]
```
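The ingestion engine's rate-limiting behavior follows a standard pattern: retry a failed API call with exponentially growing delays plus jitter. The sketch below is illustrative, not the project's exact code; `fetch_with_backoff` is a hypothetical name, and `call` stands in for any zero-argument wrapper around a NWS or FEMA request:

```python
import random
import time

def fetch_with_backoff(call, max_retries: int = 5, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff plus jitter.

    `call` is any zero-argument function that raises on failure
    (e.g. a wrapped requests.get against the NWS or FEMA API).
    """
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```

In practice you would catch a narrower exception (e.g. `requests.RequestException` or an HTTP 429 response) rather than bare `Exception`, so that programming errors are not silently retried.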

## 🤝 How to Contribute

Contributions make the open-source community an amazing place to learn, inspire, and create.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request

## 📄 License

Distributed under the MIT License. See `LICENSE` for more information.


## 📧 Contact

Your Name - @YourTwitter - your.email@example.com

Project Link: https://github.com/yourusername/disaster-data-replicator


*This project was developed to provide transparent access to climate-related economic data, supporting the global effort to understand and mitigate the impacts of climate change.*
