Bridging the gap between climate events and actionable insights through automated, high-fidelity data engineering.
disaster-data-replicator is an ETL pipeline that synchronizes and normalizes "Billion-Dollar Disaster" data. By combining data from the NOAA National Weather Service (NWS) and FEMA APIs, it provides a unified view of the economic impact of climate-driven disasters, so researchers and developers can analyze disaster trends from a single, consistent dataset.
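
The synchronization idea above, merging NOAA disaster events with FEMA declarations into one view, can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `DisasterEvent`, `HAZARD_MAP`, `normalize_noaa`, and `link_declarations` are hypothetical, and the raw field names (`Begin Date`, `End Date`, and `CPI-Adjusted Cost` in millions of USD for NOAA; `disasterNumber`, `incidentType`, and `incidentBeginDate` for FEMA) are assumptions about the source records. Each NOAA event is normalized into a shared schema, and a FEMA declaration is attached when its hazard type matches and its incident start date falls inside the event's date range.

```python
from dataclasses import dataclass, field
from datetime import date, datetime

# Hypothetical mapping from source-specific hazard labels to one shared vocabulary.
HAZARD_MAP = {
    "Tropical Cyclone": "hurricane",  # assumed NOAA label
    "Hurricane": "hurricane",         # assumed FEMA label
    "Flooding": "flood",
    "Flood": "flood",
    "Wildfire": "wildfire",
    "Fire": "wildfire",
}


@dataclass
class DisasterEvent:
    name: str
    hazard: str
    begin: date
    end: date
    cost_usd: float
    fema_declarations: list = field(default_factory=list)


def normalize_noaa(row: dict) -> DisasterEvent:
    """Convert one NOAA-style row (assumed YYYYMMDD dates, cost in millions of USD)."""
    return DisasterEvent(
        name=row["Name"],
        hazard=HAZARD_MAP.get(row["Disaster"], "other"),
        begin=datetime.strptime(row["Begin Date"], "%Y%m%d").date(),
        end=datetime.strptime(row["End Date"], "%Y%m%d").date(),
        cost_usd=float(row["CPI-Adjusted Cost"]) * 1_000_000,
    )


def link_declarations(events: list, fema_rows: list) -> list:
    """Attach each FEMA declaration to the first NOAA event with the same hazard
    whose date range contains the declaration's incident start date."""
    for decl in fema_rows:
        hazard = HAZARD_MAP.get(decl["incidentType"], "other")
        if hazard == "other":
            continue  # unmapped incident types are not linked
        # Keep only the YYYY-MM-DD part of an ISO 8601 timestamp.
        start = date.fromisoformat(decl["incidentBeginDate"][:10])
        for event in events:
            if event.hazard == hazard and event.begin <= start <= event.end:
                event.fema_declarations.append(decl["disasterNumber"])
                break
    return events


# Made-up sample records, for illustration only.
noaa_rows = [{"Name": "Example Storm", "Disaster": "Tropical Cyclone",
              "Begin Date": "20200801", "End Date": "20200805",
              "CPI-Adjusted Cost": "1500.0"}]
fema_rows = [
    {"disasterNumber": 9001, "incidentType": "Hurricane",
     "incidentBeginDate": "2020-08-02T00:00:00.000Z"},
    {"disasterNumber": 9002, "incidentType": "Flood",
     "incidentBeginDate": "2020-08-03T00:00:00.000Z"},
]
events = link_declarations([normalize_noaa(r) for r in noaa_rows], fema_rows)
print(events[0].fema_declarations)  # [9001]
```

Matching on hazard type plus a date window is only one possible linkage rule; a fuller pipeline would likely also compare the affected states.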
- 🔄 Multi-Source Synchronization: Seamlessly merges NOAA Billion-Dollar Disaster datasets with real-time FEMA disaster declarations.
- 🛠 Automated ETL Pipeline: Handles data extraction, schema normalization, and validation without manual intervention.
- 📉 Intelligent Rate Limiting: Retries failed requests with backoff and throttles request rates to stay within NWS and FEMA API limits.
- 🧪 Data Integrity: Uses Pydantic for strict schema enforcement, ensuring that inconsistent API responses don't break downstream analytics.
- 📂 Flexible Export: Writes output as Parquet (a columnar format suited to large datasets), CSV, or JSON.
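
A minimal sketch of the rate-limiting behavior listed above, assuming retryable failures surface as a hypothetical `TransientAPIError` (for example on HTTP 429 or 503). `Throttle` spaces requests at least `min_interval` seconds apart, and `call_with_backoff` retries with exponential backoff plus random jitter. These names and parameters are illustrative, not the project's API; `sleep` and `clock` are injectable so the logic can be tested without real waiting.

```python
import random
import time


class TransientAPIError(Exception):
    """Hypothetical error for retryable responses (e.g. HTTP 429 or 503)."""


class Throttle:
    """Enforce a minimum interval (in seconds) between consecutive requests."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_backoff(fn, *, max_retries=5, base_delay=1.0, throttle=None, sleep=time.sleep):
    """Call fn(); on TransientAPIError, wait and retry up to max_retries times."""
    for attempt in range(max_retries + 1):
        if throttle is not None:
            throttle.wait()
        try:
            return fn()
        except TransientAPIError:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Exponential backoff: base, 2*base, 4*base, ... plus up to base of jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))


# Usage: a fake endpoint that fails twice before succeeding.
calls = {"n": 0}


def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientAPIError("HTTP 429")
    return "ok"


delays = []
result = call_with_backoff(flaky_fetch, base_delay=0.5, sleep=delays.append)
print(result, len(delays))  # ok 2
```

Jitter spreads out retries from concurrent workers so they do not hit the API in lockstep. The throttle and the backoff are independent, so either can be used on its own.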
| Category | Tools |
|---|---|
| Language | Python 3.9+ |
| Data Handling | Pydantic (validation); Parquet, CSV, JSON (export) |
| APIs | noaa-api, fema-data-api, requests |
| DevOps | |
| Domain | Climate Change, Disaster Management, Data Engineering |
- Python 3.9+
- API keys for NOAA/FEMA (if required for higher rate limits)

Installation:

- Clone the repository: `git clone https://github.com/yourusername/disaster-data-replicator.git`, then `cd disaster-data-replicator`
- Set up and activate a virtual environment: `python -m venv venv`, then `source venv/bin/activate` (on Windows: `venv\Scripts\activate`)
- Install dependencies: `pip install -r requirements.txt`

Run the full replication pipeline:

    python main.py --start-year 2000 --output-format parquet

Sync specific agency data:

    # Sync only FEMA disaster declarations
    python scripts/sync_fema.py --region "FL"

Pipeline architecture (Mermaid):

    graph LR
    A[NOAA NWS API] --> E[Ingestion Engine]
    B[FEMA API] --> E
    E --> F{Normalization}
    F --> G[Validation/Pydantic]
    G --> H[(Local Storage / S3)]
    H --> I[Analytics & Visualization]
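
The Validation/Pydantic stage in the diagram can be approximated as below. This dependency-free sketch swaps Pydantic for a standard-library dataclass plus a hand-written `validate()` function, so every check is explicit; in the project, a Pydantic `BaseModel` with typed fields would perform similar coercion and rejection declaratively. The `DisasterRecord` schema, its fields, and the JSON Lines output are assumptions for illustration. Invalid records are collected with their error message instead of aborting the run, which is how inconsistent API responses can be kept out of downstream analytics.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date
from typing import Optional

REQUIRED_FIELDS = ("source", "event_id", "begin")
VALID_SOURCES = {"noaa", "fema"}


@dataclass
class DisasterRecord:
    """Hypothetical normalized schema for one record."""
    source: str
    event_id: str
    begin: date
    cost_usd: Optional[float]


def validate(raw: dict) -> DisasterRecord:
    """Coerce one raw record into DisasterRecord or raise ValueError/TypeError."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    source = str(raw["source"]).lower()
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    # Accept plain dates or ISO timestamps; fromisoformat rejects malformed input.
    begin = date.fromisoformat(str(raw["begin"])[:10])
    cost = raw.get("cost_usd")
    cost_usd = None if cost in (None, "") else float(cost)
    if cost_usd is not None and cost_usd < 0:
        raise ValueError("cost_usd must be non-negative")
    return DisasterRecord(source, str(raw["event_id"]), begin, cost_usd)


def partition(raw_records):
    """Split records into valid DisasterRecords and rejects with error messages."""
    valid, rejected = [], []
    for raw in raw_records:
        try:
            valid.append(validate(raw))
        except (ValueError, TypeError) as exc:
            rejected.append({"record": raw, "error": str(exc)})
    return valid, rejected


def to_json_lines(records) -> str:
    """Serialize records as JSON Lines; dates become ISO strings via default=str."""
    return "\n".join(json.dumps(asdict(r), default=str) for r in records)


# Made-up sample input mixing good and bad records.
raw = [
    {"source": "NOAA", "event_id": "example-1", "begin": "2020-08-01", "cost_usd": "1.5e9"},
    {"source": "fema", "event_id": 9001, "begin": "2020-08-02T00:00:00.000Z"},
    {"source": "nws", "event_id": "x", "begin": "2020-08-03"},
    {"source": "fema", "event_id": "y", "begin": "not-a-date"},
]
valid, rejected = partition(raw)
print(len(valid), len(rejected))  # 2 2
```

Keeping rejects alongside their error messages makes them available for inspection, while the valid records can be passed on to whichever exporter `--output-format` selects.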
Contributions make the open-source community an amazing place to learn, inspire, and create.
- Fork the Project
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a pull request
Distributed under the MIT License. See LICENSE for more information.
Your Name - @YourTwitter - your.email@example.com
Project Link: https://github.com/yourusername/disaster-data-replicator
This project was developed to provide transparent access to climate-related economic data, supporting the global effort to understand and mitigate the impacts of climate change.