HW_02_202601 #168

@jeanpool1415


Homework Assignment: Emergency Healthcare Access Inequality in Peru

Course: Python Programming / Data Science
Total Score: 20 points
Deadline: Friday, April 24 — 11:59 PM


Description

The goal of this assignment is to build a complete geospatial analytics pipeline in Python to study emergency healthcare access inequality across districts in Peru.

Using public datasets on:

  • health facilities,

  • emergency care activity,

  • populated centers,

  • and district boundaries,

you will create a project that integrates GeoPandas, Folium, matplotlib / seaborn, and Streamlit into a single analytical product.

This is not a task about simply plotting hospitals on a map.
The purpose is to answer a more difficult analytical question:

Which districts in Peru appear relatively better or worse served in emergency healthcare access, and what evidence supports that conclusion?

You are expected to make methodological decisions, justify them, and communicate them clearly.


Expected Repository Structure

Create a repository named exactly:

emergency_access_peru

with the following structure:

emergency_access_peru/
│
├── app.py                              # Streamlit app
├── README.md                           # Project explanation and methodology
├── requirements.txt                    # Dependencies
│
├── src/
│   ├── data_loader.py                  # Loading functions
│   ├── cleaning.py                     # Cleaning and preprocessing
│   ├── geospatial.py                   # Spatial joins, distance logic, GeoDataFrames
│   ├── metrics.py                      # District-level indicators / score
│   ├── visualization.py                # Static charts and maps
│   └── utils.py                        # Helper functions
│
├── data/
│   ├── raw/                            # Original downloaded files
│   └── processed/                      # Cleaned and processed outputs
│
├── output/
│   ├── figures/                        # Static charts and maps
│   └── tables/                         # Final district-level tables
│
└── video/
    └── link.txt                        # Link to your explanatory video

Public Datasets to Use

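You must use all of the following resources. Items 1, 2, 4, and 5 are the four datasets; item 3 is a reference class on working with the district shapefile: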

  1. Populated Centers
    Dataset Centros Poblados

  2. District Boundaries of Peru
    DISTRITOS.shp

  3. Reference Class for District Shapefile Usage
    GeoPandas Class Reference

  4. Emergency Care Production by IPRESS
    Producción Asistencial en Emergencia por IPRESS

  5. IPRESS Health Facilities
    MINSA – IPRESS


Main Objective

Build a district-level emergency healthcare access analysis for Peru.

Your project must combine the four datasets and produce a district-level analytical output that allows comparison across the country.

You are not given a fixed formula.

You must design and justify a district-level measure, index, or framework that helps evaluate emergency healthcare access conditions using:

  • facility availability,

  • emergency care activity,

  • and the spatial relationship between populated centers and health facilities.


Required Analytical Questions

Your project must answer the following questions using the full pipeline and all required tools.

Question 1 — Territorial Availability

Which districts appear to have lower or higher availability of health facilities and emergency care activity?

You must determine how to measure this and justify your approach.


Question 2 — Settlement Access

Which districts seem to have populated centers with weaker spatial access to emergency-related health services?

You must define and justify a spatial access logic from populated centers to facilities.
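
One possible access logic, sketched here in plain Python so the arithmetic is visible, is the straight-line distance from each populated center to its nearest facility. The function names and the (lat, lon) tuple layout are illustrative assumptions, not part of the assignment; in a GeoPandas pipeline, a projected CRS with a nearest-neighbor join would typically play the same role.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_facility_km(center, facilities):
    """Distance from one populated center to its closest facility.

    `center` is a (lat, lon) tuple and `facilities` a non-empty list of them.
    """
    return min(haversine_km(*center, *facility) for facility in facilities)
```

Straight-line distance ignores roads and terrain, which is a limitation worth stating in the README if you adopt this logic.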


Question 3 — District Comparison

Which districts appear most underserved and which appear best served when combining:

  • facility presence,

  • emergency activity,

  • and populated-center access patterns?

You must explain the evidence behind your classification or ranking.


Question 4 — Methodological Sensitivity

How much do the district results change if your analytical definition of access changes?

You must build:

  • one baseline specification

  • one alternative specification

Then compare the results and explain what changed.
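
A simple way to compare the two specifications is to rank districts under each one and see which districts move. The sketch below assumes each specification has already produced one score per district; the district codes and score values are made up for illustration.

```python
def rank_districts(scores):
    """Map each district to its rank (1 = highest score).

    Ties are broken by the order in which districts appear in `scores`.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {district: position for position, district in enumerate(ordered, start=1)}


def rank_shifts(baseline_scores, alternative_scores):
    """Rank change per district: positive means it moved up in the alternative."""
    base = rank_districts(baseline_scores)
    alt = rank_districts(alternative_scores)
    return {district: base[district] - alt[district] for district in base}


# Hypothetical scores for four placeholder districts.
baseline_scores = {"A": 0.90, "B": 0.55, "C": 0.40, "D": 0.10}
alternative_scores = {"A": 0.80, "B": 0.35, "C": 0.50, "D": 0.05}
shifts = rank_shifts(baseline_scores, alternative_scores)
```

Districts with large shifts are the ones whose classification depends most on how access is defined, so they are natural candidates to discuss in the comparison.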


Technical Requirements

Your project must include all of the following tools:

  • GeoPandas

  • Folium

  • matplotlib

  • seaborn

  • Streamlit


Project Tasks

Task 1 — Data ingestion and cleaning

Score: 3 points

Load all required datasets and prepare them for analysis.

Your code must:

  • load the raw files correctly,

  • standardize column names,

  • document key variables,

  • handle duplicates,

  • remove invalid coordinates when needed,

  • prepare geospatial objects correctly,

  • and save cleaned outputs into data/processed/.

Minimum deliverables

  • cleaned datasets,

  • short data dictionary,

  • summary of filtering and cleaning decisions.
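
Two of the cleaning steps above, standardizing column names and rejecting invalid coordinates, could look roughly like this. The helper names are hypothetical, and the bounding box is only an approximate envelope for Peru that should be checked against the district shapefile before use.

```python
import re
import unicodedata

# Approximate bounding box for Peru in decimal degrees (an assumption for
# illustration; verify against the district shapefile before relying on it).
PERU_LAT_RANGE = (-18.5, 0.0)
PERU_LON_RANGE = (-81.5, -68.5)


def standardize_column_name(name: str) -> str:
    """Return a lowercase, ASCII, snake_case version of a column name."""
    # Decompose accented characters and drop the combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one underscore.
    snake = re.sub(r"[^0-9a-zA-Z]+", "_", ascii_name).strip("_")
    return snake.lower()


def has_valid_coordinates(lat, lon) -> bool:
    """Check that a latitude/longitude pair is numeric and inside the box."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    # NaN fails both comparisons, so missing values are rejected as well.
    return (
        PERU_LAT_RANGE[0] <= lat <= PERU_LAT_RANGE[1]
        and PERU_LON_RANGE[0] <= lon <= PERU_LON_RANGE[1]
    )
```

A check like this also catches swapped latitude and longitude columns, since a longitude of about -77 is not a valid latitude for Peru. Counting how many rows each rule removes gives you the filtering summary for the README.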


Task 2 — Geospatial integration with GeoPandas

Score: 3 points

Build the geospatial pipeline.

Your code must:

  • create GeoDataFrames,

  • assign facilities to districts,

  • assign populated centers to districts,

  • and build the spatial relationships needed for your district-level analysis.

You must use proper CRS handling and explain it in the README.
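
As a minimal sketch of the point-to-district assignment, the example below uses toy geometries in place of the real files. The column names (`ubigeo`, `facility_id`, `lon`, `lat`) are hypothetical, and EPSG:32718 (UTM zone 18S) is just one common metric CRS for Peru, assumed here for illustration.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon

# Toy stand-ins for the real inputs: two square "districts" and three
# facilities. Column names (ubigeo, facility_id, lon, lat) are hypothetical.
districts = gpd.GeoDataFrame(
    {"ubigeo": ["000001", "000002"]},
    geometry=[
        Polygon([(-77.2, -12.2), (-77.0, -12.2), (-77.0, -12.0), (-77.2, -12.0)]),
        Polygon([(-77.0, -12.2), (-76.8, -12.2), (-76.8, -12.0), (-77.0, -12.0)]),
    ],
    crs="EPSG:4326",
)
facilities = pd.DataFrame(
    {
        "facility_id": [1, 2, 3],
        "lon": [-77.1, -76.9, -76.85],
        "lat": [-12.1, -12.1, -12.05],
    }
)

# Build point geometries from the coordinate columns, declaring WGS84.
facilities_gdf = gpd.GeoDataFrame(
    facilities,
    geometry=gpd.points_from_xy(facilities["lon"], facilities["lat"]),
    crs="EPSG:4326",
)

# Both layers must share a CRS before the join; reproject if they differ.
facilities_gdf = facilities_gdf.to_crs(districts.crs)

# Left join keeps facilities that fall outside every district (ubigeo = NaN).
joined = gpd.sjoin(
    facilities_gdf, districts[["ubigeo", "geometry"]], how="left", predicate="within"
)

# Facility counts per district, including districts with zero facilities.
facility_counts = (
    joined.groupby("ubigeo").size().reindex(districts["ubigeo"], fill_value=0)
)

# For distances in meters, reproject to a metric CRS first. EPSG:32718
# (UTM zone 18S) is one common choice for Peru, assumed here for illustration.
facilities_utm = facilities_gdf.to_crs(epsg=32718)
```

The same pattern applies to populated centers. Whichever CRS you choose for the join and for distance work, state it and the reason for it in the README.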


Task 3 — District-level metric / framework

Score: 3 points

Construct a district-level analytical output that helps answer the four required questions.

This may be:

  • a score,

  • an index,

  • or a rule-based classification system.

It must include at least:

  • one facility-related component,

  • one emergency-activity component,

  • one populated-center access component.

You must also create:

  • one baseline version,

  • one alternative version,

  • and a comparison between both.
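
One way to combine the three components is a weighted sum of min-max normalized values, with the baseline and the alternative differing only in their weights. This is a sketch under that assumption, not a required formula; the component names, weights, and toy values are all made up for illustration.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1]; constant lists map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def access_index(districts, weights):
    """Weighted sum of normalized components; higher means better served.

    `districts` maps a district code to a dict with three hypothetical
    components: facility count, emergency visits, and mean distance (km)
    from populated centers to the nearest facility.
    """
    codes = list(districts)
    facilities = min_max([districts[c]["facility_count"] for c in codes])
    activity = min_max([districts[c]["emergency_visits"] for c in codes])
    # Distance is a cost, so invert it after normalizing.
    proximity = [1 - d for d in min_max([districts[c]["mean_km"] for c in codes])]
    return {
        code: weights["facilities"] * f
        + weights["activity"] * a
        + weights["proximity"] * p
        for code, f, a, p in zip(codes, facilities, activity, proximity)
    }


# Made-up values for three placeholder districts, for illustration only.
toy = {
    "A": {"facility_count": 4, "emergency_visits": 120, "mean_km": 2.0},
    "B": {"facility_count": 3, "emergency_visits": 60, "mean_km": 17.0},
    "C": {"facility_count": 0, "emergency_visits": 0, "mean_km": 22.0},
}
baseline = access_index(toy, {"facilities": 1 / 3, "activity": 1 / 3, "proximity": 1 / 3})
alternative = access_index(toy, {"facilities": 0.2, "activity": 0.2, "proximity": 0.6})
```

Min-max scaling is sensitive to outliers, and raw counts favor large districts, so alternatives such as per-capita rates or rank-based scaling may be worth considering and justifying.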


Task 4 — Static analysis and visual reasoning

Score: 2 points

Create static visual outputs using matplotlib and seaborn.

Important:

  • You are not told which graphs to generate.

  • Choosing the correct graphs is part of the assignment.

Your visualizations must help answer the required analytical questions and support your methodological decisions.

In the README, explain:

  • what each graph helps answer,

  • why you chose it,

  • and why it is more useful than another plausible graph type.


Task 5 — Static and interactive geospatial outputs

Score: 2 points

Using GeoPandas and Folium, create:

  • static geospatial outputs for analytical comparison,

  • interactive views for exploration,

  • and district-level visual support for your conclusions.

The maps must help explain the problem, not simply display raw locations.


Task 6 — Streamlit application

Score: 4 points

Build a Streamlit application that communicates the full analysis.

The app must contain exactly 4 tabs:

Tab 1 — Data & Methodology

Include:

  • problem statement,

  • data sources,

  • cleaning summary,

  • methodological decisions,

  • limitations.

Tab 2 — Static Analysis

Include:

  • your selected charts,

  • short interpretations,

  • explanation of why those visuals were selected.

Tab 3 — GeoSpatial Results

Include:

  • static maps,

  • district-level comparisons,

  • and supporting tables.

Tab 4 — Interactive Exploration

Include:

  • Folium maps,

  • district comparison views,

  • and baseline vs alternative comparison.


README.md must include at least

  • What does the project do?

  • What is the main analytical goal?

  • What datasets were used?

  • How were the data cleaned?

  • How were the district-level metrics constructed?

  • How to install the dependencies?

  • How to run the processing pipeline?

  • How to run the Streamlit app?

  • What are the main findings?

  • What are the main limitations?


Explanatory Video

Score: 3 points

Create a video no longer than 4 minutes.

It must show:

  • brief explanation of the repository structure,

  • explanation of your methodology,

  • the pipeline or main scripts,

  • the Streamlit app running,

  • and the main outputs.

Place the link in:

video/link.txt

Also submit the links to your repository and video here:
Submission Google Sheet


GitHub Workflow (MANDATORY)

❌ Do not work directly on main — points will be deducted
✅ Create a working branch
✅ Make progressive commits with descriptive messages
✅ Merge into main via a Pull Request

Reminder: working directly on main is not allowed. You must develop your work in branches and only merge the final version through a Pull Request.

Example branch names:

  • feature/geospatial-pipeline

  • feature/streamlit-app

  • feature/final-analysis


Grading Rubric

Task 1 — Data ingestion and cleaning

| Criteria | Points |
| -- | -- |
| Correct loading and preprocessing of datasets | 1.0 pt |
| Cleaning decisions documented clearly | 0.75 pt |
| Processed outputs saved correctly | 0.75 pt |
| Data dictionary / filtering summary | 0.5 pt |
| Subtotal | 3.0 pts |

Penalties

  • Working directly on main: -2 points

  • Missing explanatory video: -3 points

  • Repository not reproducible: up to -2 points

  • No baseline vs alternative comparison: -1 point

  • No explanation of methodological choices: up to -2 points

  • Streamlit app incomplete or missing required tabs: up to -2 points


Checklist before submitting

  • Repository is named emergency_access_peru

  • All required datasets were used

  • The project includes requirements.txt

  • The project includes README.md

  • The project includes app.py

  • The project includes modular code inside src/

  • Raw and processed data folders are present

  • The Streamlit app has exactly 4 tabs

  • The video link is saved in video/link.txt

  • Work was done through branches and merged via Pull Request

  • The README explains methodology and findings

  • The repository is fully runnable

  • The repository and video links were submitted in the Google Sheet


Submission

Submit the link to your GitHub repository and your video before:

Friday, April 24 — 11:59 PM

Submission form:
Google Sheet for repository and video submission


Final Note

This assignment is intentionally designed to evaluate analytical reasoning, not only coding speed.

The most important part is not producing many outputs.
The most important part is being able to justify:

  • how you defined access,

  • how you combined the datasets,

  • why your chosen graphs and maps are useful,

  • and why your final conclusions are defensible.

Good luck!
