Homework Assignment: Emergency Healthcare Access Inequality in Peru
Course: Python Programming / Data Science
Total Score: 20 points
Deadline: Friday, April 24 — 11:59 PM
Description
The goal of this assignment is to build a complete geospatial analytics pipeline in Python to study emergency healthcare access inequality across districts in Peru.
Using public datasets on:
health facilities,
emergency care activity,
populated centers,
and district boundaries,
you will create a project that integrates GeoPandas, Folium, matplotlib / seaborn, and Streamlit into a single analytical product.
This is not a task about simply plotting hospitals on a map.
The purpose is to answer a more difficult analytical question:
Which districts in Peru appear relatively better or worse served in emergency healthcare access, and what evidence supports that conclusion?
You are expected to make methodological decisions, justify them, and communicate them clearly.
Expected Repository Structure
Create a repository named exactly:
emergency_access_peru
with the following structure:
emergency_access_peru/
│
├── app.py # Streamlit app
├── README.md # Project explanation and methodology
├── requirements.txt # Dependencies
│
├── src/
│ ├── data_loader.py # Loading functions
│ ├── cleaning.py # Cleaning and preprocessing
│ ├── geospatial.py # Spatial joins, distance logic, GeoDataFrames
│ ├── metrics.py # District-level indicators / score
│ ├── visualization.py # Static charts and maps
│ └── utils.py # Helper functions
│
├── data/
│ ├── raw/ # Original downloaded files
│ └── processed/ # Cleaned and processed outputs
│
├── output/
│ ├── figures/ # Static charts and maps
│ └── tables/ # Final district-level tables
│
└── video/
└── link.txt # Link to your explanatory video
Public Datasets to Use
You must use all the following datasets:
Populated Centers
Dataset Centros Poblados
District Boundaries of Peru
DISTRITOS.shp
Reference Class for District Shapefile Usage
GeoPandas Class Reference
Emergency Care Production by IPRESS
Producción Asistencial en Emergencia por IPRESS
IPRESS Health Facilities
MINSA – IPRESS
Main Objective
Build a district-level emergency healthcare access analysis for Peru.
Your project must combine the four datasets and produce a district-level analytical output that allows comparison across the country.
You are not given a fixed formula.
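You must design and justify a district-level measure, index, or framework that helps evaluate emergency healthcare access conditions using:
facility availability,
emergency care activity,
and the spatial relationship between populated centers and health facilities.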
Required Analytical Questions
Your project must answer the following questions using the full pipeline and all required tools.
Question 1 — Territorial Availability
Which districts appear to have lower or higher availability of health facilities and emergency care activity?
You must determine how to measure this and justify your approach.
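Below is a sketch of one possible way to measure territorial availability, under stated assumptions. It links each emergency-production record to its facility by IPRESS code, rolls facilities and emergency visits up to the district, and expresses both relative to population. The field names (`codigo_ipress`, `ubigeo`, `atenciones`) and the per-district population input are hypothetical. Check the actual column names in the downloaded files, and note that a usable population field may or may not exist in the populated-centers data. The loop runs over every district, not only those with facilities, so districts with zero facilities stay in the output instead of being silently dropped.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def district_availability(
    facilities: List[dict],
    emergency_rows: List[dict],
    population: Dict[str, float],
) -> Tuple[Dict[str, Dict[str, Optional[float]]], int]:
    """Per-district facility and emergency-activity rates (hypothetical field names)."""
    # Map each facility code to the district (ubigeo) it was assigned to.
    district_of = {f["codigo_ipress"]: f["ubigeo"] for f in facilities}

    # Count facilities per district.
    n_facilities: Dict[str, int] = defaultdict(int)
    for f in facilities:
        n_facilities[f["ubigeo"]] += 1

    # Sum emergency visits per district; count records whose facility code has no match.
    visits: Dict[str, float] = defaultdict(float)
    unmatched = 0
    for row in emergency_rows:
        ubigeo = district_of.get(row["codigo_ipress"])
        if ubigeo is None:
            unmatched += 1
            continue
        visits[ubigeo] += float(row["atenciones"])

    # Loop over all districts so that districts without facilities are kept with zeros.
    result: Dict[str, Dict[str, Optional[float]]] = {}
    for ubigeo, pop in population.items():
        result[ubigeo] = {
            "facilities_per_10k": 10_000 * n_facilities[ubigeo] / pop if pop else None,
            "emergency_visits_per_1k": 1_000 * visits[ubigeo] / pop if pop else None,
        }
    return result, unmatched
```

The `unmatched` count belongs in the cleaning summary. Emergency records that cannot be linked to a facility drop out of the district totals.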
Question 2 — Settlement Access
Which districts seem to have populated centers with weaker spatial access to emergency-related health services?
You must define and justify a spatial access logic from populated centers to facilities.
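Below is a sketch of one simple access logic among several. It measures the great-circle (haversine) distance from each populated center to its nearest facility, then reports, per district, the share of populated centers farther away than a threshold. A district's mean or median nearest-facility distance is an equally plausible aggregation. The 10 km threshold and the coordinates in the usage are purely illustrative. The brute-force nearest search only suits small samples; the full national datasets need a spatial index, for example GeoPandas `sjoin_nearest` after reprojecting both layers to a metric CRS.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_facility_km(center: Tuple[float, float], facilities: List[Tuple[float, float]]) -> float:
    """Brute-force distance from one populated center to its closest facility."""
    lat, lon = center
    return min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)


def share_beyond_threshold(
    centers: List[Tuple[str, float, float]],
    facilities: List[Tuple[float, float]],
    threshold_km: float,
) -> Dict[str, float]:
    """Per district: fraction of populated centers whose nearest facility exceeds threshold_km."""
    totals: Dict[str, int] = {}
    far: Dict[str, int] = {}
    for district, lat, lon in centers:
        totals[district] = totals.get(district, 0) + 1
        if nearest_facility_km((lat, lon), facilities) > threshold_km:
            far[district] = far.get(district, 0) + 1
    return {d: far.get(d, 0) / n for d, n in totals.items()}


# Toy usage: one facility, two districts (made-up coordinates).
facilities = [(-12.0, -77.0)]
centers = [("A", -12.0, -77.0), ("A", -13.0, -77.0), ("B", -12.05, -77.0)]
print(share_beyond_threshold(centers, facilities, threshold_km=10.0))  # {'A': 0.5, 'B': 0.0}
```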
Question 3 — District Comparison
Which districts appear most underserved and which appear best served when combining:
facility presence,
emergency activity,
and populated-center access patterns?
You must explain the evidence behind your classification or ranking.
Question 4 — Methodological Sensitivity
How much do the district results change if your analytical definition of access changes?
You must build:
one baseline specification,
and one alternative specification.
Then compare the results and explain what changed.
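One way to quantify how much the district results change between the baseline and the alternative specification is to compare the two district rankings. A Spearman rank correlation summarizes overall agreement, and per-district rank shifts show which districts move. The sketch below is library-free, uses average ranks for ties, and assumes both score dictionaries cover the same districts. If SciPy is among the project's dependencies, `scipy.stats.spearmanr` computes the same coefficient.

```python
import math
from typing import Dict


def average_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank districts from best (1) to worst; tied scores share their average rank."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of districts tied with ordered[i].
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        for district in ordered[i : j + 1]:
            ranks[district] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(baseline: Dict[str, float], alternative: Dict[str, float]) -> float:
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    ra, rb = average_ranks(baseline), average_ranks(alternative)
    keys = list(baseline)
    mean_a = sum(ra[k] for k in keys) / len(keys)
    mean_b = sum(rb[k] for k in keys) / len(keys)
    cov = sum((ra[k] - mean_a) * (rb[k] - mean_b) for k in keys)
    var_a = sum((ra[k] - mean_a) ** 2 for k in keys)
    var_b = sum((rb[k] - mean_b) ** 2 for k in keys)
    if var_a == 0 or var_b == 0:
        raise ValueError("ranking is constant; correlation is undefined")
    return cov / math.sqrt(var_a * var_b)


def rank_shifts(baseline: Dict[str, float], alternative: Dict[str, float]) -> Dict[str, float]:
    """Positive values mean the district ranks lower (worse) under the alternative."""
    ra, rb = average_ranks(baseline), average_ranks(alternative)
    return {k: rb[k] - ra[k] for k in baseline}
```

A correlation close to 1 with a few large shifts is often more informative than the coefficient alone. The districts that move most are where the access definition matters.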
Technical Requirements
Your project must include all of the following tools:
GeoPandas
Folium
matplotlib
seaborn
Streamlit
Project Tasks
Task 1 — Data ingestion and cleaning
Score: 3 points
Load all required datasets and prepare them for analysis.
Your code must:
load the raw files correctly,
standardize column names,
document key variables,
handle duplicates,
remove invalid coordinates when needed,
prepare geospatial objects correctly,
and save cleaned outputs into data/processed/.
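Below is a minimal, library-free sketch of three of these cleaning steps: standardizing column names, skipping duplicate records, and removing invalid coordinates. In practice the same logic would typically be applied to pandas or GeoPandas objects. The column names in the test data (`Código Único`, `Latitud`, `Longitud`) and the bounding box around Peru are illustrative assumptions, not values taken from the datasets. The returned counts are the kind of figures that belong in the filtering summary.

```python
import re
import unicodedata
from typing import Dict, List, Optional, Tuple

# Rough bounding box around Peru in decimal degrees (assumption; refine against district polygons).
PERU_LAT_RANGE = (-18.5, 0.0)
PERU_LON_RANGE = (-81.5, -68.5)


def standardize_column_name(name: str) -> str:
    """'Código Único ' -> 'codigo_unico': strip accents, lowercase, snake_case."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^0-9a-z]+", "_", without_marks.strip().lower()).strip("_")


def parse_float(value) -> Optional[float]:
    """Return a float, or None for empty or non-numeric values."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_valid_coordinate(lat: Optional[float], lon: Optional[float]) -> bool:
    """Missing values, NaN, and points outside the bounding box are treated as invalid."""
    if lat is None or lon is None:
        return False
    return (PERU_LAT_RANGE[0] <= lat <= PERU_LAT_RANGE[1]
            and PERU_LON_RANGE[0] <= lon <= PERU_LON_RANGE[1])


def clean_records(rows: List[Dict[str, str]], id_field: str, lat_field: str, lon_field: str
                  ) -> Tuple[List[dict], Dict[str, int]]:
    """Standardize keys, skip repeated IDs, drop invalid coordinates, and count what was removed."""
    summary = {"input": len(rows), "duplicates": 0, "invalid_coords": 0}
    seen = set()
    cleaned = []
    for row in rows:
        record = {standardize_column_name(k): v for k, v in row.items()}
        key = record.get(id_field)
        if key in seen:
            summary["duplicates"] += 1
            continue
        lat, lon = parse_float(record.get(lat_field)), parse_float(record.get(lon_field))
        if not is_valid_coordinate(lat, lon):
            summary["invalid_coords"] += 1
            continue
        seen.add(key)  # only records that pass validation claim their ID
        record[lat_field], record[lon_field] = lat, lon
        cleaned.append(record)
    summary["output"] = len(cleaned)
    return cleaned, summary
```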
Minimum deliverables:
cleaned datasets,
a short data dictionary,
and a summary of filtering and cleaning decisions.
Task 2 — Geospatial integration with GeoPandas
Score: 3 points
Build the geospatial pipeline.
Your code must:
create GeoDataFrames,
assign facilities to districts,
assign populated centers to districts,
and build the spatial relationships needed for your district-level analysis.
You must use proper CRS handling and explain it in the README.
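In GeoPandas, assigning points to districts is usually a spatial join such as `geopandas.sjoin(points, districts, how="left", predicate="within")`. It runs after both layers have been brought to the same CRS with `to_crs`. The library-free sketch below only illustrates what that join decides for each point, using the even-odd (ray-casting) point-in-polygon test. It assumes simple polygons without holes or multipart geometries. The square "districts" are toy shapes, not real boundaries.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(point: Point, ring: List[Point]) -> bool:
    """Even-odd rule: count how many polygon edges a ray to the right of the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_to_districts(points: Dict[str, Point], districts: Dict[str, List[Point]]
                        ) -> Dict[str, Optional[str]]:
    """Return the first district containing each point, or None (like a left join)."""
    return {
        pid: next((did for did, ring in districts.items() if point_in_polygon(pt, ring)), None)
        for pid, pt in points.items()
    }


# Toy usage: two adjacent unit squares as "districts".
districts = {"D1": [(0, 0), (1, 0), (1, 1), (0, 1)], "D2": [(1, 0), (2, 0), (2, 1), (1, 1)]}
points = {"a": (0.5, 0.5), "b": (1.5, 0.5), "c": (3.0, 3.0)}
print(assign_to_districts(points, districts))  # {'a': 'D1', 'b': 'D2', 'c': None}
```

A `None` result corresponds to a point that falls outside every district, such as a coordinate in the sea. Counting these points is a useful check in the cleaning summary.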
Task 3 — District-level metric / framework
Score: 3 points
Construct a district-level analytical output that helps answer the four required questions.
This may be:
a score,
an index,
or a rule-based classification system.
It must include at least:
one facility-related component,
one emergency-activity component,
one populated-center access component.
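Below is a sketch of one possible construction, meant as an illustration rather than a required formula. It rescales each component to [0, 1] with min-max normalization, flips components where a larger raw value means worse access, and takes a weighted average. The component names, values, and weights are hypothetical. Changing the weights, or the access definition that feeds a component, is one straightforward way to get a baseline and an alternative version from the same code.

```python
from typing import Dict, Iterable


def min_max(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale to [0, 1]; a constant component carries no information and becomes 0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def access_index(
    components: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    invert: Iterable[str] = (),
) -> Dict[str, float]:
    """Weighted mean of normalized components; higher means better served."""
    invert = set(invert)
    normalized = {}
    for name, values in components.items():
        scaled = min_max(values)
        if name in invert:  # larger raw value = worse access, so flip it
            scaled = {k: 1.0 - v for k, v in scaled.items()}
        normalized[name] = scaled
    total_weight = sum(weights.values())
    districts = next(iter(components.values())).keys()
    return {
        d: sum(w * normalized[name][d] for name, w in weights.items()) / total_weight
        for d in districts
    }


# Toy usage with made-up values for three districts.
components = {
    "facilities_per_10k": {"A": 2.0, "B": 4.0, "C": 0.0},
    "emergency_visits_per_1k": {"A": 50.0, "B": 100.0, "C": 0.0},
    "share_centers_far": {"A": 0.5, "B": 0.0, "C": 1.0},
}
baseline_weights = {name: 1.0 for name in components}
print(access_index(components, baseline_weights, invert={"share_centers_far"}))
# {'A': 0.5, 'B': 1.0, 'C': 0.0}
```

Min-max scaling is sensitive to a few extreme districts. Rank- or percentile-based scaling is a plausible alternative worth testing.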
You must also create:
one baseline version,
one alternative version,
and a comparison between both.
Task 4 — Static analysis and visual reasoning
Score: 2 points
Create static visual outputs using matplotlib and seaborn.
Important:
You are not told which graphs to generate.
Choosing the correct graphs is part of the assignment.
Your visualizations must help answer the required analytical questions and support your methodological decisions.
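In the README, explain:
what each graph helps answer,
why you chose it,
and why it is more useful than another plausible graph type.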
Task 5 — Static and interactive geospatial outputs
Score: 2 points
Using GeoPandas and Folium, create:
static geospatial outputs for analytical comparison,
interactive views for exploration,
and district-level visual support for your conclusions.
The maps must help explain the problem, not simply display raw locations.
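Mapping a classified index, rather than raw facility points, is one way to make the maps explain the problem. The sketch below assigns districts to equal-count (quantile) classes based on an index. The resulting class column could then drive a categorical `GeoDataFrame.plot(column=..., categorical=True, legend=True)` map or a Folium choropleth. The five classes and their labels are illustrative assumptions. Because the split follows position in the sorted order, tied scores may land in different classes.

```python
from typing import Dict, List, Optional

# Hypothetical class labels, ordered from lowest to highest index value.
DEFAULT_LABELS = ["most underserved", "underserved", "middle", "better served", "best served"]


def quantile_classes(scores: Dict[str, float], labels: Optional[List[str]] = None) -> Dict[str, str]:
    """Split districts into len(labels) groups of (nearly) equal size, lowest scores first."""
    labels = labels or DEFAULT_LABELS
    k = len(labels)
    ordered = sorted(scores, key=scores.get)  # ascending: worst-served first
    n = len(ordered)
    return {district: labels[(i * k) // n] for i, district in enumerate(ordered)}
```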
Task 6 — Streamlit application
Score: 4 points
Build a Streamlit application that communicates the full analysis.
The app must contain exactly 4 tabs:
Tab 1 — Data & Methodology
Include:
problem statement,
data sources,
cleaning summary,
methodological decisions,
limitations.
Tab 2 — Static Analysis
Include:
your selected charts,
short interpretations,
explanation of why those visuals were selected.
Tab 3 — GeoSpatial Results
Include:
static maps,
district-level comparisons,
and supporting tables.
Tab 4 — Interactive Exploration
Include:
Folium maps,
district comparison views,
and baseline vs alternative comparison.
README.md must include at least:
What does the project do?
What is the main analytical goal?
What datasets were used?
How were the data cleaned?
How were the district-level metrics constructed?
How to install the dependencies?
How to run the processing pipeline?
How to run the Streamlit app?
What are the main findings?
What are the main limitations?
Explanatory Video
Score: 3 points
Create a video of 4 minutes maximum.
It must show:
brief explanation of the repository structure,
explanation of your methodology,
the pipeline or main scripts,
the Streamlit app running,
and the main outputs.
Place the link in:
video/link.txt
Place the link to your repository and video here as well:
Submission Google Sheet
GitHub Workflow (MANDATORY)
❌ Do not work directly on main — points will be deducted
✅ Create a working branch
✅ Make progressive commits with descriptive messages
✅ Merge into main via a Pull Request
Reminder: working directly on main is not allowed. You must develop your work in branches and only merge the final version through a Pull Request.
Example branch names:
feature/geospatial-pipeline
feature/streamlit-app
feature/final-analysis
Grading Rubric
Task 1 — Data ingestion and cleaning
Criteria | Points
-- | --
Correct loading and preprocessing of datasets | 1.0 pt
Cleaning decisions documented clearly | 0.75 pt
Processed outputs saved correctly | 0.75 pt
Data dictionary / filtering summary | 0.5 pt
Subtotal | 3.0 pts
Penalties
Working directly on main: -2 points
Missing explanatory video: -3 points
Repository not reproducible: up to -2 points
No baseline vs alternative comparison: -1 point
No explanation of methodological choices: up to -2 points
Streamlit app incomplete or missing required tabs: up to -2 points
Checklist before submitting
Repository is named emergency_access_peru
All required datasets were used
The project includes requirements.txt
The project includes README.md
The project includes app.py
The project includes modular code inside src/
Raw and processed data folders are present
The Streamlit app has exactly 4 tabs
The video link is saved in video/link.txt
Work was done through branches and merged via Pull Request
The README explains methodology and findings
The repository is fully runnable
The repository and video links were submitted in the Google Sheet
Submission
Submit the link to your GitHub repository and your video before:
Friday, April 24 — 11:59 PM
Submission form:
Google Sheet for repository and video submission
Final Note
This assignment is intentionally designed to evaluate analytical reasoning, not only coding speed.
The most important part is not producing many outputs.
The most important part is being able to justify:
how you defined access,
how you combined the datasets,
why your chosen graphs and maps are useful,
and why your final conclusions are defensible.
Good luck!