HW_02_202601

<html>
<body>
<html><head></head><body>
<hr>
<h1>Homework Assignment: Emergency Healthcare Access Inequality in Peru</h1>
<ul><li>Course: Python Programming / Data Science</li><li>Total Score: 20 points</li><li>Deadline: Friday, April 24, 11:59 PM</li></ul>
<hr>
<h2>Description</h2>
<p>The goal of this assignment is to build a complete geospatial analytics pipeline in Python to study emergency healthcare access inequality across districts in Peru.</p>
<p>Using public datasets on:</p>
<ul><li>health facilities,</li><li>emergency care activity,</li><li>populated centers,</li><li>and district boundaries,</li></ul>
<p>you will create a project that integrates GeoPandas, Folium, matplotlib / seaborn, and Streamlit into a single analytical product.</p>
<p>This is not a task about simply plotting hospitals on a map. The purpose is to answer a more difficult analytical question:</p>
<blockquote>Which districts in Peru appear relatively better or worse served in emergency healthcare access, and what evidence supports that conclusion?</blockquote>
<p>You are expected to make methodological decisions, justify them, and communicate them clearly.</p>
<hr>
<h2>Expected Repository Structure</h2>
<p>Create a repository named exactly <code inline="">emergency_access_peru</code> with the following structure:</p>
<pre><code class="language-bash">emergency_access_peru/
│
├── app.py                  # Streamlit app
├── README.md               # Project explanation and methodology
├── requirements.txt        # Dependencies
│
├── src/
│   ├── data_loader.py      # Loading functions
│   ├── cleaning.py         # Cleaning and preprocessing
│   ├── geospatial.py       # Spatial joins, distance logic, GeoDataFrames
│   ├── metrics.py          # District-level indicators / score
│   ├── visualization.py    # Static charts and maps
│   └── utils.py            # Helper functions
│
├── data/
│   ├── raw/                # Original downloaded files
│   └── processed/          # Cleaned and processed outputs
│
├── output/
│   ├── figures/            # Static charts and maps
│   └── tables/             # Final district-level tables
│
└── video/
    └── link.txt            # Link to your explanatory video
</code></pre>
<hr>
<h2>Public Datasets to Use</h2>
<p>You must use all of the following resources. Items 1, 2, 4, and 5 are the four datasets; item 3 is a class reference for working with the district shapefile.</p>
<ol>
<li>Populated Centers: <a href="https://www.datosabiertos.gob.pe/dataset/dataset-centros-poblados">Dataset Centros Poblados</a></li>
<li>District Boundaries of Peru: <a href="https://github.com/d2cml-ai/Data-Science-Python/blob/main/_data/Folium/DISTRITOS.shp">DISTRITOS.shp</a></li>
<li>Reference Class for District Shapefile Usage: <a href="https://github.com/d2cml-ai/Data-Science-Python/blob/main/Lectures/Lecture_5/Geopandas1.ipynb">GeoPandas Class Reference</a></li>
<li>Emergency Care Production by IPRESS: <a href="http://datos.susalud.gob.pe/dataset/consulta-c1-produccion-asistencial-en-emergencia-por-ipress">Producción Asistencial en Emergencia por IPRESS</a></li>
<li>IPRESS Health Facilities: <a href="https://www.datosabiertos.gob.pe/dataset/minsa-ipress">MINSA – IPRESS</a></li>
</ol>
<hr>
<h2>Main Objective</h2>
<p>Build a district-level emergency healthcare access analysis for Peru.</p>
<p>Your project must combine the four datasets and produce a district-level analytical output that allows comparison across the country.</p>
<p>You are not given a fixed formula. You must design and justify a district-level measure, index, or framework that helps evaluate emergency healthcare access conditions using:</p>
<ul><li>facility availability,</li><li>emergency care activity,</li><li>and the spatial relationship between populated centers and health facilities.</li></ul>
<hr>
<h2>Required Analytical Questions</h2>
<p>Your project must answer the following questions using the full pipeline and all required tools.</p>
<h3>Question 1 — Territorial Availability</h3>
<p>Which districts appear to have lower or higher availability of health facilities and emergency care activity?</p>
<p>You must determine how to measure this and justify your approach.</p>
<hr>
<h3>Question 2 — Settlement Access</h3>
<p>Which districts seem to have populated centers with weaker spatial access to emergency-related health services?</p>
<p>You must define and justify a spatial access logic from populated centers to facilities.</p>
<hr>
<h3>Question 3 — District Comparison</h3>
<p>Which districts appear most underserved and which appear best served when combining:</p>
<ul><li>facility presence,</li><li>emergency activity,</li><li>and populated-center access patterns?</li></ul>
<p>You must explain the evidence behind your classification or ranking.</p>
<hr>
<h3>Question 4 — Methodological Sensitivity</h3>
<p>How much do the district results change if your analytical definition of access changes?</p>
<p>You must build:</p>
<ul><li>one baseline specification,</li><li>one alternative specification.</li></ul>
<p>Then compare the results and explain what changed.</p>
<hr>
<h2>Technical Requirements</h2>
<p>Your project must include all of the following tools:</p>
<ul><li>GeoPandas</li><li>Folium</li><li>matplotlib</li><li>seaborn</li><li>Streamlit</li></ul>
<hr>
<h2>Project Tasks</h2>
<h3>Task 1 — Data ingestion and cleaning</h3>
<p>Score: 3 points</p>
<p>Load all required datasets and prepare them for analysis.</p>
<p>Your code must:</p>
<ul><li>load the raw files correctly,</li><li>standardize column names,</li><li>document key variables,</li><li>handle duplicates,</li><li>remove invalid coordinates when needed,</li><li>prepare geospatial objects correctly,</li><li>and save cleaned outputs into <code inline="">data/processed/</code>.</li></ul>
<h4>Minimum deliverables</h4>
<ul><li>cleaned datasets,</li><li>short data dictionary,</li><li>summary of filtering and cleaning decisions.</li></ul>
<hr>
<h3>Task 2 — Geospatial integration with GeoPandas</h3>
<p>Score: 3 points</p>
<p>Build the geospatial pipeline.</p>
<p>Your code must:</p>
<ul><li>create GeoDataFrames,</li><li>assign facilities to districts,</li><li>assign populated centers to districts,</li><li>and build the spatial relationships needed for your district-level analysis.</li></ul>
<p>You must use proper CRS handling and explain it in the README.</p>
<hr>
<h3>Task 3 — District-level metric / framework</h3>
<p>Score: 3 points</p>
<p>Construct a district-level analytical output that helps answer the four required questions.</p>
<p>This may be:</p>
<ul><li>a score,</li><li>an index,</li><li>or a rule-based classification system.</li></ul>
<p>It must include at least:</p>
<ul><li>one facility-related component,</li><li>one emergency-activity component,</li><li>one populated-center access component.</li></ul>
<p>You must also create:</p>
<ul><li>one baseline version,</li><li>one alternative version,</li><li>and a comparison between both.</li></ul>
<hr>
<h3>Task 4 — Static analysis and visual reasoning</h3>
<p>Score: 2 points</p>
<p>Create static visual outputs using matplotlib and seaborn.</p>
<p>Important:</p>
<ul><li>You are not told which graphs to generate.</li><li>Choosing the correct graphs is part of the assignment.</li></ul>
<p>Your visualizations must help answer the required analytical questions and support your methodological decisions.</p>
<p>In the README, explain:</p>
<ul><li>what each graph helps answer,</li><li>why you chose it,</li><li>and why it is more useful than another plausible graph type.</li></ul>
<hr>
<h3>Task 5 — Static and interactive geospatial outputs</h3>
<p>Score: 2 points</p>
<p>Using GeoPandas and Folium, create:</p>
<ul><li>static geospatial outputs for analytical comparison,</li><li>interactive views for exploration,</li><li>and district-level visual support for your conclusions.</li></ul>
<p>The maps must help explain the problem, not simply display raw locations.</p>
<hr>
<h3>Task 6 — Streamlit application</h3>
<p>Score: 4 points</p>
<p>Build a Streamlit application that communicates the full analysis.</p>
<p>The app must contain exactly 4 tabs:</p>
<h4>Tab 1 — Data &amp; Methodology</h4>
<p>Include:</p>
<ul><li>problem statement,</li><li>data sources,</li><li>cleaning summary,</li><li>methodological decisions,</li><li>limitations.</li></ul>
<h4>Tab 2 — Static Analysis</h4>
<p>Include:</p>
<ul><li>your selected charts,</li><li>short interpretations,</li><li>explanation of why those visuals were selected.</li></ul>
<h4>Tab 3 — GeoSpatial Results</h4>
<p>Include:</p>
<ul><li>static maps,</li><li>district-level comparisons,</li><li>and supporting tables.</li></ul>
<h4>Tab 4 — Interactive Exploration</h4>
<p>Include:</p>
<ul><li>Folium maps,</li><li>district comparison views,</li><li>and baseline vs alternative comparison.</li></ul>
<hr>
<h2>README.md must include at least</h2>
<ul><li>What does the project do?</li><li>What is the main analytical goal?</li><li>What datasets were used?</li><li>How were the data cleaned?</li><li>How were the district-level metrics constructed?</li><li>How to install the dependencies?</li><li>How to run the processing pipeline?</li><li>How to run the Streamlit app?</li><li>What are the main findings?</li><li>What are the main limitations?</li></ul>
<hr>
<h2>Explanatory Video</h2>
<p>Score: 3 points</p>
<p>Create a video of 4 minutes maximum.</p>
<p>It must show:</p>
<ul><li>brief explanation of the repository structure,</li><li>explanation of your methodology,</li><li>the pipeline or main scripts,</li><li>the Streamlit app running,</li><li>and the main outputs.</li></ul>
<p>Place the link in: <code inline="">video/link.txt</code></p>
<p>Place the link to your repository and video here as well: <a href="https://docs.google.com/spreadsheets/d/16i_gtlZV08QARXl8FM5yX503XjDyKPiRFRbo56cjR2k/edit?gid=511437302#gid=511437302">Submission Google Sheet</a></p>
<hr>
<h2>GitHub Workflow (MANDATORY)</h2>
<ul><li>❌ Do not work directly on <code inline="">main</code> (points will be deducted)</li><li>✅ Create a working branch</li><li>✅ Make progressive commits with descriptive messages</li><li>✅ Merge into <code inline="">main</code> via a Pull Request</li></ul>
<p>Reminder: working directly on <code inline="">main</code> is not allowed. You must develop your work in branches and only merge the final version through a Pull Request.</p>
<p>Example branch names:</p>
<ul><li><code inline="">feature/geospatial-pipeline</code></li><li><code inline="">feature/streamlit-app</code></li><li><code inline="">feature/final-analysis</code></li></ul>
<hr>
<h2>Grading Rubric</h2>
<h3>Task 1 — Data ingestion and cleaning</h3>
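<table>
<thead><tr><th>Criteria</th><th>Points</th></tr></thead>
<tbody>
<tr><td>Correct loading and preprocessing of datasets</td><td>1.0 pt</td></tr>
<tr><td>Cleaning decisions documented clearly</td><td>0.75 pt</td></tr>
<tr><td>Processed outputs saved correctly</td><td>0.75 pt</td></tr>
<tr><td>Data dictionary / filtering summary</td><td>0.5 pt</td></tr>
<tr><td>Subtotal</td><td>3.0 pts</td></tr>
</tbody>
</table>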

<hr>
<h2>Penalties</h2>
<ul><li>Working directly on <code inline="">main</code>: -2 points</li><li>Missing explanatory video: -3 points</li><li>Repository not reproducible: up to -2 points</li><li>No baseline vs alternative comparison: -1 point</li><li>No explanation of methodological choices: up to -2 points</li><li>Streamlit app incomplete or missing required tabs: up to -2 points</li></ul>
<hr>
<h2>Checklist before submitting</h2>
<ul class="contains-task-list">
<li class="task-list-item"><input type="checkbox" disabled=""> Repository is named <code inline="">emergency_access_peru</code></li>
<li class="task-list-item"><input type="checkbox" disabled=""> All required datasets were used</li>
<li class="task-list-item"><input type="checkbox" disabled=""> The project includes <code inline="">requirements.txt</code></li>
<li class="task-list-item"><input type="checkbox" disabled=""> The project includes <code inline="">README.md</code></li>
<li class="task-list-item"><input type="checkbox" disabled=""> The project includes <code inline="">app.py</code></li>
<li class="task-list-item"><input type="checkbox" disabled=""> The project includes modular code inside <code inline="">src/</code></li>
<li class="task-list-item"><input type="checkbox" disabled=""> Raw and processed data folders are present</li>
<li class="task-list-item"><input type="checkbox" disabled=""> The Streamlit app has exactly 4 tabs</li>
<li class="task-list-item"><input type="checkbox" disabled=""> The video link is saved in <code inline="">video/link.txt</code></li>
<li class="task-list-item"><input type="checkbox" disabled=""> Work was done through branches and merged via Pull Request</li>
<li class="task-list-item"><input type="checkbox" disabled=""> The README explains methodology and findings</li>
<li class="task-list-item"><input type="checkbox" disabled=""> The repository is fully runnable</li>
<li class="task-list-item"><input type="checkbox" disabled=""> The repository and video links were submitted in the Google Sheet</li>
</ul>
<hr>
<h2>Submission</h2>
<p>Submit the link to your GitHub repository and your video before Friday, April 24, 11:59 PM.</p>
<p>Submission form: <a href="https://docs.google.com/spreadsheets/d/16i_gtlZV08QARXl8FM5yX503XjDyKPiRFRbo56cjR2k/edit?gid=511437302#gid=511437302">Google Sheet for repository and video submission</a></p>
<hr>
<h2>Final Note</h2>
<p>This assignment is intentionally designed to evaluate analytical reasoning, not only coding speed.</p>
<p>The most important part is not producing many outputs. The most important part is being able to justify:</p>
<ul><li>how you defined access,</li><li>how you combined the datasets,</li><li>why your chosen graphs and maps are useful,</li><li>and why your final conclusions are defensible.</li></ul>
<p>Good luck!</p>
</body></html>
</body>
</html>

