Commit 034382e

Added readme
1 parent 5aa3788 commit 034382e

5 files changed

Lines changed: 1621 additions & 0 deletions

.Rbuildignore

Lines changed: 1 addition & 0 deletions
@@ -3,3 +3,4 @@
 ^data-raw$
 ^LICENSE\.md$
 ^data/metadata$
+^README\.Rmd$

README.Rmd

Lines changed: 265 additions & 0 deletions
@@ -0,0 +1,265 @@
---
output: github_document
always_allow_html: true
editor_options:
  markdown:
    wrap: 72
  chunk_output_type: console
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE,
  fig.retina = 2,
  fig.align = 'center'
)
```

# Handpump Functionality Verification Survey - Chiradzulu, Malawi 2020

<!-- badges: start -->

[![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)

<!-- badges: end -->

This dataset contains detailed field survey records of borehole and
handpump functionality verification exercises conducted in Chiradzulu
District, Malawi, in February 2020. Data were collected by BASEflow
using the mWater mobile data collection platform. Each record represents
a single site visit to a water point, capturing GPS coordinates,
technical assessments, water availability, environmental conditions, and
maintenance history.

The dataset includes:

- Identification & Location – Visit date, water point name/type,
  latitude, and longitude.

- Institutional Factors – Availability of government staff and
  committee permission for inspection.

- Functionality & Condition – Operational status, water availability,
  flow rate measurements, strokes to discharge, and mechanical
  condition.

- Environmental Hazards – Presence of latrines, cemeteries, waste,
  rivers, lakes, flood-prone areas, and difficult terrain within 50 m.

- Repair & Maintenance History – Borehole age, manufacturer,
  installation details, prior repairs, spare parts required, and
  operational feel.

- Documentation – Photographs of the water point and repair parts.

**Purpose**

The dataset supports rural water supply monitoring, maintenance
planning, and public health risk assessments, contributing to efforts to
improve sustainability and reliability of community water points.

**Potential Users**

This dataset can be valuable to:

1. **Local Governments** – For planning maintenance schedules and
   allocating resources to priority water points.

2. **NGOs & Development Partners** – For designing interventions to
   improve rural water supply sustainability.

3. **Researchers & Public Health Experts** – For studying the impact of
   infrastructure condition on water access and health outcomes.

4. **Donors & Funding Agencies** – For monitoring the effectiveness of
   investments in water infrastructure.

5. **Community-Based Organizations** – For advocating improved water
   services and mobilizing community-led repairs.

## Installation

You can install the development version of handpumpstatusdata from
[GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("openwashdata/handpumpstatusdata")
```

```{r}
## Run the following code in console if you don't have the packages
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra", "tidyverse"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
```

Alternatively, you can download the individual datasets as a CSV or XLSX
file from the table below.

1. Click Download CSV. A window opens that displays the CSV in your
   browser.
2. Right-click anywhere inside the window and select "Save Page As...".
3. Save the file in a folder of your choice.
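
For scripted workflows, the same files can be read straight into R
instead of being saved by hand. The sketch below rebuilds the download
links from the `extdata_path` base URL used for the table. The helper
name `dataset_url()` is hypothetical, and the file name assumes the
package's single dataset, `handpumpstatusdata`.

```r
# Base URL of the exported files (same value the README uses for its table).
extdata_path <- "https://github.com/openwashdata/handpumpstatusdata/raw/main/inst/extdata/"

# Hypothetical helper: build the download link for one dataset and format.
dataset_url <- function(dataset, ext = c("csv", "xlsx")) {
  ext <- match.arg(ext)
  paste0(extdata_path, dataset, ".", ext)
}

csv_url <- dataset_url("handpumpstatusdata", "csv")
print(csv_url)

# Reading the file needs network access and the readr package:
# handpump <- readr::read_csv(csv_url)
```

Uncomment the last line to load the data. `readr::read_csv()` accepts a
URL in place of a local path.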

```{r, echo=FALSE, message=FALSE, warning=FALSE}

extdata_path <- "https://github.com/openwashdata/handpumpstatusdata/raw/main/inst/extdata/"

read_csv("data-raw/dictionary.csv") |>
  distinct(file_name) |>
  # Escape the dot and anchor the pattern so only a trailing ".rda" is removed
  dplyr::mutate(file_name = str_remove(file_name, "\\.rda$")) |>
  dplyr::rename(dataset = file_name) |>
  mutate(
    CSV = paste0("[Download CSV](", extdata_path, dataset, ".csv)"),
    XLSX = paste0("[Download XLSX](", extdata_path, dataset, ".xlsx)")
  ) |>
  knitr::kable()

```

## Data

This dataset contains detailed field survey records of borehole and
handpump functionality verification exercises conducted in Chiradzulu
District, Malawi, in February 2020.

```{r}
library(handpumpstatusdata)
```

### handpumpstatusdata

The dataset `handpumpstatusdata` has
`r nrow(handpumpstatusdata)` observations and
`r ncol(handpumpstatusdata)` variables.

```{r}
handpumpstatusdata |>
  head(3) |>
  gt::gt() |>
  gt::as_raw_html()
```

For an overview of the variable names, see the following table.

```{r echo=FALSE, message=FALSE, warning=FALSE}
readr::read_csv("data-raw/dictionary.csv") |>
  dplyr::filter(file_name == "handpumpstatusdata.rda") |>
  dplyr::select(variable_name:description) |>
  knitr::kable() |>
  kableExtra::kable_styling("striped") |>
  kableExtra::scroll_box(height = "200px")
```

## Example

```{r}
library(handpumpstatusdata)

# Example 1: Pie Chart Functionality Status Overview
# Purpose: To show service availability.

# Load libraries
library(tidyverse)

# Filter out NA or empty values
data_filtered <- handpumpstatusdata %>%
  filter(!is.na(functionality_survey), functionality_survey != "")

# Summarise counts and calculate percentages
functionality_counts <- data_filtered %>%
  group_by(functionality_survey) %>%
  summarise(count = n(), .groups = "drop") %>%
  mutate(percent = round(100 * count / sum(count), 1),
         label = paste0(percent, "%"))

# Create pie chart with percentages
ggplot(functionality_counts, aes(x = "", y = count, fill = functionality_survey)) +
  geom_col(width = 1, color = "white") +
  coord_polar(theta = "y") +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5), color = "white", size = 4) +
  labs(
    title = "Functionality Status Overview",
    fill = "Functionality"
  ) +
  theme_void() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.title = element_text(face = "bold")
  )

# Example 2: Environmental Risk Factors
# Purpose: Links potential contamination risks to water point locations.

# Load libraries
library(tidyverse)

# Select relevant environmental risk variables and reshape
risk_data <- handpumpstatusdata %>%
  select(waterpoint_name,
         latrines_within_50m,
         cemetery_within_50m,
         waste_within_50m,
         river_within_50m,
         lake_within_50m) %>%
  pivot_longer(
    cols = -waterpoint_name,
    names_to = "risk_factor",
    values_to = "present"
  )

# Clean labels for risk factors
risk_data <- risk_data %>%
  mutate(
    risk_factor = recode(risk_factor,
                         latrines_within_50m = "Latrines",
                         cemetery_within_50m = "Cemeteries",
                         waste_within_50m = "Waste",
                         river_within_50m = "Rivers",
                         lake_within_50m = "Lakes")
  )

# Count presence of each risk factor
risk_counts <- risk_data %>%
  filter(!is.na(present), tolower(present) == "yes") %>%
  group_by(risk_factor) %>%
  summarise(count = n(), .groups = "drop")

# Bar chart: one bar per risk factor (geom_col() plots the counts as given)
ggplot(risk_counts, aes(x = risk_factor, y = count, fill = risk_factor)) +
  geom_col() +
  labs(
    title = "Environmental Risk Factors within 50m",
    x = "Risk Factor",
    y = "Number of Water Points"
  ) +
  theme_minimal(base_size = 14) +
  theme(legend.position = "none")
```
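
Example 2 counts each hazard across all water points. A complementary
view is the number of hazards recorded at each individual water point,
which can help flag sites with several contamination risks at once. The
sketch below is illustrative only: `toy` is a small made-up data frame
that borrows three of the real column names and the "Yes"/"No" coding
that Example 2 filters on, and `n_risks` is a hypothetical new column.
The rows are not survey records.

```r
# Toy stand-in for handpumpstatusdata: invented rows, real column names.
toy <- data.frame(
  waterpoint_name     = c("Site A", "Site B", "Site C"),
  latrines_within_50m = c("Yes", "No", "yes"),
  waste_within_50m    = c("No", "No", "Yes"),
  river_within_50m    = c(NA, "No", "Yes")
)

# Columns that flag a hazard within 50 m of the water point.
risk_cols <- grep("_within_50m$", names(toy), value = TRUE)

# TRUE where a hazard is recorded as present; NA is treated as not recorded.
is_present <- vapply(
  toy[risk_cols],
  function(x) !is.na(x) & tolower(x) == "yes",
  logical(nrow(toy))
)

# Number of hazards per water point (matrix() keeps this safe for one row).
toy$n_risks <- rowSums(matrix(is_present, nrow = nrow(toy)))

# Water points with the most recorded hazards first.
print(toy[order(-toy$n_risks), c("waterpoint_name", "n_risks")])
```

To try this on the survey data, replace `toy` with `handpumpstatusdata`.
The `grep()` pattern will then pick up every column whose name ends in
`_within_50m`, which may include more hazards than the five plotted
above.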

## License

Data are available as
[CC-BY](https://github.com/openwashdata/handpumpstatusdata/blob/main/LICENSE.md).

## Citation

Please cite this package using:

```{r}
citation("handpumpstatusdata")
```
