Conversation
```r
bootstrap_quality_of_care_context <- function(
  root_path = "~/workspace",
  required_packages = c("jsonlite", "data.table", "arrow", "sf", "ggplot2", "glue", "reticulate", "RColorBrewer", "dplyr", "writexl", "knitr", "scales", "gridExtra")
) {
```
I think we're doing the same sort of setup steps across all pipelines (R). The logical thing to do would be to unify the bootstrap into a generic function. I'm doing this here, calling the function from all 5 formatting scripts, but your function seems more complete (we just need to make sure we provide all output paths in the result).
So the next step would be to include this bootstrap in snt_utils.r ;)
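To illustrate the idea, a minimal sketch of what a shared bootstrap in snt_utils.r could look like; the function name, arguments, and returned fields here are assumptions, not the final API:

```r
# Hypothetical generic bootstrap (sketch only): loads required packages
# and returns every path the pipelines need, so callers never redefine them.
bootstrap_snt_context <- function(
  root_path = "~/workspace",
  pipeline_name,
  required_packages = c("jsonlite", "data.table", "arrow", "glue")
) {
  # Install any missing packages, then load them
  for (pkg in required_packages) {
    if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
  # Return all output paths in the result, as discussed above
  list(
    root_path     = root_path,
    code_path     = file.path(root_path, "code"),
    pipeline_path = file.path(root_path, "pipelines", pipeline_name)
  )
}
```

Each formatting script would then start with a single call, e.g. `ctx <- bootstrap_snt_context(pipeline_name = "snt_dhis2_quality_of_care")`.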
```r
#' @param country_code Country code (e.g. COD).
#' @param data_action Action suffix (`imputed` or `removed`).
#' @return File name of the selected routine parquet.
select_latest_qoc_routine_file <- function(dataset_last_version, country_code, data_action) {
```
I think this can also be done in a more generic and simpler way, like this:
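The reviewer's suggested snippet is not reproduced here; as a hedged sketch of what such a generic selector might look like (the function name, matching pattern, and "latest sorts last" assumption are all illustrative, not the reviewer's actual code):

```r
# Hypothetical generic file selector: pick the latest file in a dataset
# version whose name matches the country code and data action.
select_latest_file <- function(file_names, country_code, data_action) {
  pattern <- sprintf("%s.*%s.*\\.parquet$", tolower(country_code), data_action)
  matches <- grep(pattern, tolower(file_names))
  if (length(matches) == 0) stop("No matching routine file found.")
  # Assumes file names sort chronologically (e.g. a date suffix):
  # the lexicographically greatest name is the latest version.
  sort(file_names[matches], decreasing = TRUE)[1]
}
```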
```r
"\n",
"ROOT_PATH <- \"~/workspace\"\n",
"CODE_PATH <- file.path(ROOT_PATH, \"code\")\n",
"PIPELINE_PATH <- file.path(ROOT_PATH, \"pipelines\", \"snt_dhis2_quality_of_care\")\n",
```
If these paths are set in the bootstrap, we don't need them here.
```r
#' @return Named list with routine data, shapes data, and selected filename.
load_quality_of_care_inputs <- function(setup_ctx, data_action) {
  data_action <- validate_quality_of_care_action(data_action)
  log_msg(glue::glue("Searching latest routine file for data_action: {data_action}"))
```
- Add the glue library directly to the list of required packages at bootstrap. It is good practice to use the library namespace reference inside functions, though, so you can keep that.
- I'm not sure I would keep the data-loading steps inside a function; this is not really generalizable, and such steps are better handled at the R notebook level.
In general, functions should be as generic as possible (within the SNT context, of course). The goal is not to hide pipeline steps or make the computation obscure from the user's perspective, but rather to make each step clearer and more readable, building on a set of functions defined in the SNT utils "toolbox".
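For example, the loading could stay visible as notebook cells while reusing the generic utils; a sketch only, where the data directory layout and the `"COD"` / `"imputed"` arguments are assumptions taken from the docs above:

```r
# Hypothetical notebook-level loading: each step is explicit and readable,
# built on generic helpers rather than hidden inside one pipeline function.
routine_file <- select_latest_qoc_routine_file(dataset_last_version,
                                               country_code = "COD",
                                               data_action  = "imputed")
routine <- arrow::read_parquet(file.path(PIPELINE_PATH, "data", routine_file))
shapes  <- sf::st_read(file.path(PIPELINE_PATH, "data", "shapes.geojson"))
```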
```r
#'
#' @param routine Routine dataframe loaded from outliers dataset.
#' @return Data table with district-year indicators.
compute_quality_of_care_indicators <- function(routine) {
```
Following the idea of "transparency", I would suggest decomposing this function into smaller functions. Each one could handle a specific step and make it clearer what is being computed. Imagine you are the user trying to understand what is going on.
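As an illustration of such a decomposition (all helper names, column names, and indicator logic below are hypothetical, not the pipeline's actual computations):

```r
library(data.table)

# Hypothetical small helpers: each performs one visible computation step.
aggregate_by_district_year <- function(routine) {
  routine[, .(cases = sum(cases, na.rm = TRUE), reports = .N),
          by = .(district, year)]
}

add_reporting_completeness <- function(dt, expected_reports = 12) {
  dt[, completeness := reports / expected_reports]
  dt
}

# The top-level function then reads as a short list of named steps.
compute_qoc_indicators <- function(routine) {
  dt <- aggregate_by_district_year(routine)
  dt <- add_reporting_completeness(dt)
  dt
}
```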
I think this warning, as well as the check for existing_files, might not be necessary, since the function add_files_to_dataset() already logs any files not found; please see my version of the reworked pipeline here.