emjavan/DSHS_IP_RDF

DSHS_IP_RDF

Code to filter, categorize, and aggregate DSHS IP RDF data files. Aggregation is for different spatial and temporal resolutions. The spatial assignment can be based on the patient's mailing address or the hospital's location.

  1. sbatch launch_filter_icd10_codes.sh calls filter_icd10_codes.sh
  2. sbatch launch_run_categorize_funs.sh calls commands_run_categorize_funs.txt, which calls run_categorize_funs.R.
  3. sbatch launch_run_aggregate_funs.sh calls commands_run_aggregate_funs.txt, which calls run_aggregation_funs.R. The aggregation commands file is generated by create_aggregation_commands_file.R.
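The three steps run in order, each submitted with sbatch. If you prefer to queue them all at once, one option (not part of this repository) is Slurm job dependencies. The Python sketch below assumes the launch scripts sit in the current directory and that commands_run_aggregate_funs.txt has already been generated by create_aggregation_commands_file.R. It relies only on the standard sbatch options --parsable (print just the job ID) and --dependency=afterok:<jobid>. The names submit_chain, run_sbatch, and LAUNCH_SCRIPTS are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Launch scripts for steps 1-3, in the order they must run.
LAUNCH_SCRIPTS = [
    "launch_filter_icd10_codes.sh",
    "launch_run_categorize_funs.sh",
    "launch_run_aggregate_funs.sh",
]


def run_sbatch(cmd: Sequence[str]) -> str:
    """Run an sbatch command and return its standard output."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def submit_chain(scripts: Sequence[str],
                 runner: Callable[[Sequence[str]], str] = run_sbatch) -> List[str]:
    """Submit scripts so each one starts only after the previous one succeeds.

    Returns the Slurm job IDs in submission order.
    """
    job_ids: List[str] = []
    for script in scripts:
        cmd = ["sbatch", "--parsable"]
        if job_ids:
            # afterok: start only if the previous job finished with exit code 0.
            cmd.append(f"--dependency=afterok:{job_ids[-1]}")
        cmd.append(script)
        # --parsable prints "jobid" or "jobid;cluster"; keep only the ID.
        job_id = runner(cmd).strip().split(";")[0]
        job_ids.append(job_id)
    return job_ids
```

Calling submit_chain(LAUNCH_SCRIPTS) returns the three job IDs. Passing a different runner lets you inspect the generated commands without submitting anything.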

Functions for both steps 2 and 3 are in categorize_aggregate_funs.R and get_packages_used.R.

What's in the directories on LS6

  1. ../../ALL_OG_FILES/: multiple folders with the original IP RDF files as downloaded from DSHS.
  2. ../../FILTERED_PAT_FILES/: out.IP_20*_filtered.txt, the original IP RDF files filtered to only the ICD-10-CM codes listed in input_data\icd10_disease_category_list.csv. The diseases approved by the IRB are COVID-19, influenza, ILI, and RSV.
  3. ../../CATEGORIZED_PAT_FILES/: out.IP_20*_categorized.csv, the filtered files with added columns that categorize the disease associated with each ICD-10-CM code and distinguish primary from secondary diagnoses. The aggregation functions do not currently restrict counts to primary diagnoses only, but this may be added later. Hospital location features are joined here. The exact hospital addresses and patient drive times to them still need to be joined; for now, only city, county, and state are included.
  4. ../../PAT_CATEGORIZED_BY_DISEASE/: IPRDF-categorized_DISEASE_MINYEAR-MAXYEAR.csv, all patient data grouped and filtered to each disease combination. Disease combinations are alphabetically ordered, hyphenated strings, e.g. COV-FLU, not FLU-COV. These files only need to be recreated when the categorization files change or new years are added (approximately annually). MINYEAR and MAXYEAR can be specified to limit file size, but the range must cover whole years. 2018 is the only partial year: only half of it was purchased.
  5. ../../AGGREGATED_PAT_FILES/: IPRDF-aggregated_DISEASE_COUNTTYPE_GRPVAR_TIMERES_MINYEAR-MAXYEAR.csv, the disease-filtered files aggregated to the spatial and temporal resolutions requested by users. See create_aggregation_commands_file.R for all inputs, and re-run sbatch launch_run_aggregate_funs.sh as needed.
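The file naming conventions in these directories can be captured in a few helpers, for example when scripting over the output folders. This is a minimal Python sketch, not code from this repository. It assumes DISEASE and COUNTTYPE never contain underscores, so a multi-part GRPVAR such as AGE_GRP is recovered from the middle of the name. The helper names and the COUNTTYPE value CT used in the example are hypothetical.

```python
import re
from typing import Dict, Iterable


def disease_key(diseases: Iterable[str]) -> str:
    """Unique disease codes, sorted alphabetically and joined with hyphens."""
    return "-".join(sorted(set(diseases)))


def categorized_filename(diseases: Iterable[str], min_year: int, max_year: int) -> str:
    """Name of a file in PAT_CATEGORIZED_BY_DISEASE/ (whole years only)."""
    if min_year > max_year:
        raise ValueError("min_year must not exceed max_year")
    return f"IPRDF-categorized_{disease_key(diseases)}_{min_year}-{max_year}.csv"


# Everything between the prefix and the year range is DISEASE_COUNTTYPE_GRPVAR_TIMERES.
_AGG_RE = re.compile(r"^IPRDF-aggregated_(?P<body>.+)_(?P<min>\d{4})-(?P<max>\d{4})\.csv$")


def parse_aggregated_filename(name: str) -> Dict[str, str]:
    """Split an AGGREGATED_PAT_FILES/ file name into its fields.

    Assumes DISEASE and COUNTTYPE contain no underscores, so any extra
    underscores are attributed to GRPVAR.
    """
    match = _AGG_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name}")
    parts = match.group("body").split("_")
    if len(parts) < 4:
        raise ValueError(f"expected DISEASE_COUNTTYPE_GRPVAR_TIMERES in: {name}")
    return {
        "disease": parts[0],
        "count_type": parts[1],
        "grp_var": "_".join(parts[2:-1]),
        "time_res": parts[-1],
        "min_year": match.group("min"),
        "max_year": match.group("max"),
    }
```

For example, disease_key(["FLU", "COV"]) returns "COV-FLU", and parse_aggregated_filename("IPRDF-aggregated_COV-FLU_CT_AGE_GRP_WEEKLY_2019-2022.csv")["grp_var"] returns "AGE_GRP".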

The aggregation step creates one large time-series file that is convenient to filter as needed. It always generates the DAILY time series first, then aggregates to WEEKLY if requested. The start of the week is hardcoded within the R script to limit the number of command-line parameters. There is also an optional grouping variable: AGE_GRP is planned but not used, because its current implementation is naive. Grouping by age group increases the row count 5-6x, which is convenient for grouping and plotting but not ideal for data storage.
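A minimal Python sketch of the DAILY-then-WEEKLY logic described above. It rests on hypothetical assumptions: each patient is counted on every day from admit through discharge inclusive, weekly values are sums of daily counts, and weeks start on Sunday. The counting rule and the week start hardcoded in the R script may differ. The sketch also shows why days with no patients produce no rows, and how fill_zeros (a hypothetical helper) can expand the series over a chosen date range.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def daily_counts(stays: Iterable[Tuple[date, date]]) -> Dict[date, int]:
    """Count patients in hospital on each day, admit through discharge inclusive.

    Days with no patients are simply absent (no explicit zeros).
    """
    counts: Counter = Counter()
    for admit, discharge in stays:
        day = admit
        while day <= discharge:
            counts[day] += 1
            day += timedelta(days=1)
    return dict(counts)


def fill_zeros(counts: Dict[date, int], start: date, end: date) -> Dict[date, int]:
    """Expand a sparse daily series to every day in [start, end], using 0 for gaps."""
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    return {day: counts.get(day, 0) for day in days}


def to_weekly(counts: Dict[date, int], week_start: int = 6) -> Dict[date, int]:
    """Sum daily counts into weeks, each labeled by its first day.

    week_start follows date.weekday() numbering (Monday=0 ... Sunday=6).
    """
    weekly: Counter = Counter()
    for day, n in counts.items():
        offset = (day.weekday() - week_start) % 7
        weekly[day - timedelta(days=offset)] += n
    return dict(weekly)
```

Filling zeros at the daily level before calling to_weekly ensures that weeks with no patients inside the chosen range also appear, with a value of 0.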

User notes and no-no's

  1. The CSV files do not contain explicit zeros, because the time series are generated from patient admit and discharge dates; days with no patients simply have no row. If zeros are needed, expand the series to the full date range before use.
  2. Please do not save edited files to AGGREGATED_PAT_FILES. Make a new folder for your own work. This also helps ensure that multiple people are not trying to read from the same files.
  3. The data repository corral-secure is a shared resource: if one person opens a file, others cannot open it simultaneously. It is not a Google Doc with software that tracks who is editing what and where, so files need to be closed before others can open them. If you have a large job and plan to read and write a lot, create your own folder outside your git repository, unless you are 100% confident you are not violating our use agreement. If you are not sure, keep your git repo private or scp figures to your local machine for visual checks.
  4. By default, files you create are readable only by you. Periodically update permissions for all files with chmod -R ug+rx /path/to/dir/. Only the user and group should be able to read and execute files. Avoid giving group members write access, so they cannot overwrite your work (ug+rx only adds permissions; it does not remove existing group write access, for which chmod -R g-w can be used). Directories must remain writable by you if you want to add new files to them. Read-and-execute only is a much safer state for any results you will need in the future and do not want to risk someone overwriting.
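The permission advice in these notes can also be applied from Python. This sketch is meant as an equivalent of chmod -R ug+rx,g-w,o-rwx, which goes slightly beyond the single chmod command in the notes. It adds read and execute for user and group, removes group write, and removes all access for others, while leaving your own write bit alone so you can still add files. It skips symbolic links so it never changes anything outside the tree. The function name share_read_only is hypothetical.

```python
import os
import stat


def share_read_only(root: str) -> None:
    """Apply the equivalent of `chmod -R ug+rx,g-w,o-rwx root`, skipping symlinks."""
    add = stat.S_IRUSR | stat.S_IXUSR | stat.S_IRGRP | stat.S_IXGRP  # ug+rx
    remove = stat.S_IWGRP | stat.S_IRWXO  # g-w and o-rwx
    # Collect root plus every directory and file beneath it.
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        if os.path.islink(path):
            continue  # chmod would follow the link to its target
        mode = stat.S_IMODE(os.lstat(path).st_mode)
        os.chmod(path, (mode | add) & ~remove)
```

The owner's write bit is never touched, so you keep the ability to add or update files while group members can only read and execute.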

About

Code to analyze DSHS IP RDF data files.