This directory contains scripts to create the schema and load the Northwestern ICU and Hospital data (NWICU) into PostgreSQL, following a structure similar to the MIMIC-IV build scripts.
# clone repo
git clone https://github.com/MIT-LCP/mimic-code.git
cd mimic-code
# download NWICU data
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/nwicu-northwestern-icu/0.1.0/
# clean directory (run this command outside of physionet.org directory)
mv physionet.org/files/nwicu-northwestern-icu nwicu && rmdir physionet.org/files && rm physionet.org/robots.txt && rmdir physionet.org
# create database
createdb nw
# build and load NWICU tables
psql -d nw -f nw/buildnw/postgres/create.sql
psql -d nw -v ON_ERROR_STOP=1 -v nw_data_dir=nwicu/0.1.0 -f nw/buildnw postgres/load_gz.sql
psql -d nw -f nw/buildnw/postgres/constraint.sql
psql -d nw -f nw/buildnw/postgres/index.sql
psql -d nw -f nw/buildnw/postgres/validate.sqlFirst ensure that PostgreSQL is running on your computer. For installation instructions, see: http://www.postgresql.org/download/
On macOS (using Homebrew):
brew update
brew install postgresql
brew services start postgresqlTo check which user is running the PostgreSQL service, use:
brew services listThe 'User' column shows the macOS account running PostgreSQL. This is usually the username you should use for database connections unless you have created a different PostgreSQL user.
On Ubuntu/Debian:
sudo apt update
sudo apt install postgresql postgresql-contrib
sudo service postgresql startOn Windows:
- Download the installer from https://www.postgresql.org/download/windows/
- Run the installer and follow the prompts to complete the installation.
- Start the PostgreSQL service from the Start Menu or Services app.
For more details, see the official PostgreSQL download page.
We can download Northwestern ICU (NWICU) database from PhysioNet:
wget -r -N -c -np --user <USERNAME> --ask-password https://physionet.org/files/nwicu-northwestern-icu/0.1.0/
mv physionet.org/files/nwicu-northwestern-icu nwicu && rmdir physionet.org/files && rm physionet.org/robots.txt && rmdir physionet.orgCreate the database if it does not already exist:
createdb nwSet PostgreSQL environment variables:
We can use the provided script to set your environment variables for the current terminal session:
source postgres_env.shReplace your_user and your_password with your actual PostgreSQL username and password in sh script.
Instead of editing the script, you can pass your username and password as arguments:
source postgres_env.sh myuser mypassword nw localhost 5432Once Postgres is installed, clone the mimic-code repository into a local directory.
git clone https://github.com/MIT-LCP/mimic-code.gitCreate the schemas and tables with the following psql command. This will delete any data present in the schemas. If you need to reload the data (for example, if you run the load scripts multiple times), simply rerun create.sql. This will drop all existing tables and recreate them, ensuring a clean slate before reloading your data.
psql -d nw -f nw/buildnw/postgres/create.sqlAfterwards, we need to load the NWICU files into the database. To do so, we'll specify the location of the local CSV files (compressed). Note that this assumes the folder structure is as follows:
nwicu_data_dir
nw_hosp
admissions.csv.gz
patients.csv.gz
...
nw_icu
icustays.csv.gz
...
For example, if you downloaded and moved the files as above, your nwicu_data_dir would be nwicu/0.1.0 and contain subfolders like nw_hosp and nw_icu with their respective compressed CSV files.
Once you have verified your data is stored in this structure, run:
psql -d nw -v ON_ERROR_STOP=1 -v nw_data_dir=nwicu/0.1.0 -f nw/buildnw/postgres/load_gz.sqlAfter loading the data, we can enforce data integrity by adding primary keys, foreign keys, and other constraints.
psql -d nw -f nw/buildnw/postgres/constraint.sqlWe can also improve query performance by creating indexes, which allow the database to quickly find and retrieve data, especially in large tables.
psql -d nw -f nw/buildnw/postgres/index.sqlTo ensure the data was loaded correctly, we can run validation checks.
psql -d nw -f nw/buildnw/postgres/validate.sql