Study MIMIC-III data using AWS Athena

The MIMIC-III dataset is available in the AWS cloud through our Open Data on AWS program. This allows researchers to use the MIMIC-III dataset without having to download it, make a copy of it, or pay to store it. They can analyze MIMIC-III directly with AWS services such as Amazon EC2, Amazon Athena, AWS Lambda, or Amazon EMR by pointing to its address in the AWS cloud, which makes research studies faster and less expensive to perform. To gain access, log in to the MIMIC PhysioNet website, enter your AWS account number, and request access to the MIMIC-III Clinical Database on AWS.

This directory contains code to help you understand how to access the MIMIC-III dataset on AWS.

mimic-iii-athena.yaml

This is an AWS CloudFormation template that deploys a database in the AWS Glue Data Catalog containing all of the MIMIC-III tables. It also deploys a Jupyter Notebook instance in Amazon SageMaker that contains the content of this mimic-code GitHub repository and is set up to access the MIMIC-III data through AWS Glue. This repository contains a version of the Aline Study that is configured to run its SQL queries against the MIMIC-III data in AWS using AWS Athena.
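As a rough illustration of what running a SQL query against Athena from a notebook can look like, here is a minimal sketch. It is not the code used by the Aline Study notebooks. The function name `run_athena_query` is made up for this example, and the caller supplies an Athena client (for instance one created with boto3), the database name, and an S3 location for query results, none of which are specified in this README.

```python
import time


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena, wait for it to finish, return the first page of rows.

    `client` is expected to behave like boto3.client("athena").
    `database` and `output_location` are assumptions supplied by the caller.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # Only the first page of results is read here. For SELECT queries the
    # first row holds the column names. Missing values come back as None.
    page = client.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in page["ResultSet"]["Rows"]
    ]
```

With boto3 installed and credentials configured, a call might look like `run_athena_query(boto3.client("athena"), "SELECT gender, COUNT(*) FROM patients GROUP BY gender", "mimiciii", "s3://your-results-bucket/athena/")`, where the database name and bucket are placeholders to replace with the values from your own account.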

Use the Launch Stack button below to deploy this AWS CloudFormation template into your AWS account. On the first screen, the template link has already been specified, so just click Next. On the second screen, provide a Stack name (letters and numbers) and click Next. On the third screen, just click Next. On the fourth screen, at the bottom, there is a box that says "I acknowledge that AWS CloudFormation might create IAM resources." Check that box, and then click Create. Once the stack has finished deploying, look at the Outputs tab of the AWS CloudFormation console for links to your Jupyter Notebook instance.
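The console steps above can also be approximated from a script. The following is a sketch only, under the assumption that you have the template's URL (it is not given in this README, so `template_url` is a placeholder argument) and that boto3 is installed. The helper names are hypothetical. The `CAPABILITY_IAM` value is the API counterpart of the IAM acknowledgement checkbox.

```python
import re

# CloudFormation stack names start with a letter and may contain letters,
# digits, and hyphens, up to 128 characters.
_STACK_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,127}")


def build_create_stack_args(stack_name, template_url):
    """Build keyword arguments for the CloudFormation create_stack call."""
    if not _STACK_NAME.fullmatch(stack_name):
        raise ValueError(f"invalid stack name: {stack_name!r}")
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,  # placeholder: supply the real template URL
        # Equivalent of ticking the "might create IAM resources" box.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def deploy(stack_name, template_url):
    """Create the stack, wait for completion, and return its outputs as a dict."""
    import boto3  # third-party; imported lazily so the helper above has no dependencies

    cfn = boto3.client("cloudformation")
    cfn.create_stack(**build_create_stack_args(stack_name, template_url))
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    # Same information as the Outputs tab in the console.
    return {o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])}
```

The dictionary returned by `deploy` should contain the same entries that the Outputs tab shows, including the notebook link.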

mimictoparquet_glue_job.py

The MIMIC-III dataset is offered on AWS in both the original gzipped CSV format and the Apache Parquet format. This Python script is run by the MIMIC team as an AWS Glue job to convert the CSV.gz files to Apache Parquet format. It uses Apache Spark to convert the files. You don't need to run this script yourself to use the MIMIC-III data in Parquet format, but it is provided as a template that could be used to convert other datasets from CSV to Parquet. The benefits of querying this data in Parquet format are outlined in this blog post:
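For readers who want to adapt the idea to their own data, the core of a CSV-to-Parquet conversion in Spark can be sketched as follows. This is not the contents of mimictoparquet_glue_job.py. It uses the plain PySpark DataFrame API rather than Glue-specific classes, the function names are invented for this example, and the input and output locations are whatever the caller passes in.

```python
import posixpath


def parquet_target(csv_gz_uri, output_prefix):
    """Map e.g. .../PATIENTS.csv.gz to <output_prefix>/patients (one folder per table)."""
    filename = posixpath.basename(csv_gz_uri)
    table = filename.split(".", 1)[0].lower()
    return output_prefix.rstrip("/") + "/" + table


def convert_csv_gz_to_parquet(spark, csv_gz_uri, output_prefix):
    """Read one gzipped CSV with an existing SparkSession and write it as Parquet."""
    # Spark decompresses .gz input based on the file extension.
    # multiLine and escape help with quoted free-text fields that span lines.
    df = spark.read.csv(
        csv_gz_uri,
        header=True,
        inferSchema=True,
        multiLine=True,
        escape='"',
    )
    target = parquet_target(csv_gz_uri, output_prefix)
    df.write.mode("overwrite").parquet(target)
    return target
```

Inferring the schema is convenient for a sketch, but for production conversions an explicit schema is usually safer, since inference reads the data an extra time and can guess types differently from what downstream queries expect.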

mimic-aws-athena-ddl.sql

This is a SQL script that can be run directly within AWS Athena to define each of the MIMIC-III dataset tables from the Apache Parquet files provided in the Open Data on AWS program. You do not need to execute it if you use the above mimic-iii-athena.yaml CloudFormation template, because the tables are defined for you by that automation. However, if you want to define the tables yourself within your account, you can use these SQL statements.
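To give a sense of the shape of such a table definition, the sketch below builds one Athena `CREATE EXTERNAL TABLE` statement over Parquet files. It is illustrative only: the helper name is hypothetical, the column list in the usage note is a small subset with assumed types, and the S3 location is a placeholder. Take the real definitions from mimic-aws-athena-ddl.sql.

```python
def create_table_ddl(database, table, columns, location):
    """Return an Athena DDL statement for an external table stored as Parquet.

    `columns` is a list of (name, athena_type) pairs.
    `location` is the S3 folder holding the table's Parquet files.
    """
    cols = ",\n  ".join(f"`{name}` {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )
```

For example, `create_table_ddl("mimiciii", "patients", [("subject_id", "INT"), ("gender", "STRING")], "s3://example-bucket/parquet/patients/")` produces a statement that can be pasted into the Athena query editor once the database name and location are replaced with real values.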
