Deprecation notice: With the merger of Open Targets Genetics into the main Platform, this is no longer needed.
QC, spin up VMs, load data into DBs, create lookup tables (LUTs), and other fun backend stuff we need to do to spin up https://genetics.opentargets.io
There is a Dockerfile that creates an image with the project and its required dependencies in place.
Build the image and tag it with a name so it is easy to refer to later:
docker build --tag otg-etl .
Start a docker container in interactive mode.
Host names must not contain a protocol (https is assumed) or slashes. The data loading script uses localhost if no host name is provided.
docker run -it --rm \
--env ES_HOST='<elasticsearch host name>' \
--env CLICKHOUSE_HOST='<ot clickhouse db host name>' \
otg-etl
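Because a host name with a protocol or slashes is not accepted, it can help to clean the value before passing it to `docker run`. The following is a minimal sketch, not part of this repository: `normalize_host` is a hypothetical helper name, and `es.example.org` is a placeholder host.

```shell
# Hypothetical helper (not part of this repo): strip a leading protocol
# and anything after the first slash from a host name.
normalize_host() {
  local host="$1"
  host="${host#*://}"   # drop a leading "https://" or "http://", if present
  host="${host%%/*}"    # drop the first slash and everything after it
  printf '%s\n' "$host"
}

# Placeholder value for illustration only.
ES_HOST="$(normalize_host 'https://es.example.org/')"
echo "$ES_HOST"
```

The cleaned value can then be passed as `--env ES_HOST="$ES_HOST"`. A value that is already a bare host name passes through unchanged.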
Authenticate with Google Cloud Storage.
gcloud auth application-default login
Load release data into the OT database and Elasticsearch.
bash genetics-backend/loaders/clickhouse/create_and_load_everything_from_scratch.sh gs://genetics-portal-output/190504
Start a docker container in interactive mode.
Host names must not contain a protocol (https is assumed) or slashes. The data loading script uses localhost if no host name is provided.
docker run -it --rm \
--env ES_HOST='<elasticsearch host name>' \
--env CLICKHOUSE_HOST='<ot clickhouse db host name>' \
-v <directory with data>:/data/ \
otg-etl
Load release data into the OT database and Elasticsearch.
bash genetics-backend/loaders/clickhouse/create_and_load_everything_from_scratch.sh /data
You can use wget to download the release data. Below is an example of the command for 19.05.04 release data.
wget --mirror ftp://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/190504/
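Note that `wget --mirror` saves files under a directory tree named after the host and remote path, relative to the current directory. The sketch below derives that local path from the URL so it can be used with the `-v` option of `docker run`. It assumes the loader expects the release directory itself to be mounted at `/data`, which this document does not confirm; `RELEASE_URL` and `DATA_DIR` are illustrative variable names.

```shell
# URL of the release being mirrored (taken from the wget example above).
RELEASE_URL='ftp://ftp.ebi.ac.uk/pub/databases/opentargets/genetics/190504/'

# wget --mirror writes to <host>/<remote path> under the current directory,
# so drop the protocol and the trailing slash to get the local path.
rel="${RELEASE_URL#*://}"
DATA_DIR="$PWD/${rel%/}"
echo "$DATA_DIR"

# Assumption: the release directory itself is what should be mounted, e.g.
#   docker run -it --rm -v "$DATA_DIR":/data/ otg-etl
```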