Starting point to build an application to generate CAOM2 Observations from FITS files.
In an empty directory:
- This is the working directory, so it should have enough free disk space for the files being processed.
- In the main branch of this repository, find the file Dockerfile. In the scripts directory, find the files docker-entrypoint.sh and config.yml. Copy these files to the working directory:

      wget https://raw.github.com/opencadc-metadata-curation/possum2caom2/main/Dockerfile
      wget https://raw.github.com/opencadc-metadata-curation/possum2caom2/main/scripts/docker-entrypoint.sh
      wget https://raw.github.com/opencadc-metadata-curation/possum2caom2/main/scripts/config.yml

- Make docker-entrypoint.sh executable.
- config.yml is configuration information for the ingestion. It will work with the files named and described here. For a complete description of its content, see https://github.com/opencadc-metadata-curation/collection2caom2/wiki/config.yml.
- The ways to tell this tool the work to be done:
  - provide a file containing the list of file ids to process, one file id per line, and the config.yml file containing the entries 'use_local_files' set to False, and 'task_types' set to -ingest -modify. The 'todo' file may be provided in one of two ways:
    - named 'todo.txt' in the working directory, as specified in config.yml, or
    - as the fully-qualified name with the --todo parameter.
  - provide the files to be processed in the working directory, and the config.yml file containing the entries 'use_local_files' set to True, and 'task_types' set to -store -ingest -modify.
    - The store task does not have to be present, unless the files on disk are newer than the same files at CADC.
  - provide the files to be processed in a Pawsey acacia remote, with the config.yml file containing the entries 'use_local_files' set to False, the 'task_types' set to -store -ingest -modify, and the data_sources set to <acacia remote>/possum/tiles.
    - The data_sources entry requires the rclone configuration to be set up. Use pawsey in the acacia remote name to get the correct rclone syntax for the commands.
  - provide the files to be processed in the working directory, and the config.yml file containing the entries 'use_local_files' set to True, and 'task_types' set to -scrape.
    - This configuration will not attempt to write files or CAOM2 records to CADC. It is a good way to craft the content of the CAOM2 record without continually updating database content.
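As a rough sketch of the four options listed above, the corresponding config.yml entries might look like the following. Only the keys use_local_files, task_types, and data_sources are taken from the text. Writing data_sources as a list is an assumption, and <acacia remote> is a placeholder for the name of your own rclone remote. Each fragment is an alternative to merge into the downloaded config.yml, not a complete file; the `---` lines only separate the alternatives.

```yaml
# Option 1: process the file ids listed in the todo file.
use_local_files: False
task_types:
  - ingest
  - modify
---
# Option 2: process files found in the working directory.
use_local_files: True
task_types:
  - store    # only needed when the files on disk are newer than those at CADC
  - ingest
  - modify
---
# Option 3: process files held in a Pawsey acacia remote (needs rclone set up).
use_local_files: False
task_types:
  - store
  - ingest
  - modify
data_sources:    # assumption: expressed as a list with a single entry
  - <acacia remote>/possum/tiles
---
# Option 4: build CAOM2 records locally without writing anything to CADC.
use_local_files: True
task_types:
  - scrape
```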
- To build the container image, run this:

      docker build -f Dockerfile -t possum_run_cli ./

- In the working directory, place a CADC proxy certificate. The Docker image can be used to create a proxy certificate as follows. You will be prompted for the password for your CADC user:

      user@dockerhost:<cwd># docker run --rm -ti -v ${PWD}:/usr/src/app -v <fully-qualified path to staging directory>:/data --user $(id -u):$(id -g) -e HOME=/usr/src/app --name possum_run_cli possum_run_cli cadc-get-cert --days-valid 10 --cert-filename /usr/src/app/cadcproxy.pem -u <your CADC username>

- To set up the rclone configuration for Pawsey s3 acacia storage, run the image, and then run rclone config from within the image. Follow the rclone config steps as described here: https://www.youtube.com/watch?v=mOp7NJpwzac&t=1507s. Note that this leaves the .config/rclone/rclone.conf file on disk in the working directory. Because that file can hold storage credentials, the last step restricts its permissions:

      user@dockerhost:<cwd># docker run --rm -ti -v <cwd>:/usr/src/app --user $(id -u):$(id -g) -e HOME=/usr/src/app --name possum_run_cli possum_run_cli /bin/bash
      cadcops@d51a02720ea6:~$ rclone config
      cadcops@d51a02720ea6:~$ <follow the rclone config steps>
      cadcops@d51a02720ea6:~$ exit
      user@dockerhost:<cwd># chmod 600 .config/rclone/rclone.conf

- To run the application where it will retrieve files from the remote Pawsey s3 acacia storage:

      user@dockerhost:<cwd># docker run --rm -ti -v <cwd>:/usr/src/app --user $(id -u):$(id -g) -e HOME=/usr/src/app --name possum_run_cli possum_run_cli possum_run_remote

- To edit and test the application from inside a container:

      user@dockerhost:<cwd># git clone https://github.com/opencadc-metadata-curation/possum2caom2.git
      user@dockerhost:<cwd># docker run --rm -ti -v <cwd>:/usr/src/app -v <fully-qualified path to staging directory>:/data --user $(id -u):$(id -g) -e HOME=/usr/src/app --name possum_run_cli possum_run_cli /bin/bash
      root@53bef30d8af3:/usr/src/app# pip install -e ./possum2caom2
      root@53bef30d8af3:/usr/src/app# pip install mock pytest
      root@53bef30d8af3:/usr/src/app# cd possum2caom2/possum2caom2/tests
      root@53bef30d8af3:/usr/src/app# pytest

- For instructions that may help with using containers, see: https://github.com/opencadc-metadata-curation/collection2caom2/wiki/Docker-and-Collections