Generation of GHG concentration inputs (i.e. forcings) for CMIP7's ScenarioMIP.
- development: the project is actively being worked on
We do all our environment management using pixi. To get started, you will need to make sure that pixi is installed (instructions here; we found that using the pixi-provided script was best on a Mac).
To create the virtual environment, run

```sh
pixi install
pixi run pre-commit install
```

These steps are also captured in the `Makefile`, so if you want a single
command, you can instead simply run `make virtual-environment`.
Having installed your virtual environment, you can now run commands in your virtual environment using

```sh
pixi run <command>
```

For example, to run Python within the virtual environment, run

```sh
pixi run python
```

As another example, to run a notebook server, run

```sh
pixi run jupyter lab
```

Background: If you use prefect in more than one project, you can get yourself in a mess. As a precaution, we recommend following this process in all cases, even if this is the only project in which you use prefect.
To avoid clashes, you will likely want to make a profile specific to this project, e.g.

```sh
pixi run prefect profile create cmip7-scenariomip-ghg-concentrations
```

Then use it with

```sh
pixi run prefect profile use cmip7-scenariomip-ghg-concentrations
# Check with
pixi run prefect profile ls
```

To avoid clashes with other databases, tell prefect to use a database specific to this project

```sh
mkdir .prefect
pixi run prefect config set PREFECT_API_DATABASE_CONNECTION_URL='sqlite+aiosqlite:////path/to/this/repo/.prefect/prefect.db'
# e.g.
pixi run prefect config set PREFECT_API_DATABASE_CONNECTION_URL="sqlite+aiosqlite:///${PWD}/.prefect/prefect.db"
# Check with
pixi run prefect config view --show-secrets
```

If you want to run on a specific host/port for this instance, you can set that with

```sh
# Host
pixi run prefect config set PREFECT_SERVER_API_HOST=<desired-host>
# e.g.
pixi run prefect config set PREFECT_SERVER_API_HOST="127.0.0.1"
# Port
pixi run prefect config set PREFECT_SERVER_API_PORT=<desired-port>
# e.g.
pixi run prefect config set PREFECT_SERVER_API_PORT="4201"
```

If you do this, make sure that the prefect API URL matches

```sh
pixi run prefect config set PREFECT_API_URL="http://<desired-host>:<desired-port>/api"
# e.g.
pixi run prefect config set PREFECT_API_URL="http://127.0.0.1:4201/api"
```

- Receive data from the emissions team
- Update `scripts/create-latest-set-of-concentration-files.sh`
  - Likely you will need to update `--emissions-file`, `--run-id`, `--esgf-version` and `--input4mips-cvs-source`
- Commit
- Start your prefect server in a separate terminal, `pixi run prefect server start`
- Run
- Upload the results to NERSC for the publication team (see Uploading to NERSC).
- Receive markers from the emissions team
  - the markers are defined in `scripts/generate-concentration-files.py`. If there are changes, make sure you update this variable.
- Receive emissions from the emissions team
  - they should send two files.
    They produce these files with the script here.
    The two files are:
    - the emissions for each scenario, except for emissions of species that we derive from our inversions of sources like WMO (2022) (where we use only a single concentration projection, rather than having variation across scenarios)
    - emissions for each scenario at the fossil/biosphere level. This is used for some extrapolations of latitudinal gradients. It's the same data as above, just at slightly higher sectoral detail.
- Put the received emissions in `data/raw/input-scenarios`
- Update the emissions file you use for your run.
  There are two options for how to do this:
  - specify this from the command line via the `--emissions-file` option
  - change the value of the `emissions_file` variable in `scripts/create-latest-set-of-concentration-files.sh`
- Run with a new run ID and ESGF version (using the command-line arguments `--run-id` and `--esgf-version`). Pick whatever makes sense here (we don't have strong rules about our versioning yet)
  - This will also require creating entries for the controlled vocabularies (CVs). This requires updating this file to include source IDs of the form "CR-scenario-esgf-version". In practice, simply copy the existing "CR-scenario-esgf-version" entries and update their version to match the ESGF version you used above. Then push this to GitHub.
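The "copy the existing entries and bump the version" step above can be sketched in Python. Everything here is illustrative: the `CR-vllo-0-2-0` source ID, the `version` key and the flat top-level layout are assumptions, not the exact input4MIPs_CVs schema, so check the real `input4MIPs_source_id.json` before editing it.

```python
import json


def add_new_esgf_version(source_ids: dict, old_version: str, new_version: str) -> dict:
    """Copy each existing CR-* entry, swapping the ESGF version in its source ID.

    Assumes source IDs look like "CR-<scenario>-<esgf-version>" and that each
    entry carries a "version" field; both are illustrative assumptions.
    """
    updated = dict(source_ids)
    for source_id, entry in source_ids.items():
        if source_id.startswith("CR-") and source_id.endswith(f"-{old_version}"):
            new_id = source_id[: -len(old_version)] + new_version
            new_entry = json.loads(json.dumps(entry))  # cheap deep copy
            if "version" in new_entry:
                new_entry["version"] = new_version
            updated[new_id] = new_entry
    return updated


# Made-up example entry, not a real CV record
cvs = {"CR-vllo-0-2-0": {"version": "0-2-0"}}
print(add_new_esgf_version(cvs, "0-2-0", "0-3-0"))
```

The old entries are kept alongside the new ones, matching the "copy the existing entries" instruction rather than replacing them.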
- When you run, you will need to update the value of `--input4mips-cvs-source`. You can do this either via the command-line argument `--input4mips-cvs-source` or just update the value in `scripts/generate-concentration-files.py`. The value should be of the form `"gh:[commit-id]"`, e.g. `"gh:c75a54d0af36dbedf654ad2eeba66e9c1fbce2a2"`.
- When the run is finished, upload the results to NERSC for the publication team (see Uploading to NERSC).
- raw docs are pretty good: https://docs.nersc.gov/services/scp/
- command is something like

  ```sh
  rsync --partial --progress -avR output-bundles/1.0.0/data/processed/esgf-ready/input4MIPs zrjn@dtn01.nersc.gov:/global/u2/z/zrjn/
  ```

  - `-avR`: sets the flags for copying recursively and with the directory structure we want
  - `output-bundles/1.0.0/data/processed/esgf-ready/input4MIPs`: the directory you want to upload
  - `zrjn`: Zeb's username, yours will be something like fb. You can get this by logging into jupyter then looking at the start of your shell.
  - `dtn01.nersc.gov:`: where we want to upload to; get this from the docs https://docs.nersc.gov/services/scp/
  - `/global/u2/z/zrjn/`: the path we want to upload to. This is just my home directory
- move files to `/global/cfs/projectdirs/m4931/zrjn-tmp`, effectively the 'staging' area
- update permissions
  - make all directories readable by anyone: `find /global/cfs/projectdirs/m4931/zrjn-tmp/input4MIPs/ -type d -exec chmod 755 {} \;`
  - make all files readable by anyone: `find /global/cfs/projectdirs/m4931/zrjn-tmp/input4MIPs/ -type f -exec chmod 644 {} \;`
- message Sasha on Slack with something like, "Hi, the files in `/path/to/suitable/level/dir` are ready to be published"
By default, this all runs serially. You can add extra cores with the flags below:
- `--n-workers`: the number of threaded (i.e. parallel) workers to use for submitting jobs
  - note: this doesn't result in true parallelism. A full explanation is beyond the scope of this document (but if you want to google, explore the difference between multiprocessing with threads compared to processes in Python)
- `--n-workers-multiprocessing`: the number of multiprocessing (i.e. parallel) workers to use, excluding any tasks that require running MAGICC
- `--n-workers-multiprocessing-magicc`: the number of multiprocessing (i.e. parallel) workers to use for tasks that run MAGICC
- `--n-workers-per-magicc-notebook`: the number of MAGICC workers to use in each MAGICC-running task
  - note: the total number of MAGICC workers is the product of `--n-workers-multiprocessing-magicc` and `--n-workers-per-magicc-notebook`
In general, you want:
- `--n-workers`: equal to the number of cores on your CPU (or more)
- `--n-workers-multiprocessing`: equal to the number of cores on your CPU (or more)
- `--n-workers-multiprocessing-magicc`, `--n-workers-per-magicc-notebook`: the product should be equal to the number of cores on your CPU (or more)
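The recommendations above can be sketched as a small helper that derives flag values from the core count. The function name and the particular way of splitting the MAGICC product (fixing the multiprocessing side at 2) are illustrative choices, not project rules:

```python
import os


def suggest_worker_settings(n_cores: int) -> dict:
    """Suggest values for the parallelism flags described above.

    Fixing --n-workers-multiprocessing-magicc at 2 is just one reasonable
    split; any pair whose product is close to n_cores works equally well.
    """
    n_magicc_mp = 2
    n_per_notebook = max(1, n_cores // n_magicc_mp)
    return {
        "--n-workers": n_cores,
        "--n-workers-multiprocessing": n_cores,
        "--n-workers-multiprocessing-magicc": n_magicc_mp,
        "--n-workers-per-magicc-notebook": n_per_notebook,
    }


settings = suggest_worker_settings(os.cpu_count() or 1)
# Total MAGICC workers is the product of the two MAGICC flags
total_magicc = (
    settings["--n-workers-multiprocessing-magicc"]
    * settings["--n-workers-per-magicc-notebook"]
)
print(settings, total_magicc)
```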
For example, for an eight-core machine you might do something like

```sh
pixi run python scripts/generate-concentration-files.py --n-workers 8 --n-workers-multiprocessing 8 --n-workers-multiprocessing-magicc 2 --n-workers-per-magicc-notebook 4
```

If you need/want to run only for a specific gas, you can use the `--ghg` flag as shown below.

```sh
pixi run python scripts/generate-concentration-files.py --ghg ccl4 --ghg cfc113
```

TODO: update this section as we add:
- tests
- anything else
Install and run instructions are the same as the above (this is a simple repository, without tests etc. so there are no development-only dependencies).
TODO: update as we figure out the structure
We have a basic Makefile which captures key commands in one place
(for more thoughts on why this makes sense, see
general principles: automation).
For an introduction to make, see
this introduction from Software Carpentry.
Having said this, if you're not interested in make, you can just copy the
commands out of the Makefile by hand and you will be 90% as happy.
In this repository, we use the following tools:
- git for version-control (for more on version control, see
general principles: version control)
- for these purposes, git is a great version-control system so we don't complicate things any further. For an introduction to Git, see this introduction from Software Carpentry.
- Pixi for environment management
(for more on environment management, see
general principles: environment management)
- there are lots of environment management systems. Pixi works well in our experience and, for projects that need conda, it is the only solution we have tried that worked really well.
- we track the `pixi.lock` file so that the environment is completely reproducible on other machines or by other people (e.g. if you want a colleague to take a look at what you've done)
- pre-commit with some very basic settings to get some
easy wins in terms of maintenance, specifically:
- code formatting with ruff
- basic file checks (removing unneeded whitespace, not committing large files etc.)
- (for more thoughts on the usefulness of pre-commit, see general principles: automation)
- jupytext to track notebooks
(for more thoughts on the usefulness of Jupytext, see
tips and tricks: Jupytext)
- this avoids nasty merge conflicts and incomprehensible diffs
- prefect for workflow orchestration
- relationship between this repo and https://github.com/PCMDI/input4MIPs_CVs
- this repo pulls information from the 'source ID' file in input4MIPs_CVs, specifically: https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_source_id.json
- in there, it is looking for keys like 'CR-*', to make sure that the 'source ID' (think unique ID) we use is 'registered'/known to input4MIPs_CVs
- the trick we play is that we can point to a specific commit or branch of input4MIPs_CVs, and then this repo is still happy.
- the idea of input4MIPs_CVs is to make sure that the wider forcings team is aware of what is coming and can manage some of the metadata around all of these different contributions (which come from different people)
- we write the files using this information, so we can't really get it wrong, but doing it this way means this metadata is defined in one spot, so it's a bit easier to manage
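The "is our source ID registered?" check described above can be sketched as follows. The assumption that the top-level JSON keys of `input4MIPs_source_id.json` are the source IDs is a simplification of the real file's layout, and the example entries are made up; the real repo also fetches the file from the commit or branch named in `--input4mips-cvs-source` rather than taking a dict:

```python
def our_source_ids(cvs_source_ids: dict) -> list:
    """Return the 'CR-*' source IDs, i.e. the entries this repo cares about."""
    return sorted(k for k in cvs_source_ids if k.startswith("CR-"))


def source_id_is_registered(source_id: str, cvs_source_ids: dict) -> bool:
    """Check that a source ID we plan to write is known to input4MIPs_CVs."""
    return source_id in cvs_source_ids


# Made-up entries standing in for the parsed input4MIPs_source_id.json
example_cvs = {
    "CR-vllo-0-3-0": {},
    "SOLARIS-HEPPA-CMIP-4-6": {},
}
print(our_source_ids(example_cvs))
```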
This project was generated from this template: basic python repository. copier is used to manage and distribute this template.