CMIP7 ScenarioMIP GHG Concentrations

Generation of GHG concentration inputs (i.e. forcings) for CMIP7's ScenarioMIP.

Status

  • development: the project is actively being worked on

Installation

We do all our environment management using pixi. To get started, you will need to make sure that pixi is installed (instructions here; we found that using the pixi-provided install script worked best on a Mac).

To create the virtual environment, run

pixi install
pixi run pre-commit install

These steps are also captured in the Makefile, so if you want a single command you can instead simply run make virtual-environment.

Having installed your virtual environment, you can now run commands in your virtual environment using

pixi run <command>

For example, to run Python within the virtual environment, run

pixi run python

As another example, to run a notebook server, run

pixi run jupyter lab

Creating the files

Prefect set up

Background: If you use prefect in more than one project, you can get yourself in a mess. As a precaution, we recommend following this process in all cases, even if this is the only project in which you use prefect.

To avoid clashes, you will likely want to make a profile specific to this project, e.g.

pixi run prefect profile create cmip7-scenariomip-ghg-concentrations

Then use it with

pixi run prefect profile use cmip7-scenariomip-ghg-concentrations
# Check with
pixi run prefect profile ls

To avoid clashes with other databases, tell prefect to use a database specific to this project

mkdir .prefect
pixi run prefect config set PREFECT_API_DATABASE_CONNECTION_URL='sqlite+aiosqlite:////path/to/this/repo/.prefect/prefect.db'
# e.g.
pixi run prefect config set PREFECT_API_DATABASE_CONNECTION_URL="sqlite+aiosqlite:///${PWD}/.prefect/prefect.db"
# Check with
pixi run prefect config view --show-secrets

If you want to run on a specific host/port for this instance, you can set that with

# Host
pixi run prefect config set PREFECT_SERVER_API_HOST=<desired-host>
# e.g.
pixi run prefect config set PREFECT_SERVER_API_HOST="127.0.0.1"
# Port
pixi run prefect config set PREFECT_SERVER_API_PORT=<desired-port>
# e.g.
pixi run prefect config set PREFECT_SERVER_API_PORT="4201"

If you do this, make sure that the prefect API URL matches

pixi run prefect config set PREFECT_API_URL="http://<desired-host>:<desired-port>/api"
# e.g.
pixi run prefect config set PREFECT_API_URL="http://127.0.0.1:4201/api"

Process

In short

  1. Receive data from the emissions team
  2. Update scripts/create-latest-set-of-concentration-files.sh
    • Likely you will need to update --emissions-file, --run-id, --esgf-version and --input4mips-cvs-source
  3. Commit
  4. Start your prefect server in a separate terminal, pixi run prefect server start
  5. Run
  6. Upload the results to NERSC for the publication team (see Uploading to NERSC).
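Put together, the short-form steps above might look like the sketch below. The run ID and ESGF version are placeholder examples only (pick whatever makes sense for your run), and the real commands are left commented so you can adapt them to your setup.

```shell
# Hedged sketch of the short-form process; all values are examples only.
RUN_ID="v20250101"       # example run ID
ESGF_VERSION="1.0.1"     # example ESGF version

# 1. Edit scripts/create-latest-set-of-concentration-files.sh
#    (--emissions-file, --run-id, --esgf-version, --input4mips-cvs-source),
#    then commit:
# git commit -am "Prepare run ${RUN_ID}"

# 2. In a separate terminal, start the prefect server:
# pixi run prefect server start

# 3. Run:
# bash scripts/create-latest-set-of-concentration-files.sh

echo "Prepared run ${RUN_ID} targeting ESGF version ${ESGF_VERSION}"
```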

In long

  1. Receive markers from the emissions team
    • the markers are defined in a variable in scripts/generate-concentration-files.py. If there are changes, make sure you update that variable.
  2. Receive emissions from the emissions team
    • they should send two files. They produce these files with the script here. The two files are:
      1. the emissions for each scenario, except for emissions of species that we derive from our inversions of sources like WMO (2022) (where we use only a single concentration projection, rather than having variation across scenarios)
      2. emissions for each scenario at the fossil/biosphere level. This is used for some extrapolations of latitudinal gradients. It's the same data as above, just at slightly higher sectoral detail.
  3. Put the received emissions in data/raw/input-scenarios
  4. Update the emissions file you use for your run. There are two options for how to do this:
    1. specify this from the command line via the --emissions-file option
    2. change the value of the emissions_file variable in scripts/create-latest-set-of-concentration-files.sh
  5. Run with a new run ID and ESGF version (using the command-line arguments --run-id and --esgf-version). Pick whatever makes sense here (we don't have strong rules about our versioning yet)
    • This also requires creating entries in the controlled vocabularies (CVs), i.e. updating input4MIPs_CVs' source-ID file (CVs/input4MIPs_source_id.json) to include source IDs of the form "CR-scenario-esgf-version". In practice, simply copy the existing "CR-scenario-esgf-version" entries and update their version to match the ESGF version you used above. Then push this to GitHub.
    • When you run, you will also need to update the value of --input4mips-cvs-source, either via the command-line argument or by updating the value in scripts/generate-concentration-files.py. The value should be of the form "gh:[commit-id]" e.g. "gh:c75a54d0af36dbedf654ad2eeba66e9c1fbce2a2".
  6. When the run is finished, upload the results to NERSC for the publication team (see Uploading to NERSC).
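As a concrete sketch of step 5, the CVs source value is just the string "gh:" plus the commit ID of your input4MIPs_CVs change. The commit ID below is the example from this document; use the one from your own push.

```shell
# Build the --input4mips-cvs-source value from a commit of input4MIPs_CVs.
# The commit ID here is the example given above, not a real value to reuse.
CVS_COMMIT="c75a54d0af36dbedf654ad2eeba66e9c1fbce2a2"
INPUT4MIPS_CVS_SOURCE="gh:${CVS_COMMIT}"
echo "${INPUT4MIPS_CVS_SOURCE}"

# Then pass it through, e.g.:
# pixi run python scripts/generate-concentration-files.py \
#     --input4mips-cvs-source "${INPUT4MIPS_CVS_SOURCE}"
```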

Uploading to NERSC

  • raw docs are pretty good: https://docs.nersc.gov/services/scp/
  • command is something like rsync --partial --progress -avR output-bundles/1.0.0/data/processed/esgf-ready/input4MIPs zrjn@dtn01.nersc.gov:/global/u2/z/zrjn/
    • -avR: archive mode, verbose, and relative path names, i.e. copy recursively while preserving the directory structure we want
    • output-bundles/1.0.0/data/processed/esgf-ready/input4MIPs: the directory you want to upload
    • zrjn: zeb's username, yours will be something like fb. You can get this by logging into jupyter then looking at the start of your shell prompt.
    • dtn01.nersc.gov:: the data transfer node we upload to, taken from the docs https://docs.nersc.gov/services/scp/
    • /global/u2/z/zrjn/: the path we want to upload to (just zeb's home directory)
  • move files to /global/cfs/projectdirs/m4931/zrjn-tmp, effectively the 'staging' area
  • update permissions
    • make all directories readable by anyone: find /global/cfs/projectdirs/m4931/zrjn-tmp/input4MIPs/ -type d -exec chmod 755 {} \;
    • make all files readable by anyone: find /global/cfs/projectdirs/m4931/zrjn-tmp/input4MIPs/ -type f -exec chmod 644 {} \;
  • message Sasha on slack with something like, "Hi, the files in /path/to/suitable/level/dir are ready to be published"
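The permission-update step above can be sketched as below. For illustration this runs against a throwaway temporary directory; on NERSC you would set TARGET to the staging area instead, e.g. /global/cfs/projectdirs/m4931/zrjn-tmp/input4MIPs.

```shell
# Sketch of the permission updates, demonstrated on a temporary directory.
# On NERSC, point TARGET at the staging area instead.
TARGET="$(mktemp -d)/input4MIPs"
mkdir -p "${TARGET}/some/nested/dir"
touch "${TARGET}/some/nested/dir/example-file.nc"

# Make all directories readable (and traversable) by anyone
find "${TARGET}" -type d -exec chmod 755 {} \;
# Make all files readable by anyone
find "${TARGET}" -type f -exec chmod 644 {} \;
```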

Parallelisation

By default, this all runs serially. You can add extra cores with the flags below:

  • --n-workers: the number of threaded (i.e. parallel) workers to use for submitting jobs
    • note: this doesn't give true parallelism. A full explanation is beyond the scope of this document (if you want to search, explore the difference between thread-based and process-based parallelism in Python)
  • --n-workers-multiprocessing: the number of multiprocessing (i.e. parallel) workers to use, excluding any tasks that require running MAGICC
  • --n-workers-multiprocessing-magicc: the number of multiprocessing (i.e. parallel) workers to use for tasks that run MAGICC
  • --n-workers-per-magicc-notebook: the number of MAGICC workers to use in each MAGICC-running task.
    • note: the total number of MAGICC workers is the product of --n-workers-multiprocessing-magicc and --n-workers-per-magicc-notebook

In general, you want:

  • --n-workers: equal to the number of cores on your CPU (or more)
  • --n-workers-multiprocessing: equal to the number of cores on your CPU (or more)
  • --n-workers-multiprocessing-magicc, --n-workers-per-magicc-notebook: the product should be equal to the number of cores on your CPU (or more)

For example, for an eight core machine you might do something like

pixi run python scripts/generate-concentration-files.py --n-workers 8 --n-workers-multiprocessing 8 --n-workers-multiprocessing-magicc 2 --n-workers-per-magicc-notebook 4

Specific gases

If you need/want to run only for a specific gas, you can use the --ghg flag as shown below.

pixi run python scripts/generate-concentration-files.py --ghg ccl4 --ghg cfc113

Development

TODO: update this section as we add:

  • tests
  • anything else

Install and run instructions are the same as the above (this is a simple repository, without tests etc. so there are no development-only dependencies).

Contributing

TODO: update as we figure out the structure

Repository structure

TODO: update as we figure out the structure

We have a basic Makefile which captures key commands in one place (for more thoughts on why this makes sense, see general principles: automation). For an introduction to make, see this introduction from Software Carpentry. Having said this, if you're not interested in make, you can just copy the commands out of the Makefile by hand and you will be 90% as happy.

Tools

In this repository, we use the following tools:

  • git for version-control (for more on version control, see general principles: version control)
  • Pixi for environment management (for more on environment management, see general principles: environment management)
    • there are lots of environment management systems. Pixi works well in our experience and, for projects that need conda, it is the only solution we have tried that worked really well.
    • we track the pixi.lock file so that the environment is completely reproducible on other machines or by other people (e.g. if you want a colleague to take a look at what you've done)
  • pre-commit with some very basic settings to get some easy wins in terms of maintenance, specifically:
    • code formatting with ruff
    • basic file checks (removing unneeded whitespace, not committing large files etc.)
    • (for more thoughts on the usefulness of pre-commit, see general principles: automation)
    • track your notebooks using jupytext (for more thoughts on the usefulness of Jupytext, see tips and tricks: Jupytext)
      • this avoids nasty merge conflicts and incomprehensible diffs
  • prefect for workflow orchestration

General background

  • relationship between this repo and https://github.com/PCMDI/input4MIPs_CVs
    • this repo pulls information from the 'source ID' file in input4MIPs_CVs, specifically this file: https://github.com/PCMDI/input4MIPs_CVs/blob/main/CVs/input4MIPs_source_id.json
    • in there, it is looking for keys like 'CR-*', to make sure that the 'source ID' (think unique ID) we use is 'registered'/known to input4MIPs_CVs
    • the trick we play is that we can point to a specific commit or branch of input4MIPs_CVs, and then this repo is still happy
    • the idea of input4MIPs_CVs is to make sure that the wider forcings team is aware of what is coming and can manage some of the metadata around all of these different contributions (which come from different people)
    • we write the files using this information, so we can't really get it wrong, and doing it this way means this metadata is defined in one spot, which makes it a bit easier to manage

Original template

This project was generated from this template: basic python repository. copier is used to manage and distribute this template.
