
Commit e10a00e

Merge branch 'graphnet-team:main' into 26_01_20_review_docstrings
2 parents fefd716 + 6b5608a commit e10a00e

29 files changed

Lines changed: 2262 additions & 160 deletions

docs/source/data_conversion/data_conversion.rst

Lines changed: 2 additions & 2 deletions
@@ -265,8 +265,8 @@ In this example, the writer will save the entire set of extractor outputs - a di
-Two writers are implemented in GraphNeT; the :code:`SQLiteWriter` and :code:`ParquetWriter`, each of which output files that are directly used for
-training by :code:`ParquetDataset` and :code:`SQLiteDataset`.
+Three writers are implemented in GraphNeT: the :code:`SQLiteWriter`, :code:`ParquetWriter`, and :code:`LMDBWriter`, each of which outputs files that are used directly for
+training by :code:`SQLiteDataset`, :code:`ParquetDataset`, and :code:`LMDBDataset`, respectively.
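For orientation (not part of this diff): a writer is passed to a :code:`DataConverter` as its save method. The sketch below is a minimal, hypothetical example; the module paths and keyword names (:code:`file_reader`, :code:`save_method`, :code:`extractors`, :code:`outdir`) are assumptions based on the GraphNeT 2.x data-conversion examples and may differ in your version:

.. code-block:: python

    # Hedged sketch: reader and extractor choices are experiment-specific placeholders.
    from graphnet.data.dataconverter import DataConverter
    from graphnet.data.readers import I3Reader          # assumed IceCube reader
    from graphnet.data.writers import SQLiteWriter      # or ParquetWriter / LMDBWriter

    converter = DataConverter(
        file_reader=I3Reader(gcd_rescue="/path/to/gcd.i3.gz"),
        save_method=SQLiteWriter(),
        extractors=[...],              # experiment-specific extractors go here
        outdir="/path/to/output",
    )
    converter("/path/to/input")        # convert all files in the input directory
    converter.merge_files()            # optionally merge per-file outputs into one database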

docs/source/datasets/datasets.rst

Lines changed: 45 additions & 9 deletions
@@ -155,18 +155,19 @@ It looks like so:
 </details>

-:code:`SQLiteDataset` & :code:`ParquetDataset`
-----------------------------------------------
+:code:`SQLiteDataset`, :code:`ParquetDataset` & :code:`LMDBDataset`
+--------------------------------------------------------------------

-The two specific implementations of :code:`Dataset` exists :
+The three specific implementations of :code:`Dataset` are:

 - `ParquetDataset <https://graphnet-team.github.io/graphnet/api/graphnet.data.parquet.parquet_dataset.html>`_ : Constructs :code:`Dataset` from files created by :code:`ParquetWriter`.
 - `SQLiteDataset <https://graphnet-team.github.io/graphnet/api/graphnet.data.sqlite.sqlite_dataset.html>`_ : Constructs :code:`Dataset` from files created by :code:`SQLiteWriter`.
+- `LMDBDataset <https://graphnet-team.github.io/graphnet/api/graphnet.data.dataset.lmdb.lmdb_dataset.html>`_ : Constructs :code:`Dataset` from files created by :code:`LMDBWriter`.

 To instantiate a :code:`Dataset` from your files, you must specify at least the following:

-- :code:`pulsemaps`: These are named fields in your Parquet files, or tables in your SQLite databases, which store one or more pulse series from which you would like to create a dataset. A pulse series represents the detector response, in the form of a series of PMT hits or pulses, in some time window, usually triggered by a single neutrino or atmospheric muon interaction. This is the data that will be served as input to the `Model`.
+- :code:`pulsemaps`: These are named fields in your Parquet files, or tables in your SQLite or LMDB databases, which store one or more pulse series from which you would like to create a dataset. A pulse series represents the detector response, in the form of a series of PMT hits or pulses, in some time window, usually triggered by a single neutrino or atmospheric muon interaction. This is the data that will be served as input to the `Model`.
 - :code:`truth_table`: The name of a table/array that contains the truth-level information associated with the pulse series, and should contain the truth labels that you would like to reconstruct or classify. Often this table will contain the true physical attributes of the primary particle — such as its true direction, energy, PID, etc. — and is therefore graph- or event-level (as opposed to the pulse series tables, which are node- or hit-level) truth information.
 - :code:`features`: The names of the columns in your pulse series table(s) that you would like to include for training; they typically constitute the per-node/-hit features such as xyz-position of sensors, charge, and photon arrival times.
 - :code:`truth`: The columns in your truth table/array that you would like to include in the dataset.
@@ -225,6 +226,32 @@ Or similarly for Parquet files:
     graph = dataset[0] # torch_geometric.data.Data

+
+Or similarly for LMDB files:
+
+.. code-block:: python
+
+    from graphnet.data.dataset.lmdb.lmdb_dataset import LMDBDataset
+    from graphnet.models.detector.prometheus import Prometheus
+    from graphnet.models.graphs import KNNGraph
+    from graphnet.models.graphs.nodes import NodesAsPulses
+
+    graph_definition = KNNGraph(
+        detector=Prometheus(),
+        node_definition=NodesAsPulses(),
+        nb_nearest_neighbours=8,
+    )
+
+    dataset = LMDBDataset(
+        path="data/examples/lmdb/prometheus/prometheus-events.lmdb",
+        pulsemaps="total",
+        truth_table="mc_truth",
+        features=["sensor_pos_x", "sensor_pos_y", "sensor_pos_z", "t", ...],
+        truth=["injection_energy", "injection_zenith", ...],
+        graph_definition=graph_definition,
+    )
+
+    graph = dataset[0] # torch_geometric.data.Data
+
It's then straightforward to create a :code:`DataLoader` for training, which will take care of batching, shuffling, and such:

.. code-block:: python
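The body of this :code:`DataLoader` code block is collapsed in the diff view. For reference, a minimal sketch of what such a setup typically looks like, assuming GraphNeT's :code:`DataLoader` wrapper lives at :code:`graphnet.data.dataloader.DataLoader` (the batch size and worker count are arbitrary placeholders):

.. code-block:: python

    from graphnet.data.dataloader import DataLoader  # assumed module path

    # Wraps torch_geometric batching/collation for the Dataset created above.
    dataloader = DataLoader(
        dataset,
        batch_size=128,
        shuffle=True,
        num_workers=4,
    )

    for batch in dataloader:
        ...  # feed batches to your Model during training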
@@ -250,10 +277,10 @@ By default, the following fields will be available in a graph built by :code:`Da
 - :code:`graph[truth_label] for truth_label in truth`: For each truth label in the :code:`truth` argument, the corresponding data is stored as a :code:`[num_rows, 1]` dimensional tensor. E.g., :code:`graph["energy"] = torch.tensor(26, dtype=torch.float)`
 - :code:`graph[feature] for feature in features`: For each feature given in the :code:`features` argument, the corresponding data is stored as a :code:`[num_rows, 1]` dimensional tensor. E.g., :code:`graph["sensor_x"] = torch.tensor([100, -200, -300, 200], dtype=torch.float)`

-:code:`SQLiteDataset` vs. :code:`ParquetDataset`
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+:code:`SQLiteDataset` vs. :code:`ParquetDataset` vs. :code:`LMDBDataset`
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Besides working on different file formats, :code:`SQLiteDataset` and :code:`ParquetDataset` have significant differences,
+Besides working on different file formats, :code:`SQLiteDataset`, :code:`ParquetDataset`, and :code:`LMDBDataset` have significant differences,
 which may lead you to choose one over the other, depending on the problem at hand.

 :SQLiteDataset: SQLite provides fast random access to all events inside it. This makes plotting and subsampling your dataset particularly easy,
@@ -265,13 +292,20 @@ which may lead you to choose one over the other, depending on the problem at han
     This means that the subsampling of your dataset needs to happen prior to the conversion to :code:`parquet`, unlike `SQLiteDataset` which allows for subsampling after conversion, due to it's fast random access.
     Conversion of files to :code:`parquet` is significantly faster than its :code:`SQLite` counterpart.

+:LMDBDataset: LMDB databases produced by :code:`LMDBWriter` store events as key-value pairs with configurable serialization methods (pickle, json, msgpack, dill).
+    :code:`LMDBDataset` supports two modes: reading raw tables and computing data representations in real-time (similar to :code:`SQLiteDataset`), or reading pre-computed data representations directly from the database for faster access.
+    LMDB provides fast random access similar to SQLite, while also supporting efficient storage of pre-computed graph representations, making it suitable for scenarios where you want to pre-compute and cache data representations.
+    LMDB takes up roughly half the space of SQLite, and is therefore a good compromise between SQLite and Parquet.

 .. note::

    :code:`ParquetDataset` is scalable to ultra large datasets, but is more difficult to work with and has a higher memory consumption.

    :code:`SQLiteDataset` does not scale to very large datasets, but is easy to work with and has minimal memory consumption.

+   :code:`LMDBDataset` provides a balance between SQLite and Parquet, offering fast random access and support for pre-computed representations, making it well-suited for scenarios where data representations are computed once and reused multiple times.

 Choosing a subset of events using `selection`
 ----------------------------------------------
@@ -297,7 +331,7 @@ would produce a :code:`Dataset` with only those five events.
 .. note::

-    For :code:`SQLiteDatase`, the :code:`selection` argument specifies individual events chosen for the dataset,
+    For :code:`SQLiteDataset` and :code:`LMDBDataset`, the :code:`selection` argument specifies individual events chosen for the dataset,
     whereas for :code:`ParquetDataset`, the :code:`selection` argument specifies which batches are used in the dataset.
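For illustration (hedged, not part of the diff): with :code:`SQLiteDataset` or :code:`LMDBDataset`, :code:`selection` is a list of individual event identifiers, so a five-element list yields a five-event dataset:

.. code-block:: python

    # Hypothetical event numbers; use values from your own index column.
    dataset = SQLiteDataset(
        ...,  # path, pulsemaps, features, truth, etc. as above
        selection=[0, 2, 3, 5, 8],
    )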

@@ -347,12 +381,14 @@ You can combine multiple instances of :code:`Dataset` from GraphNeT into a singl
     from graphnet.data import EnsembleDataset
     from graphnet.data.parquet import ParquetDataset
     from graphnet.data.sqlite import SQLiteDataset
+    from graphnet.data.dataset.lmdb.lmdb_dataset import LMDBDataset

     dataset_1 = SQLiteDataset(...)
     dataset_2 = SQLiteDataset(...)
     dataset_3 = ParquetDataset(...)
+    dataset_4 = LMDBDataset(...)

-    ensemble_dataset = EnsembleDataset([dataset_1, dataset_2, dataset_3])
+    ensemble_dataset = EnsembleDataset([dataset_1, dataset_2, dataset_3, dataset_4])

 You can find a detailed example `here <https://github.com/graphnet-team/graphnet/blob/main/examples/02_data/04_ensemble_dataset.py>`_ .

Lines changed: 88 additions & 40 deletions
@@ -1,68 +1,85 @@
 .. include:: ../substitutions.rst
+===========
+Quick Start
+===========
+Here we provide a quick start guide for getting you started with |graphnet|\ GraphNeT.

-Installation
-============
+Installing From Source
+======================

-|graphnet|\ GraphNeT is available for Python 3.9 to Python 3.11.
+We recommend installing |graphnet|\ GraphNeT in a separate environment, e.g. using a Python virtual environment or Anaconda (see details on installation `here <https://www.anaconda.com/products/individual>`_).
+With conda installed, you can create a fresh environment like so:

-.. note::
-    We recommend installing |graphnet|\ GraphNeT in a separate environment, e.g. using a Python virtual environment or Anaconda (see details on installation `here <https://www.anaconda.com/products/individual>`_).
-    With conda installed, you can create a fresh environment like so
+.. code-block:: bash

-    .. code-block:: bash
+    # Create the environment with minimal packages
+    conda create --name graphnet_env --no-default-packages python=3.10
+    conda activate graphnet_env

-        # Create the environment with minimal packages
-        conda create --name graphnet_env --no-default-packages python=3.10
-        conda activate graphnet_env
+    # Update central packaging libraries
+    pip install --upgrade setuptools packaging

-        # Update central packaging libraries
-        pip install --upgrade setuptools packaging
-
-        # Verify that only wheel, packaging and setuptools are installed
-        pip list
+    # Verify that only wheel, packaging and setuptools are installed
+    pip list

-        # Now you're ready to proceed with the installation
-Quick Start
------------
+    # Now you're ready to proceed with the installation
+

 .. raw:: html
    :file: quick-start.html


 When installation is completed, you should be able to run `the examples <https://github.com/graphnet-team/graphnet/tree/main/examples>`_.
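The :code:`quick-start.html` include above is an interactive command selector and is not rendered in this diff view. For reference, a typical from-source installation follows the same pattern as the km3io section further down; a hedged example for a CPU-only setup with PyTorch 2.5.1 (adjust versions to the compatibility matrix):

.. code-block:: bash

    # Commands reused from the km3io section of this same file.
    git clone https://github.com/graphnet-team/graphnet.git
    cd graphnet
    pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cpu
    pip3 install -e .[torch-25] -f https://data.pyg.org/whl/torch-2.5.1+cpu.html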

-Installation in CVMFS (IceCube)
--------------------------------
+Installation into experiment-specific Environments
+--------------------------------------------------
+Users may want to install |graphnet|\ GraphNeT into an environment that is specific to their experiment. This is useful for converting data from the experiment into a deep learning friendly file format, or when deploying models as part of an experiment-specific processing chain.

-You may want |graphnet|\ GraphNeT to be able to interface with IceTray, e.g., when converting I3 files to a deep learning friendly file format, or when deploying models as part of an IceTray chain. In these cases, you need to install |graphnet|\ GraphNeT in a Python runtime that has IceTray installed.
+Below are some examples of how to install |graphnet|\ GraphNeT into experiment-specific environments. If your experiment is missing, please feel free to open an issue on the `GitHub repository <https://github.com/graphnet-team/graphnet/issues>`_ and/or contribute a pull request.

-To achieve this, we recommend installing |graphnet|\ GraphNeT into a CVMFS with IceTray installed, like so:
+IceTray (IceCube & P-ONE)
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+While |graphnet|\ GraphNeT can be installed into existing IceTray environments that are either built from source or distributed through CVMFS, we highly recommend using our existing Docker images instead, which contain both IceTray and GraphNeT. These images are created by installing GraphNeT into public Docker images from the IceCube Collaboration.
+
+Details on how to run these images as Apptainer environments are provided in the `Docker & Apptainer Images`_ section.
+
+For users who prefer to install |graphnet|\ GraphNeT directly into a CVMFS environment rather than using Docker/Apptainer images, you can follow the steps below. This example uses PyTorch 2.7.0 (CPU) — adjust the PyTorch version and extras according to the compatibility matrix above.

 .. code-block:: bash

     # Download GraphNeT
     git clone https://github.com/graphnet-team/graphnet.git
     cd graphnet
+
     # Open your favorite CVMFS distribution
     eval `/cvmfs/icecube.opensciencegrid.org/py3-v4.2.1/setup.sh`
     /cvmfs/icecube.opensciencegrid.org/py3-v4.2.1/RHEL_7_x86_64/metaprojects/icetray/v1.5.1/env-shell.sh
-    # Update central utils
-    pip install --upgrade 'pip>=20'
-    pip install wheel setuptools==59.5.0
-    # Install graphnet into the CVMFS as a user
-    pip install --user -r requirements/torch_cpu.txt -e .[torch,develop]

+    # Upgrade central packaging libraries
+    pip install --user --upgrade setuptools versioneer

-Once installed, |graphnet|\ GraphNeT is available whenever you open the CVMFS locally.
+    # Install PyTorch (CPU)
+    pip3 install --user torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

-Installation with km3io (KM3NeT)
------------------------------------------------
+    # Install GraphNeT
+    pip3 install --user -e .[torch-27,develop] -f https://data.pyg.org/whl/torch-2.7.0+cpu.html

-This installation is only necessary if you want to process KM3NeT/ARCA or KM3NeT/ORCA files. Processing means converting them from a `.root` offline format into a suitable format for training using |graphnet|. If you already have your KM3NeT data in `SQLite` or `parquet` format and only want to train a model or perform inference on this database, this specific installation is not needed.
+To use |graphnet|\ GraphNeT in a new terminal session, re-activate the CVMFS distribution and the virtual environment:

-Note that this installation will add `km3io` ensuring it is built with a compatible versions. The steps below are provided for a conda environment, with an enviroment created in the same way it is done above in this page, but feel free to choose a different enviroment setup.
+.. code-block:: bash

-As mentioned, it is highly reommended to create a conda enviroment where your installation is done to do not mess up any dependecy. It can be done with the following commands:
+    eval `/cvmfs/icecube.opensciencegrid.org/py3-v4.2.1/setup.sh`
+    /cvmfs/icecube.opensciencegrid.org/py3-v4.2.1/RHEL_7_x86_64/metaprojects/icetray/v1.5.1/env-shell.sh
+    source ~/graphnet_venv/bin/activate
+    python -c "import graphnet; print(graphnet.__version__)"
+
+which should print the version of |graphnet|\ GraphNeT.
+
+km3io (KM3NeT)
+~~~~~~~~~~~~~~~~
+Note that this installation will add `km3io`, ensuring it is built with a compatible version. The steps below are given for a conda environment, created in the same way as above on this page, but feel free to choose a different environment setup.
+
+As mentioned, it is highly recommended to create a conda environment for the installation so that you do not break any existing dependencies. It can be done with the following commands:

 .. code-block:: bash
@@ -71,23 +88,23 @@ As mentioned, it is highly reommended to create a conda enviroment where your in
     # Activate the environment and move to the graphnet repository you just cloned. If using conda:
     conda activate <full-path-to-env>

-The isntallation of GraphNeT is then done by:
+The installation of GraphNeT is then done by:

 .. code-block:: bash

     git clone https://github.com/graphnet-team/graphnet.git
     cd graphnet

-Choose the appropriate requirements file based on your system. Here there is just an example of installation with PyTorch-2.5.1 but check the matrix above for a full idea of all the versions can be installed.
+Choose the appropriate requirements file based on your system. Here we show just one example, installing with PyTorch 2.5.1, but check the matrix above for the full set of versions that can be installed.

-For CPU-only enviroments:
+For CPU-only environments:

 .. code-block:: bash

     pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cpu
     pip3 install -e .[torch-25] -f https://data.pyg.org/whl/torch-2.5.1+cpu.html

-For GPU enviroments with, for instance, CUDA 11.8 drivers:
+For GPU environments with, for instance, CUDA 11.8 drivers:

 .. code-block:: bash
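The body of this GPU code block is collapsed in the diff view. Based on the CPU commands above, the CUDA 11.8 variant presumably looks like the following (a hedged sketch; the cu118 wheel index and PyG wheel URL follow the standard PyTorch/PyG naming, so verify them against the compatibility matrix):

.. code-block:: bash

    # Assumed CUDA 11.8 counterparts of the CPU commands above.
    pip3 install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
    pip3 install -e .[torch-25] -f https://data.pyg.org/whl/torch-2.5.1+cu118.html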
@@ -102,5 +119,36 @@ Downgrade setuptools for compatibility between km3io and GraphNeT.
     pip3 install km3io==1.2.0


-.. note::
-    We recommend installing |graphnet|\ GraphNeT without GPU in clean metaprojects.
+Docker & Apptainer Images
+=========================
+
+We provide Docker images for |graphnet|\ GraphNeT. The list of available Docker images with standalone installations of GraphNeT can be found on DockerHub at https://hub.docker.com/r/rorsoe/graphnet/tags.
+
+New images are created automatically when a new release is published, and when a new PR is merged to the main branch (latest). Each image comes in both GPU and CPU versions, but with a limited selection of PyTorch versions. The Dockerfile for the standalone images is `here <https://github.com/graphnet-team/graphnet/blob/main/docker/standalone/Dockerfile>`_.
+
+In addition to the standalone images, we also provide experiment-specific images for:
+
+- `IceCube & P-ONE (IceTray+GraphNeT) <https://hub.docker.com/r/rorsoe/graphnet_icetray/tags>`_, which is built using this `Dockerfile <https://github.com/graphnet-team/graphnet/blob/main/docker/icetray/Dockerfile>`_.
+- KM3NeT (km3io+GraphNeT) (Coming Soon)
+
+
+Running Docker images as Apptainer environments
+-----------------------------------------------
+While Docker images require sudo rights to run, they may be converted to Apptainer images and used as virtual environments - providing a convenient way to run |graphnet|\ GraphNeT without sudo rights or the need to install it on your system.
+
+To run one of the Docker images as an Apptainer environment, you can use the following command:
+
+.. code-block:: bash
+
+    apptainer exec --cleanenv --env PYTHONNOUSERSITE=1 --env PYTHONPATH= docker://<path_to_image> bash
+
+where <path_to_image> is the path to the image you want to use from DockerHub. For example, if `rorsoe/graphnet:graphnet-1.8.0-cu126-torch26-ubuntu-22.04` is chosen, an image with GraphNeT 1.8.0 + PyTorch 2.6.0 + CUDA 12.6 installed will open. The additional arguments `--cleanenv --env PYTHONNOUSERSITE=1 --env PYTHONPATH=` ensure that the environment is not contaminated with any other packages that may be installed on your system.
+
+To run one of the images with IceTray+GraphNeT as an Apptainer environment, you can for example use the following command:
+
+.. code-block:: bash
+
+    apptainer exec --cleanenv --env PYTHONNOUSERSITE=1 --env PYTHONPATH= docker://rorsoe/graphnet_icetray:graphnet-1.8.0-cpu-torch26-icecube-icetray-icetray-devel-v1.13.0-ubuntu22.04-2025-02-12 bash
+
+which opens an image with a CPU installation of GraphNeT 1.8.0 + PyTorch v2.6.0 + IceTray v1.13.0 installed and ready to use. You can replace the image path with the one you want to use from DockerHub.
