HyperPRI

HyperPRI - Hyperspectral Plant Root Imagery

This GitHub repository contains the source code used to demonstrate how the hyperspectral data in the HyperPRI dataset improves binary segmentation performance for a deep learning segmentation model.

==This code's public release is a work-in-progress and will be cleaned up following submissions to bioRxiv and Elsevier's COMPAG journal==

HyperPRI Dataset

Oct-15-2023: Initial upload/release of dataset
Mar-25-2024: Included hyperspectral data for the viewing pane's material (Lexan)
Jun-21-2024: Set aside rhizobox 40 as test data (dates: Aug-15 and Aug-24)

Preprint: bioRxiv 2023.09.29.559614v3

YouTube: Dataset Video

Why use the HyperPRI dataset?

Data in HyperPRI enhances plant science analyses and provides challenging features for machine learning models.

Hyperspectral data can supplement root analysis
Study root traits across time, from seedling to reproductive maturity
Thin object features: 1-3 pixels wide
High correlation between the high-resolution channels of hyperspectral data

Computer Vision Tasks

There are a number of related CV tasks for this dataset:

Compute root characteristics (length, diameter, angle, count, system architecture, hyperspectral)
Determine root turnover
Observe drought resiliency and response
Compare multiple physical and hyperspectral plant traits across time
Investigate texture analysis techniques
Segment roots vs. soil

HyperPRI Dataset Information

Hyperspectral Data (400 – 1000 nm, every 2 nm)
Temporal Data: Imaged for 14 or 15 timesteps across two months
- Drought: Aug-06 to Aug-19, 78 – 91 days after planting (stage R6)
- Drought: Jun-28 to Jul-21, 39 – 62 days after planting (stage V7 – V9)
Fully-annotated segmentation masks
- Includes annotations for peanut nodules and pegs
Box weights at each time stamp
- Baseline Measurements: Empty box, dry soil, wet soil
32 Peanut (Arachis hypogaea) rhizoboxes – 358 images
32 Sweet Corn (Zea mays) rhizoboxes – 390 images
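
As a rough illustration of the spectral sampling listed above, the following sketch builds a nominal wavelength axis. It assumes the bands are evenly spaced at exactly 2 nm, start at 400 nm, and include the 1000 nm endpoint. The function names are hypothetical and not part of this repository, and the wavelengths recorded in each cube's .hdr file should be treated as authoritative.

```python
# Nominal wavelength axis for a HyperPRI cube.
# Assumption: 400-1000 nm inclusive, evenly spaced 2 nm steps.
START_NM = 400
STOP_NM = 1000
STEP_NM = 2


def nominal_wavelengths():
    """Return the assumed centre wavelength (nm) of every band."""
    return list(range(START_NM, STOP_NM + 1, STEP_NM))


def band_to_nm(band_index):
    """Map a zero-based band index to its assumed wavelength in nm."""
    return START_NM + STEP_NM * band_index


def nm_to_band(wavelength_nm):
    """Map a wavelength in nm to the nearest zero-based band index."""
    return round((wavelength_nm - START_NM) / STEP_NM)


wavelengths = nominal_wavelengths()
print(len(wavelengths))                  # number of bands under these assumptions
print(band_to_nm(25), band_to_nm(263))   # wavelengths of two example band indices
```

Under these assumptions, band indices 25 and 263 (the hsi_lo and hsi_hi values used in the training instructions) map to 450 nm and 926 nm, and the slice between them holds 238 bands.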

Running the Code Out of the Box

The primary Python packages used are PyTorch, PyTorch Lightning, and related utilities. See the environment.yml file for specific versions.

Create a virtual environment (ideally with Conda) containing the packages listed in the provided environment.yml file.
- Additional instructions for using the YAML file may be found on the Conda site.
Set up the dataset directory. Relative to the base HyperPRI/ repository, the dataset path is Datasets/HyperPRI/.
- If you use the provided JSON/CSV data splits, place them in a Datasets/HyperPRI/data_splits subdirectory.
- For each plant type (e.g., Peanut, Sweet Corn), place the data in a Datasets/HyperPRI/{Peanut, SweetCorn}_968x608 subdirectory that contains three subdirectories of its own: hsi_files, mask_files, and rgb_files. As the names suggest, the HSI .dat and .hdr files belong in hsi_files, and the PNG mask and image files belong in mask_files and rgb_files, respectively.
  - Please note that the paper only used a Peanut_968x608 subdirectory.
The following settings apply to all model training:
- kfold_train.py: Set start_split and num_splits to 0 and 5, respectively. Set n_seeds to the number of training seeds desired per model.
- src/Experiments/params_HyperPRI.py: Batch size of 2. Adam optimization with a learning rate of 0.001, standard $\beta$ values, and no weight decay. num_classes=1 (binary).
For each model, the following architecture parameters were used for the paper. Henceforth, the .../params_HyperPRI.py file is referred to as "Parameters":
- UNET:
  - Parameters: n_channels=3.
  - kfold_train.py: Set dataset equal to "RGB".
  - Everything else should be hardcoded to get ~31.0M parameters.
- SpectralUNET:
  - Parameters: n_channels=238, patch_size=(608, 700), augment=True, spectral_bn_size=1650. The hsi_lo and hsi_hi values should be 25 and 263, respectively. These approximately correspond to 450 nm and 926 nm on the EM spectrum.
  - kfold_train.py: Set dataset equal to "HSI". Set MODEL_SHARD to True. This will require at least 2 GPUs to train due to the size of features when inputting multiple images. If a single GPU is desired, set MODEL_SHARD to False and decrease the value of patch_size in the Parameters until the training fits.
- CubeNET-64:
  - Parameters: n_channels=238, patch_size=(608, 968), augment=False, cube_featmaps=64. The hsi_lo and hsi_hi values should be 25 and 263, respectively. These approximately correspond to 450 nm and 926 nm on the EM spectrum.
  - kfold_train.py: Set dataset equal to "HSI". Set MODEL_SHARD to False.
Run kfold_train.py individually for each set of parameters in the previous step.
After training is finished, the models should be saved in their respective directories. If so, kfold_validate.py is ready to run out of the box. If segmentation maps of the dataset are wanted for all three models, change the segmaps list to [True, True, True].
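
The directory layout described in the steps above can be checked before training with a small script such as the one below. This is a sketch only: the helper names are hypothetical and not part of this repository, and the folder names are copied from the instructions above (Datasets/HyperPRI/, data_splits, Peanut_968x608, hsi_files, mask_files, rgb_files).

```python
from pathlib import Path

# Subdirectories each plant-type folder is expected to contain, per the steps above.
PLANT_SUBDIRS = ("hsi_files", "mask_files", "rgb_files")


def expected_dirs(repo_root, plant_dirs=("Peanut_968x608",), use_splits=True):
    """List the directories the setup instructions ask for (hypothetical helper)."""
    base = Path(repo_root) / "Datasets" / "HyperPRI"
    dirs = [base / plant / sub for plant in plant_dirs for sub in PLANT_SUBDIRS]
    if use_splits:
        dirs.append(base / "data_splits")
    return dirs


def missing_dirs(repo_root, **kwargs):
    """Return the expected directories that do not exist yet."""
    return [d for d in expected_dirs(repo_root, **kwargs) if not d.is_dir()]


if __name__ == "__main__":
    # Run from the base of the repository; prints nothing if the layout is complete.
    for path in missing_dirs("."):
        print(f"missing: {path}")
```

To check a Sweet Corn layout as well, pass plant_dirs=("Peanut_968x608", "SweetCorn_968x608").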

Please direct any issues or additional questions to changspencer.

Training Notes & Baseline Performance

Comet-ML/TensorBoard Logging: Unwanted loggers can be commented out, starting in the train_net method of src/PLTrainer.py. It is up to the user to trace every place where removing a logger's instantiation or definition may break the code.

SpectralUNET Training: To train SpectralUNET with 1650 neurons in each layer with our coding setup and memory constraints, we had to randomly crop the hyperspectral cubes' height and width to $608\times 700$. We expect that even if the additional 268 width-wise pixels were included, the model's performance would still be subpar compared to UNET and CubeNET.
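
The random cropping described above can be sketched as follows. This is not the repository's implementation: the function name and the (top, left) offset convention are assumptions, and only the sizes (608 x 968 cubes cropped to 608 x 700) come from the text.

```python
import random


def random_crop_offsets(image_hw, patch_hw, rng=random):
    """Pick the top-left corner of a random crop (hypothetical helper).

    image_hw and patch_hw are (height, width) tuples in pixels.
    """
    img_h, img_w = image_hw
    patch_h, patch_w = patch_hw
    if patch_h > img_h or patch_w > img_w:
        raise ValueError("patch must fit inside the image")
    # randint is inclusive at both ends, so every valid offset can be drawn.
    top = rng.randint(0, img_h - patch_h)
    left = rng.randint(0, img_w - patch_w)
    return top, left


top, left = random_crop_offsets((608, 968), (608, 700))
# The crop covers rows top:top+608 and columns left:left+700 of every band.
print(top, left)
```

With a full-height patch, top is always 0 and left falls between 0 and 268, so each crop drops 268 columns of the original width.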

Validation Data

| Metric | UNET | SpectralUNET | CubeNET-64 |
| --- | --- | --- | --- |
| BCE Loss | 0.080 (0.015) | 0.146 (0.022) | 0.077 (0.014) |
| DICE | 0.838 (0.015) | 0.717 (0.044) | 0.844 (0.013) |
| +IOU | 0.721 (0.022) | 0.561 (0.053) | 0.730 (0.019) |
| AP | 0.919 (0.013) | 0.781 (0.048) | 0.923 (0.012) |

Test Data

| Metric | UNET | SpectralUNET | CubeNET-64 |
| --- | --- | --- | --- |
| Pix Acc | 0.733 (0.123) | 0.751 (0.114) | 0.898 (0.134) |
| DICE | 0.162 (0.053) | 0.161 (0.064) | 0.471 (0.206) |
| +IOU | 0.089 (0.031) | 0.089 (0.039) | 0.329 (0.163) |
| AP | 0.226 (0.079) | 0.220 (0.083) | 0.610 (0.109) |

Note: Metrics shown are the mean across all splits with standard deviation in parentheses. Dataset splits are described in the JSON files located at the dataset URL above.
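
For reference, DICE and +IOU can be computed from a binary prediction and mask as sketched below. The sketch assumes that +IOU denotes the intersection-over-union of the positive (root) class and that both inputs are flat sequences of 0/1 labels. The function names are hypothetical, and the repository's own metric code may differ, for example in thresholding or averaging.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives, and false negatives for 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, target) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, target) if p == 0 and t == 1)
    return tp, fp, fn


def dice_score(pred, target):
    """Dice = 2*TP / (2*TP + FP + FN); defined as 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def positive_iou(pred, target):
    """IoU of the positive class = TP / (TP + FP + FN); 1.0 when both masks are empty."""
    tp, fp, fn = confusion_counts(pred, target)
    denom = tp + fp + fn
    return 1.0 if denom == 0 else tp / denom


pred = [1, 1, 0, 0]
target = [1, 0, 1, 0]
print(dice_score(pred, target), positive_iou(pred, target))
```

For a single image the two metrics are tied by Dice = 2 * IoU / (1 + IoU), so they tend to move together in the tables above.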
