This directory contains tools for processing downloaded NEON airborne data products (RGB, HSI, LiDAR) into machine learning-ready formats.
NEON data downloads have complex nested directory structures that are difficult to work with directly. These tools flatten and organize the data with standardized naming conventions.
curate_tiles.py processes downloaded NEON tiles from their deeply nested directory structure into a flattened, organized format suitable for ML workflows.
Input Structure:
downloaded_tiles/
├── SITE_YEAR/
│ ├── rgb/DP3.../neon-aop-products/year/FullSite/.../YYYY_SITE_#_EASTING_NORTHING_image.tif
│ ├── hsi/DP3.../neon-aop-products/year/FullSite/.../NEON_D##_SITE_DP3_EASTING_NORTHING_reflectance.h5
│ └── lidar/DP3.../neon-aop-products/year/FullSite/.../NEON_D##_SITE_DP3_EASTING_NORTHING_CHM.tif
Output Structure:
curated_tiles/
├── rgb/SITE_YEAR_EASTING_NORTHING_rgb.tif
├── hsi_tif/SITE_YEAR_EASTING_NORTHING_hsi.tif # Converted from H5
└── lidar/SITE_YEAR_EASTING_NORTHING_lidar.tif
Key Features:
- Handles complex NEON directory nesting automatically
- Matches tiles across RGB, HSI, and LiDAR modalities by coordinates
- Creates standardized filenames for easy identification
- Only processes complete tile sets (all 3 modalities present)
- Option for flat structure or modality subdirectories
- Preserves original files by default
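The matching and naming behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in curate_tiles.py: the function names `group_complete_sets` and `standardized_name`, and the tuple layout of the input, are assumptions made for the example.

```python
from collections import defaultdict

MODALITIES = ("rgb", "hsi", "lidar")


def group_complete_sets(tiles):
    """Group tiles by (site, year, easting, northing) and keep complete sets.

    `tiles` is an iterable of (modality, site, year, easting, northing, path)
    tuples. This layout is hypothetical and chosen only for the sketch.
    """
    by_key = defaultdict(dict)
    for modality, site, year, easting, northing, path in tiles:
        by_key[(site, year, easting, northing)][modality] = path
    # Keep only coordinates where all three modalities are present.
    return {
        key: paths
        for key, paths in by_key.items()
        if all(m in paths for m in MODALITIES)
    }


def standardized_name(site, year, easting, northing, modality):
    """Build an output name of the form SITE_YEAR_EASTING_NORTHING_modality.tif."""
    return f"{site}_{year}_{easting}_{northing}_{modality}.tif"
```

A tile that is missing any one modality is dropped from the result, which mirrors the "complete tile sets only" rule.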
hsi_convert_h5_to_tif.py converts HSI files from HDF5 (.h5) format to GeoTIFF (.tif) format for easier processing and compatibility with standard geospatial tools.
crop_crowns_multimodal.py crops individual tree crowns from NEON multi-modal tiles (RGB, LiDAR, HSI) using crown polygon data. It is designed for machine learning workflows that require individual tree-level data.
create_training_csv.py creates training-ready CSV files by merging the crop metadata of the cropped crowns with species labels from NEON Vegetation Structure and Traits (VST) data.
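The merge described above amounts to a keyed join between two tables. Below is a small sketch using only the standard library. The column names `individualID` and `taxonID`, the function name `merge_labels`, and the sample rows are assumptions made for illustration and may not match what create_training_csv.py actually expects.

```python
import csv
import io


def merge_labels(crop_rows, label_rows, key="individualID", label="taxonID"):
    """Attach a species label to each crop row that has a matching key.

    Crop rows without a label are dropped. Column names are hypothetical.
    """
    labels = {row[key]: row[label] for row in label_rows}
    return [
        {**row, label: labels[row[key]]}
        for row in crop_rows
        if row[key] in labels
    ]


# Made-up sample data standing in for crop_metadata.csv and the VST labels.
crop_csv = io.StringIO(
    "individualID,rgb_path\n"
    "tree_001,a_rgb.tif\n"
    "tree_002,b_rgb.tif\n"
)
vst_csv = io.StringIO(
    "individualID,taxonID\n"
    "tree_001,ACRU\n"
)

training_rows = merge_labels(
    list(csv.DictReader(crop_csv)),
    list(csv.DictReader(vst_csv)),
)
```

Dropping unlabeled crops is one possible design choice; a real pipeline might instead keep them with an empty label.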
convert_tif_to_npy.py converts TIF crown crops to NPY format for faster loading during machine learning training. It includes validation and data cleaning steps.
Key Features:
- 6.5x faster loading compared to TIF format
- Data validation: Binary validation for small tree crowns (no thresholds)
- NoData cleaning: Replaces -9999 values with 0 for cleaner downstream processing
- Modality-aware validation:
  - HSI & LiDAR: Allows crown masking (requires ≥10% valid pixels)
  - RGB: Strict validation (no nodata pixels expected)
- Metadata updates: Creates updated CSV with validity flags and NPY paths
- Incremental processing: Can process individual modalities separately
Input Structure:
cropped_crowns_modality_organized/
├── rgb/crown_id.tif
├── hsi/crown_id.tif
├── lidar/crown_id.tif
└── crop_metadata.csv
Output Structure:
cropped_crowns_npy/
├── rgb/crown_id.npy
├── hsi/crown_id.npy
├── lidar/crown_id.npy
└── crop_metadata_npy.csv
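The NoData cleaning and modality-aware validation rules listed above might look roughly like this with NumPy. It is a sketch under stated assumptions: the function name `validate_and_clean` is hypothetical, a pixel is treated as valid when it differs from -9999, and the valid fraction is computed over all array elements. The actual script may define these differently.

```python
import numpy as np

NODATA = -9999
MIN_VALID_FRACTION = 0.10  # threshold quoted for HSI and LiDAR


def validate_and_clean(array, modality):
    """Return (cleaned_array, is_valid) for one crown crop.

    RGB crops must contain no nodata values at all. HSI and LiDAR crops
    may be partially masked but need at least 10% valid elements.
    """
    valid_mask = array != NODATA
    valid_fraction = float(valid_mask.mean())
    if modality == "rgb":
        is_valid = bool(valid_mask.all())
    else:
        is_valid = valid_fraction >= MIN_VALID_FRACTION
    # Replace nodata with 0 so downstream code never sees -9999.
    cleaned = np.where(valid_mask, array, 0).astype(array.dtype)
    return cleaned, is_valid
```

The cleaned array can then be written with `np.save` and the validity flag recorded in the updated metadata CSV.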
Input Requirements (crop_crowns_multimodal.py):
- Curated NEON tiles directory with rgb/, lidar/, and hsi_tif/ subdirectories
- Crown polygon data in GeoPackage (.gpkg) or Shapefile format with individual tree locations
- Crown data must include siteID, year, and individual identification columns
Output Structure (Flat - Default):
output_dir/
├── SITE_YEAR_INDIVIDUAL_CROWNIDX_rgb.tif
├── SITE_YEAR_INDIVIDUAL_CROWNIDX_lidar.tif
├── SITE_YEAR_INDIVIDUAL_CROWNIDX_hsi.tif
└── crop_metadata.csv
Output Structure (Modality Subdirectories):
output_dir/
├── rgb/SITE_YEAR_INDIVIDUAL_CROWNIDX.tif
├── lidar/SITE_YEAR_INDIVIDUAL_CROWNIDX.tif
├── hsi/SITE_YEAR_INDIVIDUAL_CROWNIDX.tif
└── crop_metadata.csv
Key Features:
- Site-specific UTM coordinate transformations for accurate cropping
- Multi-modal alignment ensuring each crown has RGB, LiDAR, and HSI crops
- Configurable buffer around crown polygons
- Spatial indexing for efficient crown-tile matching
- Comprehensive metadata logging with processing timestamps
- Flexible output organization (flat or modality-organized)
- Robust error handling and progress tracking
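To illustrate the buffer and crown-tile matching features above, here is a simplified sketch that uses plain grid arithmetic instead of a spatial index. It assumes crowns are already in the tile's UTM coordinates, that tiles are 1 km squares, and that the EASTING and NORTHING in a tile name give its lower-left corner. The function names are hypothetical, and the real script may use a proper spatial index rather than this shortcut.

```python
import math

TILE_SIZE = 1000  # metres; assumed tile edge length


def buffered_bounds(polygon, buffer_m=2.0):
    """Bounding box of a crown polygon, expanded by `buffer_m` on each side.

    `polygon` is a list of (x, y) UTM coordinates in metres.
    """
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs) - buffer_m, min(ys) - buffer_m,
            max(xs) + buffer_m, max(ys) + buffer_m)


def tiles_for_bounds(bounds, tile_size=TILE_SIZE):
    """List the (easting, northing) tile origins that a bounding box touches."""
    min_x, min_y, max_x, max_y = bounds
    e0 = math.floor(min_x / tile_size) * tile_size
    e1 = math.floor(max_x / tile_size) * tile_size
    n0 = math.floor(min_y / tile_size) * tile_size
    n1 = math.floor(max_y / tile_size) * tile_size
    return [
        (e, n)
        for e in range(e0, e1 + tile_size, tile_size)
        for n in range(n0, n1 + tile_size, tile_size)
    ]
```

A crown whose buffered box straddles a tile boundary returns more than one tile, which is the case a cropping tool has to decide how to handle (skip, or mosaic the neighbours).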
python curate_tiles.py --input-dir downloaded_neon_tiles/ --output-dir curated_tiles/

python crop_crowns_multimodal.py \
--tiles_dir curated_tiles/ \
--crowns_gpkg crown_polygons.gpkg \
--output_dir cropped_crowns/

python create_training_csv.py \
--crop_metadata cropped_crowns/crop_metadata.csv \
--vst_labels neon_vst_data.csv \
--output training_data.csv

# Convert all modalities
python convert_tif_to_npy.py \
/path/to/cropped_crowns_modality_organized \
/path/to/crop_metadata.csv
# Auto-detect metadata CSV
python convert_tif_to_npy.py \
/path/to/cropped_crowns_modality_organized
# Custom output directory
python convert_tif_to_npy.py \
/path/to/cropped_crowns_modality_organized \
--npy-dir cropped_crowns_npy_custom

Options for crop_crowns_multimodal.py:
- --modality_subdir: Organize output in modality subdirectories (rgb/, lidar/, hsi/)
- --buffer FLOAT: Buffer around crowns in meters (default: 2.0)
- --site SITE: Filter crowns by NEON site code
- --year YEAR: Filter crowns by year
- --max_crowns N: Limit processing to N crowns (for testing)

Options for curate_tiles.py:
- --delete-originals: Delete source files after copying (default: preserve originals)
- --flat-structure: Create completely flat structure without modality subdirectories
- --dry-run: Preview what would be processed without actually copying files
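The way --dry-run and --delete-originals interact with skipping existing outputs can be sketched as a single copy helper. The function `copy_tile` and its return strings are hypothetical and only show one plausible ordering of the checks, not the script's actual implementation.

```python
import shutil
import tempfile
from pathlib import Path


def copy_tile(src, dst, dry_run=False, delete_original=False):
    """Copy one tile, honouring skip-existing, dry-run and delete-original."""
    src, dst = Path(src), Path(dst)
    if dst.exists():
        return "skipped"      # re-runs do not overwrite existing output
    if dry_run:
        return "would copy"   # report only, touch nothing
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    if delete_original:
        src.unlink()          # only after a successful copy
    return "copied"


# Small demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "in" / "tile.tif"
    src.parent.mkdir()
    src.write_bytes(b"demo")
    dst = Path(tmp) / "out" / "rgb" / "tile.tif"
    statuses = [
        copy_tile(src, dst, dry_run=True),
        copy_tile(src, dst),
        copy_tile(src, dst),
    ]
```

Deleting the source only after the copy succeeds keeps the default behaviour safe even when the option is enabled.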
# Standard curation with modality subdirectories
python curate_tiles.py \
--input-dir /path/to/downloaded_neon_tiles_20250819 \
--output-dir /path/to/curated_tiles_20250819
# Dry run to see what would be processed
python curate_tiles.py \
--input-dir /path/to/downloaded_neon_tiles_20250819 \
--output-dir /path/to/curated_tiles_20250819 \
--dry-run
# Completely flat structure
python curate_tiles.py \
--input-dir /path/to/downloaded_neon_tiles_20250819 \
--output-dir /path/to/curated_tiles_flat \
--flat-structure

# Basic cropping with flat output structure
python crop_crowns_multimodal.py \
--tiles_dir curated_tiles_20250819/ \
--crowns_gpkg crown_polygons.gpkg \
--output_dir cropped_crowns_flat/
# Organized by modality subdirectories
python crop_crowns_multimodal.py \
--tiles_dir curated_tiles_20250819/ \
--crowns_gpkg crown_polygons.gpkg \
--output_dir cropped_crowns_organized/ \
--modality_subdir
# Filter by site with custom buffer
python crop_crowns_multimodal.py \
--tiles_dir curated_tiles_20250819/ \
--crowns_gpkg crown_polygons.gpkg \
--output_dir cropped_crowns_harv/ \
--site HARV \
--buffer 3.0
# Test run with limited crowns
python crop_crowns_multimodal.py \
--tiles_dir curated_tiles_20250819/ \
--crowns_gpkg crown_polygons.gpkg \
--output_dir cropped_crowns_test/ \
--max_crowns 10

# Standard conversion with auto-detection
python convert_tif_to_npy.py \
/path/to/cropped_crowns_modality_organized
# With explicit metadata CSV
python convert_tif_to_npy.py \
/path/to/cropped_crowns_modality_organized \
/path/to/crop_metadata.csv
# Custom output directory and CSV
python convert_tif_to_npy.py \
/path/to/cropped_crowns_modality_organized \
/path/to/crop_metadata.csv \
--npy-dir cropped_crowns_optimized \
--output-csv /path/to/custom_metadata_npy.csv

Notes (curate_tiles.py):
- Complete sets only: Only processes coordinates with all 3 modalities (RGB, HSI, LiDAR)
- Safe operations: Original files are preserved by default
- Skip existing: Re-running will skip files that already exist in output directory
- Coordinate matching: Uses regex patterns to extract coordinates from NEON filenames
- Large files: Processing can be slow because HSI files are large (several GB each)
NEON filename patterns:
- RGB: YYYY_SITE_#_EASTING_NORTHING_image.tif
- HSI: NEON_D##_SITE_DP3_EASTING_NORTHING_reflectance.h5
- LiDAR: NEON_D##_SITE_DP3_EASTING_NORTHING_CHM.tif
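Regex-based coordinate extraction for the three filename patterns above could look like the following. The exact expressions used by curate_tiles.py are not shown in this document, so these patterns are an assumption. In particular, they assume four-letter uppercase site codes and purely numeric easting and northing fields.

```python
import re

# One pattern per modality, following the filename templates listed above.
PATTERNS = {
    "rgb": re.compile(
        r"^(?P<year>\d{4})_(?P<site>[A-Z]{4})_\d+_"
        r"(?P<easting>\d+)_(?P<northing>\d+)_image\.tif$"
    ),
    "hsi": re.compile(
        r"^NEON_D\d{2}_(?P<site>[A-Z]{4})_DP3_"
        r"(?P<easting>\d+)_(?P<northing>\d+)_reflectance\.h5$"
    ),
    "lidar": re.compile(
        r"^NEON_D\d{2}_(?P<site>[A-Z]{4})_DP3_"
        r"(?P<easting>\d+)_(?P<northing>\d+)_CHM\.tif$"
    ),
}


def parse_tile_name(filename):
    """Return (modality, site, easting, northing), or None if nothing matches."""
    for modality, pattern in PATTERNS.items():
        match = pattern.match(filename)
        if match:
            return (
                modality,
                match.group("site"),
                int(match.group("easting")),
                int(match.group("northing")),
            )
    return None
```

Only the RGB name carries the year, so for HSI and LiDAR the year would have to come from elsewhere, for example the SITE_YEAR directory in the input structure.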
Complete workflow:
1. Download NEON data using ../neon_downloader.py
2. Curate tiles with curate_tiles.py to flatten and organize the data
3. Convert HSI format with hsi_convert_h5_to_tif.py (if needed)
4. Crop individual crowns with crop_crowns_multimodal.py using crown polygon data
5. Create training CSV with create_training_csv.py to combine crops with species labels
6. Convert to NPY format with convert_tif_to_npy.py for optimized training performance
7. Use for machine learning model training and evaluation
Tile-level workflow:
1. Download NEON data using ../neon_downloader.py
2. Run curate_tiles.py to flatten and organize the data
3. Use curated tiles directly for tile-level machine learning workflows
Crown-level workflow:
1. Complete steps 1-3 above
2. Obtain crown polygon data (from field surveys, automated detection, etc.)
3. Run crop_crowns_multimodal.py to extract individual tree crops
4. Run convert_tif_to_npy.py to convert crops to optimized NPY format
5. Use NPY crown crops for tree-level classification, detection, or analysis