1- # NEON Tree Classification
1+ # NEON Multi-Modal Tree Species Dataset
22
3- A modular Python package for processing NEON airborne data and multi-modal tree species classification using RGB, hyperspectral, and LiDAR data.
3+ Hyperspectral, RGB, and LiDAR airborne data covering **5,518 individual trees** of **96 species** across **30 NEON sites** in North America.
44
5- ## Features
5+ ## Dataset Overview
66
7- ### Data Processing
8- - **NEON data download**: Automated download of RGB, hyperspectral, and LiDAR tiles
9- - **Shapefile processing**: Coordinate system transformations and validation for tree crowns
10- - **Multi-modal tile processing**: Convert and process HSI (H5 → GeoTIFF), RGB, and LiDAR data
11- - **Crown-tile intersection**: Match tree crown annotations with corresponding image tiles
7+ - **5,518** individual tree crowns
8+ - **96** unique species
9+ - **30** NEON sites across North America
10+ - **2018-2020** (3 years of data)
11+ - **3 modalities:** RGB, hyperspectral (426 bands), LiDAR canopy height model (CHM)
12+
13+ ## Quick Start
14+
15+ ``` python
16+ # Load and explore the dataset
17+ from neon_tree_classification.core.dataset import NeonCrownDataset
18+
19+ # Simple loading
20+ dataset = NeonCrownDataset.load()
21+ dataset.summary() # Print dataset overview
22+
23+ # Filter for specific species or sites
24+ conifers = dataset.filter(species=['PSMEM', 'TSHE'])
25+ abby_harv = conifers.filter(sites=['ABBY', 'HARV'])
1226
13- ### Machine Learning
14- - **Multi-modal models**: Separate architectures for RGB, hyperspectral (426 bands), and LiDAR
15- - **Modular training**: PyTorch Lightning modules with CometML/TensorBoard logging
16- - **Flexible data pipeline**: Clean tensor-only batches with configurable splits
17- - **Modern packaging**: Uses `pyproject.toml` and `uv` for dependency management
27+ # Get dataset statistics
28+ stats = dataset.get_dataset_stats()
29+ ```
30+
31+ ## Visualization Examples
32+
33+ The package provides plotting functions for all three modalities:
34+
35+ | RGB Image | HSI Pseudo RGB | HSI PCA Decomposition |
36+ |-----------|----------------|-----------------------|
37+ | ![RGB](sample_plots/sample_rgb.png) | ![HSI](sample_plots/sample_hsi.png) | ![HSI PCA](sample_plots/sample_hsi_pca.png) |
38+
39+ | HSI Spectral Signatures | LiDAR Canopy Height Model |
40+ |-------------------------|---------------------------|
41+ | ![Spectra](sample_plots/sample_spectra.png) | ![LiDAR](sample_plots/sample_lidar.png) |
42+
43+ ``` python
44+ # Visualization functions for tree crown data
45+ from neon_tree_classification.core.visualization import (
46+     plot_rgb, plot_hsi, plot_hsi_pca, plot_hsi_spectra, plot_lidar
47+ )
48+
49+ # RGB visualization
50+ plot_rgb('path/to/crown_rgb.tif')  # True color RGB image
51+
52+ # Hyperspectral visualization options
53+ plot_hsi('path/to/crown_hsi.tif')  # Pseudo RGB (bands ~660nm, ~550nm, ~450nm)
54+ plot_hsi_pca('path/to/crown_hsi.tif')  # PCA decomposition to 3 components
55+ plot_hsi_spectra('path/to/crown_hsi.tif')  # Spectral signatures of pixels
56+
57+ # LiDAR visualization
58+ plot_lidar('path/to/crown_chm.tif')  # Canopy height model with colorbar
59+ ```
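
The pseudo-RGB view mentioned above picks three bands near 660 nm, 550 nm, and 450 nm. The following is a minimal sketch of how such a band lookup could work; it is not taken from `plot_hsi`. The helper `nearest_band` is hypothetical, and the wavelength grid (starting at 380 nm with 5 nm spacing) is an assumption chosen only to produce 426 band centers. Real files carry their own wavelength list, which should be used instead.

```python
# Assumption (not from this repository): band centers start at 380 nm and are
# spaced 5 nm apart, which yields 426 bands. Use the file's own wavelengths in practice.
wavelengths_nm = [380 + 5 * i for i in range(426)]


def nearest_band(target_nm, wavelengths):
    """Return the index of the band whose center is closest to target_nm."""
    return min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))


# Red, green, and blue targets taken from the comment in the snippet above.
rgb_band_indices = [nearest_band(t, wavelengths_nm) for t in (660, 550, 450)]
print(rgb_band_indices)
```

Stacking the three selected bands in that order gives an image that can be displayed like an ordinary RGB array, usually after rescaling reflectance values to a displayable range.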
60+
61+ ### Quick Visualization with Dataset
62+
63+ ``` python
64+ # Easy visualization with dataset integration
65+ from neon_tree_classification.core.dataset import NeonCrownDataset
66+ from neon_tree_classification.core.visualization import plot_rgb, plot_hsi, plot_lidar
67+ dataset = NeonCrownDataset.load()
68+ sample = dataset.data.iloc[0]  # Get first sample
69+
70+ # Visualize all modalities for this tree crown
71+ plot_rgb(sample['rgb_path'])
72+ plot_hsi(sample['hsi_path'])
73+ plot_lidar(sample['lidar_path'])
74+ ```
75+
76+ ## Top Species
77+
78+ The dataset includes 96 tree species. The ten most common are:
79+
80+ | Rank | Species | Count | Percentage |
81+ | ------| ---------| -------| ------------|
82+ | 1 | Picea mariana (Mill.) Britton, Sterns & Poggenb. | 678 | 12.3% |
83+ | 2 | Acer rubrum L. | 360 | 6.5% |
84+ | 3 | Pseudotsuga menziesii (Mirb.) Franco var. menziesii | 300 | 5.4% |
85+ | 4 | Populus tremuloides Michx. | 271 | 4.9% |
86+ | 5 | Quercus rubra L. | 243 | 4.4% |
87+ | 6 | Pinus palustris Mill. | 233 | 4.2% |
88+ | 7 | Tsuga canadensis (L.) Carrière | 200 | 3.6% |
89+ | 8 | Pinus contorta Douglas ex Loudon var. latifolia Engelm. ex S. Watson | 189 | 3.4% |
90+ | 9 | Abies lasiocarpa (Hook.) Nutt. var. lasiocarpa | 172 | 3.1% |
91+ | 10 | Betula neoalaskana Sarg. | 162 | 2.9% |
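
The percentages in the table are each species' share of the 5,518 crowns. As a quick check, the sketch below recomputes them from the counts listed above and also reports how much of the dataset the ten species cover together. The function name `share` is illustrative and not part of the package.

```python
TOTAL_CROWNS = 5518  # total number of tree crowns stated in the overview

# Counts copied from the table, in rank order.
top_ten_counts = [678, 360, 300, 271, 243, 233, 200, 189, 172, 162]


def share(count, total=TOTAL_CROWNS):
    """Percentage of the dataset represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


percentages = [share(c) for c in top_ten_counts]
top_ten_share = share(sum(top_ten_counts))

print(percentages)
print(top_ten_share)
```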
92+
93+ ## Geographic Distribution
94+
95+ Data were collected from **30 NEON sites** across North America. The ten sites with the most samples are:
96+
97+ **1.** DEJU: 577 samples (10.5%)
98+ **2.** BART: 533 samples (9.7%)
99+ **3.** BONA: 504 samples (9.1%)
100+ **4.** HARV: 490 samples (8.9%)
101+ **5.** MLBS: 368 samples (6.7%)
102+ **6.** RMNP: 329 samples (6.0%)
103+ **7.** DELA: 299 samples (5.4%)
104+ **8.** NIWO: 276 samples (5.0%)
105+ **9.** UNDE: 262 samples (4.7%)
106+ **10.** TALL: 246 samples (4.5%)
18107
19108## Installation
20109
110+ ### Basic Installation
21111``` bash
112+ # Clone the repository
22113git clone https://github.com/Ritesh313/NeonTreeClassification.git
23114cd NeonTreeClassification
24115
25- # Install with uv (recommended)
26- pip install uv
27- uv sync
28-
29- # Or with pip
30- pip install -e .
116+ # Install core dependencies
117+ pip install .
31118```
32119
33- ## Quick Start
34-
35- ### Data Processing
120+ ### Optional Dependencies
36121``` bash
37- # Process NEON shapefiles
38- python scripts/test_shapefile_processor.py
122+ # For development (tests, formatting, notebooks)
123+ pip install ".[dev]"
124+
125+ # For data processing (geospatial tools)
126+ pip install ".[processing]"
39127
40- # Process tiles and match with crowns
41- python scripts/process_tiles_to_crowns.py
128+ # For experiment logging
129+ pip install ".[logging]"
130+
131+ # Install all optional dependencies
132+ pip install ".[dev,processing,logging]"
42133```
43134
44- ### Model Training
45- ``` bash
46- # Train RGB model
47- python train.py --modality rgb --csv_path data/crowns.csv --data_dir data/
135+ ## Repository Structure
48136
49- # Train HSI model with CometML logging
50- python train.py --modality hsi --logger comet --project_name my-project
137+ ```
138+ NeonTreeClassification/
139+ ├── neon_tree_classification/ # Main package
140+ │ ├── core/ # Core functionality (dataset, visualization)
141+ │ │ ├── dataset.py # Enhanced dataset with filtering & stats
142+ │ │ ├── datamodule.py # PyTorch Lightning data module
143+ │ │ └── visualization.py # All visualization functions
144+ │ └── models/ # ML architectures & Lightning modules
145+ ├── examples/ # Training and comparison examples
146+ │ ├── train.py # Main training script
147+ │ └── compare_modalities.py # Multi-modal comparison
148+ ├── notebooks/ # Interactive exploration
149+ │ └── visualization.ipynb # Visualization demo notebook
150+ ├── processing/ # Advanced data processing tools
151+ ├── scripts/ # Automation utilities
152+ ├── sample_plots/ # Generated sample images
153+ └── training_data_clean.csv # Main dataset file
154+ ```
155+
156+ ## Interactive Notebook
157+
158+ Explore the dataset and visualization functions interactively:
51159
52- # Compare all modalities
53- python compare_modalities.py --csv_path data/crowns.csv --data_dir data/
160+ ``` bash
161+ # Start Jupyter and open the visualization notebook
162+ jupyter notebook notebooks/visualization.ipynb
54163```
55164
56- ### Using in code
57- ``` python
58- # Data processing
59- from neon_tree_classification.data.shapefile_processor import ShapefileProcessor
165+ The notebook includes examples of:
166+ - Loading and filtering the dataset
167+ - RGB, HSI, and LiDAR visualizations
168+ - Interactive exploration of tree crown data
60169
61- processor = ShapefileProcessor()
62- sites_df, summary = processor.process_shapefiles(destination_dir)
170+ ## Advanced Usage
171+
172+ ### Multi-modal Training
63173
64- # Model training
65- from neon_tree_classification import NeonCrownDataModule, RGBClassifier
174+ ``` python
175+ # Train models on different modalities
176+ from neon_tree_classification.core.datamodule import NeonCrownDataModule
177+ from neon_tree_classification.models.lightning_modules import RGBClassifier
66178
179+ # Setup data
67180datamodule = NeonCrownDataModule(
68-     csv_path="data/crowns.csv",
69-     base_data_dir="data/",
70-     modalities=["rgb"],
181+     csv_path="training_data_clean.csv",
182+     modalities=["rgb", "hsi", "lidar"],
71183 batch_size = 32
72184)
73185
74- classifier = RGBClassifier(model_type="resnet", num_classes=10)
186+ # Train RGB model
187+ classifier = RGBClassifier(num_classes = 96 )
75188
76189import lightning as L
77190trainer = L.Trainer(max_epochs = 50 )
78191trainer.fit(classifier, datamodule)
79192```
80193
81- ## Architecture
194+ ### Data Processing
82195
83- ```
84- neon_tree_classification/
85- ├── data/
86- │ ├── dataset.py # Multi-modal dataset
87- │ ├── datamodule.py # Lightning DataModule
88- │ └── shapefile_processor.py # NEON shapefile processing
89- ├── models/
90- │ ├── rgb_models.py # RGB architectures
91- │ ├── hsi_models.py # Hyperspectral architectures
92- │ ├── lidar_models.py # LiDAR architectures
93- │ └── lightning_modules.py # Training modules
94- └── processing/ # NEON data processing utilities
95-
96- scripts/
97- ├── download_neon_all_modalities.py # Download NEON data
98- ├── process_tiles_to_crowns.py # Tile processing pipeline
99- └── test_shapefile_processor.py # Test shapefile processing
196+ The package includes tools for processing NEON data, but most users will work with the pre-processed dataset.
197+
198+ ``` python
199+ # For advanced users: process raw NEON data (destination_dir is a path you define beforehand)
200+ from processing.shapefile_processor import ShapefileProcessor
201+ processor = ShapefileProcessor()
202+ sites_df, summary = processor.process_shapefiles(destination_dir)
100203```
101204
102- ## NEON Data Products
205+ ## Dataset Details
103206
207+ ### NEON Data Products
104208- **RGB**: `DP3.30010.001` - High-resolution orthorectified imagery
105209- **Hyperspectral**: `DP3.30006.002` - 426-band spectrometer reflectance
106210- **LiDAR**: `DP3.30015.001` - Canopy Height Model
107211
212+ ### Data Structure
213+ ```
214+ training_data_clean.csv - Main dataset file
215+ ├── crown_id - Unique identifier for each tree crown
216+ ├── site - NEON site code
217+ ├── year - Data collection year
218+ ├── species - Species code
219+ ├── species_name - Full species name
220+ ├── height - Tree height (meters)
221+ ├── rgb_path - Path to RGB image
222+ ├── hsi_path - Path to hyperspectral image
223+ ├── lidar_path - Path to LiDAR CHM
224+ └── [other metadata] - Additional tree measurements
225+ ```
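
Because the dataset file is a plain CSV with the columns listed above, it can also be inspected without the package. The sketch below uses only the standard library and two invented rows; the identifiers, species codes, heights, and file paths are placeholders for illustration, not real records from `training_data_clean.csv`.

```python
import csv
import io
from collections import Counter

# Two invented rows that follow the documented column names; all values are placeholders.
sample_csv = io.StringIO(
    "crown_id,site,year,species,species_name,height,rgb_path,hsi_path,lidar_path\n"
    "crown_0001,HARV,2019,SPEC_A,Example species A,18.5,rgb/0001.tif,hsi/0001.tif,chm/0001.tif\n"
    "crown_0002,BART,2020,SPEC_B,Example species B,12.0,rgb/0002.tif,hsi/0002.tif,chm/0002.tif\n"
)

rows = list(csv.DictReader(sample_csv))

# Count crowns per site and find the tallest tree (height is stored in meters).
crowns_per_site = Counter(row["site"] for row in rows)
tallest = max(rows, key=lambda row: float(row["height"]))

print(crowns_per_site)
print(tallest["crown_id"], tallest["height"])
```

To run the same logic on the real file, replace `sample_csv` with an open file handle, for example `open("training_data_clean.csv", newline="")`.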
226+
108227## Contributing
109228
1102291 . Fork the repository
@@ -118,4 +237,5 @@ Ritesh Chowdhry
118237## Acknowledgments
119238
120239- National Ecological Observatory Network (NEON)
240+ - These dataset details were generated on 2025-08-24
121241