
Commit 3a16bd5

feat: Complete repository restructure with an increased focus on dataset
BREAKING CHANGES:
- Restructured entire repository to emphasize dataset and visualization over processing
- Moved core functionality to new modular structure
- Updated all import paths and package structure

New Structure:
- core/: Main user-facing functionality (dataset, datamodule, visualization)
- processing/: Advanced data processing tools (moved from scattered locations)
- examples/: Training and comparison scripts
- notebooks/: Interactive visualization demos
- sample_plots/: Generated sample images for documentation

Major Features Added:
- Enhanced NeonCrownDataset with filtering, statistics, and summary methods
- Comprehensive visualization utilities for RGB, HSI (pseudo RGB, PCA, spectra), and LiDAR
- Dynamic README generation with live dataset statistics and new structure documentation
- Interactive Jupyter notebook for visualization exploration
- Automated sample image generation for documentation

File Moves:
- dataset.py, datamodule.py -> core/
- simple_visualization.py -> core/visualization.py
- train.py, compare_modalities.py -> examples/
- All data processing scripts -> processing/
- Created notebooks/visualization.ipynb

Dependencies:
- Streamlined dependencies from 12 to 8 core packages
- Moved geospatial tools (geopandas, shapely, etc.) to optional [processing] group
- Updated to pytorch-lightning>=2.0.0
- Added scikit-learn for PCA visualization

Updated Documentation:
- README now includes installation instructions, repository structure, and interactive notebook sections
- All import paths updated across examples, scripts, notebooks, and README generation script
- Added .vscode/ to .gitignore

This restructure makes the repository more user-friendly with dataset and visualization as primary features, while keeping advanced processing tools available but optional.
1 parent e80c911 commit 3a16bd5
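
The commit message notes that the geospatial tools (geopandas, shapely, etc.) moved to an optional [processing] group. A minimal sketch of how downstream code might check for those packages before using the processing tools is given here. The helper name `missing_packages` is hypothetical and not part of the repository, and only the two package names spelled out in the commit message are checked.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Only geopandas and shapely are named in the commit message; the rest of
# the [processing] group is not listed there, so it is not guessed here.
PROCESSING_PACKAGES = ["geopandas", "shapely"]

missing = missing_packages(PROCESSING_PACKAGES)
if missing:
    print("Optional processing packages not installed:", ", ".join(missing))
    print("Install them with: pip install .[processing]")
else:
    print("All listed processing packages are available.")
```

The check only looks up import specs, so it does not import the (potentially heavy) geospatial libraries themselves.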

46 files changed

Lines changed: 11811 additions & 3655 deletions


.gitignore

Lines changed: 6 additions & 0 deletions
@@ -1,3 +1,6 @@
+# Ignore SLURM batch scripts and shell scripts
+SLURM/
+*.sh

 # custom
 .env
@@ -22,6 +25,9 @@ dist/
 .venv/
 uv.lock

+# IDE/Editor
+.vscode/
+
 # Compiled source #
 ###################
 *.com

README.md

Lines changed: 185 additions & 65 deletions
@@ -1,110 +1,229 @@
-# NEON Tree Classification
+# NEON Multi-Modal Tree Species Dataset

-A modular Python package for processing NEON airborne data and multi-modal tree species classification using RGB, hyperspectral, and LiDAR data.
+Hyperspectral, RGB and LiDAR airborne data for **96 tree species** representing **5,518 individual trees** across **30 NEON sites** in North America.

-## Features
+## Dataset Overview

-### Data Processing
-- **NEON data download**: Automated download of RGB, hyperspectral, and LiDAR tiles
-- **Shapefile processing**: Coordinate system transformations and validation for tree crowns
-- **Multi-modal tile processing**: Convert and process HSI (H5 → GeoTIFF), RGB, and LiDAR data
-- **Crown-tile intersection**: Match tree crown annotations with corresponding image tiles
+- **5,518** individual tree crowns
+- **96** unique species
+- **30** NEON sites across North America
+- **2018-2020** (3 years of data)
+- **3 modalities:** RGB, Hyperspectral (426 bands), LiDAR CHM
+
+## Quick Start
+
+```python
+# Load and explore the dataset
+from neon_tree_classification.core.dataset import NeonCrownDataset
+
+# Simple loading
+dataset = NeonCrownDataset.load()
+dataset.summary() # Print dataset overview
+
+# Filter for specific species or sites
+conifers = dataset.filter(species=['PSMEM', 'TSHE'])
+west_coast = conifers.filter(sites=['ABBY', 'HARV'])

-### Machine Learning
-- **Multi-modal models**: Separate architectures for RGB, hyperspectral (426 bands), and LiDAR
-- **Modular training**: PyTorch Lightning modules with CometML/TensorBoard logging
-- **Flexible data pipeline**: Clean tensor-only batches with configurable splits
-- **Modern packaging**: Uses `pyproject.toml` and `uv` for dependency management
+# Get dataset statistics
+stats = dataset.get_dataset_stats()
+```
+
+## Visualization Examples
+
+The package includes comprehensive visualization tools for all three modalities:
+
+| RGB Image | HSI Pseudo RGB | HSI PCA Decomposition |
+|-----------|----------------|----------------------|
+| ![RGB](sample_plots/sample_rgb.png) | ![HSI](sample_plots/sample_hsi.png) | ![HSI PCA](sample_plots/sample_hsi_pca.png) |
+
+| HSI Spectral Signatures | LiDAR Canopy Height Model |
+|-------------------------|---------------------------|
+| ![Spectra](sample_plots/sample_spectra.png) | ![LiDAR](sample_plots/sample_lidar.png) |
+
+```python
+# Visualization functions for tree crown data
+from neon_tree_classification.core.visualization import (
+    plot_rgb, plot_hsi, plot_hsi_pca, plot_hsi_spectra, plot_lidar
+)
+
+# RGB visualization
+plot_rgb('path/to/crown_rgb.tif') # True color RGB image
+
+# Hyperspectral visualization options
+plot_hsi('path/to/crown_hsi.tif') # Pseudo RGB (bands ~660nm, ~550nm, ~450nm)
+plot_hsi_pca('path/to/crown_hsi.tif') # PCA decomposition to 3 components
+plot_hsi_spectra('path/to/crown_hsi.tif') # Spectral signatures of pixels
+
+# LiDAR visualization
+plot_lidar('path/to/crown_chm.tif') # Canopy height model with colorbar
+```
+
+### Quick Visualization with Dataset
+
+```python
+# Easy visualization with dataset integration
+from neon_tree_classification.core.dataset import NeonCrownDataset
+
+dataset = NeonCrownDataset.load()
+sample = dataset.data.iloc[0] # Get first sample
+
+# Visualize all modalities for this tree crown
+plot_rgb(sample['rgb_path'])
+plot_hsi(sample['hsi_path'])
+plot_lidar(sample['lidar_path'])
+```
+
+## Top Species
+
+The dataset includes 96 tree species. Here are the most common:
+
+| Rank | Species | Count | Percentage |
+|------|---------|-------|------------|
+| 1 | Picea mariana (Mill.) Britton, Sterns & Poggenb. | 678 | 12.3% |
+| 2 | Acer rubrum L. | 360 | 6.5% |
+| 3 | Pseudotsuga menziesii (Mirb.) Franco var. menziesii | 300 | 5.4% |
+| 4 | Populus tremuloides Michx. | 271 | 4.9% |
+| 5 | Quercus rubra L. | 243 | 4.4% |
+| 6 | Pinus palustris Mill. | 233 | 4.2% |
+| 7 | Tsuga canadensis (L.) Carrière | 200 | 3.6% |
+| 8 | Pinus contorta Douglas ex Loudon var. latifolia Engelm. ex S. Watson | 189 | 3.4% |
+| 9 | Abies lasiocarpa (Hook.) Nutt. var. lasiocarpa | 172 | 3.1% |
+| 10 | Betula neoalaskana Sarg. | 162 | 2.9% |
+
+## Geographic Distribution
+
+Data collected from **30 NEON sites** across North America:
+
+**1.** DEJU: 577 samples (10.5%)
+**2.** BART: 533 samples (9.7%)
+**3.** BONA: 504 samples (9.1%)
+**4.** HARV: 490 samples (8.9%)
+**5.** MLBS: 368 samples (6.7%)
+**6.** RMNP: 329 samples (6.0%)
+**7.** DELA: 299 samples (5.4%)
+**8.** NIWO: 276 samples (5.0%)
+**9.** UNDE: 262 samples (4.7%)
+**10.** TALL: 246 samples (4.5%)

 ## Installation

+### Basic Installation
 ```bash
+# Clone the repository
 git clone https://github.com/Ritesh313/NeonTreeClassification.git
 cd NeonTreeClassification

-# Install with uv (recommended)
-pip install uv
-uv sync
-
-# Or with pip
-pip install -e .
+# Install core dependencies
+pip install .
 ```

-## Quick Start
-
-### Data Processing
+### Optional Dependencies
 ```bash
-# Process NEON shapefiles
-python scripts/test_shapefile_processor.py
+# For development (tests, formatting, notebooks)
+pip install .[dev]
+
+# For data processing (geospatial tools)
+pip install .[processing]

-# Process tiles and match with crowns
-python scripts/process_tiles_to_crowns.py
+# For experiment logging
+pip install .[logging]
+
+# Install all optional dependencies
+pip install .[dev,processing,logging]
 ```

-### Model Training
-```bash
-# Train RGB model
-python train.py --modality rgb --csv_path data/crowns.csv --data_dir data/
+## Repository Structure

-# Train HSI model with CometML logging
-python train.py --modality hsi --logger comet --project_name my-project
+```
+NeonTreeClassification/
+├── neon_tree_classification/ # Main package
+│   ├── core/ # Core functionality (dataset, visualization)
+│   │   ├── dataset.py # Enhanced dataset with filtering & stats
+│   │   ├── datamodule.py # PyTorch Lightning data module
+│   │   └── visualization.py # All visualization functions
+│   └── models/ # ML architectures & Lightning modules
+├── examples/ # Training and comparison examples
+│   ├── train.py # Main training script
+│   └── compare_modalities.py # Multi-modal comparison
+├── notebooks/ # Interactive exploration
+│   └── visualization.ipynb # Visualization demo notebook
+├── processing/ # Advanced data processing tools
+├── scripts/ # Automation utilities
+├── sample_plots/ # Generated sample images
+└── training_data_clean.csv # Main dataset file
+```
+
+## Interactive Notebook
+
+Explore the dataset and visualization functions interactively:

-# Compare all modalities
-python compare_modalities.py --csv_path data/crowns.csv --data_dir data/
+```bash
+# Start Jupyter and open the visualization notebook
+jupyter notebook notebooks/visualization.ipynb
 ```

-### Using in code
-```python
-# Data processing
-from neon_tree_classification.data.shapefile_processor import ShapefileProcessor
+The notebook includes examples of:
+- Loading and filtering the dataset
+- RGB, HSI, and LiDAR visualizations
+- Interactive exploration of tree crown data

-processor = ShapefileProcessor()
-sites_df, summary = processor.process_shapefiles(destination_dir)
+## Advanced Usage
+
+### Multi-modal Training

-# Model training
-from neon_tree_classification import NeonCrownDataModule, RGBClassifier
+```python
+# Train models on different modalities
+from neon_tree_classification.core.datamodule import NeonCrownDataModule
+from neon_tree_classification.models.lightning_modules import RGBClassifier

+# Setup data
 datamodule = NeonCrownDataModule(
-    csv_path="data/crowns.csv",
-    base_data_dir="data/",
-    modalities=["rgb"],
+    csv_path="training_data_clean.csv",
+    modalities=["rgb", "hsi", "lidar"],
     batch_size=32
 )

-classifier = RGBClassifier(model_type="resnet", num_classes=10)
+# Train RGB model
+classifier = RGBClassifier(num_classes=96)

 import lightning as L
 trainer = L.Trainer(max_epochs=50)
 trainer.fit(classifier, datamodule)
 ```

-## Architecture
+### Data Processing

-```
-neon_tree_classification/
-├── data/
-│   ├── dataset.py # Multi-modal dataset
-│   ├── datamodule.py # Lightning DataModule
-│   └── shapefile_processor.py # NEON shapefile processing
-├── models/
-│   ├── rgb_models.py # RGB architectures
-│   ├── hsi_models.py # Hyperspectral architectures
-│   ├── lidar_models.py # LiDAR architectures
-│   └── lightning_modules.py # Training modules
-└── processing/ # NEON data processing utilities
-
-scripts/
-├── download_neon_all_modalities.py # Download NEON data
-├── process_tiles_to_crowns.py # Tile processing pipeline
-└── test_shapefile_processor.py # Test shapefile processing
+The package includes tools for processing NEON data, but most users will work with the pre-processed dataset.
+
+```python
+# For advanced users: process raw NEON data
+from processing.shapefile_processor import ShapefileProcessor
+processor = ShapefileProcessor()
+sites_df, summary = processor.process_shapefiles(destination_dir)
 ```

-## NEON Data Products
+## Dataset Details

+### NEON Data Products
 - **RGB**: `DP3.30010.001` - High-resolution orthorectified imagery
 - **Hyperspectral**: `DP3.30006.002` - 426-band spectrometer reflectance
 - **LiDAR**: `DP3.30015.001` - Canopy Height Model

+### Data Structure
+```
+training_data_clean.csv - Main dataset file
+├── crown_id - Unique identifier for each tree crown
+├── site - NEON site code
+├── year - Data collection year
+├── species - Species code
+├── species_name - Full species name
+├── height - Tree height (meters)
+├── rgb_path - Path to RGB image
+├── hsi_path - Path to hyperspectral image
+├── lidar_path - Path to LiDAR CHM
+└── [other metadata] - Additional tree measurements
+```
+
 ## Contributing

 1. Fork the repository
@@ -118,4 +237,5 @@ Ritesh Chowdhry
 ## Acknowledgments

 - National Ecological Observatory Network (NEON)
+- This dataset details were generated on 2025-08-24
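
The Top Species and Geographic Distribution figures in the README diff above are counts divided by the 5,518 total crowns. As a rough sketch of that arithmetic, the snippet below tallies a column and reports percentages. It assumes only the `species_name` and `site` columns listed under Data Structure; the helper `top_counts` and the toy rows are invented for illustration and are not taken from the repository or the dataset.

```python
import csv
import io
from collections import Counter


def top_counts(rows, column, n=10):
    """Return (value, count, percentage) tuples for the n most frequent values."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common(n)
    ]


# Toy rows invented for illustration; they are not taken from the dataset.
toy_csv = io.StringIO(
    "crown_id,site,species_name\n"
    "c1,HARV,Acer rubrum L.\n"
    "c2,HARV,Quercus rubra L.\n"
    "c3,BART,Acer rubrum L.\n"
    "c4,BART,Acer rubrum L.\n"
)
rows = list(csv.DictReader(toy_csv))

for name, count, pct in top_counts(rows, "species_name"):
    print(f"{name}: {count} ({pct}%)")
```

The same calculation matches the README values: 678 of 5,518 crowns is about 12.3% for Picea mariana, and 577 of 5,518 is about 10.5% for DEJU.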

SLURM/crowns.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.

SLURM/dask.sh

Lines changed: 0 additions & 23 deletions
This file was deleted.
Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@
 import json
 from pathlib import Path

-from neon_tree_classification import (
-    NeonCrownDataModule,
+from neon_tree_classification.core.datamodule import NeonCrownDataModule
+from neon_tree_classification.models.lightning_modules import (
     RGBClassifier,
     HSIClassifier,
     LiDARClassifier,
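
The hunk above shows the pattern this commit applies to imports: names that used to come from the package root now come from `core` or `models` submodules. One way to picture it is a lookup table from each name to its new module. The sketch below is illustrative only: `NEW_LOCATIONS` and `build_imports` are hypothetical helpers, and the table contains only the module paths that actually appear in this commit's diff.

```python
# Hypothetical lookup table; module paths are the ones shown in this diff.
NEW_LOCATIONS = {
    "NeonCrownDataset": "neon_tree_classification.core.dataset",
    "NeonCrownDataModule": "neon_tree_classification.core.datamodule",
    "RGBClassifier": "neon_tree_classification.models.lightning_modules",
    "HSIClassifier": "neon_tree_classification.models.lightning_modules",
    "LiDARClassifier": "neon_tree_classification.models.lightning_modules",
}


def build_imports(names):
    """Group names by their new module and return one import line per module."""
    grouped = {}
    for name in names:
        # A name missing from the table raises KeyError rather than guessing.
        grouped.setdefault(NEW_LOCATIONS[name], []).append(name)
    return [
        f"from {module} import {', '.join(symbols)}"
        for module, symbols in grouped.items()
    ]


for line in build_imports(["NeonCrownDataModule", "RGBClassifier", "HSIClassifier"]):
    print(line)
```

Raising on unknown names is a deliberate choice here, since the diff does not show where any other symbols ended up.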
