1 | | -# NEON Multi-Modal Tree Species Classification Dataset |
| 1 | +# NEON Multi-Modal Tree Species Classification |
2 | 2 |
3 | | -A comprehensive dataset of 167 tree species with 47,971 individual tree crowns from 30 NEON sites across North America. Each sample includes RGB imagery, 369-band hyperspectral data, and LiDAR canopy height models. |
| 3 | +A comprehensive toolkit for multi-modal tree species classification using NEON ecological data. This project combines RGB imagery, hyperspectral data, and LiDAR to enable accurate tree species identification across diverse North American ecosystems. |
4 | 4 |
5 | | -## Dataset Overview |
| 5 | +## Project Vision |
6 | 6 |
7 | | -**Dataset Statistics:** |
8 | | -- 47,971 individual tree crowns |
9 | | -- 167 unique species |
10 | | -- 30 NEON sites across North America |
11 | | -- 10 years of data (2014-2023) |
12 | | -- 3 modalities: RGB (3 bands), Hyperspectral (369 bands), LiDAR CHM (1 band) |
13 | | -- Ecological metadata: Height (95.4% available), stem diameter (99.4% available), canopy position (81.4% available) |
| 7 | +This repository aims to provide an end-to-end solution for tree species classification: |
14 | 8 |
15 | | -**Dataset Configurations:** |
| 9 | +- **Dataset**: Ready-to-use multi-modal tree crown dataset with 167 species (currently available) |
| 10 | +- **Data Processing**: Tools for downloading and processing raw NEON data products (in development) |
| 11 | +- **Classification Models**: Pre-trained models and training pipelines (in development) |
| 12 | +- **DeepForest Integration**: Automated crown detection and classification workflow (planned) |
16 | 13 |
17 | | -The dataset comes with 3 pre-configured subsets for different use cases: |
| 14 | +## What's Available Now |
18 | 15 |
19 | | -| Configuration | Samples | Species | Description | |
20 | | -|---------------|---------|---------|-------------| |
21 | | -| `combined` | 47,971 | 167 | Complete dataset with all available samples | |
22 | | -| `large` | ~42,000 | ~162 | Main training set | |
23 | | -| `high_quality` | ~5,500 | ~96 | Curated subset with highest data quality | |
| 16 | +### Multi-Modal Dataset |
24 | 17 |
25 | | -**Data Format:** |
26 | | -- HDF5 storage for efficient compressed format and fast loading |
27 | | -- CSV metadata files with crown IDs, species labels, site information, and ecological measurements |
| 18 | +A curated dataset of 47,971 individual tree crowns from 30 NEON sites, ready for immediate use: |
28 | 19 |
29 | | -## Installation |
| 20 | +- **167 tree species** from diverse North American ecosystems |
| 21 | +- **3 modalities**: RGB (3 bands), Hyperspectral (369 bands), LiDAR CHM (1 band) |
| 22 | +- **10 years of data** (2014-2023) with ecological metadata |
| 23 | +- **3 configurations**: `combined` (47,971 samples), `large` (~42,000 samples), `high_quality` (~5,500 samples) |
| 24 | +- **HDF5 format**: Efficient storage with automatic download (590 MB) |
30 | 25 |
31 | | -**Requirements:** |
32 | | -- Python 3.9+ (recommended: Python 3.11) |
33 | | -- CUDA-capable GPU (optional, recommended for training) |
| 26 | +## Quick Start |
34 | 27 |
35 | | -**Installation Steps:** |
| 28 | +### Installation |
36 | 29 |
37 | 30 | ```bash |
38 | | -# Clone the repository |
39 | 31 | git clone https://github.com/Ritesh313/NeonTreeClassification.git |
40 | 32 | cd NeonTreeClassification |
41 | | - |
42 | | -# Install with uv (recommended) |
43 | | -uv sync |
44 | | - |
45 | | -# Or install with pip |
46 | | -pip install -e . |
| 33 | +uv sync # or: pip install -e . |
47 | 34 | ``` |
48 | 35 |
49 | | -## Getting Started |
50 | | - |
51 | | -### Quick Start Example |
52 | | - |
53 | | -Run the quickstart script to verify installation and download the dataset: |
54 | | - |
55 | | -```bash |
56 | | -# Using uv |
57 | | -uv run python quickstart.py |
58 | | - |
59 | | -# Or after activating environment |
60 | | -source .venv/bin/activate |
61 | | -python quickstart.py |
62 | | -``` |
63 | | - |
64 | | -The dataset (590 MB) will automatically download on first use. |
65 | | - |
66 | | -### Using Dataloaders |
67 | | - |
68 | | -**Basic Usage:** |
| 36 | +### Get the Dataset |
69 | 37 |
70 | 38 | ```python |
71 | 39 | from scripts.get_dataloaders import get_dataloaders |
72 | 40 |
73 | | -# Get dataloaders (dataset downloads automatically on first use) |
| 41 | +# Dataset downloads automatically (590 MB) |
74 | 42 | train_loader, test_loader = get_dataloaders( |
75 | 43 | config='large', |
76 | 44 | modalities=['rgb', 'hsi', 'lidar'], |
77 | 45 | batch_size=32 |
78 | 46 | ) |
79 | 47 |
80 | | -# Each batch contains: |
| 48 | +# Use in your training loop |
81 | 49 | for batch in train_loader: |
82 | | - rgb_data = batch['rgb'] # torch.Tensor [batch_size, 3, 128, 128] |
83 | | - hsi_data = batch['hsi'] # torch.Tensor [batch_size, 369, 12, 12] |
84 | | - lidar_data = batch['lidar'] # torch.Tensor [batch_size, 1, 12, 12] |
85 | | - labels = batch['species_idx'] # torch.Tensor [batch_size] |
| 50 | + rgb = batch['rgb'] # [batch_size, 3, 128, 128] |
| 51 | + hsi = batch['hsi'] # [batch_size, 369, 12, 12] |
| 52 | + lidar = batch['lidar'] # [batch_size, 1, 12, 12] |
| 53 | + labels = batch['species_idx'] # [batch_size] |
86 | 54 | ``` |
87 | 55 |
88 | | -**Training Scenarios:** |
| 56 | +Or run the quickstart example: |
| 57 | +```bash |
| 58 | +uv run python quickstart.py |
| 59 | +``` |
89 | 60 |
90 | | -```python |
91 | | -# Standard training on large dataset |
92 | | -train_loader, test_loader = get_dataloaders(config='large', test_ratio=0.2) |
| 61 | +## Coming Soon |
93 | 62 |
94 | | -# Maximum data training |
95 | | -train_loader, test_loader = get_dataloaders(config='combined', test_ratio=0.15) |
| 63 | +**Data Processing Pipeline**: Tools for processing raw NEON data products are being finalized and will be released for public use. This will enable users to: |
| 64 | +- Download NEON tiles for all three modalities |
| 65 | +- Crop individual tree crowns from shapefiles |
| 66 | +- Create custom datasets with their own crown annotations |
96 | 67 |
97 | | -# High-quality subset only |
98 | | -train_loader, test_loader = get_dataloaders(config='high_quality', test_ratio=0.2) |
| 68 | +**Classification Models**: Pre-trained models and training scripts for tree species classification will be added to the repository. |
99 | 69 |
100 | | -# Domain transfer (train on large, test on high_quality) |
101 | | -train_loader, test_loader = get_dataloaders( |
102 | | - train_config='large', |
103 | | - test_config='high_quality' |
104 | | -) |
105 | | -``` |
| 70 | +**DeepForest Integration**: Planned integration with [DeepForest](https://github.com/weecology/DeepForest) to enable: |
| 71 | +- Automatic crown detection from aerial imagery |
| 72 | +- Seamless multi-modal data extraction for detected crowns |
| 73 | +- Direct classification using pre-trained models from this repository |
106 | 74 |
107 | 75 | ## Dataset Details |
108 | 76 |
109 | | -**Top Species:** |
110 | | - |
111 | | -| Rank | Species | Count | Percentage | |
112 | | -|------|---------|-------|------------| |
113 | | -| 1 | Acer rubrum L. | 5,684 | 11.8% | |
114 | | -| 2 | Tsuga canadensis (L.) Carrière | 3,303 | 6.9% | |
115 | | -| 3 | Pseudotsuga menziesii (Mirb.) Franco var. menziesii | 2,978 | 6.2% | |
116 | | -| 4 | Pinus palustris Mill. | 2,207 | 4.6% | |
117 | | -| 5 | Quercus rubra L. | 2,086 | 4.3% | |
| 77 | +**Top 5 Species:** |
| 78 | +1. Acer rubrum L. (5,684 samples, 11.8%) |
| 79 | +2. Tsuga canadensis (L.) Carrière (3,303 samples, 6.9%) |
| 80 | +3. Pseudotsuga menziesii (Mirb.) Franco var. menziesii (2,978 samples, 6.2%) |
| 81 | +4. Pinus palustris Mill. (2,207 samples, 4.6%) |
| 82 | +5. Quercus rubra L. (2,086 samples, 4.3%) |
118 | 83 |
119 | | -**Geographic Distribution:** |
120 | | - |
121 | | -Data collected from 30 NEON sites across North America. Top 5 sites: |
| 84 | +**Top 5 Sites:** |
122 | 85 | - HARV: 7,162 samples (14.9%) |
123 | 86 | - MLBS: 5,424 samples (11.3%) |
124 | 87 | - GRSM: 4,822 samples (10.1%) |
125 | 88 | - DELA: 4,539 samples (9.5%) |
126 | 89 | - RMNP: 3,931 samples (8.2%) |
127 | 90 |
128 | 91 | **NEON Data Products:** |
129 | | -- RGB: DP3.30010.001 - High-resolution orthorectified imagery |
130 | | -- Hyperspectral: DP3.30006.002 - 426-band spectrometer reflectance |
131 | | -- LiDAR: DP3.30015.001 - Canopy Height Model |
132 | | - |
133 | | -**Data Structure:** |
134 | | - |
135 | | -The metadata CSV contains: |
136 | | -- `crown_id`: Unique identifier for each tree crown |
137 | | -- `species`: Species code |
138 | | -- `species_name`: Full species name |
139 | | -- `site`: NEON site code |
140 | | -- `year`: Data collection year |
141 | | -- `height`: Tree height in meters |
142 | | -- `stemDiameter`: Stem diameter in cm |
143 | | -- `canopyPosition`: Light exposure level |
144 | | -- `rgb_path`, `hsi_path`, `lidar_path`: Paths to data in HDF5 file |
| 92 | +- RGB: DP3.30010.001 (High-resolution orthorectified imagery) |
| 93 | +- Hyperspectral: DP3.30006.002 (426-band spectrometer reflectance) |
| 94 | +- LiDAR: DP3.30015.001 (Canopy Height Model) |
145 | 95 |
146 | | -## Documentation |
| 96 | +For complete dataset documentation, training guides, and advanced usage, see the [docs/](docs/) directory. |
147 | 97 |
148 | | -For detailed documentation, see the `docs/` directory: |
149 | | -- [Advanced Usage](docs/advanced_usage.md) - Custom filtering, Lightning DataModule, and advanced features |
150 | | -- [Training Guide](docs/training.md) - Model training examples and baseline results |
151 | | -- [Visualization Guide](docs/visualization.md) - Data visualization tools and examples |
152 | | -- [Processing Pipeline](docs/processing.md) - NEON data processing workflow |
| 98 | +## Citation |
153 | 99 |
154 | | -## Contributing |
| 100 | +If you use this dataset in your research, please cite: |
155 | 101 |
156 | | -1. Fork the repository |
157 | | -2. Create a feature branch |
158 | | -3. Submit a pull request |
159 | | - |
160 | | -See [CONTRIBUTING.md](CONTRIBUTING.md) for more details. |
| 102 | +```bibtex |
| 103 | +@dataset{neon_tree_classification_2024, |
| 104 | + title={NEON Multi-Modal Tree Species Classification Dataset}, |
| 105 | + author={[Author Names]}, |
| 106 | + year={2024}, |
| 107 | + publisher={GitHub}, |
| 108 | + url={https://github.com/Ritesh313/NeonTreeClassification} |
| 109 | +} |
| 110 | +``` |
161 | 111 |
162 | 112 | ## Acknowledgments |
163 | 113 |
164 | | -- National Ecological Observatory Network (NEON) |
165 | | -- Dataset statistics generated on 2025-08-28 |
| 114 | +National Ecological Observatory Network (NEON) |
| 115 | + |
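The percentages in the README's Top 5 Species and Top 5 Sites lists are each count's share of the 47,971 crowns in the `combined` configuration. A minimal stdlib sketch reproduces them from the listed counts. The names `TOTAL_CROWNS`, `top_species`, `top_sites`, and `shares` are illustrative only and are not part of the repository.

```python
# Hypothetical names; counts are copied from the README's "Top 5 Species"
# and "Top 5 Sites" lists.
TOTAL_CROWNS = 47_971  # number of samples in the `combined` configuration

top_species = {
    "Acer rubrum L.": 5_684,
    "Tsuga canadensis (L.) Carrière": 3_303,
    "Pseudotsuga menziesii (Mirb.) Franco var. menziesii": 2_978,
    "Pinus palustris Mill.": 2_207,
    "Quercus rubra L.": 2_086,
}

top_sites = {
    "HARV": 7_162,
    "MLBS": 5_424,
    "GRSM": 4_822,
    "DELA": 4_539,
    "RMNP": 3_931,
}


def shares(counts: dict[str, int], total: int = TOTAL_CROWNS) -> dict[str, float]:
    """Return each count as a percentage of `total`, rounded to one decimal place."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


if __name__ == "__main__":
    # Print every entry in the same "name: pct%" style the README uses.
    for label, counts in (("species", top_species), ("sites", top_sites)):
        for name, pct in shares(counts).items():
            print(f"{label:7s} {name}: {pct}%")
```

The one-decimal rounding matches the README, so each printed value should line up with the corresponding list entry.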