Commit 40d5d6d

Copilot and Ritesh313 committed
Restructure README to focus on project vision and clarify available features vs future plans
Co-authored-by: Ritesh313 <36135489+Ritesh313@users.noreply.github.com>
1 parent e4827e5 commit 40d5d6d

1 file changed

Lines changed: 64 additions & 114 deletions

README.md

@@ -1,165 +1,115 @@
-# NEON Multi-Modal Tree Species Classification Dataset
+# NEON Multi-Modal Tree Species Classification

-A comprehensive dataset of 167 tree species with 47,971 individual tree crowns from 30 NEON sites across North America. Each sample includes RGB imagery, 369-band hyperspectral data, and LiDAR canopy height models.
+A comprehensive toolkit for multi-modal tree species classification using NEON ecological data. This project combines RGB imagery, hyperspectral data, and LiDAR to enable accurate tree species identification across diverse North American ecosystems.

-## Dataset Overview
+## Project Vision

-**Dataset Statistics:**
-- 47,971 individual tree crowns
-- 167 unique species
-- 30 NEON sites across North America
-- 10 years of data (2014-2023)
-- 3 modalities: RGB (3 bands), Hyperspectral (369 bands), LiDAR CHM (1 band)
-- Ecological metadata: Height (95.4% available), stem diameter (99.4% available), canopy position (81.4% available)
+This repository aims to provide an end-to-end solution for tree species classification:

-**Dataset Configurations:**
+- **Dataset**: Ready-to-use multi-modal tree crown dataset with 167 species (currently available)
+- **Data Processing**: Tools for downloading and processing raw NEON data products (in development)
+- **Classification Models**: Pre-trained models and training pipelines (in development)
+- **DeepForest Integration**: Automated crown detection and classification workflow (planned)

-The dataset comes with 3 pre-configured subsets for different use cases:
+## What's Available Now

-| Configuration | Samples | Species | Description |
-|---------------|---------|---------|-------------|
-| `combined` | 47,971 | 167 | Complete dataset with all available samples |
-| `large` | ~42,000 | ~162 | Main training set |
-| `high_quality` | ~5,500 | ~96 | Curated subset with highest data quality |
+### Multi-Modal Dataset

-**Data Format:**
-- HDF5 storage for efficient compressed format and fast loading
-- CSV metadata files with crown IDs, species labels, site information, and ecological measurements
+A curated dataset of 47,971 individual tree crowns from 30 NEON sites, ready for immediate use:

-## Installation
+- **167 tree species** from diverse North American ecosystems
+- **3 modalities**: RGB (3 bands), Hyperspectral (369 bands), LiDAR CHM (1 band)
+- **10 years of data** (2014-2023) with ecological metadata
+- **3 configurations**: `combined` (47,971 samples), `large` (~42,000 samples), `high_quality` (~5,500 samples)
+- **HDF5 format**: Efficient storage with automatic download (590 MB)

-**Requirements:**
-- Python 3.9+ (recommended: Python 3.11)
-- CUDA-capable GPU (optional, recommended for training)
+## Quick Start

-**Installation Steps:**
+### Installation

 ```bash
-# Clone the repository
 git clone https://github.com/Ritesh313/NeonTreeClassification.git
 cd NeonTreeClassification
-
-# Install with uv (recommended)
-uv sync
-
-# Or install with pip
-pip install -e .
+uv sync # or: pip install -e .
 ```

-## Getting Started
-
-### Quick Start Example
-
-Run the quickstart script to verify installation and download the dataset:
-
-```bash
-# Using uv
-uv run python quickstart.py
-
-# Or after activating environment
-source .venv/bin/activate
-python quickstart.py
-```
-
-The dataset (590 MB) will automatically download on first use.
-
-### Using Dataloaders
-
-**Basic Usage:**
+### Get the Dataset

 ```python
 from scripts.get_dataloaders import get_dataloaders

-# Get dataloaders (dataset downloads automatically on first use)
+# Dataset downloads automatically (590 MB)
 train_loader, test_loader = get_dataloaders(
     config='large',
     modalities=['rgb', 'hsi', 'lidar'],
     batch_size=32
 )

-# Each batch contains:
+# Use in your training loop
 for batch in train_loader:
-    rgb_data = batch['rgb']  # torch.Tensor [batch_size, 3, 128, 128]
-    hsi_data = batch['hsi']  # torch.Tensor [batch_size, 369, 12, 12]
-    lidar_data = batch['lidar']  # torch.Tensor [batch_size, 1, 12, 12]
-    labels = batch['species_idx']  # torch.Tensor [batch_size]
+    rgb = batch['rgb']  # [batch_size, 3, 128, 128]
+    hsi = batch['hsi']  # [batch_size, 369, 12, 12]
+    lidar = batch['lidar']  # [batch_size, 1, 12, 12]
+    labels = batch['species_idx']  # [batch_size]
 ```
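
The batch layout documented in the snippet above can be checked before training starts. The following is a minimal sketch, not code from the repository: `check_batch` and `EXPECTED_SHAPES` are hypothetical names, and the only assumption about the loader is that each batch is a dict whose values expose a `.shape` attribute, as torch tensors do. A stand-in batch is used so the sketch runs without torch or the dataset download.

```python
from types import SimpleNamespace

# Expected per-sample shapes, copied from the comments in the README snippet.
EXPECTED_SHAPES = {
    "rgb": (3, 128, 128),
    "hsi": (369, 12, 12),
    "lidar": (1, 12, 12),
}


def check_batch(batch):
    """Validate every modality's shape and return the batch size.

    Hypothetical helper: it only assumes each value in `batch` has a
    `.shape` attribute that can be converted to a tuple.
    """
    # Derive the batch size from the labels, since the final batch of an
    # epoch may be smaller than the configured batch_size.
    batch_size = tuple(batch["species_idx"].shape)[0]
    for name, sample_shape in EXPECTED_SHAPES.items():
        actual = tuple(batch[name].shape)
        expected = (batch_size, *sample_shape)
        if actual != expected:
            raise ValueError(f"{name}: expected {expected}, got {actual}")
    return batch_size


# Stand-in batch with the documented shapes for batch_size=32.
fake_batch = {
    "rgb": SimpleNamespace(shape=(32, 3, 128, 128)),
    "hsi": SimpleNamespace(shape=(32, 369, 12, 12)),
    "lidar": SimpleNamespace(shape=(32, 1, 12, 12)),
    "species_idx": SimpleNamespace(shape=(32,)),
}
print(check_batch(fake_batch))
```

With a real loader, calling `check_batch(next(iter(train_loader)))` once would catch a modality that was dropped or loaded with an unexpected band count, assuming the batches follow the documented layout.
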

-**Training Scenarios:**
+Or run the quickstart example:
+```bash
+uv run python quickstart.py
+```

-```python
-# Standard training on large dataset
-train_loader, test_loader = get_dataloaders(config='large', test_ratio=0.2)
+## Coming Soon

-# Maximum data training
-train_loader, test_loader = get_dataloaders(config='combined', test_ratio=0.15)
+**Data Processing Pipeline**: Tools for processing raw NEON data products are being finalized and will be released for public use. This will enable users to:
+- Download NEON tiles for all three modalities
+- Crop individual tree crowns from shapefiles
+- Create custom datasets with their own crown annotations

-# High-quality subset only
-train_loader, test_loader = get_dataloaders(config='high_quality', test_ratio=0.2)
+**Classification Models**: Pre-trained models and training scripts for tree species classification will be added to the repository.

-# Domain transfer (train on large, test on high_quality)
-train_loader, test_loader = get_dataloaders(
-    train_config='large',
-    test_config='high_quality'
-)
-```
+**DeepForest Integration**: Planned integration with [DeepForest](https://github.com/weecology/DeepForest) to enable:
+- Automatic crown detection from aerial imagery
+- Seamless multi-modal data extraction for detected crowns
+- Direct classification using pre-trained models from this repository

 ## Dataset Details

-**Top Species:**
-
-| Rank | Species | Count | Percentage |
-|------|---------|-------|------------|
-| 1 | Acer rubrum L. | 5,684 | 11.8% |
-| 2 | Tsuga canadensis (L.) Carrière | 3,303 | 6.9% |
-| 3 | Pseudotsuga menziesii (Mirb.) Franco var. menziesii | 2,978 | 6.2% |
-| 4 | Pinus palustris Mill. | 2,207 | 4.6% |
-| 5 | Quercus rubra L. | 2,086 | 4.3% |
+**Top 5 Species:**
+1. Acer rubrum L. (5,684 samples, 11.8%)
+2. Tsuga canadensis (L.) Carrière (3,303 samples, 6.9%)
+3. Pseudotsuga menziesii (Mirb.) Franco var. menziesii (2,978 samples, 6.2%)
+4. Pinus palustris Mill. (2,207 samples, 4.6%)
+5. Quercus rubra L. (2,086 samples, 4.3%)

-**Geographic Distribution:**
-
-Data collected from 30 NEON sites across North America. Top 5 sites:
+**Top 5 Sites:**
 - HARV: 7,162 samples (14.9%)
 - MLBS: 5,424 samples (11.3%)
 - GRSM: 4,822 samples (10.1%)
 - DELA: 4,539 samples (9.5%)
 - RMNP: 3,931 samples (8.2%)
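
The percentages in the two lists above appear to be shares of the 47,971 crowns in the `combined` configuration. The sketch below recomputes them under that assumption; the `share` helper and the dictionary names are illustrative and do not come from the repository.

```python
TOTAL_CROWNS = 47_971  # assumed denominator: size of the `combined` configuration

# Sample counts copied from the README lists.
top_species = {
    "Acer rubrum L.": 5_684,
    "Tsuga canadensis (L.) Carrière": 3_303,
    "Pseudotsuga menziesii (Mirb.) Franco var. menziesii": 2_978,
    "Pinus palustris Mill.": 2_207,
    "Quercus rubra L.": 2_086,
}
top_sites = {
    "HARV": 7_162,
    "MLBS": 5_424,
    "GRSM": 4_822,
    "DELA": 4_539,
    "RMNP": 3_931,
}


def share(count, total=TOTAL_CROWNS):
    """Return `count` as a percentage of `total`, rounded to one decimal."""
    return round(100 * count / total, 1)


for name, count in {**top_species, **top_sites}.items():
    print(f"{name}: {share(count)}%")

# Cumulative coverage shows how concentrated the dataset is.
print(f"Top 5 species: {share(sum(top_species.values()))}% of all crowns")
print(f"Top 5 sites: {share(sum(top_sites.values()))}% of all crowns")
```

The cumulative figures are worth a look before choosing a loss or a sampling strategy, because a long-tailed label distribution across 167 species usually calls for class weighting or stratified splits.
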

 **NEON Data Products:**
-- RGB: DP3.30010.001 - High-resolution orthorectified imagery
-- Hyperspectral: DP3.30006.002 - 426-band spectrometer reflectance
-- LiDAR: DP3.30015.001 - Canopy Height Model
-
-**Data Structure:**
-
-The metadata CSV contains:
-- `crown_id`: Unique identifier for each tree crown
-- `species`: Species code
-- `species_name`: Full species name
-- `site`: NEON site code
-- `year`: Data collection year
-- `height`: Tree height in meters
-- `stemDiameter`: Stem diameter in cm
-- `canopyPosition`: Light exposure level
-- `rgb_path`, `hsi_path`, `lidar_path`: Paths to data in HDF5 file
+- RGB: DP3.30010.001 (High-resolution orthorectified imagery)
+- Hyperspectral: DP3.30006.002 (426-band spectrometer reflectance)
+- LiDAR: DP3.30015.001 (Canopy Height Model)

-## Documentation
+For complete dataset documentation, training guides, and advanced usage, see the [docs/](docs/) directory.

-For detailed documentation, see the `docs/` directory:
-- [Advanced Usage](docs/advanced_usage.md) - Custom filtering, Lightning DataModule, and advanced features
-- [Training Guide](docs/training.md) - Model training examples and baseline results
-- [Visualization Guide](docs/visualization.md) - Data visualization tools and examples
-- [Processing Pipeline](docs/processing.md) - NEON data processing workflow
+## Citation

-## Contributing
+If you use this dataset in your research, please cite:

-1. Fork the repository
-2. Create a feature branch
-3. Submit a pull request
-
-See [CONTRIBUTING.md](CONTRIBUTING.md) for more details.
+```bibtex
+@dataset{neon_tree_classification_2024,
+title={NEON Multi-Modal Tree Species Classification Dataset},
+author={[Author Names]},
+year={2024},
+publisher={GitHub},
+url={https://github.com/Ritesh313/NeonTreeClassification}
+}
+```

 ## Acknowledgments

-- National Ecological Observatory Network (NEON)
-- Dataset statistics generated on 2025-08-28
+National Ecological Observatory Network (NEON)
+