GatorSense
diff --git a/.gitignore b/.gitignore
Lines changed: 6 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 185 additions & 65 deletions
diff --git a/SLURM/crowns.sh b/SLURM/crowns.sh
Lines changed: 0 additions & 23 deletions
diff --git a/SLURM/dask.sh b/SLURM/dask.sh
Lines changed: 0 additions & 23 deletions
diff --git a/compare_modalities.py b/examples/compare_modalities.py
compare_modalities.py renamed to examples/compare_modalities.py
Lines changed: 2 additions & 2 deletions
@@ -1,3 +1,6 @@
+# Ignore SLURM batch scripts and shell scripts
+SLURM/
+*.sh
 
 # custom
 .env
@@ -22,6 +25,9 @@ dist/
 .venv/
 uv.lock
 
+# IDE/Editor
+.vscode/
+
 # Compiled source #
 ###################
 *.com
 
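Note on the `.gitignore` hunk above: the sketch below approximates how the three new patterns (`SLURM/`, `*.sh`, `.vscode/`) classify a few paths. It is a simplified model and not Git's actual matcher: it only handles slash-free patterns with an optional trailing `/`, and ignores negation and anchoring. `scripts/run.sh` is a hypothetical path used only for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The three patterns added to .gitignore in this change.
NEW_PATTERNS = ["SLURM/", "*.sh", ".vscode/"]


def is_ignored(path: str, patterns=NEW_PATTERNS) -> bool:
    """Approximate gitignore matching for slash-free patterns on a file path."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory-only pattern: ignore files under a directory with that name.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Plain pattern: matches a file or directory name at any depth.
            return True
    return False


# "scripts/run.sh" is a hypothetical path, not a file known to exist in the repo.
for candidate in ["SLURM/crowns.sh", "scripts/run.sh",
                  ".vscode/settings.json", "examples/train.py"]:
    print(candidate, is_ignored(candidate))
```

Under this model the `*.sh` rule applies to shell scripts anywhere in the tree, not only under `SLURM/`. Ignore rules only affect untracked files, which is consistent with the two SLURM scripts being deleted in the same change.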
@@ -1,110 +1,229 @@
-# NEON Tree Classification
+# NEON Multi-Modal Tree Species Dataset
 
-A modular Python package for processing NEON airborne data and multi-modal tree species classification using RGB, hyperspectral, and LiDAR data.
+Hyperspectral, RGB, and LiDAR airborne data for **5,518 individual trees** of **96 species** across **30 NEON sites** in North America.
 
-## Features
+## Dataset Overview
 
-### Data Processing
-- **NEON data download**: Automated download of RGB, hyperspectral, and LiDAR tiles
-- **Shapefile processing**: Coordinate system transformations and validation for tree crowns
-- **Multi-modal tile processing**: Convert and process HSI (H5 → GeoTIFF), RGB, and LiDAR data
-- **Crown-tile intersection**: Match tree crown annotations with corresponding image tiles
+- **5,518** individual tree crowns
+- **96** unique species  
+- **30** NEON sites across North America
+- **2018-2020** (3 years of data)
+- **3 modalities:** RGB, Hyperspectral (426 bands), LiDAR CHM
+
+## Quick Start
+
+```python
+# Load and explore the dataset
+from neon_tree_classification.core.dataset import NeonCrownDataset
+
+# Simple loading
+dataset = NeonCrownDataset.load()
+dataset.summary()  # Print dataset overview
+
+# Filter for specific species or sites  
+conifers = dataset.filter(species=['PSMEM', 'TSHE'])
+site_subset = conifers.filter(sites=['ABBY', 'HARV'])
 
-### Machine Learning
-- **Multi-modal models**: Separate architectures for RGB, hyperspectral (426 bands), and LiDAR
-- **Modular training**: PyTorch Lightning modules with CometML/TensorBoard logging
-- **Flexible data pipeline**: Clean tensor-only batches with configurable splits
-- **Modern packaging**: Uses `pyproject.toml` and `uv` for dependency management
+# Get dataset statistics
+stats = dataset.get_dataset_stats()
+```
+
+## Visualization Examples
+
+The package includes visualization functions for all three modalities:
+
+| RGB Image | HSI Pseudo RGB | HSI PCA Decomposition |
+|-----------|----------------|----------------------|
+| ![RGB](sample_plots/sample_rgb.png) | ![HSI](sample_plots/sample_hsi.png) | ![HSI PCA](sample_plots/sample_hsi_pca.png) |
+
+| HSI Spectral Signatures | LiDAR Canopy Height Model |
+|-------------------------|---------------------------|
+| ![Spectra](sample_plots/sample_spectra.png) | ![LiDAR](sample_plots/sample_lidar.png) |
+
+```python
+# Visualization functions for tree crown data
+from neon_tree_classification.core.visualization import (
+    plot_rgb, plot_hsi, plot_hsi_pca, plot_hsi_spectra, plot_lidar
+)
+
+# RGB visualization
+plot_rgb('path/to/crown_rgb.tif')        # True color RGB image
+
+# Hyperspectral visualization options  
+plot_hsi('path/to/crown_hsi.tif')        # Pseudo RGB (bands ~660nm, ~550nm, ~450nm)
+plot_hsi_pca('path/to/crown_hsi.tif')    # PCA decomposition to 3 components
+plot_hsi_spectra('path/to/crown_hsi.tif') # Spectral signatures of pixels
+
+# LiDAR visualization
+plot_lidar('path/to/crown_chm.tif')      # Canopy height model with colorbar
+```
+
+### Quick Visualization with Dataset
+
+```python
+# Easy visualization with dataset integration
+from neon_tree_classification.core.dataset import NeonCrownDataset
+from neon_tree_classification.core.visualization import plot_rgb, plot_hsi, plot_lidar
+
+dataset = NeonCrownDataset.load()
+sample = dataset.data.iloc[0]  # Get first sample
+
+# Visualize all modalities for this tree crown
+plot_rgb(sample['rgb_path'])
+plot_hsi(sample['hsi_path'])  
+plot_lidar(sample['lidar_path'])
+```
+
+## Top Species
+
+The dataset includes 96 tree species. The ten most common are:
+
+| Rank | Species | Count | Percentage |
+|------|---------|-------|------------|
+| 1 | Picea mariana (Mill.) Britton, Sterns & Poggenb. | 678 | 12.3% |
+| 2 | Acer rubrum L. | 360 | 6.5% |
+| 3 | Pseudotsuga menziesii (Mirb.) Franco var. menziesii | 300 | 5.4% |
+| 4 | Populus tremuloides Michx. | 271 | 4.9% |
+| 5 | Quercus rubra L. | 243 | 4.4% |
+| 6 | Pinus palustris Mill. | 233 | 4.2% |
+| 7 | Tsuga canadensis (L.) Carrière | 200 | 3.6% |
+| 8 | Pinus contorta Douglas ex Loudon var. latifolia Engelm. ex S. Watson | 189 | 3.4% |
+| 9 | Abies lasiocarpa (Hook.) Nutt. var. lasiocarpa | 172 | 3.1% |
+| 10 | Betula neoalaskana Sarg. | 162 | 2.9% |
+
+## Geographic Distribution
+
+Data were collected from **30 NEON sites** across North America. The ten sites with the most samples are:
+
+**1.** DEJU: 577 samples (10.5%)  
+**2.** BART: 533 samples (9.7%)  
+**3.** BONA: 504 samples (9.1%)  
+**4.** HARV: 490 samples (8.9%)  
+**5.** MLBS: 368 samples (6.7%)  
+**6.** RMNP: 329 samples (6.0%)  
+**7.** DELA: 299 samples (5.4%)  
+**8.** NIWO: 276 samples (5.0%)  
+**9.** UNDE: 262 samples (4.7%)  
+**10.** TALL: 246 samples (4.5%)  
 
 ## Installation
 
+### Basic Installation
 ```bash
+# Clone the repository
 git clone https://github.com/Ritesh313/NeonTreeClassification.git
 cd NeonTreeClassification
 
-# Install with uv (recommended)
-pip install uv
-uv sync
-
-# Or with pip
-pip install -e .
+# Install core dependencies
+pip install .
 ```
 
-## Quick Start
-
-### Data Processing
+### Optional Dependencies
 ```bash
-# Process NEON shapefiles
-python scripts/test_shapefile_processor.py
+# For development (tests, formatting, notebooks)
+pip install ".[dev]"
+
+# For data processing (geospatial tools)
+pip install ".[processing]"
 
-# Process tiles and match with crowns
-python scripts/process_tiles_to_crowns.py
+# For experiment logging
+pip install ".[logging]"
+
+# Install all optional dependencies
+pip install ".[dev,processing,logging]"
 ```
 
-### Model Training
-```bash
-# Train RGB model
-python train.py --modality rgb --csv_path data/crowns.csv --data_dir data/
+## Repository Structure
 
-# Train HSI model with CometML logging
-python train.py --modality hsi --logger comet --project_name my-project
+```
+NeonTreeClassification/
+├── neon_tree_classification/          # Main package
+│   ├── core/                         # Core functionality (dataset, visualization)
+│   │   ├── dataset.py               # Enhanced dataset with filtering & stats
+│   │   ├── datamodule.py            # PyTorch Lightning data module  
+│   │   └── visualization.py         # All visualization functions
+│   └── models/                      # ML architectures & Lightning modules
+├── examples/                         # Training and comparison examples
+│   ├── train.py                     # Main training script
+│   └── compare_modalities.py        # Multi-modal comparison
+├── notebooks/                        # Interactive exploration
+│   └── visualization.ipynb          # Visualization demo notebook
+├── processing/                       # Advanced data processing tools
+├── scripts/                          # Automation utilities
+├── sample_plots/                     # Generated sample images
+└── training_data_clean.csv          # Main dataset file
+```
+
+## Interactive Notebook
+
+Explore the dataset and visualization functions interactively:
 
-# Compare all modalities
-python compare_modalities.py --csv_path data/crowns.csv --data_dir data/
+```bash
+# Start Jupyter and open the visualization notebook
+jupyter notebook notebooks/visualization.ipynb
 ```
 
-### Using in code
-```python
-# Data processing
-from neon_tree_classification.data.shapefile_processor import ShapefileProcessor
+The notebook includes examples of:
+- Loading and filtering the dataset
+- RGB, HSI, and LiDAR visualizations  
+- Interactive exploration of tree crown data
 
-processor = ShapefileProcessor()
-sites_df, summary = processor.process_shapefiles(destination_dir)
+## Advanced Usage
+
+### Multi-modal Training
 
-# Model training
-from neon_tree_classification import NeonCrownDataModule, RGBClassifier
+```python
+# Train models on different modalities
+from neon_tree_classification.core.datamodule import NeonCrownDataModule
+from neon_tree_classification.models.lightning_modules import RGBClassifier
 
+# Setup data
 datamodule = NeonCrownDataModule(
-    csv_path="data/crowns.csv",
-    base_data_dir="data/",
-    modalities=["rgb"],
+    csv_path="training_data_clean.csv",
+    modalities=["rgb", "hsi", "lidar"],
     batch_size=32
 )
 
-classifier = RGBClassifier(model_type="resnet", num_classes=10)
+# Train RGB model
+classifier = RGBClassifier(num_classes=96)
 
 import lightning as L
 trainer = L.Trainer(max_epochs=50)
 trainer.fit(classifier, datamodule)
 ```
 
-## Architecture
+### Data Processing
 
-```
-neon_tree_classification/
-├── data/
-│   ├── dataset.py            # Multi-modal dataset
-│   ├── datamodule.py         # Lightning DataModule
-│   └── shapefile_processor.py # NEON shapefile processing
-├── models/
-│   ├── rgb_models.py         # RGB architectures
-│   ├── hsi_models.py         # Hyperspectral architectures
-│   ├── lidar_models.py       # LiDAR architectures
-│   └── lightning_modules.py  # Training modules
-└── processing/               # NEON data processing utilities
-
-scripts/
-├── download_neon_all_modalities.py  # Download NEON data
-├── process_tiles_to_crowns.py       # Tile processing pipeline
-└── test_shapefile_processor.py      # Test shapefile processing
+The package includes tools for processing NEON data, but most users will work with the pre-processed dataset.
+
+```python
+# For advanced users: process raw NEON data
+from processing.shapefile_processor import ShapefileProcessor
+processor = ShapefileProcessor()
+sites_df, summary = processor.process_shapefiles(destination_dir)
 ```
 
-## NEON Data Products
+## Dataset Details
 
+### NEON Data Products
 - **RGB**: `DP3.30010.001` - High-resolution orthorectified imagery
 - **Hyperspectral**: `DP3.30006.002` - 426-band spectrometer reflectance  
 - **LiDAR**: `DP3.30015.001` - Canopy Height Model
 
+### Data Structure
+```
+training_data_clean.csv - Main dataset file
+├── crown_id          - Unique identifier for each tree crown
+├── site              - NEON site code
+├── year              - Data collection year  
+├── species           - Species code
+├── species_name      - Full species name
+├── height            - Tree height (meters)
+├── rgb_path          - Path to RGB image
+├── hsi_path          - Path to hyperspectral image
+├── lidar_path        - Path to LiDAR CHM
+└── [other metadata]  - Additional tree measurements
+```
+
 ## Contributing
 
 1. Fork the repository
@@ -118,4 +237,5 @@ Ritesh Chowdhry
 ## Acknowledgments
 
 - National Ecological Observatory Network (NEON)
+- These dataset details were generated on 2025-08-24
 
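Note on the README hunk above: the "Top Species" and "Geographic Distribution" tables report counts and percentages per species and per site. A minimal sketch of how such tables could be derived with pandas is shown below. It assumes only the `site` and `species` column names from the README's Data Structure listing; the toy rows and the helper `count_table` are made up for illustration and are not taken from `training_data_clean.csv` or from the package.

```python
import pandas as pd

# Toy rows that reuse column names from the README's Data Structure listing.
# The values are illustrative only, not real dataset records.
toy = pd.DataFrame(
    {
        "crown_id": [1, 2, 3, 4, 5],
        "site": ["DEJU", "DEJU", "BART", "HARV", "DEJU"],
        "species": ["PIMA", "PIMA", "ACRU", "ACRU", "PIMA"],
    }
)


def count_table(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: counts and percentages per value, largest first."""
    table = df[column].value_counts().rename("count").to_frame()
    table["percentage"] = (100 * table["count"] / len(df)).round(1)
    return table


site_table = count_table(toy, "site")
species_table = count_table(toy, "species")
print(site_table)
print(species_table)
```

With the real file, the same helper could be applied to `pd.read_csv("training_data_clean.csv")`, assuming the columns are named as the README lists them.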
@@ -13,8 +13,8 @@
 import json
 from pathlib import Path
 
-from neon_tree_classification import (
-    NeonCrownDataModule,
+from neon_tree_classification.core.datamodule import NeonCrownDataModule
+from neon_tree_classification.models.lightning_modules import (
     RGBClassifier,
     HSIClassifier,
     LiDARClassifier,