Deep Learning for Canopy Height: 10% Data Scale Reproducibility

🚀 Executive Summary

This project investigates the democratization of high-resolution climate monitoring. It reproduces the State-of-the-Art (SOTA) deep learning framework by Pauls et al. (2024) for global canopy height estimation, but imposes strict constraints: 10% of the original training data and consumer-grade hardware (Tesla T4 GPU).

Key Result: The optimized model (ResNet50 + Huber Loss) achieved a Mean Absolute Error (MAE) of 2.85 m. This is about 17% higher than the 2.43 m MAE of the original full-scale model trained on high-performance clusters, and well below the error of prior global benchmarks such as Lang et al. (2023) at 6.47 m.
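The 17% figure is the relative increase in MAE over the full-scale reference. A minimal sketch of that arithmetic, using the MAE values reported in the benchmark table of this README (the function name is illustrative and not part of the repository):

```python
def relative_mae_increase(mae_model: float, mae_reference: float) -> float:
    """Return the relative increase in MAE over a reference, in percent."""
    return (mae_model - mae_reference) / mae_reference * 100.0


# MAE values in meters: this project (10% data) vs. Pauls et al. (100% data)
gap = relative_mae_increase(2.85, 2.43)
print(f"{gap:.1f}% higher MAE than the full-scale reference")  # prints 17.3%
```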

📊 Performance & Benchmarks

Contrary to the assumption that "more data is always better," this project suggests that careful backbone selection and loss function tuning can bring the MAE to within about 17% of the full-scale reference while using 10% of the training data and a fraction of the compute.

| Model / Paper | Backbone | Data Scale | Loss Function | MAE (m) | Status |
| --- | --- | --- | --- | --- | --- |
| This Project (Best) | ResNet50 | 10% | Standard Huber | 2.85 | Reproduced |
| This Project (Baseline) | ResNet50 | 10% | Shift-Resilient Huber | 2.89 | Reproduced |
| Pauls et al. (2024) | ResNet50 | 100% | Shift-Resilient Huber | 2.43 | SOTA Reference |
| Lang et al. (2023) | CNN Ensemble | 100% | N/A | 6.47 | Prior Benchmark |
| Potapov et al. (2021) | Bagged Trees | 100% | N/A | 6.92 | Prior Benchmark |

Data Source: Bachelor Thesis, Table 6.2 & Table 3.2.

🛠️ Technical Implementation

This repository adapts the original codebase for restricted environments (Google Colab High-RAM).

Architecture: U-Net with a ResNet50 encoder (both ImageNet-pretrained and randomly initialized weights were tested).
Data Pipeline: Processed 11,749 samples of Sentinel-2 optical imagery fused with GEDI LiDAR sparse labels.
Optimization: AdamW optimizer with decoupled weight decay to handle regularization on small batches.
Experimentation:
- Comparisons of ResNet34 vs. ResNet50 vs. ResNet101 backbones under data scarcity.
- Ablation studies on Shift-Resilient Loss vs. Standard Huber Loss.
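Because the GEDI labels are sparse, a regression loss can only be evaluated on pixels that actually carry a height measurement. The sketch below illustrates a masked Huber loss in plain Python. It is an illustration under assumptions, not the repository's implementation: `delta=1.0`, the use of NaN to mark unlabeled pixels, and the function names are all chosen here for the example.

```python
import math


def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def masked_huber(pred, label, delta: float = 1.0) -> float:
    """Mean Huber loss over labeled pixels only (NaN marks a missing label)."""
    losses = [
        huber(p - y, delta)
        for row_p, row_y in zip(pred, label)
        for p, y in zip(row_p, row_y)
        if not math.isnan(y)
    ]
    # With no labeled pixel there is nothing to penalize
    return sum(losses) / len(losses) if losses else 0.0


nan = float("nan")
pred = [[10.0, 12.0], [3.0, 25.0]]   # predicted heights in meters (toy values)
label = [[nan, 12.5], [nan, 20.0]]   # sparse labels: two of four pixels labeled
loss = masked_huber(pred, label)
print(loss)  # 2.3125
```

The small residual (0.5 m) falls in the quadratic regime and the large one (5 m) in the linear regime, which is what makes the Huber loss less sensitive to outlier labels than a squared error.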

Key Findings

Architecture Efficiency: ResNet50 provided the best trade-off among the tested backbones. The larger ResNet101 (52M parameters) suffered from overfitting and convergence issues due to the small batch size constraint (batch size 10 instead of 32).
Loss Function Dynamics: Pauls et al. (2024) proposed the "Shift-Resilient Loss" to counter geolocation errors in the ground-truth labels. In my experiments at 10% data scale, however, the Standard Huber Loss slightly outperformed the more complex shifted loss (2.85 m vs. 2.89 m MAE).
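To make the comparison more concrete, the sketch below contrasts a plain masked Huber loss with a shift-tolerant variant that evaluates the loss under small label offsets and keeps the minimum. This is a deliberately simplified illustration of the general idea of tolerating geolocation errors, not the formulation from Pauls et al. (2024); `shift_tolerant_huber`, `radius`, and `delta` are hypothetical names and values chosen for the example.

```python
import math

NAN = float("nan")


def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def masked_huber_shifted(pred, label, dy: int, dx: int, delta: float = 1.0) -> float:
    """Mean Huber loss with every label compared to the prediction at offset (dy, dx).

    Unlabeled pixels (NaN) and labels shifted outside the patch are skipped.
    """
    h, w = len(pred), len(pred[0])
    losses = []
    for i in range(h):
        for j in range(w):
            y = label[i][j]
            ti, tj = i + dy, j + dx
            if math.isnan(y) or not (0 <= ti < h and 0 <= tj < w):
                continue
            losses.append(huber(pred[ti][tj] - y, delta))
    return sum(losses) / len(losses) if losses else math.inf


def shift_tolerant_huber(pred, label, radius: int = 1, delta: float = 1.0) -> float:
    """Minimum masked Huber loss over all integer shifts within `radius` pixels."""
    shifts = range(-radius, radius + 1)
    return min(
        masked_huber_shifted(pred, label, dy, dx, delta)
        for dy in shifts
        for dx in shifts
    )


# Toy patch: the model places a 20 m tree one pixel to the right of its label
pred = [[0.0, 0.0, 0.0], [0.0, 0.0, 20.0], [0.0, 0.0, 0.0]]
label = [[NAN, NAN, NAN], [NAN, 20.0, NAN], [NAN, NAN, NAN]]

plain = masked_huber_shifted(pred, label, 0, 0)   # no shift allowed
tolerant = shift_tolerant_huber(pred, label)      # best shift within 1 pixel
print(plain, tolerant)  # 19.5 0.0
```

A variant of this kind forgives a correct prediction that is displaced by a pixel, at the cost of a search over shifts for every patch.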
Tall Canopy Challenge: The model struggles with trees taller than 30 m (MAE 20.84 m) due to severe class imbalance in the training set: only 2.8% of samples were tall forests.
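A breakdown like this can be obtained by grouping the absolute errors by ground-truth height. The following sketch computes the MAE and the sample share per height bin; the bin edges and the toy values are assumptions made for illustration and are not results from the thesis.

```python
def mae_by_height_bin(pred, label, edges):
    """Return {(lo, hi): (mae, share_of_samples)} for each non-empty bin [lo, hi)."""
    n = len(label)
    stats = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [abs(p - y) for p, y in zip(pred, label) if lo <= y < hi]
        if errs:
            stats[(lo, hi)] = (sum(errs) / len(errs), len(errs) / n)
    return stats


# Toy values for illustration only (heights in meters)
label = [2.0, 8.0, 15.0, 35.0]
pred = [3.0, 7.0, 16.0, 20.0]

stats = mae_by_height_bin(pred, label, edges=[0, 10, 30, float("inf")])
for (lo, hi), (mae, share) in stats.items():
    print(f"[{lo}, {hi}) m: MAE {mae:.2f} m, {share:.0%} of samples")
```

Reporting the sample share next to each bin's MAE makes it easy to see when a large error coincides with a rarely seen height range.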

💻 Usage (Reproducibility)

The entire training pipeline has been ported to a single Google Colab notebook for ease of access.

Open the Notebook: Click here to run on Colab
Data Setup: The notebook connects to Google Drive to mount pre-processed .npz files (Sentinel-2 + GEDI).

Training:

# Example command structure
python main.py --config config.yaml --model resnet50 --loss huber

🖼️ Visual Results

Qualitative comparison of model outputs (Height predictions in meters). Model comparison: ResNet50 + Huber (left) vs. ResNet50 + Weighted (right). Each model takes the same two satellite images as input, and outputs their corresponding canopy height estimations. Higher predictions are indicated by brighter colors.

📜 Citation & Credits

This work constitutes the Bachelor's Thesis: "Deep Learning for Canopy Height Estimation: Experimental Reproducibility at 10% Data Scale" (2025), University of Copenhagen.

The core architecture is forked from and builds on Estimating Canopy Height at Scale by Pauls et al. (2024). Please cite the original work if you use the core architecture:

@inproceedings{pauls2024estimating,
title={Estimating Canopy Height at Scale},
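author={Jan Pauls and Max Zimmer and Una M. Kelly and Martin Schwartz and Sassan Saatchi and Philippe Ciais and Sebastian Pokutta and Martin Brandt and Fabian Gieseke},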
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}

Estimating Canopy Height at Scale [ICML2024]

Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke

[Paper] [Google Earth Engine viewer] [BibTeX]

We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.

A comparison between our map, two other existing global height maps (Lang et al., Potapov et al.), and a regional map for France shows that the visual quality has improved substantially. Our map closely matches the quality of the regional map, albeit with remaining quality differences in some regions (e.g. column 8).

Interactive Google Earth Engine viewer

We uploaded our produced canopy height map to Google Earth Engine and created a GEE app that allows users to visualize our map globally and compare it to other existing products. If you want to build your own app or download/use our map in another way, you can access the map under the following asset_id:

var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020')

// To display on the map, create the mosaic:
var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020').mosaic()

Acknowledgements

This paper is part of the project AI4Forest, which is funded by the German Aerospace Agency (DLR), the German Federal Ministry for Education and Research (BMBF) and the French National Research Agency (ANR). Further, calculations (or parts of them) for this publication were performed on the HPC cluster PALMA II of the University of Münster, subsidised by the DFG (INST 211/667-1).

Citing the paper

If you use our map in your research, please cite using the following BibTeX:

@inproceedings{pauls2024estimating,
title={Estimating Canopy Height at Scale},
author={Jan Pauls and Max Zimmer and Una M. Kelly and Martin Schwartz and Sassan Saatchi and Philippe Ciais and Sebastian Pokutta and Martin Brandt and Fabian Gieseke},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=ZzCY0fRver}
}
