This project investigates the democratization of high-resolution climate monitoring. It reproduces the State-of-the-Art (SOTA) deep learning framework by Pauls et al. (2024) for global canopy height estimation, but imposes strict constraints: 10% of the original training data and consumer-grade hardware (Tesla T4 GPU).
Key Result: The optimized model (ResNet50 + Huber Loss) achieved a Mean Absolute Error (MAE) of 2.85 m. That is about 17% higher than the 2.43 m MAE of the original full-scale model trained on high-performance clusters, and well below the MAE of prior global benchmarks such as Lang et al. (2023) at 6.47 m.
Contrary to the assumption that "more data is always better," this project demonstrates that efficient backbone selection and loss function tuning can yield production-grade results with a fraction of the compute.
| Model / Paper | Backbone | Data Scale | Loss Function | MAE (m) | Status |
|---|---|---|---|---|---|
| This Project (Best) | ResNet50 | 10% | Standard Huber | 2.85 | Reproduced |
| This Project (Baseline) | ResNet50 | 10% | Shift-Resilient Huber | 2.89 | Reproduced |
| Pauls et al. (2024) | ResNet50 | 100% | Shift-Resilient Huber | 2.43 | SOTA Reference |
| Lang et al. (2023) | CNN Ensemble | 100% | N/A | 6.47 | Prior Benchmark |
| Potapov et al. (2021) | Bagged Trees | 100% | N/A | 6.92 | Prior Benchmark |
Data Source: Bachelor Thesis, Table 6.2 & Table 3.2.
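
The relative gaps implied by the table can be checked with a few lines of arithmetic. The sketch below uses only the MAE values listed in the table; the dictionary keys and the helper name `relative_change` are illustrative and not part of the repository.

```python
# MAE values (metres) copied from the comparison table.
mae = {
    "this_project_huber": 2.85,
    "this_project_shift_resilient": 2.89,
    "pauls_2024": 2.43,
    "lang_2023": 6.47,
    "potapov_2021": 6.92,
}


def relative_change(value: float, reference: float) -> float:
    """Return the change of `value` relative to `reference`, in percent."""
    return 100.0 * (value - reference) / reference


# Positive means a higher (worse) MAE than the reference, negative a lower one.
gap_to_sota = relative_change(mae["this_project_huber"], mae["pauls_2024"])
gain_over_lang = relative_change(mae["this_project_huber"], mae["lang_2023"])

print(f"MAE vs. Pauls et al. (2024): {gap_to_sota:+.1f}%")
print(f"MAE vs. Lang et al. (2023): {gain_over_lang:+.1f}%")
```

Expressing the gap as a relative change in MAE avoids the ambiguity of "accuracy" for a regression task.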
This repository adapts the original codebase for restricted environments (Google Colab High-RAM).
- Architecture: U-Net with a ResNet50 encoder (ImageNet pre-training and random initialization were both tested).
- Data Pipeline: Processed 11,749 samples of Sentinel-2 optical imagery fused with GEDI LiDAR sparse labels.
- Optimization: AdamW optimizer with decoupled weight decay to handle regularization on small batches.
- Experimentation:
  - Comparisons of ResNet34 vs. ResNet50 vs. ResNet101 backbones under data scarcity.
  - Ablation studies on Shift-Resilient Loss vs. Standard Huber Loss.
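
To make "decoupled weight decay" concrete, here is a minimal scalar sketch of one AdamW update. It follows the commonly used ordering (decay first, then the Adam step) and is an illustration only: the function name `adamw_step` and all hyperparameter values are assumptions, not the settings used in the thesis.

```python
import math


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update for a single scalar parameter.

    Returns the updated (w, m, v). `t` is the 1-based step count.
    """
    # Decoupled weight decay: shrink the weight directly, independent of the gradient.
    w = w * (1.0 - lr * weight_decay)

    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Adaptive gradient step; the decay above is not rescaled by 1/sqrt(v_hat).
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# With a zero gradient, only the decay acts on the weight.
w_decayed, _, _ = adamw_step(w=1.0, grad=0.0, m=0.0, v=0.0, t=1,
                             lr=0.1, weight_decay=0.01)

# With no decay, the first step has magnitude close to lr.
w_stepped, _, _ = adamw_step(w=0.0, grad=1.0, m=0.0, v=0.0, t=1,
                             lr=0.1, weight_decay=0.0)
print(w_decayed, w_stepped)
```

Because the decay is applied outside the adaptive step, its strength does not depend on the gradient statistics, which tend to be noisy at small batch sizes.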
- Architecture Efficiency: ResNet50 provided the optimal trade-off. The larger ResNet101 (52M parameters) suffered from overfitting/convergence issues due to the small batch size constraint (Batch size 10 vs 32).
- Loss Function Dynamics: Pauls et al. (2024) proposed the "Shift-Resilient Loss" to compensate for geolocation errors in the labels. In my experiments at 10% data scale, Standard Huber Loss slightly outperformed this more complex loss (2.85 m vs. 2.89 m MAE).
- Tall Canopy Challenge: The model struggles with trees >30m (MAE 20.84m) due to severe class imbalance in the training set (only 2.8% of samples were tall forests).
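
The last two findings rest on two simple computations: a Huber loss evaluated only where sparse labels exist, and an MAE broken down by reference height. The sketch below is a plain-Python illustration under stated assumptions: unlabeled pixels are encoded as NaN, `delta` is 1.0, and the bin edges are chosen for illustration. None of these choices are confirmed by the thesis, and the shift-resilient variant is not implemented here.

```python
import math


def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss: quadratic near zero, linear beyond `delta`."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)


def masked_huber(pred, label, delta: float = 1.0) -> float:
    """Mean Huber loss over pixels that carry a label.

    `label` uses NaN for pixels without a GEDI footprint (an assumed encoding).
    """
    losses = [huber(p - y, delta) for p, y in zip(pred, label) if not math.isnan(y)]
    return sum(losses) / len(losses) if losses else 0.0


def mae_by_height_bin(pred, label, edges=(0.0, 5.0, 15.0, 30.0, math.inf)):
    """MAE and sample share per reference-height bin [lo, hi)."""
    pairs = [(p, y) for p, y in zip(pred, label) if not math.isnan(y)]
    report = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        errs = [abs(p - y) for p, y in pairs if lo <= y < hi]
        if errs:  # skip bins with no labeled samples
            report[(lo, hi)] = {"mae": sum(errs) / len(errs),
                                "share": len(errs) / len(pairs)}
    return report


# Toy data: five pixels, two of them unlabeled.
pred = [2.0, 10.0, 12.0, 18.0, 7.0]
label = [1.0, float("nan"), 14.0, 38.0, float("nan")]

loss = masked_huber(pred, label)
report = mae_by_height_bin(pred, label)
for (lo, hi), stats in report.items():
    print(f"[{lo}, {hi}) m: MAE={stats['mae']:.2f}, share={stats['share']:.2f}")
```

Reporting the sample share next to each bin's MAE makes the imbalance visible: a bin that holds only a few percent of the labels contributes little to the overall loss, so large errors there barely affect the headline MAE.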
The entire training pipeline has been ported to a single Google Colab notebook for ease of access.
- Open the Notebook: Click here to run on Colab
- Data Setup: The notebook connects to Google Drive to mount pre-processed `.npz` files (Sentinel-2 + GEDI).
- Training:

  ```bash
  # Example command structure
  python main.py --config config.yaml --model resnet50 --loss huber
  ```
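
For readers adapting the entry point, the flags in the example command could be parsed along the following lines. This is a hypothetical sketch: the actual `main.py` may define its arguments differently, and the defaults and the `choices` list (taken from the backbones compared in this project) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the example command."""
    parser = argparse.ArgumentParser(description="Canopy height training (sketch)")
    parser.add_argument("--config", default="config.yaml",
                        help="Path to the YAML config file")
    parser.add_argument("--model", default="resnet50",
                        choices=["resnet34", "resnet50", "resnet101"],
                        help="Encoder backbone")
    parser.add_argument("--loss", default="huber",
                        help="Loss function name")
    return parser


# Parse the same arguments as in the example command.
args = build_parser().parse_args(
    ["--config", "config.yaml", "--model", "resnet50", "--loss", "huber"]
)
print(args.config, args.model, args.loss)
```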
Qualitative comparison of model outputs (Height predictions in meters). Model comparison: ResNet50 + Huber (left) vs. ResNet50 + Weighted (right). Each model takes the same two satellite images as input, and outputs their corresponding canopy height estimations. Higher predictions are indicated by brighter colors.
This work constitutes the Bachelor's Thesis: "Deep Learning for Canopy Height Estimation: Experimental Reproducibility at 10% Data Scale" (2025), University of Copenhagen.
The core architecture was forked and further developed from Estimating Canopy Height at Scale by Pauls et al. (2024). Please cite the original work if you use the core architecture:
@inproceedings{pauls2024estimating,
title={Estimating Canopy Height at Scale},
author={Jan Pauls et al.},
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}

Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke
[Paper] [Google Earth Engine viewer] [BibTeX]
We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.
A comparison between our map, two other existing global height maps (Lang et al., Potapov et al.), and a regional map for France shows a clear improvement in visual quality. Our map closely matches the regional map, although quality differences remain in some regions (e.g. column 8).
We uploaded our produced canopy height map to Google Earth Engine and created a GEE app that allows users to visualize our map globally and compare it to other existing products. If you want to build your own app or download/use our map in another way, you can access the map under the following asset_id:
var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020')
// To display on the map, create the mosaic:
var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020').mosaic()
This paper is part of the project AI4Forest, which is funded by the German Aerospace Agency (DLR), the German Federal Ministry for Education and Research (BMBF) and the French National Research Agency (ANR). Further, calculations (or parts of them) for this publication were performed on the HPC cluster PALMA II of the University of Münster, subsidised by the DFG (INST 211/667-1).
If you use our map in your research, please cite using the following BibTeX:
@inproceedings{pauls2024estimating,
title={Estimating Canopy Height at Scale},
author={Jan Pauls and Max Zimmer and Una M. Kelly and Martin Schwartz and Sassan Saatchi and Philippe CIAIS and Sebastian Pokutta and Martin Brandt and Fabian Gieseke},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=ZzCY0fRver}
}



