
wylich/Global-Canopy-Height-Map

Deep Learning for Canopy Height: 10% Data Scale Reproducibility


🚀 Executive Summary

This project investigates the democratization of high-resolution climate monitoring. It reproduces the State-of-the-Art (SOTA) deep learning framework by Pauls et al. (2024) for global canopy height estimation, but imposes strict constraints: 10% of the original training data and consumer-grade hardware (Tesla T4 GPU).

Key Result: The optimized model (ResNet50 + Huber Loss) achieved a Mean Absolute Error (MAE) of 2.85m. This MAE is only about 17% higher than the 2.43m reported for the original full-scale model trained on high-performance clusters. It is also well below the 6.47m MAE of prior global benchmarks such as Lang et al. (2023).
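The 17% figure is a relative increase in MAE over the full-scale reference. It is not a drop in an accuracy percentage. The arithmetic below uses only the MAE values reported in the benchmark table. The variable names are illustrative.

```python
# Reported MAE values in meters (from the benchmark table in this README).
MAE_THIS_PROJECT = 2.85  # ResNet50 + standard Huber, 10% data
MAE_PAULS_2024 = 2.43    # ResNet50 + shift-resilient Huber, 100% data
MAE_LANG_2023 = 6.47     # CNN ensemble, prior global benchmark


def relative_change(value: float, reference: float) -> float:
    """Signed change of `value` relative to `reference`, as a fraction."""
    return (value - reference) / reference


gap_vs_sota = relative_change(MAE_THIS_PROJECT, MAE_PAULS_2024)  # positive: higher error
gap_vs_lang = relative_change(MAE_THIS_PROJECT, MAE_LANG_2023)   # negative: lower error
print(f"vs. Pauls et al. (2024): {gap_vs_sota:+.1%} MAE")
print(f"vs. Lang et al. (2023):  {gap_vs_lang:+.1%} MAE")
```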


📊 Performance & Benchmarks

Contrary to the assumption that "more data is always better," this project demonstrates that efficient backbone selection and loss function tuning can yield production-grade results with a fraction of the compute.

| Model / Paper | Backbone | Data Scale | Loss Function | MAE (m) | Status |
|---|---|---|---|---|---|
| This Project (Best) | ResNet50 | 10% | Standard Huber | 2.85 | Reproduced |
| This Project (Baseline) | ResNet50 | 10% | Shift-Resilient Huber | 2.89 | Reproduced |
| Pauls et al. (2024) | ResNet50 | 100% | Shift-Resilient Huber | 2.43 | SOTA Reference |
| Lang et al. (2023) | CNN Ensemble | 100% | N/A | 6.47 | Prior Benchmark |
| Potapov et al. (2021) | Bagged Trees | 100% | N/A | 6.92 | Prior Benchmark |

Data Source: Bachelor Thesis, Tables 6.2 and 3.2.


🛠️ Technical Implementation

This repository adapts the original codebase for restricted environments (Google Colab High-RAM).

  • Architecture: U-Net with a ResNet50 encoder (both ImageNet-pretrained and randomly initialized encoders were tested).
  • Data Pipeline: Processed 11,749 samples of Sentinel-2 optical imagery fused with sparse GEDI LiDAR labels.
  • Optimization: AdamW optimizer, using decoupled weight decay for regularization under small-batch training.
  • Experimentation:
    • Comparison of ResNet34 vs. ResNet50 vs. ResNet101 backbones under data scarcity.
    • Ablation studies on Shift-Resilient Loss vs. Standard Huber Loss.
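"Decoupled" weight decay means the weight shrinkage is applied directly to the parameters. It does not pass through Adam's adaptive rescaling. The sketch below is a minimal pure-Python version of one AdamW update, written to show that distinction. It is not the repository's optimizer code, which presumably uses a library implementation. The default hyperparameters mirror PyTorch's `torch.optim.AdamW` defaults, not values taken from the thesis. `AdamWState`, `init_state` and `adamw_step` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AdamWState:
    m: List[float]  # first-moment (mean) estimates of the gradient
    v: List[float]  # second-moment (uncentered variance) estimates
    t: int = 0      # step counter used for bias correction


def init_state(n: int) -> AdamWState:
    return AdamWState(m=[0.0] * n, v=[0.0] * n)


def adamw_step(params: List[float], grads: List[float], state: AdamWState,
               lr: float = 1e-3, betas: Tuple[float, float] = (0.9, 0.999),
               eps: float = 1e-8, weight_decay: float = 1e-2) -> List[float]:
    """One AdamW update; returns the new parameter values."""
    b1, b2 = betas
    state.t += 1
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Moments use the raw loss gradient only. With plain Adam + L2, the term
        # weight_decay * p would be added to g here and then rescaled by 1/sqrt(v_hat).
        state.m[i] = b1 * state.m[i] + (1 - b1) * g
        state.v[i] = b2 * state.v[i] + (1 - b2) * g * g
        m_hat = state.m[i] / (1 - b1 ** state.t)
        v_hat = state.v[i] / (1 - b2 ** state.t)
        # Decoupled weight decay: shrink the weight directly, outside the adaptive step.
        p = p * (1 - lr * weight_decay)
        p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
    return new_params


# With a zero gradient only the decay acts: each weight is scaled by (1 - lr * weight_decay).
state = init_state(2)
print(adamw_step([1.0, -2.0], [0.0, 0.0], state))
```

Because the decay bypasses the adaptive denominator, every weight is shrunk by the same factor. This holds regardless of its gradient history, which is the property that motivates AdamW over Adam with an L2 penalty.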

Key Findings

  1. Architecture Efficiency: ResNet50 provided the best trade-off. The larger ResNet101 (52M parameters) suffered from overfitting and convergence issues due to the small batch size constraint (batch size 10 vs. 32).
  2. Loss Function Dynamics: Pauls et al. (2024) proposed a "Shift-Resilient Loss" to compensate for geolocation errors. My experiments showed that at 10% data scale, Standard Huber Loss actually outperforms the more complex shift-resilient loss (2.85m vs. 2.89m MAE).
  3. Tall Canopy Challenge: The model struggles with trees taller than 30m (MAE 20.84m) due to severe class imbalance in the training set (only 2.8% of samples were tall forest).
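To make the loss ablation concrete, here is a minimal pure-Python sketch of the two losses on a sparse label grid. It rests on two assumptions. First, the loss is averaged only over pixels that carry a GEDI label. Second, the shift-resilient variant takes the minimum loss over small integer shifts of the labels. For simplicity the shift is applied to the whole patch. Pauls et al. (2024) define the shift per GEDI track, and the repository's implementation may differ. All function names here are hypothetical. The standard Huber formula matches the one used by PyTorch's `torch.nn.HuberLoss`.

```python
from typing import List, Optional

# A label grid: None marks pixels without a GEDI footprint (sparse supervision).
Grid = List[List[Optional[float]]]


def huber(residual: float, delta: float = 1.0) -> float:
    """Huber loss: quadratic for |r| <= delta, linear beyond it."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def masked_huber(pred: List[List[float]], labels: Grid, delta: float = 1.0,
                 dy: int = 0, dx: int = 0) -> float:
    """Mean Huber loss over labeled pixels, comparing each label with the
    prediction at (y + dy, x + dx). Returns inf if no shifted label stays in bounds."""
    h, w = len(pred), len(pred[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            target = labels[y][x]
            if target is None:
                continue  # unlabeled pixel: contributes nothing to the loss
            py, px = y + dy, x + dx
            if 0 <= py < h and 0 <= px < w:
                total += huber(pred[py][px] - target, delta)
                count += 1
    return total / count if count else float("inf")


def shift_resilient_huber(pred: List[List[float]], labels: Grid,
                          delta: float = 1.0, max_shift: int = 1) -> float:
    """Simplified shift-resilient loss: the minimum masked Huber loss over all
    integer shifts in [-max_shift, max_shift], applied to the whole patch."""
    return min(
        masked_huber(pred, labels, delta, dy, dx)
        for dy in range(-max_shift, max_shift + 1)
        for dx in range(-max_shift, max_shift + 1)
    )


# Toy example: a single 10 m tree whose GEDI label landed one pixel to the left.
pred = [[0.0, 0.0, 0.0],
        [0.0, 10.0, 0.0],
        [0.0, 0.0, 0.0]]
labels = [[None, None, None],
          [10.0, None, None],
          [None, None, None]]
print(masked_huber(pred, labels))           # 9.5: label compared with a 0 m pixel
print(shift_resilient_huber(pred, labels))  # 0.0: shift dx=+1 realigns label and tree
```

The minimum over shifts forgives label misplacement. With fewer training samples, that extra freedom may simply have less signal to exploit, which is consistent with (though not proof of) the small gap observed above.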

💻 Usage (Reproducibility)

The entire training pipeline has been ported to a single Google Colab notebook for ease of access.

  1. Open the Notebook: Click here to run on Colab
  2. Data Setup: The notebook connects to Google Drive to mount pre-processed .npz files (Sentinel-2 + GEDI).
  3. Training:
    # Example command structure
    python main.py --config config.yaml --model resnet50 --loss huber

🖼️ Visual Results

Qualitative comparison of model outputs (height predictions in meters): ResNet50 + Huber (left) vs. ResNet50 + weighted sampler (right). Both models receive the same two satellite images as input and output the corresponding canopy height estimates. Brighter colors indicate higher predictions.

[Figure: Comparison of ResNet50 + Huber vs. ResNet50 + weighted sampler]
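A weighted sampler counters class imbalance like the tall-canopy shortage described above: it draws rare samples more often. The sketch below computes inverse-frequency weights over height bins and gives each non-empty bin equal total probability. It is a minimal sketch that rests on three assumptions:

- each training sample is summarized by one reference height (for example its mean GEDI height);
- the bin edges are hypothetical;
- whether the thesis's weighted sampler uses this exact weighting is not confirmed.

In PyTorch, such weights are typically passed to `torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True)`. The standard-library `random.choices` stands in for it here.

```python
import random
from collections import Counter
from typing import List, Sequence

# Hypothetical bin edges in meters; a sample's bin index is how many edges it reaches.
EDGES = (10.0, 20.0, 30.0)


def height_bin(height: float, edges: Sequence[float] = EDGES) -> int:
    """Index of the height bin containing `height` (0 = below the first edge)."""
    return sum(height >= e for e in edges)


def inverse_frequency_weights(heights: Sequence[float],
                              edges: Sequence[float] = EDGES) -> List[float]:
    """Per-sample weights of 1 / (number of samples in that sample's bin),
    so every non-empty bin receives the same total sampling probability."""
    bins = [height_bin(h, edges) for h in heights]
    counts = Counter(bins)
    return [1.0 / counts[b] for b in bins]


# Toy dataset: mostly low vegetation, a single tall (>30 m) sample.
heights = [3.0, 5.0, 8.0, 4.0, 12.0, 18.0, 35.0]
weights = inverse_frequency_weights(heights)
rng = random.Random(0)  # seeded for reproducibility
batch = rng.choices(range(len(heights)), weights=weights, k=10)
print(weights)  # [0.25, 0.25, 0.25, 0.25, 0.5, 0.5, 1.0]
```

Here the single 35 m sample is drawn as often as all four low-vegetation samples combined. Oversampling like this can improve tall-tree errors at some cost on common heights.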


📜 Citation & Credits

This work constitutes the Bachelor's Thesis: "Deep Learning for Canopy Height Estimation: Experimental Reproducibility at 10% Data Scale" (2025), University of Copenhagen.

The core architecture was forked from, and further developed based on, Estimating Canopy Height at Scale by Pauls et al. (2024). Please cite the original work if you use the core architecture:

@inproceedings{pauls2024estimating,
title={Estimating Canopy Height at Scale},
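author={Jan Pauls and Max Zimmer and Una M. Kelly and Martin Schwartz and Sassan Saatchi and Philippe Ciais and Sebastian Pokutta and Martin Brandt and Fabian Gieseke},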
booktitle={Forty-first International Conference on Machine Learning},
year={2024}
}

Estimating Canopy Height at Scale [ICML2024]

Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke

[Paper] [Google Earth Engine viewer] [BibTeX]

[Figure: Global canopy height map]

We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, uses a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to filter out erroneous labels in mountainous regions, improving the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, a substantial improvement over existing global-scale maps. The resulting height map and the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.
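The abstract reports MAE and RMSE both overall and for trees taller than five meters. The sketch below shows how such a pair can be computed, optionally restricted by height. It assumes the threshold is applied to the reference (label) height with a strict "taller than" comparison, which may not match the paper's exact definition. The toy values are hypothetical, not GEDI data. RMSE squares each error, so it weights large errors more heavily, which is why it sits above MAE in every reported pair.

```python
import math
from typing import Optional, Sequence, Tuple


def mae_rmse(preds: Sequence[float], targets: Sequence[float],
             min_height: Optional[float] = None) -> Tuple[float, float]:
    """MAE and RMSE in meters, optionally only over targets taller than min_height."""
    pairs = [(p, t) for p, t in zip(preds, targets)
             if min_height is None or t > min_height]
    if not pairs:
        raise ValueError("no samples in the selected height range")
    errors = [p - t for p, t in pairs]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


# Hypothetical toy predictions and labels in meters.
targets = [0.0, 3.0, 8.0, 25.0]
preds = [1.0, 3.0, 6.0, 21.0]
overall = mae_rmse(preds, targets)              # errors: +1, 0, -2, -4
tall = mae_rmse(preds, targets, min_height=5.0)  # errors: -2, -4
print(f"overall MAE / RMSE: {overall[0]:.2f} / {overall[1]:.2f}")
print(f">5 m    MAE / RMSE: {tall[0]:.2f} / {tall[1]:.2f}")
```

The same function with `min_height=30.0` gives the kind of tall-canopy breakdown used in the fork's findings. Note that restricting to tall targets usually raises both metrics, because absolute errors grow with tree height.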


A comparison between our map, two other existing global height maps (Lang et al., Potapov et al.), and a regional map for France shows a clear improvement in visual quality. Our map closely matches the regional map, although quality differences remain in some regions (e.g., column 8).

[Figure: Global and regional comparison]

Interactive Google Earth Engine viewer

We uploaded the resulting canopy height map to Google Earth Engine and created a GEE app that lets users visualize the map globally and compare it with other existing products. If you want to build your own app or download or use our map in another way, you can access it via the following asset ID:

var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020')

// To display on the map, create the mosaic:
var canopy_height_2020 = ee.ImageCollection('projects/worldwidemap/assets/canopyheight2020').mosaic()

Acknowledgements

This paper is part of the AI4Forest project, which is funded by the German Aerospace Center (DLR), the German Federal Ministry of Education and Research (BMBF) and the French National Research Agency (ANR). Further, calculations (or parts of them) for this publication were performed on the HPC cluster PALMA II of the University of Münster, subsidised by the DFG (INST 211/667-1).

Citing the paper

If you use our map in your research, please cite it using the following BibTeX:

@inproceedings{pauls2024estimating,
title={Estimating Canopy Height at Scale},
author={Jan Pauls and Max Zimmer and Una M. Kelly and Martin Schwartz and Sassan Saatchi and Philippe Ciais and Sebastian Pokutta and Martin Brandt and Fabian Gieseke},
booktitle={Forty-first International Conference on Machine Learning},
year={2024},
url={https://openreview.net/forum?id=ZzCY0fRver}
}

About

Code to reproduce the experiments of ICML2024-paper: Estimating Canopy Height at Scale
