This project addresses the challenge of reconstructing sea surface temperature (SST) data in the presence of cloud occlusion using deep learning. The data is sourced from the MODIS dataset (Aqua satellite, nightly data) for the North Adriatic Sea. The notebook provides a full pipeline from data loading and preprocessing to model training and evaluation.
Satellite-based SST measurements are often occluded by clouds, resulting in missing data. The goal is to reconstruct the missing SST values using machine learning, improving upon a provided statistical baseline.
- MODIS Aqua nightly SST: region of the North Adriatic Sea
- Files: training, validation, and test sets, date arrays, a land-sea mask, and a statistical baseline
- Occlusions: cloud-covered and land areas are marked as NaN
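Because clouds and land are both stored as NaN, the land-sea mask is what tells them apart. The sketch below computes the per-day cloud coverage over sea pixels. It is a minimal NumPy illustration, assuming the SST stack has shape `(days, H, W)` and the land-sea mask is a boolean `(H, W)` array that is True over sea; the function name `cloud_coverage` is hypothetical.

```python
import numpy as np

def cloud_coverage(sst, sea_mask):
    """Fraction of sea pixels that are cloud-occluded, one value per day (hypothetical helper).

    sst:      (days, H, W) array, NaN where clouded or land (assumed layout)
    sea_mask: (H, W) bool array, True over sea (assumed convention)
    """
    # A pixel counts as cloud only if it is NaN AND lies over sea; NaN over land is not cloud.
    clouded = np.isnan(sst) & sea_mask[None, :, :]
    return clouded.sum(axis=(1, 2)) / sea_mask.sum()

# Toy example: 2 days on a 2x2 grid; the bottom-right pixel is land.
sea = np.array([[True, True], [True, False]])
sst = np.array([
    [[15.0, np.nan], [16.0, np.nan]],      # day 0: 1 of 3 sea pixels clouded
    [[np.nan, np.nan], [np.nan, np.nan]],  # day 1: all 3 sea pixels clouded
])
print(cloud_coverage(sst, sea))
```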
- Data Loading & Visualization: download and inspect the SST data, visualize occlusions, and analyze temperature statistics.
- Preprocessing: Gaussian normalization (zero mean, unit variance), land-sea masking, and baseline adjustment.
- Artificial Occlusion Generator: simulates additional cloud occlusions over observed pixels, so that the hidden pixels have a known ground truth for supervised learning.
- Model: U-Net with residual blocks, implemented in TensorFlow/Keras. The model takes as input the masked SST, the artificial mask, the land-sea mask, and the tuned baseline.
- Training: custom loss function (weighted MSE on occluded regions), AdamW optimizer, learning rate scheduler, and validation monitoring.
- Evaluation: root mean squared error (RMSE) on the occluded (cloud-covered) regions, compared against the statistical baseline.
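The preprocessing and artificial-occlusion steps can be sketched as building one training sample at a time. This is a minimal NumPy illustration under stated assumptions, not the notebook's code: the helper names `fit_gaussian_stats` and `make_sample` are hypothetical, artificial clouds are assumed to be borrowed from another day's cloud mask, hidden and land pixels are assumed to be filled with 0, and the four input channels follow the order listed for the model.

```python
import numpy as np

def fit_gaussian_stats(train_sst):
    """Mean and standard deviation over all observed (non-NaN) training pixels."""
    return float(np.nanmean(train_sst)), float(np.nanstd(train_sst))

def make_sample(sst, donor_clouds, sea_mask, baseline, mean, std):
    """Build one (input, target, loss_mask) triple (hypothetical helper).

    sst:          (H, W) SST for one day, NaN where clouded or land
    donor_clouds: (H, W) bool cloud mask taken from another day (assumption)
    sea_mask:     (H, W) bool, True over sea
    baseline:     (H, W) statistical baseline for the same day, finite over sea
    """
    observed = ~np.isnan(sst) & sea_mask
    # Artificially hide observed sea pixels that were clouded on the donor day.
    visible = observed & ~donor_clouds
    z = (sst - mean) / std                 # Gaussian (z-score) normalization
    z_base = (baseline - mean) / std
    x = np.stack([
        np.where(visible, z, 0.0),         # masked SST, hidden/land pixels set to 0
        visible.astype(np.float32),        # input mask: which pixels the model sees
        sea_mask.astype(np.float32),       # land-sea mask
        np.where(sea_mask, z_base, 0.0),   # normalized baseline, 0 over land
    ], axis=-1).astype(np.float32)         # (H, W, 4), channels-last as in Keras
    target = np.where(observed, z, 0.0).astype(np.float32)
    # Supervise only pixels that have ground truth but were hidden from the input.
    loss_mask = (observed & ~visible).astype(np.float32)
    return x, target, loss_mask

# Toy 2x2 day; the bottom-right pixel is land.
sea = np.array([[True, True], [True, False]])
day = np.array([[14.0, 16.0], [18.0, np.nan]])
donor = np.array([[False, True], [False, False]])
base = np.array([[15.0, 15.0], [15.0, np.nan]])

mean, std = fit_gaussian_stats(day[None])
x, target, loss_mask = make_sample(day, donor, sea, base, mean, std)
print(x.shape, loss_mask)
```

Filling hidden pixels with 0 after normalization corresponds to filling them with the training mean temperature, a neutral value; the mask channel lets the network distinguish these placeholder zeros from real observations.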
- Notebook: contains all code, explanations, and results
- Data Download: uses `gdown` to fetch the required `.npy` files
- Model: U-Net with residual blocks for pixel-wise SST reconstruction
- Evaluation: RMSE on the test set, focusing on cloud-occluded regions
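The occluded-region RMSE and the masked training loss are the same computation: an average of squared errors restricted to pixels that are occluded but have a reference value. A minimal NumPy sketch, assuming predictions, references, and the baseline share one `(H, W)` grid; the function names are hypothetical, and the per-pixel `weights` are an illustrative stand-in for whatever weighting the notebook uses.

```python
import numpy as np

def masked_mse(pred, truth, weights):
    """Weighted MSE; pixels with weight 0 (land, no reference) are ignored, even if NaN."""
    weights = np.asarray(weights, dtype=float)
    sq = np.where(weights > 0, (pred - truth) ** 2, 0.0)
    return float(np.sum(weights * sq) / np.sum(weights))

def masked_rmse(pred, truth, occluded):
    """RMSE restricted to occluded pixels that have a reference value."""
    return float(np.sqrt(masked_mse(pred, truth, occluded.astype(float))))

# Toy 2x2 example in degrees C; only the top row is occluded.
truth = np.array([[20.0, 21.0], [np.nan, 19.0]])
occluded = np.array([[True, True], [False, False]])
model_pred = np.array([[20.5, 20.5], [0.0, 19.0]])
baseline = np.array([[19.0, 22.0], [0.0, 19.0]])

print("model RMSE:", masked_rmse(model_pred, truth, occluded))
print("baseline RMSE:", masked_rmse(baseline, truth, occluded))
```

If predictions are computed in normalized units, the RMSE in degrees follows by multiplying by the normalization standard deviation, since the z-score transform is affine and the mean cancels in the difference.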
- The U-Net with residual blocks outperformed the statistical baseline and simple CNNs
- The AdamW optimizer and learning rate scheduling improved convergence
- GANs were tested but did not perform well, due to the limited amount of data
- Data augmentation was not beneficial for this dataset
- A Vision Transformer (ViT) was also tested but did not outperform the U-Net, likely because of the limited dataset size and the better data efficiency of convolutional architectures
- Download all required data files using the provided `gdown` commands in the notebook
- Run the notebook cells sequentially
- Train the model and evaluate it using the provided scripts
- Python 3.x
- TensorFlow 2.x
- NumPy, Pandas, Matplotlib, gdown
- MODIS Aqua SST Dataset
- U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al., 2015)
This project was completed as part of a deep learning course. For details on model choices and experimental results, see the final section of the notebook.