Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
This is an official implementation of 'Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction' 🔥.
certifi 2024.8.30
charset-normalizer 3.4.0
contourpy 1.3.0
cycler 0.12.1
ecos 2.0.14
einops 0.8.0
filelock 3.13.1
fonttools 4.54.1
fsspec 2024.2.0
huggingface-hub 0.26.2
idna 3.10
Jinja2 3.1.3
joblib 1.4.2
kiwisolver 1.4.7
MarkupSafe 2.1.5
matplotlib 3.9.2
mpmath 1.3.0
networkx 3.2.1
numexpr 2.10.1
numpy 1.26.3
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-ml-py 12.535.161
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
nvitop 1.3.2
opencv-python 4.10.0.84
osqp 0.6.7.post3
packaging 24.1
pandas 2.2.3
pillow 10.2.0
pip 24.2
protobuf 5.28.3
pyparsing 3.2.0
python-dateutil 2.9.0.post0
pytz 2024.2
PyYAML 6.0.2
qdldl 0.1.7.post4
requests 2.32.3
safetensors 0.4.5
scikit-learn 1.5.2
scikit-survival 0.23.0
scipy 1.14.1
setuptools 75.1.0
six 1.16.0
sympy 1.13.1
tensorboardX 2.6.2.2
termcolor 2.5.0
threadpoolctl 3.5.0
timm 1.0.11
torch 2.5.1+cu121
torchaudio 2.5.1+cu121
torchvision 0.20.1+cu121
tqdm 4.66.6
triton 3.1.0
typing_extensions 4.9.0
tzdata 2024.2
urllib3 2.2.3
wheel 0.44.0

WSI preprocessing toolkit: we highly recommend using the easy-to-use tool CLAM for WSI preprocessing, including dataset download, tissue segmentation, patching, and patch feature extraction. Please see the detailed documentation at https://github.com/mahmoodlab/CLAM, or refer to Tutorial - Processing WSIs for MIL from Scratch for a complete and more detailed tutorial built upon CLAM.
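The pinned versions above can be checked against the active environment before training. A minimal sketch using the standard-library importlib.metadata (the two pins shown are copied from the list above; the helper function is ours, not part of this repo):

```python
from importlib.metadata import version, PackageNotFoundError

# A subset of the pins listed above; extend as needed.
PINNED = {"numpy": "1.26.3", "scikit-survival": "0.23.0"}

def check_pins(pins):
    """Return {package: (installed_or_None, pinned)} for every mismatch."""
    mismatches = {}
    for pkg, want in pins.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed at all
        if have != want:
            mismatches[pkg] = (have, want)
    return mismatches

if __name__ == "__main__":
    for pkg, (have, want) in check_pins(PINNED).items():
        print(f"{pkg}: installed {have}, expected {want}")
```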
Next, we provide detailed steps to preprocess WSIs using CLAM (assuming you are already familiar with its basic usage):
- Patching at level = 2 (downsampled 16×): go to the CLAM directory and run

# DATA_DIRECTORY should be the path to the raw images (e.g., svs files).
# '/data/nlst/processed/tiles-l2-s256' is the path for saving the patching results.
python create_patches_fp.py \
  --source DATA_DIRECTORY \
  --save_dir /data/nlst/processed/tiles-l2-s256 \
  --patch_level 2 --patch_size 256 --seg --patch --stitch

This step saves the coordinates of the segmented patches at level = 2.
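As a sanity check on the settings above: a 256-pixel patch extracted at level = 2 with 16× downsampling covers 256 × 16 = 4096 pixels at the base level. A one-line helper (ours, not part of CLAM):

```python
def base_level_footprint(patch_size: int, downsample: int) -> int:
    """Side length, in level-0 pixels, covered by one patch."""
    return patch_size * downsample

# 256-px patches at 16x downsampling span 4096 px at level 0.
print(base_level_footprint(256, 16))  # → 4096
```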
- Feature extraction: go to the CLAM directory and run

# DATA_DIRECTORY should be the path to the raw images (e.g., svs files).
# 'process_list_autogen.csv' is a csv file generated by the first step, initially in '/data/'.
# This csv file is automatically copied to '/data/processed/'.
CUDA_VISIBLE_DEVICES=0,1 python extract_features_fp.py \
  --data_h5_dir /data/processed/ \
  --data_slide_dir DATA_DIRECTORY \
  --csv_path /data/processed/process_list_autogen.csv \
  --feat_dir /data/processed/UNI \
  --batch_size 512 --slide_ext .svs

This step computes all patch features and saves them in /data/processed/UNI. Note that --data_h5_dir should be the full path to the patch coordinates produced by the previous step.
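After both steps finish, every slide should have a coordinate file and a feature file with matching names (patches/{slide}.h5 and pt_files/{slide}.pt, per the layout below). A small consistency check we find useful; the helper is ours, and it is demonstrated here on a throwaway temporary directory:

```python
from pathlib import Path
import tempfile

def unmatched_slides(coords_dir, feats_dir):
    """Return slide IDs that have coordinates but no extracted features."""
    coords = {p.stem for p in Path(coords_dir).glob("*.h5")}
    feats = {p.stem for p in Path(feats_dir).glob("*.pt")}
    return sorted(coords - feats)

# Demo: slide 10016 has coordinates but no features yet.
with tempfile.TemporaryDirectory() as root:
    patches = Path(root, "patches"); patches.mkdir()
    pt_files = Path(root, "pt_files"); pt_files.mkdir()
    for sid in ("10015", "10016"):
        (patches / f"{sid}.h5").touch()
    (pt_files / "10015.pt").touch()
    print(unmatched_slides(patches, pt_files))  # → ['10016']
```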
After preprocessing, you should have the following files and directories (taking nlst as an example) on your machine:
- /data/processed/UNI: path to all patch features.
- /data/processed/tiles-l1-s256: path to all segmented patch coordinates.
- ./table/nlst_path_full.csv: path to the csv table with patient_id, pathology_id, t, and e columns. We have uploaded these files; please see them in ./table.
- ./data_split/nlst-foldk.npz: path to the file with data-splitting details. We have uploaded these files; please see them in ./data_split.
A detailed file structure would be as follows:
/data # The directory of nlst.
└─ processed
├─ UNI # The directory of all patch features (level = 1).
│ └─ pt_files
│ ├─ 10015.pt # The patch features of slide 10015.
│ ├─ 10016.pt
│ └─ ...
│
├─ tiles-l1-s256 # The directory of all segmented patch coordinates (level = 1).
│ ├─ patches
│ │ ├─ 10015.h5 # The patch coordinates of slide 10015.
│ │ ├─ 10016.h5
│ │ └─ ...
│ └─ process_list_autogen.csv # csv file recording all processing details (auto-generated by CLAM).
└─ ... # other intermediate directories, such as "tiles-l2-s256" from the first step.
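Before training, it can save time to verify that the layout above is in place. A minimal sketch that probes for the expected sub-paths (the path list mirrors the tree above; the function name is ours):

```python
from pathlib import Path

# Sub-paths expected under the dataset root, mirroring the tree above.
EXPECTED_SUBPATHS = [
    "processed/UNI/pt_files",
    "processed/tiles-l1-s256/patches",
    "processed/tiles-l1-s256/process_list_autogen.csv",
]

def missing_paths(root):
    """Return the expected sub-paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in EXPECTED_SUBPATHS if not (root / p).exists()]

if __name__ == "__main__":
    for p in missing_paths("/data"):
        print("missing:", p)
```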
Splits for each cancer type are found in the splits/5foldcv folder; they randomly partition each dataset using 5-fold cross-validation. Each folder contains splits_{k}.csv for k = 1 to 5. For a fair comparison, we follow the same splits as MCAT.
nohup python train.py --source_dataset BLCA --source_dataset_dir /data/ypq/BLCA_Features --target_dataset_dir /data/ypq/LGG_Features --target_dataset LGG >> BLCA_LGG.log &
nohup python train.py --source_dataset BLCA --source_dataset_dir /data/ypq/BLCA_Features --target_dataset_dir /data/ypq/LUAD_Features --target_dataset LUAD >> BLCA_LUAD.log &
nohup python train.py --source_dataset BLCA --source_dataset_dir /data/ypq/BLCA_Features --target_dataset_dir /data/ypq/UCEC_Features --target_dataset UCEC >> BLCA_UCEC.log &
nohup python train.py --source_dataset LGG --source_dataset_dir /data/ypq/LGG_Features --target_dataset_dir /data/ypq/BLCA_Features --target_dataset BLCA >> LGG_BLCA.log &
nohup python train.py --source_dataset LGG --source_dataset_dir /data/ypq/LGG_Features --target_dataset_dir /data/ypq/LUAD_Features --target_dataset LUAD >> LGG_LUAD.log &
nohup python train.py --source_dataset LGG --source_dataset_dir /data/ypq/LGG_Features --target_dataset_dir /data/ypq/UCEC_Features --target_dataset UCEC >> LGG_UCEC.log &
nohup python train.py --source_dataset LUAD --source_dataset_dir /data/ypq/LUAD_Features --target_dataset_dir /data/ypq/LGG_Features --target_dataset LGG >> LUAD_LGG.log &
nohup python train.py --source_dataset LUAD --source_dataset_dir /data/ypq/LUAD_Features --target_dataset_dir /data/ypq/BLCA_Features --target_dataset BLCA >> LUAD_BLCA.log &
nohup python train.py --source_dataset LUAD --source_dataset_dir /data/ypq/LUAD_Features --target_dataset_dir /data/ypq/UCEC_Features --target_dataset UCEC >> LUAD_UCEC.log &
nohup python train.py --source_dataset UCEC --source_dataset_dir /data/ypq/UCEC_Features --target_dataset_dir /data/ypq/LGG_Features --target_dataset LGG >> UCEC_LGG.log &
nohup python train.py --source_dataset UCEC --source_dataset_dir /data/ypq/UCEC_Features --target_dataset_dir /data/ypq/LUAD_Features --target_dataset LUAD >> UCEC_LUAD.log &
nohup python train.py --source_dataset UCEC --source_dataset_dir /data/ypq/UCEC_Features --target_dataset_dir /data/ypq/BLCA_Features --target_dataset BLCA >> UCEC_BLCA.log &
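The twelve commands above enumerate every ordered (source, target) pair of the four cohorts. They can also be generated programmatically rather than maintained by hand; a sketch (the /data/ypq/... feature paths follow the pattern above):

```python
from itertools import permutations

DATASETS = ["BLCA", "LGG", "LUAD", "UCEC"]
FEAT_ROOT = "/data/ypq"  # feature dirs follow the {name}_Features pattern above

def make_commands():
    """Build one train.py invocation per ordered (source, target) pair."""
    cmds = []
    for src, tgt in permutations(DATASETS, 2):
        cmds.append(
            f"nohup python train.py --source_dataset {src} "
            f"--source_dataset_dir {FEAT_ROOT}/{src}_Features "
            f"--target_dataset_dir {FEAT_ROOT}/{tgt}_Features "
            f"--target_dataset {tgt} >> {src}_{tgt}.log &"
        )
    return cmds

if __name__ == "__main__":
    print("\n".join(make_commands()))
```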