This README describes how to apply the MOSAIC transformation to enable subgraph classification workloads to run within the SALIENT++ system. It also shows how to compose subgraph-aware labeling techniques like GLASS with modular nodewise architectures.
The script supports the following datasets:
Synthetic:
- coreness
- cut_ratio
- density
- component
Real-world:
- em_user
- hpo_metab
- ppi_bp
- hpo_neuro
- elliptic2
See the GLASS repo for instructions on accessing these.
For the Elliptic2 dataset, see the Elliptic2 repository for access; it provides instructions for producing the edge_list.txt and subgraphs.pth files, which are the file formats this script expects.
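As a rough illustration of the expected input, the sketch below parses an edge_list.txt of whitespace-separated integer node-id pairs, one edge per line. This format is an assumption inferred from the file name; verify it against the Elliptic2 instructions. The companion subgraphs.pth file would be loaded with torch.load.

```python
# Hypothetical parser for edge_list.txt; the "src dst" per-line format is
# an assumption, not taken from the MOSAIC/Elliptic2 sources.
import os
import tempfile

def parse_edge_list(path):
    """Return a list of (src, dst) integer pairs, skipping blank lines."""
    edges = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            src, dst = line.split()[:2]
            edges.append((int(src), int(dst)))
    return edges

# Tiny self-check on a synthetic file.
with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "edge_list.txt")
    with open(p, "w") as f:
        f.write("0 1\n1 2\n2 0\n")
    edges = parse_edge_list(p)
    print(edges)  # [(0, 1), (1, 2), (2, 0)]
```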
Run the following command to generate a directory containing the FastDataset torch files. You should see a directory structure like dataset/ppi_bp/ppi_bp, containing rowptr.pt, x.pt, etc.
python -m scripts.preprocess_SALIENT --dataset_dir DATASET_NAME
DATASET_NAME refers to directories relative to SALIENT_plusplus/dataset, e.g. ppi_bp.
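A quick way to confirm the preprocessing succeeded is to check that the nested output directory contains the expected torch files. The sketch below only checks the two file names the text above mentions (rowptr.pt, x.pt); the full set emitted by preprocess_SALIENT may be larger, so treat the list as an assumption to adjust.

```python
# Hypothetical sanity check for the FastDataset output directory.
import tempfile
from pathlib import Path

EXPECTED = ["rowptr.pt", "x.pt"]  # named in the README; there may be more

def missing_files(dataset_root, name):
    # Preprocessing nests files as dataset_root/name/name,
    # e.g. dataset/ppi_bp/ppi_bp, per the README.
    inner = Path(dataset_root) / name / name
    return [f for f in EXPECTED if not (inner / f).exists()]

# Self-check on a synthetic directory with only rowptr.pt present.
with tempfile.TemporaryDirectory() as d:
    inner = Path(d) / "ppi_bp" / "ppi_bp"
    inner.mkdir(parents=True)
    (inner / "rowptr.pt").touch()
    missing = missing_files(d, "ppi_bp")
    print(missing)  # ['x.pt']
```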
Now, you are ready to run the training pipeline!
Run the experiment driver as described in the installation instructions.
For example, to run on the ppi_bp dataset with the same hyperparameters as the GLASS paper (see their configs), run the following command.
python -m utils.exp_driver --num_machines 1 --num_gpus_per_machine 1 --gpu_percent 0.999 --replication_factor 15 --run_local --train_fanouts -1 -1 -1 --test_fanouts -1 -1 -1 --num_hidden 64 --train_batch_size 80 --learning_rate 0.0005 --dataset_name ppi_bp --dataset_dir ./dataset/ppi_bp --job_name test-job --model_name sageresinception --num_epochs 300 --use_subgraph_label
There are two key modifications to be aware of:
The MOSAIC transformation introduces an additional message-passing layer for subgraph representative nodes. Therefore, specify an extra fanout of -1 at the start of the fanout lists (train and test).
In the example above, the architecture has 2 convolution layers, but 3 fanouts are specified to account for the subgraph-representative layer.
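The fanout rule can be summarized in a few lines: for a backbone with k convolution layers, pass k + 1 fanout values, with the extra leading entry covering the subgraph-representative layer. The helper below is illustrative only (not part of the codebase); -1 means "sample all neighbors".

```python
# Illustrative helper: build MOSAIC fanout lists for a k-layer backbone.
# One extra fanout is prepended for the subgraph-representative layer.
def mosaic_fanouts(backbone_layers, fanout=-1):
    return [fanout] * (backbone_layers + 1)

# A 2-layer backbone yields 3 fanouts, matching
# --train_fanouts -1 -1 -1 in the example command.
print(mosaic_fanouts(2))  # [-1, -1, -1]
```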
The optional flag --use_subgraph_label modifies the batch preparation process to set a label feature of 1 on every subgraph representative node in the batch. Specifically, this occurs within the DevicePrefetcher class in fast_trainer/transferers.py.
Note - the preprocessing script automatically appends a 0-valued feature to all node feature vectors to accommodate subgraph labeling, avoiding the need for multiple versions of the dataset.
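To make the two steps above concrete, here is a plain-Python sketch (not the actual DevicePrefetcher code, and using lists rather than tensors): preprocessing pads every node's feature vector with a trailing 0, and batch preparation flips that slot to 1 for the subgraph representative nodes.

```python
# Illustrative only: mimics the label-slot mechanism with plain lists.
def append_label_column(features):
    """Preprocessing step: pad each node's features with a trailing 0."""
    return [row + [0.0] for row in features]

def mark_representatives(features, rep_ids):
    """Batch-prep step: set the trailing slot to 1 for representatives."""
    for i in rep_ids:
        features[i][-1] = 1.0
    return features

feats = append_label_column([[0.5, 0.2], [0.1, 0.9], [0.3, 0.4]])
feats = mark_representatives(feats, rep_ids=[1])
print(feats[1])  # [0.1, 0.9, 1.0] -- representative node flagged
print(feats[0])  # [0.5, 0.2, 0.0] -- ordinary node left at 0
```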
Future work:
- Adding support for distributed execution across multiple machines
- Hyperparameter tuning
- Incorporating GCN aggregation to better emulate GLASS's configuration
- Evaluation on additional datasets or model backbones
