
Subgraph Learning with SALIENT++ via MOSAIC

This README describes how to apply the MOSAIC transformation to enable subgraph classification workloads to run within the SALIENT++ system. It also shows how to compose subgraph-aware labeling techniques like GLASS with modular nodewise architectures.

Preprocessing Steps

0. Prepare the datasets

The script supports the following datasets:

Synthetic

  • coreness
  • cut_ratio
  • density
  • component

Real-world

  • em_user
  • hpo_metab
  • ppi_bp
  • hpo_neuro
  • elliptic2

See the GLASS repo for instructions on accessing these.

For the Elliptic2 dataset, see the Elliptic2 repository for access instructions, including how to produce the edge_list.txt and subgraphs.pth files. This script expects the dataset in those file formats.
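As a quick sanity check before preprocessing, the edge list can be loaded with a few lines of Python. This is a hypothetical sketch, not part of the repository: it assumes edge_list.txt holds one whitespace-separated node pair per line (consult the Elliptic2 instructions for the authoritative format).

```python
def load_edge_list(path):
    """Parse an edge list file with one whitespace-separated node pair per line.

    Assumed format; lines with fewer than two fields are skipped.
    """
    edges = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                edges.append((int(parts[0]), int(parts[1])))
    return edges
```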

1. Preprocess the dataset

Run the following command to generate a directory containing the FastDataset torch files. You should see a directory structure like dataset/ppi_bp/ppi_bp, containing rowptr.pt, x.pt, etc.

python -m scripts.preprocess_SALIENT --dataset_dir DATASET_NAME

DATASET_NAME refers to directories relative to SALIENT_plusplus/dataset, e.g. ppi_bp.
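To confirm that preprocessing succeeded, you can check the output directory for the expected files. This is a hypothetical helper, not part of the codebase; the README only names rowptr.pt and x.pt, so the EXPECTED list here is deliberately minimal.

```python
import os

# Files the README says should appear in dataset/<name>/<name>/.
# Only the two names mentioned in the README are checked here.
EXPECTED = ["rowptr.pt", "x.pt"]

def check_fastdataset_dir(path):
    """Return the list of expected FastDataset files missing from `path`."""
    return [f for f in EXPECTED if not os.path.exists(os.path.join(path, f))]
```

An empty return value means the checked files are present, e.g. `check_fastdataset_dir("dataset/ppi_bp/ppi_bp")`.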

Now, you are ready to run the training pipeline!

Running within SALIENT

Run the experiment driver as described in the installation instructions.

For example, to run on the ppi_bp dataset with the same hyperparameters as the GLASS paper (see their configs), run the following command.

python -m utils.exp_driver --num_machines 1 --num_gpus_per_machine 1 --gpu_percent 0.999 --replication_factor 15 --run_local --train_fanouts -1 -1 -1 --test_fanouts -1 -1 -1 --num_hidden 64 --train_batch_size 80 --learning_rate 0.0005 --dataset_name ppi_bp --dataset_dir ./dataset/ppi_bp --job_name test-job --model_name sageresinception --num_epochs 300 --use_subgraph_label

There are two key modifications to be aware of:

1. Fanout modification

The MOSAIC transformation introduces an additional message-passing layer for subgraph representative nodes. Therefore, specify an extra fanout of -1 at the start of the fanout lists (train and test).

In this case, the command specifies three fanouts but represents an architecture with only 2 convolution layers; the extra fanout accounts for the subgraph representatives.
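The fanout adjustment described above can be sketched as a one-line transformation (a hypothetical helper, not a function in the repository; -1 denotes sampling all neighbors at a layer):

```python
def mosaic_fanouts(base_fanouts):
    """Prepend one extra full-neighborhood fanout for the MOSAIC
    subgraph-representative message-passing layer."""
    return [-1] + list(base_fanouts)
```

For the example command, a 2-layer architecture's fanouts `[-1, -1]` become `[-1, -1, -1]`, matching the three values passed to --train_fanouts and --test_fanouts.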

2. Subgraph labeling

The optional flag --use_subgraph_label modifies the batch preparation process to set a 1 flag on all subgraph representatives within the batch. Specifically, this occurs within the DevicePrefetcher class in fast_trainer/transferers.py.

Note - the preprocessing script automatically appends a 0-valued feature to all node feature vectors to accommodate subgraph labeling, avoiding the need for multiple versions of the dataset.
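The interaction between the appended 0-valued feature and the --use_subgraph_label flag can be illustrated in plain Python. This is a minimal sketch, not the actual DevicePrefetcher code: features are lists rather than tensors, and the representative-node indices are assumed to be known for the batch.

```python
def label_representatives(features, representative_ids):
    """Flip the last (0-initialized) feature to 1 for each subgraph
    representative node in the batch; all other nodes keep 0 there."""
    for node_id in representative_ids:
        features[node_id][-1] = 1
    return features
```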

Future work

  • Adding support for distributed execution across multiple machines
  • Hyperparameter tuning
  • Incorporating GCN aggregation to better emulate GLASS's configuration
  • Evaluation on additional datasets or model backbones