
Subgraph Learning with SALIENT++ via MOSAIC

This README describes how to apply the MOSAIC transformation to enable subgraph classification workloads to run within the SALIENT++ system. It also shows how to compose subgraph-aware labeling techniques like GLASS with modular nodewise architectures.

Preprocessing Steps

0. Prepare the datasets

The script supports the following datasets:

Synthetic

  • coreness
  • cut_ratio
  • density
  • component

Real-world

  • em_user
  • hpo_metab
  • ppi_bp
  • hpo_neuro
  • elliptic2

See the GLASS repo for instructions on accessing these.

For the Elliptic2 dataset, see the Elliptic2 repository for access instructions, including how to produce the edge_list.txt and subgraphs.pth files. This script expects the dataset in those file formats.
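As a quick sanity check before preprocessing, the edge list can be loaded with a few lines of Python. This is a hypothetical sketch, not part of the repository: it assumes edge_list.txt holds one whitespace-separated node pair per line (consult the Elliptic2 instructions for the authoritative format).

```python
def load_edge_list(path):
    """Parse an edge list file with one whitespace-separated node pair per line.

    Assumed format; lines with fewer than two fields are skipped.
    """
    edges = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:
                edges.append((int(parts[0]), int(parts[1])))
    return edges
```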

1. Preprocess the dataset

Run the following command to generate a directory containing the FastDataset torch files. You should see a directory structure like dataset/ppi_bp/ppi_bp, containing rowptr.pt, x.pt, etc.

python -m scripts.preprocess_SALIENT --dataset_dir DATASET_NAME

DATASET_NAME refers to directories relative to SALIENT_plusplus/dataset, e.g. ppi_bp.
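To confirm that preprocessing succeeded, you can check the output directory for the expected files. This is a hypothetical helper, not part of the codebase; the README only names rowptr.pt and x.pt, so the EXPECTED list here is deliberately minimal.

```python
import os

# Files the README says should appear in dataset/<name>/<name>/.
# Only the two names mentioned in the README are checked here.
EXPECTED = ["rowptr.pt", "x.pt"]

def check_fastdataset_dir(path):
    """Return the list of expected FastDataset files missing from `path`."""
    return [f for f in EXPECTED if not os.path.exists(os.path.join(path, f))]
```

An empty return value means the checked files are present, e.g. `check_fastdataset_dir("dataset/ppi_bp/ppi_bp")`.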

Now, you are ready to run the training pipeline!

Running within SALIENT

Run the experiment driver as described in the installation instructions.

For example, to run on the ppi_bp dataset with the same hyperparameters as the GLASS paper (see their configs), run the following command.

python -m utils.exp_driver --num_machines 1 --num_gpus_per_machine 1 --gpu_percent 0.999 --replication_factor 15 --run_local --train_fanouts -1 -1 -1 --test_fanouts -1 -1 -1 --num_hidden 64 --train_batch_size 80 --learning_rate 0.0005 --dataset_name ppi_bp --dataset_dir ./dataset/ppi_bp --job_name test-job --model_name sageresinception --num_epochs 300 --use_subgraph_label

There are two key modifications to be aware of:

1. Fanout modification

The MOSAIC transformation introduces an additional message-passing layer for subgraph representative nodes. Therefore, specify an extra fanout of -1 at the start of the fanout lists (train and test).

In this case, the command specifies three fanouts but represents an architecture with only 2 convolution layers; the extra fanout accounts for the subgraph representatives.
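The fanout adjustment described above can be sketched as a one-line transformation (a hypothetical helper, not a function in the repository; -1 denotes sampling all neighbors at a layer):

```python
def mosaic_fanouts(base_fanouts):
    """Prepend one extra full-neighborhood fanout for the MOSAIC
    subgraph-representative message-passing layer."""
    return [-1] + list(base_fanouts)
```

For the example command, a 2-layer architecture's fanouts `[-1, -1]` become `[-1, -1, -1]`, matching the three values passed to --train_fanouts and --test_fanouts.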

2. Subgraph labeling

The optional flag --use_subgraph_label modifies the batch preparation process to set a 1 flag on all subgraph representatives within the batch. Specifically, this occurs within the DevicePrefetcher class in fast_trainer/transferers.py.

Note - the preprocessing script automatically appends a 0-valued feature to all node feature vectors to accommodate subgraph labeling, avoiding the need for multiple versions of the dataset.
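The interaction between the appended 0-valued feature and the --use_subgraph_label flag can be illustrated in plain Python. This is a minimal sketch, not the actual DevicePrefetcher code: features are lists rather than tensors, and the representative-node indices are assumed to be known for the batch.

```python
def label_representatives(features, representative_ids):
    """Flip the last (0-initialized) feature to 1 for each subgraph
    representative node in the batch; all other nodes keep 0 there."""
    for node_id in representative_ids:
        features[node_id][-1] = 1
    return features
```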

Future work

  • Adding support for distributed execution across multiple machines
  • Hyperparameter tuning
  • Incorporating GCN aggregation to better emulate GLASS's configuration
  • Evaluation on additional datasets or model backbones