nsgln/eSIMBA

A new evolutive clustering algorithm (eSIMBA) for Active Module Identification in p-value Attributed Temporal Biological Networks

This repository contains the code needed to run the eSIMBA algorithm and reproduce the results presented in [CITATION TO BE ADDED]. It also contains an optimized version of the SIMBA algorithm presented in Singlan, N., Abou Choucha, F. & Pasquier, C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 15, 11360 (2025). https://doi.org/10.1038/s41598-025-95749-6.

Installation

This algorithm requires Python 3.11 and the packages described in the Requirements section, which are also listed in the environment.yml file.

To run the algorithm, clone this repository, install the required packages, and run the code using the commands described in the Reproduce paper results and Usage on your own data sections.

Reproduce paper results

Several launch scripts used in the paper are available in the scripts/ folder. They are POSIX shell scripts.

To reproduce all the results of the paper, run the following command from a POSIX-compliant terminal (e.g., Linux, macOS, Git Bash on Windows). It launches all the provided scripts in screen sessions:

./scripts/launch_all.sh

Usage on your own data

The main entry point is main.py.

To run eSIMBA (dynamic) on your own data, use the following command:

python main.py -i ./data/YourData.zip

To run SIMBA (static) on your own data, use the following command:

python main.py -i ./data/YourStaticGraph.npz --static

Command-line arguments (precise list)

The command-line arguments are defined in utils/utils.py. They are listed below with their descriptions and default values.

  • Input parameters

    • -i, --input_path (str, required): Path to the input data. For dynamic runs (eSIMBA) this must be a zip file containing one .npz file per graph/time-step. For static runs (--static) a single .npz file containing the graph is accepted. In the dynamic setting, files should follow the naming template *_Time_{time_step}.npz where {time_step} is an integer.
    • -gt, --ground_truth (flag): Indicates that ground-truth community labels are present in the data files.
    • -n, --name (flag): Indicates that node names are present in the data files.
  • Output parameters

    • -r, --results_path (str, default: ./Results): Path to the output results directory.
    • -o, --output_prefix (str, default: output): Prefix for output files. Note: the prefix must not contain path separators or dots.
    • -d, --draw (flag): Save plots of community evolution over time. This option is only valid when running eSIMBA (dynamic).
  • Execution parameters

    • -s, --static (flag): Run SIMBA (static) instead of eSIMBA (dynamic).
    • -p, --parallelism (flag): Enable parallelism. Parallelism is only available for eSIMBA (dynamic) runs.
    • --debug (flag): Enable debug mode. When enabled, verbose output is automatically enabled and parallelism is disabled.
    • --debug_limit (int, default: 5): Limit on the number of graphs to process in debug mode. Can only be set in debug mode.
    • --shuffle (flag): Shuffle graphs before processing in debug mode. Only available when debug mode is enabled.
  • Algorithm parameters

    • -t, --threshold (float, default: 0.05): Threshold used during the filtering phase (applies to node p-values). Must be between 0 and 1.
    • -mc, --min_community_size (int, default: 5): Minimum size for a community to be considered valid; smaller communities are discarded.
    • -mwc, --min_window_community_size (int, default: 3): Minimum size for a community inside a time window (eSIMBA); smaller communities are discarded.
    • -a, --approach (str, default: adaptive): Clustering approach. Accepted values: adaptive, best, worst. The adaptive approach switches between best and worst depending on the number of edges to process in the current iteration.
  • Verbosity

    • -v, --verbose (flag): Enable verbose output. Note: enabling --debug forces verbose on.

Validation rules and notes

  • When running eSIMBA (dynamic) the --input_path must be a .zip file containing .npz files; when running SIMBA (static) a single .npz file is accepted.
  • If --static is used, --draw and --parallelism are not allowed.
  • --debug automatically sets --verbose and disables --parallelism.
  • --shuffle and --debug_limit may only be used when --debug is enabled.
  • --threshold must be in [0, 1].
  • --approach must be one of adaptive, best, worst.
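
The rules above can be sketched with argparse. This is a hypothetical re-creation based only on the documented options, not the code in utils/utils.py; the function names build_parser and parse_and_validate are illustrative, and the real implementation may report errors differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(prog="main.py")
    p.add_argument("-i", "--input_path", required=True)
    p.add_argument("-gt", "--ground_truth", action="store_true")
    p.add_argument("-n", "--name", action="store_true")
    p.add_argument("-r", "--results_path", default="./Results")
    p.add_argument("-o", "--output_prefix", default="output")
    p.add_argument("-d", "--draw", action="store_true")
    p.add_argument("-s", "--static", action="store_true")
    p.add_argument("-p", "--parallelism", action="store_true")
    p.add_argument("--debug", action="store_true")
    # Default of None lets us detect whether the user set the option explicitly.
    p.add_argument("--debug_limit", type=int, default=None)
    p.add_argument("--shuffle", action="store_true")
    p.add_argument("-t", "--threshold", type=float, default=0.05)
    p.add_argument("-mc", "--min_community_size", type=int, default=5)
    p.add_argument("-mwc", "--min_window_community_size", type=int, default=3)
    p.add_argument("-a", "--approach", default="adaptive",
                   choices=["adaptive", "best", "worst"])
    p.add_argument("-v", "--verbose", action="store_true")
    return p


def parse_and_validate(argv):
    """Parse argv and enforce the documented validation rules."""
    parser = build_parser()
    args = parser.parse_args(argv)

    expected = ".npz" if args.static else ".zip"
    if not args.input_path.lower().endswith(expected):
        parser.error(f"--input_path must be a {expected} file in this mode")
    if args.static and (args.draw or args.parallelism):
        parser.error("--draw and --parallelism are not allowed with --static")
    if not args.debug and (args.shuffle or args.debug_limit is not None):
        parser.error("--shuffle and --debug_limit require --debug")
    if not 0.0 <= args.threshold <= 1.0:
        parser.error("--threshold must be in [0, 1]")
    if any(ch in args.output_prefix for ch in "/\\."):
        parser.error("--output_prefix must not contain path separators or dots")

    if args.debug:
        # Debug mode forces verbose output and turns parallelism off.
        args.verbose = True
        args.parallelism = False
    if args.debug_limit is None:
        args.debug_limit = 5  # documented default
    return args


if __name__ == "__main__":
    print(parse_and_validate(["-i", "data/YourData.zip", "--debug", "-p"]))
```

Invalid combinations, such as --static together with --draw, end in parser.error, which prints a usage message and exits with status 2.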

Repository structure

  • main.py: main script to run SIMBA and eSIMBA
  • clustering/: similarity-based clustering code (SIMBA)
  • graph/: graph, node and cluster classes
  • utils/: helper utilities (reading data, execution, metrics, saving, plotting)
  • scripts/: provided launch scripts used in the paper
  • data/: example datasets (zip / npz)

Data and format

The input data should be provided in .npz format. For eSIMBA (dynamic), the input should be a .zip file containing one .npz file per graph/time-step. Each .npz file should follow the naming template *_Time_{time_step}.npz where {time_step} is an integer representing the time step of the graph. For SIMBA (static), a single .npz file containing the graph is accepted. Multiple graphs can be provided by using multiple .npz files in a .zip archive.
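
As an illustration of the naming template, the following minimal sketch lists the members of an archive and orders them by time step. It uses only the standard library; the member names and the helper ordered_time_steps are hypothetical, and the sketch treats the archive as a single time series without grouping files by prefix.

```python
import io
import re
import zipfile

# Matches the documented template *_Time_{time_step}.npz
TIME_STEP_RE = re.compile(r"_Time_(\d+)\.npz$")


def ordered_time_steps(archive):
    """Return [(time_step, member_name), ...] sorted by integer time step."""
    found = []
    with zipfile.ZipFile(archive) as zf:
        for member in zf.namelist():
            match = TIME_STEP_RE.search(member)
            if match:  # members that do not follow the template are skipped
                found.append((int(match.group(1)), member))
    return sorted(found)


# Build a tiny in-memory archive with empty placeholder members.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for name in ("Graph_Time_10.npz", "Graph_Time_2.npz",
                 "Graph_Time_1.npz", "README.txt"):
        zf.writestr(name, b"")
buffer.seek(0)

steps = ordered_time_steps(buffer)
print(steps)
# [(1, 'Graph_Time_1.npz'), (2, 'Graph_Time_2.npz'), (10, 'Graph_Time_10.npz')]
```

Parsing the time step as an integer keeps step 10 after step 2, which a plain alphabetical sort of the file names would not.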

Format of the .npz files

Each graph is stored as a .npz file. Expected keys are:

  • adjacency_data, adjacency_indices, adjacency_indptr, adjacency_shape — sparse adjacency matrix components
  • feature_data, feature_indices, feature_indptr, feature_shape — sparse feature matrix components (e.g., p-values)
  • labels (optional) — ground-truth community labels
  • label_indices (optional) — indices for ground-truth labels
  • evolution (optional) — evolution ground-truth (for dynamic graphs)
  • evolution_indices (optional) — indices for evolution ground-truth (for dynamic graphs)
  • name (optional) — node names
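
A minimal sketch of writing and reading a file with these keys is given below. It assumes the four components of each matrix follow the SciPy CSR layout (data, indices, indptr, shape) and that the feature matrix holds one p-value per node; neither assumption is confirmed here, and the helper csr_from_npz is hypothetical.

```python
import io

import numpy as np
from scipy import sparse


def csr_from_npz(npz, prefix):
    """Rebuild a sparse matrix from the four '<prefix>_*' arrays (CSR assumed)."""
    shape = tuple(int(x) for x in npz[f"{prefix}_shape"])
    return sparse.csr_matrix(
        (npz[f"{prefix}_data"], npz[f"{prefix}_indices"], npz[f"{prefix}_indptr"]),
        shape=shape,
    )


# Toy graph: 3 nodes on a path 0-1-2, with one p-value per node (assumption).
adjacency = sparse.csr_matrix(
    np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
)
features = sparse.csr_matrix(np.array([[0.01], [0.20], [0.03]]))

# Save the mandatory keys to an in-memory .npz file.
buffer = io.BytesIO()
np.savez(
    buffer,
    adjacency_data=adjacency.data,
    adjacency_indices=adjacency.indices,
    adjacency_indptr=adjacency.indptr,
    adjacency_shape=adjacency.shape,
    feature_data=features.data,
    feature_indices=features.indices,
    feature_indptr=features.indptr,
    feature_shape=features.shape,
)
buffer.seek(0)

# Load the file back and rebuild both matrices.
with np.load(buffer) as npz:
    adjacency_back = csr_from_npz(npz, "adjacency")
    features_back = csr_from_npz(npz, "feature")

print(adjacency_back.shape, features_back.shape)  # (3, 3) (3, 1)
```

The optional keys (labels, evolution, name and their index arrays) could be added to the same np.savez call; their exact shapes are not specified here, so they are left out of the sketch.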

Available example datasets

Most of the provided datasets are .zip archives containing one or more .npz files (one .npz per graph / per time step for dynamic datasets). Below is the list of archives currently available in the data/ folder.

Synthetic datasets

  • [dynamic_albert-barabasi_batra-SAE-VM.zip](data/dynamic_albert-barabasi_batra-SAE-VM.zip) : 1000 graphs; 1000 nodes; 3 time steps; 10 communities; 10 nodes per community.
  • [dynamic_evolutive_values_albert-barabasi_batra-SAE-VM.zip](data/dynamic_evolutive_values_albert-barabasi_batra-SAE-VM.zip) : 1000 graphs; 1000 nodes; 3 time steps; 10 communities; 10 nodes per community.
  • [subset_dynamic_rewire-PPI-STRING-Human-robinson.zip](data/subset_dynamic_rewire-PPI-STRING-Human-robinson.zip) : 25 graphs; 16201 nodes; 3 time steps; 50 communities; 10 nodes per community. WARNING: this dataset is a subset of the full dynamic_rewire-PPI-STRING-Human-robinson.zip dataset used in the paper, containing only 25 graphs instead of 100. It is provided for testing, as the full dataset is too large to be added.
  • [dynamic_rewire_with_modules-PPI-IntAct-Human-robinson.zip](data/dynamic_rewire_with_modules-PPI-IntAct-Human-robinson.zip) : 100 graphs; 5784 nodes; 3 time steps; 50 communities; 10 nodes per community.
  • [subset_dynamic_evolutive_values_rewire-PPI-STRING-Human-robinson.zip](data/subset_dynamic_evolutive_values_rewire-PPI-STRING-Human-robinson.zip) : 25 graphs; 16201 nodes; 3 time steps; 50 communities; 10 nodes per community. WARNING: this dataset is a subset of the full dynamic_evolutive_values_rewire-PPI-STRING-Human-robinson.zip dataset used in the paper, containing only 25 graphs instead of 100. It is provided for testing, as the full dataset is too large to be added.
  • [dynamic_evolutive_values_rewire_with_modules-PPI-IntAct-Human-robinson.zip](data/dynamic_evolutive_values_rewire_with_modules-PPI-IntAct-Human-robinson.zip) : 100 graphs; 5784 nodes; 3 time steps; 50 communities; 10 nodes per community.

Real datasets

  • Data from Viggars, M. R., Sutherland, H., Lanmüller, H., Schmoll, M., Bijak, M., & Jarvis, J. C. (2023). Adaptation of the transcriptional response to resistance exercise over 4 weeks of daily training. FASEB journal : official publication of the Federation of American Societies for Experimental Biology, 37(1), e22686. https://doi.org/10.1096/fj.202201418R :
    • [Rat_Daily_Training.zip](data/Rat_Daily_Training.zip) : 4 time steps.
  • Data from Dumas, S. J., Meta, E., Borri, M., Goveia, J., Rohlenova, K., Conchinha, N. V., Falkenberg, K., Teuwen, L. A., de Rooij, L., Kalucka, J., Chen, R., Khan, S., Taverna, F., Lu, W., Parys, M., De Legher, C., Vinckier, S., Karakach, T. K., Schoonjans, L., Lin, L., … Carmeliet, P. (2020). Single-Cell RNA Sequencing Reveals Renal Endothelium Heterogeneity and Metabolic Adaptation to Water Deprivation. Journal of the American Society of Nephrology : JASN, 31(1), 118–138. https://doi.org/10.1681/ASN.2019080832 :
    • [cRECs.zip](data/cRECs.zip) : 3 time steps.
    • [gRECs.zip](data/gRECs.zip) : 4 time steps.
    • [mRECs.zip](data/mRECs.zip) : 4 time steps.
  • Data from Pasquier, C., & Robichon, A. (2021). Temporal and sequential order of nonoverlapping gene networks unraveled in mated female Drosophila. Life science alliance, 5(2), e202101119. https://doi.org/10.26508/lsa.202101119 :
    • [Drosophila_Amine_DESeq.zip](data/Drosophila_Amine_DESeq.zip) : 3 time steps.
    • [Drosophila_Amine_edgeR.zip](data/Drosophila_Amine_edgeR.zip) : 3 time steps.

Requirements

The code is written in Python 3.11.

The following packages are required (also listed in environment.yml):

  • numpy
  • scipy
  • scikit-learn
  • matplotlib
  • networkx
  • psutil
  • pylint~=3.0
  • openpyxl
  • networkit
  • plotly
  • kaleido-core
  • scikit-network
  • pyunionfind

Cite

TO ADD

Contact

If you have any questions, please contact me at singlan.nina@gmail.com
