A new evolutive clustering algorithm (eSIMBA) for Active Module Identification in p-value Attributed Temporal Biological Networks

This repository contains the code needed to run the eSIMBA algorithm and reproduce the results presented in [CITATION TO BE ADDED]. It also contains an optimized version of the SIMBA algorithm presented in Singlan, N., Abou Choucha, F. & Pasquier, C. A new Similarity Based Adapted Louvain Algorithm (SIMBA) for active module identification in p-value attributed biological networks. Sci Rep 15, 11360 (2025). https://doi.org/10.1038/s41598-025-95749-6.

Installation

This algorithm requires Python 3.11. You also need to install the required packages described in the Requirements section and listed in the environment.yml file.

To run the algorithm, clone this repository, install the required packages, and run the code using the commands described in the Reproduce paper results and Usage on your own data sections.

Reproduce paper results

Several launch scripts used in the paper are available in the scripts/ folder. They are POSIX shell scripts.

To launch all the provided scripts using a POSIX-compliant terminal (e.g., Linux, macOS, Git Bash on Windows) and reproduce all the results of the paper using screen sessions, you can use the following command:

./scripts/launch_all.sh

Usage on your own data

The main entry point is main.py.

To run eSIMBA (dynamic) on your own data you can use the following command:

python main.py -i ./data/YourData.zip

To run SIMBA (static) on your own data you can use the following command:

python main.py -i ./data/YourStaticGraph.npz --static

Command-line arguments (precise list)

The exact command-line arguments are defined in utils/utils.py. Below is a precise list with descriptions and default values.

Input parameters
- -i, --input_path (str, required): Path to the input data. For dynamic runs (eSIMBA) this must be a zip file containing one .npz file per graph/time-step. For static runs (--static) a single .npz file containing the graph is accepted. In the dynamic setting, files should follow the naming template *_Time_{time_step}.npz where {time_step} is an integer.
- -gt, --ground_truth (flag): Indicates that ground-truth community labels are present in the data files.
- -n, --name (flag): Indicates that node names are present in the data files.
Output parameters
- -r, --results_path (str, default: ./Results): Path to the output results directory.
- -o, --output_prefix (str, default: output): Prefix for output files. Note: the prefix must not contain path separators or dots.
- -d, --draw (flag): Save plots of community evolution over time. This option is only valid when running eSIMBA (dynamic).
Execution parameters
- -s, --static (flag): Run SIMBA (static) instead of eSIMBA (dynamic).
- -p, --parallelism (flag): Enable parallelism. Parallelism is only available for eSIMBA (dynamic) runs.
- --debug (flag): Enable debug mode. When enabled, verbose output is automatically enabled and parallelism is disabled.
- --debug_limit (int, default: 5): Limit on the number of graphs to process in debug mode. Can only be set in debug mode.
- --shuffle (flag): Shuffle graphs before processing in debug mode. Only available when debug mode is enabled.
Algorithm parameters
- -t, --threshold (float, default: 0.05): Threshold used during the filtering phase (applies to node p-values). Must be between 0 and 1.
- -mc, --min_community_size (int, default: 5): Minimum size for a community to be considered valid; smaller communities are discarded.
- -mwc, --min_window_community_size (int, default: 3): Minimum size for a community inside a time window (eSIMBA); smaller communities are discarded.
- -a, --approach (str, default: adaptive): Clustering approach. Accepted values: adaptive, best, worst. The adaptive approach switches between best and worst depending on the number of edges to process in the current iteration.
Verbosity
- -v, --verbose (flag): Enable verbose output. Note: enabling --debug forces verbose on.

Validation rules and notes

When running eSIMBA (dynamic) the --input_path must be a .zip file containing .npz files; when running SIMBA (static) a single .npz file is accepted.
If --static is used, --draw and --parallelism are not allowed.
--debug automatically sets --verbose and disables --parallelism.
--shuffle and --debug_limit may only be used when --debug is enabled.
--threshold must be in [0, 1].
--approach must be one of adaptive, best, worst.
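
As an illustration only, the rules above could be enforced with argparse along the following lines. This is a minimal sketch covering a subset of the documented options; it is not the code in utils/utils.py, and the function name parse_args and the use of None as a "not set" marker for --debug_limit are assumptions made for the example.

```python
import argparse


def parse_args(argv):
    """Parse a subset of the documented options and apply the rules above."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--input_path", required=True)
    parser.add_argument("-s", "--static", action="store_true")
    parser.add_argument("-d", "--draw", action="store_true")
    parser.add_argument("-p", "--parallelism", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("--debug", action="store_true")
    # None marks "not set" so that use outside debug mode can be detected
    # (assumption of this sketch; the documented default is 5).
    parser.add_argument("--debug_limit", type=int, default=None)
    parser.add_argument("--shuffle", action="store_true")
    parser.add_argument("-t", "--threshold", type=float, default=0.05)
    parser.add_argument("-a", "--approach", default="adaptive",
                        choices=["adaptive", "best", "worst"])
    args = parser.parse_args(argv)

    # Static runs take a single .npz file, dynamic runs a .zip archive.
    suffix = ".npz" if args.static else ".zip"
    if not args.input_path.lower().endswith(suffix):
        parser.error(f"--input_path must be a {suffix} file")
    if args.static and (args.draw or args.parallelism):
        parser.error("--draw and --parallelism are not allowed with --static")
    if not args.debug and (args.shuffle or args.debug_limit is not None):
        parser.error("--shuffle and --debug_limit require --debug")
    if not 0.0 <= args.threshold <= 1.0:
        parser.error("--threshold must be in [0, 1]")

    if args.debug:
        # Debug mode forces verbose output and disables parallelism.
        args.verbose = True
        args.parallelism = False
    if args.debug_limit is None:
        args.debug_limit = 5
    return args


# Example: debug mode overrides the parallelism flag.
args = parse_args(["-i", "data/YourData.zip", "--debug", "-p"])
```

With this sketch, an invalid combination such as --static together with --draw makes argparse print an error message and exit.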

Repository structure

main.py: main script to run SIMBA and eSIMBA
clustering/: similarity-based clustering code (SIMBA)
graph/: graph, node and cluster classes
utils/: helper utilities (reading data, execution, metrics, saving, plotting)
scripts/: provided launch scripts used in the paper
data/: example datasets (zip / npz)

Data and format

The input data should be provided in .npz format. For eSIMBA (dynamic), the input should be a .zip file containing one .npz file per graph/time-step. Each .npz file should follow the naming template *_Time_{time_step}.npz where {time_step} is an integer representing the time step of the graph. For SIMBA (static), a single .npz file containing the graph is accepted. Multiple graphs can be provided by using multiple .npz files in a .zip archive.
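
To check that an archive follows the naming template before running eSIMBA, a small helper like the one below can list its members and their time steps. This is a sketch under stated assumptions: the helper name list_time_steps and the regular expression are hypothetical, and the repository may parse file names differently.

```python
import io
import re
import zipfile

# Matches names such as "Example_Time_2.npz" (template *_Time_{time_step}.npz).
TIME_STEP_PATTERN = re.compile(r"_Time_(\d+)\.npz$")


def list_time_steps(archive):
    """Return (time_step, member_name) pairs sorted by time step."""
    steps = []
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            match = TIME_STEP_PATTERN.search(name)
            if match:
                steps.append((int(match.group(1)), name))
    return sorted(steps)


# Build a small in-memory archive with empty placeholder members to try it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for step in (2, 0, 1):
        zf.writestr(f"Example_Time_{step}.npz", b"")
buffer.seek(0)
time_steps = list_time_steps(buffer)
```

The same function accepts a path on disk, for example list_time_steps("data/YourData.zip").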

Format of the `.npz` files

Each graph is stored as a .npz file. Expected keys are:

adjacency_data, adjacency_indices, adjacency_indptr, adjacency_shape — sparse adjacency matrix components
feature_data, feature_indices, feature_indptr, feature_shape — sparse feature matrix components (e.g., p-values)
labels (optional) — ground-truth community labels
label_indices (optional) — indices for ground-truth labels
evolution (optional) — evolution ground-truth (for dynamic graphs)
evolution_indices (optional) — indices for evolution ground-truth (for dynamic graphs)
name (optional) — node names
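
As an illustration, the following minimal sketch writes a small graph with the required keys and reads it back. It assumes the components follow the SciPy CSR layout (data, indices, indptr, shape), which the key names suggest but this README does not state. The helper names save_graph and load_graph are hypothetical and not part of the repository, and the single p-value column per node is only an example.

```python
import io

import numpy as np
from scipy.sparse import csr_matrix


def save_graph(file, adjacency, features):
    """Write a graph using the key names listed above (CSR layout assumed)."""
    adjacency = csr_matrix(adjacency)
    features = csr_matrix(features)
    np.savez(
        file,
        adjacency_data=adjacency.data,
        adjacency_indices=adjacency.indices,
        adjacency_indptr=adjacency.indptr,
        adjacency_shape=adjacency.shape,
        feature_data=features.data,
        feature_indices=features.indices,
        feature_indptr=features.indptr,
        feature_shape=features.shape,
    )


def load_graph(file):
    """Rebuild the adjacency and feature matrices from the stored components."""
    with np.load(file) as npz:
        adjacency = csr_matrix(
            (npz["adjacency_data"], npz["adjacency_indices"], npz["adjacency_indptr"]),
            shape=tuple(int(n) for n in npz["adjacency_shape"]),
        )
        features = csr_matrix(
            (npz["feature_data"], npz["feature_indices"], npz["feature_indptr"]),
            shape=tuple(int(n) for n in npz["feature_shape"]),
        )
    return adjacency, features


# A 3-node path graph with one p-value per node, round-tripped in memory.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
p_values = np.array([[0.01], [0.20], [0.04]])
buffer = io.BytesIO()
save_graph(buffer, adjacency, p_values)
buffer.seek(0)
loaded_adjacency, loaded_features = load_graph(buffer)
```

The optional keys (labels, label_indices, evolution, evolution_indices, name) are left out of the sketch because their exact array layout is not described here.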

Available example datasets

Most of the provided datasets are .zip archives containing one or more .npz files (one .npz per graph / per time step for dynamic datasets). Below is the list of archives currently available in the data/ folder.

Synthetic datasets

[dynamic_albert-barabasi_batra-SAE-VM.zip](data/dynamic_albert-barabasi_batra-SAE-VM.zip) : 1000 graphs; 1000 nodes; 3 time steps; 10 communities; 10 nodes per community.
[dynamic_evolutive_values_albert-barabasi_batra-SAE-VM.zip](data/dynamic_evolutive_values_albert-barabasi_batra-SAE-VM.zip) : 1000 graphs; 1000 nodes; 3 time steps; 10 communities; 10 nodes per community.
[subset_dynamic_rewire-PPI-STRING-Human-robinson.zip](data/subset_dynamic_rewire-PPI-STRING-Human-robinson.zip) : 25 graphs; 16201 nodes; 3 time steps; 50 communities; 10 nodes per community. WARNING: this dataset is a subset of the full dynamic_rewire-PPI-STRING-Human-robinson.zip dataset used in the paper, containing only 25 graphs instead of 100. It is provided for testing, as the full dataset is too large to be added.
[dynamic_rewire_with_modules-PPI-IntAct-Human-robinson.zip](data/dynamic_rewire_with_modules-PPI-IntAct-Human-robinson.zip) : 100 graphs; 5784 nodes; 3 time steps; 50 communities; 10 nodes per community.
[subset_dynamic_evolutive_values_rewire-PPI-STRING-Human-robinson.zip](data/subset_dynamic_evolutive_values_rewire-PPI-STRING-Human-robinson.zip) : 100 graphs; 16201 nodes; 3 time steps; 50 communities; 10 nodes per community. WARNING: this dataset is a subset of the full dynamic_evolutive_values_rewire-PPI-STRING-Human-robinson.zip dataset used in the paper, containing only 25 graphs instead of 100. It is provided for testing, as the full dataset is too large to be added.
[dynamic_evolutive_values_rewire_with_modules-PPI-IntAct-Human-robinson.zip](data/dynamic_evolutive_values_rewire_with_modules-PPI-IntAct-Human-robinson.zip) : 100 graphs; 5784 nodes; 3 time steps; 50 communities; 10 nodes per community.

Real datasets

Data from Viggars, M. R., Sutherland, H., Lanmüller, H., Schmoll, M., Bijak, M., & Jarvis, J. C. (2023). Adaptation of the transcriptional response to resistance exercise over 4 weeks of daily training. FASEB journal : official publication of the Federation of American Societies for Experimental Biology, 37(1), e22686. https://doi.org/10.1096/fj.202201418R :
- [Rat_Daily_Training.zip](data/Rat_Daily_Training.zip) : 4 time steps.
Data from Dumas, S. J., Meta, E., Borri, M., Goveia, J., Rohlenova, K., Conchinha, N. V., Falkenberg, K., Teuwen, L. A., de Rooij, L., Kalucka, J., Chen, R., Khan, S., Taverna, F., Lu, W., Parys, M., De Legher, C., Vinckier, S., Karakach, T. K., Schoonjans, L., Lin, L., … Carmeliet, P. (2020). Single-Cell RNA Sequencing Reveals Renal Endothelium Heterogeneity and Metabolic Adaptation to Water Deprivation. Journal of the American Society of Nephrology : JASN, 31(1), 118–138. https://doi.org/10.1681/ASN.2019080832 :
- [cRECs.zip](data/cRECs.zip) : 3 time steps.
- [gRECs.zip](data/gRECs.zip) : 4 time steps.
- [mRECs.zip](data/mRECs.zip) : 4 time steps.
Data from Pasquier, C., & Robichon, A. (2021). Temporal and sequential order of nonoverlapping gene networks unraveled in mated female Drosophila. Life science alliance, 5(2), e202101119. https://doi.org/10.26508/lsa.202101119 :
- [Drosophila_Amine_DESeq.zip](data/Drosophila_Amine_DESeq.zip) : 3 time steps.
- [Drosophila_Amine_edgeR.zip](data/Drosophila_Amine_edgeR.zip) : 3 time steps.

Requirements

The code is written in Python 3.11.

The following packages are required (also listed in environment.yml):

numpy
scipy
scikit-learn
matplotlib
networkx
psutil
pylint~=3.0
openpyxl
networkit
plotly
kaleido-core
scikit-network
pyunionfind

Cite

TO ADD

Contact

If you have any questions, please contact me at singlan.nina@gmail.com
