
skateddu/correlation-network


correlation-network


Python library that builds association networks from pandas DataFrames. Computes pairwise association measures (correlations, Cramer's V, mutual information, etc.), applies threshold filtering, and produces NetworkX graphs with centrality analysis and visualization.

Features

  • Association methods: Pearson, Spearman, Kendall, Cramer's V, Correlation Ratio, Mutual Information, PhiK
  • Auto-detection: selects the right measure based on column types (numeric, categorical, mixed)
  • API: scikit-learn-style fit() / transform() / fit_transform() pipeline
  • Centrality: degree, strength, betweenness, eigenvector, and closeness, available individually or as a summary table
  • Visualization: continuous colormap, community detection coloring, centrality-scaled nodes, per-edge alpha
  • Extensible: any object implementing the AssociationStrategy protocol can be plugged in

Project Structure

correlation-network/
├── src/correlation_network/
│   ├── __init__.py              # Public API exports
│   ├── network.py               # CorrelationNetwork (main entry point)
│   ├── centrality.py            # CentralityAnalyzer (5 metrics + summary)
│   ├── visualization.py         # NetworkVisualizer (5 layouts, colormap, community colors)
│   ├── exceptions.py            # NotFittedError
│   ├── _types.py                # Literal type aliases (AssociationMethod, CentralityMetric, LayoutAlgorithm)
│   ├── _validation.py           # check_is_fitted guard
│   ├── _graph_utils.py          # Shared graph utilities (abs_weight_graph)
│   └── association/
│       ├── __init__.py           # Re-exports all strategies
│       ├── _base.py              # AssociationStrategy (Protocol)
│       ├── _correlation.py       # Pearson, Spearman, Kendall
│       ├── _categorical.py       # Cramer's V, Correlation Ratio (eta)
│       ├── _universal.py         # PhiK, Mutual Information
│       └── _auto.py              # Auto-detection + dispatch
├── tests/
│   ├── conftest.py               # Shared fixtures (numeric, categorical, mixed DataFrames)
│   ├── unit/                     # One test file per module
│   └── integration/              # End-to-end pipeline tests
├── pyproject.toml
└── LICENSE                       # MIT

Dependencies

Package     Description
NetworkX    Graph data structures, algorithms, and layout
pandas      DataFrame input and association matrix storage
NumPy       Array operations for matrix computation
SciPy       Chi-squared test (Cramer's V), entropy (Mutual Information)
Matplotlib  Graph rendering, colormaps, and figure export

Optional extras:

Extra     Description
phik      PhiK correlation (works on any pair of column types)
polars    Accept Polars DataFrames as input (auto-converted to pandas)
notebook  Jupyter notebook support (ipykernel)

Installation

Requires Python >= 3.10 and uv.

git clone https://github.com/andrea-cadeddu/correlation-network.git
cd correlation-network
uv sync

With optional extras:

uv sync --extra phik       # PhiK correlation support
uv sync --extra polars     # Polars DataFrame support
uv sync --all-extras       # All optional dependencies

Architecture

The library follows a pipeline pattern: DataFrame in, graph out.

graph LR
    A[DataFrame] -->|fit| B[Association Matrix]
    B -->|transform| C[nx.Graph]
    C --> D[CentralityAnalyzer]
    C --> E[NetworkVisualizer]

Concept                   Implementation                  Module
Pluggable metrics         AssociationStrategy Protocol    association/_base.py
Metric dispatch           _STRATEGY_REGISTRY dict         network.py
Fit/transform pipeline    scikit-learn-style API          network.py
Graph centrality          NetworkX algorithms wrapper     centrality.py
Negative weight handling  abs_weight_graph() helper       _graph_utils.py
Fitted state guard        check_is_fitted decorator       _validation.py
Rendering                 Matplotlib + NetworkX drawing   visualization.py
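
To make the registry, fitted-state guard, and fit/transform rows of the table above concrete, here is a toy, self-contained sketch. It is not the library's code: TinyNetwork is hypothetical, the registry reuses the name _STRATEGY_REGISTRY only for illustration and maps method names to pandas' DataFrame.corr, and transform() returns a plain edge list instead of an nx.Graph. Keeping edges with |weight| >= threshold (rather than >) is an assumption.

```python
from functools import wraps

import pandas as pd


class NotFittedError(RuntimeError):
    """Hypothetical stand-in for the error raised when transform() precedes fit()."""


def check_is_fitted(method):
    # Hypothetical guard: refuse to run until fit() has stored matrix_.
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        if getattr(self, "matrix_", None) is None:
            raise NotFittedError(f"{type(self).__name__} is not fitted; call fit() first.")
        return method(self, *args, **kwargs)

    return wrapper


# Hypothetical registry: method name -> callable(df) -> square association matrix.
_STRATEGY_REGISTRY = {
    "pearson": lambda df: df.corr(method="pearson"),
    "spearman": lambda df: df.corr(method="spearman"),
    "kendall": lambda df: df.corr(method="kendall"),  # pandas needs SciPy for Kendall
}


class TinyNetwork:
    """Toy fit/transform pipeline (hypothetical); not correlation_network's implementation."""

    def __init__(self, method="pearson", threshold=0.5):
        self.method = method
        self.threshold = threshold
        self.matrix_ = None

    def fit(self, df):
        try:
            compute = _STRATEGY_REGISTRY[self.method]
        except KeyError:
            raise ValueError(f"Unknown method {self.method!r}") from None
        self.matrix_ = compute(df.select_dtypes("number"))
        return self

    @check_is_fitted
    def transform(self):
        # Walk the upper triangle and keep pairs whose |weight| meets the threshold.
        cols = list(self.matrix_.columns)
        return [
            (a, b, float(self.matrix_.loc[a, b]))
            for i, a in enumerate(cols)
            for b in cols[i + 1:]
            if abs(self.matrix_.loc[a, b]) >= self.threshold
        ]

    def fit_transform(self, df):
        return self.fit(df).transform()
```

Storing the fitted result in a trailing-underscore attribute and guarding transform() with a decorator follows the scikit-learn convention that the fit/transform API borrows.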

Quick Start

Python API

import pandas as pd
from correlation_network import CorrelationNetwork, CentralityAnalyzer, NetworkVisualizer

# Any pandas DataFrame works; here one is loaded from a CSV file
df = pd.read_csv("data.csv")

# Build the network (default: method="pearson", threshold=0.5)
net = CorrelationNetwork(method="spearman", threshold=0.3)
graph = net.fit_transform(df)

# Analyze centrality
analyzer = CentralityAnalyzer(graph)
rankings = analyzer.summary()
print(rankings)

# Visualize
viz = NetworkVisualizer(graph)
fig = viz.plot(
    centrality_metric="degree",
    community_colors=True,
    title="Association Network",
)

Save directly

viz.save("network.png", dpi=150, community_colors=True, title="My Network")

From a pre-computed matrix

# Use a correlation matrix you already have
net = CorrelationNetwork.from_matrix(my_matrix, threshold=0.5)
graph = net.transform()
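
The README does not state how from_matrix() validates its input. Before passing a matrix in, a quick pre-flight check such as the hypothetical check_association_matrix below can catch common problems: a non-square shape, mismatched row and column labels, NaNs, or asymmetry. These requirements are assumptions about what a symmetric association matrix should look like, not documented rules of from_matrix().

```python
import numpy as np
import pandas as pd


def check_association_matrix(m: pd.DataFrame, atol: float = 1e-8) -> pd.DataFrame:
    """Hypothetical pre-flight check for a matrix you plan to pass to from_matrix()."""
    if m.shape[0] != m.shape[1]:
        raise ValueError(f"matrix must be square, got shape {m.shape}")
    # Row and column labels should name the same variables in the same order.
    if not m.index.equals(m.columns):
        raise ValueError("row and column labels must match and be in the same order")
    values = m.to_numpy(dtype=float)
    if np.isnan(values).any():
        raise ValueError("matrix contains NaN values")
    # Association matrices for undirected graphs are expected to be symmetric.
    if not np.allclose(values, values.T, atol=atol):
        raise ValueError("matrix is not symmetric")
    return m
```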

With Polars DataFrames

import polars as pl

df_polars = pl.read_csv("data.csv")
graph = CorrelationNetwork(method="pearson", threshold=0.5).fit_transform(df_polars)

Constructor Parameters

Parameter              Default    Description
method                 "pearson"  Association method (see table below)
threshold              0.5        Minimum |weight| to keep an edge
strategy               None       Custom AssociationStrategy instance (overrides method)
cardinality_threshold  0.95       Exclude columns with nunique / nrows >= threshold

Preprocessing

Before computing associations, fit() automatically excludes:

  • Datetime columns (not suitable for correlation)
  • High-cardinality columns where nunique / nrows >= cardinality_threshold (e.g. IDs, UUIDs)

Excluded columns are stored in net.excluded_columns_ after fitting:

net = CorrelationNetwork(method="auto", threshold=0.3)
net.fit(df)
print(net.excluded_columns_)  # ['timestamp', 'row_id']
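
The two exclusion rules can be sketched as follows. find_excluded_columns is a hypothetical helper, not part of the library's API. It applies the cardinality rule literally to every non-datetime column, so a float column whose values are all distinct would also be dropped; whether fit() limits that rule to particular dtypes is not stated here.

```python
import pandas as pd


def find_excluded_columns(df: pd.DataFrame, cardinality_threshold: float = 0.95) -> list[str]:
    """Hypothetical sketch of the datetime and high-cardinality exclusion rules."""
    excluded = []
    n_rows = len(df)
    for col in df.columns:
        s = df[col]
        if pd.api.types.is_datetime64_any_dtype(s):
            excluded.append(col)  # rule 1: datetime columns (tz-aware included)
        elif n_rows and s.nunique() / n_rows >= cardinality_threshold:
            excluded.append(col)  # rule 2: near-unique columns such as IDs
    return excluded


df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=6, freq="D"),
    "row_id": [f"r{i}" for i in range(6)],           # 6 unique / 6 rows = 1.0
    "temp": [20.0, 21.0, 20.0, 22.0, 21.0, 20.0],    # 3 unique / 6 rows = 0.5
    "city": ["a", "b", "a", "b", "a", "b"],          # 2 unique / 6 rows
})
print(find_excluded_columns(df))
```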

Association Methods

Method              Strategy class             Column types             Output range
pearson             PearsonStrategy            numeric-numeric          [-1, 1]
spearman            SpearmanStrategy           numeric-numeric          [-1, 1]
kendall             KendallStrategy            numeric-numeric          [-1, 1]
cramers_v           CramersVStrategy           categorical-categorical  [0, 1]
correlation_ratio   CorrelationRatioStrategy   categorical-numeric      [0, 1]
mutual_information  MutualInformationStrategy  any                      [0, 1]
phik                PhiKStrategy               any                      [0, 1]
auto                AutoAssociationStrategy    auto-detected            varies
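
For intuition about the two categorical measures in the table, here is a sketch of the textbook formulas: Cramer's V = sqrt(chi2 / (n * (min(r, c) - 1))) from an r x c contingency table, and the correlation ratio eta = sqrt(SS_between / SS_total) for a categorical-numeric pair. Both functions are hypothetical illustrations. Per the dependency table, the library uses SciPy's chi-squared test and may apply a bias correction; this sketch computes chi-squared directly with NumPy and applies none.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Uncorrected Cramer's V between two categorical series (hypothetical helper)."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    r, c = observed.shape
    if min(r, c) < 2:
        return 0.0  # a single category carries no association
    # Expected counts under independence: row_total * col_total / n.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))


def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Correlation ratio eta = sqrt(SS_between / SS_total) (hypothetical helper)."""
    df = pd.DataFrame({"cat": categories, "val": values}).dropna()
    grand_mean = df["val"].mean()
    ss_total = ((df["val"] - grand_mean) ** 2).sum()
    if ss_total == 0:
        return 0.0  # constant numeric column
    groups = df.groupby("cat", observed=True)["val"]
    # Between-group sum of squares: group size times squared deviation of group mean.
    ss_between = (groups.count() * (groups.mean() - grand_mean) ** 2).sum()
    return float(np.sqrt(ss_between / ss_total))
```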

Auto mode

When method="auto", the library detects column dtypes and dispatches:

Pair type                Default strategy
numeric-numeric          Spearman
categorical-categorical  Cramer's V
mixed                    Correlation Ratio

net = CorrelationNetwork(method="auto", threshold=0.3)
graph = net.fit_transform(df)
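
The dispatch rule above amounts to classifying each column and then choosing a measure per pair. The sketch below is a hypothetical illustration, not AutoAssociationStrategy itself; treating boolean columns as categorical is an assumption.

```python
from itertools import combinations

import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def column_kind(s: pd.Series) -> str:
    # Assumption: booleans count as categorical; every other numeric dtype is numeric.
    if is_numeric_dtype(s) and not is_bool_dtype(s):
        return "numeric"
    return "categorical"


def pick_measure(a: pd.Series, b: pd.Series) -> str:
    """Hypothetical per-pair dispatch mirroring the auto-mode table."""
    kinds = {column_kind(a), column_kind(b)}
    if kinds == {"numeric"}:
        return "spearman"
    if kinds == {"categorical"}:
        return "cramers_v"
    return "correlation_ratio"  # one numeric and one categorical column


df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0],
    "y": [3, 1, 2],
    "color": ["r", "g", "b"],
    "flag": [True, False, True],
})
plan = {(a, b): pick_measure(df[a], df[b]) for a, b in combinations(df.columns, 2)}
print(plan)
```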

Custom strategies

Any object implementing the AssociationStrategy protocol (compute(df) -> pd.DataFrame) can be used:

net = CorrelationNetwork(strategy=my_custom_strategy, threshold=0.3)
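
As an example of a measure outside the built-in list, the sketch below implements distance correlation, which lies in [0, 1] and picks up non-linear dependence (its population value is zero only when the variables are independent). DistanceCorrelationStrategy and the local AssociationStrategyLike protocol are hypothetical; the protocol only mirrors the compute(df) -> pd.DataFrame shape described above. Each pair builds two n x n distance matrices, so memory grows quadratically with the number of rows.

```python
from itertools import combinations
from typing import Protocol, runtime_checkable

import numpy as np
import pandas as pd


@runtime_checkable
class AssociationStrategyLike(Protocol):
    # Hypothetical local stand-in for the documented shape: compute(df) -> pd.DataFrame.
    def compute(self, df: pd.DataFrame) -> pd.DataFrame: ...


def _distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-centre each distance matrix (subtract row and column means, add grand mean).
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = max((A * B).mean(), 0.0)  # clip tiny negative rounding errors
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0  # at least one column is constant
    return float(np.sqrt(dcov2 / denom))


class DistanceCorrelationStrategy:
    """Hypothetical custom strategy: distance correlation on numeric columns only."""

    def compute(self, df: pd.DataFrame) -> pd.DataFrame:
        num = df.select_dtypes("number")
        cols = list(num.columns)
        out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
        for i, j in combinations(range(len(cols)), 2):
            pair = num.iloc[:, [i, j]].dropna().to_numpy(dtype=float)
            value = _distance_correlation(pair[:, 0], pair[:, 1])
            out.iat[i, j] = out.iat[j, i] = value
        return out


x = np.linspace(-1, 1, 101)
df = pd.DataFrame({"x": x, "lin": 2 * x + 1, "sq": x**2, "label": ["k"] * 101})
print(DistanceCorrelationStrategy().compute(df).round(3))
```

Here x and x**2 have a Pearson correlation of essentially zero but a clearly positive distance correlation. Assuming it satisfies the library's protocol, such an object would be passed the same way as above, e.g. CorrelationNetwork(strategy=DistanceCorrelationStrategy(), threshold=0.3).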

Centrality Analysis

Method          Description
degree()        Normalized degree: number of neighbors divided by (n - 1)
strength()      Sum of absolute edge weights
betweenness()   How often the node lies on shortest paths between other nodes (uses absolute weights)
eigenvector()   Influence based on connections to high-scoring nodes
closeness()     Inverse average distance to all other nodes (uses absolute weights)
summary()       DataFrame with all metrics combined

analyzer = CentralityAnalyzer(graph)

analyzer.degree()       # pd.Series, sorted descending
analyzer.summary()      # pd.DataFrame with all 5 metrics
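
The degree() and strength() definitions in the table above can be reproduced by hand from an edge list, which is a handy cross-check on small graphs. degree_and_strength is a hypothetical helper that does not use NetworkX; for comparison, NetworkX's degree_centrality applies the same 1 / (n - 1) normalization.

```python
import pandas as pd


def degree_and_strength(nodes, edges):
    """Hypothetical helper: normalized degree and strength for an undirected graph.

    edges is an iterable of (u, v, weight) tuples; self-loops are not expected.
    """
    nodes = list(nodes)
    neighbours = {n: set() for n in nodes}
    strength = dict.fromkeys(nodes, 0.0)
    for u, v, w in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
        # Negative associations contribute by magnitude, as in strength().
        strength[u] += abs(w)
        strength[v] += abs(w)
    n = len(nodes)
    degree = {k: (len(s) / (n - 1) if n > 1 else 0.0) for k, s in neighbours.items()}
    return (
        pd.Series(degree, name="degree").sort_values(ascending=False),
        pd.Series(strength, name="strength").sort_values(ascending=False),
    )


deg, stren = degree_and_strength(
    ["a", "b", "c", "d"],
    [("a", "b", 0.8), ("a", "c", -0.6), ("b", "c", 0.5)],
)
print(deg, stren, sep="\n")
```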

Visualization

Parameter             Default       Description
layout                kamada_kawai  Layout algorithm: kamada_kawai, spring, circular, shell, or spectral
centrality_metric     None          Scale node size by centrality (degree, strength, etc.)
figsize               (10, 8)       Figure size in inches
dpi                   100           Figure resolution (dots per inch)
node_color            tab:blue      Node color (ignored when community_colors=True)
community_colors      False         Color nodes by greedy modularity community
edge_cmap             RdYlGn        Continuous colormap for edges (None for two-color mode)
positive_edge_color   darkgreen     Positive edge color (two-color mode only)
negative_edge_color   tab:red       Negative edge color (two-color mode only)
font_size             10            Font size for node labels
edge_label_font_size  8             Font size for edge weight labels
show_edge_labels      True          Display edge weight labels
title                 None          Plot title

viz = NetworkVisualizer(graph)

fig = viz.plot(
    layout="kamada_kawai",
    centrality_metric="degree",
    community_colors=True,
    edge_cmap="RdYlGn",
    title="My Network",
)

viz.save("network.png", dpi=150, community_colors=True)

Testing

# Run all tests
uv run pytest

# Run a specific test file
uv run pytest tests/unit/test_network.py -v

# Run tests matching a pattern
uv run pytest -k "centrality" -v

# Run with coverage
uv run pytest --cov=src/correlation_network

# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

About

Build and analyze correlation and association networks from pandas DataFrames. Supports various correlation and association algorithms. Provides centrality analysis and visualization via NetworkX.
