Python library that builds association networks from pandas DataFrames. Computes pairwise association measures (correlations, Cramer's V, mutual information, etc.), applies threshold filtering, and produces NetworkX graphs with centrality analysis and visualization.
- Association methods: Pearson, Spearman, Kendall, Cramer's V, Correlation Ratio, Mutual Information, PhiK
- Auto-detection: selects the right measure based on column types (numeric, categorical, mixed)
- API: scikit-learn-style `fit()`/`transform()`/`fit_transform()` pipeline
- Centrality: degree, strength, betweenness, eigenvector, closeness, available individually or as a summary table
- Visualization: continuous colormap, community detection coloring, centrality-scaled nodes, per-edge alpha
- Extensible: any object implementing the `AssociationStrategy` protocol can be plugged in
correlation-network/
├── src/correlation_network/
│   ├── __init__.py          # Public API exports
│   ├── network.py           # CorrelationNetwork (main entry point)
│   ├── centrality.py        # CentralityAnalyzer (5 metrics + summary)
│   ├── visualization.py     # NetworkVisualizer (5 layouts, colormap, community colors)
│   ├── exceptions.py        # NotFittedError
│   ├── _types.py            # Literal type aliases (AssociationMethod, CentralityMetric, LayoutAlgorithm)
│   ├── _validation.py       # check_is_fitted guard
│   ├── _graph_utils.py      # Shared graph utilities (abs_weight_graph)
│   └── association/
│       ├── __init__.py      # Re-exports all strategies
│       ├── _base.py         # AssociationStrategy (Protocol)
│       ├── _correlation.py  # Pearson, Spearman, Kendall
│       ├── _categorical.py  # Cramer's V, Correlation Ratio (eta)
│       ├── _universal.py    # PhiK, Mutual Information
│       └── _auto.py         # Auto-detection + dispatch
├── tests/
│   ├── conftest.py          # Shared fixtures (numeric, categorical, mixed DataFrames)
│   ├── unit/                # One test file per module
│   └── integration/         # End-to-end pipeline tests
├── pyproject.toml
└── LICENSE                  # MIT
| Package | Description | Link |
|---|---|---|
| NetworkX | Graph data structures, algorithms, and layout | Docs |
| pandas | DataFrame input and association matrix storage | Docs |
| NumPy | Array operations for matrix computation | Docs |
| SciPy | Chi-squared test (Cramer's V), entropy (Mutual Information) | Docs |
| Matplotlib | Graph rendering, colormaps, and figure export | Docs |
Optional extras:
| Extra | Description | Link |
|---|---|---|
| `phik` | PhiK correlation (works on any column type pair) | Docs |
| `polars` | Accept Polars DataFrames as input (auto-converted to pandas) | Docs |
| `notebook` | Jupyter notebook support (ipykernel) | Docs |
Requires Python >= 3.10 and uv.
git clone https://github.com/andrea-cadeddu/correlation-network.git
cd correlation-network
uv sync

With optional extras:

uv sync --extra phik # PhiK correlation support
uv sync --extra polars # Polars DataFrame support
uv sync --all-extras # All optional dependencies

The library follows a pipeline pattern: DataFrame in, graph out.
graph LR
A[DataFrame] -->|fit| B[Association Matrix]
B -->|transform| C[nx.Graph]
C --> D[CentralityAnalyzer]
C --> E[NetworkVisualizer]
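As a rough illustration of the two stages in the diagram, the sketch below computes an association matrix with pandas and then keeps only the pairs whose absolute value reaches the threshold. It is not the library's implementation: `build_graph` is a hypothetical helper, and the inclusive `>=` comparison is an assumption based on the threshold being described as a minimum.

```python
import networkx as nx
import pandas as pd


def build_graph(matrix: pd.DataFrame, threshold: float) -> nx.Graph:
    """Turn a square association matrix into a thresholded, weighted graph."""
    graph = nx.Graph()
    graph.add_nodes_from(matrix.columns)  # keep isolated columns as nodes
    cols = list(matrix.columns)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:  # upper triangle only: no self-loops, no duplicates
            weight = float(matrix.loc[a, b])
            if abs(weight) >= threshold:  # assumption: the threshold is inclusive
                graph.add_edge(a, b, weight=weight)  # the sign is preserved
    return graph


df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4, 5],
        "y": [2, 4, 6, 8, 10],  # rises with x
        "z": [5, 4, 3, 2, 1],   # falls as x rises
        "w": [3, 1, 4, 1, 5],   # only weakly related to x
    }
)

matrix = df.corr(method="spearman")         # "fit": DataFrame -> association matrix
graph = build_graph(matrix, threshold=0.9)  # "transform": matrix -> nx.Graph
print(sorted(graph.edges(data="weight")))
```

Strong negative associations survive the filter because the comparison uses the absolute value, while the stored edge weight keeps its sign.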
| Concept | Implementation | Module |
|---|---|---|
| Pluggable metrics | `AssociationStrategy` Protocol | `association/_base.py` |
| Metric dispatch | `_STRATEGY_REGISTRY` dict | `network.py` |
| Fit/transform pipeline | scikit-learn-style API | `network.py` |
| Graph centrality | NetworkX algorithms wrapper | `centrality.py` |
| Negative weight handling | `abs_weight_graph()` helper | `_graph_utils.py` |
| Fitted state guard | `check_is_fitted` decorator | `_validation.py` |
| Rendering | Matplotlib + NetworkX drawing | `visualization.py` |
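The first two rows of the table (a protocol for pluggable metrics and a dict for dispatch) can be sketched roughly as follows. This assumes the protocol consists of a single `compute(df) -> pd.DataFrame` method; the class names, the `STRATEGY_REGISTRY` contents, and `resolve_strategy` are hypothetical and only illustrate the pattern, not the library's internals.

```python
from typing import Optional, Protocol

import pandas as pd


class AssociationStrategy(Protocol):
    """Assumed protocol shape: one method returning a square matrix."""

    def compute(self, df: pd.DataFrame) -> pd.DataFrame: ...


class PearsonSketch:
    """Hypothetical strategy; satisfies the protocol structurally, no inheritance."""

    def compute(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.corr(method="pearson")


class SpearmanSketch:
    """Second hypothetical strategy, used to show name-based dispatch."""

    def compute(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.corr(method="spearman")


# Illustrative registry: maps a method name to a strategy class.
STRATEGY_REGISTRY = {
    "pearson": PearsonSketch,
    "spearman": SpearmanSketch,
}


def resolve_strategy(
    method: str, strategy: Optional[AssociationStrategy] = None
) -> AssociationStrategy:
    """Return the explicit strategy if given, otherwise look the name up."""
    if strategy is not None:  # a custom strategy overrides `method`
        return strategy
    try:
        return STRATEGY_REGISTRY[method]()
    except KeyError:
        raise ValueError(f"Unknown method: {method!r}") from None


df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 9]})
matrix = resolve_strategy("spearman").compute(df)
print(matrix)
```

Because the protocol is structural, a user-defined class only needs a matching `compute` method to be accepted; it does not have to import or subclass anything from the package.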
import pandas as pd
from correlation_network import CorrelationNetwork, CentralityAnalyzer, NetworkVisualizer

df = pd.read_csv("data.csv")  # any pandas DataFrame works here

# Build the network (default: method="pearson", threshold=0.5)
net = CorrelationNetwork(method="spearman", threshold=0.3)
graph = net.fit_transform(df)
# Analyze centrality
analyzer = CentralityAnalyzer(graph)
rankings = analyzer.summary()
print(rankings)
# Visualize
viz = NetworkVisualizer(graph)
fig = viz.plot(
    centrality_metric="degree",
    community_colors=True,
    title="Association Network",
)

viz.save("network.png", dpi=150, community_colors=True, title="My Network")

# Use a correlation matrix you already have
net = CorrelationNetwork.from_matrix(my_matrix, threshold=0.5)
graph = net.transform()

import polars as pl
df_polars = pl.read_csv("data.csv")
graph = CorrelationNetwork(method="pearson", threshold=0.5).fit_transform(df_polars)

| Parameter | Default | Description |
|---|---|---|
| `method` | `"pearson"` | Association method (see table below) |
| `threshold` | `0.5` | Minimum absolute edge weight needed to keep an edge |
| `strategy` | `None` | Custom `AssociationStrategy` instance (overrides `method`) |
| `cardinality_threshold` | `0.95` | Exclude columns with `nunique / nrows >= cardinality_threshold` |
Before computing associations, fit() automatically excludes:
- Datetime columns (not suitable for correlation)
- High-cardinality columns where `nunique / nrows >= cardinality_threshold` (e.g. IDs, UUIDs)
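The two exclusion rules above can be approximated with plain pandas. In this sketch, `find_excluded_columns` is a hypothetical helper, and applying the cardinality ratio only to non-numeric columns is an assumption made here (continuous numeric columns are naturally almost all unique); the library may decide differently.

```python
import pandas as pd


def find_excluded_columns(
    df: pd.DataFrame, cardinality_threshold: float = 0.95
) -> list:
    """Return the names of datetime and high-cardinality columns."""
    excluded = []
    n_rows = len(df)
    for name in df.columns:
        column = df[name]
        if pd.api.types.is_datetime64_any_dtype(column):
            excluded.append(name)  # rule 1: datetime columns
        elif pd.api.types.is_numeric_dtype(column):
            continue  # assumption: numeric columns skip the ratio check
        elif n_rows > 0 and column.nunique() / n_rows >= cardinality_threshold:
            excluded.append(name)  # rule 2: ID-like columns
    return excluded


df = pd.DataFrame(
    {
        "timestamp": pd.date_range("2024-01-01", periods=6, freq="D"),
        "row_id": [f"id-{i}" for i in range(6)],  # 6 unique values in 6 rows
        "group": ["a", "b", "a", "b", "a", "b"],  # 2 unique values in 6 rows
        "value": [0.1, 0.4, 0.2, 0.9, 0.5, 0.7],
    }
)

excluded = find_excluded_columns(df)
kept = df.drop(columns=excluded)
print(excluded, list(kept.columns))
```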
Excluded columns are stored in net.excluded_columns_ after fitting:
net = CorrelationNetwork(method="auto", threshold=0.3)
net.fit(df)
print(net.excluded_columns_) # ['timestamp', 'row_id']

| Method | Strategy Class | Column Types | Output Range |
|---|---|---|---|
| `pearson` | `PearsonStrategy` | numeric vs. numeric | [-1, 1] |
| `spearman` | `SpearmanStrategy` | numeric vs. numeric | [-1, 1] |
| `kendall` | `KendallStrategy` | numeric vs. numeric | [-1, 1] |
| `cramers_v` | `CramersVStrategy` | categorical vs. categorical | [0, 1] |
| `correlation_ratio` | `CorrelationRatioStrategy` | categorical vs. numeric | [0, 1] |
| `mutual_information` | `MutualInformationStrategy` | any | [0, 1] |
| `phik` | `PhiKStrategy` | any | [0, 1] |
| `auto` | `AutoAssociationStrategy` | auto-detects types | varies |
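To make the [0, 1] range of `cramers_v` concrete, here is a minimal sketch of the textbook statistic, V = sqrt(chi2 / (n * (min(rows, cols) - 1))). It computes chi-squared by hand with NumPy so the example stays self-contained, and it applies no bias correction; whether `CramersVStrategy` uses a correction is not stated here, so treat this as an approximation of the idea rather than the library's code.

```python
import numpy as np
import pandas as pd


def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Uncorrected Cramer's V computed from a contingency table."""
    observed = pd.crosstab(x, y).to_numpy(dtype=float)
    n = observed.sum()
    row_totals = observed.sum(axis=1, keepdims=True)
    col_totals = observed.sum(axis=0, keepdims=True)
    expected = row_totals @ col_totals / n  # counts expected under independence
    chi2 = ((observed - expected) ** 2 / expected).sum()
    min_dim = min(observed.shape) - 1
    if min_dim == 0:  # a constant column carries no association
        return 0.0
    return float(np.sqrt(chi2 / (n * min_dim)))


# Perfectly associated: every "red" is "S" and every "blue" is "L".
color = pd.Series(["red", "red", "blue", "blue", "red", "blue"])
size = pd.Series(["S", "S", "L", "L", "S", "L"])
v_dependent = cramers_v(color, size)

# Independent: each combination appears exactly once.
left = pd.Series(["a", "a", "b", "b"])
right = pd.Series(["u", "v", "u", "v"])
v_independent = cramers_v(left, right)

print(v_dependent, v_independent)
```

The two toy inputs sit at the ends of the range: full dependence gives 1 and exact independence gives 0.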
When method="auto", the library detects column dtypes and dispatches:
| Pair type | Default strategy |
|---|---|
| numeric vs. numeric | Spearman |
| categorical vs. categorical | Cramer's V |
| mixed | Correlation Ratio |
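The dispatch table above amounts to a small decision rule per column pair. The sketch below reproduces that rule with pandas dtype checks; `pair_strategy` is a hypothetical function, and treating every non-numeric column as categorical is a simplifying assumption.

```python
from itertools import combinations

import pandas as pd


def pair_strategy(left: pd.Series, right: pd.Series) -> str:
    """Pick a method name for one column pair based on dtypes."""
    left_numeric = pd.api.types.is_numeric_dtype(left)
    right_numeric = pd.api.types.is_numeric_dtype(right)
    if left_numeric and right_numeric:
        return "spearman"
    if not left_numeric and not right_numeric:
        return "cramers_v"  # assumption: non-numeric means categorical
    return "correlation_ratio"  # one numeric, one categorical


df = pd.DataFrame(
    {
        "age": [23, 35, 41, 52],
        "income": [30.5, 48.0, 61.2, 75.9],
        "city": ["Rome", "Milan", "Rome", "Turin"],
        "plan": ["basic", "pro", "pro", "basic"],
    }
)

choices = {
    (a, b): pair_strategy(df[a], df[b]) for a, b in combinations(df.columns, 2)
}
for pair, method in choices.items():
    print(pair, "->", method)
```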
net = CorrelationNetwork(method="auto", threshold=0.3)
graph = net.fit_transform(df)

Any object implementing the `AssociationStrategy` protocol (`compute(df) -> pd.DataFrame`) can be used:

net = CorrelationNetwork(strategy=my_custom_strategy, threshold=0.3)

| Method | Description |
|---|---|
| `degree()` | Normalized degree: connected neighbors as a fraction of (n-1) |
| `strength()` | Sum of absolute edge weights |
| `betweenness()` | Frequency on shortest paths (uses absolute weights) |
| `eigenvector()` | Influence based on connections to high-scoring nodes |
| `closeness()` | Inverse average distance to all other nodes (uses absolute weights) |
| `summary()` | DataFrame with all metrics combined |
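For intuition about the first two rows, the sketch below computes normalized degree and strength directly with NetworkX on a small hand-built graph. It only illustrates the definitions in the table and makes no claim about how `CentralityAnalyzer` is implemented; the graph and its weights are made up.

```python
import networkx as nx
import pandas as pd

graph = nx.Graph()
graph.add_weighted_edges_from(
    [("a", "b", 0.8), ("a", "c", -0.6), ("b", "c", 0.5), ("c", "d", -0.9)]
)

# Normalized degree: number of neighbors divided by (n - 1).
degree = pd.Series(nx.degree_centrality(graph)).sort_values(ascending=False)

# Strength: sum of absolute edge weights, so negative associations still count.
strength = pd.Series(
    {
        node: sum(abs(data["weight"]) for _, _, data in graph.edges(node, data=True))
        for node in graph.nodes
    }
).sort_values(ascending=False)

print(degree)
print(strength)
```

Node `c` touches every other node, so its normalized degree is 1.0, and its strength adds 0.6, 0.5, and 0.9 regardless of sign.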
analyzer = CentralityAnalyzer(graph)
analyzer.degree() # pd.Series, sorted descending
analyzer.summary() # pd.DataFrame with all 5 metrics

| Parameter | Default | Description |
|---|---|---|
| `layout` | `kamada_kawai` | Layout algorithm (`kamada_kawai`, `spring`, `circular`, `shell`, or `spectral`) |
| `centrality_metric` | `None` | Scale node size by centrality (`degree`, `strength`, etc.) |
| `figsize` | `(10, 8)` | Figure size in inches |
| `dpi` | `100` | Figure resolution |
| `node_color` | `tab:blue` | Node color (ignored when `community_colors=True`) |
| `community_colors` | `False` | Color nodes by greedy modularity community |
| `edge_cmap` | `RdYlGn` | Continuous colormap for edges (`None` for two-color mode) |
| `positive_edge_color` | `darkgreen` | Positive edge color (two-color mode only) |
| `negative_edge_color` | `tab:red` | Negative edge color (two-color mode only) |
| `font_size` | `10` | Font size for node labels |
| `edge_label_font_size` | `8` | Font size for edge weight labels |
| `show_edge_labels` | `True` | Display edge weight labels |
| `title` | `None` | Plot title |
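The `community_colors` option is described as coloring nodes by greedy modularity community. A rough sketch of that idea, using NetworkX's `greedy_modularity_communities` and a hypothetical four-color palette, is shown below; the unweighted call and the palette are assumptions, not the visualizer's actual settings.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

graph = nx.Graph()
graph.add_edges_from(
    [
        ("a", "b"), ("b", "c"), ("a", "c"),  # first triangle
        ("x", "y"), ("y", "z"), ("x", "z"),  # second triangle
        ("c", "x"),                          # single bridge between them
    ]
)

# Each community is a frozenset of node names.
communities = greedy_modularity_communities(graph)

palette = ["tab:blue", "tab:orange", "tab:green", "tab:red"]  # hypothetical
node_to_color = {
    node: palette[index % len(palette)]
    for index, members in enumerate(communities)
    for node in members
}

# One color per node, in graph order, ready for a drawing call.
node_colors = [node_to_color[node] for node in graph.nodes]
print(node_to_color)
```

A list like `node_colors` can be passed as the `node_color` argument of `nx.draw_networkx_nodes`.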
viz = NetworkVisualizer(graph)
fig = viz.plot(
    layout="kamada_kawai",
    centrality_metric="degree",
    community_colors=True,
    edge_cmap="RdYlGn",
    title="My Network",
)
viz.save("network.png", dpi=150, community_colors=True)

# Run all tests
uv run pytest
# Run a specific test file
uv run pytest tests/unit/test_network.py -v
# Run tests matching a pattern
uv run pytest -k "centrality" -v
# Run with coverage
uv run pytest --cov=src/correlation_network
# Lint and format
uv run ruff check src/ tests/
uv run ruff format src/ tests/

- Newman, M. E. J. (2010). Networks: An Introduction. Oxford University Press.
- Cramér, H. (1946). Mathematical Methods of Statistics. Princeton University Press.
- Bergsma, W. (2018). A new correlation coefficient. arXiv:1712.05289 — PhiK foundation.
- Reshef, D. et al. (2011). Detecting Novel Associations in Large Data Sets. Science.