From 3b3084766ca26233117aa735849a9b580443b243 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Thu, 4 Dec 2025 18:12:27 +0900
Subject: [PATCH 01/12] processor_tda: Add documentation for processor_tda

Signed-off-by: Hiroshi Hatake
---
 pipeline/processors.md     |   1 +
 pipeline/processors/tda.md | 283 +++++++++++++++++++++++++++++++++++++
 2 files changed, 284 insertions(+)
 create mode 100644 pipeline/processors/tda.md

diff --git a/pipeline/processors.md b/pipeline/processors.md
index a61cf454e..43e17908d 100644
--- a/pipeline/processors.md
+++ b/pipeline/processors.md
@@ -21,6 +21,7 @@ Fluent Bit offers the following processors:
 - [Sampling](./processors/sampling.md): Apply head or tail sampling to incoming traces.
 - [SQL](./processors/sql.md): Use SQL queries to extract log content.
 - [Filters as processors](./processors/filters.md): Use filters as processors.
+- [TDA](./processors/tda.md): Do Topological Data Analysis calculations.

 ## Features

diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md
new file mode 100644
index 000000000..a233d3e66
--- /dev/null
+++ b/pipeline/processors/tda.md
@@ -0,0 +1,283 @@
+# TDA
+
+The `tda` processor applies **Topological Data Analysis (TDA)** – specifically, **persistent homology** – to Fluent Bit’s metrics stream and exports **Betti numbers** that summarize the shape of recent behavior in metric space.
+
+This processor is intended for detecting **phase transitions**, **regime changes**, and **intermittent instabilities** that are hard to see from individual counters, gauges, or standard statistical aggregates. It can, for example, differentiate between a single, one-off failure and an extended period of intermittent failures where the system never settles into a stable regime.
+
+Currently, `tda` works only in the **metrics pipeline** (`processors.metrics`).
+
+---
+
+## Configuration parameters
+
+The `tda` processor supports the following configuration parameters:
+
+| Key | Description | Default |
+| ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
+| `window_size` | Number of samples to keep in the TDA sliding window. This controls how far back in time the topology is estimated. | `60` |
+| `min_points` | Minimum number of samples required in the window before running TDA. Until this limit is reached, no Betti metrics are emitted. | `10` |
+| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by Takens’ theorem. | `3` |
+| `embed_delay` | Delay `τ` in samples between successive lags used in delay embedding. | `1` |
+| `threshold` | Distance scale selector. `0` enables an automatic **multi-quantile scan** across several candidate thresholds; a value in `(0, 1)` is interpreted as a single quantile used to pick the Rips radius. | `0` |
+
+All parameters are optional; defaults are suitable as a starting point for many workloads.
+
+---
+
+## How it works
+
+### 1. Metric aggregation and normalization
+
+On each metrics flush, `tda`:
+
+1. **Groups metrics by `(namespace, subsystem)`**
+   All counters, gauges, and untyped metrics are traversed. For each `cmt_map`, the pair `(ns, subsystem)` is hashed and assigned a **feature index**. This produces a fixed-dimensional feature vector of length `feature_dim` (number of `(ns, subsystem)` groups).
+
+2. **Aggregates values per group**
+   For each group, all static and labeled metrics are summed into the corresponding feature dimension.
+
+3. **Converts counters to approximate rates**
+   The processor keeps the previous raw snapshot `last_vec` and timestamp `last_ts`. For each dimension:
+
+   * `diff = now_raw - prev_raw`
+   * `dt_sec = (ts_now - ts_prev) / 1e9`
+   * `rate = diff / dt_sec`
+     A safeguard ensures `dt_sec > 0`.
+
+4. **Applies signed `log1p` normalization**
+   To stabilize very different magnitudes and bursty traffic, each rate is mapped to
+   `norm = log1p(|rate|)`, and the sign of `rate` is reattached. This yields a vector that is roughly scale-invariant but still sensitive to relative changes in rates across groups.
+
+The resulting normalized vector is written into a **ring buffer window** (`tda_window`), implemented via a lightweight circular buffer (`lwrb`) that stores timestamped samples. The window maintains at most `window_size` samples; older samples are dropped when the buffer is full.
+
+### 2. Sliding window and delay embedding
+
+Let the ring buffer contain `n_raw` samples and the feature dimension be `D = feature_dim`. To capture temporal structure, `tda` supports an optional **delay embedding**:
+
+* Embedding dimension: `m = embed_dim` (forced to `1` if `embed_dim <= 0`)
+* Lag (in integer samples): `τ = embed_delay` (ignored when `m = 1`)
+
+For each valid time index `t`, a reconstructed state vector is built as:
+
+$$
+x_t ;\to; (x_t,; x_{t-\tau},; \dots,; x_{t-(m-1)\tau})
+$$
+
+where each `x_·` is the **D-dimensional normalized metrics vector** at that time. This yields embedded points in (\mathbb{R}^{mD}).
+
+Because we need all lags to be inside the window, the number of embedded points is:
+
+$$
+n_{\text{embed}} = n_{\text{raw}} - (m - 1)\tau
+$$
+
+If `n_raw < (m − 1)τ + 1`, TDA is skipped until enough data has accumulated.
+
+This embedding follows the idea of **Takens’ theorem**, which states that, under mild conditions, the dynamics of a system can be reconstructed from delay-embedded observations of a single time series or a low-dimensional observable [2]. In this plugin, the observable is the multi-dimensional vector of aggregated metrics.
+
+Intuitively:
+
+* `embed_dim = 1`: you see only the current “snapshot” geometry.
+* `embed_dim > 1`: you expose **loops and recurrent trajectories** in the joint evolution of metrics, which later show up as **H₁ (Betti₁) features**.
+
+### 3. Distance matrix construction
+
+For the embedded points $ x_i \in \mathbb{R}^{mD} $ (`i = 0..n_embed-1`), `tda` builds a **dense Euclidean distance matrix**:
+
+$$
+d(i, j) = \left| x_i - x_j \right|_2
+$$
+
+The implementation iterates over all pairs `(i, j)` with `i > j`, accumulates squared differences across both feature dimensions and lags, and then takes the square root; the matrix is stored symmetrically with zeros on the diagonal.
+
+### 4. Threshold selection (Rips scale)
+
+Persistent homology requires a **scale parameter** (Rips radius / distance threshold). The plugin supports two modes:
+
+1. **Automatic multi-quantile scan** (`threshold = 0`, default)
+
+   * The off-diagonal distances are collected, sorted, and several quantiles are evaluated, e.g. `q ∈ {0.10, 0.20, …, 0.90}`.
+   * For each candidate quantile `q`, a threshold `r_q` is chosen and Betti numbers are computed using Ripser.
+   * The plugin prefers the scale where **Betti₁** (loops) is maximized; if all Betti₁ are zero, it falls back to Betti₀ as a secondary indicator.
+
+2. **Fixed quantile mode** (`0 < threshold < 1`)
+
+   * `threshold` is interpreted as a single quantile `q`. The Rips radius is set at this quantile of all pairwise distances.
+   * The multi-quantile scan still runs internally for robustness, but reported diagnostics (e.g., debug logs) will reflect the user-selected quantile.
+
+Internally, quantile selection is handled by `tda_choose_threshold_from_dist`, which gathers all `i > j` entries of the distance matrix, sorts them, and picks the specified quantile index.
+
+### 5. Persistent homology via Ripser
+
+Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **Ripser**, a well-known implementation of Vietoris–Rips persistent homology:
+
+1. **Compression and C API**
+
+   * The dense `n_embed × n_embed` matrix is converted into Ripser’s `compressed_lower_distance_matrix`.
+   * The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs Ripser up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in (\mathbb{Z}/2\mathbb{Z}), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features.
+
+2. **Interval aggregation**
+
+   * A callback (`interval_recorder`) receives all persistence intervals ((\text{birth}, \text{death})) from Ripser.
+   * Intervals with very small persistence are filtered out, and the remaining ones are counted per homology dimension to form Betti numbers.
+
+3. **Multi-scale selection**
+
+   * For each candidate threshold, Betti numbers are computed.
+   * The “best” scale is chosen as the one with the largest Betti₁ (loops); if Betti₁ is zero across scales, the plugin picks the scale where Betti₀ is largest.
+   * The corresponding Betti₀, Betti₁, and Betti₂ values are then exported as Fluent Bit gauges.
+
+### 6. Exported metrics
+
+`tda` creates (lazily) three gauge metrics in the `fluentbit_tda_*` namespace:
+
+| Metric name | Type | Description |
+| ---------------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
+| `fluentbit_tda_betti0` | gauge | Approximate Betti₀ – number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many “micro-regimes”. |
+| `fluentbit_tda_betti1` | gauge | Approximate Betti₁ – number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. |
+| `fluentbit_tda_betti2` | gauge | Approximate Betti₂ – number of 2-dimensional voids (higher-order structures). These can appear when the system explores different “surfaces” in state space, e.g., transitioning between distinct operating modes. |
+
+Each metric is timestamped with the current time at the moment of TDA computation and is exported via the same metrics context it received, so downstream metric outputs can scrape or forward them like any other Fluent Bit metric.
+
+---
+
+## Interpreting Betti numbers
+
+Topologically, Betti numbers count the number of “holes” of each dimension in a space:
+
+* **Betti₀** – connected components (0-dimensional clusters).
+* **Betti₁** – 1-dimensional holes (loops / cycles).
+* **Betti₂** – 2-dimensional voids, and so on.
+
+In our context:
+
+* The sliding window of metrics is a **point cloud in phase space**.
+* The Rips complex at a given scale connects points that are close in this space.
+* Betti numbers summarize the topology of this complex.
+
+Some practical patterns:
+
+1. **Stable regime**
+
+   * Metrics fluctuate near a single attractor.
+   * Betti₀ is small (often close to 1–few), Betti₁ and Betti₂ are typically `0` or very small.
+
+2. **Single, one-off failure**
+
+   * A brief outage or spike happens once and resolves.
+   * The embedding sees a short excursion but no sustained cycling, so Betti₁ and Betti₂ often remain near `0`.
+   * In the provided HTTP example, a single failing chunk does not significantly raise Betti₁/₂.
+
+3. **Intermittent failure / unstable regime**
+
+   * The system repeatedly bounces between “healthy” and “unhealthy” states (e.g., repeated `Connection refused` / `broken connection` errors interspersed with 200 responses).
+   * The trajectory in phase space forms **loops**: metrics move away from the healthy region and then return, many times.
+   * Betti₁ (and occasionally Betti₂) increases noticeably while this behavior persists, reflecting the emergence of non-trivial cycles in the metric dynamics.
+
+   In the sample output, as the HTTP output oscillates between success and various `Connection refused` / `broken connection` errors, `fluentbit_tda_betti1` and `fluentbit_tda_betti2` grow from small values to larger plateaus (e.g., Betti₁ around 10–13, Betti₂ around 1–2) while Betti₀ also increases. This is a direct signature of a **phase transition** from a stable regime to one with persistent, intermittent instability.
+
+These interpretations are consistent with results from condensed matter physics and dynamical systems, where persistent homology has been used to detect phase transitions and changes in underlying order purely from data [1][2].
+
+---
+
+## Configuration examples
+
+### Basic setup with `fluentbit_metrics`
+
+The following example computes TDA on Fluent Bit’s own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`:
+
+```yaml
+service:
+  http_server: On
+  http_port: 2021
+
+pipeline:
+  inputs:
+    - name: dummy
+      tag: log.raw
+      samples: 10000
+
+    - name: fluentbit_metrics
+      tag: metrics.raw
+
+      processors:
+        metrics:
+          # Optionally exclude metrics you don't want in the TDA feature vector
+          - name: metrics_selector
+            metric_name: /process_start_time_seconds/
+            action: exclude
+
+          - name: metrics_selector
+            metric_name: /build_info/
+            action: exclude
+
+          # Perform TDA on the remaining metrics
+          - name: tda
+            # window_size: 60  # optional tuning
+            # min_points: 10
+            # embed_dim: 3
+            # embed_delay: 1
+            # threshold: 0  # auto multi-quantile scan
+
+  outputs:
+    - name: stdout
+      match: '*'
+```
+
+With this configuration, you will see time series like:
+
+```text
+fluentbit_tda_betti0 = 39
+fluentbit_tda_betti1 = 7
+fluentbit_tda_betti2 = 0
+...
+fluentbit_tda_betti0 = 56
+fluentbit_tda_betti1 = 13
+fluentbit_tda_betti2 = 2
+```
+
+These Betti metrics can be scraped by Prometheus, forwarded to an observability backend, and used in alerts (for example, triggering on sudden increases in `fluentbit_tda_betti1` as a signal of emerging instability in the pipeline).
+
+### Emphasizing short-term cycles with delay embedding
+
+To focus on shorter-term cyclic behavior—for example, oscillations in retry logic and error counters—you can lower `window_size` and adjust the embedding parameters:
+
+```yaml
+processors:
+  metrics:
+    - name: tda
+      window_size: 30  # shorter temporal horizon
+      min_points: 15  # require at least half the window
+      embed_dim: 4  # look at 4 successive states
+      embed_delay: 1  # each lag = 1 metrics interval
+      threshold: 0.2  # use 20th percentile of distances
+```
+
+This configuration reconstructs the system in an effective dimension of `4 × feature_dim` and tends to highlight tight loops that occur within roughly 4–10 sampling intervals.
+
+---
+
+## When to use `tda`
+
+`tda` is particularly useful when:
+
+* You suspect **non-linear or multi-modal behavior** in your system (e.g., on/off regimes, congestion collapse, periodic retries).
+* Standard indicators (mean, percentiles, error rates) show “noise,” but you want to know whether that noise hides **coherent structure**.
+* You want to build alerts not just on “levels” of metrics, but on **changes in the topology** of system behavior – for example:
+
+  * “Raise an alert if Betti₁ remains above 5 for more than 5 minutes.”
+  * “Mark windows where Betti₂ becomes non-zero as potential phase transitions.”
+
+Because the plugin operates on an arbitrary selection of metrics (chosen upstream via `metrics_selector` or by how you configure `fluentbit_metrics`), you can tailor the TDA to focus on:
+
+* Network health (latency histograms, connection failures, TLS handshake errors),
+* Resource saturation (CPU, memory, buffer usage),
+* Pipeline-level signals (retries, DLQ usage, chunk failures),
+* Or any other metric subset that meaningfully characterizes the state of your system.
+
+---
+
+## References
+
+1. I. Donato, M. Gori, A. Sarti, “Persistent homology analysis of phase transitions,” *Physical Review E*, 93, 052138, 2016.
+2. F. Takens, “Detecting strange attractors in turbulence,” in D. Rand and L.-S. Young (eds.), *Dynamical Systems and Turbulence*, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366–381.

From 11059396f35d9b5927e19ebfd1abbb30cd381c48 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Thu, 4 Dec 2025 18:16:57 +0900
Subject: [PATCH 02/12] processor_tda: Address comments

Signed-off-by: Hiroshi Hatake
---
 pipeline/processors/tda.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md
index a233d3e66..fa0ea48aa 100644
--- a/pipeline/processors/tda.md
+++ b/pipeline/processors/tda.md
@@ -176,7 +176,7 @@ Some practical patterns:

    In the sample output, as the HTTP output oscillates between success and various `Connection refused` / `broken connection` errors, `fluentbit_tda_betti1` and `fluentbit_tda_betti2` grow from small values to larger plateaus (e.g., Betti₁ around 10–13, Betti₂ around 1–2) while Betti₀ also increases. This is a direct signature of a **phase transition** from a stable regime to one with persistent, intermittent instability.

-These interpretations are consistent with results from condensed matter physics and dynamical systems, where persistent homology has been used to detect phase transitions and changes in underlying order purely from data [1][2].
+These interpretations are consistent with results from condensed matter physics and dynamical systems, where persistent homology has been used to detect phase transitions and changes in underlying order purely from data (References 1 and 2).

 ---

@@ -279,5 +279,5 @@ Because the plugin operates on an arbitrary selection of metrics (chosen upstrea

 ## References

-1. I. Donato, M. Gori, A. Sarti, “Persistent homology analysis of phase transitions,” *Physical Review E*, 93, 052138, 2016.
-2. F. Takens, “Detecting strange attractors in turbulence,” in D. Rand and L.-S. Young (eds.), *Dynamical Systems and Turbulence*, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366–381.
+1. I. Donato, M. Gori, A. Sarti, “Persistent homology analysis of phase transitions,” _Physical Review E_, 93, 052138, 2016.
+2. F. Takens, “Detecting strange attractors in turbulence,” in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366–381.

From 155e3f94b81cf0a4ba9578e1bec49ee1b8c5de1a Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Thu, 4 Dec 2025 18:19:14 +0900
Subject: [PATCH 03/12] processor_tda: Fix a descriptions of betti_0

Signed-off-by: Hiroshi Hatake
---
 pipeline/processors/tda.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md
index fa0ea48aa..46b0bb0e9 100644
--- a/pipeline/processors/tda.md
+++ b/pipeline/processors/tda.md
@@ -160,7 +160,7 @@ Some practical patterns:
 1. **Stable regime**

    * Metrics fluctuate near a single attractor.
-   * Betti₀ is small (often close to 1–few), Betti₁ and Betti₂ are typically `0` or very small.
+   * Betti₀ is small (often close to 1–few and saturated on a long running), Betti₁ and Betti₂ are typically `0` or very small.

 2. **Single, one-off failure**

From a416f40c570b165a58adb476c4ee2fc245f499f2 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Thu, 4 Dec 2025 18:26:28 +0900
Subject: [PATCH 04/12] processor_tda: Address comments

Signed-off-by: Hiroshi Hatake
---
 pipeline/processors.md     |  2 +-
 pipeline/processors/tda.md | 39 ++++++++++++++++++++------------------
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/pipeline/processors.md b/pipeline/processors.md
index 43e17908d..59b1a5c5a 100644
--- a/pipeline/processors.md
+++ b/pipeline/processors.md
@@ -21,7 +21,7 @@ Fluent Bit offers the following processors:
 - [Sampling](./processors/sampling.md): Apply head or tail sampling to incoming traces.
 - [SQL](./processors/sql.md): Use SQL queries to extract log content.
 - [Filters as processors](./processors/filters.md): Use filters as processors.
-- [TDA](./processors/tda.md): Do Topological Data Analysis calculations.
+- [TDA](./processors/tda.md): Do Topological Data Analysis (TDA) calculations.

 ## Features

diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md
index 46b0bb0e9..191003858 100644
--- a/pipeline/processors/tda.md
+++ b/pipeline/processors/tda.md
@@ -1,9 +1,9 @@
 # TDA

-The `tda` processor applies **Topological Data Analysis (TDA)** – specifically, **persistent homology** – to Fluent Bit’s metrics stream and exports **Betti numbers** that summarize the shape of recent behavior in metric space.
-
-This processor is intended for detecting **phase transitions**, **regime changes**, and **intermittent instabilities** that are hard to see from individual counters, gauges, or standard statistical aggregates. It can, for example, differentiate between a single, one-off failure and an extended period of intermittent failures where the system never settles into a stable regime.
+The `tda` processor applies **Topological Data Analysis (TDA)**—specifically, **persistent homology**—to Fluent Bit metrics stream and exports **Betti numbers** that summarize the shape of recent behavior in metric space.

+This processor is intended for detecting **phase transitions**, **regime changes**, and **intermittent instabilities** that are difficult to detect from individual counters, gauges, or standard statistical aggregates.
+It can, for example, differentiate between a single, one-off failure and an extended period of intermittent failures where the system never settles into a stable regime.
 Currently, `tda` works only in the **metrics pipeline** (`processors.metrics`).

 ---
@@ -48,7 +48,8 @@ On each metrics flush, `tda`:
    To stabilize very different magnitudes and bursty traffic, each rate is mapped to
    `norm = log1p(|rate|)`, and the sign of `rate` is reattached. This yields a vector that is roughly scale-invariant but still sensitive to relative changes in rates across groups.

-The resulting normalized vector is written into a **ring buffer window** (`tda_window`), implemented via a lightweight circular buffer (`lwrb`) that stores timestamped samples. The window maintains at most `window_size` samples; older samples are dropped when the buffer is full.
+The resulting normalized vector is written into a **ring buffer window** (`tda_window`), implemented through a lightweight circular buffer (`lwrb`) that stores timestamped samples.
+The window maintains at most `window_size` samples; older samples are dropped when the buffer is full.

 ### 2. Sliding window and delay embedding

@@ -65,7 +66,7 @@ $$

 where each `x_·` is the **D-dimensional normalized metrics vector** at that time. This yields embedded points in (\mathbb{R}^{mD}).

-Because we need all lags to be inside the window, the number of embedded points is:
+Because all lags must be inside the window, the number of embedded points is:

 $$
 n_{\text{embed}} = n_{\text{raw}} - (m - 1)\tau
@@ -77,8 +78,8 @@ This embedding follows the idea of **Takens’ theorem**, which states that, und

 Intuitively:

-* `embed_dim = 1`: you see only the current “snapshot” geometry.
-* `embed_dim > 1`: you expose **loops and recurrent trajectories** in the joint evolution of metrics, which later show up as **H₁ (Betti₁) features**.
+* `embed_dim = 1`: only the current "snapshot" geometry is visible.
+* `embed_dim > 1`: **loops and recurrent trajectories** in the joint evolution of metrics become visible, which later show up as **H₁ (Betti₁) features**.

 ### 3. Distance matrix construction

@@ -96,24 +97,24 @@ Persistent homology requires a **scale parameter** (Rips radius / distance thres

 1. **Automatic multi-quantile scan** (`threshold = 0`, default)

-   * The off-diagonal distances are collected, sorted, and several quantiles are evaluated, e.g. `q ∈ {0.10, 0.20, …, 0.90}`.
+   * The off-diagonal distances are collected, sorted, and several quantiles are evaluated, for example `q ∈ {0.10, 0.20, …, 0.90}`.
    * For each candidate quantile `q`, a threshold `r_q` is chosen and Betti numbers are computed using Ripser.
    * The plugin prefers the scale where **Betti₁** (loops) is maximized; if all Betti₁ are zero, it falls back to Betti₀ as a secondary indicator.

 2. **Fixed quantile mode** (`0 < threshold < 1`)

    * `threshold` is interpreted as a single quantile `q`. The Rips radius is set at this quantile of all pairwise distances.
-   * The multi-quantile scan still runs internally for robustness, but reported diagnostics (e.g., debug logs) will reflect the user-selected quantile.
+   * The multi-quantile scan still runs internally for robustness, but reported diagnostics (For example, debug logs) will reflect the user-selected quantile.

 Internally, quantile selection is handled by `tda_choose_threshold_from_dist`, which gathers all `i > j` entries of the distance matrix, sorts them, and picks the specified quantile index.

-### 5. Persistent homology via Ripser
+### 5. Persistent Homology through Ripser

 Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **Ripser**, a well-known implementation of Vietoris–Rips persistent homology:

 1. **Compression and C API**

-   * The dense `n_embed × n_embed` matrix is converted into Ripser’s `compressed_lower_distance_matrix`.
+   * The dense `n_embed × n_embed` matrix is converted into Ripser's `compressed_lower_distance_matrix`.
    * The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs Ripser up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in (\mathbb{Z}/2\mathbb{Z}), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features.

 2. **Interval aggregation**
@@ -124,7 +125,7 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a
 3. **Multi-scale selection**

    * For each candidate threshold, Betti numbers are computed.
-   * The “best” scale is chosen as the one with the largest Betti₁ (loops); if Betti₁ is zero across scales, the plugin picks the scale where Betti₀ is largest.
+   * The "best" scale is chosen as the one with the largest Betti₁ (loops); if Betti₁ is zero across scales, the plugin picks the scale where Betti₀ is largest.
    * The corresponding Betti₀, Betti₁, and Betti₂ values are then exported as Fluent Bit gauges.

 ### 6. Exported metrics
@@ -133,9 +134,9 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a

 | Metric name | Type | Description |
 | ---------------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
-| `fluentbit_tda_betti0` | gauge | Approximate Betti₀ – number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many “micro-regimes”. |
-| `fluentbit_tda_betti1` | gauge | Approximate Betti₁ – number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. |
-| `fluentbit_tda_betti2` | gauge | Approximate Betti₂ – number of 2-dimensional voids (higher-order structures). These can appear when the system explores different “surfaces” in state space, e.g., transitioning between distinct operating modes. |
+| `fluentbit_tda_betti0` | gauge | Approximate Betti₀ - number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many "micro-regimes". |
+| `fluentbit_tda_betti1` | gauge | Approximate Betti₁ - number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. |
+| `fluentbit_tda_betti2` | gauge | Approximate Betti₂ - number of 2-dimensional voids (higher-order structures). These can appear when the system explores different “surfaces” in state space, e.g., transitioning between distinct operating modes. |

 Each metric is timestamped with the current time at the moment of TDA computation and is exported via the same metrics context it received, so downstream metric outputs can scrape or forward them like any other Fluent Bit metric.

@@ -170,11 +171,13 @@ Some practical patterns:

 3. **Intermittent failure / unstable regime**

-   * The system repeatedly bounces between “healthy” and “unhealthy” states (e.g., repeated `Connection refused` / `broken connection` errors interspersed with 200 responses).
+   * The system repeatedly bounces between "healthy" and "unhealthy" states (e.g., repeated `Connection refused` / `broken connection` errors interspersed with 200 responses).
    * The trajectory in phase space forms **loops**: metrics move away from the healthy region and then return, many times.
    * Betti₁ (and occasionally Betti₂) increases noticeably while this behavior persists, reflecting the emergence of non-trivial cycles in the metric dynamics.

-   In the sample output, as the HTTP output oscillates between success and various `Connection refused` / `broken connection` errors, `fluentbit_tda_betti1` and `fluentbit_tda_betti2` grow from small values to larger plateaus (e.g., Betti₁ around 10–13, Betti₂ around 1–2) while Betti₀ also increases. This is a direct signature of a **phase transition** from a stable regime to one with persistent, intermittent instability.
+   In the sample output, the HTTP output oscillates between success and various "Connection refused" and "broken connection" errors.
+   As this occurs, `fluentbit_tda_betti1` and `fluentbit_tda_betti2` grow from small values to larger plateaus (for example, Betti₁ around 10—13, Betti₂ around 1—2) while Betti₀ also increases.
+   This is a direct signature of a **phase transition** from a stable regime to one with persistent, intermittent instability.

 These interpretations are consistent with results from condensed matter physics and dynamical systems, where persistent homology has been used to detect phase transitions and changes in underlying order purely from data (References 1 and 2).

@@ -184,7 +187,7 @@ These interpretations are consistent with results from condensed matter physics

 ### Basic setup with `fluentbit_metrics`

-The following example computes TDA on Fluent Bit’s own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`:
+The following example computes TDA on Fluent Bit's own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`:

 ```yaml
 service:

From 4bbcb376cb282e057f6f13cffc1595665412c0c3 Mon Sep 17 00:00:00 2001
From: Hiroshi Hatake
Date: Thu, 4 Dec 2025 18:35:55 +0900
Subject: [PATCH 05/12] processor_tda: More style fixes

Signed-off-by: Hiroshi Hatake
---
 pipeline/processors/tda.md | 24 +++++++++++++-----------
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md
index 191003858..01281507d 100644
--- a/pipeline/processors/tda.md
+++ b/pipeline/processors/tda.md
@@ -1,4 +1,6 @@
-# TDA
+# TDA (Topological Data Analysis)
+
+

 The `tda` processor applies **Topological Data Analysis (TDA)**—specifically, **persistent homology**—to Fluent Bit metrics stream and exports **Betti numbers** that summarize the shape of recent behavior in metric space.

@@ -16,7 +18,7 @@ The `tda` processor supports the following configuration parameters:
 | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- |
 | `window_size` | Number of samples to keep in the TDA sliding window. This controls how far back in time the topology is estimated. | `60` |
 | `min_points` | Minimum number of samples required in the window before running TDA. Until this limit is reached, no Betti metrics are emitted. | `10` |
-| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by Takens’ theorem. | `3` |
+| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by Takens theorem. | `3` |
 | `embed_delay` | Delay `τ` in samples between successive lags used in delay embedding. | `1` |
 | `threshold` | Distance scale selector. `0` enables an automatic **multi-quantile scan** across several candidate thresholds; a value in `(0, 1)` is interpreted as a single quantile used to pick the Rips radius. | `0` |

@@ -74,7 +76,7 @@ $$

 If `n_raw < (m − 1)τ + 1`, TDA is skipped until enough data has accumulated.

-This embedding follows the idea of **Takens’ theorem**, which states that, under mild conditions, the dynamics of a system can be reconstructed from delay-embedded observations of a single time series or a low-dimensional observable [2]. In this plugin, the observable is the multi-dimensional vector of aggregated metrics.
+This embedding follows the idea of **Takens theorem**, which states that, under mild conditions, the dynamics of a system can be reconstructed from delay-embedded observations of a single time series or a low-dimensional observable [2]. In this plugin, the observable is the multi-dimensional vector of aggregated metrics.

 Intuitively:

@@ -115,11 +117,11 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a
 1. **Compression and C API**

    * The dense `n_embed × n_embed` matrix is converted into Ripser's `compressed_lower_distance_matrix`.
-   * The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs Ripser up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in (\mathbb{Z}/2\mathbb{Z}), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features.
+ * The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs Ripser up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in ($\mathbb{Z}/2\mathbb{Z}$), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features. 2. **Interval aggregation** - * A callback (`interval_recorder`) receives all persistence intervals ((\text{birth}, \text{death})) from Ripser. + * A callback (`interval_recorder`) receives all persistence intervals ($\text{birth}$, $\text{death}$) from Ripser. * Intervals with very small persistence are filtered out, and the remaining ones are counted per homology dimension to form Betti numbers. 3. **Multi-scale selection** @@ -144,7 +146,7 @@ Each metric is timestamped with the current time at the moment of TDA computatio ## Interpreting Betti numbers -Topologically, Betti numbers count the number of “holes” of each dimension in a space: +Topologically, Betti numbers count the number of "holes" of each dimension in a space: * **Betti₀** – connected components (0-dimensional clusters). * **Betti₁** – 1-dimensional holes (loops / cycles). @@ -265,11 +267,11 @@ This configuration reconstructs the system in an effective dimension of `4 × fe `tda` is particularly useful when: * You suspect **non-linear or multi-modal behavior** in your system (e.g., on/off regimes, congestion collapse, periodic retries). -* Standard indicators (mean, percentiles, error rates) show “noise,” but you want to know whether that noise hides **coherent structure**. +* Standard indicators (mean, percentiles, error rates) show "noise," but you want to know whether that noise hides **coherent structure**. 
* You want to build alerts not just on “levels” of metrics, but on **changes in the topology** of system behavior – for example: - * “Raise an alert if Betti₁ remains above 5 for more than 5 minutes.” - * “Mark windows where Betti₂ becomes non-zero as potential phase transitions.” + * "Raise an alert if Betti₁ remains above 5 for more than 5 minutes." + * "Mark windows where Betti₂ becomes non-zero as potential phase transitions." Because the plugin operates on an arbitrary selection of metrics (chosen upstream via `metrics_selector` or by how you configure `fluentbit_metrics`), you can tailor the TDA to focus on: @@ -282,5 +284,5 @@ Because the plugin operates on an arbitrary selection of metrics (chosen upstrea ## References -1. I. Donato, M. Gori, A. Sarti, “Persistent homology analysis of phase transitions,” _Physical Review E_, 93, 052138, 2016. -2. F. Takens, “Detecting strange attractors in turbulence,” in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366–381. +1. I. Donato, M. Gori, A. Sarti, "Persistent homology analysis of phase transitions," _Physical Review E_, 93, 052138, 2016. +2. F. Takens, "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366–381. From 14290e61a60599bfe633a7833166ab6436a0972f Mon Sep 17 00:00:00 2001 From: Hiroshi Hatake Date: Tue, 9 Dec 2025 18:39:13 +0900 Subject: [PATCH 06/12] processor_tda: Fix styles Signed-off-by: Hiroshi Hatake --- pipeline/processors/tda.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 01281507d..8c7cf3948 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -112,7 +112,7 @@ Internally, quantile selection is handled by `tda_choose_threshold_from_dist`, w ### 5. 
Persistent Homology through Ripser -Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **Ripser**, a well-known implementation of Vietoris–Rips persistent homology: +Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **Ripser**, a well-known implementation of Vietoris-Rips persistent homology: 1. **Compression and C API** @@ -136,11 +136,11 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a | Metric name | Type | Description | | ---------------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `fluentbit_tda_betti0` | gauge | Approximate Betti₀ - number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many "micro-regimes". | -| `fluentbit_tda_betti1` | gauge | Approximate Betti₁ - number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. | -| `fluentbit_tda_betti2` | gauge | Approximate Betti₂ - number of 2-dimensional voids (higher-order structures). These can appear when the system explores different “surfaces” in state space, e.g., transitioning between distinct operating modes. | +| `fluentbit_tda_betti0` | gauge | Approximate Betti₀. The number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many "micro-regimes". | +| `fluentbit_tda_betti1` | gauge | Approximate Betti₁. The number of 1-dimensional loops / cycles in the Rips complex. 
Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. | +| `fluentbit_tda_betti2` | gauge | Approximate Betti₂. The number of 2-dimensional voids (higher-order structures). These can appear when the system explores different "surfaces" in state space, for example, transitioning between distinct operating modes. | -Each metric is timestamped with the current time at the moment of TDA computation and is exported via the same metrics context it received, so downstream metric outputs can scrape or forward them like any other Fluent Bit metric. +Each metric is timestamped with the current time at the moment of TDA computation and is exported through the same metrics context it received, so downstream metric outputs can scrape or forward them like any other Fluent Bit metric. --- @@ -148,9 +148,9 @@ Each metric is timestamped with the current time at the moment of TDA computatio Topologically, Betti numbers count the number of "holes" of each dimension in a space: -* **Betti₀** – connected components (0-dimensional clusters). -* **Betti₁** – 1-dimensional holes (loops / cycles). -* **Betti₂** – 2-dimensional voids, and so on. +* **Betti₀**: connected components (0-dimensional clusters). +* **Betti₁**: 1-dimensional holes (loops / cycles). +* **Betti₂**: 2-dimensional voids, and so on. In our context: @@ -163,7 +163,7 @@ Some practical patterns: 1. **Stable regime** * Metrics fluctuate near a single attractor. - * Betti₀ is small (often close to 1–few and saturated on a long running), Betti₁ and Betti₂ are typically `0` or very small. + * Betti₀ is small (often close to 1-few and saturated on a long running), Betti₁ and Betti₂ are typically `0` or very small. 2. **Single, one-off failure** @@ -173,7 +173,7 @@ Some practical patterns: 3. 
**Intermittent failure / unstable regime** - * The system repeatedly bounces between "healthy" and "unhealthy" states (e.g., repeated `Connection refused` / `broken connection` errors interspersed with 200 responses). + * The system repeatedly bounces between "healthy" and "unhealthy" states (For example, repeated `Connection refused` / `broken connection` errors interspersed with 200 responses). * The trajectory in phase space forms **loops**: metrics move away from the healthy region and then return, many times. * Betti₁ (and occasionally Betti₂) increases noticeably while this behavior persists, reflecting the emergence of non-trivial cycles in the metric dynamics. @@ -258,7 +258,7 @@ processors: threshold: 0.2 # use 20th percentile of distances ``` -This configuration reconstructs the system in an effective dimension of `4 × feature_dim` and tends to highlight tight loops that occur within roughly 4–10 sampling intervals. +This configuration reconstructs the system in an effective dimension of `4 × feature_dim` and tends to highlight tight loops that occur within roughly 4-10 sampling intervals. --- @@ -266,14 +266,14 @@ This configuration reconstructs the system in an effective dimension of `4 × fe `tda` is particularly useful when: -* You suspect **non-linear or multi-modal behavior** in your system (e.g., on/off regimes, congestion collapse, periodic retries). +* You suspect **non-linear or multi-modal behavior** in your system (For example, on/off regimes, congestion collapse, periodic retries). * Standard indicators (mean, percentiles, error rates) show "noise," but you want to know whether that noise hides **coherent structure**. -* You want to build alerts not just on “levels” of metrics, but on **changes in the topology** of system behavior – for example: +* You want to build alerts not simply on "levels" of metrics, but on **changes in the topology** of system behavior. For example: * "Raise an alert if Betti₁ remains above 5 for more than 5 minutes." 
* "Mark windows where Betti₂ becomes non-zero as potential phase transitions." -Because the plugin operates on an arbitrary selection of metrics (chosen upstream via `metrics_selector` or by how you configure `fluentbit_metrics`), you can tailor the TDA to focus on: +Because the plugin operates on an arbitrary selection of metrics (chosen upstream through `metrics_selector` or by how you configure `fluentbit_metrics`), you can tailor the TDA to focus on: * Network health (latency histograms, connection failures, TLS handshake errors), * Resource saturation (CPU, memory, buffer usage), @@ -285,4 +285,4 @@ Because the plugin operates on an arbitrary selection of metrics (chosen upstrea ## References 1. I. Donato, M. Gori, A. Sarti, "Persistent homology analysis of phase transitions," _Physical Review E_, 93, 052138, 2016. -2. F. Takens, "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366–381. +2. F. Takens, "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381. From 562288770b8a25c337e0035e53c0e7ecea2093e5 Mon Sep 17 00:00:00 2001 From: "Eric D. Schabell" Date: Fri, 15 May 2026 09:29:41 +0200 Subject: [PATCH 07/12] Apply suggestions from code review Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Signed-off-by: Eric D. 
Schabell --- pipeline/processors/tda.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 6bbea3054..95824bd18 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -246,8 +246,9 @@ These Betti metrics can be scraped by Prometheus, forwarded to an observability ### Emphasizing short-term cycles with delay embedding -To focus on shorter-term cyclic behavior—for example, oscillations in retry logic and error counters—you can lower `window_size` and adjust the embedding parameters: +To focus on shorter-term cyclic behavior—for example, oscillations in retry logic and error counters—you can lower `window_size` and adjust the embedding parameters ======= + # Topological data analysis (`TDA`) This processor applies [Topological Data Analysis](https://en.wikipedia.org/wiki/Topological_data_analysis) (`TDA`) to incoming metrics using a sliding window and `Ripser` persistent homology. It computes Betti numbers that characterize the topological shape of the metric signal over time, which can surface structural patterns (such as recurring cycles or anomalies) that traditional statistical methods miss. 
From 537fd496050fa7c798c65b1b7ec65a9df2d7863a Mon Sep 17 00:00:00 2001 From: Hiroshi Hatake Date: Fri, 15 May 2026 19:17:27 +0900 Subject: [PATCH 08/12] processor_tda: Fix styles Signed-off-by: Hiroshi Hatake --- pipeline/processors/tda.md | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 95824bd18..6aa338645 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -19,7 +19,7 @@ The `tda` processor supports the following configuration parameters: | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | | `window_size` | Number of samples to keep in the TDA sliding window. This controls how far back in time the topology is estimated. | `60` | | `min_points` | Minimum number of samples required in the window before running TDA. Until this limit is reached, no Betti metrics are emitted. | `10` | -| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by Takens theorem. | `3` | +| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by `Takens`x theorem. | `3` | | `embed_delay` | Delay `τ` in samples between successive lags used in delay embedding. | `1` | | `threshold` | Distance scale selector. `0` enables an automatic **multi-quantile scan** across several candidate thresholds; a value in `(0, 1)` is interpreted as a single quantile used to pick the Rips radius. | `0` | @@ -34,7 +34,7 @@ All parameters are optional; defaults are suitable as a starting point for many On each metrics flush, `tda`: 1. 
**Groups metrics by `(namespace, subsystem)`** - All counters, gauges, and untyped metrics are traversed. For each `cmt_map`, the pair `(ns, subsystem)` is hashed and assigned a **feature index**. This produces a fixed-dimensional feature vector of length `feature_dim` (number of `(ns, subsystem)` groups). + All `counters`, `gauges`, and `untyped` metrics are traversed. For each `cmt_map`, the pair `(ns, subsystem)` is hashed and assigned a **feature index**. This produces a fixed-dimensional feature vector of length `feature_dim` (number of `(ns, subsystem)` groups). 2. **Aggregates values per group** For each group, all static and labeled metrics are summed into the corresponding feature dimension. @@ -48,8 +48,8 @@ On each metrics flush, `tda`: A safeguard ensures `dt_sec > 0`. 4. **Applies signed `log1p` normalization** - To stabilize very different magnitudes and bursty traffic, each rate is mapped to - `norm = log1p(|rate|)`, and the sign of `rate` is reattached. This yields a vector that is roughly scale-invariant but still sensitive to relative changes in rates across groups. + To stabilize very different magnitudes and burst traffic, each rate is mapped to + `norm = log1p(|rate|)`, and the sign of `rate` is reattached. This yields a vector that's roughly scale-invariant but still sensitive to relative changes in rates across groups. The resulting normalized vector is written into a **ring buffer window** (`tda_window`), implemented through a lightweight circular buffer (`lwrb`) that stores timestamped samples. The window maintains at most `window_size` samples; older samples are dropped when the buffer is full. @@ -77,7 +77,7 @@ $$ If `n_raw < (m − 1)τ + 1`, TDA is skipped until enough data has accumulated. -This embedding follows the idea of **Takens theorem**, which states that, under mild conditions, the dynamics of a system can be reconstructed from delay-embedded observations of a single time series or a low-dimensional observable [2]. 
In this plugin, the observable is the multi-dimensional vector of aggregated metrics. +This embedding follows the idea of **`Takens` theorem**, which states that, under mild conditions, the dynamics of a system can be reconstructed from delay-embedded observations of a single time series or a low-dimensional observable [2]. In this plugin, the observable is the multi-dimensional vector of aggregated metrics. Intuitively: @@ -373,9 +373,4 @@ Because the plugin operates on an arbitrary selection of metrics (chosen upstrea ## References 1. I. Donato, M. Gori, A. Sarti, "Persistent homology analysis of phase transitions," _Physical Review E_, 93, 052138, 2016. -2. F. Takens, "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381. -======= - window_size: 60 - min_points: 10 - threshold: 0.3 -``` \ No newline at end of file +2. F. `Takens`, "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381. 
From 29f7545a9dc235c7c70bcd9455a45c7325358c64 Mon Sep 17 00:00:00 2001 From: Hiroshi Hatake Date: Fri, 15 May 2026 19:24:50 +0900 Subject: [PATCH 09/12] processor_tda: Fix broken sections on resolving conflicts Signed-off-by: Hiroshi Hatake --- pipeline/processors/tda.md | 93 ++------------------------------------ 1 file changed, 3 insertions(+), 90 deletions(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 6aa338645..64bad3061 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -19,7 +19,7 @@ The `tda` processor supports the following configuration parameters: | ------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | | `window_size` | Number of samples to keep in the TDA sliding window. This controls how far back in time the topology is estimated. | `60` | | `min_points` | Minimum number of samples required in the window before running TDA. Until this limit is reached, no Betti metrics are emitted. | `10` | -| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by `Takens`x theorem. | `3` | +| `embed_dim` | Delay embedding dimension `m`. `m = 1` disables embedding (original behavior). For example, `m = 3` reconstructs state vectors `(x_t, x_{t-τ}, x_{t-2τ})` as suggested by `Takens` theorem. | `3` | | `embed_delay` | Delay `τ` in samples between successive lags used in delay embedding. | `1` | | `threshold` | Distance scale selector. `0` enables an automatic **multi-quantile scan** across several candidate thresholds; a value in `(0, 1)` is interpreted as a single quantile used to pick the Rips radius. 
| `0` | @@ -246,94 +246,7 @@ These Betti metrics can be scraped by Prometheus, forwarded to an observability ### Emphasizing short-term cycles with delay embedding -To focus on shorter-term cyclic behavior—for example, oscillations in retry logic and error counters—you can lower `window_size` and adjust the embedding parameters -======= - -# Topological data analysis (`TDA`) - -This processor applies [Topological Data Analysis](https://en.wikipedia.org/wiki/Topological_data_analysis) (`TDA`) to incoming metrics using a sliding window and `Ripser` persistent homology. It computes Betti numbers that characterize the topological shape of the metric signal over time, which can surface structural patterns (such as recurring cycles or anomalies) that traditional statistical methods miss. - -The processor operates only on metrics. Log and trace records pass through unchanged. - -{% hint style="info" %} - -Only [YAML configuration files](../../administration/configuring-fluent-bit/yaml.md) support processors. - -{% endhint %} - -## How it works - -On each flush, the processor: - -1. Aggregates incoming metrics into a feature vector by collapsing each unique `(namespace, subsystem)` pair into a single value. Counters are converted to log-scaled rates; gauges are used directly. -2. Appends the feature vector to a sliding ring-buffer window of up to `window_size` samples. -3. Optionally applies delay embedding (controlled by `embed_dim` and `embed_delay`) to reconstruct attractor geometry from the time series. -4. Once the window holds at least `min_points` samples, builds a pairwise Euclidean distance matrix over the embedded points and runs `Ripser` to compute persistent homology. -5. Scans across multiple distance thresholds (or uses the quantile supplied in `threshold`) and emits the Betti numbers that show the strongest topological signal. 
- -The output is three gauge metrics added to the same metrics context: - -| Metric | Description | -| ------ | ----------- | -| `fluentbit_tda_betti0` | Betti number β₀—number of connected components in the Vietoris-Rips complex. | -| `fluentbit_tda_betti1` | Betti number β₁—number of independent loops (1-cycles). Elevated values suggest cyclic or periodic patterns. | -| `fluentbit_tda_betti2` | Betti number β₂—number of enclosed voids (2-cycles). | - -## Configuration parameters - -| Key | Description | Default | -| --- | ----------- | ------- | -| `window_size` | Number of samples to keep in the sliding window. | `60` | -| `min_points` | Minimum number of samples that must be in the window before `Ripser` runs. | `10` | -| `embed_dim` | Delay embedding dimension `m`. Setting `m=1` disables delay embedding and uses the raw feature vectors directly. For `m>1`, each point in the distance matrix is constructed from `m` consecutive lagged snapshots (for example, `m=3` → `x_t`, `x_{t-1}`, `x_{t-2}`). | `3` | -| `embed_delay` | Lag `τ` in samples between successive delays in the embedding. Ignored when `embed_dim=1`. | `1` | -| `threshold` | Distance scale selector. `0` triggers an automatic multi-quantile scan that picks the threshold maximizing β₁ (or β₀ when all β₁ are zero). A value in `(0, 1)` is treated as a quantile of the pairwise distance distribution and used directly as the `Ripser` threshold. 
| `0` | - -## Configuration example - -The following example scrapes Prometheus metrics and runs `TDA` on the ingested data before forwarding to an OpenTelemetry endpoint: - -```yaml -service: - flush: 10 - log_level: info - -pipeline: - inputs: - - name: prometheus_scrape - host: 127.0.0.1 - port: 9090 - scrape_interval: 10s - tag: prom.metrics - - processors: - metrics: - - name: tda - window_size: 60 - min_points: 10 - embed_dim: 3 - embed_delay: 1 - threshold: 0 - - outputs: - - name: opentelemetry - match: 'prom.metrics' - host: otel-collector - port: 4318 -``` - -To disable delay embedding and run `TDA` directly on the raw metric vectors, set `embed_dim: 1`: - -```yaml -processors: - metrics: - - name: tda - window_size: 120 - min_points: 20 - embed_dim: 1 -``` - -To fix the distance threshold at a specific quantile of the pairwise distances (for example, the thirtieth percentile), set `threshold` to a value between 0 and 1: +To focus on shorter-term cyclic behavior—for example, oscillations in retry logic and error counters—you can lower `window_size` and adjust the embedding parameters: ```yaml processors: @@ -373,4 +286,4 @@ Because the plugin operates on an arbitrary selection of metrics (chosen upstrea ## References 1. I. Donato, M. Gori, A. Sarti, "Persistent homology analysis of phase transitions," _Physical Review E_, 93, 052138, 2016. -2. F. `Takens`, "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381. +2. F. `Takens` "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381. 
From 2790ab00695aad3cb4ad6ab6761459ddf4de2fa3 Mon Sep 17 00:00:00 2001 From: Hiroshi Hatake Date: Mon, 18 May 2026 18:28:47 +0900 Subject: [PATCH 10/12] processor_tda: Suppress warnings on local vale Signed-off-by: Hiroshi Hatake --- pipeline/processors/tda.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 64bad3061..01852c1a3 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -190,7 +190,7 @@ These interpretations are consistent with results from condensed matter physics ### Basic setup with `fluentbit_metrics` -The following example computes TDA on Fluent Bit's own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`: +The following example computes TDA on Fluent Bit own internal metrics, using `metrics_selector` to remove a few high-cardinality or uninteresting metrics before feeding them into `tda`: ```yaml service: @@ -265,11 +265,11 @@ This configuration reconstructs the system in an effective dimension of `4 × fe ## When to use `tda` -`tda` is particularly useful when: +`tda` is particularly effective when: * You suspect **non-linear or multi-modal behavior** in your system (For example, on/off regimes, congestion collapse, periodic retries). * Standard indicators (mean, percentiles, error rates) show "noise," but you want to know whether that noise hides **coherent structure**. -* You want to build alerts not simply on "levels" of metrics, but on **changes in the topology** of system behavior. For example: +* You want to build alerts not only on "levels" of metrics, but on **changes in the topology** of system behavior. For example: * "Raise an alert if Betti₁ remains above 5 for more than 5 minutes." * "Mark windows where Betti₂ becomes non-zero as potential phase transitions." 
From 1c38348bb9de7a31a535dc3d4e2642b8c14cd1a4 Mon Sep 17 00:00:00 2001 From: Hiroshi Hatake Date: Mon, 18 May 2026 18:31:20 +0900 Subject: [PATCH 11/12] processor_tda: Use back ticks on Risper occurrences Signed-off-by: Hiroshi Hatake --- pipeline/processors/tda.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 01852c1a3..136a59eaf 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -101,7 +101,7 @@ Persistent homology requires a **scale parameter** (Rips radius / distance thres 1. **Automatic multi-quantile scan** (`threshold = 0`, default) * The off-diagonal distances are collected, sorted, and several quantiles are evaluated, for example `q ∈ {0.10, 0.20, …, 0.90}`. - * For each candidate quantile `q`, a threshold `r_q` is chosen and Betti numbers are computed using Ripser. + * For each candidate quantile `q`, a threshold `r_q` is chosen and Betti numbers are computed using `Ripser`. * The plugin prefers the scale where **Betti₁** (loops) is maximized; if all Betti₁ are zero, it falls back to Betti₀ as a secondary indicator. 2. **Fixed quantile mode** (`0 < threshold < 1`) @@ -111,18 +111,18 @@ Persistent homology requires a **scale parameter** (Rips radius / distance thres Internally, quantile selection is handled by `tda_choose_threshold_from_dist`, which gathers all `i > j` entries of the distance matrix, sorts them, and picks the specified quantile index. -### 5. Persistent Homology through Ripser +### 5. Persistent Homology through `Ripser` -Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **Ripser**, a well-known implementation of Vietoris-Rips persistent homology: +Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **`Ripser`**, a well-known implementation of Vietoris-Rips persistent homology: 1. 
**Compression and C API** - * The dense `n_embed × n_embed` matrix is converted into Ripser's `compressed_lower_distance_matrix`. - * The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs Ripser up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in ($\mathbb{Z}/2\mathbb{Z}$), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features. + * The dense `n_embed × n_embed` matrix is converted into `Ripser`'s `compressed_lower_distance_matrix`. + * The wrapper function `flb_ripser_compute_betti_from_dense_distance` runs `Ripser` up to `max_dim = 2` (H₀, H₁, H₂), using coefficients in ($\mathbb{Z}/2\mathbb{Z}$), and accumulates persistence intervals into Betti numbers with a small persistence cutoff to ignore very short-lived noise features. 2. **Interval aggregation** - * A callback (`interval_recorder`) receives all persistence intervals ($\text{birth}$, $\text{death}$) from Ripser. + * A callback (`interval_recorder`) receives all persistence intervals ($\text{birth}$, $\text{death}$) from `Ripser`. * Intervals with very small persistence are filtered out, and the remaining ones are counted per homology dimension to form Betti numbers. 3. **Multi-scale selection** From fff872e998d742e240b1bfd41d37401baf3b2dd5 Mon Sep 17 00:00:00 2001 From: Hiroshi Hatake Date: Mon, 18 May 2026 18:42:06 +0900 Subject: [PATCH 12/12] processor_tda: Suppress more suggestions from local vale Signed-off-by: Hiroshi Hatake --- pipeline/processors/tda.md | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/pipeline/processors/tda.md b/pipeline/processors/tda.md index 136a59eaf..44050e690 100644 --- a/pipeline/processors/tda.md +++ b/pipeline/processors/tda.md @@ -77,7 +77,11 @@ $$ If `n_raw < (m − 1)τ + 1`, TDA is skipped until enough data has accumulated. 
-This embedding follows the idea of **`Takens` theorem**, which states that, under mild conditions, the dynamics of a system can be reconstructed from delay-embedded observations of a single time series or a low-dimensional observable [2]. In this plugin, the observable is the multi-dimensional vector of aggregated metrics. +This embedding follows the idea of **`Takens` theorem** [2]. + +Under mild conditions, the theorem states that system dynamics can be reconstructed from delay-embedded observations of a single time series or low-dimensional observable. + +In this plugin, the observable is the multi-dimensional vector of aggregated metrics. Intuitively: @@ -92,7 +96,11 @@ $$ d(i, j) = \left| x_i - x_j \right|_2 $$ -The implementation iterates over all pairs `(i, j)` with `i > j`, accumulates squared differences across both feature dimensions and lags, and then takes the square root; the matrix is stored symmetrically with zeros on the diagonal. +The implementation iterates over all pairs `(i, j)` with `i > j` +and accumulates squared differences across feature dimensions and lags. + +The square root is then taken, and the matrix is stored symmetrically +with zeros on the diagonal. ### 4. Threshold selection (Rips scale) @@ -113,7 +121,7 @@ Internally, quantile selection is handled by `tda_choose_threshold_from_dist`, w ### 5. Persistent Homology through `Ripser` -Once the compressed lower-triangular distance matrix is built, it is passed to a thin wrapper around **`Ripser`**, a well-known implementation of Vietoris-Rips persistent homology: +Once the compressed lower-triangular distance matrix is built, it's passed to a thin wrapper around **`Ripser`**, a well-known implementation of Vietoris-Rips persistent homology: 1. 
**Compression and C API** @@ -137,11 +145,15 @@ Once the compressed lower-triangular distance matrix is built, it is passed to a | Metric name | Type | Description | | ---------------------- | ----- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `fluentbit_tda_betti0` | gauge | Approximate Betti₀. The number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many "micro-regimes". | +| `fluentbit_tda_betti0` | gauge | Approximate Betti₀. The number of connected components (clusters) in the embedded point cloud at the selected scale. Large values indicate fragmentation into many "micro-regimes." | | `fluentbit_tda_betti1` | gauge | Approximate Betti₁. The number of 1-dimensional loops / cycles in the Rips complex. Non-zero values often signal **recurrent, quasi-periodic, or cycling behavior**, typical of intermittent failure / recovery patterns and other regime switches. | | `fluentbit_tda_betti2` | gauge | Approximate Betti₂. The number of 2-dimensional voids (higher-order structures). These can appear when the system explores different "surfaces" in state space, for example, transitioning between distinct operating modes. | -Each metric is timestamped with the current time at the moment of TDA computation and is exported through the same metrics context it received, so downstream metric outputs can scrape or forward them like any other Fluent Bit metric. +Each metric is timestamped at the time of TDA computation. + +The metric is exported through the same metrics context that produced it. + +Downstream metric outputs can scrape or forward these metrics like any other Fluent Bit metric. 
--- @@ -153,7 +165,7 @@ Topologically, Betti numbers count the number of "holes" of each dimension in a * **Betti₁**: 1-dimensional holes (loops / cycles). * **Betti₂**: 2-dimensional voids, and so on. -In our context: +In this TDA context: * The sliding window of metrics is a **point cloud in phase space**. * The Rips complex at a given scale connects points that are close in this space. @@ -170,7 +182,7 @@ Some practical patterns: * A brief outage or spike happens once and resolves. * The embedding sees a short excursion but no sustained cycling, so Betti₁ and Betti₂ often remain near `0`. - * In the provided HTTP example, a single failing chunk does not significantly raise Betti₁/₂. + * In the provided HTTP example, a single failing chunk doesn't significantly raise Betti₁/₂. 3. **Intermittent failure / unstable regime** @@ -271,19 +283,19 @@ This configuration reconstructs the system in an effective dimension of `4 × fe * Standard indicators (mean, percentiles, error rates) show "noise," but you want to know whether that noise hides **coherent structure**. * You want to build alerts not only on "levels" of metrics, but on **changes in the topology** of system behavior. For example: - * "Raise an alert if Betti₁ remains above 5 for more than 5 minutes." + * "Raise an alert if Betti₁ remains 5 or greater for more than 5 minutes." * "Mark windows where Betti₂ becomes non-zero as potential phase transitions." Because the plugin operates on an arbitrary selection of metrics (chosen upstream through `metrics_selector` or by how you configure `fluentbit_metrics`), you can tailor the TDA to focus on: * Network health (latency histograms, connection failures, TLS handshake errors), * Resource saturation (CPU, memory, buffer usage), -* Pipeline-level signals (retries, DLQ usage, chunk failures), +* Pipeline-level signals (retries, Dead Letter Queue usage, chunk failures), * Or any other metric subset that meaningfully characterizes the state of your system.
--- ## References -1. I. Donato, M. Gori, A. Sarti, "Persistent homology analysis of phase transitions," _Physical Review E_, 93, 052138, 2016. +1. I. `Donato`, M. `Gori`, A. `Sarti`, "Persistent homology analysis of phase transitions," _Physical Review E_, 93, 052138, 2016. 2. F. `Takens` "Detecting strange attractors in turbulence," in D. Rand and L.-S. Young (eds.), _Dynamical Systems and Turbulence_, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381.