netdata · github-actions · May 29, 2026
diff --git a/docs/Alerts & Notifications/Alert Configuration Reference.mdx b/docs/Alerts & Notifications/Alert Configuration Reference.mdx
@@ -1382,6 +1382,124 @@ template: ml_5min_node
 <br/>
 </details><br/>
 
+<details>
+<summary><strong>Example 8: Boolean / Binary Metric Alerting</strong></summary><br/>
+
+**Scenario:** Monitor a boolean 0/1 health-check gauge and choose the right aggregation method for your alerting intent.
+
+**Why This Matters:** Boolean metrics require different aggregation strategies depending on whether you need to detect any single failure, confirm a sustained outage, or check the current state. Choosing the wrong method leads to missed alerts or alert noise.
+
+**Approach 1: Detect Any Failure Event (average)**
+
+```text
+ alarm: service_failure_event
+    on: my_service.health_status
+lookup: average -10s of health_status
+ every: 10s
+  warn: $this > 0
+   info: any failure detected in the last 10 seconds
+    to: sysadmin
+```
+
+Use when the metric acts as a failure indicator — the value is 0 normally and 1 when a failure occurs. `average` over a short window naturally reflects any non-zero sample: if the metric was 1 at any point, the average will be greater than 0. This is the same pattern used by Netdata's Docker container health monitoring (`average -10s of unhealthy`, `warn: $this > 0`).
+
+:::note 
+
+Do not use `sum` for boolean 0/1 gauges. While `sum -5m unaligned absolute` would technically detect failures (any non-zero sample makes the sum positive), `sum` produces a count of seconds in state 1 rather than an intuitive threshold. Use `sum` only for counter/cumulative metrics like packet drops or error totals — see [Example 4: Network Packet Drops](#example-4-network-packet-drops) for a correct `sum` use case.
+
+:::
+
+**Approach 2: Detect Any Downtime (min) or Continuous Outage (max)**
+
+```text
+ alarm: service_any_downtime
+    on: my_service.health_status
+lookup: min -5m unaligned
+ every: 10s
+  crit: $this == 0
+   info: metric dropped to 0 at some point in the last 5 minutes
+    to: sysadmin
+```
+
+Use when the metric is 1 = healthy and 0 = unhealthy. `min` returns the lowest value in the window — if the metric dropped to 0 at any point, the alert fires. This catches even brief outages.
+
+For the stricter check of **continuous outage** (metric was never 1), use `max`:
+
+```text
+ alarm: service_continuous_outage
+    on: my_service.health_status
+lookup: max -5m unaligned
+ every: 10s
+  crit: $this == 0
+   info: service was down for the entire last 5 minutes
+    to: sysadmin
+```
+
+`max` returns the highest value in the window. If `max == 0`, the metric never reached 1 — the service was down the entire time.
+
+**Approach 3: Measure Failure Rate (average)**
+
+```text
+ alarm: service_failure_rate
+    on: my_service.health_status
+lookup: average -5m unaligned of health_status
+ every: 1m
+  warn: $this > 0.1
+   crit: $this > 0.5
+   info: failure rate exceeded threshold over the last 5 minutes
+    to: sysadmin
+```
+
+When the metric is 0 = healthy and 1 = failure, `average` over the window returns a value between 0.0 and 1.0 representing the fraction of time spent in failure. `warn: $this > 0.1` fires when the service was failing more than 10% of the time, and `crit: $this > 0.5` fires when failures exceeded half the window. This is useful for SLO-style alerting where occasional failures are acceptable.
+
+:::note
+**Note on `percentage`:** The `percentage` option calculates each dimension's share of the chart total — it is designed for multi-dimension charts like `system.ram` (see [Task 3: Create a Simple Alert](#task-3-create-a-simple-alert): `lookup: average -1m percentage of used`). For a single-dimension boolean gauge, `percentage` always returns 100. Use plain `average` and compare against 0.0–1.0 thresholds instead.
+:::
+
+**Approach 4: Instant State Check (calc, no lookup)**
+
+```text
+ alarm: service_current_state
+    on: my_service.health_status
+  calc: $health_status
+ every: 10s
+  crit: $this == 0
+   info: service is currently down
+    to: sysadmin
+ delay: down 5m
+```
+
+Use to check only the current value without time-window aggregation. The `calc: $health_status` references the chart dimension directly — no `lookup` needed. Note that `$status` is a built-in alert variable (the alert's own status code, −2 to 3) and must not be used here; use the dimension name instead (e.g. `$health_status` for a dimension named `health_status`). The `delay: down 5m` debounces recovery notifications, requiring the alert to stay clear for 5 minutes before sending recovery. This is the same pattern used in `health.d/timex.conf` for clock sync state monitoring (`calc: $state`).
+
+**Comparison: Which Method to Use**
+
+| Intent                                                  | Method | Lookup / Calc              | Condition    | Fires When                                |
+| ------------------------------------------------------- | ------ | -------------------------- | ------------ | ----------------------------------------- |
+| Any failure event (metric is 0 normally, 1 on failure)  | `average`  | `average -10s of health_status` | `$this > 0` | Metric was non-zero at any point in the window |
+| Any downtime (metric is 1=healthy, 0=down)              | `min`  | `min -5m unaligned`        | `$this == 0` | Metric hit 0 at any point in the window   |
+| Continuous outage (metric is 1=healthy, 0=down)         | `max`  | `max -5m unaligned`        | `$this == 0` | Metric was 0 for the entire window        |
+| Failure rate over time                                  | `average` | `average -5m unaligned of health_status` | `$this > 0.N`  | Failure fraction exceeds threshold (0.0–1.0) |
+| Current state only                                      | `calc` | `calc: $health_status` (no lookup)   | `$this == 0` | Current value is 0 (debounce with delay)  |
+
+For a full list of available lookup methods and processing options (`average`, `min`, `max`, `sum`, `percentage`, `absolute`, etc.), see the [Alert Line `lookup`](#alert-line-lookup) section.
+
+**Key Points:**
+
+- Boolean 0/1 metrics work with all standard lookup methods — the choice depends on your alerting intent
+- Use `average` over a short window for failure detection (`average -10s of <dimension>`, `warn: $this > 0`) — the same pattern Netdata uses in its own health configs (e.g., `health.d/docker.conf`)
+- Use `min` for "was it ever down?" and `max` for "was it continuously down?"
+- Use `average` for SLO-style failure-rate alerting (returns 0.0–1.0 fraction of time in failure state; compare against decimal thresholds)
+- Use `calc` without `lookup` for instant state checks, combined with `delay` for debouncing
+- Avoid `sum` on boolean gauges — it produces a count of seconds in state 1, not an intuitive threshold. Use `sum` only for counter/cumulative metrics (e.g., total packet drops in a time window)
+
+**Variables Used:**
+
+- `$this` — Result of the `lookup` or `calc` expression
+- `$health_status` — Dimension value from the chart (used in the `calc` approach; the variable name matches the dimension name, e.g. `health_status`)
+
+<br/>
+</details><br/>
+
 **Next Steps:** Having trouble with your alerts? Continue to [Troubleshooting](#troubleshooting) for debugging techniques.
 
 ## Troubleshooting

diff --git a/docs/Collecting Metrics/Collectors/Collectors.mdx b/docs/Collecting Metrics/Collectors/Collectors.mdx
@@ -254,6 +254,10 @@ import { Grid, Box } from '@site/src/components/Grid_integrations';
     <img custom-image src="https://netdata.cloud/img/cassandra.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="unknown" data-logo-contrast-dark="unknown" data-logo-contrast-confidence="low"/>
 </Box>
 
+<Box banner="by Netdata" banner_color="#00ab44" to="/docs/collecting-metrics/collectors/networking/cato-networks" title="Cato Networks">
+    <img custom-image src="https://netdata.cloud/img/network-wired.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="unknown" data-logo-contrast-dark="unknown" data-logo-contrast-confidence="low"/>
+</Box>
+
 <Box banner="by Netdata" banner_color="#00ab44" to="/docs/collecting-metrics/collectors/storage-and-filesystems/ceph" title="Ceph">
     <img custom-image src="https://netdata.cloud/img/ceph.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="unknown" data-logo-contrast-dark="unknown" data-logo-contrast-confidence="low"/>
 </Box>
@@ -810,10 +814,6 @@ import { Grid, Box } from '@site/src/components/Grid_integrations';
     <img custom-image src="https://netdata.cloud/img/netfilter.png" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="low" data-logo-contrast-dark="ok" data-logo-contrast-confidence="high"/>
 </Box>
 
-<Box banner="by Netdata" banner_color="#00ab44" to="/docs/collecting-metrics/collectors/networking/network-connections" title="Network Connections">
-    <img custom-image src="https://netdata.cloud/img/network.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="unknown" data-logo-contrast-dark="unknown" data-logo-contrast-confidence="low"/>
-</Box>
-
 <Box banner="by Netdata" banner_color="#00ab44" to="/docs/collecting-metrics/collectors/networking/network-interfaces" title="Network interfaces">
     <img custom-image src="https://netdata.cloud/img/network-wired.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="unknown" data-logo-contrast-dark="unknown" data-logo-contrast-confidence="low"/>
 </Box>
@@ -1123,7 +1123,7 @@ import { Grid, Box } from '@site/src/components/Grid_integrations';
 </Box>
 
 <Box banner="by Netdata" banner_color="#00ab44" to="/docs/collecting-metrics/collectors/operating-systems/system-statistics" title="System statistics">
-    <img custom-image src="https://netdata.cloud/img/linuxserver.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="unknown" data-logo-contrast-dark="unknown" data-logo-contrast-confidence="low"/>
+    <img custom-image src="https://netdata.cloud/img/windows.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="ok" data-logo-contrast-dark="ok" data-logo-contrast-confidence="medium"/>
 </Box>
 
 <Box banner="by Netdata" banner_color="#00ab44" to="/docs/collecting-metrics/collectors/hardware-and-sensors/system-thermal-zone" title="System thermal zone">
@@ -1694,6 +1694,10 @@ import { Grid, Box } from '@site/src/components/Grid_integrations';
     <img custom-image src="https://netdata.cloud/img/netatmo.svg" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="low" data-logo-contrast-dark="ok" data-logo-contrast-confidence="medium"/>
 </Box>
 
+<Box banner="by Community" banner_color="rgba(0, 0, 0, 0.25)" to="/docs/collecting-metrics/collectors/networking/network-connections" title="Network Connections">
+
+</Box>
+
 <Box banner="by Community" banner_color="rgba(0, 0, 0, 0.25)" to="/docs/collecting-metrics/collectors/applications/nextcloud-servers" title="Nextcloud servers">
     <img custom-image src="https://netdata.cloud/img/nextcloud.png" style={{width: '90%', maxHeight: '100%', verticalAlign: 'middle' }} data-integration-logo="true" data-logo-contrast-light="ok" data-logo-contrast-dark="ok" data-logo-contrast-confidence="high"/>
 </Box>