Commit 5a4fe00

committed
Merge branch 'test' of github.com:flashcatcloud/knowledge-base into test
2 parents 48ef850 + c8b1ba0 commit 5a4fe00

9 files changed

Lines changed: 18 additions & 45 deletions

flashduty/en/1. On-call/4. Configure On-call/4.7 Templates.md

Lines changed: 0 additions & 2 deletions
@@ -132,10 +132,8 @@ Title | string | Yes | Alert title
 Description | string | Yes | Alert description, can be empty
 AlertSeverity | string | Yes | Severity level, enum values: Critical, Warning, Info
 AlertStatus | string | Yes | Alert status, enum values: Critical, Warning, Info, Ok
-Progress | string | Yes | Processing progress, enum values: Triggered, Processing, Closed
 StartTime | int64 | Yes | Trigger time, Unix timestamp in seconds
 EndTime | int64 | No | Recovery time, Unix timestamp in seconds, default 0
-CloseTime | int64 | No | Close time, EndTime is alert recovery time, CloseTime is processing progress close time. Alert automatically closes upon recovery, manual closure doesn't affect alert recovery. Unix timestamp in seconds, default 0
 `Labels` | map[string]string | No | Label key-value pairs, both Key and Value are strings
 
 ## Common Questions

flashduty/en/1. On-call/8. Integrations/8.5 Webhooks/8.5.1 Alert webhook.md

Lines changed: 0 additions & 4 deletions
@@ -64,13 +64,11 @@ email | string | Yes | Email address
 | alert_key | string | Yes | Alert correlation basis |
 | alert_severity | string | Yes | Severity level, enum: Critical, Warning, Info |
 | alert_status | string | Yes | Alert status, enum: Critical, Warning, Info, Ok |
-| progress | string | Yes | Processing progress, enum: Triggered, Closed |
 | created_at | int64 | Yes | Creation time |
 | updated_at | int64 | Yes | Update time |
 | start_time | int64 | Yes | First trigger time (time of first event received by platform), Unix timestamp in seconds |
 | last_time | int64 | Yes | Latest event time (time of most recent event received by platform), Unix timestamp in seconds |
 | end_time | int64 | No | Alert recovery time (time when platform last received end-type event), Unix timestamp in seconds, defaults to 0 |
-| close_time | int64 | No | Closure time, different from end_time, this indicates progress closure, not actual alert recovery. Unix timestamp in seconds, defaults to 0 |
 | labels | map[string]string | No | Label key-value pairs, both Key and Value are strings |
 | event_cnt | int64 | No | Number of associated events |
 | incident | [Incident](#Incident) | No | Associated incident |
@@ -103,7 +101,6 @@ curl -X POST 'https://example.com/alert/webhook?a=a' \
 "alert_status":"Warning",
 "channel_id":1163577812973,
 "channel_name":"Order System",
-"close_time":0,
 "created_at":1683766015,
 "data_source_id":1571358104973,
 "data_source_name":"Aliyun SLS",
@@ -131,7 +128,6 @@ curl -X POST 'https://example.com/alert/webhook?a=a' \
 "severity":"6"
 },
 "last_time":1683809153,
-"progress":"Triggered",
 "start_time":1683766013,
 "title":"Test alert trigger to Flashduty",
 "title_rule":"$resource::$check",
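
For consumers of this webhook, the removal of `progress` and `close_time` means a receiver should no longer read those keys. The following is a minimal Python sketch of that idea, not Flashduty code. The helper names `strip_removed_fields` and `is_recovered` are hypothetical, and treating `alert_status == "Ok"` or a non-zero `end_time` as "recovered" is an assumption drawn from the field descriptions in the table, not a documented rule.

```python
import json

# Fields dropped from the documented alert payload in this commit.
REMOVED_FIELDS = ("progress", "close_time")


def strip_removed_fields(alert: dict) -> dict:
    """Return a copy of the alert without fields the docs no longer list."""
    return {key: value for key, value in alert.items() if key not in REMOVED_FIELDS}


def is_recovered(alert: dict) -> bool:
    """Assumption: an alert counts as recovered when its status is Ok
    or when end_time has been set to a non-zero Unix timestamp."""
    return alert.get("alert_status") == "Ok" or alert.get("end_time", 0) > 0


if __name__ == "__main__":
    # A trimmed-down payload that still carries the two legacy fields.
    raw = '{"alert_status": "Warning", "end_time": 0, "progress": "Triggered", "close_time": 0}'
    alert = strip_removed_fields(json.loads(raw))
    print(sorted(alert))
    print(is_recovered(alert))
```

Dropping the legacy keys on ingest keeps older senders and newer senders on the same code path, so downstream logic only ever sees the fields the current documentation describes.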

flashduty/en/1. On-call/8. Integrations/8.5 Webhooks/8.5.3 Custom action.md

Lines changed: 0 additions & 7 deletions
@@ -95,13 +95,11 @@ acknowledged_at | int64| No | Acknowledgment time
 | alert_key | string | Yes | Alert correlation basis |
 | alert_severity | string | Yes | Severity level, enum: Critical, Warning, Info |
 | alert_status | string | Yes | Alert status, enum: Critical, Warning, Info, Ok |
-| progress | string | Yes | Processing progress, enum: Triggered, Closed |
 | created_at | int64 | Yes | Creation time |
 | updated_at | int64 | Yes | Update time |
 | start_time | int64 | Yes | First trigger time (time of first event received by platform), Unix timestamp in seconds |
 | last_time | int64 | Yes | Latest event time (time of latest event received by platform), Unix timestamp in seconds |
 | end_time | int64 | No | Alert recovery time (time of last end-type event received by platform), Unix timestamp in seconds, default 0 |
-| close_time | int64 | No | Close time, different from end_time, this is processing progress closure, doesn't mean alert actually recovered. Unix timestamp in seconds, default 0 |
 | labels | map[string]string | No | Label KV pairs, both Key and Value are strings |
 
 </div>
@@ -132,7 +130,6 @@ curl -X POST 'https://example.com/incident/action?a=a' \
 "alert_key": "asdflasdfl2xzasd112621",
 "alert_severity": "Critical",
 "alert_status": "Critical",
-"close_time": 0,
 "created_at": 1699869567,
 "data_source_id": 2398086111504,
 "description": "cpu.idle < 20%",
@@ -148,10 +145,6 @@ curl -X POST 'https://example.com/incident/action?a=a' \
 "v": "v"
 },
 "last_time": 1699869562,
-"progress": "Triggered",
-"responder_email": "",
-"responder_id": 0,
-"responder_name": "",
 "start_time": 1699869562,
 "title": "nj / es.nj.01 - Custom field test",
 "title_rule": "$cluster::$resource::$check",
Lines changed: 9 additions & 9 deletions
@@ -1,31 +1,31 @@
 ---
-title: "Alert Engine (Monitors) Introduction"
-description: "The Alert Engine (Monitors) integrates with various metric and log data sources, performs threshold evaluation based on user-configured alert rules through periodic data queries, generates alert events, and finally pushes them to Flashduty On-call for aggregation and delivery."
+title: "Alerting Engine (Monitors) Introduction"
+description: "The Alerting Engine (Monitors) integrates with various metric and log data sources, performs threshold evaluation based on user-configured alert rules through periodic data queries, generates alert events, and finally pushes them to Flashduty On-call for aggregation and delivery."
 date: "2025-11-07T18:49:47.055+08:00"
 url: "https://docs.flashcat.cloud/en/flashduty/monitors/introduction?nav=01JCQ7A4N4WRWNXW8EWEHXCMF5"
 ---
 
-## What is the Alert Engine (Monitors)?
+## What is the Alerting Engine (Monitors)?
 
-The Alert Engine (Monitors) integrates with various metric and log data sources, performs threshold evaluation based on your configured alert rules through periodic data queries, generates alert events, and finally pushes them to Flashduty On-call for aggregation and delivery.
+The Alerting Engine (Monitors) integrates with various metric and log data sources, performs threshold evaluation based on your configured alert rules through periodic data queries, generates alert events, and finally pushes them to Flashduty On-call for aggregation and delivery.
 
-Flashduty Monitors can replace the alerting capabilities of products like Nightingale, vmalert, and elastalert. The Monitors alert engine is designed to be extremely flexible and deeply integrated with On-call products, capable of meeting various complex alerting requirements.
+Flashduty Monitors can replace the alerting capabilities of products like Nightingale, vmalert, and elastalert. The Monitors alerting engine is designed to be extremely flexible and deeply integrated with On-call products, capable of meeting various complex alerting requirements.
 
-## Alert Engine (Monitors) Architecture Design
+## Alerting Engine (Monitors) Architecture Design
 
-Flashduty is a SaaS service that cannot access data sources within users' private networks from the SaaS side. Therefore, the Alert Engine (Monitors) consists of two parts:
+Flashduty is a SaaS service that cannot access data sources within users' private networks from the SaaS side. Therefore, the Alerting Engine (Monitors) consists of two parts:
 
 - **SaaS Server**: Responsible for managing alert rules and permissions
 - **monitedge**: Deployed within users' private networks, synchronizes alert rules from SaaS, performs periodic data queries and threshold evaluation, generates alert events and pushes them to the SaaS side
 
 The architecture diagram is shown below:
 
-![Flashduty Monitors Architecture Diagram](https://docs-cdn.flashcat.cloud/imges/mon/a4341737494509d131b637a74399a43c.png)
+![Flashduty Monitors Architecture Diagram](https://docs-cdn.flashcat.cloud/imges/mon/810c0f78abf52714ee32cee84461cdcc.png)
 
 The diagram assumes that the customer has two data centers, East US and South China. Each data center has a `monitedge` instance deployed, responsible for alert evaluation of data sources within their respective data centers and pushing alert events to the SaaS side.
 
 If you only have one data center, or if the network quality between data centers is good, you can also deploy only one `monitedge` instance to handle alert evaluation for all data sources.
 
 If you are concerned about single point of failure risks when deploying one `monitedge`, you can also deploy multiple `monitedge` instances to form a cluster. For example, deploy 2 `monitedge` instances in the East US data center to form a cluster, setting the same cluster name through the `--alerter.clusterName meidong` parameter when starting the instances; deploy 2 `monitedge` instances in the South China data center to form another cluster, setting another cluster name through the `--alerter.clusterName huanan` parameter when starting these two instances.
 
-Multiple instances in an alert engine cluster will automatically shard the processing of alert rules. For example, if this cluster needs to process 100 alert rules, the system will automatically balance the load, allowing each `monitedge` instance to process 50 rules respectively. If one instance fails, another instance will take over the processing of all 100 alert rules, ensuring high availability while avoiding duplicate alert event delivery.
+Multiple instances in an alerting engine cluster will automatically shard the processing of alert rules. For example, if this cluster needs to process 100 alert rules, the system will automatically balance the load, allowing each `monitedge` instance to process 50 rules respectively. If one instance fails, another instance will take over the processing of all 100 alert rules, ensuring high availability while avoiding duplicate alert event delivery.
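
The sharding behaviour described in the last paragraph of this file (100 rules split 50/50 across two instances, with a survivor taking all 100 on failure) can be illustrated with a small Python sketch. This is only one way such a split could work: the round-robin scheme, the function name `assign_rules`, and the instance names `edge-1` and `edge-2` are assumptions for illustration, and the document does not say which algorithm `monitedge` actually uses.

```python
def assign_rules(rule_ids, live_instances):
    """Split rule IDs across the live instances of one cluster.

    Hypothetical round-robin scheme: every rule gets exactly one owner,
    so no rule is evaluated twice and none is left unowned.
    """
    instances = sorted(live_instances)
    if not instances:
        raise ValueError("at least one live instance is required")
    assignment = {name: [] for name in instances}
    for index, rule_id in enumerate(sorted(rule_ids)):
        owner = instances[index % len(instances)]
        assignment[owner].append(rule_id)
    return assignment


if __name__ == "__main__":
    rules = list(range(100))
    # Both instances healthy: each owns half of the rules.
    healthy = assign_rules(rules, ["edge-1", "edge-2"])
    print({name: len(owned) for name, owned in healthy.items()})
    # One instance down: the survivor owns every rule.
    degraded = assign_rules(rules, ["edge-2"])
    print({name: len(owned) for name, owned in degraded.items()})
```

Because each rule has a single owner at any moment, recomputing the assignment from the current list of live instances gives both load balancing and failover without two instances emitting the same alert event.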

flashduty/en/3. Monitors/1. Getting Started/2. Quick Start.md

Lines changed: 8 additions & 8 deletions
@@ -2,7 +2,7 @@
 title: "Quick Start"
 description: "Monitors quick start guide to help you quickly get started with Monitors functionality."
 date: "2025-11-07T19:24:43.956+08:00"
-url: "https://docs.flashcat.cloud/en/flashduty/monitors/installation?nav=01JCQ7A4N4WRWNXW8EWEHXCMF5"
+url: "https://docs.flashcat.cloud/en/flashduty/monitors/getting-started?nav=01JCQ7A4N4WRWNXW8EWEHXCMF5"
 ---
 
 To experience Monitors functionality, there are three core steps: install `monitedge`, create data sources, and create alert rules.
@@ -48,28 +48,28 @@ The following details the various configurations of alert rules. Each field usua
 
 ### Basic Configuration
 
-![Basic Configuration](https://docs-cdn.flashcat.cloud/imges/mon/3a2978a22d7a23dd862fdbd409adf663.png)
+![Basic Configuration](https://docs-cdn.flashcat.cloud/imges/mon/90942addfba10594e6fd97944fc20d43.png)
 
 - **Rule Name**: The name of the alert rule, for easy identification and management. Variable references are not supported because names may be used for filtering, aggregation and other operations in the future, and fixed names are more convenient for processing.
 - **Additional Labels**: Similar to `labels` in Prometheus alert rules, they will be attached to all alert events generated by this rule, facilitating filtering, routing, inhibition and other operations in On-call.
 
 ### Data Source Selection
 
-![Data Source Selection](https://docs-cdn.flashcat.cloud/imges/mon/9971af45b4bc19bfe807898bf1bf10a0.png)
+![Data Source Selection](https://docs-cdn.flashcat.cloud/imges/mon/1f183f153a46a42486573c672c457afc.png)
 
 Monitors can make one rule effective for multiple data sources, and wildcards can be used, such as `db-*`, indicating that this rule will apply to all data sources whose names start with `db-`.
 
 > ⚠️ Note: Because wildcards need to be supported here for data sources, data source names are stored instead of data source IDs. If the data source name is modified, it will affect the effectiveness of alert rules. Please be cautious when modifying data source names.
 
 ### Query Detection Method
 
-![Query Detection Method](https://docs-cdn.flashcat.cloud/imges/mon/4e38b46952d6cbbfb97c2d28843dcdbe.png)
+![Query Detection Method](https://docs-cdn.flashcat.cloud/imges/mon/7726c726d684559c181129a2cac1b8a4.png)
 
 This section is used to configure how to query data from data sources and how to determine alert conditions. This functionality is designed to be very flexible, which also brings higher complexity. Please read the usage instructions on the right side of **Query Detection Method** on the page to understand the configuration method.
 
 ### Detection Frequency & Effective Time
 
-![Detection Frequency & Effective Time](https://docs-cdn.flashcat.cloud/imges/mon/0980d71a653985a1706243fc6795685e.png)
+![Detection Frequency & Effective Time](https://docs-cdn.flashcat.cloud/imges/mon/535c3e683411a3af3acae8b4415e622d.png)
 
 - **Detection Frequency**: Usually periodic detection, also supports configuring `cron` expressions. The `cron` expressions in Monitors are accurate to the second.
 - **Effective Time**: Configure the effective time period for alert rules. Alerts will not be triggered during non-effective time periods.
@@ -88,18 +88,18 @@ This section is used to configure how to query data from data sources and how to
 
 After completing the above configuration, if alert conditions are triggered, alert events will be generated, and the status in front of the alert rule will also change to `Triggered`.
 
-![Alert Rules List Page](https://docs-cdn.flashcat.cloud/imges/mon/6f1fad7d65b5aee6b89bf0f0a564a1be.png)
+![Alert Rules List Page](https://docs-cdn.flashcat.cloud/imges/mon/38ba91252ee84a69b081d2e1e9ee65ea.png)
 
 Clicking `Triggered` will show the alert events generated by this rule (you can also view them in On-call):
 
-![Alert Events List](https://docs-cdn.flashcat.cloud/imges/mon/3a307bf41012c5e085d81ca8b2dc443b.png)
+![Alert Events List](https://docs-cdn.flashcat.cloud/imges/mon/53463829edad49f876a182cd320bcee8.png)
 
 Continue clicking on the alert event title to see the alert event details, divided into three tabs: **Alert Overview**, **Timeline**, **Associated Events**. These are all functions of the On-call system, and the meaning of each field is also quite obvious, so they will not be described one by one here.
 
 ## 5. Import Alert Rules
 
 If you already have a batch of Prometheus alert rules and want to quickly import them into Monitors for use, you can use the alert rule import function. Menu entry: Alert Rules → Import.
 
-![Import Alert Rules](https://docs-cdn.flashcat.cloud/imges/mon/a613b20d1aeaf7321be5ab43bf07a83f.png)
+![Import Alert Rules](https://docs-cdn.flashcat.cloud/imges/mon/a475df96e4fc0efa0896b51af4134b52.png)
 
 The requirement is to import Prometheus alert rule YAML format text, in the standard Prometheus alert rule file format with `groups` as the root node. The YAML indentation must be correct, otherwise the import will fail.
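
Since the import expects the standard Prometheus rule file layout with `groups` at the root, a pre-import sanity check can catch structural mistakes early. The Python sketch below is hypothetical and is not the validation Monitors performs: it assumes the YAML text has already been parsed into a dictionary by a YAML parser of your choice, and the function name `check_rule_file` and the sample rule are illustrative only.

```python
def check_rule_file(doc) -> list:
    """Return a list of structural problems found in a parsed rule file."""
    groups = doc.get("groups") if isinstance(doc, dict) else None
    if not isinstance(groups, list):
        return ["root node must be a mapping with a `groups` list"]
    problems = []
    for g_index, group in enumerate(groups):
        if not isinstance(group, dict) or "name" not in group:
            problems.append(f"groups[{g_index}]: missing `name`")
            continue
        for r_index, rule in enumerate(group.get("rules") or []):
            where = f"groups[{g_index}].rules[{r_index}]"
            if not isinstance(rule, dict):
                problems.append(f"{where}: rule must be a mapping")
                continue
            # Prometheus rules are either alerting (`alert`) or recording (`record`).
            if "alert" not in rule and "record" not in rule:
                problems.append(f"{where}: missing `alert` or `record`")
            if "expr" not in rule:
                problems.append(f"{where}: missing `expr`")
    return problems


if __name__ == "__main__":
    # Equivalent of a YAML file with `groups` as the root node (sample values).
    parsed = {
        "groups": [
            {
                "name": "example",
                "rules": [
                    {
                        "alert": "HighCpu",
                        "expr": "cpu_usage > 90",
                        "for": "5m",
                        "labels": {"severity": "warning"},
                    }
                ],
            }
        ]
    }
    print(check_rule_file(parsed))
```

A file whose indentation has slipped typically parses into a different shape (for example `rules` ending up at the root), which this kind of check reports as a missing `groups` list or a missing key.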

flashduty/zh/1. On-call/3. 配置管理/4.7 配置通知模板.md

Lines changed: 0 additions & 2 deletions
@@ -132,10 +132,8 @@ Title | string | Yes | Alert title
 Description | string | Yes | Alert description, may be empty
 AlertSeverity | string | Yes | Severity level, enum values: Critical, Warning, Info
 AlertStatus | string | Yes | Alert status, enum values: Critical, Warning, Info, Ok
-Progress | string | Yes | Processing progress, enum values: Triggered, Processing, Closed
 StartTime | int64 | Yes | Trigger time, Unix timestamp in seconds
 EndTime | int64 | No | Recovery time, Unix timestamp in seconds, default 0
-CloseTime | int64 | No | Close time. EndTime is the alert recovery time; CloseTime is the close time of the processing progress. An alert closes automatically on recovery; manually closing an alert does not affect its recovery. Unix timestamp in seconds, default 0
 `Labels` | map[string]string | No | Label key-value pairs, both Key and Value are strings
 
 ## Common Questions

flashduty/zh/1. On-call/5. 集成引导/8.5 Webhooks/8.5.1 告警 webhook.md

Lines changed: 0 additions & 5 deletions
@@ -13,7 +13,6 @@
 | a_new | Integration pushes a new event, triggering a new alert |
 | a_update | Integration pushes a new event that is merged into an existing alert and updates the alert information (severity, status, labels, description, etc.) |
 | a_merge | Merge alert into an incident |
-| a_close | Manually close alert |
 
 </div>
 
@@ -64,13 +63,11 @@ email | string | Yes | Email address
 | alert_key | string || Alert correlation basis|
 | alert_severity | string || Severity level, enum: Critical, Warning, Info|
 | alert_status | string || Alert status, enum: Critical, Warning, Info, Ok|
-| progress | string || Processing progress, enum: Triggered, Closed|
 | created_at | int64 || Creation time|
 | updated_at | int64 || Update time|
 | start_time | int64 || First trigger time (time of first event received by platform), Unix timestamp in seconds|
 | last_time | int64 || Latest event time (time of most recent event received by platform), Unix timestamp in seconds|
 | end_time | int64 || Alert recovery time (time when platform last received an end-type event), Unix timestamp in seconds, defaults to 0|
-| close_time | int64 || Close time; unlike end_time, this marks closure of the processing progress and does not mean the alert has actually recovered. Unix timestamp in seconds, defaults to 0 |
 | labels | map[string]string || Label key-value pairs, both Key and Value are strings|
 | event_cnt | int64 || Number of associated events|
 | incident | [Incident](#Incident) || Associated incident|
@@ -103,7 +100,6 @@ curl -X POST 'https://example.com/alert/webhook?a=a' \
 "alert_status":"Warning",
 "channel_id":1163577812973,
 "channel_name":"Order System",
-"close_time":0,
 "created_at":1683766015,
 "data_source_id":1571358104973,
 "data_source_name":"Aliyun SLS",
@@ -131,7 +127,6 @@ curl -X POST 'https://example.com/alert/webhook?a=a' \
 "severity":"6"
 },
 "last_time":1683809153,
-"progress":"Triggered",
 "start_time":1683766013,
 "title":"Test alert trigger to Flashduty",
 "title_rule":"$resource::$check",

flashduty/zh/1. On-call/5. 集成引导/8.5 Webhooks/8.5.3 自定义操作.md

Lines changed: 0 additions & 7 deletions
@@ -96,13 +96,11 @@ acknowledged_at | int64| No | Acknowledgment time
 | alert_key | string || Alert correlation basis|
 | alert_severity | string || Severity level, enum: Critical, Warning, Info|
 | alert_status | string || Alert status, enum: Critical, Warning, Info, Ok|
-| progress | string || Processing progress, enum: Triggered, Closed|
 | created_at | int64 || Creation time|
 | updated_at | int64 || Update time|
 | start_time | int64 || First trigger time (time of first event received by platform), Unix timestamp in seconds|
 | last_time | int64 || Latest event time (time of most recent event received by platform), Unix timestamp in seconds|
 | end_time | int64 || Alert recovery time (time when platform last received an end-type event), Unix timestamp in seconds, defaults to 0|
-| close_time | int64 || Close time; unlike end_time, this marks closure of the processing progress and does not mean the alert has actually recovered. Unix timestamp in seconds, defaults to 0 |
 | labels | map[string]string || Label key-value pairs, both Key and Value are strings|
 
 </div>
@@ -133,7 +131,6 @@ curl -X POST 'https://example.com/incident/action?a=a' \
 "alert_key": "asdflasdfl2xzasd112621",
 "alert_severity": "Critical",
 "alert_status": "Critical",
-"close_time": 0,
 "created_at": 1699869567,
 "data_source_id": 2398086111504,
 "description": "cpu.idle < 20%",
@@ -149,10 +146,6 @@ curl -X POST 'https://example.com/incident/action?a=a' \
 "v": "v"
 },
 "last_time": 1699869562,
-"progress": "Triggered",
-"responder_email": "",
-"responder_id": 0,
-"responder_name": "",
 "start_time": 1699869562,
 "title": "nj / es.nj.01 - Custom field test",
 "title_rule": "$cluster::$resource::$check",
