Commit 6452e76

Deduplicator - Add README files
1 parent 30806cd commit 6452e76

2 files changed

Lines changed: 53 additions & 0 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -6,3 +6,4 @@ functionality/purposes are:

* [Sampler](modules/sampler/): sample records at the given rate.
* [Telemetry](modules/telemetry/): provides UniRec telemetry of the input interface.
* [Deduplicator](modules/deduplicator/): omit duplicate records.

modules/deduplicator/README.md

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Deduplicator module - README

## Description
The module prevents forwarding of duplicate UniRec records,
which appear when the same flow is exported by two different exporters and sent to the same collector.
It forwards only unique records and drops records that have already been seen.
Seen records are stored in a hash map.

## Interfaces
- Input: 1
- Output: 1

## Parameters
### Common TRAP parameters
- `-h [trap,1]` Print help message for this module / for libtrap specific parameters.
- `-i IFC_SPEC` Specification of interface types and their parameters.
- `-v` Be verbose.
- `-vv` Be more verbose.
- `-vvv` Be even more verbose.

### Module specific parameters
- `-s, --size <int>` Number of records that the hash table can hold simultaneously. Default value is 2^20.
- `-t, --timeout <int>` Time window, in milliseconds, within which similar flows are considered duplicates. Default value is 5000 (5 s).
- `-m, --appfs-mountpoint <path>` Path where the appFs directory will be mounted.

## Identification of duplicate flows
Flows are considered duplicates when they:
- arrive at the collector less than `--timeout` milliseconds apart
- have the same source and destination IP addresses, ports, and protocol field value
- have distinct `LINK_BIT_FIELD` values
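
The three conditions above can be sketched as a single lookup. This is a minimal illustration in Python, not the module's actual implementation: the `FlowKey` class, its field names, the `is_duplicate` function, and the choice to leave the stored entry untouched when a duplicate is found are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowKey:
    """5-tuple used to match flows (hypothetical field names)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def is_duplicate(seen, key, link_bit_field, now_ms, timeout_ms=5000):
    """Return True if the record duplicates one already stored in `seen`.

    `seen` maps FlowKey -> (link_bit_field, arrival time in milliseconds).
    """
    entry = seen.get(key)
    if entry is not None:
        seen_link, seen_at = entry
        # Duplicate: same 5-tuple, within the timeout, from a different link.
        if now_ms - seen_at < timeout_ms and seen_link != link_bit_field:
            return True
    # First sighting, expired entry, or same link: remember this record.
    # (Assumption: the stored entry is only refreshed for non-duplicates.)
    seen[key] = (link_bit_field, now_ms)
    return False
```

A record with the same 5-tuple and the same `LINK_BIT_FIELD` value is not treated as a duplicate here, since the third condition requires distinct values.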

## Usage Examples
```
# Data from the input unix socket interface "in" is processed. Entries that are
# duplicates of entries received during the last 1000 milliseconds are omitted;
# the others are forwarded to the output interface "out".
# Transient storage is a hash map with 2^15 records.

$ deduplicator -i "u:in,u:out" -s 15 -t 1000
```

## Telemetry data format
```
├─ input/
│  └─ stats
└─ deduplicator/
   └─ statistics
```

The statistics file contains the following flow counts:
- Replaced flows - flows that were inserted into a bucket by evicting the oldest flow in that bucket.
- Deduplicated flows - flows that were identified as duplicates and omitted.
- Inserted flows - flows that were inserted normally (neither Replaced nor Deduplicated).
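
To show how the three counters relate, here is a rough Python sketch of a bucketed table that evicts the oldest entry when a bucket is full. The `BucketedTable` class, its bucket layout, the default sizes, and the handling of expired entries are assumptions for illustration only; the module's real data structure may differ.

```python
from collections import Counter, OrderedDict


class BucketedTable:
    """Hypothetical bucketed hash table that keeps the three counters."""

    def __init__(self, bucket_count=4, bucket_size=2, timeout_ms=5000):
        # Each bucket keeps its entries in insertion order (oldest first).
        self.buckets = [OrderedDict() for _ in range(bucket_count)]
        self.bucket_size = bucket_size
        self.timeout_ms = timeout_ms
        self.stats = Counter()

    def process(self, key, link, now_ms):
        """Return True if the record should be forwarded."""
        bucket = self.buckets[hash(key) % len(self.buckets)]
        entry = bucket.get(key)
        if entry is not None:
            seen_link, seen_at = entry
            if now_ms - seen_at < self.timeout_ms and seen_link != link:
                self.stats["deduplicated"] += 1
                return False
            # Expired or same-link entry: drop it and re-insert as newest.
            del bucket[key]
        if len(bucket) >= self.bucket_size:
            bucket.popitem(last=False)  # evict the oldest flow in the bucket
            self.stats["replaced"] += 1
        else:
            self.stats["inserted"] += 1
        bucket[key] = (link, now_ms)
        return True
```

In this sketch every processed record increments exactly one counter, so the three counts sum to the number of records seen.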
