| 1 | +# Deduplicator module - README |
| 2 | + |
| 3 | +## Description |
| 4 | +The module avoids forwarding duplicate UniRec records |
| 5 | +that appear when the same flow is exported twice by different exporters and sent to the same collector. |
| 6 | +It identifies and forwards only unique records, ignoring records that have already been seen. |
| 7 | +Records that have been seen are stored in a hash map. |
| 8 | + |
| 9 | +## Interfaces |
| 10 | +- Input: 1 |
| 11 | +- Output: 1 |
| 12 | + |
| 13 | +## Parameters |
| 14 | +### Common TRAP parameters |
| 15 | +- `-h [trap,1]` Print help message for this module / for libtrap specific parameters. |
| 16 | +- `-i IFC_SPEC` Specification of interface types and their parameters. |
| 17 | +- `-v` Be verbose. |
| 18 | +- `-vv` Be more verbose. |
| 19 | +- `-vvv` Be even more verbose. |
| 20 | + |
| 21 | +### Module specific parameters |
| 22 | +- `-s, --size <int>` Count of records that the hash table can hold simultaneously. Default value is 2^20. |
| 23 | +- `-t, --timeout <int>` Time window, in milliseconds, within which similar flows are considered duplicates. Default value is 5000 (5 s). |
| 24 | +- `-m, --appfs-mountpoint <path>` Path where the appFs directory will be mounted. |
| 25 | + |
| 26 | +## Identification of duplicate flows |
| 27 | +Flows are considered duplicates when they: |
| 28 | +- arrive at the collector less than `--timeout` milliseconds apart |
| 29 | +- have the same source and destination IP addresses, ports, and protocol field value |
| 30 | +- have distinct `LINK_BIT_FIELD` values |
| 31 | + |
| 32 | +## Usage Examples |
| 33 | +``` |
| 34 | +# Data from the input unix socket interface "in" is processed, and entries that |
| 35 | +# are duplicates of entries received during the last 1000 milliseconds are omitted; all others are forwarded to the |
| 36 | +# output interface "out". Transient storage is a hash map with 2^15 records. |
| 37 | + |
| 38 | +$ deduplicator -i "u:in,u:out" -s 15 -t 1000 |
| 39 | +``` |
| 40 | + |
| 41 | +## Telemetry data format |
| 42 | +``` |
| 43 | +├─ input/ |
| 44 | +│ └─ stats |
| 45 | +└─ deduplicator/ |
| 46 | + └─ statistics |
| 47 | +``` |
| 48 | + |
| 49 | +The statistics file contains flow counts: |
| 50 | +- Replaced flows - flows whose insertion into a bucket removed the oldest flow from that bucket. |
| 51 | +- Deduplicated flows - flows that were identified as duplicates and omitted. |
| 52 | +- Inserted flows - flows that were inserted normally (neither Replaced nor Deduplicated). |
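
The duplicate criteria and the three counters described above can be illustrated with a small Python sketch. It is not the module's implementation: the `Flow` and `Deduplicator` names, the `bucket_count` parameter, the bucket capacity of 4, and the use of arrival time to pick the oldest flow are all assumptions made for illustration, since the README does not specify them.

```python
from dataclasses import dataclass

BUCKET_SIZE = 4  # assumed capacity; the README does not state the real value


@dataclass(frozen=True)
class Flow:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: int
    link_bit_field: int
    arrival_ms: int  # arrival time at the collector, in milliseconds

    def key(self):
        # Fields that must match for two flows to be duplicate candidates.
        return (self.src_ip, self.dst_ip, self.src_port, self.dst_port, self.proto)


def is_duplicate(new, seen, timeout_ms):
    # All three README criteria must hold at once.
    return (
        new.key() == seen.key()
        and abs(new.arrival_ms - seen.arrival_ms) < timeout_ms
        and new.link_bit_field != seen.link_bit_field
    )


class Deduplicator:
    def __init__(self, bucket_count=1024, timeout_ms=5000):
        self.buckets = [[] for _ in range(bucket_count)]
        self.timeout_ms = timeout_ms
        self.stats = {"inserted": 0, "replaced": 0, "deduplicated": 0}

    def process(self, flow):
        """Return True if the flow should be forwarded, False if omitted."""
        bucket = self.buckets[hash(flow.key()) % len(self.buckets)]
        if any(is_duplicate(flow, seen, self.timeout_ms) for seen in bucket):
            self.stats["deduplicated"] += 1
            return False
        if len(bucket) >= BUCKET_SIZE:
            # Bucket is full: drop the oldest flow to make room.
            bucket.remove(min(bucket, key=lambda f: f.arrival_ms))
            self.stats["replaced"] += 1
        else:
            self.stats["inserted"] += 1
        bucket.append(flow)
        return True
```

In this sketch, a flow that matches a stored flow on the five key fields but carries the same `LINK_BIT_FIELD` value is not treated as a duplicate and is stored as a new entry, which follows from the third criterion in the list.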