Commit 14a76c0

Clickhouse - add README.md
Author: Pavel Siska
1 parent 8e35a71, commit 14a76c0

1 file changed

Lines changed: 165 additions & 0 deletions

modules/clickhouse/README.md

@@ -0,0 +1,165 @@
# ClickHouse output module
Converts UniRec records into ClickHouse format and stores them in one or more databases.
- When multiple database endpoints are specified, data is sent to only one of them.
  By default it is the first one; the others are used only if the preceding ones fail.

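The failover behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the module's actual code: `connect_first_available`, `fake_connect`, and the replica host names are hypothetical, and a failed connection is assumed to raise `OSError`.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Endpoint = Tuple[str, int]  # (host, port)
Conn = TypeVar("Conn")


def connect_first_available(
    endpoints: Sequence[Endpoint],
    connect: Callable[[str, int], Conn],
) -> Tuple[Endpoint, Conn]:
    """Try endpoints in the configured order and return the first that connects."""
    errors = []
    for host, port in endpoints:
        try:
            return (host, port), connect(host, port)
        except OSError as exc:  # assumption: connection failures raise OSError
            errors.append(f"{host}:{port}: {exc}")
    raise ConnectionError("no endpoint reachable: " + "; ".join(errors))


def fake_connect(host: str, port: int) -> str:
    """Hypothetical stand-in for a real client; only 'replica2' is reachable."""
    if host != "replica2":
        raise OSError("connection refused")
    return f"connected to {host}:{port}"


chosen, conn = connect_first_available(
    [("replica1", 9000), ("replica2", 9000), ("replica3", 9000)],
    fake_connect,
)
```

Here the first endpoint fails, so the second one is selected and the third is never contacted.
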
## Interfaces
- Input: 1
- Output: 0

## Parameters
### Common TRAP parameters
- `-h [trap,1]` Print help message for this module / for libtrap specific parameters.
- `-i IFC_SPEC` Specification of interface types and their parameters.
- `-v` Be verbose.
- `-vv` Be more verbose.
- `-vvv` Be even more verbose.

### Module specific parameters
- `-c, --config <file>` Path to the YAML config file specifying connection parameters and data columns.

## Usage
The module expects the ClickHouse database to already contain a table whose
schema matches the configured columns. The existence and schema of the table
are checked after the connection to the database is established, and an error
is reported if there is a mismatch. The table is not created automatically.

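The kind of check described above can be illustrated with a short Python sketch. It is not the module's implementation: `find_schema_mismatches` is a hypothetical helper, and it assumes columns are compared by name and type regardless of order. In practice the actual column types could be read with a query such as `DESCRIBE TABLE`.

```python
from typing import Dict, List


def find_schema_mismatches(
    expected: Dict[str, str], actual: Dict[str, str]
) -> List[str]:
    """Compare column name -> ClickHouse type mappings and list every difference."""
    problems = []
    for name, exp_type in expected.items():
        if name not in actual:
            problems.append(f"missing column {name} ({exp_type})")
        elif actual[name] != exp_type:
            problems.append(
                f"column {name}: expected {exp_type}, table has {actual[name]}"
            )
    for name in actual:
        if name not in expected:
            problems.append(f"unexpected column {name} ({actual[name]})")
    return problems


# Types derived from the configuration vs. types found in the table.
expected = {"SRC_IP": "IPv6", "BYTES": "UInt64", "TIME_FIRST": "DateTime64(9)"}
actual = {"SRC_IP": "IPv6", "BYTES": "UInt32", "PACKETS": "UInt32"}
mismatches = find_schema_mismatches(expected, actual)
```

An empty result means the table can be used as is; any entry would correspond to the error the module reports at startup.
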
### UniRec to ClickHouse type conversion
| UniRec  | ClickHouse    | | UniRec   | ClickHouse           |
|---------|---------------|-|----------|----------------------|
| int8    | Int8          | | int8*    | Array(Int8)          |
| int16   | Int16         | | int16*   | Array(Int16)         |
| int32   | Int32         | | int32*   | Array(Int32)         |
| int64   | Int64         | | int64*   | Array(Int64)         |
| uint8   | UInt8         | | uint8*   | Array(UInt8)         |
| uint16  | UInt16        | | uint16*  | Array(UInt16)        |
| uint32  | UInt32        | | uint32*  | Array(UInt32)        |
| uint64  | UInt64        | | uint64*  | Array(UInt64)        |
| char    | UInt8         | | char*    | Array(UInt8)         |
| float   | Float32       | | float*   | Array(Float32)       |
| double  | Float64       | | double*  | Array(Float64)       |
| ipaddr  | IPv6          | | ipaddr*  | Array(IPv6)          |
| macaddr | Array(UInt8)  | | macaddr* | Array(Array(UInt8))  |
| time    | DateTime64(9) | | time*    | Array(DateTime64(9)) |
| string  | String        | |          |                      |
| bytes   | Array(UInt8)  | |          |                      |

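The table follows a regular pattern: an array type `T*` maps to `Array(...)` wrapped around the mapping of `T`, while `string` and `bytes` have no array form. A minimal Python sketch of that rule is shown below; the function name `unirec_to_clickhouse` is hypothetical and the code only mirrors the table, not the module's source.

```python
# Scalar mappings copied from the conversion table.
SCALAR_TYPES = {
    "int8": "Int8", "int16": "Int16", "int32": "Int32", "int64": "Int64",
    "uint8": "UInt8", "uint16": "UInt16", "uint32": "UInt32", "uint64": "UInt64",
    "char": "UInt8",
    "float": "Float32", "double": "Float64",
    "ipaddr": "IPv6",
    "macaddr": "Array(UInt8)",
    "time": "DateTime64(9)",
    "string": "String",
    "bytes": "Array(UInt8)",
}

# The table lists no array variant for these types.
NO_ARRAY_FORM = {"string", "bytes"}


def unirec_to_clickhouse(unirec_type: str) -> str:
    """Return the ClickHouse column type for a UniRec type such as 'uint32*'."""
    is_array = unirec_type.endswith("*")
    base = unirec_type[:-1] if is_array else unirec_type
    if base not in SCALAR_TYPES:
        raise ValueError(f"unknown UniRec type: {unirec_type}")
    if not is_array:
        return SCALAR_TYPES[base]
    if base in NO_ARRAY_FORM:
        raise ValueError(f"no array form listed for: {base}")
    return f"Array({SCALAR_TYPES[base]})"
```

For example, `unirec_to_clickhouse("macaddr*")` yields the nested `Array(Array(UInt8))` from the last array row of the table.
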
### ClickHouse database and table creation example
```SQL
CREATE DATABASE IF NOT EXISTS clickhouse;
CREATE TABLE clickhouse.flows(
    "DST_IP" IPv6,
    "SRC_IP" IPv6,
    "BYTES" UInt64,
    "BYTES_REV" UInt64,
    "LINK_BIT_FIELD" UInt64,
    "TIME_FIRST" DateTime64(9),
    "TIME_LAST" DateTime64(9),
    "PACKETS" UInt32,
    "PACKETS_REV" UInt32,
    "DST_PORT" UInt16,
    "SRC_PORT" UInt16,
    "FLOW_END_REASON" UInt8,
    "PROTOCOL" UInt8,
    "TCP_FLAGS" UInt8,
    "TCP_FLAGS_REV" UInt8,
    "IDP_CONTENT" Array(UInt8),
    "IDP_CONTENT_REV" Array(UInt8),
    "PPI_PKT_DIRECTIONS" Array(Int8),
    "PPI_PKT_FLAGS" Array(UInt8),
    "TLS_JA3_FINGERPRINT" Array(UInt8),
    "TLS_SNI" String,
    "PPI_PKT_LENGTHS" Array(UInt16),
    "DBI_BRST_BYTES" Array(UInt32),
    "DBI_BRST_PACKETS" Array(UInt32),
    "D_PHISTS_IPT" Array(UInt32),
    "D_PHISTS_SIZES" Array(UInt32),
    "SBI_BRST_BYTES" Array(UInt32),
    "SBI_BRST_PACKETS" Array(UInt32),
    "S_PHISTS_IPT" Array(UInt32),
    "S_PHISTS_SIZES" Array(UInt32),
    "DBI_BRST_TIME_START" Array(DateTime64(9)),
    "DBI_BRST_TIME_STOP" Array(DateTime64(9)),
    "PPI_PKT_TIMES" Array(DateTime64(9)),
    "SBI_BRST_TIME_START" Array(DateTime64(9)),
    "SBI_BRST_TIME_STOP" Array(DateTime64(9))
)
ENGINE = MergeTree
ORDER BY TIME_FIRST;
```

## Configuration
The module is configured with a YAML file.

### Config specification
| Parameter | Description | Default |
|-----------|-------------|---------|
| **connection** | The database connection parameters. | |
| connection.endpoints | The possible endpoints data can be sent to, i.e., all the replicas of a particular shard. If one endpoint is unreachable, another one is used. | |
| connection.endpoints.endpoint | Connection parameters of one endpoint. | |
| connection.endpoints.endpoint.host | The ClickHouse database host as a domain name or an IP address. | |
| connection.endpoints.endpoint.port | The port of the ClickHouse database. | 9000 |
| connection.username | The database username. | |
| connection.password | The database password. | |
| connection.database | The name of the database that contains the specified table. | |
| connection.table | The name of the table to insert the data into. | |
| **blocks** | Number of data blocks in circulation. Each block is effectively a memory buffer that rows are written to before being sent to the ClickHouse database. | 64 |
| **inserterThreads** | Number of threads used for data insertion into ClickHouse. In other words, the number of ClickHouse connections used concurrently. | 8 |
| **blockInsertThreshold** | Number of rows to be buffered in a block before the block is sent out to be inserted into the database. | 100000 |
| **blockInsertMaxDelaySecs** | Maximum number of seconds to wait before a block is sent out to be inserted into the database, even if the threshold has not been reached yet. | 10 |
| **columns** | List of fields that each row consists of, in UniRec template format (`TYPE NAME`). | |

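The interplay of `blockInsertThreshold` and `blockInsertMaxDelaySecs` can be pictured with the following Python sketch. It is an illustration under assumptions, not the module's code: the `Block` class is hypothetical, and the delay is assumed to be measured from the moment the first row is buffered. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    """Hypothetical in-memory buffer that decides when it should be sent out."""
    insert_threshold: int = 100000   # corresponds to blockInsertThreshold
    max_delay_secs: float = 10.0     # corresponds to blockInsertMaxDelaySecs
    rows: List[tuple] = field(default_factory=list)
    first_row_time: float = 0.0

    def add(self, row: tuple, now: float) -> None:
        # Assumption: the delay timer starts with the first buffered row.
        if not self.rows:
            self.first_row_time = now
        self.rows.append(row)

    def should_send(self, now: float) -> bool:
        if not self.rows:
            return False  # nothing to insert
        if len(self.rows) >= self.insert_threshold:
            return True   # enough rows buffered
        return now - self.first_row_time >= self.max_delay_secs


block = Block(insert_threshold=3, max_delay_secs=10.0)
block.add(("2001:db8::1", 443), now=0.0)
ready_early = block.should_send(now=5.0)      # below threshold, delay not elapsed
ready_by_delay = block.should_send(now=10.0)  # delay elapsed with a single row
```

A low-traffic input is therefore still flushed at least every `blockInsertMaxDelaySecs` seconds, while a busy input is flushed as soon as a block fills up.
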
### Example configuration
```YAML
connection:
  endpoints:
    - host: localhost
      port: 9000
  username: clickhouse
  password: clickhouse
  database: clickhouse
  table: flows

inserterThreads: 32
blocks: 1024
blockInsertThreshold: 100000

columns:
  - ipaddr DST_IP
  - ipaddr SRC_IP
  - uint64 BYTES
  - uint64 BYTES_REV
  - uint64 LINK_BIT_FIELD
  - time TIME_FIRST
  - time TIME_LAST
  - uint32 PACKETS
  - uint32 PACKETS_REV
  - uint16 DST_PORT
  - uint16 SRC_PORT
  - uint8 FLOW_END_REASON
  - uint8 PROTOCOL
  - uint8 TCP_FLAGS
  - uint8 TCP_FLAGS_REV
  - bytes IDP_CONTENT
  - bytes IDP_CONTENT_REV
  - int8* PPI_PKT_DIRECTIONS
  - uint8* PPI_PKT_FLAGS
  - bytes TLS_JA3_FINGERPRINT
  - string TLS_SNI
  - uint16* PPI_PKT_LENGTHS
  - uint32* DBI_BRST_BYTES
  - uint32* DBI_BRST_PACKETS
  - uint32* D_PHISTS_IPT
  - uint32* D_PHISTS_SIZES
  - uint32* SBI_BRST_BYTES
  - uint32* SBI_BRST_PACKETS
  - uint32* S_PHISTS_IPT
  - uint32* S_PHISTS_SIZES
  - time* DBI_BRST_TIME_START
  - time* DBI_BRST_TIME_STOP
  - time* PPI_PKT_TIMES
  - time* SBI_BRST_TIME_START
  - time* SBI_BRST_TIME_STOP
```
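
Each entry of `columns` is a single string in `TYPE NAME` form. As a rough illustration of how such entries can be split into type and field name, here is a small Python sketch; `parse_columns` is a hypothetical helper and the module's own parser may accept or reject inputs differently.

```python
from typing import List, Tuple


def parse_columns(entries: List[str]) -> List[Tuple[str, str]]:
    """Split 'TYPE NAME' strings into (unirec_type, field_name) pairs."""
    columns = []
    for entry in entries:
        parts = entry.split()
        if len(parts) != 2:
            raise ValueError(f"expected 'TYPE NAME', got {entry!r}")
        unirec_type, name = parts
        columns.append((unirec_type, name))
    return columns


parsed = parse_columns(["ipaddr DST_IP", "uint32* DBI_BRST_BYTES", "time TIME_FIRST"])
# Array-typed fields are the ones whose type ends with '*'.
array_columns = [name for unirec_type, name in parsed if unirec_type.endswith("*")]
```

The resulting pairs are what would be matched against the table schema, with the field name as the column name.
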
