| 1 | +# ClickHouse output module |
| 2 | +Converts UniRec records into ClickHouse format and stores them in one or more databases. |
| 3 | +- When multiple database endpoints are specified, data is sent to only one of them. |
| 4 | +By default, the first endpoint is used; the others are used if the preceding ones fail. |
| 5 | + |
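The failover rule described above can be sketched as follows. This is only an illustration, not the module's actual code: the `pick_endpoint` function, the `(host, port)` tuple representation, and the `can_connect` callback are all hypothetical names introduced here.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical representation of one configured endpoint: (host, port).
Endpoint = Tuple[str, int]


def pick_endpoint(
    endpoints: Sequence[Endpoint],
    can_connect: Callable[[Endpoint], bool],
) -> Optional[Endpoint]:
    """Return the first endpoint, in configured order, that accepts a connection.

    Returns None when every endpoint fails.
    """
    for endpoint in endpoints:
        if can_connect(endpoint):
            return endpoint
    return None


# Usage: the first endpoint is preferred; the second is used only if the first fails.
endpoints = [("ch-replica-1", 9000), ("ch-replica-2", 9000)]
unreachable = {("ch-replica-1", 9000)}
chosen = pick_endpoint(endpoints, lambda e: e not in unreachable)
```

In this sketch the data always goes to exactly one endpoint; the remaining ones act purely as fallbacks.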
| 6 | +## Interfaces |
| 7 | +- Input: 1 |
| 8 | +- Output: 0 |
| 9 | + |
| 10 | +## Parameters |
| 11 | +### Common TRAP parameters |
| 12 | +- `-h [trap,1]` Print help message for this module / for libtrap specific parameters. |
| 13 | +- `-i IFC_SPEC` Specification of interface types and their parameters. |
| 14 | +- `-v` Be verbose. |
| 15 | +- `-vv` Be more verbose. |
| 16 | +- `-vvv` Be even more verbose. |
| 17 | + |
| 18 | +### Module specific parameters |
| 19 | +- `-c, --config <path>` Path to the YAML config file specifying connection parameters and data columns. |
| 20 | + |
| 21 | +## Usage |
| 22 | +The module expects the ClickHouse database to already contain a table whose |
| 23 | +schema corresponds to the columns given in the configuration. The existence |
| 24 | +and schema of the table are checked after the connection to the database is initiated, |
| 25 | +and an error is displayed if there is a mismatch. The table is not |
| 26 | +created automatically. |
| 27 | + |
| 28 | +### UniRec to ClickHouse type conversion |
| 29 | +| UniRec | ClickHouse | | UniRec | ClickHouse | |
| 30 | +|---------|---------------|-|----------|----------------------| |
| 31 | +| int8 | Int8 | | int8* | Array(Int8) | |
| 32 | +| int16 | Int16 | | int16* | Array(Int16) | |
| 33 | +| int32 | Int32 | | int32* | Array(Int32) | |
| 34 | +| int64 | Int64 | | int64* | Array(Int64) | |
| 35 | +| uint8 | UInt8 | | uint8* | Array(UInt8) | |
| 36 | +| uint16 | UInt16 | | uint16* | Array(UInt16) | |
| 37 | +| uint32 | UInt32 | | uint32* | Array(UInt32) | |
| 38 | +| uint64 | UInt64 | | uint64* | Array(UInt64) | |
| 39 | +| char | UInt8 | | char* | Array(UInt8) | |
| 40 | +| float | Float32 | | float* | Array(Float32) | |
| 41 | +| double | Float64 | | double* | Array(Float64) | |
| 42 | +| ipaddr | IPv6 | | ipaddr* | Array(IPv6) | |
| 43 | +| macaddr | Array(UInt8) | | macaddr* | Array(Array(UInt8)) | |
| 44 | +| time | DateTime64(9) | | time* | Array(DateTime64(9)) | |
| 45 | +| string | String | | | | |
| 46 | +| bytes | Array(UInt8) | | | | |
| 47 | + |
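The table above can be read as a lookup on the scalar type plus an `Array(...)` wrapper for the `*` (array) variants. The sketch below encodes it that way; the `clickhouse_type` function is a hypothetical helper, and rejecting `string*` and `bytes*` is an assumption based only on the table listing no array form for them.

```python
# Scalar UniRec type -> ClickHouse type, copied from the conversion table.
SCALAR_TYPES = {
    "int8": "Int8",
    "int16": "Int16",
    "int32": "Int32",
    "int64": "Int64",
    "uint8": "UInt8",
    "uint16": "UInt16",
    "uint32": "UInt32",
    "uint64": "UInt64",
    "char": "UInt8",
    "float": "Float32",
    "double": "Float64",
    "ipaddr": "IPv6",
    "macaddr": "Array(UInt8)",
    "time": "DateTime64(9)",
    "string": "String",
    "bytes": "Array(UInt8)",
}

# Assumption: the table lists no array variant for these types.
NO_ARRAY_FORM = {"string", "bytes"}


def clickhouse_type(unirec_type: str) -> str:
    """Map a UniRec type name (optionally suffixed with '*') to a ClickHouse type."""
    is_array = unirec_type.endswith("*")
    base = unirec_type[:-1] if is_array else unirec_type
    if base not in SCALAR_TYPES:
        raise ValueError(f"unknown UniRec type: {unirec_type!r}")
    if is_array and base in NO_ARRAY_FORM:
        raise ValueError(f"no array form listed for: {base!r}")
    scalar = SCALAR_TYPES[base]
    return f"Array({scalar})" if is_array else scalar
```

For example, `clickhouse_type("macaddr*")` yields the nested `Array(Array(UInt8))` shown in the table, because `macaddr` itself already maps to an array.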
| 48 | +### ClickHouse database and table creation example |
| 49 | +```SQL |
| 50 | +CREATE DATABASE IF NOT EXISTS clickhouse; |
| 51 | +CREATE TABLE clickhouse.flows( |
| 52 | + "DST_IP" IPv6, |
| 53 | + "SRC_IP" IPv6, |
| 54 | + "BYTES" UInt64, |
| 55 | + "BYTES_REV" UInt64, |
| 56 | + "LINK_BIT_FIELD" UInt64, |
| 57 | + "TIME_FIRST" DateTime64(9), |
| 58 | + "TIME_LAST" DateTime64(9), |
| 59 | + "PACKETS" UInt32, |
| 60 | + "PACKETS_REV" UInt32, |
| 61 | + "DST_PORT" UInt16, |
| 62 | + "SRC_PORT" UInt16, |
| 63 | + "FLOW_END_REASON" UInt8, |
| 64 | + "PROTOCOL" UInt8, |
| 65 | + "TCP_FLAGS" UInt8, |
| 66 | + "TCP_FLAGS_REV" UInt8, |
| 67 | + "IDP_CONTENT" Array(UInt8), |
| 68 | + "IDP_CONTENT_REV" Array(UInt8), |
| 69 | + "PPI_PKT_DIRECTIONS" Array(Int8), |
| 70 | + "PPI_PKT_FLAGS" Array(UInt8), |
| 71 | + "TLS_JA3_FINGERPRINT" Array(UInt8), |
| 72 | + "TLS_SNI" String, |
| 73 | + "PPI_PKT_LENGTHS" Array(UInt16), |
| 74 | + "DBI_BRST_BYTES" Array(UInt32), |
| 75 | + "DBI_BRST_PACKETS" Array(UInt32), |
| 76 | + "D_PHISTS_IPT" Array(UInt32), |
| 77 | + "D_PHISTS_SIZES" Array(UInt32), |
| 78 | + "SBI_BRST_BYTES" Array(UInt32), |
| 79 | + "SBI_BRST_PACKETS" Array(UInt32), |
| 80 | + "S_PHISTS_IPT" Array(UInt32), |
| 81 | + "S_PHISTS_SIZES" Array(UInt32), |
| 82 | + "DBI_BRST_TIME_START" Array(DateTime64(9)), |
| 83 | + "DBI_BRST_TIME_STOP" Array(DateTime64(9)), |
| 84 | + "PPI_PKT_TIMES" Array(DateTime64(9)), |
| 85 | + "SBI_BRST_TIME_START" Array(DateTime64(9)), |
| 86 | + "SBI_BRST_TIME_STOP" Array(DateTime64(9)) |
| 87 | +) |
| 88 | +ENGINE = MergeTree |
| 89 | +ORDER BY TIME_FIRST; |
| 90 | +``` |
| 91 | + |
| 92 | +## Configuration |
| 93 | +The module is configured with a YAML file passed via the `-c` option. |
| 94 | + |
| 95 | +### Config specification |
| 96 | +| Parameter | Description | Default | |
| 97 | +|-----------|-------------|---------| |
| 98 | +| **connection** | The database connection parameters. | | |
| 99 | +| connection.endpoints | The possible endpoints data can be sent to, i.e., all the replicas of a particular shard. In case one endpoint is unreachable, another one is used. | | |
| 100 | +| connection.endpoints.endpoint | Connection parameters of one endpoint. | | |
| 101 | +| connection.endpoints.endpoint.host | The ClickHouse database host as a domain name or an IP address. | | |
| 102 | +| connection.endpoints.endpoint.port | The port of the ClickHouse database. | 9000 | |
| 103 | +| connection.username | The database username. | | |
| 104 | +| connection.password | The database password. | | |
| 105 | +| connection.database | The database name where the specified table is present. | | |
| 106 | +| connection.table | The name of the table to insert the data into. | | |
| 107 | +| **blocks** | Number of data blocks in circulation. Each block is effectively a memory buffer that rows are written to before being sent to the ClickHouse database. | 64 |
| 108 | +| **inserterThreads** | Number of threads used for data insertion to ClickHouse. In other words, the number of ClickHouse connections that are concurrently used. | 8 | |
| 109 | +| **blockInsertThreshold** | Number of rows to be buffered into a block before the block is sent out to be inserted into the database. | 100000 | |
| 110 | +| **blockInsertMaxDelaySecs** | Maximum number of seconds to wait before a block gets sent out to be inserted into the database even if the threshold has not been reached yet. | 10 | |
| 111 | +| **columns** | List of the fields that make up each row, in UniRec template format (`[TYPE] [NAME]`). | | |
| 112 | + |
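The interplay of `blockInsertThreshold` and `blockInsertMaxDelaySecs` can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the module's implementation: the `should_send_block` name is hypothetical, the delay is assumed to be measured from the moment the block started filling, and empty blocks are assumed never to be sent.

```python
def should_send_block(
    buffered_rows: int,
    seconds_since_block_started: float,
    block_insert_threshold: int = 100000,
    block_insert_max_delay_secs: float = 10,
) -> bool:
    """Decide whether a block should be handed over for insertion.

    Defaults mirror the documented defaults of blockInsertThreshold and
    blockInsertMaxDelaySecs.
    """
    if buffered_rows == 0:
        # Assumption: there is nothing to insert, so the block is kept.
        return False
    if buffered_rows >= block_insert_threshold:
        # The row threshold has been reached.
        return True
    # The threshold has not been reached, but the block has waited long enough.
    return seconds_since_block_started >= block_insert_max_delay_secs
```

With the defaults, a block holding 500 rows would be sent once 10 seconds have passed, while a block reaching 100000 rows would be sent immediately.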
| 113 | + |
| 114 | +### Example configuration |
| 115 | +```YAML |
| 116 | +connection: |
| 117 | + endpoints: |
| 118 | + - host: localhost |
| 119 | + port: 9000 |
| 120 | + username: clickhouse |
| 121 | + password: clickhouse |
| 122 | + database: clickhouse |
| 123 | + table: flows |
| 124 | + |
| 125 | +inserterThreads: 32 |
| 126 | +blocks: 1024 |
| 127 | +blockInsertThreshold: 100000 |
| 128 | + |
| 129 | +columns: |
| 130 | + - ipaddr DST_IP |
| 131 | + - ipaddr SRC_IP |
| 132 | + - uint64 BYTES |
| 133 | + - uint64 BYTES_REV |
| 134 | + - uint64 LINK_BIT_FIELD |
| 135 | + - time TIME_FIRST |
| 136 | + - time TIME_LAST |
| 137 | + - uint32 PACKETS |
| 138 | + - uint32 PACKETS_REV |
| 139 | + - uint16 DST_PORT |
| 140 | + - uint16 SRC_PORT |
| 141 | + - uint8 FLOW_END_REASON |
| 142 | + - uint8 PROTOCOL |
| 143 | + - uint8 TCP_FLAGS |
| 144 | + - uint8 TCP_FLAGS_REV |
| 145 | + - bytes IDP_CONTENT |
| 146 | + - bytes IDP_CONTENT_REV |
| 147 | + - int8* PPI_PKT_DIRECTIONS |
| 148 | + - uint8* PPI_PKT_FLAGS |
| 149 | + - bytes TLS_JA3_FINGERPRINT |
| 150 | + - string TLS_SNI |
| 151 | + - uint16* PPI_PKT_LENGTHS |
| 152 | + - uint32* DBI_BRST_BYTES |
| 153 | + - uint32* DBI_BRST_PACKETS |
| 154 | + - uint32* D_PHISTS_IPT |
| 155 | + - uint32* D_PHISTS_SIZES |
| 156 | + - uint32* SBI_BRST_BYTES |
| 157 | + - uint32* SBI_BRST_PACKETS |
| 158 | + - uint32* S_PHISTS_IPT |
| 159 | + - uint32* S_PHISTS_SIZES |
| 160 | + - time* DBI_BRST_TIME_START |
| 161 | + - time* DBI_BRST_TIME_STOP |
| 162 | + - time* PPI_PKT_TIMES |
| 163 | + - time* SBI_BRST_TIME_START |
| 164 | + - time* SBI_BRST_TIME_STOP |
| 165 | +``` |
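Each entry under `columns` follows the `[TYPE] [NAME]` form. A minimal sketch of splitting such an entry is shown below; the `parse_column` function is a hypothetical helper and its strictness (exactly two whitespace-separated tokens) is an assumption, not the module's documented behavior.

```python
from typing import Tuple


def parse_column(spec: str) -> Tuple[str, str]:
    """Split a '[TYPE] [NAME]' column entry into (type, name)."""
    parts = spec.split()
    if len(parts) != 2:
        raise ValueError(f"expected '[TYPE] [NAME]', got: {spec!r}")
    type_name, field_name = parts
    return type_name, field_name


# Usage with a few entries from the example configuration.
columns = ["ipaddr DST_IP", "uint64 BYTES", "time* PPI_PKT_TIMES"]
parsed = [parse_column(c) for c in columns]
```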