
Commit 6b6fde8

Merge pull request #159 from cipherstash/bloom-filter-docs
docs(bloom-filter): correct k parameter terminology and add calculator link
2 parents aad104f + 5f4b850 commit 6b6fde8

1 file changed

Lines changed: 3 additions & 1 deletion

docs/reference/index-config.md

@@ -71,7 +71,7 @@ The default match index options are:
 - `token_filters`: a list of filters to apply to normalize tokens before indexing.
 - `tokenizer`: determines how input text is split into tokens.
 - `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
-- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
+- `k`: The number of hash functions to use per term (each sets one bit in the bloom filter). Defaults to `6`.
 
 **Token filters**
 
@@ -93,6 +93,8 @@ There are two `tokenizer`s provided: `standard` and `ngram`.
 This determines the maximum number of bits that will be set in the bloom filter per term.
 `k` must be an integer from `3` to `16` and defaults to `6`.
 
+To calculate optimal values for your use case, see this [Bloom filter calculator](https://di-mgt.com.au/bloom-calculator.html).
+
 **Caveats around n-gram tokenization**
 
 While using n-grams as a tokenization method allows greater flexibility when doing arbitrary substring matches, it is important to bear in mind the limitations of this approach.
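The relationship between `bf` (filter size in bits) and `k` (hash functions per term) that this commit clarifies can be illustrated with a toy bloom filter. The sketch below is hypothetical: `ToyBloomFilter`, `false_positive_rate`, and `optimal_k` are illustrative names, the hash family (SHA-256 salted with the function index) is an assumption, and none of it reflects how CipherStash actually builds its match index. It uses the textbook approximation p ≈ (1 - e^(-kn/m))^k, where m is `bf`, n is the number of terms inserted, and k is the hash-function count, to show the kind of size/accuracy trade-off the linked calculator helps you explore.

```python
import hashlib
import math


class ToyBloomFilter:
    """Hypothetical illustration only: m bits, k hash functions per term.

    Not CipherStash's implementation; it just shows how `bf` (bits)
    and `k` (hash functions) relate.
    """

    def __init__(self, m: int = 2048, k: int = 6) -> None:
        # Mirrors the documented constraint: k is an integer from 3 to 16.
        if not 3 <= k <= 16:
            raise ValueError("k must be an integer from 3 to 16")
        self.m = m
        self.k = k
        self.bits = [False] * m

    def _positions(self, term: str) -> list[int]:
        # Assumption: simulate k independent hash functions by salting
        # SHA-256 with the function index, then reducing modulo m.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def add(self, term: str) -> None:
        # Each of the k hash functions sets one bit. If two functions
        # land on the same position, fewer than k distinct bits change,
        # so k is also an upper bound on bits set per term.
        for pos in self._positions(term):
            self.bits[pos] = True

    def might_contain(self, term: str) -> bool:
        # No false negatives; false positives are possible.
        return all(self.bits[pos] for pos in self._positions(term))


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Standard approximation: p ~= (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """k minimising the approximate false-positive rate: (m / n) * ln 2."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    bf = ToyBloomFilter(m=2048, k=6)
    for token in ["hello", "world"]:
        bf.add(token)
    print(bf.might_contain("hello"))  # True
    # Hypothetical workload: 200 terms per filter with bf = 2048.
    print(optimal_k(2048, 200))
    print(false_positive_rate(2048, 200, 6))
```

The collision comment in `add` is why the older wording ("maximum number of bits set") was an upper bound rather than a definition: `k` counts hash functions, and the number of distinct bits a term sets can be smaller when positions collide.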