Merge pull request #159 from cipherstash/bloom-filter-docs

tobyhede · web-flow · commit 6b6fde8e4c20 · 2025-12-13T14:33:25.000+11:00
docs(bloom-filter): correct k parameter terminology and add calculator link
diff --git a/docs/reference/index-config.md b/docs/reference/index-config.md
@@ -71,7 +71,7 @@ The default match index options are:
 - `token_filters`: a list of filters to apply to normalize tokens before indexing.
 - `tokenizer`: determines how input text is split into tokens.
 - `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
-- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
+- `k`: The number of hash functions to use per term (each sets one bit in the bloom filter). Defaults to `6`.
 
 **Token filters**
 
@@ -93,6 +93,8 @@ There are two `tokenizer`s provided: `standard` and `ngram`.
 This determines the maximum number of bits that will be set in the bloom filter per term.
 `k` must be an integer from `3` to `16` and defaults to `6`.
 
+To calculate optimal values for your use case, see this [Bloom filter calculator](https://di-mgt.com.au/bloom-calculator.html).
+
 **Caveats around n-gram tokenization**
 
 While using n-grams as a tokenization method allows greater flexibility when doing arbitrary substring matches, it is important to bear in mind the limitations of this approach.
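
Note on choosing `bf` and `k`: the linked calculator is based on the standard Bloom filter approximation, which can also be evaluated locally. The sketch below is a minimal illustration of that textbook formula, not this project's implementation. The function names are hypothetical, `m` stands for `bf`, and `n` (the number of distinct terms stored per filter) is an assumed input that depends on your data and tokenizer.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Textbook approximation of the Bloom filter false-positive probability.

    m: filter size in bits (the `bf` option)
    k: number of hash functions per term (the `k` option)
    n: number of distinct terms inserted (assumed; depends on your data)
    """
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued k that minimises the approximation above for given m and n."""
    return (m / n) * math.log(2)


if __name__ == "__main__":
    m, k = 2048, 6  # the documented defaults for `bf` and `k`
    for n in (50, 100, 200, 400):  # hypothetical term counts per value
        rate = false_positive_rate(m, k, n)
        print(f"n={n:4d}  fp_rate={rate:.6f}  optimal_k={optimal_k(m, n):.2f}")
```

The value returned by `optimal_k` is a real number, so it would need rounding and clamping to the documented `3` to `16` range before being used as `k`.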