docs(bloom-filter): correct k parameter terminology and add calculator link

tobyhede · tobyhede · commit 5f4b850332b2 · 2025-12-12T08:26:01.000+11:00
Change the `k` description from "bits set" to "number of hash functions" to
align with standard Bloom filter terminology. Add a link to a Bloom filter
calculator to help users determine suitable `bf` and `k` values for their
use case.
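
For reviewers who want to sanity-check values without the calculator, here is
a rough sketch of the textbook Bloom filter estimates. It assumes independent,
uniform hash functions and is not taken from this project's code. The function
names and `n` (the expected number of distinct terms stored per filter) are
illustrative assumptions; `bf`, `k`, and the `3` to `16` range for `k` come
from the docs being edited.

```python
import math


def false_positive_rate(bf: int, k: int, n: int) -> float:
    """Approximate false-positive rate for a filter of `bf` bits,
    `k` hash functions, and `n` inserted terms (standard estimate)."""
    return (1.0 - math.exp(-k * n / bf)) ** k


def optimal_k(bf: int, n: int) -> float:
    """Unrounded k that minimizes the estimate above: (bf / n) * ln 2."""
    return (bf / n) * math.log(2)


def clamp_k(k: float, lo: int = 3, hi: int = 16) -> int:
    """Round k and clamp it to the documented range of 3 to 16."""
    return max(lo, min(hi, round(k)))


if __name__ == "__main__":
    bf, n = 2048, 200  # n = 200 is a made-up workload, not a recommendation
    print("suggested k:", clamp_k(optimal_k(bf, n)))
    print("estimated FPR at default k=6:", false_positive_rate(bf, 6, n))
```

The estimate only guides the trade-off: a larger `bf` lowers the
false-positive rate at the cost of space, and `k` has a sweet spot that
depends on how many terms each filter holds.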
diff --git a/docs/reference/index-config.md b/docs/reference/index-config.md
@@ -71,7 +71,7 @@ The default match index options are:
 - `token_filters`: a list of filters to apply to normalize tokens before indexing.
 - `tokenizer`: determines how input text is split into tokens.
 - `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
-- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
+- `k`: The number of hash functions to use per term (each sets one bit in the bloom filter). Defaults to `6`.
 
 **Token filters**
 
@@ -93,6 +93,8 @@ There are two `tokenizer`s provided: `standard` and `ngram`.
 This determines the maximum number of bits that will be set in the bloom filter per term.
 `k` must be an integer from `3` to `16` and defaults to `6`.
 
+To calculate optimal values for your use case, see this [Bloom filter calculator](https://di-mgt.com.au/bloom-calculator.html).
+
 **Caveats around n-gram tokenization**
 
 While using n-grams as a tokenization method allows greater flexibility when doing arbitrary substring matches, it is important to bear in mind the limitations of this approach.