Commit 5f4b850

docs(bloom-filter): correct k parameter terminology and add calculator link
Change the description of `k` from "bits set" to "number of hash functions" to align with standard Bloom filter terminology. Add a link to a Bloom filter calculator to help users determine optimal `bf` and `k` values for their use case.
1 parent bf6e01f commit 5f4b850

1 file changed

Lines changed: 3 additions & 1 deletion

docs/reference/index-config.md

@@ -71,7 +71,7 @@ The default match index options are:
 - `token_filters`: a list of filters to apply to normalize tokens before indexing.
 - `tokenizer`: determines how input text is split into tokens.
 - `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
-- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
+- `k`: The number of hash functions to use per term (each sets one bit in the bloom filter). Defaults to `6`.
 
 **Token filters**

@@ -93,6 +93,8 @@ There are two `tokenizer`s provided: `standard` and `ngram`.
 This determines the maximum number of bits that will be set in the bloom filter per term.
 `k` must be an integer from `3` to `16` and defaults to `6`.
 
+To calculate optimal values for your use case, see this [Bloom filter calculator](https://di-mgt.com.au/bloom-calculator.html).
+
 **Caveats around n-gram tokenization**
 
 While using n-grams as a tokenization method allows greater flexibility when doing arbitrary substring matches, it is important to bear in mind the limitations of this approach.
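
For a rough sense of what the linked calculator computes, here is a minimal sketch of the textbook Bloom filter approximation, where the false-positive probability is roughly `(1 - e^(-k*n/bf))^k`. The function names and the parameter `n` (the number of distinct terms added to one filter) are illustrative assumptions and are not part of the documented index options. The diff does not say whether the index's filter follows this idealized model exactly, so treat the numbers as estimates only. The `3` to `16` bounds on `k` are the ones stated in the documentation above.

```python
import math


def false_positive_rate(bf: int, k: int, n: int) -> float:
    """Textbook approximation of the false-positive probability.

    bf: filter size in bits, k: number of hash functions,
    n: number of distinct terms inserted (hypothetical parameter).
    """
    return (1.0 - math.exp(-k * n / bf)) ** k


def optimal_k(bf: int, n: int, k_min: int = 3, k_max: int = 16) -> int:
    """Pick the k in the documented 3..16 range that minimizes the
    approximate false-positive rate for the given bf and n."""
    return min(
        range(k_min, k_max + 1),
        key=lambda k: false_positive_rate(bf, k, n),
    )


if __name__ == "__main__":
    bf, k = 2048, 6  # documented defaults
    for n in (50, 100, 200, 400):
        rate = false_positive_rate(bf, k, n)
        print(f"n={n:4d}  default k={k}: p~{rate:.5f}  best k={optimal_k(bf, n)}")
```

Under this model the unconstrained optimum is `k = (bf / n) * ln 2`, so larger filters or fewer terms favor more hash functions. The sketch searches the permitted range directly instead of rounding that value, which keeps the result valid when the optimum falls outside `3` to `16`.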
