docs/reference/index-config.md (3 additions, 1 deletion)
@@ -71,7 +71,7 @@ The default match index options are:
 - `token_filters`: a list of filters to apply to normalize tokens before indexing.
 - `tokenizer`: determines how input text is split into tokens.
 - `bf`: The size of the backing [bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) in bits. Defaults to `2048`.
-- `k`: The maximum number of bits set in the bloom filter per term. Defaults to `6`.
+- `k`: The number of hash functions to use per term (each sets one bit in the bloom filter). Defaults to `6`.

 **Token filters**
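The `bf` and `k` options trade index size against lookup accuracy. As a rough guide, the standard bloom filter approximation `(1 - e^(-k*n/m))^k` estimates the false-positive rate for a filter of `m` bits with `k` hash functions and `n` inserted terms. A minimal sketch (the function name and usage here are illustrative, not part of the library's API):

```python
import math

def bloom_false_positive_rate(bf_bits: int, k: int, n_terms: int) -> float:
    """Estimate the false-positive rate of a bloom filter with
    bf_bits bits, k hash functions, and n_terms inserted terms,
    using the standard approximation (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n_terms / bf_bits)) ** k

# With the documented defaults (bf=2048 bits, k=6), indexing 200 terms:
print(f"{bloom_false_positive_rate(2048, 6, 200):.4f}")
```

With the defaults, around 200 indexed terms keeps the estimated false-positive rate under 1%; more terms per document call for a larger `bf`.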
@@ -93,6 +93,8 @@ There are two `tokenizer`s provided: `standard` and `ngram`.
 This determines the maximum number of bits that will be set in the bloom filter per term.
 `k` must be an integer from `3` to `16` and defaults to `6`.

+To calculate optimal values for your use case, see this [Bloom filter calculator](https://di-mgt.com.au/bloom-calculator.html).
+
 **Caveats around n-gram tokenization**

 While using n-grams as a tokenization method allows greater flexibility when doing arbitrary substring matches, it is important to bear in mind the limitations of this approach.
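To make the trade-off concrete, character n-gram tokenization splits each term into every overlapping window of `n` characters, so a substring query matches whenever its n-grams are all present. A minimal sketch (the function name, lowercasing, and lack of padding are illustrative assumptions, not the actual `ngram` tokenizer's behavior):

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split text into overlapping character n-grams.
    Illustrative sketch: lowercases input and uses no edge padding."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("Search"))
```

A query like `"earc"` matches because its trigrams (`ear`, `arc`) all appear among the indexed trigrams, which is also why false positives arise: the n-grams can be present without the full substring occurring contiguously.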