Commit b27f036

Merge pull request #47 from ilotoki0804/patch-1
Mention that the sample of zstd_train_dict is chosen randomly
2 parents 3a820f3 + d0c2ec1 commit b27f036

1 file changed

Lines changed: 2 additions & 2 deletions

README.md

@@ -104,9 +104,9 @@ Note that a compression VFS such as https://github.com/mlin/sqlite_zstd_vfs migh
 
 - `zstd_train_dict(agg, dict_size: int, sample_count: int) -> blob`
 
-Aggregate function (like sum() or count()) to train a zstd dictionary on sample_count samples of the given aggregate data
+Aggregate function (like sum() or count()) to train a zstd dictionary on randomly selected sample_count samples of the given aggregate data
 
-Example use: `select zstd_train_dict(tbl.data, 100000, 1000) from tbl` will return a dictionary of size 100kB trained on 1000 samples in `tbl`
+Example use: `select zstd_train_dict(tbl.data, 100000, 1000) from tbl` will return a dictionary of size 100kB trained on 1000 random samples in `tbl`
 
 The recommended number of samples is 100x the target dictionary size. As an example, you can train a dict of 100kB with the "optimal" sample count as follows:
 
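The README example that the final context line introduces falls outside this hunk. As a rough sketch of the arithmetic behind that recommendation, the Python below derives a `sample_count` from the average row length. It assumes the recommendation means that the total sampled bytes should be about 100x the dictionary size, which is one reading of the README, not something this diff confirms. The helper `recommended_sample_count` and the toy table contents are hypothetical. The query is only built, not executed, because `zstd_train_dict` is available only once the sqlite-zstd extension is loaded.

```python
import sqlite3


def recommended_sample_count(conn, table, column, dict_size, ratio=100):
    """Hypothetical helper: pick sample_count so that the sampled bytes
    are roughly ratio * dict_size (assumed reading of the README)."""
    (avg_len,) = conn.execute(
        f'SELECT avg(length("{column}")) FROM "{table}"'
    ).fetchone()
    if not avg_len:
        raise ValueError("table is empty or column has no data")
    return max(1, int(dict_size * ratio / avg_len))


# Toy in-memory table standing in for `tbl` from the README example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (data TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("x" * 500,)] * 10)

dict_size = 100_000  # 100kB dictionary, as in the README example
sample_count = recommended_sample_count(conn, "tbl", "data", dict_size)

# Build (but do not run) the training query; running it requires the
# sqlite-zstd extension, which is not loaded in this sketch.
query = f"SELECT zstd_train_dict(tbl.data, {dict_size}, {sample_count}) FROM tbl"
print(sample_count)
print(query)
```

Per the change in this commit, the samples are chosen randomly, so repeated training runs over the same table may produce different dictionaries.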
