Provided a more useful blurb in each family index page

jmalkin · jmalkin · commit 53ff9e9363a3 · 2024-01-16T15:25:45.000-08:00
diff --git a/docs/source/distinct_counting/index.rst b/docs/source/distinct_counting/index.rst
@@ -1,6 +1,19 @@
 Distinct Counting 
 =================
-These are all of the sketches for distinct counting....
+
+.. currentmodule:: datasketches
+
+Distinct counting is one of the earliest tasks to which sketches were applied. The concept is simple:
+provide an estimate of the number of unique elements in a set of data. One of the first solutions came
+from Flajolet and Martin in 1985 with their seminal work
+`Probabilistic Counting Algorithms for Data Base Applications <http://db.cs.berkeley.edu/cs286/papers/flajoletmartin-jcss1985.pdf>`_.
+
+The DataSketches library offers several types of distinct counting sketches, each with different properties.
+
+  * :class:`hll_sketch`: HyperLogLog, a well-known sketch for distinct counting but no longer state-of-the-art.
+  * :class:`cpc_sketch`: Compressed Probabilistic Counting, which provides a better accuracy-space trade-off than HLL in serialized form, but with a somewhat larger footprint while in memory.
+  * :class:`theta_sketch`: Theta sketch, a type of k-minimum value sketch, which provides good performance with intersection and set difference operations.
+  * :class:`tuple_sketch`: Tuple sketch, which is similar to a theta sketch but supports additional data stored with each key.
 
 .. toctree::
   :maxdepth: 1
diff --git a/docs/source/frequency/index.rst b/docs/source/frequency/index.rst
@@ -1,6 +1,12 @@
 Frequency Sketches
 ==================
-These are all of the sketches for frequency estimation
+
+Frequency estimation involves determining how often an item has been seen in a stream. The library currently
+offers two types of sketches for frequency estimation, one of which has two closely-related variants.
+
+  * :class:`frequent_items_sketch`: Identifies the *Top K* or *heavy hitters* in a stream, those items whose weight is above a certain percentage of the entire stream. Does not necessarily provide an estimate for most items outside the heavy hitters.
+  * :class:`frequent_strings_sketch`: Like the items version but containing only strings (an implementation from before the library handled generic objects).
+  * :class:`count_min_sketch`: Provides an estimate for any item, regardless of relative weight, but does not maintain a list of the heaviest items.
 
 .. toctree::
   :maxdepth: 1
diff --git a/docs/source/helper/index.rst b/docs/source/helper/index.rst
@@ -1,11 +1,19 @@
 Helper Classes
 ==============
 
+.. currentmodule:: datasketches
+
 These classes are required for certain sketches or specific
 functionality within sketches.
 Some of them are abstract base classes, but in those cases there is at
 least one reference example of a concrete class.
 
+  * :class:`serde` is used when serializing and deserializing sketches.
+  * :class:`jaccard` is used to compute the Jaccard similarity between pairs of theta or tuple sketches.
+  * :class:`tuple_policy` is required to use a :class:`tuple_sketch` by specifying how summaries are combined.
+  * :func:`ks_test` performs a Kolmogorov-Smirnov test on sketches from the absolute-error quantiles family.
+  * :class:`kernel_function` is required when using a :class:`density_sketch` for Kernel Density Estimation.
+
 .. toctree::
   :maxdepth: 1
 
diff --git a/docs/source/quantiles/index.rst b/docs/source/quantiles/index.rst
@@ -1,6 +1,22 @@
 Quantiles Sketches
 ==================
-These are all of the sketches for quantile estimation....
+
+Quantile estimation is useful for understanding the distribution of data values in a stream. The sketches currently
+in the library are designed to answer queries about the *rank* of an item in the stream of items. That is, under
+a global ordering of all the items, what fraction of the items seen so far are less than (alternatively,
+less than or equal to) the given item? Using straightforward logic, they can also estimate the item at a given rank
+in the stream.
+
+These sketches may be used to compute approximate histograms, Probability Mass Functions (PMFs), or
+Cumulative Distribution Functions (CDFs).
+
+The library provides three types of quantiles sketches, each of which has a version for generic items as well as versions
+specific to a given numeric type (e.g. integer or floating point values). All three types provide error
+bounds on rank estimation with proven probabilistic error distributions.
+
+  * KLL: Provides uniform rank estimation error over the entire range.
+  * REQ: Provides relative rank error, which decreases when approaching either the high or the low end of the range.
+  * Classic quantiles: Largely deprecated in favor of KLL, also provides uniform rank estimation error. Included mainly for backwards compatibility with historic data.
 
 .. toctree::
   :maxdepth: 1
diff --git a/docs/source/sampling/index.rst b/docs/source/sampling/index.rst
@@ -1,6 +1,8 @@
 Random Sampling Sketches
 ========================
 
+.. currentmodule:: datasketches
+
 These sketches are used to randomly sample items. The length of the input
 stream does not need to be known in advance.
 
@@ -9,8 +11,8 @@ Probability Proportional to Size) sketches will include sample items based on
 each item's weight relative to the weight of the entire stream but
 they differ in details:
 
-  * EBPPS ensures that the probability of including an item is always exactly proportional to the item's weight.
-  * VarOpt optimizes for applying a predicate to the resulting sample such that the variance of the subset sum after applying the predicate is minimized, even if the inclusion probability differs somewhat from being proportional to the item's weight.
+  * :class:`ebpps_sketch` ensures that the probability of including an item is always exactly proportional to the item's weight.
+  * :class:`var_opt_sketch` optimizes for applying a predicate to the resulting sample such that the variance of the subset sum after applying the predicate is minimized, even if the inclusion probability differs somewhat from being proportional to the item's weight.
 
 .. toctree::
   :maxdepth: 1
diff --git a/docs/source/sampling/index.rst~ b/docs/source/sampling/index.rst~
diff --git a/docs/source/vector/index.rst b/docs/source/vector/index.rst
@@ -1,6 +1,10 @@
 Vector Sketches
 ==================
-These sketches are designed to accept vector inputs.
+
+.. currentmodule:: datasketches
+
+These sketches are designed to accept vector inputs. For now, the library provides only the
+:class:`density_sketch` for Kernel Density Estimation.
 
 .. toctree::
   :maxdepth: 1