@@ -1744,7 +1744,8 @@ in that case, treating it as if it were a counter histogram.
17441744
17451745### Interpolation within a bucket
17461746
1747- When estimating quantiles or fractions, PromQL has to apply interpolation
1747+ When estimating quantiles or fractions of observations in a histogram, or when
1748+ removing observations from a histogram, PromQL has to apply interpolation
17481749within a bucket. In classic histograms, this interpolation happens in a linear
17491750fashion. It is based on the assumption that observations are equally
17501751distributed within the bucket. In reality, this assumption might be far off.
@@ -1763,17 +1764,31 @@ interpolation yields an error that is lower on average. Since the interpolation
17631764has worked well over many years of classic histogram usage, interpolation is
17641765also applied for native histograms.
17651766
1766- For NHCBs, PromQL applies the same interpolation method as for classic
1767+ For NHCBs, PromQL applies the same linear interpolation method as for classic
17671768histograms to keep results consistent. (The main use case for NHCBs is a
1768- drop-in replacement for classic histograms.) However, for standard exponential
1769- schemas, linear interpolation can be seen as a misfit. While exponential
1770- schemas primarily intend to minimize the relative error of quantile
1771- estimations, they also benefit from a balanced usage of buckets, at least over
1772- certain ranges of observed values. The basic assumption is that for most
1773- practically occurring distributions, the density of observations tends to be
1774- higher for smaller observed values. Therefore, PromQL uses exponential
1775- extrapolation for the standard schemas, which models the assumption
1776- that dividing a bucket into two when increasing the schema number by one (i.e.
1769+ drop-in replacement for classic histograms.) This includes the following
1770+ special cases:
1771+
1772+ - If all [ custom values] ( #custom-values ) are positive, the lower boundary of
1773+ the lowest bucket is assumed to be zero. (Note that it is recommended to
1774+ include a custom value of zero to avoid this heuristic.)
1775+ - If at least one custom value is zero or negative, the lower boundary of the
1776+ lowest bucket is assumed to be -Inf. However, for purposes of interpolation,
1777+ all observations in the lowest bucket are assumed to be equal to the upper
1778+ boundary of the bucket.
1779+ - Similarly, all observations in the overflow bucket are assumed to be equal to
1780+ the last custom value (i.e. the upper bound of the highest regular bucket).
1781+ - For an NHCB without any custom values, all observations are assumed to be of
1782+ value zero.
1783+
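The special cases above can be sketched as follows. This is a simplified illustration, not the PromQL implementation: the function name `nhcb_quantile` and the data layout (a sorted list of custom values plus one population per bucket, with the overflow bucket last) are assumptions made for this example.

```python
import math

def nhcb_quantile(q, custom_values, counts):
    """Estimate the q-quantile of an NHCB with linear interpolation.

    custom_values: sorted upper boundaries of the regular buckets.
    counts: one population per regular bucket, followed by the
            population of the overflow bucket (hypothetical layout).
    """
    total = sum(counts)
    if total == 0:
        return math.nan
    if not custom_values:
        # No custom values: all observations are assumed to be zero.
        return 0.0
    rank = q * total
    cumulative = 0.0
    for i, count in enumerate(counts):
        if count == 0 or cumulative + count < rank:
            cumulative += count
            continue
        if i == len(custom_values):
            # Overflow bucket: observations equal the last custom value.
            return custom_values[-1]
        upper = custom_values[i]
        if i == 0:
            if upper <= 0:
                # Lower boundary is -Inf: observations equal the upper boundary.
                return upper
            lower = 0.0  # all custom values are positive
        else:
            lower = custom_values[i - 1]
        # Observations are assumed to be equally distributed in the bucket.
        return lower + (upper - lower) * (rank - cumulative) / count
    return custom_values[-1]
```

For example, with custom values `[1, 2, 4]` and ten observations in each regular bucket, the estimated median falls halfway into the bucket between 1 and 2.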
1784+ For standard exponential schemas, linear interpolation can be seen as a misfit.
1785+ While exponential schemas primarily intend to minimize the relative error of
1786+ quantile estimations, they also benefit from a balanced usage of buckets, at
1787+ least over certain ranges of observed values. The basic assumption is that for
1788+ most practically occurring distributions, the density of observations tends to
1789+ be higher for smaller observed values. Therefore, PromQL uses exponential
1790+ interpolation for the standard schemas, which models the assumption that
1791+ dividing a bucket into two when increasing the schema number by one (i.e.
17771792doubling the resolution) will on average see similar populations in both new
17781793buckets. A more detailed explanation can be found in the [ PR implementing the
17791794interpolation method] ( https://github.com/prometheus/prometheus/pull/14677 ) .
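To illustrate the difference between the two methods, here is a minimal sketch for a single bucket with positive boundaries. The function names are hypothetical, and the code is not taken from the linked PR; it only shows the idea of interpolating on a logarithmic scale instead of a linear one.

```python
import math

def linear_interpolate(lower, upper, fraction):
    # Observations are assumed to be equally distributed between the boundaries.
    return lower + (upper - lower) * fraction

def exponential_interpolate(lower, upper, fraction):
    # Interpolate between the logarithms of the boundaries instead.
    # Only valid for positive boundaries in this sketch.
    log_lower = math.log2(lower)
    log_upper = math.log2(upper)
    return 2 ** (log_lower + (log_upper - log_lower) * fraction)
```

For a bucket from 1 to 4, the halfway point is 2.5 with linear interpolation but 2 with exponential interpolation. The value 2 is the boundary at which this bucket is divided into two when the resolution is doubled, so both new buckets are assumed to hold half of the population.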
@@ -1833,17 +1848,21 @@ the reason why the level of the annotation is only info.)
18331848
18341849The following describes all the operations that actually _ do_ work.
18351850
1851+ #### Addition and subtraction
1852+
18361853Addition (` + ` ) and subtraction (` - ` ) work between two compatible histograms.
18371854These operators add or subtract all matching bucket populations and the count
18381855and the sum of observations. Missing buckets are assumed to be empty and
1839- treated accordingly. Generally, both operands should be gauges. Adding and
1840- subtracting counter histograms requires caution, but PromQL allows it. Adding a
1841- gauge histogram and a counter histogram results in a gauge histogram. Adding
1842- two counter histograms results in a counter histogram. If the two operands
1843- share the same counter reset hint, the resulting counter histogram retains that
1844- counter reset hint. Otherwise, the resulting counter reset hint is set to
1845- ` UnknownCounterReset ` . The result of a subtraction is always marked as a gauge
1846- histogram because it might result in negative histograms, see [ notes
1856+ treated accordingly.
1857+
1858+ Generally, both operands should be gauges. Adding and subtracting counter
1859+ histograms requires caution, but PromQL allows it. Adding a gauge histogram and
1860+ a counter histogram results in a gauge histogram. Adding two counter histograms
1861+ results in a counter histogram. If the two operands share the same counter
1862+ reset hint, the resulting counter histogram retains that counter reset hint.
1863+ Otherwise, the resulting counter reset hint is set to ` UnknownCounterReset ` .
1864+ The result of a subtraction is always marked as a gauge histogram because it
1865+ might result in negative histograms, see [ notes
18471866above] ( #unary-minus-and-negative-histograms ) . Adding or subtracting two counter
18481867histograms with directly contradicting counter reset hints (i.e. ` CounterReset `
18491868and ` NotCounterReset ` ) triggers a warn-level annotation. (TODO: As described
@@ -1853,14 +1872,20 @@ circumstances involving the `HistogramStatsIterator`, which includes additional
18531872counter reset tracking. See [ tracking
18541873issue] ( https://github.com/prometheus/prometheus/issues/15346 ) .)
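
The rules for bucket matching and for the resulting flavor and counter reset hint can be sketched as follows. The `Hist` type and the `combine` function are hypothetical and heavily simplified (buckets are a plain mapping from a bucket key to its population). The warn-level annotation for contradicting hints is not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Hist:
    buckets: dict = field(default_factory=dict)  # bucket key -> population
    count: float = 0.0
    sum: float = 0.0
    is_counter: bool = False
    reset_hint: Optional[str] = None  # only meaningful for counter histograms

def combine(a, b, subtract=False):
    sign = -1.0 if subtract else 1.0
    keys = sorted(set(a.buckets) | set(b.buckets))
    # Missing buckets are treated as empty.
    buckets = {k: a.buckets.get(k, 0.0) + sign * b.buckets.get(k, 0.0) for k in keys}
    result = Hist(buckets, a.count + sign * b.count, a.sum + sign * b.sum)
    # Only the sum of two counter histograms is a counter histogram.
    # A subtraction always yields a gauge histogram.
    if not subtract and a.is_counter and b.is_counter:
        result.is_counter = True
        if a.reset_hint == b.reset_hint:
            result.reset_hint = a.reset_hint
        else:
            result.reset_hint = "UnknownCounterReset"
    return result
```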
18551874
1875+ #### Multiplication
1876+
18561877Multiplication (` * ` ) works between a float sample or a scalar on the one side
18571878and a histogram on the other side, in any order. It multiplies all bucket
18581879populations and the count and the sum of observations by the float (sample or
18591880scalar). This will lead to “scaled” and sometimes even negative histograms,
18601881which is usually only useful as intermediate results inside other expressions
1861- (see also [ notes above] ( #unary-minus-and-negative-histograms ) ). Multiplication
1862- works for both counter histograms and gauge histograms, and their flavor is left
1863- unchanged by the operation.
1882+ (see also [ notes above] ( #unary-minus-and-negative-histograms ) ).
1883+
1884+ Multiplication works for both counter histograms and gauge histograms, and
1885+ their flavor is left unchanged by the operation, with the exception that
1886+ multiplying by a negative value always results in a gauge histogram.
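
As a minimal sketch of this behavior, assuming a histogram is represented by a plain mapping of bucket populations plus count, sum, and a counter flag (all names here are hypothetical):

```python
def multiply(buckets, count, total_sum, is_counter, factor):
    # Scale every bucket population as well as the count and the sum.
    scaled = {key: population * factor for key, population in buckets.items()}
    # The flavor is kept, except that a negative factor always yields a gauge.
    stays_counter = is_counter and factor >= 0
    return scaled, count * factor, total_sum * factor, stays_counter
```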
1887+
1888+ #### Division
18641889
18651890Division (` / ` ) works between a histogram on the left hand side and a float
18661891sample or a scalar on the right hand side. It is equivalent to multiplication
@@ -1870,30 +1895,93 @@ and sum of observations all set to `+Inf`, `-Inf`, or `NaN`, depending on their
18701895values in the input histogram (positive, negative, or zero/` NaN ` ,
18711896respectively).
18721897
1898+ #### Equality and inequality
1899+
18731900Equality (` == ` ) and inequality (` != ` ) work between two histograms, both in
18741901their filtering version as well as with the ` bool ` modifier. They compare the
18751902schema, the custom values, the zero threshold, all bucket populations, and the
18761903sum and count of observations. Whether the histograms have counter or gauge
18771904flavor is irrelevant for the comparison. (A counter histogram could be equal to
18781905a gauge histogram.)
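
The comparison can be sketched with a hypothetical `HistSample` type in which the flavor is explicitly excluded from equality. The field names and the `equals_bool` helper are assumptions made for this illustration.

```python
from dataclasses import dataclass, field

@dataclass
class HistSample:
    schema: int
    custom_values: tuple
    zero_threshold: float
    buckets: tuple  # (bucket key, population) pairs
    count: float
    sum: float
    # The flavor does not take part in the comparison.
    is_counter: bool = field(default=False, compare=False)

def equals_bool(a, b):
    # Mimics the `bool` modifier: 1 for equal histograms, 0 otherwise.
    return 1.0 if a == b else 0.0
```

With this definition, a counter histogram and a gauge histogram with otherwise identical content compare as equal.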
18791906
1907+ #### Logical and set operators
1908+
18801909The logical/set binary operators (` and ` , ` or ` , ` unless ` ) work as expected even
18811910if histogram samples are involved. They only check for the existence of a
18821911vector element and don't change their behavior depending on the sample type or
18831912flavor of an element (float or histogram, counter or gauge).
18841913
1885- The “trim” operators ` >/ ` and ` </ ` were introduced specifically for native
1886- histograms. They only work for a histogram on the left hand side and a float
1887- sample or a scalar on the right hand side. (They do not work for float samples
1888- or scalars on _ both_ sides. An info-level annotation is returned in this case.)
1889- These operators remove observations from the histogram that are greater or
1890- smaller than the float value on the right side, respectively, and return the
1891- resulting histogram. The removal is only precise if the threshold coincides
1892- with a bucket boundary. Otherwise, interpolation within the affected buckets
1893- has to be used, as described [ above] ( #interpolation-within-a-bucket ) . The
1894- counter vs. gauge flavor of the histogram is preserved. (TODO: These operators
1895- are not yet implemented and might also change in detail, see [ tracking
1896- issue] ( https://github.com/prometheus/prometheus/issues/14651 ) .)
1914+ #### Trim operators
1915+
1916+ The “trim upper” (` </ ` ) and “trim lower” (` >/ ` ) operators were introduced
1917+ specifically for native histograms. They only work for a histogram on the left
1918+ hand side and a float sample or a scalar on the right hand side. (An info-level
1919+ annotation is returned in all other cases.)
1920+
1921+ These operators remove observations from the left hand side histogram that are
1922+ greater or smaller than the float value on the right hand side, respectively,
1923+ and return the resulting histogram.
1924+
1925+ The removal is only precise if the threshold coincides with a bucket boundary.
1926+ Otherwise, interpolation within the affected buckets has to be used, as
1927+ described [ above] ( #interpolation-within-a-bucket ) with the following exceptions
1928+ for NHCBs:
1929+
1930+ - If the lowest bucket has a lower boundary of -Inf, all observations in that
1931+ bucket are considered to be of value -Inf (rather than the upper boundary).
1932+ - If the highest bucket has an upper boundary of +Inf, all observations in that
1933+ bucket are considered to be of value +Inf (rather than the lower boundary).
1934+ - However, in the pathological edge case where a histogram has only a single
1935+ bucket with a lower limit of -Inf and an upper limit of +Inf, all
1936+ observations are still considered to be of value zero.
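
A rough sketch of how the “trim upper” operator could apply these rules to an NHCB follows. It assumes linear interpolation for buckets with finite boundaries and represents each bucket as a `(lower, upper, population)` tuple. The function names and the treatment of observations exactly at the threshold are assumptions, not a description of the actual implementation.

```python
import math

def kept_fraction(lower, upper, threshold):
    """Fraction of a bucket that remains after `</ threshold`."""
    if math.isinf(lower) and math.isinf(upper):
        # Single bucket from -Inf to +Inf: observations count as zero.
        return 1.0 if 0.0 <= threshold else 0.0
    if math.isinf(lower):
        return 1.0  # observations count as -Inf and are never greater
    if math.isinf(upper):
        return 0.0  # observations count as +Inf and are always greater
    if upper <= threshold:
        return 1.0
    if lower >= threshold:
        return 0.0
    # Threshold inside the bucket: interpolate linearly.
    return (threshold - lower) / (upper - lower)

def trim_upper(buckets, threshold):
    return [
        (lower, upper, population * kept_fraction(lower, upper, threshold))
        for lower, upper, population in buckets
    ]
```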
1937+
1938+ If any observations have been removed, the sum of all observations in the
1939+ resulting histogram is estimated from the remaining buckets. The value of each
1940+ observation in a given bucket is estimated to be the following:
1941+
1942+ - The upper boundary of the lowest bucket of an NHCB if that upper boundary is
1943+ negative or zero. (This again follows the original [ interpolation
1944+ heuristics] ( #interpolation-within-a-bucket ) , despite using a different one
1945+ for trimming, as explained above.)
1946+ - The lower boundary for the overflow bucket of an NHCB. (Again switching back to
1947+ the original interpolation heuristics.)
1948+ - -Inf for the negative overflow bucket of a standard exponential histogram.
1949+ - +Inf for the positive overflow bucket of a standard exponential histogram.
1950+ - The arithmetic mean for all other buckets of an NHCB and for the zero bucket
1951+ of a standard exponential histogram, taking into account the known heuristics
1952+ for the following special cases:
1953+ - The lowest bucket of an NHCB is considered to have a lower boundary of zero if
1954+ its upper boundary is positive.
1955+ - The lower boundary of the zero bucket of a standard exponential histogram is
1956+ considered to be zero if the histogram has no populated negative buckets.
1957+ - The upper boundary of the zero bucket of a standard exponential histogram is
1958+ considered to be zero if the histogram has no populated positive buckets.
1959+ - The geometric mean for all other buckets of a standard exponential histogram.
1960+
1961+ Additionally, the boundaries used for the (arithmetic or geometric) mean
1962+ calculation for the bucket that was only partially removed (using the
1963+ interpolation described above) are modified in the following way: The relevant
1964+ upper boundary (for ` </ ` ) or lower boundary (for ` >/ ` ) is considered to be
1965+ equal to the cutoff threshold provided as the second operand.
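
The mean-based estimation, including the adjusted boundary of a partially removed bucket, can be sketched for the “trim upper” case as follows. The sketch only covers the "all other buckets" cases with finite, and for the geometric mean positive, boundaries. The function names and the `(lower, upper, remaining population)` layout are assumptions.

```python
import math

def representative_value(lower, upper, exponential):
    if exponential:
        # Geometric mean, for positive buckets of an exponential schema.
        return math.sqrt(lower * upper)
    # Arithmetic mean, for regular NHCB buckets.
    return (lower + upper) / 2

def estimate_sum_after_trim_upper(buckets, threshold, exponential=False):
    """buckets: (lower, upper, remaining population) after `</ threshold`."""
    total = 0.0
    for lower, upper, population in buckets:
        if population == 0:
            continue
        # A partially removed bucket ends at the cutoff threshold.
        upper = min(upper, threshold)
        total += population * representative_value(lower, upper, exponential)
    return total
```

For an NHCB with remaining populations of 10 in the bucket from 0 to 1 and 5 in the bucket from 1 to 2 after trimming at 1.5, the estimate is 10 × 0.5 + 5 × 1.25 = 11.25.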
1966+
1967+ Note that this estimation of the sum of observations is inaccurate (even in the
1968+ case where the cutoff threshold coincides with a bucket boundary), up to a
1969+ point where it could yield results that are obviously wrong. For example, after
1970+ removing some positive observations, the estimated sum of observations could be
1971+ larger than the sum of observations in the original histogram. The estimation
1972+ algorithm could be refined to take such cases into account, but it is kept
1973+ deliberately simple to make it easier to reason about. In general, the
1974+ histograms resulting from trim operations are meant to be used for quantile
1975+ estimation or filtering, where the sum of observations is irrelevant anyway.
1976+ The trim operators are also a viable alternative to the ` histogram_fraction `
1977+ function (explained below) in cases where the desired outcome is not a fraction
1978+ of observations but a count of observations. For example, using
1979+ ` histogram_count(h >/ 0.2 </ 0.5) ` instead of `histogram_fraction(0.2, 0.5, h) *
1980+ histogram_count(h)` is simpler and also yields ` 0` instead of ` NaN` in case ` h`
1981+ has no observations. (The ` histogram_fraction ` calculated from an empty
1982+ histogram is always ` NaN ` .)
1983+
1984+ The trim operators preserve the counter vs. gauge flavor of the histogram.
18971985
18981986### Aggregation operators
18991987