Problem
The heatmap uses quantile(0.01) for the lower bound and actual max() for the upper bound. Values below the lower quantile land in an underflow bucket (bucket 0).
Previously, a quantile(0.99) upper bound was also used, but this hid latency spikes above the 99th percentile, which are exactly the anomalies (timeouts, slow queries) that users need a heatmap to detect. The upper bound was changed to the actual max(), since a log scale handles wide ranges naturally.
However, using the actual max() means a single extreme outlier (e.g., one 60s timeout when p99 is 500ms) can stretch the axis, squeezing the bulk of the distribution into fewer rows. Overflow-bucket indicators would let us use a tighter quantile range for the axis without hiding data: users would see a visual signal that data exists beyond the visible range.
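To put a rough number on the stretching, take hypothetical values p1 = 5ms, p99 = 500ms and a single 60s timeout setting max. If rows are spaced evenly in log space between p1 and max, the share of rows covering the p1 to p99 range (where roughly 98% of spans sit) is:

```latex
f = \frac{\log(p_{99}/p_{1})}{\log(\max/p_{1})}
  = \frac{\log_{10}(500/5)}{\log_{10}(60000/5)}
  = \frac{2}{4.08} \approx 0.49
```

So in this example about half of the rows would be spent on the 500ms to 60s range, which holds only around 1% of spans.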
Current overflow behavior
- Bucket 0: all values ≤ effectiveMin (fast failures, Duration=0)
- Bucket N+1: all values ≥ max (because the upper bound is the actual max, widthBucket semantics put only values equal to max here)
These overflow buckets are rendered as normal cells, so users can't distinguish:
- A timeout at 10s vs 60s (both would share the top overflow bucket once the upper bound is a quantile rather than max)
- A fast failure at 0ms vs 0.5ms (both in the bottom overflow bucket)
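To make the bucket 0 / bucket N+1 behavior concrete, here is a minimal TypeScript sketch of widthBucket-style assignment. ClickHouse's widthBucket(operand, low, high, count) returns 0 when operand < low and count + 1 when operand >= high. The sketch assumes buckets are equal-width in log space; logWidthBucket is a hypothetical stand-in for illustration, not the heatmap's actual query.

```typescript
// Hypothetical log-space analogue of ClickHouse's widthBucket, for illustration only.
// Bucket 0 = underflow (value < low), bucket count + 1 = overflow (value >= high),
// buckets 1..count are equal-width in log space. Assumes 0 < low < high.
function logWidthBucket(value: number, low: number, high: number, count: number): number {
  if (value < low) return 0; // underflow, including Duration = 0
  if (value >= high) return count + 1; // overflow
  const pos = (Math.log(value) - Math.log(low)) / (Math.log(high) - Math.log(low));
  // Clamp guards against floating-point rounding pushing a value just below `high` into count + 1.
  return Math.min(count, Math.floor(pos * count) + 1);
}

// With high = max(), only the maximum itself lands in the overflow bucket,
// while 0ms and 0.5ms are lumped together in bucket 0.
const durationsMs = [0, 0.5, 4, 40, 400, 60000];
const low = 1;
const high = Math.max(...durationsMs);
const buckets = durationsMs.map((d) => logWidthBucket(d, low, high, 10));
console.log(buckets);
```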
Why this would improve UX
With overflow-bucket indicators, we could re-introduce quantile-based range clamping (e.g., p0.1–p99.9) for the axis to keep the chart focused on the most relevant range, while still giving users a clear signal that outlier data exists beyond the visible boundaries. This is the "smart lumping" approach — the axis stays tight and readable, but spikes aren't silently hidden.
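A minimal sketch of the quantile-clamped axis, assuming the raw durations are available client-side (in practice this would more likely be computed in the query). It uses an exact linear-interpolation quantile as a stand-in for ClickHouse's approximate quantile(); clampedAxis, quantileSorted and minPositive are hypothetical names. The underflow and overflow counts are what the UI would need to flag that data exists beyond the visible range.

```typescript
// Hypothetical sketch: choose axis bounds from quantiles (default p0.1 to p99.9)
// and count how many values fall outside them, so overflow rows can be flagged.
interface AxisRange {
  low: number;
  high: number;
  underflow: number; // values < low (would land in bucket 0)
  overflow: number; // values >= high (would land in bucket N+1)
}

// Exact quantile with linear interpolation between closest ranks; a stand-in for
// ClickHouse's approximate quantile(). `sorted` must be ascending and non-empty.
function quantileSorted(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

function clampedAxis(
  values: number[],
  qLow = 0.001,
  qHigh = 0.999,
  minPositive = 1e-3, // hypothetical floor so a log axis never starts at 0
): AxisRange {
  if (values.length === 0) throw new Error("no values");
  const sorted = [...values].sort((a, b) => a - b);
  const low = Math.max(quantileSorted(sorted, qLow), minPositive);
  const high = quantileSorted(sorted, qHigh);
  let underflow = 0;
  let overflow = 0;
  for (const v of values) {
    // Same boundary rules as widthBucket: < low underflows, >= high overflows.
    if (v < low) underflow++;
    else if (v >= high) overflow++;
  }
  return { low, high, underflow, overflow };
}

// 999 spans at 100ms plus one 60s timeout: the axis stays near 100ms,
// and the timeout is reported as overflow instead of stretching the axis.
const latencies = [...new Array<number>(999).fill(100), 60000];
const axis = clampedAxis(latencies);
console.log(axis);
```

The design choice here is that the axis range and the overflow signal come from one pass over the same data, so the indicator can never disagree with what was clamped.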
Use cases
- Fast failures: Auth rejected, validation errors, connection refused — duration ~0ms, clustered at the bottom. A spike in these indicates an error wave.
- Slow timeouts: Gateway timeouts, stuck queries — duration 10-60s, clustered at the top. These are often the most critical incidents to spot.
Proposal
Visually distinguish overflow buckets from regular buckets so users know data is being lumped:
- Visual indicator: Render overflow rows with a subtle hatched/striped pattern or a different border to signal "this bucket contains clamped values"
- Tooltip context: When hovering an overflow bucket, show the actual min/max range of values in that bucket (e.g., "0ms – 0.01ms, 523 spans" or "30s – 120s, 12 spans")
- Selection accuracy: When selecting an overflow bucket, use the actual data range (not the bucket boundary) for the downstream filter
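The tooltip and selection items above could share one summary of the values actually lumped into each overflow bucket. A minimal sketch, assuming raw durations in milliseconds and the axis bounds are available; summarizeOverflow, formatDurationMs, tooltipLabel and selectionRange are hypothetical names, and the label format simply mirrors the examples in the proposal.

```typescript
// Hypothetical summary of the values actually lumped into an overflow bucket.
interface OverflowSummary {
  min: number; // smallest value in the bucket (ms)
  max: number; // largest value in the bucket (ms)
  count: number;
}

// Split out values outside [low, high) using the same bucket 0 / N+1 rules as widthBucket.
function summarizeOverflow(
  valuesMs: number[],
  low: number,
  high: number,
): { under: OverflowSummary | null; over: OverflowSummary | null } {
  const summarize = (xs: number[]): OverflowSummary | null =>
    xs.length === 0
      ? null
      : {
          min: xs.reduce((a, b) => Math.min(a, b)),
          max: xs.reduce((a, b) => Math.max(a, b)),
          count: xs.length,
        };
  return {
    under: summarize(valuesMs.filter((v) => v < low)),
    over: summarize(valuesMs.filter((v) => v >= high)),
  };
}

// Formats milliseconds as e.g. "0.01ms" or "30s".
function formatDurationMs(ms: number): string {
  return ms < 1000 ? `${ms}ms` : `${ms / 1000}s`;
}

// Tooltip text in the style of "30s – 120s, 12 spans".
function tooltipLabel(s: OverflowSummary): string {
  return `${formatDurationMs(s.min)} – ${formatDurationMs(s.max)}, ${s.count} spans`;
}

// Downstream filter bounds: the actual data range, not the bucket boundary.
function selectionRange(s: OverflowSummary): [number, number] {
  return [s.min, s.max];
}

const { under, over } = summarizeOverflow([0, 0.01, 5, 10, 30000, 120000], 1, 20000);
if (under) console.log(tooltipLabel(under));
if (over) console.log(tooltipLabel(over), selectionRange(over));
```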
Related
- HDX-3698 — symlog scale (alternative approach to zero-handling)
- HDX-3697 — heatmap visualization overhaul (parent)
- HDX-3699 — hover tooltip with percentile context