Use this issue to track InQL RFC 023, which already exists at docs/rfcs/023_approximate_sketch_functions.md.
Area
- Specification (RFCs)
- Package & tests
- Documentation
Summary
RFC 023 defines the design boundary for approximate aggregates and sketch functions in InQL, including approximate count distinct, approximate percentiles, HyperLogLog-like sketches, KLL-like sketches, theta sketches, count-min sketches, and bitmap aggregates.
Motivation
Approximate and sketch functions need explicit result and state semantics. They must not pretend to be ordinary exact aggregates, and authors must opt into approximate results knowingly.
Proposal sketch
The RFC separates approximate aggregates and sketch-state functions from exact aggregate semantics. It defines explicit approximation parameters, sketch-family compatibility, merge behavior, serialization boundaries, and documentation requirements for accuracy and backend limits.
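To make "sketch-family compatibility" and "merge behavior" concrete, here is a minimal Python sketch. It assumes a sketch state carries its family name and approximation parameters next to an opaque payload, and that a merge is rejected unless both inputs share the same family and parameters. `SketchState`, `merge`, `SketchCompatibilityError`, and the `"hll"`/`"bitmap"` family labels are hypothetical names for illustration only. They are not InQL's API or the RFC's chosen representation. The per-family merge rules themselves are standard: HyperLogLog-style registers combine by element-wise maximum, and bitmaps combine by bitwise OR.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class SketchState:
    """Hypothetical typed sketch state (not InQL's actual representation).

    family:  sketch family name, e.g. "hll" or "bitmap"
    params:  approximation parameters, kept as sorted (name, value) pairs
    payload: family-specific state (HLL registers or bitmap words)
    """
    family: str
    params: Tuple[Tuple[str, int], ...]
    payload: Tuple[int, ...]


class SketchCompatibilityError(TypeError):
    """Raised when two sketch states cannot be merged."""


def _merge_hll(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    # HyperLogLog-style merge: keep the larger value in each register.
    return tuple(max(x, y) for x, y in zip(a, b))


def _merge_bitmap(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    # Bitmap merge: bitwise OR of corresponding words.
    return tuple(x | y for x, y in zip(a, b))


# Hypothetical registry mapping each family to its merge rule.
MERGERS: Dict[str, Callable[[Tuple[int, ...], Tuple[int, ...]], Tuple[int, ...]]] = {
    "hll": _merge_hll,
    "bitmap": _merge_bitmap,
}


def merge(a: SketchState, b: SketchState) -> SketchState:
    # Incompatible families are rejected instead of being silently coerced.
    if a.family != b.family:
        raise SketchCompatibilityError(f"cannot merge {a.family} with {b.family}")
    # Different approximation parameters (e.g. HLL precision) are rejected too.
    if a.params != b.params:
        raise SketchCompatibilityError(f"parameter mismatch: {a.params} vs {b.params}")
    if len(a.payload) != len(b.payload):
        raise SketchCompatibilityError("payload length mismatch")
    return SketchState(a.family, a.params, MERGERS[a.family](a.payload, b.payload))
```

For example, merging two `"hll"` states that both use precision 2 (four registers) succeeds. Merging a precision-2 state with a precision-3 state, or an `"hll"` state with a `"bitmap"` state, raises `SketchCompatibilityError`. That mirrors the intent that these mismatches are type errors rather than backend-dependent behavior.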
Open design questions to resolve before the RFC moves to Planned:
- Should InQL standardize one sketch family per use case or expose multiple named families?
- What serialization format, if any, should be portable across backends?
- How should accuracy guarantees be documented without implying backend-independent statistical guarantees that do not actually hold?
Alternatives considered
The RFC rejects treating sketches as binary values, exposing Spark sketch names directly as core functions, and letting backends choose approximate execution for exact aggregates.
Impact / compatibility
This RFC affects approximate aggregate/sketch APIs, sketch-state typing, typechecking for sketch family compatibility, Prism/Substrait lowering or rejection, and documentation that labels approximate behavior clearly.
Implementation notes (optional)
Handle this RFC after the registry, core aggregate, and modifier foundations are settled. Backend-specific sketch support should be explicit rather than silently selected.
Checklist