
Add compression ratio calculation and per-column compression stats (#18184) #18185

Open
johnsolomonj wants to merge 62 commits into apache:master from johnsolomonj:feature/compression-stats-tracking


@johnsolomonj commented Apr 13, 2026

Contributor

Labels: feature, release-notes, observability

Summary

Draft implementation for the PEP proposed in #18184. Kept as draft pending design review on the issue.

Adds compression ratio tracking and per-column compression stats to Pinot's existing table size and metadata APIs:

  • Track uncompressed forward index sizes at write time in all raw column writers (BaseChunkForwardIndexWriter subclasses, VarByteChunkForwardIndexWriterV4/V5/V6, CLPForwardIndexCreatorV2)
  • Track raw ingest size at write time for dict columns in SegmentDictionaryCreator (STRING via Utf8.encodedLength, BYTES via array length, BIG_DECIMAL via BigDecimalUtils.byteSize; fixed-width types computed from totalDocs × typeWidth at seal time)
  • Persist uncompressed size and compression codec to metadata.properties per column
  • Expose compressionStats, columnCompressionStats, and storageBreakdown on both GET /tables/{table}/size and GET /tables/{table}/metadata
  • Add TABLE_COMPRESSION_RATIO_PERCENT and TABLE_TIERED_STORAGE_SIZE controller gauges with tier lifecycle management
  • Gated by table-level indexingConfig.compressionStatsEnabled flag (default: off, zero overhead when disabled)
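To make the raw-ingest accounting for dictionary-encoded columns concrete, here is a minimal sketch under stated assumptions. The class `RawIngestCounter` and its methods are hypothetical, not Pinot's `SegmentDictionaryCreator`. STRING values are counted with `String.getBytes(StandardCharsets.UTF_8).length`, which for well-formed strings matches the UTF-8 length Guava's `Utf8.encodedLength` reports; BYTES use the array length; fixed-width types are computed once as totalDocs × typeWidth. BIG_DECIMAL is left out because `BigDecimalUtils.byteSize` is Pinot-internal. The boolean flag stands in for the compressionStatsEnabled gate.

```java
import java.nio.charset.StandardCharsets;

/// Hypothetical sketch of raw ingest size accounting for a dictionary-encoded column.
/// Illustrative only; this is not Pinot's SegmentDictionaryCreator.
class RawIngestCounter {
  private final boolean trackRawIngestBytes; // stands in for the compressionStatsEnabled gate
  private long totalRawIngestBytes;

  RawIngestCounter(boolean trackRawIngestBytes) {
    this.trackRawIngestBytes = trackRawIngestBytes;
  }

  /// STRING: UTF-8 encoded length, not String.length() (which counts UTF-16 chars).
  void addString(String value) {
    if (trackRawIngestBytes) {
      totalRawIngestBytes += value.getBytes(StandardCharsets.UTF_8).length;
    }
  }

  /// BYTES: the raw array length.
  void addBytes(byte[] value) {
    if (trackRawIngestBytes) {
      totalRawIngestBytes += value.length;
    }
  }

  long getTotalRawIngestBytes() {
    return totalRawIngestBytes;
  }

  /// Fixed-width types: computed once at seal time as totalDocs x typeWidth.
  static long fixedWidthRawIngestBytes(int totalDocs, int typeWidthBytes) {
    return (long) totalDocs * typeWidthBytes;
  }

  public static void main(String[] args) {
    RawIngestCounter strings = new RawIngestCounter(true);
    strings.addString("abc");        // 3 bytes
    strings.addString("h\u00e9llo"); // U+00E9 takes 2 bytes in UTF-8, so 6 bytes
    System.out.println(strings.getTotalRawIngestBytes()); // 9

    RawIngestCounter bytes = new RawIngestCounter(true);
    bytes.addBytes(new byte[] {1, 2, 3, 4});
    System.out.println(bytes.getTotalRawIngestBytes()); // 4

    RawIngestCounter disabled = new RawIngestCounter(false);
    disabled.addString("abc");
    System.out.println(disabled.getTotalRawIngestBytes()); // 0

    System.out.println(fixedWidthRawIngestBytes(1_000, Long.BYTES)); // 8000
  }
}
```

Counting UTF-8 bytes rather than `String.length()` matters for multi-byte text: `"h\u00e9llo"` is 5 UTF-16 chars but 6 bytes in UTF-8.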

Design document

See #18184 for the full PEP including motivation, prior art, API response structure, and known corner cases.

Key design decisions

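  • Write-time tracking: Uncompressed size is tracked as values are written, capturing raw ingested data size without chunk headers or alignment padding. Most writers count at individual put*() call sites; FixedByteChunkForwardIndexWriter instead reads _chunkDataOffset once per chunk flush, so its put path carries no extra branch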
  • Shared codec resolution: ForwardIndexType.resolveCompressionType() handles CLP codec variants, used by both BaseSegmentCreator and ForwardIndexHandler
  • Dict columns included: Dictionary-encoded columns get codec="DICT_ENCODED", rawIngestSizeInBytes tracked via SegmentDictionaryCreator, and onDiskSizeInBytes = forward index + dictionary file size. Columns with mixed encoding across segments produce codec="MIXED" with a per-codec codecBreakdown (segments, rawIngestSizeInBytes, onDiskSizeInBytes per codec). hasDictionary field removed — encoding fully expressed via codec.
  • Backward compatible: New metadata fields are additive; old segments gracefully return defaults
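The MIXED codec and per-codec codecBreakdown described above can be sketched as a single pass over one column's per-segment stats. Everything here (`CodecAggregationSketch`, `SegmentColumnStats`, `CodecTotals`) is hypothetical and only illustrates the merge rule: a column keeps its codec if every segment agrees and becomes "MIXED" otherwise, while the breakdown keeps segment counts and sizes per codec.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/// Hypothetical sketch of aggregating one column's codec across segments.
class CodecAggregationSketch {
  static final String MIXED = "MIXED";

  /// Per-segment stats for a single column (illustrative shape).
  record SegmentColumnStats(String codec, long rawIngestSizeInBytes, long onDiskSizeInBytes) {}

  /// Per-codec totals, analogous to one codecBreakdown entry.
  static final class CodecTotals {
    int segments;
    long rawIngestSizeInBytes;
    long onDiskSizeInBytes;
  }

  /// Shared codec if all segments with stats agree, "MIXED" otherwise, null if none have stats.
  static String mergeCodec(List<SegmentColumnStats> segments) {
    String result = null;
    for (SegmentColumnStats s : segments) {
      if (s.codec() == null) {
        continue; // segments without stats do not influence the codec
      }
      result = (result == null || result.equals(s.codec())) ? s.codec() : MIXED;
    }
    return result;
  }

  /// Groups sizes and segment counts by codec, preserving first-seen order.
  static Map<String, CodecTotals> codecBreakdown(List<SegmentColumnStats> segments) {
    Map<String, CodecTotals> breakdown = new LinkedHashMap<>();
    for (SegmentColumnStats s : segments) {
      if (s.codec() == null) {
        continue;
      }
      CodecTotals totals = breakdown.computeIfAbsent(s.codec(), k -> new CodecTotals());
      totals.segments++;
      totals.rawIngestSizeInBytes += s.rawIngestSizeInBytes();
      totals.onDiskSizeInBytes += s.onDiskSizeInBytes();
    }
    return breakdown;
  }

  public static void main(String[] args) {
    List<SegmentColumnStats> segments = List.of(
        new SegmentColumnStats("DICT_ENCODED", 1_000, 200),
        new SegmentColumnStats("SNAPPY", 1_000, 400),
        new SegmentColumnStats("SNAPPY", 2_000, 700));
    System.out.println(mergeCodec(segments)); // MIXED
    CodecTotals snappy = codecBreakdown(segments).get("SNAPPY");
    System.out.println(snappy.segments + " " + snappy.rawIngestSizeInBytes + " " + snappy.onDiskSizeInBytes); // 2 3000 1100
  }
}
```

Keeping the breakdown alongside the collapsed codec means a consumer that only checks `codec` still gets a single string, while one that sees "MIXED" can drill into the per-codec totals.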

Test plan

  • Unit tests for writer uncompressed size tracking (fixed-byte, var-byte V1-V3, V4/V5/V6)
  • Unit tests for CLP V2 sub-stream size aggregation
  • Unit tests for ForwardIndexType.resolveCompressionType() codec resolution
  • Unit tests for ForwardIndexHandler compression stats persistence on reload
  • Unit tests for SegmentDictionaryCreator.getTotalRawIngestBytes() (STRING UTF-8 multi-byte, BYTES, BIG_DECIMAL, MV columns)
  • Controller aggregation tests (dict sentinel preservation, negative ratio guards, partial coverage)
  • Integration test for end-to-end compression stats API response
  • E2E manual tests for dict-only, raw-only, mixed codec, and flag-off scenarios via both /size and /metadata APIs
  • Verify zero overhead when compressionStatsEnabled = false

@codecov-commenter commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 72.11155% with 210 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.82%. Comparing base (b27a3ad) to head (c6fce9e).
⚠️ Report is 13 commits behind head on master.

Files with missing lines | Patch % | Lines
.../apache/pinot/controller/util/TableSizeReader.java | 74.85% | 26 Missing and 18 partials ⚠️
.../pinot/server/api/resources/TableSizeResource.java | 9.30% | 35 Missing and 4 partials ⚠️
...t/controller/util/ServerSegmentMetadataReader.java | 72.89% | 18 Missing and 11 partials ⚠️
...che/pinot/server/api/resources/TablesResource.java | 77.67% | 2 Missing and 23 partials ⚠️
...segment/creator/impl/SegmentDictionaryCreator.java | 62.85% | 8 Missing and 5 partials ⚠️
...oller/api/resources/PinotTableRestletResource.java | 0.00% | 12 Missing ⚠️
...local/segment/creator/impl/BaseSegmentCreator.java | 72.22% | 0 Missing and 10 partials ⚠️
...ent/creator/impl/fwd/CLPForwardIndexCreatorV2.java | 47.36% | 2 Missing and 8 partials ⚠️
...ment/index/forward/ForwardIndexCreatorFactory.java | 68.18% | 4 Missing and 3 partials ⚠️
...ocal/segment/creator/impl/ColumnIndexCreators.java | 20.00% | 2 Missing and 2 partials ⚠️
... and 9 more
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18185      +/-   ##
============================================
+ Coverage     64.78%   64.82%   +0.04%     
  Complexity     1309     1309              
============================================
  Files          3380     3384       +4     
  Lines        209544   210244     +700     
  Branches      32797    32962     +165     
============================================
+ Hits         135746   136285     +539     
- Misses        62870    62943      +73     
- Partials      10928    11016      +88     
Flag | Coverage Δ
custom-integration1 | 100.00% <ø> (ø)
integration | 100.00% <ø> (ø)
integration1 | 100.00% <ø> (ø)
integration2 | 0.00% <ø> (ø)
java-21 | 64.82% <72.11%> (+0.04%) ⬆️
temurin | 64.82% <72.11%> (+0.04%) ⬆️
unittests | 64.81% <72.11%> (+0.04%) ⬆️
unittests1 | 56.95% <53.22%> (+<0.01%) ⬆️
unittests2 | 37.39% <68.65%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

@johnsolomonj force-pushed the feature/compression-stats-tracking branch 3 times, most recently from b9e573e to 7667a13, May 12, 2026 23:13
@johnsolomonj force-pushed the feature/compression-stats-tracking branch 4 times, most recently from 0bf95a3 to 0741c48, May 19, 2026 19:41
@xiangfu0 added the release-notes, observability, and feature labels May 20, 2026 (release-notes: Referenced by PRs that need attention when compiling the next release notes; observability: Related to observability (logging, tracing, metrics); feature: New functionality)
@xiangfu0 force-pushed the feature/compression-stats-tracking branch 2 times, most recently from d4ce64e to 1cea546, May 24, 2026 06:41
@xiangfu0

Contributor

a few things:

  1. Add docs for all the public APIs and configs.
  2. Javadocs should follow markdown style and start with ///.
  3. Pinot supports both a dictionary and a raw forward index, so checking for either one to infer the existence of the other won't hold.

@johnsolomonj force-pushed the feature/compression-stats-tracking branch from 992eeae to c53db31, June 2, 2026 00:28
@johnsolomonj

Contributor Author

  1. Added /// Javadoc to all new public classes (ColumnCompressionStatsInfo, CompressionStatsSummary) and the compressionStatsEnabled config field.
  2. Converted all class-level docs to /// markdown style.
  3. Addressed. hasDictionary is removed and codec is the single source of truth. Dict columns set codec="DICT_ENCODED", so checking one encoding no longer implies anything about the other.

@JsonProperty("rawIngestSizeBytes") long rawIngestSizeBytes,
@JsonProperty("onDiskSizeBytes") long onDiskSizeBytes,
@JsonProperty("tier") @Nullable String tier,
@JsonProperty("columnCompressionStats") @Nullable Map<String, ColumnCompressionStatsInfo>

Contributor

These per-column stats may blow up the response; please make sure the REST API has an explicit param to request them, with the default off.

Contributor Author

Nice idea! Added a query param, includeColumnStats. Per-column stats are included only if this param is passed and set to true.

protected int _chunkSize;
protected long _dataOffset;
protected long _uncompressedSize;
protected boolean _trackUncompressedSize = true;

Contributor

Should this default to false?

Contributor Author

The default of true was effectively overridden by setTrackUncompressedSize(false) via ForwardIndexCreatorFactory before any writes happened, so it wasn't causing incorrect behavior. But false is the right default for clarity and safety. Fixed.

private int _metadataSize = 0;
private long _chunkOffset = 0;
private long _uncompressedSize = 0;
private boolean _trackUncompressedSize = true;

Contributor

The default should be false and set externally.

Contributor Author

The default of true was effectively overridden by setTrackUncompressedSize(false) via ForwardIndexCreatorFactory before any writes happened, so it wasn't causing incorrect behavior. But false is the right default for clarity and safety. Fixed.

}

public void putInt(int value) {
if (_trackUncompressedSize) {

Contributor

This puts the tracking on the hot path for every put call.
Is it better to infer it from _chunkDataOffset?

Contributor Author

Fixed.

accum[1] += info.getOnDiskSizeInBytes();
if (info.getCodec() != null) {
columnCodecMap.merge(col, info.getCodec(),
(existing, incoming) -> existing.equals(incoming) ? existing : "MIXED");

Contributor

This MIXED doesn't provide much info; can you make it a list of codecs?

Contributor Author

When codec="MIXED" in the response, the codecBreakdown map provides the full per-codec detail — segment count, rawIngestSizeInBytes, and onDiskSizeInBytes for each codec. So the list of codecs and their sizes are already available via codecBreakdown.


@Override
@Nullable
public String getCompressionCodec() {

Contributor

This should be ChunkCompressionType, not String.

Contributor Author

getCompressionCodec() can return a ChunkCompressionType name, "DICT_ENCODED", or "MIXED" — a new enum would either duplicate ChunkCompressionType values (and drift when new codecs are added) or require a wrapper that still needs a String fallback. Kept it as String to avoid that coupling. Open to suggestions if there's a cleaner pattern you have in mind.

@johnsolomonj force-pushed the feature/compression-stats-tracking branch from d6abaf1 to f4902a8, June 10, 2026 12:31
@johnsolomonj marked this pull request as ready for review June 11, 2026 23:45
@johnsolomonj changed the title from "[Draft] Add compression ratio calculation and per-column compression stats (#18184)" to "Add compression ratio calculation and per-column compression stats (#18184)" Jun 11, 2026
@johnsolomonj force-pushed the feature/compression-stats-tracking branch from 9755191 to 2c7f1d0, June 12, 2026 09:52
…to table size API

This feature enables tracking and reporting of forward index compression
effectiveness across Pinot segments. When `compressionStatsEnabled` is set
in table config's indexing config, segment creation records uncompressed
forward index sizes and compression codec in metadata.properties. The
server-side table size endpoint now returns per-segment and per-column
raw/compressed forward index sizes. The controller aggregates these into
table-level compression ratio metrics (raw/compressed), with partial
coverage tracking for mixed-version clusters. Three new ControllerGauge
metrics (TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA,
TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA) are emitted for monitoring.
ForwardIndexHandler is updated to persist compression metadata during
segment reload operations (compression type change and dict-to-raw conversion).
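A minimal sketch of turning the aggregated raw and on-disk totals into a percent-style gauge value. The formula (raw / on-disk × 100, rounded) and the method name `ratioPercent` are assumptions for illustration; the guards mirror the "negative ratio guards" listed in the test plan by refusing to produce a value when the on-disk size is not positive or the raw size is negative.

```java
import java.util.OptionalLong;

/// Hypothetical sketch: per-replica raw and on-disk totals -> percent gauge value.
class CompressionRatioSketch {
  /// round(raw / onDisk * 100), or empty when the inputs cannot yield a meaningful ratio
  /// (no on-disk bytes, or a negative raw size such as a sentinel from an older segment).
  static OptionalLong ratioPercent(long rawIngestBytes, long onDiskBytes) {
    if (onDiskBytes <= 0 || rawIngestBytes < 0) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(Math.round(rawIngestBytes * 100.0 / onDiskBytes));
  }

  public static void main(String[] args) {
    System.out.println(ratioPercent(4_000, 1_000)); // OptionalLong[400]
    System.out.println(ratioPercent(1_000, 0));     // OptionalLong.empty
  }
}
```

Under this assumption, a table whose raw data is four times its on-disk size reports 400, and an empty result would translate into not emitting (or clearing) the gauge rather than publishing a misleading number.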
…feature

- Add 6 new test files covering writer-level tracking, segment creation,
  corner cases, ForwardIndexHandler reload, and integration tests for both
  offline and realtime (Kafka) ingestion paths
- Merge redundant dual-loop in TableSizeReader into a single pass over
  server info, improving performance during table size aggregation
- Fix offline integration test teardown to properly wait for table data
  manager removal before stopping servers
- Wrap second table cleanup in offline test in finally block to prevent
  resource leaks on assertion failure
…tier breakdown, and stale metadata cleanup

- Wrap flat compression fields in nested CompressionStats DTO with @JsonInclude(NON_NULL)
- Add StorageBreakdown with per-tier segment count and size (always reported)
- Add per-column ColumnCompressionDetail with aggregated sizes, ratio, and codec (MIXED when codecs differ across segments)
- Gate compressionStats on tableConfig.indexingConfig.compressionStatsEnabled; suppress from JSON when OFF
- Fix isPartialCoverage: now correctly returns true when 0 segments have stats but non-missing segments exist
- Clear stale forwardIndex.compressionCodec and forwardIndex.uncompressedSizeBytes on raw-to-dict reload
- Support null values in SegmentMetadataUtils.updateMetadataProperties to clear properties
- Add TABLE_TIERED_STORAGE_SIZE gauge; emit tier metrics always; clear compression+tier gauges when flag OFF
- Add testRawToDictClearsCompressionStats, testCompressionStatsNullWhenFlagOff, per-column/tier assertions
- Update integration tests for nested compressionStats JSON structure
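The "null clears the property" behavior used on raw-to-dict reload can be sketched with `java.util.Properties` standing in for the segment's metadata.properties. The method below is a hypothetical simplification of `SegmentMetadataUtils.updateMetadataProperties`, and the property keys are illustrative, not necessarily the exact keys Pinot writes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/// Hypothetical sketch of "null means clear" semantics for metadata updates.
/// java.util.Properties stands in for the real segment metadata store.
class MetadataUpdateSketch {
  static void updateMetadataProperties(Properties metadata, Map<String, String> updates) {
    for (Map.Entry<String, String> e : updates.entrySet()) {
      if (e.getValue() == null) {
        metadata.remove(e.getKey()); // clear a stale property
      } else {
        metadata.setProperty(e.getKey(), e.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Properties metadata = new Properties();
    // Illustrative key names only.
    metadata.setProperty("column.c1.forwardIndex.compressionCodec", "SNAPPY");
    metadata.setProperty("column.c1.forwardIndex.uncompressedSizeBytes", "1048576");

    // Raw-to-dict reload: the raw forward index is gone, so both entries are stale.
    Map<String, String> updates = new LinkedHashMap<>(); // Map.of() would reject null values
    updates.put("column.c1.forwardIndex.compressionCodec", null);
    updates.put("column.c1.forwardIndex.uncompressedSizeBytes", null);
    updateMetadataProperties(metadata, updates);

    System.out.println(metadata.isEmpty()); // true
  }
}
```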
…leSizeResource for dict

- Gate _totalRawIngestBytes accumulation in SegmentDictionaryCreator behind a
  _trackRawIngestBytes flag (passed from IndexCreationContext.isCompressionStatsEnabled()
  via DictionaryIndexType.createIndexCreator). Eliminates Utf8.encodedLength() and
  BigDecimalUtils.byteSize() calls on every row when the feature is disabled.
- Fix TableSizeResource to emit CODEC_DICT_ENCODED for dict columns instead of codec=null,
  include dict file size in onDiskSizeInBytes, and populate rawIngestSizeInBytes from
  getDictColumnRawIngestSizeBytes() — consistent with TablesResource handling.
…mns regardless of column filter

The metadata endpoint accepts an optional ?columns= filter; when omitted, JAX-RS
provides an empty list making columnSet empty, so the column loop iterated zero
columns and compression stats were never collected. Split the loop into two:
a column-stats loop scoped to columnSet, and a separate compression-stats loop
over allSegmentColumns — keeping per-requested-column data scoped to the filter
while ensuring compression stats always cover all segment columns.
…ablesResource

TableSizeReader: the summary guard required maxRawFwdIndexSize > 0 which is always
false for dict-only tables (no raw forward index). Switch to summing per-column
rawIngest and onDisk from perColumnMax for all table types — consistent with
per-column output and covers dict-only, raw-only, and mixed tables correctly.

TablesResource: split the single column loop into a column-stats loop (scoped to
caller's ?columns= filter) and a separate compression-stats loop over all segment
columns, so compression stats are always collected regardless of the column filter.
…ardIndexSizeBytes to rawIngestSizeBytes/onDiskSizeBytes

These fields were added in this PR (not on master) so no backward compatibility
concern. Aligns naming with ColumnCompressionStatsInfo (rawIngestSizeInBytes,
onDiskSizeInBytes) and CompressionStatsSummary (rawIngestSizePerReplicaInBytes,
onDiskSizePerReplicaInBytes) for consistency across the compression stats API.
…rackUncompressedSize when compressionStatsEnabled
…to gate per-column stats

Per-column compression stats (columnCompressionStats) can be large for tables with
many columns. Add ?includeColumnStats=false (default) to both GET /tables/{table}/size
and GET /tables/{table}/metadata so callers opt in explicitly.

- compressionStats summary and storageBreakdown always returned when feature flag enabled
- columnCompressionStats only computed and returned when includeColumnStats=true
- param flows end-to-end from controller to server; server skips per-column map
  construction when false, avoiding unnecessary CPU and response bloat
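A minimal sketch of the server-side gate, assuming a simplified `SegmentSizeInfo` record (hypothetical shape, not the real DTO). Segment-level totals are always computed; the per-column map is only allocated and filled when `includeColumnStats` is true and stays null otherwise, so a `@JsonInclude(NON_NULL)`-style serializer would omit it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/// Hypothetical sketch of the includeColumnStats gate on the server side.
class IncludeColumnStatsSketch {
  /// Per-column sizes as the server would compute them (illustrative).
  record ColumnSizes(long rawIngestSizeBytes, long onDiskSizeBytes) {}

  /// Simplified response; a null columnCompressionStats would be dropped by a NON_NULL serializer.
  record SegmentSizeInfo(long rawIngestSizeBytes, long onDiskSizeBytes,
      Map<String, ColumnSizes> columnCompressionStats) {}

  static SegmentSizeInfo build(Map<String, ColumnSizes> perColumnSizes, boolean includeColumnStats) {
    long raw = 0;
    long onDisk = 0;
    // Only allocate the per-column map when the caller opted in.
    Map<String, ColumnSizes> perColumn =
        includeColumnStats ? new LinkedHashMap<String, ColumnSizes>() : null;
    for (Map.Entry<String, ColumnSizes> e : perColumnSizes.entrySet()) {
      raw += e.getValue().rawIngestSizeBytes();
      onDisk += e.getValue().onDiskSizeBytes();
      if (perColumn != null) {
        perColumn.put(e.getKey(), e.getValue());
      }
    }
    return new SegmentSizeInfo(raw, onDisk, perColumn);
  }

  public static void main(String[] args) {
    Map<String, ColumnSizes> sizes = new LinkedHashMap<>();
    sizes.put("a", new ColumnSizes(1_000, 250));
    sizes.put("b", new ColumnSizes(500, 500));
    SegmentSizeInfo summaryOnly = build(sizes, false);
    System.out.println(summaryOnly.rawIngestSizeBytes() + " " + summaryOnly.onDiskSizeBytes()); // 1500 750
    System.out.println(summaryOnly.columnCompressionStats() == null); // true
    System.out.println(build(sizes, true).columnCompressionStats().keySet()); // [a, b]
  }
}
```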
…edByteChunkForwardIndexWriter

Removes per-put if(_trackUncompressedSize) branches from putInt/putLong/putFloat/putDouble.
_chunkDataOffset already accumulates the same byte count unconditionally for flush detection,
so we read it once per chunk flush instead of re-incrementing per value.
Updates testPartialChunkAccountedInClose to match per-chunk semantics and adds Javadoc
clarifying that getUncompressedSize() is accurate only after close().
…artial chunk

Override getUncompressedSize() in FixedByteChunkForwardIndexWriter to return
_uncompressedSize + _chunkDataOffset so callers reading before close() (e.g.
writeMetadata()) get the correct total. Without this, partial chunks that have
not yet triggered a flush return 0, causing compression stats to be silently
omitted from segment metadata.
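A heavily simplified, hypothetical stand-in for the fixed-byte writer that shows only the size bookkeeping from the two commits above: no tracking branch in `putInt`, one addition per chunk flush, and a `getUncompressedSize()` that adds the unflushed partial chunk so the value is correct before `close()`. Compression and I/O are omitted, and returning 0 when tracking is off is an assumption of this sketch.

```java
/// Hypothetical sketch of per-chunk uncompressed size bookkeeping; not Pinot's writer.
class FixedByteUncompressedSizeSketch {
  private final int chunkSizeBytes;
  private boolean trackUncompressedSize; // defaults to false; enabled externally
  private long uncompressedSize;         // bytes from chunks already flushed
  private int chunkDataOffset;           // bytes buffered in the current, unflushed chunk

  FixedByteUncompressedSizeSketch(int numDocsPerChunk, int valueSizeBytes) {
    this.chunkSizeBytes = numDocsPerChunk * valueSizeBytes;
  }

  void setTrackUncompressedSize(boolean trackUncompressedSize) {
    this.trackUncompressedSize = trackUncompressedSize;
  }

  void putInt(int value) {
    // A real writer copies the value into the chunk buffer here; there is no tracking branch.
    chunkDataOffset += Integer.BYTES;
    if (chunkDataOffset == chunkSizeBytes) {
      flushChunk();
    }
  }

  private void flushChunk() {
    // A real writer compresses and writes the chunk here.
    if (trackUncompressedSize) {
      uncompressedSize += chunkDataOffset; // one addition per chunk, not per value
    }
    chunkDataOffset = 0;
  }

  void close() {
    if (chunkDataOffset > 0) {
      flushChunk();
    }
  }

  /// Adds the unflushed partial chunk so the value is correct before close().
  long getUncompressedSize() {
    return trackUncompressedSize ? uncompressedSize + chunkDataOffset : 0;
  }

  public static void main(String[] args) {
    FixedByteUncompressedSizeSketch writer = new FixedByteUncompressedSizeSketch(4, Integer.BYTES);
    writer.setTrackUncompressedSize(true);
    for (int i = 0; i < 6; i++) {
      writer.putInt(i); // one full 16-byte chunk flushed, 8 bytes still buffered
    }
    System.out.println(writer.getUncompressedSize()); // 24 (before close)
    writer.close();
    System.out.println(writer.getUncompressedSize()); // 24 (after close)
  }
}
```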
…acy writer, and ForwardIndexHandler

- MultiValueFixedByteRawIndexCreatorTest: tracking enabled/disabled
- MultiValueVarByteRawIndexCreatorTest: tracking enabled/disabled
- ForwardIndexWriterUncompressedSizeTest: legacy VarByteChunkForwardIndexWriter tracking
- ForwardIndexHandlerCompressionStatsTest: codec not persisted when compressionStatsEnabled=false
…rawIngestSize metadata persistence

- VarByteChunkSVForwardIndexTest: getUncompressedSize/setTrackUncompressedSize via
  SingleValueVarByteRawIndexCreator (enabled and disabled)
- SegmentDictionaryCreatorRawIngestSizeTest: end-to-end test verifying
  dict.rawIngestSizeBytes is persisted to segment metadata when compressionStatsEnabled
… behavior

The initial segment already has SNAPPY persisted (built with compressionStatsEnabled=true).
When stats are disabled, the handler does not overwrite the metadata with the new codec —
so the assertion is that the old value is unchanged, not null.
@johnsolomonj force-pushed the feature/compression-stats-tracking branch from 2c7f1d0 to bb679b5, June 15, 2026 14:53
…e to request

columnCompressionStats is only returned when includeColumnStats=true is passed.
The test was calling the endpoint without this param so ccs was always null.
…rror

Without this, a stale gauge value from a previous successful fetch persists
when all servers subsequently return errors. The test testGetTableSubTypeSizeAllErrors
asserts the gauge must not exist after an all-error run.
…/size API

Root cause: the summary accumulation loop was gated on perColumnMax which is only
populated when the server is called with includeColumnStats=true. For the default
case (includeColumnStats=false), the server omits columnCompressionStats from
SegmentSizeInfo so perColumnMax was always empty and _segmentsWithStats stayed 0,
causing _compressionStats to be null.

Fix: use segment-level rawIngestSizeBytes/onDiskSizeBytes from SegmentSizeInfo for
the summary (always populated by servers when compressionStatsEnabled). Dict-only
segments count toward coverage but not the ratio to avoid skewing it toward zero.
Keep per-column fallback for legacy servers that don't populate segment-level fields.

Adds regression test testCompressionStatsSummaryPresentWhenColumnStatsExcluded.
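A hypothetical sketch of the summary pass after this fix, using segment-level sizes only. The `hasStats` and `hasRawForwardIndex` flags are assumptions standing in for however the controller actually detects legacy and dict-only segments; dict-only segments count toward coverage but are kept out of the ratio, and `isPartialCoverage()` returns true whenever some reachable segment lacks stats, including when none have them.

```java
import java.util.List;

/// Hypothetical sketch of the controller-side summary over segment-level sizes.
class CompressionSummarySketch {
  /// hasStats=false models a segment (or server) without compression metadata;
  /// hasRawForwardIndex=false marks a dict-only segment. Both flags are illustrative.
  record SegmentStats(boolean hasStats, boolean hasRawForwardIndex,
      long rawIngestSizeBytes, long onDiskSizeBytes) {}

  record Summary(int segmentsWithStats, int totalSegments, long rawIngestBytes, long onDiskBytes) {
    /// True whenever some reachable segment lacks stats, including when none have them.
    boolean isPartialCoverage() {
      return totalSegments > 0 && segmentsWithStats < totalSegments;
    }
  }

  static Summary summarize(List<SegmentStats> segments) {
    int withStats = 0;
    long raw = 0;
    long onDisk = 0;
    for (SegmentStats s : segments) {
      if (!s.hasStats()) {
        continue;
      }
      withStats++; // dict-only segments still count toward coverage...
      if (s.hasRawForwardIndex()) {
        raw += s.rawIngestSizeBytes(); // ...but only raw segments feed the ratio
        onDisk += s.onDiskSizeBytes();
      }
    }
    return new Summary(withStats, segments.size(), raw, onDisk);
  }

  public static void main(String[] args) {
    Summary s = summarize(List.of(
        new SegmentStats(true, true, 4_000, 1_000), // raw segment: coverage and ratio
        new SegmentStats(true, false, 0, 300),      // dict-only: coverage only
        new SegmentStats(false, false, 0, 0)));     // segment without stats
    System.out.println(s.segmentsWithStats() + "/" + s.totalSegments()); // 2/3
    System.out.println(s.rawIngestBytes() + " " + s.onDiskBytes());      // 4000 1000
    System.out.println(s.isPartialCoverage());                           // true
  }
}
```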
…overload

Added testRunner(servers, table, includeColumnStats) overload and used it
in the regression test instead of a method that does not exist.

4 participants