Add compression ratio calculation and per-column compression stats (#18184)#18185
Add compression ratio calculation and per-column compression stats (#18184)#18185johnsolomonj wants to merge 62 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #18185 +/- ##
============================================
+ Coverage 64.78% 64.82% +0.04%
Complexity 1309 1309
============================================
Files 3380 3384 +4
Lines 209544 210244 +700
Branches 32797 32962 +165
============================================
+ Hits 135746 136285 +539
- Misses 62870 62943 +73
- Partials 10928 11016 +88
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
b9e573e to
7667a13
Compare
0bf95a3 to
0741c48
Compare
d4ce64e to
1cea546
Compare
|
a few things:
|
992eeae to
c53db31
Compare
|
| @JsonProperty("rawIngestSizeBytes") long rawIngestSizeBytes, | ||
| @JsonProperty("onDiskSizeBytes") long onDiskSizeBytes, | ||
| @JsonProperty("tier") @Nullable String tier, | ||
| @JsonProperty("columnCompressionStats") @Nullable Map<String, ColumnCompressionStatsInfo> |
There was a problem hiding this comment.
This per column stats may blow up the response, please make sure the REST API has an explicit param to ask for this, default should be off.
There was a problem hiding this comment.
Nice Idea! Added a query param includeColumnStats. Including per column stats only if this param is passed and set to true.
| protected int _chunkSize; | ||
| protected long _dataOffset; | ||
| protected long _uncompressedSize; | ||
| protected boolean _trackUncompressedSize = true; |
There was a problem hiding this comment.
this should be default to false?
There was a problem hiding this comment.
The default of true was effectively overridden by setTrackUncompressedSize(false) via ForwardIndexCreatorFactory before any writes happened, so it wasn't causing incorrect behavior. But false is the right default for clarity and safety. Fixed.
| private int _metadataSize = 0; | ||
| private long _chunkOffset = 0; | ||
| private long _uncompressedSize = 0; | ||
| private boolean _trackUncompressedSize = true; |
There was a problem hiding this comment.
the default should be false and set by external
There was a problem hiding this comment.
The default of true was effectively overridden by setTrackUncompressedSize(false) via ForwardIndexCreatorFactory before any writes happened, so it wasn't causing incorrect behavior. But false is the right default for clarity and safety. Fixed.
| } | ||
|
|
||
| public void putInt(int value) { | ||
| if (_trackUncompressedSize) { |
There was a problem hiding this comment.
this make the tracking to be on hotspot for every put call.
is better to infer it from the _chunkDataOffset?
| accum[1] += info.getOnDiskSizeInBytes(); | ||
| if (info.getCodec() != null) { | ||
| columnCodecMap.merge(col, info.getCodec(), | ||
| (existing, incoming) -> existing.equals(incoming) ? existing : "MIXED"); |
There was a problem hiding this comment.
this MIXED doesn't provide much info, can you make it a list of codecs?
There was a problem hiding this comment.
When codec="MIXED" in the response, the codecBreakdown map provides the full per-codec detail — segment count, rawIngestSizeInBytes, and onDiskSizeInBytes for each codec. So the list of codecs and their sizes are already available via codecBreakdown.
|
|
||
| @Override | ||
| @Nullable | ||
| public String getCompressionCodec() { |
There was a problem hiding this comment.
this should be ChunkCompressionType not string
There was a problem hiding this comment.
getCompressionCodec() can return a ChunkCompressionType name, "DICT_ENCODED", or "MIXED" — a new enum would either duplicate ChunkCompressionType values (and drift when new codecs are added) or require a wrapper that still needs a String fallback. Kept it as String to avoid that coupling. Open to suggestions if there's a cleaner pattern you have in mind.
d6abaf1 to
f4902a8
Compare
9755191 to
2c7f1d0
Compare
…to table size API This feature enables tracking and reporting of forward index compression effectiveness across Pinot segments. When `compressionStatsEnabled` is set in table config's indexing config, segment creation records uncompressed forward index sizes and compression codec in metadata.properties. The server-side table size endpoint now returns per-segment and per-column raw/compressed forward index sizes. The controller aggregates these into table-level compression ratio metrics (raw/compressed), with partial coverage tracking for mixed-version clusters. Three new ControllerGauge metrics (TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA, TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA) are emitted for monitoring. ForwardIndexHandler is updated to persist compression metadata during segment reload operations (compression type change and dict-to-raw conversion).
…feature - Add 6 new test files covering writer-level tracking, segment creation, corner cases, ForwardIndexHandler reload, and integration tests for both offline and realtime (Kafka) ingestion paths - Merge redundant dual-loop in TableSizeReader into a single pass over server info, improving performance during table size aggregation - Fix offline integration test teardown to properly wait for table data manager removal before stopping servers - Wrap second table cleanup in offline test in finally block to prevent resource leaks on assertion failure
…tier breakdown, and stale metadata cleanup - Wrap flat compression fields in nested CompressionStats DTO with @JsonInclude(NON_NULL) - Add StorageBreakdown with per-tier segment count and size (always reported) - Add per-column ColumnCompressionDetail with aggregated sizes, ratio, and codec (MIXED when codecs differ across segments) - Gate compressionStats on tableConfig.indexingConfig.compressionStatsEnabled; suppress from JSON when OFF - Fix isPartialCoverage: now correctly returns true when 0 segments have stats but non-missing segments exist - Clear stale forwardIndex.compressionCodec and forwardIndex.uncompressedSizeBytes on raw-to-dict reload - Support null values in SegmentMetadataUtils.updateMetadataProperties to clear properties - Add TABLE_TIERED_STORAGE_SIZE gauge; emit tier metrics always; clear compression+tier gauges when flag OFF - Add testRawToDictClearsCompressionStats, testCompressionStatsNullWhenFlagOff, per-column/tier assertions - Update integration tests for nested compressionStats JSON structure
…leSizeResource for dict - Gate _totalRawIngestBytes accumulation in SegmentDictionaryCreator behind a _trackRawIngestBytes flag (passed from IndexCreationContext.isCompressionStatsEnabled() via DictionaryIndexType.createIndexCreator). Eliminates Utf8.encodedLength() and BigDecimalUtils.byteSize() calls on every row when the feature is disabled. - Fix TableSizeResource to emit CODEC_DICT_ENCODED for dict columns instead of codec=null, include dict file size in onDiskSizeInBytes, and populate rawIngestSizeInBytes from getDictColumnRawIngestSizeBytes() — consistent with TablesResource handling.
…ytes flag; fix tests to use it
…mns regardless of column filter The metadata endpoint accepts an optional ?columns= filter; when omitted, JAX-RS provides an empty list making columnSet empty, so the column loop iterated zero columns and compression stats were never collected. Split the loop into two: a column-stats loop scoped to columnSet, and a separate compression-stats loop over allSegmentColumns — keeping per-requested-column data scoped to the filter while ensuring compression stats always cover all segment columns.
…ablesResource TableSizeReader: the summary guard required maxRawFwdIndexSize > 0 which is always false for dict-only tables (no raw forward index). Switch to summing per-column rawIngest and onDisk from perColumnMax for all table types — consistent with per-column output and covers dict-only, raw-only, and mixed tables correctly. TablesResource: split the single column loop into a column-stats loop (scoped to caller's ?columns= filter) and a separate compression-stats loop over all segment columns, so compression stats are always collected regardless of the column filter.
…ardIndexSizeBytes to rawIngestSizeBytes/onDiskSizeBytes These fields were added in this PR (not on master) so no backward compatibility concern. Aligns naming with ColumnCompressionStatsInfo (rawIngestSizeInBytes, onDiskSizeInBytes) and CompressionStatsSummary (rawIngestSizePerReplicaInBytes, onDiskSizePerReplicaInBytes) for consistency across the compression stats API.
…s in TableSizeResource
…tedColumn to return null
…nTest - use shared suite instance
…uming/old segments
…rackUncompressedSize when compressionStatsEnabled
…to gate per-column stats
Per-column compression stats (columnCompressionStats) can be large for tables with
many columns. Add ?includeColumnStats=false (default) to both GET /tables/{table}/size
and GET /tables/{table}/metadata so callers opt in explicitly.
- compressionStats summary and storageBreakdown always returned when feature flag enabled
- columnCompressionStats only computed and returned when includeColumnStats=true
- param flows end-to-end from controller to server; server skips per-column map
construction when false, avoiding unnecessary CPU and response bloat
…rue) since default is now false
…edByteChunkForwardIndexWriter Removes per-put if(_trackUncompressedSize) branches from putInt/putLong/putFloat/putDouble. _chunkDataOffset already accumulates the same byte count unconditionally for flush detection, so we read it once per chunk flush instead of re-incrementing per value. Updates testPartialChunkAccountedInClose to match per-chunk semantics and adds Javadoc clarifying that getUncompressedSize() is accurate only after close().
…artial chunk Override getUncompressedSize() in FixedByteChunkForwardIndexWriter to return _uncompressedSize + _chunkDataOffset so callers reading before close() (e.g. writeMetadata()) get the correct total. Without this, partial chunks that have not yet triggered a flush return 0, causing compression stats to be silently omitted from segment metadata.
…acy writer, and ForwardIndexHandler - MultiValueFixedByteRawIndexCreatorTest: tracking enabled/disabled - MultiValueVarByteRawIndexCreatorTest: tracking enabled/disabled - ForwardIndexWriterUncompressedSizeTest: legacy VarByteChunkForwardIndexWriter tracking - ForwardIndexHandlerCompressionStatsTest: codec not persisted when compressionStatsEnabled=false
…rawIngestSize metadata persistence - VarByteChunkSVForwardIndexTest: getUncompressedSize/setTrackUncompressedSize via SingleValueVarByteRawIndexCreator (enabled and disabled) - SegmentDictionaryCreatorRawIngestSizeTest: end-to-end test verifying dict.rawIngestSizeBytes is persisted to segment metadata when compressionStatsEnabled
… behavior The initial segment already has SNAPPY persisted (built with compressionStatsEnabled=true). When stats are disabled, the handler does not overwrite the metadata with the new codec — so the assertion is that the old value is unchanged, not null.
2c7f1d0 to
bb679b5
Compare
…e to request columnCompressionStats is only returned when includeColumnStats=true is passed. The test was calling the endpoint without this param so ccs was always null.
…rror Without this, a stale gauge value from a previous successful fetch persists when all servers subsequently return errors. The test testGetTableSubTypeSizeAllErrors asserts the gauge must not exist after an all-error run.
…/size API Root cause: the summary accumulation loop was gated on perColumnMax which is only populated when the server is called with includeColumnStats=true. For the default case (includeColumnStats=false), the server omits columnCompressionStats from SegmentSizeInfo so perColumnMax was always empty and _segmentsWithStats stayed 0, causing _compressionStats to be null. Fix: use segment-level rawIngestSizeBytes/onDiskSizeBytes from SegmentSizeInfo for the summary (always populated by servers when compressionStatsEnabled). Dict-only segments count toward coverage but not the ratio to avoid skewing it toward zero. Keep per-column fallback for legacy servers that don't populate segment-level fields. Adds regression test testCompressionStatsSummaryPresentWhenColumnStatsExcluded.
…overload Added testRunner(servers, table, includeColumnStats) overload and used it in the regression test instead of a method that does not exist.
Labels:
feature,release-notes,observabilitySummary
Draft implementation for the PEP proposed in #18184. Kept as draft pending design review on the issue.
Adds compression ratio tracking and per-column compression stats to Pinot's existing table size and metadata APIs:
BaseChunkForwardIndexWritersubclasses,VarByteChunkForwardIndexWriterV4/V5/V6,CLPForwardIndexCreatorV2)SegmentDictionaryCreator(STRING viaUtf8.encodedLength, BYTES via array length, BIG_DECIMAL viaBigDecimalUtils.byteSize; fixed-width types computed fromtotalDocs × typeWidthat seal time)metadata.propertiesper columncompressionStats,columnCompressionStats, andstorageBreakdownon bothGET /tables/{table}/sizeandGET /tables/{table}/metadataTABLE_COMPRESSION_RATIO_PERCENTandTABLE_TIERED_STORAGE_SIZEcontroller gauges with tier lifecycle managementindexingConfig.compressionStatsEnabledflag (default: off, zero overhead when disabled)Design document
See #18184 for the full PEP including motivation, prior art, API response structure, and known corner cases.
Key design decisions
put*()callsites, capturing raw ingested data size without chunk headers or alignment paddingForwardIndexType.resolveCompressionType()handles CLP codec variants, used by bothBaseSegmentCreatorandForwardIndexHandlercodec="DICT_ENCODED",rawIngestSizeInBytestracked viaSegmentDictionaryCreator, andonDiskSizeInBytes= forward index + dictionary file size. Columns with mixed encoding across segments producecodec="MIXED"with a per-codeccodecBreakdown(segments, rawIngestSizeInBytes, onDiskSizeInBytes per codec).hasDictionaryfield removed — encoding fully expressed viacodec.Test plan
ForwardIndexType.resolveCompressionType()codec resolutionForwardIndexHandlercompression stats persistence on reloadSegmentDictionaryCreator.getTotalRawIngestBytes()(STRING UTF-8 multi-byte, BYTES, BIG_DECIMAL, MV columns)/sizeand/metadataAPIscompressionStatsEnabled = false