
[PEP] Add compression ratio calculation and per-column compression stats #18184

@johnsolomonj

What

Add compression stats and storage breakdown to the existing GET /tables/{table}/size and GET /tables/{table}/metadata endpoints. No new endpoints.

Modules Changed

Stats are written once at segment creation into metadata.properties, loaded into ColumnMetadata at segment load, and aggregated on-demand per API call in the controller — no background jobs, no separate store.

| Module | Change |
| --- | --- |
| pinot-spi / IndexingConfig | New compressionStatsEnabled boolean (default false) |
| pinot-segment-local writers | BaseChunkForwardIndexWriter, VarByteChunkForwardIndexWriterV4/V5/V6, CLPForwardIndexCreatorV2: track raw byte count during writes |
| pinot-segment-local / SegmentDictionaryCreator | Track raw ingest byte count for dict columns (STRING via Utf8.encodedLength, BYTES via array length, BIG_DECIMAL via BigDecimalUtils.byteSize; fixed-width types from totalDocs × typeWidth at seal time); gated behind trackRawIngestBytes flag |
| pinot-segment-local / BaseSegmentCreator | Persists uncompressed size + codec to segment metadata.properties at creation time; persists dict.rawIngestSizeBytes for dict columns |
| pinot-segment-spi / ColumnMetadata | New default methods to read persisted stats at segment load, including getDictColumnRawIngestSizeBytes() |
| pinot-common | New DTOs: ColumnCompressionStatsInfo, CompressionStatsSummary, StorageBreakdownInfo; extended: SegmentSizeInfo, TableMetadataInfo; new ControllerGauge entries |
| pinot-server | /tables/{table}/size and /tables/{table}/metadata: read per-column stats from ColumnMetadata and include in response |
| pinot-controller | TableSizeReader + ServerSegmentMetadataReader aggregate per-segment responses on each API call; emit Prometheus gauges |
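
The raw-byte accounting described for SegmentDictionaryCreator could look roughly like the sketch below. The class and method names are hypothetical and not taken from the draft PR. `String.getBytes(UTF_8).length` stands in for Guava's `Utf8.encodedLength` so the example has no dependencies (the two agree for well-formed strings), and BIG_DECIMAL handling is left out.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: accumulates pre-encoding byte counts for one column.
final class RawIngestBytesSketch {
  private long varWidthBytes = 0;

  // STRING values: count the UTF-8 encoded length of each value.
  void addString(String value) {
    varWidthBytes += value.getBytes(StandardCharsets.UTF_8).length;
  }

  // BYTES values: count the array length of each value.
  void addBytes(byte[] value) {
    varWidthBytes += value.length;
  }

  long varWidthTotal() {
    return varWidthBytes;
  }

  // Fixed-width types need no per-value tracking: the total is derived once,
  // at seal time, as totalDocs x typeWidth.
  static long fixedWidthTotal(long totalDocs, int typeWidthBytes) {
    return Math.multiplyExact(totalDocs, (long) typeWidthBytes);
  }
}
```

Deriving the fixed-width total at seal time keeps the per-value write path untouched for numeric columns, which is presumably why the proposal treats them separately from variable-width types.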

New Fields (both endpoints, same structure)

"compressionStats": {
  "rawIngestSizePerReplicaInBytes": 550000000,
  "onDiskSizePerReplicaInBytes": 30000000,
  "compressionRatio": 18.3,
  "segmentsWithStats": 285,
  "totalSegments": 312,
  "isPartialCoverage": true
},
"columnCompressionStats": [
  {
    "column": "revenue",
    "codec": "LZ4",
    "rawIngestSizeInBytes": 120000000,
    "onDiskSizeInBytes": 8000000,
    "compressionRatio": 15.0,
    "indexes": ["forward_index"]
  },
  {
    "column": "campaign_id",
    "codec": "DICT_ENCODED",
    "rawIngestSizeInBytes": 500000,
    "onDiskSizeInBytes": 95000,
    "compressionRatio": 5.26,
    "indexes": ["dictionary", "forward_index"]
  },
  {
    "column": "country",
    "codec": "MIXED",
    "rawIngestSizeInBytes": 310000,
    "onDiskSizeInBytes": 52000,
    "compressionRatio": 5.96,
    "codecBreakdown": {
      "LZ4":          { "segments": 8, "rawIngestSizeInBytes": 180000, "onDiskSizeInBytes": 28000 },
      "DICT_ENCODED": { "segments": 5, "rawIngestSizeInBytes": 130000, "onDiskSizeInBytes": 24000 }
    }
  }
],
"storageBreakdown": {
  "tiers": {
    "hotTier":  { "count": 50,  "sizePerReplicaInBytes": 10000000 },
    "coldTier": { "count": 262, "sizePerReplicaInBytes": 20000000 }
  }
}
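
The example values above are consistent with compressionRatio being raw ingest bytes divided by on-disk bytes (550000000 / 30000000 ≈ 18.3, 120000000 / 8000000 = 15.0). A minimal sketch under that assumption follows; the class name and the zero-size fallback are illustrative choices, not something the proposal specifies.

```java
// Hypothetical helper illustrating how the ratio fields relate to the size fields.
final class CompressionRatioSketch {
  // Raw ingest bytes divided by on-disk bytes; higher means better compression.
  static double ratio(long rawIngestBytes, long onDiskBytes) {
    if (onDiskBytes <= 0) {
      return 0.0; // assumption: report 0 instead of dividing by zero
    }
    return (double) rawIngestBytes / onDiskBytes;
  }

  public static void main(String[] args) {
    // Table-level and per-column values from the example response.
    System.out.println(ratio(550_000_000L, 30_000_000L));
    System.out.println(ratio(120_000_000L, 8_000_000L));
    System.out.println(ratio(500_000L, 95_000L));
  }
}
```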

Behavior

Feature flag (tableIndexConfig.compressionStatsEnabled, default false):

  • false: zero overhead — no tracking in writers, nothing written to disk, compressionStats and columnCompressionStats absent from responses
  • true: writers track raw byte counts; codec + uncompressed size persisted to segment metadata

storageBreakdown is always returned regardless of the flag.

Dictionary columns appear in columnCompressionStats with codec="DICT_ENCODED", rawIngestSizeInBytes populated (bytes before encoding), and onDiskSizeInBytes = forward index + dictionary file size.

Mixed encoding (e.g. a column converted from dict to raw after a table config change) is reported as codec="MIXED" with a codecBreakdown map showing per-codec segment count, rawIngest, and onDisk sizes. The hasDictionary field is removed, since the encoding is fully expressed via codec.
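
One way the controller could roll per-segment column stats up into a single codec value plus a codecBreakdown is sketched below. The class and method names are hypothetical; only the "MIXED" label and the three breakdown fields come from the proposal.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-column accumulator for codec rollup across segments.
final class CodecRollupSketch {
  // Per-codec totals, mirroring one codecBreakdown entry.
  static final class CodecTotals {
    int segments;
    long rawIngestSizeInBytes;
    long onDiskSizeInBytes;
  }

  private final Map<String, CodecTotals> byCodec = new LinkedHashMap<>();

  // Record one segment's stats for this column.
  void addSegment(String codec, long rawBytes, long onDiskBytes) {
    CodecTotals totals = byCodec.computeIfAbsent(codec, k -> new CodecTotals());
    totals.segments++;
    totals.rawIngestSizeInBytes += rawBytes;
    totals.onDiskSizeInBytes += onDiskBytes;
  }

  // A single codec is reported as-is; more than one collapses to "MIXED".
  String codec() {
    if (byCodec.isEmpty()) {
      return null; // no segments with stats for this column
    }
    return byCodec.size() == 1 ? byCodec.keySet().iterator().next() : "MIXED";
  }

  // Only meaningful to callers when codec() returns "MIXED".
  Map<String, CodecTotals> breakdown() {
    return byCodec;
  }
}
```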

Partial coverage: enabling the flag on an existing table only affects new segments. Old segments are excluded from ratio computation (not counted as zero). isPartialCoverage=true and segmentsWithStats < totalSegments signal this.
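
The exclusion rule can be sketched as follows. The types are hypothetical, and the sketch assumes a segment without stats contributes neither its raw size nor its on-disk size to the ratio, which is one reading of "excluded from ratio computation".

```java
import java.util.List;

// Hypothetical aggregator showing how segments without stats are skipped.
final class PartialCoverageSketch {
  // One segment as seen by the aggregator; hasStats is false for segments
  // created before the flag was enabled.
  static final class SegmentStats {
    final boolean hasStats;
    final long rawIngestBytes;
    final long onDiskBytes;

    SegmentStats(boolean hasStats, long rawIngestBytes, long onDiskBytes) {
      this.hasStats = hasStats;
      this.rawIngestBytes = rawIngestBytes;
      this.onDiskBytes = onDiskBytes;
    }
  }

  static final class Summary {
    long rawIngestBytes;
    long onDiskBytes;
    int segmentsWithStats;
    int totalSegments;

    boolean isPartialCoverage() {
      return segmentsWithStats < totalSegments;
    }

    double compressionRatio() {
      return onDiskBytes > 0 ? (double) rawIngestBytes / onDiskBytes : 0.0;
    }
  }

  static Summary summarize(List<SegmentStats> segments) {
    Summary summary = new Summary();
    for (SegmentStats segment : segments) {
      summary.totalSegments++;
      if (!segment.hasStats) {
        continue; // skipped entirely, not added as zero raw bytes
      }
      summary.segmentsWithStats++;
      summary.rawIngestBytes += segment.rawIngestBytes;
      summary.onDiskBytes += segment.onDiskBytes;
    }
    return summary;
  }
}
```

With two tracked segments of 1000/100 and 2000/100 bytes (raw/on-disk) and one untracked segment of 500 on-disk bytes, this yields a ratio of 3000 / 200 = 15.0. Counting the untracked segment as zero raw bytes would instead give 3000 / 700 ≈ 4.29, understating the real compression.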

Realtime: consuming segments excluded — stats appear only after segment commit.

All ingestion paths covered: offline batch, realtime, and minion tasks all converge at SegmentIndexCreationDriverImpl → BaseSegmentCreator.

Prometheus gauges: TABLE_COMPRESSION_RATIO_PERCENT, TABLE_RAW_FORWARD_INDEX_SIZE_PER_REPLICA, TABLE_COMPRESSED_FORWARD_INDEX_SIZE_PER_REPLICA, TABLE_TIERED_STORAGE_SIZE. Cleared when flag is disabled or table becomes dict-only.

What's Out of Scope

  • UI changes (follow-up): Surface compression ratio, per-column stats, and tier breakdown in Pinot Console table detail page — API already returns all required data, purely a rendering change.

Use Cases

  1. COGS estimation: Compression ratio and per-column breakdown for informed storage cost projections
  2. Codec optimization: Identify columns with poor compression ratios and switch codecs (e.g., LZ4 → ZSTANDARD for cold data)
  3. Capacity planning: Right-size clusters by understanding true storage footprint with local vs tiered breakdown
  4. Schema optimization: Identify columns that benefit from dictionary encoding vs raw encoding
  5. Index cost analysis: Per-column index size visibility to evaluate cost-vs-performance trade-offs when adding or removing indexes
  6. Monitoring/alerting: Alert when compression ratio degrades after schema changes or data pattern shifts

Related Issues and PRs

Draft PR

#18185

Labels: PEP-Request (Pinot Enhancement Proposal request to be reviewed.)
