|
| 1 | +# Incremental Stripe Compositing |
| 2 | + |
| 3 | +Grok supports incremental, tile-row-at-a-time writing of decompressed JPEG 2000 |
| 4 | +images to output formats (TIFF, PNM, PNG, JPEG). Instead of decompressing |
| 5 | +every tile into a full-resolution composite image and then writing the whole |
| 6 | +image at once, tiles are composited and written one tile-row at a time. This |
| 7 | +keeps resident memory proportional to a single tile-row rather than the entire |
| 8 | +image. |
| 9 | + |
| 10 | +## Overview |
| 11 | + |
| 12 | +``` |
| 13 | +┌──────────────────────────────────────────────────────────────────────┐ |
| 14 | +│ Parser Thread │ |
| 15 | +│ Reads codestream markers (SOT) and feeds tile data to decompression │ |
| 16 | +│ Back-pressure: blocks when tile row > nextBandTileY_ + 1 │ |
| 17 | +└────────────────────┬─────────────────────────────────────────────────┘ |
| 18 | + │ schedule(tileProcessor) |
| 19 | + ▼ |
| 20 | +┌──────────────────────────────────────────────────────────────────────┐ |
| 21 | +│ Taskflow Thread Pool │ |
| 22 | +│ T2 (packet parsing) → T1 (entropy decode) → DWT → post callback │ |
| 23 | +└────────────────────┬─────────────────────────────────────────────────┘ |
| 24 | + │ tileCompletion_->complete(tileIndex) |
| 25 | + ▼ |
| 26 | +┌──────────────────────────────────────────────────────────────────────┐ |
| 27 | +│ Row Completion Callback │ |
| 28 | +│ Composites tiles into scratchImage_, calls ioBandCallback_, │ |
| 29 | +│ releases tile memory, advances strip buffer │ |
| 30 | +└────────────────────┬─────────────────────────────────────────────────┘ |
| 31 | + │ ioBandCallback_(yBegin, yEnd, scratchImage_) |
| 32 | + ▼ |
| 33 | +┌──────────────────────────────────────────────────────────────────────┐ |
| 34 | +│ Format Writer (e.g. TIFFFormat) │ |
| 35 | +│ writeImageBand(): interleave planar→packed, write strip to disk │ |
| 36 | +└──────────────────────────────────────────────────────────────────────┘ |
| 37 | +``` |
| 38 | + |
| 39 | +## When Incremental Writes Are Enabled |
| 40 | + |
| 41 | +The CLI application (`GrkDecompress`) enables incremental writes when **all** |
| 42 | +of the following are true: |
| 43 | + |
| 44 | +1. `storeToDisk` is true (not a memory-only decode) |
| 45 | +2. Not a single-tile decompress (`!parameters->single_tile_decompress`) |
| 46 | +3. Post-processing is a no-op (`grk_image_is_post_process_no_op`) — i.e. no |
| 47 | + colour-space conversion, ICC profile application, or precision scaling |
| 48 | +4. No windowed vertical crop (`parameters->dw_y1 == 0`) |
| 49 | +5. The output format supports incremental band writes |
| 50 | + (`fmt->supportsIncrementalBandWrite()`) |
| 51 | + |
| 52 | +When enabled, the application calls `grk_decompress_set_band_callback()` which |
| 53 | +stores `ioBandCallback_` on the `CodeStreamDecompress` instance. |
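
The five conditions above amount to a single predicate. The sketch below is illustrative only: `BandWriteInputs`, its field names, and `canWriteIncrementally` are hypothetical, and the type chosen for the window coordinate is an assumption. Only the conditions themselves come from the list above.

```cpp
// Hypothetical flattened view of the inputs to the decision. The real
// application reads these from its own parameter structs and format object.
struct BandWriteInputs {
  bool storeToDisk;             // false for a memory-only decode
  bool singleTileDecompress;    // mirrors parameters->single_tile_decompress
  bool postProcessIsNoOp;       // result of grk_image_is_post_process_no_op
  double windowY1;              // mirrors parameters->dw_y1 (type assumed)
  bool formatSupportsBandWrite; // result of fmt->supportsIncrementalBandWrite()
};

// True only when every condition in the list holds.
inline bool canWriteIncrementally(const BandWriteInputs& in) {
  return in.storeToDisk && !in.singleTileDecompress && in.postProcessIsNoOp &&
         in.windowY1 == 0 && in.formatSupportsBandWrite;
}
```

If any single input fails, the decoder would fall back to the full-composite path.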
| 54 | + |
| 55 | +## Key Data Structures |
| 56 | + |
| 57 | +### scratchImage_ |
| 58 | + |
| 59 | +A `GrkImage` that acts as a strip buffer. Its component data arrays are sized |
| 60 | +to hold one tile-row of pixels. After each tile-row is written, the component |
| 61 | +`y0` and `h` fields are advanced to the next tile-row so that tile compositing |
| 62 | +writes into the correct location. |
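
Advancing the strip buffer comes down to computing the vertical extent of the next tile-row, clipped to the image height so that a short last row is handled. A minimal sketch, assuming the tile grid starts at the image origin and that `tileRow` is a valid row index. `BandExtent` and `bandForTileRow` are hypothetical names, not library API.

```cpp
#include <algorithm>
#include <cstdint>

// Vertical extent of one tile-row within the image (illustrative type).
struct BandExtent {
  uint32_t y0; // first image row covered by the band
  uint32_t h;  // number of rows in the band
};

// Assumes no tile-grid or image offset and tileRow * tileHeight < imageHeight.
inline BandExtent bandForTileRow(uint32_t tileRow, uint32_t tileHeight,
                                 uint32_t imageHeight) {
  const uint32_t y0 = tileRow * tileHeight;
  const uint32_t y1 = std::min(y0 + tileHeight, imageHeight);
  return {y0, y1 - y0};
}
```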
| 63 | + |
| 64 | +### TileCompletion |
| 65 | + |
| 66 | +Tracks which tiles have finished decompressing. When all tiles in a tile-row |
| 67 | +are complete (`completedTilesPerRow_[row] == subregionWidth_`), the row |
| 68 | +completion callback fires. |
| 69 | + |
| 70 | +Key members: |
| 71 | +- `completedTiles_[]` — per-tile completion flag |
| 72 | +- `completedTilesPerRow_[]` — counter per tile-row |
| 73 | +- `rowCallback_` — fired when an entire row of tiles finishes |
| 74 | +- `heap_` — min-heap for tracking contiguous completion (used by `wait()`) |
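
The per-row counting can be sketched as follows. `RowTracker` is a simplified, hypothetical stand-in for `TileCompletion`: it omits the heap and `wait()`, and it assumes every row holds `tilesPerRow` tiles (the role `subregionWidth_` plays above).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

class RowTracker {
public:
  RowTracker(uint16_t tilesPerRow, uint16_t numRows,
             std::function<void(uint16_t)> rowCallback)
      : tilesPerRow_(tilesPerRow),
        completedTiles_(static_cast<std::size_t>(tilesPerRow) * numRows, false),
        completedTilesPerRow_(numRows, 0),
        rowCallback_(std::move(rowCallback)) {}

  // Marks a tile done and fires the row callback once per completed row.
  void complete(uint16_t tileIndex) {
    const auto row = static_cast<uint16_t>(tileIndex / tilesPerRow_);
    bool rowDone = false;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if (completedTiles_[tileIndex]) return; // ignore duplicate completions
      completedTiles_[tileIndex] = true;
      rowDone = (++completedTilesPerRow_[row] == tilesPerRow_);
    }
    // The lock is released before the callback runs, so the callback can
    // block or take other locks without stalling other completing tiles.
    if (rowDone && rowCallback_) rowCallback_(row);
  }

private:
  const uint16_t tilesPerRow_;
  std::vector<bool> completedTiles_;
  std::vector<uint16_t> completedTilesPerRow_;
  std::function<void(uint16_t)> rowCallback_;
  std::mutex mutex_;
};
```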
| 75 | + |
| 76 | +### pendingBands_ |
| 77 | + |
| 78 | +A `std::unordered_map<uint16_t, BandInfo>` mapping each tile-row index (Y) to its band |
| 79 | +parameters (`yBegin`, `yEnd`, `tileX0`, `numCols`). Because tile-rows may |
| 80 | +complete out of order relative to one another, the row callback inserts the finished row |
| 81 | +into this map and then drains consecutive rows starting at `nextBandTileY_`. |
| 82 | + |
| 83 | +### Back-Pressure Variables |
| 84 | + |
| 85 | +- `nextBandTileY_` — the next tile-row to be drained (composited + written) |
| 86 | +- `bandOrderMutex_` — protects `nextBandTileY_` and `pendingBands_` |
| 87 | +- `bandDrainCV_` — wakes the parser thread when a row is drained |
| 88 | + |
| 89 | +## Lifecycle of a Tile Row |
| 90 | + |
| 91 | +### 1. Parsing and Scheduling |
| 92 | + |
| 93 | +The parser thread (sequential or TLM path) reads tile data from the |
| 94 | +codestream. Before scheduling each tile for decompression, it checks back |
| 95 | +pressure: |
| 96 | + |
| 97 | +```cpp |
| 98 | +// Block if this tile's row is too far ahead of the drained row |
| 99 | +if (ioBandCallback_ && tileCompletion_) { |
| 100 | + uint16_t tileY = tileIndex / numTileCols; |
| 101 | + std::unique_lock<std::mutex> lock(bandOrderMutex_); |
| 102 | + while (!(tileY < nextBandTileY_ + 2 || !success_)) |
| 103 | + bandDrainCV_.wait_for(lock, std::chrono::milliseconds(100)); |
| 104 | +} |
| 105 | +``` |
| 106 | + |
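| 107 | +This lets the parser schedule tiles only in rows `nextBandTileY_` and |
| 108 | +`nextBandTileY_ + 1`, so at most **2 undrained tile-rows** are in flight and memory is bounded to roughly 2 tile-rows of |
| 109 | +decompressed data. The `!success_` term lets the parser stop waiting once `success_` has been cleared. |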
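
The same gate can be isolated into a small class to make both sides of the handshake visible. This is a sketch under assumptions: `BandGate` and its method names are hypothetical, and the loop condition is the De Morgan form of the one shown above.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

class BandGate {
public:
  // Parser side: returns once tileY is inside the two-row window or the
  // decode has failed. The return value is false in the failure case.
  bool waitForRow(uint16_t tileY) {
    std::unique_lock<std::mutex> lock(mutex_);
    while (tileY >= nextBandTileY_ + 2 && success_)
      cv_.wait_for(lock, std::chrono::milliseconds(100));
    return success_;
  }

  // Reports whether waitForRow(tileY) would block right now.
  bool wouldBlock(uint16_t tileY) {
    std::lock_guard<std::mutex> lock(mutex_);
    return tileY >= nextBandTileY_ + 2 && success_;
  }

  // Drain side: called after a tile-row has been composited and written.
  void rowDrained() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++nextBandTileY_;
    }
    cv_.notify_all();
  }

  // Error path: wake any waiter so it can observe the failure.
  void fail() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      success_ = false;
    }
    cv_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable cv_;
  uint16_t nextBandTileY_ = 0;
  bool success_ = true;
};
```

The timed wait means a missed notification delays the parser by at most one timeout period rather than hanging it.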
| 110 | + |
| 111 | +### 2. Decompression (Taskflow) |
| 112 | + |
| 113 | +Each tile is scheduled for decompression through Taskflow: |
| 114 | +T2 (packet parse) → T1 (entropy decode) → inverse DWT → |
| 115 | +`postMultiTile(tileProcessor)` callback. |
| 116 | + |
| 117 | +The `postMultiTile` callback: |
| 118 | +1. Calls `post_decompressT2T1(scratchImage_)` to extract tile data into a |
| 119 | + per-tile image |
| 120 | +2. Increments `numTilesDecompressed_` |
| 121 | +3. Skips the global `scratchImage_->composite()` call (which the non-band path |
| 122 | + uses) since compositing is deferred to the row callback |
| 123 | +4. Calls `tileCompletion_->complete(tileIndex)` |
| 124 | + |
| 125 | +### 3. Row Completion |
| 126 | + |
| 127 | +When `TileCompletion::complete()` detects that all tiles in a row are done, it |
| 128 | +fires the row callback **outside** the TileCompletion lock. |
| 129 | + |
| 130 | +The row callback (a lambda created during `decompressAllTiles()`) does: |
| 131 | + |
| 132 | +1. **Insert into `pendingBands_`** — records the band's Y extents and tile |
| 133 | + column range. |
| 134 | + |
| 135 | +2. **Drain in order** — walks `pendingBands_` starting from `nextBandTileY_`: |
| 136 | + |
| 137 | + a. **Composite** — for each tile in the row, copies its per-tile image into |
| 138 | + `scratchImage_` via `scratchImage_->composite(tileImage)`. This is a |
| 139 | + per-component `memcpy` of each row of each tile into the correct position |
| 140 | + in the strip buffer. |
| 141 | + |
| 142 | + b. **Write band** — calls `ioBandCallback_(yBegin, yEnd, scratchImage_)`. |
| 143 | + The application's callback (`grkWriteBandCallback`) calls |
| 144 | + `imageFormat->writeImageBand(yBegin, yEnd)`, which interleaves the |
| 145 | + planar int32 data into packed bytes (SIMD-accelerated for 8/16-bit) and |
| 146 | + writes one or more TIFF strips to disk via `TIFFWriteEncodedStrip`. |
| 147 | + |
| 148 | + c. **Release tiles** — calls `tileCache_->releaseForSwath(tileIndex)` for |
| 149 | + each tile in the row and `MemoryManager::releaseFreedPages()` to return |
| 150 | + freed pages to the OS, dropping RSS. |
| 151 | + |
| 152 | + d. **Advance strip buffer** — updates `scratchImage_` component `y0` and |
| 153 | + `h` for the next tile row so that future compositing writes land in the |
| 154 | + correct location. |
| 155 | + |
| 156 | + e. **Advance `nextBandTileY_`** and notify `bandDrainCV_` so the parser |
| 157 | + thread can schedule more tiles. |
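
The insert-then-drain logic from steps 1 and 2 can be sketched in isolation. `BandDrainer` is hypothetical, the field types of `BandInfo` are assumptions, and the `writeBand` hook stands in for steps a to d (composite, write, release, advance). For simplicity the lock is held while the hook runs, which the real callback may not do.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <unordered_map>
#include <utility>

// Field names follow the text above; the types are assumed.
struct BandInfo {
  uint32_t yBegin;
  uint32_t yEnd;
  uint16_t tileX0;
  uint16_t numCols;
};

class BandDrainer {
public:
  using WriteBand = std::function<void(uint16_t, const BandInfo&)>;

  explicit BandDrainer(WriteBand writeBand) : writeBand_(std::move(writeBand)) {}

  // Called when a tile-row completes. Rows may arrive in any order.
  void onRowComplete(uint16_t tileY, const BandInfo& band) {
    std::lock_guard<std::mutex> lock(bandOrderMutex_);
    pendingBands_.emplace(tileY, band);
    // Drain every consecutive row that is ready, starting at nextBandTileY_.
    for (auto it = pendingBands_.find(nextBandTileY_); it != pendingBands_.end();
         it = pendingBands_.find(nextBandTileY_)) {
      writeBand_(it->first, it->second);
      pendingBands_.erase(it);
      ++nextBandTileY_;
    }
  }

private:
  WriteBand writeBand_;
  std::mutex bandOrderMutex_;
  std::unordered_map<uint16_t, BandInfo> pendingBands_;
  uint16_t nextBandTileY_ = 0;
};
```

A row that finishes early simply waits in the map until all rows above it have been written, which keeps the output strictly top to bottom.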
| 158 | + |
| 159 | +### 4. Final Cleanup |
| 160 | + |
| 161 | +After `decompressAllTiles()` completes, `postMultiTile()` (the no-arg variant) |
| 162 | +runs. When `ioBandCallback_` is set, it skips the `transferDataTo` and |
| 163 | +`postProcess` steps since data was already incrementally consumed from |
| 164 | +`scratchImage_`. |
| 165 | + |
| 166 | +## Format Writer: TIFFFormat::writeImageBand() |
| 167 | + |
| 168 | +For non-subsampled images (the common path): |
| 169 | + |
| 170 | +1. Creates a `PlanarToInterleaved` interleaver via `InterleaverFactory`. |
| 171 | +2. For each strip-worth of rows (`rows_per_strip`): |
| 172 | + - Allocates a packed buffer from the I/O buffer pool |
| 173 | + - Calls `interleaver_->interleave()` — for 8-bit and 16-bit, this uses |
| 174 | + Highway SIMD (`StoreInterleaved3`/`StoreInterleaved4`) to convert |
| 175 | + planar int32 to packed uint8/uint16 |
| 176 | + - Calls `writeStripCore()` → `TIFFWriteEncodedStrip()` to write to disk |
| 177 | + - Returns the buffer to the pool |
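
For reference, the interleave step is equivalent to the scalar loop below. This is a sketch, not the Highway implementation: `interleave8` is a hypothetical name, and it assumes all planes share the same width and stride and that samples already fit in 8 bits.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts planar int32 samples to packed 8-bit pixels (e.g. RGBRGB...).
// planeStride is the distance, in samples, between successive plane rows.
inline std::vector<uint8_t> interleave8(const std::vector<const int32_t*>& planes,
                                        std::size_t width, std::size_t rows,
                                        std::size_t planeStride) {
  std::vector<uint8_t> packed(width * rows * planes.size());
  std::size_t out = 0;
  for (std::size_t y = 0; y < rows; ++y)
    for (std::size_t x = 0; x < width; ++x)
      for (const int32_t* plane : planes)
        packed[out++] = static_cast<uint8_t>(plane[y * planeStride + x]);
  return packed;
}
```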
| 178 | + |
| 179 | +For subsampled YCbCr images, a scalar loop packs luma and chroma samples |
| 180 | +according to the TIFF YCbCr layout (luma block + Cb + Cr per MCU). |
| 181 | + |
| 182 | +## Memory Behaviour |
| 183 | + |
| 184 | +Without incremental compositing, a 40000×40000 8-bit RGB image requires |
| 185 | +about 4.8 GB (4.5 GiB) for the full composite at one byte per sample. With |
| 186 | +incremental compositing and 256-pixel tile rows, only about 2 tile-rows (roughly |
| 187 | +61 MB on the same basis) are resident at any time. Both figures scale by four if samples are held as int32. |
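
The footprint depends on how many bytes each sample occupies in memory, so it can help to recompute it for a given configuration. The helper below is a hypothetical convenience for that arithmetic and is not part of the library.

```cpp
#include <cstdint>

// Bytes needed to hold `rows` full-width rows of an image.
constexpr uint64_t bufferBytes(uint64_t width, uint64_t rows,
                               uint64_t components, uint64_t bytesPerSample) {
  return width * rows * components * bytesPerSample;
}
```

Calling it with `rows` set to the image height gives the full-composite size, and with `rows` set to twice the tile height it gives the two-tile-row working set.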
| 188 | + |
| 189 | +The `MemoryManager::releaseFreedPages()` call after each row release uses |
| 190 | +`madvise(MADV_DONTNEED)` (Linux) to return freed pages to the OS, ensuring |
| 191 | +RSS tracks the working set rather than the high-water mark. |
| 192 | + |
| 193 | +## Code Paths |
| 194 | + |
| 195 | +| Codestream Type | Function | Back-Pressure | |
| 196 | +|-----------------|----------|---------------| |
| 197 | +| Sequential (no TLM) | `sequentialParseAndSchedule()` | Parser waits on `bandDrainCV_` under `bandOrderMutex_` | |
| 198 | +| TLM, non-batched | `decompressTLM()` | Waits per tile on `bandDrainCV_` under `bandOrderMutex_` | |
| 199 | +| TLM, batched (async) | `scheduleTileBatch()` | Queue depth limit via `batchTileQueueCondition_` | |