Commit b987926

author
Grok Compression
committed
docs: update and reference in code
1 parent bcb40a1 commit b987926

6 files changed

Lines changed: 222 additions & 0 deletions

Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
# Incremental Stripe Compositing

Grok supports incremental, row-at-a-time writing of decompressed JPEG 2000
images to output formats (TIFF, PNM, PNG, JPEG). Instead of decompressing
every tile into a full-resolution composite image and then writing the whole
image at once, tiles are composited and written one tile-row at a time. This
keeps resident memory proportional to a single tile-row rather than the entire
image.

## Overview

```
┌──────────────────────────────────────────────────────────────────────┐
│  Parser Thread                                                       │
│  Reads codestream markers (SOT) and feeds tile data to decompression │
│  Back-pressure: blocks when tile row > nextBandTileY_ + 1            │
└────────────────────┬─────────────────────────────────────────────────┘
                     │ schedule(tileProcessor)
                     ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Taskflow Thread Pool                                                │
│  T2 (packet parsing) → T1 (entropy decode) → DWT → post callback     │
└────────────────────┬─────────────────────────────────────────────────┘
                     │ tileCompletion_->complete(tileIndex)
                     ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Row Completion Callback                                             │
│  Composites tiles into scratchImage_, calls ioBandCallback_,         │
│  releases tile memory, advances strip buffer                         │
└────────────────────┬─────────────────────────────────────────────────┘
                     │ ioBandCallback_(yBegin, yEnd, scratchImage_)
                     ▼
┌──────────────────────────────────────────────────────────────────────┐
│  Format Writer (e.g. TIFFFormat)                                     │
│  writeImageBand(): interleave planar→packed, write strip to disk     │
└──────────────────────────────────────────────────────────────────────┘
```

## When Incremental Writes Are Enabled

The CLI application (`GrkDecompress`) enables incremental writes when **all**
of the following are true:

1. `storeToDisk` is true (not a memory-only decode)
2. Not a single-tile decompress (`!parameters->single_tile_decompress`)
3. Post-processing is a no-op (`grk_image_is_post_process_no_op`), i.e. no
   colour-space conversion, ICC profile application, or precision scaling
4. No windowed vertical crop (`parameters->dw_y1 == 0`)
5. The output format supports incremental band writes
   (`fmt->supportsIncrementalBandWrite()`)

When enabled, the application calls `grk_decompress_set_band_callback()`, which
stores `ioBandCallback_` on the `CodeStreamDecompress` instance.

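The five conditions can be read as one predicate. The sketch below is only an illustration: `IncrementalWriteInputs` and `canWriteIncrementally` are hypothetical names, the field types are assumptions, and each field simply stands in for the corresponding check in the list above.

```cpp
// Hypothetical stand-in for the state GrkDecompress consults.
// Field names mirror the checks in the list; types are assumptions.
struct IncrementalWriteInputs
{
  bool storeToDisk;             // 1. output goes to a file, not memory only
  bool singleTileDecompress;    // 2. parameters->single_tile_decompress
  bool postProcessIsNoOp;       // 3. result of grk_image_is_post_process_no_op
  double dwY1;                  // 4. parameters->dw_y1 (0 means no vertical crop)
  bool formatSupportsBandWrite; // 5. fmt->supportsIncrementalBandWrite()
};

// Returns true only when every condition in the list holds.
inline bool canWriteIncrementally(const IncrementalWriteInputs& in)
{
  return in.storeToDisk && !in.singleTileDecompress && in.postProcessIsNoOp &&
         in.dwY1 == 0 && in.formatSupportsBandWrite;
}
```

Any single failing condition falls back to the full-composite path, which is why a vertical crop or a colour-space conversion alone is enough to disable incremental writes.
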
## Key Data Structures

### scratchImage_

A `GrkImage` that acts as a strip buffer. Its component data arrays are sized
to hold one tile-row of pixels. After each tile-row is written, the component
`y0` and `h` fields are advanced to the next tile-row so that tile compositing
writes into the correct location.

### TileCompletion

Tracks which tiles have finished decompressing. When all tiles in a tile-row
are complete (`completedTilesPerRow_[row] == subregionWidth_`), the row
completion callback fires.

Key members:
- `completedTiles_[]`: per-tile completion flag
- `completedTilesPerRow_[]`: counter per tile-row
- `rowCallback_`: fired when an entire row of tiles finishes
- `heap_`: min-heap for tracking contiguous completion (used by `wait()`)

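As a rough illustration of the counting scheme, here is a minimal sketch. `RowTracker` is a hypothetical class, not Grok's `TileCompletion`: it leaves out the min-heap used by `wait()` and assumes the decoded region spans every tile column, so `tileCols_` plays the role of `subregionWidth_`.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified model of per-row completion tracking.
class RowTracker
{
public:
  RowTracker(uint16_t tileCols, uint16_t tileRows, std::function<void(uint16_t)> rowCallback)
      : tileCols_(tileCols),
        completedTiles_(static_cast<std::size_t>(tileCols) * tileRows, false),
        completedTilesPerRow_(tileRows, 0), rowCallback_(std::move(rowCallback))
  {}

  // Marks one tile as done and fires the callback if its row is now complete.
  void complete(uint16_t tileIndex)
  {
    const auto row = static_cast<uint16_t>(tileIndex / tileCols_);
    bool rowDone = false;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      if(completedTiles_[tileIndex])
        return; // duplicate notification: nothing to do
      completedTiles_[tileIndex] = true;
      rowDone = (++completedTilesPerRow_[row] == tileCols_);
    }
    // Invoked after the lock is released, so slow work in the callback does
    // not stop other tiles from reporting completion.
    if(rowDone && rowCallback_)
      rowCallback_(row);
  }

private:
  std::mutex mutex_;
  uint16_t tileCols_;
  std::vector<bool> completedTiles_;
  std::vector<uint16_t> completedTilesPerRow_;
  std::function<void(uint16_t)> rowCallback_;
};
```

With a 2×2 tile grid, completing tiles 2, 0 and then 3 fires the callback once, for row 1; row 0 fires only after tile 1 arrives.
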
### pendingBands_

An `std::unordered_map<uint16_t, BandInfo>` mapping tile-row Y to its band
parameters (`yBegin`, `yEnd`, `tileX0`, `numCols`). Because tiles in a row
may complete out-of-order relative to other rows, the row callback inserts
into this map and then drains from `nextBandTileY_` forward in order.

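The insert-then-drain pattern can be sketched in isolation. `BandSequencer` and its `emit` callback are hypothetical, the `BandInfo` field types are assumptions, and locking is omitted (the real code guards this state with `bandOrderMutex_`).

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Field names follow the text; the types are assumptions.
struct BandInfo
{
  uint32_t yBegin;
  uint32_t yEnd;
  uint16_t tileX0;
  uint16_t numCols;
};

// Hypothetical helper: accepts rows in any order, emits them in Y order.
class BandSequencer
{
public:
  explicit BandSequencer(std::function<void(uint16_t, const BandInfo&)> emit)
      : emit_(std::move(emit))
  {}

  // Records a finished row, then emits every row that is now contiguous
  // with the last emitted one.
  void rowCompleted(uint16_t tileY, const BandInfo& band)
  {
    pendingBands_.emplace(tileY, band);
    auto it = pendingBands_.find(nextBandTileY_);
    while(it != pendingBands_.end())
    {
      emit_(nextBandTileY_, it->second);
      pendingBands_.erase(it);
      ++nextBandTileY_;
      it = pendingBands_.find(nextBandTileY_);
    }
  }

  uint16_t nextBandTileY() const { return nextBandTileY_; }

private:
  std::function<void(uint16_t, const BandInfo&)> emit_;
  std::unordered_map<uint16_t, BandInfo> pendingBands_;
  uint16_t nextBandTileY_ = 0;
};
```

If row 1 finishes before row 0, nothing is emitted until row 0 arrives, at which point both are emitted back to back. This is what keeps the output strictly top-to-bottom even though decoding is not.
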
### Back-Pressure Variables

- `nextBandTileY_`: the next tile-row to be drained (composited + written)
- `bandOrderMutex_`: protects `nextBandTileY_` and `pendingBands_`
- `bandDrainCV_`: wakes the parser thread when a row is drained

## Lifecycle of a Tile Row

### 1. Parsing and Scheduling

The parser thread (sequential or TLM path) reads tile data from the
codestream. Before scheduling each tile for decompression, it checks back
pressure:

```cpp
// Block if this tile's row is too far ahead of the drained row
if (ioBandCallback_ && tileCompletion_) {
    uint16_t tileY = tileIndex / numTileCols;
    std::unique_lock<std::mutex> lock(bandOrderMutex_);
    // Proceed once tileY is within two rows of nextBandTileY_, or on failure;
    // the timeout makes the loop re-check both conditions periodically.
    while (!(tileY < nextBandTileY_ + 2 || !success_))
        bandDrainCV_.wait_for(lock, std::chrono::milliseconds(100));
}
```

The parser may therefore schedule tiles only from the row currently awaiting
drain (`nextBandTileY_`) and the row immediately after it, so at most
**2 tile-rows** are in flight at once. This bounds memory to roughly 2
tile-rows of decompressed data. Because the wait uses a 100 ms timeout, the
loop re-checks `success_` at least that often, so a failed decode does not
leave the parser blocked.

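The wait loop can be isolated into a small gate object to make the hand-off between the parser and the drain step easier to see. This is a sketch under stated assumptions: `BandGate`, `waitForRow`, `rowDrained` and `fail` are hypothetical names, and `success_` is guarded by the same mutex here, which may differ from the real code.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical, self-contained model of the parser/drain hand-off.
class BandGate
{
public:
  // Parser side: blocks until tileY is at most one row past the next row to
  // drain. Returns false if decoding has failed in the meantime.
  bool waitForRow(uint16_t tileY)
  {
    std::unique_lock<std::mutex> lock(mutex_);
    while(!(tileY < nextBandTileY_ + 2 || !success_))
      drainCV_.wait_for(lock, std::chrono::milliseconds(100));
    return success_;
  }

  // Drain side: call after a row has been composited and written.
  void rowDrained()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      ++nextBandTileY_;
    }
    drainCV_.notify_all();
  }

  // Error path: releases any waiting parser.
  void fail()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      success_ = false;
    }
    drainCV_.notify_all();
  }

private:
  std::mutex mutex_;
  std::condition_variable drainCV_;
  uint16_t nextBandTileY_ = 0;
  bool success_ = true;
};
```

With nothing drained yet, rows 0 and 1 pass immediately and row 2 would block; one call to `rowDrained()` admits row 2.
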
### 2. Decompression (Taskflow)

Each tile is scheduled for decompression through Taskflow:
T2 (packet parse) → T1 (entropy decode) → DWT (wavelet inverse) →
`postMultiTile(tileProcessor)` callback.

The `postMultiTile` callback:
1. Calls `post_decompressT2T1(scratchImage_)` to extract tile data into a
   per-tile image
2. Increments `numTilesDecompressed_`
3. Skips the global `scratchImage_->composite()` call (which the non-band path
   uses) since compositing is deferred to the row callback
4. Calls `tileCompletion_->complete(tileIndex)`

### 3. Row Completion

When `TileCompletion::complete()` detects that all tiles in a row are done, it
fires the row callback **outside** the TileCompletion lock.

The row callback (a lambda created during `decompressAllTiles()`) performs the
following steps:

1. **Insert into `pendingBands_`**: records the band's Y extents and tile
   column range.

2. **Drain in order**: walks `pendingBands_` starting from `nextBandTileY_`:

   a. **Composite**: for each tile in the row, copies its per-tile image into
      `scratchImage_` via `scratchImage_->composite(tileImage)`. This is a
      per-component `memcpy` of each row of each tile into the correct position
      in the strip buffer.

   b. **Write band**: calls `ioBandCallback_(yBegin, yEnd, scratchImage_)`.
      The application's callback (`grkWriteBandCallback`) calls
      `imageFormat->writeImageBand(yBegin, yEnd)`, which, for TIFF output,
      interleaves the planar int32 data into packed bytes (SIMD-accelerated
      for 8/16-bit) and writes one or more TIFF strips to disk via
      `TIFFWriteEncodedStrip`.

   c. **Release tiles**: calls `tileCache_->releaseForSwath(tileIndex)` for
      each tile in the row and `MemoryManager::releaseFreedPages()` to return
      freed pages to the OS, dropping RSS.

   d. **Advance strip buffer**: updates `scratchImage_` component `y0` and
      `h` for the next tile row so that future compositing writes land in the
      correct location.

   e. **Advance `nextBandTileY_`** and notify `bandDrainCV_` so the parser
      thread can schedule more tiles.

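Steps (a) and (d) fit together as follows. This is a minimal single-component sketch, not Grok's `GrkImage::composite()`: `Plane`, `compositeInto` and `advanceStrip` are hypothetical, and real components also carry subsampling, stride and precision.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical single-component view in image coordinates.
struct Plane
{
  uint32_t x0, y0, w, h;     // origin and size
  std::vector<int32_t> data; // at least w * h samples, row-major
};

// Copies every row of `tile` into `strip`. Subtracting strip.y0 lets the same
// code work after the strip has been advanced to a later tile-row.
inline void compositeInto(Plane& strip, const Plane& tile)
{
  assert(tile.y0 >= strip.y0 && tile.y0 + tile.h <= strip.y0 + strip.h);
  assert(tile.x0 >= strip.x0 && tile.x0 + tile.w <= strip.x0 + strip.w);
  for(uint32_t row = 0; row < tile.h; ++row)
  {
    const int32_t* src = tile.data.data() + static_cast<std::size_t>(row) * tile.w;
    int32_t* dst = strip.data.data() +
                   static_cast<std::size_t>(tile.y0 - strip.y0 + row) * strip.w +
                   (tile.x0 - strip.x0);
    std::memcpy(dst, src, static_cast<std::size_t>(tile.w) * sizeof(int32_t));
  }
}

// Re-targets the strip at the next tile-row without reallocating its storage.
inline void advanceStrip(Plane& strip, uint32_t nextY0, uint32_t nextH)
{
  assert(static_cast<std::size_t>(strip.w) * nextH <= strip.data.size());
  strip.y0 = nextY0;
  strip.h = nextH;
}
```

Reusing the same allocation and only moving `y0` is what keeps the strip buffer at one tile-row regardless of image height.
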
### 4. Final Cleanup

After `decompressAllTiles()` completes, `postMultiTile()` (the no-arg variant)
runs. When `ioBandCallback_` is set, it skips the `transferDataTo` and
`postProcess` steps since data was already incrementally consumed from
`scratchImage_`.

## Format Writer: TIFFFormat::writeImageBand()

For non-subsampled images (the common path):

1. Creates a `PlanarToInterleaved` interleaver via `InterleaverFactory`.
2. For each strip-worth of rows (`rows_per_strip`):
   - Allocates a packed buffer from the I/O buffer pool
   - Calls `interleaver_->interleave()`; for 8-bit and 16-bit, this uses
     Highway SIMD (`StoreInterleaved3`/`StoreInterleaved4`) to convert
     planar int32 to packed uint8/uint16
   - Calls `writeStripCore()` → `TIFFWriteEncodedStrip()` to write to disk
   - Returns the buffer to the pool

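The per-strip loop in step 2 amounts to cutting the band into chunks of at most `rows_per_strip` rows. The sketch below assumes `yEnd` is exclusive and that the band starts on a strip boundary; `RowRange` and `splitBandIntoStrips` are hypothetical names, not part of `TIFFFormat`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Half-open row range [yBegin, yEnd).
struct RowRange
{
  uint32_t yBegin;
  uint32_t yEnd;
};

// Splits one band into consecutive strips of at most rowsPerStrip rows.
inline std::vector<RowRange> splitBandIntoStrips(uint32_t yBegin, uint32_t yEnd,
                                                 uint32_t rowsPerStrip)
{
  assert(rowsPerStrip > 0 && yBegin <= yEnd);
  std::vector<RowRange> strips;
  uint32_t y = yBegin;
  while(y < yEnd)
  {
    const uint32_t rows = std::min(rowsPerStrip, yEnd - y); // last strip may be short
    strips.push_back({y, y + rows});
    y += rows;
  }
  return strips;
}
```

Each returned range would correspond to one allocate, interleave, write, release cycle from the list above.
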
For subsampled YCbCr images, a scalar loop packs luma and chroma samples
according to the TIFF YCbCr layout (luma block + Cb + Cr per MCU).

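For reference, the planar-to-packed conversion on the non-subsampled path has the following shape in plain scalar code. This is not the Highway implementation: `interleave8` is a hypothetical function, and it assumes samples are already in the 0 to 255 range.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar reference for 8-bit output. Takes one pointer per component plane
// and produces pixel-interleaved bytes (e.g. RGBRGB... for three planes).
inline std::vector<uint8_t> interleave8(const std::vector<const int32_t*>& planes,
                                        std::size_t numPixels)
{
  std::vector<uint8_t> packed(numPixels * planes.size());
  std::size_t out = 0;
  for(std::size_t i = 0; i < numPixels; ++i)
    for(const int32_t* plane : planes)
      packed[out++] = static_cast<uint8_t>(plane[i]); // assumes value fits in 8 bits
  return packed;
}
```

A SIMD version performs the same narrowing and interleaving on many pixels per instruction, which is where `StoreInterleaved3` and `StoreInterleaved4` come in for three- and four-component images.
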
## Memory Behaviour

Without incremental compositing, a 40000×40000 8-bit RGB image requires
~4.5 GB for the full composite. With incremental compositing and 256-pixel
tile rows, only ~2 tile-rows (~40 MB) of decompressed data are resident at
any time.

The `MemoryManager::releaseFreedPages()` call after each row release uses
`madvise(MADV_DONTNEED)` (Linux) to return freed pages to the OS, ensuring
RSS tracks the working set rather than the high-water mark.

## Code Paths

| Codestream Type | Function | Back-Pressure |
|-----------------|----------|---------------|
| Sequential (no TLM) | `sequentialParseAndSchedule()` | Blocks parser at `bandOrderMutex_` |
| TLM, non-batched | `decompressTLM()` | Blocks per-tile at `bandOrderMutex_` |
| TLM, batched (async) | `scheduleTileBatch()` | Queue depth limit via `batchTileQueueCondition_` |

src/lib/codec/formats/TIFFFormat.h

Lines changed: 6 additions & 0 deletions
@@ -622,6 +622,12 @@ static inline bool writeExifToTiff(TIFF* tif, const uint8_t* exifBuf, uint32_t e
 /* TIFF conversion*/
 void tiffSetErrorAndWarningHandlers(bool verbose);
 
+/**
+ * @class TIFFFormat
+ * @brief TIFF format reader/writer with SIMD-accelerated pixel interleaving.
+ *
+ * @see doc/IncrementalStripeCompositing.md for the incremental band-write pipeline.
+ */
 template<typename T>
 class TIFFFormat : public ImageFormat
 {

src/lib/core/codestream/decompress/CodeStreamDecompress.h

Lines changed: 3 additions & 0 deletions
@@ -28,6 +28,9 @@ namespace grk
 /**
  * @class CodeStreamDecompress
  * @brief Manages decompression
+ *
+ * @see doc/IncrementalStripeCompositing.md for the incremental band-write pipeline.
+ * @see doc/TileCache.md for tile caching and LRU eviction.
  */
 class CodeStreamDecompress final : public CodeStream, public IDecompressor
 {

src/lib/core/stream/fetchers/S3Fetcher.h

Lines changed: 6 additions & 0 deletions
@@ -181,6 +181,12 @@ enum class AWSCredentialSource
   EC2_OR_ECS
 };
 
+/**
+ * @class S3Fetcher
+ * @brief Fetches JPEG 2000 codestream data from S3-compatible object storage.
+ *
+ * @see doc/S3.md for credential chain, environment variables, and configuration.
+ */
 class S3Fetcher : public CurlFetcher
 {
   // Cached credentials shared across all S3Fetcher instances.

src/lib/core/tile_processor/TileCache.h

Lines changed: 2 additions & 0 deletions
@@ -63,6 +63,8 @@ struct TileCacheEntry
  * @brief Caches tile processors so that repeated decompress calls on the same
  * codec can reuse SOT metadata, packet data, and decompressed images.
  *
+ * @see doc/TileCache.md for architecture overview and cache strategies.
+ *
  * ## Tile lifecycle
  *
  * 1. **First encounter (SOT parsed)** — `getTileProcessor()` creates a

src/lib/core/tile_processor/TileCompletion.h

Lines changed: 6 additions & 0 deletions
@@ -31,6 +31,12 @@
 namespace grk
 {
 
+/**
+ * @class TileCompletion
+ * @brief Tracks per-tile completion and fires row callbacks for incremental compositing.
+ *
+ * @see doc/IncrementalStripeCompositing.md for the full incremental write pipeline.
+ */
 class TileCompletion
 {
 public:
