Matroska ASS/SSA dialogue chunks can be zlib-compressed and are passed to libass without inflation #84

@rubinaboobin3-cell

Hi, I found an issue with embedded ASS/SSA subtitles in some MKV fansub releases.

Problem

AssMatroskaExtractor forwards Matroska ASS dialogue chunks directly to AssHandler.readTrackDialogue(...).

For some MKV files, the dialogue payload that follows the Media3-generated prefix (Dialogue: <duration>,) is zlib-compressed. When those bytes are passed directly to libass, it receives compressed binary data instead of the actual ASS event fields.

This causes symptoms like:

  • subtitles not appearing
  • style names becoming binary/garbled
  • libass style lookups failing
  • logs showing unreadable style names instead of Default, TS, etc.

Evidence

The affected samples have ASS dialogue payloads beginning with a valid zlib header, for example:

78 da ...

After inflating that payload, the data becomes a normal ASS event body, for example:

0,0,Default,,0,0,0,,Dialogue text here

Without inflation, libass tries to parse the compressed bytes as ASS fields, so the style field becomes garbage.
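To make the evidence easy to reproduce without a sample file, here is a minimal round-trip sketch using only java.util.zip. The helper names (zlibCompress, zlibInflate, headerHex) are mine and are not part of the project or of Media3. The compression level is an assumption chosen so that the header matches the 78 da bytes seen in the affected samples; other levels produce a different second byte.

```kotlin
import java.io.ByteArrayOutputStream
import java.util.zip.Deflater
import java.util.zip.Inflater

// Hypothetical helper: wrap bytes in a zlib stream (RFC 1950).
fun zlibCompress(input: ByteArray, level: Int = Deflater.BEST_COMPRESSION): ByteArray {
    val deflater = Deflater(level)
    try {
        deflater.setInput(input)
        deflater.finish()
        val out = ByteArrayOutputStream()
        val buffer = ByteArray(1024)
        while (!deflater.finished()) {
            val n = deflater.deflate(buffer)
            out.write(buffer, 0, n)
        }
        return out.toByteArray()
    } finally {
        deflater.end()
    }
}

// Hypothetical helper: inflate a complete zlib stream.
fun zlibInflate(input: ByteArray): ByteArray {
    val inflater = Inflater()
    try {
        inflater.setInput(input)
        val out = ByteArrayOutputStream()
        val buffer = ByteArray(1024)
        while (!inflater.finished()) {
            val n = inflater.inflate(buffer)
            // No progress means truncated input or a preset dictionary; stop.
            if (n == 0 && !inflater.finished()) break
            out.write(buffer, 0, n)
        }
        return out.toByteArray()
    } finally {
        inflater.end()
    }
}

// Render the first two bytes as lowercase hex, e.g. for logging.
fun headerHex(data: ByteArray): String =
    data.take(2).joinToString(" ") { "%02x".format(it.toInt() and 0xFF) }
```

Calling headerHex(zlibCompress("0,0,Default,,0,0,0,,Dialogue text here".toByteArray())) gives the header bytes, and zlibInflate on the same compressed array returns the original event body. Feeding the compressed array to the ASS parser instead is what produces the garbled style names described above.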

Where it happens

In the Media3 Matroska path, subtitleSample contains something like:

Dialogue: <duration>,<matroska-ass-event-payload>

For normal files, <matroska-ass-event-payload> is plain ASS event data and works.

For affected files, <matroska-ass-event-payload> is zlib-compressed and needs to be inflated before calling:

assHandler.readTrackDialogue(...)

Local workaround

I fixed this locally by copying the dialogue slice, checking for a zlib header, inflating if needed, and passing the inflated bytes to readTrackDialogue.

Pseudo-code:

val rawDialogue = sampleData.copyOfRange(dialogueOffset, sampleEnd)
val dialogue = maybeInflateDialogue(rawDialogue)

assHandler.readTrackDialogue(
    trackId = trackId,
    start = timeUs / 1000,
    duration = durationUs / 1000,
    data = dialogue,
    offset = 0,
    length = dialogue.size
)

With:

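import java.io.ByteArrayOutputStream
import java.util.zip.DataFormatException
import java.util.zip.Inflater

// Returns the inflated payload, or the original bytes when the data is not
// zlib or cannot be fully inflated.
private fun maybeInflateDialogue(data: ByteArray): ByteArray {
    if (!looksLikeZlib(data)) return data

    val inflater = Inflater()
    return try {
        inflater.setInput(data)
        val output = ByteArrayOutputStream(data.size * 4)
        val buffer = ByteArray(4096)

        while (!inflater.finished()) {
            val count = inflater.inflate(buffer)
            // No output means the stream is truncated or needs a preset
            // dictionary; stop instead of looping forever.
            if (count == 0) break
            output.write(buffer, 0, count)
        }

        // Only use the result if the stream ended cleanly; a partial result
        // would hand libass a truncated event.
        if (inflater.finished() && output.size() > 0) output.toByteArray() else data
    } catch (_: DataFormatException) {
        data
    } finally {
        inflater.end()
    }
}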

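private fun looksLikeZlib(data: ByteArray): Boolean {
    if (data.size < 2) return false
    val cmf = data[0].toInt() and 0xFF
    val flg = data[1].toInt() and 0xFF
    // RFC 1950: the compression method must be 8 (deflate) and the two
    // header bytes, read as a 16-bit big-endian value, must be a multiple of 31.
    return (cmf and 0x0F) == 8 && ((cmf shl 8) + flg) % 31 == 0
}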
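One edge case worth noting: an uncompressed event whose ReadOrder field starts with 80 begins with the bytes 38 30, which also satisfy both conditions above (method 8, and 0x3830 is divisible by 31). The fallback in maybeInflateDialogue should still return the original bytes in that case, but the header check can reject it up front by also looking at the FDICT and CINFO fields from RFC 1950. The following is a sketch of that stricter check; isPlausibleZlibHeader is a name I made up, and it assumes the affected files never use a preset dictionary.

```kotlin
// Hypothetical stricter variant of the zlib header check (RFC 1950).
fun isPlausibleZlibHeader(data: ByteArray): Boolean {
    if (data.size < 2) return false
    val cmf = data[0].toInt() and 0xFF
    val flg = data[1].toInt() and 0xFF

    val method = cmf and 0x0F                // CM: 8 means deflate
    val windowInfo = cmf shr 4               // CINFO: values above 7 are invalid
    val hasPresetDict = (flg and 0x20) != 0  // FDICT: preset dictionary required
    val checkOk = ((cmf shl 8) or flg) % 31 == 0

    // Assumption: a Matroska subtitle payload never needs a preset dictionary,
    // so a set FDICT bit is treated as "not zlib".
    return method == 8 && windowInfo <= 7 && !hasPresetDict && checkOk
}
```

With this variant, the common headers 78 01, 78 9c and 78 da are accepted, while plain-text payloads such as 0,0,Default,... and 80,0,Default,... are rejected before an Inflater is ever created.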

Expected behavior

AssMatroskaExtractor should detect and inflate zlib-compressed Matroska ASS dialogue payloads before passing them to libass.

Notes

This is separate from fontconfig/font availability. In my case libass was running, fonts were being loaded, and the issue was specifically that the ASS event payload reaching libass was compressed binary data.
