epub: avoid a fixed-size buffer when copying the content path prefix#715
Open
maxheise wants to merge 1 commit into
In get_uri_to_content() the directory part of the OPF path is copied into a fixed buffer with a hand-written loop that runs until the last '/', with no bound on how much it copies. The string copied is relativepath, the full-path attribute read from the EPUB's META-INF/container.xml, so its length is whatever the file declares.

Two things about the allocation also stand out. The element size is sizeof(gchar*), the size of a pointer (8 bytes on 64-bit), but the buffer holds a string of gchar, i.e. characters, not pointers; the size that matches the data is sizeof(gchar) (1 byte) or simply a byte count, so the '*' is the wrong type for what is stored and the buffer is ~800 bytes only by accident. And 100 is a magic number with no explanation for why it is 100.

Because the copy loop is unbounded, the buffer's correctness rests on those accidents. Normal EPUBs use short OPF paths and are unaffected, but a file with a long enough full-path would copy past the buffer. And if someone corrected the type mismatch, sizeof(gchar*) to sizeof(gchar), the buffer would shrink to 100 bytes and the same loop would write past the end on ordinary paths.

Compute the prefix length as the distance to the last '/' and use g_strndup(), which allocates a buffer sized to the data and NUL-terminates it, so the copy stays in bounds. This also removes the magic number, the wrong-typed sizeof, and the hand-written loop, and produces the same string. The freeing of the result is unchanged.
Hi, and thanks for xreader.
I have been writing a markdown backend for xreader as a hobby, with help
from an AI coding assistant, which flagged this while I was in the EPUB
code. I reviewed it and think it is a small, worthwhile cleanup.
In get_uri_to_content() (backend/epub/epub-document.c), the directory part of the OPF path is copied into a fixed buffer with a hand-written loop that runs until the last '/', with no bound on how much it copies.
The string being copied is relativepath, the full-path attribute read from the EPUB's META-INF/container.xml, so its length is whatever the file declares; nothing here bounds it.
Two things about the allocation also stand out:
- The element size is sizeof(gchar*), the size of a pointer (8 bytes on 64-bit). But the buffer holds a string of gchar, i.e. characters, not pointers. The size that matches the data being copied is sizeof(gchar) (1 byte), or simply a byte count. So the '*' is the wrong type for what is actually stored, and the buffer ends up ~800 bytes only by accident.
- 100 is a magic number: there is no explanation for why it is 100.

Because the copy loop is unbounded, the buffer's correctness rests on
those accidents. Normal EPUBs use short OPF paths, so this is not hit in
everyday use, but since the length comes from the file, a long enough
full-path would copy past the buffer. And if someone corrected the obvious type mismatch, sizeof(gchar*) to sizeof(gchar), the buffer
would shrink to 100 bytes and the same loop would write past the end on
ordinary paths.
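
To make the size reasoning concrete, here is a rough sketch of the arithmetic in plain C. It is not code from xreader: all function names are hypothetical, char stands in for gchar (GLib defines gchar as a typedef of char), and it assumes the loop writes only the characters before the last '/' plus a terminating NUL.

```c
#include <stddef.h>
#include <string.h>

#define ELEMENT_COUNT 100 /* the magic number from the allocation */

/* Bytes the allocation provides as written: 100 pointer-sized elements
 * (800 on a typical 64-bit platform, 400 on a 32-bit one). */
size_t capacity_as_written(void)
{
    return sizeof(char *) * ELEMENT_COUNT;
}

/* Bytes it would provide if the element type were corrected to a character. */
size_t capacity_if_corrected(void)
{
    return sizeof(char) * ELEMENT_COUNT;
}

/* Bytes needed to hold the part of path before the last '/', plus the NUL.
 * Assumption: the slash itself is not copied. */
size_t prefix_bytes_needed(const char *path)
{
    const char *last_slash = strrchr(path, '/');
    size_t prefix_len = last_slash ? (size_t)(last_slash - path) : 0;
    return prefix_len + 1;
}

/* Nonzero if the prefix of path fits in a buffer of the given capacity. */
int prefix_fits(const char *path, size_t capacity)
{
    return prefix_bytes_needed(path) <= capacity;
}
```

Calling prefix_fits() with each capacity shows where a given full-path stops fitting: any prefix of 100 characters or more fails against the corrected size while still fitting in the pointer-sized allocation.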
The fix computes the prefix length as the distance to the last '/' and uses g_strndup().
g_strndup allocates a buffer sized to the prefix length, so the copy stays within bounds regardless of the input. It also removes the magic number, the wrong-typed sizeof, and the hand-written loop, and produces the same string as before.
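
For readers without the diff at hand, the idea can be sketched in plain C roughly as follows. This is not the patch itself: dup_dir_prefix is a hypothetical name, malloc plus memcpy stand in for g_strndup() so the sketch builds without GLib, and using strrchr to find the last '/' as well as returning an empty string when there is none are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from xreader): returns a newly allocated copy of
 * the part of path before its last '/', or an empty string if path contains
 * no '/'. Returns NULL if allocation fails. The caller frees the result. */
char *dup_dir_prefix(const char *path)
{
    const char *last_slash = strrchr(path, '/');
    size_t prefix_len = last_slash ? (size_t)(last_slash - path) : 0;

    /* With GLib this whole allocation and copy is g_strndup(path, prefix_len). */
    char *prefix = malloc(prefix_len + 1);
    if (prefix == NULL)
        return NULL;

    memcpy(prefix, path, prefix_len); /* copies exactly prefix_len bytes */
    prefix[prefix_len] = '\0';        /* always NUL-terminated */
    return prefix;
}
```

Because the buffer is sized from the measured length, there is no fixed capacity left to exceed, however long the declared full-path is.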
Thanks for taking a look.