Fix: Resolved deterministic memory leak and dangling pointer in SQLParser::tokenize#262
Fix: Resolved deterministic memory leak and dangling pointer in SQLParser::tokenize#262RageLiu wants to merge 7 commits intohyrise:mainfrom
Conversation
|
|
||
| if (token == SQL_IDENTIFIER || token == SQL_STRING) { | ||
| free(yylval.sval); | ||
| yylval.sval = nullptr; |
There was a problem hiding this comment.
Do we need to care about the dangling pointer when we overwrite sval anyways in hsql_lex?
There was a problem hiding this comment.
Can you also add a test that would fail with sanitizers and without the patch?
There was a problem hiding this comment.
Do we need to care about the dangling pointer when we overwrite
svalanyways inhsql_lex?
That’s a fair point, but while hsql_lex does overwrite yylval for strings or identifiers, explicitly nullifying the pointer remains essential for several reasons. First, since yylval is a union, the sval member is typically not modified when the lexer returns tokens that don't require string values, such as semicolons or operators, meaning the stale, freed address stays in memory . Because the yylval structure is reused throughout the loop, this dangling pointer introduces a significant risk of a Double-Free if subsequent logic or future code changes attempt to release sval again while it still holds the old address.
There was a problem hiding this comment.
Can you also add a test that would fail with sanitizers and without the patch?
Done! I have added the regression test to test/sql_parser.cpp.
The test uses a sequence of consecutive identifiers to ensure that the memory is correctly managed and that yylval.sval is properly nullified after being freed. I have verified locally that this test fails with a LeakSanitizer error without my patch and passes successfully with the fix applied.
Please let me know if there are any other adjustments needed!
There was a problem hiding this comment.
Thank you! I noticed we do not run sanitizer builds in the CI. To get such warnings automatically and to verify the PR works as intended, yould you please add sanitizer builds with clang (on Ubuntu and macOS) to the CI workflow that run the tests?
Bouncner
left a comment
There was a problem hiding this comment.
Thanks for the pull request!
.github/workflows/ci.yml
Outdated
|
|
||
| steps: | ||
| - name: Checkout | ||
| uses: actions/checkout@v4 |
There was a problem hiding this comment.
Not yours, but can you please update the action to version 6? There are several deprecation warning.
There was a problem hiding this comment.
Not yours, but can you please update the action to version 6? There are several deprecation warning.
Sure! I've updated actions/checkout to v6 and temporarily removed the fix as requested to verify the sanitizer. I will restore the fix once we see the CI failing.
There was a problem hiding this comment.
Not yours, but can you please update the action to version 6? There are several deprecation warning.
My apologies—the previous CI run failed due to a missing newline at the end of src/SQLParser.cpp, which triggered a compiler error (-Wnewline-eof).
I have fixed the formatting while keeping the logic fix removed as you requested. Could you please approve the workflow run again? This should now correctly show the Sanitizer findings.
|
|
||
| } | ||
|
|
||
| token = hsql_lex(&yylval, &yylloc, scanner); |
There was a problem hiding this comment.
Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?
There was a problem hiding this comment.
Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?
Done! I have temporarily removed the fix as requested to verify the sanitizer. The workflow is now awaiting approval to run. Please approve the CI whenever you're ready.
There was a problem hiding this comment.
Last sanatizer run was all good. Can you -- just temporarily -- remove your fix to check if it correctly caught by the sanatizer?
The leaks were correctly caught by the Ubuntu Sanitizers/Valgrind (as expected, since LSan is more robust on Linux), confirming the bug's presence.
|
The experiment was a success! As shown in the latest CI run, the regression test correctly triggered a memory leak detection (9 bytes definitely lost) in all Ubuntu environments when the fix was removed. This confirms that the CI and the new test cases are effectively guarding against this bug. I have now re-applied the fix. |
Problem
The current implementation of the
SQLParser::tokenizeloop contains a logic error regarding memory management:whileblock. This causes the pointer to the first token (if it is aSQL_IDENTIFIERorSQL_STRING) to be overwritten and lost before it can be checked or freed.free(yylval.sval), the pointer is not set tonullptr. This stale address remains in the reusedyylvalstructure, leading to potential Double-Free or Use-After-Free (UAF) risks in subsequent lexer calls.Solution
This PR applies a minimal-change fix by reordering the operations within the
tokenizeloop:hsql_lexcall is moved to the end of the loop. This ensures that the current token (including the first one) is fully processed and its memory is safely released before the next token is fetched.yylval.sval = nullptr;immediately afterfree()to eliminate dangling pointers.while (token != 0)structure to keep the diff as clean as possible.Verification
LeakSanitizer (LSan): Confirmed that the previously detected 11-byte leak per SQL statement is now fully resolved.
AddressSanitizer (ASan): No memory corruption or illegal access detected during stress testing with consecutive identifier tokens.