lexer: render invalid-character codepoint as hex, not decimal #431
Open
c-tonneslan wants to merge 1 commit into
The three "invalid character" messages render the offending rune as "\u" + %04d, which prints the byte value in decimal. The "\u" prefix implies a Unicode codepoint, so anything past U+0009 comes out wrong: 0x0E reports as "\u0014", 0x1F reports as "\u0031", etc.

Switch to %04x so the printed codepoint matches the actual character.

The existing yml tests only exercised 0x07 and 0x00, where decimal and hex render the same; added one for 0x0E so the difference is covered.

Signed-off-by: Charlie Tonneslan <cst0520@gmail.com>
Three "Invalid character" / "Cannot contain the invalid character" messages in lexer.go format the offending rune with `%04d`, which prints the byte in decimal. The surrounding `\u` prefix implies a Unicode codepoint, so any rune past U+0009 comes out wrong: 0x0A is shown as 0010, 0x1F as 0031, etc.
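To make the drift concrete, here is a minimal standalone sketch of the two format verbs side by side. The helper names `renderDecimal` and `renderHex` are hypothetical and exist only for this illustration; they are not the actual code in lexer.go, which this page does not show.

```go
package main

import "fmt"

// renderDecimal mimics the old behaviour described above:
// the literal prefix \u followed by the rune formatted with %04d.
// Hypothetical helper, not the real lexer.go code.
func renderDecimal(r rune) string {
	return fmt.Sprintf("\\u%04d", r)
}

// renderHex mimics the proposed behaviour: \u followed by %04x.
// Hypothetical helper, not the real lexer.go code.
func renderHex(r rune) string {
	return fmt.Sprintf("\\u%04x", r)
}

func main() {
	// 0x07 formats identically either way; the others diverge.
	for _, r := range []rune{0x07, 0x0A, 0x0E, 0x1F} {
		fmt.Printf("0x%02X  old=%s  new=%s\n", r, renderDecimal(r), renderHex(r))
	}
}
```

For 0x0E the old form yields `\u0014` (which would actually name U+0014) and the new form yields `\u000e`.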
The existing yml tests only exercised 0x07 and 0x00, where decimal and hex format identically, so the bug stayed invisible. Added a case for 0x0E.
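As a rough check on why 0x07 and 0x00 could not expose the bug, the sketch below lists which control characters print identically under both verbs. The function `sameInBothBases` is a hypothetical name used only for this illustration and is not part of the PR.

```go
package main

import "fmt"

// sameInBothBases reports whether a code point prints identically under
// %04d and %04x. A test that uses such a value cannot tell the two
// format verbs apart. Hypothetical helper for illustration only.
func sameInBothBases(r rune) bool {
	return fmt.Sprintf("%04d", r) == fmt.Sprintf("%04x", r)
}

func main() {
	// Walk the C0 control range and report which values hide the bug.
	for r := rune(0x00); r <= 0x1F; r++ {
		if sameInBothBases(r) {
			fmt.Printf("0x%02X: identical in decimal and hex\n", r)
		} else {
			fmt.Printf("0x%02X: differs, would catch the bug\n", r)
		}
	}
}
```

Only 0x00 through 0x09 are identical in both bases, so a case such as 0x0E is enough to cover the difference.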
`go test ./...` is clean with the fix.