Token-based CRAN code checker (no more regex over raw lines) #21
Merged
check_code_lines() regex-scanned raw lines, producing false positives
that swamped real findings (36 of 36 warnings on chatterbox were noise
or mislabeled). Replaced by check_code.R, built on
utils::getParseData(parse(file, keep.source = TRUE)):
- Comments and string literals are never scanned: COMMENT and STR_CONST
tokens carry no findings, killing 'cat(' inside sprintf'd TorchScript
source and 'T' inside '# (B, T, dim)' shape comments.
- Function checks (print, cat, installed.packages, setwd, set.seed,
options) match whole SYMBOL_FUNCTION_CALL tokens, so torch.cat() is
not cat().
- print()/cat() are allowed inside print.*/format.* S3 methods, found
by walking parent links to the enclosing top-level definition.
- T/F shorthand is skipped when T or F is assigned, a formal, or a for
variable in the same top-level expression (a variable, not a logical).
- setwd()/on.exit() pairing is checked within the enclosing definition
instead of a 5-line window; set.seed() literals are excused by a seed
formal instead of a 20-line lookbehind.
- Unparseable files report one finding instead of erroring the check.
utils joins Imports (base distribution, no external dependency).
Closes #20.
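The call-matching rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in check_code.R: `find_flagged_calls()` and its `fns` argument are hypothetical names, and it parses from `text =` rather than a file so the snippet runs on its own.

```r
# Hypothetical helper (not from the PR): report calls to flagged functions
# using parser tokens instead of regexes over raw lines.
find_flagged_calls <- function(code,
                               fns = c("print", "cat", "installed.packages",
                                       "setwd", "set.seed", "options")) {
  exprs <- parse(text = code, keep.source = TRUE)
  pd <- utils::getParseData(exprs)

  # Only whole function-call tokens are compared against the flagged names.
  # COMMENT and STR_CONST tokens never match, and torch.cat is a single
  # SYMBOL_FUNCTION_CALL token whose text is "torch.cat", not "cat".
  hits <- pd[pd$token == "SYMBOL_FUNCTION_CALL" & pd$text %in% fns, ]
  hits[, c("line1", "col1", "text")]
}

code <- c(
  "x <- torch.cat(list(a, b))   # cat( in a comment",
  "src <- sprintf('cat(%s)', x)",
  "cat('hello')"
)

# Only the real cat() call on line 3 is reported.
find_flagged_calls(code)
```

A regex such as `cat\\(` would have matched all three lines here; the token filter reports only the third.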
man/check_code_lines.Rd was deleted with the function but restored by a stray git checkout before committing; man/get_maintainer_from_desc.Rd had been stale since that function was removed. The render_sections description used literal braces in an Rd macro example, tripping the 'Lost braces' check; reworded. R CMD check: 0 errors, 0 warnings, 0 notes.
Stale-Rd blocker fixed, plus the two older ones since they're the same class of noise this PR exists to kill:
Closes #20.
check_code_lines() regex-scanned raw source lines. On chatterbox that produced 36 warnings from the code checker, of which 30 were false positives. Replaced with a token-based checker built on `utils::getParseData(parse(file, keep.source = TRUE))`, in a new `R/check_code.R` (cran.R was at 783 lines doing three jobs).

How each false-positive class dies:

- `print()`/`cat()` are allowed inside `print.*`/`format.*` methods.
- `cat(` inside string literals / `torch.cat(`: STR_CONST tokens are never scanned, and call checks match whole `SYMBOL_FUNCTION_CALL` tokens, so a dotted name can't match `cat`.
- `T` inside comments: COMMENT tokens are never scanned, so text after `#` produces no findings.
- `T <- x$size(3)` locals: a T/F assigned, declared as a formal, or used as a for-variable in the same top-level expression is treated as a variable, not the logical shorthand. `$T`/`@T` member access is also excluded.

Free wins from the same machinery:

- `setwd()`/`on.exit()` pairing is checked within the enclosing function instead of a ±5-line window.
- `set.seed()` literals are excused by a `seed` formal instead of a 20-line lookbehind.
- An unparseable file reports one finding instead of erroring the whole check.

Verified on chatterbox: 36 → 6 warnings. The 6 survivors are `cat()` calls in `chatterbox_gc_options()` (gc_options.R:66-78), a regular function, not a print method; those are honest findings, exactly what the lint is for.

`utils` joins Imports; it ships with base R, so the no-external-dependencies claim stands.

Tests: 206 pass (33 new in test_check_code.R covering all four classes plus the surviving checks).
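
For reviewers who want to see the T/F rule in action, here is a rough sketch of how it could work on the parse data. It is an approximation, not the implementation in check_code.R: `find_tf_shorthand()` is a hypothetical name, "assigned" is simplified to "the next token is `<-` or `=`", and binding is tracked per name (a bound `T` does not excuse `F`), which is an assumption.

```r
# Hypothetical helper (not from the PR): flag T/F used as logical shorthand,
# skipping names that are bound as variables in the same top-level expression.
find_tf_shorthand <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))

  # Walk parent links up to the top-level ancestor of each token.
  parent_of <- pd$parent
  names(parent_of) <- pd$id
  top_of <- function(id) {
    repeat {
      p <- parent_of[[as.character(id)]]
      if (p <= 0) return(id)  # parent 0 (or negative, for comments) = top level
      id <- p
    }
  }
  pd$top <- vapply(pd$id, top_of, numeric(1))

  # Terminal tokens in source order, comments dropped so neighbours are code.
  tk <- pd[pd$terminal & pd$token != "COMMENT", ]
  n <- nrow(tk)
  prev1 <- c("", tk$token)[seq_len(n)]
  prev2 <- c("", "", tk$token)[seq_len(n)]
  next1 <- c(tk$token, "")[-1]

  is_tf  <- tk$text %in% c("T", "F")
  member <- prev1 %in% c("'$'", "'@'")  # x$T / x@T member access

  # A name is bound if it is a formal, an assignment target, or a for-variable.
  binds <- is_tf & (
    tk$token == "SYMBOL_FORMALS" |
      (tk$token == "SYMBOL" & !member &
         next1 %in% c("LEFT_ASSIGN", "EQ_ASSIGN")) |
      (tk$token == "SYMBOL" & prev1 == "'('" & prev2 == "FOR")
  )
  bound <- unique(paste(tk$top[binds], tk$text[binds]))

  flagged <- is_tf & tk$token == "SYMBOL" & !member &
    !(paste(tk$top, tk$text) %in% bound)
  tk[flagged, c("line1", "col1", "text")]
}

code <- c(
  "f <- function(x) {",
  "  T <- x$size(3)      # local variable named T",
  "  x$view(T, -1)",
  "}",
  "g <- function(x) sum(x, na.rm = T)  # logical shorthand"
)

# Only the T on line 5 is reported; both uses of T inside f() are skipped.
find_tf_shorthand(code)
```

Scoping the exemption to the top-level expression keeps the rule simple: a local `T` in one function does not hide a genuine `na.rm = T` in another.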