
Token-based CRAN code checker (no more regex over raw lines)#21

Merged: TroyHernandez merged 4 commits into main from fix/token-based-code-checker on Jun 12, 2026

@TroyHernandez


Closes #20.

check_code_lines() regex-scanned raw source lines. On chatterbox that produced 36 warnings from the code checker, of which 30 were false positives. Replaced with a token-based checker built on utils::getParseData(parse(file, keep.source = TRUE)), in a new R/check_code.R (cran.R was at 783 lines doing three jobs).
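
To illustrate why matching on parse tokens avoids these false positives, here is a minimal sketch. It is not the code in R/check_code.R; `find_calls()` is a hypothetical helper, and it parses from `text` rather than a file to stay self-contained. It relies only on `utils::getParseData()` labelling called function names as `SYMBOL_FUNCTION_CALL`, while comments and string literals get their own token types and so never match.

```r
# Hypothetical helper (not from the PR): report the lines on which `fn`
# is called, using parse tokens instead of a regex over raw lines.
find_calls <- function(code, fn) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  # Whole-token comparison: "torch.cat" is a different token than "cat".
  hits <- pd[pd$token == "SYMBOL_FUNCTION_CALL" & pd$text == fn, ]
  hits$line1
}

code <- c(
  '# cat("in a comment")',            # COMMENT token, ignored
  'src <- "x = torch.cat([a, b])"',   # STR_CONST token, ignored
  'y <- torch.cat(list(a, b))',       # call to torch.cat, not cat
  'cat("hello")'                      # the only real cat() call
)

find_calls(code, "cat")
```

Nothing is evaluated here, only parsed, so the snippet does not need torch installed.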

How each false-positive class dies:

  1. cat() in print.* methods: function-call checks walk parent links to the enclosing top-level definition and skip print.*/format.* methods.
  2. cat( inside string literals / torch.cat(: STR_CONST tokens are never scanned, and call checks match whole SYMBOL_FUNCTION_CALL tokens, so a dotted name can't match cat.
  3. T/F in comments: COMMENT tokens are never scanned (the old check only skipped lines starting with #).
  4. T <- x$size(3) locals: a T/F assigned, declared as a formal, or used as a for-variable in the same top-level expression is treated as a variable, not the logical shorthand. $T/@T member access is also excluded.
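
The T/F rule in item 4 can be sketched roughly as follows. This is an illustrative approximation, not the PR's implementation: `find_tf_shorthand()` and its internals are hypothetical names, and the scoping is simplified to "declared anywhere in the same top-level expression". `@T` needs no special case in this sketch because the parser labels slot names `SLOT` rather than `SYMBOL`.

```r
# Hypothetical sketch: flag T/F used as logical shorthand, skipping names
# that are assigned, formals, or for-variables in the same top-level expression.
find_tf_shorthand <- function(code) {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  parent_of <- pd$parent
  names(parent_of) <- pd$id
  # Walk parent links up to the enclosing top-level expression.
  top_of <- function(id) {
    repeat {
      p <- parent_of[[as.character(id)]]
      if (p <= 0) return(as.integer(id))
      id <- p
    }
  }
  tok <- pd[pd$terminal & pd$token != "COMMENT", ]
  tok$top <- vapply(tok$id, top_of, integer(1))
  n <- nrow(tok)
  shift <- function(x, k) c(rep("", k), x)[seq_len(n)]  # token k places earlier
  prev1 <- shift(tok$token, 1)
  prev2 <- shift(tok$token, 2)
  next1 <- c(tok$token, "")[seq_len(n) + 1]

  is_tf <- tok$text %in% c("T", "F")
  is_sym <- tok$token == "SYMBOL"
  declares <- is_tf & (
    (is_sym & prev1 != "'$'" & next1 %in% c("LEFT_ASSIGN", "EQ_ASSIGN")) |
      tok$token == "SYMBOL_FORMALS" |
      (is_sym & prev1 == "'('" & prev2 == "FOR")
  )
  key <- paste(tok$top, tok$text)
  suspect <- is_tf & is_sym & prev1 != "'$'" & !(key %in% key[declares])
  tok$line1[suspect]
}

code <- c(
  "f <- function(x) {",
  "  T <- x$size(3)",                 # T is a local variable here
  "  T + 1",
  "}",
  "g <- function(flag = T) flag",     # genuine shorthand for TRUE
  "h <- function(obj) obj$T"          # member access, not the logical
)

find_tf_shorthand(code)
```

Only the default value in `g` should be reported; the `T` in `f` is excused because it is assigned in the same top-level expression.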

Free wins from the same machinery: setwd()/on.exit() pairing is checked within the enclosing function instead of a ±5-line window, set.seed() literals are excused by a seed formal instead of a 20-line lookbehind, and an unparseable file reports one finding instead of erroring the whole check.
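
One plausible way to get the "one finding instead of an error" behaviour is to wrap the parse step in `tryCatch()`. The sketch below assumes that shape; `parse_or_finding()` and the wording of the finding are hypothetical, not taken from the PR.

```r
# Hypothetical sketch: turn a parse failure into a single finding
# rather than letting the error abort the whole check.
parse_or_finding <- function(path) {
  tryCatch(
    list(
      parse_data = utils::getParseData(parse(path, keep.source = TRUE)),
      findings = character(0)
    ),
    error = function(e) {
      list(
        parse_data = NULL,
        findings = paste0(basename(path), ": could not be parsed: ",
                          conditionMessage(e))
      )
    }
  )
}

bad <- tempfile(fileext = ".R")
writeLines("f <- function( {", bad)   # deliberately invalid R
res <- parse_or_finding(bad)
length(res$findings)
```

A caller can then skip the token-level checks for that file whenever `parse_data` is `NULL` and carry on with the remaining files.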

Verified on chatterbox: 36 → 6 warnings. The 6 survivors are cat() calls in chatterbox_gc_options() (gc_options.R:66-78), a regular function, not a print method — those are honest findings, exactly what the lint is for.

utils joins Imports; it ships with base R, so the no-external-dependencies claim stands.

Tests: 206 pass (33 new in test_check_code.R covering all four classes plus the surviving checks).

check_code_lines() regex-scanned raw lines, producing false positives
that swamped real findings (30 of 36 warnings on chatterbox were
noise). Replaced by check_code.R, built on
utils::getParseData(parse(file, keep.source = TRUE)):

- Comments and string literals are never scanned: COMMENT and STR_CONST
  tokens carry no findings, killing 'cat(' inside sprintf'd TorchScript
  source and 'T' inside '# (B, T, dim)' shape comments.
- Function checks (print, cat, installed.packages, setwd, set.seed,
  options) match whole SYMBOL_FUNCTION_CALL tokens, so torch.cat() is
  not cat().
- print()/cat() are allowed inside print.*/format.* S3 methods, found
  by walking parent links to the enclosing top-level definition.
- T/F shorthand is skipped when T or F is assigned, a formal, or a for
  variable in the same top-level expression (a variable, not a logical).
- setwd()/on.exit() pairing is checked within the enclosing definition
  instead of a 5-line window; set.seed() literals are excused by a seed
  formal instead of a 20-line lookbehind.
- Unparseable files report one finding instead of erroring the check.
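
The parent-link walk behind the print.*/format.* exemption might look roughly like this. It is a sketch under assumptions, not the code in check_code.R: `calls_outside_print_methods()` is a hypothetical name, and a top-level definition is recognised only by the simple `name <- ...` or `name = ...` shape.

```r
# Hypothetical sketch: report calls to `fn` unless the enclosing top-level
# definition is named print.* or format.*.
calls_outside_print_methods <- function(code, fn = "cat") {
  pd <- utils::getParseData(parse(text = code, keep.source = TRUE))
  parent_of <- pd$parent
  names(parent_of) <- pd$id
  # Follow parent links until the top-level expression is reached.
  top_of <- function(id) {
    repeat {
      p <- parent_of[[as.character(id)]]
      if (p <= 0) return(as.integer(id))
      id <- p
    }
  }
  tok <- pd[pd$terminal & pd$token != "COMMENT", ]
  tok$top <- vapply(tok$id, top_of, integer(1))

  # A top-level expression counts as a definition when its first two
  # tokens are a symbol followed by an assignment operator.
  def_name <- function(top) {
    first <- tok[tok$top == top, ][1:2, ]
    is_def <- first$token[1] == "SYMBOL" &&
      first$token[2] %in% c("LEFT_ASSIGN", "EQ_ASSIGN")
    if (isTRUE(is_def)) first$text[1] else NA_character_
  }

  calls <- tok[tok$token == "SYMBOL_FUNCTION_CALL" & tok$text == fn, ]
  owner <- vapply(calls$top, def_name, character(1))
  calls$line1[!grepl("^(print|format)\\.", owner)]
}

code <- c(
  "print.widget <- function(x, ...) {",
  '  cat("<widget>")',                 # allowed: inside a print method
  "  invisible(x)",
  "}",
  "report <- function(x) {",
  '  cat("done")',                     # reported: regular function
  "}"
)

calls_outside_print_methods(code)
```

Because the exemption is keyed on the enclosing definition rather than on nearby lines, the same walk could also scope checks such as the setwd()/on.exit() pairing.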

utils joins Imports (base distribution, no external dependency).
Closes #20.

man/check_code_lines.Rd was deleted with the function but restored by a
stray git checkout before committing; man/get_maintainer_from_desc.Rd
had been stale since that function was removed. The render_sections
description used literal braces in an Rd macro example, tripping the
'Lost braces' check; reworded.

R CMD check: 0 errors, 0 warnings, 0 notes.
@TroyHernandez


Stale-Rd blocker fixed, plus the two older ones since they're the same class of noise this PR exists to kill:

  • man/check_code_lines.Rd removed (it was deleted with the function, then resurrected by a stray git checkout man/ before the commit)
  • man/get_maintainer_from_desc.Rd removed (stale since that function was deleted; a sweep of all Rd names against R/ definitions found no others)
  • render_sections doc block reworded to drop the literal braces that tripped 'Lost braces'

Full R CMD check now: 0 errors, 0 warnings, 0 notes. 206 tests pass.

TroyHernandez merged commit 755a4d3 into main on Jun 12, 2026
4 checks passed
TroyHernandez deleted the fix/token-based-code-checker branch on June 12, 2026 at 14:26

Development

Successfully merging this pull request may close these issues.

CRAN checker false positives: regex-scans raw lines instead of parsed tokens
