Skip to content

Commit 30132b6

Browse files
hyperpolymathclaude
andcommitted
feat(assail): broaden #[cfg(test)] mod strip; recognise .take() as read bound
Two changes, both narrowing false-positive surface across the estate: 1. strip_cfg_test_modules_rs now applies BEFORE every code_only substring check in analyze_rust, not just UnboundedAllocation. The same FP class (`#[test] fn exercises_unsafe_wrapper()` inside a production file's `#[cfg(test)] mod tests { … }` block counting toward that file's unsafe-block total) was latent for unsafe / panic / unwrap / crypto counts — widening the strip closes it. Measured impact on 007-lang scan: UnboundedAllocation: 72 → 4 → 0 PanicPath: 21 → 4 (test-mod unwraps excluded) UncheckedError: 15 → 15 (unchanged — not test-mod) UnsafeCode: 4 → 4 (unchanged — all FFI boundary) InsecureProtocol: 1 → 1 (unchanged) This supersedes the narrow 9eda513 scoping, which used a separate code_no_test_mods local for the unbounded-allocation check only. The pipeline is now: raw content → strip_string_literals_rs (string bodies → ' ') → strip_proof_comments (// + /* */ → whitespace) → strip_cfg_test_modules_rs (#[cfg(test)] mod bodies → whitespace) = code_only (used by every dangerous-pattern check) 2. has_unbounded_allocations now treats `.take(` as a bounded-read marker. `.take(N).read_to_end(&mut buf)` and `.take(N).read_to_string(&mut buf)` are the canonical idioms for bounding a full-file read in Rust; they should disarm the UnboundedAllocation rule exactly as the lexical `limit` token already does. Previously, a file that bounded its reads idiomatically but did not happen to contain the lowercase word "limit" anywhere in its source would still be flagged. The new predicate is: let read_is_bounded = code_only.contains("limit") || code_only.contains(".take("); applied to both read_to_end and read_to_string. Code that relies on `.take()` semantically (which is the idiom std::io teaches) is now correctly recognised as bounded. Tests: 4 new end-to-end tests (186 total; 182 prior + 4 new): - analyze_rust_ignores_unsafe_inside_cfg_test_mod - analyze_rust_ignores_unbounded_keyword_in_cfg_test_mod - analyze_rust_take_disarms_read_to_string - analyze_rust_read_to_string_without_bound_still_fires (sanity) All lib tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent eda004a commit 30132b6

1 file changed

Lines changed: 167 additions & 33 deletions

File tree

src/assail/analyzer.rs

Lines changed: 167 additions & 33 deletions
Original file line numberDiff line numberDiff line change
@@ -795,7 +795,20 @@ impl Analyzer {
795795
// (allocation sites, I/O, threading) still use the raw content
796796
// because those patterns are safe to count in any context.
797797
let without_strings = Self::strip_string_literals_rs(content);
798-
let code_only = strip_proof_comments(&without_strings, "//", Some(("/*", "*/")));
798+
let without_comments =
799+
strip_proof_comments(&without_strings, "//", Some(("/*", "*/")));
800+
// Apply inline-test-mod stripping globally so `#[cfg(test)] mod
801+
// tests { … }` is treated as test context by every substring-based
802+
// dangerous-pattern check below — the Rust analogue of Zig's
803+
// `count_unsafe_in_test_blocks`. See
804+
// `Analyzer::strip_cfg_test_modules_rs` doc-comment for the
805+
// recognised attribute forms. Previously scoped only to the
806+
// unbounded-allocation check; widened 2026-04-17 after the same
807+
// FP class was projected to affect unsafe / panic / unwrap /
808+
// crypto counts too (a `#[test] fn exercises_unsafe_wrapper()`
809+
// inside a production file would otherwise count toward that
810+
// file's unsafe-block total).
811+
let code_only = Self::strip_cfg_test_modules_rs(&without_comments);
799812

800813
stats.unsafe_blocks += code_only.matches("unsafe {").count();
801814
stats.unsafe_blocks += code_only.matches("unsafe fn").count();
@@ -836,45 +849,44 @@ impl Analyzer {
836849
// Count allocations but flag unbounded patterns separately
837850
stats.allocation_sites += vec_new_count + box_new_count + string_new_count;
838851

839-
// Flag unbounded allocation patterns as high-risk.
852+
// Flag unbounded allocation patterns as high-risk. `code_only`
853+
// already has string literals, comments, and `#[cfg(test)] mod
854+
// tests` bodies stripped, so keyword substring checks below do
855+
// not fire on doc comments, generated source strings, or
856+
// test-mod identifiers.
840857
//
841-
// The check runs against `code_no_test_mods` — `code_only` (comments
842-
// + string literals already stripped) with any `#[cfg(test)] mod <x>
843-
// { … }` body also stripped. This way dangerous-word keywords
844-
// embedded in doc comments, string literals, or inline test modules
845-
// (`#[test] fn validate_detects_left_recursion()` inside a production
846-
// source file's `mod tests` block, etc.) do not falsely fire.
858+
// The earlier version also paired bare `for` / `while let` /
859+
// `loop` tokens with `push(` or `Vec::new` as standalone
860+
// heuristics. Those pairs co-occur in essentially every
861+
// non-trivial Rust file (bounded `for x in collection {
862+
// v.push(y) }` is normal code), so they generated ~60 critical
863+
// findings per average repo with no signal. Dropped in favour
864+
// of the explicit-keyword / tiny-capacity / unlimited-read
865+
// signals below, which remain specific enough to be useful.
847866
//
848-
// The earlier version also paired bare `for` / `while let` / `loop`
849-
// tokens with `push(` or `Vec::new` as standalone heuristics. Those
850-
// pairs co-occur in essentially every non-trivial Rust file (bounded
851-
// `for x in collection { v.push(y) }` is normal code), so they
852-
// generated ~60 critical findings per average repo with no signal.
853-
// Dropped in favour of the explicit-keyword / tiny-capacity /
854-
// unlimited-read signals below, which remain specific enough to be
855-
// useful.
856-
let code_no_test_mods = Self::strip_cfg_test_modules_rs(&code_only);
857-
let has_unbounded_allocations = code_no_test_mods.contains("unbounded")
858-
|| code_no_test_mods.contains("no_bound")
859-
|| code_no_test_mods.contains("no_limit")
860-
|| code_no_test_mods.contains("boundless")
861-
|| code_no_test_mods.contains("unlimited")
862-
|| code_no_test_mods.contains("unconstrained")
867+
// `read_to_*` is disarmed by either the lexical `limit` token
868+
// OR `.take(` in the same file — both are valid bounded-read
869+
// patterns; `.take(N).read_to_end(&mut buf)` is the canonical
870+
// idiom.
871+
let read_is_bounded =
872+
code_only.contains("limit") || code_only.contains(".take(");
873+
let has_unbounded_allocations = code_only.contains("unbounded")
874+
|| code_only.contains("no_bound")
875+
|| code_only.contains("no_limit")
876+
|| code_only.contains("boundless")
877+
|| code_only.contains("unlimited")
878+
|| code_only.contains("unconstrained")
863879
// `infinite` matches Rust std `f64::is_infinite()`, which is
864880
// benign. Require the word in a non-method-call context.
865-
|| (code_no_test_mods.contains("infinite")
866-
&& !code_no_test_mods.contains("is_infinite"))
881+
|| (code_only.contains("infinite") && !code_only.contains("is_infinite"))
867882
// Unterminated recursion lacking any depth guard.
868-
|| (code_no_test_mods.contains("recursion")
869-
&& !code_no_test_mods.contains("depth"))
883+
|| (code_only.contains("recursion") && !code_only.contains("depth"))
870884
// Suspiciously small initial capacity for a growing vector.
871-
|| code_no_test_mods.contains("with_capacity(0)")
872-
|| code_no_test_mods.contains("with_capacity(1)")
885+
|| code_only.contains("with_capacity(0)")
886+
|| code_only.contains("with_capacity(1)")
873887
// Network / I/O primitives that slurp without a cap.
874-
|| (code_no_test_mods.contains("read_to_end")
875-
&& !code_no_test_mods.contains("limit"))
876-
|| (code_no_test_mods.contains("read_to_string")
877-
&& !code_no_test_mods.contains("limit"));
888+
|| (code_only.contains("read_to_end") && !read_is_bounded)
889+
|| (code_only.contains("read_to_string") && !read_is_bounded);
878890

879891
if has_unbounded_allocations && !is_test_file {
880892
weak_points.push(WeakPoint {
@@ -6242,4 +6254,126 @@ pub mod tests {
62426254
assert!(out.contains("not(debug_assertions)"));
62436255
assert!(out.contains("test"));
62446256
}
6257+
6258+
// ---------------------------------------------------------------
6259+
// End-to-end: inline #[cfg(test)] mod is test context for *every*
6260+
// substring-based check in analyze_rust, not just unbounded-alloc.
6261+
// ---------------------------------------------------------------
6262+
6263+
fn analyze_rust_file(path: &str, content: &str) -> Vec<WeakPoint> {
6264+
let tmp = TempDir::new().unwrap();
6265+
let file_path = tmp.path().join(path);
6266+
if let Some(parent) = file_path.parent() {
6267+
fs::create_dir_all(parent).unwrap();
6268+
}
6269+
fs::write(&file_path, content).unwrap();
6270+
let analyzer = Analyzer::new(tmp.path()).unwrap();
6271+
let report = analyzer.analyze().unwrap();
6272+
report.weak_points
6273+
}
6274+
6275+
#[test]
6276+
fn analyze_rust_ignores_unsafe_inside_cfg_test_mod() {
6277+
// Production code: zero unsafe blocks. Test code: one unsafe
6278+
// block. The file as a whole should NOT be flagged for
6279+
// UnsafeCode because the unsafe block is test-context.
6280+
let src = "\
6281+
pub fn safe_prod() -> i64 { 1 + 2 }
6282+
6283+
#[cfg(test)]
6284+
mod tests {
6285+
use super::*;
6286+
6287+
#[test]
6288+
fn exercises_raw_pointer() {
6289+
let x: u64 = 42;
6290+
let ptr = &x as *const u64;
6291+
unsafe {
6292+
assert_eq!(*ptr, 42);
6293+
}
6294+
}
6295+
}
6296+
";
6297+
let findings = analyze_rust_file("src/lib.rs", src);
6298+
let unsafe_findings: Vec<_> = findings
6299+
.iter()
6300+
.filter(|wp| wp.category == WeakPointCategory::UnsafeCode)
6301+
.collect();
6302+
assert!(
6303+
unsafe_findings.is_empty(),
6304+
"unsafe inside #[cfg(test)] mod must not count: {:?}",
6305+
unsafe_findings
6306+
);
6307+
}
6308+
6309+
#[test]
6310+
fn analyze_rust_ignores_unbounded_keyword_in_cfg_test_mod() {
6311+
let src = "\
6312+
pub fn prod() -> i64 { 42 }
6313+
6314+
#[cfg(test)]
6315+
mod tests {
6316+
#[test]
6317+
fn choreography_unbounded_loop() {
6318+
assert_eq!(1, 1);
6319+
}
6320+
}
6321+
";
6322+
let findings = analyze_rust_file("src/lib.rs", src);
6323+
let ub_findings: Vec<_> = findings
6324+
.iter()
6325+
.filter(|wp| wp.category == WeakPointCategory::UnboundedAllocation)
6326+
.collect();
6327+
assert!(
6328+
ub_findings.is_empty(),
6329+
"`unbounded` in test-fn identifier must not count: {:?}",
6330+
ub_findings
6331+
);
6332+
}
6333+
6334+
#[test]
6335+
fn analyze_rust_take_disarms_read_to_string() {
6336+
// `.take(N).read_to_string(...)` is a bounded read — must not
6337+
// trigger UnboundedAllocation even without the word `limit`.
6338+
let src = "\
6339+
use std::io::Read;
6340+
6341+
const CAP: u64 = 64 * 1024 * 1024;
6342+
6343+
pub fn load(path: &str) -> std::io::Result<String> {
6344+
let mut buf = String::new();
6345+
std::fs::File::open(path)?.take(CAP).read_to_string(&mut buf)?;
6346+
Ok(buf)
6347+
}
6348+
";
6349+
let findings = analyze_rust_file("src/read.rs", src);
6350+
let ub_findings: Vec<_> = findings
6351+
.iter()
6352+
.filter(|wp| wp.category == WeakPointCategory::UnboundedAllocation)
6353+
.collect();
6354+
assert!(
6355+
ub_findings.is_empty(),
6356+
"`.take(N).read_to_string(...)` is bounded and must not fire: {:?}",
6357+
ub_findings
6358+
);
6359+
}
6360+
6361+
#[test]
6362+
fn analyze_rust_read_to_string_without_bound_still_fires() {
6363+
// Sanity: the rule still catches the genuine pattern.
6364+
let src = "\
6365+
pub fn load(path: &str) -> std::io::Result<String> {
6366+
std::fs::read_to_string(path)
6367+
}
6368+
";
6369+
let findings = analyze_rust_file("src/read.rs", src);
6370+
let ub_findings: Vec<_> = findings
6371+
.iter()
6372+
.filter(|wp| wp.category == WeakPointCategory::UnboundedAllocation)
6373+
.collect();
6374+
assert!(
6375+
!ub_findings.is_empty(),
6376+
"unbounded read_to_string in production must still fire"
6377+
);
6378+
}
62456379
}

0 commit comments

Comments
 (0)