Skip to content

Commit 6f18caf

Browse files
hyperpolymathclaude
andcommitted
fix(assail): strip comments before counting unsafe/FFI patterns
The Rust and Zig analysers previously substring-matched dangerous-pattern keywords (`unsafe {`, `unsafe fn`, `@cImport`, `@ptrCast`, `@intToPtr`, `@ptrToInt`) directly against the raw source text. This fired false positives against files whose only mentions of those patterns were inside comments — typically meta-tests documenting a no-unsafe contract or architectural headers describing what the file does. Two motivating cases from 007: - `crates/oo7-core/tests/aspect_tests.rs` — a meta-test whose stated purpose is literally "the absence of `unsafe` in this file IS the assertion" was flagged as containing 1 unsafe block because seven comments discussed the word "unsafe". - `compiler-core/build.zig` — a Zig build script flagged as "2 C interop imports" because two comment lines (file header + section divider) described the file's role using the string `@cImport`, even though the file contained zero real invocations. Fix: - Rust analyser: after the existing `strip_string_literals_rs` pass, run `strip_proof_comments(_, "//", Some(("/*", "*/")))` to also remove line + block comments before the match counting. String literals go first so that `//` or `/*` embedded in a string doesn't get treated as a comment start. - Zig analyser: introduce a shared `strip_simple_double_quoted_strings` helper (handles C-style `"..."` with `\` escapes — simpler than Rust's raw-string / byte-string / char-literal variants) and use it before `strip_proof_comments(_, "//", None)` (Zig has no block comments). Apply to the `@cImport`, `@ptrCast`, `@intToPtr`, `@ptrToInt` counts. The `count_unsafe_in_test_blocks` and test-only-helper detection paths still walk raw content — changing them would require threading the stripped content into line-walking state-machines and would not fix any observable false-positive (both paths feed into a subtraction that cancels out in the canonical cases). Verified: post-fix scan on 007-lang clears both false positives while preserving every legitimate UnsafeCode / UnsafeFFI finding in zig_bridge.rs and jit_compiler.rs. Classification register updated in 007-lang audit doc (audits/audit-ffi-unsafe.md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a22367d commit 6f18caf

1 file changed

Lines changed: 71 additions & 11 deletions

File tree

src/assail/analyzer.rs

Lines changed: 71 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -516,7 +516,7 @@ impl Analyzer {
516516
| Language::Ephapax
517517
| Language::BetLang
518518
| Language::ErrorLang
519-
| Language::VQL
519+
| Language::VCL
520520
| Language::FBQL => {
521521
self.analyze_nextgen_dsl(
522522
&content,
@@ -781,12 +781,21 @@ impl Analyzer {
781781
|| file_path.contains("_mock.") // Mock files
782782
|| file_path.contains("_stub."); // Stub files
783783

784-
// Strip string literal contents before counting so that detection-tool
785-
// source files (which embed patterns as string literals) do not trigger
786-
// their own rules. Stats that measure code structure rather than
787-
// dangerous patterns (allocation sites, I/O, threading) still use the
788-
// raw content because those patterns are safe to count in any context.
789-
let code_only = Self::strip_string_literals_rs(content);
784+
// Strip string literal contents AND comments before counting so that
785+
// detection-tool source files (which embed patterns as string literals)
786+
// do not trigger their own rules, and so that rule names quoted in
787+
// `// ...` or `/* ... */` doc comments — meta-tests, audit prose,
788+
// architectural headers — do not falsely fire. See
789+
// `007-lang/audits/audit-ffi-unsafe.md` §3 for the motivating case
790+
// (a meta-test whose stated purpose is "the absence of `unsafe` in
791+
// this file IS the assertion" was flagged as containing 1 unsafe
792+
// block because comments discussed the word "unsafe").
793+
//
794+
// Stats that measure code structure rather than dangerous patterns
795+
// (allocation sites, I/O, threading) still use the raw content
796+
// because those patterns are safe to count in any context.
797+
let without_strings = Self::strip_string_literals_rs(content);
798+
let code_only = strip_proof_comments(&without_strings, "//", Some(("/*", "*/")));
790799

791800
stats.unsafe_blocks += code_only.matches("unsafe {").count();
792801
stats.unsafe_blocks += code_only.matches("unsafe fn").count();
@@ -3296,10 +3305,24 @@ impl Analyzer {
32963305
}
32973306
}
32983307

3308+
// Strip string literals and // line comments before counting built-in
3309+
// unsafe-ops, so that mentions of `@cImport`, `@ptrCast`, `@intToPtr`,
3310+
// `@ptrToInt` in doc comments or prose (file headers, architectural
3311+
// notes, build-script commentary) do not trigger false-positive
3312+
// findings. See `007-lang/audits/audit-ffi-unsafe.md` §4 for the
3313+
// motivating case (a build.zig whose only `@cImport` occurrences
3314+
// were in comment text describing the file's role was flagged as
3315+
// "2 C interop imports"). Zig has no block comments, so only `//`
3316+
// line comments need stripping.
3317+
let code_only_zig = {
3318+
let without_strings = strip_simple_double_quoted_strings(content);
3319+
strip_proof_comments(&without_strings, "//", None)
3320+
};
3321+
32993322
// Unsafe pointer operations (excluding those in test blocks and test-only helpers)
3300-
let ptr_ops = content.matches("@intToPtr").count()
3301-
+ content.matches("@ptrToInt").count()
3302-
+ content.matches("@ptrCast").count()
3323+
let ptr_ops = code_only_zig.matches("@intToPtr").count()
3324+
+ code_only_zig.matches("@ptrToInt").count()
3325+
+ code_only_zig.matches("@ptrCast").count()
33033326
- test_ptr_ops
33043327
- helper_ptr_ops;
33053328
stats.unsafe_blocks += ptr_ops;
@@ -3318,7 +3341,7 @@ impl Analyzer {
33183341
}
33193342

33203343
// C interop (excluding those in test blocks and test-only helpers)
3321-
let c_import = content.matches("@cImport").count() - test_c_imports - helper_c_imports;
3344+
let c_import = code_only_zig.matches("@cImport").count() - test_c_imports - helper_c_imports;
33223345
stats.unsafe_blocks += c_import;
33233346

33243347
if c_import > 0 {
@@ -4874,6 +4897,43 @@ fn count_line_pattern(content: &str, pattern: &str) -> usize {
48744897
/// nested comments, or escapes. That is acceptable here because we only need
48754898
/// the comment regions removed; mis-handling a string literal that happens to
48764899
/// contain `believe_me` would still be a true positive.
4900+
/// Strip `"..."` double-quoted string literals, honouring `\` escape
4901+
/// sequences. Preserves the opening and closing `"` so that later passes can
4902+
/// still see the delimiters, but replaces the contents with the empty string
4903+
/// (i.e. `"foo"` becomes `""`). Used by analyzers whose target language has
4904+
/// simple C-style string syntax (Zig, C, JavaScript, most brace languages).
4905+
/// For Rust's richer literal syntax (raw strings, byte strings, char
4906+
/// literals, lifetimes) use [`Analyzer::strip_string_literals_rs`] instead.
4907+
fn strip_simple_double_quoted_strings(content: &str) -> String {
4908+
let mut out = String::with_capacity(content.len());
4909+
let chars: Vec<char> = content.chars().collect();
4910+
let n = chars.len();
4911+
let mut i = 0;
4912+
while i < n {
4913+
if chars[i] == '"' {
4914+
out.push('"');
4915+
i += 1;
4916+
while i < n {
4917+
if chars[i] == '\\' && i + 1 < n {
4918+
i += 2; // skip escape sequence
4919+
} else if chars[i] == '"' {
4920+
break;
4921+
} else {
4922+
i += 1; // skip string content
4923+
}
4924+
}
4925+
if i < n {
4926+
out.push('"');
4927+
i += 1;
4928+
}
4929+
} else {
4930+
out.push(chars[i]);
4931+
i += 1;
4932+
}
4933+
}
4934+
out
4935+
}
4936+
48774937
fn strip_proof_comments(content: &str, line_marker: &str, block: Option<(&str, &str)>) -> String {
48784938
// First strip block comments greedily, left-to-right.
48794939
let mut without_blocks = String::with_capacity(content.len());

0 commit comments

Comments
 (0)