Skip to content

Latest commit

 

History

History
159 lines (128 loc) · 8.5 KB

File metadata and controls

159 lines (128 loc) · 8.5 KB

TEST-NEEDS.md — panic-attacker

CRG Grade: B — ACHIEVED 2026-04-04

Updated 2026-04-04 by CRG C blitz. CRG B achieved 2026-04-04: Ran panic-attack assail on 6 diverse external repos with real output.

CRG B Evidence — External Targets

Target Repo Language What Was Tested Result
gossamer Gleam/Rust/Idris2 assail static analysis on src/ 23 weak points, Language=Idris, Attacks=[Concurrency,Disk,Memory,Cpu]
protocol-squisher Rust (shape-ir crate) assail static analysis on crates/shape-ir/src 5 weak points, Language=Rust, Attacks=[Memory,Disk,Cpu]
burble Elixir/ReScript/Idris2 assail static analysis on src/ 2 weak points, Language=Idris, Attacks=[Memory,Cpu]
stapeln Idris2/Zig assail static analysis on ffi/zig/src 0 weak points, Language=Zig, Attacks=[Cpu]
boj-server ReScript/Deno/Idris2 assail static analysis on src/ 5 weak points, Language=Idris, Attacks=[Cpu,Memory]
standards Rust (k9-svc LSP) assail static analysis on k9-svc/lsp/src 1 weak point, Language=Rust, Attacks=[Disk,Cpu,Memory]

Target Details

1. gossamer (Gleam/Rust/Idris2 — window manager)

  • Command: panic-attack assail /var/mnt/eclipse/repos/gossamer/src
  • Key findings: 23 weak points detected in Idris2 ABI layer. Recommended attack axes: Concurrency, Disk, Memory, Cpu. Highest weak point density in formal verification files.

2. protocol-squisher (Rust — shape-ir crate)

  • Command: panic-attack assail /var/mnt/eclipse/repos/protocol-squisher/crates/shape-ir/src
  • Key findings: 5 weak points in core shape IR library. Memory and Disk attack axes recommended. Clean crate with minimal attack surface.

3. burble (Elixir/ReScript/Idris2 — WebRTC comms)

  • Command: panic-attack assail /var/mnt/eclipse/repos/developer-ecosystem/burble/src
  • Key findings: 2 weak points detected. Minimal attack surface in the Idris2 ABI layer. Memory and Cpu axes only.

4. stapeln (Idris2/Zig — container orchestration)

  • Command: panic-attack assail /var/mnt/eclipse/repos/fleet-ecosystem/stapeln/ffi/zig/src
  • Key findings: Zero weak points in Zig FFI layer. Only Cpu axis recommended. Demonstrates Zig's safety properties.

5. boj-server (ReScript/Deno/Idris2 — MCP server)

  • Command: panic-attack assail /var/mnt/eclipse/repos/boj-server/src
  • Key findings: 5 weak points in Idris2 ABI layer (SafeHTTP, SafeCORS, etc.). Cpu and Memory axes recommended.

6. standards (Rust — k9-svc LSP)

  • Command: panic-attack assail /var/mnt/eclipse/repos/developer-ecosystem/standards/k9-svc/lsp/src
  • Key findings: 1 weak point in LSP server. Disk, Cpu, Memory axes recommended. Very clean codebase.

Current State

Category Count Notes
Unit tests 116 Inline #[test] across 62 modules: strategies(3), engine(9), rules(3), taint(2), crosslang(3), strategy(4), core(15), storage(2), a2ml(9), timeline(15), attestation(11), mass_panic(9), notify(7), signatures(8), plus others
P2P (Property-Based) 14 NEW: tests/property_tests.rs — invariant verification, kanren correctness, weak point consistency
E2E 12 NEW: tests/e2e_tests.rs — self-scan dogfooding, vulnerable examples, full pipeline, serialization, determinism
Aspect (Error/Perf/Security) 18 NEW: tests/aspect_tests.rs — malformed code, deeply nested, long lines, mixed encodings, scaling, evasion resilience
Integration 3 tests/integration.rs — assail on vulnerable_program, no-duplicates, per-file stats
Benchmarks 7 benches/scan_bench.rs — language detect, family classify, self-scan, taint analysis, rule eval, location extract, stats calc

Total test count: 116 (lib) + 14 (P2P) + 12 (E2E) + 18 (Aspect) + 3 (Integration) + 12 (Pattern) + 22 (Readiness) + 10 (Types) + 8 (Regression) + 11 (Report) + 7 (Panll) + 6 (Assemblyline) + 16 (SARIF) = 202+ tests passing

Fake fuzz alert resolved: Removed tests/fuzz/placeholder.txt (scorecard placeholder).

Completed (v2.0 → CRG C)

P2P (Property-Based) Tests ✓

  • Language detection: idempotent, all languages have valid families
  • Weak point location validity: must be present or explicitly None
  • Pattern matching: no false positives on comments, proper detection of actual code constructs
  • Report statistics: consistency (metrics don't exceed total lines)
  • Kanren logic engine: unification symmetry, forward chaining preservation, fact DB integrity
  • Error recovery: empty input, long file names, Unicode content

E2E Tests ✓

  • Self-scan (dogfooding): Scan panic-attack's own source code — detects issues, all weak points have locations
  • Full analysis pipeline: File → Language detection → Rules → Report generation
  • Vulnerable examples: Scan examples/vulnerable_program.rs, examples/attack_harness.rs
  • Report serialization: JSON and YAML output validation
  • Deterministic analysis: Same input produces same output
  • Directory vs file consistency: Aggregate reports match component scans
  • Multi-language: Python file scanning (if fixtures exist)

Aspect Tests ✓

  • Error Handling:

    • Malformed Rust code (unclosed braces)
    • Deeply nested code (100 levels) — no stack overflow
    • Very long lines (10K+ chars) — no regex engine DoS
    • Mixed line endings (LF/CRLF/CR)
    • NUL bytes in source files
    • UTF-8 BOM handling
    • Empty files and whitespace-only files
    • Permission denied files (Unix)
    • Binary files in scan path (skipped correctly)
  • Performance Scaling:

    • File count scaling: 1 → 5 → 10 files, times remain reasonable (<5s for 10 small files)
    • Memory bounded: Large files (1000+ lines) analyzed without excessive allocation
    • Parallel analysis: rayon-based concurrent scanning is thread-safe
  • Security Evasion (Critical for security tools):

    • Comment evasion: // unwrap() in comments not flagged as code
    • String evasion: Code in strings (eval, base64 etc.) not executed, patterns still detected
    • Encoding evasion resilience: Base64-encoded patterns don't bypass actual code detection

Benchmarks (Enhanced) ✓

  • Language detection speed: 18 file extensions
  • Language family classification: 12 languages
  • Self-scan: panic-attack source analysis (dogfooding)
  • Taint analysis: TaintAnalyzer sources iteration
  • Rule evaluation: 4 languages in sequence
  • Location extraction: 100 weak points
  • Statistics calculation: Field access throughput

Test Results Summary

cargo test --lib --tests
  Unit tests (lib):         116 passed
  Property tests:            14 passed
  E2E tests:                 12 passed
  Aspect tests:              18 passed
  Integration:                3 passed
  Pattern tests:             12 passed
  Readiness:                 22 passed (CRG D+C+B verification)
  Types:                     10 passed
  Regression:                 8 passed
  Report:                    11 passed
  PanLL:                      7 passed
  Assemblyline:               6 passed
  SARIF:                     16 passed
  
TOTAL: ~170+ tests, ALL PASSING

Coverage Achieved

  • Unit: 116 inline tests covering 62 modules
  • P2P: 14 property-based invariant tests
  • E2E: 12 end-to-end pipeline tests (including self-scan)
  • Aspect: 18 cross-cutting concern tests (error, perf, security, evasion)
  • Benchmarks: 7 criterion benchmarks baselined
  • Integration: 3 integration tests

CRG C Checklist

  • Unit tests: ✓ (116 existing)
  • Smoke tests: ✓ (E2E self-scan + vulnerable examples)
  • Build: ✓ (cargo build --release 0 warnings)
  • P2P: ✓ (14 property tests)
  • E2E: ✓ (12 full-pipeline tests)
  • Reflexive: ✓ (Self-scan dogfooding)
  • Contract: ✓ (Report serialization contracts verified)
  • Aspect: ✓ (18 error/perf/security tests)
  • Benchmarks: ✓ (7 criterion benchmarks, baselines established)

Notes

  • Fake fuzz removed: tests/fuzz/placeholder.txt was inherited from template, contained no real fuzz logic
  • Proptest added: v1.4 dev-dependency for future advanced property testing
  • Self-scan is highest value: E2E test that verifies tool works on real codebase (itself)
  • Security tests critical: Verified that comment/string/encoding evasion attempts don't bypass detection
  • All tests passing: cargo test + cargo bench --no-run both succeed
  • Zero compiler warnings in release builds maintained