refactor: rewrite code-analysis.md across all 5 versions (685→339 lines) #390
Draft
kiyotis wants to merge 32 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d verify (#6)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Runner, scenarios (3), format checker, e2e prompt, and HOW-TO-RUN-CODE-ANALYSIS.md.
42 tests pass. Dry-run exits 0. Complete task #1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 3 scenarios (ca-01, ca-02, ca-03) ran against current code-analysis.md.
DeepEval: answer_correctness=1.0/1.0/1.0, faithfulness=1.0/1.0/1.0, answer_relevancy=0.964/1.0/0.930.
Format check: PASS for all.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ProjectAction.java exists in both nablarch-example-rest and
nablarch-example-web under .lw/nab-official/v6. Without scoping,
find-file.sh may return the wrong file.
- Add optional `when.project_subdir` field to scenario schema: when set,
the runner uses {project_dir}/{project_subdir} as cwd for the claude
invocation, narrowing find-file.sh's search to that sub-project.
- When project_subdir is set, script references in --allowedTools switch
to absolute paths so they resolve correctly from the subdir cwd.
- Add project_subdir to ca-01 scenario (nablarch-example-rest).
- Add 3 tests covering: no-subdir cwd, subdir cwd, absolute paths in tools.
- Document project_subdir in HOW-TO-RUN-CODE-ANALYSIS.md.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
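The `project_subdir` behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the runner's actual code: the function names `resolve_cwd` and `script_reference` and the relative `scripts` directory in the usage lines are hypothetical, chosen only to show how the working directory narrows and why script references become absolute.

```python
from pathlib import Path
from typing import Optional


def resolve_cwd(project_dir: Path, project_subdir: Optional[str] = None) -> Path:
    """Pick the working directory for the claude invocation (hypothetical helper)."""
    if project_subdir:
        # Narrow the search scope to one sub-project, e.g. nablarch-example-rest.
        return project_dir / project_subdir
    return project_dir


def script_reference(
    scripts_dir: Path, script_name: str, project_subdir: Optional[str] = None
) -> str:
    """Return the script path to place in --allowedTools (hypothetical helper)."""
    script = scripts_dir / script_name
    if project_subdir:
        # A relative path would be interpreted from the subdir cwd,
        # so an absolute path is used instead.
        return str(script.resolve())
    return str(script)


if __name__ == "__main__":
    project_dir = Path(".lw/nab-official/v6")
    print(resolve_cwd(project_dir, "nablarch-example-rest"))
    print(script_reference(Path("scripts"), "find-file.sh", "nablarch-example-rest"))
```

With no `project_subdir`, both helpers return their inputs unchanged, which matches the "no-subdir cwd" case listed among the added tests.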
…RY violation
- SE-F1/F2: Factor scripts_dir out unconditionally from skill_dir.resolve(); remove hardcoded nabledge-6 path in non-subdir branch (DRY violation)
- Lang-F1: Add existence check for project_subdir; raise ValueError with clear message when directory does not exist
- Lang-F2A: Extract _setup_skill_dir helper in tests to eliminate 5-line filesystem scaffolding repeated across 3 test methods
- Lang-F2B: Remove dead `import subprocess` from 3 fake_run closures
- QA: Fix resolve() discrepancy in assertion; use skill_dir.resolve()/scripts to match production code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
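As a rough sketch of the SE-F1/F2 and Lang-F1 fixes, assuming a single setup function (the name `prepare_invocation` and its signature are hypothetical, not taken from the PR): `scripts_dir` is derived once from `skill_dir.resolve()` regardless of branch, and a missing `project_subdir` fails early with a `ValueError`.

```python
from pathlib import Path
from typing import Optional, Tuple


def prepare_invocation(
    project_dir: Path, skill_dir: Path, project_subdir: Optional[str] = None
) -> Tuple[Path, Path]:
    """Return (cwd, scripts_dir) for a scenario run (hypothetical helper)."""
    # Derived unconditionally, so neither branch hardcodes a skill path.
    scripts_dir = skill_dir.resolve() / "scripts"

    cwd = project_dir
    if project_subdir:
        cwd = project_dir / project_subdir
        if not cwd.is_dir():
            # Fail with a clear message instead of running in a bad cwd.
            raise ValueError(f"project_subdir does not exist: {cwd}")
    return cwd, scripts_dir
```

Checking the directory before the invocation keeps the failure close to the misconfigured scenario rather than surfacing later as a confusing search result.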
…runs)
Ran all 3 v6 scenarios against current 685-line code-analysis.md.
Results: tools/benchmark/results/20260701-1736-code-analysis-baseline/
Baseline: .rn/refact-code-analysis/baseline.md
Scores:
- ca-01: correctness=0.30, relevancy=0.96, faithfulness=1.00, format=PASS
- ca-02: correctness=1.00, relevancy=0.99, faithfulness=1.00, format=PASS
- ca-03: correctness=1.00, relevancy=0.97, faithfulness=1.00, format=PASS
ca-01 low correctness is baseline behavior; code-analysis.md is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
41 findings: 18 duplicate, 5 conflict, 7 structural, 11 verbose.
Projected savings: ~212 lines → target ~390–410 lines after rewrite.
Top issues:
- D-01/D-03/D-04: duplicate refinement workflow (~35L)
- S-03/D-14-17: Best practices restates in-step rules (~30L)
- V-08: inline Nablarch example duplicates template-guide (~30L)
- C-01: example uses "uses" label which rules forbid
- C-02/C-03: two competing refinement workflow blocks
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Section outline: Steps 0-4 (renumbered) + Error handling.
Mapping: all 41 audit findings → drop/keep/merge/move.
Gaps: G-01 OUTPUT_PATH gate, G-02 working memory prereq, G-03 DATE_PORTION.
Projected: ~395-405 lines (within ≤400 target).
Key structural changes:
- Best practices + Output template sections dropped
- Class/sequence diagram instructions merged into unified blocks
- Example execution moved after Overview
- Step numbers 0-4 instead of unnumbered + Step 0-3
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Rewrote all 5 versions following the approved design (Task #4).
Key changes:
- Steps renumbered 0-4 (confirm target, record time, deps, knowledge, generate)
- Dropped: Best practices section (~30L), Output template section (~14L)
- Dropped: compact duplicate refinement workflow blocks (D-01/03/04)
- Merged: class/sequence diagram instructions into unified blocks (S-04/05)
- Dropped: ObjectMapper inline example; template-guide.md is authoritative (V-08)
- Merged: Step 4.5 construct/verify/write into one continuous operation (C-05)
- Moved: Example execution after Overview (S-06)
- Added: G-01 OUTPUT_PATH gate, G-02 working memory prereq, G-03 DATE_PORTION naming
- Fixed C-01: example label "uses" → "queries"
QA review findings applied:
- Finding 1: read-sections.sh uses full repo-relative path (not bare "scripts/")
- Finding 2: verify string matches actual prefill-template.sh output format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Ran all 3 scenarios against rewritten 339-line code-analysis.md.
Results: tools/benchmark/results/20260703-1758-code-analysis-verify/
Comparison: .rn/refact-code-analysis/verification.md
Scores (verify vs baseline):
- ca-01: correctness 0.30→1.00 (+0.70), relevancy 0.96→0.97, faith 1.00
- ca-02: correctness 1.00→1.00, relevancy 0.99→0.99, faith 1.00
- ca-03: correctness 1.00→1.00, relevancy 0.97→0.92 (−0.05, within variance), faith 1.00
All format checks PASS.
Notable: ca-01 correctness improved dramatically (JAX-RS class correctly identified with rewritten instructions).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
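The verify-vs-baseline comparison above amounts to a per-metric subtraction. A minimal sketch, assuming the scores are held in plain dictionaries (the layout and the function name `score_deltas` are illustrative, not the benchmark harness's real format); the numbers are the ones reported in this commit message:

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]

# Scores copied from the baseline and verify runs reported above.
baseline: Scores = {
    "ca-01": {"correctness": 0.30, "relevancy": 0.96},
    "ca-02": {"correctness": 1.00, "relevancy": 0.99},
    "ca-03": {"correctness": 1.00, "relevancy": 0.97},
}
verify: Scores = {
    "ca-01": {"correctness": 1.00, "relevancy": 0.97},
    "ca-02": {"correctness": 1.00, "relevancy": 0.99},
    "ca-03": {"correctness": 1.00, "relevancy": 0.92},
}


def score_deltas(before: Scores, after: Scores) -> Scores:
    """Return after-minus-before for every scenario and metric, rounded to 2 places."""
    return {
        scenario: {
            metric: round(after[scenario][metric] - value, 2)
            for metric, value in metrics.items()
        }
        for scenario, metrics in before.items()
    }


if __name__ == "__main__":
    for scenario, metrics in score_deltas(baseline, verify).items():
        print(scenario, metrics)
```

Rounding to two decimals keeps floating-point noise out of the reported deltas, which matters when a change such as the ca-03 relevancy drop is being judged against run-to-run variance.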
Approach
The existing code-analysis.md (685 lines per version) had accumulated 41 quality issues: 18 duplicate rules, 5 conflicting instructions, 7 structural problems, and 11 verbose passages. A benchmark harness was built first to capture baseline scores; the file was then audited, redesigned, and rewritten, and the new scores were verified against the baseline.

The rewrite applies a strict single-source-of-truth principle: each rule appears in exactly one location, steps are numbered 0–4 with clear boundaries, and redundant sections (Best Practices, Output Template) were removed because their content was either duplicated elsewhere or belonged in qa.md.
No issue number: this branch (worktree-refact-code-analysis) is a direct refactor branch without a linked issue.
Tasks
Work log: .rn/refact-code-analysis/ (audit.md, design.md, baseline.md, verification.md)
Expert Review
QA review was conducted inline during Task #5. Two findings were fixed before completing the rewrite. No separate review file.
Success Criteria Check
🤖 Generated with Claude Code