feat(diag): symbolized JSON crash reports on POSIX (mac/linux)#14

Merged
Joshpolansky merged 5 commits into develop from feat/crash-diagnostics on Jun 23, 2026
@Joshpolansky

Owner

On mac/linux a segfault previously produced only a raw loom-crash-<pid>.txt (async-signal-safe path), so the viewer showed it as "raw" with no frames, while Windows already wrote a structured, symbolized JSON report. Make both platforms produce the same structured report.

  • crash_handler: extract a shared writeStructuredReport() (FaultReport + cpptrace symbolize + JSON) used by both the Windows UEF and the POSIX signal handler. The POSIX handler now writes the async-signal-safe raw .txt FIRST (guaranteed fallback), then attempts the structured .json best-effort. The FaultReport/JSON is fully built in memory then written in one shot, so the file is complete or absent — never partial. g_reporting still guards reentry. Also stamp a timestamp (system_clock/clock_gettime is async-signal-safe) so signal-path reports sort/display correctly.
  • fault_store scanDir: when both loom-crash-<id>.json and .txt exist for one crash, surface only the .json (the .txt is the raw fallback it supersedes).
  • test: DiagFaultStore.JsonSupersedesRawTxtSibling covers both the json-wins-over-txt and txt-only (structured pass failed) cases.

Note: symbol quality on mac still depends on debug info (a Debug build with a .dSYM gives file:line; otherwise function names / addresses) — that's the separate offline-symbolize/symbolsDir track. The report is structured either way.

Windows build + diag tests green; POSIX path validated by CI.
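
The ordering described above (raw .txt first, then a best-effort .json built fully in memory and written in one shot, with a reentry guard) could be sketched roughly as follows. This is not the code from crash_handler.cpp: `reportCrashSketch`, `writeRawSketch`, `writeStructuredSketch`, `g_reportingSketch` and the JSON fields are hypothetical names chosen for illustration, and the symbolize step is left out entirely.

```cpp
#include <atomic>
#include <cstddef>
#include <fstream>
#include <string>

#include <fcntl.h>
#include <unistd.h>

// Hypothetical stand-in for the g_reporting reentry guard mentioned in the PR.
static std::atomic<bool> g_reportingSketch{false};

// Raw fallback: only open/write/close, no allocation inside this function.
static bool writeRawSketch(const char* path, const char* text, std::size_t len) {
    int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    ssize_t n = ::write(fd, text, len);
    ::close(fd);
    return n == static_cast<ssize_t>(len);
}

// Structured pass: build the whole document in memory, then emit it with a
// single stream insertion. The field names are assumptions for this sketch.
static bool writeStructuredSketch(const std::string& path, int sig, long long unixSeconds) {
    const std::string json = "{\"signal\":" + std::to_string(sig) +
                             ",\"timestamp\":" + std::to_string(unixSeconds) + "}";
    std::ofstream f(path, std::ios::binary | std::ios::trunc);
    if (!f) return false;
    f << json;
    return static_cast<bool>(f);
}

// Returns a bitmask: 1 = raw .txt written, 2 = structured .json written,
// 0 = a report was already in progress (reentry refused).
// Note: building the paths here allocates; a real signal handler would
// presumably use path buffers prepared ahead of time.
int reportCrashSketch(const std::string& stem, int sig, long long unixSeconds) {
    if (g_reportingSketch.exchange(true)) return 0;
    static const char kRaw[] = "loom crash (raw fallback)\n";
    int result = 0;
    const std::string rawPath = stem + ".txt";
    if (writeRawSketch(rawPath.c_str(), kRaw, sizeof(kRaw) - 1)) result |= 1;  // fallback first
    if (writeStructuredSketch(stem + ".json", sig, unixSeconds)) result |= 2;   // then best-effort
    return result;
}
```

The guard is never reset in this sketch, on the assumption that the process terminates after reporting; a second call therefore returns 0 without touching the files.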

…ust raw

On mac/linux a segfault previously produced only a raw loom-crash-<pid>.txt
(async-signal-safe path), so the viewer showed it as "raw" with no frames —
while Windows already wrote a structured, symbolized JSON report. Make both
platforms produce the same structured report.

- crash_handler: extract a shared writeStructuredReport() (FaultReport +
  cpptrace symbolize + JSON) used by both the Windows UEF and the POSIX signal
  handler. The POSIX handler now writes the async-signal-safe raw .txt FIRST
  (guaranteed fallback), then attempts the structured .json best-effort. The
  FaultReport/JSON is fully built in memory then written in one shot, so the
  file is complete or absent — never partial. g_reporting still guards reentry.
  Also stamp a timestamp (system_clock/clock_gettime is async-signal-safe) so
  signal-path reports sort/display correctly.
- fault_store scanDir: when both loom-crash-<id>.json and .txt exist for one
  crash, surface only the .json (the .txt is the raw fallback it supersedes).
- test: DiagFaultStore.JsonSupersedesRawTxtSibling covers both the
  json-wins-over-txt and txt-only (structured pass failed) cases.

Note: symbol *quality* on mac still depends on debug info (a Debug build with a
.dSYM gives file:line; otherwise function names / addresses) — that's the
separate offline-symbolize/symbolsDir track. The report is structured either way.

Windows build + diag tests green; POSIX path validated by CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 20, 2026 03:43

Copilot AI left a comment

Contributor

Pull request overview

This PR makes POSIX (macOS/Linux) crashes produce the same structured, symbolized JSON crash reports as Windows, while keeping an async-signal-safe raw .txt fallback. It also updates the fault store to prefer .json when both .json and .txt exist for the same crash, and adds coverage for the precedence behavior.

Changes:

  • Refactors crash reporting into a shared structured JSON writer and extends the POSIX handler to emit raw .txt first, then best-effort .json.
  • Updates FaultStore::scanDir() to suppress raw .txt when a sibling structured .json exists.
  • Adds a unit test to validate .json-supersedes-.txt behavior (and the .txt-only fallback case).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| runtime/src/diag/crash_handler.cpp | Extracts shared structured report writing and updates POSIX handler to write raw fallback then best-effort JSON. |
| runtime/src/diag/fault_store.cpp | Suppresses raw .txt entries when a sibling .json exists for the same crash stem. |
| tests/test_diag.cpp | Adds test verifying that .json supersedes .txt and that .txt is still surfaced when alone. |

Comment on lines +72 to +78
// First pass: note which stems have a structured .json so a sibling .txt
// (the POSIX async-signal-safe fallback) can be suppressed in its favor.
std::set<std::string> jsonStems;
for (const auto& de : std::filesystem::directory_iterator(crashDir_, ec)) {
    if (ec || !de.is_regular_file()) continue;
    if (de.path().extension() == ".json") jsonStems.insert(de.path().stem().string());
}
Comment on lines 42 to 68
@@ -78,23 +68,34 @@ void writeReportWin(FaultKind kind, const char* reason, int code,
if (f) f << json;
Comment on lines +165 to +167
void writeRawReport(int sig, void* const* frames, int n) {
    int fd = ::open(g_reportPathRaw, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return;
Joshpolansky and others added 4 commits June 19, 2026 21:00
…lization

cpptrace symbolizes crash frames from BOTH the runtime and the crashing module,
but the shipped (Release) builds carried no debug info and the release tarball
shipped no symbol files — so a downloaded runtime resolved only raw addresses.
Now optimized builds emit debug info and the symbols ride along next to the
binaries, so crash reports resolve to function/file:line with zero setup.

- cmake/LoomModule.cmake (new): loom_add_module() + loom_target_debug_info() +
  loom_target_dsym(). Installed with the SDK and included by loomConfig, so
  module authors who find_package(loom) build plugins with symbols handled the
  same way the bundled modules are. Defines the LOOM_WITH_DEBUG_INFO option (ON).
- root CMakeLists: for optimized configs (Release/RelWithDebInfo), add MSVC
  /Zi + /DEBUG + /OPT:REF,ICF (PDB, kept lean) or GCC/Clang -g (embedded DWARF
  on Linux; .dSYM on macOS via the helper). Covers loom + loom_runtime + all
  bundled modules with no per-module edits.
- modules: emit a .dSYM per module on macOS (explicit list so SOEM isn't dsym'd).
- runtime: emit loom.dSYM on macOS; install loom.pdb (Windows) / loom.dSYM
  (macOS) next to the binary. Linux DWARF is embedded (unstripped), nothing extra.
- sdk: install LoomModule.cmake into lib/cmake/loom and include it from
  loomConfig so user modules get loom_add_module().
- CI build.yml: stage loom's and each module's symbols into the tarball
  (PDBs / .dSYM bundles), per the "symbols in the runtime tarball" choice.

Verified on Windows Release: loom.pdb + per-module PDBs generated, binaries stay
lean (/OPT), loom.pdb installs to bin/, LoomModule.cmake installs to
lib/cmake/loom. macOS .dSYM + Linux DWARF paths validated by CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… rule)

The macOS build failed at Configure: add_custom_command(TARGET) can only attach
to a target created in the SAME directory, but modules/CMakeLists.txt called
loom_target_dsym() on module targets created in their subdirectories. Windows
passed only because the .dSYM step is APPLE-gated.

.dSYM isn't a build byproduct (unlike the PDB/embedded-DWARF that compilation
already emits) — it needs an explicit dsymutil pass, and only the *distributed*
binary needs it (local macOS dev resolves via the build-tree debug map). So:

- modules/CMakeLists.txt + runtime/CMakeLists.txt: drop the in-repo
  loom_target_dsym() calls and the macOS .dSYM install rule.
- CI build.yml staging: run dsymutil on the staged loom binary and each staged
  module on macOS (the .o debug map is still present in the job), producing
  .dSYM bundles next to the binaries in the tarball.
- LoomModule.cmake keeps loom_target_dsym for loom_add_module(), where it's
  same-directory (the user creates the target in that call) and valid.

Windows Release reconfigure clean; macOS .dSYM path now in CI staging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rame)

POSIX crash reports dropped the actual fault site: the trace jumped from the
signal trampoline straight to the *caller* (the return address of the call into
the crashing function) and degraded into a 0x0 tail. Not a debug-info problem —
the frame was lost at capture time.

Cause: the signal handler captured with backtrace(), which walks the handler's
own stack. Crossing the signal trampoline it can only recover the saved return
address (losing the live faulting PC), and the alternate signal stack
(SA_ONSTACK) is discontinuous from the faulting thread's stack so the
frame-pointer walk can't cross back, yielding the 0x0 tail.

Fix (signal handler only): seed the unwind from the ucontext_t the handler
already receives (SA_SIGINFO). Take the faulting PC + FP, set frame 0 = PC (the
real leaf), then walk the saved frame-record chain ([fp]=caller fp,
[fp+8]=return address). This starts from the faulting thread's real stack,
sidestepping both the trampoline and the alt-stack. Register layouts for
macOS/Linux × arm64/x86_64; falls back to backtrace() elsewhere. Apple arm64e
return addresses are PAC-stripped (ptrauth_strip) so cpptrace can resolve them.

Async-signal-safe: only aligned reads, no allocation; alignment + up-stack-only
checks stop a runaway walk. writeRawReport (.txt) is still written before the
allocating symbolize pass. terminateHandler and the Windows path are unchanged
(they run in a normal call context where backtrace()/CaptureStackBackTrace work).

POSIX-only change; verify on macOS per the crasher repro (frame 0 should be the
module's faulting function at its real source line, no 0x0 tail).
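
For illustration, the frame-record walk described in this message could look roughly like the sketch below. It is a simplified model rather than the handler's code: `walkFrameRecords` is a hypothetical name, the PC and FP are passed in as plain integers instead of being read from a `ucontext_t`, and arm64e PAC stripping is omitted.

```cpp
#include <cstddef>
#include <cstdint>

// Walks a saved frame-record chain where [fp] holds the caller's fp and
// [fp + sizeof(void*)] holds the return address (fp + 8 on 64-bit targets).
// Frame 0 is the faulting PC itself. Uses only aligned reads and no
// allocation; stops on a null or misaligned fp, a null return address, or
// a frame pointer that does not move up the stack.
std::size_t walkFrameRecords(std::uintptr_t pc, std::uintptr_t fp,
                             void** out, std::size_t maxFrames) {
    if (maxFrames == 0) return 0;
    std::size_t n = 0;
    out[n++] = reinterpret_cast<void*>(pc);  // the real leaf frame

    while (n < maxFrames && fp != 0) {
        if (fp % alignof(std::uintptr_t) != 0) break;  // alignment check
        const std::uintptr_t* record = reinterpret_cast<const std::uintptr_t*>(fp);
        const std::uintptr_t callerFp = record[0];
        const std::uintptr_t returnAddress = record[1];
        if (returnAddress == 0) break;
        out[n++] = reinterpret_cast<void*>(returnAddress);
        if (callerFp <= fp) break;  // up-stack only: prevents loops and runaway walks
        fp = callerFp;
    }
    return n;
}
```

On a chain of three records, the walk yields four entries: the faulting PC followed by the three return addresses. A record whose saved fp points at itself ends the walk after its return address is recorded.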

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

macOS's <ucontext.h> hard-errors ("deprecated ucontext routines require
_XOPEN_SOURCE") because it declares the get/setcontext routines. We only need
the mcontext_t TYPES (uc_mcontext->__ss), which <sys/ucontext.h> provides
without those routines. Keep <ucontext.h> on Linux (REG_RIP/REG_RBP via
_GNU_SOURCE). Linux + Windows already compiled; this fixes the darwin build.
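
A minimal sketch of the header split and the register read it enables, under stated assumptions: `FaultRegs` and `readFaultRegs` are hypothetical names, the macOS arm64 branch assumes the SDK's `__darwin_arm_thread_state64_get_pc` / `__darwin_arm_thread_state64_get_fp` accessors are available, and the Linux x86_64 branch relies on `_GNU_SOURCE` (which g++ defines by default on glibc) for `REG_RIP` / `REG_RBP`. The actual code in crash_handler.cpp may differ.

```cpp
#include <csignal>
#include <cstdint>

#if defined(__APPLE__)
#include <sys/ucontext.h>  // mcontext types only, no get/setcontext routines
#else
#include <ucontext.h>      // Linux: REG_RIP / REG_RBP with _GNU_SOURCE
#endif

struct FaultRegs {
    std::uintptr_t pc;  // faulting program counter
    std::uintptr_t fp;  // frame pointer at the fault
    bool valid;         // false where no register layout is known
};

// Reads PC and FP from the third argument of an SA_SIGINFO handler.
// Unknown platforms report valid == false so a caller can fall back
// to another capture method.
FaultRegs readFaultRegs(void* context) {
    FaultRegs regs{0, 0, false};
    if (context == nullptr) return regs;
    ucontext_t* uc = static_cast<ucontext_t*>(context);
#if defined(__APPLE__) && defined(__x86_64__)
    regs.pc = static_cast<std::uintptr_t>(uc->uc_mcontext->__ss.__rip);
    regs.fp = static_cast<std::uintptr_t>(uc->uc_mcontext->__ss.__rbp);
#elif defined(__APPLE__) && defined(__aarch64__)
    // Assumption: accessor macros provided by the macOS SDK headers.
    regs.pc = static_cast<std::uintptr_t>(__darwin_arm_thread_state64_get_pc(uc->uc_mcontext->__ss));
    regs.fp = static_cast<std::uintptr_t>(__darwin_arm_thread_state64_get_fp(uc->uc_mcontext->__ss));
#elif defined(__linux__) && defined(__x86_64__)
    regs.pc = static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
    regs.fp = static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_RBP]);
#elif defined(__linux__) && defined(__aarch64__)
    regs.pc = static_cast<std::uintptr_t>(uc->uc_mcontext.pc);
    regs.fp = static_cast<std::uintptr_t>(uc->uc_mcontext.regs[29]);  // x29 is the frame pointer
#else
    (void)uc;
    return regs;
#endif
    regs.valid = true;
    return regs;
}
```

A handler installed with `sigaction` and `SA_SIGINFO` would pass its third argument straight to `readFaultRegs`.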

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@Joshpolansky Joshpolansky merged commit 18d84ff into develop Jun 23, 2026
10 of 20 checks passed
@Joshpolansky Joshpolansky deleted the feat/crash-diagnostics branch June 23, 2026 03:52

2 participants