feat(diag): symbolized JSON crash reports on POSIX (mac/linux) #14
Merged
…ust raw

On mac/linux a segfault previously produced only a raw loom-crash-<pid>.txt (async-signal-safe path), so the viewer showed it as "raw" with no frames, while Windows already wrote a structured, symbolized JSON report. Make both platforms produce the same structured report.

- crash_handler: extract a shared writeStructuredReport() (FaultReport + cpptrace symbolize + JSON) used by both the Windows UEF and the POSIX signal handler. The POSIX handler now writes the async-signal-safe raw .txt FIRST (guaranteed fallback), then attempts the structured .json best-effort. The FaultReport/JSON is fully built in memory then written in one shot, so the file is complete or absent, never partial. g_reporting still guards reentry. Also stamp a timestamp (system_clock/clock_gettime is async-signal-safe) so signal-path reports sort/display correctly.
- fault_store scanDir: when both loom-crash-<id>.json and .txt exist for one crash, surface only the .json (the .txt is the raw fallback it supersedes).
- test: DiagFaultStore.JsonSupersedesRawTxtSibling covers both the json-wins-over-txt and txt-only (structured pass failed) cases.

Note: symbol *quality* on mac still depends on debug info (a Debug build with a .dSYM gives file:line; otherwise function names / addresses); that's the separate offline-symbolize/symbolsDir track. The report is structured either way.

Windows build + diag tests green; POSIX path validated by CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
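A minimal sketch of the "build in memory, write in one shot, guard reentry" step described above. It is not the PR's writeStructuredReport(): FaultReportSketch, toJson(), g_reportingSketch and writeStructuredOnce() are hypothetical names, the JSON has only two illustrative fields, and symbolization is left out. In the real handler the raw .txt is written before this step runs.

```cpp
#include <atomic>
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical stand-in for the real FaultReport; fields are illustrative only.
struct FaultReportSketch {
    int signal = 0;
    long long timestampSec = 0;
};

// Build the whole JSON document as a string before any file is opened.
std::string toJson(const FaultReportSketch& r) {
    return "{\"signal\":" + std::to_string(r.signal) +
           ",\"timestamp\":" + std::to_string(r.timestampSec) + "}";
}

// Reentry guard, playing the role the commit message assigns to g_reporting.
std::atomic<bool> g_reportingSketch{false};

// Returns true only if the structured report was written.
// This path allocates, so it is best-effort and runs after the raw fallback.
bool writeStructuredOnce(const std::string& path, const FaultReportSketch& r) {
    assert(!path.empty());
    if (g_reportingSketch.exchange(true)) return false;  // a report is already in flight

    const std::string json = toJson(r);  // fully built before any I/O
    bool ok = false;
    {
        std::ofstream f(path, std::ios::binary | std::ios::trunc);
        if (f) {
            f << json;  // single write of the finished document
            f.flush();
            ok = f.good();
        }
    }
    // A real crash handler never returns to normal execution; the flag is
    // cleared here only so the sketch can be called more than once.
    g_reportingSketch.store(false);
    return ok;
}
```

If building the report fails or crashes, no .json has been opened yet, so the raw .txt remains the only artifact. A stricter "complete or absent" guarantee would usually write to a temporary file and rename it; that is an alternative technique, not something the PR states it does.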
Pull request overview
This PR makes POSIX (macOS/Linux) crashes produce the same structured, symbolized JSON crash reports as Windows, while keeping an async-signal-safe raw .txt fallback. It also updates the fault store to prefer .json when both .json and .txt exist for the same crash, and adds coverage for the precedence behavior.
Changes:
- Refactors crash reporting into a shared structured JSON writer and extends the POSIX handler to emit raw `.txt` first, then best-effort `.json`.
- Updates `FaultStore::scanDir()` to suppress raw `.txt` when a sibling structured `.json` exists.
- Adds a unit test to validate `.json`-supersedes-`.txt` behavior (and the `.txt`-only fallback case).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| runtime/src/diag/crash_handler.cpp | Extracts shared structured report writing and updates POSIX handler to write raw fallback then best-effort JSON. |
| runtime/src/diag/fault_store.cpp | Suppresses raw .txt entries when a sibling .json exists for the same crash stem. |
| tests/test_diag.cpp | Adds test verifying that .json supersedes .txt and that .txt is still surfaced when alone. |
Comment on lines +72 to +78

    // First pass: note which stems have a structured .json so a sibling .txt
    // (the POSIX async-signal-safe fallback) can be suppressed in its favor.
    std::set<std::string> jsonStems;
    for (const auto& de : std::filesystem::directory_iterator(crashDir_, ec)) {
        if (ec || !de.is_regular_file()) continue;
        if (de.path().extension() == ".json") jsonStems.insert(de.path().stem().string());
    }
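For context, the two-pass precedence rule can be sketched as a standalone function. This is an assumption-laden illustration, not the PR's code: listCrashReports() and demoJsonSupersedesTxt() are hypothetical names, the real FaultStore::scanDir() is a member function whose return type is not shown in this PR, and the sketch returns plain sorted paths.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <set>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical free function; the real scanDir() may return richer entries.
std::vector<fs::path> listCrashReports(const fs::path& crashDir) {
    std::error_code ec;  // a missing directory yields an empty iteration, not a throw

    // First pass: remember every stem that has a structured .json report.
    std::set<std::string> jsonStems;
    for (const auto& de : fs::directory_iterator(crashDir, ec)) {
        std::error_code fileEc;
        if (!de.is_regular_file(fileEc)) continue;
        if (de.path().extension() == ".json")
            jsonStems.insert(de.path().stem().string());
    }

    // Second pass: keep every .json, and a .txt only when no sibling .json exists.
    std::vector<fs::path> reports;
    for (const auto& de : fs::directory_iterator(crashDir, ec)) {
        std::error_code fileEc;
        if (!de.is_regular_file(fileEc)) continue;
        const fs::path ext = de.path().extension();
        const bool isJson = (ext == ".json");
        const bool isTxt = (ext == ".txt");
        if (!isJson && !isTxt) continue;
        if (isTxt && jsonStems.count(de.path().stem().string()) != 0) continue;
        reports.push_back(de.path());
    }
    std::sort(reports.begin(), reports.end());  // deterministic order for callers
    return reports;
}

// Usage: one crash with both files, one crash with only the raw fallback.
void demoJsonSupersedesTxt() {
    const fs::path dir = fs::temp_directory_path() / "loom_scan_sketch";
    fs::remove_all(dir);
    fs::create_directories(dir);
    for (const char* name : {"loom-crash-1.json", "loom-crash-1.txt", "loom-crash-2.txt"})
        std::ofstream(dir / name) << "x";

    const auto reports = listCrashReports(dir);
    assert(reports.size() == 2);
    assert(reports[0].filename() == "loom-crash-1.json");  // json wins over its txt sibling
    assert(reports[1].filename() == "loom-crash-2.txt");   // txt-only crash is still surfaced
    fs::remove_all(dir);
}
```

The demo mirrors the two cases the new unit test is described as covering: json-wins-over-txt and txt-only.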
Comment on lines 42 to 68

    @@ -78,23 +68,34 @@ void writeReportWin(FaultKind kind, const char* reason, int code,
    if (f) f << json;
Comment on lines +165 to +167

    void writeRawReport(int sig, void* const* frames, int n) {
        int fd = ::open(g_reportPathRaw, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return;
…lization

cpptrace symbolizes crash frames from BOTH the runtime and the crashing module, but the shipped (Release) builds carried no debug info and the release tarball shipped no symbol files, so a downloaded runtime resolved only raw addresses. Now optimized builds emit debug info and the symbols ride along next to the binaries, so crash reports resolve to function/file:line with zero setup.

- cmake/LoomModule.cmake (new): loom_add_module() + loom_target_debug_info() + loom_target_dsym(). Installed with the SDK and included by loomConfig, so module authors who find_package(loom) build plugins with symbols handled the same way the bundled modules are. Defines the LOOM_WITH_DEBUG_INFO option (ON).
- root CMakeLists: for optimized configs (Release/RelWithDebInfo), add MSVC /Zi + /DEBUG + /OPT:REF,ICF (PDB, kept lean) or GCC/Clang -g (embedded DWARF on Linux; .dSYM on macOS via the helper). Covers loom + loom_runtime + all bundled modules with no per-module edits.
- modules: emit a .dSYM per module on macOS (explicit list so SOEM isn't dsym'd).
- runtime: emit loom.dSYM on macOS; install loom.pdb (Windows) / loom.dSYM (macOS) next to the binary. Linux DWARF is embedded (unstripped), nothing extra.
- sdk: install LoomModule.cmake into lib/cmake/loom and include it from loomConfig so user modules get loom_add_module().
- CI build.yml: stage loom's and each module's symbols into the tarball (PDBs / .dSYM bundles), per the "symbols in the runtime tarball" choice.

Verified on Windows Release: loom.pdb + per-module PDBs generated, binaries stay lean (/OPT), loom.pdb installs to bin/, LoomModule.cmake installs to lib/cmake/loom. macOS .dSYM + Linux DWARF paths validated by CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… rule)

The macOS build failed at Configure: add_custom_command(TARGET) can only attach to a target created in the SAME directory, but modules/CMakeLists.txt called loom_target_dsym() on module targets created in their subdirectories. Windows passed only because the .dSYM step is APPLE-gated.

.dSYM isn't a build byproduct (unlike the PDB/embedded-DWARF that compilation already emits); it needs an explicit dsymutil pass, and only the *distributed* binary needs it (local macOS dev resolves via the build-tree debug map). So:

- modules/CMakeLists.txt + runtime/CMakeLists.txt: drop the in-repo loom_target_dsym() calls and the macOS .dSYM install rule.
- CI build.yml staging: run dsymutil on the staged loom binary and each staged module on macOS (the .o debug map is still present in the job), producing .dSYM bundles next to the binaries in the tarball.
- LoomModule.cmake keeps loom_target_dsym for loom_add_module(), where it's same-directory (the user creates the target in that call) and valid.

Windows Release reconfigure clean; macOS .dSYM path now in CI staging.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rame)

POSIX crash reports dropped the actual fault site: the trace jumped from the signal trampoline straight to the *caller* (the return address of the call into the crashing function) and degraded into a 0x0 tail. Not a debug-info problem: the frame was lost at capture time.

Cause: the signal handler captured with backtrace(), which walks the handler's own stack. Crossing the signal trampoline it can only recover the saved return address (losing the live faulting PC), and the alternate signal stack (SA_ONSTACK) is discontinuous from the faulting thread's stack so the frame-pointer walk can't cross back, yielding the 0x0 tail.

Fix (signal handler only): seed the unwind from the ucontext_t the handler already receives (SA_SIGINFO). Take the faulting PC + FP, set frame 0 = PC (the real leaf), then walk the saved frame-record chain ([fp]=caller fp, [fp+8]=return address). This starts from the faulting thread's real stack, sidestepping both the trampoline and the alt-stack. Register layouts for macOS/Linux × arm64/x86_64; falls back to backtrace() elsewhere. Apple arm64e return addresses are PAC-stripped (ptrauth_strip) so cpptrace can resolve them. Async-signal-safe: only aligned reads, no allocation; alignment + up-stack-only checks stop a runaway walk. writeRawReport (.txt) is still written before the allocating symbolize pass.

terminateHandler and the Windows path are unchanged (they run in a normal call context where backtrace()/CaptureStackBackTrace work).

POSIX-only change; verify on macOS per the crasher repro (frame 0 should be the module's faulting function at its real source line, no 0x0 tail).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
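The frame-record walk described in the fix can be sketched in isolation. This is a hedged illustration rather than the handler's actual code: walkFrameRecords() and demoWalk() are hypothetical names, the PC and FP are passed in as plain integers instead of being read from ucontext_t, PAC stripping is omitted, and the only guards are the alignment and up-stack-only checks the commit message names. The return-address slot is taken as the word after the saved FP, which is [fp+8] on 64-bit targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Walk a frame-pointer chain seeded from the faulting PC and FP.
// Assumed layout: [fp] = caller's fp, [fp + sizeof(void*)] = return address.
// No allocation and no library calls, only aligned word reads.
std::size_t walkFrameRecords(std::uintptr_t pc, std::uintptr_t fp,
                             std::uintptr_t* out, std::size_t maxFrames) {
    if (out == nullptr || maxFrames == 0) return 0;
    std::size_t n = 0;
    out[n++] = pc;  // frame 0 is the faulting instruction itself (the real leaf)

    while (n < maxFrames && fp != 0) {
        if (fp % alignof(std::uintptr_t) != 0) break;  // misaligned record: stop
        const auto* record = reinterpret_cast<const std::uintptr_t*>(fp);
        const std::uintptr_t callerFp = record[0];
        const std::uintptr_t returnAddr = record[1];
        if (returnAddr == 0) break;                    // end of chain
        out[n++] = returnAddr;
        if (callerFp <= fp) break;                     // must move strictly up-stack
        fp = callerFp;
    }
    return n;
}

// Usage: a synthetic three-record "stack" laid out at ascending addresses.
void demoWalk() {
    std::uintptr_t stack[6] = {};
    stack[0] = reinterpret_cast<std::uintptr_t>(&stack[2]);  // record A -> B
    stack[1] = 0x1111;
    stack[2] = reinterpret_cast<std::uintptr_t>(&stack[4]);  // record B -> C
    stack[3] = 0x2222;
    stack[4] = 0;                                            // record C ends the chain
    stack[5] = 0x3333;

    std::uintptr_t frames[8] = {};
    const std::size_t n = walkFrameRecords(
        0xAAAA, reinterpret_cast<std::uintptr_t>(&stack[0]), frames, 8);
    assert(n == 4);
    assert(frames[0] == 0xAAAA && frames[1] == 0x1111);
    assert(frames[2] == 0x2222 && frames[3] == 0x3333);
}
```

The up-stack-only check also stops a self-referencing record after one step, so a corrupted chain cannot loop forever. A production walker would additionally need to ensure each record lies in readable stack memory before dereferencing it; the sketch does not attempt that.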
macOS's <ucontext.h> hard-errors ("deprecated ucontext routines require
_XOPEN_SOURCE") because it declares the get/setcontext routines. We only need
the mcontext_t TYPES (uc_mcontext->__ss), which <sys/ucontext.h> provides
without those routines. Keep <ucontext.h> on Linux (REG_RIP/REG_RBP via
_GNU_SOURCE). Linux + Windows already compiled; this fixes the darwin build.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>