Skip to content

Implement Win32 registry shim and enhance PIL integration#174

Merged
EmJayGee merged 1 commit into
mainfrom
micgrier/mwin32-registry
Jun 19, 2026
Merged

Implement Win32 registry shim and enhance PIL integration#174
EmJayGee merged 1 commit into
mainfrom
micgrier/mwin32-registry

Conversation

@EmJayGee

Copy link
Copy Markdown
Collaborator

This pull request introduces several improvements and fixes across the build system, CI workflows, and codebase. The most significant changes include the addition of a new GitHub Actions workflow for building and packaging the mwin32 SDK, updates to CMake presets for better cross-architecture support, a more comprehensive sample registry log, and code cleanups. Below are the most important changes grouped by theme:

CI/CD and Build System Enhancements

  • Added a new GitHub Actions workflow .github/workflows/mwin32-sdk.yml to automate building, testing, merging, and packaging the mwin32 SDK for both x64 and ARM64 architectures, and to attach the resulting artifact to GitHub Releases.
  • Updated CMakePresets.json:
    • Fixed the typo in the preset name from clang-aarchh64 to clang-aarch64 and updated all references. [1] [2] [3]
    • Added VCPKG_TARGET_TRIPLET cache variable to relevant presets to ensure consistent vcpkg triplet usage. [1] [2]
  • Updated cmake/README.md to explain why the vcpkg toolchain file is specified in both the toolchainFile property and as a cache variable in CMakePresets.json, improving documentation for future maintainers.

Codebase and Test Improvements

  • Changed the sample registry log in log1.xml to a more comprehensive and realistic registry tree structure, replacing the previous procedural log format.
  • Added the mwin32 subdirectory to the libraries CMake build, ensuring the mwin32 component is included in the build process.
  • Modified the errors library CMake file to only add the test subdirectory if BUILD_TESTING is enabled, preventing unnecessary test builds.

Code Cleanup and Maintenance

  • In src/Windows/libraries/multi_byte/include/m/multi_byte/to_span.h, provided an inline implementation for the to_span function and removed its previous redundant declaration, simplifying the code. [1] [2]
  • Removed a workaround for a suspected MSVC LTCG bug in CMakeLists.txt by eliminating the /GL flag replacement.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

@EmJayGee

Copy link
Copy Markdown
Collaborator Author

@copilot, can you review just src\libraries\pil please?

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown

@copilot, can you review just src\libraries\pil please?

I reviewed src/libraries/pil and found 4 issues worth flagging:

  • src/libraries/pil/src/direct/Platforms/windows/win32_filesystem.cpp:735SetFilePointer(... static_cast<LONG>(written) ...) overflows for writes >2 GiB. That can make the truncate step fail after WriteFile already succeeded, leaving stale tail data behind.
  • src/libraries/pil/src/intercepting/intercepting_webcore.cpp:2105 and :1985-1987activate() heap-allocates interception_context, but teardown only nulls g_active_context; the context is leaked.
  • src/libraries/pil/src/intercepting/intercepting_webcore.cpp:822-823 and :1933-1942 — synthetic file handles allocated by the CreateFileW hook are never released because there is no CloseHandle hook.
  • src/libraries/pil/src/intercepting/intercepting_webcore.cpp:1589-1650 — synthetic request bodies are read from s_synthetic_request_bodies, but that map is never populated when requests are dequeued, so synthetic POST/PUT bodies are effectively dropped.

If you want, I can take a pass at fixing those next.

@EmJayGee

Copy link
Copy Markdown
Collaborator Author

@copilot, can you review src\libraries\pil again?

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown

@copilot, can you review src\libraries\pil again?

Took another pass over src/libraries/pil (focused on the changes in 74270e9 and the surrounding interception code). The four previously flagged issues are fixed. Five new things worth flagging:

  1. src/libraries/pil/src/intercepting/intercepting_webcore.cpp:43g_active_context is thread_local, but it is only set on the thread that calls webcore::activate (g_active_context = ctx.get() at line 2152). hwebcore.dll runs almost everything (HTTP request servicing, async callbacks, worker pool tasks) on its own threads. On those threads g_active_context == nullptr, so every hook short-circuits to the original Win32 function and the interception is silently bypassed. The comment at lines 477–481 claims thread-local is fine because "only one context is active at a time" — that argument is about count, not about which thread the hooks fire on. This needs to be a plain global (with appropriate synchronization for swap-in/swap-out) or a per-thread propagation scheme, otherwise the entire decorator is effectively a no-op outside the activation thread.

  2. src/libraries/pil/src/intercepting/intercepting_webcore.cpp:824hook_CreateFileW returns a synthetic handle (e.g. 0x80000100+) from allocate_file_handle, but no ReadFile / WriteFile / GetFileSize(Ex) / SetFilePointer(Ex) / GetFileType / FlushFileBuffers / SetEndOfFile / LockFile* hooks exist. Anything the engine does with the returned handle beyond eventually closing it will hit kernel32 with a bogus HANDLE and fail ERROR_INVALID_HANDLE. The handle is essentially write-only-to-a-map. Either also hook the I/O surface (and route through ifile) or fall through to original_CreateFileW for cases we cannot fully service.

  3. src/libraries/pil/src/intercepting/intercepting_webcore.cpp:1123-1140hook_CloseHandle is now spliced into every KERNEL32!CloseHandle import in hwebcore.dll. That is one of the hottest Win32 calls in a server (every event, mutex, thread, file mapping, registry handle, …). On every call we take file_handle_mutex and do an unordered_map probe just to discover the handle isn't ours. Consider either (a) a lock-free fast-path that checks the handle value against the synthetic range (>= 0x80000100 && < next_file_handle_value) before locking, or (b) reader/writer locking, so unrelated CloseHandle traffic isn't serialised through one mutex.

  4. src/libraries/pil/src/intercepting/intercepting_webcore.cpp:1556-1606 — the body-stash fix from 74270e9 is correct but mostly unreachable in practice: marshal_synthetic_request (line 1493) only writes the base HTTP_REQUEST struct — it sets Verb, Version, and RequestId, but never marshals headers (no Content-Length), URL, cooked URL, raw URL, or unknown headers. With no Content-Length the engine has nothing telling it there is a body to read, so it will never call HttpReceiveRequestEntityBody, so the freshly added s_synthetic_request_bodies entry just sits there until the request id is reused (and there's no cleanup if the engine drops the request). Either marshal at least Content-Length (and the URL — the engine routes on it) or gate the body stash on having marshaled a non-zero Content-Length.

  5. src/libraries/pil/src/intercepting/intercepting_webcore.cpp:1578-1594try_dequeue_request() removes the request from m_pending_requests before we know it fits the caller's buffer. If needed > RequestBufferLength we return ERROR_MORE_DATA, but the request has already been popped and is lost — the caller's retry with a larger buffer will see the next request (or none). Either peek without popping and only pop once marshaling succeeds, or push the request back to the front of the deque on the ERROR_MORE_DATA path.

Happy to take a pass at #1 (the thread_local issue) and #5 (the dequeue-then-lose race) — those are the smallest and have the highest impact-to-risk ratio. #2 is a larger piece of work and probably wants its own checklist item.

EmJayGee pushed a commit that referenced this pull request Jun 18, 2026
Completed item: M-HWC-REVIEW-1: Make g_active_context a plain process-global so hooks fire on engine worker threads
Completed item: M-HWC-REVIEW-3: Add lock-free synthetic-handle fast-path to the CloseHandle hook
Completed item: M-HWC-REVIEW-5: Requeue synthetic HTTP request on ERROR_MORE_DATA instead of dropping it
@EmJayGee

Copy link
Copy Markdown
Collaborator Author

@copilot , can you review src\libraries\pil again?

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown

@copilot , can you review src\libraries\pil again?

Re-reviewed src/libraries/pil (focused on the M-HWC-REVIEW-1..5 commits 3bc2c09 and 57023db). All five issues from the prior pass look correctly addressed (process-global g_active_context, full marshal_synthetic_request, requeue_front on ERROR_MORE_DATA, fast-path on synthetic-handle range, file I/O hooks added). A handful of new issues introduced or surfaced by the I/O hook work:

  1. Handle-range collision is now imminentsrc/libraries/pil/src/intercepting/intercepting_webcore.h:219-273. synthetic_handle_base = 0x80000100 and synthetic_find_handle_base = 0x80000200, but next_file_handle_value simply increments. After 256 CreateFileW calls (no opens-and-closes need accumulate; the counter never decreases), file handles will start being minted from 0x80000200… and onward and will collide with find handles — and then with reused HKEY values (HKEY uses the same base 0x80000100). Two consequences: the file-handle map will alias find handles, and is_synthetic_file_handle will start returning true for handles that are actually FindFirstFileW cookies. A real HWC engine that streams content can blow through 256 opens in seconds. The handle scheme needs either non-overlapping reserved ranges per kind (e.g. files 0x8000_0000_0000_0100, finds 0x8000_0000_4000_0000, keys 0x8000_0000_8000_0000 on x64), or a generation-tagged scheme with explicit reuse on close.

  2. hook_WriteFile on a synthetic handle breaks on the second callsrc/libraries/pil/src/intercepting/intercepting_webcore.h:372-395src/libraries/pil/src/direct/Platforms/windows/win32_filesystem.cpp:705-712. write_file_handle forwards to ifile::write_content at explicit_offset.value_or(st.position). But win32_filesystem.cpp:708 requires offset == 0 ("Whole-file replacement only (D16): a write whose offset is non-zero is a partial / mid-file overwrite, which is not modeled") and returns std::errc::not_supported. So the first WriteFile after a CreateFileW works (st.position is 0), and every subsequent WriteFile on the same handle fails with ERROR_NOT_SUPPORTED. The engine writing a response body, log file, or any multi-chunk artifact will hit this on chunk 2. This decorator is supposed to make the synthetic handle behave like a file handle — either the direct backend needs an append/positioned write path, or write_file_handle needs to buffer into a growable vector<byte> and flush as a single write_content on close/flush.

  3. OVERLAPPED synthetic completion is incompatible with IOCPsrc/libraries/pil/src/intercepting/intercepting_webcore.cpp:1253-1259 (and the matching block in hook_WriteFile). When the caller has bound the handle to an I/O completion port, they expect a completion packet to be queued via PostQueuedCompletionStatus in addition to the event being signalled. We synthesize neither (we only SetEvent(hEvent) and return TRUE). A server using IOCP — which is the default for hwebcore — will see the call "succeed" but its IOCP-pump thread will never wake. Either reject overlapped I/O on synthetic handles (same way hook_HttpReceiveHttpRequest rejects pOverlapped != nullptr with ERROR_INVALID_PARAMETER) or actually queue the completion packet via a stored IOCP handle.

  4. file_handle_mutex held across the entire I/O callsrc/libraries/pil/src/intercepting/intercepting_webcore.h:350-366, 378-394. The lock is taken before read_content / write_content runs, so all synthetic-file I/O across all handles is serialised through one mutex, even reads on independent files. The usual pattern is: take the mutex briefly to look up the shared_ptr<ifile> (and snapshot st.position), release it, do the I/O, re-acquire briefly to update st.position. With the current scheme a slow write blocks every other thread's read on every other file.

  5. get_file_handle_size and set_file_handle_pointer FILE_END swallow the metadata errorsrc/libraries/pil/src/intercepting/intercepting_webcore.h:414 and :446-447. ifile::query_information is called without checking its disposition, then md.m_size is read regardless. If the query fails, md.m_size is whatever the default-constructed file_metadata left it (0), and we return success with size 0. Either propagate the disposition into ec or return false so the hook falls back to the original.

  6. hook_GetFileType / hook_FlushFileBuffers / hook_SetEndOfFile short-circuit when context exists but handle isn't ourssrc/libraries/pil/src/intercepting/intercepting_webcore.cpp:1434-1446, 1448-1462, 1464-1482. The pattern is: fast-path range check, then if (is_synthetic_file_handle(h)) return TRUE/FILE_TYPE_DISK; return original_…(h). The fall-through return original_…(h) is right when is_synthetic_file_handle says no, but the range check excludes anything below synthetic_handle_base — so any real handle that happens to be ≥ 0x80000100 (extremely uncommon today, but the kernel does not guarantee handles stay below that value forever, especially on 64-bit) will still take the lock and probe the map. That's a continuation of the perf concern in Add workflows for building #4 above for non-CloseHandle hooks.

  7. Memory ordering claim in the new banner comment is too strongsrc/libraries/pil/src/intercepting/intercepting_webcore.cpp:43-53. The comment asserts "data-race-free without atomics because it is fully ordered: activate() sets the pointer (under m_mutex) before the underlying engine […] is started, and ~webcore_instance() clears it only after the underlying instance has been shut down and its threads joined." That's correct if hwebcore's instance destructor actually joins every worker before returning — but the engine is opaque to us and we have no spec hook guaranteeing that. Even if true today, future Windows updates could change the teardown sequence. Since g_active_context is read on every hooked call on every thread, the safest fix is std::atomic<interception_context*> with memory_order_acquire reads and memory_order_release writes. The cost is negligible (a relaxed/acquire load is a plain mov on x64/ARM64) and it removes one piece of "this is correct if" reasoning from a security-sensitive surface.

#1 (handle collision) and #2 (write breaks on second call) are the two that I'd treat as blocking — both will produce broken behaviour with realistic engine usage rather than just edge cases. #3 is also probably blocking depending on whether the engine uses IOCP on these handles in practice; if it does, reject overlapped synthetic I/O outright as a safe interim.

EmJayGee pushed a commit that referenced this pull request Jun 18, 2026
Completed item: M-HWC-REVIEW2-1: Give each synthetic-handle kind its own non-overlapping range (floor + per-kind bases)
Completed item: M-HWC-REVIEW2-2: Make WriteFile on a synthetic handle accumulate past the first chunk and flush once on flush/close
Completed item: M-HWC-REVIEW2-3: Reject overlapped I/O on synthetic handles with ERROR_INVALID_PARAMETER
Completed item: M-HWC-REVIEW2-4: Do not hold file_handle_mutex across the backing read
Completed item: M-HWC-REVIEW2-5: Propagate query_information failure through ec in size/seek
Completed item: M-HWC-REVIEW2-6: Make the active context a std::atomic read with acquire/release
@EmJayGee

Copy link
Copy Markdown
Collaborator Author

@copilot, can you please review src\libraries\math

Copilot AI commented Jun 18, 2026

Copy link
Copy Markdown

@copilot, can you please review src\libraries\math

Reviewed src/libraries/math (primarily include/m/math/math.h plus the component's tracked files). The core arithmetic looks correct — I traced the "compute in ℤ, then cast to ResultT" model through every signed/unsigned specialization, including the awkward most-negative (intmax_t::min / |min| = max+1) paths in multiply/divide/negate, and the unsigned↔signed cross cases, and didn't find a wrong-answer or missed-overflow bug. The division-based multiply overflow checks are all guarded by early l==0 || r==0 returns, so no divide-by-zero. So the notes below are hygiene/robustness, not correctness:

  1. Non-ASCII character in a runtime exception stringmath.h:149 embeds × (U+00D7) in the std::format(...) message that gets thrown. Comments using ×// (e.g. 401–409, 594, 859–873) are harmless, but a non-ASCII byte in a thrown message string is the kind of thing that breaks under a non-UTF-8 code page (and this repo has tpu encoding tooling precisely because of that class of bug). Suggest plain * (or x) in the literal.

  2. Unused includes in a Layer-0 foundation header<array>, <map>, <mutex>, <tuple>, <functional>, <cstddef>, and <exception> are all included (lines 6–17) but nothing in the file references std::array/map/mutex/tuple/function/byte/size_t/exception. Since this header sits at the bottom of the layering graph and is pulled in widely, dropping the unused ones trims transitive include cost for everything above it. (<format>, <stdexcept>, <climits>/<cstdint>, <type_traits> are the ones actually used.)

  3. Stray non-source files committed under the componentsrc/libraries/math/CHECKLIST.md.backup (a literal backup file) and src/libraries/math/FIX_ISSUE_2.md (an ad-hoc "here's the fix that needs to be applied" note that just describes work already done in the code). Both are clutter that will rot; per the repo's CHECKLIST/DESIGN-NOTES conventions the backup should be deleted and any durable rationale folded into DESIGN-NOTE.md.

  4. Misleading / inconsistent exception messagesmath.h:374 throws "m::math::add overflow: cannot add most negative signed value" from inside subtract (the r < 0 branch routes through addition internally, but the message will confuse anyone debugging a subtract call). More broadly, throw sites are inconsistent: some build rich std::format messages (lines 98, 148, 160, …) while many others throw a bare "integer overflow" (e.g. 321, 344, 393, 496, 768, 1184). Worth standardizing.

  5. Formatting not run through clang-format — several method bodies are mis-indented relative to their enclosing struct (e.g. 141–154, 266–272, 846–929, 1130–1150) and there are odd blank-line clusters at 438–451 and 1095–1108. Cosmetic, but a clang-format pass would clean it up.

#1 (non-ASCII in a thrown string) and #3 (stray committed files) are the two I'd prioritize. Happy to take a pass at any of these if you'd like.

@EmJayGee

Copy link
Copy Markdown
Collaborator Author

Thanks for the re-review. Heads-up on the snapshot: this pass looked at commits 3bc2c09 and 57023db, but a later commit 334b1ad ("Harden synthetic-file I/O hooks (PR #174 Copilot 3rd-pass review)") had already landed and addresses all seven items. Verified against current HEAD:

  1. Handle-range collision - Fixed. The three synthetic-handle kinds now mint from widely separated, non-overlapping reserved ranges instead of adjacent +0x100 bases: keys 0x80000000, files 0xA0000000, finds 0xC0000000 (each ~0x20000000 / ~536M apart), with a shared synthetic_handle_floor = 0x80000000 used as a lock-free fast path. This is the "non-overlapping reserved ranges per kind" option you suggested. (intercepting_webcore.h:225-228)
  2. hook_WriteFile second write - Fixed. file_state now carries a growable write_buffer + dirty flag; positioned/chunked writes accumulate in memory and are flushed to the backing ifile as a single whole-file write_content on FlushFileBuffers/CloseHandle. The old offset != 0 -> not_supported failure on chunk 2 is gone. (write_file_handle / flush_file_handle / close_file_handle)
  3. OVERLAPPED / IOCP incompatibility - Fixed. hook_ReadFile and hook_WriteFile now reject overlapped I/O on synthetic handles with ERROR_INVALID_PARAMETER rather than faking a completion the caller's IOCP port would never receive. (intercepting_webcore.cpp:1247, :1291)
  4. file_handle_mutex held across I/O - Fixed. read_file_handle snapshots the shared_ptr<ifile> + position under the lock, runs read_content unlocked, then re-acquires briefly only to advance the position. Writes are now an in-lock memcpy into the buffer (no backing I/O on the write path), so independent-file reads no longer serialize on one mutex.
  5. get_file_handle_size / FILE_END metadata error swallowed - Fixed. Both now check query_information's disposition and surface it through ec (io_error) instead of returning success with size 0.
  6. hook_GetFileType / hook_FlushFileBuffers / hook_SetEndOfFile short-circuit - Fixed. All three take the < synthetic_handle_floor fast path first, so any real handle below the floor skips the lock entirely.
  7. Memory-ordering claim too strong - Fixed. g_active_context is now std::atomic<interception_context*>, written with memory_order_release and read via the active_context() accessor with memory_order_acquire; the banner comment was rewritten to drop the "correct if the engine joins its threads" reasoning.

No further code changes needed for these. (clang-format/cosmetic items aside.)

@EmJayGee

Copy link
Copy Markdown
Collaborator Author

Thanks. Items 1-4 were addressed in commit 76e7c60 ("math: centralize overflow exception messages via detail throw helpers; fix non-ASCII char, IWYU includes, remove stray files"), which landed before this review snapshot. Verified against current HEAD:

  1. Non-ASCII character in a thrown string - Fixed. All throw sites now route through detail::throw_*_overflow() helpers whose messages are plain ASCII (e.g. "m::math::multiply result out of range"). The remaining non-ASCII characters are only in comments, which you noted are harmless.
  2. Unused includes in a Layer-0 header - Fixed. math.h now includes only <cstdint>, <limits>, <stdexcept>, <type_traits> plus the m/cast and m/utility headers it actually uses; <array>/<map>/<mutex>/<tuple>/<functional>/<cstddef>/<exception> are gone.
  3. Stray committed files - Fixed. CHECKLIST.md.backup and FIX_ISSUE_2.md are removed (neither is on disk nor tracked by git).
  4. Misleading / inconsistent exception messages - Fixed. The centralized helpers give each operation a consistent, operation-named message, so subtract no longer throws the add text and the bare "integer overflow" sites are gone.
  5. clang-format - Not changed, intentionally. Formatting isn't CI-gated in this repo, and the tree was formatted with an older clang-format than is available locally (a known-clean sibling header reports ~26 violations under clang-format 22), so a mechanical reformat would introduce version-drift churn inconsistent with the rest of the codebase. Better handled as a repo-wide pass with the pinned formatter version - happy to do that separately if you'd like.

@EmJayGee EmJayGee force-pushed the micgrier/mwin32-registry branch from 3096180 to dace64d Compare June 19, 2026 00:09
@EmJayGee EmJayGee merged commit 26e0786 into main Jun 19, 2026
5 checks passed
@EmJayGee EmJayGee deleted the micgrier/mwin32-registry branch June 19, 2026 00:37
@EmJayGee EmJayGee restored the micgrier/mwin32-registry branch June 20, 2026 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants