diff --git a/docs/RFCs/017-TestHost-Launcher.md b/docs/RFCs/017-TestHost-Launcher.md
new file mode 100644
index 0000000000..a77aab427a
--- /dev/null
+++ b/docs/RFCs/017-TestHost-Launcher.md
@@ -0,0 +1,436 @@
+# RFC 017 - Custom test host launcher
+
+- [ ] Approved in principle
+- [x] Under discussion
+- [x] Implementation
+- [ ] Shipped
+
+## Summary
+
+Introduce **`ITestHostLauncher`**: a public, experimental Microsoft.Testing.Platform (MTP)
+extension point that lets an extension control **how** the out-of-process test host is launched,
+instead of the platform always calling `Process.Start`. The platform still owns everything around
+the launch (argument/environment preparation, the controller↔host IPC pipe, PID tracking,
+`ITestHostProcessLifetimeHandler` callbacks, and exit-code reconciliation) and simply delegates
+the single "create and start the test host" step to the registered launcher.
+
+The hook is deliberately **agnostic of the launch mechanism**: the launcher does not have to start a
+local OS process. It can deploy and activate a packaged application, launch a container, or start
+the host on a remote machine. To make this explicit, the launcher returns an `ITestHostHandle` that
+exposes only the lifecycle the platform needs (`WaitForExitAsync`, `ExitCode`, `HasExited`,
+`Terminate`) plus an optional free-form `Identifier` string used purely for diagnostics.
+
+The motivating scenario is **packaging and deployment of MSIX-packaged applications, both UWP and
+WinUI** (see [#2784](https://github.com/microsoft/testfx/issues/2784)): packaged/MSIX apps cannot be
+started with `Process.Start` and must be deployed and then activated by AUMID. UWP and packaged WinUI
+share this exact mechanism, which is why VSTest exposes a single `UwpTestHostRuntimeProvider` for
+both; unpackaged apps similarly benefit from a custom deploy + launch step.
+The same hook also
+enables launching the test host under a debugger, elevated, inside a container, or on a remote
+machine.
+
+## Motivation
+
+MTP runs the test host out-of-process whenever a "test host controller" extension is active (hang
+dump, crash dump, or any `ITestHostProcessLifetimeHandler` / `ITestHostEnvironmentVariableProvider`).
+That work happens in `TestHostControllersTestHost`, which prepares a `ProcessStartInfo` (arguments,
+environment variables including the `MONITORTOHOST` pipe name) and then launches the host with a
+single call:
+
+```csharp
+using IProcess testHostProcess = process.Start(processStartInfo);
+```
+
+Everything downstream needs only a handful of things from the returned handle: a way to observe
+exit (`WaitForExitAsync()`, `ExitCode`, `HasExited`), optionally an identifier for logging, and a
+way to tear it down. The child must also connect back on the named pipe whose name was
+injected via an environment variable. **`Process.Start` is the only assumption that does not hold
+universally.** Several real scenarios need a different launch mechanism:
+
+- **Packaged WinUI/MSIX**: a packaged app must be deployed (in Developer Mode, register the loose
+  layout) and then activated by Application User Model ID (AUMID) via `IApplicationActivationManager`,
+  not started from an executable path. This is the blocker behind
+  [#2784](https://github.com/microsoft/testfx/issues/2784) and the reason VSTest's
+  `UwpTestHostRuntimeProvider` exists.
+- **Debugger attach/launch**: start the host suspended (or under a debugger launcher such as
+  `vsdbg` / `WinDbg` / `dlv`) and only then resume.
+- **Elevation**: run the test host as administrator (UAC) or as another user.
+- **Container / remote**: launch the host inside a container (`docker run`) or on a remote device
+  over SSH/WinRM, then bridge the pipe; neither exposes a local, queryable PID.
+
+Today none of these is possible without forking the platform.
+The existing experimental
+`ITestHostExecutionOrchestrator` sits at the wrong layer (see [Alternatives](#alternatives-considered)). This
+RFC adds the *minimal* hook at exactly the launch site.
+
+## Goals
+
+- Let an extension substitute the test host launch step while the platform keeps owning
+  argument/env preparation, IPC, lifetime-handler dispatch, and exit-code handling.
+- Keep hang dump, crash dump, and all `ITestHostProcessLifetimeHandler` /
+  `ITestHostEnvironmentVariableProvider` extensions working unchanged when a custom launcher is
+  present.
+- Be generic enough to cover WinUI deploy+activate, debugger, elevation, container, and remote
+  launch with one shape, **without assuming the launched thing is a local OS process**.
+- Follow MTP's experimental-API conventions so the surface can evolve before stabilizing.
+
+## Non-goals
+
+- Replacing the *entire* run loop (that is `ITestHostExecutionOrchestrator`'s job).
+- Remote **device deployment/bootstrapping** of the Windows App SDK framework + agent (VSTest's
+  `Microsoft.UniversalApps.Deployment` has no public redistributable, so this is out of scope; local
+  launch only).
+- Shipping a *complete* packaged UWP/WinUI (MSIX) deployment story. This RFC adds the platform hook;
+  a reference consumer (`Microsoft.Testing.Extensions.PackagedApp`) implements the deploy-and-launch path,
+  while packaged AUMID activation remains a separate follow-up.
+- Changing the in-process (single-process, `ConsoleTestHost`) execution path.
+
+## Detailed design
+
+### Where it plugs in
+
+The hook lives in the **test host controllers** layer, next to the existing lifetime-handler and
+environment-variable-provider extension points, and is registered through
+`ITestHostControllersManager`.
+
+```csharp
+namespace Microsoft.Testing.Platform.Extensions.TestHostControllers;
+
+/// <summary>
+/// Allows an extension to control how the out-of-process test host is launched,
+/// replacing the platform's default Process.Start behavior.
+/// </summary>
+[Experimental("TPEXP", UrlFormat = "https://aka.ms/testingplatform/diagnostics#{0}")]
+public interface ITestHostLauncher : ITestHostControllersExtension // : IExtension
+{
+    /// <summary>
+    /// Creates and starts the test host. The platform has already prepared the file name,
+    /// arguments, and environment variables (including the controller IPC pipe name) carried by
+    /// <paramref name="context"/>. The implementation must return a handle the platform can monitor.
+    /// </summary>
+    Task<ITestHostHandle> LaunchTestHostAsync(
+        TestHostLaunchContext context,
+        CancellationToken cancellationToken);
+}
+```
+
+The platform passes the fully-prepared launch information:
+
+```csharp
+[Experimental("TPEXP", UrlFormat = "https://aka.ms/testingplatform/diagnostics#{0}")]
+public sealed class TestHostLaunchContext
+{
+    public TestHostLaunchContext(
+        string fileName,
+        IReadOnlyList<string> arguments,
+        IReadOnlyDictionary<string, string?> environmentVariables,
+        string? workingDirectory);
+
+    /// <summary>The default test host executable path the platform would have started.</summary>
+    public string FileName { get; }
+
+    /// <summary>Arguments, already including the test host controller PID option.</summary>
+    public IReadOnlyList<string> Arguments { get; }
+
+    /// <summary>
+    /// The final environment for the test host, after all
+    /// <see cref="ITestHostEnvironmentVariableProvider"/> extensions ran. Includes the
+    /// controller↔host IPC pipe name the host must connect back on.
+    /// </summary>
+    public IReadOnlyDictionary<string, string?> EnvironmentVariables { get; }
+
+    /// <summary>The working directory, or null to inherit the current one.</summary>
+    public string? WorkingDirectory { get; }
+}
+```
+
+The launcher returns a launch-mechanism-agnostic handle (the platform adapts it to its internal
+monitoring contract):
+
+```csharp
+[Experimental("TPEXP", UrlFormat = "https://aka.ms/testingplatform/diagnostics#{0}")]
+public interface ITestHostHandle : IDisposable
+{
+    /// <summary>
+    /// Free-form diagnostic identifier (a PID, container id, remote host:pid, …) or null. The
+    /// platform never relies on it for control flow.
+    /// </summary>
+    string? Identifier { get; }
+
+    /// <summary>Only valid once <see cref="HasExited"/> is true (reading it earlier is undefined).</summary>
+    int ExitCode { get; }
+    bool HasExited { get; }
+
+    /// <summary>Waits for exit, or for the token to be canceled. May be awaited more than once.</summary>
+    Task WaitForExitAsync(CancellationToken cancellationToken);
+
+    /// <summary>Best-effort teardown (e.g. when hang dump aborts the run).</summary>
+    void Terminate();
+}
+```
+
+`ITestHostHandle` extends `IDisposable`: the platform owns the handle for the whole lifetime of the
+test host and disposes it once the host has exited, so implementations release any OS resources they
+hold (process objects, sockets, container clients, …) in `Dispose`. `WaitForExitAsync` takes a
+`CancellationToken` so the platform can stop waiting on cancellation; the controller host still
+reconciles the real OS exit code afterwards (on cancellation it terminates the host and waits for it
+to fully exit).
+
+Registration mirrors the existing methods on `ITestHostControllersManager`:
+
+```csharp
+public interface ITestHostControllersManager
+{
+    // existing: AddEnvironmentVariableProvider(...), AddProcessLifetimeHandler(...)
+
+    [Experimental("TPEXP", UrlFormat = "https://aka.ms/testingplatform/diagnostics#{0}")]
+    void AddTestHostLauncher(Func<IServiceProvider, ITestHostLauncher> testHostLauncherFactory);
+
+    [Experimental("TPEXP", UrlFormat = "https://aka.ms/testingplatform/diagnostics#{0}")]
+    void AddTestHostLauncher<T>(CompositeExtensionFactory<T> compositeServiceFactory)
+        where T : class, ITestHostLauncher;
+}
+```
+
+### Platform integration (what changes inside MTP)
+
+1. **Swap the launch call.** In `TestHostControllersTestHost.InternalRunAsync`, at the current
+   `process.Start(processStartInfo)` site (after `BeforeTestHostProcessStartAsync` and after all
+   env-var providers ran), if a launcher is registered, build a `TestHostLaunchContext` from the
+   `ProcessStartInfo` and `await launcher.LaunchTestHostAsync(...)`. Otherwise keep the default
+   `process.Start`.
+   The returned `ITestHostHandle` is adapted to the internal `IProcess` monitoring
+   contract, which only uses `WaitForExitAsync` / `ExitCode` / `HasExited` / `Kill` (and an
+   internal `Exited` event synthesized from the exit task for an informational log). The
+   premature-exit check is gated on `HasExited` only (not on whether an identifier is available), so
+   a launcher that returns no identifier (container/remote/AUMID) is monitored purely through the
+   handle lifecycle and the IPC PID handshake.
+2. **Force the controller host.** Registering a launcher sets `RequireProcessRestart` to `true`
+   (computed in `TestHostControllersManager.BuildAsync`, checked in
+   `TestHostBuilder.Modes.cs`); without this, a run with *only* a launcher (no dump/lifetime
+   extension) would stay in-process and there would be nothing to launch.
+3. **Singleton.** At most one launcher may be registered; a duplicate fails fast at build time with
+   a localized "only one test host launcher" error.
+4. **Preserve ordering and services.** Because the call stays at the same point,
+   `ITestHostEnvironmentVariableProvider`, the `MONITORTOHOST` IPC pipe, the PID handshake, and
+   `ITestHostProcessLifetimeHandler` (and therefore hang dump and crash dump) all keep working with
+   no changes.
+
+### Contract requirements on the launcher
+
+- The launched host **must** end up with the values in `context.EnvironmentVariables` (so it connects
+  back on the controller pipe) and **must** receive `context.Arguments`. *How* those values reach the
+  host is left to the launcher: they can be inherited from the environment, passed as activation
+  arguments, or bridged through a broker. AUMID activation in particular cannot set per-launch
+  environment variables, so a packaged-app launcher must transfer them another way.
+- The returned handle must report exit reliably (`WaitForExitAsync`, `ExitCode`, `HasExited`) and
+  support `Terminate()` (hang dump terminates the host through it).
+  `WaitForExitAsync` may be awaited
+  more than once, and must honor its `CancellationToken`. `ExitCode` is only required to be valid
+  once `HasExited` is `true` (or after `WaitForExitAsync` completes); reading it on a still-running
+  handle is undefined and implementations are not required to throw.
+- The handle is `IDisposable`; the platform disposes it once the host has exited, so the launcher
+  should release any OS resources it holds (process object, sockets, container client, …) in
+  `Dispose`.
+- `Identifier` is an optional free-form diagnostic string (PID, container id, remote `host:pid`, …)
+  and may be `null`. The platform never relies on it for control flow.
+- If the launcher cannot start the host it should throw; the platform surfaces it as a
+  platform-setup failure.
+
+> The `Quote` and `PasteArguments` helpers used in the examples below are placeholders for whatever
+> argument/shell quoting the target mechanism needs (e.g. `PasteArguments` from dotnet/runtime for
+> Windows command lines, POSIX single-quoting for a shell). Implement them carefully to avoid
+> argument-injection bugs; the reference `Microsoft.Testing.Extensions.PackagedApp` extension (see the
+> implementation PR) shows a concrete approach.
+
+## Examples
+
+All examples assume the extension is registered on the builder, e.g. from a `…Extensions` helper:
+
+```csharp
+builder.TestHostControllers.AddTestHostLauncher(sp => new MyLauncher(sp));
+```
+
+### 1. Packaged WinUI / MSIX (the motivating case)
+
+Deploy the loose layout (Developer Mode) and activate the packaged app by AUMID instead of starting
+an exe. The activated app self-hosts MTP (as the `MSTestRunnerWinUI` sample already does for the
+in-process case); the launcher is responsible for getting the controller pipe name and correlation
+id to it so it can connect back, since AUMID activation does not flow per-launch environment
+variables on its own.
+
+```csharp
+public Task<ITestHostHandle> LaunchTestHostAsync(
+    TestHostLaunchContext context, CancellationToken cancellationToken)
+{
+    // 1. Parse the .appxrecipe / AppxManifest.xml next to context.FileName to get the AUMID
+    //    and (in Developer Mode) register the loose layout:
+    //      new PackageManager().RegisterPackageByUriAsync(manifestUri, options);
+    string aumid = AppxManifest.ResolveAumid(context.FileName);
+
+    // 2. Activate, passing the SAME args the platform prepared. AUMID activation takes a single
+    //    command-line string, so the launcher must escape/quote context.Arguments (e.g. with a
+    //    PasteArguments-style helper) to preserve what ProcessStartInfo.ArgumentList would have done.
+    var aam = (IApplicationActivationManager)new ApplicationActivationManager();
+    aam.ActivateApplication(aumid, PasteArguments(context.Arguments), ACTIVATEOPTIONS.AO_NONE, out uint pid);
+
+    // 3. Wrap the activated app. AUMID activation cannot set per-launch environment variables, so the
+    //    launcher must bridge the values the host needs from context.EnvironmentVariables (the
+    //    MONITORTOHOST pipe name, correlation id, etc.) another way, e.g. activation arguments or a
+    //    broker process the activated app reads on startup. The handle surfaces the activated PID as
+    //    its (diagnostic-only) Identifier.
+    return Task.FromResult<ITestHostHandle>(new ActivatedAppHandle(pid));
+}
+```
+
+> Note: enabling the controller→host pipe across the AppContainer sandbox requires a loopback/pipe-ACL
+> step (e.g. `CheckNetIsolation LoopbackExempt` or granting the package SID on the pipe). That belongs
+> to the package/deploy extension, not the platform.
+
+### 2. Launch under a debugger
+
+```csharp
+public async Task<ITestHostHandle> LaunchTestHostAsync(
+    TestHostLaunchContext context, CancellationToken cancellationToken)
+{
+    var psi = new ProcessStartInfo(context.FileName) { UseShellExecute = false };
+    foreach (string arg in context.Arguments) psi.ArgumentList.Add(arg);
+    foreach (var kvp in context.EnvironmentVariables.Where(kv => kv.Value is not null))
+        psi.Environment[kvp.Key] = kvp.Value; // skip unset (null) vars
+    psi.Environment["DOTNET_DefaultDiagnosticPortSuspend"] = "1"; // start suspended
+
+    Process p = Process.Start(psi)!;
+    await DebuggerLauncher.AttachAsync(p.Id, cancellationToken); // e.g. vsdbg / WinDbg / dlv
+    await DebuggerLauncher.ResumeAsync(p.Id, cancellationToken);
+    return new ProcessHandleAdapter(p);
+}
+```
+
+### 3. Elevated (run as administrator)
+
+```csharp
+public Task<ITestHostHandle> LaunchTestHostAsync(
+    TestHostLaunchContext context, CancellationToken cancellationToken)
+{
+    var psi = new ProcessStartInfo(context.FileName)
+    {
+        UseShellExecute = true, // required for the UAC "runas" verb
+        Verb = "runas",
+    };
+    foreach (string arg in context.Arguments) psi.ArgumentList.Add(arg);
+    // NOTE: UseShellExecute = true cannot pass per-process env vars; an elevated launcher
+    // must forward context.EnvironmentVariables another way (e.g. a temp response file the
+    // host reads, or a broker that sets them) so the host still finds the controller pipe.
+    Process p = Process.Start(psi)!;
+    return Task.FromResult<ITestHostHandle>(new ProcessHandleAdapter(p));
+}
+```
+
+This example deliberately shows a sharp edge: elevation via the shell loses per-process environment
+variables, so the launcher is responsible for re-delivering them. The platform contract only
+requires that the host ends up with `context.EnvironmentVariables`.
+
+### 4. Container
+
+Run the test host inside a container and bridge the pipe. `docker run --rm` is used so the container
+is removed when it stops.
+The returned handle tracks the `docker run` client process; note that
+killing the client alone does not reliably stop the container, so a real implementation should make
+`Terminate()` run `docker stop`/`docker rm` (or run with `--init` and rely on `--rm`) rather than
+just killing the local client.
+
+```csharp
+public Task<ITestHostHandle> LaunchTestHostAsync(
+    TestHostLaunchContext context, CancellationToken cancellationToken)
+{
+    var args = new List<string> { "run", "--rm", "--init" };
+    foreach (var kvp in context.EnvironmentVariables.Where(kv => kv.Value is not null))
+    {
+        // skip unset (null) vars
+        args.Add("-e");
+        args.Add($"{kvp.Key}={kvp.Value}");
+    }
+    // Map the controller pipe into the container (Windows named pipe / Unix domain socket mount).
+    args.Add("--name");
+    string containerName = $"mtp-{Guid.NewGuid():N}";
+    args.Add(containerName);
+    args.Add("test-image:latest");
+    args.Add(context.FileName);
+    args.AddRange(context.Arguments);
+
+    var psi = new ProcessStartInfo("docker") { UseShellExecute = false };
+    foreach (string a in args) psi.ArgumentList.Add(a);
+    Process p = Process.Start(psi)!;
+    // Wrap so Terminate() runs `docker stop` on the named container (which tears it down), not
+    // just Kill() on the local docker client.
+    return Task.FromResult<ITestHostHandle>(new DockerRunHandle(p, containerName));
+}
+```
+
+### 5. Remote (SSH)
+
+```csharp
+public Task<ITestHostHandle> LaunchTestHostAsync(
+    TestHostLaunchContext context, CancellationToken cancellationToken)
+{
+    string env = string.Join(' ', context.EnvironmentVariables
+        .Where(kv => kv.Value is not null)
+        .Select(kv => $"{kv.Key}={Quote(kv.Value!)}")); // values are nullable; skip unset vars
+    string remoteCmd = $"{env} {Quote(context.FileName)} {string.Join(' ', context.Arguments.Select(Quote))}";
+
+    var psi = new ProcessStartInfo("ssh") { UseShellExecute = false };
+    psi.ArgumentList.Add("user@remote-host");
+    psi.ArgumentList.Add(remoteCmd);
+    Process ssh = Process.Start(psi)!; // tunnel the controller pipe over the SSH connection
+    // The handle tracks the local ssh client; its Identifier could be the ssh client PID (diagnostic only).
+    return Task.FromResult<ITestHostHandle>(new ProcessHandleAdapter(ssh));
+}
+```
+
+## Alternatives considered
+
+### Reuse `ITestHostExecutionOrchestrator`
+
+MTP already ships an experimental `ITestHostExecutionOrchestrator`
+(`ITestHostOrchestratorManager.AddTestHostOrchestrator`). It was rejected as the vehicle because it
+sits **above** the controller: `OrchestrateTestHostExecutionAsync` runs in
+`TestHostOrchestratorHost` and replaces the *entire* execution, returning only an exit code. An
+implementation would have to re-create everything `TestHostControllersTestHost` provides:
+environment-variable providers, the `MONITORTOHOST` IPC/PID handshake, and the
+`ITestHostProcessLifetimeHandler` fan-out that **hang dump and crash dump depend on**. That is the
+wrong granularity for "launch the host differently." The orchestrator remains the right tool for
+whole-run concerns (e.g. retry/repeat that re-runs the host).
+
+### Make the internal `IProcessHandler` replaceable via DI
+
+`IProcessHandler` / `IProcess` are `internal` and surface `Process`-specific members (e.g.
+`MainModule`).
+Exposing them publicly would leak implementation detail, over-commit the surface, and
+bake in the "it's always a local process" assumption. A purpose-built, minimal, mechanism-agnostic
+`ITestHostHandle` is cleaner and evolvable.
+
+### A process-centric `ITestHostProcessLauncher` returning a `ProcessId`
+
+An earlier draft of this RFC named the hook `ITestHostProcessLauncher` and returned an
+`ITestHostProcessHandle` whose `ProcessId` was a mandatory `int`. That over-commits to "the test host
+is a local OS process," which is false for container and remote launches and awkward for
+AUMID-activated apps. The current design renames the types to drop "Process", replaces the `int`
+process id with an optional free-form `string Identifier` (diagnostic only), drops the redundant
+`Exited` event in favour of `WaitForExitAsync`, gives `WaitForExitAsync` a `CancellationToken` and
+makes the handle `IDisposable` (so the platform can honor cancellation and deterministically release
+handle resources), and names the teardown `Terminate()` instead of `Kill()`.
+
+### Do nothing (keep `Process.Start`)
+
+Leaves [#2784](https://github.com/microsoft/testfx/issues/2784) unsolvable on MTP for packaged apps
+and blocks the debugger/elevation/container/remote scenarios, all of which today require forking the
+platform.
+
+## Compatibility and conventions
+
+- **Experimental.** All new types and methods are gated behind
+  `[Experimental("TPEXP", UrlFormat = "https://aka.ms/testingplatform/diagnostics#{0}")]`, consistent
+  with the other test-host-controller-era experimental APIs.
+- **Public API tracking.** New members are added to `PublicAPI.Unshipped.txt` with the `[TPEXP]`
+  prefix.
+- **No `init` accessors** on any new public API, per repo policy.
+- **No behavior change when unused.** If no launcher is registered, the platform behaves exactly as
+  today (`Process.Start`), and the controller host is selected only when it already would be.
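+
+Because every new member carries `[Experimental("TPEXP", …)]`, the C# compiler reports diagnostic
+`TPEXP` as an error at each use until the consumer opts in. A minimal sketch of that opt-in at the
+registration site, assuming the `builder` and the placeholder `MyLauncher` type from the Examples
+section (neither is defined by this RFC):
+
+```csharp
+// Opt in to the experimental launcher API for this call only.
+// `builder` and `MyLauncher` are the placeholders used in the Examples section.
+#pragma warning disable TPEXP
+builder.TestHostControllers.AddTestHostLauncher(sp => new MyLauncher(sp));
+#pragma warning restore TPEXP
+```
+
+Scoping the suppression to the call site, rather than the whole project, keeps any other `TPEXP`
+API from being adopted without a visible opt-in.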
+
+## Open questions
+
+- **CLI/debug integration.** Should the platform expose a built-in `--launcher`-style selector, or
+  is builder/MSBuild registration sufficient for v1?
+- **Cancellation semantics.** Define precisely what `Terminate()` must guarantee for remote/container
+  launchers (best-effort teardown vs. synchronous termination).
+- **Multiple launchers.** Singleton for v1; is there ever a composition story (e.g. debugger +
+  elevation), or do implementers compose manually?
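+
+For a local process the best-effort reading of `Terminate()` is easy to state, which may help frame
+the cancellation question above. The following is a minimal sketch, not part of the proposal, of the
+`ProcessHandleAdapter` that Examples 2, 3, and 5 return but this RFC does not define. The interface
+is copied locally so the block compiles on its own, and the sketch assumes .NET 5 or later for
+`Process.WaitForExitAsync` and `Process.Kill(bool)`:
+
+```csharp
+using System;
+using System.Diagnostics;
+using System.Globalization;
+using System.Threading;
+using System.Threading.Tasks;
+
+// Local copy of the handle shape proposed in this RFC, so the sketch compiles on its own.
+public interface ITestHostHandle : IDisposable
+{
+    string? Identifier { get; }
+    int ExitCode { get; }
+    bool HasExited { get; }
+    Task WaitForExitAsync(CancellationToken cancellationToken);
+    void Terminate();
+}
+
+// Hypothetical adapter over an already-started local System.Diagnostics.Process.
+public sealed class ProcessHandleAdapter : ITestHostHandle
+{
+    private readonly Process _process;
+
+    public ProcessHandleAdapter(Process process)
+    {
+        _process = process;
+        // Capture the PID up front; it is diagnostic only.
+        Identifier = process.Id.ToString(CultureInfo.InvariantCulture);
+    }
+
+    public string? Identifier { get; }
+
+    // Process.ExitCode throws while the process is still running, which the contract allows.
+    public int ExitCode => _process.ExitCode;
+
+    public bool HasExited => _process.HasExited;
+
+    public Task WaitForExitAsync(CancellationToken cancellationToken)
+        => _process.WaitForExitAsync(cancellationToken);
+
+    public void Terminate()
+    {
+        try
+        {
+            if (!_process.HasExited)
+            {
+                // Kill child processes too, so nothing outlives the host.
+                _process.Kill(entireProcessTree: true);
+            }
+        }
+        catch (InvalidOperationException)
+        {
+            // The process exited between the check and the kill; nothing left to do.
+        }
+    }
+
+    public void Dispose() => _process.Dispose();
+}
+```
+
+This sketch treats `Terminate()` as fire-and-forget: it requests the kill and returns, leaving the
+platform to observe the exit through `WaitForExitAsync`. Other kill failures (for example a
+`Win32Exception`) propagate to the caller, and a container or remote launcher would need its own
+answer.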