Homebrew local driver bootstrap should repair stale TLS/JWT state #1525

@TaylorMutch

Summary

The macOS Homebrew install can leave local container drivers in a broken state after an upgrade because the installer/runtime bootstrap does not fully repair stale local gateway state.

We reproduced three related failure modes while debugging a local macOS install with both Docker Desktop and Podman installed:

  1. The Homebrew service wrapper copied TLS material into ~/.local/state/openshell/homebrew/tls, but did not copy the jwt/ directory generated by openshell-gateway generate-certs.
  2. Existing local TLS material was treated as reusable even when its server certificate had an older SAN set missing host.containers.internal, which breaks the explicit Podman driver when it uses https://host.containers.internal:17670.
  3. The install could keep using a cached, stale supervisor image or binary that was incompatible with gateway-minted sandbox JWT auth. That stale-supervisor symptom is tracked separately in #1523 (bug: Homebrew Docker gateway can use stale supervisor:dev binary incompatible with sandbox JWT auth).
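
Failure mode 1 can be checked mechanically. The following is a minimal sketch, not part of openshell: the helper name `missing_jwt_files` is hypothetical, and the only facts taken from this report are the three file names (jwt/signing.pem, jwt/public.pem, jwt/kid) and the Homebrew-local TLS path.

```python
from pathlib import Path

# File names taken from this report; the helper itself is hypothetical.
EXPECTED_JWT_FILES = ("jwt/signing.pem", "jwt/public.pem", "jwt/kid")


def missing_jwt_files(tls_dir: Path) -> list[str]:
    """Return the expected JWT files that are absent from tls_dir."""
    return [rel for rel in EXPECTED_JWT_FILES if not (tls_dir / rel).is_file()]


if __name__ == "__main__":
    # Path from this report: the Homebrew-local copy used by the service wrapper.
    local_tls = Path.home() / ".local/state/openshell/homebrew/tls"
    missing = missing_jwt_files(local_tls)
    print("missing:", missing if missing else "none")
```

An empty result means all three files are present in the directory the wrapper points OPENSHELL_LOCAL_TLS_DIR at. It says nothing about whether the keys are valid.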

What we observed

  • Homebrew generated JWT files under /opt/homebrew/var/openshell/tls/jwt.
  • The generated service wrapper set OPENSHELL_LOCAL_TLS_DIR to the copied Homebrew-local TLS directory under ~/.local/state/openshell/homebrew/tls.
  • That copied directory initially contained ca.crt, server/tls.*, and client/tls.*, but no jwt/signing.pem, jwt/public.pem, or jwt/kid.
  • Without the JWT files in OPENSHELL_LOCAL_TLS_DIR, sandbox JWT issuing was not configured.
  • After manually copying jwt/, the Docker path still failed until the stale supervisor image was pulled again (see #1523).
  • The explicit Podman driver still failed after pulling the current supervisor image.

The Podman failure was isolated with a direct supervisor RPC attempt:

invalid peer certificate: certificate not valid for name "host.containers.internal";
certificate is only valid for DnsName("openshell"), DnsName("openshell.openshell.svc"),
DnsName("openshell.openshell.svc.cluster.local"), DnsName("localhost"),
DnsName("host.docker.internal"), IpAddress(127.0.0.1) or DnsName("host.openshell.internal")

The same install worked after overriding the Podman driver endpoint to a hostname already present in the stale cert:

[openshell.drivers.podman]
grpc_endpoint = "https://host.openshell.internal:17670"

Current generate-certs output includes host.containers.internal. The certificates had not expired; they were stale relative to the current default SAN set.
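
The mismatch can be illustrated with a small sketch that compares a driver endpoint's hostname against the SAN entries from the error above. This is illustrative only: `endpoint_covered` is a hypothetical helper, the SAN set is copied from the error message, and the check is a simplified exact match that ignores wildcard SANs and does not parse a real certificate.

```python
from urllib.parse import urlsplit

# SAN entries copied from the error message in this report.
STALE_CERT_SANS = {
    "openshell",
    "openshell.openshell.svc",
    "openshell.openshell.svc.cluster.local",
    "localhost",
    "host.docker.internal",
    "127.0.0.1",
    "host.openshell.internal",
}


def endpoint_covered(endpoint: str, sans: set[str]) -> bool:
    """True if the endpoint's host appears verbatim in the SAN set.

    Simplification: exact, case-insensitive match only; no wildcard handling.
    """
    host = urlsplit(endpoint).hostname
    return host is not None and host.lower() in {s.lower() for s in sans}


# Default Podman endpoint from this report: not covered by the stale cert.
print(endpoint_covered("https://host.containers.internal:17670", STALE_CERT_SANS))  # False
# Override that worked: this hostname is already in the stale cert.
print(endpoint_covered("https://host.openshell.internal:17670", STALE_CERT_SANS))  # True
```

A staleness check in generate-certs would presumably run the same comparison against the SANs read from the existing server certificate, which this sketch does not attempt.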

Expected behavior

The macOS installer and local bootstrap should make the default local Docker/Podman driver setup work after an upgrade without manual state surgery.

In particular:

  • The Homebrew service wrapper should make JWT key material available in OPENSHELL_LOCAL_TLS_DIR when local gateway JWT material exists.
  • generate-certs --output-dir ... should detect and repair stale local TLS bundles whose server certificate is missing required default or configured SANs.
  • The install/runtime should avoid silently using stale supervisor state that is incompatible with the installed gateway, or at least surface a precise repair instruction (see #1523).
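
As a rough sketch of the first expectation, the copy step could look like the following. It is an assumption about the shape of a fix, not the actual Homebrew service wrapper: `sync_jwt_dir` is a hypothetical name, and the only detail taken from this report is that the jwt/ directory sits beside the other TLS material.

```python
import shutil
from pathlib import Path


def sync_jwt_dir(source_tls: Path, local_tls: Path) -> bool:
    """Copy source_tls/jwt into local_tls/jwt if the source exists.

    Returns True when a copy was made, False when there was nothing to copy.
    """
    src = source_tls / "jwt"
    if not src.is_dir():
        return False
    # dirs_exist_ok lets a rerun refresh an existing copy (Python 3.8+).
    # copytree uses copy2 by default, so file permission bits are preserved.
    shutil.copytree(src, local_tls / "jwt", dirs_exist_ok=True)
    return True
```

With the paths from this report, the call would be `sync_jwt_dir(Path("/opt/homebrew/var/openshell/tls"), Path.home() / ".local/state/openshell/homebrew/tls")`. Returning False when no source jwt/ exists matches the "when local gateway JWT material exists" condition above.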

Labels: os:macos (Bug affects macOS hosts)
