Add SPIFFE v2 URI-SAN based principal extraction (DEPEND-89172)#142
Add SPIFFE v2 URI-SAN based principal extraction (DEPEND-89172)#142Sanju98 wants to merge 3 commits into
Conversation
Code reviewFound 2 issues:
🤖 Generated with Claude Code - If this code review was useful, please react with 👍. Otherwise, react with 👎. |
…oduction path Two fixes addressing the code review on PR linkedin#142: 1. X509AuthenticationConfig.getSpiffeSanMatchPattern now uses double-checked locking with volatile fields and a dedicated lock object, matching the pattern already used by allowedClientIdAsAclDomains and other lazy-loaded fields in the same class. The previous lazy-init was a data race on the per-handshake hot path. 2. X509ZNodeGroupAclProvider's setDomainAuthUpdater lambda now calls helper.getDomains(clientId) instead of the raw map's getOrDefault, so the segment-prefix walk-up added to ZkClientUriDomainMappingHelper is reachable from the production authentication path. This is the znode-tree analogue of LinkedIn's documented acl-tool wildcard idiom (`--spiffe "application/<mp>/*"`); without this fix, MP-level prefix grants documented in the class javadoc would silently no-op. Adds testA4_SpiffeCertResolvesViaPrefixWalkUpToDomainAuthInfo: real SPIFFE cert with a 4-segment principal resolves to an MP-level leaf grant through the full provider->helper.getDomains path. This test would have failed before fix linkedin#2 (the exact-match lookup misses the 4-segment principal against a 2-segment registered prefix). All 30 SPIFFE-related tests pass (X509AuthTest 13 + X509SpiffeAuthIntegrationTest 6 + ZkClientUriDomainMappingHelperTest 11). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds server-side support for SPIFFE-issued client certificates in X509 auth. The principal is the ILM UID (path-after-/v2/), aligned with the LinkedIn ILM v2 design: trust-domain stripped, segment-prefix matching for ACL lookup. Key changes: - X509AuthenticationUtil.matchAndExtractSpiffeSAN extracts the ILM UID from v2 SPIFFE URIs. v1 SPIFFE URIs and user-identity URIs (/v<N>/user/...) fall through to existing URN / Subject-DN extraction. - X509AuthenticationConfig adds spiffe.sanMatchRegex as the operator- controlled trust-domain gate. - ZkClientUriDomainMappingHelper recursively walks the znode subtree below each domain. Only leaf znodes are registered as keys (path joined by '/'), letting multi-segment SPIFFE UIDs be expressed as nested znodes (whose names can't contain '/'). getDomains does exact-match then segment-prefix walk-up. clientUriToDomainNames is volatile with in-method snapshot. - Defense-in-depth: SPIFFE path extraction uses URI.getRawPath() and rejects any path containing '%' to prevent URL-decoding bypass of identity checks. Tests: 29/29 pass (X509AuthTest 13, X509SpiffeAuthIntegrationTest 6, ZkClientUriDomainMappingHelperTest 10; includes end-to-end SPIFFE-cert through X509ZNodeGroupAclProvider with real znode mapping). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…oduction path Two fixes addressing the code review on PR linkedin#142: 1. X509AuthenticationConfig.getSpiffeSanMatchPattern now uses double-checked locking with volatile fields and a dedicated lock object, matching the pattern already used by allowedClientIdAsAclDomains and other lazy-loaded fields in the same class. The previous lazy-init was a data race on the per-handshake hot path. 2. X509ZNodeGroupAclProvider's setDomainAuthUpdater lambda now calls helper.getDomains(clientId) instead of the raw map's getOrDefault, so the segment-prefix walk-up added to ZkClientUriDomainMappingHelper is reachable from the production authentication path. This is the znode-tree analogue of LinkedIn's documented acl-tool wildcard idiom (`--spiffe "application/<mp>/*"`); without this fix, MP-level prefix grants documented in the class javadoc would silently no-op. Adds testA4_SpiffeCertResolvesViaPrefixWalkUpToDomainAuthInfo: real SPIFFE cert with a 4-segment principal resolves to an MP-level leaf grant through the full provider->helper.getDomains path. This test would have failed before fix linkedin#2 (the exact-match lookup misses the 4-segment principal against a 2-segment registered prefix). All 30 SPIFFE-related tests pass (X509AuthTest 13 + X509SpiffeAuthIntegrationTest 6 + ZkClientUriDomainMappingHelperTest 11). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
242b58b to
45a3776
Compare
| // User-identity SPIFFE certs (issued to humans, not workloads) must NOT be promoted to a | ||
| // service principal, otherwise a user credential would be granted service-level ACL access. | ||
| // See LISPIFFE-ID spec: https://github.com/linkedin-multiproduct/gopki/blob/master/LISPIFFE-ID.md | ||
| private static final Pattern SPIFFE_USER_IDENTITY_PATH_PATTERN = | ||
| Pattern.compile("^/v\\d+/user(/.*)?$"); | ||
|
|
||
| // Matches LISPIFFE v2 workload paths and captures the ILM UID (the path after "/v2/"). | ||
| // The canonical ILM v2 principal is the full path-after-v2 (e.g. "application/foo-mp/bar-app"); | ||
| // ACL matching downstream is segment-prefix on this UID. v1 SPIFFE URIs are deliberately not | ||
| // matched here — they are a deprecated design with app-name collision across MPs. | ||
| private static final Pattern SPIFFE_V2_PATH_PATTERN = Pattern.compile("^/v2/(.+)$"); |
There was a problem hiding this comment.
As mentioned in another comment - do add support for both SPIFFE v1 and v2.
SPIFFE URI format: spiffe://<trust-domain/<version>/<unique-identity>
Examples of SPIFFE v1:
spiffe://trust-domain/v1/wl/<app-name>
Therefore parsing of the workload unique-identity will get wl/<app-name>
Remove the prefix wl to get the .
Examples SPIFFE v2:
User: spiffe://trust-domain/v2/user/<userId>
Online application: spiffe://trust-domain/v2/application/<ILM-unique-id>
Flyte Workflow (Although I think, this is not the usecase for ZK).
More details here: https://github.com/linkedin-multiproduct/gopki/blob/master/LISPIFFE-ID.md#2-uri-path
There was a problem hiding this comment.
Thanks @rgodha — addressed in 4a0533b.
The extractor now accepts both versions:
- v2 (existing):
spiffe://<td>/v2/<path>→ principal is the full path-after-/v2/(the ILM UID, e.g.application/foo-mp/bar-app). - v1 workload (new):
spiffe://<td>/v1/wl/<app-name>→ strip thewl/type prefix; principal is just<app-name>. Matches how legacy authZ handled v1 identities.
User-identity URIs (/v<N>/user/...) are still rejected for both versions — they must never be promoted to a service principal. Other v1 paths (e.g. v1/wf/ workflow) fall through to URN/DN since they are not in scope for ZK.
Test testRealCertWithSpiffeV1WlUriSanIsExtracted builds a real BouncyCastle-signed cert with a v1/wl SAN and asserts the extracted principal is the bare app-name. All 30 SPIFFE-related tests pass locally.
Per @rgodha's review comment, the extractor now accepts both SPIFFE versions instead of rejecting v1 outright: - v2 (existing): spiffe://<td>/v2/<path> → principal is the full path after /v2/ (the ILM UID, e.g. "application/foo-mp/bar-app"). - v1 workload (new): spiffe://<td>/v1/wl/<app-name> → strip the "wl/" type prefix; principal is just the app-name. This matches how legacy authZ handled v1 identities. Other v1 paths (e.g. v1/wf/ workflow) still fall through to URN/DN. User-identity URIs (/v<N>/user/...) continue to be rejected for both versions — they must never be promoted to a service principal. The test match regex broadens to ^spiffe://.*/v[12]/.*$ so existing test scaffolding exercises both versions. testSpiffeV1FallsBackToDn becomes testSpiffeV1WlAuth (now asserts extraction). The integration test for v1 likewise flips from fall-back-to-DN to v1/wl extraction. LISPIFFE-ID spec reference: https://github.com/linkedin-multiproduct/gopki/blob/master/LISPIFFE-ID.md#2-uri-path Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
4c137df to
4a0533b
Compare
Description
Adds server-side support for SPIFFE-issued client certificates in ZooKeeper's X509 authentication. When configured by an operator, ZK accepts both SPIFFE v1 and v2 URI SANs as workload principals and resolves them to ACL domains via the existing znode-tree group-ACL provider. Aligned with the LISPIFFE-ID spec and the documented
acl-tool ... --spiffe "<prefix>/*"wildcard idiom. JIRA: DEPEND-89172.Main changes (
zookeeper-server/src/main/java/.../auth/...):X509AuthenticationUtil.matchAndExtractSpiffeSANextracts the principal from SPIFFE URI SANs in two supported forms:spiffe://<td>/v2/<path>) — principal is the full ILM UID (path-after-/v2/), e.g.spiffe://prod.lipki/v2/application/foo-mp/bar-app→application/foo-mp/bar-app.spiffe://<td>/v1/wl/<app-name>) — principal is just<app-name>(thewl/type prefix is stripped, matching how legacy authZ handled v1 identities).Falls through to existing URN / Subject-DN extraction for:
/v<N>/user/...) — never promoted to a service principal (applies to both v1 and v2){v1/wl, v2}paths (e.g.v1/wf/workflow — not in scope for ZK)%in the path (defense-in-depth:URI.getRawPath()+ reject%blocks%2F-smuggling and%75seruser-bypass)X509AuthenticationConfigadds one operator-controlled SPIFFE knob:ssl.x509.spiffe.sanMatchRegex— coarse trust-domain / version gate (recommended:^spiffe://prod\.lipki/v[12]/.*$). Lazy-loaded via double-checked locking against a dedicated lock object withvolatilefields, matching the pattern used byallowedClientIdAsAclDomainsand peers in the same class.ZkClientUriDomainMappingHelperrecursively walks the znode subtree below each domain. Only leaf znodes are registered as keys (path joined by/), letting multi-segment SPIFFE UIDs be expressed as nested znodes (znode names can't contain/).getDomains(clientUri)does exact-match first, then segment-prefix walk-up on the/-aligned segments. The internal mapping field isvolatileandgetDomainssnapshots the reference once at function entry.X509ZNodeGroupAclProvider's updater lambda callshelper.getDomains(clientId)so the walk-up is reachable on every TLS handshake.Operator workflow:
Tests
The following tests were added/updated:
X509SpiffeAuthIntegrationTest(new file, 6 tests) — real BouncyCastle-signed certs throughX509AuthenticationProvider.handleAuthentication. Covers: v1/wl happy-path extraction, v2 happy-path extraction, user-identity rejection, non-SPIFFE URI fall-through, plus two defense-in-depth cases (%2F-smuggling rejected,%75seruser-bypass rejected).X509AuthTest(13 tests total; 9 SPIFFE tests added, 4 pre-existing kept) — mock-cert coverage: v1/wl happy path, v2 happy paths (application/...+workload/...forms), v1+v2 user-identity rejection, multi-SAN error path, URN+SPIFFE coexistence (SPIFFE preferred when both present), SPIFFE-configured-but-no-match falls to URN, SPIFFE-not-configured falls to URN.ZkClientUriDomainMappingHelperTest(11 tests total; 9 new, 2 pre-existing kept):testA2_RecursiveZNodeWalkRegistersMultiLevelClientUris— real znode tree exercises leaf-only registration + prefix walk-uptestA3_SpiffeCertResolvesThroughZNodeMappingToDomainAuthInfo— end-to-end: real SPIFFE cert →X509ZNodeGroupAclProvider.handleAuthentication→ exact-match leaf in znode tree →(x509, <domain>)authInfotestA4_SpiffeCertResolvesViaPrefixWalkUpToDomainAuthInfo— end-to-end via segment-prefix walk-up: real cert with a 4-segment principal resolves through a 2-segment MP-level leaf grant on the production code pathtestC_*(6 tests) — in-memorygetDomainsunit tests: exact-match, single-prefix walk-up, multi-segment union, strict segment alignment, URN-style id (no walk-up), null clientUriSpiffeAuthTestUtil(new) — shared test fixtures: BouncyCastle bootstrap, SPIFFE system-property set/clear, real cert builder with URI SANs, stubAcceptAllTrustManager/NoopKeyManager.X509TestHelpers.newCertWithSans(new) — companion tonewCertthat installs caller-supplied SANs instead of the default localhost SANs.Result of
mvn teston the SPIFFE-related test classes:```
Tests run: 30, Failures: 0, Errors: 0, Skipped: 0
```
Changes that Break Backward Compatibility
None.
ssl.x509.spiffe.sanMatchRegex, SPIFFE extraction is dormant (getSpiffeSanMatchPattern()returns null) and the existing URN / Subject-DN extraction flow is unchanged byte-for-byte./zookeeper/uri-domain-map/<domain>/urn:li:servicePrincipal(...;...)continue to resolve via exact-match ingetDomains. The prefix walk-up only fires when the exact-match misses, and only if the client URI contains/— URN strings never do.ssl.x509.spiffe.sanExtractRegex,ssl.x509.spiffe.sanExtractMatcherGroupIndex,ssl.x509.spiffe.clientIdPrefix) are removed. They were added in an earlier iteration of this work and were never set in any deployed configuration (verified against the prod `acm` ensemble's `zoo.cfg`). Under the doc-aligned design the principal is the full ILM UID (v2) or just the app-name (v1/wl), so per-cert extract regex / prefix knobs are no longer meaningful.🤖 Generated with Claude Code