feat(etl)!: pluggable audit-column stampers — clean record shape by default#24
Merged
BREAKING: `bcli etl sync` no longer injects audit/metadata columns by default. The dlt pipeline emits a clean record shape; extra columns are opt-in. Keeps vendor-specific audit conventions out of the OSS package.

- New `bcli.etl.stampers` entry-point group: a plugin exposes a zero-arg callable returning a Stamper (per-page row transform). `bcli.etl._stamper_factory.build_stampers(names)` resolves a name list to concrete stampers, skipping unknown or failing names with a warning; this mirrors the telemetry / ask factory dispatch. Stays in the generic tier (no bcli.* import; enforced by the existing CI grep test).
- New EtlConfig `[etl] stampers = [...]` opt-in + `bcli etl sync --stamper NAME` (repeatable) per-run override.
- `bcli_profile()` drops its built-in audit-column flag; generic audit_stamper / company_id_stamper stay.
- Scrubbed vendor framing from ETL help text + docstrings.

Downstream packages register their own audit stampers under the new group and opt in via `[etl] stampers`, requiring bcli >= 0.6.0.
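To make the "zero-arg callable returning a Stamper" contract concrete, here is a minimal sketch of what a downstream plugin might look like. The `Stamper` alias follows the type given in this PR; the factory name `loaded_at_stamper`, the `_loaded_at` column, and the `apply_stampers` helper are hypothetical and are not part of bcli.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Callable

# Per-page row transform, matching the Stamper type described in this PR.
Stamper = Callable[[list[dict]], list[dict]]


def loaded_at_stamper() -> Stamper:
    """Hypothetical zero-arg factory that a downstream plugin might expose."""

    def stamp(rows: list[dict]) -> list[dict]:
        # One timestamp per page, copied onto every row.
        # New dicts are built, so the input rows are not mutated.
        loaded_at = datetime.now(timezone.utc).isoformat()
        return [{**row, "_loaded_at": loaded_at} for row in rows]

    return stamp


def apply_stampers(rows: list[dict], stampers: list[Stamper]) -> list[dict]:
    """Hypothetical helper: run stampers in order over one page of rows."""
    for stamper in stampers:
        rows = stamper(rows)
    return rows
```

With an empty stamper list the page passes through unchanged, which corresponds to the clean default record shape. A downstream package would presumably expose a factory like this under the `bcli.etl.stampers` entry-point group in its own packaging metadata.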
Summary
Makes the OSS dlt ETL pipeline emit a clean, vendor-neutral record shape. `bcli etl sync` no longer injects audit/metadata columns by default: any extra columns are opt-in through a pluggable stamper mechanism, so vendor-specific audit conventions live in downstream packages rather than the OSS core.

Mirrors the OSS-mechanism / downstream-content split already used for telemetry sinks, extract backends, packs, and ask context providers.
What changed
- `bcli.etl.stampers` entry-point group. A plugin exposes a zero-arg callable returning a `Stamper` = `Callable[[list[dict]], list[dict]]` (per-page row transform). `bcli.etl._stamper_factory.build_stampers(names)` resolves a config name-list to concrete stampers; unknown/failing names are skipped with a warning so one broken plugin never aborts a sync.
- `[etl] stampers = [...]` config (`EtlConfig`, wired into `BCConfig`) + `bcli etl sync --stamper NAME` (repeatable) per-run override.
- `bcli_profile()` drops its built-in audit-column flag in favour of the generic `stampers=[...]` argument. Generic `audit_stamper` / `company_id_stamper` helpers stay in OSS.
- `_stamper_factory.py` stays in the generic tier: no `bcli.*` import (enforced by the existing CI grep test, which now covers it).

Breaking change
`bcli etl sync` output schema changes: the previously-default audit columns disappear unless a stamper plugin is installed + opted in. Documented under `## [0.6.0]` in CHANGELOG with a migration note. bcli just shipped 0.5.0, so 0.6.0 is the right home.

Downstream
Downstream packages register their own audit-column stampers under the new group and opt in via `[etl] stampers`, requiring bcli >= 0.6.0.

Test plan
- `uv run pytest`: 946 passed, 5 skipped, 0 failed
- `uv run ruff check src/ tests/`: clean
- `tests/test_etl/test_stamper_factory.py`: discovery, ordered resolution, unknown-name skip, failing-factory skip, non-callable drop
- `bcli etl sync --help` shows `--stamper`; copy is vendor-neutral
- `discover_stamper_factories() == {}` and `build_stampers(["audit"])` warns + returns `[]` (no plugins shipped in OSS)
- `bcli.*`-import test extended to `_stamper_factory.py`