diff --git a/docs/impulse/docs/config/configuration.md b/docs/impulse/docs/config/configuration.md
index c59a58c..471ff5c 100644
--- a/docs/impulse/docs/config/configuration.md
+++ b/docs/impulse/docs/config/configuration.md
@@ -360,15 +360,18 @@ mode-resolution rules and what counts as a definition change.
 
 ## measurement_dimensions (optional)
 
-List of silver-layer `container_metrics` column names to surface into the
-gold-layer `measurement_dimension` table. Names pass through unchanged:
-whatever you list here is what you get in gold.
-
-Any column present in your silver `container_metrics` table is a valid
-entry — there is no closed allow-list. Typical choices include
-`container_id`, `uut_id`, `project`, `vehicle_key`, `file_name`,
+List of `container_metrics` column names to surface into the gold-layer
+`measurement_dimension` table. Names are matched **after**
+[`solver_config.container_metrics.column_name_mapping`](#solver-column-mappings-and-filters)
+has been applied — i.e. these are the **internal (post-mapping)** column
+names, not the physical silver column names. Each name passes through
+to gold verbatim, so the configured name is also the gold column name.
+
+Any column present in your post-mapping `container_metrics` DataFrame
+is a valid entry — there is no closed allow-list. Typical choices
+include `container_id`, `uut_id`, `project`, `vehicle_key`, `file_name`,
 `file_path`, `start_ts`, `stop_ts`, and `environment`, but any column
-your silver schema carries is fair game.
+your silver schema carries (under its internal name) is fair game.
 
 `container_id` is part of the default list and is recommended for any
 real-world config: it is the upsert key used by incremental processing
@@ -388,9 +391,36 @@ downstream joins and incremental runs.
 ]
 ```
 
-If any listed column is not present in the silver `container_metrics`
-table when the report runs, the run fails fast with a `ValueError`
-naming the missing columns.
+If any listed column is not present in the post-mapping
+`container_metrics` DataFrame when the report runs, the run fails fast
+with a `ValueError` naming the missing columns.
+
+### Worked example: physical name differs from internal name
+
+Suppose your silver `container_metrics` table has a physical column
+`my_measurement_id` (no `container_id`). Map it to the internal name
+in `solver_config`, then reference the internal name in
+`measurement_dimensions`:
+
+```json
+{
+  "query_engine": {
+    "solver": "KeyValueStoreSolver",
+    "solver_config": {
+      "container_metrics": {
+        "column_name_mapping": { "my_measurement_id": "container_id" }
+      }
+    }
+  },
+  "measurement_dimensions": ["container_id", "start_ts", "stop_ts"]
+}
+```
+
+The gold `measurement_dimension` table will have columns
+`container_id`, `start_ts`, `stop_ts`. Listing `"my_measurement_id"` in
+`measurement_dimensions` here would fail — by the time the framework
+selects the dimensions, the column has already been renamed to
+`container_id`.
 
 **Migration note (pre-0.1):** earlier versions exposed a fixed enum
 that renamed two silver columns on the way to gold (`project` →
diff --git a/docs/impulse/docs/data_model/silver_layer_schema.md b/docs/impulse/docs/data_model/silver_layer_schema.md
index dc1c127..804e29d 100644
--- a/docs/impulse/docs/data_model/silver_layer_schema.md
+++ b/docs/impulse/docs/data_model/silver_layer_schema.md
@@ -121,7 +121,12 @@ Every other column on `container_metrics` is **pass-through**: it lands in
 the gold `measurement_dimension` table verbatim if you list it in
 [`measurement_dimensions`](../config/configuration.md#measurement_dimensions-optional),
 and can be used in `container_filters.metric_filters`; the framework does
-not reference it under any specific internal name.
+not reference it under any specific internal name. Both `measurement_dimensions`
+entries and `metric_filters.column_name` references are matched against
+the **post-mapping** (internal) column names — i.e. after
+[`solver_config.container_metrics.column_name_mapping`](../config/configuration.md#solver-column-mappings-and-filters)
+has been applied. If your physical silver column has a different name,
+add it to that mapping and reference the internal name here.
 
 ### Additional columns commonly populated
 
@@ -129,10 +134,12 @@ not reference it under any specific internal name.
 get surfaced into the gold-layer `measurement_dimension` table when listed
 in the report's
 [`measurement_dimensions`](../config/configuration.md#measurement_dimensions-optional)
-config. Any column you list there must exist in this table — names
-pass through to gold unchanged. The columns below are common choices,
-but `measurement_dimensions` accepts any column you carry on this
-table.
+config. Any column you list there must exist on this table under its
+**post-mapping** (internal) name — see
+[`column_name_mapping`](../config/configuration.md#solver-column-mappings-and-filters)
+for how physical-to-internal renaming works. Those internal names pass
+through to gold unchanged. The columns below are common choices, but
+`measurement_dimensions` accepts any column you carry on this table.
 
 | Column | Type | Description |
 |---------------|----------|----------------------------------------------|
diff --git a/src/impulse_reporting/config/config_parser.py b/src/impulse_reporting/config/config_parser.py
index 5a9db00..f2a0428 100644
--- a/src/impulse_reporting/config/config_parser.py
+++ b/src/impulse_reporting/config/config_parser.py
@@ -428,8 +428,15 @@ class ImpulseConfig(BaseModel):
     incremental : IncrementalConfig, optional
         Optional incremental processing configuration. Defaults to IncrementalConfig().
     measurement_dimensions : list of str, optional
-        Silver-layer ``container_metrics`` column names to surface into the
-        gold-layer ``measurement_dimension`` table. Defaults to
+        Column names to surface from ``container_metrics`` into the
+        gold-layer ``measurement_dimension`` table. Names are matched
+        **after** ``query_engine.solver_config.container_metrics.column_name_mapping``
+        has been applied — i.e. these are the internal (post-mapping)
+        column names, not the physical silver column names. If a silver
+        table uses a physical name like ``my_measurement_id`` mapped to
+        ``container_id``, list ``"container_id"`` here. Each listed name
+        lands in the gold table verbatim, so the configured name is also
+        the gold column name. Defaults to
         ``["container_id", "start_ts", "stop_ts"]``. The framework does not
         inject any column the user omits — keeping ``container_id`` in the
         list is recommended because it is the upsert key for incremental
diff --git a/src/impulse_reporting/meta/container_dimensions.py b/src/impulse_reporting/meta/container_dimensions.py
index 040bdb0..7e254fe 100644
--- a/src/impulse_reporting/meta/container_dimensions.py
+++ b/src/impulse_reporting/meta/container_dimensions.py
@@ -24,9 +24,14 @@ def get_dimension(
     of containers from the silver ``container_metrics`` table.
 
     Uses the solver filter pipeline (filter_container_tags -> filter_container_metrics)
-    to resolve the matching set of containers and their full metrics, then
-    selects exactly the columns listed in ``config.measurement_dimensions``.
-    Silver column names pass through to gold unchanged.
+    to resolve the matching set of containers and their full metrics.
+    ``filter_container_metrics`` applies ``container_metrics.column_name_mapping``
+    (physical → internal) before returning, so the DataFrame's columns
+    are already the internal (post-mapping) names. This method then
+    selects exactly the columns listed in ``config.measurement_dimensions``
+    — entries must therefore reference the **post-mapping** (internal)
+    names, not the physical silver column names. Post-mapping column
+    names pass through to gold unchanged.
 
     Parameters
     ----------
@@ -51,7 +56,7 @@ def get_dimension(
     ------
     ValueError
         If any column listed in ``config.measurement_dimensions`` is not
-        present in the silver ``container_metrics`` DataFrame.
+        present in the post-mapping ``container_metrics`` DataFrame.
     """
 
     measurement_dimensions = config.measurement_dimensions
@@ -64,8 +69,13 @@ def get_dimension(
     if missing:
         raise ValueError(
             "Configured measurement_dimensions columns are not present in "
-            f"the silver container_metrics table: {missing}. Available "
-            f"columns: {df.columns}."
+            f"the container_metrics DataFrame: {missing}. Available "
+            f"columns: {df.columns}. Note: measurement_dimensions entries "
+            "must be the post-mapping (internal) column names, i.e. the "
+            "names that exist after container_metrics.column_name_mapping "
+            "has been applied. If your physical silver column has a "
+            "different name, add it to that mapping and reference the "
+            "internal name here."
         )
 
     return df.select(*measurement_dimensions).transform(