Merged
52 changes: 41 additions & 11 deletions docs/impulse/docs/config/configuration.md
@@ -360,15 +360,18 @@ mode-resolution rules and what counts as a definition change.

 ## measurement_dimensions (optional)

-List of silver-layer `container_metrics` column names to surface into the
-gold-layer `measurement_dimension` table. Names pass through unchanged:
-whatever you list here is what you get in gold.
-
-Any column present in your silver `container_metrics` table is a valid
-entry — there is no closed allow-list. Typical choices include
-`container_id`, `uut_id`, `project`, `vehicle_key`, `file_name`,
+List of `container_metrics` column names to surface into the gold-layer
+`measurement_dimension` table. Names are matched **after**
+[`solver_config.container_metrics.column_name_mapping`](#solver-column-mappings-and-filters)
+has been applied — i.e. these are the **internal (post-mapping)** column
+names, not the physical silver column names. Each name passes through
+to gold verbatim, so the configured name is also the gold column name.
+
+Any column present in your post-mapping `container_metrics` DataFrame
+is a valid entry — there is no closed allow-list. Typical choices
+include `container_id`, `uut_id`, `project`, `vehicle_key`, `file_name`,
 `file_path`, `start_ts`, `stop_ts`, and `environment`, but any column
-your silver schema carries is fair game.
+your silver schema carries (under its internal name) is fair game.

 `container_id` is part of the default list and is recommended for any
 real-world config: it is the upsert key used by incremental processing
@@ -388,9 +391,36 @@ downstream joins and incremental runs.
 ]
 ```

-If any listed column is not present in the silver `container_metrics`
-table when the report runs, the run fails fast with a `ValueError`
-naming the missing columns.
+If any listed column is not present in the post-mapping
+`container_metrics` DataFrame when the report runs, the run fails fast
+with a `ValueError` naming the missing columns.
+
+### Worked example: physical name differs from internal name
+
+Suppose your silver `container_metrics` table has a physical column
+`my_measurement_id` (no `container_id`). Map it to the internal name
+in `solver_config`, then reference the internal name in
+`measurement_dimensions`:
+
+```json
+{
+  "query_engine": {
+    "solver": "KeyValueStoreSolver",
+    "solver_config": {
+      "container_metrics": {
+        "column_name_mapping": { "my_measurement_id": "container_id" }
+      }
+    }
+  },
+  "measurement_dimensions": ["container_id", "start_ts", "stop_ts"]
+}
+```
+
+The gold `measurement_dimension` table will have columns
+`container_id`, `start_ts`, `stop_ts`. Listing `"my_measurement_id"` in
+`measurement_dimensions` here would fail — by the time the framework
+selects the dimensions, the column has already been renamed to
+`container_id`.

 **Migration note (pre-0.1):** earlier versions exposed a fixed enum
 that renamed two silver columns on the way to gold (`project` →
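The rename-then-select order described in the worked example can be sketched in plain Python. This is only an illustration under stated assumptions, not the framework's code: `apply_column_name_mapping` and `select_dimensions` are hypothetical helpers, and plain lists of column names stand in for the real `container_metrics` DataFrame.

```python
# Hypothetical helpers for illustration only; the real framework works on a
# DataFrame inside its solver pipeline, not on plain lists of column names.


def apply_column_name_mapping(physical_columns, column_name_mapping):
    """Rename physical columns to internal names; unmapped names are kept."""
    return [column_name_mapping.get(name, name) for name in physical_columns]


def select_dimensions(internal_columns, measurement_dimensions):
    """Return the configured dimensions, or fail fast if any are missing."""
    missing = [c for c in measurement_dimensions if c not in internal_columns]
    if missing:
        raise ValueError(
            f"measurement_dimensions columns not present after mapping: {missing}. "
            f"Available columns: {internal_columns}."
        )
    return list(measurement_dimensions)


# Physical silver columns and the mapping from the worked example.
physical = ["my_measurement_id", "start_ts", "stop_ts", "project"]
mapping = {"my_measurement_id": "container_id"}

# Step 1: the mapping is applied first, so only internal names remain.
internal = apply_column_name_mapping(physical, mapping)

# Step 2: dimensions are selected by internal name and keep that name in gold.
gold_columns = select_dimensions(internal, ["container_id", "start_ts", "stop_ts"])

# Referencing the physical name fails, because it no longer exists after step 1.
try:
    select_dimensions(internal, ["my_measurement_id"])
    physical_name_error = None
except ValueError as exc:
    physical_name_error = str(exc)
```

The second call raises because selection only ever sees the renamed columns, which is why the config must list `container_id` rather than `my_measurement_id`.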
17 changes: 12 additions & 5 deletions docs/impulse/docs/data_model/silver_layer_schema.md
@@ -121,18 +121,25 @@ Every other column on `container_metrics` is **pass-through**: it lands in
 the gold `measurement_dimension` table verbatim if you list it in
 [`measurement_dimensions`](../config/configuration.md#measurement_dimensions-optional),
 and can be used in `container_filters.metric_filters`; the framework does
-not reference it under any specific internal name.
+not reference it under any specific internal name. Both `measurement_dimensions`
+entries and `metric_filters.column_name` references are matched against
+the **post-mapping** (internal) column names — i.e. after
+[`solver_config.container_metrics.column_name_mapping`](../config/configuration.md#solver-column-mappings-and-filters)
+has been applied. If your physical silver column has a different name,
+add it to that mapping and reference the internal name here.

 ### Additional columns commonly populated

 `container_metrics` typically carries additional metadata columns that
 get surfaced into the gold-layer `measurement_dimension` table when
 listed in the report's
 [`measurement_dimensions`](../config/configuration.md#measurement_dimensions-optional)
-config. Any column you list there must exist in this table — names
-pass through to gold unchanged. The columns below are common choices,
-but `measurement_dimensions` accepts any column you carry on this
-table.
+config. Any column you list there must exist on this table under its
+**post-mapping** (internal) name — see
+[`column_name_mapping`](../config/configuration.md#solver-column-mappings-and-filters)
+for how physical-to-internal renaming works. Those internal names pass
+through to gold unchanged. The columns below are common choices, but
+`measurement_dimensions` accepts any column you carry on this table.

 | Column | Type | Description |
 |---------------|----------|----------------------------------------------|
11 changes: 9 additions & 2 deletions src/impulse_reporting/config/config_parser.py
@@ -428,8 +428,15 @@ class ImpulseConfig(BaseModel):
     incremental : IncrementalConfig, optional
         Optional incremental processing configuration. Defaults to IncrementalConfig().
     measurement_dimensions : list of str, optional
-        Silver-layer ``container_metrics`` column names to surface into the
-        gold-layer ``measurement_dimension`` table. Defaults to
+        Column names to surface from ``container_metrics`` into the
+        gold-layer ``measurement_dimension`` table. Names are matched
+        **after** ``query_engine.solver_config.container_metrics.column_name_mapping``
+        has been applied — i.e. these are the internal (post-mapping)
+        column names, not the physical silver column names. If a silver
+        table uses a physical name like ``my_measurement_id`` mapped to
+        ``container_id``, list ``"container_id"`` here. Each listed name
+        lands in the gold table verbatim, so the configured name is also
+        the gold column name. Defaults to
         ``["container_id", "start_ts", "stop_ts"]``. The framework does not
         inject any column the user omits — keeping ``container_id`` in the
         list is recommended because it is the upsert key for incremental
22 changes: 16 additions & 6 deletions src/impulse_reporting/meta/container_dimensions.py
@@ -24,9 +24,14 @@ def get_dimension(
     of containers from the silver ``container_metrics`` table.

     Uses the solver filter pipeline (filter_container_tags -> filter_container_metrics)
-    to resolve the matching set of containers and their full metrics, then
-    selects exactly the columns listed in ``config.measurement_dimensions``.
-    Silver column names pass through to gold unchanged.
+    to resolve the matching set of containers and their full metrics.
+    ``filter_container_metrics`` applies ``container_metrics.column_name_mapping``
+    (physical → internal) before returning, so the DataFrame's columns
+    are already the internal (post-mapping) names. This method then
+    selects exactly the columns listed in ``config.measurement_dimensions``
+    — entries must therefore reference the **post-mapping** (internal)
+    names, not the physical silver column names. Post-mapping column
+    names pass through to gold unchanged.

     Parameters
     ----------
@@ -51,7 +56,7 @@ def get_dimension(
     ------
     ValueError
         If any column listed in ``config.measurement_dimensions`` is not
-        present in the silver ``container_metrics`` DataFrame.
+        present in the post-mapping ``container_metrics`` DataFrame.
     """
     measurement_dimensions = config.measurement_dimensions

@@ -64,8 +69,13 @@ def get_dimension(
     if missing:
         raise ValueError(
             "Configured measurement_dimensions columns are not present in "
-            f"the silver container_metrics table: {missing}. Available "
-            f"columns: {df.columns}."
+            f"the container_metrics DataFrame: {missing}. Available "
+            f"columns: {df.columns}. Note: measurement_dimensions entries "
+            "must be the post-mapping (internal) column names, i.e. the "
+            "names that exist after container_metrics.column_name_mapping "
+            "has been applied. If your physical silver column has a "
+            "different name, add it to that mapping and reference the "
+            "internal name here."
         )

     return df.select(*measurement_dimensions).transform(