Skip to content

RealizationSelection plugin modification#2364

Open
gavinevans wants to merge 23 commits into
metoppv:masterfrom
gavinevans:mobt_783_realization_selection_error
Open

RealizationSelection plugin modification#2364
gavinevans wants to merge 23 commits into
metoppv:masterfrom
gavinevans:mobt_783_realization_selection_error

Conversation

@gavinevans

@gavinevans gavinevans commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Related to https://github.com/metoppv/mo-blue-team/issues/783

improver_test_data PR: metoppv/improver_test_data#140

Description
This PR makes a number of updates to the RealizationSelection plugin following trying to use the plugin with more realistic data:

  • Building on changes made in Support resetting the forecast_reference_time when clustering #2359, the forecast_reference_time and forecast_period on the input is reset to be from a common forecast_reference_time.
  • Add an exception is no forecast cubes are provided as an input.
  • Alter how the realizations are selected from the model cubes, so that the index of the realization is used, rather than the realization number on the coordinate.
  • Add the "cluster_sources" attribute to the result cube, from the cluster_cube, as this attribute is expected by the ForecastTrajectoryGapFiller plugin.
  • Alter how the name of the primary model is identified. Now the cluster_sources attribute is used, rather than expecting a "mosg__model_configuration" to be present on the cluster_cube. This attribute wouldn't be expected to be present on the cluster_cube.
  • Alter how the nearest secondary input is identified. This was an issue, for example, if one secondary input finished at T+12, the T+12:15 forecast would get mapped to using the secondary input from T+12, even though that input is not available at T+12:15. Therefore, the nearest, larger forecast period is now identified e.g. T+13, and the mapping from this forecast period is used.

Testing:

  • Ran tests and they passed OK
  • Added new tests for the new feature(s)

@gavinevans gavinevans marked this pull request as draft April 21, 2026 14:42
…_realization_selection_error

* mobt_783_reset_cycletime_when_clustering:
  Modifications
  Modifications following review.
  Modifications following review comments.
  Add handling for cube without forecast_reference_time coordinate.
  Add cycletime argument to realization_cluster_and_match, so that different inputs can be reset to have a common forecast_reference_time.
…ion_selection_error

* upstream/master:
  Support resetting the forecast_reference_time when clustering (metoppv#2359)
  Added support for equality operator (metoppv#2362)
@gavinevans gavinevans changed the title Add exception to RealizationSelection plugin RealizationSelection plugin modification Apr 24, 2026
@gavinevans gavinevans marked this pull request as ready for review April 24, 2026 16:42
model_id_attr (str):
The name of the cube attribute used to identify the model source.
cycletime (str):
The forecast_reference_time on the input forecast cubes will be reset to

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (84 > 79 characters)

)
result_cube = MergeCubes()(CubeList(selected_cubes))
if "cluster_sources" in cluster_cube.attributes:
result_cube.attributes["cluster_sources"] = cluster_cube.attributes[

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (80 > 79 characters)

n_realizations = len(model_cube.coord("realization").points)
if realization_index < 0 or realization_index >= n_realizations:
raise ValueError(
f"Realization index {realization_index} is out of bounds for "

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)

if not model_cube.coords("realization"):
selected = model_cube
selected.add_aux_coord(
AuxCoord(cluster_idx, standard_name="realization", units="1")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)


Raises:
ValueError: If no forecast cube is found for a specified model name.
ValueError: If a specified realization index is out of bounds for the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)

"""
Find the nearest forecast period in the secondary mapping to the requested
forecast period.
Find the nearest forecast period in the secondary mapping that is greater

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)

reset to this value. The forecast periods will be adjusted accordingly
with the validity times kept fixed. cycletime should be provided in the
format YYYYMMDDTHHMMZ (e.g., 20240101T0000Z). If not provided, the
forecast_reference_time on the input cubes will be left unchanged.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)

cycletime: The forecast_reference_time on the input forecast cubes will be
reset to this value. The forecast periods will be adjusted accordingly
with the validity times kept fixed. cycletime should be provided in the
format YYYYMMDDTHHMMZ (e.g., 20240101T0000Z). If not provided, the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)


cycletime: The forecast_reference_time on the input forecast cubes will be
reset to this value. The forecast periods will be adjusted accordingly
with the validity times kept fixed. cycletime should be provided in the

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (87 > 79 characters)

source.

cycletime: The forecast_reference_time on the input forecast cubes will be
reset to this value. The forecast periods will be adjusted accordingly

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (86 > 79 characters)

…ion_selection_error

* upstream/master: (22 commits)
  Refine realization coordinate handling when clustering (metoppv#2361)
  Mobt1205 better accum temporal interp (metoppv#2390)
  Refactor pollen forecast period setting (metoppv#2385)
  Changes to how we do distance_to calculations (metoppv#2379)
  Mobt 1138 ignore git hash cli (metoppv#2388)
  Ensure that the SubperiodSelector plugin and CLI returns a cube with the model_id_attr, if requested. (metoppv#2386)
  Add option to update a scalar coordinate attribute as part of the StandardiseMetadata plugin (metoppv#2378)
  Allow cube-extraction to use a formatted time string (metoppv#2380)
  Removing a name from CONTRIBUTING.md (metoppv#2384)
  Support more complex forecast period filtering when training QRF (metoppv#2383)
  Mobt1161 precip phase decision tree (metoppv#2376)
  MOBT-1148 Add subperiod-selector tool (metoppv#2373)
  Revert "change ApplyDecisionTree categorical cube dtype from int32 to int16 (…" (metoppv#2382)
  change ApplyDecisionTree categorical cube dtype from int32 to int16 (metoppv#2371)
  Changed implementation from clipping to masking + added Unit Tests (metoppv#2366)
  Adding kwarg as CLI argument & updating acceptance tests. (metoppv#2377)
  Refactor Pollen index for daily and hourly to single plugin (metoppv#2372)
  EPPT-3259 fix fsi duplicate metadata (metoppv#2370)
  Changes to Pollen classes for refactoring cube long names and units of concentration (metoppv#2368)
  Eppt 3223 lifted index investigate why the values are wrong (metoppv#2365)
  ...
model_id_attr: The name of the cube attribute used to identify the model
source.

cycletime: The forecast_reference_time on the input forecast cubes will be

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (86 > 79 characters)

frt=datetime(2017, 1, 10, 3),
)
deterministic_cube.coord("forecast_period").points = [3600]
deterministic_cube.attributes["mosg__model_configuration"] = "primary_model"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (80 > 79 characters)



def test_realizationselection_deterministic_input_no_realization_coord():
"""Test deterministic forecast input is handled without a realization coord."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (83 > 79 characters)

cubes = CubeList([cluster_cube])

plugin = RealizationSelection(forecast_period=3600)
with pytest.raises(ValueError, match="No forecast cubes found in input cubes."):

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (84 > 79 characters)

}
cluster_cube = _make_cluster_cube_for_selection(primary_map, secondary_map)
# Forecast cubes: secondary model, fp=4000 (nearest is 3600)
# Forecast cubes: secondary model, fp=4000 (nearest greater-or-equal is 5400)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)

def test_realizationselection_secondary_nearest_fp():
"""Test nearest forecast period is used from secondary mapping."""
def test_realizationselection_secondary_nearest_greater_or_equal_fp():
"""Test nearest greater-or-equal forecast period is used from secondary mapping."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (87 > 79 characters)


def test_realizationselection_secondary_nearest_fp():
"""Test nearest forecast period is used from secondary mapping."""
def test_realizationselection_secondary_nearest_greater_or_equal_fp():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected 2 blank lines, found 1

)
result = plugin.process(cubes)

assert result.coord("forecast_reference_time").cell(0).point._to_real_datetime() == datetime.strptime("20170110T0400Z", "%Y%m%dT%H%MZ")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (139 > 79 characters)

cluster_cube = _make_cluster_cube_for_selection(primary_map)

# Initial forecast period is 7200s with forecast reference time at 03:00
# and validity at 05:00. Setting cycletime to 04:00 should keep validity fixed

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)



def test_realizationselection_cycletime():
"""Test that cycletime resets forecast_reference_time and forecast_period."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)

…ion_selection_error

* upstream/master:
  Ensure blend_time is consistent with forecast_reference_time when resetting (metoppv#2392)
frt_coord.rename("blend_time")
frt_coord.bounds = None
frt_dims = cube.coord_dims("forecast_reference_time")
cube.add_aux_coord(frt_coord, data_dims=frt_dims if frt_dims else None)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (87 > 79 characters)


If blend_time exists on at least one selected cube, any selected cube
without blend_time is given one copied from forecast_reference_time. If
cycletime is provided, reset_forecast_reference_time_and_period is applied

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)

selected_cubes.append(selected)
return selected_cubes

def _ensure_blend_time_on_selected_cubes(self, selected_cubes: list[Cube]) -> None:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (87 > 79 characters)

result.coord("blend_time").points,
result.coord("forecast_reference_time").points,
)
assert result.coord("forecast_reference_time").cell(0).point._to_real_datetime() == datetime.strptime("20170110T0400Z", "%Y%m%dT%H%MZ")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (139 > 79 characters)



def test_realizationselection_blend_time_propagated_and_aligned_with_cycletime():
"""Test blend_time is added to all selected cubes and aligned to cycletime."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)

assert result.coord("forecast_period").points[0] == 3600


def test_realizationselection_blend_time_propagated_and_aligned_with_cycletime():

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)

"""Remove blend_time coordinate from all selected cubes if present on any.

blend_time is removed to avoid ambiguity in the merged output, as selected
cubes may come from different source models with differing blend_time values.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (85 > 79 characters)

) -> None:
"""Remove blend_time coordinate from all selected cubes if present on any.

blend_time is removed to avoid ambiguity in the merged output, as selected

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (82 > 79 characters)

if not model_cube.coords("realization"):
selected = model_cube
selected.add_aux_coord(
DimCoord(cluster_idx, standard_name="realization", units="1")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (81 > 79 characters)



def test_realizationselection_blend_time_removed_from_selected_cubes():
"""Test blend_time is removed from all selected cubes when present on any input."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (87 > 79 characters)

@mo-jbeaver mo-jbeaver left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment, but overall happy with the changes. Tests ran successfully.

"""
self.forecast_period = forecast_period
self.model_id_attr = model_id_attr
self.cycletime = cycletime

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a check needed here to make sure it is in the correct YYYYMMDDTHHMMZ format?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some validation of the cycletime format.

@mo-jbeaver mo-jbeaver removed their assignment Jun 10, 2026
Comment thread improver/clustering/realization_clustering.py Outdated
Comment thread improver/clustering/realization_clustering.py
Comment thread improver/clustering/realization_clustering.py
@maxwhitemet

Copy link
Copy Markdown
Contributor

Only some small comments added. Happy to approve otherwise 👍

@maxwhitemet maxwhitemet left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gavinevans. Happy to approve 👍

@gavinevans gavinevans assigned mo-jbeaver and unassigned maxwhitemet Jun 10, 2026

@mo-jbeaver mo-jbeaver left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy with the update and additional tests ran successfully.

@mo-jbeaver mo-jbeaver assigned gavinevans and unassigned mo-jbeaver Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants