Modifications to ForecastTrajectoryGapFiller#2369
Conversation
d4a7391 to
8b2f876
Compare
…_interpolation * upstream/master: (24 commits) houndCI removal (metoppv#2393) Add the Ensemble Copula Coupling "transformation" approach (metoppv#2303) Ensure blend_time is consistent with forecast_reference_time when resetting (metoppv#2392) Refine realization coordinate handling when clustering (metoppv#2361) Mobt1205 better accum temporal interp (metoppv#2390) Refactor pollen forecast period setting (metoppv#2385) Changes to how we do distance_to calculations (metoppv#2379) Mobt 1138 ignore git hash cli (metoppv#2388) Ensure that the SubperiodSelector plugin and CLI returns a cube with the model_id_attr, if requested. (metoppv#2386) Add option to update a scalar coordinate attribute as part of the StandardiseMetadata plugin (metoppv#2378) Allow cube-extraction to use a formatted time string (metoppv#2380) Removing a name from CONTRIBUTING.md (metoppv#2384) Support more complex forecast period filtering when training QRF (metoppv#2383) Mobt1161 precip phase decision tree (metoppv#2376) MOBT-1148 Add subperiod-selector tool (metoppv#2373) Revert "change ApplyDecisionTree categorical cube dtype from int32 to int16 (…" (metoppv#2382) change ApplyDecisionTree categorical cube dtype from int32 to int16 (metoppv#2371) Changed implementation from clipping to masking + added Unit Tests (metoppv#2366) Adding kwarg as CLI argument & updating acceptance tests. (metoppv#2377) Refactor Pollen index for daily and hourly to single plugin (metoppv#2372) ...
maxwhitemet
left a comment
There was a problem hiding this comment.
This looks good. I've added 3 comments on clarity and 1 on simplifying a code block.
| combined with accumulation, max, or min period-type specifiers. | ||
| Default is False. |
There was a problem hiding this comment.
I am a bit confused and may have missed something: the min, max, and accumulation parameters are not exposed here. Comments:
- Should these parameters be exposed here?
- It may be good to add backticks around 'max', 'min', and 'accumulation' (
max,min,accumulation) in-text so it's clear they are parameters. - Alternatively to point 2, perhaps merge these 3 parameters into one parameter with each of the former being an optional argument (e.g.
period_type: Literal['accumulation','min','max'] = None. This feels clearer and may simplify some of the error raising.
| interpolation_window_in_minutes: | ||
| Time window (in minutes) as +/- range around forecast source transitions. |
There was a problem hiding this comment.
I am confused by the interpolation_window_in_minutes parameter, having not read the code for a while, Could some clarification be added here – perhaps an example – on the difference between interval_in_minutes and interpolation_window_in_minutes.
I understand the former as 'supplying 60 means I expect 60min between all provided cubes, so if I provide a cube valid at 12PM and another valid at 2PM, an intermediate cube should be added at 1PM'. An example such as this, with the interpolation_window_in_minutes on top could be helpful.
| missing_periods = self._identify_gaps(sorted_cubelist) | ||
| periods_to_regenerate = self._identify_periods_to_regenerate(sorted_cubelist) |
There was a problem hiding this comment.
I may have missed this: is there handling for when there are no gaps? E.g. skipping if missing_periods is empty. I think this happens further down but this would be a good place for it.
| min_period = existing_periods[0] | ||
| max_period = existing_periods[-1] | ||
| missing_periods = [] | ||
| current = min_period + self.interval_in_minutes | ||
| current = min_period + self.interval_in_seconds | ||
| while current < max_period: | ||
| if current not in existing_periods: | ||
| missing_periods.append(current) | ||
| current += self.interval_in_minutes | ||
| current += self.interval_in_seconds |
There was a problem hiding this comment.
Given the output of `_get_forecast_periods()' is a 'Sorted list of unique forecast periods in seconds, then this chunk could be simplified by finding the set difference. E.g.
existing_periods = set(self._get_forecast_periods(cubelist))
min_period = min(existing_periods)
max_period = max(existing_periods)
possible_periods_given_interval = set(range(min_period, max_period + self.interval_in_seconds, self.interval_in_seconds))
missing_periods = sorted(possible_periods_given_interval - existing_periods)
maxwhitemet
left a comment
There was a problem hiding this comment.
This looks good. I've added 3 comments on clarity and 1 on simplifying a code block.
mo-jbeaver
left a comment
There was a problem hiding this comment.
A few comments, but overall happy with the changes. All the tests passed successfully.
| treat_period_as_instantaneous: | ||
| If True, period diagnostics (inputs with time bounds) are treated | ||
| as instantaneous values for interpolation. No period-specific | ||
| renormalisation or max/min constraints are applied. |
There was a problem hiding this comment.
Maybe needs to be rephrased to clarify that it will error if one of accumulation, min, or max inputs are also provided.
| ValueError: If interval_in_minutes is not set. | ||
| """ | ||
| if self.interval_in_minutes is None: | ||
| if self.interval_in_seconds is None: | ||
| raise ValueError( | ||
| "interval_in_minutes must be set to identify gaps in forecast period." |
There was a problem hiding this comment.
There seems to be a mismatch of interval_in_seconds and interval_in_minutes here
| """Create interpolation tasks for periods to regenerate at regular intervals. | ||
|
|
||
| Instead of regenerating only at the transition point, generates forecasts | ||
| at regular intervals (specified by interval_in_minutes) across the entire |
There was a problem hiding this comment.
I understand the initial variable is interval_in_minutes, but the check and code only uses interval_in_seconds .
| { | ||
| "interval_in_minutes": 60, | ||
| "accumulation": True, | ||
| "treat_period_as_instantaneous": True, | ||
| }, | ||
| "treat_period_as_instantaneous cannot be combined", | ||
| ), # Conflicting period handling modes |
There was a problem hiding this comment.
Does this need to be repeated for max and min too?
| np.testing.assert_almost_equal(result[0].data, 4.0) | ||
| np.testing.assert_almost_equal(result[1].data, 7.0) |
There was a problem hiding this comment.
A short explanation may be need for how these numbers are determined
| Expected behavior: | ||
| - Regeneration fills only interior 1-hour intervals in window: 4, 5, 6, 7, 8 | ||
| (3 and 9 are boundary inputs and are not regenerated) | ||
| - Gap filling fills remaining missing intervals: 10, 11 | ||
| - Original cubes included: 3, 9, 12 | ||
|
|
||
| Result: [3, 4, 5, 6, 7, 8, 9, 10, 11, 12] |
There was a problem hiding this comment.
This is a good explanation for the difference between gap and regeneration fills. Maybe consider putting this in the plugin to.
Related to https://github.com/metoppv/mo-blue-team/issues/783
Description
This PR makes some edits to the ForecastTrajectoryGapFiller, mainly for the aim of being able to interpolate accumulation fields, like instantaneous fields e.g. interpolating a T+3 1 hour precipitation accumulation field, from a fields at T+2 and T+4, without an assumption that the interpolation is actually an attempt to disaggregate an accumulation field e.g. disaggregating a 3 hour precipitation accumulation field into 1 hour precipitation accumulation field.
This PR therefore provides a pathway through the ForecastTrajectoryGapFiller (and TemporalInterpolation) plugins for accumulation field interpolation, modifies bounds handling of the accumulation fields, and converts quite a lot of time-based processing in the ForecastTrajectoryGapFiller to use seconds to consistency with the coordinates being handled.
Testing: