Skip to content

added checking and logging for nan input#26

Draft
mkarim-rtx wants to merge 10 commits into
ngwpc-candidatefrom
nan_groundwater_flux_mohammed
Draft

added checking and logging for nan input#26
mkarim-rtx wants to merge 10 commits into
ngwpc-candidatefrom
nan_groundwater_flux_mohammed

Conversation

@mkarim-rtx
Copy link
Copy Markdown

@mkarim-rtx mkarim-rtx commented May 11, 2026

https://jira.nextgenwaterprediction.com/browse/NGWPC-10660

The issue involved invalid upstream groundwater and PET inputs propagating into the CFE model and producing NaN values in outputs such as POTENTIAL_ET, GW_STORAGE, DEEP_GW_TO_CHANNEL_FLUX, and Q_OUT. Investigation showed that the root cause was not the CFE physics itself, but invalid forcing and parameter values being passed into CFE from upstream workflows and parameter generation.

The fix focused on improving diagnostics and preventing unstable groundwater flux calculations while preserving the original invalid values for debugging and scientific transparency. Warning logs were added for:

  • invalid groundwater reservoir configuration values during initialization,
  • invalid groundwater reservoir state/parameter values during runtime flux calculations,
  • and invalid PET forcing values.

The warnings were implemented using throttled logging to avoid excessive duplicate messages while still clearly exposing the issue in the EWTS logs. The implementation was carefully updated so that invalid values are logged but not overwritten or silently replaced with defaults. As a result:

  • POTENTIAL_ET and GW_STORAGE remain NaN when upstream inputs are invalid,
  • groundwater flux calculations safely return zero flux for that timestep to prevent unstable numerical propagation into Q_OUT,
  • and the simulation continues running while preserving visibility of the underlying upstream issue.

Final validation confirmed that the EWTS logs now contain the expected warning messages for invalid PET forcing and invalid groundwater reservoir states/configuration, including duplicate-warning suppression behavior.

@mkarim-rtx mkarim-rtx marked this pull request as draft May 11, 2026 19:50
Comment thread src/bmi_cfe.c
Comment thread src/conceptual_reservoir.c
Comment thread src/bmi_cfe.c Outdated
Comment on lines +1015 to +1019
model->gw_reservoir.gw_storage = 0.0;
model->gw_reservoir.storage_max_m = 0.0;
model->gw_reservoir.storage_m = 0.0;
model->gw_reservoir.coeff_primary = 0.0;
model->gw_reservoir.exponent_primary = 1.0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this appropriate? For instance, you're removing any valid gw_storage value if exponent_primary is NaN.

More broadly, is it CFE's job to set default values for null data? What source are you using for why it's more appropriate to use 0.0 as the default gw_storage instead of 1.0 or 10.2384?

This all seems like user error for sending incomplete data. I would think the model should either warn the user that nulls were provided and generate null (garbage in, garbage out) or log it as an error and prevent the process from starting.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I revised the approach so CFE does not silently assign arbitrary default groundwater parameters when required config values are invalid.

For static groundwater configuration values such as gw_storage, max_gw_storage, Cgw, and expon, CFE now validates the inputs during initialization. If any required groundwater value is non-finite or physically invalid, initialization fails with a clear SEVERE log message. This avoids replacing valid user-provided values and avoids inventing unsupported defaults.

The groundwater runtime guard remains only as a last-resort protection against NaN propagation during flux calculation. It returns zero flux for that timestep but does not overwrite reservoir parameters.

For PET, the invalid value is a runtime BMI forcing input rather than an initialization parameter. The guard logs a throttled warning and uses zero PET for that timestep only to avoid NaN propagation, while clearly identifying the issue as an upstream forcing/BMI provider problem.

Copy link
Copy Markdown

@cmaynard-ngwpc cmaynard-ngwpc May 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@idtodd Should this be considered a FATAL error and an exception thrown or is logging a SEVERE message sufficient?

Comment thread src/cfe.c Outdated
potentialEtIssueWarned = 2;
}

evap_struct->potential_et_m_per_s = 0.0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be setting the value here? The most likely reason for this to occur (based on the BMI initialize function checking for NaN) is CFE being passed a bad value through Set_value. Reiterating my point from the previous comment, setting NaN to some default value in the middle of the run obscures the incorrect water model predictions that we're setting the water model to output.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. The previous runtime PET substitution inside cfe.c was masking invalid external forcing input by replacing non-finite PET values with 0.0 during model execution. Since water_potential_evaporation_flux is supplied through BMI Set_value, silently modifying the value inside the ET physics routine could make model output appear valid even though the forcing data was invalid.

I updated the fix so validation now occurs in the BMI Set_value() path before the forcing value enters the CFE model state. If water_potential_evaporation_flux is non-finite or negative, CFE now logs a FATAL error and returns BMI_FAILURE rather than substituting a default runtime value.

I also removed the runtime PET substitution logic from cfe.c and restored the original ET timestep calculations unchanged. This keeps the ET physics behavior intact while ensuring invalid external forcing data is rejected explicitly instead of being hidden during execution.

Comment thread src/bmi_cfe.c Outdated
@mkarim-rtx mkarim-rtx changed the base branch from development to ngwpc-candidate May 19, 2026 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants