Serialize the dynamic core boundary temperature plugin #6918

Open
Francyrad wants to merge 3 commits into geodynamics:main from
Francyrad:dynamic_core_restart_fix

@Francyrad
Contributor

This PR serializes the state of the dynamic core boundary temperature plugin so checkpoint/restart preserves its internal state across runs.

It also keeps compatibility with older checkpoints by falling back to the legacy restart data stored in the core statistics postprocessor.

Refs #6744.

Contributor

@bangerth bangerth left a comment

The addition of the serialize(), save(), and load() functions is undoubtedly correct. I must admit that I don't understand why the rest is necessary. Can you explain?

In principle, loading from a checkpoint should just re-create the exact same state we had before. There shouldn't be a need to store something like a flag that describes whether we are just resuming.

Comment on lines +44 to -45
core_data = {};
is_first_call = true;
core_data.is_initialized = false;
Contributor

Is this part unrelated to the current purpose of getting things to work with serialization? I don't see the connection, but I also don't see how it's supposed to work: core_data = {} does not actually initialize anything (I think) because the CoreData class has no constructor :-(

Contributor Author

> Is this part unrelated to the current purpose of getting things to work with serialization? I don't see the connection, but I also don't see how it's supposed to work: core_data = {} does not actually initialize anything (I think) because the CoreData class has no constructor :-(

This was related to making the restart path deterministic.

Before this patch, only core_data.is_initialized was explicitly set in the constructor, while the other members of core_data were left uninitialized until later. Since the restart path now relies on core_data before the normal prm-based initialization path, I wanted the whole struct to start from a well-defined zero state.

My understanding is that core_data = {} does value-initialize the aggregate and therefore zero-initialize all scalar members, even though CoreData has no user-defined constructor. But I agree that this is not very explicit and can easily be misread.
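The language rule in question can be checked in isolation. The following is a minimal sketch, assuming a stand-in struct (`CoreDataSketch` is hypothetical and only mirrors a few member names mentioned in this thread; it is not the actual CoreData class). For an aggregate without default member initializers, assigning `{}` copies from a value-initialized temporary, so every scalar member ends up zero or false.

```cpp
#include <cassert>

// Hypothetical stand-in for CoreData (an assumption, not the ASPECT class):
// an aggregate with scalar members, no user-declared constructor and no
// default member initializers.
struct CoreDataSketch
{
  double Ti;
  double Ri;
  double Xi;
  bool   is_initialized;
};

// Fill an object with non-zero values, then reset it with `= {}`.
inline CoreDataSketch reset_with_empty_braces()
{
  CoreDataSketch d;
  d.Ti = 4000.0;
  d.Ri = 1.2e6;
  d.Xi = 0.1;
  d.is_initialized = true;
  assert(d.is_initialized);

  // Assigns from a temporary created by empty aggregate initialization:
  // each member is value-initialized, i.e. 0.0 for double, false for bool.
  d = {};
  return d;
}
```

Calling `reset_with_empty_braces()` from any test driver returns an object whose members are all zero or false. Note that a plain declaration `CoreDataSketch d;` without braces leaves the members indeterminate, which is the case the brace form avoids.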

Contributor

It's never clear to me what happens with default initialization of members of type double or int or bool. Let me just write a constructor for the CoreData structure, so that we don't have to guess here. I think it's right to just split this part of the patch off to a separate one.
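For illustration, an explicit constructor removes the guesswork because default-initialization then always runs it. This is only a sketch under assumptions: `CoreDataWithConstructor` and `make_default_core_data` are hypothetical names, the member list is a small subset, and the zero defaults are a choice made here, not necessarily what #6924 does.

```cpp
#include <cassert>

// Hypothetical variant of the struct with a user-provided default
// constructor (names and default values are assumptions for illustration).
struct CoreDataWithConstructor
{
  double Ti;
  double Ri;
  double Xi;
  bool   is_initialized;

  // Every member gets a well-defined value, regardless of how the
  // object is declared.
  CoreDataWithConstructor()
    : Ti(0.0), Ri(0.0), Xi(0.0), is_initialized(false)
  {}
};

// A plain declaration without braces is enough here, because
// default-initialization of a class with a constructor calls it.
inline CoreDataWithConstructor make_default_core_data()
{
  CoreDataWithConstructor d;
  return d;
}
```

With this form, a reader does not need to recall the value-initialization rules for `double`, `int`, or `bool` to know the starting state.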

Contributor

See #6924.

@Francyrad
Contributor Author

> The addition of the serialize(), save(), and load() functions is undoubtedly correct. I must admit that I don't understand why the rest is necessary. Can you explain?
>
> In principle, loading from a checkpoint should just re-create the exact same state we had before. There shouldn't be a need to store something like a flag that describes whether we are just resuming.

Thanks, that is a fair point. The intent was not to store a persistent “we are resuming now” flag.

What I actually wanted to serialize is only the persistent state of the core solver itself: Ti, Ri, Xi, Q, dR_dt, dT_dt, dX_dt, and is_initialized. By contrast, quantities such as Qs, Qr, Qg, Ql, Es, Er, Eg, Ek, El, Eh, as well as dt, H, and Q_OES, are derived quantities that are recomputed from the restored state and from the current time step.

The extra logic in update() was only there to ensure that, on the first call after load(), we do not reinitialize from the prm values and that we rebuild these derived quantities before the first resumed solve. So resumed_from_checkpoint is only a local one-call helper inferred from the restored core_data.is_initialized; it is not itself checkpointed state.

@bangerth
Contributor

bangerth commented Apr 2, 2026

I see. I think there are two options:

  • The data is small enough that it's OK to also serialize the computed/derived quantities.
  • You could just write out the underlying data in save(), and in load() first read in that underlying data and then re-compute derived quantities there.

Do either of these options seem reasonable?
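The second option can be sketched as follows. This is a standard-library-only illustration, not the ASPECT or Boost serialization interface: `PersistentState`, `DerivedState`, `recompute_derived`, `SolverSketch`, and `round_trip` are all hypothetical names, the string-based `save()`/`load()` signatures are assumptions, and the "derived" formula is a placeholder with no physical meaning. The point is only the structure: `save()` writes the underlying state, and `load()` reads it back and immediately rebuilds the derived quantities.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical persistent state (a subset of the names listed in the thread).
struct PersistentState
{
  double Ti = 0.0;
  double Ri = 0.0;
  double Xi = 0.0;
  bool   is_initialized = false;
};

// Hypothetical derived data; stands in for the recomputed quantities.
struct DerivedState
{
  double placeholder = 0.0;
};

// Placeholder formula only (an assumption, not a physical relation).
inline DerivedState recompute_derived(const PersistentState &s)
{
  DerivedState d;
  d.placeholder = s.Ti * s.Xi;
  return d;
}

class SolverSketch
{
public:
  PersistentState state;
  DerivedState    derived;

  // Write only the underlying state; derived data is not stored.
  std::string save() const
  {
    std::ostringstream out;
    out.precision(17);  // enough significant digits to round-trip a double
    out << state.Ti << ' ' << state.Ri << ' ' << state.Xi << ' '
        << state.is_initialized;
    return out.str();
  }

  // Read the underlying state, then rebuild the derived data right away.
  void load(const std::string &data)
  {
    std::istringstream in(data);
    in >> state.Ti >> state.Ri >> state.Xi >> state.is_initialized;
    derived = recompute_derived(state);
  }
};

// Helper that saves one object and loads the result into a fresh one.
inline SolverSketch round_trip(const SolverSketch &original)
{
  SolverSketch restored;
  restored.load(original.save());
  return restored;
}
```

With this layout the object is fully consistent as soon as `load()` returns, so no extra "first call after restart" logic is needed in the update path. The first option (serializing everything) would instead add the derived members to `save()` and `load()` and drop the recompute step.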

@bangerth
Contributor

I think that the issue with initialization is more complicated, see #6924. If you want, take these issues out of the pull request and only add the three serialization functions, then we can merge this one independently.

@Francyrad
Contributor Author

I can start to work on it Tuesday
