fix: prevent gui->reset corruption from re-entrant GTK event processing #20723
abulka wants to merge 1 commit into darktable-org:master
During _dev_load_requested_image(), widget creation/destruction and pixelpipe rebuilding can trigger re-entrant GTK event processing, allowing other code paths to run their own ++/-- cycles on darktable.gui->reset. If these complete out of order, the final decrement takes gui->reset to -1, permanently disabling all IOP module GUI callbacks for the remainder of the session.
Replace the relative ++/-- with a save/force-restore pattern so that gui->reset is always correctly restored regardless of re-entrant modifications. A diagnostic warning is logged when corruption is detected and corrected.
Primarily observed on macOS (the Quartz backend dispatches events more aggressively during widget operations), but the underlying code issue is platform-independent.
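For readers who want to see the two patterns side by side, here is a minimal sketch, not the code from this PR. The names `gui_reset`, `reentrant_unbalanced_path`, `load_image_relative` and `load_image_save_restore` are hypothetical stand-ins, and the re-entrant path is assumed to perform one stray decrement. That assumption is the only way this sketch reaches -1, since balanced ++/-- pairs cannot.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for darktable.gui->reset:
   nonzero means "GUI callbacks should ignore events". */
static int gui_reset = 0;

/* Assumption for illustration: a re-entrant code path that is NOT balanced
   and performs one decrement too many. */
static void reentrant_unbalanced_path(void)
{
  ++gui_reset;
  --gui_reset;
  --gui_reset; /* stray decrement */
}

/* Relative pattern: the final value depends on whatever ran in between. */
static int load_image_relative(void)
{
  ++gui_reset;
  reentrant_unbalanced_path();
  --gui_reset;
  return gui_reset;
}

/* Save/force-restore pattern: the entry value is restored unconditionally.
   Returns 1 if a mismatch was detected and corrected, 0 otherwise. */
static int load_image_save_restore(void)
{
  const int saved = gui_reset;
  gui_reset = saved + 1;
  reentrant_unbalanced_path();
  const int corrected = (gui_reset != saved + 1);
  if(corrected)
    fprintf(stderr, "gui reset is %d, expected %d; restoring %d\n",
            gui_reset, saved + 1, saved);
  gui_reset = saved;
  return corrected;
}
```

With the counter starting at 0, the relative variant ends at -1 while the save/restore variant ends at 0 and reports the mismatch. Note that force-restoring hides the imbalance from the caller rather than removing its cause.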
Even out of order I don't see how we can reach -1, it is impossible to have a
In addition to what Pascal said, there are nearly 200 places where gui->reset is incremented/decremented, and all of them would need updating if this were an actual problem. For correctness with the approach of only allowing 0/1, you would also need to block on entry if the flag was already nonzero.
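One way to read "block on entry" is sketched below, under the assumption that blocking means refusing to enter rather than waiting. `gui_reset`, `reset_try_enter` and `reset_leave` are hypothetical names, not darktable functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 0/1 flag standing in for gui->reset. */
static int gui_reset = 0;

/* Enter the guarded section only if nobody else holds the flag.
   Returns false when the flag was already nonzero. */
static bool reset_try_enter(void)
{
  if(gui_reset != 0) return false;
  gui_reset = 1;
  return true;
}

/* Leave the guarded section; only the successful enterer should call this. */
static void reset_leave(void)
{
  gui_reset = 0;
}
```

A caller would skip its work when `reset_try_enter()` returns false. Unlike a counter, this form does not support nested sections.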
You're right that with properly balanced However, I have diagnostic evidence that the corruption is real. Before applying the fix, I added instrumentation that checked e.g. here is an example from my logging After applying the save/restore fix, the diagnostic warning fires regularly (~20-30% of duplicate-and-switch operations on macOS/M1), each time detecting and correcting the corruption. Without the fix, each occurrence would require a full restart of Darktable. The most likely mechanism is a data race: Regarding the ~200 other ++/-- sites: this fix does not change gui->reset to a 0/1 flag. The rest of the codebase continues to use ++/-- normally. The save/restore is applied only to Arguably the proper solution would be making
I found a possible unbalanced pair due to an early return in darktable/src/iop/colormapping.c, lines 906 to 919 in 30c8544. There should be a
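The general shape of such an imbalance can be sketched as follows. This is not the code from colormapping.c. `update_buggy`, `update_fixed` and `gui_reset` are hypothetical, and the sketch assumes the increment happens before the early return.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for gui->reset. */
static int gui_reset = 0;

/* Buggy shape: the early return skips the matching decrement,
   so every call with have_data == false leaks one increment. */
static void update_buggy(bool have_data)
{
  ++gui_reset;
  if(!have_data) return;
  /* ... update widgets here ... */
  --gui_reset;
}

/* Fixed shape: a single exit path keeps the pair balanced. */
static void update_fixed(bool have_data)
{
  ++gui_reset;
  if(have_data)
  {
    /* ... update widgets here ... */
  }
  --gui_reset;
}
```

In this sketch the counter drifts upward by one per early return. If the decrement came first in the real code, the drift would go the other way.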
Which has occurred to me multiple times over the years, and which I've never pursued precisely because of the number of files touched for what appeared to be a non-problem. (Atomic is the way to go here: type One other thing you can try is declaring gui->reset to be
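As a rough illustration of the atomic idea, here is a sketch using standard C11 `<stdatomic.h>` rather than any darktable-specific wrapper. `gui_reset`, `reset_enter`, `reset_leave` and `reset_active` are hypothetical names. It only helps if more than one thread really touches the counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical atomic stand-in for gui->reset. */
static atomic_int gui_reset = 0;

/* Atomically increment; returns the new nesting depth. */
static int reset_enter(void)
{
  return atomic_fetch_add(&gui_reset, 1) + 1;
}

/* Atomically decrement; returns the new nesting depth. */
static int reset_leave(void)
{
  return atomic_fetch_sub(&gui_reset, 1) - 1;
}

/* Nonzero while at least one caller is inside a reset section. */
static int reset_active(void)
{
  return atomic_load(&gui_reset) != 0;
}
```

Atomic read-modify-write removes lost updates between threads, but it does not repair an unbalanced enter/leave pair.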
If there are really some GUI operations done on a background thread, then this is the thing to fix, because all GTK functions must be called on the main thread. So no need to implement thread-safety when only one thread is involved.
Good finding, indeed one instance to fix.
As @zisoft said, there should be no threading issue as there is a single GUI thread. So no need for atomics or a mutex.
I tried that, unfortunately no change. I can mostly reproduce by the steps described here: #17236 (comment). Whenever the GUI freezes I get this last line in the log: I am on a MacBook Pro with M5 Pro. @abulka: Can you please verify (start darktable with
@zisoft No, I don't get that message when it freezes on my MacMini M1. I instead get: Darktable Darktable I have I attached these logs in a zip incl. log runs of my PR branch with the PR fix which has yet to freeze.
Ok, the current freezes have a different cause: #20738
Fixes #17236