Skip to content

fix: prevent gui->reset corruption from re-entrant GTK event processing#20723

Open
abulka wants to merge 1 commit intodarktable-org:masterfrom
abulka:andy-dup-fix
Open

fix: prevent gui->reset corruption from re-entrant GTK event processing#20723
abulka wants to merge 1 commit intodarktable-org:masterfrom
abulka:andy-dup-fix

Conversation

@abulka
Copy link
Copy Markdown

@abulka abulka commented Apr 1, 2026

During _dev_load_requested_image(), widget creation/destruction and pixelpipe rebuilding can trigger re-entrant GTK event processing, allowing other code paths to run their own ++/-- cycles on darktable.gui->reset. If these complete out of order, the final decrement takes gui->reset to -1, permanently disabling all IOP module GUI callbacks for the remainder of the session.

Replace the relative ++/-- with a save/force-restore pattern so that gui->reset is always correctly restored regardless of re-entrant modifications. A diagnostic warning is logged when corruption is detected and corrected.

Primarily observed on macOS (Quartz backend dispatches events more aggressively during widget operations) but the underlying code issue is platform-independent.

Fixes #17236

During _dev_load_requested_image(), widget creation/destruction and
pixelpipe rebuilding can trigger re-entrant GTK event processing,
allowing other code paths to run their own ++/-- cycles on
darktable.gui->reset. If these complete out of order, the final
decrement takes gui->reset to -1, permanently disabling all IOP
module GUI callbacks for the remainder of the session.

Replace the relative ++/-- with a save/force-restore pattern so that
gui->reset is always correctly restored regardless of re-entrant
modifications. A diagnostic warning is logged when corruption is
detected and corrected.

Primarily observed on macOS (Quartz backend dispatches events more
aggressively during widget operations) but the underlying code issue
is platform-independent.
@TurboGit
Copy link
Copy Markdown
Member

TurboGit commented Apr 1, 2026

If these complete out of order, the final decrement takes gui->reset to -1

Even out of order I don't see how we can reach -1, it is impossible to have a -- if a ++ was not reached before. So please can you clarify?

@ralfbrown
Copy link
Copy Markdown
Collaborator

In addition to what Pascal said, there are nearly 200 places where gui->reset is incremented/decremented, and all of them would need updating if this were an actual problem. For correctness with the approach of only allowing 0/1, you would also need to block on entry if the flag was already nonzero.

@TurboGit TurboGit added the controversial this raises concerns, don't move on the technical work before reaching a consensus label Apr 1, 2026
@abulka
Copy link
Copy Markdown
Author

abulka commented Apr 2, 2026

You're right that with properly balanced ++/-- pairs, re-entrant nesting in a single thread should not produce a negative value. I audited all ~180 ++/-- sites and every pair is balanced.

However, I have diagnostic evidence that the corruption is real. Before applying the fix, I added instrumentation that checked gui->reset immediately before the final --darktable.gui->reset in _dev_load_requested_image(). The logs showed gui->reset = 0 at that point — when it must be ≥ 1 since the function incremented it on entry. This was observed multiple times across multiple sessions. With the original --, this takes gui->reset to -1, which is truthy, permanently suppressing all if(darktable.gui->reset) return; checks and disabling every IOP module's GUI.

e.g. here is an example from my logging

129.6144 [darkroom] INCREMENT gui->reset (was 0)
129.6291 [darkroom] DECREMENT gui->reset (was 0)    <-- WAS 0, NOT 1!
129.6291 [darkroom] gui->reset now=-1 (should be 0)

After applying the save/restore fix, the diagnostic warning fires regularly (~20-30% of duplicate-and-switch operations on macOS/M1), each time detecting and correcting the corruption. Without the fix, each occurrence would require a full restart of Darktable.

The most likely mechanism is a data race: gui->reset is a plain int32_t with no atomicity guarantees. The ++/-- operations are load-modify-store sequences. dev->gui_synch is set from a background pixelpipe thread (develop.c:674), which influences when expose() runs its own ++/-- cycle. On ARM (Apple Silicon), relaxed memory ordering makes lost-update races on non-atomic variables more likely than on x86. This would explain both the macOS prevalence and the intermittency.

Regarding the ~200 other ++/-- sites: this fix does not change gui->reset to a 0/1 flag. The rest of the codebase continues to use ++/-- normally. The save/restore is applied only to _dev_load_requested_image(), which is the long-running function (widget rebuild, pixelpipe re-init, history pop) where the corruption consistently manifests. It ensures this one function restores the correct value regardless of what happens to the counter during its execution.

Arguably the proper solution would be making gui->reset an _Atomic int32_t or protecting it with a mutex — but that's a much larger change touching 50+ files.

@zisoft
Copy link
Copy Markdown
Collaborator

zisoft commented Apr 2, 2026

I audited all ~180 ++/-- sites and every pair is balanced.

I found a possible unbalanced pair due to an early return in src/iop/colormapping.c:

++darktable.gui->reset;
dt_iop_gui_enter_critical_section(self);
const int width = g->width;
const int height = g->height;
const int ch = g->ch;
float *const restrict buffer = dt_iop_image_alloc(width, height, ch);
if(!buffer)
{
dt_iop_gui_leave_critical_section(self);
return;
}
dt_iop_image_copy_by_size(buffer, g->buffer, width, height, ch);

There should be a -- before the return.

@ralfbrown
Copy link
Copy Markdown
Collaborator

Arguably the proper solution would be making gui->reset an _Atomic int32_t or protecting it with a mutex — but that's a much larger change touching 50+ files.

Which has occurred to me multiple times over the years, and which I've never pursued precisely because of the number of files touched for what appeared to be a non-problem. (Atomic is the way to go here: type dt_atomic_int and functions dt_atomic_add_int/dt_atomic_sub_int, which are set up to use whatever mechanisms are available -- see src/common/atomic.h.)

One other thing you can try is declaring gui->reset to be volatile. I suspect that compiler optimizations (specifically, delaying/eliminating spills to RAM) play a role here.

@zisoft
Copy link
Copy Markdown
Collaborator

zisoft commented Apr 2, 2026

If there are really some gui operations done on a background thread then this is the thing to fix, because all gtk functions must be called on the main thread. So no need to implement thread-safety when only one thread is involved.

@TurboGit
Copy link
Copy Markdown
Member

TurboGit commented Apr 2, 2026

I audited all ~180 ++/-- sites and every pair is balanced.

I found a possible unbalanced pair due to an early return in src/iop/colormapping.c:

++darktable.gui->reset;
dt_iop_gui_enter_critical_section(self);
const int width = g->width;
const int height = g->height;
const int ch = g->ch;
float *const restrict buffer = dt_iop_image_alloc(width, height, ch);
if(!buffer)
{
dt_iop_gui_leave_critical_section(self);
return;
}
dt_iop_image_copy_by_size(buffer, g->buffer, width, height, ch);

There should be a -- before the return.

Good finding, indeed one instance to fix.

@TurboGit
Copy link
Copy Markdown
Member

TurboGit commented Apr 2, 2026

As @zisoft said there should be no threading issue as there is a single GUI thread. So no need to atomic or mutex.

@zisoft
Copy link
Copy Markdown
Collaborator

zisoft commented Apr 2, 2026

@ralfbrown :

One other thing you can try is declaring gui->reset to be volatile. I suspect that compiler optimizations (specifically, delaying/eliminating spills to RAM) play a role here.

I tried that, unfortunately no change.

I can mostly reproduce by the steps described here: #17236 (comment)

Whenever the GUI freezes I get this last line in the log:
30.3861 pipe cache check [preview] 12 lines (important=2, used=9). Freed: invalid 0MB used 0MB. Using 545MB, limit=0MB

I am on a MacBook Pro with M5 Pro.

@abulka : Can you please verify (start darktable with -d common)

@abulka
Copy link
Copy Markdown
Author

abulka commented Apr 3, 2026

@zisoft No, I don't get that message when it freezes on my MacMini M1. I instead get:

Darktable 5.4.1

   400.3596 pipe finished             CPU [thumbnail]                                        (0/0)    146x110 sc=0.028; 'P1100466.ORF' ID=63
   400.3598 [mipmap init 8] export ID=63 finished (sizes 180 110 => 146 110)

FREEZE 

Darktable 5.0.1

    56.6005 pipe finished             CPU [thumbnail]                                  (   0/   0)  146x 110 scale=0.0281 --> (   0/   0)  146x 110 scale=0.0281  ID=62
    56.6007 [mipmap init 8] export ID=62 finished (sizes 180 110 => 146 110)!

FREEZE 

I have I attached these logs in a zip incl. log runs of my PR branch with the PR fix which has yet to freeze.
darktable-logs-various-pr-20723.zip

@zisoft
Copy link
Copy Markdown
Collaborator

zisoft commented Apr 3, 2026

Ok, the current freezes are another reason: #20738

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

controversial this raises concerns, don't move on the technical work before reaching a consensus

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Duplicate manager freezes the module groups and deactivates the modules

4 participants