use fabric to read/write _replicator docs #6031

Open
rnewson wants to merge 1 commit into main from replicator-quorum-ops

@rnewson rnewson commented Jun 10, 2026

Member

closes #6029

Overview

serialize_worker_startup=true fixes the "spurious conflict" update case in most circumstances. A prior attempt to address that problem in the replicator now conflicts with that work. The replicator updates replicator docs whenever a job's state changes, so jobs that crash quickly could generate stored conflicts. To avoid this, we previously changed the replicator to write only to the local copy on the "owner" node. That is fundamentally the same trick that serialize_worker_startup=true uses. However, the two mechanisms are not coordinated and can easily choose different primary nodes.

This PR returns the couch replicator to using fabric:open_doc and fabric:update_doc, aligning its changes with the serialize_worker_startup=true behaviour.

Testing recommendations

covered by existing tests

Related Issues or Pull Requests

#6029

Checklist

  • This is my own work, I did not use AI, LLMs or similar technology
  • Code is written and works correctly
  • Changes are covered by tests
  • Any new configurable parameters are documented in rel/overlay/etc/default.ini
  • Documentation changes were made in the src/docs folder
  • Documentation changes were backported (separated PR) to affected branches

@rnewson rnewson force-pushed the replicator-quorum-ops branch from f58ad1b to 8e34eee June 10, 2026 09:10

    ioq:maybe_set_io_priority({system, DbName}),
    defer_call(fun() ->
        try
            fabric:update_doc(DbName, Doc, [?CTX])

Contributor

With serialize_worker_startup=false, which we're currently using at Cloudant, wouldn't this make the conflict generation worse?

            Ret;
        {'DOWN', Ref, process, Pid, {exit_throw, Reason}} ->
            throw(Reason);
        {'DOWN', Ref, process_, Pid, {exit_error, Reason}} ->

Contributor

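Typo: process_ should be process. As written, this clause never matches a 'DOWN' message, so the receive will deadlock when the worker exits with an error.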
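
To illustrate the point about the process_ tag, here is a minimal standalone sketch (not the code in this PR). The module name down_demo and the function run/1 are made up for the example, and the after clause exists only so the demo terminates; it says nothing about how the real receive is written.

```erlang
-module(down_demo).
-export([run/1]).

%% Spawns a monitored worker that exits with {exit_error, boom}, then
%% waits for a 'DOWN' message whose third element equals Tag.
%% Tag is already bound here, so it is matched, not assigned.
run(Tag) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({exit_error, boom}) end),
    receive
        {'DOWN', Ref, Tag, Pid, {exit_error, Reason}} ->
            {matched, Reason}
    after 500 ->
        %% Demo-only escape hatch: drop the monitor and any queued
        %% 'DOWN' message so the mailbox is left clean.
        erlang:demonitor(Ref, [flush]),
        timed_out
    end.
```

down_demo:run(process) returns {matched, boom}, while down_demo:run(process_) returns timed_out, because the third element of a 'DOWN' message for a process monitor is the atom process. Without a timeout, the second call would wait forever.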


save_rep_doc(<<"shards/", _/binary>> = ShardDbName, Doc) ->
    DbName = mem3:dbname(ShardDbName),
    ioq:maybe_set_io_priority({system, DbName}),

@nickva nickva Jun 10, 2026

Contributor

This doesn't do anything, for the same reason as the ioq comment on open_rep_doc.


open_rep_doc(<<"shards/", _/binary>> = ShardDbName, DocId) ->
    DbName = mem3:dbname(ShardDbName),
    ioq:maybe_set_io_priority({system, DbName}),

@nickva nickva Jun 10, 2026

Contributor

The ioq setting here won't take effect. It's a process dictionary setting, and since defer_call spawns a new process, it won't be applied to the process that actually makes the fabric call.
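
A minimal standalone sketch of why a process dictionary value set in the caller is not visible in a spawned process. It does not use ioq or fabric: the module name pdict_demo and the key io_priority_demo are hypothetical stand-ins for whatever key the ioq call stores.

```erlang
-module(pdict_demo).
-export([run/0]).

%% Returns {ValueSeenByParent, ValueSeenByChild}.
run() ->
    %% Set a key in the calling process's dictionary (a stand-in for
    %% the priority described in the comment above).
    put(io_priority_demo, {system, <<"db">>}),
    Parent = self(),
    Ref = make_ref(),
    %% A freshly spawned process starts with an empty process
    %% dictionary, so get/1 returns undefined there.
    spawn(fun() -> Parent ! {Ref, get(io_priority_demo)} end),
    ChildValue =
        receive
            {Ref, Value} -> Value
        after 1000 ->
            timeout
        end,
    {get(io_priority_demo), ChildValue}.
```

pdict_demo:run() returns {{system, <<"db">>}, undefined}. One possible fix, assuming defer_call runs the fun in the spawned process, would be to move the ioq call inside the fun.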

Development

Successfully merging this pull request may close these issues.

[Bug]: _replicator docs 409s with serialize_worker_startup=true

2 participants