draft blogpost for Wikidata Reconciliation Service #568
On the Wikimedia side note that:
On the reconciliation side:
On the OpenRefine side:
Given all of this, my suggestion would be to either sit still in the boat or write a generic blog post on OpenRefine/reconciliation. I'm afraid a blog post like this could misinform and do more harm than good. |
|
Only on a process note, while it would be great if there were a wide consensus in the OpenRefine community, I don't think there is anything in the blog post that is within the scope of the Advisory Committee mandate to formally approve. It seems to me that what is described lies within the Core Dev group mandate. |
|
@Abbe98 Thanks for the detailed feedback, this is helpful. I agree with several of your points, in particular:
To clarify the intent of the post: this is not meant to be a precise technical description of all traffic patterns or a definitive account of recent changes on the Wikimedia side. The goal is more limited and practical:
We often need to explain this three-layer structure in the forum, on GitHub, and in conversations with partners. This incident highlighted that ownership is not clearly understood across the community. The post and diagram are meant to make the scope and responsibilities of each component visible, and to make OpenRefine’s role explicit.
If there is one point this post aims to make explicit, it is that OpenRefine neither operates nor maintains the Wikidata reconciliation service. The recent incident brought this to the surface, but the underlying question has been around for some time. The post is also intended to help move the conversation toward clearer ownership going forward.
On your specific points:
Timing: Agreed that the situation is still evolving. The post intentionally avoids going into technical details or making assumptions about how things will settle. The focus is on structural aspects (scope, ownership, dependencies), which are more stable.
Diagram: Agreed that it is simplified. The goal is not to represent all request paths, but to show that multiple independently operated components are involved. I can make that explicit in the text and include the direct Wikibase extension -> Wikidata path.
Multiple reconciliation services: Good point. The post already mentions that OpenRefine supports multiple reconciliation services, but we can strengthen that to avoid the impression that Wikidata is the only one.
“What we are doing” section: I agree with your point here. That section is not central to the main argument and may dilute the message. I’m leaning toward removing it to keep the post focused on scope and governance.
Happy to adjust the draft further along those lines. |
|
From a process point of view, I would find it easier to first agree on an objective or outline for the blog post. A fully written text anchors and confines the discussion.
There's no question that the Wikidata reconciliation service is a mess, but it's not our mess and I don't think we're in a position to speak for it. In an ideal world, it would have a responsive maintainer and a clear problem reporting mechanism that is easy for users to find. We may be on the path to that, but I think it's too early to tell. We're definitely not there yet, since the current service points to an issue tracker which is archived and which points to two other repos, with a fourth repo being proposed as the final resting place. The planning, communication, and professionalism of the Wiki* engineering teams leave a lot to be desired, but they're also under stress and we don't really have any influence on their behavior, so we (and our users) just need to deal with the consequences.
I agree with Albin that it's premature to post anything until it's clearer what the outcome is going to be. Ideally, when posted, it should include a pointer to the problem reporting mechanism for the production Wikidata reconciliation service (and the service will have been updated to point to that same place).
Eliminating a lot of the excess detail and focusing on the key message(s) ("not our problem"?) would help readers focus. Currently that's below the fold (i.e. after the break) and buried deep.
The timeline actually begins in August 2025 from a Wikidata point of view, but it might also be worthwhile to mention the rise of the AI scrapers as context for these dramatic changes, because it affects other reconciliation services and Fetch URL.
The other general topic worth including in a discussion of reconciliation services is that, because we don't control them, not only can we not fix outages, but we also don't control what is done with users' data, so they should be comfortable sending their data to whatever service(s) they choose to use.
Lastly, and this doesn't really relate to the blog post, one of the things I find most troubling about this whole situation is the lack of transparency. Both last Fall's and this most recent round of Wikidata changes were made without any advance notice. The recon service was "fixed" by quietly reconfiguring the service URL to redirect to a different host behind the scenes, but none of the underlying bugs have been fixed, so it is susceptible to future Wikidata API tightening that doesn't whitelist the WMCS hosts.
Bottom line - don't post until the situation is clearer and then revise to focus on the main message. |
I’m opening this PR to share a draft blog post following the recent issues around the Wikidata reconciliation service and the related discussions on rate limiting and infrastructure. It was previously discussed with @Ainali during last week's Advisory Committee call.
The goal of this post is to:
I would like this post to serve as an official reference point for future conversations, as this topic regularly comes up across the forum, GitHub, and external discussions.
I am requesting formal approval from the Core Dev Group (@tfmorris @Abbe98) and the Advisory Committee (@Ainali @ej2432 @jfaurelacroix), as this post represents the project’s official position. The goal is to reflect a shared consensus within the OpenRefine community.
Please feel free to suggest edits or alternative directions. Other community members and committers are also very welcome to review, comment, and participate in the discussion.