draft blogpost for Wikidata Reconciliation Service by magdmartin · Pull Request #568 · OpenRefine/openrefine.org

magdmartin · 2026-04-22T19:58:29Z

I’m opening this PR to share a draft blog post following the recent issues around the Wikidata reconciliation service and the related discussions on rate limiting and infrastructure. It was previously discussed with @Ainali during last week's Advisory Committee call.

The goal of this post is to:

- document the incident at a high level
- clarify the structure of the reconciliation ecosystem
- state OpenRefine’s position regarding scope and ownership
- support ongoing discussions around governance and sustainability of the Wikidata reconciliation service

I would like this post to serve as an official reference point for future conversations, as this topic regularly comes up across the forum, GitHub, and external discussions.

I am requesting formal approval from the Core Dev Group (@tfmorris @Abbe98) and the Advisory Committee (@Ainali @ej2432 @jfaurelacroix), as this post represents the project’s official position. The goal is to reflect a shared consensus within the OpenRefine community.

Please feel free to suggest edits or alternative directions. Other community members and committers are also very welcome to review, comment, and participate in the discussion.

netlify · 2026-04-22T19:58:35Z

✅ Deploy Preview for openrefine-website ready!

Name	Link
🔨 Latest commit	`e9e0a39`
🔍 Latest deploy log	https://app.netlify.com/projects/openrefine-website/deploys/69e9286825de6600083bdeb8
😎 Deploy Preview	https://deploy-preview-568--openrefine-website.netlify.app

Abbe98 · 2026-04-23T05:16:45Z

On the Wikimedia side, note that:

- WMF intends to tighten the screws on API usage even more in the next 14 days; this blog post might therefore be premature.
- Per Daniel's last message, OpenRefine is sending compliant headers; what they are seeing is traffic from old versions of OpenRefine and the reconciliation service.
- We have only received communication from a single member of a single small Wikimedia engineering/product team; there has been no official communication about these changes through the normal channels.
- In just two weeks, said individual has changed their position three times.
- Several tools and even Wikimedia's own services have been having outages because of these changes, contributing to these swift position changes.
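
As a rough illustration of what a compliant header might look like, the sketch below composes a descriptive User-Agent string in the general shape recommended by Wikimedia's User-Agent policy (client name and version, contact information, then the underlying library). The function name `build_user_agent`, the client name `ExampleReconClient`, and the contact URL are hypothetical placeholders; this is not what OpenRefine or the reconciliation service actually sends.

```python
from typing import Optional


def build_user_agent(tool: str, version: str, contact: str,
                     library: Optional[str] = None) -> str:
    """Compose a descriptive User-Agent: '<tool>/<version> (<contact>) [<library>]'.

    All argument values are supplied by the caller; nothing here is taken
    from OpenRefine's real configuration.
    """
    if not tool or not version or not contact:
        raise ValueError("tool, version and contact are all required")
    user_agent = f"{tool}/{version} ({contact})"
    if library:
        # Append the HTTP library identifier when one is known.
        user_agent += f" {library}"
    return user_agent


# Hypothetical values, for illustration only.
headers = {
    "User-Agent": build_user_agent(
        "ExampleReconClient", "0.1",
        "https://example.org/contact", "python-urllib/3",
    )
}
```

A client would pass `headers` along with every request it makes, so that operators can identify the tool and reach its maintainers instead of blocking anonymous traffic.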

On the reconciliation side:

There is more than one reconciliation service; we just happen to bundle one, and not all of them depend on the SPARQL service, etc.

On the OpenRefine side:

The diagram is incorrect. Simplified, there are four types of traffic to Wikimedia from a default OpenRefine installation:
- Requests from the Wikibase extension backend (sometimes using the Wikidata-Toolkit library); these are mostly authenticated requests
- Requests from a reconciliation service for which OpenRefine is a client
- Requests from the Wikibase extension issued by the frontend
- Requests initiated by the user (fetch-by-url, etc.)

On the subject of what we are actually doing: are we actually doing it? Improving our user-agents does not seem actionable? Working on improved error handling sounds great but is a rather massive undertaking. With the next planned release being 4.0, any notable improvements are also kinda far out.

Given all of this, my suggestion would be to either sit still in the boat or write a generic blog post on OpenRefine/reconciliation. I'm afraid a blog post like this could misinform and do more harm than good.

Ainali · 2026-04-23T08:35:19Z

Only on a process note, while it would be great if there were a wide consensus in the OpenRefine community, I don't think there is anything in the blog post that is within the scope of the Advisory Committee mandate to formally approve. It seems to me that what is described lies within the Core Dev group mandate.

magdmartin · 2026-04-23T12:36:07Z

@Abbe98 Thanks for the detailed feedback, this is helpful.

I agree with several of your points, in particular:

- The situation on the Wikimedia side is still evolving
- We only have partial visibility
- The diagram is a simplification

To clarify the intent of the post: this is not meant to be a precise technical description of all traffic patterns or a definitive account of recent changes on the Wikimedia side.

The goal is more limited and practical:

- document the incident at a high level for OpenRefine users
- provide a clear reference to explain how the reconciliation ecosystem is structured
- clarify scope and ownership across the different services involved

We often need to explain this three-layer structure in the forum, on GitHub, and in conversations with partners. This incident highlighted that ownership is not clearly understood across the community. The post and diagram are meant to make the scope and responsibilities of each component visible, and to make OpenRefine’s role explicit.

If there is one point this post aims to make explicit, it is that OpenRefine neither operates nor maintains the Wikidata reconciliation service. The recent incident brought this to the surface, but the underlying question has been around for some time. The post is also intended to help move the conversation toward clearer ownership going forward.

On your specific points:

Timing: Agreed that the situation is still evolving. The post intentionally avoids going into technical details or making assumptions about how things will settle. The focus is on structural aspects (scope, ownership, dependencies), which are more stable.

Diagram: Agreed that it is simplified. The goal is not to represent all request paths, but to show that multiple independently operated components are involved. I can make that explicit in the text and include the direct Wikibase extension -> Wikidata path.

Multiple reconciliation services: Good point. The post already mentions that OpenRefine supports multiple reconciliation services, but we can strengthen that to avoid the impression that Wikidata is the only one.

“What we are doing” section: I agree with your point here. That section is not central to the main argument and may dilute the message. I’m leaning toward removing it to keep the post focused on scope and governance.

Happy to adjust the draft further along those lines.

tfmorris · 2026-04-23T18:17:33Z

From a process point of view, I would find it easier to first agree on an objective or outline for the blog post. A fully written text anchors and confines the discussion.

There's no question that the Wikidata reconciliation service is a mess, but it's not our mess and I don't think we're in the position to speak for it. In an ideal world, it would have a responsive maintainer and a clear problem reporting mechanism that is easy for the users to find. We may be on the path to that, but I think it's too early to tell. We're definitely not there yet since the current service points to an issue tracker which is archived and points to two other different repos with a fourth repo being proposed as the final resting place.

The planning, communication, and professionalism of the Wiki* engineering teams leave a lot to be desired, but they're also under stress and we don't really have any influence on their behavior, so we (and our users) just need to deal with the consequences.

I agree with Albin that it's premature to post anything until it's clearer what the outcome is going to be. Ideally, when posted, it should include a pointer to the problem reporting mechanism for the production Wikidata reconciliation service (and the service will have been updated to point to that same place).

Eliminating a lot of the excess detail and focusing on the key message(s) ("not our problem"?) would help readers focus. Currently that's below the fold (i.e., after the break) and buried deep.

The timeline actually begins in August 2025 from a Wikidata point of view, but it might also be worthwhile to mention the rise of the AI scrapers as context for these dramatic changes, because it affects other reconciliation services and Fetch URL.

The other general topic worth including in a discussion of reconciliation services is that, because we don't control them, not only can we not fix outages, but we also don't control what is done with users' data, so they should be comfortable sending their data to whatever service(s) they choose to use.

Lastly, and this doesn't really relate to the blog post, one of the most troubling things I find about this whole situation is the lack of transparency. Both last Fall's and this most recent round of Wikidata changes were made without any advance notice. The recon service was "fixed" by quietly reconfiguring the service URL to redirect to a different host behind the scenes, but none of the underlying bugs have been fixed, so it is susceptible to future Wikidata API tightening that doesn't whitelist the WMCS hosts.

Bottom line - don't post until the situation is clearer and then revise to focus on the main message.

magdmartin requested review from Abbe98, Ainali, ej2432 and tfmorris April 22, 2026 19:59

magdmartin wants to merge 1 commit into master from 202604-blog

4 participants
