
Retry SPARQL stage queries on HTTP 500 (and 408/425/429), not just 502/503/504 #504

Description

@ddeboer

Problem

@lde/pipeline’s SPARQL executor retries transient failures, but isTransientError (packages/pipeline/src/sparql/executor.ts) classifies only network errors and HTTP 502/503/504 as transient:

return status === 502 || status === 503 || status === 504;

HTTP 500 is treated as definitive, so a stage that hits a single 500 fails immediately without a retry. Underpowered dataset endpoints routinely return 500 under load (QLever, for one, aborts an over-budget query with a 500). When that happens mid-run, the affected stages emit no output and the dataset's analysis is silently incomplete.

Evidence

Dataset Knowledge Graph run 2026-06-19, dataset https://data.razu.nl/id/dataset/kranten (endpoint https://api.data.razu.nl/datasets/id/object/sparql, ~24M triples):

  • class-property-object-classes.rq and class-property-languages.rq failed with Invalid SPARQL endpoint response (HTTP status 500) and were not retried.
  • subjects.rq and subject-uri-space.rq aborted on timeouts, with repeated adaptive-timeout tightening.
  • Result: the dataset got only a void:classPartition, with no subject namespaces and hence no persistent-URI check downstream.

The endpoint serves light queries fine (ASK in ~50ms) but times out or returns 500 on heavy aggregations, which is exactly the transient overload pattern that retries exist for.

Proposed change

Extend the retryable set in isTransientError to {500, 502, 503, 504, 408, 425, 429}. Keep the other 4xx statuses (e.g. 400, 404) non-retryable, since those are deterministic. The existing bounded retries (default 3) plus p-retry backoff already guard against hammering a struggling endpoint. This also makes the policy more consistent with the DKG's own per-URI dereference path, which already treats any status >= 500 as transient.
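A minimal sketch of the proposed classifier, assuming the executor attaches the numeric HTTP status to the thrown error as a `status` property and that network failures arrive as a `TypeError` (as Node's built-in fetch does). `HttpStatusError`, `TRANSIENT_HTTP_STATUSES` and `httpStatusOf` are hypothetical names for illustration; the real code in packages/pipeline/src/sparql/executor.ts may be structured differently.

```typescript
// Hypothetical error type: assumes a non-2xx SPARQL response is surfaced
// as an Error carrying the numeric HTTP `status`.
class HttpStatusError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`Invalid SPARQL endpoint response (HTTP status ${status})`);
    this.name = "HttpStatusError";
    this.status = status;
  }
}

// Proposed retryable set: 500 plus the gateway/availability errors
// (502, 503, 504), and the "try again later" client statuses
// 408 Request Timeout, 425 Too Early and 429 Too Many Requests.
// Other 4xx (e.g. 400, 404) stay non-retryable because they are deterministic.
const TRANSIENT_HTTP_STATUSES: ReadonlySet<number> = new Set([
  500, 502, 503, 504, 408, 425, 429,
]);

// Duck-typed: any error object with a numeric `status` property qualifies.
function httpStatusOf(error: unknown): number | undefined {
  if (typeof error === "object" && error !== null && "status" in error) {
    const status = (error as { status: unknown }).status;
    return typeof status === "number" ? status : undefined;
  }
  return undefined;
}

function isTransientError(error: unknown): boolean {
  const status = httpStatusOf(error);
  if (status !== undefined) {
    return TRANSIENT_HTTP_STATUSES.has(status);
  }
  // Assumption: network failures surface as a TypeError, as Node's built-in
  // fetch does ("fetch failed"); the real executor's check may differ.
  return error instanceof TypeError;
}
```

Under this rule a single 500 from an overloaded endpoint would be retried, while a 400 for a malformed query would still fail on the first attempt.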

Caveat

A 500 can also be deterministic: an endpoint may always choke on a query of that size (e.g. razu's COUNT(DISTINCT ?s) times out every time). Retrying reduces how often analyses end up incomplete but cannot guarantee success, so it should be paired with surfacing stage-level incompleteness to consumers rather than relied on alone.
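Pairing retries with explicit incompleteness could look roughly like the sketch below. `StageOutcome` and `runStage` are hypothetical names, and the hand-rolled exponential backoff stands in for the executor's actual p-retry configuration (only the default of 3 retries comes from this issue; the delays are illustrative). A query that fails every time ends as an explicit `incomplete` outcome after `retries + 1` attempts instead of a silent gap in the output.

```typescript
// Hypothetical stage outcome: lets consumers tell "stage completed" apart
// from "stage failed", instead of seeing missing output.
type StageOutcome<T> =
  | { stage: string; status: "complete"; result: T; attempts: number }
  | { stage: string; status: "incomplete"; error: string; attempts: number };

// Condensed copy of the proposed classifier (duck-typed on `status`).
const TRANSIENT = new Set([500, 502, 503, 504, 408, 425, 429]);
function isTransientError(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  if (typeof status === "number") return TRANSIENT.has(status);
  return error instanceof TypeError; // assumption: network failure
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Bounded retry with exponential backoff (stand-in for p-retry).
async function runStage<T>(
  stage: string,
  query: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
): Promise<StageOutcome<T>> {
  for (let attempt = 1; ; attempt++) {
    try {
      const result = await query();
      return { stage, status: "complete", result, attempts: attempt };
    } catch (error) {
      if (!isTransientError(error) || attempt > retries) {
        // Deterministic failure or retries exhausted: report it, don't drop it.
        return { stage, status: "incomplete", error: String(error), attempts: attempt };
      }
      // Wait baseDelayMs, then 2x, 4x, ... before the next attempt.
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

A downstream consumer could then check for `status === "incomplete"` and flag the dataset's analysis as partial rather than treating missing subject namespaces as a real finding.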

Metadata

Assignees: No one assigned
Labels: No labels
Type: Bug
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
