Retry SPARQL stage queries on HTTP 500 (and 408/425/429), not just 502/503/504 #504

## Problem

`@lde/pipeline`’s SPARQL executor retries transient failures, but `isTransientError` (`packages/pipeline/src/sparql/executor.ts`) classifies only network errors and HTTP 502/503/504 as transient:

```ts
return status === 502 || status === 503 || status === 504;
```

HTTP 500 is treated as definitive, so a stage that hits a single 500 fails immediately with no retry. Underpowered dataset endpoints routinely return 500 under load (QLever, for one, aborts an over-budget query with a 500). When that happens mid-run the affected stages emit no output and the dataset’s analysis is silently incomplete.

## Evidence

Dataset Knowledge Graph (DKG) run of 2026-06-19, dataset `https://data.razu.nl/id/dataset/kranten` (endpoint `https://api.data.razu.nl/datasets/id/object/sparql`, ~24M triples):

- `class-property-object-classes.rq` and `class-property-languages.rq` failed with `Invalid SPARQL endpoint response (HTTP status 500)` – no retry.
- `subjects.rq` and `subject-uri-space.rq` aborted on timeouts, with repeated adaptive-timeout tightening.
- Result: the dataset got only a `void:classPartition` – no subject namespaces, hence no persistent-URI check downstream.

The endpoint serves light queries fine (`ASK` in ~50ms) but times out or returns HTTP 500 on heavy aggregations. That is exactly the transient overload pattern that retries exist for.

## Proposed change

Extend the retryable set in `isTransientError` to `{500, 502, 503, 504, 408, 425, 429}`. Keep other 4xx (400, 404) non-retryable – those are deterministic. The existing bounded retries (default 3) plus `p-retry` backoff already guard against hammering a struggling endpoint. This also makes the policy consistent with the DKG’s own per-URI dereference path, which already treats any `status >= 500` as transient.
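A minimal sketch of the proposed classification. It assumes the executor can read a numeric `status` from the thrown error and that network failures surface as `TypeError` (as with Node's built-in `fetch`). `TRANSIENT_HTTP_STATUSES` and `httpStatus` are hypothetical names, not the current code in `executor.ts`:

```typescript
// 5xx overload/gateway codes plus the 4xx codes that mean "try again later".
// Other 4xx codes (400, 404, ...) are deterministic and are not retried.
const TRANSIENT_HTTP_STATUSES: ReadonlySet<number> = new Set([
  408, // Request Timeout
  425, // Too Early
  429, // Too Many Requests
  500, // Internal Server Error (e.g. QLever aborting an over-budget query)
  502, // Bad Gateway
  503, // Service Unavailable
  504, // Gateway Timeout
]);

// Hypothetical helper: reads a numeric `status` from whatever the executor throws.
function httpStatus(error: unknown): number | undefined {
  if (typeof error === 'object' && error !== null && 'status' in error) {
    const { status } = error as { status: unknown };
    return typeof status === 'number' ? status : undefined;
  }
  return undefined;
}

function isTransientError(error: unknown): boolean {
  const status = httpStatus(error);
  if (status !== undefined) {
    return TRANSIENT_HTTP_STATUSES.has(status);
  }
  // Assumption: network failures surface as TypeError, as Node's built-in
  // fetch does ("fetch failed"); adapt to how the executor reports them.
  return error instanceof TypeError;
}
```

Listing the codes explicitly keeps 400 and 404 out. A plain `status >= 500` check, as in the dereference path, would also work, but it would retry codes such as 501 Not Implemented that will not change between attempts.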

## Caveat

A 500 can also be deterministic (an endpoint that always chokes on a query at that size – e.g. razu’s `COUNT(DISTINCT ?s)` times out every time). Retrying reduces the frequency of incomplete analyses but cannot guarantee success, so it should be paired with surfacing stage-level incompleteness to consumers rather than relied on alone.
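To illustrate how retries can be paired with reported incompleteness, here is a sketch of a hypothetical `runStage` helper and `StageOutcome` type, neither of which is part of `@lde/pipeline`. The helper retries transient failures with exponential backoff. Once a deterministic error occurs or the retry budget runs out, it returns an explicit `failed` outcome instead of silently producing no output. The real executor delegates backoff to `p-retry`; the hand-rolled loop here only keeps the sketch dependency-free.

```typescript
// Hypothetical per-stage outcome: a failed stage is recorded, not dropped.
type StageOutcome<T> =
  | { stage: string; status: 'complete'; value: T; attempts: number }
  | { stage: string; status: 'failed'; error: string; attempts: number };

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runStage<T>(
  stage: string,
  query: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  retries = 3, // retries after the first attempt, so at most retries + 1 calls
  minTimeoutMs = 1000,
): Promise<StageOutcome<T>> {
  for (let attempt = 1; ; attempt++) {
    try {
      const value = await query();
      return { stage, status: 'complete', value, attempts: attempt };
    } catch (error) {
      // Deterministic error, or retry budget exhausted: surface the failure.
      if (!isTransient(error) || attempt > retries) {
        return { stage, status: 'failed', error: String(error), attempts: attempt };
      }
      // Exponential backoff between attempts: minTimeoutMs, 2x, 4x, ...
      await sleep(minTimeoutMs * 2 ** (attempt - 1));
    }
  }
}
```

Consumers can then list every `failed` outcome alongside the dataset's analysis, so a partial result such as "only a `void:classPartition`" is visible rather than silent.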