GraphQL search API

## Summary

Add a presentation-facing **GraphQL search API**, generated from the same SHACL + `search:` source as the existing engine-agnostic projection (`@lde/search`) and engine adapter (`@lde/search-typesense`). Per the platform design, both the GraphQL API (schema + Mercurius resolvers) and the REST API (OpenAPI + Fastify handlers) are generated from one projection, so a single source serves N engines (Typesense is the v1 adapter; Elasticsearch/OpenSearch follow the same pattern).

See the platform reference: https://docs.nde.nl/stack/layers/platform#search-apis

## Requirement: facet labels returned inline

Facets are keyed by IRI (organizations, classes, terminology sources). The GraphQL API MUST return each facet bucket **with its human-readable, locale-resolved label attached inline** – so clients never perform a separate IRI→label lookup. The same applies to entity references rendered from an IRI (e.g. the publisher on a result card).

This is an **engine-agnostic contract**. It says nothing about Typesense, sidecar collections, joins, or `export` – *how* labels are resolved is the **engine adapter’s** responsibility, hidden behind the framed-document projection. Consumers of the API never learn which engine is underneath (Typesense is only the v1 adapter; Elasticsearch/OpenSearch satisfy the same contract their own way).

Concretely:

- Facet buckets carry the IRI value, its count, **and** the label as a `[LanguageString!]!` ordered by the request’s Accept-Language preference, matching the platform’s multilingual-string convention.
- Label resolution happens server-side, below the API surface. The adapter is free to choose its mechanism – e.g. the Typesense v1 adapter can draw on a sidecar label collection; an Elasticsearch adapter might use nested documents – but none of that leaks into the schema or the resolver contract.
- No engine client, key, or bespoke lookup logic is shipped into any consumer.
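
The locale ordering described above could be sketched as follows. This is a minimal illustration, not the generated resolver: the `LanguageString` field names (`language`, `value`) and the helper names `parseAcceptLanguage` and `orderLabels` are assumptions, since the source only names the GraphQL type.

```typescript
// Assumed shape: the source names the GraphQL type `LanguageString` but not its fields.
interface LanguageString {
  language: string; // BCP 47 tag such as "nl" or "en"
  value: string;
}

// Parse an Accept-Language header into lowercase tags, most preferred first.
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(',')
    .map((part, index) => {
      const pieces = part.trim().split(';').map((p) => p.trim());
      const tag = (pieces[0] ?? '').toLowerCase();
      const q = pieces.slice(1).find((p) => p.startsWith('q='));
      const weight = q === undefined ? 1 : Number(q.slice(2));
      return { tag, weight: Number.isNaN(weight) ? 0 : weight, index };
    })
    .filter((entry) => entry.tag !== '' && entry.tag !== '*' && entry.weight > 0)
    .sort((a, b) => b.weight - a.weight || a.index - b.index)
    .map((entry) => entry.tag);
}

// Put labels in the client's preferred languages first. Labels in languages
// the client did not ask for keep their original relative order at the end.
function orderLabels(labels: LanguageString[], acceptLanguage: string): LanguageString[] {
  const preferred = parseAcceptLanguage(acceptLanguage);
  const rank = (label: LanguageString): number => {
    const lang = label.language.toLowerCase();
    const position = preferred.findIndex(
      (tag) => lang === tag || lang.startsWith(tag + '-') || tag.startsWith(lang + '-'),
    );
    return position === -1 ? preferred.length : position;
  };
  return labels
    .map((label, index) => ({ label, index }))
    .sort((a, b) => rank(a.label) - rank(b.label) || a.index - b.index)
    .map((entry) => entry.label);
}
```

With labels stored as `en` then `nl` and a header of `nl-NL,nl;q=0.9,en;q=0.8`, the `nl` label comes back first; a client that only wants one string reads element `0`.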

### Why inline

Resolving labels server-side and returning them inline:

- removes a per-render round-trip and the need for clients to hold a search-engine client (today, a Typesense client) at all;
- keeps the locale-fallback policy (e.g. nl→en→first) in **one** server-side place instead of duplicated per consumer;
- preserves the **engine-agnostic boundary** – the API abstracts over the search engine (Typesense is the v1 adapter, not part of the contract), `@lde/search` / `@lde/search-typesense` stay domain-agnostic mechanics (no `label` concept; see the framed-document projection), and label semantics live in the API layer’s resolvers and the domain projection.

## Notes on shape (from the platform reference)

- Standard GraphQL idioms: Prisma-style typed filter inputs (`where: { field: { eq, in, gt, … } }`), Relay Connection pagination (`first` / `after` / opaque cursor), facet arrays with counts.
- Localised content as `[LanguageString!]!` ordered by Accept-Language, not language-tagged literals.
- Schema + resolvers generated from SHACL + `search:` annotations, not hand-written.

---

## Full feature surface: what the GraphQL API must cover

Scoping is driven by a comparison of the [Typesense search API](https://typesense.org/docs/latest/api/search.html) and the [Elasticsearch `_search` API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html). Four principles decide what is in the public schema vs. configured below the boundary:

- **Generated, not hand-written.** Every field/filter/facet/sort is emitted from SHACL + `search:` + the projection. The surface is bounded by what the projection materialises (`title_search_nl`, `publisher` facet, `size` number, date fields); the API cannot expose a capability the projection does not produce.
- **Engine-agnostic.** Anything exposed must be expressible on *both* Typesense and Elasticsearch (and resolvable by a future adapter). Engine-proprietary knobs stay below the boundary.
- **Presentation-facing.** The audience is web developers who do not know RDF. Prisma `where`, Relay connections, plain SDL. Relevance-engineering knobs do not leak unless a UI genuinely drives them.
- **Folding contract.** `query` MUST be `fold()`ed server-side with the same function used at index time (see [`@lde/search` README](../tree/main/packages/search#querying)), or matches silently miss. Invisible to the client.
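
To illustrate why the folding contract matters, here is a small sketch. `foldSketch` is a hypothetical stand-in (lowercasing plus diacritic stripping) and is not the real `fold()` from `@lde/search`, whose exact behaviour is not described here; the point is only that index time and query time must apply the same function.

```typescript
// Illustrative stand-in only: NOT the real `fold()` from `@lde/search`.
function foldSketch(text: string): string {
  return text
    .normalize('NFD') // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, '') // drop the combining accents
    .toLowerCase();
}

// Index time: the stored token is the folded form.
const indexedTitle = foldSketch('Café Müller'); // "cafe muller"

// Query time: folding with the same function makes the match succeed.
const matchesWhenFolded = indexedTitle.includes(foldSketch('CAFÉ')); // true

// Skipping the fold compares "CAFÉ" against "cafe muller" and silently misses.
const matchesWhenNotFolded = indexedTitle.includes('CAFÉ'); // false

console.log({ indexedTitle, matchesWhenFolded, matchesWhenNotFolded });
```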

### Capability map (Typesense ∪ Elasticsearch → our GraphQL)

| Capability | Typesense | Elasticsearch | Expose in GraphQL? | How / where |
|---|---|---|---|---|
| **Query / matching** |
| Free-text query | `q` | `query.match` | **Yes** – `query: String` | Server folds it; fans out over all search-enabled per-locale fields |
| Which fields are searched | `query_by` | `multi_match.fields` | **No (server-side)** | Derived from projection’s `search`-enabled fields |
| Prefix / autocomplete | `prefix`, `infix` | edge-ngram / `suggest` | **Yes (v1)** | Separate `suggest` field/endpoint, see below |
| Typo tolerance | `num_typos`, `*_threshold` | `fuzziness` | **No (server default)** | Tuned in engine schema |
| **Filtering** |
| Typed filters | `filter_by` | `bool.filter` | **Yes** – Prisma `where` | Per-field `{ eq, in, gt, gte, lt, lte, exists }` from datatype |
| Boolean composition | `&&` `\|\|` | `bool must/should/must_not` | **Yes** – `AND`/`OR`/`NOT` on `where` | |
| IRI / reference filters | `:=` exact | `term` | **Yes** – `eq`/`in` on URI-string fields | Facet IRIs are filterable |
| **Faceting** |
| Facet buckets + counts | `facet_by` | `terms` agg | **Yes (core)** | `facets { value count label }` |
| **Inline locale-resolved labels** | (sidecar) | (nested) | **Yes – headline requirement** | `label: [LanguageString!]!`, resolved below the boundary |
| Numeric / date facet ranges | facet ranges | `range` agg | **Yes** where projection has numbers/dates | Generated for `number`/`date` fields |
| Facet value search / max values | `facet_query`, `max_facet_values` | agg `include`/`size` | **Yes** – args on the facet field | For long facet lists (e.g. publishers) |
| **Disjunctive facets** (a facet’s own selection does not shrink its own counts) | per-facet filter behaviour | `post_filter` | **Yes – must get right** | Resolver applies selected filters as `post_filter`-equivalent so multi-select facets behave correctly |
| **Sorting** |
| By relevance | `_text_match` | `_score` | **Yes** (default) | |
| By field / per-locale sort | `sort_by` | `sort` | **Yes** – `orderBy` enum | Uses `*_sort_${locale}` fields, `missing_values: last` |
| By recency | date `sort_by` | `sort` on date | **Yes** | From `date` fields |
| Random | `_rand()` | `random_score` | **Maybe** | For discover surfaces |
| **Pagination** |
| Relay connection | `page`/`per_page`, `offset` | `from`/`size`, `search_after` | **Yes** | `first`/`after`, opaque cursor hides the offset-vs-`search_after` choice |
| Total count | always | `track_total_hits` | **Yes** – `totalCount` | |
| **Ranking / boosting** (see breakdown below) |
| Field weights (title > description) | `query_by_weights` | field `^boost` | **No (server-side)** | From a new `search:boost` annotation |
| Locale weighting (user’s lang higher) | `query_by_weights` | field `^boost` | **No (server-side)** | Computed from Accept-Language |
| Exact-match / token-position priority | `prioritize_*` | query DSL | **No (server default)** | |
| Editorial curation: pin / hide | `pinned_hits`, `hidden_hits`, overrides | pinned query / curation | **Deferred (out of scope v1)** | |
| Decay / recency / popularity boost | `_eval()`, decay | `function_score`, decay | **No raw params** | Surfaced only via the named `relevance` enum |
| Synonyms | synonym rules | synonym filter | **No (server-side)** | Engine config |
| **Result shaping** |
| Field selection | `include_fields` | `_source` | **Free** – GraphQL selection set | No param needed |
| Highlighting / snippets | `highlight_*` | `highlight` | **Yes (v1)** | `highlights` on result; engine-agnostic |
| Grouping / dedupe | `group_by` | `collapse` | **Deferred** | |
| **Discovery** |
| Did-you-mean / suggest | `q` + typos | `suggest` | **Yes (v1)** | Autocomplete field/endpoint |
| Vector / semantic / hybrid | `vector_query` | `knn`, RRF | **Schema room (v1)** | Reserve the shape; no implementation yet |
| **Cross-cutting** |
| Localization | – | – | **Yes, everywhere** | `Accept-Language` → `[LanguageString!]!` ordering |
| HTTP caching | `use_cache` | – | **REST twin** | `ETag` / `Cache-Control` |
| Analytics / click tracking | analytics rules | – | **Out of scope v1** | |
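
The disjunctive-facet row is the easiest one to get wrong, so here is an in-memory sketch of the intended semantics. It is engine-free and purely illustrative: the `Doc` and `Selection` shapes and the function names are assumptions, and a real adapter would push the equivalent logic down to the engine (for example via `post_filter` on Elasticsearch).

```typescript
type Doc = Record<string, string>; // facet name -> value (IRI as plain string)
type Selection = Record<string, string[]>; // facet name -> selected values

// True when the document satisfies every selected facet, optionally ignoring one.
function matchesSelection(doc: Doc, selection: Selection, skipFacet?: string): boolean {
  return Object.entries(selection).every(([facet, values]) => {
    if (facet === skipFacet || values.length === 0) return true;
    const value = doc[facet];
    return value !== undefined && values.includes(value);
  });
}

// Hits are filtered by ALL selections.
function searchHits(docs: Doc[], selection: Selection): Doc[] {
  return docs.filter((doc) => matchesSelection(doc, selection));
}

// Counts for one facet are computed with that facet's OWN selection ignored,
// so ticking one publisher does not make the other publishers disappear.
function disjunctiveFacetCounts(docs: Doc[], selection: Selection, facet: string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of docs) {
    if (!matchesSelection(doc, selection, facet)) continue;
    const value = doc[facet];
    if (value === undefined) continue;
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  return counts;
}
```

For example, with `publisher = A` and `format = csv` selected, the hit list contains only documents matching both, while the publisher facet still lists every publisher that has `csv` datasets.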

### Query boosting

Boosting is not one feature but four mechanisms that land on different sides of the boundary:

1. **Field weighting** (title ranks above description) – `query_by_weights` / `^boost`. Relevance engineering, not a client concern → driven by a new `search:boost` annotation per field; the generator emits the weights. Not a GraphQL parameter.
2. **Locale weighting** (rank the Accept-Language locale’s `*_search_${locale}` fields higher) – already anticipated in the `@lde/search` README. Computed server-side from Accept-Language; the resolver reorders the weights so the request’s locale leads.
3. **Editorial curation** (pin to top / hide) – `pinned_hits`/`hidden_hits`/overrides ↔ ES pinned query/curation. **Out of scope v1**; revisit when a real admin surface exists.
4. **Signal-based boosting** (recency decay, popularity) – `_eval()`/decay ↔ `function_score`. Server-configured; surfaced to clients **only** via the named `relevance` enum, never as raw decay parameters.

Net: the only boosting in the public schema is the `relevance`/`orderBy` enum. Everything else lives in `search:` annotations and the engine schema, generated, below the boundary – so relevance tuning changes without a breaking schema change.
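
Mechanisms 1 and 2 could combine roughly as sketched below for the Typesense adapter. This is an assumption-laden illustration: the `SearchField` shape, the `buildQueryBy` helper, the multiplier `localeBonus`, and the field `description_search_nl` are hypothetical; only `query_by`, `query_by_weights`, and the `*_search_${locale}` naming come from the text above.

```typescript
// Hypothetical per-field metadata the generator might emit from `search:boost`.
interface SearchField {
  name: string; // e.g. "title_search_nl"
  locale: string; // e.g. "nl"
  boost: number; // base weight from the annotation
}

// Build Typesense-style `query_by` / `query_by_weights` strings, multiplying the
// weight of fields in the request's locale so they lead the ranking.
function buildQueryBy(
  fields: SearchField[],
  requestLocale: string,
  localeBonus = 2, // assumed multiplier, not a documented value
): { query_by: string; query_by_weights: string } {
  const weighted = fields
    .map((field, index) => ({
      name: field.name,
      weight: field.boost * (field.locale === requestLocale ? localeBonus : 1),
      index,
    }))
    // Highest weight first; ties keep the generator's original field order.
    .sort((a, b) => b.weight - a.weight || a.index - b.index);
  return {
    query_by: weighted.map((field) => field.name).join(','),
    query_by_weights: weighted.map((field) => String(field.weight)).join(','),
  };
}
```

Because the weights are computed per request on the server, changing a boost or the locale bonus alters ranking without touching the GraphQL schema.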

---

## v1 scope (decided)

**In – core contract**

- Free-text `query` (folded server-side), Prisma `where` (`eq`/`in`/`gt`/`gte`/`lt`/`lte`/`exists` + `AND`/`OR`/`NOT`), `orderBy`, Relay `first`/`after` + `totalCount`.
- Facets with counts **and inline locale-ordered `[LanguageString!]!` labels** (the headline requirement above), disjunctive multi-select behaviour, facet-value search/limit, numeric/date ranges.
- Localization end-to-end via `Accept-Language`.
- **Relevance as a named enum** (`RELEVANCE | RECENT | POPULAR | …`) – no raw boost knobs.
- **Highlighting / snippets** on results.
- **Autocomplete / suggest** (prefix + did-you-mean) as a separate field/endpoint.
- **Schema room for vector / semantic / hybrid** – reserve the shape, no implementation yet.

**Below the boundary – generated/configured, never in the schema**

`query_by` field set, field weights & locale weighting (from a new `search:boost` annotation), typo tolerance, exact-match/token-position priority, synonyms, decay/recency tuning, the folding step, sidecar label collections, and the offset-vs-`search_after` pagination mechanism.
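
As an example of keeping a mechanism below the boundary, the opaque Relay cursor could wrap either pagination strategy as sketched here. The payload shape and the `encodeCursor` / `decodeCursor` names are assumptions; the only requirement taken from the text is that clients cannot tell offset paging from `search_after`.

```typescript
// Hypothetical cursor payload: the adapter picks one variant, the client sees neither.
type CursorPayload =
  | { kind: 'offset'; offset: number }
  | { kind: 'searchAfter'; sortValues: Array<string | number> };

// Serialize to an opaque string. encodeURIComponent keeps btoa safe for non-ASCII sort values.
function encodeCursor(payload: CursorPayload): string {
  return btoa(encodeURIComponent(JSON.stringify(payload)));
}

// Parse and validate a cursor received in the `after` argument.
function decodeCursor(cursor: string): CursorPayload {
  const parsed: unknown = JSON.parse(decodeURIComponent(atob(cursor)));
  if (typeof parsed === 'object' && parsed !== null) {
    const candidate = parsed as { kind?: unknown; offset?: unknown; sortValues?: unknown };
    if (candidate.kind === 'offset' && typeof candidate.offset === 'number') {
      return { kind: 'offset', offset: candidate.offset };
    }
    if (candidate.kind === 'searchAfter' && Array.isArray(candidate.sortValues)) {
      return { kind: 'searchAfter', sortValues: candidate.sortValues as Array<string | number> };
    }
  }
  throw new Error('Invalid cursor');
}
```

An adapter can switch from offsets to `search_after` later without a schema change, because `after: String` never promised either.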

**Deferred**

- Editorial curation / pin-hide (revisit when a real admin surface exists).
- Grouping / collapse.
- Analytics / click tracking.

### Schema sketch

```graphql
type Query {
  datasets(
    query: String
    where: DatasetWhere
    orderBy: [DatasetOrder!]      # relevance (default) | title | modified | …
    first: Int, after: String      # Relay
    facets: [DatasetFacetName!]    # which facets to compute
  ): DatasetConnection!
}

type DatasetConnection {
  edges: [DatasetEdge!]!
  pageInfo: PageInfo!
  totalCount: Int!
  facets: [Facet!]!                # buckets with inline labels
}

type Facet {
  name: String!
  buckets: [FacetBucket!]!
}

type FacetBucket {
  value: String!                   # IRI (as plain string)
  count: Int!
  label: [LanguageString!]!        # inline, locale-ordered, resolved below the boundary
}

input DatasetWhere {
  AND: [DatasetWhere!]
  OR: [DatasetWhere!]
  NOT: DatasetWhere
  publisher: UriFilter             # { eq, in, exists }
  size: IntFilter                  # { eq, gt, gte, lt, lte }
  modified: DateFilter
}
```
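
To show how the `DatasetWhere` input above might be lowered by an adapter, here is a sketch that translates it to an Elasticsearch `bool` query. The TypeScript types mirror the schema sketch, but the single `FieldFilter` type and the `toEsQuery` function are assumptions, and a generated adapter would derive the field list from the projection rather than hard-code it.

```typescript
type Scalar = string | number;

// One filter type for brevity; the schema sketch has UriFilter / IntFilter / DateFilter.
interface FieldFilter {
  eq?: Scalar;
  in?: Scalar[];
  gt?: Scalar;
  gte?: Scalar;
  lt?: Scalar;
  lte?: Scalar;
  exists?: boolean;
}

interface DatasetWhere {
  AND?: DatasetWhere[];
  OR?: DatasetWhere[];
  NOT?: DatasetWhere;
  publisher?: FieldFilter;
  size?: FieldFilter;
  modified?: FieldFilter;
}

type EsQuery = Record<string, unknown>;
const FIELDS = ['publisher', 'size', 'modified'] as const;

function toEsQuery(where: DatasetWhere): EsQuery {
  const filter: EsQuery[] = [];
  const mustNot: EsQuery[] = [];

  for (const field of FIELDS) {
    const f = where[field];
    if (f === undefined) continue;
    if (f.eq !== undefined) filter.push({ term: { [field]: f.eq } });
    if (f.in !== undefined) filter.push({ terms: { [field]: f.in } });
    const range: Record<string, Scalar> = {};
    if (f.gt !== undefined) range.gt = f.gt;
    if (f.gte !== undefined) range.gte = f.gte;
    if (f.lt !== undefined) range.lt = f.lt;
    if (f.lte !== undefined) range.lte = f.lte;
    if (Object.keys(range).length > 0) filter.push({ range: { [field]: range } });
    if (f.exists === true) filter.push({ exists: { field } });
    if (f.exists === false) mustNot.push({ exists: { field } });
  }

  // AND members are additional filters; NOT becomes must_not; OR becomes should.
  for (const sub of where.AND ?? []) filter.push(toEsQuery(sub));
  if (where.NOT !== undefined) mustNot.push(toEsQuery(where.NOT));

  const bool: Record<string, unknown> = {};
  if (filter.length > 0) bool.filter = filter;
  if (mustNot.length > 0) bool.must_not = mustNot;
  if (where.OR !== undefined && where.OR.length > 0) {
    bool.should = where.OR.map(toEsQuery);
    bool.minimum_should_match = 1; // at least one OR branch must match
  }
  return Object.keys(bool).length > 0 ? { bool } : { match_all: {} };
}
```

Using `filter` rather than `must` keeps these clauses out of scoring, which fits the split between filtering and relevance described earlier.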

### Note: `search:boost` is a new annotation

Field + locale weighting needs a home, and the cleanest one is a new `search:boost` annotation on the projection/SHACL source. This is the only part of this scope that reaches down into the existing packages: `@lde/search`’s spec vocabulary today is `langText`/`facet`/`number`/`date` with no weight concept. The addition is **additive, not breaking**, but it is the one piece that lands outside the new API layer.

## Relation to current state / interim

- The write/populate side is done and stays in LDE: `@lde/search-typesense`’s blue/green `rebuild` populates both the dataset and the sidecar label collections; the domain projection (label documents, schema, locale fallback) lives in the consumer (dataset-register, `search-indexer`).
- Until this API lands, consumers resolve facet/publisher labels themselves as **throwaway** client-side code (a fetch-all-and-cache of the bounded label collection). This issue supersedes that: once labels come back inline, that interim lookup is deleted.
- Context for the interim design: netwerk-digitaal-erfgoed/dataset-register#2085.

## Out of scope

- The REST/OpenAPI variant (same generator, separate tracking).
- Any label-aware surface in `@lde/search` or `@lde/search-typesense` – these stay engine-mechanics only; label resolution belongs in the API layer.
- Editorial curation (pin/hide), grouping/collapse, and analytics/click tracking – deferred from v1 (see above).


