Summary
Add a presentation-facing GraphQL search API, generated from the same SHACL + search: source as the existing engine-agnostic projection (@lde/search) and engine adapter (@lde/search-typesense). Per the platform design, both the GraphQL API (schema + Mercurius resolvers) and the REST API (OpenAPI + Fastify handlers) are generated from one projection, so a single source serves N engines (Typesense is the v1 adapter; Elasticsearch/OpenSearch follow the same pattern).
See the platform reference: https://docs.nde.nl/stack/layers/platform#search-apis
Requirement: facet labels returned inline
Facets are keyed by IRI (organizations, classes, terminology sources). The GraphQL API MUST return each facet bucket with its human-readable, locale-resolved label attached inline – so clients never perform a separate IRI→label lookup. The same applies to entity references rendered from an IRI (e.g. the publisher on a result card).
This is an engine-agnostic contract. It says nothing about Typesense, sidecar collections, joins, or export – how labels are resolved is the engine adapter’s responsibility, hidden behind the framed-document projection. Consumers of the API never learn which engine is underneath (Typesense is only the v1 adapter; Elasticsearch/OpenSearch satisfy the same contract their own way).
Concretely:
Facet buckets carry the IRI value, its count, and the label as a locale-ordered [LanguageString!]! (Accept-Language preference), matching the platform’s multilingual-string convention.
Label resolution happens server-side, below the API surface. The adapter is free to choose its mechanism – e.g. the Typesense v1 adapter can draw on a sidecar label collection; an Elasticsearch adapter might use nested documents – but none of that leaks into the schema or the resolver contract.
No engine client, key, or bespoke lookup logic is shipped into any consumer.
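As an illustration of the bucket shape described above, here is a minimal TypeScript sketch. Only LanguageString and the value/count/label triple come from the text; the field names inside LanguageString, the RawBucket type and the attachLabels helper are assumptions made for the example, not the generated code.

```typescript
// Hypothetical shapes: only LanguageString and the value/count/label triple
// come from the text above; the field names inside LanguageString are assumed.
interface LanguageString {
  language: string;
  value: string;
}

interface FacetBucket {
  value: string; // the IRI the facet is keyed by
  count: number;
  label: LanguageString[]; // resolved server-side, already locale-ordered
}

// Raw engine output: IRIs and counts only.
interface RawBucket {
  value: string;
  count: number;
}

// `labels` stands in for whatever the adapter resolved (sidecar collection,
// nested documents, ...); the API layer only sees the finished map.
function attachLabels(
  raw: RawBucket[],
  labels: Map<string, LanguageString[]>,
): FacetBucket[] {
  return raw.map((bucket) => ({
    value: bucket.value,
    count: bucket.count,
    // An IRI without a known label yields an empty list, which still
    // satisfies the non-null [LanguageString!]! type.
    label: labels.get(bucket.value) ?? [],
  }));
}
```

Passing the resolved labels in as a plain map keeps the example engine-neutral: nothing in the bucket type reveals where the labels came from.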
Why inline
Resolving labels server-side and returning them inline:
removes a per-render round-trip and the need for clients to hold a Typesense client at all;
keeps the locale-fallback policy (e.g. nl→en→first) in one server-side place instead of duplicated per consumer;
preserves the engine-agnostic boundary – the API abstracts over the search engine (Typesense is the v1 adapter, not part of the contract), @lde/search / @lde/search-typesense stay domain-agnostic mechanics (no label concept; see the framed-document projection), and label semantics live in the API layer’s resolvers and the domain projection.
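The locale-fallback policy mentioned above (e.g. nl→en→first) could be kept in a single server-side function along these lines. This is a sketch under assumptions: the LanguageString field names, the function names and the default fallback list are invented for the example, and the header parsing is deliberately simplified (a tag such as en-GB also matches labels tagged en, but not the other way round).

```typescript
interface LanguageString {
  language: string;
  value: string;
}

// Parse an Accept-Language header into language tags, highest q-value first.
function parseAcceptLanguage(header: string): string[] {
  return header
    .split(",")
    .map((part, index) => {
      const pieces = part.split(";").map((p) => p.trim());
      const tag = (pieces[0] ?? "").toLowerCase();
      const q = pieces.slice(1).find((p) => p.indexOf("q=") === 0);
      const weight = q === undefined ? 1 : Number(q.slice(2));
      return { tag, weight: isNaN(weight) ? 0 : weight, index };
    })
    .filter((entry) => entry.tag !== "" && entry.tag !== "*" && entry.weight > 0)
    .sort((a, b) => b.weight - a.weight || a.index - b.index)
    .map((entry) => entry.tag);
}

// Order labels: requested locales first (in preference order), then the
// server-side fallbacks, then any remaining labels in their original order.
function orderLabels(
  labels: LanguageString[],
  acceptLanguage: string,
  fallbacks: string[] = ["nl", "en"], // assumed policy, taken from the nl→en→first example
): LanguageString[] {
  const preference = [...parseAcceptLanguage(acceptLanguage), ...fallbacks];
  const rank = (label: LanguageString): number => {
    const lang = label.language.toLowerCase();
    const i = preference.findIndex(
      (tag) => tag === lang || tag.split("-")[0] === lang,
    );
    return i === -1 ? preference.length : i;
  };
  return labels
    .map((label, index) => ({ label, index }))
    .sort((a, b) => rank(a.label) - rank(b.label) || a.index - b.index)
    .map((entry) => entry.label);
}
```

Because every consumer receives the list already ordered, a client can simply render the first entry.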
Notes on shape (from the platform reference)
Standard GraphQL idioms: Prisma-style typed filter inputs (where: { field: { eq, in, gt, … } }), Relay Connection pagination (first / after / opaque cursor), facet arrays with counts.
Localised content as [LanguageString!]! ordered by Accept-Language, not language-tagged literals.
Schema + resolvers generated from SHACL + search: annotations, not hand-written.
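To make the Prisma-style where input concrete, the sketch below shows the kind of types a generator might emit and evaluates them in memory. It illustrates the filter semantics only; it is not a translation to any engine's query language. DatasetDoc, DatasetWhere and the field names publisher, size and issued are assumptions for the example.

```typescript
// Per-field operators as listed in the text; everything else is hypothetical.
interface FieldFilter<T> {
  eq?: T;
  in?: T[];
  gt?: T;
  gte?: T;
  lt?: T;
  lte?: T;
  exists?: boolean;
}

interface DatasetDoc {
  publisher?: string; // IRI
  size?: number;
  issued?: string; // ISO 8601 date, so string comparison follows date order
}

interface DatasetWhere {
  AND?: DatasetWhere[];
  OR?: DatasetWhere[];
  NOT?: DatasetWhere;
  publisher?: FieldFilter<string>;
  size?: FieldFilter<number>;
  issued?: FieldFilter<string>;
}

function matchesField<T extends string | number>(
  value: T | undefined,
  f: FieldFilter<T>,
): boolean {
  if (f.exists !== undefined && (value !== undefined) !== f.exists) return false;
  const comparesValue =
    f.eq !== undefined || f.in !== undefined || f.gt !== undefined ||
    f.gte !== undefined || f.lt !== undefined || f.lte !== undefined;
  if (!comparesValue) return true;
  if (value === undefined) return false; // a missing value satisfies no comparison
  if (f.eq !== undefined && value !== f.eq) return false;
  if (f.in !== undefined && f.in.indexOf(value) === -1) return false;
  if (f.gt !== undefined && !(value > f.gt)) return false;
  if (f.gte !== undefined && !(value >= f.gte)) return false;
  if (f.lt !== undefined && !(value < f.lt)) return false;
  if (f.lte !== undefined && !(value <= f.lte)) return false;
  return true;
}

// Sibling conditions inside one `where` object are combined with AND.
function matches(doc: DatasetDoc, where: DatasetWhere): boolean {
  if (where.AND && !where.AND.every((w) => matches(doc, w))) return false;
  if (where.OR && !where.OR.some((w) => matches(doc, w))) return false;
  if (where.NOT && matches(doc, where.NOT)) return false;
  if (where.publisher && !matchesField(doc.publisher, where.publisher)) return false;
  if (where.size && !matchesField(doc.size, where.size)) return false;
  if (where.issued && !matchesField(doc.issued, where.issued)) return false;
  return true;
}
```

An adapter would walk the same structure but emit an engine query instead of a boolean.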
Full feature surface: what the GraphQL must cover
Scoping driven by a comparison of the Typesense search API and the Elasticsearch _search API. Four principles decide what is in the public schema vs. configured below the boundary:
Generated, not hand-written. Every field/filter/facet/sort is emitted from SHACL + search: + the projection. The surface is bounded by what the projection materialises (title_search_nl, publisher facet, size number, date fields); the API cannot expose a capability the projection does not produce.
Engine-agnostic. Anything exposed must be expressible on both Typesense and Elasticsearch (and resolvable by a future adapter). Engine-proprietary knobs stay below the boundary.
Presentation-facing. The audience is web developers who do not know RDF. Prisma where, Relay connections, plain SDL. Relevance-engineering knobs do not leak unless a UI genuinely drives them.
Folding contract. query MUST be fold()ed server-side with the same function used at index time (see @lde/search README), or matches silently miss. Invisible to the client.
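The folding contract can be illustrated with a small sketch. The fold function below is a hypothetical stand-in (lowercasing plus stripping combining diacritics); the real fold() in @lde/search may behave differently. The point is only that index time and query time go through the same function.

```typescript
// Hypothetical stand-in for the fold() referred to in the text.
function fold(text: string): string {
  return text
    .normalize("NFD") // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase();
}

// Index time: the projection stores the folded form in the search field.
function toSearchField(title: string): string {
  return fold(title);
}

// Query time: the resolver folds the user's input with the same function,
// so the client never needs to know that folding exists.
function toEngineQuery(userQuery: string): string {
  return fold(userQuery.trim());
}
```

If the two sides used different functions, an indexed "Café" and a typed "cafe" could end up as different tokens and the match would silently miss.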
Capability map (Typesense ∪ Elasticsearch → our GraphQL)

| Capability | Typesense | Elasticsearch | In GraphQL | Notes |
| --- | --- | --- | --- | --- |
| Query text | q | query.match | query: String | Server folds it; fans out over all search-enabled per-locale fields |
| Which fields are searched | query_by | multi_match.fields | No (server-side) | Derived from projection’s search-enabled fields |
| Prefix / autocomplete | prefix, infix | edge-ngram / suggest | Yes (v1) | Separate suggest field/endpoint, see below |
| Typo tolerance | num_typos, *_threshold | fuzziness | No (server default) | Tuned in engine schema |
| Filtering | | | | |
| Typed filters | filter_by | bool.filter | Yes – Prisma where | Per-field { eq, in, gt, gte, lt, lte, exists } from datatype |
| Boolean composition | &&, \|\| | bool must/should/must_not | Yes – AND/OR/NOT on where | |
| IRI / reference filters | := exact | term | Yes – eq/in on URI-string fields | Facet IRIs are filterable |
| Faceting | | | | |
| Facet buckets + counts | facet_by | terms agg | Yes (core) | facets { value count label } |
| Inline locale-resolved labels | (sidecar) | (nested) | Yes – headline requirement | label: [LanguageString!]!, resolved below the boundary |
| Numeric / date facet ranges | facet ranges | range agg | Yes where projection has numbers/dates | Generated for number/date fields |
| Facet value search / max values | facet_query, max_facet_values | agg include/size | Yes – args on the facet field | For long facet lists (e.g. publishers) |
| Disjunctive facets (a facet’s own selection does not shrink its own counts) | per-facet filter behaviour | post_filter | Yes – must get right | Resolver applies selected filters as post_filter-equivalent so multi-select facets behave correctly |
| Sorting | | | | |
| By relevance | _text_match | _score | Yes (default) | |
| By field / per-locale sort | sort_by | sort | Yes – orderBy enum | Uses *_sort_${locale} fields, missing_values: last |
| By recency | date sort_by | sort on date | Yes | From date fields |
| Random | _rand() | random_score | Maybe | For discover surfaces |
| Pagination | | | | |
| Relay connection | page/per_page, offset | from/size, search_after | Yes | first/after, opaque cursor hides the offset-vs-search_after choice |
| Total count | always | track_total_hits | Yes – totalCount | |
| Ranking / boosting (see breakdown below) | | | | |
| Field weights (title > description) | query_by_weights | field ^boost | No (server-side) | From a new search:boost annotation |
| Locale weighting (user’s lang higher) | query_by_weights | field ^boost | No (server-side) | Computed from Accept-Language |
| Exact-match / token-position priority | prioritize_* | query DSL | No (server default) | |
| Editorial curation: pin / hide | pinned_hits, hidden_hits, overrides | pinned query / curation | Deferred (out of scope v1) | |
| Decay / recency / popularity boost | _eval(), decay | function_score, decay | No raw params | Surfaced only via the named relevance enum |
| Synonyms | synonym rules | synonym filter | No (server-side) | Engine config |
| Result shaping | | | | |
| Field selection | include_fields | _source | Free – GraphQL selection set | No param needed |
| Highlighting / snippets | highlight_* | highlight | Yes (v1) | highlights on result; engine-agnostic |
| Grouping / dedupe | group_by | collapse | Deferred | |
| Discovery | | | | |
| Did-you-mean / suggest | q + typos | suggest | Yes (v1) | Autocomplete field/endpoint |
| Vector / semantic / hybrid | vector_query | knn, RRF | Schema room (v1) | Reserve the shape; no implementation yet |
| Cross-cutting | | | | |
| Localization | – | – | Yes, everywhere | Accept-Language → [LanguageString!]! ordering |
| HTTP caching | use_cache | – | REST twin | ETag / Cache-Control |
| Analytics / click tracking | analytics rules | – | Out of scope v1 | |
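The disjunctive-facet behaviour (a facet’s own selection does not shrink its own counts) is easy to get wrong, so here is a small in-memory sketch of the intended semantics. It is not how either engine computes it; Doc, Selections and both function names are assumptions for the example.

```typescript
type Doc = Record<string, string[]>; // facet field -> IRI values on the document
type Selections = Record<string, string[]>; // facet field -> IRIs the user ticked

// A document matches when, for every facet with a selection, it carries at
// least one of the selected IRIs. `skip` leaves one facet out of the check.
function matchesSelections(doc: Doc, selections: Selections, skip?: string): boolean {
  return Object.keys(selections).every((field) => {
    const selected = selections[field];
    if (field === skip || selected.length === 0) return true;
    const values = doc[field] ?? [];
    return selected.some((iri) => values.indexOf(iri) !== -1);
  });
}

// Counts for one facet: apply every selection except the facet's own, so
// ticking one publisher does not zero out the other publishers.
function disjunctiveCounts(
  docs: Doc[],
  facet: string,
  selections: Selections,
): Map<string, number> {
  const counts = new Map<string, number>();
  for (const doc of docs) {
    if (!matchesSelections(doc, selections, facet)) continue;
    for (const value of doc[facet] ?? []) {
      counts.set(value, (counts.get(value) ?? 0) + 1);
    }
  }
  return counts;
}
```

The result list itself still applies all selections (matchesSelections without skip); only the per-facet counts leave their own facet out, which is the effect post_filter achieves on Elasticsearch.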
Query boosting
Boosting is not one feature but four mechanisms that land on different sides of the boundary:
Field weighting (title ranks above description) – query_by_weights / ^boost. Relevance engineering, not a client concern → driven by a new search:boost annotation per field; the generator emits the weights. Not a GraphQL parameter.
Locale weighting (rank the Accept-Language locale’s *_search_${locale} fields higher) – already anticipated in the @lde/search README. Computed server-side from Accept-Language; the resolver reorders the weights so the request’s locale leads.
Editorial curation (pin to top / hide) – pinned_hits/hidden_hits/overrides ↔ ES pinned query/curation. Out of scope v1; revisit when a real admin surface exists.
Signal-based boosting (recency decay, popularity) – _eval()/decay ↔ function_score. Server-configured; surfaced to clients only via the named relevance enum, never as raw decay parameters.
Net: the only boosting in the public schema is the relevance/orderBy enum. Everything else lives in search: annotations and the engine schema, generated, below the boundary – so relevance tuning changes without a breaking schema change.
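Field and locale weighting could be combined server-side roughly as follows. The sketch assumes each search field carries a numeric boost (standing in for the proposed search:boost annotation) and that the request locale's fields are multiplied by a fixed factor; both the factor and the helper name are invented for the example. The output uses Typesense's comma-separated query_by / query_by_weights parameters.

```typescript
interface SearchField {
  name: string; // e.g. title_search_nl
  locale: string;
  boost: number; // stand-in for the proposed search:boost annotation
}

interface WeightedQuery {
  query_by: string;
  query_by_weights: string;
}

function buildWeightedQuery(
  fields: SearchField[],
  requestLocale: string,
  localeFactor: number = 2, // assumed multiplier for the request's locale
): WeightedQuery {
  const weighted = fields
    .map((field, index) => ({
      name: field.name,
      weight: field.boost * (field.locale === requestLocale ? localeFactor : 1),
      index,
    }))
    // Highest weight first; ties keep their declaration order.
    .sort((a, b) => b.weight - a.weight || a.index - b.index);
  return {
    query_by: weighted.map((f) => f.name).join(","),
    query_by_weights: weighted.map((f) => f.weight).join(","),
  };
}
```

None of this appears in the GraphQL schema: the client sends only query and an Accept-Language header.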
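v1 scope (decided)
In – core contract
query (folded server-side), Prisma where (eq/in/gt/gte/lt/lte/exists + AND/OR/NOT), orderBy, Relay first/after + totalCount.
Facets with counts and inline locale-ordered [LanguageString!]! labels (the headline requirement above), disjunctive multi-select behaviour, facet-value search/limit, numeric/date ranges.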
Localization end-to-end via Accept-Language.
Relevance as a named enum (RELEVANCE | RECENT | POPULAR | …) – no raw boost knobs.
Highlighting / snippets on results.
Autocomplete / suggest (prefix + did-you-mean) as a separate field/endpoint.
Schema room for vector / semantic / hybrid – reserve the shape, no implementation yet.
Below the boundary – generated/configured, never in the schema
query_by field set, field weights & locale weighting (from a new search:boost annotation), typo tolerance, exact-match/token-position priority, synonyms, decay/recency tuning, the folding step, sidecar label collections, and the offset-vs-search_after pagination mechanism.
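One way to keep the offset-vs-search_after choice below the boundary is an opaque cursor that wraps whichever payload the adapter needs. The sketch below is an assumption about how that might look; the payload shapes and function names are hypothetical, and hex encoding is used only to stay dependency-free (base64 is the more common choice).

```typescript
// Two hypothetical payloads: a page offset or a list of sort values.
type CursorPayload =
  | { kind: "offset"; offset: number }
  | { kind: "searchAfter"; values: Array<string | number> };

function encodeCursor(payload: CursorPayload): string {
  const json = JSON.stringify(payload);
  let hex = "";
  for (let i = 0; i < json.length; i++) {
    // Four hex digits per UTF-16 code unit.
    hex += ("000" + json.charCodeAt(i).toString(16)).slice(-4);
  }
  return hex;
}

function decodeCursor(cursor: string): CursorPayload {
  if (cursor.length % 4 !== 0 || !/^[0-9a-f]*$/.test(cursor)) {
    throw new Error("Invalid cursor");
  }
  let json = "";
  for (let i = 0; i < cursor.length; i += 4) {
    json += String.fromCharCode(parseInt(cursor.slice(i, i + 4), 16));
  }
  return JSON.parse(json) as CursorPayload;
}

// Resolver-side use for an offset-based adapter: the client passes `after`
// back untouched and never sees what is inside.
function nextOffset(after: string | null): number {
  if (after === null) return 0;
  const payload = decodeCursor(after);
  if (payload.kind !== "offset") {
    throw new Error("Cursor does not match this adapter");
  }
  return payload.offset;
}
```

Switching an adapter from offsets to search_after then changes only the payload, not the schema.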
Deferred
Editorial curation / pin-hide (revisit when a real admin surface exists).
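Pulling the v1 scope together, a schema along the following lines would cover the core contract. This is a rough sketch, not generator output: apart from the names quoted in the text (LanguageString, value/count/label, query, where, orderBy, first/after, totalCount, highlights and the RELEVANCE | RECENT | POPULAR values), every type name, field name and default is a placeholder. It is held in a TypeScript string, which is the form a Mercurius schema option accepts.

```typescript
// Hypothetical SDL; names not quoted in the issue text are placeholders.
const schemaSketch = /* GraphQL */ `
  type LanguageString {
    language: String!
    value: String!
  }

  type FacetBucket {
    value: String!
    count: Int!
    label: [LanguageString!]!
  }

  type Facet {
    field: String!
    buckets: [FacetBucket!]!
  }

  input StringFilter {
    eq: String
    in: [String!]
    exists: Boolean
  }

  input IntFilter {
    eq: Int
    in: [Int!]
    gt: Int
    gte: Int
    lt: Int
    lte: Int
    exists: Boolean
  }

  input DatasetWhere {
    AND: [DatasetWhere!]
    OR: [DatasetWhere!]
    NOT: DatasetWhere
    publisher: StringFilter
    size: IntFilter
  }

  enum SearchOrder {
    RELEVANCE
    RECENT
    POPULAR
  }

  type Dataset {
    id: ID!
    title: [LanguageString!]!
    highlights: [String!]!
  }

  type DatasetEdge {
    cursor: String!
    node: Dataset!
  }

  type PageInfo {
    hasNextPage: Boolean!
    endCursor: String
  }

  type DatasetConnection {
    edges: [DatasetEdge!]!
    pageInfo: PageInfo!
    totalCount: Int!
    facets: [Facet!]!
  }

  type Query {
    datasets(
      query: String
      where: DatasetWhere
      orderBy: SearchOrder = RELEVANCE
      first: Int = 20
      after: String
    ): DatasetConnection!
  }
`;
```

Nothing engine-specific appears in the sketch: no query_by, weights, typo settings or label collections.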
Note: search:boost is a new annotation
Field + locale weighting needs a home, and the cleanest one is a new search:boost annotation on the projection/SHACL source. This is the only part of this scope that reaches down into the existing packages: @lde/search’s spec vocabulary today is langText/facet/number/date with no weight concept. The addition is additive, not breaking, but it is the one piece that lands outside the new API layer.
Relation to current state / interim
The write/populate side is done and stays in LDE: @lde/search-typesense’s blue/green rebuild populates both the dataset and the sidecar label collections; the domain projection (label documents, schema, locale fallback) lives in the consumer (dataset-register, search-indexer).
Until this API lands, consumers resolve facet/publisher labels themselves as throwaway client-side code (a fetch-all-and-cache of the bounded label collection). This issue supersedes that: once labels come back inline, that interim lookup is deleted.
Out of scope
@lde/search or @lde/search-typesense – these stay engine-mechanics only; label resolution belongs in the API layer.