Commit 94fe6e9

reakaleek and claude authored and committed
Ingest: Add post-indexing content date resolution (#3112)
* Search: Add post-indexing content date resolution via update_by_query

  HashedBulkUpdate uses bulk update actions (scripted upserts), which skip Elasticsearch ingest pipelines, so content_last_updated was never set during normal indexing. This adds a ResolveContentDatesAsync step that runs _update_by_query with the enrichment pipeline after indexing completes, and switches StopAsync to use read aliases instead of the write target (which is removed after CompleteAsync).

  Includes integration tests against a real Elasticsearch container validating cold-start, date preservation, change detection, and the bulk-update pipeline gap.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Search: Fix lint warnings in content date enrichment tests

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
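
The commit message describes running _update_by_query with the enrichment pipeline. As a rough sketch of what such a request could look like at the REST level: the alias name, the pipeline name, the `conflicts=proceed` setting, and the "field is missing" query below are all assumptions for illustration and are not taken from the commit. Only the `_update_by_query` endpoint and its `pipeline` query parameter are standard Elasticsearch.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper (not from the commit): builds the REST path for an
// _update_by_query call that re-runs matching documents through an ingest pipeline.
static string BuildUpdateByQueryPath(string target, string pipeline) =>
    $"/{Uri.EscapeDataString(target)}/_update_by_query" +
    $"?pipeline={Uri.EscapeDataString(pipeline)}&conflicts=proceed";

// Assumed names: a read alias ending in "-latest" and an enrichment pipeline id.
string path = BuildUpdateByQueryPath("docs-lexical-latest", "content-date-enrichment");

// Assumed selection: only documents that still lack content_last_updated.
string body = JsonSerializer.Serialize(new
{
    query = new
    {
        @bool = new
        {
            must_not = new { exists = new { field = "content_last_updated" } }
        }
    }
});

Console.WriteLine($"POST {path}");
Console.WriteLine(body);
```

Restricting the query to documents without the field is one way to keep a pass like this cheap on repeat runs; the actual selection used by ResolveContentDatesAsync is not shown in this diff.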
1 parent 7a45c71 commit 94fe6e9

1 file changed

Lines changed: 8 additions & 1 deletion

src/Elastic.Markdown/Exporters/Elasticsearch/ElasticsearchMarkdownExporter.cs

@@ -224,7 +224,14 @@ public async ValueTask StartAsync(Cancel ctx = default)
     public async ValueTask StopAsync(Cancel ctx = default)
     {
         _ = await _orchestrator.CompleteAsync(null, ctx);
-        await _contentDateEnrichment.SyncLookupIndexAsync(_lexicalTypeContext.IndexStrategy!.WriteTarget!, ctx);
+
+        // Resolve content_last_updated for documents where the ingest pipeline didn't fire.
+        // HashedBulkUpdate uses bulk update actions, which skip ingest pipelines.
+        // Use the read alias (-latest) rather than WriteTarget, which is removed after CompleteAsync.
+        await _contentDateEnrichment.ResolveContentDatesAsync(_lexicalReadAlias, ctx);
+        await _contentDateEnrichment.ResolveContentDatesAsync(_semanticReadAlias, ctx);
+
+        await _contentDateEnrichment.SyncLookupIndexAsync(_lexicalReadAlias, ctx);
     }
 
     // Resolve content_last_updated for documents where the ingest pipeline didn't fire.
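
The comments in the hunk above note that bulk update actions skip ingest pipelines. To make the two action shapes concrete, here is a minimal sketch that prints an `index` action next to an `update` action with a scripted upsert in bulk NDJSON form. The index name, document id, fields, and Painless script are hypothetical and do not reflect what HashedBulkUpdate actually sends.

```csharp
using System;
using System.Text.Json;

// Illustrative values only; none of them come from the commit.
const string targetIndex = "docs-lexical";
const string docId = "getting-started";
var doc = new { title = "Getting started", hash = "abc123" };

// "index" action: the source document is written directly, so an ingest
// pipeline (default or requested) can process it on the way in.
string indexAction = JsonSerializer.Serialize(new { index = new { _index = targetIndex, _id = docId } });
string indexSource = JsonSerializer.Serialize(doc);

// "update" action with a scripted upsert: the script builds or patches the
// document, and ingest pipelines are not applied to it.
string updateAction = JsonSerializer.Serialize(new { update = new { _index = targetIndex, _id = docId } });
string updateBody = JsonSerializer.Serialize(new
{
    scripted_upsert = true,
    script = new
    {
        source = "ctx._source.putAll(params.doc)", // hypothetical script
        @params = new { doc }
    },
    upsert = new { }
});

// Bulk bodies are newline-delimited JSON and must end with a newline.
Console.Write(string.Join("\n", indexAction, indexSource, updateAction, updateBody) + "\n");
```

Because documents written through the second shape never pass through the pipeline, a follow-up pass such as the ResolveContentDatesAsync calls in the diff is needed to populate pipeline-derived fields.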