Add Pagefind incremental indexing#2890
Add Pagefind incremental indexing#2890MoshiMoshiMochi wants to merge 10 commits intoMarkBind:masterfrom
Conversation
It takes in an array of pages that needs to be updated. Filters if the page is need searchable, then add a new HTML file to update the in-memory index. Finally, we retrieve all the in-memory indexes, wipe the old files and write newly updated files to disk.
add & change handlers use updatePagefindIndex as there is only a need to either add/update 1 file in-memory. However, as there is no way of to remove just 1 specific index file from in-memory indexes, remove handler will use indexSiteWithPagefind, meaning that it will reindex the entire site. There might be a better solution out there. But since removing already calls rebuildSourceFiles which takes significantly longer that it takes to reindex the site. I don't think its a big deal.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #2890 +/- ##
==========================================
+ Coverage 71.35% 71.75% +0.39%
==========================================
Files 133 133
Lines 7229 7285 +56
Branches 1580 1721 +141
==========================================
+ Hits 5158 5227 +69
+ Misses 2065 1959 -106
- Partials 6 99 +93 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Now serving one page will directly call updatePagefindIndex with the respective page to build instead of calling indexSiteWithPagefind which will attempt to index all other pages too, even if they DNE. Still need to update test cases... arghhhhhhh lol
Removed warning about how pagefind only works on full site build as that is no longer the case after these changes
There was a problem hiding this comment.
Pull request overview
This PR adds incremental Pagefind indexing support during markbind serve, so search results stay up-to-date without requiring a full rebuild/restart (closing #2873).
Changes:
- Add an in-memory Pagefind index in
SiteGenerationManager, with a newupdatePagefindIndexmethod for incremental updates. - Update serve file-watch handlers to trigger Pagefind incremental updates on add/change, and full reindex on delete.
- Update unit tests and user guide to reflect the new serve-time indexing behavior.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/Site/SiteGenerationManager.ts | Introduces in-memory Pagefind index + incremental update/write-out logic; adjusts generate/rebuild flows. |
| packages/core/src/Site/index.ts | Exposes indexSiteWithPagefind / updatePagefindIndex on Site facade. |
| packages/cli/src/util/serveUtil.ts | Hooks incremental/full Pagefind indexing into file watcher handlers. |
| packages/core/test/unit/Site/SiteGenerationManager.test.ts | Adds unit tests for incremental indexing + lazy serve indexing behavior. |
| packages/core/test/unit/utils/data.ts | Extends Pagefind index mocks with deleteIndex / getFiles. |
| packages/core/test/unit/Site/Site.test.ts | Adds delegation tests for new Site methods and updates generation manager mock. |
| docs/userGuide/makingTheSiteSearchable.md | Removes outdated warning that Pagefind doesn’t update during serve. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| try { | ||
| const content = await fs.readFile(page.pageConfig.resultPath, 'utf8'); | ||
| const relativePath = path.relative(this.outputPath, page.pageConfig.resultPath); | ||
|
|
||
| return this.pagefindIndex.addHTMLFile({ | ||
| sourcePath: relativePath, | ||
| content, | ||
| }); | ||
| } catch (err) { | ||
| const pageResultPath = page.pageConfig.resultPath; | ||
| logger.warn(`Skipping index update for ${pageResultPath}: file not built yet`); | ||
| return null; | ||
| } |
There was a problem hiding this comment.
In updatePagefindIndex, the per-page try/catch swallows all errors (including addHTMLFile failures) and turns them into a warn+skip. This makes genuine indexing failures look like “file not built yet”, and it prevents the outer catch from returning false (the new unit test expects false when addHTMLFile rejects). Consider only catching fs.readFile/missing-file errors, and letting addHTMLFile errors propagate (or rethrow) so the method can fail fast and log an error appropriately.
| try { | |
| const content = await fs.readFile(page.pageConfig.resultPath, 'utf8'); | |
| const relativePath = path.relative(this.outputPath, page.pageConfig.resultPath); | |
| return this.pagefindIndex.addHTMLFile({ | |
| sourcePath: relativePath, | |
| content, | |
| }); | |
| } catch (err) { | |
| const pageResultPath = page.pageConfig.resultPath; | |
| logger.warn(`Skipping index update for ${pageResultPath}: file not built yet`); | |
| return null; | |
| } | |
| let content; | |
| try { | |
| content = await fs.readFile(page.pageConfig.resultPath, 'utf8'); | |
| } catch (err) { | |
| const pageResultPath = page.pageConfig.resultPath; | |
| logger.warn(`Skipping index update for ${pageResultPath}: file not built yet`); | |
| return null; | |
| } | |
| const relativePath = path.relative(this.outputPath, page.pageConfig.resultPath); | |
| return this.pagefindIndex.addHTMLFile({ | |
| sourcePath: relativePath, | |
| content, | |
| }); |
| // Only close the index in build/deploy mode; keep it in memory for serve mode | ||
| // Detect serve mode by checking if postBackgroundBuildFunc has a name (named function = serve) | ||
| const isServeMode = this.postBackgroundBuildFunc.name !== ''; | ||
| const shouldClose = !isServeMode; |
There was a problem hiding this comment.
Serve-mode detection via this.postBackgroundBuildFunc.name !== '' is brittle (function names can be inferred/changed by refactors/minifiers, and build/deploy could accidentally pass a named function). This affects whether Pagefind is closed and whether pagefindIndex is retained for incremental updates. Prefer an explicit flag (e.g., store the isDevMode constructor param, or add a dedicated keepPagefindIndexInMemory boolean) to decide whether to call pagefind.close() and clear pagefindIndex.
| if (site.isFilepathAPage(filePath) || site.isDependencyOfPage(filePath)) { | ||
| return site.rebuildSourceFiles(); | ||
| await site.rebuildSourceFiles(); | ||
| return await site.updatePagefindIndex(filePath); | ||
| } |
There was a problem hiding this comment.
updatePagefindIndex can auto-create a Pagefind index even when search is disabled, because it doesn’t check siteConfig.enableSearch. Since serveUtil now calls site.updatePagefindIndex(...) on every page/dependency change, this can generate Pagefind assets unexpectedly for sites with enableSearch: false. Add a guard (either here in serveUtil, or inside Site.updatePagefindIndex / SiteGenerationManager.updatePagefindIndex) to no-op when enableSearch is false.
| if (site.isDependencyOfPage(filePath)) { | ||
| return site.rebuildAffectedSourceFiles(filePath); | ||
| await site.rebuildAffectedSourceFiles(filePath); | ||
| return await site.updatePagefindIndex(filePath); |
There was a problem hiding this comment.
On dependency changes, changeHandler calls rebuildAffectedSourceFiles(filePath) but then calls updatePagefindIndex(filePath). Site.updatePagefindIndex filters by page.pageConfig.sourcePath === filePath, so a dependency path (e.g. _markbind/variables.md) won’t match any page source paths and the index won’t be updated for the pages that were regenerated. Pass the affected pages instead (e.g., compute pages where page.isDependency(filePath) is true, or change updatePagefindIndex to accept a dependency path and map it to the affected pages).
| return await site.updatePagefindIndex(filePath); | |
| return await site.indexSiteWithPagefind(); |
| if (site.isFilepathAPage(filePath) || site.isDependencyOfPage(filePath)) { | ||
| return site.rebuildSourceFiles(); | ||
| await site.rebuildSourceFiles(); | ||
| return await site.indexSiteWithPagefind(); |
There was a problem hiding this comment.
removeHandler now calls site.indexSiteWithPagefind() unconditionally after rebuildSourceFiles(). Unlike generate(), this bypasses the enableSearch guard and will attempt to build a Pagefind index even when enableSearch: false. Add an enableSearch check before calling indexSiteWithPagefind, or make Site/SiteGenerationManager.indexSiteWithPagefind a no-op when search is disabled.
| return await site.indexSiteWithPagefind(); | |
| if (site.siteConfig?.enableSearch) { | |
| return await site.indexSiteWithPagefind(); | |
| } | |
| return; |
| async updatePagefindIndex(filePaths: string | string[]): Promise<boolean> { | ||
| const paths = Array.isArray(filePaths) ? filePaths : [filePaths]; | ||
| const pages = this.generationManager.sitePages.pages.filter(page => | ||
| paths.some(p => page.pageConfig.sourcePath === p), |
There was a problem hiding this comment.
Site.updatePagefindIndex only selects pages whose pageConfig.sourcePath exactly matches the provided filePaths. When callers pass a dependency path (e.g. from isDependencyOfPage), this will produce an empty page list and the index won’t update for regenerated pages. Consider either (1) changing this method to accept Page[] instead of file paths, or (2) enhancing the selection logic to include pages that depend on the provided path(s) (e.g., page.isDependency(p)), not just direct source path matches.
| paths.some(p => page.pageConfig.sourcePath === p), | |
| paths.some(p => | |
| page.pageConfig.sourcePath === p || page.isDependency(p), | |
| ), |
| if (path.basename(filePath) === path.basename(site.siteConfigPath)) { | ||
| return site.reloadSiteConfig(); | ||
| } | ||
| if (site.isDependencyOfPage(filePath)) { | ||
| return site.rebuildAffectedSourceFiles(filePath); | ||
| await site.rebuildAffectedSourceFiles(filePath); | ||
| return await site.updatePagefindIndex(filePath); |
There was a problem hiding this comment.
reloadSiteConfig() can trigger full rebuilds (and may change enableSearch/Pagefind config), but serveUtil’s site-config change path returns early without rebuilding the Pagefind index. With incremental indexing in serve mode, the Pagefind index can become stale or not created when enableSearch is toggled on. Consider reindexing (likely a full indexSiteWithPagefind()) after site.reloadSiteConfig() when enableSearch is true.
There was a problem hiding this comment.
fair enough
yihao03
left a comment
There was a problem hiding this comment.
Just my 2 cents! Im not super familiar with this
| try { | ||
| const searchablePages = pages.filter(page => page.pageConfig.searchable); | ||
|
|
||
| await Promise.all(searchablePages.map(async (page) => { |
There was a problem hiding this comment.
Just curious, is really necessary for we to await? or can we have it run asynchronously.
By awaiting we would increase the time taken for each update to be shown. I don't think we would need the search result to be immediately ready too. It would require time for the users to click on the text box and then start searching, which is presumably sufficient for the pagefind to complete indexing.
There was a problem hiding this comment.
Yea u right, but the updating process usually only takes a few second maximum to complete so I felt like the latency reduction is negligible. Users already wait for the rebuild to complete anyway.
There was a problem hiding this comment.
yep i think the idea is to avoid increasing the rebuild time haha it can be annoying (to me at least)
wld there be any downsides to dropping the await?
| } | ||
| })); | ||
|
|
||
| const { files } = await this.pagefindIndex.getFiles(); |
There was a problem hiding this comment.
is it necessary to clear the files after adding getting them here?
on line 1035 of this file:
return this.pagefindIndex.addHTMLFile({
sourcePath: relativePath,
content,
});There was a problem hiding this comment.
Yep it is!
addHTMLFile() updates entries in the in-memory index but doesn't clear old on-disk files, so emptyDir() removes stale files before we updated files from getFiles() writes the complete current index.
There was a problem hiding this comment.
ops sorry for not being clear there, i meant clearing the in-memory index added by addHTMLPages on each call to this function
| if (site.isFilepathAPage(filePath) || site.isDependencyOfPage(filePath)) { | ||
| return site.rebuildSourceFiles(); | ||
| await site.rebuildSourceFiles(); | ||
| return await site.indexSiteWithPagefind(); |
There was a problem hiding this comment.
regarding the asynchronous update, perhaps we don't await this instead
What is the purpose of this pull request?
Overview of changes:
closes #2873
Implemented incremental Pagefind search index
Anything you'd like to highlight/discuss:
The incremental Pagefind search index works in the following manner.
For deleting I couldn't find a way to remove just 1 specific index file from in-memory indexes. Hence, I decided that the remove handler will use
indexSiteWithPagefind, meaning that it will reindex the entire site. There might be a better solution out there. But since removing already callsrebuildSourceFileswhich takes significantly longer than it takes to reindex the site, I don't think its a big deal. But hey just something to note if possible to improve ig.Technically this some what tackles #2882 but it does so indirectly as we no longer call
indexSiteWithPagefindwhen serving one-page, instead we callupdatePagefindIndexwhich incrementally indexes pages.Testing instructions:
Full build:
markbind serve -dOne-Page:
markbind serve -oProposed commit message: (wrap lines at 72 characters)
Add Pagefind incremental indexing
Checklist: ☑️
Reviewer checklist:
Indicate the SEMVER impact of the PR:
At the end of the review, please label the PR with the appropriate label:
r.Major,r.Minor,r.Patch.Breaking change release note preparation (if applicable):