Skip to content

Add Pagefind incremental indexing#2890

Open
MoshiMoshiMochi wants to merge 10 commits intoMarkBind:masterfrom
MoshiMoshiMochi:dev/pagefind-incremental-indexing
Open

Add Pagefind incremental indexing#2890
MoshiMoshiMochi wants to merge 10 commits intoMarkBind:masterfrom
MoshiMoshiMochi:dev/pagefind-incremental-indexing

Conversation

@MoshiMoshiMochi
Copy link
Copy Markdown
Contributor

@MoshiMoshiMochi MoshiMoshiMochi commented Apr 13, 2026

What is the purpose of this pull request?

  • Documentation update
  • Bug fix
  • Feature addition or enhancement
  • Code maintenance
  • DevOps
  • Improve developer experience
  • Others, please explain:

Overview of changes:
closes #2873
Implemented incremental Pagefind search index

Anything you'd like to highlight/discuss:
The incremental Pagefind search index works in the following manner.

  1. If a user updates a page -> Pagefind will re-index that specific page.
  2. If a user creates a new page -> Pagefind will index that specific new page.
  3. If a user deletes a page -> Pagefind will re-index all pages.

For deleting I couldn't find a way to remove just 1 specific index file from in-memory indexes. Hence, I decided that the remove handler will use indexSiteWithPagefind, meaning that it will reindex the entire site. There might be a better solution out there. But since removing already calls rebuildSourceFiles which takes significantly longer than it takes to reindex the site, I don't think its a big deal. But hey just something to note if possible to improve ig.

Technically this some what tackles #2882 but it does so indirectly as we no longer call indexSiteWithPagefind when serving one-page, instead we call updatePagefindIndex which incrementally indexes pages.

Testing instructions:
Full build: markbind serve -d

  1. Serve any markbind site
  2. Update any page and save the changes -> Pagefind should only index that one page -> Check to see if the changes are within the search
  3. Add any page -> Pagefind should only index that newly added page -> Check to see if the newly create page is searchable by Pagefind
  4. Delete any page -> Pagefind should re-index the entire site -> Check to see if the removed page is no longer searchable

One-Page: markbind serve -o

  1. Serve any markbind site with just one-page -> Pagefind should index that specific page if it is searchable
  2. Navigate through the other pages -> Pagefind should index those pages incrementally

Proposed commit message: (wrap lines at 72 characters)
Add Pagefind incremental indexing


Checklist: ☑️

  • Updated the documentation for feature additions and enhancements
  • Added tests for bug fixes or features
  • Linked all related issues
  • No unrelated changes

Reviewer checklist:

Indicate the SEMVER impact of the PR:

  • Major (when you make incompatible API changes)
  • Minor (when you add functionality in a backward compatible manner)
  • Patch (when you make backward compatible bug fixes)

At the end of the review, please label the PR with the appropriate label: r.Major, r.Minor, r.Patch.

Breaking change release note preparation (if applicable):

  • To be included in the release note for any feature that is made obsolete/breaking

Give a brief explanation note about:

  • what was the old feature that was made obsolete
  • any replacement feature (if any), and
  • how the author should modify his website to migrate from the old feature to the replacement feature (if possible).

It takes in an array of pages that needs to be updated. Filters if the
page is need searchable, then add a new HTML file to update the
in-memory index. Finally, we retrieve all the in-memory indexes, wipe
the old files and write newly updated files to disk.
add & change handlers use updatePagefindIndex as there is only a need to
either add/update 1 file in-memory.

However, as there is no way of to remove just 1 specific index file from
in-memory indexes, remove handler will use indexSiteWithPagefind,
meaning that it will reindex the entire site. There might be a better
solution out there. But since removing already calls rebuildSourceFiles
which takes significantly longer that it takes to reindex the site. I
don't think its a big deal.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 84.37500% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.75%. Comparing base (3132728) to head (d62750d).
⚠️ Report is 1 commits behind head on master.

Files with missing lines Patch % Lines
packages/core/src/Site/SiteGenerationManager.ts 82.45% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2890      +/-   ##
==========================================
+ Coverage   71.35%   71.75%   +0.39%     
==========================================
  Files         133      133              
  Lines        7229     7285      +56     
  Branches     1580     1721     +141     
==========================================
+ Hits         5158     5227      +69     
+ Misses       2065     1959     -106     
- Partials        6       99      +93     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Now serving one page will directly call updatePagefindIndex with the
respective page to build instead of calling indexSiteWithPagefind
which will attempt to index all other pages too, even if they DNE.

Still need to update test cases... arghhhhhhh lol
Removed warning about how pagefind only works on full site build as that
is no longer the case after these changes
@MoshiMoshiMochi MoshiMoshiMochi marked this pull request as ready for review April 13, 2026 17:20
@MoshiMoshiMochi MoshiMoshiMochi requested a review from a team April 13, 2026 17:20
@Harjun751 Harjun751 requested a review from Copilot April 16, 2026 07:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds incremental Pagefind indexing support during markbind serve, so search results stay up-to-date without requiring a full rebuild/restart (closing #2873).

Changes:

  • Add an in-memory Pagefind index in SiteGenerationManager, with a new updatePagefindIndex method for incremental updates.
  • Update serve file-watch handlers to trigger Pagefind incremental updates on add/change, and full reindex on delete.
  • Update unit tests and user guide to reflect the new serve-time indexing behavior.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
packages/core/src/Site/SiteGenerationManager.ts Introduces in-memory Pagefind index + incremental update/write-out logic; adjusts generate/rebuild flows.
packages/core/src/Site/index.ts Exposes indexSiteWithPagefind / updatePagefindIndex on Site facade.
packages/cli/src/util/serveUtil.ts Hooks incremental/full Pagefind indexing into file watcher handlers.
packages/core/test/unit/Site/SiteGenerationManager.test.ts Adds unit tests for incremental indexing + lazy serve indexing behavior.
packages/core/test/unit/utils/data.ts Extends Pagefind index mocks with deleteIndex / getFiles.
packages/core/test/unit/Site/Site.test.ts Adds delegation tests for new Site methods and updates generation manager mock.
docs/userGuide/makingTheSiteSearchable.md Removes outdated warning that Pagefind doesn’t update during serve.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1031 to +1043
try {
const content = await fs.readFile(page.pageConfig.resultPath, 'utf8');
const relativePath = path.relative(this.outputPath, page.pageConfig.resultPath);

return this.pagefindIndex.addHTMLFile({
sourcePath: relativePath,
content,
});
} catch (err) {
const pageResultPath = page.pageConfig.resultPath;
logger.warn(`Skipping index update for ${pageResultPath}: file not built yet`);
return null;
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In updatePagefindIndex, the per-page try/catch swallows all errors (including addHTMLFile failures) and turns them into a warn+skip. This makes genuine indexing failures look like “file not built yet”, and it prevents the outer catch from returning false (the new unit test expects false when addHTMLFile rejects). Consider only catching fs.readFile/missing-file errors, and letting addHTMLFile errors propagate (or rethrow) so the method can fail fast and log an error appropriately.

Suggested change
try {
const content = await fs.readFile(page.pageConfig.resultPath, 'utf8');
const relativePath = path.relative(this.outputPath, page.pageConfig.resultPath);
return this.pagefindIndex.addHTMLFile({
sourcePath: relativePath,
content,
});
} catch (err) {
const pageResultPath = page.pageConfig.resultPath;
logger.warn(`Skipping index update for ${pageResultPath}: file not built yet`);
return null;
}
let content;
try {
content = await fs.readFile(page.pageConfig.resultPath, 'utf8');
} catch (err) {
const pageResultPath = page.pageConfig.resultPath;
logger.warn(`Skipping index update for ${pageResultPath}: file not built yet`);
return null;
}
const relativePath = path.relative(this.outputPath, page.pageConfig.resultPath);
return this.pagefindIndex.addHTMLFile({
sourcePath: relativePath,
content,
});

Copilot uses AI. Check for mistakes.
Comment on lines +992 to +995
// Only close the index in build/deploy mode; keep it in memory for serve mode
// Detect serve mode by checking if postBackgroundBuildFunc has a name (named function = serve)
const isServeMode = this.postBackgroundBuildFunc.name !== '';
const shouldClose = !isServeMode;
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Serve-mode detection via this.postBackgroundBuildFunc.name !== '' is brittle (function names can be inferred/changed by refactors/minifiers, and build/deploy could accidentally pass a named function). This affects whether Pagefind is closed and whether pagefindIndex is retained for incremental updates. Prefer an explicit flag (e.g., store the isDevMode constructor param, or add a dedicated keepPagefindIndexInMemory boolean) to decide whether to call pagefind.close() and clear pagefindIndex.

Copilot uses AI. Check for mistakes.
Comment on lines 37 to 40
if (site.isFilepathAPage(filePath) || site.isDependencyOfPage(filePath)) {
return site.rebuildSourceFiles();
await site.rebuildSourceFiles();
return await site.updatePagefindIndex(filePath);
}
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updatePagefindIndex can auto-create a Pagefind index even when search is disabled, because it doesn’t check siteConfig.enableSearch. Since serveUtil now calls site.updatePagefindIndex(...) on every page/dependency change, this can generate Pagefind assets unexpectedly for sites with enableSearch: false. Add a guard (either here in serveUtil, or inside Site.updatePagefindIndex / SiteGenerationManager.updatePagefindIndex) to no-op when enableSearch is false.

Copilot uses AI. Check for mistakes.
if (site.isDependencyOfPage(filePath)) {
return site.rebuildAffectedSourceFiles(filePath);
await site.rebuildAffectedSourceFiles(filePath);
return await site.updatePagefindIndex(filePath);
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On dependency changes, changeHandler calls rebuildAffectedSourceFiles(filePath) but then calls updatePagefindIndex(filePath). Site.updatePagefindIndex filters by page.pageConfig.sourcePath === filePath, so a dependency path (e.g. _markbind/variables.md) won’t match any page source paths and the index won’t be updated for the pages that were regenerated. Pass the affected pages instead (e.g., compute pages where page.isDependency(filePath) is true, or change updatePagefindIndex to accept a dependency path and map it to the affected pages).

Suggested change
return await site.updatePagefindIndex(filePath);
return await site.indexSiteWithPagefind();

Copilot uses AI. Check for mistakes.
if (site.isFilepathAPage(filePath) || site.isDependencyOfPage(filePath)) {
return site.rebuildSourceFiles();
await site.rebuildSourceFiles();
return await site.indexSiteWithPagefind();
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeHandler now calls site.indexSiteWithPagefind() unconditionally after rebuildSourceFiles(). Unlike generate(), this bypasses the enableSearch guard and will attempt to build a Pagefind index even when enableSearch: false. Add an enableSearch check before calling indexSiteWithPagefind, or make Site/SiteGenerationManager.indexSiteWithPagefind a no-op when search is disabled.

Suggested change
return await site.indexSiteWithPagefind();
if (site.siteConfig?.enableSearch) {
return await site.indexSiteWithPagefind();
}
return;

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enuf

async updatePagefindIndex(filePaths: string | string[]): Promise<boolean> {
const paths = Array.isArray(filePaths) ? filePaths : [filePaths];
const pages = this.generationManager.sitePages.pages.filter(page =>
paths.some(p => page.pageConfig.sourcePath === p),
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Site.updatePagefindIndex only selects pages whose pageConfig.sourcePath exactly matches the provided filePaths. When callers pass a dependency path (e.g. from isDependencyOfPage), this will produce an empty page list and the index won’t update for regenerated pages. Consider either (1) changing this method to accept Page[] instead of file paths, or (2) enhancing the selection logic to include pages that depend on the provided path(s) (e.g., page.isDependency(p)), not just direct source path matches.

Suggested change
paths.some(p => page.pageConfig.sourcePath === p),
paths.some(p =>
page.pageConfig.sourcePath === p || page.isDependency(p),
),

Copilot uses AI. Check for mistakes.
Comment on lines 59 to +64
if (path.basename(filePath) === path.basename(site.siteConfigPath)) {
return site.reloadSiteConfig();
}
if (site.isDependencyOfPage(filePath)) {
return site.rebuildAffectedSourceFiles(filePath);
await site.rebuildAffectedSourceFiles(filePath);
return await site.updatePagefindIndex(filePath);
Copy link

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reloadSiteConfig() can trigger full rebuilds (and may change enableSearch/Pagefind config), but serveUtil’s site-config change path returns early without rebuilding the Pagefind index. With incremental indexing in serve mode, the Pagefind index can become stale or not created when enableSearch is toggled on. Consider reindexing (likely a full indexSiteWithPagefind()) after site.reloadSiteConfig() when enableSearch is true.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough

Copy link
Copy Markdown
Contributor

@yihao03 yihao03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my 2 cents! Im not super familiar with this

try {
const searchablePages = pages.filter(page => page.pageConfig.searchable);

await Promise.all(searchablePages.map(async (page) => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is really necessary for we to await? or can we have it run asynchronously.

By awaiting we would increase the time taken for each update to be shown. I don't think we would need the search result to be immediately ready too. It would require time for the users to click on the text box and then start searching, which is presumably sufficient for the pagefind to complete indexing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea u right, but the updating process usually only takes a few second maximum to complete so I felt like the latency reduction is negligible. Users already wait for the rebuild to complete anyway.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep i think the idea is to avoid increasing the rebuild time haha it can be annoying (to me at least)

wld there be any downsides to dropping the await?

}
}));

const { files } = await this.pagefindIndex.getFiles();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it necessary to clear the files after adding getting them here?
on line 1035 of this file:

 return this.pagefindIndex.addHTMLFile({
            sourcePath: relativePath,
            content,
          });

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it is!
addHTMLFile() updates entries in the in-memory index but doesn't clear old on-disk files, so emptyDir() removes stale files before we updated files from getFiles() writes the complete current index.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ops sorry for not being clear there, i meant clearing the in-memory index added by addHTMLPages on each call to this function

if (site.isFilepathAPage(filePath) || site.isDependencyOfPage(filePath)) {
return site.rebuildSourceFiles();
await site.rebuildSourceFiles();
return await site.indexSiteWithPagefind();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regarding the asynchronous update, perhaps we don't await this instead

@Harjun751 Harjun751 mentioned this pull request Apr 16, 2026
14 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement incremental Pagefind indexing for markbind serve

3 participants