feat(bench): add quality and throughput benchmark workspace by jan-kubica · Pull Request #194 · stella/anonymize

jan-kubica · 2026-06-12T11:30:52Z

Summary

Adds packages/bench, a private workspace that benchmarks the deterministic pipeline (NER off) over the contract fixture corpus, as groundwork for publishing reproducible performance and quality numbers and for comparing other anonymization tools on the same corpus.

Stacked on #193: the bench imports the built dist like a production consumer, which is how the non-Western corpus bundling regression fixed there was found; the first commit here is that fix.

What's included

Scorer (src/scorer.ts): span-level, per-label precision/recall/F1 with one-to-one matching in two modes (exact bounds; label + largest overlap), plus unit tests.
Quality runner (src/run-quality.ts): scores predictions against the reviewed .snapshot.json reference annotations, per label, per language, micro-averaged. --predictions file.json scores an external tool's output through the same scorer; the interchange format and label-mapping expectations are documented in the README.
Throughput runner (src/run-throughput.ts): one-time costs (dictionary load, search preparation) separated from steady-state latency; warmup + measured passes with per-document medians and corpus chars/s.
Renderer producing results/RESULTS.md from the JSON reports; results from a developer machine are committed alongside the methodology README.
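
To make the scorer's matching rules concrete, here is a minimal sketch of span-level scoring with one-to-one matching in both modes, plus micro-averaging. It is not the code in src/scorer.ts: the `Span` shape, the function names, and the greedy largest-overlap assignment are assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; the real types in src/scorer.ts may differ.
interface Span {
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  label: string;
}

type MatchMode = "exact" | "overlap";

interface Counts {
  tp: number;
  fp: number;
  fn: number;
}

interface Score extends Counts {
  precision: number;
  recall: number;
  f1: number;
}

// Number of characters shared by two spans (0 when they do not intersect).
function overlapLength(a: Span, b: Span): number {
  return Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
}

function toScore({ tp, fp, fn }: Counts): Score {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { tp, fp, fn, precision, recall, f1 };
}

function scoreSpans(
  predicted: Span[],
  reference: Span[],
  mode: MatchMode,
): Record<string, Score> {
  const labels = Array.from(
    new Set([...predicted, ...reference].map((span) => span.label)),
  );
  const scores: Record<string, Score> = {};

  for (const label of labels) {
    const preds = predicted.filter((span) => span.label === label);
    const refs = reference.filter((span) => span.label === label);

    // Collect every admissible (prediction, reference) pair within this label.
    const pairs: { pred: number; ref: number; size: number }[] = [];
    preds.forEach((p, pred) => {
      refs.forEach((r, ref) => {
        const size = overlapLength(p, r);
        const admissible =
          mode === "exact" ? p.start === r.start && p.end === r.end : size > 0;
        if (admissible) pairs.push({ pred, ref, size });
      });
    });

    // Greedy one-to-one assignment, largest overlap first: each prediction
    // and each reference span is consumed by at most one match.
    pairs.sort((a, b) => b.size - a.size);
    const usedPred = new Set<number>();
    const usedRef = new Set<number>();
    for (const { pred, ref } of pairs) {
      if (usedPred.has(pred) || usedRef.has(ref)) continue;
      usedPred.add(pred);
      usedRef.add(ref);
    }

    const tp = usedPred.size;
    scores[label] = toScore({ tp, fp: preds.length - tp, fn: refs.length - tp });
  }
  return scores;
}

// Micro-average: pool the raw counts across labels, then compute the ratios once.
function microAverage(scores: Record<string, Score>): Score {
  const total: Counts = { tp: 0, fp: 0, fn: 0 };
  for (const score of Object.values(scores)) {
    total.tp += score.tp;
    total.fp += score.fp;
    total.fn += score.fn;
  }
  return toScore(total);
}
```

In this sketch a prediction with the right label but slightly different bounds counts as a miss in "exact" mode and as a hit in "overlap" mode. The greedy assignment is a simplification; it is not guaranteed to find the maximum number of matches when spans overlap in chains.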

Methodology note

The reference annotations derive from reviewed pipeline output, so the pipeline's own score against them is ~100% by construction; the README states this explicitly and frames the number as a regression/drift signal. The meaningful outputs are throughput and, next, cross-tool comparisons via the predictions interchange format.

Verification

bun run lint, bun run typecheck (6/6 tasks), bun run format:check, and bun test in packages/bench pass. check:version and packlist tooling are unaffected (explicit package lists; bench is private).

The per-locale name corpus files were loaded with a template-literal dynamic import, which the bundler cannot resolve statically. The import survived into dist as a runtime-relative path that does not exist in the published package, so name detection was silently disabled for consumers of the built output (the regression suite imports from src and never hit the path). Replace the template literal with a map of literal import specifiers keyed by locale so each corpus file becomes a build chunk, and pin one chunk in check-packlist so the regression cannot ship again.
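
The shape of that fix can be sketched as follows. The locale keys, the `./corpora/...` paths in the comments, and the module shape are hypothetical, since the real files are not shown here; inline stand-ins replace the real `import()` calls so the sketch runs without them.

```typescript
// Hypothetical module shape: each corpus file default-exports a list of names.
type NameCorpus = { default: readonly string[] };
type Loader = () => Promise<NameCorpus>;

// Before (a bundler cannot resolve the specifier statically):
//   const corpus = await import(`./corpora/${locale}.js`);
//
// After: one literal specifier per locale, so each file becomes a build chunk:
//   ja: () => import("./corpora/ja.js"),
//   ko: () => import("./corpora/ko.js"),
// The entries below are stand-ins for those literal import() calls.
const CORPUS_LOADERS: Record<string, Loader> = {
  ja: () => Promise.resolve({ default: ["placeholder-ja"] }),
  ko: () => Promise.resolve({ default: ["placeholder-ko"] }),
};

async function loadNameCorpus(locale: string): Promise<readonly string[]> {
  if (!Object.prototype.hasOwnProperty.call(CORPUS_LOADERS, locale)) {
    // Fail loudly instead of silently disabling name detection.
    throw new Error(`No name corpus registered for locale "${locale}"`);
  }
  const corpus = await CORPUS_LOADERS[locale]();
  return corpus.default;
}
```

Throwing on an unknown locale is a design choice of this sketch, not something the commit message specifies; the point is that a missing corpus becomes visible rather than degrading detection quietly.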

New private packages/bench workspace measuring the deterministic pipeline (NER off) over the contract fixture corpus:

- span-level scorer (per-label precision/recall/F1, exact and overlap matching, one-to-one within label) with unit tests
- quality runner scoring the pipeline against the reviewed .snapshot.json reference annotations; accepts external tool predictions via a documented JSON interchange format so other anonymizers can be scored by the same scorer on the same corpus
- throughput runner (warmup + measured passes, per-document medians, corpus chars/s, one-time dictionary and prepare costs)
- methodology README covering what the reference annotations can and cannot support, plus rendered results

The bench imports the built dist like a production consumer, which is how it caught the non-Western corpus bundling regression fixed in the previous commit.
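
The throughput method named in this commit message (warmup passes, then measured passes reduced to per-document medians) can be sketched roughly as below. This is not the code in src/run-throughput.ts: the `Doc` shape, the `measure` signature, the default pass counts, and the injectable clock are assumptions. The clock defaults to `Date.now` only to keep the sketch dependency-free; a real runner would likely use a higher-resolution timer.

```typescript
// Hypothetical document and result shapes for illustration.
interface Doc {
  id: string;
  text: string;
}

interface DocTiming {
  id: string;
  medianMs: number;
  minMs: number;
  maxMs: number;
  charsPerSecond: number;
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

function measure(
  docs: Doc[],
  run: (text: string) => unknown,
  warmupPasses = 2,
  measuredPasses = 5,
  now: () => number = () => Date.now(),
): DocTiming[] {
  if (measuredPasses < 1) throw new Error("measuredPasses must be at least 1");

  // Warmup passes run the workload without recording anything, so JIT
  // compilation and cache effects do not leak into the measured samples.
  for (let pass = 0; pass < warmupPasses; pass++) {
    for (const doc of docs) run(doc.text);
  }

  // Measured passes: one sample per document per pass.
  const samples: number[][] = docs.map(() => []);
  for (let pass = 0; pass < measuredPasses; pass++) {
    docs.forEach((doc, i) => {
      const start = now();
      run(doc.text);
      samples[i].push(now() - start);
    });
  }

  return docs.map((doc, i) => {
    const medianMs = median(samples[i]);
    return {
      id: doc.id,
      medianMs,
      minMs: Math.min(...samples[i]),
      maxMs: Math.max(...samples[i]),
      // Report 0 rather than Infinity when the timer could not resolve the run.
      charsPerSecond:
        medianMs > 0 ? Math.round(doc.text.length / (medianMs / 1_000)) : 0,
    };
  });
}
```

Using the median rather than the mean keeps a single slow pass (for example, a garbage-collection pause) from dominating a document's reported latency.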

gemini-code-assist

Code Review

This pull request introduces a new benchmarking package, @stll/anonymize-bench, to measure the quality and throughput of the anonymization pipeline. It also refactors non-Western name imports in the main package to use literal import specifiers, ensuring correct bundler resolution. The reviewer identified three key areas for improvement: filtering readdirSync in fixtures.ts to avoid crashing on non-directory entries, deduplicating city names in dictionaries.ts using a Set for better memory efficiency, and adding a guard in run-throughput.ts to prevent a potential division-by-zero error when calculating characters per second.

gemini-code-assist · 2026-06-12T11:32:15Z

+  for (const language of readdirSync(CONTRACTS_DIR).toSorted()) {
+    const languageDir = join(CONTRACTS_DIR, language);


Calling readdirSync on CONTRACTS_DIR without options returns every directory entry, including hidden files (such as .DS_Store) and any other non-directory files. When the loop then calls readdirSync on one of those entries, it throws an ENOTDIR error and crashes the benchmark. Passing withFileTypes: true and keeping only directories prevents this.

const languages = readdirSync(CONTRACTS_DIR, { withFileTypes: true })
  .filter((dirent) => dirent.isDirectory())
  .map((dirent) => dirent.name)
  .toSorted();
for (const language of languages) {
  const languageDir = join(CONTRACTS_DIR, language);

gemini-code-assist · 2026-06-12T11:32:15Z

+  const mergedCities: string[] = [];
+  for (const { country, entries } of cityResults) {
+    citiesByCountry[country] = entries;
+    for (const entry of entries) {
+      mergedCities.push(entry);
+    }
+  }


The mergedCities array is populated by pushing every city from every country, which can result in duplicate city names (e.g., "London" in both GB and US/CA). Deduplicating the list using a Set avoids redundant entries in the search automaton and improves memory efficiency.

Suggested change

-  const mergedCities: string[] = [];
-  for (const { country, entries } of cityResults) {
-    citiesByCountry[country] = entries;
-    for (const entry of entries) {
-      mergedCities.push(entry);
-    }
-  }
+  const mergedCitiesSet = new Set<string>();
+  for (const { country, entries } of cityResults) {
+    citiesByCountry[country] = entries;
+    for (const entry of entries) {
+      mergedCitiesSet.add(entry);
+    }
+  }
+  const mergedCities = Array.from(mergedCitiesSet);

gemini-code-assist · 2026-06-12T11:32:15Z

+    medianMs: roundMs(medianMs),
+    minMs: roundMs(Math.min(...samples)),
+    maxMs: roundMs(Math.max(...samples)),
+    charsPerSecond: Math.round(doc.text.length / (medianMs / 1_000)),


If a document is processed extremely quickly, or if the timer resolution is coarse (as in some virtualized environments), medianMs can be 0. The division by zero does not throw in JavaScript; instead it makes charsPerSecond evaluate to Infinity. Adding a guard prevents this.

charsPerSecond: medianMs === 0 ? 0 : Math.round(doc.text.length / (medianMs / 1_000)),

jan-kubica added 2 commits June 12, 2026 13:27

gemini-code-assist Bot reviewed Jun 12, 2026

jan-kubica mentioned this pull request Jun 12, 2026

feat(bench): add Presidio and compromise comparison runs #195

Open

jan-kubica marked this pull request as ready for review June 12, 2026 12:03

jan-kubica wants to merge 2 commits into main from feat/bench
