Skip to content

Extract Lang attribute for marked contents#20407

Open
edoardocavazza wants to merge 4 commits into
mozilla:masterfrom
edoardocavazza:marked-content-lang
Open

Extract Lang attribute for marked contents#20407
edoardocavazza wants to merge 4 commits into
mozilla:masterfrom
edoardocavazza:marked-content-lang

Conversation

@edoardocavazza

Copy link
Copy Markdown
Contributor

Some marked contents have a Lang attribute defined in the props passed to the beginMarkedContentProps operator. With this PR, the evaluator extracts this information. I chose to pass the props object as third argument in order to maintain backwards compatibility, but I'm open to changes.

Comment thread test/unit/api_spec.js Outdated
const pdfDoc = await loadingTask.promise;
const pdfPage = await pdfDoc.getPage(1);

pdfDoc.annotationStorage.setValue("30R", { value: "test" });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no annotations in the pdf so I don't see the point of setting these values.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh I am sorry, bad copy/paste from a previous test. Removed

Comment thread test/unit/api_spec.js
Comment thread src/core/evaluator.js
@calixteman

Copy link
Copy Markdown
Contributor

Could you rebase your patch ?

@calixteman

Copy link
Copy Markdown
Contributor

/botio test

@moz-tools-bot

Copy link
Copy Markdown
Collaborator

From: Bot.io (Windows)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.193.163.58:8877/d64eb63a5678ba9/output.txt

@moz-tools-bot

Copy link
Copy Markdown
Collaborator

From: Bot.io (Linux m4)


Received

Command cmd_test from @calixteman received. Current queue size: 0

Live output at: http://54.241.84.105:8877/0d18703028e4f9f/output.txt

@moz-tools-bot

Copy link
Copy Markdown
Collaborator

From: Bot.io (Linux m4)


Failed

Full output at http://54.241.84.105:8877/0d18703028e4f9f/output.txt

Total script time: 60.00 mins

@moz-tools-bot

Copy link
Copy Markdown
Collaborator

From: Bot.io (Windows)


Failed

Full output at http://54.193.163.58:8877/d64eb63a5678ba9/output.txt

Total script time: 75.18 mins

  • Unit tests: Passed
  • Integration Tests: Passed
  • Regression tests: FAILED
  different ref/snapshot: 2

Image differences available at: http://54.193.163.58:8877/d64eb63a5678ba9/reftest-analyzer.html#web=eq.log

@calixteman

Copy link
Copy Markdown
Contributor

@edoardocavazza Can you check the failure in issue12909 ?

@nicolo-ribaudo

Copy link
Copy Markdown
Collaborator

The lang property can affect which font the browser falls back to, and thus its size. In src/display/text_layer.js there a bunch of calls to #getCtx with the correct language to be able to measure how wide the text rendered by the browser will be, and scale it accordingly.

If some of the elements in the text layer are going to have a different language, we need to make sure to get the measurements with the appropriate canvas context (this might be what's causing the horizontal scaling difference in issue12909).

@calixteman calixteman force-pushed the marked-content-lang branch from 9f576be to cc9f9cf Compare July 5, 2026 13:26
@codecov-commenter

codecov-commenter commented Jul 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.43%. Comparing base (330cc4f) to head (52e7a3c).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20407      +/-   ##
==========================================
+ Coverage   89.41%   89.43%   +0.01%     
==========================================
  Files         262      262              
  Lines       66738    66761      +23     
==========================================
+ Hits        59675    59705      +30     
+ Misses       7063     7056       -7     
Flag Coverage Δ
browsertest 66.53% <100.00%> (+0.03%) ⬆️
fonttest 9.05% <ø> (ø)
integrationtest 68.75% <100.00%> (+0.02%) ⬆️
unittest 57.38% <96.66%> (+0.04%) ⬆️
unittestcli 56.31% <40.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread test/unit/api_spec.js Outdated
Comment thread src/core/evaluator.js Outdated
@calixteman calixteman force-pushed the marked-content-lang branch from cc9f9cf to 52e7a3c Compare July 5, 2026 15:19
Comment thread test/unit/api_spec.js
Comment on lines +4983 to +4984
const opList = await pdfPage.getOperatorList();
expect(opList.fnArray[0]).toEqual(OPS.beginMarkedContentProps);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Improve readability by adding a blank line before the expect statements.

Suggested change
const opList = await pdfPage.getOperatorList();
expect(opList.fnArray[0]).toEqual(OPS.beginMarkedContentProps);
const opList = await pdfPage.getOperatorList();
expect(opList.fnArray[0]).toEqual(OPS.beginMarkedContentProps);

Comment thread test/unit/api_spec.js
Comment on lines +4989 to +4990
expect(opList.argsArray[10][2]?.lang).toEqual("es-ES");
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't clean-up after itself.

Suggested change
expect(opList.argsArray[10][2]?.lang).toEqual("es-ES");
});
expect(opList.argsArray[10][2]?.lang).toEqual("es-ES");
await loadingTask.destroy();
});

Comment on lines +277 to +278
expect(span.textContent).toEqual("Esto es español");
});

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't clean-up after itself.

Suggested change
expect(span.textContent).toEqual("Esto es español");
});
expect(span.textContent).toEqual("Esto es español");
await loadingTask.destroy();
});

Comment thread src/display/text_layer.js
Comment on lines +537 to +545
let langAscentCache = this.#ascentCache.get((lang ||= ""));
if (langAscentCache) {
const cachedAscent = langAscentCache.get(fontFamily);
if (cachedAscent) {
return cachedAscent;
}
} else {
langAscentCache = new Map();
this.#ascentCache.set(lang, langAscentCache);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can probably be shortened?

Suggested change
let langAscentCache = this.#ascentCache.get((lang ||= ""));
if (langAscentCache) {
const cachedAscent = langAscentCache.get(fontFamily);
if (cachedAscent) {
return cachedAscent;
}
} else {
langAscentCache = new Map();
this.#ascentCache.set(lang, langAscentCache);
const langAscentCache = this.#ascentCache.getOrInsertComputed(
(lang ||= ""),
makeMap
);
const cachedAscent = langAscentCache.get(fontFamily);
if (cachedAscent) {
return cachedAscent;
}

Comment thread src/core/evaluator.js
Comment on lines +2310 to +2318
let props = null;
if (args[1] instanceof Dict) {
const lang = args[1].get("Lang");
if (typeof lang === "string") {
props = Object.create(null);
props.lang = stringToPDFString(lang);
}
}

@Snuffleupagus Snuffleupagus Jul 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, why are we adding unused data here (outside of a single unit-test)!?

This means adding effectively dead code in the Firefox PDF Viewer, and it'll also be overall (ever so slightly) less efficient since this data first of all needs to be parsed and secondly it needs to be sent to the main-thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants