
Privacy Review - MathML 4 #576

@gbshankar


Scope

This review covers the MathML 4 delta from MathML Core. MathML Core's prior PING review (issue #130) is taken as a baseline where applicable, but this review covers MathML Full-specific features that Core intentionally omits.

Overall assessment

No blocking privacy concerns. For features shared with MathML Core, the baseline Core review stands. However, MathML 4’s Privacy and Security Considerations sections (§D.4 and §D.5) are insufficient because they largely defer to Core, while MathML Full introduces several features outside Core’s scope: intent, the annotation src reference mechanism, mglyph, broad href on all elements, and Content MathML semantic identifiers. Each of these warrants explicit treatment. The primary new privacy risk is misuse of intent as a hidden alternate content channel, potentially enabling detection of assistive-technology use. All findings appear addressable with targeted additions to the specification text.


Finding 1 — §D.4 and §D.5 must not simply defer to MathML Core

MathML Core’s privacy and security sections address the subset of MathML defined by Core, including Core-specific rendering, layout, font, link, and embedding risks. MathML 4 Full introduces features outside Core’s scope. Simply deferring to MathML Core leaves several of these MathML Full-specific features without explicit privacy or security analysis, including intent processing, annotation src external references, mglyph resource loading, broad href support on all elements, and Content MathML semantic identifiers.

Request: MathML 4 should include Privacy and Security Considerations sections that explicitly address MathML Full-specific features rather than relying on the Core review alone.


Finding 2 — href on all MathML elements reintroduces link-model risks outside Core

MathML Core did not retain MathML3’s broad href/xlink:href model. MathML 4 Full reinstates href on all MathML elements, including invisible and nested elements that may not fit the ordinary HTML link model. The specification should clarify activation behavior and privacy protections for these cases, including visited-link handling.

Requested addition to §D.5:

In web contexts, MathML href must not create link, navigation, URL-scheme, referrer, script-execution, download, or target-handling capabilities beyond those allowed by the host environment’s ordinary link model. The specification should also clarify safe behavior for href on non-rendered elements and for nested MathML links, since these cases may not map cleanly to ordinary visible links.
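
As an illustration only (not part of the requested specification text), the following minimal Python sketch shows how a reviewer or tool might audit href usage across all MathML elements, including ones that are not visibly rendered. The function name audit_hrefs and the ALLOWED_SCHEMES set are hypothetical, and the scheme allowlist stands in for whatever the host environment's ordinary link model actually permits. It covers only the URL-scheme aspect and does not address visited-link handling, nested links, or xlink:href.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Assumption: the host only permits these schemes for ordinary links.
# The empty string covers relative references such as "#eq1".
ALLOWED_SCHEMES = {"http", "https", ""}


def audit_hrefs(mathml: str) -> list[tuple[str, str, bool]]:
    """Return (local element name, href, scheme allowed) for each element with href."""
    root = ET.fromstring(mathml)
    findings = []
    for el in root.iter():
        href = el.get("href")
        if href is None:
            continue
        scheme = urlsplit(href.strip()).scheme.lower()
        # Strip the "{namespace}" prefix that ElementTree adds to tag names.
        local_name = el.tag.rsplit("}", 1)[-1]
        findings.append((local_name, href, scheme in ALLOWED_SCHEMES))
    return findings
```

A tool built this way would report, for example, an href on an mspace element, which a purely visual inspection of the rendered formula would be unlikely to reveal.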


Finding 3 — AT-use detection via intent divergent content (primary new privacy concern)

The W3C Security and Privacy Questionnaire explicitly flags features that allow authors to serve different content to assistive technology (AT) users as a privacy concern, because sites can infer AT use from subsequent user behavior. MathML Core does not define processing for intent, so this risk is entirely new to MathML 4. The intent attribute, by design, can influence the accessible speech generated for MathML while leaving visual rendering unchanged. A malicious author could embed behavioral probes or instructions exclusively in intent values and observe whether users respond, enabling disability-related profiling.

Normative author requirements alone are insufficient. The stronger protection is UA-level guidance ensuring no page-observable signal indicates whether intent was consumed.

Requested addition — normative author requirement (§D.4):

Authors MUST NOT use intent to convey hidden instructions, behavioral probes, tracking tokens, or content that materially differs from the visible mathematical expression. intent should be used only to disambiguate or improve narration/navigation of the same mathematical content.

Requested addition — UA-level guidance (§D.4):

User agents should not expose to page script any signal indicating whether, how, or by whom intent was consumed by assistive technology.


Finding 4 — intent requires explicit non-observability guidance

MathML Core reserves intent and arg as valid attributes but does not define their processing behavior. As a result, MathML Core’s privacy review does not cover their privacy implications. MathML 4 should therefore add explicit privacy guidance for intent.

Requested addition to §D.4:

The intent attribute provides an author-supplied semantic layer intended to improve mathematical narration and accessibility. Although intent does not directly expose user data, its processing may depend on assistive-technology behavior, locale, speech or braille settings, supported concept dictionaries, fallback behavior, or parsing outcomes. Implementations should ensure that these processing differences are not exposed to page script. In particular, user agents and assistive technologies should not expose generated speech strings, parse errors, supported concept dictionaries, fallback choices, or other AT-specific processing results through DOM APIs, accessibility APIs observable by the page, events, timing, layout, or other page-observable behavior.


Finding 5 — intent literals should be safely handled in speech and braille pipelines

MathML 4’s intent attribute is author-controlled input intended to influence how mathematical notation is spoken or otherwise presented by assistive technologies. Because intent values may be parsed and forwarded into speech, braille, accessibility, or platform services, the specification should make clear that this data remains untrusted throughout the processing pipeline.

This is not a concern about parsing the MathML intent grammar itself. Processors necessarily need to parse intent according to the specification. The concern is that literal strings, fallback names, concept names, or other author-provided text derived from intent should not be interpreted by downstream systems as commands, SSML, markup, URLs, code, or other executable/control syntax.

Without this clarification, implementations could accidentally create injection-style risks in assistive-technology pipelines, especially if future speech or braille integrations accept richer command languages or markup-like input.

Requested addition to §D.5:

The intent attribute is author-controlled input. Implementations may parse it according to the MathML intent grammar, but any author-provided text derived from intent should be treated as data when forwarded to speech, braille, accessibility, or platform services. Such text should not be interpreted as SSML, commands, markup, URLs, scripts, or other control instructions unless explicitly defined and safely constrained.
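
As an illustration only (not part of the requested specification text), the sketch below shows one way an implementation might keep intent-derived text inert when the downstream speech service accepts SSML. That the pipeline uses SSML at all is an assumption, and the function name intent_text_to_ssml is hypothetical. The sketch escapes XML metacharacters so that author-supplied markup is spoken as literal text and not interpreted as speech control instructions. It does not handle control characters or engine-specific command syntaxes.

```python
from xml.sax.saxutils import escape


def intent_text_to_ssml(text: str) -> str:
    """Wrap author-provided text as inert character data inside an SSML document."""
    # escape() converts &, <, and > to entities, so an embedded tag such as
    # <break/> reaches the speech engine as literal characters, not markup.
    return f"<speak>{escape(text)}</speak>"
```

For example, a literal containing `<break time="10s"/>` would be passed along as character data, so it could not insert a pause or any other SSML directive.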


Finding 6 — intent processing should not expose user locale or AT preferences

This risk does not arise in MathML Core because intent is not defined there. MathML 4 introduces author-provided intent values that may be interpreted differently depending on language, locale, speech rules, braille rules, or assistive-technology preferences.

Using user-specific settings such as OS locale, speech locale, braille preferences, or installed accessibility dictionaries is not necessarily a privacy problem by itself. These settings may be needed to produce the correct experience for the user. The privacy concern arises if those differences become observable to the page. For example, if a page script can observe different generated accessible names, fallback behavior, parsing errors, timing, layout, or other outputs based on user-specific locale or AT configuration, then intent processing could add fingerprinting entropy beyond MathML Core’s baseline.

Requested addition to §D.4:

Implementations should use document and element language as the author-controlled input for intent interpretation when possible. User-specific locale, speech, braille, or assistive-technology preferences may affect the user’s final accessibility experience, but differences derived from those preferences must not be exposed to page script through generated accessible names, fallback behavior, parsing errors, timing, layout, events, or other observable behavior.


Finding 7 — Clarify fetch behavior for external annotation references

MathML 4 allows annotation and annotation-xml elements to reference external annotation content using src. The specification appears to discuss this mainly for processors that expand, export, or transform annotations, rather than for ordinary visual rendering. However, because src is a URL-bearing attribute, MathML 4 should explicitly define when, if ever, these external references may be dereferenced in web contexts.

In particular, MathML 4 should state that user agents must not automatically fetch external annotation references merely for parsing, rendering, accessibility-tree construction, indexing, or passive document inspection. If a processor chooses to expand or export annotation content by fetching an external reference, that fetch should be explicit, should follow the host environment’s normal web security policies, and should not bypass CSP, referrer policy, mixed-content restrictions, credential rules, private-network protections, or user/application mediation.

Without this clarification, external annotations could become an unexpected network-observable surface, especially in tools that process MathML for accessibility, search, conversion, validation, or export.

Requested addition to §D.5:

In web contexts, external annotation references via annotation src or annotation-xml src must not be fetched automatically during parsing, rendering, accessibility-tree construction, or other passive processing. Any processor that expands or exports external annotation content should treat the reference as an explicit resource load subject to the host environment’s normal fetch, CSP, referrer, credentials, mixed-content, and network-isolation policies.
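
As an illustration only (not part of the requested specification text), the following Python sketch shows passive processing that records external annotation references without dereferencing them. The function name list_external_annotations is hypothetical. The point of the sketch is that parsing and inspection can surface the src values to a caller, who can then decide whether to perform an explicit, policy-checked fetch, while the inspection step itself makes no network request.

```python
import xml.etree.ElementTree as ET

ANNOTATION_ELEMENTS = {"annotation", "annotation-xml"}


def list_external_annotations(mathml: str) -> list[tuple[str, str]]:
    """Return (element name, src) pairs; the URLs are recorded but never fetched."""
    root = ET.fromstring(mathml)
    refs = []
    for el in root.iter():
        # Strip the "{namespace}" prefix that ElementTree adds to tag names.
        local_name = el.tag.rsplit("}", 1)[-1]
        src = el.get("src")
        if local_name in ANNOTATION_ELEMENTS and src is not None:
            refs.append((local_name, src))
    return refs
```

Any later expansion or export step would take this list as input and apply the host environment's fetch policies before loading anything.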


Finding 8 — mglyph adds external image resource loading outside Core

mglyph is not in MathML Core. It includes a src attribute for external glyph images, and the spec notes a JavaScript polyfill implements it using img. This creates image-like network requests not present in Core's baseline.

Requested addition to §D.5:

Web implementations and polyfills must treat mglyph resource loading like ordinary image loading: subject to CSP, referrer policy, mixed-content blocking, credential rules, and canvas tainting where applicable. User agents should not create additional network observability beyond ordinary image loading behavior.


Finding 9 — Content MathML semantic identifiers should not be resolved automatically

Content MathML is outside MathML Core and introduces semantic identifiers such as definitionURL, cd, and csymbol. These identifiers can refer to external or application-defined semantic definitions. While such references may be useful for specialized tools, MathML 4 should clarify that web user agents must not automatically resolve or dereference them during ordinary parsing, rendering, or accessibility processing.

Requested addition to §D.4:

Content MathML semantic identifiers such as definitionURL, cd, and csymbol should be treated as opaque identifiers in web contexts. User agents must not automatically fetch, resolve, or dereference them during parsing, rendering, or accessibility processing unless an application explicitly requests such resolution subject to the host environment’s normal fetch and privacy controls.
