
Switch back to original xmlquery package#305

Merged
chocolatkey merged 3 commits into develop from xmlquery-fix
May 26, 2026

Conversation

@chocolatkey

@chocolatkey chocolatkey commented May 25, 2026

Member

We're going back to the original xmlquery package now that the underlying xpath package makes it easy to use namespaces when querying.
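Namespace-aware querying matters because an XML element is identified by its namespace URI plus its local name, not by the prefix an author happened to write. The sketch below illustrates this with the standard library's encoding/xml rather than xmlquery; firstElementName is a hypothetical helper. The same Dublin Core element is written with a `dc:` prefix, an arbitrary `x:` prefix, and a default namespace, and all three resolve to the same name.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// firstElementName returns the resolved name of the first start element in doc.
// Decoder.Token replaces the prefix with the namespace URI it is bound to,
// so Name.Space holds the URI rather than the prefix written in the source.
func firstElementName(doc string) (xml.Name, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err != nil {
			return xml.Name{}, err
		}
		if se, ok := tok.(xml.StartElement); ok {
			return se.Name, nil
		}
	}
}

func main() {
	docs := []string{
		`<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">A</dc:title>`,
		`<x:title xmlns:x="http://purl.org/dc/elements/1.1/">A</x:title>`,
		`<title xmlns="http://purl.org/dc/elements/1.1/">A</title>`,
	}
	for _, d := range docs {
		name, err := firstElementName(d)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("{%s}%s\n", name.Space, name.Local)
	}
}
```

Each document prints `{http://purl.org/dc/elements/1.1/}title`, which is why a query keyed on the URI (as a namespace-aware XPath is) matches all three while a prefix-based string match would not.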

@chocolatkey chocolatkey changed the title "switch back to original xmlquery package" to "Switch back to original xmlquery package" May 25, 2026
@chocolatkey chocolatkey requested a review from mickael-menu May 25, 2026 02:18
@chocolatkey chocolatkey marked this pull request as ready for review May 25, 2026 02:18
@mickael-menu mickael-menu requested a review from Copilot May 26, 2026 09:21

Copilot AI left a comment


Pull request overview

This PR migrates the EPUB parser stack back to the upstream github.com/antchfx/xmlquery (and antchfx/xpath) now that namespace-aware XPath compilation is available, removing the previously used fork and prefix-injection approach.

Changes:

  • Replace github.com/readium/xmlquery usage with upstream github.com/antchfx/xmlquery and namespace-aware, precompiled XPath expressions.
  • Simplify XML loading by removing the prefixes parameter from fetcher.ReadResourceAsXML and updating all call sites.
  • Update EPUB parsing modules (package doc, NCX, navdoc, SMIL, encryption, metadata) and related tests to use xmlquery.QuerySelector* with compiled XPath.

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated no comments.

Summary per file

| File | Description |
| --- | --- |
| pkg/parser/epub/utils.go | Introduces shared namespace map + XPath compilation helper; updates container.xml parsing call site. |
| pkg/parser/epub/parser.go | Updates XML loading calls to new ReadResourceAsXML signature. |
| pkg/parser/epub/parser_smil.go | Switches SMIL parsing to compiled namespace-aware XPath selectors. |
| pkg/parser/epub/parser_smil_test.go | Updates test XML loading to new ReadResourceAsXML signature. |
| pkg/parser/epub/parser_packagedoc.go | Switches OPF package document parsing to compiled namespace-aware XPath selectors. |
| pkg/parser/epub/parser_packagedoc_test.go | Updates test XML loading to new ReadResourceAsXML signature. |
| pkg/parser/epub/parser_ncx.go | Switches NCX parsing to compiled namespace-aware XPath selectors. |
| pkg/parser/epub/parser_ncx_test.go | Updates test XML loading to new ReadResourceAsXML signature. |
| pkg/parser/epub/parser_navdoc.go | Switches navdoc parsing to compiled namespace-aware XPath selectors. |
| pkg/parser/epub/parser_navdoc_test.go | Updates test XML loading to new ReadResourceAsXML signature. |
| pkg/parser/epub/parser_encryption.go | Switches encryption.xml parsing to compiled namespace-aware XPath selectors. |
| pkg/parser/epub/parser_encryption_test.go | Updates test XML loading to new ReadResourceAsXML signature. |
| pkg/parser/epub/metadata.go | Switches metadata lookup to compiled namespace-aware XPath selectors; minor slice type update. |
| pkg/parser/epub/metadata_test.go | Updates test XML loading to new ReadResourceAsXML signature. |
| pkg/parser/epub/media_overlay_service.go | Updates XML loading to new ReadResourceAsXML signature. |
| pkg/fetcher/resource.go | Changes ReadResourceAsXML signature and parsing implementation to upstream xmlquery. |
| go.mod | Adds direct requirements for antchfx/xmlquery and antchfx/xpath; removes readium/xmlquery. |
| go.sum | Updates checksums for dependency changes. |
Comments suppressed due to low confidence (2)

pkg/parser/epub/utils.go:50

  • container.xml uses a default namespace (NamespaceOPC). The unprefixed XPath /container/rootfiles/rootfile will not match namespaced elements in standard XPath, so n may be nil for valid EPUBs. Consider adding the container namespace to xmlNS and querying with a prefixed XPath (or using local-name() selectors).
```go
	xml, err := ftchr.ReadResourceAsXML(ctx, res)
	if err != nil {
		return nil, errors.Wrap(err, "failed loading container.xml")
	}
	n := xml.SelectElement("/container/rootfiles/rootfile")
	if n == nil {
		return nil, errors.New("rootfile not found in container")
	}
```

pkg/fetcher/resource.go:105

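  • ReadResourceAsXML converts the []byte to a string and then wraps it in a strings.Reader, which adds an extra allocation and copy for every XML parse. Prefer reading from the slice directly with bytes.NewReader; note that the local variable named bytes shadows the bytes package, so it would need renaming first (e.g., data, then bytes.NewReader(data)).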
```go
func ReadResourceAsXML(ctx context.Context, r Resource) (*xmlquery.Node, *ResourceError) {
	bytes, ex := r.Read(ctx, 0, 0)
	if ex != nil {
		return nil, ex
	}
	node, err := xmlquery.ParseWithOptions(strings.NewReader(string(bytes)), xmlquery.ParserOptions{
		Decoder: &xmlquery.DecoderOptions{
			Strict: true,
			Entity: xml.HTMLEntity,
		},
	})
```
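A stdlib sketch of the suggested fix: decode straight from the byte slice with bytes.NewReader, which wraps the slice without copying it. It uses encoding/xml directly rather than xmlquery, mirrors the Strict and Entity settings shown in the excerpt, and names the parameter data so the bytes package is not shadowed; textContent is a hypothetical helper that simply concatenates character data.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

// textContent decodes data in strict mode with HTML entities enabled and
// returns all character data concatenated. bytes.NewReader reads the slice
// in place, so no intermediate string copy is made.
func textContent(data []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	dec.Strict = true
	dec.Entity = xml.HTMLEntity // resolves entities such as &nbsp;
	var buf bytes.Buffer
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return buf.String(), nil
		}
		if err != nil {
			return "", err
		}
		if cd, ok := tok.(xml.CharData); ok {
			buf.Write(cd)
		}
	}
}

func main() {
	s, err := textContent([]byte("<p>a&nbsp;b</p>"))
	fmt.Printf("%q %v\n", s, err)
}
```

Because Strict stays on, an entity missing from xml.HTMLEntity (for example `&bogus;`) is still reported as a syntax error instead of being passed through.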


@chocolatkey chocolatkey merged commit 6187833 into develop May 26, 2026
4 checks passed
@chocolatkey chocolatkey deleted the xmlquery-fix branch May 26, 2026 19:52

2 participants