Commit 5a2cb46

Add WebHTML Exporter
1 parent 216550b commit 5a2cb46

14 files changed

Lines changed: 279 additions & 125 deletions

docs/source/external_exporters.rst

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -18,10 +18,10 @@ format designated by the ``FORMAT`` string as explained below.
1818

1919
Extending the built-in format exporters
2020
---------------------------------------
21-
A few built-in formats are available by default: ``html``, ``pdf``, ``webpdf``,
22-
``script``, ``latex``. Each of these has its own *exporter* with many
23-
configuration options that can be extended. Having the option to point to a
24-
different *exporter* allows authors to create their own fully customized
21+
A few built-in formats are available by default: ``html``, ``pdf``, ``webhtml``,
22+
``webpdf``, ``script``, ``latex``. Each of these has its own *exporter* with
23+
many configuration options that can be extended. Having the option to point
24+
to a different *exporter* allows authors to create their own fully customized
2525
templates or export formats.
2626

2727
A custom *exporter* must be an importable Python object. We recommend that

docs/source/highlighting.rst

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,15 @@
11
Customizing Syntax Highlighting
22
===============================
33

4-
Under the hood, nbconvert uses pygments to highlight code. pdf, webpdf and html exporting support
5-
changing the highlighting style.
4+
Under the hood, nbconvert uses pygments to highlight code. pdf, webpdf, html, and webhtml
5+
exporting support changing the highlighting style.
66

77
Using Builtin styles
88
--------------------
99
Pygments has a number of builtin styles available. To use them, we just need to set the style setting
1010
in the relevant preprocessor.
1111

12-
To change html and webpdf highlighting export with:
12+
To change html, webhtml, and webpdf highlighting export with:
1313

1414
.. code-block:: bash
1515

docs/source/install.rst

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,8 @@ notebooks to PDF.
7979
Installing Chromium
8080
-------------------
8181

82-
For converting notebooks to PDF with ``--to webpdf``, nbconvert requires the
82+
For converting notebooks to PDF with ``--to webpdf``, or for prerendering HTML
83+
notebooks via ``--to webhtml``, nbconvert requires the
8384
`playwright <https://github.com/microsoft/playwright-python>`_ Chromium automation library.
8485

8586
Playwright makes use of a specific version of Chromium. If it does not find a suitable

docs/source/usage.rst

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,7 @@ The currently supported output formats are:
2626
- :ref:`HTML <convert_html>`,
2727
- :ref:`LaTeX <convert_latex>`,
2828
- :ref:`PDF <convert_pdf>`,
29+
- :ref:`WebHTML <convert_webhtml>`,
2930
- :ref:`WebPDF <convert_webpdf>`,
3031
- :ref:`Reveal.js HTML slideshow <convert_revealjs>`,
3132
- :ref:`Markdown <convert_markdown>`,
@@ -71,6 +72,19 @@ HTML
7172

7273
If this option is provided, embed images as base64 urls in the resulting HTML file.
7374

75+
.. _convert_webhtml:
76+
77+
WebHTML
78+
~~~~~~~
79+
* ``--to webhtml``
80+
81+
Generates an HTML document by first rendering the notebook to HTML, loading that HTML in headless Chromium, and
82+
exporting the resulting HTML content back to a file. This exporter supports the same templates
83+
as ``--to html``.
84+
85+
The webhtml exporter requires the ``playwright`` Chromium automation library, which
86+
can be installed via ``nbconvert[webhtml]``.
87+
7488
.. _convert_latex:
7589

7690
LaTeX

nbconvert/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@
2121
ScriptExporter,
2222
SlidesExporter,
2323
TemplateExporter,
24+
WebHTMLExporter,
2425
WebPDFExporter,
2526
export,
2627
get_export_names,
@@ -48,6 +49,7 @@
4849
"ScriptExporter",
4950
"SlidesExporter",
5051
"TemplateExporter",
52+
"WebHTMLExporter",
5153
"WebPDFExporter",
5254
"__version__",
5355
"export",

nbconvert/exporters/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
from .script import ScriptExporter
1414
from .slides import SlidesExporter
1515
from .templateexporter import TemplateExporter
16+
from .webhtml import WebHTMLExporter
1617
from .webpdf import WebPDFExporter
1718

1819
__all__ = [
@@ -34,6 +35,7 @@
3435
"ScriptExporter",
3536
"SlidesExporter",
3637
"TemplateExporter",
38+
"WebHTMLExporter",
3739
"WebPDFExporter",
3840
"export",
3941
"get_export_names",

nbconvert/exporters/webhtml.py

Lines changed: 158 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,158 @@
1+
"""Export to HTML after loading in a headless browser"""
2+
3+
# Copyright (c) IPython Development Team.
4+
# Distributed under the terms of the Modified BSD License.
5+
6+
import asyncio
7+
import concurrent.futures
8+
import os
9+
import subprocess
10+
import sys
11+
import tempfile
12+
from importlib import util as importlib_util
13+
14+
from traitlets import Bool, List, Unicode, default
15+
16+
from .html import HTMLExporter
17+
18+
PLAYWRIGHT_INSTALLED = importlib_util.find_spec("playwright") is not None
19+
IS_WINDOWS = os.name == "nt"
20+
21+
__all__ = ("WebHTMLExporter",)
22+
23+
class WebHTMLExporter(HTMLExporter):
24+
"""Writer designed to write to HTML files after rendering in a browser.
25+
26+
This inherits from :class:`HTMLExporter`. It creates the HTML using the
27+
template machinery, and then runs playwright to load it in a browser, saving
28+
the resulting page.
29+
"""
30+
31+
export_from_notebook = "HTML via Browser"
32+
33+
allow_chromium_download = Bool(
34+
False,
35+
help="Whether to allow downloading Chromium if no suitable version is found on the system.",
36+
).tag(config=True)
37+
38+
@default("file_extension")
39+
def _file_extension_default(self):
40+
return ".html"
41+
42+
@default("template_name")
43+
def _template_name_default(self):
44+
return "webhtml"
45+
46+
disable_sandbox = Bool(
47+
False,
48+
help="""
49+
Disable the Chromium security sandbox when rendering HTML in the browser.
50+
51+
WARNING: This could cause arbitrary code execution in specific circumstances,
52+
where JS in your notebook can execute serverside code! Please use with
53+
caution.
54+
55+
``https://github.com/puppeteer/puppeteer/blob/main@%7B2020-12-14T17:22:24Z%7D/docs/troubleshooting.md#setting-up-chrome-linux-sandbox``
56+
has more information.
57+
58+
This is required for webhtml to work inside most container environments.
59+
""",
60+
).tag(config=True)
61+
62+
browser_args = List(
63+
Unicode(),
64+
help="""
65+
Additional arguments to pass to the browser that renders the HTML.
66+
67+
These arguments will be passed directly to the browser launch method
68+
and can be used to customize browser behavior beyond the default settings.
69+
""",
70+
).tag(config=True)
71+
72+
def run_playwright(self, html, _postprocess=None):
73+
"""Run playwright."""
74+
75+
async def main(temp_file):
76+
"""Run main playwright script."""
77+
78+
try:
79+
from playwright.async_api import ( # type: ignore[import-not-found] # noqa: PLC0415,
80+
async_playwright,
81+
)
82+
except ModuleNotFoundError as e:
83+
msg = (
84+
"Playwright is not installed to support browser-based conversion. "
85+
"Please install `nbconvert[webpdf]` to enable."
86+
)
87+
raise RuntimeError(msg) from e
88+
89+
if self.allow_chromium_download:
90+
cmd = [sys.executable, "-m", "playwright", "install", "chromium"]
91+
subprocess.check_call(cmd) # noqa: S603
92+
93+
playwright = await async_playwright().start()
94+
chromium = playwright.chromium
95+
96+
args = list(self.browser_args)  # copy so the configured trait list is not mutated below
97+
if self.disable_sandbox:
98+
args.append("--no-sandbox")
99+
100+
try:
101+
browser = await chromium.launch(
102+
handle_sigint=False, handle_sigterm=False, handle_sighup=False, args=args
103+
)
104+
except Exception as e:
105+
msg = (
106+
"No suitable chromium executable found on the system. "
107+
"Please use '--allow-chromium-download' to allow downloading one, "
108+
"or install it using `playwright install chromium`."
109+
)
110+
await playwright.stop()
111+
raise RuntimeError(msg) from e
112+
113+
page = await browser.new_page()
114+
await page.emulate_media(media="print")
115+
await page.wait_for_timeout(100)
116+
await page.goto(f"file://{temp_file.name}", wait_until="networkidle")
117+
await page.wait_for_timeout(100)
118+
119+
data = await page.content()
120+
121+
if _postprocess:
122+
# Reuse this code for webpdf
123+
data = await _postprocess(page, browser, playwright)
124+
125+
await browser.close()
126+
await playwright.stop()
127+
return data
128+
129+
pool = concurrent.futures.ThreadPoolExecutor()
130+
# Create a temporary file to pass the HTML code to Chromium:
131+
# Unfortunately, tempfile on Windows does not allow for an already open
132+
# file to be opened by a separate process. So we must close it first
133+
# before calling Chromium. We also specify delete=False to ensure the
134+
# file is not deleted after closing (the default behavior).
135+
temp_file = tempfile.NamedTemporaryFile( # noqa: SIM115
136+
suffix=".html", delete=False
137+
)
138+
with temp_file:
139+
if isinstance(html, str):
140+
temp_file.write(html.encode("utf-8"))
141+
else:
142+
temp_file.write(html)
143+
try:
144+
html_data = pool.submit(asyncio.run, main(temp_file)).result()
145+
finally:
146+
# Ensure the file is deleted even if playwright raises an exception
147+
os.unlink(temp_file.name)
148+
return html_data
149+
150+
def from_notebook_node(self, nb, resources=None, **kw):
151+
"""Convert from a notebook node."""
152+
html, resources = super().from_notebook_node(nb, resources=resources, **kw)
153+
154+
self.log.info("Building HTML")
155+
html_data = self.run_playwright(html)
156+
self.log.info("HTML successfully created")
157+
158+
return html_data, resources
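
The temp-file hand-off and the worker-thread call in ``run_playwright`` can be tried without a browser. The following is a minimal sketch, not the exporter's code: ``fake_render`` and ``render_via_temp_file`` are hypothetical names, and ``fake_render`` merely reads the file back in place of the Playwright step. It assumes only the standard library.

```python
import asyncio
import concurrent.futures
import os
import tempfile


async def fake_render(path):
    """Hypothetical stand-in for the browser step: read the file back."""
    await asyncio.sleep(0)  # yield once, as a real async browser call would
    with open(path, encoding="utf-8") as f:
        return f.read().upper()


def render_via_temp_file(html):
    """Write html to a closed temp file, process it in a worker thread, clean up."""
    # delete=False keeps the file on disk after it is closed; on Windows a
    # still-open temp file cannot be opened by a separate process.
    temp_file = tempfile.NamedTemporaryFile(suffix=".html", delete=False)
    with temp_file:
        temp_file.write(html.encode("utf-8") if isinstance(html, str) else html)
    try:
        # asyncio.run() refuses to start inside a running event loop, so it is
        # submitted to a worker thread, which has no loop of its own yet.
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            result = pool.submit(asyncio.run, fake_render(temp_file.name)).result()
    finally:
        # Remove the file even if the render step raised.
        os.unlink(temp_file.name)
    return result, temp_file.name


if __name__ == "__main__":
    rendered, path = render_via_temp_file("<p>hi</p>")
    print(rendered, os.path.exists(path))
```

Running ``asyncio.run`` in a separate thread is presumably what lets the exporter be called from code that already has an event loop running, such as a Jupyter server. The sketch keeps that shape and uses the executor as a context manager so the worker thread is released afterwards.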
