---
title: Using WeasyPrint to convert HTML to PDF
image: /assets/img/default-banner.jpg
author: jack
date: 2021-08-15 20:55:00 +0800
categories: [blog, python]
tags: [python, html]
math: false
pin: false
---

Environment: tested on Ubuntu 20.04 through Ubuntu 24.04.

While maintaining the PDF renderer for the tldr-pages project, I came across WeasyPrint, a handy library that turns HTML and CSS into PDF. Here are three ways to use it. Install it via pip:

```bash
pip install weasyprint
```

## Converting an HTML File to an Automatically Paginated PDF

First, I took an [HTML example](https://www.w3.org/Style/Examples/011/firstcss.en.html) from the W3C:

**mystyle.css**

```css
body {
  padding-left: 11em;
  font-family: Georgia, "Times New Roman", Times, serif;
  color: purple;
  background-color: #d8da3d
}

ul.navbar {
  list-style-type: none;
  padding: 0;
  margin: 0;
  position: absolute;
  top: 2em;
  left: 1em;
  width: 9em
}

h1 {
  font-family: Helvetica, Geneva, Arial, SunSans-Regular, sans-serif
}

ul.navbar li {
  background: white;
  margin: 0.5em 0;
  padding: 0.3em;
  border-right: 1em solid black
}

ul.navbar a {
  text-decoration: none
}

a:link {
  color: blue
}

a:visited {
  color: purple
}

address {
  margin-top: 1em;
  padding-top: 1em;
  border-top: thin dotted
}
```

**mydoc.html**

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>

<head>
  <title>My first styled page</title>
  <link rel="stylesheet" href="mystyle.css">
</head>

<body>

  <!-- Site navigation menu -->
  <ul class="navbar">
    <li><a href="index.html">Home page</a></li>
    <li><a href="musings.html">Musings</a></li>
    <li><a href="town.html">My town</a></li>
    <li><a href="links.html">Links</a></li>
  </ul>

  <!-- Main content -->
  <h1>My first styled page</h1>
  <p>Welcome to my styled page!</p>
  <p>It lacks images, but at least it has style. And it has links, even if they don't go anywhere&hellip;</p>
  <p>There should be more here, but I don't know what yet.</p>

  <!-- Sign and date the page, it's only polite! -->
  <address>Made 5 April 2004<br>
    by myself.</address>

</body>

</html>
```

Put `mydoc.html` and `mystyle.css` in the same directory and open `mydoc.html` in a browser to get the following preview:

![Browser preview of mydoc.html](https://raw.githubusercontent.com/blueskyson/image-host/master/weasyprint/1.png)

Next, we write a small Python script in the same directory:

**convert.py**

```python
from weasyprint import HTML

# Render mydoc.html (plus the mystyle.css it links to) into mydoc.pdf
HTML("mydoc.html").write_pdf("mydoc.pdf")
```

Run the script, and you will get a nicely paginated PDF file:

```bash
python convert.py
```

![PDF generated from mydoc.html](https://raw.githubusercontent.com/blueskyson/image-host/master/weasyprint/2.png)

## Converting an HTML String to a Paginated PDF

The code is similar to converting an HTML file, with one difference. An HTML string has no location on disk, so WeasyPrint cannot resolve relative links such as `<link rel="stylesheet" href="mystyle.css">`. Instead, you load each style sheet as a `CSS` object, collect them in a list, and pass that list to `write_pdf()` through its `stylesheets` parameter. We'll reuse `mydoc.html` and `mystyle.css` from above.

**convert_string.py**

```python
from weasyprint import HTML, CSS

# Append as many style sheets as you want
csslist = []
csslist.append(CSS("mystyle.css"))

with open("mydoc.html", "r", encoding="utf-8") as file:
    # Read the HTML file into a string, then convert that string to a PDF file
    data = file.read()
    HTML(string=data).write_pdf("mydoc2.pdf", stylesheets=csslist)
```

## Manual Page Breaks

One of the key differences between a PDF and a web page is pagination. A web page is one continuous scroll, while a PDF is split into fixed-size pages, so when converting a long web page into a PDF you need to think about where the pages break. We can use:

```html
<p style="page-break-before: always"></p>
```

to force the content after this element onto the next page. Add this line wherever you need a manual page break. Here is another example document:

**mydoc.html**

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
  <title>Hello World</title>
</head>
<body>
  <!-- Main content -->
  <h1>What is GitHub?</h1>
  <p>GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects from anywhere.</p>
  <p>This tutorial teaches you GitHub essentials like repositories, branches, commits, and Pull Requests. You’ll create your own Hello World repository and learn GitHub’s Pull Request workflow, a popular way to create and review code.</p>
  <h2>No coding necessary</h2>
  <p>To complete this tutorial, you need a GitHub.com account and Internet access. You don’t need to know how to code, use the command line, or install Git (the version control software GitHub is built on).</p>

  <h1>Step 1. Create a Repository</h1>
  <p>A repository is usually used to organize a single project. Repositories can contain folders and files, images, videos, spreadsheets, and data sets – anything your project needs. We recommend including a README, or a file with information about your
    project. GitHub makes it easy to add one at the same time you create your new repository. It also offers other common options such as a license file.</p>
  <p>Your hello-world repository can be a place where you store ideas, resources, or even share and discuss things with others.</p>
  <h2>To create a new repository</h2>
  <ol>
    <li>In the upper right corner, next to your avatar or identicon, click and then select New repository.</li>
    <li>Name your repository hello-world.</li>
    <li>Write a short description.</li>
    <li>Select Initialize this repository with a README.</li>
  </ol>
</body>
</html>
```

After converting with **convert.py**, the resulting PDF looks like this:

![PDF of the GitHub example without manual page breaks](https://raw.githubusercontent.com/blueskyson/image-host/master/weasyprint/3.png)

Generally, we want each major heading to start a new page. To achieve this, place `<p style="page-break-before: always"></p>` before each `<h1></h1>` heading that should open a new page, like so:

```html
...use the command line, or install Git (the version control software GitHub is built on).</p>

<!-- the following will be on the next page -->
<p style="page-break-before: always"></p>

<h1>Step 1. Create a Repository</h1>
<p>A repository is usually used to organize a single project...
```

Convert the file again with `convert.py`. The `<h1>` and the content after it now start on the next page:

![PDF of the GitHub example with a page break before the second heading](https://raw.githubusercontent.com/blueskyson/image-host/master/weasyprint/4.png)
