Releases: openzim/gutenberg
3.0.1
3.0.0
Breaking
- Remove any optimization logic + use of S3 cache, dropping `--optimization-cache` and `--use-any-optimized-version` flags (#300)
- Move to another default mirror + add CLI flag to select mirror to use, dropping `--rdf-url` flag and adding `--mirror_url` CLI flag (#301)
- Reengineer the scraper (#312)
  - Source list of books from CSV at https://gutenberg.pglaf.org/cache/epub/feeds/pg_catalog.csv.gz for quicker startup
  - Stop downloading the big RDF archive and download only the needed individual RDF files (saves a lot of time when only a few books are requested; the penalty looks negligible when many books are needed)
  - Get rid of the SQLite database used to persist data across runs: too much maintenance effort and filesystem impact for limited benefit
  - Get rid of the option to generate one ZIM per language; the scraper now always produces a single ZIM
  - Data is now directly transferred to the ZIM, without touching the filesystem
  - Get rid of the "steps" approach, not used (anymore?) in production and difficult to maintain
  - Many CLI options removed: --use-any-optimized-version, --zim, --download, --parse, --prepare, -m/--one-language-one-zim, --dlc, -d/--dl-folder, -e/--static-folder
- Drop support for Gutenberg bookshelves and add support for LibraryOfCongress (#265)
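
As a rough illustration of the CSV-based book listing mentioned in this release, here is a minimal Python sketch that reads a gzipped catalog and keeps only text books in the requested languages. It is not the scraper's actual code: the function name `list_book_ids` is hypothetical, the column names (`Text#`, `Type`, `Language`) are assumed from the public `pg_catalog.csv` layout, and the sample rows are made up so the example runs offline.

```python
import csv
import gzip
import io

# Made-up sample rows; the header is an assumption about pg_catalog.csv.
SAMPLE = (
    "Text#,Type,Issued,Title,Language,Authors,Subjects,LoCC,Bookshelves\n"
    "1,Text,1971-12-01,Sample Title A,en,Author A,,E201,\n"
    "2,Text,1972-12-01,Sample Title B,fr,Author B,,PQ,\n"
    "3,Sound,1973-01-01,Sample Title C,en,Author C,,PR,\n"
)


def list_book_ids(gz_bytes, languages):
    """Return IDs of 'Text' entries whose Language is in `languages`.

    Hypothetical helper: it does an exact match on the Language column
    and does not handle entries that list several languages.
    """
    with gzip.open(
        io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline=""
    ) as fh:
        reader = csv.DictReader(fh)
        return [
            int(row["Text#"])
            for row in reader
            if row["Type"] == "Text" and row["Language"] in languages
        ]


# Compress the sample in memory to stand in for the downloaded .csv.gz file.
catalog = gzip.compress(SAMPLE.encode("utf-8"))
english_ids = list_book_ids(catalog, {"en"})
```

In a real run the bytes would come from the catalog URL above rather than from an in-memory sample.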
Added
- Add standard `--output` option to control where ZIM files are written (#314)
- Add ability to generate smaller selection (#184)
- Log failing book ID in case of fatal error (#333)
Changed
- Finalize implementation of scraper progress (#289)
- Move to another default mirror + add CLI flag to select mirror to use (#301)
- Split build logic in Dockerfile to separate dependencies layer from scraper code layer (#302)
- Prepare structure for TranslateWiki (#312)
- Remove « .html » extension (#166)
- Configure libzim verbosity based on `--debug` flag (#326)
Fixed
- Stop ignoring HTML illustrations containing cover in their name (#270)
- Fix JS/JSON files generation (#297, #298)
- Fix navigation to bookshelves with special characters (#305)
- Bookshelves with special characters cannot be opened (#306)
- Fix internationalization of the "Copyrighted" license label (#253)
- Properly compute (including sorting) ZIM Language items + allow to override with --zim-languages (#323)
- Fix support for ZIMs without full-text index (#326)
2.2.0
Added
- Add support for `--debug` flag to output debug logs
- Add support for `-L` long_description flags
- Add request timeout for util.py (#197)
- Add Booklanguage DB to support multi-languages books (#218)
- Add RTL support to UI (#248)
- Add language filter to combobox for requested languages (#249)
Changed
- Simplify Gutenberg scraping (no more rsync, no more fallback URLs / filenames) (#97)
- Prefer EPUB 3 to EPUB (#235)
- Do not force the presence of PDF format for all books (#160)
- Replace usage of os.path and path.py with pathlib.Path (#195)
- Finalize ZIM metadata title translations and multilingual detection (#229)
- Replaced magic number with named constant and clarified comment regarding book ID URL rules (#196)
- Replace print and pp calls with logger (#192)
- Update to Python3.13
- Update python-scraperlib to 5.1.1 and dependencies (#188)
- Rename Book DB table fields (#199)
- Update multi-resolution favicons (#165)
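
For readers unfamiliar with the `os.path` to `pathlib.Path` change listed above, the following sketch shows the general pattern using only the standard library. The folder and file names are hypothetical and are not taken from the scraper.

```python
import os.path
from pathlib import Path

# Hypothetical folder and file names, for illustration only.
base = "output"

# Old style: string-based path joining.
old_style = os.path.join(base, "books", "1234.epub")

# New style: Path objects composed with the "/" operator.
new_style = Path(base) / "books" / "1234.epub"

# Both styles describe the same location.
same_path = Path(old_style) == new_style

# Path exposes name parts as attributes instead of os.path.splitext().
stem = new_style.stem
suffix = new_style.suffix
```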
Fixed
2.1.1
2.1.0
Changed
- Fixed regression with broken filters on multiple-languages ZIM (#175)
- Fixed `Name` metadata that was incorrectly including period (#177)
- Fixed `Language` metadata (and filename) for multilang ZIMs (#174)
- Using zimscraperlib 2.1.0
- Using localized Title and Description metadata (#148)
- Fixed regression with epub files stored as `application/zip` (#181)
- Adopt Python bootstrap conventions, especially migration to hatch instead of setuptools and GitHub CI Workflows adaptations (#190)
- Removed inline JavaScript in HTML files (#145)
Fixed
- Support single quotes in author names (#162)
- Migrated to another Gutenberg server (#187)
- Removed useless file languages_06_2018 (#180)
Removed
2.0.0
Added
- Progress report using `--stats-filename`
Changed
- Updated dependencies, including zimscraperlib (2.0)
- Now creating no-namespace ZIM with Illustration
- Fixed/reduced sqlite timeouts
- Better handling of rsync'd list of URLs
- RDF files are not extracted to disk anymore (faster on selections)
- Remove all Urls from DB before processing rsync'd ones
- Fixed --concurrency short flag (now `-c`)
- Docker image now uses python3.11
- DB doesn't use a separate Format table anymore
Removed
- Dependency to zimwriterfs binary.
- `-r/--rdf-folder` flag: rdf not extracted to disk anymore
- `--export`: HTML files not written to disk first anymore
- `--dev`: idem
- Binaries from docker images: jpegoptim, pngquant, gifsicle, zip, curl, p7zip