Releases: openzim/gutenberg

3.0.1

24 Nov 16:41
b116866

Added

  • Add CLI flag to customize ZIM Name (#340)

Fixed

  • Add missing flags in offliner definition (#339)

3.0.0

28 Oct 13:50
8dfc5f4

Breaking

  • Remove any optimization logic + use of S3 cache, dropping --optimization-cache and --use-any-optimized-version flags (#300)
  • Move to another default mirror + add CLI flag to select mirror to use, dropping --rdf-url flag and adding --mirror_url CLI flag (#301)
  • Reengineer the scraper (#312)
    • Source list of books from CSV at https://gutenberg.pglaf.org/cache/epub/feeds/pg_catalog.csv.gz for quicker startup
    • Stop downloading the big RDF archive and download only the individual RDF files needed (saves a lot of time when only a few books are requested; the penalty appears negligible when many books are needed)
    • Get rid of the SQLite database used to persist data across runs: too much maintenance effort and filesystem impact for limited benefit
    • Get rid of the option to generate one ZIM per language; the scraper now always produces a single ZIM
    • Data is now directly transferred to the ZIM, without touching the filesystem
    • Get rid of the "steps" approach, not used (anymore?) in production and difficult to maintain
    • Many CLI options removed: --use-any-optimized-version, --zim, --download, --parse, --prepare, -m/--one-language-one-zim, --dlc, -d/--dl-folder, -e/--static-folder
  • Drop support for Gutenberg bookshelves and add support for LibraryOfCongress (#265)
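
As an illustration of the CSV-based book listing introduced in #312, here is a minimal sketch of reading a gzip-compressed catalog and filtering it by language. It is not the scraper's actual code: the function name `parse_catalog`, the column names (`Text#`, `Title`, `Language`), and the `;` separator for multi-language books are assumptions made for this example, and the data is synthetic so that nothing is downloaded.

```python
import csv
import gzip
import io


def parse_catalog(gz_bytes, languages=None):
    """Return catalog rows from gzip-compressed CSV bytes.

    `languages` is an optional set of language codes; when given, only
    rows listing at least one of those codes are kept.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: the "Language" column holds one or more codes
        # separated by ";" (for example "en; fr").
        codes = {code.strip() for code in row.get("Language", "").split(";")}
        if languages is None or codes & languages:
            rows.append(row)
    return rows


# Synthetic stand-in for the real catalog file (hypothetical content).
sample = gzip.compress(
    b"Text#,Title,Language\n1,Book A,en\n2,Book B,fr\n3,Book C,en; fr\n"
)
french_ids = [row["Text#"] for row in parse_catalog(sample, {"fr"})]
print(french_ids)
```

Filtering the catalog first means that only the RDF files of the selected books would need to be fetched afterwards, which is the startup gain the notes above describe.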

Added

  • Add standard --output option to control where ZIM files are written (#314)
  • Add ability to generate smaller selection (#184)
  • Log failing book ID in case of fatal error (#333)

Changed

  • Finalize implementation of scraper progress (#289)
  • Move to another default mirror + add CLI flag to select mirror to use (#301)
  • Split build logic in Dockerfile to separate dependencies layer from scraper code layer (#302)
  • Prepare structure for TranslateWiki (#312)
  • Remove « .html » extension (#166)
  • Configure libzim verbosity based on --debug flag (#326)

Fixed

  • Stop ignoring HTML illustrations containing cover in their name (#270)
  • Fix JS/JSON files generation (#297, #298)
  • Fix navigation to bookshelves with special characters (#305)
  • Fix opening of bookshelves with special characters (#306)
  • Fix internationalization of the "Copyrighted" license label (#253)
  • Properly compute (including sorting) ZIM Language items + allow to override with --zim-languages (#323)
  • Fix support for ZIMs without full-text index (#326)

2.2.0

06 Jun 16:06
9c7f2ef

Added

  • Add support for --debug flag to output debug logs
  • Add support for -L long_description flags
  • Add request timeout for util.py (#197)
  • Add Booklanguage DB to support multi-languages books (#218)
  • Add RTL support to UI (#248)
  • Add language filter to combobox for requested languages (#249)

Changed

  • Simplify Gutenberg scraping (no more rsync, no more fallback URLs / filenames) (#97)
  • Prefer EPUB 3 to EPUB (#235)
  • Do not force the presence of PDF format for all books (#160)
  • Replace usage of os.path and path.py with pathlib.Path (#195)
  • Finalize ZIM metadata title translations and multilingual detection (#229)
  • Replace magic number with named constant and clarify comment regarding book ID URL rules (#196)
  • Replace print and pp calls with logger (#192)
  • Update to Python 3.13
  • Update python-scraperlib to 5.1.1 and dependencies (#188)
  • Rename Book DB table fields (#199)
  • Update multi-resolution favicons (#165)

Fixed

  • Fix regression on missing HTML content (#219)
  • Simplify the logger name (use gutenberg2zim instead of gutenberg2zim.constants) (#206)
  • Add retry logic on book downloads (#254)
  • Fix UI and navigation glitches on bookshelves (#262)
  • Remove dependencies on binaries + buggy pngquant (#257)

2.1.1

17 Jan 13:45
310731c

Added

  • Publisher ZIM metadata can now be customized at CLI (#210)

Changed

  • Publisher ZIM metadata default value is changed to openZIM instead of Kiwix (#210)

Fixed

  • Do not fail if temporary directory already exists (#207)
  • Typo in Scraper ZIM metadata (#212)
  • Adapt to hatchling v1.19.0 which mandates packages setting (#211)

2.1.0

18 Aug 15:22
1750b66

Changed

  • Fixed regression with broken filters on multiple-languages ZIM (#175)
  • Fixed Name metadata that was incorrectly including period (#177)
  • Fixed Language metadata (and filename) for multilang ZIMs (#174)
  • Using zimscraperlib 2.1.0
  • Using localized Title and Description metadata (#148)
  • Fixed regression with epub files stored as application/zip (#181)
  • Adopt Python bootstrap conventions, especially migration to hatch instead of setuptools and GitHub CI Workflows adaptations (#190)
  • Removed inline Javascript in HTML files (#145)

Fixed

  • Support single quotes in author names (#162)
  • Migrated to another Gutenberg server (#187)
  • Removed useless file languages_06_2018 (#180)

Removed

  • Removed Datatables JS code from repository, fetch online now (#116)
  • Dropped Python 2 support (#191)

2.0.0

21 Feb 08:46

Added

  • Progress report using --stats-filename

Changed

  • Updated dependencies, including zimscraperlib (2.0)
  • Now creating no-namespace ZIM with Illustration
  • Fixed/reduced sqlite timeouts
  • Better handling of rsync'd list of URLs
  • RDF files are not extracted to disk anymore (faster on selections)
  • Remove all Urls from DB before processing rsync'd ones
  • Fixed --concurrency short flag (now -c)
  • Docker image now uses python3.11
  • DB doesn't use a separate Format table anymore

Removed

  • Dependency to zimwriterfs binary.
  • -r/--rdf-folder flag: rdf not extracted to disk anymore
  • --export: HTML files not written to disk first anymore
  • --dev: idem
  • Binaries from docker images: jpegoptim, pngquant, gifsicle, zip, curl, p7zip