2 changes: 1 addition & 1 deletion _config.yml
Original file line number Diff line number Diff line change
@@ -73,7 +73,7 @@ organization-logo-nav: https://www.lib.uidaho.edu/collectionbuilder/cdil-white.p
# provide a relative path in this repository or full url
lib-assets: /assets/lib
# ignore stuff
exclude: [docs/, Rakefile, rakelib/, README.md, LICENSE, CITATION.cff, CODE_OF_CONDUCT.md, CONTRIBUTING.md, SECURITY.md]
exclude: [docs/, Rakefile, rakelib/, README.md, LICENSE, CITATION.cff, CODE_OF_CONDUCT.md, CONTRIBUTING.md, SECURITY.md, offline_site/]
# compress CSS output
sass:
style: compressed
4 changes: 3 additions & 1 deletion _includes/head/head.html
@@ -29,8 +29,10 @@
{% comment %}
Meta tags and analytics are added during production build ONLY
{%- endcomment -%}
{% if jekyll.environment == "production" %}
{% if jekyll.environment == "production" or jekyll.environment == "offline" %}
{% if layout.item-meta %}{% include head/item-meta.html %}{% else %}{% include head/page-meta.html %}{% endif %}
{% endif %}
{% if jekyll.environment == "production" %}
{% include head/analytics.html %}
{%- endif -%}
{% if site.noindex == true or page.noindex == true or layout.noindex == true %}<meta name="robots" content="noindex" />{% endif %}
17 changes: 16 additions & 1 deletion _includes/js/table-js.html
@@ -1,6 +1,15 @@
{% comment %}

Javascript for DataTables interactive table feature, https://datatables.net/
Used with the "data.html" layout.
By default it loads table data from "assets/js/metadata.min.json" for better performance with large collections.
For offline build, it provides an alternative basic table version.

{%- endcomment -%}
{% assign fcount = site.data.config-table | size %}
<!-- load DataTables with jquery bundled -->
<script type="text/javascript" language="javascript" src="{{ site.lib-assets | default: '/assets/lib' | relative_url }}/datatables/datatables.min.js"></script>
<script src="{{ site.lib-assets | default: '/assets/lib' | relative_url }}/datatables/datatables.min.js"></script>
{% unless jekyll.environment == "offline" or page.simple-table == true %}
<script>
/* use jquery to initialize DataTables and load collection data */
$(document).ready( function () {
@@ -30,3 +39,9 @@
});
});
</script>
{% else %}
<script>
/* initialize basic DataTables */
let table = new DataTable('#item-table');
</script>
{% endunless %}
15 changes: 14 additions & 1 deletion _layouts/data.html
@@ -15,9 +15,22 @@
{% for f in fields %}
<th scope="col">{{ f }}</th>
{% endfor %}
<th>Link</th>
{% unless jekyll.environment == "offline" %}<th scope="col">Link</th>{% endunless %}
</tr>
</thead>
{% if jekyll.environment == "offline" or page.simple-table == true %}
<tbody>
{% if site.data.theme.data-child-objects == true %}
{%- assign items = site.data[site.metadata] | where_exp: 'item','item.objectid' -%}
{% else %}
{%- assign items = site.data[site.metadata] | where_exp: 'item','item.objectid and item.parentid == nil' -%}
{% endif %}
{%- assign fields = site.data.config-table | map: 'field' -%}
{%- for item in items -%}
<tr>{% for f in fields %}<td>{% if forloop.first %}<a href="{{ '/items/' | relative_url }}{% if item.parentid %}{{ item.parentid }}.html#{{ item.objectid }}{% else %}{{ item.objectid }}.html{% endif %}">{{ item[f] }}</a>{% else %}{{ item[f] }}{% endif %}</td>{% endfor %}</tr>
{%- endfor -%}
</tbody>
{% endif %}
</table>
</div>

35 changes: 35 additions & 0 deletions docs/rake_tasks/build_offline.md
@@ -0,0 +1,35 @@
# build_offline

`rake build_offline` generates your CB site, downloads all external media, and then rewrites all internal links to create a copy of your project that can be used fully offline in the local filesystem.

The fully static offline version is intended to serve as a digital preservation artifact that maintains the functionality of a project in a minimal environment.
It can also be useful for sharing the site (via a thumb drive) in a location with no internet, or for content that needs to remain offline for security or privacy reasons.

The task will:

1. Complete a fresh build of the site (using the "offline" JEKYLL_ENV rather than "production", which allows us to swap out some parts of the site specific to building offline)
2. Copy the build to the output directory.
3. Download external media (images, PDFs, audio) referenced in your metadata CSV to the "objects" folder.
4. Rewrite all links in the files to relative file paths so that pages can load and link correctly from the local filesystem.

When the task completes, you can browse the offline version by clicking "offline_site/index.html" to open it in your web browser.
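
As a rough illustration of step 4, the sketch below shows how a root-relative link could be turned into a relative file path based on how many folders the page sits below the offline root. `relative_link` is a hypothetical helper written only for this example; the task itself applies the same idea through several regular expressions.

```ruby
# Hypothetical helper (not part of the task): convert a root-relative link
# into a path relative to a page that sits `depth` folders below the offline root.
def relative_link(link, depth)
  # leave protocol-relative (//host/...) and non-root-relative links untouched
  return link unless link.start_with?('/') && !link.start_with?('//')

  local = link.delete_prefix('/')
  local = 'index.html' if local.empty? # a bare "/" points at the home page
  "#{'../' * depth}#{local}"
end

puts relative_link('/assets/css/cb.css', 0) # page at the root, e.g. index.html
puts relative_link('/assets/css/cb.css', 1) # page one folder down, e.g. items/demo_001.html
puts relative_link('/', 1)                  # link back to the site root
```

A page in the "items" folder therefore reaches shared assets through `../`, which is what lets the copy work when opened directly from disk.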

| option | description | default value |
| --- | --- | --- |
| download_external | attempt to download external media linked in the project metadata (the `object_location`, `image_small`, and `image_thumb` fields), true/false | true |
| output_dir | directory name for the offline output | "offline_site" |
| skip_rewrite | local path of a directory to skip when rewriting links, useful for external libraries that should not be modified | "assets/lib" |

Pass options as rake arguments:

`rake build_offline[false,"my_offline_copy","assets/lib"]`

Note: if you change the default output_dir, remember to add it to the `exclude` option in "_config.yml" so it is not included in your future site builds!
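
Rake passes bracketed arguments to the task as plain strings. A minimal sketch of how the `download_external` value is interpreted, mirroring the check in the task (`download_external?` is a hypothetical name used only here):

```ruby
# Sketch only: the task compares the argument text against "false",
# so any other value (including a typo) leaves downloading switched on.
def download_external?(value)
  value.to_s.strip.downcase != 'false'
end

puts download_external?('false')   # downloads off
puts download_external?(' False ') # downloads off (case and spaces are ignored)
puts download_external?('no')      # downloads stay on: only "false" disables them
```

In practice this means `rake build_offline[false]` is the only spelling that skips the download step.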

## Limitations

- Streaming video (YouTube, Vimeo, etc.) is not downloaded and will not play offline. Item pages for video objects will display without the video.
- Map tiles (Leaflet/OpenStreetMap) require internet access. The map page will show markers but no background tiles when offline.
- External images not in metadata (e.g., organization logos in the banner) remain as external links and require internet to display. Please manually adjust those images in your project.
- The rewrite script is somewhat idiosyncratic to CB projects, so it may not work correctly for other websites or highly customized projects.
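
To preview which metadata URLs fall under the first limitation, the sketch below repeats the task's eligibility checks in simplified form. The constant and method names are shortened for this example and the URLs are made up; treat it as an approximation of the task's behavior rather than a copy of it.

```ruby
require 'uri'

# Simplified copies of the task's lists, repeated here for illustration only.
MEDIA_EXTENSIONS = %w[.jpg .jpeg .png .gif .tif .tiff .pdf .mp3 .wav .ogg .m4a].freeze
SKIP_DOMAINS = %w[youtube.com youtu.be vimeo.com soundcloud.com].freeze

# Hypothetical helper: true when a metadata URL would be downloaded.
def downloadable?(url)
  return false unless url.start_with?('http')
  return false if SKIP_DOMAINS.any? { |domain| url.include?(domain) }

  MEDIA_EXTENSIONS.include?(File.extname(URI.parse(url).path).downcase)
rescue URI::InvalidURIError
  false
end

puts downloadable?('https://example.org/media/photo_001.JPG') # image: eligible
puts downloadable?('https://www.youtube.com/watch?v=abc123')  # streaming platform: skipped
puts downloadable?('https://example.org/media/clip.mp4')      # video extension: not in the list
```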

264 changes: 264 additions & 0 deletions rakelib/build_offline.rake
@@ -0,0 +1,264 @@
###############################################################################
# TASK: build_offline
#
# build a CollectionBuilder site and replace links for offline use
#
# generates the Jekyll site, downloads external media (images, pdfs),
# and rewrites all internal links so the site works from the local filesystem without a server.
#
# options (passed as rake arguments, e.g. rake build_offline[true,offline_site,assets/lib]):
# download_external - download external media linked in metadata, true/false (default: true)
# output_dir - directory name for the offline output (default: "offline_site")
# skip_rewrite - local path of directory to skip rewriting, useful for external libraries that should not be modified (default: "assets/lib")
#
# see docs/rake_tasks/build_offline.md for full documentation
###############################################################################

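# csv and fileutils are required explicitly because the task calls CSV.read and FileUtils below
require 'csv'
require 'fileutils'
require 'net/http'
require 'open-uri'
require 'pathname'
require 'uri'
require 'yaml'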

# file types to download for offline use (images, PDFs, and audio; streaming video is skipped)
OFFLINE_MEDIA_EXTENSIONS = %w[.jpg .jpeg .png .gif .tif .tiff .pdf .mp3 .wav .ogg .m4a].freeze

# streaming/video platforms to skip when downloading external media
OFFLINE_SKIP_DOMAINS = %w[youtube.com youtu.be vimeo.com soundcloud.com].freeze

# check if a URL is from a platform that should be skipped for downloading
def offline_skip_url?(url)
OFFLINE_SKIP_DOMAINS.any? { |domain| url.include?(domain) }
end

# check whether the URL points to a file type eligible for offline download
def offline_downloadable?(url)
ext = File.extname(URI.parse(url).path).downcase
OFFLINE_MEDIA_EXTENSIONS.include?(ext)
rescue URI::InvalidURIError
false
end

# download a file from url and save to dest_path; returns true on success
def offline_download(url, dest_path)
puts "Downloading: #{url}"
URI.open(url, 'rb', open_timeout: 30, read_timeout: 60) do |remote|
IO.copy_stream(remote, dest_path)
end
puts " -> #{dest_path}"
true
rescue OpenURI::HTTPError, SocketError, Errno::ECONNREFUSED, Errno::ETIMEDOUT,
Net::OpenTimeout, Net::ReadTimeout, RuntimeError => e
puts " -> download failed: #{e.message}"
FileUtils.rm_f(dest_path)
false
end

# rewrite all internal links in a file's content for local filesystem use.
# depth - number of directory levels below the offline root (0 = root-level files)
# site_url - absolute URL prefix from Jekyll config (url + baseurl), used in data files
# url_map - hash of { external_url => root_relative_local_path } for downloaded media
def offline_rewrite_links(content, depth, site_url, url_map)
prefix = '../' * depth

# 1. replace downloaded external media URLs with relative local paths
url_map.each do |external_url, local_path|
content = content.gsub(external_url, "#{prefix}#{local_path.delete_prefix('/')}")
end

# 2. replace absolute site URLs (Jekyll url + baseurl, or localhost:4000 when url is unset)
# these appear in generated data files and occasionally in HTML meta tags
unless site_url.empty?
escaped = Regexp.escape(site_url)
content = content.gsub(%r{#{escaped}(/[^\s"'<>()\[\]]+)}) do
"#{prefix}#{$1.delete_prefix('/')}"
end
# bare site root URL with no following path
content = content.gsub(%r{#{escaped}/?(?=[\s"'<>()\[\]])}) do
"#{prefix}index.html"
end
end

# 3. rewrite root-relative paths in HTML attribute values
# covers href, src, action, content (meta), xlink:href (SVG), data-src (lazy-load)
# negative lookahead (?!\/) prevents rewriting protocol-relative URLs (//)
content = content.gsub(/((?:href|src|action|content|xlink:href|data-src)=["'])(\/(?!\/)[^"']*)/) do
local = $2.delete_prefix('/')
local = 'index.html' if local.empty?
"#{$1}#{prefix}#{local}"
end

# 4. rewrite root-relative paths in CSS url() references (inline styles and <style> blocks)
content = content.gsub(/url\((['"]?)(\/(?!\/)[^'")\s]+)(['"]?)\)/) do
"url(#{$1}#{prefix}#{$2.delete_prefix('/')}#{$3})"
end

# 5. rewrite root-relative paths in JS/JSON string literals (single and double quoted)
# handles inline data arrays like: "img": "/objects/thumbs/item_th.jpg"
content = content.gsub(/(["'])(\/(?!\/)[^"'\r\n]+)(["'])/) do
next "#{$1}#{$2}#{$3}" unless $1 == $3 # skip mismatched quotes (not a plain string)
"#{$1}#{prefix}#{$2.delete_prefix('/')}#{$3}"
end

# 6. rewrite root-relative paths in JS template literals (backtick strings)
# handles dynamic hrefs like: `/items/${obj.id}.html`
content = content.gsub(/`(\/(?!\/)[^`]+)`/) do
"`#{prefix}#{$1.delete_prefix('/')}`"
end

content
end

desc 'Build jekyll site and rewrite links for offline use'
task :build_offline, [:download_external, :output_dir, :skip_rewrite] do |_t, args|
args.with_defaults(
download_external: 'true',
output_dir: 'offline_site',
skip_rewrite: 'assets/lib'
)

download_external = args.download_external.to_s.strip.downcase != 'false'
offline_dir = args.output_dir
skip_rewrite_dir = args.skip_rewrite.to_s.strip

# build jekyll site with offline environment
ENV['JEKYLL_ENV'] = 'offline'
system('bundle', 'exec', 'jekyll', 'build') or abort 'Jekyll build failed'

jekyll_site = '_site'

# load site configuration for url, baseurl, and metadata filename
config = YAML.load_file('_config.yml')
baseurl = (config['baseurl'] || '').strip.chomp('/')
site_url_val = (config['url'] || '').strip.chomp('/')
# when url is blank, Jekyll uses http://localhost:4000 for absolute URLs in generated data files
site_url = site_url_val.empty? ? "http://localhost:4000#{baseurl}" : "#{site_url_val}#{baseurl}"
metadata_name = config['metadata']

# recreate output directory for a clean build
if Dir.exist?(offline_dir)
puts "Removing existing '#{offline_dir}' for a clean build..."
FileUtils.rm_rf(offline_dir)
end
FileUtils.mkdir_p(offline_dir)

# copy built site contents into the offline directory (contents only, not _site subfolder)
puts "Copying '#{jekyll_site}' to '#{offline_dir}'..."
Dir.glob(File.join(jekyll_site, '{*,.*}')).each do |entry|
next if ['.', '..'].include?(File.basename(entry))
FileUtils.cp_r(entry, offline_dir)
end

# track { external_url => root_relative_local_path } for all downloaded files
url_map = {}

if download_external
if metadata_name.nil? || metadata_name.strip.empty?
puts "No 'metadata' key found in _config.yml, skipping external media download."
else
metadata_file = File.join('_data', "#{metadata_name}.csv")
unless File.exist?(metadata_file)
puts "Metadata file '#{metadata_file}' not found, skipping external media download."
else
puts "Scanning '#{metadata_file}' for external media to download..."
csv_data = CSV.read(metadata_file, headers: true, encoding: 'utf-8')

# metadata field => objects/ subdirectory for downloaded files
media_field_dirs = {
'object_location' => 'objects',
'image_small' => File.join('objects', 'small'),
'image_thumb' => File.join('objects', 'thumbs')
}

media_field_dirs.each do |field, subdir|
next unless csv_data.headers.include?(field)

dest_dir = File.join(offline_dir, subdir)
FileUtils.mkdir_p(dest_dir)

csv_data.each do |row|
url = row[field]
next if url.nil? || url.strip.empty?
next unless url.start_with?('http')
next if offline_skip_url?(url)
next unless offline_downloadable?(url)
next if url_map.key?(url) # already queued from another field

begin
filename = File.basename(URI.parse(url).path)
rescue URI::InvalidURIError
puts " Skipping invalid URL: #{url}"
next
end

dest_path = File.join(dest_dir, filename)
# root_relative_path uses forward slashes regardless of OS
root_relative = "/#{[subdir.tr(File::SEPARATOR, '/'), filename].join('/')}"
url_map[url] = root_relative

offline_download(url, dest_path) unless File.exist?(dest_path)
end
end
end
end
end

# rewrite all links in html and js files for local filesystem use
puts "Rewriting links for offline use..."
updated = 0
Dir.glob(File.join(offline_dir, '**', '*.{html,js}')).each do |filepath|
rel = Pathname.new(filepath).relative_path_from(Pathname.new(offline_dir)).to_s
# skip files inside the skip_rewrite directory (e.g. third-party libraries)
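# match whole path segments so that "assets/lib" does not also skip e.g. "assets/library"
next if !skip_rewrite_dir.empty? && rel.start_with?("#{skip_rewrite_dir.chomp('/')}/")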
depth = rel.count('/')
content = File.read(filepath, encoding: 'utf-8', invalid: :replace, undef: :replace)
new_content = offline_rewrite_links(content, depth, site_url, url_map)
if new_content != content
File.write(filepath, new_content, encoding: 'utf-8')
updated += 1
end
end
puts " #{updated} file(s) updated.#{skip_rewrite_dir.empty? ? '' : " (skipped '#{skip_rewrite_dir}')"}"

# inline SVG icon sprite: browsers block loading external SVG files in local file:// mode,
# so we embed the full sprite as a hidden <svg> in each HTML page and rewrite all
# href="PATH/cb-icons.svg#id" references to fragment-only href="#id".
# this handles both static <use> elements in HTML and dynamically-built icon strings in JS.
puts "Inlining SVG icon sprite for offline use..."
svg_sprite_path = File.join(offline_dir, 'assets', 'css', 'cb-icons.svg')
if File.exist?(svg_sprite_path)
sprite_svg = File.read(svg_sprite_path, encoding: 'utf-8')
# strip XML declaration — not valid inside HTML documents
sprite_svg = sprite_svg.sub(/\A<\?xml[^>]*\?>\s*/, '')
# mark the injected sprite as hidden; it is a symbol library, not visible content
inline_sprite = sprite_svg.sub(/<svg\b/, '<svg style="display:none" aria-hidden="true"')

inlined = 0
Dir.glob(File.join(offline_dir, '**', '*.html')).each do |filepath|
content = File.read(filepath, encoding: 'utf-8', invalid: :replace, undef: :replace)
new_content = content.dup

# inject the sprite right after the opening <body> tag so symbols are available
# to all <use> references in the document (and dynamically-created ones via JS)
new_content = new_content.sub(/(<body\b[^>]*>)/, "\\1\n#{inline_sprite}")

# rewrite all href references that point to the external sprite file to use
# fragment-only hrefs (e.g. href="#icon-image"), which reference the now-inlined symbols.
# the path prefix varies by directory depth after link rewriting, so we match
# any characters up to "cb-icons.svg#" rather than a fixed path.
new_content = new_content.gsub(/(href=["'])[^"']*cb-icons\.svg#/, '\1#')

if new_content != content
File.write(filepath, new_content, encoding: 'utf-8')
inlined += 1
end
end
puts " #{inlined} file(s) updated with inline SVG sprite."
else
puts " Warning: '#{svg_sprite_path}' not found, skipping SVG icon inlining."
end

puts "\nDone! Offline site created in '#{offline_dir}'."
puts "Open '#{File.join(offline_dir, 'index.html')}' in a browser to browse the collection."
end

3 changes: 3 additions & 0 deletions rakelib/download_by_csv.rake
@@ -2,6 +2,9 @@
# TASK: download_by_csv
#
# read csv, download using wget
#
# Requires wget installed!
#
###############################################################################

desc "download objects and rename using csv"
3 changes: 3 additions & 0 deletions rakelib/generate_derivatives.rake
@@ -2,6 +2,9 @@
# TASK: generate_derivatives
#
# create small and thumb images for image and pdf files in the 'objects' folder
#
# Requires ImageMagick and Ghostscript installed!
#
###############################################################################

require 'image_optim' unless Gem.win_platform?