From ece23465811778995710c028834816026d544127 Mon Sep 17 00:00:00 2001
From: "Ryan P. McKinnon" <15917743+mrhoribu@users.noreply.github.com>
Date: Tue, 12 May 2026 10:08:23 -0400
Subject: [PATCH] fix(repository): v2.73 pre-upload non-ASCII validation for scripts and mapdb
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Summary

Adds a pre-upload check to `;repo upload` and `;repo upload-mapdb` that rejects files containing non-ASCII characters and prints a per-character report identifying every occurrence.

Smart quotes, em dashes, box-drawing characters, and similar non-ASCII content sometimes slip into scripts via editor auto-correct or copy-paste from web sources. The server-side tooling and many downstream consumers assume ASCII, so catching this client-side before upload saves a round trip and surfaces a clear, actionable error.

## Behavior

When a file is clean, upload proceeds exactly as before.

When non-ASCII characters are present, the upload is aborted and a report is printed showing line, column, the offending character, and its codepoint:

```
[repository: error: non-ASCII characters detected in /path/to/script.lic]
[repository: found 1 non-ASCII character; upload aborted]
[repository: line 88, col 73: ─ (U+2500)]
```

Multi-byte UTF-8 characters report as a single entry (one row per character, not per byte). Lines that aren't valid UTF-8 fall back to a byte-wise scan so stray bytes are still reported rather than crashing the check.

Applies to both:

- `upload_file`: checked after `find_file` resolves the path
- `upload_mapdb`: checked after `Map.save_json` writes the file

## Implementation

Two private helpers in the existing `Uploader`-style class:

- `non_ascii_violations(file_path)`: returns `Array` with `{ line:, col:, char:, codepoint: }` entries. Reads line-by-line to keep memory bounded for large map JSON files.
- `check_non_ascii(file_path)`: runs the scan, echoes the report when violations exist, returns `true` (proceed) or `false` (abort).

No new dependencies. No changes to existing upload behavior on clean files. No changes to the wire protocol or server contract.

## Scope

- `upload_file`: added one guard line
- `upload_mapdb`: added one guard line
- Two new private helpers in the same class
- Version bump 2.72 → 2.73 + changelog entry

Total: +50 / -1 lines.

## Testing

Verified against a real failure case (`gauntletcharger.lic` with a box-drawing character, U+2500, at line 88). Also tested locally with:

- Clean ASCII files (passes through, no output)
- Single-character violations (smart quote, em dash, box-drawing)
- Multi-line violations
- Invalid UTF-8 bytes (falls back gracefully, still reports)
- The 0x7F / 0x80 ASCII boundary

Happy to add an RSpec file if there's a fixture pattern preferred for this script. The helpers are pure functions of file contents and would be straightforward to spec.
---
 scripts/repository.lic | 51 +++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 50 insertions(+), 1 deletion(-)

diff --git a/scripts/repository.lic b/scripts/repository.lic
index 6c5cc62c4..f7c4782f9 100644
--- a/scripts/repository.lic
+++ b/scripts/repository.lic
@@ -10,9 +10,12 @@
            game: any
            tags: core
        required: Lich > 5.0.1
-        version: 2.72
+        version: 2.73
 
   changelog:
+    2.73 (2026-05-12):
+      Validate scripts and map databases for non-ASCII characters before upload
+      Aborts upload with a per-character report (line, column, codepoint) when found
     2.72 (2026-04-11):
       Fix to have DRT mapdb auto-update
     2.71 (2026-02-02):
@@ -1048,6 +1051,8 @@ module RepositoryTillmen
       filename, file_path = find_file(file)
       return false unless filename && file_path
 
+      return false unless check_non_ascii(file_path)
+
       md5sum = Digest::MD5.file(file_path).to_s
       comments = File.open(file_path, 'rb') { |f| CommentParser.extract_comments(f.read(20_000)) }
 
@@ -1087,6 +1092,8 @@ module RepositoryTillmen
         return false
       end
 
+      return false unless check_non_ascii(filename)
+
       author = @options.author || Char.name
       password = @options.password || Settings["password:#{author.downcase.gsub(/[^a-z]/, '')}"]
 
@@ -1264,6 +1271,48 @@ module RepositoryTillmen
       end
     end
 
+    # Scans a file for non-ASCII characters. Returns an Array of Hashes:
+    #   { line: Integer, col: Integer, char: String, codepoint: Integer }
+    # Reads line-by-line to keep memory bounded for large files (e.g. map JSON).
+    # Lines that aren't valid UTF-8 fall back to a byte-wise scan so the bad
+    # bytes are still reported rather than crashing the check.
+    def non_ascii_violations(file_path)
+      violations = []
+      File.open(file_path, 'rb') do |f|
+        f.each_line.with_index(1) do |line, line_no|
+          decoded = line.dup.force_encoding(Encoding::UTF_8)
+          if decoded.valid_encoding?
+            decoded.each_char.with_index(1) do |ch, col|
+              violations << { line: line_no, col: col, char: ch, codepoint: ch.ord } if ch.ord > 127
+            end
+          else
+            col = 0
+            line.each_byte do |b|
+              col += 1
+              violations << { line: line_no, col: col, char: b.chr, codepoint: b } if b > 127
+            end
+          end
+        end
+      end
+      violations
+    end
+
+    # Reports any non-ASCII characters found in file_path. Returns true when the
+    # file is clean (safe to upload), false when violations were found (caller
+    # should abort).
+    def check_non_ascii(file_path)
+      violations = non_ascii_violations(file_path)
+      return true if violations.empty?
+
+      noun = violations.size == 1 ? 'character' : 'characters'
+      echo "error: non-ASCII characters detected in #{file_path}"
+      echo "found #{violations.size} non-ASCII #{noun}; upload aborted"
+      violations.each do |v|
+        echo format('  line %d, col %d: %s (U+%04X)', v[:line], v[:col], v[:char], v[:codepoint])
+      end
+      false
+    end
+
     def determine_default_game
       if XMLData.game =~ /^GS/
         'gs'