feat(legacy-archive): kickers, featured images, slugs, shortlinks #148
Merged
…ugs, shortlinks

Round 5 of the legacy-archive cleanup, driven by the completeness audit at /tmp/legacy-import-logs/wp-completeness-audit.md.

* kickers + subdecks: 4732 articles got correct kickers from poly-online `type_db` (2303) and WP `Kicker` postmeta (1357 fills + 1072 overwrites of generic "Editorial/Opinion"), plus 768 poly-online subdecks from `blurb_db`.
* featured images: 1531 WP articles got a real `featured_image_id`, caption, and photographer attribution from `Photo`/`PhotoCaption`/`PhotoByline`/`Photographer` postmeta.
* WP slug regen: 2921 run-together slugs (e.g. `midnpartakeinmarathon`) rebuilt from source `post_name` (e.g. `midn-partake-in-marathon`). Old slugs are saved in a new `previous_slug` column so middleware can 301 to the new URL; a new `articles_previous_slug_idx` index keeps that lookup cheap.
* shortlinks: new `legacy_shortlinks` lookup table populated from the `pluginSL_shorturl` plugin. 5014 of 12,872 source rows resolve to a real destination (4091 polymer URLs, 871 archive media files, 50 external, 2 mirror fallbacks). Middleware checks the table after the hand-curated override map. Regex broadened from 5-digit to 5-char alphanumeric.

Backfill scripts live under scripts/legacy-import/ and run idempotently (dry run by default; --write to commit). All four were used to update the production DB directly via the SSH tunnel before this PR.
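The "idempotent, dry run by default, --write to commit" pattern, applied to the slug regen, might look roughly like the sketch below. It is not the code under scripts/legacy-import/: the row shapes, `buildSlug`, `planSlugChanges`, `runBackfill`, and the `apply` callback are hypothetical, and the slug shape (date prefix + `post_name`) is inferred from the `2011-11-30-midn-partake-in-marathon` example in the PR body.

```typescript
// Hypothetical sketch of an idempotent slug-regen backfill: dry run by
// default, --write to commit. Names and row shapes are assumptions.

interface ArticleRow {
  id: number;
  wp_id: number;
  slug: string;
  previous_slug: string | null;
}

interface WpSource {
  post_date: string; // assumed "YYYY-MM-DD HH:MM:SS", as WordPress stores it
  post_name: string; // already hyphenated, e.g. "midn-partake-in-marathon"
}

interface SlugChange {
  id: number;
  slug: string;
  previous_slug: string;
}

// Assumed slug shape: date prefix + source post_name.
function buildSlug(src: WpSource): string {
  return `${src.post_date.slice(0, 10)}-${src.post_name}`;
}

// Rows whose slug already equals the rebuilt one are skipped, so re-running
// after a successful write plans nothing and leaves previous_slug as it was.
function planSlugChanges(
  rows: readonly ArticleRow[],
  sources: ReadonlyMap<number, WpSource>, // keyed by wp_id
): SlugChange[] {
  const changes: SlugChange[] = [];
  for (const row of rows) {
    const src = sources.get(row.wp_id);
    if (src === undefined) continue; // no WP source row: leave untouched
    const next = buildSlug(src);
    if (next === row.slug) continue; // already fixed
    changes.push({ id: row.id, slug: next, previous_slug: row.slug });
  }
  return changes;
}

// Without --write the plan is only reported; with it each change is applied.
function runBackfill(
  argv: readonly string[],
  changes: readonly SlugChange[],
  apply: (change: SlugChange) => void,
): string {
  const write = argv.includes("--write");
  if (write) changes.forEach(apply);
  return `${write ? "updated" : "would update"} ${changes.length} slug(s)`;
}
```

Keeping the planning step pure makes the dry run and the real run share one code path; only the `apply` callback (a DB update in practice) differs.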
… strip

CodeQL flagged a single `replace(/<[^>]+>/g, '')` as an incomplete sanitizer because unbalanced angle brackets could survive. Replace it with an `indexOf` fixpoint loop that drops the trailing fragment when a `<` has no matching `>`.
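A minimal sketch of such a loop is shown below; `stripTags` is a hypothetical name and the committed code may differ in detail. Each pass removes the first `<…>` span; if a `<` has no closing `>`, everything from that `<` onward is discarded, so the result never contains a `<`.

```typescript
// Hypothetical sketch of an indexOf-based tag stripper run to a fixpoint.
function stripTags(input: string): string {
  let text = input;
  for (;;) {
    const open = text.indexOf("<");
    if (open === -1) return text; // fixpoint: no "<" left anywhere
    const close = text.indexOf(">", open + 1);
    if (close === -1) {
      // Unmatched "<": drop the trailing fragment instead of keeping it.
      return text.slice(0, open);
    }
    // Remove the "<...>" span and scan again from the start.
    text = text.slice(0, open) + text.slice(close + 1);
  }
}
```

Every pass removes at least the `<` it found, so the loop always terminates. A lone `>` (as in `a > b`) is left in place, since it cannot open a tag.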
…-import wp_id 7191/7421

* expand-wp-shortcodes.ts: 75 articles re-rendered. 58 `[gallery]` shortcodes produce real upload nodes (988 images resolved against the attachment guid map), 19 `[gview]` PDFs become 'Download PDF' links, and 2 iframes (1 YouTube, 1 Google Form) become outbound links. The script pre-expands the shortcodes to `<img>`/`<a>` tags before handing off to the existing WP→Lexical pipeline and media-resolver.
* fix-wp-7191-7421.ts: wp_id 7191 (Tate Boucher's neuromarketing letter, previously mis-imported as a day-archive listing titled 'RENSSELAER UNION') gets its real content, a Letter-to-the-Editor kicker, and its author. wp_id 7421 has no source content, so it was demoted to draft.
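The pre-expansion step can be pictured as a string rewrite that runs before the WP→Lexical conversion. This is a sketch under assumptions, not expand-wp-shortcodes.ts itself: `expandShortcodes`, `readAttr`, and `AttachmentMap` are hypothetical, and the attribute names (`ids` on `[gallery]`, `file` on `[gview]`) follow common WordPress / Google Doc Embedder usage rather than anything confirmed by this PR. Iframe handling is omitted.

```typescript
// Hypothetical sketch: rewrite [gallery] and [gview] shortcodes into plain
// HTML so the existing WP→Lexical pipeline and media-resolver can handle them.

type AttachmentMap = ReadonlyMap<number, string>; // attachment post ID → guid URL

function escapeAttr(value: string): string {
  return value.replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

// Reads name="value" from a shortcode's attribute string.
function readAttr(attrs: string, name: string): string | undefined {
  return new RegExp(`\\b${name}\\s*=\\s*"([^"]*)"`).exec(attrs)?.[1];
}

function expandShortcodes(html: string, attachments: AttachmentMap): string {
  return html
    // [gallery ids="12,34"] → one <img> per attachment ID that resolves.
    .replace(/\[gallery([^\]]*)\]/g, (_m, attrs: string) =>
      (readAttr(attrs, "ids") ?? "")
        .split(",")
        .map((s) => Number.parseInt(s.trim(), 10))
        .filter((n) => Number.isFinite(n))
        .map((id) => attachments.get(id))
        .filter((url): url is string => url !== undefined)
        .map((url) => `<img src="${escapeAttr(url)}">`)
        .join(""),
    )
    // [gview file="...pdf"] → a plain download link.
    .replace(/\[gview([^\]]*)\]/g, (_m, attrs: string) => {
      const file = readAttr(attrs, "file");
      return file ? `<a href="${escapeAttr(file)}">Download PDF</a>` : "";
    });
}
```

A bare `[gallery]` with no `ids` (which WordPress fills from the post's attached images) is simply dropped here; a real implementation would need to look up the post's attachments instead.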
Summary
Round 5 of the legacy-archive cleanup, driven by the completeness audit. Five fixes are batched into one PR:
* Kickers: 4,732 articles got correct kickers from poly-online `type_db` (2,303) and WP `Kicker` postmeta (1,357 fills + 1,072 overwrites of generic "Editorial/Opinion"). Section-level kicker variety (Letter to the Editor, Top Hat, Derby, My View, etc.) is restored.
* Subdecks: 768 poly-online subdecks filled from `blurb_db` summaries.
* Featured images: 1,531 WP articles got a real `featured_image_id`, `image_caption`, and photographer attribution from postmeta. Image cards on section pages from 2009-2019 are no longer blank.
* WP slug regen: 2,921 run-together slugs (e.g. `2011-11-30-midnpartakeinmarathon`) rebuilt from source `post_name` to the proper hyphenated form (`2011-11-30-midn-partake-in-marathon`). Old slugs are saved in a new `previous_slug` column; middleware 301s the old URL to the new one.
* Shortlinks: new `legacy_shortlinks` lookup table populated from `pluginSL_shorturl` (12,872 rows → 5,014 useful redirects: 4,091 polymer URLs + 871 archive media + 50 external + 2 mirror fallbacks). Regex broadened from 5-digit numeric to 5-char alphanumeric.

Schema changes
Two migrations:
* `20260507_000000_add_articles_previous_slug`: adds `articles.previous_slug` (+ version shadow + index).
* `20260507_010000_add_legacy_shortlinks`: new `legacy_shortlinks(short_code, target_url, hit_count, created_at)` lookup table.

Both are nullable and additive; no data is destroyed. The production DB has already been migrated and backfilled directly via the SSH tunnel (the backfill scripts ran serially against the prod DB).
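How the middleware might consult `legacy_shortlinks` can be sketched as below: the hand-curated override map wins, then the table, with the path pattern widened from five digits to five alphanumeric characters. This is a sketch, not the PR's middleware: an in-memory `Map` stands in for the table, and `resolveShortlink`, the `OVERRIDES` entry, the case-sensitive pattern, and the in-place `hit_count` bump are assumptions.

```typescript
// Hypothetical sketch of the shortlink lookup order: overrides, then table.

interface LegacyShortlink {
  short_code: string;
  target_url: string;
  hit_count: number;
}

// Previously /^\/(\d{5})$/ (five digits); now any five ASCII letters or digits.
const SHORTCODE_PATH = /^\/([A-Za-z0-9]{5})$/;

const OVERRIDES: ReadonlyMap<string, string> = new Map([
  ["12345", "https://example.org/hand-curated-target"], // hypothetical entry
]);

function resolveShortlink(
  pathname: string,
  table: Map<string, LegacyShortlink>, // stand-in for legacy_shortlinks
): string | null {
  const code = SHORTCODE_PATH.exec(pathname)?.[1];
  if (code === undefined) return null; // not shaped like a shortlink
  const override = OVERRIDES.get(code);
  if (override !== undefined) return override; // hand-curated map wins
  const row = table.get(code);
  if (row === undefined) return null; // miss: fall through to normal routing
  row.hit_count += 1; // stand-in for UPDATE ... SET hit_count = hit_count + 1
  return row.target_url; // caller issues the 301
}
```

One consequence of the broader pattern: ordinary five-letter paths such as `/about` or `/login` now match it too, so a table miss should fall through to normal routing rather than return a 404.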
Test plan
* `pnpm typecheck` clean
* `pnpm lint` clean (53 pre-existing warnings, 0 errors)
* `pnpm build` clean
* Spot-check an affected article: `featured_image_id` set, slug renamed, `previous_slug` retained
* Old slug (e.g. `/news/2011/11/2011-11-30-watsonreturnstorpidefeatsstudents`) → expect 301 to the new hyphenated slug
* Legacy shortlink (e.g. `/63345`) → expect 301 to the polymer URL

🤖 Generated with Claude Code