
feat(legacy-archive): kickers, featured images, slugs, shortlinks#148

Merged: RonanHevenor merged 3 commits into main from legacy-archive-completeness on May 7, 2026


@RonanHevenor

Summary

Round 5 of legacy-archive cleanup, driven by the completeness audit. Five fixes batched into one PR:

  • Kickers: 4,732 articles got correct kickers: 2,303 from poly-online type_db and 2,429 from WP Kicker postmeta (1,357 fills + 1,072 overwrites of the generic "Editorial/Opinion"). Section-specific kickers (Letter to the Editor, Top Hat, Derby, My View, etc.) are restored.
  • Subdecks: 768 poly-online articles got their blurb_db summaries.
  • Featured images: 1,531 WP articles got real featured_image_id, image_caption, and photographer attribution from postmeta. Image cards on section pages 2009-2019 are no longer blank.
  • WP slug regen: 2,921 run-together slugs (e.g. 2011-11-30-midnpartakeinmarathon) rebuilt from source post_name to proper hyphenated form (2011-11-30-midn-partake-in-marathon). Old slugs are saved on a new previous_slug column; middleware 301s the old URL to the new one.
  • Shortlinks: New legacy_shortlinks lookup table populated from pluginSL_shorturl (12,872 rows → 5,014 useful redirects: 4,091 polymer URLs + 871 archive media + 50 external + 2 mirror fallbacks). Regex broadened from 5-digit numeric to 5-char alphanumeric.
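
As an illustration of the slug redirect described above, here is a minimal sketch of a previous_slug lookup that yields a 301. It is not the middleware from this PR: the function names, the in-memory Map standing in for the indexed column, and the assumption that the slug is the last path segment are all hypothetical.

```typescript
// Only `slug` and `previous_slug` come from the PR; the rest is illustrative.
interface ArticleRow {
  slug: string;
  previous_slug: string | null;
}

interface Redirect {
  status: 301;
  location: string;
}

// Build a previous_slug -> row map once; this stands in for the DB index.
function buildPreviousSlugIndex(rows: ArticleRow[]): Map<string, ArticleRow> {
  const index = new Map<string, ArticleRow>();
  for (const row of rows) {
    if (row.previous_slug !== null) index.set(row.previous_slug, row);
  }
  return index;
}

// Return a 301 when the last path segment is a retired slug, else null.
function resolveOldSlug(
  pathname: string,
  index: Map<string, ArticleRow>,
): Redirect | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  if (segments.length === 0) return null;
  const row = index.get(segments[segments.length - 1]);
  if (row === undefined) return null;
  // Keep the leading segments (assumed section/year/month) and swap the slug.
  const prefix = segments.slice(0, -1);
  return { status: 301, location: "/" + prefix.concat(row.slug).join("/") };
}
```

Requests for the current slug fall through (the function returns null), so only retired URLs pay for the redirect.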

Schema changes

Two migrations:

  1. 20260507_000000_add_articles_previous_slug — adds articles.previous_slug (+ version shadow + index).
  2. 20260507_010000_add_legacy_shortlinks — new legacy_shortlinks(short_code, target_url, hit_count, created_at) lookup table.

Both migrations are additive (a nullable column and a new table); no data is destroyed. The production DB has already been migrated and backfilled directly via the SSH tunnel, with the backfill scripts run serially against it.

Test plan

  • pnpm typecheck clean
  • pnpm lint clean (53 pre-existing warnings, 0 errors)
  • pnpm build clean
  • Spot-checked 6 audit-flagged WP rows for kicker correctness
  • Confirmed wp_id 2103 ("MIDN partake in marathon") now has featured_image_id set, slug renamed, previous_slug retained
  • Post-deploy: curl an old run-together URL (/news/2011/11/2011-11-30-watsonreturnstorpidefeatsstudents) → expect 301 to the new hyphenated slug
  • Post-deploy: curl a 5-char shortlink (e.g. /63345) → expect 301 to the polymer URL

🤖 Generated with Claude Code

…ugs, shortlinks

Round 5 of the legacy-archive cleanup, driven by the completeness audit at
/tmp/legacy-import-logs/wp-completeness-audit.md.

* kickers + subdecks: 4732 articles got correct kickers from poly-online
  type_db (2303) and WP `Kicker` postmeta (1357 fills + 1072 overwrites of
  generic "Editorial/Opinion"). Plus 768 poly-online subdecks from blurb_db.
* featured images: 1531 WP articles got real featured_image_id + caption +
  photographer attribution from `Photo`/`PhotoCaption`/`PhotoByline`/
  `Photographer` postmeta.
* WP slug regen: 2921 run-together slugs (e.g. `midnpartakeinmarathon`)
  rebuilt from source `post_name` (e.g. `midn-partake-in-marathon`). Old
  slugs are saved on a new `previous_slug` column so middleware can 301
  to the new URL. New `articles_previous_slug_idx` makes the lookup cheap.
* shortlinks: new `legacy_shortlinks` lookup table populated from the
  `pluginSL_shorturl` plugin (5014 of 12,872 source rows resolve to a real
  destination — 4091 polymer URLs, 871 archive media files, 50 external,
  2 mirror fallbacks). Middleware checks the table after the hand-curated
  override map. Regex broadened from 5-digit to 5-char alphanumeric.
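
The lookup order described in the shortlinks bullet can be sketched roughly as follows. The exact regex, the case-sensitive matching, and the two Maps (one for the hand-curated overrides, one for `legacy_shortlinks`) are assumptions made for illustration, not code from the PR.

```typescript
// The previous pattern accepted five digits only, e.g. /^\/\d{5}$/.
// Assumed broadened pattern: exactly five letters or digits after the slash.
const SHORTCODE_PATH = /^\/[A-Za-z0-9]{5}$/;

// Resolve a request path to a redirect target, or null if nothing matches.
// `overrides` stands in for the hand-curated map and `table` for the
// legacy_shortlinks rows keyed by short_code.
function resolveShortlink(
  pathname: string,
  overrides: Map<string, string>,
  table: Map<string, string>,
): string | null {
  if (!SHORTCODE_PATH.test(pathname)) return null;
  const code = pathname.slice(1);
  // Overrides win; the lookup table is only consulted second.
  return overrides.get(code) ?? table.get(code) ?? null;
}
```

A numeric code such as `/63345` still matches the broadened pattern, so redirects that worked under the 5-digit rule keep resolving.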

Backfill scripts live under scripts/legacy-import/ and run idempotently
(dry run by default; --write to commit). All four were used to update
the production DB directly via the SSH tunnel before this PR.
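
For readers unfamiliar with the dry-run-by-default pattern, here is a minimal sketch of how such a script can stay idempotent. The function names and the Map standing in for the database are hypothetical; only the `--write` flag comes from the commit message.

```typescript
interface PlannedUpdate {
  id: number;
  value: string;
}

// Dry run unless --write is passed explicitly.
function shouldWrite(argv: string[]): boolean {
  return argv.indexOf("--write") !== -1;
}

// Plan only the rows whose stored value differs from the desired one, so a
// second run after a successful write plans nothing (idempotence).
function planUpdates(
  current: Map<number, string | null>,
  desired: Map<number, string>,
): PlannedUpdate[] {
  const updates: PlannedUpdate[] = [];
  desired.forEach((value, id) => {
    if (current.get(id) !== value) updates.push({ id, value });
  });
  return updates;
}

// `current` stands in for the database table being backfilled.
function runBackfill(
  argv: string[],
  current: Map<number, string | null>,
  desired: Map<number, string>,
): { planned: number; written: number } {
  const updates = planUpdates(current, desired);
  if (!shouldWrite(argv)) return { planned: updates.length, written: 0 };
  updates.forEach((u) => current.set(u.id, u.value));
  return { planned: updates.length, written: updates.length };
}
```

Under these assumptions, a run without `--write` reports what would change and touches nothing, and re-running with `--write` after a successful pass reports zero planned updates.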
Comment thread scripts/legacy-import/backfill-wp-featured-images.ts Fixed
… strip

CodeQL flagged a single `replace(/<[^>]+>/g, '')` as an incomplete
sanitizer because unbalanced angle brackets could survive. Replace with
an indexOf fixpoint loop that drops the trailing fragment when a `<` has
no matching `>`.
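
A minimal sketch of the kind of loop described above, written from the commit message rather than copied from the script; the function name `stripTags` is hypothetical.

```typescript
// Remove tag-like spans by repeatedly cutting from a '<' to the next '>'.
// The loop ends when no '<' remains (the fixpoint), so the result can never
// contain a '<', unlike a single regex pass over unbalanced input.
function stripTags(input: string): string {
  let text = input;
  for (;;) {
    const open = text.indexOf("<");
    if (open === -1) return text; // no openers left: done
    const close = text.indexOf(">", open + 1);
    // Unmatched '<': drop it and everything after it.
    if (close === -1) return text.slice(0, open);
    text = text.slice(0, open) + text.slice(close + 1);
  }
}
```

For input such as `caption <img src=x`, the single-pass regex leaves the string unchanged because it finds no closing `>`, whereas this loop drops the unterminated fragment and returns `caption ` (with its trailing space).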
…-import wp_id 7191/7421

* expand-wp-shortcodes.ts: 75 articles re-rendered. 58 [gallery] shortcodes
  produce real upload nodes (988 images resolved against the attachment
  guid map), 19 [gview] PDFs become 'Download PDF' links, and 2 iframes
  (1 YouTube, 1 Google Form) become outbound links. Pre-expands the
  shortcodes to <img>/<a> tags before handing off to the existing
  WP→Lexical pipeline + media-resolver.
* fix-wp-7191-7421.ts: wp_id 7191 (Tate Boucher's neuromarketing letter,
  previously mis-imported as a day-archive listing titled 'RENSSELAER
  UNION') gets its real content + Letter-to-the-Editor kicker + author.
  wp_id 7421 has zero source content; demoted to draft.
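
To make the pre-expansion step concrete, here is a rough sketch of turning shortcodes into plain tags before a later HTML pipeline sees them. It is an assumption-laden illustration, not the contents of expand-wp-shortcodes.ts: the regexes, the `ids` and `file` attribute names, the function names, and the id-to-URL Map standing in for the attachment guid map are all hypothetical.

```typescript
// Escape the characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;");
}

// Expand [gallery ids="..."] into <img> tags and [gview file="..."] into a
// 'Download PDF' link. `attachments` maps an attachment id to its URL;
// ids with no known URL are skipped.
function expandShortcodes(
  html: string,
  attachments: Map<string, string>,
): string {
  const withGalleries = html.replace(
    /\[gallery[^\]]*\bids="([^"]*)"[^\]]*\]/g,
    (_match: string, ids: string) => {
      const tags: string[] = [];
      for (const rawId of ids.split(",")) {
        const url = attachments.get(rawId.trim());
        if (url !== undefined) tags.push(`<img src="${escapeAttr(url)}">`);
      }
      return tags.join("");
    },
  );
  return withGalleries.replace(
    /\[gview[^\]]*\bfile="([^"]*)"[^\]]*\]/g,
    (_match: string, file: string) =>
      `<a href="${escapeAttr(file)}">Download PDF</a>`,
  );
}
```

Emitting ordinary `<img>` and `<a>` tags keeps the shortcode handling separate from the existing conversion step, which then only has to deal with markup it already understands.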
@RonanHevenor RonanHevenor merged commit dc4de07 into main May 7, 2026
2 checks passed