fix: author/date byline heuristic matches day-of-week as a date #291

Merged
kepano merged 2 commits into kepano:main from mvanhorn:fix/233-author-date-byline-heuristic-matches-day-of-week-a on Jun 6, 2026
@mvanhorn
Contributor

Summary

The byline-detection heuristic in content-patterns.ts treated bare weekday names ("Tuesday", "Wed") as date strings and stripped them from article bylines. This PR tightens the regex so that a weekday counts as part of a date only when it is adjacent to a numeric date (day, month, or year), which matches how human readers parse "Tuesday, March 5".
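For readers who want a concrete picture of the before and after, here is a minimal sketch of the idea. It is not the code in content-patterns.ts: the names `WEEKDAY`, `MONTH`, `looksLikeDateLoose`, and `looksLikeDateStrict` are hypothetical, and the sketch only handles a date that follows the weekday, which may be narrower than the real pattern.

```typescript
// Hypothetical sketch; none of these names or patterns come from
// src/removals/content-patterns.ts.

// Full and abbreviated weekday names.
const WEEKDAY =
  "(?:Mon(?:day)?|Tue(?:s(?:day)?)?|Wed(?:nesday)?|Thu(?:rs(?:day)?)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)";

// Full and abbreviated month names.
const MONTH =
  "(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:t(?:ember)?)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)";

// Loose behaviour (the bug): any weekday name on its own counts as a date.
const LOOSE_DATE = new RegExp(`\\b${WEEKDAY}\\b`, "i");

// Stricter behaviour: the weekday must be followed by a number, optionally
// with a month name in between ("Tuesday, March 5", "Wed 3/5/2025").
const STRICT_DATE = new RegExp(
  `\\b${WEEKDAY}\\b\\.?,?\\s+(?=(?:${MONTH}\\.?\\s+)?\\d{1,4}\\b)`,
  "i",
);

function looksLikeDateLoose(text: string): boolean {
  return LOOSE_DATE.test(text);
}

function looksLikeDateStrict(text: string): boolean {
  return STRICT_DATE.test(text);
}

console.log(looksLikeDateLoose("Tuesday")); // true: bare weekday treated as a date
console.log(looksLikeDateStrict("Tuesday")); // false: no numeric neighbor
console.log(looksLikeDateStrict("Tuesday, March 5")); // true
console.log(looksLikeDateStrict("Wed 3/5/2025")); // true
```

The lookahead keeps the match anchored on the weekday itself, so a caller that strips the match would remove only the weekday prefix and leave the rest of the date for whatever handles it next. That is a design choice of this sketch, not something the PR description states.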

Why this matters

The bug surfaces on email-style header blocks like the fixture added here, where stripping a standalone weekday clipped the byline. Reported in #233 with a real Substack-style article.

Changes

  • src/removals/content-patterns.ts - require a numeric date neighbor for the day-of-week match.
  • tests/fixtures/metadata--email-style-header-block.html + tests/expected/metadata--email-style-header-block.md - regression fixture covering the reporter's case.

Testing

New fixture passes locally. Existing fixtures still pass.

Fixes #233

The author/date byline regex matched bare weekday names ('Tuesday',
'Wed') as date candidates and stripped them. Restrict the day-of-week
match so weekday-only strings without a numeric date next to them
are not treated as dates.

Fixes kepano#233
@kepano kepano merged commit 323c895 into kepano:main Jun 6, 2026
Development

Successfully merging this pull request may close these issues.

Author-date byline heuristic matches day-of-week abbreviations as names