
feat: Add channel alias support, debug export, and improve tag handling#2

Open
RedShieldArr wants to merge 1 commit into PiratesIRC:main from RedShieldArr:main

@RedShieldArr
Collaborator

feat: add alias-aware matching, debug export, and ignored-tag expansion

Improve channel matching quality and provide deeper debug visibility.

  • Add premium channel alias support

    • Load aliases from country JSON databases
    • Match against aliases and map back to canonical channel names
    • Expand UKTV alias coverage in UK channel data
  • Add Debug Match Export action

    • Export detailed matching diagnostics to CSV
    • Include OTA detection, normalized queries, and fuzzy match stages
    • Add setting to control top N fuzzy candidates in debug output
  • Improve ignored tag handling

    • Add shared ignored-tag expansion helper
    • Support bracketed, parenthesized, and bare tag variants from one entry
    • Apply consistently across rename and category workflows
  • Improve Unicode matching behavior

    • Preserve accented names during normalization for better matching
    • Keep accent handling in the dedicated normalization pipeline
    • Correct related normalization comment text
  • Update documentation

    • Document Debug Match Export action and debug-top-N setting
    • Clarify ignored-tag auto-expansion behavior
    • Expand CSV export format section with debug export details

Major enhancements to matching accuracy and debugging capabilities:

- Add channel alias support for better matching flexibility
  * Load and match against channel aliases from JSON files
  * Map matched aliases back to canonical channel names
  * Add comprehensive UKTV channel aliases (Dave, Gold, Drama, etc.)

- Implement debug export functionality
  * New Debug Match Export (CSV) action with detailed matching stages
  * Configurable top N fuzzy match candidates per channel
  * Export shows OTA detection, tag extraction, and all fuzzy matching stages
  * Includes normalized queries and stage-by-stage candidate scores

- Improve tag handling and normalization
  * Auto-expand ignored tags to bracket/parentheses/bare versions
  * Single [HD] tag now matches [HD], (HD), and HD automatically
  * Fix Unicode handling by removing premature non-ASCII stripping

- Fix Unicode encoding in UK_channels.json
  * Properly encode special characters as JSON unicode escapes (ä, ü, ö, é)
  * Maintain JSON compatibility across systems

- Update documentation
  * Document debug export action and settings
  * Expand ignored tags description with auto-expansion details
  * Add detailed CSV export format documentation
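
As a rough illustration of the JSON unicode-escape point above, the standard library's json module can write non-ASCII characters either as \uXXXX escapes or as literal UTF-8. The channel name below is made up for the example and is not taken from UK_channels.json.

```python
import json

# Illustrative entry only; the real file's contents are not shown here.
entry = {"channel_name": "Café TV"}

# ensure_ascii=True (the default) emits pure-ASCII output with \uXXXX escapes.
escaped = json.dumps(entry, ensure_ascii=True)

# ensure_ascii=False keeps the literal character; the file must then be
# written and read as UTF-8.
literal = json.dumps(entry, ensure_ascii=False)

# Both forms decode to the same Python string.
round_trip_equal = json.loads(escaped) == json.loads(literal) == entry
```

Either form is valid JSON; the escaped form simply avoids depending on the encoding used to read the file.
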
@PiratesIRC
Owner

Thanks for this — the alias support, the _expand_ignored_tags refactor, and the debug export are all genuinely useful. A few things need addressing before this can land:

Blocking: branch is based on the pre-ORM codebase

This PR predates the HTTP-API → Django ORM migration. main is now v1.26.1001200 and accesses data via from apps.channels.models import .... The new debug_export_action calls self._get_api_token(...) and self._get_api_data("/api/channels/groups/", ...), which no longer exist on main (replaced by _get_all_groups / _get_all_channels ORM helpers). The run() dispatch and the category_groups_dry_run_action / organize_by_category_action call sites you patched were also rewritten for ORM — hence the merge conflicts. Could you rebase onto current main and re-apply the concepts (not the diff)?

Perf: fuzzy_match_debug bypasses the token pre-filter

It iterates all candidate_names three times (stages 1–3), recomputing normalize_name each pass, with no get_candidates() pre-filter and no normalization cache. That reintroduces the O(streams×channels) cost the v1.26 token-index work removed (32h → 6s). Over ~31K channels a debug export could hang. Please route it through get_candidates() + _get_cached_norm() like fuzzy_match does.
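
A minimal sketch of the pre-filter plus cache idea, for illustration only. The functions here are hypothetical stand-ins and do not reproduce the plugin's actual get_candidates() or _get_cached_norm() signatures; difflib's SequenceMatcher is swapped in for whatever scorer the plugin uses.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_norm(name: str) -> str:
    """Lowercase and tokenize a name; cached so each name is normalized once."""
    return " ".join(re.findall(r"\w+", name.lower()))


def build_token_index(channel_names):
    """Map each token to the set of channel names that contain it."""
    index = defaultdict(set)
    for name in channel_names:
        for token in cached_norm(name).split():
            index[token].add(name)
    return index


def get_candidates(query, index):
    """Return only channels sharing at least one token with the query."""
    candidates = set()
    for token in cached_norm(query).split():
        candidates |= index.get(token, set())
    return candidates


def top_matches(query, index, top_n=3):
    """Score the pre-filtered candidates and return the best top_n (name, score) pairs."""
    q = cached_norm(query)
    scored = [
        (name, SequenceMatcher(None, q, cached_norm(name)).ratio())
        for name in get_candidates(query, index)
    ]
    # Highest score first; ties broken alphabetically for stable debug output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))[:top_n]


channels = ["BBC One HD", "BBC Two HD", "Sky Sports Main Event", "Dave"]
index = build_token_index(channels)
best = top_matches("bbc one", index)
```

A debug export built this way would score only the channels that share a token with the stream name, and every debug stage could reuse the same cached normalization instead of recomputing it per pass.
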

Behavior change in tag expansion

_expand_ignored_tags now also appends the bare inner token (the old inline blocks only added the opposite-bracket form). Bare HD/4K stripping is more aggressive and could over-strip names where the token appears as a word. It's documented in the README, but please spot-check matching quality on a real channel list.
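
To make the over-stripping concern concrete, here is a small sketch of one-entry-to-three-variants expansion. It is an assumption about how such a helper could behave, not the PR's _expand_ignored_tags; the strip_tags function and the word-boundary rule for bare tokens are hypothetical.

```python
import re


def expand_ignored_tags(tags):
    """Expand each tag into bracketed, parenthesized, and bare variants."""
    expanded = []
    for tag in tags:
        inner = tag.strip().strip("[]()").strip()
        if not inner:
            continue
        for variant in (f"[{inner}]", f"({inner})", inner):
            if variant not in expanded:
                expanded.append(variant)
    return expanded


def strip_tags(name, variants):
    """Remove each variant from name; bare variants match whole words only."""
    for variant in variants:
        if variant[0] in "[(":
            pattern = re.escape(variant)
        else:
            # Word-boundary guard: "HD" must not be part of a longer word.
            pattern = rf"(?<!\w){re.escape(variant)}(?!\w)"
        name = re.sub(pattern, " ", name, flags=re.IGNORECASE)
    return " ".join(name.split())


variants = expand_ignored_tags(["[HD]"])
cleaned = strip_tags("BBC One (HD)", variants)
over_stripped = strip_tags("HD Movies [HD]", variants)
untouched = strip_tags("HDNet", variants)
```

Even with the word-boundary guard, a name such as "HD Movies [HD]" loses its leading "HD", which is the kind of case worth checking against a real channel list.
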

PR description vs. diff

The description mentions Unicode/accent-preservation normalization changes and comment fixes, but no such changes appear in the diff (the fuzzy_matcher hunks are only __init__, db-load, and the new debug method). Could you reconcile the description with what's actually included?

Minor

  • import json inside debug_export_action is unused (the module already imports it).
  • UK_channels.json has pre-existing duplicate channel_name entries (U&Eden HD, U&Yesterday HD, U&Yesterday+1 each appear twice); this PR duplicates the alias blocks into both, with inconsistent key ordering. Harmless at runtime (dict last-wins), but the dedup is worth splitting into its own change.
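
The last-wins behavior can be shown with a short sketch. The JSON layout and the "aliases" key are assumptions for illustration, and the alias values are invented; only the channel names come from the note above.

```python
import json
from collections import Counter

# Hypothetical excerpt: the structure and alias values are assumptions.
raw = """
[
  {"channel_name": "U&Eden HD", "aliases": ["Eden HD"]},
  {"channel_name": "U&Yesterday HD", "aliases": ["Yesterday HD"]},
  {"channel_name": "U&Eden HD", "aliases": ["Eden", "Eden HD"]}
]
"""
entries = json.loads(raw)

# Keying a dict by channel_name silently keeps only the last duplicate.
by_name = {entry["channel_name"]: entry for entry in entries}

# A quick duplicate report that a separate dedup change could start from.
counts = Counter(entry["channel_name"] for entry in entries)
duplicates = sorted(name for name, n in counts.items() if n > 1)
```

Because the later entry replaces the earlier one, any alias that exists only in the first copy would be dropped without warning, which is why keeping the two blocks identical matters until the duplicates are removed.
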

The alias→canonical reverse mapping itself is sound, and remapping before the category_map_premium lookup is correctly ordered. Happy to re-review once it's rebased onto ORM main.
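
For readers following along, the ordering described above can be sketched as follows. All names here (build_alias_map, resolve, category_map, the "aliases" key) are hypothetical, and the sample data is illustrative rather than copied from the PR.

```python
def build_alias_map(entries):
    """Build a case-insensitive alias -> canonical channel name lookup."""
    alias_to_canonical = {}
    for entry in entries:
        canonical = entry["channel_name"]
        for alias in entry.get("aliases", []):
            # First definition wins if two channels claim the same alias.
            alias_to_canonical.setdefault(alias.casefold(), canonical)
    return alias_to_canonical


def resolve(matched_name, alias_to_canonical):
    """Return the canonical name for an alias, or the name itself if unknown."""
    return alias_to_canonical.get(matched_name.casefold(), matched_name)


# Illustrative data only.
entries = [
    {"channel_name": "U&Dave", "aliases": ["Dave", "Dave HD"]},
    {"channel_name": "BBC One"},
]
category_map = {"U&Dave": "Entertainment", "BBC One": "General"}

alias_map = build_alias_map(entries)

# Remap first, then look up the category with the canonical name.
canonical = resolve("dave", alias_map)
category = category_map.get(canonical)
```

Looking up the category with the raw alias would miss, since only canonical names are keys in the category map; remapping first avoids that.
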


2 participants