Skip to content

Latest commit

 

History

History
414 lines (294 loc) · 10.1 KB

File metadata and controls

414 lines (294 loc) · 10.1 KB

Roadmap for Version 1.1.0

Target Release: Q1 2026
Focus: Enhanced user experience, real-time features, and advanced name parsing capabilities


🎯 Version 1.1.0 Goals

Version 1.1.0 will focus on three main areas:

  1. Real-Time Features: WebSocket-based progress tracking and live updates
  2. Advanced Name Parsing: Better handling of edge cases and international names
  3. User Experience: Comparison views, search/filter, and keyboard shortcuts

🚀 Priority 1: Real-Time Features (High Priority)

WebSocket Progress Tracking

Current: Polling-based progress updates (every 2 seconds)
Planned: Real-time WebSocket connection for instant updates

Benefits:

  • ✅ Instant progress updates without polling overhead
  • ✅ Reduced server load (no repeated HTTP requests)
  • ✅ Better user experience with live feedback
  • ✅ Support for multiple concurrent jobs

Implementation:

  • Add Socket.IO or native WebSocket support
  • Create WebSocket server endpoint
  • Update job processor to emit progress events
  • Update client to listen for real-time updates
  • Add connection status indicator

Estimated Effort: 2-3 days


🔍 Priority 2: Advanced Name Parsing (High Priority)

1. Name Length Validation

Goal: Detect and flag suspiciously short or long names

Features:

  • Minimum length validation (e.g., 2 characters)
  • Maximum length validation (e.g., 100 characters)
  • Configurable thresholds
  • Warning flags for edge cases

Examples:

  • ✅ "John Smith" → Valid
  • ⚠️ "J" → Warning (too short)
  • ⚠️ "Verylongnamewithnospaces..." → Warning (too long)

Estimated Effort: 1 day


2. Better Initials Handling

Goal: Properly parse names with initials

Current Issues:

  • "J. Robert Smith" → Unclear if "J." is first name or initial
  • "R.J. Williams" → Should preserve "R.J." as first name

Planned Improvements:

  • Detect single-letter + period patterns
  • Preserve multi-initial patterns (R.J., J.P., etc.)
  • Add option to expand initials to full names (if database available)

Examples:

  • ✅ "J. Robert Smith" → First: "J.", Middle: "Robert", Last: "Smith"
  • ✅ "R.J. Williams" → First: "R.J.", Last: "Williams"
  • ✅ "John P. Smith" → First: "John", Middle: "P.", Last: "Smith"

Estimated Effort: 2 days


3. Asian Name Order Detection

Goal: Detect and handle family-name-first patterns

Current Issues:

  • "Li Wei" (Chinese) → Incorrectly parsed as First: "Li", Last: "Wei"
  • Should be: First: "Wei", Last: "Li" (or preserve original order with flag)

Planned Improvements:

  • Detect common Asian surnames (Li, Wang, Zhang, Kim, Park, etc.)
  • Add name order detection heuristics
  • Provide option to preserve original order
  • Add confidence scoring

Examples:

  • ✅ "Li Wei" → First: "Wei", Last: "Li" (with flag: family_name_first)
  • ✅ "Kim Min-jun" → First: "Min-jun", Last: "Kim"
  • ✅ "Tanaka Hiroshi" → First: "Hiroshi", Last: "Tanaka"

Estimated Effort: 3-4 days


4. Mononym Support

Goal: Handle single-name individuals (celebrities, historical figures)

Examples:

  • ✅ "Madonna" → First: "Madonna", Last: null
  • ✅ "Cher" → First: "Cher", Last: null
  • ✅ "Plato" → First: "Plato", Last: null

Implementation:

  • Detect single-word names
  • Add mononym flag
  • Handle in formatting (don't require last name)

Estimated Effort: 1 day


5. Patronymic/Matronymic Patterns

Goal: Recognize patronymic naming systems (Russian, Icelandic, etc.)

Examples:

  • ✅ "Ivan Petrovich Sidorov" → First: "Ivan", Patronymic: "Petrovich", Last: "Sidorov"
  • ✅ "Björk Guðmundsdóttir" → First: "Björk", Patronymic: "Guðmundsdóttir"

Implementation:

  • Add patronymic suffix detection (-ovich, -evich, -son, -dóttir, etc.)
  • Create patronymic field in output
  • Handle in formatting

Estimated Effort: 2-3 days


6. Phonetic Matching for Duplicates

Goal: Detect duplicate names with different spellings

Examples:

  • "John Smith" vs "Jon Smyth" → 85% phonetic match
  • "Catherine" vs "Kathryn" → 90% phonetic match

Implementation:

  • Add Soundex algorithm
  • Add Metaphone algorithm
  • Create similarity scoring API
  • Add duplicate detection in batch processing

Estimated Effort: 3 days


7. Fuzzy Matching Against Name Databases

Goal: Validate names against known name databases

Features:

  • Match against common first names database
  • Match against common last names database
  • Confidence scoring based on frequency
  • Flag unusual/suspicious names

Implementation:

  • Integrate name frequency databases (US Census, etc.)
  • Add fuzzy matching with Levenshtein distance
  • Create validation API endpoint

Estimated Effort: 4-5 days


8. Context-Aware Parsing

Goal: Use other CSV columns to improve parsing accuracy

Examples:

  • If "Company" column contains "Dr. Smith Clinic" → Likely "Dr." is a title
  • If "Email" is "john.smith@example.com" → First: "John", Last: "Smith"
  • If "Phone" has country code +82 → Likely Korean name (family-name-first)

Implementation:

  • Add context analysis module
  • Extract hints from email addresses
  • Use company names for validation
  • Use phone country codes for name order detection

Estimated Effort: 5-6 days


9. Character Set Validation

Goal: Detect mixed scripts and encoding issues

Examples:

  • ⚠️ "John Смит" → Warning (mixed Latin and Cyrillic)
  • ⚠️ "José�Smith" → Error (encoding issue)

Implementation:

  • Detect Unicode script mixing
  • Flag potential encoding errors
  • Suggest corrections

Estimated Effort: 2 days


🎨 Priority 3: User Experience Enhancements (Medium Priority)

1. Comparison View (Original vs Namefully)

Goal: Show side-by-side comparison with Namefully library approach

Features:

  • Display original name
  • Show our parsing result
  • Show Namefully parsing result
  • Highlight differences
  • Explain why our approach is better

Estimated Effort: 3 days


2. Search/Filter for Example Names

Goal: Quickly find specific example names

Features:

  • Search box for example names
  • Filter by category (credentials, titles, prefixes, etc.)
  • Filter by complexity (simple, medium, complex)
  • Quick access to edge cases

Estimated Effort: 2 days


3. Custom Prefix/Credential Lists Editor

Goal: Allow users to add custom credentials and prefixes

Features:

  • UI to add/edit/delete credentials
  • UI to add/edit/delete last name prefixes
  • UI to add/edit/delete titles
  • Save custom lists to user account
  • Import/export custom lists

Estimated Effort: 4-5 days


4. Keyboard Shortcuts

Goal: Speed up workflow with keyboard shortcuts

Shortcuts:

  • Ctrl+Enter → Parse name
  • Ctrl+B → Switch to batch mode
  • Ctrl+U → Upload CSV
  • Ctrl+D → Download results
  • Ctrl+C → Copy formatted name
  • ? → Show keyboard shortcuts help

Estimated Effort: 2 days


5. Internationalization (i18n)

Goal: Support multiple languages for UI

Languages:

  • English (default)
  • Spanish
  • French
  • German
  • Chinese (Simplified)
  • Japanese

Implementation:

  • Add i18next library
  • Extract all UI strings
  • Create translation files
  • Add language switcher

Estimated Effort: 5-6 days


📦 Priority 4: Package & Distribution (Low Priority)

1. Publish @normalization/core to npm

Goal: Make the package publicly available

Steps:

  • Create npm account (if needed)
  • Configure package.json for publishing
  • Add npm publish workflow
  • Create package documentation
  • Publish to npm registry

Estimated Effort: 1-2 days


2. Create GitHub Packages Release

Goal: Alternative distribution via GitHub Packages

Steps:

  • Configure GitHub Packages
  • Add publishing workflow
  • Update documentation

Estimated Effort: 1 day


🧪 Priority 5: Testing & Quality (Ongoing)

1. Increase Test Coverage

Goal: Reach 95%+ code coverage

Areas:

  • Unit tests for all normalizers
  • Integration tests for API endpoints
  • E2E tests for critical user flows
  • Performance tests for batch processing

Estimated Effort: 5-7 days


2. Add Performance Benchmarks

Goal: Track performance over time

Metrics:

  • Names processed per second
  • Memory usage during batch processing
  • API response times
  • Database query performance

Estimated Effort: 2-3 days


📅 Proposed Timeline

Phase 1: Real-Time Features (Weeks 1-2)

  • WebSocket progress tracking
  • Connection status indicator
  • Real-time job updates

Phase 2: Advanced Name Parsing (Weeks 3-6)

  • Name length validation
  • Better initials handling
  • Asian name order detection
  • Mononym support
  • Patronymic patterns

Phase 3: User Experience (Weeks 7-9)

  • Comparison view
  • Search/filter
  • Keyboard shortcuts
  • Custom lists editor

Phase 4: Advanced Features (Weeks 10-12)

  • Phonetic matching
  • Fuzzy matching
  • Context-aware parsing
  • Character set validation

Phase 5: Polish & Release (Weeks 13-14)

  • Internationalization
  • Testing & QA
  • Documentation updates
  • v1.1.0 release

🎯 Success Metrics

Performance

  • Process 100K names in under 30 seconds
  • Support 1M+ names in a single batch job
  • Real-time updates with <100ms latency

Quality

  • 95%+ parsing accuracy on test dataset
  • 95%+ code coverage
  • Zero critical bugs

User Experience

  • <2 second page load time
  • <500ms API response time (p95)
  • 4.5+ star user rating

🤝 Community Contributions

We welcome community contributions for v1.1.0! Priority areas:

  1. Internationalization: Help translate the UI
  2. Name Databases: Contribute name frequency data for your country
  3. Test Cases: Submit edge cases we haven't covered
  4. Documentation: Improve guides and tutorials

See CONTRIBUTING.md for guidelines.


📝 Notes

  • This roadmap is subject to change based on user feedback and priorities
  • Features may be moved between versions based on complexity and dependencies
  • Community contributions can accelerate the timeline

Last Updated: October 28, 2025
Status: Planning Phase