Releases: roALAB1/data-normalization-platform
v4.24.0 - Workshop 4: Data Quality & Intent Filtering
Added Workshop 4 with a Wistia embed (5dx0ehg1oe), timestamped transcript, and comprehensive KB article. Covers derivative data, match rate deconstruction, geoframing, the Distance + Deviation intent model, the closed feedback loop, stateless workers, and DeepVerify. 211 tests passing.
v4.23.0: RetargetIQ Dedicated Battle Card
Added dedicated RetargetIQ battle card cloned from DataShopper with reseller-specific messaging. Includes 5 cards, 13 comparison dimensions, 4 fatal flaws, a rebuttal script, a talk track, 6 FAQ entries, and a 3-month roadmap. 190 tests passing.
v4.21.0 - Premium Tier Rebuttal Cards
Added dedicated Premium Tier Rebuttal battle cards and FAQ entries for ZoomInfo (card #6, FAQ #7) and Apollo.io (card #5, FAQ #6). Addresses the common objection 'We already use the premium/enterprise tier' with detailed rebuttals explaining why paying more doesn't fix structural data architecture limitations. 172 tests passing.
v4.20.0 - Battle Card Seeding: ZoomInfo, Apollo.io, RB2B
Synthesized comprehensive battle cards from ChatGPT, 2x Perplexity, and Google Deep Research. ZoomInfo (5 cards, 8 comparisons, 4 fatal flaws), Apollo.io (4 cards, 8 comparisons, 3 fatal flaws), RB2B (4 cards, 8 comparisons, 3 fatal flaws). Each includes rebuttal scripts, talk tracks, sales FAQ, and 90-day deployment roadmaps. 172 tests passing.
v4.19.0 — Data Objection Handler
AI-powered Data Objection Handler page. Reps paste a prospect's data quality objection and receive a tailored response backed by the 'Cheap Data Costs More' workshop intelligence. Features 4 tone options, 15 common objection presets, copy-to-clipboard, response history, retry, and Cmd+Enter shortcut. 143 tests passing.
v4.18.0 — Data Economics Quiz & Vendor Checklist
Added 12-question Data Economics Quiz (NCOA, Distance Scoring, Starbucks Problem, derivative data, match rates, vendor vetting) with 4 score tiers and category breakdown. Generated one-page Vendor Comparison Checklist PDF on S3 CDN. Both linked from Data Workshop page and Workshop Hub. 121 tests passing across 9 test files.
v4.17.0 — Data Deep Dive Workshop
Added 'Cheap Data Costs More' workshop page with embedded Loom video, full timestamped transcript, and comprehensive Knowledge Base article covering the three pillars of identity data (Audience, Pixel, Intent). Clickable timestamps jump to video position. Workshop 3 card added to Workshop Hub. 108 tests passing across 8 test files.
v3.50.0: Smart Column Mapping
Smart Column Mapping 🤖
Intelligent pre-normalization feature that automatically detects and suggests combining fragmented columns (address components, name components, phone components) with confidence scoring and preview generation. Eliminates 5-10 minutes of manual Excel work with one-click acceptance.
Key Features
- 🏠 Address Components: House + Street + Apt → Address (e.g., "65" + "MILL ST" + "306" → "65 MILL ST Apt 306")
- 👤 Name Components: First + Middle + Last + Prefix + Suffix → Full Name (supports 15+ column name variations)
- 📞 Phone Components: Area Code + Number + Extension → Phone (e.g., "555" + "123-4567" → "(555) 123-4567")
- 🎯 Pattern Matching: Case-insensitive detection with space/underscore support
- 📊 Confidence Scoring: High (≥80%), Medium (60-79%), Low (<60%) confidence indicators
- 👁️ Preview Generation: Shows 3 sample combinations before acceptance
- ⚡ Fast Detection: <50ms for typical CSV (10-20 columns)
- 🎨 SmartSuggestions UI: User-friendly interface with Accept/Customize/Ignore actions
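The detection flow described above (case-insensitive header matching with space/underscore support, a confidence tier, and a three-row preview) might look roughly like the sketch below. This is an illustration only, not the code in ColumnCombinationDetector.ts: the names `suggestAddress`, `combineAddress`, `tierFor`, and `ADDRESS_PARTS`, the header variants, and the scoring rule (share of sampled rows with both required parts filled) are all assumptions. Only the tier thresholds and the "65 MILL ST Apt 306" output format come from the release notes.

```typescript
// Hypothetical sketch; names and scoring are illustrative, not the project's API.
type Tier = "high" | "medium" | "low";

interface Suggestion {
  columns: string[];   // source headers, in combination order
  confidence: number;  // 0-100
  tier: Tier;
  preview: string[];   // up to 3 combined sample values
}

// Normalize a header so "House_Number", "house number" and "HOUSE NUMBER" compare equal.
const normalizeHeader = (h: string): string =>
  h.trim().toLowerCase().replace(/[\s_]+/g, " ");

// Assumed header variants for each address part.
const ADDRESS_PARTS = {
  house: ["house", "house number", "house no"],
  street: ["street", "street name"],
  apt: ["apt", "apartment", "unit"],
};

// Thresholds from the release notes: High >= 80, Medium 60-79, Low < 60.
const tierFor = (confidence: number): Tier =>
  confidence >= 80 ? "high" : confidence >= 60 ? "medium" : "low";

function combineAddress(house: string, street: string, apt?: string): string {
  const base = `${house.trim()} ${street.trim()}`.trim();
  return apt && apt.trim() ? `${base} Apt ${apt.trim()}` : base;
}

function suggestAddress(
  headers: string[],
  rows: Record<string, string>[],
): Suggestion | null {
  const find = (variants: string[]) =>
    headers.find((h) => variants.includes(normalizeHeader(h)));
  const house = find(ADDRESS_PARTS.house);
  const street = find(ADDRESS_PARTS.street);
  const apt = find(ADDRESS_PARTS.apt);
  if (!house || !street) return null; // house and street are required

  // Assumed scoring: share of sampled rows where both required parts are non-empty.
  const sample = rows.slice(0, 5);
  const filled = sample.filter((r) => r[house]?.trim() && r[street]?.trim()).length;
  const confidence = sample.length ? Math.round((filled / sample.length) * 100) : 0;

  return {
    columns: apt ? [house, street, apt] : [house, street],
    confidence,
    tier: tierFor(confidence),
    preview: sample
      .slice(0, 3)
      .map((r) => combineAddress(r[house] ?? "", r[street] ?? "", apt ? r[apt] : undefined)),
  };
}
```

Calling `suggestAddress(["House_Number", "street name", "APT"], [{ House_Number: "65", "street name": "MILL ST", APT: "306" }])` yields a `high` tier suggestion whose first preview value is `"65 MILL ST Apt 306"`. Sampling only the first five rows mirrors the "5 sample rows per column" note under Test Coverage.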
UI Enhancements
- 🌐 URL Normalization Tile: Replaced Company tile with URL normalization showcase in Enrichment-Ready Output Format
- 🔗 URL Examples: https://www.example.com/path → example.com, http://subdomain.site.co.uk → site.co.uk
User Experience
- Before: 5-10 minutes of manual column combination in Excel
- After: One-click "Accept" on smart suggestion
- Eliminates manual Excel formula work and reduces errors
Test Coverage
- 22/22 comprehensive unit tests (100% pass rate)
- Detection time: <50ms for typical CSV
- Minimal memory overhead (only 5 sample rows per column)
Technical Details
Files Added:
- shared/utils/ColumnCombinationDetector.ts - Core detection logic
- client/src/components/SmartSuggestions.tsx - UI component
- tests/v3.50.0.test.ts - Comprehensive test suite
- docs/VERSION_HISTORY_v3.50.0.md - Detailed documentation
Test Categories:
- Address component detection (5 tests)
- Name component detection (3 tests)
- Phone component detection (3 tests)
- Column combination application (4 tests)
- Multiple suggestions (2 tests)
- Edge cases (5 tests)
See CHANGELOG.md for complete details.
Release v3.49.0
What's New in v3.49.0 🚀
Changes
- Checkpoint: v3.49.0: Fix critical memory issues with 400k+ row files (5f26173)
- Release v3.49.0: Large File Processing Fix (661db03)
Full Changelog
See CHANGELOG.md for complete version history.
Installation
git clone https://github.com/roALAB1/data-normalization-platform.git
cd data-normalization-platform
pnpm install
pnpm run dev
v3.48.0: URL Normalization Feature 🌐
Comprehensive URL normalization that extracts clean domain names from URLs by removing protocols, www prefixes, paths, query parameters, and fragments. Auto-detects URL columns in CSV files with 95%+ accuracy and supports international domains (.co.uk, .com.au, etc.). Includes confidence scoring for URL validity and handles 18+ multi-part TLDs. All 40 tests passing with full integration into the intelligent normalization engine.
Key Features
- 🌐 Protocol Removal: Strips http://, https://, ftp://, and other protocols
- 🔗 WWW Prefix Removal: Removes www. from domain names (case-insensitive)
- 🎯 Root Domain Extraction: Extracts only domain + extension (google.com)
- 🗑️ Path/Query/Fragment Removal: Removes /paths, ?query=params, and #fragments
- 🌍 International Domain Support: Handles .co.uk, .com.au, and 18+ multi-part TLDs
- 🤖 Auto-Detection: Automatically identifies URL columns (Website, URL, Link, Homepage)
- 📊 Confidence Scoring: 0-1 confidence scores based on domain validity
- ✅ 40 Tests Passing: Comprehensive coverage including real-world examples
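As a rough illustration of the auto-detection bullet, a URL column could be identified by combining a header hint with a check on sample values. The release notes only name the header keywords (Website, URL, Link, Homepage). The function name `isUrlColumn`, the `LOOKS_LIKE_URL` pattern, and the 50% / 80% thresholds below are assumptions made for this sketch, not the platform's actual rules.

```typescript
// Hypothetical sketch of URL column detection; thresholds and regex are assumptions.
const URL_HEADER_HINTS = ["website", "url", "link", "homepage"];

// Optional scheme, one or more dot-separated labels, alphabetic TLD, optional path/query/fragment.
const LOOKS_LIKE_URL =
  /^(?:[a-z][a-z0-9+.-]*:\/\/)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:[\/?#].*)?$/i;

function isUrlColumn(header: string, samples: string[]): boolean {
  // Case-insensitive header match, treating spaces and underscores alike.
  const words = header.trim().toLowerCase().replace(/[\s_]+/g, " ").split(" ");
  const headerHit = URL_HEADER_HINTS.some((hint) => words.includes(hint));

  // Share of non-empty sample values that look like URLs.
  const nonEmpty = samples.map((s) => s.trim()).filter((s) => s.length > 0);
  const urlShare = nonEmpty.length
    ? nonEmpty.filter((s) => LOOKS_LIKE_URL.test(s)).length / nonEmpty.length
    : 0;

  // Assumed rule: a header hint plus mostly URL-like values,
  // or overwhelmingly URL-like values regardless of the header.
  return (headerHit && urlShare >= 0.5) || urlShare >= 0.8;
}
```

Under these assumptions, `isUrlColumn("Company Website", ["https://www.example.com/page", "google.com"])` returns `true`, while a column of email addresses returns `false` because the pattern rejects values containing `@` before the host.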
Examples
http://www.google.com → google.com
https://www.example.com/page?query=1 → example.com
www.facebook.com/profile#section → facebook.com
subdomain.site.co.uk/path → site.co.uk
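The four examples above can be reproduced with a few string operations. The following is a minimal sketch, not the project's URLNormalizer: `toRootDomain` and `MULTI_PART_TLDS` are hypothetical names, and the suffix set is a short, deliberately incomplete stand-in for the 18+ multi-part TLDs the release mentions.

```typescript
// Hypothetical sketch; the suffix list is an incomplete placeholder.
const MULTI_PART_TLDS = new Set(["co.uk", "org.uk", "com.au", "co.nz", "co.jp", "com.br"]);

function toRootDomain(input: string): string {
  const host = input
    .trim()
    .toLowerCase()
    .replace(/^[a-z][a-z0-9+.-]*:\/\//, "") // strip protocol (http://, https://, ftp://, ...)
    .replace(/[\/?#].*$/, "")               // strip path, query, and fragment
    .replace(/:\d+$/, "")                   // strip a trailing port, if any
    .replace(/^www\./, "");                 // strip the www prefix

  const labels = host.split(".");
  if (labels.length <= 2) return host;

  // Keep three labels when the last two form a known multi-part TLD, otherwise two.
  const lastTwo = labels.slice(-2).join(".");
  const keep = MULTI_PART_TLDS.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(toRootDomain("http://www.google.com"));                // google.com
console.log(toRootDomain("https://www.example.com/page?query=1")); // example.com
console.log(toRootDomain("www.facebook.com/profile#section"));     // facebook.com
console.log(toRootDomain("subdomain.site.co.uk/path"));            // site.co.uk
```

The sketch does not handle credentials in the URL (`user@host`) or validate the result; a production version would more likely consult the Public Suffix List than a hand-maintained set.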
Technical Details
- URLNormalizer Utility Class: Three main methods
  - normalize(url): Returns detailed result with metadata
  - normalizeString(url): Simplified version for CSV processing
  - normalizeBatch(urls): Batch processing for multiple URLs
- Integration with Intelligent Engine: Added 'url' DataType to UnifiedNormalizationEngine
- Seamless integration with existing normalization pipeline
- Lazy import for optimal performance
- Metadata includes: domain, subdomain, tld, isValid, confidence
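To make the relationship between the three methods and the metadata fields concrete, here is one possible shape. It is a sketch under stated assumptions: the class name `SketchUrlNormalizer`, the use of static methods, the `UrlResult` interface, the short `MULTI_PART` set, and the confidence rule are all invented for illustration. Only the method names and the metadata field names (domain, subdomain, tld, isValid, confidence) come from the notes above.

```typescript
// Hypothetical shape; not the actual URLNormalizer implementation.
interface UrlResult {
  domain: string;     // root domain, e.g. "site.co.uk" ("" when invalid)
  subdomain: string;  // e.g. "subdomain", or "" when absent
  tld: string;        // e.g. "co.uk"
  isValid: boolean;
  confidence: number; // 0-1
}

const MULTI_PART = new Set(["co.uk", "com.au", "co.nz"]); // incomplete placeholder
const LABEL = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;        // one hostname label

class SketchUrlNormalizer {
  // Detailed result with metadata.
  static normalize(url: string): UrlResult {
    const host = url
      .trim()
      .toLowerCase()
      .replace(/^[a-z][a-z0-9+.-]*:\/\//, "")
      .replace(/[\/?#].*$/, "")
      .replace(/^www\./, "");
    const labels = host.split(".");
    const tldSize = MULTI_PART.has(labels.slice(-2).join(".")) ? 2 : 1;
    const tld = labels.slice(-tldSize).join(".");
    const name = labels[labels.length - tldSize - 1] ?? "";
    const subdomain = labels.slice(0, Math.max(0, labels.length - tldSize - 1)).join(".");
    const isValid = name !== "" && labels.every((l) => LABEL.test(l));
    // Assumed scoring: 0 when invalid, 1 for a known or alphabetic TLD, 0.5 otherwise.
    const confidence = !isValid
      ? 0
      : MULTI_PART.has(tld) || /^[a-z]{2,}$/.test(tld)
        ? 1
        : 0.5;
    return { domain: isValid ? `${name}.${tld}` : "", subdomain, tld, isValid, confidence };
  }

  // Simplified version for CSV cells: just the root domain string.
  static normalizeString(url: string): string {
    return SketchUrlNormalizer.normalize(url).domain;
  }

  // Batch processing: one result per input, in order.
  static normalizeBatch(urls: string[]): UrlResult[] {
    return urls.map((u) => SketchUrlNormalizer.normalize(u));
  }
}
```

With this sketch, `SketchUrlNormalizer.normalize("subdomain.site.co.uk/path")` returns `domain: "site.co.uk"`, `subdomain: "subdomain"`, and `tld: "co.uk"`, and an input with no dot such as `"not a url"` comes back with `isValid: false` and `confidence: 0`. Having `normalizeString` and `normalizeBatch` delegate to `normalize` keeps the parsing rules in a single place.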
Test Coverage
40 comprehensive tests (100% pass rate):
- Basic URL normalization (4 tests)
- Protocol removal (4 tests)
- WWW prefix removal (3 tests)
- Path/query/fragment removal (6 tests)
- Subdomain handling (3 tests)
- International domains (4 tests)
- Edge cases (6 tests)
- Confidence scoring (3 tests)
- Batch normalization (1 test)
- String normalization (1 test)
- Real-world examples (5 tests)
What's Changed
- Updated version to 3.48.0 in package.json and versionManager.ts
- Added comprehensive URL normalization feature
- Updated README.md with v3.48.0 overview
- Updated CHANGELOG.md with detailed v3.48.0 entry
- All existing features remain fully functional
Full Changelog: v3.45.0...v3.48.0