Commit f1a4300

feat: add data integrity page with bulk parent discovery
Database maintenance dashboard at /db/:dbId/integrity showing provider coverage gaps, missing parent links, orphaned edges, and stale data. Includes bulk parent ID discovery with SSE progress and cancellation.
1 parent 53c5640 commit f1a4300

11 files changed

Lines changed: 1624 additions & 2 deletions

.changelog/v0.5.x.md

Lines changed: 11 additions & 0 deletions
@@ -119,6 +119,17 @@ data/provider-cache/
 ### Dead Code Removal (v0.5.3)
 - Deleted `UnifiedPlatformSection.tsx` (superseded by ProviderDataTable)
 
+### Data Integrity Page + Bulk Discovery (v0.5.4)
+- Database maintenance dashboard at `/db/:dbId/integrity` with summary cards and tabbed interface
+- Integrity service: provider coverage gaps, parent linkage gaps, orphaned edges, stale cache detection
+- Bulk discovery service: database-wide automated parent ID discovery with SSE progress and cancellation
+- Parents tab: provider selector, "Discover All" button with real-time progress bar
+- Coverage tab: persons with incomplete provider links shown with linked/missing badges
+- Orphans tab: broken parent edges with missing person indicators
+- Stale tab: configurable age threshold with color-coded staleness
+- API endpoints at `/api/integrity/:dbId` for all checks and bulk operations
+- Sidebar nav item with ShieldCheck icon under each database
+
 ## Full Changelog
 
 **Full Diff**: https://github.com/atomantic/SparseTree/compare/v0.4.16...v0.5.0
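
The changelog entry above mentions SSE progress for bulk discovery. As a rough illustration of what a single progress frame on such a stream could look like, here is a minimal TypeScript sketch. The fields of `BulkDiscoveryProgress` are not shown in this diff, so `SseProgressSketch`, `formatSseFrame`, and `percentComplete` are hypothetical names; only the `data:` line terminated by a blank line is standard Server-Sent Events framing.

```typescript
// Hypothetical shape: the real BulkDiscoveryProgress type is not shown in this diff.
interface SseProgressSketch {
  operationId: string;
  processed: number;
  total: number;
  done: boolean;
}

// Serialize one progress update as a Server-Sent Events frame.
// A frame is one or more "field: value" lines terminated by a blank line.
function formatSseFrame(progress: SseProgressSketch): string {
  return `data: ${JSON.stringify(progress)}\n\n`;
}

// Percentage for a progress bar, guarding against an empty work list.
function percentComplete(progress: SseProgressSketch): number {
  if (progress.total === 0) return 100;
  return Math.round((progress.processed / progress.total) * 100);
}
```

A client reading the stream could parse the payload after the `data: ` prefix with `JSON.parse` and feed the result to `percentComplete` to drive the progress bar.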

PLAN.md

Lines changed: 47 additions & 0 deletions
@@ -33,6 +33,7 @@ High-level project roadmap. For detailed phase documentation, see [docs/roadmap.
 | 15.13 | Provider comparison table + LinkedIn ||
 | 15.14 | Code quality refactoring | 📋 |
 | 15.16 | Ancestry photo upload ||
+| 15.17 | Data integrity + bulk discovery ||
 | 16 | Multi-platform sync architecture | 📋 |
 | 17 | Real-time event system (Socket.IO) | 📋 |
 
@@ -255,6 +256,52 @@ See [docs/architecture.md](./docs/architecture.md) for full details.
 
 ## Next Steps
 
+### Phase 15.17: Data Integrity Page + Bulk Discovery
+
+Database maintenance dashboard with automated parent ID discovery:
+
+- **Integrity Service**: `integrity.service.ts` - SQL-based checks for data quality
+  - `getIntegritySummary()` - Counts for all check types
+  - `getProviderCoverageGaps()` - Persons with some but not all provider links
+  - `getParentLinkageGaps()` - Parent edges where parent lacks provider link the child has
+  - `getOrphanedEdges()` - Parent edges referencing non-existent person records
+  - `getStaleProviderData()` - Provider cache files older than N days
+- **Bulk Discovery Service**: `bulk-discovery.service.ts` - Database-wide parent ID discovery
+  - Async generator yielding `BulkDiscoveryProgress` events for SSE streaming
+  - Deduplicates by child_id (one scrape discovers both parents)
+  - Reuses existing `parentDiscoveryService.discoverParentIds()` per child
+  - Rate limited via `PROVIDER_DEFAULTS[provider].rateLimitDefaults`
+  - In-memory cancellation via `Set<operationId>` checked between iterations
+- **API Endpoints** (`/api/integrity/:dbId`):
+  - `GET /` - Full integrity summary
+  - `GET /coverage` - Provider coverage gaps (?providers=fs,ancestry)
+  - `GET /parents` - Parent linkage gaps (?provider=familysearch)
+  - `GET /orphans` - Orphaned edges
+  - `GET /stale` - Stale records (?days=30)
+  - `POST /discover-all` - Start bulk discovery
+  - `GET /discover-all/events` - SSE stream for progress
+  - `POST /discover-all/cancel` - Cancel running operation
+- **UI**: `IntegrityPage.tsx` at `/db/:dbId/integrity`
+  - Summary cards (4 check types with counts, clickable)
+  - Tabbed interface: Parents | Coverage | Orphans | Stale
+  - Parents tab: provider selector, "Discover All" button with SSE progress bar + cancel
+  - Coverage tab: table of persons with linked/missing provider badges
+  - Orphans tab: table of broken parent edges
+  - Stale tab: configurable days threshold, table with age coloring
+  - Sidebar nav item with ShieldCheck icon
+- **Shared Types**: `IntegritySummary`, `ProviderCoverageGap`, `ParentLinkageGap`, `OrphanedEdge`, `StaleRecord`, `BulkDiscoveryProgress`
+- **Files Created**:
+  - `server/src/services/integrity.service.ts`
+  - `server/src/services/bulk-discovery.service.ts`
+  - `server/src/routes/integrity.routes.ts`
+  - `client/src/components/integrity/IntegrityPage.tsx`
+- **Files Modified**:
+  - `shared/src/index.ts` - New types
+  - `server/src/index.ts` - Route mount
+  - `client/src/services/api.ts` - API methods + type re-exports
+  - `client/src/App.tsx` - Route
+  - `client/src/components/layout/Sidebar.tsx` - Nav item
+
 ### Phase 15.14: Code Quality Refactoring (Pre-Phase 16 Cleanup)
 
 Code audit identified DRY/YAGNI/performance issues to address before Phase 16:
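
The Bulk Discovery Service bullets in the Phase 15.17 entry describe an async generator that deduplicates by `child_id`, waits between requests, and checks an in-memory `Set` for cancellation. The service file itself is not shown on this page, so the following is only a sketch of that pattern under stated assumptions: `DiscoveryProgressSketch`, `ParentGapSketch`, `discoverAllSketch`, and `cancelOperation` are hypothetical names, `discover` stands in for `parentDiscoveryService.discoverParentIds()`, and `delayMs` stands in for whatever `PROVIDER_DEFAULTS[provider].rateLimitDefaults` actually provides.

```typescript
// Hypothetical event shape; the real BulkDiscoveryProgress type is not shown in this diff.
interface DiscoveryProgressSketch {
  operationId: string;
  currentChildId: string; // child just processed, or next in line when cancelled
  processed: number;
  total: number;
  cancelled: boolean;
}

// One parent linkage gap row (field names assumed for illustration).
interface ParentGapSketch {
  child_id: string;
  parent_id: string;
}

// In-memory cancellation registry, checked between iterations.
const cancelledOperations = new Set<string>();

function cancelOperation(operationId: string): void {
  cancelledOperations.add(operationId);
}

// One scrape of a child discovers both parents, so each child is visited once.
function uniqueChildIds(gaps: ParentGapSketch[]): string[] {
  return Array.from(new Set(gaps.map((gap) => gap.child_id)));
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Yields one progress event per child; stops early if the operation is cancelled.
async function* discoverAllSketch(
  operationId: string,
  gaps: ParentGapSketch[],
  discover: (childId: string) => Promise<void>, // assumed stand-in for the real service call
  delayMs: number, // assumed stand-in for the provider rate limit
): AsyncGenerator<DiscoveryProgressSketch> {
  const childIds = uniqueChildIds(gaps);
  const total = childIds.length;
  let processed = 0;
  for (const childId of childIds) {
    if (cancelledOperations.has(operationId)) {
      cancelledOperations.delete(operationId);
      yield { operationId, currentChildId: childId, processed, total, cancelled: true };
      return;
    }
    await discover(childId);
    processed += 1;
    yield { operationId, currentChildId: childId, processed, total, cancelled: false };
    // Pause between requests, but not after the last one.
    if (processed < total) await sleep(delayMs);
  }
}
```

Because cancellation is only checked at the top of each iteration, a request already in flight finishes before the generator stops, which matches the "checked between iterations" wording in the plan. An SSE route handler could consume this with `for await` and write one frame per yielded event.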

client/src/App.tsx

Lines changed: 2 additions & 0 deletions
@@ -14,6 +14,7 @@ import { ReportsPage } from './pages/ReportsPage';
 import { FavoritesPage } from './components/favorites/FavoritesPage';
 import { SparseTreePage } from './components/favorites/SparseTreePage';
 import { DatabaseFavoritesPage } from './components/favorites/DatabaseFavoritesPage';
+import { IntegrityPage } from './components/integrity/IntegrityPage';
 
 function App() {
   return (
@@ -38,6 +39,7 @@ function App() {
         <Route path="favorites" element={<FavoritesPage />} />
         <Route path="favorites/sparse-tree/:dbId" element={<SparseTreePage />} />
         <Route path="db/:dbId/favorites" element={<DatabaseFavoritesPage />} />
+        <Route path="db/:dbId/integrity" element={<IntegrityPage />} />
       </Route>
     </Routes>
   );
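
The route added here mounts the page at `db/:dbId/integrity`, while the PLAN.md entry in this commit places the API under `/api/integrity/:dbId` with query parameters such as `?days=30`. The client methods in `client/src/services/api.ts` are not shown on this page, so the helper below is a hypothetical sketch of how those endpoint URLs could be assembled from the same `dbId`; `IntegrityCheck` and `integrityUrl` are names invented for illustration.

```typescript
// Checks listed in PLAN.md; the empty string selects the summary endpoint (GET /).
type IntegrityCheck = "" | "coverage" | "parents" | "orphans" | "stale";

// Build a URL under /api/integrity/:dbId with optional query parameters.
// Hypothetical helper: the real client methods are not shown in this diff.
function integrityUrl(
  dbId: string,
  check: IntegrityCheck,
  query: Record<string, string | number> = {},
): string {
  const base = `/api/integrity/${encodeURIComponent(dbId)}`;
  const path = check === "" ? base : `${base}/${check}`;
  const pairs = Object.keys(query).map(
    (key) => `${encodeURIComponent(key)}=${encodeURIComponent(String(query[key]))}`,
  );
  return pairs.length > 0 ? `${path}?${pairs.join("&")}` : path;
}
```

For example, `integrityUrl("db1", "stale", { days: 30 })` produces `/api/integrity/db1/stale?days=30`. Note that `encodeURIComponent` escapes the comma in a value like `fs,ancestry`, so the server would see it only after normal query-string decoding.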
