README.md (34 additions, 2 deletions)
@@ -38,6 +38,8 @@ DeepDiff DB makes the entire process deterministic, reviewable, and safe by:
 - **Progress tracking** - Visual progress bars and spinners for long-running operations
 - **Checkpoint/resume** - Resume interrupted operations from saved checkpoints
 - **Enhanced error handling** - Rich error messages with actionable suggestions
+- **Streaming large datasets** - Keyset-paginated batch hashing keeps memory bounded at any table size
+- **Parallel table hashing** - Hash multiple tables concurrently with a configurable worker pool

 ### Safety Features
@@ -247,6 +249,16 @@ Resolution strategies:
 - `theirs`: Use development values (accept dev changes)
 - `manual`: Require interactive decision for each conflict
+
+**Performance Configuration (v0.7+):**
+
+- `performance.hash_batch_size`: Rows per keyset-paginated query during table hashing. `0` disables batching (loads all rows in one query). Default: `10000`
+- `performance.max_parallel_tables`: Maximum number of tables hashed concurrently. Default: `1`
+
+```yaml
+performance:
+  hash_batch_size: 10000    # ~1–2 MB per page; keeps heap bounded on any table size
+  max_parallel_tables: 2    # hash prod tables in parallel; raises throughput ~2× on dual-core
+```

 An example configuration file is included at `deepdiffdb.config.yaml.example`.
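The `max_parallel_tables` setting can be pictured as a bounded worker pool built on a counting semaphore. The sketch below is illustrative only, not the tool's actual API: `hashTables` and `hashOne` are hypothetical names standing in for the real per-table hashing routine.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// hashTables hashes each named table with at most maxParallel hashers
// running at once. hashOne stands in for the real per-table routine.
func hashTables(tables []string, maxParallel int, hashOne func(string) string) map[string]string {
	sem := make(chan struct{}, maxParallel) // counting semaphore bounding concurrency
	var mu sync.Mutex
	var wg sync.WaitGroup
	out := make(map[string]string, len(tables))

	for _, t := range tables {
		wg.Add(1)
		go func(t string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a worker slot
			defer func() { <-sem }() // release it when done
			h := hashOne(t)
			mu.Lock()
			out[t] = h
			mu.Unlock()
		}(t)
	}
	wg.Wait()
	return out
}

func main() {
	res := hashTables([]string{"users", "orders"}, 2, func(t string) string {
		return "hash(" + t + ")" // placeholder for real table hashing
	})
	keys := make([]string, 0, len(res))
	for k := range res {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Println(k, res[k])
	}
}
```

Setting `maxParallel` to 1 reproduces the default sequential behavior; raising it lets independent tables overlap I/O and hashing.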
## Commands
@@ -314,6 +326,18 @@ Performs a full comparison of both schema and data.
@@ -513,7 +545,7 @@ DeepDiff DB uses a multi-stage approach to ensure safe and accurate database syn
 7. **Migration Generation** - Creates SQL migration scripts with proper ordering and batching
 8. **Transactional Application** - Applies changes within a single transaction for atomicity

-The tool processes data in chunks for large tables and provides progress indicators for operations exceeding 10,000 rows. Progress bars show throughput (rows/second) and estimated time remaining. Checkpoints are automatically saved during long-running operations, allowing you to resume from interruptions.
+The tool processes data using **keyset-paginated batching** for large tables — each page fetches a bounded number of rows (`WHERE pk > lastVal ORDER BY pk LIMIT N`), keeping heap usage flat regardless of table size. Multiple tables can be hashed concurrently using a bounded goroutine pool. Progress bars show throughput (rows/second) and estimated time remaining. Checkpoints are automatically saved during long-running operations, allowing you to resume from interruptions.
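The keyset-paginated hashing loop can be sketched end to end. This is an illustrative Go sketch under stated assumptions, not the tool's code: `fetchPage` simulates the `WHERE pk > lastVal ORDER BY pk LIMIT N` query against an in-memory, pk-sorted slice, and `hashTable` folds one bounded page at a time into a single digest, so memory stays flat however many rows the table has.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Row stands in for a table row keyed by an integer primary key.
type Row struct {
	PK   int64
	Data string
}

// fetchPage simulates `WHERE pk > lastPK ORDER BY pk LIMIT limit`
// against a pk-sorted slice (a real implementation queries the DB).
func fetchPage(table []Row, lastPK int64, limit int) []Row {
	page := make([]Row, 0, limit)
	for _, r := range table {
		if r.PK > lastPK {
			page = append(page, r)
			if len(page) == limit {
				break
			}
		}
	}
	return page
}

// hashTable streams the table through the hash one page at a time.
func hashTable(table []Row, batchSize int) [32]byte {
	h := sha256.New()
	lastPK := int64(-1 << 62) // cursor starting before any real key
	for {
		page := fetchPage(table, lastPK, batchSize)
		if len(page) == 0 {
			break // no rows past the cursor: done
		}
		for _, r := range page {
			var pk [8]byte
			binary.BigEndian.PutUint64(pk[:], uint64(r.PK))
			h.Write(pk[:])
			h.Write([]byte(r.Data))
		}
		lastPK = page[len(page)-1].PK // keyset cursor for the next page
	}
	var sum [32]byte
	copy(sum[:], h.Sum(nil))
	return sum
}

func main() {
	table := []Row{{1, "a"}, {2, "b"}, {3, "c"}, {5, "d"}, {8, "e"}}
	// The digest must not depend on how the rows were batched.
	fmt.Println(hashTable(table, 2) == hashTable(table, 5)) // true
}
```

Because each page is bounded and the cursor advances by primary key rather than `OFFSET`, page cost stays constant even deep into the table, and the resulting digest is identical for any `hash_batch_size`.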
## Architecture
@@ -584,7 +616,7 @@ Current limitations and known constraints:
 - **Database Support** - MSSQL and Oracle are not yet supported (planned for future releases)
 - **Schema Auto-merge** - Schema differences must be resolved manually
 - **Primary Key Requirement** - All tables must have primary keys (unless explicitly ignored)
-- **Large Database Performance** - Very large databases may produce large diff files and require significant processing time
+- **Large Database Performance** - Very large tables are handled with keyset-paginated batching (v0.7+); diff output files may still be large for tables with many changed rows
 - **Conflict Resolution** - Complex merge strategies (e.g., column-level merging) are not supported
 - **SQLite Constraints** - SQLite has limited support for ALTER TABLE operations