| 1 | +# Fuzz Test — OLR Accuracy Validation |
| 2 | + |
| 3 | +Validates OLR data accuracy under randomized workloads on Oracle RAC by |
| 4 | +comparing OLR's CDC output against LogMiner event-by-event. |
| 5 | + |
| 6 | +## Quick Start |
| 7 | + |
| 8 | +```bash |
| 9 | +cd tests/dbz-twin/rac |
| 10 | + |
| 11 | +./fuzz-test.sh up # start infrastructure |
| 12 | +./fuzz-test.sh run 60 # run 60-minute workload |
| 13 | +./fuzz-test.sh validate # compare results |
| 14 | +./fuzz-test.sh down # clean up |
| 15 | +``` |
| 16 | + |
| 17 | +## Architecture |
| 18 | + |
| 19 | +``` |
| 20 | +Oracle RAC (2 nodes) |
| 21 | + └─ PL/SQL fuzz workload (random DML, event_id on every row) |
| 22 | + ├─ Debezium LogMiner adapter ─→ Kafka topic: lm-events |
| 23 | + └─ Debezium OLR adapter ─→ Kafka topic: olr-events |
| 24 | + │ |
| 25 | + Python Kafka consumer |
| 26 | + │ |
| 27 | + SQLite (lm_events + olr_events) |
| 28 | + │ |
| 29 | + Python validator |
| 30 | + Compares by event_id |
| 31 | +``` |
| 32 | + |
| 33 | +Both Debezium adapters read from the same Oracle redo logs. LogMiner is the |
| 34 | +reference (Oracle's own CDC). OLR is the system under test. If both produce |
| 35 | +the same events for the same DML, OLR is accurate. |
| 36 | + |
| 37 | +## Components |
| 38 | + |
| 39 | +### Load Generator (`perf/fuzz-workload.sql`) |
| 40 | + |
| 41 | +PL/SQL package that generates random DML across 7 table types: |
| 42 | + |
| 43 | +| Table | Tests | |
| 44 | +|-------|-------| |
| 45 | +| FUZZ_SCALAR | Core types: VARCHAR2, NUMBER, FLOAT, DOUBLE, DATE, TIMESTAMP, RAW | |
| 46 | +| FUZZ_WIDE | 40+ columns — multi-block redo records | |
| 47 | +| FUZZ_LOB | CLOB + BLOB — LOB redo opcodes, out-of-row storage | |
| 48 | +| FUZZ_PART | List-partitioned — data-obj-id resolution | |
| 49 | +| FUZZ_NOPK | No primary key — ROWID-based supplemental logging | |
| 50 | +| FUZZ_MAXSTR | Two VARCHAR2(4000) — near block-boundary rows | |
| 51 | +| FUZZ_INTERVAL | INTERVAL YEAR TO MONTH, DAY TO SECOND | |
| 52 | + |
| 53 | +Transaction patterns: |
| 54 | +- 55% immediate commit |
| 55 | +- 15% batched commit (2-5 operations) |
| 56 | +- 10% full rollback |
| 57 | +- 10% savepoint + partial rollback |
| 58 | +- 10% large transaction (10-30 operations) |
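
As a rough illustration of the mix above, the following Python sketch draws transaction patterns with the listed weights. The real generator is a PL/SQL package; the pattern labels and function names here are hypothetical and serve only to show the weighting.

```python
import random
from collections import Counter

# Weights (percent) copied from the list above. The keys are hypothetical
# labels chosen for this sketch, not names from the PL/SQL package.
PATTERN_WEIGHTS = {
    "immediate_commit": 55,
    "batched_commit": 15,
    "full_rollback": 10,
    "savepoint_partial_rollback": 10,
    "large_transaction": 10,
}


def pick_pattern(rng: random.Random) -> str:
    """Draw one transaction pattern according to the documented weights."""
    names = list(PATTERN_WEIGHTS)
    weights = list(PATTERN_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def sample_distribution(n: int, seed: int = 0) -> Counter:
    """Draw n patterns with a fixed seed and count how often each appears."""
    rng = random.Random(seed)
    return Counter(pick_pattern(rng) for _ in range(n))
```

Seeding the generator keeps a sampled sequence reproducible, which may help when replaying a failing run.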
| 59 | + |
| 60 | +Every row has a globally unique `event_id` column (`N{node}_{seq:08d}`). |
| 61 | +This is the key for comparison — no ordering assumptions needed. |
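
A minimal Python sketch of the `N{node}_{seq:08d}` format, assuming `node` and `seq` are non-negative integers. The helper names are hypothetical; the actual IDs are produced inside the PL/SQL package.

```python
import re
from typing import Tuple

# N, the node number, an underscore, then a sequence padded to 8+ digits.
EVENT_ID_RE = re.compile(r"^N(\d+)_(\d{8,})$")


def make_event_id(node: int, seq: int) -> str:
    """Format an event_id such as N1_00000042."""
    return f"N{node}_{seq:08d}"


def parse_event_id(event_id: str) -> Tuple[int, int]:
    """Split an event_id back into (node, seq); raise on malformed input."""
    match = EVENT_ID_RE.match(event_id)
    if match is None:
        raise ValueError(f"not a fuzz event_id: {event_id!r}")
    return int(match.group(1)), int(match.group(2))
```

Because the sequence is zero-padded, IDs from the same node sort lexicographically in the same order as their sequence numbers, as long as the sequence stays within 8 digits.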
| 62 | + |
| 63 | +### Kafka (single broker, KRaft) |
| 64 | + |
| 65 | +Single topic per adapter (`lm-events`, `olr-events`). All tables routed to |
| 66 | +one topic via `RegexRouter` to preserve commit order within each adapter. |
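
The routing could look roughly like the sketch below, written as a Python dict of connector properties plus a small function that imitates the rename. `RegexRouter` and its `regex` and `replacement` properties are part of Kafka Connect; the transform alias `route`, the source topic pattern, and the `route` helper are assumptions for illustration, not the values used by this test.

```python
import re

# Connector properties for Kafka Connect's built-in RegexRouter transform.
# The alias "route" and the regex below are assumptions for this sketch.
LM_ROUTING = {
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": r"lm\.OLR_TEST\..*",
    "transforms.route.replacement": "lm-events",
}


def route(topic: str, config: dict = LM_ROUTING) -> str:
    """Imitate the rename: a topic that fully matches the regex is replaced.

    Capture-group references in the replacement are not handled here.
    """
    regex = config["transforms.route.regex"]
    if re.fullmatch(regex, topic):
        return config["transforms.route.replacement"]
    return topic
```

With every table topic collapsed into one, events from one adapter are read back in the order the connector produced them, provided the topic has a single partition.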
| 67 | + |
| 68 | +### Consumer (`kafka-consumer.py`) |
| 69 | + |
| 70 | +Subscribes to both topics. For each event: |
| 71 | +1. Extracts `event_id` from Debezium JSON (`after.EVENT_ID` or `before.EVENT_ID`) |
| 72 | +2. Writes to SQLite: `(event_id, seq, table_name, op, raw_json, consumed_at)` |
| 73 | +3. Skips `FUZZ_STATS` table and `event_id='SEED'` rows |
| 74 | + |
| 75 | +The `seq` column distinguishes multiple CDC events that share one event_id, |
| 76 | +which happens when LogMiner splits a LOB operation. SQLite runs in WAL mode so readers are not blocked while the consumer writes. |
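
The three steps above can be sketched as follows. This is not the code in `kafka-consumer.py`: the function names are hypothetical, the Kafka polling loop is omitted, and the mapping from Debezium's `c`/`u`/`d` op codes to `INSERT`/`UPDATE`/`DELETE`, as well as deriving `seq` from the number of rows already stored, are assumptions.

```python
import json
import sqlite3
import time

OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE"}  # assumed mapping

SCHEMA = """
CREATE TABLE IF NOT EXISTS {table} (
    event_id TEXT NOT NULL, seq INTEGER NOT NULL, table_name TEXT NOT NULL,
    op TEXT NOT NULL, raw_json TEXT NOT NULL, consumed_at REAL NOT NULL,
    PRIMARY KEY (event_id, seq)
)
"""


def open_db(path: str) -> sqlite3.Connection:
    """Open the database in WAL mode and create both event tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    for table in ("lm_events", "olr_events"):
        conn.execute(SCHEMA.format(table=table))
    return conn


def extract_event_id(payload: dict):
    """Return after.EVENT_ID, falling back to before.EVENT_ID (for deletes)."""
    for side in ("after", "before"):
        row = payload.get(side)
        if row and row.get("EVENT_ID") is not None:
            return row["EVENT_ID"]
    return None


def store_event(conn: sqlite3.Connection, table: str, raw_json: str) -> bool:
    """Store one Debezium message; return False if it was skipped."""
    doc = json.loads(raw_json)
    if not isinstance(doc, dict):
        return False
    payload = doc.get("payload", doc)  # envelope with or without schema
    table_name = (payload.get("source") or {}).get("table", "")
    event_id = extract_event_id(payload)
    if event_id is None or event_id == "SEED" or table_name == "FUZZ_STATS":
        return False
    op = OP_NAMES.get(payload.get("op"), str(payload.get("op")))
    # Next seq for this event_id: 0 for the first event, 1 for the second, ...
    (seq,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE event_id = ?", (event_id,)
    ).fetchone()
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, seq, table_name, op, raw_json, time.time()),
    )
    conn.commit()
    return True
```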
| 77 | + |
| 78 | +### Validator (`validator.py`) |
| 79 | + |
| 80 | +Walks both SQLite tables sorted by event_id using a per-node watermark: |
| 81 | + |
| 82 | +1. For each RAC node (N1, N2): `frontier = min(max_lm_event_id, max_olr_event_id)` |
| 83 | +2. Fetch event_ids within each node's frontier |
| 84 | +3. For each event_id: |
| 85 | + - In both → compare table, op, column values (with LOB merge) |
| 86 | + - In OLR only → extra (phantom transaction) |
| 87 | + - In LM only → missing from OLR |
| 88 | +4. Mismatches on LOB tables are classified as known issues (olr#26) |
| 89 | +5. Any non-LOB mismatch fails the run |
| 90 | + |
| 91 | +Exit 0 = PASS (no non-LOB mismatches), exit 1 = FAIL. |
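
The per-node frontier and the three-way classification can be sketched as below. This is a simplified illustration, not the code in `validator.py`: it works on plain collections of event_ids, skips the column-level comparison and LOB merge, and ignores any node seen by only one adapter. All function names are hypothetical.

```python
from typing import Dict, Iterable, List


def node_of(event_id: str) -> str:
    """Return the node prefix, e.g. 'N1' for 'N1_00000042'."""
    return event_id.split("_", 1)[0]


def max_per_node(event_ids: Iterable[str]) -> Dict[str, str]:
    """Highest event_id seen for each node (zero-padding makes str order work)."""
    highest: Dict[str, str] = {}
    for eid in event_ids:
        node = node_of(eid)
        if node not in highest or eid > highest[node]:
            highest[node] = eid
    return highest


def classify(lm_ids: Iterable[str], olr_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket event_ids inside each node's frontier into both/olr_only/lm_only."""
    lm_set, olr_set = set(lm_ids), set(olr_ids)
    lm_max, olr_max = max_per_node(lm_set), max_per_node(olr_set)
    result: Dict[str, List[str]] = {"both": [], "olr_only": [], "lm_only": []}
    for node in sorted(set(lm_max) & set(olr_max)):
        # Only compare up to the point that both adapters have reached.
        frontier = min(lm_max[node], olr_max[node])
        lm_scope = {e for e in lm_set if node_of(e) == node and e <= frontier}
        olr_scope = {e for e in olr_set if node_of(e) == node and e <= frontier}
        result["both"] += sorted(lm_scope & olr_scope)
        result["olr_only"] += sorted(olr_scope - lm_scope)
        result["lm_only"] += sorted(lm_scope - olr_scope)
    return result
```

Limiting the comparison to the frontier avoids reporting events as missing when one adapter simply has not caught up yet.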
| 92 | + |
| 93 | +## Commands |
| 94 | + |
| 95 | +| Command | Description | |
| 96 | +|---------|-------------| |
| 97 | +| `./fuzz-test.sh up` | Start Kafka, Debezium, consumer, OLR. Deploy fuzz tables. | |
| 98 | +| `./fuzz-test.sh run [min]` | Run the fuzz workload for `min` minutes (default: 30) | |
| 99 | +| `./fuzz-test.sh status` | Show container status, consumer counts, OLR memory | |
| 100 | +| `./fuzz-test.sh validate` | Wait for consumer drain, run validator, report PASS/FAIL | |
| 101 | +| `./fuzz-test.sh logs <c>` | Show logs: kafka, logminer, olr, consumer, validator, olr-vm | |
| 102 | +| `./fuzz-test.sh down` | Stop all containers and remove volumes | |
| 103 | + |
| 104 | +## Prerequisites |
| 105 | + |
| 106 | +- RAC VM running with Oracle containers started |
| 107 | +- OLR dev image built (`make build`) |
| 108 | +- OLR image loaded on RAC VM (`podman load`) |
| 109 | +- One-time setup done (`./setup.sh` — creates `c##dbzuser` + grants) |
| 110 | +- `CREATE PROCEDURE` grant for `olr_test` user (for PL/SQL package) |
| 111 | + |
| 112 | +## Known Issues |
| 113 | + |
| 114 | +- **LOB phantom transactions (olr#26)**: OLR emits entire phantom committed |
| 115 | + transactions on FUZZ_LOB that LogMiner does not see, affecting ~0.1% of LOB events. |
| 116 | + These are classified as `lob_known` in validator output and do not fail the test. |
| 117 | + |
| 118 | +- **LOB UPDATE variant (olr#10)**: Occasional LOB UPDATE events are present in |
| 119 | + LogMiner but absent from OLR. This has the same phantom undo root cause. |
| 120 | + |
| 121 | +- Non-LOB tables are **100% accurate** in all testing so far. |
| 122 | + |
| 123 | +## SQLite Schema |
| 124 | + |
| 125 | +```sql |
| 126 | +CREATE TABLE lm_events ( |
| 127 | + event_id TEXT NOT NULL, |
| 128 | + seq INTEGER NOT NULL, -- 0 normally, >0 for LOB splits |
| 129 | + table_name TEXT NOT NULL, |
| 130 | + op TEXT NOT NULL, -- INSERT, UPDATE, DELETE |
| 131 | + raw_json TEXT NOT NULL, -- full Debezium envelope |
| 132 | + consumed_at REAL NOT NULL, |
| 133 | + PRIMARY KEY (event_id, seq) |
| 134 | +); |
| 135 | +-- olr_events: identical schema |
| 136 | +``` |
| 137 | + |
| 138 | +The database persists after `down` is called. Query it directly for |
| 139 | +investigation: |
| 140 | + |
| 141 | +```bash |
| 142 | +docker run --rm -v rac_fuzz-data:/data python:3.12-slim python3 -c " |
| 143 | +import sqlite3 |
| 144 | +conn = sqlite3.connect('/data/fuzz.db') |
| 145 | +# Example: find all phantom events |
| 146 | +for r in conn.execute(''' |
| 147 | + SELECT o.event_id, o.table_name, o.op |
| 148 | + FROM olr_events o LEFT JOIN lm_events l ON o.event_id = l.event_id |
| 149 | + WHERE l.event_id IS NULL ORDER BY o.event_id |
| 150 | +''').fetchall(): |
| 151 | + print(r) |
| 152 | +" |
| 153 | +``` |