Commit 185a2f6

docs: add fuzz test documentation
1 parent 139ebaa commit 185a2f6

1 file changed

Lines changed: 153 additions & 0 deletions

tests/dbz-twin/rac/FUZZ-TEST.md

@@ -0,0 +1,153 @@
# Fuzz Test — OLR Accuracy Validation

Validates OLR (OpenLogReplicator) data accuracy under randomized workloads on
Oracle RAC by comparing OLR's CDC output against LogMiner event by event.

## Quick Start

```bash
cd tests/dbz-twin/rac

./fuzz-test.sh up        # start infrastructure
./fuzz-test.sh run 60    # run 60-minute workload
./fuzz-test.sh validate  # compare results
./fuzz-test.sh down      # clean up
```

## Architecture

```
Oracle RAC (2 nodes)
  └─ PL/SQL fuzz workload (random DML, event_id on every row)
       ├─ Debezium LogMiner adapter ─→ Kafka topic: lm-events
       └─ Debezium OLR adapter ─→ Kafka topic: olr-events
                    ↓
          Python Kafka consumer
                    ↓
          SQLite (lm_events + olr_events)
                    ↓
          Python validator
          Compares by event_id
```

Both Debezium adapters read from the same Oracle redo logs. LogMiner is the
reference (Oracle's own CDC). OLR is the system under test. If both produce
the same events for the same DML, OLR is considered accurate for that workload.

## Components

### Load Generator (`perf/fuzz-workload.sql`)

PL/SQL package that generates random DML across 7 table types:

| Table | Tests |
|-------|-------|
| FUZZ_SCALAR | Core types: VARCHAR2, NUMBER, FLOAT, DOUBLE, DATE, TIMESTAMP, RAW |
| FUZZ_WIDE | 40+ columns: multi-block redo records |
| FUZZ_LOB | CLOB + BLOB: LOB redo opcodes, out-of-row storage |
| FUZZ_PART | List-partitioned: data-obj-id resolution |
| FUZZ_NOPK | No primary key: ROWID-based supplemental logging |
| FUZZ_MAXSTR | Two VARCHAR2(4000): near block-boundary rows |
| FUZZ_INTERVAL | INTERVAL YEAR TO MONTH, DAY TO SECOND |

Transaction patterns:
- 55% immediate commit
- 15% batched commit (2-5 operations)
- 10% full rollback
- 10% savepoint + partial rollback
- 10% large transaction (10-30 operations)
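
The mix above amounts to a weighted random choice. The real generator is a PL/SQL package, so the following Python is only a sketch of the selection logic under that assumption; the pattern names and both function names are illustrative labels, not identifiers from `fuzz-workload.sql`.

```python
import random

# Weights copied from the list above. The names are hypothetical labels.
PATTERNS = [
    ("immediate_commit", 55),
    ("batched_commit", 15),
    ("full_rollback", 10),
    ("savepoint_partial_rollback", 10),
    ("large_transaction", 10),
]


def pattern_for_roll(roll: int) -> str:
    """Map a roll in [0, 100) to a pattern using cumulative weights."""
    upper = 0
    for name, weight in PATTERNS:
        upper += weight
        if roll < upper:
            return name
    raise ValueError(f"roll out of range: {roll}")


def pick_pattern(rng: random.Random) -> str:
    """Draw one transaction pattern with the documented probabilities."""
    return pattern_for_roll(rng.randrange(100))
```

Splitting the deterministic mapping from the random draw makes the distribution easy to check without relying on a seed.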

Every row carries a globally unique `event_id` column (format `N{node}_{seq:08d}`).
The validator matches events on this key, so no ordering assumptions are needed.
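
As a rough illustration of the `N{node}_{seq:08d}` format, the sketch below builds and parses such IDs in Python. The helper names are hypothetical, and the regex accepts more than eight digits on the assumption that a sequence could in principle outgrow the zero padding.

```python
import re

# N<node>_<zero-padded sequence>, e.g. N1_00000042
EVENT_ID_RE = re.compile(r"^N(\d+)_(\d{8,})$")


def make_event_id(node: int, seq: int) -> str:
    """Format an event_id the way the README describes it."""
    return f"N{node}_{seq:08d}"


def parse_event_id(event_id: str) -> tuple[int, int]:
    """Split an event_id into (node, seq); raise on anything else (e.g. 'SEED')."""
    m = EVENT_ID_RE.match(event_id)
    if m is None:
        raise ValueError(f"not a fuzz event_id: {event_id!r}")
    return int(m.group(1)), int(m.group(2))
```

Because the sequence is zero-padded to a fixed width, plain string comparison orders IDs from the same node numerically as long as the sequence stays within eight digits.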

### Kafka (single broker, KRaft)

One topic per adapter (`lm-events`, `olr-events`). All tables are routed to
that topic via `RegexRouter` to preserve commit order within each adapter.
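
A connector-level routing transform of this kind might look like the following, written as a Python dict of Kafka Connect properties. The property keys and the `RegexRouter` class name are standard Kafka Connect; the `route` alias, the source regex, and the sample topic names are assumptions for illustration and are not taken from this repository's connector configs. `routed_topic` only approximates the transform (whole-name match, literal replacement).

```python
import re


def single_topic_transform(target_topic: str, source_regex: str) -> dict[str, str]:
    """Kafka Connect properties that rename matching topics to one target topic."""
    return {
        "transforms": "route",  # hypothetical alias
        "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
        "transforms.route.regex": source_regex,
        "transforms.route.replacement": target_topic,
    }


def routed_topic(topic: str, cfg: dict[str, str]) -> str:
    """Approximate the routing: rename only when the whole topic name matches."""
    if re.fullmatch(cfg["transforms.route.regex"], topic):
        return cfg["transforms.route.replacement"]
    return topic


# Assumed topic naming: <prefix>.<schema>.<table>; the regex is illustrative.
lm_cfg = single_topic_transform("lm-events", r".*\.FUZZ_.*")
```

Keeping the regex narrower than `.*` leaves other connector topics (for example a schema-change topic named after the prefix) unrouted.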

### Consumer (`kafka-consumer.py`)

Subscribes to both topics. For each event:
1. Extracts `event_id` from the Debezium JSON (`after.EVENT_ID` or `before.EVENT_ID`)
2. Skips events from the `FUZZ_STATS` table and rows with `event_id='SEED'`
3. Writes everything else to SQLite: `(event_id, seq, table_name, op, raw_json, consumed_at)`

The `seq` column distinguishes multiple CDC events that share one `event_id`,
which happens when LogMiner splits a LOB operation. SQLite runs in WAL mode so
the database can be read while the consumer is writing.
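
The per-event handling can be sketched as below. This is not the code in `kafka-consumer.py`; it assumes a Debezium envelope with `before`, `after`, `op`, and `source.table` fields (optionally wrapped in `payload`), maps Debezium's `c`/`u`/`d` op codes to the names used in the SQLite schema, and assigns `seq` by counting rows already stored for the same `event_id`. The function names and the `seq` strategy are assumptions.

```python
import json
import sqlite3
import time

DDL = """CREATE TABLE IF NOT EXISTS {name} (
    event_id TEXT NOT NULL, seq INTEGER NOT NULL, table_name TEXT NOT NULL,
    op TEXT NOT NULL, raw_json TEXT NOT NULL, consumed_at REAL NOT NULL,
    PRIMARY KEY (event_id, seq))"""
TABLES = ("lm_events", "olr_events")
OPS = {"c": "INSERT", "u": "UPDATE", "d": "DELETE"}  # Debezium op codes


def open_db(path=":memory:"):
    """Open the database and create both event tables if missing."""
    conn = sqlite3.connect(path)
    for name in TABLES:
        conn.execute(DDL.format(name=name))
    return conn


def extract(raw_json):
    """Return (event_id, table_name, op), or None if the event is skipped."""
    msg = json.loads(raw_json)
    payload = msg.get("payload", msg)  # envelope may be schema-wrapped
    row = payload.get("after") or payload.get("before") or {}
    event_id = row.get("EVENT_ID")
    table_name = (payload.get("source") or {}).get("table")
    if event_id is None or event_id == "SEED" or table_name == "FUZZ_STATS":
        return None
    op_code = payload.get("op")
    return event_id, table_name, OPS.get(op_code, str(op_code))


def store(conn, table, raw_json):
    """Insert one event into lm_events or olr_events; return False if skipped."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    rec = extract(raw_json)
    if rec is None:
        return False
    event_id, table_name, op = rec
    # seq is 0 for the first event with this id, then 1, 2, ... for LOB splits.
    (seq,) = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE event_id = ?", (event_id,)
    ).fetchone()
    conn.execute(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?)",
        (event_id, seq, table_name, op, raw_json, time.time()),
    )
    return True
```

Tombstone messages (null Kafka values) would need to be filtered before `extract` is called, since `json.loads` does not accept `None`.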

### Validator (`validator.py`)

Walks both SQLite tables sorted by event_id using a per-node watermark:

1. For each RAC node (N1, N2): `frontier = min(max_lm_event_id, max_olr_event_id)`
2. Fetch event_ids up to each node's frontier. Events beyond the frontier have
   been delivered by only one adapter so far and are not compared.
3. For each event_id:
   - In both → compare table, op, column values (with LOB merge)
   - In OLR only → extra (phantom transaction)
   - In LM only → missing from OLR
4. LOB table mismatches are classified as known issues (olr#26)
5. Non-LOB mismatches = FAIL

Exit 0 = PASS (no non-LOB mismatches), exit 1 = FAIL.
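
The frontier and the three-way classification in steps 1 to 3 can be sketched over plain sets of event_ids. This is an illustration, not the logic in `validator.py`: it leaves out the column comparison and the LOB merge, the function names are hypothetical, and it relies on string comparison, which assumes the zero-padded sequence never exceeds eight digits.

```python
def node_of(event_id: str) -> str:
    """'N1_00000042' -> 'N1'."""
    return event_id.split("_", 1)[0]


def frontiers(lm_ids, olr_ids) -> dict:
    """Per node: min(max LM id, max OLR id). Nodes seen on one side only are omitted."""
    def max_per_node(ids):
        best = {}
        for eid in ids:
            node = node_of(eid)
            if node not in best or eid > best[node]:
                best[node] = eid
        return best

    lm_max, olr_max = max_per_node(lm_ids), max_per_node(olr_ids)
    return {n: min(lm_max[n], olr_max[n]) for n in lm_max.keys() & olr_max.keys()}


def classify(lm_ids, olr_ids) -> dict:
    """Bucket event_ids at or below each node's frontier."""
    lm_ids, olr_ids = set(lm_ids), set(olr_ids)
    limit = frontiers(lm_ids, olr_ids)

    def within(eid):
        node = node_of(eid)
        return node in limit and eid <= limit[node]

    lm_in = {e for e in lm_ids if within(e)}
    olr_in = {e for e in olr_ids if within(e)}
    return {
        "both": sorted(lm_in & olr_in),       # compare field by field
        "olr_only": sorted(olr_in - lm_in),   # extra (phantom)
        "lm_only": sorted(lm_in - olr_in),    # missing from OLR
    }
```

Taking the minimum of the two maxima keeps an adapter that is merely lagging from being reported as missing events.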

## Commands

| Command | Description |
|---------|-------------|
| `./fuzz-test.sh up` | Start Kafka, Debezium, consumer, OLR. Deploy fuzz tables. |
| `./fuzz-test.sh run [min]` | Run fuzz workload for N minutes (default: 30) |
| `./fuzz-test.sh status` | Show container status, consumer counts, OLR memory |
| `./fuzz-test.sh validate` | Wait for consumer drain, run validator, report PASS/FAIL |
| `./fuzz-test.sh logs <c>` | Show logs: kafka, logminer, olr, consumer, validator, olr-vm |
| `./fuzz-test.sh down` | Stop all containers and remove volumes |

## Prerequisites

- RAC VM running with Oracle containers started
- OLR dev image built (`make build`)
- OLR image loaded on RAC VM (`podman load`)
- One-time setup done (`./setup.sh`: creates `c##dbzuser` + grants)
- `CREATE PROCEDURE` grant for `olr_test` user (for PL/SQL package)

## Known Issues

- **LOB phantom transactions (olr#26)**: OLR emits entire phantom committed
  transactions on FUZZ_LOB that LogMiner does not see. This affects ~0.1% of
  LOB events. They are classified as `lob_known` in validator output and do
  not fail the test.

- **LOB UPDATE variant (olr#10)**: Occasional LOB UPDATE events are present in
  LogMiner but absent from OLR. Same phantom undo root cause.

- Non-LOB tables show **no mismatches** in all testing so far.

## SQLite Schema

```sql
CREATE TABLE lm_events (
    event_id    TEXT NOT NULL,
    seq         INTEGER NOT NULL,  -- 0 normally, >0 for LOB splits
    table_name  TEXT NOT NULL,
    op          TEXT NOT NULL,     -- INSERT, UPDATE, DELETE
    raw_json    TEXT NOT NULL,     -- full Debezium envelope
    consumed_at REAL NOT NULL,
    PRIMARY KEY (event_id, seq)
);
-- olr_events: identical schema
```

The database persists after `down` is called. Query it directly for
investigation:

```bash
docker run --rm -v rac_fuzz-data:/data python:3.12-slim python3 -c "
import sqlite3
conn = sqlite3.connect('/data/fuzz.db')
# Example: find all phantom events
for r in conn.execute('''
    SELECT o.event_id, o.table_name, o.op
    FROM olr_events o LEFT JOIN lm_events l ON o.event_id = l.event_id
    WHERE l.event_id IS NULL ORDER BY o.event_id
''').fetchall():
    print(r)
"
```
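
The opposite question, which events LogMiner saw but OLR did not, can be answered with the mirrored join. The sketch below assumes only the table and column names from the schema above; `missing_from_olr` is a hypothetical helper, and unlike the validator it applies no per-node frontier, so events OLR simply has not delivered yet will also show up.

```python
import sqlite3

# DISTINCT collapses LOB splits (several seq values for one event_id).
MISSING_FROM_OLR = """
SELECT DISTINCT l.event_id, l.table_name, l.op
FROM lm_events l LEFT JOIN olr_events o ON l.event_id = o.event_id
WHERE o.event_id IS NULL
ORDER BY l.event_id
"""


def missing_from_olr(conn: sqlite3.Connection) -> list:
    """Return (event_id, table_name, op) rows present in LM but absent from OLR."""
    return conn.execute(MISSING_FROM_OLR).fetchall()
```

Against the persisted database this could be called as `missing_from_olr(sqlite3.connect('/data/fuzz.db'))` from inside the same container.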
