Deterministic. Auditable. Global.
Designed for explainable processing in regulated environments.
FinLang is a domain-specific language (DSL) and high-performance CLI engine for financial transaction processing.
It replaces opaque machine-learning categorization with transparent, deterministic rules — delivering explainability, auditability, and global compatibility.
Built for audit-friendly logic and deterministic processing.
A deterministic alternative where explainability and reproducibility matter.
FinLang rules are human-readable, Git-friendly, and designed for precision.
The engine processes rules top-to-bottom; the last matching rule sets the category, while flags accumulate.
# Example: Basic categorization and flagging
rule "GROCERIES: Tesco" {
  match:
    - counterparty ~ "*TESCO*"
  set:
    - category = "Groceries"
    - flags += "Supermarket"
}
# Example: Numeric range and exact match
rule "TRAVEL: High Value Flight" {
  match:
    - counterparty == "BRITISH AIRWAYS"
    - amount in -5000.00 .. -500.00
  set:
    - category = "Travel"
    - flags += "HighValue"
}
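The evaluation order described above (top-to-bottom, last matching rule sets the category, flags accumulate) can be sketched in Python. This is a minimal illustration, not FinLang's engine: the `Rule` dataclass, `apply_rules`, and the third "FUEL" rule are hypothetical, `~` is assumed to be a case-sensitive glob match, and the `..` range is assumed to be inclusive at both ends.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Callable, Optional


@dataclass
class Rule:
    """Hypothetical in-memory form of a rule; not FinLang's real internals."""
    name: str
    matches: Callable[[dict], bool]
    category: Optional[str] = None
    flags: list = field(default_factory=list)


RULES = [
    Rule("GROCERIES: Tesco",
         lambda tx: fnmatchcase(tx["counterparty"], "*TESCO*"),
         category="Groceries", flags=["Supermarket"]),
    Rule("TRAVEL: High Value Flight",
         # Assumption: the DSL range is inclusive at both ends.
         lambda tx: tx["counterparty"] == "BRITISH AIRWAYS"
         and -5000.00 <= tx["amount"] <= -500.00,
         category="Travel", flags=["HighValue"]),
    # Hypothetical later rule, added only to show an override.
    Rule("FUEL: Tesco Petrol",
         lambda tx: fnmatchcase(tx["counterparty"], "*TESCO PETROL*"),
         category="Fuel", flags=["Fuel"]),
]


def apply_rules(tx, rules):
    """Walk rules in order: the last match sets category, flags accumulate."""
    category = None
    flags = []
    for rule in rules:
        if not rule.matches(tx):
            continue
        if rule.category is not None:
            category = rule.category      # a later match overwrites an earlier one
        for flag in rule.flags:
            if flag not in flags:         # append-only, deduplicated, order kept
                flags.append(flag)
    return category, flags
```

With these rules, a transaction for `TESCO PETROL 55` matches the first and third rule, so it ends up with category `Fuel` and flags `["Supermarket", "Fuel"]`.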
| Feature | Description |
|---|---|
| Deterministic DSL | Human-readable .fin rules language — explainable logic, Git-friendly. |
| High-Performance Engine | Vectorized core (Pandas + NumPy + PyArrow) — ~217K rows/sec FastIO validated throughput on the integrity harness. |
| Dual Backend | Standard (Engine: c) or FastIO (Engine: pyarrow) with automatic fallback. |
| Growth Loop | Automated Discover → Suggest → Categorize workflow — 97.8% success on addressable patterns. |
| Global I18n Support | US/UK/EU/Commonwealth formats, £ € $ ¥ ₹ stripping, localized decimals/dates/delimiters. |
| Audit Trail System | Every decision logged (before/after state diffs); stateless for reproducibility. |
| Exclude Marker | Boolean exclude column — rule-driven, auditable, supports blacklist/whitelist exception patterns. |
| CR/DR Semantics | Case-insensitive CR/DR (with or without space), accounting negatives (123.45), trailing minus 123.45-. v0.7.7 fixes a latent bug on no-space CR/DR formats. |
| Amount Synthesis | Auto-computes amount = abs(credit) - abs(debit) across 9 edge cases. |
| Strict Parsing | Locale-aware normalization with configurable thresholds (--strict-parse). |
| Flag Integrity | Append-only (flags +=) with deterministic deduplication. |
| Integrity Verification | Built-in --verify and --verify-full — SHA-256 fingerprinting of immutable fields with optional artifact output. |
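The CR/DR, accounting-negative, trailing-minus, and amount-synthesis rows can be illustrated with a short Python sketch. It is not FinLang's `_to_number`: the names `to_number` and `synthesize_amount` are hypothetical, CR is assumed to be positive and DR negative (consistent with `abs(credit) - abs(debit)`), and only `,` thousands separators are handled.

```python
import re

# No \b before the suffix: in "123.45CR" the digit and the letter are both
# word characters, so a word boundary there would never match.
_CRDR_SUFFIX = re.compile(r"\s*(CR|DR)$", re.IGNORECASE)


def to_number(text):
    """Parse one amount string into a signed float (illustrative only)."""
    s = text.strip()
    sign = 1.0
    m = _CRDR_SUFFIX.search(s)
    if m:
        if m.group(1).upper() == "DR":    # assumption: DR means a negative amount
            sign = -1.0
        s = s[:m.start()]
    if s.startswith("(") and s.endswith(")"):
        sign, s = -1.0, s[1:-1]           # accounting negative: (123.45)
    elif s.endswith("-"):
        sign, s = -1.0, s[:-1]            # trailing minus: 123.45-
    return sign * float(s.replace(",", ""))


def synthesize_amount(credit, debit):
    """amount = abs(credit) - abs(debit); a missing side counts as zero."""
    return abs(credit or 0.0) - abs(debit or 0.0)
```

For example, `to_number("123.45dr")`, `to_number("(123.45)")`, and `to_number("123.45-")` all give `-123.45` under these assumptions.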
Requirements: Python 3.10–3.14
From PyPI (Recommended):
pip install finlang
With Fast I/O (PyArrow):
pip install "finlang[fastio]"
(Enables --fastio for accelerated CSV I/O.)
From Source (Development):
git clone https://github.com/FinLang-Ltd/finlang.git
cd finlang
pip install -e ".[fastio]"
1️⃣ Initial Categorization
finlang --input transactions.csv --output baseline.csv \
--rules my_rules.fin --include-pack retail,transport
2️⃣ Discover Gaps
finlang-discover --input baseline.csv \
--candidates candidates.csv --all-candidates all_candidates.csv \
--min-count 5
3️⃣ Suggest Rules (Exact Mode Recommended)
finlang-suggest --input candidates.csv --output suggested_rules.fin \
--rules my_rules.fin --emit-match exact
4️⃣ Merge and Re-run
cat my_rules.fin suggested_rules.fin > merged.fin
finlang --input transactions.csv --output improved.csv \
--rules merged.fin --include-pack retail,transport
✅ Expected Result: 5–10% coverage improvement; zero duplicates in exact mode.
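To check the coverage change between the two runs, something like the following Python sketch may help. It assumes the output CSV has a `category` column that is left empty for uncategorized rows; that column name and the `coverage` helper are assumptions for illustration, not part of the documented CLI.

```python
import csv
import io


def coverage(csv_text, column="category"):
    """Return the share of rows whose `column` is non-empty (0.0 to 1.0)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if (row.get(column) or "").strip())
    return filled / len(rows)


# Tiny inline stand-ins for baseline.csv and improved.csv.
baseline = "counterparty,category\nTESCO,Groceries\nACME LTD,\nFOO,\nBA,Travel\n"
improved = "counterparty,category\nTESCO,Groceries\nACME LTD,Services\nFOO,\nBA,Travel\n"

gain = coverage(improved) - coverage(baseline)
print(f"coverage: {coverage(baseline):.0%} -> {coverage(improved):.0%} (+{gain:.0%})")
```

On real files, the text could be read with `Path("baseline.csv").read_text(encoding="utf-8-sig")` before passing it to `coverage`.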
Measured with --audit-mode none (max throughput) on Intel i7-12700T, 48 GB RAM, Windows 11, Python 3.13.7, PyArrow 21.0.
| Dataset | Test | Time (s) | Rows/sec | Notes |
|---|---|---|---|---|
| 100K (UK Synthetic) | Growth Loop | 2.54 | 39,370 ✅ | Baseline (121 rules) |
| 100K (after Growth Loop) | Growth Loop | 4.96 | 20,161 ✅ | +6.3× rules → ≈ 2× slower (764 rules) |
| 5M × 50 cols | Benchmark Harness | 179.27 | 27,900 ✅ | Enterprise validation, 3-run average |
| 20M × 6 cols | Integrity Test (FastIO) | ~90 | 217,068 ✅ | Engine throughput, full SHA-256 verified |
v0.7.7 improvement: A hot-path bug fix in `_to_number` removed an unnecessary `\b` word boundary that was both producing wrong results on no-space CR/DR formats and costing measurable runtime. The fix delivered +30-50% throughput on the integrity harness vs v0.7.6, taking standard mode to ~180K rows/sec and FastIO to ~217K rows/sec.
Cumulative v0.6.4 → v0.7.7: -14% runtime, +16% throughput on the enterprise harness (5M × 50).
Audit Overhead: Enabling `--audit-mode lite/full` reduces throughput by ≈38% due to diff calculation; provides full decision provenance.
Note: These figures are validated benchmark results from controlled tests. Actual performance varies depending on dataset, ruleset, and audit mode.
See `docs/benchmarks.md` for full details.
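The Rows/sec column is rows divided by wall-clock seconds, and a relative runtime change maps to a throughput change of `1 / (1 + change) - 1`. The short Python check below reproduces the table's figures from its own inputs; the helper names are illustrative only.

```python
def rows_per_sec(rows, seconds):
    """Throughput as reported in the benchmark table."""
    return rows / seconds


def throughput_gain(runtime_change):
    """Relative throughput change for a relative runtime change (e.g. -0.14)."""
    return 1.0 / (1.0 + runtime_change) - 1.0


print(round(rows_per_sec(100_000, 2.54)))       # 39370
print(round(rows_per_sec(100_000, 4.96)))       # 20161
print(round(rows_per_sec(5_000_000, 179.27)))   # 27891 (table rounds to 27,900)
print(round(throughput_gain(-0.14), 3))         # 0.163, i.e. about +16%
```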
SHA-256 fingerprint verification benchmarked on large datasets:
| Rows | Engine (Standard) | Engine (FastIO) | Result |
|---|---|---|---|
| 5M | 178,903 rows/s | 198,448 rows/s | ✅ All fingerprints match |
| 10M | 178,511 rows/s | 214,136 rows/s | ✅ All fingerprints match |
| 20M | 181,566 rows/s | 217,068 rows/s | ✅ All fingerprints match |
What this benchmark validated: Every row's immutable fields (`date`, `amount`, `counterparty`) were verified via SHA-256 hash before and after engine processing. Zero cross-row contamination detected. Zero data corruption detected. 60M rows verified field-by-field across three runs, zero mismatches.
Note: As of v0.7.7, SHA-256 integrity verification is available as a CLI feature via `--verify` (fast fingerprint) and `--verify-full` (fingerprint + field comparison). Use `--verify-output-dir` to save audit artifacts (JSON report + proof CSV). See `docs/cli_reference.md` for details.
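The idea behind the fingerprint check can be sketched in a few lines of Python. This is not how `--verify` is implemented: the field serialization, the separator, and the `fingerprint` / `changed_rows` helpers are assumptions chosen to show the before/after comparison on the three immutable fields.

```python
import hashlib

IMMUTABLE_FIELDS = ("date", "amount", "counterparty")


def fingerprint(row):
    """SHA-256 over the immutable fields, joined with a unit separator."""
    payload = "\x1f".join(str(row[name]) for name in IMMUTABLE_FIELDS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def changed_rows(before, after):
    """Return the indices of rows whose immutable fields differ."""
    if len(before) != len(after):
        raise ValueError("row count changed during processing")
    return [i for i, (b, a) in enumerate(zip(before, after))
            if fingerprint(b) != fingerprint(a)]


before = [{"date": "2024-01-05", "amount": "-23.10", "counterparty": "TESCO"}]
# Adding a category must not affect the fingerprint.
after = [dict(before[0], category="Groceries")]
print(changed_rows(before, after))   # []
```

Because only `date`, `amount`, and `counterparty` feed the hash, columns the engine is allowed to write (such as `category` or `flags`) do not trigger a mismatch.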
| Region | Example Number | Date Order | CLI Flags |
|---|---|---|---|
| 🇺🇸 US / 🇨🇦 Canada | 1,234.56 | MM/DD | (defaults) |
| 🇬🇧 UK / 🇦🇺 Commonwealth | 1,234.56 | DD/MM | --dayfirst |
| 🇪🇺 Continental Europe | 1.234,56 | DD/MM | --decimal "," --thousands "." --dayfirst |
| 🇨🇭 Switzerland | 1'234.56 | DD/MM | --thousands "'" --dayfirst |
Auto-Detection and Normalization: BOM-safe UTF-8 encodings; `,`, `;`, `|`, and `\t` delimiters; and automatic currency symbol stripping.
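As a rough picture of what the `--decimal`, `--thousands`, and `--dayfirst` flags change, here is a Python sketch of the normalization. The `normalize_number` and `parse_date` helpers are hypothetical, dates are assumed to carry a four-digit year with `/` separators, and FinLang's own parser may differ in detail.

```python
from datetime import datetime

CURRENCY_SYMBOLS = "£€$¥₹"


def normalize_number(text, decimal=".", thousands=","):
    """Strip currency symbols and convert a localized number to float."""
    s = text.strip()
    for symbol in CURRENCY_SYMBOLS:
        s = s.replace(symbol, "")
    s = s.strip().replace(thousands, "")   # drop grouping separators first
    if decimal != ".":
        s = s.replace(decimal, ".")        # then normalize the decimal mark
    return float(s)


def parse_date(text, dayfirst=False):
    """Parse DD/MM/YYYY when dayfirst is set, otherwise MM/DD/YYYY."""
    fmt = "%d/%m/%Y" if dayfirst else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


print(normalize_number("$1,234.56"))                                # 1234.56
print(normalize_number("€1.234,56", decimal=",", thousands="."))    # 1234.56
print(normalize_number("1'234.56", thousands="'"))                  # 1234.56
print(parse_date("03/04/2024"), parse_date("03/04/2024", dayfirst=True))
```

The last line shows why the date-order flag matters: the same string reads as 4 March without `dayfirst` and as 3 April with it.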
Discover → Suggest → Categorize → Repeat
FinLang's Growth Loop accelerates rule creation through data-driven discovery.
- Discover uncategorized counterparties
- Suggest new rules in seconds (1:1 mapping in exact mode)
- Merge + Re-run for incremental coverage gains
- Validated Result: 97.8% success on addressable patterns
- ROI: 8.8 transactions categorized per new rule
📄 See: docs/growth_loop_best_practices.md
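The Discover and Suggest steps can be pictured with the Python sketch below. It is an illustration under assumptions, not the `finlang-discover` or `finlang-suggest` implementation: the `category` column name, the `SUGGESTED:` rule-name prefix, and both helper functions are hypothetical, and the caller supplies the category to assign.

```python
from collections import Counter


def discover(rows, min_count=5):
    """Count uncategorized counterparties and keep those seen often enough."""
    counts = Counter(r["counterparty"] for r in rows if not r.get("category"))
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


def suggest_exact(candidates, category):
    """Emit one exact-match rule per candidate (1:1 mapping)."""
    blocks = []
    for name, _count in candidates:
        blocks.append(
            f'rule "SUGGESTED: {name}" {{\n'
            f'  match:\n'
            f'    - counterparty == "{name}"\n'
            f'  set:\n'
            f'    - category = "{category}"\n'
            f'}}'
        )
    return "\n".join(blocks)


rows = (
    [{"counterparty": "ACME LTD", "category": ""}] * 5
    + [{"counterparty": "FOO", "category": ""}] * 2
    + [{"counterparty": "TESCO", "category": "Groceries"}]
)
candidates = discover(rows, min_count=5)
print(candidates)                              # [('ACME LTD', 5)]
print(suggest_exact(candidates, "Services"))
```

Exact matching is what keeps the mapping 1:1: each candidate produces exactly one rule, and that rule cannot match any other counterparty.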
- ⚠️ `--emit-match fuzzy` (default) filters corporate stopwords (LTD, LLC, PLC, INC, GROUP, COMPANY, CO, SAS, GMBH, CORP) and deduplicates patterns within a batch (v0.7.7). Edge cases with very short counterparty names may still produce broad patterns. → Use `--emit-match exact` for production workflows.
- ⚠️ Hyphenated/apostrophe names may affect fuzzy matching (< 1% impact).
- ⚠️ No support for non-Gregorian calendars or non-Western numerals.
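To see why short names can yield broad fuzzy patterns, consider the Python sketch below. How FinLang actually derives fuzzy patterns is not described here; this example simply assumes that stopwords are dropped and the remaining tokens are wrapped in `*` wildcards, with duplicates removed within the batch. The `fuzzy_patterns` helper is hypothetical.

```python
from fnmatch import fnmatchcase

STOPWORDS = {"LTD", "LLC", "PLC", "INC", "GROUP", "COMPANY",
             "CO", "SAS", "GMBH", "CORP"}


def fuzzy_patterns(counterparties):
    """Drop corporate stopwords, wrap the rest in wildcards, dedupe in order."""
    seen = set()
    patterns = []
    for name in counterparties:
        tokens = [t for t in name.upper().split() if t not in STOPWORDS]
        if not tokens:
            continue                     # nothing left once stopwords are removed
        pattern = "*" + " ".join(tokens) + "*"
        if pattern not in seen:
            seen.add(pattern)
            patterns.append(pattern)
    return patterns


print(fuzzy_patterns(["Acme Ltd", "ACME PLC", "AB Group"]))   # ['*ACME*', '*AB*']
# A two-letter pattern also matches unrelated counterparties:
print(fnmatchcase("CABLE TV", "*AB*"))                         # True
```

Under these assumptions, `AB Group` collapses to `*AB*`, which would also match `CABLE TV`; an exact match avoids that kind of over-reach.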
- docs/release_notes/v0_7_7.md
- docs/release_notes/v0_7_6.md
- docs/runtime_contract.md
- docs/cli_reference.md
- docs/rulepacks.md
- docs/benchmarks.md
- docs/growth_loop_best_practices.md
- docs/amount_synthesis.md
- docs/i18n_examples.md
- docs/stateless_processing.md
Command-line help:
finlang --help
finlang-discover --help
finlang-suggest --help
finlang --input bank.csv --output categorized.csv \
--rules examples/rules.demo.fin \
--include-pack retail,transport,subs \
--fastio --audit audit_log.json --audit-mode lite
FinLang is open source under the GNU Affero General Public License (AGPL-3.0).
Commercial licenses and enterprise support are available via FinLang Ltd.
📧 info@finlang.io
🌐 https://finlang.io
Contributions are welcome! Before submitting a PR, please review and accept our Contributor Licence Agreement (CLA).
| Component | Version | Status |
|---|---|---|
| Core Engine | v0.7.7 | ✅ Production-Ready |
| CLI Suite | v0.7.7 | ✅ Validated (118 tests, 9 gates) |
| Discover/Suggest | v0.7.7 | ✅ 97.8% success on addressable patterns |
| Integrity Test | v0.7.7 | ✅ 20M rows verified, ~217K rows/sec FastIO |
| Verify | v0.7.7 | ✅ Built-in --verify / --verify-full |
| Docs | v0.7.7 | ✅ Complete |
| Python Support | 3.10–3.14 | ✅ Tested |