
Commit fe44933

hyperpolymath and claude committed
feat(quandledb/krl): implement KRL parser suite — Lexer, AST, Parser, SqlFrontend + tests
Implements KRL (Knot Resolution Language) v0.1.0 parser per grammar.ebnf, conforming to standards/testing-and-benchmarking/TESTING-TAXONOMY.adoc.

Components:
- Lexer.jl: single-pass tokeniser; knot-name tokens (3_1), Unicode ≅, nested block comments, bare-= rejection with helpful error
- Ast.jl: 40+ concrete node types, all position-annotated; ConfidenceLevel enum matching Idris2 ABI
- Parser.jl: recursive-descent, 9 expression precedence levels; pipeline error recovery (collect errors at | boundaries); Gauss code validation at parse time (non-empty, no zeros); parse_any SQL/KRL auto-dispatch
- SqlFrontend.jl: syntactic SQL→KRL translation; each clause maps directly to the corresponding KRL pipeline stage; unsupported SQL (DISTINCT, NULL, INSERT, UPDATE, DELETE) → helpful errors
- test/lexer_test.jl: unit + property + fuzz seed corpus
- test/parser_test.jl: unit + property + error-recovery + fuzz seed corpus
- test/sql_test.jl: SQL clause coverage, ordering, unsupported features, parse_any routing, AST equivalence property

Also fixes _parse_sort_item bug: _advance!() → _advance!(ps).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
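The commit message names two parse-time checks on Gauss codes: the code must be non-empty and must contain no zeros. Parser.jl is not shown on this page, so the following is only a minimal sketch of those two checks under stated assumptions. The name `validate_gauss_code` and the use of `ArgumentError` are hypothetical. The real parser likely reports these problems through its own error type (KRL.jl re-exports `KRLParseError`).

```julia
"""
    validate_gauss_code(code) -> code

Hypothetical stand-alone version of the two Gauss-code checks named in the
commit message (not the Parser.jl implementation): the code must be
non-empty and must contain no zero entries.
"""
function validate_gauss_code(code::AbstractVector{<:Integer})
    # Check 1: an empty Gauss code describes no diagram at all.
    isempty(code) && throw(ArgumentError("Gauss code must be non-empty"))
    # Check 2: entries are signed crossing labels, so 0 is never valid.
    idx = findfirst(iszero, code)
    idx === nothing ||
        throw(ArgumentError("Gauss code entry $idx is 0"))
    return code
end
```

For example, `validate_gauss_code([-1, 3, -2, 1, -3, 2])` returns its argument unchanged, while `validate_gauss_code(Int[])` and `validate_gauss_code([1, 0, -1])` both throw. The sketch checks only these two properties. It does not attempt any further structural validation of the code.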
1 parent 260dc69 commit fe44933

8 files changed

Lines changed: 3104 additions & 0 deletions


quandledb/server/krl/Ast.jl

Lines changed: 494 additions & 0 deletions
Large diffs are not rendered by default.

quandledb/server/krl/KRL.jl

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
```julia
# SPDX-License-Identifier: MPL-2.0
# (PMPL-1.0-or-later preferred; MPL-2.0 for Julia ecosystem consistency)
# Copyright (c) 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk>

"""
KRL: Knot Resolution Language parser for QuandleDB.

Implements the parser for KRL v0.1.0 (grammar.ebnf, spec/type-system.md).
Components:
    Lexer.jl:        tokeniser (Token, tokenise, KRLLexError)
    Ast.jl:          AST node types (KRLProgram, KRLQuery, …)
    Parser.jl:       recursive-descent parser (parse_krl, parse_krl_query)
    SqlFrontend.jl:  SQL→KRL translation layer (parse_sql)

Entry points (re-exported):
    parse_krl(src)        parse KRL source → KRLProgram
    parse_krl_query(src)  parse a single pipeline query → KRLQuery
    parse_sql(src)        parse SQL SELECT → KRLProgram (translated to KRL AST)
    parse_any(src)        auto-detect SQL vs KRL and dispatch

Usage from serve.jl:
    include("krl/KRL.jl")
    using .KRL: parse_any, parse_krl_query, KRLParseError
"""
module KRL

include("Lexer.jl")
include("Ast.jl")
include("Parser.jl")
include("SqlFrontend.jl")

end # module KRL
```
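`parse_any` is re-exported here, but its SQL-vs-KRL detection rule lives in Parser.jl, which this page does not show. As an illustration only, the hypothetical `looks_like_sql` helper below sketches one plausible heuristic. It skips blank lines and `--` line comments, which both languages accept, and then checks whether the first word is a SQL statement keyword, ignoring case. Routing `INSERT`, `UPDATE` and `DELETE` to the SQL side is an assumption. It would let those statements reach the helpful errors that, according to the commit message, SqlFrontend.jl raises.

```julia
"""
    looks_like_sql(src) -> Bool

Hypothetical detection heuristic (not the Parser.jl rule): true when the
first non-comment word of `src` is a SQL statement keyword.
KRL block comments `{- ... -}` are not skipped by this sketch.
"""
function looks_like_sql(src::AbstractString)::Bool
    sql_starts = ("select", "insert", "update", "delete")
    for line in eachline(IOBuffer(src))
        s = strip(line)
        # Blank lines and `--` comments carry no language signal.
        (isempty(s) || startswith(s, "--")) && continue
        m = match(r"^[A-Za-z]+", s)
        m === nothing && return false
        return lowercase(m.match) in sql_starts
    end
    return false   # empty or comment-only input: treat as KRL
end
```

With this rule, `"SELECT * FROM knots"` is routed to the SQL frontend, and a pipeline that starts with the KRL keyword `from` is routed to the KRL parser.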

quandledb/server/krl/Lexer.jl

Lines changed: 353 additions & 0 deletions
@@ -0,0 +1,353 @@
```julia
# SPDX-License-Identifier: MPL-2.0
# (PMPL-1.0-or-later preferred; MPL-2.0 for Julia ecosystem consistency)
# Copyright (c) 2026 Jonathan D.A. Jewell <j.d.a.jewell@open.ac.uk>

"""
KRL Lexer: tokenises a KRL (Knot Resolution Language) source string.

Follows grammar.ebnf v0.1.0 exactly. All 64 reserved words in `KRL_KEYWORDS`
are recognised.

Token kinds (Symbol tags):
  Literals:    :integer        `123`
               :float          `1.5`
               :string         `"hello"`
               :knot_name      `3_1`, `10_139` (digit `_` digit form)
               (`true` / `false` are emitted as :keyword; no separate :bool
               token is produced)
  Names:       :keyword        reserved word
               :identifier     user name or invariant name
  Operators:   :eq             (`==`)
               :neq            (`!=`)
               :lt             (`<`)
               :lte            (`<=`)
               :gt             (`>`)
               :gte            (`>=`)
               :plus           (`+`)
               :minus          (`-`)
               :star           (`*`)
               :slash          (`/`)
               :percent        (`%`)
               :pipe           (`|`)
               :arrow          (`->`)
               :fat_arrow      (`=>`)
               :tilde_arrow    (`~>`)
               :iso            (`≅` or `~=`, normalised to one token kind)
               :null_coalesce  (`??`)
               :dot            (`.`)
               :colon          (`:`)
               :comma          (`,`)
               :semi           (`;`)
  Delimiters:  :lparen, :rparen, :lbracket, :rbracket, :lbrace, :rbrace
  Special:     :eof

Innovations vs KRLAdapter.jl:
  - `:knot_name` token: `3_1`, `5_2`, `10_139` (digit-underscore-digit) emitted
    as a dedicated token, not split into integer + identifier.
  - Float literals: `1.5`, `3.14`.
  - Block comments `{- ... -}` with nesting depth tracking (depth > 1 is an
    innovation allowing nested comment-out blocks).
  - Unicode operator `≅` normalised to `:iso` (same as ASCII `~=`).
  - All KRL keywords, including compound forms (`find_equivalent`, `group_by`).
"""
```
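The `:knot_name`, `:float` and `:integer` kinds listed above share one scanning branch in `tokenise`. A `_` or `.` extends a run of digits only when another digit follows it. The hypothetical `scan_numeric` below restates that rule as a standalone function for illustration. It is not part of Lexer.jl and returns a `(kind, lexeme)` pair instead of emitting a `Token`.

```julia
"""
    scan_numeric(s) -> (kind::Symbol, lexeme::String)

Hypothetical restatement of the numeric branch of `tokenise`: reads the
numeric token at the start of `s` and classifies it.
"""
function scan_numeric(s::AbstractString)
    chars = collect(s)
    n = length(chars)
    (n > 0 && isdigit(chars[1])) || throw(ArgumentError("input must start with a digit"))
    i = 1
    while i <= n && isdigit(chars[i]); i += 1; end   # leading digit+
    kind = :integer
    # '_' or '.' joins the token only when a digit follows it.
    if i + 1 <= n && isdigit(chars[i + 1]) && chars[i] in ('_', '.')
        kind = chars[i] == '_' ? :knot_name : :float
        i += 1                                        # consume the separator
        while i <= n && isdigit(chars[i]); i += 1; end
    end
    return kind, String(chars[1:i-1])
end
```

For instance, `3_1_2` scans as the knot name `3_1`, leaving `_2` for the identifier branch, and `3.` scans as the integer `3`, leaving `.` to become a `:dot` token.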
```julia
export Token, TokenKind, tokenise, KRLLexError

# ─────────────────────────────────────────────────────────────────────────────
# Token kind
# ─────────────────────────────────────────────────────────────────────────────

const TokenKind = Symbol

# ─────────────────────────────────────────────────────────────────────────────
# Reserved words (must stay in sync with grammar.ebnf keyword list)
# ─────────────────────────────────────────────────────────────────────────────

const KRL_KEYWORDS = Set([
    # pipeline stages
    "from", "filter", "sort", "take", "skip", "return",
    "group_by", "aggregate",
    # equivalence / path queries
    "find_equivalent", "find_path", "match", "via",
    # logical / comparison
    "and", "or", "not", "in",
    # specifiers
    "as", "with", "let", "where",
    # provenance
    "provenance",
    # confidence levels
    "confidence", "exact", "necessary", "sufficient", "heuristic",
    # sort direction
    "asc", "desc",
    # aggregate functions (also plain identifiers in function-call position)
    "count", "min", "max", "avg", "sum",
    # collections
    "knots", "diagrams", "invariants",
    # path methods
    "reidemeister", "isotopy",
    # return items
    "equivalences", "equivalence_class", "proof",
    # rule / axiom
    "rule", "define", "axiom", "forall",
    # boolean literals (emitted as :keyword tokens; there is no separate :bool kind)
    "true", "false",
    # option absent
    "none",
    # type names (doubled as keywords so the parser can recognise them)
    "Int", "Float", "String", "Bool",
    "Knot", "Diagram", "Polynomial", "GaussCode", "Quandle",
    "Equivalence", "Option", "List", "Set", "Map", "ResultSet", "Provenance",
])

# ─────────────────────────────────────────────────────────────────────────────
# Token
# ─────────────────────────────────────────────────────────────────────────────

"""
    Token

A single lexical unit produced by `tokenise`.

Fields:
- `kind::TokenKind`: symbolic tag (see Lexer.jl header)
- `value::String`: source text of the token; for `:string` tokens, the contents
  without the quotes and with escapes decoded (empty for `:eof`)
- `line::Int`: 1-based source line of first character
- `col::Int`: 1-based column of first character
"""
struct Token
    kind::TokenKind
    value::String
    line::Int
    col::Int
end

Base.show(io::IO, t::Token) =
    print(io, "Token($(t.kind), $(repr(t.value)), L$(t.line):C$(t.col))")

# ─────────────────────────────────────────────────────────────────────────────
# Lex error
# ─────────────────────────────────────────────────────────────────────────────

"""
    KRLLexError(msg, line, col)

Thrown by `tokenise` on unexpected characters; unterminated string literals,
escape sequences or block comments; and incomplete operators (bare `=`, `!`,
`~` or `?`).
"""
struct KRLLexError <: Exception
    msg::String
    line::Int
    col::Int
end

Base.showerror(io::IO, e::KRLLexError) =
    print(io, "KRLLexError at L$(e.line):C$(e.col): $(e.msg)")
```
```julia
# ─────────────────────────────────────────────────────────────────────────────
# Tokeniser
# ─────────────────────────────────────────────────────────────────────────────

"""
    tokenise(src::String) -> Vector{Token}

Scan `src` and return all tokens. The last element is always `Token(:eof, "", …)`.

Throws `KRLLexError` on unrecognised characters; unterminated string literals,
escape sequences or block comments; and incomplete operators (bare `=`, `!`,
`~` or `?`).
"""
function tokenise(src::String)::Vector{Token}
    tokens = Token[]
    chars = collect(src)   # Unicode-aware char array
    n = length(chars)
    i = 1
    line = 1
    col = 1

    # Inline helpers ─────────────────────────────────────────────────────────

    @inline function advance!()
        c = chars[i]; i += 1
        if c == '\n'; line += 1; col = 1; else; col += 1; end
        c
    end

    @inline peek() = i <= n ? chars[i] : '\0'
    @inline peek2() = i + 1 <= n ? chars[i + 1] : '\0'

    @inline function emit(kind, val, sl, sc)
        push!(tokens, Token(kind, val, sl, sc))
    end

    # Main scan loop ─────────────────────────────────────────────────────────

    while i <= n
        sl = line; sc = col   # start line/col of this token
        c = advance!()

        # ── whitespace ───────────────────────────────────────────────────────
        if c == ' ' || c == '\t' || c == '\r' || c == '\n'
            continue
        end

        # ── line comment: -- to EOL ──────────────────────────────────────────
        if c == '-' && peek() == '-'
            advance!()
            while i <= n && chars[i] != '\n'; advance!(); end
            continue
        end

        # ── block comment: {- ... -} with nesting ────────────────────────────
        if c == '{' && peek() == '-'
            advance!()   # consume '-'
            depth = 1
            while depth > 0
                i > n && throw(KRLLexError("unterminated block comment", sl, sc))
                ch = advance!()
                if ch == '{' && peek() == '-'; advance!(); depth += 1
                elseif ch == '-' && peek() == '}'; advance!(); depth -= 1
                end
            end
            continue
        end

        # ── string literals ──────────────────────────────────────────────────
        if c == '"'
            buf = Char[]
            while true
                i > n && throw(KRLLexError("unterminated string literal", sl, sc))
                ch = advance!()
                if ch == '"'; break
                elseif ch == '\\'
                    i > n && throw(KRLLexError("unterminated escape sequence", sl, sc))
                    esc = advance!()
                    push!(buf, esc == 'n' ? '\n' :
                               esc == 't' ? '\t' :
                               esc == 'r' ? '\r' : esc)
                else
                    push!(buf, ch)
                end
            end
            emit(:string, String(buf), sl, sc)
            continue
        end

        # ── numeric literals: integer, float, or knot_name ───────────────────
        # knot_name pattern: digit+ '_' digit+   (e.g. 3_1, 10_139)
        # float pattern:     digit+ '.' digit+
        # integer pattern:   digit+
        if isdigit(c)
            buf = [c]
            while i <= n && isdigit(peek()); push!(buf, advance!()); end
            if peek() == '_' && i + 1 <= n && isdigit(peek2())
                push!(buf, advance!())   # consume '_'
                while i <= n && isdigit(peek()); push!(buf, advance!()); end
                emit(:knot_name, String(buf), sl, sc)
            elseif peek() == '.' && i + 1 <= n && isdigit(peek2())
                push!(buf, advance!())   # consume '.'
                while i <= n && isdigit(peek()); push!(buf, advance!()); end
                emit(:float, String(buf), sl, sc)
            else
                emit(:integer, String(buf), sl, sc)
            end
            continue
        end

        # ── identifiers and keywords ─────────────────────────────────────────
        # Identifiers may contain letters, digits, underscores and hyphens, so
        # `n-1` lexes as one identifier (write `n - 1` for subtraction).
        # Compound keywords like `find_equivalent` and `group_by` are matched
        # as single identifiers and then classified via KRL_KEYWORDS.
        if isletter(c) || c == '_'
            buf = [c]
            while i <= n && (isletter(peek()) || isdigit(peek()) || peek() == '_' || peek() == '-')
                push!(buf, advance!())
            end
            word = String(buf)
            kind = word in KRL_KEYWORDS ? :keyword : :identifier
            emit(kind, word, sl, sc)
            continue
        end

        # ── Unicode operator: ≅ (propositional equivalence) ──────────────────
        if c == '≅'
            emit(:iso, "≅", sl, sc)
            continue
        end

        # ── two-character and single-character operators ──────────────────────

        # `=` is only valid as part of `==` or `=>`; a bare `=` is rejected.
        if c == '='
            if peek() == '='
                advance!(); emit(:eq, "==", sl, sc)
            elseif peek() == '>'
                advance!(); emit(:fat_arrow, "=>", sl, sc)
            else
                throw(KRLLexError("bare `=` is not a KRL operator; did you mean `==`?", sl, sc))
            end
            continue
        end

        if c == '!'
            peek() == '=' || throw(KRLLexError("expected `!=`, got bare `!`", sl, sc))
            advance!(); emit(:neq, "!=", sl, sc)
            continue
        end

        if c == '<'
            if peek() == '='; advance!(); emit(:lte, "<=", sl, sc)
            else; emit(:lt, "<", sl, sc); end
            continue
        end

        if c == '>'
            if peek() == '='; advance!(); emit(:gte, ">=", sl, sc)
            else; emit(:gt, ">", sl, sc); end
            continue
        end

        if c == '-'
            if peek() == '>'; advance!(); emit(:arrow, "->", sl, sc)
            else; emit(:minus, "-", sl, sc); end
            continue
        end

        if c == '~'
            if peek() == '>'; advance!(); emit(:tilde_arrow, "~>", sl, sc)
            elseif peek() == '='; advance!(); emit(:iso, "~=", sl, sc)
            else; throw(KRLLexError("expected `~>` or `~=`, got bare `~`", sl, sc)); end
            continue
        end

        if c == '?'
            peek() == '?' || throw(KRLLexError("expected `??`, got bare `?`", sl, sc))
            advance!(); emit(:null_coalesce, "??", sl, sc)
            continue
        end

        # single-char operators
        if c == '+'; emit(:plus, "+", sl, sc); continue; end
        if c == '*'; emit(:star, "*", sl, sc); continue; end
        if c == '/'; emit(:slash, "/", sl, sc); continue; end
        if c == '%'; emit(:percent, "%", sl, sc); continue; end
        if c == '|'; emit(:pipe, "|", sl, sc); continue; end
        if c == '.'; emit(:dot, ".", sl, sc); continue; end
        if c == ':'; emit(:colon, ":", sl, sc); continue; end
        if c == ','; emit(:comma, ",", sl, sc); continue; end
        if c == ';'; emit(:semi, ";", sl, sc); continue; end
        if c == '('; emit(:lparen, "(", sl, sc); continue; end
        if c == ')'; emit(:rparen, ")", sl, sc); continue; end
        if c == '['; emit(:lbracket, "[", sl, sc); continue; end
        if c == ']'; emit(:rbracket, "]", sl, sc); continue; end
        if c == '}'; emit(:rbrace, "}", sl, sc); continue; end

        # '{' is a block-comment start only if followed by '-' (handled above).
        # Otherwise it opens a record literal.
        if c == '{'; emit(:lbrace, "{", sl, sc); continue; end

        throw(KRLLexError("unexpected character $(repr(c))", sl, sc))
    end

    emit(:eof, "", line, col)
    tokens
end
```
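The nesting rule in the block-comment branch of `tokenise` can be exercised on its own. `strip_block_comments` below is a hypothetical helper, not part of Lexer.jl. It uses the same depth counter: `{-` increments the depth, `-}` decrements it, and reaching the end of input with depth > 0 is an error. Unlike `tokenise`, it knows nothing about line comments or string literals, so it would also strip a `{-` that appears inside a quoted string.

```julia
"""
    strip_block_comments(src) -> String

Hypothetical helper (not in Lexer.jl): removes nested `{- ... -}` comments
using the same depth counting as `tokenise`. Throws `ArgumentError` on an
unterminated comment.
"""
function strip_block_comments(src::AbstractString)::String
    chars = collect(src)
    n = length(chars)
    out = Char[]
    i = 1
    while i <= n
        if chars[i] == '{' && i < n && chars[i + 1] == '-'
            start = i
            i += 2                  # skip the opening "{-"
            depth = 1
            while depth > 0
                i > n && throw(ArgumentError("unterminated block comment at char $start"))
                if chars[i] == '{' && i < n && chars[i + 1] == '-'
                    depth += 1; i += 2      # nested opener
                elseif chars[i] == '-' && i < n && chars[i + 1] == '}'
                    depth -= 1; i += 2      # closer for the innermost level
                else
                    i += 1                  # ordinary comment text
                end
            end
        else
            push!(out, chars[i]); i += 1
        end
    end
    return String(out)
end
```

For example, `strip_block_comments("a {- x {- y -} z -} b")` returns `"a  b"`. The inner `-}` closes only the nested comment, so ` z ` is still discarded. A single-depth scanner would have leaked it into the output.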

0 commit comments
