Commit 04c3a08

repl api for parsers; moved parser-specific stuff to plugins
1 parent f528978 commit 04c3a08

12 files changed

Lines changed: 612 additions & 1204 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 - unified parameter interface for parsers.
 - method `Parser.accepted_params`.
 - maximum depth protection for parsers.
+- repl api for parsers.

 ### Changed

docs/installation.md

Lines changed: 3 additions & 3 deletions
@@ -124,18 +124,18 @@ uv run hyperbase parsers
 Once installed, parsers can be used from the interactive REPL:

 ```bash
-hyperbase repl --parser alphabeta --language en
+hyperbase repl --parser alphabeta --lang en
 ```

 ```bash
-uv run hyperbase repl --parser alphabeta --language en
+uv run hyperbase repl --parser alphabeta --lang en
 ```

 Or programmatically:

 ```python
 from hyperbase.parsers import get_parser

-parser = get_parser("alphabeta", language="en")
+parser = get_parser("alphabeta", lang="en")
 result = parser.parse_text("The sky is blue.")
 ```

docs/manual/parsers.md

Lines changed: 5 additions & 40 deletions
@@ -20,10 +20,10 @@ Parsers are obtained by name with `get_parser()`:
 ```python
 from hyperbase import get_parser

-parser = get_parser("alphabeta", language="en")
+parser = get_parser("alphabeta", lang="en")
 ```

-The keyword arguments are forwarded to the parser constructor. Each parser plugin defines its own parameters -- for example, `alphabeta` takes a `language` code, while `generative` accepts `model_path`, `device`, `max_length`, and others.
+The keyword arguments are forwarded to the parser constructor. Each parser plugin defines its own parameters -- for example, `alphabeta` takes a `lang` code, while `generative` accepts `model_path`, `device`, `max_length`, and others. Run `hyperbase repl --parser <name> --help` (or `hyperbase read --parser <name> --help`) to see the full set of CLI flags injected by the active plugin.

 To see which parsers are installed:

@@ -125,42 +125,7 @@ This is what `read_source_to_jsonl()` uses internally -- each line in the output

 ## Quality checking

-The `hyperbase.parsers.correctness` module provides functions to assess the quality of a parse result.
-
-### Badness check
-
-`badness_check()` runs a comprehensive quality check on a parsed edge, combining structural validation with token-to-atom matching:
-
-```python
-from hyperbase.parsers.correctness import badness_check
-
-errors = badness_check(result.edge, result.tokens)
-if errors:
-    for key, error_list in errors.items():
-        for code, message, severity in error_list:
-            print(f"[{code}] {message} (severity: {severity})")
-else:
-    print("No errors found.")
-```
-
-The function returns a dictionary mapping edge fragments (or the string `'token-matching'`) to lists of `(code, message, severity)` tuples. An empty dictionary means no errors were found.
-
-The checks include:
-
-- **Structural correctness** -- validates the hyperedge against the SH specification (via `Hyperedge.check_correctness()`).
-- **Argument role validation** -- checks that argument roles are drawn from the valid set (`m`, `s`, `p`, `a`, `o`, `i`, `x`, `t`, `j`, `r`, `c`) and that roles like `s`, `p`, `o` are not duplicated.
-- **Junction consistency** -- verifies that junction arguments are consistently typed (all relations or all concepts).
-- **Token matching** -- ensures that every token in the original sentence maps to an atom root in the edge, and vice versa. Handles multi-token atoms, contractions and other non-trivial correspondences.
-
-### Structural quality only
-
-For a lighter check that skips token matching:
-
-```python
-from hyperbase.parsers.correctness import check_structural_quality
-
-errors = check_structural_quality(result.edge)
-```
+Badness/correctness checking lives in the parser plugin that needs it. The generative parser ships [`hyperbase_parser_gen.correctness.badness_check`](https://github.com/telmomenezes/hyperbase-parser-gen) for combined structural + token-matching validation; see that package's docs for usage.

 ## CLI

@@ -177,7 +142,7 @@ Shows all installed parser plugins and their entry point values.
 The REPL lets you parse sentences interactively:

 ```bash
-hyperbase repl --parser alphabeta --language en
+hyperbase repl --parser alphabeta --lang en
 ```

 Inside the REPL, type a sentence to parse it. Use `/help` to see available commands, `/settings` to view current configuration, and `/set` to change settings on the fly (e.g. `/set parser generative`). The REPL caches parser instances, so switching between parsers is fast after the first load.
@@ -186,7 +151,7 @@ Inside the REPL, type a sentence to parse it. Use `/help` to see available comma

 ```bash
 # Parse a file to JSONL
-hyperbase read article.txt -o output.jsonl --parser alphabeta --language en
+hyperbase read article.txt -o output.jsonl --parser alphabeta --lang en

 # Parse a Wikipedia article
 hyperbase read https://en.wikipedia.org/wiki/Hypergraph -o output.jsonl

docs/manual/readers.md

Lines changed: 1 addition & 1 deletion
@@ -70,7 +70,7 @@ hyperbase read article.txt -o output.txt
 hyperbase read https://en.wikipedia.org/wiki/Hypergraph -o output.jsonl

 # Specify reader and parser explicitly
-hyperbase read source.txt -o output.jsonl --reader plain_text --parser alphabeta --language en
+hyperbase read source.txt -o output.jsonl --reader plain_text --parser alphabeta --lang en
 ```

 ## Built-in readers

src/hyperbase/cli/__init__.py

Lines changed: 111 additions & 70 deletions
@@ -1,6 +1,101 @@
 import argparse
 import sys

+from hyperbase.parsers import Parser, list_parsers
+
+
+def _add_parser_args(
+    subparser: argparse.ArgumentParser, parser_cls: type[Parser]
+) -> None:
+    """Inject *parser_cls*-specific CLI arguments into *subparser*.
+
+    The arguments are derived from ``parser_cls.accepted_params()``: each
+    accepted parameter becomes ``--<name>``. Boolean parameters become
+    ``store_true`` flags. ``max_depth`` (declared on the base
+    :class:`Parser`) is added once globally, not per plugin.
+    """
+    for name, info in parser_cls.accepted_params().items():
+        flag = f"--{name}"
+        # Avoid clobbering an arg that's already been added (e.g. when
+        # the same name appears on multiple subparsers, or when both base
+        # and subclass declare it).
+        if any(
+            flag in (action.option_strings or [])
+            for action in subparser._actions  # type: ignore[attr-defined]
+        ):
+            continue
+        type_: type = info.get("type", str)
+        help_str: str = info.get("description", "") or ""
+        if type_ is bool:
+            subparser.add_argument(
+                flag,
+                action="store_true",
+                default=None,
+                help=help_str,
+            )
+        else:
+            subparser.add_argument(
+                flag,
+                type=type_,
+                default=None,
+                help=help_str,
+            )
+
+
+def _resolve_parser_name(
+    argv: list[str], subcommand: str, default: str | None
+) -> str | None:
+    """Look ahead in *argv* for ``--parser <name>`` under *subcommand*.
+
+    Returns the parser name to use for dynamically injecting parser-specific
+    args, or *default* if not specified. Falls back to the saved REPL
+    settings if no value is on the command line and *default* is ``None``.
+    """
+    pre = argparse.ArgumentParser(add_help=False)
+    pre.add_argument("--parser", default=None)
+
+    # Strip everything before the subcommand so the pre-parser only
+    # sees flags for the right subcommand. ``parse_known_args`` ignores
+    # the rest.
+    try:
+        idx = argv.index(subcommand)
+        rest = argv[idx + 1 :]
+    except ValueError:
+        rest = argv
+
+    pre_args, _ = pre.parse_known_args(rest)
+    if pre_args.parser:
+        return pre_args.parser
+    if default is not None:
+        return default
+
+    # Fall back to whatever the REPL last saved.
+    try:
+        from hyperbase.cli.repl import load_saved_settings
+
+        saved = load_saved_settings()
+        if saved.get("parser"):
+            return str(saved["parser"])
+    except Exception:
+        pass
+    return None
+
+
+def _maybe_load_parser_class(name: str | None) -> type[Parser] | None:
+    if not name:
+        return None
+    parsers = list_parsers()
+    if name not in parsers:
+        return None
+    try:
+        return parsers[name].load()  # type: ignore[no-any-return]
+    except Exception as e:
+        print(
+            f"Warning: failed to load parser {name!r}: {e}",
+            file=sys.stderr,
+        )
+        return None
+

 def main() -> None:
     parser = argparse.ArgumentParser(
@@ -35,7 +130,7 @@ def main() -> None:
     read_parser.add_argument(
         "--parser",
         type=str,
-        default="generative",
+        default=None,
         help="Parser plugin name (default: generative)",
     )
     read_parser.add_argument(
@@ -44,24 +139,6 @@ def main() -> None:
         default="auto",
         help="Reader name or 'auto' (default: auto)",
     )
-    read_parser.add_argument(
-        "--model_path",
-        type=str,
-        default=None,
-        help="Path to trained model (generative parser)",
-    )
-    read_parser.add_argument(
-        "--language",
-        type=str,
-        default=None,
-        help="Language for alphabeta parser",
-    )
-    read_parser.add_argument(
-        "--device",
-        type=str,
-        default=None,
-        help="Device to use (cuda/cpu/mps)",
-    )
     read_parser.add_argument(
         "--batch_size",
         type=int,
@@ -75,68 +152,32 @@ def main() -> None:
         help="Interactive REPL for SH parsers",
         formatter_class=argparse.RawDescriptionHelpFormatter,
     )
-
     repl_parser.add_argument(
         "--parser",
         type=str,
         default=None,
-        help="Parser plugin name (e.g. generative, alphabeta)",
-    )
-    repl_parser.add_argument(
-        "--model_path",
-        type=str,
-        default=None,
-        help="Path to trained model (generative parser)",
-    )
-    repl_parser.add_argument(
-        "--language",
-        type=str,
-        default=None,
-        help="Language for alphabeta parser (de, en, es, fr, pt, etc.)",
-    )
-    repl_parser.add_argument(
-        "--max_length",
-        type=int,
-        default=None,
-        help="Maximum sequence length (generative parser)",
-    )
-    repl_parser.add_argument(
-        "--num_beams",
-        type=int,
-        default=None,
-        help="Number of beams for beam search (generative parser)",
-    )
-    repl_parser.add_argument(
-        "--num_candidates",
-        type=int,
-        default=None,
-        help="Number of candidates for beam search (generative parser)",
-    )
-    repl_parser.add_argument(
-        "--use_constraints",
-        action="store_true",
-        default=None,
-        help="Enable post-generation SH constraint validation (generative parser)",
-    )
-    repl_parser.add_argument(
-        "--check_badness",
-        action="store_true",
-        default=None,
-        help="Enable badness check after parsing",
+        help="Parser plugin name",
     )
     repl_parser.add_argument(
         "--statistics",
         action="store_true",
         default=None,
-        help="Show parse statistics",
-    )
-    repl_parser.add_argument(
-        "--device",
-        type=str,
-        default=None,
-        help="Device to use (cuda/cpu/mps)",
+        help="Show parse statistics after each parse",
     )

+    # Dynamically inject parser-specific args, derived from the active
+    # parser's ``accepted_params()``. We do this in two passes so that
+    # plugin packages stay the source of truth for their CLI surface.
+    argv = sys.argv[1:]
+    for sub_name, sub_parser, default_parser in (
+        ("read", read_parser, "generative"),
+        ("repl", repl_parser, None),
+    ):
+        active = _resolve_parser_name(argv, sub_name, default_parser)
+        cls = _maybe_load_parser_class(active)
+        if cls is not None:
+            _add_parser_args(sub_parser, cls)
+
     args = parser.parse_args()

     if args.command is None:
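
Review note: the two-pass approach in this file (peek at `--parser`, then inject that plugin's flags before the real parse) can be tried in isolation with plain `argparse`. The sketch below is a standalone illustration and not the project's code. `FakeParser`, `REGISTRY`, and `build_cli` are hypothetical names standing in for a plugin class, `list_parsers()`, and `main()`. Only the shape of the `accepted_params()` dict (`type` and `description` keys) is taken from the diff above.

```python
import argparse
from typing import Any


class FakeParser:
    """Hypothetical stand-in for a parser plugin class."""

    @classmethod
    def accepted_params(cls) -> dict[str, dict[str, Any]]:
        # Same dict shape the diff reads: a "type" and a "description" per name.
        return {
            "lang": {"type": str, "description": "Language code"},
            "verbose": {"type": bool, "description": "Verbose output"},
        }


# Hypothetical registry; in the commit this role is played by list_parsers().
REGISTRY = {"fake": FakeParser}


def build_cli(argv: list[str]) -> argparse.Namespace:
    # Pass 1: look only for --parser; parse_known_args tolerates other flags.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--parser", default=None)
    pre_args, _unknown = pre.parse_known_args(argv)

    # Pass 2: build the real parser and inject the plugin's flags, if known.
    cli = argparse.ArgumentParser(prog="demo")
    cli.add_argument("--parser", default=None)
    plugin = REGISTRY.get(pre_args.parser)
    if plugin is not None:
        for name, info in plugin.accepted_params().items():
            param_type = info.get("type", str)
            help_str = str(info.get("description", ""))
            if param_type is bool:
                # Booleans become flags; default=None means "not given".
                cli.add_argument(
                    f"--{name}", action="store_true", default=None, help=help_str
                )
            else:
                cli.add_argument(
                    f"--{name}", type=param_type, default=None, help=help_str
                )
    return cli.parse_args(argv)


args = build_cli(["--parser", "fake", "--lang", "en"])
print(args)
```

Because every injected flag defaults to `None`, later code can distinguish a flag that was never given from one that was set explicitly. When no known plugin is selected, the plugin-specific attributes are simply absent from the namespace.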

src/hyperbase/cli/read.py

Lines changed: 25 additions & 13 deletions
@@ -2,9 +2,11 @@
 import os
 import sys

-from hyperbase.parsers import get_parser
+from hyperbase.parsers import get_parser, list_parsers
 from hyperbase.readers import get_reader

+DEFAULT_PARSER = "generative"
+

 def run_read(args: argparse.Namespace) -> None:
     ext = os.path.splitext(args.output)[1].lower()
@@ -27,24 +29,34 @@ def run_read(args: argparse.Namespace) -> None:
         )
         sys.exit(1)

-    # Build parser kwargs
-    kwargs = {}
-    if args.parser == "generative":
-        if args.model_path:
-            kwargs["model_path"] = args.model_path
-        if args.device:
-            kwargs["device"] = args.device
-    elif args.parser == "alphabeta":
-        if args.language:
-            kwargs["lang"] = args.language
+    parser_name: str = getattr(args, "parser", None) or DEFAULT_PARSER
+
+    parsers = list_parsers()
+    if parser_name not in parsers:
+        avail = ", ".join(sorted(parsers)) or "(none)"
+        print(
+            f"Error: parser {parser_name!r} is not installed. Available: {avail}",
+            file=sys.stderr,
+        )
+        sys.exit(1)
+    parser_cls = parsers[parser_name].load()
+
+    # Build kwargs from the parser's own ``accepted_params``: every
+    # parser-specific CLI flag was injected by ``hyperbase.cli`` based on
+    # the same dict, so this is just the inverse mapping.
+    kwargs: dict[str, object] = {}
+    for name in parser_cls.accepted_params():
+        value = getattr(args, name, None)
+        if value is not None:
+            kwargs[name] = value

     try:
-        parser = get_parser(args.parser, **kwargs)
+        parser = get_parser(parser_name, **kwargs)
     except ValueError as e:
         print(f"Error: {e}", file=sys.stderr)
         sys.exit(1)

-    print(f"Parser: {args.parser}", file=sys.stderr)
+    print(f"Parser: {parser_name}", file=sys.stderr)

     sentences = 0
     edges = 0
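
Review note: the "inverse mapping" that replaces the per-parser `if`/`elif` chain can be checked without installing any plugin. The sketch below is a standalone illustration under stated assumptions. `ACCEPTED` is a hypothetical parameter dict with the same shape as `accepted_params()`, and `kwargs_from_args` is a hypothetical helper name. The loop body mirrors the one added in this hunk.

```python
import argparse
from typing import Any

# Hypothetical accepted-params dict, same shape as Parser.accepted_params().
ACCEPTED: dict[str, dict[str, Any]] = {
    "lang": {"type": str, "description": "Language code"},
    "max_depth": {"type": int, "description": "Maximum depth"},
    "device": {"type": str, "description": "Device to use"},
}


def kwargs_from_args(
    args: argparse.Namespace, accepted: dict[str, dict[str, Any]]
) -> dict[str, object]:
    """Collect only the accepted parameters that were explicitly set."""
    kwargs: dict[str, object] = {}
    for name in accepted:
        # getattr with a default tolerates flags that were never injected.
        value = getattr(args, name, None)
        if value is not None:
            kwargs[name] = value
    return kwargs


# "device" is missing from the namespace and "max_depth" was left unset;
# "output" is a CLI option that is not a parser parameter.
ns = argparse.Namespace(
    parser="alphabeta", lang="en", max_depth=None, output="out.jsonl"
)
parser_kwargs = kwargs_from_args(ns, ACCEPTED)
print(parser_kwargs)
```

Testing `value is not None` rather than truthiness means an explicit `0` or empty string would still be forwarded, while unset flags leave the parser's own defaults in place.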
