
Commit 2a9d0b4

Added example about how to write back the parsed tree, as requested at issue #3
1 parent 19cad56 commit 2a9d0b4

6 files changed

Lines changed: 209 additions & 27 deletions


.github/workflows/pre-commit.yml

Lines changed: 10 additions & 10 deletions
```diff
@@ -10,11 +10,11 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11" ]
+        python-version: [ "3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13" ]
     name: Pre-commit python ${{ matrix.python-version }}
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/setup-python@v4
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
         id: cachepy
         with:
           python-version: ${{ matrix.python-version }}
@@ -34,17 +34,17 @@ jobs:
       - name: 'Install dev requirements'
         run: pip install -r dev-requirements.txt -r mypy-requirements.txt
       - name: MyPy cache
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: .mypy_cache/${{ matrix.python-version }}
           key: mypy-${{ matrix.python-version }}
       - name: 'pre-commit'
-        uses: pre-commit/action@v3.0.0
+        uses: pre-commit/action@v3.0.1
         # if: ${{ matrix.python-version != '3.6' }}
         with:
           extra_args: --all -c .pre-commit-config.yaml
       # - name: 'pre-commit (custom Python ${{ matrix.python-version }})'
-        # uses: pre-commit/action@v3.0.0
+        # uses: pre-commit/action@v3.0.1
         # if: ${{ matrix.python-version == '3.6' }}
         # with:
           # extra_args: --all -c .pre-commit-config-gh-${{ matrix.python-version }}.yaml
@@ -60,7 +60,7 @@ jobs:
       - name: Print licences report
         if: ${{ always() }}
         run: echo "${{ steps.license_check_report.outputs.report }}"
-      - uses: actions/upload-artifact@v3
+      - uses: actions/upload-artifact@v4
         with:
           retention-days: 2
           path: constraints-${{ matrix.python-version }}.txt
@@ -71,8 +71,8 @@ jobs:
     needs:
       - pre-commit
     steps:
-      - uses: actions/checkout@v3
-      - uses: actions/download-artifact@v3
+      - uses: actions/checkout@v4
+      - uses: actions/download-artifact@v4
         with:
           path: changes-dir
       - name: Move artifacts to their right place
@@ -81,7 +81,7 @@ jobs:
           rm -r changes-dir/artifact
       - name: Create Pull Request
         id: cpr
-        uses: peter-evans/create-pull-request@v4
+        uses: peter-evans/create-pull-request@v7
         with:
           title: Updated constraints (triggered by ${{ github.sha }})
           delete-branch: true
```

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -31,7 +31,7 @@ repos:
         # args: [--strict, --show-error-codes, --no-warn-unused-ignores, --python-executable, .pyWEenv/bin/python]
   # - repo: meta
   - repo: https://github.com/jmfernandez/pre-commit_mirrors-actionlint.git
-    rev: v1.6.24
+    rev: v1.7.1
     hooks:
       - id: actionlint
 
```

README.md

Lines changed: 13 additions & 6 deletions
````diff
@@ -22,12 +22,12 @@ pip install git+https://github.com/inab/python-groovy-parser.git
 
 ## Test programs
 
-This repo contains a couple of test programs called
-[translated-groovy3-parser.py](translated-groovy3-parser.py) and
-[cached-translated-groovy3-parser.py](cached-translated-groovy3-parser.py),
+This repo contains three test programs called
+[translated-groovy3-parser.py](translated-groovy3-parser.py),
+[cached-translated-groovy3-parser.py](cached-translated-groovy3-parser.py) and [parser-groovy-writer.py](parser-groovy-writer.py),
 which demonstrate how to use the parser and digest it a bit.
 
-The programs take one or more files as input.
+All the programs take one or more files as input.
 
 ```bash
 git pull https://github.com/nf-core/rnaseq.git
@@ -42,8 +42,8 @@ Also, when the parsing task worked properly, it condenses and serializes
 the parse tree into a file with extension `.lark.json` (for instance,
 `rnaseq/modules/local/bedtools_genomecov.nf.lark.json`).
 
-And as a proof of concept, it tries to identify features from Nextflow files,
-like the declared processes, includes and workflows, and they are roughly printed
+The first two programs try, as a proof of concept, to identify features from Nextflow files,
+like the declared `process`, `include` and `workflow`, and they are roughly printed
 at a file with extension `.lark.result` (for instance `rnaseq/modules/local/bedtools_genomecov.nf.lark.result`).
 
 As parsing task is heavy, the parsing module also contains a method to
@@ -59,6 +59,13 @@ The caching directory contents depend on the grammar and the implementations, as
 So, if this software is updated (due grammar is updated or a bug is fixed),
 cached contents from previous versions are not reused.
 
+The third program `parser-groovy-writer.py` was written thinking on a request from an
+issue, where the issuer wanted to write back the parsed tree after some processing.
+So, this program writes in a new file with extension `.mirrored` what it survived the parsing.
+In the current implementation there are some elements, like comments and some combinations of
+whitespaces, which are not propagated from the tokenizer to the lexer and parser,
+so they are not reintegrated.
+
 # Acknowledgements
 
 The tokenizer is an evolution from Pygments Groovy lexer https://github.com/pygments/pygments/blob/b7c8f35440f591c6687cb912aa223f5cf37b6704/pygments/lexers/jvm.py#L543-L618
````
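The README explains that cached digests are tied to the grammar and the implementation, so an update makes entries from older versions unreachable, and `parse_and_digest_groovy_content` (in `groovy_parser/parser.py` below) stores the digested tree as gzipped JSON in a `hashfile`. The following is a minimal sketch of that scheme, assuming SHA-256 keys. `cached_digest`, `compute` and `signature` are hypothetical names; `signature` stands in for whatever the module derives from its `SIGNATURE_FILES` list, whose handling this diff does not show.

```python
import gzip
import hashlib
import json
import os
from typing import Any, Callable, Optional


def cached_digest(
    content: str,
    signature: bytes,
    compute: Callable[[str], Any],
    cache_directory: Optional[str] = None,
) -> Any:
    """Return compute(content), reusing a gzipped JSON copy when available.

    `signature` is a hypothetical stand-in for the grammar/implementation
    version: changing it changes the cache key, so entries written by
    older versions are simply never looked up again.
    """
    if cache_directory is None:
        return compute(content)

    h = hashlib.sha256()
    # Use a fixed-length digest of the signature so the boundary between
    # signature and content is unambiguous.
    h.update(hashlib.sha256(signature).digest())
    h.update(content.encode("utf-8"))
    hashfile = os.path.join(cache_directory, h.hexdigest() + ".json.gz")

    if os.path.exists(hashfile):
        with gzip.open(hashfile, mode="rt", encoding="utf-8") as jH:
            return json.load(jH)

    result = compute(content)
    os.makedirs(cache_directory, exist_ok=True)
    with gzip.open(hashfile, mode="wt", encoding="utf-8") as jH:
        json.dump(result, jH)
    return result
```

Because the key covers both the signature and the content, stale entries are never read back; they only remain on disk until the cache directory is cleaned.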

groovy_parser/__init__.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 
 # SPDX-License-Identifier: Apache-2.0
-# Copyright (C) 2023 Barcelona Supercomputing Center, José M. Fernández
+# Copyright (C) 2024 Barcelona Supercomputing Center, José M. Fernández
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -17,8 +17,8 @@
 # limitations under the License.
 
 __author__ = "José M. Fernández <https://orcid.org/0000-0002-4806-5140>"
-__copyright__ = "2023 Barcelona Supercomputing Center (BSC), ES"
+__copyright__ = "2024 Barcelona Supercomputing Center (BSC), ES"
 __license__ = "Apache-2.0"
 
 # https://www.python.org/dev/peps/pep-0396/
-__version__ = "0.1.1"
+__version__ = "0.1.2"
```

groovy_parser/parser.py

Lines changed: 34 additions & 7 deletions
```diff
@@ -2,7 +2,7 @@
 # -*- coding: utf-8 -*-
 
 # SPDX-License-Identifier: Apache-2.0
-# Copyright (C) 2023 Barcelona Supercomputing Center, José M. Fernández
+# Copyright (C) 2024 Barcelona Supercomputing Center, José M. Fernández
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -118,11 +118,23 @@ def default(  # type: ignore[override]
                 and isinstance(children[0], LarkTree)
                 and children[0].data not in noflat
             ):
-                return self.default(children[0], rule=new_rule)
+                return self.default(
+                    children[0],
+                    rule=new_rule,
+                    prune=prune,
+                    noflat=noflat,
+                )
             else:
                 return {
                     "rule": new_rule,
-                    "children": [self.default(child) for child in children],
+                    "children": [
+                        self.default(
+                            child,
+                            prune=prune,
+                            noflat=noflat,
+                        )
+                        for child in children
+                    ],
                 }
         else:
             # No children!!!!!!!
@@ -176,8 +188,16 @@ def parse_groovy_content(content: "str") -> "ParseTree":
     return tree
 
 
-def digest_lark_tree(tree: "ParseTree") -> "Union[RuleNode, LeafNode, EmptyNode]":
-    return LarkFilteringTreeEncoder().default(tree)
+def digest_lark_tree(
+    tree: "ParseTree",
+    prune: "Sequence[str]" = ["sep", "nls"],
+    noflat: "Sequence[str]" = ["script_statement"],
+) -> "Union[RuleNode, LeafNode, EmptyNode]":
+    return LarkFilteringTreeEncoder().default(
+        tree,
+        prune=prune,
+        noflat=noflat,
+    )
 
 
 SIGNATURE_FILES = [
@@ -196,7 +216,10 @@ def digest_lark_tree(tree: "ParseTree") -> "Union[RuleNode, LeafNode, EmptyNode]
 
 
 def parse_and_digest_groovy_content(
-    content: "str", cache_directory: "Optional[str]" = None
+    content: "str",
+    cache_directory: "Optional[str]" = None,
+    prune: "Sequence[str]" = ["sep", "nls"],
+    noflat: "Sequence[str]" = ["script_statement"],
 ) -> "Union[RuleNode, LeafNode, EmptyNode]":
     t_tree: "Optional[Union[RuleNode, LeafNode, EmptyNode]]" = None
     hashfile: "Optional[str]" = None
@@ -240,7 +263,11 @@ def parse_and_digest_groovy_content(
 
     if t_tree is None:
         tree = parse_groovy_content(content)
-        t_tree = LarkFilteringTreeEncoder().default(tree)
+        t_tree = LarkFilteringTreeEncoder().default(
+            tree,
+            prune=prune,
+            noflat=noflat,
+        )
 
     if hashfile is not None:
         with gzip.open(hashfile, mode="wt", encoding="utf-8") as jH:
```
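Before this commit, the recursive calls inside `LarkFilteringTreeEncoder.default` did not forward `prune` and `noflat`, so, as the diff suggests, a non-default value given to the top-level call only applied to the root node while nested nodes fell back to the defaults. That matters for `parser-groovy-writer.py`, which calls `digest_lark_tree(tree, prune=[])` to keep separators and newlines. The self-contained sketch below reproduces the corrected pattern on a toy tree; `Node` and `digest` are hypothetical stand-ins rather than `lark.Tree` or the module's encoder, and the pruning and flattening rules are simplified assumptions inferred from the diff. It uses tuples as defaults instead of the lists in the diff, a common precaution against Python's shared mutable default arguments.

```python
from typing import Any, Dict, List, Sequence, Union


class Node:
    """Hypothetical stand-in for a Lark tree node (not lark.Tree)."""

    def __init__(self, data: str, children: List[Union["Node", str]]) -> None:
        self.data = data
        self.children = children


def digest(
    o: Union[Node, str],
    rule: Sequence[str] = (),
    prune: Sequence[str] = ("sep", "nls"),
    noflat: Sequence[str] = ("script_statement",),
) -> Dict[str, Any]:
    if not isinstance(o, Node):
        return {"value": o}  # a token: keep its text
    new_rule = [*rule, o.data]
    # Drop children whose rule name is listed in `prune`.
    children = [
        c for c in o.children if not (isinstance(c, Node) and c.data in prune)
    ]
    if (
        len(children) == 1
        and isinstance(children[0], Node)
        and children[0].data not in noflat
    ):
        # Collapse a single-child chain, forwarding the caller's settings.
        return digest(children[0], rule=new_rule, prune=prune, noflat=noflat)
    return {
        "rule": new_rule,
        # Forwarding prune/noflat here as well mirrors the fix in the diff.
        "children": [digest(c, prune=prune, noflat=noflat) for c in children],
    }


tree = Node(
    "script",
    [Node("block", [Node("statement", ["a"]), Node("nls", ["\n"])])],
)
print(digest(tree))  # the nested "nls" node is pruned by default
print(digest(tree, prune=[]))  # the nested "nls" node is kept
```

With the default settings the chain `script > block > statement` collapses into a single node whose `rule` is `["script", "block", "statement"]`; with `prune=[]` the nested `nls` node survives, which would not happen if the recursive calls dropped `prune`.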

parser-groovy-writer.py

Lines changed: 148 additions & 0 deletions
```diff
@@ -0,0 +1,148 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+
+# SPDX-License-Identifier: Apache-2.0
+# groovy-parser, a proof of concept Groovy parser based on Pygments and Lark
+# Copyright (C) 2024 Barcelona Supercomputing Center, José M. Fernández
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import json
+import logging
+import os
+import re
+import sys
+
+from typing import (
+    cast,
+    NamedTuple,
+    TYPE_CHECKING,
+)
+
+if TYPE_CHECKING:
+    from typing import (
+        IO,
+        Iterator,
+        MutableSequence,
+        Optional,
+        Sequence,
+        Tuple,
+        Union,
+    )
+
+    from groovy_parser.parser import (
+        EmptyNode,
+        LeafNode,
+        RuleNode,
+    )
+
+from pygments.token import Token
+
+from groovy_parser.parser import (
+    parse_groovy_content,
+    digest_lark_tree,
+)
+
+from lark import (
+    Lark,
+    Transformer,
+    v_args,
+)
+from lark.visitors import Discard
+
+prev_wants_space = False
+
+
+def write_groovy(
+    t_tree: "Union[RuleNode, LeafNode, EmptyNode]",
+    mH: "IO[str]",
+    reset_prev_wants_space: "bool" = False,
+) -> None:
+    global prev_wants_space
+    if reset_prev_wants_space:
+        prev_wants_space = False
+    children = cast("RuleNode", t_tree).get("children")
+    if children is not None:
+        for child in children:
+            write_groovy(child, mH=mH)
+    else:
+        leaf = cast("LeafNode", t_tree).get("leaf")
+        value = cast("LeafNode", t_tree).get("value")
+        if value is not None and leaf is not None:
+            wants_space = False
+            print(f"Leaf {leaf} value {value}")
+            if prev_wants_space and leaf in (
+                "STRING_LITERAL",
+                "IDENTIFIER",
+                "CAPITALIZED_IDENTIFIER",
+                "LBRACE",
+                "GSTRING_BEGIN",
+            ):
+                # These whitespace separators were silenced
+                mH.write(" ")
+
+            if leaf == "STRING_LITERAL":
+                mH.write("'")
+
+            mH.write(value)
+            if leaf in ("IDENTIFIER", "CAPITALIZED_IDENTIFIER", "RBRACE", "COMMA"):
+                wants_space = True
+            elif leaf == "STRING_LITERAL":
+                mH.write("'")
+
+            prev_wants_space = wants_space
+
+
+def mirror_groovy_source(
+    filename: "str", jsonfile: "str", mirror_filename: "str"
+) -> "Union[RuleNode, LeafNode, EmptyNode]":
+    with open(filename, mode="r", encoding="utf-8") as wfH:
+        content = wfH.read()
+
+    tree = parse_groovy_content(content)
+
+    t_tree = digest_lark_tree(tree, prune=[])
+
+    # These are for debugging purposes
+    # logging.debug(tree.pretty())
+    # with open(jsonfile, mode="w", encoding="utf-8") as jH:
+    # json.dump(tree, jH, indent=4, cls=LarkFilteringTreeEncoder)
+    with open(jsonfile, mode="w", encoding="utf-8") as jH:
+        json.dump(t_tree, jH, indent=4)
+
+    with open(mirror_filename, mode="w", encoding="utf-8") as mH:
+        write_groovy(t_tree, mH, reset_prev_wants_space=True)
+
+    return t_tree
+
+
+if __name__ == "__main__":
+    logging.basicConfig(
+        level=logging.DEBUG,
+    )
+    log = logging.getLogger()  # root logger
+    for filename in sys.argv[1:]:
+        print(f"* Parsing {filename}")
+        logfile = filename + ".lark"
+        jsonfile = logfile + ".json"
+        mirrored_filename = filename + ".mirrored"
+        fH = logging.FileHandler(logfile, mode="w", encoding="utf-8")
+        for hdlr in log.handlers[:]:  # remove all old handlers
+            log.removeHandler(hdlr)
+        log.addHandler(fH)  # set the new handler
+        try:
+            mirror_groovy_source(filename, jsonfile, mirrored_filename)
+        except Exception as e:
+            print(f"\tParse failed, see {logfile}")
+            logging.exception("Parse failed")
+        fH.close()
```
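`write_groovy` keeps its spacing state in the module-level global `prev_wants_space` and prints every leaf to stdout while writing. The following is a minimal sketch of the same spacing heuristic with the state held on an object instead, so several trees can be written independently. It assumes the digested-tree shape that `write_groovy` reads (rule nodes carrying `children`, leaves carrying `leaf` and `value`); `GroovyWriter`, `SPACE_BEFORE` and `SPACE_AFTER` are hypothetical names. As the README notes, comments and other silenced whitespace still cannot be restored this way.

```python
import io
from typing import IO, Any, Dict

# Token kinds taken from write_groovy in the diff above.
SPACE_BEFORE = {
    "STRING_LITERAL",
    "IDENTIFIER",
    "CAPITALIZED_IDENTIFIER",
    "LBRACE",
    "GSTRING_BEGIN",
}
SPACE_AFTER = {"IDENTIFIER", "CAPITALIZED_IDENTIFIER", "RBRACE", "COMMA"}


class GroovyWriter:
    """Writes a digested tree back as text; use one instance per output."""

    def __init__(self, out: IO[str]) -> None:
        self.out = out
        self.prev_wants_space = False  # replaces the module-level global

    def write(self, node: Dict[str, Any]) -> None:
        children = node.get("children")
        if children is not None:
            for child in children:
                self.write(child)
            return
        leaf = node.get("leaf")
        value = node.get("value")
        if leaf is None or value is None:
            return  # nothing to emit, e.g. an empty rule node
        if self.prev_wants_space and leaf in SPACE_BEFORE:
            # Re-create a separator that was silenced before parsing.
            self.out.write(" ")
        quote = "'" if leaf == "STRING_LITERAL" else ""
        self.out.write(f"{quote}{value}{quote}")
        self.prev_wants_space = leaf in SPACE_AFTER


if __name__ == "__main__":
    tree = {
        "rule": ["script"],
        "children": [
            {"leaf": "IDENTIFIER", "value": "println"},
            {"leaf": "STRING_LITERAL", "value": "hello"},
        ],
    }
    buf = io.StringIO()
    GroovyWriter(buf).write(tree)
    print(buf.getvalue())  # println 'hello'
```

For the sample tree a space goes between the identifier and the following string literal, and the quotes are re-added around the literal, just as `write_groovy` does.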
