Parses Python source files using ANTLR4 (Python3 grammar from grammars-v4), removes all AST nodes matching an ANTLR4 XPath expression or replaces them with custom text, and writes the result to an output file.
Requirements:
- Java 17+
- Maven 3.6+
- build: mvn package
- test: mvn test
String PythonTransformer.transform(Path inputPath, String xpathExpr, String replacement)
Reads a Python file from inputPath, parses it into an AST, locates the nodes matched by the xpathExpr query, and produces Python source in which the text of those nodes is replaced by the replacement text.
If replacement is empty, the matched text is simply removed.
There is also an API function to apply multiple transformations (multiple query+replacement pairs).
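The replacement step described above can be pictured as a set of text edits applied to character ranges. The sketch below is illustrative only and is not the project's implementation: SpanRewriter, Span and replaceSpans are hypothetical names, the ranges are assumed to be half-open and non-overlapping, and the offsets are written by hand rather than taken from a parse tree. (ANTLR token stop indices are inclusive, so a real caller would presumably add 1 to obtain the end offset.)

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the project: span-based text replacement.
final class SpanRewriter {

    /** A half-open character range [start, end) in the source text. */
    record Span(int start, int end) {}

    /**
     * Replaces every span with the given replacement text.
     * An empty replacement deletes the span. Spans must not overlap.
     */
    static String replaceSpans(String source, List<Span> spans, String replacement) {
        List<Span> ordered = new ArrayList<>(spans);
        // Work from the end of the text backwards so earlier offsets stay valid.
        ordered.sort(Comparator.comparingInt(Span::start).reversed());
        StringBuilder out = new StringBuilder(source);
        for (Span s : ordered) {
            out.replace(s.start(), s.end(), replacement);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "x = 1  # set x\ny = 2\n";
        // Assumption: a query matched the comment "# set x" at offsets [7, 14).
        String result = replaceSpans(source, List.of(new Span(7, 14)), "");
        System.out.print(result);
    }
}
```

Applying the edits from the last span to the first avoids recomputing offsets after each replacement, which is why the spans are sorted in descending order before the loop.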
- transform(input, "//shebang", "") -- removes the shebang line defining the Python interpreter
- transform(input, "//shebang", "python3") -- normalises the shebang line
- transform(input, "//comment", "") -- removes all comments
ANTLR does not support the full XPath syntax.
In particular, it does not support queries with node conditions (predicates), such as //comment[contains(@text, 'TODO')].
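One possible workaround, shown here only as a sketch, is to run the unconditional query (for example //comment) and then filter the matched nodes in Java. NodeFilter and withTextContaining are hypothetical names and not part of this project's API. The helper is generic so that it runs without ANTLR on the classpath; with ANTLR, the nodes would be the ParseTree objects returned by XPath.findAll, and ParseTree::getText could serve as the text extractor.

```java
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the project: emulates a contains(...) predicate.
final class NodeFilter {

    /**
     * Keeps only the nodes whose text contains the given substring.
     * textOf extracts the text of a node (e.g. ParseTree::getText with ANTLR).
     */
    static <T> List<T> withTextContaining(Collection<T> nodes,
                                          Function<T, String> textOf,
                                          String needle) {
        return nodes.stream()
                .filter(node -> textOf.apply(node).contains(needle))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Plain strings stand in for matched comment nodes in this sketch.
        List<String> comments = List.of("# TODO: fix this", "# set x", "# TODO later");
        List<String> todos = withTextContaining(comments, Function.identity(), "TODO");
        todos.forEach(System.out::println);
    }
}
```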
- PMD also uses XPath to query ASTs (Java, JS) and to define antipatterns corresponding to bugs. The advantage of using ANTLR is that it has a grammar library supporting most mainstream languages.
- Tree-sitter has a Lisp-like query language.
- Semgrep has a custom rule syntax to query syntax trees.