Support parentheses for calculated measures#52
Conversation
This ensures that extend operations (which may reference summarized columns) are executed after the summarize clause in the generated KQL query.
…d hoc measures - Add _find_operator_outside_quotes() helper to find operators not inside quoted strings - Update _escape_and_quote_columns() to recursively escape both sides of operators - Update _is_number_literal() to match integers and decimals with digits on both sides - Add tests for: - Two quoted measures: "Measure 1" + "Measure 2" -> ["Measure 1"] + ["Measure 2"] - Measure with constant: "Measure 1" * 2 -> ["Measure 1"] * 2 - Measure with operator in name: "Measure 1-2" -> ["Measure 1-2"] - _is_number_literal function validation
- Add comprehensive unit tests for new helper functions: - TestIsInsideQuotesOrBrackets (10 tests) - TestFindMatchingParen (8 tests) - TestHasOperatorsOutsideQuotes (8 tests) - TestExtractAndReplaceAggregates (10 tests) - TestContainsAggregateFunction (10 tests) - Add comment explaining existing_aggs registration logic - Apply black formatting
Prevents double-escaping of expressions like (count(a) + count(b)) by detecting matching outer parentheses, processing the inner content recursively, and re-adding the parentheses.
- Add _ParseState class for unified parsing state tracking - Add _find_top_level_operator to find operators at depth 0 (outside quotes, brackets, and parens) - Add _count_outer_parens to handle nested parentheses in expressions - Add _wrap_column_refs_in_parens for arithmetic precedence - Update _escape_and_quote_columns to handle parenthesized expressions - Update _has_operators_outside_quotes to use new top-level operator detection - Refactor _is_inside_quotes_or_brackets and _find_matching_paren to use _ParseState - Remove old _find_operator_outside_quotes (replaced by _find_top_level_operator) - Update tests to reflect correct behavior for count(*) - no longer seen as expression with operators
- Add position constants to TestFindTopLevelOperator class - Add paren count constants to TestCountOuterParens class - Add closing position constants to TestFindMatchingParen class - Add aggregate count constants to TestExtractAndReplaceAggregates class - Add integration tests for parentheses preservation, quoted identifiers, uppercase functions, and extend after summarize
There was a problem hiding this comment.
Pull request overview
This pull request adds support for correctly parsing and handling calculated measures with parentheses in the Kusto KQL dialect. Previously, expressions like ("Measure 1") + ("Measure 2") or (("Measure")) were not parsed correctly, resulting in malformed KQL queries with incorrect bracketing and parentheses placement.
Changes:
- Introduced
_ParseStateclass to track parsing state (quotes, brackets, parentheses depth) and refactored helper functions to use it - Added
_count_outer_parens()to strip and count outer parentheses from expressions - Enhanced
_escape_and_quote_columns()to handle parenthesized expressions by stripping outer parens, processing operands recursively, and re-adding parens when needed - Modified query generation logic to place
summarizebeforeextendin the query structure - Changed
extend_columnsandsummarize_columnsfrom sets to lists to preserve column order - Added comprehensive unit and integration tests for parentheses handling
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| sqlalchemy_kusto/dialect_kql.py | Core implementation: Added _ParseState class, new helper functions for parentheses handling, updated _escape_and_quote_columns() and _get_projection_or_summarize() to support calculated measures with parentheses |
| tests/unit/test_dialect_kql.py | Added extensive unit tests for parentheses handling, operator detection, aggregate extraction, and helper functions. Updated existing tests to reflect new query structure (summarize before extend, preserved column order) |
| tests/integration/test_dialect_kql.py | Added integration tests to verify parentheses preservation and calculated measures work end-to-end with actual Kusto queries |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| summarize_columns = [] | ||
| extend_columns = [] |
There was a problem hiding this comment.
The change from sets to lists for summarize_columns and extend_columns (lines 416-417) removes automatic deduplication. This means if the same column definition is added multiple times through the logic, it will appear multiple times in the final query, which could cause KQL syntax errors or unexpected behavior.
Consider adding explicit deduplication logic if needed, or document why duplicate entries are not possible in this context.
| if ch == "[": | ||
| self.in_bracket = True | ||
| elif ch == "]": | ||
| self.in_bracket = False |
There was a problem hiding this comment.
The _ParseState class tracks bracket state as a simple boolean (in_bracket), which assumes brackets are never nested. However, KQL bracket notation ["column"] could theoretically contain nested structures or be part of more complex expressions. While this is unlikely in practice for KQL column references, the implementation is inconsistent with how parentheses are tracked (using a depth counter).
Consider whether nested brackets are possible in your use case. If they are, implement a depth counter similar to paren_depth for brackets as well.
| for operator in ["/", "+", "-", "*"]: | ||
| if operator in name: | ||
| # Split the name at the first operator and wrap the left part | ||
| parts = name.split(operator, 1) | ||
| # Remove quotes if they exist at the edges | ||
| col_part = parts[0].strip() | ||
| if col_part.startswith('"') and col_part.endswith('"'): | ||
| col_part = col_part[1:-1].strip() | ||
| col_part = col_part.replace('"', '\\"') | ||
| return f'["{col_part}"] {operator} {parts[1].strip()}' # Wrap the column part | ||
| # No operators found, just wrap the entire name | ||
| pos = KustoKqlCompiler._find_top_level_operator(inner, operator) | ||
| if pos != -1: | ||
| left = KustoKqlCompiler._escape_and_quote_columns( | ||
| inner[:pos].strip() | ||
| ) | ||
| right = KustoKqlCompiler._escape_and_quote_columns( | ||
| inner[pos + 1 :].strip() | ||
| ) | ||
| return ( | ||
| "(" * outer_paren_count | ||
| + left | ||
| + " " | ||
| + operator | ||
| + " " | ||
| + right | ||
| + ")" * outer_paren_count | ||
| ) |
There was a problem hiding this comment.
The operator parsing order ["/", "+", "-", "*"] (line 599) doesn't respect standard mathematical operator precedence. In mathematics, multiplication and division have higher precedence than addition and subtraction, so a + b * c should be parsed as a + (b * c), not (a + b) * c.
The current implementation would split "col_a" + "col_b" * 2 at the / first (not found), then at +, producing ["col_a"] + ["col_b"] * 2, which happens to be correct by accident. However, for an expression like "a" / "b" + "c", it would split at / first, producing ["a"] / ["b"] + "c", which would then not split the + "c" part correctly on the right side.
To fix this, search for lower-precedence operators first. Change the order to ["+", "-", "*", "/"] or better yet ["+", "-"] then ["*", "/"] in separate passes. Alternatively, note that the current behavior might be intentional if all operators are meant to be treated with equal precedence in KQL contexts - in that case, please add a comment explaining this design decision.
| """Count and strip outer parentheses from text. Returns (count, stripped_text).""" | ||
| text = text.strip() | ||
| count = 0 | ||
| while len(text) > 1 and text[0] == "(" and text[-1] == ")": | ||
| depth = 0 | ||
| for ch in text[:-1]: # Scan all but last char | ||
| depth += (ch == "(") - (ch == ")") | ||
| if depth == 0: | ||
| return count, text # First '(' closed before end |
There was a problem hiding this comment.
The _count_outer_parens function doesn't account for quoted strings or brackets when counting parentheses. This could lead to incorrect behavior when parentheses appear inside quotes or brackets. For example, with input '("text(with)parens")', the function may incorrectly determine that the outer parenthesis doesn't wrap the whole expression because it doesn't track quote state while scanning.
Consider using the _ParseState class that was introduced for this purpose to properly handle quotes and brackets when counting parentheses.
| """Count and strip outer parentheses from text. Returns (count, stripped_text).""" | |
| text = text.strip() | |
| count = 0 | |
| while len(text) > 1 and text[0] == "(" and text[-1] == ")": | |
| depth = 0 | |
| for ch in text[:-1]: # Scan all but last char | |
| depth += (ch == "(") - (ch == ")") | |
| if depth == 0: | |
| return count, text # First '(' closed before end | |
| """Count and strip outer parentheses from text. Returns (count, stripped_text). | |
| This respects quoted strings and brackets by using the parser-aware | |
| `_find_matching_paren` helper, which relies on `_ParseState`. | |
| """ | |
| text = text.strip() | |
| count = 0 | |
| # Repeatedly strip a single layer of outer parentheses as long as the | |
| # opening parenthesis at position 0 matches the final character. | |
| while len(text) > 1 and text[0] == "(": | |
| match_idx = KustoKqlCompiler._find_matching_paren(text, 0) | |
| # If we didn't find a matching paren, or it doesn't close at the end, | |
| # then the outer '(' does not wrap the whole expression. | |
| if match_idx != len(text) - 1: | |
| break |
| assert "((" in query_compiled | ||
| assert "))" in query_compiled |
There was a problem hiding this comment.
The test test_parentheses_preserved_in_expression at line 205-228 expects double parentheses to be preserved in the output (("Text")), asserting that "((" and "))" appear in the compiled query. However, according to the unit test test_calculated_measure_double_paren at lines 592-605 in tests/unit/test_dialect_kql.py, the comment states that "Double parens around a single measure reference are stripped since they're unnecessary for precedence."
This creates an inconsistency: the integration test expects parens to be preserved, while the unit test comments suggest they should be stripped for single values with no operators. Please verify which behavior is correct and ensure the tests are consistent with the intended functionality.
| assert "((" in query_compiled | |
| assert "))" in query_compiled | |
| # Double parens around a single measure reference should be stripped as | |
| # they're unnecessary for precedence; only the identifier itself should remain. | |
| assert "((" not in query_compiled | |
| assert "))" not in query_compiled |
| ) | ||
| if extend_columns: | ||
| extend_statement = f"| extend {', '.join(sorted(extend_columns))}" | ||
| extend_statement = f"| extend {', '.join(extend_columns)}" |
There was a problem hiding this comment.
The change from sorted(extend_columns) to just extend_columns removes the deterministic ordering of extend columns. While the new behavior preserves the order in which columns appear in the SELECT clause (which is generally more intuitive), this is a breaking change in query output format that could affect users who depend on alphabetical ordering.
Consider documenting this behavior change explicitly in the PR description or release notes, as it may impact query caching, testing, or tools that parse the generated KQL queries.
| extend_statement = f"| extend {', '.join(extend_columns)}" | |
| extend_statement = f"| extend {', '.join(sorted(extend_columns))}" |
…ntermediary measures, and parentheses support for calculated measures
…ted measures PR dodopizza#52 already implements extend-after-summarize ordering through its intermediary measures approach, so conflicts are resolved by keeping PR dodopizza#52's more comprehensive implementation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Merge after this PR: #51
Overview
Previously, calculated measures with parentheses were not parsed correctly resulting in mangled queries that threw syntax errors on run. In the image below, both calculated measures have incorrect bracketing and parentheses placement. The calculated measure with parentheses and an ad hoc measure has incorrect escaping. The calculated measure with parentheses and a predefined measure has the original sqlalchemy function instead of the Kusto function.

This development fixes this by stripping outer parentheses from expressions/sub-expressions and adding them back
Technical details
_ParseState: Refactor tracking for string position being within quotes or brackets into new class to remove some code duplication. Class also tracks parentheses depth. Refactored helper functions to use_ParseState_count_outer_parens(): Strip and count number of outer parentheses. Return number of outer parentheses and stripped expression_escape_and_quote_columns(): Strip expressions of outer parentheses before quoting and escaping them. Then, re-add the parentheses if there is an operation within the expression/subexpression within the parentheses. Otherwise, omit them.UI Changes