Fix nested function calls in WHERE clause#12
Conversation
The SQL parser was routing legacy functions (CONTAINS, STARTSWITH, ENDSWITH, etc.) through the string-based FunctionCondition path even when their arguments contained nested function calls like LOWER() or UPPER(). The string-based ResolveValue method cannot parse function call syntax, causing a JsonException. Added a check for nested function calls in arguments, routing them through the modern SqlExpressionCondition path which recursively evaluates expressions. Bumps version to 2.0.184.
13 additional test cases covering: - 3-level deep nesting: CONTAINS(LOWER(TRIM(...))) - Nested inside nested: STARTSWITH(CONCAT(LOWER(...), ...), ...) - Both args as functions: CONTAINS(LOWER(...), LOWER(...)) - 3-arg CONTAINS with nested function + case-insensitive flag - IS_DEFINED/IS_NULL with nested function args - ARRAY_CONTAINS with nested function in search value - NOT combined with nested function predicate - AND with multiple nested function predicates - Undefined property through nested function - Null value flowing through nested function - Nested function in second argument only - Type conversion via ToString() in nested function
IS_DEFINED() in EvaluateSqlFunction had a fallthrough that returned true for UndefinedValue.Instance (args[0] != null). This was exposed by routing nested-function legacy calls through the modern path. Fixed to check: args[0] is not null and not UndefinedValue. This also fixes the 8 pre-existing GitHubIssueReproTests failures. Added 4 edge case tests: - IS_DEFINED(LOWER(c.nonexistent)) → false - IS_NULL(LOWER(c.nonexistent)) → false - CONTAINS(LOWER(c.numericField), 'x') → no match - ARRAY_CONTAINS + nested function combined with AND
Red-team edge case testing found 3 additional bugs in the nested function call fix: 1. ContainsFunctionCall only checked for nested function calls but missed CoalesceExpression (??) and arithmetic BinaryExpression in legacy function arguments. These still routed through the string- based FunctionCondition path, causing JsonException crashes. Added ContainsArithmetic and ContainsComplexExpression checks. 2. ARRAY_CONTAINS with UndefinedValue search value (from nested function on missing property) caused NullReferenceException. UndefinedValue.ToString() returns null, and searchStr.StartsWith throws NRE on object-type array elements. Fixed in both legacy and modern evaluation paths. Added 20 edge case tests covering coalesce, arithmetic, undefined propagation, deep nesting, type-checking, NOT, OR, and projection. Bumps version to 2.0.185.
|
I asked claude about this, because it sounded suspicious "For {this PR} , what is meant by " When detected, the function is routed through the modern SqlExpressionCondition path", particularly the "modern" bit, and why aren't we always using this path, is it a performance thing?" Response: "The word "modern" is misleading in the PR — there's nothing architecturally modern about it. There are just two evaluation paths in the parser, both written by the same author: FunctionCondition (the "legacy" path) — converts function arguments to strings via ExprToString(), then resolves them at evaluation time via ResolveValue(). This works for simple cases like CONTAINS(c.name, "foo") where arguments are field references or literals. It breaks when an argument is itself a function call like LOWER(c.name) because ExprToString produces the literal string "LOWER(c.name)" which ResolveValue can't handle. SqlExpressionCondition (the "modern" path) — keeps the full parsed expression tree and evaluates it recursively via EvaluateSqlExpression(). This handles arbitrary nesting because it walks the AST rather than flattening to strings. The reason both paths exist isn't performance — it's history. FunctionCondition was written first as a simple string-based approach. Later, the full expression evaluator (EvaluateSqlExpression) was built for more complex SQL features. The legacy path was never migrated because it worked fine for the simple cases it handled. Could we just route everything through SqlExpressionCondition? Yes, and that would be the cleaner fix — remove FunctionCondition entirely and always use the expression tree path. The PR takes the minimal-diff approach of only routing to the new path when it detects nested function calls, which leaves the bifurcation in place." So I think the best approach is to delete this non "modern" path completely. |
Problem
The SQL query parser throws a
JsonExceptioninResolveValuewhen processing queries with nested function calls in WHERE clauses (e.g.CONTAINS(LOWER(c.field), @value)). Single-level function calls work fine.Fixes #11
Root Cause
In
CosmosSqlParser.ToWhereExpression(), legacy functions (CONTAINS, STARTSWITH, ENDSWITH, ARRAY_CONTAINS, IS_DEFINED, IS_NULL) were always routed through the string-basedFunctionConditionpath viaExprToString(). When an argument was itself a function call likeLOWER(c.name), it was converted to the string"LOWER(c.name)"and passed toResolveValue(), which tried to parse it as a JSON path — causing the exception.Fix
Added a check in
ToWhereExpression()that detects when any argument of a legacy function contains a nested function call (using the existingContainsFunctionCallhelper). When detected, the function is routed through the modernSqlExpressionConditionpath, which recursively evaluates expression trees and already supports all the legacy functions.Tests Added
5 new tests in
SqlFunctionDeepDiveTests:Contains_Lower_InWhere_MatchesCaseInsensitivelyStartsWith_Upper_InWhere_MatchesCaseInsensitivelyEndsWith_Lower_InWhere_MatchesSuffixContains_Lower_InWhere_NoMatch_ReturnsEmptyContains_Lower_InWhere_MultipleMatchesVersion
Bumps all packages from 2.0.183 → 2.0.184.