Skip to content

Fix nested function calls in WHERE clause#12

Open
McNultyyy wants to merge 4 commits into
mainfrom
fix/nested-function-calls-in-where
Open

Fix nested function calls in WHERE clause#12
McNultyyy wants to merge 4 commits into
mainfrom
fix/nested-function-calls-in-where

Conversation

@McNultyyy
Copy link
Copy Markdown
Collaborator

Problem

The SQL query parser throws a JsonException in ResolveValue when processing queries with nested function calls in WHERE clauses (e.g. CONTAINS(LOWER(c.field), @value)). Single-level function calls work fine.

Fixes #11

Root Cause

In CosmosSqlParser.ToWhereExpression(), legacy functions (CONTAINS, STARTSWITH, ENDSWITH, ARRAY_CONTAINS, IS_DEFINED, IS_NULL) were always routed through the string-based FunctionCondition path via ExprToString(). When an argument was itself a function call like LOWER(c.name), it was converted to the string "LOWER(c.name)" and passed to ResolveValue(), which tried to parse it as a JSON path — causing the exception.

Fix

Added a check in ToWhereExpression() that detects when any argument of a legacy function contains a nested function call (using the existing ContainsFunctionCall helper). When detected, the function is routed through the modern SqlExpressionCondition path, which recursively evaluates expression trees and already supports all the legacy functions.

Tests Added

5 new tests in SqlFunctionDeepDiveTests:

  • Contains_Lower_InWhere_MatchesCaseInsensitively
  • StartsWith_Upper_InWhere_MatchesCaseInsensitively
  • EndsWith_Lower_InWhere_MatchesSuffix
  • Contains_Lower_InWhere_NoMatch_ReturnsEmpty
  • Contains_Lower_InWhere_MultipleMatches

Version

Bumps all packages from 2.0.183 → 2.0.184.

The SQL parser was routing legacy functions (CONTAINS, STARTSWITH,
ENDSWITH, etc.) through the string-based FunctionCondition path even
when their arguments contained nested function calls like LOWER() or
UPPER(). The string-based ResolveValue method cannot parse function
call syntax, causing a JsonException.

Added a check for nested function calls in arguments, routing them
through the modern SqlExpressionCondition path which recursively
evaluates expressions.

Bumps version to 2.0.184.
@McNultyyy McNultyyy requested a review from lemonlion as a code owner April 15, 2026 16:33
13 additional test cases covering:
- 3-level deep nesting: CONTAINS(LOWER(TRIM(...)))
- Nested inside nested: STARTSWITH(CONCAT(LOWER(...), ...), ...)
- Both args as functions: CONTAINS(LOWER(...), LOWER(...))
- 3-arg CONTAINS with nested function + case-insensitive flag
- IS_DEFINED/IS_NULL with nested function args
- ARRAY_CONTAINS with nested function in search value
- NOT combined with nested function predicate
- AND with multiple nested function predicates
- Undefined property through nested function
- Null value flowing through nested function
- Nested function in second argument only
- Type conversion via ToString() in nested function
IS_DEFINED() in EvaluateSqlFunction had a fallthrough that returned
true for UndefinedValue.Instance (args[0] != null). This was exposed
by routing nested-function legacy calls through the modern path.
Fixed to check: args[0] is not null and not UndefinedValue.

This also fixes the 8 pre-existing GitHubIssueReproTests failures.

Added 4 edge case tests:
- IS_DEFINED(LOWER(c.nonexistent)) → false
- IS_NULL(LOWER(c.nonexistent)) → false
- CONTAINS(LOWER(c.numericField), 'x') → no match
- ARRAY_CONTAINS + nested function combined with AND
Red-team edge case testing found 3 additional bugs in the nested
function call fix:

1. ContainsFunctionCall only checked for nested function calls but
   missed CoalesceExpression (??) and arithmetic BinaryExpression in
   legacy function arguments. These still routed through the string-
   based FunctionCondition path, causing JsonException crashes.
   Added ContainsArithmetic and ContainsComplexExpression checks.

2. ARRAY_CONTAINS with UndefinedValue search value (from nested
   function on missing property) caused NullReferenceException.
   UndefinedValue.ToString() returns null, and searchStr.StartsWith
   throws NRE on object-type array elements. Fixed in both legacy
   and modern evaluation paths.

Added 20 edge case tests covering coalesce, arithmetic, undefined
propagation, deep nesting, type-checking, NOT, OR, and projection.

Bumps version to 2.0.185.
@McNultyyy McNultyyy changed the title Fix nested function calls in WHERE clause (GitHub Issue #11) Fix nested function calls in WHERE clause #11 Apr 15, 2026
@McNultyyy McNultyyy changed the title Fix nested function calls in WHERE clause #11 Fix nested function calls in WHERE clause Apr 15, 2026
@lemonlion
Copy link
Copy Markdown
Owner

I asked claude about this, because it sounded suspicious

"For {this PR} , what is meant by " When detected, the function is routed through the modern SqlExpressionCondition path", particularly the "modern" bit, and why aren't we always using this path, is it a performance thing?"

Response:

"The word "modern" is misleading in the PR — there's nothing architecturally modern about it. There are just two evaluation paths in the parser, both written by the same author:

FunctionCondition (the "legacy" path) — converts function arguments to strings via ExprToString(), then resolves them at evaluation time via ResolveValue(). This works for simple cases like CONTAINS(c.name, "foo") where arguments are field references or literals. It breaks when an argument is itself a function call like LOWER(c.name) because ExprToString produces the literal string "LOWER(c.name)" which ResolveValue can't handle.

SqlExpressionCondition (the "modern" path) — keeps the full parsed expression tree and evaluates it recursively via EvaluateSqlExpression(). This handles arbitrary nesting because it walks the AST rather than flattening to strings.

The reason both paths exist isn't performance — it's history. FunctionCondition was written first as a simple string-based approach. Later, the full expression evaluator (EvaluateSqlExpression) was built for more complex SQL features. The legacy path was never migrated because it worked fine for the simple cases it handled.

Could we just route everything through SqlExpressionCondition? Yes, and that would be the cleaner fix — remove FunctionCondition entirely and always use the expression tree path. The PR takes the minimal-diff approach of only routing to the new path when it detects nested function calls, which leaves the bifurcation in place."

So I think the best approach is to delete this non "modern" path completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: SQL parser fails on nested function calls like CONTAINS(LOWER(...))

2 participants