Fix COUNT with bracket notation in conditional expressions (Issue #59)#60
Open
McNultyyy wants to merge 11 commits into
Open
Fix COUNT with bracket notation in conditional expressions (Issue #59)#60McNultyyy wants to merge 11 commits into
McNultyyy wants to merge 11 commits into
Conversation
… - bump to v4.0.17 COUNT(expr) was stringifying the inner argument and passing it to JToken.SelectToken as a JSONPath, which failed with JsonException for any non-trivial expression — including the required double-quoted bracket notation when a field name is a Cosmos SQL reserved word (e.g. COUNT(c.amount["value"] > 0 ? 1 : undefined)). Evaluate the inner expression per-item via EvaluateSqlExpression (mirroring SUM/AVG) and count rows whose result is defined and non-null. Add UndefinedLiteralExpression so the `undefined` keyword maps to the runtime undefined sentinel and isn't conflated with the string literal 'undefined'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The SDK-rewritten {payload: ...} aggregate query path in ProjectAggregateFields
still used the buggy stringified-JSONPath approach for COUNT(expr), so the
Issue #59 fix only landed in EvaluateAggregateExpression. CI's aggregate
plan caching exposed it; local debug runs happened to skip this code path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator
Author
|
CI follow-up:
|
Single-aggregate non-VALUE queries (e.g. SELECT SUM(c.x) AS total FROM c)
through InMemoryCosmos.Builder() hit a pre-existing SDK aggregate-pipeline
shape mismatch on CI: the handler returns the result without the {payload:
{alias: {item: value}}} envelope the pipeline expects. The pre-existing
SqlFunctionTests.SumAggregate_ReturnsCorrectSum sidesteps it by going
through InMemoryContainer directly. Switching the single-aggregate Issue 59
tests to SELECT VALUE form triggers isValueAggregateBypass in
DefaultQueryPlanStrategy, so the SDK pipeline is suppressed and the
reporter-scenario coverage (Combined_CountAndSum, which uses the issue's
exact query shape) keeps its multi-aggregate bypass coverage too.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The query-plan ContainsAggregate whitelist (used to decide whether to bypass the SDK aggregate pipeline) was missing COUNTIF, so SELECT VALUE COUNTIF(...) queries returned a raw scalar via the bypass-only handler path while the SDK still tried to apply its aggregate pipeline — resulting in "RewrittenAggregateProjections was not an array for a value aggregate query" on CI Release runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… budget
Both Linux Docker and Windows emulator integration jobs were timing out on
the very first POST /dbs/parity-validation/colls with 503 / 1007 "high demand
in this region". The HTTP listener accepts requests before the partition
services are ready for writes, so the cold-start retry was burning through
the default ~75s EmulatorRetry budget on every test that needed a container.
Two changes, both on the emulator path only (in-memory tests unaffected):
- EmulatorSession.InitializeAsync now does a probe create+delete on a
throwaway container after CreateDatabaseIfNotExists, using the generous
maxRetries=60 / maxBackoffSeconds=15 budget. This pre-warms the partition
services once per run so subsequent per-test creates hit warm services.
- EmulatorTestFixture.CreateContainerCoreAsync now uses the same retry
budget as the session warmup (maxRetries=30 / maxBackoffSeconds=15) so
transient 503s mid-run are absorbed instead of failing tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adding COUNTIF to ContainsAggregate so the SDK's aggregate pipeline ran for
SELECT COUNTIF(...) AS x FROM c surfaced the underlying problem: the SDK
pipeline doesn't recognise COUNTIF and treats the response as a malformed
aggregate envelope ("Underlying object does not have an 'payload' field").
Add a narrow COUNTIF-only bypass alongside the existing GROUP BY /
multi-aggregate / VALUE-aggregate bypasses, so any query mentioning COUNTIF
short-circuits the SDK accumulator and the handler returns the result
directly. Leaves COUNT/SUM/MIN/MAX/AVG single-non-VALUE queries on the
existing path so the QueryPlan_*Aggregate_Detects* tests keep their
contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ansient The 503/1007 fix from 18babd2 surfaced a deeper layer: freshly-created containers can return 404 / 1013 ("Collection is not yet available for read") for tens of seconds while partition routing settles. Tests calling ReadItemAsync immediately after CreateContainerAsync would burn ~8 minutes of SDK-internal retries per test before failing. - EmulatorSession.InitializeAsync warmup probe now does Create + Upsert + Read on the throwaway container so the data plane is genuinely warm (not just the control plane) before tests start. - EmulatorTestFixture.CreateContainerCoreAsync now does the same probe inside the retry loop — each test class only gets back a container once Upsert succeeds, so test bodies don't hit 1013 on their first op. - EmulatorRetry.IsTransient treats 404/1013 specifically as transient. Plain 404/0 ("resource doesn't exist") is still non-transient, so this cannot mask genuine missing-resource bugs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous probe wrote {id, __pk} with the partition key header set to
probeId — but the container's partition key path is whatever the test
specified (e.g. /partitionKey). The PK extracted from the document was
undefined while the header said probeId, so the emulator returned
400 / 1001 "PartitionKey extracted from document doesn't match the one
specified in the header" and every per-test create failed fast.
ProbeDataPlaneAsync now reads props.PartitionKeyPaths and constructs a
JObject with each path's top-level property set, plus a PartitionKeyBuilder
matching. Skips the probe for nested (deeper than one slash) paths — those
are rare and the SDK's own retry will absorb the warmup window for them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI runs showed the Linux Docker emulator's read replica lags the write
replica by 6-7 minutes per freshly-created container. The per-test
ProbeDataPlaneAsync (which read back its own write to confirm read
availability) ate the full retry budget per test class — N test classes ×
~7 min = past the 45min CI job timeout, so no tests completed.
Net effect was worse than no probe at all: pre-existing Issue18 emulator
failures (13 tests on main) went away, but new failures in
FakeCosmosHandlerQueryAdvancedTests appeared because we never got to those
classes before timeout.
Reverting:
- EmulatorTestFixture.CreateContainerCoreAsync goes back to plain
CreateContainerIfNotExistsAsync with retry (no data-plane probe).
- EmulatorSession session-level warmup probe keeps the CreateItem step
(write replica is fast) but drops the ReadItemAsync (read replica is
slow and won't be ready in time anyway).
This restores main's baseline emulator behaviour for the pre-existing
flaky tests, while keeping the COUNTIF whitelist fixes that this PR
genuinely improves.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous bump (30 × 15s) caused each pre-existing flaky-on-main test class to wait ~7m before failing instead of ~1m. Across 13 known-flaky tests, that pushed the emulator-linux job past its 45m timeout, masking the difference between PR and main and burning runner minutes. Default budget (10 × 10s ≈ 75s) makes the flaky tests fail fast, same as main. Session-level warmup (60 × 15s ≈ 15m) is kept — it's a one-time per-run cost and absorbs the initial 503 storm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…budget Previous commit's session-level write probe created and deleted a throwaway container before tests started. On the CI Linux Docker emulator (PARTITION_COUNT=10) and especially the Windows emulator (PartitionCount=3), this consumed a slot that didn't fully free on delete in time, causing Handler_MultiContainer_CrudIsolated (which needs 2 concurrent containers) to fail with 503 — a new failure not in main's baseline. Reverted to the original behaviour: only CreateDatabaseIfNotExistsAsync in session init. Per-test fixture retries (default 75s budget) absorb the initial 503 storm, same as main. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
COUNT(expr)was stringifying the inner argument and handing it toJToken.SelectTokenas a JSONPath. That failed withNewtonsoft.Json.JsonException: Unexpected character while parsing pathfor any non-trivial expression — including the bracket notation required when a field name is a Cosmos SQL reserved word (c.amount["value"]) combined with comparison or ternary operators (COUNT(c.amount["value"] > 0 ? 1 : undefined)).COUNT(expr)now evaluates the inner expression per-item viaEvaluateSqlExpression(mirroring howSUM/AVGalready work) and counts rows whose result is defined and non-null — matching Cosmos DB semantics and preserving the existingCountField_InSelectValueObject_ExcludesNullsbehaviour.UndefinedLiteralExpressionso the SQLundefinedkeyword maps to the runtime undefined sentinel instead of the string"undefined"(previously indistinguishable from the literal'undefined').Closes #59. Version bumped to 4.0.17.
Test plan
Issue59_CountBracketNotationTests.cscoveringSUM,COUNTwith ternary, and combinedCOUNT+SUMon a reserved-word bracket-notation fieldCOUNT(c.score)null-exclusion behaviour preserved (CountField_InSelectValueObject_ExcludesNulls)🤖 Generated with Claude Code