Fix: sql server param limit #127
Open
andres-sole wants to merge 11 commits into
Pull request overview
This PR addresses SQL Server’s 2,100 bound-parameter limit by introducing a shared ORM helper that batches large IN (...) filters, then migrating several high-volume query call sites (layout exporter, structure/semantic/scope services, Meili JSON, and DPM-XL queries) to use it.
Changes:
- Added `dpmcore.orm.query_utils.chunked_in` (with unit tests) to safely batch `IN (...)` predicates across supported backends.
- Replaced multiple unbounded `.in_(...)` usages across services/utilities with `chunked_in` to prevent SQL Server crashes on large modules.
- Updated Meili JSON tests to patch the shared helper (removing the service-local chunking implementation and its tests).
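The batching idea in the list above can be sketched without the ORM. This is a minimal illustration using the standard-library `sqlite3` module, not the `chunked_in` implementation from this PR: the function names `iter_chunks`, `fetch_by_ids`, and `build_demo_db`, the `item` table, and the raw-SQL approach are all assumptions made for the example. The chunk size of 900 is the `IN_CHUNK_SIZE` value stated in the PR description.

```python
import sqlite3
from typing import Iterable, Iterator, List, Sequence, Tuple

# Value quoted in the PR description; comfortably below SQL Server's 2,100 cap.
IN_CHUNK_SIZE = 900


def iter_chunks(values: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` values."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def fetch_by_ids(
    conn: sqlite3.Connection,
    ids: Iterable[int],
    size: int = IN_CHUNK_SIZE,
) -> List[Tuple[int, str]]:
    """Hypothetical helper: run one bounded IN (...) query per chunk."""
    # Order-preserving de-duplication, so a value repeated across two
    # batches cannot return its row twice.
    unique_ids = list(dict.fromkeys(ids))
    rows: List[Tuple[int, str]] = []
    for chunk in iter_chunks(unique_ids, size):
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT id, name FROM item WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, list(chunk)).fetchall())
    return rows


def build_demo_db(row_count: int) -> sqlite3.Connection:
    """Create an in-memory table with `row_count` rows for the example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO item (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, row_count + 1)],
    )
    return conn
```

With 2,500 IDs this sketch issues three statements (900, 900, and 700 parameters) instead of one statement with 2,500 parameters.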
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/unit/orm/test_query_utils.py | Adds unit coverage for the new chunked_in batching helper. |
| tests/unit/meili/test_meili_json_service.py | Removes tests for the deleted local chunking helper; updates mocking to target chunked_in. |
| src/dpmcore/orm/query_utils.py | Introduces chunked_in and IN_CHUNK_SIZE as shared query utilities. |
| src/dpmcore/services/layout_exporter/queries.py | Uses chunked_in to batch large ID lookups during layout export queries. |
| src/dpmcore/services/structure.py | Replaces multiple unbounded IN filters with chunked_in for bulk-loading structure data. |
| src/dpmcore/services/semantic.py | Uses chunked_in for module-scope lookups and re-deduplicates results across chunked DISTINCT queries. |
| src/dpmcore/services/scope_calculator.py | Applies chunked_in to bulk module/table/key lookups to avoid SQL Server parameter limits. |
| src/dpmcore/services/meili_json.py | Switches bulk loaders from a local chunking helper to the shared chunked_in. |
| src/dpmcore/server/routers/structure.py | Uses chunked_in when resolving organisation acronyms to IDs. |
| src/dpmcore/dpm_xl/model_queries.py | Uses chunked_in in DataFrame-producing model query helpers to avoid oversized IN predicates. |
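The `semantic.py` row mentions re-deduplicating results across chunked DISTINCT queries. The likely reason is that `DISTINCT` only removes duplicates within one statement, so the same value can come back from several chunks. A minimal sketch of that merge step follows, assuming a hypothetical `run_distinct_query` callable that stands in for one chunked query; neither it nor `distinct_across_chunks` is a function from this PR.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Sequence


def distinct_across_chunks(
    ids: Iterable[int],
    run_distinct_query: Callable[[Sequence[int]], Iterable[Hashable]],
    size: int = 900,
) -> List[Hashable]:
    """Merge per-chunk DISTINCT results into one duplicate-free list."""
    unique_ids = list(dict.fromkeys(ids))
    merged: Dict[Hashable, None] = {}  # a dict keeps first-seen order
    for start in range(0, len(unique_ids), size):
        chunk = unique_ids[start:start + size]
        for value in run_distinct_query(chunk):
            merged.setdefault(value, None)
    return list(merged)


# Stand-in for "SELECT DISTINCT module ... WHERE id IN (chunk)":
# every ID maps to one of three hypothetical module codes.
def fake_distinct_modules(chunk: Sequence[int]) -> List[str]:
    return sorted({f"MOD{i % 3}" for i in chunk})
```

Simply concatenating the per-chunk results would repeat each module code once per chunk; the dictionary merge collapses them back to one entry each.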
… to stay under the param cap, with a deterministic member-code winner
…ta/dpmcore into fix/sql-server-param-limit
Closes #126
Summary
SQL Server caps a single statement at 2,100 bound parameters, so the layout exporter crashed (`pyodbc 07002`) whenever a module referenced more than ~2,100 variable versions, because every ID was bound into one unbounded `IN (...)`. This PR introduces a shared chunking helper and migrates the high-volume, data-sized `IN` call sites across the codebase to use it.
What was done
- `dpmcore.orm.query_utils.chunked_in` (with `IN_CHUNK_SIZE = 900`): splits a `column.in_(values)` filter into fixed-size batches that stay well under the cap on every backend, and concatenates the results. Values are de-duplicated (order-preserving) so the chunked result matches single-statement `IN` semantics even when callers pass duplicates that would straddle batches.
- Migrated the data-sized `IN` lookups in the layout exporter, structure/semantic/scope-calculator services, Meili JSON, the DPM-XL model queries, and the structure router to `chunked_in`.
- Reworked `_load_member_codes` so the domain filter is applied in Python instead of a second unbounded `IN`, keeping the chunked statement under the cap regardless of how many domains an export spans. Its member-code result is now deterministic (highest `(category_id, code)` wins) rather than dependent on backend row order.
Notes
- Two `IN` sites flagged against #126 (Layout exporter fails on SQL Server when a module references more than 2,100 variable versions) are intentionally left: `dpm_xl/utils/filters.py` binds only release IDs (bounded by the number of releases, never near the cap) and is a query builder that can't use `chunked_in`; `dpm_xl/ast/operands.py` is a pandas-compiled `.distinct()` path that needs a separate chunk-and-concat approach. Tracked as a follow-up: this PR is part of #126, not a full close.
Checklist
- Code quality checks (`ruff format`, `ruff check`, `mypy`)
- Tests (`pytest`) with 100% branch coverage (`coverage report --fail-under=100`)
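The `_load_member_codes` change described above combines two ideas: filter by domain in Python rather than in SQL, and choose the winner with an explicit ordering. A rough sketch under assumptions: the row shape `(member_id, domain_id, category_id, code)`, keying the result by `member_id`, and the function name `pick_member_codes` are all hypothetical. Only the "highest `(category_id, code)` wins" rule comes from the PR description.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (member_id, domain_id, category_id, code).
MemberRow = Tuple[int, int, int, str]


def pick_member_codes(
    rows: Iterable[MemberRow],
    allowed_domain_ids: Iterable[int],
) -> Dict[int, str]:
    """Return one code per member, independent of the order rows arrive in."""
    allowed = set(allowed_domain_ids)
    best: Dict[int, Tuple[int, str]] = {}
    for member_id, domain_id, category_id, code in rows:
        # Domain filter applied in Python, so it adds no bound parameters
        # to the chunked SQL statement.
        if domain_id not in allowed:
            continue
        candidate = (category_id, code)
        # Tuple comparison: the highest (category_id, code) wins.
        if member_id not in best or candidate > best[member_id]:
            best[member_id] = candidate
    return {member_id: code for member_id, (_, code) in best.items()}
```

Because the winner is chosen by comparing tuples rather than by taking the first or last row seen, shuffling the input rows does not change the result.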