Add ignore param to createDocuments for silent duplicate handling #851

premtsd-code wants to merge 4 commits into main
Conversation
- Add `bool $ignore` param to `createDocuments` across all adapters (MariaDB: INSERT IGNORE, Postgres: ON CONFLICT DO NOTHING, SQLite: INSERT OR IGNORE, MongoDB: ordered:false outside transaction)
- Pre-fetch existing document IDs to skip relationship writes and filter known duplicates from adapter batches
- Deduplicate intra-batch documents by ID (first occurrence wins) to prevent ACL drift and overcounted results
- Handle tenant-per-document mode with composite keys for pre-fetch
- Chunk all find queries by maxQueryValues
- Optimize upsertDocuments: batch-fetch existing docs with find instead of per-row getDocument calls (~2x speedup)
- Add ignore param to Mirror::createDocuments
- Add e2e tests: mixed duplicates, all duplicates, intra-batch duplicates with onNext callback assertions
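The intra-batch "first occurrence wins" deduplication with tenant-aware composite keys can be sketched language-neutrally. This Python version only illustrates the logic described above; the dictionary shape and key format are illustrative assumptions, not the library's actual `Document` API:

```python
def dedupe_first_wins(documents):
    """Keep only the first occurrence of each (tenant, id) pair.

    `documents` is a list of dicts with 'id' and optional 'tenant' keys;
    the real implementation operates on Document objects, this is a sketch.
    """
    seen = set()
    result = []
    for doc in documents:
        # Composite key so the same id under different tenants is NOT deduped
        key = (doc.get("tenant"), doc["id"])
        if key in seen:
            continue  # later duplicates are dropped: first occurrence wins
        seen.add(key)
        result.append(doc)
    return result
```

Because later duplicates are dropped before the adapter sees them, their permission sets never overwrite the first occurrence's, which is what prevents the ACL drift mentioned above.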
📝 Walkthrough: Adds an optional `ignore` parameter to `createDocuments` across the database and adapter layers.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client as Client
    participant DB as Database
    participant Adapter as Adapter
    participant Store as Storage
    Client->>DB: createDocuments(docs, ignore=true)
    DB->>DB: Deduplicate intra-batch by ID (tenant-aware)
    DB->>Adapter: find(ids chunked) pre-fetch existing IDs
    Adapter->>Store: Query existing IDs
    Store-->>Adapter: Existing IDs
    Adapter-->>DB: Existing ID set
    DB->>DB: Filter docs excluding pre-existing IDs
    DB->>Adapter: createDocuments(filteredDocs, ignore=true)
    alt Using SQL dialect
        Adapter->>Store: INSERT (INSERT IGNORE / INSERT OR IGNORE / ON CONFLICT DO NOTHING)
    else Using Mongo
        Adapter->>Store: find existing _uid + insertMany(filtered)
    end
    Store-->>Adapter: Insert result / skipped count
    Adapter-->>DB: Insert results
    DB-->>Client: Return inserted count
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Greptile Summary

This PR adds an `ignore` parameter to `createDocuments` for silent duplicate handling. Two adapter-level pre-filter bugs remain, detailed in the file overview below.

Confidence Score: 3/5. Two P1 bugs can cause incorrect results; see the Mongo and SQL adapter entries below.
| Filename | Overview |
|---|---|
| src/Database/Adapter/Mongo.php | Adds pre-filter for ignore mode; firstBatch cursor truncation can miss existing UIDs when batch has >~101 duplicates, causing insertMany to throw in transaction context. |
| src/Database/Adapter/SQL.php | Adds ignore-mode duplicate pre-filter and INSERT IGNORE/ON CONFLICT plumbing; tenantPerDocument pre-filter incorrectly uses session tenant instead of per-document tenant. |
| src/Database/Database.php | Adds ignore parameter with intra-batch deduplication and chunked pre-fetch of existing IDs; logic is correct, but count/onNext overcount on SQL race-condition is a known documented limitation. |
| src/Database/Adapter/Postgres.php | Adds ON CONFLICT DO NOTHING suffix; COLLATE mismatch concern was already discussed in prior review thread and accepted by the team. |
| src/Database/Mirror.php | Forwards ignore to both source and destination; destination call intentionally omits onNext/onError (pre-existing pattern), no new issues. |
| src/Database/Adapter/Pool.php | Trivial signature update to add $ignore parameter; delegates via func_get_args, correct. |
| src/Database/Adapter/SQLite.php | Adds INSERT OR IGNORE INTO override; correct SQLite syntax, no issues. |
| src/Database/Adapter.php | Abstract signature extended with bool $ignore = false; docblock updated correctly. |
| tests/e2e/Adapter/Scopes/DocumentTests.php | Adds three well-structured E2E tests covering mixed-batch, intra-batch, and all-duplicate scenarios; assertions are precise. |
Reviews (3): last reviewed commit "Add tenant filter to Mongo adapter pre-f..."
```php
protected function getInsertSuffix(bool $ignore, string $table): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables ? '("_uid", "_tenant")' : '("_uid")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

protected function getInsertPermissionsSuffix(bool $ignore): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_type", "_permission", "_document", "_tenant")'
        : '("_type", "_permission", "_document")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}
```
PostgreSQL `ON CONFLICT` conflict targets missing `COLLATE utf8_ci_ai`
Both getInsertSuffix and getInsertPermissionsSuffix produce ON CONFLICT clauses whose conflict targets do not match the actual unique indexes created by the Postgres adapter. PostgreSQL uses inference to find the unique index — when a unique index was built with a non-default collation (here utf8_ci_ai, which is deterministic = false), that same collation must be specified in the conflict target, otherwise PostgreSQL will throw:
there is no unique or exclusion constraint matching the ON CONFLICT specification
Actual unique indexes vs. conflict targets:
| Table | Actual index | Conflict target in PR | Issue |
|---|---|---|---|
| main (non-shared) | `("_uid" COLLATE utf8_ci_ai)` | `("_uid")` | ❌ missing COLLATE |
| main (shared) | `("_uid" COLLATE utf8_ci_ai, "_tenant")` | `("_uid", "_tenant")` | ❌ missing COLLATE on `_uid` |
| perms (non-shared) | `(_document COLLATE utf8_ci_ai, _type, _permission)` | `("_type", "_permission", "_document")` | ❌ missing COLLATE on `_document` |
| perms (shared) | `(_tenant, _document, _type, _permission)` | `("_type", "_permission", "_document", "_tenant")` | ✅ no collation in index |
The fixes:

```php
// getInsertSuffix – main document table
protected function getInsertSuffix(bool $ignore, string $table): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_uid" COLLATE utf8_ci_ai, "_tenant")'
        : '("_uid" COLLATE utf8_ci_ai)';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

// getInsertPermissionsSuffix – permissions table
protected function getInsertPermissionsSuffix(bool $ignore): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_type", "_permission", "_document", "_tenant")'
        : '("_document" COLLATE utf8_ci_ai, "_type", "_permission")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}
```

Without this fix, `createDocuments(..., ignore: true)` will always fail at the PostgreSQL level for non-shared tables (and shared tables' main document insert), raising a `PDOException` rather than silently ignoring duplicates.
The existing upsertDocuments at [Postgres.php:1462] uses ("_uid") without COLLATE and works in production
src/Database/Adapter/Mongo.php (outdated diff)
```diff
 try {
     $documents = $this->client->insertMany($name, $records, $options);
 } catch (MongoException $e) {
-    throw $this->processException($e);
+    $processed = $this->processException($e);
+
+    if ($ignore && $processed instanceof DuplicateException) {
+        // Race condition: a doc was inserted between pre-filter and insertMany.
+        // With ordered:false outside transaction, non-duplicate inserts persist.
+        // Return empty — we cannot determine which docs succeeded without querying.
+        return [];
+    }
+
+    throw $processed;
 }

 foreach ($documents as $index => $document) {
```
Race-condition path silently drops `onNext` callbacks for successfully-inserted documents
When a `DuplicateException` is caught here (a race condition where another writer inserted a document between the pre-filter `find` and `insertMany`), the method returns `[]`. Because MongoDB used `ordered: false`, some non-duplicate documents in the batch were persisted, but `Database::createDocuments` will receive an empty `$batch` and:

- call `$onNext` for none of those documents;
- not increment `$modified` for them.
The comment in the code acknowledges this, but callers of createDocuments(..., ignore: true) may not be aware. Consider documenting this limitation in the Database::createDocuments docblock, or performing a follow-up find to discover which documents actually landed.
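The follow-up `find` suggested above could, in principle, reconcile which documents actually landed. A hedged sketch of the control flow (Python illustration, not the utopia-php API; `find_existing` is a hypothetical callback standing in for a find on `_uid IN (...)`):

```python
def reconcile_inserted(attempted_ids, find_existing):
    """After an ordered:false insert hits a duplicate-key race,
    re-query to discover which of the attempted ids now exist.

    `find_existing` is assumed to return the subset of ids present
    in the collection (e.g. a find on _uid IN attempted_ids).
    """
    existing = set(find_existing(attempted_ids))
    # Ids present after the batch ran were inserted either by us or by the
    # racing writer; callers can only distinguish the two if they also
    # snapshot existing ids before the insert.
    return [i for i in attempted_ids if i in existing]
```

Note the caveat in the comment: without a pre-insert snapshot, this recovers the post-state but not authorship of each write, so it narrows the overcount rather than eliminating it.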
Known limitation. This only triggers on a race condition — pre-fetch filters known duplicates before reaching the adapter. The utopia-php/mongo client does not expose BulkWriteException::getWriteResult(), so recovering partial success info is not possible from this layer. Needs a fix in utopia-php/mongo to expose write results from insertMany — tracked for follow-up there. For the migration use case (single writer), this race cannot occur.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/Database/Adapter/SQL.php (1)
Lines 2575-2604 (⚠️ Potential issue, 🔴 Critical): Drive permission writes from the inserted subset, not the requested batch.

If the document insert silently skips any row, this code still inserts `_perms` entries for that document. In the race window that can add permissions to an already-existing document while leaving its `_permissions` JSON unchanged, and downstream code will still treat the skipped document as newly created. Please reconcile the actually inserted `_uid`s before building the permissions batch or returning the result.

Based on learnings: in `src/Database/Adapter/MariaDB.php` only duplicate `_uid` violations are intended to be treated as document duplicates; other unique constraint violations should stay distinct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Database/Adapter/SQL.php` around lines 2575 - 2604, After executing the batch INSERT in SQL.php (the block using getInsertKeyword, getSQLTable and $this->execute($stmt)), do not blindly use the originally requested batch to build $sqlPermissions; instead reconcile which _uid rows were actually inserted and only write permissions for those: query the target table (using getSQLTable($name)) for the subset of _uid values present in the just-executed batch (using the same bind values or a short SELECT WHERE _uid IN (...)) to compute the actually-inserted uids, then rebuild $permissions and $bindValuesPermissions from that confirmed set before preparing and executing the permissions INSERT (the block using getInsertPermissionsSuffix and getSQLTable($name . '_perms')); also ensure behavior in MariaDB adapter still treats only duplicate _uid unique-constraint violations as document-duplicates while preserving other unique-constraint failures as distinct.
🧹 Nitpick comments (3)
src/Database/Adapter/SQL.php (1)
Lines 2617-2637: Mark the default hook arguments as intentionally unused.

PHPMD is already flagging `$ignore` and `$table` here. Renaming them to `$_ignore`/`$_table` or adding a local suppression will keep these new extension points from introducing permanent static-analysis noise.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/Database/Adapter/SQL.php` around lines 2617 - 2637, The default implementations of getInsertSuffix and getInsertPermissionsSuffix accept parameters that are intentionally unused and cause PHPMD warnings; rename their parameters to indicate unused (e.g., change getInsertSuffix(bool $ignore, string $table) to getInsertSuffix(bool $_ignore, string $_table) and change getInsertPermissionsSuffix(bool $ignore) to getInsertPermissionsSuffix(bool $_ignore)) or otherwise explicitly suppress/consume the variables in those methods so PHPMD no longer flags them; keep getInsertKeyword unchanged.

tests/e2e/Adapter/Scopes/DocumentTests.php (2)
Lines 7815-7818: Consider using `__FUNCTION__` for collection naming consistency and test isolation.

Using `__FUNCTION__` here (as in adjacent tests) reduces collision risk in repeated/partial runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7815 - 7818, The test currently hardcodes the collection name in $col = 'createDocsIgnoreIntraBatch', which can cause collisions; change the assignment to use __FUNCTION__ (e.g., $col = __FUNCTION__) so createCollection($col) and createAttribute($col, 'name', Database::VAR_STRING, 128, true) use a unique name per test run, matching adjacent tests' pattern and improving isolation.
Lines 7726-7920: Add a chunk-boundary duplicate-ignore test to cover `maxQueryValues` prefetch paths.

Current new tests are small-batch only. Please add at least one case with IDs > `maxQueryValues` (and, if supported, tenant-per-document composite-key mode) to exercise chunked duplicate prefetch behavior end-to-end.

Based on learnings: in src/Database/Database.php, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) for tenant-per-document and non-tenant modes to satisfy DocumentsValidator limits on Query::equal('$id', ...).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7726 - 7920, Add a new E2E test that exercises chunked prefetching by creating and then calling Database::createDocuments (and optionally upsertDocumentsWithIncrease) with a batch size larger than Database::$maxQueryValues so the code goes through the array_chunk(...) path; generate N = $this->getDatabase()->maxQueryValues + X IDs (where X > 0), include some IDs that already exist and some duplicates intra-batch, call createDocuments(..., ignore: true, onNext: ...) and assert emitted IDs, final counts, that first-occurrence wins, and no ACL drift (similar to testCreateDocumentsIgnoreIntraBatchDuplicates), and repeat the same in tenant-per-document composite-key mode if supported to cover both chunking branches.
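The chunked prefetch that the test above would exercise can be sketched as follows. This is a Python illustration of the `array_chunk(..., max(1, $this->maxQueryValues))` pattern, with `find_existing` a hypothetical stand-in for a `find` on `$id IN chunk`:

```python
def chunked_prefetch_existing(ids, max_query_values, find_existing):
    """Prefetch which of `ids` already exist, querying in chunks so each
    query stays within the adapter's maximum IN-list size.

    `find_existing` is assumed to return the subset of a chunk that exists.
    """
    existing = set()
    chunk_size = max(1, max_query_values)  # mirrors max(1, $this->maxQueryValues)
    for i in range(0, len(ids), chunk_size):
        chunk = ids[i:i + chunk_size]
        existing.update(find_existing(chunk))
    return existing
```

A chunk-boundary test would pass more IDs than `max_query_values` so that at least two chunks are issued, then assert the union matches a single unchunked query.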
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1499-1507: The catch for MongoException in the insertMany path
currently returns an empty array when $ignore is true and processException($e)
yields a DuplicateException, which incorrectly signals “no writes” even though
with ordered:false some docs may have persisted; instead, remove the "return []"
behavior and either rethrow the original exception ($e) or propagate the
processed exception so callers can decide to query which docs succeeded. Update
the catch block in the insertMany flow (referencing processException(),
DuplicateException, $ignore, and the insertMany/ordered:false behavior) to avoid
silently returning an empty result and ensure the exception is propagated for
callers to detect/handle partial success.
- Around line 1467-1473: The current branch sets $options to ordered:false when
$ignore is true, which drops the session and silently escapes any active
transaction; instead add a guard that detects an active transaction/session
(e.g. check $this->getTransactionOptions(), $this->clientSession or an existing
helper like hasActiveTransaction()) and throw an exception when $ignore is true
while a transaction is active, so callers fail fast rather than weakening
atomicity; implement this check before altering $options and reference $ignore,
$options and getTransactionOptions() when adding the guard.
In `@src/Database/Database.php`:
- Around line 5658-5674: The dedupe loop currently only uses $document->getId(),
which drops documents with the same ID across different tenants when $ignore is
true; change the dedupe key to be tenant-aware by combining the tenant
identifier with the document ID (e.g., use a composite key like
"$tenantId:$docId") when checking and writing to $seenIds inside the $ignore
block; obtain the tenant id from the document (e.g., $document->getTenantId() or
equivalent accessor) and fall back to a stable empty/default tenant string if
none, then use that composite key in the isset/$seenIds assignment so documents
from different tenants aren’t deduplicated incorrectly.
- Around line 5682-5689: The tenant-prefetch loop that builds $idsByTenant from
$documents (using $doc->getTenant() and $doc->getId()) doesn't filter out empty
IDs, causing pointless Query::equal('$id', $idChunk) calls; update the bucket
cleanup so that before deduping/chunking you apply array_filter to remove
empty/falsey IDs (i.e. replace the current tenantIds handling with
array_values(array_unique(array_filter($tenantIds)))) so the subsequent
foreach(array_chunk(..., $this->maxQueryValues)) in the authorization->skip ->
withTenant -> silent -> find path only receives non-empty IDs; make the same
change in the other tenant-per-document block around the code noted at lines
~7223-7224 to match the non-tenant paths.
---
Outside diff comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2575-2604: After executing the batch INSERT in SQL.php (the block
using getInsertKeyword, getSQLTable and $this->execute($stmt)), do not blindly
use the originally requested batch to build $sqlPermissions; instead reconcile
which _uid rows were actually inserted and only write permissions for those:
query the target table (using getSQLTable($name)) for the subset of _uid values
present in the just-executed batch (using the same bind values or a short SELECT
WHERE _uid IN (...)) to compute the actually-inserted uids, then rebuild
$permissions and $bindValuesPermissions from that confirmed set before preparing
and executing the permissions INSERT (the block using getInsertPermissionsSuffix
and getSQLTable($name . '_perms')); also ensure behavior in MariaDB adapter
still treats only duplicate _uid unique-constraint violations as
document-duplicates while preserving other unique-constraint failures as
distinct.
---
Nitpick comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2617-2637: The default implementations of getInsertSuffix and
getInsertPermissionsSuffix accept parameters that are intentionally unused and
cause PHPMD warnings; rename their parameters to indicate unused (e.g., change
getInsertSuffix(bool $ignore, string $table) to getInsertSuffix(bool $_ignore,
string $_table) and change getInsertPermissionsSuffix(bool $ignore) to
getInsertPermissionsSuffix(bool $_ignore)) or otherwise explicitly
suppress/consume the variables in those methods so PHPMD no longer flags them;
keep getInsertKeyword unchanged.
In `@tests/e2e/Adapter/Scopes/DocumentTests.php`:
- Around line 7815-7818: The test currently hardcodes the collection name in
$col = 'createDocsIgnoreIntraBatch', which can cause collisions; change the
assignment to use __FUNCTION__ (e.g., $col = __FUNCTION__) so
createCollection($col) and createAttribute($col, 'name', Database::VAR_STRING,
128, true) use a unique name per test run, matching adjacent tests' pattern and
improving isolation.
- Around line 7726-7920: Add a new E2E test that exercises chunked prefetching
by creating and then calling Database::createDocuments (and optionally
upsertDocumentsWithIncrease) with a batch size larger than
Database::$maxQueryValues so the code goes through the array_chunk(...) path;
generate N = $this->getDatabase()->maxQueryValues + X IDs (where X > 0), include
some IDs that already exist and some duplicates intra-batch, call
createDocuments(..., ignore: true, onNext: ...) and assert emitted IDs, final
counts, that first-occurrence wins, and no ACL drift (similar to
testCreateDocumentsIgnoreIntraBatchDuplicates), and repeat the same in
tenant-per-document composite-key mode if supported to cover both chunking
branches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5f33c139-d8d4-4a2d-8818-031ba4b5ceea
📒 Files selected for processing (9)
- src/Database/Adapter.php
- src/Database/Adapter/Mongo.php
- src/Database/Adapter/Pool.php
- src/Database/Adapter/Postgres.php
- src/Database/Adapter/SQL.php
- src/Database/Adapter/SQLite.php
- src/Database/Database.php
- src/Database/Mirror.php
- tests/e2e/Adapter/Scopes/DocumentTests.php
src/Database/Adapter/Mongo.php (outdated diff)
```php
if ($ignore) {
    // Run outside transaction — MongoDB aborts transactions on any write error,
    // so ordered:false + session would roll back even successfully inserted docs.
    $options = ['ordered' => false];
} else {
    $options = $this->getTransactionOptions();
}
```
Don't silently escape the active transaction.
When $ignore is true this write no longer carries the current session, so a caller that is already inside a transaction can still end up committing these documents even if the surrounding transaction later rolls back. Failing fast here is safer than silently weakening atomicity.
🔒 Suggested guard

```diff
-if ($ignore) {
+if ($ignore && $this->inTransaction > 0) {
+    throw new DatabaseException('createDocuments(ignore: true) is not supported inside MongoDB transactions');
+}
+
+if ($ignore) {
     // Run outside transaction — MongoDB aborts transactions on any write error,
     // so ordered:false + session would roll back even successfully inserted docs.
     $options = ['ordered' => false];
 } else {
     $options = $this->getTransactionOptions();
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```php
if ($ignore && $this->inTransaction > 0) {
    throw new DatabaseException('createDocuments(ignore: true) is not supported inside MongoDB transactions');
}

if ($ignore) {
    // Run outside transaction — MongoDB aborts transactions on any write error,
    // so ordered:false + session would roll back even successfully inserted docs.
    $options = ['ordered' => false];
} else {
    $options = $this->getTransactionOptions();
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Database/Adapter/Mongo.php` around lines 1467 - 1473, The current branch
sets $options to ordered:false when $ignore is true, which drops the session and
silently escapes any active transaction; instead add a guard that detects an
active transaction/session (e.g. check $this->getTransactionOptions(),
$this->clientSession or an existing helper like hasActiveTransaction()) and
throw an exception when $ignore is true while a transaction is active, so
callers fail fast rather than weakening atomicity; implement this check before
altering $options and reference $ignore, $options and getTransactionOptions()
when adding the guard.
src/Database/Adapter/Mongo.php (outdated diff)
```diff
 } catch (MongoException $e) {
-    throw $this->processException($e);
+    $processed = $this->processException($e);
+
+    if ($ignore && $processed instanceof DuplicateException) {
+        // Race condition: a doc was inserted between pre-filter and insertMany.
+        // With ordered:false outside transaction, non-duplicate inserts persist.
+        // Return empty — we cannot determine which docs succeeded without querying.
+        return [];
+    }
```
Don't report the raced batch as empty.
With ordered: false, inserts before the duplicate can still persist. Returning [] here makes callers treat the batch as a no-op, so follow-up work for actually inserted documents can be skipped even though those writes already committed.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Database/Adapter/Mongo.php` around lines 1499 - 1507, The catch for
MongoException in the insertMany path currently returns an empty array when
$ignore is true and processException($e) yields a DuplicateException, which
incorrectly signals “no writes” even though with ordered:false some docs may
have persisted; instead, remove the "return []" behavior and either rethrow the
original exception ($e) or propagate the processed exception so callers can
decide to query which docs succeeded. Update the catch block in the insertMany
flow (referencing processException(), DuplicateException, $ignore, and the
insertMany/ordered:false behavior) to avoid silently returning an empty result
and ensure the exception is propagated for callers to detect/handle partial
success.
…ering

- Pre-filter duplicates inside MongoDB session instead of bypassing the transaction, ensuring ignore mode works correctly within transactions
- Use tenant-aware composite keys for intra-batch dedup in tenant-per-document mode
- Add array_filter() to tenant prefetch paths to skip empty IDs, consistent with non-tenant paths
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1492-1500: The pre-filter in the insert-ignore path of Mongo
adapter misses tenant scoping when $this->sharedTables is true, so modify the
query used in Mongo::client->find (inside the block handling $ignore and
$records where $uids is built) to include the tenant condition the same way
getDocument does: build a filter that combines ['_uid' => ['$in' =>
array_values($uids)]] with the tenant predicate (using $this->getTenantFilter or
the same _tenant value logic) before calling $this->getTransactionOptions and
$this->client->find, so existingUids is computed only for documents belonging to
the current tenant. Ensure variable names ($uids, $existingUids, $findOptions)
remain consistent.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b9a2cb68-c98c-4fd8-b7c9-598046eeaaec
📒 Files selected for processing (2)
- src/Database/Adapter/Mongo.php
- src/Database/Database.php
✅ Files skipped from review due to trivial changes (1)
- src/Database/Database.php
```php
$tenantFilter = '';
if ($this->sharedTables) {
    $tenantFilter = ' AND _tenant = :_dup_tenant';
    $binds[':_dup_tenant'] = $this->getTenant();
}
```
tenantPerDocument pre-filter uses session tenant, misses cross-tenant duplicates
When tenantPerDocument is enabled, a single createDocuments call may contain documents for different tenants (each document->getTenant() can differ). The pre-filter query uses AND _tenant = :_dup_tenant bound to $this->getTenant() — the session-level tenant — so it never finds existing documents whose _tenant differs from the session tenant.
Those undetected duplicates are passed to INSERT IGNORE/ON CONFLICT DO NOTHING, which correctly discards them at the DB layer. However, SQL::createDocuments still returns the original $documents array (line 2658), so Database::createDocuments calls $onNext and increments $modified for documents that were silently skipped — an overcount identical to the known race-condition overcount.
The minimal fix is to filter by (_uid, _tenant) pairs instead of a single session tenant, or restrict this pre-check to non-tenantPerDocument mode and rely solely on the DB-level ignore keyword for that case.
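The pair-based fix can be sketched as follows. This Python illustration shows only the matching logic; in the actual adapter the filter would be a SQL predicate over `(_uid, _tenant)` pairs, and the dict shape here is an illustrative assumption:

```python
def filter_known_duplicates(batch, existing_pairs):
    """Drop documents whose (uid, tenant) pair already exists.

    `existing_pairs` is assumed to come from a pre-fetch keyed on
    (uid, tenant) pairs rather than on the session tenant alone,
    so cross-tenant duplicates are caught in tenantPerDocument mode.
    """
    existing = set(existing_pairs)
    return [d for d in batch if (d["uid"], d["tenant"]) not in existing]
```

Keying the pre-fetch on the pair rather than the session tenant is what lets the returned batch reflect only documents the database will actually insert, so `$onNext` and `$modified` stay accurate.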
Summary
- Add `bool $ignore = false` parameter to `createDocuments` across all adapters for silent duplicate handling
- Chunk all find queries by `maxQueryValues`
- Optimize `upsertDocuments`: batch-fetch existing docs with `find` instead of per-row `getDocument` calls (~2x speedup)

Adapter Implementation

- MariaDB: `INSERT IGNORE INTO`
- Postgres: `INSERT INTO ... ON CONFLICT DO NOTHING`
- SQLite: `INSERT OR IGNORE INTO`
- MongoDB: `ordered: false` outside transaction

Test plan

- Mixed duplicates with `ignore: true` — only new docs inserted
- All duplicates with `ignore: true` — no rows inserted, count = 0
- `onNext` callback assertions — fires only for actually inserted docs
- `upsertDocuments` ~2x faster with batch-fetch optimization

Supersedes #850