
Add ignore param to createDocuments for silent duplicate handling#851

Closed
premtsd-code wants to merge 4 commits into main from csv-import-upsert-v2

Conversation

@premtsd-code
Contributor

@premtsd-code premtsd-code commented Apr 8, 2026

Summary

  • Adds bool $ignore = false parameter to createDocuments across all adapters for silent duplicate handling
  • Pre-fetches existing document IDs to skip relationship writes and filter known duplicates from adapter batches
  • Deduplicates intra-batch documents by ID (first occurrence wins) to prevent ACL drift
  • Handles tenant-per-document mode with composite keys
  • Chunks all find queries by maxQueryValues
  • Optimizes upsertDocuments: batch-fetch existing docs with find instead of per-row getDocument calls (~2x speedup)
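The dedup-and-prefetch flow in the bullets above can be sketched language-agnostically. This is an illustrative Python outline, not the library's PHP implementation; the function and parameter names are assumptions:

```python
def dedupe_first_wins(documents):
    """Intra-batch dedup: the first occurrence of an ID wins; later
    duplicates are dropped so ACLs and counts cannot drift."""
    seen, deduped = set(), []
    for doc in documents:
        if doc["id"] not in seen:
            seen.add(doc["id"])
            deduped.append(doc)
    return deduped

def filter_known_duplicates(documents, existing_ids):
    """Pre-fetch step: drop documents whose IDs already exist so they never
    reach the adapter batch or trigger relationship writes."""
    return [d for d in documents if d["id"] not in existing_ids]

def chunk_ids(ids, max_query_values):
    """Chunk ID lists the way find() queries are chunked by maxQueryValues."""
    return [ids[i:i + max_query_values] for i in range(0, len(ids), max_query_values)]
```

Run in order, these produce the adapter-ready batch: dedup first, then filter against the chunked existence lookup.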

Adapter Implementation

Adapter — Strategy:
  • MariaDB/MySQL: INSERT IGNORE INTO
  • PostgreSQL: INSERT INTO ... ON CONFLICT DO NOTHING
  • SQLite: INSERT OR IGNORE INTO
  • MongoDB: ordered: false outside transaction
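A minimal sketch of how the strategy table above could map dialects to generated SQL. This standalone Python version is illustrative only; `insert_clause` is a hypothetical helper, not the PR's PHP hooks, and the Postgres conflict target must match the table's actual unique index:

```python
def insert_clause(dialect: str, ignore: bool) -> tuple:
    """Return (insert keyword, trailing suffix) for an ignore-style insert."""
    if not ignore:
        return ("INSERT INTO", "")
    if dialect in ("mariadb", "mysql"):
        return ("INSERT IGNORE INTO", "")
    if dialect == "sqlite":
        return ("INSERT OR IGNORE INTO", "")
    if dialect == "postgres":
        # Postgres has no INSERT IGNORE; it needs an explicit conflict target.
        return ("INSERT INTO", 'ON CONFLICT ("_uid") DO NOTHING')
    raise ValueError(f"unsupported dialect: {dialect}")
```

MariaDB/MySQL and SQLite change the insert keyword itself, while Postgres appends a suffix, which is why the PR splits the behavior into separate keyword and suffix hooks.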

Test plan

  • Mixed batch (existing + new docs) with ignore: true — only new docs inserted
  • All-duplicate batch with ignore: true — no rows inserted, count = 0
  • Intra-batch duplicate IDs — first occurrence wins, no ACL drift
  • onNext callback assertions — fires only for actually inserted docs
  • All 39 tests pass across 13 adapter configurations
  • Benchmark: upsertDocuments ~2x faster with batch-fetch optimization

Supersedes #850

Summary by CodeRabbit

  • New Features

    • Document creation now includes an optional flag to silently skip duplicates rather than throw exceptions when inserting multiple documents.
  • Tests

    • Added comprehensive test coverage for duplicate-handling behavior in document creation scenarios.

- Add `bool $ignore` param to `createDocuments` across all adapters
  (MariaDB: INSERT IGNORE, Postgres: ON CONFLICT DO NOTHING, SQLite:
  INSERT OR IGNORE, MongoDB: ordered:false outside transaction)
- Pre-fetch existing document IDs to skip relationship writes and
  filter known duplicates from adapter batches
- Deduplicate intra-batch documents by ID (first occurrence wins)
  to prevent ACL drift and overcounted results
- Handle tenant-per-document mode with composite keys for pre-fetch
- Chunk all find queries by maxQueryValues
- Optimize upsertDocuments: batch-fetch existing docs with find
  instead of per-row getDocument calls (~2x speedup)
- Add ignore param to Mirror::createDocuments
- Add e2e tests: mixed duplicates, all duplicates, intra-batch
  duplicates with onNext callback assertions
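The upsertDocuments optimization mentioned above (batch-fetch instead of per-row lookups) can be sketched as follows; `find` and `max_query_values` are stand-ins for the adapter query and chunk limit, not the library's API:

```python
def fetch_existing_per_row(get_document, ids):
    """Old shape: one getDocument round trip per row."""
    return {doc_id: get_document(doc_id) for doc_id in ids}

def fetch_existing_batched(find, ids, max_query_values=100):
    """New shape: chunked find() calls, one round trip per chunk."""
    existing = {}
    for start in range(0, len(ids), max_query_values):
        for doc in find(ids[start:start + max_query_values]):
            existing[doc["id"]] = doc
    return existing
```

Replacing N round trips with ceil(N / maxQueryValues) is where the reported ~2x speedup would come from.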
@coderabbitai
Contributor

coderabbitai bot commented Apr 8, 2026

Warning

Rate limit exceeded

@premtsd-code has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 14 minutes and 52 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 14 minutes and 52 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 14541f8b-df51-4685-957a-b88a19572dc7

📥 Commits

Reviewing files that changed from the base of the PR and between 2b4fd1e and 9653052.

📒 Files selected for processing (2)
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/SQL.php
📝 Walkthrough

Walkthrough

Adds an optional ignore flag to createDocuments across the database layer to allow silently skipping duplicate documents; implements intra-batch deduplication, pre-fetching of existing IDs, and adapter-level SQL/Mongo dialect handling to support ignore semantics.

Changes

Cohort / File(s) — Summary

  • Adapter Base (src/Database/Adapter.php): Abstract createDocuments(Document $collection, array $documents, bool $ignore = false): array signature added and docblock updated.
  • Adapter Pool (src/Database/Adapter/Pool.php): Public createDocuments(..., bool $ignore = false) signature added; forwards ignore via existing delegation.
  • Mongo Adapter (src/Database/Adapter/Mongo.php): createDocuments(..., bool $ignore = false) extended; when ignore=true, pre-fetches existing _uids and filters the batch before insert; returns early on an empty batch.
  • SQL Adapters (src/Database/Adapter/SQL.php, src/Database/Adapter/Postgres.php, src/Database/Adapter/SQLite.php): createDocuments(..., bool $ignore = false) added to the SQL adapter; new protected helpers (getInsertKeyword, getInsertSuffix, getInsertPermissionsSuffix) emit dialect-specific INSERT IGNORE, ON CONFLICT DO NOTHING, or INSERT OR IGNORE behavior.
  • Database Facade (src/Database/Database.php): createDocuments(..., bool $ignore = false) adds intra-batch deduplication, chunked pre-fetch of existing IDs (tenant-aware), filters insert/relationship creation for duplicates, and forwards ignore to adapters; upsertDocumentsWithIncrease batching optimized for bulk pre-fetch.
  • Mirror (src/Database/Mirror.php): createDocuments(..., bool $ignore = false) signature updated and ignore propagated to source and destination create calls during mirroring.
  • E2E Tests (tests/e2e/Adapter/Scopes/DocumentTests.php): Three new tests added to validate createDocuments(..., ignore: true) semantics: existing-duplicates, intra-batch duplicates (first-wins), and all-duplicates scenarios.
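The tenant-aware deduplication noted for the Database facade can be illustrated with a hedged sketch: in tenant-per-document mode, duplicates are keyed by a (tenant, id) composite so the same $id under different tenants is not collapsed. The dict shape and field names here are assumptions for illustration:

```python
def dedupe_tenant_aware(documents):
    """Dedup by (tenant, id) composite key: the same ID under different
    tenants stays distinct; within one tenant, first occurrence wins."""
    seen, out = set(), []
    for doc in documents:
        key = (doc.get("tenant"), doc["id"])
        if key not in seen:
            seen.add(key)
            out.append(doc)
    return out
```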

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client as Client
    participant DB as Database
    participant Adapter as Adapter
    participant Store as Storage

    Client->>DB: createDocuments(docs, ignore=true)
    DB->>DB: Deduplicate intra-batch by ID (tenant-aware)
    DB->>Adapter: find(ids chunked) pre-fetch existing IDs
    Adapter->>Store: Query existing IDs
    Store-->>Adapter: Existing IDs
    Adapter-->>DB: Existing ID set
    DB->>DB: Filter docs excluding pre-existing IDs
    DB->>Adapter: createDocuments(filteredDocs, ignore=true)
    alt Using SQL dialect
        Adapter->>Store: INSERT (INSERT IGNORE / INSERT OR IGNORE / ON CONFLICT DO NOTHING)
    else Using Mongo
        Adapter->>Store: find existing _uid + insertMany(filtered)
    end
    Store-->>Adapter: Insert result / skipped count
    Adapter-->>DB: Insert results
    DB-->>Client: Return inserted count

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • fogelito
  • abnegate
  • ArnabChatterjee20k

Poem

🐰
I hopped through batches, nose in the hay,
Found duplicate carrots in my way.
With a twitch and a thump, I let them pass,
First-wins stay planted, the rest—no fuss, no glass.
Quietly skipping, I nibble on peace today.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 52.63%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title clearly and concisely summarizes the main change: adding an ignore parameter to createDocuments for silent duplicate handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@greptile-apps

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds an ignore parameter to createDocuments across all adapters for silent duplicate handling, using INSERT IGNORE/ON CONFLICT DO NOTHING/ordered: false per adapter, with intra-batch deduplication and a chunked pre-fetch of existing IDs in the Database layer.

Two adapter-level pre-filter bugs remain:

  • MongoDB (Mongo.php:1502): $result->cursor->firstBatch is capped at MongoDB's default batch size (~101). In a 1000-doc batch with >101 existing docs, the extra existing UIDs reach insertMany and trigger a BulkWriteException inside the transaction, breaking the ignore: true contract.
  • SQL tenantPerDocument (SQL.php:2499): The adapter-level duplicate check filters by $this->getTenant() (session tenant). In tenant-per-document mode, cross-tenant duplicates are missed by the query, pass through to INSERT IGNORE, are silently discarded by the DB, but still appear in the returned $documents array — causing onNext overcounting for those rows.
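The firstBatch truncation described above behaves like reading only the first page of a paginated result. A toy sketch (the 101-document figure is MongoDB's default first batch size; the function is illustrative, not the client's API):

```python
def first_batch(matching_ids, batch_size=101):
    """Simulates reading only cursor.firstBatch instead of exhausting the
    cursor: only the first batch_size matches are visible to the pre-filter."""
    return set(matching_ids[:batch_size])

existing = [f"doc{i}" for i in range(150)]
visible = first_batch(existing)    # pre-filter sees only 101 of 150 existing IDs
missed = set(existing) - visible   # these still reach insertMany and collide
```

Exhausting the cursor (or paginating until empty) instead of reading firstBatch alone would make the pre-filter see all 150.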

Confidence Score: 3/5

Two P1 bugs can cause ignore: true to throw exceptions or silently overcount; needs fixes before merging.

The MongoDB firstBatch truncation is a definite correctness break for any batch with >~101 pre-existing documents — insertMany will throw in a transaction instead of silently skipping. The SQL tenantPerDocument pre-filter bug causes onNext to fire for docs that were never inserted. Both bugs are on the primary new path (ignore: true) and affect the stated migration use-case.

src/Database/Adapter/Mongo.php (firstBatch limit) and src/Database/Adapter/SQL.php (tenantPerDocument tenant filter)

Vulnerabilities

No security concerns identified. The ignore path uses parameterized queries throughout (both the pre-filter SELECT and the INSERT IGNORE/ON CONFLICT statements bind values via PDO). Authorization checks happen before the adapter call and are not bypassed by the new flag.

Important Files Changed

  • src/Database/Adapter/Mongo.php: Adds pre-filter for ignore mode; firstBatch cursor truncation can miss existing UIDs when a batch has >~101 duplicates, causing insertMany to throw in transaction context.
  • src/Database/Adapter/SQL.php: Adds ignore-mode duplicate pre-filter and INSERT IGNORE/ON CONFLICT plumbing; tenantPerDocument pre-filter incorrectly uses the session tenant instead of the per-document tenant.
  • src/Database/Database.php: Adds ignore parameter with intra-batch deduplication and chunked pre-fetch of existing IDs; logic is correct, but the count/onNext overcount on a SQL race condition is a known, documented limitation.
  • src/Database/Adapter/Postgres.php: Adds ON CONFLICT DO NOTHING suffix; the COLLATE mismatch concern was already discussed in a prior review thread and accepted by the team.
  • src/Database/Mirror.php: Forwards ignore to both source and destination; the destination call intentionally omits onNext/onError (pre-existing pattern), no new issues.
  • src/Database/Adapter/Pool.php: Trivial signature update to add the $ignore parameter; delegates via func_get_args, correct.
  • src/Database/Adapter/SQLite.php: Adds INSERT OR IGNORE INTO override; correct SQLite syntax, no issues.
  • src/Database/Adapter.php: Abstract signature extended with bool $ignore = false; docblock updated correctly.
  • tests/e2e/Adapter/Scopes/DocumentTests.php: Adds three well-structured E2E tests covering mixed-batch, intra-batch, and all-duplicate scenarios; assertions are precise.

Reviews (3). Last reviewed commit: "Add tenant filter to Mongo adapter pre-f..."

Comment on lines +1373 to +1395

protected function getInsertSuffix(bool $ignore, string $table): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables ? '("_uid", "_tenant")' : '("_uid")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

protected function getInsertPermissionsSuffix(bool $ignore): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_type", "_permission", "_document", "_tenant")'
        : '("_type", "_permission", "_document")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

P1 PostgreSQL ON CONFLICT conflict targets missing COLLATE utf8_ci_ai

Both getInsertSuffix and getInsertPermissionsSuffix produce ON CONFLICT clauses whose conflict targets do not match the actual unique indexes created by the Postgres adapter. PostgreSQL uses inference to find the unique index — when a unique index was built with a non-default collation (here utf8_ci_ai, which is deterministic = false), that same collation must be specified in the conflict target, otherwise PostgreSQL will throw:

there is no unique or exclusion constraint matching the ON CONFLICT specification

Actual unique indexes vs. conflict targets:

  • main table, non-shared: actual index ("_uid" COLLATE utf8_ci_ai); conflict target in PR ("_uid"); ❌ missing COLLATE
  • main table, shared: actual index ("_uid" COLLATE utf8_ci_ai, "_tenant"); conflict target in PR ("_uid", "_tenant"); ❌ missing COLLATE on _uid
  • perms table, non-shared: actual index (_document COLLATE utf8_ci_ai, _type, _permission); conflict target in PR ("_type", "_permission", "_document"); ❌ missing COLLATE on _document
  • perms table, shared: actual index (_tenant, _document, _type, _permission); conflict target in PR ("_type", "_permission", "_document", "_tenant"); ✅ no collation in index

The fixes:

// getInsertSuffix – main document table
protected function getInsertSuffix(bool $ignore, string $table): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_uid" COLLATE utf8_ci_ai, "_tenant")'
        : '("_uid" COLLATE utf8_ci_ai)';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

// getInsertPermissionsSuffix – permissions table
protected function getInsertPermissionsSuffix(bool $ignore): string
{
    if (!$ignore) {
        return '';
    }

    $conflictTarget = $this->sharedTables
        ? '("_type", "_permission", "_document", "_tenant")'
        : '("_document" COLLATE utf8_ci_ai, "_type", "_permission")';

    return "ON CONFLICT {$conflictTarget} DO NOTHING";
}

Without this fix, createDocuments(..., ignore: true) will always fail at the PostgreSQL level for non-shared tables (and shared tables' main document insert), raising a PDOException rather than silently ignoring duplicates.

Contributor Author

@premtsd-code premtsd-code Apr 8, 2026


The existing upsertDocuments at [Postgres.php:1462] uses ("_uid") without COLLATE and works in production

https://github.com/utopia-php/database/blob/csv-import-upsert-v2/src/Database/Adapter/Postgres.php#L1462

Comment on lines 1497 to 1512

     try {
         $documents = $this->client->insertMany($name, $records, $options);
     } catch (MongoException $e) {
-        throw $this->processException($e);
+        $processed = $this->processException($e);
+
+        if ($ignore && $processed instanceof DuplicateException) {
+            // Race condition: a doc was inserted between pre-filter and insertMany.
+            // With ordered:false outside transaction, non-duplicate inserts persist.
+            // Return empty — we cannot determine which docs succeeded without querying.
+            return [];
+        }
+
+        throw $processed;
     }

     foreach ($documents as $index => $document) {

P2 Race-condition path silently drops onNext callbacks for successfully-inserted documents

When a DuplicateException is caught here (a race condition where another writer inserted a document between the pre-filter find and insertMany), the method returns []. Because MongoDB used ordered: false, some non-duplicate documents in the batch were persisted, but Database::createDocuments will receive an empty $batch and:

  1. Call $onNext for none of those documents.
  2. Not increment $modified for them.

The comment in the code acknowledges this, but callers of createDocuments(..., ignore: true) may not be aware. Consider documenting this limitation in the Database::createDocuments docblock, or performing a follow-up find to discover which documents actually landed.
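The suggested follow-up find could look roughly like this. A pure-Python sketch under stated assumptions: `find_existing_ids` stands in for a chunked adapter query and is not the library's API:

```python
def reconcile_inserted(batch_ids, find_existing_ids):
    """After a raced DuplicateException with ordered:false, re-query which IDs
    from the attempted batch are now persisted, so callers can fire onNext for
    them instead of treating the batch as a no-op. Caveat: an ID inserted by
    the racing writer is indistinguishable from one this batch inserted."""
    persisted = find_existing_ids(batch_ids)
    return [doc_id for doc_id in batch_ids if doc_id in persisted]
```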

Contributor Author

Known limitation, documented in code comments. This only triggers on a race condition — the pre-fetch already filters known duplicates before they reach the adapter. The utopia-php/mongo client does not expose BulkWriteException::getWriteResult(), so recovering partial success info would require changes to the client library. For the migration use case (single writer), this race cannot occur.

Contributor Author

Known limitation. This only triggers on a race condition — pre-fetch filters known duplicates before reaching the adapter. The utopia-php/mongo client does not expose BulkWriteException::getWriteResult(), so recovering partial success info is not possible from this layer. Needs a fix in utopia-php/mongo to expose write results from insertMany — tracked for follow-up there. For the migration use case (single writer), this race cannot occur.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Adapter/SQL.php (1)

2575-2604: ⚠️ Potential issue | 🔴 Critical

Drive permission writes from the inserted subset, not the requested batch.

If the document insert silently skips any row, this code still inserts _perms entries for that document. In the race window that can add permissions to an already-existing document while leaving its _permissions JSON unchanged, and downstream code will still treat the skipped document as newly created. Please reconcile the actually inserted _uids before building the permissions batch or returning the result.

Based on learnings, in src/Database/Adapter/MariaDB.php only duplicate _uid violations are intended to be treated as document duplicates; other unique constraint violations should stay distinct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2575 - 2604, After executing the
batch INSERT in SQL.php (the block using getInsertKeyword, getSQLTable and
$this->execute($stmt)), do not blindly use the originally requested batch to
build $sqlPermissions; instead reconcile which _uid rows were actually inserted
and only write permissions for those: query the target table (using
getSQLTable($name)) for the subset of _uid values present in the just-executed
batch (using the same bind values or a short SELECT WHERE _uid IN (...)) to
compute the actually-inserted uids, then rebuild $permissions and
$bindValuesPermissions from that confirmed set before preparing and executing
the permissions INSERT (the block using getInsertPermissionsSuffix and
getSQLTable($name . '_perms')); also ensure behavior in MariaDB adapter still
treats only duplicate _uid unique-constraint violations as document-duplicates
while preserving other unique-constraint failures as distinct.
🧹 Nitpick comments (3)
src/Database/Adapter/SQL.php (1)

2617-2637: Mark the default hook arguments as intentionally unused.

PHPMD is already flagging $ignore and $table here. Renaming them to $_ignore / $_table or adding a local suppression will keep these new extension points from introducing permanent static-analysis noise.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2617 - 2637, The default
implementations of getInsertSuffix and getInsertPermissionsSuffix accept
parameters that are intentionally unused and cause PHPMD warnings; rename their
parameters to indicate unused (e.g., change getInsertSuffix(bool $ignore, string
$table) to getInsertSuffix(bool $_ignore, string $_table) and change
getInsertPermissionsSuffix(bool $ignore) to getInsertPermissionsSuffix(bool
$_ignore)) or otherwise explicitly suppress/consume the variables in those
methods so PHPMD no longer flags them; keep getInsertKeyword unchanged.
tests/e2e/Adapter/Scopes/DocumentTests.php (2)

7815-7818: Consider using __FUNCTION__ for collection naming consistency and test isolation.

Using __FUNCTION__ here (as in adjacent tests) reduces collision risk in repeated/partial runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7815 - 7818, The
test currently hardcodes the collection name in $col =
'createDocsIgnoreIntraBatch', which can cause collisions; change the assignment
to use __FUNCTION__ (e.g., $col = __FUNCTION__) so createCollection($col) and
createAttribute($col, 'name', Database::VAR_STRING, 128, true) use a unique name
per test run, matching adjacent tests' pattern and improving isolation.

7726-7920: Add a chunk-boundary duplicate-ignore test to cover maxQueryValues prefetch paths.

Current new tests are small-batch only. Please add at least one case with IDs > maxQueryValues (and, if supported, tenant-per-document composite-key mode) to exercise chunked duplicate prefetch behavior end-to-end.

Based on learnings: in src/Database/Database.php, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) for tenant-per-document and non-tenant modes to satisfy DocumentsValidator limits on Query::equal('$id', ...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7726 - 7920, Add a
new E2E test that exercises chunked prefetching by creating and then calling
Database::createDocuments (and optionally upsertDocumentsWithIncrease) with a
batch size larger than Database::$maxQueryValues so the code goes through the
array_chunk(...) path; generate N = $this->getDatabase()->maxQueryValues + X IDs
(where X > 0), include some IDs that already exist and some duplicates
intra-batch, call createDocuments(..., ignore: true, onNext: ...) and assert
emitted IDs, final counts, that first-occurrence wins, and no ACL drift (similar
to testCreateDocumentsIgnoreIntraBatchDuplicates), and repeat the same in
tenant-per-document composite-key mode if supported to cover both chunking
branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1499-1507: The catch for MongoException in the insertMany path
currently returns an empty array when $ignore is true and processException($e)
yields a DuplicateException, which incorrectly signals “no writes” even though
with ordered:false some docs may have persisted; instead, remove the "return []"
behavior and either rethrow the original exception ($e) or propagate the
processed exception so callers can decide to query which docs succeeded. Update
the catch block in the insertMany flow (referencing processException(),
DuplicateException, $ignore, and the insertMany/ordered:false behavior) to avoid
silently returning an empty result and ensure the exception is propagated for
callers to detect/handle partial success.
- Around line 1467-1473: The current branch sets $options to ordered:false when
$ignore is true, which drops the session and silently escapes any active
transaction; instead add a guard that detects an active transaction/session
(e.g. check $this->getTransactionOptions(), $this->clientSession or an existing
helper like hasActiveTransaction()) and throw an exception when $ignore is true
while a transaction is active, so callers fail fast rather than weakening
atomicity; implement this check before altering $options and reference $ignore,
$options and getTransactionOptions() when adding the guard.

In `@src/Database/Database.php`:
- Around line 5658-5674: The dedupe loop currently only uses $document->getId(),
which drops documents with the same ID across different tenants when $ignore is
true; change the dedupe key to be tenant-aware by combining the tenant
identifier with the document ID (e.g., use a composite key like
"$tenantId:$docId") when checking and writing to $seenIds inside the $ignore
block; obtain the tenant id from the document (e.g., $document->getTenantId() or
equivalent accessor) and fall back to a stable empty/default tenant string if
none, then use that composite key in the isset/$seenIds assignment so documents
from different tenants aren’t deduplicated incorrectly.
- Around line 5682-5689: The tenant-prefetch loop that builds $idsByTenant from
$documents (using $doc->getTenant() and $doc->getId()) doesn't filter out empty
IDs, causing pointless Query::equal('$id', $idChunk) calls; update the bucket
cleanup so that before deduping/chunking you apply array_filter to remove
empty/falsey IDs (i.e. replace the current tenantIds handling with
array_values(array_unique(array_filter($tenantIds)))) so the subsequent
foreach(array_chunk(..., $this->maxQueryValues)) in the authorization->skip ->
withTenant -> silent -> find path only receives non-empty IDs; make the same
change in the other tenant-per-document block around the code noted at lines
~7223-7224 to match the non-tenant paths.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5f33c139-d8d4-4a2d-8818-031ba4b5ceea

📥 Commits

Reviewing files that changed from the base of the PR and between 72ee161 and 287420b.

📒 Files selected for processing (9)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php

Comment on lines +1467 to +1473

if ($ignore) {
    // Run outside transaction — MongoDB aborts transactions on any write error,
    // so ordered:false + session would roll back even successfully inserted docs.
    $options = ['ordered' => false];
} else {
    $options = $this->getTransactionOptions();
}
Contributor

⚠️ Potential issue | 🔴 Critical

Don't silently escape the active transaction.

When $ignore is true this write no longer carries the current session, so a caller that is already inside a transaction can still end up committing these documents even if the surrounding transaction later rolls back. Failing fast here is safer than silently weakening atomicity.

🔒 Suggested guard
-        if ($ignore) {
+        if ($ignore && $this->inTransaction > 0) {
+            throw new DatabaseException('createDocuments(ignore: true) is not supported inside MongoDB transactions');
+        }
+
+        if ($ignore) {
             // Run outside transaction — MongoDB aborts transactions on any write error,
             // so ordered:false + session would roll back even successfully inserted docs.
             $options = ['ordered' => false];
         } else {
             $options = $this->getTransactionOptions();
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-if ($ignore) {
-    // Run outside transaction — MongoDB aborts transactions on any write error,
-    // so ordered:false + session would roll back even successfully inserted docs.
-    $options = ['ordered' => false];
-} else {
-    $options = $this->getTransactionOptions();
-}
+if ($ignore && $this->inTransaction > 0) {
+    throw new DatabaseException('createDocuments(ignore: true) is not supported inside MongoDB transactions');
+}
+
+if ($ignore) {
+    // Run outside transaction — MongoDB aborts transactions on any write error,
+    // so ordered:false + session would roll back even successfully inserted docs.
+    $options = ['ordered' => false];
+} else {
+    $options = $this->getTransactionOptions();
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Mongo.php` around lines 1467-1473: the branch sets $options to ordered:false when $ignore is true, which drops the session and silently escapes any active transaction. Before altering $options, add a guard that detects an active transaction (e.g. via $this->getTransactionOptions(), $this->clientSession, or an existing helper such as hasActiveTransaction()) and throw an exception when $ignore is true while a transaction is active, so callers fail fast instead of silently losing atomicity.

Comment on lines +1499 to +1507
 } catch (MongoException $e) {
-    throw $this->processException($e);
+    $processed = $this->processException($e);
+
+    if ($ignore && $processed instanceof DuplicateException) {
+        // Race condition: a doc was inserted between pre-filter and insertMany.
+        // With ordered:false outside transaction, non-duplicate inserts persist.
+        // Return empty — we cannot determine which docs succeeded without querying.
+        return [];
+    }
⚠️ Potential issue | 🟠 Major

Don't report the raced batch as empty.

With ordered: false, inserts before the duplicate can still persist. Returning [] here makes callers treat the batch as a no-op, so follow-up work for actually inserted documents can be skipped even though those writes already committed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Mongo.php` around lines 1499-1507: the catch for MongoException in the insertMany path returns an empty array when $ignore is true and processException($e) yields a DuplicateException. This incorrectly signals that nothing was written, even though with ordered:false some documents may have persisted. Remove the `return [];` and instead rethrow the original exception ($e), or propagate the processed exception, so callers can detect the partial success and query which documents actually succeeded rather than treating the batch as a no-op.
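The partial-persistence problem above is language-agnostic. A minimal Python sketch, assuming MongoDB's bulk writeErrors payload (each entry carries the zero-based `index` of the failed operation, with code 11000 for duplicate keys), shows how the surviving IDs could be recovered instead of reporting the batch as empty:

```python
def surviving_ids(batch_ids, write_errors):
    """Given the IDs attempted in an unordered bulk insert and the server's
    writeErrors list, return the IDs whose inserts actually persisted.
    Each writeErrors entry carries the zero-based 'index' of the failed op."""
    failed = {err["index"] for err in write_errors}
    return [doc_id for i, doc_id in enumerate(batch_ids) if i not in failed]

# Duplicate-key failures at positions 1 and 3; the other inserts persisted.
errors = [{"index": 1, "code": 11000}, {"index": 3, "code": 11000}]
print(surviving_ids(["a", "b", "c", "d", "e"], errors))  # → ['a', 'c', 'e']
```

This way the adapter could report the persisted subset to callers instead of an empty array.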

…ering

- Pre-filter duplicates inside MongoDB session instead of bypassing the
  transaction, ensuring ignore mode works correctly within transactions
- Use tenant-aware composite keys for intra-batch dedup in
  tenant-per-document mode
- Add array_filter() to tenant prefetch paths to skip empty IDs,
  consistent with non-tenant paths
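The intra-batch dedup rule this commit describes (first occurrence wins, keyed by ID plus tenant) can be sketched as follows. This is an illustrative Python version, not the actual implementation; the `$id`/`$tenant` field names are assumptions mirroring the library's document keys:

```python
def dedupe_first_wins(documents):
    """Drop intra-batch duplicates, keeping the first occurrence of each
    (id, tenant) composite key so a later duplicate row cannot override
    the ACL of the row that will actually be inserted."""
    seen = set()
    kept = []
    for doc in documents:
        # tenant is None outside tenant-per-document mode, collapsing the
        # composite key back to the plain document ID
        key = (doc["$id"], doc.get("$tenant"))
        if key in seen:
            continue
        seen.add(key)
        kept.append(doc)
    return kept

batch = [
    {"$id": "u1", "$tenant": 1, "name": "first"},
    {"$id": "u1", "$tenant": 1, "name": "second"},        # dropped: same key
    {"$id": "u1", "$tenant": 2, "name": "other tenant"},  # kept: new tenant
]
print([d["name"] for d in dedupe_first_wins(batch)])  # → ['first', 'other tenant']
```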
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around lines 1492-1500: the pre-filter in the insert-ignore path misses tenant scoping when $this->sharedTables is true. Modify the query passed to $this->client->find (inside the block that handles $ignore and builds $uids from $records) to combine ['_uid' => ['$in' => array_values($uids)]] with the same tenant predicate that getDocument uses (via $this->getTenantFilter or the equivalent _tenant value logic) before calling $this->getTransactionOptions() and $this->client->find, so $existingUids is computed only for documents belonging to the current tenant. Keep the variable names ($uids, $existingUids, $findOptions) consistent.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b9a2cb68-c98c-4fd8-b7c9-598046eeaaec

📥 Commits

Reviewing files that changed from the base of the PR and between 287420b and 2b4fd1e.

📒 Files selected for processing (2)
  • src/Database/Adapter/Mongo.php
  • src/Database/Database.php
✅ Files skipped from review due to trivial changes (1)
  • src/Database/Database.php

Comment on lines +2496 to +2500
$tenantFilter = '';
if ($this->sharedTables) {
$tenantFilter = ' AND _tenant = :_dup_tenant';
$binds[':_dup_tenant'] = $this->getTenant();
}
P1 tenantPerDocument pre-filter uses session tenant, misses cross-tenant duplicates

When tenantPerDocument is enabled, a single createDocuments call may contain documents for different tenants (each document->getTenant() can differ). The pre-filter query uses AND _tenant = :_dup_tenant bound to $this->getTenant() — the session-level tenant — so it never finds existing documents whose _tenant differs from the session tenant.

Those undetected duplicates are passed to INSERT IGNORE/ON CONFLICT DO NOTHING, which correctly discards them at the DB layer. However, SQL::createDocuments still returns the original $documents array (line 2658), so Database::createDocuments calls $onNext and increments $modified for documents that were silently skipped — an overcount identical to the known race-condition overcount.

The minimal fix is to filter by (_uid, _tenant) pairs instead of a single session tenant, or restrict this pre-check to non-tenantPerDocument mode and rely solely on the DB-level ignore keyword for that case.
