
feat: add ignoreDuplicates option to insertMany #36

Open

premtsd-code wants to merge 3 commits into main from csv-import-upsert

Conversation

Contributor

premtsd-code commented Apr 10, 2026

Summary

  • Adds ignoreDuplicates option to insertMany() that silently skips duplicate key errors (11000/11001) instead of throwing
  • Forces ordered: false when active so MongoDB continues inserting remaining docs after a conflict
  • Uses writeErrors[].index to identify which batch entries failed, returning only successfully inserted documents
  • Non-duplicate write errors still throw as before
  • Passes the flag through the call chain (query() -> send() -> receive() -> parseResponse()) instead of using class-level mutable state, ensuring Swoole coroutine safety

Why: The utopia-php/database Mongo adapter's createDocuments(ignore: true) pre-filters known duplicates via find() before calling insertMany(), but a concurrent insert between the check and the insert (a time-of-check/time-of-use, or TOCTOU, race) would abort the transaction. This change makes insertMany() handle race-condition duplicates gracefully at the wire protocol level.
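As a standalone sketch of the index-based filtering this relies on (the batch, the simulated writeErrors, and all variable names here are illustrative, not the PR's actual code):

```php
<?php
// Simulated batch passed to insertMany(); docs 0 and 2 collide with existing _ids.
$documents = [
    ['_id' => 'a', 'v' => 1],
    ['_id' => 'b', 'v' => 2],
    ['_id' => 'c', 'v' => 3],
];

// writeErrors as MongoDB would report them with ordered: false.
$writeErrors = [
    (object)['index' => 0, 'code' => 11000],
    (object)['index' => 2, 'code' => 11000],
];

// Collect the batch indices that failed with a duplicate-key code (11000/11001).
$failedIndices = [];
foreach ($writeErrors as $err) {
    if (\in_array($err->code ?? 0, [11000, 11001], true)) {
        $failedIndices[$err->index] = true;
    }
}

// Keep only the documents whose batch index did not fail.
$inserted = \array_values(\array_filter(
    $documents,
    fn ($i) => !isset($failedIndices[$i]),
    ARRAY_FILTER_USE_KEY
));

print_r(\array_column($inserted, '_id')); // only 'b' survives
```

With ordered: false the server reports one writeErrors entry per failed document, each carrying the document's position in the batch, which is what makes this filtering possible.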

Changes

Client.php

  • insertMany(): new ignoreDuplicates option — forces ordered: false, filters failed indices from results
  • query(), send(), receive(), parseResponse(): thread bool $ignoreDuplicateErrors parameter (no shared mutable state)
  • parseResponse(): when flag is set, suppress duplicate key writeErrors (11000/11001), re-throw all others
  • ordered key excluded from $otherOptions passthrough to prevent accidental override
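The ordered exclusion can be pictured with a small standalone snippet (option names other than ignoreDuplicates and ordered are hypothetical; this is not the PR's exact code):

```php
<?php
// Options as a caller might pass them; the caller tries to force ordered back on.
$options = [
    'ignoreDuplicates' => true,
    'ordered' => true,
    'bypassDocumentValidation' => false,
];

// Strip the keys the method manages itself before passing the rest through.
$otherOptions = \array_diff_key($options, \array_flip(['ignoreDuplicates', 'ordered']));

// The method forces ordered: false; the caller's override never reaches the command.
$command = \array_merge(
    ['insert' => 'mycoll', 'ordered' => false],
    $otherOptions
);

var_dump($command['ordered']); // bool(false)
```

Without the exclusion, a caller-supplied ordered: true in the passthrough would win the merge and make MongoDB abort at the first duplicate.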

MongoTest.php

  • testInsertManyIgnoreDuplicates: mixed batch (existing + new), verifies only new docs returned, originals unchanged
  • testInsertManyIgnoreIntraBatchDuplicates: same ID twice in batch, first wins
  • testInsertManyIgnoreAllDuplicates: all duplicates, zero inserts returned

Test plan

  • Run docker compose exec tests vendor/bin/phpunit to verify all tests pass
  • Verify insertMany() with ignoreDuplicates: true skips duplicate docs and returns only inserted ones
  • Verify non-duplicate write errors still throw when ignoreDuplicates is true
  • Verify default behavior (ignoreDuplicates: false) is unchanged
  • Run utopia-php/database E2E tests to confirm createDocuments(ignore: true) works end-to-end

- Use ordered:false so MongoDB continues inserting after a duplicate key error
- Suppress duplicate key writeErrors (11000/11001) in receive() via instance flag
- Use writeError indices to filter out failed docs from the returned array
- Non-duplicate writeErrors still throw as before
Contributor

coderabbitai bot commented Apr 10, 2026

📝 Walkthrough

Client gained an opt-in flag to ignore duplicate-key write errors. query(), send(), and receive() propagate this flag into parseResponse(). parseResponse() can suppress duplicate-key errors (11000/11001) and return raw results if only duplicates occur. insertMany() adds ignoreDuplicates and forces ordered=false when enabled.

Changes

Cohort / File(s): Client core (src/Client.php)

Summary: Added an ignoreDuplicateErrors boolean threaded through query(array $command, ?string $db = null, bool $ignoreDuplicateErrors = false), send(mixed $data, bool $ignoreDuplicateErrors = false), and receive(bool $ignoreDuplicateErrors = false) into parseResponse(string $response, int $responseLength, bool $ignoreDuplicateErrors = false). parseResponse() now suppresses duplicate-key write errors (11000/11001) when requested and returns the raw result if only duplicates occurred. insertMany() gained an ignoreDuplicates option, forces ordered to false when enabled, calls query(..., $ignoreDuplicates), and filters out documents whose batch indices correspond to duplicate-key writeErrors before returning inserted documents. Inline docs/comments updated accordingly.
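The flag-threading shape described above can be sketched as a toy chain (method bodies are invented for illustration; only the parameter plumbing mirrors the PR):

```php
<?php
// Toy version of threading a flag through the call chain instead of holding it
// as mutable instance state (which could race between Swoole coroutines).
class MiniClient
{
    public function insertMany(array $docs, bool $ignoreDuplicates = false): string
    {
        return $this->query(['insert' => $docs], $ignoreDuplicates);
    }

    private function query(array $command, bool $ignoreDuplicateErrors = false): string
    {
        return $this->send($command, $ignoreDuplicateErrors);
    }

    private function send(mixed $data, bool $ignoreDuplicateErrors = false): string
    {
        return $this->receive($ignoreDuplicateErrors);
    }

    private function receive(bool $ignoreDuplicateErrors = false): string
    {
        // The flag is consumed locally; there is no shared state to race on.
        return $ignoreDuplicateErrors ? 'suppress duplicates' : 'throw on duplicates';
    }
}

$c = new MiniClient();
echo $c->insertMany([], true), "\n";  // suppress duplicates
echo $c->insertMany([], false), "\n"; // throw on duplicates
```

Because each call carries its own copy of the boolean, two coroutines issuing concurrent inserts with different flag values cannot observe each other's setting.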

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibble through the error heap,

Duplicate seeds I softly sweep.
Ordered off, the good remain,
No noisy throw, just gentle gain.
Hops and bytes — a cleaner grain.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name | Status | Explanation
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which meets the required threshold of 80.00%.
Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled.
Title check | ✅ Passed | The title 'feat: add ignoreDuplicates option to insertMany' accurately and concisely describes the main change: adding a new ignoreDuplicates option to the insertMany method.



Contributor

coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Client.php`:
- Around line 789-797: Explicitly initialize the instance flag before
conditionally setting it so it can't inherit prior state: at the start of the
operation assign $this->ignoreDuplicateErrors to a known boolean (e.g. false)
and then set it to true if $ignoreDuplicates is truthy, call query($command) in
the try block, and in the finally block reset $this->ignoreDuplicateErrors to
false; reference the $ignoreDuplicates parameter, the
$this->ignoreDuplicateErrors property, and the query($command) call when making
this change.
- Around line 46-51: The class-level flag $ignoreDuplicateErrors creates a race;
remove the mutable property and instead pass an explicit boolean parameter
through the call chain (e.g., add an optional bool $ignoreDuplicateErrors =
false argument to insertMany(), query(), send(), receive(), and
parseResponse()); change Client::insertMany to call query(...) with true when
ignoreDuplicates is requested, propagate that flag through query() -> send() ->
receive() -> parseResponse() and use it only locally to decide suppression of
11000/11001 errors; ensure all other callers of query()/send()/receive() pass
the default false to preserve existing behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f12071df-d604-408a-bc52-a58f9388ada6

📥 Commits

Reviewing files that changed from the base of the PR and between 677a21c and 28e9c27.

📒 Files selected for processing (1)
  • src/Client.php


greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR adds an ignoreDuplicates option to insertMany() that silently skips duplicate-key errors (11000/11001) by forcing ordered: false and using writeErrors[].index to return only the successfully inserted documents. Both previously flagged issues have been resolved: ordered is now excluded from $otherOptions so the forced false cannot be overridden by the caller, and the ignoreDuplicateErrors flag is explicitly threaded through query → send → receive → parseResponse rather than held as shared mutable state.

Confidence Score: 5/5

Safe to merge — both previously flagged issues are addressed and the logic is correct across all batch/error combinations.

No P0 or P1 findings remain. The flag-threading approach is clean, the ordered-override hole is closed, and the three new integration tests cover the primary scenarios (mixed, intra-batch, all-duplicate). The return-value path handles the integer-vs-object result correctly when no errors occur.

No files require special attention.

Important Files Changed

- src/Client.php: Adds the ignoreDuplicates parameter threaded through query → send → receive → parseResponse; ordered is correctly excluded from $otherOptions; the duplicate-error filtering logic in insertMany is correct.
- tests/MongoTest.php: Three new integration tests covering the core scenarios (mixed duplicate/new docs, intra-batch duplicate collision, and an all-duplicate batch); assertions are correct.

Reviews (3). Last reviewed commit: "Add tests for insertMany ignoreDuplicate..."

…flag

Addresses review feedback:
- Remove mutable $ignoreDuplicateErrors property to prevent Swoole coroutine races
- Thread the flag as a parameter through query() -> send() -> receive() -> parseResponse()
- Add 'ordered' to array_diff_key exclusion list to prevent caller override
Contributor

coderabbitai bot left a comment


🧹 Nitpick comments (2)
src/Client.php (2)

785-790: Centralize duplicate-key detection.

The same 11000/11001 check now lives in both the response parser and the insertMany post-filter. A small helper would keep those paths aligned with each other and with src/Exception.php:84-90.

♻️ Proposed refactor
+    private function isDuplicateWriteError(object $error): bool
+    {
+        return ($error->code ?? 0) === 11000 || ($error->code ?? 0) === 11001;
+    }
+
     public function insertMany(string $collection, array $documents, array $options = []): array
     {
         ...
                 foreach ($result->writeErrors as $err) {
-                    if (($err->code ?? 0) === 11000 || ($err->code ?? 0) === 11001) {
+                    if ($this->isDuplicateWriteError($err)) {
                         $failedIndices[$err->index] = true;
                     }
                 }
         ...
     }

     private function parseResponse(string $response, int $responseLength, bool $ignoreDuplicateErrors = false): stdClass|array|int
     {
         ...
                 if ($ignoreDuplicateErrors) {
                     $nonDuplicateErrors = \array_filter(
                         (array)$result->writeErrors,
-                        fn ($err) => ($err->code ?? 0) !== 11000 && ($err->code ?? 0) !== 11001
+                        fn ($err) => !$this->isDuplicateWriteError($err)
                     );
         ...
     }

Also applies to: 1657-1660

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Client.php` around lines 785 - 790, Create a small centralized helper
(e.g., isDuplicateKeyError(int|null $code): bool) and use it wherever the magic
numbers 11000/11001 are checked: replace the checks in Client.php's response
parser block that inspects $result->writeErrors (the loop that currently uses
($err->code ?? 0) === 11000 || ...), and the insertMany post-filter (the other
occurrence around lines 1657-1660), and any similar checks near
src/Exception.php (lines ~84-90) to call the helper instead; ensure the helper
accepts nullable codes or the error object/code and returns a boolean so both
code paths stay consistent.

783-803: Please add focused regression coverage for the index-based filter.

This branch now depends on writeErrors[].index to compute the returned documents. A mixed batch test and a multi-batch test would make this much safer to evolve.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Client.php` around lines 783 - 803, Add regression tests that exercise
the index-based filtering logic in Client.php that depends on
writeErrors[].index: create a mixed-batch unit test that sends a batch
containing both duplicate-key documents and unique documents and asserts that
the returned insertedDocs (via the code paths using $this->toArray($docObjs) and
the branch using $failedIndices built from $result->writeErrors) only contains
the non-duplicate items based on writeErrors[*].index; also add a
multi-batch/incremental insert test that simulates writeErrors returned for one
batch (with indices relative to that batch) and ensures subsequent batches merge
correctly into $insertedDocs, covering the branches around query(),
$ignoreDuplicates handling, and the array_merge behavior that uses
$this->toArray($filtered).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 37c4e832-c519-45be-9c5a-379fc9f28757

📥 Commits

Reviewing files that changed from the base of the PR and between 28e9c27 and 08c94f4.

📒 Files selected for processing (1)
  • src/Client.php

- testInsertManyIgnoreDuplicates: mixed batch with existing + new docs
- testInsertManyIgnoreIntraBatchDuplicates: same ID twice, first wins
- testInsertManyIgnoreAllDuplicates: all duplicates, zero inserts
premtsd-code changed the title from "Add ignoreDuplicates option to insertMany" to "feat: add ignoreDuplicates option to insertMany" on Apr 10, 2026