Skip to content

Add overwrite and skip options to Appwrite destination#168

Open
premtsd-code wants to merge 5 commits intomainfrom
csv-import-upsert
Open

Add overwrite and skip options to Appwrite destination#168
premtsd-code wants to merge 5 commits intomainfrom
csv-import-upsert

Conversation

@premtsd-code
Copy link
Copy Markdown
Contributor

Summary

  • Adds setOverwrite() and setSkip() methods to the Appwrite destination
  • When overwrite is true, uses upsertDocuments() to update existing rows with matching IDs
  • When skip is true, passes ignore: true to createDocuments() to silently skip duplicates
  • Points utopia-php/database to csv-import-upsert branch which adds the ignore parameter to createDocuments

Depends on utopia-php/database#850

Test plan

  • CSV import with skip: true — duplicate rows silently skipped
  • CSV import with overwrite: true — existing rows updated
  • CSV import with neither — duplicate rows throw error
  • JSON import with same scenarios

- Add setOverwrite() / setSkip() methods to Appwrite destination
- When overwrite is true, use upsertDocuments to update existing rows
- When skip is true, pass ignore: true to createDocuments to silently
  skip duplicates
- Point utopia-php/database to csv-import-upsert branch for ignore support
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds setOverwrite() and setSkip() methods to the Appwrite destination, routing bulk document writes through upsertDocuments or createDocuments(..., ignore: true) respectively, and fixes the cache check so cached rows are not prematurely skipped in overwrite mode.

  • P1 — Silent data loss in skip mode: When the row flagged as isLast=true is already in the migration cache, createRecord returns early before flushing rowBuffer, silently dropping all rows buffered in that batch. This affects both skip=true and the default mode in re-run scenarios.

Confidence Score: 4/5

Not safe to merge yet — the rowBuffer flush gap can silently drop rows in skip/re-run scenarios.

The overwrite/skip logic itself is sound and prior review concerns have been addressed, but the early-return path in createRecord can leave rowBuffer unflushed on the isLast row, causing silent data loss for any rows buffered before the cached last row.

src/Migration/Destinations/Appwrite.php — specifically the early-return cache check at line 1010 relative to the isLast flush block.

Vulnerabilities

No security concerns identified. The new flags control write semantics only and do not affect authentication, authorization, or input validation paths.

Important Files Changed

Filename Overview
src/Migration/Destinations/Appwrite.php Adds overwrite/skip flags with mutual-exclusivity guards and correct cache-bypass for overwrite; minor concern with rowBuffer not flushing when the isLast row is early-returned due to cache hit.
composer.json Pins utopia-php/database to an unmerged dev branch (dev-csv-import-upsert-v2 as 5.0.0) — already flagged in prior review thread.
composer.lock Lock file updated to reflect dev-branch database dependency and transitive dependency changes (utopia-php/console replaces utopia-php/compression and utopia-php/http).

Comments Outside Diff (1)

  1. src/Migration/Destinations/Appwrite.php, line 1010-1016 (link)

    P1 Buffered rows silently dropped when isLast row is cache-skipped

    When skip=true (or the default), if the row marked isLast=true by the caller is already in the migration cache, it returns false here before reaching the $this->rowBuffer[] append and the flush block at line 1050. Any rows already accumulated in $this->rowBuffer for that batch are silently discarded — the finally block that resets rowBuffer is only reached inside the if ($isLast) branch.

    For example: rows 1–9 (not cached) are appended to rowBuffer; row 10 (isLast=true, cached) hits this early return → rows 1–9 are never written to the DB and are not tracked as errors.

    One fix is to flush the buffer before returning early:

    if ($exists && !$this->overwrite) {
        // Flush any buffered rows before returning, in case this is the last resource
        if ($isLast && !empty($this->rowBuffer)) {
            $this->flushRowBuffer($resource);
        }
        $resource->setStatus(Resource::STATUS_SKIPPED, 'Row has already been created');
        return false;
    }

Reviews (4): Last reviewed commit: "Update composer.lock for database csv-im..." | Re-trigger Greptile

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 8, 2026

Tip:

Greploops — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.

Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.

Comment on lines +1087 to +1097
if ($this->overwrite) {
$dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->upsertDocuments(
$collectionName,
$this->rowBuffer
));
} else {
$dbForDatabases->skipRelationshipsExistCheck(fn () => $dbForDatabases->createDocuments(
$collectionName,
$this->rowBuffer,
ignore: $this->skip
));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Successful upserts not reflected in resource status

When overwrite is true, upsertDocuments is called but the return value is silently discarded and the method unconditionally returns true (line 1106). If the upsert fails for any reason (other than throwing an exception), the caller has no way to know — and even in the success case, there is no cache update after a successful upsert. A successfully upserted row won't be added to the cache, so a re-run would attempt to upsert it again instead of treating it as already migrated.

The same gap exists for the skip path: rows silently skipped at the database level (because ignore: true suppresses the duplicate error) are returned as true — indistinguishable from a genuine insert — meaning the status report will show them as successfully migrated when they were actually dropped.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache is already updated at the import() level (line 368: $this->cache->update($responseResource)) for every resource regardless of the code path in createRecord(), so re-runs will see the row as cached.

For the skip path: returning true is the intended behavior — skip means "do not fail on duplicates, keep going." The migration completed successfully from the caller's perspective.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant