6 changes: 6 additions & 0 deletions backend/supabase/functions/cleanup-orphan-images/README.md
@@ -0,0 +1,6 @@
# Cleanup Orphan Images

Deletes images from scan-images bucket if:

- older than 24 hours
- no matching record exists in scans table
46 changes: 46 additions & 0 deletions backend/supabase/functions/cleanup-orphan-images/index.ts
@@ -0,0 +1,46 @@
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);
Comment on lines +3 to +6

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit error handling for missing environment variables.

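The ! non-null assertion only silences the type checker; it performs no runtime check. If SUPABASE_URL or SUPABASE_SERVICE_ROLE_KEY is not set, undefined is passed straight to createClient, and any failure surfaces there rather than at a clear validation point. While Supabase Edge Functions typically provide these variables, explicit validation would make the function more robust and provide clearer error messages.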

🛡️ Proposed fix to add validation
+const supabaseUrl = Deno.env.get("SUPABASE_URL");
+const supabaseKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY");
+
+if (!supabaseUrl || !supabaseKey) {
+  throw new Error("Missing required environment variables");
+}
+
 const supabase = createClient(
-  Deno.env.get("SUPABASE_URL")!,
-  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
+  supabaseUrl,
+  supabaseKey
 );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-const supabase = createClient(
-  Deno.env.get("SUPABASE_URL")!,
-  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
-);
+const supabaseUrl = Deno.env.get("SUPABASE_URL");
+const supabaseKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY");
+if (!supabaseUrl || !supabaseKey) {
+  throw new Error("Missing required environment variables");
+}
+const supabase = createClient(
+  supabaseUrl,
+  supabaseKey
+);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 3 -
6, The code uses non-null assertions when calling createClient with
Deno.env.get("SUPABASE_URL") and Deno.env.get("SUPABASE_SERVICE_ROLE_KEY"), so
add explicit validation before createClient: read both env vars into local
constants (e.g., supabaseUrl and supabaseKey), check if either is missing, and
throw or log a clear error (including which variable is missing) instead of
relying on the `!`; then pass the validated values to createClient. Reference:
the createClient invocation and the SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY env
keys in index.ts.
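The prompt above asks for an error that names which variable is missing. A minimal sketch of that idea follows; `requireEnv`, `EnvLookup`, and the `lookup` parameter are hypothetical names introduced here for illustration, not part of the PR or of any Supabase API. The lookup is injected so the helper can be exercised without a Deno runtime.

```typescript
// Hypothetical helper: `requireEnv` and `EnvLookup` are illustrative names.
type EnvLookup = (name: string) => string | undefined;

function requireEnv(names: string[], lookup: EnvLookup): Record<string, string> {
  const values: Record<string, string> = {};
  const missing: string[] = [];

  for (const name of names) {
    const value = lookup(name);
    if (value === undefined || value === "") {
      missing.push(name); // remember every missing name, not just the first
    } else {
      values[name] = value;
    }
  }

  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return values;
}

// In the Edge Function this would presumably be wired up as:
//   const env = requireEnv(
//     ["SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY"],
//     (name) => Deno.env.get(name),
//   );
```

Collecting all missing names before throwing means a misconfigured deployment reports every problem in one run rather than one per attempt.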


Deno.serve(async () => {
  const bucket = "scan-images";

  const { data: files, error } = await supabase.storage
    .from(bucket)
    .list("", { limit: 1000 });

  if (error) {
    return new Response(error.message, { status: 500 });
  }
Comment on lines +11 to +17

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Implement pagination to handle more than 1000 files.

The storage listing is capped at 1000 files with no pagination. If the scan-images bucket contains more than 1000 objects, files beyond this limit will never be evaluated for cleanup, causing orphaned images to accumulate indefinitely.

♻️ Proposed fix to add pagination
  const bucket = "scan-images";
+  let offset = 0;
+  const limit = 1000;
+  let hasMore = true;

+  while (hasMore) {
-  const { data: files, error } = await supabase.storage
-    .from(bucket)
-    .list("", { limit: 1000 });
+    const { data: files, error } = await supabase.storage
+      .from(bucket)
+      .list("", { limit, offset });

-  if (error) {
-    return new Response(error.message, { status: 500 });
-  }
+    if (error) {
+      return new Response(error.message, { status: 500 });
+    }

+    if (!files || files.length < limit) {
+      hasMore = false;
+    }

   const now = Date.now();

   for (const file of files ?? []) {
     // ... existing processing logic
   }

+    offset += limit;
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  const { data: files, error } = await supabase.storage
-    .from(bucket)
-    .list("", { limit: 1000 });
-  if (error) {
-    return new Response(error.message, { status: 500 });
-  }
+  const bucket = "scan-images";
+  let offset = 0;
+  const limit = 1000;
+  let hasMore = true;
+  while (hasMore) {
+    const { data: files, error } = await supabase.storage
+      .from(bucket)
+      .list("", { limit, offset });
+    if (error) {
+      return new Response(error.message, { status: 500 });
+    }
+    if (!files || files.length < limit) {
+      hasMore = false;
+    }
+    const now = Date.now();
+    for (const file of files ?? []) {
+      // ... existing processing logic
+    }
+    offset += limit;
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 11 -
17, The current supabase.storage.from(bucket).list call is limited to 1000 items
(variables: files, error, bucket) so implement pagination by repeatedly calling
list with a moving offset (or continuation token if your Supabase client
supports it) and accumulating results until a call returns fewer than the page
size; inside that loop handle and return on any error (same error handling as
the existing block) and replace the single files usage with the full accumulated
array for downstream cleanup logic.
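The accumulate-then-process approach described in the prompt can be sketched as a small generic pager. `Page`, `listAll`, and `fetchPage` are hypothetical names; `fetchPage` stands in for a call such as `supabase.storage.from(bucket).list("", { limit, offset })` and is assumed to resolve to the same `{ data, error }` shape. Collecting every page before deleting anything may also be the safer ordering, since removing objects between offset-based fetches could shift later pages and cause entries to be skipped.

```typescript
// Hypothetical shape mirroring the { data, error } result used in the PR.
interface Page<T> {
  data: T[] | null;
  error: { message: string } | null;
}

async function listAll<T>(
  fetchPage: (limit: number, offset: number) => Promise<Page<T>>,
  pageSize = 1000,
): Promise<T[]> {
  const all: T[] = [];
  let offset = 0;

  while (true) {
    const { data, error } = await fetchPage(pageSize, offset);
    if (error) throw new Error(error.message);

    const page = data ?? [];
    all.push(...page);

    // A short (or empty) page means there is nothing left to fetch.
    if (page.length < pageSize) break;
    offset += pageSize;
  }
  return all;
}
```

The handler would then catch the thrown error and return the same 500 response as the existing error branch.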


  const now = Date.now();

  for (const file of files ?? []) {
    if (!file.created_at) continue;

    const ageHours =
      (now - new Date(file.created_at).getTime()) /
      (1000 * 60 * 60);

    if (ageHours < 24) continue;

    const { data: scan } = await supabase
      .from("scans")
      .select("id")
      .contains("photo_urls", [file.name])
      .maybeSingle();
Comment on lines +30 to +34

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: File path/URL mismatch will delete all old images, including those with valid scan records.

The orphan detection logic compares file.name (storage object path, e.g., "user123/scan456.jpg") against photo_urls, which stores public URLs (e.g., "https://...supabase.co/storage/v1/object/public/scan-images/user123/scan456.jpg"), as shown in backend/main.py:291-304 where get_public_url(path) is stored.

This mismatch means:

  • The .contains() query will never find a match
  • Every file older than 24 hours will be considered orphaned
  • All images with valid scan records will be incorrectly deleted
  • This causes immediate data loss when the function runs

Verify the mismatch by checking what values are actually stored in the database:

#!/bin/bash
# Description: Confirm that photo_urls contains full public URLs, not just file names

# Show the upload function that generates photo_urls values
echo "=== Upload function (shows get_public_url is stored) ==="
rg -A 3 "get_public_url" backend/main.py

echo ""
echo "=== Scan insert (shows photo_urls receives the URL) ==="
rg -B 2 -A 2 'photo_urls.*photo_url' backend/main.py
🔧 Proposed fix to construct the public URL for matching
+    // Construct the public URL to match what's stored in photo_urls
+    const publicUrl = supabase.storage
+      .from(bucket)
+      .getPublicUrl(file.name).data.publicUrl;
+
     const { data: scan } = await supabase
       .from("scans")
       .select("id")
-      .contains("photo_urls", [file.name])
+      .contains("photo_urls", [publicUrl])
       .maybeSingle();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 30 -
34, The query is comparing storage object paths (file.name) to database
photo_urls which store public URLs, so .contains("photo_urls", [file.name]) will
never match; change the lookup to compare the same public URL string stored by
backend/main.py's get_public_url. Concretely, in cleanup-orphan-images/index.ts
compute the public URL for file.name (either via
supabase.storage.from('scan-images').getPublicUrl(file.name).data.publicUrl or by
composing
`${Deno.env.get("SUPABASE_URL")}/storage/v1/object/public/scan-images/${file.name}`
to match get_public_url) and use that value in the .contains("photo_urls",
[publicUrl]) call so scans are correctly detected and their images are not deleted.
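To make the mismatch concrete, the sketch below composes a public URL from an object path and compares both forms against a stored value. `toPublicUrl` is a hypothetical helper, `example.supabase.co` is a placeholder host, and the URL layout is taken from the example in the review comment; it is assumed, not verified, to match exactly what get_public_url stores. No URL-encoding of the path is attempted here.

```typescript
// Hypothetical helper; layout assumed from the review comment's example URL.
function toPublicUrl(baseUrl: string, bucket: string, objectPath: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop any trailing slashes
  return `${base}/storage/v1/object/public/${bucket}/${objectPath}`;
}

// Placeholder data shaped like a photo_urls column value.
const storedPhotoUrls = [
  "https://example.supabase.co/storage/v1/object/public/scan-images/user123/scan456.jpg",
];
const objectPath = "user123/scan456.jpg";

// Comparing the raw storage path never matches a stored public URL...
const matchesByPath = storedPhotoUrls.includes(objectPath);

// ...whereas comparing the composed public URL does.
const matchesByUrl = storedPhotoUrls.includes(
  toPublicUrl("https://example.supabase.co/", "scan-images", objectPath),
);
```

In this placeholder data `matchesByPath` is false and `matchesByUrl` is true, which is the difference between deleting a referenced image and keeping it.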


    if (!scan) {
      await supabase.storage
        .from(bucket)
        .remove([file.name]);
Comment on lines +37 to +39

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add error handling for storage deletion failures.

The .remove() operation on Line 39 has no error handling. If deletion fails (e.g., due to permissions or network issues), the function continues silently, leaving orphans undeleted without any indication of the failure.

🛡️ Proposed fix to handle deletion errors
-      await supabase.storage
+      const { error: removeError } = await supabase.storage
         .from(bucket)
         .remove([file.name]);

+      if (removeError) {
+        console.error(`Failed to delete ${file.name}: ${removeError.message}`);
+        continue;
+      }
+
       console.log(`Deleted orphan image: ${file.name}`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-      await supabase.storage
-        .from(bucket)
-        .remove([file.name]);
+      const { error: removeError } = await supabase.storage
+        .from(bucket)
+        .remove([file.name]);
+      if (removeError) {
+        console.error(`Failed to delete ${file.name}: ${removeError.message}`);
+        continue;
+      }
+      console.log(`Deleted orphan image: ${file.name}`);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 37 -
39, The storage removal call using
supabase.storage.from(bucket).remove([file.name]) lacks error handling; update
the code in the cleanup-orphan-images handler to capture the result/error
returned by .remove(), check if an error was returned, and handle it (e.g., log
a descriptive error including bucket and file.name via your logger/console,
optionally collect failed deletions for a summary or retry/backoff, and avoid
silently continuing as if deletion succeeded). Ensure you reference the .remove
call and file.name when logging so failures are traceable.
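The optional "collect failed deletions for a summary" idea could look roughly like the following. `RemoveFn`, `RemovalReport`, and `removeEach` are hypothetical names; `RemoveFn` only mirrors the `{ error }` part of the result that the proposed fix destructures from `.remove([...])`, so the helper can be tried against a fake.

```typescript
// Hypothetical types; `RemoveFn` stands in for
// (paths) => supabase.storage.from(bucket).remove(paths).
type RemoveFn = (paths: string[]) => Promise<{ error: { message: string } | null }>;

interface RemovalReport {
  deleted: string[];
  failed: { path: string; reason: string }[];
}

async function removeEach(paths: string[], remove: RemoveFn): Promise<RemovalReport> {
  const report: RemovalReport = { deleted: [], failed: [] };

  for (const path of paths) {
    try {
      const { error } = await remove([path]);
      if (error) {
        report.failed.push({ path, reason: error.message });
      } else {
        report.deleted.push(path);
      }
    } catch (err) {
      // Also record thrown failures, e.g. a rejected network call.
      const reason = err instanceof Error ? err.message : String(err);
      report.failed.push({ path, reason });
    }
  }
  return report;
}
```

Returning the report lets the handler log each failure and include counts in its response instead of always answering "Cleanup completed".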


      console.log(`Deleted orphan image: ${file.name}`);
    }
Comment on lines +30 to +42

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

N+1 query problem: Optimize database access pattern.

The function executes one database query per file in a loop (up to 1000 queries). This creates significant database load and extends execution time. Batch the operation by fetching all photo_urls once, then checking membership in memory.

♻️ Proposed optimization to eliminate N+1 queries
  const now = Date.now();
+  
+  // Collect all files older than 24 hours
+  const oldFiles = [];
  for (const file of files ?? []) {
    if (!file.created_at) continue;

    const ageHours =
      (now - new Date(file.created_at).getTime()) /
      (1000 * 60 * 60);

-    if (ageHours < 24) continue;
+    if (ageHours >= 24) {
+      oldFiles.push(file.name);
+    }
+  }
+
+  if (oldFiles.length === 0) {
+    return new Response("Cleanup completed");
+  }
+
+  // Fetch all photo_urls in one query
+  const { data: scans } = await supabase
+    .from("scans")
+    .select("photo_urls");
+
+  // Build a Set of all referenced URLs for O(1) lookup
+  const referencedUrls = new Set<string>();
+  for (const scan of scans ?? []) {
+    for (const url of scan.photo_urls ?? []) {
+      referencedUrls.add(url);
+    }
+  }

-    const { data: scan } = await supabase
-      .from("scans")
-      .select("id")
-      .contains("photo_urls", [file.name])
-      .maybeSingle();
+  // Delete orphans
+  for (const fileName of oldFiles) {
+    const publicUrl = supabase.storage
+      .from(bucket)
+      .getPublicUrl(fileName).data.publicUrl;

-    if (!scan) {
+    if (!referencedUrls.has(publicUrl)) {
       await supabase.storage
         .from(bucket)
-        .remove([file.name]);
+        .remove([fileName]);

-      console.log(`Deleted orphan image: ${file.name}`);
+      console.log(`Deleted orphan image: ${fileName}`);
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 30 -
42, The loop currently calls
supabase.from("scans").select("id").contains("photo_urls", [file.name]) per file
causing N+1 queries; instead fetch all scan photo_urls once (e.g.,
supabase.from("scans").select("photo_urls")) and build an in-memory Set of all
referenced public URLs, then iterate the storage files and for each file.name
compute its public URL and check membership in that Set before calling
supabase.storage.from(bucket).remove or logging deletion; update the code around
the existing variables (file, bucket, supabase) and ensure the single bulk query
handles nullable/array photo_urls entries and flattens them to strings for the
membership check.
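The in-memory membership check can be isolated as a pure function, sketched below. `ScanRow`, `findOrphans`, and the `toUrl` parameter are hypothetical names; `toUrl` is whatever maps a storage path to the form stored in photo_urls. One assumption worth checking separately: the single `select("photo_urls")` query may itself be capped by the API's configured maximum row count, in which case it would need paging as well.

```typescript
// Hypothetical row shape: only the column the check needs.
interface ScanRow {
  photo_urls: string[] | null;
}

function findOrphans(
  oldFilePaths: string[],
  scans: ScanRow[],
  toUrl: (path: string) => string,
): string[] {
  // Flatten every referenced URL into a Set for constant-time lookups.
  const referenced = new Set<string>();
  for (const scan of scans) {
    for (const url of scan.photo_urls ?? []) {
      referenced.add(url);
    }
  }

  // A file is an orphan when no scan row references its URL.
  return oldFilePaths.filter((path) => !referenced.has(toUrl(path)));
}
```

Keeping this step free of I/O makes the deletion decision easy to unit test before the function is allowed to remove anything.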

  }

  return new Response("Cleanup completed");
});