Skip to content

fix: add orphan image cleanup function#87

Open
prachishelke1312 wants to merge 1 commit into
jpdevhub:mainfrom
prachishelke1312:fix/orphan-image-cleanup
Open

fix: add orphan image cleanup function#87
prachishelke1312 wants to merge 1 commit into
jpdevhub:mainfrom
prachishelke1312:fix/orphan-image-cleanup

Conversation

@prachishelke1312

@prachishelke1312 prachishelke1312 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds a Supabase Edge Function to automatically clean up orphaned images from the scan-images storage bucket.

Changes Made

  • Added cleanup-orphan-images Supabase Edge Function
  • Identifies images older than 24 hours
  • Checks for corresponding records in the scans table
  • Removes orphaned files that are no longer referenced
  • Added documentation for the cleanup process

Issue

Fixes #5

SSoC'26

This contribution was made as part of SSoC'26.

Summary by CodeRabbit

  • New Features
    • Automatic cleanup of orphaned images from storage. Images older than 24 hours with no associated records are now automatically deleted, improving storage efficiency and system maintenance.

@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown

🎉 Thank you for your Pull Request! We're thrilled to have your contribution to FreshScan AI.

Before we review, please make sure you have:

  • Followed the CONTRIBUTING.md guidelines.
  • Ensured all automated CI checks (linting, tests) are passing.
  • Checked that your commit messages follow the Conventional Commits format.

A maintainer will review your code as soon as possible!

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

.coderabbit.yaml has a parsing error

The CodeRabbit configuration file in this repository has a parsing error and default settings were used instead. Please fix the error(s) in the configuration file. You can initialize chat with CodeRabbit to get help with the configuration file.

💥 Parsing errors (1)
Validation error: Invalid input: expected object, received boolean at "reviews.auto_review"
⚙️ Configuration instructions
  • Please see the configuration documentation for more information.
  • You can also validate your configuration using the online YAML validator.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json
📝 Walkthrough

Walkthrough

This PR introduces a Supabase Edge Function that automatically cleans up orphaned images in the scan-images storage bucket. The function lists bucket objects, identifies those older than 24 hours without corresponding scan records, deletes them, and returns a completion status. Documentation outlines the cleanup criteria and scope.

Changes

Orphan Image Cleanup Edge Function

Layer / File(s) Summary
Edge Function implementation and documentation
backend/supabase/functions/cleanup-orphan-images/index.ts, backend/supabase/functions/cleanup-orphan-images/README.md
Deno Edge Function initializes a service-role Supabase client, lists objects from the scan-images bucket, filters those older than 24 hours, queries the scans table to find matching photo_urls entries, deletes objects with no corresponding scan record, logs deletions, and returns completion status. README documents the cleanup job's purpose and deletion criteria.

Sequence Diagram

sequenceDiagram
  participant Handler as Deno.serve
  participant Client as Supabase Client
  participant Bucket as scan-images Bucket
  participant Table as scans Table
  Handler->>Client: Initialize with credentials
  Handler->>Bucket: List objects (max 1000)
  Bucket-->>Handler: Return objects with created_at
  loop For each object > 24h old
    Handler->>Table: Query photo_urls contains filename
    alt Scan record found
      Note over Handler: Skip deletion
    else No matching scan
      Handler->>Bucket: Delete orphan object
      Handler->>Handler: Log deletion
    end
  end
  Handler-->>Handler: Return "Cleanup completed"
Loading

🎯 2 (Simple) | ⏱️ ~10 minutes

🐰 A function to clean up orphans with care,
Images forgotten floating in the air,
After twenty-four hours, they disappear,
No matching scans to keep them here!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: add orphan image cleanup function' directly and clearly describes the main change: adding a cleanup function for orphaned images.
Linked Issues check ✅ Passed The PR implements all core requirements from issue #5: a Supabase Edge Function that identifies images older than 24 hours, checks for corresponding scans table records, and deletes orphaned files.
Out of Scope Changes check ✅ Passed All changes are directly aligned with issue #5 requirements; the PR adds only the cleanup function and its documentation without unrelated modifications.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@vercel

vercel Bot commented Jun 8, 2026

Copy link
Copy Markdown

@prachishelke1312 is attempting to deploy a commit to the karan3431's projects Team on Vercel.

A member of the Team first needs to authorize it.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
backend/supabase/functions/cleanup-orphan-images/README.md (1)

1-6: 💤 Low value

Consider adding more implementation details to the documentation.

The README would be more helpful if it specified:

  • That matching is performed against the scans.photo_urls field
  • The expected URL format for matching
  • How often the function should be invoked (e.g., via cron)

This would help future maintainers understand the matching logic and deployment requirements.

📝 Suggested documentation enhancement
 # Cleanup Orphan Images

 Deletes images from scan-images bucket if:

 - older than 24 hours
-- no matching record exists in scans table
+- no matching record exists in scans table (checked via photo_urls field containing the public storage URL)
+
+## Usage
+
+Deploy as a Supabase Edge Function and invoke via cron (recommended: daily) or webhook.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/README.md` around lines 1 -
6, Update the README to explain that the function checks each object in the
scan-images bucket against the scans.photo_urls field in the scans table
(matching by exact URL or by normalizing to the bucket's public URL format,
e.g., https://{bucket}.storage.googleapis.com/{path}), state that only objects
older than 24 hours are considered, and add guidance on invocation frequency
(recommend a cron job every 24 hours or a shorter interval if rapid cleanup is
needed) plus example cron syntax and any required environment variables or
permissions the cleanup function needs to run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/supabase/functions/cleanup-orphan-images/index.ts`:
- Around line 3-6: The code uses non-null assertions when calling createClient
with Deno.env.get("SUPABASE_URL") and Deno.env.get("SUPABASE_SERVICE_ROLE_KEY"),
so add explicit validation before createClient: read both env vars into local
constants (e.g., supabaseUrl and supabaseKey), check if either is missing, and
throw or log a clear error (including which variable is missing) instead of
relying on the `!`; then pass the validated values to createClient. Reference:
the createClient invocation and the SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY env
keys in index.ts.
- Around line 37-39: The storage removal call using
supabase.storage.from(bucket).remove([file.name]) lacks error handling; update
the code in the cleanup-orphan-images handler to capture the result/error
returned by .remove(), check if an error was returned, and handle it (e.g., log
a descriptive error including bucket and file.name via your logger/console,
optionally collect failed deletions for a summary or retry/backoff, and avoid
silently continuing as if deletion succeeded). Ensure you reference the .remove
call and file.name when logging so failures are traceable.
- Around line 11-17: The current supabase.storage.from(bucket).list call is
limited to 1000 items (variables: files, error, bucket) so implement pagination
by repeatedly calling list with a moving offset (or continuation token if your
Supabase client supports it) and accumulating results until a call returns fewer
than the page size; inside that loop handle and return on any error (same error
handling as the existing block) and replace the single files usage with the full
accumulated array for downstream cleanup logic.
- Around line 30-42: The loop currently calls
supabase.from("scans").select("id").contains("photo_urls", [file.name]) per file
causing N+1 queries; instead fetch all scan photo_urls once (e.g.,
supabase.from("scans").select("photo_urls")) and build an in-memory Set or Map
of all referenced filenames, then iterate the storage files and for each
file.name check membership in that Set before calling
supabase.storage.from(bucket).remove or logging deletion; update the code around
the existing variables (file, bucket, supabase) and ensure the single bulk query
handles nullable/array photo_urls entries and flattens them to strings for the
membership check.
- Around line 30-34: The query is comparing storage object paths (file.name) to
database photo_urls which store public URLs, so .contains("photo_urls",
[file.name]) will never match; change the lookup to compare the same public URL
string stored by backend/main.py's get_public_url. Concretely, in
cleanup-orphan-images/index.ts compute the public URL for file.name (either via
supabase.storage.from('scan-images').getPublicUrl(file.name).publicUrl or by
composing
`${process.env.SUPABASE_URL}/storage/v1/object/public/scan-images/${file.name}`
to match get_public_url) and use that value in the .contains("photo_urls",
[publicUrl]) call so scans are correctly detected and not deleted.

---

Nitpick comments:
In `@backend/supabase/functions/cleanup-orphan-images/README.md`:
- Around line 1-6: Update the README to explain that the function checks each
object in the scan-images bucket against the scans.photo_urls field in the scans
table (matching by exact URL or by normalizing to the bucket's public URL
format, e.g., https://{bucket}.storage.googleapis.com/{path}), state that only
objects older than 24 hours are considered, and add guidance on invocation
frequency (recommend a cron job every 24 hours or a shorter interval if rapid
cleanup is needed) plus example cron syntax and any required environment
variables or permissions the cleanup function needs to run.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 718abb1d-17df-414c-9ced-0cc852e5d7c8

📥 Commits

Reviewing files that changed from the base of the PR and between d33d33e and ca92e6a.

📒 Files selected for processing (2)
  • backend/supabase/functions/cleanup-orphan-images/README.md
  • backend/supabase/functions/cleanup-orphan-images/index.ts

Comment on lines +3 to +6
const supabase = createClient(
Deno.env.get("SUPABASE_URL")!,
Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add explicit error handling for missing environment variables.

The ! non-null assertion will throw a runtime error if SUPABASE_URL or SUPABASE_SERVICE_ROLE_KEY are not set. While Supabase Edge Functions typically provide these variables, explicit validation would make the function more robust and provide clearer error messages.

🛡️ Proposed fix to add validation
+const supabaseUrl = Deno.env.get("SUPABASE_URL");
+const supabaseKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY");
+
+if (!supabaseUrl || !supabaseKey) {
+  throw new Error("Missing required environment variables");
+}
+
 const supabase = createClient(
-  Deno.env.get("SUPABASE_URL")!,
-  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
+  supabaseUrl,
+  supabaseKey
 );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const supabase = createClient(
Deno.env.get("SUPABASE_URL")!,
Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
);
const supabaseUrl = Deno.env.get("SUPABASE_URL");
const supabaseKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY");
if (!supabaseUrl || !supabaseKey) {
throw new Error("Missing required environment variables");
}
const supabase = createClient(
supabaseUrl,
supabaseKey
);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 3 -
6, The code uses non-null assertions when calling createClient with
Deno.env.get("SUPABASE_URL") and Deno.env.get("SUPABASE_SERVICE_ROLE_KEY"), so
add explicit validation before createClient: read both env vars into local
constants (e.g., supabaseUrl and supabaseKey), check if either is missing, and
throw or log a clear error (including which variable is missing) instead of
relying on the `!`; then pass the validated values to createClient. Reference:
the createClient invocation and the SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY env
keys in index.ts.

Comment on lines +11 to +17
const { data: files, error } = await supabase.storage
.from(bucket)
.list("", { limit: 1000 });

if (error) {
return new Response(error.message, { status: 500 });
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Implement pagination to handle more than 1000 files.

The storage listing is capped at 1000 files with no pagination. If the scan-images bucket contains more than 1000 objects, files beyond this limit will never be evaluated for cleanup, causing orphaned images to accumulate indefinitely.

♻️ Proposed fix to add pagination
  const bucket = "scan-images";
+  let offset = 0;
+  const limit = 1000;
+  let hasMore = true;

+  while (hasMore) {
-  const { data: files, error } = await supabase.storage
-    .from(bucket)
-    .list("", { limit: 1000 });
+    const { data: files, error } = await supabase.storage
+      .from(bucket)
+      .list("", { limit, offset });

-  if (error) {
-    return new Response(error.message, { status: 500 });
-  }
+    if (error) {
+      return new Response(error.message, { status: 500 });
+    }

+    if (!files || files.length < limit) {
+      hasMore = false;
+    }

   const now = Date.now();

   for (const file of files ?? []) {
     // ... existing processing logic
   }

+    offset += limit;
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const { data: files, error } = await supabase.storage
.from(bucket)
.list("", { limit: 1000 });
if (error) {
return new Response(error.message, { status: 500 });
}
const bucket = "scan-images";
let offset = 0;
const limit = 1000;
let hasMore = true;
while (hasMore) {
const { data: files, error } = await supabase.storage
.from(bucket)
.list("", { limit, offset });
if (error) {
return new Response(error.message, { status: 500 });
}
if (!files || files.length < limit) {
hasMore = false;
}
const now = Date.now();
for (const file of files ?? []) {
// ... existing processing logic
}
offset += limit;
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 11 -
17, The current supabase.storage.from(bucket).list call is limited to 1000 items
(variables: files, error, bucket) so implement pagination by repeatedly calling
list with a moving offset (or continuation token if your Supabase client
supports it) and accumulating results until a call returns fewer than the page
size; inside that loop handle and return on any error (same error handling as
the existing block) and replace the single files usage with the full accumulated
array for downstream cleanup logic.

Comment on lines +30 to +34
const { data: scan } = await supabase
.from("scans")
.select("id")
.contains("photo_urls", [file.name])
.maybeSingle();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: File path/URL mismatch will delete all old images, including those with valid scan records.

The orphan detection logic compares file.name (storage object path, e.g., "user123/scan456.jpg") against photo_urls, which stores public URLs (e.g., "https://...supabase.co/storage/v1/object/public/scan-images/user123/scan456.jpg"), as shown in backend/main.py:291-304 where get_public_url(path) is stored.

This mismatch means:

  • The .contains() query will never find a match
  • Every file older than 24 hours will be considered orphaned
  • All images with valid scan records will be incorrectly deleted
  • This causes immediate data loss when the function runs

Verify the mismatch by checking what values are actually stored in the database:

#!/bin/bash
# Description: Confirm that photo_urls contains full public URLs, not just file names

# Show the upload function that generates photo_urls values
echo "=== Upload function (shows get_public_url is stored) ==="
rg -A 3 "get_public_url" backend/main.py

echo ""
echo "=== Scan insert (shows photo_urls receives the URL) ==="
rg -B 2 -A 2 'photo_urls.*photo_url' backend/main.py
🔧 Proposed fix to construct the public URL for matching
+    // Construct the public URL to match what's stored in photo_urls
+    const publicUrl = supabase.storage
+      .from(bucket)
+      .getPublicUrl(file.name).data.publicUrl;
+
     const { data: scan } = await supabase
       .from("scans")
       .select("id")
-      .contains("photo_urls", [file.name])
+      .contains("photo_urls", [publicUrl])
       .maybeSingle();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 30 -
34, The query is comparing storage object paths (file.name) to database
photo_urls which store public URLs, so .contains("photo_urls", [file.name]) will
never match; change the lookup to compare the same public URL string stored by
backend/main.py's get_public_url. Concretely, in cleanup-orphan-images/index.ts
compute the public URL for file.name (either via
supabase.storage.from('scan-images').getPublicUrl(file.name).publicUrl or by
composing
`${process.env.SUPABASE_URL}/storage/v1/object/public/scan-images/${file.name}`
to match get_public_url) and use that value in the .contains("photo_urls",
[publicUrl]) call so scans are correctly detected and not deleted.

Comment on lines +30 to +42
const { data: scan } = await supabase
.from("scans")
.select("id")
.contains("photo_urls", [file.name])
.maybeSingle();

if (!scan) {
await supabase.storage
.from(bucket)
.remove([file.name]);

console.log(`Deleted orphan image: ${file.name}`);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

N+1 query problem: Optimize database access pattern.

The function executes one database query per file in a loop (up to 1000 queries). This creates significant database load and extends execution time. Batch the operation by fetching all photo_urls once, then checking membership in memory.

♻️ Proposed optimization to eliminate N+1 queries
  const now = Date.now();
+  
+  // Collect all files older than 24 hours
+  const oldFiles = [];
  for (const file of files ?? []) {
    if (!file.created_at) continue;

    const ageHours =
      (now - new Date(file.created_at).getTime()) /
      (1000 * 60 * 60);

-    if (ageHours < 24) continue;
+    if (ageHours >= 24) {
+      oldFiles.push(file.name);
+    }
+  }
+
+  if (oldFiles.length === 0) {
+    return new Response("Cleanup completed");
+  }
+
+  // Fetch all photo_urls in one query
+  const { data: scans } = await supabase
+    .from("scans")
+    .select("photo_urls");
+
+  // Build a Set of all referenced URLs for O(1) lookup
+  const referencedUrls = new Set<string>();
+  for (const scan of scans ?? []) {
+    for (const url of scan.photo_urls ?? []) {
+      referencedUrls.add(url);
+    }
+  }

-    const { data: scan } = await supabase
-      .from("scans")
-      .select("id")
-      .contains("photo_urls", [file.name])
-      .maybeSingle();
+  // Delete orphans
+  for (const fileName of oldFiles) {
+    const publicUrl = supabase.storage
+      .from(bucket)
+      .getPublicUrl(fileName).data.publicUrl;

-    if (!scan) {
+    if (!referencedUrls.has(publicUrl)) {
       await supabase.storage
         .from(bucket)
-        .remove([file.name]);
+        .remove([fileName]);

-      console.log(`Deleted orphan image: ${file.name}`);
+      console.log(`Deleted orphan image: ${fileName}`);
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 30 -
42, The loop currently calls
supabase.from("scans").select("id").contains("photo_urls", [file.name]) per file
causing N+1 queries; instead fetch all scan photo_urls once (e.g.,
supabase.from("scans").select("photo_urls")) and build an in-memory Set or Map
of all referenced filenames, then iterate the storage files and for each
file.name check membership in that Set before calling
supabase.storage.from(bucket).remove or logging deletion; update the code around
the existing variables (file, bucket, supabase) and ensure the single bulk query
handles nullable/array photo_urls entries and flattens them to strings for the
membership check.

Comment on lines +37 to +39
await supabase.storage
.from(bucket)
.remove([file.name]);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add error handling for storage deletion failures.

The .remove() operation on Line 39 has no error handling. If deletion fails (e.g., due to permissions or network issues), the function continues silently, leaving orphans undeleted without any indication of the failure.

🛡️ Proposed fix to handle deletion errors
-      await supabase.storage
+      const { error: removeError } = await supabase.storage
         .from(bucket)
         .remove([file.name]);

+      if (removeError) {
+        console.error(`Failed to delete ${file.name}: ${removeError.message}`);
+        continue;
+      }
+
       console.log(`Deleted orphan image: ${file.name}`);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
await supabase.storage
.from(bucket)
.remove([file.name]);
const { error: removeError } = await supabase.storage
.from(bucket)
.remove([file.name]);
if (removeError) {
console.error(`Failed to delete ${file.name}: ${removeError.message}`);
continue;
}
console.log(`Deleted orphan image: ${file.name}`);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 37 -
39, The storage removal call using
supabase.storage.from(bucket).remove([file.name]) lacks error handling; update
the code in the cleanup-orphan-images handler to capture the result/error
returned by .remove(), check if an error was returned, and handle it (e.g., log
a descriptive error including bucket and file.name via your logger/console,
optionally collect failed deletions for a summary or retry/backoff, and avoid
silently continuing as if deletion succeeded). Ensure you reference the .remove
call and file.name when logging so failures are traceable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tech Debt: Automated Cleanup of Orphaned Images in Storage

1 participant