fix: add orphan image cleanup function#87
Conversation
|
🎉 Thank you for your Pull Request! We're thrilled to have your contribution to FreshScan AI. Before we review, please make sure you have:
A maintainer will review your code as soon as possible! |
|
Warning
|
| Layer / File(s) | Summary |
|---|---|
Edge Function implementation and documentation backend/supabase/functions/cleanup-orphan-images/index.ts, backend/supabase/functions/cleanup-orphan-images/README.md |
Deno Edge Function initializes a service-role Supabase client, lists objects from the scan-images bucket, filters those older than 24 hours, queries the scans table to find matching photo_urls entries, deletes objects with no corresponding scan record, logs deletions, and returns completion status. README documents the cleanup job's purpose and deletion criteria. |
Sequence Diagram
sequenceDiagram
participant Handler as Deno.serve
participant Client as Supabase Client
participant Bucket as scan-images Bucket
participant Table as scans Table
Handler->>Client: Initialize with credentials
Handler->>Bucket: List objects (max 1000)
Bucket-->>Handler: Return objects with created_at
loop For each object > 24h old
Handler->>Table: Query photo_urls contains filename
alt Scan record found
Note over Handler: Skip deletion
else No matching scan
Handler->>Bucket: Delete orphan object
Handler->>Handler: Log deletion
end
end
Handler-->>Handler: Return "Cleanup completed"
🎯 2 (Simple) | ⏱️ ~10 minutes
🐰 A function to clean up orphans with care,
Images forgotten floating in the air,
After twenty-four hours, they disappear,
No matching scans to keep them here! ✨
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: add orphan image cleanup function' directly and clearly describes the main change: adding a cleanup function for orphaned images. |
| Linked Issues check | ✅ Passed | The PR implements all core requirements from issue #5: a Supabase Edge Function that identifies images older than 24 hours, checks for corresponding scans table records, and deletes orphaned files. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with issue #5 requirements; the PR adds only the cleanup function and its documentation without unrelated modifications. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
- Create PR with unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Comment @coderabbitai help to get the list of available commands and usage tips.
|
@prachishelke1312 is attempting to deploy a commit to the karan3431's projects Team on Vercel. A member of the Team first needs to authorize it. |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
backend/supabase/functions/cleanup-orphan-images/README.md (1)
1-6: 💤 Low valueConsider adding more implementation details to the documentation.
The README would be more helpful if it specified:
- That matching is performed against the
scans.photo_urlsfield- The expected URL format for matching
- How often the function should be invoked (e.g., via cron)
This would help future maintainers understand the matching logic and deployment requirements.
📝 Suggested documentation enhancement
# Cleanup Orphan Images Deletes images from scan-images bucket if: - older than 24 hours -- no matching record exists in scans table +- no matching record exists in scans table (checked via photo_urls field containing the public storage URL) + +## Usage + +Deploy as a Supabase Edge Function and invoke via cron (recommended: daily) or webhook.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/supabase/functions/cleanup-orphan-images/README.md` around lines 1 - 6, Update the README to explain that the function checks each object in the scan-images bucket against the scans.photo_urls field in the scans table (matching by exact URL or by normalizing to the bucket's public URL format, e.g., https://{bucket}.storage.googleapis.com/{path}), state that only objects older than 24 hours are considered, and add guidance on invocation frequency (recommend a cron job every 24 hours or a shorter interval if rapid cleanup is needed) plus example cron syntax and any required environment variables or permissions the cleanup function needs to run.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/supabase/functions/cleanup-orphan-images/index.ts`:
- Around line 3-6: The code uses non-null assertions when calling createClient
with Deno.env.get("SUPABASE_URL") and Deno.env.get("SUPABASE_SERVICE_ROLE_KEY"),
so add explicit validation before createClient: read both env vars into local
constants (e.g., supabaseUrl and supabaseKey), check if either is missing, and
throw or log a clear error (including which variable is missing) instead of
relying on the `!`; then pass the validated values to createClient. Reference:
the createClient invocation and the SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY env
keys in index.ts.
- Around line 37-39: The storage removal call using
supabase.storage.from(bucket).remove([file.name]) lacks error handling; update
the code in the cleanup-orphan-images handler to capture the result/error
returned by .remove(), check if an error was returned, and handle it (e.g., log
a descriptive error including bucket and file.name via your logger/console,
optionally collect failed deletions for a summary or retry/backoff, and avoid
silently continuing as if deletion succeeded). Ensure you reference the .remove
call and file.name when logging so failures are traceable.
- Around line 11-17: The current supabase.storage.from(bucket).list call is
limited to 1000 items (variables: files, error, bucket) so implement pagination
by repeatedly calling list with a moving offset (or continuation token if your
Supabase client supports it) and accumulating results until a call returns fewer
than the page size; inside that loop handle and return on any error (same error
handling as the existing block) and replace the single files usage with the full
accumulated array for downstream cleanup logic.
- Around line 30-42: The loop currently calls
supabase.from("scans").select("id").contains("photo_urls", [file.name]) per file
causing N+1 queries; instead fetch all scan photo_urls once (e.g.,
supabase.from("scans").select("photo_urls")) and build an in-memory Set or Map
of all referenced filenames, then iterate the storage files and for each
file.name check membership in that Set before calling
supabase.storage.from(bucket).remove or logging deletion; update the code around
the existing variables (file, bucket, supabase) and ensure the single bulk query
handles nullable/array photo_urls entries and flattens them to strings for the
membership check.
- Around line 30-34: The query is comparing storage object paths (file.name) to
database photo_urls which store public URLs, so .contains("photo_urls",
[file.name]) will never match; change the lookup to compare the same public URL
string stored by backend/main.py's get_public_url. Concretely, in
cleanup-orphan-images/index.ts compute the public URL for file.name (either via
supabase.storage.from('scan-images').getPublicUrl(file.name).publicUrl or by
composing
`${process.env.SUPABASE_URL}/storage/v1/object/public/scan-images/${file.name}`
to match get_public_url) and use that value in the .contains("photo_urls",
[publicUrl]) call so scans are correctly detected and not deleted.
---
Nitpick comments:
In `@backend/supabase/functions/cleanup-orphan-images/README.md`:
- Around line 1-6: Update the README to explain that the function checks each
object in the scan-images bucket against the scans.photo_urls field in the scans
table (matching by exact URL or by normalizing to the bucket's public URL
format, e.g., https://{bucket}.storage.googleapis.com/{path}), state that only
objects older than 24 hours are considered, and add guidance on invocation
frequency (recommend a cron job every 24 hours or a shorter interval if rapid
cleanup is needed) plus example cron syntax and any required environment
variables or permissions the cleanup function needs to run.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 718abb1d-17df-414c-9ced-0cc852e5d7c8
📒 Files selected for processing (2)
backend/supabase/functions/cleanup-orphan-images/README.mdbackend/supabase/functions/cleanup-orphan-images/index.ts
| const supabase = createClient( | ||
| Deno.env.get("SUPABASE_URL")!, | ||
| Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")! | ||
| ); |
There was a problem hiding this comment.
Add explicit error handling for missing environment variables.
The ! non-null assertion will throw a runtime error if SUPABASE_URL or SUPABASE_SERVICE_ROLE_KEY are not set. While Supabase Edge Functions typically provide these variables, explicit validation would make the function more robust and provide clearer error messages.
🛡️ Proposed fix to add validation
+const supabaseUrl = Deno.env.get("SUPABASE_URL");
+const supabaseKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY");
+
+if (!supabaseUrl || !supabaseKey) {
+ throw new Error("Missing required environment variables");
+}
+
const supabase = createClient(
- Deno.env.get("SUPABASE_URL")!,
- Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!
+ supabaseUrl,
+ supabaseKey
);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const supabase = createClient( | |
| Deno.env.get("SUPABASE_URL")!, | |
| Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")! | |
| ); | |
| const supabaseUrl = Deno.env.get("SUPABASE_URL"); | |
| const supabaseKey = Deno.env.get("SUPABASE_SERVICE_ROLE_KEY"); | |
| if (!supabaseUrl || !supabaseKey) { | |
| throw new Error("Missing required environment variables"); | |
| } | |
| const supabase = createClient( | |
| supabaseUrl, | |
| supabaseKey | |
| ); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 3 -
6, The code uses non-null assertions when calling createClient with
Deno.env.get("SUPABASE_URL") and Deno.env.get("SUPABASE_SERVICE_ROLE_KEY"), so
add explicit validation before createClient: read both env vars into local
constants (e.g., supabaseUrl and supabaseKey), check if either is missing, and
throw or log a clear error (including which variable is missing) instead of
relying on the `!`; then pass the validated values to createClient. Reference:
the createClient invocation and the SUPABASE_URL / SUPABASE_SERVICE_ROLE_KEY env
keys in index.ts.
| const { data: files, error } = await supabase.storage | ||
| .from(bucket) | ||
| .list("", { limit: 1000 }); | ||
|
|
||
| if (error) { | ||
| return new Response(error.message, { status: 500 }); | ||
| } |
There was a problem hiding this comment.
Implement pagination to handle more than 1000 files.
The storage listing is capped at 1000 files with no pagination. If the scan-images bucket contains more than 1000 objects, files beyond this limit will never be evaluated for cleanup, causing orphaned images to accumulate indefinitely.
♻️ Proposed fix to add pagination
const bucket = "scan-images";
+ let offset = 0;
+ const limit = 1000;
+ let hasMore = true;
+ while (hasMore) {
- const { data: files, error } = await supabase.storage
- .from(bucket)
- .list("", { limit: 1000 });
+ const { data: files, error } = await supabase.storage
+ .from(bucket)
+ .list("", { limit, offset });
- if (error) {
- return new Response(error.message, { status: 500 });
- }
+ if (error) {
+ return new Response(error.message, { status: 500 });
+ }
+ if (!files || files.length < limit) {
+ hasMore = false;
+ }
const now = Date.now();
for (const file of files ?? []) {
// ... existing processing logic
}
+ offset += limit;
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const { data: files, error } = await supabase.storage | |
| .from(bucket) | |
| .list("", { limit: 1000 }); | |
| if (error) { | |
| return new Response(error.message, { status: 500 }); | |
| } | |
| const bucket = "scan-images"; | |
| let offset = 0; | |
| const limit = 1000; | |
| let hasMore = true; | |
| while (hasMore) { | |
| const { data: files, error } = await supabase.storage | |
| .from(bucket) | |
| .list("", { limit, offset }); | |
| if (error) { | |
| return new Response(error.message, { status: 500 }); | |
| } | |
| if (!files || files.length < limit) { | |
| hasMore = false; | |
| } | |
| const now = Date.now(); | |
| for (const file of files ?? []) { | |
| // ... existing processing logic | |
| } | |
| offset += limit; | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 11 -
17, The current supabase.storage.from(bucket).list call is limited to 1000 items
(variables: files, error, bucket) so implement pagination by repeatedly calling
list with a moving offset (or continuation token if your Supabase client
supports it) and accumulating results until a call returns fewer than the page
size; inside that loop handle and return on any error (same error handling as
the existing block) and replace the single files usage with the full accumulated
array for downstream cleanup logic.
| const { data: scan } = await supabase | ||
| .from("scans") | ||
| .select("id") | ||
| .contains("photo_urls", [file.name]) | ||
| .maybeSingle(); |
There was a problem hiding this comment.
Critical: File path/URL mismatch will delete all old images, including those with valid scan records.
The orphan detection logic compares file.name (storage object path, e.g., "user123/scan456.jpg") against photo_urls, which stores public URLs (e.g., "https://...supabase.co/storage/v1/object/public/scan-images/user123/scan456.jpg"), as shown in backend/main.py:291-304 where get_public_url(path) is stored.
This mismatch means:
- The
.contains()query will never find a match - Every file older than 24 hours will be considered orphaned
- All images with valid scan records will be incorrectly deleted
- This causes immediate data loss when the function runs
Verify the mismatch by checking what values are actually stored in the database:
#!/bin/bash
# Description: Confirm that photo_urls contains full public URLs, not just file names
# Show the upload function that generates photo_urls values
echo "=== Upload function (shows get_public_url is stored) ==="
rg -A 3 "get_public_url" backend/main.py
echo ""
echo "=== Scan insert (shows photo_urls receives the URL) ==="
rg -B 2 -A 2 'photo_urls.*photo_url' backend/main.py🔧 Proposed fix to construct the public URL for matching
+ // Construct the public URL to match what's stored in photo_urls
+ const publicUrl = supabase.storage
+ .from(bucket)
+ .getPublicUrl(file.name).data.publicUrl;
+
const { data: scan } = await supabase
.from("scans")
.select("id")
- .contains("photo_urls", [file.name])
+ .contains("photo_urls", [publicUrl])
.maybeSingle();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 30 -
34, The query is comparing storage object paths (file.name) to database
photo_urls which store public URLs, so .contains("photo_urls", [file.name]) will
never match; change the lookup to compare the same public URL string stored by
backend/main.py's get_public_url. Concretely, in cleanup-orphan-images/index.ts
compute the public URL for file.name (either via
supabase.storage.from('scan-images').getPublicUrl(file.name).publicUrl or by
composing
`${process.env.SUPABASE_URL}/storage/v1/object/public/scan-images/${file.name}`
to match get_public_url) and use that value in the .contains("photo_urls",
[publicUrl]) call so scans are correctly detected and not deleted.
| const { data: scan } = await supabase | ||
| .from("scans") | ||
| .select("id") | ||
| .contains("photo_urls", [file.name]) | ||
| .maybeSingle(); | ||
|
|
||
| if (!scan) { | ||
| await supabase.storage | ||
| .from(bucket) | ||
| .remove([file.name]); | ||
|
|
||
| console.log(`Deleted orphan image: ${file.name}`); | ||
| } |
There was a problem hiding this comment.
N+1 query problem: Optimize database access pattern.
The function executes one database query per file in a loop (up to 1000 queries). This creates significant database load and extends execution time. Batch the operation by fetching all photo_urls once, then checking membership in memory.
♻️ Proposed optimization to eliminate N+1 queries
const now = Date.now();
+
+ // Collect all files older than 24 hours
+ const oldFiles = [];
for (const file of files ?? []) {
if (!file.created_at) continue;
const ageHours =
(now - new Date(file.created_at).getTime()) /
(1000 * 60 * 60);
- if (ageHours < 24) continue;
+ if (ageHours >= 24) {
+ oldFiles.push(file.name);
+ }
+ }
+
+ if (oldFiles.length === 0) {
+ return new Response("Cleanup completed");
+ }
+
+ // Fetch all photo_urls in one query
+ const { data: scans } = await supabase
+ .from("scans")
+ .select("photo_urls");
+
+ // Build a Set of all referenced URLs for O(1) lookup
+ const referencedUrls = new Set<string>();
+ for (const scan of scans ?? []) {
+ for (const url of scan.photo_urls ?? []) {
+ referencedUrls.add(url);
+ }
+ }
- const { data: scan } = await supabase
- .from("scans")
- .select("id")
- .contains("photo_urls", [file.name])
- .maybeSingle();
+ // Delete orphans
+ for (const fileName of oldFiles) {
+ const publicUrl = supabase.storage
+ .from(bucket)
+ .getPublicUrl(fileName).data.publicUrl;
- if (!scan) {
+ if (!referencedUrls.has(publicUrl)) {
await supabase.storage
.from(bucket)
- .remove([file.name]);
+ .remove([fileName]);
- console.log(`Deleted orphan image: ${file.name}`);
+ console.log(`Deleted orphan image: ${fileName}`);
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 30 -
42, The loop currently calls
supabase.from("scans").select("id").contains("photo_urls", [file.name]) per file
causing N+1 queries; instead fetch all scan photo_urls once (e.g.,
supabase.from("scans").select("photo_urls")) and build an in-memory Set or Map
of all referenced filenames, then iterate the storage files and for each
file.name check membership in that Set before calling
supabase.storage.from(bucket).remove or logging deletion; update the code around
the existing variables (file, bucket, supabase) and ensure the single bulk query
handles nullable/array photo_urls entries and flattens them to strings for the
membership check.
| await supabase.storage | ||
| .from(bucket) | ||
| .remove([file.name]); |
There was a problem hiding this comment.
Add error handling for storage deletion failures.
The .remove() operation on Line 39 has no error handling. If deletion fails (e.g., due to permissions or network issues), the function continues silently, leaving orphans undeleted without any indication of the failure.
🛡️ Proposed fix to handle deletion errors
- await supabase.storage
+ const { error: removeError } = await supabase.storage
.from(bucket)
.remove([file.name]);
+ if (removeError) {
+ console.error(`Failed to delete ${file.name}: ${removeError.message}`);
+ continue;
+ }
+
console.log(`Deleted orphan image: ${file.name}`);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| await supabase.storage | |
| .from(bucket) | |
| .remove([file.name]); | |
| const { error: removeError } = await supabase.storage | |
| .from(bucket) | |
| .remove([file.name]); | |
| if (removeError) { | |
| console.error(`Failed to delete ${file.name}: ${removeError.message}`); | |
| continue; | |
| } | |
| console.log(`Deleted orphan image: ${file.name}`); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/supabase/functions/cleanup-orphan-images/index.ts` around lines 37 -
39, The storage removal call using
supabase.storage.from(bucket).remove([file.name]) lacks error handling; update
the code in the cleanup-orphan-images handler to capture the result/error
returned by .remove(), check if an error was returned, and handle it (e.g., log
a descriptive error including bucket and file.name via your logger/console,
optionally collect failed deletions for a summary or retry/backoff, and avoid
silently continuing as if deletion succeeded). Ensure you reference the .remove
call and file.name when logging so failures are traceable.
Summary
Adds a Supabase Edge Function to automatically clean up orphaned images from the
scan-imagesstorage bucket.Changes Made
cleanup-orphan-imagesSupabase Edge FunctionscanstableIssue
Fixes #5
SSoC'26
This contribution was made as part of SSoC'26.
Summary by CodeRabbit