Skip to content

Support sandboxes in collections#4613

Open
josephjclark wants to merge 17 commits intomainfrom
sandbox-collections
Open

Support sandboxes in collections#4613
josephjclark wants to merge 17 commits intomainfrom
sandbox-collections

Conversation

@josephjclark
Copy link
Copy Markdown
Collaborator

@josephjclark josephjclark commented Apr 10, 2026

Description

Sandboxes can't support projects that use collections because collection names are globally unique, and the v1 collections API has no way to disambiguate between a parent and its sandbox. This PR relaxes the uniqueness constraint to per-project and adds a v2 API that resolves collections by (project_id, name).

On the lifecycle side, empty collection records are now cloned into a sandbox when it's provisioned so that collections.get('my-collection') resolves correctly without copying production data. When a sandbox is merged back into its parent, collection names are synchronised (new collections in the sandbox are created empty in the parent, collections missing from the sandbox are removed from the parent). Collection data is never synced.

V1 continues to work for any project that doesn't have name collisions. When a v1 lookup finds the same name across multiple projects, it returns 409 Conflict pointing clients at v2. API version is selected via the x-api-version header (missing or "1" = v1, "2" = v2, anything else = 400 Bad Request). Header-based versioning was chosen in discussion with @stuartc and @josephjclark to keep a single collections base path.

Closes #3548

Validation steps

  1. Create a project with a collection called patients, add some items, then spawn a sandbox from it. The sandbox should have an empty patients collection.
  2. Add items to the sandbox's patients collection, then merge the sandbox. Parent project's patients collection should still only have the original items (no data sync).
  3. Create a new collection staging-only in a sandbox, then merge. The parent should now have an empty staging-only collection.
  4. Hit the v1 endpoint GET /collections/patients with a token that has access to both the parent and sandbox, should return 409. Repeat with x-api-version: 2 and GET /collections/:project_id/patients, should resolve correctly.
  5. Hit the v1 endpoint with x-api-version: 99, should return 400 Bad Request.
  6. On a project with only one collection of a given name, the v1 endpoint continues to work as before with no header.

Additional notes for the reviewer

AI Usage

  • I have used Claude Code
  • I have used another model
  • I have not used AI

Pre-submission checklist

  • I have performed an AI review of my code (we recommend using /review with Claude Code)
  • I have implemented and tested all related authorization policies.
  • I have updated the changelog.
  • I have ticked a box in "AI usage" in this PR

Replaces the global unique index on collections.name with a per-project
(project_id, name) index, allowing sandboxes to share collection names
with their parent. Updates the schema constraint to match and adds
conflict detection to get_collection/1, plus a new get_collection/2
that takes a project_id for unambiguous lookup.
Adds a :conflict clause to FallbackController so that when
get_collection/1 finds the same collection name in multiple projects,
all actions return a 409 with a message directing clients to use the v2 API.
Fixes the create_collection test to match the new constraint name
(collections_project_id_name_index). Adds tests for get_collection/1
returning :conflict when the same name exists in multiple projects, and
for get_collection/2 resolving unambiguously by project_id. Adds
controller tests asserting all v1 endpoints return 409 when a name
conflict exists.
Replaces the explicit collections routes with a single wildcard that
dispatches to v1 or v2 logic based on the X-API-VERSION header. V2
routes include project_id in the path, resolving collections
unambiguously without name-collision risk.

Also migrates the download endpoint to v2 (/:project_id/:name) and
updates the LiveView download link accordingly.
Covers all v2 endpoints (GET item, stream, PUT, POST, DELETE item,
DELETE all), including access control with personal and run tokens,
and the key scenario where v2 resolves correctly by project_id when
the same collection name exists in multiple projects.
Adds clone_collections_from_parent/2 to the sandbox provisioning
pipeline. When a sandbox is forked, empty collection records matching
each parent collection name are created in the new project. Items are
not copied. Uses on_conflict: :nothing so re-provisioning is safe.
Adds sync_collections/2, called after a successful merge. Collections
present in the sandbox but not the parent are created (empty) in the
parent. Collections present in the parent but not the sandbox are
deleted from the parent along with all their items. Collections in
both are left untouched. Data is never merged.
Extends the merge confirmation warning to explain that collection names
will be synchronized: new sandbox collections are created empty in the
target, and collections deleted in the sandbox are permanently removed
from the target including all their data.
@github-project-automation github-project-automation bot moved this to New Issues in Core Apr 10, 2026
The single dispatch function had a cyclomatic complexity of 14 (max 9).
Splitting by version brings each function to 7 branches.
@josephjclark
Copy link
Copy Markdown
Collaborator Author

Testing and QA notes:

  • The existing worker & collections adaptor should continue to work against the new lightning server, no changes needed
  • If you create two collections with the same id, and request that id through the legacy API, you should get an error
  • After updating the adaptor and worker, this branch should return the correct collection (even for duplicate collection names)

@josephjclark
Copy link
Copy Markdown
Collaborator Author

@elias-ba one thing I didn't think to check is access control. A project should not be able to access the collections of another project.

How can we enforce this?

Using the CLI you can send any access token you want and so long as the project is within that access token's scope, you're good.

In the worker, we send a JWT which I believe is scoped only to access for that workflow.

That must be sufficient right? We just look at the scope of the access token?

Address review feedback on the sandbox collections work:

- Add a plug that validates the x-api-version header early and returns 400
  for unsupported values instead of silently falling back to v1
- Consolidate v1/v2 controller actions through a shared resolve/1 helper,
  dropping ~100 lines of duplication
- Return 422 with a clear error on missing value/items in request body
  (previously raised FunctionClauseError)
- Move sync_collections into the Sandboxes context, wrap creates and
  deletes in a single transaction, and replace the N delete_collection
  calls with a single batch delete_all
- Scope the collection unique constraint error to :name instead of
  :project_id so the user-facing field reports the conflict
- Tone down the merge warning copy while keeping the essential info

Adds tests for unsupported/garbage x-api-version values, malformed
request bodies, and transaction rollback behavior for sync_collections.
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 96.70330% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.60%. Comparing base (4634d08) to head (2155e2f).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
lib/lightning_web/live/sandbox_live/index.ex 50.00% 2 Missing ⚠️
...ightning_web/controllers/collections_controller.ex 97.82% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4613      +/-   ##
==========================================
+ Coverage   89.57%   89.60%   +0.02%     
==========================================
  Files         444      445       +1     
  Lines       21505    21567      +62     
==========================================
+ Hits        19263    19325      +62     
  Misses       2242     2242              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@elias-ba
Copy link
Copy Markdown
Contributor

elias-ba commented Apr 13, 2026

Good question @josephjclark , and yes, the access token scope is sufficient. The controller already enforces it.

Every action calls authorize/2 which hits Lightning.Policies.Collections.authorize/3. That policy has two clauses:

  • For a user subject, it delegates to ProjectUsers.authorize(:access_project, user, ...) which checks membership against the collection's project
  • For a run subject, it directly compares Runs.get_project_id_for_run(run) == collection.project_id

So a CLI user with a PAT can only touch collections in projects they're a member of, and a worker's run token can only touch collections in the same project as the run it was issued for. There's a test that specifically covers the worker case ("with a run token, cannot access a different project's collection" returns 401) and another for the user case ("returns 401 when user does not have access to the project").

One subtle thing worth noting: v2 makes this tighter by design. With v1, a request with a stale or leaked token could at least look up a collection by name globally; v2 requires you to know (and be authorized for) the project_id upfront.

@elias-ba elias-ba marked this pull request as ready for review April 13, 2026 16:50
Copy link
Copy Markdown
Contributor

@elias-ba elias-ba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @josephjclark this is now looking good to me and it has all things we discussed this morning. Please do test / QA this extensively and let me know if you find anything weird

@github-actions
Copy link
Copy Markdown

Security Review

⚠️ Automated security review did not complete.

Claude hit the max-turns limit or encountered an error before posting findings.
A manual review of S0 (project-scoped data access), S1 (authorization policies),
and S2 (audit trail coverage) is recommended for this PR.

See the workflow run for details.

- Log a warning when collection sync fails after merge instead of
  silently ignoring the error
- Remove on_conflict: :nothing from insert_empty_collections since
  both callers guarantee no duplicates can exist
- Rename misleading test to match what it actually verifies
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: New Issues

Development

Successfully merging this pull request may close these issues.

sandboxes: support collections

2 participants