feat: Implement storage aliases by janbuchar · Pull Request #3636 · apify/crawlee

janbuchar · 2026-05-07T15:38:55Z

Closes #3074 (Add support for non-default unnamed storages)

barjin

Thank you, @janbuchar! I have a few questions below ⬇️

barjin · 2026-05-11T05:43:34Z

+            // The default queue is an unnamed storage (opened with no identifier), so it has no name.
+            const isDefaultQueue = this.requestQueue?.name === undefined;


I'm not sure I agree with this, e.g., in WCC, we create unnamed queues to pass to the subcrawlers, see:

    const fileQueue = await RequestQueue.open();
    const mainCrawler = new Crawler(({ req, res }) => {
        if (res is file) await fileQueue.addRequest(req);
    });
    const fileCrawler = new Crawler({ rq: fileQueue, ... });

In this use-case, the RQ is supposed to, e.g., survive migration and keep the enqueued file links until these are processed by fileCrawler.

I guess my point is unnamed !== exclusively used by one crawler. Perhaps we could have a similar flag as with SessionPool?

crawlee/packages/basic-crawler/src/internals/basic-crawler.ts

Lines 575 to 578 in 712a5d1

    /**
     * Indicates whether the crawler owns the session pool (it was not passed from the outside using the `sessionPool` constructor option).
     */
    private ownsSessionPool: boolean;
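
To make the suggestion above concrete, here is a minimal sketch of an ownership flag for the request queue, modelled on `ownsSessionPool`. Everything in it is an assumption made for illustration: `QueueLike`, `InMemoryQueue`, `SketchCrawler`, `ownsRequestQueue` and `teardown` are hypothetical names, not crawlee APIs, and the methods are synchronous only to keep the example short. The point is that cleanup is decided by who created the queue, not by whether the queue has a name.

```typescript
// Hypothetical stand-ins for illustration only; these are not crawlee classes.
interface QueueLike {
    name?: string | undefined;
    drop(): void;
}

class InMemoryQueue implements QueueLike {
    dropped = false;

    constructor(public name?: string | undefined) {}

    drop(): void {
        this.dropped = true;
    }
}

class SketchCrawler {
    readonly requestQueue: QueueLike;

    // True only when the crawler created the queue itself
    // (it was not passed in through the constructor options).
    private readonly ownsRequestQueue: boolean;

    constructor(options: { requestQueue?: QueueLike } = {}) {
        this.ownsRequestQueue = options.requestQueue === undefined;
        this.requestQueue = options.requestQueue ?? new InMemoryQueue();
    }

    // Cleanup is based on ownership, not on `requestQueue.name === undefined`.
    teardown(): void {
        if (this.ownsRequestQueue) {
            this.requestQueue.drop();
        }
    }
}
```

With this shape, an unnamed queue that is shared between a main crawler and a file crawler is left alone by both, while a queue the crawler opened for itself is still cleaned up.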

barjin · 2026-05-11T05:52:19Z

+    /**
+     * Ensure that the same string is not used as both a name and an alias for the same
+     * storage class + backend combination. Mirrors crawlee-python's `_check_name_alias_conflict`.
+     */


Forgive me for being dense, but why isn't this supported? Two supporting arguments I can think of:

A) The legacy Dataset.open(idOrName: string) signature doesn't support alias, so it's unambiguous in that way. To use alias, the user has to be explicit, e.g., Dataset.open({ name: 'x' }) or Dataset.open({ alias: 'x' })

B) If this is because of the memory-storage implementation (both alias and name become folder names), then it's imo memory-storage implementation leaking into the abstraction, and we should change this. Perhaps add an indirection layer for aliases?

Is there another reason I'm not seeing?
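
For option B, one possible shape of such an indirection layer is sketched below. It assumes aliases are resolved to generated internal ids and stored under a separate namespace from names, so `{ name: 'x' }` and `{ alias: 'x' }` can never land in the same folder. `AliasIndex`, `directoryFor`, and the `names/` and `aliases/` prefixes are hypothetical and do not describe the memory-storage implementation in this PR.

```typescript
type StorageKind = 'Dataset' | 'KeyValueStore' | 'RequestQueue';

// Hypothetical indirection layer: maps (storage kind, alias) to a generated id.
class AliasIndex {
    private readonly aliasToId = new Map<string, string>();
    private nextId = 1;

    // Returns the id for an alias, generating one on first use.
    resolve(kind: StorageKind, alias: string): string {
        const key = `${kind}:${alias}`;
        let id = this.aliasToId.get(key);
        if (id === undefined) {
            id = `unnamed-${this.nextId++}`;
            this.aliasToId.set(key, id);
        }
        return id;
    }
}

// Named storages are keyed by their name, aliased storages by the generated id.
// The separate prefixes keep the two namespaces from colliding.
function directoryFor(
    index: AliasIndex,
    kind: StorageKind,
    options: { name: string } | { alias: string },
): string {
    return 'name' in options
        ? `names/${options.name}`
        : `aliases/${index.resolve(kind, options.alias)}`;
}
```

Under these assumptions the conflict check becomes unnecessary, at the cost of having to persist the alias index if aliased storages are expected to survive a restart.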

barjin · 2026-05-11T05:58:24Z

+            store.id === entryNameOrId ||
+            store.name?.toLowerCase() === entryNameOrId.toLowerCase() ||
+            store.directoryName.toLowerCase() === entryNameOrId.toLowerCase(),


nit: Given that directoryName is always populated, doesn't the last clause (directoryName) always contain the first two (id / name)?

directoryNameMatches => idMatches \/ nameMatches
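
Whether the first two clauses are redundant depends on how `directoryName` is derived, which the hunk above does not show. The sketch below assumes `directoryName` is `name ?? id`; `StoreEntry`, `makeEntry` and both predicates are hypothetical helpers written only to test the claim, not code from the PR.

```typescript
interface StoreEntry {
    id: string;
    name?: string | undefined;
    directoryName: string;
}

// Assumption for this sketch: the directory is the name when present, else the id.
function makeEntry(id: string, name?: string): StoreEntry {
    return { id, name, directoryName: name ?? id };
}

// The three-clause predicate from the diff.
function matchesAllClauses(store: StoreEntry, entryNameOrId: string): boolean {
    return (
        store.id === entryNameOrId ||
        store.name?.toLowerCase() === entryNameOrId.toLowerCase() ||
        store.directoryName.toLowerCase() === entryNameOrId.toLowerCase()
    );
}

// The simplified predicate the nit proposes.
function matchesDirectoryOnly(store: StoreEntry, entryNameOrId: string): boolean {
    return store.directoryName.toLowerCase() === entryNameOrId.toLowerCase();
}

const unnamedStore = makeEntry('abc123');
const namedStore = makeEntry('abc123', 'Products');

// Unnamed store: both predicates agree when looking up by id.
const unnamedAgrees =
    matchesAllClauses(unnamedStore, 'abc123') === matchesDirectoryOnly(unnamedStore, 'abc123');

// Named store: the name clause is covered by the directory clause...
const nameCovered = matchesDirectoryOnly(namedStore, 'products');

// ...but a lookup by id is only caught by the explicit id clause.
const idOnlyViaIdClause =
    matchesAllClauses(namedStore, 'abc123') && !matchesDirectoryOnly(namedStore, 'abc123');
```

Under that assumption the `name` clause is always covered by the `directoryName` clause, while the `id` clause still matters for named stores looked up by id. If `directoryName` is derived differently, the result may differ.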

janbuchar added 24 commits April 22, 2026 15:56

refactor!: Make the StorageManager more like StorageInstanceManager from crawlee-python

cdcc9ff

ServiceLocator.storageInstanceManager is class-level

02c89b7

Simplify StorageInstanceManager

afc367c

Fix type

e6191c3

Fix cache eviction (See apify/crawlee-python#1855)

f8ae6d5

Minor nits

72b58e9

Do not initialize RequestQueue repeatedly

fe46682

Handle missing storage client cache key better

481abe9

Clean up unused locks

f1106f0

Refactor to match crawlee-python better and avoid race conditions

4f69d63

Improve JSDoc wording

a238fb9

Improve interface of client creation methods

5a78768

Fix build

1ac08cf

Merge remote-tracking branch 'origin/v4' into refactor-storage-manager

40ccbd5

Move StorageOpenOptions out of the storage instance manager file

6116429

Clean up the storage_instance_manager file

7868f7d

Do not pass implicit configuration to storage frontend constructors

0b14900

Rename type variable

921dadd

refactor!: Trim the storage subclient interfaces

1151aca

Do not check for cached storage subclient instances

09cfd75

Remove unnecessary ServiceLocator.clearStorageManagerCache

673b586

Correctly check for pre-existing storages

ebac3ad

Merge remote-tracking branch 'origin/refactor-storage-manager' into simplify-storage-subclients

d998bc7

feat: Implement storage aliases

24a7d60

janbuchar added the t-tooling Issues with this label are in the ownership of the tooling team. label May 7, 2026

janbuchar requested review from B4nan and barjin May 7, 2026 15:39

janbuchar linked an issue May 7, 2026 that may be closed by this pull request

Add support for non-default unnamed storages #3074

Open

barjin reviewed May 11, 2026

janbuchar force-pushed the simplify-storage-subclients branch from d998bc7 to 1e349f5 Compare May 12, 2026 11:20

janbuchar wants to merge 24 commits into simplify-storage-subclients from implement-storage-aliases

3 participants
