feat: Implement storage aliases #3636
janbuchar
commented
May 7, 2026
- closes Add support for non-default unnamed storages #3074
…rom crawlee-python
…implify-storage-subclients
barjin
left a comment
Thank you @janbuchar ! I have a few questions below ⬇️
// The default queue is an unnamed storage (opened with no identifier), so it has no name.
const isDefaultQueue = this.requestQueue?.name === undefined;
I'm not sure I agree with this, e.g., in WCC, we create unnamed queues to pass to the subcrawlers, see:
const fileQueue = await RequestQueue.open();
const mainCrawler = new Crawler(({ req, res }) => {
    if (res is file) await fileQueue.addRequest(req);
});
const fileCrawler = new Crawler({ rq: fileQueue, ... })

In this use-case, the RQ is supposed to, e.g., survive migration and keep the enqueued file links until these are processed by fileCrawler.
I guess my point is unnamed !== exclusively used by one crawler. Perhaps we could have a similar flag as with SessionPool?
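A minimal sketch of what such a flag could look like, assuming ownership is declared explicitly instead of being inferred from a missing name. All names here (`QueueLike`, `CrawlerOptionsSketch`, `ownsRequestQueue`, `isExclusiveQueue`) are hypothetical and are not part of Crawlee's API:

```typescript
// Hypothetical shapes for illustration only; these are not Crawlee's real types.
interface QueueLike {
    name?: string;
}

interface CrawlerOptionsSketch {
    requestQueue?: QueueLike;
    // Hypothetical flag: true means the crawler may treat the queue as its own.
    ownsRequestQueue?: boolean;
}

// Decide whether the crawler may treat the queue as exclusively its own.
function isExclusiveQueue(options: CrawlerOptionsSketch): boolean {
    // No queue passed in: the crawler would open its own, so it owns it.
    if (options.requestQueue === undefined) return true;
    // A queue was passed in: respect the explicit flag, defaulting to "shared".
    return options.ownsRequestQueue ?? false;
}

// An unnamed queue that is shared between two crawlers, as in the WCC example.
const sharedUnnamedQueue: QueueLike = {};

console.log(isExclusiveQueue({})); // true
console.log(isExclusiveQueue({ requestQueue: sharedUnnamedQueue })); // false
```

With this shape, an unnamed queue passed to a crawler is treated as shared unless the caller opts in, so the `name === undefined` check is no longer load-bearing.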
crawlee/packages/basic-crawler/src/internals/basic-crawler.ts
Lines 575 to 578 in 712a5d1
/**
 * Ensure that the same string is not used as both a name and an alias for the same
 * storage class + backend combination. Mirrors crawlee-python's `_check_name_alias_conflict`.
 */
Forgive me for being dense, but why isn't this supported? Two supporting arguments I can think of:
A) The legacy Dataset.open(idOrName: string) signature doesn't support alias, so it's unambiguous in that way. To use alias, the user has to be explicit, e.g., Dataset.open({ name: 'x' }) or Dataset.open({ alias: 'x' })
B) If this is because of the memory-storage implementation (both alias and name become folder names), then it's imo memory-storage implementation leaking into the abstraction, and we should change this. Perhaps add an indirection layer for aliases?
Is there another reason I'm not seeing?
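To make option B concrete, here is a rough sketch of an alias indirection layer. It assumes aliases are mapped to generated internal ids, so an alias never doubles as a folder name. `StorageKind`, `AliasRegistrySketch`, and the `alias-N` id scheme are all hypothetical and do not come from Crawlee or crawlee-python:

```typescript
// Hypothetical sketch; none of these names come from Crawlee or crawlee-python.
type StorageKind = 'dataset' | 'keyValueStore' | 'requestQueue';

class AliasRegistrySketch {
    private readonly aliasToId = new Map<string, string>();
    private nextId = 0;

    // Resolve an alias to a stable internal id, creating one on first use.
    resolve(kind: StorageKind, alias: string): string {
        const key = `${kind}:${alias}`;
        let id = this.aliasToId.get(key);
        if (id === undefined) {
            id = `alias-${this.nextId++}`;
            this.aliasToId.set(key, id);
        }
        return id;
    }
}

const registry = new AliasRegistrySketch();
const aliasFolder = registry.resolve('dataset', 'x'); // generated id, not 'x'
const namedFolder = 'x'; // assumption: a named storage keeps using its name

console.log(aliasFolder !== namedFolder); // true
console.log(registry.resolve('dataset', 'x') === aliasFolder); // true
```

A real implementation would also need to reserve the generated id prefix so that a user-chosen name cannot collide with it, and to persist the mapping if aliases have to survive a restart.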
store.id === entryNameOrId ||
store.name?.toLowerCase() === entryNameOrId.toLowerCase() ||
store.directoryName.toLowerCase() === entryNameOrId.toLowerCase(),
nit: Given that directoryName is always populated, doesn't the last clause (directoryName) always contain the first two (id / name)?
idMatches \/ nameMatches => directoryNameMatches
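Whether the first two clauses are redundant depends on how `directoryName` is derived, which the snippet does not show. The sketch below compares the three-clause predicate with a directory-only one, under the assumption that the directory is named after `name` when present and after `id` otherwise. `StoreSketch` and both helper functions are hypothetical:

```typescript
// Hypothetical store record; the derivation of directoryName is an assumption.
interface StoreSketch {
    id: string;
    name?: string;
    directoryName: string;
}

// The three-clause predicate from the diff.
function matchesFull(store: StoreSketch, entryNameOrId: string): boolean {
    const needle = entryNameOrId.toLowerCase();
    return (
        store.id === entryNameOrId ||
        store.name?.toLowerCase() === needle ||
        store.directoryName.toLowerCase() === needle
    );
}

// The simplified predicate the nit hints at.
function matchesDirectoryOnly(store: StoreSketch, entryNameOrId: string): boolean {
    return store.directoryName.toLowerCase() === entryNameOrId.toLowerCase();
}

// Assumed derivation: directoryName = name when present, otherwise id.
const named: StoreSketch = { id: 'abc123', name: 'Results', directoryName: 'Results' };
const unnamed: StoreSketch = { id: 'def456', directoryName: 'def456' };

console.log(matchesFull(named, 'abc123'), matchesDirectoryOnly(named, 'abc123')); // true false
console.log(matchesFull(unnamed, 'def456'), matchesDirectoryOnly(unnamed, 'def456')); // true true
```

Under that assumption the `name` clause is covered by the `directoryName` clause, but looking up a named store by its `id` is not, so only the middle clause could be dropped safely.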
d998bc7 to 1e349f5