feat: export SessionPool interface, minimize ISession, allow custom implementations #3644
barjin wants to merge 19 commits
…able Previously, calling retire() bumped errorScore to maxErrorScore but a subsequent markGood() (e.g. the automatic markGood after a successful requestHandler that explicitly retired the session) could decrement the score back below the threshold, making the session usable again. Track retirement in a dedicated _retired flag checked by isUsable() so retire() is a true terminal state.
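The terminal-state behavior described in this commit can be sketched roughly as follows. This is an illustration only, not the actual `Session` class: `SessionSketch`, the threshold of 3, and the decrement of 0.5 are assumptions. Only `retire()`, `markGood()`, `isUsable()`, `errorScore`, `maxErrorScore`, and the `_retired` flag are taken from the commit message.

```typescript
// Hypothetical, simplified stand-in for Session; not the real Crawlee class.
class SessionSketch {
    errorScore = 0;
    readonly maxErrorScore = 3; // assumed threshold
    // Assumed step by which markGood() lowers the score.
    private readonly errorScoreDecrement = 0.5;
    // Dedicated flag: once set, nothing in this class clears it.
    private _retired = false;

    markGood(): void {
        this.errorScore = Math.max(0, this.errorScore - this.errorScoreDecrement);
    }

    retire(): void {
        this._retired = true;
        this.errorScore = this.maxErrorScore;
    }

    isUsable(): boolean {
        return !this._retired && this.errorScore < this.maxErrorScore;
    }
}

const session = new SessionSketch();
session.retire();
session.markGood(); // the score drops back below maxErrorScore...
const usableAfterRetire = session.isUsable(); // ...but the flag keeps the session unusable
```

Without the `_retired` check in `isUsable()`, the `markGood()` call above would make the session usable again, which is the bug the commit describes.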
Replace the global EVENT_SESSION_RETIRED listener and the per-controller browserSessionIds map with a check at the per-request cleanup hook: if the session ended the request unusable, retire the browser controller. The previous mechanism tore down browsers eagerly mid-flight; the new one lets the in-flight request finish on the doomed browser and retires it once the request is done. Same outcome, no global event subscription needed.
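The per-request check described in this commit might look something like the sketch below. The names `SessionLike`, `BrowserControllerLike`, and `cleanupAfterRequest` are hypothetical and stand in for the real session, browser controller, and cleanup hook, whose actual signatures are not shown in this PR text.

```typescript
// Hypothetical minimal shapes; the real session and browser controller types are richer.
interface SessionLike {
    isUsable(): boolean;
}
interface BrowserControllerLike {
    retire(): void;
}

// Assumed to run once per request, after the request handler has finished.
function cleanupAfterRequest(session: SessionLike | undefined, controller: BrowserControllerLike): void {
    if (session && !session.isUsable()) {
        controller.retire();
    }
}

let retiredBrowsers = 0;
const controller: BrowserControllerLike = {
    retire: () => {
        retiredBrowsers += 1;
    },
};

cleanupAfterRequest({ isUsable: () => true }, controller); // healthy session: browser kept
cleanupAfterRequest({ isUsable: () => false }, controller); // unusable session: browser retired
```

Because the check runs only at cleanup time, the in-flight request is allowed to finish before the browser is retired.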
SessionPool no longer extends EventEmitter and no longer fires a sessionRetired event. The Session->SessionPool back-reference, the sessionPool constructor option on Session, and the EVENT_SESSION_RETIRED constant are gone with it. The only consumer of that event was the browser crawler, which now retires browsers via the per-request context pipeline cleanup. Custom createSessionFunction implementations that manually constructed Session instances should drop the sessionPool argument.
Define the minimal contract a session pool must satisfy to be usable as a crawler's session pool: getSession() / getSession(id) plus the optional resetStore() and teardown() lifecycle hooks. The built-in SessionPool declaratively implements it. Crawler-side type loosening follows in a separate commit.
Loosen BasicCrawlerOptions.sessionPool and the corresponding instance property from the concrete SessionPool class to the ISessionPool interface so users can plug in custom session-management strategies. The ow.instanceOf(SessionPool) validation is replaced with a duck-typed validators.sessionPool check, and the optional resetStore() / teardown() lifecycle hooks are now optional-chained so implementations that don't need them can simply omit them.
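As a rough illustration of what a duck-typed check means here, the sketch below accepts any object that exposes a `getSession` function, regardless of its class. The helper name `looksLikeSessionPool` and the `CustomPool` class are hypothetical; the actual `validators.sessionPool` implementation may check more or do so differently.

```typescript
// Hypothetical duck-typed check; the real validators.sessionPool may differ.
function looksLikeSessionPool(value: unknown): boolean {
    return (
        typeof value === 'object' &&
        value !== null &&
        typeof (value as { getSession?: unknown }).getSession === 'function'
    );
}

// A custom pool that is not an instance of the built-in SessionPool.
class CustomPool {
    async getSession(): Promise<{ id: string }> {
        return { id: 'session-1' };
    }
}

const acceptsCustomPool = looksLikeSessionPool(new CustomPool()); // has getSession(), so it passes
const rejectsPlainObject = looksLikeSessionPool({}); // no getSession(), so it fails
```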
Add a section to the v4 upgrade guide covering the new ISessionPool interface — what it requires, what stays optional, and a minimal working example of plugging a custom session pool into a crawler.
The optional resetStore() and teardown() hooks on ISessionPool could never actually fire on a custom pool: the crawler only calls them when ownsSessionPool is true, and the crawler can only own a pool it constructed itself — which is always the built-in SessionPool, never a custom ISessionPool. Drop them from the interface so the contract reflects what crawlers really require of a user-supplied pool: just getSession() / getSession(id). In BasicCrawler, replace the ownsSessionPool boolean with an ownedSessionPool?: SessionPool reference. It carries the concrete class so resetStore() / teardown() can be invoked on the internally constructed pool without the optional-chain dance, and its presence encodes ownership directly.
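The ownership pattern described above can be sketched as follows. `PoolLike`, `BuiltInPoolSketch`, and `CrawlerSketch` are hypothetical stand-ins for `ISessionPool`, `SessionPool`, and `BasicCrawler`; only the `ownedSessionPool` field name and the idea that its presence encodes ownership come from the commit message.

```typescript
// Hypothetical stand-ins for ISessionPool, SessionPool, and BasicCrawler.
interface PoolLike {
    getSession(): Promise<unknown>;
}

class BuiltInPoolSketch implements PoolLike {
    tornDown = false;
    async getSession(): Promise<unknown> {
        return {};
    }
    async teardown(): Promise<void> {
        this.tornDown = true;
    }
}

class CrawlerSketch {
    readonly sessionPool: PoolLike;
    // Set only when the crawler constructed the pool itself.
    private ownedSessionPool?: BuiltInPoolSketch;

    constructor(options: { sessionPool?: PoolLike } = {}) {
        if (options.sessionPool) {
            this.sessionPool = options.sessionPool;
        } else {
            this.ownedSessionPool = new BuiltInPoolSketch();
            this.sessionPool = this.ownedSessionPool;
        }
    }

    get ownsSessionPool(): boolean {
        return this.ownedSessionPool !== undefined;
    }

    async teardown(): Promise<void> {
        // The reference has the concrete type, so teardown() is known to exist.
        if (this.ownedSessionPool) {
            await this.ownedSessionPool.teardown();
        }
    }
}

const withDefaultPool = new CrawlerSketch();
const withCustomPool = new CrawlerSketch({ sessionPool: { getSession: async () => ({}) } });
```

A user-supplied pool is never torn down by the crawler in this sketch, so the interface does not need lifecycle hooks.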
The interface belongs in the shared-contracts package alongside the sibling ISession interface, not in @crawlee/core where it was put to dodge a circular dependency. Returning ISession instead of the concrete Session class is enough to support every accessor the crawlers reach for (markGood, markBad, retire, isUsable, cookieJar, proxyInfo, getCookies, setCookies, setCookiesFromResponse — all already on ISession), and lets the interface live in @crawlee/types without taking a dependency on core. Relax CrawlingContext.session, the few session: Session parameter annotations in BasicCrawler / HttpCrawler / BrowserCrawler, and the handleRequestTimeout helper to ISession. The concrete Session class still implements ISession, so SessionPool.getSession() — which physically returns a Session — is structurally assignable to the new ISessionPool contract, with no runtime change.
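A custom pool written against an interface of this shape might look like the sketch below. To keep the example self-contained, it declares local `SessionLike` and `SessionPoolLike` types instead of importing `ISession` and `ISessionPool`; the `id` field, the `createSession` helper, the round-robin strategy, and the choice to return `undefined` for a retired session are all assumptions made for illustration.

```typescript
// Hypothetical, reduced shapes; the real ISession has more members.
interface SessionLike {
    id: string; // assumed identifier field
    isUsable(): boolean;
    retire(): void;
}

// Mirrors the two getSession() overloads shown in this PR.
interface SessionPoolLike {
    getSession(): Promise<SessionLike>;
    getSession(sessionId: string): Promise<SessionLike | undefined>;
}

function createSession(id: string): SessionLike {
    let retired = false;
    return {
        id,
        isUsable: () => !retired,
        retire: () => {
            retired = true;
        },
    };
}

class RoundRobinPool implements SessionPoolLike {
    private sessions: SessionLike[];
    private cursor = 0;

    constructor(ids: string[]) {
        this.sessions = ids.map(createSession);
    }

    getSession(): Promise<SessionLike>;
    getSession(sessionId: string): Promise<SessionLike | undefined>;
    async getSession(sessionId?: string): Promise<SessionLike | undefined> {
        if (sessionId !== undefined) {
            // Design choice of this sketch: a retired session is not handed out by id.
            return this.sessions.find((s) => s.id === sessionId && s.isUsable());
        }
        const usable = this.sessions.filter((s) => s.isUsable());
        if (usable.length === 0) throw new Error('No usable sessions left');
        const session = usable[this.cursor % usable.length];
        this.cursor += 1;
        return session;
    }
}

const pool = new RoundRobinPool(['a', 'b']);
```

Because the pool returns an interface type rather than a concrete class, any object with the required members can serve as a session.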
Pull request overview
This PR introduces an exported ISessionPool contract so crawlers can accept custom session-pool implementations (beyond the built-in SessionPool), and updates crawler internals/types and upgrade docs accordingly.
Changes:
- Added `ISessionPool` interface to `@crawlee/types` and wired the built-in `SessionPool` to implement it.
- Updated crawler code to type sessions as `ISession` and to accept `sessionPool?: ISessionPool`, while only tearing down pools constructed by the crawler itself.
- Added validation for custom `sessionPool` objects and documented the change in the v4 upgrade guide.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/types/src/session.ts | Adds the new exported ISessionPool interface next to ISession. |
| packages/http-crawler/src/internals/http-crawler.ts | Switches internal session typing from Session to ISession. |
| packages/core/src/validators.ts | Adds an ISessionPool-style validator for the sessionPool option. |
| packages/core/src/session_pool/session_pool.ts | Declares built-in SessionPool implements ISessionPool. |
| packages/core/src/crawlers/crawler_utils.ts | Updates timeout helper to accept ISession instead of concrete Session. |
| packages/core/src/crawlers/crawler_commons.ts | Types RestrictedCrawlingContext.session as ISession. |
| packages/browser-crawler/src/internals/browser-crawler.ts | Switches internal session typing from Session to ISession. |
| packages/basic-crawler/src/internals/send-request.ts | Types session as ISession for the sendRequest helper. |
| packages/basic-crawler/src/internals/basic-crawler.ts | Accepts sessionPool?: ISessionPool, changes ownership/lifecycle handling, updates typings. |
| docs/upgrading/upgrading_v4.md | Documents custom pools and related session changes (currently with some issues). |
Comments suppressed due to low confidence (2)

docs/upgrading/upgrading_v4.md:171
- The example imports `ISessionPool` from `@crawlee/core`, but `@crawlee/core` does not currently re-export that type (core exports only a small subset of `@crawlee/types`). Either update the snippet to import `ISessionPool` from `@crawlee/types`, or re-export `ISessionPool` from `@crawlee/core` so the example compiles as written.
```typescript
import { BasicCrawler, Session, type ISessionPool } from '@crawlee/core';
```

docs/upgrading/upgrading_v4.md:201
- This paragraph says custom pools must return `Session` instances, but the new `ISessionPool` contract returns `ISession` and crawlers are now typed against `ISession`. Consider rewording to require an `ISession`-compatible object (or explicitly state that only the built-in `Session` is supported if that's still the intent) to avoid contradicting the interface and TypeScript types.
The returned objects must be Session instances — the rest of the crawler relies on session.markGood(), session.cookieJar, session.proxyInfo, and the rest of the concrete Session API.
    getSession(): Promise<ISession>;
    getSession(sessionId: string): Promise<ISession | undefined>;
I forgot how much of the functionality is still in ISession, which is not a bad thing, per se. I just checked it and I think there are a couple of questionable methods that we might want to consider removing (getState, isBlocked, retire?).
By the way, I'm super happy that the interface does not contain stuff like addSession 🙂
barjin left a comment
A few explanatory comments ⬇️
     * ```
     */
    setCookies(cookies: CookieObject[], url: string) {
        const normalizedCookies = cookies.map((c) => browserPoolCookieToToughCookie(c, this.maxAgeSecs));
The cookies-related methods have been removed from ISession, as these convert the cookies between different formats (Puppeteer's cookie format and the tough-cookie cookies).
This shouldn't be a concern of the Session class (the logic has been moved to the callers in browser-crawler, etc.).
Exports the `ISessionPool` interface, which can be used to build custom `SessionPool` implementations. Adds documentation.