feat: export SessionPool interface, minimize ISession, allow custom implementations #3644

Open
barjin wants to merge 19 commits into v4 from feat/session-pool-interface

barjin (Member) commented May 11, 2026

Exports the ISessionPool interface, which can be used to write custom session pool implementations. Adds documentation.

barjin added 9 commits May 11, 2026 14:22
…able

Previously, calling retire() bumped errorScore to maxErrorScore but a subsequent markGood() (e.g. the automatic markGood after a successful requestHandler that explicitly retired the session) could decrement the score back below the threshold, making the session usable again. Track retirement in a dedicated _retired flag checked by isUsable() so retire() is a true terminal state.
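
The failure mode described above can be sketched with a simplified, hypothetical session class. The class name, the score values, and the field names are assumptions made for illustration; this is not the actual Session implementation.

```typescript
// Simplified, hypothetical session used only to illustrate the fix;
// it is not the actual Crawlee Session class.
class SketchSession {
    private errorScore = 0;
    private retired = false; // plays the role of the _retired flag
    private readonly maxErrorScore = 3; // assumed value
    private readonly errorScoreDecrement = 0.5; // assumed value

    markGood(): void {
        // A good response lowers the score, but never below zero.
        this.errorScore = Math.max(0, this.errorScore - this.errorScoreDecrement);
    }

    retire(): void {
        // The flag, not the score, is what makes retirement permanent.
        this.retired = true;
        this.errorScore = this.maxErrorScore;
    }

    isUsable(): boolean {
        // Checking only the score would let markGood() revive a retired session.
        return !this.retired && this.errorScore < this.maxErrorScore;
    }
}

const retiredSession = new SketchSession();
retiredSession.retire();
retiredSession.markGood(); // e.g. the automatic markGood() after a successful requestHandler
console.log(retiredSession.isUsable()); // false
```

Without the flag, the markGood() call would drop the score to 2.5, below the assumed threshold of 3, and the session would be handed out again.
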
Replace the global EVENT_SESSION_RETIRED listener and the per-controller
browserSessionIds map with a check at the per-request cleanup hook: if
the session ended the request unusable, retire the browser controller.
The previous mechanism tore down browsers eagerly mid-flight; the new
one lets the in-flight request finish on the doomed browser and retires
it once the request is done. Same outcome, no global event subscription
needed.
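
A minimal sketch of the per-request check described above, assuming a hypothetical controller class and cleanup function (neither name comes from the PR):

```typescript
// Hypothetical shapes for illustration; not the actual Crawlee or browser-pool types.
interface SessionLike {
    isUsable(): boolean;
}

class SketchBrowserController {
    retired = false;

    retire(): void {
        // Assumed semantics: stop handing out new pages from this browser.
        this.retired = true;
    }
}

// Called once per request, after the request handler has finished,
// so the in-flight request is never interrupted.
function cleanupAfterRequest(session: SessionLike | undefined, controller: SketchBrowserController): void {
    if (session !== undefined && !session.isUsable()) {
        controller.retire();
    }
}

const healthy = new SketchBrowserController();
cleanupAfterRequest({ isUsable: () => true }, healthy);

const doomed = new SketchBrowserController();
cleanupAfterRequest({ isUsable: () => false }, doomed);

console.log(healthy.retired, doomed.retired); // false true
```
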
SessionPool no longer extends EventEmitter and no longer fires a
sessionRetired event. The Session->SessionPool back-reference, the
sessionPool constructor option on Session, and the EVENT_SESSION_RETIRED
constant are gone with it. The only consumer of that event was the
browser crawler, which now retires browsers via the per-request context
pipeline cleanup. Custom createSessionFunction implementations that
manually constructed Session instances should drop the sessionPool
argument.

Define the minimal contract a session pool must satisfy to be usable as
a crawler's session pool: getSession() / getSession(id) plus the optional
resetStore() and teardown() lifecycle hooks. The built-in SessionPool
declaratively implements it. Crawler-side type loosening follows in a
separate commit.

Loosen BasicCrawlerOptions.sessionPool and the corresponding instance
property from the concrete SessionPool class to the ISessionPool
interface so users can plug in custom session-management strategies.
The ow.instanceOf(SessionPool) validation is replaced with a duck-typed
validators.sessionPool check, and the optional resetStore() / teardown()
lifecycle hooks are now optional-chained so implementations that don't
need them can simply omit them.
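
For readers unfamiliar with duck-typed validation, the idea can be sketched as a type guard. The function below is a hypothetical stand-in written for this note; the real validators.sessionPool check may be implemented differently.

```typescript
// Hypothetical stand-in for the duck-typed check; not the actual validator.
interface SessionPoolLike {
    getSession: (...args: never[]) => unknown;
}

function isSessionPoolLike(value: unknown): value is SessionPoolLike {
    if (typeof value !== 'object' || value === null) return false;
    return typeof (value as { getSession?: unknown }).getSession === 'function';
}

// Any object with a getSession() method passes; no class inheritance is required.
const customPool = { getSession: async () => ({ id: 'custom-session' }) };

console.log(isSessionPoolLike(customPool)); // true
console.log(isSessionPoolLike({ sessions: [] })); // false
console.log(isSessionPoolLike(null)); // false
```

Unlike an instanceof check, this accepts objects that never extended the built-in class, which is the point of the loosening.
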
Add a section to the v4 upgrade guide covering the new ISessionPool
interface — what it requires, what stays optional, and a minimal
working example of plugging a custom session pool into a crawler.

The optional resetStore() and teardown() hooks on ISessionPool could
never actually fire on a custom pool: the crawler only calls them when
ownsSessionPool is true, and the crawler can only own a pool it
constructed itself — which is always the built-in SessionPool, never
a custom ISessionPool. Drop them from the interface so the contract
reflects what crawlers really require of a user-supplied pool: just
getSession() / getSession(id).
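
A custom pool that satisfies the two-overload contract might look like the sketch below. The interfaces are trimmed-down, hypothetical stand-ins for ISession and ISessionPool (the real ones have more members), and returning undefined for an unknown or retired id is an assumption.

```typescript
// Hypothetical, trimmed-down stand-ins for ISession and ISessionPool.
interface SketchSession {
    readonly id: string;
    isUsable(): boolean;
    retire(): void;
}

interface SketchSessionPool {
    getSession(): Promise<SketchSession>;
    getSession(sessionId: string): Promise<SketchSession | undefined>;
}

function createSession(id: string): SketchSession {
    let retired = false;
    return {
        id,
        isUsable: () => !retired,
        retire: () => { retired = true; },
    };
}

// A fixed set of sessions handed out in round-robin order.
class RoundRobinSessionPool implements SketchSessionPool {
    private readonly sessions: SketchSession[];
    private cursor = 0;

    constructor(ids: string[]) {
        this.sessions = ids.map(createSession);
    }

    getSession(): Promise<SketchSession>;
    getSession(sessionId: string): Promise<SketchSession | undefined>;
    async getSession(sessionId?: string): Promise<SketchSession | undefined> {
        const usable = this.sessions.filter((s) => s.isUsable());
        if (sessionId !== undefined) {
            // Assumption: unknown or retired ids resolve to undefined.
            return usable.find((s) => s.id === sessionId);
        }
        if (usable.length === 0) throw new Error('No usable sessions left');
        const picked = usable[this.cursor % usable.length];
        this.cursor += 1;
        return picked;
    }
}

const roundRobinPool: SketchSessionPool = new RoundRobinSessionPool(['a', 'b']);
```

Nothing else is needed for the assignment on the last line to type-check, which mirrors the claim that getSession() / getSession(id) is the whole contract.
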

In BasicCrawler, replace the ownsSessionPool boolean with an
ownedSessionPool?: SessionPool reference. It carries the concrete
class so resetStore() / teardown() can be invoked on the internally
constructed pool without the optional-chain dance, and its presence
encodes ownership directly.
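
The ownership pattern can be sketched as follows. All class and interface names here are hypothetical, and the pool is reduced to a single method plus teardown(); the real BasicCrawler does considerably more.

```typescript
// Hypothetical minimal contract and built-in pool, for illustration only.
interface PoolContract {
    getSession(): Promise<{ id: string }>;
}

class SketchBuiltInPool implements PoolContract {
    tornDown = false;

    async getSession(): Promise<{ id: string }> {
        return { id: 'built-in' };
    }

    async teardown(): Promise<void> {
        this.tornDown = true;
    }
}

class SketchCrawler {
    readonly sessionPool: PoolContract;
    // Set only when the crawler constructed the pool itself,
    // so its presence doubles as the ownership flag.
    private readonly ownedSessionPool?: SketchBuiltInPool;

    constructor(options: { sessionPool?: PoolContract } = {}) {
        if (options.sessionPool !== undefined) {
            this.sessionPool = options.sessionPool;
        } else {
            this.ownedSessionPool = new SketchBuiltInPool();
            this.sessionPool = this.ownedSessionPool;
        }
    }

    async teardown(): Promise<void> {
        // The concrete type is known here, so teardown() is called directly.
        if (this.ownedSessionPool !== undefined) {
            await this.ownedSessionPool.teardown();
        }
    }
}
```

A user-supplied pool is never torn down by the crawler in this sketch; its lifecycle stays with whoever created it.
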
The interface belongs in the shared-contracts package alongside the
sibling ISession interface, not in @crawlee/core where it was put
to dodge a circular dependency. Returning ISession instead of the
concrete Session class is enough to support every accessor the
crawlers reach for (markGood, markBad, retire, isUsable, cookieJar,
proxyInfo, getCookies, setCookies, setCookiesFromResponse — all
already on ISession), and lets the interface live in @crawlee/types
without taking a dependency on core.

Relax CrawlingContext.session, the few session: Session parameter
annotations in BasicCrawler / HttpCrawler / BrowserCrawler, and
the handleRequestTimeout helper to ISession. The concrete Session
class still implements ISession, so SessionPool.getSession() —
which physically returns a Session — is structurally assignable to
the new ISessionPool contract, with no runtime change.
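
The structural-assignability argument can be checked in isolation. The sketch below uses hypothetical, reduced interfaces with a single getSession() signature; it only demonstrates the TypeScript rule, not the actual Crawlee types.

```typescript
// Hypothetical, reduced stand-ins for ISession and ISessionPool.
interface MinimalSession {
    markGood(): void;
    isUsable(): boolean;
}

interface MinimalSessionPool {
    getSession(): Promise<MinimalSession>;
}

class ConcreteSession implements MinimalSession {
    private goodCount = 0;

    markGood(): void {
        this.goodCount += 1;
    }

    isUsable(): boolean {
        return true;
    }

    // Extra API that is not part of the minimal contract.
    getGoodCount(): number {
        return this.goodCount;
    }
}

// No `implements` clause: compatibility is checked structurally.
class ConcretePool {
    async getSession(): Promise<ConcreteSession> {
        return new ConcreteSession();
    }
}

// Compiles because Promise<ConcreteSession> is assignable to Promise<MinimalSession>.
const poolAsContract: MinimalSessionPool = new ConcretePool();
```

Callers typed against the contract see only markGood() and isUsable(); the object they receive at runtime is still a ConcreteSession.
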
Base automatically changed from refactor/remove-session-pool-events to v4 May 15, 2026 08:28
@barjin barjin marked this pull request as ready for review May 15, 2026 10:59
@barjin barjin requested a review from Copilot May 15, 2026 10:59
Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces an exported ISessionPool contract so crawlers can accept custom session-pool implementations (beyond the built-in SessionPool), and updates crawler internals/types and upgrade docs accordingly.

Changes:

  • Added ISessionPool interface to @crawlee/types and wired the built-in SessionPool to implement it.
  • Updated crawler code to type sessions as ISession and to accept sessionPool?: ISessionPool, while only tearing down pools constructed by the crawler itself.
  • Added validation for custom sessionPool objects and documented the change in the v4 upgrade guide.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| packages/types/src/session.ts | Adds the new exported ISessionPool interface next to ISession. |
| packages/http-crawler/src/internals/http-crawler.ts | Switches internal session typing from Session to ISession. |
| packages/core/src/validators.ts | Adds an ISessionPool-style validator for the sessionPool option. |
| packages/core/src/session_pool/session_pool.ts | Declares built-in SessionPool implements ISessionPool. |
| packages/core/src/crawlers/crawler_utils.ts | Updates timeout helper to accept ISession instead of concrete Session. |
| packages/core/src/crawlers/crawler_commons.ts | Types RestrictedCrawlingContext.session as ISession. |
| packages/browser-crawler/src/internals/browser-crawler.ts | Switches internal session typing from Session to ISession. |
| packages/basic-crawler/src/internals/send-request.ts | Types session as ISession for the sendRequest helper. |
| packages/basic-crawler/src/internals/basic-crawler.ts | Accepts sessionPool?: ISessionPool, changes ownership/lifecycle handling, updates typings. |
| docs/upgrading/upgrading_v4.md | Documents custom pools and related session changes (currently with some issues). |

Comments suppressed due to low confidence (2)

docs/upgrading/upgrading_v4.md:171

  • The example imports ISessionPool from @crawlee/core, but @crawlee/core does not currently re-export that type (core exports only a small subset of @crawlee/types). Either update the snippet to import ISessionPool from @crawlee/types, or re-export ISessionPool from @crawlee/core so the example compiles as written.
```typescript
import { BasicCrawler, Session, type ISessionPool } from '@crawlee/core';
```

docs/upgrading/upgrading_v4.md:201

  • This paragraph says custom pools must return Session instances, but the new ISessionPool contract returns ISession and crawlers are now typed against ISession. Consider rewording to require an ISession-compatible object (or explicitly state that only the built-in Session is supported if that's still the intent) to avoid contradicting the interface and TypeScript types.

The returned objects must be Session instances — the rest of the crawler relies on session.markGood(), session.cookieJar, session.proxyInfo, and the rest of the concrete Session API.

Comment thread packages/core/src/session_pool/session_pool.ts
Comment thread docs/upgrading/upgrading_v4.md Outdated
@janbuchar janbuchar self-requested a review May 15, 2026 14:36
Comment thread packages/types/src/session.ts Outdated
Comment on lines +207 to +208
    getSession(): Promise<ISession>;
    getSession(sessionId: string): Promise<ISession | undefined>;
Contributor

I forgot how much of the functionality is still in ISession, which is not a bad thing per se. I just checked it, and I think there are a couple of questionable methods that we might want to consider removing (getState, isBlocked, retire?).

By the way, I'm super happy that the interface does not contain stuff like addSession 🙂

barjin (Member, Author) left a comment

A few explanatory comments ⬇️

     * ```
     */
    setCookies(cookies: CookieObject[], url: string) {
        const normalizedCookies = cookies.map((c) => browserPoolCookieToToughCookie(c, this.maxAgeSecs));
Member Author

The cookies-related methods have been removed from ISession, as these convert the cookies between different formats (Puppeteer's cookie format and the tough-cookie cookies).

This shouldn't be a concern of the Session class (the logic has been moved to the callers in browser-crawler, etc.).
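
As a rough illustration of what "moving the conversion to the callers" can mean, the sketch below converts a Puppeteer-style cookie object into a plain stored shape before it reaches the session. Both interfaces and the toStoredCookie helper are hypothetical; the actual code converts to tough-cookie objects and may handle more fields.

```typescript
// Hypothetical shapes; the real code works with Puppeteer-style cookies and tough-cookie.
interface BrowserCookie {
    name: string;
    value: string;
    domain?: string;
    path?: string;
    expires?: number; // seconds since the Unix epoch; -1 marks a session cookie
}

interface StoredCookie {
    key: string;
    value: string;
    domain: string | undefined;
    path: string;
    expires: Date | 'session';
}

// Lives in the caller (e.g. a browser crawler), not in the session itself.
function toStoredCookie(cookie: BrowserCookie): StoredCookie {
    const expiresSecs = cookie.expires ?? -1;
    return {
        key: cookie.name,
        value: cookie.value,
        // Assumption: a leading dot on the domain is dropped during normalization.
        domain: cookie.domain === undefined ? undefined : cookie.domain.replace(/^\./, ''),
        path: cookie.path ?? '/',
        expires: expiresSecs < 0 ? 'session' : new Date(expiresSecs * 1000),
    };
}

const browserCookies: BrowserCookie[] = [
    { name: 'sid', value: 'abc', domain: '.example.com', expires: -1 },
];
const storedCookies = browserCookies.map(toStoredCookie);
console.log(storedCookies[0]?.expires); // session
```

With the conversion done at the call site, the session only ever sees one cookie format.
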

Comment thread packages/core/src/session_pool/session_pool.ts
Comment thread packages/types/src/session.ts
Comment thread packages/types/src/session.ts
@barjin barjin requested review from janbuchar and l2ysho May 19, 2026 07:56
@barjin barjin changed the title feat: export SessionPool interface, allow custom implementations feat: export SessionPool interface, minimize ISession allow custom implementations May 20, 2026
@barjin barjin changed the title feat: export SessionPool interface, minimize ISession allow custom implementations feat: export SessionPool interface, minimize ISession, allow custom implementations May 20, 2026
@barjin barjin self-assigned this May 20, 2026
4 participants