From 1141334d723b539faade110d3fcf9036709df02e Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 08:01:42 +0200
Subject: [PATCH 01/22] docs: add V2_PROGRESS.md — v2 refactor plan & source of truth
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Self-contained plan for the v2 rebuild: branch strategy, 12-commit sequence, hard rules, and baked-in reference data (enum old->new tables, deprecated-file list, connector import surface).

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 V2_PROGRESS.md | 184 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 184 insertions(+)
 create mode 100644 V2_PROGRESS.md

diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md
new file mode 100644
index 0000000..2d47dc4
--- /dev/null
+++ b/V2_PROGRESS.md
@@ -0,0 +1,184 @@
+# ADaaS SDK v2 Refactor — Progress & Plan (source of truth)
+
+> This file is the **single source of truth** for the v2 refactor. It is self-contained:
+> any session (or subagent) can read ONLY this file + the named git oracles and have
+> everything needed. Do NOT rely on chat history. Update the **Status** table after every commit.
+
+## TL;DR
+Rebuild the v2 branch cleanly from `main` as a sequence of small, single-purpose, reviewable
+commits. Mechanical/structural transforms first (Phase 1), polish + surface-defining work last
+(Phase 2). `npm run build` stays green every commit; the test suite + api report are intentionally
+**left broken** until the final steps. No npm publishing during the work.
+
+## Git facts
+- **Working branch:** `v2` (already hard-reset to `origin/main`).
+- **Base commit:** `origin/main` = `5b81ef2` (feat: Add new common error enums #204).
+- **Oracle (target shape):** `origin/v2` / tag `v2-old-backup` = `9202e47`. This is the PREVIOUS
+  v2 attempt — it already implemented the rename, deletions, adapter split, state split+envelope,
+  and emit-from-return, but bundled into huge unreviewable commits built on a stale base. **Use it
+  as a structural reference / oracle only. Never copy wholesale. Re-author cleanly.**
+- **Safety:** old work preserved at tag `v2-old-backup`. Force-push of `v2` is approved by Rado.
+
+## Hard rules (apply to EVERY Phase-1 commit)
+1. **`npm run build` must stay green.** Achieved by commit 1 adding a build tsconfig that excludes
+   `**/*.test.ts`. (ts-jest still transpiles tests independently, so tests still *run* — and will
+   fail on old names — that is expected and accepted.)
+2. **Never touch `*.test.ts` files or any api-extractor report** (`*.api.md`, `*.api.json`,
+   `latest.json`, backwards-compatibility fixtures) until Phase 2. Reviewer rejects any commit that does.
+3. **Do NOT rename DevRev backend API route strings.** Only SDK-owned identifiers/types/classes are
+   renamed Airdrop→AirSync. Route strings like `airdrop.sync-mapper-record.get-by-target` and any
+   `/internal/airdrop.*` endpoints stay verbatim (they are platform API, not SDK naming).
+4. **Every deletion must be grep-justified** (zero live references in SDK `src/` non-test + the 3
+   inspectable connectors). Record the justification in the commit body.
+5. Each commit is **single-purpose**. If a change belongs to a later commit, defer it.
+6. Keep `multithreading/` directory name. No logging/console changes. Both out of scope.
+
+## Commit sequence
+
+### Phase 1 — structural (review commit-by-commit)
+- **C0 — Package rename** `@devrev/ts-adaas` → `@devrev/airsync-sdk` (scoped, stays under @devrev).
+  Touch: `package.json` `name`; README references; api-extractor config (entry point / package name);
+  rename the report file `*/ts-adaas.api.md` → `airsync-sdk.api.md` IF trivial, else defer report to Phase 2.
+  Do NOT publish. Version → `2.0.0-beta.0` placeholder.
+- **C1 — Delete dead/deprecated code + add build tsconfig.**
+  - Delete `src/deprecated/**` (see list below) and its exports from `src/index.ts`.
+  - Delete `src/common/event-type-translation.ts` + `.test.ts` (the old↔new event-type shim).
+  - Delete other `@deprecated`-tagged symbols / provably-unused code (grep-justified).
+  - Add `tsconfig.build.json` (`include: ["src"]`, `exclude: ["**/*.test.ts","node_modules","dist"]`)
+    and point `build` script at it. This is the "build stays green" enabler.
+- **C2 — Airdrop→AirSync identifier rename.** SDK identifiers/types/classes/comments only.
+  NOT API route strings (rule 3). e.g. `AirdropEvent`→`AirSyncEvent`, `AirdropMessage`→`AirSyncMessage`
+  (verify exact target names against `origin/v2`). Provide back-compat type aliases ONLY if origin/v2 did.
+- **C3 — Delete deprecated enum members** (NOT a rename — main carries old+new side by side; drop old).
+  Leave only the new members. See enum tables below. Files: `src/types/extraction.ts`,
+  `src/types/loading.ts`, plus any `case`/reference cleanups in `control-protocol.ts`, `spawn.helpers.ts`, adapters.
+- **C4a — State split (structural only).** Introduce `BaseState` + `ExtractionState` + `LoadingState`.
+  KEEP the flat `AdapterState = ConnectorState & SdkState` shape (behavior identical).
+  Author fresh; origin/v2 `src/state/base-state.ts` etc. are structural reference only.
+- **C4b — State envelope + migration.** Change on-disk shape to `{ connectorState, sdkState }`.
+  Add migration shim: read legacy flat v1 blob → split SDK-owned keys into `sdkState` → persist envelope.
+  (origin/v2 `base-state.ts` has the reference impl incl. `V1_SDK_STATE_KEYS`.)
+- **C5 — Adapter split (structural only).** `BaseAdapter` + `ExtractionAdapter` + `LoadingAdapter`.
+  KEEP existing `emit`-based contract working (behavior identical). Author fresh intermediate form
+  (this exact form exists in NO branch — origin/v2's split already assumes emit-from-return).
+- **C6 — Emit-from-return contract.** `task`/`onTimeout` return a `TaskResult`
+  (`{ status: 'success'|'progress'|'delay'|'error', ... }`); the SDK maps status→phase event and emits
+  exactly once; `emit` removed from public surface. `processTask` → `processExtractionTask` +
+  `processLoadingTask`. Reference: origin/v2 `process-task.ts`, `base-adapter.ts` (mapping keys off
+  event_type/phase, NOT off state shape — so C4b and C6 are independent).
+
+### Phase 2 — closing / interactive (batched, done at the end)
+- **C7 — JSDoc pass.** Bar = `src/mappers/mappers.ts` style (class block: what+when; method block:
+  one-line what, "Used to/for…" usage, `@param` w/ type, `@returns`). Public surface + non-obvious
+  internals (state migration, emit-from-return mapping, attachment streaming pool). Fan out per module,
+  squash to one `docs:` commit.
+- **C8 — Regenerate api report** (`airsync-sdk.api.md`).
+- **C9 — Exposure audit (INTERACTIVE with Rado).** Read the regenerated report; decide per-symbol what
+  to keep public vs hide. Empirical floor = anything imported by the 3 connectors (list below).
+- **C10 — Fix tests + bw-compat baseline.** Update test files to new names/contract; decide re-baseline
+  vs remove the backwards-compatibility gate (v2 is an intentional break, so a v1-comparison gate is wrong).
+- **C11 — Migration deliverable.** Scan full `main..v2` diff → derive v1→v2 change catalog → build a
+  **dedicated `migrate-v2` skill in `adaas-sdk`** (`.claude/skills/migrate-v2/`), later ported to the
+  `connectors-codegen` repo (owns the `connector-dev` plugin). Mechanical changes auto-applied; semantic
+  (emit-from-return, state access) flagged for review; ambiguous → `MIGRATION_TODO.md`. Validate against
+  the 3 inspectable connectors. Skill philosophy mirrors existing `update-sdk` (autonomous + defer-on-ambiguity).
+
+## Orchestration model
+- Per Phase-1 commit, in the main session across multiple sittings:
+  1. **Implementer subagent** — does the one commit's work; obeys all Hard rules; build stays green.
+  2. **Reviewer subagent** (read-only) — verifies diff against that commit's contract + Hard rules
+     (esp. "no test/report files touched", "deletions grep-justified", "structure-only vs behavior-only").
+  3. Rado eyeballs → commit → next.
+- Mini-workflows for parallel sub-steps: deletion grep-verification (C1), JSDoc-by-module (C7),
+  exposure-by-symbol (C9).
+
+---
+
+## Reference data (baked in so future sessions don't re-derive)
+
+### `src/deprecated/**` files to delete (C1)
+```
+src/deprecated/adapter/index.ts
+src/deprecated/common/helpers.ts
+src/deprecated/demo-extractor/external_domain_metadata.json
+src/deprecated/demo-extractor/index.ts
+src/deprecated/http/client.ts
+src/deprecated/uploader/index.ts
+```
+Also delete `src/common/event-type-translation.ts` (+ `.test.ts`).
+`src/index.ts` on main exports these deprecated barrels — remove those export lines:
+`./deprecated/adapter`, `./deprecated/demo-extractor`, `./deprecated/http/client`, `./deprecated/uploader`,
+and the `formatAxiosError` export (origin/v2 dropped it — confirm against connector usage; azure-boards imports it, so this is a migration note).
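Hard rule 4 requires each of these deletions to be grep-justified. The following is a minimal sketch of the `src/` half of that check. It assumes the sources have already been read into a path-to-contents map. `findLiveReferences` and `SourceTree` are hypothetical helpers, not part of the SDK or its scripts, and the regex over import specifiers stands in for a real grep or TypeScript module resolution:

```typescript
// Repo-relative file path -> file contents (hypothetical input shape).
type SourceTree = Record<string, string>;

interface LiveReference {
  file: string; // the file that still imports a deleted module
  specifier: string; // the import specifier as written in that file
}

// Specifier patterns for the C1 deletions: anything under deprecated/, plus the event-type shim.
const DELETED_MODULES: RegExp[] = [
  /(^|\/)deprecated(\/|$)/,
  /(^|\/)event-type-translation$/,
];

// Files that do not count as live references: tests and the to-be-deleted files themselves.
function isExcluded(path: string): boolean {
  return (
    path.endsWith('.test.ts') ||
    path.startsWith('src/deprecated/') ||
    path === 'src/common/event-type-translation.ts'
  );
}

// Returns every remaining non-test file whose `import ... from` / `export ... from`
// specifier still points at a deleted module.
export function findLiveReferences(tree: SourceTree): LiveReference[] {
  const hits: LiveReference[] = [];
  for (const file of Object.keys(tree)) {
    if (isExcluded(file)) continue;
    // Fresh /g regex per file so lastIndex state never leaks between files.
    const importSpecifier = /(?:from|import)\s+['"]([^'"]+)['"]/g;
    let match: RegExpExecArray | null;
    while ((match = importSpecifier.exec(tree[file])) !== null) {
      const specifier = match[1];
      if (DELETED_MODULES.some((pattern) => pattern.test(specifier))) {
        hits.push({ file, specifier });
      }
    }
  }
  return hits;
}
```

An empty result, apart from the `src/index.ts` barrel lines that this commit removes anyway, is the justification to record in the commit body. The connector half of the rule needs a symbol search instead, because connectors import from the package name rather than from these paths.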
+
+### C3 — EventType (incoming): DELETE these deprecated members, keep the new ones
+| DELETE (old member = old VALUE) | KEEP (new member = new VALUE) |
+|--------------------------------------------------------------|--------------------------------------------------------------------------|
+| ExtractionExternalSyncUnitsStart = EXTRACTION_EXTERNAL_SYNC_UNITS_START | StartExtractingExternalSyncUnits = START_EXTRACTING_EXTERNAL_SYNC_UNITS |
+| ExtractionMetadataStart = EXTRACTION_METADATA_START | StartExtractingMetadata = START_EXTRACTING_METADATA |
+| ExtractionDataStart = EXTRACTION_DATA_START | StartExtractingData = START_EXTRACTING_DATA |
+| ExtractionDataContinue = EXTRACTION_DATA_CONTINUE | ContinueExtractingData = CONTINUE_EXTRACTING_DATA |
+| ExtractionDataDelete = EXTRACTION_DATA_DELETE | StartDeletingExtractorState = START_DELETING_EXTRACTOR_STATE |
+| ExtractionAttachmentsStart = EXTRACTION_ATTACHMENTS_START | StartExtractingAttachments = START_EXTRACTING_ATTACHMENTS |
+| ExtractionAttachmentsContinue = EXTRACTION_ATTACHMENTS_CONTINUE | ContinueExtractingAttachments = CONTINUE_EXTRACTING_ATTACHMENTS |
+| ExtractionAttachmentsDelete = EXTRACTION_ATTACHMENTS_DELETE | StartDeletingExtractorAttachmentsState = START_DELETING_EXTRACTOR_ATTACHMENTS_STATE |
+
+Loading members (StartLoadingData…StartDeletingLoaderAttachmentState) + UnknownEventType are unchanged.
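Each row in this table is a one-to-one member rename, so it fits the "mechanical changes auto-applied" bucket of C11. The following is a minimal sketch of that rewrite, assuming plain source text as input. `EVENT_TYPE_RENAMES` and `rewriteEventTypeMembers` are hypothetical names. A production codemod would more likely walk the TypeScript AST than use a regex:

```typescript
// Deprecated incoming EventType member -> replacement member, per the C3 EventType table.
export const EVENT_TYPE_RENAMES = new Map<string, string>([
  ['ExtractionExternalSyncUnitsStart', 'StartExtractingExternalSyncUnits'],
  ['ExtractionMetadataStart', 'StartExtractingMetadata'],
  ['ExtractionDataStart', 'StartExtractingData'],
  ['ExtractionDataContinue', 'ContinueExtractingData'],
  ['ExtractionDataDelete', 'StartDeletingExtractorState'],
  ['ExtractionAttachmentsStart', 'StartExtractingAttachments'],
  ['ExtractionAttachmentsContinue', 'ContinueExtractingAttachments'],
  ['ExtractionAttachmentsDelete', 'StartDeletingExtractorAttachmentsState'],
]);

// Rewrites `EventType.<DeprecatedMember>` to the new member name and leaves other members as-is.
// The leading \b means `ExtractorEventType.` and `LoaderEventType.` accesses never match.
export function rewriteEventTypeMembers(source: string): string {
  return source.replace(/\bEventType\.(\w+)/g, (whole: string, member: string) => {
    const replacement = EVENT_TYPE_RENAMES.get(member);
    return replacement === undefined ? whole : `EventType.${replacement}`;
  });
}
```

Only member accesses are rewritten. Raw string values such as `EXTRACTION_DATA_START` in connector code are left untouched and would need review.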
+
+### C3 — ExtractorEventType (outgoing): DELETE deprecated, keep new
+| DELETE (old) | KEEP (new) |
+|---------------------------------------|-----------------------------------------|
+| ExtractionExternalSyncUnitsDone | ExternalSyncUnitExtractionDone |
+| ExtractionExternalSyncUnitsError | ExternalSyncUnitExtractionError |
+| ExtractionMetadataDone | MetadataExtractionDone |
+| ExtractionMetadataError | MetadataExtractionError |
+| ExtractionDataProgress | DataExtractionProgress |
+| ExtractionDataDelay | DataExtractionDelayed |
+| ExtractionDataDone | DataExtractionDone |
+| ExtractionDataError | DataExtractionError |
+| ExtractionDataDeleteDone | ExtractorStateDeletionDone |
+| ExtractionDataDeleteError | ExtractorStateDeletionError |
+| ExtractionAttachmentsProgress | AttachmentExtractionProgress |
+| ExtractionAttachmentsDelay | AttachmentExtractionDelayed |
+| ExtractionAttachmentsDone | AttachmentExtractionDone |
+| ExtractionAttachmentsError | AttachmentExtractionError |
+| ExtractionAttachmentsDeleteDone | ExtractorAttachmentsStateDeletionDone |
+| ExtractionAttachmentsDeleteError | ExtractorAttachmentsStateDeletionError |
+
+(values for new members are the `*_EXTRACTION_*` / `*_DELETION_*` strings — see origin/v2 extraction.ts.)
+
+### C3 — LoaderEventType: DELETE deprecated typo/plural members
+DELETE: `DataLoadingDelay` (typo), `AttachmentsLoadingProgress/Delayed/Done/Error` (the plural-typo dupes).
+KEEP: `DataLoadingProgress, DataLoadingDelayed, DataLoadingDone, DataLoadingError,
+AttachmentLoadingProgress/Delayed/Done/Error, LoaderStateDeletionDone/Error,
+LoaderAttachmentStateDeletionDone/Error, UnknownEventType`.
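C6 moves the choice of outgoing event from the connector into the SDK, keyed off the incoming event type rather than the state shape. The following is a minimal sketch of that mapping for the two extraction phases that have progress and delay events. It uses the KEEP member names from the ExtractorEventType table as plain strings, because the real enum values live in origin/v2 `extraction.ts`. `TaskStatus`, `Phase`, and `eventForResult` are hypothetical names, not the SDK's API:

```typescript
// Hypothetical shapes; the real TaskResult carries more than a status.
type TaskStatus = 'success' | 'progress' | 'delay' | 'error';
type Phase = 'data' | 'attachments';

// Incoming EventType members (new names) that start or continue each phase.
const PHASE_BY_EVENT_TYPE = new Map<string, Phase>([
  ['StartExtractingData', 'data'],
  ['ContinueExtractingData', 'data'],
  ['StartExtractingAttachments', 'attachments'],
  ['ContinueExtractingAttachments', 'attachments'],
]);

// Outgoing ExtractorEventType member names (KEEP column), per phase and returned status.
const OUTGOING: Record<Phase, Record<TaskStatus, string>> = {
  data: {
    success: 'DataExtractionDone',
    progress: 'DataExtractionProgress',
    delay: 'DataExtractionDelayed',
    error: 'DataExtractionError',
  },
  attachments: {
    success: 'AttachmentExtractionDone',
    progress: 'AttachmentExtractionProgress',
    delay: 'AttachmentExtractionDelayed',
    error: 'AttachmentExtractionError',
  },
};

// Picks the single event to emit from the incoming event type (phase) and the returned status.
export function eventForResult(incomingEventType: string, status: TaskStatus): string {
  const phase = PHASE_BY_EVENT_TYPE.get(incomingEventType);
  if (phase === undefined) {
    throw new Error(`No progress/delay-capable extraction phase for ${incomingEventType}`);
  }
  return OUTGOING[phase][status];
}
```

Phases that only have Done/Error members, such as external sync units, metadata, and state deletion, would need their own smaller table under this approach.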
+
+### Connector import surface (empirical floor for C9 exposure audit + C11 migration)
+Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors:
+- **asana-internal:** AirSyncDefaultItemTypes, AirdropEvent, ErrorRecord, EventType, ExternalDomainMetadata,
+  ExternalSyncUnit, ExtractorEventType, LoaderEventType, NormalizedAttachment, NormalizedItem, RepoInterface,
+  SyncMode, WorkerAdapter, axios, processTask, spawn
+- **azure-boards:** AirSyncDefaultItemTypes, AirdropEvent, AirdropMessage, EventType, ExternalSyncUnit,
+  ExternalSystemAttachment, ExternalSystemItem, ExternalSystemItemLoadingParams, ExtractorEventType,
+  LoaderEventType, NormalizedAttachment, NormalizedItem, SyncMode, WorkerAdapter, formatAxiosError,
+  installInitialDomainMapping, processTask, spawn
+- **google-drive:** AirdropEvent, EventType, ExternalSystemAttachmentStreamingParams, ExtractorEventType,
+  NormalizedAttachment, NormalizedItem, SyncMode, WorkerAdapter, processTask, spawn, axios, axiosClient
+
+**Migration-relevant removals these connectors will hit:** `WorkerAdapter` (removed → use `processExtractionTask`/`processLoadingTask`
+plus the return-based contract), `processTask` (split), `formatAxiosError` (dropped from index), `AirdropEvent`/`AirdropMessage`
+(renamed `AirSync*`), all old `EXTRACTION_*` enum members (deleted).
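For C11, these lists can be sorted mechanically into the skill's three buckets: auto-applied, flagged for review, and `MIGRATION_TODO.md`. The following is a small sketch of that triage. The bucket assigned to each symbol is an assumption. The plan only fixes emit-from-return as semantic, and `formatAxiosError` is placed in the ambiguous bucket because no replacement is named. `V1_REMOVALS` and `triage` are hypothetical names:

```typescript
// Buckets used by the C11 migrate-v2 skill: auto-apply, flag for review, or MIGRATION_TODO.md.
type Category = 'mechanical' | 'semantic' | 'ambiguous';

// v1 exports imported by the three connectors that v2 removes or renames.
// Category assignments are illustrative assumptions.
const V1_REMOVALS = new Map<string, Category>([
  ['AirdropEvent', 'mechanical'], // renamed AirSync* (exact target to verify against origin/v2)
  ['AirdropMessage', 'mechanical'],
  ['WorkerAdapter', 'semantic'], // emit-based contract replaced by a returned TaskResult
  ['processTask', 'semantic'], // split into processExtractionTask / processLoadingTask
  ['formatAxiosError', 'ambiguous'], // dropped from the package index, no drop-in named
]);

// Groups one connector's imported symbols by migration category; unaffected symbols are skipped.
export function triage(imports: readonly string[]): Record<Category, string[]> {
  const buckets: Record<Category, string[]> = { mechanical: [], semantic: [], ambiguous: [] };
  for (const symbol of imports) {
    const category = V1_REMOVALS.get(symbol);
    if (category !== undefined) {
      buckets[category].push(symbol);
    }
  }
  return buckets;
}
```

Enum-member deletions do not show up in import lists, because connectors import `EventType` itself. Those deletions need a member-level scan of connector source.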
+
+## Status
+| Commit | State | Notes |
+|--------|-------|-------|
+| C0 package rename | ☐ todo | |
+| C1 delete + tsconfig | ☐ todo | |
+| C2 AirSync rename | ☐ todo | |
+| C3 enum cleanup | ☐ todo | |
+| C4a state split | ☐ todo | |
+| C4b state envelope | ☐ todo | |
+| C5 adapter split | ☐ todo | |
+| C6 emit-from-return | ☐ todo | |
+| C7 JSDoc | ☐ todo | Phase 2 |
+| C8 api report | ☐ todo | Phase 2 |
+| C9 exposure audit | ☐ todo | Phase 2, interactive |
+| C10 tests + baseline | ☐ todo | Phase 2 |
+| C11 migrate-v2 skill | ☐ todo | Phase 2 |

From 8ddeb87a3bfa63425dd6b89a1efef9a0c188ad2b Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 08:03:59 +0200
Subject: [PATCH 02/22] refactor(v2): rename package @devrev/ts-adaas -> @devrev/airsync-sdk

Package identity change for the v2 AirSync rebrand. Scoped to package identity only:
- package.json name, version (2.0.0-beta.0), description
- package-lock.json (regenerated)
- README install command
- release workflow references

Deferred to later commits: README Airdrop->AirSync prose (C2), the api-extractor report filename + config under backwards-compatibility (C8, test/report territory).

Ref: V2_PROGRESS.md C0

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .github/workflows/release.yaml | 4 ++--
 README.md                      | 2 +-
 package-lock.json              | 8 ++++----
 package.json                   | 6 +++---
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml
index 3145273..13d7658 100644
--- a/.github/workflows/release.yaml
+++ b/.github/workflows/release.yaml
@@ -1,4 +1,4 @@
-# This workflow is used to release a new version of @devrev/ts-adaas package to
+# This workflow is used to release a new version of @devrev/airsync-sdk package to
 # npm registry and generate release notes using softprops/action-gh-release
 # action. It consists of two jobs:
 #
@@ -93,7 +93,7 @@ jobs:
         id: version
         run: |
           # Get the latest version including prereleases
-          LATEST_VERSION=$(npm view @devrev/ts-adaas versions --json 2>/dev/null | jq -r '.[-1]' || echo "0.0.0")
+          LATEST_VERSION=$(npm view @devrev/airsync-sdk versions --json 2>/dev/null | jq -r '.[-1]' || echo "0.0.0")
           echo "Latest published version: $LATEST_VERSION"
           echo "LATEST_PUBLISHED_VERSION=$LATEST_VERSION" >> $GITHUB_ENV

diff --git a/README.md b/README.md
index 11f7b2c..e2fc2c9 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ It provides features such as:
 ## Installation
 
 ```bash
-npm install @devrev/ts-adaas
+npm install @devrev/airsync-sdk
 ```
 
 ## Reference
diff --git a/package-lock.json b/package-lock.json
index 9d56f7e..d0038c2 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,12 +1,12 @@
 {
-  "name": "@devrev/ts-adaas",
-  "version": "1.19.7",
+  "name": "@devrev/airsync-sdk",
+  "version": "2.0.0-beta.0",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
-      "name": "@devrev/ts-adaas",
-      "version": "1.19.7",
+      "name": "@devrev/airsync-sdk",
+      "version": "2.0.0-beta.0",
       "license": "ISC",
       "dependencies": {
         "@devrev/typescript-sdk": "^1.1.74",
diff --git a/package.json b/package.json
index 27d1e7f..0310fb5 100644
--- a/package.json
+++ b/package.json
@@ -1,7 +1,7 @@
 {
-  "name": "@devrev/ts-adaas",
-  "version": "1.19.7",
-  "description": "Typescript library containing the ADaaS(AirDrop as a Service) control protocol.",
+  "name": "@devrev/airsync-sdk",
+  "version": "2.0.0-beta.0",
+  "description": "Typescript SDK for building AirSync snap-ins on the DevRev platform.",
   "type": "commonjs",
   "main": "./dist/index.js",
   "typings": "./dist/index.d.ts",

From 0760f0b99e7a9ab64fe6cd8c05af7e29891ad39a Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 08:04:18 +0200
Subject: [PATCH 03/22] docs: mark C0 done in V2_PROGRESS

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 V2_PROGRESS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md
index 2d47dc4..e9df782 100644
--- a/V2_PROGRESS.md
+++ b/V2_PROGRESS.md
@@ -169,7 +169,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors:
 ## Status
 | Commit | State | Notes |
 |--------|-------|-------|
-| C0 package rename | ☐ todo | |
+| C0 package rename | ☑ done | 8ddeb87. @devrev/ts-adaas→@devrev/airsync-sdk, v2.0.0-beta.0. Report filename rename deferred to C8. |
 | C1 delete + tsconfig | ☐ todo | |
 | C2 AirSync rename | ☐ todo | |
 | C3 enum cleanup | ☐ todo | |

From d573cb6a1abff762b78b85c1647b8af99493431e Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 08:11:04 +0200
Subject: [PATCH 04/22] refactor(v2)!: delete deprecated module + add build tsconfig

Remove the entire src/deprecated/ tree (Adapter, DemoExtractor, HTTPClient, deprecated Uploader, demo metadata) and its four barrel re-exports from src/index.ts. None of it was referenced by production code or by any of the inspectable connectors (zero live references).

Add tsconfig.build.json that extends tsconfig.json and excludes test files (**/*.test.ts, src/tests). Point the 'build' script at it so 'npm run build' compiles only shippable source. This keeps the build green across subsequent v2 commits even while the test suite (run separately via ts-jest) lags behind the rename/contract changes until it is fixed in Phase 2.

BREAKING CHANGE: removed deprecated exports Adapter, DemoExtractor, HTTPClient, and the legacy Uploader from the public API.
Ref: V2_PROGRESS.md C1 Co-Authored-By: Claude Opus 4.8 (1M context) --- package.json | 2 +- src/deprecated/adapter/index.ts | 209 ---------------- src/deprecated/common/helpers.ts | 55 ---- .../external_domain_metadata.json | 38 --- src/deprecated/demo-extractor/index.ts | 235 ------------------ src/deprecated/http/client.ts | 149 ----------- src/deprecated/uploader/index.ts | 156 ------------ src/index.ts | 4 - tsconfig.build.json | 4 + 9 files changed, 5 insertions(+), 847 deletions(-) delete mode 100644 src/deprecated/adapter/index.ts delete mode 100644 src/deprecated/common/helpers.ts delete mode 100644 src/deprecated/demo-extractor/external_domain_metadata.json delete mode 100644 src/deprecated/demo-extractor/index.ts delete mode 100644 src/deprecated/http/client.ts delete mode 100644 src/deprecated/uploader/index.ts create mode 100644 tsconfig.build.json diff --git a/package.json b/package.json index 0310fb5..6858c35 100644 --- a/package.json +++ b/package.json @@ -6,7 +6,7 @@ "main": "./dist/index.js", "typings": "./dist/index.d.ts", "scripts": { - "build": "tsc -p ./tsconfig.json", + "build": "tsc -p ./tsconfig.build.json", "prepare": "npm run build", "start": "ts-node src/index.ts", "lint": "eslint .", diff --git a/src/deprecated/adapter/index.ts b/src/deprecated/adapter/index.ts deleted file mode 100644 index 6954768..0000000 --- a/src/deprecated/adapter/index.ts +++ /dev/null @@ -1,209 +0,0 @@ -import axios from 'axios'; - -import { - AirdropEvent, - EventData, - ExtractorEvent, - ExtractorEventType, -} from '../../types/extraction'; -import { Artifact } from '../../uploader/uploader.interfaces'; - -import { AdapterState } from '../../state/state.interfaces'; - -import { STATELESS_EVENT_TYPES } from '../../common/constants'; -import { getTimeoutExtractorEventType } from '../common/helpers'; -// import { Logger } from '../../logger/logger'; -import { State, createAdapterState } from '../../state/state'; -import { translateIncomingEventType } from 
'../../common/event-type-translation'; -import { runWithSdkLogContext } from '../../logger/logger.context'; - -/** - * Adapter class is used to interact with Airdrop platform. The class provides - * utilities to - * - emit control events to the platform - * - update the state of the extractor - * - set the last saved state in case of a timeout - * - * @class Adapter - * @constructor - * @deprecated - * @param {AirdropEvent} event - The event object received from the platform - * @param {object=} initialState - The initial state of the adapter - * @param {boolean=} isLocalDevelopment - A flag to indicate if the adapter is being used in local development - */ - -/** - * Creates an adapter instance. - * - * @param {AirdropEvent} event - The event object received from the platform - * @param initialState - * @param {boolean=} isLocalDevelopment - A flag to indicate if the adapter is being used in local development - * @return The adapter instance - */ - -export async function createAdapter( - event: AirdropEvent, - initialState: ConnectorState, - isLocalDevelopment: boolean = false -) { - event.payload.event_type = translateIncomingEventType(event.payload.event_type); - - const newInitialState = structuredClone(initialState); - const adapterState: State = await createAdapterState({ - event, - initialState: newInitialState, - }); - - const a = new Adapter( - event, - adapterState, - isLocalDevelopment - ); - - return a; -} - -export class Adapter { - private adapterState: State; - private _artifacts: Artifact[]; - - private event: AirdropEvent; - private callbackUrl: string; - private devrevToken: string; - private startTime: number; - private heartBeatFn: ReturnType | undefined; - private exit: boolean = false; - private lambdaTimeout: number = 10 * 60 * 1000; // 10 minutes in milliseconds - private heartBeatInterval: number = 30 * 1000; // 30 seconds in milliseconds - - constructor( - event: AirdropEvent, - adapterState: State, - isLocalDevelopment: boolean = false - 
) { - if (!isLocalDevelopment) { - // Logger.init(event); - } - - this.adapterState = adapterState; - this._artifacts = []; - - this.event = event; - this.callbackUrl = event.payload.event_context.callback_url; - this.devrevToken = event.context.secrets.service_account_token; - - this.startTime = Date.now(); - - // Run heartbeat every 30 seconds - this.heartBeatFn = setInterval(async () => { - const b = await this.heartbeat(); - if (b) { - this.exitAdapter(); - } - }, this.heartBeatInterval); - } - - get state(): AdapterState { - return this.adapterState.state; - } - - set state(value: AdapterState) { - this.adapterState.state = value; - } - - get artifacts(): Artifact[] { - return this._artifacts; - } - - set artifacts(value: Artifact[]) { - this._artifacts = value; - } - - /** - * Emits an event to the platform. - * - * @param {ExtractorEventType} newEventType - The event type to be emitted - * @param {EventData=} data - The data to be sent with the event - */ - async emit(newEventType: ExtractorEventType, data?: EventData) { - if (this.exit) { - console.warn( - 'Adapter is already in exit state. No more events can be emitted.' 
- ); - return; - } - - // We want to save the state every time we emit an event, except for the start and delete events - if (!STATELESS_EVENT_TYPES.includes(this.event.payload.event_type)) { - runWithSdkLogContext(() => - console.log(`Saving state before emitting event`) - ); - await this.adapterState.postState(this.state); - } - - const newEvent: ExtractorEvent = { - event_type: newEventType, - event_context: this.event.payload.event_context, - event_data: { - ...data, - }, - }; - - try { - await axios.post( - this.callbackUrl, - { ...newEvent }, - { - headers: { - Accept: 'application/json, text/plain, */*', - Authorization: this.devrevToken, - 'Content-Type': 'application/json', - }, - } - ); - - console.log('Successfully emitted event: ' + JSON.stringify(newEvent)); - } catch (error) { - // If this request fails the extraction will be stuck in loop and - // we need to stop it through UI or think about retrying this request - console.log( - 'Failed to emit event: ' + - JSON.stringify(newEvent) + - ', error: ' + - error - ); - } finally { - this.exitAdapter(); - } - } - - /** - * Exit the adapter. This will stop the heartbeat and no - * further events will be emitted. - */ - private exitAdapter() { - this.exit = true; - } - - /** - * Heartbeat function to check if the lambda is about to timeout. - * @returns true if 10 minutes have passed since the start of the lambda. - */ - private async heartbeat(): Promise { - if (this.exit) { - return true; - } - if (Date.now() - this.startTime > this.lambdaTimeout) { - const timeoutEventType = getTimeoutExtractorEventType( - this.event.payload.event_type - ); - if (timeoutEventType !== null) { - const { eventType, isError } = timeoutEventType; - const err = isError ? 
{ message: 'Lambda Timeout' } : undefined; - await this.emit(eventType, { error: err, artifacts: this._artifacts }); - return true; - } - } - return false; - } -} diff --git a/src/deprecated/common/helpers.ts b/src/deprecated/common/helpers.ts deleted file mode 100644 index 41eb947..0000000 --- a/src/deprecated/common/helpers.ts +++ /dev/null @@ -1,55 +0,0 @@ -import { jsonl } from 'js-jsonl'; - -import { EventType, ExtractorEventType } from '../../types/extraction'; - -export function createFormData( - //eslint-disable-next-line @typescript-eslint/no-explicit-any - preparedArtifact: any, - fetchedObjects: object[] | object -): FormData { - const formData = new FormData(); - for (const item of preparedArtifact.form_data) { - formData.append(item.key, item.value); - } - - const output = jsonl.stringify(fetchedObjects); - formData.append('file', output); - - return formData; -} - -export function getTimeoutExtractorEventType(eventType: EventType): { - eventType: ExtractorEventType; - isError: boolean; -} | null { - switch (eventType) { - case EventType.ExtractionMetadataStart: - return { - eventType: ExtractorEventType.ExtractionMetadataError, - isError: true, - }; - case EventType.ExtractionDataStart: - case EventType.ExtractionDataContinue: - return { - eventType: ExtractorEventType.ExtractionDataProgress, - isError: false, - }; - case EventType.ExtractionAttachmentsStart: - case EventType.ExtractionAttachmentsContinue: - return { - eventType: ExtractorEventType.ExtractionAttachmentsProgress, - isError: false, - }; - case EventType.ExtractionExternalSyncUnitsStart: - return { - eventType: ExtractorEventType.ExtractionExternalSyncUnitsError, - isError: true, - }; - default: - console.log( - 'Event type not recognized in getTimeoutExtractorEventType function: ' + - eventType - ); - return null; - } -} diff --git a/src/deprecated/demo-extractor/external_domain_metadata.json b/src/deprecated/demo-extractor/external_domain_metadata.json deleted file mode 100644 index 
ea87648..0000000 --- a/src/deprecated/demo-extractor/external_domain_metadata.json +++ /dev/null @@ -1,38 +0,0 @@ -{ - "record_types": { - "users": { - "fields": { - "name": { - "is_required": true, - "type": "text", - "name": "Name", - "text": { - "min_length": 1 - } - }, - "email": { - "type": "text", - "name": "Email", - "is_required": true - } - } - }, - "contacts": { - "fields": { - "name": { - "is_required": true, - "type": "text", - "name": "Name", - "text": { - "min_length": 1 - } - }, - "email": { - "type": "text", - "name": "Email", - "is_required": true - } - } - } - } -} diff --git a/src/deprecated/demo-extractor/index.ts b/src/deprecated/demo-extractor/index.ts deleted file mode 100644 index 30e937f..0000000 --- a/src/deprecated/demo-extractor/index.ts +++ /dev/null @@ -1,235 +0,0 @@ -import { - AirdropEvent, - EventType, - ExternalSyncUnit, - ExtractorEventType, -} from '../../types/extraction'; -import { Adapter } from '../adapter'; -import { Uploader } from '../uploader'; -import externalDomainMetadata from './external_domain_metadata.json'; - -type ConnectorState = object; - -/** - * Demo extractor is a reference implementation of an ADaaS connector to facilitate rapid immersion into ADaaS. 
- *
- * @class DemoExtractor
- * @deprecated
- **/
-export class DemoExtractor {
-  private event: AirdropEvent;
-  private adapter: Adapter;
-  private uploader: Uploader;
-
-  constructor(event: AirdropEvent, adapter: Adapter) {
-    this.event = event;
-    this.adapter = adapter;
-    this.uploader = new Uploader(
-      this.event.execution_metadata.devrev_endpoint,
-      this.event.context.secrets.service_account_token
-    );
-  }
-
-  async run() {
-    switch (this.event.payload.event_type) {
-      case EventType.ExtractionExternalSyncUnitsStart: {
-        const externalSyncUnits: ExternalSyncUnit[] = [
-          {
-            id: 'devrev',
-            name: 'devrev',
-            description: 'Demo external sync unit',
-          },
-        ];
-
-        await this.adapter.emit(
-          ExtractorEventType.ExtractionExternalSyncUnitsDone,
-          {
-            external_sync_units: externalSyncUnits,
-          }
-        );
-
-        break;
-      }
-
-      case EventType.ExtractionMetadataStart: {
-        const { artifact, error } = await this.uploader.upload(
-          'metadata_1.jsonl',
-          'external_domain_metadata',
-          externalDomainMetadata
-        );
-
-        if (error || !artifact) {
-          await this.adapter.emit(ExtractorEventType.ExtractionMetadataError, {
-            error,
-          });
-          return;
-        }
-
-        await this.adapter.emit(ExtractorEventType.ExtractionMetadataDone, {
-          artifacts: [artifact],
-        });
-
-        break;
-      }
-
-      case EventType.ExtractionDataStart: {
-        const contacts = [
-          {
-            id: 'contact-1',
-            created_date: '1999-12-25T01:00:03+01:00',
-            modified_date: '1999-12-25T01:00:03+01:00',
-            data: {
-              email: 'johnsmith@test.com',
-              name: 'John Smith',
-            },
-          },
-          {
-            id: 'contact-2',
-            created_date: '1999-12-27T15:31:34+01:00',
-            modified_date: '2002-04-09T01:55:31+02:00',
-            data: {
-              email: 'janesmith@test.com',
-              name: 'Jane Smith',
-            },
-          },
-        ];
-
-        const { artifact, error } = await this.uploader.upload(
-          'contacts_1.json',
-          'contacts',
-          contacts
-        );
-
-        if (error || !artifact) {
-          await this.adapter.emit(ExtractorEventType.ExtractionDataError, {
-            error,
-          });
-
-          return;
-        }
-
-        await this.adapter.emit(ExtractorEventType.ExtractionDataProgress, {
-          progress: 50,
-          artifacts: [artifact],
-        });
-
-        break;
-      }
-
-      case EventType.ExtractionDataContinue: {
-        const users = [
-          {
-            id: 'user-1',
-            created_date: '1999-12-25T01:00:03+01:00',
-            modified_date: '1999-12-25T01:00:03+01:00',
-            data: {
-              email: 'johndoe@test.com',
-              name: 'John Doe',
-            },
-          },
-          {
-            id: 'user-2',
-            created_date: '1999-12-27T15:31:34+01:00',
-            modified_date: '2002-04-09T01:55:31+02:00',
-            data: {
-              email: 'janedoe@test.com',
-              name: 'Jane Doe',
-            },
-          },
-        ];
-
-        const { artifact, error } = await this.uploader.upload(
-          'users_1.json',
-          'users',
-          users
-        );
-
-        if (error || !artifact) {
-          await this.adapter.emit(ExtractorEventType.ExtractionDataError, {
-            error,
-          });
-          return;
-        }
-
-        await this.adapter.emit(ExtractorEventType.ExtractionDataDone, {
-          progress: 100,
-          artifacts: [artifact],
-        });
-
-        break;
-      }
-
-      case EventType.ExtractionDataDelete: {
-        await this.adapter.emit(ExtractorEventType.ExtractionDataDeleteDone);
-        break;
-      }
-
-      case EventType.ExtractionAttachmentsStart: {
-        const attachment1 = ['This is attachment1.txt content'];
-        const { artifact, error } = await this.uploader.upload(
-          'attachment1.txt',
-          'attachment',
-          attachment1
-        );
-
-        if (error || !artifact) {
-          await this.adapter.emit(
-            ExtractorEventType.ExtractionAttachmentsError,
-            {
-              error,
-            }
-          );
-          return;
-        }
-
-        await this.adapter.emit(
-          ExtractorEventType.ExtractionAttachmentsProgress,
-          {
-            artifacts: [artifact],
-          }
-        );
-
-        break;
-      }
-
-      case EventType.ExtractionAttachmentsContinue: {
-        const attachment2 = ['This is attachment2.txt content'];
-        const { artifact, error } = await this.uploader.upload(
-          'attachment2.txt',
-          'attachment',
-          attachment2
-        );
-
-        if (error || !artifact) {
-          await this.adapter.emit(
-            ExtractorEventType.ExtractionAttachmentsError,
-            {
-              error,
-            }
-          );
-          return;
-        }
-
-        await this.adapter.emit(ExtractorEventType.ExtractionAttachmentsDone, {
-          artifacts: [artifact],
-        });
-
-        break;
-      }
-
-      case EventType.ExtractionAttachmentsDelete: {
-        await this.adapter.emit(
-          ExtractorEventType.ExtractionAttachmentsDeleteDone
-        );
-        break;
-      }
-
-      default: {
-        console.error(
-          'Event in DemoExtractor run not recognized: ' +
-            JSON.stringify(this.event.payload.event_type)
-        );
-      }
-    }
-  }
-}
diff --git a/src/deprecated/http/client.ts b/src/deprecated/http/client.ts
deleted file mode 100644
index e443043..0000000
--- a/src/deprecated/http/client.ts
+++ /dev/null
@@ -1,149 +0,0 @@
-import axios, {
-  InternalAxiosRequestConfig,
-  isAxiosError,
-  RawAxiosRequestHeaders,
-} from 'axios';
-import {
-  RATE_LIMIT_EXCEEDED,
-  RATE_LIMIT_EXCEEDED_STATUS_CODE,
-} from '../../http/constants';
-import { HTTPResponse } from '../../http/types';
-
-export const defaultResponse: HTTPResponse = {
-  data: {
-    delay: 0,
-    nextPage: 1,
-    records: [],
-  },
-  message: '',
-  success: false,
-};
-
-/**
- * HTTPClient class to make HTTP requests
- * @deprecated
- */
-export class HTTPClient {
-  private retryAfter = 0;
-  private retryAt = 0;
-  private axiosInstance = axios.create();
-
-  constructor() {
-    // Add request interceptor to check for retryAfter before making a request
-    this.axiosInstance.interceptors.request.use(
-      (config: InternalAxiosRequestConfig) => {
-        // Check if retryAfter is not 0 and return a LIMIT_EXCEEDED error
-        if (this.retryAfter !== 0) {
-          // check if the current time is greater than the retryAt time
-          const currentTime = new Date().getTime();
-          if (currentTime < this.retryAt) {
-            console.error(
-              'Rate limit exceeded. Interceptor has retryAfter: ' +
-                this.retryAfter
-            );
-            // Rate limit exceeded.
-            return Promise.reject(RATE_LIMIT_EXCEEDED);
-          } else {
-            // Reset the retryAfter
-            this.retryAfter = 0;
-          }
-        }
-        return config;
-      },
-      (error) => {
-        return Promise.reject(error);
-      }
-    );
-  }
-
-  /**
-   *
-   * Function to make a GET call to the endpoint.
-   * There is special handling for rate limit exceeded error.
-   * In case of rate limit exceeded, the function returns success as true and the delay time in seconds
-   * In case of any other error, the function returns success as false and the error message
-   */
-  async getCall(
-    endpoint: string,
-    headers: RawAxiosRequestHeaders,
-    // eslint-disable-next-line @typescript-eslint/no-explicit-any
-    params?: any
-  ): Promise<HTTPResponse> {
-    // Return the LIMIT_EXCEEDED error if the retryAfter is not 0
-    try {
-      const res = await this.axiosInstance.get(endpoint, {
-        headers: headers,
-        params: params,
-      });
-      return {
-        ...defaultResponse,
-        data: {
-          delay: 0,
-          records: res.data,
-        },
-        success: true,
-      };
-    } catch (error: unknown) {
-      console.log('Error in getCall: ' + JSON.stringify(error));
-      // send error to adapter
-      if (isAxiosError(error)) {
-        if (error.response?.status === RATE_LIMIT_EXCEEDED_STATUS_CODE) {
-          this.retryAfter = error.response.headers['retry-after']
-            ? error.response.headers['retry-after']
-            : 0;
-          this.retryAt = new Date().getTime() + this.retryAfter * 1000;
-          console.warn(
-            'Rate limit exceeded. Error code: ' +
-              error.response.status +
-              ' RetryAfter: ' +
-              this.retryAfter +
-              ' RetryAt: ' +
-              this.retryAt
-          );
-          return {
-            data: {
-              delay: this.retryAfter,
-              records: [],
-            },
-            message: RATE_LIMIT_EXCEEDED,
-            success: true,
-          };
-        }
-        if (error.response) {
-          return { ...defaultResponse, message: error.response.data };
-        } else {
-          return { ...defaultResponse, message: error.message };
-        }
-      } else {
-        if (this.retryAfter !== 0) {
-          console.warn(
-            'Rate limit exceeded. Going to return the following response: ' +
-              JSON.stringify(error)
-          );
-          return {
-            data: {
-              delay: this.retryAfter,
-              records: [],
-            },
-            message:
-              typeof error === 'string'
-                ? error
-                : JSON.stringify(error, Object.getOwnPropertyNames(error)),
-            success: true,
-          };
-        }
-        return {
-          data: {
-            delay: this.retryAfter,
-            records: [],
-          },
-          message:
-            typeof error === 'string'
-              ? error
-              : JSON.stringify(error, Object.getOwnPropertyNames(error)),
-          success: false,
-        };
-      }
-    }
-  }
-}
diff --git a/src/deprecated/uploader/index.ts b/src/deprecated/uploader/index.ts
deleted file mode 100644
index eca7203..0000000
--- a/src/deprecated/uploader/index.ts
+++ /dev/null
@@ -1,156 +0,0 @@
-import { betaSDK, client } from '@devrev/typescript-sdk';
-import fs, { promises as fsPromises } from 'fs';
-import { axiosClient } from '../../http/axios-client-internal';
-import { Artifact, UploadResponse } from '../../uploader/uploader.interfaces';
-import { createFormData } from '../common/helpers';
-
-/**
- * Uploader class is used to upload files to the DevRev platform.
- * The class provides utilities to
- * - prepare artifact
- * - upload artifact
- * - return the artifact information to the platform
- *
- * @class Uploader
- * @constructor
- * @param {string} endpoint - The endpoint of the DevRev platform
- * @param {string} token - The token to authenticate with the DevRev platform
- * @param {boolean} local - Flag to indicate if the uploader should upload to the file-system.
- */
-export class Uploader {
-  private betaDevrevSdk: betaSDK.Api;
-  private local: boolean;
-  constructor(endpoint: string, token: string, local = false) {
-    this.betaDevrevSdk = client.setupBeta({
-      endpoint,
-      token,
-    });
-    this.local = local;
-  }
-
-  /**
-   *
-   * Uploads the file to the DevRev platform. The file is uploaded to the platform
-   * and the artifact information is returned.
-   *
-   * @param {string} filename - The name of the file to be uploaded
-   * @param {string} entity - The entity type of the file to be uploaded
-   * @param {object[] | object} fetchedObjects - The objects to be uploaded
-   * @param filetype - The type of the file to be uploaded
-   * @returns {Promise} - The response object containing the artifact information
-   */
-  async upload(
-    filename: string,
-    entity: string,
-    fetchedObjects: object[] | object,
-    filetype: string = 'application/jsonl+json'
-  ): Promise<UploadResponse> {
-    if (this.local) {
-      await this.downloadToLocal(filename, fetchedObjects);
-    }
-
-    const preparedArtifact = await this.prepareArtifact(filename, filetype);
-
-    if (!preparedArtifact) {
-      return {
-        artifact: undefined,
-        error: { message: 'Error while preparing artifact' },
-      };
-    }
-
-    const uploadedArtifact = await this.uploadToArtifact(
-      preparedArtifact,
-      fetchedObjects
-    );
-
-    if (!uploadedArtifact) {
-      return {
-        artifact: undefined,
-        error: { message: 'Error while uploading artifact' },
-      };
-    }
-
-    // If file was successfully uploaded we want to post data about that file when emitting
-    const itemCount = Array.isArray(fetchedObjects) ? fetchedObjects.length : 1;
-    const artifact: Artifact = {
-      id: preparedArtifact.id,
-      item_type: entity,
-      item_count: itemCount,
-    };
-
-    console.log(`Artifact uploaded successfully: ${artifact.id}`);
-
-    return { artifact, error: undefined };
-  }
-
-  private async prepareArtifact(
-    filename: string,
-    filetype: string
-  ): Promise {
-    try {
-      const response = await this.betaDevrevSdk.artifactsPrepare({
-        file_name: filename,
-        file_type: filetype,
-      });
-
-      return response.data;
-    } catch (error) {
-      console.error('Error while preparing artifact: ' + error);
-      return null;
-    }
-  }
-
-  private async uploadToArtifact(
-    // eslint-disable-next-line @typescript-eslint/no-explicit-any
-    preparedArtifact: any,
-    fetchedObjects: object[] | object
-    // eslint-disable-next-line @typescript-eslint/no-explicit-any
-  ): Promise {
-    const formData = createFormData(preparedArtifact, fetchedObjects);
-    try {
-      const response = await axiosClient.post(preparedArtifact.url, formData, {
-        headers: {
-          'Content-Type': 'multipart/form-data',
-        },
-      });
-
-      return response;
-    } catch (error) {
-      console.error('Error while uploading artifact: ' + error);
-      return null;
-    }
-  }
-
-  private async downloadToLocal(
-    filePath: string,
-    fetchedObjects: object | object[]
-  ) {
-    console.log(`Uploading ${filePath} to local file system`);
-    try {
-      if (!fs.existsSync('extracted_files')) {
-        fs.mkdirSync('extracted_files');
-      }
-
-      const timestamp = new Date().getTime();
-      const fileHandle = await fsPromises.open(
-        `extracted_files/${timestamp}_${filePath}`,
-        'w'
-      );
-      let objArray = [];
-      if (!Array.isArray(fetchedObjects)) {
-        objArray.push(fetchedObjects);
-      } else {
-        objArray = fetchedObjects;
-      }
-      for (const jsonObject of objArray) {
-        const jsonLine = JSON.stringify(jsonObject) + '\n';
-        await fileHandle.write(jsonLine);
-      }
-      await fileHandle.close();
-      console.log('Data successfully written to', filePath);
-    } catch (error) {
-      console.error('Error writing data to file:', error);
-      return Promise.reject(error);
-    }
-  }
-}
diff --git a/src/index.ts b/src/index.ts
index 26dd8ff..b4b94dc 100644
--- a/src/index.ts
+++ b/src/index.ts
@@ -1,10 +1,6 @@
 export { AirSyncDefaultItemTypes } from './common/constants';
 export { ExtractionCommonError } from './common/errors';
 export * from './common/install-initial-domain-mapping';
-export * from './deprecated/adapter';
-export * from './deprecated/demo-extractor';
-export * from './deprecated/http/client';
-export * from './deprecated/uploader';
 export * from './http';
 export { formatAxiosError, serializeAxiosError } from './logger/logger';
 export { MockServer } from './mock-server/mock-server';
diff --git a/tsconfig.build.json b/tsconfig.build.json
new file mode 100644
index 0000000..81df2d6
--- /dev/null
+++ b/tsconfig.build.json
@@ -0,0 +1,4 @@
+{
+  "extends": "./tsconfig.json",
+  "exclude": ["node_modules", "dist", "**/*.test.ts", "src/tests"]
+}

From 5c2aecd1d7aa3b8ea6f69553d52fa48feb44b711 Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 08:11:15 +0200
Subject: [PATCH 05/22] docs: mark C1 done; move event-type-translation deletion to C3

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 V2_PROGRESS.md | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md
index e9df782..6c9ab35 100644
--- a/V2_PROGRESS.md
+++ b/V2_PROGRESS.md
@@ -40,18 +40,28 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi
   Touch: `package.json` `name`; README references; api-extractor config (entry point / package name);
   rename the report file `*/ts-adaas.api.md` → `airsync-sdk.api.md` IF trivial, else defer report to Phase 2.
   Do NOT publish. Version → `2.0.0-beta.0` placeholder.
-- **C1 — Delete dead/deprecated code + add build tsconfig.**
-  - Delete `src/deprecated/**` (see list below) and its exports from `src/index.ts`.
-  - Delete `src/common/event-type-translation.ts` + `.test.ts` (the old↔new event-type shim).
-  - Delete other `@deprecated`-tagged symbols / provably-unused code (grep-justified).
+- **C1 — Delete deprecated dir + add build tsconfig.**
+  - Delete `src/deprecated/**` (see list below) and its 4 `export * from './deprecated/...'` lines from `src/index.ts`.
   - Add `tsconfig.build.json` (`include: ["src"]`, `exclude: ["**/*.test.ts","node_modules","dist"]`) and point
     `build` script at it. This is the "build stays green" enabler.
+  - NOTE: `event-type-translation.ts` deletion MOVED to C3 — it is NOT dead; production code
+    (`process-task.ts`, `spawn.ts`, `worker-adapter.ts`, `control-protocol.ts`) imports it, so it
+    can only be removed alongside the old enum members it translates (C3). Deleting it here would
+    break the build.
 - **C2 — Airdrop→AirSync identifier rename.** SDK identifiers/types/classes/comments only.
   NOT API route strings (rule 3). e.g. `AirdropEvent`→`AirSyncEvent`, `AirdropMessage`→`AirSyncMessage`
   (verify exact target names against `origin/v2`). Provide back-compat type aliases ONLY if origin/v2 did.
-- **C3 — Delete deprecated enum members** (NOT a rename — main carries old+new side by side; drop old).
-  Leave only the new members. See enum tables below. Files: `src/types/extraction.ts`,
-  `src/types/loading.ts`, plus any `case`/reference cleanups in `control-protocol.ts`, `spawn.helpers.ts`, adapters.
+- **C3 — Remove old event-type compatibility layer** (NOT a rename — main carries old+new side by side; drop old).
+  - Delete deprecated enum members, leaving only the new ones (tables below). Files: `src/types/extraction.ts`,
+    `src/types/loading.ts`.
+  - Delete `src/common/event-type-translation.ts` + `.test.ts`.
+  - Rewire its 4 production callers to use event types directly (no translation; backend now sends/accepts
+    only new types): `src/multithreading/process-task.ts` (translateIncomingEventType),
+    `src/multithreading/spawn/spawn.ts` (translateIncomingEventType),
+    `src/multithreading/worker-adapter/worker-adapter.ts` (translateOutgoingEventType),
+    `src/common/control-protocol.ts` (translateOutgoingEventType).
+  - Plus any `case`/reference cleanups in `spawn.helpers.ts`, adapters.
+  - Exported translation fns are NOT in index.ts (internal) — safe to delete.
 - **C4a — State split (structural only).** Introduce `BaseState` + `ExtractionState` +
   `LoadingState`. KEEP the flat `AdapterState = ConnectorState & SdkState` shape (behavior identical).
   Author fresh; origin/v2 `src/state/base-state.ts` etc. are structural reference only.
@@ -170,7 +180,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors:
 | Commit | State | Notes |
 |--------|-------|-------|
 | C0 package rename | ☑ done | 8ddeb87. @devrev/ts-adaas→@devrev/airsync-sdk, v2.0.0-beta.0. Report filename rename deferred to C8. |
-| C1 delete + tsconfig | ☐ todo | |
+| C1 delete + tsconfig | ☑ done | d573cb6. Deleted src/deprecated/ (6 files) + 4 index exports; added tsconfig.build.json (excludes tests), build script points to it. Reviewer-approved. event-type-translation deletion moved to C3. |
 | C2 AirSync rename | ☐ todo | |
 | C3 enum cleanup | ☐ todo | |
 | C4a state split | ☐ todo | |

From 1fa9afc8a08b2204a14c07dd48074e779ba44cea Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 08:20:18 +0200
Subject: [PATCH 06/22] refactor(v2)!: rename AirdropEvent/AirdropMessage -> AirSync*; rebrand prose

Rename the public types AirdropEvent -> AirSyncEvent and AirdropMessage -> AirSyncMessage across all production code (no back-compat alias), and update stale 'ADaaS'/'Airdrop' branding in comments and JSDoc to 'AirSync'.

Left untouched (platform contracts): the /internal/airdrop.* API route strings, the AIRDROP_* mapping enum members and their 'airdrop_*' values, and the external_system_type: 'ADaaS' string literal.

BREAKING CHANGE: AirdropEvent and AirdropMessage are renamed to AirSyncEvent and AirSyncMessage. Connectors must update their imports and type references.

Ref: V2_PROGRESS.md C2

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 src/common/control-protocol.ts               |  4 +--
 src/common/install-initial-domain-mapping.ts |  4 +--
 src/common/test-utils.ts                     | 14 ++++----
 src/logger/logger.interfaces.ts              |  4 +--
 src/mappers/mappers.interface.ts             |  4 +--
 src/multithreading/spawn/spawn.ts            |  8 ++---
 .../worker-adapter/worker-adapter.ts         |  6 ++--
 src/repo/repo.interfaces.ts                  |  4 +--
 src/state/state.interfaces.ts                |  4 +--
 src/types/extraction.ts                      | 32 +++++++++----------
 src/types/index.ts                           |  4 +--
 src/types/loading.ts                         |  4 +--
 src/types/workers.ts                         | 18 +++++------
 src/uploader/uploader.interfaces.ts          |  6 ++--
 14 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/src/common/control-protocol.ts b/src/common/control-protocol.ts
index c372c91..6116de9 100644
--- a/src/common/control-protocol.ts
+++ b/src/common/control-protocol.ts
@@ -1,7 +1,7 @@
 import { AxiosResponse } from 'axios';
 import { axiosClient } from '../http/axios-client-internal';
 import {
-  AirdropEvent,
+  AirSyncEvent,
   EventData,
   ExtractorEvent,
   ExtractorEventType,
@@ -12,7 +12,7 @@ import { LIBRARY_VERSION } from './constants';
 import { translateOutgoingEventType } from './event-type-translation';

 export interface EmitInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   eventType: ExtractorEventType | LoaderEventType;
   data?: EventData;
 }
diff --git a/src/common/install-initial-domain-mapping.ts b/src/common/install-initial-domain-mapping.ts
index f65026e..5a8ec43 100644
--- a/src/common/install-initial-domain-mapping.ts
+++ b/src/common/install-initial-domain-mapping.ts
@@ -1,11 +1,11 @@
 import { axiosClient } from '../http/axios-client-internal';
-import { AirdropEvent } from '../types/extraction';
+import { AirSyncEvent } from '../types/extraction';
 import { serializeError } from '../logger/logger';
 import { InitialDomainMapping } from '../types/common';

 export async function installInitialDomainMapping(
-  event: AirdropEvent,
+  event: AirSyncEvent,
   initialDomainMappingJson: InitialDomainMapping
 ): Promise {
   const devrevEndpoint = event.execution_metadata.devrev_endpoint;
diff --git a/src/common/test-utils.ts b/src/common/test-utils.ts
index 1584a34..c9b8b2b 100644
--- a/src/common/test-utils.ts
+++ b/src/common/test-utils.ts
@@ -1,4 +1,4 @@
-import { AirdropEvent, EventType } from '../types/extraction';
+import { AirSyncEvent, EventType } from '../types/extraction';

 export const MOCK_SERVER_DEFAULT_URL = 'http://localhost:0';

@@ -44,19 +44,19 @@ function deepMerge>(
 }

 /**
- * Creates a mock AirdropEvent for testing.
+ * Creates a mock AirSyncEvent for testing.
  *
  * @param mockServerUrl - Base URL for the mock server. Defaults to {@link MOCK_SERVER_DEFAULT_URL}.
  * The `callback_url`, `worker_data_url`, and `devrev_endpoint` fields are
  * derived from this value unless explicitly overridden.
- * @param overrides - Deep partial of AirdropEvent. Any provided fields are
+ * @param overrides - Deep partial of AirSyncEvent. Any provided fields are
  * deep-merged on top of the defaults.
  */
 export function createMockEvent(
   mockServerUrl: string = MOCK_SERVER_DEFAULT_URL,
-  overrides: DeepPartial<AirdropEvent> = {}
-): AirdropEvent {
-  const base: AirdropEvent = {
+  overrides: DeepPartial<AirSyncEvent> = {}
+): AirSyncEvent {
+  const base: AirSyncEvent = {
     context: {
       secrets: {
         service_account_token: 'test_token',
@@ -118,7 +118,7 @@ export function createMockEvent(
   const merged = deepMerge(
     base as unknown as Record,
     overrides as DeepPartial>
-  ) as unknown as AirdropEvent;
+  ) as unknown as AirSyncEvent;

   // Ensure mock server URLs always win over overrides, unless the caller
   // explicitly provided them.
diff --git a/src/logger/logger.interfaces.ts b/src/logger/logger.interfaces.ts
index 9c39bde..b8f2494 100644
--- a/src/logger/logger.interfaces.ts
+++ b/src/logger/logger.interfaces.ts
@@ -1,9 +1,9 @@
 import type { RawAxiosResponseHeaders } from 'axios';
-import type { AirdropEvent, EventContext } from '../types/extraction';
+import type { AirSyncEvent, EventContext } from '../types/extraction';
 import type { WorkerAdapterOptions } from '../types/workers';

 export interface LoggerFactoryInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   options?: WorkerAdapterOptions;
 }

diff --git a/src/mappers/mappers.interface.ts b/src/mappers/mappers.interface.ts
index b37272a..7d03c60 100644
--- a/src/mappers/mappers.interface.ts
+++ b/src/mappers/mappers.interface.ts
@@ -1,4 +1,4 @@
-import { AirdropEvent } from '../types';
+import { AirSyncEvent } from '../types';
 import { DonV2 } from '../types/loading';
 import { WorkerAdapterOptions } from '../types/workers';

@@ -6,7 +6,7 @@ import { WorkerAdapterOptions } from '../types/workers';
  * Configuration interface for creating a Mappers instance.
  */
 export interface MappersFactoryInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   options?: WorkerAdapterOptions;
 }

diff --git a/src/multithreading/spawn/spawn.ts b/src/multithreading/spawn/spawn.ts
index 0350d34..7743268 100644
--- a/src/multithreading/spawn/spawn.ts
+++ b/src/multithreading/spawn/spawn.ts
@@ -5,7 +5,7 @@ import { emit } from '../../common/control-protocol';
 import { translateIncomingEventType } from '../../common/event-type-translation';
 import { getMemoryUsage } from '../../common/helpers';
 import { Logger, serializeError } from '../../logger/logger';
-import { AirdropEvent, EventType } from '../../types/extraction';
+import { AirSyncEvent, EventType } from '../../types/extraction';
 import {
   GetWorkerPathInterface,
   SpawnFactoryInterface,
@@ -65,7 +65,7 @@ function getWorkerPath({
  * The class provides utilities to emit control events to the platform and exit the worker gracefully.
  * In case of lambda timeout, the class emits a lambda timeout event to the platform.
  * @param {SpawnFactoryInterface} options - The options to create a new instance of Spawn class
- * @param {AirdropEvent} options.event - The event object received from the platform
+ * @param {AirSyncEvent} options.event - The event object received from the platform
  * @param {object} options.initialState - The initial state of the adapter
  * @param {string} [options.workerPath] Remove getWorkerPath function and use baseWorkerPath: __dirname instead of workerPath
  * @param {string} [options.baseWorkerPath] - The base path for the worker files, usually `__dirname`
@@ -169,7 +169,7 @@ export async function spawn({
 }

 export class Spawn {
-  private event: AirdropEvent;
+  private event: AirSyncEvent;
   private alreadyEmitted: boolean;
   private softTimeoutSent: boolean;
   private defaultLambdaTimeout: number = DEFAULT_LAMBDA_TIMEOUT;
@@ -269,7 +269,7 @@ export class Spawn {

       // If worker sends a message that it has emitted an event, then set alreadyEmitted to true.
       if (message?.subject === WorkerMessageSubject.WorkerMessageEmitted) {
-        console.info('Worker has emitted message to ADaaS.');
+        console.info('Worker has emitted message to AirSync.');
         this.alreadyEmitted = true;
       }

diff --git a/src/multithreading/worker-adapter/worker-adapter.ts b/src/multithreading/worker-adapter/worker-adapter.ts
index 8149129..d938287 100644
--- a/src/multithreading/worker-adapter/worker-adapter.ts
+++ b/src/multithreading/worker-adapter/worker-adapter.ts
@@ -27,7 +27,7 @@ import {
 import { State } from '../../state/state';
 import { AdapterState } from '../../state/state.interfaces';
 import {
-  AirdropEvent,
+  AirSyncEvent,
   EventData,
   EventType,
   ExternalSystemAttachmentProcessors,
@@ -74,7 +74,7 @@ export function createWorkerAdapter({
 }

 /**
- * WorkerAdapter class is used to interact with Airdrop platform. It is passed to the snap-in
+ * WorkerAdapter class is used to interact with AirSync platform. It is passed to the snap-in
  * as parameter in processTask and onTimeout functions. The class provides
  * utilities to emit control events to the platform, update the state of the connector,
  * and upload artifacts to the platform.
@@ -89,7 +89,7 @@ export function createWorkerAdapter({
  * @public
  */
 export class WorkerAdapter {
-  readonly event: AirdropEvent;
+  readonly event: AirSyncEvent;
   readonly options?: WorkerAdapterOptions;
   isTimeout: boolean;
   hasWorkerEmitted: boolean;
diff --git a/src/repo/repo.interfaces.ts b/src/repo/repo.interfaces.ts
index 4573a3a..f790be2 100644
--- a/src/repo/repo.interfaces.ts
+++ b/src/repo/repo.interfaces.ts
@@ -1,6 +1,6 @@
 import { Artifact } from '../uploader/uploader.interfaces';

-import { AirdropEvent } from '../types/extraction';
+import { AirSyncEvent } from '../types/extraction';
 import { WorkerAdapterOptions } from '../types/workers';

 /**
@@ -16,7 +16,7 @@ export interface RepoInterface {
  * RepoFactoryInterface is an interface that defines the structure of a repo factory which is used to create a repo.
  */
 export interface RepoFactoryInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   itemType: string;
   normalize?: (record: object) => NormalizedItem | NormalizedAttachment;
   onUpload: (artifact: Artifact) => void;
diff --git a/src/state/state.interfaces.ts b/src/state/state.interfaces.ts
index f3c0592..6c28cf3 100644
--- a/src/state/state.interfaces.ts
+++ b/src/state/state.interfaces.ts
@@ -1,5 +1,5 @@
 import { InitialDomainMapping } from '../types/common';
-import { AirdropEvent } from '../types/extraction';
+import { AirSyncEvent } from '../types/extraction';
 import { FileToLoad } from '../types/loading';
 import { WorkerAdapterOptions } from '../types/workers';

@@ -56,7 +56,7 @@ export interface FromDevRev {
 }

 export interface StateInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   initialState: ConnectorState;
   initialDomainMapping?: InitialDomainMapping;
   options?: WorkerAdapterOptions;
diff --git a/src/types/extraction.ts b/src/types/extraction.ts
index 0c328de..c902e0b 100644
--- a/src/types/extraction.ts
+++ b/src/types/extraction.ts
@@ -10,7 +10,7 @@ import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter';
 import { DonV2, LoaderReport, RateLimited } from './loading';

 /**
- * EventType is an enum that defines the different types of events that can be sent to the external extractor from ADaaS.
+ * EventType is an enum that defines the different types of events that can be sent to the external extractor from AirSync.
  * The external extractor can use these events to know what to do next in the extraction process.
  */
 export enum EventType {
@@ -71,8 +71,8 @@ export enum EventType {
 }

 /**
- * ExtractorEventType is an enum that defines the different types of events that can be sent from the external extractor to ADaaS.
- * The external extractor can use these events to inform ADaaS about the progress of the extraction process.
+ * ExtractorEventType is an enum that defines the different types of events that can be sent from the external extractor to AirSync.
+ * The external extractor can use these events to inform AirSync about the progress of the extraction process.
  */
 export enum ExtractorEventType {
   // Extraction - Old member names with OLD values (deprecated, kept for backwards compatibility)
@@ -248,7 +248,7 @@ export interface TimeValue {
 }

 /**
- * EventContextIn is an interface that defines the structure of the input event context that is sent to the external extractor from ADaaS.
+ * EventContextIn is an interface that defines the structure of the input event context that is sent to the external extractor from AirSync.
  * @deprecated
  */
 export interface EventContextIn {
@@ -276,7 +276,7 @@ export interface EventContextIn {
 }

 /**
- * EventContextOut is an interface that defines the structure of the output event context that is sent from the external extractor to ADaaS.
+ * EventContextOut is an interface that defines the structure of the output event context that is sent from the external extractor to AirSync.
  * @deprecated
  */
 export interface EventContextOut {
@@ -286,7 +286,7 @@ export interface EventContextOut {
 }

 /**
- * EventContext is an interface that defines the structure of the event context that is sent to the external connector from Airdrop.
+ * EventContext is an interface that defines the structure of the event context that is sent to the external connector from AirSync.
  */
 export interface EventContext {
   callback_url: string;
@@ -378,7 +378,7 @@ export interface EventContext {
 }

 /**
- * ConnectionData is an interface that defines the structure of the connection data that is sent to the external extractor from ADaaS.
+ * ConnectionData is an interface that defines the structure of the connection data that is sent to the external extractor from AirSync.
  * It contains the organization ID, organization name, key, and key type.
  */
 export interface ConnectionData {
@@ -389,7 +389,7 @@ export interface ConnectionData {
 }

 /**
- * EventData is an interface that defines the structure of the event data that is sent from the external extractor to ADaaS.
+ * EventData is an interface that defines the structure of the event data that is sent from the external extractor to AirSync.
  */
 export interface EventData {
   /**
@@ -416,7 +416,7 @@ export interface EventData {
 }

 /**
- * WorkerMetadata is an interface that defines the structure of the worker metadata that is sent from the external extractor to ADaaS.
+ * WorkerMetadata is an interface that defines the structure of the worker metadata that is sent from the external extractor to AirSync.
  */
 export interface WorkerMetadata {
   adaas_library_version: string;
@@ -439,10 +439,10 @@ export interface DomainObjectState {
 }

 /**
- * AirdropEvent is an interface that defines the structure of the event that is sent to the external extractor from ADaaS.
+ * AirSyncEvent is an interface that defines the structure of the event that is sent to the external extractor from AirSync.
  * It contains the context, payload, execution metadata, and input data as common snap-ins.
  */
-export interface AirdropEvent {
+export interface AirSyncEvent {
   context: {
     secrets: {
       service_account_token: string;
@@ -450,7 +450,7 @@ export interface AirdropEvent {
     snap_in_version_id: string;
     snap_in_id: string;
   };
-  payload: AirdropMessage;
+  payload: AirSyncMessage;
   execution_metadata: {
     devrev_endpoint: string;
   };
@@ -458,9 +458,9 @@ export interface AirdropEvent {
 }

 /**
- * AirdropMessage is an interface that defines the structure of the payload/message that is sent to the external extractor from ADaaS.
+ * AirSyncMessage is an interface that defines the structure of the payload/message that is sent to the external extractor from AirSync.
  */
-export interface AirdropMessage {
+export interface AirSyncMessage {
   connection_data: ConnectionData;
   event_context: EventContext;
   event_type: EventType;
@@ -468,7 +468,7 @@ export interface AirdropMessage {
 }

 /**
- * ExtractorEvent is an interface that defines the structure of the event that is sent from the external extractor to ADaaS.
+ * ExtractorEvent is an interface that defines the structure of the event that is sent from the external extractor to AirSync.
  * It contains the event type, event context, extractor state, and event data.
  */
 export interface ExtractorEvent {
@@ -495,7 +495,7 @@ export type ExternalSystemAttachmentStreamingFunction = ({

 export interface ExternalSystemAttachmentStreamingParams {
   item: NormalizedAttachment;
-  event: AirdropEvent;
+  event: AirSyncEvent;
 }

 export interface ExternalSystemAttachmentStreamingResponse {
diff --git a/src/types/index.ts b/src/types/index.ts
index f49ef30..f98c725 100644
--- a/src/types/index.ts
+++ b/src/types/index.ts
@@ -10,8 +10,8 @@ export {

 // Extraction
 export {
-  AirdropEvent,
-  AirdropMessage,
+  AirSyncEvent,
+  AirSyncMessage,
   ConnectionData,
   DomainObjectState,
   EventContextIn,
diff --git a/src/types/loading.ts b/src/types/loading.ts
index b08fb1c..3ab48ae 100644
--- a/src/types/loading.ts
+++ b/src/types/loading.ts
@@ -1,6 +1,6 @@
 import { Mappers } from '../mappers/mappers';
 import { ErrorRecord } from './common';
-import { AirdropEvent } from './extraction';
+import { AirSyncEvent } from './extraction';

 export interface StatsFileObject {
   id: string;
@@ -60,7 +60,7 @@ export interface ExternalSystemItem {
 export interface ExternalSystemItemLoadingParams<Type> {
   item: Type;
   mappers: Mappers;
-  event: AirdropEvent;
+  event: AirSyncEvent;
 }

 export interface ExternalSystemItemLoadingResponse {
diff --git a/src/types/workers.ts b/src/types/workers.ts
index 4123c29..9e3ed25 100644
--- a/src/types/workers.ts
+++ b/src/types/workers.ts
@@ -4,7 +4,7 @@ import type { LogLevel } from '../logger/logger.interfaces';
 import { State } from '../state/state';
 import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter';

-import { AirdropEvent, EventType, ExtractorEventType } from './extraction';
+import { AirSyncEvent, EventType, ExtractorEventType } from './extraction';

 import { LoaderEventType } from './loading';

@@ -14,12 +14,12 @@ import { InitialDomainMapping } from './common';
  * WorkerAdapterInterface is an interface for WorkerAdapter class.
  * @interface WorkerAdapterInterface
  * @constructor
- * @param {AirdropEvent} event - The event object received from the platform
+ * @param {AirSyncEvent} event - The event object received from the platform
  * @param {object=} initialState - The initial state of the adapter
  * @param {WorkerAdapterInterface} options - The options to create a new instance of WorkerAdapter class
  */
 export interface WorkerAdapterInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   adapterState: State;
   options?: WorkerAdapterOptions;
 }
@@ -51,11 +51,11 @@ export interface WorkerAdapterOptions {
  * SpawnInterface is an interface for Spawn class.
  * @interface SpawnInterface
  * @constructor
- * @param {AirdropEvent} event - The event object received from the platform
+ * @param {AirSyncEvent} event - The event object received from the platform
  * @param {Worker} worker - The worker thread
  */
 export interface SpawnInterface {
-  event: AirdropEvent;
+  event: AirSyncEvent;
   worker: Worker;
   options?: WorkerAdapterOptions;
   resolve: (value: void | PromiseLike<void>) => void;
@@ -69,7 +69,7 @@ export interface SpawnInterface {
  * In case of lambda timeout, the class emits a lambda timeout event to the platform.
* @interface SpawnFactoryInterface * @constructor - * @param {AirdropEvent} event - The event object received from the platform + * @param {AirSyncEvent} event - The event object received from the platform * @param {object=} initialState - The initial state of the adapter * @param {string} workerPath - The path to the worker file * @param {string} initialDomainMapping - The initial domain mapping @@ -77,7 +77,7 @@ export interface SpawnInterface { * @param {string=} baseWorkerPath - The base path for the worker files, usually `__dirname` */ export interface SpawnFactoryInterface { - event: AirdropEvent; + event: AirSyncEvent; initialState: ConnectorState; /** @deprecated Remove getWorkerPath function and use baseWorkerPath: __dirname instead of workerPath */ @@ -181,7 +181,7 @@ export type WorkerMessage = * WorkerData represents the structure of the worker data object. */ export interface WorkerData { - event: AirdropEvent; + event: AirSyncEvent; initialState: ConnectorState; workerPath: string; initialDomainMapping?: InitialDomainMapping; @@ -192,7 +192,7 @@ export interface WorkerData { * GetWorkerPathInterface is an interface for getting the worker path. 
*/ export interface GetWorkerPathInterface { - event: AirdropEvent; + event: AirSyncEvent; workerBasePath?: string | null; } diff --git a/src/uploader/uploader.interfaces.ts b/src/uploader/uploader.interfaces.ts index ce995a7..2224bed 100644 --- a/src/uploader/uploader.interfaces.ts +++ b/src/uploader/uploader.interfaces.ts @@ -1,10 +1,10 @@ import { ErrorRecord } from '../types/common'; -import { AirdropEvent } from '../types/extraction'; +import { AirSyncEvent } from '../types/extraction'; import { ExternalSystemItem, StatsFileObject } from '../types/loading'; import { WorkerAdapterOptions } from '../types/workers'; export interface UploaderFactoryInterface { - event: AirdropEvent; + event: AirSyncEvent; options?: WorkerAdapterOptions; } @@ -17,7 +17,7 @@ export type UploaderResult = | { response?: never; error: unknown }; /** - * Artifact is an interface that defines the structure of an artifact. Artifact is a file that is generated by the extractor and uploaded to ADaaS. + * Artifact is an interface that defines the structure of an artifact. Artifact is a file that is generated by the extractor and uploaded to AirSync. */ export interface Artifact { id: string; From 8d091594b5e64be4729981f3f288d55a9aa20445 Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 08:20:28 +0200 Subject: [PATCH 07/22] docs: mark C2 done Co-Authored-By: Claude Opus 4.8 (1M context) --- V2_PROGRESS.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index 6c9ab35..ed65b4b 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -48,9 +48,15 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi (`process-task.ts`, `spawn.ts`, `worker-adapter.ts`, `control-protocol.ts`) imports it, so it can only be removed alongside the old enum members it translates (C3). Deleting it here would break the build. 
-- **C2 — Airdrop→AirSync identifier rename.** SDK identifiers/types/classes/comments only. - NOT API route strings (rule 3). e.g. `AirdropEvent`→`AirSyncEvent`, `AirdropMessage`→`AirSyncMessage` - (verify exact target names against `origin/v2`). Provide back-compat type aliases ONLY if origin/v2 did. +- **C2 — Airdrop→AirSync identifier rename.** DECIDED with Rado: + - HARD rename, NO back-compat alias: `AirdropEvent`→`AirSyncEvent`, `AirdropMessage`→`AirSyncMessage`. + (origin/v2 did NOT do this rename — oracle unreliable here; do it properly.) + - Update stale branding in comments/prose: bare "Airdrop" + "ADaaS" → "AirSync". + - MUST NOT touch: `/internal/airdrop.*` API routes; the `AIRDROP_*` mapping enum members AND their + `'airdrop_*'` string values (mappers.interface.ts); the `external_system_type: 'ADaaS'` string LITERAL + (platform contract — keep quotes-protected). + - PRODUCTION scope only: exclude `**/*.test.ts` AND `src/tests/**` (test scaffolding renamed later in C10). + - Connectors all import AirdropEvent → breaking; captured for migrate-v2 skill. - **C3 — Remove old event-type compatibility layer** (NOT a rename — main carries old+new side by side; drop old). - Delete deprecated enum members, leaving only the new ones (tables below). Files: `src/types/extraction.ts`, `src/types/loading.ts`. @@ -181,7 +187,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: |--------|-------|-------| | C0 package rename | ☑ done | 8ddeb87. @devrev/ts-adaas→@devrev/airsync-sdk, v2.0.0-beta.0. Report filename rename deferred to C8. | | C1 delete + tsconfig | ☑ done | d573cb6. Deleted src/deprecated/ (6 files) + 4 index exports; added tsconfig.build.json (excludes tests), build script points to it. Reviewer-approved. event-type-translation deletion moved to C3. | -| C2 AirSync rename | ☐ todo | | +| C2 AirSync rename | ☑ done | 1fa9afc. AirdropEvent→AirSyncEvent, AirdropMessage→AirSyncMessage (hard, no alias) + prose ADaaS/Airdrop→AirSync. 
Protected: airdrop.* routes, AIRDROP_* enum, 'ADaaS' literal. Reviewer-approved. | | C3 enum cleanup | ☐ todo | | | C4a state split | ☐ todo | | | C4b state envelope | ☐ todo | | From cc05f41fdcc74548fcac43c33d6cd2f971b9e4ae Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 08:55:30 +0200 Subject: [PATCH 08/22] refactor(v2)!: remove old event-type compatibility layer The backend now sends and accepts only the new event-type values, so the old->new translation indirection and the deprecated enum members are no longer needed. - Delete deprecated members from EventType and ExtractorEventType (the EXTRACTION_* values) and the typo/plural dupes from LoaderEventType (DataLoadingDelay, AttachmentsLoading*), keeping only the new members. - Delete src/common/event-type-translation.ts and its test. - Rewire the four callers to use event types directly with no translation: process-task.ts, spawn.ts (incoming), control-protocol.ts emit() and worker-adapter.ts emit() (outgoing). - Drop the now-dead old-member case arms in spawn.helpers.ts (return values unchanged). Behavior is unchanged: translation was identity for new values, which are now the only values on the wire. BREAKING CHANGE: removed deprecated enum members. Connectors must use the new event-type members (e.g. EventType.StartExtractingMetadata instead of EventType.ExtractionMetadataStart; LoaderEventType.DataLoadingDelayed instead of DataLoadingDelay). 
Ref: V2_PROGRESS.md C3 Co-Authored-By: Claude Opus 4.8 (1M context) --- src/common/control-protocol.ts | 7 +- src/common/event-type-translation.test.ts | 193 ------------------ src/common/event-type-translation.ts | 163 --------------- src/multithreading/process-task.ts | 6 - src/multithreading/spawn/spawn.helpers.ts | 20 +- src/multithreading/spawn/spawn.ts | 18 +- .../worker-adapter/worker-adapter.ts | 3 - src/types/extraction.ts | 126 ++---------- src/types/loading.ts | 24 +-- 9 files changed, 23 insertions(+), 537 deletions(-) delete mode 100644 src/common/event-type-translation.test.ts delete mode 100644 src/common/event-type-translation.ts diff --git a/src/common/control-protocol.ts b/src/common/control-protocol.ts index 6116de9..0d13be1 100644 --- a/src/common/control-protocol.ts +++ b/src/common/control-protocol.ts @@ -9,7 +9,6 @@ import { } from '../types/extraction'; import { LoaderEventType } from '../types/loading'; import { LIBRARY_VERSION } from './constants'; -import { translateOutgoingEventType } from './event-type-translation'; export interface EmitInterface { event: AirSyncEvent; @@ -22,12 +21,8 @@ export const emit = async ({ eventType, data, }: EmitInterface): Promise => { - // Translate outgoing event type to ensure we always send new event types - // TODO: Remove when the old types are completely phased out - const translatedEventType = translateOutgoingEventType(eventType); - const newEvent: ExtractorEvent | LoaderEvent = { - event_type: translatedEventType, + event_type: eventType, event_context: event.payload.event_context, event_data: { ...data, diff --git a/src/common/event-type-translation.test.ts b/src/common/event-type-translation.test.ts deleted file mode 100644 index 85b4825..0000000 --- a/src/common/event-type-translation.test.ts +++ /dev/null @@ -1,193 +0,0 @@ -import { EventType, ExtractorEventType } from '../types/extraction'; -import { LoaderEventType } from '../types/loading'; -import { - translateExtractorEventType, - 
translateIncomingEventType, - translateLoaderEventType, - translateOutgoingEventType, -} from './event-type-translation'; - -describe(translateIncomingEventType.name, () => { - it.each([ - [ - EventType.ExtractionExternalSyncUnitsStart, - EventType.StartExtractingExternalSyncUnits, - ], - [EventType.ExtractionMetadataStart, EventType.StartExtractingMetadata], - [EventType.ExtractionDataStart, EventType.StartExtractingData], - [EventType.ExtractionDataContinue, EventType.ContinueExtractingData], - [EventType.ExtractionDataDelete, EventType.StartDeletingExtractorState], - [ - EventType.ExtractionAttachmentsStart, - EventType.StartExtractingAttachments, - ], - [ - EventType.ExtractionAttachmentsContinue, - EventType.ContinueExtractingAttachments, - ], - [ - EventType.ExtractionAttachmentsDelete, - EventType.StartDeletingExtractorAttachmentsState, - ], - ])('maps legacy extraction event %s to %s', (legacy, modern) => { - expect(translateIncomingEventType(legacy)).toBe(modern); - }); - - it.each([ - [EventType.StartExtractingExternalSyncUnits], - [EventType.StartExtractingMetadata], - [EventType.StartExtractingData], - [EventType.ContinueExtractingData], - [EventType.StartDeletingExtractorState], - [EventType.StartExtractingAttachments], - [EventType.ContinueExtractingAttachments], - [EventType.StartDeletingExtractorAttachmentsState], - [EventType.StartLoadingData], - [EventType.ContinueLoadingData], - [EventType.StartLoadingAttachments], - [EventType.ContinueLoadingAttachments], - [EventType.StartDeletingLoaderState], - [EventType.StartDeletingLoaderAttachmentState], - [EventType.UnknownEventType], - ])('is a no-op for already-modern event type %s', (eventType) => { - expect(translateIncomingEventType(eventType)).toBe(eventType); - }); - - it('returns the input verbatim for an unrecognised event type', () => { - const result = translateIncomingEventType('NONSENSE_EVENT' as EventType); - - expect(result).toBe('NONSENSE_EVENT'); - }); -}); - 
-describe(translateExtractorEventType.name, () => { - it.each([ - [ - ExtractorEventType.ExtractionExternalSyncUnitsDone, - ExtractorEventType.ExternalSyncUnitExtractionDone, - ], - [ - ExtractorEventType.ExtractionExternalSyncUnitsError, - ExtractorEventType.ExternalSyncUnitExtractionError, - ], - [ - ExtractorEventType.ExtractionMetadataDone, - ExtractorEventType.MetadataExtractionDone, - ], - [ - ExtractorEventType.ExtractionMetadataError, - ExtractorEventType.MetadataExtractionError, - ], - [ - ExtractorEventType.ExtractionDataProgress, - ExtractorEventType.DataExtractionProgress, - ], - [ - ExtractorEventType.ExtractionDataDelay, - ExtractorEventType.DataExtractionDelayed, - ], - [ - ExtractorEventType.ExtractionDataDone, - ExtractorEventType.DataExtractionDone, - ], - [ - ExtractorEventType.ExtractionDataError, - ExtractorEventType.DataExtractionError, - ], - [ - ExtractorEventType.ExtractionDataDeleteDone, - ExtractorEventType.ExtractorStateDeletionDone, - ], - [ - ExtractorEventType.ExtractionDataDeleteError, - ExtractorEventType.ExtractorStateDeletionError, - ], - [ - ExtractorEventType.ExtractionAttachmentsProgress, - ExtractorEventType.AttachmentExtractionProgress, - ], - [ - ExtractorEventType.ExtractionAttachmentsDelay, - ExtractorEventType.AttachmentExtractionDelayed, - ], - [ - ExtractorEventType.ExtractionAttachmentsDone, - ExtractorEventType.AttachmentExtractionDone, - ], - [ - ExtractorEventType.ExtractionAttachmentsError, - ExtractorEventType.AttachmentExtractionError, - ], - [ - ExtractorEventType.ExtractionAttachmentsDeleteDone, - ExtractorEventType.ExtractorAttachmentsStateDeletionDone, - ], - [ - ExtractorEventType.ExtractionAttachmentsDeleteError, - ExtractorEventType.ExtractorAttachmentsStateDeletionError, - ], - ])('maps legacy extractor event %s to %s', (legacy, modern) => { - expect(translateExtractorEventType(legacy)).toBe(modern); - }); - - it.each([ - [ExtractorEventType.DataExtractionDone], - 
[ExtractorEventType.DataExtractionProgress], - [ExtractorEventType.AttachmentExtractionDone], - [ExtractorEventType.MetadataExtractionDone], - [ExtractorEventType.UnknownEventType], - ])('is a no-op for already-modern extractor event %s', (eventType) => { - expect(translateExtractorEventType(eventType)).toBe(eventType); - }); -}); - -describe(translateLoaderEventType.name, () => { - it.each([ - [LoaderEventType.DataLoadingDelay, LoaderEventType.DataLoadingDelayed], - [ - LoaderEventType.AttachmentsLoadingProgress, - LoaderEventType.AttachmentLoadingProgress, - ], - [ - LoaderEventType.AttachmentsLoadingDelayed, - LoaderEventType.AttachmentLoadingDelayed, - ], - [ - LoaderEventType.AttachmentsLoadingDone, - LoaderEventType.AttachmentLoadingDone, - ], - [ - LoaderEventType.AttachmentsLoadingError, - LoaderEventType.AttachmentLoadingError, - ], - ])('maps legacy loader event %s to %s', (legacy, modern) => { - expect(translateLoaderEventType(legacy)).toBe(modern); - }); - - it.each([ - [LoaderEventType.DataLoadingDone], - [LoaderEventType.DataLoadingProgress], - [LoaderEventType.AttachmentLoadingDone], - ])('is a no-op for already-modern loader event %s', (eventType) => { - expect(translateLoaderEventType(eventType)).toBe(eventType); - }); -}); - -describe(translateOutgoingEventType.name, () => { - it('routes extractor events through translateExtractorEventType', () => { - expect( - translateOutgoingEventType(ExtractorEventType.ExtractionDataDone) - ).toBe(ExtractorEventType.DataExtractionDone); - }); - - it('routes loader events through translateLoaderEventType', () => { - expect( - translateOutgoingEventType(LoaderEventType.AttachmentsLoadingDone) - ).toBe(LoaderEventType.AttachmentLoadingDone); - }); - - it('passes through unknown event types unchanged', () => { - const unknown = 'SOME_UNKNOWN_EVENT' as ExtractorEventType; - expect(translateOutgoingEventType(unknown)).toBe(unknown); - }); -}); diff --git a/src/common/event-type-translation.ts 
b/src/common/event-type-translation.ts deleted file mode 100644 index f1a32ab..0000000 --- a/src/common/event-type-translation.ts +++ /dev/null @@ -1,163 +0,0 @@ -import { EventType, ExtractorEventType } from '../types/extraction'; -import { LoaderEventType } from '../types/loading'; - -/** - * Maps old incoming event type strings to new EventType enum values. - * This ensures backwards compatibility when the platform sends old event types. - * @param eventTypeString The raw event type string from the platform - * @returns The translated EventType enum value - */ -export function translateIncomingEventType(eventTypeString: string): EventType { - // Create a reverse mapping from OLD string values to NEW enum member names - const eventTypeMap: Record = { - // Old extraction event types from platform -> New enum members - [EventType.ExtractionExternalSyncUnitsStart]: - EventType.StartExtractingExternalSyncUnits, - [EventType.ExtractionMetadataStart]: EventType.StartExtractingMetadata, - [EventType.ExtractionDataStart]: EventType.StartExtractingData, - [EventType.ExtractionDataContinue]: EventType.ContinueExtractingData, - [EventType.ExtractionDataDelete]: EventType.StartDeletingExtractorState, - [EventType.ExtractionAttachmentsStart]: - EventType.StartExtractingAttachments, - [EventType.ExtractionAttachmentsContinue]: - EventType.ContinueExtractingAttachments, - [EventType.ExtractionAttachmentsDelete]: - EventType.StartDeletingExtractorAttachmentsState, - - // New extraction event types (already correct, map to new enum members) - [EventType.StartExtractingExternalSyncUnits]: - EventType.StartExtractingExternalSyncUnits, - [EventType.StartExtractingMetadata]: EventType.StartExtractingMetadata, - [EventType.StartExtractingData]: EventType.StartExtractingData, - [EventType.ContinueExtractingData]: EventType.ContinueExtractingData, - [EventType.StartDeletingExtractorState]: - EventType.StartDeletingExtractorState, - [EventType.StartExtractingAttachments]: - 
EventType.StartExtractingAttachments, - [EventType.ContinueExtractingAttachments]: - EventType.ContinueExtractingAttachments, - [EventType.StartDeletingExtractorAttachmentsState]: - EventType.StartDeletingExtractorAttachmentsState, - - // Loading events - [EventType.StartLoadingData]: EventType.StartLoadingData, - [EventType.ContinueLoadingData]: EventType.ContinueLoadingData, - [EventType.StartLoadingAttachments]: EventType.StartLoadingAttachments, - [EventType.ContinueLoadingAttachments]: - EventType.ContinueLoadingAttachments, - [EventType.StartDeletingLoaderState]: EventType.StartDeletingLoaderState, - [EventType.StartDeletingLoaderAttachmentState]: - EventType.StartDeletingLoaderAttachmentState, - - // Unknown - [EventType.UnknownEventType]: EventType.UnknownEventType, - }; - - const translated = eventTypeMap[eventTypeString]; - if (!translated) { - console.warn( - `Unknown event type received: ${eventTypeString}. This may indicate a new event type or a typo.` - ); - // Return the original string cast as EventType as a fallback - return eventTypeString as EventType; - } - - return translated; -} - -/** - * Translates ExtractorEventType enum values by converting old enum members to new ones. - * Old enum members are deprecated and should be replaced with new ones. 
- */ -export function translateExtractorEventType( - eventType: ExtractorEventType -): ExtractorEventType { - // Map old enum members to new enum members - const stringValue = eventType as string; - - const mapping: Record = { - // Old string values -> New enum members - [ExtractorEventType.ExtractionExternalSyncUnitsDone]: - ExtractorEventType.ExternalSyncUnitExtractionDone, - [ExtractorEventType.ExtractionExternalSyncUnitsError]: - ExtractorEventType.ExternalSyncUnitExtractionError, - [ExtractorEventType.ExtractionMetadataDone]: - ExtractorEventType.MetadataExtractionDone, - [ExtractorEventType.ExtractionMetadataError]: - ExtractorEventType.MetadataExtractionError, - [ExtractorEventType.ExtractionDataProgress]: - ExtractorEventType.DataExtractionProgress, - [ExtractorEventType.ExtractionDataDelay]: - ExtractorEventType.DataExtractionDelayed, - [ExtractorEventType.ExtractionDataDone]: - ExtractorEventType.DataExtractionDone, - [ExtractorEventType.ExtractionDataError]: - ExtractorEventType.DataExtractionError, - [ExtractorEventType.ExtractionDataDeleteDone]: - ExtractorEventType.ExtractorStateDeletionDone, - [ExtractorEventType.ExtractionDataDeleteError]: - ExtractorEventType.ExtractorStateDeletionError, - [ExtractorEventType.ExtractionAttachmentsProgress]: - ExtractorEventType.AttachmentExtractionProgress, - [ExtractorEventType.ExtractionAttachmentsDelay]: - ExtractorEventType.AttachmentExtractionDelayed, - [ExtractorEventType.ExtractionAttachmentsDone]: - ExtractorEventType.AttachmentExtractionDone, - [ExtractorEventType.ExtractionAttachmentsError]: - ExtractorEventType.AttachmentExtractionError, - [ExtractorEventType.ExtractionAttachmentsDeleteDone]: - ExtractorEventType.ExtractorAttachmentsStateDeletionDone, - [ExtractorEventType.ExtractionAttachmentsDeleteError]: - ExtractorEventType.ExtractorAttachmentsStateDeletionError, - }; - - // If there's a mapping, use it; otherwise return original (already new) - return mapping[stringValue] ?? 
eventType; -} - -/** - * Translates LoaderEventType enum values by converting old enum members to new ones. - * Old enum members are deprecated and should be replaced with new ones. - */ -export function translateLoaderEventType( - eventType: LoaderEventType -): LoaderEventType { - // Map old enum members to new enum members - const stringValue = eventType as string; - - const mapping: Record = { - // Old string values -> New enum members - [LoaderEventType.DataLoadingDelay]: LoaderEventType.DataLoadingDelayed, - [LoaderEventType.AttachmentsLoadingProgress]: - LoaderEventType.AttachmentLoadingProgress, - [LoaderEventType.AttachmentsLoadingDelayed]: - LoaderEventType.AttachmentLoadingDelayed, - [LoaderEventType.AttachmentsLoadingDone]: - LoaderEventType.AttachmentLoadingDone, - [LoaderEventType.AttachmentsLoadingError]: - LoaderEventType.AttachmentLoadingError, - }; - - // If there's a mapping, use it; otherwise return original (already new) - return mapping[stringValue] ?? eventType; -} - -/** - * Translates any outgoing event type (Extractor or Loader) to ensure new event types are used. 
- */ -export function translateOutgoingEventType( - eventType: ExtractorEventType | LoaderEventType -): ExtractorEventType | LoaderEventType { - // Check if it's an ExtractorEventType by checking if the value exists in ExtractorEventType - if ( - Object.values(ExtractorEventType).includes(eventType as ExtractorEventType) - ) { - return translateExtractorEventType(eventType as ExtractorEventType); - } - // Otherwise treat as LoaderEventType - if (Object.values(LoaderEventType).includes(eventType as LoaderEventType)) { - return translateLoaderEventType(eventType as LoaderEventType); - } - // If neither, return as-is - return eventType; -} diff --git a/src/multithreading/process-task.ts b/src/multithreading/process-task.ts index 4f3e75a..0818f55 100644 --- a/src/multithreading/process-task.ts +++ b/src/multithreading/process-task.ts @@ -1,5 +1,4 @@ import { isMainThread, parentPort, workerData } from 'node:worker_threads'; -import { translateIncomingEventType } from '../common/event-type-translation'; import { Logger, serializeError } from '../logger/logger'; import { runWithSdkLogContext, @@ -26,11 +25,6 @@ export function processTask({ try { const event = workerData.event; - // TODO: Remove when the old types are completely phased out - event.payload.event_type = translateIncomingEventType( - event.payload.event_type - ); - const initialState = workerData.initialState as ConnectorState; const initialDomainMapping = workerData.initialDomainMapping; const options = workerData.options; diff --git a/src/multithreading/spawn/spawn.helpers.ts b/src/multithreading/spawn/spawn.helpers.ts index abbe990..3db03df 100644 --- a/src/multithreading/spawn/spawn.helpers.ts +++ b/src/multithreading/spawn/spawn.helpers.ts @@ -10,48 +10,40 @@ export function getTimeoutErrorEventType(eventType: EventType): { eventType: ExtractorEventType | LoaderEventType; } { switch (eventType) { - // Metadata extraction (handles both old and new enum members) + // Metadata extraction case 
EventType.StartExtractingMetadata: - case EventType.ExtractionMetadataStart: return { eventType: ExtractorEventType.MetadataExtractionError, }; - // Data extraction (handles both old and new enum members) + // Data extraction case EventType.StartExtractingData: case EventType.ContinueExtractingData: - case EventType.ExtractionDataStart: - case EventType.ExtractionDataContinue: return { eventType: ExtractorEventType.DataExtractionError, }; - // Data deletion (handles both old and new enum members) + // Data deletion case EventType.StartDeletingExtractorState: - case EventType.ExtractionDataDelete: return { eventType: ExtractorEventType.ExtractorStateDeletionError, }; - // Attachments extraction (handles both old and new enum members) + // Attachments extraction case EventType.StartExtractingAttachments: case EventType.ContinueExtractingAttachments: - case EventType.ExtractionAttachmentsStart: - case EventType.ExtractionAttachmentsContinue: return { eventType: ExtractorEventType.AttachmentExtractionError, }; - // Attachments deletion (handles both old and new enum members) + // Attachments deletion case EventType.StartDeletingExtractorAttachmentsState: - case EventType.ExtractionAttachmentsDelete: return { eventType: ExtractorEventType.ExtractorAttachmentsStateDeletionError, }; - // External sync units (handles both old and new enum members) + // External sync units case EventType.StartExtractingExternalSyncUnits: - case EventType.ExtractionExternalSyncUnitsStart: return { eventType: ExtractorEventType.ExternalSyncUnitExtractionError, }; diff --git a/src/multithreading/spawn/spawn.ts b/src/multithreading/spawn/spawn.ts index 7743268..02176ff 100644 --- a/src/multithreading/spawn/spawn.ts +++ b/src/multithreading/spawn/spawn.ts @@ -2,7 +2,6 @@ import yargs from 'yargs'; import { hideBin } from 'yargs/helpers'; import { emit } from '../../common/control-protocol'; -import { translateIncomingEventType } from '../../common/event-type-translation'; import { getMemoryUsage 
} from '../../common/helpers'; import { Logger, serializeError } from '../../logger/logger'; import { AirSyncEvent, EventType } from '../../types/extraction'; @@ -79,14 +78,6 @@ export async function spawn({ options, baseWorkerPath, }: SpawnFactoryInterface): Promise { - // Translate incoming event type for backwards compatibility. This allows the - // SDK to accept both old and new event type formats. Then update the event with the translated event type. - const originalEventType = event.payload.event_type; - const translatedEventType = translateIncomingEventType( - event.payload.event_type as string - ); - event.payload.event_type = translatedEventType; - // Read the command line arguments to check if the local flag is passed. const argv = await yargs(hideBin(process.argv)).argv; if (argv._.includes('local') || argv.local) { @@ -100,11 +91,6 @@ export async function spawn({ // eslint-disable-next-line no-global-assign console = new Logger({ event, options }); - if (translatedEventType !== originalEventType) { - console.log( - `Event type translated from ${originalEventType} to ${translatedEventType}.` - ); - } if (options?.isLocalDevelopment) { console.log('Snap-in is running in local development mode.'); } @@ -115,11 +101,11 @@ export async function spawn({ } else if ( baseWorkerPath != null && options?.workerPathOverrides != null && - options.workerPathOverrides[translatedEventType as EventType] != null + options.workerPathOverrides[event.payload.event_type as EventType] != null ) { script = baseWorkerPath + - options.workerPathOverrides[translatedEventType as EventType]; + options.workerPathOverrides[event.payload.event_type as EventType]; } else { script = getWorkerPath({ event, diff --git a/src/multithreading/worker-adapter/worker-adapter.ts b/src/multithreading/worker-adapter/worker-adapter.ts index d938287..b89d889 100644 --- a/src/multithreading/worker-adapter/worker-adapter.ts +++ b/src/multithreading/worker-adapter/worker-adapter.ts @@ -58,7 +58,6 @@ 
import { } from '../../types/workers'; import { Uploader } from '../../uploader/uploader'; import { Artifact, SsorAttachment } from '../../uploader/uploader.interfaces'; -import { translateOutgoingEventType } from '../../common/event-type-translation'; import { truncateMessage } from '../../common/helpers'; export function createWorkerAdapter({ @@ -244,8 +243,6 @@ export class WorkerAdapter { data?: EventData ): Promise { return runWithSdkLogContext(async () => { - newEventType = translateOutgoingEventType(newEventType); - if (this.hasWorkerEmitted) { console.warn( `Trying to emit event with event type: ${newEventType}. Ignoring emit request because it has already been emitted.` diff --git a/src/types/extraction.ts b/src/types/extraction.ts index c902e0b..34189d0 100644 --- a/src/types/extraction.ts +++ b/src/types/extraction.ts @@ -14,39 +14,15 @@ import { DonV2, LoaderReport, RateLimited } from './loading'; * The external extractor can use these events to know what to do next in the extraction process. 
*/ export enum EventType { - // Extraction - Old member names with OLD values (deprecated, kept for backwards compatibility) - /** - * @deprecated Use StartExtractingExternalSyncUnits instead - */ - ExtractionExternalSyncUnitsStart = 'EXTRACTION_EXTERNAL_SYNC_UNITS_START', - /** - * @deprecated Use StartExtractingMetadata instead - */ - ExtractionMetadataStart = 'EXTRACTION_METADATA_START', - /** - * @deprecated Use StartExtractingData instead - */ - ExtractionDataStart = 'EXTRACTION_DATA_START', - /** - * @deprecated Use ContinueExtractingData instead - */ - ExtractionDataContinue = 'EXTRACTION_DATA_CONTINUE', - /** - * @deprecated Use StartDeletingExtractorState instead - */ - ExtractionDataDelete = 'EXTRACTION_DATA_DELETE', - /** - * @deprecated Use StartExtractingAttachments instead - */ - ExtractionAttachmentsStart = 'EXTRACTION_ATTACHMENTS_START', - /** - * @deprecated Use ContinueExtractingAttachments instead - */ - ExtractionAttachmentsContinue = 'EXTRACTION_ATTACHMENTS_CONTINUE', - /** - * @deprecated Use StartDeletingExtractorAttachmentsState instead - */ - ExtractionAttachmentsDelete = 'EXTRACTION_ATTACHMENTS_DELETE', + // Extraction + StartExtractingExternalSyncUnits = 'START_EXTRACTING_EXTERNAL_SYNC_UNITS', + StartExtractingMetadata = 'START_EXTRACTING_METADATA', + StartExtractingData = 'START_EXTRACTING_DATA', + ContinueExtractingData = 'CONTINUE_EXTRACTING_DATA', + StartDeletingExtractorState = 'START_DELETING_EXTRACTOR_STATE', + StartExtractingAttachments = 'START_EXTRACTING_ATTACHMENTS', + ContinueExtractingAttachments = 'CONTINUE_EXTRACTING_ATTACHMENTS', + StartDeletingExtractorAttachmentsState = 'START_DELETING_EXTRACTOR_ATTACHMENTS_STATE', // Loading StartLoadingData = 'START_LOADING_DATA', @@ -58,16 +34,6 @@ export enum EventType { // Unknown UnknownEventType = 'UNKNOWN_EVENT_TYPE', - - // Extraction - New member names with NEW values (preferred) - StartExtractingExternalSyncUnits = 'START_EXTRACTING_EXTERNAL_SYNC_UNITS', - 
StartExtractingMetadata = 'START_EXTRACTING_METADATA', - StartExtractingData = 'START_EXTRACTING_DATA', - ContinueExtractingData = 'CONTINUE_EXTRACTING_DATA', - StartDeletingExtractorState = 'START_DELETING_EXTRACTOR_STATE', - StartExtractingAttachments = 'START_EXTRACTING_ATTACHMENTS', - ContinueExtractingAttachments = 'CONTINUE_EXTRACTING_ATTACHMENTS', - StartDeletingExtractorAttachmentsState = 'START_DELETING_EXTRACTOR_ATTACHMENTS_STATE', } /** @@ -75,76 +41,7 @@ export enum EventType { * The external extractor can use these events to inform AirSync about the progress of the extraction process. */ export enum ExtractorEventType { - // Extraction - Old member names with OLD values (deprecated, kept for backwards compatibility) - /** - * @deprecated Use ExternalSyncUnitExtractionDone instead - */ - ExtractionExternalSyncUnitsDone = 'EXTRACTION_EXTERNAL_SYNC_UNITS_DONE', - /** - * @deprecated Use ExternalSyncUnitExtractionError instead - */ - ExtractionExternalSyncUnitsError = 'EXTRACTION_EXTERNAL_SYNC_UNITS_ERROR', - /** - * @deprecated Use MetadataExtractionDone instead - */ - ExtractionMetadataDone = 'EXTRACTION_METADATA_DONE', - /** - * @deprecated Use MetadataExtractionError instead - */ - ExtractionMetadataError = 'EXTRACTION_METADATA_ERROR', - /** - * @deprecated Use DataExtractionProgress instead - */ - ExtractionDataProgress = 'EXTRACTION_DATA_PROGRESS', - /** - * @deprecated Use DataExtractionDelayed instead - */ - ExtractionDataDelay = 'EXTRACTION_DATA_DELAY', - /** - * @deprecated Use DataExtractionDone instead - */ - ExtractionDataDone = 'EXTRACTION_DATA_DONE', - /** - * @deprecated Use DataExtractionError instead - */ - ExtractionDataError = 'EXTRACTION_DATA_ERROR', - /** - * @deprecated Use ExtractorStateDeletionDone instead - */ - ExtractionDataDeleteDone = 'EXTRACTION_DATA_DELETE_DONE', - /** - * @deprecated Use ExtractorStateDeletionError instead - */ - ExtractionDataDeleteError = 'EXTRACTION_DATA_DELETE_ERROR', - /** - * @deprecated Use 
AttachmentExtractionProgress instead - */ - ExtractionAttachmentsProgress = 'EXTRACTION_ATTACHMENTS_PROGRESS', - /** - * @deprecated Use AttachmentExtractionDelayed instead - */ - ExtractionAttachmentsDelay = 'EXTRACTION_ATTACHMENTS_DELAY', - /** - * @deprecated Use AttachmentExtractionDone instead - */ - ExtractionAttachmentsDone = 'EXTRACTION_ATTACHMENTS_DONE', - /** - * @deprecated Use AttachmentExtractionError instead - */ - ExtractionAttachmentsError = 'EXTRACTION_ATTACHMENTS_ERROR', - /** - * @deprecated Use ExtractorAttachmentsStateDeletionDone instead - */ - ExtractionAttachmentsDeleteDone = 'EXTRACTION_ATTACHMENTS_DELETE_DONE', - /** - * @deprecated Use ExtractorAttachmentsStateDeletionError instead - */ - ExtractionAttachmentsDeleteError = 'EXTRACTION_ATTACHMENTS_DELETE_ERROR', - - // Unknown - UnknownEventType = 'UNKNOWN_EVENT_TYPE', - - // Extraction - New member names with NEW values (preferred) + // Extraction ExternalSyncUnitExtractionDone = 'EXTERNAL_SYNC_UNIT_EXTRACTION_DONE', ExternalSyncUnitExtractionError = 'EXTERNAL_SYNC_UNIT_EXTRACTION_ERROR', MetadataExtractionDone = 'METADATA_EXTRACTION_DONE', @@ -161,6 +58,9 @@ export enum ExtractorEventType { AttachmentExtractionError = 'ATTACHMENT_EXTRACTION_ERROR', ExtractorAttachmentsStateDeletionDone = 'EXTRACTOR_ATTACHMENTS_STATE_DELETION_DONE', ExtractorAttachmentsStateDeletionError = 'EXTRACTOR_ATTACHMENTS_STATE_DELETION_ERROR', + + // Unknown + UnknownEventType = 'UNKNOWN_EVENT_TYPE', } /** diff --git a/src/types/loading.ts b/src/types/loading.ts index 3ab48ae..37d2d30 100644 --- a/src/types/loading.ts +++ b/src/types/loading.ts @@ -135,13 +135,9 @@ export type SyncMapperRecord = { input_file?: string; }; -/* eslint-disable @typescript-eslint/no-duplicate-enum-values */ export enum LoaderEventType { DataLoadingProgress = 'DATA_LOADING_PROGRESS', - /** - * @deprecated This was a typo. 
Use DataLoadingDelayed for the corrected spelling - */ - DataLoadingDelay = 'DATA_LOADING_DELAYED', + DataLoadingDelayed = 'DATA_LOADING_DELAYED', DataLoadingDone = 'DATA_LOADING_DONE', DataLoadingError = 'DATA_LOADING_ERROR', @@ -157,22 +153,4 @@ export enum LoaderEventType { LoaderAttachmentStateDeletionError = 'LOADER_ATTACHMENT_STATE_DELETION_ERROR', UnknownEventType = 'UNKNOWN_EVENT_TYPE', - DataLoadingDelayed = 'DATA_LOADING_DELAYED', - - /** - * @deprecated Use AttachmentsLoadingProgress instead (note: singular changed to plural) - */ - AttachmentsLoadingProgress = 'ATTACHMENT_LOADING_PROGRESS', - /** - * @deprecated Use AttachmentsLoadingDelayed instead (note: singular changed to plural) - */ - AttachmentsLoadingDelayed = 'ATTACHMENT_LOADING_DELAYED', - /** - * @deprecated Use AttachmentsLoadingDone instead (note: singular changed to plural) - */ - AttachmentsLoadingDone = 'ATTACHMENT_LOADING_DONE', - /** - * @deprecated Use AttachmentsLoadingError instead (note: singular changed to plural) - */ - AttachmentsLoadingError = 'ATTACHMENT_LOADING_ERROR', } From 1c15f885a9f0eee1603e7c9e101558874676e90e Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 08:55:40 +0200 Subject: [PATCH 09/22] docs: mark C3 done Co-Authored-By: Claude Opus 4.8 (1M context) --- V2_PROGRESS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index ed65b4b..108de42 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -188,7 +188,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: | C0 package rename | ☑ done | 8ddeb87. @devrev/ts-adaas→@devrev/airsync-sdk, v2.0.0-beta.0. Report filename rename deferred to C8. | | C1 delete + tsconfig | ☑ done | d573cb6. Deleted src/deprecated/ (6 files) + 4 index exports; added tsconfig.build.json (excludes tests), build script points to it. Reviewer-approved. event-type-translation deletion moved to C3. | | C2 AirSync rename | ☑ done | 1fa9afc. 
AirdropEvent→AirSyncEvent, AirdropMessage→AirSyncMessage (hard, no alias) + prose ADaaS/Airdrop→AirSync. Protected: airdrop.* routes, AIRDROP_* enum, 'ADaaS' literal. Reviewer-approved. | -| C3 enum cleanup | ☐ todo | | +| C3 enum cleanup | ☑ done | cc05f41. Deleted deprecated enum members (EventType, ExtractorEventType, LoaderEventType) + event-type-translation.ts/.test; rewired 4 callers (process-task, spawn, control-protocol, worker-adapter) + spawn.helpers cases. Behavior-equivalent. Reviewer-approved. | | C4a state split | ☐ todo | | | C4b state envelope | ☐ todo | | | C5 adapter split | ☐ todo | | From b63f3aba0db87a1304b19e61407147adea91d160 Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 09:10:14 +0200 Subject: [PATCH 10/22] refactor(v2): split State into BaseState + ExtractionState + LoadingState Structural split of the monolithic State class into a shared abstract BaseState plus mode-specific ExtractionState and LoadingState subclasses. The on-disk state shape is unchanged: still the flat AdapterState = ConnectorState & SdkState (the connector/SDK envelope split comes in a later commit). - base-state.ts: abstract BaseState owns the shared lifecycle (init, fetch, postState) and installInitialDomainMappingIfNeeded, extracted from the old createAdapterState factory. - extraction-state.ts: ExtractionState seeds extractionSdkState and adds resolveExtractionWindow (time-value resolution, pending-boundary reuse, lastSyncStarted, window validation); createExtractionState factory. - loading-state.ts: LoadingState seeds loadingSdkState; createLoadingState factory. - state.ts: thin module re-exporting the classes/factories; createAdapterState is now a dispatcher selecting the mode-specific state by event_context.mode. - Consumers (types/workers.ts, worker-adapter.ts) reference BaseState. Behavior is preserved. 
Loading mode previously ran extraction-window code that was inert for loading events (no matching event types, no pending boundaries in loadingSdkState); routing it to LoadingState only drops those inert log lines. Ref: V2_PROGRESS.md C4a Co-Authored-By: Claude Opus 4.8 (1M context) --- .../worker-adapter/worker-adapter.ts | 4 +- src/state/base-state.ts | 246 ++++++++++++ src/state/extraction-state.ts | 150 ++++++++ src/state/loading-state.ts | 45 +++ src/state/state.ts | 362 ++---------------- src/types/workers.ts | 4 +- 6 files changed, 468 insertions(+), 343 deletions(-) create mode 100644 src/state/base-state.ts create mode 100644 src/state/extraction-state.ts create mode 100644 src/state/loading-state.ts diff --git a/src/multithreading/worker-adapter/worker-adapter.ts b/src/multithreading/worker-adapter/worker-adapter.ts index b89d889..23e2550 100644 --- a/src/multithreading/worker-adapter/worker-adapter.ts +++ b/src/multithreading/worker-adapter/worker-adapter.ts @@ -24,7 +24,7 @@ import { NormalizedAttachment, RepoInterface, } from '../../repo/repo.interfaces'; -import { State } from '../../state/state'; +import { BaseState } from '../../state/state'; import { AdapterState } from '../../state/state.interfaces'; import { AirSyncEvent, @@ -93,7 +93,7 @@ export class WorkerAdapter { isTimeout: boolean; hasWorkerEmitted: boolean; - private adapterState: State; + private adapterState: BaseState; private _artifacts: Artifact[]; private repos: Repo[] = []; private currentEventDataLength: number = 0; diff --git a/src/state/base-state.ts b/src/state/base-state.ts new file mode 100644 index 0000000..3c9f8a9 --- /dev/null +++ b/src/state/base-state.ts @@ -0,0 +1,246 @@ +import axios from 'axios'; +import { parentPort } from 'node:worker_threads'; + +import { installInitialDomainMapping } from '../common/install-initial-domain-mapping'; +import { axiosClient } from '../http/axios-client-internal'; +import { getPrintableState, serializeError } from '../logger/logger'; 
+import { InitialDomainMapping } from '../types/common'; +import { AirSyncEvent } from '../types/extraction'; +import { WorkerMessageSubject } from '../types/workers'; +import { ExtractionScope } from '../types/workers'; + +import { AdapterState, SdkState, StateInterface } from './state.interfaces'; + +/** + * BaseState owns the state lifecycle shared by every sync mode: holding the + * adapter state, fetching/initializing/posting it against the platform, and the + * snap-in-version-gated initial domain mapping install. + * + * Mode-specific subclasses (`ExtractionState`, `LoadingState`) seed the + * SDK-owned portion of the state and add mode-specific setup in their factories. + * + * @typeParam ConnectorState - the connector-owned state shape + */ +export abstract class BaseState<ConnectorState> { + protected _state: AdapterState<ConnectorState>; + protected _extractionScope: ExtractionScope = {}; + protected readonly initialSdkState: SdkState; + protected readonly event: AirSyncEvent; + private workerUrl: string; + private devrevToken: string; + private syncUnitId: string; + private requestId: string; + + constructor( + { event, initialState }: StateInterface<ConnectorState>, + initialSdkState: SdkState + ) { + this.event = event; + this.initialSdkState = initialSdkState; + this._state = { + ...initialState, + ...initialSdkState, + } as AdapterState<ConnectorState>; + this.workerUrl = event.payload.event_context.worker_data_url; + this.devrevToken = event.context.secrets.service_account_token; + this.syncUnitId = event.payload.event_context.sync_unit_id; + this.requestId = event.payload.event_context.request_id_adaas; + } + + get state(): AdapterState<ConnectorState> { + return this._state; + } + + set state(value: AdapterState<ConnectorState>) { + this._state = value; + } + + get extractionScope(): ExtractionScope { + return this._extractionScope; + } + + /** + * Installs the initial domain mapping when the snap-in version in state does + * not match the version in the event context.
Shared by all modes so that a + * loading run still installs the mapping if extraction has not done so. + * @param initialDomainMapping The initial domain mapping passed to spawn + */ + async installInitialDomainMappingIfNeeded( + initialDomainMapping?: InitialDomainMapping + ): Promise<void> { + const snapInVersionId = this.event.context.snap_in_version_id; + const hasSnapInVersionInState = 'snapInVersionId' in this.state; + const shouldUpdateIDM = + !hasSnapInVersionInState || + this.state.snapInVersionId !== snapInVersionId; + + if (!shouldUpdateIDM) { + console.log( + `Snap-in version in state matches the version in event context "${snapInVersionId}". Skipping initial domain mapping installation.` + ); + return; + } + + try { + console.log( + `Snap-in version in state "${this.state.snapInVersionId}" does not match the version in event context "${snapInVersionId}". Installing initial domain mapping.` + ); + + if (initialDomainMapping) { + await installInitialDomainMapping(this.event, initialDomainMapping); + this.state.snapInVersionId = snapInVersionId; + } else { + throw new Error( + 'No initial domain mapping was passed to spawn function. Skipping initial domain mapping installation.' + ); + } + } catch (error) { + const errorMessage = `Error while installing initial domain mapping. ${serializeError( + error + )}`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + } + } + + /** + * Initializes the state for this adapter instance by fetching from API + * or creating an initial state if none exists (404).
+ * @param initialState The initial connector state provided by the spawn function + */ + async init(initialState: ConnectorState): Promise<void> { + try { + const { state: stringifiedState, objects } = await this.fetchState(); + if (!stringifiedState) { + throw new Error('No state found in response.'); + } + + let parsedState: AdapterState<ConnectorState>; + try { + parsedState = JSON.parse(stringifiedState); + } catch (error) { + throw new Error(`Failed to parse state. ${error}`); + } + + this.state = parsedState; + console.log( + 'State fetched successfully. Current state', + getPrintableState(this.state) + ); + + if (objects) { + try { + this._extractionScope = JSON.parse(objects); + } catch (error) { + console.warn(`Failed to parse extractionScope. ${error}`); + } + } + } catch (error) { + if (axios.isAxiosError(error) && error.response?.status === 404) { + console.log('State not found. Initializing state with initial state.'); + const initialAdapterState: AdapterState<ConnectorState> = { + ...initialState, + ...this.initialSdkState, + }; + + this.state = initialAdapterState; + await this.postState(initialAdapterState); + } else { + const errorMessage = `Failed to init state. ${serializeError(error)}`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + } + } + } + + /** + * Updates the state of the adapter by posting to API. + * @param {object} state - The state to be updated + */ + async postState(state?: AdapterState<ConnectorState>) { + const url = this.workerUrl + '.update'; + this.state = state || this.state; + + let stringifiedState: string; + try { + stringifiedState = JSON.stringify(this.state); + } catch (error) { + const errorMessage = `Failed to stringify state.
${serializeError( + error + )}`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + } + + try { + await axiosClient.post( + url, + { + state: stringifiedState, + }, + { + headers: { + Authorization: this.devrevToken, + }, + params: { + sync_unit: this.syncUnitId, + request_id: this.requestId, + }, + } + ); + + console.log( + 'State updated successfully to', + getPrintableState(this.state) + ); + } catch (error) { + const errorMessage = `Failed to update the state. ${serializeError( + error + )}`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + } + } + + /** + * Fetches the state of the adapter from API. + * @return The raw state data from API + */ + async fetchState(): Promise<{ state: string; objects?: string }> { + console.log( + `Fetching state with sync unit id ${this.syncUnitId} and request id ${this.requestId}.` + ); + + const url = this.workerUrl + '.get'; + const response = await axiosClient.get(url, { + headers: { + Authorization: this.devrevToken, + }, + params: { + sync_unit: this.syncUnitId, + request_id: this.requestId, + }, + }); + + return { + state: response.data?.state, + objects: response.data?.objects, + }; + } +} diff --git a/src/state/extraction-state.ts b/src/state/extraction-state.ts new file mode 100644 index 0000000..bb99a0f --- /dev/null +++ b/src/state/extraction-state.ts @@ -0,0 +1,150 @@ +import { parentPort } from 'node:worker_threads'; + +import { STATELESS_EVENT_TYPES } from '../common/constants'; +import { resolveTimeValue } from '../common/time-value-resolver'; +import { serializeError } from '../logger/logger'; +import { EventType } from '../types/extraction'; +import { WorkerMessageSubject } from '../types/workers'; + +import { BaseState } from './base-state'; +import { 
extractionSdkState, StateInterface } from './state.interfaces'; + +/** + * ExtractionState is the per-mode state for extraction workers. It seeds the + * extraction SDK state (extraction boundaries + attachments bookkeeping) on top + * of the shared lifecycle provided by `BaseState` and adds extraction-window + * resolution. + */ +export class ExtractionState<ConnectorState> extends BaseState<ConnectorState> { + constructor(params: StateInterface<ConnectorState>) { + super(params, extractionSdkState); + } + + /** + * Resolves the extraction window onto the event context. + * + * On StartExtractingData: stamp `lastSyncStarted` if not already set. + * On StartExtractingMetadata: resolve fresh from the TimeValue objects in the + * event context and cache them as pending boundaries (always overwrite). + * On all other events: reuse the pending boundaries cached during + * StartExtractingMetadata. Finally, validate that extract_from < extract_to. + */ + resolveExtractionWindow(): void { + // Set lastSyncStarted if the event type is StartExtractingData + if ( + this.event.payload.event_type === EventType.StartExtractingData && + !this.state.lastSyncStarted + ) { + this.state.lastSyncStarted = new Date().toISOString(); + console.log(`Setting lastSyncStarted to ${this.state.lastSyncStarted}.`); + } + + const eventContext = this.event.payload.event_context; + + if (this.event.payload.event_type === EventType.StartExtractingMetadata) { + const timeFields = [ + { + source: 'extraction_start_time', + target: 'extract_from', + pending: 'pendingWorkersOldest', + }, + { + source: 'extraction_end_time', + target: 'extract_to', + pending: 'pendingWorkersNewest', + }, + ] as const; + + for (const { source, target, pending } of timeFields) { + const timeValue = eventContext[source]; + if (timeValue && timeValue.type) { + try { + const resolved = resolveTimeValue(timeValue, this.state); + eventContext[target] = resolved; + this.state[pending] = resolved; + console.log( + `Resolved ${target} to ${resolved}.
Stored in ${pending}.` + ); + } catch (error) { + const errorMessage = `Failed to resolve ${source}: ${serializeError( + error + )}`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + } + } + } + } else { + // Non-StartExtractingMetadata events: reuse pending values from state + if (this.state.pendingWorkersOldest) { + eventContext.extract_from = this.state.pendingWorkersOldest; + console.log( + `Reusing pendingWorkersOldest as extract_from: ${this.state.pendingWorkersOldest}.` + ); + } else { + console.log( + 'pendingWorkersOldest is not set in state. extract_from will not be populated for this invocation.' + ); + } + if (this.state.pendingWorkersNewest) { + eventContext.extract_to = this.state.pendingWorkersNewest; + console.log( + `Reusing pendingWorkersNewest as extract_to: ${this.state.pendingWorkersNewest}.` + ); + } else { + console.log( + 'pendingWorkersNewest is not set in state. extract_to will not be populated for this invocation.' + ); + } + } + + // Validate that extract_from is before extract_to + if (eventContext.extract_from && eventContext.extract_to) { + if (eventContext.extract_from >= eventContext.extract_to) { + const errorMessage = `Invalid extraction window: extract_from (${eventContext.extract_from}) must be older than extract_to (${eventContext.extract_to}). This indicates an error in the platform.`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + } + } + } +} + +/** + * Creates and initializes an `ExtractionState` for an extraction worker. + * + * For non-stateless events this fetches persisted state, installs the initial + * domain mapping if the snap-in version changed, then resolves the extraction + * window (time-value resolution + pending boundary reuse) and validates it. 
+ */ +export async function createExtractionState<ConnectorState>({ + event, + initialState, + initialDomainMapping, + options, +}: StateInterface<ConnectorState>): Promise<ExtractionState<ConnectorState>> { + // Deep clone the initial state to avoid mutating the original state + const deepCloneInitialState: ConnectorState = structuredClone(initialState); + + const state = new ExtractionState({ + event, + initialState: deepCloneInitialState, + initialDomainMapping, + options, + }); + + if (!STATELESS_EVENT_TYPES.includes(event.payload.event_type)) { + await state.init(deepCloneInitialState); + await state.installInitialDomainMappingIfNeeded(initialDomainMapping); + state.resolveExtractionWindow(); + } + + return state; +} diff --git a/src/state/loading-state.ts b/src/state/loading-state.ts new file mode 100644 index 0000000..151f5bd --- /dev/null +++ b/src/state/loading-state.ts @@ -0,0 +1,45 @@ +import { STATELESS_EVENT_TYPES } from '../common/constants'; + +import { BaseState } from './base-state'; +import { loadingSdkState, StateInterface } from './state.interfaces'; + +/** + * LoadingState is the per-mode state for loading workers. It seeds the loading + * SDK state (files-to-load bookkeeping) on top of the shared lifecycle provided + * by `BaseState`. Loading has no extraction-window resolution. + */ +export class LoadingState<ConnectorState> extends BaseState<ConnectorState> { + constructor(params: StateInterface<ConnectorState>) { + super(params, loadingSdkState); + } +} + +/** + * Creates and initializes a `LoadingState` for a loading worker. + * + * For non-stateless events this fetches persisted state and installs the + * initial domain mapping if the snap-in version changed.
+ */ +export async function createLoadingState<ConnectorState>({ + event, + initialState, + initialDomainMapping, + options, +}: StateInterface<ConnectorState>): Promise<LoadingState<ConnectorState>> { + // Deep clone the initial state to avoid mutating the original state + const deepCloneInitialState: ConnectorState = structuredClone(initialState); + + const state = new LoadingState({ + event, + initialState: deepCloneInitialState, + initialDomainMapping, + options, + }); + + if (!STATELESS_EVENT_TYPES.includes(event.payload.event_type)) { + await state.init(deepCloneInitialState); + await state.installInitialDomainMappingIfNeeded(initialDomainMapping); + } + + return state; +} diff --git a/src/state/state.ts b/src/state/state.ts index 6f13224..ab7dbb5 100644 --- a/src/state/state.ts +++ b/src/state/state.ts @@ -1,342 +1,26 @@ -import axios from 'axios'; -import { parentPort } from 'node:worker_threads'; - -import { STATELESS_EVENT_TYPES } from '../common/constants'; -import { installInitialDomainMapping } from '../common/install-initial-domain-mapping'; -import { resolveTimeValue } from '../common/time-value-resolver'; -import { axiosClient } from '../http/axios-client-internal'; -import { getPrintableState, serializeError } from '../logger/logger'; import { SyncMode } from '../types/common'; -import { EventType } from '../types/extraction'; -import { WorkerMessageSubject } from '../types/workers'; - -import { - AdapterState, - extractionSdkState, - loadingSdkState, - SdkState, - StateInterface, -} from './state.interfaces'; -import { ExtractionScope } from '../types/workers'; - -export async function createAdapterState<ConnectorState>({ - event, - initialState, - initialDomainMapping, - options, -}: StateInterface<ConnectorState>): Promise<State<ConnectorState>> { - // Deep clone the initial state to avoid mutating the original state - const deepCloneInitialState: ConnectorState = structuredClone(initialState); - - const as = new State({ - event, - initialState: deepCloneInitialState, - initialDomainMapping, - options, - }); - - if
(!STATELESS_EVENT_TYPES.includes(event.payload.event_type)) { - await as.init(deepCloneInitialState); - - // Check if IDM needs to be updated - const snapInVersionId = event.context.snap_in_version_id; - const hasSnapInVersionInState = 'snapInVersionId' in as.state; - const shouldUpdateIDM = - !hasSnapInVersionInState || as.state.snapInVersionId !== snapInVersionId; - - if (!shouldUpdateIDM) { - console.log( - `Snap-in version in state matches the version in event context "${snapInVersionId}". Skipping initial domain mapping installation.` - ); - } else { - try { - console.log( - `Snap-in version in state "${as.state.snapInVersionId}" does not match the version in event context "${snapInVersionId}". Installing initial domain mapping.` - ); - - if (initialDomainMapping) { - await installInitialDomainMapping(event, initialDomainMapping); - as.state.snapInVersionId = snapInVersionId; - } else { - throw new Error( - 'No initial domain mapping was passed to spawn function. Skipping initial domain mapping installation.' - ); - } - } catch (error) { - const errorMessage = `Error while installing initial domain mapping. ${serializeError( - error - )}`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - } - } - - // Set lastSyncStarted if the event type is StartExtractingData - if ( - event.payload.event_type === EventType.StartExtractingData && - !as.state.lastSyncStarted - ) { - as.state.lastSyncStarted = new Date().toISOString(); - console.log(`Setting lastSyncStarted to ${as.state.lastSyncStarted}.`); - } - - // Resolve extraction timestamps from TimeValue objects, or reuse pending values from a prior invocation. - // On StartExtractingMetadata: resolve fresh from TimeValue objects and store in pending state (always overwrite). - // On all other events: reuse the pending values cached during StartExtractingMetadata. 
- const eventContext = event.payload.event_context; - if (event.payload.event_type === EventType.StartExtractingMetadata) { - const timeFields = [ - { - source: 'extraction_start_time', - target: 'extract_from', - pending: 'pendingWorkersOldest', - }, - { - source: 'extraction_end_time', - target: 'extract_to', - pending: 'pendingWorkersNewest', - }, - ] as const; - - for (const { source, target, pending } of timeFields) { - const timeValue = eventContext[source]; - if (timeValue && timeValue.type) { - try { - const resolved = resolveTimeValue(timeValue, as.state); - eventContext[target] = resolved; - as.state[pending] = resolved; - console.log( - `Resolved ${target} to ${resolved}. Stored in ${pending}.` - ); - } catch (error) { - const errorMessage = `Failed to resolve ${source}: ${serializeError( - error - )}`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - } - } - } - } else { - // Non-StartExtractingMetadata events: reuse pending values from state - if (as.state.pendingWorkersOldest) { - eventContext.extract_from = as.state.pendingWorkersOldest; - console.log( - `Reusing pendingWorkersOldest as extract_from: ${as.state.pendingWorkersOldest}.` - ); - } else { - console.log( - 'pendingWorkersOldest is not set in state. extract_from will not be populated for this invocation.' - ); - } - if (as.state.pendingWorkersNewest) { - eventContext.extract_to = as.state.pendingWorkersNewest; - console.log( - `Reusing pendingWorkersNewest as extract_to: ${as.state.pendingWorkersNewest}.` - ); - } else { - console.log( - 'pendingWorkersNewest is not set in state. extract_to will not be populated for this invocation.' 
- ); - } - } - - // Validate that extract_from is before extract_to - if (eventContext.extract_from && eventContext.extract_to) { - if (eventContext.extract_from >= eventContext.extract_to) { - const errorMessage = `Invalid extraction window: extract_from (${eventContext.extract_from}) must be older than extract_to (${eventContext.extract_to}). This indicates an error in the platform.`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - } - } - } - - return as; -} - -export class State<ConnectorState> { - private _state: AdapterState<ConnectorState>; - private _extractionScope: ExtractionScope = {}; - private initialSdkState: SdkState; - private workerUrl: string; - private devrevToken: string; - private syncUnitId: string; - private requestId: string; - - constructor({ event, initialState }: StateInterface<ConnectorState>) { - this.initialSdkState = - event.payload.event_context.mode === SyncMode.LOADING - ? loadingSdkState - : extractionSdkState; - this._state = { - ...initialState, - ...this.initialSdkState, - } as AdapterState<ConnectorState>; - this.workerUrl = event.payload.event_context.worker_data_url; - this.devrevToken = event.context.secrets.service_account_token; - this.syncUnitId = event.payload.event_context.sync_unit_id; - this.requestId = event.payload.event_context.request_id_adaas; - } - - get state(): AdapterState<ConnectorState> { - return this._state; - } - - set state(value: AdapterState<ConnectorState>) { - this._state = value; - } - - get extractionScope(): ExtractionScope { - return this._extractionScope; - } - - /** - * Initializes the state for this adapter instance by fetching from API - * or creating an initial state if none exists (404).
- * @param initialState The initial connector state provided by the spawn function - */ - async init(initialState: ConnectorState): Promise<void> { - try { - const { state: stringifiedState, objects } = await this.fetchState(); - if (!stringifiedState) { - throw new Error('No state found in response.'); - } - - let parsedState: AdapterState<ConnectorState>; - try { - parsedState = JSON.parse(stringifiedState); - } catch (error) { - throw new Error(`Failed to parse state. ${error}`); - } - - this.state = parsedState; - console.log( - 'State fetched successfully. Current state', - getPrintableState(this.state) - ); - - if (objects) { - try { - this._extractionScope = JSON.parse(objects); - } catch (error) { - console.warn(`Failed to parse extractionScope. ${error}`); - } - } - } catch (error) { - if (axios.isAxiosError(error) && error.response?.status === 404) { - console.log('State not found. Initializing state with initial state.'); - const initialAdapterState: AdapterState<ConnectorState> = { - ...initialState, - ...this.initialSdkState, - }; - - this.state = initialAdapterState; - await this.postState(initialAdapterState); - } else { - const errorMessage = `Failed to init state. ${serializeError(error)}`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - } - } - } - - /** - * Updates the state of the adapter by posting to API. - * @param {object} state - The state to be updated - */ - async postState(state?: AdapterState<ConnectorState>) { - const url = this.workerUrl + '.update'; - this.state = state || this.state; - - let stringifiedState: string; - try { - stringifiedState = JSON.stringify(this.state); - } catch (error) { - const errorMessage = `Failed to stringify state.
${serializeError( - error - )}`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - } - - try { - await axiosClient.post( - url, - { - state: stringifiedState, - }, - { - headers: { - Authorization: this.devrevToken, - }, - params: { - sync_unit: this.syncUnitId, - request_id: this.requestId, - }, - } - ); - - console.log( - 'State updated successfully to', - getPrintableState(this.state) - ); - } catch (error) { - const errorMessage = `Failed to update the state. ${serializeError( - error - )}`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - } - } - - /** - * Fetches the state of the adapter from API. - * @return The raw state data from API - */ - async fetchState(): Promise<{ state: string; objects?: string }> { - console.log( - `Fetching state with sync unit id ${this.syncUnitId} and request id ${this.requestId}.` - ); - - const url = this.workerUrl + '.get'; - const response = await axiosClient.get(url, { - headers: { - Authorization: this.devrevToken, - }, - params: { - sync_unit: this.syncUnitId, - request_id: this.requestId, - }, - }); - - return { - state: response.data?.state, - objects: response.data?.objects, - }; - } +import { BaseState } from './base-state'; +import { createExtractionState } from './extraction-state'; +import { createLoadingState } from './loading-state'; +import { StateInterface } from './state.interfaces'; + +export { BaseState } from './base-state'; +export { ExtractionState, createExtractionState } from './extraction-state'; +export { LoadingState, createLoadingState } from './loading-state'; + +/** + * Creates and initializes the adapter state for the current worker, dispatching + * to the extraction or loading state based on the event's sync mode. 
+ * + * @param params The state factory parameters (event, initial state, options) + * @returns The initialized mode-specific state + */ +export async function createAdapterState<ConnectorState>( + params: StateInterface<ConnectorState> +): Promise<BaseState<ConnectorState>> { + if (params.event.payload.event_context.mode === SyncMode.LOADING) { + return createLoadingState(params); + } + return createExtractionState(params); } diff --git a/src/types/workers.ts b/src/types/workers.ts index 9e3ed25..8be46ef 100644 --- a/src/types/workers.ts +++ b/src/types/workers.ts @@ -1,7 +1,7 @@ import { Worker } from 'worker_threads'; import type { LogLevel } from '../logger/logger.interfaces'; -import { State } from '../state/state'; +import { BaseState } from '../state/state'; import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter'; import { AirSyncEvent, EventType, ExtractorEventType } from './extraction'; @@ -20,7 +20,7 @@ import { InitialDomainMapping } from './common'; */ export interface WorkerAdapterInterface<ConnectorState> { event: AirSyncEvent; - adapterState: State<ConnectorState>; + adapterState: BaseState<ConnectorState>; options?: WorkerAdapterOptions; } From 958bc3542b4043ad86a8535babcd73723ce11eb3 Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 09:10:24 +0200 Subject: [PATCH 11/22] docs: mark C4a done; fix oracle reference (tag v2-old-backup, not origin/v2) Co-Authored-By: Claude Opus 4.8 (1M context) --- V2_PROGRESS.md | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index 108de42..2cdf8ac 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -13,10 +13,13 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi ## Git facts - **Working branch:** `v2` (already hard-reset to `origin/main`). - **Base commit:** `origin/main` = `5b81ef2` (feat: Add new common error enums #204). 
-- **Oracle (target shape):** `origin/v2` / tag `v2-old-backup` = `9202e47`.
This is the PREVIOUS +- **Oracle (target shape):** tag `v2-old-backup` = `9202e47`. This is the PREVIOUS v2 attempt — it already implemented the rename, deletions, adapter split, state split+envelope, and emit-from-return, but bundled into huge unreviewable commits built on a stale base. **Use it as a structural reference / oracle only. Never copy wholesale. Re-author cleanly.** + Read oracle files with `git show v2-old-backup:src/...`. +- **IMPORTANT:** `origin/v2` is NOW our working branch (post-reset), NOT the oracle. The oracle is the + TAG `v2-old-backup`. Don't confuse them. - **Safety:** old work preserved at tag `v2-old-backup`. Force-push of `v2` is approved by Rado. ## Hard rules (apply to EVERY Phase-1 commit) @@ -50,7 +53,7 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi break the build. - **C2 — Airdrop→AirSync identifier rename.** DECIDED with Rado: - HARD rename, NO back-compat alias: `AirdropEvent`→`AirSyncEvent`, `AirdropMessage`→`AirSyncMessage`. - (origin/v2 did NOT do this rename — oracle unreliable here; do it properly.) + (v2-old-backup (oracle tag) did NOT do this rename — oracle unreliable here; do it properly.) - Update stale branding in comments/prose: bare "Airdrop" + "ADaaS" → "AirSync". - MUST NOT touch: `/internal/airdrop.*` API routes; the `AIRDROP_*` mapping enum members AND their `'airdrop_*'` string values (mappers.interface.ts); the `external_system_type: 'ADaaS'` string LITERAL @@ -70,17 +73,31 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi - Exported translation fns are NOT in index.ts (internal) — safe to delete. - **C4a — State split (structural only).** Introduce `BaseState` + `ExtractionState` + `LoadingState`. KEEP the flat `AdapterState = ConnectorState & SdkState` shape (behavior identical). - Author fresh; origin/v2 `src/state/base-state.ts` etc. are structural reference only. + Author fresh; oracle `src/state/base-state.ts` etc. 
guide STRUCTURE only (the oracle is already + envelope-based = C4b; do NOT copy its envelope split here). + DESIGN (decided after reading current state.ts + oracle): + - `base-state.ts`: `abstract class BaseState` holds flat `_state: AdapterState`, + `_extractionScope`, shared fields, `state` get/set, `extractionScope` get, `init()`, `fetchState()`, + `postState()`, and `installInitialDomainMappingIfNeeded()` (extracted from current factory). Ctor takes + `initialSdkState`. + - `extraction-state.ts`: `ExtractionState extends BaseState` (ctor passes `extractionSdkState`) + + `resolveExtractionWindow()` + lastSyncStarted logic; factory `createExtractionState()`. + - `loading-state.ts`: `LoadingState extends BaseState` (ctor passes `loadingSdkState`); factory + `createLoadingState()`. + - `state.ts`: keep a thin `createAdapterState()` DISPATCHER picking Extraction/Loading by + `event_context.mode === SyncMode.LOADING` (process-task stays unchanged until C6). Re-export classes. + - `state.interfaces.ts`: UNCHANGED (flat shape kept; type reshaping is C4b). + - Consumers: `types/workers.ts` + `worker-adapter.ts` change `State` type ref → `BaseState`. - **C4b — State envelope + migration.** Change on-disk shape to `{ connectorState, sdkState }`. Add migration shim: read legacy flat v1 blob → split SDK-owned keys into `sdkState` → persist envelope. - (origin/v2 `base-state.ts` has the reference impl incl. `V1_SDK_STATE_KEYS`.) + (v2-old-backup (oracle tag) `base-state.ts` has the reference impl incl. `V1_SDK_STATE_KEYS`.) - **C5 — Adapter split (structural only).** `BaseAdapter` + `ExtractionAdapter` + `LoadingAdapter`. KEEP existing `emit`-based contract working (behavior identical). Author fresh intermediate form - (this exact form exists in NO branch — origin/v2's split already assumes emit-from-return). + (this exact form exists in NO branch — v2-old-backup (oracle tag)'s split already assumes emit-from-return). 
- **C6 — Emit-from-return contract.** `task`/`onTimeout` return a `TaskResult` (`{ status: 'success'|'progress'|'delay'|'error', ... }`); the SDK maps status→phase event and emits exactly once; `emit` removed from public surface. `processTask` → `processExtractionTask` + - `processLoadingTask`. Reference: origin/v2 `process-task.ts`, `base-adapter.ts` (mapping keys off + `processLoadingTask`. Reference: v2-old-backup (oracle tag) `process-task.ts`, `base-adapter.ts` (mapping keys off event_type/phase, NOT off state shape — so C4b and C6 are independent). ### Phase 2 — closing / interactive (batched, done at the end) @@ -124,7 +141,7 @@ src/deprecated/uploader/index.ts Also delete `src/common/event-type-translation.ts` (+ `.test.ts`). `src/index.ts` on main exports these deprecated barrels — remove those export lines: `./deprecated/adapter`, `./deprecated/demo-extractor`, `./deprecated/http/client`, `./deprecated/uploader`, -and the `formatAxiosError` export (origin/v2 dropped it — confirm against connector usage; azure-boards imports it, so this is a migration note). +and the `formatAxiosError` export (v2-old-backup (oracle tag) dropped it — confirm against connector usage; azure-boards imports it, so this is a migration note). ### C3 — EventType (incoming): DELETE these deprecated members, keep the new ones | DELETE (old member = old VALUE) | KEEP (new member = new VALUE) | @@ -158,7 +175,7 @@ Loading members (StartLoadingData…StartDeletingLoaderAttachmentState) + Unknow | ExtractionAttachmentsError | AttachmentExtractionError | | ExtractionAttachmentsDeleteDone | ExtractorAttachmentsStateDeletionDone | | ExtractionAttachmentsDeleteError | ExtractorAttachmentsStateDeletionError | -(values for new members are the *_EXTRACTION_* / *_DELETION_* strings — see origin/v2 extraction.ts.) +(values for new members are the *_EXTRACTION_* / *_DELETION_* strings — see v2-old-backup (oracle tag) extraction.ts.) 
### C3 — LoaderEventType: DELETE deprecated typo/plural members DELETE: `DataLoadingDelay` (typo), `AttachmentsLoadingProgress/Delayed/Done/Error` (the plural-typo dupes). @@ -189,7 +206,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: | C1 delete + tsconfig | ☑ done | d573cb6. Deleted src/deprecated/ (6 files) + 4 index exports; added tsconfig.build.json (excludes tests), build script points to it. Reviewer-approved. event-type-translation deletion moved to C3. | | C2 AirSync rename | ☑ done | 1fa9afc. AirdropEvent→AirSyncEvent, AirdropMessage→AirSyncMessage (hard, no alias) + prose ADaaS/Airdrop→AirSync. Protected: airdrop.* routes, AIRDROP_* enum, 'ADaaS' literal. Reviewer-approved. | | C3 enum cleanup | ☑ done | cc05f41. Deleted deprecated enum members (EventType, ExtractorEventType, LoaderEventType) + event-type-translation.ts/.test; rewired 4 callers (process-task, spawn, control-protocol, worker-adapter) + spawn.helpers cases. Behavior-equivalent. Reviewer-approved. | -| C4a state split | ☐ todo | | +| C4a state split | ☑ done | b63f3ab. BaseState + ExtractionState + LoadingState, flat shape preserved; state.ts is dispatcher by mode. Reviewer-confirmed behavior-equivalent (loading only loses inert logs). | | C4b state envelope | ☐ todo | | | C5 adapter split | ☐ todo | | | C6 emit-from-return | ☐ todo | | From 30ba1b370504daa34439f7a4db0f14aed259dd2b Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 09:24:25 +0200 Subject: [PATCH 12/22] refactor(v2)!: persist state as { connectorState, sdkState } envelope Separate connector-owned state from SDK bookkeeping on disk. State is now persisted as a v2 envelope { connectorState, sdkState } instead of a single flat blob that merged both, so SDK internals stay encapsulated and can never collide with connector keys. - state.interfaces.ts: add AdapterStateEnvelope and V1_SDK_STATE_KEYS (the union of the SDK-owned initial-state keys); mark the flat AdapterState deprecated. 
  SdkState stays a single combined interface (narrowing into per-mode
  variants is deferred to the adapter split).
- base-state.ts: hold _connectorState and _sdkState separately; the state
  getter/setter now exposes connector state only; add an sdkState
  getter/setter for SDK internals; init() routes fetched state through a
  normalizeFetchedState() migration shim; postState() persists the envelope.
- Migration shim: a v2 envelope is used as-is; a legacy flat v1 blob is
  split by V1_SDK_STATE_KEYS (SDK keys to sdkState, the rest to
  connectorState); a half-envelope or a non-object fails loud. Existing
  syncs migrate on first read.
- extraction-state.ts: resolve the extraction window against sdkState.
- worker-adapter.ts: adapter.state now returns ConnectorState; add an
  adapter.sdkState getter; all internal SDK-field access goes through
  sdkState.
- attachments-streaming-pool.ts: read toDevRev via adapter.sdkState.

BREAKING CHANGE: adapter.state no longer exposes SDK bookkeeping fields
(lastSyncStarted, workersOldest, toDevRev, fromDevRev, snapInVersionId,
...). Connector state is now disjoint from SDK state. Persisted v1 state is
migrated automatically on read.
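
The shim's decision order can be sketched in TypeScript as below. This is a
simplified illustration, not the SDK code: `normalizeState`, `StateEnvelope`,
the loose `SdkState` alias and the `SDK_KEYS` subset are hypothetical
stand-ins for `normalizeFetchedState()`, `AdapterStateEnvelope`, `SdkState`
and `V1_SDK_STATE_KEYS`.

```typescript
// Illustrative stand-ins: the SDK's real SdkState and V1_SDK_STATE_KEYS are
// defined in state.interfaces.ts and are richer than these.
type SdkState = Record<string, unknown>;

interface StateEnvelope<C> {
  connectorState: C;
  sdkState: SdkState;
}

// Hypothetical subset of SDK-owned top-level keys.
const SDK_KEYS: ReadonlySet<string> = new Set([
  'lastSyncStarted',
  'lastSuccessfulSyncStarted',
  'snapInVersionId',
  'toDevRev',
  'fromDevRev',
]);

function normalizeState<C>(
  parsed: unknown,
  initialSdkState: SdkState
): StateEnvelope<C> {
  // Anything that is not an object (including null) is rejected outright.
  if (parsed === null || typeof parsed !== 'object') {
    throw new Error('Fetched state is not a JSON object.');
  }
  const record = parsed as Record<string, unknown>;
  const hasConnector = 'connectorState' in record;
  const hasSdk = 'sdkState' in record;

  // v2 envelope: use as-is, layering stored SDK fields over the defaults.
  if (hasConnector || hasSdk) {
    if (!hasConnector || !hasSdk) {
      throw new Error('Malformed state envelope.');
    }
    return {
      connectorState: record.connectorState as C,
      sdkState: { ...initialSdkState, ...(record.sdkState as SdkState) },
    };
  }

  // v1 flat blob: route each top-level key to one side of the envelope.
  const connectorState: Record<string, unknown> = {};
  const sdkState: SdkState = {};
  for (const [key, value] of Object.entries(record)) {
    if (SDK_KEYS.has(key)) {
      sdkState[key] = value;
    } else {
      connectorState[key] = value;
    }
  }
  return {
    connectorState: connectorState as C,
    sdkState: { ...initialSdkState, ...sdkState },
  };
}
```

In both branches the stored SDK fields are spread over the initial SDK
state, so a field added to the SDK later still starts from its default when
older state is read.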
Ref: V2_PROGRESS.md C4b Co-Authored-By: Claude Opus 4.8 (1M context) --- .../attachments-streaming-pool.ts | 18 +-- .../worker-adapter/worker-adapter.ts | 62 +++---- src/state/base-state.ts | 153 +++++++++++++----- src/state/extraction-state.ts | 24 +-- src/state/state.interfaces.ts | 30 +++- 5 files changed, 199 insertions(+), 88 deletions(-) diff --git a/src/attachments-streaming/attachments-streaming-pool.ts b/src/attachments-streaming/attachments-streaming-pool.ts index 1690386..4dbc1ca 100644 --- a/src/attachments-streaming/attachments-streaming-pool.ts +++ b/src/attachments-streaming/attachments-streaming-pool.ts @@ -74,7 +74,7 @@ export class AttachmentsStreamingPool { `Starting download of ${this.attachments.length} attachments, streaming ${this.batchSize} at once.` ); - if (!this.adapter.state.toDevRev) { + if (!this.adapter.sdkState.toDevRev) { const error = new Error('toDevRev state is not initialized'); console.error(error); return { error }; @@ -83,17 +83,17 @@ export class AttachmentsStreamingPool { // Get the list of successfully processed attachments in previous (possibly incomplete) batch extraction. // If no such list exists, create an empty one. if ( - !this.adapter.state.toDevRev.attachmentsMetadata + !this.adapter.sdkState.toDevRev.attachmentsMetadata .lastProcessedAttachmentsIdsList ) { - this.adapter.state.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList = + this.adapter.sdkState.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList = []; } // Migrate old processed attachments to the new format. 
- this.adapter.state.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList = + this.adapter.sdkState.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList = this.migrateProcessedAttachments( - this.adapter.state.toDevRev.attachmentsMetadata + this.adapter.sdkState.toDevRev.attachmentsMetadata .lastProcessedAttachmentsIdsList ); @@ -139,8 +139,8 @@ export class AttachmentsStreamingPool { } if ( - this.adapter.state.toDevRev && - this.adapter.state.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList?.some( + this.adapter.sdkState.toDevRev && + this.adapter.sdkState.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList?.some( (it) => it.id == attachment.id && it.parent_id == attachment.parent_id ) ) { @@ -180,10 +180,10 @@ export class AttachmentsStreamingPool { // No rate limiting, process normally if ( - this.adapter.state.toDevRev?.attachmentsMetadata + this.adapter.sdkState.toDevRev?.attachmentsMetadata ?.lastProcessedAttachmentsIdsList ) { - this.adapter.state.toDevRev?.attachmentsMetadata.lastProcessedAttachmentsIdsList.push( + this.adapter.sdkState.toDevRev?.attachmentsMetadata.lastProcessedAttachmentsIdsList.push( { id: attachment.id, parent_id: attachment.parent_id } ); } diff --git a/src/multithreading/worker-adapter/worker-adapter.ts b/src/multithreading/worker-adapter/worker-adapter.ts index 23e2550..aae497a 100644 --- a/src/multithreading/worker-adapter/worker-adapter.ts +++ b/src/multithreading/worker-adapter/worker-adapter.ts @@ -25,7 +25,7 @@ import { RepoInterface, } from '../../repo/repo.interfaces'; import { BaseState } from '../../state/state'; -import { AdapterState } from '../../state/state.interfaces'; +import { SdkState } from '../../state/state.interfaces'; import { AirSyncEvent, EventData, @@ -129,14 +129,20 @@ export class WorkerAdapter { }); } - get state(): AdapterState { + /** Connector-owned state exposed to snap-in code. 
*/ + get state(): ConnectorState { return this.adapterState.state; } - set state(value: AdapterState) { + set state(value: ConnectorState) { this.adapterState.state = value; } + /** SDK-internal bookkeeping state. Used by SDK internals; not for connector use. */ + get sdkState(): SdkState { + return this.adapterState.sdkState; + } + get reports(): LoaderReport[] { return this.loaderReports; } @@ -177,7 +183,7 @@ export class WorkerAdapter { onUpload: (artifact: Artifact) => { // We need to store artifacts ids in state for later use when streaming attachments if (repo.itemType === AirSyncDefaultItemTypes.ATTACHMENTS) { - this.state.toDevRev?.attachmentsMetadata.artifactIds.push( + this.sdkState.toDevRev?.attachmentsMetadata.artifactIds.push( artifact.id ); } @@ -296,15 +302,15 @@ export class WorkerAdapter { // If the extraction is done, we want to save the timestamp of the last successful sync if (newEventType === ExtractorEventType.AttachmentExtractionDone) { console.log( - `Overwriting lastSuccessfulSyncStarted with lastSyncStarted (${this.state.lastSyncStarted}).` + `Overwriting lastSuccessfulSyncStarted with lastSyncStarted (${this.sdkState.lastSyncStarted}).` ); - this.state.lastSuccessfulSyncStarted = this.state.lastSyncStarted; - this.state.lastSyncStarted = ''; + this.sdkState.lastSuccessfulSyncStarted = this.sdkState.lastSyncStarted; + this.sdkState.lastSyncStarted = ''; // Clear pending extraction boundaries now that the cycle is complete - this.state.pendingWorkersOldest = ''; - this.state.pendingWorkersNewest = ''; + this.sdkState.pendingWorkersOldest = ''; + this.sdkState.pendingWorkersNewest = ''; // Update workersOldest and workersNewest boundaries from resolved extraction timestamps. // Expand boundaries: workersOldest gets the earliest timestamp, workersNewest gets the latest. 
@@ -313,24 +319,24 @@ export class WorkerAdapter { if ( extractionStart && - (!this.state.workersOldest || - extractionStart < this.state.workersOldest) + (!this.sdkState.workersOldest || + extractionStart < this.sdkState.workersOldest) ) { console.log( - `Updating workersOldest from '${this.state.workersOldest}' to '${extractionStart}'.` + `Updating workersOldest from '${this.sdkState.workersOldest}' to '${extractionStart}'.` ); - this.state.workersOldest = extractionStart; + this.sdkState.workersOldest = extractionStart; } if ( extractionEnd && - (!this.state.workersNewest || - extractionEnd > this.state.workersNewest) + (!this.sdkState.workersNewest || + extractionEnd > this.sdkState.workersNewest) ) { console.log( - `Updating workersNewest from '${this.state.workersNewest}' to '${extractionEnd}'.` + `Updating workersNewest from '${this.sdkState.workersNewest}' to '${extractionEnd}'.` ); - this.state.workersNewest = extractionEnd; + this.sdkState.workersNewest = extractionEnd; } } @@ -423,14 +429,14 @@ export class WorkerAdapter { const filesToLoad = await this.getLoaderBatches({ supportedItemTypes: itemTypes, }); - this.adapterState.state.fromDevRev = { + this.adapterState.sdkState.fromDevRev = { filesToLoad, }; } if ( - !this.adapterState.state.fromDevRev || - !this.adapterState.state.fromDevRev.filesToLoad.length + !this.adapterState.sdkState.fromDevRev || + !this.adapterState.sdkState.fromDevRev.filesToLoad.length ) { console.warn('No files to load, returning.'); return { @@ -441,12 +447,12 @@ export class WorkerAdapter { console.log( 'Files to load in state', - this.adapterState.state.fromDevRev?.filesToLoad + this.adapterState.sdkState.fromDevRev?.filesToLoad ); try { - outerloop: for (const fileToLoad of this.adapterState.state.fromDevRev - .filesToLoad) { + outerloop: for (const fileToLoad of this.adapterState.sdkState + .fromDevRev.filesToLoad) { const itemTypeToLoad = itemTypesToLoad.find( (itemTypeToLoad: ItemTypeToLoad) => itemTypeToLoad.itemType === 
fileToLoad.itemType @@ -580,7 +586,7 @@ export class WorkerAdapter { }): Promise { return runWithSdkLogContext(async () => { if (this.event.payload.event_type === EventType.StartLoadingAttachments) { - this.adapterState.state.fromDevRev = { + this.adapterState.sdkState.fromDevRev = { filesToLoad: await this.getLoaderBatches({ supportedItemTypes: ['attachment'], }), @@ -588,8 +594,8 @@ export class WorkerAdapter { } if ( - !this.adapterState.state.fromDevRev || - this.adapterState.state.fromDevRev?.filesToLoad.length === 0 + !this.adapterState.sdkState.fromDevRev || + this.adapterState.sdkState.fromDevRev?.filesToLoad.length === 0 ) { console.log('No files to load, returning.'); return { @@ -598,7 +604,7 @@ export class WorkerAdapter { }; } - const filesToLoad = this.adapterState.state.fromDevRev?.filesToLoad; + const filesToLoad = this.adapterState.sdkState.fromDevRev?.filesToLoad; try { outerloop: for (const fileToLoad of filesToLoad) { @@ -1123,7 +1129,7 @@ export class WorkerAdapter { ]; this.initializeRepos(repos); - const attachmentsMetadata = this.state.toDevRev?.attachmentsMetadata; + const attachmentsMetadata = this.sdkState.toDevRev?.attachmentsMetadata; // If there are no attachments metadata artifact IDs in state, finish here if (!attachmentsMetadata?.artifactIds?.length) { diff --git a/src/state/base-state.ts b/src/state/base-state.ts index 3c9f8a9..06db895 100644 --- a/src/state/base-state.ts +++ b/src/state/base-state.ts @@ -9,12 +9,17 @@ import { AirSyncEvent } from '../types/extraction'; import { WorkerMessageSubject } from '../types/workers'; import { ExtractionScope } from '../types/workers'; -import { AdapterState, SdkState, StateInterface } from './state.interfaces'; +import { + AdapterStateEnvelope, + SdkState, + StateInterface, + V1_SDK_STATE_KEYS, +} from './state.interfaces'; /** - * BaseState owns the state lifecycle shared by every sync mode: holding the - * adapter state, fetching/initializing/posting it against the platform, and the - * 
snap-in-version-gated initial domain mapping install. + * BaseState owns the state lifecycle shared by every sync mode: connector vs. + * SDK state separation, fetch/init/post against the platform, the v1->v2 + * migration shim, and the snap-in-version-gated initial domain mapping install. * * Mode-specific subclasses (`ExtractionState`, `LoadingState`) seed the * SDK-owned portion of the state and add mode-specific setup in their factories. @@ -22,7 +27,8 @@ import { AdapterState, SdkState, StateInterface } from './state.interfaces'; * @typeParam ConnectorState - the connector-owned state shape */ export abstract class BaseState { - protected _state: AdapterState; + protected _connectorState: ConnectorState; + protected _sdkState: SdkState; protected _extractionScope: ExtractionScope = {}; protected readonly initialSdkState: SdkState; protected readonly event: AirSyncEvent; @@ -37,22 +43,30 @@ export abstract class BaseState { ) { this.event = event; this.initialSdkState = initialSdkState; - this._state = { - ...initialState, - ...initialSdkState, - } as AdapterState; + this._connectorState = initialState; + this._sdkState = { ...initialSdkState }; this.workerUrl = event.payload.event_context.worker_data_url; this.devrevToken = event.context.secrets.service_account_token; this.syncUnitId = event.payload.event_context.sync_unit_id; this.requestId = event.payload.event_context.request_id_adaas; } - get state(): AdapterState { - return this._state; + /** Connector-owned state. This is what `adapter.state` exposes to snap-in code. */ + get state(): ConnectorState { + return this._connectorState; } - set state(value: AdapterState) { - this._state = value; + set state(value: ConnectorState) { + this._connectorState = value; + } + + /** SDK-internal bookkeeping state. Never exposed to connector code. 
*/ + get sdkState(): SdkState { + return this._sdkState; + } + + set sdkState(value: SdkState) { + this._sdkState = value; } get extractionScope(): ExtractionScope { @@ -69,10 +83,10 @@ export abstract class BaseState { initialDomainMapping?: InitialDomainMapping ): Promise { const snapInVersionId = this.event.context.snap_in_version_id; - const hasSnapInVersionInState = 'snapInVersionId' in this.state; + const hasSnapInVersionInState = 'snapInVersionId' in this.sdkState; const shouldUpdateIDM = !hasSnapInVersionInState || - this.state.snapInVersionId !== snapInVersionId; + this.sdkState.snapInVersionId !== snapInVersionId; if (!shouldUpdateIDM) { console.log( @@ -83,12 +97,12 @@ export abstract class BaseState { try { console.log( - `Snap-in version in state "${this.state.snapInVersionId}" does not match the version in event context "${snapInVersionId}". Installing initial domain mapping.` + `Snap-in version in state "${this.sdkState.snapInVersionId}" does not match the version in event context "${snapInVersionId}". Installing initial domain mapping.` ); if (initialDomainMapping) { await installInitialDomainMapping(this.event, initialDomainMapping); - this.state.snapInVersionId = snapInVersionId; + this.sdkState.snapInVersionId = snapInVersionId; } else { throw new Error( 'No initial domain mapping was passed to spawn function. Skipping initial domain mapping installation.' @@ -110,6 +124,10 @@ export abstract class BaseState { /** * Initializes the state for this adapter instance by fetching from API * or creating an initial state if none exists (404). + * + * Reads both the v2 `{ connectorState, sdkState }` envelope and a legacy flat + * v1 blob (connector keys merged with SDK keys), migrating the latter on read. + * Always persists the v2 envelope going forward. 
* @param initialState The initial connector state provided by the spawn function */ async init(initialState: ConnectorState): Promise { @@ -119,18 +137,23 @@ export abstract class BaseState { throw new Error('No state found in response.'); } - let parsedState: AdapterState; + let parsed: unknown; try { - parsedState = JSON.parse(stringifiedState); + parsed = JSON.parse(stringifiedState); } catch (error) { throw new Error(`Failed to parse state. ${error}`); } - this.state = parsedState; - console.log( - 'State fetched successfully. Current state', - getPrintableState(this.state) - ); + const { connectorState, sdkState } = this.normalizeFetchedState(parsed); + this.state = connectorState; + this.sdkState = sdkState; + + console.log('State fetched successfully. Current state', { + connectorState: getPrintableState( + this.state as Record + ), + sdkState: getPrintableState(this.sdkState), + }); if (objects) { try { @@ -142,13 +165,9 @@ export abstract class BaseState { } catch (error) { if (axios.isAxiosError(error) && error.response?.status === 404) { console.log('State not found. Initializing state with initial state.'); - const initialAdapterState: AdapterState = { - ...initialState, - ...this.initialSdkState, - }; - - this.state = initialAdapterState; - await this.postState(initialAdapterState); + this.state = initialState; + this.sdkState = { ...this.initialSdkState }; + await this.postState(); } else { const errorMessage = `Failed to init state. ${serializeError(error)}`; console.error(errorMessage); @@ -161,17 +180,73 @@ export abstract class BaseState { } } + /** + * Normalizes a parsed on-disk state into the `{ connectorState, sdkState }` + * envelope, migrating a legacy flat v1 blob if needed. + * + * - v2 envelope (`{ connectorState, sdkState }`): used as-is. + * - v1 flat blob: SDK-owned keys (`V1_SDK_STATE_KEYS`) split into `sdkState`, + * everything else becomes connector state. + * - Malformed envelope (one side present, the other missing) fails loud. 
+ */ + private normalizeFetchedState(parsed: unknown): { + connectorState: ConnectorState; + sdkState: SdkState; + } { + if (parsed === null || typeof parsed !== 'object') { + throw new Error('Fetched state is not a JSON object.'); + } + + const record = parsed as Record; + const hasConnector = 'connectorState' in record; + const hasSdk = 'sdkState' in record; + + if (hasConnector || hasSdk) { + if (!hasConnector || !hasSdk) { + throw new Error( + 'Malformed state envelope: expected both "connectorState" and "sdkState".' + ); + } + return { + connectorState: record.connectorState as ConnectorState, + sdkState: { ...this.initialSdkState, ...(record.sdkState as SdkState) }, + }; + } + + // Legacy flat v1 blob: split known SDK keys out of the connector state. + const connectorState: Record = {}; + const sdkState: Record = {}; + for (const [key, value] of Object.entries(record)) { + if (V1_SDK_STATE_KEYS.has(key)) { + sdkState[key] = value; + } else { + connectorState[key] = value; + } + } + + return { + connectorState: connectorState as ConnectorState, + sdkState: { ...this.initialSdkState, ...(sdkState as SdkState) }, + }; + } + /** * Updates the state of the adapter by posting to API. - * @param {object} state - The state to be updated + * Persists the v2 `{ connectorState, sdkState }` envelope. + * @param {object} state - The connector state to be updated */ - async postState(state?: AdapterState) { + async postState(state?: ConnectorState) { const url = this.workerUrl + '.update'; this.state = state || this.state; + const envelope: AdapterStateEnvelope = { + connectorState: this.state, + sdkState: this.sdkState, + }; + let stringifiedState: string; try { - stringifiedState = JSON.stringify(this.state); + stringifiedState = JSON.stringify(envelope); } catch (error) { const errorMessage = `Failed to stringify state. 
${serializeError( error @@ -201,10 +276,12 @@ export abstract class BaseState { } ); - console.log( - 'State updated successfully to', - getPrintableState(this.state) - ); + console.log('State updated successfully to', { + connectorState: getPrintableState( + this.state as Record + ), + sdkState: getPrintableState(this.sdkState), + }); } catch (error) { const errorMessage = `Failed to update the state. ${serializeError( error diff --git a/src/state/extraction-state.ts b/src/state/extraction-state.ts index bb99a0f..b78dbbc 100644 --- a/src/state/extraction-state.ts +++ b/src/state/extraction-state.ts @@ -30,13 +30,15 @@ export class ExtractionState extends BaseState { * StartExtractingMetadata. Finally, validate that extract_from < extract_to. */ resolveExtractionWindow(): void { + const sdkState = this.sdkState; + // Set lastSyncStarted if the event type is StartExtractingData if ( this.event.payload.event_type === EventType.StartExtractingData && - !this.state.lastSyncStarted + !sdkState.lastSyncStarted ) { - this.state.lastSyncStarted = new Date().toISOString(); - console.log(`Setting lastSyncStarted to ${this.state.lastSyncStarted}.`); + sdkState.lastSyncStarted = new Date().toISOString(); + console.log(`Setting lastSyncStarted to ${sdkState.lastSyncStarted}.`); } const eventContext = this.event.payload.event_context; @@ -59,9 +61,9 @@ export class ExtractionState extends BaseState { const timeValue = eventContext[source]; if (timeValue && timeValue.type) { try { - const resolved = resolveTimeValue(timeValue, this.state); + const resolved = resolveTimeValue(timeValue, sdkState); eventContext[target] = resolved; - this.state[pending] = resolved; + sdkState[pending] = resolved; console.log( `Resolved ${target} to ${resolved}. 
Stored in ${pending}.` ); @@ -80,20 +82,20 @@ export class ExtractionState extends BaseState { } } else { // Non-StartExtractingMetadata events: reuse pending values from state - if (this.state.pendingWorkersOldest) { - eventContext.extract_from = this.state.pendingWorkersOldest; + if (sdkState.pendingWorkersOldest) { + eventContext.extract_from = sdkState.pendingWorkersOldest; console.log( - `Reusing pendingWorkersOldest as extract_from: ${this.state.pendingWorkersOldest}.` + `Reusing pendingWorkersOldest as extract_from: ${sdkState.pendingWorkersOldest}.` ); } else { console.log( 'pendingWorkersOldest is not set in state. extract_from will not be populated for this invocation.' ); } - if (this.state.pendingWorkersNewest) { - eventContext.extract_to = this.state.pendingWorkersNewest; + if (sdkState.pendingWorkersNewest) { + eventContext.extract_to = sdkState.pendingWorkersNewest; console.log( - `Reusing pendingWorkersNewest as extract_to: ${this.state.pendingWorkersNewest}.` + `Reusing pendingWorkersNewest as extract_to: ${sdkState.pendingWorkersNewest}.` ); } else { console.log( diff --git a/src/state/state.interfaces.ts b/src/state/state.interfaces.ts index 6c28cf3..9e10577 100644 --- a/src/state/state.interfaces.ts +++ b/src/state/state.interfaces.ts @@ -30,11 +30,25 @@ export interface SdkState { } /** - * AdapterState is an interface that defines the structure of the adapter state that is used by the external extractor. - * It extends the connector state with additional fields: lastSyncStarted, lastSuccessfulSyncStarted, snapInVersionId and attachmentsMetadata. + * AdapterState is the legacy (v1) flat adapter state: connector state merged + * with SDK bookkeeping in a single object. + * + * @deprecated v2 persists the `{ connectorState, sdkState }` envelope + * (see {@link AdapterStateEnvelope}). Connector state is now exposed via + * `adapter.state` and SDK state is kept internal. 
*/ export type AdapterState = ConnectorState & SdkState; +/** + * AdapterStateEnvelope is the v2 on-disk state shape: connector state and SDK + * bookkeeping are stored as disjoint sub-objects so SDK internals stay + * encapsulated and never collide with connector keys. + */ +export interface AdapterStateEnvelope { + connectorState: ConnectorState; + sdkState: SdkState; +} + export interface ToDevRev { attachmentsMetadata: { artifactIds: string[]; @@ -85,3 +99,15 @@ export const loadingSdkState = { filesToLoad: [], }, }; + +/** + * The set of top-level state keys owned by the SDK. Derived from the initial + * SDK state constants so it auto-updates whenever a new SDK field is added. + * Used by the migration shim to split a flat v1 state blob into the + * `{ connectorState, sdkState }` envelope: keys in this set go to `sdkState`, + * everything else is connector state. + */ +export const V1_SDK_STATE_KEYS: ReadonlySet = new Set([ + ...Object.keys(extractionSdkState), + ...Object.keys(loadingSdkState), +]); From 0da1331924c701c877723c9b646892348880ec5e Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 09:25:00 +0200 Subject: [PATCH 13/22] docs: mark C4b done; note SdkState narrowing deferred to C5 Co-Authored-By: Claude Opus 4.8 (1M context) --- V2_PROGRESS.md | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index 2cdf8ac..609a2ef 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -90,7 +90,22 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi - Consumers: `types/workers.ts` + `worker-adapter.ts` change `State` type ref → `BaseState`. - **C4b — State envelope + migration.** Change on-disk shape to `{ connectorState, sdkState }`. Add migration shim: read legacy flat v1 blob → split SDK-owned keys into `sdkState` → persist envelope. - (v2-old-backup (oracle tag) `base-state.ts` has the reference impl incl. `V1_SDK_STATE_KEYS`.) 
+ (oracle `base-state.ts` has the reference impl incl. `V1_SDK_STATE_KEYS`.) + DESIGN DECISION: keep a SINGLE combined `SdkState` for C4b (do NOT narrow into Base/Extraction/Loading + SdkState — oracle narrows but that's coupled to the adapter split; defer to C5). `BaseState` + keeps ONE type param; gains `_connectorState` + `_sdkState: SdkState`. Keep deprecated lastSyncStarted/ + lastSuccessfulSyncStarted in SdkState (still live: stamped in extraction-state, promoted in worker-adapter, + read by time-value-resolver). Blast radius (SDK-field access moves off `.state` → `.sdkState`): + - state.interfaces.ts: add AdapterStateEnvelope + V1_SDK_STATE_KEYS; keep SdkState/initials/AdapterState. + - base-state.ts: `_connectorState`+`_sdkState`; `state` getter→connector, add `sdkState` getter; + init/postState use envelope; private normalizeFetchedState() (v2 as-is; v1 flat split by V1_SDK_STATE_KEYS; + malformed→throw). installIDM uses this.sdkState.snapInVersionId. + - extraction-state.ts: resolveExtractionWindow this.state.X→this.sdkState.X; pass this.sdkState to resolver. + - worker-adapter.ts: get state()→ConnectorState (KEY public breaking change); add get sdkState(); internal + this.state.→this.sdkState; this.adapterState.state.fromDevRev→this.adapterState.sdkState.fromDevRev. + - attachments-streaming-pool.ts: this.adapter.state.toDevRev→this.adapter.sdkState.toDevRev. + - time-value-resolver.ts: signature unchanged. + BREAKING: connectors reading SDK fields via adapter.state break; on-disk state auto-migrates v1→v2 on read. - **C5 — Adapter split (structural only).** `BaseAdapter` + `ExtractionAdapter` + `LoadingAdapter`. KEEP existing `emit`-based contract working (behavior identical). Author fresh intermediate form (this exact form exists in NO branch — v2-old-backup (oracle tag)'s split already assumes emit-from-return). @@ -207,8 +222,8 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: | C2 AirSync rename | ☑ done | 1fa9afc. 
AirdropEvent→AirSyncEvent, AirdropMessage→AirSyncMessage (hard, no alias) + prose ADaaS/Airdrop→AirSync. Protected: airdrop.* routes, AIRDROP_* enum, 'ADaaS' literal. Reviewer-approved. |
 | C3 enum cleanup | ☑ done | cc05f41. Deleted deprecated enum members (EventType, ExtractorEventType, LoaderEventType) + event-type-translation.ts/.test; rewired 4 callers (process-task, spawn, control-protocol, worker-adapter) + spawn.helpers cases. Behavior-equivalent. Reviewer-approved. |
 | C4a state split | ☑ done | b63f3ab. BaseState + ExtractionState + LoadingState, flat shape preserved; state.ts is dispatcher by mode. Reviewer-confirmed behavior-equivalent (loading only loses inert logs). |
-| C4b state envelope | ☐ todo | |
-| C5 adapter split | ☐ todo | |
+| C4b state envelope | ☑ done | 30ba1b3. { connectorState, sdkState } envelope + v1->v2 migration shim (normalizeFetchedState). adapter.state→connector-only, new adapter.sdkState; ~28 SDK-field access sites moved. SdkState kept combined (narrowing deferred to C5). Reviewer-approved (migration cases verified). |
+| C5 adapter split | ☐ todo | NOTE for C5: narrow SdkState into Base/Extraction/Loading variants here (oracle state.interfaces shows the split). |
 | C6 emit-from-return | ☐ todo | |
 | C7 JSDoc | ☐ todo | Phase 2 |
 | C8 api report | ☐ todo | Phase 2 |

From a7a877fa255021a542ee9abd6a1249c29f87dafd Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 09:41:37 +0200
Subject: [PATCH 14/22] refactor(v2)!: split WorkerAdapter into Base/Extraction/Loading adapters

Replace the monolithic WorkerAdapter with a BaseAdapter holding the
shared state access and the emit() control-protocol flow as a template
method, plus ExtractionAdapter and LoadingAdapter subclasses that own
their mode-specific surface.

- base-adapter.ts: abstract BaseAdapter owns event/options/isTimeout/
  hasWorkerEmitted, adapterState, uploader, state/sdkState/extractionScope
  accessors, postState(), and emit(). emit() is a template method with
  three hooks: beforeEmit (pre-emit work), buildEmitPayload (mode payload
  extras), and afterEmit (post-emit cleanup).
- extraction-adapter.ts: ExtractionAdapter owns repos, artifacts, and
  attachment streaming/processing. beforeEmit uploads repos and updates
  extraction boundaries (incl. lastSuccessfulSyncStarted promotion);
  buildEmitPayload adds artifacts; afterEmit clears them. An emit()
  override handles the inline external-sync-unit upload before
  delegating to BaseAdapter.emit().
- loading-adapter.ts: LoadingAdapter owns mappers, loader reports, and
  item and attachment loading. buildEmitPayload adds reports and
  processed_files.
- worker-adapter.helpers.ts moved to adapters/loading-adapter.helpers.ts
  (loader-only). Old worker-adapter.ts removed.
- WorkerAdapter is now a type alias for ExtractionAdapter | LoadingAdapter.
  processTask builds the concrete adapter from the event's sync mode and
  passes it to task/onTimeout. Attachment processor types and the
  streaming pool are typed to ExtractionAdapter.

Behavior is preserved: emit() runs the same steps in the same order, the
mode-specific payloads are unchanged, and the extraction-boundary
bookkeeping is identical. The emit(eventType, data) signature is kept;
the emit-from-return contract change is a separate commit.

BREAKING CHANGE: WorkerAdapter is no longer a constructable class.
Extraction tasks receive an ExtractionAdapter and loading tasks a
LoadingAdapter; the methods available are scoped to the worker's mode.
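
A minimal TypeScript sketch of the template-method shape described above, under stated assumptions: all names (Payload, SketchBaseAdapter, SketchExtractionAdapter, SketchLoadingAdapter, `sent`) are hypothetical, state persistence, message truncation and parent-thread messaging are left out, and an in-memory `sent` list replaces the control-protocol call. It illustrates the hook ordering only and is not a reproduction of BaseAdapter.

```typescript
// Hypothetical sketch of the emit() template method; not the SDK's code.
// State persistence, truncation and parentPort messaging are omitted;
// `sent` stands in for the control-protocol emit call.

type Payload = Record<string, unknown>;

abstract class SketchBaseAdapter {
  hasWorkerEmitted = false;
  readonly sent: { eventType: string; data: Payload }[] = [];

  // Mode-specific hooks, mirroring beforeEmit / buildEmitPayload / afterEmit.
  protected abstract beforeEmit(eventType: string): Promise<void>;
  protected abstract buildEmitPayload(eventType: string): Payload;
  protected abstract afterEmit(eventType: string): void;

  // Template method: the order of steps is fixed here, the hooks fill them in.
  async emit(eventType: string, data: Payload = {}): Promise<void> {
    if (this.hasWorkerEmitted) return; // only the first emit is honored

    try {
      await this.beforeEmit(eventType);
    } catch {
      // Mark as emitted so later calls are ignored; the real adapter also
      // tells the parent thread to exit at this point.
      this.hasWorkerEmitted = true;
      return;
    }

    // Mode extras are spread last, so they win over caller-supplied keys.
    this.sent.push({
      eventType,
      data: { ...data, ...this.buildEmitPayload(eventType) },
    });
    this.afterEmit(eventType);
    this.hasWorkerEmitted = true;
  }
}

class SketchExtractionAdapter extends SketchBaseAdapter {
  artifacts: string[] = [];
  reposUploaded = false;

  protected async beforeEmit(): Promise<void> {
    // Stands in for uploadAllRepos(), which collects uploaded artifacts.
    this.reposUploaded = true;
    this.artifacts.push('artifact-1');
  }

  protected buildEmitPayload(): Payload {
    return { artifacts: [...this.artifacts] };
  }

  protected afterEmit(): void {
    this.artifacts = [];
  }
}

class SketchLoadingAdapter extends SketchBaseAdapter {
  reports: string[] = ['report-1'];

  // Loading has no pre- or post-emit work.
  protected async beforeEmit(): Promise<void> {}

  protected buildEmitPayload(): Payload {
    return { reports: this.reports };
  }

  protected afterEmit(): void {}
}
```

A subclass only supplies the three hooks. The step order, the single-emit guard, and the payload merge live once in the base class, so a mode with nothing to do can leave beforeEmit and afterEmit empty.
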

Ref: V2_PROGRESS.md C5

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../attachments-streaming-pool.interfaces.ts  |    4 +-
 .../attachments-streaming-pool.ts             |    4 +-
 src/index.ts                                  |    4 +-
 src/multithreading/adapters/base-adapter.ts   |  185 +++
 .../adapters/extraction-adapter.ts            |  535 +++++++
 .../loading-adapter.helpers.ts}               |    0
 .../adapters/loading-adapter.ts               |  645 +++++++++
 src/multithreading/process-task.ts            |   22 +-
 .../worker-adapter/worker-adapter.ts          | 1244 -----------------
 src/types/extraction.ts                       |    6 +-
 src/types/workers.ts                          |   13 +-
 11 files changed, 1403 insertions(+), 1259 deletions(-)
 create mode 100644 src/multithreading/adapters/base-adapter.ts
 create mode 100644 src/multithreading/adapters/extraction-adapter.ts
 rename src/multithreading/{worker-adapter/worker-adapter.helpers.ts => adapters/loading-adapter.helpers.ts} (100%)
 create mode 100644 src/multithreading/adapters/loading-adapter.ts
 delete mode 100644 src/multithreading/worker-adapter/worker-adapter.ts

diff --git a/src/attachments-streaming/attachments-streaming-pool.interfaces.ts b/src/attachments-streaming/attachments-streaming-pool.interfaces.ts
index 9067c4f..9a22ea0 100644
--- a/src/attachments-streaming/attachments-streaming-pool.interfaces.ts
+++ b/src/attachments-streaming/attachments-streaming-pool.interfaces.ts
@@ -2,10 +2,10 @@ import {
   ExternalSystemAttachmentStreamingFunction,
   NormalizedAttachment,
 } from '../types';
-import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter';
+import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter';
 
 export interface AttachmentsStreamingPoolParams {
-  adapter: WorkerAdapter;
+  adapter: ExtractionAdapter;
   attachments: NormalizedAttachment[];
   batchSize?: number;
   stream: ExternalSystemAttachmentStreamingFunction;
diff --git a/src/attachments-streaming/attachments-streaming-pool.ts b/src/attachments-streaming/attachments-streaming-pool.ts
index 4dbc1ca..58281e1 100644
--- a/src/attachments-streaming/attachments-streaming-pool.ts
+++ b/src/attachments-streaming/attachments-streaming-pool.ts
@@ -1,5 +1,5 @@
 import { sleep } from '../common/helpers';
-import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter';
+import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter';
 import { ProcessedAttachment } from '../state/state.interfaces';
 import {
   ExternalSystemAttachmentStreamingFunction,
@@ -9,7 +9,7 @@ import {
 import { AttachmentsStreamingPoolParams } from './attachments-streaming-pool.interfaces';
 
 export class AttachmentsStreamingPool {
-  private adapter: WorkerAdapter;
+  private adapter: ExtractionAdapter;
   private attachments: NormalizedAttachment[];
   private batchSize: number;
   private delay: number | undefined;
diff --git a/src/index.ts b/src/index.ts
index b4b94dc..ce50689 100644
--- a/src/index.ts
+++ b/src/index.ts
@@ -11,7 +11,9 @@ export type {
 } from './mock-server/mock-server.interfaces';
 export { processTask } from './multithreading/process-task';
 export { spawn } from './multithreading/spawn/spawn';
-export { WorkerAdapter } from './multithreading/worker-adapter/worker-adapter';
+export { BaseAdapter } from './multithreading/adapters/base-adapter';
+export { ExtractionAdapter } from './multithreading/adapters/extraction-adapter';
+export { LoadingAdapter } from './multithreading/adapters/loading-adapter';
 export { createMockEvent, MOCK_SERVER_DEFAULT_URL } from './common/test-utils';
 export type { DeepPartial } from './common/test-utils';
 export * from './types';
diff --git a/src/multithreading/adapters/base-adapter.ts b/src/multithreading/adapters/base-adapter.ts
new file mode 100644
index 0000000..e05c541
--- /dev/null
+++ b/src/multithreading/adapters/base-adapter.ts
@@ -0,0 +1,185 @@
+import { parentPort } from 'node:worker_threads';
+
+import { STATELESS_EVENT_TYPES } from '../../common/constants';
+import { emit } from '../../common/control-protocol';
+import { truncateMessage
} from '../../common/helpers'; +import { serializeError } from '../../logger/logger'; +import { runWithSdkLogContext } from '../../logger/logger.context'; +import { BaseState } from '../../state/state'; +import { SdkState } from '../../state/state.interfaces'; +import { + AirSyncEvent, + EventData, + ExtractorEventType, +} from '../../types/extraction'; +import { LoaderEventType } from '../../types/loading'; +import { + WorkerAdapterOptions, + WorkerMessageEmitted, + WorkerMessageSubject, +} from '../../types/workers'; +import { Uploader } from '../../uploader/uploader'; + +/** + * BaseAdapter holds the state and behavior shared by both sync modes and owns + * the `emit` control-protocol flow as a template method. Mode-specific adapters + * (`ExtractionAdapter`, `LoadingAdapter`) implement the abstract hooks to inject + * their own pre-emit work and event payload shaping. + * + * @typeParam ConnectorState - the connector-owned state shape + */ +export abstract class BaseAdapter { + readonly event: AirSyncEvent; + readonly options?: WorkerAdapterOptions; + isTimeout: boolean; + hasWorkerEmitted: boolean; + + protected adapterState: BaseState; + protected uploader: Uploader; + + constructor({ + event, + adapterState, + options, + }: { + event: AirSyncEvent; + adapterState: BaseState; + options?: WorkerAdapterOptions; + }) { + this.event = event; + this.options = options; + this.adapterState = adapterState; + this.hasWorkerEmitted = false; + this.isTimeout = false; + this.uploader = new Uploader({ + event, + options, + }); + } + + /** Connector-owned state exposed to snap-in code. */ + get state(): ConnectorState { + return this.adapterState.state; + } + + set state(value: ConnectorState) { + this.adapterState.state = value; + } + + /** SDK-internal bookkeeping state. Used by SDK internals; not for connector use. 
*/ + get sdkState(): SdkState { + return this.adapterState.sdkState; + } + + get extractionScope() { + return this.adapterState.extractionScope; + } + + async postState() { + return runWithSdkLogContext(async () => { + await this.adapterState.postState(); + }); + } + + /** + * Pre-emit hook run before any state is persisted or the event is sent. + * Extraction uploads pending repos and updates extraction boundaries here; + * loading has nothing to do. Throwing aborts the emit (the caller signals + * worker exit). + */ + protected abstract beforeEmit( + newEventType: ExtractorEventType | LoaderEventType + ): Promise; + + /** + * Builds the mode-specific extras merged into the emitted event payload + * (extraction: artifacts; loading: reports + processed files). + */ + protected abstract buildEmitPayload( + newEventType: ExtractorEventType | LoaderEventType + ): EventData; + + /** + * Post-emit hook run after the event has been sent successfully. Extraction + * clears its accumulated artifacts here; loading has nothing to do. + */ + protected abstract afterEmit( + newEventType: ExtractorEventType | LoaderEventType + ): void; + + /** + * Emits an event to the platform. + * + * @param newEventType - The event type to be emitted + * @param data - The data to be sent with the event + */ + async emit( + newEventType: ExtractorEventType | LoaderEventType, + data?: EventData + ): Promise { + return runWithSdkLogContext(async () => { + if (this.hasWorkerEmitted) { + console.warn( + `Trying to emit event with event type: ${newEventType}. 
Ignoring emit request because it has already been emitted.` + ); + return; + } + + try { + await this.beforeEmit(newEventType); + } catch (error) { + console.error('Error while preparing to emit event', error); + parentPort?.postMessage(WorkerMessageSubject.WorkerMessageExit); + this.hasWorkerEmitted = true; + return; + } + + // We want to save the state every time we emit an event, except for the start and delete events + if (!STATELESS_EVENT_TYPES.includes(this.event.payload.event_type)) { + console.log( + `Saving state before emitting event with event type: ${newEventType}.` + ); + + try { + await this.adapterState.postState(); + } catch (error) { + console.error('Error while posting state', error); + parentPort?.postMessage(WorkerMessageSubject.WorkerMessageExit); + this.hasWorkerEmitted = true; + return; + } + } + + try { + // Always prune error messages to make them shorter before emit + if (data?.error?.message) { + data.error.message = truncateMessage(data.error.message); + } + + await emit({ + eventType: newEventType, + event: this.event, + data: { + ...data, + ...this.buildEmitPayload(newEventType), + }, + }); + + const message: WorkerMessageEmitted = { + subject: WorkerMessageSubject.WorkerMessageEmitted, + payload: { eventType: newEventType }, + }; + this.afterEmit(newEventType); + parentPort?.postMessage(message); + this.hasWorkerEmitted = true; + } catch (error) { + console.error( + `Error while emitting event with event type: ${newEventType}.`, + serializeError(error) + ); + parentPort?.postMessage(WorkerMessageSubject.WorkerMessageExit); + this.hasWorkerEmitted = true; + } + }); + } +} diff --git a/src/multithreading/adapters/extraction-adapter.ts b/src/multithreading/adapters/extraction-adapter.ts new file mode 100644 index 0000000..6aba8cc --- /dev/null +++ b/src/multithreading/adapters/extraction-adapter.ts @@ -0,0 +1,535 @@ +import { AxiosResponse } from 'axios'; + +import { AttachmentsStreamingPool } from 
'../../attachments-streaming/attachments-streaming-pool'; +import { + AirSyncDefaultItemTypes, + EVENT_SIZE_THRESHOLD_BYTES, + SSOR_ATTACHMENT, +} from '../../common/constants'; +import { serializeError } from '../../logger/logger'; +import { + runWithSdkLogContext, + runWithUserLogContext, +} from '../../logger/logger.context'; +import { Repo } from '../../repo/repo'; +import { + NormalizedAttachment, + RepoInterface, +} from '../../repo/repo.interfaces'; +import { + AirSyncEvent, + EventData, + ExternalSystemAttachmentProcessors, + ExternalSystemAttachmentStreamingFunction, + ExtractorEventType, + ProcessAttachmentReturnType, + StreamAttachmentsReturnType, +} from '../../types/extraction'; +import { LoaderEventType } from '../../types/loading'; +import { BaseState } from '../../state/state'; +import { WorkerAdapterOptions } from '../../types/workers'; +import { Artifact, SsorAttachment } from '../../uploader/uploader.interfaces'; + +import { BaseAdapter } from './base-adapter'; + +/** + * ExtractionAdapter is the adapter passed to extraction tasks. It exposes the + * extraction surface (repos, artifacts, attachment streaming) and uploads + * pending repos and updates the extraction boundaries before emitting. + * + * @public + */ +export class ExtractionAdapter< + ConnectorState +> extends BaseAdapter { + private _artifacts: Artifact[]; + private repos: Repo[] = []; + private currentEventDataLength: number = 0; + + constructor(params: { + event: AirSyncEvent; + adapterState: BaseState; + options?: WorkerAdapterOptions; + }) { + super(params); + this._artifacts = []; + } + + /** + * Returns whether the given item type should be extracted. + * Defaults to true if the scope is empty or the item type is not listed. 
+ */ + shouldExtract(itemType: string): boolean { + const scope = this.extractionScope; + if (Object.keys(scope).length === 0) return true; + if (!(itemType in scope)) return true; + return scope[itemType].extract; + } + + initializeRepos(repos: RepoInterface[]) { + this.repos = repos.map((repo) => { + const shouldNormalize = + repo.itemType !== AirSyncDefaultItemTypes.EXTERNAL_DOMAIN_METADATA && + repo.itemType !== SSOR_ATTACHMENT; + + return new Repo({ + event: this.event, + itemType: repo.itemType, + ...(shouldNormalize && { normalize: repo.normalize }), + onUpload: (artifact: Artifact) => { + // We need to store artifacts ids in state for later use when streaming attachments + if (repo.itemType === AirSyncDefaultItemTypes.ATTACHMENTS) { + this.sdkState.toDevRev?.attachmentsMetadata.artifactIds.push( + artifact.id + ); + } + + // Calculate size of the entire artifact object that goes in the SQS message + this.currentEventDataLength += Buffer.byteLength( + JSON.stringify(artifact), + 'utf8' + ); + + if ( + this.currentEventDataLength > EVENT_SIZE_THRESHOLD_BYTES && + !this.isTimeout + ) { + this.isTimeout = true; + } + }, + options: { + ...this.options, + ...repo.overridenOptions, + }, + }); + }); + } + + getRepo(itemType: string): Repo | undefined { + return runWithSdkLogContext(() => { + const repo = this.repos.find((repo) => repo.itemType === itemType); + + if (!repo) { + console.error(`Repo for item type ${itemType} not found.`); + return; + } + + return repo; + }); + } + + get artifacts(): Artifact[] { + return this._artifacts; + } + + set artifacts(artifacts: Artifact[]) { + this._artifacts = this._artifacts + .concat(artifacts) + .filter((value, index, self) => self.indexOf(value) === index); + } + + protected async beforeEmit( + newEventType: ExtractorEventType | LoaderEventType + ): Promise { + // Upload all repos before emitting the event + console.log( + `Uploading all repos before emitting event with event type: ${newEventType}.` + ); + await 
this.uploadAllRepos(); + + // If the extraction is done, we want to save the timestamp of the last successful sync + if (newEventType === ExtractorEventType.AttachmentExtractionDone) { + const sdkState = this.sdkState; + + console.log( + `Overwriting lastSuccessfulSyncStarted with lastSyncStarted (${sdkState.lastSyncStarted}).` + ); + + sdkState.lastSuccessfulSyncStarted = sdkState.lastSyncStarted; + sdkState.lastSyncStarted = ''; + + // Clear pending extraction boundaries now that the cycle is complete + sdkState.pendingWorkersOldest = ''; + sdkState.pendingWorkersNewest = ''; + + // Update workersOldest and workersNewest boundaries from resolved extraction timestamps. + // Expand boundaries: workersOldest gets the earliest timestamp, workersNewest gets the latest. + const extractionStart = this.event.payload.event_context.extract_from; + const extractionEnd = this.event.payload.event_context.extract_to; + + if ( + extractionStart && + (!sdkState.workersOldest || extractionStart < sdkState.workersOldest) + ) { + console.log( + `Updating workersOldest from '${sdkState.workersOldest}' to '${extractionStart}'.` + ); + sdkState.workersOldest = extractionStart; + } + + if ( + extractionEnd && + (!sdkState.workersNewest || extractionEnd > sdkState.workersNewest) + ) { + console.log( + `Updating workersNewest from '${sdkState.workersNewest}' to '${extractionEnd}'.` + ); + sdkState.workersNewest = extractionEnd; + } + } + } + + protected buildEmitPayload( + newEventType: ExtractorEventType | LoaderEventType + ): EventData { + const isExtractionEvent = Object.values(ExtractorEventType).includes( + newEventType as ExtractorEventType + ); + return isExtractionEvent ? { artifacts: this.artifacts } : {}; + } + + protected afterEmit(): void { + this.artifacts = []; + } + + /** + * Emits an extraction event. 
For ExternalSyncUnitExtractionDone, inline + * external sync units are uploaded via a Repo first (then stripped from the + * payload to avoid SQS size limits); `beforeEmit` uploads that repo with the + * rest before the event is sent. + * TODO: Remove the external sync unit handling in v2.0.0 + * + * @param newEventType - The event type to be emitted + * @param data - The data to be sent with the event + */ + override async emit( + newEventType: ExtractorEventType | LoaderEventType, + data?: EventData + ): Promise { + if ( + newEventType === ExtractorEventType.ExternalSyncUnitExtractionDone && + data?.external_sync_units && + data.external_sync_units.length > 0 + ) { + console.log( + `Uploading ${data.external_sync_units.length} external sync units via repo before emitting event.` + ); + + this.initializeRepos([ + { + itemType: AirSyncDefaultItemTypes.EXTERNAL_SYNC_UNITS, + overridenOptions: { + batchSize: 25000, + skipConfirmation: true, + }, + }, + ]); + + await this.getRepo(AirSyncDefaultItemTypes.EXTERNAL_SYNC_UNITS)?.push( + data.external_sync_units + ); + + // Remove inline external_sync_units from data to avoid SQS size issues + delete data.external_sync_units; + } + + return super.emit(newEventType, data); + } + + async uploadAllRepos(): Promise { + for (const repo of this.repos) { + const error = await repo.upload(); + this.artifacts.push(...repo.uploadedArtifacts); + if (error) { + throw error; + } + } + } + + async processAttachment( + attachment: NormalizedAttachment, + stream: ExternalSystemAttachmentStreamingFunction + ): Promise { + return runWithSdkLogContext(async () => { + const { httpStream, delay, error } = await runWithUserLogContext( + async () => + stream({ + item: attachment, + event: this.event, + }) + ); + + if (error) { + return { error }; + } else if (delay) { + return { delay }; + } + + if (httpStream) { + const fileType = + attachment.content_type || + httpStream.headers['content-type']?.toString() || + 'application/octet-stream'; + 
+ const contentLength = httpStream.headers['content-length']?.toString(); + const fileSize = contentLength ? parseInt(contentLength) : undefined; + + // Get upload URL + const { error: artifactUrlError, response: artifactUrlResponse } = + await this.uploader.getArtifactUploadUrl( + attachment.file_name, + fileType, + fileSize + ); + + if (artifactUrlError) { + this.destroyHttpStream(httpStream); + return { + error: { + message: `Error while preparing artifact for attachment ID ${ + attachment.id + }. Skipping attachment. ${serializeError(artifactUrlError)}`, + fileSize: fileSize, + }, + }; + } + + // Stream attachment + const { error: uploadedArtifactError } = + await this.uploader.streamArtifact(artifactUrlResponse!, httpStream); + + if (uploadedArtifactError) { + this.destroyHttpStream(httpStream); + return { + error: { + message: + `Error while streaming to artifact for attachment ID ${attachment.id}. Skipping attachment. ` + + serializeError(uploadedArtifactError), + fileSize: fileSize, + }, + }; + } + + // Confirm attachment upload + const { error: confirmArtifactUploadError } = + await this.uploader.confirmArtifactUpload( + artifactUrlResponse!.artifact_id + ); + if (confirmArtifactUploadError) { + return { + error: { + message: + `Error while confirming upload for attachment ID ${attachment.id}. ` + + serializeError(confirmArtifactUploadError), + fileSize: fileSize, + }, + }; + } + + const ssorAttachment: SsorAttachment = { + id: { + devrev: artifactUrlResponse!.artifact_id, + external: attachment.id, + }, + parent_id: { + external: attachment.parent_id, + }, + }; + + if (attachment.author_id) { + ssorAttachment.actor_id = { + external: attachment.author_id, + }; + } + + // This will set inline flag in ssor_attachment only if it is explicitly + // set in the attachment object.
+ if (attachment.inline === true) { + ssorAttachment.inline = true; + } else if (attachment.inline === false) { + ssorAttachment.inline = false; + } + + await this.getRepo('ssor_attachment')?.push([ssorAttachment]); + return; + } + return { + error: { + message: `Error while opening attachment stream. Skipping attachment.`, + }, + }; + }); + } + + /** + * Destroys a stream to prevent memory leaks. + * @param httpStream - The axios response stream to destroy + */ + private destroyHttpStream(httpStream: AxiosResponse): void { + try { + if (httpStream && httpStream.data) { + if (typeof httpStream.data.destroy === 'function') { + httpStream.data.destroy(); + } else if (typeof httpStream.data.close === 'function') { + httpStream.data.close(); + } + } + } catch (error) { + console.warn('Error while destroying HTTP stream:', error); + } + } + + /** + * Streams the attachments to the DevRev platform. + * The attachments are streamed to the platform and the artifact information is returned. + * @param params - The parameters to stream the attachments + * @returns The response object containing the ssorAttachment artifact information + * or error information if there was an error + */ + async streamAttachments({ + stream, + processors, + batchSize = 1, // By default, we want to stream one attachment at a time + }: { + stream: ExternalSystemAttachmentStreamingFunction; + processors?: ExternalSystemAttachmentProcessors< + ConnectorState, + NormalizedAttachment[], + NewBatch + >; + batchSize?: number; + }): Promise { + return runWithSdkLogContext(async () => { + if (batchSize <= 0) { + console.warn( + `The specified batch size (${batchSize}) is invalid. Using 1 instead.` + ); + batchSize = 1; + } + + if (batchSize > 50) { + console.warn( + `The specified batch size (${batchSize}) is too large. 
Using 50 instead.` + ); + batchSize = 50; + } + + const repos = [ + { + itemType: 'ssor_attachment', + }, + ]; + this.initializeRepos(repos); + + const attachmentsMetadata = this.sdkState.toDevRev?.attachmentsMetadata; + + // If there are no attachments metadata artifact IDs in state, finish here + if (!attachmentsMetadata?.artifactIds?.length) { + console.log(`No attachments metadata artifact IDs found in state.`); + return; + } else { + console.log( + `Found ${attachmentsMetadata.artifactIds.length} attachments metadata artifact IDs in state.` + ); + } + + // Loop through the attachments metadata artifact IDs + while (attachmentsMetadata.artifactIds.length > 0) { + const attachmentsMetadataArtifactId = + attachmentsMetadata.artifactIds[0]; + + console.log( + `Started processing attachments for attachments metadata artifact ID: ${attachmentsMetadataArtifactId}.` + ); + + const { attachments, error } = + await this.uploader.getAttachmentsFromArtifactId({ + artifact: attachmentsMetadataArtifactId, + }); + + if (error) { + console.error( + `Failed to get attachments for artifact ID: ${attachmentsMetadataArtifactId}.` + ); + return { error }; + } + + if (!attachments || attachments.length === 0) { + console.warn( + `No attachments found for artifact ID: ${attachmentsMetadataArtifactId}.` + ); + // Remove empty artifact and reset lastProcessed + attachmentsMetadata.artifactIds.shift(); + attachmentsMetadata.lastProcessed = 0; + continue; + } + + console.log( + `Found ${attachments.length} attachments for artifact ID: ${attachmentsMetadataArtifactId}.` + ); + + let response; + + if (processors) { + console.log(`Using custom processors for attachments.`); + + const reducer = processors.reducer; + const iterator = processors.iterator; + + const reducedAttachments = runWithUserLogContext(() => + reducer({ + attachments, + adapter: this, + batchSize, + }) + ); + + response = await runWithUserLogContext(async () => { + return await iterator({ + reducedAttachments, + adapter: 
this, + stream, + }); + }); + } else { + console.log( + `Using attachments streaming pool for attachments streaming.` + ); + + const attachmentsPool = new AttachmentsStreamingPool({ + adapter: this, + attachments, + batchSize, + stream, + }); + + response = await attachmentsPool.streamAll(); + } + + if (response?.delay || response?.error) { + return response; + } + + // On timeout, emit progress and exit to allow continuation. + if (this.isTimeout) { + console.log( + `Timeout detected after processing attachments for artifact ID: ${attachmentsMetadataArtifactId}. Emitting progress to allow continuation.` + ); + await this.emit(ExtractorEventType.AttachmentExtractionProgress); + process.exit(0); + return; + } + + console.log( + `Finished processing all attachments for artifact ID: ${attachmentsMetadataArtifactId}.` + ); + attachmentsMetadata.artifactIds.shift(); + attachmentsMetadata.lastProcessed = 0; + if (attachmentsMetadata.lastProcessedAttachmentsIdsList) { + attachmentsMetadata.lastProcessedAttachmentsIdsList.length = 0; + } + } + + return; + }); + } +} diff --git a/src/multithreading/worker-adapter/worker-adapter.helpers.ts b/src/multithreading/adapters/loading-adapter.helpers.ts similarity index 100% rename from src/multithreading/worker-adapter/worker-adapter.helpers.ts rename to src/multithreading/adapters/loading-adapter.helpers.ts diff --git a/src/multithreading/adapters/loading-adapter.ts b/src/multithreading/adapters/loading-adapter.ts new file mode 100644 index 0000000..66a3fd2 --- /dev/null +++ b/src/multithreading/adapters/loading-adapter.ts @@ -0,0 +1,645 @@ +import axios from 'axios'; + +import { serializeError } from '../../logger/logger'; +import { + runWithSdkLogContext, + runWithUserLogContext, +} from '../../logger/logger.context'; +import { Mappers } from '../../mappers/mappers'; +import { SyncMapperRecordStatus } from '../../mappers/mappers.interface'; +import { BaseState } from '../../state/state'; +import { + AirSyncEvent, + EventData, + 
EventType, + ExtractorEventType, +} from '../../types/extraction'; +import { + ActionType, + ExternalSystemAttachment, + ExternalSystemItem, + ExternalSystemLoadingFunction, + FileToLoad, + ItemTypesToLoadParams, + ItemTypeToLoad, + LoaderEventType, + LoaderReport, + LoadItemResponse, + LoadItemTypesResponse, + StatsFileObject, +} from '../../types/loading'; +import { WorkerAdapterOptions } from '../../types/workers'; + +import { BaseAdapter } from './base-adapter'; +import { + addReportToLoaderReport, + getFilesToLoad, +} from './loading-adapter.helpers'; + +/** + * LoadingAdapter is the adapter passed to loading tasks. It exposes the loading + * surface (item/attachment loading, mappers, loader reports). + * + * @public + */ +export class LoadingAdapter< + ConnectorState +> extends BaseAdapter { + private loaderReports: LoaderReport[]; + private _processedFiles: string[]; + private _mappers: Mappers; + + constructor(params: { + event: AirSyncEvent; + adapterState: BaseState; + options?: WorkerAdapterOptions; + }) { + super(params); + this.loaderReports = []; + this._processedFiles = []; + this._mappers = new Mappers({ + event: params.event, + options: params.options, + }); + } + + get reports(): LoaderReport[] { + return this.loaderReports; + } + + get processedFiles(): string[] { + return this._processedFiles; + } + + get mappers(): Mappers { + return this._mappers; + } + + protected async beforeEmit(): Promise { + // Loading has no pre-emit work (no repos, no extraction boundaries). + } + + protected buildEmitPayload( + newEventType: ExtractorEventType | LoaderEventType + ): EventData { + const isLoaderEvent = Object.values(LoaderEventType).includes( + newEventType as LoaderEventType + ); + return isLoaderEvent + ? { + reports: this.reports, + processed_files: this.processedFiles, + } + : {}; + } + + protected afterEmit(): void { + // Loading keeps its accumulated reports/processed files across emits. 
+ } + + async loadItemTypes({ + itemTypesToLoad, + }: ItemTypesToLoadParams): Promise { + return runWithSdkLogContext(async () => { + if (this.event.payload.event_type === EventType.StartLoadingData) { + const itemTypes = itemTypesToLoad.map( + (itemTypeToLoad) => itemTypeToLoad.itemType + ); + + if (!itemTypes.length) { + console.warn('No item types to load, returning.'); + return { + reports: this.reports, + processed_files: this.processedFiles, + }; + } + + const filesToLoad = await this.getLoaderBatches({ + supportedItemTypes: itemTypes, + }); + this.sdkState.fromDevRev = { + filesToLoad, + }; + } + + if ( + !this.sdkState.fromDevRev || + !this.sdkState.fromDevRev.filesToLoad.length + ) { + console.warn('No files to load, returning.'); + return { + reports: this.reports, + processed_files: this.processedFiles, + }; + } + + console.log( + 'Files to load in state', + this.sdkState.fromDevRev?.filesToLoad + ); + + try { + outerloop: for (const fileToLoad of this.sdkState.fromDevRev + .filesToLoad) { + const itemTypeToLoad = itemTypesToLoad.find( + (itemTypeToLoad: ItemTypeToLoad) => + itemTypeToLoad.itemType === fileToLoad.itemType + ); + + if (!itemTypeToLoad) { + console.error( + `Item type to load not found for item type: ${fileToLoad.itemType}.` + ); + + await this.emit(LoaderEventType.DataLoadingError, { + error: { + message: `Item type to load not found for item type: ${fileToLoad.itemType}.`, + }, + }); + + break; + } + + if (!fileToLoad.completed) { + const { response, error: transformerFileError } = + await this.uploader.getJsonObjectByArtifactId({ + artifactId: fileToLoad.id, + isGzipped: true, + }); + + if (transformerFileError) { + console.error( + `Transformer file not found for artifact ID: ${fileToLoad.id}.` + ); + await this.emit(LoaderEventType.DataLoadingError, { + error: { + message: `Transformer file not found for artifact ID: ${fileToLoad.id}.`, + }, + }); + break outerloop; + } + + const transformerFile = response as ExternalSystemItem[]; + + 
for (let i = fileToLoad.lineToProcess; i < fileToLoad.count; i++) { + if (this.isTimeout) { + console.log( + 'Timeout detected during data loading. Emitting progress to allow continuation.' + ); + await this.emit(LoaderEventType.DataLoadingProgress); + process.exit(0); + } + + const { report, rateLimit } = await this.loadItem({ + item: transformerFile[i], + itemTypeToLoad, + }); + + if (rateLimit?.delay) { + await this.emit(LoaderEventType.DataLoadingDelayed, { + delay: rateLimit.delay, + reports: this.reports, + processed_files: this.processedFiles, + }); + + break outerloop; + } + + if (report) { + addReportToLoaderReport({ + loaderReports: this.loaderReports, + report, + }); + fileToLoad.lineToProcess = fileToLoad.lineToProcess + 1; + } + } + + fileToLoad.completed = true; + this._processedFiles.push(fileToLoad.id); + } + } + } catch (error) { + console.error('Error during data loading.', serializeError(error)); + await this.emit(LoaderEventType.DataLoadingError, { + error: { + message: `Error during data loading. 
${serializeError(error)}`, + }, + }); + process.exit(1); + } + + return { + reports: this.reports, + processed_files: this.processedFiles, + }; + }); + } + + async getLoaderBatches({ + supportedItemTypes, + }: { + supportedItemTypes: string[]; + }) { + return runWithSdkLogContext(async () => { + const statsFileArtifactId = this.event.payload.event_data?.stats_file; + + if (statsFileArtifactId) { + const { response, error: statsFileError } = + await this.uploader.getJsonObjectByArtifactId({ + artifactId: statsFileArtifactId, + }); + + const statsFile = response as StatsFileObject[]; + + if (statsFileError || statsFile.length === 0) { + return [] as FileToLoad[]; + } + + const filesToLoad = getFilesToLoad({ + supportedItemTypes, + statsFile, + }); + + return filesToLoad; + } + + return [] as FileToLoad[]; + }); + } + + async loadAttachments({ + create, + }: { + create: ExternalSystemLoadingFunction; + }): Promise { + return runWithSdkLogContext(async () => { + if (this.event.payload.event_type === EventType.StartLoadingAttachments) { + this.sdkState.fromDevRev = { + filesToLoad: await this.getLoaderBatches({ + supportedItemTypes: ['attachment'], + }), + }; + } + + if ( + !this.sdkState.fromDevRev || + this.sdkState.fromDevRev?.filesToLoad.length === 0 + ) { + console.log('No files to load, returning.'); + return { + reports: this.reports, + processed_files: this.processedFiles, + }; + } + + const filesToLoad = this.sdkState.fromDevRev?.filesToLoad; + + try { + outerloop: for (const fileToLoad of filesToLoad) { + if (!fileToLoad.completed) { + const { response, error: transformerFileError } = + await this.uploader.getJsonObjectByArtifactId({ + artifactId: fileToLoad.id, + isGzipped: true, + }); + + const transformerFile = response as ExternalSystemAttachment[]; + + if (transformerFileError) { + console.error( + `Transformer file not found for artifact ID: ${fileToLoad.id}.` + ); + await this.emit(LoaderEventType.AttachmentLoadingError, { + error: { + message: 
`Transformer file not found for artifact ID: ${fileToLoad.id}.`, + }, + }); + break outerloop; + } + + for (let i = fileToLoad.lineToProcess; i < fileToLoad.count; i++) { + if (this.isTimeout) { + console.log( + 'Timeout detected during attachment loading. Emitting progress to allow continuation.' + ); + await this.emit(LoaderEventType.AttachmentLoadingProgress); + process.exit(0); + } + + const { report, rateLimit } = await this.loadAttachment({ + item: transformerFile[i], + create, + }); + + if (rateLimit?.delay) { + await this.emit(LoaderEventType.AttachmentLoadingDelayed, { + delay: rateLimit.delay, + reports: this.reports, + processed_files: this.processedFiles, + }); + + break outerloop; + } + + if (report) { + addReportToLoaderReport({ + loaderReports: this.loaderReports, + report, + }); + fileToLoad.lineToProcess = fileToLoad.lineToProcess + 1; + } + } + + fileToLoad.completed = true; + this._processedFiles.push(fileToLoad.id); + } + } + } catch (error) { + console.error( + 'Error during attachment loading.', + serializeError(error) + ); + await this.emit(LoaderEventType.AttachmentLoadingError, { + error: { + message: `Error during attachment loading. 
${serializeError( + error + )}`, + }, + }); + process.exit(1); + } + + return { + reports: this.reports, + processed_files: this.processedFiles, + }; + }); + } + + async loadItem({ + item, + itemTypeToLoad, + }: { + item: ExternalSystemItem; + itemTypeToLoad: ItemTypeToLoad; + }): Promise { + return runWithSdkLogContext(async () => { + const devrevId = item.id.devrev; + + try { + const syncMapperRecordResponse = await this._mappers.getByTargetId({ + sync_unit: this.event.payload.event_context.sync_unit, + target: devrevId, + }); + + const syncMapperRecord = syncMapperRecordResponse.data; + if (!syncMapperRecord) { + console.warn('Failed to get sync mapper record from response.'); + return { + error: { + message: 'Failed to get sync mapper record from response.', + }, + }; + } + + // Update item in external system + const { id, modifiedDate, delay, error } = await runWithUserLogContext( + async () => { + return await itemTypeToLoad.update({ + item, + mappers: this._mappers, + event: this.event, + }); + } + ); + + if (id) { + try { + const syncMapperRecordUpdateResponse = await this._mappers.update({ + id: syncMapperRecord.sync_mapper_record.id, + sync_unit: this.event.payload.event_context.sync_unit, + status: SyncMapperRecordStatus.OPERATIONAL, + ...(modifiedDate && { + external_versions: { + add: [ + { + modified_date: modifiedDate, + recipe_version: 0, + }, + ], + }, + }), + external_ids: { + add: [id], + }, + targets: { + add: [devrevId], + }, + }); + + console.log( + 'Successfully updated sync mapper record.', + syncMapperRecordUpdateResponse.data + ); + } catch (error) { + console.warn( + 'Failed to update sync mapper record.', + serializeError(error) + ); + return { + error: { + message: + 'Failed to update sync mapper record' + serializeError(error), + }, + }; + } + + return { + report: { + item_type: itemTypeToLoad.itemType, + [ActionType.UPDATED]: 1, + }, + }; + } else if (delay) { + console.log( + `Rate limited while updating item in external system, 
delaying for ${delay} seconds.` + ); + + return { + rateLimit: { + delay, + }, + }; + } else { + console.warn('Failed to update item in external system', error); + return { + report: { + item_type: itemTypeToLoad.itemType, + [ActionType.FAILED]: 1, + }, + }; + } + + // TODO: Update mapper (optional) + } catch (error) { + if (axios.isAxiosError(error)) { + if (error.response?.status === 404) { + // Create item in external system if mapper record not found + const { id, modifiedDate, delay, error } = + await runWithUserLogContext(async () => { + return await itemTypeToLoad.create({ + item, + mappers: this._mappers, + event: this.event, + }); + }); + + if (id) { + // Create mapper + try { + const syncMapperRecordCreateResponse = + await this._mappers.create({ + sync_unit: this.event.payload.event_context.sync_unit, + status: SyncMapperRecordStatus.OPERATIONAL, + external_ids: [id], + targets: [devrevId], + ...(modifiedDate && { + external_versions: [ + { + modified_date: modifiedDate, + recipe_version: 0, + }, + ], + }), + }); + + console.log( + 'Successfully created sync mapper record.', + syncMapperRecordCreateResponse.data + ); + + return { + report: { + item_type: itemTypeToLoad.itemType, + [ActionType.CREATED]: 1, + }, + }; + } catch (error) { + console.warn( + 'Failed to create sync mapper record.', + serializeError(error) + ); + return { + error: { + message: + 'Failed to create sync mapper record. 
' + + serializeError(error), + }, + }; + } + } else if (delay) { + return { + rateLimit: { + delay, + }, + }; + } else { + console.warn( + 'Failed to create item in external system.', + serializeError(error) + ); + return { + report: { + item_type: itemTypeToLoad.itemType, + [ActionType.FAILED]: 1, + }, + }; + } + } else { + console.warn( + 'Failed to get sync mapper record.', + serializeError(error) + ); + return { + error: { + message: error.message, + }, + }; + } + } + + console.warn( + 'Failed to get sync mapper record.', + serializeError(error) + ); + return { + error: { + message: + 'Failed to get sync mapper record. ' + serializeError(error), + }, + }; + } + }); + } + + async loadAttachment({ + item, + create, + }: { + item: ExternalSystemAttachment; + create: ExternalSystemLoadingFunction; + }): Promise { + return runWithSdkLogContext(async () => { + // Create item + const { id, delay, error } = await runWithUserLogContext(async () => + create({ + item, + mappers: this._mappers, + event: this.event, + }) + ); + + if (delay) { + return { + rateLimit: { + delay, + }, + }; + } else if (id) { + try { + const syncMapperRecordCreateResponse = await this._mappers.create({ + sync_unit: this.event.payload.event_context.sync_unit, + external_ids: [id], + targets: [item.reference_id], + status: SyncMapperRecordStatus.OPERATIONAL, + }); + + console.log( + 'Successfully created sync mapper record.', + syncMapperRecordCreateResponse.data + ); + } catch (error) { + console.warn( + 'Failed to create sync mapper record.', + serializeError(error) + ); + } + + return { + report: { + item_type: 'attachment', + [ActionType.CREATED]: 1, + }, + }; + } else { + console.warn('Failed to create attachment in external system', error); + return { + report: { + item_type: 'attachment', + [ActionType.FAILED]: 1, + }, + }; + } + }); + } +} diff --git a/src/multithreading/process-task.ts b/src/multithreading/process-task.ts index 0818f55..619bf85 100644 --- 
a/src/multithreading/process-task.ts +++ b/src/multithreading/process-task.ts @@ -5,12 +5,15 @@ import { runWithUserLogContext, } from '../logger/logger.context'; import { createAdapterState } from '../state/state'; +import { SyncMode } from '../types/common'; import { ProcessTaskInterface, + WorkerAdapter, WorkerEvent, WorkerMessageSubject, } from '../types/workers'; -import { WorkerAdapter } from './worker-adapter/worker-adapter'; +import { ExtractionAdapter } from './adapters/extraction-adapter'; +import { LoadingAdapter } from './adapters/loading-adapter'; export function processTask({ task, @@ -38,11 +41,18 @@ export function processTask({ options, }); - const adapter = new WorkerAdapter({ - event, - adapterState, - options, - }); + const adapter: WorkerAdapter = + event.payload.event_context.mode === SyncMode.LOADING + ? new LoadingAdapter({ + event, + adapterState, + options, + }) + : new ExtractionAdapter({ + event, + adapterState, + options, + }); parentPort?.on(WorkerEvent.WorkerMessage, (message) => { if (message.subject !== WorkerMessageSubject.WorkerMessageExit) { diff --git a/src/multithreading/worker-adapter/worker-adapter.ts b/src/multithreading/worker-adapter/worker-adapter.ts deleted file mode 100644 index aae497a..0000000 --- a/src/multithreading/worker-adapter/worker-adapter.ts +++ /dev/null @@ -1,1244 +0,0 @@ -import axios, { AxiosResponse } from 'axios'; -import { parentPort } from 'node:worker_threads'; -import { AttachmentsStreamingPool } from '../../attachments-streaming/attachments-streaming-pool'; -import { - AirSyncDefaultItemTypes, - EVENT_SIZE_THRESHOLD_BYTES, - SSOR_ATTACHMENT, - STATELESS_EVENT_TYPES, -} from '../../common/constants'; -import { emit } from '../../common/control-protocol'; -import { - addReportToLoaderReport, - getFilesToLoad, -} from './worker-adapter.helpers'; -import { serializeError } from '../../logger/logger'; -import { - runWithSdkLogContext, - runWithUserLogContext, -} from '../../logger/logger.context'; 
-import { Mappers } from '../../mappers/mappers'; -import { SyncMapperRecordStatus } from '../../mappers/mappers.interface'; -import { Repo } from '../../repo/repo'; -import { - NormalizedAttachment, - RepoInterface, -} from '../../repo/repo.interfaces'; -import { BaseState } from '../../state/state'; -import { SdkState } from '../../state/state.interfaces'; -import { - AirSyncEvent, - EventData, - EventType, - ExternalSystemAttachmentProcessors, - ExternalSystemAttachmentStreamingFunction, - ExtractorEventType, - ProcessAttachmentReturnType, - StreamAttachmentsReturnType, -} from '../../types/extraction'; -import { - ActionType, - ExternalSystemAttachment, - ExternalSystemItem, - ExternalSystemLoadingFunction, - FileToLoad, - ItemTypesToLoadParams, - ItemTypeToLoad, - LoaderEventType, - LoaderReport, - LoadItemResponse, - LoadItemTypesResponse, - StatsFileObject, -} from '../../types/loading'; -import { - WorkerAdapterInterface, - WorkerAdapterOptions, - WorkerMessageEmitted, - WorkerMessageSubject, -} from '../../types/workers'; -import { Uploader } from '../../uploader/uploader'; -import { Artifact, SsorAttachment } from '../../uploader/uploader.interfaces'; -import { truncateMessage } from '../../common/helpers'; - -export function createWorkerAdapter({ - event, - adapterState, - options, -}: WorkerAdapterInterface): WorkerAdapter { - return new WorkerAdapter({ - event, - adapterState, - options, - }); -} - -/** - * WorkerAdapter class is used to interact with AirSync platform. It is passed to the snap-in - * as parameter in processTask and onTimeout functions. The class provides - * utilities to emit control events to the platform, update the state of the connector, - * and upload artifacts to the platform. 
- * @class WorkerAdapter - * @constructor - * @param options - The options to create a new instance of WorkerAdapter class - * @param event - The event object received from the platform - * @param initialState - The initial state of the adapter - * @param isLocalDevelopment - A flag to indicate if the adapter is being used in local development - * @param workerPath - The path to the worker file - * - * @public - */ -export class WorkerAdapter { - readonly event: AirSyncEvent; - readonly options?: WorkerAdapterOptions; - isTimeout: boolean; - hasWorkerEmitted: boolean; - - private adapterState: BaseState; - private _artifacts: Artifact[]; - private repos: Repo[] = []; - private currentEventDataLength: number = 0; - - // Loader - private loaderReports: LoaderReport[]; - private _processedFiles: string[]; - private _mappers: Mappers; - private uploader: Uploader; - - constructor({ - event, - adapterState, - options, - }: WorkerAdapterInterface) { - this.event = event; - this.options = options; - this.adapterState = adapterState; - this._artifacts = []; - this.hasWorkerEmitted = false; - this.isTimeout = false; - - // Loader - this.loaderReports = []; - this._processedFiles = []; - this._mappers = new Mappers({ - event, - options, - }); - this.uploader = new Uploader({ - event, - options, - }); - } - - /** Connector-owned state exposed to snap-in code. */ - get state(): ConnectorState { - return this.adapterState.state; - } - - set state(value: ConnectorState) { - this.adapterState.state = value; - } - - /** SDK-internal bookkeeping state. Used by SDK internals; not for connector use. 
*/ - get sdkState(): SdkState { - return this.adapterState.sdkState; - } - - get reports(): LoaderReport[] { - return this.loaderReports; - } - - get processedFiles(): string[] { - return this._processedFiles; - } - - get mappers(): Mappers { - return this._mappers; - } - - get extractionScope() { - return this.adapterState.extractionScope; - } - - /** - * Returns whether the given item type should be extracted. - * Defaults to true if the scope is empty or the item type is not listed. - */ - shouldExtract(itemType: string): boolean { - const scope = this.extractionScope; - if (Object.keys(scope).length === 0) return true; - if (!(itemType in scope)) return true; - return scope[itemType].extract; - } - - initializeRepos(repos: RepoInterface[]) { - this.repos = repos.map((repo) => { - const shouldNormalize = - repo.itemType !== AirSyncDefaultItemTypes.EXTERNAL_DOMAIN_METADATA && - repo.itemType !== SSOR_ATTACHMENT; - - return new Repo({ - event: this.event, - itemType: repo.itemType, - ...(shouldNormalize && { normalize: repo.normalize }), - onUpload: (artifact: Artifact) => { - // We need to store artifacts ids in state for later use when streaming attachments - if (repo.itemType === AirSyncDefaultItemTypes.ATTACHMENTS) { - this.sdkState.toDevRev?.attachmentsMetadata.artifactIds.push( - artifact.id - ); - } - - // Calculate size of the entire artifact object that goes in the SQS message - this.currentEventDataLength += Buffer.byteLength( - JSON.stringify(artifact), - 'utf8' - ); - - if ( - this.currentEventDataLength > EVENT_SIZE_THRESHOLD_BYTES && - !this.isTimeout - ) { - this.isTimeout = true; - } - }, - options: { - ...this.options, - ...repo.overridenOptions, - }, - }); - }); - } - - getRepo(itemType: string): Repo | undefined { - return runWithSdkLogContext(() => { - const repo = this.repos.find((repo) => repo.itemType === itemType); - - if (!repo) { - console.error(`Repo for item type ${itemType} not found.`); - return; - } - - return repo; - }); - } - - 
async postState() { - return runWithSdkLogContext(async () => { - await this.adapterState.postState(); - }); - } - - get artifacts(): Artifact[] { - return this._artifacts; - } - - set artifacts(artifacts: Artifact[]) { - this._artifacts = this._artifacts - .concat(artifacts) - .filter((value, index, self) => self.indexOf(value) === index); - } - - /** - * Emits an event to the platform. - * - * @param newEventType - The event type to be emitted - * @param data - The data to be sent with the event - */ - async emit( - newEventType: ExtractorEventType | LoaderEventType, - data?: EventData - ): Promise { - return runWithSdkLogContext(async () => { - if (this.hasWorkerEmitted) { - console.warn( - `Trying to emit event with event type: ${newEventType}. Ignoring emit request because it has already been emitted.` - ); - return; - } - - // If the event is ExternalSyncUnitExtractionDone, upload external sync units via a Repo before emitting - // TODO: Remove in v2.0.0 - if ( - newEventType === ExtractorEventType.ExternalSyncUnitExtractionDone && - data?.external_sync_units && - data.external_sync_units.length > 0 - ) { - console.log( - `Uploading ${data.external_sync_units.length} external sync units via repo before emitting event.` - ); - - this.initializeRepos([ - { - itemType: AirSyncDefaultItemTypes.EXTERNAL_SYNC_UNITS, - overridenOptions: { - batchSize: 25000, - skipConfirmation: true, - }, - }, - ]); - - await this.getRepo(AirSyncDefaultItemTypes.EXTERNAL_SYNC_UNITS)?.push( - data.external_sync_units - ); - - // Remove inline external_sync_units from data to avoid SQS size issues - delete data.external_sync_units; - } - - // Upload all repos before emitting the event - console.log( - `Uploading all repos before emitting event with event type: ${newEventType}.` - ); - - try { - await this.uploadAllRepos(); - } catch (error) { - console.error('Error while uploading repos', error); - parentPort?.postMessage(WorkerMessageSubject.WorkerMessageExit); - 
this.hasWorkerEmitted = true; - return; - } - - // If the extraction is done, we want to save the timestamp of the last successful sync - if (newEventType === ExtractorEventType.AttachmentExtractionDone) { - console.log( - `Overwriting lastSuccessfulSyncStarted with lastSyncStarted (${this.sdkState.lastSyncStarted}).` - ); - - this.sdkState.lastSuccessfulSyncStarted = this.sdkState.lastSyncStarted; - this.sdkState.lastSyncStarted = ''; - - // Clear pending extraction boundaries now that the cycle is complete - this.sdkState.pendingWorkersOldest = ''; - this.sdkState.pendingWorkersNewest = ''; - - // Update workersOldest and workersNewest boundaries from resolved extraction timestamps. - // Expand boundaries: workersOldest gets the earliest timestamp, workersNewest gets the latest. - const extractionStart = this.event.payload.event_context.extract_from; - const extractionEnd = this.event.payload.event_context.extract_to; - - if ( - extractionStart && - (!this.sdkState.workersOldest || - extractionStart < this.sdkState.workersOldest) - ) { - console.log( - `Updating workersOldest from '${this.sdkState.workersOldest}' to '${extractionStart}'.` - ); - this.sdkState.workersOldest = extractionStart; - } - - if ( - extractionEnd && - (!this.sdkState.workersNewest || - extractionEnd > this.sdkState.workersNewest) - ) { - console.log( - `Updating workersNewest from '${this.sdkState.workersNewest}' to '${extractionEnd}'.` - ); - this.sdkState.workersNewest = extractionEnd; - } - } - - // We want to save the state every time we emit an event, except for the start and delete events - if (!STATELESS_EVENT_TYPES.includes(this.event.payload.event_type)) { - console.log( - `Saving state before emitting event with event type: ${newEventType}.` - ); - - try { - await this.adapterState.postState(this.state); - } catch (error) { - console.error('Error while posting state', error); - parentPort?.postMessage(WorkerMessageSubject.WorkerMessageExit); - this.hasWorkerEmitted = true; - 
return; - } - } - - try { - // Always prune error messages to make them shorter before emit - if (data?.error?.message) { - data.error.message = truncateMessage(data.error.message); - } - - const isExtractionEvent = Object.values(ExtractorEventType).includes( - newEventType as ExtractorEventType - ); - const isLoaderEvent = Object.values(LoaderEventType).includes( - newEventType as LoaderEventType - ); - - await emit({ - eventType: newEventType, - event: this.event, - data: { - ...data, - ...(isExtractionEvent ? { artifacts: this.artifacts } : {}), - ...(isLoaderEvent - ? { reports: this.reports, processed_files: this.processedFiles } - : {}), - }, - }); - - const message: WorkerMessageEmitted = { - subject: WorkerMessageSubject.WorkerMessageEmitted, - payload: { eventType: newEventType }, - }; - this.artifacts = []; - parentPort?.postMessage(message); - this.hasWorkerEmitted = true; - } catch (error) { - console.error( - `Error while emitting event with event type: ${newEventType}.`, - serializeError(error) - ); - parentPort?.postMessage(WorkerMessageSubject.WorkerMessageExit); - this.hasWorkerEmitted = true; - } - }); - } - - async uploadAllRepos(): Promise { - for (const repo of this.repos) { - const error = await repo.upload(); - this.artifacts.push(...repo.uploadedArtifacts); - if (error) { - throw error; - } - } - } - - async loadItemTypes({ - itemTypesToLoad, - }: ItemTypesToLoadParams): Promise { - return runWithSdkLogContext(async () => { - if (this.event.payload.event_type === EventType.StartLoadingData) { - const itemTypes = itemTypesToLoad.map( - (itemTypeToLoad) => itemTypeToLoad.itemType - ); - - if (!itemTypes.length) { - console.warn('No item types to load, returning.'); - return { - reports: this.reports, - processed_files: this.processedFiles, - }; - } - - const filesToLoad = await this.getLoaderBatches({ - supportedItemTypes: itemTypes, - }); - this.adapterState.sdkState.fromDevRev = { - filesToLoad, - }; - } - - if ( - 
!this.adapterState.sdkState.fromDevRev || - !this.adapterState.sdkState.fromDevRev.filesToLoad.length - ) { - console.warn('No files to load, returning.'); - return { - reports: this.reports, - processed_files: this.processedFiles, - }; - } - - console.log( - 'Files to load in state', - this.adapterState.sdkState.fromDevRev?.filesToLoad - ); - - try { - outerloop: for (const fileToLoad of this.adapterState.sdkState - .fromDevRev.filesToLoad) { - const itemTypeToLoad = itemTypesToLoad.find( - (itemTypeToLoad: ItemTypeToLoad) => - itemTypeToLoad.itemType === fileToLoad.itemType - ); - - if (!itemTypeToLoad) { - console.error( - `Item type to load not found for item type: ${fileToLoad.itemType}.` - ); - - await this.emit(LoaderEventType.DataLoadingError, { - error: { - message: `Item type to load not found for item type: ${fileToLoad.itemType}.`, - }, - }); - - break; - } - - if (!fileToLoad.completed) { - const { response, error: transformerFileError } = - await this.uploader.getJsonObjectByArtifactId({ - artifactId: fileToLoad.id, - isGzipped: true, - }); - - if (transformerFileError) { - console.error( - `Transformer file not found for artifact ID: ${fileToLoad.id}.` - ); - await this.emit(LoaderEventType.DataLoadingError, { - error: { - message: `Transformer file not found for artifact ID: ${fileToLoad.id}.`, - }, - }); - break outerloop; - } - - const transformerFile = response as ExternalSystemItem[]; - - for (let i = fileToLoad.lineToProcess; i < fileToLoad.count; i++) { - if (this.isTimeout) { - console.log( - 'Timeout detected during data loading. Emitting progress to allow continuation.' 
- ); - await this.emit(LoaderEventType.DataLoadingProgress); - process.exit(0); - } - - const { report, rateLimit } = await this.loadItem({ - item: transformerFile[i], - itemTypeToLoad, - }); - - if (rateLimit?.delay) { - await this.emit(LoaderEventType.DataLoadingDelayed, { - delay: rateLimit.delay, - reports: this.reports, - processed_files: this.processedFiles, - }); - - break outerloop; - } - - if (report) { - addReportToLoaderReport({ - loaderReports: this.loaderReports, - report, - }); - fileToLoad.lineToProcess = fileToLoad.lineToProcess + 1; - } - } - - fileToLoad.completed = true; - this._processedFiles.push(fileToLoad.id); - } - } - } catch (error) { - console.error('Error during data loading.', serializeError(error)); - await this.emit(LoaderEventType.DataLoadingError, { - error: { - message: `Error during data loading. ${serializeError(error)}`, - }, - }); - process.exit(1); - } - - return { - reports: this.reports, - processed_files: this.processedFiles, - }; - }); - } - - async getLoaderBatches({ - supportedItemTypes, - }: { - supportedItemTypes: string[]; - }) { - return runWithSdkLogContext(async () => { - const statsFileArtifactId = this.event.payload.event_data?.stats_file; - - if (statsFileArtifactId) { - const { response, error: statsFileError } = - await this.uploader.getJsonObjectByArtifactId({ - artifactId: statsFileArtifactId, - }); - - const statsFile = response as StatsFileObject[]; - - if (statsFileError || statsFile.length === 0) { - return [] as FileToLoad[]; - } - - const filesToLoad = getFilesToLoad({ - supportedItemTypes, - statsFile, - }); - - return filesToLoad; - } - - return [] as FileToLoad[]; - }); - } - - async loadAttachments({ - create, - }: { - create: ExternalSystemLoadingFunction; - }): Promise { - return runWithSdkLogContext(async () => { - if (this.event.payload.event_type === EventType.StartLoadingAttachments) { - this.adapterState.sdkState.fromDevRev = { - filesToLoad: await this.getLoaderBatches({ - 
supportedItemTypes: ['attachment'], - }), - }; - } - - if ( - !this.adapterState.sdkState.fromDevRev || - this.adapterState.sdkState.fromDevRev?.filesToLoad.length === 0 - ) { - console.log('No files to load, returning.'); - return { - reports: this.reports, - processed_files: this.processedFiles, - }; - } - - const filesToLoad = this.adapterState.sdkState.fromDevRev?.filesToLoad; - - try { - outerloop: for (const fileToLoad of filesToLoad) { - if (!fileToLoad.completed) { - const { response, error: transformerFileError } = - await this.uploader.getJsonObjectByArtifactId({ - artifactId: fileToLoad.id, - isGzipped: true, - }); - - const transformerFile = response as ExternalSystemAttachment[]; - - if (transformerFileError) { - console.error( - `Transformer file not found for artifact ID: ${fileToLoad.id}.` - ); - await this.emit(LoaderEventType.AttachmentLoadingError, { - error: { - message: `Transformer file not found for artifact ID: ${fileToLoad.id}.`, - }, - }); - break outerloop; - } - - for (let i = fileToLoad.lineToProcess; i < fileToLoad.count; i++) { - if (this.isTimeout) { - console.log( - 'Timeout detected during attachment loading. Emitting progress to allow continuation.' 
- ); - await this.emit(LoaderEventType.AttachmentLoadingProgress); - process.exit(0); - } - - const { report, rateLimit } = await this.loadAttachment({ - item: transformerFile[i], - create, - }); - - if (rateLimit?.delay) { - await this.emit(LoaderEventType.AttachmentLoadingDelayed, { - delay: rateLimit.delay, - reports: this.reports, - processed_files: this.processedFiles, - }); - - break outerloop; - } - - if (report) { - addReportToLoaderReport({ - loaderReports: this.loaderReports, - report, - }); - fileToLoad.lineToProcess = fileToLoad.lineToProcess + 1; - } - } - - fileToLoad.completed = true; - this._processedFiles.push(fileToLoad.id); - } - } - } catch (error) { - console.error( - 'Error during attachment loading.', - serializeError(error) - ); - await this.emit(LoaderEventType.AttachmentLoadingError, { - error: { - message: `Error during attachment loading. ${serializeError( - error - )}`, - }, - }); - process.exit(1); - } - - return { - reports: this.reports, - processed_files: this.processedFiles, - }; - }); - } - - async loadItem({ - item, - itemTypeToLoad, - }: { - item: ExternalSystemItem; - itemTypeToLoad: ItemTypeToLoad; - }): Promise { - return runWithSdkLogContext(async () => { - const devrevId = item.id.devrev; - - try { - const syncMapperRecordResponse = await this._mappers.getByTargetId({ - sync_unit: this.event.payload.event_context.sync_unit, - target: devrevId, - }); - - const syncMapperRecord = syncMapperRecordResponse.data; - if (!syncMapperRecord) { - console.warn('Failed to get sync mapper record from response.'); - return { - error: { - message: 'Failed to get sync mapper record from response.', - }, - }; - } - - // Update item in external system - const { id, modifiedDate, delay, error } = await runWithUserLogContext( - async () => { - return await itemTypeToLoad.update({ - item, - mappers: this._mappers, - event: this.event, - }); - } - ); - - if (id) { - try { - const syncMapperRecordUpdateResponse = await this._mappers.update({ - 
id: syncMapperRecord.sync_mapper_record.id, - sync_unit: this.event.payload.event_context.sync_unit, - status: SyncMapperRecordStatus.OPERATIONAL, - ...(modifiedDate && { - external_versions: { - add: [ - { - modified_date: modifiedDate, - recipe_version: 0, - }, - ], - }, - }), - external_ids: { - add: [id], - }, - targets: { - add: [devrevId], - }, - }); - - console.log( - 'Successfully updated sync mapper record.', - syncMapperRecordUpdateResponse.data - ); - } catch (error) { - console.warn( - 'Failed to update sync mapper record.', - serializeError(error) - ); - return { - error: { - message: - 'Failed to update sync mapper record' + serializeError(error), - }, - }; - } - - return { - report: { - item_type: itemTypeToLoad.itemType, - [ActionType.UPDATED]: 1, - }, - }; - } else if (delay) { - console.log( - `Rate limited while updating item in external system, delaying for ${delay} seconds.` - ); - - return { - rateLimit: { - delay, - }, - }; - } else { - console.warn('Failed to update item in external system', error); - return { - report: { - item_type: itemTypeToLoad.itemType, - [ActionType.FAILED]: 1, - }, - }; - } - - // TODO: Update mapper (optional) - } catch (error) { - if (axios.isAxiosError(error)) { - if (error.response?.status === 404) { - // Create item in external system if mapper record not found - const { id, modifiedDate, delay, error } = - await runWithUserLogContext(async () => { - return await itemTypeToLoad.create({ - item, - mappers: this._mappers, - event: this.event, - }); - }); - - if (id) { - // Create mapper - try { - const syncMapperRecordCreateResponse = - await this._mappers.create({ - sync_unit: this.event.payload.event_context.sync_unit, - status: SyncMapperRecordStatus.OPERATIONAL, - external_ids: [id], - targets: [devrevId], - ...(modifiedDate && { - external_versions: [ - { - modified_date: modifiedDate, - recipe_version: 0, - }, - ], - }), - }); - - console.log( - 'Successfully created sync mapper record.', - 
syncMapperRecordCreateResponse.data - ); - - return { - report: { - item_type: itemTypeToLoad.itemType, - [ActionType.CREATED]: 1, - }, - }; - } catch (error) { - console.warn( - 'Failed to create sync mapper record.', - serializeError(error) - ); - return { - error: { - message: - 'Failed to create sync mapper record. ' + - serializeError(error), - }, - }; - } - } else if (delay) { - return { - rateLimit: { - delay, - }, - }; - } else { - console.warn( - 'Failed to create item in external system.', - serializeError(error) - ); - return { - report: { - item_type: itemTypeToLoad.itemType, - [ActionType.FAILED]: 1, - }, - }; - } - } else { - console.warn( - 'Failed to get sync mapper record.', - serializeError(error) - ); - return { - error: { - message: error.message, - }, - }; - } - } - - console.warn( - 'Failed to get sync mapper record.', - serializeError(error) - ); - return { - error: { - message: - 'Failed to get sync mapper record. ' + serializeError(error), - }, - }; - } - }); - } - - async processAttachment( - attachment: NormalizedAttachment, - stream: ExternalSystemAttachmentStreamingFunction - ): Promise { - return runWithSdkLogContext(async () => { - const { httpStream, delay, error } = await runWithUserLogContext( - async () => - stream({ - item: attachment, - event: this.event, - }) - ); - - if (error) { - return { error }; - } else if (delay) { - return { delay }; - } - - if (httpStream) { - const fileType = - attachment.content_type || - httpStream.headers['content-type']?.toString() || - 'application/octet-stream'; - const contentLength = httpStream.headers['content-length']?.toString(); - const fileSize = contentLength ? 
parseInt(contentLength) : undefined; - - // Get upload URL - const { error: artifactUrlError, response: artifactUrlResponse } = - await this.uploader.getArtifactUploadUrl( - attachment.file_name, - fileType, - fileSize - ); - - if (artifactUrlError) { - this.destroyHttpStream(httpStream); - return { - error: { - message: `Error while preparing artifact for attachment ID ${ - attachment.id - }. Skipping attachment. ${serializeError(artifactUrlError)}`, - fileSize: fileSize, - }, - }; - } - - // Stream attachment - const { error: uploadedArtifactError } = - await this.uploader.streamArtifact(artifactUrlResponse!, httpStream); - - if (uploadedArtifactError) { - this.destroyHttpStream(httpStream); - return { - error: { - message: - `Error while streaming to artifact for attachment ID ${attachment.id}. Skipping attachment. ` + - serializeError(uploadedArtifactError), - fileSize: fileSize, - }, - }; - } - - // Confirm attachment upload - const { error: confirmArtifactUploadError } = - await this.uploader.confirmArtifactUpload( - artifactUrlResponse!.artifact_id - ); - if (confirmArtifactUploadError) { - return { - error: { - message: - `Error while confirming upload for attachment ID ${attachment.id}. ` + - serializeError(confirmArtifactUploadError), - fileSize: fileSize, - }, - }; - } - - const ssorAttachment: SsorAttachment = { - id: { - devrev: artifactUrlResponse!.artifact_id, - external: attachment.id, - }, - parent_id: { - external: attachment.parent_id, - }, - }; - - if (attachment.author_id) { - ssorAttachment.actor_id = { - external: attachment.author_id, - }; - } - - // This will set inline flag in ssor_attachment only if it is explicity - // set in the attachment object. 
- if (attachment.inline === true) { - ssorAttachment.inline = true; - } else if (attachment.inline === false) { - ssorAttachment.inline = false; - } - - await this.getRepo('ssor_attachment')?.push([ssorAttachment]); - return; - } - return { - error: { - message: `Error while opening attachment stream. Skipping attachment.`, - }, - }; - }); - } - - /** - * Destroys a stream to prevent memory leaks. - * @param httpStream - The axios response stream to destroy - */ - private destroyHttpStream(httpStream: AxiosResponse): void { - try { - if (httpStream && httpStream.data) { - if (typeof httpStream.data.destroy === 'function') { - httpStream.data.destroy(); - } else if (typeof httpStream.data.close === 'function') { - httpStream.data.close(); - } - } - } catch (error) { - console.warn('Error while destroying HTTP stream:', error); - } - } - - async loadAttachment({ - item, - create, - }: { - item: ExternalSystemAttachment; - create: ExternalSystemLoadingFunction; - }): Promise { - return runWithSdkLogContext(async () => { - // Create item - const { id, delay, error } = await runWithUserLogContext(async () => - create({ - item, - mappers: this._mappers, - event: this.event, - }) - ); - - if (delay) { - return { - rateLimit: { - delay, - }, - }; - } else if (id) { - try { - const syncMapperRecordCreateResponse = await this._mappers.create({ - sync_unit: this.event.payload.event_context.sync_unit, - external_ids: [id], - targets: [item.reference_id], - status: SyncMapperRecordStatus.OPERATIONAL, - }); - - console.log( - 'Successfully created sync mapper record.', - syncMapperRecordCreateResponse.data - ); - } catch (error) { - console.warn( - 'Failed to create sync mapper record.', - serializeError(error) - ); - } - - return { - report: { - item_type: 'attachment', - [ActionType.CREATED]: 1, - }, - }; - } else { - console.warn('Failed to create attachment in external system', error); - return { - report: { - item_type: 'attachment', - [ActionType.FAILED]: 1, - }, - }; - } 
- }); - } - - /** - * Streams the attachments to the DevRev platform. - * The attachments are streamed to the platform and the artifact information is returned. - * @param params - The parameters to stream the attachments - * @returns The response object containing the ssorAttachment artifact information - * or error information if there was an error - */ - async streamAttachments({ - stream, - processors, - batchSize = 1, // By default, we want to stream one attachment at a time - }: { - stream: ExternalSystemAttachmentStreamingFunction; - processors?: ExternalSystemAttachmentProcessors< - ConnectorState, - NormalizedAttachment[], - NewBatch - >; - batchSize?: number; - }): Promise { - return runWithSdkLogContext(async () => { - if (batchSize <= 0) { - console.warn( - `The specified batch size (${batchSize}) is invalid. Using 1 instead.` - ); - batchSize = 1; - } - - if (batchSize > 50) { - console.warn( - `The specified batch size (${batchSize}) is too large. Using 50 instead.` - ); - batchSize = 50; - } - - const repos = [ - { - itemType: 'ssor_attachment', - }, - ]; - this.initializeRepos(repos); - - const attachmentsMetadata = this.sdkState.toDevRev?.attachmentsMetadata; - - // If there are no attachments metadata artifact IDs in state, finish here - if (!attachmentsMetadata?.artifactIds?.length) { - console.log(`No attachments metadata artifact IDs found in state.`); - return; - } else { - console.log( - `Found ${attachmentsMetadata.artifactIds.length} attachments metadata artifact IDs in state.` - ); - } - - // Loop through the attachments metadata artifact IDs - while (attachmentsMetadata.artifactIds.length > 0) { - const attachmentsMetadataArtifactId = - attachmentsMetadata.artifactIds[0]; - - console.log( - `Started processing attachments for attachments metadata artifact ID: ${attachmentsMetadataArtifactId}.` - ); - - const { attachments, error } = - await this.uploader.getAttachmentsFromArtifactId({ - artifact: attachmentsMetadataArtifactId, - }); - - 
if (error) { - console.error( - `Failed to get attachments for artifact ID: ${attachmentsMetadataArtifactId}.` - ); - return { error }; - } - - if (!attachments || attachments.length === 0) { - console.warn( - `No attachments found for artifact ID: ${attachmentsMetadataArtifactId}.` - ); - // Remove empty artifact and reset lastProcessed - attachmentsMetadata.artifactIds.shift(); - attachmentsMetadata.lastProcessed = 0; - continue; - } - - console.log( - `Found ${attachments.length} attachments for artifact ID: ${attachmentsMetadataArtifactId}.` - ); - - let response; - - if (processors) { - console.log(`Using custom processors for attachments.`); - - const reducer = processors.reducer; - const iterator = processors.iterator; - - const reducedAttachments = runWithUserLogContext(() => - reducer({ - attachments, - adapter: this, - batchSize, - }) - ); - - response = await runWithUserLogContext(async () => { - return await iterator({ - reducedAttachments, - adapter: this, - stream, - }); - }); - } else { - console.log( - `Using attachments streaming pool for attachments streaming.` - ); - - const attachmentsPool = new AttachmentsStreamingPool({ - adapter: this, - attachments, - batchSize, - stream, - }); - - response = await attachmentsPool.streamAll(); - } - - if (response?.delay || response?.error) { - return response; - } - - // On timeout, emit progress and exit to allow continuation. - if (this.isTimeout) { - console.log( - `Timeout detected after processing attachments for artifact ID: ${attachmentsMetadataArtifactId}. 
Emitting progress to allow continuation.` - ); - await this.emit(ExtractorEventType.AttachmentExtractionProgress); - process.exit(0); - return; - } - - console.log( - `Finished processing all attachments for artifact ID: ${attachmentsMetadataArtifactId}.` - ); - attachmentsMetadata.artifactIds.shift(); - attachmentsMetadata.lastProcessed = 0; - if (attachmentsMetadata.lastProcessedAttachmentsIdsList) { - attachmentsMetadata.lastProcessedAttachmentsIdsList.length = 0; - } - } - - return; - }); - } -} diff --git a/src/types/extraction.ts b/src/types/extraction.ts index 34189d0..9b3a7c7 100644 --- a/src/types/extraction.ts +++ b/src/types/extraction.ts @@ -6,7 +6,7 @@ import { ErrorRecord } from './common'; import { AxiosResponse } from 'axios'; import { NormalizedAttachment } from '../repo/repo.interfaces'; -import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter'; +import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter'; import { DonV2, LoaderReport, RateLimited } from './loading'; /** @@ -434,7 +434,7 @@ export type ExternalSystemAttachmentReducerFunction< batchSize, }: { attachments: Batch; - adapter: WorkerAdapter; + adapter: ExtractionAdapter; batchSize?: number; }) => NewBatch; @@ -453,7 +453,7 @@ export type ExternalSystemAttachmentIteratorFunction = stream, }: { reducedAttachments: NewBatch; - adapter: WorkerAdapter; + adapter: ExtractionAdapter; stream: ExternalSystemAttachmentStreamingFunction; }) => Promise; diff --git a/src/types/workers.ts b/src/types/workers.ts index 8be46ef..529d783 100644 --- a/src/types/workers.ts +++ b/src/types/workers.ts @@ -2,7 +2,8 @@ import { Worker } from 'worker_threads'; import type { LogLevel } from '../logger/logger.interfaces'; import { BaseState } from '../state/state'; -import { WorkerAdapter } from '../multithreading/worker-adapter/worker-adapter'; +import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter'; +import { LoadingAdapter } from 
'../multithreading/adapters/loading-adapter'; import { AirSyncEvent, EventType, ExtractorEventType } from './extraction'; @@ -10,6 +11,16 @@ import { LoaderEventType } from './loading'; import { InitialDomainMapping } from './common'; +/** + * WorkerAdapter is the adapter passed to a worker's task/onTimeout callbacks. + * It is the extraction adapter for extraction workers and the loading adapter + * for loading workers; the SDK constructs the concrete type based on the sync + * mode of the event. + */ +export type WorkerAdapter = + | ExtractionAdapter + | LoadingAdapter; + /** * WorkerAdapterInterface is an interface for WorkerAdapter class. * @interface WorkerAdapterInterface From b405c1b27e87aad6a4ca8891478ac93a4b6e289d Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 09:41:52 +0200 Subject: [PATCH 15/22] docs: mark C5 done Co-Authored-By: Claude Opus 4.8 (1M context) --- V2_PROGRESS.md | 40 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 37 insertions(+), 3 deletions(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index 609a2ef..9dc272e 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -107,8 +107,42 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi - time-value-resolver.ts: signature unchanged. BREAKING: connectors reading SDK fields via adapter.state break; on-disk state auto-migrates v1→v2 on read. - **C5 — Adapter split (structural only).** `BaseAdapter` + `ExtractionAdapter` + `LoadingAdapter`. - KEEP existing `emit`-based contract working (behavior identical). Author fresh intermediate form - (this exact form exists in NO branch — v2-old-backup (oracle tag)'s split already assumes emit-from-return). + KEEP existing `emit(eventType, data)` contract (behavior identical). + CORRECTION: the oracle's adapter split IS C5-compatible — its `emit()` still has the + `emit(eventType, data)` signature (emit-from-return is genuinely deferred to C6 in the oracle too). 
+ So follow the oracle's adapter design closely (it's a faithful reference), re-authored cleanly. + DESIGN (decided after reading current worker-adapter + oracle adapters): + - `adapters/base-adapter.ts`: `abstract class BaseAdapter` holds event/options/isTimeout/ + hasWorkerEmitted, `protected adapterState: BaseState`, `protected uploader`, + `state` get/set, `sdkState` get, `extractionScope` get, `postState()`, and the TEMPLATE-METHOD `emit()` + skeleton (hasWorkerEmitted guard → beforeEmit hook → state save (non-stateless) → emit() w/ buildEmitPayload + → afterEmit hook → WorkerMessageEmitted). Three abstract hooks: `beforeEmit(type)`, `buildEmitPayload(type)`, + `afterEmit(type)`. + - `adapters/extraction-adapter.ts`: `ExtractionAdapter extends BaseAdapter`. Owns _artifacts, repos, + currentEventDataLength, mappers? NO (mappers is loader). Owns: shouldExtract, initializeRepos, getRepo, + artifacts get/set, uploadAllRepos, processAttachment, destroyHttpStream, streamAttachments. + beforeEmit = ESU-repo-upload (the TODO block) + uploadAllRepos + the AttachmentExtractionDone boundary + update (lastSuccessfulSyncStarted/workers* on sdkState). buildEmitPayload = { artifacts }. afterEmit = + clear artifacts. ctor seeds mappers? no. (mappers stays loader-only per oracle.) + - `adapters/loading-adapter.ts`: `LoadingAdapter extends BaseAdapter`. Owns loaderReports, _processedFiles, + _mappers, reports/processedFiles/mappers getters, loadItemTypes, getLoaderBatches, loadAttachments, + loadItem, loadAttachment. beforeEmit = noop. buildEmitPayload = { reports, processed_files }. afterEmit = noop. + - NOTE the current emit() computes isExtraction/isLoader to decide payload extras; the template split makes + that implicit (each subclass's buildEmitPayload returns its own extras). Behavior-equivalent. + - `worker-adapter.ts`: REPLACE with a thin re-export module OR keep file but it just re-exports. 
The + `WorkerAdapter` public type becomes a UNION alias in types/workers.ts (see below). `createWorkerAdapter` + factory: keep but split or drop (process-task builds directly). Check consumers. + - `types/workers.ts`: `export type WorkerAdapter = ExtractionAdapter | LoadingAdapter`. + WorkerAdapterInterface stays (adapterState: BaseState). + - `process-task.ts`: dispatch by `event_context.mode === SyncMode.LOADING` → new LoadingAdapter, else + new ExtractionAdapter; pass concrete adapter to task/onTimeout. (process-task SPLIT into two entry points + is C6, NOT here — keep single processTask dispatching.) + - `attachments-streaming-pool.ts` + `.interfaces.ts`: retype `adapter: WorkerAdapter` → `ExtractionAdapter` + (pool is extraction-only; uses sdkState.toDevRev, processAttachment, isTimeout, emit). + - `types/extraction.ts`: imports WorkerAdapter for processor fn types — keep importing the union alias. + - DECIDED: union alias is throwaway, removed in C6 when processTask splits into typed entry points. + - Helpers `worker-adapter.helpers.ts` (getFilesToLoad, addReportToLoaderReport): move to loading-adapter's + dir or keep; used only by loading now. Keep path stable to minimize churn (import from new loading-adapter). - **C6 — Emit-from-return contract.** `task`/`onTimeout` return a `TaskResult` (`{ status: 'success'|'progress'|'delay'|'error', ... }`); the SDK maps status→phase event and emits exactly once; `emit` removed from public surface. `processTask` → `processExtractionTask` + @@ -223,7 +257,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: | C3 enum cleanup | ☑ done | cc05f41. Deleted deprecated enum members (EventType, ExtractorEventType, LoaderEventType) + event-type-translation.ts/.test; rewired 4 callers (process-task, spawn, control-protocol, worker-adapter) + spawn.helpers cases. Behavior-equivalent. Reviewer-approved. | | C4a state split | ☑ done | b63f3ab. 
BaseState + ExtractionState + LoadingState, flat shape preserved; state.ts is dispatcher by mode. Reviewer-confirmed behavior-equivalent (loading only loses inert logs). | | C4b state envelope | ☑ done | 30ba1b3. { connectorState, sdkState } envelope + v1->v2 migration shim (normalizeFetchedState). adapter.state→connector-only, new adapter.sdkState; ~28 SDK-field access sites moved. SdkState kept combined (narrowing deferred to C5). Reviewer-approved (migration cases verified). | -| C5 adapter split | ☐ todo | NOTE for C5: narrow SdkState into Base/Extraction/Loading variants here (oracle state.interfaces shows the split). | +| C5 adapter split | ☑ done | a7a877f. BaseAdapter (template emit + hooks) + ExtractionAdapter + LoadingAdapter; WorkerAdapter→union alias; processTask dispatches by mode (still single entry). worker-adapter.ts deleted; helpers→loading-adapter.helpers. Reviewer-approved (emit equivalence verified). SdkState kept combined (narrowing dropped from scope). | | C6 emit-from-return | ☐ todo | | | C7 JSDoc | ☐ todo | Phase 2 | | C8 api report | ☐ todo | Phase 2 | From 0fb6116895569ee7270ea5c1d059fd0886b7b7b2 Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 10:02:04 +0200 Subject: [PATCH 16/22] feat(v2)!: emit from returned TaskResult instead of adapter.emit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Connectors now signal how a phase ended by RETURNING a TaskResult from their task/onTimeout callbacks; the SDK maps that status to the phase-appropriate platform event and emits it exactly once. Connectors never call emit directly.
- types/workers.ts: add TaskResult ({status: 'success'|'progress'|'delay' (+delaySeconds)|'error' (+error)}) and TaskStatus. ProcessTaskInterface.task now returns Promise<TaskResult>; onTimeout is optional and also returns Promise<TaskResult>. TaskAdapterInterface is generic over the adapter. The WorkerAdapter union alias is removed.
- spawn.helpers.ts: add getEventTypeForResult + EVENT_PHASE_MAP, mapping each incoming EventType + status to the outgoing event. Resumable phases (data/attachment extraction, data/attachment loading) honour every status; non-resumable phases (ESU, metadata, state deletions) only have done/error, so a progress/delay there is illegal and emits an error with a descriptive message.
- base-adapter.ts: emit() is now protected (SDK-internal). The new public emitFromResult(result) is the driver-invoked bridge that picks the event from the mapping and emits it with the right payload. State is still saved before emit for non-stateless events.
- extraction-adapter.ts: streamAttachments returns a TaskResult (timeout -> progress, pool error/delay -> error/delay, otherwise success) instead of emitting and exiting. The inline external-sync-unit emit override is removed (ESUs flow through the repo; connectors push them, then return success).
- loading-adapter.ts: loadItemTypes and loadAttachments return a TaskResult; their former mid-flight emits (rate-limit -> delay, timeout -> progress, error -> error) become returns. They no longer call process.exit.
- process-task.ts: processTask is split into processExtractionTask and processLoadingTask over a shared runWorkerTask driver that runs task (then onTimeout, or a default progress result, on timeout) and emits once from the returned TaskResult.
- index.ts: export processExtractionTask + processLoadingTask; remove processTask. emit is no longer part of the public surface.

BREAKING CHANGE: connectors must return a TaskResult from task/onTimeout instead of calling adapter.emit(...), and must import processExtractionTask / processLoadingTask instead of processTask. adapter.emit is no longer public; the loader and attachment-streaming methods return a TaskResult that the task should pass through as its own return value. onTimeout is now optional (defaults to a progress handoff).
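
For reviewers, here is a minimal sketch of the return-based contract in plain TypeScript, with no SDK imports. TaskResult follows the shape listed above. PHASES, toEmission, loadPages and the event-name strings are hypothetical stand-ins for EVENT_PHASE_MAP, getEventTypeForResult/emitFromResult and a connector task; they are not the real signatures.

```typescript
// Hypothetical sketch, not the SDK's API: a task returns a TaskResult and a
// driver maps (phase, status) to exactly one outgoing event.

type TaskResult =
  | { status: 'success' }
  | { status: 'progress' }
  | { status: 'delay'; delaySeconds: number }
  | { status: 'error'; error: { message: string } };

// Outgoing event name per status. Non-resumable phases define only
// success and error, so progress/delay are illegal there.
interface PhaseEvents {
  success: string;
  error: string;
  progress?: string;
  delay?: string;
}

// Hypothetical phase table; the event names are illustrative only.
const PHASES: Record<string, PhaseEvents> = {
  DataLoading: {
    success: 'DataLoadingDone',
    progress: 'DataLoadingProgress',
    delay: 'DataLoadingDelayed',
    error: 'DataLoadingError',
  },
  MetadataExtraction: {
    success: 'MetadataExtractionDone',
    error: 'MetadataExtractionError',
  },
};

interface Emission {
  eventType: string;
  data: { delay?: number; error?: { message: string } };
}

// Resolves the single event (and payload) the driver would emit.
function toEmission(phase: string, result: TaskResult): Emission {
  const events = PHASES[phase];
  if (!events) {
    throw new Error(`Unknown phase: ${phase}`);
  }
  const eventType = events[result.status];

  // Illegal status for this phase: fall back to the phase's error event and
  // always attach an explanatory message, whatever status was returned.
  if (!eventType) {
    return {
      eventType: events.error,
      data: {
        error: {
          message: `Status '${result.status}' is not allowed in non-resumable phase ${phase}.`,
        },
      },
    };
  }

  if (result.status === 'delay') {
    return { eventType, data: { delay: result.delaySeconds } };
  }
  if (result.status === 'error') {
    return { eventType, data: { error: result.error } };
  }
  return { eventType, data: {} };
}

// Connector-style task: it reports an outcome instead of emitting.
async function loadPages(
  pages: number[],
  isTimeout: () => boolean
): Promise<TaskResult> {
  for (const page of pages) {
    if (isTimeout()) {
      return { status: 'progress' }; // resume in a fresh invocation
    }
    if (page < 0) {
      return { status: 'delay', delaySeconds: 30 }; // e.g. rate limited
    }
  }
  return { status: 'success' };
}
```

toEmission checks for an illegal status before it reads any status-specific fields. As a result, a delay returned from a non-resumable phase still produces that phase's error event with an explanatory message, rather than an error event that carries only a delay.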
Ref: V2_PROGRESS.md C6 Co-Authored-By: Claude Opus 4.8 (1M context) --- src/index.ts | 5 +- src/multithreading/adapters/base-adapter.ts | 35 ++- .../adapters/extraction-adapter.ts | 71 ++----- .../adapters/loading-adapter.ts | 94 +++------ src/multithreading/process-task.ts | 199 ++++++++++++------ src/multithreading/spawn/spawn.helpers.ts | 170 +++++++++++++++ src/types/workers.ts | 87 +++++--- 7 files changed, 448 insertions(+), 213 deletions(-) diff --git a/src/index.ts b/src/index.ts index ce50689..f079f05 100644 --- a/src/index.ts +++ b/src/index.ts @@ -9,7 +9,10 @@ export type { RetryConfig, RouteConfig, } from './mock-server/mock-server.interfaces'; -export { processTask } from './multithreading/process-task'; +export { + processExtractionTask, + processLoadingTask, +} from './multithreading/process-task'; export { spawn } from './multithreading/spawn/spawn'; export { BaseAdapter } from './multithreading/adapters/base-adapter'; export { ExtractionAdapter } from './multithreading/adapters/extraction-adapter'; diff --git a/src/multithreading/adapters/base-adapter.ts b/src/multithreading/adapters/base-adapter.ts index e05c541..dec9285 100644 --- a/src/multithreading/adapters/base-adapter.ts +++ b/src/multithreading/adapters/base-adapter.ts @@ -14,11 +14,13 @@ import { } from '../../types/extraction'; import { LoaderEventType } from '../../types/loading'; import { + TaskResult, WorkerAdapterOptions, WorkerMessageEmitted, WorkerMessageSubject, } from '../../types/workers'; import { Uploader } from '../../uploader/uploader'; +import { getEventTypeForResult } from '../spawn/spawn.helpers'; /** * BaseAdapter holds the state and behavior shared by both sync modes and owns @@ -107,13 +109,44 @@ export abstract class BaseAdapter { newEventType: ExtractorEventType | LoaderEventType ): void; + /** + * Maps a {@link TaskResult} returned by a worker's task/onTimeout callback to + * the phase-appropriate platform event and emits it exactly once. 
+ * + * This is the SDK-internal bridge between the return-based connector contract + * and the control protocol; it is invoked by the worker driver, not by + * connectors. Connectors signal outcomes by returning a `TaskResult`, never by + * calling `emit` directly. + * + * @param result - The status the worker reported for the current phase. + */ + async emitFromResult(result: TaskResult): Promise { + const { eventType, illegal } = getEventTypeForResult( + this.event.payload.event_type, + result.status + ); + + const data: EventData = {}; + if (result.status === 'delay') { + data.delay = result.delaySeconds; + } else if (result.status === 'error') { + data.error = result.error; + } else if (illegal) { + data.error = { + message: `Worker returned status '${result.status}' for a non-resumable phase (${this.event.payload.event_type}), which is not allowed. Emitting an error event instead.`, + }; + } + + await this.emit(eventType, data); + } + /** * Emits an event to the platform. * * @param newEventType - The event type to be emitted * @param data - The data to be sent with the event */ - async emit( + protected async emit( newEventType: ExtractorEventType | LoaderEventType, data?: EventData ): Promise { diff --git a/src/multithreading/adapters/extraction-adapter.ts b/src/multithreading/adapters/extraction-adapter.ts index 6aba8cc..5bbbffa 100644 --- a/src/multithreading/adapters/extraction-adapter.ts +++ b/src/multithreading/adapters/extraction-adapter.ts @@ -23,11 +23,10 @@ import { ExternalSystemAttachmentStreamingFunction, ExtractorEventType, ProcessAttachmentReturnType, - StreamAttachmentsReturnType, } from '../../types/extraction'; import { LoaderEventType } from '../../types/loading'; import { BaseState } from '../../state/state'; -import { WorkerAdapterOptions } from '../../types/workers'; +import { TaskResult, WorkerAdapterOptions } from '../../types/workers'; import { Artifact, SsorAttachment } from '../../uploader/uploader.interfaces'; import { BaseAdapter 
} from './base-adapter'; @@ -192,50 +191,6 @@ export class ExtractionAdapter< this.artifacts = []; } - /** - * Emits an extraction event. For ExternalSyncUnitExtractionDone, inline - * external sync units are uploaded via a Repo first (then stripped from the - * payload to avoid SQS size limits); `beforeEmit` uploads that repo with the - * rest before the event is sent. - * TODO: Remove the external sync unit handling in v2.0.0 - * - * @param newEventType - The event type to be emitted - * @param data - The data to be sent with the event - */ - override async emit( - newEventType: ExtractorEventType | LoaderEventType, - data?: EventData - ): Promise { - if ( - newEventType === ExtractorEventType.ExternalSyncUnitExtractionDone && - data?.external_sync_units && - data.external_sync_units.length > 0 - ) { - console.log( - `Uploading ${data.external_sync_units.length} external sync units via repo before emitting event.` - ); - - this.initializeRepos([ - { - itemType: AirSyncDefaultItemTypes.EXTERNAL_SYNC_UNITS, - overridenOptions: { - batchSize: 25000, - skipConfirmation: true, - }, - }, - ]); - - await this.getRepo(AirSyncDefaultItemTypes.EXTERNAL_SYNC_UNITS)?.push( - data.external_sync_units - ); - - // Remove inline external_sync_units from data to avoid SQS size issues - delete data.external_sync_units; - } - - return super.emit(newEventType, data); - } - async uploadAllRepos(): Promise { for (const repo of this.repos) { const error = await repo.upload(); @@ -397,7 +352,7 @@ export class ExtractionAdapter< NewBatch >; batchSize?: number; - }): Promise { + }): Promise { return runWithSdkLogContext(async () => { if (batchSize <= 0) { console.warn( @@ -425,7 +380,7 @@ export class ExtractionAdapter< // If there are no attachments metadata artifact IDs in state, finish here if (!attachmentsMetadata?.artifactIds?.length) { console.log(`No attachments metadata artifact IDs found in state.`); - return; + return { status: 'success' }; } else { console.log( `Found 
${attachmentsMetadata.artifactIds.length} attachments metadata artifact IDs in state.` @@ -450,7 +405,7 @@ export class ExtractionAdapter< console.error( `Failed to get attachments for artifact ID: ${attachmentsMetadataArtifactId}.` ); - return { error }; + return { status: 'error', error }; } if (!attachments || attachments.length === 0) { @@ -505,18 +460,20 @@ export class ExtractionAdapter< response = await attachmentsPool.streamAll(); } - if (response?.delay || response?.error) { - return response; + if (response?.error) { + return { status: 'error', error: response.error }; + } + + if (response?.delay) { + return { status: 'delay', delaySeconds: response.delay }; } - // On timeout, emit progress and exit to allow continuation. + // On timeout, return progress to allow continuation in a fresh invocation. if (this.isTimeout) { console.log( - `Timeout detected after processing attachments for artifact ID: ${attachmentsMetadataArtifactId}. Emitting progress to allow continuation.` + `Timeout detected after processing attachments for artifact ID: ${attachmentsMetadataArtifactId}. 
Returning progress to allow continuation.` ); - await this.emit(ExtractorEventType.AttachmentExtractionProgress); - process.exit(0); - return; + return { status: 'progress' }; } console.log( @@ -529,7 +486,7 @@ export class ExtractionAdapter< } } - return; + return { status: 'success' }; }); } } diff --git a/src/multithreading/adapters/loading-adapter.ts b/src/multithreading/adapters/loading-adapter.ts index 66a3fd2..49fdd2b 100644 --- a/src/multithreading/adapters/loading-adapter.ts +++ b/src/multithreading/adapters/loading-adapter.ts @@ -25,10 +25,9 @@ import { LoaderEventType, LoaderReport, LoadItemResponse, - LoadItemTypesResponse, StatsFileObject, } from '../../types/loading'; -import { WorkerAdapterOptions } from '../../types/workers'; +import { TaskResult, WorkerAdapterOptions } from '../../types/workers'; import { BaseAdapter } from './base-adapter'; import { @@ -99,7 +98,7 @@ export class LoadingAdapter< async loadItemTypes({ itemTypesToLoad, - }: ItemTypesToLoadParams): Promise { + }: ItemTypesToLoadParams): Promise { return runWithSdkLogContext(async () => { if (this.event.payload.event_type === EventType.StartLoadingData) { const itemTypes = itemTypesToLoad.map( @@ -108,10 +107,7 @@ export class LoadingAdapter< if (!itemTypes.length) { console.warn('No item types to load, returning.'); - return { - reports: this.reports, - processed_files: this.processedFiles, - }; + return { status: 'success' }; } const filesToLoad = await this.getLoaderBatches({ @@ -127,10 +123,7 @@ export class LoadingAdapter< !this.sdkState.fromDevRev.filesToLoad.length ) { console.warn('No files to load, returning.'); - return { - reports: this.reports, - processed_files: this.processedFiles, - }; + return { status: 'success' }; } console.log( @@ -139,8 +132,7 @@ export class LoadingAdapter< ); try { - outerloop: for (const fileToLoad of this.sdkState.fromDevRev - .filesToLoad) { + for (const fileToLoad of this.sdkState.fromDevRev.filesToLoad) { const itemTypeToLoad = 
itemTypesToLoad.find( (itemTypeToLoad: ItemTypeToLoad) => itemTypeToLoad.itemType === fileToLoad.itemType @@ -151,13 +143,12 @@ export class LoadingAdapter< `Item type to load not found for item type: ${fileToLoad.itemType}.` ); - await this.emit(LoaderEventType.DataLoadingError, { + return { + status: 'error', error: { message: `Item type to load not found for item type: ${fileToLoad.itemType}.`, }, - }); - - break; + }; } if (!fileToLoad.completed) { @@ -171,12 +162,12 @@ export class LoadingAdapter< console.error( `Transformer file not found for artifact ID: ${fileToLoad.id}.` ); - await this.emit(LoaderEventType.DataLoadingError, { + return { + status: 'error', error: { message: `Transformer file not found for artifact ID: ${fileToLoad.id}.`, }, - }); - break outerloop; + }; } const transformerFile = response as ExternalSystemItem[]; @@ -184,10 +175,9 @@ export class LoadingAdapter< for (let i = fileToLoad.lineToProcess; i < fileToLoad.count; i++) { if (this.isTimeout) { console.log( - 'Timeout detected during data loading. Emitting progress to allow continuation.' + 'Timeout detected during data loading. Returning progress to allow continuation.' ); - await this.emit(LoaderEventType.DataLoadingProgress); - process.exit(0); + return { status: 'progress' }; } const { report, rateLimit } = await this.loadItem({ @@ -196,13 +186,7 @@ export class LoadingAdapter< }); if (rateLimit?.delay) { - await this.emit(LoaderEventType.DataLoadingDelayed, { - delay: rateLimit.delay, - reports: this.reports, - processed_files: this.processedFiles, - }); - - break outerloop; + return { status: 'delay', delaySeconds: rateLimit.delay }; } if (report) { @@ -220,18 +204,15 @@ export class LoadingAdapter< } } catch (error) { console.error('Error during data loading.', serializeError(error)); - await this.emit(LoaderEventType.DataLoadingError, { + return { + status: 'error', error: { message: `Error during data loading. 
${serializeError(error)}`, }, - }); - process.exit(1); + }; } - return { - reports: this.reports, - processed_files: this.processedFiles, - }; + return { status: 'success' }; }); } @@ -271,7 +252,7 @@ export class LoadingAdapter< create, }: { create: ExternalSystemLoadingFunction; - }): Promise { + }): Promise { return runWithSdkLogContext(async () => { if (this.event.payload.event_type === EventType.StartLoadingAttachments) { this.sdkState.fromDevRev = { @@ -286,16 +267,13 @@ export class LoadingAdapter< this.sdkState.fromDevRev?.filesToLoad.length === 0 ) { console.log('No files to load, returning.'); - return { - reports: this.reports, - processed_files: this.processedFiles, - }; + return { status: 'success' }; } const filesToLoad = this.sdkState.fromDevRev?.filesToLoad; try { - outerloop: for (const fileToLoad of filesToLoad) { + for (const fileToLoad of filesToLoad) { if (!fileToLoad.completed) { const { response, error: transformerFileError } = await this.uploader.getJsonObjectByArtifactId({ @@ -309,21 +287,20 @@ export class LoadingAdapter< console.error( `Transformer file not found for artifact ID: ${fileToLoad.id}.` ); - await this.emit(LoaderEventType.AttachmentLoadingError, { + return { + status: 'error', error: { message: `Transformer file not found for artifact ID: ${fileToLoad.id}.`, }, - }); - break outerloop; + }; } for (let i = fileToLoad.lineToProcess; i < fileToLoad.count; i++) { if (this.isTimeout) { console.log( - 'Timeout detected during attachment loading. Emitting progress to allow continuation.' + 'Timeout detected during attachment loading. Returning progress to allow continuation.' 
); - await this.emit(LoaderEventType.AttachmentLoadingProgress); - process.exit(0); + return { status: 'progress' }; } const { report, rateLimit } = await this.loadAttachment({ @@ -332,13 +309,7 @@ export class LoadingAdapter< }); if (rateLimit?.delay) { - await this.emit(LoaderEventType.AttachmentLoadingDelayed, { - delay: rateLimit.delay, - reports: this.reports, - processed_files: this.processedFiles, - }); - - break outerloop; + return { status: 'delay', delaySeconds: rateLimit.delay }; } if (report) { @@ -359,20 +330,17 @@ export class LoadingAdapter< 'Error during attachment loading.', serializeError(error) ); - await this.emit(LoaderEventType.AttachmentLoadingError, { + return { + status: 'error', error: { message: `Error during attachment loading. ${serializeError( error )}`, }, - }); - process.exit(1); + }; } - return { - reports: this.reports, - processed_files: this.processedFiles, - }; + return { status: 'success' }; }); } diff --git a/src/multithreading/process-task.ts b/src/multithreading/process-task.ts index 619bf85..a43debb 100644 --- a/src/multithreading/process-task.ts +++ b/src/multithreading/process-task.ts @@ -1,85 +1,160 @@ import { isMainThread, parentPort, workerData } from 'node:worker_threads'; + import { Logger, serializeError } from '../logger/logger'; import { runWithSdkLogContext, runWithUserLogContext, } from '../logger/logger.context'; -import { createAdapterState } from '../state/state'; -import { SyncMode } from '../types/common'; +import { createExtractionState } from '../state/extraction-state'; +import { createLoadingState } from '../state/loading-state'; import { ProcessTaskInterface, - WorkerAdapter, + TaskResult, WorkerEvent, WorkerMessageSubject, } from '../types/workers'; + +import { BaseAdapter } from './adapters/base-adapter'; import { ExtractionAdapter } from './adapters/extraction-adapter'; import { LoadingAdapter } from './adapters/loading-adapter'; -export function processTask({ +/** + * Shared worker-thread driver. 
Builds the logger context, runs the task and + * (on timeout) the onTimeout callback against the provided adapter, maps the + * returned {@link TaskResult} to a platform event and emits it exactly once, + * and wires the error/exit plumbing. + * + * The adapter is constructed by the caller so each entry point can build its + * own typed adapter. + * + * If `onTimeout` is omitted, the SDK emits a phase-appropriate default on + * timeout: `progress` (resumable phases) or `error` (non-resumable phases) is + * handled by the status->event mapping when we emit a `progress` result. + */ +async function runWorkerTask>( + buildAdapter: () => Promise, + { task, onTimeout }: ProcessTaskInterface +): Promise { + await runWithSdkLogContext(async () => { + try { + const adapter = await buildAdapter(); + + parentPort?.on(WorkerEvent.WorkerMessage, (message) => { + if (message.subject !== WorkerMessageSubject.WorkerMessageExit) { + return; + } + console.log('Timeout received. Waiting for the task to finish.'); + adapter.isTimeout = true; + }); + + let result: TaskResult = await runWithUserLogContext(async () => + task({ adapter }) + ); + + // On timeout, hand off to onTimeout (or default to a progress result). + if (adapter.isTimeout && !adapter.hasWorkerEmitted) { + result = onTimeout + ? await runWithUserLogContext(async () => onTimeout({ adapter })) + : { status: 'progress' }; + } + + if (!adapter.hasWorkerEmitted) { + await adapter.emitFromResult(result); + } + + process.exit(0); + } catch (error) { + runWithUserLogContext(() => { + const errorMessage = `Error while processing task. ${serializeError( + error + )}`; + console.error(errorMessage); + parentPort?.postMessage({ + subject: WorkerMessageSubject.WorkerMessageFailed, + payload: { message: errorMessage }, + }); + process.exit(1); + }); + } + }); +} + +/** + * Entry point for an extraction worker. Builds an {@link ExtractionAdapter} and + * runs the provided task against it. 
+ * + * @public + */ +export function processExtractionTask({ task, onTimeout, -}: ProcessTaskInterface) { +}: ProcessTaskInterface>) { if (isMainThread) { return; } - void (async () => { - await runWithSdkLogContext(async () => { - try { - const event = workerData.event; - - const initialState = workerData.initialState as ConnectorState; - const initialDomainMapping = workerData.initialDomainMapping; - const options = workerData.options; - // eslint-disable-next-line no-global-assign - console = new Logger({ event, options }); - - const adapterState = await createAdapterState({ - event, - initialState, - initialDomainMapping, - options, - }); + void runWorkerTask>( + async () => { + const event = workerData.event; + const initialState = workerData.initialState as ConnectorState; + const initialDomainMapping = workerData.initialDomainMapping; + const options = workerData.options; + // eslint-disable-next-line no-global-assign + console = new Logger({ event, options }); - const adapter: WorkerAdapter = - event.payload.event_context.mode === SyncMode.LOADING - ? new LoadingAdapter({ - event, - adapterState, - options, - }) - : new ExtractionAdapter({ - event, - adapterState, - options, - }); - - parentPort?.on(WorkerEvent.WorkerMessage, (message) => { - if (message.subject !== WorkerMessageSubject.WorkerMessageExit) { - return; - } - console.log('Timeout received. Waiting for the task to finish.'); - adapter.isTimeout = true; - }); + const adapterState = await createExtractionState({ + event, + initialState, + initialDomainMapping, + options, + }); - await runWithUserLogContext(async () => task({ adapter })); - if (adapter.isTimeout && !adapter.hasWorkerEmitted) { - await runWithUserLogContext(async () => onTimeout({ adapter })); - } - process.exit(0); - } catch (error) { - runWithUserLogContext(() => { - const errorMessage = `Error while processing task. 
${serializeError( - error - )}`; - console.error(errorMessage); - parentPort?.postMessage({ - subject: WorkerMessageSubject.WorkerMessageFailed, - payload: { message: errorMessage }, - }); - process.exit(1); - }); - } - }); - })(); + return new ExtractionAdapter({ + event, + adapterState, + options, + }); + }, + { task, onTimeout } + ); +} + +/** + * Entry point for a loading worker. Builds a {@link LoadingAdapter} and runs the + * provided task against it. + * + * @public + */ +export function processLoadingTask({ + task, + onTimeout, +}: ProcessTaskInterface>) { + if (isMainThread) { + return; + } + + void runWorkerTask>( + async () => { + const event = workerData.event; + const initialState = workerData.initialState as ConnectorState; + const initialDomainMapping = workerData.initialDomainMapping; + const options = workerData.options; + // eslint-disable-next-line no-global-assign + console = new Logger({ event, options }); + + const adapterState = await createLoadingState({ + event, + initialState, + initialDomainMapping, + options, + }); + + return new LoadingAdapter({ + event, + adapterState, + options, + }); + }, + { task, onTimeout } + ); } diff --git a/src/multithreading/spawn/spawn.helpers.ts b/src/multithreading/spawn/spawn.helpers.ts index 3db03df..1f52764 100644 --- a/src/multithreading/spawn/spawn.helpers.ts +++ b/src/multithreading/spawn/spawn.helpers.ts @@ -1,5 +1,175 @@ import { EventType, ExtractorEventType } from '../../types/extraction'; import { LoaderEventType } from '../../types/loading'; +import { TaskStatus } from '../../types/workers'; + +/** + * Resolves the outgoing event type the SDK should emit for a given incoming + * event type and the {@link TaskResult} status a worker returned. 
+ * + * The mapping follows the per-phase contract documented on `TaskResult`: + * resumable phases (data/attachment extraction, data/attachment loading) honor + * every status (success -> *_DONE, progress -> *_PROGRESS, delay -> *_DELAYED, + * error -> *_ERROR); non-resumable phases (external sync units, metadata, state + * deletion) only have a done and an error event, so a `progress`/`delay` status + * there is illegal and is mapped to the phase's error event. + * + * @param eventType - The incoming event type that started this worker. + * @param status - The status the worker's task/onTimeout returned. + * @returns The outgoing extractor/loader event type to emit, plus whether the + * status was illegal for the phase (so the caller can attach a descriptive + * error message). + */ +export function getEventTypeForResult( + eventType: EventType, + status: TaskStatus +): { + eventType: ExtractorEventType | LoaderEventType; + illegal: boolean; +} { + const phase = EVENT_PHASE_MAP[eventType]; + + if (!phase) { + console.error( + 'Event type not recognized in getEventTypeForResult function: ' + + eventType + ); + return { eventType: LoaderEventType.UnknownEventType, illegal: true }; + } + + // Non-resumable phases only define done/error events. + if (!phase.resumable) { + if (status === 'success') { + return { eventType: phase.done, illegal: false }; + } + // progress/delay are illegal here; collapse them (and error) to the error event. + return { eventType: phase.error, illegal: status !== 'error' }; + } + + switch (status) { + case 'success': + return { eventType: phase.done, illegal: false }; + case 'progress': + return { eventType: phase.progress!, illegal: false }; + case 'delay': + return { eventType: phase.delayed!, illegal: false }; + case 'error': + return { eventType: phase.error, illegal: false }; + } +} + +/** + * Per-phase outgoing event types, keyed by the incoming {@link EventType}. 
+ * `resumable` phases define progress/delayed events; non-resumable ones do not. + */ +const EVENT_PHASE_MAP: Partial< + Record< + EventType, + { + resumable: boolean; + done: ExtractorEventType | LoaderEventType; + error: ExtractorEventType | LoaderEventType; + progress?: ExtractorEventType | LoaderEventType; + delayed?: ExtractorEventType | LoaderEventType; + } + > +> = { + // External sync units (non-resumable) + [EventType.StartExtractingExternalSyncUnits]: { + resumable: false, + done: ExtractorEventType.ExternalSyncUnitExtractionDone, + error: ExtractorEventType.ExternalSyncUnitExtractionError, + }, + // Metadata (non-resumable) + [EventType.StartExtractingMetadata]: { + resumable: false, + done: ExtractorEventType.MetadataExtractionDone, + error: ExtractorEventType.MetadataExtractionError, + }, + // Data extraction (resumable) + [EventType.StartExtractingData]: { + resumable: true, + done: ExtractorEventType.DataExtractionDone, + error: ExtractorEventType.DataExtractionError, + progress: ExtractorEventType.DataExtractionProgress, + delayed: ExtractorEventType.DataExtractionDelayed, + }, + [EventType.ContinueExtractingData]: { + resumable: true, + done: ExtractorEventType.DataExtractionDone, + error: ExtractorEventType.DataExtractionError, + progress: ExtractorEventType.DataExtractionProgress, + delayed: ExtractorEventType.DataExtractionDelayed, + }, + // Extractor state deletion (non-resumable) + [EventType.StartDeletingExtractorState]: { + resumable: false, + done: ExtractorEventType.ExtractorStateDeletionDone, + error: ExtractorEventType.ExtractorStateDeletionError, + }, + // Attachment extraction (resumable) + [EventType.StartExtractingAttachments]: { + resumable: true, + done: ExtractorEventType.AttachmentExtractionDone, + error: ExtractorEventType.AttachmentExtractionError, + progress: ExtractorEventType.AttachmentExtractionProgress, + delayed: ExtractorEventType.AttachmentExtractionDelayed, + }, + [EventType.ContinueExtractingAttachments]: { + 
resumable: true, + done: ExtractorEventType.AttachmentExtractionDone, + error: ExtractorEventType.AttachmentExtractionError, + progress: ExtractorEventType.AttachmentExtractionProgress, + delayed: ExtractorEventType.AttachmentExtractionDelayed, + }, + // Extractor attachments state deletion (non-resumable) + [EventType.StartDeletingExtractorAttachmentsState]: { + resumable: false, + done: ExtractorEventType.ExtractorAttachmentsStateDeletionDone, + error: ExtractorEventType.ExtractorAttachmentsStateDeletionError, + }, + // Data loading (resumable) + [EventType.StartLoadingData]: { + resumable: true, + done: LoaderEventType.DataLoadingDone, + error: LoaderEventType.DataLoadingError, + progress: LoaderEventType.DataLoadingProgress, + delayed: LoaderEventType.DataLoadingDelayed, + }, + [EventType.ContinueLoadingData]: { + resumable: true, + done: LoaderEventType.DataLoadingDone, + error: LoaderEventType.DataLoadingError, + progress: LoaderEventType.DataLoadingProgress, + delayed: LoaderEventType.DataLoadingDelayed, + }, + // Attachment loading (resumable) + [EventType.StartLoadingAttachments]: { + resumable: true, + done: LoaderEventType.AttachmentLoadingDone, + error: LoaderEventType.AttachmentLoadingError, + progress: LoaderEventType.AttachmentLoadingProgress, + delayed: LoaderEventType.AttachmentLoadingDelayed, + }, + [EventType.ContinueLoadingAttachments]: { + resumable: true, + done: LoaderEventType.AttachmentLoadingDone, + error: LoaderEventType.AttachmentLoadingError, + progress: LoaderEventType.AttachmentLoadingProgress, + delayed: LoaderEventType.AttachmentLoadingDelayed, + }, + // Loader state deletion (non-resumable) + [EventType.StartDeletingLoaderState]: { + resumable: false, + done: LoaderEventType.LoaderStateDeletionDone, + error: LoaderEventType.LoaderStateDeletionError, + }, + // Loader attachment state deletion (non-resumable) + [EventType.StartDeletingLoaderAttachmentState]: { + resumable: false, + done: 
LoaderEventType.LoaderAttachmentStateDeletionDone, + error: LoaderEventType.LoaderAttachmentStateDeletionError, + }, +}; /** * Gets the event type for the timeout error. diff --git a/src/types/workers.ts b/src/types/workers.ts index 529d783..dc617d1 100644 --- a/src/types/workers.ts +++ b/src/types/workers.ts @@ -2,24 +2,12 @@ import { Worker } from 'worker_threads'; import type { LogLevel } from '../logger/logger.interfaces'; import { BaseState } from '../state/state'; -import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter'; -import { LoadingAdapter } from '../multithreading/adapters/loading-adapter'; import { AirSyncEvent, EventType, ExtractorEventType } from './extraction'; import { LoaderEventType } from './loading'; -import { InitialDomainMapping } from './common'; - -/** - * WorkerAdapter is the adapter passed to a worker's task/onTimeout callbacks. - * It is the extraction adapter for extraction workers and the loading adapter - * for loading workers; the SDK constructs the concrete type based on the sync - * mode of the event. - */ -export type WorkerAdapter = - | ExtractionAdapter - | LoadingAdapter; +import { ErrorRecord, InitialDomainMapping } from './common'; /** * WorkerAdapterInterface is an interface for WorkerAdapter class. @@ -99,25 +87,66 @@ export interface SpawnFactoryInterface { } /** - * TaskAdapterInterface is an interface for TaskAdapter class. - * @interface TaskAdapterInterface - * @constructor - * @param {WorkerAdapter} adapter - The adapter object - */ -export interface TaskAdapterInterface { - adapter: WorkerAdapter; + * TaskResult is the value a worker's `task` (and optional `onTimeout`) callback + * returns to tell the SDK how the current phase ended. The SDK — not the + * connector — maps this status to the phase-appropriate platform event and + * emits it exactly once. Connectors never call `emit` directly. + * + * One lambda invocation = one worker process = exactly one emitted event = + * terminal. 
Any continuation (CONTINUE_*, next phase, retry after delay) + * happens in a fresh invocation driven by the platform. + * + * The discriminant is a bare string literal, so connectors write e.g. + * `return { status: 'delay', delaySeconds: 60 }` with no import. + * + * Status -> emitted event, per phase: + * + * | status | resumable phases | non-resumable phases | + * |------------|------------------|--------------------------------| + * | 'success' | *_DONE | *_DONE | + * | 'progress' | *_PROGRESS | *_ERROR (illegal; descriptive) | + * | 'delay' | *_DELAYED | *_ERROR (illegal; descriptive) | + * | 'error' | *_ERROR | *_ERROR | + * + * Resumable phases: data/attachment extraction, data/attachment loading. + * Non-resumable phases: external sync units, metadata, and extractor/loader + * state deletion. + */ +export type TaskResult = + | { status: 'success' } + | { status: 'progress' } + | { status: 'delay'; delaySeconds: number } + | { status: 'error'; error: ErrorRecord }; + +/** + * Discriminant string of a {@link TaskResult}. + */ +export type TaskStatus = TaskResult['status']; + +/** + * TaskAdapterInterface is the parameter shape passed to a worker's task and + * onTimeout callbacks. + * @param adapter - The mode-specific adapter for the worker. + */ +export interface TaskAdapterInterface { + adapter: Adapter; } /** - * ProcessTaskInterface is an interface for ProcessTask class. - * @interface ProcessTaskInterface - * @constructor - * @param {function} task - The task to be executed, returns exit code - * @param {function} onTimeout - The task to be executed on timeout, returns exit code - */ -export interface ProcessTaskInterface { - task: (params: TaskAdapterInterface) => Promise; - onTimeout: (params: TaskAdapterInterface) => Promise; + * ProcessTaskInterface is the parameter shape for the process-task entry points. + * + * Both callbacks return a {@link TaskResult}; the SDK — not the connector — + * maps that status to the phase-appropriate platform event and emits it exactly + * once.
Connectors never call `emit` directly. + * + * `onTimeout` is optional: if omitted, the SDK emits a phase-appropriate default + * on timeout (progress for resumable phases, error for non-resumable phases such + * as ESU, metadata, and state deletion). + * + * @param task - Runs the phase; returns how it ended. + * @param onTimeout - Runs only on timeout; returns how to hand off. + */ +export interface ProcessTaskInterface { + task: (params: TaskAdapterInterface) => Promise; + onTimeout?: (params: TaskAdapterInterface) => Promise; } /** From d9ec86b0affcac473cf62748d0f6ad524e0d0b52 Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 10:02:15 +0200 Subject: [PATCH 17/22] =?UTF-8?q?docs:=20mark=20C6=20done=20=E2=80=94=20Ph?= =?UTF-8?q?ase=201=20complete?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Claude Opus 4.8 (1M context) --- V2_PROGRESS.md | 43 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 37 insertions(+), 6 deletions(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index 9dc272e..6ebed94 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -143,11 +143,42 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi - DECIDED: union alias is throwaway, removed in C6 when processTask splits into typed entry points. - Helpers `worker-adapter.helpers.ts` (getFilesToLoad, addReportToLoaderReport): move to loading-adapter's dir or keep; used only by loading now. Keep path stable to minimize churn (import from new loading-adapter). - **C6 — Emit-from-return contract.** `task`/`onTimeout` return a `TaskResult` - (`{ status: 'success'|'progress'|'delay'|'error', ... }`); the SDK maps status→phase event and emits - exactly once; `emit` removed from public surface. `processTask` → `processExtractionTask` + - `processLoadingTask`. Reference: v2-old-backup (oracle tag) `process-task.ts`, `base-adapter.ts` (mapping keys off - event_type/phase, NOT off state shape — so C4b and C6 are independent).
+- **C6 — Emit-from-return contract.** NET-NEW DESIGN: emit-from-return was NEVER implemented in any + branch (oracle process-task.ts is still emit-based). Only spec is the TaskResult surface recovered from + `src/tests/backwards-compatibility/temp/airsync-sdk.api.md` (untracked leftover). DESIGN: + - types: add to types/workers.ts: + `export type TaskResult = {status:'success'} | {status:'progress'} | {status:'delay';delaySeconds:number} + | {status:'error';error:ErrorRecord};` and `export type TaskStatus = TaskResult['status'];` + Change ProcessTaskInterface: `task: (p)=>Promise`, `onTimeout?: (p)=>Promise` (optional). + - Status→event mapping (SDK-owned, per phase). Resumable phases (data/attachment extraction, data/attachment + loading): success→*_DONE, progress→*_PROGRESS, delay→*_DELAYED, error→*_ERROR. Non-resumable (ESU, metadata): + success→*_DONE, error→*_ERROR, progress/delay→*_ERROR (illegal, descriptive msg). Map keyed off + event.payload.event_type (which incoming EventType → which Extractor/Loader event per status). + - base-adapter: emit() becomes SDK-INTERNAL (drop from public surface / index). Add a driver-facing method + e.g. `async emitFromResult(result: TaskResult)` that computes the event type from event_type+status via the + mapping and calls the internal emit with the right payload (delaySeconds for delay, error for error). + - LOADER/STREAMING methods RETURN TaskResult (Rado decided): loadItemTypes, loadAttachments, streamAttachments + stop calling emit mid-flight; instead they RETURN a TaskResult (success/progress/delay/error). The reports/ + processed_files still flow via buildEmitPayload from adapter state (already wired in C5), so the methods just + return status. Their old in-method emit() calls (rate-limit→delay, timeout→progress, error→error) become + `return {status:...}`. They no longer call process.exit; the driver handles emit+exit. 
+ - process-task.ts: split into `processExtractionTask` + `processLoadingTask` (drop single processTask + the + WorkerAdapter union alias). Shared `runWorkerTask(buildAdapter, {task,onTimeout})` driver (oracle has the + skeleton): run task→get TaskResult; if isTimeout && !hasWorkerEmitted run onTimeout (or SDK default if + onTimeout omitted: progress for resumable, error for ESU/metadata)→get TaskResult; then + adapter.emitFromResult(result); exit. Build the typed adapter via createExtractionState/createLoadingState. + - index.ts: export TaskResult, TaskStatus, processExtractionTask, processLoadingTask. REMOVE processTask. + - DECIDED (Rado): emit is SDK-INTERNAL — NOT on the public ExtractionAdapter/LoadingAdapter surface. + The spec confirms this (its adapter surfaces show no emit()). Implementation: the template-method skeleton + + beforeEmit/buildEmitPayload/afterEmit hooks become `protected` (already are for hooks); the actual emit + entry the DRIVER calls is `emitFromResult(result: TaskResult)` — make it a method the driver can call but + not documented as connector surface. Connectors calling adapter.emit() get a compile error (good). + Keep the old `emit(eventType,data)` as a protected/internal helper that emitFromResult delegates to (the + loader/streaming methods no longer call it — they return TaskResult). + - test connectors (src/tests/**): these CALL adapter.emit today → they'd break. They are test files (deferred + to C10), and build excludes them, so C6 build stays green. Migration of test workers to return-style = C10. + - Reference: oracle process-task.ts (the runWorkerTask skeleton + typed entry points) — but oracle is still + emit-based, so the TaskResult return + emitFromResult mapping is authored fresh per the spec above. 
### Phase 2 — closing / interactive (batched, done at the end) - **C7 — JSDoc pass.** Bar = `src/mappers/mappers.ts` style (class block: what+when; method block: @@ -258,7 +289,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: | C4a state split | ☑ done | b63f3ab. BaseState + ExtractionState + LoadingState, flat shape preserved; state.ts is dispatcher by mode. Reviewer-confirmed behavior-equivalent (loading only loses inert logs). | | C4b state envelope | ☑ done | 30ba1b3. { connectorState, sdkState } envelope + v1->v2 migration shim (normalizeFetchedState). adapter.state→connector-only, new adapter.sdkState; ~28 SDK-field access sites moved. SdkState kept combined (narrowing deferred to C5). Reviewer-approved (migration cases verified). | | C5 adapter split | ☑ done | a7a877f. BaseAdapter (template emit + hooks) + ExtractionAdapter + LoadingAdapter; WorkerAdapter→union alias; processTask dispatches by mode (still single entry). worker-adapter.ts deleted; helpers→loading-adapter.helpers. Reviewer-approved (emit equivalence verified). SdkState kept combined (narrowing dropped from scope). | -| C6 emit-from-return | ☐ todo | | +| C6 emit-from-return | ☑ done | 0fb6116. task/onTimeout return TaskResult; SDK maps status→event via getEventTypeForResult and emits once (emitFromResult); emit now protected/internal; processTask→processExtractionTask+processLoadingTask; loader/stream methods return TaskResult. Reviewer-approved (mapping+state-save+no-double-emit verified). NET-NEW design (no oracle). | | C7 JSDoc | ☐ todo | Phase 2 | | C8 api report | ☐ todo | Phase 2 | | C9 exposure audit | ☐ todo | Phase 2, interactive | From d05434be0baa2cb373ba114ad3da769561aa6854 Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 10:18:11 +0200 Subject: [PATCH 18/22] docs(v2): JSDoc pass over public surface and v2 internals Documentation-only pass (C7). 
Brings the v2-new structural code and the under-documented older modules up to the `src/mappers/mappers.ts` style bar (one-line what + "Used to/for ..." usage line + typed @param/@returns). Covered: - adapters: BaseAdapter template-method emit + emitFromResult, ExtractionAdapter (repos/artifacts/attachment streaming), LoadingAdapter (mappers/reports/loading). - state: BaseState lifecycle, the v1->v2 normalizeFetchedState migration shim, ExtractionState window resolution, the mode-based createAdapterState dispatcher, and state.interfaces (SdkState, envelope, V1_SDK_STATE_KEYS). - multithreading: processExtractionTask/processLoadingTask + runWorkerTask driver, spawn/Spawn supervision, and the getEventTypeForResult / EVENT_PHASE_MAP status->event mapping. - repo, uploader (+helpers/interfaces), attachments-streaming-pool. - common: control-protocol emit, install-initial-domain-mapping, errors; types/loading (previously zero JSDoc) and the types barrel. No executable code, type signatures, imports, or string literals changed (verified: every changed line is a comment). Build green, lint clean. 
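
For reviewers, here is a minimal self-contained sketch of the status->event rule that getEventTypeForResult documents. It is a simplified model for illustration, not SDK code: `PhaseEvents`, `resolveEvent`, `exampleTask`, and all event-name strings are hypothetical placeholders standing in for EVENT_PHASE_MAP entries and the real ExtractorEventType/LoaderEventType values.

```typescript
// Hypothetical, simplified model of the v2 TaskResult contract.
// Event-name strings are placeholders, not the SDK's real enum values.
type TaskResult =
  | { status: 'success' }
  | { status: 'progress' }
  | { status: 'delay'; delaySeconds: number }
  | { status: 'error'; error: { message: string } };

type TaskStatus = TaskResult['status'];

// Outgoing events for one phase; only resumable phases define progress/delayed.
interface PhaseEvents {
  resumable: boolean;
  done: string;
  error: string;
  progress?: string;
  delayed?: string;
}

// Same rule as described for getEventTypeForResult: non-resumable phases
// collapse 'progress' and 'delay' to their error event and flag them illegal.
function resolveEvent(
  phase: PhaseEvents,
  status: TaskStatus
): { eventType: string; illegal: boolean } {
  if (!phase.resumable) {
    if (status === 'success') {
      return { eventType: phase.done, illegal: false };
    }
    return { eventType: phase.error, illegal: status !== 'error' };
  }
  switch (status) {
    case 'success':
      return { eventType: phase.done, illegal: false };
    case 'progress':
      return { eventType: phase.progress!, illegal: false };
    case 'delay':
      return { eventType: phase.delayed!, illegal: false };
    case 'error':
      return { eventType: phase.error, illegal: false };
  }
}

// Placeholder phase definitions (one resumable, one non-resumable).
const dataExtraction: PhaseEvents = {
  resumable: true,
  done: 'DATA_DONE',
  error: 'DATA_ERROR',
  progress: 'DATA_PROGRESS',
  delayed: 'DATA_DELAYED',
};
const metadata: PhaseEvents = {
  resumable: false,
  done: 'METADATA_DONE',
  error: 'METADATA_ERROR',
};

// Hypothetical connector task: it returns a result instead of calling emit.
function exampleTask(isTimeout: boolean, retryAfterSeconds?: number): TaskResult {
  if (retryAfterSeconds !== undefined) {
    return { status: 'delay', delaySeconds: retryAfterSeconds };
  }
  if (isTimeout) {
    return { status: 'progress' }; // resume in a fresh invocation
  }
  return { status: 'success' };
}
```

Under this model, a metadata task that returns `{ status: 'progress' }` resolves to the metadata error event with `illegal: true`, which is the case the SDK reports with a descriptive error instead of a progress event.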
--- .../attachments-streaming-pool.interfaces.ts | 10 ++ .../attachments-streaming-pool.ts | 41 ++++- src/common/control-protocol.ts | 20 +++ src/common/errors.ts | 9 + src/common/install-initial-domain-mapping.ts | 13 ++ src/multithreading/adapters/base-adapter.ts | 41 +++-- .../adapters/extraction-adapter.ts | 117 +++++++++++-- .../adapters/loading-adapter.helpers.ts | 26 ++- .../adapters/loading-adapter.ts | 97 ++++++++++- src/multithreading/create-worker.ts | 10 ++ src/multithreading/process-task.ts | 24 +++ src/multithreading/spawn/spawn.helpers.ts | 10 +- src/multithreading/spawn/spawn.ts | 35 +++- src/repo/repo.interfaces.ts | 40 ++++- src/repo/repo.ts | 26 +++ src/state/base-state.ts | 75 ++++++--- src/state/extraction-state.ts | 40 +++-- src/state/loading-state.ts | 19 ++- src/state/state.interfaces.ts | 45 ++++- src/state/state.ts | 10 +- src/types/index.ts | 8 + src/types/loading.ts | 154 ++++++++++++++++++ src/uploader/uploader.helpers.ts | 47 ++++-- src/uploader/uploader.interfaces.ts | 21 +++ src/uploader/uploader.ts | 110 ++++++++----- 25 files changed, 902 insertions(+), 146 deletions(-) diff --git a/src/attachments-streaming/attachments-streaming-pool.interfaces.ts b/src/attachments-streaming/attachments-streaming-pool.interfaces.ts index 9a22ea0..c412758 100644 --- a/src/attachments-streaming/attachments-streaming-pool.interfaces.ts +++ b/src/attachments-streaming/attachments-streaming-pool.interfaces.ts @@ -4,9 +4,19 @@ import { } from '../types'; import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter'; +/** + * Construction parameters used to create an AttachmentsStreamingPool. + * + * Used to supply the driving extraction adapter, the attachments to stream, the concurrency limit, and + * the connector-provided streaming function. + */ export interface AttachmentsStreamingPoolParams { + /** The ExtractionAdapter that owns sync state, timeout detection, and the processAttachment call. 
*/ adapter: ExtractionAdapter; + /** The normalized attachments to stream to DevRev. */ attachments: NormalizedAttachment[]; + /** Optional maximum number of attachments to stream concurrently (defaults to 10 in the pool). */ batchSize?: number; + /** Connector-provided function that downloads a single attachment from the external system. */ stream: ExternalSystemAttachmentStreamingFunction; } diff --git a/src/attachments-streaming/attachments-streaming-pool.ts b/src/attachments-streaming/attachments-streaming-pool.ts index 58281e1..d7cf4a0 100644 --- a/src/attachments-streaming/attachments-streaming-pool.ts +++ b/src/attachments-streaming/attachments-streaming-pool.ts @@ -8,6 +8,12 @@ import { } from '../types'; import { AttachmentsStreamingPoolParams } from './attachments-streaming-pool.interfaces'; +/** + * Concurrency-bounded pool that streams a batch of attachments from the external system to DevRev. + * + * Used during attachment extraction to download up to batchSize attachments in parallel while honoring + * timeouts, rate-limit delays, and per-attachment errors, and to track processed attachments for resumption. + */ export class AttachmentsStreamingPool { private adapter: ExtractionAdapter; private attachments: NormalizedAttachment[]; @@ -31,6 +37,14 @@ export class AttachmentsStreamingPool { this.stream = stream; } + /** + * Increments the processed counter and periodically logs progress. + * + * Used after each attachment to report progress every PROGRESS_REPORT_INTERVAL items and briefly + * yield the event loop. + * + * @returns Promise that resolves once progress has been recorded (and any brief sleep elapsed). + */ private async updateProgress() { this.totalProcessedCount++; if (this.totalProcessedCount % this.PROGRESS_REPORT_INTERVAL === 0) { @@ -41,10 +55,13 @@ export class AttachmentsStreamingPool { } /** - * Migrates processed attachments from the legacy string[] format to the new ProcessedAttachment[] format. 
+ * Migrates processed-attachment state from the legacy string[] format to ProcessedAttachment[]. * - * @param attachments - The attachments list to migrate (either string[] or ProcessedAttachment[]) - * @returns Migrated array of ProcessedAttachment objects, or empty array if input is invalid + * Used when resuming streaming so older saved state (a list of ids) is upgraded to the structured + * { id, parent_id } form before it is consulted for de-duplication. + * + * @param attachments - The persisted list to migrate, either a string[] of ids or a ProcessedAttachment[]. + * @returns Migrated array of ProcessedAttachment objects, or an empty array if the input is invalid. */ // eslint-disable-next-line @typescript-eslint/no-explicit-any private migrateProcessedAttachments(attachments: any): ProcessedAttachment[] { @@ -69,6 +86,15 @@ export class AttachmentsStreamingPool { return []; } + /** + * Streams every attachment in the pool, running up to batchSize streams concurrently. + * + * Used as the pool's entry point: it initializes/migrates the processed-attachments state, starts the + * initial set of worker loops, and waits for them to drain the queue or stop early on a delay. + * + * @returns Promise resolving to a ProcessAttachmentReturnType: a delay if rate-limited, an error if + * state is uninitialized, or an empty object once all attachments are processed. + */ async streamAll(): Promise { console.log( `Starting download of ${this.attachments.length} attachments, streaming ${this.batchSize} at once.` @@ -115,6 +141,15 @@ export class AttachmentsStreamingPool { return {}; } + /** + * Runs a single worker loop that pulls and streams attachments until the queue is drained. + * + * Used as one of the concurrent workers started by streamAll: it skips already-processed attachments, + * stops on timeout or a rate-limit delay, records successes, and logs/skips per-attachment errors. 
+ * + * @returns Promise that resolves when this worker stops, either because the queue is empty or a + * timeout/delay was detected. + */ async startPoolStreaming() { // Process attachments until the attachments array is empty while (this.attachments.length > 0) { diff --git a/src/common/control-protocol.ts b/src/common/control-protocol.ts index 0d13be1..8470401 100644 --- a/src/common/control-protocol.ts +++ b/src/common/control-protocol.ts @@ -10,12 +10,32 @@ import { import { LoaderEventType } from '../types/loading'; import { LIBRARY_VERSION } from './constants'; +/** + * Parameters for emitting a worker control message back to the platform. + * + * Used by {@link emit} to construct and post the outgoing extractor/loader event. + */ export interface EmitInterface { + /** The incoming AirSync event currently being processed; supplies the callback URL, event context, and auth secrets. */ event: AirSyncEvent; + /** The outgoing event type to report. In v2 this value is used directly (no event-type translation). */ eventType: ExtractorEventType | LoaderEventType; + /** Optional payload describing progress, results, or error details to attach to the event. */ data?: EventData; } +/** + * Emits a worker control message to the parent/platform via the event callback URL. + * + * Used to report extraction/loading progress, completion, delays, or errors back to AirSync. + * Wraps the given event type and data into an ExtractorEvent/LoaderEvent envelope (stamped with + * the library version) and POSTs it to the callback URL with the service account authorization. + * + * @param event - The incoming AirSyncEvent providing callback URL, event context, and auth secrets. + * @param eventType - The outgoing ExtractorEventType or LoaderEventType, used directly as the event_type. + * @param data - Optional EventData payload to include in the emitted event. + * @returns Promise resolving to the AxiosResponse of the callback POST request. 
+ */ export const emit = async ({ event, eventType, diff --git a/src/common/errors.ts b/src/common/errors.ts index cd420a9..cb32a8b 100644 --- a/src/common/errors.ts +++ b/src/common/errors.ts @@ -1,6 +1,15 @@ +/** Prefix used to namespace common error codes emitted by extractors. */ const ERROR_PREFIX = 'ERROR_CODE'; +/** Delimiter joining the error prefix and the specific error name in the encoded code. */ const ERROR_DELIMITER = '='; +/** + * Well-known error codes an extractor can report to signal common, externally-caused failure conditions. + * + * Used to communicate persistent source-system states (deletion, deactivation, missing access/permission) + * or sync-completion signals back to AirSync in a recognized, machine-readable form. Each member's value + * is the encoded string `ERROR_CODE=`. + */ export const enum ExtractionCommonError { // Indicates that the external system is permanently inactive or inaccessible. // This is used for persistent conditions (system deleted, deactivated, access permanently revoked) diff --git a/src/common/install-initial-domain-mapping.ts b/src/common/install-initial-domain-mapping.ts index 5a8ec43..b0adb6f 100644 --- a/src/common/install-initial-domain-mapping.ts +++ b/src/common/install-initial-domain-mapping.ts @@ -4,6 +4,19 @@ import { AirSyncEvent } from '../types/extraction'; import { serializeError } from '../logger/logger'; import { InitialDomainMapping } from '../types/common'; +/** + * Installs the connector's initial domain mapping into the sync. + * + * Used the first time a sync needs the mapping that describes how external record types/fields map + * to DevRev. It resolves the snap-in's import and version slugs, optionally creates the starting + * recipe blueprint, then installs the initial domain mapping (plus any additional mappings) via the + * airdrop recipe API. If no mapping JSON is provided the call is a no-op; blueprint creation failures + * are logged and the install continues without a blueprint. 
+ * + * @param event - The AirSyncEvent providing the DevRev endpoint, service account token, and snap-in ID. + * @param initialDomainMappingJson - The InitialDomainMapping JSON containing the starting recipe blueprint and additional mappings. + * @returns Promise that resolves once the mapping has been installed (or early-returns if none is provided). + */ export async function installInitialDomainMapping( event: AirSyncEvent, initialDomainMappingJson: InitialDomainMapping diff --git a/src/multithreading/adapters/base-adapter.ts b/src/multithreading/adapters/base-adapter.ts index dec9285..787b240 100644 --- a/src/multithreading/adapters/base-adapter.ts +++ b/src/multithreading/adapters/base-adapter.ts @@ -23,10 +23,13 @@ import { Uploader } from '../../uploader/uploader'; import { getEventTypeForResult } from '../spawn/spawn.helpers'; /** - * BaseAdapter holds the state and behavior shared by both sync modes and owns - * the `emit` control-protocol flow as a template method. Mode-specific adapters - * (`ExtractionAdapter`, `LoadingAdapter`) implement the abstract hooks to inject - * their own pre-emit work and event payload shaping. + * Abstract base for the worker adapters, holding state and behavior shared by + * both sync modes and owning the `emit` control-protocol flow as a template method. + * + * Used as the type passed to worker tasks; mode-specific adapters + * (`ExtractionAdapter`, `LoadingAdapter`) extend it and implement the abstract + * hooks (`beforeEmit`, `buildEmitPayload`, `afterEmit`) to inject their own + * pre-emit work and event payload shaping. * * @typeParam ConnectorState - the connector-owned state shape */ @@ -73,10 +76,18 @@ export abstract class BaseAdapter { return this.adapterState.sdkState; } + /** Per-item-type extraction scope (which item types to extract). */ get extractionScope() { return this.adapterState.extractionScope; } + /** + * Persists the current adapter state to the platform. 
+ * + * Used to checkpoint connector and SDK state outside of an emit. + * + * @returns Promise that resolves once the state has been posted. + */ async postState() { return runWithSdkLogContext(async () => { await this.adapterState.postState(); @@ -113,12 +124,15 @@ export abstract class BaseAdapter { * Maps a {@link TaskResult} returned by a worker's task/onTimeout callback to * the phase-appropriate platform event and emits it exactly once. * - * This is the SDK-internal bridge between the return-based connector contract + * Used as the SDK-internal bridge between the return-based connector contract * and the control protocol; it is invoked by the worker driver, not by * connectors. Connectors signal outcomes by returning a `TaskResult`, never by - * calling `emit` directly. + * calling `emit` directly. A `delay`/`error` status carries its delay seconds + * or error into the event data; a status that is illegal for a non-resumable + * phase is downgraded to an error event. * - * @param result - The status the worker reported for the current phase. + * @param result - The TaskResult status the worker reported for the current phase. + * @returns Promise that resolves once the mapped event has been emitted. */ async emitFromResult(result: TaskResult): Promise { const { eventType, illegal } = getEventTypeForResult( @@ -141,10 +155,17 @@ export abstract class BaseAdapter { } /** - * Emits an event to the platform. + * Emits a single event to the platform via the template-method flow. + * + * Used as the one place that sends a control-protocol event: it runs the + * `beforeEmit` hook, persists state (except for stateless start/delete events), + * merges in the mode-specific `buildEmitPayload`, sends the event, then runs + * `afterEmit`. Guarded by `hasWorkerEmitted` so it emits at most once; any + * failure in preparation, state posting, or sending signals the worker to exit. 
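The template-method flow described above can be sketched on its own. This is a minimal, hypothetical TypeScript sketch, not the SDK's code. The class names, the placeholder event names, and the `log`/`exitRequested` fields are assumptions. They stand in for the real state posting, the HTTP send, and the worker-exit signal. The sketch shows the hook order: `beforeEmit`, then a state post unless the event is stateless, then the payload merge, the send, and `afterEmit`. It also shows the at-most-once guard.

```typescript
// Minimal sketch of the emit template method. All names here (SketchAdapter,
// SketchExtractionAdapter, STATELESS_EVENTS, log, exitRequested) are hypothetical
// stand-ins; the real adapter posts state and sends events to the platform.
type EventData = Record<string, unknown>;

// Assumed placeholder names for the stateless start/delete events that skip state posting.
const STATELESS_EVENTS = new Set<string>(["StartPlaceholder", "DeletePlaceholder"]);

abstract class SketchAdapter {
  hasWorkerEmitted = false;
  exitRequested = false;
  readonly log: string[] = [];

  // Mode-specific hooks, mirroring beforeEmit / buildEmitPayload / afterEmit.
  protected abstract beforeEmit(eventType: string): Promise<void>;
  protected abstract buildEmitPayload(eventType: string): EventData;
  protected abstract afterEmit(): void;

  async emit(eventType: string, data?: EventData): Promise<void> {
    if (this.hasWorkerEmitted) return; // emit at most once per worker
    try {
      await this.beforeEmit(eventType);
      if (!STATELESS_EVENTS.has(eventType)) {
        this.log.push("postState"); // stand-in for persisting state
      }
      const payload = { ...data, ...this.buildEmitPayload(eventType) };
      this.log.push(`send ${eventType} ${JSON.stringify(payload)}`); // stand-in for the send
      this.afterEmit();
      this.hasWorkerEmitted = true;
    } catch {
      this.exitRequested = true; // any failure signals the worker to exit
    }
  }
}

// Extraction-flavoured hooks: flush repos first, attach artifacts, then reset them.
class SketchExtractionAdapter extends SketchAdapter {
  artifacts: string[] = ["artifact-1"];

  protected async beforeEmit(): Promise<void> {
    this.log.push("uploadAllRepos");
  }

  protected buildEmitPayload(): EventData {
    return { artifacts: this.artifacts };
  }

  protected afterEmit(): void {
    this.artifacts = [];
  }
}
```

The base class keeps the sequence, and each mode supplies only the hooks. This way, the once-only guard and the failure handling stay the same for extraction and loading.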
* - * @param newEventType - The event type to be emitted - * @param data - The data to be sent with the event + * @param newEventType - The ExtractorEventType or LoaderEventType to emit. + * @param data - Optional EventData (e.g. delay or error) merged into the payload. + * @returns Promise that resolves once the emit attempt has completed. */ protected async emit( newEventType: ExtractorEventType | LoaderEventType, diff --git a/src/multithreading/adapters/extraction-adapter.ts b/src/multithreading/adapters/extraction-adapter.ts index 5bbbffa..39cd036 100644 --- a/src/multithreading/adapters/extraction-adapter.ts +++ b/src/multithreading/adapters/extraction-adapter.ts @@ -32,10 +32,14 @@ import { Artifact, SsorAttachment } from '../../uploader/uploader.interfaces'; import { BaseAdapter } from './base-adapter'; /** - * ExtractionAdapter is the adapter passed to extraction tasks. It exposes the - * extraction surface (repos, artifacts, attachment streaming) and uploads - * pending repos and updates the extraction boundaries before emitting. + * Worker adapter passed to extraction tasks, exposing the extraction surface + * (repos, artifacts, attachment streaming). * + * Used during the extraction phases; before emitting it uploads all pending + * repos and, on `AttachmentExtractionDone`, advances the sync boundaries on + * `sdkState`. + * + * @typeParam ConnectorState - the connector-owned state shape * @public */ export class ExtractionAdapter< @@ -56,7 +60,12 @@ export class ExtractionAdapter< /** * Returns whether the given item type should be extracted. - * Defaults to true if the scope is empty or the item type is not listed. + * + * Used to honor the per-item-type extraction scope; defaults to true when the + * scope is empty or the item type is not listed. + * + * @param itemType - The item type name to check. + * @returns True if the item type should be extracted. 
*/ shouldExtract(itemType: string): boolean { const scope = this.extractionScope; @@ -65,6 +74,17 @@ export class ExtractionAdapter< return scope[itemType].extract; } + /** + * Initializes the adapter's repos from the given repo definitions. + * + * Used to set up the in-memory item buffers an extraction task pushes to; + * each repo normalizes items (except external domain metadata and SSOR + * attachments) and, on upload, records attachment artifact IDs in state and + * tracks event payload size to trigger a timeout when the SQS size threshold + * is exceeded. + * + * @param repos - The RepoInterface definitions to build repos from. + */ initializeRepos(repos: RepoInterface[]) { this.repos = repos.map((repo) => { const shouldNormalize = @@ -104,6 +124,15 @@ export class ExtractionAdapter< }); } + /** + * Looks up an initialized repo by item type. + * + * Used by extraction tasks to get the buffer they push normalized items into; + * logs an error and returns undefined when no repo matches. + * + * @param itemType - The item type name of the repo to find. + * @returns The matching Repo, or undefined if none was initialized. + */ getRepo(itemType: string): Repo | undefined { return runWithSdkLogContext(() => { const repo = this.repos.find((repo) => repo.itemType === itemType); @@ -117,16 +146,32 @@ export class ExtractionAdapter< }); } + /** Artifacts accumulated since the last emit, sent with the next event. */ get artifacts(): Artifact[] { return this._artifacts; } + /** Appends artifacts to the accumulator, de-duplicating by reference. */ set artifacts(artifacts: Artifact[]) { this._artifacts = this._artifacts .concat(artifacts) .filter((value, index, self) => self.indexOf(value) === index); } + /** + * Pre-emit hook: uploads all repos and, when extraction completes, advances + * the sync boundaries. + * + * Used as the extraction implementation of the `BaseAdapter` template hook. 
+ * Always uploads pending repos; on `AttachmentExtractionDone` it promotes + * `lastSyncStarted` to `lastSuccessfulSyncStarted`, clears the pending worker + * boundaries, and expands `workersOldest`/`workersNewest` from the event's + * extract-from/extract-to window. + * + * @param newEventType - The event type about to be emitted. + * @returns Promise that resolves once pre-emit work is done; rejects (aborting + * the emit) if a repo upload fails. + */ protected async beforeEmit( newEventType: ExtractorEventType | LoaderEventType ): Promise { @@ -178,6 +223,15 @@ export class ExtractionAdapter< } } + /** + * Builds the extraction-specific extras merged into the emitted event payload. + * + * Used as the extraction implementation of the `BaseAdapter` template hook; + * returns the accumulated artifacts for extraction events and nothing otherwise. + * + * @param newEventType - The event type about to be emitted. + * @returns EventData carrying the accumulated artifacts for extraction events. + */ protected buildEmitPayload( newEventType: ExtractorEventType | LoaderEventType ): EventData { @@ -187,10 +241,24 @@ export class ExtractionAdapter< return isExtractionEvent ? { artifacts: this.artifacts } : {}; } + /** + * Post-emit hook: clears the accumulated artifacts after a successful emit. + * + * Used as the extraction implementation of the `BaseAdapter` template hook so + * artifacts are not re-sent on a subsequent emit. + */ protected afterEmit(): void { this.artifacts = []; } + /** + * Uploads all initialized repos and collects their resulting artifacts. + * + * Used by `beforeEmit` to flush buffered items to artifacts before an event is + * sent; throws on the first repo that fails to upload. + * + * @returns Promise that resolves once every repo has been uploaded. 
+ */ async uploadAllRepos(): Promise { for (const repo of this.repos) { const error = await repo.upload(); @@ -201,6 +269,20 @@ export class ExtractionAdapter< } } + /** + * Streams a single attachment from the external system into a DevRev artifact + * and records the SSOR attachment mapping. + * + * Used while streaming attachments: it opens the external stream, requests an + * artifact upload URL, streams the bytes, confirms the upload, then pushes an + * `ssor_attachment` linking the external and DevRev IDs. A delay or error from + * the stream is surfaced back to the caller and the HTTP stream is destroyed. + * + * @param attachment - The NormalizedAttachment to fetch and upload. + * @param stream - The ExternalSystemAttachmentStreamingFunction that opens the source stream. + * @returns Promise with undefined on success, or a ProcessAttachmentReturnType + * carrying a delay or error. + */ async processAttachment( attachment: NormalizedAttachment, stream: ExternalSystemAttachmentStreamingFunction @@ -316,8 +398,12 @@ export class ExtractionAdapter< } /** - * Destroys a stream to prevent memory leaks. - * @param httpStream - The axios response stream to destroy + * Destroys an HTTP response stream to prevent memory leaks. + * + * Used to release an open attachment source stream when uploading fails; + * swallows any error raised while destroying. + * + * @param httpStream - The AxiosResponse stream to destroy. */ private destroyHttpStream(httpStream: AxiosResponse): void { try { @@ -334,11 +420,20 @@ export class ExtractionAdapter< } /** - * Streams the attachments to the DevRev platform. - * The attachments are streamed to the platform and the artifact information is returned. - * @param params - The parameters to stream the attachments - * @returns The response object containing the ssorAttachment artifact information - * or error information if there was an error + * Streams all pending attachments to the DevRev platform and returns the phase + * outcome. 
+ * + * Used as the entry point for the attachment-streaming phase: it iterates the + * attachments-metadata artifact IDs recorded in state, fetches each batch, and + * streams them either through caller-provided processors or the default + * streaming pool (with batch size clamped to 1..50). Returns `delay` on a + * rate limit, `error` on failure, `progress` on timeout (to resume in a fresh + * invocation), and `success` once every artifact ID is processed. + * + * @param stream - The ExternalSystemAttachmentStreamingFunction used to open each source stream. + * @param processors - Optional ExternalSystemAttachmentProcessors (reducer + iterator) overriding the default pool. + * @param batchSize - Number of attachments to stream concurrently; defaults to 1, clamped to 1..50. + * @returns Promise with the TaskResult describing the phase outcome. */ async streamAttachments({ stream, diff --git a/src/multithreading/adapters/loading-adapter.helpers.ts b/src/multithreading/adapters/loading-adapter.helpers.ts index ecab2de..e7f3187 100644 --- a/src/multithreading/adapters/loading-adapter.helpers.ts +++ b/src/multithreading/adapters/loading-adapter.helpers.ts @@ -6,10 +6,15 @@ import { } from '../../types/loading'; /** - * Gets the files to load for the loader. - * @param {string[]} supportedItemTypes - The supported item types - * @param {StatsFileObject[]} statsFile - The stats file - * @returns {FileToLoad[]} The files to load + * Builds the ordered list of files to load from a stats file. + * + * Used by the loading adapter to filter the stats file down to the supported + * item types, order the entries to match that item-type order, and shape each + * into a FileToLoad with progress fields reset. + * + * @param supportedItemTypes - The supported item type names, in desired load order. + * @param statsFile - The StatsFileObject entries describing available files. + * @returns The FileToLoad entries to process, ordered by supported item type. 
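Here is a minimal sketch of the filter, order, and reset steps described for `getFilesToLoad`. It assumes simplified stats-file and file-to-load shapes and a numeric-string `count`. The field names below are assumptions, not the SDK's exported types.

```typescript
// Hypothetical sketch of getFilesToLoad's filter -> order -> reset steps.
// The field names below are assumptions, not the SDK's exported types.
interface StatsEntry {
  id: string;
  item_type: string;
  file_name: string;
  count: string; // assumed to be a numeric string in the stats file
}

interface FileToLoadSketch {
  id: string;
  file_name: string;
  itemType: string;
  count: number;
  lineToProcess: number; // progress fields start from zero
  completed: boolean;
}

function getFilesToLoadSketch(
  supportedItemTypes: string[],
  statsFile: StatsEntry[]
): FileToLoadSketch[] {
  return statsFile
    // keep only item types the loader supports
    .filter((entry) => supportedItemTypes.includes(entry.item_type))
    // order entries to match the supported item-type order
    .sort(
      (a, b) =>
        supportedItemTypes.indexOf(a.item_type) - supportedItemTypes.indexOf(b.item_type)
    )
    // shape each entry with progress reset
    .map((entry) => ({
      id: entry.id,
      file_name: entry.file_name,
      itemType: entry.item_type,
      count: parseInt(entry.count, 10),
      lineToProcess: 0,
      completed: false,
    }));
}
```

`Array.prototype.sort` is stable. So in this sketch, files of the same item type keep their stats-file order.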
*/ export function getFilesToLoad({ supportedItemTypes, @@ -50,10 +55,15 @@ export function getFilesToLoad({ } /** - * Adds a report to the loader report. - * @param {LoaderReport[]} loaderReports - The loader reports - * @param {LoaderReport} report - The report to add - * @returns {LoaderReport[]} The updated loader reports + * Merges a report into the accumulated loader reports. + * + * Used to keep one running report per item type: when a report for the same + * item type already exists its created/updated/failed counts are summed, + * otherwise the report is appended. + * + * @param loaderReports - The existing LoaderReport accumulator (mutated in place). + * @param report - The LoaderReport to merge in. + * @returns The updated LoaderReport accumulator. */ export function addReportToLoaderReport({ loaderReports, diff --git a/src/multithreading/adapters/loading-adapter.ts b/src/multithreading/adapters/loading-adapter.ts index 49fdd2b..70c4fe9 100644 --- a/src/multithreading/adapters/loading-adapter.ts +++ b/src/multithreading/adapters/loading-adapter.ts @@ -36,9 +36,14 @@ import { } from './loading-adapter.helpers'; /** - * LoadingAdapter is the adapter passed to loading tasks. It exposes the loading - * surface (item/attachment loading, mappers, loader reports). + * Worker adapter passed to loading tasks, exposing the loading surface + * (item/attachment loading, mappers, loader reports). * + * Used during the loading phases to push transformed DevRev items into the + * external system, maintaining sync mapper records and accumulating per-item-type + * loader reports across emits. + * + * @typeParam ConnectorState - the connector-owned state shape * @public */ export class LoadingAdapter< @@ -62,22 +67,43 @@ export class LoadingAdapter< }); } + /** Per-item-type loader reports accumulated across loads, sent with each emit. */ get reports(): LoaderReport[] { return this.loaderReports; } + /** Artifact IDs of transformer files that have been fully loaded. 
*/ get processedFiles(): string[] { return this._processedFiles; } + /** Mappers client for reading/writing sync mapper records. */ get mappers(): Mappers { return this._mappers; } + /** + * Pre-emit hook for loading; intentionally a no-op. + * + * Used as the loading implementation of the `BaseAdapter` template hook; + * loading has no repos to upload and no extraction boundaries to advance. + * + * @returns Promise that resolves immediately. + */ protected async beforeEmit(): Promise { // Loading has no pre-emit work (no repos, no extraction boundaries). } + /** + * Builds the loading-specific extras merged into the emitted event payload. + * + * Used as the loading implementation of the `BaseAdapter` template hook; + * returns the accumulated reports and processed files for loader events and + * nothing otherwise. + * + * @param newEventType - The event type about to be emitted. + * @returns EventData carrying loader reports and processed files for loader events. + */ protected buildEmitPayload( newEventType: ExtractorEventType | LoaderEventType ): EventData { @@ -92,10 +118,29 @@ export class LoadingAdapter< : {}; } + /** + * Post-emit hook for loading; intentionally a no-op. + * + * Used as the loading implementation of the `BaseAdapter` template hook; + * loading keeps its accumulated reports and processed files across emits. + */ protected afterEmit(): void { // Loading keeps its accumulated reports/processed files across emits. } + /** + * Loads all supported item types into the external system and returns the + * phase outcome. + * + * Used as the entry point for the data-loading phase: on `StartLoadingData` it + * resolves the batches of transformer files to load from the stats file, then + * for each file loads every item via `loadItem`, tracking progress per line so + * a fresh invocation can resume. Returns `delay` on a rate limit, `progress` + * on timeout, `error` on failure, and `success` once all files are processed. 
+ * + * @param itemTypesToLoad - The ItemTypeToLoad definitions (create/update handlers per item type). + * @returns Promise with the TaskResult describing the phase outcome. + */ async loadItemTypes({ itemTypesToLoad, }: ItemTypesToLoadParams): Promise { @@ -216,6 +261,17 @@ export class LoadingAdapter< }); } + /** + * Resolves the ordered list of transformer files to load for the given item + * types. + * + * Used by `loadItemTypes`/`loadAttachments` to turn the event's stats file + * into `FileToLoad` batches, filtered to the supported item types; returns an + * empty list when there is no stats file, it cannot be fetched, or it is empty. + * + * @param supportedItemTypes - The item type names to include, in load order. + * @returns Promise with the FileToLoad batches to process. + */ async getLoaderBatches({ supportedItemTypes, }: { @@ -248,6 +304,18 @@ export class LoadingAdapter< }); } + /** + * Loads attachments into the external system and returns the phase outcome. + * + * Used as the entry point for the attachment-loading phase: on + * `StartLoadingAttachments` it resolves the attachment transformer files to + * load, then creates each attachment via `loadAttachment`, tracking progress + * per line for resumption. Returns `delay` on a rate limit, `progress` on + * timeout, `error` on failure, and `success` once all files are processed. + * + * @param create - The ExternalSystemLoadingFunction that creates an attachment in the external system. + * @returns Promise with the TaskResult describing the phase outcome. + */ async loadAttachments({ create, }: { @@ -344,6 +412,19 @@ export class LoadingAdapter< }); } + /** + * Loads a single item into the external system, creating or updating as needed. + * + * Used per item by `loadItemTypes`: it looks up the sync mapper record by + * DevRev target ID and calls the item type's `update` handler; if no mapper + * record exists (404) it falls back to the `create` handler. 
On success it + * creates or updates the sync mapper record and reports the action; a rate + * limit surfaces a delay and other failures are reported as failed. + * + * @param item - The ExternalSystemItem to load. + * @param itemTypeToLoad - The ItemTypeToLoad providing the create/update handlers. + * @returns Promise with a LoadItemResponse carrying a report, a rateLimit delay, or an error. + */ async loadItem({ item, itemTypeToLoad, @@ -550,6 +631,18 @@ export class LoadingAdapter< }); } + /** + * Creates a single attachment in the external system. + * + * Used per attachment by `loadAttachments`: it calls the `create` handler and, + * on success, creates a sync mapper record linking the new external ID to the + * attachment's reference ID and reports it as created. A rate limit surfaces a + * delay; a missing ID is reported as failed (attachments are create-only). + * + * @param item - The ExternalSystemAttachment to create. + * @param create - The ExternalSystemLoadingFunction that creates the attachment. + * @returns Promise with a LoadItemResponse carrying a report or a rateLimit delay. + */ async loadAttachment({ item, create, diff --git a/src/multithreading/create-worker.ts b/src/multithreading/create-worker.ts index abd3503..72b9655 100644 --- a/src/multithreading/create-worker.ts +++ b/src/multithreading/create-worker.ts @@ -2,6 +2,16 @@ import { isMainThread, Worker } from 'node:worker_threads'; import { WorkerData, WorkerEvent } from '../types/workers'; +/** + * Creates a Node worker thread that runs the snap-in's task worker script. + * + * Used by `spawn` to launch the off-main-thread worker that processes an + * extraction/loading event; the promise settles once the worker comes online + * so the caller can wire up timeouts and lifecycle handling. + * + * @param workerData - The data of type WorkerData passed to the worker thread (event, initial state, options, etc.). 
+ * @returns A Promise that resolves with the online Worker instance, or rejects with an Error if the worker fails to start or if this function is called from a worker thread rather than the main thread. + */ async function createWorker( workerData: WorkerData ): Promise { diff --git a/src/multithreading/process-task.ts b/src/multithreading/process-task.ts index a43debb..6db0d7c 100644 --- a/src/multithreading/process-task.ts +++ b/src/multithreading/process-task.ts @@ -30,6 +30,12 @@ import { LoadingAdapter } from './adapters/loading-adapter'; * If `onTimeout` is omitted, the SDK emits a phase-appropriate default on * timeout: `progress` (resumable phases) or `error` (non-resumable phases) is * handled by the status->event mapping when we emit a `progress` result. + * + * @param buildAdapter - Factory that constructs the typed adapter for this worker. + * @param params - The task hooks of type ProcessTaskInterface. + * @param params.task - The worker's main task; receives the adapter and resolves to a TaskResult. + * @param params.onTimeout - Optional callback run on soft timeout when nothing has emitted yet; resolves to a TaskResult. + * @returns A Promise that resolves once the result has been emitted and the worker process exits. */ async function runWorkerTask>( buildAdapter: () => Promise, @@ -83,6 +89,15 @@ async function runWorkerTask>( * Entry point for an extraction worker. Builds an {@link ExtractionAdapter} and * runs the provided task against it. * + * Used as the worker-script entry the snap-in calls inside an extraction worker + * thread; returns immediately on the main thread so the same module can be + * imported there safely. + * + * @param params - The task hooks of type ProcessTaskInterface for an ExtractionAdapter. + * @param params.task - The extraction task; receives the adapter and resolves to a TaskResult. + * @param params.onTimeout - Optional callback run on soft timeout; resolves to a TaskResult. + * @returns Nothing; emission and process exit are handled by the shared driver.
+ * * @public */ export function processExtractionTask({ @@ -123,6 +138,15 @@ export function processExtractionTask({ * Entry point for a loading worker. Builds a {@link LoadingAdapter} and runs the * provided task against it. * + * Used as the worker-script entry the snap-in calls inside a loading worker + * thread; returns immediately on the main thread so the same module can be + * imported there safely. + * + * @param params - The task hooks of type ProcessTaskInterface for a LoadingAdapter. + * @param params.task - The loading task; receives the adapter and resolves to a TaskResult. + * @param params.onTimeout - Optional callback run on soft timeout; resolves to a TaskResult. + * @returns Nothing; emission and process exit are handled by the shared driver. + * * @public */ export function processLoadingTask({ diff --git a/src/multithreading/spawn/spawn.helpers.ts b/src/multithreading/spawn/spawn.helpers.ts index 1f52764..30a00e1 100644 --- a/src/multithreading/spawn/spawn.helpers.ts +++ b/src/multithreading/spawn/spawn.helpers.ts @@ -59,7 +59,15 @@ export function getEventTypeForResult( /** * Per-phase outgoing event types, keyed by the incoming {@link EventType}. - * `resumable` phases define progress/delayed events; non-resumable ones do not. + * + * Each entry maps a phase to its outgoing events: every phase has a `done` + * (success) and `error` event. `resumable` phases (data/attachment extraction, + * data/attachment loading) additionally define `progress` and `delayed` events + * and accept all four statuses. Non-resumable phases (external sync units, + * metadata, state deletions) omit `progress`/`delayed`; a `progress`/`delay` + * status there is illegal and {@link getEventTypeForResult} collapses it to the + * `error` event while flagging it. Event types absent from this partial map are + * treated as unrecognized. 
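The status-to-event mapping described above can be sketched as follows. This includes collapsing an illegal `progress`/`delay` on a non-resumable phase to that phase's `error` event. The phase keys and event names are hypothetical placeholders. Returning `undefined` for a phase missing from the map is an assumption about how a caller would detect an unrecognized event type.

```typescript
// Hypothetical sketch of a per-phase event map and the status -> event lookup.
// Phase keys and event names are placeholders, not the SDK's enum values.
type TaskStatus = "success" | "progress" | "delay" | "error";

interface PhaseEvents {
  done: string;
  error: string;
  progress?: string; // only resumable phases define these two
  delayed?: string;
}

const PHASE_MAP: Partial<Record<string, PhaseEvents>> = {
  DataPhase: { done: "DataDone", error: "DataError", progress: "DataProgress", delayed: "DataDelayed" },
  MetadataPhase: { done: "MetadataDone", error: "MetadataError" },
};

// Returns undefined for phases absent from the map (assumed "unrecognized" signal).
function eventTypeForResult(
  incoming: string,
  status: TaskStatus
): { eventType: string; illegal: boolean } | undefined {
  const phase = PHASE_MAP[incoming];
  if (!phase) return undefined;
  if (status === "success") return { eventType: phase.done, illegal: false };
  if (status === "error") return { eventType: phase.error, illegal: false };
  const target = status === "progress" ? phase.progress : phase.delayed;
  // progress/delay on a non-resumable phase is illegal: collapse to the error event and flag it.
  return target
    ? { eventType: target, illegal: false }
    : { eventType: phase.error, illegal: true };
}
```

In this sketch, a `progress` result from the metadata phase becomes `MetadataError` with `illegal: true`. It is not dropped silently.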
*/ const EVENT_PHASE_MAP: Partial< Record< diff --git a/src/multithreading/spawn/spawn.ts b/src/multithreading/spawn/spawn.ts index 02176ff..f31ecba 100644 --- a/src/multithreading/spawn/spawn.ts +++ b/src/multithreading/spawn/spawn.ts @@ -25,6 +25,17 @@ import { getNoScriptEventType, } from './spawn.helpers'; +/** + * Resolves the default worker script path for an incoming event type. + * + * Used by `spawn` to pick which built-in worker (external sync units, metadata, + * data/attachment extraction, data/attachment loading) to run when the caller + * has not supplied an explicit `workerPath` or override. + * + * @param event - The AirSync event whose `payload.event_type` selects the worker. + * @param workerBasePath - The base directory string the resolved relative worker path is appended to. + * @returns The full worker script path string, or null if the event type has no matching built-in worker. + */ function getWorkerPath({ event, workerBasePath, @@ -63,12 +74,14 @@ function getWorkerPath({ * Spawn class is responsible for spawning a new worker thread and managing the lifecycle of the worker. * The class provides utilities to emit control events to the platform and exit the worker gracefully. * In case of lambda timeout, the class emits a lambda timeout event to the platform. - * @param {SpawnFactoryInterface} options - The options to create a new instance of Spawn class - * @param {AirSyncEvent} options.event - The event object received from the platform - * @param {object} options.initialState - The initial state of the adapter - * @param {string} [options.workerPath] Remove getWorkerPath function and use baseWorkerPath: __dirname instead of workerPath - * @param {string} [options.baseWorkerPath] - The base path for the worker files, usually `__dirname` - * @returns {Promise} - A new instance of Spawn class + * @param options - The options of type SpawnFactoryInterface used to launch the worker. 
+ * @param options.event - The AirSync event object received from the platform. + * @param options.initialState - The initial connector state handed to the worker. + * @param options.initialDomainMapping - The initial domain mapping handed to the worker. + * @param options.options - Optional SDK behavior overrides (timeout, local development, worker path overrides, etc.). + * @param options.workerPath - Optional explicit path to the worker script; takes precedence over overrides and the default resolver. + * @param options.baseWorkerPath - The base path for the worker files, usually `__dirname`. + * @returns A Promise that resolves once the worker finishes (or a no-script default event is emitted), or rejects if the worker fails to start. */ export async function spawn({ event, @@ -154,6 +167,16 @@ export async function spawn({ } } +/** + * Manages the lifecycle of a spawned worker thread for a single event. + * + * Used by `spawn` to supervise the worker: it arms a soft timeout (asks the + * worker to exit gracefully) and a hard timeout (terminates a stuck worker), + * relays the worker's log messages to the main thread, tracks whether the + * worker has already emitted an event, periodically logs memory usage, and on + * worker exit clears the timers and resolves the spawn promise -- emitting a + * timeout error event if the worker exited without emitting one itself. + */ export class Spawn { private event: AirSyncEvent; private alreadyEmitted: boolean; diff --git a/src/repo/repo.interfaces.ts b/src/repo/repo.interfaces.ts index f790be2..583f71e 100644 --- a/src/repo/repo.interfaces.ts +++ b/src/repo/repo.interfaces.ts @@ -4,54 +4,84 @@ import { AirSyncEvent } from '../types/extraction'; import { WorkerAdapterOptions } from '../types/workers'; /** - * RepoInterface is an interface that defines the structure of a repo which is used to store and upload extracted data. + * Describes a repo configuration that stores and uploads extracted data of one item type. 
+ * + * Used to declare which item type a repo holds and how its raw records should be normalized. */ export interface RepoInterface { + /** The item type the repo buffers and uploads. */ itemType: string; + /** Optional normalizer turning a raw record into a NormalizedItem or NormalizedAttachment. */ normalize?: (record: object) => NormalizedItem | NormalizedAttachment; + /** Optional worker adapter options that override defaults (e.g. batch size). */ overridenOptions?: WorkerAdapterOptions; } /** - * RepoFactoryInterface is an interface that defines the structure of a repo factory which is used to create a repo. + * Construction parameters used to create a Repo instance. + * + * Used to wire a repo to its triggering event, item type, normalizer, upload callback, and options. */ export interface RepoFactoryInterface { + /** The AirSync event that drives the extraction and supplies platform credentials. */ event: AirSyncEvent; + /** The item type the repo buffers and uploads. */ itemType: string; + /** Optional normalizer turning a raw record into a NormalizedItem or NormalizedAttachment. */ normalize?: (record: object) => NormalizedItem | NormalizedAttachment; + /** Callback invoked with each Artifact once it has been uploaded. */ onUpload: (artifact: Artifact) => void; + /** Optional worker adapter options that override defaults (e.g. batch size). */ options?: WorkerAdapterOptions; } /** - * NormalizedItem is an interface of item after normalization. + * An external system item after normalization into the shape AirSync expects. + * + * Used as the uploaded representation of a non-attachment record. */ export interface NormalizedItem { + /** External system identifier of the item. */ id: string; + /** ISO timestamp of when the item was created in the external system. */ created_date: string; + /** ISO timestamp of when the item was last modified in the external system. */ modified_date: string; + /** Normalized field values of the item. 
+ */ data: object; } /** - * NormalizedAttachment is an interface of attachment after normalization. + * An external system attachment after normalization into the shape AirSync expects. + * + * Used as the uploaded metadata for an attachment whose binary is streamed separately. */ export interface NormalizedAttachment { + /** Source URL the attachment binary can be downloaded from. */ url: string; + /** External system identifier of the attachment. */ id: string; + /** Name of the attached file. */ file_name: string; + /** External system identifier of the item the attachment belongs to. */ parent_id: string; + /** Optional external system identifier of the attachment's author. */ author_id?: string; + /** Whether the attachment is embedded inline (e.g. in rich text) rather than a standalone file. */ inline?: boolean; + /** Optional MIME type of the attachment. */ content_type?: string; // This should be a string, but it was a number in the past. Due to backwards // compatibility we are keeping it also as a number. + /** Optional external system identifier of the parent's parent; a string is preferred, but a number is still accepted for backwards compatibility. */ grand_parent_id?: number | string; } /** - * Item is an interface that defines the structure of an item. + * A raw, un-normalized record extracted from the external system. + * + * Used as the input to a repo's normalize function before items are uploaded. */ // eslint-disable-next-line @typescript-eslint/no-explicit-any export type Item = Record; diff --git a/src/repo/repo.ts b/src/repo/repo.ts index d119eea..e694b58 100644 --- a/src/repo/repo.ts +++ b/src/repo/repo.ts @@ -16,6 +16,12 @@ import { RepoFactoryInterface, } from './repo.interfaces'; +/** + * In-memory buffer that accumulates normalized items of a single item type during extraction. + * + * Used to batch pushed items (ARTIFACT_BATCH_SIZE per batch), normalize them, and upload them as + * artifacts to the DevRev platform, firing the onUpload callback for each uploaded artifact.
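The buffering behavior of `push` and `upload` can be sketched as below. The sketch uses an in-memory stand-in for the artifact upload: `uploaded` records each batch instead of calling the platform. A caller-supplied batch size replaces `ARTIFACT_BATCH_SIZE`. `SketchRepo` and its members are hypothetical.

```typescript
// Hypothetical sketch of the Repo buffering; not the SDK's Repo.
// `uploaded` stands in for uploading each batch as one artifact to the platform.
type RawRecord = Record<string, unknown>;

class SketchRepo {
  private items: RawRecord[] = [];
  readonly uploaded: RawRecord[][] = [];

  // Assumes batchSize > 0.
  constructor(
    private readonly batchSize: number,
    private readonly normalize?: (record: RawRecord) => RawRecord
  ) {}

  get bufferedCount(): number {
    return this.items.length;
  }

  // Uploads an explicit batch, or flushes (and clears) the whole buffer when none is given.
  async upload(batch?: RawRecord[]): Promise<void> {
    const toUpload = batch ?? this.items;
    if (toUpload.length === 0) return;
    this.uploaded.push([...toUpload]);
    if (!batch) this.items = [];
  }

  // Normalizes and buffers records, uploading every complete batch immediately
  // and leaving the remainder buffered for a later flush.
  async push(records: RawRecord[]): Promise<boolean> {
    const normalize = this.normalize;
    const prepared = normalize ? records.map((r) => normalize(r)) : records;
    this.items.push(...prepared);
    try {
      while (this.items.length >= this.batchSize) {
        await this.upload(this.items.splice(0, this.batchSize));
      }
      return true;
    } catch {
      return false; // mirrors "false if a batch upload threw"
    }
  }
}
```

With a batch size of 2, pushing five records uploads two full batches right away and leaves one record buffered. A later `upload()` with no argument flushes that record.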
+ */ export class Repo { readonly itemType: string; private items: (NormalizedItem | NormalizedAttachment | Item)[]; @@ -41,10 +47,20 @@ export class Repo { this.uploadedArtifacts = []; } + /** Returns the items currently buffered in the repo (not yet uploaded). */ getItems(): (NormalizedItem | NormalizedAttachment | Item)[] { return this.items; } + /** + * Uploads a batch of items (or all buffered items) as a single artifact. + * + * Used to flush buffered items to the DevRev platform; on success the artifact is passed to + * onUpload and recorded in uploadedArtifacts. When no explicit batch is given the buffer is cleared. + * + * @param batch - Optional explicit array of NormalizedItem, NormalizedAttachment, or Item to upload; defaults to all buffered items. + * @returns Promise that resolves to void on success, or an ErrorRecord describing the upload failure. + */ async upload( batch?: (NormalizedItem | NormalizedAttachment | Item)[] ): Promise { @@ -84,6 +100,16 @@ export class Repo { } } + /** + * Normalizes and buffers items, uploading full batches as they accumulate. + * + * Used by connectors to feed extracted items into the repo; items are normalized (unless the item + * type is external domain metadata or SSOR attachments) and any complete batches of batchSize are + * uploaded immediately, leaving the remainder buffered for a later flush. + * + * @param items - Array of raw Item records to normalize and buffer. + * @returns Promise that resolves to true when items were buffered/uploaded successfully, or false if a batch upload threw. + */ async push(items: Item[]): Promise { let recordsToPush: (NormalizedItem | NormalizedAttachment | Item)[]; diff --git a/src/state/base-state.ts b/src/state/base-state.ts index 06db895..1b860e8 100644 --- a/src/state/base-state.ts +++ b/src/state/base-state.ts @@ -17,10 +17,11 @@ import { } from './state.interfaces'; /** - * BaseState owns the state lifecycle shared by every sync mode: connector vs. 
- * SDK state separation, fetch/init/post against the platform, the v1->v2 - * migration shim, and the snap-in-version-gated initial domain mapping install. + * Abstract base owning the adapter state lifecycle shared by every sync mode. * + * Used to keep connector-owned state separate from SDK bookkeeping, fetch/init/ + * post the persisted state against the platform, run the v1->v2 migration shim, + * and install the initial domain mapping gated on the snap-in version. * Mode-specific subclasses (`ExtractionState`, `LoadingState`) seed the * SDK-owned portion of the state and add mode-specific setup in their factories. * @@ -69,15 +70,21 @@ export abstract class BaseState { this._sdkState = value; } + /** The per-sync-unit extraction scope (object types to extract), loaded alongside state. */ get extractionScope(): ExtractionScope { return this._extractionScope; } /** - * Installs the initial domain mapping when the snap-in version in state does - * not match the version in the event context. Shared by all modes so that a - * loading run still installs the mapping if extraction has not done so. - * @param initialDomainMapping The initial domain mapping passed to spawn + * Installs the initial domain mapping when the version in state is stale. + * + * Used by all modes (so a loading run still installs the mapping if extraction + * has not) to (re)install whenever `sdkState.snapInVersionId` is absent or + * differs from the event context's snap-in version; on success the new version + * is recorded in state. A missing mapping or install error fails the worker. 
+ * + * @param initialDomainMapping - The initial domain mapping of type InitialDomainMapping passed to the spawn function; required when an install is needed + * @returns Promise that resolves once the mapping is installed or the install is skipped */ async installInitialDomainMappingIfNeeded( initialDomainMapping?: InitialDomainMapping @@ -122,13 +129,17 @@ export abstract class BaseState { } /** - * Initializes the state for this adapter instance by fetching from API - * or creating an initial state if none exists (404). + * Initializes this adapter's state from persisted state, or seeds it on first run. * - * Reads both the v2 `{ connectorState, sdkState }` envelope and a legacy flat - * v1 blob (connector keys merged with SDK keys), migrating the latter on read. - * Always persists the v2 envelope going forward. - * @param initialState The initial connector state provided by the spawn function + * Used at worker start to load and normalize state: it fetches the persisted + * blob, parses it, and runs `normalizeFetchedState` so both the v2 + * `{ connectorState, sdkState }` envelope and a legacy flat v1 blob are + * accepted (the latter migrated on read). It also restores the extraction + * scope. On a 404 it seeds the initial state and persists the v2 envelope; + * any other failure fails the worker. + * + * @param initialState - The initial connector state of type ConnectorState provided by the spawn function, used when no state exists yet + * @returns Promise that resolves once state has been loaded or seeded */ async init(initialState: ConnectorState): Promise { try { @@ -181,13 +192,20 @@ export abstract class BaseState { } /** - * Normalizes a parsed on-disk state into the `{ connectorState, sdkState }` - * envelope, migrating a legacy flat v1 blob if needed. + * Normalizes parsed on-disk state into the `{ connectorState, sdkState }` envelope, migrating legacy v1 state. 
+ * + * Used as the v1->v2 migration shim so older snap-ins keep working after the + * state split. Behavior by shape of the parsed input: + * - v2 envelope (`{ connectorState, sdkState }`): used as-is, with `sdkState` + * merged over the mode's initial SDK state to backfill newly added fields. + * - Legacy v1 flat blob: top-level keys present in `V1_SDK_STATE_KEYS` are + * split into `sdkState`, everything else becomes connector state. + * - Malformed envelope (one of `connectorState`/`sdkState` present, the other + * missing) or non-object input: throws. * - * - v2 envelope (`{ connectorState, sdkState }`): used as-is. - * - v1 flat blob: SDK-owned keys (`V1_SDK_STATE_KEYS`) split into `sdkState`, - * everything else becomes connector state. - * - Malformed envelope (one side present, the other missing) fails loud. + * @param parsed - The JSON-parsed persisted state of unknown shape (v2 envelope or legacy v1 flat blob) + * @returns The split state as `{ connectorState, sdkState }`, with `sdkState` merged over the initial SDK state + * @throws Error when the input is not an object or is a malformed envelope */ private normalizeFetchedState(parsed: unknown): { connectorState: ConnectorState; @@ -231,9 +249,14 @@ export abstract class BaseState { } /** - * Updates the state of the adapter by posting to API. - * Persists the v2 `{ connectorState, sdkState }` envelope. - * @param {object} state - The connector state to be updated + * Persists the adapter state to the platform. + * + * Used to checkpoint progress: wraps the current connector and SDK state into + * the v2 `{ connectorState, sdkState }` envelope, serializes it, and posts it. + * A serialization or request failure fails the worker. 
+ * + * @param state - Optional connector state of type ConnectorState to set and persist; when omitted the current `this.state` is used + * @returns Promise that resolves once the state has been persisted */ async postState(state?: ConnectorState) { const url = this.workerUrl + '.update'; @@ -296,8 +319,12 @@ export abstract class BaseState { } /** - * Fetches the state of the adapter from API. - * @return The raw state data from API + * Fetches the raw persisted adapter state from the platform. + * + * Used by `init` to read the stored state before normalization; returns the + * raw, still-stringified payload without parsing or migrating it. + * + * @returns Promise resolving to `{ state, objects }`, where `state` is the stringified state blob and `objects` is the optional stringified extraction scope */ async fetchState(): Promise<{ state: string; objects?: string }> { console.log( diff --git a/src/state/extraction-state.ts b/src/state/extraction-state.ts index b78dbbc..49bd185 100644 --- a/src/state/extraction-state.ts +++ b/src/state/extraction-state.ts @@ -10,10 +10,13 @@ import { BaseState } from './base-state'; import { extractionSdkState, StateInterface } from './state.interfaces'; /** - * ExtractionState is the per-mode state for extraction workers. It seeds the - * extraction SDK state (extraction boundaries + attachments bookkeeping) on top - * of the shared lifecycle provided by `BaseState` and adds extraction-window - * resolution. + * Per-mode adapter state for extraction workers. + * + * Used to seed the extraction SDK state (extraction-window boundaries + + * attachments bookkeeping) on top of the shared lifecycle provided by + * `BaseState`, and to resolve the incremental extraction window for each event. 
+ * + * @typeParam ConnectorState - the connector-owned state shape */ export class ExtractionState extends BaseState { constructor(params: StateInterface) { @@ -21,13 +24,19 @@ export class ExtractionState extends BaseState { } /** - * Resolves the extraction window onto the event context. + * Computes the incremental extraction window and writes `extract_from`/`extract_to` onto the event context. * - * On StartExtractingData: stamp `lastSyncStarted` if not already set. - * On StartExtractingMetadata: resolve fresh from the TimeValue objects in the - * event context and cache them as pending boundaries (always overwrite). - * On all other events: reuse the pending boundaries cached during - * StartExtractingMetadata. Finally, validate that extract_from < extract_to. + * Used so every extraction phase shares one consistent time window, read from + * the SDK-owned boundary fields in `this.sdkState`. Behavior by event type: + * - StartExtractingData: stamp `lastSyncStarted` if not already set. + * - StartExtractingMetadata: resolve fresh from the TimeValue objects in the + * event context and cache them as pending boundaries (always overwrite). + * - All other events: reuse the pending boundaries cached during + * StartExtractingMetadata. + * Finally validates that `extract_from` is older than `extract_to`, failing + * the worker if the platform supplied an inverted window. + * + * @returns void; mutates the event context and `this.sdkState` in place */ resolveExtractionWindow(): void { const sdkState = this.sdkState; @@ -122,9 +131,14 @@ export class ExtractionState extends BaseState { /** * Creates and initializes an `ExtractionState` for an extraction worker. * - * For non-stateless events this fetches persisted state, installs the initial - * domain mapping if the snap-in version changed, then resolves the extraction - * window (time-value resolution + pending boundary reuse) and validates it. + * Used by the state dispatcher to build extraction-mode state. 
The initial state + * is deep-cloned to avoid mutating the caller's object; for non-stateless events + * this fetches persisted state, installs the initial domain mapping if the + * snap-in version changed, then resolves the extraction window (time-value + * resolution + pending boundary reuse) and validates it. + * + * @param params - The state factory parameters of type StateInterface (event, initial connector state, optional domain mapping and worker options) + * @returns Promise resolving to the initialized ExtractionState */ export async function createExtractionState({ event, diff --git a/src/state/loading-state.ts b/src/state/loading-state.ts index 151f5bd..5b03ecb 100644 --- a/src/state/loading-state.ts +++ b/src/state/loading-state.ts @@ -4,9 +4,13 @@ import { BaseState } from './base-state'; import { loadingSdkState, StateInterface } from './state.interfaces'; /** - * LoadingState is the per-mode state for loading workers. It seeds the loading - * SDK state (files-to-load bookkeeping) on top of the shared lifecycle provided - * by `BaseState`. Loading has no extraction-window resolution. + * Per-mode adapter state for loading workers. + * + * Used to seed the loading SDK state (files-to-load bookkeeping) on top of the + * shared lifecycle provided by `BaseState`. Loading has no extraction-window + * resolution. + * + * @typeParam ConnectorState - the connector-owned state shape */ export class LoadingState extends BaseState { constructor(params: StateInterface) { @@ -17,8 +21,13 @@ export class LoadingState extends BaseState { /** * Creates and initializes a `LoadingState` for a loading worker. * - * For non-stateless events this fetches persisted state and installs the - * initial domain mapping if the snap-in version changed. + * Used by the state dispatcher to build loading-mode state. 
The initial state is + * deep-cloned to avoid mutating the caller's object; for non-stateless events + * this fetches persisted state and installs the initial domain mapping if the + * snap-in version changed. + * + * @param params - The state factory parameters of type StateInterface (event, initial connector state, optional domain mapping and worker options) + * @returns Promise resolving to the initialized LoadingState */ export async function createLoadingState({ event, diff --git a/src/state/state.interfaces.ts b/src/state/state.interfaces.ts index 9e10577..e3c45ca 100644 --- a/src/state/state.interfaces.ts +++ b/src/state/state.interfaces.ts @@ -3,6 +3,13 @@ import { AirSyncEvent } from '../types/extraction'; import { FileToLoad } from '../types/loading'; import { WorkerAdapterOptions } from '../types/workers'; +/** + * The SDK-owned portion of the persisted adapter state. + * + * Used to hold bookkeeping the SDK manages itself (extraction-window boundaries, + * attachments/files progress, installed snap-in version) separately from + * connector-owned state, so SDK internals never collide with connector keys. + */ export interface SdkState { /** * @deprecated Use extract_from and extract_to from the event context instead, @@ -20,12 +27,15 @@ export interface SdkState { /** The pending (not yet committed) newest extraction boundary (ISO 8601 timestamp). * Set on StartExtractingMetadata, reused across subsequent phases, cleared on AttachmentExtractionDone. */ pendingWorkersNewest?: string; - /** The oldest point of extraction (ISO 8601 timestamp). */ + /** The committed oldest point of extraction (ISO 8601 timestamp). */ workersOldest?: string; - /** The newest point of extraction (ISO 8601 timestamp). */ + /** The committed newest point of extraction (ISO 8601 timestamp). */ workersNewest?: string; + /** Attachments-extraction bookkeeping (artifact ids, progress cursor). Extraction mode only. 
*/ toDevRev?: ToDevRev; + /** Loading bookkeeping (files still to load into DevRev). Loading mode only. */ fromDevRev?: FromDevRev; + /** The snap-in version id whose initial domain mapping is installed; drives reinstall on change. */ snapInVersionId?: string; } @@ -49,6 +59,12 @@ export interface AdapterStateEnvelope { sdkState: SdkState; } +/** + * SDK-owned attachments-extraction state (external system -> DevRev direction). + * + * Used to track which attachment artifacts have been streamed and how far the + * attachments phase has progressed so it can resume after a timeout. + */ export interface ToDevRev { attachmentsMetadata: { artifactIds: string[]; @@ -65,10 +81,23 @@ export interface ProcessedAttachment { parent_id: string; } +/** + * SDK-owned loading state (DevRev -> external system direction). + * + * Used to track which files still need to be loaded into the external system so + * the loading phase can resume after a timeout. + */ export interface FromDevRev { filesToLoad: FileToLoad[]; } +/** + * Constructor/factory parameters for building an adapter state instance. + * + * Used by `createAdapterState` and the per-mode factories to carry the AirSync + * event, the connector's seed state, and the optional initial domain mapping and + * worker options. + */ export interface StateInterface { event: AirSyncEvent; initialState: ConnectorState; @@ -76,6 +105,12 @@ export interface StateInterface { options?: WorkerAdapterOptions; } +/** + * The initial SDK state seeded for extraction-mode workers. + * + * Used by `ExtractionState` as the baseline `sdkState` (extraction-window + * boundaries plus attachments bookkeeping) before any persisted state is merged in. + */ export const extractionSdkState = { lastSyncStarted: '', lastSuccessfulSyncStarted: '', @@ -93,6 +128,12 @@ export const extractionSdkState = { }, }; +/** + * The initial SDK state seeded for loading-mode workers. 
+ * + * Used by `LoadingState` as the baseline `sdkState` (files-to-load bookkeeping) + * before any persisted state is merged in. + */ export const loadingSdkState = { snapInVersionId: '', fromDevRev: { diff --git a/src/state/state.ts b/src/state/state.ts index ab7dbb5..bcb48dc 100644 --- a/src/state/state.ts +++ b/src/state/state.ts @@ -10,11 +10,13 @@ export { ExtractionState, createExtractionState } from './extraction-state'; export { LoadingState, createLoadingState } from './loading-state'; /** - * Creates and initializes the adapter state for the current worker, dispatching - * to the extraction or loading state based on the event's sync mode. + * Creates and initializes the adapter state for the current worker. * - * @param params The state factory parameters (event, initial state, options) - * @returns The initialized mode-specific state + * Used as the single entry point that dispatches to either `createLoadingState` + * or `createExtractionState` based on `event.payload.event_context.mode`. + * + * @param params - The state factory parameters of type StateInterface (event, initial state, optional domain mapping and worker options) + * @returns Promise resolving to the initialized mode-specific state (LoadingState when mode is LOADING, otherwise ExtractionState) */ export async function createAdapterState( params: StateInterface diff --git a/src/types/index.ts b/src/types/index.ts index f98c725..9be77e3 100644 --- a/src/types/index.ts +++ b/src/types/index.ts @@ -1,3 +1,11 @@ +/** + * Public types barrel for the SDK. + * + * Aggregates and re-exports the commonly used types across the SDK domains — common, extraction, + * loading, repo, state, uploader, mappers, and external domain metadata — so consumers can import + * them from a single entry point. 
+ */ + // Common export { AdapterUpdateParams, diff --git a/src/types/loading.ts b/src/types/loading.ts index 37d2d30..7c58c46 100644 --- a/src/types/loading.ts +++ b/src/types/loading.ts @@ -2,22 +2,50 @@ import { Mappers } from '../mappers/mappers'; import { ErrorRecord } from './common'; import { AirSyncEvent } from './extraction'; +/** + * Describes a single prepared data file as listed in the loading stats manifest. + * + * Used during loading to enumerate the artifact files produced by extraction, along with their + * item type and record count, so the loader knows what is available to process. + */ export interface StatsFileObject { + /** Identifier of the artifact/file. */ id: string; + /** External item type contained in the file (e.g. the record type being loaded). */ item_type: string; + /** Name of the file. */ file_name: string; + /** Number of records in the file, as a string. */ count: string; } +/** + * Loader-side view of a file to be loaded, tracking its processing progress. + * + * Used to drive and resume loading of a single data file: it records how many lines exist, the next + * line to process, and whether the file has been fully consumed. + */ export interface FileToLoad { + /** Identifier of the artifact/file. */ id: string; + /** Name of the file. */ file_name: string; + /** External item type contained in the file. */ itemType: string; + /** Total number of records in the file. */ count: number; + /** Index of the next line/record to process; used to resume loading across batches. */ lineToProcess: number; + /** Whether all records in the file have been loaded. */ completed: boolean; } +/** + * An attachment to be loaded into the external system, with its source metadata and parent links. + * + * Used by attachment loading to describe a single file (location, type, size, validity window, + * audit fields) and the DevRev/external parent it belongs to. 
+ */ export interface ExternalSystemAttachment { reference_id: DonV2; parent_type: string; @@ -35,13 +63,28 @@ export interface ExternalSystemAttachment { grand_parent_id?: string; } +/** + * A single item to be loaded into the external system. + * + * Used during loading to carry the DevRev (and optional external) identifiers, audit timestamps, + * and the system-specific payload for one record. + * + * Note: this interface is declared twice in this file with identical members (TypeScript merges + * the declarations); the duplicate declaration is redundant and can be removed. + */ export interface ExternalSystemItem { + /** Identifiers linking this item to DevRev and, when known, the external system. */ id: { + /** DevRev object identifier (DON). */ devrev: DonV2; + /** External system identifier, present once the item exists in the external system. */ external?: string; }; + /** Creation timestamp of the item. */ created_date: string; + /** Last-modified timestamp of the item. */ modified_date: string; + /** System-specific record payload. */ // eslint-disable-next-line @typescript-eslint/no-explicit-any data: any; } @@ -57,84 +100,195 @@ export interface ExternalSystemItem { data: any; } +/** + * Arguments passed to an external-system loading function for a single item. + * + * Used to give create/update handlers the item to load, the ID mappers for resolving DevRev <-> external + * references, and the current AirSync event for auth/context. + * + * @typeParam Type - The shape of the item being loaded. + */ export interface ExternalSystemItemLoadingParams { item: Type; mappers: Mappers; event: AirSyncEvent; } +/** + * Result returned by an external-system loading function for a single item. + * + * Used to report the outcome of a create/update: the resulting external id, an error message, + * the item's modified date, or a delay (in seconds) when the external system is rate limiting.
+ */ export interface ExternalSystemItemLoadingResponse { + /** External system id of the loaded item, when the operation succeeded. */ id?: string; + /** Error message when the operation failed. */ error?: string; + /** Modified timestamp reported by the external system after the operation. */ modifiedDate?: string; + /** Suggested delay in seconds before retrying, set when rate limited. */ delay?: number; } +/** + * Record of an item that was loaded into the external system. + * + * Used to persist the outcome of a load (external id, error, modified date) for reporting and + * subsequent runs. + */ export interface ExternalSystemItemLoadedItem { + /** External system id of the loaded item. */ id?: string; + /** Error message if loading the item failed. */ error?: string; + /** Modified timestamp reported by the external system. */ modifiedDate?: string; } +/** + * A handler that loads a single item into the external system. + * + * Used to implement the create or update behavior for an item type; receives the item, ID mappers, + * and event, and resolves with the loading outcome. + * + * @typeParam Item - The shape of the item to load. + * @returns Promise resolving to the ExternalSystemItemLoadingResponse for the item. + */ export type ExternalSystemLoadingFunction = ({ item, mappers, event, }: ExternalSystemItemLoadingParams) => Promise; +/** + * Registration of an item type and the functions that load it. + * + * Used to tell the loader, for a given external item type, how to create and update records in the + * external system. + */ export interface ItemTypeToLoad { + /** External item type these handlers apply to. */ itemType: string; + /** Handler that creates a new record in the external system. */ create: ExternalSystemLoadingFunction; + /** Handler that updates an existing record in the external system. 
*/ update: ExternalSystemLoadingFunction; // requiresSecondPass: boolean; } +/** + * Parameters bundling the full set of item-type loaders for a loading run. + * + * Used to pass the configured list of loadable item types into the loading entry point. + */ export interface ItemTypesToLoadParams { + /** The item types to load, each with its create/update handlers. */ itemTypesToLoad: ItemTypeToLoad[]; } +/** + * Per-item-type counters summarizing the outcome of a loading run. + * + * Used to report, for one item type, how many records were created/updated/skipped/deleted/failed. + */ export interface LoaderReport { + /** External item type this report covers. */ item_type: string; + /** Number of records created. */ [ActionType.CREATED]?: number; + /** Number of records updated. */ [ActionType.UPDATED]?: number; + /** Number of records skipped (no-op). */ [ActionType.SKIPPED]?: number; + /** Number of records deleted. */ [ActionType.DELETED]?: number; + /** Number of records that failed to load. */ [ActionType.FAILED]?: number; } +/** + * Signals that the external system is rate limiting and loading should pause. + * + * Used to propagate a back-off duration from a loading function up to the loader. + */ export interface RateLimited { + /** Number of seconds to wait before resuming. */ delay: number; } +/** + * Result of loading a single item, capturing success report, error, or rate-limit signal. + * + * Used internally by the loader to aggregate per-item outcomes. + */ export interface LoadItemResponse { + /** Error record when the item could not be loaded. */ error?: ErrorRecord; + /** Per-type counters contributed by this item. */ report?: LoaderReport; + /** Rate-limit signal when the external system is throttling. */ rateLimit?: RateLimited; } +/** + * Aggregate result of loading one or more item types. + * + * Used to return the per-type reports and the list of processed files at the end of a loading phase. 
+ */ export interface LoadItemTypesResponse { + /** Per-item-type loading reports. */ reports: LoaderReport[]; + /** Names of the data files that were processed. */ processed_files: string[]; } +/** + * The kinds of actions a loader can perform on a record, used as report counter keys. + * + * Used to key {@link LoaderReport} counters and to classify the outcome of each loaded item. + */ export enum ActionType { + /** A new record was created in the external system. */ CREATED = 'created', + /** An existing record was updated. */ UPDATED = 'updated', + /** The record required no change. */ SKIPPED = 'skipped', + /** The record was deleted. */ DELETED = 'deleted', + /** Loading the record failed. */ FAILED = 'failed', } +/** A DevRev object identifier (DON), represented as a string. */ export type DonV2 = string; +/** + * A sync mapper record linking external and DevRev identifiers for one mapping. + * + * Used to track the correspondence between external ids, secondary ids, and DevRev ids, along with + * status, for sync operations. + */ export type SyncMapperRecord = { + /** External system identifiers for the mapped item. */ external_ids: string[]; + /** Secondary external identifiers (e.g. alternate keys). */ secondary_ids: string[]; + /** DevRev object identifiers for the mapped item. */ devrev_ids: string[]; + /** Status values associated with the mapping. */ status: string[]; + /** Input file the record was sourced from, when applicable. */ input_file?: string; }; +/** + * Outgoing event types reported by the loading phases. + * + * Used as the event_type when a loader emits control messages for data loading, attachment loading, + * and loader-state deletion (progress / delayed / done / error), plus a fallback for unrecognized events. 
+ */ export enum LoaderEventType { DataLoadingProgress = 'DATA_LOADING_PROGRESS', DataLoadingDelayed = 'DATA_LOADING_DELAYED', diff --git a/src/uploader/uploader.helpers.ts b/src/uploader/uploader.helpers.ts index ad389a5..85087db 100644 --- a/src/uploader/uploader.helpers.ts +++ b/src/uploader/uploader.helpers.ts @@ -9,9 +9,12 @@ import { import { UploaderResult } from './uploader.interfaces'; /** - * Compresses a JSONL string using gzip compression. - * @param {string} jsonlObject - The JSONL string to compress - * @returns {Buffer | void} The compressed buffer or undefined on error + * Compresses a JSONL string using gzip. + * + * Used to shrink a serialized JSONL batch before uploading it as an artifact. + * + * @param jsonlObject - The JSONL string to compress. + * @returns An UploaderResult wrapping the gzipped Buffer, or an error on failure. */ export function compressGzip(jsonlObject: string): UploaderResult { try { @@ -22,9 +25,12 @@ export function compressGzip(jsonlObject: string): UploaderResult { } /** - * Decompresses a gzipped buffer to a JSONL string. - * @param {Buffer} gzippedJsonlObject - The gzipped buffer to decompress - * @returns {string | void} The decompressed JSONL string or undefined on error + * Decompresses a gzipped buffer back into a JSONL string. + * + * Used to restore a downloaded gzipped artifact before parsing it. + * + * @param gzippedJsonlObject - The gzipped Buffer to decompress. + * @returns An UploaderResult wrapping the decompressed JSONL string, or an error on failure. */ export function decompressGzip( gzippedJsonlObject: Buffer @@ -39,8 +45,11 @@ export function decompressGzip( /** * Parses a JSONL string into an array of objects. - * @param {string} jsonlObject - The JSONL string to parse - * @returns {object[] | null} The parsed array of objects or null on error + * + * Used to turn a decompressed artifact into usable records. + * + * @param jsonlObject - The JSONL string to parse. 
+ * @returns An UploaderResult wrapping the parsed object array, or an error on failure. */ export function parseJsonl(jsonlObject: string): UploaderResult { try { @@ -51,10 +60,14 @@ export function parseJsonl(jsonlObject: string): UploaderResult { } /** - * Downloads fetched objects to the local file system (for local development). - * @param {string} itemType - The type of items being downloaded - * @param {object | object[]} fetchedObjects - The objects to write to file - * @returns {Promise} Resolves when the file is written or rejects on error + * Writes fetched objects to the local file system for local development. + * + * Used to inspect extracted data on disk instead of uploading it when running locally; writes a + * timestamped JSON/JSONL file under the `extracted_files` directory. + * + * @param itemType - The string item type, used to name the output file and pick its extension. + * @param fetchedObjects - The object or array of objects to write, one JSON record per line. + * @returns Promise that resolves once the file is written, or rejects on a write error. */ export async function downloadToLocal( itemType: string, @@ -90,9 +103,13 @@ export async function downloadToLocal( } /** - * Truncates a filename if it exceeds the maximum allowed length. - * @param {string} filename - The filename to truncate - * @returns {string} The truncated filename + * Truncates a filename that exceeds the platform's maximum length. + * + * Used before requesting an upload URL so the registered file name stays within DevRev limits, + * preserving the extension and inserting an ellipsis in the middle. + * + * @param filename - The string filename to truncate. + * @returns The original filename if within the limit, otherwise a truncated `name...ext` string. */ export function truncateFilename(filename: string): string { // If the filename is already within the limit, return it as is. 
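The uploader helpers documented above describe two behaviors:

- a JSONL + gzip round trip (`compressGzip`, `decompressGzip`, `parseJsonl`)
- a filename truncation that keeps the extension and inserts an ellipsis (`truncateFilename`)

The TypeScript sketch below illustrates both using Node's built-in `zlib` (`gzipSync`, `gunzipSync`). It is not the SDK's implementation. The names `toGzippedJsonl`, `fromGzippedJsonl` and `truncateFilenameSketch` are hypothetical, as are the `SketchResult` shape and the `MAX_FILENAME_LENGTH` value. The real `UploaderResult` definition and length constant are not shown in this patch.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical result shape, loosely modeled on the SDK's UploaderResult
// (its exact definition is not shown in this patch).
interface SketchResult<T> {
  response?: T;
  error?: unknown;
}

// Serialize records as JSONL (one JSON document per line) and gzip the text.
function toGzippedJsonl(records: object[]): SketchResult<Buffer> {
  try {
    const jsonl = records.map((r) => JSON.stringify(r)).join("\n");
    return { response: gzipSync(jsonl) };
  } catch (error) {
    return { error };
  }
}

// Reverse the process: gunzip the buffer, then parse each non-empty line.
function fromGzippedJsonl(buf: Buffer): SketchResult<object[]> {
  try {
    const jsonl = gunzipSync(buf).toString("utf8");
    const records = jsonl
      .split("\n")
      .filter((line) => line.trim().length > 0)
      .map((line) => JSON.parse(line) as object);
    return { response: records };
  } catch (error) {
    return { error };
  }
}

// Hypothetical limit chosen for illustration; the SDK's real constant is not shown here.
const MAX_FILENAME_LENGTH = 20;

// Keep the extension, cut the base name, and mark the cut with "..." (name...ext).
function truncateFilenameSketch(filename: string): string {
  if (filename.length <= MAX_FILENAME_LENGTH) return filename;
  const dot = filename.lastIndexOf(".");
  const ext = dot > 0 ? filename.slice(dot) : "";
  const keep = Math.max(0, MAX_FILENAME_LENGTH - ext.length - 3);
  return filename.slice(0, keep) + "..." + ext;
}
```

Failures are returned in an `{ response, error }` object rather than thrown, matching the helpers' documented "wrapping ... or an error on failure" contract. Callers therefore branch on `error` instead of wrapping each step in `try`/`catch`.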
diff --git a/src/uploader/uploader.interfaces.ts b/src/uploader/uploader.interfaces.ts index 2224bed..ee8027c 100644 --- a/src/uploader/uploader.interfaces.ts +++ b/src/uploader/uploader.interfaces.ts @@ -3,8 +3,15 @@ import { AirSyncEvent } from '../types/extraction'; import { ExternalSystemItem, StatsFileObject } from '../types/loading'; import { WorkerAdapterOptions } from '../types/workers'; +/** + * Construction parameters used to create an Uploader instance. + * + * Used to supply the triggering event (platform endpoint, token, request id) and optional adapter options. + */ export interface UploaderFactoryInterface { + /** The AirSync event supplying the DevRev endpoint, service account token, and request id. */ event: AirSyncEvent; + /** Optional worker adapter options (e.g. local development and skip-confirmation flags). */ options?: WorkerAdapterOptions; } @@ -90,12 +97,26 @@ export interface SsorAttachment { inline?: boolean; } +/** + * Result of fetching and parsing a loading stats file artifact. + * + * Used to return the per-item-type stats produced by the loading phase, or an error if it could not be read. + */ export interface StatsFileResponse { + /** Error describing why the stats file could not be retrieved or parsed. */ error?: ErrorRecord; + /** Parsed stats file entries, one per item type. */ statsFile?: StatsFileObject[]; } +/** + * Result of fetching and parsing a transformer file artifact. + * + * Used to return the transformed external system items to be loaded into DevRev, or an error if it could not be read. + */ export interface TransformerFileResponse { + /** Error describing why the transformer file could not be retrieved or parsed. */ error?: ErrorRecord; + /** Parsed external system items to load. 
*/ transformerFile?: ExternalSystemItem[]; } diff --git a/src/uploader/uploader.ts b/src/uploader/uploader.ts index fa9833a..29ae52c 100644 --- a/src/uploader/uploader.ts +++ b/src/uploader/uploader.ts @@ -22,6 +22,12 @@ import { UploaderResult, } from './uploader.interfaces'; +/** + * Uploads extraction artifacts to the DevRev platform and reads them back. + * + * Used to compress and upload JSON batches and streamed attachment binaries, obtain upload/download + * URLs, confirm uploads, and download and parse previously uploaded artifacts during sync. + */ export class Uploader { private isLocalDevelopment?: boolean; private devrevApiEndpoint: string; @@ -42,10 +48,14 @@ export class Uploader { } /** - * Uploads the fetched objects to the DevRev platform. Fetched objects are compressed to a gzipped jsonl object and uploaded to the platform. - * @param {string} itemType - The type of the item to be uploaded - * @param {object[] | object} fetchedObjects - The objects to be uploaded - * @returns {Promise} - The response object containing the artifact information or error information if there was an error + * Uploads fetched objects to the DevRev platform as a single artifact. + * + * Used to compress the objects into a gzipped JSONL file, request an upload URL, push the file, and + * (unless skipped) confirm the upload, returning the resulting artifact descriptor. + * + * @param itemType - The string item type of the objects being uploaded. + * @param fetchedObjects - The object or array of objects to upload. + * @returns Promise resolving to an UploadResponse with the artifact descriptor, or an error message on failure. */ async upload( itemType: string, @@ -127,11 +137,14 @@ export class Uploader { } /** - * Gets the upload URL for an artifact from the DevRev API. 
-   * @param {string} filename - The name of the file to upload
-   * @param {string} fileType - The MIME type of the file
-   * @param {number} [fileSize] - Optional file size in bytes
-   * @returns {Promise} The artifact upload information or undefined on error
+   * Requests a pre-signed upload URL and form data for a new artifact.
+   *
+   * Used before uploading or streaming a file so the binary can be POSTed to the returned URL.
+   *
+   * @param filename - The string file name to register (truncated if it exceeds the platform limit).
+   * @param fileType - The string MIME type of the file.
+   * @param fileSize - Optional number of bytes; rejected if 0 or less.
+   * @returns Promise resolving to an UploaderResult wrapping the ArtifactToUpload, or an error on failure.
    */
   async getArtifactUploadUrl(
     filename: string,
@@ -165,10 +178,13 @@ export class Uploader {
   }

   /**
-   * Uploads an artifact file to the provided upload URL using multipart form data.
-   * @param {ArtifactToUpload} artifact - The artifact upload information containing upload URL and form data
-   * @param {Buffer} file - The file buffer to upload
-   * @returns {Promise} The axios response or undefined on error
+   * Uploads an in-memory file buffer to a pre-signed artifact upload URL.
+   *
+   * Used to push a fully buffered artifact (e.g. a compressed JSON batch) as multipart form data.
+   *
+   * @param artifact - The ArtifactToUpload descriptor holding the upload URL and form fields.
+   * @param file - The Buffer containing the file contents to upload.
+   * @returns Promise resolving to an UploaderResult wrapping the AxiosResponse, or an error on failure.
    */
   async uploadArtifact(
     artifact: ArtifactToUpload,
@@ -193,10 +209,14 @@ export class Uploader {
   }

   /**
-   * Streams an artifact file from an axios response to the upload URL.
-   * @param {ArtifactToUpload} artifact - The artifact upload information containing upload URL and form data
-   * @param {AxiosResponse} fileStream - The axios response stream containing the file data
-   * @returns {Promise} The axios response or undefined on error
+   * Streams a file directly from a source response into a pre-signed artifact upload URL.
+   *
+   * Used to upload attachment binaries without buffering them in memory; falls back to the max
+   * artifact size for Content-Length when the source omits it, and always destroys the source stream.
+   *
+   * @param artifact - The ArtifactToUpload descriptor holding the upload URL and form fields.
+   * @param fileStream - The AxiosResponse whose data stream supplies the file contents.
+   * @returns Promise resolving to an UploaderResult wrapping the AxiosResponse, or an error on failure.
    */
   async streamArtifact(
     artifact: ArtifactToUpload,
@@ -232,9 +252,12 @@ export class Uploader {
   }

   /**
-   * Confirms that an artifact upload has been completed successfully.
-   * @param {string} artifactId - The ID of the artifact to confirm
-   * @returns {Promise} The axios response or undefined on error
+   * Confirms with the platform that an artifact upload has finished.
+   *
+   * Used after pushing the binary so the platform finalizes and accepts the artifact.
+   *
+   * @param artifactId - The string ID of the uploaded artifact to confirm.
+   * @returns Promise resolving to an object with the AxiosResponse on a 2xx, or an error otherwise.
    */
   async confirmArtifactUpload(artifactId: string): Promise<{
     response?: AxiosResponse;
@@ -273,8 +296,11 @@ export class Uploader {
   }

   /**
-   * Destroys a stream to prevent resource leaks.
-   * @param {any} fileStream - The axios response stream to destroy
+   * Destroys a source stream to prevent resource leaks after streaming an artifact.
+   *
+   * Used internally by streamArtifact to close the AxiosResponse data stream on both success and error.
+   *
+   * @param fileStream - The AxiosResponse whose underlying data stream should be destroyed/closed.
    */
   private destroyStream(fileStream: AxiosResponse): void {
     try {
@@ -292,10 +318,13 @@ export class Uploader {
   }

   /**
-   * Retrieves attachment metadata from an artifact by downloading and parsing it.
-   * @param {object} param0 - Configuration object
-   * @param {string} param0.artifact - The artifact ID to download attachments from
-   * @returns {Promise<{attachments?: NormalizedAttachment[], error?: {message: string}}>} The attachments array or error object
+   * Downloads an attachments-metadata artifact and parses it into normalized attachments.
+   *
+   * Used during attachment extraction to read back the previously uploaded attachment metadata so its
+   * binaries can be streamed; resolves the download URL, downloads, gunzips, and parses the JSONL.
+   *
+   * @param param0 - Object with `artifact`, the string artifact ID of the attachments-metadata artifact.
+   * @returns Promise resolving to an object with the NormalizedAttachment array, or an error message on failure.
    */
   async getAttachmentsFromArtifactId({
     artifact,
@@ -364,9 +393,12 @@ export class Uploader {
   }

   /**
-   * Gets the download URL for an artifact from the DevRev API.
-   * @param {string} artifactId - The ID of the artifact to download
-   * @returns {Promise} The download URL or undefined on error
+   * Requests a pre-signed download URL for an artifact from the platform.
+   *
+   * Used internally before downloading an artifact's contents back from object storage.
+   *
+   * @param artifactId - The string ID of the artifact to download.
+   * @returns Promise resolving to an UploaderResult wrapping the download URL string, or an error on failure.
    */
   private async getArtifactDownloadUrl(
     artifactId: string
@@ -391,9 +423,12 @@ export class Uploader {
   }

   /**
-   * Downloads an artifact file from the given URL.
-   * @param {string} artifactUrl - The URL to download the artifact from
-   * @returns {Promise} The artifact file buffer or undefined on error
+   * Downloads an artifact's raw bytes from a pre-signed URL.
+   *
+   * Used internally to fetch artifact contents as a Buffer for later decompression and parsing.
+   *
+   * @param artifactUrl - The string pre-signed URL to download the artifact from.
+   * @returns Promise resolving to an UploaderResult wrapping the file Buffer, or an error on failure.
    */
   private async downloadArtifact(
     artifactUrl: string
@@ -410,11 +445,12 @@ export class Uploader {
   }

   /**
-   * Retrieves and parses JSON objects from an artifact by artifact ID.
-   * @param {object} param0 - Configuration object
-   * @param {string} param0.artifactId - The artifact ID to download and parse
-   * @param {boolean} [param0.isGzipped=false] - Whether the artifact is gzipped
-   * @returns {Promise} The parsed JSON objects or undefined on error
+   * Downloads an artifact by ID and parses its JSONL contents into objects.
+   *
+   * Used to read back a previously uploaded JSON batch; optionally gunzips the bytes first.
+   *
+   * @param param0 - Object with `artifactId` (string artifact ID) and optional `isGzipped` (boolean, default false) flag.
+   * @returns Promise resolving to an UploaderResult wrapping the parsed object or object array, or an error on failure.
    */
   async getJsonObjectByArtifactId({
     artifactId,

From ec6242a70016290fbeddeb4404077cd5bedaeed1 Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 10:18:21 +0200
Subject: [PATCH 19/22] docs: mark C7 done

---
 V2_PROGRESS.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md
index 6ebed94..d9e2f9e 100644
--- a/V2_PROGRESS.md
+++ b/V2_PROGRESS.md
@@ -290,7 +290,7 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors:
 | C4b state envelope | ☑ done | 30ba1b3. { connectorState, sdkState } envelope + v1->v2 migration shim (normalizeFetchedState). adapter.state→connector-only, new adapter.sdkState; ~28 SDK-field access sites moved. SdkState kept combined (narrowing deferred to C5). Reviewer-approved (migration cases verified). |
 | C5 adapter split | ☑ done | a7a877f. BaseAdapter (template emit + hooks) + ExtractionAdapter + LoadingAdapter; WorkerAdapter→union alias; processTask dispatches by mode (still single entry). worker-adapter.ts deleted; helpers→loading-adapter.helpers. Reviewer-approved (emit equivalence verified). SdkState kept combined (narrowing dropped from scope). |
 | C6 emit-from-return | ☑ done | 0fb6116. task/onTimeout return TaskResult; SDK maps status→event via getEventTypeForResult and emits once (emitFromResult); emit now protected/internal; processTask→processExtractionTask+processLoadingTask; loader/stream methods return TaskResult. Reviewer-approved (mapping+state-save+no-double-emit verified). NET-NEW design (no oracle). |
-| C7 JSDoc | ☐ todo | Phase 2 |
+| C7 JSDoc | ☑ done | d05434b. Comments-only pass over 25 files to the mappers.ts bar: v2-new code (adapters, state incl. migration shim, process-task/spawn/getEventTypeForResult mapping) + under-documented older modules (repo, uploader, attachments pool, control-protocol, install-IDM, errors, types/loading barrel). Verified every changed line is a comment; build green, lint clean. Fanned out 5 implementer subagents over disjoint file groups. |
 | C8 api report | ☐ todo | Phase 2 |
 | C9 exposure audit | ☐ todo | Phase 2, interactive |
 | C10 tests + baseline | ☐ todo | Phase 2 |

From 4fea7552c41e6ab51d13c4d70d45ed7401e34cfe Mon Sep 17 00:00:00 2001
From: radovanjorgic
Date: Tue, 9 Jun 2026 10:45:25 +0200
Subject: [PATCH 20/22] Revert "docs(v2): JSDoc pass over public surface and v2 internals"

This reverts commit d05434be0baa2cb373ba114ad3da769561aa6854.
---
 .../attachments-streaming-pool.interfaces.ts | 10 --
 .../attachments-streaming-pool.ts | 41 +----
 src/common/control-protocol.ts | 20 ---
 src/common/errors.ts | 9 -
 src/common/install-initial-domain-mapping.ts | 13 --
 src/multithreading/adapters/base-adapter.ts | 41 ++---
 .../adapters/extraction-adapter.ts | 117 ++-----------
 .../adapters/loading-adapter.helpers.ts | 26 +--
 .../adapters/loading-adapter.ts | 97 +----------
 src/multithreading/create-worker.ts | 10 --
 src/multithreading/process-task.ts | 24 ---
 src/multithreading/spawn/spawn.helpers.ts | 10 +-
 src/multithreading/spawn/spawn.ts | 35 +---
 src/repo/repo.interfaces.ts | 40 +----
 src/repo/repo.ts | 26 ---
 src/state/base-state.ts | 75 +++------
 src/state/extraction-state.ts | 40 ++---
 src/state/loading-state.ts | 19 +--
 src/state/state.interfaces.ts | 45 +----
 src/state/state.ts | 10 +-
 src/types/index.ts | 8 -
 src/types/loading.ts | 154 ------------------
 src/uploader/uploader.helpers.ts | 47 ++----
 src/uploader/uploader.interfaces.ts | 21 ---
 src/uploader/uploader.ts | 110 +++++--------
 25 files changed, 146 insertions(+), 902 deletions(-)

diff --git a/src/attachments-streaming/attachments-streaming-pool.interfaces.ts b/src/attachments-streaming/attachments-streaming-pool.interfaces.ts
index c412758..9a22ea0 100644
--- a/src/attachments-streaming/attachments-streaming-pool.interfaces.ts
+++ b/src/attachments-streaming/attachments-streaming-pool.interfaces.ts
@@ -4,19 +4,9 @@ import {
 } from '../types';
 import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter';

-/**
- * Construction parameters used to create an AttachmentsStreamingPool.
- *
- * Used to supply the driving extraction adapter, the attachments to stream, the concurrency limit, and
- * the connector-provided streaming function.
- */
 export interface AttachmentsStreamingPoolParams {
-  /** The ExtractionAdapter that owns sync state, timeout detection, and the processAttachment call. */
   adapter: ExtractionAdapter;
-  /** The normalized attachments to stream to DevRev. */
   attachments: NormalizedAttachment[];
-  /** Optional maximum number of attachments to stream concurrently (defaults to 10 in the pool). */
   batchSize?: number;
-  /** Connector-provided function that downloads a single attachment from the external system. */
   stream: ExternalSystemAttachmentStreamingFunction;
 }
diff --git a/src/attachments-streaming/attachments-streaming-pool.ts b/src/attachments-streaming/attachments-streaming-pool.ts
index d7cf4a0..58281e1 100644
--- a/src/attachments-streaming/attachments-streaming-pool.ts
+++ b/src/attachments-streaming/attachments-streaming-pool.ts
@@ -8,12 +8,6 @@ import {
 } from '../types';
 import { AttachmentsStreamingPoolParams } from './attachments-streaming-pool.interfaces';

-/**
- * Concurrency-bounded pool that streams a batch of attachments from the external system to DevRev.
- *
- * Used during attachment extraction to download up to batchSize attachments in parallel while honoring
- * timeouts, rate-limit delays, and per-attachment errors, and to track processed attachments for resumption.
- */
 export class AttachmentsStreamingPool {
   private adapter: ExtractionAdapter;
   private attachments: NormalizedAttachment[];
@@ -37,14 +31,6 @@ export class AttachmentsStreamingPool {
     this.stream = stream;
   }

-  /**
-   * Increments the processed counter and periodically logs progress.
-   *
-   * Used after each attachment to report progress every PROGRESS_REPORT_INTERVAL items and briefly
-   * yield the event loop.
-   *
-   * @returns Promise that resolves once progress has been recorded (and any brief sleep elapsed).
-   */
   private async updateProgress() {
     this.totalProcessedCount++;
     if (this.totalProcessedCount % this.PROGRESS_REPORT_INTERVAL === 0) {
@@ -55,13 +41,10 @@ export class AttachmentsStreamingPool {
   }

   /**
-   * Migrates processed-attachment state from the legacy string[] format to ProcessedAttachment[].
+   * Migrates processed attachments from the legacy string[] format to the new ProcessedAttachment[] format.
    *
-   * Used when resuming streaming so older saved state (a list of ids) is upgraded to the structured
-   * { id, parent_id } form before it is consulted for de-duplication.
-   *
-   * @param attachments - The persisted list to migrate, either a string[] of ids or a ProcessedAttachment[].
-   * @returns Migrated array of ProcessedAttachment objects, or an empty array if the input is invalid.
+   * @param attachments - The attachments list to migrate (either string[] or ProcessedAttachment[])
+   * @returns Migrated array of ProcessedAttachment objects, or empty array if input is invalid
    */
   // eslint-disable-next-line @typescript-eslint/no-explicit-any
   private migrateProcessedAttachments(attachments: any): ProcessedAttachment[] {
@@ -86,15 +69,6 @@ export class AttachmentsStreamingPool {
     return [];
   }

-  /**
-   * Streams every attachment in the pool, running up to batchSize streams concurrently.
-   *
-   * Used as the pool's entry point: it initializes/migrates the processed-attachments state, starts the
-   * initial set of worker loops, and waits for them to drain the queue or stop early on a delay.
-   *
-   * @returns Promise resolving to a ProcessAttachmentReturnType: a delay if rate-limited, an error if
-   * state is uninitialized, or an empty object once all attachments are processed.
-   */
   async streamAll(): Promise {
     console.log(
       `Starting download of ${this.attachments.length} attachments, streaming ${this.batchSize} at once.`
@@ -141,15 +115,6 @@ export class AttachmentsStreamingPool {
     return {};
   }

-  /**
-   * Runs a single worker loop that pulls and streams attachments until the queue is drained.
-   *
-   * Used as one of the concurrent workers started by streamAll: it skips already-processed attachments,
-   * stops on timeout or a rate-limit delay, records successes, and logs/skips per-attachment errors.
-   *
-   * @returns Promise that resolves when this worker stops, either because the queue is empty or a
-   * timeout/delay was detected.
-   */
   async startPoolStreaming() {
     // Process attachments until the attachments array is empty
     while (this.attachments.length > 0) {
diff --git a/src/common/control-protocol.ts b/src/common/control-protocol.ts
index 8470401..0d13be1 100644
--- a/src/common/control-protocol.ts
+++ b/src/common/control-protocol.ts
@@ -10,32 +10,12 @@ import {
 import { LoaderEventType } from '../types/loading';
 import { LIBRARY_VERSION } from './constants';

-/**
- * Parameters for emitting a worker control message back to the platform.
- *
- * Used by {@link emit} to construct and post the outgoing extractor/loader event.
- */
 export interface EmitInterface {
-  /** The incoming AirSync event currently being processed; supplies the callback URL, event context, and auth secrets. */
   event: AirSyncEvent;
-  /** The outgoing event type to report. In v2 this value is used directly (no event-type translation). */
   eventType: ExtractorEventType | LoaderEventType;
-  /** Optional payload describing progress, results, or error details to attach to the event. */
   data?: EventData;
 }

-/**
- * Emits a worker control message to the parent/platform via the event callback URL.
- *
- * Used to report extraction/loading progress, completion, delays, or errors back to AirSync.
- * Wraps the given event type and data into an ExtractorEvent/LoaderEvent envelope (stamped with
- * the library version) and POSTs it to the callback URL with the service account authorization.
- *
- * @param event - The incoming AirSyncEvent providing callback URL, event context, and auth secrets.
- * @param eventType - The outgoing ExtractorEventType or LoaderEventType, used directly as the event_type.
- * @param data - Optional EventData payload to include in the emitted event.
- * @returns Promise resolving to the AxiosResponse of the callback POST request.
- */
 export const emit = async ({
   event,
   eventType,
diff --git a/src/common/errors.ts b/src/common/errors.ts
index cb32a8b..cd420a9 100644
--- a/src/common/errors.ts
+++ b/src/common/errors.ts
@@ -1,15 +1,6 @@
-/** Prefix used to namespace common error codes emitted by extractors. */
 const ERROR_PREFIX = 'ERROR_CODE';
-/** Delimiter joining the error prefix and the specific error name in the encoded code. */
 const ERROR_DELIMITER = '=';

-/**
- * Well-known error codes an extractor can report to signal common, externally-caused failure conditions.
- *
- * Used to communicate persistent source-system states (deletion, deactivation, missing access/permission)
- * or sync-completion signals back to AirSync in a recognized, machine-readable form. Each member's value
- * is the encoded string `ERROR_CODE=`.
- */
 export const enum ExtractionCommonError {
   // Indicates that the external system is permanently inactive or inaccessible.
   // This is used for persistent conditions (system deleted, deactivated, access permanently revoked)
diff --git a/src/common/install-initial-domain-mapping.ts b/src/common/install-initial-domain-mapping.ts
index b0adb6f..5a8ec43 100644
--- a/src/common/install-initial-domain-mapping.ts
+++ b/src/common/install-initial-domain-mapping.ts
@@ -4,19 +4,6 @@ import { AirSyncEvent } from '../types/extraction';
 import { serializeError } from '../logger/logger';
 import { InitialDomainMapping } from '../types/common';

-/**
- * Installs the connector's initial domain mapping into the sync.
- *
- * Used the first time a sync needs the mapping that describes how external record types/fields map
- * to DevRev. It resolves the snap-in's import and version slugs, optionally creates the starting
- * recipe blueprint, then installs the initial domain mapping (plus any additional mappings) via the
- * airdrop recipe API. If no mapping JSON is provided the call is a no-op; blueprint creation failures
- * are logged and the install continues without a blueprint.
- *
- * @param event - The AirSyncEvent providing the DevRev endpoint, service account token, and snap-in ID.
- * @param initialDomainMappingJson - The InitialDomainMapping JSON containing the starting recipe blueprint and additional mappings.
- * @returns Promise that resolves once the mapping has been installed (or early-returns if none is provided).
- */
 export async function installInitialDomainMapping(
   event: AirSyncEvent,
   initialDomainMappingJson: InitialDomainMapping
diff --git a/src/multithreading/adapters/base-adapter.ts b/src/multithreading/adapters/base-adapter.ts
index 787b240..dec9285 100644
--- a/src/multithreading/adapters/base-adapter.ts
+++ b/src/multithreading/adapters/base-adapter.ts
@@ -23,13 +23,10 @@ import { Uploader } from '../../uploader/uploader';
 import { getEventTypeForResult } from '../spawn/spawn.helpers';

 /**
- * Abstract base for the worker adapters, holding state and behavior shared by
- * both sync modes and owning the `emit` control-protocol flow as a template method.
- *
- * Used as the type passed to worker tasks; mode-specific adapters
- * (`ExtractionAdapter`, `LoadingAdapter`) extend it and implement the abstract
- * hooks (`beforeEmit`, `buildEmitPayload`, `afterEmit`) to inject their own
- * pre-emit work and event payload shaping.
+ * BaseAdapter holds the state and behavior shared by both sync modes and owns
+ * the `emit` control-protocol flow as a template method. Mode-specific adapters
+ * (`ExtractionAdapter`, `LoadingAdapter`) implement the abstract hooks to inject
+ * their own pre-emit work and event payload shaping.
  *
  * @typeParam ConnectorState - the connector-owned state shape
  */
@@ -76,18 +73,10 @@ export abstract class BaseAdapter {
     return this.adapterState.sdkState;
   }

-  /** Per-item-type extraction scope (which item types to extract). */
   get extractionScope() {
     return this.adapterState.extractionScope;
   }

-  /**
-   * Persists the current adapter state to the platform.
-   *
-   * Used to checkpoint connector and SDK state outside of an emit.
-   *
-   * @returns Promise that resolves once the state has been posted.
-   */
   async postState() {
     return runWithSdkLogContext(async () => {
       await this.adapterState.postState();
@@ -124,15 +113,12 @@ export abstract class BaseAdapter {
    * Maps a {@link TaskResult} returned by a worker's task/onTimeout callback to
    * the phase-appropriate platform event and emits it exactly once.
    *
-   * Used as the SDK-internal bridge between the return-based connector contract
+   * This is the SDK-internal bridge between the return-based connector contract
    * and the control protocol; it is invoked by the worker driver, not by
    * connectors. Connectors signal outcomes by returning a `TaskResult`, never by
-   * calling `emit` directly. A `delay`/`error` status carries its delay seconds
-   * or error into the event data; a status that is illegal for a non-resumable
-   * phase is downgraded to an error event.
+   * calling `emit` directly.
    *
-   * @param result - The TaskResult status the worker reported for the current phase.
-   * @returns Promise that resolves once the mapped event has been emitted.
+   * @param result - The status the worker reported for the current phase.
    */
   async emitFromResult(result: TaskResult): Promise {
     const { eventType, illegal } = getEventTypeForResult(
@@ -155,17 +141,10 @@ export abstract class BaseAdapter {
   }

   /**
-   * Emits a single event to the platform via the template-method flow.
-   *
-   * Used as the one place that sends a control-protocol event: it runs the
-   * `beforeEmit` hook, persists state (except for stateless start/delete events),
-   * merges in the mode-specific `buildEmitPayload`, sends the event, then runs
-   * `afterEmit`. Guarded by `hasWorkerEmitted` so it emits at most once; any
-   * failure in preparation, state posting, or sending signals the worker to exit.
+   * Emits an event to the platform.
    *
-   * @param newEventType - The ExtractorEventType or LoaderEventType to emit.
-   * @param data - Optional EventData (e.g. delay or error) merged into the payload.
-   * @returns Promise that resolves once the emit attempt has completed.
+   * @param newEventType - The event type to be emitted
+   * @param data - The data to be sent with the event
    */
   protected async emit(
     newEventType: ExtractorEventType | LoaderEventType,
diff --git a/src/multithreading/adapters/extraction-adapter.ts b/src/multithreading/adapters/extraction-adapter.ts
index 39cd036..5bbbffa 100644
--- a/src/multithreading/adapters/extraction-adapter.ts
+++ b/src/multithreading/adapters/extraction-adapter.ts
@@ -32,14 +32,10 @@ import { Artifact, SsorAttachment } from '../../uploader/uploader.interfaces';
 import { BaseAdapter } from './base-adapter';

 /**
- * Worker adapter passed to extraction tasks, exposing the extraction surface
- * (repos, artifacts, attachment streaming).
+ * ExtractionAdapter is the adapter passed to extraction tasks. It exposes the
+ * extraction surface (repos, artifacts, attachment streaming) and uploads
+ * pending repos and updates the extraction boundaries before emitting.
  *
- * Used during the extraction phases; before emitting it uploads all pending
- * repos and, on `AttachmentExtractionDone`, advances the sync boundaries on
- * `sdkState`.
- *
- * @typeParam ConnectorState - the connector-owned state shape
  * @public
  */
 export class ExtractionAdapter<
@@ -60,12 +56,7 @@ export class ExtractionAdapter<

   /**
    * Returns whether the given item type should be extracted.
-   *
-   * Used to honor the per-item-type extraction scope; defaults to true when the
-   * scope is empty or the item type is not listed.
-   *
-   * @param itemType - The item type name to check.
-   * @returns True if the item type should be extracted.
+   * Defaults to true if the scope is empty or the item type is not listed.
    */
   shouldExtract(itemType: string): boolean {
     const scope = this.extractionScope;
@@ -74,17 +65,6 @@ export class ExtractionAdapter<
     return scope[itemType].extract;
   }

-  /**
-   * Initializes the adapter's repos from the given repo definitions.
-   *
-   * Used to set up the in-memory item buffers an extraction task pushes to;
-   * each repo normalizes items (except external domain metadata and SSOR
-   * attachments) and, on upload, records attachment artifact IDs in state and
-   * tracks event payload size to trigger a timeout when the SQS size threshold
-   * is exceeded.
-   *
-   * @param repos - The RepoInterface definitions to build repos from.
-   */
   initializeRepos(repos: RepoInterface[]) {
     this.repos = repos.map((repo) => {
       const shouldNormalize =
@@ -124,15 +104,6 @@ export class ExtractionAdapter<
     });
   }

-  /**
-   * Looks up an initialized repo by item type.
-   *
-   * Used by extraction tasks to get the buffer they push normalized items into;
-   * logs an error and returns undefined when no repo matches.
-   *
-   * @param itemType - The item type name of the repo to find.
-   * @returns The matching Repo, or undefined if none was initialized.
-   */
   getRepo(itemType: string): Repo | undefined {
     return runWithSdkLogContext(() => {
       const repo = this.repos.find((repo) => repo.itemType === itemType);
@@ -146,32 +117,16 @@ export class ExtractionAdapter<
     });
   }

-  /** Artifacts accumulated since the last emit, sent with the next event. */
   get artifacts(): Artifact[] {
     return this._artifacts;
   }

-  /** Appends artifacts to the accumulator, de-duplicating by reference. */
   set artifacts(artifacts: Artifact[]) {
     this._artifacts = this._artifacts
       .concat(artifacts)
       .filter((value, index, self) => self.indexOf(value) === index);
   }

-  /**
-   * Pre-emit hook: uploads all repos and, when extraction completes, advances
-   * the sync boundaries.
-   *
-   * Used as the extraction implementation of the `BaseAdapter` template hook.
-   * Always uploads pending repos; on `AttachmentExtractionDone` it promotes
-   * `lastSyncStarted` to `lastSuccessfulSyncStarted`, clears the pending worker
-   * boundaries, and expands `workersOldest`/`workersNewest` from the event's
-   * extract-from/extract-to window.
-   *
-   * @param newEventType - The event type about to be emitted.
-   * @returns Promise that resolves once pre-emit work is done; rejects (aborting
-   * the emit) if a repo upload fails.
-   */
   protected async beforeEmit(
     newEventType: ExtractorEventType | LoaderEventType
   ): Promise {
@@ -223,15 +178,6 @@ export class ExtractionAdapter<
     }
   }

-  /**
-   * Builds the extraction-specific extras merged into the emitted event payload.
-   *
-   * Used as the extraction implementation of the `BaseAdapter` template hook;
-   * returns the accumulated artifacts for extraction events and nothing otherwise.
-   *
-   * @param newEventType - The event type about to be emitted.
-   * @returns EventData carrying the accumulated artifacts for extraction events.
-   */
   protected buildEmitPayload(
     newEventType: ExtractorEventType | LoaderEventType
   ): EventData {
@@ -241,24 +187,10 @@ export class ExtractionAdapter<
     return isExtractionEvent ? { artifacts: this.artifacts } : {};
   }

-  /**
-   * Post-emit hook: clears the accumulated artifacts after a successful emit.
-   *
-   * Used as the extraction implementation of the `BaseAdapter` template hook so
-   * artifacts are not re-sent on a subsequent emit.
-   */
   protected afterEmit(): void {
     this.artifacts = [];
   }

-  /**
-   * Uploads all initialized repos and collects their resulting artifacts.
-   *
-   * Used by `beforeEmit` to flush buffered items to artifacts before an event is
-   * sent; throws on the first repo that fails to upload.
-   *
-   * @returns Promise that resolves once every repo has been uploaded.
-   */
   async uploadAllRepos(): Promise {
     for (const repo of this.repos) {
       const error = await repo.upload();
@@ -269,20 +201,6 @@ export class ExtractionAdapter<
     }
   }

-  /**
-   * Streams a single attachment from the external system into a DevRev artifact
-   * and records the SSOR attachment mapping.
-   *
-   * Used while streaming attachments: it opens the external stream, requests an
-   * artifact upload URL, streams the bytes, confirms the upload, then pushes an
-   * `ssor_attachment` linking the external and DevRev IDs. A delay or error from
-   * the stream is surfaced back to the caller and the HTTP stream is destroyed.
-   *
-   * @param attachment - The NormalizedAttachment to fetch and upload.
-   * @param stream - The ExternalSystemAttachmentStreamingFunction that opens the source stream.
-   * @returns Promise with undefined on success, or a ProcessAttachmentReturnType
-   * carrying a delay or error.
-   */
   async processAttachment(
     attachment: NormalizedAttachment,
     stream: ExternalSystemAttachmentStreamingFunction
@@ -398,12 +316,8 @@ export class ExtractionAdapter<
   }

   /**
-   * Destroys an HTTP response stream to prevent memory leaks.
-   *
-   * Used to release an open attachment source stream when uploading fails;
-   * swallows any error raised while destroying.
-   *
-   * @param httpStream - The AxiosResponse stream to destroy.
+   * Destroys a stream to prevent memory leaks.
+   * @param httpStream - The axios response stream to destroy
    */
   private destroyHttpStream(httpStream: AxiosResponse): void {
     try {
@@ -420,20 +334,11 @@ export class ExtractionAdapter<
   }

   /**
-   * Streams all pending attachments to the DevRev platform and returns the phase
-   * outcome.
-   *
-   * Used as the entry point for the attachment-streaming phase: it iterates the
-   * attachments-metadata artifact IDs recorded in state, fetches each batch, and
-   * streams them either through caller-provided processors or the default
-   * streaming pool (with batch size clamped to 1..50). Returns `delay` on a
-   * rate limit, `error` on failure, `progress` on timeout (to resume in a fresh
-   * invocation), and `success` once every artifact ID is processed.
-   *
-   * @param stream - The ExternalSystemAttachmentStreamingFunction used to open each source stream.
-   * @param processors - Optional ExternalSystemAttachmentProcessors (reducer + iterator) overriding the default pool.
-   * @param batchSize - Number of attachments to stream concurrently; defaults to 1, clamped to 1..50.
-   * @returns Promise with the TaskResult describing the phase outcome.
+   * Streams the attachments to the DevRev platform.
+   * The attachments are streamed to the platform and the artifact information is returned.
+   * @param params - The parameters to stream the attachments
+   * @returns The response object containing the ssorAttachment artifact information
+   * or error information if there was an error
    */
   async streamAttachments({
     stream,
diff --git a/src/multithreading/adapters/loading-adapter.helpers.ts b/src/multithreading/adapters/loading-adapter.helpers.ts
index e7f3187..ecab2de 100644
--- a/src/multithreading/adapters/loading-adapter.helpers.ts
+++ b/src/multithreading/adapters/loading-adapter.helpers.ts
@@ -6,15 +6,10 @@ import {
 } from '../../types/loading';

 /**
- * Builds the ordered list of files to load from a stats file.
- *
- * Used by the loading adapter to filter the stats file down to the supported
- * item types, order the entries to match that item-type order, and shape each
- * into a FileToLoad with progress fields reset.
- *
- * @param supportedItemTypes - The supported item type names, in desired load order.
- * @param statsFile - The StatsFileObject entries describing available files.
- * @returns The FileToLoad entries to process, ordered by supported item type.
+ * Gets the files to load for the loader.
+ * @param {string[]} supportedItemTypes - The supported item types + * @param {StatsFileObject[]} statsFile - The stats file + * @returns {FileToLoad[]} The files to load */ export function getFilesToLoad({ supportedItemTypes, @@ -55,15 +50,10 @@ export function getFilesToLoad({ } /** - * Merges a report into the accumulated loader reports. - * - * Used to keep one running report per item type: when a report for the same - * item type already exists its created/updated/failed counts are summed, - * otherwise the report is appended. - * - * @param loaderReports - The existing LoaderReport accumulator (mutated in place). - * @param report - The LoaderReport to merge in. - * @returns The updated LoaderReport accumulator. + * Adds a report to the loader report. + * @param {LoaderReport[]} loaderReports - The loader reports + * @param {LoaderReport} report - The report to add + * @returns {LoaderReport[]} The updated loader reports */ export function addReportToLoaderReport({ loaderReports, diff --git a/src/multithreading/adapters/loading-adapter.ts b/src/multithreading/adapters/loading-adapter.ts index 70c4fe9..49fdd2b 100644 --- a/src/multithreading/adapters/loading-adapter.ts +++ b/src/multithreading/adapters/loading-adapter.ts @@ -36,14 +36,9 @@ import { } from './loading-adapter.helpers'; /** - * Worker adapter passed to loading tasks, exposing the loading surface - * (item/attachment loading, mappers, loader reports). + * LoadingAdapter is the adapter passed to loading tasks. It exposes the loading + * surface (item/attachment loading, mappers, loader reports). * - * Used during the loading phases to push transformed DevRev items into the - * external system, maintaining sync mapper records and accumulating per-item-type - * loader reports across emits. 
- * - * @typeParam ConnectorState - the connector-owned state shape * @public */ export class LoadingAdapter< @@ -67,43 +62,22 @@ export class LoadingAdapter< }); } - /** Per-item-type loader reports accumulated across loads, sent with each emit. */ get reports(): LoaderReport[] { return this.loaderReports; } - /** Artifact IDs of transformer files that have been fully loaded. */ get processedFiles(): string[] { return this._processedFiles; } - /** Mappers client for reading/writing sync mapper records. */ get mappers(): Mappers { return this._mappers; } - /** - * Pre-emit hook for loading; intentionally a no-op. - * - * Used as the loading implementation of the `BaseAdapter` template hook; - * loading has no repos to upload and no extraction boundaries to advance. - * - * @returns Promise that resolves immediately. - */ protected async beforeEmit(): Promise { // Loading has no pre-emit work (no repos, no extraction boundaries). } - /** - * Builds the loading-specific extras merged into the emitted event payload. - * - * Used as the loading implementation of the `BaseAdapter` template hook; - * returns the accumulated reports and processed files for loader events and - * nothing otherwise. - * - * @param newEventType - The event type about to be emitted. - * @returns EventData carrying loader reports and processed files for loader events. - */ protected buildEmitPayload( newEventType: ExtractorEventType | LoaderEventType ): EventData { @@ -118,29 +92,10 @@ export class LoadingAdapter< : {}; } - /** - * Post-emit hook for loading; intentionally a no-op. - * - * Used as the loading implementation of the `BaseAdapter` template hook; - * loading keeps its accumulated reports and processed files across emits. - */ protected afterEmit(): void { // Loading keeps its accumulated reports/processed files across emits. } - /** - * Loads all supported item types into the external system and returns the - * phase outcome. 
- * - * Used as the entry point for the data-loading phase: on `StartLoadingData` it - * resolves the batches of transformer files to load from the stats file, then - * for each file loads every item via `loadItem`, tracking progress per line so - * a fresh invocation can resume. Returns `delay` on a rate limit, `progress` - * on timeout, `error` on failure, and `success` once all files are processed. - * - * @param itemTypesToLoad - The ItemTypeToLoad definitions (create/update handlers per item type). - * @returns Promise with the TaskResult describing the phase outcome. - */ async loadItemTypes({ itemTypesToLoad, }: ItemTypesToLoadParams): Promise { @@ -261,17 +216,6 @@ export class LoadingAdapter< }); } - /** - * Resolves the ordered list of transformer files to load for the given item - * types. - * - * Used by `loadItemTypes`/`loadAttachments` to turn the event's stats file - * into `FileToLoad` batches, filtered to the supported item types; returns an - * empty list when there is no stats file, it cannot be fetched, or it is empty. - * - * @param supportedItemTypes - The item type names to include, in load order. - * @returns Promise with the FileToLoad batches to process. - */ async getLoaderBatches({ supportedItemTypes, }: { @@ -304,18 +248,6 @@ export class LoadingAdapter< }); } - /** - * Loads attachments into the external system and returns the phase outcome. - * - * Used as the entry point for the attachment-loading phase: on - * `StartLoadingAttachments` it resolves the attachment transformer files to - * load, then creates each attachment via `loadAttachment`, tracking progress - * per line for resumption. Returns `delay` on a rate limit, `progress` on - * timeout, `error` on failure, and `success` once all files are processed. - * - * @param create - The ExternalSystemLoadingFunction that creates an attachment in the external system. - * @returns Promise with the TaskResult describing the phase outcome. 
- */ async loadAttachments({ create, }: { @@ -412,19 +344,6 @@ export class LoadingAdapter< }); } - /** - * Loads a single item into the external system, creating or updating as needed. - * - * Used per item by `loadItemTypes`: it looks up the sync mapper record by - * DevRev target ID and calls the item type's `update` handler; if no mapper - * record exists (404) it falls back to the `create` handler. On success it - * creates or updates the sync mapper record and reports the action; a rate - * limit surfaces a delay and other failures are reported as failed. - * - * @param item - The ExternalSystemItem to load. - * @param itemTypeToLoad - The ItemTypeToLoad providing the create/update handlers. - * @returns Promise with a LoadItemResponse carrying a report, a rateLimit delay, or an error. - */ async loadItem({ item, itemTypeToLoad, @@ -631,18 +550,6 @@ export class LoadingAdapter< }); } - /** - * Creates a single attachment in the external system. - * - * Used per attachment by `loadAttachments`: it calls the `create` handler and, - * on success, creates a sync mapper record linking the new external ID to the - * attachment's reference ID and reports it as created. A rate limit surfaces a - * delay; a missing ID is reported as failed (attachments are create-only). - * - * @param item - The ExternalSystemAttachment to create. - * @param create - The ExternalSystemLoadingFunction that creates the attachment. - * @returns Promise with a LoadItemResponse carrying a report or a rateLimit delay. - */ async loadAttachment({ item, create, diff --git a/src/multithreading/create-worker.ts b/src/multithreading/create-worker.ts index 72b9655..abd3503 100644 --- a/src/multithreading/create-worker.ts +++ b/src/multithreading/create-worker.ts @@ -2,16 +2,6 @@ import { isMainThread, Worker } from 'node:worker_threads'; import { WorkerData, WorkerEvent } from '../types/workers'; -/** - * Creates a Node worker thread that runs the snap-in's task worker script. 
- * - * Used by `spawn` to launch the off-main-thread worker that processes an - * extraction/loading event; the promise settles once the worker comes online - * so the caller can wire up timeouts and lifecycle handling. - * - * @param workerData - The data of type WorkerData passed to the worker thread (event, initial state, options, etc.). - * @returns A Promise that resolves with the online Worker instance, or rejects with the Error if the worker fails to start or is itself a worker thread. - */ async function createWorker( workerData: WorkerData ): Promise { diff --git a/src/multithreading/process-task.ts b/src/multithreading/process-task.ts index 6db0d7c..a43debb 100644 --- a/src/multithreading/process-task.ts +++ b/src/multithreading/process-task.ts @@ -30,12 +30,6 @@ import { LoadingAdapter } from './adapters/loading-adapter'; * If `onTimeout` is omitted, the SDK emits a phase-appropriate default on * timeout: `progress` (resumable phases) or `error` (non-resumable phases) is * handled by the status->event mapping when we emit a `progress` result. - * - * @param buildAdapter - Factory that constructs the typed adapter for this worker. - * @param params - The task hooks of type ProcessTaskInterface. - * @param params.task - The worker's main task; receives the adapter and resolves to a TaskResult. - * @param params.onTimeout - Optional callback run on soft timeout when nothing has emitted yet; resolves to a TaskResult. - * @returns A Promise that resolves once the result has been emitted and the worker process exits. */ async function runWorkerTask>( buildAdapter: () => Promise, @@ -89,15 +83,6 @@ async function runWorkerTask>( * Entry point for an extraction worker. Builds an {@link ExtractionAdapter} and * runs the provided task against it. * - * Used as the worker-script entry the snap-in calls inside an extraction worker - * thread; returns immediately on the main thread so the same module can be - * imported there safely. 
- * - * @param params - The task hooks of type ProcessTaskInterface for an ExtractionAdapter. - * @param params.task - The extraction task; receives the adapter and resolves to a TaskResult. - * @param params.onTimeout - Optional callback run on soft timeout; resolves to a TaskResult. - * @returns Nothing; emission and process exit are handled by the shared driver. - * * @public */ export function processExtractionTask({ @@ -138,15 +123,6 @@ export function processExtractionTask({ * Entry point for a loading worker. Builds a {@link LoadingAdapter} and runs the * provided task against it. * - * Used as the worker-script entry the snap-in calls inside a loading worker - * thread; returns immediately on the main thread so the same module can be - * imported there safely. - * - * @param params - The task hooks of type ProcessTaskInterface for a LoadingAdapter. - * @param params.task - The loading task; receives the adapter and resolves to a TaskResult. - * @param params.onTimeout - Optional callback run on soft timeout; resolves to a TaskResult. - * @returns Nothing; emission and process exit are handled by the shared driver. - * * @public */ export function processLoadingTask({ diff --git a/src/multithreading/spawn/spawn.helpers.ts b/src/multithreading/spawn/spawn.helpers.ts index 30a00e1..1f52764 100644 --- a/src/multithreading/spawn/spawn.helpers.ts +++ b/src/multithreading/spawn/spawn.helpers.ts @@ -59,15 +59,7 @@ export function getEventTypeForResult( /** * Per-phase outgoing event types, keyed by the incoming {@link EventType}. - * - * Each entry maps a phase to its outgoing events: every phase has a `done` - * (success) and `error` event. `resumable` phases (data/attachment extraction, - * data/attachment loading) additionally define `progress` and `delayed` events - * and accept all four statuses. 
Non-resumable phases (external sync units, - * metadata, state deletions) omit `progress`/`delayed`; a `progress`/`delay` - * status there is illegal and {@link getEventTypeForResult} collapses it to the - * `error` event while flagging it. Event types absent from this partial map are - * treated as unrecognized. + * `resumable` phases define progress/delayed events; non-resumable ones do not. */ const EVENT_PHASE_MAP: Partial< Record< diff --git a/src/multithreading/spawn/spawn.ts b/src/multithreading/spawn/spawn.ts index f31ecba..02176ff 100644 --- a/src/multithreading/spawn/spawn.ts +++ b/src/multithreading/spawn/spawn.ts @@ -25,17 +25,6 @@ import { getNoScriptEventType, } from './spawn.helpers'; -/** - * Resolves the default worker script path for an incoming event type. - * - * Used by `spawn` to pick which built-in worker (external sync units, metadata, - * data/attachment extraction, data/attachment loading) to run when the caller - * has not supplied an explicit `workerPath` or override. - * - * @param event - The AirSync event whose `payload.event_type` selects the worker. - * @param workerBasePath - The base directory string the resolved relative worker path is appended to. - * @returns The full worker script path string, or null if the event type has no matching built-in worker. - */ function getWorkerPath({ event, workerBasePath, @@ -74,14 +63,12 @@ function getWorkerPath({ * Spawn class is responsible for spawning a new worker thread and managing the lifecycle of the worker. * The class provides utilities to emit control events to the platform and exit the worker gracefully. * In case of lambda timeout, the class emits a lambda timeout event to the platform. - * @param options - The options of type SpawnFactoryInterface used to launch the worker. - * @param options.event - The AirSync event object received from the platform. - * @param options.initialState - The initial connector state handed to the worker. 
- * @param options.initialDomainMapping - The initial domain mapping handed to the worker. - * @param options.options - Optional SDK behavior overrides (timeout, local development, worker path overrides, etc.). - * @param options.workerPath - Optional explicit path to the worker script; takes precedence over overrides and the default resolver. - * @param options.baseWorkerPath - The base path for the worker files, usually `__dirname`. - * @returns A Promise that resolves once the worker finishes (or a no-script default event is emitted), or rejects if the worker fails to start. + * @param {SpawnFactoryInterface} options - The options to create a new instance of Spawn class + * @param {AirSyncEvent} options.event - The event object received from the platform + * @param {object} options.initialState - The initial state of the adapter + * @param {string} [options.workerPath] Remove getWorkerPath function and use baseWorkerPath: __dirname instead of workerPath + * @param {string} [options.baseWorkerPath] - The base path for the worker files, usually `__dirname` + * @returns {Promise} - A new instance of Spawn class */ export async function spawn({ event, @@ -167,16 +154,6 @@ export async function spawn({ } } -/** - * Manages the lifecycle of a spawned worker thread for a single event. - * - * Used by `spawn` to supervise the worker: it arms a soft timeout (asks the - * worker to exit gracefully) and a hard timeout (terminates a stuck worker), - * relays the worker's log messages to the main thread, tracks whether the - * worker has already emitted an event, periodically logs memory usage, and on - * worker exit clears the timers and resolves the spawn promise -- emitting a - * timeout error event if the worker exited without emitting one itself. 
- */ export class Spawn { private event: AirSyncEvent; private alreadyEmitted: boolean; diff --git a/src/repo/repo.interfaces.ts b/src/repo/repo.interfaces.ts index 583f71e..f790be2 100644 --- a/src/repo/repo.interfaces.ts +++ b/src/repo/repo.interfaces.ts @@ -4,84 +4,54 @@ import { AirSyncEvent } from '../types/extraction'; import { WorkerAdapterOptions } from '../types/workers'; /** - * Describes a repo configuration that stores and uploads extracted data of one item type. - * - * Used to declare which item type a repo holds and how its raw records should be normalized. + * RepoInterface is an interface that defines the structure of a repo which is used to store and upload extracted data. */ export interface RepoInterface { - /** The item type the repo buffers and uploads. */ itemType: string; - /** Optional normalizer turning a raw record into a NormalizedItem or NormalizedAttachment. */ normalize?: (record: object) => NormalizedItem | NormalizedAttachment; - /** Optional worker adapter options that override defaults (e.g. batch size). */ overridenOptions?: WorkerAdapterOptions; } /** - * Construction parameters used to create a Repo instance. - * - * Used to wire a repo to its triggering event, item type, normalizer, upload callback, and options. + * RepoFactoryInterface is an interface that defines the structure of a repo factory which is used to create a repo. */ export interface RepoFactoryInterface { - /** The AirSync event that drives the extraction and supplies platform credentials. */ event: AirSyncEvent; - /** The item type the repo buffers and uploads. */ itemType: string; - /** Optional normalizer turning a raw record into a NormalizedItem or NormalizedAttachment. */ normalize?: (record: object) => NormalizedItem | NormalizedAttachment; - /** Callback invoked with each Artifact once it has been uploaded. */ onUpload: (artifact: Artifact) => void; - /** Optional worker adapter options that override defaults (e.g. batch size). 
*/ options?: WorkerAdapterOptions; } /** - * An external system item after normalization into the shape AirSync expects. - * - * Used as the uploaded representation of a non-attachment record. + * NormalizedItem is an interface of item after normalization. */ export interface NormalizedItem { - /** External system identifier of the item. */ id: string; - /** ISO timestamp of when the item was created in the external system. */ created_date: string; - /** ISO timestamp of when the item was last modified in the external system. */ modified_date: string; - /** Normalized field values of the item. */ data: object; } /** - * An external system attachment after normalization into the shape AirSync expects. - * - * Used as the uploaded metadata for an attachment whose binary is streamed separately. + * NormalizedAttachment is an interface of attachment after normalization. */ export interface NormalizedAttachment { - /** Source URL the attachment binary can be downloaded from. */ url: string; - /** External system identifier of the attachment. */ id: string; - /** Name of the attached file. */ file_name: string; - /** External system identifier of the item the attachment belongs to. */ parent_id: string; - /** Optional external system identifier of the attachment's author. */ author_id?: string; - /** Whether the attachment is embedded inline (e.g. in rich text) rather than a standalone file. */ inline?: boolean; - /** Optional MIME type of the attachment. */ content_type?: string; // This should be a string, but it was a number in the past. Due to backwards // compatibility we are keeping it also as a number. - /** Optional external identifier of the parent's parent; kept as number for backwards compatibility. */ grand_parent_id?: number | string; } /** - * A raw, un-normalized record extracted from the external system. - * - * Used as the input to a repo's normalize function before items are uploaded. + * Item is an interface that defines the structure of an item. 
*/ // eslint-disable-next-line @typescript-eslint/no-explicit-any export type Item = Record; diff --git a/src/repo/repo.ts b/src/repo/repo.ts index e694b58..d119eea 100644 --- a/src/repo/repo.ts +++ b/src/repo/repo.ts @@ -16,12 +16,6 @@ import { RepoFactoryInterface, } from './repo.interfaces'; -/** - * In-memory buffer that accumulates normalized items of a single item type during extraction. - * - * Used to batch pushed items (ARTIFACT_BATCH_SIZE per batch), normalize them, and upload them as - * artifacts to the DevRev platform, firing the onUpload callback for each uploaded artifact. - */ export class Repo { readonly itemType: string; private items: (NormalizedItem | NormalizedAttachment | Item)[]; @@ -47,20 +41,10 @@ export class Repo { this.uploadedArtifacts = []; } - /** Returns the items currently buffered in the repo (not yet uploaded). */ getItems(): (NormalizedItem | NormalizedAttachment | Item)[] { return this.items; } - /** - * Uploads a batch of items (or all buffered items) as a single artifact. - * - * Used to flush buffered items to the DevRev platform; on success the artifact is passed to - * onUpload and recorded in uploadedArtifacts. When no explicit batch is given the buffer is cleared. - * - * @param batch - Optional explicit array of NormalizedItem, NormalizedAttachment, or Item to upload; defaults to all buffered items. - * @returns Promise that resolves to void on success, or an ErrorRecord describing the upload failure. - */ async upload( batch?: (NormalizedItem | NormalizedAttachment | Item)[] ): Promise { @@ -100,16 +84,6 @@ export class Repo { } } - /** - * Normalizes and buffers items, uploading full batches as they accumulate. - * - * Used by connectors to feed extracted items into the repo; items are normalized (unless the item - * type is external domain metadata or SSOR attachments) and any complete batches of batchSize are - * uploaded immediately, leaving the remainder buffered for a later flush. 
- * - * @param items - Array of raw Item records to normalize and buffer. - * @returns Promise that resolves to true when items were buffered/uploaded successfully, or false if a batch upload threw. - */ async push(items: Item[]): Promise { let recordsToPush: (NormalizedItem | NormalizedAttachment | Item)[]; diff --git a/src/state/base-state.ts b/src/state/base-state.ts index 1b860e8..06db895 100644 --- a/src/state/base-state.ts +++ b/src/state/base-state.ts @@ -17,11 +17,10 @@ import { } from './state.interfaces'; /** - * Abstract base owning the adapter state lifecycle shared by every sync mode. + * BaseState owns the state lifecycle shared by every sync mode: connector vs. + * SDK state separation, fetch/init/post against the platform, the v1->v2 + * migration shim, and the snap-in-version-gated initial domain mapping install. * - * Used to keep connector-owned state separate from SDK bookkeeping, fetch/init/ - * post the persisted state against the platform, run the v1->v2 migration shim, - * and install the initial domain mapping gated on the snap-in version. * Mode-specific subclasses (`ExtractionState`, `LoadingState`) seed the * SDK-owned portion of the state and add mode-specific setup in their factories. * @@ -70,21 +69,15 @@ export abstract class BaseState { this._sdkState = value; } - /** The per-sync-unit extraction scope (object types to extract), loaded alongside state. */ get extractionScope(): ExtractionScope { return this._extractionScope; } /** - * Installs the initial domain mapping when the version in state is stale. - * - * Used by all modes (so a loading run still installs the mapping if extraction - * has not) to (re)install whenever `sdkState.snapInVersionId` is absent or - * differs from the event context's snap-in version; on success the new version - * is recorded in state. A missing mapping or install error fails the worker. 
- * - * @param initialDomainMapping - The initial domain mapping of type InitialDomainMapping passed to the spawn function; required when an install is needed - * @returns Promise that resolves once the mapping is installed or the install is skipped + * Installs the initial domain mapping when the snap-in version in state does + * not match the version in the event context. Shared by all modes so that a + * loading run still installs the mapping if extraction has not done so. + * @param initialDomainMapping The initial domain mapping passed to spawn */ async installInitialDomainMappingIfNeeded( initialDomainMapping?: InitialDomainMapping @@ -129,17 +122,13 @@ export abstract class BaseState { } /** - * Initializes this adapter's state from persisted state, or seeds it on first run. + * Initializes the state for this adapter instance by fetching from API + * or creating an initial state if none exists (404). * - * Used at worker start to load and normalize state: it fetches the persisted - * blob, parses it, and runs `normalizeFetchedState` so both the v2 - * `{ connectorState, sdkState }` envelope and a legacy flat v1 blob are - * accepted (the latter migrated on read). It also restores the extraction - * scope. On a 404 it seeds the initial state and persists the v2 envelope; - * any other failure fails the worker. - * - * @param initialState - The initial connector state of type ConnectorState provided by the spawn function, used when no state exists yet - * @returns Promise that resolves once state has been loaded or seeded + * Reads both the v2 `{ connectorState, sdkState }` envelope and a legacy flat + * v1 blob (connector keys merged with SDK keys), migrating the latter on read. + * Always persists the v2 envelope going forward. 
+ * @param initialState The initial connector state provided by the spawn function */ async init(initialState: ConnectorState): Promise { try { @@ -192,20 +181,13 @@ export abstract class BaseState { } /** - * Normalizes parsed on-disk state into the `{ connectorState, sdkState }` envelope, migrating legacy v1 state. - * - * Used as the v1->v2 migration shim so older snap-ins keep working after the - * state split. Behavior by shape of the parsed input: - * - v2 envelope (`{ connectorState, sdkState }`): used as-is, with `sdkState` - * merged over the mode's initial SDK state to backfill newly added fields. - * - Legacy v1 flat blob: top-level keys present in `V1_SDK_STATE_KEYS` are - * split into `sdkState`, everything else becomes connector state. - * - Malformed envelope (one of `connectorState`/`sdkState` present, the other - * missing) or non-object input: throws. + * Normalizes a parsed on-disk state into the `{ connectorState, sdkState }` + * envelope, migrating a legacy flat v1 blob if needed. * - * @param parsed - The JSON-parsed persisted state of unknown shape (v2 envelope or legacy v1 flat blob) - * @returns The split state as `{ connectorState, sdkState }`, with `sdkState` merged over the initial SDK state - * @throws Error when the input is not an object or is a malformed envelope + * - v2 envelope (`{ connectorState, sdkState }`): used as-is. + * - v1 flat blob: SDK-owned keys (`V1_SDK_STATE_KEYS`) split into `sdkState`, + * everything else becomes connector state. + * - Malformed envelope (one side present, the other missing) fails loud. */ private normalizeFetchedState(parsed: unknown): { connectorState: ConnectorState; @@ -249,14 +231,9 @@ export abstract class BaseState { } /** - * Persists the adapter state to the platform. - * - * Used to checkpoint progress: wraps the current connector and SDK state into - * the v2 `{ connectorState, sdkState }` envelope, serializes it, and posts it. - * A serialization or request failure fails the worker. 
- * - * @param state - Optional connector state of type ConnectorState to set and persist; when omitted the current `this.state` is used - * @returns Promise that resolves once the state has been persisted + * Updates the state of the adapter by posting to API. + * Persists the v2 `{ connectorState, sdkState }` envelope. + * @param {object} state - The connector state to be updated */ async postState(state?: ConnectorState) { const url = this.workerUrl + '.update'; @@ -319,12 +296,8 @@ export abstract class BaseState { } /** - * Fetches the raw persisted adapter state from the platform. - * - * Used by `init` to read the stored state before normalization; returns the - * raw, still-stringified payload without parsing or migrating it. - * - * @returns Promise resolving to `{ state, objects }`, where `state` is the stringified state blob and `objects` is the optional stringified extraction scope + * Fetches the state of the adapter from API. + * @return The raw state data from API */ async fetchState(): Promise<{ state: string; objects?: string }> { console.log( diff --git a/src/state/extraction-state.ts b/src/state/extraction-state.ts index 49bd185..b78dbbc 100644 --- a/src/state/extraction-state.ts +++ b/src/state/extraction-state.ts @@ -10,13 +10,10 @@ import { BaseState } from './base-state'; import { extractionSdkState, StateInterface } from './state.interfaces'; /** - * Per-mode adapter state for extraction workers. - * - * Used to seed the extraction SDK state (extraction-window boundaries + - * attachments bookkeeping) on top of the shared lifecycle provided by - * `BaseState`, and to resolve the incremental extraction window for each event. - * - * @typeParam ConnectorState - the connector-owned state shape + * ExtractionState is the per-mode state for extraction workers. It seeds the + * extraction SDK state (extraction boundaries + attachments bookkeeping) on top + * of the shared lifecycle provided by `BaseState` and adds extraction-window + * resolution. 
*/ export class ExtractionState extends BaseState { constructor(params: StateInterface) { @@ -24,19 +21,13 @@ export class ExtractionState extends BaseState { } /** - * Computes the incremental extraction window and writes `extract_from`/`extract_to` onto the event context. + * Resolves the extraction window onto the event context. * - * Used so every extraction phase shares one consistent time window, read from - * the SDK-owned boundary fields in `this.sdkState`. Behavior by event type: - * - StartExtractingData: stamp `lastSyncStarted` if not already set. - * - StartExtractingMetadata: resolve fresh from the TimeValue objects in the - * event context and cache them as pending boundaries (always overwrite). - * - All other events: reuse the pending boundaries cached during - * StartExtractingMetadata. - * Finally validates that `extract_from` is older than `extract_to`, failing - * the worker if the platform supplied an inverted window. - * - * @returns void; mutates the event context and `this.sdkState` in place + * On StartExtractingData: stamp `lastSyncStarted` if not already set. + * On StartExtractingMetadata: resolve fresh from the TimeValue objects in the + * event context and cache them as pending boundaries (always overwrite). + * On all other events: reuse the pending boundaries cached during + * StartExtractingMetadata. Finally, validate that extract_from < extract_to. */ resolveExtractionWindow(): void { const sdkState = this.sdkState; @@ -131,14 +122,9 @@ export class ExtractionState extends BaseState { /** * Creates and initializes an `ExtractionState` for an extraction worker. * - * Used by the state dispatcher to build extraction-mode state. 
The initial state - * is deep-cloned to avoid mutating the caller's object; for non-stateless events - * this fetches persisted state, installs the initial domain mapping if the - * snap-in version changed, then resolves the extraction window (time-value - * resolution + pending boundary reuse) and validates it. - * - * @param params - The state factory parameters of type StateInterface (event, initial connector state, optional domain mapping and worker options) - * @returns Promise resolving to the initialized ExtractionState + * For non-stateless events this fetches persisted state, installs the initial + * domain mapping if the snap-in version changed, then resolves the extraction + * window (time-value resolution + pending boundary reuse) and validates it. */ export async function createExtractionState({ event, diff --git a/src/state/loading-state.ts b/src/state/loading-state.ts index 5b03ecb..151f5bd 100644 --- a/src/state/loading-state.ts +++ b/src/state/loading-state.ts @@ -4,13 +4,9 @@ import { BaseState } from './base-state'; import { loadingSdkState, StateInterface } from './state.interfaces'; /** - * Per-mode adapter state for loading workers. - * - * Used to seed the loading SDK state (files-to-load bookkeeping) on top of the - * shared lifecycle provided by `BaseState`. Loading has no extraction-window - * resolution. - * - * @typeParam ConnectorState - the connector-owned state shape + * LoadingState is the per-mode state for loading workers. It seeds the loading + * SDK state (files-to-load bookkeeping) on top of the shared lifecycle provided + * by `BaseState`. Loading has no extraction-window resolution. */ export class LoadingState extends BaseState { constructor(params: StateInterface) { @@ -21,13 +17,8 @@ export class LoadingState extends BaseState { /** * Creates and initializes a `LoadingState` for a loading worker. * - * Used by the state dispatcher to build loading-mode state. 
The initial state is - * deep-cloned to avoid mutating the caller's object; for non-stateless events - * this fetches persisted state and installs the initial domain mapping if the - * snap-in version changed. - * - * @param params - The state factory parameters of type StateInterface (event, initial connector state, optional domain mapping and worker options) - * @returns Promise resolving to the initialized LoadingState + * For non-stateless events this fetches persisted state and installs the + * initial domain mapping if the snap-in version changed. */ export async function createLoadingState({ event, diff --git a/src/state/state.interfaces.ts b/src/state/state.interfaces.ts index e3c45ca..9e10577 100644 --- a/src/state/state.interfaces.ts +++ b/src/state/state.interfaces.ts @@ -3,13 +3,6 @@ import { AirSyncEvent } from '../types/extraction'; import { FileToLoad } from '../types/loading'; import { WorkerAdapterOptions } from '../types/workers'; -/** - * The SDK-owned portion of the persisted adapter state. - * - * Used to hold bookkeeping the SDK manages itself (extraction-window boundaries, - * attachments/files progress, installed snap-in version) separately from - * connector-owned state, so SDK internals never collide with connector keys. - */ export interface SdkState { /** * @deprecated Use extract_from and extract_to from the event context instead, @@ -27,15 +20,12 @@ export interface SdkState { /** The pending (not yet committed) newest extraction boundary (ISO 8601 timestamp). * Set on StartExtractingMetadata, reused across subsequent phases, cleared on AttachmentExtractionDone. */ pendingWorkersNewest?: string; - /** The committed oldest point of extraction (ISO 8601 timestamp). */ + /** The oldest point of extraction (ISO 8601 timestamp). */ workersOldest?: string; - /** The committed newest point of extraction (ISO 8601 timestamp). */ + /** The newest point of extraction (ISO 8601 timestamp). 
*/ workersNewest?: string; - /** Attachments-extraction bookkeeping (artifact ids, progress cursor). Extraction mode only. */ toDevRev?: ToDevRev; - /** Loading bookkeeping (files still to load into DevRev). Loading mode only. */ fromDevRev?: FromDevRev; - /** The snap-in version id whose initial domain mapping is installed; drives reinstall on change. */ snapInVersionId?: string; } @@ -59,12 +49,6 @@ export interface AdapterStateEnvelope { sdkState: SdkState; } -/** - * SDK-owned attachments-extraction state (external system -> DevRev direction). - * - * Used to track which attachment artifacts have been streamed and how far the - * attachments phase has progressed so it can resume after a timeout. - */ export interface ToDevRev { attachmentsMetadata: { artifactIds: string[]; @@ -81,23 +65,10 @@ export interface ProcessedAttachment { parent_id: string; } -/** - * SDK-owned loading state (DevRev -> external system direction). - * - * Used to track which files still need to be loaded into the external system so - * the loading phase can resume after a timeout. - */ export interface FromDevRev { filesToLoad: FileToLoad[]; } -/** - * Constructor/factory parameters for building an adapter state instance. - * - * Used by `createAdapterState` and the per-mode factories to carry the AirSync - * event, the connector's seed state, and the optional initial domain mapping and - * worker options. - */ export interface StateInterface { event: AirSyncEvent; initialState: ConnectorState; @@ -105,12 +76,6 @@ export interface StateInterface { options?: WorkerAdapterOptions; } -/** - * The initial SDK state seeded for extraction-mode workers. - * - * Used by `ExtractionState` as the baseline `sdkState` (extraction-window - * boundaries plus attachments bookkeeping) before any persisted state is merged in. 
- */ export const extractionSdkState = { lastSyncStarted: '', lastSuccessfulSyncStarted: '', @@ -128,12 +93,6 @@ export const extractionSdkState = { }, }; -/** - * The initial SDK state seeded for loading-mode workers. - * - * Used by `LoadingState` as the baseline `sdkState` (files-to-load bookkeeping) - * before any persisted state is merged in. - */ export const loadingSdkState = { snapInVersionId: '', fromDevRev: { diff --git a/src/state/state.ts b/src/state/state.ts index bcb48dc..ab7dbb5 100644 --- a/src/state/state.ts +++ b/src/state/state.ts @@ -10,13 +10,11 @@ export { ExtractionState, createExtractionState } from './extraction-state'; export { LoadingState, createLoadingState } from './loading-state'; /** - * Creates and initializes the adapter state for the current worker. + * Creates and initializes the adapter state for the current worker, dispatching + * to the extraction or loading state based on the event's sync mode. * - * Used as the single entry point that dispatches to either `createLoadingState` - * or `createExtractionState` based on `event.payload.event_context.mode`. - * - * @param params - The state factory parameters of type StateInterface (event, initial state, optional domain mapping and worker options) - * @returns Promise resolving to the initialized mode-specific state (LoadingState when mode is LOADING, otherwise ExtractionState) + * @param params The state factory parameters (event, initial state, options) + * @returns The initialized mode-specific state */ export async function createAdapterState( params: StateInterface diff --git a/src/types/index.ts b/src/types/index.ts index 9be77e3..f98c725 100644 --- a/src/types/index.ts +++ b/src/types/index.ts @@ -1,11 +1,3 @@ -/** - * Public types barrel for the SDK. 
- * - * Aggregates and re-exports the commonly used types across the SDK domains — common, extraction, - * loading, repo, state, uploader, mappers, and external domain metadata — so consumers can import - * them from a single entry point. - */ - // Common export { AdapterUpdateParams, diff --git a/src/types/loading.ts b/src/types/loading.ts index 7c58c46..37d2d30 100644 --- a/src/types/loading.ts +++ b/src/types/loading.ts @@ -2,50 +2,22 @@ import { Mappers } from '../mappers/mappers'; import { ErrorRecord } from './common'; import { AirSyncEvent } from './extraction'; -/** - * Describes a single prepared data file as listed in the loading stats manifest. - * - * Used during loading to enumerate the artifact files produced by extraction, along with their - * item type and record count, so the loader knows what is available to process. - */ export interface StatsFileObject { - /** Identifier of the artifact/file. */ id: string; - /** External item type contained in the file (e.g. the record type being loaded). */ item_type: string; - /** Name of the file. */ file_name: string; - /** Number of records in the file, as a string. */ count: string; } -/** - * Loader-side view of a file to be loaded, tracking its processing progress. - * - * Used to drive and resume loading of a single data file: it records how many lines exist, the next - * line to process, and whether the file has been fully consumed. - */ export interface FileToLoad { - /** Identifier of the artifact/file. */ id: string; - /** Name of the file. */ file_name: string; - /** External item type contained in the file. */ itemType: string; - /** Total number of records in the file. */ count: number; - /** Index of the next line/record to process; used to resume loading across batches. */ lineToProcess: number; - /** Whether all records in the file have been loaded. */ completed: boolean; } -/** - * An attachment to be loaded into the external system, with its source metadata and parent links. 
- * - * Used by attachment loading to describe a single file (location, type, size, validity window, - * audit fields) and the DevRev/external parent it belongs to. - */ export interface ExternalSystemAttachment { reference_id: DonV2; parent_type: string; @@ -63,28 +35,13 @@ export interface ExternalSystemAttachment { grand_parent_id?: string; } -/** - * A single item to be loaded into the external system. - * - * Used during loading to carry the DevRev (and optional external) identifiers, audit timestamps, - * and the system-specific payload for one record. - * - * Note: this interface is declared twice in this file with identical members (TypeScript merges - * the declarations); the duplicate is redundant — see report. - */ export interface ExternalSystemItem { - /** Identifiers linking this item to DevRev and, when known, the external system. */ id: { - /** DevRev object identifier (DON). */ devrev: DonV2; - /** External system identifier, present once the item exists in the external system. */ external?: string; }; - /** Creation timestamp of the item. */ created_date: string; - /** Last-modified timestamp of the item. */ modified_date: string; - /** System-specific record payload. */ // eslint-disable-next-line @typescript-eslint/no-explicit-any data: any; } @@ -100,195 +57,84 @@ export interface ExternalSystemItem { data: any; } -/** - * Arguments passed to an external-system loading function for a single item. - * - * Used to give create/update handlers the item to load, the ID mappers for resolving DevRev <-> external - * references, and the current AirSync event for auth/context. - * - * @typeParam Type - The shape of the item being loaded. - */ export interface ExternalSystemItemLoadingParams { item: Type; mappers: Mappers; event: AirSyncEvent; } -/** - * Result returned by an external-system loading function for a single item. 
- * - * Used to report the outcome of a create/update: the resulting external id, an error message, - * the item's modified date, or a delay (in seconds) when the external system is rate limiting. - */ export interface ExternalSystemItemLoadingResponse { - /** External system id of the loaded item, when the operation succeeded. */ id?: string; - /** Error message when the operation failed. */ error?: string; - /** Modified timestamp reported by the external system after the operation. */ modifiedDate?: string; - /** Suggested delay in seconds before retrying, set when rate limited. */ delay?: number; } -/** - * Record of an item that was loaded into the external system. - * - * Used to persist the outcome of a load (external id, error, modified date) for reporting and - * subsequent runs. - */ export interface ExternalSystemItemLoadedItem { - /** External system id of the loaded item. */ id?: string; - /** Error message if loading the item failed. */ error?: string; - /** Modified timestamp reported by the external system. */ modifiedDate?: string; } -/** - * A handler that loads a single item into the external system. - * - * Used to implement the create or update behavior for an item type; receives the item, ID mappers, - * and event, and resolves with the loading outcome. - * - * @typeParam Item - The shape of the item to load. - * @returns Promise resolving to the ExternalSystemItemLoadingResponse for the item. - */ export type ExternalSystemLoadingFunction = ({ item, mappers, event, }: ExternalSystemItemLoadingParams) => Promise; -/** - * Registration of an item type and the functions that load it. - * - * Used to tell the loader, for a given external item type, how to create and update records in the - * external system. - */ export interface ItemTypeToLoad { - /** External item type these handlers apply to. */ itemType: string; - /** Handler that creates a new record in the external system. 
*/ create: ExternalSystemLoadingFunction; - /** Handler that updates an existing record in the external system. */ update: ExternalSystemLoadingFunction; // requiresSecondPass: boolean; } -/** - * Parameters bundling the full set of item-type loaders for a loading run. - * - * Used to pass the configured list of loadable item types into the loading entry point. - */ export interface ItemTypesToLoadParams { - /** The item types to load, each with its create/update handlers. */ itemTypesToLoad: ItemTypeToLoad[]; } -/** - * Per-item-type counters summarizing the outcome of a loading run. - * - * Used to report, for one item type, how many records were created/updated/skipped/deleted/failed. - */ export interface LoaderReport { - /** External item type this report covers. */ item_type: string; - /** Number of records created. */ [ActionType.CREATED]?: number; - /** Number of records updated. */ [ActionType.UPDATED]?: number; - /** Number of records skipped (no-op). */ [ActionType.SKIPPED]?: number; - /** Number of records deleted. */ [ActionType.DELETED]?: number; - /** Number of records that failed to load. */ [ActionType.FAILED]?: number; } -/** - * Signals that the external system is rate limiting and loading should pause. - * - * Used to propagate a back-off duration from a loading function up to the loader. - */ export interface RateLimited { - /** Number of seconds to wait before resuming. */ delay: number; } -/** - * Result of loading a single item, capturing success report, error, or rate-limit signal. - * - * Used internally by the loader to aggregate per-item outcomes. - */ export interface LoadItemResponse { - /** Error record when the item could not be loaded. */ error?: ErrorRecord; - /** Per-type counters contributed by this item. */ report?: LoaderReport; - /** Rate-limit signal when the external system is throttling. */ rateLimit?: RateLimited; } -/** - * Aggregate result of loading one or more item types. 
- * - * Used to return the per-type reports and the list of processed files at the end of a loading phase. - */ export interface LoadItemTypesResponse { - /** Per-item-type loading reports. */ reports: LoaderReport[]; - /** Names of the data files that were processed. */ processed_files: string[]; } -/** - * The kinds of actions a loader can perform on a record, used as report counter keys. - * - * Used to key {@link LoaderReport} counters and to classify the outcome of each loaded item. - */ export enum ActionType { - /** A new record was created in the external system. */ CREATED = 'created', - /** An existing record was updated. */ UPDATED = 'updated', - /** The record required no change. */ SKIPPED = 'skipped', - /** The record was deleted. */ DELETED = 'deleted', - /** Loading the record failed. */ FAILED = 'failed', } -/** A DevRev object identifier (DON), represented as a string. */ export type DonV2 = string; -/** - * A sync mapper record linking external and DevRev identifiers for one mapping. - * - * Used to track the correspondence between external ids, secondary ids, and DevRev ids, along with - * status, for sync operations. - */ export type SyncMapperRecord = { - /** External system identifiers for the mapped item. */ external_ids: string[]; - /** Secondary external identifiers (e.g. alternate keys). */ secondary_ids: string[]; - /** DevRev object identifiers for the mapped item. */ devrev_ids: string[]; - /** Status values associated with the mapping. */ status: string[]; - /** Input file the record was sourced from, when applicable. */ input_file?: string; }; -/** - * Outgoing event types reported by the loading phases. - * - * Used as the event_type when a loader emits control messages for data loading, attachment loading, - * and loader-state deletion (progress / delayed / done / error), plus a fallback for unrecognized events. 
- */ export enum LoaderEventType { DataLoadingProgress = 'DATA_LOADING_PROGRESS', DataLoadingDelayed = 'DATA_LOADING_DELAYED', diff --git a/src/uploader/uploader.helpers.ts b/src/uploader/uploader.helpers.ts index 85087db..ad389a5 100644 --- a/src/uploader/uploader.helpers.ts +++ b/src/uploader/uploader.helpers.ts @@ -9,12 +9,9 @@ import { import { UploaderResult } from './uploader.interfaces'; /** - * Compresses a JSONL string using gzip. - * - * Used to shrink a serialized JSONL batch before uploading it as an artifact. - * - * @param jsonlObject - The JSONL string to compress. - * @returns An UploaderResult wrapping the gzipped Buffer, or an error on failure. + * Compresses a JSONL string using gzip compression. + * @param {string} jsonlObject - The JSONL string to compress + * @returns {Buffer | void} The compressed buffer or undefined on error */ export function compressGzip(jsonlObject: string): UploaderResult { try { @@ -25,12 +22,9 @@ export function compressGzip(jsonlObject: string): UploaderResult { } /** - * Decompresses a gzipped buffer back into a JSONL string. - * - * Used to restore a downloaded gzipped artifact before parsing it. - * - * @param gzippedJsonlObject - The gzipped Buffer to decompress. - * @returns An UploaderResult wrapping the decompressed JSONL string, or an error on failure. + * Decompresses a gzipped buffer to a JSONL string. + * @param {Buffer} gzippedJsonlObject - The gzipped buffer to decompress + * @returns {string | void} The decompressed JSONL string or undefined on error */ export function decompressGzip( gzippedJsonlObject: Buffer @@ -45,11 +39,8 @@ export function decompressGzip( /** * Parses a JSONL string into an array of objects. - * - * Used to turn a decompressed artifact into usable records. - * - * @param jsonlObject - The JSONL string to parse. - * @returns An UploaderResult wrapping the parsed object array, or an error on failure. 
+ * @param {string} jsonlObject - The JSONL string to parse + * @returns {object[] | null} The parsed array of objects or null on error */ export function parseJsonl(jsonlObject: string): UploaderResult { try { @@ -60,14 +51,10 @@ export function parseJsonl(jsonlObject: string): UploaderResult { } /** - * Writes fetched objects to the local file system for local development. - * - * Used to inspect extracted data on disk instead of uploading it when running locally; writes a - * timestamped JSON/JSONL file under the `extracted_files` directory. - * - * @param itemType - The string item type, used to name the output file and pick its extension. - * @param fetchedObjects - The object or array of objects to write, one JSON record per line. - * @returns Promise that resolves once the file is written, or rejects on a write error. + * Downloads fetched objects to the local file system (for local development). + * @param {string} itemType - The type of items being downloaded + * @param {object | object[]} fetchedObjects - The objects to write to file + * @returns {Promise} Resolves when the file is written or rejects on error */ export async function downloadToLocal( itemType: string, @@ -103,13 +90,9 @@ export async function downloadToLocal( } /** - * Truncates a filename that exceeds the platform's maximum length. - * - * Used before requesting an upload URL so the registered file name stays within DevRev limits, - * preserving the extension and inserting an ellipsis in the middle. - * - * @param filename - The string filename to truncate. - * @returns The original filename if within the limit, otherwise a truncated `name...ext` string. + * Truncates a filename if it exceeds the maximum allowed length. + * @param {string} filename - The filename to truncate + * @returns {string} The truncated filename */ export function truncateFilename(filename: string): string { // If the filename is already within the limit, return it as is. 
diff --git a/src/uploader/uploader.interfaces.ts b/src/uploader/uploader.interfaces.ts index ee8027c..2224bed 100644 --- a/src/uploader/uploader.interfaces.ts +++ b/src/uploader/uploader.interfaces.ts @@ -3,15 +3,8 @@ import { AirSyncEvent } from '../types/extraction'; import { ExternalSystemItem, StatsFileObject } from '../types/loading'; import { WorkerAdapterOptions } from '../types/workers'; -/** - * Construction parameters used to create an Uploader instance. - * - * Used to supply the triggering event (platform endpoint, token, request id) and optional adapter options. - */ export interface UploaderFactoryInterface { - /** The AirSync event supplying the DevRev endpoint, service account token, and request id. */ event: AirSyncEvent; - /** Optional worker adapter options (e.g. local development and skip-confirmation flags). */ options?: WorkerAdapterOptions; } @@ -97,26 +90,12 @@ export interface SsorAttachment { inline?: boolean; } -/** - * Result of fetching and parsing a loading stats file artifact. - * - * Used to return the per-item-type stats produced by the loading phase, or an error if it could not be read. - */ export interface StatsFileResponse { - /** Error describing why the stats file could not be retrieved or parsed. */ error?: ErrorRecord; - /** Parsed stats file entries, one per item type. */ statsFile?: StatsFileObject[]; } -/** - * Result of fetching and parsing a transformer file artifact. - * - * Used to return the transformed external system items to be loaded into DevRev, or an error if it could not be read. - */ export interface TransformerFileResponse { - /** Error describing why the transformer file could not be retrieved or parsed. */ error?: ErrorRecord; - /** Parsed external system items to load. 
*/ transformerFile?: ExternalSystemItem[]; } diff --git a/src/uploader/uploader.ts b/src/uploader/uploader.ts index 29ae52c..fa9833a 100644 --- a/src/uploader/uploader.ts +++ b/src/uploader/uploader.ts @@ -22,12 +22,6 @@ import { UploaderResult, } from './uploader.interfaces'; -/** - * Uploads extraction artifacts to the DevRev platform and reads them back. - * - * Used to compress and upload JSON batches and streamed attachment binaries, obtain upload/download - * URLs, confirm uploads, and download and parse previously uploaded artifacts during sync. - */ export class Uploader { private isLocalDevelopment?: boolean; private devrevApiEndpoint: string; @@ -48,14 +42,10 @@ export class Uploader { } /** - * Uploads fetched objects to the DevRev platform as a single artifact. - * - * Used to compress the objects into a gzipped JSONL file, request an upload URL, push the file, and - * (unless skipped) confirm the upload, returning the resulting artifact descriptor. - * - * @param itemType - The string item type of the objects being uploaded. - * @param fetchedObjects - The object or array of objects to upload. - * @returns Promise resolving to an UploadResponse with the artifact descriptor, or an error message on failure. + * Uploads the fetched objects to the DevRev platform. Fetched objects are compressed to a gzipped jsonl object and uploaded to the platform. + * @param {string} itemType - The type of the item to be uploaded + * @param {object[] | object} fetchedObjects - The objects to be uploaded + * @returns {Promise} - The response object containing the artifact information or error information if there was an error */ async upload( itemType: string, @@ -137,14 +127,11 @@ export class Uploader { } /** - * Requests a pre-signed upload URL and form data for a new artifact. - * - * Used before uploading or streaming a file so the binary can be POSTed to the returned URL. 
- * - * @param filename - The string file name to register (truncated if it exceeds the platform limit). - * @param fileType - The string MIME type of the file. - * @param fileSize - Optional number of bytes; rejected if 0 or less. - * @returns Promise resolving to an UploaderResult wrapping the ArtifactToUpload, or an error on failure. + * Gets the upload URL for an artifact from the DevRev API. + * @param {string} filename - The name of the file to upload + * @param {string} fileType - The MIME type of the file + * @param {number} [fileSize] - Optional file size in bytes + * @returns {Promise} The artifact upload information or undefined on error */ async getArtifactUploadUrl( filename: string, @@ -178,13 +165,10 @@ export class Uploader { } /** - * Uploads an in-memory file buffer to a pre-signed artifact upload URL. - * - * Used to push a fully buffered artifact (e.g. a compressed JSON batch) as multipart form data. - * - * @param artifact - The ArtifactToUpload descriptor holding the upload URL and form fields. - * @param file - The Buffer containing the file contents to upload. - * @returns Promise resolving to an UploaderResult wrapping the AxiosResponse, or an error on failure. + * Uploads an artifact file to the provided upload URL using multipart form data. + * @param {ArtifactToUpload} artifact - The artifact upload information containing upload URL and form data + * @param {Buffer} file - The file buffer to upload + * @returns {Promise} The axios response or undefined on error */ async uploadArtifact( artifact: ArtifactToUpload, @@ -209,14 +193,10 @@ export class Uploader { } /** - * Streams a file directly from a source response into a pre-signed artifact upload URL. - * - * Used to upload attachment binaries without buffering them in memory; falls back to the max - * artifact size for Content-Length when the source omits it, and always destroys the source stream. 
- * - * @param artifact - The ArtifactToUpload descriptor holding the upload URL and form fields. - * @param fileStream - The AxiosResponse whose data stream supplies the file contents. - * @returns Promise resolving to an UploaderResult wrapping the AxiosResponse, or an error on failure. + * Streams an artifact file from an axios response to the upload URL. + * @param {ArtifactToUpload} artifact - The artifact upload information containing upload URL and form data + * @param {AxiosResponse} fileStream - The axios response stream containing the file data + * @returns {Promise} The axios response or undefined on error */ async streamArtifact( artifact: ArtifactToUpload, @@ -252,12 +232,9 @@ export class Uploader { } /** - * Confirms with the platform that an artifact upload has finished. - * - * Used after pushing the binary so the platform finalizes and accepts the artifact. - * - * @param artifactId - The string ID of the uploaded artifact to confirm. - * @returns Promise resolving to an object with the AxiosResponse on a 2xx, or an error otherwise. + * Confirms that an artifact upload has been completed successfully. + * @param {string} artifactId - The ID of the artifact to confirm + * @returns {Promise} The axios response or undefined on error */ async confirmArtifactUpload(artifactId: string): Promise<{ response?: AxiosResponse; @@ -296,11 +273,8 @@ export class Uploader { } /** - * Destroys a source stream to prevent resource leaks after streaming an artifact. - * - * Used internally by streamArtifact to close the AxiosResponse data stream on both success and error. - * - * @param fileStream - The AxiosResponse whose underlying data stream should be destroyed/closed. + * Destroys a stream to prevent resource leaks. 
+ * @param {any} fileStream - The axios response stream to destroy */ private destroyStream(fileStream: AxiosResponse): void { try { @@ -318,13 +292,10 @@ export class Uploader { } /** - * Downloads an attachments-metadata artifact and parses it into normalized attachments. - * - * Used during attachment extraction to read back the previously uploaded attachment metadata so its - * binaries can be streamed; resolves the download URL, downloads, gunzips, and parses the JSONL. - * - * @param param0 - Object with `artifact`, the string artifact ID of the attachments-metadata artifact. - * @returns Promise resolving to an object with the NormalizedAttachment array, or an error message on failure. + * Retrieves attachment metadata from an artifact by downloading and parsing it. + * @param {object} param0 - Configuration object + * @param {string} param0.artifact - The artifact ID to download attachments from + * @returns {Promise<{attachments?: NormalizedAttachment[], error?: {message: string}}>} The attachments array or error object */ async getAttachmentsFromArtifactId({ artifact, @@ -393,12 +364,9 @@ export class Uploader { } /** - * Requests a pre-signed download URL for an artifact from the platform. - * - * Used internally before downloading an artifact's contents back from object storage. - * - * @param artifactId - The string ID of the artifact to download. - * @returns Promise resolving to an UploaderResult wrapping the download URL string, or an error on failure. + * Gets the download URL for an artifact from the DevRev API. + * @param {string} artifactId - The ID of the artifact to download + * @returns {Promise} The download URL or undefined on error */ private async getArtifactDownloadUrl( artifactId: string @@ -423,12 +391,9 @@ export class Uploader { } /** - * Downloads an artifact's raw bytes from a pre-signed URL. - * - * Used internally to fetch artifact contents as a Buffer for later decompression and parsing. 
- * - * @param artifactUrl - The string pre-signed URL to download the artifact from. - * @returns Promise resolving to an UploaderResult wrapping the file Buffer, or an error on failure. + * Downloads an artifact file from the given URL. + * @param {string} artifactUrl - The URL to download the artifact from + * @returns {Promise} The artifact file buffer or undefined on error */ private async downloadArtifact( artifactUrl: string @@ -445,12 +410,11 @@ export class Uploader { } /** - * Downloads an artifact by ID and parses its JSONL contents into objects. - * - * Used to read back a previously uploaded JSON batch; optionally gunzips the bytes first. - * - * @param param0 - Object with `artifactId` (string artifact ID) and optional `isGzipped` (boolean, default false) flag. - * @returns Promise resolving to an UploaderResult wrapping the parsed object or object array, or an error on failure. + * Retrieves and parses JSON objects from an artifact by artifact ID. + * @param {object} param0 - Configuration object + * @param {string} param0.artifactId - The artifact ID to download and parse + * @param {boolean} [param0.isGzipped=false] - Whether the artifact is gzipped + * @returns {Promise} The parsed JSON objects or undefined on error */ async getJsonObjectByArtifactId({ artifactId, From 9b2b9758ef0ccb96f5e5281e4d762d79eb62c96f Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 10:47:23 +0200 Subject: [PATCH 21/22] refactor(v2): post-review cleanup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Three independent cleanups from the v2 self-review: 1. Drop the pre-1.15.2 processed-attachments migration. `migrateProcessedAttachments` only upgraded the legacy `string[]` form of `lastProcessedAttachmentsIdsList` to the structured `ProcessedAttachment[]` form. 
That `string[]` format predates v1.15.2 (commit c3aa151, Feb 2026); from v1.15.2 on, state is already written as `ProcessedAttachment[]`, and the list is per-cycle scratch state cleared at the end of each completed attachment phase. v2 assumes on-disk state written by >= v1.15.2 (connectors are on >= 1.16, typically 1.19, before moving to v2), so the shim is dead weight. The call site now just initializes the list to `[]` when absent.

2. Move the `emit` primitive from `common/control-protocol.ts` to `multithreading/emit.ts`. Apart from the `EmitInterface` type, the file's only export is `emit`, which is used only by the multithreading layer (base-adapter, spawn). The move co-locates it with its consumers and names the file after its content. Import paths updated.

3. Remove the dead `createAdapterState` dispatcher from `state.ts`. After C6 split the single entry point into `processExtractionTask`/`processLoadingTask`, `process-task.ts` calls `createExtractionState`/`createLoadingState` directly, so the mode dispatcher had zero callers. `state.ts` is now a thin re-export barrel.

Build green, lint clean.
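For review, here is a minimal TypeScript sketch of the call-site behavior described in item 1. It assumes trimmed-down state shapes: `AttachmentsMetadata` keeps only the two fields used here. `ensureProcessedList` is a hypothetical helper name, not an SDK function. After this change, a missing list defaults to `[]`, and an existing `ProcessedAttachment[]` list is used as-is with no `string[]` upgrade.

```typescript
// Sketch only: trimmed-down, hypothetical versions of the SDK state shapes.
interface ProcessedAttachment {
  id: string;
  parent_id: string;
}

interface AttachmentsMetadata {
  artifactIds: string[];
  lastProcessedAttachmentsIdsList?: ProcessedAttachment[];
}

// Hypothetical helper mirroring the post-cleanup call site: no legacy
// string[] migration, just default the list to [] when it is absent.
function ensureProcessedList(meta: AttachmentsMetadata): ProcessedAttachment[] {
  if (!meta.lastProcessedAttachmentsIdsList) {
    meta.lastProcessedAttachmentsIdsList = [];
  }
  return meta.lastProcessedAttachmentsIdsList;
}

// Usage: state written by >= v1.15.2 has either the structured list or no list.
const fresh: AttachmentsMetadata = { artifactIds: ['artifact-1'] };
const existing: AttachmentsMetadata = {
  artifactIds: ['artifact-1'],
  lastProcessedAttachmentsIdsList: [{ id: 'att-1', parent_id: 'issue-1' }],
};

console.log(ensureProcessedList(fresh).length); // 0
console.log(ensureProcessedList(existing).length); // 1
```

Without the shim, a `string[]` list written before v1.15.2 would pass through unconverted. That is why v2 requires on-disk state written by >= v1.15.2.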
--- .../attachments-streaming-pool.ts | 37 ------------------- src/multithreading/adapters/base-adapter.ts | 2 +- .../emit.ts} | 2 +- src/multithreading/spawn/spawn.ts | 2 +- src/state/state.ts | 23 ------------ 5 files changed, 3 insertions(+), 63 deletions(-) rename src/{common/control-protocol.ts => multithreading/emit.ts} (95%) diff --git a/src/attachments-streaming/attachments-streaming-pool.ts b/src/attachments-streaming/attachments-streaming-pool.ts index 58281e1..59ce364 100644 --- a/src/attachments-streaming/attachments-streaming-pool.ts +++ b/src/attachments-streaming/attachments-streaming-pool.ts @@ -1,6 +1,5 @@ import { sleep } from '../common/helpers'; import { ExtractionAdapter } from '../multithreading/adapters/extraction-adapter'; -import { ProcessedAttachment } from '../state/state.interfaces'; import { ExternalSystemAttachmentStreamingFunction, NormalizedAttachment, @@ -40,35 +39,6 @@ export class AttachmentsStreamingPool { } } - /** - * Migrates processed attachments from the legacy string[] format to the new ProcessedAttachment[] format. 
- * - * @param attachments - The attachments list to migrate (either string[] or ProcessedAttachment[]) - * @returns Migrated array of ProcessedAttachment objects, or empty array if input is invalid - */ - // eslint-disable-next-line @typescript-eslint/no-explicit-any - private migrateProcessedAttachments(attachments: any): ProcessedAttachment[] { - // Handle null/undefined - if (!attachments || !Array.isArray(attachments)) { - return []; - } - - // If already migrated (first element is an object), return as-is - if (attachments.length > 0 && typeof attachments[0] === 'object') { - return attachments as ProcessedAttachment[]; - } - - // Migrate old string[] format - if (attachments.length > 0 && typeof attachments[0] === 'string') { - return attachments.map((it) => ({ - id: it as string, - parent_id: '', - })); - } - - return []; - } - async streamAll(): Promise { console.log( `Starting download of ${this.attachments.length} attachments, streaming ${this.batchSize} at once.` @@ -90,13 +60,6 @@ export class AttachmentsStreamingPool { []; } - // Migrate old processed attachments to the new format. 
- this.adapter.sdkState.toDevRev.attachmentsMetadata.lastProcessedAttachmentsIdsList = - this.migrateProcessedAttachments( - this.adapter.sdkState.toDevRev.attachmentsMetadata - .lastProcessedAttachmentsIdsList - ); - // Start initial batch of promises up to batchSize limit const initialBatchSize = Math.min(this.batchSize, this.attachments.length); const initialPromises = []; diff --git a/src/multithreading/adapters/base-adapter.ts b/src/multithreading/adapters/base-adapter.ts index dec9285..ab36c0e 100644 --- a/src/multithreading/adapters/base-adapter.ts +++ b/src/multithreading/adapters/base-adapter.ts @@ -1,7 +1,7 @@ import { parentPort } from 'node:worker_threads'; import { STATELESS_EVENT_TYPES } from '../../common/constants'; -import { emit } from '../../common/control-protocol'; +import { emit } from '../emit'; import { truncateMessage } from '../../common/helpers'; import { serializeError } from '../../logger/logger'; import { runWithSdkLogContext } from '../../logger/logger.context'; diff --git a/src/common/control-protocol.ts b/src/multithreading/emit.ts similarity index 95% rename from src/common/control-protocol.ts rename to src/multithreading/emit.ts index 0d13be1..9296d87 100644 --- a/src/common/control-protocol.ts +++ b/src/multithreading/emit.ts @@ -8,7 +8,7 @@ import { LoaderEvent, } from '../types/extraction'; import { LoaderEventType } from '../types/loading'; -import { LIBRARY_VERSION } from './constants'; +import { LIBRARY_VERSION } from '../common/constants'; export interface EmitInterface { event: AirSyncEvent; diff --git a/src/multithreading/spawn/spawn.ts b/src/multithreading/spawn/spawn.ts index 02176ff..064d82c 100644 --- a/src/multithreading/spawn/spawn.ts +++ b/src/multithreading/spawn/spawn.ts @@ -1,7 +1,7 @@ import yargs from 'yargs'; import { hideBin } from 'yargs/helpers'; -import { emit } from '../../common/control-protocol'; +import { emit } from '../emit'; import { getMemoryUsage } from '../../common/helpers'; import { Logger, 
serializeError } from '../../logger/logger'; import { AirSyncEvent, EventType } from '../../types/extraction'; diff --git a/src/state/state.ts b/src/state/state.ts index ab7dbb5..7ba1589 100644 --- a/src/state/state.ts +++ b/src/state/state.ts @@ -1,26 +1,3 @@ -import { SyncMode } from '../types/common'; - -import { BaseState } from './base-state'; -import { createExtractionState } from './extraction-state'; -import { createLoadingState } from './loading-state'; -import { StateInterface } from './state.interfaces'; - export { BaseState } from './base-state'; export { ExtractionState, createExtractionState } from './extraction-state'; export { LoadingState, createLoadingState } from './loading-state'; - -/** - * Creates and initializes the adapter state for the current worker, dispatching - * to the extraction or loading state based on the event's sync mode. - * - * @param params The state factory parameters (event, initial state, options) - * @returns The initialized mode-specific state - */ -export async function createAdapterState( - params: StateInterface -): Promise> { - if (params.event.payload.event_context.mode === SyncMode.LOADING) { - return createLoadingState(params); - } - return createExtractionState(params); -} From 4269e30caa22aa1f0c6cc397c8190ab75043658b Mon Sep 17 00:00:00 2001 From: radovanjorgic Date: Tue, 9 Jun 2026 10:47:57 +0200 Subject: [PATCH 22/22] docs: record post-review cleanup + defer C7 JSDoc to end --- V2_PROGRESS.md | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/V2_PROGRESS.md b/V2_PROGRESS.md index d9e2f9e..5094c94 100644 --- a/V2_PROGRESS.md +++ b/V2_PROGRESS.md @@ -181,10 +181,15 @@ commits. Mechanical/structural transforms first (Phase 1), polish + surface-defi emit-based, so the TaskResult return + emitFromResult mapping is authored fresh per the spec above. 
### Phase 2 — closing / interactive (batched, done at the end) -- **C7 — JSDoc pass.** Bar = `src/mappers/mappers.ts` style (class block: what+when; method block: - one-line what, "Used to/for…" usage, `@param` w/ type, `@returns`). Public surface + non-obvious - internals (state migration, emit-from-return mapping, attachment streaming pool). Fan out per module, - squash to one `docs:` commit. +- **C7 — JSDoc pass (DEFERRED to the very end).** Bar = `src/mappers/mappers.ts` style (class block: + what+when; method block: one-line what, "Used to/for…" usage, `@param` w/ type, `@returns`). Public + surface + non-obvious internals (state migration, emit-from-return mapping, attachment streaming pool). + FIRST attempt (d05434b, fanned-out subagents) was REVERTED (4fea755): too heavy (902 insertions), some + blocks just restated the obvious, made structural review harder. REDO as the LAST step of the whole + refactor — done once, by hand, against settled code, with an explicit **"skip-the-obvious" rule**: + NO docs on trivial getters/setters, NO restating the method name; document only class-level "what+when", + non-obvious params, and genuinely tricky internals. The high-value blocks (normalizeFetchedState + migration shim, getEventTypeForResult/EVENT_PHASE_MAP, zero-doc types/loading.ts) still get covered. - **C8 — Regenerate api report** (`airsync-sdk.api.md`). - **C9 — Exposure audit (INTERACTIVE with Rado).** Read the regenerated report; decide per-symbol what to keep public vs hide. Empirical floor = anything imported by the 3 connectors (table below). @@ -279,6 +284,12 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: + return-based contract), `processTask` (split), `formatAxiosError` (dropped from index), `AirdropEvent`/`AirdropMessage` (renamed AirSync*), all old `EXTRACTION_*` enum members (deleted). +**v2 on-disk-state baseline assumption (for C11 migrate-v2 note):** v2 assumes persisted state was written by +SDK **>= v1.15.2**. 
The pre-1.15.2 `string[]` form of `lastProcessedAttachmentsIdsList` is NO LONGER migrated +(shim removed in CLEANUP/9b2b975). The v1→v2 *envelope* migration (`normalizeFetchedState`, C4b) is KEPT — that's +the real v1-vs-v2 boundary (every v1 version, incl. 1.19, writes flat state). Only the ancient intra-v1 string[] +processed-attachments format was dropped. + ## Status | Commit | State | Notes | |--------|-------|-------| @@ -290,7 +301,8 @@ Symbols imported from `@devrev/ts-adaas` by the 3 inspectable connectors: | C4b state envelope | ☑ done | 30ba1b3. { connectorState, sdkState } envelope + v1->v2 migration shim (normalizeFetchedState). adapter.state→connector-only, new adapter.sdkState; ~28 SDK-field access sites moved. SdkState kept combined (narrowing deferred to C5). Reviewer-approved (migration cases verified). | | C5 adapter split | ☑ done | a7a877f. BaseAdapter (template emit + hooks) + ExtractionAdapter + LoadingAdapter; WorkerAdapter→union alias; processTask dispatches by mode (still single entry). worker-adapter.ts deleted; helpers→loading-adapter.helpers. Reviewer-approved (emit equivalence verified). SdkState kept combined (narrowing dropped from scope). | | C6 emit-from-return | ☑ done | 0fb6116. task/onTimeout return TaskResult; SDK maps status→event via getEventTypeForResult and emits once (emitFromResult); emit now protected/internal; processTask→processExtractionTask+processLoadingTask; loader/stream methods return TaskResult. Reviewer-approved (mapping+state-save+no-double-emit verified). NET-NEW design (no oracle). | -| C7 JSDoc | ☑ done | d05434b. Comments-only pass over 25 files to the mappers.ts bar: v2-new code (adapters, state incl. migration shim, process-task/spawn/getEventTypeForResult mapping) + under-documented older modules (repo, uploader, attachments pool, control-protocol, install-IDM, errors, types/loading barrel). Verified every changed line is a comment; build green, lint clean. 
Fanned out 5 implementer subagents over disjoint file groups. | +| CLEANUP (post-review) | ☑ done | 9b2b975. (1) Removed dead pre-1.15.2 `migrateProcessedAttachments` shim (v2 assumes on-disk state >= v1.15.2); (2) moved emit primitive `common/control-protocol.ts`→`multithreading/emit.ts` (co-located w/ its only consumers base-adapter+spawn); (3) deleted dead `createAdapterState` dispatcher (zero callers post-C6), `state.ts` now a thin barrel. Build green, lint clean. Preceded by 4fea755 (revert of C7). | +| C7 JSDoc | ☐ todo | DEFERRED to very end. First attempt d05434b REVERTED (4fea755) — too heavy/redundant, hurt review. Redo by hand, once, against settled code, with "skip-the-obvious" rule (see C7 plan above). | | C8 api report | ☐ todo | Phase 2 | | C9 exposure audit | ☐ todo | Phase 2, interactive | | C10 tests + baseline | ☐ todo | Phase 2 |