feat(pipeline): context manager for pre/post operation on `pipeline.run()` by rudolfix · Pull Request #3677 · dlt-hub/dlt

rudolfix · 2026-02-24T20:57:25Z

Description

This PR enables and demonstrates customization of the run method: it injects additional data to be loaded in the same trace transaction in an idempotent (can be retried) way.

See the included test for a PoC.

cloudflare-workers-and-pages · 2026-02-24T21:01:16Z

Deploying with Cloudflare Workers


| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ Deployment successful! | docs | `fbdc0d6` | Commit Preview URL, Branch Preview URL | Feb 25 2026, 04:09 PM |

zilto

My summary of the design constraints:

Pipeline.run() is a public interface; it is what people call in their code.
Pipeline.run() is decorated with several context managers and carries many built-in assumptions about transactions, which makes it hard to patch.
Pipeline._run_once() is a new private method that contains the whole logic of what Pipeline.run() does. It can be patched while preserving all of the decorators, the context managers, and the public interface.

Suggestion

Rename Pipeline._run_once() to Pipeline._run().

Move the logic of Pipeline._run_once() into an internal function called _run_pipeline():

def _run_pipeline(
    pipeline: Pipeline,
    data: Any,
    *,
    ...,
) -> LoadInfo:
    pipeline.extract(...)
    pipeline.normalize()
    return pipeline.load(...)

class Pipeline:
    def _run(self, ...):
        return _run_pipeline(...)

    def run(self, ...):
        return self._run(...)

This makes patching less clunky: an override calls the plain function _run_pipeline() instead of invoking the base class method explicitly as Pipeline._run(self, ...).

class SidecarPipeline(Pipeline):
    def _run(self, data: Any, **kwargs: Any) -> LoadInfo:
        load_info = Pipeline._run(self, data, **kwargs)
        return load_info

Becomes

class SidecarPipeline(Pipeline):
    def _run(self, data: Any, **kwargs: Any) -> LoadInfo:
        load_info = _run_pipeline(self, data, **kwargs)
        return load_info
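A minimal runnable sketch of this layering, using a stand-in `Pipeline` class rather than dlt's real one. Only the names `_run_pipeline`, `_run`, and `run` come from the suggestion; the `steps` list, the string standing in for `LoadInfo`, and the extra `"sidecar"` load are assumptions made for illustration.

```python
from typing import Any, List


class Pipeline:
    """Stand-in for dlt's Pipeline: records calls instead of moving data."""

    def __init__(self) -> None:
        self.steps: List[str] = []

    def extract(self, data: Any) -> None:
        self.steps.append(f"extract:{data}")

    def normalize(self) -> None:
        self.steps.append("normalize")

    def load(self) -> str:
        self.steps.append("load")
        return "load_info"  # placeholder for a real LoadInfo object

    def _run(self, data: Any, **kwargs: Any) -> str:
        # Private hook that subclasses override.
        return _run_pipeline(self, data, **kwargs)

    def run(self, data: Any, **kwargs: Any) -> str:
        # Public entry point; decorators and context managers would wrap this.
        return self._run(data, **kwargs)


def _run_pipeline(pipeline: Pipeline, data: Any, **kwargs: Any) -> str:
    # The whole extract/normalize/load sequence lives in one plain function.
    pipeline.extract(data)
    pipeline.normalize()
    return pipeline.load()


class SidecarPipeline(Pipeline):
    def _run(self, data: Any, **kwargs: Any) -> str:
        load_info = _run_pipeline(self, data, **kwargs)
        # Hypothetical extra load; no base-class method call is needed.
        _run_pipeline(self, "sidecar", **kwargs)
        return load_info


pipeline = SidecarPipeline()
result = pipeline.run("main")
```

Because `run()` dispatches through `self._run()`, the override is picked up without touching the public method.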

zilto · 2026-02-25T16:21:30Z

+    class SidecarPipeline(Pipeline):
+        def _run_once(self, data: Any, **kwargs: Any) -> LoadInfo:
+            load_info = Pipeline._run_once(self, data, **kwargs)
+            # guard idempotency via local state
+            try:
+                self.get_local_state_val(SIDECAR_LOADED_KEY)
+            except KeyError:
+                self.set_local_state_val(SIDECAR_LOADED_KEY, True)
+                Pipeline._run_once(self, sidecar_source(), **kwargs)
+            return load_info
+
+        def __reduce__(self):
+            return (Pipeline.__new__, (Pipeline,), Pipeline.__getstate__(self))


This code really deserves some comments and explanation.

Class vs instance

AFAIU, the key trick is to use the class object instead of the instance

# via class, passing `self` explicitly
Pipeline._run_once(self, data, **kwargs)
# instead of via instance
self._run_once(data, **kwargs)

I understand that SidecarPipeline overrides _run_once(), which is called when user code invokes pipeline.run(). SidecarPipeline._run_once() therefore calls Pipeline._run_once() explicitly to avoid infinite recursion.
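To make the recursion concern concrete, here is a small sketch with stand-in classes (not dlt's real `Pipeline`). The `ViaSuper` variant is an addition for comparison and does not appear in the PR; with single inheritance, `super()._run_once(...)` reaches the same base method as the explicit class call.

```python
from typing import List


class Pipeline:
    """Stand-in base class with the same method names as in the diff."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def _run_once(self, data: str) -> str:
        self.calls.append(f"base:{data}")
        return f"loaded {data}"

    def run(self, data: str) -> str:
        return self._run_once(data)


class ViaClass(Pipeline):
    def _run_once(self, data: str) -> str:
        self.calls.append("override")
        # Look the method up on the base class and pass `self` explicitly.
        return Pipeline._run_once(self, data)


class ViaSuper(Pipeline):
    def _run_once(self, data: str) -> str:
        self.calls.append("override")
        # Same target as above, resolved through the MRO.
        return super()._run_once(data)


class ViaInstance(Pipeline):
    def _run_once(self, data: str) -> str:
        # Looks the method up on the instance, so it finds this override again.
        return self._run_once(data)


via_class = ViaClass()
via_class_result = via_class.run("main")

via_super = ViaSuper()
via_super_result = via_super.run("main")

try:
    ViaInstance().run("main")
    recursed = False
except RecursionError:
    recursed = True
```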

State key

How and why is the local state key set? It does not seem to be used anywhere else. I imagine that on a retry it lets you tell whether you need to rerun the main pipeline, the sidecar pipeline, or both.
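One way to read the guard in the diff, sketched with stand-in classes: the key is read back by `get_local_state_val` on every call, so the sidecar load runs only the first time. The dict-backed state, the string payloads, and the value of `SIDECAR_LOADED_KEY` are assumptions; how dlt persists or resets local state is not shown in this thread.

```python
from typing import Any, Dict, List

SIDECAR_LOADED_KEY = "sidecar_loaded"  # hypothetical value; the diff only shows the name


class Pipeline:
    """Stand-in with dict-backed local state (an assumption, not dlt's storage)."""

    def __init__(self) -> None:
        self._local_state: Dict[str, Any] = {}
        self.loaded: List[str] = []

    def get_local_state_val(self, key: str) -> Any:
        return self._local_state[key]  # raises KeyError when the key is missing

    def set_local_state_val(self, key: str, value: Any) -> None:
        self._local_state[key] = value

    def _run_once(self, data: str) -> str:
        self.loaded.append(data)
        return f"load_info:{data}"

    def run(self, data: str) -> str:
        return self._run_once(data)


class SidecarPipeline(Pipeline):
    def _run_once(self, data: str) -> str:
        load_info = Pipeline._run_once(self, data)
        try:
            # Key present: the sidecar was already handled, so skip it.
            self.get_local_state_val(SIDECAR_LOADED_KEY)
        except KeyError:
            # Key missing: mark it first, then load the sidecar data once.
            self.set_local_state_val(SIDECAR_LOADED_KEY, True)
            Pipeline._run_once(self, "sidecar")
        return load_info


pipeline = SidecarPipeline()
first = pipeline.run("main")
second = pipeline.run("main")  # simulated retry of the same run
```

In this sketch the retry loads the main data again but not the sidecar, which is the behavior the key appears to protect.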

adds _run_once with sidecar source PoC test

8fdaf1e

rudolfix requested a review from zilto February 24, 2026 20:57

rudolfix self-assigned this Feb 24, 2026

zilto changed the title ~~(feat) allows to customize pipeline.run method~~ feat(pipeline: context manager for pre/post operation on pipeline.run() Feb 25, 2026

zilto changed the title ~~feat(pipeline: context manager for pre/post operation on pipeline.run()~~ feat(pipeline): context manager for pre/post operation on pipeline.run() Feb 25, 2026

tests _run_once signature

fbdc0d6

zilto reviewed Feb 25, 2026

rudolfix wants to merge 2 commits into devel from feat/customize-run-pipeline
