Add BatchLoader + BatchTask for batched file loading #68

Merged: koenvo merged 4 commits into main from feature/dataset-resource-in-file-loader on Apr 7, 2026

@koenvo koenvo commented Apr 5, 2026

Sources can now wrap a loader function in BatchLoader(loader_fn, batch_size) and share the instance across DatasetResources that should be batched together. Ingestify groups those resources, chunks them into groups of batch_size, wraps each chunk in a BatchTask, and calls the loader_fn once per batch with lists of file_resources / current_files / dataset_resources.
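The grouping and chunking described above can be sketched roughly as follows. This is a minimal illustration, not Ingestify's actual implementation: the `BatchLoader` class here is a simplified stand-in, and `Resource`, `chunk_by_loader`, and `load_many` are hypothetical names. It assumes that "sharing the instance" means resources are grouped by the identity of their loader object.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


class BatchLoader:
    """Hypothetical stand-in: pairs a loader function with a batch size."""

    def __init__(self, loader_fn: Callable, batch_size: int):
        self.loader_fn = loader_fn
        self.batch_size = batch_size


@dataclass
class Resource:
    """Hypothetical, flattened stand-in for a DatasetResource and its file."""
    name: str
    loader: BatchLoader


def chunk_by_loader(resources: List[Resource]) -> List[List[Resource]]:
    # Group resources that share the same BatchLoader instance (assumption:
    # grouping is by object identity), then split each group into chunks
    # of at most batch_size items.
    groups: Dict[int, List[Resource]] = defaultdict(list)
    for resource in resources:
        groups[id(resource.loader)].append(resource)

    chunks = []
    for members in groups.values():
        size = members[0].loader.batch_size
        for start in range(0, len(members), size):
            chunks.append(members[start:start + size])
    return chunks


calls = []


def load_many(file_resources, current_files, dataset_resources):
    # Called once per batch with parallel lists, as the description states.
    calls.append([r.name for r in dataset_resources])
    return [f"content-{r.name}" for r in dataset_resources]


shared = BatchLoader(load_many, batch_size=2)
resources = [Resource(f"match-{i}", shared) for i in range(5)]

for chunk in chunk_by_loader(resources):
    shared.loader_fn(
        file_resources=chunk,
        current_files=[None] * len(chunk),
        dataset_resources=chunk,
    )
```

With five resources and a batch size of 2, the loader function in this sketch runs three times, and the last batch holds the single leftover resource.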

load_file() now passes dataset_resource to loaders that accept it (signature introspection with lru_cache, so existing loaders continue to work without changes).
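One way such backward-compatible dispatch could look is sketched below. The helper names `accepts_dataset_resource` and `call_loader`, and the two example loaders, are hypothetical; only `inspect.signature` and `functools.lru_cache` are standard-library calls. The cache is assumed to exist so that each loader's signature is inspected once rather than on every file.

```python
import inspect
from functools import lru_cache


@lru_cache(maxsize=None)
def accepts_dataset_resource(loader) -> bool:
    # Inspect the signature once per loader; later calls hit the cache.
    params = inspect.signature(loader).parameters
    return "dataset_resource" in params or any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )


def call_loader(loader, file_resource, current_file, dataset_resource):
    # Pass dataset_resource only to loaders that declare it (or take **kwargs).
    if accepts_dataset_resource(loader):
        return loader(file_resource, current_file, dataset_resource=dataset_resource)
    return loader(file_resource, current_file)


def old_loader(file_resource, current_file):
    # Existing two-argument loader: keeps working unchanged.
    return ("old", file_resource)


def new_loader(file_resource, current_file, dataset_resource=None):
    # Newer loader that opts in to receiving the dataset resource.
    return ("new", file_resource, dataset_resource)


old_result = call_loader(old_loader, "file-a", None, "dataset-1")
new_result = call_loader(new_loader, "file-a", None, "dataset-1")
```

Because plain functions are hashable, they can serve directly as `lru_cache` keys. A loader that is an instance of an unhashable class would need different handling.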

koenvo added 4 commits April 5, 2026 21:22
Different BatchTasks sharing the same loader write/read different
keys (id(file_resource)); CPython dict operations on distinct keys
are atomic under the GIL.
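
The access pattern in that commit message might look like the sketch below. It assumes, without confirmation from the PR, that batch results are stored in a plain dict shared between threads and keyed by `id(file_resource)`; `FileResource`, `results`, and `run_batch` are hypothetical names.

```python
import threading


class FileResource:
    """Hypothetical minimal stand-in for a file resource."""

    def __init__(self, name: str):
        self.name = name


# Shared between threads; each batch only touches keys for its own resources.
results = {}


def run_batch(file_resources):
    for file_resource in file_resources:
        # A single dict item assignment with an int key; no lock is taken
        # because no two batches ever write the same key.
        results[id(file_resource)] = f"loaded-{file_resource.name}"


batches = [
    [FileResource(f"a{i}") for i in range(3)],
    [FileResource(f"b{i}") for i in range(3)],
]

threads = [threading.Thread(target=run_batch, args=(batch,)) for batch in batches]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

The keys stay distinct only while the resource objects are alive, since CPython can reuse an `id()` value after an object is garbage collected. The sketch keeps every resource referenced in `batches` for that reason.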
@koenvo koenvo merged commit e5274c6 into main Apr 7, 2026
13 checks passed
@koenvo koenvo deleted the feature/dataset-resource-in-file-loader branch April 7, 2026 09:26