This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
bentools/etl is a PHP library implementing the Extract/Transform/Load pattern for data processing workflows. It is designed to be flexible and event-driven, and to support both synchronous and asynchronous (ReactPHP) processing.
Core concept: Extract data from a source, apply transformations, and load results into a destination.
# Run all CI checks (PHP-CS-Fixer, PHPStan, Pest with coverage)
composer ci:check
# Run tests only
vendor/bin/pest
# Run tests with coverage
vendor/bin/pest --coverage
# Run a single test file
vendor/bin/pest tests/Behavior/FlushTest.php
# Run PHPStan type checking
vendor/bin/phpstan analyse
# Run code style fixer
vendor/bin/php-cs-fixer fix

- PHP >=8.2
- Tests use Pest (not PHPUnit syntax)
- 100% code coverage expected before PRs
EtlExecutor (src/EtlExecutor.php)
- Main entry point for building and executing ETL workflows
- Uses builder pattern via EtlBuilderTrait to chain extractors, transformers, and loaders
- Dispatches events at each lifecycle stage (init, extract, transform, load, flush, end)
- Handles exceptions through dedicated event types (ExtractException, TransformException, etc.)
EtlState (src/EtlState.php)
- Immutable state object passed through the entire workflow
- Tracks: current item, indices, flush timing, loaded items count, output
- Contains context (arbitrary data), source, and destination
- Version system for state updates during processing
EtlConfiguration (src/EtlConfiguration.php)
- Configuration object for flush frequency, batch size, and other options
- flushEvery - Controls how often the loader flushes (default: INF)
- batchSize - Controls how many items are grouped for batch transformation (default: 1)
- ExtractorInterface (src/Extractor/)
  - extract(EtlState $state): iterable - Returns an iterable of items to process
  - Built-in: CSV, JSON, FileExtractor, STDINExtractor, IterableExtractor, ReactStreamExtractor
- TransformerInterface (src/Transformer/)
  - transform(mixed $item, EtlState $state): mixed - Transforms extracted items
  - Return value can be a single value, an array, or a generator (yield)
  - Yielded items generate multiple loads from a single extracted item
  - Built-in: CallableTransformer, ChainTransformer, NullTransformer
- BatchTransformerInterface (src/Transformer/)
  - transform(array $items, EtlState $state): Generator - Transforms a batch of items at once
  - Separate interface from TransformerInterface (does NOT extend it)
  - Activated when batchSize is set in EtlConfiguration and transformer implements this interface
  - Each yielded value becomes an individual item for the load phase
  - Built-in: CallableBatchTransformer
- LoaderInterface (src/Loader/)
  - load(mixed $item, EtlState $state): void - Loads transformed items
  - flush(bool $isEarly, EtlState $state): mixed - Called at flush frequency or end
  - Built-in: InMemoryLoader, CSV, JSON, DoctrineORM, STDOUTLoader
Event dispatching (src/EventDispatcher/)
- Custom PSR-14 implementation with priority support
- Events: InitEvent, StartEvent, ExtractEvent, TransformEvent, BeforeLoadEvent, LoadEvent, FlushEvent, EndEvent
- Exception events: ExtractExceptionEvent, TransformExceptionEvent, LoadExceptionEvent, FlushExceptionEvent
- Use ->on{EventName}(callable $listener, int $priority = 0) on EtlExecutor
Control flow exceptions:
- SkipRequest - Skip current item, continue processing
- StopRequest - Stop entire workflow immediately
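
The skip/stop semantics above can be illustrated with a minimal, library-free sketch. The classes `SkipSignal` and `StopSignal` and the function `runWorkflow()` are hypothetical stand-ins invented for this example; they are not the library's `SkipRequest`/`StopRequest` or its processor, and the real implementation may differ.

```php
<?php

declare(strict_types=1);

// Local stand-ins for illustration only; NOT the library's classes.
final class SkipSignal extends RuntimeException {}
final class StopSignal extends RuntimeException {}

/**
 * Runs $transform over each item, honouring skip/stop signals.
 *
 * @param iterable<mixed> $items
 * @return list<mixed>
 */
function runWorkflow(iterable $items, callable $transform): array
{
    $loaded = [];
    foreach ($items as $item) {
        try {
            $loaded[] = $transform($item);
        } catch (SkipSignal) {
            continue; // drop this item, keep processing the rest
        } catch (StopSignal) {
            break; // abandon all remaining items immediately
        }
    }

    return $loaded;
}

$result = runWorkflow([1, 2, 3, 4, 5], function (int $n): int {
    if ($n === 2) {
        throw new SkipSignal(); // item 2 is skipped
    }
    if ($n === 4) {
        throw new StopSignal(); // items 4 and 5 are never loaded
    }

    return $n * 10;
});

// $result is [10, 30]
```

Modelling both signals as exceptions keeps the happy path of a transformer free of return-code checks, which is presumably why the library exposes them as control flow exceptions.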
ProcessorInterface (src/Processor/)
- IterableProcessor - Default synchronous processing
- ReactStreamProcessor - Async processing with ReactPHP streams (experimental)
Recipe (src/Recipe/)
- Reusable workflow configurations (combine extractors, transformers, loaders, event listeners)
- FilterRecipe - Skip/exclude items based on callable filter
- LoggerRecipe - PSR-3 logging integration
src/functions.php provides helper functions:
- extractFrom() - Create executor starting with extractor
- transformWith() - Create executor starting with transformer
- loadInto() - Create executor starting with loader
- withRecipe() - Create executor with recipe
- chain() - Chain multiple extractors/transformers/loaders
- stdIn() / stdOut() - STDIN/STDOUT helpers
- skipWhen() - Conditional skip recipe
- EtlExecutor uses ClonableTrait - all builder methods return clones
- EtlState has version tracking - always get latest via $state->getLastVersion()
$executor = (new EtlExecutor())
->extractFrom($extractor)
->transformWith($transformer)
->loadInto($loader)
->onTransform(function ($event) { /* ... */ })
->process($source, $destination);

- $state->nextTick(callable $callback) - Schedule callback after current item
- Useful for deferring operations or cleanup
- Consumed between items and guaranteed to run even if workflow stops
- Configure via new EtlConfiguration(batchSize: N) to group N items per batch
- Requires a transformer implementing BatchTransformerInterface (separate from TransformerInterface)
- Processing flow: items are chunked via iterable_chunk(), then for each chunk:
  - ExtractEvent fires per item (items can be skipped individually)
  - transform(array $items, EtlState $state): Generator is called once for the whole batch
  - Each yielded result goes through TransformEvent → Load individually
- nextTick callbacks are consumed between batches, not between items within a batch
- When batchSize is set but transformer is not BatchTransformerInterface, batching is ignored
- Note: $state->currentItemKey during Transform/Load events points to the last item of the batch
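
The batch flow described above (chunk, transform once per chunk, load each yielded value) can be sketched without the library as follows. `chunkItems()` and `upperCaseBatch()` are hypothetical helpers written for this example; the first only approximates what the chunking step does, and events and EtlState are left out entirely.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical local helper standing in for the chunking step.
 *
 * @param iterable<mixed> $items
 * @return Generator<int, list<mixed>>
 */
function chunkItems(iterable $items, int $size): Generator
{
    $chunk = [];
    foreach ($items as $item) {
        $chunk[] = $item;
        if (count($chunk) === $size) {
            yield $chunk;
            $chunk = [];
        }
    }
    if ($chunk !== []) {
        yield $chunk; // trailing partial batch
    }
}

/**
 * Batch transformer: receives a whole chunk, yields one result per item.
 *
 * @param list<string> $items
 */
function upperCaseBatch(array $items): Generator
{
    foreach ($items as $item) {
        yield strtoupper($item);
    }
}

$batchCalls = 0;
$loaded = [];
foreach (chunkItems(['a', 'b', 'c', 'd', 'e'], 2) as $chunk) {
    $batchCalls++; // the transformer runs once per chunk
    foreach (upperCaseBatch($chunk) as $result) {
        $loaded[] = $result; // each yielded value is loaded individually
    }
}

// $batchCalls is 3 (chunks of 2, 2 and 1); $loaded is ['A', 'B', 'C', 'D', 'E']
```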
- Configurable via new EtlConfiguration(flushEvery: N)
- flush() called when: frequency threshold reached, or at end (with $isEarly = false)
- Early flush = during processing, final flush = at termination
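
To make the early/final flush distinction concrete, here is a small library-free model. `BufferingLoader` and `runWithFlushEvery()` are hypothetical names for this sketch; the real LoaderInterface methods also receive an EtlState, which is omitted here, and the sketch assumes a final flush always happens at the end.

```php
<?php

declare(strict_types=1);

/** Hypothetical in-memory loader used only to illustrate flush timing. */
final class BufferingLoader
{
    /** @var list<mixed> */
    private array $pending = [];

    /** @var list<array{early: bool, items: list<mixed>}> */
    public array $flushes = [];

    public function load(mixed $item): void
    {
        $this->pending[] = $item;
    }

    public function flush(bool $isEarly): void
    {
        // Record what was flushed and whether it happened mid-run.
        $this->flushes[] = ['early' => $isEarly, 'items' => $this->pending];
        $this->pending = [];
    }
}

/** @param iterable<mixed> $items */
function runWithFlushEvery(iterable $items, BufferingLoader $loader, int|float $flushEvery): void
{
    $sinceLastFlush = 0;
    foreach ($items as $item) {
        $loader->load($item);
        if (++$sinceLastFlush >= $flushEvery) {
            $loader->flush(true); // early flush: threshold reached during processing
            $sinceLastFlush = 0;
        }
    }
    $loader->flush(false); // final flush at termination
}

$loader = new BufferingLoader();
runWithFlushEvery(['a', 'b', 'c', 'd', 'e'], $loader, 2);

// Three flushes: ['a', 'b'] (early), ['c', 'd'] (early), ['e'] (final)
```

Passing `INF` as the threshold in this model never triggers an early flush, so everything arrives in the single final flush, which matches the documented default.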
- Tests are organized in tests/Behavior/ and tests/Unit/
- Use Pest syntax (test(), expect(), it())
- Mock with Mockery when needed
- Coverage is tracked - don't reduce it
- PHP 8.2+ features are welcome (readonly properties, enums, etc.)
- Prefer immutability and value objects
- Event listeners should be side-effect free when possible
- Transformers returning generators (yield) allow 1-to-many transformations
- Loaders can implement ConditionalLoaderInterface to skip certain items
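
The 1-to-many note about generator transformers can be sketched in plain PHP as below. The `$splitWords` callable and the surrounding loop are assumptions made for illustration; the loop only mimics how a yielded result might be fanned out into several loads and does not reproduce the library's processor.

```php
<?php

declare(strict_types=1);

// Transformer callable that fans one input line out into several words.
$splitWords = function (string $line): Generator {
    foreach (explode(' ', $line) as $word) {
        yield $word;
    }
};

$loaded = [];
foreach (['hello world', 'foo'] as $extracted) {
    $result = $splitWords($extracted);
    // In this sketch a Generator is unpacked; any other value counts as one item.
    $items = $result instanceof Generator ? $result : [$result];
    foreach ($items as $item) {
        $loaded[] = $item; // one load per yielded value
    }
}

// $loaded is ['hello', 'world', 'foo']
```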