Implement async byte latent transformer with entropy-based patching #1
Draft
laerdon wants to merge 2 commits into
- add byte encoder with dynamic patch segmentation based on entropy
- implement async patch processor for parallel processing
- create blt transformer architecture with multi-head attention
- add async training loop with gradient accumulation
- include comprehensive tests for all components
- add example scripts for training and inference
- all tests passing successfully

Co-authored-by: Laerdon Kim <laerdon@users.noreply.github.com>
Comment (Owner, Author): this was just seeing if Opus 4.6 would one-shot it for fun
overview
this pr implements a complete asynchronous byte latent transformer (blt) architecture with entropy-based dynamic patching, based on the facebook research paper "byte latent transformer: patches scale better than tokens".
key features
byte encoder
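a minimal sketch of what entropy-based patch segmentation could look like, not the code in `blt/models/byte_encoder.py`. it assumes a simplification: entropy is estimated empirically over a short sliding window of bytes rather than taken from a learned byte-level model. `window_entropy`, `segment_patches`, and the `window`, `threshold`, and `max_patch_len` parameters are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def window_entropy(window: bytes) -> float:
    """Shannon entropy (in bits) of the byte distribution inside one window."""
    if not window:
        return 0.0
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def segment_patches(data: bytes, window: int = 4, threshold: float = 1.5,
                    max_patch_len: int = 16) -> list[bytes]:
    """Split data into patches, starting a new patch where local entropy is high.

    Assumption: the entropy is an empirical estimate over the last `window`
    bytes, standing in for a model-predicted next-byte entropy.
    """
    patches: list[bytes] = []
    start = 0
    for i in range(1, len(data)):
        local = window_entropy(data[max(0, i - window + 1): i + 1])
        too_long = (i - start) >= max_patch_len
        if local > threshold or too_long:
            # Close the current patch just before byte i.
            patches.append(data[start:i])
            start = i
    if start < len(data):
        patches.append(data[start:])
    return patches


repetitive = segment_patches(b"aaaaaaaa")   # low entropy, few boundaries
varied = segment_patches(b"abcdefgh")       # high entropy, many boundaries
```

under these assumptions, repetitive input stays in long patches while varied input is cut into short ones, which is the intuition behind spending more compute where the data is harder to predict.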
async patch processor
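a minimal sketch of concurrent patch processing with `asyncio`, not the code in `blt/models/patch_processor.py`. `process_patch`, `process_patches`, and `max_concurrency` are hypothetical names, and the per-patch work (summing byte values) is a placeholder for whatever the real processor does.

```python
import asyncio


async def process_patch(patch: bytes) -> int:
    """Hypothetical per-patch work; here it just sums the byte values."""
    await asyncio.sleep(0)  # yield control, standing in for real async work
    return sum(patch)


async def process_patches(patches: list[bytes], max_concurrency: int = 4) -> list[int]:
    """Process patches concurrently and return results in input order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(patch: bytes) -> int:
        # The semaphore caps how many patches are in flight at once.
        async with semaphore:
            return await process_patch(patch)

    # gather preserves the order of its arguments in the result list.
    return await asyncio.gather(*(bounded(p) for p in patches))


results = asyncio.run(process_patches([b"ab", b"c", b"defg"]))
```

note that `asyncio` gives concurrency on a single thread; cpu-bound work would need a process pool or an accelerator to run truly in parallel.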
transformer architecture
training infrastructure
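a minimal sketch of gradient accumulation only, written in plain python on a one-parameter model so it runs without any framework. it is not the trainer in `blt/utils/trainer.py` and leaves out the async part; `grad_mse` and `accumulated_step` are hypothetical names.

```python
def grad_mse(w: float, batch: list[tuple[float, float]]) -> float:
    """Gradient of mean squared error for the model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(w: float, micro_batches: list[list[tuple[float, float]]],
                     lr: float = 0.1) -> float:
    """Take one optimizer step after accumulating over all micro-batches."""
    accum = 0.0
    for micro_batch in micro_batches:
        # Scale each micro-batch gradient so the accumulated value is an average.
        accum += grad_mse(w, micro_batch) / len(micro_batches)
    return w - lr * accum


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]

w_accumulated = accumulated_step(0.0, micro_batches)
w_full_batch = 0.0 - 0.1 * grad_mse(0.0, data)
```

with equally sized micro-batches, the accumulated update matches the full-batch update, which is why accumulation can emulate a larger batch when memory is limited.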
testing
all components have comprehensive tests:
all tests pass successfully:
examples
included example scripts demonstrate:
usage
implementation details
files changed
- blt/models/byte_encoder.py: entropy-based byte encoder
- blt/models/patch_processor.py: async patch processing
- blt/models/blt_model.py: main transformer architecture
- blt/utils/trainer.py: async training loop
- blt/utils/data_loader.py: byte sequence data loading
- examples/: training and inference examples
- tests/: comprehensive test suite
- README.md: complete documentation
- IMPLEMENTATION.md: detailed implementation summary

Slack Thread