Implement async byte latent transformer with entropy-based patching#1

Draft
laerdon wants to merge 2 commits into main from cursor/task-re-attempt-5271

Conversation

laerdon commented Mar 17, 2026

Owner

overview

this pr implements a complete asynchronous byte latent transformer (blt) architecture with entropy-based dynamic patching, based on the facebook research paper "byte latent transformer: patches scale better than tokens".

key features

byte encoder

  • dynamically sized patches based on next-byte prediction entropy
  • higher entropy regions get more compute allocation
  • configurable patch size bounds (min/max)
  • no tokenization required - works directly with raw bytes
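the bullets above can be sketched with a simple threshold rule. this is a minimal illustration, not the code in `blt/models/byte_encoder.py`: the function names, the `threshold` value, and the rule "start a new patch when the next-byte entropy exceeds a global threshold" are assumptions. the entropies are taken as given here; in practice they would come from a small byte-level language model.

```python
import math
from typing import List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Entropy in bits of one next-byte probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def segment_by_entropy(
    entropies: Sequence[float],
    threshold: float = 2.0,  # hypothetical value, in bits
    min_patch: int = 1,
    max_patch: int = 8,
) -> List[int]:
    """Return patch lengths for a byte sequence.

    A patch is closed before position i when the current patch is already
    max_patch bytes long, or when it has at least min_patch bytes and the
    entropy at position i exceeds the threshold.
    """
    sizes: List[int] = []
    current = 0
    for h in entropies:
        boundary = current >= max_patch or (current >= min_patch and h > threshold)
        if current > 0 and boundary:
            sizes.append(current)
            current = 0
        current += 1
    if current:
        sizes.append(current)
    return sizes


# high-entropy positions (3.0 and 2.5) open new patches
patch_sizes = segment_by_entropy([0.5, 0.4, 3.0, 0.2, 0.1, 2.5, 0.3])
```

under this rule, high-entropy regions produce more boundaries and therefore shorter patches, which is how they end up with more latent steps per byte.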

async patch processor

  • parallel processing of variable-length patches
  • async attention layers for improved throughput
  • configurable worker pool for concurrent operations
  • supports both sync and async execution modes
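a bounded worker pool over variable-length patches might look like the sketch below. it uses only the standard `asyncio` module; `process_patch`, `process_patches`, and `max_workers` are hypothetical names, and the per-patch work is a placeholder rather than the attention layers in `blt/models/patch_processor.py`.

```python
import asyncio
from typing import List


async def process_patch(patch: bytes) -> int:
    """Placeholder for per-patch work: here it only sums the byte values."""
    await asyncio.sleep(0)  # yield to the event loop, as real async work would
    return sum(patch)


async def process_patches(patches: List[bytes], max_workers: int = 4) -> List[int]:
    """Process patches concurrently, with at most max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(patch: bytes) -> int:
        async with semaphore:
            return await process_patch(patch)

    # gather returns results in the same order as the input patches
    return list(await asyncio.gather(*(bounded(p) for p in patches)))


def process_patches_sync(patches: List[bytes], max_workers: int = 4) -> List[int]:
    """Synchronous entry point that runs the async path to completion."""
    return asyncio.run(process_patches(patches, max_workers))


results = process_patches_sync([b"ab", b"c", b""], max_workers=2)
```

the synchronous wrapper is one way to offer both execution modes from a single code path. note that `asyncio` alone gives concurrency, not parallel compute, so throughput gains depend on the work actually releasing the event loop.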

transformer architecture

  • multi-head self-attention with residual connections
  • feed-forward networks with gelu activation
  • layer normalization and positional encoding
  • configurable depth and width
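for reference, two of the building blocks named above can be written out in plain python. this is an illustrative sketch on lists of floats, not the pr's tensor code: it shows the exact (erf-based) gelu and a layer normalization without learned scale and shift, and the function names are assumptions.

```python
import math
from typing import List


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_norm(values: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature vector to zero mean and unit variance.

    Uses the biased (population) variance and omits the learned
    scale and shift parameters for brevity.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


normalized = layer_norm([1.0, 2.0, 3.0])
activated = [gelu(v) for v in normalized]
```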

training infrastructure

  • async training loop with gradient accumulation
  • cosine annealing learning rate schedule
  • checkpoint saving and loading
  • validation support
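the schedule and accumulation bullets can be illustrated without a framework. the sketch below assumes the standard cosine annealing closed form and a loop that flushes any leftover micro-batches at the end; the function names and arguments are hypothetical and do not come from `blt/utils/trainer.py`.

```python
import math
from typing import List


def cosine_annealing_lr(
    step: int, total_steps: int, base_lr: float, min_lr: float = 0.0
) -> float:
    """Learning rate at a given optimizer step under cosine annealing."""
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def update_points(num_batches: int, accumulation_steps: int) -> List[int]:
    """Indices of the micro-batches after which an optimizer step happens.

    Gradients are accumulated over accumulation_steps micro-batches; a final
    partial group is also flushed so no gradients are dropped.
    """
    points = []
    for i in range(num_batches):
        last_in_group = (i + 1) % accumulation_steps == 0
        last_overall = i == num_batches - 1
        if last_in_group or last_overall:
            points.append(i)
    return points


steps = update_points(num_batches=10, accumulation_steps=4)
schedule = [cosine_annealing_lr(s, len(steps), base_lr=1e-3) for s in range(len(steps))]
```

the learning rate is indexed by optimizer step rather than by micro-batch, so changing `accumulation_steps` changes how many updates the schedule is spread across.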

testing

all components have comprehensive tests:

  • byte encoder tests: entropy computation, patch segmentation, batch encoding
  • async processor tests: parallel processing, attention mechanisms
  • blt model tests: forward pass, generation, parameter counting

all tests pass:

[PASS] all byte encoder tests passed
[PASS] all async processor tests passed
[PASS] all blt model tests passed

examples

included example scripts demonstrate:

  • inference and text generation
  • training with custom data
  • model configuration options

usage

import torch

from blt.models.blt_model import ByteLatentTransformer

model = ByteLatentTransformer(
    d_model=512,
    num_layers=6,
    num_heads=8,
    use_async=True,
)

# process bytes directly: encode any string to utf-8 and feed the raw byte values
text = "hello, world"  # example input
byte_tensor = torch.tensor(list(text.encode('utf-8')), dtype=torch.long)
logits, patch_sizes = model(byte_tensor)

implementation details

  • entropy-based segmentation allocates compute based on data complexity
  • async processing enables parallel patch handling
  • supports both single sequence and batch processing
  • includes comprehensive documentation in readme and implementation.md

files changed

  • blt/models/byte_encoder.py: entropy-based byte encoder
  • blt/models/patch_processor.py: async patch processing
  • blt/models/blt_model.py: main transformer architecture
  • blt/utils/trainer.py: async training loop
  • blt/utils/data_loader.py: byte sequence data loading
  • examples/: training and inference examples
  • tests/: comprehensive test suite
  • README.md: complete documentation
  • IMPLEMENTATION.md: detailed implementation summary

cursoragent and others added 2 commits March 17, 2026 20:53
- add byte encoder with dynamic patch segmentation based on entropy
- implement async patch processor for parallel processing
- create blt transformer architecture with multi-head attention
- add async training loop with gradient accumulation
- include comprehensive tests for all components
- add example scripts for training and inference
- all tests passing successfully

Co-authored-by: Laerdon Kim <laerdon@users.noreply.github.com>
laerdon commented Apr 2, 2026

Owner Author

Comment: this was just seeing if Opus 4.6 would one-shot it for fun
