| 1 | +# py-snappy |
| 2 | + |
| 3 | +Pure Python implementation of Google's Snappy compression algorithm. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **Pure Python**: No external dependencies or C extensions required |
| 8 | +- **Full compatibility**: Produces output compatible with Google's Snappy format |
| 9 | +- **Well documented**: Extensive inline documentation explaining the algorithm |
| 10 | +- **Thoroughly tested**: Comprehensive test suite using C++ Snappy test data |
| 11 | + |
| 12 | +## Installation |
| 13 | + |
| 14 | +```bash |
| 15 | +uv sync |
| 16 | +``` |
| 17 | + |
| 18 | +## Usage |
| 19 | + |
| 20 | +```python |
| 21 | +from src import compress, decompress |
| 22 | + |
| 23 | +# Compress data |
| 24 | +data = b"Hello, World!" * 100 |
| 25 | +compressed = compress(data) |
| 26 | + |
| 27 | +# Decompress data |
| 28 | +original = decompress(compressed) |
| 29 | +assert original == data |
| 30 | +``` |
| 31 | + |
| 32 | +## API |
| 33 | + |
| 34 | +### Core Functions |
| 35 | + |
| 36 | +- `compress(data: bytes) -> bytes`: Compress data using Snappy |
| 37 | +- `decompress(data: bytes) -> bytes`: Decompress Snappy-compressed data |
| 38 | + |
| 39 | +### Utilities |
| 40 | + |
| 41 | +- `max_compressed_length(size: int) -> int`: Upper bound on the compressed size of a `size`-byte input |
| 42 | +- `get_uncompressed_length(data: bytes) -> int`: Read the uncompressed length from the varint header |
| 43 | +- `is_valid_compressed_data(data: bytes) -> bool`: Quick validation check |
| 44 | + |
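As a rough illustration of what the two length utilities compute, here is a minimal sketch. The helper names `read_length_header` and `worst_case_size` are hypothetical and are not this package's implementation. The bound `32 + n + n // 6` is the one used by `MaxCompressedLength` in Google's C++ library; whether py-snappy returns the same value is an assumption.

```python
def read_length_header(data: bytes) -> tuple[int, int]:
    """Decode the little-endian base-128 varint that starts a Snappy block.

    Hypothetical helper for illustration. Returns (uncompressed_length,
    header_size_in_bytes).
    """
    value = 0
    # The length is a 32-bit quantity, so the varint uses at most 5 bytes.
    for index, byte in enumerate(data[:5]):
        value |= (byte & 0x7F) << (7 * index)  # low 7 bits carry payload
        if byte < 0x80:  # a clear high bit marks the final byte
            if value > 0xFFFFFFFF:
                raise ValueError("length does not fit in 32 bits")
            return value, index + 1
    raise ValueError("truncated or oversized varint header")


def worst_case_size(size: int) -> int:
    """Upper bound used by the C++ reference: 32 + n + n/6 (assumed here)."""
    return 32 + size + size // 6


# 2097150 (0x1FFFFE) is stored as FE FF 7F, per the format description.
length, header_size = read_length_header(bytes([0xFE, 0xFF, 0x7F]))
```

A caller could use the bound to preallocate an output buffer before compressing, and the header value to size the buffer before decompressing.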
| 45 | +### Exceptions |
| 46 | + |
| 47 | +- `SnappyDecompressionError`: Raised when decompression fails |
| 48 | + |
| 49 | +## Development |
| 50 | + |
| 51 | +```bash |
| 52 | +# Install dependencies |
| 53 | +uv sync |
| 54 | + |
| 55 | +# Run tests |
| 56 | +uv run pytest |
| 57 | + |
| 58 | +# Run linter |
| 59 | +uv run ruff check src/ tests/ |
| 60 | + |
| 61 | +# Format code |
| 62 | +uv run ruff format src/ tests/ |
| 63 | +``` |
| 64 | + |
| 65 | +## Algorithm |
| 66 | + |
| 67 | +Snappy is an LZ77-variant compression algorithm that prioritizes speed over compression ratio. Key characteristics: |
| 68 | + |
| 69 | +- **Block-based**: Data is processed in 64 KiB (65,536-byte) blocks |
| 70 | +- **Hash table**: O(1) match lookup using a simple hash function |
| 71 | +- **Greedy matching**: Takes the first match found; no optimal parsing or lazy matching |
| 72 | +- **Wire format**: Varint-encoded uncompressed length followed by a sequence of literal and copy elements |
| 73 | + |
| 74 | +## References |
| 75 | + |
| 76 | +- [Google Snappy](https://github.com/google/snappy) |
| 77 | +- [Format Description](https://github.com/google/snappy/blob/main/format_description.txt) |
| 78 | + |
| 79 | +## License |
| 80 | + |
| 81 | +MIT License |