Skip to content

Latest commit

 

History

History
106 lines (75 loc) · 3.91 KB

File metadata and controls

106 lines (75 loc) · 3.91 KB

Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[0.2.0] - Unreleased

Added

  • --walk option for enhanced gitignore-style pattern handling
  • CITATION.cff for academic citation support
  • Line ending consistency check for cross-platform ISCC compatibility

Fixed

  • Unicode handling issues
  • Clippy warnings for Rust 1.88.0 compatibility
  • Cross-platform CI configuration

[0.1.0] - 2025-06-19

Summary

First stable release of iscc-sum, a high-performance ISCC Data-Code and Instance-Code hashing tool built in Rust with Python bindings. This release achieves 50-130x performance improvements over the pure python reference implementation while maintaining full compatibility with the ISCC standard.

Added

Core Features

  • ISCC Data-Code generation: Content-defined chunking (CDC) algorithm for content similarity detection
    • Wide format (128-bit, default) for enhanced security and similarity matching
    • Narrow format (64-bit) for ISO 24138:2024 compliance via --narrow flag
  • ISCC Instance-Code generation: BLAKE3-based cryptographic hash for file integrity verification
  • ISCC-SUM composite code: Combined Data-Code and Instance-Code with self-describing header
  • High-performance processing: 950-1050 MB/s throughput using SIMD optimizations and parallel processing

Command-Line Interface

  • Full-featured CLI (iscc-sum) with Unix-style interface
    • Generate checksums for files, directories, and stdin
    • Verify checksums from files (-c, --check)
    • Find similar files based on data (--similar)
    • Process directories as single objects (-t, --tree)
    • BSD-style output format (--tag)
    • NUL-terminated output (--zero)
    • Show individual code components (--units)

Python API

  • High-level API for easy integration
    • code_iscc_sum() function supporting local files and fsspec URIs
    • Streaming processors for large file handling
    • Dictionary-compatible result objects
  • Universal path support via fsspec integration
    • Process files from S3, HTTP/HTTPS, and other remote sources
    • Transparent handling of different storage backends

Platform Support

  • Cross-platform compatibility: Linux, macOS, and Windows
  • Python version support: 3.10, 3.11, 3.12, and 3.13
  • Pre-built wheels for all major platforms

Developer Experience

  • 100% test coverage requirement with comprehensive test suite
  • Integrated tooling via poethepoet task automation
  • Type annotations with mypy type checking
  • Security scanning with Bandit
  • Automated CI/CD pipeline with Release Please

Performance

  • 50-130x faster than pure Python reference implementations
  • 950-1050 MB/s processing speed on modern hardware
  • Parallel processing using Rayon for multi-core utilization

Standards Compliance

  • ISO 24138:2024 compatible when using --narrow flag
  • Reference implementation compatibility for all code formats
  • Deterministic output across platforms

Known Limitations

  • This is an early release focused on core functionality
  • Advanced ISCC features (Text-Code, Meta-Code) are out of scope for now
  • Rust crate not published to crates.io in this release

Dependencies

  • blake3 >=1.0.5 - Cryptographic hashing
  • click >=8.0.0 - CLI framework
  • pathspec >=0.12.1 - Gitignore-style pattern matching
  • universal-pathlib >=0.2.6 - Cross-platform path handling
  • xxhash >=3.5.0 - High-speed hashing for CDC

Acknowledgments

This project implements the ISCC (International Standard Content Code) as defined in ISO 24138:2024. The performance improvements are achieved through Rust's zero-cost abstractions and careful algorithm optimization while maintaining full compatibility with the standard.