All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `ProcessToWriter` API (457d341): New method that writes preprocessed output directly to an `io.Writer`, avoiding the output buffer allocation for large datasets
- Sentinel Errors (0cd1558):
  - `ErrNilWriter`: Returned when a nil `io.Writer` is passed to `ProcessToWriter`
  - `ErrNilReader`: Returned when a nil `io.Reader` is passed to `Processor.ProcessToWriter`, distinguished from `ErrEmptyFile`
- Struct Tag Parse Cache (457d341): `sync.Map`-based cache keyed by `(reflect.Type, strict)` eliminates redundant tag parsing on repeated `Process` calls
- Slice Reset on Reuse (457d341): `Process` now calls `SetLen(0)` before appending, so reusing the same destination slice no longer carries over stale elements
- Sentinel Error Wrapping (457d341, 3026a38): Errors from `fileparser.Parse` are now wrapped with `ErrEmptyFile`/`ErrUnsupportedFileType` so `errors.Is` works correctly
- Cross-Field Validation (457d341): Cross-field validators now use a preprocessed field-value map instead of column indices, fixing `target field ... not found` when the column is absent but filled by `prep:"default=..."`
- `wrapParseError` Precision (3026a38, 0cd1558): Replaced broad substring matching with exact message matching against fileparser v0.5.1 error strings; separated "reader cannot be nil" from `ErrEmptyFile` into its own `ErrNilReader`
- CI Hardening (457d341, 3026a38): Added `-race` flag and `govulncheck` (pinned v1.1.4) to CI workflow
- Go Version (785493a, 3026a38): Bumped minimum Go version to 1.25; updated `golang.org/x/net` to v0.51.0 to resolve GO-2026-4559
- Documentation (c685c44):
  - Rewrote "Before Using fileprep" as a concise "Gotchas" section across all 7 README languages
  - Added `ProcessToWriter` to README, `doc.go`, and `example_test.go`
  - Fixed CONTRIBUTING.md reference to non-existent `.cursorrules`/`.github/copilot-instructions.md`
  - Added version support table to SECURITY.md
- JSON/JSONL Format Support: First-class support for JSON (.json) and JSONL (.jsonl) file formats
  - JSON arrays are parsed into individual rows; each element becomes a row with a single `"data"` column
  - JSONL files are parsed line-by-line; each line becomes a row with a single `"data"` column
  - JSON/JSONL output is always compact JSONL (one JSON value per line, no header)
  - Pretty-printed JSON input is automatically compacted via `json.Compact`
- 18 New FileType Constants: 2 base types + 16 compressed variants
  - JSON: `FileTypeJSON`, `FileTypeJSONGZ`, `FileTypeJSONBZ2`, `FileTypeJSONXZ`, `FileTypeJSONZSTD`, `FileTypeJSONZLIB`, `FileTypeJSONSNAPPY`, `FileTypeJSONS2`, `FileTypeJSONLZ4`
  - JSONL: `FileTypeJSONL`, `FileTypeJSONLGZ`, `FileTypeJSONLBZ2`, `FileTypeJSONLXZ`, `FileTypeJSONLZSTD`, `FileTypeJSONLZLIB`, `FileTypeJSONLSNAPPY`, `FileTypeJSONLS2`, `FileTypeJSONLLZ4`
- Sentinel Errors for JSON Integrity:
  - `ErrInvalidJSONAfterPrep`: Hard error when preprocessing (e.g., `truncate`) destroys JSON structure
  - `ErrEmptyJSONOutput`: Hard error when all rows become empty after preprocessing, resulting in 0-line JSONL output
- `omitempty` Validator:
  - Added `validate:"omitempty,..."` support to skip subsequent validators when a field value is empty
  - Useful for optional fields such as `omitempty,email`
- Processor Options:
  - `WithStrictTagParsing()`: Strict mode that returns an error for invalid tag arguments
  - `WithValidRowsOnly()`: Output and destination slice include only rows that passed all validations
- Comprehensive Tests: Unit and integration tests for JSON/JSONL processing including pretty-printed input, compressed variants, validation, and error paths
  - Added tests for conditional cross-field validators (`required_if`, `required_unless`, `required_with`, `required_without`)
  - Added tests for type-conversion paths (`setFieldValue`) across string/int/uint/float/bool
  - Added end-to-end tests for XLSX and Parquet pipelines
- Tag Parser Refactor:
  - Refactored `prep`/`validate` tag parsing to a registry-based implementation for easier extension and maintenance
  - Improved error reporting for invalid tag argument formats in strict mode
- Output Behavior:
  - Added optional valid-row filtering via `WithValidRowsOnly()` while preserving row/error statistics in `ProcessResult`
- Dependency Update: Updated fileparser from v0.4.0 to v0.5.1 for JSON/JSONL parsing support
- Documentation:
  - Updated README content with clearer pre-use notes and conditional-validator examples
  - Replaced internal `CLAUDE.md` reference in package docs with pkg.go.dev link
- New Compression Formats: Added support for 4 new compression formats via fileparser v0.2.0
- zlib (.z) - Standard DEFLATE compression
- snappy (.snappy) - Google's high-speed compression
- s2 (.s2) - Improved Snappy extension, faster
- lz4 (.lz4) - Extremely fast compression
- New FileType Constants: Added 20 new FileType aliases for new compression format combinations
  - CSV: `FileTypeCSVZLIB`, `FileTypeCSVSNAPPY`, `FileTypeCSVS2`, `FileTypeCSVLZ4`
  - TSV: `FileTypeTSVZLIB`, `FileTypeTSVSNAPPY`, `FileTypeTSVS2`, `FileTypeTSVLZ4`
  - LTSV: `FileTypeLTSVZLIB`, `FileTypeLTSVSNAPPY`, `FileTypeLTSVS2`, `FileTypeLTSVLZ4`
  - Parquet: `FileTypeParquetZLIB`, `FileTypeParquetSNAPPY`, `FileTypeParquetS2`, `FileTypeParquetLZ4`
  - Excel: `FileTypeXLSXZLIB`, `FileTypeXLSXSNAPPY`, `FileTypeXLSXS2`, `FileTypeXLSXLZ4`
- Integration Tests: Added comprehensive tests for new compression formats (CSV, TSV, LTSV)
- Dependency Update: Updated to fileparser v0.2.0 for new compression format support
- Documentation: Updated all README files (en, ja, es, fr, ko, ru, zh-cn) with new compression formats
- Migrated from `github.com/nao1215/filesql/parser` to `github.com/nao1215/fileparser` for file parsing
- Updated all internal references from `parser.` to `fileparser.`
- Removed the dependency on `github.com/nao1215/filesql`
- Conditional Required Validators (9caa374): New validators for conditional field requirements
  - `required_if`: Required if another field equals a specific value
  - `required_unless`: Required unless another field equals a specific value
  - `required_with`: Required if another field is present
  - `required_without`: Required if another field is not present
- Date/Time Validator (9caa374): `datetime` validator with custom Go layout format support
- Phone Number Validator (9caa374): `e164` validator for E.164 international phone number format
- Geolocation Validators (9caa374): `latitude` (-90 to 90) and `longitude` (-180 to 180) validators
- UUID Variant Validators (9caa374): `uuid3`, `uuid4`, `uuid5` for specific UUID versions, and `ulid` for ULID format
- Hexadecimal and Color Validators (9caa374): `hexadecimal`, `hexcolor`, `rgb`, `rgba`, `hsl`, `hsla` validators
- MAC Address Validator (9caa374): `mac` validator for MAC address format
- Advanced Examples (f771f9b): Comprehensive documentation examples
- Complex Data Preprocessing and Validation example with real-world messy data
- Detailed Error Reporting example demonstrating validation error handling
- Benchmark Tests (607b868): Comprehensive benchmark suite for performance testing
- Performance Improvement (PR #6, 607b868): ~10% faster processing through optimized preprocessing and validation pipeline
- Documentation (f771f9b): Complete update of all README translations (Japanese, Spanish, French, Korean, Russian, Chinese) to match the English version with full feature documentation
- Initial Release: First stable release of fileprep library
- File Format Support: CSV, TSV, LTSV, Parquet, Excel (.xlsx) with compression support (gzip, bzip2, xz, zstd)
- Preprocessing Tags (`prep`): Comprehensive struct tag-based preprocessing
  - Basic preprocessors: `trim`, `ltrim`, `rtrim`, `lowercase`, `uppercase`, `default`
  - String transformation: `replace`, `prefix`, `suffix`, `truncate`, `strip_html`, `strip_newline`, `collapse_space`
  - Character filtering: `remove_digits`, `remove_alpha`, `keep_digits`, `keep_alpha`, `trim_set`
  - Padding: `pad_left`, `pad_right`
  - Advanced: `normalize_unicode`, `nullify`, `coerce`, `fix_scheme`, `regex_replace`
- Validation Tags (`validate`): Compatible with go-playground/validator syntax
  - Basic validators: `required`, `boolean`
  - Character type validators: `alpha`, `alphaunicode`, `alphaspace`, `alphanumeric`, `alphanumunicode`, `numeric`, `number`, `ascii`, `printascii`, `multibyte`
  - Numeric comparison: `eq`, `ne`, `gt`, `gte`, `lt`, `lte`, `min`, `max`, `len`
  - String validators: `oneof`, `lowercase`, `uppercase`, `eq_ignore_case`, `ne_ignore_case`
  - String content: `startswith`, `startsnotwith`, `endswith`, `endsnotwith`, `contains`, `containsany`, `containsrune`, `excludes`, `excludesall`, `excludesrune`
  - Format validators: `email`, `uri`, `url`, `http_url`, `https_url`, `url_encoded`, `datauri`, `uuid`
  - Network validators: `ip_addr`, `ip4_addr`, `ip6_addr`, `cidr`, `cidrv4`, `cidrv6`, `fqdn`, `hostname`, `hostname_rfc1123`, `hostname_port`
  - Cross-field validators: `eqfield`, `nefield`, `gtfield`, `gtefield`, `ltfield`, `ltefield`, `fieldcontains`, `fieldexcludes`
- Name-Based Column Binding: Automatic snake_case conversion with `name` tag override
- filesql Integration: Returns an `io.Reader` for direct use with filesql
- Detailed Error Reporting: Row and column information for each validation error
- Memory Optimization: In-place record processing, pre-allocated output buffers, streaming parsers for CSV/TSV/LTSV
- XLSX Streaming: Uses excelize streaming API to reduce memory usage for large files
- Parquet Buffer Reuse: Reusable row buffer across row groups to reduce allocations
- Format-Specific Limitations:
- XLSX: Only the first sheet is processed
- LTSV: Maximum line size is 10MB