cb70d4e to 58e68cd
Add archive_config.go with:
- archiveConfig struct for archive bomb protection (MaxTotalSize=10MB, MaxFileCount=1000, MaxNestingDepth=1)
- defaultArchiveConfig instance
- checkSize() and checkFileCount() methods
- supportedArchiveExts map (.zip, .rar, .7z)
- supportedTextExts map (.sql, .txt, .java)

Add archive_config_test.go with:
- TestArchiveConfig_CheckSize: 7 cases including boundary values
- TestArchiveConfig_CheckFileCount: 6 cases including boundary values
- TestDefaultArchiveConfig: verify default config values
- TestSupportedArchiveExts and TestSupportedTextExts: verify extension maps
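For readers without the diff at hand, the limits described in this commit could look roughly like the sketch below. The field types, the reading of 10MB as 10 * 1024 * 1024 bytes, the inclusive boundary (a value exactly at the limit passes), and the `errArchiveLimit` sentinel are all assumptions for illustration, not the PR's actual code.

```go
package main

import (
	"errors"
	"fmt"
)

// archiveConfig mirrors the three limits named in the commit message.
// Field types are assumed.
type archiveConfig struct {
	MaxTotalSize    int64 // total decompressed bytes allowed
	MaxFileCount    int   // number of entries allowed
	MaxNestingDepth int   // archive nesting levels allowed
}

// defaultArchiveConfig uses the values from the commit message.
// 10MB is assumed to mean 10 * 1024 * 1024 bytes.
var defaultArchiveConfig = archiveConfig{
	MaxTotalSize:    10 * 1024 * 1024,
	MaxFileCount:    1000,
	MaxNestingDepth: 1,
}

// errArchiveLimit is a hypothetical sentinel so callers can use errors.Is.
var errArchiveLimit = errors.New("archive limit exceeded")

// checkSize rejects totals strictly above the limit (assumed boundary rule).
func (c archiveConfig) checkSize(total int64) error {
	if total > c.MaxTotalSize {
		return fmt.Errorf("%w: total size %d bytes, limit %d", errArchiveLimit, total, c.MaxTotalSize)
	}
	return nil
}

// checkFileCount rejects counts strictly above the limit (assumed boundary rule).
func (c archiveConfig) checkFileCount(n int) error {
	if n > c.MaxFileCount {
		return fmt.Errorf("%w: %d files, limit %d", errArchiveLimit, n, c.MaxFileCount)
	}
	return nil
}

func main() {
	fmt.Println(defaultArchiveConfig.checkSize(10 * 1024 * 1024)) // exactly at the limit
	fmt.Println(defaultArchiveConfig.checkFileCount(1001))        // one over the limit
}
```

Whether the real boundary is inclusive or exclusive is exactly what the boundary-value test cases listed above would pin down.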
Add processArchiveEntry() function that handles individual files within archives (ZIP/RAR/7z) by dispatching based on file extension:
- .sql/.txt: read content with UTF-8 conversion as SQL
- .xml: read content with UTF-8 conversion as XML for MyBatis parsing
- .java: write to temp file and call java-sql-extractor to extract SQL
- other formats: mark as unsupported

Includes getSqlFromJavaContent() helper that bridges the gap between in-memory content and javaParser.GetSqlFromJavaFile (which requires a file path).

Unit tests cover all 8 cases from design doc section 5.1.1.
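The two ideas in this commit, extension-based dispatch and bridging in-memory content to a path-based parser, can be sketched as follows. `classifyEntry`, `entryKind`, `withTempFile`, and `readBack` are hypothetical names, and the case-insensitive extension match is an assumption; `readBack` merely stands in for a parser that requires a file path.

```go
package main

import (
	"fmt"
	"os"
	"path"
	"strings"
)

// entryKind is a hypothetical label for how an archive entry is treated.
type entryKind string

const (
	kindSQL         entryKind = "sql"
	kindXML         entryKind = "xml"
	kindJava        entryKind = "java"
	kindUnsupported entryKind = "unsupported"
)

// classifyEntry maps an entry name to a kind by extension.
// Archive entry names use forward slashes, hence package path.
func classifyEntry(name string) entryKind {
	switch strings.ToLower(path.Ext(name)) {
	case ".sql", ".txt":
		return kindSQL
	case ".xml":
		return kindXML
	case ".java":
		return kindJava
	default:
		return kindUnsupported
	}
}

// withTempFile writes content to a temporary file, hands its path to parse,
// and removes the file afterwards.
func withTempFile(content []byte, pattern string, parse func(filePath string) (string, error)) (string, error) {
	f, err := os.CreateTemp("", pattern)
	if err != nil {
		return "", fmt.Errorf("create temp file: %w", err)
	}
	defer os.Remove(f.Name())
	if _, err := f.Write(content); err != nil {
		f.Close()
		return "", fmt.Errorf("write temp file: %w", err)
	}
	// Close before parsing so the content is flushed to disk.
	if err := f.Close(); err != nil {
		return "", fmt.Errorf("close temp file: %w", err)
	}
	return parse(f.Name())
}

// readBack is a stand-in for a path-based parser; it only returns the file content.
func readBack(filePath string) (string, error) {
	b, err := os.ReadFile(filePath)
	return string(b), err
}

func main() {
	for _, name := range []string{"a.sql", "notes.TXT", "mapper.xml", "src/Dao.java", "logo.png"} {
		fmt.Printf("%s -> %s\n", name, classifyEntry(name))
	}
	out, err := withTempFile([]byte("select 1;"), "entry-*.java", readBack)
	fmt.Println(out, err)
}
```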
…ssArchiveEntry()

Refactor getSqlsFromZip() to delegate file processing to the unified processArchiveEntry() function instead of inline .sql/.xml handling. This extends ZIP support to .txt and .java files while maintaining backward compatibility for existing .sql and .xml processing.

Key changes:
- Replace maxZipFileSize check with defaultArchiveConfig.checkSize()
- Add file count limit via defaultArchiveConfig.checkFileCount()
- Delegate per-entry processing to processArchiveEntry()
- Preserve ErrUnknownEncoding skip behavior
- Preserve natural sort ordering and XML batch parsing
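The commit does not show how natural sort ordering is implemented. As an illustration of the general technique only, the sketch below compares digit runs numerically so that `2.sql` sorts before `10.sql`; `naturalLess` and `naturalSort` are hypothetical names and may differ from the project's own helper.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

func isDigit(c byte) bool { return '0' <= c && c <= '9' }

// naturalLess reports whether a sorts before b when runs of digits
// are compared by numeric value instead of byte by byte.
func naturalLess(a, b string) bool {
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		if isDigit(a[i]) && isDigit(b[j]) {
			si, sj := i, j
			for i < len(a) && isDigit(a[i]) {
				i++
			}
			for j < len(b) && isDigit(b[j]) {
				j++
			}
			// Strip leading zeros; then a longer run is a larger number.
			na := strings.TrimLeft(a[si:i], "0")
			nb := strings.TrimLeft(b[sj:j], "0")
			if len(na) != len(nb) {
				return len(na) < len(nb)
			}
			if na != nb {
				return na < nb
			}
			continue
		}
		if a[i] != b[j] {
			return a[i] < b[j]
		}
		i++
		j++
	}
	// The string that runs out first sorts first.
	return len(a)-i < len(b)-j
}

// naturalSort sorts names in place using naturalLess.
func naturalSort(names []string) {
	sort.Slice(names, func(x, y int) bool { return naturalLess(names[x], names[y]) })
}

func main() {
	names := []string{"10.sql", "2.sql", "1.sql"}
	naturalSort(names)
	fmt.Println(names)
}
```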
Implement RAR archive decompression support using nwaples/rardecode v1.1.3.
- Add rardecode v1.1.3 dependency to go.mod and vendor
- New archive_rar.go with getSqlsFromRar() (echo.Context wrapper) and processRarContent() (core logic, testable without echo.Context)
- Support: .sql/.txt/.xml/.java extraction, nested archive skipping, size/count limits via archiveConfig, natural sort ordering
- Generate 7 RAR4 test files via Python script for unit testing
- Add archive_rar_test.go with 10 test scenarios covering design doc 5.1.2
Implement 7z archive decompression support using bodgit/sevenzip library:
- New archive_7z.go with getSqlsFrom7z() (echo.Context wrapper) and process7zContent() (core logic accepting io.ReaderAt + size for testability)
- Handles sevenzip's io.ReaderAt requirement by reading upload into bytes.Reader
- Reuses processArchiveEntry() for file type dispatch (.sql/.txt/.xml/.java)
- Applies archiveConfig limits (10MB total size, 1000 files, nested archive skip)
- Supports ErrUnknownEncoding skip, XML batch parsing, natural sort ordering
- Added 7 test 7z files (sql_only, normal, nested, empty, unsupported, only_unsupported, sorted_test) in testdata/7z/
- Unit tests in archive_7z_test.go: 10 test scenarios covering all design doc 5.1.3 cases plus size/count/invalid data edge cases
- Vendored bodgit/sevenzip v1.3.0 and all transitive dependencies
Modify GetSQLFromFile() in task.go to dispatch by file extension:
- input_sql_file branch: .sql/.txt use original logic, .java calls getSqlFromJavaContent() to extract SQL from Java source code
- input_zip_file branch: replaced direct getSqlsFromZip() call with new getSqlsFromArchive() dispatcher that routes .zip/.rar/.7z to their respective handlers

Also confirmed isSupportFileType() needs no modification: it checks form field names (input_sql_file, input_zip_file), not file extensions, so new formats are already covered.
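One way to picture the getSqlsFromArchive() routing is a table from extension to handler. The sketch below is an assumption about shape only: the real handlers take an echo.Context, whereas the hypothetical `archiveHandler` here takes raw bytes so the example stays self-contained.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// archiveHandler is a hypothetical handler signature used for illustration.
type archiveHandler func(data []byte) (string, error)

// dispatchArchive picks a handler by the lower-cased file extension.
func dispatchArchive(name string, data []byte, handlers map[string]archiveHandler) (string, error) {
	ext := strings.ToLower(path.Ext(name))
	h, ok := handlers[ext]
	if !ok {
		return "", fmt.Errorf("unsupported archive format %q", ext)
	}
	return h(data)
}

func main() {
	// Placeholder handlers; real ones would decompress and extract SQL.
	handlers := map[string]archiveHandler{
		".zip": func([]byte) (string, error) { return "handled by zip", nil },
		".rar": func([]byte) (string, error) { return "handled by rar", nil },
		".7z":  func([]byte) (string, error) { return "handled by 7z", nil },
	}
	fmt.Println(dispatchArchive("upload.RAR", nil, handlers))
	fmt.Println(dispatchArchive("upload.tar", nil, handlers))
}
```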
Add XLSX parsing support using excelize/v2 library to extract SQL statements from Excel template files. The implementation follows the same code organization pattern as RAR/7z (separated echo.Context entry function and testable core function).

Changes:
- Add github.com/xuri/excelize/v2 v2.7.1 dependency with vendor
- New archive_xlsx.go: getSqlsFromXlsx() + processXlsxContent()
- Integrate .xlsx case into GetSQLFromFile pipeline in task.go
- 8 XLSX test files in testdata/xlsx/ covering all design doc scenarios
- Unit tests covering all 6 cases of design doc 5.1.4, plus extras
…supportedTextExts
- P0-1: Add nested archive skip check before file read (consistent with RAR/7z)
- P0-2: Add cumulative decompressed size tracking to prevent zip bombs
- P0-3: Close rc (ReadCloser) after io.ReadAll to fix resource leak
- P1-1: Add .xlsx to supportedTextExts now that phase 2 is complete, update test
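Items P0-2 and P0-3 can be illustrated with the standard archive/zip package. The sketch below counts bytes actually decompressed rather than trusting header sizes, and closes each entry reader via defer in a per-entry helper. `readZipEntries`, `readEntry`, `buildZip`, and `errSizeLimit` are hypothetical names; the PR's real loop lives in getSqlsFromZip() and may differ.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
)

var errSizeLimit = errors.New("decompressed size limit exceeded")

// readEntry reads one entry, allowing at most remaining bytes, and always closes rc.
func readEntry(f *zip.File, remaining int64) ([]byte, error) {
	rc, err := f.Open()
	if err != nil {
		return nil, err
	}
	defer rc.Close() // released on every return path
	// Read one byte past the budget so an overrun is detectable.
	content, err := io.ReadAll(io.LimitReader(rc, remaining+1))
	if err != nil {
		return nil, err
	}
	if int64(len(content)) > remaining {
		return nil, errSizeLimit
	}
	return content, nil
}

// readZipEntries reads all regular files while tracking the cumulative decompressed size.
func readZipEntries(data []byte, maxTotal int64) (map[string][]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := make(map[string][]byte)
	var total int64
	for _, f := range zr.File {
		if f.FileInfo().IsDir() {
			continue
		}
		content, err := readEntry(f, maxTotal-total)
		if err != nil {
			return nil, fmt.Errorf("%s: %w", f.Name, err)
		}
		total += int64(len(content))
		out[f.Name] = content
	}
	return out, nil
}

// buildZip creates a small in-memory ZIP for the demonstration.
func buildZip(files map[string]string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for name, body := range files {
		w, err := zw.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := io.WriteString(w, body); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildZip(map[string]string{"a.sql": "select 1;", "b.sql": "select 2;"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	entries, err := readZipEntries(data, 18)
	fmt.Println(len(entries), err)
	_, err = readZipEntries(data, 17)
	fmt.Println(err != nil)
}
```

Putting the Close in a helper with defer avoids the usual pitfall of deferring inside a loop, where readers would stay open until the whole archive has been processed.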
…QLFromFile

The input_sql_file branch in GetSQLFromFile() was passing raw bytes from controller.ReadFile() directly without encoding conversion. When a GBK-encoded file was uploaded, the raw GBK bytes were stored into MySQL UTF-8 columns, causing Error 1366 (Incorrect string value).

Added utils.ConvertToUtf8() calls for .sql/.txt and .java branches, consistent with processArchiveEntry() which already handles encoding conversion for files inside archives.

Fixes: BUG-001
BUG-003: When an archive (.zip/.rar/.7z) contains no auditable files (all files are unsupported formats), return a clear error message "no auditable files in the archive" instead of silently creating an empty audit record. This satisfies AC-4.6.

OBS-004: Track the count of skipped unsupported-format files during archive processing and include "skipped N unsupported format file(s)" in the API response message field. This satisfies AC-4.5.

Changes:
- Add skippedCount return value to processRarContent, process7zContent, getSqlsFromZip, getSqlsFromRar, getSqlsFrom7z, getSqlsFromArchive
- Add Message field to GetSQLFromFileResp for user feedback
- Add newBaseResWithMessage helper for response message composition
- Update CreateSQLAuditRecord and CreateAuditTask handlers to surface the skip message in the API response
- Update tests for new function signatures
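The two behaviours above (BUG-003 and OBS-004) reduce to a small decision on two counters. The sketch below uses the message strings quoted in the commit; the function `summarizeArchive`, its parameters, and treating an empty archive the same as an all-unsupported one are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoAuditableFiles carries the error text quoted in the commit message.
var errNoAuditableFiles = errors.New("no auditable files in the archive")

// summarizeArchive returns the user-facing skip message, or an error when
// nothing in the archive can be audited.
func summarizeArchive(auditable, skipped int) (string, error) {
	if auditable == 0 {
		return "", errNoAuditableFiles
	}
	if skipped == 0 {
		return "", nil // nothing worth reporting
	}
	return fmt.Sprintf("skipped %d unsupported format file(s)", skipped), nil
}

func main() {
	fmt.Println(summarizeArchive(2, 3))
	fmt.Println(summarizeArchive(0, 5))
}
```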
…slice initialization
- Updated the Java content preparation in TestProcessArchiveEntry to include realistic SQL usage for better extraction reliability.
- Optimized the initialization of the SQL slice in processXlsxContent to allocate the correct capacity based on the number of rows, improving performance.
58e68cd to a482817
User description
Summary
Support for the .txt, .java, .rar, .7z, and .xlsx formats
Test plan
Description
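- Add .7z file handling logic and unit tests
- Add .rar file parsing and archive safety checks
- Add .xlsx file parsing and SQL extraction logic
- Refactor to unify archive handling and encoding conversion
- Integrate the unified functions processArchiveEntry and getSqlFromJavaContent
- Modify the SQL audit API response message and file upload handling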
Diagram Walkthrough
File Walkthrough
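7 files
- Add the 7z file parsing implementation with archive size and file count checks
- Add the archive safety configuration and the size and count check functions
- Add the unified archive entry handler and SQL extraction from Java files
- Add RAR file parsing logic and safety checks
- Add XLSX file parsing and SQL concatenation/extraction logic
- Modify the SQL audit record response message and file upload handling
- Update GetSQLFromFile with multi-format dispatch and encoding conversion

5 files
- Add 7z file handling unit tests covering various scenarios
- Add archiveConfig unit tests verifying boundary conditions
- Add unit test scenarios for the archive entry handler
- Add unit tests for RAR file parsing and error handling
- Add XLSX file parsing unit tests covering multiple template cases

1 file
- Update dependencies, adding the 7z, RAR, and XLSX related dependencies

22 files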