
feat: extend file format import support - backend #3235

Open
actiontech-bot wants to merge 11 commits into main from sqle/feat-3228

Conversation

@actiontech-bot
Member

@actiontech-bot actiontech-bot commented Apr 9, 2026

User description

Summary

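  • Extend the GetSQLFromFile pipeline with support for the .txt, .java, .rar, .7z, and .xlsx formats
  • Add the archiveConfig security configuration, unifying the size/count/nesting limits for archive handling
  • Add the generic processArchiveEntry() function, shared by ZIP/RAR/7z
  • Backport safety checks to getSqlsFromZip so that it aligns with RAR/7z
  • Add automatic GBK→UTF-8 encoding conversion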

Test plan

  • Verify the upload-and-audit flow for .txt/.java/.rar/.7z/.xlsx
  • Verify the archive security limits (size/count/nesting)
  • Verify handling of GBK-encoded files
  • Run the unit tests

Description

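  • Add .7z file handling logic and unit tests

  • Add .rar file parsing and archive security checks

  • Add .xlsx file parsing and SQL extraction logic

  • Refactor to normalize archive handling and encoding conversion

  • Integrate the unified functions processArchiveEntry and getSqlFromJavaContent

  • Update the SQL audit API response message and file upload handling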


Diagram Walkthrough

flowchart LR
  A["\"archiveConfig configuration and checks\""] --> B["\"processArchiveEntry dispatch\""]
  B --> C["\"7z parsing (getSqlsFrom7z)\""]
  B --> D["\"RAR parsing (getSqlsFromRar)\""]
  B --> E["\"XLSX parsing (getSqlsFromXlsx)\""]
  C --> F["\"calls the sevenzip library\""]
  D --> G["\"calls the rardecode library\""]
  E --> H["\"calls the excelize library\""]
  I["\"unified entry point getSqlsFromArchive\""] --> C
  I --> D
  I --> E
  J["\"GetSQLFromFile API dispatch\""] --> I
  J --> K["\"Java file SQL extraction\""]

File Walkthrough

Relevant files
Enhancement
7 files
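archive_7z.go
Add 7z file parsing with archive size and file-count validation
+149/-0
archive_config.go
Add archive security configuration and size/count validation functions
+48/-0
archive_entry.go
Add the unified archive entry handler and Java file SQL extraction
+76/-0
archive_rar.go
Add RAR file parsing logic and security checks
+144/-0
archive_xlsx.go
Add XLSX file parsing and SQL concatenation/extraction logic
+103/-0
sql_audit_record.go
Update the SQL audit record response message and file upload handling
+47/-27
task.go
Update GetSQLFromFile with multi-format dispatch and encoding conversion support
+106/-12
Tests
5 files
archive_7z_test.go
Add 7z handling unit tests covering various scenarios
+205/-0
archive_config_test.go
Add archiveConfig unit tests verifying boundary conditions
+159/-0
archive_entry_test.go
Add unit test scenarios for the archive entry handler
+181/-0
archive_rar_test.go
Add unit tests for RAR parsing and error handling
+211/-0
archive_xlsx_test.go
Add XLSX parsing unit tests covering multiple template cases
+181/-0
Configuration
1 files
go.mod
Update dependencies, adding 7z, RAR, and XLSX related dependencies
+18/-1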
Additional files
22 files
empty.7z [link]   
nested.7z [link]   
normal.7z [link]   
only_unsupported.7z [link]   
sorted_test.7z [link]   
sql_only.7z [link]   
unsupported.7z [link]   
empty.rar [link]   
nested.rar [link]   
normal.rar [link]   
only_unsupported.rar [link]   
sorted_test.rar [link]   
sql_only.rar [link]   
unsupported.rar [link]   
empty_file.xlsx [link]   
multi_sheet.xlsx [link]   
no_sql_column.xlsx [link]   
sql_lowercase.xlsx [link]   
sql_mixed_name.xlsx [link]   
sql_uppercase.xlsx [link]   
standard_template.xlsx [link]   
with_empty_rows.xlsx [link]   

@CLAassistant

CLAassistant commented Apr 9, 2026

CLA assistant check
All committers have signed the CLA.

LordofAvernus and others added 11 commits April 9, 2026 09:26
Add archive_config.go with:
- archiveConfig struct for archive bomb protection (MaxTotalSize=10MB,
  MaxFileCount=1000, MaxNestingDepth=1)
- defaultArchiveConfig instance
- checkSize() and checkFileCount() methods
- supportedArchiveExts map (.zip, .rar, .7z)
- supportedTextExts map (.sql, .txt, .java)

Add archive_config_test.go with:
- TestArchiveConfig_CheckSize: 7 cases including boundary values
- TestArchiveConfig_CheckFileCount: 6 cases including boundary values
- TestDefaultArchiveConfig: verify default config values
- TestSupportedArchiveExts and TestSupportedTextExts: verify extension maps
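A minimal sketch of what the limits described in this commit could look like. The field names and default values come from the commit message; the method bodies, the error wording, the reading of 10MB as 10 * 1024 * 1024 bytes, and the choice to treat a value exactly equal to the limit as allowed are assumptions, not the code in archive_config.go.

```go
package main

import "fmt"

// archiveConfig bundles the limits applied while unpacking an archive.
// Field names follow the commit message; everything else is assumed.
type archiveConfig struct {
	MaxTotalSize    int64 // maximum cumulative size in bytes
	MaxFileCount    int   // maximum number of entries in one archive
	MaxNestingDepth int   // maximum archive-in-archive depth
}

// defaultArchiveConfig mirrors the defaults quoted in the commit message.
// Interpreting "10MB" as 10 * 1024 * 1024 bytes is an assumption.
var defaultArchiveConfig = archiveConfig{
	MaxTotalSize:    10 * 1024 * 1024,
	MaxFileCount:    1000,
	MaxNestingDepth: 1,
}

// checkSize rejects a total that is strictly larger than the limit.
// Whether the real implementation is inclusive is not stated in the PR.
func (c archiveConfig) checkSize(total int64) error {
	if total > c.MaxTotalSize {
		return fmt.Errorf("archive size %d exceeds limit %d", total, c.MaxTotalSize)
	}
	return nil
}

// checkFileCount rejects an entry count that is strictly larger than the limit.
func (c archiveConfig) checkFileCount(n int) error {
	if n > c.MaxFileCount {
		return fmt.Errorf("archive file count %d exceeds limit %d", n, c.MaxFileCount)
	}
	return nil
}
```

Keeping the limits in one struct lets the ZIP, RAR, and 7z handlers share the same checks, which appears to be the intent of defaultArchiveConfig.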
Add processArchiveEntry() function that handles individual files within
archives (ZIP/RAR/7z) by dispatching based on file extension:
- .sql/.txt: read content with UTF-8 conversion as SQL
- .xml: read content with UTF-8 conversion as XML for MyBatis parsing
- .java: write to temp file and call java-sql-extractor to extract SQL
- other formats: mark as unsupported

Includes getSqlFromJavaContent() helper that bridges the gap between
in-memory content and javaParser.GetSqlFromJavaFile (which requires a
file path). Unit tests cover all 8 cases from design doc section 5.1.1.
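The extension-based dispatch described above can be sketched roughly as follows. The entryKind type, its constants, and classifyEntry are hypothetical names introduced for illustration; the real processArchiveEntry() also reads the content, converts the encoding, and invokes the Java extractor, none of which is shown here.

```go
package main

import (
	"path/filepath"
	"strings"
)

// entryKind is a hypothetical label for how an archive entry is handled.
type entryKind int

const (
	kindUnsupported entryKind = iota // skipped and counted as unsupported
	kindSQL                          // .sql/.txt: content is treated as SQL text
	kindXML                          // .xml: content is parsed as a MyBatis mapper
	kindJava                         // .java: SQL is extracted from Java source
)

// classifyEntry maps an entry name to its handling category using the
// lowercased file extension, so "A.SQL" and "a.sql" are treated alike.
func classifyEntry(name string) entryKind {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".sql", ".txt":
		return kindSQL
	case ".xml":
		return kindXML
	case ".java":
		return kindJava
	default:
		return kindUnsupported
	}
}
```

Separating classification from reading keeps the per-format readers (ZIP, RAR, 7z) free of extension logic, which is one plausible reason for having a single shared entry function.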
…ssArchiveEntry()

Refactor getSqlsFromZip() to delegate file processing to the unified
processArchiveEntry() function instead of inline .sql/.xml handling.
This extends ZIP support to .txt and .java files while maintaining
backward compatibility for existing .sql and .xml processing.

Key changes:
- Replace maxZipFileSize check with defaultArchiveConfig.checkSize()
- Add file count limit via defaultArchiveConfig.checkFileCount()
- Delegate per-entry processing to processArchiveEntry()
- Preserve ErrUnknownEncoding skip behavior
- Preserve natural sort ordering and XML batch parsing
Implement RAR archive decompression support using nwaples/rardecode v1.1.3.

- Add rardecode v1.1.3 dependency to go.mod and vendor
- New archive_rar.go with getSqlsFromRar() (echo.Context wrapper) and
  processRarContent() (core logic, testable without echo.Context)
- Support: .sql/.txt/.xml/.java extraction, nested archive skipping,
  size/count limits via archiveConfig, natural sort ordering
- Generate 7 RAR4 test files via Python script for unit testing
- Add archive_rar_test.go with 10 test scenarios covering design doc 5.1.2
Implement 7z archive decompression support using bodgit/sevenzip library:

- New archive_7z.go with getSqlsFrom7z() (echo.Context wrapper) and
  process7zContent() (core logic accepting io.ReaderAt + size for testability)
- Handles sevenzip's io.ReaderAt requirement by reading upload into bytes.Reader
- Reuses processArchiveEntry() for file type dispatch (.sql/.txt/.xml/.java)
- Applies archiveConfig limits (10MB total size, 1000 files, nested archive skip)
- Supports ErrUnknownEncoding skip, XML batch parsing, natural sort ordering
- Added 7 test 7z files (sql_only, normal, nested, empty, unsupported,
  only_unsupported, sorted_test) in testdata/7z/
- Unit tests in archive_7z_test.go: 10 test scenarios covering all design doc
  5.1.3 cases plus size/count/invalid data edge cases
- Vendored bodgit/sevenzip v1.3.0 and all transitive dependencies
Modify GetSQLFromFile() in task.go to dispatch by file extension:
- input_sql_file branch: .sql/.txt use original logic, .java calls
  getSqlFromJavaContent() to extract SQL from Java source code
- input_zip_file branch: replaced direct getSqlsFromZip() call with
  new getSqlsFromArchive() dispatcher that routes .zip/.rar/.7z to
  their respective handlers

Also confirmed isSupportFileType() needs no modification - it checks
form field names (input_sql_file, input_zip_file) not file extensions,
so new formats are already covered.
Add XLSX parsing support using excelize/v2 library to extract SQL
statements from Excel template files. The implementation follows the
same code organization pattern as RAR/7z (separated echo.Context
entry function and testable core function).

Changes:
- Add github.com/xuri/excelize/v2 v2.7.1 dependency with vendor
- New archive_xlsx.go: getSqlsFromXlsx() + processXlsxContent()
- Integrate .xlsx case into GetSQLFromFile pipeline in task.go
- 8 XLSX test files in testdata/xlsx/ covering all design doc scenarios
- Unit tests covering design doc 5.1.4 all 6 cases plus extras
…supportedTextExts

- P0-1: Add nested archive skip check before file read (consistent with RAR/7z)
- P0-2: Add cumulative decompressed size tracking to prevent zip bombs
- P0-3: Close rc (ReadCloser) after io.ReadAll to fix resource leak
- P1-1: Add .xlsx to supportedTextExts now that phase 2 is complete, update test
…QLFromFile

The input_sql_file branch in GetSQLFromFile() was passing raw bytes from
controller.ReadFile() directly without encoding conversion. When a GBK-encoded
file was uploaded, the raw GBK bytes were stored into MySQL UTF-8 columns,
causing Error 1366 (Incorrect string value).

Added utils.ConvertToUtf8() calls for .sql/.txt and .java branches, consistent
with processArchiveEntry() which already handles encoding conversion for files
inside archives.

Fixes: BUG-001
BUG-003: When an archive (.zip/.rar/.7z) contains no auditable files
(all files are unsupported formats), return a clear error message
"no auditable files in the archive" instead of silently creating an
empty audit record. This satisfies AC-4.6.

OBS-004: Track the count of skipped unsupported-format files during
archive processing and include "skipped N unsupported format file(s)"
in the API response message field. This satisfies AC-4.5.

Changes:
- Add skippedCount return value to processRarContent, process7zContent,
  getSqlsFromZip, getSqlsFromRar, getSqlsFrom7z, getSqlsFromArchive
- Add Message field to GetSQLFromFileResp for user feedback
- Add newBaseResWithMessage helper for response message composition
- Update CreateSQLAuditRecord and CreateAuditTask handlers to surface
  the skip message in the API response
- Update tests for new function signatures
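The two behaviours described for BUG-003 and OBS-004 could be combined in a small helper along these lines. summarizeArchive is a hypothetical function; only the two message strings are taken from the commit message, and the PR itself threads skippedCount through the existing handlers instead.

```go
package main

import (
	"errors"
	"fmt"
)

// summarizeArchive is a hypothetical helper: it returns an error when the
// archive yielded nothing to audit, and otherwise a user-facing note about
// skipped files (empty when nothing was skipped).
func summarizeArchive(auditableCount, skippedCount int) (string, error) {
	if auditableCount == 0 {
		return "", errors.New("no auditable files in the archive")
	}
	if skippedCount > 0 {
		return fmt.Sprintf("skipped %d unsupported format file(s)", skippedCount), nil
	}
	return "", nil
}
```

Returning an error for the empty case avoids silently creating an empty audit record, while the message for the partial case can be surfaced in the response without failing the request.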
…slice initialization

- Updated the Java content preparation in TestProcessArchiveEntry to include realistic SQL usage for better extraction reliability.
- Optimized the initialization of the SQL slice in processXlsxContent to allocate the correct capacity based on the number of rows, improving performance.
@github-actions

github-actions bot commented Apr 9, 2026

PR Reviewer Guide 🔍

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

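Code formatting improvement

In newBaseResWithMessage, the passed-in message is appended to the existing response message. If the original Message is empty, the result may start with a stray comma or be formatted inconsistently. Consider checking the state of the original message before concatenating, or setting a default value, so that the returned format is always clear and consistent.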

// newBaseResWithMessage creates a success BaseRes, appending an optional message.
// If message is empty, the response message is "ok"; otherwise "ok, <message>".
func newBaseResWithMessage(message string) controller.BaseRes {
	res := controller.NewBaseReq(nil)
	if message != "" {
		res.Message = res.Message + ", " + message
	}
	return res
}
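A possible way to address the reviewer's concern is to join only the non-empty parts, so no leading separator can appear. joinMessages is a hypothetical helper and deliberately does not touch controller.BaseRes; it only illustrates the string handling.

```go
package main

import "strings"

// joinMessages is a hypothetical helper that joins the non-empty parts with
// ", " so an empty base message never produces a leading comma.
func joinMessages(base, extra string) string {
	parts := make([]string, 0, 2)
	for _, p := range []string{base, extra} {
		if p = strings.TrimSpace(p); p != "" {
			parts = append(parts, p)
		}
	}
	return strings.Join(parts, ", ")
}
```

With such a helper, the assignment inside newBaseResWithMessage could become res.Message = joinMessages(res.Message, message), assuming res.Message is a plain string field as the snippet above suggests.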
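Code reuse suggestion

In getSqlsFromArchive, the extension-based dispatch sends each archive type to its own handler, and the logic for opening the file, checking its size, and validating the file count is repeated in several of those handlers. Consider extracting these shared checks into a common function to reduce duplication and improve maintainability.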

// getSqlsFromArchive dispatches archive file processing based on file extension.
// It checks the uploaded file's extension and calls the appropriate handler:
// .zip -> getSqlsFromZip, .rar -> getSqlsFromRar, .7z -> getSqlsFrom7z.
// Returns skippedCount: the number of unsupported format files that were skipped.
func getSqlsFromArchive(c echo.Context) (sqlsFromSQLFile []SQLsFromSQLFile, sqlsFromXML []SQLFromXML, skippedCount int, exist bool, err error) {
	file, err := c.FormFile(InputZipFileName)
	if err == http.ErrMissingFile {
		return nil, nil, 0, false, nil
	}
	if err != nil {
		return nil, nil, 0, false, err
	}

	ext := strings.ToLower(filepath.Ext(file.Filename))
	switch ext {
	case ".zip":
		return getSqlsFromZip(c)
	case ".rar":
		return getSqlsFromRar(c)
	case ".7z":
		return getSqlsFrom7z(c)
	default:
		return nil, nil, 0, false, fmt.Errorf("unsupported archive file type: %s", ext)
	}
}

@github-actions

github-actions bot commented Apr 9, 2026

Failed to generate code suggestions for PR


3 participants