Skip to content

Commit d5a5fff

Browse files
ajitpratap0Ajit Pratap Singhclaude
authored
fix: architect-review sprint — 11 correctness & DX defects in one sweep (#517)
* fix(core): resolve critical correctness defects in pools, metrics, AST, errors, keywords Addresses 6 issues from the 2026-04-16 architect review: C1+C2: Pool cleanup leaks in pkg/sql/ast/pool.go - PutSelectStatement now releases GroupBy, Having, Qualify, StartWith, ConnectBy, Joins, Windows, PrewhereClause, Sample, ArrayJoin, Pivot, Unpivot, MatchRecognize, Top, DistinctOnColumns, From, Limit/Offset/Fetch/For, With.CTEs - PutUpdateStatement, PutInsertStatement, PutDeleteStatement similarly audited and fixed - PutExpression: raised MaxWorkQueueSize to 100k and added recursive overflow drain guarded by MaxCleanupDepth - Added PoolLeakCount/ResetPoolLeakCount for observability - New pool_leak_test.go proves stable heap across 1000 complex SELECTs and 2000-element IN-lists C3: Metrics errorsByType memory DoS in pkg/metrics/metrics.go - Replaced err.Error()-keyed map with per-ErrorCode atomic.Int64 buckets - GetStats now allocates 4/op regardless of error cardinality C6: AST Children() coverage in pkg/sql/ast/ - Fixed 14 node types that silently truncated visitor traversal - New children_coverage_test.go uses reflection to assert completeness H6: errors.Error immutability in pkg/errors/errors.go - WithContext/WithHint/WithCause shallow-copy receiver H11: Keyword conflict detection in pkg/sql/keywords/ - New Conflicts()/ResetConflicts() API - Resolved 11 silent conflicts across dialects go test -race ./... passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(linter,lsp,parser): visitor migration, config loader, LSP dispatch, structured parser errors Addresses 5 issues from the 2026-04-16 architect review: C5: Linter rules now use ast.Walk instead of flat top-level traversal - 14 rules migrated (L009, L011-L014, L016, L017, L020, L023-L026, L028, L030) - 6 already-walking rules reinforced (L018, L019, L021, L022, L027, L029) - L022 FunctionOnColumn now descends into FROM subqueries - L029 SubqueryCanBeJoin traversal bug fixed (was skipping WITH/FROM/JOIN) - 22 new nested-case tests across performance, naming, safety, style H8: .gosqlx.yml config loader in pkg/linter/config/ - YAML schema: rules{id:{enabled,severity,params}}, ignore[], default_severity - Load(path), LoadDefault() (walks from cwd), Config.Apply() - linter.NewWithIgnore wires in IgnoreMatcher without circular dep - All 30 rules respect enable/disable and severity overrides via wrapper - yaml.v3 already in go.mod - 7 testdata fixtures, 18 tests H9: LSP statement dispatch in pkg/lsp/handler.go - Replaced fmt.Sprintf("%T")+strings.Contains with proper ast.Statement type switch - 15 statement kinds: SelectStatement, InsertStatement, UpdateStatement, DeleteStatement, MergeStatement, CreateTableStatement, CreateViewStatement, CreateIndexStatement, CreateMaterializedViewStatement, CreateSequenceStatement, DropStatement, DropSequenceStatement, AlterStatement, AlterTableStatement, AlterSequenceStatement, TruncateStatement, WithClause - Meaningful symbol names ("SELECT from users" vs "SELECT #1") H10: LSP documentSymbol ranges - Real statement ranges derived from AST position info via walkNode+nodeLocation - Fallback to earliestChildLocation + semicolon-or-EOF scan for nodes lacking Pos - Ranges bounded by next statement start (never overlap) - New TestHandler_DocumentSymbol_MultiStatementRanges verifies distinct ranges H7: 37 fmt.Errorf sites in parser converted to structured errors - Constructor distribution: 16 InvalidSyntaxError, 7 ExpectedTokenError, 5 UnsupportedFeatureError, 6 WrapError, 1 IncompleteStatementError - Context cancellation sites now return ctx.Err() directly (preserves errors.Is(err, context.Canceled) semantics) - INSERT...SELECT ErrUnexpectedStatement sentinel preserved via WithCause - New parser_errors_test.go covers 6 categories / 12 subtests go test -race ./... passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(api,dialect,ci): additive public API (Tree, options, ParseReader), typed dialect capabilities, workflow CGO fix Addresses 5 issues from the 2026-04-16 architect review. All changes are additive and backward-compatible — existing Parse/ParseWithContext/ParseWithDialect/ ParseWithTimeout/Format/FormatOptions/MustParse signatures are untouched. H1-H4: Additive public API in pkg/gosqlx/ - Tree: opaque wrapper over *ast.AST with Statements(), Walk(), Format(), Tables(), Columns(), Functions(), Raw(), Release() - ParseTree(ctx, sql, opts...) — recommended entry point for new code - Functional options: WithDialect, WithStrict, WithTimeout, WithRecovery - FormatTree/FormatAST — format from AST without re-parse (no perf foot-gun) - ParseReader(ctx, io.Reader, opts...) — io.ReadAll then delegate - ParseReaderMultiple — splits on ; (quoted-aware) and parses each - Sentinel errors: ErrSyntax, ErrTokenize, ErrTimeout, ErrUnsupportedDialect (works with errors.Is) - Tree.Walk verified to descend into subqueries/CTEs H5: Typed dialect + Capabilities in new pkg/sql/dialect/ - Dialect typed string with constants: PostgreSQL, MySQL, MariaDB, SQLServer, Oracle, SQLite, Snowflake, ClickHouse, Generic, Unknown - Parse(s) normalizes free-form strings - Capabilities struct with 22 feature flags (SupportsQualify, SupportsArrayJoin, SupportsPrewhere, SupportsSample, SupportsTimeTravel, SupportsMatchRecognize, SupportsPivotUnpivot, SupportsWindowIgnoreNulls, SupportsConnectBy, SupportsTop, SupportsLimitOffset, SupportsFetchFirst, SupportsMerge, SupportsReturning, SupportsCompoundReturning, SupportsDistinctOn, SupportsMaterializedView, SupportsIndexHints, SupportsBracketQuoting, SupportsBacktickQuoting, SupportsDoubleQuoteIdentifier, SupportsILike) - Parser.DialectTyped() and Parser.Capabilities() accessors - Parser.IsPostgreSQL()/IsMySQL()/... predicates - Parser.dialect string field marked Deprecated (removal in v2.0) - Existing p.dialect == "..." sites left untouched (bulk migration deferred) C4: Release workflow CGO mismatch in .github/workflows/release.yml - Replaced raw `go test -race ./...` (implicit CGO_ENABLED=0) with `task test:race` + separate `task test:cbinding` step with CGO_ENABLED=1 - test.yml race-check job aligned similarly; added dedicated cbinding-race job - Install Task step added to both workflows Repo hygiene: - CLAUDE.md module graph corrected (tokenizer does depend on keywords; ast depends on models+metrics not token; added transform, fingerprint, lsp, linter, formatter, cbinding, security with actual deps) - .gitignore: added audit-*.png, 01-*.png, ..., 04-*.png, api-reference-*.png, .claude/worktrees/ to prevent audit artifact leakage - docs/ARCHITECT_REVIEW_2026-04-16.md: full architect review document go test -race ./... passes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(ci): resolve staticcheck and race detector failures - Replace nil context with context.TODO() in reader_test.go and tree_test.go - Add //lint:ignore U1000 comments for documented but unused items in children_coverage_test.go - Add CGO_ENABLED=1 to test:race task in Taskfile.yml (race detector requires CGO) These fixes resolve: - SA1012: do not pass a nil Context - U1000: unused var/func (documented for future use) - Race detector failure: -race requires CGO * fix(ci): increase race detector timeout to 120s The CI environment is slower and times out at 60s. Tests take ~60s on CI vs ~42s locally. --------- Co-authored-by: Ajit Pratap Singh <ajitpratapsingh@Ajits-Mac-mini-2655.local> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 70f891e commit d5a5fff

83 files changed

Lines changed: 7746 additions & 584 deletions

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/workflows/release.yml

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -22,10 +22,21 @@ jobs:
2222
uses: actions/setup-go@v5
2323
with:
2424
go-version: '1.26'
25-
26-
- name: Run tests
27-
run: go test -race ./...
28-
25+
26+
- name: Install Task
27+
run: go install github.com/go-task/task/v3/cmd/task@latest
28+
29+
# Runs the pure-Go test suite via Taskfile (test:race excludes pkg/cbinding
30+
# because it requires CGO). We run cbinding tests separately below with
31+
# CGO_ENABLED=1 so the tagged release is validated against both build modes.
32+
- name: Run tests (race, no CGO)
33+
run: task test:race
34+
35+
- name: Run cbinding tests (CGO)
36+
env:
37+
CGO_ENABLED: '1'
38+
run: task test:cbinding
39+
2940
- name: Run GoReleaser
3041
uses: goreleaser/goreleaser-action@v6
3142
with:

.github/workflows/test.yml

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -74,8 +74,33 @@ jobs:
7474
go-version: '1.26'
7575
cache: true
7676

77+
- name: Install Task
78+
run: go install github.com/go-task/task/v3/cmd/task@latest
79+
80+
# Use Taskfile so CI matches local `task test:race` (pkg/cbinding excluded -
81+
# it requires CGO and is exercised separately in the cbinding-race job).
7782
- name: Run tests with race detector
78-
run: go test -race -short ./...
83+
run: task test:race
84+
85+
cbinding-race:
86+
name: Race Detector (CGO / cbinding)
87+
runs-on: ubuntu-latest
88+
steps:
89+
- uses: actions/checkout@v4
90+
91+
- name: Set up Go
92+
uses: actions/setup-go@v5
93+
with:
94+
go-version: '1.26'
95+
cache: true
96+
97+
- name: Install Task
98+
run: go install github.com/go-task/task/v3/cmd/task@latest
99+
100+
- name: Run cbinding tests (CGO + race)
101+
env:
102+
CGO_ENABLED: '1'
103+
run: task test:cbinding
79104

80105
benchmark:
81106
name: Benchmark

.gitignore

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,5 +59,16 @@ sql-validator
5959
# Git worktrees
6060
.worktrees/
6161
.superpowers/
62+
.claude/worktrees/
6263
gosqlx.png
6364
gosqlx_text.png
65+
66+
# Local UI / visual-regression audit screenshots (generated by website checks,
67+
# Playwright / Chrome DevTools runs). These are large PNGs that leak into the
68+
# repo root and should never be committed.
69+
audit-*.png
70+
01-*.png
71+
02-*.png
72+
03-*.png
73+
04-*.png
74+
api-reference-*.png

CLAUDE.md

Lines changed: 31 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -48,17 +48,40 @@ The codebase uses extensive sync.Pool for all major data structures:
4848

4949
### Module Dependencies
5050

51-
Clean hierarchy with minimal coupling:
51+
Clean hierarchy with minimal coupling (verified against production imports):
5252
```
53-
models → (no deps)
54-
errors → models
55-
keywords → models
56-
tokenizer → models, keywords, metrics
57-
ast → token
58-
parser → tokenizer, ast, token, errors
59-
gosqlx → all (high-level wrapper)
53+
# Core parsing chain
54+
models → (no deps)
55+
errors → models
56+
metrics → (no deps)
57+
keywords → (no deps)
58+
token → (no deps)
59+
tokenizer → models, errors, metrics, keywords
60+
ast → models, metrics
61+
parser → models, errors, keywords, token, tokenizer, ast
62+
63+
# Higher-level / product packages
64+
formatter → models, sql/ast, sql/parser, sql/tokenizer
65+
transform → formatter, sql/ast, sql/keywords, sql/parser, sql/tokenizer
66+
fingerprint→ formatter, sql/ast, sql/parser, sql/tokenizer
67+
security → sql/ast (scanner; tests also pull parser, tokenizer)
68+
linter → sql/parser, sql/tokenizer
69+
# rule sub-packages additionally import: linter, models, sql/ast
70+
lsp → errors, models, gosqlx, sql/keywords, sql/parser, sql/tokenizer
71+
cbinding → gosqlx, sql/ast (requires CGO; excluded from task test:race)
72+
73+
# High-level wrapper
74+
gosqlx → all of the above (top-level convenience API)
6075
```
6176

77+
Notes:
78+
- `pkg/cbinding` requires `CGO_ENABLED=1`. The Taskfile splits this out: `task test:race`
79+
runs everything except cbinding, and `task test:cbinding` runs cbinding with CGO on.
80+
CI workflows must follow the same split or cbinding is silently skipped.
81+
- `keywords` has no intra-module deps — it's a pure keyword table.
82+
- `ast` depends on `models` (spans, locations) and `metrics` (pool instrumentation),
83+
NOT on `token` in production code.
84+
6285
## Development Commands
6386

6487
This project uses [Task](https://taskfile.dev) as the task runner:

Taskfile.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,7 @@ tasks:
7979
desc: Run tests with race detection (CRITICAL for production)
8080
cmds:
8181
- echo "Running tests with race detection..."
82-
- go test -race -timeout 60s $(go list ./... | grep -v /cbinding)
82+
- CGO_ENABLED=1 go test -race -timeout 120s $(go list ./... | grep -v /cbinding)
8383

8484
test:cbinding:
8585
desc: Test C binding package (requires CGO)
Lines changed: 192 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,192 @@
1+
# GoSQLX Architect Review — 2026-04-16
2+
3+
Cross-component review of the entire project via 5 parallel architect agents. Scope: parsing pipeline, foundation layer, public APIs, advanced features (linter/LSP/security), and cross-cutting concerns (repo/build/CI).
4+
5+
---
6+
7+
## Executive Summary
8+
9+
The architecture is **good, not great**. The pipeline shape, pool strategy, concurrency model, DoS hardening, and security workflow are sensible and well above average for an OSS Go SDK. Three classes of issues hold it back:
10+
11+
1. **Correctness landmines** in pool cleanup and metrics that quietly undermine the 1.38M ops/sec headline claim.
12+
2. **DX friction** at the public API boundary that likely explains the 0 imports on pkg.go.dev despite 88 stars.
13+
3. **Extensibility debt** — dialect branching, linter rules, and security detection are all hardcoded; there is no published extension API.
14+
15+
None of this is a rewrite. It's **2–3 sprints** of focused work to turn a respectable SDK into a credible category leader.
16+
17+
---
18+
19+
## Critical Issues (Fix Before v1.15)
20+
21+
### C1. Pool leak in `PutSelectStatement`
22+
**File**: `pkg/sql/ast/pool.go:671-726`
23+
Only `Columns`, `OrderBy`, `Where` are released. Missing: `GroupBy`, `Having`, `Qualify`, `StartWith`, `ConnectBy`, `Joins`, `Windows`, `PrewhereClause`, `Sample`, `ArrayJoin`, `Pivot`, `Unpivot`, `MatchRecognize`, `Top`, `DistinctOnColumns`, `From`, `Limit`, `Offset`, `Fetch`, `For`. Every production-shaped SELECT leaks pooled expressions. `UpdateStatement` has the same defect at line 593+.
24+
25+
### C2. `PutExpression` silently drops past `MaxWorkQueueSize`
26+
**File**: `pkg/sql/ast/pool.go:880`
27+
The work-queue cap of 1000 causes remaining entries to not return to pools. Large IN-lists (an advertised use case — 1000 values = ~3000-4000 tokens) leak hundreds of nodes per parse.
28+
29+
### C3. Unbounded metrics map keyed by `err.Error()`
30+
**File**: `pkg/metrics/metrics.go:372-376, 458-462`
31+
`errorsByType` uses full formatted error strings as keys under a write lock. Pathological or fuzz-generated inputs with unique error strings create a memory DoS vector. Plus the map is deep-copied on every `GetStats()` call (line 739-744).
32+
**Fix**: key by `ErrorCode` (bounded ~20 buckets), use `atomic.Int64` per bucket, drop the mutex.
33+
34+
### C4. Release workflow CGO mismatch
35+
**File**: `.github/workflows/release.yml:27`
36+
Runs `go test -race ./...` with implicit `CGO_ENABLED=0`, but `pkg/cbinding` requires CGO. Either passes silently (skipping cbinding tests) or fails on tag push. Replace with `task test:race`.
37+
38+
### C5. Linter rules don't traverse nested AST
39+
All 22 linter rules use flat `for _, stmt := range ctx.AST.Statements` — zero use of `ast.Walk`. `SelectStarRule` won't detect `SELECT * FROM (SELECT id FROM t)`. `DeleteWithoutWhereRule` misses CTE-modifying statements. Rules silently regress every time a new AST node type lands.
40+
**Fix**: route all rules through `ast.Walk`/`ast.Inspector` in `pkg/sql/ast/visitor.go:161,218`.
41+
42+
### C6. AST `Children()` coverage is incomplete
43+
15+ node types return `nil` despite having children: `DropStatement`, `TruncateStatement`, `PragmaStatement`, `ShowStatement`, `DescribeStatement`, `UnsupportedStatement`, `WindowFrame`, `FetchClause`, `ForClause`, `UnpivotClause`, `SampleClause`, `ReferenceDefinition`, `TableOption`, `IndexColumn`. Anyone building a semantic analyzer on `Walk` gets silent truncation. Add a `go vet`-style test: any Node/Expression/Statement-typed field must appear in `Children()`.
44+
45+
---
46+
47+
## High-Severity Issues (Target v1.15–v1.16)
48+
49+
### H1. Public API leaks `*ast.AST` — forces users into `pkg/sql/ast`
50+
**File**: `pkg/gosqlx/gosqlx.go:102`
51+
`Parse()` returns `*ast.AST`. Users must import `pkg/sql/ast` to do anything non-trivial (type-switch, walk). This defeats the two-tier abstraction promise.
52+
**Fix**: wrap in an opaque `gosqlx.Tree` with methods `Statements()`, `Walk(fn)`, `Format(opts)`, `Tables()`, `Release()`, `Raw()`.
53+
54+
### H2. `FormatAST` not exposed at top tier — every Parse→Modify→Format re-parses
55+
**File**: `pkg/gosqlx/gosqlx.go:564-587`
56+
`Format(sql string, opts)` re-tokenizes. `formatter.FormatAST` exists internally but isn't surfaced. Self-inflicted perf wound for a library marketed on throughput.
57+
58+
### H3. Functional options anti-pattern: `ParseWithContext`, `ParseWithDialect`, `ParseWithTimeout`, `ParseWithRecovery`
59+
Combinatorial explosion (`ParseWithContextWithDialectWithStrict`?). Collapse to `gosqlx.Parse(ctx, sql, WithDialect(d), WithTimeout(t), WithRecovery(), WithStrict())` using functional options.
60+
61+
### H4. No `io.Reader` / `io.Writer` support
62+
Zero `io.Reader` references in `pkg/gosqlx`. Users parsing files/HTTP bodies `io.ReadAll` first. `ParseReader(ctx, io.Reader, ...Option)` is table stakes for a Go SDK.
63+
64+
### H5. Dialect is `string` with 72 scattered `p.dialect ==` comparisons
65+
**Fix**: switch `Parser.dialect` to the typed `keywords.SQLDialect`, add helper predicates (`isClickHouse`, `isSnowflake`, etc.), and consider a `DialectCapabilities` struct (`SupportsQualify bool`, `SupportsArrayJoin bool`, …) to centralize feature gates. This is the #1 extensibility drag. Adding a 9th dialect today is a multi-file scavenger hunt.
66+
67+
### H6. `*errors.Error` claims immutability but `WithContext`/`WithHint`/`WithCause` mutate
68+
**File**: `pkg/errors/errors.go:367, 399, 429`
69+
Docstrings lie. External consumers holding a shared `*Error` get observer effects. Either return shallow copies or unexport fields behind accessors.
70+
71+
### H7. 38 call sites use `fmt.Errorf` instead of structured errors
72+
Errors without position info, without error codes, without `errors.Is`-compatibility. Violates the LSP integration that already ships. Grep for `fmt.Errorf(` inside `func (p *Parser)` methods and rewrite via `goerrors.InvalidSyntaxError(msg, p.currentLocation(), hint)`.
73+
74+
### H8. No linter rule configuration (`.gosqlx.yml` referenced but unimplemented)
75+
`cmd/gosqlx/doc.go:52, 294` and `docs/CONFIGURATION.md:11` advertise `.gosqlx.yml`; `pkg/linter/` contains **zero** YAML/config code. Rule severity is baked in at construction, no per-rule disable, no inline `-- gosqlx:disable L016` suppression. Major adoption blocker vs. sqlfluff.
76+
77+
### H9. LSP uses reflection-via-strings for statement dispatch
78+
**File**: `pkg/lsp/handler.go:1230-1271`
79+
`fmt.Sprintf("%T", stmt)` then `strings.Contains(typeName, "SelectStatement")`. Forbidden by the project's own style guide; breaks on rename/vendor; unnecessary when a two-value type switch is 3 lines away.
80+
81+
### H10. LSP `documentSymbol` returns fake ranges
82+
**File**: `pkg/lsp/handler.go:1278-1288`
83+
"A more sophisticated implementation would track actual positions" — today every outline entry points at line 0. Primary value of documentSymbol is degraded.
84+
85+
### H11. Keyword registration is order-dependent
86+
**File**: `pkg/sql/keywords/keywords.go:293-309`
87+
`addKeywordsWithCategory` silently skips duplicates via `containsKeyword`. `REPLACE` appears in both `ADDITIONAL_KEYWORDS` and `SQLITE_SPECIFIC`; whichever runs first wins. Position-dependent semantics masquerading as a dispatch table. Log/panic on conflicts in tests.
88+
89+
---
90+
91+
## Medium-Severity (Strategic / Multi-Sprint)
92+
93+
- **M1. God-files need splitting**: `pkg/sql/ast/ast.go` (2,327L), `pkg/sql/ast/sql.go` (1,853L), `pkg/sql/tokenizer/tokenizer.go` (1,842L), `pkg/sql/parser/parser.go` (1,186L). All exceed the 800-line ceiling in the project's own `coding-style.md`. Mechanical split by domain (`ast_select.go`, `ast_dml.go`, etc.) is cheap and materially improves contributor onboarding.
94+
- **M2. Two parallel token types**`pkg/models/Token` and `pkg/sql/token/Token` coexist. Pick one. `pkg/sql/ast` already only uses `pkg/models`; `pkg/sql/token` is effectively dead outside `pkg/sql/parser`.
95+
- **M3. `Token` struct carries a `*Word` pointer** — heap alloc per keyword/identifier. Flatten in a v2 token type.
96+
- **M4. No Prometheus collector**`pkg/metrics` exposes `Stats` but no `prometheus.Collector`. Given the repo's stated observability stack, `pkg/metrics/prometheus/` is a natural sub-package.
97+
- **M5. Compatibility package is reflect-snapshots, not a contract**`pkg/compatibility/compatibility_test.go` golden files stop at v1.5.1 (we're on v1.14). High-level `pkg/gosqlx` has zero stability tests. Wire `gorelease`/`apidiff` into CI.
98+
- **M6. `preprocessTokens` allocates a slice on every Parse** (`pkg/sql/parser/preprocess.go:50`). At 1.38M ops/sec × 50 tokens, that's ~70M allocs/sec. Pool the preprocess buffer.
99+
- **M7. Perf regression gate is `continue-on-error: true`** with 60–65% tolerance. Regressions up to 1.65× slip through silently. Tighten to <25% on a self-hosted runner and make the job required.
100+
- **M8. No benchstat comparison in CI** — benchmarks run but output is discarded. Add `benchmark-action/github-action-benchmark` or upload/compare artifacts.
101+
- **M9. Error severity missing** — all `ErrorCode`s are flat; no `Severity` (warning/error/fatal). LSP diagnostic severity mapping is thus heuristic. Add `Severity` to the `Error` type.
102+
- **M10. Module graph documentation drift**`CLAUDE.md:44-52` claims dependencies that don't match the code. `tokenizer→keywords` is false; `ast→token` is false; `transform`, `fingerprint`, `lsp`, `linter`, `formatter`, `cbinding` aren't in the graph at all.
103+
104+
---
105+
106+
## Repo Hygiene (Quick Wins)
107+
108+
1. Clean up the 100+ `.png` audit screenshots and `.claude/worktrees/` from the working tree (route to `docs/audits/YYYY-MM/` or a separate repo).
109+
2. Add `tools/tools.go` with pinned dev tools — local `task deps:tools` installs `@latest`, CI pins `golangci-lint v2.11.3`, they already drift.
110+
3. Fix the module graph in `CLAUDE.md` lines 44-52 to match reality.
111+
4. Replace `.github/workflows/release.yml:27` `go test -race` with `task test:race` — single source of truth.
112+
5. Delete the committed `examples/cmd/cmd` binary.
113+
6. Consider moving `pkg/metrics`, `pkg/config`, infrastructure packages to `internal/` to reduce SemVer commitment burden.
114+
115+
---
116+
117+
## Pre-v2.0 Tech Debt Punch List
118+
119+
| # | Item | Why v2.0 gate |
120+
|---|------|---------------|
121+
| 1 | Split god-files (ast.go, sql.go, tokenizer.go, parser.go) | SemVer break lets you reorganize safely |
122+
| 2 | Remove `ConversionResult.PositionMapping` (marked deprecated at `parser.go:41-42`) | Removal window |
123+
| 3 | Merge/delete `pkg/sql/token` — parallel token types are confusing | Pick one |
124+
| 4 | Move non-API packages behind `internal/` | Reduces public API surface |
125+
| 5 | `DialectRegistry` replacing `switch` in `keywords.New()` | Clean extension boundary |
126+
| 6 | `gosqlx.Tree` opaque wrapper replacing raw `*ast.AST` return | Lets AST internals evolve without user breakage |
127+
| 7 | Functional options on `Parse` | Collapse `ParseWith*` family |
128+
| 8 | Structured errors everywhere (no `fmt.Errorf` in parser) | LSP/IDE integration quality |
129+
| 9 | Logger interface injection (203 `fmt.Println` calls across 38 files) | Embedders cannot silence output today |
130+
131+
---
132+
133+
## Competitive Framing
134+
135+
| Capability | GoSQLX | vitess | sqlparser-rs | sqlfluff |
136+
|---|---|---|---|---|
137+
| One-line Parse |||| N/A |
138+
| Typed AST walk at top level |||| N/A |
139+
| AST → SQL no-reparse |||| N/A |
140+
| io.Reader / streaming || partial || N/A |
141+
| Functional options || N/A || N/A |
142+
| Sentinel errors (errors.Is) |||| N/A |
143+
| API stability tooling | reflect-snapshot | apidiff | cargo semver-checks | N/A |
144+
| Rule config / suppressions || N/A | N/A ||
145+
| Auto-fix rules | ❌ (all stubs) | N/A | N/A | ✅ (~30) |
146+
147+
The **three** highest-leverage gaps vs competitors: (1) AST walk ergonomics, (2) FormatAST at top tier, (3) functional options. These are 1–2 weeks each. Fixing them would make the 5-minute DX experience competitive with sqlparser-rs.
148+
149+
---
150+
151+
## Recommended Sprint Plan
152+
153+
**Sprint 1 — "Correctness" (1 week)**
154+
- Fix C1, C2 (pool leaks)
155+
- Fix C3 (metrics DoS)
156+
- Fix C4 (release workflow)
157+
- Complete AST `Children()` coverage (C6)
158+
- Add leak-detection benchmark for production-shaped SELECT
159+
160+
**Sprint 2 — "DX" (2 weeks)**
161+
- H1: `gosqlx.Tree` opaque wrapper
162+
- H2: `FormatTree`/`FormatAST` at top tier
163+
- H3: Functional options
164+
- H4: `ParseReader`
165+
- README first-impression fix
166+
167+
**Sprint 3 — "Extensibility" (2 weeks)**
168+
- C5: Rules through `ast.Walk`
169+
- H8: `.gosqlx.yml` loader + per-rule config + inline suppression
170+
- Rule-Authoring SDK (`pkg/linter/sdk/`)
171+
- H5: Typed dialect + `DialectCapabilities` struct
172+
173+
**Sprint 4 — "Quality polish" (1 week)**
174+
- H7: `fmt.Errorf` → structured errors sweep
175+
- H9, H10: LSP dispatch + document symbol ranges
176+
- H11: Keyword conflict detection
177+
- M1: Split god-files
178+
- Repo hygiene quick wins
179+
180+
That's 6 weeks of real work. Ship v1.15 after Sprint 2, v1.16 after Sprint 3, v1.17 (or v2.0 cut) after Sprint 4.
181+
182+
---
183+
184+
## Net Assessment
185+
186+
**What's unusually good**: security workflow, Taskfile DX, pool discipline philosophy, DoS hardening, LSP capability breadth, error-code taxonomy design, dependency graph discipline, context.Context propagation.
187+
188+
**What will block adoption at 1000+ stars**: the public API forcing users into `pkg/sql/ast`, no functional options, no `FormatAST`, no rule config, no `io.Reader`. These are not academic — they're the exact frictions a Go dev hits in the first 5 minutes and closes the tab over.
189+
190+
**What threatens the performance claim**: C1 + C2 (pool leaks) are real and likely hidden by simple benchmarks. Add a production-shaped benchmark before anyone publishes "1.5M ops/sec" again.
191+
192+
**Timeline to category credibility**: 6 weeks of focused work. Not a rewrite. The bones are good.

0 commit comments

Comments
 (0)