Skip to content

Commit e45706d

Browse files
authored
Track applied migrations to skip column checks on startup (#60)
* Track applied migrations to skip column checks on startup Add a migrations table that records which migrations have been applied. On boot, load the set of applied names in one query and only run new ones. A fully migrated database now does 1 query instead of ~12 HasColumn/HasTable checks. Fresh databases created via CreateSchema record all migrations as already applied. Old databases get the migrations table on first MigrateSchema call and each migration is recorded after it runs. Closes #54 * Add benchmark for MigrateSchema on fully migrated database * Optimize MigrateSchema to single query for fully migrated databases Skip HasTable/HasColumn checks when the migrations table already exists. A fully migrated database now does one SELECT instead of ~12 individual column and table checks. * Add migration docs and link from architecture * Add test for upgrade from fully migrated database without migrations table
1 parent 34009ba commit e45706d

4 files changed

Lines changed: 420 additions & 111 deletions

File tree

docs/architecture.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -165,7 +165,7 @@ vulnerabilities (
165165

166166
On PostgreSQL, `INTEGER PRIMARY KEY` becomes `SERIAL`, `DATETIME` becomes `TIMESTAMP`, `INTEGER DEFAULT 0` booleans become `BOOLEAN DEFAULT FALSE`, and size/count columns use `BIGINT`.
167167

168-
The `MigrateSchema()` function handles backward compatibility with older git-pkgs databases by adding missing columns via `ALTER TABLE` as needed.
168+
The `MigrateSchema()` function handles backward compatibility with older git-pkgs databases by running named migrations that add missing columns and tables. See [migrations.md](migrations.md) for how to add new schema changes.
169169

170170
**Key operations:**
171171
- `GetPackageByPURL()` - Look up package by PURL

docs/migrations.md

Lines changed: 51 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,51 @@
1+
# Database Migrations
2+
3+
Schema changes are tracked in a `migrations` table. Each migration has a name and a function. On startup, `MigrateSchema()` loads the set of already-applied names in one query and runs anything new.
4+
5+
Fresh databases created via `Create()` get the full schema and all migrations are recorded as already applied.
6+
7+
## Adding a migration
8+
9+
In `internal/database/schema.go`:
10+
11+
1. Write a migration function:
12+
13+
```go
14+
func migrateAddWidgetColumn(db *DB) error {
15+
hasCol, err := db.HasColumn("packages", "widget")
16+
if err != nil {
17+
return fmt.Errorf("checking column widget: %w", err)
18+
}
19+
if !hasCol {
20+
colType := "TEXT"
21+
if db.dialect == DialectPostgres {
22+
colType = "TEXT" // adjust if types differ
23+
}
24+
if _, err := db.Exec(fmt.Sprintf("ALTER TABLE packages ADD COLUMN widget %s", colType)); err != nil {
25+
return fmt.Errorf("adding column widget: %w", err)
26+
}
27+
}
28+
return nil
29+
}
30+
```
31+
32+
2. Append it to the `migrations` slice with the next sequential prefix:
33+
34+
```go
35+
var migrations = []migration{
36+
{"001_add_packages_enrichment_columns", migrateAddPackagesEnrichmentColumns},
37+
{"002_add_versions_enrichment_columns", migrateAddVersionsEnrichmentColumns},
38+
{"003_ensure_artifacts_table", migrateEnsureArtifactsTable},
39+
{"004_ensure_vulnerabilities_table", migrateEnsureVulnerabilitiesTable},
40+
{"005_add_widget_column", migrateAddWidgetColumn}, // new
41+
}
42+
```
43+
44+
3. Add the same column to both `schemaSQLite` and `schemaPostgres` at the top of the file so fresh databases start with the full schema.
45+
46+
## Rules
47+
48+
- Migration functions must be idempotent. Use `HasColumn`/`HasTable` checks or `IF NOT EXISTS` clauses so they're safe to run against a database that already has the change.
49+
- Handle both SQLite and Postgres dialects. Common differences: `DATETIME` vs `TIMESTAMP`, `INTEGER DEFAULT 0` vs `BOOLEAN DEFAULT FALSE`, `INTEGER PRIMARY KEY` vs `SERIAL PRIMARY KEY`.
50+
- Never reorder or rename existing entries. The name string is the migration's identity in the database.
51+
- Never remove old migrations from the list. They won't run on already-migrated databases, but they need to exist for older databases upgrading for the first time.

internal/database/database_test.go

Lines changed: 167 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -651,58 +651,159 @@ func TestMigrationFromOldSchema(t *testing.T) {
651651
}
652652
defer func() { _ = db.Close() }()
653653

654-
// Try to run queries that require new columns - these should fail without migration
655-
t.Run("queries should fail without migration", func(t *testing.T) {
656-
_, err := db.GetEnrichmentStats()
657-
if err == nil {
658-
t.Error("GetEnrichmentStats: expected error querying enriched_at column, got nil")
659-
}
660-
661-
_, err = db.GetPackageByEcosystemName("npm", "test-package")
662-
if err == nil {
663-
t.Error("GetPackageByEcosystemName: expected error querying registry_url column, got nil")
664-
}
665-
666-
// SearchPackages should work even with old schema because it uses sql.NullString
667-
// for nullable columns, which can handle NULL values properly
668-
_, err = db.SearchPackages("test", "", 10, 0)
669-
if err != nil {
670-
t.Errorf("SearchPackages: unexpected error with old schema: %v", err)
671-
}
672-
})
654+
// Queries that require new columns should fail without migration
655+
if _, err := db.GetEnrichmentStats(); err == nil {
656+
t.Error("GetEnrichmentStats: expected error querying enriched_at column, got nil")
657+
}
658+
if _, err := db.GetPackageByEcosystemName("npm", "test-package"); err == nil {
659+
t.Error("GetPackageByEcosystemName: expected error querying registry_url column, got nil")
660+
}
661+
// SearchPackages should work even with old schema because it uses sql.NullString
662+
if _, err := db.SearchPackages("test", "", 10, 0); err != nil {
663+
t.Errorf("SearchPackages: unexpected error with old schema: %v", err)
664+
}
673665

674666
// Run migration
675-
t.Run("migrate schema", func(t *testing.T) {
676-
if err := db.MigrateSchema(); err != nil {
677-
t.Fatalf("MigrateSchema failed: %v", err)
678-
}
679-
})
667+
if err := db.MigrateSchema(); err != nil {
668+
t.Fatalf("MigrateSchema failed: %v", err)
669+
}
680670

681671
// Verify queries work after migration
682-
t.Run("queries should work after migration", func(t *testing.T) {
683-
stats, err := db.GetEnrichmentStats()
684-
if err != nil {
685-
t.Errorf("GetEnrichmentStats failed after migration: %v", err)
686-
}
687-
if stats == nil {
688-
t.Error("GetEnrichmentStats returned nil after migration")
689-
}
672+
stats, err := db.GetEnrichmentStats()
673+
if err != nil {
674+
t.Errorf("GetEnrichmentStats failed after migration: %v", err)
675+
}
676+
if stats == nil {
677+
t.Error("GetEnrichmentStats returned nil after migration")
678+
}
690679

691-
pkg, err := db.GetPackageByEcosystemName("npm", "test-package")
692-
if err != nil {
693-
t.Errorf("GetPackageByEcosystemName failed after migration: %v", err)
680+
pkg, err := db.GetPackageByEcosystemName("npm", "test-package")
681+
if err != nil {
682+
t.Errorf("GetPackageByEcosystemName failed after migration: %v", err)
683+
}
684+
if pkg == nil {
685+
t.Fatal("GetPackageByEcosystemName returned nil after migration")
686+
}
687+
if pkg.Name != "test-package" {
688+
t.Errorf("expected package name test-package, got %s", pkg.Name)
689+
}
690+
691+
// Verify migrations were recorded
692+
applied, err := db.appliedMigrations()
693+
if err != nil {
694+
t.Fatalf("appliedMigrations failed: %v", err)
695+
}
696+
for _, m := range migrations {
697+
if !applied[m.name] {
698+
t.Errorf("migration %s not recorded as applied", m.name)
694699
}
695-
if pkg == nil {
696-
t.Fatal("GetPackageByEcosystemName returned nil after migration")
700+
}
701+
702+
// Running again should be a no-op
703+
if err := db.MigrateSchema(); err != nil {
704+
t.Fatalf("second MigrateSchema failed: %v", err)
705+
}
706+
}
707+
708+
func TestFreshDatabaseRecordsMigrations(t *testing.T) {
709+
dir := t.TempDir()
710+
dbPath := filepath.Join(dir, "fresh.db")
711+
712+
db, err := Create(dbPath)
713+
if err != nil {
714+
t.Fatalf("Create failed: %v", err)
715+
}
716+
defer func() { _ = db.Close() }()
717+
718+
applied, err := db.appliedMigrations()
719+
if err != nil {
720+
t.Fatalf("appliedMigrations failed: %v", err)
721+
}
722+
723+
for _, m := range migrations {
724+
if !applied[m.name] {
725+
t.Errorf("migration %s not recorded in fresh database", m.name)
697726
}
698-
if pkg.Name != "test-package" {
699-
t.Errorf("expected package name test-package, got %s", pkg.Name)
727+
}
728+
}
729+
730+
func TestMigrateSchemaSkipsApplied(t *testing.T) {
731+
dir := t.TempDir()
732+
dbPath := filepath.Join(dir, "test.db")
733+
734+
db, err := Create(dbPath)
735+
if err != nil {
736+
t.Fatalf("Create failed: %v", err)
737+
}
738+
defer func() { _ = db.Close() }()
739+
740+
// All migrations are already recorded from Create. Running MigrateSchema
741+
// should return without running any migration functions.
742+
if err := db.MigrateSchema(); err != nil {
743+
t.Fatalf("MigrateSchema failed: %v", err)
744+
}
745+
746+
// Verify count hasn't changed (no duplicate inserts)
747+
var count int
748+
if err := db.Get(&count, "SELECT COUNT(*) FROM migrations"); err != nil {
749+
t.Fatalf("counting migrations failed: %v", err)
750+
}
751+
if count != len(migrations) {
752+
t.Errorf("expected %d migrations, got %d", len(migrations), count)
753+
}
754+
}
755+
756+
func TestMigrateSchemaUpgradeFromFullyMigrated(t *testing.T) {
757+
dir := t.TempDir()
758+
dbPath := filepath.Join(dir, "existing.db")
759+
760+
// Simulate an existing proxy database that has the full current schema
761+
// but no migrations table (i.e. it was running the previous version).
762+
sqlDB, err := sql.Open("sqlite", dbPath)
763+
if err != nil {
764+
t.Fatalf("failed to open database: %v", err)
765+
}
766+
767+
if _, err := sqlDB.Exec(schemaSQLite); err != nil {
768+
t.Fatalf("failed to create schema: %v", err)
769+
}
770+
// Drop the migrations table that schemaSQLite now includes
771+
if _, err := sqlDB.Exec("DROP TABLE migrations"); err != nil {
772+
t.Fatalf("failed to drop migrations table: %v", err)
773+
}
774+
if _, err := sqlDB.Exec("INSERT INTO schema_info (version) VALUES (1)"); err != nil {
775+
t.Fatalf("failed to set schema version: %v", err)
776+
}
777+
if err := sqlDB.Close(); err != nil {
778+
t.Fatalf("failed to close database: %v", err)
779+
}
780+
781+
db, err := Open(dbPath)
782+
if err != nil {
783+
t.Fatalf("Open failed: %v", err)
784+
}
785+
defer func() { _ = db.Close() }()
786+
787+
// This should create the migrations table and record all migrations
788+
// without altering any tables (everything already exists).
789+
if err := db.MigrateSchema(); err != nil {
790+
t.Fatalf("MigrateSchema failed: %v", err)
791+
}
792+
793+
applied, err := db.appliedMigrations()
794+
if err != nil {
795+
t.Fatalf("appliedMigrations failed: %v", err)
796+
}
797+
for _, m := range migrations {
798+
if !applied[m.name] {
799+
t.Errorf("migration %s not recorded after upgrade", m.name)
700800
}
801+
}
701802

702-
// Note: SearchPackages not tested here because old timestamp data
703-
// stored as strings can't be scanned into time.Time. This is a data
704-
// migration issue, not a schema migration issue.
705-
})
803+
// Second run should be the fast path (single SELECT)
804+
if err := db.MigrateSchema(); err != nil {
805+
t.Fatalf("second MigrateSchema failed: %v", err)
806+
}
706807
}
707808

708809
func TestConcurrentWrites(t *testing.T) {
@@ -890,3 +991,26 @@ func TestSearchPackagesWithValues(t *testing.T) {
890991
t.Errorf("expected 10 hits, got %d", result.Hits)
891992
}
892993
}
994+
995+
func BenchmarkMigrateSchemaFullyMigrated(b *testing.B) {
996+
dir := b.TempDir()
997+
dbPath := filepath.Join(dir, "bench.db")
998+
999+
db, err := Create(dbPath)
1000+
if err != nil {
1001+
b.Fatalf("Create failed: %v", err)
1002+
}
1003+
defer func() { _ = db.Close() }()
1004+
1005+
// First call to ensure everything is migrated
1006+
if err := db.MigrateSchema(); err != nil {
1007+
b.Fatalf("initial MigrateSchema failed: %v", err)
1008+
}
1009+
1010+
b.ResetTimer()
1011+
for b.Loop() {
1012+
if err := db.MigrateSchema(); err != nil {
1013+
b.Fatalf("MigrateSchema failed: %v", err)
1014+
}
1015+
}
1016+
}

0 commit comments

Comments
 (0)