
jzombie/sqlsmith-sqllogictest-corpus-generator

SQLsmith SQLLogicTest Corpus Generator

This repo builds a container image that compiles SQLsmith and turns its generated queries into SQLLogicTest cases. Everything runs inside Docker; the host machine only needs the Docker CLI.

Build the image

# from the repository root
docker build -t sqlsmith-slt .

The build compiles SQLsmith from the master branch and installs the helper scripts under /usr/local/bin inside the image.

Generate a corpus

mkdir -p out

docker run --rm \
  -v "$(pwd)/out":/out \
  sqlsmith-slt

The container writes results into /out (mapped to ./out on the host):

  • seeds.sql – raw SQLsmith statements (aggregated across all batches) for reproducibility.
  • case_*.test – SQLLogicTest files containing expected results computed by SQLite.

Customize generation

Override the environment variables below with -e NAME=value flags when running the container:

  • TARGET_ENGINE (default: sqlite) – Execution backend used to compute expected results. Only sqlite is supported today.
  • SQLSMITH_BATCH_QUERIES (default: 250) – Number of statements per SQLsmith batch. SQLSMITH_MAX_QUERIES is still accepted as an alias.
  • SQLSMITH_SEED (default: 1) – Base seed passed to SQLsmith; each batch increments it by one for variety.
  • OUTPUT_MODE (default: slt) – slt writes SQLLogicTest cases; statements writes a plain SQL file.
  • SQLLOGICTEST_ROWSORT (default: rowsort) – Set to nosort to omit the rowsort directive.
  • SQLITE_TIMEOUT (default: 1.0) – Seconds allowed for each SQLite execution (not yet enforced).
  • SEED_FILENAME (default: seeds.sql) – Name of the raw SQL dump written to /out.
  • SQLITE_INIT_SQL (default: /usr/local/share/sqlsmith/init.sql) – SQL script executed once to seed the SQLite database before SQLsmith runs. Set it to an empty string to skip seeding.
  • SQLSMITH_PASS_TARGET (default: unset) – Minimum number of passing cases (query and statement ok records) to retain. When set, the container keeps running SQLsmith until the target is met.
  • SQLSMITH_MAX_ERRORS (default: unset) – Maximum number of statement error cases to keep; excess failures are discarded.
  • SQLSMITH_MAX_CASES (default: unset) – Optional cap on the number of new cases admitted per SQLsmith batch.

The SQLite connection defaults to /tmp/sqlsmith.db; set ENGINE_URI (or the legacy SQLITE_URI) when you need to target a different database file or URI.
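If you reuse the same settings across runs, Docker's --env-file flag can load them from a file instead of repeating -e flags. A minimal sketch; the file name slt.env and the chosen values are illustrative assumptions, not the image's defaults:

```shell
# Write an env file for "docker run --env-file" (file name and values are illustrative).
# Docker reads each VAR=value line literally, so values are left unquoted.
cat > slt.env <<'EOF'
SQLSMITH_SEED=42
SQLSMITH_BATCH_QUERIES=100
SQLSMITH_PASS_TARGET=50
SQLSMITH_MAX_ERRORS=5
SQLLOGICTEST_ROWSORT=nosort
EOF

# Then pass the file instead of individual -e flags:
#   docker run --rm --env-file slt.env -v "$(pwd)/out":/out sqlsmith-slt
```

Flags given with -e on the same command line can still be added on top of the file for one-off tweaks.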

When SQLSMITH_PASS_TARGET is set, the entrypoint runs SQLsmith in a loop, in batches of SQLSMITH_BATCH_QUERIES statements, until the accumulated corpus contains at least that many passing cases. SQLSMITH_MAX_ERRORS caps how many failure cases are retained. Each batch bumps the SQLsmith seed by one to broaden coverage while keeping the run reproducible.
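The batch loop described above can be sketched in shell to show how the seed advances and when the loop stops. This is a hypothetical illustration, not the actual entrypoint: run_batch and generate_until_target are made-up names, and run_batch is a stub that pretends every batch yields 7 passing cases.

```shell
# Hypothetical sketch of the SQLSMITH_PASS_TARGET loop; not the real entrypoint.
# run_batch is a stand-in: a real one would run SQLsmith with the given seed and
# report how many passing cases it produced. Here every batch "yields" 7.
run_batch() {
  echo 7
}

generate_until_target() {
  local target=$1
  local base_seed=${SQLSMITH_SEED:-1}
  local passed=0 batch=0 seed got
  while [ "$passed" -lt "$target" ]; do
    seed=$((base_seed + batch))   # each batch bumps the seed by one
    got=$(run_batch "$seed")
    passed=$((passed + got))
    batch=$((batch + 1))
    echo "batch $batch seed $seed passed_total $passed"
  done
}
```

Running SQLSMITH_SEED=1 generate_until_target 20 prints three lines ending with "batch 3 seed 3 passed_total 21": the loop checks the running total only between batches, so it can overshoot the target, which is consistent with keeping "at least" that many cases. Because the seed sequence depends only on SQLSMITH_SEED and the batch index, rerunning with the same settings replays the same seeds.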

Example: generate a corpus with at least 20 passing cases and at most 3 statement error cases, running SQLsmith in batches of 50 statements:

docker run --rm \
  -v "$(pwd)/out":/out \
  -e SQLSMITH_PASS_TARGET=20 \
  -e SQLSMITH_MAX_ERRORS=3 \
  -e SQLSMITH_BATCH_QUERIES=50 \
  sqlsmith-slt

To capture non-empty result sets, point SQLsmith (and the executor) at a populated SQLite database, for example:

docker run --rm \
  -v "$(pwd)/northwind.db":/data/northwind.db:ro \
  -v "$(pwd)/out":/out \
  -e ENGINE_URI="file:/data/northwind.db?mode=ro" \
  sqlsmith-slt

By default the container seeds /tmp/sqlsmith.db using SQLITE_INIT_SQL, provisioning sample commerce-style tables so the generated queries have data to read from. To tailor the schema, replace that script, or mount your own and point SQLITE_INIT_SQL at it.
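A minimal sketch of supplying your own schema, assuming the entrypoint simply executes whatever file SQLITE_INIT_SQL names, as described above. The file name my_init.sql, the /init mount point, and the tables and rows are all hypothetical:

```shell
# Write a small custom schema (table names and rows are illustrative only).
cat > my_init.sql <<'EOF'
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  customer_id INTEGER REFERENCES customers(id),
  total REAL
);
INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 19.99), (11, 2, 5.00), (12, 1, 42.50);
EOF

# Mount it at a path of your choosing (/init is an arbitrary example) and
# point SQLITE_INIT_SQL at the mounted file:
#   docker run --rm \
#     -v "$(pwd)/my_init.sql":/init/my_init.sql:ro \
#     -v "$(pwd)/out":/out \
#     -e SQLITE_INIT_SQL=/init/my_init.sql \
#     sqlsmith-slt
```

Seeding a few rows per table, including a foreign-key relationship, gives SQLsmith's generated joins a chance to return non-empty results.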

Pass extra flags directly through to SQLsmith by appending them after the image name. Example: docker run … sqlsmith-slt --exclude-catalog.

Verify the output

After running the container you should see files in out/:

ls out | head
# case_000001.test
# case_000002.test
# ...
# seeds.sql

Each .test file follows SQLLogicTest formatting and can be executed with your preferred SLT runner.

Count the generated cases:

ls out/case_*.test | wc -l
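To see how the corpus splits between passing and failing cases, you can tally SLT record headers. A rough sketch, assuming each record begins with a standard "statement ok", "statement error", or "query <types> <sort mode>" header line; slt_summary is a hypothetical helper, not something shipped in the image:

```shell
# Tally SQLLogicTest record headers across the generated case files.
# Counts lines whose first words are "statement ok", "statement error", or "query".
# This is a heuristic: a SQL or result line starting with those words would also count.
slt_summary() {
  local dir=${1:-out}
  awk '$1 == "statement" && $2 == "ok"    { ok++ }
       $1 == "statement" && $2 == "error" { err++ }
       $1 == "query"                      { query++ }
       END { printf "statement ok: %d\nstatement error: %d\nquery: %d\n", ok, err, query }' \
    "$dir"/case_*.test
}
```

Run it as slt_summary out from the repository root. The query and statement ok totals together correspond to what SQLSMITH_PASS_TARGET counts as passing, and the statement error total should not exceed SQLSMITH_MAX_ERRORS when that variable is set.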

About

SQLLogicTest corpus generator using SQLsmith.
