v0.2.0rc13 - Release Candidate: Add semantic business layer, structured intent planning, CLI artifact support, and public e-commerce validation
Summary
This release is the semantic-architecture release for nlp2sql.
It moves the library from a mostly schema-and-question driven pipeline toward a business-aware pipeline:
question -> semantic resolution -> schema/examples retrieval -> SQL intent plan -> prompt assembly -> SQL -> semantic/execution validation -> optional repair
The release includes:
- first-class semantic context entities
- semantic resolver and validator ports
- semantic resolution, intent planning, and semantic validation services
- DSL support for semantic hooks and per-request semantic context
- CLI support for semantic context and few-shot examples loaded from JSON or YAML artifacts
- richer prompt assembly and metadata for downstream observability
- sanitized tests and a stronger local e-commerce integration schema
- public docs refreshed to match the current architecture
- version bump to 0.2.0rc13
Why this release matters
Before this work, nlp2sql was strongest when schema retrieval alone was enough and weaker when the real problem was business interpretation. Few-shot examples helped, but they were not enough to express governed business meaning such as:
- canonical fact tables
- required filters
- required dimensions
- disallowed tables
- canonical metrics and query shapes
This release adds that missing layer directly into the library, while keeping the hexagonal architecture intact and the DSL ergonomic.
Highlights
1. Semantic context is now a first-class concept
New core entities model business meaning explicitly:
- SemanticContext
- SemanticEntityMapping
- MetricDefinition
- DimensionDefinition
- DomainRule
- CanonicalQueryPattern
- SqlIntentPlan
- SemanticIssue
- SemanticValidationResult
This gives consumers a reusable way to pass business semantics without hardcoding private warehouse behavior into the library.
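As a rough illustration of the kind of information such entities carry, here is a minimal sketch using plain dataclasses. The class names, field names, and column names below (`SketchMetric`, `SketchSemanticContext`, `required_filters`, `revenue`, `metric_date`) are hypothetical and are not the library's actual definitions; those live in src/nlp2sql/core/entities.py.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchMetric:
    """Hypothetical stand-in for a metric definition."""

    name: str
    expression: str  # aggregate SQL expression (column name is assumed)
    fact_table: str  # canonical table the metric is computed from


@dataclass
class SketchSemanticContext:
    """Hypothetical stand-in for a semantic context."""

    metrics: list[SketchMetric] = field(default_factory=list)
    required_filters: list[str] = field(default_factory=list)
    disallowed_tables: set[str] = field(default_factory=set)

    def canonical_tables(self) -> set[str]:
        """Return the tables that the defined metrics point to."""
        return {m.fact_table for m in self.metrics}


ctx = SketchSemanticContext(
    metrics=[SketchMetric("channel_revenue", "SUM(revenue)", "daily_channel_metrics")],
    required_filters=["metric_date >= :start_date"],
    disallowed_tables={"orders", "order_items"},
)
```

Keeping this information in data rather than in prompt text is what lets the same context feed retrieval, planning, and validation.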
2. New semantic ports and adapters
Added extension points so semantic behavior can come from different sources:
- SemanticResolverPort
- SemanticValidatorPort
Added adapters:
- NoOpSemanticResolver
- NoOpSemanticValidator
- DictSemanticResolver
- FileSemanticResolver
This preserves the hexagonal contract: semantic knowledge can come from memory, files, another service, or a future external system.
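The port-and-adapter idea can be sketched with a `typing.Protocol`. This is only an illustration of the pattern: the names `ResolverPortSketch`, `NoOpResolverSketch`, `DictResolverSketch`, and the `resolve` method signature are assumptions, not the real interfaces of SemanticResolverPort or its adapters.

```python
from typing import Any, Protocol


class ResolverPortSketch(Protocol):
    """Hypothetical port: anything that can resolve semantics for a question."""

    def resolve(self, question: str) -> dict[str, Any]: ...


class NoOpResolverSketch:
    """Adapter that contributes no business semantics."""

    def resolve(self, question: str) -> dict[str, Any]:
        return {}


class DictResolverSketch:
    """Adapter backed by an in-memory mapping of keyword to semantic hints."""

    def __init__(self, hints: dict[str, dict[str, Any]]) -> None:
        self._hints = hints

    def resolve(self, question: str) -> dict[str, Any]:
        lowered = question.lower()
        resolved: dict[str, Any] = {}
        for keyword, hint in self._hints.items():
            if keyword in lowered:
                resolved.update(hint)
        return resolved


def resolve_with(port: ResolverPortSketch, question: str) -> dict[str, Any]:
    """Core code depends only on the port, never on a concrete adapter."""
    return port.resolve(question)


resolver = DictResolverSketch({"revenue": {"fact_table": "daily_channel_metrics"}})
hints = resolve_with(resolver, "Revenue by channel last week")
```

Swapping `DictResolverSketch` for a file-backed or service-backed adapter would not change `resolve_with`, which is the point of the port.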
3. New semantic orchestration services
Added dedicated services for the semantic stage of the pipeline:
- SemanticResolutionService
- SqlIntentPlanningService
- SemanticValidationService
- PromptAssemblyService
These services enrich retrieval, constrain candidate tables, structure the SQL plan before generation, and detect semantically wrong SQL even when it is syntactically valid.
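To make "semantically wrong but syntactically valid" concrete, here is a deliberately naive sketch of a table-level check. It is not how SemanticValidationService is implemented; the function names are hypothetical, the column names in the sample query are assumed, and a real validator would use a SQL parser instead of a regular expression.

```python
import re

# Matches the identifier that follows FROM or JOIN (subqueries are ignored).
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def referenced_tables(sql: str) -> set[str]:
    """Naively collect unqualified, lowercased table names after FROM or JOIN."""
    return {name.split(".")[-1].lower() for name in _TABLE_REF.findall(sql)}


def semantic_issues(sql: str, disallowed: set[str], required: set[str]) -> list[str]:
    """Report disallowed tables that are used and required tables that are missing."""
    tables = referenced_tables(sql)
    issues = [f"disallowed table: {t}" for t in sorted(tables & disallowed)]
    issues += [f"missing required table: {t}" for t in sorted(required - tables)]
    return issues


# Valid SQL, but it aggregates the transactional table instead of the fact table.
sql = "SELECT channel_id, SUM(total) FROM orders GROUP BY channel_id"
issues = semantic_issues(sql, disallowed={"orders"}, required={"daily_channel_metrics"})
```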
4. Query pipeline upgraded end-to-end
QueryGenerationService now orchestrates:
- query analysis
- semantic resolution
- schema retrieval
- example selection
- SQL intent planning
- prompt assembly
- SQL generation
- semantic validation
- optional execution validation and repair
It also emits richer metadata including:
- semantic_context
- sql_intent_plan
- selected_examples
- repair_attempts
- execution_validation
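The generate, validate, and optionally repair portion of this orchestration can be pictured as a small loop. The sketch below is an assumption about the general shape, not the actual QueryGenerationService code: `generate_with_repair`, its parameters, and the stub callables are hypothetical, and only the `repair_attempts` key name is taken from the metadata described above.

```python
from typing import Callable


def generate_with_repair(
    question: str,
    generate: Callable[[str, list[str]], str],
    validate: Callable[[str], list[str]],
    max_repairs: int = 2,
) -> dict:
    """Generate SQL, validate it, and retry with the issues as feedback."""
    issues: list[str] = []
    sql = ""
    attempts = 0
    # Iteration 0 is the first generation; later iterations are repairs.
    for attempts in range(max_repairs + 1):
        sql = generate(question, issues)
        issues = validate(sql)
        if not issues:
            break
    return {"sql": sql, "repair_attempts": attempts, "issues": issues}


def fake_generate(question: str, feedback: list[str]) -> str:
    # Stand-in for an LLM call: switches tables once it receives feedback.
    table = "daily_channel_metrics" if feedback else "orders"
    return f"SELECT * FROM {table}"


def fake_validate(sql: str) -> list[str]:
    # Stand-in for semantic validation.
    return ["disallowed table: orders"] if "FROM orders" in sql else []


result = generate_with_repair("revenue by channel", fake_generate, fake_validate)
```

Passing the validator's issues back into the generator is what turns validation into repair rather than a simple pass or fail gate.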
5. DSL support is now business-aware and still idiomatic
The public API now supports semantic configuration directly in the DSL:
- connect(..., semantic_hooks=..., semantic_context=...)
- nlp.ask(..., semantic_context=...)
Execution hooks and semantic hooks remain separate, so consumers can independently control execution-time behavior and business semantic behavior.
6. CLI now supports semantic and example artifacts
The CLI now supports:
- --semantic-context-file
- --semantic-context-json
- --examples-file
- --examples-json
- --show-semantic-context
- --show-sql-intent-plan
- --show-selected-examples
- --validate
- --repair
Artifact loading is centralized through utils/artifact_loader.py, making the CLI useful for both local experimentation and downstream service parity.
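A centralized loader of this kind could look roughly like the sketch below. It is not the contents of utils/artifact_loader.py: the `load_artifact` name, the extension-based dispatch, and the sample payload are assumptions, and the YAML branch assumes PyYAML is installed.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def load_artifact(path: str) -> Any:
    """Load a JSON or YAML artifact, chosen by file extension."""
    file = Path(path)
    text = file.read_text(encoding="utf-8")
    suffix = file.suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix in {".yaml", ".yml"}:
        # PyYAML, imported lazily so JSON-only use needs no extra dependency.
        import yaml

        return yaml.safe_load(text)
    raise ValueError(f"unsupported artifact type: {suffix!r}")


# Round-trip a small, made-up semantic context through a temporary JSON file.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "semantic_context.json"
    artifact.write_text(json.dumps({"disallowed_tables": ["orders"]}), encoding="utf-8")
    loaded = load_artifact(str(artifact))
```

Routing both the CLI and any downstream service through one loader means a given artifact file is interpreted the same way in both places.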
7. Prompting is richer and more explicit
The provider adapters now receive rendered business context and structured intent plan information, instead of only relying on flattened or implicit metadata.
This includes helper formatting via utils/semantic_prompt.py and richer prompt rendering in the OpenAI, Anthropic, and Gemini adapters.
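Rendering business context as an explicit prompt section might look something like the following. The function name `render_semantic_section`, the section heading, and the label strings are all hypothetical; the actual output format of utils/semantic_prompt.py is not shown in these notes.

```python
def render_semantic_section(context: dict[str, list[str]]) -> str:
    """Render semantic hints as a labelled, plain-text prompt section."""
    lines = ["## Business context"]
    for label, values in context.items():
        if values:  # skip empty entries so the prompt stays compact
            lines.append(f"{label}: {', '.join(values)}")
    return "\n".join(lines)


section = render_semantic_section(
    {
        "Canonical fact tables": ["daily_channel_metrics"],
        "Disallowed tables": ["orders", "order_items"],
        "Required filters": [],
    }
)
```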
8. Stronger local integration domain and sanitized tests
The repository's local PostgreSQL integration domain was expanded so the public library can prove the semantic architecture against a realistic but safe e-commerce schema.
The local schema now includes richer entities such as:
- stores
- marketing_channels
- daily_channel_metrics
- orders
- order_items
This was used to validate that semantic context can steer generation from plausible-but-wrong transactional tables toward the intended aggregate fact table.
The automated test suite was also sanitized so it no longer embeds private or production-like warehouse identifiers.
9. Public docs now match the code
The docs were rewritten to explain the actual library shape:
- DSL-first usage
- semantic context
- intent planning
- execution modes
- CLI parity
- local e-commerce public examples only
Major code changes
Core and runtime
- src/nlp2sql/core/entities.py
  - added semantic domain entities and SqlIntentPlan
  - preserved richer semantic metadata for prompt assembly
- src/nlp2sql/core/runtime.py
  - added SemanticHooks
- src/nlp2sql/client.py
  - added semantic context support to NLP2SQL and connect()
- src/nlp2sql/__init__.py
  - exported the new semantic entities, hooks, ports, services, and adapters
Ports and adapters
- src/nlp2sql/ports/semantic_resolver.py
- src/nlp2sql/ports/semantic_validator.py
- src/nlp2sql/adapters/noop_semantic_resolver.py
- src/nlp2sql/adapters/noop_semantic_validator.py
- src/nlp2sql/adapters/dict_semantic_resolver.py
- src/nlp2sql/adapters/file_semantic_resolver.py
Services
- src/nlp2sql/services/query_service.py
  - upgraded orchestration pipeline
  - added semantic-aware cache signature logic
  - added semantic retry context generation
- src/nlp2sql/services/semantic_resolution_service.py
- src/nlp2sql/services/sql_intent_planning_service.py
- src/nlp2sql/services/semantic_validation_service.py
- src/nlp2sql/services/prompt_assembly_service.py
- src/nlp2sql/services/example_selection_service.py
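A semantic-aware cache signature matters because the same question can legitimately produce different SQL under different business context. One way to build such a signature is sketched below; the `cache_signature` function, its normalization of the question, and the choice of SHA-256 are assumptions for illustration, not the logic in query_service.py.

```python
import hashlib
import json
from typing import Any


def cache_signature(question: str, semantic_context: dict[str, Any]) -> str:
    """Hash the question together with a canonical form of the semantic context."""
    payload = json.dumps(
        {"question": question.strip().lower(), "semantic_context": semantic_context},
        sort_keys=True,  # key order must not change the signature
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = cache_signature("Revenue by channel", {"fact_table": "daily_channel_metrics", "filters": []})
b = cache_signature("revenue by channel ", {"filters": [], "fact_table": "daily_channel_metrics"})
c = cache_signature("Revenue by channel", {})
```

Here `a` and `b` collide on purpose (same meaning, different key order and casing), while `c` differs because it carries no semantic context.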
CLI and artifact loading
- src/nlp2sql/cli.py
  - added semantic/example artifact flags
  - added runtime metadata output helpers
  - added public validate/repair mode resolution
- src/nlp2sql/utils/artifact_loader.py
  - loads semantic context and examples from JSON or YAML
- src/nlp2sql/utils/semantic_prompt.py
  - renders semantic context and SQL intent plan sections for prompts
Prompt/rendering integration
Updated provider adapters to render richer business context:
- src/nlp2sql/adapters/openai_adapter.py
- src/nlp2sql/adapters/anthropic_adapter.py
- src/nlp2sql/adapters/gemini_adapter.py
Local integration and tests
- docker/init-schema.sql
  - expanded the public local e-commerce schema
- tests/test_dsl_integration.py
  - added semantic-context integration coverage
  - added regression scenario proving semantic guidance changes table choice
- tests/test_retrieval_pipeline.py
  - added semantic resolution, planning, validation, and repair-loop coverage
- tests/test_client_hooks.py
  - added coverage for semantic hooks and per-request semantic context
- tests/test_artifact_loader.py, tests/test_cli_semantic_support.py
  - added artifact/CLI coverage using sanitized public fixtures
- tests/conftest.py, tests/test_postgres_integration.py
  - aligned with the richer local schema
Documentation changes
Updated to reflect the real current model:
- README.md
- docs/ARCHITECTURE.md
- docs/API.md
- docs/CONFIGURATION.md
- docs/ENTERPRISE.md
- docs/Redshift.md
- examples/README.md
The docs now use only the repository's local e-commerce example domain.
Versioning
Version updated to 0.2.0rc13 in:
- pyproject.toml
- src/nlp2sql/__init__.py
- src/nlp2sql/config/settings.py
Verification
Validated during this work:
- semantic context can shift generation from generic transactional paths to the intended aggregate fact table
- CLI supports file-based semantic context and few-shot example artifacts
- automated tests use sanitized public fixtures instead of private warehouse identifiers
- local PostgreSQL integration domain was rebuilt and tested against the new semantic flow
- LLM-dependent local integration tests passed against the rebuilt local PostgreSQL setup
- public docs now describe the real DSL-first and semantic-aware architecture
Closes
- #41 Add semantic context and SQL intent planning to the nlp2sql generation pipeline
- #34 feat: add --examples-file flag to CLI for few-shot examples
Files changed
This release spans core entities, ports, adapters, services, CLI, utilities, local test schema, tests, docs, and version metadata. See the sections above for the major touched files.