v0.2.0rc13 - Release Candidate: Add semantic business layer, structured intent planning, CLI artifact support, and public e-commerce validation

luiscarbonel1991 released this 07 Apr 01:40
5590825

Summary

This is the semantic-architecture release of nlp2sql.

It moves the library from a mostly schema-and-question-driven pipeline toward a business-aware pipeline:

question -> semantic resolution -> schema/examples retrieval -> SQL intent plan -> prompt assembly -> SQL -> semantic/execution validation -> optional repair

The release includes:

  • first-class semantic context entities
  • semantic resolver and validator ports
  • semantic resolution, intent planning, and semantic validation services
  • DSL support for semantic hooks and per-request semantic context
  • CLI support for semantic context and few-shot examples loaded from JSON or YAML artifacts
  • richer prompt assembly and metadata for downstream observability
  • sanitized tests and a stronger local e-commerce integration schema
  • public docs refreshed to match the current architecture
  • version bump to 0.2.0rc13

Why this release matters

Before this work, nlp2sql was strongest when schema retrieval alone was enough and weaker when the real problem was business interpretation. Few-shot examples helped, but they were not enough to express governed business meaning such as:

  • canonical fact tables
  • required filters
  • required dimensions
  • disallowed tables
  • canonical metrics and query shapes

This release adds that missing layer directly into the library, while keeping the hexagonal architecture intact and the DSL ergonomic.

Highlights

1. Semantic context is now a first-class concept

New core entities model business meaning explicitly:

  • SemanticContext
  • SemanticEntityMapping
  • MetricDefinition
  • DimensionDefinition
  • DomainRule
  • CanonicalQueryPattern
  • SqlIntentPlan
  • SemanticIssue
  • SemanticValidationResult

This gives consumers a reusable way to pass business semantics without hardcoding private warehouse behavior into the library.
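As a rough illustration of what "business semantics as data" can look like, here is a minimal sketch. The class names (`SketchMetric`, `SketchSemanticContext`), all field names, and the column names `metric_date`, `channel_id`, and `revenue` are assumptions made for this example; they are not the library's actual entity definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchMetric:
    """Hypothetical stand-in for a governed metric definition."""
    name: str
    expression: str  # SQL expression that computes the metric


@dataclass(frozen=True)
class SketchSemanticContext:
    """Hypothetical stand-in for business semantics; field names are assumed."""
    canonical_fact_table: str
    required_filters: tuple[str, ...] = ()
    required_dimensions: tuple[str, ...] = ()
    disallowed_tables: tuple[str, ...] = ()
    metrics: tuple[SketchMetric, ...] = ()


# Example: channel revenue questions should use the aggregate fact table,
# never the raw transactional tables. Column names are invented here.
channel_revenue = SketchSemanticContext(
    canonical_fact_table="daily_channel_metrics",
    required_filters=("metric_date >= :start_date",),
    required_dimensions=("channel_id",),
    disallowed_tables=("orders", "order_items"),
    metrics=(SketchMetric("revenue", "SUM(revenue)"),),
)
```

Keeping this information in immutable value objects means the same context can be reused across requests without the library needing to know anything about a particular warehouse.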

2. New semantic ports and adapters

Added extension points so semantic behavior can come from different sources:

  • SemanticResolverPort
  • SemanticValidatorPort

Added adapters:

  • NoOpSemanticResolver
  • NoOpSemanticValidator
  • DictSemanticResolver
  • FileSemanticResolver

This preserves the hexagonal contract: semantic knowledge can come from memory, files, another service, or a future external system.
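The port-and-adapter idea can be sketched with a structural interface and two interchangeable implementations. Everything below is hypothetical: the `resolve` method name, the keyword-matching strategy, and the dictionary shape are assumptions for illustration, and the real ports and adapters may look quite different.

```python
from typing import Optional, Protocol


class ResolverPort(Protocol):
    """Hypothetical port: maps a question to semantic context, if any."""

    def resolve(self, question: str) -> Optional[dict]: ...


class NoOpResolver:
    """Always returns no context, so callers behave as if semantics were absent."""

    def resolve(self, question: str) -> Optional[dict]:
        return None


class DictResolver:
    """Looks up context by keyword from an in-memory mapping (assumed strategy)."""

    def __init__(self, contexts: dict[str, dict]) -> None:
        self._contexts = contexts

    def resolve(self, question: str) -> Optional[dict]:
        lowered = question.lower()
        for keyword, context in self._contexts.items():
            if keyword in lowered:
                return context
        return None


def describe(resolver: ResolverPort, question: str) -> str:
    """The caller depends only on the port, not on a concrete adapter."""
    context = resolver.resolve(question)
    return "no semantic context" if context is None else context["fact_table"]
```

Because `describe` only requires something with a matching `resolve` method, a file-backed or service-backed resolver could be swapped in without touching the caller.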

3. New semantic orchestration services

Added dedicated services for the semantic stage of the pipeline:

  • SemanticResolutionService
  • SqlIntentPlanningService
  • SemanticValidationService
  • PromptAssemblyService

These services enrich retrieval, constrain candidate tables, structure the SQL plan before generation, and detect semantically wrong SQL even when it is syntactically valid.
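To show how SQL can be syntactically valid yet semantically wrong, here is a deliberately naive checker. The function name, its parameters, and the regex-based scanning are assumptions for this sketch; a real validator would more likely use a SQL parser, and this is not the library's implementation.

```python
import re


def find_semantic_issues(sql, disallowed_tables, required_filter_columns):
    """Return a list of human-readable issues; an empty list means the SQL passed."""
    issues = []
    # Collect identifiers that follow FROM or JOIN (a naive scan, not a parser).
    referenced = {
        name.lower()
        for name in re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql, re.IGNORECASE)
    }
    for table in disallowed_tables:
        if table.lower() in referenced:
            issues.append(f"disallowed table referenced: {table}")
    # Treat everything after the first WHERE as the filter clause.
    parts = re.split(r"\bWHERE\b", sql, maxsplit=1, flags=re.IGNORECASE)
    where_clause = parts[1].lower() if len(parts) == 2 else ""
    for column in required_filter_columns:
        if not re.search(rf"\b{re.escape(column.lower())}\b", where_clause):
            issues.append(f"missing required filter on: {column}")
    return issues
```

A query such as `SELECT SUM(total) FROM orders` parses fine, but this check would flag it if `orders` is disallowed and a date filter is required. The issue list could then feed a repair prompt.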

4. Query pipeline upgraded end-to-end

QueryGenerationService now orchestrates:

  1. query analysis
  2. semantic resolution
  3. schema retrieval
  4. example selection
  5. SQL intent planning
  6. prompt assembly
  7. SQL generation
  8. semantic validation
  9. optional execution validation and repair

It also emits richer metadata including:

  • semantic_context
  • sql_intent_plan
  • selected_examples
  • repair_attempts
  • execution_validation
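The staged orchestration and metadata accumulation described above can be pictured with a toy pipeline. This is only a sketch: `run_pipeline`, the stage callables, and the dictionary shapes are invented here, and only three of the nine stages are stubbed out.

```python
from typing import Any, Callable

Stage = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(question: str, stages: list[tuple[str, Stage]]) -> dict[str, Any]:
    """Run stages in order; each stage reads the state and returns its updates."""
    state: dict[str, Any] = {"question": question, "stages_run": []}
    for name, stage in stages:
        state.update(stage(state))
        state["stages_run"].append(name)
    return state


# Toy stages: each returns only the metadata keys it contributes.
stages = [
    ("semantic_resolution",
     lambda s: {"semantic_context": {"fact_table": "daily_channel_metrics"}}),
    ("sql_intent_planning",
     lambda s: {"sql_intent_plan": {"from": s["semantic_context"]["fact_table"]}}),
    ("sql_generation",
     lambda s: {"sql": f"SELECT * FROM {s['sql_intent_plan']['from']}"}),
]

result = run_pipeline("revenue by channel", stages)
```

Each later stage can read what earlier stages produced, and the final state doubles as observability metadata for the request.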

5. DSL support is now business-aware and still idiomatic

The public API now supports semantic configuration directly in the DSL:

  • connect(..., semantic_hooks=..., semantic_context=...)
  • nlp.ask(..., semantic_context=...)

Execution hooks and semantic hooks remain separate, so consumers can independently control execution-time behavior and business semantic behavior.

6. CLI now supports semantic and example artifacts

The CLI now supports:

  • --semantic-context-file
  • --semantic-context-json
  • --examples-file
  • --examples-json
  • --show-semantic-context
  • --show-sql-intent-plan
  • --show-selected-examples
  • --validate
  • --repair

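Artifact loading is centralized through utils/artifact_loader.py, so the CLI can load the same semantic-context and example artifacts as downstream services. This makes the CLI useful both for local experimentation and for reproducing service behavior.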
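Loading an artifact from either JSON or YAML usually comes down to dispatching on the file extension. The sketch below assumes that approach; `load_artifact` is a hypothetical name, the YAML branch assumes PyYAML is installed, and none of this is taken from utils/artifact_loader.py.

```python
import json
from pathlib import Path
from typing import Any


def load_artifact(path: str) -> Any:
    """Parse a JSON or YAML file, choosing the parser by file extension."""
    file_path = Path(path)
    text = file_path.read_text(encoding="utf-8")
    suffix = file_path.suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix in {".yaml", ".yml"}:
        # PyYAML, imported lazily so JSON-only use needs no extra dependency.
        import yaml
        return yaml.safe_load(text)
    raise ValueError(f"unsupported artifact extension: {suffix!r}")
```

Routing both semantic context and few-shot examples through one function like this keeps parsing and error handling in a single place.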

7. Prompting is richer and more explicit

The provider adapters now receive rendered business context and structured intent plan information, instead of relying only on flattened or implicit metadata.

This includes helper formatting via utils/semantic_prompt.py and richer prompt rendering in the OpenAI, Anthropic, and Gemini adapters.

8. Stronger local integration domain and sanitized tests

The repository's local PostgreSQL integration domain was expanded so the public library can prove the semantic architecture against a realistic but safe e-commerce schema.

The local schema now includes richer entities such as:

  • stores
  • marketing_channels
  • daily_channel_metrics
  • orders
  • order_items

This schema was used to validate that semantic context can steer generation away from plausible-but-wrong transactional tables and toward the intended aggregate fact table.
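One simple way to picture this steering is a filter over the tables that retrieval proposes. The function below is a hypothetical sketch (its name, parameters, and ordering rule are assumptions), using the table names from the local schema listed above.

```python
def constrain_candidates(candidates, canonical_fact_table, disallowed_tables):
    """Drop disallowed tables and put the canonical fact table first.

    The canonical table is included even if retrieval did not propose it.
    """
    blocked = {t.lower() for t in disallowed_tables}
    kept = [t for t in candidates if t.lower() not in blocked]
    if canonical_fact_table in kept:
        kept.remove(canonical_fact_table)
    return [canonical_fact_table] + kept


retrieved = ["orders", "order_items", "daily_channel_metrics", "marketing_channels"]
constrained = constrain_candidates(
    retrieved,
    canonical_fact_table="daily_channel_metrics",
    disallowed_tables=["orders", "order_items"],
)
```

With `orders` and `order_items` excluded, the model only sees the aggregate fact table and its related dimension table as candidates.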

The automated test suite was also sanitized so it no longer embeds private or production-like warehouse identifiers.

9. Public docs now match the code

The docs were rewritten to explain the actual library shape:

  • DSL-first usage
  • semantic context
  • intent planning
  • execution modes
  • CLI parity
  • public examples drawn only from the local e-commerce domain

Major code changes

Core and runtime

  • src/nlp2sql/core/entities.py
    • added semantic domain entities and SqlIntentPlan
    • preserved richer semantic metadata for prompt assembly
  • src/nlp2sql/core/runtime.py
    • added SemanticHooks
  • src/nlp2sql/client.py
    • added semantic context support to NLP2SQL and connect()
  • src/nlp2sql/__init__.py
    • exported the new semantic entities, hooks, ports, services, and adapters

Ports and adapters

  • src/nlp2sql/ports/semantic_resolver.py
  • src/nlp2sql/ports/semantic_validator.py
  • src/nlp2sql/adapters/noop_semantic_resolver.py
  • src/nlp2sql/adapters/noop_semantic_validator.py
  • src/nlp2sql/adapters/dict_semantic_resolver.py
  • src/nlp2sql/adapters/file_semantic_resolver.py

Services

  • src/nlp2sql/services/query_service.py
    • upgraded orchestration pipeline
    • added semantic-aware cache signature logic
    • added semantic retry context generation
  • src/nlp2sql/services/semantic_resolution_service.py
  • src/nlp2sql/services/sql_intent_planning_service.py
  • src/nlp2sql/services/semantic_validation_service.py
  • src/nlp2sql/services/prompt_assembly_service.py
  • src/nlp2sql/services/example_selection_service.py
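The semantic-aware cache signature noted for query_service.py addresses a real hazard: the same question asked under different business rules must not share a cached answer. A minimal sketch of that idea follows; the function name, the normalization rule, and the payload shape are assumptions, not the library's actual logic.

```python
import hashlib
import json
from typing import Any, Optional


def cache_signature(question: str, semantic_context: Optional[dict[str, Any]]) -> str:
    """Hash the question together with a canonical form of the semantic context."""
    payload = {
        "question": question.strip().lower(),
        "semantic_context": semantic_context,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two requests with identical questions but different contexts produce different keys, while reordering keys inside the same context does not change the key.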

CLI and artifact loading

  • src/nlp2sql/cli.py
    • added semantic/example artifact flags
    • added runtime metadata output helpers
    • added public validate/repair mode resolution
  • src/nlp2sql/utils/artifact_loader.py
    • loads semantic context and examples from JSON or YAML
  • src/nlp2sql/utils/semantic_prompt.py
    • renders semantic context and SQL intent plan sections for prompts

Prompt/rendering integration

Updated provider adapters to render richer business context:

  • src/nlp2sql/adapters/openai_adapter.py
  • src/nlp2sql/adapters/anthropic_adapter.py
  • src/nlp2sql/adapters/gemini_adapter.py

Local integration and tests

  • docker/init-schema.sql
    • expanded the public local e-commerce schema
  • tests/test_dsl_integration.py
    • added semantic-context integration coverage
    • added regression scenario proving semantic guidance changes table choice
  • tests/test_retrieval_pipeline.py
    • added semantic resolution, planning, validation, and repair-loop coverage
  • tests/test_client_hooks.py
    • added coverage for semantic hooks and per-request semantic context
  • tests/test_artifact_loader.py
  • tests/test_cli_semantic_support.py
    • added artifact/CLI coverage using sanitized public fixtures
  • tests/conftest.py
  • tests/test_postgres_integration.py
    • aligned with the richer local schema

Documentation changes

Updated to reflect the current model:

  • README.md
  • docs/ARCHITECTURE.md
  • docs/API.md
  • docs/CONFIGURATION.md
  • docs/ENTERPRISE.md
  • docs/Redshift.md
  • examples/README.md

The docs now use only the repository's local e-commerce example domain.

Versioning

Version updated to 0.2.0rc13 in:

  • pyproject.toml
  • src/nlp2sql/__init__.py
  • src/nlp2sql/config/settings.py

Verification

Validated during this work:

  • semantic context can shift generation from generic transactional paths to the intended aggregate fact table
  • CLI supports file-based semantic context and few-shot example artifacts
  • automated tests use sanitized public fixtures instead of private warehouse identifiers
  • local PostgreSQL integration domain was rebuilt and tested against the new semantic flow
  • LLM-dependent local integration tests passed against the rebuilt local PostgreSQL setup
  • public docs now describe the real DSL-first and semantic-aware architecture

Closes

  • #41 Add semantic context and SQL intent planning to the nlp2sql generation pipeline
  • #34 feat: add --examples-file flag to CLI for few-shot examples

Files changed

This release spans core entities, ports, adapters, services, CLI, utilities, local test schema, tests, docs, and version metadata. See the sections above for the major files touched.