AI Agent Guidelines for abstract-validation-base

This document provides instructions for AI assistants working with code that uses the abstract-validation-base package.

Quick Reference

| Task | Correct Approach |
|------|------------------|
| Create a validator | Inherit `BaseValidator[T]`, implement `name` property + `validate()` method |
| Report validation errors | Use `result.add_error()`, never raise exceptions |
| Log data transformations | Use `model.add_cleaning_process()` |
| Combine validators | Use `CompositeValidator` or `ValidatorPipelineBuilder` |
| Process large files | Use `ValidationRunner` with iterator input |
| Export failures | Use `CSVFailedWriter` or `JSONLinesFailedWriter` |
| Track progress | Add observer implementing `ValidationObserver` protocol |
| Profile data during validation | Use `WhylogsObserver` |
| Write tests | Use Hypothesis for property-based, stateful, and unit tests |
| Report package issues | Use `mcp_github_create_issue` with `[Bug]:`, `[Feature]:`, or `[Docs]:` title prefix |
| Submit a PR | Run `ruff check`, `ruff format`, `mypy`, `pytest` locally first |
| Plan complex tasks | Use `STEP# - AGENT#:` prefix format for parallel-safe task breakdown |

Task Planning for Parallel Execution

When planning complex tasks, structure them to show parallelization opportunities. This enables:

Multi-agent execution — Multiple Cursor sessions or agents can work simultaneously
Clear dependencies — Steps are sequential barriers; agents within steps are parallel-safe
Single-agent fallback — One agent can work through tasks top-to-bottom sequentially

Plan Format

STEP 1 - AGENT A: [Task that can run in parallel with B and C]
STEP 1 - AGENT B: [Task that can run in parallel with A and C]
STEP 1 - AGENT C: [Task that can run in parallel with A and B]
STEP 2 - AGENT A: [Depends on all STEP 1 tasks completing]
STEP 2 - AGENT B: [Depends on all STEP 1 tasks completing]
STEP 3 - AGENT A: [Final integration task]

Planning Rules

Steps are sequential barriers — All tasks in STEP N must complete before STEP N+1 begins
Agents within a step are parallel-safe — No dependencies between same-step tasks
Single agent mode — Work through tasks top-to-bottom, treating agent labels as informational
Multi-agent mode — Different sessions claim different agent letters for true parallelism
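The rules above can be sketched in a few lines of Python. This is only an illustration of how the `STEP# - AGENT#:` prefix could be parsed and consumed; the helper names (`parse_plan`, `sequential_order`, `tasks_for_agent`) are hypothetical and not part of the package or of Cursor.

```python
import re
from collections import defaultdict

# Matches lines such as "STEP 2 - AGENT B: Write tests"
TASK_PATTERN = re.compile(r"^STEP (\d+) - AGENT ([A-Z]): (.+)$")


def parse_plan(plan_text: str) -> dict[int, dict[str, str]]:
    """Group tasks by step number, then by agent letter."""
    steps: dict[int, dict[str, str]] = defaultdict(dict)
    for line in plan_text.splitlines():
        match = TASK_PATTERN.match(line.strip())
        if match:
            step, agent, task = match.groups()
            steps[int(step)][agent] = task
    return dict(steps)


def sequential_order(steps: dict[int, dict[str, str]]) -> list[str]:
    """Single-agent mode: flatten the plan top to bottom (1A, 1B, ..., 2A, ...)."""
    return [f"{step}{agent}" for step in sorted(steps) for agent in sorted(steps[step])]


def tasks_for_agent(steps: dict[int, dict[str, str]], agent: str) -> list[tuple[int, str]]:
    """Multi-agent mode: the tasks one session works on, in step order."""
    return [(step, steps[step][agent]) for step in sorted(steps) if agent in steps[step]]


plan = """\
STEP 1 - AGENT A: Create EmailValidator class
STEP 1 - AGENT B: Create PhoneValidator class
STEP 2 - AGENT A: Write property tests
"""
steps = parse_plan(plan)
print(sequential_order(steps))
print(tasks_for_agent(steps, "B"))
```

Sorting by step number first is what enforces the barrier: nothing from STEP 2 appears in the sequence before every STEP 1 task.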

Dependency Checklist

Before assigning parallel agents to the same step, verify:

No shared file sections — Agents edit different files or non-overlapping sections
No import dependencies — Task B doesn't import something Task A is creating
No shared test fixtures — Parallel test changes don't conflict
No database migrations — Only one agent modifies schema at a time
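The first checklist item can be partly automated. The sketch below assumes each task declares the set of files it expects to touch (a hypothetical convention, not something the plan format requires) and reports every pair of same-step agents that share a file.

```python
from itertools import combinations


def find_file_conflicts(step_tasks: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (agent, agent, shared_files) for each pair of agents touching a common file."""
    conflicts = []
    ordered = sorted(step_tasks.items(), key=lambda item: item[0])
    for (agent_a, files_a), (agent_b, files_b) in combinations(ordered, 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((agent_a, agent_b, shared))
    return conflicts


step_1 = {
    "A": {"src/abstract_validation_base/validators.py"},
    "B": {"src/abstract_validation_base/validators.py"},
    "C": {"tests/conftest.py"},
}
for agent_a, agent_b, shared in find_file_conflicts(step_1):
    print(f"Review overlap between AGENT {agent_a} and AGENT {agent_b}: {sorted(shared)}")
```

A shared file is not automatically a conflict, since agents may edit non-overlapping sections of it. The output is best treated as a list of pairs that need a manual look before the step starts.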

Example Plan: Adding New Validator Types

STEP 1 - AGENT A: Create EmailValidator class in src/abstract_validation_base/validators.py
STEP 1 - AGENT B: Create PhoneValidator class in src/abstract_validation_base/validators.py (separate section)
STEP 1 - AGENT C: Add Hypothesis strategies for email/phone generation in tests/conftest.py

STEP 2 - AGENT A: Write property-based tests for EmailValidator in tests/test_validators.py
STEP 2 - AGENT B: Write property-based tests for PhoneValidator in tests/test_validators.py
STEP 2 - AGENT C: Write stateful tests for validator combinations in tests/test_validators.py

STEP 3 - AGENT A: Add exports to __init__.py and update type stubs
STEP 3 - AGENT B: Run full test suite, fix any integration issues

STEP 4 - AGENT A: Update README.md with usage examples

How Agents Should Use This

When creating a plan:

Identify independent work units
Group truly parallel tasks into the same STEP
Assign different AGENT letters to parallel tasks
Put dependent work in later STEPs
Name each task with the STEP# - AGENT#: prefix so it appears clearly in the Agents panel

Task Naming Convention:

✅ Good: "STEP 1 - AGENT A: Create EmailValidator class"
✅ Good: "STEP 2 - AGENT B: Write property tests for PhoneValidator"
❌ Bad:  "Create EmailValidator class"
❌ Bad:  "Task 1: Create EmailValidator"

The prefix makes it easy to:

Identify which step a task belongs to
See which agent slot is assigned
Track parallel work in the Agents panel
Understand task dependencies at a glance

When executing a plan (single agent):

Work through tasks in order: STEP 1A → 1B → 1C → STEP 2A → 2B → ...
Complete all tasks regardless of agent assignment

When executing a plan (multiple agents/sessions):

Each session claims an agent letter (A, B, or C)
Work only on tasks matching your agent letter
Wait at step boundaries for other agents to complete
Coordinate via git commits or shared status file
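One possible shape for the shared status file is a small JSON document mapping each step to the agents that have finished it. The file name and both helpers below are assumptions made for illustration; the guidelines do not prescribe a format.

```python
import json
import tempfile
from pathlib import Path


def mark_done(status_path: Path, step: int, agent: str) -> None:
    """Record that an agent finished its task for a step."""
    status = json.loads(status_path.read_text()) if status_path.exists() else {}
    done = set(status.get(str(step), []))
    done.add(agent)
    status[str(step)] = sorted(done)
    status_path.write_text(json.dumps(status, indent=2))


def step_complete(status_path: Path, step: int, expected_agents: set[str]) -> bool:
    """True once every agent assigned to the step has reported completion."""
    if not status_path.exists():
        return False
    status = json.loads(status_path.read_text())
    return expected_agents <= set(status.get(str(step), []))


status_file = Path(tempfile.mkdtemp()) / "plan_status.json"
mark_done(status_file, 1, "A")
print(step_complete(status_file, 1, {"A", "B"}))  # False: B has not reported yet
mark_done(status_file, 1, "B")
print(step_complete(status_file, 1, {"A", "B"}))  # True: the barrier is cleared
```

The sketch does no file locking, so two sessions writing at the same moment could overwrite each other. Committing the file through git, as suggested above, lets merge conflicts surface that case.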

Agent Awareness

| Environment | Agent Awareness |
|-------------|-----------------|
| Single Cursor session | Agent sees full plan, executes sequentially |
| Multiple Cursor sessions | Each session independent; coordinate manually via git |
| Cursor Plan Mode (multi-agent) | Cursor may auto-distribute; agents see their assignments |
| External orchestration (AutoGen, CrewAI) | Orchestrator assigns tasks; agents receive specific work |

Implementing Validators

Required Pattern

from abstract_validation_base import BaseValidator, ValidationResult

class MyValidator(BaseValidator[MyModel]):
    @property
    def name(self) -> str:
        return "my_validator"  # Used for identification and error reporting
    
    def validate(self, item: MyModel) -> ValidationResult:
        result = ValidationResult(is_valid=True)
        
        # Add errors using result.add_error() - this auto-sets is_valid=False
        if not item.required_field:
            result.add_error("required_field", "This field is required")
        
        if item.number < 0:
            result.add_error("number", "Must be non-negative", str(item.number))
        
        return result

Rules

Always inherit from BaseValidator[T] with the correct type parameter
Implement both name property and validate() method - both are abstract
Return ValidationResult - never raise exceptions for validation failures
Use result.add_error(field, message, value?) - this automatically sets is_valid=False
Validators should be stateless - don't store state between validate() calls
Don't modify the item being validated - validators should be side-effect free
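To see why returning a result is preferred over raising, consider the self-contained sketch below. `StandInResult` is a hypothetical stand-in that mimics only the documented behaviour of `add_error()` (appending an error and flipping `is_valid`); it is not the package's `ValidationResult`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StandInResult:
    """Hypothetical stand-in for ValidationResult; not the package's class."""

    is_valid: bool = True
    errors: list[tuple[str, str, Optional[str]]] = field(default_factory=list)

    def add_error(self, field_name: str, message: str, value: Optional[str] = None) -> None:
        self.errors.append((field_name, message, value))
        self.is_valid = False  # mirrors the documented automatic flip


def check_order(order: dict) -> StandInResult:
    """Collect every failure instead of stopping at the first one."""
    result = StandInResult()
    if not order.get("customer"):
        result.add_error("customer", "This field is required")
    if order.get("quantity", 0) < 0:
        result.add_error("quantity", "Must be non-negative", str(order["quantity"]))
    return result


result = check_order({"customer": "", "quantity": -2})
print(result.is_valid, len(result.errors))
```

Had `check_order` raised on the missing customer, the negative quantity would never have been reported. The function also keeps no state between calls and leaves `order` untouched, matching the last two rules.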

Type Parameter Must Match

# CORRECT: Type parameter matches the model
class UserValidator(BaseValidator[User]):
    def validate(self, item: User) -> ValidationResult: ...

# INCORRECT: Type mismatch will cause type checker errors
class UserValidator(BaseValidator[Contact]):  # Wrong type!
    def validate(self, item: User) -> ValidationResult: ...

Using ValidationBase Models

Adding Errors vs Cleaning

from abstract_validation_base import ValidationBase

class MyModel(ValidationBase):
    name: str
    email: str

model = MyModel(name="test", email="TEST@EXAMPLE.COM")

# For validation failures - something is wrong with the data
model.add_error(
    field="email",
    message="Domain not in allowlist",
    value=model.email,
    context={"allowed_domains": ["company.com"]},  # Optional context
)

# For data transformations - data was changed/cleaned
model.add_cleaning_process(
    field="email",
    original_value="TEST@EXAMPLE.COM",
    new_value="test@example.com",
    reason="Normalized to lowercase",
    operation_type="normalization",  # Optional: cleaning, normalization, formatting, etc.
)

Checking Status

if model.has_errors:
    print(f"Found {model.error_count} errors")

if model.has_cleaning:
    print(f"Applied {model.cleaning_count} transformations")

Exporting Audit Logs

# For single model
entries = model.audit_log(source="import_batch_1")

# For models with nested ValidationBase fields
entries = model.audit_log_recursive(source="import_batch_1")

# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(entries)

Using Observers

from abstract_validation_base import ValidationObserver, ValidationEvent, ValidationEventType

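# No base class is needed: any object with an on_event() method can act as an observer.
# `metrics` below is a placeholder for your own metrics client.
class MetricsObserver: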
    def on_event(self, event: ValidationEvent) -> None:
        if event.event_type == ValidationEventType.ERROR_ADDED:
            metrics.increment("validation.errors")
        elif event.event_type == ValidationEventType.CLEANING_ADDED:
            metrics.increment("validation.cleaning_operations")

model.add_observer(MetricsObserver())
model.add_error("field", "error")  # Observer is notified

Combining Validators

Using CompositeValidator

from abstract_validation_base import CompositeValidator

# Combine validators - runs all and merges results
pipeline = CompositeValidator[MyModel](
    validators=[
        RequiredFieldsValidator(),
        FormatValidator(),
        BusinessRulesValidator(),
    ],
    name="my_pipeline",
    fail_fast=False,  # Run all validators (default)
)

# With fail_fast=True, stops on first failure
pipeline = CompositeValidator[MyModel](
    validators=[...],
    fail_fast=True,
)

result = pipeline.validate(model)

Using ValidatorPipelineBuilder

from abstract_validation_base import ValidatorPipelineBuilder

pipeline = (
    ValidatorPipelineBuilder[MyModel]("my_pipeline")
    .add(RequiredFieldsValidator())
    .add(FormatValidator())
    .add(BusinessRulesValidator())
    .fail_fast()  # Optional
    .build()
)

Dynamic Validator Management

composite = CompositeValidator[MyModel](validators=[])

# Add validators dynamically
composite.add_validator(EmailValidator())
composite.add_validator(PhoneValidator())

# Query validators
if composite.has_validator("email_validator"):
    validator = composite.get_validator("email_validator")

# Remove by name
composite.remove_validator("phone_validator")

# List all
print(composite.validator_names)  # ["email_validator"]

Streaming Large Files

Basic Pattern

import csv
from abstract_validation_base import ValidationRunner

with open("large_file.csv") as f:
    reader = csv.DictReader(f)  # Iterator - NOT materialized
    
    runner = ValidationRunner(
        data=reader,              # Pass iterator directly
        model_class=MyModel,
        validators=pipeline,      # Optional custom validators
        total_hint=1_000_000,     # Optional: for progress percentage
    )
    
    for result in runner.run():
        if result.is_valid:
            db.insert(result.model)
        else:
            for field, msg in result.error_summary:
                log.warning(f"{field}: {msg}")

Convenience Methods

# Yield only valid models
for model in runner.run_collect_valid():
    db.insert(model)

# Yield only failed results
for result in runner.run_collect_failed():
    log_failure(result.raw_data)

# Batch valid models for bulk insert
for batch in runner.run_batch_valid(batch_size=1000):
    db.insert_many(batch)  # batch is List[MyModel]
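Batching does not require loading the input into memory. The following sketch shows the general technique with `itertools.islice`; it is an illustration of the idea, not the implementation behind `run_batch_valid()`, and the `batched` helper is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Yield lists of up to batch_size items without materializing the input."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    # islice pulls at most batch_size items; an empty list means the input is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch


rows = (n for n in range(7))  # a generator standing in for a stream of valid models
for batch in batched(rows, batch_size=3):
    print(batch)
```

Only one batch is held in memory at a time, so memory use is bounded by `batch_size` rather than by the size of the file.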

Parallel Processing

# For very large files (>1M rows)
for result in runner.run(workers=4, chunk_size=10000):
    process(result)

Accessing Statistics

# After iteration completes
stats = runner.stats
print(f"Success rate: {stats.success_rate:.1f}%")
print(f"Duration: {stats.duration_ms:.0f}ms")

# Top errors
for (field, msg), count, pct in stats.top_errors(10):
    print(f"{field}: {msg} ({count} occurrences, {pct:.1f}%)")

# Full audit report
report = runner.audit_report()
# Returns: {"summary": {...}, "top_errors": [...], "failed_samples": [...]}
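The `((field, msg), count, pct)` shape returned by `top_errors()` can be reproduced with `collections.Counter`. The sketch below is an assumption-laden illustration: in particular, it computes the percentage relative to the number of rows, which may differ from how the package defines it.

```python
from collections import Counter


def top_errors(errors: list[tuple[str, str]], total_rows: int, limit: int = 10):
    """Return ((field, message), count, percent_of_rows) tuples, most common first."""
    if total_rows <= 0:
        return []
    counts = Counter(errors)
    return [
        (key, count, 100.0 * count / total_rows)
        for key, count in counts.most_common(limit)
    ]


observed = [
    ("email", "Invalid format"),
    ("email", "Invalid format"),
    ("age", "Must be non-negative"),
]
for (field, msg), count, pct in top_errors(observed, total_rows=4):
    print(f"{field}: {msg} ({count} occurrences, {pct:.1f}%)")
```

Keying the counter on the `(field, message)` pair groups identical failures together, which is what makes a "top errors" list useful for spotting systematic problems in a file.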

Output Writers

Writing Failed Records

from abstract_validation_base import CSVFailedWriter, JSONLinesFailedWriter

# CSV format
writer = CSVFailedWriter(
    "failed_records.csv",
    include_raw_data=True,    # Include original fields
    max_errors_per_row=5,     # Limit error columns
)
count = writer.write_all(runner.run_collect_failed())

# JSON Lines format
writer = JSONLinesFailedWriter(
    "failed_records.jsonl",
    include_raw_data=True,
    indent=None,  # Compact (default) or int for pretty
)
count = writer.write_all(runner.run_collect_failed())

# Using context manager for streaming
with CSVFailedWriter("failed.csv") as writer:
    for result in runner.run_collect_failed():
        writer.write_one(result)

Writing Audit Reports

from abstract_validation_base import AuditReportWriter

# Auto-detect format from extension
writer = AuditReportWriter("audit.json")  # JSON format
writer = AuditReportWriter("audit.csv")   # CSV format

# Explicit format
writer = AuditReportWriter("report.txt", format="json")

# With options
writer = AuditReportWriter(
    "audit.csv",
    errors_path="top_errors.csv",     # Separate file for errors
    include_samples=True,              # Include failed samples
    samples_path="failed_samples.csv", # Separate file for samples
)

writer.write(runner.audit_report())

Observer Pattern for Progress

Simple Progress Bar (Rich)

from rich.progress import Progress
from abstract_validation_base import SimpleProgressObserver

with Progress() as progress:
    observer = SimpleProgressObserver(progress, task_description="Validating")
    runner.add_observer(observer)
    
    for result in runner.run():
        process(result)

Full Dashboard (Rich)

from abstract_validation_base import RichDashboardObserver

observer = RichDashboardObserver(
    top_errors_count=10,  # Show top 10 errors
    refresh_rate=10,      # Updates per second
)
runner.add_observer(observer)

with observer:  # Starts/stops live display
    for result in runner.run():
        process(result)

Custom Observer

from abstract_validation_base import ValidationObserver, ValidationEvent, ValidationEventType

class CustomObserver:
    def on_event(self, event: ValidationEvent) -> None:
        if event.event_type == ValidationEventType.VALIDATION_STARTED:
            print(f"Starting: {event.data.get('model_class')}")
        
        elif event.event_type == ValidationEventType.ROW_PROCESSED:
            stats = event.data.get("stats_snapshot", {})
            if stats.get("total", 0) % 10000 == 0:
                print(f"Processed {stats['total']:,} rows...")
        
        elif event.event_type == ValidationEventType.VALIDATION_COMPLETED:
            print(f"Complete: {event.data.get('stats')}")

runner.add_observer(CustomObserver())
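`CustomObserver` above does not inherit from anything, which is how protocol-based (structural) typing works in Python. The sketch below shows the mechanism with `typing.Protocol`; `StandInEvent` and `StandInObserver` are hypothetical stand-ins, not the package's `ValidationEvent` or `ValidationObserver`.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, runtime_checkable


@dataclass
class StandInEvent:
    """Hypothetical stand-in for ValidationEvent."""

    event_type: str
    data: dict[str, Any] = field(default_factory=dict)


@runtime_checkable
class StandInObserver(Protocol):
    """Anything with a matching on_event method satisfies this protocol."""

    def on_event(self, event: StandInEvent) -> None: ...


class ErrorCounter:
    # No base class: structural typing only needs the on_event method.
    def __init__(self) -> None:
        self.errors = 0

    def on_event(self, event: StandInEvent) -> None:
        if event.event_type == "error_added":
            self.errors += 1


def notify(observers: list[StandInObserver], event: StandInEvent) -> None:
    """Deliver one event to every registered observer."""
    for observer in observers:
        observer.on_event(event)


counter = ErrorCounter()
notify([counter], StandInEvent("error_added", {"field": "email"}))
notify([counter], StandInEvent("row_processed"))
print(isinstance(counter, StandInObserver), counter.errors)
```

Because the check is structural, existing classes (a logger wrapper, a metrics client adapter) can serve as observers by adding a single method.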

Data Profiling with whylogs

Use WhylogsObserver to automatically profile your data during validation. This enables data quality monitoring, drift detection, and statistical comparison between raw input data and validated outputs.

Installation:

pip install abstract-validation-base[whylogs]

Basic Usage:

import csv
from abstract_validation_base import ValidationRunner, WhylogsObserver

# Create observer with default settings
observer = WhylogsObserver(
    chunk_size=10000,      # Rows to buffer before profiling (memory vs latency tradeoff)
    profile_raw=True,      # Profile all input data
    profile_valid=True,    # Profile only records that pass validation
)

with open("data.csv") as f:
    runner = ValidationRunner(csv.DictReader(f), MyModel)
    runner.add_observer(observer)
    
    for result in runner.run():
        process(result)

# After validation completes, retrieve profiles
profiles = observer.get_profiles()

Exporting Profiles:

# Write to whylogs binary format (recommended for large profiles)
paths = profiles.write(
    raw_path="output/raw_profile.bin",
    valid_path="output/valid_profile.bin",
)

# Convert to pandas DataFrames for analysis
dfs = profiles.to_pandas()
raw_stats = dfs["raw"]   # DataFrame with column statistics
valid_stats = dfs["valid"]

Comparing Raw vs Valid Profiles:

# Get comparison statistics
comparison = observer.compare_profiles()

print(f"Pass rate: {comparison.pass_rate:.1%}")
print(f"Raw columns: {comparison.raw_column_count}")
print(f"Valid columns: {comparison.valid_column_count}")
print(f"Columns only in raw: {comparison.columns_only_in_raw}")
print(f"Columns only in valid: {comparison.columns_only_in_valid}")

# Serialize comparison for reporting
report = comparison.to_dict()

Selective Profiling:

# Profile only raw data (useful for diagnosing input issues)
observer = WhylogsObserver(profile_raw=True, profile_valid=False)

# Profile only valid data (useful for downstream quality checks)
observer = WhylogsObserver(profile_raw=False, profile_valid=True)

With Custom Schema:

from whylogs.core.schema import DatasetSchema
from whylogs.core.resolvers import StandardResolver

# Pass a schema to control how whylogs resolves columns to metrics
schema = DatasetSchema(resolvers=StandardResolver())

observer = WhylogsObserver(schema=schema)

Reusing Observer:

# Reset state before reusing with another runner
observer.reset()
runner2 = ValidationRunner(other_data, MyModel)
runner2.add_observer(observer)
for result in runner2.run():
    process(result)

Key Points:

get_profiles() and compare_profiles() raise RuntimeError if called while validation is running
Thread-safe for use with parallel validation (runner.run(workers=4))
Profiles are accumulated incrementally using whylogs merge for memory efficiency
The chunk_size parameter controls memory usage vs profiling latency tradeoff
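The last two points can be pictured with a small buffering sketch. `ChunkBuffer` is hypothetical and says nothing about how `WhylogsObserver` is actually implemented; it only shows the general pattern of buffering rows under a lock and handing them off in chunks.

```python
import threading
from typing import Any, Callable


class ChunkBuffer:
    """Hypothetical buffer: collects rows and flushes them in chunks under a lock."""

    def __init__(self, chunk_size: int, flush: Callable[[list[dict[str, Any]]], None]) -> None:
        self._chunk_size = chunk_size
        self._flush = flush
        self._rows: list[dict[str, Any]] = []
        self._lock = threading.Lock()

    def add(self, row: dict[str, Any]) -> None:
        with self._lock:
            self._rows.append(row)
            if len(self._rows) >= self._chunk_size:
                self._drain()

    def close(self) -> None:
        """Flush whatever is left once the run has finished."""
        with self._lock:
            if self._rows:
                self._drain()

    def _drain(self) -> None:
        # Called with the lock held: swap the buffer out, then hand the chunk off.
        chunk, self._rows = self._rows, []
        self._flush(chunk)


flushed_sizes: list[int] = []
buffer = ChunkBuffer(chunk_size=100, flush=lambda chunk: flushed_sizes.append(len(chunk)))


def worker() -> None:
    for i in range(125):
        buffer.add({"n": i})


threads = [threading.Thread(target=worker) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
buffer.close()
print(sum(flushed_sizes), max(flushed_sizes))
```

At most `chunk_size` rows are ever buffered, so a smaller value lowers peak memory at the cost of more frequent flushes, which is the tradeoff the `chunk_size` parameter exposes.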

SQLModel Integration

Basic Usage

from sqlmodel import select
from abstract_validation_base import ValidatedRecord

class UserRecord(ValidatedRecord, table_name="users"):
    email: str
    name: str
    tier: str = "free"

# Use as ValidationBase
user = UserRecord(email="test@example.com", name="Test")
user.add_error("email", "Domain blocked")
user.add_cleaning_process("name", "  Test  ", "Test", "Trimmed whitespace")

# Convert to SQLModel for database (`session` is an open sqlmodel Session)
db_user = user.to_db()
session.add(db_user)

# Or with field overrides
db_user = user.to_db(id=123, tier="premium")

# Access the generated SQLModel class
UserDB = UserRecord.db_model()
users = session.exec(select(UserDB)).all()

Key Points

table_name parameter sets the database table name
Auto-generated DB model includes id: int primary key
process_log is excluded from database model
DB model is lazily generated on first access

Anti-Patterns to Avoid

1. Raising Exceptions in Validators

# WRONG - Don't raise exceptions for validation failures
class BadValidator(BaseValidator[MyModel]):
    def validate(self, item: MyModel) -> ValidationResult:
        if not item.email:
            raise ValueError("Email required")  # DON'T DO THIS
        return ValidationResult(is_valid=True)

# CORRECT - Use ValidationResult
class GoodValidator(BaseValidator[MyModel]):
    def validate(self, item: MyModel) -> ValidationResult:
        result = ValidationResult(is_valid=True)
        if not item.email:
            result.add_error("email", "Email required")
        return result

2. Modifying Items in Validators

# WRONG - Don't modify the item being validated
class BadValidator(BaseValidator[MyModel]):
    def validate(self, item: MyModel) -> ValidationResult:
        item.email = item.email.lower()  # DON'T MODIFY
        return ValidationResult(is_valid=True)

# CORRECT - Validators should only inspect, not modify
# Use ValidationBase.add_cleaning_process() separately if cleaning is needed

3. Materializing Large Iterators

# WRONG - Loads entire file into memory
with open("huge_file.csv") as f:
    data = list(csv.DictReader(f))  # DON'T DO THIS
    runner = ValidationRunner(iter(data), MyModel)

# CORRECT - Pass iterator directly
with open("huge_file.csv") as f:
    runner = ValidationRunner(csv.DictReader(f), MyModel)
    for result in runner.run():
        process(result)

4. Accessing Private Attributes

# WRONG - Don't access private attributes
entries = model._process_log.errors  # DON'T DO THIS

# CORRECT - Use public methods
entries = model.audit_log()
has_errors = model.has_errors
count = model.error_count

5. Forgetting Type Parameters

# WRONG - Missing type parameter loses type safety
class MyValidator(BaseValidator):  # Missing [T]
    def validate(self, item) -> ValidationResult: ...

# CORRECT - Always specify type parameter
class MyValidator(BaseValidator[MyModel]):
    def validate(self, item: MyModel) -> ValidationResult: ...

6. Creating ValidationResult with is_valid=False

# WRONG - Manually setting is_valid without errors
result = ValidationResult(is_valid=False)  # No errors listed!

# CORRECT - Use add_error() which handles is_valid
result = ValidationResult(is_valid=True)
result.add_error("field", "message")  # Sets is_valid=False automatically

7. Writing Example-Based Tests Instead of Property Tests

# WRONG - Hardcoded examples miss edge cases
def test_add_error():
    result = ValidationResult(is_valid=True)
    result.add_error("email", "Invalid")
    assert result.is_valid is False

# CORRECT - Property-based test checks the invariant across many generated inputs
from hypothesis import given, strategies as st

@given(st.text(min_size=1), st.text(min_size=1))
def test_add_error(field: str, message: str):
    result = ValidationResult(is_valid=True)
    result.add_error(field, message)
    assert result.is_valid is False

Event Types Reference

| Event Type | Emitted By | Data Keys |
|------------|------------|-----------|
| `ERROR_ADDED` | ValidationBase | `field`, `message`, `value`, `context` |
| `CLEANING_ADDED` | ValidationBase | `field`, `original_value`, `new_value`, `reason`, `operation_type` |
| `VALIDATION_STARTED` | CompositeValidator, ValidationRunner | `model_class`, `validator_name`, `total_hint` |
| `VALIDATION_COMPLETED` | CompositeValidator, ValidationRunner | `is_valid`, `error_count`, `duration_ms`, `stats` |
| `ROW_PROCESSED` | ValidationRunner | `row_index`, `is_valid`, `stats_snapshot`, `errors`, `raw_data`, `model_dict` |
| `BATCH_STARTED` | ValidationRunner | `batch_number`, `batch_size` |
| `BATCH_COMPLETED` | ValidationRunner | `batch_number`, `batch_size` |

Testing with Hypothesis

All tests in this repository must use Hypothesis for property-based, stateful, and unit testing. Hypothesis often finds edge cases that hand-picked examples miss.

Property-Based Testing

Use @given to test invariants that should hold for all valid inputs:

from hypothesis import given, strategies as st
from abstract_validation_base import ValidationResult

@given(st.text(), st.text())
def test_add_error_always_sets_invalid(field: str, message: str):
    """Property: Adding an error always results in is_valid=False."""
    result = ValidationResult(is_valid=True)
    result.add_error(field, message)
    assert result.is_valid is False
    assert len(result.errors) >= 1


@given(st.lists(st.tuples(st.text(min_size=1), st.text(min_size=1)), min_size=1))
def test_error_count_matches_errors_added(errors: list[tuple[str, str]]):
    """Property: Error count equals number of errors added."""
    result = ValidationResult(is_valid=True)
    for field, message in errors:
        result.add_error(field, message)
    assert len(result.errors) == len(errors)

Stateful Testing

Use RuleBasedStateMachine to test complex interactions and state transitions:

from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, invariant
from abstract_validation_base import ValidationBase

class ValidationBaseStateMachine(RuleBasedStateMachine):
    """Test that ValidationBase maintains consistent state through operations."""
    
    def __init__(self):
        super().__init__()
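        # MyModel is the ValidationBase subclass with name and email fields defined earlier
        self.model = MyModel(name="test", email="test@example.com")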
        self.expected_error_count = 0
        self.expected_cleaning_count = 0
    
    @rule(field=st.text(min_size=1), message=st.text(min_size=1))
    def add_error(self, field: str, message: str):
        self.model.add_error(field, message)
        self.expected_error_count += 1
    
    @rule(
        field=st.text(min_size=1),
        original=st.text(),
        new=st.text(),
        reason=st.text(min_size=1)
    )
    def add_cleaning(self, field: str, original: str, new: str, reason: str):
        self.model.add_cleaning_process(field, original, new, reason)
        self.expected_cleaning_count += 1
    
    @invariant()
    def error_count_consistent(self):
        assert self.model.error_count == self.expected_error_count
    
    @invariant()
    def cleaning_count_consistent(self):
        assert self.model.cleaning_count == self.expected_cleaning_count
    
    @invariant()
    def has_errors_reflects_count(self):
        assert self.model.has_errors == (self.expected_error_count > 0)


TestValidationBase = ValidationBaseStateMachine.TestCase

Unit Testing with Hypothesis

Even simple unit tests benefit from Hypothesis strategies:

from hypothesis import given, strategies as st
from abstract_validation_base import CompositeValidator

# Strategy for generating valid model data
model_data = st.fixed_dictionaries({
    "name": st.text(min_size=1, max_size=100),
    "email": st.emails(),
    "age": st.integers(min_value=0, max_value=150),
})

@given(model_data)
def test_composite_validator_runs_all_validators(data: dict):
    """All validators in composite are executed."""
    model = MyModel(**data)
    composite = CompositeValidator[MyModel](
        validators=[ValidatorA(), ValidatorB()],
        name="test_composite"
    )
    result = composite.validate(model)
    # Smoke check only: the composite returns a well-formed result.
    # Proving that both validators ran requires validators that record their calls.
    assert isinstance(result.is_valid, bool)

Custom Strategies

Define reusable strategies for domain types:

from hypothesis import strategies as st

# Strategy for generating ValidationBase models
@st.composite
def validation_models(draw, with_errors: bool = False, with_cleaning: bool = False):
    """Generate MyModel instances with optional errors/cleaning."""
    model = MyModel(
        name=draw(st.text(min_size=1)),
        email=draw(st.emails()),
    )
    
    if with_errors:
        num_errors = draw(st.integers(min_value=1, max_value=5))
        for _ in range(num_errors):
            model.add_error(
                draw(st.text(min_size=1, max_size=20)),
                draw(st.text(min_size=1, max_size=100))
            )
    
    if with_cleaning:
        num_cleaning = draw(st.integers(min_value=1, max_value=5))
        for _ in range(num_cleaning):
            model.add_cleaning_process(
                draw(st.text(min_size=1, max_size=20)),
                draw(st.text()),
                draw(st.text()),
                draw(st.text(min_size=1, max_size=100))
            )
    
    return model


@given(validation_models(with_errors=True))
def test_models_with_errors_report_has_errors(model):
    assert model.has_errors is True

Settings and Profiles

Configure Hypothesis appropriately for CI vs local development:

from hypothesis import settings, Phase

# In conftest.py - register profiles
settings.register_profile("ci", max_examples=1000, deadline=None)
settings.register_profile("dev", max_examples=100, deadline=500)
settings.register_profile("debug", max_examples=10, phases=[Phase.generate])

# Load profile from environment
import os
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))

Testing Guidelines

Prefer property-based tests — They find edge cases you won't think of
Use stateful tests for stateful components — ValidationBase, ProcessLog, etc.
Define custom strategies in conftest.py for reuse across test modules
Use @example decorator to pin specific regression cases
Set deadline=None for tests involving I/O or complex operations
Use assume() to filter invalid combinations rather than complex strategies

Example Test Structure

tests/
├── conftest.py           # Shared fixtures and Hypothesis strategies
├── strategies.py         # Custom Hypothesis strategies (optional)
├── test_base.py          # Property + stateful tests for ValidationBase
├── test_validators.py    # Property tests for validator behavior
├── test_runner.py        # Stateful tests for ValidationRunner
└── test_writers.py       # Property tests for output writers

Reporting Issues to This Package

If you're an AI agent assisting with a project that uses abstract-validation-base and you discover a bug, missing feature, or documentation issue, you can file an issue against this package.

Repository: Abstract-Data/abstract-validation-base

Using MCP GitHub Server (Recommended for AI Agents)

If your AI client (Cursor, Claude Desktop, etc.) has the GitHub MCP server configured, use the mcp_github_create_issue tool targeting Abstract-Data/abstract-validation-base.

Cursor (~/.cursor/mcp.json):

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}

Issue Templates (Structured Forms)

This repository uses GitHub's YAML-based issue forms. When creating issues via MCP, format the body to match the template fields:

**Bug Report**: use title prefix `[Bug]: `

````markdown
### Prerequisites
- [x] I have searched existing issues
- [x] I am using the latest version

### Bug Description
[Clear description of the bug]

### Steps to Reproduce
```python
from abstract_validation_base import ValidationBase

# Minimal code that reproduces the issue
```

### Expected Behavior
[What should happen]

### Actual Behavior
[What actually happens]

### Affected Component
[ValidationBase / Validators / Runner / Writers / SQLModel / Rich / Events]

### Error Output
[Stack trace if applicable]

### Environment
- Package Version: [e.g., 0.3.0a1]
- Python Version: [3.10 / 3.11 / 3.12]
- OS: [macOS / Linux / Windows]
````

**Feature Request**: use title prefix `[Feature]: `

````markdown
### Prerequisites
- [x] I have searched existing issues
- [x] I have read the documentation

### Problem Statement
[What limitation or pain point does this address?]

### Proposed Solution
[What would you like to see?]

### Alternatives Considered
[Other approaches you've thought about]

### Affected Component
[ValidationBase / Validators / Runner / Writers / SQLModel / Rich / New Component]

### Use Case Example
```python
# Example code showing how this feature would be used
```

### Priority
[Nice to have / Would significantly improve workflow / Blocking use case]
````

**Documentation Issue**: use title prefix `[Docs]: `

```markdown
### Issue Type
[Missing / Incorrect / Unclear / Needs example / Typo / Outdated]

### Location
[README.md / AGENTS.md / Docstrings / API reference]

### Problem Description
[What's wrong or missing?]

### Suggested Improvement
[How should the documentation be improved?]
```

Auto-Labeling

Issues are automatically labeled based on content:

| Keywords in Issue | Label Applied |
|-------------------|---------------|
| ValidationBase, add_error, ProcessLog | `component:base` |
| BaseValidator, CompositeValidator, pipeline | `component:validators` |
| ValidationRunner, streaming, large file | `component:runner` |
| CSVFailedWriter, JSONLines, AuditReport | `component:writers` |
| ValidatedRecord, SQLModel, to_db | `component:sqlmodel` |
| RichDashboard, SimpleProgress, observer | `component:rich` |
| WhylogsObserver, ProfilePair, profiling, whylogs | `component:whylogs` |

Bug reports automatically receive a helpful comment with relevant documentation links.
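When drafting an issue, it can help to predict which labels it will receive. The sketch below applies the keyword table using case-insensitive substring matching; that matching rule is an assumption, and the repository's actual labeling workflow may behave differently.

```python
# Keyword table copied from the Auto-Labeling section above.
KEYWORD_LABELS = {
    "component:base": ["ValidationBase", "add_error", "ProcessLog"],
    "component:validators": ["BaseValidator", "CompositeValidator", "pipeline"],
    "component:runner": ["ValidationRunner", "streaming", "large file"],
    "component:writers": ["CSVFailedWriter", "JSONLines", "AuditReport"],
    "component:sqlmodel": ["ValidatedRecord", "SQLModel", "to_db"],
    "component:rich": ["RichDashboard", "SimpleProgress", "observer"],
    "component:whylogs": ["WhylogsObserver", "ProfilePair", "profiling", "whylogs"],
}


def labels_for(text: str) -> list[str]:
    """Return every label whose keywords appear in the issue text (assumed matching rule)."""
    lowered = text.lower()
    return [
        label
        for label, keywords in KEYWORD_LABELS.items()
        if any(keyword.lower() in lowered for keyword in keywords)
    ]


print(labels_for("ValidationRunner crashes on large file"))
print(labels_for("WhylogsObserver fails to reset"))
```

Under this matching rule the second example picks up both `component:rich` and `component:whylogs`, because "observer" is a substring of "WhylogsObserver". Precise component names in the title and body keep the labels focused.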

Contributing Pull Requests

When submitting PRs to this repository, ensure the following checks pass locally:

Pre-submission Checklist

# Linting
uv run ruff check src tests
uv run ruff format src tests

# Type checking
uv run mypy src

# Tests
uv run pytest

PR Requirements

Link to related issue — Reference with Closes #123
Type of change — Bug fix, feature, docs, refactor, tests
Tests — Add Hypothesis-based tests (property, stateful, or unit) for new functionality
Documentation — Update docs/docstrings for user-facing changes

CI Checks (Automated)

These run automatically on PRs:

ruff check and ruff format --check
mypy src
pytest --cov

PRs cannot be merged until all CI checks pass.
