Production Readiness: Phases 1-4 (Interface, Security, Logging, Error Handling) by blokpardi · Pull Request #1 · Airbais/intent-tools

blokpardi · 2026-01-25T07:41:10Z

Summary

This PR implements the first four phases of the Airbais Tools Production Readiness initiative, transforming the codebase from operational to production-ready.

Phase 1: Interface Contracts ✅

JSON schema for dashboard-data.json with required tool_type field
Schema validation on dashboard data load
Replaced heuristic tool detection with explicit tool_type lookup

Phase 2: Security Hardening ✅

Pydantic validation for all API request parameters
Sanitized error responses (no stack traces leaked)
CORS with explicit origin whitelist
Flask-Talisman security headers (HSTS, CSP, X-Frame-Options)
Subprocess command whitelist with path validation

Phase 3: Structured Logging ✅

structlog for JSON logging with timestamp, level, context
Correlation IDs (X-Correlation-ID) tracking requests across operations
Request/response logging for API endpoints
Audit logging for tool executions with job_id tracking

Phase 4: Error Handling ✅

Custom exception hierarchy (ToolNotFoundError, ToolExecutionError, JobNotFoundError, etc.)
Structured error responses with error_code, message, correlation_id, timestamp
UTC-aware datetime serialization (RFC 3339)
No bare except blocks remain
57 tests covering error handling infrastructure

Test plan

Run automation test suite: cd automation && pytest tests/ -v
Verify API starts: python automation/api_server.py
Test health endpoint: curl http://localhost:8888/health
Verify structured logging output (JSON format in production)
Test error response format includes correlation_id and timestamp

Stats

57 commits
18 plans executed across 4 phases
100+ tests added
0 bare except blocks remaining

🤖 Generated with Claude Code

- Added RulesValidator class to validate the structure and content of JSON rules files.
- Implemented methods for file validation, structure checks, and normalization of rule types.
- Added logging for validation errors and warnings.

feat: Create Website Crawler for content ingestion
- Developed WebsiteCrawler class to crawl websites and ingest content.
- Implemented configuration validation, crawling logic, and content extraction.
- Integrated robots.txt handling to respect crawling rules.

docs: Add sample content for testing
- Created markdown and text files containing sample company policies and features for testing purposes.

test: Add unit tests for content ingestion
- Implemented tests for ContentProcessor and LocalFileSource classes.
- Added tests for various content formats including markdown, HTML, JSON, and CSV.

test: Add integration tests for complete evaluation workflow
- Developed integration tests to validate the complete workflow of the Rules Evaluator.
- Included tests for critical rule failures and overall evaluation results.

test: Add tests for RAG database functionality
- Implemented tests for RAGDatabase class covering initialization, content addition, querying, and statistics.

test: Add tests for RulesValidator
- Created unit tests for RulesValidator class to validate rules files.
- Included tests for valid and invalid rules scenarios, case normalization, and error handling.

- Add .planning/intel/ with file index, conventions, and summary
- Add .planning/codebase/ with 7 architecture documents:
  - STACK.md: Languages, frameworks, dependencies
  - INTEGRATIONS.md: External APIs (OpenAI, Anthropic, ChromaDB)
  - ARCHITECTURE.md: Modular plugin architecture pattern
  - STRUCTURE.md: Directory layout and naming conventions
  - CONVENTIONS.md: Code style and error handling patterns
  - TESTING.md: pytest framework and test organization
  - CONCERNS.md: Tech debt, security, and performance issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- LLMEvaluator: Add tool_type to both dashboard generation methods
- GEOEvaluator: Add tool_type as first field in dashboard data
- LLMSTxtGenerator: Add tool_type to dashboard data structure
- RulesEvaluator: Add tool_type alongside existing tool field

All tools now explicitly identify their type for dashboard validation

- Implement DashboardDataValidator class using jsonschema
- Provide clear error messages with json_path for invalid fields
- Add validate() and validate_file() methods
- Export convenience function validate_dashboard_data()
- Use iter_errors() to collect all validation errors

- Import DashboardDataValidator with graceful fallback if unavailable
- Initialize validator in __init__ (cached for performance)
- Validate data after JSON load in load_tool_data method
- Log validation errors as warnings (backwards-compatible)
- Never fail on validation errors - graceful degradation

Task 1/3 complete

- Refactor _detect_tool_type to check tool_type field first
- Check legacy 'tool' field as second option
- Extract heuristic logic to _detect_tool_type_heuristic method
- Preserve ALL existing heuristic detection logic intact
- Add debug logging when using explicit vs heuristic detection
- Maintain full backwards compatibility with legacy files

Task 2/3 complete
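
The detection order in this commit (explicit tool_type first, then the legacy 'tool' field, then heuristics) might look roughly like the sketch below. The function names, the `pages` marker key, and the assumption that legacy 'tool' values match the schema identifiers are all illustrative; only 'intentcrawler' and 'graspevaluator' are named in this PR, so the other four identifiers are left out.

```python
from typing import Any, Optional

# Only these two identifiers appear in the PR text; the real enum has six.
KNOWN_TOOL_TYPES = {"intentcrawler", "graspevaluator"}


def detect_tool_type_heuristic(data: dict[str, Any]) -> Optional[str]:
    """Hypothetical stand-in for the preserved heuristic logic."""
    if "pages" in data:  # assumed marker key, not taken from the codebase
        return "intentcrawler"
    return None


def detect_tool_type(data: dict[str, Any]) -> Optional[str]:
    """Resolve the tool type: explicit field, legacy field, then heuristics."""
    for field in ("tool_type", "tool"):
        value = data.get(field)
        # Non-string or unknown values fall through to the next option.
        if isinstance(value, str) and value in KNOWN_TOOL_TYPES:
            return value
    return detect_tool_type_heuristic(data)
```

Because invalid explicit values fall through instead of raising, legacy files keep loading, which matches the backwards-compatibility goal stated in the commit.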

- Wrap validation calls in try/except to never prevent data loading
- Add debug logging when validation passes
- Log warning when validator not available
- Validate tool_type field type and log warning for invalid types
- Add informative debug logging for tool type detection method used
- Ensure graceful degradation in all error cases

Task 3/3 complete

- TestSchemaValidator class with comprehensive schema validation tests
- Tests for valid minimal data passing validation
- Tests for missing required fields (tool_type, timestamp)
- Tests for invalid tool_type values
- Parametrized tests for all 6 valid tool types
- Tests for additional properties being allowed
- Tests for timestamp format validation including timezone offsets

- TestDataLoader class verifies explicit tool_type detection
- Tests for legacy 'tool' field fallback
- Heuristic detection tests for all 6 tool types
- Test that explicit tool_type is preferred over heuristics
- TestExistingDataFiles class for integration tests with real data
- Tests skip gracefully if data files don't exist
- Schema validation integration test verifies validator runs on real files

Complete test suite: 23 test methods across 3 test classes

- Add handle_validation_error for Pydantic validation errors (400)
- Add handle_bad_request for generic 400 errors
- Add handle_not_found for 404 errors (no logging needed)
- Add handle_internal_error for 500 errors (logs with exc_info=True)
- Add handle_unexpected_exception catch-all (logs at CRITICAL level)
- Add register_error_handlers function for Flask integration
- All error responses hide stack traces, file paths, and internal details
- Full debugging info preserved in logs via exc_info=True

- Add validation module structure (automation/validation/)
- Create AnalyzeRequest model for POST body validation
- Create JobIdPath model for UUID path parameter validation
- URL format validation requires http/https prefix
- Numeric constraints (max_pages: 1-1000, crawl_depth: 1-10, etc.)
- Log level pattern validation (DEBUG|INFO|WARNING|ERROR)
- Reject unknown fields (extra='forbid')

- Created security module structure in automation/security/
- Implemented get_cors_origins() with priority: env var > config > defaults
- Implemented configure_cors() with wildcard protection
- Default origins: localhost only (development safe)
- Environment override: CORS_ALLOWED_ORIGINS comma-separated list
- Security: Detects and removes wildcard (*) origins
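
As a rough illustration of the priority order and wildcard protection listed here, a stdlib-only sketch follows. The default origin URLs, the `environ` parameter (added so the function can be exercised without touching the real environment), and the exact return type are assumptions; the `cors_origins` config key and the `CORS_ALLOWED_ORIGINS` variable come from this PR.

```python
import os
from typing import Mapping, Optional, Sequence

# Assumed development defaults; the PR only states "localhost only".
DEFAULT_ORIGINS = ["http://localhost:3000", "http://127.0.0.1:3000"]


def get_cors_origins(
    config: Optional[Mapping[str, Sequence[str]]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Pick origins by priority (env var > config > defaults) and drop wildcards."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CORS_ALLOWED_ORIGINS", "")
    if raw.strip():
        # Comma-separated list from the environment takes precedence.
        origins = [item.strip() for item in raw.split(",") if item.strip()]
    elif config and config.get("cors_origins"):
        origins = list(config["cors_origins"])
    else:
        origins = list(DEFAULT_ORIGINS)
    # A wildcard would defeat the whitelist, so it is removed rather than honored.
    return [origin for origin in origins if origin != "*"]
```

Note that in this sketch an environment value consisting only of `*` yields an empty list; whether the real implementation falls back to defaults in that case is not stated in the PR.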

- Import and register error handlers on Flask app initialization
- Update run_tool_async to return generic error message instead of exception details
- Update analyze endpoint to log with exc_info=True and return generic message
- Error responses now hide internal implementation details from clients
- Full exception details preserved in server logs for debugging

- Removed direct flask_cors import
- Added security.cors_config import
- Replaced CORS(app) with configure_cors(app, config)
- Passes module-level config dict to configure_cors()
- CORS now uses explicit whitelist from tools_config.yaml

- Add flask-pydantic==0.14.0 and pydantic>=2.0.0 to requirements.txt
- Add @validate() decorator to /<tool_name>/analyze endpoint
- Replace request.get_json() with validated AnalyzeRequest body parameter
- Add JobIdPath validation to /status/<job_id> endpoint
- Add JobIdPath validation to /results/<job_id> endpoint
- Return 400 with structured error details on validation failure
- Remove manual job_id cleanup (regex-based) in favor of Pydantic validation

- TestErrorResponses class verifies error sanitization
- test_404_does_not_expose_paths ensures paths not echoed
- test_invalid_json_returns_400 verifies no tracebacks in response
- test_missing_required_params_returns_400 checks structured errors
- test_unknown_tool_returns_404 verifies safe error messages
- test_error_response_format_consistent validates JSON structure
- TestJobErrorSanitization class tests job error handling
- TestHealthEndpoint validates health check endpoint
- Add pytest>=7.0.0 to automation requirements.txt
- Tests verify no file paths, tracebacks, or internals exposed

- Add subprocess_validator.py with ALLOWED_SCRIPTS whitelist
- Implement validate_script() to reject non-whitelisted scripts
- Implement validate_path_within_project() with Path.resolve() for canonicalization
- Implement validate_tool_directory() to ensure dirs within project root
- Implement build_safe_command() returning list for shell=False execution
- Export validation functions from security/__init__.py
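
To make the whitelist and path-canonicalization idea concrete, here is a minimal sketch under stated assumptions: the contents of ALLOWED_SCRIPTS, the function signatures, and the use of ValueError are hypothetical, since the PR names the functions but does not show their bodies.

```python
import sys
from pathlib import Path

# Hypothetical whitelist; the real ALLOWED_SCRIPTS contents are not shown in this PR.
ALLOWED_SCRIPTS = {"main.py"}


def validate_path_within_project(path: Path, project_root: Path) -> Path:
    """Canonicalize the path and reject anything that escapes project_root."""
    resolved = path.resolve()  # collapses ".." segments and follows symlinks
    root = project_root.resolve()
    if resolved != root and root not in resolved.parents:
        raise ValueError("path escapes project root")
    return resolved


def build_safe_command(tool_dir: Path, script: str, project_root: Path) -> list[str]:
    """Return an argument list suitable for subprocess.run(..., shell=False)."""
    if script not in ALLOWED_SCRIPTS:
        raise ValueError("script is not whitelisted")
    safe_dir = validate_path_within_project(tool_dir, project_root)
    return [sys.executable, str(safe_dir / script)]
```

Returning a list rather than a string means the command can be passed to subprocess without a shell, so user-supplied values are never interpreted as shell syntax.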

- Import build_safe_command from subprocess_validator
- Replace manual command building with validated build_safe_command()
- Simplified run_tool_async by removing 60+ lines of parameter logic
- Ensure all subprocess calls use shell=False (list arguments)
- Validation failure raises clear error message

- Test script whitelist (allowed/rejected/empty)
- Test path traversal prevention with Path.resolve()
- Test absolute paths outside project blocked
- Test tool directory validation
- Test command building with valid/invalid inputs
- Test command is list (not string) for shell=False
- All 12 tests pass

- Add flask-talisman==1.1.0 to requirements.txt
- Create security/talisman_config.py with configure_talisman()
- Configure HSTS, CSP, X-Frame-Options, X-Content-Type-Options
- Support TALISMAN_FORCE_HTTPS env var for proxy setups
- Export configure_talisman and get_api_csp from security module

- Test X-Frame-Options: DENY header present
- Test X-Content-Type-Options: nosniff header present
- Test Content-Security-Policy restrictive directives
- Test Referrer-Policy header present
- Test headers on error responses (with redirect support)
- Test headers on POST endpoints
- Test Talisman configuration (CSP, debug mode, env override)
- All 10 tests pass

- Create automation/log_config/ module (avoiding stdlib logging name conflict)
- Implement configure_structlog() with environment-aware rendering
- JSON output for production (no tty), console output for debug/dev
- Add get_logger() factory function
- Include processors for timestamps, log levels, context vars, exceptions

- Test console renderer in debug mode
- Test JSON renderer in production mode (no tty)
- Verify logger factory returns bound logger
- Test context variables propagation (correlation_id)
- Test exception formatting as structured data
- All 6 tests passing

- Add correlation.py with Flask middleware for request tracking
- Generate UUID for requests without X-Correlation-ID header
- Preserve provided correlation IDs from request headers
- Bind correlation_id to structlog context for all logs
- Include http_method, http_path, remote_addr in log context
- Return X-Correlation-ID header in response
- Export configure_correlation and get_correlation_id from log_config
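
The reuse-or-generate behaviour for correlation IDs can be sketched without Flask or structlog, using a plain context variable in their place. `begin_request` is a hypothetical name and the plain-mapping header lookup is a simplification (real header access is case-insensitive); `get_correlation_id` and the 'none' fallback are named elsewhere in this PR.

```python
import uuid
from contextvars import ContextVar
from typing import Mapping

HEADER = "X-Correlation-ID"
# Stand-in for the structlog context binding used in the real middleware.
_correlation_id = ContextVar("correlation_id", default="none")


def begin_request(headers: Mapping[str, str]) -> str:
    """Reuse the caller's correlation ID if present, otherwise generate a UUID."""
    correlation_id = headers.get(HEADER) or str(uuid.uuid4())
    _correlation_id.set(correlation_id)
    return correlation_id


def get_correlation_id() -> str:
    """Return the ID bound to the current context ('none' outside a request)."""
    return _correlation_id.get()
```

Provided IDs are kept verbatim, including non-UUID formats, so a caller's own tracing scheme survives the hop through the API.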

- Created request_logger.py with structlog-based logging
- Log requests with method, path, endpoint, content_type
- Log responses with status_code and duration_ms
- Redact sensitive fields (password, token, api_key, etc.)
- Export configure_request_logging and redact_sensitive from log_config
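
A possible shape for the redaction step is shown below. The key set is limited to the three names given in this commit, and the "[REDACTED]" placeholder and the single-argument signature are assumptions; the case-insensitive and nested behaviour follows the tests described later in this PR.

```python
from typing import Any

# Assumed key list; the PR names password, token and api_key "etc."
SENSITIVE_KEYS = {"password", "token", "api_key"}
REDACTED = "[REDACTED]"  # placeholder text is an assumption


def redact_sensitive(value: Any) -> Any:
    """Return a copy with sensitive keys masked, recursing into dicts and lists."""
    if isinstance(value, dict):
        return {
            key: (
                REDACTED
                if isinstance(key, str) and key.lower() in SENSITIVE_KEYS
                else redact_sensitive(item)
            )
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_sensitive(item) for item in value]
    return value
```

Building a new structure instead of mutating the input keeps the original request data intact for the handler that runs after logging.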

- Import configure_correlation from log_config module
- Call configure_correlation(app) before other middleware
- Position as first middleware to ensure correlation ID available for all requests
- Correlation ID now binds to structlog context on every request
- All subsequent logs include correlation_id automatically

- Test UUID generation when correlation ID not provided
- Test preservation of provided correlation IDs
- Test uniqueness across different requests
- Test correlation ID on error responses (500)
- Test correlation ID on 404 responses
- Test custom (non-UUID) correlation ID formats preserved
- All 6 tests pass

- Add start_time tracking and job context binding with structlog
- Log tool_execution_started with redacted parameters
- Log tool_subprocess_starting with command info
- Pass CORRELATION_ID to subprocess environment
- Log tool_execution_completed with results_dir and duration
- Log tool_execution_failed with error and exc_info on failure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Test redact_sensitive() with various sensitive field names
- Test case-insensitive redaction
- Test nested field redaction
- Test integration with Flask (GET/POST, status codes, duration)
- Test sensitive query parameter redaction
- All 13 tests pass

- create_job: logger.info('job_created', job_id, tool_name)
- get_status: logger.debug('job_status_requested', job_id)
- get_status (not found): logger.info('job_not_found', requested_job_id, total_jobs)
- analyze exception: logger.error('analyze_start_failed', error, exc_info=True)
- main: logger.info('api_server_starting', host, port, debug, available_tools)
- load_config: logger.error('config_load_failed', config_path, error)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- test_create_job_returns_valid_uuid: Verifies job creation returns valid UUID
- test_analyze_accepts_valid_request: Tests analyze endpoint accepts requests and returns job_id
- test_status_returns_400_for_invalid_job_id_format: Validates job ID format
- test_status_returns_404_for_nonexistent_job: Returns 404 for nonexistent valid UUID
- test_status_returns_200_for_existing_job: Returns job details for existing job
- test_health_check_returns_200: Health endpoint returns 200

All tests pass (8/8)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- cors_config.py: structured cors_origins_*, cors_configured events
- talisman_config.py: structured talisman_configured event with force_https
- subprocess_validator.py: structured path_traversal_detected, script_not_allowed, command_built events

- TestAllModulesUseStructlog: verify all modules use structlog.get_logger
- TestStructuredLogOutput: verify JSON output and key-value format
- TestEndToEndLogging: verify requests complete without logging errors
- TestNoLegacyLogging: verify no logging.getLogger() calls remain

- Add AirbaisAPIException base class with error_code and status_code
- Add ToolNotFoundError (404, TOOL_NOT_FOUND)
- Add ToolExecutionError (500, TOOL_EXECUTION_FAILED)
- Add JobNotFoundError (404, JOB_NOT_FOUND)
- Add ConfigurationError (500, CONFIG_ERROR)
- Add InvalidParameterError (400, INVALID_PARAMETER)
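
One plausible layout for this hierarchy is sketched below, using class attributes for the code and status. The class names, error codes, and status codes are taken from this commit; the constructor signature, the `message` attribute, and the base-class defaults are assumptions. Only three of the five subclasses are shown.

```python
class AirbaisAPIException(Exception):
    """Base class carrying a machine-readable code and an HTTP status."""

    error_code = "INTERNAL_ERROR"  # assumed default, not stated in the PR
    status_code = 500  # assumed default, not stated in the PR

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


class ToolNotFoundError(AirbaisAPIException):
    error_code = "TOOL_NOT_FOUND"
    status_code = 404


class JobNotFoundError(AirbaisAPIException):
    error_code = "JOB_NOT_FOUND"
    status_code = 404


class InvalidParameterError(AirbaisAPIException):
    error_code = "INVALID_PARAMETER"
    status_code = 400
```

With a shared base class, a single error handler registered for AirbaisAPIException can map any subclass to its status code without a chain of isinstance checks.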

- Import AirbaisAPIException
- Create handle_airbais_exception() with structured logging
- Update handle_validation_error() to use build_error_response()
- Update handle_bad_request() to use build_error_response()
- Update handle_not_found() to use build_error_response()
- Update handle_internal_error() to use build_error_response()
- Update handle_unexpected_exception() to use build_error_response()
- Register AirbaisAPIException handler in register_error_handlers()

…ation
- Replace bare except block with (json.JSONDecodeError, KeyError, IOError)
- Replace Exception with (yaml.YAMLError, IOError) in config loading
- Replace ValueError/Exception with custom exceptions (ToolNotFoundError, ToolExecutionError, ConfigurationError)
- Replace all datetime.now().isoformat() with utc_now_iso()
- Add exc_info=True to all logger.error calls in except blocks
- Add imports for custom exceptions and utc_now_iso utility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- test_exceptions.py: 34 tests covering all exception classes
  - Verifies error_code, status_code, message formatting
  - Tests inheritance from AirbaisAPIException
  - Validates string representation
- test_error_responses.py: 23 tests covering error response structure
  - build_error_response() includes all required fields
  - UTC-aware timestamps in RFC 3339 format
  - correlation_id fallback (never None)
  - Custom exception handler returns correct status codes
  - Integration tests for error response format
- conftest.py: pytest configuration
  - Sets TALISMAN_FORCE_HTTPS=false before imports
  - Fixes Talisman HTTPS redirect in test environment

Tests verify:
- Exception hierarchy correctness
- Error response structure consistency
- UTC timestamp format compliance
- HTTP status code mapping

…error format
- Replaced legacy error format with ToolNotFoundError exception
- Ensures consistent error response structure across all endpoints
- Bug discovered by integration tests (inconsistent error format)

Before: {'error': 'Unknown tool: X', 'available_tools': [...]}
After: Uses handle_airbais_exception for structured response

Include roadmap, requirements, state tracking, research documents, and phase plans/summaries for production readiness initiative.
- Phase 1: Interface Contracts (complete)
- Phase 2: Security Hardening (complete)
- Phase 3: Structured Logging (complete)
- Phase 4: Error Handling (complete)
- Phase 5: API Hardening (planned)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

blokpardi and others added 30 commits August 7, 2025 17:39

feat: Add evaluation framework for AI-generated content with structured prompts and rules

ab88dce

refactor: Rename "Success Notification" node to "Notify Success" and update connections accordingly

459c13f

chore: Update package versions in requirements.txt for lxml, pandas, and tiktoken

6ffc664

docs: initialize project

3c2b169

Production readiness initiative for Airbais Tools suite — security hardening, bug fixes, test coverage, and refactoring of monolithic files. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

chore: ignore planning directory

7d121ba

Keep .planning/ local-only per user preference. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

feat(01-02): add tool_type field to IntentCrawler output

2b46f55

- Add tool_type: 'intentcrawler' as first field in dashboard-data.json
- Enables explicit tool identification instead of heuristic detection
- Preserves all existing dashboard data fields

feat(01-01): create JSON Schema for dashboard-data.json

c1b609b

- Define schema with Draft 2020-12 specification
- Add tool_type enum with 6 tool identifiers
- Include timestamp, summary, metrics, data fields
- Set additionalProperties: true for tool-specific fields

feat(01-02): add tool_type field to GRASPEvaluator output

a14da40

- Add tool_type: 'graspevaluator' as first field in dashboard-data.json
- Keep existing 'tool' field for backwards compatibility
- Enables explicit tool identification matching schema enum

chore(01-01): add dashboard requirements.txt with jsonschema

70c066d

- Create requirements.txt for dashboard module
- Add jsonschema>=4.20.0 for schema validation
- Include dash, plotly, pandas dependencies

chore(01-04): create test directory structure

7363ea0

- Create tests/ directory with __init__.py
- Create tests/integration/ subdirectory with __init__.py
- Establishes package structure for integration tests

feat(02-03): update tools_config.yaml with CORS origin configuration

c64ecd1

- Replaced cors_enabled boolean with cors_origins list
- Added explicit localhost origins for development
- Added comments explaining production configuration
- No wildcard origins in default configuration

blokpardi and others added 29 commits January 23, 2026 12:27

feat(02-05): integrate Talisman with API server

725f535

- Import configure_talisman in api_server.py
- Call configure_talisman(app) after CORS configuration
- Security headers now applied to all API responses
- Remove duplicate register_error_handlers() call

chore(03-01): add structlog dependency

45a17bb

- Add structlog>=24.1.0 to automation/requirements.txt
- Foundation for structured logging infrastructure

feat(03-03): integrate request logging with API server

b411fac

- Import configure_request_logging from log_config
- Call configure_request_logging after correlation middleware
- All API requests now logged with method, path, timing
- All API responses logged with status_code, duration_ms

refactor(03-05): migrate error_handlers.py to structlog

0e9ceb2

- Replace logging.getLogger with structlog.get_logger
- Update logging calls to use structured key-value format
- Event names: validation_error, bad_request, internal_server_error, etc.

fix(03-05): fix test_request_logger.py to use caplog.records

b2f9a2b

- Pre-existing bug: tests were checking capsys.out but structlog outputs to stderr via stdlib logging integration
- Fix: use caplog.records which properly captures pytest logging
- All 13 tests now pass

feat(04-01): create UTC-aware datetime utilities

08ea4bd

- Add utc_now_iso() returning RFC 3339 formatted timestamp
- Add utc_now() returning timezone-aware datetime object
- Replaces naive datetime.now() with timezone-aware alternative
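
A minimal sketch of the two helpers, assuming they wrap the standard library directly. Whether the real utc_now_iso() emits a "+00:00" offset (as isoformat() does here) or a "Z" suffix is not stated in the PR; both are valid RFC 3339.

```python
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Return the current time as a timezone-aware datetime in UTC."""
    return datetime.now(timezone.utc)


def utc_now_iso() -> str:
    """Return the current UTC time as an RFC 3339 string with an explicit offset."""
    return utc_now().isoformat()
```

A naive datetime.now().isoformat() carries no offset at all, so a consumer cannot tell which timezone produced it; the aware variant removes that ambiguity.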

feat(04-02): add build_error_response helper

0b0ade3

- Import get_correlation_id and utc_now_iso
- Create build_error_response() with structured schema
- Include correlation_id with 'none' fallback
- Include UTC timestamp via utc_now_iso()
- Support optional error_code and details fields
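
The response builder might be structured along these lines. The field names (message, error_code, correlation_id, timestamp, details) and the 'none' fallback come from this PR; the flat dictionary layout and the explicit `correlation_id` parameter are assumptions made to keep the sketch self-contained, since the real helper reads the ID via get_correlation_id().

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_error_response(
    message: str,
    error_code: Optional[str] = None,
    details: Optional[Any] = None,
    correlation_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a JSON-serializable error body with the fields named in this PR."""
    body: dict[str, Any] = {
        "message": message,
        "correlation_id": correlation_id or "none",  # fallback, never None
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    # Optional fields are omitted entirely rather than sent as null.
    if error_code is not None:
        body["error_code"] = error_code
    if details is not None:
        body["details"] = details
    return body
```

Routing every handler through one builder is what keeps the error format consistent across endpoints.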

Major Refactor through phase 5 planning

5f1231f

chore: change file permissions for various scripts and libraries

e017cf7

blokpardi wants to merge 59 commits into master from rules-eval
