Production Readiness: Phases 1-4 (Interface, Security, Logging, Error Handling) #1

Open
blokpardi wants to merge 59 commits into master from rules-eval

@blokpardi
Contributor

Summary

This PR implements the first four phases of the Airbais Tools Production Readiness initiative, transforming the codebase from operational to production-ready.

Phase 1: Interface Contracts ✅

  • JSON schema for dashboard-data.json with required tool_type field
  • Schema validation on dashboard data load
  • Replaced heuristic tool detection with explicit tool_type lookup

Phase 2: Security Hardening ✅

  • Pydantic validation for all API request parameters
  • Sanitized error responses (no stack traces leaked)
  • CORS with explicit origin whitelist
  • Flask-Talisman security headers (HSTS, CSP, X-Frame-Options)
  • Subprocess command whitelist with path validation

Phase 3: Structured Logging ✅

  • structlog for JSON logging with timestamp, level, context
  • Correlation IDs (X-Correlation-ID) tracking requests across operations
  • Request/response logging for API endpoints
  • Audit logging for tool executions with job_id tracking

Phase 4: Error Handling ✅

  • Custom exception hierarchy (ToolNotFoundError, ToolExecutionError, JobNotFoundError, etc.)
  • Structured error responses with error_code, message, correlation_id, timestamp
  • UTC-aware datetime serialization (RFC 3339)
  • No bare except blocks remain
  • 57 tests covering error handling infrastructure

Test plan

  • Run automation test suite: cd automation && pytest tests/ -v
  • Verify API starts: python automation/api_server.py
  • Test health endpoint: curl http://localhost:8888/health
  • Verify structured logging output (JSON format in production)
  • Test error response format includes correlation_id and timestamp

Stats

  • 57 commits
  • 18 plans executed across 4 phases
  • 100+ tests added
  • 0 bare except blocks remaining

🤖 Generated with Claude Code

blokpardi and others added 30 commits August 7, 2025 17:39
- Added RulesValidator class to validate the structure and content of JSON rules files.
- Implemented methods for file validation, structure checks, and normalization of rule types.
- Added logging for validation errors and warnings.

feat: Create Website Crawler for content ingestion

- Developed WebsiteCrawler class to crawl websites and ingest content.
- Implemented configuration validation, crawling logic, and content extraction.
- Integrated robots.txt handling to respect crawling rules.

docs: Add sample content for testing

- Created markdown and text files containing sample company policies and features for testing purposes.

test: Add unit tests for content ingestion

- Implemented tests for ContentProcessor and LocalFileSource classes.
- Added tests for various content formats including markdown, HTML, JSON, and CSV.

test: Add integration tests for complete evaluation workflow

- Developed integration tests to validate the complete workflow of the Rules Evaluator.
- Included tests for critical rule failures and overall evaluation results.

test: Add tests for RAG database functionality

- Implemented tests for RAGDatabase class covering initialization, content addition, querying, and statistics.

test: Add tests for RulesValidator

- Created unit tests for RulesValidator class to validate rules files.
- Included tests for valid and invalid rules scenarios, case normalization, and error handling.
- Add .planning/intel/ with file index, conventions, and summary
- Add .planning/codebase/ with 7 architecture documents:
  - STACK.md: Languages, frameworks, dependencies
  - INTEGRATIONS.md: External APIs (OpenAI, Anthropic, ChromaDB)
  - ARCHITECTURE.md: Modular plugin architecture pattern
  - STRUCTURE.md: Directory layout and naming conventions
  - CONVENTIONS.md: Code style and error handling patterns
  - TESTING.md: pytest framework and test organization
  - CONCERNS.md: Tech debt, security, and performance issues

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Production readiness initiative for Airbais Tools suite — security hardening, bug fixes, test coverage, and refactoring of monolithic files.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Keep .planning/ local-only per user preference.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add tool_type: 'intentcrawler' as first field in dashboard-data.json
- Enables explicit tool identification instead of heuristic detection
- Preserves all existing dashboard data fields
- Define schema with Draft 2020-12 specification
- Add tool_type enum with 6 tool identifiers
- Include timestamp, summary, metrics, data fields
- Set additionalProperties: true for tool-specific fields
- Add tool_type: 'graspevaluator' as first field in dashboard-data.json
- Keep existing 'tool' field for backwards compatibility
- Enables explicit tool identification matching schema enum
- LLMEvaluator: Add tool_type to both dashboard generation methods
- GEOEvaluator: Add tool_type as first field in dashboard data
- LLMSTxtGenerator: Add tool_type to dashboard data structure
- RulesEvaluator: Add tool_type alongside existing tool field

All tools now explicitly identify their type for dashboard validation
- Implement DashboardDataValidator class using jsonschema
- Provide clear error messages with json_path for invalid fields
- Add validate() and validate_file() methods
- Export convenience function validate_dashboard_data()
- Use iter_errors() to collect all validation errors
- Create requirements.txt for dashboard module
- Add jsonschema>=4.20.0 for schema validation
- Include dash, plotly, pandas dependencies
- Import DashboardDataValidator with graceful fallback if unavailable
- Initialize validator in __init__ (cached for performance)
- Validate data after JSON load in load_tool_data method
- Log validation errors as warnings (backwards-compatible)
- Never fail on validation errors - graceful degradation

Task 1/3 complete
- Refactor _detect_tool_type to check tool_type field first
- Check legacy 'tool' field as second option
- Extract heuristic logic to _detect_tool_type_heuristic method
- Preserve ALL existing heuristic detection logic intact
- Add debug logging when using explicit vs heuristic detection
- Maintain full backwards compatibility with legacy files

Task 2/3 complete
- Wrap validation calls in try/except to never prevent data loading
- Add debug logging when validation passes
- Log warning when validator not available
- Validate tool_type field type and log warning for invalid types
- Add informative debug logging for tool type detection method used
- Ensure graceful degradation in all error cases

Task 3/3 complete
- Create tests/ directory with __init__.py
- Create tests/integration/ subdirectory with __init__.py
- Establishes package structure for integration tests
- TestSchemaValidator class with comprehensive schema validation tests
- Tests for valid minimal data passing validation
- Tests for missing required fields (tool_type, timestamp)
- Tests for invalid tool_type values
- Parametrized tests for all 6 valid tool types
- Tests for additional properties being allowed
- Tests for timestamp format validation including timezone offsets
- TestDataLoader class verifies explicit tool_type detection
- Tests for legacy 'tool' field fallback
- Heuristic detection tests for all 6 tool types
- Test that explicit tool_type is preferred over heuristics
- TestExistingDataFiles class for integration tests with real data
- Tests skip gracefully if data files don't exist
- Schema validation integration test verifies validator runs on real files

Complete test suite: 23 test methods across 3 test classes
- Add handle_validation_error for Pydantic validation errors (400)
- Add handle_bad_request for generic 400 errors
- Add handle_not_found for 404 errors (no logging needed)
- Add handle_internal_error for 500 errors (logs with exc_info=True)
- Add handle_unexpected_exception catch-all (logs at CRITICAL level)
- Add register_error_handlers function for Flask integration
- All error responses hide stack traces, file paths, and internal details
- Full debugging info preserved in logs via exc_info=True
- Add validation module structure (automation/validation/)
- Create AnalyzeRequest model for POST body validation
- Create JobIdPath model for UUID path parameter validation
- URL format validation requires http/https prefix
- Numeric constraints (max_pages: 1-1000, crawl_depth: 1-10, etc.)
- Log level pattern validation (DEBUG|INFO|WARNING|ERROR)
- Reject unknown fields (extra='forbid')
- Created security module structure in automation/security/
- Implemented get_cors_origins() with priority: env var > config > defaults
- Implemented configure_cors() with wildcard protection
- Default origins: localhost only (development safe)
- Environment override: CORS_ALLOWED_ORIGINS comma-separated list
- Security: Detects and removes wildcard (*) origins
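
The origin priority in the commit above (env var > config > defaults, with wildcards removed) could look roughly like the sketch below. The parameters, the default port numbers, and the top-level position of cors_origins in the config are assumptions; only the CORS_ALLOWED_ORIGINS variable and the cors_origins key come from this PR.

```python
import os
from typing import Any, Mapping, Optional

# Assumed development defaults; the PR only says "localhost only".
DEFAULT_ORIGINS = ["http://localhost:3000", "http://localhost:8050"]


def get_cors_origins(
    config: Optional[Mapping[str, Any]] = None,
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Pick origins by priority: env var, then config, then defaults."""
    env = os.environ if env is None else env
    raw = env.get("CORS_ALLOWED_ORIGINS", "")
    if raw.strip():
        # Comma-separated list; surrounding whitespace is ignored.
        origins = [item.strip() for item in raw.split(",") if item.strip()]
    elif config and config.get("cors_origins"):
        origins = list(config["cors_origins"])
    else:
        origins = list(DEFAULT_ORIGINS)
    # Wildcard protection: "*" is dropped, never passed through.
    return [origin for origin in origins if origin != "*"]
```

In this sketch an env var containing only "*" yields an empty list, so no cross-origin requests would be allowed; whether the real code falls back to defaults in that case is not stated.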
- Replaced cors_enabled boolean with cors_origins list
- Added explicit localhost origins for development
- Added comments explaining production configuration
- No wildcard origins in default configuration
- Import and register error handlers on Flask app initialization
- Update run_tool_async to return generic error message instead of exception details
- Update analyze endpoint to log with exc_info=True and return generic message
- Error responses now hide internal implementation details from clients
- Full exception details preserved in server logs for debugging
- Removed direct flask_cors import
- Added security.cors_config import
- Replaced CORS(app) with configure_cors(app, config)
- Passes module-level config dict to configure_cors()
- CORS now uses explicit whitelist from tools_config.yaml
- Add flask-pydantic==0.14.0 and pydantic>=2.0.0 to requirements.txt
- Add @validate() decorator to /<tool_name>/analyze endpoint
- Replace request.get_json() with validated AnalyzeRequest body parameter
- Add JobIdPath validation to /status/<job_id> endpoint
- Add JobIdPath validation to /results/<job_id> endpoint
- Return 400 with structured error details on validation failure
- Remove manual job_id cleanup (regex-based) in favor of Pydantic validation
- TestErrorResponses class verifies error sanitization
- test_404_does_not_expose_paths ensures paths not echoed
- test_invalid_json_returns_400 verifies no tracebacks in response
- test_missing_required_params_returns_400 checks structured errors
- test_unknown_tool_returns_404 verifies safe error messages
- test_error_response_format_consistent validates JSON structure
- TestJobErrorSanitization class tests job error handling
- TestHealthEndpoint validates health check endpoint
- Add pytest>=7.0.0 to automation requirements.txt
- Tests verify no file paths, tracebacks, or internals exposed
- Add subprocess_validator.py with ALLOWED_SCRIPTS whitelist
- Implement validate_script() to reject non-whitelisted scripts
- Implement validate_path_within_project() with Path.resolve() for canonicalization
- Implement validate_tool_directory() to ensure dirs within project root
- Implement build_safe_command() returning list for shell=False execution
- Export validation functions from security/__init__.py
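
A rough sketch of the whitelist and path checks listed above, using only the standard library. The function names follow the commit message, but the signatures, the script name in ALLOWED_SCRIPTS, and the use of ValueError are assumptions made for illustration.

```python
import sys
from pathlib import Path

# Hypothetical entry; the real whitelist contents are not shown in this PR.
ALLOWED_SCRIPTS = frozenset({"run_analysis.py"})


def validate_script(script_name: str) -> str:
    """Reject any script that is not explicitly whitelisted."""
    if script_name not in ALLOWED_SCRIPTS:
        raise ValueError(f"Script not allowed: {script_name!r}")
    return script_name


def validate_path_within_project(path, project_root) -> Path:
    """Canonicalize the path and confirm it stays under the project root."""
    root = Path(project_root).resolve()
    candidate = (root / path).resolve()  # collapses '..' and follows symlinks
    try:
        candidate.relative_to(root)  # raises ValueError when outside root
    except ValueError:
        raise ValueError("Path escapes project root") from None
    return candidate


def build_safe_command(script_name: str, tool_dir, project_root) -> list[str]:
    """Return an argument list suitable for subprocess.run(..., shell=False)."""
    validate_script(script_name)
    directory = validate_path_within_project(tool_dir, project_root)
    return [sys.executable, str(directory / script_name)]
```

Returning a list instead of a string means the command never passes through a shell, so shell metacharacters in arguments are not interpreted.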
- Import build_safe_command from subprocess_validator
- Replace manual command building with validated build_safe_command()
- Simplified run_tool_async by removing 60+ lines of parameter logic
- Ensure all subprocess calls use shell=False (list arguments)
- Validation failure raises clear error message
- Test script whitelist (allowed/rejected/empty)
- Test path traversal prevention with Path.resolve()
- Test absolute paths outside project blocked
- Test tool directory validation
- Test command building with valid/invalid inputs
- Test command is list (not string) for shell=False
- All 12 tests pass
blokpardi and others added 29 commits January 23, 2026 12:27
- Add flask-talisman==1.1.0 to requirements.txt
- Create security/talisman_config.py with configure_talisman()
- Configure HSTS, CSP, X-Frame-Options, X-Content-Type-Options
- Support TALISMAN_FORCE_HTTPS env var for proxy setups
- Export configure_talisman and get_api_csp from security module
- Import configure_talisman in api_server.py
- Call configure_talisman(app) after CORS configuration
- Security headers now applied to all API responses
- Remove duplicate register_error_handlers() call
- Test X-Frame-Options: DENY header present
- Test X-Content-Type-Options: nosniff header present
- Test Content-Security-Policy restrictive directives
- Test Referrer-Policy header present
- Test headers on error responses (with redirect support)
- Test headers on POST endpoints
- Test Talisman configuration (CSP, debug mode, env override)
- All 10 tests pass
- Add structlog>=24.1.0 to automation/requirements.txt
- Foundation for structured logging infrastructure
- Create automation/log_config/ module (avoiding stdlib logging name conflict)
- Implement configure_structlog() with environment-aware rendering
- JSON output for production (no tty), console output for debug/dev
- Add get_logger() factory function
- Include processors for timestamps, log levels, context vars, exceptions
- Test console renderer in debug mode
- Test JSON renderer in production mode (no tty)
- Verify logger factory returns bound logger
- Test context variables propagation (correlation_id)
- Test exception formatting as structured data
- All 6 tests passing
- Add correlation.py with Flask middleware for request tracking
- Generate UUID for requests without X-Correlation-ID header
- Preserve provided correlation IDs from request headers
- Bind correlation_id to structlog context for all logs
- Include http_method, http_path, remote_addr in log context
- Return X-Correlation-ID header in response
- Export configure_correlation and get_correlation_id from log_config
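
The generate-or-preserve behaviour above can be sketched without Flask by using a context variable. assign_correlation_id is a hypothetical helper, and the plain-mapping lookup is a simplification; the PR implements this as Flask middleware bound to the structlog context.

```python
import uuid
from contextvars import ContextVar
from typing import Mapping

HEADER = "X-Correlation-ID"

# 'none' mirrors the fallback mentioned later for error responses.
_correlation_id: ContextVar[str] = ContextVar("correlation_id", default="none")


def assign_correlation_id(headers: Mapping[str, str]) -> str:
    """Reuse the caller's ID when present, otherwise generate a UUID."""
    # Note: a plain dict lookup is case-sensitive; a real Flask headers
    # object matches header names case-insensitively.
    incoming = headers.get(HEADER, "").strip()
    correlation_id = incoming or str(uuid.uuid4())
    _correlation_id.set(correlation_id)
    return correlation_id


def get_correlation_id() -> str:
    """Return the ID bound to the current context, or 'none'."""
    return _correlation_id.get()
```

Provided IDs are kept verbatim, including non-UUID formats, which is consistent with the tests listed further down in this PR.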
- Created request_logger.py with structlog-based logging
- Log requests with method, path, endpoint, content_type
- Log responses with status_code and duration_ms
- Redact sensitive fields (password, token, api_key, etc.)
- Export configure_request_logging and redact_sensitive from log_config
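
One possible shape for the redaction helper, shown as a sketch. The key list beyond password, token, and api_key, and the "[REDACTED]" placeholder, are assumptions; the PR only says sensitive fields are redacted, case-insensitively and in nested structures.

```python
from typing import Any

# Only the first three names appear in the PR; the rest are assumed.
SENSITIVE_KEYS = {"password", "token", "api_key", "secret", "authorization"}
REDACTED = "[REDACTED]"  # assumed placeholder text


def redact_sensitive(value: Any) -> Any:
    """Return a copy with sensitive values masked, recursing into containers."""
    if isinstance(value, dict):
        return {
            key: REDACTED
            if isinstance(key, str) and key.lower() in SENSITIVE_KEYS
            else redact_sensitive(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact_sensitive(item) for item in value]
    return value  # scalars pass through unchanged
```

Building a new structure instead of mutating the input keeps the original request data intact for the handler that still needs it.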
- Import configure_correlation from log_config module
- Call configure_correlation(app) before other middleware
- Position as first middleware to ensure correlation ID available for all requests
- Correlation ID now binds to structlog context on every request
- All subsequent logs include correlation_id automatically
- Import configure_request_logging from log_config
- Call configure_request_logging after correlation middleware
- All API requests now logged with method, path, timing
- All API responses logged with status_code, duration_ms
- Test UUID generation when correlation ID not provided
- Test preservation of provided correlation IDs
- Test uniqueness across different requests
- Test correlation ID on error responses (500)
- Test correlation ID on 404 responses
- Test custom (non-UUID) correlation ID formats preserved
- All 6 tests pass
- Add start_time tracking and job context binding with structlog
- Log tool_execution_started with redacted parameters
- Log tool_subprocess_starting with command info
- Pass CORRELATION_ID to subprocess environment
- Log tool_execution_completed with results_dir and duration
- Log tool_execution_failed with error and exc_info on failure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Test redact_sensitive() with various sensitive field names
- Test case-insensitive redaction
- Test nested field redaction
- Test integration with Flask (GET/POST, status codes, duration)
- Test sensitive query parameter redaction
- All 13 tests pass
- create_job: logger.info('job_created', job_id, tool_name)
- get_status: logger.debug('job_status_requested', job_id)
- get_status (not found): logger.info('job_not_found', requested_job_id, total_jobs)
- analyze exception: logger.error('analyze_start_failed', error, exc_info=True)
- main: logger.info('api_server_starting', host, port, debug, available_tools)
- load_config: logger.error('config_load_failed', config_path, error)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- test_create_job_returns_valid_uuid: Verifies job creation returns valid UUID
- test_analyze_accepts_valid_request: Tests analyze endpoint accepts requests and returns job_id
- test_status_returns_400_for_invalid_job_id_format: Validates job ID format
- test_status_returns_404_for_nonexistent_job: Returns 404 for nonexistent valid UUID
- test_status_returns_200_for_existing_job: Returns job details for existing job
- test_health_check_returns_200: Health endpoint returns 200

All tests pass (8/8)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace logging.getLogger with structlog.get_logger
- Update logging calls to use structured key-value format
- Event names: validation_error, bad_request, internal_server_error, etc.
- cors_config.py: structured cors_origins_*, cors_configured events
- talisman_config.py: structured talisman_configured event with force_https
- subprocess_validator.py: structured path_traversal_detected, script_not_allowed, command_built events
- TestAllModulesUseStructlog: verify all modules use structlog.get_logger
- TestStructuredLogOutput: verify JSON output and key-value format
- TestEndToEndLogging: verify requests complete without logging errors
- TestNoLegacyLogging: verify no logging.getLogger() calls remain
- Pre-existing bug: tests were checking capsys.out but structlog
  outputs to stderr via stdlib logging integration
- Fix: use caplog.records, which captures log records emitted through stdlib logging
- All 13 tests now pass
- Add AirbaisAPIException base class with error_code and status_code
- Add ToolNotFoundError (404, TOOL_NOT_FOUND)
- Add ToolExecutionError (500, TOOL_EXECUTION_FAILED)
- Add JobNotFoundError (404, JOB_NOT_FOUND)
- Add ConfigurationError (500, CONFIG_ERROR)
- Add InvalidParameterError (400, INVALID_PARAMETER)
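
The hierarchy above maps naturally onto class attributes. The sketch below uses the class names, status codes, and error codes from the commit message; the constructor signature, the details attribute, and the base-class default error code are assumptions.

```python
from typing import Any, Optional


class AirbaisAPIException(Exception):
    """Base class: each subclass pins an HTTP status and an error code."""

    status_code = 500
    error_code = "INTERNAL_ERROR"  # assumed default; not stated in the PR

    def __init__(self, message: str, details: Optional[dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.message = message
        self.details = details or {}


class ToolNotFoundError(AirbaisAPIException):
    status_code = 404
    error_code = "TOOL_NOT_FOUND"


class ToolExecutionError(AirbaisAPIException):
    status_code = 500
    error_code = "TOOL_EXECUTION_FAILED"


class JobNotFoundError(AirbaisAPIException):
    status_code = 404
    error_code = "JOB_NOT_FOUND"


class ConfigurationError(AirbaisAPIException):
    status_code = 500
    error_code = "CONFIG_ERROR"


class InvalidParameterError(AirbaisAPIException):
    status_code = 400
    error_code = "INVALID_PARAMETER"
```

With this layout a single handler registered for the base class can serve every subclass by reading status_code and error_code from the raised instance.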
- Add utc_now_iso() returning RFC 3339 formatted timestamp
- Add utc_now() returning timezone-aware datetime object
- Replaces naive datetime.now() with timezone-aware alternative
- Import get_correlation_id and utc_now_iso
- Create build_error_response() with structured schema
- Include correlation_id with 'none' fallback
- Include UTC timestamp via utc_now_iso()
- Support optional error_code and details fields
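
A sketch of the timestamp helpers and response builder described in the two commits above. The field names follow the PR summary (error_code, message, correlation_id, timestamp); the parameter list and the (body, status) return shape are assumptions, and the correlation ID is passed in here rather than read from get_correlation_id() to keep the example self-contained.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def utc_now() -> datetime:
    """Timezone-aware replacement for naive datetime.now()."""
    return datetime.now(timezone.utc)


def utc_now_iso() -> str:
    """RFC 3339 timestamp with an explicit +00:00 offset."""
    return utc_now().isoformat()


def build_error_response(
    message: str,
    status_code: int,
    error_code: Optional[str] = None,
    details: Optional[dict[str, Any]] = None,
    correlation_id: Optional[str] = None,
) -> tuple[dict[str, Any], int]:
    """Build a structured error body; optional fields appear only when set."""
    body: dict[str, Any] = {
        "message": message,
        "correlation_id": correlation_id or "none",  # never None
        "timestamp": utc_now_iso(),
    }
    if error_code is not None:
        body["error_code"] = error_code
    if details is not None:
        body["details"] = details
    return body, status_code
```

Keeping every handler on one builder is what makes the response format consistent across 400, 404, and 500 paths.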
- Import AirbaisAPIException
- Create handle_airbais_exception() with structured logging
- Update handle_validation_error() to use build_error_response()
- Update handle_bad_request() to use build_error_response()
- Update handle_not_found() to use build_error_response()
- Update handle_internal_error() to use build_error_response()
- Update handle_unexpected_exception() to use build_error_response()
- Register AirbaisAPIException handler in register_error_handlers()
…ation

- Replace bare except block with (json.JSONDecodeError, KeyError, IOError)
- Replace Exception with (yaml.YAMLError, IOError) in config loading
- Replace ValueError/Exception with custom exceptions (ToolNotFoundError, ToolExecutionError, ConfigurationError)
- Replace all datetime.now().isoformat() with utc_now_iso()
- Add exc_info=True to all logger.error calls in except blocks
- Add imports for custom exceptions and utc_now_iso utility

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- test_exceptions.py: 34 tests covering all exception classes
  - Verifies error_code, status_code, message formatting
  - Tests inheritance from AirbaisAPIException
  - Validates string representation

- test_error_responses.py: 23 tests covering error response structure
  - build_error_response() includes all required fields
  - UTC-aware timestamps in RFC 3339 format
  - correlation_id fallback (never None)
  - Custom exception handler returns correct status codes
  - Integration tests for error response format

- conftest.py: pytest configuration
  - Sets TALISMAN_FORCE_HTTPS=false before imports
  - Fixes Talisman HTTPS redirect in test environment

Tests verify:
- Exception hierarchy correctness
- Error response structure consistency
- UTC timestamp format compliance
- HTTP status code mapping
…error format

- Replaced legacy error format with ToolNotFoundError exception
- Ensures consistent error response structure across all endpoints
- Bug discovered by integration tests (inconsistent error format)

Before: {'error': 'Unknown tool: X', 'available_tools': [...]}
After: Uses handle_airbais_exception for structured response
Include roadmap, requirements, state tracking, research documents,
and phase plans/summaries for production readiness initiative.

- Phase 1: Interface Contracts (complete)
- Phase 2: Security Hardening (complete)
- Phase 3: Structured Logging (complete)
- Phase 4: Error Handling (complete)
- Phase 5: API Hardening (planned)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>