Pitwall AI is a real-time telemetry monitoring and AI-assisted anomaly detection system that automatically generates incidents for hardware anomalies. The system ingests continuous numeric telemetry from registered hardware endpoints, performs real-time analysis using both threshold-based and AI-powered detection, and creates incidents in a standardized JSON format for integration with existing Incident Management Systems.
- Telemetry_Source: A registered hardware endpoint that emits continuous numeric data
- Telemetry_Poller: Service component that fetches data from endpoints at defined intervals
- AI_Engine: Component that analyzes telemetry data for anomalies and generates explanations
- Incident_Dispatcher: Service that formats and sends incident JSON to external systems
- Dashboard: Real-time web interface displaying telemetry charts and status indicators
- Anomaly: Any detected deviation from normal operating parameters (threshold breach, spike, drift, etc.)
User Story: As a system administrator, I want to register hardware telemetry sources, so that I can monitor critical system metrics.
- WHEN a user submits source registration data, THE Telemetry_System SHALL validate all required fields (sourceName, metric, endpoint, minSafe, maxSafe, pollIntervalMs)
- WHEN valid registration data is provided, THE Telemetry_System SHALL create a new telemetry source and return a unique identifier
- WHEN invalid registration data is provided, THE Telemetry_System SHALL return descriptive validation errors
- THE Telemetry_System SHALL persist source configuration so that it survives system restarts
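
The registration criteria above could be checked with a validator along these lines. This is a minimal sketch, not the system's actual implementation: the function name `validate_registration`, the error wording, the type rules, and the `minSafe < maxSafe` rule are all assumptions; only the six field names come from the requirement.

```python
from typing import Any

# Field names taken from the requirement; everything else here is illustrative.
REQUIRED_FIELDS = ("sourceName", "metric", "endpoint", "minSafe", "maxSafe", "pollIntervalMs")


def validate_registration(data: dict[str, Any]) -> list[str]:
    """Return descriptive validation errors; an empty list means the data is valid."""
    errors = [f"{name} is required" for name in REQUIRED_FIELDS if name not in data]
    if errors:
        return errors

    for name in ("sourceName", "metric", "endpoint"):
        if not isinstance(data[name], str) or not data[name].strip():
            errors.append(f"{name} must be a non-empty string")

    for name in ("minSafe", "maxSafe"):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(data[name], bool) or not isinstance(data[name], (int, float)):
            errors.append(f"{name} must be a number")

    interval = data["pollIntervalMs"]
    if isinstance(interval, bool) or not isinstance(interval, int) or interval <= 0:
        errors.append("pollIntervalMs must be a positive integer")

    # Assumed rule: the safe range must not be empty or inverted.
    if not errors and data["minSafe"] >= data["maxSafe"]:
        errors.append("minSafe must be less than maxSafe")
    return errors
```

Returning every error at once, rather than stopping at the first, is one way to satisfy the "descriptive validation errors" criterion in a single round trip.
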
User Story: As a monitoring system, I want to continuously collect telemetry data, so that I can track hardware performance in real-time.
- WHEN a telemetry source is registered, THE Telemetry_Poller SHALL begin polling the endpoint at the specified interval
- WHEN polling an endpoint, THE Telemetry_Poller SHALL handle network failures gracefully and retry with exponential backoff
- WHEN telemetry data is received, THE Telemetry_System SHALL store it with accurate timestamps
- WHEN storing telemetry data, THE Telemetry_System SHALL emit real-time updates via WebSocket for dashboard consumption
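
The retry behaviour in the polling criteria could look roughly like the sketch below. The base delay, growth factor, cap, and retry count are assumed values, and `fetch` is a hypothetical callable standing in for the real endpoint request; the requirement itself only asks for exponential backoff.

```python
import time
from typing import Callable, Optional


def backoff_delays(base_s: float = 0.5, factor: float = 2.0,
                   max_s: float = 30.0, retries: int = 5) -> list[float]:
    """Delay before each retry: base, base*factor, ... capped at max_s."""
    return [min(base_s * factor ** attempt, max_s) for attempt in range(retries)]


def poll_with_retry(fetch: Callable[[], float],
                    sleep: Callable[[float], None] = time.sleep,
                    **backoff: float) -> Optional[float]:
    """Call fetch(); on a network-style error, wait and retry.

    Makes one initial attempt plus one attempt per backoff delay.
    Returns None if every attempt fails, so the caller can keep polling later.
    """
    delays = backoff_delays(**backoff)
    for delay in [*delays, None]:  # None marks the final attempt
        try:
            return fetch()
        except OSError:  # ConnectionError and TimeoutError are subclasses
            if delay is None:
                return None
            sleep(delay)
    return None
```

Passing `sleep` as a parameter keeps the function testable without real waiting. Production pollers often add random jitter to each delay; that is omitted here for clarity.
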
User Story: As an operator, I want to view live telemetry charts and status indicators, so that I can monitor system health at a glance.
- WHEN accessing the dashboard, THE Dashboard SHALL display live line charts for each registered telemetry source
- WHEN telemetry values are within safe ranges, THE Dashboard SHALL show green status indicators
- WHEN telemetry values approach the safe limits (minSafe or maxSafe) without breaching them, THE Dashboard SHALL show yellow status indicators
- WHEN telemetry values breach safe limits, THE Dashboard SHALL show red status indicators
- WHEN viewing a source page, THE Dashboard SHALL display historical samples, current AI explanations, and linked incidents
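
The three status colours could be derived from a reading as sketched below. The requirement does not say how close to a limit counts as "approaching", so the `warn_fraction` parameter (10% of the safe range by default) is purely an assumption, as is the function name.

```python
def status_color(value: float, min_safe: float, max_safe: float,
                 warn_fraction: float = 0.1) -> str:
    """Map a reading to 'green', 'yellow', or 'red'.

    warn_fraction is a hypothetical setting: the share of the safe range,
    measured inward from each limit, that is treated as "approaching".
    """
    if value < min_safe or value > max_safe:
        return "red"  # breach: outside the safe range
    margin = (max_safe - min_safe) * warn_fraction
    if value <= min_safe + margin or value >= max_safe - margin:
        return "yellow"  # inside the range but close to a limit
    return "green"
```

A value exactly equal to `minSafe` or `maxSafe` is reported as yellow here, which matches a breach being defined as strictly greater than `maxSafe` or strictly less than `minSafe`.
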
User Story: As a system monitor, I want automated anomaly detection with AI explanations, so that I can understand and respond to potential issues quickly.
- WHEN new telemetry data arrives, THE AI_Engine SHALL analyze it for threshold breaches (value > maxSafe or value < minSafe)
- WHEN analyzing telemetry data, THE AI_Engine SHALL detect rapid changes from the rate of change (change in value divided by elapsed time)
- WHEN monitoring telemetry streams, THE AI_Engine SHALL identify sustained abnormal conditions lasting more than 30 seconds
- WHEN detecting anomalies, THE AI_Engine SHALL generate human-readable explanations with risk assessment (low/medium/high)
- WHEN completing analysis, THE AI_Engine SHALL provide recommendations on whether to raise incidents
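
The rule-based part of the detection criteria above could be sketched as follows. The 30-second window and the breach condition come from the requirements; the `Sample` type, the `max_rate_per_s` limit, the label strings, and the order in which the checks are applied are assumptions, and the AI-generated explanation and risk assessment are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sample:
    timestamp_s: float
    value: float


def is_breach(sample: Sample, min_safe: float, max_safe: float) -> bool:
    return sample.value > max_safe or sample.value < min_safe


def rate_of_change(prev: Sample, curr: Sample) -> float:
    """Change in value divided by elapsed time, in units per second."""
    dt = curr.timestamp_s - prev.timestamp_s
    if dt <= 0:
        raise ValueError("samples must have strictly increasing timestamps")
    return (curr.value - prev.value) / dt


def sustained_breach_s(samples: list[Sample], min_safe: float, max_safe: float) -> float:
    """Length of the unbroken run of breaching samples that ends at the latest one."""
    start = None
    for sample in reversed(samples):
        if not is_breach(sample, min_safe, max_safe):
            break
        start = sample.timestamp_s
    return 0.0 if start is None else samples[-1].timestamp_s - start


def classify(samples: list[Sample], min_safe: float, max_safe: float,
             max_rate_per_s: float, sustained_s: float = 30.0) -> Optional[str]:
    """Return an anomaly label for the latest sample, or None if it looks normal."""
    if not samples:
        return None
    if len(samples) >= 2 and abs(rate_of_change(samples[-2], samples[-1])) > max_rate_per_s:
        return "spike"
    if sustained_breach_s(samples, min_safe, max_safe) > sustained_s:
        return "sustained_breach"
    if is_breach(samples[-1], min_safe, max_safe):
        return "threshold_breach"
    return None
```
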
User Story: As an incident management system, I want to receive standardized incident notifications, so that I can track and respond to hardware issues consistently.
- WHEN the AI_Engine determines an incident should be raised, THE Incident_Dispatcher SHALL format the incident using the exact required JSON schema
- WHEN formatting incidents, THE Incident_Dispatcher SHALL map anomaly severity to appropriate priority levels (minor → P2, sustained breach → P1, dangerous spike → P0)
- WHEN sending incidents, THE Incident_Dispatcher SHALL include descriptive titles, detailed descriptions with metrics, severity, status "OPEN", serviceName from source, and createdBy "pitwall-ai"
- WHEN incident creation succeeds, THE Incident_Dispatcher SHALL log the successful transmission
- WHEN incident creation fails, THE Incident_Dispatcher SHALL retry with exponential backoff and log failures
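
The severity mapping in the criteria above is small enough to express as a lookup table. In this sketch the classification keys (`"minor"`, `"sustained_breach"`, `"dangerous_spike"`) are assumed identifiers; only the three categories and their priorities come from the requirement.

```python
# Anomaly classification -> incident priority, as listed in the requirement.
SEVERITY_BY_CLASS = {
    "minor": "P2",
    "sustained_breach": "P1",
    "dangerous_spike": "P0",
}


def severity_for(anomaly_class: str) -> str:
    """Return the priority for a classification, rejecting unknown values."""
    try:
        return SEVERITY_BY_CLASS[anomaly_class]
    except KeyError:
        raise ValueError(f"unknown anomaly class: {anomaly_class!r}") from None
```

Failing loudly on an unknown classification avoids silently dispatching an incident with a default priority.
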
User Story: As a developer, I want a REST API to manage telemetry sources and retrieve data, so that I can integrate with the monitoring system programmatically.
- WHEN receiving POST /api/sources requests, THE API SHALL validate and register new telemetry sources
- WHEN receiving GET /api/sources requests, THE API SHALL return a list of all registered sources with their current status
- WHEN receiving GET /api/telemetry/{sourceId} requests, THE API SHALL return recent telemetry values with timestamps
- WHEN processing API requests, THE API SHALL return appropriate HTTP status codes and error messages for invalid requests
- THE API SHALL support CORS headers for web dashboard integration
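
The routing and status-code behaviour of the three endpoints could be outlined without any web framework, as below. This is only a sketch: `handle_request`, the in-memory `sources` and `telemetry` dictionaries, the response shapes, and the specific status codes (201, 400, 404, 405) are assumptions. CORS headers and the per-source "current status" field are left out.

```python
import uuid
from typing import Any, Optional

REQUIRED_FIELDS = ("sourceName", "metric", "endpoint", "minSafe", "maxSafe", "pollIntervalMs")


def handle_request(method: str, path: str, body: Optional[dict[str, Any]],
                   sources: dict[str, dict], telemetry: dict[str, list]) -> tuple[int, Any]:
    """Return (HTTP status code, JSON-serialisable payload) for one request."""
    if path == "/api/sources":
        if method == "POST":
            missing = [f for f in REQUIRED_FIELDS if f not in (body or {})]
            if missing:
                return 400, {"errors": [f"{f} is required" for f in missing]}
            source_id = uuid.uuid4().hex  # unique identifier for the new source
            sources[source_id] = dict(body)
            telemetry[source_id] = []
            return 201, {"id": source_id}
        if method == "GET":
            return 200, [{"id": sid, **cfg} for sid, cfg in sources.items()]
        return 405, {"error": f"{method} not allowed on {path}"}

    prefix = "/api/telemetry/"
    if method == "GET" and path.startswith(prefix):
        source_id = path[len(prefix):]
        if source_id not in sources:
            return 404, {"error": f"unknown source: {source_id}"}
        return 200, telemetry[source_id]

    return 404, {"error": "not found"}
```
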
User Story: As a system administrator, I want reliable data storage and system recovery, so that telemetry monitoring continues uninterrupted during system restarts.
- WHEN storing telemetry data, THE Telemetry_System SHALL persist it to both Redis for real-time access and a database for historical analysis
- WHEN the system restarts, THE Telemetry_System SHALL automatically resume polling all registered sources
- WHEN database connections fail, THE Telemetry_System SHALL continue operating with Redis and attempt database reconnection
- THE Telemetry_System SHALL maintain at least 24 hours of telemetry history for analysis
- WHEN storage capacity approaches limits, THE Telemetry_System SHALL implement data retention policies to prevent system failure
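
One way to read the storage criteria above is a dual write that keeps serving from the fast store while the database is unreachable. In the sketch below, `cache_write` and `db_write` are hypothetical callables standing in for the Redis and database clients, and buffering unsent samples until the database returns is an assumption that the requirements do not state. The 24-hour figure is the minimum history from the requirement.

```python
from collections import deque
from typing import Callable

RETENTION_S = 24 * 60 * 60  # minimum history required: 24 hours

Sample = tuple[float, float]  # (timestamp in seconds, value)


class TelemetryStore:
    def __init__(self, cache_write: Callable[[Sample], None],
                 db_write: Callable[[Sample], None]) -> None:
        self.cache_write = cache_write
        self.db_write = db_write
        self.pending: deque[Sample] = deque()  # samples not yet in the database

    def store(self, sample: Sample) -> bool:
        """Write to the cache, then try the database. True if the database is caught up."""
        self.cache_write(sample)
        self.pending.append(sample)
        return self.flush()

    def flush(self) -> bool:
        """Drain pending samples in order; stop at the first database failure."""
        while self.pending:
            try:
                self.db_write(self.pending[0])
            except ConnectionError:
                return False  # keep the sample queued and retry on a later call
            self.pending.popleft()
        return True


def prune(samples: list[Sample], now_s: float, retention_s: float = RETENTION_S) -> list[Sample]:
    """Drop samples older than the retention window."""
    cutoff = now_s - retention_s
    return [s for s in samples if s[0] >= cutoff]
```

In a real deployment the pending queue would itself need a size limit, since an extended database outage would otherwise move the capacity problem into memory.
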
User Story: As an external incident management system, I want to receive incidents in a specific JSON format, so that I can process them automatically without custom parsing.
- THE Incident_Dispatcher SHALL generate incident JSON with exactly these fields: title, description, severity, status, serviceName, createdBy
- WHEN creating incident titles, THE Incident_Dispatcher SHALL use a descriptive format such as "Hardware [Metric] Critical"
- WHEN creating incident descriptions, THE Incident_Dispatcher SHALL include source name, metric values, thresholds, duration, and detected trends
- THE Incident_Dispatcher SHALL set severity to one of: "P0", "P1", or "P2" based on anomaly classification
- THE Incident_Dispatcher SHALL always set status to "OPEN" and createdBy to "pitwall-ai"
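
An incident payload meeting the format criteria above could be assembled as in this sketch. The six field names, the fixed `status` and `createdBy` values, the allowed severities, and the title pattern come from the requirements; the `build_incident` signature and the exact wording of the description are assumptions.

```python
import json

INCIDENT_FIELDS = ("title", "description", "severity", "status", "serviceName", "createdBy")
SEVERITIES = ("P0", "P1", "P2")


def build_incident(source_name: str, metric: str, value: float,
                   min_safe: float, max_safe: float,
                   duration_s: float, trend: str, severity: str) -> dict[str, str]:
    """Build an incident dict containing exactly the six required fields."""
    if severity not in SEVERITIES:
        raise ValueError(f"severity must be one of {SEVERITIES}")
    return {
        "title": f"Hardware {metric.title()} Critical",
        "description": (
            f"Source {source_name}: {metric}={value} "
            f"(safe range {min_safe} to {max_safe}), "
            f"duration {duration_s:.0f}s, trend: {trend}"
        ),
        "severity": severity,
        "status": "OPEN",
        "serviceName": source_name,
        "createdBy": "pitwall-ai",
    }
```

For example, `json.dumps(build_incident("pump-1", "temperature", 96.5, 10, 80, 45, "rising", "P1"))` yields a JSON object with the title "Hardware Temperature Critical" and no fields beyond the six listed.
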