SowmithBachu/pitwallai

Requirements Document

Introduction

Pitwall AI is a real-time telemetry monitoring and AI-assisted anomaly detection system that automatically generates incidents for hardware anomalies. The system ingests continuous numeric telemetry from registered hardware endpoints, performs real-time analysis using both threshold-based and AI-powered detection, and creates incidents in a standardized JSON format for integration with existing Incident Management Systems.

Glossary

  • Telemetry_Source: A registered hardware endpoint that emits continuous numeric data
  • Telemetry_Poller: Service component that fetches data from endpoints at defined intervals
  • AI_Engine: Component that analyzes telemetry data for anomalies and generates explanations
  • Incident_Dispatcher: Service that formats and sends incident JSON to external systems
  • Dashboard: Real-time web interface displaying telemetry charts and status indicators
  • Anomaly: Any detected deviation from normal operating parameters (threshold breach, spike, drift, etc.)

Requirements

Requirement 1: Hardware Source Registration

User Story: As a system administrator, I want to register hardware telemetry sources, so that I can monitor critical system metrics.

Acceptance Criteria

  1. WHEN a user submits source registration data, THE Telemetry_System SHALL validate all required fields (sourceName, metric, endpoint, minSafe, maxSafe, pollIntervalMs)
  2. WHEN valid registration data is provided, THE Telemetry_System SHALL create a new telemetry source and return a unique identifier
  3. WHEN invalid registration data is provided, THE Telemetry_System SHALL return descriptive validation errors
  4. THE Telemetry_System SHALL persist source configuration so that it survives system restarts
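
The validation in criteria 1 to 3 could be sketched as follows. This is a minimal illustration, not the project's implementation: the field names come from criterion 1, while the specific rules (an http(s) endpoint, minSafe below maxSafe, a positive integer pollIntervalMs) and the error wording are assumptions.

```python
from urllib.parse import urlparse

# Field names taken from Requirement 1, criterion 1.
REQUIRED_FIELDS = ("sourceName", "metric", "endpoint", "minSafe", "maxSafe", "pollIntervalMs")


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_registration(data: dict) -> list[str]:
    """Return descriptive validation errors; an empty list means the data is valid."""
    errors = [f"{name} is required" for name in REQUIRED_FIELDS
              if data.get(name) in (None, "")]
    if errors:
        return errors

    for name in ("sourceName", "metric"):
        if not isinstance(data[name], str):
            errors.append(f"{name} must be a string")

    # Assumption: endpoints are polled over HTTP or HTTPS.
    endpoint = data["endpoint"]
    parsed = urlparse(endpoint) if isinstance(endpoint, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        errors.append("endpoint must be an http(s) URL")

    bounds_ok = True
    for name in ("minSafe", "maxSafe"):
        if not _is_number(data[name]):
            errors.append(f"{name} must be a number")
            bounds_ok = False
    # Assumption: the safe range must be non-empty.
    if bounds_ok and data["minSafe"] >= data["maxSafe"]:
        errors.append("minSafe must be less than maxSafe")

    interval = data["pollIntervalMs"]
    if not isinstance(interval, int) or isinstance(interval, bool) or interval <= 0:
        errors.append("pollIntervalMs must be a positive integer")
    return errors
```

Returning every error at once, rather than stopping at the first, lets a client correct a rejected registration in a single round trip.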

Requirement 2: Continuous Telemetry Ingestion

User Story: As a monitoring system, I want to continuously collect telemetry data, so that I can track hardware performance in real-time.

Acceptance Criteria

  1. WHEN a telemetry source is registered, THE Telemetry_Poller SHALL begin polling the endpoint at the specified interval
  2. WHEN polling an endpoint, THE Telemetry_Poller SHALL handle network failures gracefully and retry with exponential backoff
  3. WHEN telemetry data is received, THE Telemetry_System SHALL store it with accurate timestamps
  4. WHEN storing telemetry data, THE Telemetry_System SHALL emit real-time updates via WebSocket for dashboard consumption
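
Criterion 2 asks for retries with exponential backoff but leaves the schedule open. One possible shape is sketched below; the base delay, growth factor, cap, attempt count, and both function names are illustrative assumptions, not values from this document.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def backoff_delays(base_s: float = 0.5, factor: float = 2.0,
                   max_s: float = 30.0, attempts: int = 5) -> list[float]:
    """Delays that grow geometrically and are capped at max_s (all values assumed)."""
    return [min(base_s * factor ** i, max_s) for i in range(attempts)]


def call_with_backoff(fetch: Callable[[], T], delays: Sequence[float],
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on a network-level error, wait for the next delay and retry.

    OSError covers ConnectionError, TimeoutError and urllib's URLError.
    After the delays are exhausted, one final attempt is made and any
    error from it propagates to the caller.
    """
    for delay in delays:
        try:
            return fetch()
        except OSError:
            sleep(delay)
    return fetch()
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting. A production poller would probably also add random jitter so that many sources do not retry in lockstep.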

Requirement 3: Real-Time Dashboard Display

User Story: As an operator, I want to view live telemetry charts and status indicators, so that I can monitor system health at a glance.

Acceptance Criteria

  1. WHEN accessing the dashboard, THE Dashboard SHALL display live line charts for each registered telemetry source
  2. WHEN telemetry values are within safe ranges, THE Dashboard SHALL show green status indicators
  3. WHEN telemetry values approach the safe limits (minSafe or maxSafe), THE Dashboard SHALL show yellow status indicators
  4. WHEN telemetry values breach safe limits, THE Dashboard SHALL show red status indicators
  5. WHEN viewing a source page, THE Dashboard SHALL display historical samples, current AI explanations, and linked incidents
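
The three status colours in criteria 2 to 4 can be derived from a source's safe range. The requirements do not say how close to a limit counts as "approaching", so the sketch below assumes a warning band equal to 10% of the safe range on each side; `status_color` and `warn_fraction` are hypothetical names.

```python
def status_color(value: float, min_safe: float, max_safe: float,
                 warn_fraction: float = 0.1) -> str:
    """Map a telemetry value to "green", "yellow" or "red".

    warn_fraction is an assumed tuning knob: the fraction of the safe
    range, measured inward from each limit, that counts as approaching it.
    """
    if value < min_safe or value > max_safe:
        return "red"  # safe limit breached
    margin = (max_safe - min_safe) * warn_fraction
    if value < min_safe + margin or value > max_safe - margin:
        return "yellow"  # inside the safe range but close to a limit
    return "green"
```

For a source with minSafe 0 and maxSafe 100, a reading of 50 maps to green, 95 to yellow, and 101 to red.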

Requirement 4: AI-Assisted Anomaly Detection

User Story: As a system monitor, I want automated anomaly detection with AI explanations, so that I can understand and respond to potential issues quickly.

Acceptance Criteria

  1. WHEN new telemetry data arrives, THE AI_Engine SHALL analyze it for threshold breaches (value > maxSafe or value < minSafe)
  2. WHEN analyzing telemetry data, THE AI_Engine SHALL detect rapid changes based on the rate of change (change in value divided by elapsed time)
  3. WHEN monitoring telemetry streams, THE AI_Engine SHALL identify sustained abnormal conditions lasting more than 30 seconds
  4. WHEN detecting anomalies, THE AI_Engine SHALL generate human-readable explanations with risk assessment (low/medium/high)
  5. WHEN completing analysis, THE AI_Engine SHALL provide recommendations on whether to raise incidents
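
The rule-based checks in criteria 1 to 3 can be sketched as below. The breach test and the 30-second window come from the criteria; the `Sample` type, the anomaly labels, and the `max_rate_per_s` threshold are assumptions made for illustration, and the AI-generated explanation and risk assessment are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp_s: float  # seconds since an arbitrary epoch
    value: float


def is_breach(value: float, min_safe: float, max_safe: float) -> bool:
    return value > max_safe or value < min_safe


def rate_of_change(prev: Sample, curr: Sample) -> float:
    """Change in value divided by elapsed time; 0.0 if no time has passed."""
    dt = curr.timestamp_s - prev.timestamp_s
    return (curr.value - prev.value) / dt if dt > 0 else 0.0


def sustained_breach_seconds(samples: list[Sample], min_safe: float, max_safe: float) -> float:
    """Duration covered by the trailing run of out-of-range samples."""
    start = None
    for sample in reversed(samples):
        if not is_breach(sample.value, min_safe, max_safe):
            break
        start = sample.timestamp_s
    return 0.0 if start is None else samples[-1].timestamp_s - start


def detect(samples: list[Sample], min_safe: float, max_safe: float,
           max_rate_per_s: float, sustained_after_s: float = 30.0) -> list[str]:
    """Return labels for the anomalies visible at the newest sample."""
    if not samples:
        return []
    found = []
    latest = samples[-1]
    if is_breach(latest.value, min_safe, max_safe):
        found.append("threshold_breach")
    if len(samples) >= 2 and abs(rate_of_change(samples[-2], latest)) > max_rate_per_s:
        found.append("rapid_change")
    # "more than 30 seconds" is read as strictly greater than.
    if sustained_breach_seconds(samples, min_safe, max_safe) > sustained_after_s:
        found.append("sustained_abnormal")
    return found
```

Samples are assumed to arrive in chronological order. The sustained check measures the span between the first and last out-of-range samples in the current run, so its resolution is bounded by the poll interval.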

Requirement 5: Automated Incident Creation

User Story: As an incident management system, I want to receive standardized incident notifications, so that I can track and respond to hardware issues consistently.

Acceptance Criteria

  1. WHEN the AI_Engine determines an incident should be raised, THE Incident_Dispatcher SHALL format the incident using the exact required JSON schema
  2. WHEN formatting incidents, THE Incident_Dispatcher SHALL map anomaly severity to appropriate priority levels (minor → P2, sustained breach → P1, dangerous spike → P0)
  3. WHEN sending incidents, THE Incident_Dispatcher SHALL include a descriptive title, a detailed description with metrics, severity, status "OPEN", serviceName taken from the source, and createdBy "pitwall-ai"
  4. WHEN incident creation succeeds, THE Incident_Dispatcher SHALL log the successful transmission
  5. WHEN incident creation fails, THE Incident_Dispatcher SHALL retry with exponential backoff and log failures
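
The severity mapping in criterion 2 amounts to a small lookup table. In the sketch below the three anomaly classes and their P0/P1/P2 targets are taken from the criterion; the enum, the function name, and the rule for combining several simultaneous classes (most urgent wins) are assumptions.

```python
from enum import Enum
from typing import Iterable


class AnomalyClass(Enum):
    MINOR = "minor"
    SUSTAINED_BREACH = "sustained_breach"
    DANGEROUS_SPIKE = "dangerous_spike"


# Mapping stated in Requirement 5, criterion 2.
SEVERITY_BY_CLASS = {
    AnomalyClass.MINOR: "P2",
    AnomalyClass.SUSTAINED_BREACH: "P1",
    AnomalyClass.DANGEROUS_SPIKE: "P0",
}


def severity_for(classes: Iterable[AnomalyClass]) -> str:
    """Return the most urgent severity among the detected classes.

    Assumption: when several anomaly classes fire at once, the incident
    takes the highest priority. "P0" < "P1" < "P2" as strings, so min()
    selects the most urgent one.
    """
    severities = [SEVERITY_BY_CLASS[c] for c in classes]
    if not severities:
        raise ValueError("at least one anomaly class is required")
    return min(severities)
```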

Requirement 6: REST API Interface

User Story: As a developer, I want a REST API to manage telemetry sources and retrieve data, so that I can integrate with the monitoring system programmatically.

Acceptance Criteria

  1. WHEN receiving POST /api/sources requests, THE API SHALL validate and register new telemetry sources
  2. WHEN receiving GET /api/sources requests, THE API SHALL return a list of all registered sources with their current status
  3. WHEN receiving GET /api/telemetry/{sourceId} requests, THE API SHALL return recent telemetry values with timestamps
  4. WHEN processing API requests, THE API SHALL return appropriate HTTP status codes and error messages for invalid requests
  5. THE API SHALL support CORS headers for web dashboard integration
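
From a client's point of view, the endpoints in criteria 1 and 3 could be called as sketched here. The paths come from the criteria; the base URL and the helper names are hypothetical, and the response shapes are not specified in this document, so the sketch only builds the requests.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:3000"  # hypothetical; the document names no host or port


def register_source_request(source: dict, base_url: str = BASE_URL) -> Request:
    """Build a POST /api/sources request carrying the registration data as JSON."""
    body = json.dumps(source).encode("utf-8")
    return Request(f"{base_url}/api/sources", data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def telemetry_request(source_id: str, base_url: str = BASE_URL) -> Request:
    """Build a GET /api/telemetry/{sourceId} request, URL-encoding the identifier."""
    return Request(f"{base_url}/api/telemetry/{quote(source_id, safe='')}", method="GET")
```

Either request can be sent with `urllib.request.urlopen(request)`, which raises `urllib.error.HTTPError` for 4xx and 5xx responses. That is where the status codes and error messages required by criterion 4 would surface.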

Requirement 7: Data Persistence and Reliability

User Story: As a system administrator, I want reliable data storage and system recovery, so that telemetry monitoring continues uninterrupted during system restarts.

Acceptance Criteria

  1. WHEN storing telemetry data, THE Telemetry_System SHALL persist it to both Redis for real-time access and a database for historical analysis
  2. WHEN the system restarts, THE Telemetry_System SHALL automatically resume polling all registered sources
  3. WHEN database connections fail, THE Telemetry_System SHALL continue operating with Redis and attempt database reconnection
  4. THE Telemetry_System SHALL maintain at least 24 hours of telemetry history for analysis
  5. WHEN storage capacity approaches limits, THE Telemetry_System SHALL implement data retention policies to prevent system failure
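
Criteria 4 and 5 together imply a retention policy that never discards samples younger than 24 hours. A minimal in-memory sketch follows; the `(timestamp_s, value)` tuple layout and the `prune` helper are assumptions, and a real deployment would more likely rely on Redis or database expiry features.

```python
from collections import deque

RETENTION_S = 24 * 60 * 60  # 24 hours, the minimum history from criterion 4


def prune(samples: deque, now_s: float, retention_s: float = RETENTION_S) -> int:
    """Drop samples older than the retention window; return how many were removed.

    samples holds (timestamp_s, value) tuples in chronological order, so
    expired entries are always at the left end of the deque.
    """
    cutoff = now_s - retention_s
    removed = 0
    while samples and samples[0][0] < cutoff:
        samples.popleft()
        removed += 1
    return removed
```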

Requirement 8: Incident JSON Schema Compliance

User Story: As an external incident management system, I want to receive incidents in a specific JSON format, so that I can process them automatically without custom parsing.

Acceptance Criteria

  1. THE Incident_Dispatcher SHALL generate incident JSON with exactly these fields: title, description, severity, status, serviceName, createdBy
  2. WHEN creating incident titles, THE Incident_Dispatcher SHALL use a descriptive format such as "Hardware [Metric] Critical"
  3. WHEN creating incident descriptions, THE Incident_Dispatcher SHALL include source name, metric values, thresholds, duration, and detected trends
  4. THE Incident_Dispatcher SHALL set severity to one of: "P0", "P1", or "P2" based on anomaly classification
  5. THE Incident_Dispatcher SHALL always set status to "OPEN" and createdBy to "pitwall-ai"
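
An incident payload that satisfies these criteria could be assembled as in the sketch below. The six field names, the title pattern, and the fixed status and createdBy values come from the criteria above; the function signature and the exact wording of the description are illustrative assumptions.

```python
import json

# Exactly the fields listed in Requirement 8, criterion 1.
INCIDENT_FIELDS = ("title", "description", "severity", "status", "serviceName", "createdBy")
ALLOWED_SEVERITIES = ("P0", "P1", "P2")


def build_incident(source_name: str, metric: str, value: float,
                   min_safe: float, max_safe: float,
                   duration_s: float, trend: str, severity: str) -> dict:
    """Build the incident dict; the description wording is an assumption."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"severity must be one of {ALLOWED_SEVERITIES}")
    return {
        "title": f"Hardware {metric} Critical",
        "description": (
            f"Source {source_name}: {metric} = {value} "
            f"(safe range {min_safe} to {max_safe}), "
            f"abnormal for {duration_s:.0f} s, trend: {trend}"
        ),
        "severity": severity,
        "status": "OPEN",
        "serviceName": source_name,
        "createdBy": "pitwall-ai",
    }


incident = build_incident("rack-7-psu", "Temperature", 92.4, 10, 85, 45, "rising", "P1")
payload = json.dumps(incident)  # JSON text ready to send to the external system
```

Building the dict from a fixed literal, rather than copying fields from the anomaly record, makes it hard to leak extra keys into a payload that must contain exactly six.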
