AI-Powered Cloud Infrastructure Anomaly Detection & Automated Remediation
A full-stack, cloud-native platform that uses a Graph Neural Network (GNN) to detect anomalies across distributed AWS infrastructure in real time, explain why an anomaly occurred using XAI, and trigger automated remediation.
- Project Overview
- System Architecture
- Tech Stack
- ML/DL Models
- Cloud Services Used
- Microservices Breakdown
- System Workflow (End-to-End)
- Frontend Pages & Components
- API Endpoints
- Infrastructure as Code
- Directory Structure
- Getting Started
CloudAutomationGNN monitors live AWS cloud resources (EC2, Lambda, RDS, ELB, S3) by continuously ingesting CloudWatch metrics. These metrics are modelled as a graph (each cloud resource is a node, interconnected by service dependencies), and a trained GraphSAGE GNN classifies every node as normal or anomalous.
When an anomaly is detected:
- The system identifies why (SHAP + GNNExplainer XAI).
- It publishes a real-time alert to the frontend via WebSocket.
- Automated remediation actions are triggered and logged.
- A natural-language explanation is generated by Google Gemini (via LangChain RAG).

    ┌─────────────────────────────────────────────────────────────────────────┐
    │                          USER BROWSER (React)                           │
    │       Dashboard │ Alerts │ XAI Panel │ Automation Logs │ AI Chat        │
    └───────────────────────┬─────────────────────────────────────────────────┘
                            │  REST API + WebSocket (WSS)
                            ▼
    ┌─────────────────────────────────────────────────────────────────────────┐
    │               Node.js Backend (AWS Lambda via Serverless)               │
    │                                                                         │
    │  ┌─────────────┐  ┌──────────────────┐  ┌──────────────────────────┐    │
    │  │  REST API   │  │ Event Processor  │  │    WebSocket Handlers    │    │
    │  │  (Express)  │  │ (EventBridge/SQS)│  │   (connect/disconnect/   │    │
    │  │             │  │                  │  │    SNS→WS subscriber)    │    │
    │  └──────┬──────┘  └────────┬─────────┘  └──────────────────────────┘    │
    └─────────┼──────────────────┼────────────────────────────────────────────┘
              │                  │
              │ HTTP             │ HTTP
              ▼                  ▼
    ┌─────────────────────────────────────────────────────────────────────────┐
    │                  Python FastAPI Backend (Docker / ECS)                  │
    │                                                                         │
    │  ┌────────────────┐  ┌──────────────┐  ┌──────────────────────────┐     │
    │  │ GNN Inference  │  │ XAI Service  │  │    LLM / RAG Service     │     │
    │  │  (GraphSAGE)   │  │   (SHAP +    │  │  (Gemini + LangChain +   │     │
    │  │                │  │ GNNExplainer)│  │  Pinecone + HuggingFace) │     │
    │  └────────────────┘  └──────────────┘  └──────────────────────────┘     │
    └─────────────────────────────────────────────────────────────────────────┘
              │                  │
              ▼                  ▼
    ┌─────────────────────────────────────────────────────────────────────────┐
    │                           AWS Cloud Services                            │
    │                                                                         │
    │  CloudWatch → EventBridge → SQS → Lambda (eventProcessor)               │
    │  DynamoDB (anomalies + WS connections)                                  │
    │  SNS → Lambda (snsToWebSocket) → API Gateway WebSocket → Browser        │
    │  S3 (GNN model artifacts + frontend hosting)                            │
    │  MongoDB Atlas (users, events, anomalies, conversations)                │
    └─────────────────────────────────────────────────────────────────────────┘

| Layer | Technology |
|---|---|
| Frontend | React (Vite), Recharts, D3/Force Graph, CSS, Socket.IO |
| Node Backend | Node.js, Express, Serverless Framework, AWS Lambda |
| Python Backend | FastAPI, PyTorch Geometric, SHAP, LangChain, Docker |
| ML Model | GraphSAGE (PyTorch Geometric) |
| XAI | GNNExplainer, SHAP KernelExplainer |
| LLM / RAG | Google Gemini 2.5, LangChain (LCEL), HuggingFace Embeddings |
| Vector DB | Pinecone |
| App Database | MongoDB Atlas (Mongoose ODM) |
| AWS Services | Lambda, API Gateway, EventBridge, SQS, SNS, DynamoDB, CloudWatch, S3 |
| IaC | Terraform (AWS provider ~5.70), Serverless Framework v3+ |
| Auth | JWT (Access + Refresh tokens), HTTP-only cookies |
File: Python-Backend/app/services/gnn_model.py | Trained: ML/train_gnn.py
GraphSAGE (Graph Sample and Aggregate) is a graph neural network that learns node representations by aggregating features from a node's local neighbourhood. It is ideal for this use case because cloud resources are inherently relational: an EC2 instance's anomalous behaviour is often caused by or propagated from upstream Lambda functions or RDS databases.
Architecture:

    Input (5 node features)
      → SAGEConv Layer 1 (64 units) → ReLU → Dropout(0.3)
      → SAGEConv Layer 2 (32 units) → ReLU → Dropout(0.3)
      → SAGEConv Layer 3 (1 unit)   → Sigmoid
    Output: per-node anomaly probability ∈ [0, 1]

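To make the neighbourhood aggregation concrete, here is a minimal pure-Python sketch of a single mean-aggregator GraphSAGE step. It is not the project's `gnn_model.py`, which uses PyTorch Geometric's `SAGEConv`; the function name `sage_layer`, the toy weights, and the three-node graph are illustrative assumptions.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]  # rows = output units, columns = input features


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a feature vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sage_layer(
    features: Dict[str, Vector],
    neighbours: Dict[str, List[str]],
    w_self: Matrix,
    w_neigh: Matrix,
) -> Dict[str, Vector]:
    """One mean-aggregation step followed by ReLU.

    h_v = ReLU(W_self @ x_v + W_neigh @ mean(x_u for u in N(v)))
    """
    out: Dict[str, Vector] = {}
    for node, x in features.items():
        nbrs = neighbours.get(node, [])
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(len(x))]
        else:
            mean = [0.0] * len(x)  # isolated node: no neighbour signal
        own = matvec(w_self, x)
        agg = matvec(w_neigh, mean)
        out[node] = [max(0.0, a + b) for a, b in zip(own, agg)]
    return out


# Toy graph: a Lambda function and an RDS instance both feed an EC2 node.
features = {"ec2": [0.9, 0.2], "lambda": [0.1, 0.1], "rds": [0.3, 0.5]}
neighbours = {"ec2": ["lambda", "rds"], "lambda": [], "rds": []}
w_self = [[1.0, 0.0]]   # hypothetical 1x2 weights, not trained values
w_neigh = [[0.0, 1.0]]

hidden = sage_layer(features, neighbours, w_self, w_neigh)
print(hidden)
```

Stacking three such steps, with a sigmoid in place of the final ReLU, gives the shape of the architecture listed above; the real layers also carry bias terms and learned weights.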
Node features (5 per node):
| Feature | Description |
|---|---|
| `cpu` | CPU utilization (%) |
| `memory` | Memory utilization (%) |
| `latency` | Request latency (ms) |
| `error_rate` | Fraction of failed requests |
| `request_count` | Requests per minute |
Training details:
- Dataset: Synthetic cloud infrastructure graphs (`ML/data_generator.py`)
- Loss: Binary Cross-Entropy (`BCELoss`) with positive class weighting for imbalanced anomaly labels
- Optimizer: Adam (lr=0.01, weight_decay=5e-4)
- Epochs: 200
- Threshold: 0.5 → ANOMALY
- Metrics: Accuracy + F1 Score
- Saved model: `ML/models/gnn_model.pt` (uploaded to S3 for inference)
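The effect of positive class weighting on the loss can be sketched in plain Python. This is not the code in `ML/train_gnn.py`; the names `weighted_bce` and `pos_weight_from_labels` are hypothetical, and setting the weight to negatives divided by positives is one common heuristic assumed here for illustration.

```python
import math
from typing import Sequence


def pos_weight_from_labels(labels: Sequence[int]) -> float:
    """Assumed heuristic: weight = number of negatives / number of positives."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives if positives else 1.0


def weighted_bce(probs: Sequence[float], labels: Sequence[int], pos_weight: float) -> float:
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    eps = 1e-7  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


labels = [0, 0, 0, 0, 1]            # one anomaly among five nodes
probs = [0.1, 0.2, 0.1, 0.3, 0.4]   # model outputs after the sigmoid
w = pos_weight_from_labels(labels)
loss_plain = weighted_bce(probs, labels, pos_weight=1.0)
loss_weighted = weighted_bce(probs, labels, pos_weight=w)
print(w, loss_plain, loss_weighted)
```

With the weight applied, the single missed anomaly contributes more to the loss than the four normal nodes combined, which pushes the optimiser away from predicting "normal" everywhere.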
Severity thresholds (score → label):
| Score Range | Severity |
|---|---|
| ≥ 0.85 | CRITICAL |
| ≥ 0.65 | HIGH |
| ≥ 0.35 | MEDIUM |
| < 0.35 | LOW |
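The table above translates directly into a small lookup. The sketch below is illustrative only; `severity_for` and `SEVERITY_BANDS` are hypothetical names rather than identifiers from the codebase.

```python
from typing import List, Tuple

# Lower bounds from the severity table, checked from highest to lowest.
SEVERITY_BANDS: List[Tuple[float, str]] = [
    (0.85, "CRITICAL"),
    (0.65, "HIGH"),
    (0.35, "MEDIUM"),
]


def severity_for(score: float) -> str:
    """Map an anomaly probability in [0, 1] to a severity label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    for lower_bound, label in SEVERITY_BANDS:
        if score >= lower_bound:
            return label
    return "LOW"


for s in (0.91, 0.70, 0.50, 0.10):
    print(s, severity_for(s))
```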
File: Python-Backend/app/services/xai_service.py
GNNExplainer (from torch_geometric.explain) runs 200 optimisation epochs to identify which features and edges (neighbouring nodes) most contributed to the GNN's anomaly prediction for a specific node.
- Outputs a feature mask `[N, F]`: importance weight per feature per node
- Outputs an edge mask `[E]`: importance weight per edge
- Edges with mask value ≥ 0.5 form the important subgraph shown in the cascade path
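As a rough illustration of the last step, the following sketch filters an edge list by its mask values and collects the nodes that remain. The function name `important_subgraph` and the sample edges are assumptions; the real service works on PyTorch Geometric tensors rather than Python lists.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]  # (source node id, target node id)


def important_subgraph(
    edges: Sequence[Edge],
    edge_mask: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[List[Edge], List[str]]:
    """Keep edges whose mask value is at or above the threshold."""
    if len(edges) != len(edge_mask):
        raise ValueError("edges and edge_mask must have the same length")
    kept = [edge for edge, weight in zip(edges, edge_mask) if weight >= threshold]
    nodes = sorted({node for edge in kept for node in edge})
    return kept, nodes


edges = [("lambda-a", "ec2-1"), ("rds-1", "ec2-1"), ("s3-1", "lambda-a")]
edge_mask = [0.92, 0.61, 0.12]  # made-up GNNExplainer output

kept_edges, kept_nodes = important_subgraph(edges, edge_mask)
print(kept_edges, kept_nodes)
```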
File: Python-Backend/app/services/xai_service.py
SHAP (SHapley Additive exPlanations) KernelExplainer treats the GNN as a black-box function and estimates each feature's marginal contribution to the anomaly score.
- Background baseline: mean feature vector of all graph nodes (representing a healthy node)
- Samples: 64 perturbations per explanation
- Fallback: If model outputs are degenerate (near-zero variance), attribution reverts to the `|z-score|` deviation from the training baseline, which still yields distinct per-feature values
- Final importance: `sqrt(SHAP × GNNExplainer mask)`, the geometric mean of the two scores
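The blend and its fallback can be sketched as follows. This is an interpretation of the description above, not the code in `xai_service.py`: the function name, its argument names, and the use of the absolute SHAP value (SHAP values can be negative, and a square root needs a non-negative input) are assumptions.

```python
import math
from statistics import pvariance
from typing import List, Sequence


def blended_importance(
    shap_values: Sequence[float],
    gnn_mask: Sequence[float],
    model_outputs: Sequence[float],
    features: Sequence[float],
    baseline_mean: Sequence[float],
    baseline_std: Sequence[float],
    var_floor: float = 1e-6,
) -> List[float]:
    """Return one importance score per feature.

    If the model's outputs over the perturbed samples barely vary, SHAP has
    nothing to attribute, so fall back to |z-score| against the baseline.
    """
    if pvariance(model_outputs) < var_floor:
        return [
            abs(x - m) / s if s > 0 else 0.0
            for x, m, s in zip(features, baseline_mean, baseline_std)
        ]
    # Geometric mean of |SHAP| and the GNNExplainer feature mask.
    return [math.sqrt(abs(s) * m) for s, m in zip(shap_values, gnn_mask)]


normal = blended_importance(
    shap_values=[0.4, -0.1], gnn_mask=[0.9, 0.4],
    model_outputs=[0.1, 0.9, 0.5],
    features=[95.0, 40.0], baseline_mean=[50.0, 40.0], baseline_std=[15.0, 10.0],
)
fallback = blended_importance(
    shap_values=[0.0, 0.0], gnn_mask=[0.9, 0.4],
    model_outputs=[0.5, 0.5, 0.5],
    features=[95.0, 40.0], baseline_mean=[50.0, 40.0], baseline_std=[15.0, 10.0],
)
print(normal, fallback)
```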
File: Python-Backend/app/services/llm_service.py
When a user requests an explanation of an anomaly, the SHAP scores and GNN output are fed into a LangChain RAG pipeline:
- Retrieval: HuggingFace `sentence-transformers/all-MiniLM-L6-v2` embeds the query, and the most relevant cloud runbook documents are retrieved from the Pinecone vector store
- Generation: Google Gemini 2.5 generates a natural-language explanation using the retrieved context + anomaly details
- Pipeline routing: LCEL (LangChain Expression Language) routes queries to `Smart_Chat`, `Document_Analysis`, `Analytical_Insights`, or `General_Conversation` pipelines based on `feature_mode`
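The routing idea can be shown without LangChain as a plain dictionary dispatch on `feature_mode`. The handler functions below are placeholders, and falling back to `General_Conversation` for an unknown mode is an assumption; the actual service builds LCEL chains instead.

```python
from typing import Callable, Dict


def _stub(name: str) -> Callable[[str], str]:
    """Build a placeholder pipeline that just tags the question."""
    def run(question: str) -> str:
        return f"[{name}] {question}"
    return run


# Pipeline names come from the README; the handlers are stand-ins.
PIPELINES: Dict[str, Callable[[str], str]] = {
    name: _stub(name)
    for name in (
        "Smart_Chat",
        "Document_Analysis",
        "Analytical_Insights",
        "General_Conversation",
    )
}


def route(feature_mode: str, question: str) -> str:
    """Pick a pipeline by feature_mode (assumed default: General_Conversation)."""
    handler = PIPELINES.get(feature_mode, PIPELINES["General_Conversation"])
    return handler(question)


print(route("Document_Analysis", "Summarise the RDS failover runbook"))
print(route("unknown_mode", "hello"))
```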
| Service | Purpose |
|---|---|
| AWS Lambda | Serverless compute for REST API, Event Processor, and WebSocket handlers |
| API Gateway (HTTP) | Exposes all REST endpoints (/anomalies, /events, /graph, etc.) |
| API Gateway (WebSocket) | Real-time bidirectional communication channel to browser clients |
| EventBridge | Rule-based event bus; triggers on CloudWatch Alarm State Change and on a periodic metric-poll schedule |
| SQS | Decoupled event queue between EventBridge and the eventProcessor Lambda (batch window: 30s, batch size: 10) |
| SQS Dead Letter Queue | Captures failed events (max 3 receive attempts, retained 14 days) |
| SNS | Publishes anomaly alerts; the snsToWebSocket Lambda subscribes and fans out to all WebSocket connections |
| DynamoDB | Stores: (1) GNN anomaly results keyed by partitionKey/sortKey, (2) WebSocket connectionId records with TTL |
| CloudWatch | Source of alarm state-change events; DescribeAlarms API polled directly for live alerts |
| S3 | (1) Model artifact bucket (versioned, AES-256 encrypted, private) that hosts gnn_model.pt; (2) Frontend static hosting bucket (React SPA) |
| IAM | Fine-grained Lambda execution roles for SNS publish, DynamoDB CRUD, EventBridge emit, SQS consume, CloudWatch read, API Gateway connections |
| Service | Purpose |
|---|---|
| MongoDB Atlas | Primary application database β stores Users, Events, Anomalies, Conversations, Documents |
| Pinecone | Vector database for RAG document retrieval (index: cksfinbot, dynamic namespace per uploaded document) |
| Google AI (Gemini) | gemini-2.5-flash-lite LLM for natural language anomaly explanations |
| HuggingFace | all-MiniLM-L6-v2 embedding model (runs locally in Python container) |
The system is composed of 5 Lambda functions and 1 Python FastAPI service:
Handler: lambda.js → src/app.js
Trigger: API Gateway HTTP (ANY *)
Memory: 512 MB | Timeout: 29s
Express.js application exposed as a Lambda function via serverless-http. Handles all client-facing REST operations.
Routes & Controllers:
| Route | Controller | Purpose |
|---|---|---|
| `POST /users/register` | `user.controller` | Create user account |
| `POST /users/login` | `user.controller` | Authenticate, issue JWT cookies |
| `GET /anomalies` | `anomaly.controller` | Paginated anomaly list (DB + live CloudWatch merge) |
| `GET /anomalies/:id/explain` | `anomaly.controller` | GNN + SHAP explanation for a specific anomaly |
| `PATCH /anomalies/:id/resolve` | `anomaly.controller` | Mark anomaly as resolved |
| `GET /anomalies/stats` | `anomaly.controller` | Severity distribution stats |
| `GET /events` | `events.controller` | Paginated cloud metric events |
| `GET /graph` | `graph.controller` | Resource graph topology for D3 visualisation |
| `POST /automation/trigger` | `automation.controller` | Trigger GNN-recommended remediation |
| `GET /automation/logs` | `automation.controller` | Automation action history |
| `POST /conversation` | `conversation.controller` | Initiate RAG chat with LLM |
| `POST /document/upload` | `document.controller` | Upload runbook to S3 + index in Pinecone |
| `GET /s3/signed-url` | `s3.controller` | Generate pre-signed S3 URL for file access |
Handler: src/eventProcessor.js
Triggers: EventBridge (cloud-metrics-rule) + SQS batch
Memory: 256 MB | Timeout: 60s
The core real-time data pipeline. Each time an EventBridge event or SQS message arrives:
- Parse the raw event (CloudWatch Alarm or custom `CloudMetricEvent`)
- Persist the metric event to MongoDB (hydrates the frontend graph)
- Forward to Python GNN service (`POST /predict/single`)
- If anomaly detected → persist to DynamoDB + MongoDB
- For `critical`/`high` → publish to SNS
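The branching in the last two steps can be summarised in a few lines. The real handler is Node.js (`src/eventProcessor.js`); the Python sketch below only illustrates the decision logic, and the `severity` field on the prediction as well as the action names are assumptions.

```python
from typing import Any, Dict, List

ALERT_SEVERITIES = {"critical", "high"}  # severities that are pushed to SNS


def plan_actions(prediction: Dict[str, Any]) -> List[str]:
    """List the follow-up writes for one GNN prediction result."""
    actions: List[str] = []
    if prediction.get("is_anomaly"):
        actions += ["save_anomaly_dynamodb", "save_anomaly_mongodb"]
        if str(prediction.get("severity", "")).lower() in ALERT_SEVERITIES:
            actions.append("publish_sns")
    return actions


print(plan_actions({"is_anomaly": True, "anomaly_score": 0.91, "severity": "critical"}))
print(plan_actions({"is_anomaly": True, "anomaly_score": 0.55, "severity": "medium"}))
print(plan_actions({"is_anomaly": False, "anomaly_score": 0.08}))
```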
Handler: src/Handlers/websocketHandlers.connect
Trigger: WebSocket $connect route
Stores the new connectionId in the cloud-automation-ws-connections-dev DynamoDB table with a TTL so stale connections are automatically cleaned up.
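DynamoDB's TTL feature expects a Number attribute holding an expiry time in Unix epoch seconds, so the stored record might look like the sketch below. The attribute names (`connectedAt`, `ttl`) and the two-hour lifetime are assumptions for illustration; the actual handler is written in Node.js.

```python
import time
from typing import Dict, Optional, Union


def connection_item(
    connection_id: str,
    ttl_seconds: int = 2 * 60 * 60,
    now: Optional[float] = None,
) -> Dict[str, Union[str, int]]:
    """Build the record for a new WebSocket connection.

    The expiry is stored as whole epoch seconds, the format DynamoDB TTL
    uses when deciding which items to delete.
    """
    issued = int(now if now is not None else time.time())
    return {
        "connectionId": connection_id,
        "connectedAt": issued,
        "ttl": issued + ttl_seconds,
    }


item = connection_item("abc123", now=1_700_000_000)
print(item)
```

Expired items are removed by DynamoDB in the background rather than at the exact expiry second, so readers of the table should still tolerate stale connection IDs.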
Handler: src/Handlers/websocketHandlers.disconnect
Trigger: WebSocket $disconnect route
Removes the connectionId record from DynamoDB when a browser disconnects.
Handler: src/Handlers/snsSubscriber.handler
Trigger: SNS topic subscription
When the SNS topic receives an anomaly alert (published for critical and high severities):
- Scans DynamoDB for all active WebSocket connection IDs
- Uses `@aws-sdk/client-apigatewaymanagementapi` to POST the anomaly payload to every connected browser simultaneously
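A fan-out loop of this kind usually also prunes connections that have gone away. The sketch below shows that pattern in Python with injected `post` and `delete` callables and in-memory fakes, so it runs without AWS. `GoneError` is a hypothetical stand-in for the "connection no longer exists" (HTTP 410) error, and the loop is sequential for clarity, whereas the real Node.js handler may send concurrently.

```python
from typing import Callable, Iterable, List, Tuple


class GoneError(Exception):
    """Hypothetical stand-in for the 'connection is gone' (HTTP 410) error."""


def fan_out(
    connection_ids: Iterable[str],
    payload: dict,
    post: Callable[[str, dict], None],
    delete: Callable[[str], None],
) -> Tuple[int, List[str]]:
    """Send payload to every connection; drop the ones that are gone."""
    delivered = 0
    stale: List[str] = []
    for connection_id in connection_ids:
        try:
            post(connection_id, payload)
            delivered += 1
        except GoneError:
            delete(connection_id)  # remove the dead record from the table
            stale.append(connection_id)
    return delivered, stale


# In-memory fakes standing in for API Gateway and the connections table.
live = {"c1", "c3"}
table = {"c1", "c2", "c3"}
sent: List[str] = []


def fake_post(connection_id: str, payload: dict) -> None:
    if connection_id not in live:
        raise GoneError(connection_id)
    sent.append(connection_id)


delivered, stale = fan_out(sorted(table), {"type": "anomaly:new"}, fake_post, table.discard)
print(delivered, stale, sorted(table))
```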
Entry: Python-Backend/app/main.py
Deployment: Docker container (Dockerfile included)
Base URL: configured via PYTHON_SERVICE_URL env var
| Endpoint | Description |
|---|---|
| `POST /predict/single` | Run GNN inference on one cloud node |
| `POST /predict/graph` | Run GNN inference on a full resource graph |
| `POST /explain` | Generate SHAP + GNNExplainer XAI explanation |
| `POST /chat` | Invoke LangChain RAG with Gemini LLM |
| `POST /document/process` | Chunk and embed uploaded documents into Pinecone |
Internal services:
| File | Responsibility |
|---|---|
| `gnn_model.py` | GraphSAGE model definition + load_model() helper |
| `graph_builder.py` | Convert raw node/edge lists → PyTorch Geometric Data object |
| `gnn_inference.py` | Load model from S3/disk + run forward pass |
| `xai_service.py` | GNNExplainer + SHAP KernelExplainer orchestration |
| `explanation_builder.py` | Format XAI outputs into frontend-ready JSON |
| `llm_service.py` | LangChain LCEL pipelines (Smart Chat, Document Analysis, etc.) |
| `document_processor.py` | PDF/text chunking and embedding for Pinecone indexing |
| `s3_service.py` | Upload/download model artifacts from S3 |
| `pinecone_service.py` | Pinecone client initialisation |

Workflow 1 (real-time anomaly detection):

1. AWS CloudWatch detects a metric threshold breach (e.g., EC2 CPU > 90%)
   - Emits a CloudWatch Alarm State Change event
2. EventBridge rule (`cloud-metrics-rule`) matches the event
   - Routes to the SQS event queue (or directly triggers Lambda 2)
3. Lambda 2 (eventProcessor) processes the SQS batch
   - Parses and normalises the `CloudMetricEvent` payload
   - Saves the Event record to MongoDB (graph hydration)
   - Calls Python FastAPI: `POST /predict/single`
     - `graph_builder.py` builds a PyG Data object (normalised features + self-loop edges)
     - `GraphSAGE.forward()` → anomaly_score ∈ [0, 1]
     - Returns `{ is_anomaly: true, anomaly_score: 0.91, ... }`
4. If `is_anomaly == true`:
   - Persist anomaly to DynamoDB (7-day TTL)
   - Persist anomaly to MongoDB (UI stats)
   - If severity ∈ {critical, high}:
     - `SNS.publish(anomaly alert)`
5. SNS triggers Lambda 5 (snsToWebSocket)
   - Scans DynamoDB for all active WS connectionIds
   - POSTs the anomaly payload to each connected browser via the APIGW Execution API
6. Browser receives the WebSocket message → React updates the Dashboard in real time
   - ResourceGraph re-renders with the anomalous node highlighted (red)
   - AlertCard appears in the Alerts page
   - MetricsChart spikes become visible

Workflow 2 (XAI explanation request):

1. User clicks "Explain" on an alert card in the browser
2. Browser: `GET /anomalies/:id/explain` → Node.js Lambda 1
3. Node.js Lambda:
   - If the alarm ID starts with "aws-alarm-":
     - Derive attack-specific metrics from the alarm name pattern
     - Compute local metric-deviation SHAP (baseline comparison)
   - `POST /explain` to Python FastAPI with node metrics + edges
4. Python FastAPI `/explain`:
   - `graph_builder.py` → builds PyG Data from the request payload
   - `gnn_model.load_model()` → loads GraphSAGE from disk
   - Step 1: GNNExplainer (200 epochs of mask optimisation)
     - Outputs feature mask `[N, F]` + edge mask `[E]`
     - Identifies which neighbouring nodes matter (important subgraph)
   - Step 2: SHAP KernelExplainer (64 perturbation samples)
     - `model_predict()` isolates the target node feature, runs the full graph GNN
     - Detects degenerate output (variance < 1e-6) → falls back to `|z-score|`
     - Blends SHAP × GNNExplainer mask: `importance = sqrt(shap * gnn_mask)`
   - Returns `{ feature_importance, important_nodes, important_edges }`
5. Node.js Lambda:
   - Checks if Python SHAP is degenerate (all equal values) → uses local SHAP
   - Formats response: `{ shapValues, cascadePath, nlExplanation, actionStatus }`
   - Caches the explanation to the MongoDB anomaly record
6. Browser XAI Panel displays:
   - SHAP bar chart (feature importances)
   - Cascade path (important subgraph nodes)
   - Natural language explanation

Workflow 3 (RAG chat):

1. User types a question in the AI Chat panel
2. Browser: `POST /conversation` → Node.js → Python FastAPI `POST /chat`
3. Python FastAPI:
   - HuggingFace `all-MiniLM-L6-v2` embeds the question
   - Pinecone retrieves the top-k relevant runbook chunks (dynamic k = 15% of namespace size)
   - LangChain LCEL chain: context → PromptTemplate → Gemini 2.5 → StrOutputParser
   - Returns a natural language answer
4. Browser: streams the response into the chat UI
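The "dynamic k" rule in step 3 can be sketched as a small helper. Only the 15% figure comes from the text above; rounding up and clamping between `min_k` and `max_k` are assumptions added so the example behaves sensibly for very small and very large namespaces.

```python
def dynamic_top_k(
    namespace_size: int,
    percent: int = 15,
    min_k: int = 1,
    max_k: int = 50,
) -> int:
    """Return how many chunks to retrieve for a namespace of the given size."""
    if namespace_size <= 0:
        return 0  # nothing indexed, nothing to retrieve
    # Integer ceiling of namespace_size * percent / 100 (avoids float rounding).
    k = (namespace_size * percent + 99) // 100
    return max(min_k, min(k, max_k, namespace_size))


for size in (3, 40, 100, 1000):
    print(size, dynamic_top_k(size))
```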

Workflow 4 (WebSocket connection lifecycle):

1. Browser connects to `wss://` (API Gateway WebSocket URL)
   - Lambda 3 (wsConnect): `DynamoDB.PutItem(connectionId, TTL)`
2. Browser stays connected and receives real-time anomaly pushes (Workflow 1, step 5)
3. Browser disconnects (page close / network loss)
   - Lambda 4 (wsDisconnect): `DynamoDB.DeleteItem(connectionId)`

Workflow 5 (model training and deployment):

1. Generate synthetic cloud graph data
   - `python ML/data_generator.py` → `ML/data/graph_data.pt`
2. Train GraphSAGE
   - `python ML/train_gnn.py`
   - 200 epochs, Adam, BCELoss with class weighting
   - Saves best checkpoint: `ML/models/gnn_model.pt`
   - Saves loss curve: `ML/reports/training_loss.png`
3. Upload model artifact to S3
   - `python ML/upload_model.py` → `s3://cloud-automation-gnn-model-dev/gnn_model.pt`
4. Python API loads the model at startup from S3 (or local disk for dev)

| Page | Route | Description |
|---|---|---|
| `LandingPage` | `/` | Marketing landing page with animated background |
| `LoginPage` | `/login` | JWT auth login form |
| `SignupPage` | `/signup` | User registration with email verification |
| `DashboardPage` | `/dashboard` | Main monitoring hub: graph + metrics |
| `AlertsPage` | `/alerts` | Paginated anomaly alert list with severity tabs |
| Component | Description |
|---|---|
| `ResourceGraph` | D3 force-directed graph of cloud nodes with live anomaly highlighting |
| `MetricsChart` | Recharts time-series chart for CPU/Memory/Latency/Error Rate |
| `XAIPanel` | SHAP bar chart + cascade path + NL explanation panel |
| `AlertCard` | Individual anomaly card with severity badge + Explain button |
| `AutomationLog` | Real-time log of automated remediation actions |
| `WelcomeScreen` | AI-powered welcome after login |
| `NodeDetailModal` | Click-through modal with per-node live metrics |
| `ChatInput` / `Message` | AI chat interface with Markdown rendering |
| `EnhancedFileUpload` | Drag-and-drop document upload for RAG indexing |
- Primary: WebSocket connection to API Gateway WSS endpoint (`src/services/socket.js`)
- Fallback: HTTP polling every 30s if WebSocket is unavailable
- Events multiplexed: `anomaly:new`, `graph:update`, `automation:log`, `metrics:update`
Auth
POST /users/register Register new user
POST /users/login Login (returns JWT cookies)
POST /users/logout Clear auth cookies
GET /users/me Get current user profile
Anomalies
GET /anomalies List anomalies (paginated, filtered)
GET /anomalies/stats Severity distribution stats
GET /anomalies/:id Single anomaly detail
GET /anomalies/:id/explain GNN + SHAP + NL explanation
PATCH /anomalies/:id/resolve Mark anomaly as resolved
Events (Cloud Metric Events)
GET /events Paginated cloud metric event log
Graph
GET /graph Resource graph topology (nodes + edges)
Automation
POST /automation/trigger Trigger automated remediation
GET /automation/logs Remediation action log
Conversations (RAG Chat)
POST /conversation Start / continue AI chat session
GET /conversation/:id Retrieve chat history
Documents (RAG Indexing)
POST /document/upload Upload + embed document in Pinecone
GET /s3/signed-url Generate S3 pre-signed download URL
POST /predict/single GNN inference (1 node)
POST /predict/graph GNN inference (full graph)
POST /explain XAI: SHAP + GNNExplainer
POST /chat LangChain RAG + Gemini
POST /document/process Chunk + embed document → Pinecone
Manages core, long-lived AWS resources:
| Resource | Description |
|---|---|
| `aws_s3_bucket.model_bucket` | GNN model artifacts (versioned + AES-256 encrypted) |
| `aws_s3_bucket.frontend_bucket` | React SPA static hosting |
| `aws_sqs_queue.event_queue` | Main event queue (long-polling, 1-day retention) |
| `aws_sqs_queue.event_dlq` | Dead-letter queue (14-day retention, max 3 retries) |
| `aws_cloudwatch_event_rule.cloudwatch_alarm_rule` | EventBridge rule: CloudWatch ALARM state changes |
| `aws_cloudwatch_event_rule.metric_poll_rule` | Periodic metric poll schedule |
| `aws_cloudwatch_event_target.alarm_to_sqs` | Route EventBridge → SQS |
| `aws_dynamodb_table.anomaly_table` | Anomaly results (GSI on resourceId + severity) |
| `aws_dynamodb_table.automation_log_table` | Automation action history |
| `aws_cloudwatch_log_group` | Lambda log groups with retention policies |
Manages Lambda functions and ephemeral AWS resources:
| Resource | Description |
|---|---|
| `WsConnectionsTable` | DynamoDB: active WebSocket connection IDs (PAY_PER_REQUEST, TTL-enabled) |
| `CloudAutomationSnsTopic` | SNS topic for anomaly alerts |
| `GnnDataTable` | DynamoDB: GNN result store (composite key) |
Plugins: serverless-esbuild (tree-shaking + source maps), serverless-offline (local dev on port 5000)

    CloudAutomationGNN/
    │
    ├── Frontend/  # React (Vite) SPA
    │   └── src/
    │       ├── JSX/
    │       │   ├── Pages/  # LandingPage, Dashboard, Alerts, Login, Signup
    │       │   └── Components/  # ResourceGraph, MetricsChart, XAIPanel, AlertCard...
    │       ├── services/  # socket.js (WebSocket + polling client)
    │       └── Config/  # API base URL config
    │
    ├── Node-Backend/  # Node.js Lambda (Serverless Framework)
    │   ├── src/
    │   │   ├── Controllers/  # anomaly, automation, graph, events, user, s3...
    │   │   ├── Routes/  # Express route definitions
    │   │   ├── Models/  # Mongoose schemas (User, Event, Anomaly, Conversation)
    │   │   ├── Handlers/  # WebSocket handlers + SNS subscriber Lambda
    │   │   ├── Middlewares/  # JWT auth middleware
    │   │   ├── Utils/  # ApiError, ApiResponse, AsyncHandler
    │   │   ├── db/  # MongoDB Atlas connection
    │   │   ├── eventProcessor.js  # Lambda 2: EventBridge/SQS → GNN → DynamoDB/SNS
    │   │   └── app.js  # Express app factory
    │   ├── serverless.yml  # 5 Lambda function definitions + IAM + resources
    │   └── lambda.js  # serverless-http adapter
    │
    ├── Python-Backend/  # FastAPI GNN + XAI + LLM service
    │   ├── app/
    │   │   ├── api/  # FastAPI route handlers
    │   │   ├── core/  # Config, model loader (singleton)
    │   │   ├── schemas/  # Pydantic request/response models
    │   │   ├── services/
    │   │   │   ├── gnn_model.py  # GraphSAGE architecture
    │   │   │   ├── graph_builder.py  # Raw metrics → PyG Data object
    │   │   │   ├── gnn_inference.py  # Model loading + forward pass
    │   │   │   ├── xai_service.py  # GNNExplainer + SHAP orchestration
    │   │   │   ├── llm_service.py  # LangChain LCEL pipelines (Gemini)
    │   │   │   ├── document_processor.py  # PDF chunking + Pinecone indexing
    │   │   │   └── s3_service.py  # S3 model artifact I/O
    │   │   ├── prompts.py  # LangChain prompt templates
    │   │   └── main.py  # FastAPI app entry point
    │   └── Dockerfile  # Container definition for Python service
    │
    ├── ML/  # Offline training pipeline
    │   ├── data_generator.py  # Synthetic cloud graph data generator
    │   ├── train_gnn.py  # GraphSAGE training script
    │   ├── evaluate.py  # Model evaluation utilities
    │   ├── upload_model.py  # Upload gnn_model.pt to S3
    │   ├── generate_fake_metrics.py  # Fake metric generation for testing
    │   ├── data/  # graph_data.pt (generated)
    │   ├── models/  # gnn_model.pt (trained checkpoint)
    │   └── reports/  # training_loss.png
    │
    ├── infra/  # Terraform IaC
    │   ├── main.tf  # Core AWS resources (S3, SQS, DynamoDB, EventBridge)
    │   ├── lambda.tf  # Lambda-specific Terraform resources
    │   └── variables.tf  # Input variable definitions
    │
    └── Database/  # MongoDB schema reference / seed scripts

- Node.js ≥ 20, Python ≥ 3.10, Docker
- AWS CLI configured (`aws configure`)
- Terraform ≥ 1.6
- Serverless Framework (`npm i -g serverless`)
- MongoDB Atlas connection URI
- Pinecone API key, Google AI API key

Provision the infrastructure:

    cd infra
    terraform init
    terraform apply

Train and upload the model:

    cd ML
    pip install -r requirements.txt
    python data_generator.py
    python train_gnn.py
    python upload_model.py

Run the Python backend:

    cd Python-Backend
    pip install -r requirements.txt
    uvicorn app.main:app --host 0.0.0.0 --port 8000
    # Or with Docker:
    docker build -t cloud-gnn-python .
    docker run -p 8000:8000 --env-file .env cloud-gnn-python

Run the Node.js backend locally:

    cd Node-Backend
    cp .env.example .env # fill in your values
    npm install
    npx serverless offline
    # API available at http://localhost:5000

Run the frontend:

    cd Frontend
    npm install
    npm run dev
    # App available at http://localhost:5173

Deploy:

    # Deploy Node.js Lambdas
    cd Node-Backend
    npx serverless deploy --stage dev
    # Deploy Frontend to S3
    # (see .agents/workflows/deploy-frontend.md)

CloudAutomationGNN: 6th Semester Cloud Computing Project
Built with PyTorch Geometric, AWS Serverless, and a healthy obsession with graph theory.