
RETAIN.AI | Employee Attrition Predictor




PLATFORM WALKTHROUGH

RetainAI Demo

Strategic AI for Talent Retention & Cost Optimization

Business Impact

Employee turnover is a multi-billion-dollar problem. This project gives HR teams an Intelligence Report that predicts "Flight Risk" with ~70% accuracy, enabling proactive intervention. By automating data extraction from resumes and predicting attrition, the tool directly aims to reduce turnover costs and improve organizational stability.


• Core Implementation •

FastAPI · React · MongoDB Atlas · XGBoost · Google Gemini

• Languages •

Python · JavaScript · HTML5 · CSS3

Dataset

Human Resources Data Set by Dr. Carla Patalano and Dr. Rich Huebner

Key Features

  • GenAI Resume Parsing: Uses Google Gemini Pro to extract complex employee metrics from uploaded PDFs, bypassing manual data entry.
  • Predictive Risk Modeling: Implements a cost-sensitive XGBoost classifier to identify high-risk employees.
  • Dynamic HR Dashboard: A sleek, dark-themed UI featuring Risk Gauges, Intelligence Reports, and AI-generated Strategy Recommendations.
  • Automated Intelligence Reports: Generates detailed PDF/CSV reports, enabling HR leads to go from raw data to board-ready presentations instantly.
  • Enterprise-Ready Data Flow: Secure handling of employee records using MongoDB Atlas, ensuring data persistence and historical trend tracking (MDE).

Tech Stack

| Category | Technology | Implementation |
|---|---|---|
| Frontend | React (Vite), Tailwind | Responsive SPA with a dark-themed HR workspace. |
| Frontend | Fetch API | Native browser APIs for asynchronous, promise-based HTTP requests. |
| Backend | FastAPI | High-performance, asynchronous REST API with automatic OpenAPI documentation. |
| Backend | Pydantic | Strict data validation and type-safe schemas for incoming employee data. |
| Parsing | PyMuPDF (fitz) | Low-level PDF binary stream extraction before GenAI processing. |
| Security | CORS Middleware | Configured Cross-Origin Resource Sharing for secure Vercel-to-Render communication. |
| Database | Motor (Async MongoDB) | Non-blocking database driver to ensure high-concurrency performance. |
| AI Engine | Google Gemini Pro | Large Language Model (LLM) for intelligent data extraction from PDF resumes. |
| ML Model | XGBoost | Gradient Boosted Decision Trees with cost-sensitive weights for risk classification. |
| Deployment | Vercel, Render | Distributed cloud hosting with automated CI/CD. |

System Architecture

  1. Data Acquisition (Dual Entry):
    • Automated Path: User uploads a CV via React; FastAPI orchestrates the file stream to Google Gemini Pro for entity extraction.
    • Manual Path: Users can directly input employee metrics into a structured form, bypassing the AI extraction for immediate results.
  2. Standardization & Validation: Both data paths converge at the Pydantic layer, which enforces strict schema validation and type-safety before the data reaches the ML model.
  3. Intelligence Layer (Inference): The validated data is processed by a pre-trained XGBoost model.
  4. Cloud Persistence: Prediction logs and metadata are asynchronously committed to MongoDB Atlas using the Motor driver.
  5. Real-Time Analytics: The dashboard leverages MongoDB Aggregation Pipelines ($match, $group, $avg) to offload heavy computations to the database. This enables real-time tracking of Risk Hotspots (high-risk departments) and Organizational Averages. Data is fetched via the native Fetch API for instant UI synchronization.
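As a sketch of the analytics step, a "Risk Hotspots" pipeline could be assembled as plain dictionaries and handed to the Motor driver. Collection and field names here (`predictions`, `department`, `risk_score`) are assumptions for illustration, not taken from the repository:

```python
# Sketch of a MongoDB aggregation pipeline for Risk Hotspots:
# average predicted risk per department, restricted to high-risk records.
# Collection/field names (predictions, department, risk_score) are assumed.

def build_risk_hotspot_pipeline(threshold: float = 0.5) -> list[dict]:
    """Build a $match -> $group -> $sort pipeline as plain dicts."""
    return [
        {"$match": {"risk_score": {"$gte": threshold}}},   # keep high-risk rows
        {"$group": {
            "_id": "$department",                          # one bucket per department
            "avg_risk": {"$avg": "$risk_score"},           # departmental average
            "headcount": {"$sum": 1},                      # employees in the bucket
        }},
        {"$sort": {"avg_risk": -1}},                       # riskiest departments first
    ]

# With Motor, the pipeline would run asynchronously, e.g.:
#   cursor = db.predictions.aggregate(build_risk_hotspot_pipeline())
#   hotspots = await cursor.to_list(length=None)
```

Building the pipeline as data keeps the heavy computation on the database side, which is the point of step 5.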

Challenges Faced During Development

1. Gemini API Rate-Limit Management

The Problem: During the integration of Google Gemini, the system frequently hit API rate limits despite low usage volume, and the frontend-to-backend data stream was failing to trigger the extraction logic correctly.

The Pivot:

  • Model Switching: Swapped models to optimize token usage and cost-efficiency.
  • Robust Debugging: Implemented a comprehensive logging and "Safety Mechanism" layer to catch rate-limit exceptions before they crashed the frontend.
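The repository's exact safety mechanism isn't shown here, but a generic exponential-backoff wrapper along these lines would catch rate-limit errors before they reach the frontend (the exception type, delays, and `gemini_extract` name are all illustrative assumptions):

```python
import random
import time

# Generic sketch of a rate-limit "safety mechanism": retry a callable with
# exponential backoff plus jitter instead of letting the exception crash
# the frontend. Exception type and delay values are assumptions.

class RateLimitError(Exception):
    """Stand-in for the 429-style error raised by the Gemini client."""

def call_with_backoff(fn, retries=4, base_delay=1.0, sleep=time.sleep):
    for attempt in range(retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == retries:
                raise  # exhausted: surface a clean error to the API layer
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            sleep(delay)  # wait 1s, 2s, 4s, ... plus jitter

# Usage sketch (gemini_extract is a hypothetical wrapper around the API call):
#   result = call_with_backoff(lambda: gemini_extract(pdf_bytes))
```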

2. Prioritizing Business Impact (From SMOTE to Cost-Sensitivity)

The Problem: Initial attempts to handle dataset imbalance using SMOTE (Synthetic Minority Over-sampling Technique) resulted in lower Precision and Recall, as the synthetic data introduced noise that hindered the model's ability to generalize to real employee behavior.

  • The Pivot: I pivoted to Cost-Sensitive Learning by tuning the scale_pos_weight parameter and abandoned the resampled X_train_res dataset in favor of the original, authentic X_train. This forced the model to learn from real-world distributions while penalizing the misclassification of flight risks, significantly improving the model's predictive reliability.
| Metric | Old Model (SMOTE) | New Model (Weighted XGBoost) | Impact |
|---|---|---|---|
| Overall Accuracy | 67.00% | 69.84% | +2.84% improvement |
| Precision (Class 1) | 53.00% | 60.00% | Higher reliability in flags |
| Recall (Class 1) | 41.00% | 41.00% | Consistent detection rate |
| Class 0 Recall | 80.00% | 85.00% | 5% fewer false positives |
| Data Integrity | Synthetic | Organic (original) | No "hallucinated" data |
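The cost-sensitive pivot can be sketched as follows. Only the weight computation runs stand-alone here; the XGBoost usage is shown in comments, following the library's documented rule of thumb of negatives divided by positives for scale_pos_weight:

```python
from collections import Counter

# Sketch of the cost-sensitive pivot: compute scale_pos_weight from the
# *original* training labels (no SMOTE resampling), so misclassifying a
# rare leaver (class 1) costs proportionally more than misclassifying a stayer.

def attrition_weight(y_train) -> float:
    """Return XGBoost's recommended scale_pos_weight: negatives / positives."""
    counts = Counter(y_train)          # {0: stayers, 1: leavers}
    return counts[0] / counts[1]

# The weight is then passed straight to the classifier, e.g.:
#   from xgboost import XGBClassifier
#   clf = XGBClassifier(scale_pos_weight=attrition_weight(y_train))
#   clf.fit(X_train, y_train)          # original data, not X_train_res
```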

Local Development Setup

Backend (FastAPI)

  1. Navigate to /backend and create a `.env` file with your `GEMINI_API_KEY` and `MONGO_URI`.
  2. Install dependencies: `pip install -r requirements.txt`
  3. Run the server: `uvicorn main:app --reload`
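A minimal `.env` might look like this (placeholder values only; the variable names come from the setup step above):

```
GEMINI_API_KEY=your-gemini-api-key
MONGO_URI=mongodb+srv://<user>:<password>@<cluster>.mongodb.net/<database>
```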

Frontend (React)

  1. Navigate to /frontend and install dependencies: `npm install`
  2. Start the development server: `npm run dev`

References

  • SMOTE
  • "No module" error
  • Deprecated dict in Pydantic
  • Gemini API quickstart
  • MongoDB Atlas pipeline stages


Made by Neha K Vallappil · LinkedIn
