Employee turnover is a multi-billion dollar problem. This project provides HR teams with an Intelligence Report that predicts "Flight Risk" with ~70% accuracy, allowing for proactive intervention. By automating the data extraction from resumes and predicting attrition, this tool directly aims to reduce turnover costs and improve organizational stability.
➤ Dataset: Human Resources Data Set by Dr. Carla Patalano and Dr. Rich Huebner
- GenAI Resume Parsing: Uses Google Gemini Pro to extract complex employee metrics from uploaded PDFs, bypassing manual data entry.
- Predictive Risk Modeling: Implements a cost-sensitive XGBoost classifier to identify high-risk employees.
- Dynamic HR Dashboard: A sleek, dark-themed UI featuring Risk Gauges, Intelligence Reports, and AI-generated Strategy Recommendations.
- Automated Intelligence Reports: Generates detailed PDF and CSV reports, enabling HR leads to move from raw data to board-ready presentations instantly.
- Enterprise-Ready Data Flow: Secure handling of employee records using MongoDB Atlas, ensuring data persistence and historical trend tracking (MDE).
| Category | Technology | Implementation |
|---|---|---|
| Frontend | React (Vite), Tailwind | Responsive SPA with a dark-themed HR workspace. |
| Frontend | Fetch API | Utilizing native browser APIs for asynchronous data fetching and promise-based HTTP requests. |
| Backend | FastAPI | Building a high-performance, asynchronous REST API with automatic OpenAPI documentation. |
| Backend | Pydantic | Enforcing strict data validation and type-safe schemas for incoming employee data. |
| Parsing | PyMuPDF (fitz) | Low-level PDF binary stream extraction before GenAI processing. |
| Security | CORS Middleware | Orchestrated Cross-Origin Resource Sharing for secure Vercel-to-Render communication. |
| Database | Motor (Async MongoDB) | Non-blocking database drivers to ensure high-concurrency performance. |
| AI Engine | Google Gemini Pro | Leveraging Large Language Models (LLMs) for intelligent data extraction from PDF resumes. |
| ML Model | XGBoost | Deploying Gradient Boosted Decision Trees with cost-sensitive weights for risk classification. |
| Deployment | Vercel, Render | Distributed cloud hosting with automated CI/CD. |
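The validation layer named in the table above can be sketched with Pydantic alone. The field names and ranges here are hypothetical, since the real schema depends on the HR dataset:

```python
from pydantic import BaseModel, Field, ValidationError

class EmployeeMetrics(BaseModel):
    """Hypothetical schema for incoming employee data; real fields differ."""
    department: str
    salary: float = Field(ge=0)                  # no negative salaries
    engagement_score: float = Field(ge=1, le=5)  # assumed 1-5 survey scale
    absences: int = Field(ge=0)

# Valid payloads parse cleanly...
record = EmployeeMetrics(department="IT", salary=62000, engagement_score=4.1, absences=3)

# ...while malformed ones are rejected before they can reach the ML model.
try:
    EmployeeMetrics(department="IT", salary=-1, engagement_score=9, absences=0)
except ValidationError as exc:
    errors = exc.errors()
```

In the real backend, FastAPI applies this validation automatically when the model is used as a request body type.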
- Data Acquisition (Dual Entry):
- Automated Path: User uploads a CV via React; FastAPI orchestrates the file stream to Google Gemini Pro for entity extraction.
- Manual Path: Users can directly input employee metrics into a structured form, bypassing the AI extraction for immediate results.
- Standardization & Validation: Both data paths converge at the Pydantic layer, which enforces strict schema validation and type-safety before the data reaches the ML model.
- Intelligence Layer (Inference): The validated data is processed by a pre-trained XGBoost model.
- Cloud Persistence: Prediction logs and metadata are asynchronously committed to MongoDB Atlas using the Motor driver.
- Real-Time Analytics: The dashboard leverages MongoDB Aggregation Pipelines (`$match`, `$group`, `$avg`) to offload heavy computations to the database. This enables real-time tracking of Risk Hotspots (high-risk departments) and Organizational Averages. Data is fetched via the native Fetch API for instant UI synchronization.
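The analytics step above might look like the following pipeline. The collection and field names (`risk_score`, `department`) and the 0.5 cutoff are assumptions, and the Motor call is shown as a comment since it requires a live Atlas cluster:

```python
# Hypothetical aggregation: average risk per department, flagged employees only
HIGH_RISK_THRESHOLD = 0.5  # assumed cutoff for "high risk"

pipeline = [
    {"$match": {"risk_score": {"$gte": HIGH_RISK_THRESHOLD}}},
    {"$group": {
        "_id": "$department",                  # one bucket per department
        "avg_risk": {"$avg": "$risk_score"},   # departmental average
        "headcount": {"$sum": 1},              # how many flagged employees
    }},
    {"$sort": {"avg_risk": -1}},               # riskiest departments first
]

# With Motor (async), roughly:
#   hotspots = await db.predictions.aggregate(pipeline).to_list(length=None)
```

Pushing the `$group`/`$avg` work into MongoDB keeps the FastAPI process free of per-request number crunching.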
The Problem: During the integration of Google Gemini, the system frequently hit API rate limits despite low usage volume, and the frontend-to-backend data stream was failing to trigger the extraction logic correctly.
- The Pivot:
  - Model Switching: Swapped models to optimize token usage and cost-efficiency.
  - Robust Debugging: Implemented a comprehensive logging and "Safety Mechanism" layer to catch rate-limit exceptions before they crashed the frontend.
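A safety mechanism like the one described could be a small retry wrapper with exponential backoff. The exception type below is a stand-in, since the actual rate-limit exception class depends on the Gemini SDK version:

```python
import logging
import time

def with_backoff(call, max_retries=3, base_delay=2.0):
    """Retry `call` on rate-limit errors, doubling the wait each attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError as exc:  # stand-in for the SDK's rate-limit exception
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error to the API layer
            delay = base_delay * (2 ** attempt)
            logging.warning("Rate limited (%s); retrying in %.1fs", exc, delay)
            time.sleep(delay)
```

In the real backend this would wrap the Gemini extraction call, so a transient 429 surfaces as a logged retry rather than a frontend crash.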
The Problem: Initial attempts to handle dataset imbalance using SMOTE (Synthetic Minority Over-sampling Technique) resulted in lower Precision and Recall, as the synthetic data introduced noise that hindered the model's ability to generalize to real employee behavior.
- The Pivot: I pivoted to Cost-Sensitive Learning by tuning the `scale_pos_weight` parameter, and abandoned the resampled `X_train_res` dataset in favor of the original, authentic `X_train`. This forced the model to learn from real-world distributions while penalizing the misclassification of flight risks, significantly improving the model's predictive reliability.
| Metric | Old Model (SMOTE) | New Model (Weighted XGBoost) | Impact |
|---|---|---|---|
| Overall Accuracy | 67.00% | 69.84% | +2.84% Improvement |
| Precision (Class 1) | 53.00% | 60.00% | Higher reliability in flags |
| Recall (Class 1) | 41.00% | 41.00% | Consistent detection rate |
| Class 0 Recall | 80.00% | 85.00% | 5% fewer False Positives |
| Data Integrity | Synthetic | Organic (Original) | No "hallucinated" data |
- Navigate to `/backend` and create a `.env` file with your `GEMINI_API_KEY` and `MONGO_URI`.
- Install dependencies: `pip install -r requirements.txt`
- Run the server: `uvicorn main:app --reload`
- Navigate to `/frontend` and install dependencies: `npm install`
- Start the development server: `npm run dev`
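The backend `.env` might look like this (placeholder values only; substitute your own credentials):

```shell
GEMINI_API_KEY=your-gemini-key
MONGO_URI=your-atlas-connection-string
```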
SMOTE • No module error • Deprecated dict in Pydantic • Gemini API quickstart • MongoDB Atlas pipeline stages
Made by Neha K Vallappil • LinkedIn
