This project is a Resume Classification and Ranking System that processes multiple resumes in PDF format, classifies them into predefined job categories, and ranks them based on relevance to a specified job role.
- Extracts text from resumes in PDF format.
- Cleans and preprocesses text using regex.
- Tokenizes and sequences text for model input.
- Uses a deep learning model (CNN + LSTM) to classify resumes.
- Ranks resumes based on their softmax probability score for a given job role.
- Normalizes scores and sorts resumes in descending order of relevance.
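The cleaning and sequencing steps listed above can be sketched in plain Python. This is a minimal illustration, not the project's actual code: the function names `clean_text`, `build_vocab`, and `to_sequence`, the specific regex rules, and the padding length are all assumptions. The real pipeline most likely uses a Keras tokenizer rather than the hand-rolled vocabulary shown here.

```python
import re
from collections import Counter

def clean_text(text):
    """Lowercase the text and strip URLs, non-letters, and extra whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)       # keep letters only
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace

def build_vocab(texts, max_words=5000):
    """Map the most frequent words to integer ids; 0 is reserved for padding."""
    counts = Counter(word for t in texts for word in t.split())
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common(max_words))}

def to_sequence(text, vocab, max_len=10):
    """Convert text to a fixed-length list of ids, truncating or zero-padding at the end."""
    ids = [vocab[w] for w in text.split() if w in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))

# Hypothetical resume snippet used only for illustration.
raw = "Skills: Python, SQL & ML!! See https://example.com/cv"
cleaned = clean_text(raw)
vocab = build_vocab([cleaned])
sequence = to_sequence(cleaned, vocab, max_len=6)
```

Every resume ends up as a list of integers of the same length, which is the input shape an embedding layer expects.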
Ensure you have the following Python libraries installed:
pip install numpy pandas tensorflow scikit-learn PyPDF2

The model is trained using the UpdatedResumeDataSet.csv dataset, which contains resumes and their corresponding job categories.
- Embedding Layer: Converts words into dense vectors.
- Conv1D Layer: Captures local dependencies in text.
- MaxPooling1D Layer: Reduces dimensionality.
- LSTM Layer: Extracts long-term dependencies.
- Dropout Layer: Prevents overfitting.
- Dense Layer with Softmax Activation: Outputs probability distribution across job categories.
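The layer stack listed above could be assembled in Keras roughly as follows. This is a sketch under stated assumptions rather than the project's actual configuration: the vocabulary size, sequence length, number of classes, filter counts, unit counts, dropout rate, and compile settings are all illustrative values that the description above does not specify.

```python
import tensorflow as tf

# All sizes below are illustrative assumptions, not values taken from the project.
VOCAB_SIZE = 5000    # number of distinct token ids, including padding
MAX_LEN = 200        # tokens per resume after padding or truncation
NUM_CLASSES = 25     # number of job categories in the training data

model = tf.keras.Sequential([
    tf.keras.Input(shape=(MAX_LEN,)),
    tf.keras.layers.Embedding(VOCAB_SIZE, 128),                # words -> dense vectors
    tf.keras.layers.Conv1D(64, 5, activation="relu"),          # local n-gram features
    tf.keras.layers.MaxPooling1D(pool_size=2),                 # halve the sequence length
    tf.keras.layers.LSTM(64),                                  # longer-range dependencies
    tf.keras.layers.Dropout(0.5),                              # regularization
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

The softmax output gives one probability per job category, and the entry for the requested job role is the score later used for ranking.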
- Place resumes in the Resumes folder.
- Load the pre-trained model weights (deeprank_model.h5).
- Run the script to classify and rank resumes for a given job role.
- Example:
pdf_folder = "Resumes"
job_role = "Data Science"
ranked_resumes = process_resumes(pdf_folder, job_role)
print(ranked_resumes)

The script returns a sorted DataFrame containing:
- Resume file name
- Job role probability score
- Normalized score (0-1 scale)
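As a rough illustration of how such a table could be produced, the sketch below ranks a few hypothetical resumes by their probability for the target role. The function name `rank_resumes`, the column names, and the choice of min-max scaling for the 0-1 normalization are assumptions; the script's own normalization may differ.

```python
import pandas as pd

def rank_resumes(file_names, role_probs):
    """Return a DataFrame sorted by the target role's probability, highest first."""
    df = pd.DataFrame({"resume": file_names, "score": role_probs})
    lo, hi = df["score"].min(), df["score"].max()
    # Min-max scaling; if all scores are equal, assign 1.0 to avoid dividing by zero.
    df["normalized_score"] = (df["score"] - lo) / (hi - lo) if hi > lo else 1.0
    return df.sort_values("score", ascending=False).reset_index(drop=True)

# Hypothetical file names and softmax probabilities for one job role.
ranked = rank_resumes(["a.pdf", "b.pdf", "c.pdf"], [0.25, 0.75, 0.50])
```

With min-max scaling, the best-matching resume always receives 1.0 and the weakest 0.0, so normalized scores are comparable within one batch but not across batches.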
This project is licensed under the MIT License.
Nilesh Ranjan Pal