vinzlercodes/Recommendation-of-Refactoring-Techniques-to-address-Self-Admitted-Technical-Debt

DSCI_602

This repository is for the course Applied Data Science 2, for its final capstone documentation.

Description

Title: Recommendation of Refactoring Techniques to address Self-Admitted Technical Debt

Authors:

About:

The goal of the project is to help software developers improve the quality of their code by recommending appropriate refactoring strategies to address Self-Admitted Technical Debt (SATD). To do so, we are designing and implementing a recommendation model that takes existing SATD comments as input and recommends the refactoring operations that need to be performed to address the debt described in each comment. We also classify which SATD comments require refactoring at all.

Requirements

  • Python 3 (3.8+)
  • numpy (1.18.5)
  • pandas (1.0.5)
  • Matplotlib (2.2.4)
  • seaborn (0.11.0)
  • pickle (0.7.5)
  • scikit-learn (0.23.1)
  • NLTK, the Natural Language Toolkit (3.5)
  • keras (2.4.0)

Dataset

The data was collected using two open-source tools, SATDBailiff and RefactoringMiner. SATDBailiff detects SATD instances in method comments using a machine learning model, then tracks each instance's lifespan (from its first occurrence to its resolution). RefactoringMiner is a Java library/API that detects refactorings applied to a Java project. The main columns of the data are 'resolution', 'v1_comment', 'v2_comment' and 'refactoring_type', containing the end result of the refactoring, the comment before refactoring, the comment after refactoring, and the refactoring technique applied, respectively. We work with 10 unique refactoring label classes. The dataset consists of 14156 rows, each a unique comment instance with its corresponding label.

  • The frequency of each class:

[Figure: classes]
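
A minimal sketch of how per-class frequencies like these could be computed with pandas. The rows below are made-up placeholders, not taken from the dataset; only the column names `v1_comment` and `refactoring_type` come from the description above.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has 14156 rows and 10 classes.
df = pd.DataFrame(
    {
        "v1_comment": [
            "TODO: clean up this method",
            "FIXME: rename this variable",
            "TODO: split this method",
            "hack: move this to a helper class",
        ],
        "refactoring_type": [
            "Extract Method",
            "Rename Variable",
            "Extract Method",
            "Move Method",
        ],
    }
)

# Count how often each refactoring label occurs, most frequent first.
class_counts = df["refactoring_type"].value_counts()
print(class_counts)
```

Passing `normalize=True` to `value_counts` would report proportions instead of raw counts, which makes class imbalance easier to see.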

Models used

  • Random Forest Classifier
  • Logistic Regression
  • Support Vector Machine (SVM)
  • Multinomial Naive Bayes (MNB)
  • Multilayer perceptron (MLP)
  • Convolutional neural network (CNN)
  • Long short-term memory (LSTM)
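
As a rough illustration of how one of the classical models above can be fit on comment text, the sketch below chains a TF-IDF vectorizer with scikit-learn's `LogisticRegression`. The TF-IDF features, the toy comments, the labels, and the `max_iter` setting are assumptions made for illustration; this README does not state which features or hyperparameters the project uses.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical SATD comments and refactoring labels (not from the dataset).
comments = [
    "TODO: this method is too long, split it up",
    "FIXME: split this huge method into smaller ones",
    "TODO: rename this variable to something meaningful",
    "hack: bad variable name, rename later",
]
labels = [
    "Extract Method",
    "Extract Method",
    "Rename Variable",
    "Rename Variable",
]

# Vectorize the comment text, then fit a linear classifier on the features.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(comments, labels)

# Predict a refactoring label for an unseen comment.
prediction = model.predict(["TODO: rename this field"])
print(prediction)
```

Swapping `LogisticRegression` for `RandomForestClassifier`, `LinearSVC`, or `MultinomialNB` keeps the rest of the pipeline unchanged.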

Result

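| Model | F1 score | F1 score + MLSMOTE | Accuracy | Accuracy + MLSMOTE | Time |
|-------|----------|--------------------|----------|--------------------|------|
| RF | 0.73 | 0.68 | 0.46 | 0.36 | 1.5 min |
| LR | 0.71 | 0.67 | 0.40 | 0.30 | 9.21 min |
| SVM | 0.66 | 0.65 | 0.32 | 0.33 | 8.92 min |
| MNB | 0.67 | 0.66 | 0.35 | 0.29 | 1.2 min |
| MLP | 0.73 | 0.69 | 0.46 | 0.34 | 33 min |
| CNN | 0.62 | 0.61 | 0.73 | 0.70 | 7.33 min |
| LSTM | 0.61 | 0.60 | 0.71 | 0.71 | 125 min |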
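
A minimal sketch of how two such metrics can be computed. It assumes a multi-label setup (suggested by the use of MLSMOTE, but not stated in this README), micro-averaged F1, and exact-match accuracy; the averaging the project actually uses is not given, and the label sets below are made up.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over parallel lists of label sets."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))  # correct labels
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))  # spurious labels
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))  # missed labels
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def exact_match_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted label sets for four comments.
y_true = [
    {"Extract Method"},
    {"Rename Variable", "Move Method"},
    {"Move Method"},
    {"Extract Method", "Rename Variable"},
]
y_pred = [
    {"Extract Method"},
    {"Rename Variable"},
    {"Move Method"},
    {"Extract Method", "Move Method"},
]

f1 = micro_f1(y_true, y_pred)
acc = exact_match_accuracy(y_true, y_pred)
print(f"micro F1 = {f1:.3f}, exact-match accuracy = {acc:.2f}")
# prints: micro F1 = 0.727, exact-match accuracy = 0.50
```

Under these assumptions a partially correct prediction still contributes to F1 but counts as a miss for exact-match accuracy, so the two numbers can diverge on the same predictions.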

How to Run

  • Clone the project.
  • Run the test.py file to see the predictions produced by the pickled trained models.
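
What test.py does internally is not shown in this README. As a sketch of the general pattern only, the code below trains a tiny stand-in model, pickles it to a temporary file, then loads it back and predicts. The file name `model.pkl`, the toy data, and the choice of model are all hypothetical.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training data standing in for the real SATD comments.
comments = [
    "TODO: split this long method",
    "FIXME: method too long, extract helper",
    "TODO: rename this variable",
    "hack: confusing variable name, rename",
]
labels = [
    "Extract Method",
    "Extract Method",
    "Rename Variable",
    "Rename Variable",
]
trained = make_pipeline(CountVectorizer(), MultinomialNB()).fit(comments, labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")  # hypothetical file name

    # Save the fitted pipeline, as a training script might.
    with open(model_path, "wb") as f:
        pickle.dump(trained, f)

    # Load it back, as a prediction script might.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored pipeline predicts without being retrained.
restored_pred = restored.predict(["TODO: rename this parameter"])
print(restored_pred)
```

Pickled models are generally only reliable when loaded with the same library versions used to save them, and pickle files should only be loaded from trusted sources.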
