DSCI_602

This repository contains the final capstone documentation for the course Applied Data Science 2.

Description

Title: Recommendation of Refactoring Techniques to address Self-Admitted Technical Debt

Authors:

Abdullah A Alsaleh (aa6304@rit.edu)
Vinayak Sengupta (vs4016@rit.edu)
Mohamed Wiem Mkaouer - Project Advisor (mwmvse@rit.edu)

About:

The goal of the project is to help software developers improve the quality of their code by recommending appropriate refactoring strategies to address Self-Admitted Technical Debt (SATD). To do so, we are designing and implementing a recommendation model that takes existing SATD comments as input and recommends the refactoring operations that need to be performed to address the debt described in each comment. We are also classifying which SATD comments require refactoring at all.

Requirements

Python 3 (3.8+)
numpy (1.18.5)
pandas (1.0.5)
Matplotlib (2.2.4)
seaborn (0.11.0)
pickle (0.7.5)
scikit-learn (0.23.1)
Natural Language Toolkit, NLTK (3.5)
keras (2.4.0)

Dataset

The data was collected through the combined effort of two open-source tools, SATDBailiff and RefactoringMiner. SATDBailiff detects SATD instances in method comments using a machine learning model, then tracks each instance's lifespan (from its first occurrence to its resolution). RefactoringMiner is a Java library/API that detects refactorings applied to a Java project. The main columns of the data are 'resolution', 'v1_comment', 'v2_comment' and 'refactoring_type', which contain the end result of the refactoring, the comment before refactoring, the comment after refactoring and the refactoring operation itself, respectively. We work with 10 unique refactoring label classes. The data set consists of 14156 rows, each holding a unique comment instance for its corresponding label.
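As a rough illustration of the column layout described above, the following sketch builds a tiny in-memory table with pandas and counts how often each refactoring label occurs. The rows, the 'resolution' values and the label names are invented placeholders, not records from the actual data set, and the real file name and format are not assumed here.

```python
import pandas as pd

# Toy rows that only mimic the four main columns; all values are
# invented for illustration and do not come from the real data set.
rows = [
    {"resolution": "removed", "v1_comment": "TODO: split this huge method",
     "v2_comment": "", "refactoring_type": "Extract Method"},
    {"resolution": "changed", "v1_comment": "FIXME: pull this block out",
     "v2_comment": "FIXME: still too long", "refactoring_type": "Extract Method"},
    {"resolution": "removed", "v1_comment": "TODO: misleading variable name",
     "v2_comment": "", "refactoring_type": "Rename Variable"},
    {"resolution": "removed", "v1_comment": "hack: belongs in another class",
     "v2_comment": "", "refactoring_type": "Move Method"},
]
df = pd.DataFrame(rows)

# Count how often each refactoring label occurs, most frequent first.
class_counts = df["refactoring_type"].value_counts()
print(class_counts)
```

On the real data, the same `value_counts()` call on the 'refactoring_type' column would give the per-class frequencies for the 10 label classes.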

The frequency of occurrence of each class:

Models used

Random Forest Classifier
Logistic Regression
Support Vector Machine (SVM)
Multinomial Naive Bayes (MNB)
Multilayer Perceptron (MLP)
Convolutional neural network (CNN)
Long short-term memory (LSTM)
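To give a feel for how one of the classical models listed above can be applied to comment text, here is a minimal sketch that feeds TF-IDF features into a Multinomial Naive Bayes classifier with scikit-learn. The comments and labels are invented, the choice of TF-IDF is an assumption rather than the project's documented feature pipeline, and the example is simplified to one label per comment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented SATD-style comments with one hypothetical label each.
comments = [
    "TODO: this method is too long, split it up",
    "FIXME: extract this block into a helper method",
    "TODO: rename this variable, the name is misleading",
    "hack: bad variable name, rename later",
]
labels = ["Extract Method", "Extract Method",
          "Rename Variable", "Rename Variable"]

# Vectorize the text and train the classifier in a single pipeline.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(comments, labels)

# Predict a refactoring label for an unseen comment.
predicted = model.predict(["TODO: rename the variable"])
print(predicted[0])
```

Swapping `MultinomialNB()` for another scikit-learn estimator such as `LogisticRegression()` keeps the rest of the pipeline unchanged, which makes it easy to compare models on the same features.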

Result

Model	F1 score	F1 score (+MLSMOTE)	Accuracy	Accuracy (+MLSMOTE)	Time
RF	0.73	0.68	0.46	0.36	1.5 min
LR	0.71	0.67	0.40	0.30	9.21 min
SVM	0.66	0.65	0.32	0.33	8.92 min
MNB	0.67	0.66	0.35	0.29	1.2 min
MLP	0.73	0.69	0.46	0.34	33 min
CNN	0.62	0.61	0.73	0.70	7.33 min
LSTM	0.61	0.60	0.71	0.71	125 min
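This README does not state how the F1 score and accuracy were computed. MLSMOTE is an oversampling technique for multi-label data, so the sketch below assumes a multi-label setup and shows one common way to compute both metrics with scikit-learn: micro-averaged F1 and exact-match (subset) accuracy. The label matrices are invented, and the averaging mode is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Rows are comments, columns are three hypothetical refactoring labels.
# 1 means the label applies to that comment.
y_true = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

# Micro-averaged F1 pools true/false positives over all labels.
micro_f1 = f1_score(y_true, y_pred, average="micro")

# For multi-label input, accuracy_score is subset accuracy: a row
# counts as correct only if every label in it matches.
subset_accuracy = accuracy_score(y_true, y_pred)

print(f"micro F1 = {micro_f1:.2f}, subset accuracy = {subset_accuracy:.2f}")
```

In this toy case the micro F1 is 0.80 while the subset accuracy is 0.50, because a prediction that gets most labels right still counts as a miss under exact matching.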

How to Run

Clone the project
Run the test.py file to see the predictions produced by the pickled trained models.
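The contents of test.py are not shown here. As a hedged sketch of the general pattern it describes, the following example trains a small scikit-learn pipeline, serializes it with pickle, restores it and predicts on a new comment. The data, the model choice and the variable names are hypothetical, and the model is pickled to bytes in memory rather than to a file so that no path is assumed.

```python
import pickle

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data standing in for the real SATD comments.
comments = [
    "TODO: this method is too long, split it up",
    "FIXME: extract this block into a helper method",
    "TODO: rename this variable, the name is misleading",
    "hack: bad variable name, rename later",
]
labels = ["Extract Method", "Extract Method",
          "Rename Variable", "Rename Variable"]

trained = make_pipeline(CountVectorizer(), LogisticRegression())
trained.fit(comments, labels)

# Serialize the fitted pipeline; a script would normally write these
# bytes to a file and read them back later.
blob = pickle.dumps(trained)
restored = pickle.loads(blob)

# The restored model predicts exactly like the original one.
new_comments = ["FIXME: extract helper method"]
print(restored.predict(new_comments)[0])
```

Pickle files can execute arbitrary code when loaded, so only load model files that come from a trusted source, and use the same library versions that were used to create them.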
