This data set contains 113,937 loans with 83 variables on each loan, including loan amount, borrower APR, borrower rate (or interest rate), current loan status, borrower income, and many others. Two of these variables are new: YearlyIncomeEstimate (derived from StatedMonthlyIncome) and IncomeRange (derived from YearlyIncomeEstimate), both added to better stratify the data set. The data set can be found here and feature documentation here.
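As a rough sketch of how the two derived variables could be built with pandas: the snippet below assumes the yearly estimate is twelve times the stated monthly income and that the bin edges follow the income-range labels used in the findings. The toy rows, the multiplier, and the bin edges are illustrative assumptions, not necessarily what the notebook does.

```python
import pandas as pd

# Toy stand-in for the loan data (made-up values, not taken from the real file).
loans = pd.DataFrame(
    {"StatedMonthlyIncome": [0.0, 1500.0, 3200.0, 5000.0, 7000.0, 10000.0]}
)

# Assumption: yearly income is estimated as 12 x the stated monthly income.
loans["YearlyIncomeEstimate"] = loans["StatedMonthlyIncome"] * 12

# Assumed bin edges and labels, chosen to mirror the ranges quoted in the findings.
bins = [-1, 0, 24_999, 49_999, 74_999, 99_999, float("inf")]
labels = [
    "USD0",
    "USD1-24,999",
    "USD25,000-49,999",
    "USD50,000-74,999",
    "USD75,000-99,999",
    "USD100,000+",
]

# pd.cut assigns each yearly estimate to the (right-inclusive) bin it falls in.
loans["IncomeRange"] = pd.cut(loans["YearlyIncomeEstimate"], bins=bins, labels=labels)
```

Because `pd.cut` returns an ordered categorical, the income ranges keep their natural order when plotted or grouped.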
- numpy
- pandas
- matplotlib.pyplot
- seaborn
The modules listed above can be installed in Anaconda (the recommended software for running the ipynb files) using conda install module_name or the conventional pip install module_name.
The recommended way to run the ipynb files is by setting up a virtual environment with conda and running the files in a jupyter notebook. Click here to learn how to set up and manage virtual environments with conda.
The HTML files, which contain all the necessary code and findings, are also available in the main branch.
The files in this repo currently have no known bugs.
In the exploration, I found that there was a strong relationship between borrower APR and credit grade.
Here are the findings I got from the univariate explorations:
- Most borrowers are on the 36-month loan term.
- About half of the loans in the data set have a current status.
- Most borrowers are from California.
- Most borrowers classified their occupation as "Others", followed by "Professional".
- About 80% of borrowers are employed, about 5% are self-employed, about 1% are not employed, and about 1% are retired.
- Most borrowers have income ranges of USD25,000-49,999 and USD50,000-74,999
- The StatedMonthlyIncome data has some outliers, and most of the stated monthly incomes fall between 0 and 2000.
- The DebtToIncomeRatio data has some outliers, and most of the values fall between 0 and 1.
- About half of the borrowers are homeowners and the other half are not.
- About half of the borrowers took out a loan for Debt Consolidation.
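Shares like the ones listed above can be computed with `value_counts(normalize=True)`. The following is a minimal sketch on made-up rows; the column names `Term` and `IsBorrowerHomeowner` are assumed to match the data set, and the helper `share` is hypothetical.

```python
import pandas as pd

# Toy rows for illustration only; these are not figures from the real data set.
loans = pd.DataFrame(
    {
        "Term": [36, 36, 36, 60, 12, 36],
        "IsBorrowerHomeowner": [True, False, True, False, True, False],
    }
)


def share(df, column):
    """Return each category's share of rows as a percentage, largest first."""
    return df[column].value_counts(normalize=True).mul(100).round(1)


term_share = share(loans, "Term")
homeowner_share = share(loans, "IsBorrowerHomeowner")

print(term_share)
print(homeowner_share)
```

The same helper works for any categorical column, such as employment status or income range.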
Here are the findings I got from my bivariate explorations:
- BorrowerAPR, BorrowerRate, and LenderYield have almost perfect positive correlations with one another.
- The borrower's APR has a negative relationship with the Prosper Principal Borrowed
- The borrower APR negatively correlates with the Loan Original Amount
- Generally, the higher the credit grade level, the lower the Borrower APR.
- Compared to the Not employed and Other employment statuses, persons who are employed, self-employed, or retired have a lower Borrower APR.
- Starting from the USD1-24,999 range, the Borrower APR generally decreases with increasing income range.
- The low Borrower APR of the USD0 range might be because only 0.5% of the total data is in that range.
- The Not employed category has the highest average Borrower APR in the data set.
- The Borrower APR is generally lower for persons who are homeowners.
- The Borrower APR for the 36-month term is slightly lower than the Borrower APR of the other two terms.
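A minimal sketch of how the bivariate comparisons above could be checked with pandas, using a correlation matrix for the numeric pairs and a grouped mean for a categorical variable. The rows are invented for illustration, and the column names are assumed to match the data set.

```python
import pandas as pd

# Invented rows for illustration; they only mimic the direction of the findings.
loans = pd.DataFrame(
    {
        "BorrowerAPR": [0.10, 0.15, 0.20, 0.25, 0.30, 0.35],
        "BorrowerRate": [0.09, 0.14, 0.18, 0.23, 0.28, 0.32],
        "LoanOriginalAmount": [25000, 20000, 15000, 10000, 4000, 2000],
        "EmploymentStatus": [
            "Employed", "Employed", "Employed",
            "Not employed", "Not employed", "Not employed",
        ],
    }
)

# Pairwise Pearson correlations between the numeric variables.
corr = loans[["BorrowerAPR", "BorrowerRate", "LoanOriginalAmount"]].corr()

# Average APR within each employment status.
mean_apr_by_status = loans.groupby("EmploymentStatus")["BorrowerAPR"].mean()

print(corr.round(2))
print(mean_apr_by_status)
```

In the notebook these numbers would typically be paired with a heatmap or box plot; the tables alone are enough to confirm the direction of each relationship.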
Here are the findings I got from my multivariate explorations:
- There is a positive correlation between the Borrower APR and Original Loan Amount when the data is grouped by credit grade.
- The positive correlation weakens from grade AA to grade E but spikes at grade HR.
- The 36-month term generally has lower Borrower APR for the Employed, Retired, and Self-employed Employment Statuses
- The Not employed employment status has a lower APR for the 60-month loan term, probably to offer more time to pay, since that category has no regular source of income.
- It can also be seen that the credit grade is a more prominent factor in determining the borrower APR as borrowers with higher credit grades will always get lower borrower APR regardless of their income range.
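The grouped comparisons above can be sketched as follows: a correlation computed separately within each credit grade, and a pivot table of mean APR by credit grade and income range. The rows are made up, and the column name `CreditGrade` is an assumption about how the grade is stored.

```python
import pandas as pd

# Made-up rows for illustration only.
loans = pd.DataFrame(
    {
        "CreditGrade": ["AA", "AA", "AA", "HR", "HR", "HR"],
        "IncomeRange": [
            "USD25,000-49,999", "USD50,000-74,999", "USD50,000-74,999",
            "USD25,000-49,999", "USD25,000-49,999", "USD50,000-74,999",
        ],
        "LoanOriginalAmount": [5000, 10000, 15000, 1000, 2000, 3000],
        "BorrowerAPR": [0.08, 0.09, 0.10, 0.30, 0.32, 0.35],
    }
)

# Pearson correlation between APR and loan amount, computed within each grade.
per_grade_corr = {
    grade: group["BorrowerAPR"].corr(group["LoanOriginalAmount"])
    for grade, group in loans.groupby("CreditGrade")
}

# Mean APR for every combination of credit grade and income range.
apr_table = loans.pivot_table(
    index="CreditGrade", columns="IncomeRange", values="BorrowerAPR", aggfunc="mean"
)

print(per_grade_corr)
print(apr_table)
```

Reading the pivot table row by row shows whether a better grade keeps the APR lower within every income range, which is the comparison behind the last finding.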
I'll be presenting the trend that answers the question "what factors affect Borrower APR". This will be the flow of visualizations in the presentation:
- Borrower APR Distribution
- Credit Grade Distribution
- Income Range Distribution
- Borrower APR Vs. Credit Grade
- Borrower APR Vs. Income Range
- Borrower APR Vs. Credit Grade & Income Range
The presentation can be found in the main branch, tagged Part_II_slide_deck_template.slides.html
To make a contribution:
- Fork the repo
- Make Changes
- Send your pull request for review
Show Love by giving the Repo a star...😇