I followed the CRISP-DM process for this analysis:
- Business Understanding - I started the analysis with the posed questions in mind.
- Data Understanding - I went through the dataset and noted how each part could be used in the analysis, for example which columns would help answer a particular question.
- Prepare Data - At various points I had to wrangle and transform the data to get the results. Following the DRY principle, I also wrote a function to draw Plotly bar charts, since that code was repeated often.
- Model Data - The analysis does not include a modeling step. I might add one in future work.
- Results - I use visualizations such as bar charts and pie charts to convey my findings, and I added a result statement after every visualization to make the thought process easy to follow.
- Deploy - I am not deploying this code anywhere right now. For now, it is available only as a Jupyter notebook.
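As a rough illustration of the reusable bar chart function mentioned under Prepare Data, here is a minimal sketch. It is not the exact function from the notebook: the names `count_values` and `draw_bar_chart`, their parameters, and the `top_n` default are all assumptions made for this example.

```python
import pandas as pd


def count_values(df: pd.DataFrame, column: str, top_n: int = 10) -> pd.Series:
    """Return the top_n most frequent non-null values of `column` (hypothetical helper)."""
    return df[column].dropna().value_counts().head(top_n)


def draw_bar_chart(counts: pd.Series, title: str, x_label: str, y_label: str = "Count"):
    """Build a Plotly bar chart from a Series of counts (hypothetical helper)."""
    # Imported here so count_values can still be used before plotly is installed.
    import plotly.graph_objects as go

    fig = go.Figure(go.Bar(x=counts.index.tolist(), y=counts.values.tolist()))
    fig.update_layout(title=title, xaxis_title=x_label, yaxis_title=y_label)
    return fig
```

In a notebook, something like `draw_bar_chart(count_values(df, "SomeColumn"), "Title", "SomeColumn").show()` would render the chart, where `"SomeColumn"` stands in for whichever survey column is being explored.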
Install plotly first. Beyond that, no libraries are needed to run the code here other than those in the Anaconda distribution of Python. The code should run without issues on Python 3.*.
For this project, I was interested in using Stack Overflow data from 2017 to better understand:
- How other developers suggested breaking into the field (what education to pursue)?
- What factors about an individual contributed to salary?
- What was the state of bootcamps for assisting individuals with breaking into developer roles?
- How were bootcamps assisting with increasing diversity in tech careers?
- Which EmploymentStatus group had the highest average career satisfaction?
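The last question comes down to a group-by and a mean. The sketch below shows one way to do it with pandas. It assumes the satisfaction score lives in a numeric column named `CareerSatisfaction` (an assumption; only `EmploymentStatus` is named above), and the sample rows are invented purely for illustration, not taken from the survey.

```python
import pandas as pd


def mean_satisfaction_by_status(
    df: pd.DataFrame,
    group_col: str = "EmploymentStatus",
    value_col: str = "CareerSatisfaction",  # assumed column name
) -> pd.Series:
    """Average value_col per group_col, highest first.

    Rows missing either column are dropped before averaging.
    """
    return (
        df.dropna(subset=[group_col, value_col])
        .groupby(group_col)[value_col]
        .mean()
        .sort_values(ascending=False)
    )


# Invented sample rows, only to show the shape of the input.
sample = pd.DataFrame(
    {
        "EmploymentStatus": [
            "Employed full-time",
            "Employed full-time",
            "Employed part-time",
            "Independent contractor",
            None,
        ],
        "CareerSatisfaction": [8.0, 6.0, 7.5, None, 9.0],
    }
)

ranking = mean_satisfaction_by_status(sample)
print(ranking)
```

The first entry of `ranking` is the group with the highest average. Dropping rows with missing values up front is a design choice for this sketch; the notebook may handle missing data differently.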
One notebook is available here to showcase the work related to the above questions. The notebook explores the data pertaining to the questions named in its title. Markdown cells walk through the thought process for individual steps.
Data files: Download
The main findings of the code can be found at the Medium post available here.
Credit goes to Stack Overflow for the data. You can find the licensing for the data and other descriptive information at the Kaggle link available here. Otherwise, feel free to use the code here as you would like!