Flight Prices Exploratory Data Analysis
Author: Ruchika Rawat
In this section I explore the flight pricing data and analyse which features of the chosen dataset influence the price. I also present the correlation of the features with each other and with the target feature, flight price.
Why should anyone care about this question? Flight pricing is very competitive, with a large number of carriers in each sector. This kind of analysis could be helpful in finding opportunities to improve revenue and reduce loss of business due to high prices.
What are you trying to answer? What is the correct price for a flight from a given source to destination?
What data will you use to answer your question? I use a popular free dataset from Kaggle: https://www.kaggle.com/datasets/nikhilmittal/flight-fare-prediction-mh
What methods are you using to answer the question? The first step, tackled in this notebook, is exploratory data analysis:
- Drop all rows with null data
- Drop price column from data and store in a target dataframe Y
- Split data into train and test subsets
- Utilize various plotting and summarization methods in seaborn and matplotlib to develop an understanding of the data
- Convert Date_of_Journey from string to datetime
- Compute duration of flight in hours and minutes
- Convert stops from string to int
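The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up, the column names and value formats (`24/03/2019`, `2h 50m`, `non-stop`, `1 stop`) are assumptions about how the Kaggle file is laid out, and the helper names are hypothetical. The split uses plain pandas sampling as a stand-in for whatever splitting utility the notebook uses.

```python
import pandas as pd

# Made-up rows; column names and string formats are assumptions.
raw = pd.DataFrame({
    "Airline": ["Jet Airways", "IndiGo", "Air India", "Jet Airways", "SpiceJet"],
    "Date_of_Journey": ["24/03/2019", "01/05/2019", "09/06/2019", "12/05/2019", "01/03/2019"],
    "Duration": ["2h 50m", "7h 25m", "19h", "5h 25m", "45m"],
    "Total_Stops": ["non-stop", "2 stops", "2 stops", "1 stop", None],
    "Price": [3897, 7662, 13882, 6218, 13302],
})


def parse_duration(text):
    """Turn a string such as '2h 50m', '19h' or '45m' into (hours, minutes)."""
    hours = minutes = 0
    for part in text.split():
        if part.endswith("h"):
            hours = int(part[:-1])
        elif part.endswith("m"):
            minutes = int(part[:-1])
    return hours, minutes


def stops_to_int(text):
    """Map 'non-stop' to 0 and 'N stop(s)' to N."""
    return 0 if text == "non-stop" else int(text.split()[0])


def preprocess(df):
    """Drop null rows and convert the string columns to usable types."""
    df = df.dropna().copy()
    df["Date_of_Journey"] = pd.to_datetime(df["Date_of_Journey"], format="%d/%m/%Y")
    parsed = df["Duration"].apply(parse_duration)
    df["Duration_hours"] = [h for h, _ in parsed]
    df["Duration_mins"] = [m for _, m in parsed]
    df["Total_Stops"] = df["Total_Stops"].apply(stops_to_int).astype(int)
    return df.drop(columns=["Duration"])


clean = preprocess(raw)

# Separate the target (price) from the features.
y = clean["Price"]
X = clean.drop(columns=["Price"])

# Simple random split: 75% train, the rest test.
X_train = X.sample(frac=0.75, random_state=42)
X_test = X.drop(X_train.index)
y_train, y_test = y.loc[X_train.index], y.loc[X_test.index]
```

Here the type conversions run before the split so that both subsets share the same columns; the row with a missing stop count is removed by the null-row step.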
What did your research find? I plotted various features and concluded the following:
- Using a barplot of flights by airline - the majority of the data is for Jet Airways
- Using a barplot of flights by month - all of the data spans just 4 months
- Using a distribution of flight prices - the majority of flights have a price below 10000
- Using a correlation matrix of various features - it appears that the top two contributing factors to price are the number of stops and the duration
- Using a barplot of flights by route - the most common route found in this data is Delhi - Cochin
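The summaries behind these plots can be sketched as below. This is only an illustration on made-up rows, so the numbers it produces do not reproduce the findings above. The column names `Journey_month` and `Duration_minutes` and the function `plot_summaries` are hypothetical, assumed to come from an earlier cleaning step.

```python
import pandas as pd

# Made-up, already-cleaned rows; column names are assumptions.
flights = pd.DataFrame({
    "Airline": ["Jet Airways", "Jet Airways", "IndiGo", "Air India", "Jet Airways", "IndiGo"],
    "Route": ["Delhi - Cochin", "Delhi - Cochin", "Mumbai - Hyderabad",
              "Delhi - Cochin", "Kolkata - Delhi", "Mumbai - Hyderabad"],
    "Journey_month": [3, 5, 5, 6, 4, 6],
    "Total_Stops": [1, 2, 0, 2, 0, 0],
    "Duration_minutes": [600, 1140, 150, 1500, 170, 155],
    "Price": [9000, 13500, 4100, 14200, 5200, 3900],
})

# Counts that feed the bar plots.
airline_counts = flights["Airline"].value_counts()
month_counts = flights["Journey_month"].value_counts().sort_index()
route_counts = flights["Route"].value_counts()

# Share of flights priced under 10000 (what the price distribution shows).
share_under_10000 = (flights["Price"] < 10000).mean()

# Correlation of each numeric feature with price, strongest first.
numeric = flights[["Total_Stops", "Duration_minutes", "Price"]]
price_corr = numeric.corr()["Price"].drop("Price").sort_values(ascending=False)


def plot_summaries(df):
    """Draw the bar plots, price histogram and correlation heatmap."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, axes = plt.subplots(2, 2, figsize=(12, 8))
    counts = df["Airline"].value_counts()
    sns.barplot(x=counts.index, y=counts.values, ax=axes[0, 0])
    months = df["Journey_month"].value_counts().sort_index()
    sns.barplot(x=months.index, y=months.values, ax=axes[0, 1])
    sns.histplot(df["Price"], ax=axes[1, 0])
    corr = df[["Total_Stops", "Duration_minutes", "Price"]].corr()
    sns.heatmap(corr, annot=True, ax=axes[1, 1])
    fig.tight_layout()
    return fig
```

Calling `plot_summaries(flights)` draws the four panels in one figure; the tables computed above it can be printed directly when only the numbers are needed.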
What suggestions do you have for next steps? Deploy the model as an API and use it to predict the price of a flight.