Explore a real-world hotel booking dataset using pandas and NumPy.
Run interactive tasks, answer business questions, and extend the notebook with your own analysis.
- 📊 End-to-end exploratory data analysis (EDA) on hotel bookings.
- 🧹 Missing data detection and cleanup (e.g., dropping the sparse `company` column).
- 💵 Revenue-focused metrics: ADR, total stay cost, average spend.
- 🧍 Guest behavior insights: repeat guests, length of stay, children/babies.
- 🌍 Demographics: country distributions and most common last names.
- 📅 Arrival patterns by day of month and day of week.
1. Clone the repo: `git clone https://github.com/rivu-intel45/hotel-booking-analytics-pandas.git`
2. Enter the project directory (by default, `git clone` names it after the repository): `cd hotel-booking-analytics-pandas`
3. (Optional) Create a virtual environment: `python -m venv venv`
4. Activate it on Windows: `venv\Scripts\activate`
5. Activate it on macOS / Linux: `source venv/bin/activate`
6. Install dependencies: `pip install -r requirements.txt`
7. Launch Jupyter.
Then open hotel-data-pandas.ipynb to start exploring.
- Inspect shape, column types, and memory usage of the dataset.
- Detect missing values and identify the column with the most missing data (`company`).
- Drop or handle columns with excessive missing data.
- Create helper columns like total stay nights (weeknights + weekend nights).
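The data tasks above can be sketched roughly as follows. This is a minimal illustration on a tiny made-up DataFrame, not the notebook's actual code. The column names (`stays_in_week_nights`, `stays_in_weekend_nights`, `country`) are assumptions based on the public hotel bookings CSV and may differ from the ones in your copy of the data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset; all values are invented for illustration.
# Column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "stays_in_week_nights": [2, 3, 1, 5],
    "stays_in_weekend_nights": [1, 0, 2, 2],
    "company": [np.nan, np.nan, 40.0, np.nan],
    "country": ["PRT", "GBR", None, "PRT"],
})

print(df.shape)                    # (rows, columns)
print(df.dtypes)                   # column types
print(df.memory_usage(deep=True))  # bytes used per column

# Count missing values per column and find the column with the most.
missing = df.isna().sum()
most_missing = missing.idxmax()

# Drop the sparsest column, then add a total-nights helper column.
cleaned = df.drop(columns=[most_missing])
cleaned["total_nights"] = (
    cleaned["stays_in_week_nights"] + cleaned["stays_in_weekend_nights"]
)
```

Dropping is only one option. For columns with fewer gaps, filling with a placeholder value may be a better fit.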
- What percentage of bookings are from repeat guests (`is_repeated_guest`)?
- What is the mean ADR (average daily rate) across all stays?
- What is the average stay length and total revenue per booking?
- Who paid the highest ADR and how much was it?
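One possible way to answer the revenue and guest questions above is sketched here on invented data. The `name` and `adr` column names, and the definition of revenue as ADR multiplied by total nights, are assumptions for illustration. The notebook may compute these differently.

```python
import pandas as pd

# Invented sample rows; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "name": ["Ana Silva", "Ben Jones", "Cara Silva", "Dan Brown"],
    "is_repeated_guest": [0, 1, 0, 0],
    "adr": [100.0, 80.0, 120.0, 60.0],
    "stays_in_week_nights": [2, 3, 1, 5],
    "stays_in_weekend_nights": [1, 0, 2, 2],
})

# Share of bookings made by repeat guests (the flag is 0 or 1).
repeat_pct = 100 * df["is_repeated_guest"].mean()

# Mean average daily rate across all stays.
mean_adr = df["adr"].mean()

# Stay length and an assumed per-booking revenue: ADR times nights stayed.
df["total_nights"] = df["stays_in_week_nights"] + df["stays_in_weekend_nights"]
df["revenue"] = df["adr"] * df["total_nights"]
avg_nights = df["total_nights"].mean()
avg_revenue = df["revenue"].mean()

# Guest with the highest ADR and the amount paid.
top_payer = df.loc[df["adr"].idxmax(), ["name", "adr"]]
```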
- Top 5 most common guest last names.
- Top 5 guest country codes.
- How many arrivals happened between the 1st and 15th of each month?
- Arrivals by weekday (Monday, Tuesday, etc.).
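The demographic and arrival questions above might be approached as in the sketch below. The sample rows are invented, and the column names (`name`, `country`, `arrival_date_year`, `arrival_date_month`, `arrival_date_day_of_month`) are assumptions. It also assumes that the last name is the final whitespace-separated token of `name`, which will not hold for every guest.

```python
import pandas as pd

# Invented sample rows; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "name": ["Ana Silva", "Ben Jones", "Cara Silva", "Dan Brown"],
    "country": ["PRT", "GBR", "PRT", "ESP"],
    "arrival_date_year": [2015, 2015, 2016, 2017],
    "arrival_date_month": ["July", "July", "August", "January"],
    "arrival_date_day_of_month": [1, 20, 15, 2],
})

# Top 5 last names, taking the final token of the full name.
top_names = df["name"].str.split().str[-1].value_counts().head(5)

# Top 5 country codes.
top_countries = df["country"].value_counts().head(5)

# Arrivals on days 1 through 15 (between() is inclusive on both ends).
first_half = int(df["arrival_date_day_of_month"].between(1, 15).sum())

# Build a real date from the three arrival columns, then count by weekday.
arrival_date = pd.to_datetime(
    df["arrival_date_year"].astype(str) + "-"
    + df["arrival_date_month"] + "-"
    + df["arrival_date_day_of_month"].astype(str),
    format="%Y-%B-%d",
)
by_weekday = arrival_date.dt.day_name().value_counts()
```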
Use these as prompts while editing or running the notebook:
- Change filters to analyze:
  - Only resort hotels vs city hotels.
  - Specific years (e.g., 2015 vs 2017).
  - Bookings from a specific country.
- Create your own metrics:
  - Revenue per night segment (short vs long stays).
  - Average ADR per month or season.
  - Impact of special requests on total cost.
- Extend the analysis:
  - Visualize distributions (histograms, bar charts, boxplots).
  - Compare repeat vs non-repeat guests on key metrics.
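A few of these prompts can be sketched with boolean filters, `groupby`, and `pd.cut`. The example below uses invented rows and assumed column names. The three-night cutoff between "short" and "long" stays is an arbitrary choice for illustration.

```python
import pandas as pd

# Invented sample rows; column names are assumed, not taken from the notebook.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "arrival_date_month": ["July", "July", "August", "August"],
    "is_repeated_guest": [0, 1, 0, 0],
    "adr": [100.0, 80.0, 120.0, 60.0],
    "total_nights": [3, 3, 3, 7],
})
df["revenue"] = df["adr"] * df["total_nights"]

# Filter: keep only resort hotel bookings.
resort = df[df["hotel"] == "Resort Hotel"]

# Metric: average ADR per arrival month.
adr_by_month = df.groupby("arrival_date_month")["adr"].mean()

# Metric: mean revenue for short (up to 3 nights) vs long stays.
df["stay_segment"] = pd.cut(
    df["total_nights"],
    bins=[0, 3, float("inf")],
    labels=["short", "long"],
    include_lowest=True,  # keep zero-night bookings in the "short" bin
)
revenue_by_segment = df.groupby("stay_segment", observed=True)["revenue"].mean()

# Comparison: repeat vs non-repeat guests on key metrics.
by_repeat = df.groupby("is_repeated_guest")[["adr", "total_nights"]].mean()
```

The same pattern extends to years or countries by swapping the filter column or the `groupby` key.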
- numpy
- pandas
- matplotlib
- seaborn
- jupyter
- Add interactive widgets with `ipywidgets` to filter by year, country, or hotel type.
- Turn key tasks into functions so others can easily reuse the analysis.
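As a rough sketch of the "turn tasks into functions" idea, a few summary metrics could be wrapped behind optional filters. The function name `summarize_bookings`, its parameters, and the column names are all hypothetical and do not come from the notebook.

```python
from typing import Optional

import pandas as pd


def summarize_bookings(
    df: pd.DataFrame,
    hotel: Optional[str] = None,
    country: Optional[str] = None,
) -> pd.Series:
    """Return a few headline metrics, optionally filtered (hypothetical helper)."""
    subset = df
    if hotel is not None:
        subset = subset[subset["hotel"] == hotel]
    if country is not None:
        subset = subset[subset["country"] == country]
    return pd.Series({
        "bookings": len(subset),
        "mean_adr": subset["adr"].mean(),
        "repeat_guest_pct": 100 * subset["is_repeated_guest"].mean(),
    })


# Invented sample rows; column names are assumed.
df = pd.DataFrame({
    "hotel": ["Resort Hotel", "City Hotel", "City Hotel", "Resort Hotel"],
    "country": ["PRT", "GBR", "PRT", "ESP"],
    "adr": [100.0, 80.0, 120.0, 60.0],
    "is_repeated_guest": [0, 1, 0, 0],
})

city_summary = summarize_bookings(df, hotel="City Hotel")
```

Inside a notebook, a function shaped like this could be passed to `ipywidgets.interact` with a list of hotel types to get a dropdown filter. That wiring is left out here so the example runs without Jupyter.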