This project analyzes historical Walmart weekly sales data to understand sales patterns, seasonality, and the impact of holidays. The objective is to build an end-to-end data science workflow that includes exploratory data analysis, statistical insights, and time series forecasting to predict sales for the next 12 weeks.
The project demonstrates how time series forecasting techniques can be applied in a real-world retail business context to support demand planning and operational decision-making.
- Dataset: Walmart Weekly Sales Data
- Time Period: February 2010 – October 2012
- Total Records: 6,435
- Total Stores: 45
- Store: Store identifier
- Date: Week ending date
- Weekly_Sales: Weekly sales amount
- Holiday_Flag: Holiday indicator
- Temperature: Average weekly temperature
- Fuel_Price: Fuel price
- CPI: Consumer Price Index
- Unemployment: Unemployment rate
- Programming Language: Python
- Data Analysis: Pandas, NumPy
- Visualization: Matplotlib, Seaborn
- Statistics & EDA: Descriptive statistics, correlation analysis
- Time Series Forecasting: Statsmodels
- Model Used: Holt-Winters Exponential Smoothing
- Environment: Google Colab / Jupyter Notebook
The dataset was explored to understand its structure, quality, and underlying patterns.
- Checked data types and missing values
- Converted date column to datetime format
- Generated statistical summaries for numerical features
- Visualized sales distribution using histograms and boxplots
- Identified valid high-value outliers
- No missing values were found in the dataset
- Weekly sales distribution is right-skewed
- High sales outliers are associated with holidays and large stores
- Outliers were retained as they represent genuine business events
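The checks listed above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the frame is synthetic (three hypothetical stores with log-normal sales), the `%d-%m-%Y` date format is an assumption about the raw file, and the 1.5 × IQR fence is the usual boxplot rule, assumed here as the outlier criterion.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data; only the column names come from the dataset.
rng = np.random.default_rng(0)
n_weeks = 143  # 6,435 records / 45 stores
dates = pd.date_range("2010-02-05", periods=n_weeks, freq="7D")
df = pd.DataFrame({
    "Store": np.repeat([1, 2, 3], n_weeks),
    "Date": list(dates.strftime("%d-%m-%Y")) * 3,  # assumed raw string format
    "Weekly_Sales": rng.lognormal(mean=13.8, sigma=0.4, size=3 * n_weeks),
})

# Data types and missing values
print(df.dtypes)
missing = df.isna().sum()

# Convert the date column from string to datetime
df["Date"] = pd.to_datetime(df["Date"], format="%d-%m-%Y")

# Statistical summary and skewness of weekly sales
summary = df["Weekly_Sales"].describe()
skewness = df["Weekly_Sales"].skew()  # > 0 means right-skewed

# Flag high-value outliers with the boxplot rule (assumed criterion)
q1, q3 = df["Weekly_Sales"].quantile([0.25, 0.75])
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = df[df["Weekly_Sales"] > upper_fence]

print(summary)
print(f"skewness: {skewness:.2f}, high outliers: {len(outliers)}")
```

Flagged rows are only inspected here, not dropped, which matches the decision to retain outliers as genuine business events.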
Average weekly sales were compared between holiday and non-holiday weeks.
- Non-holiday average sales: ~1.04 million
- Holiday average sales: ~1.12 million
Holiday weeks show approximately 8% higher average sales than non-holiday weeks, indicating that holidays have a measurable positive effect on retail demand.
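The comparison amounts to a group-wise mean followed by a percentage difference. The sketch below uses six made-up rows, chosen only so that the group means reproduce the rounded averages quoted above; they are not values from the dataset.

```python
import pandas as pd

# Toy rows (hypothetical values) whose group means match the rounded averages above.
df = pd.DataFrame({
    "Holiday_Flag": [0, 0, 0, 0, 1, 1],
    "Weekly_Sales": [1.00e6, 1.08e6, 1.02e6, 1.06e6, 1.10e6, 1.14e6],
})

# Mean weekly sales for non-holiday (0) and holiday (1) weeks
avg = df.groupby("Holiday_Flag")["Weekly_Sales"].mean()

# Relative uplift of holiday weeks over non-holiday weeks, in percent
uplift_pct = (avg.loc[1] / avg.loc[0] - 1) * 100
print(f"Holiday uplift: {uplift_pct:.1f}%")
```

With the rounded averages, the uplift is (1.12 − 1.04) / 1.04, or roughly 7.7%, which rounds to the 8% reported above.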
Correlation analysis was conducted between weekly sales and external variables such as temperature, fuel price, CPI, and unemployment.
- Weekly sales have weak correlation with macroeconomic variables
- Slight negative correlation with unemployment and CPI
- Sales are primarily driven by seasonal and store-level factors
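A correlation check of this kind might look like the sketch below. The data are randomly generated placeholders, so the printed coefficients do not reproduce the findings above, and the use of Pearson correlation is an assumption; only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Random placeholder data; the real analysis would use the loaded dataset.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Weekly_Sales": rng.normal(1.05e6, 2e5, n),
    "Temperature": rng.normal(60, 18, n),
    "Fuel_Price": rng.normal(3.4, 0.4, n),
    "CPI": rng.normal(170, 40, n),
    "Unemployment": rng.normal(8, 1.8, n),
})

# Pearson correlation of each external variable with weekly sales
corr_with_sales = (
    df.corr(method="pearson")["Weekly_Sales"]
    .drop("Weekly_Sales")
    .sort_values()
)
print(corr_with_sales)
```

Coefficients near zero would indicate a weak linear relationship, and a negative sign indicates that sales tend to fall as the variable rises.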
Store-level time series plots revealed clear seasonal patterns in weekly sales, with recurring peaks during holiday periods. This confirmed the suitability of seasonal time series models for forecasting.
Holt-Winters Exponential Smoothing was applied to model trend and seasonality in weekly sales data.
- Built individual models for each of the 45 stores
- Used a seasonal period of 52 weeks
- Generated 12-week sales forecasts for all stores
The forecasts project higher sales in the weeks that fall within upcoming holiday periods, which gives a concrete signal for short-term planning.
- Successfully generated store-wise 12-week sales forecasts
- Identified strong seasonality and holiday-driven demand
- Insights support:
  - Inventory optimization
  - Workforce planning
  - Promotion scheduling
This project demonstrates a complete end-to-end data science workflow for retail sales forecasting. The Holt-Winters model captured the seasonal patterns and trends in Walmart sales data, which makes it a reasonable baseline for short-term demand forecasting, although its accuracy has not yet been evaluated through backtesting.
The results can help retail businesses make data-driven decisions to improve operational efficiency and customer satisfaction.
- Perform backtesting to evaluate forecast accuracy
- Compare Holt-Winters with ARIMA or machine learning models
- Incorporate promotional and regional factors
- Build an interactive dashboard for real-time forecasting
Masood Manzoor Ahmed
Data Science & Machine Learning Enthusiast