This is a tabular data competition. The provided dataset is synthetic: it was generated by a deep learning model so that its feature distributions are similar to those of the original sales dataset. The synthetic data closely resembles the original data, but there are slight discrepancies.
Uniquely, you're allowed to use the original sales data in conjunction with the provided dataset. This will allow you to explore the differences between the two and see if incorporating the real data improves your prediction model's performance.
- train.csv - the provided training dataset, with the real-valued target Item_Outlet_Sales;
- test.csv - the test dataset; you must predict Item_Outlet_Sales for each row id;
- sample_submission.csv - a sample submission file in the correct format.
The evaluation metric for this competition is Root Mean Squared Log Error (RMSLE).
The RMSLE is calculated as:
\[ \text{RMSLE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \log(1 + \hat{y_i}) - \log(1 + y_i) \right)^2} \]
where:
- \( n \) is the total number of observations;
- \( y_i \) is the actual value of the target for observation \( i \);
- \( \hat{y_i} \) is the predicted value of the target for observation \( i \);
- \( \log \) is the natural logarithm.
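To check predictions locally before submitting, the metric can be sketched in a few lines of Python. This is a minimal illustration and not the competition's official scorer: it assumes the common form of RMSLE in which 1 is added to both values before taking the natural logarithm, and the function name `rmsle` is chosen here for illustration.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Log Error with natural log of (1 + value).

    Assumption: the log(1 + x) form is used, so zero targets are valid.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")
    # Squared difference of log(1 + prediction) and log(1 + actual), per row.
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(actual)) ** 2
        for actual, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Perfect predictions give an error of zero.
print(rmsle([2125.0, 310.5, 998.0], [2125.0, 310.5, 998.0]))
```

Because the error is measured on a log scale, the metric penalizes relative rather than absolute differences, so predicting 200 for a true value of 100 costs about as much as predicting 2000 for a true value of 1000.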
For every id in the dataset, you should predict the real-valued target Item_Outlet_Sales.
The file should contain a header and have the following format:
id,Item_Outlet_Sales
378428,2125
378429,2125
378430,2125
17 June, 2024 - Start of Kaggle Competition
17 June, 2024, 17:00 (Kyiv timezone) - Intro Webinar about Kaggle Competition
25 June, 2024, 17:00 (Kyiv timezone) - Q&A session about Kaggle
1 July, 2024, 23:59 (UTC) - End of Kaggle Competition
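As a closing illustration of the submission format described earlier (a header row followed by one `id,Item_Outlet_Sales` row per test id), the sketch below writes such a file with Python's standard `csv` module. The helper name `write_submission` and the constant prediction of 2125 are placeholders for illustration only; an in-memory buffer stands in for a real output file.

```python
import csv
import io


def write_submission(ids, predictions, out_file):
    """Write a header plus one 'id,Item_Outlet_Sales' row per prediction."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["id", "Item_Outlet_Sales"])
    for row_id, prediction in zip(ids, predictions):
        writer.writerow([row_id, prediction])


# Hypothetical predictions for three test ids, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([378428, 378429, 378430], [2125, 2125, 2125], buffer)
print(buffer.getvalue())
```

To produce an actual file, pass a handle opened with `open("submission.csv", "w", newline="")` instead of the buffer.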
