TokenRR/ML_competition_2024_for_ukrainians

Private Kaggle Competition 2024 for Ukrainians

About the competition

This is a tabular data competition. The provided dataset is synthetic: it was generated by a deep learning model so that its feature distributions are similar to those of the original sales dataset. The synthetic data closely resembles the original, but there are slight discrepancies.

Uniquely, you're allowed to use the original sales data in conjunction with the provided dataset. This will allow you to explore the differences between the two and see if incorporating the real data improves your prediction model's performance.
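One way to start exploring the differences between the two datasets is sketched below. It is only an illustration under stated assumptions: the rows are toy values, not competition data, and the helper names `tag_and_combine` and `mean_gap` as well as the `is_original` flag are hypothetical. The idea is to mark where each row came from before combining, so a model (or a quick check) can tell the sources apart.

```python
import statistics


def tag_and_combine(synthetic_rows, original_rows):
    """Concatenate both sources, marking each row with an `is_original` flag."""
    combined = []
    for row in synthetic_rows:
        combined.append({**row, "is_original": 0})
    for row in original_rows:
        combined.append({**row, "is_original": 1})
    return combined


def mean_gap(synthetic_rows, original_rows, column):
    """Difference between the synthetic and original mean of one numeric column."""
    synthetic_mean = statistics.fmean(row[column] for row in synthetic_rows)
    original_mean = statistics.fmean(row[column] for row in original_rows)
    return synthetic_mean - original_mean


# Toy rows for illustration only (assumed values, not taken from the competition files).
synthetic = [{"Item_Outlet_Sales": 100.0}, {"Item_Outlet_Sales": 200.0}]
original = [{"Item_Outlet_Sales": 120.0}, {"Item_Outlet_Sales": 240.0}]

combined = tag_and_combine(synthetic, original)
gap = mean_gap(synthetic, original, "Item_Outlet_Sales")
```

Whether adding the original rows actually helps is an empirical question; comparing validation scores with and without them is the usual way to find out.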

Files

  • train.csv - the provided training data set, with real-valued target Item_Outlet_Sales;
  • test.csv - the test dataset; you must predict Item_Outlet_Sales for each row id;
  • sample_submission.csv - a sample submission file in the correct format.

Evaluation Metric

The evaluation metric for this competition is Root Mean Squared Log Error.

The RMSLE is calculated as:

$$ RMSLE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\log{(y_i + 1)} - \log{(\hat{y_i} + 1)})^2} $$

where:

  • $n$ is the total number of observations;
  • $y_i$ is the actual value of the target for observation $i$;
  • $\hat{y_i}$ is the predicted value of the target for observation $i$;
  • $\log$ is the natural logarithm.
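The formula above can be checked locally with a short function. This is a minimal sketch using only the standard library, not the official Kaggle scorer; the function name `rmsle` and the input validation are assumptions made for illustration.

```python
import math


def rmsle(y_true, y_pred):
    """Root Mean Squared Log Error, following the formula above."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")

    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        # This sketch rejects negative values, for which log(y + 1) is
        # either undefined or not meaningful for a sales target.
        if actual < 0 or predicted < 0:
            raise ValueError("RMSLE expects non-negative values")
        # math.log1p(x) computes log(x + 1) with the natural logarithm.
        diff = math.log1p(actual) - math.log1p(predicted)
        total += diff * diff
    return math.sqrt(total / len(y_true))
```

Because the error is computed on log-transformed values, the metric penalises relative differences rather than absolute ones: predicting 200 when the actual value is 100 costs about as much as predicting 2000 when the actual value is 1000.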

Submission Format

For every id in the test dataset, you should predict the real-valued target Item_Outlet_Sales.

The file should contain a header and have the following format:

id,Item_Outlet_Sales
378428,2125
378429,2125
378430,2125
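A file in this format can be produced with Python's built-in `csv` module. The sketch below is an illustration only: the helper name `write_submission` is hypothetical, and the ids and the constant prediction 2125 are simply the values from the sample above, not real model output. It writes to an in-memory buffer; passing a file opened with `open(path, "w", newline="")` would work the same way.

```python
import csv
import io


def write_submission(rows, out):
    """Write (id, prediction) pairs to `out` with the required header."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "Item_Outlet_Sales"])
    for row_id, prediction in rows:
        writer.writerow([row_id, prediction])


# Constant baseline using the ids and value shown in the sample format.
buffer = io.StringIO()
write_submission([(378428, 2125), (378429, 2125), (378430, 2125)], buffer)
submission_text = buffer.getvalue()
```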

Timing

17 June, 2024 - Start of Kaggle Competition
17 June, 2024, 17:00 (Kyiv timezone) - Intro Webinar about Kaggle Competition
25 June, 2024, 17:00 (Kyiv timezone) - Q&A session about Kaggle
1 July, 2024, 23:59 (UTC) - End of Kaggle Competition
