RecSys'24 Competition - BlackPearl Team Solution

Introduction

The Ekstra Bladet RecSys Challenge aims to predict which article a user will click on from a list of articles that were seen during a specific impression. Utilizing the user's click history, session details (like time and device used), and personal metadata (including gender and age), along with a list of candidate news articles listed in an impression log, the challenge's objective is to rank the candidate articles based on the user's personal preferences. This involves developing models that encapsulate both the users and the articles through their content and the users' interests. The models are to estimate the likelihood of a user clicking on each article by evaluating the compatibility between the article's content and the user's preferences. The articles are ranked based on these likelihood scores, and the precision of these rankings is measured against the actual selections made by users.
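As a minimal illustration of the ranking step described above, the sketch below orders the candidates of one impression by predicted click likelihood. The function name `rank_candidates`, the article ids, and the scores are all hypothetical and are not taken from the repository.

```python
from typing import Dict, List


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Return article ids ordered from most to least likely to be clicked."""
    return sorted(scores, key=lambda article_id: scores[article_id], reverse=True)


# Hypothetical impression: ids and scores are made up for illustration.
impression_scores = {"a1": 0.12, "a2": 0.57, "a3": 0.31}
ranking = rank_candidates(impression_scores)
```

Only the relative order of scores within an impression matters here, which is why ranking models with uncalibrated outputs can still be used.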

Brief Solution

Through data analysis, we found two very strong features: the length of the news list displayed in an impression, and hourly-granularity news exposure data within the same-day exposure interval. We derived a series of related features from these and combined them with features based on users' historical behavior to train tree models and neural network models. Our solution is a simple weighted combination of two tree models (CatBoost with a pair-rank loss and CatBoost with a query loss) and a DIN model, which achieves a score of 0.8808 on the online test set. The best single tree model was CatBoost with the query loss, with an offline GAUC of 0.866 and an online score of 0.879. The DIN model had an offline GAUC of 0.856 and an online score of 0.873.

| Model | Offline Score | Online Score |
| --- | --- | --- |
| DIN | 0.856 | 0.873 |
| Cat Pair Rank | 0.860 | 0.873 |
| Cat Query Loss | 0.866 | 0.879 |
| Blend | 0.8678 | 0.8808 |
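The sketch below shows one way a weighted blend and a GAUC-style score could be computed. It is not the team's code. It assumes that scores are rank-normalized within each impression before weighting (so that models with different output scales are comparable) and that GAUC is the unweighted mean of per-impression AUC. The weights, model keys, and scores are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def rank_normalize(scores: Sequence[float]) -> List[float]:
    """Map scores to [0, 1] by rank; tied scores get arbitrary adjacent ranks."""
    n = len(scores)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    for rank, idx in enumerate(order):
        ranks[idx] = rank / (n - 1)
    return ranks


def blend(model_scores: Dict[str, Sequence[float]],
          weights: Dict[str, float]) -> List[float]:
    """Weighted sum of rank-normalized scores for one impression."""
    n = len(next(iter(model_scores.values())))
    blended = [0.0] * n
    for name, scores in model_scores.items():
        for i, r in enumerate(rank_normalize(scores)):
            blended[i] += weights[name] * r
    return blended


def auc(labels: Sequence[int], scores: Sequence[float]) -> Optional[float]:
    """Share of (clicked, non-clicked) pairs ordered correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def gauc(impressions: Sequence[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Unweighted mean of per-impression AUC, skipping undefined impressions."""
    values = [a for a in (auc(l, s) for l, s in impressions) if a is not None]
    return sum(values) / len(values)


# Hypothetical weights and scores for one impression with three candidates.
weights = {"din": 0.2, "cat_pair": 0.3, "cat_query": 0.5}
model_scores = {
    "din": [0.1, 0.8, 0.3],
    "cat_pair": [-1.2, 0.4, 2.0],
    "cat_query": [0.0, 1.5, 0.7],
}
labels = [0, 1, 0]  # the second candidate was clicked
blended = blend(model_scores, weights)
score = gauc([(labels, blended)])
```

In practice the blend weights would be tuned on the offline validation split. The values above are placeholders only.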

Requirements

Hardware Requirements: 8 x A100 GPUs + 1TB of memory
Software Requirements:
```
pip install -r requirements.txt
```

Data Preparation

Create an "inputs" directory in the working directory, and then create "large" and "vectors" subdirectories within it. First, unzip the five vector zip files and place them in the "vectors" subdirectory. Unzip the "ebnerd_large.zip" file and place it in the "large" directory. Also, unzip the "ebnerd_testset.zip" file and place it in the "large" directory as well.
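The directory layout above can also be created programmatically. The following is a sketch under assumptions: the helper `prepare_inputs` is hypothetical, the names of the five vector zip files are not given here and must be supplied by the caller, and each archive is assumed to extract directly into its target directory.

```python
from pathlib import Path
from typing import Iterable, Union
import zipfile

PathLike = Union[str, Path]


def prepare_inputs(work_dir: PathLike,
                   vector_zips: Iterable[PathLike],
                   dataset_zips: Iterable[PathLike]) -> Path:
    """Create inputs/large and inputs/vectors, then extract the archives into them."""
    inputs = Path(work_dir) / "inputs"
    large_dir = inputs / "large"
    vectors_dir = inputs / "vectors"
    large_dir.mkdir(parents=True, exist_ok=True)
    vectors_dir.mkdir(parents=True, exist_ok=True)

    # Vector archives go to inputs/vectors.
    for archive in vector_zips:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(vectors_dir)

    # Dataset archives (ebnerd_large.zip, ebnerd_testset.zip) go to inputs/large.
    for archive in dataset_zips:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(large_dir)

    return inputs
```

A call such as `prepare_inputs(".", vector_zip_paths, ["ebnerd_large.zip", "ebnerd_testset.zip"])` would then produce the layout, where `vector_zip_paths` is a list you fill in with the five downloaded vector archives.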

Feature Engineering

Switch to the 'code/Feature_Engineer' subdirectory and execute the following command:

sh FE.sh

Train Tree Models

Switch to the 'code' subdirectory and execute the following command:

sh train.sh

NN Model

Switch to the 'code/nn' subdirectory and execute the following commands:

sh gen_sample.sh
sh run.sh

Blend and Submit

sh submit.sh

If you have any questions, feel free to contact us.
