I am using learning to rank in an e-commerce application. More precisely, I use the LambdaMART implementations in LightGBM and XGBoost.
I have a dataset of search queries and, for each query, the list of items that were displayed, together with the actions users took on them. The actions in my dataset are purchased, bookmarked, clicked and viewed. I want to use these actions to assign a relevance label to each item in my training dataset.
I could assign the following relevance scores: {purchased: 3, bookmarked: 2, clicked: 1, viewed: 0}. Whilst the magnitude of the scores does not change which pairs are built (only their order matters for that), it does change ΔNDCG, which weights each pair. So using {purchased: 10, bookmarked: 2, clicked: 1, viewed: 0} might help push purchased items higher in the ranking. Additionally, the data is very sparse (many items are only viewed), so the algorithm probably sees a lot of ties within each query, which does not help convergence.
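To make the ΔNDCG effect concrete, here is a minimal sketch of how I understand it. It assumes the common exponential gain 2^label - 1 and a 1/log2(rank + 1) discount; the helper names (`dcg_gain`, `discount`, `delta_ndcg`) and the toy result list are made up for illustration and are not taken from either library, whose actual gain and normalisation settings may differ.

```python
import math

def dcg_gain(label, exponential=True):
    # Assumed gain function: 2^label - 1 (exponential) or the raw label (linear).
    return 2.0 ** label - 1.0 if exponential else float(label)

def discount(rank):
    # Position discount for a 1-based rank.
    return 1.0 / math.log2(rank + 1)

def ideal_dcg(labels, exponential=True):
    # DCG of the best possible ordering, used to normalise.
    ordered = sorted(labels, reverse=True)
    return sum(dcg_gain(l, exponential) * discount(r)
               for r, l in enumerate(ordered, start=1))

def delta_ndcg(labels, i, j, exponential=True):
    # Absolute change in NDCG if the items at 0-based positions i and j swap.
    idcg = ideal_dcg(labels, exponential)
    if idcg == 0:
        return 0.0  # every item has zero gain: no pair carries any signal
    gain_diff = dcg_gain(labels[i], exponential) - dcg_gain(labels[j], exponential)
    disc_diff = discount(i + 1) - discount(j + 1)
    return abs(gain_diff * disc_diff) / idcg

# Hypothetical result list for one query, in the order it is currently ranked.
actions = ["viewed", "clicked", "purchased", "viewed"]

scheme_a = {"purchased": 3, "bookmarked": 2, "clicked": 1, "viewed": 0}
scheme_b = {"purchased": 10, "bookmarked": 2, "clicked": 1, "viewed": 0}

def pair_weights(scheme):
    labels = [scheme[a] for a in actions]
    purchase_pair = delta_ndcg(labels, 0, 2)  # top viewed item vs purchased item
    click_pair = delta_ndcg(labels, 0, 1)     # top viewed item vs clicked item
    return purchase_pair, click_pair

a_purchase, a_click = pair_weights(scheme_a)
b_purchase, b_click = pair_weights(scheme_b)

print("scheme A purchase/click weight ratio:", a_purchase / a_click)
print("scheme B purchase/click weight ratio:", b_purchase / b_click)
```

Under these assumptions a label of 10 gets a gain of 2^10 - 1 = 1023, so the purchased pair dominates and the clicked pair's weight shrinks to almost nothing after normalisation. That may be a stronger effect than I actually want. As far as I can tell, LightGBM has a `label_gain` parameter that sets the gain for each integer label directly, which might be a cleaner lever than inflating the labels themselves.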
Does anyone have a rigorous way of choosing relevance labels? And how do you handle the sparsity issue?