Summary of Matrix Factorization Tricks

From RecSysWiki

This page lists the tricks and components used in matrix factorization for collaborative filtering. Matrix factorization has many variants, which generally differ along three dimensions:

  • Predictor: how existing information is used to make a prediction
  • Loss function: whether to use square loss, log loss or max-margin loss
  • Task: whether to do rating prediction or rank prediction


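Predictor

The predictor is the function that turns the input information into a prediction. The basic predictor for matrix factorization is the sum of a global mean \mu, a user bias b_u, an item bias b_i, and the inner product of the user and item latent factor vectors p_u and q_i: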

y_{ui} = \mu + b_u+b_i+p_u^T q_i
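
The formula above can be evaluated directly for a single user-item pair. The following is a minimal sketch in plain Python; the function name `predict` and the toy parameter values are assumptions made for illustration and do not come from any particular library.

```python
def predict(mu, b_u, b_i, p_u, q_i):
    """Basic matrix factorization predictor: mu + b_u + b_i + p_u^T q_i.

    mu  : global mean rating (float)
    b_u : bias of user u (float)
    b_i : bias of item i (float)
    p_u : latent factor vector of user u (list of floats)
    q_i : latent factor vector of item i (list of floats)
    """
    if len(p_u) != len(q_i):
        raise ValueError("p_u and q_i must have the same number of factors")
    # Inner product of the two latent factor vectors.
    interaction = sum(p * q for p, q in zip(p_u, q_i))
    return mu + b_u + b_i + interaction


# Toy values, chosen only for illustration.
y_ui = predict(mu=3.5, b_u=0.25, b_i=-0.5, p_u=[1.0, 2.0], q_i=[0.5, 0.25])
print(y_ui)
```

In practice the parameters would be stored per user and per item and learned from data; here they are passed in explicitly to keep the sketch self-contained.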

List of predictor variants (try to add more):

Loss Function

The loss function specifies how the model is trained. It is largely independent of the predictor.

List of loss functions:

  • Square loss: most commonly used in collaborative filtering for rating prediction
  • Logistic log-likelihood loss: used for sigmoid matrix factorization; sometimes performs better than square loss
  • Hinge loss (or smoothed hinge loss): used for maximum margin matrix factorization
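
The losses listed above can each be written as a small function of a target and a predicted score. The sketch below is one possible formulation in plain Python. The function names are illustrative, the label conventions (0/1 for the logistic loss, -1/+1 for the hinge losses) are assumptions, and the smoothed hinge shown is one common piecewise-quadratic form rather than the only possible choice.

```python
import math


def square_loss(y_true, y_pred):
    """Squared error between a real-valued target and a prediction."""
    return (y_true - y_pred) ** 2


def logistic_loss(label, score):
    """Negative log-likelihood of label in {0, 1} under sigmoid(score)."""
    # z is the score signed so that larger z means a better fit.
    z = score if label == 1 else -score
    # log(1 + exp(-z)) written in a numerically stable form.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


def hinge_loss(label, score):
    """Hinge loss for label in {-1, +1}: zero once the margin reaches 1."""
    return max(0.0, 1.0 - label * score)


def smoothed_hinge_loss(label, score):
    """A smooth variant of the hinge loss for label in {-1, +1}.

    Linear for margin <= 0, quadratic for 0 < margin < 1, zero afterwards.
    """
    z = label * score
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return 0.5 - z
    return 0.5 * (1.0 - z) ** 2
```

Because each function only needs a target and a score, any of them can be combined with the same predictor, which is the sense in which the loss is largely independent of the predictor.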

Pairwise Rank Model

It is not hard to convert a rating prediction predictor into a pairwise rank model. We only need to follow three steps: (1) choose a predictor y; (2) choose a loss function for binary classification (either logistic loss or hinge loss); (3) train a classifier for pairwise order prediction, using y_{ui}-y_{uj} as the score when comparing item i to item j for user u. This idea is also referred to as Bayesian Personalized Ranking.
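
The three steps can be sketched with the basic predictor and the logistic loss. In the difference y_{ui}-y_{uj} the global mean and the user bias cancel, so only the item biases and the factor vectors remain. The code below is a minimal illustration in plain Python, not a reference implementation: the function names, the learning rate `lr`, the L2 regularization weight `reg`, and the toy values are all assumptions.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def pairwise_score(b_i, b_j, p_u, q_i, q_j):
    """y_ui - y_uj for the basic predictor (mu and b_u cancel out)."""
    return (b_i - b_j) + sum(p * (qi - qj) for p, qi, qj in zip(p_u, q_i, q_j))


def pairwise_logistic_loss(diff):
    """-log(sigmoid(diff)): small when item i is scored well above item j."""
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def pairwise_sgd_step(p_u, q_i, q_j, b_i, b_j, lr=0.1, reg=0.01):
    """One SGD step on the pairwise logistic loss with L2 regularization.

    Assumes user u prefers item i over item j. Returns updated copies of
    (p_u, q_i, q_j, b_i, b_j); the inputs are not modified.
    """
    diff = pairwise_score(b_i, b_j, p_u, q_i, q_j)
    g = sigmoid(-diff)  # size of the loss gradient with respect to diff
    new_p = [p + lr * (g * (qi - qj) - reg * p) for p, qi, qj in zip(p_u, q_i, q_j)]
    new_qi = [qi + lr * (g * p - reg * qi) for p, qi in zip(p_u, q_i)]
    new_qj = [qj + lr * (-g * p - reg * qj) for p, qj in zip(p_u, q_j)]
    new_bi = b_i + lr * (g - reg * b_i)
    new_bj = b_j + lr * (-g - reg * b_j)
    return new_p, new_qi, new_qj, new_bi, new_bj


# Toy parameters, chosen only for illustration.
p_u, q_i, q_j, b_i, b_j = [0.1, -0.2], [0.3, 0.1], [0.2, 0.4], 0.0, 0.0
before = pairwise_score(b_i, b_j, p_u, q_i, q_j)
p_u, q_i, q_j, b_i, b_j = pairwise_sgd_step(p_u, q_i, q_j, b_i, b_j)
after = pairwise_score(b_i, b_j, p_u, q_i, q_j)
print(before, after)
```

A full training loop would repeatedly sample triples (u, i, j) in which user u prefers item i over item j and apply this step to the corresponding parameters. Swapping `pairwise_logistic_loss` for a hinge loss changes only the gradient factor `g`.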