# Summary of Matrix Factorization Tricks

This page tries to list the tricks and components used in matrix factorization for collaborative filtering. There are many variants of matrix factorization, and in general they can be divided into three kinds:

- Different predictors: how to use the available information to make a prediction
- Different loss functions: whether to use square loss, log loss, or max-margin loss
- Rate prediction versus rank prediction

## Predictor

The predictor is the function that produces a prediction from the input information. The basic predictor for matrix factorization is given by

<math>y_{ui} = \mu + b_u+b_i+p_u^T q_i</math>

where <math>\mu</math> is the global mean rating, <math>b_u</math> and <math>b_i</math> are the user and item biases, and <math>p_u</math>, <math>q_i</math> are the latent factor vectors of user <math>u</math> and item <math>i</math>.
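As a minimal sketch, the basic predictor can be written as follows; the variable names and toy values are illustrative, not from a specific implementation:

```python
import numpy as np

def predict(mu, b_u, b_i, p_u, q_i):
    """Basic MF predictor: y_ui = mu + b_u + b_i + p_u^T q_i."""
    return mu + b_u + b_i + p_u @ q_i

# Toy example with hypothetical parameter values
rng = np.random.default_rng(0)
k = 4                                # latent dimension
mu = 3.5                             # global mean rating
b_u, b_i = 0.2, -0.1                 # user and item biases
p_u = rng.normal(scale=0.1, size=k)  # user latent factors
q_i = rng.normal(scale=0.1, size=k)  # item latent factors
print(predict(mu, b_u, b_i, p_u, q_i))
```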

List of predictor variants (try to add more):

- SVD++
  - Yehuda Koren: Factorization meets the neighborhood: a multifaceted collaborative filtering model, KDD 2008, http://portal.acm.org/citation.cfm?id=1401890.1401944
  - Yehuda Koren: Collaborative Filtering with Temporal Dynamics, KDD 2009, http://research.yahoo.com/files/kdd-fp074-koren.pdf
- Feature-based matrix factorization

## Loss Function

The loss function specifies how we train the model. It is more or less independent of the predictor.

List of loss functions:

- Square loss: the most commonly used loss in collaborative filtering for rate prediction
- Logistic log-likelihood loss: used for sigmoid matrix factorization; sometimes performs better than square loss
- Hinge loss (or smoothed hinge loss): used for maximum-margin matrix factorization. http://portal.acm.org/citation.cfm?id=1102441
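The three losses above can be sketched as plain functions of a prediction <math>y</math> and a target <math>r</math> (for the classification losses, <math>r \in \{+1, -1\}</math>); the smoothed hinge follows the piecewise form used in maximum margin matrix factorization:

```python
import numpy as np

def square_loss(y, r):
    # Square loss for rate prediction: 0.5 * (y - r)^2
    return 0.5 * (y - r) ** 2

def logistic_loss(y, r):
    # Logistic log-likelihood loss for binary target r in {+1, -1}
    return np.log1p(np.exp(-r * y))

def smoothed_hinge_loss(y, r):
    # Smoothed hinge loss: quadratic near the margin, linear beyond it
    z = r * y
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return 0.5 - z
    return 0.5 * (1.0 - z) ** 2
```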

## Pairwise Rank Model

It is not hard to convert a rate-prediction predictor into a pairwise rank model. We only need three steps: (1) choose a predictor <math>y</math>; (2) choose a loss function for binary classification (either logistic loss or hinge loss); (3) train a classification predictor for pairwise order prediction, using the predictor <math>y_{ui}-y_{uj}</math> when comparing item <math>i</math> to item <math>j</math> for user <math>u</math>. This idea is also referred to as Bayesian Personalized Ranking.
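The steps above can be sketched as a single SGD update on the pairwise logistic loss, BPR-style; this is a minimal illustration with only the latent factors (biases cancel in <math>y_{ui}-y_{uj}</math>), and the function name and hyperparameters are hypothetical:

```python
import numpy as np

def bpr_sgd_step(p_u, q_i, q_j, lr=0.1, reg=0.0):
    """One SGD step on the pairwise logistic loss log(1 + exp(-(y_ui - y_uj)))
    for an observed preference of item i over item j by user u."""
    x = p_u @ (q_i - q_j)           # score difference y_ui - y_uj
    g = -1.0 / (1.0 + np.exp(x))    # d loss / d x
    # Compute all gradients from the current parameters before updating
    grad_pu = g * (q_i - q_j) + reg * p_u
    grad_qi = g * p_u + reg * q_i
    grad_qj = -g * p_u + reg * q_j
    return p_u - lr * grad_pu, q_i - lr * grad_qi, q_j - lr * grad_qj

# Toy check: the preferred item's score gap should grow after one step
rng = np.random.default_rng(1)
p_u, q_i, q_j = rng.normal(size=4), rng.normal(size=4), rng.normal(size=4)
x0 = p_u @ (q_i - q_j)
p_u, q_i, q_j = bpr_sgd_step(p_u, q_i, q_j)
print(x0, p_u @ (q_i - q_j))
```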