MovieLens 100k benchmark results
This page is a first example of what a benchmark page in RecSysWiki could look like. It is a work in progress. Please contribute and comment.
Rationale: This is primarily meant to be a comparison between methods, not between tools. This is why we sort by method. At the same time, we state the version number and all input arguments for maximum reproducibility.
If a method has two lines, the first line shows the results with the random seed set to 1; the second line (or the only line, if there is just one) contains the average results over 5 runs with random initialization.
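The two reporting modes described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular tool: `report_results` and `toy_run` are hypothetical names, and `toy_run` only stands in for a real train-and-evaluate run.

```python
import random
import statistics


def toy_run(seed):
    """Hypothetical placeholder for one train-and-evaluate run.

    A real run would initialize the model from `seed`, train it, and return
    its error on the held-out ratings. Here it just returns a seeded number.
    """
    return random.Random(seed).random()


def report_results(run, num_runs=5):
    """Return (fixed_seed_result, averaged_result) for a run(seed) callable."""
    # First line of a table entry: a single run with the seed fixed to 1.
    fixed = run(1)
    # Second line: the mean over num_runs runs; passing None lets each run
    # seed itself from system entropy, i.e. random initialization.
    averaged = statistics.mean(run(None) for _ in range(num_runs))
    return fixed, averaged
```

Calling `report_results(toy_run)` twice gives the same first value both times, while the averaged value generally differs between calls.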
Baseline Methods
Software | Method | 5-fold CV | all-but-10 | References |
---|---|---|---|---|
MyMediaLite 3.07 | GlobalAverage | 1.1256 | 1.1238 | |
MyMediaLite 3.07 | UserAverage | 1.0437 | 1.0518 | |
MyMediaLite 3.07 | ItemAverage | 1.0246 | 1.0453 | |
MyMediaLite 3.07 | UserItemBaseline | 0.9413 | 0.9656 | |
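For orientation, the three averaging baselines can be sketched in a few lines of Python. This is an illustrative sketch, not MyMediaLite's code: the function names are hypothetical, the fallback to the global mean for unseen users or items is an assumption, and RMSE is used only as an assumed error measure, since this page does not name the metric. UserItemBaseline is not covered here.

```python
import math
from collections import defaultdict


def fit_averages(train):
    """Compute global, per-user, and per-item mean ratings.

    train: non-empty list of (user, item, rating) triples.
    """
    global_mean = sum(r for _, _, r in train) / len(train)
    user_ratings = defaultdict(list)
    item_ratings = defaultdict(list)
    for user, item, rating in train:
        user_ratings[user].append(rating)
        item_ratings[item].append(rating)
    user_mean = {u: sum(rs) / len(rs) for u, rs in user_ratings.items()}
    item_mean = {i: sum(rs) / len(rs) for i, rs in item_ratings.items()}
    return global_mean, user_mean, item_mean


def predict(kind, model, user, item):
    """Predict a rating with the 'global', 'user', or 'item' average."""
    global_mean, user_mean, item_mean = model
    if kind == "global":
        return global_mean
    if kind == "user":
        # Assumption: unseen users fall back to the global mean.
        return user_mean.get(user, global_mean)
    if kind == "item":
        # Assumption: unseen items fall back to the global mean.
        return item_mean.get(item, global_mean)
    raise ValueError(f"unknown baseline: {kind}")


def rmse(kind, model, test):
    """Root mean squared error of one baseline on (user, item, rating) triples."""
    squared_error = sum((predict(kind, model, u, i) - r) ** 2 for u, i, r in test)
    return math.sqrt(squared_error / len(test))
```

A typical use would be to call `fit_averages` on the training part of a split and then `rmse("user", model, test)` on the held-out part.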
kNN-based Collaborative Filtering
Software | Method | 5-fold CV | all-but-10 | References |
---|---|---|---|---|
MyMediaLite 3.07 | UserKNN | 0.9283 | 0.9572 | |
MyMediaLite 3.07 | ItemKNN | 0.9182 | 0.9445 | |
Matrix Factorization
Software | Method | 5-fold CV | all-but-10 | References |
---|---|---|---|---|
MyMediaLite 3.07 | BiasedMatrixFactorization | 0.9220 | 0.9475 | |
MyMediaLite 3.07 | SVDPlusPlus | 0.9112 | 0.9409 | |
MyMediaLite 3.07 | SigmoidUserAsymmetricFactorModel | 0.8939 | 0.9232 | |
Attribute-Aware Methods
Other Methods
Disclaimers
- The results presented here come with no warranty whatsoever. Use at your own risk.
- Most if not all results are self-reported by the implementations, which may contain bugs in their evaluation routines.
- The results are not necessarily fair towards the compared methods and implementations. There could be hyper-parameter overfitting, or much better results might be achievable with more careful tuning.
- MovieLens 100k is one of the oldest collaborative filtering datasets, and it dominated the literature for years because it was one of the few datasets available. Methods developed in that period may therefore be biased towards this dataset. The dataset is also quite small by today's standards.