MovieLens
MovieLens is a recommender system and virtual community website that recommends films based on user-provided ratings.
Datasets
Three different datasets from the MovieLens system have been released by the GroupLens research group:
- MovieLens 100k, containing 100,000 ratings
- MovieLens 1M, containing about 1,000,000 ratings
- MovieLens 10M, containing about 10,000,000 ratings, plus tagging information
All datasets also contain movie and user attributes, in particular:
- the movies' IMDb keys, which give access to further movie attributes via IMDb's plain-text data files
- movie release dates and genres
- user age, gender, postal code, and occupation (not in MovieLens 10M)
Licensing
All three MovieLens datasets can be used free of charge for research purposes. Use of the datasets must be acknowledged, and copies of resulting publications must be sent to GroupLens. Redistribution without explicit permission is not allowed.
Details
All three datasets also contain timestamps. The table and subsections below focus on the differences between the three variants.
Dataset | Users | Items | Ratings | Sparsity | Tag events |
---|---|---|---|---|---|
MovieLens 100k | 943 | 1,682 | 100,000 | 93.6953 % | -- |
MovieLens 1M | 6,040 | 3,706 | 1,000,209 | 95.5316 % | -- |
MovieLens 10M | 69,878 | 10,677 | 10,000,054 | 98.6597 % | 100,000 |
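
The sparsity values in the table are consistent with the share of user-item pairs that have no rating, i.e. 1 - ratings / (users × items). The following is a minimal Python sketch of that calculation; the function name `sparsity_percent` is illustrative and not part of the datasets.

```python
def sparsity_percent(num_users: int, num_items: int, num_ratings: int) -> float:
    """Return the percentage of user-item pairs that have no rating."""
    possible_pairs = num_users * num_items
    return 100.0 * (1.0 - num_ratings / possible_pairs)


# Counts taken from the MovieLens 10M row of the table.
print(f"{sparsity_percent(69_878, 10_677, 10_000_054):.4f} %")
```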
MovieLens 100k
The smallest dataset comes with one set of splits for 5-fold cross-validation and two further splits whose test sets contain exactly 10 ratings per user; the test sets of these two splits are disjoint. The data was collected from September 19th, 1997 to April 22nd, 1998.
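A split with a fixed number of test ratings per user can be sketched as follows. This is only an illustration of the idea, not the procedure GroupLens used to create the shipped splits; the function name `hold_out_per_user`, the random selection, and the tuple layout (user ID first) are assumptions.

```python
import random
from collections import defaultdict


def hold_out_per_user(ratings, n_test=10, seed=0):
    """Split ratings so that each user contributes n_test ratings to the test set.

    `ratings` is an iterable of tuples whose first element is the user ID.
    Users with fewer than n_test ratings end up entirely in the test set.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for row in ratings:
        by_user[row[0]].append(row)

    train, test = [], []
    for user in sorted(by_user):
        rows = by_user[user][:]
        rng.shuffle(rows)            # pick the held-out ratings at random
        test.extend(rows[:n_test])
        train.extend(rows[n_test:])
    return train, test
```

Calling the function twice with different seeds does not by itself guarantee disjoint test sets; that would require excluding the first test set before drawing the second.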
The rating file is tab-separated. The other data files use vertical bars (|) as field separators.
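Both formats can be read with Python's standard `csv` module. The sketch below parses in-memory sample lines instead of real files. The field order of the rating lines (user ID, item ID, rating, timestamp) is an assumption based on the dataset's README, and the item line is a hypothetical, shortened example.

```python
import csv
import io

# Sample rating lines; assumed field order: user ID, item ID, rating, timestamp.
RATINGS_SAMPLE = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n"
# Hypothetical, shortened item line: ID | title | release date.
ITEMS_SAMPLE = "1|Toy Story (1995)|01-Jan-1995\n"


def read_ratings(handle):
    """Parse tab-separated rating lines into tuples of integers."""
    reader = csv.reader(handle, delimiter="\t")
    return [(int(u), int(i), int(r), int(t)) for u, i, r, t in reader]


def read_bar_separated(handle):
    """Parse lines whose fields are separated by vertical bars."""
    # QUOTE_NONE keeps quote characters in titles as literal text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    return list(reader)


print(read_ratings(io.StringIO(RATINGS_SAMPLE)))
print(read_bar_separated(io.StringIO(ITEMS_SAMPLE)))
```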
See also: MovieLens 100k benchmark results
MovieLens 1M
This dataset contains ratings by users who joined the platform in the year 2000.
Fields in all files are separated by double colons (::).
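Python's `csv` module accepts only single-character delimiters, so double-colon files are easier to handle with plain string splitting. A minimal sketch, assuming rating lines in the order user ID, movie ID, rating, timestamp (taken from the dataset's README, not from this article); the helper name `split_double_colon` is illustrative.

```python
from typing import Iterable, Iterator, List


def split_double_colon(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the fields of each non-empty line, split on '::'."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split("::")


# Sample lines in the assumed rating format.
sample = ["1::1193::5::978300760\n", "1::661::3::978302109\n"]
rows = list(split_double_colon(sample))
print(rows[0])
```

The same helper works on an open file object, since iterating over a text file yields one line at a time.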
MovieLens 10M
The largest of the three datasets ships with scripts that generate the same splits as those of the 100k variant. It also includes a file with tagging events.
The file format is identical to MovieLens 1M.
In contrast to the two smaller sets, which have integral ratings from 1 to 5 stars, MovieLens 10M has ratings from 0.5 to 5, with a step size of 0.5.
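When code has to handle both rating scales, a small validity check can catch files that are read with the wrong assumptions. The following is a sketch; the function name `is_valid_rating` and the `half_stars` flag are illustrative.

```python
def is_valid_rating(value: float, half_stars: bool) -> bool:
    """Check a rating against the integral scale (1 to 5) or the
    half-star scale (0.5 to 5 in steps of 0.5)."""
    step = 0.5 if half_stars else 1.0
    minimum = step  # the lowest rating equals one step on both scales
    if not (minimum <= value <= 5.0):
        return False
    # Multiples of 0.5 are represented exactly as floats, so this test is safe.
    return (value / step).is_integer()


print(is_valid_rating(3.5, half_stars=True))   # half-star scale, as in MovieLens 10M
print(is_valid_rating(3.5, half_stars=False))  # integral scale, as in 100k and 1M
```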