MovieLens is a recommender system and virtual community website that recommends films based on user-provided ratings.
Three different datasets from the MovieLens system have been released by the GroupLens research group:
- MovieLens 100k, containing 100,000 ratings
- MovieLens 1M, containing about 1,000,000 ratings
- MovieLens 10M, containing about 10,000,000 ratings, plus tagging information
Depending on the variant, the datasets also include:
- the movies' IMDB keys, which give easy access to further movie attributes through IMDB's plain-text data files
- movie release dates and genres
- user age, gender, postal code, and occupation (not in MovieLens 10M)
All three MovieLens datasets may be used free of charge for research purposes. Use of the datasets must be acknowledged, and copies of resulting publications must be sent to GroupLens. Redistribution without explicit permission is not allowed.
All three datasets also contain timestamps. The following focuses on the differences between the three variants.
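Sparsity is the percentage of user-item pairs without a rating, i.e. 100 % × (1 − ratings / (users × items)).
| Dataset | Users | Items | Ratings | Sparsity | Tags |
|---|---|---|---|---|---|
| MovieLens 100k | 943 | 1,682 | 100,000 | 93.6953 % | -- |
| MovieLens 1M | 6,040 | 3,706 | 1,000,209 | 95.5316 % | -- |
| MovieLens 10M | 69,878 | 10,677 | 10,000,054 | 98.6597 % | 100,000 |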
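The sparsity figures follow directly from the user, item, and rating counts: one minus the fraction of all possible user-item pairs that carry a rating, as a percentage. A minimal Python sketch (the function name `sparsity` is illustrative) that reproduces the values for MovieLens 1M and 10M:

```python
def sparsity(n_users: int, n_items: int, n_ratings: int) -> float:
    """Percentage of user-item pairs that have no rating."""
    density = n_ratings / (n_users * n_items)  # fraction of observed pairs
    return 100.0 * (1.0 - density)


# (users, items, ratings) per dataset, taken from the table.
datasets = {
    "MovieLens 1M": (6_040, 3_706, 1_000_209),
    "MovieLens 10M": (69_878, 10_677, 10_000_054),
}
for name, counts in datasets.items():
    print(f"{name}: {sparsity(*counts):.4f} %")
```

Because the formula is symmetric in users and items, sparsity alone cannot tell you which count is which; the raw counts are still needed for that.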
MovieLens 100k, the smallest dataset, ships with predefined splits: one split for 5-fold cross-validation, and two further train/test splits whose test sets contain exactly 10 ratings per user and are disjoint from each other. Its ratings were collected from September 19th, 1997 to April 22nd, 1998.
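The idea behind two hold-out splits with exactly 10 test ratings per user and disjoint test sets can be sketched as below. This is a hypothetical illustration, not the procedure GroupLens used to build the shipped files; the function name `two_disjoint_holdouts`, the `(user, item, rating)` tuple layout, and the random shuffling are assumptions. Each user needs at least 2 × k ratings.

```python
import random
from collections import defaultdict


def two_disjoint_holdouts(ratings, k=10, seed=0):
    """Hypothetical sketch: two train/test splits, each test set holding
    exactly k ratings per user, with no rating in both test sets.

    ratings: list of unique (user, item, rating) tuples.
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for r in ratings:
        by_user[r[0]].append(r)

    test_a, test_b = [], []
    for user, user_ratings in by_user.items():
        if len(user_ratings) < 2 * k:
            raise ValueError(f"user {user} has fewer than {2 * k} ratings")
        shuffled = user_ratings[:]
        rng.shuffle(shuffled)
        test_a.extend(shuffled[:k])        # first k ratings go to test set A
        test_b.extend(shuffled[k:2 * k])   # next k go to test set B, so A and B never overlap

    set_a, set_b = set(test_a), set(test_b)
    train_a = [r for r in ratings if r not in set_a]
    train_b = [r for r in ratings if r not in set_b]
    return (train_a, test_a), (train_b, test_b)


# Toy data: 2 users with 25 ratings each.
toy = [(u, i, 1 + (u + i) % 5) for u in range(2) for i in range(25)]
(train_a, test_a), (train_b, test_b) = two_disjoint_holdouts(toy)
print(len(test_a), len(test_b), len(train_a))  # 20 20 30
```

Taking consecutive, non-overlapping slices of one shuffled list per user is what guarantees disjointness without any extra bookkeeping.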
The rating file is tab-separated; the other data files use vertical bars (|) as separators.
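A minimal sketch of reading one line of the tab-separated rating file, assuming the field order user, item, rating, timestamp and Unix timestamps in seconds. The `Rating` type and `parse_rating_line` function are hypothetical helpers, not part of any MovieLens tooling:

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    user: int
    item: int
    rating: float          # float so the same type also fits half-star ratings
    timestamp: datetime


def parse_rating_line(line: str, sep: str = "\t") -> Rating:
    """Parse one 'user<sep>item<sep>rating<sep>timestamp' line.

    Assumes the field order above and Unix timestamps in seconds.
    """
    user, item, rating, ts = line.rstrip("\n").split(sep)
    return Rating(
        user=int(user),
        item=int(item),
        rating=float(rating),
        timestamp=datetime.fromtimestamp(int(ts), tz=timezone.utc),
    )


# An example line in the tab-separated layout.
r = parse_rating_line("196\t242\t3\t881250949\n")
print(r.user, r.item, r.rating, r.timestamp.date())  # 196 242 3.0 1997-12-04
```

Under the same field-order assumption, passing `sep="::"` handles the double-colon layout used by the larger datasets.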
See also: MovieLens 100k benchmark results
MovieLens 1M contains ratings by users who joined MovieLens in the year 2000.
All of its files use double colons (::) as separators.
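As an illustration of the double-colon format, the sketch below parses a movie line, assuming the layout movie ID, title, genres, with multiple genres joined by vertical bars and the release year in parentheses at the end of the title. `Movie`, `parse_movie_line`, and the year-extraction regex are hypothetical helpers:

```python
import re
from typing import List, NamedTuple, Optional


class Movie(NamedTuple):
    movie_id: int
    title: str
    year: Optional[int]
    genres: List[str]


# Assumption: the year appears as "(YYYY)" at the very end of the title.
YEAR_RE = re.compile(r"\((\d{4})\)\s*$")


def parse_movie_line(line: str) -> Movie:
    """Parse one 'movie_id::title::genre1|genre2|...' line."""
    movie_id, title, genres = line.rstrip("\n").split("::")
    match = YEAR_RE.search(title)
    year = int(match.group(1)) if match else None
    return Movie(int(movie_id), title, year, genres.split("|"))


movie = parse_movie_line("1::Toy Story (1995)::Animation|Children's|Comedy")
print(movie.year, movie.genres)  # 1995 ['Animation', "Children's", 'Comedy']
```

Splitting on the full "::" string first and only then on "|" keeps the two separator levels from interfering with each other.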
The largest MovieLens dataset contains scripts for generating the same splits as the ones for the 100k variant. Additionally, there is a file with tagging events.
Its file format is identical to that of MovieLens 1M.
In contrast to the two smaller sets, which have integral ratings from 1 to 5 stars, MovieLens 10M has ratings from 0.5 to 5, with a step size of 0.5.
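The two rating scales can be checked with a small helper. This is a minimal sketch; the function name `is_valid_rating` and its `half_stars` flag are illustrative. Doubling a half-star rating maps the 10M scale onto the integers 1 to 10, which avoids comparing floats against 0.5 steps directly:

```python
def is_valid_rating(value: float, half_stars: bool) -> bool:
    """Check a rating against the scales described above.

    half_stars=False: integers 1..5 (MovieLens 100k and 1M).
    half_stars=True:  multiples of 0.5 from 0.5 to 5 (MovieLens 10M).
    """
    if half_stars:
        doubled = value * 2  # 0.5 -> 1, 5.0 -> 10
        return doubled == int(doubled) and 1 <= doubled <= 10
    return value == int(value) and 1 <= value <= 5


print(is_valid_rating(3.5, half_stars=True))   # True
print(is_valid_rating(3.5, half_stars=False))  # False
print(is_valid_rating(0.5, half_stars=True))   # True
print(is_valid_rating(0.0, half_stars=True))   # False
```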