Million Song Dataset Challenge

From RecSysWiki
Revision as of 17:56, 15 August 2012 by Zeno Gantner (talk | contribs) (this still must be wikified a bit)

The Million Song Dataset Challenge aims to be the best possible offline evaluation of a music recommendation system. Any type of algorithm can be used: collaborative filtering, content-based methods, web crawling, even human oracles!* Because the challenge relies on the Million Song Dataset, the data for the competition is completely open: almost everything is known and possibly available.

What is the task in a few words? You have:

  1. the full listening history for 1M users,
  2. half of the listening history for 110K users (10K validation set, 100K test set),

and you must predict the missing half. How much easier can it get?
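To make the task setup concrete, here is a minimal sketch of a popularity baseline on toy data. Everything in it is an assumption for illustration: the variable names, the user and song IDs, and the in-memory dictionary layout are hypothetical and do not reflect the actual challenge file format. It counts how many training users listened to each song, then recommends the most popular songs that a test user has not already played in the visible half.

```python
from collections import Counter

# Hypothetical toy data (not the real challenge files): user -> set of song IDs.
train_histories = {
    "u1": {"s1", "s2", "s3"},
    "u2": {"s1", "s3", "s4"},
    "u3": {"s2", "s3", "s5"},
}
visible_half = {"t1": {"s1"}}        # what we are given for a test user
hidden_half = {"t1": {"s3", "s4"}}   # what we must predict (held out)


def popularity_ranking(histories):
    """Rank songs by the number of users who listened to them."""
    counts = Counter(song for songs in histories.values() for song in songs)
    # Sort by descending count; break ties by song ID so the result is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ordered]


def recommend_popular(ranking, seen, k):
    """Return the k most popular songs the user has not already played."""
    return [song for song in ranking if song not in seen][:k]


ranking = popularity_ranking(train_histories)
predictions = {
    user: recommend_popular(ranking, seen, k=2)
    for user, seen in visible_half.items()
}

# Count how many recommended songs appear in the held-out half.
hits = {user: len(set(recs) & hidden_half[user]) for user, recs in predictions.items()}
```

A baseline like this ignores the individual listener entirely, which makes it a useful floor against which personalized methods can be compared.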

The most straightforward approach to this task is pure collaborative filtering, but remember that there is a wealth of information available to you through the Million Song Dataset.
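As one possible illustration of pure collaborative filtering, the sketch below scores candidate songs by how often they co-occur with the songs a user has already played. This is only one simple item-based variant, not the method prescribed by the challenge; the function names and the toy listening histories are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(histories):
    """Count, for each pair of songs, how many users listened to both."""
    co = defaultdict(lambda: defaultdict(int))
    for songs in histories.values():
        for a, b in combinations(sorted(songs), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend_item_based(co, seen, k):
    """Score unseen songs by summed co-occurrence with the songs already played."""
    scores = defaultdict(int)
    for song in seen:
        for other, count in co.get(song, {}).items():
            if other not in seen:
                scores[other] += count
    # Sort by descending score; break ties by song ID so the result is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, _ in ordered][:k]


# Hypothetical toy listening histories: user -> set of song IDs.
toy_histories = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "d"},
}

co = cooccurrence_counts(toy_histories)
recs = recommend_item_based(co, seen={"a"}, k=2)
```

Raw co-occurrence counts favor popular songs; in practice one might normalize them (for example with cosine similarity), but that choice is left open here.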

Ready to start recommending? Read through our Getting Started tutorial.

For a more technical introduction to the MSD Challenge, see our AdMIRe paper. (Please use this citation when referring to the contest in an academic setting.)

  * This contest is for computer models, but if you manage to get recommendations from humans for 110K listeners, we'd like to know how!


The Million Song Dataset Challenge is a joint effort between the Computer Audition Lab at UC San Diego and LabROSA at Columbia University. The user data for the challenge, like much of the data in the Million Song Dataset, was generously donated by The Echo Nest, with additional data contributed by SecondHandSongs, musiXmatch, and Last.fm. Follow-up evaluations will be conducted by IMIRSEL at the Graduate School of Library and Information Science at UIUC as part of the Music Information Retrieval Evaluation eXchange (MIREX).

External links