Music Audio Benchmark Data Set

We provide a collection of audio files for machine learning and data mining, downloaded from www.garageband.com.
The dataset contains 1886 songs, all encoded in MP3 format. The sampling frequency and bitrate of these files are 44,100 Hz and 128 kb

Our collection contains the following genres:

Genre         Number of Samples   Samples    Meta Files
alternative   145                 22.1 MB    128 KB
blues         120                 18.2 MB    93 KB
electronic    113                 17.1 MB    90 KB
folkcountry   222                 34.4 MB    231 KB
funksoulrnb   47                  7.3 MB     46.1 KB
jazz          319                 48.5 MB    241 KB
pop           116                 17.7 MB    116 KB
raphiphop     300                 45.5 MB    240 KB
rock          504                 76.6 MB    451 KB
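
The genres are far from balanced, which matters when judging classification accuracy on this collection. The following is a minimal Python sketch that takes the per-genre counts from the table above and computes the class distribution and the accuracy of a trivial classifier that always predicts the largest genre. The names GENRE_COUNTS, class_distribution and majority_baseline are illustrative only and are not part of the dataset or of YALE.

```python
# Per-genre sample counts, copied from the table above.
GENRE_COUNTS = {
    "alternative": 145,
    "blues": 120,
    "electronic": 113,
    "folkcountry": 222,
    "funksoulrnb": 47,
    "jazz": 319,
    "pop": 116,
    "raphiphop": 300,
    "rock": 504,
}


def class_distribution(counts):
    """Return {genre: fraction of all songs} for the given counts."""
    total = sum(counts.values())
    return {genre: n / total for genre, n in counts.items()}


def majority_baseline(counts):
    """Return (genre, accuracy) for always predicting the largest genre."""
    genre = max(counts, key=counts.get)
    return genre, counts[genre] / sum(counts.values())


if __name__ == "__main__":
    print("total songs:", sum(GENRE_COUNTS.values()))
    for genre, share in class_distribution(GENRE_COUNTS).items():
        print(f"{genre:12s} {share:6.1%}")
    genre, accuracy = majority_baseline(GENRE_COUNTS)
    print(f"majority baseline: always '{genre}' -> {accuracy:.1%}")
```

The counts sum to the 1886 songs stated above, and any classifier evaluated on the full collection should be compared against the majority baseline rather than against uniform guessing over nine genres.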

Exampleset and Taxonomies

We also provide an exampleset with 49 features extracted based on [Mierswa/Morik 2005 Automatic Feature Extraction for Classifying Audio Data]. These features were extracted with YALE. In addition to the exampleset, we provide a set of user-created taxonomies on subsets of the items (.zip).

Related Publications

Homburg/etal/2005a Homburg, Helge and Mierswa, Ingo and Möller, Bülent and Morik, Katharina and Wurst, Michael. A Benchmark Dataset for Audio Classification and Clustering. In Joshua D. Reiss and Geraint A. Wiggins (editors), Proc. of the International Symposium on Music Information Retrieval 2005, pages 528-531, London, UK, Queen Mary University, 2005.
Mierswa/Morik/2005a Mierswa, Ingo and Morik, Katharina. Automatic Feature Extraction for Classifying Audio Data. In Machine Learning Journal, Vol. 58, pages 127-149, 2005.