MadKea/COS781_Project
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
====================
hetrec2011-lastfm-2k
====================
-------
Version
-------
Version 1.0 (May 2011)
-----------
Description
-----------
This dataset contains social networking, tagging, and music artist listening information
from a set of 2K users from Last.fm online music system.
http://www.last.fm
The dataset is released in the framework of the 2nd International Workshop on
Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)
http://ir.ii.uam.es/hetrec2011
at the 5th ACM Conference on Recommender Systems (RecSys 2011)
http://recsys.acm.org/2011
---------------
Data statistics
---------------
1892 users
17632 artists
12717 bi-directional user friend relations, i.e. 25434 (user_i, user_j) pairs
avg. 13.443 friend relations per user
92834 user-listened artist relations, i.e. tuples [user, artist, listeningCount]
avg. 49.067 artists most listened by each user
avg. 5.265 users who listened each artist
11946 tags
186479 tag assignments (tas), i.e. tuples [user, tag, artist]
avg. 98.562 tas per user
avg. 14.891 tas per artist
avg. 18.930 distinct tags used by each user
avg. 8.764 distinct tags used for each artist
-----
Files
-----
* artists.dat
This file contains information about music artists listened and tagged by the users.
* tags.dat
This file contains the set of tags available in the dataset.
* user_artists.dat
This file contains the artists listened by each user.
It also provides a listening count for each [user, artist] pair.
* user_taggedartists.dat - user_taggedartists-timestamps.dat
These files contain the tag assignments of artists provided by each particular user.
They also contain the timestamps when the tag assignments were done.
* user_friends.dat
These files contain the friend relations between users in the database.
-----------
Data format
-----------
The data is formatted one entry per line as follows (tab separated, "\t"):
* artists.dat
id \t name \t url \t pictureURL
Example:
707 Metallica http://www.last.fm/music/Metallica http://userserve-ak.last.fm/serve/252/7560709.jpg
* tags.dat
tagID \t tagValue
1 metal
* user_artists.dat
userID \t artistID \t weight
2 51 13883
* user_taggedartists.dat
userID \t artistID \t tagID \t day \t month \t year
2 52 13 1 4 2009
* user_taggedartists-timestamps.dat
userID \t artistID \t tagID \t timestamp
2 52 13 1238536800000
* user_friends.dat
userID \t friendID
2 275
-------
License
-------
The users' names and other personal information in Last.fm are not provided in the dataset.
The data contained in hetrec2011-lastfm-2k.zip is made available for non-commercial use.
Those interested in using the data in a commercial context should contact Last.fm staff:
http://www.lastfm.com/about/contact
----------------
Acknowledgements
----------------
This work was supported by the Spanish Ministry of Science and Innovation (TIN2008-06566-C04-02),
and the Regional Government of Madrid (S2009TIC-1542).
----------
References
----------
When using this dataset you should cite:
- Last.fm website, http://www.lastfm.com
You may also cite HetRec'11 workshop as follows:
@inproceedings{Cantador:RecSys2011,
author = {Cantador, Iv\'{a}n and Brusilovsky, Peter and Kuflik, Tsvi},
title = {2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)},
booktitle = {Proceedings of the 5th ACM conference on Recommender systems},
series = {RecSys 2011},
year = {2011},
location = {Chicago, IL, USA},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {information heterogeneity, information integration, recommender systems},
}
-------
Credits
-------
This dataset was built by Ignacio Fernández-Tobías with the collaboration of Iván Cantador and Alejandro Bellogín,
members of the Information Retrieval group at Universidad Autonoma de Madrid (http://ir.ii.uam.es)
-------
Contact
-------
Iván Cantador, ivan [dot] cantador [at] uam [dot] es