Last.fm Recommender System

We chose to investigate the Last.fm data set using a temporal collaborative filtering model. Specifically, we want to investigate the effect that time of day and day of the week have on users' listening habits. Our objective is to reduce the number of times a user skips a song. Because we use skips as a proxy for a user's enjoyment of a song, we do not need to rely on explicit feedback, such as song ratings. In addition, although consumer preferences change over time, we do not account for how long it has been since a user rated a song: skips occur in the present, so we can ignore this recency factor.

The Data

The Last.fm data set contains:

Listening habits for 992 users
173,921 artists with timestamped entries
Metafile containing user profiles (e.g., gender, age, country and signup date).

The data set can be found through this link. The data was collected by Òscar Celma.

The data set records users’ listening history without recording explicit feedback on artist and track pairs. As there is no input from a user on whether they liked or disliked a track, we initially treat the data as an indication of positive preference. This will be expanded to include skips for the full recommender system.

A preview of the data is shown here, with the leftmost column as the Pandas index:

The following shows the play counts for the entire data set, grouped by hour of the day (left) and day of the week (right):

Part I Objective

We are interested in learning how neighborhood- and model-based collaborative filters (CF) perform on aggregated data. These CF approaches help us understand how an improved recommendation engine can drive increased user engagement within a music platform. We intend to utilize timestamps for the final project along with other metadata to improve the quality of our recommendations.

Neighborhood-Based Collaborative Filter Analysis

We implemented a user-user neighborhood-based collaborative filtering technique. We predict a user's interest in a particular artist as the number of times they would listen to that artist, based on their own average artist plays and those of their peers.
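To make the prediction step concrete, here is a minimal sketch of one common user-user formulation: the user's own mean play count plus a similarity-weighted average of how far the K most similar peers deviate from their means. The exact formula is an assumption, since it is not spelled out above, and the function name `predict_plays`, the nested-dictionary layout and the toy numbers are all hypothetical.

```python
def predict_plays(user, artist, plays, similarities, k=2):
    """Predict plays as the user's mean plus a similarity-weighted average
    of how far the K most similar peers deviate from their own means."""
    user_mean = sum(plays[user].values()) / len(plays[user])

    # Peers with positive similarity who have played the artist,
    # ranked from most to least similar, truncated to the top K.
    candidates = [
        peer for peer, sim in similarities[user].items()
        if sim > 0 and artist in plays[peer]
    ]
    peers = sorted(candidates, key=lambda p: similarities[user][p], reverse=True)[:k]
    if not peers:
        return user_mean  # assumption: fall back to the user's own average

    weighted_sum = 0.0
    weight_total = 0.0
    for peer in peers:
        peer_mean = sum(plays[peer].values()) / len(plays[peer])
        weight = similarities[user][peer]
        weighted_sum += weight * (plays[peer][artist] - peer_mean)
        weight_total += abs(weight)
    return user_mean + weighted_sum / weight_total


# Toy data: plays[user][artist] = play count (values are made up).
plays = {
    "u1": {"A": 4, "B": 2},
    "u2": {"A": 6, "B": 2, "C": 10},
    "u3": {"A": 1, "C": 2},
}
similarities = {"u1": {"u2": 0.5, "u3": 0.5}}
prediction = predict_plays("u1", "C", plays, similarities, k=2)
```

Mean-centering is one way to account for users who simply listen more than others; a plain weighted average of peer play counts would be a simpler alternative.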

Data preprocessing

We grouped our data by artist and user, counting the number of plays as the primary metric. Hence, the recommendations are at the artist level and not for individual songs. There is no additional preprocessing.
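The aggregation described above can be sketched with Pandas as follows. The column names (`user_id`, `artist_name`, `timestamp`) and the toy rows are assumptions for illustration and may not match the actual Last.fm files.

```python
import pandas as pd

# Toy listening log; column names and rows are illustrative assumptions.
plays = pd.DataFrame({
    "user_id": ["u1", "u1", "u1", "u2", "u2", "u3"],
    "artist_name": ["Radiohead", "Radiohead", "Bjork", "Radiohead", "Bjork", "Bjork"],
    "timestamp": pd.to_datetime([
        "2009-05-04 23:08:57", "2009-05-04 13:54:10", "2009-05-03 09:00:00",
        "2009-05-02 20:15:00", "2009-05-01 08:30:00", "2009-04-30 18:45:00",
    ]),
})

# One row per (user, artist) pair, with the number of plays as the metric.
play_counts = (
    plays.groupby(["user_id", "artist_name"])
    .size()
    .reset_index(name="plays")
)
print(play_counts)
```

Note that the timestamp is discarded by this aggregation, which is consistent with treating the data as time-independent at this stage.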

Similarity Metric

We used Pearson correlation, primarily because of the nature of the dataset and how missing values are interpreted. The similarity between two users is computed over the artists they have in common, regardless of missing values for the artists that only one of them has played.
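As an illustration, a minimal sketch of Pearson correlation restricted to the artists two users share is shown below. The function name `pearson_similarity`, the dictionary layout and the choice to return 0.0 when the overlap is too small or one vector is constant are assumptions, not necessarily what our implementation does.

```python
import math


def pearson_similarity(plays_a, plays_b):
    """Pearson correlation over the artists both users have played."""
    common = sorted(set(plays_a) & set(plays_b))
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as no similarity

    xs = [plays_a[artist] for artist in common]
    ys = [plays_b[artist] for artist in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)


# Toy users: artist -> play count (values are made up).
alice = {"Radiohead": 10, "Bjork": 4, "Portishead": 1}
bob = {"Radiohead": 20, "Bjork": 8, "Portishead": 2, "Muse": 50}
similarity = pearson_similarity(alice, bob)
```

Here bob's play counts on the shared artists are exactly twice alice's, so the correlation is 1 up to floating-point rounding; the artist only bob has played does not enter the calculation.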

Training and Testing data

We split the dataset 80-20 into training and test sets. The split was not completely random: for each user, a random 20% of the artists they listen to are placed in the test data. This ensures that every user is represented in the test data. There is no separate validation (tuning) dataset. We made this choice because the model relies on similarities between users and has only one hyperparameter, K, which leaves little scope for overfitting the test data. A separate tuning dataset would also reduce the data available for training.
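A per-user split of this kind can be sketched with Pandas as below. The helper name `split_per_user`, the column names and the toy table are assumptions; `groupby(...).sample` requires pandas 1.1 or later.

```python
import pandas as pd


def split_per_user(play_counts, test_frac=0.2, seed=0):
    """Hold out a random fraction of each user's artists as test data."""
    test = play_counts.groupby("user_id").sample(frac=test_frac, random_state=seed)
    train = play_counts.drop(test.index)
    return train, test


# Toy table: 3 users with 10 artists each (values are made up).
toy = pd.DataFrame({
    "user_id": [user for user in ("u1", "u2", "u3") for _ in range(10)],
    "artist_name": [f"artist_{i}" for _ in range(3) for i in range(10)],
    "plays": list(range(1, 31)),
})
train, test = split_per_user(toy)
print(test.groupby("user_id").size())
```

Because the sample size per user is rounded, a user with very few artists may end up with no test rows under this sketch; such users would need separate handling if every user must appear in the test data.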

Model evaluation

Since the predicted value is the number of times a user is expected to listen to an artist, which is a continuous variable, we chose to use RMSE and MAE, which are standard metrics for evaluating continuous predictors.
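For reference, both metrics can be computed in a few lines. This is a minimal sketch using only the standard library; the play counts below are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average size of the errors."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Held-out play counts and the model's predictions (made-up values).
actual = [10, 2, 5, 1]
predicted = [8, 2, 9, 1]
print(rmse(actual, predicted), mae(actual, predicted))
```

RMSE is always at least as large as MAE; a wide gap between the two suggests that a few predictions are far off, which is plausible for play counts with heavy listeners.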

Additional Design Considerations

We considered having an item-item based CF, but decided to stick with user-user based. This choice was made because we have 992 users but nearly 174,000 unique artists. The similarity matrix for such a large number of items would be large and sparse. Most artists feature only a few times in the data set and hence would not have any similar neighbors for a meaningful analysis.

We also decided to work with aggregated artists for the first part of the project. This reduces the size and complexity of the data for our first exploration. For our final project, we will include content-based models with time as an important variable.

Model performance with hyperparameter tuning

The model has only one hyperparameter: the neighborhood size K. Increasing K improves (lowers) both RMSE and MAE.

Model performance with data size

We varied the data size systematically from 100 users to 1000, keeping K constant. The results are as follows:

Scaling of running time with data size

Computing the user-user similarity matrix is an O(n²) operation, as each pair of users needs to be assessed. Making predictions is an O(K*n) operation, as for each user we need to look at their K peers and predict accordingly. The total running time is hence asymptotically dominated by the similarity matrix step, which scales as O(n²).
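The quadratic growth of the similarity step can be checked by counting the user pairs that must be compared. This is only an illustrative sketch; the helper name `count_user_pairs` is hypothetical and the user counts are arbitrary.

```python
from itertools import combinations


def count_user_pairs(n_users):
    """Number of distinct user pairs the similarity step must assess."""
    return sum(1 for _ in combinations(range(n_users), 2))


# Doubling the number of users roughly quadruples the number of pairs.
pair_counts = {n: count_user_pairs(n) for n in (100, 200, 400)}
print(pair_counts)
```

The count equals n(n-1)/2, so for the full set of 992 users there are 491,536 pairs to compare.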

Model-Based Collaborative Filter Analysis

We used the SVD algorithm as our model-based CF. We used the Surprise package, which can be installed using the following command: pip install scikit-surprise. Read more about Surprise here.