@@ -6,7 +6,7 @@
"source": [
"# Finding Similar Songs on Spotify - Part 1: Distance Based Search\n",
"\n",
"The first part of this tutorial series demonstrates the traditional way of extracting features from the audio content, training a classifier and predicting results. Because we do not have access to the raw audio content, we cannot extract features ourselves. Fortunately, Spotify is so generious to provide extracted features via their API. Those are just low-level audio features, but they are more than any other streaming music service provide - so Kudos to Spotify for this API! To download the features from the Spotify API you need to apply for a valid client ID. Please follow the steps on the Github page to apply for such an ID.\n",
"The first part of this tutorial series demonstrates the traditional way of extracting features from the audio content, training a classifier and predicting results. Because we do not have access to the raw audio content, we cannot extract features ourselves. Fortunately, Spotify is generous enough to provide extracted features via their API. Those are just low-level audio features, but they are more than any other streaming music service provides - so kudos to Spotify for this API! To download the features from the Spotify API you need to apply for a valid client ID. Please follow the steps on the GitHub page to apply for such an ID.\n",
"\n",
"\n",
"## Part 1 - Overview\n",
@@ -117,7 +117,7 @@
" User authentication requires interaction with your\n",
" web browser. Once you enter your credentials and\n",
" give authorization, you will be redirected to\n",
" a url. Paste that url you were directed to to\n",
" a url. Paste that url you were directed to to\n",
" complete the authorization.\n",
"\n",
" Opened https://accounts.spotify.com/authorize?scope=playlist-modify-public&redirect_uri=ht...\n",
@@ -191,9 +191,9 @@
"source": [
"### Get Playlist meta-data\n",
"\n",
"Insted of writing one big loop to download the data, I decided to split it into separate more comprehensible steps.\n",
"Instead of writing one big loop to download the data, I decided to split it into separate, more comprehensible steps.\n",
"\n",
"The Spotify API does not return infinite elements, but requires batch processing. The largest batch size is 100 items such as tracks, artists or albums. As a first step we get relevant meta-data for the supplied playlists. Especially the *num_track* property is conveniant for the further processing."
"The Spotify API does not return an unbounded number of elements, but requires batch processing. The largest batch size is 100 items such as tracks, artists or albums. As a first step we get relevant meta-data for the supplied playlists. The *num_track* property in particular is convenient for further processing."
]
},
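The batch-wise retrieval described in the cell above can be sketched as follows. `fetch_page` is a hypothetical stand-in for a real Spotify API call (it is not part of the notebook); the 100-item limit is the one stated in the text:

```python
def fetch_in_batches(fetch_page, total_items, batch_size=100):
    """Collect all items by repeatedly requesting pages of at most batch_size,
    as required by the Spotify API's 100-item batch limit."""
    items = []
    for offset in range(0, total_items, batch_size):
        items.extend(fetch_page(offset=offset, limit=batch_size))
    return items

# a fake page function standing in for the real API call
fake_api = lambda offset, limit: list(range(offset, min(offset + limit, 230)))
tracks = fetch_in_batches(fake_api, total_items=230)
```

This is where the playlist's *num_track* meta-data pays off: knowing the total up front lets the loop compute all offsets in advance.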
{
@@ -375,7 +375,7 @@
"\n",
"We will use caching to locally store retrieved data. This is on the one hand a requirement of the API and on the other it speeds up processing when we reload the notebook. *joblib* is a convenient library which simplifies caching.\n",
"\n",
"*Update the cachdir to an appropriate path in the following cell*"
"*Update the cachedir to an appropriate path in the following cell*"
]
},
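As a rough illustration of what *joblib* does under the hood, here is a minimal stand-in for `@memory.cache` built only from the standard library. The helper names are hypothetical; the notebook itself uses joblib:

```python
import functools
import hashlib
import os
import pickle
import tempfile

def disk_cache(cachedir):
    """Persist each function result on disk, keyed by function name and arguments."""
    os.makedirs(cachedir, exist_ok=True)
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            key = hashlib.md5(pickle.dumps((func.__name__, args))).hexdigest()
            path = os.path.join(cachedir, key + ".pkl")
            if os.path.exists(path):            # cache hit: load the stored result
                with open(path, "rb") as f:
                    return pickle.load(f)
            result = func(*args)                # cache miss: compute and persist
            with open(path, "wb") as f:
                pickle.dump(result, f)
            return result
        return wrapper
    return decorator

@disk_cache(os.path.join(tempfile.gettempdir(), "spotify_demo_cache"))
def square(x):
    return x * x
```

The second call with the same argument is served from disk, which is exactly why reloading the notebook becomes fast once the Spotify data has been fetched.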
{
@@ -399,7 +399,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following method retrieves meta-data, sequential features such as *MFCCs* and *Chroma*, and track-level features such as *Dancability*. The *@memory.cache* annotation tells *joblib* to persist all return values for the supplied parameters."
"The following method retrieves meta-data, sequential features such as *MFCCs* and *Chroma*, and track-level features such as *Danceability*. The *@memory.cache* annotation tells *joblib* to persist all return values for the supplied parameters."
]
},
{
@@ -749,7 +749,7 @@
"\n",
"### Single Vector Representation\n",
"\n",
"The simlarity retrieval approach presented in this tutorial is based on a vector-space model where each track is represented of a single fixed-length feature vector. The segment-based features provided by the Spotify API are lists of feature vectors of varying lengths. Thus, these features need to be aggregated into a single feature vector. The following function describes a simple approach to do so:"
"The similarity retrieval approach presented in this tutorial is based on a vector-space model where each track is represented by a single fixed-length feature vector. The segment-based features provided by the Spotify API are lists of feature vectors of varying lengths. Thus, these features need to be aggregated into a single feature vector. The following function describes a simple approach to do so:"
]
},
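A minimal sketch of such an aggregation, assuming the per-dimension mean and standard deviation are concatenated (the function name is illustrative, not the notebook's):

```python
import numpy as np

def aggregate_segments(segments):
    """Collapse a variable-length sequence of segment vectors (n_segments x n_dims)
    into one fixed-length vector: per-dimension mean concatenated with std."""
    segments = np.asarray(segments, dtype=float)
    return np.concatenate([segments.mean(axis=0), segments.std(axis=0)])

# tracks with different numbers of segments yield equal-length vectors
short_track = aggregate_segments(np.random.rand(40, 12))
long_track = aggregate_segments(np.random.rand(73, 12))
```

However long a track's segment list is, the result always has 2 × n_dims entries, which is what makes distance computation across tracks possible.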
{
@@ -829,7 +829,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Afgregate all features of the downloaded data"
"Aggregate all features of the downloaded data"
]
},
{
@@ -889,7 +889,7 @@
"source": [
"### Normalize feature data\n",
"\n",
"The feature vectors are composed of differnt feature-sets. All of them with different value ranges. While features such as Acousticness and Danceability are scaled between 0 and 1, the BPM values of the tempo feature ranges around 120 or higher. We apply Standard Score or Zero Mean and Unit Variance normalization to uniformly scale the value ranges of the features.\n",
"The feature vectors are composed of different feature-sets, all of them with different value ranges. While features such as Acousticness and Danceability are scaled between 0 and 1, the BPM values of the tempo feature range around 120 or higher. We apply Standard Score or Zero Mean and Unit Variance normalization to uniformly scale the value ranges of the features.\n",
"\n",
"$$\n",
"z = {x- \\mu \\over \\sigma}\n",
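The standard score above can be applied column-wise to the whole feature matrix; a minimal sketch (the variable names are illustrative):

```python
import numpy as np

def zscore(features):
    """Standard-score normalization: zero mean and unit variance per feature column."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / sigma

# a danceability-like column in [0, 1] and a tempo column in BPM
# end up on the same scale after normalization
X = np.array([[0.2, 118.0],
              [0.8, 126.0],
              [0.5, 122.0]])
Z = zscore(X)
```

After this step no feature dominates a distance computation merely because its raw values are numerically larger.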
@@ -922,7 +922,7 @@
" ID Mean Standard Deviation\n",
" 0 1517.5993814237531 291.1855836731788\n",
"\n",
"In this example the center frequency is 1518 Hz and it deviates by 291 Hz. These numbers already describe the audio content and can be used to find similar tracks. The common approach to calcualte music similarity from audio content is based on vector difference. The assumption is, that similar audio feature-values correspond with similar audio content. Thus, feature vectors with smaller vector differences correspond to more similar tracks. The following data represents the extracted Spectral Centroids of our 10-tracks collection:\n",
"In this example the center frequency is 1518 Hz and it deviates by 291 Hz. These numbers already describe the audio content and can be used to find similar tracks. The common approach to calculate music similarity from audio content is based on vector differences. The assumption is that similar audio feature-values correspond with similar audio content. Thus, feature vectors with smaller vector differences correspond to more similar tracks. The following data represents the extracted Spectral Centroids of our 10-track collection:\n",
"\n",
"\n",
" ID Mean Standard Deviation\n",
@@ -994,7 +994,7 @@
"source": [
"### Euclidean Distance\n",
"\n",
"In the final part of this tutorial we wil use the Euclidean Distance to calculate similarities between tracks. As mentioned above, the Euclidean Distance is a metric to calculate the distance between two vectors and thus is a function of dissimilarity. This means, vectors with smaller distance values are more similar than those with higher distances.\n",
"In the final part of this tutorial we will use the Euclidean Distance to calculate similarities between tracks. As mentioned above, the Euclidean Distance is a metric to calculate the distance between two vectors and thus is a function of dissimilarity. This means, vectors with smaller distance values are more similar than those with higher distances.\n",
"\n",
"$$\n",
"d(p,q) = \\sqrt{\\sum_{i=1}^n (q_i-p_i)^2}\n",
@@ -1009,7 +1009,7 @@
},
"outputs": [],
"source": [
"def eucledian_distance(feature_space, query_vector):\n",
"def euclidean_distance(feature_space, query_vector):\n",
" \n",
" return np.sqrt(np.sum((feature_space - query_vector)**2, axis=1))"
]
@@ -1108,7 +1108,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The following lines of code implement the approach described above. First, the distances between the query vector and all other vectors of the collection are calculated. Then the distances are sorted ascnedingly to get the simlar tracks. Because the metric distance of identical vectors is 0, the top-most entry of the sorted list is always the query track."
"The following lines of code implement the approach described above. First, the distances between the query vector and all other vectors of the collection are calculated. Then the distances are sorted in ascending order to get the most similar tracks. Because the metric distance of identical vectors is 0, the top-most entry of the sorted list is always the query track."
]
},
{
@@ -1272,7 +1272,7 @@
],
"source": [
"# calculate the distance between the query-vector and all others\n",
"dist = eucledian_distance(feature_data, feature_data[query_track_idx])\n",
"dist = euclidean_distance(feature_data, feature_data[query_track_idx])\n",
"\n",
"# sort the distances ascendingly - use sorted index\n",
"sorted_idx = np.argsort(dist)\n",
@@ -1287,11 +1287,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Scaled Eucledian Distance\n",
"### Scaled Euclidean Distance\n",
"\n",
"The approach taken to combine the different feature-sets is refered to as early fusion. The problem with the approach described in the previous step is, that larger feature-sets dominate the calculated distance values. The aggregated MFCC and Chroma features have 24 dimensions each. Together they have more dimensions as the remaining features which are mostly single dimensional features. Thus, the distances are unequally dominated by the two feature sets.\n",
"The approach taken to combine the different feature-sets is referred to as early fusion. The problem with the approach described in the previous step is that larger feature-sets dominate the calculated distance values. The aggregated MFCC and Chroma features have 24 dimensions each. Together they have more dimensions than the remaining features, which are mostly single-dimensional. Thus, the distances are unequally dominated by these two feature sets.\n",
"\n",
"To avoid such a bias, we scale the feature-space such that feature-sets and single-value features have euqal the same weights and thus euqal influence on the resulting distance."
"To avoid such a bias, we scale the feature-space such that feature-sets and single-value features have the same weights and thus equal influence on the resulting distance."
]
},
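The diff only shows the beginning of the scaled-distance function; a complete version of the scaling idea might look like the following sketch. The feature-set layout in `featureset_slices` is an illustrative assumption, not the notebook's actual column order:

```python
import numpy as np

# hypothetical layout: feature-set name -> column slice in the feature matrix
featureset_slices = {"mfcc": slice(0, 24), "chroma": slice(24, 48), "tempo": slice(48, 49)}

def scaled_euclidean_distance(feature_space, query_vector, slices=featureset_slices):
    """Euclidean distance in which every feature-set contributes equally,
    regardless of how many dimensions it spans."""
    sq = (feature_space - query_vector) ** 2
    # average the squared differences within each feature-set before combining,
    # so a 24-dimensional set weighs no more than a single-value feature
    per_set = [sq[:, s].mean(axis=1) for s in slices.values()]
    return np.sqrt(np.sum(per_set, axis=0))
```

Averaging within each slice is what neutralizes the dimensionality bias described above.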
{
Expand Down Expand Up @@ -1331,7 +1331,7 @@
},
"outputs": [],
"source": [
"def scaled_eucledian_distance(feature_space, query_vector):\n",
"def scaled_euclidean_distance(feature_space, query_vector):\n",
" \n",
" distances = (feature_space - query_vector)**2\n",
" \n",
@@ -1516,7 +1516,7 @@
}
],
"source": [
"dist = scaled_eucledian_distance(feature_data, feature_data[query_track_idx])\n",
"dist = scaled_euclidean_distance(feature_data, feature_data[query_track_idx])\n",
"\n",
"metadata.loc[np.argsort(dist)[:11], display_cols]"
]
@@ -1527,7 +1527,7 @@
"source": [
"### Feature Weighting\n",
"\n",
"As explained above, the vanilla Eucliden Distance in an early fusion approach is dominated by large feature-sets. Through scaling the feature-space we achieved equal influence for all feature-sets and features. Now, equal influence is not always the best choice fo music similarity. For example, the year and popularity feature we included into our feature vector are not an intrinsic music property. We just added them to cluster recordings of the same epoch together. Currently this feature has the same impact on the estimated similarity as timbre, rhythm and harmonics. When using many features it is commonly a good choice to apply different weights to them. Estimating these weights is generally achieved empirically."
"As explained above, the vanilla Euclidean Distance in an early fusion approach is dominated by large feature-sets. Through scaling the feature-space we achieved equal influence for all feature-sets and features. However, equal influence is not always the best choice for music similarity. For example, the year and popularity features we included in our feature vector are not intrinsic music properties. We just added them to cluster recordings of the same epoch together. Currently these features have the same impact on the estimated similarity as timbre, rhythm and harmonics. When using many features it is commonly a good choice to apply different weights to them. Estimating these weights is generally done empirically."
]
},
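Building on the scaled variant, a weighted version might look like this sketch. Both the weights and the feature-set layout are illustrative assumptions, not values taken from the notebook:

```python
import numpy as np

featureset_slices = {"mfcc": slice(0, 24), "chroma": slice(24, 48), "year": slice(48, 49)}
featureset_weights = {"mfcc": 1.0, "chroma": 1.0, "year": 0.2}  # illustrative values

def weighted_euclidean_distance(feature_space, query_vector, weights,
                                slices=featureset_slices):
    """Scaled Euclidean distance with an additional per-feature-set weight."""
    sq = (feature_space - query_vector) ** 2
    # per-set mean removes the dimensionality bias; the weight then sets
    # how strongly each feature-set influences the final distance
    per_set = [weights[n] * sq[:, s].mean(axis=1) for n, s in slices.items()]
    return np.sqrt(np.sum(per_set, axis=0))
```

With a low weight, a non-musical feature such as the release year can still nudge the ranking without overruling timbre or harmony.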
{
@@ -1567,7 +1567,7 @@
},
"outputs": [],
"source": [
"def weighted_eucledian_distance(feature_space, query_vector, featureset_weights):\n",
"def weighted_euclidean_distance(feature_space, query_vector, featureset_weights):\n",
" \n",
" distances = (feature_space - query_vector)**2\n",
" \n",
@@ -1753,7 +1753,7 @@
}
],
"source": [
"dist = weighted_eucledian_distance(feature_data, feature_data[query_track_idx], featureset_weights)\n",
"dist = weighted_euclidean_distance(feature_data, feature_data[query_track_idx], featureset_weights)\n",
"\n",
"metadata.loc[np.argsort(dist)[:11], display_cols]"
]
@@ -1801,7 +1801,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Run the evauation for all three introduced algorithms:"
"Run the evaluation for all three introduced algorithms:"
]
},
{
@@ -1836,17 +1836,17 @@
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Weighted Eucledian Distance</th>\n",
" <th>Weighted Euclidean Distance</th>\n",
" <td>0.583351</td>\n",
" <td>0.34</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Scaled Eucledian Distance</th>\n",
" <th>Scaled Euclidean Distance</th>\n",
" <td>0.501596</td>\n",
" <td>0.40</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Eucledian Distance</th>\n",
" <th>Euclidean Distance</th>\n",
" <td>0.438723</td>\n",
" <td>0.10</td>\n",
" </tr>\n",
@@ -1856,9 +1856,9 @@
],
"text/plain": [
" precision recall\n",
"Weighted Eucledian Distance 0.583351 0.34\n",
"Scaled Eucledian Distance 0.501596 0.40\n",
"Eucledian Distance 0.438723 0.10"
"Weighted Euclidean Distance 0.583351 0.34\n",
"Scaled Euclidean Distance 0.501596 0.40\n",
"Euclidean Distance 0.438723 0.10"
]
},
"execution_count": 74,
@@ -1873,14 +1873,14 @@
"\n",
"# run evaluation\n",
"\n",
"evaluation_results[\"Eucledian Distance\"] = \\\n",
" evaluate(lambda x,y: eucledian_distance(x,y), cut_off)\n",
"evaluation_results[\"Euclidean Distance\"] = \\\n",
" evaluate(lambda x,y: euclidean_distance(x,y), cut_off)\n",
" \n",
"evaluation_results[\"Scaled Eucledian Distance\"] = \\\n",
" evaluate(lambda x,y: scaled_eucledian_distance(x,y), cut_off)\n",
"evaluation_results[\"Scaled Euclidean Distance\"] = \\\n",
" evaluate(lambda x,y: scaled_euclidean_distance(x,y), cut_off)\n",
"\n",
"evaluation_results[\"Weighted Eucledian Distance\"] = \\\n",
" evaluate(lambda x,y: weighted_eucledian_distance(x,y, featureset_weights), cut_off)\n",
"evaluation_results[\"Weighted Euclidean Distance\"] = \\\n",
" evaluate(lambda x,y: weighted_euclidean_distance(x,y, featureset_weights), cut_off)\n",
"\n",
"# aggregate results\n",
"evaluation_results = pd.DataFrame(data = evaluation_results.values(), \n",
@@ -1895,7 +1895,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"These results must be interpreted in relation to the analyzed data-set and the method how the metrics are measured. We measure how many tracks in the resulting list of similar songs belong to the same playlist of the query song. We have chosen genre-related playlists such as *Metal* and *Hip-Hop*. But there are also overalpping playlists such as *Classic Metal* and *Rock Hymns* which both contain Rock and Metal tracks. This should be considered in the interpretation of the evaluation results. To get more reliable results, more efforts need to be put into creating better non-overlapping playlists. But, since music similarity is subject to subjective interpretation, this is a challinging task.\n",
"These results must be interpreted in relation to the analyzed data-set and the way the metrics are measured. We measure how many tracks in the resulting list of similar songs belong to the same playlist as the query song. We have chosen genre-related playlists such as *Metal* and *Hip-Hop*. But there are also overlapping playlists such as *Classic Metal* and *Rock Hymns* which both contain Rock and Metal tracks. This should be considered when interpreting the evaluation results. To get more reliable results, more effort needs to be put into creating better non-overlapping playlists. But, since music similarity is subject to subjective interpretation, this is a challenging task.\n",
"\n",
"Although we have a small bias from the overlapping playlists, we see that it makes sense to tune the weights of the features to regulate their impact on the final results. "
]
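The notebook's `evaluate` helper is not shown in this diff, but the metric the text describes — the share of retrieved tracks that come from the query's playlist — can be sketched as follows (the function name is illustrative):

```python
def precision_at_k(retrieved_playlists, query_playlist, k=10):
    """Fraction of the top-k retrieved tracks that share the query track's playlist.
    The query track itself is assumed to be excluded from the list."""
    top = retrieved_playlists[:k]
    return sum(p == query_playlist for p in top) / k

# example: three of the four nearest neighbours come from the query's playlist
score = precision_at_k(["metal", "metal", "rock", "metal"], "metal", k=4)
```

The overlapping-playlist caveat above translates directly into this metric: a *Classic Metal* track retrieved for a *Rock Hymns* query counts as a miss even when the match is musically plausible.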
@@ -1912,9 +1912,9 @@
"\n",
"* **Feature aggregation:** taking only mean and standard deviation is not the most efficient way to aggregate the sequential features provided by the Spotify API.\n",
"* **Distance Measure:** other distance measures could yield better results. This often depends on the underlying dataset.\n",
"* **Better Machine Learning Methods:** the presented nearest neighobr based approach is a linear model and is not able to model non-linearities of music similarities.\n",
"* **Better Machine Learning Methods:** the presented nearest-neighbour-based approach is a linear model and is not able to model non-linearities of music similarities.\n",
"\n",
"In the next part of this tutorial series I will introduce Siamese Netowkrs. These Deep Neural Networks are able to learn high-level features from the low-level features as well as to learn the non-linear distance function to estimate the similarity between two tracks."
"In the next part of this tutorial series I will introduce Siamese Networks. These Deep Neural Networks are able to learn high-level features from the low-level features, as well as the non-linear distance function used to estimate the similarity between two tracks."
]
}
],