Waveform Analysis Unlocks the Data in Music

Tyler Blair, Jun 3

A waveform is a song’s fingerprint.

It is this waveform, the superposition of acoustic (sound) waves, that contains all of a song’s content.

From it, a rich array of features can be revealed.

These features can enhance recommendation algorithms far beyond what is possible using only a song’s title, artist, and genre.

Whether as part of a purely content-based approach or as a vital first step in addressing the ‘cold start’ problem, understanding media content is as important as understanding how it is consumed.

My Relationship with Music

Music is extremely important to me.

It has been a constant in my life that has seen me through good times and bad.

Since I spend so much time consuming music, finding new music has always been a passion of mine.

While I don’t know how many hours I’ve spent listening to music over the years, I do know that in the past eight years alone, I have amassed a library of over 6,000 individually selected songs.

The size of my music library.

Since my entire music collection is stored offline, I am unable to take part in collaborative filtering algorithms, like those utilized by music streaming platforms to recommend new songs.

Concretely, if User X listens to much of the same music that I do, the music streaming platform understands that I might also enjoy other songs that User X listens to.

So if I wish to create themed playlists, I am limited to content-based filtering on the music I have at hand.

However, it’s time-consuming to hand-select a cohesive set of songs from a collection this large, even if I could perfectly recall each song’s title, artist, and identifying features (e.g., classical, piano, or “chill”).

In reality, I can only recall a small fraction of these songs, and often forget the artist’s last name (I know their name was John… something), so making playlists this way is a daunting task that I typically avoid.

Instead, I usually simplify the process by enabling shuffle and pressing play.

Last year, however, I realized that I could leverage the waveform data of songs to address my growing problem of creating playlists.

I had previously worked with data from light sources, and because light and sound both propagate as waves, I was confident that I would be able to use my skill set to tackle the issue.

Example single channel audio waveform.

Feature Extraction and Selection

Fueled by excitement and Seattle’s coffee, I brainstormed a bit of my approach:

- What features can be extracted from these waveforms?
- What features would be most useful for what I want?
- How should songs be selected for a playlist?

I wanted to get started and knew I would be able to answer the latter questions once I had a number of features in hand, so that’s where I started.

After refactoring the at-the-time outdated pyAudioAnalysis package and managing package dependencies, I extracted features for each song.

I knew that if I extracted features over only a portion of a song, the extracted features may not accurately reflect the song in its entirety.

To address this, features were extracted over a moving window and averaged.

Doing so also let me calculate how much each feature varied across windows (its standard deviation) and use that as an additional feature.
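As a rough illustration of the moving-window idea, the sketch below slices a signal into overlapping frames, computes one feature per frame, and reports that feature's mean and standard deviation. The window and step sizes, the function names, and the synthetic test tone are assumptions made for illustration; this is not the code used for this project, and pyAudioAnalysis has its own framing logic.

```python
import numpy as np

def frame_signal(signal, window, step):
    """Split a 1-D signal into overlapping frames of `window` samples,
    advancing `step` samples each time (assumes len(signal) >= window)."""
    n_frames = 1 + (len(signal) - window) // step
    return np.stack([signal[i * step : i * step + window] for i in range(n_frames)])

def frame_energy(frame):
    """Sum of squares of the frame, normalized by its length."""
    return np.sum(frame ** 2) / len(frame)

def summarize(signal, feature_fn, window=2048, step=1024):
    """Return (mean, std) of a per-frame feature across the whole signal."""
    values = np.array([feature_fn(f) for f in frame_signal(signal, window, step)])
    return values.mean(), values.std()

# Synthetic stand-in for a song: one second of a 440 Hz tone that gets louder.
sr = 22050
t = np.arange(sr) / sr
song = np.linspace(0.1, 1.0, sr) * np.sin(2 * np.pi * 440 * t)

energy_mean, energy_std = summarize(song, frame_energy)
print(energy_mean, energy_std)
```

Because the tone grows louder over time, the per-frame energy changes from window to window, so the standard deviation carries information that the mean alone would hide.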

The extraction amounted to 33 primary features, each with an associated standard deviation, for each song:

- Zero-crossing Rate: Rate of sign-changes (audio waveforms are relative and centered at zero). A key feature in classifying percussive sounds.
- Energy: Sum of squares of the signal values, normalized by window length. Identifies the loudness of a song.
- Entropy of Energy: Entropy of sub-frames’ normalized energies. Used as a measure of abrupt changes.
- Spectral Centroid: Center of gravity of the spectrum. Related to the “brightness” of a sound.
- Spectral Spread: Second central moment of the spectrum. Measures the bandwidth of the spectrum.
- Spectral Entropy: Entropy of the normalized spectral energies. Measure of spectral variation over time.
- Spectral Flux: Squared difference between normalized magnitudes of the spectra of two successive frames. Measures steadiness/consistency of the spectrum.
- Spectral Roll-Off: Frequency below which 90% of the magnitude distribution of the spectrum is concentrated.
- Mel-Frequency Cepstral Coefficients (13 features): A variation of the linear cepstrum computed on the mel scale, a nonlinear scale of pitches. Typically used for speech recognition.
- Key (12 features; G, C, D, etc.): Spectral energy in each semitone.
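To make a few of these definitions concrete, here is a minimal NumPy sketch of the zero-crossing rate, spectral centroid, spectral spread, and spectral roll-off for a single frame. The function names are my own, the spread is reported as the square root of the second central moment so that it comes out in Hz, and the scaling may differ from what pyAudioAnalysis reports. Treat it as an illustration of the definitions rather than a reimplementation of that package.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    return np.mean(signs[1:] != signs[:-1])

def spectral_features(frame, sr, rolloff=0.90):
    """Centroid, spread, and roll-off (all in Hz) of one frame's magnitude spectrum."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    weights = mag / (mag.sum() + 1e-12)          # normalize so the weights sum to ~1
    centroid = np.sum(freqs * weights)           # "center of gravity" of the spectrum
    spread = np.sqrt(np.sum((freqs - centroid) ** 2 * weights))
    cumulative = np.cumsum(mag)
    # First frequency bin at which the cumulative magnitude reaches the roll-off share.
    rolloff_hz = freqs[np.searchsorted(cumulative, rolloff * cumulative[-1])]
    return centroid, spread, rolloff_hz

# A pure 1000 Hz tone: nearly all spectral magnitude sits in a single bin.
sr = 8000
t = np.arange(1024) / sr
tone = np.sin(2 * np.pi * 1000 * t)

centroid, spread, rolloff_hz = spectral_features(tone, sr)
print(centroid, spread, rolloff_hz)
```

For a pure tone the centroid and roll-off land at the tone's frequency and the spread is close to zero; real music spreads its energy over many bins, which is what makes these features useful for telling songs apart.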

From these 33 features, I selected a list of useful features based on the following considerations:

- As features are calculated over a song’s length, songs with widely varying components or mixes with multiple songs will have features that don’t perfectly reflect the song.
- The key of the song was not ideal to use as a feature for song selection. I didn’t want a playlist of songs in eerie minor notes (see Adele’s Someone Like You in A minor), so I removed features involving the song’s key.
- Mel-frequency cepstral coefficients are better suited to speech recognition than to selecting similar-sounding songs, so these were also removed.

Ultimately, the 33 features were distilled down to just eight useful features along with their standard deviations, which were then normalized.
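Normalization matters here because Euclidean distance is dominated by whichever feature has the largest numeric range. The exact scaling is not spelled out above, so the sketch below assumes a common choice, z-score standardization (zero mean and unit variance per feature), and uses made-up numbers.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance (z-score)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unchanged instead of dividing by zero
    return (features - mean) / std

# Hypothetical raw values: 4 songs x 2 features on very different scales.
raw = np.array([[0.05, 1200.0],
                [0.10, 3400.0],
                [0.07, 2100.0],
                [0.02,  900.0]])
scaled = standardize(raw)
print(scaled)
```

After scaling, a difference of one unit means "one standard deviation" in every column, so no single feature drowns out the others when distances are computed.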

So with a robust set of features that encapsulated my entire music library, it was finally time to make playlists!

My library, which exists without convenient genre labels such as Alternative, Pop, Rock, etc., does not allow for genre-specific playlists in the traditional sense.

This bothered me until I realized that genre boundaries are not well defined and that relying on them would have introduced an unnecessary bias into how I generate playlists.

It became evident that something similar to the k-Nearest Neighbors (k-NN) algorithm would be the most fitting way to create playlists, although I didn’t need to build a classifier. Once I specified a song, I would find its nearest neighbors in 16-dimensional space by calculating pairwise Euclidean distances to every other song with scikit-learn.

I also wanted my framework to be robust enough that I would not need to comb through thousands of files or remember the exact song or artist name to find the song to base the playlist on.

To this end, I created a simple pipeline for creating each playlist.

Create a Playlist!

First, I input a partial song or artist name and the desired playlist length.

For example, if I desired a 15-song playlist based on a classical piano song by Ludovico Einaudi, but I couldn’t recall the song title or his last name, the input would be:

    genPlaylist('Ludovico', 15)

This would then display all songs that matched a loose SQL query:

    SELECT name WHERE name LIKE '%ludovico%'

Upon seeing the resulting list of names and realizing that I wanted Ludovico Einaudi’s song Nuvole Bianche, I could then select song 2.
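The lookup step can also be approximated without a database. The sketch below is a hypothetical stand-in for the search inside genPlaylist: it assumes a pandas DataFrame with `title` and `artist` columns (the `artist` column and the sample rows are invented for illustration) and does a case-insensitive substring match, much like the `LIKE '%ludovico%'` pattern.

```python
import pandas as pd

def find_songs(song_df, query):
    """Return rows whose title or artist contains `query`, ignoring case."""
    in_title = song_df["title"].str.contains(query, case=False, regex=False)
    in_artist = song_df["artist"].str.contains(query, case=False, regex=False)
    return song_df[in_title | in_artist]

# Made-up sample rows standing in for the real library.
song_df = pd.DataFrame({
    "title": ["Experience", "Nuvole Bianche", "Someone Like You"],
    "artist": ["Ludovico Einaudi", "Ludovico Einaudi", "Adele"],
})

matches = find_songs(song_df, "ludovico")
# Show a numbered list so a song can be picked by number.
for number, title in enumerate(matches["title"], start=1):
    print(number, title)
```

Passing `regex=False` treats the query as plain text, so a partial name containing characters such as `(` or `.` does not get misread as a pattern.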

The code below then determines the Euclidean distance from Nuvole Bianche to every other song, sorts the distances, and returns the nearest neighbors to be used in the playlist.

    from sklearn.metrics import pairwise_distances

    # Distance from the selected song to every song in the library
    min_dists = pairwise_distances(features_df.….reshape(1, -1), features_df.values)
    # Positions of the closest songs, nearest first
    neighbor_indices = min_dists.argsort()[0][:num_songs]
    id_list = []
    for neighbors in neighbor_indices:
        id_list += [features_df.….name]
    # Look up the matching rows and their titles
    new_playlist = song_df[song_df.….isin(id_list)]
    new_playlist_entries = new_playlist['title']

With the songs identified, a folder named “Ludovico Einaudi – Nuvole Bianche” is created and the songs are moved to that folder.
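For readers who want to try the nearest-neighbor step on their own data, here is a self-contained sketch of the same idea. It assumes a DataFrame of normalized features indexed by a song id; the helper name `nearest_songs`, the ids, and the random feature values are all hypothetical and stand in for a real library.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import pairwise_distances

def nearest_songs(features_df, seed_id, num_songs):
    """Return the ids of the `num_songs` songs closest to `seed_id` (seed included)."""
    seed_vector = features_df.loc[seed_id].values.reshape(1, -1)
    dists = pairwise_distances(seed_vector, features_df.values)  # shape (1, n_songs)
    order = dists.argsort()[0][:num_songs]                       # nearest first
    return list(features_df.index[order])

rng = np.random.default_rng(0)
# Hypothetical library: 6 songs in two loose clusters of a 16-dimensional space.
calm = rng.normal(loc=-1.0, scale=0.1, size=(3, 16))
loud = rng.normal(loc=1.0, scale=0.1, size=(3, 16))
features_df = pd.DataFrame(
    np.vstack([calm, loud]),
    index=["calm_a", "calm_b", "calm_c", "loud_a", "loud_b", "loud_c"],
)

playlist_ids = nearest_songs(features_df, "calm_a", 3)
print(playlist_ids)
```

The seed song is its own nearest neighbor, so it comes back first; a playlist of N other songs would need to ask for N + 1 results and drop the first.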

Finally, I can listen to my newly formed playlist whenever I want, or I can continue creating more playlists!

If you are interested, the playlist formed by this exact query (once again limited to those songs within my music library) can be found here.

You can also check out the source code here.

Conclusion and Potential Applications

While this framework is currently limited to my personal library of music and thus does not contribute to finding new music for me, it’s great for creating playlists that match the theme of a certain song within moments.

However, with a more extensive library, like those of Spotify, iTunes, or YouTube, its uses could be extended to facilitate music discovery and to address the ‘cold start’ problem with new users.

In a 16-dimensional feature space, a user may have a well-defined boundary of what they will typically listen to.

If a user then decides that they want to expand their music library to songs that lie outside of their current feature space boundary, a playlist that includes songs that are just beyond this boundary would be an effective way to do so.

For a new user, this method could be employed to quickly define their feature space boundary, at which point collaborative filtering could also be used.
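One way to picture the boundary idea is sketched below. It makes a strong simplifying assumption that is mine, not something established above: the user's taste is treated as a sphere around the centroid of their library, with a radius equal to the distance of their farthest song. Songs from a larger catalog that fall in a thin shell just outside that sphere become discovery candidates. The `margin` parameter and the toy 2-D data are purely illustrative.

```python
import numpy as np

def just_beyond_boundary(user_features, catalog_features, margin=0.25):
    """Indices of catalog songs in a thin shell just outside the user's library,
    ordered from closest to the boundary to farthest."""
    center = user_features.mean(axis=0)
    radius = np.linalg.norm(user_features - center, axis=1).max()
    catalog_dist = np.linalg.norm(catalog_features - center, axis=1)
    in_shell = (catalog_dist > radius) & (catalog_dist <= radius * (1 + margin))
    shell_indices = np.flatnonzero(in_shell)
    return shell_indices[np.argsort(catalog_dist[shell_indices])]

# Toy example in 2 dimensions (a real feature space here would have 16).
user_features = np.array([[0.0, 0.0], [2.0, 0.0]])   # center (1, 0), radius 1
catalog_features = np.array([[1.0, 0.5],             # inside the boundary
                             [1.0, 1.1],             # just outside
                             [1.0, 1.2],             # just outside
                             [1.0, 3.0]])            # far outside
candidates = just_beyond_boundary(user_features, catalog_features)
print(candidates)
```

A sphere is the crudest possible boundary; a real system would likely need something that follows the actual shape of a listener's library, but the shell idea stays the same.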

While I have only discussed this problem in the context of music, its applications extend far beyond playlist generation.

The underlying approach can be (and may already have been) adapted to finding new shows on video streaming platforms, semi-targeted advertising, and other uses, though it requires a rich feature space.

This can be accomplished by engineering features from raw data, revealing a wealth of information, and is not limited to waveform analysis.

So I encourage you to take a deeper look at the data you have because, much like an iceberg, what’s readily seen might only be the beginning.

