Drum Patterns from Latent Space

Percussion Beats And Where To Find Them

Aleksey Tikhonov, Mar 10

TL;DR: I collected a large dataset of drum patterns, then used a neural network approach to map them into an explorable latent space with some recognizable genre areas.

Try the interactive exploration tool or download several thousand unique generated beats.

Context Overview

In recent years, many projects have been dedicated to neural network-generated music (including drum patterns).

Some of these projects explicitly construct a latent space in which each point corresponds to a melody.

This space can then be used both to study and classify musical structures and to generate new melodies with specified characteristics.

Others use simpler techniques, such as “language model” approaches.

However, I was unable to find an overall representation of typical beat patterns mapped to 2D space, so I decided to create one myself.

Below I list the relevant projects that I managed to find and analyze before I started working on my own:

LSTMetallica: the author used a language model approach to predict the next step in a beat.

neuralbeats — another language model project, very similar to LSTMetallica.

Magenta VAE: Google’s Magenta is a great source of interesting models and projects on music generation and augmentation.

Particularly, in 2016 they released the drum_rnn model, and in March 2018 they published the music_vae model.

Since then, many projects have used these models.

For example, last year Tero Parviainen created a really great online drum beat generator based on drum_rnn+Magenta.




Beat Blender was another Magenta-based project (first presented at NIPS 2017).

It’s quite similar to what I wanted to do. However, the authors did not build an overview map of different genres, only an interpolation space between pairs of patterns.

Last but not least, there was my other project, Neural Network RaspberryPi Music box, which used the VAE space to generate an endless stream of piano-like music.

Dataset Building

Most of the projects I found used small datasets of manually selected and cleaned beat patterns, e.g. the GrooveMonkee free loop pack, the free drum loops collection, or aq-Hip-Hop-Beats-60–110-bpm.

These were not enough for my goals, so I decided to extract beat patterns automatically from the huge MIDI collections available online.

In total I collected ~200K MIDI files, then kept only those with a nontrivial channel 9 (counting from zero; this is channel 10 in the one-based numbering of the General MIDI standard, which reserves it for percussion), which left approximately 90K tracks.

Next, I did some additional filtering, basically in the same fashion as implemented in the neuralbeats and LSTMetallica projects (I kept only the 4/4 time signature, applied quantization, and simplified the drum kit to a subset of instruments).

Then I split tracks into separate chunks based on long pauses, and searched for patterns of length 32 steps that were repeated at least 3 times in a row — in order to speed up the process I used hashing and a few simple greedy heuristics.
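The repeat search can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes a chunk is already a list of per-step integers, uses a plain scan where the post used hashing and greedy heuristics, and the function name `find_repeated_patterns` is hypothetical.

```python
from typing import List, Tuple

PATTERN_LEN = 32   # steps per pattern (from the article)
MIN_REPEATS = 3    # consecutive repetitions required (from the article)


def find_repeated_patterns(steps: List[int],
                           length: int = PATTERN_LEN,
                           min_repeats: int = MIN_REPEATS) -> List[Tuple[int, ...]]:
    """Return distinct windows of `length` steps that occur at least
    `min_repeats` times back to back somewhere in `steps`."""
    found: List[Tuple[int, ...]] = []
    seen = set()
    start = 0
    while start + length * min_repeats <= len(steps):
        window = tuple(steps[start:start + length])
        repeats = 1
        pos = start + length
        # Count how many times the same window follows itself immediately.
        while tuple(steps[pos:pos + length]) == window:
            repeats += 1
            pos += length
        if repeats >= min_repeats:
            if window not in seen:   # set lookup relies on tuple hashing
                seen.add(window)
                found.append(window)
            start = pos              # greedy: skip the whole repeated run
        else:
            start += 1
    return found
```

Skipping past a detected run keeps the scan from reporting every phase-shifted copy of the same loop; the remaining shifted duplicates across tracks are handled by the uniqueness check in the next step.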

Finally, I discarded trivial patterns with too low entropy and checked the uniqueness of each pattern in all possible phase shifts.
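A rough sketch of these two checks follows. The post does not say how entropy was measured or which threshold was used, so the Shannon entropy over step values, the `min_entropy` default, and all function names here are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def step_entropy(pattern: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the distribution of step values."""
    counts = Counter(pattern)
    total = len(pattern)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def canonical_shift(pattern: Sequence[int]) -> Tuple[int, ...]:
    """Lexicographically smallest rotation of the pattern.
    All phase shifts of one pattern share the same canonical form."""
    p = tuple(pattern)
    return min(p[i:] + p[:i] for i in range(len(p)))


def dedupe(patterns: Iterable[Sequence[int]],
           min_entropy: float = 1.0) -> List[Tuple[int, ...]]:
    """Drop low-entropy patterns and phase-shifted duplicates."""
    seen = set()
    kept: List[Tuple[int, ...]] = []
    for pattern in patterns:
        if step_entropy(pattern) < min_entropy:  # assumed threshold
            continue
        key = canonical_shift(pattern)
        if key not in seen:
            seen.add(key)
            kept.append(tuple(pattern))
    return kept
```

Comparing canonical rotations means a pattern and the same pattern started a few ticks later count as one entry.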

Ultimately, I ended up with 33K unique patterns in my collection.

Some examples of distilled patterns

I used a simple scheme to encode each pattern: a pattern has 32 time ticks and there are 14 possible percussion instruments (after simplification), so each pattern can be described by 32 integers in the range from 0 to 16383.
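The upper bound follows from 14 instrument bits per tick: 2^14 - 1 = 16383. A minimal sketch of such an encoding is shown below; the bit order (bit i stands for instrument i) and the function names are assumptions, since the post does not specify them.

```python
from typing import List

NUM_STEPS = 32
NUM_INSTRUMENTS = 14


def encode_pattern(matrix: List[List[int]]) -> List[int]:
    """Pack a 32 x 14 on/off matrix into 32 integers (one per tick)."""
    if len(matrix) != NUM_STEPS:
        raise ValueError("expected 32 time ticks")
    codes = []
    for row in matrix:
        if len(row) != NUM_INSTRUMENTS:
            raise ValueError("expected 14 instruments per tick")
        # Assumed bit order: bit i is set when instrument i is hit.
        codes.append(sum(1 << i for i, hit in enumerate(row) if hit))
    return codes


def decode_pattern(codes: List[int]) -> List[List[int]]:
    """Unpack 32 integers back into the 32 x 14 on/off matrix."""
    return [[(code >> i) & 1 for i in range(NUM_INSTRUMENTS)]
            for code in codes]
```

A tick with every instrument sounding encodes to 16383, and a silent tick encodes to 0.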

You can download the dataset here in TSV format:

The first column holds the pattern code (32 comma-separated integers).

The second column is the point of this pattern in the 4D latent space (4 comma-separated float values), see details below.

The third column is the t-SNE mapping from the latent space into a 2D projection (2 comma-separated float values), see details below.
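Reading one row of such a file could look like the sketch below. It assumes tab-separated rows with no header line; the `BeatRecord` type and `parse_row` function are hypothetical names, not part of the published dataset.

```python
from typing import List, NamedTuple, Tuple


class BeatRecord(NamedTuple):
    pattern: List[int]            # 32 per-tick instrument codes
    latent: Tuple[float, ...]     # point in the 4D latent space
    tsne: Tuple[float, ...]       # 2D t-SNE projection


def parse_row(line: str) -> BeatRecord:
    """Parse one TSV row with the three columns described above."""
    code_col, latent_col, tsne_col = line.rstrip("\n").split("\t")
    pattern = [int(x) for x in code_col.split(",")]
    latent = tuple(float(x) for x in latent_col.split(","))
    tsne = tuple(float(x) for x in tsne_col.split(","))
    if len(pattern) != 32 or len(latent) != 4 or len(tsne) != 2:
        raise ValueError("unexpected column sizes")
    return BeatRecord(pattern, latent, tsne)
```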

Neural Network

I used the PyTorch framework to build a network with a 3-layer fully connected encoder mapping the beat matrix (32*14 bits) into a 4D latent space, and a decoder of the same size as the encoder.

The first hidden layer has 64 neurons and the second has 32.

I used ReLU between the layers and a sigmoid to map the decoder output back into the bit mask.
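To make the layer sizes concrete, here is a pure-Python sketch of the forward pass with random, untrained weights. It only illustrates the 448 → 64 → 32 → 4 encoder and the mirrored decoder; the original model was built in PyTorch, and the VAE/ACAI training objectives (and any extra layers they need) are not shown. All names are hypothetical.

```python
import math
import random
from typing import List, Tuple

INPUT_BITS = 32 * 14                      # flattened beat matrix, 448 bits
ENCODER_SIZES = [INPUT_BITS, 64, 32, 4]   # three linear layers
DECODER_SIZES = ENCODER_SIZES[::-1]       # mirrored decoder (assumed layout)

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def make_layers(sizes: List[int], rng: random.Random) -> List[Layer]:
    """Create randomly initialised fully connected layers."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers: List[Layer], x: List[float],
            final_sigmoid: bool) -> List[float]:
    """Apply the layers with ReLU in between; optionally squash the
    last output with a sigmoid (used for the decoder)."""
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]                  # ReLU
        elif final_sigmoid:
            x = [1.0 / (1.0 + math.exp(-v)) for v in x]   # sigmoid
    return x
```

Passing a 448-element bit vector through the encoder yields 4 latent values, and the decoder turns those back into 448 values between 0 and 1 that can be thresholded into a bit mask.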

I tried and compared two different latent space smoothing techniques, VAE and ACAI:

the standard VAE produced good enough results, and the space mapping was clear and meaningful;

the ACAI space appeared to be much smoother: it’s harder to visualize, but it’s better to sample from (see details below).

The METAL beats area on the VAE space projection (left) and the ACAI space projection (right)

Drum Generation

I generated random beats using the standard logic: I sampled a random point from the latent space, then used the decoder to convert that point into a beat pattern.

You could also sample points from a specific area of the latent space to obtain a beat of a certain style typical for that area.

The resulting patterns were filtered using the same set of filters I used to build the dataset (see details above).
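The sampling loop can be sketched as below. Several things are assumptions here: the Gaussian sampling around a `center` point, the 0.5 threshold, the step-major layout of the decoder output, and the `toy_decoder` stand-in, which ignores its input and exists only so the example runs without a trained model.

```python
import random
from typing import Callable, List, Optional, Sequence

LATENT_DIM = 4
NUM_STEPS = 32
NUM_INSTRUMENTS = 14

Decoder = Callable[[Sequence[float]], Sequence[float]]


def sample_beat(decoder: Decoder, rng: random.Random,
                center: Optional[Sequence[float]] = None,
                spread: float = 1.0, threshold: float = 0.5) -> List[int]:
    """Sample a latent point near `center`, decode it, and threshold the
    448 probabilities into 32 per-tick instrument codes."""
    if center is None:
        center = [0.0] * LATENT_DIM
    z = [rng.gauss(c, spread) for c in center]
    probs = decoder(z)
    codes = []
    for step in range(NUM_STEPS):
        row = probs[step * NUM_INSTRUMENTS:(step + 1) * NUM_INSTRUMENTS]
        codes.append(sum(1 << i for i, p in enumerate(row) if p >= threshold))
    return codes


def pass_rate(beats: List[List[int]],
              passes_filters: Callable[[List[int]], bool]) -> float:
    """Share of generated beats that survive the dataset filters."""
    return sum(1 for b in beats if passes_filters(b)) / len(beats)


def toy_decoder(z: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a trained decoder: instrument 0 on every
    fourth tick, everything else unlikely. It ignores z."""
    return [0.9 if (i % NUM_INSTRUMENTS == 0 and (i // NUM_INSTRUMENTS) % 4 == 0)
            else 0.1 for i in range(NUM_STEPS * NUM_INSTRUMENTS)]
```

Passing a `center` taken from a genre area of the latent space, with a small `spread`, corresponds to sampling a beat typical for that area.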

VAE-generated beats had a quality (the share of beats passing the filters) of about 17%; in other words, on average about one out of six generated beats passed the filters.

In the case of ACAI, the quality was significantly higher — around 56%, so more than half of the generated patterns passed the filters.

I generated 10K beats using each method and published them (the format is similar to the main dataset file format): sampled10k_vae_raw, sampled10k_acai_raw, sampled10k_vae_filtered, sampled10k_acai_filtered.

You can also download the MIDI packs made from these generated patterns: sampled_vae_midi.zip and sampled_acai_midi.


Space Visualization

I used the VAE space for the visualization since it has a more distinct visual structure.

Dots are placed based on the t-SNE projection of that space.

Some of the initially collected MIDIs had genre labels in filenames, so I used these to locate and mark the areas with patterns pertaining to specific genres.

I built the visualizer using the jQuery, remodal and MIDI.js (with a recoded GM1 Percussion soundfont) libraries.

You can explore a 3K subset of training set patterns (gray dots) plus approx. 500 generated patterns (red dots).

Try it here!
