AI Music Generation: Lead Sheet Composition and Arrangement
Hao-Min Liu
Jan 2

Since 2018, I have been doing AI music generation research at the Music and AI Lab, directed by Prof. Yi-Hsuan Yang.
At that time, much work had already been done on music generation using deep learning.
Through a literature survey, we found that people working on music generation usually start by generating melodies.

Melody Generation: Unconditional or Conditional
Melody generation can be divided into two groups by the type of generative model: unconditional generation and conditional generation.
Unconditional generation: Generates melodies from scratch.
Conditional generation: Generates melodies from conditional information.
In other words, you can generate melodies either from scratch or from some given information.
For example, you can give the model a prime melody, that is, the melody of the first bar or two, and let it continue generating the following bars.
You can also feed the model tags, such as emotion or genre tags, so that it generates melodies that fit the condition.
Lyrics and video clips are other kinds of conditioning information that people have tried.
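To make the difference concrete, here is a minimal toy sketch. It is not the model from our work: the "model" is just a hypothetical weighted random walk over the C major scale, and the names `generate_melody`, `STEPS`, `WEIGHTS`, and `C_MAJOR` are made up for illustration. The only point is the interface: with no prime the melody starts from scratch, and with a prime the generator continues from the given notes.

```python
import random

# Hypothetical toy "model": melodic steps (in scale degrees) and their weights.
STEPS = [-2, -1, 0, 1, 2]
WEIGHTS = [1, 3, 2, 3, 1]
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI pitches C4..C5


def generate_melody(length, prime=None, seed=None):
    """Return `length` MIDI pitches; continue `prime` if one is given.

    Every pitch in `prime` must be a member of C_MAJOR.
    """
    rng = random.Random(seed)
    if prime:
        # Conditional generation: start from the given prime melody.
        melody = list(prime)
    else:
        # Unconditional generation: start from a random scale note.
        melody = [rng.choice(C_MAJOR)]
    while len(melody) < length:
        degree = C_MAJOR.index(melody[-1])
        step = rng.choices(STEPS, weights=WEIGHTS)[0]
        # Clamp the new scale degree so it stays inside the scale.
        degree = min(max(degree + step, 0), len(C_MAJOR) - 1)
        melody.append(C_MAJOR[degree])
    return melody[:length]


from_scratch = generate_melody(16, seed=0)
continued = generate_melody(16, prime=[60, 62, 64, 65], seed=0)
```

A real conditional model replaces the random walk with a learned network, but the shape of the call stays the same: the condition is an extra input next to the random seed.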
But generating only melodies is not interesting enough, right?

What about Melody++
Now, from the point of view of the model output, we can generate chords along with the melody, which is known as lead sheet generation.
Furthermore, we can generate melodies and chords together with drums, which give the music a stronger sense of rhythm.
Ultimately, we can add other instruments so that the music is played by a full band or orchestra.
Melody + Chords + Drums + Other Instruments

Our Goal
Our goal is not merely melody generation or melody plus chord (lead sheet) generation, but generation with a full set of musical instruments.
Let’s take the pop song “I Have Nothing” in three versions (i.e., melody only, melody + chords, and multi-instrument) as an example.
The last version is the task we are tackling.
Melody-only version
Lead sheet version
Multi-instrument version

Our Challenge
As we know, building deep learning models requires datasets.
So let us look at the challenge from the dataset point of view.
Two types of datasets are available for building a music generation model.
The first is lead sheets and the second is MIDI files.
Lead sheets provide melodies and chords, so if you only want to generate melodies, you can use the melody part.
If you want to generate chords as well, you can use both.
Here is an example of what a lead sheet looks like.
Amazing Grace by John Newton in lead sheet format
With this format, you can train a model to generate melodies and chords, since the lead sheet provides both.
Previous works such as the improv_rnn model (Google Magenta) and MidiNet (our lab) were trained on this dataset format.
But if you want to generate drums and all the other instruments, you need to use MIDI.
The following picture shows what a MIDI file looks like.
I Have Nothing by Whitney Houston in MIDI format
Relatively few people have generated MIDI as the output, because it is more difficult than lead sheet generation, which involves only melody and chords.
With MIDI, you need to take care of the dependencies among many more instruments.
For example, MIDI has 128 instrument settings.
Of course, we can simplify them into four or five instruments, but we still need to model the dependencies among them so that they sound coherent when played together.
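As a rough sketch of what "simplifying into four or five instruments" can look like, the function below folds a General MIDI program number into one of five tracks. The five track names and the rule that everything outside the piano, guitar, and bass families falls into a catch-all "strings" track are assumptions made for illustration; this post does not specify the exact grouping. The program ranges follow the General MIDI instrument families (0-indexed), and percussion is identified by a drum flag rather than by program number.

```python
# Assumed target tracks for illustration; the exact grouping is not given in this post.
TRACKS = ("drums", "piano", "guitar", "bass", "strings")


def track_for(program, is_drum=False):
    """Map a General MIDI program number (0-127) to one of five tracks."""
    if not 0 <= program <= 127:
        raise ValueError("program must be in 0..127")
    if is_drum:
        # Percussion is marked by its channel, not by its program number.
        return "drums"
    if program <= 7:
        return "piano"    # General MIDI piano family
    if 24 <= program <= 31:
        return "guitar"   # General MIDI guitar family
    if 32 <= program <= 39:
        return "bass"     # General MIDI bass family
    # Assumption: every other instrument is merged into a catch-all track.
    return "strings"


print(track_for(0), track_for(25), track_for(33), track_for(73))
```

Merging tracks this way shrinks the problem, but it does not tell us which track carries the melody, which is the issue discussed next.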
MuseGAN, a model our lab published at AAAI’18, tries to generate music with multiple tracks.
The following link provides the results of MuseGAN.
MuseGAN: An AI for music generation (salu133445. io)
You’ll notice that the major issue in the results is that the music has no melody line.
Why?
Because a MIDI file usually does not specify where the melody is.
Sometimes it is played by the piano, sometimes by the violin, and sometimes by something else.
So there is a clear gap in terms of dataset availability: we have files in either lead sheet format or MIDI format, but not both.
We want to generate a song with melody, chords, and other instruments, but neither dataset alone gets us there.
With lead sheets, we can only generate melodies and chords.
With MIDI, we can generate multiple tracks, but we don’t know where the melody and chords are.
So the task of this work is to bridge the gap: to go from mere melody or lead sheet generation to a new task that we call “lead sheet arrangement.”

Our Approach
We separate lead sheet arrangement into two parts.
The first is an unconditional generation model that generates lead sheets from scratch.
The second is a conditional generation model that takes a lead sheet as input and generates the arrangement.
Two-Step Approach for Lead Sheet Arrangement
But one problem remains: how do we train the conditional arrangement model?
To do so, we would still need a paired dataset that contains both the lead sheets and their arrangements.
The key idea for tackling this challenge is to use chord-related features.
Lead sheet files have explicit chords.
MIDI files do not specify where the chords are, but you can still extract chord-related features from them.
Chords can therefore serve as the common language between lead sheet files and MIDI files.
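This post does not spell out which chord-related features are used, so the sketch below shows just one plausible candidate, assumed here for illustration: a per-bar pitch-class (chroma) histogram. The function name `chroma_per_bar` and the note format `(start_step, end_step, midi_pitch)` are hypothetical. The useful property is that the same function can be applied to the chord track of a lead sheet and to the accompaniment tracks of a MIDI file, which gives the two datasets a shared description.

```python
STEPS_PER_BAR = 48  # time resolution per bar, as used in this post


def chroma_per_bar(notes, n_bars):
    """Count, per bar, how many time steps each of the 12 pitch classes is active.

    notes: iterable of (start_step, end_step, midi_pitch), end exclusive.
    Returns a list of n_bars lists, each holding 12 counts (C, C#, ..., B).
    """
    chroma = [[0] * 12 for _ in range(n_bars)]
    total_steps = n_bars * STEPS_PER_BAR
    for start, end, pitch in notes:
        # Ignore any part of a note that falls outside the clip.
        for step in range(max(start, 0), min(end, total_steps)):
            chroma[step // STEPS_PER_BAR][pitch % 12] += 1
    return chroma


# A C major triad held for bar 0, then a single F held for bar 1.
notes = [(0, 48, 60), (0, 48, 64), (0, 48, 67), (48, 96, 65)]
features = chroma_per_bar(notes, n_bars=2)
```

Because the histogram folds all octaves together, it does not care which instrument or register played the notes, only which pitch classes were sounding.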

Our Model
Our model is based on MuseGAN, which uses the piano-roll format as its data representation.
In the first stage, lead sheet generation, we simply take MuseGAN and reduce the number of tracks to two (melody and chord).
A secondary contribution of this work is the finding that using a recurrent model for phrase generation captures more of the repetitive patterns in pop songs.
In the second stage, lead sheet arrangement, we apply a conditional MuseGAN that learns to generate five instrument tracks conditioned on chord-related features.
We convert both the lead sheet files and the MIDI files into piano-roll form, as shown in the following figure.
Each clip consists of 8 bars.
Each bar consists of 48 time steps.
The vertical axis is the pitch range, for which we use 84 pitches in total.
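The dimensions above (8 bars, 48 time steps per bar, 84 pitches) can be sketched as a small conversion routine for a single track. Two things here are assumptions not stated in this post: the lowest pitch of the 84-pitch window (`LOWEST_PITCH = 24`, MIDI note C1) and the note format `(start_step, end_step, midi_pitch)`. The function name `to_piano_roll` is also hypothetical. One clip spans 8 × 48 = 384 time steps, and in 4/4 time (again an assumption) 48 steps per bar means 12 steps per beat.

```python
N_BARS, STEPS_PER_BAR, N_PITCHES = 8, 48, 84
LOWEST_PITCH = 24  # assumption: the 84-pitch window starts at MIDI note 24 (C1)


def to_piano_roll(notes):
    """Convert notes into a binary piano roll indexed as roll[bar][step][pitch].

    notes: iterable of (start_step, end_step, midi_pitch), end exclusive,
    with steps counted from the start of the clip.
    """
    roll = [[[0] * N_PITCHES for _ in range(STEPS_PER_BAR)]
            for _ in range(N_BARS)]
    total_steps = N_BARS * STEPS_PER_BAR
    for start, end, pitch in notes:
        index = pitch - LOWEST_PITCH
        if not 0 <= index < N_PITCHES:
            continue  # drop notes outside the 84-pitch window
        for step in range(max(start, 0), min(end, total_steps)):
            bar, offset = divmod(step, STEPS_PER_BAR)
            roll[bar][offset][index] = 1
    return roll


# Middle C (MIDI 60) held for the first 12 steps of bar 0.
roll = to_piano_roll([(0, 12, 60)])
```

A multi-track clip is then just one such roll per track, stacked along an extra track axis.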
System Flow
System Architecture

Results
Amazing Grace Arrangement Demo
More results are shown on the demo page: https://liuhaumin. io/LeadsheetArrangement/

Conclusions
We proposed the first model for lead sheet arrangement via a two-step process.
Step 1: A recurrent convolutional GAN model for lead sheet generation captures more of the repetitive patterns in the pop song dataset.
Step 2: A conditional GAN model bridges the gap between the two datasets by using chord-related features.

Reference
[Paper] Hao-Min Liu and Yi-Hsuan Yang. Lead Sheet Generation and Arrangement by Conditional Generative Adversarial Network. IEEE International Conference on Machine Learning and Applications (ICMLA), Dec.
[Open source code] https://github.com/liuhaumin/LeadsheetArrangement
[Demo website] https://liuhaumin. io/LeadsheetArrangement/