Schemescape

Development log of a life-long coder

Generating music using machine learning

Recent music generation results (e.g. Music Transformer: Generating Music with Long-Term Structure) are part of what piqued my interest in machine learning. After following an introduction to machine learning, it's time for some experimentation.

First up: generating a ragtime piano piece.

Approaches

The most compelling generated music I've seen thus far comes from Google Brain, namely their Performance RNN and Music Transformer papers. The associated GitHub repositories appear to contain models that have been pre-trained on various corpora (e.g. a piano competition's MIDI recordings). It's also possible to train a model on a new corpus. The trained models can either generate continuations from a primer or generate unconditioned music "from scratch".

Here are several approaches I'm investigating for generating a ragtime piece:

  1. Condition pre-trained Performance RNN and Music Transformer models with existing ragtime music (either an intro or the first few measures) and generate a continuation
  2. Train a new model on a corpus of ragtime music, and then do unconditioned generation
  3. Train a new model on a ragtime corpus and generate a continuation from a ragtime primer
  4. Train a new model on a ragtime corpus and generate a continuation from an arbitrary primer

I don't have any intuition for how large a corpus is required to train a decent model, so it's possible that options 2-4 won't be feasible for me (either because finding or generating such a training corpus is too difficult, or because the compute required to train the model is beyond what my computer can handle).

Using pre-trained models

Without installing anything locally, you can use the Music Transformer notebook to generate music. There are several options:

Unconditional generation

Without providing a primer, I don't think it's possible to indicate what genre of music you'd like to generate. The clip I got, for example, sounds like some sort of boogie-woogie folk march. Obviously, this isn't the genre I was looking for (or, really, that anyone was looking for). Rather than continuing on randomly like this, I'll investigate primed generation.

Continuations

The notebook linked above also supports providing a primer, either chosen from a provided list or uploaded as a MIDI file directly in the UI.

Cropping MIDI primers

The primer is included in the output, so I assume it should be reasonably short. My original plan was to edit down an existing ragtime MIDI file in MuseScore, but MuseScore's output appears to be incompatible with the pretty_midi module that the notebook uses, resulting in the following warning:

    pretty_midi\pretty_midi.py:97: RuntimeWarning: Tempo, Key or Time signature change events found on non-zero tracks. This is not a valid type 0 or type 1 MIDI file. Tempo, Key or Time Signature may be wrong.
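
Judging by the message, MuseScore writes these meta events on instrument tracks, whereas pretty_midi wants them all on track 0. I didn't pursue a programmatic fix, but merging the events onto track 0 with, say, mido would presumably look something like the following (an untested sketch; the file names are made up):

    import mido

    # Meta event types that pretty_midi expects on track 0
    META_TYPES = {'set_tempo', 'key_signature', 'time_signature'}

    def to_absolute(track):
        # Convert delta times (in ticks) to absolute times
        events, now = [], 0
        for msg in track:
            now += msg.time
            events.append((now, msg))
        return events

    def to_track(events):
        # Sort by absolute time (end_of_track always last) and convert
        # back to delta times
        track = mido.MidiTrack()
        prev = 0
        for now, msg in sorted(events, key=lambda e: (e[1].type == 'end_of_track', e[0])):
            track.append(msg.copy(time=max(0, now - prev)))
            prev = max(prev, now)
        return track

    mid = mido.MidiFile('musescore_export.mid')  # hypothetical file name
    tracks = [to_absolute(track) for track in mid.tracks]

    # Move tempo/key/time signature events from other tracks to track 0
    for events in tracks[1:]:
        keep = []
        for event in events:
            target = tracks[0] if event[1].is_meta and event[1].type in META_TYPES else keep
            target.append(event)
        events[:] = keep

    mid.tracks = [to_track(events) for events in tracks]
    mid.save('fixed.mid')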

In the end, my workaround was simply to switch to Audacity to crop my MIDI primer (and this worked without issue).
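
For the record, the cropping itself could also be done in code, e.g. with pretty_midi (again, just a sketch with hypothetical file names, and it assumes a source file that pretty_midi loads cleanly):

    import pretty_midi

    CUTOFF_SECONDS = 20.0  # keep roughly the first few measures

    pm = pretty_midi.PrettyMIDI('rag.mid')  # hypothetical input file

    for instrument in pm.instruments:
        kept = []
        for note in instrument.notes:
            if note.start < CUTOFF_SECONDS:
                # Truncate any note that would ring past the cutoff
                note.end = min(note.end, CUTOFF_SECONDS)
                kept.append(note)
        instrument.notes = kept
        # Drop controller/pitch bend data beyond the cutoff as well
        instrument.control_changes = [c for c in instrument.control_changes
                                      if c.time < CUTOFF_SECONDS]
        instrument.pitch_bends = [b for b in instrument.pitch_bends
                                  if b.time < CUTOFF_SECONDS]

    pm.write('primer.mid')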

Example continuations

First, I tried using just the intro bars of some Scott Joplin rags.

Overall, the results are impressive, but also somewhat alien. And definitely not ragtime.

Next, I tried supplying the beginning of a few sections of Joplin rags.

Accompaniment

Out of curiosity, I also tried generating an accompaniment, based on a monophonic melody consisting of the highest non-overlapping notes from the cropped MIDIs in the previous section.
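
For reference, extracting that sort of melody line amounts to a simple "skyline" pass over the notes. Here's a sketch of the idea using pretty_midi (hypothetical file names; my actual extraction may have differed in its details):

    import pretty_midi

    pm = pretty_midi.PrettyMIDI('primer.mid')  # hypothetical input file

    # Collect every (non-drum) note, ordered by start time, with higher
    # pitches first among simultaneous onsets
    notes = sorted(
        (note
         for instrument in pm.instruments if not instrument.is_drum
         for note in instrument.notes),
        key=lambda n: (n.start, -n.pitch))

    # "Skyline" pass: keep the highest note at each onset and skip any
    # note that would overlap the previously kept one
    melody = []
    for note in notes:
        if not melody or note.start >= melody[-1].end:
            melody.append(note)

    out = pretty_midi.PrettyMIDI()
    piano = pretty_midi.Instrument(program=0)  # acoustic grand piano
    piano.notes = melody
    out.instruments.append(piano)
    out.write('melody.mid')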

Reflecting on pre-trained models

Unsurprisingly, the generic pre-trained models I used, while undoubtedly impressive, seem best suited for exploration and amusement rather than for producing something focused on a particular genre.

I suspect that the best path forward for this experiment is to train a new ragtime-focused model on a corpus of typical ragtime MIDIs. As noted earlier, it's possible that I won't be able to find a large enough corpus or muster enough compute power to produce a reasonable model, but if I do succeed, I think the results will be more consistently rag-like.