[museformer] support arbitrary datasets

This commit is contained in:
btyu 2023-01-10 11:41:31 +08:00
Parent 2d17693cc9
Commit 68b6d75b7d
2 changed files: 1031 additions and 1 deletion


@ -12,7 +12,9 @@ The following content describes the steps to run Museformer. All the commands ar
## 1. Dataset
We use [the Lakh MIDI dataset](https://colinraffel.com/projects/lmd/) (LMD-full). Specifically, we first preprocess it as described in the Appendix of our paper. The final dataset (see the file lists [here](data/meta)) contains 29,940 MIDI files. Their time signatures are all 4/4, and the instruments are normalized to 6 basic ones: square synthesizer (80), piano (0), guitar (25), string (48), bass (43), drum, where in the parentheses are MIDI program IDs if applicable. We put all the MIDI files in `data/midi`.
We use [the Lakh MIDI dataset](https://colinraffel.com/projects/lmd/) (LMD-full). Specifically, we first preprocess it as described in the Appendix of our paper. The final dataset (see the file lists [here](data/meta)) contains 29,940 MIDI files. Their time signatures are all 4/4, and the instruments are normalized to 6 basic ones: square synthesizer (80), piano (0), guitar (25), string (48), bass (43), drum, where in the parentheses are MIDI program IDs if applicable. Put all the MIDI files in `data/midi`.
**Note:** To train Museformer on an arbitrary dataset with various time signatures and instruments, rather than only the ones mentioned above, follow the **[General Use]** notes throughout this document.
Install [MidiProcessor](https://github.com/btyu/MidiProcessor). Then, encode the MIDI files into tokens:
@ -30,6 +32,8 @@ where the arguments are explained as follows:
- `ignore-ts`: do not add time signature tokens. Since all the data we use are in 4/4, we do not encode the time signature.
- `sort-insts`: designate a method that sorts the instruments. `6tracks_cst1` sorts the instruments in order: square synthesizer, drum, bass, guitar, piano, string.
**[General Use]** To make the representation support various time signatures and instruments, set `--encoding-method REMIGEN` and `--sort-insts id` instead of the values in the above command, and remove the `--ignore-ts` parameter.
After encoding, you should see the token representation of each MIDI file in `output_dir`.
Then, run the following command to gather the tokens for each split.
@ -42,6 +46,8 @@ for split in train valid test :
done
```
**[General Use]** To use an arbitrary dataset, create your own MIDI file lists as `data/meta/{train,valid,test}.txt` before running the above command.
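
One possible way to produce these lists is sketched below. It is not part of the Museformer code base. It assumes that each list holds one MIDI file name per line (compare with the existing lists in `data/meta` before relying on this), and the helper names, split ratios, and seed are all hypothetical choices.

```python
import random
from pathlib import Path


def split_file_names(names, valid_ratio=0.1, test_ratio=0.1, seed=0):
    """Shuffle names deterministically and split them into train/valid/test.

    The ratios and the seed are illustrative defaults, not values from the paper.
    """
    names = sorted(names)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)
    num_valid = int(len(names) * valid_ratio)
    num_test = int(len(names) * test_ratio)
    return {
        "valid": names[:num_valid],
        "test": names[num_valid:num_valid + num_test],
        "train": names[num_valid + num_test:],
    }


def write_split_lists(midi_dir="data/midi", meta_dir="data/meta", **split_kwargs):
    """Write {train,valid,test}.txt into meta_dir, one file name per line.

    Assumption: the lists contain bare file names relative to midi_dir.
    Existing list files with the same names are overwritten.
    """
    midi_dir, meta_dir = Path(midi_dir), Path(meta_dir)
    names = [
        path.name
        for path in midi_dir.iterdir()
        if path.suffix.lower() in {".mid", ".midi"}
    ]
    splits = split_file_names(names, **split_kwargs)
    meta_dir.mkdir(parents=True, exist_ok=True)
    for split, split_names in splits.items():
        lines = "".join(f"{name}\n" for name in split_names)
        (meta_dir / f"{split}.txt").write_text(lines)
    return splits
```

Calling `write_split_lists()` from the repository root would replace the lists in `data/meta`, so back up the provided LMD lists first if you still need them.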
Next, use `fairseq-preprocess` to make binary data:
```bash
@ -61,6 +67,8 @@ fairseq-preprocess \
Now, you should see the binary data in `data-bin/lmd6remi`.
**[General Use]** In the `fairseq-preprocess` command, set `--srcdict data/meta/general_use_dict.txt`. This file is a vocabulary list that covers various time signatures and instruments.
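
Before binarizing, it can be useful to check that the vocabulary actually covers your encoded data. The sketch below is not part of Museformer. It assumes the dictionary uses the usual fairseq text format (one `<symbol> <count>` entry per line) and that the encoded data are whitespace-separated tokens; the function names are hypothetical.

```python
from pathlib import Path


def parse_dictionary_symbols(dict_text):
    """Collect symbols from fairseq-style dictionary text ('<symbol> <count>' per line)."""
    symbols = set()
    for line in dict_text.splitlines():
        if line.strip():  # skip blank lines
            symbols.add(line.split()[0])
    return symbols


def find_unknown_tokens(token_text, symbols):
    """Return the sorted, de-duplicated tokens in token_text that are not in symbols."""
    return sorted(set(token_text.split()) - symbols)


def check_file(token_path, dict_path="data/meta/general_use_dict.txt"):
    """Report tokens of one encoded file that the dictionary does not contain."""
    symbols = parse_dictionary_symbols(Path(dict_path).read_text())
    return find_unknown_tokens(Path(token_path).read_text(), symbols)
```

If `check_file` returns a non-empty list for one of your gathered token files, those tokens are missing from the dictionary, which usually points to an encoding setting that differs from the **[General Use]** ones.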
## 2. Environment
The implementation of Museformer relies on a specific hardware and software environment.
@ -99,6 +107,8 @@ In our experiment, we run it on 4 GPUs, and the batch size is set to 1, so the r
By modifying `con2con` and `con2sum`, you can control the bars for the fine-grained attention and the coarse-grained attention, respectively.
**[General Use]** Add `--beat-mask-ts True` to the `fairseq-train` command.
In your first run, it may take some time to build auxiliary information and compile CUDA kernels, so this is a good moment to grab a cup of coffee.
You can download a checkpoint [here](https://1drv.ms/u/s!Aq3YEPZCcV5ibz9ySjjNsEB74CQ), and put it in `checkpoints/mf-lmd6remi-1` for evaluation and inference.

File diff not shown because it is too large.