MUSTER: Music score transcription evaluation metrics

MUSTER is a set of evaluation metrics for automatic music score transcription systems. Given a ground-truth musical score and an estimated musical score, both in the MusicXML format, the metrics quantify how closely the estimate matches the ground truth.
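To make the input format concrete, here is a minimal sketch of how pitched notes could be read from an uncompressed MusicXML document with Python's standard library. It is an illustration only and is not taken from the MUSTER evaluation script; the function name `read_pitches` and the `SAMPLE` score are hypothetical, and a real evaluation also needs onset times, note values, and voices.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-measure score, used only for this illustration.
SAMPLE = """<score-partwise version="3.1">
  <part-list>
    <score-part id="P1"><part-name>Piano</part-name></score-part>
  </part-list>
  <part id="P1">
    <measure number="1">
      <note><pitch><step>C</step><octave>4</octave></pitch><duration>1</duration></note>
      <note><rest/><duration>1</duration></note>
      <note><pitch><step>F</step><alter>1</alter><octave>4</octave></pitch><duration>1</duration></note>
    </measure>
  </part>
</score-partwise>"""


def read_pitches(musicxml_text):
    """Return a list of (step, alter, octave) tuples for every pitched note."""
    root = ET.fromstring(musicxml_text)
    pitches = []
    for note in root.iter("note"):
        pitch = note.find("pitch")
        if pitch is None:
            # Rests and unpitched notes have no <pitch> element.
            continue
        step = pitch.findtext("step")
        # <alter> is optional and may be fractional, so parse it as a float.
        alter = float(pitch.findtext("alter", default="0"))
        octave = int(pitch.findtext("octave"))
        pitches.append((step, alter, octave))
    return pitches


if __name__ == "__main__":
    print(read_pitches(SAMPLE))
```

The ground-truth and estimated files would each be reduced to such a note list before any comparison.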

Download

Download the evaluation script

About the MUSTER metrics

The MUsic Score Transcription Error Rate (MUSTER) metrics are edit-distance-based metrics, similar to the word error rate (WER) used for evaluating automatic speech recognition systems. Each of the six metrics evaluates a specific aspect of a musical score. The metrics are error rates, so a lower value means that the estimated score is closer to the ground truth.
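The WER analogy can be sketched as follows. This is a generic edit-distance error rate over two symbol sequences, assuming the standard WER normalization (substitutions + deletions + insertions, divided by the reference length). It is not the MUSTER implementation: the function names are hypothetical, and the actual metrics in Refs. [1] and [2] define their own alignment and error categories.

```python
def edit_distance_counts(reference, estimate):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    n, m = len(reference), len(estimate)
    # dp[i][j] = (cost, subs, dels, ins) for reference[:i] versus estimate[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, i, 0)  # delete every reference symbol
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, 0, j)  # insert every estimated symbol
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if reference[i - 1] == estimate[j - 1]:
                diagonal = (cost, subs, dels, ins)  # exact match, no cost
            else:
                diagonal = (cost + 1, subs + 1, dels, ins)
            cost, subs, dels, ins = dp[i - 1][j]
            deletion = (cost + 1, subs, dels + 1, ins)
            cost, subs, dels, ins = dp[i][j - 1]
            insertion = (cost + 1, subs, dels, ins + 1)
            dp[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    _, subs, dels, ins = dp[n][m]
    return subs, dels, ins


def error_rate(reference, estimate):
    """WER-style error rate: (S + D + I) / len(reference)."""
    if not reference:
        raise ValueError("reference sequence must not be empty")
    subs, dels, ins = edit_distance_counts(reference, estimate)
    return (subs + dels + ins) / len(reference)


if __name__ == "__main__":
    truth = ["C4", "E4", "G4", "C5"]
    guess = ["C4", "F4", "G4"]  # one wrong pitch, one missing note
    print(edit_distance_counts(truth, guess))  # (1, 1, 0)
    print(error_rate(truth, guess))            # 0.5
```

As with WER, the value is 0 for a perfect match and can exceed 1 when the estimate contains many extra symbols.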

Updates

You can download older versions from the GitHub repository.

(2022/Jan/27) Some internal modules were updated.
(2022/Jan/18) Added an output file with details of error analysis. Some internal modules were modified.
(2021/Dec/17) Fixed some bugs.

References

The edit-distance-based metrics were first introduced in Ref. [1]. The metrics for voices are defined in Ref. [2].

[1] Eita Nakamura, Emmanouil Benetos, Kazuyoshi Yoshii, Simon Dixon, “Towards Complete Polyphonic Music Transcription: Integrating Multi-Pitch Detection and Rhythm Quantization,” Proc. 43rd IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 101-105, 2018.
[2] Yuki Hiramatsu, Eita Nakamura, Kazuyoshi Yoshii, “Joint Estimation of Note Values and Voices for Audio-to-Score Piano Transcription,” Proc. 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021.