Dynamic Bayesian Networks for Symbolic Polyphonic Pitch Modeling

Authors: Raczynski Stanislaw A*; Vincent Emmanuel; Sagayama Shigeki
Source: IEEE Transactions on Audio Speech and Language Processing, 2013, 21(9): 1830-1840.
DOI: 10.1109/TASL.2013.2258012

Abstract

Symbolic pitch modeling is a way of incorporating knowledge about relations between pitches into the process of analyzing musical information or signals. In this paper, we propose a family of probabilistic symbolic polyphonic pitch models, which account for both the "horizontal" and the "vertical" pitch structure. These models are formulated as linear or log-linear interpolations of up to five sub-models, each of which is responsible for modeling a different type of relation. The ability of the models to predict symbolic pitch data is evaluated in terms of their cross-entropy, and of a newly proposed "contextual cross-entropy" measure. Their performance is then measured on synthesized polyphonic audio signals in terms of the accuracy of multiple pitch estimation in combination with a Nonnegative Matrix Factorization-based acoustic model. In both experiments, the log-linear combination of at least one "vertical" (e.g., harmony) and one "horizontal" (e.g., note duration) sub-model outperformed a pitch-dependent Bernoulli prior by more than 60% in relative cross-entropy and 3% in absolute multiple pitch estimation accuracy. This work provides a proof of concept of the usefulness of model interpolation, which may be used for improved symbolic modeling of other aspects of music in the future.
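
The two interpolation schemes and the cross-entropy measure named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each sub-model outputs a Bernoulli probability that a given note is active, and the function names, the two example sub-models, the weights, and all numbers are hypothetical. The paper's actual sub-models and its "contextual cross-entropy" measure are not reproduced here.

```python
import math

def linear_interpolation(probs, weights):
    """Weighted sum of sub-model probabilities that a note is active.
    Assumes the weights are nonnegative and sum to 1."""
    return sum(w * p for w, p in zip(weights, probs))

def log_linear_interpolation(probs, weights):
    """Weighted geometric combination of Bernoulli sub-models,
    renormalized over the two outcomes (note active / inactive)."""
    on = math.prod(p ** w for p, w in zip(probs, weights))
    off = math.prod((1.0 - p) ** w for p, w in zip(probs, weights))
    return on / (on + off)

def cross_entropy(predicted, observed):
    """Average negative log2-likelihood, in bits per binary note variable.
    Assumes every predicted probability lies strictly between 0 and 1."""
    total = 0.0
    for p, n in zip(predicted, observed):
        total -= math.log2(p if n == 1 else 1.0 - p)
    return total / len(observed)

# Hypothetical per-note activity probabilities from two sub-models.
harmony = [0.8, 0.1, 0.6]    # a "vertical" sub-model (assumed values)
duration = [0.9, 0.2, 0.3]   # a "horizontal" sub-model (assumed values)
observed = [1, 0, 1]         # ground-truth note activity (assumed values)
weights = [0.5, 0.5]         # interpolation weights (assumed values)

linear = [linear_interpolation([h, d], weights)
          for h, d in zip(harmony, duration)]
log_linear = [log_linear_interpolation([h, d], weights)
              for h, d in zip(harmony, duration)]

print("linear cross-entropy:", cross_entropy(linear, observed))
print("log-linear cross-entropy:", cross_entropy(log_linear, observed))
```

A lower cross-entropy means the combined model assigns higher probability to the observed notes. In practice the interpolation weights would be fitted on held-out data rather than fixed by hand as in this sketch.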

  • Publication date: 2013-9