Abstract

While a listener may derive semantic associations for audio clips from direct auditory cues (e.g., hearing "bass guitar") as well as from "context" (e.g., inferring "bass guitar" in the context of a "rock" song), most state-of-the-art systems for automatic music annotation ignore this context. Indeed, although contextual relationships correlate tags, many auto-taggers model tags independently. This paper presents a novel, generative approach to improve automatic music annotation by modeling contextual relationships between tags. A Dirichlet mixture model (DMM) is proposed as a second, additional stage in the modeling process, to supplement any auto-tagging system that generates a semantic multinomial (SMN) over a vocabulary of tags when annotating a song. For each tag in the vocabulary, a DMM captures the broader context the tag defines by modeling tag co-occurrence patterns in the SMNs of songs associated with the tag. When annotating songs, the DMMs refine SMN annotations by leveraging contextual evidence. Experimental results demonstrate the benefits of combining a variety of auto-taggers with this generative context model, which also generally outperforms other approaches to modeling context.
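The second-stage refinement lends itself to a short illustration. Below is a minimal Python sketch, assuming per-tag DMM parameters (mixture weights and Dirichlet concentration parameters) have already been fit to the SMNs of training songs; the function names, the `(weights, alphas)` parameterization, and the simple likelihood-based renormalization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import dirichlet

# Hypothetical DMM parameters for one tag: K mixture components,
# each a Dirichlet over the V-dimensional tag simplex.
#   weights : (K,) mixture weights, summing to 1
#   alphas  : (K, V) Dirichlet concentration parameters

def dmm_log_likelihood(smn, weights, alphas):
    """Log-likelihood of a semantic multinomial under one tag's DMM."""
    comp_ll = np.array([dirichlet.logpdf(smn, a) for a in alphas])
    m = comp_ll.max()  # log-sum-exp over mixture components
    return m + np.log(np.dot(weights, np.exp(comp_ll - m)))

def refine_smn(smn, dmms, eps=1e-6):
    """Refine a first-stage SMN using per-tag DMM context models.

    dmms is a list of (weights, alphas) pairs, one per vocabulary tag.
    Returns a contextual SMN: each tag's score is proportional to the
    likelihood of the whole first-stage SMN under that tag's DMM.
    """
    smn = (smn + eps) / (smn + eps).sum()  # keep strictly inside the simplex
    log_scores = np.array([dmm_log_likelihood(smn, w, a) for w, a in dmms])
    scores = np.exp(log_scores - log_scores.max())
    return scores / scores.sum()

# Example with placeholder parameters: 5-tag vocabulary, 2 components per tag.
rng = np.random.default_rng(0)
V, K = 5, 2
dmms = [(np.full(K, 1.0 / K), rng.uniform(0.5, 5.0, size=(K, V)))
        for _ in range(V)]
smn = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
print(refine_smn(smn, dmms))
```

The key design point this sketch reflects is that the DMM never looks at audio features directly: it scores the entire first-stage SMN, so a tag can gain probability purely because the pattern of co-occurring tags matches the context that tag defines.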

  • Publication date: 2012-05