Abstract

We describe a project undertaken by an interdisciplinary team of researchers in sleep psychology and in Natural Language Processing/Machine Learning. The goal is sentiment analysis of a corpus of short textual descriptions of dreams. Dreams are rated on a four-level scale of positive and negative sentiment. We chose a four-level annotation scale to capture sentiment strength while keeping the scheme simple. The approach is based on a novel representation that takes into account the leading themes of the dream and the sequential unfolding of associated sentiments during the dream. The dream representation combines three parts, two of which are produced automatically from the description of the dream. The first part is a co-occurrence vector representation of dreams, used to detect sentiment levels in the dream texts. Unlike the standard bag-of-words model, these vectors capture non-local relationships between the meanings of words in a corpus. The second part is a dynamic representation that captures changes in sentiment as the dream progresses. The third part is the dreamer's self-reported assessment of the dream on eight given attributes (self-assessment differs in many respects from the dream's sentiment classification). The three representations are subjected to aggressive feature selection. Using an ensemble of classifiers on the combined three-part representation, the agreement between machine ratings and human judges' scores on the four-level scale was 64%, which is in the range of human experts' consensus in that domain. The accuracy of the system was 14% higher than previous results on the same task.

  • Publication date: 2014-6
