Abstract

This paper presents an approach to author profiling, a text classification task that consists of determining the demographic and psychological characteristics of an author (such as age, gender, and personality traits) from samples of the author's writing. The approach centers on creating a co-occurrence graph and enriching it using link prediction theory, then finding an author's profile with a graph similarity technique rather than a traditional supervised learning strategy. The proposed method is applied to the English-language partition of the CLEF PAN 2015 author profiling task and produces competitive results that are close to the best results reported so far on the same training and test corpora. The experimental results show that adding new edges to a graph representation based on the topological neighborhood of words can be a valuable asset for inferring and discovering patterns in texts that come from social media. Additionally, graph similarity provides a novel way of analyzing how closely the texts associated with a specific demographic or personality aspect resemble the writing style of an author.
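
The pipeline described above (build a co-occurrence graph, enrich it with predicted links, compare graphs by similarity) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The common-neighbors link score, the Jaccard edge-overlap similarity, the `window` and `min_common` parameters, and all function names are assumptions chosen for illustration, since the abstract does not state which measures were used.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(text, window=2):
    """Undirected graph: words are nodes; words inside the same
    sliding window of `window` tokens are connected (assumed scheme)."""
    tokens = text.lower().split()
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def enrich(graph, min_common=2):
    """Link prediction by common neighbors (an assumed score): connect
    non-adjacent words that share at least `min_common` neighbors."""
    enriched = {node: set(neigh) for node, neigh in graph.items()}
    for u, v in combinations(sorted(graph), 2):
        if v not in graph[u] and len(graph[u] & graph[v]) >= min_common:
            enriched[u].add(v)
            enriched[v].add(u)
    return enriched


def edge_set(graph):
    """Represent each undirected edge once, as a frozenset of its endpoints."""
    return {frozenset((u, v)) for u, neigh in graph.items() for v in neigh}


def similarity(g1, g2):
    """Jaccard overlap of edge sets (an assumed graph similarity)."""
    e1, e2 = edge_set(g1), edge_set(g2)
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0


def predict(author_text, class_texts):
    """Return the class label whose enriched graph is most similar
    to the author's enriched graph."""
    author = enrich(cooccurrence_graph(author_text))
    scores = {
        label: similarity(author, enrich(cooccurrence_graph(text)))
        for label, text in class_texts.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, `class_texts` would map each profile label (for example an age group) to the concatenated training texts for that label, so no supervised classifier is trained; the prediction is simply the label with the closest graph.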

  • Publication date: 2018