Abstract

Classification methods are becoming increasingly useful as part of the data analyst's standard toolbox in many application domains. The specific data and domain characteristics of social media tools used in online educational contexts make it challenging to train high-quality classifiers that yield meaningful insight into learners' activity patterns. Decision trees are currently a standard and highly successful model for classification tasks. In this paper, we introduce a custom-designed data analysis pipeline for predicting "spam" and "don't care" learners in the eMUSE online educational environment. The trained classifiers use social media traces as independent variables and the learner's final grade as the dependent variable. The analysis evaluates the activities performed by learners and the similarity of two derived data models. Experiments on social media traces spanning five years and 285 learners show satisfactory classification results that may be used further in a production environment. Accurate identification of "spam" and "don't care" users may in turn have a substantial impact on producing better classification models for the remaining "regular" learners.

  • Publication date: 2017-10