Automatic labelling of clusters of discrete and continuous data with supervised machine learning

Authors: Lopes Lucas A; Machado Vinicius P*; Rabelo Ricardo A L; Fernandes Ricardo A S; Lima Bruno V A
Source: Knowledge-Based Systems, 2016, 106: 231-241.
DOI: 10.1016/j.knosys.2016.05.044

Abstract

The clustering problem is considered one of the most relevant problems in unsupervised learning research. However, understanding and defining the resulting clusters is not a trivial task, which makes it necessary to identify them, i.e., to assign a label to each cluster. To address the problem of labelling clusters, this paper presents a methodology that combines supervised learning, unsupervised learning and a discretization model. An unsupervised learning method is applied to the clustering problem, and a supervised learning algorithm is responsible for detecting the attributes that are meaningful for defining each cluster formed. These strategies are combined into a methodology that presents a label (based on attributes and values) for each cluster provided. The methodology is applied to three different databases, with acceptable results: on average, more than 92.89% of the elements are correctly labelled.
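The idea of labelling a cluster with an attribute and a value range can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, the supervised learner or the discretization model, so the sketch assumes cluster assignments are already available, substitutes equal-width binning for the discretization step, and substitutes a OneR-style single-attribute classifier for the supervised step. The function names (discretize, one_r_accuracy, label_clusters) and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def discretize(values, n_bins=3):
    """Equal-width binning (an assumed stand-in for the paper's discretization model).

    Returns the bin index of each value and the (low, high) range of each bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant attributes
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    ranges = [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]
    return bins, ranges


def one_r_accuracy(bins, clusters):
    """Accuracy of predicting the cluster from a single discretized attribute.

    Each bin predicts the cluster that occurs most often among its members.
    """
    by_bin = defaultdict(Counter)
    for b, c in zip(bins, clusters):
        by_bin[b][c] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_bin.values())
    return correct / len(clusters)


def label_clusters(rows, clusters, names, n_bins=3):
    """Return {cluster: (attribute, (low, high), share of members inside the range)}."""
    columns = list(zip(*rows))
    scored = []
    for name, col in zip(names, columns):
        bins, ranges = discretize(col, n_bins)
        scored.append((one_r_accuracy(bins, clusters), name, bins, ranges))
    # Keep the attribute that best separates the clusters.
    _, name, bins, ranges = max(scored, key=lambda s: s[0])
    labels = {}
    for c in sorted(set(clusters)):
        member_bins = [b for b, k in zip(bins, clusters) if k == c]
        top_bin, count = Counter(member_bins).most_common(1)[0]
        labels[c] = (name, ranges[top_bin], count / len(member_bins))
    return labels


# Toy data (hypothetical): "length" separates the two clusters, "noise" does not.
rows = [(1.0, 5.0), (1.2, 1.0), (0.9, 3.0), (5.0, 4.8), (5.3, 1.2), (4.8, 3.1)]
clusters = [0, 0, 0, 1, 1, 1]  # assumed output of some clustering step
names = ["length", "noise"]

labels = label_clusters(rows, clusters, names)
for cluster, (attr, (low, high), share) in labels.items():
    print(f"cluster {cluster}: {attr} in [{low:.2f}, {high:.2f}] covers {share:.0%}")
```

The share reported for each cluster is the fraction of its members that fall inside the labelled range, which is one plausible reading of "correctly labelled elements"; the paper itself may define the measure differently.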

  • Publication date: 2016-8-15