Abstract

In this paper, we propose two variational models for semi-supervised clustering of high-dimensional data. When the sampling rate is relatively low, the new models substantially improve classification accuracy over the corresponding models without the region-force term. In the proposed models, the data points are modeled as vertices of a weighted graph, and the labeling function defined at each vertex takes values in the unit simplex, which can be interpreted as the probabilities of belonging to each class. Each model is formulated as the minimization of a convex functional of the labeling function. The first model combines the Rayleigh quotient of the graph Laplacian with a region-force term; the second replaces the Rayleigh quotient with the total variation of the labeling function. The region-force term is computed from the affinity between each vertex and the training samples, and characterizes the conditional probability that each vertex belongs to each class. Numerical methods for solving both models are presented, and both are tested on several benchmark data sets, including handwritten digits (MNIST) and the moons data. Experiments indicate that the classification accuracy and computational speed are competitive with state-of-the-art multi-class semi-supervised clustering algorithms. Numerical experiments also confirm that the total variation model outperforms its Laplacian counterpart in most of the tests.
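
To make the structure of the first (Laplacian) model concrete, the following is a minimal sketch and not the implementation used in the paper. Several choices are assumptions made only for illustration: a Gaussian kernel for the graph weights, the quadratic energy (1/2) tr(UᵀLU) standing in for the Rayleigh quotient, a linear region-force term -η⟨U, P⟩ in which P holds normalized affinities to the labeled samples, and plain projected gradient descent as the solver. The names `project_simplex`, `gaussian_weights`, and `region_force_labeling`, and the parameters `sigma` and `eta`, are hypothetical.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of each row of v onto the unit simplex."""
    n, k = v.shape
    u = np.sort(v, axis=1)[:, ::-1]            # rows sorted in descending order
    css = np.cumsum(u, axis=1) - 1.0
    idx = np.arange(1, k + 1)
    rho = (u - css / idx > 0).sum(axis=1)      # size of the active set (always >= 1)
    theta = css[np.arange(n), rho - 1] / rho
    return np.maximum(v - theta[:, None], 0.0)


def gaussian_weights(X, sigma):
    """Dense Gaussian affinity matrix with zero diagonal (assumed kernel)."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W


def region_force_labeling(X, labeled_idx, labels, n_classes,
                          sigma=1.0, eta=1.0, n_iter=200):
    """Minimize 0.5*tr(U^T L U) - eta*<U, P> with rows of U in the simplex.

    This energy is an assumed stand-in for the Laplacian model with a
    region-force term; labeled rows of U are held at their one-hot labels.
    """
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels)
    n = X.shape[0]

    W = gaussian_weights(X, sigma)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                       # unnormalized graph Laplacian

    # Region force: affinity of every vertex to the labeled samples of each
    # class, normalized so each row reads as class probabilities (assumption).
    A = np.zeros((n, n_classes))
    for k in range(n_classes):
        A[:, k] = W[:, labeled_idx[labels == k]].sum(axis=1)
    P = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)

    onehot = np.eye(n_classes)[labels]
    P[labeled_idx] = onehot

    U = np.full((n, n_classes), 1.0 / n_classes)
    U[labeled_idx] = onehot

    # The largest eigenvalue of L is at most 2*max(deg), so this step is safe.
    step = 1.0 / (2.0 * deg.max())
    for _ in range(n_iter):
        grad = L @ U - eta * P                 # gradient of the assumed energy
        U = project_simplex(U - step * grad)   # keep rows in the unit simplex
        U[labeled_idx] = onehot                # re-impose the known labels
    return U.argmax(axis=1), U
```

Calling `region_force_labeling(X, labeled_idx, labels, n_classes)` returns hard class assignments together with the simplex-valued labeling function. Setting `eta=0` removes the region force and leaves only the Laplacian smoothing term, which is the kind of baseline the comparison above refers to. The total variation model would replace the quadratic term with a non-smooth one and would need a different solver, which this sketch does not attempt.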