摘要

A classifier for two or more samples is proposed when the data are high-dimensional and the distributions may be non-normal. The classifier is constructed as a linear combination of two easily computable and interpretable components, the U-component and the P-component. The U-component is a linear combination of U-statistics of bilinear forms of pairwise distinct vectors from independent samples. The P-component, the discriminant score, is a function of the projection of the U-component on the observation to be classified. Together, the two components constitute an inherently bias-adjusted classifier valid for high-dimensional data. The classifier is linear but its linearity does not rest on the assumption of homoscedasticity. Properties of the classifier and its normal limit are given under mild conditions. Misclassification errors and asymptotic properties of their empirical counterparts are discussed. Simulation results are used to show the accuracy of the proposed classifier for small or moderate sample sizes and large dimensions. Applications involving real data sets are also included.

  • 出版日期2018-9