Abstract

Small datasets often make it difficult to build a reliable learning model. Some researchers have therefore proposed virtual sample generation (VSG) methods, which add artificial samples to a small dataset to increase its size. However, for some datasets the assumptions that VSG methods make about the data distribution may be unclear, and when the data have only a few attributes such approaches may not work effectively. Other researchers have thus proposed attribute extension methods, which generate new attributes to map the data into a higher-dimensional space. Unfortunately, the resulting dataset may become sparse, with many null or zero values in the extended attributes, and a large number of such attributes will reduce the representativeness of the instances for the learning model. Therefore, based on fuzzy theories, this paper proposes a novel sample attribute extending (SEA) method that extends a suitable number of attributes to improve small dataset learning. To verify the validity of the SEA method, this paper uses two real cases and two public datasets to train predictive models with support vector regression (SVR) and back-propagation neural networks (BPNN), and applies the paired t-test to examine the statistical significance of the improvement. The experimental results show that the proposed SEA method can effectively improve learning accuracy on small datasets.

  • Publication date: 2018-4-19