Abstract

The autoencoder is a popular neural network model that learns hidden representations of unlabeled data. Typically, single- or multilayer perceptrons are used in constructing an autoencoder, but we use soft decision trees (i.e., hierarchical mixtures of experts) instead. Such trees have internal nodes that implement soft multivariate splits through a gating function, and all leaves are weighted by the gating values on their path to get the output. The encoder tree converts the input to a lower-dimensional representation in its leaves, which it passes to the decoder tree that reconstructs the original input. Because the splits are soft, the encoder and decoder trees can be trained back to back with stochastic gradient descent to minimize reconstruction error. In our experiments on handwritten digits, newsgroup posts, and images, we observe that the autoencoder trees yield reconstruction error as small as, and sometimes smaller than, that of autoencoder perceptrons. One advantage of the tree is that it learns a hierarchical representation at different resolutions at its different levels, and the leaves specialize at different local regions in the input space. An extension with locally linear mappings in the leaves allows a more flexible model. We also show that the autoencoder tree can be used with multimodal data, where a mapping from one modality (e.g., image) to another (e.g., topics) can be learned.
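To make the architecture concrete, below is a minimal sketch in PyTorch of an autoencoder built from two soft decision trees, assuming sigmoid gating at the internal nodes and constant response vectors at the leaves (the abstract also mentions a locally linear leaf variant, omitted here). All class and parameter names (`SoftTree`, `gates`, `leaves`, the chosen depths and dimensions) are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn

class SoftTree(nn.Module):
    """Soft decision tree: 2**depth leaves, sigmoid multivariate gates."""
    def __init__(self, in_dim, out_dim, depth):
        super().__init__()
        self.depth = depth
        n_internal = 2 ** depth - 1
        n_leaves = 2 ** depth
        # One multivariate gate (w, b) per internal node, stored breadth-first.
        self.gates = nn.Linear(in_dim, n_internal)
        # One constant response vector per leaf (illustrative choice).
        self.leaves = nn.Parameter(0.01 * torch.randn(n_leaves, out_dim))

    def forward(self, x):
        g = torch.sigmoid(self.gates(x))            # (batch, n_internal)
        # Path probability of reaching each node; start at the root.
        prob = x.new_ones(x.size(0), 1)
        node = 0
        for _ in range(self.depth):
            width = prob.size(1)
            gate = g[:, node:node + width]          # gates at this level
            # Left child weighted by the gate, right child by (1 - gate).
            prob = torch.stack((prob * gate, prob * (1 - gate)), dim=2)
            prob = prob.reshape(x.size(0), 2 * width)
            node += width
        # Output is the gating-weighted mixture of all leaf responses.
        return prob @ self.leaves

# Encoder tree maps the input to a low-dimensional code in its leaves;
# the decoder tree maps the code back to a reconstruction.
encoder = SoftTree(in_dim=784, out_dim=16, depth=4)
decoder = SoftTree(in_dim=16, out_dim=784, depth=4)
opt = torch.optim.SGD(
    list(encoder.parameters()) + list(decoder.parameters()), lr=0.1
)

x = torch.rand(32, 784)                             # stand-in for digit images
opt.zero_grad()
recon = decoder(encoder(x))
loss = ((recon - x) ** 2).mean()                    # reconstruction error
loss.backward()                                     # soft splits => differentiable
opt.step()
```

Because every split is a differentiable sigmoid rather than a hard threshold, the reconstruction loss backpropagates through both trees at once, which is what allows the back-to-back training with stochastic gradient descent described above.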

  • Publication date: 2017-10-04