Abstract

This paper considers nonparametric estimation of the joint probability density of a vector of continuous and ordinal/nominal categorical random variables with bounded support. Numerous publications are devoted to the cases of either continuous or categorical variables alone, and the curse of dimensionality and strong regularity assumptions are the two familiar issues in that literature. Mixed variables occur in practically all applications of statistical science; nonetheless, the literature devoted to their joint density estimation is next to none. This paper develops a theory of density estimation for mixed variables that is on par with results known for simpler settings. Specifically, a data-driven estimator is developed that adapts to the unknown anisotropic smoothness of the joint density and, whenever the density depends on a smaller number of variables, performs a dimension reduction that yields the corresponding optimal rate of convergence of the mean integrated squared error (MISE). The results hold without the minimal regularity assumptions that are traditional in the density estimation literature, such as differentiability or continuity of the density. The estimation procedure is based on mimicking an oracle-estimator that knows the underlying density, and the main theoretical result is an oracle inequality that relates the MISEs of the estimator and the oracle-estimator. The proof is based on a new exponential inequality for Sobolev statistics, which is of interest in its own right.

  • Publication date: 2011-3