Abstract

In decision tree classification under differential privacy, computing impurity metrics such as information gain and the Gini index is query intensive, and more queries imply more noise addition. A straightforward differentially private implementation therefore often yields poor accuracy and stability. This motivates us to adopt a better-suited metric for evaluating attributes when building the tree structure recursively. In this paper, we first give a detailed analysis of the statistical queries involved in decision tree induction. Second, we propose a private decision tree algorithm based on the noisy maximal vote, together with an effective privacy budget allocation strategy. Third, to boost accuracy and improve stability, we construct an ensemble model in which multiple private decision trees are built on bootstrapped samples. Extensive experiments on real datasets demonstrate that the proposed ensemble model provides accurate and reliable classification results.
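To make the noisy-max idea concrete, the following is a minimal sketch of the standard report-noisy-max mechanism over counting queries, which is one way such a vote can satisfy differential privacy. The function names (`noisy_max_vote`, `laplace_noise`) are illustrative and not taken from the paper, and the paper's actual algorithm and budget allocation may differ.

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via inverse-CDF transform.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def noisy_max_vote(counts, epsilon):
    """Return the index of the noisy maximum of `counts`.

    Each count is a counting query with sensitivity 1, so adding
    independent Laplace(1/epsilon) noise to every count and releasing
    only the argmax is the classic report-noisy-max mechanism, which
    satisfies epsilon-differential privacy. Only one value (the index)
    is released, so the budget is not split across the individual counts.
    """
    noisy = [c + laplace_noise(1.0 / epsilon) for c in counts]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

With a large `epsilon` the noise is negligible and the true majority class (or best-scoring attribute) is returned; as `epsilon` shrinks, the selection becomes noisier, which is the accuracy/privacy trade-off the abstract refers to.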