A Streaming Parallel Decision Tree Algorithm

Ben-Haim, Yael*; Tom-Tov, Elad

Abstract

We propose a new algorithm for building decision tree classifiers. The algorithm is executed in a distributed environment and is especially designed for classifying large data sets and streaming data. It is empirically shown to be as accurate as a standard decision tree classifier, while being scalable for processing of streaming data on multiple processors. These findings are supported by a rigorous analysis of the algorithm's accuracy.
The essence of the algorithm is to quickly construct histograms at the processors, which compress the data to a fixed amount of memory. A master processor uses this information to find near-optimal split points for terminal tree nodes. Our analysis shows that guarantees on the local accuracy of split points imply guarantees on the overall tree accuracy.
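The abstract does not spell out how the histograms are built, so the following is only a minimal sketch of one way a fixed-memory histogram could work, assuming each bin stores a (centroid, count) pair and that the two closest bins are merged whenever the bin budget is exceeded. The class name `FixedSizeHistogram`, the parameter `max_bins`, and the merge rule are illustrative assumptions, not details taken from the abstract.

```python
from bisect import insort


class FixedSizeHistogram:
    """Hypothetical fixed-memory histogram: at most max_bins [centroid, count] pairs."""

    def __init__(self, max_bins):
        self.max_bins = max_bins
        self.bins = []  # kept sorted by centroid

    def _shrink(self):
        # Assumed rule: while over budget, merge the two adjacent bins whose
        # centroids are closest, replacing them with their weighted mean.
        while len(self.bins) > self.max_bins:
            i = min(range(len(self.bins) - 1),
                    key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
            (p1, m1), (p2, m2) = self.bins[i], self.bins[i + 1]
            self.bins[i:i + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]

    def update(self, value):
        # Streaming step at a worker: add one point as its own bin, then shrink.
        insort(self.bins, [float(value), 1])
        self._shrink()

    def merge(self, other):
        # Step at the master: combine two worker histograms into one.
        merged = FixedSizeHistogram(self.max_bins)
        merged.bins = sorted([list(b) for b in self.bins + other.bins])
        merged._shrink()
        return merged

    def total(self):
        # Number of data points summarized by the histogram.
        return sum(count for _, count in self.bins)


# Two workers each summarize their own part of the stream.
worker_a, worker_b = FixedSizeHistogram(4), FixedSizeHistogram(4)
for v in range(10):
    worker_a.update(v)
for v in range(10, 20):
    worker_b.update(v)

# The master combines the summaries without seeing the raw data.
combined = worker_a.merge(worker_b)
```

In this sketch the memory used per histogram depends only on `max_bins`, not on the number of points seen, which is the property the abstract relies on. A master could then scan the merged bins to pick candidate split points; that step is left out here.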

Publication date: 2010-02
Affiliation: IBM
