Note on Naive Bayes Based on Binary Descriptors in Cheminformatics

Authors: Townsend Joe A; Glen Robert C; Mussa Hamse Y*
Source: Journal of Chemical Information and Modeling, 2012, 52(10): 2494-2500.
DOI: 10.1021/ci200303m

Abstract

A plethora of articles on naive Bayes classifiers, where the chemical compounds to be classified are represented by binary valued (absent or present type) descriptors, have appeared in the cheminformatics literature over the past decade. The principal goal of this paper is to describe how a naive Bayes classifier based on binary descriptors (NBCBBD) can be employed as a feature selector in an efficient manner suitable for cheminformatics. In the process, we point out a fact well documented in other disciplines that NBCBBD is a linear classifier and is therefore intrinsically suboptimal for classifying compounds that are nonlinearly separable in their binary descriptor space. We investigate the performance of the proposed algorithm on classifying a subset of the MDDR data set, a standard molecular benchmark data set, into active and inactive compounds.
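
The linearity claim in the abstract can be checked numerically. The sketch below is an illustration only, not the paper's algorithm: it assumes a two-class Bernoulli naive Bayes model, and the per-descriptor probabilities (`p_active`, `p_inactive`) and the prior (`prior_active`) are made-up values, not taken from the MDDR data. It computes the log posterior odds twice, once directly from the naive Bayes product and once as `bias + sum(w_i * x_i)`, and compares the two over every binary vector.

```python
import math
from itertools import product


def nb_log_odds(x, p_active, p_inactive, prior_active):
    """Log posterior odds (active vs. inactive) from the naive Bayes product."""
    log_odds = math.log(prior_active / (1.0 - prior_active))
    for xi, p, q in zip(x, p_active, p_inactive):
        like_active = p if xi else 1.0 - p      # P(x_i | active)
        like_inactive = q if xi else 1.0 - q    # P(x_i | inactive)
        log_odds += math.log(like_active / like_inactive)
    return log_odds


def linear_form(p_active, p_inactive, prior_active):
    """Rewrite the same log odds as bias + sum(w_i * x_i)."""
    weights = [
        math.log(p * (1.0 - q) / (q * (1.0 - p)))
        for p, q in zip(p_active, p_inactive)
    ]
    bias = math.log(prior_active / (1.0 - prior_active)) + sum(
        math.log((1.0 - p) / (1.0 - q)) for p, q in zip(p_active, p_inactive)
    )
    return weights, bias


# Hypothetical parameters for three binary descriptors (assumed values).
p_active = [0.8, 0.3, 0.6]     # P(descriptor present | active)
p_inactive = [0.2, 0.4, 0.5]   # P(descriptor present | inactive)
prior_active = 0.1             # P(active)

weights, bias = linear_form(p_active, p_inactive, prior_active)

# Largest disagreement between the two forms over all 2^3 binary vectors.
max_gap = max(
    abs(
        nb_log_odds(x, p_active, p_inactive, prior_active)
        - (bias + sum(w * xi for w, xi in zip(weights, x)))
    )
    for x in product([0, 1], repeat=len(p_active))
)
print(f"largest difference between the two forms: {max_gap:.2e}")
```

Because the two forms agree on every binary vector, the decision boundary (log odds equal to zero) is a hyperplane in descriptor space, which is why such a classifier cannot separate classes that are only nonlinearly separable.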

  • Publication date: 2012-10