Abstract

In this paper, we apply text categorization methods to the problem of classifying literature that contains Quantitative Trait Locus (QTL) information. Our work focused on building an automatic categorization system, based on Support Vector Machines (SVM), that targets QTL information across various species. We propose a text representation strategy that combines words and phrases and effectively improves classification accuracy. By studying literature containing QTL information along with other species-related publications, we identified representative phrases and detected abbreviations to form a second feature set. Together with the words selected by the chi-square (χ²) statistic, both feature sets were used to represent text samples. We used a portion of the QTL-related literature for one particular species to construct the system experimentally, and then tested it on data from multiple plants and other species. The experimental results indicate that our work may help further research on constructing QTL information databases.
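The abstract gives no implementation details, so the following is only a minimal sketch, under stated assumptions, of the kind of pipeline it describes: words ranked by the chi-square statistic, combined with phrase and abbreviation features, and fed to a linear SVM. The phrase list `PHRASES`, the all-caps regex `ABBREV`, the toy corpus, and all function names are hypothetical. The SVM is trained with a Pegasos-style stochastic subgradient solver, swapped in for illustration; it is not necessarily the solver the authors used.

```python
import random
import re
from collections import Counter

# Hypothetical domain phrases (assumption; the paper's phrase list is not given).
PHRASES = ["quantitative trait locus", "qtl mapping", "linkage map", "lod score"]
# Hypothetical abbreviation heuristic: standalone all-caps tokens of 2-5 letters.
ABBREV = re.compile(r"\b[A-Z]{2,5}\b")


def words(text):
    """Lower-cased alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_features(text):
    """Phrase and abbreviation features present in the text."""
    low = text.lower()
    feats = {"PHR:" + p for p in PHRASES if p in low}
    feats |= {"ABBR:" + a for a in ABBREV.findall(text)}
    return feats


def chi_square(docs, labels, term):
    """Chi-square score of `term` for the positive class (+1).

    docs are token sets; a, b, c, d are the usual 2x2 document counts."""
    a = b = c = d = 0
    for tokens, y in zip(docs, labels):
        present = term in tokens
        if y == 1 and present:
            a += 1  # positive doc containing the term
        elif y == 1:
            c += 1  # positive doc without the term
        elif present:
            b += 1  # negative doc containing the term
        else:
            d += 1  # negative doc without the term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def select_words(texts, labels, k):
    """Top-k words by chi-square score (ties broken alphabetically)."""
    docs = [set(words(t)) for t in texts]
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda w: (-chi_square(docs, labels, w), w))[:k]


def featurize(text, selected):
    """Binary features: selected words plus phrase/abbreviation features."""
    tokens = set(words(text))
    return {"W:" + w for w in selected if w in tokens} | phrase_features(text)


def train_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM without a bias term, trained by Pegasos-style stochastic
    subgradient descent on the hinge loss; samples are sets of active features."""
    w = Counter()
    rng = random.Random(seed)
    order = list(range(len(samples)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            x, y = samples[i], labels[i]
            margin = y * sum(w[f] for f in x)
            for f in list(w):
                w[f] *= 1.0 - eta * lam  # shrink step from the L2 regularizer
            if margin < 1:
                for f in x:
                    w[f] += eta * y  # hinge-loss subgradient step
    return w


def predict(w, x):
    """Sign of the linear score (+1 = QTL-related)."""
    return 1 if sum(w[f] for f in x) >= 0 else -1


# Toy corpus for illustration only (not the paper's data): +1 = QTL-related.
TEXTS = [
    "QTL mapping of grain yield in rice",
    "A quantitative trait locus for plant height with a high LOD score",
    "QTL analysis of drought tolerance in maize",
    "Protein structure prediction using deep networks",
    "Gene expression profiles from human liver cells",
    "Survey on database indexing methods",
]
LABELS = [1, 1, 1, -1, -1, -1]

if __name__ == "__main__":
    selected = select_words(TEXTS, LABELS, k=50)
    samples = [featurize(t, selected) for t in TEXTS]
    w = train_svm(samples, LABELS)
    print([predict(w, x) for x in samples])
```

Because the sketch has no bias term, a document with no active features scores 0 and falls into the positive class; a real system would add a bias feature or rely on an established SVM library.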
