Assessment of approximate string matching in a biomedical text retrieval problem

Wang JF; Li ZR; Cai CZ; Chen YZ*

doi:10.1016/j.compbiomed.2004.06.002

Abstract

Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature reduce the accuracy of data mining, and methods for addressing this problem are being explored. This work tests the usefulness of the Smith-Waterman algorithm with an affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched against those from the medicinal chemistry literature using this algorithm at string identity levels from 80% to 100%. Performance is best at a string identity of 88%, where recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith-Waterman algorithm is useful for improving the success rate of biomedical text retrieval.
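The matching step described in the abstract can be illustrated with a minimal Python sketch of Smith-Waterman local alignment with affine gap penalties (the Gotoh recurrences). This is not the authors' implementation. The scoring values (`match`, `mismatch`, `gap_open`, `gap_extend`), the `identity` normalization, and the function names are all assumptions made for illustration, since the abstract does not state how string identity was computed. Only the 88% threshold comes from the abstract.

```python
NEG_INF = float("-inf")


def smith_waterman_affine(a, b, match=2, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best local alignment score of strings a and b with affine gaps.

    Assumed scoring: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j)
    # E: best score ending with a gap in a (b[j-1] aligned to a gap)
    # F: best score ending with a gap in b (a[i-1] aligned to a gap)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either extend an existing gap or open a new one.
            E[i][j] = max(E[i][j - 1] + gap_extend, H[i][j - 1] + gap_open)
            F[i][j] = max(F[i - 1][j] + gap_extend, H[i - 1][j] + gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The floor at 0 is what makes the alignment local.
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def identity(a, b, match=2):
    """Hypothetical identity measure: score divided by the best possible
    score for the longer string, so identical strings give 1.0."""
    if not a or not b:
        return 0.0
    return smith_waterman_affine(a, b, match=match) / (match * max(len(a), len(b)))


def is_match(a, b, threshold=0.88):
    """Treat two names as the same herb when identity reaches the threshold.
    The default of 0.88 is the optimum reported in the abstract."""
    return identity(a.lower(), b.lower()) >= threshold
```

Under these assumed scores, a lower threshold tolerates more character errors at the cost of more false matches, which is the recall and precision trade-off the abstract reports across the 80-100% range.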

Publication date: 2005-10
Affiliations: Sichuan University; Chongqing University
