Abstract

The National Cancer Institute Thesaurus is an important knowledge resource that should ideally be error-free. We investigated the occurrence of errors in the Neoplasm subhierarchy, which is part of the National Cancer Institute Thesaurus Disease, Disorder or Finding hierarchy. This study has five key findings. (1) Errors in the Neoplasm subhierarchy are not uniformly distributed. (2) A partial-area taxonomy, which is a compact network for summarizing the structure and content of an ontology, helped uncover groups of concepts, called "small partial-areas," in the Neoplasm subhierarchy. (3) The error rate in "small partial-areas" is twice that in "large partial-areas" (44% versus 22%), a statistically significant difference. Thus, we conclude that higher error concentrations exist in small partial-areas. (4) Group-based auditing can be used successfully to identify additional suspicious concepts in a small group once a few members of the group are already known to be erroneous. (5) Error correction propagation can be used successfully and with minimal effort to correct additional errors in the Neoplasm subhierarchy that occur outside of an initial small group of erroneous concepts. We present examples of errors and examples of how corrections transform and simplify the partial-area taxonomy.

  • Publication date: 2017