A brave new (virtual) world: distributed searches, relevance scoring and facets

Authors: King Todd*; Narock Tom; Walker Raymond; Merka Jan; Joy Steven
Source: Earth Science Informatics, 2008, 1(1): 29-34.
DOI: 10.1007/s12145-008-0002-7

Abstract

Our ability to deal with complex systems has improved through information system research, which includes improved modeling (of both data and systems), the use of semantics, and advances in distributed computing. The past decade has seen an explosion in the amount and variety of geosciences data and the emergence of true open data repositories through which scientists can freely access these data. The data are found in thousands of repositories located around the world. Virtual observatories have been created to address the challenge of helping scientists search those repositories to find and access the required data. This challenge is being addressed by using technologies such as the Internet (with ample connectivity and bandwidth), the Web, cheap computing power, cheap storage and standards for critical components. Many scientific disciplines are developing virtual observatories. Yet some of the most compelling science questions cross multiple domains. While semantics can provide cross-domain reasoning, often the first step in answering a question is determining which available resources may be relevant to a topic. The topic can be expressed as simple phrases or word sequences. Using a common relevance scoring method at all locations can enable a federated search across loosely coupled providers. The results can then be organized into facets to aid the user in selecting the most promising resources with which to pursue the scientific investigation. We describe an approach to developing and deploying relevance scoring methods and faceted results in this brave new (virtual) world. We have found that a scoring method which considers both the presence of terms and the proximity of these terms relative to the order of the terms in the query improves the assessment of relevance. We call this Term Presence-Proximity (TPP) scoring and describe a method for calculating a normalized score. TPP scoring compares favorably with other scoring approaches.
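
The abstract does not give the TPP formula, so the following Python sketch only illustrates the general idea of combining term presence with order-aware proximity into a normalized score. The function names (`tokenize`, `tpp_score`), the equal weighting, and the 1/gap proximity measure are all assumptions made for illustration, not the authors' published method.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tpp_score(query, document, presence_weight=0.5):
    """Illustrative presence-plus-proximity score in [0, 1].

    Assumption: this is NOT the formula from the paper. It only
    demonstrates scoring on (a) how many query terms occur and
    (b) how closely consecutive query terms follow each other,
    in query order, within the document.
    """
    q_terms = tokenize(query)
    d_terms = tokenize(document)
    if not q_terms:
        return 0.0

    # Record every position at which each document token occurs.
    positions = {}
    for index, token in enumerate(d_terms):
        positions.setdefault(token, []).append(index)

    # Presence: fraction of query terms that occur in the document.
    presence = sum(1 for t in q_terms if t in positions) / len(q_terms)

    if len(q_terms) == 1:
        # A single term has no neighbours, so proximity mirrors presence.
        proximity = presence
    else:
        total = 0.0
        for first, second in zip(q_terms, q_terms[1:]):
            # Gaps where `second` appears after `first` (query order kept).
            gaps = [
                p2 - p1
                for p1 in positions.get(first, [])
                for p2 in positions.get(second, [])
                if p2 > p1
            ]
            if gaps:
                # Adjacent terms (gap 1) contribute 1.0; wider gaps less.
                total += 1.0 / min(gaps)
        proximity = total / (len(q_terms) - 1)

    return presence_weight * presence + (1 - presence_weight) * proximity
```

Because every provider in a federated search would compute the same bounded score, results from loosely coupled repositories could be merged and ranked directly. Under this sketch, a document containing the query terms adjacent and in order scores higher than one containing them far apart or in reverse order.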

  • Publication date: 2008-4