Data Integration for Dynamic and Sustainable Systems Biology Resources: Challenges and Lessons Learned

Authors: Sullivan Daniel E*; Gabbard Joseph L Jr; Shukla Maulik; Sobral Bruno
Source: Chemistry and Biodiversity, 2010, 7(5): 1124-1141.
DOI:10.1002/cbdv.200900317

Abstract

Systems-biology and infectious-disease (host-pathogen-environment) research and development is becoming increasingly dependent on integrating data from diverse and dynamic sources. Maintaining integrated resources over long periods of time presents distinct challenges. This review describes experiences and lessons learned from integrating data in two five-year projects focused on pathosystems biology. The first is the Pathosystems Resource Integration Center (PATRIC, http://patric.vbi.vt.edu/), whose goal is to develop bioinformatics resources based on genomics data for the research and countermeasures-development communities. The second is the Resource Center for Biodefense Proteomics Research (RCBPR, http://www.proteomicsresource.org/), whose goal is to develop resources based on experimental data, such as microarray and proteomics data, from diverse sources and technologies. Challenges include integrating genomic sequence data with experimental data, data synchronization, data quality control, and usability engineering. We present examples of a variety of data-integration problems drawn from our experiences with PATRIC and RCBPR, discuss open research questions related to long-term sustainability, and describe the next steps toward meeting these challenges. Novel contributions of this work include 1) an approach for addressing discrepancies between experimental results and interpreted results, and 2) an expansion of the range of data-integration techniques to include usability engineering at the presentation level.

  • Publication date: 2010
  • Affiliation: Virginia Tech (Virginia Polytechnic Institute and State University), USA