Abstract

The activities of prokaryotes are pivotal in shaping the environment and are, at the same time, greatly influenced by it. Using the genomic data and environmental descriptions of the complete prokaryotic genomes in NCBI's Microbial Genome Project database, and applying statistical methods, we have systematically identified those gene groups whose presence/frequency patterns differ among organisms living under different environmental conditions. Here, environmental conditions are characterized in four dimensions (salinity, oxygen requirement, habitat and temperature) and are based on the controlled vocabularies that NCBI's Microbial Genome Project database uses to specify organism information; gene groups are defined as Clusters of Orthologous Groups (COG) and KEGG Orthology (KO) groups. The identified COG and KO groups are considered potentially correlated with certain environmental conditions, and are then mapped to the COG general categories and KEGG pathways to determine which parts of the functional machinery of prokaryotic cells are correlated with the environment. The observations derived from the analysis of the COG and KO groups potentially correlated with the oxygen requirement and habitat conditions are in general consistent with existing studies on the properties of organisms living under different conditions of these two environmental factors. To further assess the identified correlations, we have also examined whether the environmental conditions can be predicted from the gene distributions in the selected COG and KO groups. The misclassification rates of the prediction experiments are much smaller than those expected from random guessing, indicating that correlations exist between organisms' environmental conditions and gene distributions in certain functional groups.
However, the rather moderate misclassification rates (the 25th and 75th percentiles of the misclassification rates across all prediction experiments are 16.79% and 24.06%, respectively) also indicate that the correlations between environmental conditions and gene distributions in certain functional groups are not strong enough for one to decisively define the other.

  • Publication date: 2010-7