Abstract
Escherichia coli is a free-living bacterium that embodies a large body of knowledge accumulated over years of experimental work in molecular biology. It represents a point of departure for analyses and comparisons with the ever-increasing number of finished microbial genomes. For years, we have been gathering knowledge from the literature on transcriptional regulation and operon organization in E. coli K-12 and organizing it in a relational database, RegulonDB. RegulonDB contains information on 20-25% of the expected total set of regulatory interactions at the level of transcription initiation. We have used this knowledge to develop computational methods that predict the missing sets in the E. coli genome, focusing on the prediction of promoters, regulatory sites, regulatory proteins, operons, and transcription units. These predictions constitute separate pieces of a single puzzle. By putting them all together, we should be able to predict the complete set of regulatory interactions and the transcription unit organization of E. coli. Orthologs in other genomes of genes known to be coregulated in E. coli, along with their corresponding predicted operons and predicted transcriptional regulators, should permit the extension of this goal to many more microbial genomes.