Abstract

Recent work has reported that the distributions of specific binary profiles associated with DNA sequences can be used for rapid homology assessment in genome space. Following this line of research, we propose a new approach to identifying protein coding domains using binary profiles. In our method, a set of DNA segments with similar algebraic structures is represented by a single binary profile. Binary profiles that appear at higher rates in known protein coding domains can then be used to find unknown or potential protein coding domains. We test the method on the complete sequence of Halalkalicoccus jeotgali B3 plasmid 1, the genome of Escherichia coli ATCC 8739, and the genome of Gallus gallus. The experimental results show that the binary profile method is effective at identifying unknown protein coding domains, and statistical analysis indicates that these results are statistically significant.