摘要

Gene mutation (e.g. substitution, insertion and deletion) and related phenotype information are important biomedical knowledge. Many biomedical databases (e.g. OMIM) incorporate such data. However, few studies have examined the quality of this data. In the current study, we examined the quality of protein single-point mutations in the OMIM and identified whether the corresponding reference sequences align with the mutation positions. Our results show that close to 20% of mutation data cannot be mapped to a single reference sequence. The failed mappings are caused by position conflict, site shifting (peptide, N-terminal methionine) and other types of data error. We propose a preliminary model to resolve such inconsistency in the OMIM database.