Abstract

Entity recognition plays an important role in building medical knowledge graphs from electronic medical records (EMRs), which are in turn significant for building clinical decision support (CDS) systems. Clinical documents that span different diseases are context-dependent and have distinct but interrelated semantic structures, which makes entity recognition difficult for traditional methods. To address these problems, this paper proposes a co-training based entity recognition approach for cross-disease clinical documents. We first build a partially annotated corpus for each single disease using dependency syntax analysis and unified medical statement rules. Then, from the partially annotated corpora of the different diseases, sentence-level features are extracted by a Bi-LSTM layer with memory units, and a CRF layer optimizes the labeling over the whole sequence, improving the joint probability of the sequence labels. Finally, through cross feedback, the results with higher confidence are selected to label the corpus, which enlarges the corpus and improves the accuracy of entity recognition in the documents. Experimental results demonstrate the feasibility and efficiency of our method.
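The cross-feedback step can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on stated assumptions: two taggers (assumed here to be Bi-LSTM-CRF models trained on the partially annotated corpora of two different diseases) have already labeled a shared pool of unlabeled sentences; each prediction carries a sequence-level confidence in [0, 1] (for example, the CRF probability of the best tag path); and the names `select_confident`, `cross_feedback`, `threshold`, and `top_k` are hypothetical. In each round, the confident output of one tagger is added to the training corpus of the other.

```python
from typing import Dict, List, Tuple

# Hypothetical types for this sketch: a sentence is a tuple of tokens,
# and its labels are a tuple of BIO tags of the same length.
Tokens = Tuple[str, ...]
Tags = Tuple[str, ...]
# Predictions of one model on the unlabeled pool: sentence -> (tags, confidence).
Predictions = Dict[Tokens, Tuple[Tags, float]]
Corpus = List[Tuple[Tokens, Tags]]


def select_confident(preds: Predictions, threshold: float, top_k: int) -> Corpus:
    """Return up to top_k predictions with confidence >= threshold, most confident first."""
    kept = [(sent, tags, conf) for sent, (tags, conf) in preds.items() if conf >= threshold]
    kept.sort(key=lambda item: item[2], reverse=True)
    return [(sent, tags) for sent, tags, _ in kept[:top_k]]


def cross_feedback(
    preds_a: Predictions,
    preds_b: Predictions,
    corpus_a: Corpus,
    corpus_b: Corpus,
    unlabeled: List[Tokens],
    threshold: float = 0.9,
    top_k: int = 100,
) -> Tuple[Corpus, Corpus, List[Tokens]]:
    """One co-training round: each model's confident labels extend the OTHER model's corpus."""
    picked_a = select_confident(preds_a, threshold, top_k)
    picked_b = select_confident(preds_b, threshold, top_k)
    new_corpus_a = corpus_a + picked_b  # model A learns from model B's confident output
    new_corpus_b = corpus_b + picked_a  # model B learns from model A's confident output
    # Sentences labeled by either model leave the unlabeled pool.
    used = {sent for sent, _ in picked_a} | {sent for sent, _ in picked_b}
    remaining = [sent for sent in unlabeled if sent not in used]
    return new_corpus_a, new_corpus_b, remaining


if __name__ == "__main__":
    s1 = ("patient", "has", "diabetes")
    s2 = ("take", "aspirin", "daily")
    s3 = ("chest", "pain", "reported")
    preds_a = {s1: (("O", "O", "B-DIS"), 0.95), s2: (("O", "B-DRUG", "O"), 0.60)}
    preds_b = {s2: (("O", "B-DRUG", "O"), 0.92), s3: (("B-SYM", "I-SYM", "O"), 0.40)}
    corpus_a, corpus_b, remaining = cross_feedback(preds_a, preds_b, [], [], [s1, s2, s3])
    print(remaining)  # [('chest', 'pain', 'reported')]
```

Between rounds, each tagger would be retrained on its enlarged corpus and re-run on the remaining pool until no prediction clears the threshold; that retraining, and the exact confidence measure, are outside this sketch.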