Abstract

Electronic health records (EHRs) are collections of patient data, including diagnoses, medical history, medications, and allergies, stored mostly as unstructured text. Because EHRs are designed to capture the state of a patient over time, their temporal information is crucial. Most previous work on time in EHR narratives has focused on extracting temporal expressions, so that the temporal dimension is represented only through the text. In this paper, we propose to model the textual and temporal dimensions of EHR narratives jointly by treating the narratives as temporal sequential data. We design a novel representation framework that models clinical narrative text as a document sequence, in which the textual and temporal dimensions are modeled simultaneously. Within the framework, we propose a dynamic time warping based measure to quantify the similarity between the EHRs of different patients. To verify its effectiveness, we apply the proposed model to EHR search via a clustering algorithm. Experiments on a real-world EHR data set demonstrate that the proposed model adequately captures the temporal features of EHRs and provides an effective solution for measuring the temporal similarity between the EHRs of different patients.
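
As an illustration of the general idea of comparing two patients' records as document sequences, the following is a minimal sketch of textbook dynamic time warping, not the measure proposed in this paper. It assumes each record is a chronologically ordered list of documents, that each document is already encoded as a term-weight vector, and that the local cost between two documents is cosine distance. The function names `cosine_distance` and `dtw_distance` are hypothetical and chosen for this example only.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: an all-zero document is maximally distant.
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def dtw_distance(seq_a, seq_b, cost=cosine_distance):
    """Classic dynamic time warping distance between two document sequences.

    seq_a and seq_b are lists of document vectors in chronological order.
    The result is the minimum cumulative local cost over all monotonic
    alignments of the two sequences.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one document")
    inf = float("inf")
    # acc[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cost(seq_a[i - 1], seq_b[j - 1])
            acc[i][j] = local + min(
                acc[i - 1][j],      # seq_b[j-1] also matches an earlier document of seq_a
                acc[i][j - 1],      # seq_a[i-1] also matches an earlier document of seq_b
                acc[i - 1][j - 1],  # both sequences advance together
            )
    return acc[n][m]


# Toy records over a two-term vocabulary (illustrative values only).
patient_a = [[1.0, 0.0], [0.0, 1.0]]
patient_b = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # same course, one extra visit
patient_c = [[0.0, 1.0], [1.0, 0.0]]              # same documents, reversed order
```

In this toy setting, `dtw_distance(patient_a, patient_b)` is zero because warping absorbs the repeated visit, whereas `dtw_distance(patient_a, patient_c)` is positive because the order of events differs. A pairwise distance matrix built this way could then be passed to a clustering algorithm that accepts precomputed distances.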