Abstract

Among the massive volume of microblog texts, ultra-short microblog texts are difficult to understand in isolation because of characteristics such as data sparseness and content fragmentation. To address this problem, this paper presents an associated semantic representation model for ultra-short microblog text (ASRM-UMT) that helps users understand such texts better. First, multi-layer associated semantic views of the ultra-short microblog text are built. The ICTCLAS system is used to extract feature keywords from microblog texts, and a mining algorithm based on a dynamic time window is proposed to mine the associated semantic relations among these keywords. The mining process takes into account three aspects of microblog texts: context, comments, and transmissions. Then, the multi-layer associated semantic views are optimized: the clustering coefficients of the candidate views are compared to select the optimal associated semantic view. Experimental results show that the proposed model represents ultra-short microblog text accurately and effectively.
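As a rough illustration of the view-selection step, the sketch below builds keyword co-occurrence graphs and compares their average clustering coefficients. It is not the ASRM-UMT implementation: the function names (`build_view`, `average_clustering`, `select_view`), the use of plain co-occurrence as the associated semantic relation, and the rule that the view with the highest coefficient is optimal are all assumptions made for illustration. Keyword extraction (done with ICTCLAS in the paper) and the dynamic time window are omitted, so the keyword lists are supplied by hand.

```python
from itertools import combinations


def build_view(keyword_sets):
    """Build an undirected graph in which keywords are nodes and two keywords
    are linked when they occur in the same text (assumed relation)."""
    graph = {}
    for keywords in keyword_sets:
        unique = sorted(set(keywords))
        for kw in unique:
            graph.setdefault(kw, set())
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def average_clustering(graph):
    """Average local clustering coefficient of the graph.
    Nodes with fewer than two neighbours contribute 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count the links that exist between pairs of neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def select_view(views):
    """Return the name of the view with the highest average clustering
    coefficient (assumed selection criterion)."""
    return max(views, key=lambda name: average_clustering(views[name]))


# Hypothetical keyword lists for two candidate views of the same text.
views = {
    "context": build_view([["a", "b", "c"]]),
    "comments": build_view([["a", "b"], ["b", "c"]]),
}
best = select_view(views)
```

In this toy input the "context" view forms a triangle while the "comments" view forms a path, so the former has the higher coefficient and is the one selected under the assumed criterion.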