Abstract

To address data-quality problems in the academic paper records entered by teachers in the comprehensive information system of the University of Electronic Science and Technology of China, this paper presents a method that finds the standard journal or conference name by calculating cosine similarity. First, the entered names are preprocessed and the names crawled from the Internet are cleaned, and the test names are then generated. Using the classic TF-IDF method, all test names and standard journal names are segmented into words, stop words are removed, and the remaining words are extracted as terms. After the TF-IDF value of every word is calculated, each test name and each standard journal name is converted into a multidimensional vector whose components are the TF-IDF values of its words. The correct standard journal name for each test name is then identified by calculating the cosine similarity between the test name and the standard journal names. The identification results show that cosine similarity calculation can improve the quality of the entered academic paper data.
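
The pipeline described above (word segmentation, stop-word removal, TF-IDF weighting, cosine similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the smoothed IDF formula, the regular-expression tokenizer, the sample journal names, and the function names `tokenize`, `build_idf`, `tfidf_vector`, `cosine`, and `best_match` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumed stop-word list; the paper does not state which words it removes.
STOP_WORDS = {"of", "the", "and", "on", "in", "for", "a", "an"}


def tokenize(name):
    """Lowercase a name, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_idf(token_lists):
    """Smoothed inverse document frequency over the standard names (assumed variant)."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # count each word once per name
    return {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; words unknown to the IDF table are ignored."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (c / total) * idf[w] for w, c in counts.items() if w in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(test_name, standard_names):
    """Return the standard name most similar to test_name, with its score."""
    std_tokens = [tokenize(s) for s in standard_names]
    idf = build_idf(std_tokens)
    std_vectors = [tfidf_vector(t, idf) for t in std_tokens]
    query = tfidf_vector(tokenize(test_name), idf)
    scores = [cosine(query, v) for v in std_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return standard_names[best], scores[best]


# Hypothetical sample data, not taken from the paper.
standards = [
    "Journal of Software",
    "IEEE Transactions on Computers",
    "Chinese Journal of Computers",
]
name, score = best_match("ieee trans. on computers", standards)
print(name, round(score, 3))
```

In practice a similarity threshold would probably be needed so that entered names with no close standard counterpart are flagged for manual review rather than matched to the nearest candidate; the abstract does not say whether the authors use one.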

Full text