Abstract

In this work, we propose a research framework that helps people summarize tourism information for an unfamiliar city, such as popular tourist locations and their travel sequences (routes), from a large collection of travel blogs, with the objective of supporting better travel scheduling. To do this, we first crawl the travel blogs available online for a target city. We then transform the textual content of these blogs into a series of word vectors that form the initial data source. Next, we apply frequent pattern mining to these data to identify the city's popular locations from their sequenced co-occurrences among common tourism activities; the result can be visualized as a word network. Finally, we develop a max-confidence-based method to detect travel routes from this network. We illustrate the benefits of the approach by applying it to data from a blog website run by a Chinese online tourism service company. The results show that the proposed method can efficiently extract popular travel information from massive data.
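To make the mining and route-detection steps concrete, the following is a minimal sketch under stated assumptions, not the implementation used in this work. It assumes each blog has already been reduced to an ordered list of location mentions, that a "sequenced co-occurrence" is an ordered pair (a mentioned before b), and that the confidence of an edge a -> b is support(a before b) / support(a). The greedy route walk is one plausible reading of "max-confidence based"; the function names, the `min_support` threshold, and the toy location labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_ordered_pairs(blogs, min_support=2):
    """Count ordered location pairs (a mentioned before b) across blogs."""
    loc_count = Counter()
    pair_count = Counter()
    for seq in blogs:
        # Keep only the first mention of each location, preserving order.
        ordered = list(dict.fromkeys(seq))
        loc_count.update(ordered)
        # combinations() preserves input order, so a precedes b in the blog.
        pair_count.update(combinations(ordered, 2))
    # Assumed pruning rule: keep pairs seen in at least min_support blogs.
    frequent = {p: c for p, c in pair_count.items() if c >= min_support}
    return loc_count, frequent


def edge_confidence(loc_count, frequent):
    """Assumed definition: confidence(a -> b) = support(a before b) / support(a)."""
    return {(a, b): c / loc_count[a] for (a, b), c in frequent.items()}


def greedy_route(conf, start):
    """Follow the highest-confidence unvisited successor until none remains."""
    route, visited = [start], {start}
    while True:
        candidates = [(c, b) for (a, b), c in conf.items()
                      if a == route[-1] and b not in visited]
        if not candidates:
            return route
        _, nxt = max(candidates)
        route.append(nxt)
        visited.add(nxt)


# Toy input: four hypothetical blogs over three made-up locations.
blogs = [
    ["A", "B", "C"],
    ["A", "B"],
    ["A", "C", "B"],
    ["B", "C"],
]
loc_count, frequent = mine_ordered_pairs(blogs, min_support=2)
conf = edge_confidence(loc_count, frequent)
route = greedy_route(conf, "A")
```

In this toy input the pair (C, B) occurs only once and is pruned, so the surviving edges form a small directed network. The walk from "A" takes the strongest outgoing edge at each step. A real pipeline would still need the crawling and location-extraction stages, which are omitted here.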