Abstract

Because classical session identification methods based on timeout-oriented and referrer-based heuristics are limited in their ability to discover complex patterns in Web usage mining, a new method that identifies user sessions through URL semantic analysis is presented. In this method, every URL in the Web log files is assigned certain semantic information with the aid of a Web directory, and several factors are then defined to measure the semantic distance between URLs. Two semantic outlier detection methods, SOAs and SOAd, are presented for static and dynamic Web logs, respectively, to segment user sessions. Finally, comparison experiments between the classical session identification method and the proposed method are conducted, and the results show that the proposed method increases both the precision and the recall of session identification.
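
The general idea of cutting a click stream where the semantic distance between consecutive URLs jumps can be illustrated with a minimal sketch. It is not the SOAs or SOAd algorithm, whose details the abstract does not give. The URL-to-category table, the path-overlap distance, and the 0.75 threshold are all hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from URL to a Web-directory category path.
# A real system would look these up in a Web directory; these are made up.
CATEGORY: Dict[str, Tuple[str, ...]] = {
    "/sports/football/news": ("Sports", "Football"),
    "/sports/football/scores": ("Sports", "Football"),
    "/sports/tennis/rankings": ("Sports", "Tennis"),
    "/computers/python/tutorial": ("Computers", "Programming", "Python"),
    "/computers/python/docs": ("Computers", "Programming", "Python"),
}


def semantic_distance(url_a: str, url_b: str) -> float:
    """Assumed distance: 1 minus the normalized shared category prefix.

    Returns 0.0 for identical category paths and 1.0 for paths that
    share no leading category.
    """
    path_a, path_b = CATEGORY[url_a], CATEGORY[url_b]
    common = 0
    for part_a, part_b in zip(path_a, path_b):
        if part_a != part_b:
            break
        common += 1
    return 1.0 - 2.0 * common / (len(path_a) + len(path_b))


def segment_sessions(urls: List[str], threshold: float = 0.75) -> List[List[str]]:
    """Start a new session whenever consecutive URLs are farther apart
    than the (assumed) threshold."""
    sessions: List[List[str]] = []
    for url in urls:
        if sessions and semantic_distance(sessions[-1][-1], url) <= threshold:
            sessions[-1].append(url)
        else:
            sessions.append([url])
    return sessions


clicks = [
    "/sports/football/news",
    "/sports/football/scores",
    "/sports/tennis/rankings",
    "/computers/python/tutorial",
    "/computers/python/docs",
]
sessions = segment_sessions(clicks)
```

With this toy data, the three sports pages stay in one session and the two Python pages form a second one, because the move from tennis to Python shares no category prefix. A timeout-based heuristic would split on elapsed time alone and could miss this topical boundary.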

  • Publication date: 2011