A product normalization method for E-commerce

Authors: Wang, Li; Zhang, Rong*; Sha, Chao Feng; Wang, Xiao Ling; Zhou, Ao Ying
Source: Chinese Journal of Computers, 2014, 37(2): 312-325.


The rapid growth of E-commerce in product variety and quantity brings new challenges to data management, one of which is product normalization: determining whether product records refer to the same underlying entity. It is a fundamental data management task in E-commerce, especially in the C2C (Customer-to-Customer) model, and it can improve search functionality and the user's shopping experience. However, product normalization in the E-market is difficult because the data is noisy and lacks a uniform schema, which makes existing normalization methods inefficient. In this paper, we propose a hybrid framework that combines product normalization with schema integration and data cleaning. First, we propose a graph-based method to integrate the schema. Second, we fill in missing data and repair incorrect data using evidence extracted from the information surrounding a product, such as its title and textual description. Third, we distinguish products by clustering on the product similarity matrix, which is learned with a linear logistic regression model. Finally, we conduct experiments on a real-world dataset, and the results confirm the effectiveness of our design in comparison with existing methods.
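
The third step (a learned pairwise similarity followed by clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the paper: the three pair features (title Jaccard overlap, brand match, model match), the toy training pairs, the toy product records, and the 0.5 threshold are all hypothetical. The abstract does not say which clustering algorithm is used, so the sketch substitutes a simple one: products are linked when their predicted similarity passes the threshold, and connected components are taken as clusters.

```python
import math
from itertools import combinations

def pair_features(a, b):
    """Hypothetical features for a product pair: title overlap, brand match, model match."""
    ta, tb = set(a["title"].lower().split()), set(b["title"].lower().split())
    jaccard = len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0
    same_brand = 1.0 if a.get("brand") and a.get("brand") == b.get("brand") else 0.0
    same_model = 1.0 if a.get("model") and a.get("model") == b.get("model") else 0.0
    return [jaccard, same_brand, same_model]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def similarity(w, b, p, q):
    """Predicted probability that two product records are the same entity."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, pair_features(p, q))) + b)

def cluster(products, w, b, threshold=0.5):
    """Link pairs whose similarity passes the threshold; return connected components."""
    parent = list(range(len(products)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i, j in combinations(range(len(products)), 2):
        if similarity(w, b, products[i], products[j]) >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(products)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Toy labeled pairs (feature vector, 1 = same product); values are made up.
X = [[0.8, 1, 1], [0.6, 1, 1], [0.1, 0, 0], [0.0, 0, 0], [0.3, 1, 0], [0.2, 0, 0]]
y = [1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)

# Toy product records, assumed to be already schema-integrated and cleaned.
products = [
    {"title": "Apple iPhone 5 16GB black", "brand": "apple", "model": "iphone 5"},
    {"title": "iPhone 5 Apple 16GB", "brand": "apple", "model": "iphone 5"},
    {"title": "Samsung Galaxy S4 white", "brand": "samsung", "model": "galaxy s4"},
    {"title": "Galaxy S4 Samsung phone", "brand": "samsung", "model": "galaxy s4"},
]
clusters = cluster(products, w, b)
```

On this toy input, `clusters` groups the two iPhone listings together and the two Galaxy listings together. In a real setting the features would come from the integrated schema, and the threshold-and-link step would likely be replaced by whatever clustering method the full paper specifies.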
