Unsupervised Extraction of Popular Product Attributes from E-Commerce Web Sites by Considering Customer Reviews

Cited by: 30
Authors
Bing, Lidong [1]
Wong, Tak-Lam [2]
Lam, Wai [3, 4]
Affiliations
[1] Carnegie Mellon Univ, Machine Learning Dept, 5000 Forbes Ave, Pittsburgh, PA 15213 USA
[2] Hong Kong Inst Educ, Dept Math & Informat Technol, 10 Lo Ping Rd, Tai Po, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Lab High Confidence Software Technol, Minist Educ, CUHK Sub Lab, Hong Kong, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
Keywords
Information extraction; conditional random fields; product attribute; customer reviews
DOI
10.1145/2857054
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We develop an unsupervised learning framework for extracting popular product attributes from product description pages originating from different E-commerce Web sites. Unlike existing information extraction methods, which do not consider the popularity of product attributes, our proposed framework not only detects popular product features from a collection of customer reviews but also maps these popular features to the related product attributes. One novelty of our framework is that it bridges the vocabulary gap between the text in product description pages and the text in customer reviews. Technically, we develop a discriminative graphical model based on hidden Conditional Random Fields. Because the model is unsupervised, our framework can be easily applied to a variety of new domains and Web sites without the need to label training samples. Extensive experiments have been conducted to demonstrate the effectiveness and robustness of our framework.
Pages: 17