Identifying Features in Opinion Mining via Intrinsic and Extrinsic Domain Relevance

Cited by: 77

Authors:

Hai, Zhen [1]

Chang, Kuiyu [1]

Kim, Jung-Jae [1]

Yang, Christopher C. [2]

Affiliations:

[1] Nanyang Technol Univ, Sch Comp Engn, DISCO Lab N4 B3C 14, Singapore 639798, Singapore

[2] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2014 / Vol. 26 / No. 03

Keywords:

Information search and retrieval; natural language processing; opinion mining; opinion feature; Chinese;

DOI:

10.1109/TKDE.2013.26

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The vast majority of existing approaches to opinion feature extraction rely on mining patterns only from a single review corpus, ignoring the nontrivial disparities in word distributional characteristics of opinion features across different corpora. In this paper, we propose a novel method to identify opinion features from online reviews by exploiting the difference in opinion feature statistics across two corpora, one domain-specific corpus (i.e., the given review corpus) and one domain-independent corpus (i.e., the contrasting corpus). We capture this disparity via a measure called domain relevance (DR), which characterizes the relevance of a term to a text collection. We first extract a list of candidate opinion features from the domain review corpus by defining a set of syntactic dependence rules. For each extracted candidate feature, we then estimate its intrinsic-domain relevance (IDR) and extrinsic-domain relevance (EDR) scores on the domain-dependent and domain-independent corpora, respectively. Candidate features that are less generic (EDR score less than a threshold) and more domain-specific (IDR score greater than another threshold) are then confirmed as opinion features. We call this interval thresholding approach the intrinsic and extrinsic domain relevance (IEDR) criterion. Experimental results on two real-world review domains show the proposed IEDR approach to outperform several other well-established methods in identifying opinion features.
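The interval thresholding step described in the abstract can be illustrated with a minimal sketch. It assumes the IDR and EDR scores have already been computed for each candidate feature; the abstract does not give the domain relevance formula, so none is implemented here. The function name, the threshold names `idr_min` and `edr_max`, and all scores are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def select_opinion_features(
    idr: Dict[str, float],
    edr: Dict[str, float],
    idr_min: float,
    edr_max: float,
) -> List[str]:
    """Keep candidates that are domain-specific (IDR > idr_min)
    and not generic (EDR < edr_max).

    Assumption: a candidate missing from the domain-independent
    corpus is given an EDR of 0.0, i.e. treated as not generic.
    """
    return [
        term
        for term, idr_score in idr.items()
        if idr_score > idr_min and edr.get(term, 0.0) < edr_max
    ]


# Made-up scores for four candidate features (not from the paper).
idr_scores = {"battery": 0.9, "screen": 0.8, "thing": 0.7, "hinge": 0.1}
edr_scores = {"battery": 0.2, "screen": 0.3, "thing": 0.9, "hinge": 0.05}

features = select_opinion_features(idr_scores, edr_scores, idr_min=0.5, edr_max=0.5)
print(features)
```

With these toy values, "thing" is rejected because its EDR is high (it is generic), and "hinge" is rejected because its IDR is low (it is not domain-specific).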

Citation

Pages: 623-634

Number of pages: 12
