Multi-Distribution Characteristics Based Chinese Entity Synonym Extraction from The Web

Cited by: 3
Authors
Ma, Xiuxia [1]
Luo, Xiangfeng [1]
Huang, Subin [1]
Guo, Yike [2]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
[2] Imperial Coll London, Comp Sci, Dept Comp, London, England
Funding
US National Science Foundation;
Keywords
Entity Synonym Network; Ranking Problem; Spreading Activation; Synonym Extraction;
DOI
10.4018/IJIIT.2019070103
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Entity synonyms play an important role in natural language processing applications such as query expansion and question answering. In web texts, entity synonyms show three main distribution characteristics: 1) appearing in parallel structures; 2) occurring with specific patterns in sentences; and 3) being distributed in similar contexts. The first and second characteristics rely on reliable prior knowledge and are susceptible to data sparseness, so they give synonym extraction high accuracy but low recall. The third may lead to high recall but low accuracy, since it captures only a somewhat loose semantic similarity. Existing methods, such as context-based and pattern-based methods, consider only one characteristic for synonym extraction and rarely take their complementarity into account. To increase recall, this article proposes a novel extraction framework that combines the three characteristics to extract synonyms from the web, in which an Entity Synonym Network (ESN) is built to incorporate synonymous knowledge. To improve accuracy, the article treats synonym detection as a ranking problem and uses the Spreading Activation model as a ranking means to detect the hard noise in the ESN. Experimental results show that the proposed method achieves better accuracy and recall than the state-of-the-art methods.
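The ranking idea in the abstract can be illustrated with a generic spreading activation pass over a small weighted graph. This is only a minimal sketch under stated assumptions, not the article's actual model: the function name `spreading_activation`, the `decay` and `threshold` parameters, the entity names, and the edge weights are all hypothetical and chosen for illustration. Each node fires at most once, passing a decayed share of its received energy to its neighbors; candidates are then ranked by accumulated activation.

```python
from collections import defaultdict


def spreading_activation(edges, seed, decay=0.5, threshold=0.05, max_steps=3):
    """Rank nodes by the activation that spreads out from `seed`.

    `edges` is a list of (node_a, node_b, weight) tuples for an
    undirected graph. All parameter values here are illustrative.
    """
    graph = defaultdict(dict)
    for u, v, w in edges:
        graph[u][v] = w
        graph[v][u] = w

    activation = defaultdict(float)
    activation[seed] = 1.0
    pulse = {seed: 1.0}   # energy received in the previous step
    fired = set()         # nodes that have already passed energy on

    for _ in range(max_steps):
        # A node fires once, and only if its incoming energy is large enough.
        firing = {n: e for n, e in pulse.items()
                  if e >= threshold and n not in fired}
        if not firing:
            break
        fired.update(firing)

        pulse = defaultdict(float)
        for node, energy in firing.items():
            for neighbor, weight in graph[node].items():
                pulse[neighbor] += energy * weight * decay
        for node, energy in pulse.items():
            activation[node] += energy

    # Rank every node except the seed, highest activation first.
    return sorted(((n, a) for n, a in activation.items() if n != seed),
                  key=lambda item: item[1], reverse=True)


# Hypothetical toy network: three mutually linked name variants
# plus one weakly linked noise candidate.
EDGES = [
    ("Peking University", "PKU", 0.9),
    ("Peking University", "Beida", 0.8),
    ("PKU", "Beida", 0.7),
    ("Peking University", "Tsinghua University", 0.2),
]
ranked = spreading_activation(EDGES, seed="Peking University")
for name, score in ranked:
    print(f"{name}: {score:.4f}")
```

In this toy network the densely connected candidates reinforce each other and rise in the ranking, while the weakly linked candidate receives little activation, which is the intuition behind using a ranking step to filter noise.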
Pages: 42-63
Page count: 22