A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction

Cited by: 1
Authors
Kumar, Niraj [1]
Srinathan, Kannan [1]
Varma, Vasudeva [1]
Affiliation
[1] IIIT Hyderabad, Hyderabad 500032, Andhra Pradesh, India
Keywords
keyphrase extraction; weighted betweenness centrality; N-gram graph; normalised pointwise mutual information; NPMI;
DOI
10.1504/IJDMMM.2016.077198
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we present a novel N-gram (N>=1) filtration technique for keyphrase extraction. To filter the sophisticated candidate keyphrases (N-grams), we combine two features: 1) a statistical feature, obtained from the weighted betweenness centrality scores of words (a measure generally used to identify border nodes/edges in community detection techniques); and 2) co-location strength, calculated using nearest-neighbour DBpedia texts. We also introduce an N-gram (N>=1) graph, which reduces the bias towards shorter N-grams in the ranking process and preserves the semantics of words (phraseness) based on local context. To capture the theme of the document and to reduce the effect of noisy terms in the ranking process, we apply an information-theoretic framework for key-player detection to the proposed N-gram graph. Our experimental results show that the devised system performs better than current state-of-the-art unsupervised systems and is comparable to or better than supervised systems.
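The keywords list normalised pointwise mutual information (NPMI), and the abstract mentions a co-location strength score. The record does not say exactly how the two are connected, so the following is only a minimal sketch of the standard NPMI definition, NPMI(x, y) = PMI(x, y) / (-log p(x, y)), computed from raw co-occurrence counts. The function name `npmi`, its parameters, and the handling of the edge cases are assumptions made for illustration, not the authors' implementation.

```python
import math


def npmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Normalised PMI of two terms, in the range [-1, 1].

    count_xy: number of contexts containing both terms (hypothetical input)
    count_x, count_y: number of contexts containing each term
    total: total number of contexts
    """
    if count_xy == 0:
        # Terms never co-occur: NPMI tends to -1 in the limit.
        return -1.0
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    if p_xy == 1.0:
        # Degenerate case: both terms occur in every context, so -log p(x, y) = 0.
        return 1.0
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)
```

Under this definition, two terms that always appear together score close to 1, statistically independent terms score close to 0, and terms that never co-occur score -1.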
Pages: 124-143
Page count: 20