Efficient sequential pattern mining with wildcards for keyphrase extraction

Times cited: 48
Authors
Xie, Fei [1 ,2 ]
Wu, Xindong [1 ,3 ]
Zhu, Xingquan [4 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Hefei Normal Univ, Dept Comp Sci & Technol, Hefei 230601, Peoples R China
[3] Univ Louisiana Lafayette, Sch Comp & Informat, Lafayette, LA 70503 USA
[4] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Document summarization; Keyphrase extraction; Sequential pattern mining; Wildcards; Classification; SYSTEM; RECOMMENDATION;
DOI
10.1016/j.knosys.2016.10.011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A keyphrase (a multi-word unit) in a document denotes one or more keywords that capture a main topic of the document. Good keyphrases can quickly summarize a document's knowledge for efficient decision making, which benefits domains involving intensive text information. To date, existing keyphrase extraction methods cannot be customized to each specific document, mainly because the patterns they use to form keyphrases are too restrictive and may not capture flexible keyword relationships inside the text. In this paper, we propose a document-specific keyphrase extraction method based on sequential pattern mining. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns: the flexible wildcard constraints within a pattern can capture semantic relationships between words, giving the system full flexibility to discover different types of sequential patterns as candidates for keyphrase extraction. To achieve this goal, we regard each single document as a sequential dataset and propose an efficient algorithm to mine sequential patterns with wildcards and the one-off condition, which allows important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use statistical pattern features to characterize it, and we further collect all keyphrases from the document to form a training set. A supervised learning classifier is then trained to identify keyphrases in a test document. Because our pattern mining and pattern characterization processes are customized to each single document, the keyphrases extracted by our method are highly specific to each document. Experimental results demonstrate that the proposed sequential pattern mining method outperforms existing pattern mining methods in both runtime performance and completeness.
Comparisons on keyphrase benchmark datasets also confirm that the proposed document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases. (C) 2016 Elsevier B.V. All rights reserved.
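The abstract does not spell out the mining algorithm. As a rough illustration of what "sequential patterns with wildcards and the one-off condition" means for a single document, here is a minimal Python sketch under stated assumptions: the document is a list of tokens, a gap constraint [min_gap, max_gap] bounds the number of wildcard positions between consecutive pattern items, and support is counted under the one-off condition (no token position is reused across counted occurrences). The helper names `one_off_support` and `mine_patterns`, the greedy left-most matching, and the level-wise growth are hypothetical choices for illustration, not the authors' algorithm; greedy matching without backtracking can undercount the true maximum number of one-off occurrences.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Pattern = Tuple[str, ...]


def one_off_support(seq: Sequence[str], pattern: Pattern,
                    min_gap: int, max_gap: int) -> int:
    """Count occurrences of `pattern` in `seq` under a gap constraint and the
    one-off condition, using greedy left-most matching (hypothetical helper).

    Consecutive pattern items must be separated by between min_gap and
    max_gap wildcard positions, and no position of `seq` may be used by more
    than one counted occurrence. Greedy matching without backtracking can
    undercount the true maximum number of one-off occurrences.
    """
    used = [False] * len(seq)
    count = 0
    for start in range(len(seq)):
        if used[start] or seq[start] != pattern[0]:
            continue
        positions = [start]
        for item in pattern[1:]:
            prev = positions[-1]
            # Candidate indices leave min_gap..max_gap wildcards after prev.
            lo = prev + min_gap + 1
            hi = min(prev + max_gap + 1, len(seq) - 1)
            nxt = next((j for j in range(lo, hi + 1)
                        if not used[j] and seq[j] == item), None)
            if nxt is None:
                break
            positions.append(nxt)
        if len(positions) == len(pattern):
            # Reserve every matched position so later occurrences cannot reuse it.
            for p in positions:
                used[p] = True
            count += 1
    return count


def mine_patterns(tokens: Sequence[str], min_sup: int, min_gap: int = 0,
                  max_gap: int = 2, max_len: int = 4) -> Dict[Pattern, int]:
    """Level-wise pattern growth (hypothetical): extend each frequent pattern
    by one frequent token and keep extensions whose one-off support reaches
    min_sup."""
    counts = Counter(tokens)
    items = sorted(t for t, c in counts.items() if c >= min_sup)
    frequent: Dict[Pattern, int] = {(t,): counts[t] for t in items}
    frontier = list(frequent)
    while frontier:
        next_frontier = []
        for pat in frontier:
            if len(pat) >= max_len:
                continue
            for t in items:
                cand = pat + (t,)
                sup = one_off_support(tokens, cand, min_gap, max_gap)
                if sup >= min_sup:
                    frequent[cand] = sup
                    next_frontier.append(cand)
        frontier = next_frontier
    return frequent


tokens = "data mining with wildcards helps data pattern mining".split()
print(mine_patterns(tokens, min_sup=2, min_gap=0, max_gap=1))
# {('data',): 2, ('mining',): 2, ('data', 'mining'): 2}
```

Here ('data', 'mining') reaches support 2 because the gap constraint lets it match both "data mining" and "data pattern mining", whereas the contiguous bigram "data mining" occurs only once. Extending only by frequent single tokens is safe because a pattern's one-off support cannot exceed the count of any token it contains.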
Pages: 27-39
Number of pages: 13
Related papers (50 in total)
  • [1] Document-Specific Keyphrase Extraction Using Sequential Patterns with Wildcards
    Xie, Fei
    Wu, Xindong
    Zhu, Xingquan
    2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014: 1055-1060
  • [2] Sequential Pattern Mining with Wildcards
    Xie, Fei
    Wu, Xindong
    Hu, Xuegang
    Gao, Jun
    Guo, Dan
    Fei, Yulian
    Hua, Ertian
    22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1, 2010
  • [3] Keyphrase Extraction Using Sequential Pattern Mining and Entropy
    Wang, Qingren
    Sheng, Victor S.
    Hu, Chenyi
    2017 IEEE INTERNATIONAL CONFERENCE ON BIG KNOWLEDGE (IEEE ICBK 2017), 2017: 88-95
  • [4] An Efficient Approach for Mining Sequential Pattern
    Pant, Nidhi
    Kant, Surya
    Pant, Bhaskar
    Sharma, Shashi Kumar
    PROCEEDINGS OF FIFTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2015), VOL 2, 2016, 437: 587-596
  • [5] Efficient weighted sequential pattern mining
    Chen, Shaotao
    Chen, Jiahui
    Wan, Shicheng
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 243
  • [6] Mining sequential patterns with wildcards and the One-Off condition
    Wu, Xin-Dong
    Xie, Fei
    Huang, Yong-Ming
    Hu, Xue-Gang
    Gao, Jun
    Ruan Jian Xue Bao/Journal of Software, 2013, 24 (08): 1804-1815
  • [7] A Bit-Parallel Algorithm for Sequential Pattern Matching with Wildcards
    Guo, Dan
    Hong, Xiao-Li
    Hu, Xue-Gang
    Gao, Jun
    Liu, Ying-Ling
    Wu, Gong-Qing
    Wu, Xindong
    CYBERNETICS AND SYSTEMS, 2011, 42 (06): 382-401
  • [8] Keyphrase Extraction Using Sequential Patterns Mining Algorithm with One-Off and General Gaps Condition
    Liu H.-T.
    Liu Z.-Z.
    Wang L.-L.
    Wu X.-D.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2019, 47 (05): 1121-1128
  • [9] Memory-Efficient Sequential Pattern Mining with Hybrid Tries
    Hosseininasab, Amin
    van Hoeve, Willem-Jan
    Cire, Andre A.
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25
  • [10] Towards Efficient Sequential Pattern Mining in Temporal Uncertain Databases
    Ge, Jiaqi
    Xia, Yuni
    Wang, Jian
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PART II, 2015, 9078: 268-279