Efficient sequential pattern mining with wildcards for keyphrase extraction

Cited by: 48
Authors
Xie, Fei [1, 2]
Wu, Xindong [1, 3]
Zhu, Xingquan [4]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Hefei Normal Univ, Dept Comp Sci & Technol, Hefei 230601, Peoples R China
[3] Univ Louisiana Lafayette, Sch Comp & Informat, Lafayette, LA 70503 USA
[4] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Document summarization; Keyphrase extraction; Sequential pattern mining; Wildcards; Classification; SYSTEM; RECOMMENDATION;
DOI
10.1016/j.knosys.2016.10.011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A keyphrase (a multi-word unit) in a document denotes one or multiple keywords capturing a main topic of the underlying document. Finding good keyphrases of a document can quickly summarize knowledge for efficient decision making and benefit domains involving intensive text information. To date, existing keyphrase extraction methods cannot be customized to each specific document, mainly because the patterns they use to form phrases are too restrictive and may not capture flexible keyword relationships inside the text. In this paper, we propose a sequential pattern mining based document-specific keyphrase extraction method. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns, so that the flexible wildcard constraints within a pattern can capture semantic relationships between words, and the system has full flexibility to discover different types of sequential patterns as candidates for keyphrase extraction. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with wildcard and one-off conditions that allow important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use some statistical pattern features to characterize it, and further collect all keyphrases from the document to form a training set. A supervised learning classifier is trained to identify keyphrases from a test document. Because our pattern mining and pattern characterization processes are customized to each single document, keyphrases extracted by our method are highly specific to each document. Experimental results demonstrate that the proposed sequential pattern mining method outperforms existing pattern mining methods in both runtime performance and completeness. Comparisons on keyphrase benchmark datasets also confirm that the proposed document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases. (C) 2016 Elsevier B.V. All rights reserved.
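The abstract's two mining constraints can be illustrated with a small sketch. This is not the paper's algorithm: it is a greedy, leftmost-first heuristic written only to show what a gap (wildcard) constraint and the one-off condition mean when a single document is treated as one word sequence. The function names `find_occurrence` and `one_off_occurrences`, and the parameters `min_gap` and `max_gap`, are hypothetical. The one-off condition is assumed here to mean that each position in the sequence may be used by at most one occurrence of a pattern.

```python
from typing import List, Optional, Sequence


def find_occurrence(seq: Sequence[str], pattern: Sequence[str], start: int,
                    used: List[bool], min_gap: int, max_gap: int) -> Optional[List[int]]:
    """Search for one occurrence of `pattern` whose first item sits at `start`.

    Between two consecutive pattern items, between `min_gap` and `max_gap`
    wildcard positions may be skipped. Positions marked in `used` are not reused.
    """
    if not pattern or used[start] or seq[start] != pattern[0]:
        return None

    def extend(pos: int, k: int) -> Optional[List[int]]:
        # `pos` is the position matched to pattern[k - 1]; try to place pattern[k].
        if k == len(pattern):
            return []
        lo = pos + 1 + min_gap
        hi = min(pos + 1 + max_gap, len(seq) - 1)
        for nxt in range(lo, hi + 1):
            if not used[nxt] and seq[nxt] == pattern[k]:
                rest = extend(nxt, k + 1)
                if rest is not None:
                    return [nxt] + rest
        return None

    rest = extend(start, 1)
    return None if rest is None else [start] + rest


def one_off_occurrences(seq: Sequence[str], pattern: Sequence[str],
                        min_gap: int = 0, max_gap: int = 2) -> List[List[int]]:
    """Greedily collect occurrences that share no sequence position (one-off).

    Greedy selection is an assumption of this sketch; it gives a lower bound on
    the number of non-sharing occurrences, not necessarily the maximum.
    """
    used = [False] * len(seq)
    occurrences = []
    for start in range(len(seq)):
        occ = find_occurrence(seq, pattern, start, used, min_gap, max_gap)
        if occ is not None:
            for p in occ:
                used[p] = True  # enforce the one-off condition
            occurrences.append(occ)
    return occurrences


if __name__ == "__main__":
    doc = ("sequential pattern mining with wildcards supports "
           "sequential frequent pattern mining").split()
    # With up to one wildcard, "sequential * pattern" also matches positions 6 and 8.
    print(one_off_occurrences(doc, ["sequential", "pattern"], 0, 1))
    # With no wildcard allowed, only the contiguous match remains.
    print(one_off_occurrences(doc, ["sequential", "pattern"], 0, 0))
```

The length of the returned list can serve as a support count for the pattern within one document. In the sequence `a a b` with pattern `a b` and up to one wildcard, only one occurrence is counted, because both candidate matches would need the same `b`.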
Pages: 27-39
Number of pages: 13