Efficient sequential pattern mining with wildcards for keyphrase extraction

Cited by: 48
Authors
Xie, Fei [1, 2]
Wu, Xindong [1, 3]
Zhu, Xingquan [4 ]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230009, Peoples R China
[2] Hefei Normal Univ, Dept Comp Sci & Technol, Hefei 230601, Peoples R China
[3] Univ Louisiana Lafayette, Sch Comp & Informat, Lafayette, LA 70503 USA
[4] Florida Atlantic Univ, Dept Comp & Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Document summarization; Keyphrase extraction; Sequential pattern mining; Wildcards; Classification; SYSTEM; RECOMMENDATION;
DOI
10.1016/j.knosys.2016.10.011
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A keyphrase (a multi-word unit) in a document denotes one or multiple keywords capturing a main topic of the underlying document. Finding good keyphrases of a document can quickly summarize knowledge for efficient decision making and benefit domains involving intensive text information. To date, existing keyphrase extraction methods cannot be customized to each specific document, mainly because the patterns they use to form keyphrases are too restrictive and may not capture flexible keyword relationships inside the text. In this paper, we propose a sequential pattern mining based document-specific keyphrase extraction method. Our key innovation is to use wildcards (or gap constraints) to help extract sequential patterns, so that the flexible wildcard constraints within a pattern can capture semantic relationships between words, and the system has full flexibility to discover different types of sequential patterns as candidates for keyphrase extraction. To achieve this goal, we regard each single document as a sequential dataset, and propose an efficient algorithm to mine sequential patterns with wildcard and one-off conditions that allow important keyphrases to be captured during the mining process. For each extracted keyphrase candidate, we use some statistical pattern features to characterize it, and further collect all keyphrases from the document to form a training set. A supervised learning classifier is trained to identify keyphrases from a test document. Because our pattern mining and pattern characterization processes are customized to each single document, keyphrases extracted by our method are highly specific to each document. Experimental results demonstrate that the proposed sequential pattern mining method outperforms existing pattern mining methods in both runtime performance and completeness. Comparisons on keyphrase benchmark datasets also confirm that the proposed document-specific keyphrase extraction method is effective in improving the quality of extracted keyphrases. (C) 2016 Elsevier B.V. All rights reserved.
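The abstract names the wildcard (gap constraint) and one-off conditions but does not give the algorithm. The following is only a minimal illustrative sketch of what those two conditions mean, not the authors' mining method: it treats a document as a word sequence and greedily collects occurrences of a fixed pattern in which consecutive pattern words are separated by between `min_gap` and `max_gap` wildcard positions, and in which no sequence position is reused across occurrences. The function names, the greedy leftmost strategy, and the sample sentence are all assumptions made for illustration; a greedy scan is not guaranteed to find the maximum number of one-off occurrences.

```python
from typing import List, Optional, Sequence


def find_occurrence(words: Sequence[str], pattern: Sequence[str],
                    min_gap: int, max_gap: int,
                    used: List[bool], start: int) -> Optional[List[int]]:
    """Return positions of one occurrence whose first word sits at `start`, or None."""
    if used[start] or words[start] != pattern[0]:
        return None

    def extend(prev: int, k: int) -> Optional[List[int]]:
        # All pattern words matched: nothing left to place.
        if k == len(pattern):
            return []
        # Gap constraint: between min_gap and max_gap wildcards after `prev`.
        lo = prev + 1 + min_gap
        hi = min(prev + 1 + max_gap, len(words) - 1)
        for pos in range(lo, hi + 1):
            if not used[pos] and words[pos] == pattern[k]:
                rest = extend(pos, k + 1)
                if rest is not None:
                    return [pos] + rest
        return None

    rest = extend(start, 1)
    return None if rest is None else [start] + rest


def one_off_occurrences(words: Sequence[str], pattern: Sequence[str],
                        min_gap: int = 0, max_gap: int = 2) -> List[List[int]]:
    """Greedy left-to-right scan; each position is used by at most one occurrence."""
    if not pattern:
        return []
    used = [False] * len(words)
    occurrences: List[List[int]] = []
    for start in range(len(words)):
        occ = find_occurrence(words, pattern, min_gap, max_gap, used, start)
        if occ is not None:
            for p in occ:
                used[p] = True  # one-off condition: never reuse a position
            occurrences.append(occ)
    return occurrences


# Hypothetical sample text, not taken from the paper.
doc = "sequential pattern mining finds sequential and frequent pattern sets by mining".split()
support = len(one_off_occurrences(doc, ["sequential", "pattern", "mining"], 0, 2))
```

In this sketch, the number of returned occurrences plays the role of the pattern's support; with `max_gap=0` the pattern degenerates to a contiguous phrase, which shows how the wildcards relax exact adjacency.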
Pages: 27-39
Number of pages: 13
Related papers
50 in total
  • [31] Pattern Matching with Flexible Wildcards
    Wu, Xindong
    Qiang, Jipeng
    Xie, Fei
    Journal of Computer Science & Technology, 2014, 29 (05) : 740 - 751
  • [32] A review of keyphrase extraction
    Papagiannopoulou, Eirini
    Tsoumakas, Grigorios
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 10 (02)
  • [33] Stop-words in Keyphrase Extraction Problem
    Popova, S.
    Kovriguina, L.
    Mouromtsev, D.
    Khodyrev, I.
    2013 14TH CONFERENCE OF OPEN INNOVATIONS ASSOCIATION (FRUCT), 2013, : 113 - 121
  • [34] NamedKeys: Unsupervised Keyphrase Extraction for Biomedical Documents
    Gero, Zelalem
    Ho, Joyce C.
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 328 - 337
  • [35] An Efficient Web Recommender System based on Approach of Mining Frequent Sequential Pattern from Customized Web Log Preprocessing
    Valera, Manisha
    Chauhan, Uttam
    2013 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATIONS AND NETWORKING TECHNOLOGIES (ICCCNT), 2013,
  • [36] Generalized Net of the Process of Sequential Pattern Mining by Generalized Sequential Pattern Algorithm (GSP)
    Bureva, Veselina
    Sotirova, Evdokia
    Chountas, Panagiotis
    INTELLIGENT SYSTEMS'2014, VOL 2: TOOLS, ARCHITECTURES, SYSTEMS, APPLICATIONS, 2015, 323 : 831 - 838
  • [37] A Review on Sequential Pattern Mining using Pattern Growth Approach
    Patel, Roshani
    Chaudhari, Tarunika
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 1424 - 1427
  • [38] Learning algorithms for keyphrase extraction
    Turney P.D.
    Information Retrieval, 2000, 2 (4) : 303 - 336
  • [39] Sequential pattern mining in databases with temporal uncertainty
    Jiaqi Ge
    Yuni Xia
    Jian Wang
    Chandima Hewa Nadungodage
    Sunil Prabhakar
    Knowledge and Information Systems, 2017, 51 : 821 - 850
  • [40] Generalized Sequential Pattern Mining with Item Intervals
    Hirate, Yu
    Yamana, Hayato
    JOURNAL OF COMPUTERS, 2006, 1 (03) : 51 - 60