Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach

Cited by: 0
Authors
Miah, Mohammad Badrul Alam [1, 2]
Awang, Suryanti [3 ]
Azad, Md Saiful [4 ]
Rahman, Md Mustafizur [5 ]
Affiliations
[1] Univ Malaysia Pahang, Pekan, Malaysia
[2] Mawlana Bhashani Sci & Technol Univ, Informat & Commun Technol, Tangail, Bangladesh
[3] Univ Malaysia Pahang, Fac Comp, Ctr Data Sci & Artificial Intelligence, Data Sci Ctr,Soft Comp & Intelligent Syst, Pekan, Malaysia
[4] Green Univ Bangladesh, Comp Sci & Engn, Dhaka, Bangladesh
[5] Univ Malaysia Pahang, Fac Engn, Dept Mech Engn, Gambang, Kuantan, Malaysia
Keywords
Keyphrase concentrated area; KCA identification; feature extraction; data processing; keyphrase extraction; curve fitting
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
Extracting high-quality keywords and summarising documents at a high level has become more difficult due to technological advancements and the exponential growth of textual data and digital sources. Both tasks rely on features for keyphrase extraction, which are becoming increasingly popular. This study proposes a new unsupervised keyphrase concentrated area (KCA) identification approach as a feature for keyphrase extraction. The approach is corpus-, domain-, and language-independent; it does not depend on document length; and it can be used by both supervised and unsupervised techniques. The proposed system has three phases: data pre-processing, data processing, and KCA identification. The system applies various text pre-processing methods to the acquired datasets and passes the pre-processed data to the data processing step. Statistical approaches, curve plotting, and curve fitting are applied in the KCA identification step. The proposed system is then tested and evaluated using benchmark datasets collected from various sources. To demonstrate the effectiveness, merits, and significance of the proposed approach, we compared it with other proposed techniques. The experimental results on eleven datasets show that the proposed approach effectively recognizes the KCA in articles and significantly enhances current keyphrase extraction methods across various text sizes, languages, and domains.
Pages: 788-796
Page count: 9