An Improved Corpus-Based NLP Method for Facilitating Keyword Extraction: An Example of the COVID-19 Vaccine Hesitancy Corpus

Cited by: 4
Authors
Chen, Liang-Ching [1, 2]
Affiliations
[1] ROC Mil Acad, Dept Foreign Languages, Kaohsiung 830, Taiwan
[2] Natl Sun Yat Sen Univ, Inst Educ, Kaohsiung 804, Taiwan
Keywords
COVID-19 vaccine hesitancy; keyword extraction; natural language processing (NLP); medical informatics; corpus; i10-index; importance-performance analysis (IPA) method; VOCABULARY; ENGLISH;
DOI
10.3390/su15043402
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
In the current COVID-19 post-pandemic era, COVID-19 vaccine hesitancy is hindering the herd immunity generated by widespread vaccination. It is critical to identify the factors that may cause COVID-19 vaccine hesitancy, enabling the relevant authorities to propose appropriate interventions for mitigating such a phenomenon. Keyword extraction, a sub-field of natural language processing (NLP) applications, plays a vital role in modern medical informatics. When traditional corpus-based NLP methods are used to conduct keyword extraction, they only consider a word's log-likelihood value to determine whether it is a keyword, which leaves room for concerns about the efficiency and accuracy of this keyword extraction technique. These concerns include the fact that the method is unable to (1) optimize the keyword list by the machine-based approach, (2) effectively evaluate the keyword's importance level, and (3) integrate the variables to conduct data clustering. Thus, to address the aforementioned issues, this study integrated a machine-based word removal technique, the i10-index, and the importance-performance analysis (IPA) technique to develop an improved corpus-based NLP method for facilitating keyword extraction. The top 200 most-cited Science Citation Index (SCI) research articles discussing COVID-19 vaccine hesitancy were adopted as the target corpus for verification. The results showed that the keywords of Quadrant I (n = 98) reached the highest lexical coverage (9.81%), indicating that the proposed method successfully identified and extracted the most important keywords from the target corpus, thus achieving more domain-oriented and accurate keyword extraction results.
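The log-likelihood value that the abstract attributes to traditional corpus-based keyword extraction can be sketched as follows. This is a minimal illustration that assumes the two-corpus log-likelihood formulation commonly used in corpus tools (target corpus versus reference corpus); the record does not confirm that the article computes it exactly this way, and the function and parameter names are hypothetical.

```python
import math


def log_likelihood(freq_target, freq_ref, size_target, size_ref):
    """Two-corpus log-likelihood (keyness) score for a single word.

    Assumed formulation, not taken from the article:
    freq_target / freq_ref are the word's counts in the target and
    reference corpora; size_target / size_ref are total token counts.
    """
    total = size_target + size_ref
    combined = freq_target + freq_ref
    # Expected counts if the word were equally likely in both corpora.
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total

    score = 0.0
    # A zero observed count contributes nothing (limit of x * ln x is 0).
    if freq_target > 0:
        score += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        score += freq_ref * math.log(freq_ref / expected_ref)
    return 2.0 * score


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(log_likelihood(50, 5, 1000, 1000))
```

A score above 3.84 is conventionally read as significant at p < 0.05 (chi-square, one degree of freedom). The score alone does not rank a keyword's importance or cluster it with other variables, which is the limitation the abstract describes.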
Pages: 19
Related papers
62 in total
[1]   Coronavirus conspiracy suspicions, general vaccine attitudes, trust and coronavirus information source as predictors of vaccine hesitancy among UK residents during the COVID-19 pandemic [J].
Allington, Daniel ;
McAndrew, Siobhan ;
Moxham-Hall, Vivienne ;
Duffy, Bobby .
PSYCHOLOGICAL MEDICINE, 2023, 53 (01) :236-247
[2]   Hesitancy of COVID-19 vaccines: Rapid systematic review of the measurement, predictors, and preventive strategies [J].
Anakpo, Godfred ;
Mishi, Syden .
HUMAN VACCINES & IMMUNOTHERAPEUTICS, 2022, 18 (05)
[3]  
Anthony L., 2022, AntConc
[4]   How large a vocabulary do Chinese computer science undergraduates need to read English-medium specialist textbooks? [J].
Bi, Jia .
ENGLISH FOR SPECIFIC PURPOSES, 2020, 58 :77-89
[5]  
Browne C., 2013, The new general service list
[6]  
Chang KL, 2021, J MULT-VALUED LOG S, V37, P573
[7]   A novel corpus-based computing method for handling critical word-ranking issues: An example of COVID-19 research articles [J].
Chen, Liang-Ching ;
Chang, Kuei-Hu .
INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2021, 36 (07) :3190-3216
[8]   A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports [J].
Chen, Liang-Ching ;
Chang, Kuei-Hu ;
Chung, Hsiang-Yu .
APPLIED SCIENCES-BASEL, 2020, 10 (16)
[9]   A Comparison of Research Productivity Across Plastic Surgery Fellowship Directors [J].
Chopra, Karan ;
Swanson, Edward W. ;
Susarla, Srinivas ;
Chang, Sarah ;
Stevens, W. Grant ;
Singh, Devinder P. .
AESTHETIC SURGERY JOURNAL, 2016, 36 (06) :732-736
[10]  
Dunning T., 1993, Computational Linguistics, V19, P61