An open automation system for predatory journal detection

Times cited: 24
Authors
Chen, Li-Xian [1]
Su, Shih-Wen [2]
Liao, Chia-Hung [2]
Wong, Kai-Sin [2]
Yuan, Shyan-Ming [2]
Affiliations
[1] Fuzhou Univ Int Studies & Trade, Sch Big Data, Fuzhou 350202, Peoples R China
[2] Natl Yang Ming Chiao Tung Univ, Dept Comp Sci, Room 702, MIRC 1001, Univ Rd, Hsinchu 30010, Taiwan
Keywords
BIBLIOMETRIC ANALYSIS; NAIVE BAYES; TEXT; CLASSIFICATION; CLASSIFIERS; IMPACT; NEWS
DOI
10.1038/s41598-023-30176-z
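Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract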
The growing number of online open-access journals promotes academic exchange, but the prevalence of predatory journals is undermining the scholarly reporting process. Tools designed to distinguish legitimate from predatory academic journals and publisher websites commonly follow three steps: data collection, feature extraction, and model prediction. We include all three in our proposed academic journal predatory checking (AJPC) system, which is based on machine learning methods. The AJPC data collection process extracts information on 833 blacklist entries and 1213 whitelist entries from websites, to be used for identifying words and phrases that might indicate the presence of predatory journals. Feature extraction identifies words and terms that help detect predatory websites, and the prediction stage uses eight classification algorithms to distinguish between potentially predatory and legitimate journals. We found that enhancing the classification efficiency of the bag-of-words model and the TF-IDF algorithm with diff scores (a measure of differences in specific word frequencies between journals) can assist in identifying predatory journal feature words. Results from performance tests suggest that our system works as well as or better than those currently used to identify suspect publishers and publications. The open system provides reference results only, not absolute opinions, and accepts user inquiries and feedback to update the system and optimize performance.
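The feature-extraction idea in the abstract can be illustrated with a minimal sketch. The abstract describes the diff score only as a measure of differences in specific word frequencies between journals, so the formula used here (document frequency among predatory texts minus document frequency among legitimate texts) is an assumption, not the paper's definition. The function names, the tokenizer, and the toy texts are all hypothetical, and the classification stage is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def doc_frequency(docs):
    """Fraction of documents in which each word appears at least once."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return {word: n / len(docs) for word, n in counts.items()}


def diff_scores(predatory_docs, legitimate_docs):
    """ASSUMED definition: df_predatory(word) - df_legitimate(word)."""
    pred = doc_frequency(predatory_docs)
    legit = doc_frequency(legitimate_docs)
    return {w: pred.get(w, 0.0) - legit.get(w, 0.0) for w in set(pred) | set(legit)}


def top_feature_words(scores, k):
    """The k words with the largest absolute diff score (ties broken alphabetically)."""
    return sorted(scores, key=lambda w: (-abs(scores[w]), w))[:k]


def bag_of_words(text, vocabulary):
    """Count vector of the text restricted to the selected vocabulary."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in vocabulary]


# Toy website texts, invented purely for illustration.
predatory = [
    "rapid publication guaranteed, low fee",
    "submit now, rapid review, low fee",
]
legitimate = [
    "peer review by editorial board",
    "editorial board and indexing policy",
]

scores = diff_scores(predatory, legitimate)
vocabulary = top_feature_words(scores, 5)
vector = bag_of_words("Rapid review, low fee, low cost", vocabulary)
```

Words that appear in both groups equally often (here, "review") score zero and drop out of the vocabulary, which is the intuition behind using such a score to pick feature words before a classifier is trained on the resulting vectors.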
Pages: 17