An Improved Multiple Features and Machine Learning-Based Approach for Detecting Clickbait News on Social Networks

Cited by: 11
Authors
Al-Sarem, Mohammed [1]
Saeed, Faisal [1, 2]
Al-Mekhlafi, Zeyad Ghaleb [3]
Mohammed, Badiea Abdulkarem [3]
Hadwan, Mohammed [4, 5]
Al-Hadhrami, Tawfik [6]
Alshammari, Mohammad T. [3]
Alreshidi, Abdulrahman [3]
Alshammari, Talal Sarheed [3]
Affiliations
[1] Taibah Univ, Coll Comp Sci & Engn, Medina 42353, Saudi Arabia
[2] Univ Malaysia Kelantan, Inst Artificial Intelligence & Big Data, City Campus, Kota Baharu 16100, Kelantan, Malaysia
[3] Univ Hail, Coll Comp Sci & Engn, Hail 81481, Saudi Arabia
[4] Qassim Univ, Coll Comp, Dept Informat Technol, Buraydah 51452, Saudi Arabia
[5] Taiz Univ, Coll Appl Sci, Dept Comp Sci, Taiz 6803, Yemen
[6] Nottingham Trent Univ, Sch Sci & Technol, Nottingham NG11 8NS, England
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 20
Keywords
ANOVA-test; clickbait news; feature selection; social network
DOI
10.3390/app11209487
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The widespread use of social media has increased the popularity of online advertisements, which has been accompanied by a disturbing spread of clickbait headlines. Clickbait frustrates users because the article content does not match their expectations. Detecting clickbait posts in online social networks is therefore an important task. Clickbait posts use phrases designed mainly to attract a user's attention so that they click on a specific fake link or website. In other words, clickbait headlines use misleading titles, which may hide important information about the target website. It is very difficult to recognize these headlines manually, so an intelligent method is needed to detect clickbait and fake advertisements on social networks. Several machine learning methods have been applied for this purpose. However, the reported accuracy has only reached 87% and still needs to be improved. In addition, most existing studies were conducted on English headlines and content, and few focused specifically on detecting clickbait headlines in Arabic. Therefore, this study constructed the first Arabic clickbait headline news dataset and presents an improved multiple-feature-based approach for detecting clickbait news on social networks in the Arabic language. The proposed approach includes three main phases: data collection, data preparation, and machine learning model training and testing. The collected dataset included 54,893 Arabic news items from Twitter (after pre-processing). Among these, 23,981 were clickbait news (43.69%) and 30,912 were legitimate news (56.31%). The dataset was pre-processed, and the most important features were then selected using the ANOVA F-test. Several machine learning (ML) methods were then applied with hyper-parameter tuning to find the optimal settings. Finally, the ML models were evaluated, and the overall performance is reported in this paper. The experimental results show that the Support Vector Machine (SVM) with the top 10% of ANOVA F-test features (user-based features (UFs) and content-based features (CFs)) obtained the best performance, achieving a detection accuracy of 92.16%.
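The feature-selection step named in the abstract (ranking features by the ANOVA F-test and keeping the top 10%) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `anova_f_score` and `select_top_percent` are hypothetical, the toy data is invented for illustration, and the SVM training and hyper-parameter tuning stages are omitted. It uses only the Python standard library.

```python
from statistics import mean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of one feature across the class groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n_samples, n_groups = len(values), len(groups)
    grand_mean = mean(values)
    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the samples around their own group mean
    for group in groups.values():
        group_mean = mean(group)
        ss_between += len(group) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in group)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (n_groups - 1)) / (ss_within / (n_samples - n_groups))


def select_top_percent(rows, labels, percent=10):
    """Return the indices of the top `percent` of features by F score, plus all scores."""
    n_features = len(rows[0])
    scores = [
        anova_f_score([row[j] for row in rows], labels) for j in range(n_features)
    ]
    n_keep = max(1, n_features * percent // 100)
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep]), scores


# Invented toy data: 6 samples, 10 features, label 1 = clickbait, 0 = legitimate.
labels = [1, 1, 1, 0, 0, 0]
informative = [5.0, 5.1, 4.9, 1.0, 1.1, 0.9]  # differs clearly between the classes
uninformative = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]  # same distribution in both classes
rows = [[informative[i]] + [uninformative[i]] * 9 for i in range(6)]

selected, scores = select_top_percent(rows, labels, percent=10)
print("selected feature indices:", selected)
```

In practice, a library routine such as scikit-learn's `SelectPercentile` with `f_classif` computes the same kind of ranking; whether the authors used that routine is not stated in the abstract.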
Pages: 15