Social Media Fake News Detection Using Machine Learning Models and Feature Extraction Techniques

Cited by: 1

Authors:

Tahat, Zuhair [1]

Gharaibeh, Ayman [2]

Tahat, Majd Z. [3]

Glisson, William B. [4]

Alamleh, Hosam [5]

Liu, Xiyuan [6]

Affiliations:

[1] Yarmouk Univ, Journalism & Digital Media, Irbid, Jordan

[2] Louisiana Tech Univ, Computat Anal & Modeling, Ruston, LA USA

[3] Louisiana Tech Univ, Cyberspace Engn, Ruston, LA USA

[4] Louisiana Tech Univ, Dept Comp Sci, Ruston, LA USA

[5] Univ North Carolina Wilmington, Comp Sci, Wilmington, NC USA

[6] Louisiana Tech Univ, Math & Stat, Ruston, LA USA

Source:

2024 IEEE 15TH ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE, UEMCON | 2024

Keywords:

Fake News; Machine Learning; Feature Extraction; Natural Language Processing; n-grams; Text Mining; Classification

DOI:

10.1109/UEMCON62879.2024.10754749

Chinese Library Classification (CLC) number:

TP39 [Computer Applications]

Discipline classification codes:

081203; 0835

Abstract:

Due to continuous technological advances, fake news is rampant on social media platforms, making its detection an urgent area of research. This research investigates various Machine Learning (ML) algorithms and feature extraction techniques to classify news as real or fake. The study compares several ML models: Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree (DT), K-Nearest Neighbors (K-NN), Naive Bayes (NB), Multilayer Perceptron (MLP), and stacked ensemble models. Six feature extraction methods were applied and compared: Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BOW), 2-grams, 3-grams, (1-2) grams, and (1-2-3) grams. The experiments used a merged dataset for supervised learning and a separate test dataset for blind testing. The results showed that SVM, DT, and stacked models achieve up to 99% accuracy with TF-IDF and BOW on the merged dataset. In blind testing, K-NN with BOW and MLP with TF-IDF reached an accuracy of 64% on the separate test dataset. Lastly, the results demonstrated that n-grams help capture the context and order of words, but that a higher value of n increases dimensionality and causes sparsity.
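The closing observation about n-grams can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy corpus, the whitespace tokenizer, and the helper names `ngrams`, `bow_matrix`, and `sparsity` are all assumptions made for illustration. It builds a Bag-of-Words count matrix for each n-gram setting named in the abstract and reports how many features result and what fraction of the matrix is zero.

```python
from collections import Counter


def ngrams(tokens, n_min, n_max):
    """Return all word n-grams with n_min <= n <= n_max, shortest n first."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def bow_matrix(docs, n_min, n_max):
    """Build a dense Bag-of-Words count matrix over the chosen n-gram range."""
    # Naive lowercase + whitespace tokenization (an assumption, not the paper's).
    counts = [Counter(ngrams(d.lower().split(), n_min, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def sparsity(matrix):
    """Fraction of zero entries in the count matrix."""
    cells = [value for row in matrix for value in row]
    return cells.count(0) / len(cells)


# Toy corpus invented for illustration only; not taken from the paper's datasets.
docs = [
    "the senate passed the budget bill",
    "the senate rejected the budget bill",
    "aliens passed the budget bill",
]

# Unigram BOW plus the four n-gram settings named in the abstract.
settings = {
    "BOW (1-grams)": (1, 1),
    "2-grams": (2, 2),
    "3-grams": (3, 3),
    "(1-2) grams": (1, 2),
    "(1-2-3) grams": (1, 3),
}

for name, (n_min, n_max) in settings.items():
    vocab, matrix = bow_matrix(docs, n_min, n_max)
    print(f"{name}: {len(vocab)} features, sparsity {sparsity(matrix):.2f}")
```

On this toy corpus, combined ranges such as (1-2-3) grams produce the most features, and pure 3-grams leave a larger share of zero entries than unigrams. A real experiment would more likely use a library vectorizer and sparse matrices; the dense lists here only keep the sketch dependency-free.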

Pages: 134-139

Page count: 6
