Feature extraction using LR-PCA hybridization on twitter data and classification accuracy using machine learning algorithms

Cited by: 41
Authors
Murugan, N. Senthil [1 ]
Devi, G. Usha [1 ]
Affiliations
[1] VIT Univ, Sch Informat Technol & Engn, Vellore, Tamil Nadu, India
Source
CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS | 2019 / Vol. 22 / Suppl. 6
Keywords
Social networks; Twitter; PCA; Logistic regression; Machine learning; SYSTEM;
DOI
10.1007/s10586-018-2158-3
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Twitter is a social blogging site that has become enormously popular, and many organizations and members of the public use it to build their identity and reach. Unfortunately, Twitter faces serious challenges from spammers, who damage the site's reputation and drive legitimate users away. Researchers have proposed many techniques to counter spammers, but whenever researchers open a new line of defense, spammers develop new techniques to get around it. Many algorithms have been proposed to detect spammers, and several extraction techniques have been developed to improve the detection rate. This paper focuses on feature extraction using a hybrid approach that combines logistic regression with dimensionality reduction by principal component analysis (PCA). Our dataset contains tweets from 17 million users, with 159 features. From these we extract particular features that help increase classification accuracy in the later stages. We then extend the work to classify the data using several machine learning techniques. The proposed work indicates that the detection rate can be increased by using these particular features for classification.
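The abstract does not say how logistic regression and PCA are combined, so the following is only a minimal sketch under one assumed ordering: standardize the features, project them onto the leading principal components, then fit logistic regression on the component scores. The data is synthetic (six made-up features, not the paper's 159), and every function name here is hypothetical rather than taken from the paper.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def standardize(X):
    """Center every column and scale it to unit variance."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) or 1.0
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in X]

def top_components(X, k, iters=200, seed=0):
    """Top-k principal directions of standardized X (power iteration + deflation)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    cov = [[sum(r[i] * r[j] for r in X) / n for j in range(d)] for i in range(d)]
    comps = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [dot(row, v) for row in cov]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in cov])  # eigenvalue estimate
        comps.append(v)
        # Deflate so the next pass finds the next-largest direction.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def fit_logistic(Z, y, lr=0.5, epochs=300):
    """Batch gradient descent on the logistic log-loss."""
    n, k = len(Z), len(Z[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        errs = [sigmoid(b + dot(w, z)) - t for z, t in zip(Z, y)]
        w = [w[j] - lr * sum(e * z[j] for e, z in zip(errs, Z)) / n
             for j in range(k)]
        b -= lr * sum(errs) / n
    return w, b

# Synthetic stand-in data (assumption): three correlated features carry the
# class signal, three are pure noise. Label 1 plays the role of "spammer".
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = []
for t in y:
    base = rng.gauss(2 * t - 1, 0.5)
    X.append([base + rng.gauss(0, 0.3), base + rng.gauss(0, 0.3),
              -base + rng.gauss(0, 0.3)] + [rng.gauss(0, 1) for _ in range(3)])

Xs = standardize(X)
components = top_components(Xs, k=2)
Z = [[dot(r, c) for c in components] for r in Xs]  # component scores
w, b = fit_logistic(Z, y)
preds = [1 if sigmoid(b + dot(w, z)) >= 0.5 else 0 for z in Z]
accuracy = sum(p == t for p, t in zip(preds, y)) / len(y)
print(f"training accuracy on 2 components: {accuracy:.2f}")
```

The printed figure is training accuracy on toy data, so it says nothing about the paper's results. A real evaluation would hold out a test set and would likely use an established library rather than this hand-rolled PCA and optimizer.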
Pages: 13965-13974
Page count: 10