Decentralized big data mining: federated learning for clustering youth tobacco use in India

Cited by: 5
Authors
Haripriya, Rahul [1]
Khare, Nilay [1]
Pandey, Manish [1]
Biswas, Sreemoyee [1]
Affiliations
[1] Maulana Azad Natl Inst Technol, Dept Comp Sci & Engn, Bhopal 462003, MP, India
Keywords
Big data; Federated learning for big data; Big data privacy; Federated learning; Decentralized data mining; Clustering algorithms; Privacy preservation; Machine learning; AI in public health
DOI
10.1186/s40537-024-01042-0
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This study examines the smoking patterns of youth across various states and union territories of India using the Global Youth Tobacco Survey (GYTS) dataset. The analysis employs three clustering algorithms (K-Means, DBSCAN, and Hierarchical Clustering) within a federated learning framework, which ensures that sensitive public health data remains decentralized and private. Federated learning enables collaborative analysis across different regions by sharing only model parameters rather than raw data, thus enhancing privacy. Furthermore, the integration of differential privacy ensures additional protection by adding controlled noise to the model parameters, safeguarding individual-level data from exposure during the learning process. The study highlights the varying performances of the clustering algorithms, revealing valuable insights into regional smoking behaviors and the effectiveness of government anti-tobacco campaigns. These insights offer important guidance for public health authorities, allowing for the design and implementation of more targeted and effective campaigns tailored to the needs of specific regions. By leveraging federated learning and differential privacy, this study demonstrates a privacy-preserving approach to analyzing large-scale public health data, providing a blueprint for future health interventions and tobacco control strategies in India and beyond.
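The abstract's central mechanism (regions share model parameters instead of raw records, with noise added before sharing) can be illustrated with a minimal federated K-Means sketch. This is not the authors' implementation: the record does not describe their aggregation rule or noise mechanism, so the function names (`local_update`, `add_noise`, `aggregate`, `federated_kmeans`), the use of per-cluster sums and counts as the shared parameters, and the Gaussian noise with an arbitrary `noise_scale` are all assumptions made for illustration. In particular, the noise here is not calibrated to any formal differential privacy guarantee.

```python
import random


def local_update(points, centroids):
    """Client step: assign local points to the nearest centroid and
    return per-cluster coordinate sums and counts (raw points stay local)."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0.0] * k
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[nearest] += 1.0
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def add_noise(sums, counts, scale, rng):
    """Perturb the shared statistics with Gaussian noise.
    Illustrative only: 'scale' is not derived from a privacy budget."""
    noisy_sums = [[s + rng.gauss(0.0, scale) for s in row] for row in sums]
    noisy_counts = [c + rng.gauss(0.0, scale) for c in counts]
    return noisy_sums, noisy_counts


def aggregate(updates, centroids):
    """Server step: combine client statistics into new global centroids."""
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in updates)
        if total < 1.0:
            # Empty (or noise-dominated) cluster: keep the previous centroid.
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in updates) / total for d in range(dim)]
            )
    return new_centroids


def federated_kmeans(clients, centroids, rounds=10, noise_scale=0.0, seed=0):
    """Run several rounds of local update, optional noising, and aggregation."""
    rng = random.Random(seed)
    centroids = [list(c) for c in centroids]
    for _ in range(rounds):
        updates = []
        for points in clients:
            sums, counts = local_update(points, centroids)
            if noise_scale > 0:
                sums, counts = add_noise(sums, counts, noise_scale, rng)
            updates.append((sums, counts))
        centroids = aggregate(updates, centroids)
    return centroids


# Two hypothetical "regions", each holding its own points.
clients = [
    [(0, 0), (0, 2), (10, 10)],
    [(2, 0), (2, 2), (10, 12), (12, 10), (12, 12)],
]
initial = [[0.0, 0.0], [12.0, 12.0]]
print(federated_kmeans(clients, initial))
print(federated_kmeans(clients, initial, noise_scale=0.1))
```

Sharing sums and counts rather than locally averaged centroids lets the server weight each region by how many points it contributed to a cluster; a real deployment would also need to bound each client's contribution and calibrate the noise to a chosen privacy budget.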
Pages: 26