Recognize corrupted data packeted while transferring data through ensemble machine learning techniques

Cited by: 1
Authors
Sharma, Satyajeet [1]
Sharma, Bhavna [1]
Affiliations
[1] JECRC Univ, Dept Comp Sci & Engn, Jaipur, Rajasthan, India
Keywords
Corrupt file detection; File transfer protocols; Ensemble machine learning; Data integrity; Error detection; Machine learning; AdaBoost Classifiers;
DOI
10.47974/JIOS-1420
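Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code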
1205 ; 120501 ;
Abstract
As more technology moves towards cloud storage, file transfer protocols have become a cornerstone for any platform to run smoothly. Identifying damaged files is therefore a crucial task in data management and integrity. In this study, we propose an AdaBoost-based machine learning technique for identifying damaged files. AdaBoost is an ensemble method that combines many weak classifiers into one strong classifier. In our method, we train weak classifiers called decision stumps on a dataset that includes both damaged and healthy files. The final prediction is decided by a weighted majority vote of all the weak classifiers. We evaluated our method on a dataset built from the metadata of files. We used AdaBoost as the base algorithm and compared it with more established techniques such as Naive Bayes, Logistic Regression, and Linear Discriminant Analysis. The results show that AdaBoost is effective in detecting corrupted files and performs better than these traditional methods. Additionally, our method is computationally efficient and can be easily integrated into existing data management systems. It is expected to have a positive impact on data integrity and management in fields such as digital forensics, cloud computing, and storage systems.
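The AdaBoost procedure the abstract describes (decision stumps combined by a weighted majority vote) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the abstract does not list the metadata features used, so the two features below (`size_diff_bytes` and `checksum_mismatch`), the toy rows, and all function names are hypothetical.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= thr else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, row):
    """Predict +1 or -1 from a single feature threshold."""
    _, j, thr, polarity = stump
    return polarity if row[j] <= thr else -polarity

def adaboost_fit(X, y, rounds=10):
    """Train an ensemble of (alpha, stump) pairs; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n                      # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)      # vote weight of this stump
        ensemble.append((alpha, stump))
        if stump[0] < 1e-10:               # a perfect stump: nothing left to fix
            break
        # Raise the weight of misclassified samples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted majority vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy metadata: [size_diff_bytes, checksum_mismatch]
# Label +1 = corrupted, -1 = healthy.
X = [[0, 0], [0, 0], [0, 1], [5, 0], [5, 1], [0, 0]]
y = [-1, -1, 1, 1, 1, -1]

model = adaboost_fit(X, y, rounds=3)
train_preds = [adaboost_predict(model, row) for row in X]
print(train_preds)
```

No single stump separates this toy set, since a file counts as corrupted if either feature is non-zero; the weighted vote of three stumps does. In practice a library implementation such as scikit-learn's `AdaBoostClassifier` would be the more natural choice.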
Pages: 1459-1469
Number of pages: 11
Related papers
50 in total
  • [21] Using machine learning techniques for exploration and classification of laboratory data
    Trulson, Inga
    Holdenrieder, Stefan
    Hoffmann, Georg
    [J]. JOURNAL OF LABORATORY MEDICINE, 2024, 48 (05) : 203 - 214
  • [22] Semantic Enrichment of Product Data Supported by Machine Learning Techniques
    Costa, Ruben
    Figueiras, Paulo
    Jardim-Goncalves, Ricardo
    Ramos-Filho, Jose
    Lima, Celson
    [J]. 2017 INTERNATIONAL CONFERENCE ON ENGINEERING, TECHNOLOGY AND INNOVATION (ICE/ITMC), 2017, : 1472 - 1479
  • [23] Data dissemination approach using machine learning techniques for WBANs
    Punj, Roopali
    Kumar, Rakesh
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (05)
  • [24] Effective Machine Learning Techniques for Dealing with Poor Credit Data
    Nkambule, Dumisani Selby
    Twala, Bhekisipho
    Pretorius, Jan Harm Christiaan
    [J]. RISKS, 2024, 12 (11)
  • [25] CLASSIFICATION OF RAIL SWITCH DATA USING MACHINE LEARNING TECHNIQUES
    Bryan, Kaylen J.
    Solomon, Mitchell
    Jensen, Emily
    Coley, Christina
    Rajan, Kailas
    Tian, Charlie
    Mijatovic, Nenad
    Kiss, James M.
    Lamoureux, Benjamin
    Dersin, Pierre
    Smith, Anthony O.
    Peter, Adrian M.
    [J]. PROCEEDINGS OF THE ASME JOINT RAIL CONFERENCE, 2018, 2018,
  • [26] Analysis of XDMoD/SUPReMM Data Using Machine Learning Techniques
    Gallo, Steven M.
    White, Joseph P.
    DeLeon, Robert L.
    Furlani, Thomas R.
    Ngo, Helen
    Patra, Abani K.
    Jones, Matthew D.
    Palmer, Jeffrey T.
    Simakov, Nikolay
    Sperhac, Jeanette M.
    Innus, Martins
    Yearke, Thomas
    Rathsam, Ryan
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING - CLUSTER 2015, 2015, : 642 - 649
  • [27] Machine learning for leaf disease classification: data, techniques and applications
    Jianping Yao
    Son N. Tran
    Samantha Sawyer
    Saurabh Garg
    [J]. Artificial Intelligence Review, 2023, 56 : 3571 - 3616
  • [28] Validation of A/B tests using Machine Data Learning Techniques
    Kumar, Suraj
    [J]. 2021 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER TECHNOLOGIES AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2021, : 338 - 343
  • [29] CPT Data Interpretation Employing Different Machine Learning Techniques
    Rauter, Stefan
    Tschuchnigg, Franz
    [J]. GEOSCIENCES, 2021, 11 (07)
  • [30] A Comparison of Resampling Techniques for Medical Data Using Machine Learning
    Alahmari, Fahad
    [J]. JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT, 2020, 19 (01)