Recognize corrupted data packeted while transferring data through ensemble machine learning techniques

Cited by: 1
Authors
Sharma, Satyajeet [1]
Sharma, Bhavna [1]
Affiliations
[1] JECRC Univ, Dept Comp Sci & Engn, Jaipur, Rajasthan, India
Keywords
Corrupt file detection; File transfer protocols; Ensemble machine learning; Data integrity; Error detection; Machine learning; AdaBoost Classifiers
DOI
10.47974/JIOS-1420
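Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract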
Technology is steadily moving toward cloud storage, which makes file transfer protocols a cornerstone of any platform that must run smoothly. Identifying damaged files is therefore a crucial task in data management and integrity. In this study, we propose an AdaBoost-based machine learning technique for identifying damaged files. AdaBoost is an ensemble method that combines many weak classifiers into one strong classifier. In our method, we train weak classifiers called decision stumps on a dataset that includes both damaged and healthy files. The final prediction is decided by a weighted majority vote of all the weak classifiers. We evaluated our method on a dataset built by collecting file metadata and passing it to the algorithms. We used AdaBoost as the base algorithm and compared it with more established techniques such as Naive Bayes, Logistic Regression, and Linear Discriminant Analysis. The results show that the AdaBoost algorithm is effective in detecting corrupted files and performs better than these traditional methods. Additionally, our method is computationally efficient and can be easily integrated into existing data management systems. It is expected to have a positive impact on data integrity and management in fields such as digital forensics, cloud computing, and storage systems.
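The procedure the abstract describes (decision stumps trained on reweighted samples, then combined by a weighted majority vote) can be sketched in plain Python. This is only an illustrative sketch, not the authors' implementation: the two metadata features (a size mismatch in bytes and a byte entropy), the six sample files, and all function names are assumptions invented for the example, since the record does not list the features actually used.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[j] <= t else -polarity for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, polarity)
    return best

def stump_predict(stump, row):
    """Apply a one-feature threshold rule to a single sample."""
    _, j, t, polarity = stump
    return polarity if row[j] <= t else -polarity

def adaboost_fit(X, y, n_rounds=3):
    """Train up to n_rounds stumps, reweighting misclassified samples each round."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:              # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, row):
    """Weighted majority vote: +1 means corrupted, -1 means healthy."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical metadata rows: (size_mismatch_bytes, byte_entropy).
X = [(0, 4.0), (0, 5.0), (0, 6.0), (120, 4.5), (0, 7.9), (300, 7.8)]
y = [-1, -1, -1, 1, 1, 1]  # -1 = healthy file, +1 = corrupted file

ensemble = adaboost_fit(X, y, n_rounds=3)
for alpha, (_, feature, threshold, polarity) in ensemble:
    print(f"feature {feature} <= {threshold}: vote {polarity:+d}, alpha {alpha:.3f}")
print([adaboost_predict(ensemble, row) for row in X])
```

No single stump separates this toy data, because one corrupted file shows only a size mismatch and another shows only high entropy; the vote of three stumps does. In practice a library implementation such as scikit-learn's `AdaBoostClassifier` would be the usual choice, and the paper's reported comparison with other classifiers is not reproduced here.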
Pages: 1459-1469
Number of pages: 11
Related papers
50 in total
  • [11] An Ensemble Machine Learning and Data Mining Approach to Enhance Stroke Prediction
    Wijaya, Richard
    Saeed, Faisal
    Samimi, Parnia
    Albarrak, Abdullah M.
    Qasem, Sultan Noman
    BIOENGINEERING-BASEL, 2024, 11 (07):
  • [12] Ransomware Detection: Ensemble Machine Learning Models using Disjoint Data
    da Silva, Charles M. R.
    de Castro, Paulo Andre L.
    Cesar, Cecilia de A. C.
    2024 IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE, CSR, 2024, : 166 - 179
  • [13] Analysis and enhanced prediction of the Spanish Electricity Network through Big Data and Machine Learning techniques
    Pegalajar, M. C.
    Ruiz, L. G. B.
    Cuellar, M. P.
    Rueda, R.
    INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2021, 133 : 48 - 59
  • [14] Visualizing correlations among Parkinson biomedical data through information retrieval and machine learning techniques
    Maria Frasca
    Genoveffa Tortora
    Multimedia Tools and Applications, 2022, 81 : 14685 - 14703
  • [15] Visualizing correlations among Parkinson biomedical data through information retrieval and machine learning techniques
    Frasca, Maria
    Tortora, Genoveffa
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (11) : 14685 - 14703
  • [16] Analysis of the high-frequency magnetization process through machine learning and topological data techniques
    Foggiatto, Alexandre Lira
    Nagaoka, Ryunosuke
    Taniwaki, Michiki
    Yamazaki, Takahiro
    Ogasawara, Takeshi
    Obayashi, Ippei
    Hiraoka, Yasuaki
    Mitsumata, Chiharu
    Kotsugi, Masato
    2024 IEEE INTERNATIONAL MAGNETIC CONFERENCE-SHORT PAPERS, INTERMAG SHORT PAPERS, 2024,
  • [17] Picket: guarding against corrupted data in tabular data during learning and inference
    Zifan Liu
    Zhechun Zhou
    Theodoros Rekatsinas
    The VLDB Journal, 2022, 31 : 927 - 955
  • [18] Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework
    Bhardwaj, Eshta
    Gujral, Harshit
    Wu, Siyi
    Zogheib, Ciara
    Maharaj, Tegan
    Becker, Christoph
    PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024, 2024, : 1055 - 1067
  • [19] Picket: guarding against corrupted data in tabular data during learning and inference
    Liu, Zifan
    Zhou, Zhechun
    Rekatsinas, Theodoros
    VLDB JOURNAL, 2022, 31 (05) : 927 - 955
  • [20] Cyber Resilience through Machine Learning: Data Exfiltration
    Fletcher, Kieran
    Smith, H. Anthony
    PROCEEDINGS OF THE 15TH INTERNATIONAL CONFERENCE ON CYBER WARFARE AND SECURITY (ICCWS 2020), 2020, : 165 - 172