Corrupt file detection;
File transfer protocols;
Ensemble machine learning;
Data integrity;
Error detection;
Machine learning;
AdaBoost Classifiers;
DOI:
10.47974/JIOS-1420
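Chinese Library Classification (CLC) number:
G25 [Library science, librarianship];
G35 [Information science, information work];
Discipline classification code:
1205;
120501;
Abstract: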
As more and more services move to cloud storage, file transfer protocols have become a cornerstone for any platform to run smoothly. Identifying damaged files is therefore a crucial task in data management and data integrity. In this study, we propose an AdaBoost-based machine learning technique for identifying damaged files. AdaBoost is an ensemble method that combines many weak classifiers into a single strong classifier. In our method, we train weak classifiers called decision stumps on a dataset that includes both damaged and healthy files, and the final prediction is made by a weighted majority vote of all the weak classifiers. We evaluated our method on a dataset built by collecting file metadata, which was then passed to the algorithms. We compared AdaBoost with more established techniques such as Naive Bayes, Logistic Regression, and Linear Discriminant Analysis. The results show that AdaBoost is effective at detecting corrupted files and performs better than these traditional methods. Additionally, our method is computationally efficient and can be easily integrated into existing data management systems. It is expected to have a positive impact on data integrity and management in fields such as digital forensics, cloud computing, and storage systems.
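The abstract describes AdaBoost with decision stumps as weak learners and a weighted majority vote as the final decision rule. Below is a minimal from-scratch sketch of standard discrete AdaBoost under those assumptions; it is not the authors' implementation. The two per-file features (whether the header matches the file extension, and byte entropy) and the five-file dataset are hypothetical placeholders for the file metadata the study collected, and labels use +1 for corrupted and -1 for healthy.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    """A decision stump: a one-feature, one-threshold weak classifier."""
    feature: int      # index of the feature tested
    threshold: float  # split point on that feature
    polarity: int     # +1: predict +1 above the threshold; -1: predict -1 above it

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        # Candidates: one threshold below every value (a constant classifier)
        # plus midpoints between consecutive distinct values.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for t in thresholds:
            for p in (1, -1):
                stump = Stump(f, t, p)
                err = sum(wi for xi, yi, wi in zip(X, y, w) if stump.predict(xi) != yi)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost; labels must be +1 (corrupted) or -1 (healthy)."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        eps = min(max(err, 1e-10), 1 - 1e-10)  # clamp to avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)  # vote weight of this stump
        ensemble.append((alpha, stump))
        if err == 0:
            break  # zero weighted error: nothing left to correct, so stop early
        # Up-weight misclassified samples, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump.predict(xi)) for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, x):
    """Weighted majority vote: the sign of the alpha-weighted sum of stump outputs."""
    score = sum(alpha * stump.predict(x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: [header_matches_extension (0/1), byte_entropy (bits/byte)]
X = [[1, 5.0], [1, 6.0], [1, 7.9], [0, 5.5], [0, 6.5]]
y = [-1, -1, 1, 1, 1]  # -1 = healthy, +1 = corrupted
model = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(model, x) for x in X])  # [-1, -1, 1, 1, 1]
```

On this toy data no single stump classifies every file correctly (the first stump misses the high-entropy file with a valid header), but after three rounds the weighted vote does. Each round shifts weight toward the files the previous stump got wrong, and a stump's vote weight alpha = 0.5 * ln((1 - eps) / eps) grows as its weighted error eps shrinks. For real workloads a library implementation such as scikit-learn's `AdaBoostClassifier`, whose default weak learner is a depth-1 decision tree, would likely be preferable to this exhaustive pure-Python search.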
Affiliation:
Univ Milano Bicocca, Dept Stat & Quantitat Methods, CRISP Res Ctr, Milan, Italy
Boselli, Roberto
Cesarini, Mirko
Mercorio, Fabio
Affiliation:
Univ Milano Bicocca, Dept Stat & Quantitat Methods, CRISP Res Ctr, Milan, Italy
Mercorio, Fabio
Mezzanzanica, Mario
Affiliation:
Univ Milano Bicocca, Dept Stat & Quantitat Methods, CRISP Res Ctr, Milan, Italy
Mezzanzanica, Mario
DATA MANAGEMENT TECHNOLOGIES AND APPLICATIONS, DATA 2014, 2015, 178: 62-80
Affiliation:
Department of Information Science and Engineering, East Point College of Engineering and Technology, Bangalore
Lutimath N.M.
Sharma N.
Affiliation:
Department of AIT Computer Science and Engineering, Chandigarh University, Punjab; Department of Information Science and Engineering, East Point College of Engineering and Technology, Bangalore
Sharma N.
Byregowda B.K.
Affiliation:
Department of Information Science and Engineering, Sir M Visveswaraya Institute of Technology, Bengaluru; Department of Information Science and Engineering, East Point College of Engineering and Technology, Bangalore
Affiliation:
Univ Calif Berkeley, Div Biostat, 2121 Berkeley Way, Berkeley, CA 94720 USA
Malenica, Ivana
Phillips, Rachael V.
Affiliation:
Univ Calif Berkeley, Div Biostat, 2121 Berkeley Way, Berkeley, CA 94720 USA
Phillips, Rachael V.
Chambaz, Antoine
Affiliation:
Univ Paris, Appl Math Paris MAP5 5, Paris, France; Univ Calif Berkeley, Div Biostat, 2121 Berkeley Way, Berkeley, CA 94720 USA
Chambaz, Antoine
Hubbard, Alan E.
Affiliation:
Univ Calif Berkeley, Div Biostat, 2121 Berkeley Way, Berkeley, CA 94720 USA
Hubbard, Alan E.
Pirracchio, Romain
Affiliation:
Univ Calif San Francisco, Dept Anesthesia & Perioperat Care, San Francisco, CA USA; Univ Calif Berkeley, Div Biostat, 2121 Berkeley Way, Berkeley, CA 94720 USA
Pirracchio, Romain
van der Laan, Mark J.
Affiliation:
Univ Calif Berkeley, Div Biostat, 2121 Berkeley Way, Berkeley, CA 94720 USA
Affiliation:
Seoul Natl Univ, Res Inst Marine Syst Engn, Dept Naval Architecture & Ocean Engn, Seoul 08826, South Korea
Domala, Vamshikrishna
Lee, Wonhee
Affiliation:
Korea Res Inst Ships & Ocean Engn, Maritime Safety & Environment Res Div, Daejeon 34103, South Korea; Seoul Natl Univ, Res Inst Marine Syst Engn, Dept Naval Architecture & Ocean Engn, Seoul 08826, South Korea
Lee, Wonhee
Kim, Tae-wan
Affiliation:
Seoul Natl Univ, Res Inst Marine Syst Engn, Dept Naval Architecture & Ocean Engn, Seoul 08826, South Korea