DNA Genome Classification with Machine Learning and Image Descriptors

Cited by: 0
Authors
Prado Cussi, Daniel [1]
Machaca Arceda, V. E. [2]
Affiliations
[1] Univ Nacl San Agustin, Arequipa, Peru
[2] Univ La Salle, Mexico City, DF, Mexico
Source
ADVANCES IN INFORMATION AND COMMUNICATION, FICC, VOL 2 | 2023 / Vol. 652
Keywords
Alignment-free methods; Frequency chaos game representation; Alignment-based methods; CNN; Kameris; Castor; FOS; GLCM; LBP; MLBP; SEQUENCES; PATTERNS;
DOI
10.1007/978-3-031-28073-3_4
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Sequence alignment is the most widely used method in bioinformatics, but it is slow. For that reason, several alignment-free methods for comparing sequences have been proposed. In this work, we analyzed Kameris and Castor, two alignment-free methods for DNA genome classification, and compared them against the most popular CNN architectures: VGG16, VGG19, ResNet-50, and Inception. We also compared them with image descriptor methods, namely First-Order Statistics (FOS), Gray-Level Co-occurrence Matrix (GLCM), Local Binary Pattern (LBP), and Multi-resolution Local Binary Pattern (MLBP), combined with classifiers such as Support Vector Machine (SVM), Random Forest (RF), and k-nearest neighbors (KNN). In this comparison, FOS, GLCM, LBP, and MLBP, all paired with SVM, achieved the best F1-scores, followed by Castor and Kameris, and finally by the CNNs. Furthermore, Castor had the lowest processing time. Finally, according to our experiments, 5-mers (used by Kameris and Castor) and 6-mers outperformed 7-mers.
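The k-mer frequencies and the frequency chaos game representation (FCGR) named in the abstract and keywords can be illustrated with a minimal Python sketch. It is not the authors' implementation, nor the Kameris or Castor code. The function names (`kmer_counts`, `kmer_frequency_vector`, `fcgr`) and the assignment of nucleotides to grid corners are assumptions made here for illustration; other corner conventions exist and only permute the cells.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def kmer_counts(seq, k):
    """Count overlapping k-mers, skipping windows that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set(ALPHABET):
            counts[kmer] += 1
    return counts

def kmer_frequency_vector(seq, k):
    """Return relative k-mer frequencies in lexicographic order (4**k entries)."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]

# Assumed corner assignment as (x bit, y bit); this is one possible convention.
CORNER = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}

def fcgr(seq, k):
    """Build a 2**k x 2**k grid in which each cell holds the count of one k-mer."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for kmer, n in kmer_counts(seq, k).items():
        x = y = 0
        for base in kmer:
            bx, by = CORNER[base]
            x = 2 * x + bx  # each base contributes one bit to the column index
            y = 2 * y + by  # and one bit to the row index
        grid[y][x] += n
    return grid

if __name__ == "__main__":
    example = "ACGTACGTNNACGT"  # toy sequence, not data from the paper
    print(len(kmer_frequency_vector(example, 5)))  # 4**5 entries
    print(len(fcgr(example, 5)))                   # 2**5 rows
```

The frequency vector can be passed to a classifier directly, while the FCGR grid can be treated as a grayscale image from which texture descriptors are computed.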
Pages: 39-58
Page count: 20