Bacterial classification with convolutional neural networks based on different data reduction layers

Cited by: 5
Authors
Abd-Alhalem, Samia M. [1]
Soliman, Naglaa F. [2, 3, 4]
Abd Elrahman, Salah Eldin S. E. [4]
Ismail, Nabil A. [4]
El-Rabaie, El-Sayed M. [1]
Abd El-Samie, Fathi E. [1]
Affiliations
[1] Menoufia Univ, Fac Elect Engn, Dept Elect & Elect Commun Engn, Menoufia 32952, Egypt
[2] Zagazig Univ, Elect & Commun Dept, Fac Engn, Zagazig, Egypt
[3] Princess Nourah Bint Abdulrahman Univ, Fac Comp & Informat Sci, Riyadh, Saudi Arabia
[4] Menoufia Univ, Fac Elect Engn, Dept Comp Sci & Engn, Menoufia, Egypt
Keywords
Frequency Chaos Game Representation (FCGR); Convolutional Neural Networks (CNNs); Random projection (RP);
DOI
10.1080/15257770.2019.1645851
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
High-accuracy classification of DNA sequences with Convolutional Neural Networks (CNNs) requires an efficient sequence representation that accelerates similarity comparison between DNA sequences. In addition, CNNs can be improved by avoiding the dimensionality problem associated with multi-layer CNN features. This paper presents a new approach for the classification of bacterial DNA sequences based on a custom layer. A CNN is used with the Frequency Chaos Game Representation (FCGR) of DNA. The FCGR is adopted as the sequence representation method, with a suitable choice of the word length k for the k-length words whose occurrence frequencies are counted in the DNA sequences. The FCGR maps a DNA sequence to an image of the gene sequence, and this image displays both local and global patterns. A pre-trained CNN is built for image classification. First, the image is converted to feature maps through convolutional layers. This is sometimes followed by a down-sampling operation, performed by pooling layers, that reduces the spatial size of the feature map and removes redundant spatial information. Random Projection (RP) with an activation function, which preserves the variety of the data while introducing some randomness, is suggested instead of the pooling layers. Feature reduction is achieved while maintaining high accuracy in classifying bacteria at the taxonomic levels. The simulation results show that the proposed RP-based CNN offers a trade-off between accuracy and processing time.
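The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The corner assignment of the four bases, the Gaussian projection matrix, the ReLU activation, and the function names `fcgr` and `random_projection_reduce` are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np

# Assumed corner layout for the chaos game; other conventions exist.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Count k-length words of a DNA sequence on a 2^k x 2^k grid."""
    size = 2 ** k
    grid = np.zeros((size, size), dtype=np.float64)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip words containing ambiguous bases such as N
        x = y = 0
        # Each base contributes one bit per axis; the last base of the
        # word sets the most significant bit (the coarsest quadrant).
        for j, base in enumerate(kmer):
            bx, by = CORNERS[base]
            x += bx << j
            y += by << j
        grid[y, x] += 1
    return grid


def random_projection_reduce(feature_map, out_dim, seed=0):
    """Reduce a feature map to out_dim values with a random projection.

    Hypothetical stand-in for a pooling layer: a fixed Gaussian random
    matrix followed by ReLU (the activation is an assumption).
    """
    rng = np.random.default_rng(seed)
    flat = np.asarray(feature_map, dtype=np.float64).reshape(-1)
    scale = 1.0 / np.sqrt(out_dim)
    proj = rng.normal(0.0, scale, size=(out_dim, flat.size))
    return np.maximum(proj @ flat, 0.0)
```

For example, `fcgr(sequence, 4)` yields a 16 x 16 count image, and `random_projection_reduce(image, 32)` compresses it to 32 non-negative features. In a real CNN the projection would be applied to convolutional feature maps rather than to the raw image.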
Pages: 493-503
Number of pages: 11